2020 28th European Signal Processing Conference (EUSIPCO) | 2021

Analysis of Phonetic Dependence of Segmentation Errors in Speaker Diarization

 
 
 

Abstract


Evaluation of speaker segmentation and diarization normally makes use of forgiveness collars around ground truth speaker segment boundaries, such that estimated speaker segment boundaries falling within such collars are considered completely correct. This paper shows that the popular recent approach of removing forgiveness collars from speaker diarization evaluation tools can unfairly penalize speaker diarization systems that correctly estimate speaker segment boundaries. The uncertainty in identifying the start and/or end of a particular phoneme means that the ground truth segmentation is not perfectly accurate, and even trained human listeners are unable to identify phoneme boundaries with full consistency. This research analyses the phoneme dependence of this uncertainty and shows that it depends on (i) whether the phoneme being detected is at the start or end of an utterance and (ii) what the phoneme is, so that the use of a uniform forgiveness collar is inadequate. This analysis is expected to point the way towards more indicative and repeatable assessment of the performance of speaker diarization systems.
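
The idea of a forgiveness collar described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper or from any evaluation tool: the function names, the collar table keyed by (phoneme, position), and every numeric value are hypothetical and chosen only to show how a uniform collar and a phoneme-dependent collar could judge the same estimated boundary differently.

```python
from typing import Dict, Tuple


def boundary_is_forgiven(ref_time: float, est_time: float, collar: float) -> bool:
    """Return True if the estimated boundary lies within +/- collar seconds
    of the reference (ground truth) boundary."""
    return abs(est_time - ref_time) <= collar


def collar_for(
    phoneme: str,
    position: str,
    table: Dict[Tuple[str, str], float],
    default: float,
) -> float:
    """Look up a collar width (seconds) for a phoneme at the 'start' or 'end'
    of an utterance, falling back to a uniform default."""
    return table.get((phoneme, position), default)


# Hypothetical values for illustration only; none come from the paper.
UNIFORM_COLLAR = 0.25
COLLAR_TABLE = {
    ("s", "start"): 0.05,  # assumed: a boundary that is easy to localize
    ("m", "end"): 0.30,    # assumed: a boundary that is hard to localize
}

ref_boundary = 10.00  # ground truth boundary, in seconds
est_boundary = 10.12  # system estimate, in seconds

# With no collar, any deviation from the reference counts as an error.
no_collar = boundary_is_forgiven(ref_boundary, est_boundary, 0.0)

# With a uniform collar, the same estimate is accepted.
uniform = boundary_is_forgiven(ref_boundary, est_boundary, UNIFORM_COLLAR)

# With phoneme-dependent collars, the verdict depends on the phoneme and
# on whether the boundary is at the start or end of the utterance.
tight = boundary_is_forgiven(
    ref_boundary, est_boundary,
    collar_for("s", "start", COLLAR_TABLE, UNIFORM_COLLAR),
)
loose = boundary_is_forgiven(
    ref_boundary, est_boundary,
    collar_for("m", "end", COLLAR_TABLE, UNIFORM_COLLAR),
)
```

In this sketch the same 0.12 s deviation is rejected without a collar, accepted under the uniform collar, and either rejected or accepted under the phoneme-dependent table, which is the kind of dependence the abstract argues a uniform collar cannot capture.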

Pages 381-385
DOI 10.23919/Eusipco47968.2020.9287552
Language English
Journal 2020 28th European Signal Processing Conference (EUSIPCO)
