2019 IEEE International Conference on Signal, Information and Data Processing (ICSIDP) | 2019
Speaker diarization for multi-speaker conversations via x-vectors
Abstract
This paper investigates a new way to build an x-vector-based speaker diarization system for multi-speaker conversations and explores how to improve its performance. Although much prior work has demonstrated the superiority of x-vectors for speaker diarization, they have not yet been applied to multi-speaker scenarios. We study several techniques in our system. First, instead of ignoring overlapping regions, we divide a long conversation into short, overlapping segments to facilitate x-vector extraction. Second, after clustering, we re-classify the labels of the overlapping segments to reduce errors. In addition, we augment the training data to address the shortage of data for discriminant analysis, and we select an appropriate number of archives to control the iteration over training samples when training the neural network. Experimental results on the AMI corpus demonstrate the effectiveness of our system. Compared with the baseline system of the 2018 DIHARD Challenge Track 2, our final system achieves a 13.21% relative reduction in Diarization Error Rate (DER).
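The segmentation and post-clustering relabeling steps can be illustrated with a short sketch. The abstract does not give window lengths or the exact relabeling rule, so everything below is an assumption: a 1.5 s window with a 0.75 s shift (a common choice in x-vector diarization recipes), and a midpoint rule that splits each overlap between two adjacent segments so every instant receives exactly one speaker label. The names `Segment`, `sliding_segments`, and `resolve_overlaps` are hypothetical, and clustering itself is not shown; labels are simply assigned by hand.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: float     # seconds
    end: float       # seconds
    label: int = -1  # speaker cluster id, filled in after clustering


def sliding_segments(duration: float, window: float = 1.5,
                     shift: float = 0.75) -> List[Segment]:
    """Cut [0, duration) into overlapping windows (assumed sizes).

    Each window would later yield one x-vector. Start times are computed
    as i * shift to avoid accumulating floating-point error.
    """
    segs: List[Segment] = []
    i = 0
    while i * shift < duration:
        start = i * shift
        end = min(start + window, duration)
        segs.append(Segment(start, end))
        if end >= duration:
            break
        i += 1
    return segs


def resolve_overlaps(segs: List[Segment]) -> List[Tuple[float, float, int]]:
    """Assign each overlapped region to one segment, then merge same-label runs.

    Hypothetical midpoint rule: where segment i overlaps segment i+1, the
    boundary is moved to the midpoint of the overlap. This assumes
    shift >= window / 2, so each segment overlaps only its neighbours.
    """
    segs = sorted(segs, key=lambda s: s.start)
    pieces = []
    for i, s in enumerate(segs):
        start, end = s.start, s.end
        if i > 0 and segs[i - 1].end > s.start:
            start = (segs[i - 1].end + s.start) / 2
        if i + 1 < len(segs) and s.end > segs[i + 1].start:
            end = (s.end + segs[i + 1].start) / 2
        pieces.append((start, end, s.label))

    # Merge contiguous pieces that share a speaker label into one turn.
    turns: List[Tuple[float, float, int]] = []
    for start, end, label in pieces:
        if turns and turns[-1][2] == label and abs(turns[-1][1] - start) < 1e-9:
            turns[-1] = (turns[-1][0], end, label)
        else:
            turns.append((start, end, label))
    return turns


if __name__ == "__main__":
    segs = sliding_segments(4.0)
    # Pretend clustering assigned these speaker ids to the five windows.
    for seg, lab in zip(segs, [0, 0, 1, 1, 1]):
        seg.label = lab
    print(resolve_overlaps(segs))  # [(0.0, 1.875, 0), (1.875, 4.0, 1)]
```

The midpoint rule is only one simple way to turn overlapping, individually labeled windows into a single non-overlapping speaker timeline. The relabeling the authors actually apply may differ.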