Multimedia Tools and Applications | 2021

Model selection toward robustness speaker verification in reverberant conditions

 
 

Abstract


Speech signals recorded in the far field or with a distant microphone typically contain additive noise and reverberation. These degrade the reliability and intelligibility of the speech signal and the performance of speaker recognition systems, with severe consequences in a wide range of real applications. Channel equalization, i.e. the removal or reduction of channel effects by cleaning methods, mitigates the mismatch problem to some extent, but at the cost of added distortion to the vulnerable speech signal itself, so its effectiveness is limited. This paper proposes to estimate the reverberation first and incorporate it into the individual training examples to create virtually matched channels. Training is performed before the final decision-making. In the training stage, target models are trained in different reverberant environments to form a set of models; in the test stage, the model that is acoustically matched to the reverberation of the test signal is selected from this set. The best-matching model is selected by blindly estimating the full-band reverberation time (RT) using maximum likelihood. Speaker recognition experiments in artificial and real reverberant conditions show the effectiveness of the proposed method in terms of a reduced equal error rate (EER) and the detection error trade-off (DET) curve.
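The selection step and the EER metric named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the blind RT estimate is already available as a number in seconds, that each model in the bank is labelled with the RT of its training condition, and that "best matching" means the nearest training RT. The names `select_matched_model`, `equal_error_rate`, and the model identifiers are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def select_matched_model(estimated_rt: float,
                         model_bank: Dict[float, str]) -> Tuple[float, str]:
    """Pick the model whose training RT is closest to the estimated RT.

    model_bank maps a training reverberation time (seconds) to a model
    identifier. Nearest-RT matching is an assumption made for this sketch.
    """
    if not model_bank:
        raise ValueError("model_bank must not be empty")
    best_rt = min(model_bank, key=lambda rt: abs(rt - estimated_rt))
    return best_rt, model_bank[best_rt]


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> float:
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= threshold. The EER is taken at
    the threshold where false acceptance and false rejection are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical bank of models trained at three reverberation times.
bank = {0.2: "model_rt020", 0.5: "model_rt050", 0.8: "model_rt080"}
chosen_rt, chosen_model = select_matched_model(0.46, bank)
```

With an estimated RT of 0.46 s, the sketch picks the model trained at 0.5 s. A real system would replace the fixed number with the output of the maximum-likelihood RT estimator and the identifiers with trained speaker models.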

DOI 10.1007/s11042-021-11356-3
Language English
Journal Multimedia Tools and Applications

Full Text