2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP) | 2021

Speaker Embedding Augmentation with Noise Distribution Matching


Abstract


Data augmentation (DA) is an effective strategy for building robust systems that generalize well. In embedding-based speaker verification, data augmentation can be applied to either the front-end embedding extractor or the back-end PLDA. Unlike the conventional back-end augmentation method, which adds noise to the raw audio and then extracts augmented embeddings, this work proposes a noise distribution matching (NDM) algorithm that operates in the speaker embedding space. The basic idea is to use a distribution such as a Gaussian to model the difference between clean speaker embeddings and their noisy counterparts produced by conventional augmentation. Experiments on the SRE16 dataset show that NDM yields consistent performance improvements. Furthermore, we find that the NDM distribution can be robustly estimated from only a small amount of training data, which saves time and disk space compared with the conventional augmentation method.
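
The basic idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the estimation details, so the full-covariance Gaussian, the additive sampling step, the function names `fit_noise_gaussian` and `augment_embeddings`, and the synthetic embeddings are all assumptions made for illustration.

```python
import numpy as np


def fit_noise_gaussian(clean, noisy):
    """Fit a Gaussian to the differences between paired noisy and clean embeddings.

    Assumption: row i of `clean` and row i of `noisy` come from the same utterance.
    """
    diff = noisy - clean                    # shape (n_pairs, dim)
    mean = diff.mean(axis=0)                # shape (dim,)
    cov = np.cov(diff, rowvar=False)        # shape (dim, dim)
    return mean, cov


def augment_embeddings(clean, mean, cov, rng):
    """Create pseudo-noisy embeddings by adding sampled difference vectors."""
    noise = rng.multivariate_normal(mean, cov, size=clean.shape[0])
    return clean + noise


# Synthetic stand-ins for real speaker embeddings (hypothetical data).
rng = np.random.default_rng(0)
dim = 8

# A small paired set: clean embeddings and conventionally augmented versions.
clean_small = rng.normal(size=(200, dim))
noisy_small = clean_small + 0.5 + 0.1 * rng.normal(size=(200, dim))

# Estimate the noise distribution from the small paired set only.
mean, cov = fit_noise_gaussian(clean_small, noisy_small)

# Augment a larger clean set in embedding space, without touching any audio.
clean_large = rng.normal(size=(1000, dim))
augmented = augment_embeddings(clean_large, mean, cov, rng)
```

In this sketch only the small paired set requires noisy audio and a second embedding extraction pass; the larger set is augmented purely by sampling, which is where the time and disk savings mentioned in the abstract would come from under these assumptions.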

Pages 1-5
DOI 10.1109/ISCSLP49672.2021.9362090
Language English
Journal 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)
