Archive | 2019

Prosody Usage Optimization for Children Speech Recognition with Zero Resource Children Speech

 
 

Abstract


Children’s speech remains a major challenge for automatic speech recognition (ASR). Because children’s speech is harder and more expensive to collect, most current ASR systems are trained on large amounts of adult speech with little or no children’s speech. The resulting acoustic mismatch between children’s and adult speech is the primary cause of ASR performance degradation on children’s speech. To overcome this problem, we propose several approaches that improve children’s speech recognition without using any children’s speech data. We develop a better strategy for using prosody-based features. First, pitch and prosody modification is explored in both training and testing, which significantly reduces the mismatch between the two types of speech. Furthermore, joint decoding with both the prosody-modified speech and the original speech is designed to obtain more robust performance on both children’s and adult speech. Experiments are conducted on a Mandarin speech recognition task, with only 400 hours of adult speech used for training. The results show that the proposed method yields a large gain on children’s speech, a relative word error rate (WER) reduction of ∼20% compared to the baseline, with no obvious degradation on adult speech.
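The abstract does not say how the prosody modification or the joint decoding is implemented, so the following is only a minimal sketch under stated assumptions. It assumes the simplest possible modification, resampling by linear interpolation, which raises pitch and speaking rate together when the result is played at the original sampling rate. It also assumes that joint decoding can be approximated by decoding both versions of an utterance and keeping the higher-scoring hypothesis. The names `resample_linear` and `joint_decode`, the factor 1.25, and the scores are all hypothetical and are not taken from the paper.

```python
import math


def resample_linear(samples, factor):
    """Resample a waveform by linear interpolation.

    Played back at the original sampling rate, factor > 1 raises the pitch
    and shortens the duration by the same factor (a crude stand-in for
    making adult speech sound more child-like); factor < 1 does the reverse.
    This is an illustrative assumption, not the paper's method.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    n_out = max(1, int(len(samples) / factor))
    out = []
    for i in range(n_out):
        pos = i * factor                      # fractional read position
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)    # clamp at the last sample
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def joint_decode(hyps_original, hyps_modified):
    """Pick the best hypothesis across two decoding passes.

    Each argument is a list of (text, log_score) pairs, one list from the
    original audio and one from the prosody-modified audio. Choosing the
    maximum score is a hypothetical combination rule.
    """
    candidates = list(hyps_original) + list(hyps_modified)
    if not candidates:
        raise ValueError("no hypotheses to choose from")
    best_text, _ = max(candidates, key=lambda h: h[1])
    return best_text


# Usage: a 1-second, 100 Hz tone sampled at 8 kHz, shifted up by 25%.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(sample_rate)]
shifted = resample_linear(tone, 1.25)

# Made-up scores for the two passes over the same utterance.
choice = joint_decode([("ni hao", -12.5)], [("ni hao ma", -9.0)])
```

Plain resampling ties pitch and tempo together. Changing them independently would need a dedicated time-scale or pitch-shifting algorithm, which is outside the scope of this sketch.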

Pages 3446-3450
DOI 10.21437/interspeech.2019-2659
Language English
