Archive | 2019

Prosody Usage Optimization for Children Speech Recognition with Zero Resource Children Speech

Abstract

Children’s speech recognition remains a major challenge for automatic speech recognition (ASR). Because children’s speech is harder and more expensive to collect, most current ASR systems are optimized on large amounts of adult speech with little or no children’s speech. The resulting acoustic mismatch between children’s and adult speech is the primary reason ASR performance degrades on children’s speech. To overcome this problem, we propose several approaches that improve children’s speech recognition without using any children’s speech data. A better strategy for using prosody-based features is developed. First, pitch and prosody modification is explored in both training and testing, which significantly reduces the mismatch between the two types of speech. Furthermore, joint decoding with both the prosody-modified speech and the original speech is designed to obtain more robust performance on both children’s and adult speech. Experiments are conducted on a Mandarin speech recognition task, with only 400 hours of adult speech used for training. The results show that the proposed method obtains a large gain on children’s speech, with a relative word error rate (WER) reduction of about 20% compared to the baseline, and no obvious degradation on adult speech.
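For readers unfamiliar with the metric, a relative WER reduction is the drop in WER expressed as a fraction of the baseline WER. The following is a minimal sketch of that arithmetic only; the function name and the WER values are hypothetical illustrations, not figures or code from the paper.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - system_wer) / baseline_wer


# Hypothetical WER values (in percent) for illustration only;
# these are NOT results reported in the paper.
example = relative_wer_reduction(baseline_wer=30.0, system_wer=24.0)
print(f"Relative WER reduction: {example:.0%}")
```

Under these assumed values, a drop from 30.0% to 24.0% absolute WER corresponds to a relative reduction of one fifth, which is the scale of improvement the abstract describes.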

Volume None

Pages 3446-3450

DOI 10.21437/interspeech.2019-2659

Language English

Journal None
