Applied Acoustics | 2021

Noise robust in-domain children speech enhancement for automatic Punjabi recognition system under mismatched conditions

 
 

Abstract


The success of any commercial Automatic Speech Recognition (ASR) system depends on the availability of training data. Its performance degrades, however, when corpora for low-resource languages lack sufficient signal processing characteristics. Developing a Punjabi children's speech system is one such challenge: it faces zero-resource conditions, and children's speech differs from adult speech in speaking rate and vocal tract length. In this paper, efforts have been made to build a Punjabi children's ASR system under mismatched conditions using noise-robust approaches such as Mel Frequency Cepstral Coefficients (MFCC) or Gammatone Frequency Cepstral Coefficients (GFCC). Acoustic and phonetic variations between adult and children's speech are then handled using gender-based in-domain training data augmentation, and the acoustic variability among speakers in the training and testing sets is subsequently normalized using Vocal Tract Length Normalization (VTLN). We demonstrate that including pitch features with the test-normalized children's dataset significantly enhances system performance across environmental conditions, i.e. clean or noisy. The experimental results show a relative improvement of 30.94% when adult female speech pooled with limited children's speech is used instead of the adult male corpus, with noise-based training data augmentation.

Volume 175
Pages 107810
DOI 10.1016/j.apacoust.2020.107810
Language English
Journal Applied Acoustics

Full Text