International Journal of Speech Technology | 2021

Enhancing accuracy of long contextual dependencies for Punjabi speech recognition system using deep LSTM


Abstract


Long short-term memory (LSTM) is a powerful model for building automatic speech recognition (ASR) systems, whereas standard recurrent networks are generally too inefficient to reach comparable performance. Although the LSTM architecture addresses these shortcomings, its performance still degrades on long contextual information. Recent experiments show that LSTMs and improved variants such as Deep LSTM require extensive tuning during training. In this paper, Deep LSTM models are built on long contextual sentences by selecting optimal values of batch size, number of layers, and activation functions. We also present a comparative study of training and test perplexity alongside computation of word error rate. Furthermore, we apply hybrid discriminative approaches with different numbers of iterations, which yield significant improvements with Deep LSTM networks. Experiments are performed mainly on single sentences or on one to two concatenated sentences. Deep LSTM achieves a performance improvement of 3–4% over conventional language models (LMs) and modelling classifier approaches, with an acceptable word error rate, on top of a state-of-the-art Punjabi speech recognition system.
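The abstract itself contains no code. As a minimal, illustrative sketch only (assuming a PyTorch implementation; the layer sizes, dropout, and all other hyperparameters below are placeholders, not the authors' reported settings), the following shows how a deep (stacked) LSTM language model and the two evaluation measures the paper reports, perplexity and word error rate (WER), could be realised:

import torch
import torch.nn as nn

class DeepLSTMLM(nn.Module):
    # Stacked (deep) LSTM language model: embedding -> multi-layer LSTM -> vocabulary projection.
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, num_layers=3, dropout=0.3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            dropout=dropout, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        out, state = self.lstm(self.embed(tokens), state)
        return self.proj(out), state

def perplexity(model, tokens):
    # tokens: LongTensor [batch, seq_len]; the model predicts token t+1 from the prefix.
    # Perplexity is exp of the mean cross-entropy over the predicted positions.
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    logits, _ = model(inputs)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    return torch.exp(loss).item()

def wer(reference, hypothesis):
    # Word error rate: word-level Levenshtein distance divided by reference length.
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(r)][len(h)] / len(r)

In this sketch, the tuning the paper describes (batch size, number of stacked layers, activation functions) would correspond to varying num_layers, the batch dimension of the training loop, and the choice of output non-linearity.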

Volume 24
Pages 517-527
DOI 10.1007/s10772-021-09814-2
Language English
Journal International Journal of Speech Technology
