Studies in Computational Intelligence | 2021

Recognition of Isolated English Words of E-Lecture Video Using Convolutional Neural Network


Abstract


Speech recognition has gained considerable importance because of the rapid growth of its applications, such as generating subtitles for e-lecture videos and transcribing recorded speech for people with physical disabilities. Speech recognition is a complex task because it must cope with strong accents, run-on words, varying rates of speech and background noise. Earlier speech recognition systems were speaker-dependent, and constructing their acoustic models, which have non-linear boundaries, was challenging. This paper recognizes isolated English words in three steps: pre-processing, segmentation and extraction of Mel Frequency Cepstral Coefficients (MFCC) from the audio signal of a video. A Convolutional Neural Network (CNN) is then trained on these features and used to classify the audio signals. We describe various types of input features, which play a vital role in the recognition process, and infer that representing MFCC features with different analysis window lengths (Winlen) and window steps (Winstep) produces different results. The feature set with higher Winlen and Winstep yields better results, reaching 97%.
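The abstract does not state the window settings, FFT size or filterbank the authors used. The following is a minimal NumPy sketch of a standard MFCC pipeline, not the authors' implementation. It assumes common defaults (25 ms window, 10 ms step, 13 cepstra, 26 mel filters, pre-emphasis 0.97, Hamming window), and the function names are hypothetical. It shows how Winlen and Winstep (here the `winlen` and `winstep` arguments, in seconds) set the shape of the feature matrix passed to the CNN: a longer window and larger step give fewer frames per word. It omits the cepstral liftering and energy-replacement steps that libraries such as python_speech_features apply.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def frame_signal(signal, frame_len, frame_step):
    # Split the signal into overlapping frames (no padding past the last full frame).
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    num_frames = 1 + (len(signal) - frame_len) // frame_step
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(num_frames)[:, None]
    return signal[idx]


def mel_filterbank(nfilt, nfft, fs):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to fs/2.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), nfilt + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fbank = np.zeros((nfilt, nfft // 2 + 1))
    for m in range(1, nfilt + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def dct_ii_ortho(x, n_out):
    # Orthonormal type-II DCT along the last axis, keeping the first n_out coefficients.
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc(signal, fs=16000, winlen=0.025, winstep=0.01, numcep=13, nfilt=26, preemph=0.97):
    """Hypothetical MFCC extractor: returns an array of shape (num_frames, numcep)."""
    signal = np.asarray(signal, dtype=float)
    signal = np.append(signal[0], signal[1:] - preemph * signal[:-1])  # pre-emphasis
    frame_len = int(round(winlen * fs))
    frame_step = int(round(winstep * fs))
    nfft = 1 << (frame_len - 1).bit_length()  # next power of two >= frame length
    frames = frame_signal(signal, frame_len, frame_step) * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=nfft)) ** 2 / nfft  # power spectrum per frame
    energies = power @ mel_filterbank(nfilt, nfft, fs).T  # mel filterbank energies
    energies = np.where(energies == 0, np.finfo(float).eps, energies)  # avoid log(0)
    return dct_ii_ortho(np.log(energies), numcep)


if __name__ == "__main__":
    word = np.random.default_rng(0).standard_normal(16000)  # stand-in for 1 s of audio
    print(mfcc(word, winlen=0.025, winstep=0.01).shape)  # (98, 13)
    print(mfcc(word, winlen=0.05, winstep=0.025).shape)  # (39, 13)
```

For a one-second segment at 16 kHz, the 25 ms / 10 ms setting yields 98 frames, while a hypothetical 50 ms / 25 ms setting yields 39. The CNN input therefore changes size with Winlen and Winstep, which is one reason the feature settings must be fixed before training.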

DOI 10.1007/978-3-030-68291-0_34
Language English
Journal Studies in Computational Intelligence
