Publication


Featured research published by Gayadhar Pradhan.


International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2017

Enhancing noise and pitch robustness of children's ASR

Syed Shahnawazuddin; K T Deepak; Gayadhar Pradhan; Rohit Sinha

It is well known that, when noisy speech is transcribed using automatic speech recognition (ASR) systems trained on clean data, a highly degraded recognition performance is obtained. The problem gets further aggravated when the targeted group happens to be child speakers. For children's speech, acoustic correlates such as pitch and formant frequencies vary significantly with age. This makes the recognition of children's speech very challenging. In this paper, we have explored ways to enhance the noise robustness of ASR systems for children's speech. Toward addressing the same, recently developed front-end acoustic features based on spectral moments (SMAC) are explored. The SMAC features are reported to be more noise robust than conventional features such as the mel-frequency cepstral coefficients. At the same time, the SMAC features are also noted to be sensitive to variations in pitch. To reduce the pitch sensitivity, a spectral smoothing approach based on adaptive liftering is proposed. Spectral smoothing prior to the computation of spectral moments results in a significant improvement in robustness to pitch without affecting the noise immunity. To further enhance noise robustness, a foreground speech segmentation and enhancement module is also included in the proposed front-end speech parameterization technique.
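The spectral smoothing idea can be illustrated with plain cepstral liftering: keep only the low-order (low-quefrency) cepstral coefficients, which represent the vocal-tract envelope, and discard the higher ones carrying pitch harmonics. The sketch below uses a fixed cutoff `n_keep`; the paper's adaptive-liftering scheme ties the cutoff to the estimated pitch, which is not reproduced here.

```python
import numpy as np

def lifter_smooth_spectrum(frame, n_keep=30, n_fft=512):
    """Smooth a magnitude spectrum by retaining only low-order
    cepstral coefficients (fixed-cutoff liftering; the paper's
    method adapts the cutoff, e.g. to the estimated pitch)."""
    spec = np.abs(np.fft.rfft(frame, n_fft)) + 1e-10
    cep = np.fft.irfft(np.log(spec), n_fft)    # real cepstrum
    lifter = np.zeros(n_fft)
    lifter[:n_keep] = 1.0                      # keep low quefrencies
    lifter[-n_keep + 1:] = 1.0                 # symmetric counterpart
    # back to the spectral domain: pitch harmonics are largely removed
    return np.exp(np.fft.rfft(cep * lifter, n_fft).real)
```

A smaller `n_keep` gives a smoother envelope at the cost of formant detail; spectral moments would then be computed from this smoothed spectrum.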


IEEE Signal Processing Letters | 2017

Pitch-Normalized Acoustic Features for Robust Children's Speech Recognition

Syed Shahnawazuddin; Rohit Sinha; Gayadhar Pradhan

In this letter, the effectiveness of the recently reported SMAC (Spectral Moment time–frequency distribution Augmented by low-order Cepstral) features has been evaluated for robust automatic speech recognition (ASR). The SMAC features consist of normalized first central spectral moments appended with low-order cepstral coefficients. These features have been designed to achieve robustness to both additive noise and pitch variations. We have explored the SMAC features in a severely pitch-mismatched ASR task, i.e., decoding children's speech on an ASR system trained on adults' speech. In such tasks, the SMAC features are still observed to be sensitive to pitch variations. Toward addressing the same, a simple spectral smoothing approach employing adaptive cepstral truncation is explored prior to the computation of spectral moments. With the proposed modification, the SMAC features are noted to achieve enhanced pitch robustness without affecting their noise immunity. Furthermore, the effectiveness of the proposed features is explored in three dominant acoustic modeling paradigms and varying data conditions. In all the cases, the proposed features are observed to significantly outperform the existing ones.


Circuits Systems and Signal Processing | 2017

Improvements in the Detection of Vowel Onset and Offset Points in a Speech Sequence

Avinash Kumar; Syed Shahnawazuddin; Gayadhar Pradhan

Detecting the vowel regions in a given speech signal has been a challenging area of research for a long time. A number of works have been reported over the years to accurately detect the vowel regions and the corresponding vowel onset points (VOPs) and vowel end points (VEPs). The effectiveness of statistical acoustic modeling techniques and front-end signal processing approaches has been explored in this regard. The work presented in this paper aims at improving the detection of vowel regions as well as the VOPs and VEPs. A number of statistical modeling approaches developed over the years have been employed for this task. To this end, three-class classifiers (vowel, nonvowel and silence) are developed on the TIMIT database employing the different acoustic modeling techniques, and the classification performances are studied. Using any particular three-class classifier, a given speech sample is then forced-aligned against the trained acoustic model under the constraints of first-pass transcription to detect the vowel regions. The correctly detected and spurious vowel regions are analyzed in detail to find the impact of semivowel and nasal sound units on the detection of vowel regions as well as on the determination of VOPs and VEPs. In addition, a novel front-end feature extraction technique exploiting the temporal and spectral characteristics of the excitation source information in the speech signal is also proposed. The use of the proposed excitation source feature results in the detection of vowel regions that are quite different from those obtained through the mel-frequency cepstral coefficients. Exploiting the differences between the evidences obtained from the two kinds of features, a technique to combine them is also proposed in order to get a better estimate of the VOPs and VEPs. When the proposed techniques are evaluated on the vowel–nonvowel classification systems developed using the TIMIT database, significant improvements are noted. Moreover, the improvements are noted to hold across all the acoustic modeling paradigms explored in the presented work.


Twenty Second National Conference on Communication (NCC) | 2016

Exploring different acoustic modeling techniques for the detection of vowels in speech signal

Avinash Kumar; Syed Shahnawazuddin; Gayadhar Pradhan

In this paper, we explore acoustic modeling techniques based on the Gaussian mixture model (GMM), the subspace GMM (SGMM) and the deep neural network (DNN) for the detection of vowels in a given speech signal. At the outset, we develop a recognition system on the TIMIT database that recognizes the sequence of phonetic units present in a given speech sample. Two recognizers are developed using speech data sampled at 16 kHz and 8 kHz rates, respectively. The phone error rates (classification errors) for the two recognizers help in studying the effect of sampling rate on the classifier performance. The experimental evaluations presented in this study show that there is a slight deterioration in the recognition performance when speech data is re-sampled to the 8 kHz rate. Next, a three-class classifier (vowel, non-vowel and silence) is also developed on the TIMIT database and the classification performances are studied. Using the three-class classifier, a given speech sample is then forced-aligned against the trained acoustic model under the constraints of true/first-pass transcriptions to detect the vowel regions. The correctly detected and spurious vowel regions are analyzed in detail to find the impact of semivowel and nasal sound units on the detection of vowel regions as well as on the determination of vowel onset and end points. Among the explored acoustic modeling techniques, the SGMM-based system is observed to be superior to all other systems. Furthermore, for all the studied modeling techniques, the spurious cases are mostly due to the detection of semivowels as vowels.


Integration | 2018

An efficient hardware architecture for detection of vowel-like regions in speech signal

Nagapuri Srinivas; Gayadhar Pradhan; Puli Kishore Kumar

Vowel-like regions (VLRs) in a speech signal include vowel, semivowel and diphthong sound units. In the existing VLR detection methods, front-end speech parameterization has been carried out using complex algorithms. Such approaches require more hardware and hence delay the processing. To address this issue, a simple and robust signal processing approach and its hardware architecture are proposed for discriminating VLRs in the speech signal. In the proposed approach, the non-local slope difference (NSD) at each time instant is computed by processing the speech signal through a single-pole filter. The NSD is then averaged over an analysis frame and non-linearly mapped using a negative exponential to reduce the fluctuations present in the input speech signal. The non-linearly mapped averaged NSD (NL-ANSD) is used as the front-end feature for discriminating VLRs. The NL-ANSD exhibits a significantly sharp transition at the starting and ending points of the VLRs. The regions wherein the proposed feature exhibits a significant transition and attains a lower magnitude for a considerable duration are hypothesized as the VLRs. The proposed approach is very simple and requires significantly less hardware when compared with the existing zero-frequency filtering (ZFF) based methods. At the same time, the proposed approach outperforms the existing ZFF-based approaches for the task of detecting VLRs in clean as well as noisy speech signals. The hardware architecture of the proposed approach is verified by implementing it on the Nexys Video Artix-7 (XC7A200T-1SBG484C) field-programmable gate array (FPGA) trainer board for multimedia applications using Xilinx System Generator 2016.2.
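The processing chain in the abstract (single-pole filter, slope difference, frame averaging, negative-exponential mapping) can be sketched in software before committing it to hardware. The sketch below is only an interpretation: the lag used for the "non-local" difference, the pole location, the normalization step and the frame sizes are all illustrative assumptions, not the paper's values.

```python
import numpy as np

def nl_ansd(x, pole=0.99, lag=40, frame_len=200, hop=80):
    """Hypothetical NL-ANSD-style feature: single-pole filtering,
    a lagged ("non-local") slope difference, frame averaging and a
    negative-exponential mapping. All parameters are illustrative."""
    y = np.zeros(len(x))
    for n in range(1, len(x)):                 # single-pole IIR filter
        y[n] = x[n] + pole * y[n - 1]
    slope = np.diff(y, prepend=y[0])           # local slope of the filtered signal
    nsd = np.abs(slope[lag:] - slope[:-lag])   # slope difference at a distant instant
    nsd = nsd / (nsd.max() + 1e-10)            # normalize to [0, 1]
    feats = []
    for start in range(0, len(nsd) - frame_len + 1, hop):
        avg = nsd[start:start + frame_len].mean()
        feats.append(np.exp(-avg))             # negative-exponential mapping
    return np.array(feats)
```

Regions where the feature stays high for a sustained stretch would then be hypothesized as VLRs; the sample-by-sample filter loop is what maps naturally onto a small FPGA datapath.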


Circuits Systems and Signal Processing | 2018

An Efficient ECG Denoising Technique Based on Non-local Means Estimation and Modified Empirical Mode Decomposition

Pratik Singh; Syed Shahnawazuddin; Gayadhar Pradhan

The noninvasive nature of the electrocardiogram (ECG) makes it widely accepted for cardiac diagnosis. During data acquisition, the ECG signal is generally corrupted by a number of noises. Further, during ambulatory monitoring and wireless recording, the ECG signal gets corrupted by additive white Gaussian noise. Denoising the ECG signal without affecting its morphological structure is essential for proper diagnosis. This paper presents an ECG denoising method based on an effective combination of non-local means (NLM) estimation and empirical mode decomposition (EMD). Earlier works have shown that the patch-based NLM approach is insufficient for denoising the under-averaged region near the high-amplitude QRS complex. To address this issue, the denoised signal obtained by NLM is decomposed into intrinsic mode functions (IMFs) using EMD in this work. Next, thresholding of the IMFs is done using the instantaneous half-period criterion and soft-thresholding to obtain the final denoised output. Furthermore, modified empirical mode decomposition (M-EMD) is used in place of standard EMD to reduce the computational cost. Performance of the proposed method is tested on a number of ECG signals from the MIT-BIH database. The experimental results presented in this paper show that the aforementioned shortcoming of the NLM method is addressed to a large extent. Moreover, the proposed approach provides improved performance when compared to different state-of-the-art ECG denoising methods.
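The NLM stage replaces each sample by a weighted average of samples whose surrounding patches look similar, so repeated waveform shapes (successive beats) reinforce each other while noise averages out. A minimal 1-D sketch, with illustrative patch/search/bandwidth parameters rather than the paper's settings:

```python
import numpy as np

def nlm_denoise_1d(x, half_patch=5, half_search=50, h=0.1):
    """Minimal 1-D non-local means: each sample becomes a weighted
    average of samples with similar surrounding patches. Parameter
    values are illustrative, not the paper's."""
    n = len(x)
    out = np.zeros(n)
    p = half_patch
    pad = np.pad(x, p, mode='edge')            # pad so edge patches exist
    for i in range(n):
        patch_i = pad[i:i + 2 * p + 1]
        lo, hi = max(0, i - half_search), min(n, i + half_search + 1)
        w_sum, acc = 0.0, 0.0
        for j in range(lo, hi):                # restricted search window
            patch_j = pad[j:j + 2 * p + 1]
            d2 = np.mean((patch_i - patch_j) ** 2)
            w = np.exp(-d2 / (h * h))          # patch-similarity weight
            w_sum += w
            acc += w * x[j]
        out[i] = acc / w_sum
    return out
```

Near a QRS complex few patches are similar (the "under-averaged" case the paper highlights), which is why the EMD-based post-processing stage is added on top of this estimate.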


Digital Signal Processing | 2018

Studying the role of pitch-adaptive spectral estimation and speaking-rate normalization in automatic speech recognition

Syed Shahnawazuddin; Nagaraj Adiga; Hemant Kumar Kathania; Gayadhar Pradhan; Rohit Sinha

In the context of automatic speech recognition (ASR) systems, the front-end acoustic features should not be affected by signal periodicity (pitch period). Motivated by this fact, we have studied the role of the pitch-synchronous spectrum estimation approach referred to as TANDEM-STRAIGHT in this paper. TANDEM-STRAIGHT results in a smoother spectrum devoid of pitch harmonics to a large extent. Consequently, the acoustic features derived using the smoothed spectra outperform the conventional mel-frequency cepstral coefficients (MFCC). The experimental evaluations reported in this paper are performed on speech data from a wide range of speakers belonging to different age groups, including children. The proposed features are found to be effective for all groups of speakers. To further improve the recognition of children's speech, the effect of vocal-tract length normalization (VTLN) is studied. The inclusion of VTLN further improves the recognition performance. We have also performed a detailed study on the effect of speaking-rate normalization (SRN) in the context of children's speech recognition. An SRN technique based on the anchoring of glottal closure instants estimated using zero-frequency filtering is explored in this regard. SRN is observed to be highly effective for child speakers belonging to different age groups. Finally, all the studied techniques are combined for effective mismatch reduction. In the case of the children's speech test set, the use of the proposed features results in a relative improvement of 21.6% over the MFCC features even after combining VTLN and SRN.


Circuits Systems and Signal Processing | 2018

An Experimental Study on the Significance of Variable Frame-Length and Overlap in the Context of Children’s Speech Recognition

Syed Shahnawazuddin; Chaman Singh; Hemant Kumar Kathania; Waquar Ahmad; Gayadhar Pradhan

It is well known that the recognition performance of an automatic speech recognition (ASR) system is affected by intra-speaker as well as inter-speaker variability. The differences in the geometry of vocal organs, pitch and speaking rate among the speakers are some such inter-speaker variabilities affecting the recognition performance. A mismatch between the training and test data with respect to any of those aforementioned factors leads to increased error rates. An example of acoustically mismatched ASR is the task of transcribing children's speech on an adult-data-trained system. A large number of earlier studies present a myriad of techniques for addressing acoustic mismatch arising from differences in pitch and dimensions of vocal organs. At the same time, only a few works on speaking-rate adaptation employing timescale modification have been reported. Furthermore, those studies were performed on ASR systems developed using Gaussian mixture models. Motivated by these facts, speaking-rate adaptation is explored in this work in the context of a children's ASR system employing deep neural network-based acoustic modeling. Speaking-rate adaptation is performed by changing the frame length and overlap during the front-end feature extraction process. Significant reductions in errors are noted with speaking-rate adaptation. In addition, we have also studied the effect of combining speaking-rate adaptation with vocal-tract length normalization and explicit pitch modification. In both cases, additive improvements are obtained. To summarize, relative improvements of 15–20% over the baselines are obtained by varying the frame length and frame overlap.
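The knob being turned here is the framing step of feature extraction. A minimal sketch of overlapping framing, where changing `frame_ms` and `hop_ms` away from the common 25/10 ms defaults emulates the adaptation (the values below are those common defaults, not the paper's tuned settings):

```python
import numpy as np

def frame_signal(x, fs, frame_ms=25.0, hop_ms=10.0):
    """Split a signal into overlapping, Hamming-windowed analysis
    frames. Varying frame_ms / hop_ms changes the effective
    time resolution of the front end."""
    flen = int(round(frame_ms * fs / 1000.0))
    hop = int(round(hop_ms * fs / 1000.0))
    n_frames = 1 + max(0, (len(x) - flen) // hop)
    # index matrix: one row of sample indices per frame
    idx = np.arange(flen)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(flen)
```

For fast child speech, a longer frame and overlap effectively slows the analysis rate relative to the adult-trained models; each frame would then be passed to the usual MFCC pipeline.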


Australasian Physical & Engineering Sciences in Medicine | 2018

Variational mode decomposition based ECG denoising using non-local means and wavelet domain filtering

Pratik Singh; Gayadhar Pradhan

This paper presents a novel electrocardiogram (ECG) denoising approach based on variational mode decomposition (VMD). This work also incorporates the efficacy of non-local means (NLM) estimation and discrete wavelet transform (DWT) filtering. Current ECG denoising methods fail to remove noise from the entire frequency range of the ECG signal. To achieve effective ECG denoising, the noisy ECG signal is decomposed into narrow-band variational mode functions (VMFs) using the VMD method. The idea is to filter out noise from these narrow-band VMFs. To achieve that, the center frequency associated with each VMF is used to exclusively divide them into lower- and higher-frequency signal groups. The higher-frequency VMFs are filtered using a DWT-thresholding technique, while the lower-frequency VMFs are denoised through NLM estimation. The non-recursive nature of VMD enables the parallel processing of NLM estimation and DWT filtering. Traditional DWT-based approaches need large decomposition levels to filter low-frequency noises, and at the same time the NLM technique suffers from the rare-patch effect in the high-frequency region. On the contrary, in the proposed framework both the NLM and DWT approaches complement each other to overcome their individual ill effects. The signal reconstruction is performed using the denoised high-frequency and low-frequency VMFs. The simulation performed on the MIT-BIH Arrhythmia database shows that the proposed method outperforms the existing state-of-the-art ECG denoising techniques.
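The DWT branch boils down to shrinking detail coefficients toward zero. The sketch below shows soft-thresholding on a single-level Haar transform; the paper uses a full multi-level DWT and a principled threshold, so this only illustrates the shrinkage step itself.

```python
import numpy as np

def soft_threshold(c, t):
    """Shrink coefficients toward zero by t (soft-thresholding)."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def haar_denoise(x, thresh):
    """One-level Haar DWT denoising sketch (even-length input).
    The paper's method uses a multi-level DWT; this shows only
    the threshold-and-reconstruct step."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)       # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)       # detail coefficients
    d = soft_threshold(d, thresh)              # shrink noisy details
    y = np.empty_like(x, dtype=float)          # inverse Haar transform
    y[0::2] = (a + d) / np.sqrt(2)
    y[1::2] = (a - d) / np.sqrt(2)
    return y
```

With `thresh=0` the transform reconstructs the input exactly; a positive threshold removes small high-frequency detail, which is why this branch is applied only to the higher-frequency VMFs.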


IEEE Region 10 Conference (TENCON) | 2016

Noise robustness of different front-end features for detection of vowels in speech signals

Avinash Kumar; Syed Shahnawazuddin; Gayadhar Pradhan

In this paper, we study the effectiveness of different front-end acoustic features for the task of detecting the vowel regions in given speech data under clean and degraded conditions. Further, we have also analyzed their effectiveness for the detection of the corresponding vowel onset points (VOPs) and vowel end points (VEPs). To achieve the same, a four-class classifier is developed using some of the existing front-end features employing the dominant acoustic modeling approaches. In order to detect the vowel regions, the given test data is forced-aligned against the trained acoustic models to generate frame-level alignments. The generated frame-level alignments are then compared with the true hand-labeled transcription to determine the accuracy of the detected vowel regions. We also analyzed how semivowel and nasal sound units affect the detection of vowel regions as well as the determination of vowel onset points and vowel end points. The correctly detected and spurious vowel regions are analyzed in detail. A similar study is repeated for the additive noise condition as well.

Collaboration


Dive into Gayadhar Pradhan's collaboration.

Top Co-Authors:

Hemant Kumar Kathania (National Institute of Technology Sikkim)
Syed Shahnawazuddin (National Institute of Technology)
Rohit Sinha (Indian Institute of Technology Guwahati)
A. B. Samaddar (National Institute of Technology Sikkim)
Waquar Ahmad (National Institute of Technology Sikkim)