Publication


Featured research published by Sridhar Krishna Nemala.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2010

Sparse coding for speech recognition

Garimella S. V. S. Sivaram; Sridhar Krishna Nemala; Mounya Elhilali; Trac D. Tran; Hynek Hermansky

This paper proposes a novel feature extraction technique for speech recognition based on the principles of sparse coding. The idea is to express a spectro-temporal pattern of speech as a linear combination of an overcomplete set of basis functions such that the weights of the linear combination are sparse. These weights (features) are subsequently used for acoustic modeling. We learn a set of overcomplete basis functions (dictionary) from the training set by adopting a previously proposed algorithm that iteratively minimizes the reconstruction error and maximizes the sparsity of the weights. Features are then derived using the learned basis functions by applying the well-established principles of compressive sensing. Phoneme recognition experiments show that the proposed features outperform conventional features in both clean and noisy conditions.
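
As a rough illustration of the encoding step described above (the paper learns its dictionary with an iterative reconstruction/sparsity algorithm and derives features via compressive-sensing principles), the sketch below solves the sparse-weight problem with ISTA, a standard solver not taken from the paper; all names and dimensions are illustrative.

```python
# Minimal sketch: encode a spectro-temporal patch as sparse weights over a
# pre-learned overcomplete dictionary. Dictionary learning itself is omitted.
import numpy as np

def sparse_code_ista(D, x, lam=0.1, n_iter=200):
    """Encode pattern x as sparse weights w with x ~= D @ w (ISTA)."""
    # D: (n_dims, n_atoms) overcomplete dictionary, columns unit-norm
    # x: (n_dims,) vectorized spectro-temporal patch
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ w - x)           # gradient of 0.5*||x - D w||^2
        z = w - grad / L
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return w                               # sparse features for acoustic modeling

# usage: a stand-in random dictionary and patch
rng = np.random.default_rng(0)
D = rng.standard_normal((256, 512))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
x = rng.standard_normal(256)
w = sparse_code_ista(D, x)
print(f"{np.mean(np.abs(w) > 1e-8):.2%} of weights are nonzero")
```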


IEEE Transactions on Audio, Speech, and Language Processing | 2013

A Multistream Feature Framework Based on Bandpass Modulation Filtering for Robust Speech Recognition

Sridhar Krishna Nemala; Kailash Patil; Mounya Elhilali

There is strong neurophysiological evidence suggesting that the processing of speech signals in the brain happens along parallel paths which encode complementary information in the signal. These parallel streams are organized around a duality of slow vs. fast: coarse signal dynamics appear to be processed separately from rapidly changing modulations, in both the spectral and temporal dimensions. We adapt this duality in a multistream framework for robust speaker-independent phoneme recognition. The scheme presented here centers around a multi-path bandpass modulation analysis of speech sounds, with each stream covering an entire range of temporal and spectral modulations. By performing bandpass operations along the spectral and temporal dimensions, the proposed scheme avoids the classic feature-explosion problem of previous multistream approaches while maintaining the advantages of parallelism and localized feature analysis. The proposed architecture results in substantial improvements over standard and state-of-the-art feature schemes for phoneme recognition, particularly in the presence of nonstationary noise, reverberation, and channel distortions.
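
A minimal sketch of what one bandpass modulation stream could look like, assuming a log-frequency (e.g., mel or constant-Q) spectrogram and ideal (brick-wall) passbands in the 2-D modulation domain; the paper's actual filterbank design and stream layout are not reproduced here.

```python
# One bandpass modulation stream: keep only temporal modulations (rates, Hz)
# and spectral modulations (scales, cycles/octave) inside the given bands.
import numpy as np

def modulation_bandpass(S, frame_rate, chans_per_oct,
                        rate_band=(2.0, 8.0), scale_band=(0.5, 2.0)):
    """S: (n_channels, n_frames) log-frequency spectrogram."""
    F = np.fft.fft2(S)
    rates = np.fft.fftfreq(S.shape[1], d=1.0 / frame_rate)      # Hz
    scales = np.fft.fftfreq(S.shape[0], d=1.0 / chans_per_oct)  # cyc/oct
    r_mask = (np.abs(rates) >= rate_band[0]) & (np.abs(rates) <= rate_band[1])
    s_mask = (np.abs(scales) >= scale_band[0]) & (np.abs(scales) <= scale_band[1])
    mask = np.outer(s_mask, r_mask).astype(float)
    return np.real(np.fft.ifft2(F * mask))   # this stream's filtered spectrogram

# Each stream would use a different (rate_band, scale_band) pair; features are
# then extracted per stream, avoiding one huge joint feature vector.
```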


IEEE Signal Processing Letters | 2010

Data-Driven and Feedback Based Spectro-Temporal Features for Speech Recognition

Garimella S. V. S. Sivaram; Sridhar Krishna Nemala; Nima Mesgarani; Hynek Hermansky

This paper proposes novel data-driven and feedback-based discriminative spectro-temporal filters for feature extraction in automatic speech recognition (ASR). Initially, a first set of spectro-temporal filters is designed to separate each phoneme from the rest of the phonemes. A hybrid Hidden Markov Model/Multilayer Perceptron (HMM/MLP) phoneme recognition system is trained on the features derived using these filters. As feedback to the feature extraction stage, the top confusions of this system are identified, and a second set of filters is designed specifically to address these confusions. Phoneme recognition experiments on TIMIT show that the features derived from the combined set of discriminative filters outperform conventional speech recognition features and also contain significant complementary information.
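
A minimal sketch of one way to realize a phoneme-discriminative spectro-temporal filter, here as a two-class Fisher discriminant (target phoneme vs. all others); the paper's actual filter-design procedure may differ.

```python
# Design one discriminative filter from labeled spectro-temporal patches.
import numpy as np

def fisher_filter(X_target, X_rest, reg=1e-3):
    """X_*: (n_samples, n_dims) vectorized spectro-temporal patches.
    Returns a filter w maximizing between-class vs. within-class separation."""
    mu_t, mu_r = X_target.mean(axis=0), X_rest.mean(axis=0)
    Sw = np.cov(X_target, rowvar=False) + np.cov(X_rest, rowvar=False)
    Sw += reg * np.eye(Sw.shape[0])          # regularize for numerical stability
    w = np.linalg.solve(Sw, mu_t - mu_r)     # Fisher discriminant direction
    return w / np.linalg.norm(w)

# Feedback step (per the abstract): after decoding, collect patches from the
# most-confused phoneme pairs and design a second set of filters the same way.
```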


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2012

The UMD-JHU 2011 speaker recognition system

Daniel Garcia-Romero; Xinhui Zhou; Dmitry N. Zotkin; Balaji Vasan Srinivasan; Yuancheng Luo; Sriram Ganapathy; Samuel Thomas; Sridhar Krishna Nemala; Garimella S. V. S. Sivaram; Majid Mirbagheri; Sri Harish Reddy Mallidi; Thomas Janu; Padmanabhan Rajan; Nima Mesgarani; Mounya Elhilali; Hynek Hermansky; Shihab A. Shamma; Ramani Duraiswami

In recent years, there have been significant advances in the field of speaker recognition that have resulted in very robust recognition systems. The primary focus of many recent developments has shifted to the problem of recognizing speakers in adverse conditions, e.g., in the presence of noise or reverberation. In this paper, we present the UMD-JHU speaker recognition system applied to the NIST 2010 SRE task. The novel aspects of our system are: 1) improved performance on trials involving different vocal effort via the use of linear-scale features; 2) expected improvements in recognition performance in the presence of reverberation and noise via the use of frequency-domain perceptual linear predictor and cortical features; 3) a new discriminative kernel partial least squares (KPLS) framework that complements state-of-the-art back-end systems (JFA and PLDA) to aid better overall recognition; and 4) acceleration of the JFA, PLDA, and KPLS back-ends via distributed computing. The individual components of the system and the fused system are compared against a baseline JFA system and against results reported by SRI and MIT-LL on SRE 2010.
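
Of the listed components, the KPLS back-end is the most self-contained to sketch. Below is a minimal kernel-NIPALS score extraction in the style of Rosipal and Trejo's kernel PLS; the paper's discriminative KPLS back-end is more elaborate, and the kernel, labels, and component count here are illustrative.

```python
# Kernel PLS latent-score extraction (kernel NIPALS with deflation).
# K: (n, n) centered kernel matrix over training utterance representations.
# Y: (n, k) one-hot speaker labels.
import numpy as np

def kpls_scores(K, Y, n_components=5, n_iter=100, tol=1e-8):
    K, Y = K.copy(), Y.copy()
    T = []                                    # latent score vectors
    for _ in range(n_components):
        u = Y[:, :1].copy()                   # initialize from first label column
        for _ in range(n_iter):
            t = K @ u
            t /= np.linalg.norm(t)
            c = Y.T @ t
            u_new = Y @ c
            u_new /= np.linalg.norm(u_new)
            if np.linalg.norm(u_new - u) < tol:
                u = u_new
                break
            u = u_new
        T.append(t.ravel())
        P = np.eye(K.shape[0]) - np.outer(t, t)   # deflation projector
        K = P @ K @ P                              # remove extracted direction
        Y = Y - np.outer(t, t) @ Y
    return np.column_stack(T)    # low-dim scores fed to a simple classifier
```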


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2010

A joint acoustic and phonological approach to speech intelligibility assessment

Sridhar Krishna Nemala; Mounya Elhilali

While current models of speech intelligibility rely on intricate acoustic analyses of speech attributes, they are limited by the lack of any linguistic information; hence they fail to capture the natural variability of speech sounds, which confines their applicability to average intelligibility assessments. Another important limitation is that existing models rely on the use of reference clean-speech templates (or average profiles). In this work, we propose a novel approach to speech intelligibility that combines a biologically-inspired acoustic analysis of peripheral and cortical processing with phonological statistical models of speech using a hybrid GMM-SVM system. The model results in a novel scheme for speech intelligibility assessment that requires no reference clean-speech templates, and its predictions strongly correlate with scores obtained from human listeners in a variety of realistic listening environments. We further show that the proposed model enables local-level tracking of intelligibility and also generalizes well to multiple speech corpora.
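
A hypothetical sketch of the hybrid GMM-SVM idea: per-class phonological GMMs summarize an utterance, and an SVM maps the summary to an intelligibility score with no clean reference template. The front-end features, class inventory, and shapes below are stand-ins, not the paper's.

```python
# Hybrid GMM-SVM intelligibility scorer (illustrative shapes throughout).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVR

def utterance_stats(gmms, feats):
    """Average per-class log-likelihoods of one utterance's frames."""
    return np.array([g.score(feats) for g in gmms])  # score() = mean log-lik

rng = np.random.default_rng(0)
gmms = []
for _ in range(5):                         # e.g., 5 broad phonological classes
    g = GaussianMixture(n_components=4, random_state=0)
    g.fit(rng.standard_normal((500, 20)))  # stand-in for class training frames
    gmms.append(g)

X = np.stack([utterance_stats(gmms, rng.standard_normal((100, 20)))
              for _ in range(50)])         # 50 training utterances
y = rng.uniform(0, 100, size=50)           # stand-in listener scores
svm = SVR().fit(X, y)                      # predicts intelligibility directly,
print(svm.predict(X[:3]))                  # no clean reference template needed
```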


EURASIP Journal on Audio, Speech, and Music Processing | 2012

Biomimetic Multi-Resolution Analysis for Robust Speaker Recognition

Sridhar Krishna Nemala; Dmitry N. Zotkin; Ramani Duraiswami; Mounya Elhilali

Humans exhibit a remarkable ability to reliably classify sound sources in the environment, even in the presence of high levels of noise. In contrast, most engineering systems suffer a drastic drop in performance when speech signals are corrupted with channel or background distortions. Our brains are equipped with elaborate machinery for speech analysis and feature extraction, which holds great lessons for improving the performance of automatic speech processing systems under adverse conditions. The work presented here explores a biologically-motivated multi-resolution speaker information representation obtained by performing an intricate yet computationally-efficient analysis of the information-rich spectro-temporal attributes of the speech signal. We evaluate the proposed features in a speaker verification task performed on NIST SRE 2010 data. The biomimetic approach yields significant robustness in the presence of non-stationary noise and reverberation, offering a new framework for deriving reliable features for speaker recognition and speech processing.
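
A minimal sketch of one plausible multi-resolution representation: the same spectrogram smoothed at several temporal-modulation resolutions and stacked per frame. The cutoffs and the Gaussian smoother are assumptions for illustration, not the paper's model.

```python
# Coarse-to-fine views of one spectrogram, stacked into a per-frame feature.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def multires_features(S, frame_rate, cutoffs_hz=(2.0, 4.0, 8.0, 16.0)):
    """S: (n_channels, n_frames). Returns (n_channels * len(cutoffs), n_frames)."""
    streams = []
    for fc in cutoffs_hz:
        # Gaussian smoothing along time acts as a lowpass at roughly fc Hz
        sigma = frame_rate / (2.0 * np.pi * fc)
        streams.append(gaussian_filter1d(S, sigma=sigma, axis=1))
    return np.vstack(streams)   # multi-resolution feature for the verifier
```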


Journal of the Acoustical Society of America | 2010

Relevant spectro-temporal modulations for robust speech and nonspeech classification

Sridhar Krishna Nemala; Mounya Elhilali

Robust speech/non-speech classification is an important step in a variety of speech processing applications. For example, in speech and speaker recognition systems designed to work in real-world environments, a robust discrimination of speech from other sounds is an essential pre-processing step. Auditory-based features at multiple scales of time and spectral resolution have been shown to be very useful for the speech/non-speech classification task [Mesgarani et al., IEEE Trans. Speech Audio Process. 10, 504–516 (2002)]. The features are computed using a biologically inspired auditory model that maps a given sound to a high-dimensional representation of its spectro-temporal modulations (mimicking the various stages along the auditory pathway, from the periphery all the way to the primary auditory cortex). In this work, we analyze the contribution of different temporal and spectral modulations to robust speech/non-speech classification. The results suggest that temporal modulations in the 12–22 Hz range and spectral modulations in the 1.5–4 cycles/octave range are particularly useful for achieving robustness in highly noisy and reverberant environments.
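
A minimal sketch of how one might quantify energy in the modulation ranges the study identifies (12–22 Hz temporal, 1.5–4 cycles/octave spectral) from a 2-D modulation spectrum; axis units follow from the frame rate and channels per octave, as in the earlier stream sketch. The analysis pipeline in the paper is richer than this.

```python
# Fraction of modulation power inside a given rate/scale band.
import numpy as np

def band_energy_fraction(S, frame_rate, chans_per_oct,
                         rate_band=(12.0, 22.0), scale_band=(1.5, 4.0)):
    """S: (n_channels, n_frames) log-frequency spectrogram."""
    M = np.abs(np.fft.fft2(S)) ** 2                  # 2-D modulation power
    rates = np.fft.fftfreq(S.shape[1], 1.0 / frame_rate)      # Hz
    scales = np.fft.fftfreq(S.shape[0], 1.0 / chans_per_oct)  # cyc/oct
    r = (np.abs(rates) >= rate_band[0]) & (np.abs(rates) <= rate_band[1])
    s = (np.abs(scales) >= scale_band[0]) & (np.abs(scales) <= scale_band[1])
    return M[np.ix_(s, r)].sum() / M.sum()           # fraction in the band
```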


International Journal of Speech Technology | 2013

Recognizing the message and the messenger: biomimetic spectral analysis for robust speech and speaker recognition

Sridhar Krishna Nemala; Kailash Patil; Mounya Elhilali

Humans are quite adept at communicating in the presence of noise. However, most speech processing systems, like automatic speech and speaker recognition systems, suffer a significant drop in performance when speech signals are corrupted with unseen background distortions. The proposed work explores the use of a biologically-motivated multi-resolution spectral analysis for speech representation. This approach focuses on the information-rich spectral attributes of speech and presents an intricate yet computationally-efficient analysis of the speech signal through a careful choice of model parameters. Further, the approach takes advantage of an information-theoretic analysis of the message-dominant and speaker-dominant regions in the speech signal, and defines feature representations to address two diverse tasks: speech recognition and speaker recognition. The proposed analysis surpasses standard Mel-Frequency Cepstral Coefficients (MFCCs) and their enhanced variants (via mean subtraction, variance normalization, and time-sequence filtering), and yields significant improvements over a state-of-the-art noise-robust feature scheme on both speech and speaker recognition tasks.
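
For concreteness, here is a minimal sketch of the kind of enhanced-MFCC baselines mentioned in the comparison: cepstral mean/variance normalization plus a simple bandpass filtering of coefficient trajectories. The MFCC front end is assumed to exist, and the filter design is an illustrative stand-in, not the paper's.

```python
# Enhanced-MFCC baseline: CMVN plus temporal (RASTA-like) trajectory filtering.
import numpy as np
from scipy.signal import butter, filtfilt

def cmvn(C):
    """C: (n_coeffs, n_frames) MFCC matrix -> per-coefficient mean/variance norm."""
    return (C - C.mean(axis=1, keepdims=True)) / (C.std(axis=1, keepdims=True) + 1e-8)

def temporal_filter(C, frame_rate, band=(1.0, 12.0)):
    """Bandpass each coefficient trajectory to suppress slow channel drift and
    very fast, noise-dominated modulations. frame_rate is frames per second
    (e.g., 100), so the band must sit below frame_rate / 2."""
    b, a = butter(2, [band[0], band[1]], btype="band", fs=frame_rate)
    return filtfilt(b, a, C, axis=1)
```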


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2012

Multilevel speech intelligibility for robust speaker recognition

Sridhar Krishna Nemala; Mounya Elhilali

In the real world, natural conversational speech is an amalgam of speech segments, silences, and environmental/background and channel effects. Labeling the different regions of an acoustic signal according to their information levels would greatly benefit all automatic speech processing tasks. In the current work, we propose a novel segmentation approach based on a perception-based measure of speech intelligibility. Unlike segmentation approaches based on various forms of voice-activity detection (VAD), the proposed parsing approach exploits higher-level perceptual information about signal intelligibility levels. This labeling information is integrated into a novel multilevel framework for automatic speaker recognition. The system processes the input acoustic signal along independent streams reflecting various levels of intelligibility and then fuses the decision scores from the multiple streams according to their intelligibility contribution. Our results show that the proposed system achieves significant improvements over standard baseline and VAD-based approaches, and attains performance similar to that obtained with oracle speech segmentation information.
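
A minimal sketch of the decision-level fusion this description implies: each stream covers one intelligibility level, and its score is weighted by how much intelligible speech it contributed. The linear weighting rule is an assumption, not the paper's exact recipe.

```python
# Fuse per-stream speaker-recognition scores by intelligibility contribution.
import numpy as np

def fuse_scores(stream_scores, stream_durations, stream_intelligibility):
    """stream_scores: per-stream recognizer scores for one trial.
    Weight each stream by (duration x mean intelligibility), normalized."""
    w = np.asarray(stream_durations) * np.asarray(stream_intelligibility)
    w = w / w.sum()
    return float(np.dot(w, stream_scores))

# e.g., three streams: high / medium / low intelligibility segments
print(fuse_scores([1.8, 0.9, -0.4], [12.0, 5.0, 3.0], [0.9, 0.5, 0.2]))
```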


Conference on Information Sciences and Systems (CISS) | 2011

Multistream robust speaker recognition based on speech intelligibility

Sridhar Krishna Nemala; Mounya Elhilali

Delimiting the most informative voice segments of an acoustic signal is often a crucial initial step for any speech processing system. In the current work, we propose a novel segmentation approach based on a perception-based measure of speech intelligibility. Unlike segmentation approaches based on various forms of voice-activity detection (VAD), the proposed approach exploits higher-level perceptual information about the signal's intelligibility levels. This intelligibility-based classification is integrated into a novel multistream framework for automatic speaker recognition. The multistream system processes the input acoustic signal along multiple independent streams reflecting various levels of intelligibility and then fuses the decision scores from the multiple streams according to their intelligibility contribution. Our results show that the proposed multistream system achieves significant improvements in both clean and noisy conditions when compared with a baseline and a state-of-the-art voice-activity detection algorithm.
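
A minimal sketch of the parsing step shared with the previous paper: frames are routed to streams by a per-frame intelligibility estimate rather than a binary VAD decision. The estimator is assumed to exist and the thresholds are illustrative; fusion can then proceed as in the sketch above.

```python
# Route frames to intelligibility streams instead of a binary VAD decision.
import numpy as np

def parse_by_intelligibility(intel, thresholds=(0.66, 0.33)):
    """intel: (n_frames,) per-frame intelligibility estimate in [0, 1].
    Returns stream labels: 0 = high, 1 = medium, 2 = low."""
    labels = np.full(intel.shape, 2)
    labels[intel >= thresholds[1]] = 1
    labels[intel >= thresholds[0]] = 0
    return labels

# Each stream feeds its own recognizer; a VAD would only yield speech/non-speech.
print(parse_by_intelligibility(np.array([0.9, 0.5, 0.1, 0.7])))
```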

Collaboration


Dive into Sridhar Krishna Nemala's collaborations.

Top Co-Authors

Kailash Patil, Johns Hopkins University
A. G. Ramakrishnan, Indian Institute of Science