Mahesh Kumar Nandwana
University of Texas at Dallas
Publications
Featured research published by Mahesh Kumar Nandwana.
International Conference on Acoustics, Speech, and Signal Processing | 2016
Chunlei Zhang; Shivesh Ranjan; Mahesh Kumar Nandwana; Qian Zhang; Abhinav Misra; Gang Liu; Finnian Kelly; John H. L. Hansen
Protecting automatic speaker verification (ASV) systems against spoofing attacks remains an essential challenge, even though significant progress in ASV has been achieved in recent years. In this study, an automatic spoofing detection approach using an i-vector framework is proposed. Two approaches are used for frame-level feature extraction: the cepstral-based Perceptual Minimum Variance Distortionless Response (PMVDR), and the non-linear, speech-production-motivated Teager Energy Operator (TEO) Critical Band (CB) Autocorrelation Envelope (Auto-Env). An utterance-level i-vector for each recording is formed by concatenating the PMVDR and TEO-CB-Auto-Env i-vectors, followed by linear discriminant analysis (LDA) to maximize the ratio of between-class to within-class scatter. A Gaussian classifier and a DNN are also investigated for back-end scoring. Experiments on the ASVspoof 2015 corpus show that the proposed method successfully detects spoofing attacks. By combining the TEO-CB-Auto-Env and PMVDR features, a relative 76.7% improvement in terms of EER is obtained over the best single-feature system.
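A minimal sketch of the fusion-plus-LDA step described in this abstract, assuming per-utterance PMVDR and TEO-CB-Auto-Env i-vectors have already been extracted (the i-vector extractor itself is not shown). Array names, dimensions, and the random data are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_utts = 200
ivec_pmvdr = rng.standard_normal((n_utts, 400))  # hypothetical PMVDR i-vectors
ivec_teo = rng.standard_normal((n_utts, 400))    # hypothetical TEO-CB-Auto-Env i-vectors
labels = rng.integers(0, 2, n_utts)              # 0 = genuine, 1 = spoofed

# Concatenate the two utterance-level i-vectors into one fused vector ...
fused = np.hstack([ivec_pmvdr, ivec_teo])

# ... then project with LDA, which maximizes the ratio of between-class
# to within-class scatter, and score trials with the decision function.
lda = LinearDiscriminantAnalysis()
scores = lda.fit(fused, labels).decision_function(fused)
```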
International Conference on Acoustics, Speech, and Signal Processing | 2015
Mahesh Kumar Nandwana; Ali Ziaei; John H. L. Hansen
This study focuses on an unsupervised approach for detecting human scream vocalizations in continuous recordings from noisy acoustic environments. The proposed detection solution is based on compound segmentation, which employs weighted mean distance, the T²-statistic, and the Bayesian Information Criterion to detect screams. The solution also employs an unsupervised, threshold-optimized Combo-SAD to remove non-vocal noisy segments in a preliminary stage. A total of five noisy environments were simulated, with noise levels ranging from -20 dB to +20 dB SNR. Performance of the proposed system was compared using two alternative acoustic front-end features: (i) Mel-frequency cepstral coefficients (MFCC) and (ii) perceptual minimum variance distortionless response (PMVDR). Evaluation results show that the new scream detection solution works well at clean, +20 dB, and +10 dB SNR levels, with performance declining as SNR decreases to -20 dB across a number of the noise sources considered.
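A hedged sketch of the BIC change-point test that underlies segmentation schemes like the one described above: a boundary between two adjacent feature windows is hypothesized when ΔBIC > 0. This shows only the BIC component, not the full compound segmentation of the paper; window sizes, the penalty weight, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def delta_bic(x1, x2, lam=1.0):
    """Delta-BIC between two feature windows (frames x dims)."""
    def logdet(m):
        # Log-determinant of the sample covariance over frames.
        return np.linalg.slogdet(np.cov(m, rowvar=False))[1]
    x = np.vstack([x1, x2])
    n, d = x.shape
    n1, n2 = len(x1), len(x2)
    # Model-complexity penalty for splitting one Gaussian into two.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return 0.5 * (n * logdet(x) - n1 * logdet(x1) - n2 * logdet(x2)) - penalty

rng = np.random.default_rng(0)
quiet = rng.standard_normal((100, 13))                 # background-like frames
scream = 3.0 + 2.0 * rng.standard_normal((100, 13))    # shifted statistics
print(delta_bic(quiet, scream) > 0)  # True -> change point detected
```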
Conference of the International Speech Communication Association | 2018
Mahesh Kumar Nandwana; Mitchell McLaren; Diego Castán; Julien van Hout; Aaron Lawson
Deep neural network (DNN)-based speaker embeddings have resulted in new state-of-the-art text-independent speaker recognition technology. However, very limited effort has been made to understand DNN speaker embeddings. In this study, our aim is to analyze the behavior of speaker recognition systems based on speaker embeddings with different front-end features, including standard Mel-frequency cepstral coefficients (MFCC), power-normalized cepstral coefficients (PNCC), and perceptual linear prediction (PLP). Using a speaker recognition system based on DNN speaker embeddings and probabilistic linear discriminant analysis (PLDA), we compared different approaches to leveraging complementary information using score-, embedding-, and feature-level combination. We report results on the Speakers in the Wild (SITW) and NIST SRE 2016 datasets. We found that the first and second embedding layers are complementary in nature. By applying score- and embedding-level fusion, we demonstrate relative improvements in equal error rate of 17% on NIST SRE 2016 and 10% on SITW over the baseline system.
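A minimal sketch of the score-level fusion idea mentioned above, here realized as logistic regression over per-trial scores from two systems. The synthetic scores, system names, and fusion weights are illustrative assumptions; the paper's exact fusion recipe is not specified in this abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials = 1000
labels = rng.integers(0, 2, n_trials)  # 1 = target trial, 0 = non-target
# Hypothetical PLDA scores from two systems (e.g. MFCC- and PNCC-based).
s1 = labels * 2.0 + rng.standard_normal(n_trials)
s2 = labels * 1.5 + rng.standard_normal(n_trials)

# Learn fusion weights on held-out trials, then combine the two scores.
stacked = np.column_stack([s1, s2])
fuser = LogisticRegression().fit(stacked, labels)
fused_scores = fuser.decision_function(stacked)
```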
Conference of the International Speech Communication Association | 2016
Mahesh Kumar Nandwana; Taufiq Hasan
Even with the recent technological advancements in smart cars, safety is still a major challenge in autonomous driving. State-of-the-art self-driving vehicles mostly rely on visual, ultrasonic, and radar sensors to assess their surroundings and make decisions. However, in certain driving scenarios, the best modality for context awareness is environmental sound. In this study, we propose an acoustic event recognition framework for detecting abnormal audio events on the road. We consider five classes of audio events, namely ambulance siren, railroad crossing bell, tire screech, car honk, and glass break. We explore various generative and discriminative back-end classifiers, utilizing Gaussian mixture models (GMM), GMM mean supervectors, and the i-vector framework. Evaluation results validate the effectiveness of the proposed system.
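A hedged sketch of a generative GMM back-end for the five audio-event classes named above: one GMM per class, with classification by maximum average frame log-likelihood. The feature dimension, component count, and synthetic training data are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
classes = ["siren", "crossing_bell", "tire_screech", "car_honk", "glass_break"]
# Hypothetical per-class training features (frames x MFCC dims).
train = {c: rng.standard_normal((500, 13)) + i for i, c in enumerate(classes)}

# Fit one diagonal-covariance GMM per event class.
models = {c: GaussianMixture(n_components=8, covariance_type="diag").fit(x)
          for c, x in train.items()}

def classify(frames):
    # score_samples returns per-frame log-likelihoods; pick the class
    # whose model assigns the highest average log-likelihood.
    return max(classes, key=lambda c: models[c].score_samples(frames).mean())

test = rng.standard_normal((200, 13)) + 2  # statistics match "tire_screech"
print(classify(test))
```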
Conference of the International Speech Communication Association | 2014
Mahesh Kumar Nandwana; John H. L. Hansen
Journal of the Acoustical Society of America | 2017
John H. L. Hansen; Mahesh Kumar Nandwana; Navid Shokouhi
Conference of the International Speech Communication Association | 2015
Mahesh Kumar Nandwana; Hynek Boril; John H. L. Hansen
IEEE Transactions on Audio, Speech, and Language Processing | 2019
Luciana Ferrer; Mahesh Kumar Nandwana; Mitchell McLaren; Diego Castán; Aaron Lawson
Conference of the International Speech Communication Association | 2018
Mahesh Kumar Nandwana; Julien van Hout; Mitchell McLaren; Allen Stauffer; Colleen Richey; Aaron Lawson; Martin Graciarena
Conference of the International Speech Communication Association | 2018
Colleen Richey; María Auxiliadora Barrios; Zeb Armstrong; Chris D. Bartels; Horacio Franco; Martin Graciarena; Aaron Lawson; Mahesh Kumar Nandwana; Allen Stauffer; Julien van Hout; Paul Gamble; Jeffrey Hetherly; Cory Stephenson; Karl Ni