
Publication


Featured research published by Mahesh Kumar Nandwana.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2016

Joint information from nonlinear and linear features for spoofing detection: An i-vector/DNN based approach

Chunlei Zhang; Shivesh Ranjan; Mahesh Kumar Nandwana; Qian Zhang; Abhinav Misra; Gang Liu; Finnian Kelly; John H. L. Hansen

Protecting automatic speaker verification (ASV) systems from spoofing attacks remains an essential challenge, even though significant progress in ASV has been achieved in recent years. In this study, an automatic spoofing detection approach using an i-vector framework is proposed. Two approaches are used for frame-level feature extraction: cepstral-based Perceptual Minimum Variance Distortionless Response (PMVDR), and the non-linear, speech-production-motivated Teager Energy Operator (TEO) Critical Band (CB) Autocorrelation Envelope (Auto-Env). An utterance-level i-vector for each recording is formed by concatenating the PMVDR and TEO-CB-Auto-Env i-vectors, followed by linear discriminant analysis (LDA) to maximize the ratio of between-class to within-class scatter. A Gaussian classifier and a DNN are also investigated for back-end scoring. Experiments using the ASVspoof 2015 corpus show that the proposed method successfully detects spoofing attacks. By combining the TEO-CB-Auto-Env and PMVDR features, a relative 76.7% improvement in EER is obtained over the best single-feature system.
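As a rough sketch of the feature-combination step described in the abstract, the snippet below concatenates two utterance-level i-vectors and applies LDA via scikit-learn. The i-vectors, labels, and dimensions are randomly generated placeholders, not the ASVspoof data or the authors' actual pipeline.

```python
# Sketch of the i-vector concatenation + LDA step described above.
# The i-vectors and labels are random placeholders, not ASVspoof data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_utts, ivec_dim = 200, 400

# Hypothetical utterance-level i-vectors from the two front-ends.
pmvdr_ivecs = rng.standard_normal((n_utts, ivec_dim))
teo_ivecs = rng.standard_normal((n_utts, ivec_dim))
labels = rng.integers(0, 2, size=n_utts)  # 0 = genuine, 1 = spoofed

# Concatenate the two i-vectors per utterance, then apply LDA, which
# maximizes the ratio of between-class to within-class scatter.
combined = np.hstack([pmvdr_ivecs, teo_ivecs])
lda = LinearDiscriminantAnalysis(n_components=1)  # 2 classes -> 1 dim
projected = lda.fit_transform(combined, labels)

# A Gaussian classifier or DNN would serve as the back-end in the
# paper; LDA's own decision scores are used here for brevity.
scores = lda.decision_function(combined)
```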


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2015

Robust unsupervised detection of human screams in noisy acoustic environments

Mahesh Kumar Nandwana; Ali Ziaei; John H. L. Hansen

This study focuses on an unsupervised approach for detecting human scream vocalizations in continuous recordings from noisy acoustic environments. The proposed detection solution is based on compound segmentation, which employs weighted mean distance, T²-statistics, and the Bayesian Information Criterion to detect screams. It also employs an unsupervised, threshold-optimized Combo-SAD to remove non-vocal noisy segments in a preliminary stage. Five noisy environments were simulated at noise levels ranging from -20 dB to +20 dB SNR. Performance of the proposed system was compared using two alternative acoustic front-end features: (i) Mel-frequency cepstral coefficients (MFCC) and (ii) perceptual minimum variance distortionless response (PMVDR). Evaluation results show that the new scream detection solution works well at clean, +20 dB, and +10 dB SNR levels, with performance declining as SNR decreases to -20 dB across a number of the noise sources considered.
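The ΔBIC change-point test that underlies BIC-based segmentation can be sketched as follows. This is the standard single-Gaussian-versus-two-Gaussians BIC comparison, not the paper's full compound segmenter, and the feature frames here are synthetic placeholders rather than real MFCC/PMVDR frames.

```python
# Sketch of the delta-BIC change-point test used in BIC-based
# segmentation; feature frames are random placeholders.
import numpy as np

def delta_bic(frames, split, lam=1.0):
    """Delta-BIC for splitting `frames` (N x d) at index `split`.
    Positive values favor declaring a change point at `split`."""
    n, d = frames.shape
    x1, x2 = frames[:split], frames[split:]

    def logdet_cov(x):
        cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(d)  # regularized
        return np.linalg.slogdet(cov)[1]

    # Penalty for the extra parameters of the two-model hypothesis.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * logdet_cov(frames)
            - 0.5 * len(x1) * logdet_cov(x1)
            - 0.5 * len(x2) * logdet_cov(x2)
            - lam * penalty)

# Toy usage: two segments with clearly different statistics.
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(0, 1, (150, 13)),
                    rng.normal(3, 1, (150, 13))])
print(delta_bic(frames, split=150))  # large positive -> change point
```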


Conference of the International Speech Communication Association (INTERSPEECH) | 2018

Analysis of Complementary Information Sources in the Speaker Embeddings Framework

Mahesh Kumar Nandwana; Mitchell McLaren; Diego Castán; Julien van Hout; Aaron Lawson

Deep neural network (DNN)-based speaker embeddings have resulted in new, state-of-the-art text-independent speaker recognition technology. However, very limited effort has been made to understand DNN speaker embeddings. In this study, we analyze the behavior of speaker recognition systems based on speaker embeddings with respect to different front-end features, including the standard Mel-frequency cepstral coefficients (MFCC), power-normalized cepstral coefficients (PNCC), and perceptual linear prediction (PLP). Using a speaker recognition system based on DNN speaker embeddings and probabilistic linear discriminant analysis (PLDA), we compared different approaches to leveraging complementary information through score-, embedding-, and feature-level combination. We report results on the Speakers in the Wild (SITW) and NIST SRE 2016 datasets. We found that the first and second embedding layers are complementary in nature. By applying score- and embedding-level fusion, we demonstrate relative improvements in equal error rate of 17% on NIST SRE 2016 and 10% on SITW over the baseline system.
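A minimal illustration of the score- and embedding-level combination strategies mentioned above. The scores, embeddings, dimensions, and fusion weights here are placeholders: in the paper the scores come from PLDA back-ends over DNN speaker embeddings, and fusion weights would be learned on held-out data.

```python
# Sketch of score-level and embedding-level fusion; all values are
# random placeholders, not outputs of the paper's PLDA systems.
import numpy as np

rng = np.random.default_rng(0)
n_trials, emb_dim = 1000, 512

# Hypothetical per-trial scores from two systems (e.g. MFCC and PNCC).
scores_a = rng.standard_normal(n_trials)
scores_b = rng.standard_normal(n_trials)

# Score-level fusion: weighted sum of calibrated system scores.
w_a, w_b = 0.6, 0.4  # illustrative weights, normally learned
fused_scores = w_a * scores_a + w_b * scores_b

# Embedding-level fusion: concatenate embeddings from two layers
# (or two feature front-ends) before the PLDA back-end.
emb_layer1 = rng.standard_normal((n_trials, emb_dim))
emb_layer2 = rng.standard_normal((n_trials, emb_dim))
fused_embeddings = np.hstack([emb_layer1, emb_layer2])
```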


Conference of the International Speech Communication Association (INTERSPEECH) | 2016

Towards Smart-Cars That Can Listen: Abnormal Acoustic Event Detection on the Road

Mahesh Kumar Nandwana; Taufiq Hasan

Even with recent technological advancements in smart-cars, safety remains a major challenge in autonomous driving. State-of-the-art self-driving vehicles mostly rely on visual, ultrasonic, and radar sensors to assess their surroundings and make decisions. However, in certain driving scenarios, the best modality for context awareness is environmental sound. In this study, we propose an acoustic event recognition framework for detecting abnormal audio events on the road. We consider five classes of audio events: ambulance siren, railroad crossing bell, tire screech, car honk, and glass break. We explore various generative and discriminative back-end classifiers, utilizing Gaussian Mixture Models (GMM), GMM mean supervectors, and the i-vector framework. Evaluation results validate the effectiveness of the proposed system.
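As a sketch of the generative GMM back-end, the snippet below fits one Gaussian mixture per event class and classifies a clip by maximum average log-likelihood. The features, mixture sizes, and class separability are synthetic assumptions, not the paper's setup.

```python
# Sketch of a per-class GMM classifier: one GMM per event class,
# classify by highest mean log-likelihood over a clip's frames.
import numpy as np
from sklearn.mixture import GaussianMixture

CLASSES = ["ambulance_siren", "railroad_bell", "tire_screech",
           "car_honk", "glass_break"]

rng = np.random.default_rng(0)
# Hypothetical frame-level training features per class (n_frames x 13).
train = {c: rng.normal(i, 1.0, (500, 13)) for i, c in enumerate(CLASSES)}

# Fit one diagonal-covariance GMM per class.
models = {c: GaussianMixture(n_components=8, covariance_type="diag",
                             random_state=0).fit(x)
          for c, x in train.items()}

def classify(frames):
    """Pick the class whose GMM gives the highest mean log-likelihood."""
    return max(models, key=lambda c: models[c].score(frames))

test_clip = rng.normal(2, 1.0, (200, 13))  # resembles class index 2
print(classify(test_clip))  # expected: "tire_screech"
```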


Conference of the International Speech Communication Association (INTERSPEECH) | 2014

Analysis and identification of human scream: implications for speaker recognition

Mahesh Kumar Nandwana; John H. L. Hansen


Journal of the Acoustical Society of America | 2017

Analysis of human scream and its impact on text-independent speaker verification

John H. L. Hansen; Mahesh Kumar Nandwana; Navid Shokouhi


Conference of the International Speech Communication Association (INTERSPEECH) | 2015

A New Front-End for Classification of Non-Speech Sounds: A Study on Human Whistle

Mahesh Kumar Nandwana; Hynek Boril; John H. L. Hansen


IEEE Transactions on Audio, Speech, and Language Processing | 2019

Toward Fail-Safe Speaker Recognition: Trial-Based Calibration With a Reject Option

Luciana Ferrer; Mahesh Kumar Nandwana; Mitchell McLaren; Diego Castán; Aaron Lawson


Conference of the International Speech Communication Association (INTERSPEECH) | 2018

Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings

Mahesh Kumar Nandwana; Julien van Hout; Mitchell McLaren; Allen Stauffer; Colleen Richey; Aaron Lawson; Martin Graciarena


Conference of the International Speech Communication Association (INTERSPEECH) | 2018

Voices Obscured in Complex Environmental Settings (VOiCES) Corpus

Colleen Richey; María Auxiliadora Barrios; Zeb Armstrong; Chris D. Bartels; Horacio Franco; Martin Graciarena; Aaron Lawson; Mahesh Kumar Nandwana; Allen Stauffer; Julien van Hout; Paul Gamble; Jeffrey Hetherly; Cory Stephenson; Karl Ni

Collaboration


Dive into Mahesh Kumar Nandwana's collaborations.

Top Co-Authors

John H. L. Hansen (University of Texas at Dallas)
Allen Stauffer (University of Texas at Dallas)
Luciana Ferrer (University of Buenos Aires)