Publication


Featured research published by Padmanabhan Rajan.


international conference on acoustics, speech, and signal processing | 2013

A practical, self-adaptive voice activity detector for speaker verification with noisy telephone and microphone data

Tomi Kinnunen; Padmanabhan Rajan

A voice activity detector (VAD) plays a vital role in robust speaker verification, where energy VAD is most commonly used. Energy VAD works well in noise-free conditions but deteriorates in noisy conditions. One way to tackle this is to introduce speech enhancement preprocessing. We study an alternative, likelihood ratio based VAD that trains speech and nonspeech models on an utterance-by-utterance basis from mel-frequency cepstral coefficients (MFCCs). The training labels are obtained from enhanced energy VAD. As the speech and nonspeech models are re-trained for each utterance, minimal assumptions about the background noise are made. According to both VAD error analysis and speaker verification results using a state-of-the-art i-vector system, the proposed method outperforms energy VAD variants by a wide margin. We provide an open-source implementation of the method.
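For orientation, the energy VAD baseline that the abstract compares against can be sketched as below. This is only an illustrative sketch, not the authors' released implementation: the function name `energy_vad`, the non-overlapping 200-sample frames, and the 30 dB threshold below the loudest frame are all assumptions made for the example.

```python
import math

def energy_vad(signal, frame_len=200, threshold_db=30.0):
    """Label each non-overlapping frame as speech (True) or nonspeech (False).

    A frame counts as speech when its log energy lies within `threshold_db`
    of the loudest frame in the utterance (an assumed, illustrative rule).
    """
    n_frames = len(signal) // frame_len
    log_energy = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # Small constant avoids log(0) on silent frames.
        log_energy.append(10.0 * math.log10(energy + 1e-12))
    if not log_energy:
        return []
    cutoff = max(log_energy) - threshold_db
    return [e > cutoff for e in log_energy]

# Two quiet frames followed by two loud frames.
labels = energy_vad([0.001] * 400 + [0.5] * 400)
```

Because the threshold is relative to the loudest frame, a rule like this tends to break down when the noise floor rises, which is the weakness the abstract points to.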


Digital Signal Processing | 2014

From single to multiple enrollment i-vectors: practical PLDA scoring variants for speaker verification

Padmanabhan Rajan; Anton Afanasyev; Ville Hautamäki; Tomi Kinnunen

The availability of multiple utterances (and hence, i-vectors) for speaker enrollment brings up several alternatives for their utilization with probabilistic linear discriminant analysis (PLDA). This paper provides an overview of their effective utilization, from a practical viewpoint. We derive expressions for the evaluation of the likelihood ratio for the multi-enrollment case, with details on the computation of the required matrix inversions and determinants. The performance of five different scoring methods and the effect of i-vector length normalization are compared experimentally. We conclude that length normalization is a useful technique for all but one of the scoring methods considered, and averaging i-vectors is the most effective out of the methods compared. We also study the application of multicondition training on the PLDA model. Our experiments indicate that multicondition training is more effective in estimating PLDA hyperparameters than it is for likelihood computation. Finally, we look at the effect of the configuration of the enrollment data on PLDA scoring, studying the properties of conditional dependence and the number of enrollment utterances per target speaker. Our experiments indicate that these properties affect the performance of the PLDA model. These results further support the conclusion that i-vector averaging is a simple and effective way to process multiple enrollment utterances.
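The i-vector averaging strategy mentioned in the abstract can be illustrated with a minimal sketch. The function names are hypothetical, and applying length normalization before averaging is an assumption of this example; the abstract does not say in which order the paper applies the two steps.

```python
import math

def length_normalize(vec):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def average_enrollment_ivectors(ivectors, normalize=True):
    """Collapse several enrollment i-vectors into a single averaged vector."""
    if not ivectors:
        raise ValueError("at least one enrollment i-vector is required")
    if normalize:
        # Assumption: normalize each i-vector before averaging.
        ivectors = [length_normalize(v) for v in ivectors]
    n = len(ivectors)
    dim = len(ivectors[0])
    return [sum(v[d] for v in ivectors) / n for d in range(dim)]

# Two toy 2-dimensional "i-vectors" for one speaker.
speaker_model = average_enrollment_ivectors([[3.0, 4.0], [0.0, 2.0]])
```

The averaged vector can then be scored against a test i-vector exactly as in the single-enrollment case, which is what makes this variant simple to deploy.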


international conference on acoustics, speech, and signal processing | 2012

The UMD-JHU 2011 speaker recognition system

Daniel Garcia-Romero; Xinhui Zhou; Dmitry N. Zotkin; Balaji Vasan Srinivasan; Yuancheng Luo; Sriram Ganapathy; Samuel Thomas; Sridhar Krishna Nemala; Garimella S. V. S. Sivaram; Majid Mirbagheri; Sri Harish Reddy Mallidi; Thomas Janu; Padmanabhan Rajan; Nima Mesgarani; Mounya Elhilali; Hynek Hermansky; Shihab A. Shamma; Ramani Duraiswami

In recent years, there have been significant advances in the field of speaker recognition that have resulted in very robust recognition systems. The primary focus of many recent developments has shifted to the problem of recognizing speakers in adverse conditions, e.g., in the presence of noise or reverberation. In this paper, we present the UMD-JHU speaker recognition system applied to the NIST 2010 SRE task. The novel aspects of our systems are: 1) Improved performance on trials involving different vocal effort via the use of linear-scale features; 2) Expected improved recognition performance in the presence of reverberation and noise via the use of frequency domain perceptual linear predictor and cortical features; 3) A new discriminative kernel partial least squares (KPLS) framework that complements state-of-the-art back-end systems JFA and PLDA to aid in better overall recognition; and 4) Acceleration of JFA, PLDA and KPLS back-ends via distributed computing. The individual components of the system and the fused system are compared against a baseline JFA system and results reported by SRI and MIT-LL on SRE2010.


workshop on applications of signal processing to audio and acoustics | 2011

Multi-layer perceptron based speech activity detection for speaker verification

Sriram Ganapathy; Padmanabhan Rajan; Hynek Hermansky

In this paper, we present a speech activity detection (SAD) technique for speaker verification in noisy environments. The proposed SAD is based on phoneme posteriors derived from a multi-layer perceptron (MLP). The MLP is trained using modulation spectral features, where long temporal segments of the speech signal are analyzed in critical bands. In each sub-band, temporal envelopes are derived using the autoregressive modelling technique called frequency domain linear prediction (FDLP). The robustness of the sub-band envelopes is achieved by a minimum mean square envelope estimation technique. We also experiment with MFCC features processed with cepstral mean subtraction. The speech features are input to the trained MLP to estimate phoneme posterior probabilities. For SAD, all the speech phoneme probabilities are merged into one speech class to derive speech/non-speech decisions. The proposed SAD is applied to a speaker verification task using noisy versions of NIST 2008 speaker recognition evaluation (SRE) data, where it provides significant improvements (relative equal error rate (EER) improvement of about 9% in additive noise and about 19% in reverberant conditions). Furthermore, the improvements are consistent for the two different front-ends (FDLP and MFCC) considered here.
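The step of merging phoneme posteriors into a single speech class can be sketched as follows. This is a hedged illustration only: the function name, the convention that index 0 is the non-speech class, and the 0.5 decision threshold are assumptions, and the MLP that would produce the posteriors is not shown.

```python
def speech_decisions(posteriors, nonspeech_index=0, threshold=0.5):
    """Turn per-frame phoneme posteriors into speech/non-speech decisions.

    `posteriors` is a list of frames, each a list of class probabilities.
    All classes except `nonspeech_index` are summed into one speech class.
    """
    decisions = []
    for frame in posteriors:
        speech_prob = sum(p for i, p in enumerate(frame) if i != nonspeech_index)
        decisions.append(speech_prob > threshold)
    return decisions

# Frame 1 is dominated by the non-speech class, frame 2 by two phonemes.
flags = speech_decisions([[0.9, 0.05, 0.05], [0.2, 0.5, 0.3]])
```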


international conference on signal processing and communication systems | 2016

Model-based unsupervised segmentation of birdcalls from field recordings

Anshul Thakur; Padmanabhan Rajan

In this paper, we describe an unsupervised, species independent method to segment birdcalls from the background in bio-acoustic recordings. The method follows a two-pass approach. An initial segmentation is performed utilizing K-means clustering. This provides labels to train Gaussian mixture acoustic models, which are built using Mel frequency cepstral coefficients. Using the acoustic models, the segmentation is refined further to classify each short-time frame as belonging either to the background or to call-activity. Different features, namely short-time energy, Fourier transform phase-based entropy and inverse spectral flatness (ISF) are evaluated within the framework of the proposed method. Our experiments with real field recordings on two datasets reveal that the ISF reliably provides better segmentation performance when compared to the other two features.
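As a rough illustration of the inverse spectral flatness feature, the sketch below uses the standard definition of spectral flatness (geometric mean over arithmetic mean of the power spectrum) and inverts it. Whether the paper computes ISF in exactly this way is an assumption; the function name and the `eps` guard are introduced only for this example.

```python
import math

def inverse_spectral_flatness(power_spectrum, eps=1e-12):
    """Arithmetic mean divided by geometric mean of one frame's power spectrum.

    Values near 1 indicate a flat, noise-like spectrum; large values
    indicate a peaky, tonal spectrum.
    """
    n = len(power_spectrum)
    arithmetic_mean = sum(power_spectrum) / n
    # Geometric mean computed in the log domain for numerical stability.
    log_geometric_mean = sum(math.log(p + eps) for p in power_spectrum) / n
    return arithmetic_mean / math.exp(log_geometric_mean)

flat_frame = inverse_spectral_flatness([1.0, 1.0, 1.0, 1.0])
peaky_frame = inverse_spectral_flatness([100.0, 0.01, 0.01, 0.01])
```

Under this definition, a frame with strong harmonic structure scores far higher than a frame of broadband background noise, which is what makes the feature usable for separating call activity from background.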


international conference on machine learning and applications | 2016

Bird Call Identification Using Dynamic Kernel Based Support Vector Machines and Deep Neural Networks

Deep Chakraborty; Paawan Mukker; Padmanabhan Rajan; Aroor Dinesh Dileep

In this paper, we apply speech and audio processing techniques to bird vocalizations for the classification of birds found in the lower Himalayan regions. Mel frequency cepstral coefficients (MFCC) are extracted from each recording. As a result, the recordings are represented as varying length sets of feature vectors. Dynamic kernel based support vector machines (SVMs) and deep neural networks (DNNs) are popularly used for the classification of such varying length patterns obtained from speech signals. In this work, we propose to use dynamic kernel based SVMs and DNNs for classification of bird calls represented as sets of feature vectors. Results of our studies show that both approaches give comparable performance.


Journal of the Acoustical Society of America | 2018

Local compressed convex spectral embedding for bird species identification

Anshul Thakur; Vinayak Abrol; Pulkit Sharma; Padmanabhan Rajan

This paper proposes a multi-layer alternating sparse-dense framework for bird species identification. The framework takes audio recordings of bird vocalizations and produces compressed convex spectral embeddings (CCSE). Temporal and frequency modulations in bird vocalizations are captured by concatenating frames of the spectrogram, resulting in a high dimensional and highly sparse super-frame-based representation. Random projections are then used to compress these super-frames. Class-specific archetypal analysis is employed on the compressed super-frames for acoustic modeling, obtaining the convex-sparse CCSE representation. This representation efficiently captures species-specific discriminative information. However, many bird species exhibit high intra-species variations in their vocalizations, making it hard to appropriately model the whole repertoire of vocalizations using only one dictionary of archetypes. To overcome this, each class is clustered using Gaussian mixture models (GMM), and for each cluster, one dictionary of archetypes is learned. To calculate CCSE for any compressed super-frame, one dictionary from each class is chosen using the responsibilities of individual GMM components. The CCSE obtained using this GMM-archetypal analysis framework is referred to as local CCSE. Experimental results show that local CCSE either outperforms or performs comparably to existing methods, including support vector machines with dynamic kernels and deep neural networks.


national conference on communications | 2017

Unsupervised birdcall activity detection using source and system features

Anshul Thakur; Padmanabhan Rajan

In this paper, we describe an unsupervised method to segment birdcalls from the background in bioacoustic recordings. The method utilizes information derived from both source features and system features. Three types of source features are extracted from the linear prediction residual signal, and Mel frequency cepstral coefficients are extracted from the system features. The source features are used to generate automatic labels, which are then used to train acoustic models for distinguishing birdcall frames from the background. In the context of a technique proposed earlier, our study demonstrates the improvements brought about by the inclusion of additional source features.


european signal processing conference | 2017

Archetypal analysis based sparse convex sequence kernel for bird activity detection

Vinayak Abrol; Pulkit Sharma; Anshul Thakur; Padmanabhan Rajan; Aroor Dinesh Dileep; Anil Kumar Sao

This paper proposes a novel method based on archetypal analysis (AA) for the bird activity detection (BAD) task. The proposed method extracts a frame-wise convex representation by projecting a given audio signal onto a learned dictionary. The AA based dictionary is trained only on bird class signals, which makes the method robust to background noise. Further, it is shown that due to the inherent sparsity property of convex representations, non-bird class signals will have a denser representation as compared to the bird counterpart, which helps in effective discrimination. In order to detect the presence or absence of bird vocalization, a fixed length representation is obtained by averaging the frame-wise representations of an audio signal. Classification of these fixed length representations is performed using support vector machines (SVM) with a dynamic kernel. In this work, we propose a variant of the probabilistic sequence kernel called the sparse convex sequence kernel (SCSK) for the BAD task. Experimental results show that the proposed method can efficiently discriminate bird from non-bird class signals.


european signal processing conference | 2017

Rapid bird activity detection using probabilistic sequence kernels

Anshul Thakur; R. Jyothi; Padmanabhan Rajan; Aroor Dinesh Dileep

Bird activity detection is the task of determining if a bird sound is present in a given audio recording. This paper describes a bird activity detector which utilises a support vector machine (SVM) with a dynamic kernel. Dynamic kernels are used to process sets of feature vectors having different cardinalities. The probabilistic sequence kernel (PSK) is one such dynamic kernel. The PSK converts a set of feature vectors from a recording into a fixed-length vector. We propose to use a variant of PSK in this work. Before computing the fixed-length vector, cepstral mean and variance normalisation and short-time Gaussianization are performed on the feature vectors. This reduces environment mismatch between different recordings. We also demonstrate a simple procedure to speed up the proposed method by reducing the size of the fixed-length vector. A speedup of almost 70% is observed, with a very small drop in accuracy. The proposed method is also compared with a random forest classifier and is shown to outperform it.
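The cepstral mean and variance normalisation step can be sketched as below. This is an illustrative sketch under stated assumptions: normalisation is done per recording over all frames, the function name `cmvn` and the `eps` guard are hypothetical, and short-time Gaussianization and the PSK itself are not shown.

```python
import math

def cmvn(features, eps=1e-8):
    """Normalise each feature dimension to zero mean and unit variance.

    `features` is a list of frames from one recording, each frame a list
    of cepstral coefficients of equal length.
    """
    n = len(features)
    dim = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in features) / n)
        for d in range(dim)
    ]
    # eps guards against division by zero for constant dimensions.
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        for f in features
    ]

normalised = cmvn([[1.0, 10.0], [3.0, 30.0]])
```

After this step, two recordings made with different microphones or at different gain levels yield features on a common scale, which is the environment mismatch reduction the abstract refers to.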

Collaboration

Top Co-Authors

Tomi Kinnunen, University of Eastern Finland
Anshul Thakur, Indian Institute of Technology Mandi
Ville Hautamäki, University of Eastern Finland
Pulkit Sharma, Indian Institute of Technology Mandi
Vinayak Abrol, Indian Institute of Technology Mandi
Aroor Dinesh Dileep, Indian Institute of Technology Mandi
John H. L. Hansen, University of Texas at Dallas
Navid Shokouhi, University of Texas at Dallas
Seyed Omid Sadjadi, University of Texas at Dallas