Srikanth R. Madikeri
Indian Institute of Technology Madras
Publication
Featured research published by Srikanth R. Madikeri.
national conference on communications | 2011
Srikanth R. Madikeri; Hema A. Murthy
This paper investigates the use of the Mel Filterbank Slope (MFS) feature for speaker recognition tasks. The MFS feature emphasises formants more than the conventional Mel Filterbank Cepstral Coefficients (MFCC) do. The effectiveness of this feature is evaluated on the NIST 2003 speaker recognition database. Results show a significant gain in performance: speaker identification accuracy improves by 8.9% and speaker verification EER by 1.6%, with no additional computational cost. A combination of the MFS feature with the delta MFCC feature shows further improvements of 2.7% and 1.2% in the respective tasks. Late fusion of speaker verification systems is shown to give an overall improvement of 3%.
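The Equal Error Rate (EER) quoted in these abstracts is the operating point at which the false-acceptance and false-rejection rates of a verification system are equal. The following is a minimal sketch of how it can be approximated from two lists of trial scores. The function name and the simple threshold sweep are illustrative assumptions, not the evaluation tooling used in the paper.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Returns the mean of the false-accept and false-reject rates at the
    threshold where the two rates are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a target trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With few trials the two rates rarely cross exactly, which is why the sketch reports the midpoint at the closest threshold. Standard evaluation tools typically interpolate along the DET curve instead.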
Digital Signal Processing | 2014
Srikanth R. Madikeri
A text-independent speaker recognition system using a hybrid of Probabilistic Principal Component Analysis (PPCA) and the conventional i-vector modeling technique is proposed. In this framework, the total variability space (TVS) is estimated using PPCA, while the i-vectors of target speakers and test utterances are extracted using the conventional method. This leads to an appreciable decrease in development time, while the time required for training and testing remains unchanged. In this paper, an algorithmic optimization to PPCA's EM algorithm is developed. This is observed to provide a speed up of 3.7x. To simplify the testing procedure, two different approximation procedures are proposed for use in this framework. The first approximation assumes a covariance matrix computed based on the PPCA framework. The second approximation proposes an optimization to avoid inverting the precision matrix of the i-vector. A comparison of the time taken by these approximations with the baseline i-vector extraction procedure shows speed gains with some deterioration in performance in terms of the Equal Error Rate (EER). Among the proposed techniques, the best trade-off is a speed up of 81.2x with a deterioration in performance of 0.7% in absolute terms. Speaker recognition performance is studied on the telephone conditions of the benchmark NIST SRE 2010 dataset with systems built on the Mel Frequency Cepstral Coefficient (MFCC) feature. A trade-off in performance is observed when the proposed approximations are used. The scalability of these trade-offs is tested on the Mel Filterbank Slope (MFS) feature. The trade-offs observed with the approximations are reduced when the two systems are fused.
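For readers unfamiliar with PPCA, the sketch below shows the textbook EM updates (in the Tipping and Bishop formulation) for the simplest case of a single latent dimension, written in plain Python. It is not the optimized EM variant developed in the paper, whose details are not given in the abstract. The function name, initialisation, and iteration count are illustrative assumptions.

```python
def ppca_em_1d(data, n_iter=50):
    """EM for probabilistic PCA with one latent dimension.

    data: list of equal-length lists of floats.
    Returns (mean, w, sigma2) for the model x = mean + w * z + noise,
    with z ~ N(0, 1) and noise ~ N(0, sigma2 * I).
    """
    n, d = len(data), len(data[0])
    mean = [sum(x[j] for x in data) / n for j in range(d)]
    centred = [[x[j] - mean[j] for j in range(d)] for x in data]
    sq_norm = sum(v * v for c in centred for v in c)

    w = [1.0] + [0.0] * (d - 1)  # hypothetical initialisation
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior mean and second moment of each scalar latent z_n.
        m = sum(v * v for v in w) + sigma2
        ez = [sum(wj * cj for wj, cj in zip(w, c)) / m for c in centred]
        ezz = [sigma2 / m + e * e for e in ez]

        # M-step: closed-form updates for the loading vector and noise variance.
        denom = sum(ezz)
        w = [sum(c[j] * e for c, e in zip(centred, ez)) / denom
             for j in range(d)]
        cross = sum(e * sum(wj * cj for wj, cj in zip(w, c))
                    for c, e in zip(centred, ez))
        sigma2 = (sq_norm - 2.0 * cross + denom * sum(v * v for v in w)) / (n * d)
    return mean, w, sigma2
```

In the general case the scalar `m` becomes a small matrix with as many rows as latent dimensions. Inverting that small matrix, rather than a matrix the size of the data dimension, is what makes each iteration cheap.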
International Journal of Speech Technology | 2015
Srikanth R. Madikeri; Asha Talambedu; Hema A. Murthy
In this paper, modified group delay (MODGD) features are used to model target speakers in the Total Variability Space (TVS) framework for speaker recognition. MODGD-based features have been shown to improve speaker recognition performance owing to the ability of group delay functions to emphasise formants. The basis vectors of the TVS are estimated using the PPCA algorithm, while i-vectors for a speaker are extracted using the conventional technique. The estimation of the total variability space is simplified by a simple transformation of the supervectors. This results in a significant speed up in the estimation of the hyperparameters of the TVS, as the computational complexity of the PPCA algorithm is lower than that of the conventional procedure. This is important because the estimation procedure needs to handle large amounts of data. The technique has already been shown to provide a speed up of 16×. The performance of the MODGD-based system is compared with that of the MFCC-based system on the NIST SRE 2010 benchmark dataset. Two types of fusion are tested in this work: systems fused at the i-vector level and at the score level. A considerable performance improvement is observed in terms of the Equal Error Rate (EER) by employing these fusion techniques. A robust speaker recognition system with decreased development time is obtained as a result.
international conference on signal processing | 2012
Srikanth R. Madikeri; Hema A. Murthy
text speech and dialogue | 2012
Srikanth R. Madikeri; Hema A. Murthy
international conference on machine learning and applications | 2011
Srikanth R. Madikeri; Hema A. Murthy
conference of the international speech communication association | 2016
Nauman Dawalatabad; Srikanth R. Madikeri; C. Chandra Sekhar; Hema A. Murthy
conference of the international speech communication association | 2016
Marc Ferras; Srikanth R. Madikeri; Subhadeep Dey; Petr Motlicek
The Mel Filterbank Slope (MFS) feature has been shown to consistently perform better than the conventional Mel Frequency Cepstral Coefficients (MFCC) for speaker recognition. In this work, issues concerning the feature's robustness to intersession variability and its large dimensionality are addressed. Short-term feature warping is used to improve the robustness of MFS. This is observed to give an absolute improvement of 1% in EER on the NIST 2003 SRE benchmark dataset. Dimensionality reduction on raw MFS features is performed using the Discrete Cosine Transform (DCT). Efficient reduction is obtained using the DCT with no deterioration in performance. Feature warping along with the DCT is observed to give an absolute improvement of 2% in EER. An overall performance improvement of 3.3% is shown when the feature is fused with temporal information from MFCC.
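The two processing steps named in this abstract can be sketched in a few lines of plain Python. Short-term feature warping, as commonly described, replaces each value of a feature stream with the standard-normal quantile of its rank inside a sliding window, and the DCT-II compacts a frame into a few leading coefficients. The function names, the default window length, and the unnormalised DCT scaling are illustrative assumptions. The paper's actual settings are not given here.

```python
import math
from statistics import NormalDist


def warp_feature(stream, window=301):
    """Short-term feature warping of one feature dimension.

    Each value is replaced by the standard-normal quantile of its rank
    within a sliding window centred on it. The window is truncated at the
    edges of the stream.
    """
    half = window // 2
    std_normal = NormalDist()
    warped = []
    for i, value in enumerate(stream):
        segment = stream[max(0, i - half): i + half + 1]
        # 1-based ascending rank of the current value within the window.
        rank = 1 + sum(v < value for v in segment)
        warped.append(std_normal.inv_cdf((rank - 0.5) / len(segment)))
    return warped


def dct2(frame, n_coeffs):
    """First n_coeffs coefficients of the unnormalised DCT-II of a frame."""
    n = len(frame)
    return [sum(x * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
                for j, x in enumerate(frame))
            for k in range(n_coeffs)]
```

Because warping depends only on ranks, any monotonic distortion of a feature dimension within the window leaves the warped values unchanged. That property is the usual motivation for using it against channel and session effects.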
Archive | 2016
Srikanth R. Madikeri; Subhadeep Dey; Petr Motlicek; Marc Ferras
In this paper, a new approach to keyword spotting is presented that uses event-based signal processing to obtain approximate locations of sub-word units. A segmentation algorithm based on group delay functions is used to determine the boundaries. These sub-word units are used as individual inputs to an unconstrained-endpoints dynamic time warping (UE-DTW) template matching algorithm. Appropriate score normalisation is performed using scores of background words. The technique is tested using MFCC and Modified Group Delay features. A relative improvement of 13.7% over the baseline is observed for clean speech. Further, for noisy speech, the degradation is graceful.
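As a rough illustration of template matching with free endpoints, the sketch below implements subsequence DTW: the template must be matched in full, but it may start and end anywhere in the longer signal. Reading "unconstrained endpoints" this way is an assumption, as are the function name, the scalar frames, and the absolute-difference local cost. The paper works with feature vectors and its exact formulation is not given in the abstract.

```python
def subsequence_dtw(template, signal):
    """Best match of `template` inside `signal` with free start and end.

    Returns (cost, start, end), where start and end are indices into
    `signal` delimiting the matched region.
    """
    n, m = len(template), len(signal)
    inf = float("inf")
    # cost[i][j]: cheapest alignment of template[:i + 1] ending at signal[j].
    cost = [[inf] * m for _ in range(n)]
    start = [[0] * m for _ in range(n)]
    for j in range(m):
        # Free start: the first template frame may align to any signal frame.
        cost[0][j] = abs(template[0] - signal[j])
        start[0][j] = j
    for i in range(1, n):
        for j in range(m):
            candidates = [(cost[i - 1][j], start[i - 1][j])]
            if j > 0:
                candidates.append((cost[i][j - 1], start[i][j - 1]))
                candidates.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
            prev_cost, prev_start = min(candidates)
            cost[i][j] = prev_cost + abs(template[i] - signal[j])
            start[i][j] = prev_start
    # Free end: pick the cheapest cell in the last template row.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end
```

In a keyword spotting setting, the returned cost would then be normalised (the abstract mentions scores of background words) before being compared with a detection threshold.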
european intelligence and security informatics conference | 2017
Khaled Khelif; Yann Mombrun; Gerhard Backfried; Farhan Sahito; Luca Scarpato; Petr Motlicek; Srikanth R. Madikeri; Damien Kelly; Gideon Hazzani; Emmanouil Chatzigavriil
In this paper, we present a discriminative training algorithm for GMM-based speaker verification and develop a convergence proof for it. During training of the models, instead of performing MAP adaptation of the UBM, the model parameters of the GMM are estimated such that target scores are maximized while impostor scores are minimized. The focus of the algorithm is to estimate more accurately the parameters of the mixture components that are unique to a speaker. The algorithm uses an Expectation-Maximisation-like framework for estimation of parameters. It is shown that the algorithm converges with appropriate choice of a regularization parameter (