Kuruvachan K. George
Amrita Vishwa Vidyapeetham
Publications
Featured research published by Kuruvachan K. George.
Power Signals Control and Computations (EPSCICON), 2014 International Conference on | 2014
K. T. Sreekumar; Kuruvachan K. George; K. Arunraj; C. Santhosh Kumar
For spoken language processing applications such as speaker recognition/verification, silence segments not only contribute no speaker-specific information but also dilute the information content already present in the speech segments of the audio data. It has been shown experimentally that removing silence segments from the utterance with a voice activity detector (VAD) before feature extraction enhances the performance of speaker recognition systems. Empirical algorithms using signal energy and spectral centroid (ESC) are among the most popular approaches to VAD. In this paper, we show that using spectral matching (SM) to distinguish between silence and speech segments for VAD outperforms the ESC-based VAD. We use a neural network with TempoRAl PatternS (TRAPS) of critical band energies as its input for improved performance. We evaluate the performance of the VADs using a speaker recognition system developed for 20 speakers.
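As a rough illustration of the ESC-style baseline described above, the sketch below marks a frame as speech only when both its short-term energy and its spectral centroid exceed adaptive thresholds. This is a minimal numpy sketch: the frame sizes, threshold rule, and toy signal are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def esc_vad(x, sr=16000, frame_len=400, hop=160):
    """Energy + spectral centroid VAD: a frame counts as speech when both
    features exceed a simple adaptive threshold (midpoint between the
    feature's minimum and its mean over the utterance)."""
    frames = frame_signal(x, frame_len, hop)
    energy = (frames ** 2).mean(axis=1)
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    centroid = (spec * freqs).sum(axis=1) / (spec.sum(axis=1) + 1e-12)
    e_thr = 0.5 * (energy.min() + energy.mean())
    c_thr = 0.5 * (centroid.min() + centroid.mean())
    return (energy > e_thr) & (centroid > c_thr)

# Toy check: 1 s of near-silence followed by 1 s of a loud 6 kHz tone.
sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
sil = 0.001 * rng.standard_normal(sr)
tone = np.sin(2 * np.pi * 6000 * t)
mask = esc_vad(np.concatenate([sil, tone]), sr=sr)
```

In a speaker recognition front end, `mask` would simply select which frames survive to feature extraction.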
Power Signals Control and Computations (EPSCICON), 2014 International Conference on | 2014
Kuruvachan K. George; K. Arunraj; K. T. Sreekumar; C. Santhosh Kumar
Speaker recognition has been an active area of research for the last few decades owing to its applications in national security and other forensic domains. In this work, we present the details of a speaker recognition system developed using a universal background model with support vector machines (UBM-SVM). We explored several techniques to improve the performance of the baseline system developed using mel frequency cepstral coefficients (MFCC) as input features. We developed and tested the speaker recognition system for 200 speakers, using data collected over 13 different channels, such as handset regular phone, speaker phone, regular phone headphone, etc. We experimented with RelAtive SpecTrA (RASTA) processing and feature warping on the input MFCC features, and nuisance attribute projection (NAP) on the Gaussian mixture model supervectors derived in the system. These techniques improved the system performance significantly by minimizing the effect of the different channels. The details of the system implementation and results are presented in this paper. The complete system is developed in MATLAB and C/C++.
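Of the channel-compensation techniques mentioned above, feature warping is the easiest to sketch: each cepstral dimension is rank-mapped to a standard normal distribution. The version below warps over the whole utterance for brevity, whereas the standard technique uses a sliding window; it is a sketch, not the paper's implementation.

```python
import numpy as np
from statistics import NormalDist

def feature_warp(features):
    """Warp each feature dimension to a standard normal distribution by
    rank mapping: replace each value by the normal quantile of its
    empirical rank (whole-utterance variant; the usual method applies
    this over a sliding window, e.g. ~3 s)."""
    n, d = features.shape
    warped = np.empty((n, d), dtype=float)
    nd = NormalDist()
    for j in range(d):
        ranks = features[:, j].argsort().argsort()   # 0 .. n-1
        probs = (ranks + 0.5) / n                    # stay strictly in (0, 1)
        warped[:, j] = [nd.inv_cdf(p) for p in probs]
    return warped

rng = np.random.default_rng(1)
mfcc = rng.exponential(size=(500, 13))   # deliberately skewed stand-in "features"
warped = feature_warp(mfcc)
```

After warping, each dimension is approximately zero-mean, unit-variance regardless of the channel-dependent distortion of the raw features.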
ieee india conference | 2015
C. Santhosh Kumar; Kuruvachan K. George; Ashish Panda
Cosine distance similarities with a set of reference speakers, cosine distance features (CDF), with a backend support vector machine classifier (CDF-SVM) were explored in our earlier studies for improving the performance of speaker verification systems. Subsequently, we also investigated its effectiveness in improving the noise robustness of speaker verification systems. In this work, we study how the performance of CDF-SVM systems can be further improved by weighting the feature vectors using the latent semantic information (LSI) technique. We use mel frequency cepstral coefficients (MFCC), power normalized cepstral coefficients (PNCC), or delta spectral cepstral coefficients (DSCC) for deriving the CDF. Experimental results on the female part of the short2-short3 trials of the NIST speaker recognition evaluation dataset show that the proposed weighted CDF-SVM system outperforms the baseline i-vector with cosine distance scoring (i-CDS), i-vector with a backend SVM classifier (i-SVM), and CDF-SVM systems. Finally, we fused the weighted CDF-SVM with i-CDS and evaluated the performance of the combined system under different stationary and non-stationary additive noise test conditions. The noise robustness of the fused weighted CDF-SVM+i-CDS system is significantly better than that of the individual systems and of the fused CDF-SVM+i-CDS of our earlier work, in both clean and noisy test environments, except for the zero-SNR condition of certain noises.
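The core CDF idea above is simple to sketch: an utterance's i-vector is re-represented by its cosine similarity to each reference speaker's i-vector, and that similarity vector becomes the SVM input. The dimensions and random "i-vectors" below are purely illustrative; LSI-style weighting would further scale each entry.

```python
import numpy as np

def cosine_distance_features(ivec, reference_ivecs):
    """Represent an utterance as a vector of cosine similarities between
    its i-vector and each reference speaker's i-vector."""
    ref = np.asarray(reference_ivecs, dtype=float)
    num = ref @ ivec
    den = np.linalg.norm(ref, axis=1) * np.linalg.norm(ivec)
    return num / (den + 1e-12)

rng = np.random.default_rng(2)
refs = rng.standard_normal((50, 400))   # 50 reference speakers, 400-dim i-vectors
cdf = cosine_distance_features(refs[0], refs)
```

The resulting 50-dimensional `cdf` vector (one entry per reference speaker) is what a backend SVM would be trained on.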
systems communications | 2014
Neethu Johnson; Kuruvachan K. George; C. Santhosh Kumar; P. C. Reghu Raj
This paper studies the contribution of different phones in speech data towards improving the performance of text/language-independent speaker recognition systems. The work is motivated by the fact that removing silence segments from the speech data improves system performance significantly, since silence contains no speaker-specific information. It is also clear from the literature that not all phones in the speech data contain an equal amount of speaker-specific information, and the performance of speaker recognition systems depends on this information. In addition to the silence segments, our work empirically finds 18 other diluent phones that have minimal speaker discrimination capability. We propose a preprocessing stage that recursively identifies the non-informative set of phones and removes them along with the silence segments. Results show that using the phone-removed, preprocessed data in a state-of-the-art i-vector system outperforms the baseline i-vector system. We report absolute improvements of 1%, 1%, 2%, 2% and 1% in EER for test sets collected through the Digital Voice Recorder, Headset, Mobile Phone 1, Mobile Phone 2 and Tablet PC channels, respectively, on the IITG-MV database.
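Mechanically, the preprocessing stage above amounts to dropping frames whose aligned phone label falls in the non-informative set. A minimal sketch, assuming phone labels come from a forced aligner; the labels and two-element drop set here are hypothetical stand-ins for the paper's 18 phones plus silence.

```python
import numpy as np

def drop_uninformative_frames(features, phone_labels, drop_set):
    """Remove feature frames aligned to silence or other phones with low
    speaker-discrimination capability before i-vector modelling."""
    keep = np.array([p not in drop_set for p in phone_labels])
    return features[keep]

feats = np.arange(12, dtype=float).reshape(6, 2)      # 6 frames, 2 dims
labels = ["sil", "aa", "sil", "k", "hh", "aa"]        # hypothetical alignment
kept = drop_uninformative_frames(feats, labels, drop_set={"sil", "hh"})
```

Only the frames labelled `aa` and `k` survive; everything downstream (UBM statistics, i-vector extraction) then sees the filtered data.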
ieee region 10 conference | 2016
Kuruvachan K. George; Rohan Kumar Das; Sarfaraz Jelil; K. Arun Das; C. Santhosh Kumar; S. R. Mahadeva Prasanna; Ashish Panda
In this work, the details of the AMRITA-TCS and IITGUWAHATI speaker recognition systems submitted to the Speakers in the Wild (SITW) speaker recognition challenge are presented. The AMRITA-TCS system is a fusion of an i-vector with backend probabilistic linear discriminant analysis (i-PLDA) system and a cosine distance features (CDF) with backend support vector machine classifier (CDF-SVM) system, developed using the short-term cepstral features mel frequency cepstral coefficients (MFCC) and power normalized cepstral coefficients (PNCC), respectively. The IITGUWAHATI system is an i-PLDA system using MFCC with a vowel-like region (VLR) based feature selection (i-PLDA-VLR). The experimental results reported in this work are based on the core-core condition of the challenge. Finally, a fusion of the AMRITA-TCS and IITGUWAHATI speaker recognition systems is carried out, and the fused system outperforms each of the subsystems.
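Fusing subsystems, as done above, is typically a linear combination of per-system scores after normalisation. The sketch below shows plain z-norm plus weighted sum; the weights and toy scores are illustrative, and in practice the weights would be tuned on a development set.

```python
import numpy as np

def fuse_scores(score_lists, weights=None):
    """Score-level fusion: z-normalise each subsystem's trial scores,
    then take a (possibly weighted) sum across subsystems."""
    score_lists = [np.asarray(s, dtype=float) for s in score_lists]
    if weights is None:
        weights = np.ones(len(score_lists)) / len(score_lists)
    znorm = [(s - s.mean()) / (s.std() + 1e-12) for s in score_lists]
    return sum(w * z for w, z in zip(weights, znorm))

# Two subsystems scoring the same four trials on very different scales.
sys_a = [2.0, -1.0, 0.5, 3.0]
sys_b = [10.0, 2.0, 4.0, 12.0]
fused = fuse_scores([sys_a, sys_b])
```

Because each subsystem is normalised before summing, neither scale dominates, and trials both systems rank highly end up with the largest fused scores.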
Odyssey 2016 | 2016
Kuruvachan K. George; C Santhosh Kumar; Ashish Panda
Making speaker verification (SV) systems robust to spoofed/mimicked speech attacks is very important for their effective use in security applications. In this work, we show that using a proximal support vector machine backend classifier with i-vectors as inputs (i-PSVM) can help improve the performance of SV systems for mimicked speech as non-target trials. We compared our results with the state-of-the-art baseline i-vector with cosine distance scoring (i-CDS), i-vector with a backend SVM classifier (i-SVM), and cosine distance features with an SVM backend classifier (CDF-SVM) systems. In i-PSVM, proximity of the test utterance to the target and non-target class is the criterion for decision making, while in i-SVM, the distance from the separating hyperplane is the criterion. The i-PSVM approach was seen to be advantageous when tested with mimicked speech as non-target trials, which highlights that proximity to the target speakers is a better criterion for speaker verification under mimicry. Further, we note that weighting the target and non-target class examples helps us further fine-tune the performance of i-PSVM. We then devised a strategy for estimating the weight of every example based on its cosine distance similarity to the centroid of the target class examples. The final i-PSVM with the example-based weighting scheme achieved an absolute improvement of 3.39% in EER over the best baseline system, i-SVM. Subsequently, we fused the i-PSVM and i-SVM systems, and results show that the performance of the combined system is better than that of the individual systems.
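The example-based weighting strategy above can be sketched directly: each target-class example is scored by its cosine similarity to the target-class centroid, and that similarity drives its training weight. The mapping from similarity to weight below is an illustrative assumption, not the paper's exact scheme.

```python
import numpy as np

def example_weights(target_examples):
    """Weight each target-class example by its cosine similarity to the
    centroid of the target class; examples closer to the centroid get
    larger weights in the classifier's objective."""
    X = np.asarray(target_examples, dtype=float)
    c = X.mean(axis=0)                                   # class centroid
    sims = (X @ c) / (np.linalg.norm(X, axis=1) * np.linalg.norm(c) + 1e-12)
    return (sims + 1.0) / 2.0                            # map [-1, 1] to [0, 1]

# Toy i-vectors: the second example points closest to the centroid direction.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
w = example_weights(X)
```

These per-example weights would then be passed to the (proximal) SVM training so that atypical target examples influence the decision boundary less.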
international conference on digital human modeling and applications in health safety ergonomics and risk management | 2013
Kuruvachan K. George; C. Santhosh Kumar
Dysarthria is a set of congenital and traumatic neuromotor disorders that impair the physical production of speech. These impairments reduce or remove normal control of the vocal articulators. The acoustic characteristics of dysarthric speech are very different from those of speech collected from a normative population, with relatively larger intra-speaker inconsistencies in the temporal dynamics of the dysarthric speech [1] [2]. These inconsistencies result in poor audible quality of the dysarthric speech and in low phone/speech recognition accuracy. Further, collecting and labeling dysarthric speech is extremely difficult, considering the small number of people with these disorders and the difficulty of labeling the database given the poor quality of the speech. Hence, it would be of great interest to explore how to improve the efficiency of acoustic models built on small dysarthric speech databases such as Nemours [3], or to use speech databases collected from a normative population to build acoustic models for dysarthric speakers. In this work, we explore the latter approach.
Pattern Recognition Letters | 2018
Kuruvachan K. George; C. Santhosh Kumar; Sunil Sivadas; Ashish Panda
In this paper, we describe a method for representing the acoustic similarity of a target speaker with respect to a set of known speakers as a feature for speaker verification. We propose a novel distance-based representation that encodes the cosine distance between the i-vectors of utterances belonging to the target speaker and the reference speakers. The new feature is referred to as the cosine distance feature (CDF) and is used with a support vector machine (SVM) classifier (CDF-SVM). We show that reference speakers who rank high in acoustic similarity to the target speaker are more important for better speaker discrimination. A sparse representation of the CDF, which retains only a few of the largest values, corresponding to the most similar reference speakers in the CDF vector, is found to perform better than the baseline CDF system. We also explore a speaker-specific CDF, where each target speaker has a specific subset of the most acoustically similar reference speakers. We show that the acoustic similarities between the target and reference speakers are best captured using an intersection-kernel SVM. Experimental results on the core short2-short3 condition of NIST 2008 SRE, for both female and male trials, show that the speaker-specific CDF outperforms the i-vector and speaker-independent CDF based state-of-the-art speaker verification systems.
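The sparse CDF described above keeps only the entries for the most acoustically similar reference speakers. A minimal sketch of that top-k sparsification (k and the toy vector are illustrative):

```python
import numpy as np

def sparsify_cdf(cdf, k):
    """Keep only the k largest entries of a CDF vector (the most similar
    reference speakers) and zero out the rest."""
    cdf = np.asarray(cdf, dtype=float)
    out = np.zeros_like(cdf)
    top = np.argpartition(cdf, -k)[-k:]   # indices of the k largest values
    out[top] = cdf[top]
    return out

sparse = sparsify_cdf([0.9, 0.1, 0.7, 0.3, 0.8], k=2)
```

A speaker-specific CDF would instead fix, per target speaker, which reference indices are retained, rather than re-selecting the top k per utterance.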
advances in computing and communications | 2016
K. Arun Das; Kuruvachan K. George; C. Santhosh Kumar; S. Veni; Ashish Panda
Voice spoofing is one of the major challenges that needs to be addressed in the development of robust speaker verification (SV) systems. It is therefore necessary to develop spoofing detectors that are able to distinguish between genuine and spoofed speech utterances. In this work, we propose the use of modified gammatone frequency cepstral coefficients (MGFCC) for enhancing the performance of spoofing detection. We also compare the effectiveness of GMM-based spoofing detectors developed using mel frequency cepstral coefficients (MFCC), gammatone frequency cepstral coefficients (GFCC), modified group delay cepstral coefficients (MGDCC) and cosine normalized phase cepstral coefficients (CNPCC) with that of MGFCC. Experimental results on the ASVspoof 2015 database show that MGFCC outperforms the magnitude-based features (MFCC and GFCC) and the phase-based features (MGDCC and CNPCC) on the known attack conditions. Further, we performed a score-level fusion of the systems developed using MFCC, MGFCC, MGDCC and CNPCC. The fused system significantly outperforms all the individual systems for both known and unknown attack conditions of the ASVspoof 2015 database.
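A GMM-based spoofing detector, as used above, scores an utterance by the log-likelihood ratio of its frames under a genuine-speech model versus a spoofed-speech model. The sketch below substitutes single diagonal Gaussians for the trained GMMs, with fixed toy parameters, purely to show the scoring step.

```python
import numpy as np

def gaussian_loglik(X, mean, var):
    """Per-frame log-likelihood under a diagonal-covariance Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + (X - mean) ** 2 / var).sum(axis=1)

def llr_score(X, genuine, spoof):
    """Average per-frame log-likelihood ratio: positive scores favour the
    genuine model. Single Gaussians stand in for the GMMs of the paper."""
    g = gaussian_loglik(X, genuine["mean"], genuine["var"]).mean()
    s = gaussian_loglik(X, spoof["mean"], spoof["var"]).mean()
    return g - s

rng = np.random.default_rng(3)
genuine = {"mean": np.zeros(4), "var": np.ones(4)}       # toy model parameters
spoof = {"mean": 2.0 * np.ones(4), "var": np.ones(4)}
test_utt = rng.standard_normal((200, 4))                 # frames near the genuine model
score = llr_score(test_utt, genuine, spoof)
```

Thresholding `score` gives the genuine/spoofed decision; fusing detectors built on different features amounts to combining these per-utterance scores.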
advances in computing and communications | 2016
A. Sathya; J. Swetha; K. Arun Das; Kuruvachan K. George; C. Santhosh Kumar; J Aravinth
It is very important to enhance the robustness of Automatic Speaker Verification (ASV) systems against spoofing attacks. One recent research effort in this direction is to derive features that are robust against spoofed speech. In this work, we experiment with Cosine Normalised Phase-based Cepstral Coefficients (CNPCC) as inputs to a Gaussian Mixture Model (GMM) back-end classifier, compare the results with systems developed using the popular short-term cepstral features, Mel-Frequency Cepstral Coefficients (MFCC) and Power Normalised Cepstral Coefficients (PNCC), and show that CNPCC outperforms the other features. We then perform a score-level fusion of the CNPCC system with the MFCC and PNCC systems to further enhance the performance. We use known attacks to train and optimise the system, use unknown attacks for evaluation, and present the results.