
Publications


Featured research published by C. Santhosh Kumar.


International Conference on Power Signals Control and Computations (EPSCICON) | 2014

Spectral matching based voice activity detector for improved speaker recognition

K. T. Sreekumar; Kuruvachan K. George; K. Arunraj; C. Santhosh Kumar

For spoken language processing applications such as speaker recognition/verification, silence segments not only contribute no speaker-specific information, they also dilute the information content already available in the speech segments of the audio data. It has been experimentally shown that removing silence segments from the utterance with a voice activity detector (VAD) before feature extraction improves the performance of speaker recognition systems. Empirical algorithms using signal energy and spectral centroid (ESC) are among the most popular approaches to VAD. In this paper, we show that using spectral matching (SM) to distinguish between silence and speech segments for VAD outperforms the VAD using ESC. We use a neural network with TempoRAl PatternS (TRAPS) of critical band energies as its input for improved performance. We evaluate the performance of the VADs using a speaker recognition system developed for 20 speakers.
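
For a rough sense of the baseline the paper compares against, a minimal sketch of an energy/spectral-centroid (ESC) voice activity detector follows. The frame size, hop and adaptive thresholds are illustrative assumptions rather than values from the paper, and the spectral-matching/TRAPS network itself is not shown.

```python
# Minimal ESC VAD sketch: frames with both high short-time energy and high
# spectral centroid are kept as speech. Thresholds are simple heuristics.
import numpy as np

def esc_vad(signal, sr, frame_len=0.02, hop=0.01):
    n, h = int(frame_len * sr), int(hop * sr)
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    energy, centroid = [], []
    for i in range(0, len(signal) - n, h):
        frame = signal[i:i + n]
        spec = np.abs(np.fft.rfft(frame * np.hamming(n)))
        energy.append(np.sum(frame ** 2))
        centroid.append(np.sum(freqs * spec) / (np.sum(spec) + 1e-12))
    energy, centroid = np.array(energy), np.array(centroid)
    e_thr = 0.5 * (np.median(energy) + energy.min())      # adaptive thresholds
    c_thr = 0.5 * (np.median(centroid) + centroid.min())
    return (energy > e_thr) & (centroid > c_thr)           # True = speech frame
```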


Expert Systems With Applications | 2016

Intelligent fault diagnosis of synchronous generators

R. Gopinath; C. Santhosh Kumar; Vanam Upendranath; P.V.R. Sai Kiran

Highlights: A 3 kVA generator fault model is used to diagnose faults in a 5 kVA generator. The model is trained using 3 kVA generator data and 5 kVA generator no-fault data. System-dependent dimensions are removed using nuisance attribute projection (NAP). Classification and regression tree (CART) is used as a back-end classifier with NAP. NAP improves the performance of the fault identification system.

Condition based maintenance (CBM) requires continuous monitoring of mechanical/electrical signals and various operating conditions of the machine to provide maintenance decisions. However, for expensive complex systems (e.g. aerospace), inducing faults and capturing the intelligence about the system is not possible. This necessitates having a small working model (SWM) to learn about faults and capture the intelligence about the system, and then scaling up the fault models to monitor the condition of the complex/prototype system, without ever injecting faults in the prototype system. We refer to this approach as scalable fault models. We check the effectiveness of the proposed approach using a 3 kVA synchronous generator as the SWM and a 5 kVA synchronous generator as the prototype system. In this work, we identify and remove the system-dependent features using a nuisance attribute projection (NAP) algorithm to model a system-independent feature space, making the features robust across the two different-capacity synchronous generators. Frequency domain statistical features are extracted from the current signals of the synchronous generators. Classification and regression tree (CART) is used as the back-end classifier. NAP improves the performance of the baseline system by 2.05%, 5.94%, and 9.55% for the R, Y, and B phase faults respectively.
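
A minimal sketch of the NAP step described above, assuming the nuisance (system-dependent) subspace is estimated from differences between the two generators' class-conditional mean feature vectors; the variable names and the rank k are illustrative, not taken from the paper.

```python
# Nuisance attribute projection (NAP) sketch: estimate system-dependent
# directions from per-class mean differences between the 3 kVA and 5 kVA
# feature sets, then project features onto their orthogonal complement.
import numpy as np

def nap_projection(feats_a, feats_b, labels_a, labels_b, k=1):
    """feats_*: (n_samples, n_dims); labels_*: fault-class labels; returns (d, d) projector."""
    diffs = []
    for c in np.unique(labels_a):
        mu_a = feats_a[labels_a == c].mean(axis=0)
        mu_b = feats_b[labels_b == c].mean(axis=0)
        diffs.append(mu_a - mu_b)                    # system-dependent direction
    U, _, _ = np.linalg.svd(np.array(diffs).T, full_matrices=False)
    U = U[:, :k]                                     # top-k nuisance directions
    return np.eye(U.shape[0]) - U @ U.T              # P = I - U U^T

# Usage (hypothetical data): X_clean = X @ nap_projection(Xa, Xb, ya, yb)
```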


IEEE India Conference | 2015

Locality constrained linear coding for fault diagnosis of rotating machines using vibration analysis

K. T. Sreekumar; R. Gopinath; M. Pushparajan; Aparna S. Raghunath; C. Santhosh Kumar; M. Saimurugan

The support vector machine (SVM) is an important machine learning algorithm widely used for the development of machine fault diagnosis systems. In this work, we use an SVM back-end classifier, with statistical features in the time and frequency domains as its input, for the development of a fault diagnosis system for a rotating machine. Our baseline system is evaluated for its speed-dependent and speed-independent performance. In this paper, we use locality constrained linear coding (LLC) to map the input feature vectors to a higher dimensional linear space and remove some of the speed-specific dimensions, improving the speed-independent performance of the fault diagnosis system. LLC selects only the k nearest neighbour basis vectors to represent each input feature vector, which minimises the effect of speed-specific factors. We compare the performance of the LLC-SVM system for the time and frequency domain statistical features. The proposed approach improved the overall classification accuracy by 11.81% absolute for time domain features and 10.53% absolute for frequency domain features compared to the baseline speed-independent system.
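
A hedged sketch of the approximated LLC encoding described above: each feature vector is reconstructed from its k nearest codebook atoms, and the resulting higher-dimensional code feeds the SVM. The codebook would typically come from k-means on training features; the names, k and the regularisation constant are assumptions for illustration.

```python
# Approximated locality-constrained linear coding (LLC) for one feature vector.
import numpy as np

def llc_encode(x, codebook, k=5):
    """x: (d,) feature; codebook: (M, d) atoms; returns (M,) LLC code."""
    d2 = np.sum((codebook - x) ** 2, axis=1)
    idx = np.argsort(d2)[:k]                       # k nearest codebook atoms
    z = codebook[idx] - x                          # local basis, shifted to x
    C = z @ z.T + 1e-6 * np.trace(z @ z.T) * np.eye(k)   # regularised local covariance
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                                   # sum-to-one constraint
    code = np.zeros(codebook.shape[0])
    code[idx] = w                                  # sparse, higher-dimensional code
    return code
```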


International Conference on Power Signals Control and Computations (EPSCICON) | 2014

Towards improving the performance of text/language independent speaker recognition systems

Kuruvachan K. George; K. Arunraj; K. T. Sreekumar; C. Santhosh Kumar

Speaker recognition has been an active area of research for the last few decades owing to its applications in national security and other forensic applications. In this work, we present the details of a speaker recognition system developed using a universal background model and support vector machines (UBM-SVM). We explored several techniques to improve the performance of the baseline system developed using mel frequency cepstral coefficients (MFCC) as input features. We developed and tested the speaker recognition system for 200 speakers, using data collected over 13 different channels, such as handset regular phone, speaker phone, regular phone headphone, regular phone, etc. We experimented with RelAtive SpecTrA (RASTA) processing and feature warping on the input MFCC features, and nuisance attribute projection (NAP) on the Gaussian mixture model supervectors derived in the system. These techniques improved the system performance significantly by minimizing the effect of the different channels. The details of the system implementation and results are presented in this paper. The complete system is developed in MATLAB and C/C++.
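
One of the compensation steps mentioned, feature warping, is easy to illustrate: each cepstral dimension is rank-mapped to a standard normal distribution over a sliding window. The 301-frame window (roughly 3 s at a 10 ms hop) is a common choice assumed here, not a value stated in the paper.

```python
# Feature warping sketch: per-dimension rank within a sliding window, mapped
# through the inverse Gaussian CDF.
import numpy as np
from scipy.stats import norm

def feature_warp(feats, win=301):
    """feats: (n_frames, n_dims) MFCC matrix; returns warped features."""
    half = win // 2
    warped = np.empty(feats.shape)
    for t in range(feats.shape[0]):
        block = feats[max(0, t - half):t + half + 1]
        ranks = (block < feats[t]).sum(axis=0) + 0.5       # rank of current frame
        warped[t] = norm.ppf(ranks / block.shape[0])       # map to N(0, 1)
    return warped
```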


IEEE India Conference | 2015

Weighted cosine distance features for speaker verification

C. Santhosh Kumar; Kuruvachan K. George; Ashish Panda

Cosine distance similarities with a set of reference speakers, cosine distance features (CDF), with a back-end support vector machine classifier (CDF-SVM) were explored in our earlier studies for improving the performance of speaker verification systems. Subsequently, we also investigated their effectiveness in improving the noise robustness of speaker verification systems. In this work, we study how the performance of CDF-SVM systems can be further improved by weighting the feature vectors using the latent semantic information (LSI) technique. We use mel frequency cepstral coefficients (MFCC), power normalized cepstral coefficients (PNCC), or delta spectral cepstral coefficients (DSCC) for deriving the CDF. Experimental results on the female part of the short2-short3 trials of the NIST speaker recognition evaluation dataset show that the proposed weighted CDF-SVM system outperforms the baseline i-vector with cosine distance scoring (i-CDS), i-vector with a back-end SVM classifier (i-SVM) and CDF-SVM systems. Finally, we fused the weighted CDF-SVM with i-CDS and evaluated the performance of the combined system under different stationary and non-stationary additive noise test conditions. The noise robustness of the fused weighted CDF-SVM+i-CDS system is significantly better than that of the individual systems and the fused CDF-SVM+i-CDS of our earlier work, in both clean and noisy test environments, except for the zero SNR condition of certain noises.
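
The core of the CDF idea can be sketched compactly: an utterance representation is mapped to its cosine similarities against a set of reference speakers, and those (optionally weighted) similarities become the SVM input. The weights argument below is a placeholder for the LSI-derived weighting, which is not reproduced here.

```python
# Cosine distance features (CDF) sketch: similarities to reference speakers,
# optionally weighted (e.g. with LSI-derived weights).
import numpy as np

def cosine_distance_features(utt_vec, ref_vecs, weights=None):
    """utt_vec: (d,); ref_vecs: (n_refs, d); returns (n_refs,) CDF vector."""
    utt = utt_vec / np.linalg.norm(utt_vec)
    refs = ref_vecs / np.linalg.norm(ref_vecs, axis=1, keepdims=True)
    cdf = refs @ utt                        # cosine similarity to each reference
    if weights is not None:
        cdf = cdf * weights                 # per-reference weighting (placeholder)
    return cdf
```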


Advances in Computing and Communications | 2016

Improving the intelligibility of dysarthric speech towards enhancing the effectiveness of speech therapy

S. Arun Kumar; C. Santhosh Kumar

Dysarthria is a neuro-motor disorder in which the muscles used for speech production and articulation are severely affected. Dysarthric patients are characterized by slow or slurred speech that is difficult to understand. This work aims at enhancing the intelligibility of dysarthric speech towards developing an effective speech therapy tool. In this therapy tool, the enhanced speech is used to provide delayed auditory feedback to instill confidence in the patients, so that they can gradually improve their speech intelligibility through relearning. Feature level transformation techniques based on linear predictive coding (LPC) coefficient mapping and frequency warping of LPC poles are investigated in this work. Speech utterances with mild and moderate dysarthria from the Nemours dataset are used to study the effectiveness of the proposed algorithms. The quality of the transformed speech is evaluated using subjective and objective measures. A significant improvement in the intelligibility of speech was observed. Our method could therefore be used to enhance the effectiveness of speech therapy by encouraging dysarthric patients to talk more, thus helping in their faster rehabilitation.
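
A hedged sketch of the pole frequency-warping idea: the LPC polynomial of a frame is factored into poles, each pole's angle (which encodes formant frequency) is moved by a warping function, and the polynomial is rebuilt. The simple linear warping factor alpha is an illustrative assumption; the paper's actual mapping and its LPC coefficient mapping stage are not reproduced here.

```python
# Frequency warping of LPC poles (illustrative linear warping of pole angles).
import numpy as np

def warp_lpc_poles(lpc_coeffs, alpha=0.95):
    """lpc_coeffs: [1, a1, ..., ap] of an all-pole model; returns warped coefficients."""
    poles = np.roots(lpc_coeffs)                                     # poles of 1/A(z)
    warped = np.abs(poles) * np.exp(1j * np.angle(poles) * alpha)    # scale pole angles
    return np.real(np.poly(warped))                                  # rebuild polynomial coefficients
```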


IEEE India Conference | 2015

Improving the performance of continuous non-invasive estimation of blood pressure using ECG and PPG

S. Sree Niranjanaa Bose; C. Santhosh Kumar

Blood pressure (BP) is one of the vital signs for assessing the cardiovascular health of a person. In recent years, continuous non-invasive monitoring of BP has been of great interest in routine and critical bedside monitoring. Previous studies have shown that pulse transit time (PTT), the time taken by the pressure wave to travel between two arterial locations, can be a potential indicator of BP changes. However, hemodynamic factors (HDF) and regulatory factors (RF) also influence the changes in BP. In this paper, we propose a model that uses PTT together with hemodynamic and regulatory factors derived from the photoplethysmogram (PPG) and electrocardiogram (ECG) for better estimation of BP compared to the baseline model using PTT alone. All experiments in this work were performed using ECG, PPG and arterial blood pressure waveforms from the Multi-parameter Intelligent Monitoring in Intensive Care (MIMIC) II database. BP was estimated using linear regression, and its coefficients were calculated for each subject. In comparison to the baseline system using PTT alone, the system using PTT, HDF and RF as the input parameters reduces the mean absolute error (MAE) and the root mean square error (RMSE) by 6.36% and 4.98% absolute for systolic BP, and by 12.28% and 28.23% absolute respectively for diastolic BP. The results suggest that the quality of BP estimation using PTT, HDF and RF is improved compared to the baseline system using PTT alone.
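
The regression step is straightforward to sketch: systolic or diastolic BP is fit as a linear function of PTT plus the hemodynamic and regulatory features, with one set of coefficients per subject, as stated above. The feature names below are placeholders.

```python
# Per-subject linear regression of BP on PTT, hemodynamic (HDF) and
# regulatory (RF) features.
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_bp_model(ptt, hdf, rf, bp_ref):
    """ptt: (n,); hdf: (n, k1); rf: (n, k2); bp_ref: (n,) reference BP values."""
    X = np.column_stack([ptt, hdf, rf])
    return LinearRegression().fit(X, bp_ref)    # coefficients are per subject

# Usage (hypothetical arrays): sbp_model = fit_bp_model(ptt, hdf, rf, sbp)
#   predicted = sbp_model.predict(np.column_stack([ptt_new, hdf_new, rf_new]))
```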


Systems Communications | 2014

Towards improving the performance of speaker recognition systems

Neethu Johnson; Kuruvachan K. George; C. Santhosh Kumar; P. C. Reghu Raj

This paper studies the contribution of different phones in speech data towards improving the performance of text/language independent speaker recognition systems. This work is motivated by the fact that the removal of silence segments from the speech data improves the system performance significantly, as silence does not contain any speaker-specific information. It is also clear from the literature that not all phones in the speech data contain an equal amount of speaker-specific information, and the performance of speaker recognition systems depends on this information. In addition to the silence segments, our work empirically finds 18 other diluent phones that have minimal speaker discrimination capability. We propose a preprocessing stage that recursively identifies the set of non-informative phones and removes them along with the silence segments. Results show that using the preprocessed data, with these phones removed, in a state-of-the-art i-vector system outperforms the baseline i-vector system. We report absolute improvements of 1%, 1%, 2%, 2% and 1% in EER for test sets collected through the Digital Voice Recorder, Headset, Mobile Phone 1, Mobile Phone 2 and Tablet PC channels respectively on the IITG-MV database.
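
The preprocessing stage reduces to dropping frames whose aligned phone label falls in the non-informative set (silence plus the diluent phones found empirically); a minimal sketch follows. The label set used in the usage line is a placeholder, not the actual list from the paper.

```python
# Drop frames whose phone label is in the non-informative (diluent) set
# before building the i-vector system's features.
import numpy as np

def remove_diluent_frames(features, phone_labels, diluent_phones):
    """features: (n_frames, d); phone_labels: length-n_frames sequence of labels."""
    keep = np.array([p not in diluent_phones for p in phone_labels])
    return features[keep]

# Usage (hypothetical labels): X_clean = remove_diluent_frames(X, labels, {"sil", "sp"})
```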


Systems Communications | 2014

Random forest algorithm for improving the performance of speech/non-speech detection

Sincy V. Thambi; K. T. Sreekumar; C. Santhosh Kumar; P. C. Reghu Raj

Speech/non-speech detection (SND) distinguishes between speech and non-speech segments in recorded audio and video documents. SND systems can help reduce the storage space required when only the speech segments of the audio documents are needed, for example for content analysis, spoken language identification, etc. In this work, we experimented with the use of time domain, frequency domain and cepstral domain features for short-time frames of 20 ms, along with their mean and standard deviation over segments of 200 ms. We then analysed whether selecting a subset of the features can help improve the performance of the SND system. Towards this, we experimented with different feature selection algorithms and observed that correlation-based feature selection gave the best results. Further, we experimented with different decision tree classification algorithms and note that the random forest algorithm outperformed the other decision tree algorithms. We further improved the SND system performance by smoothing the decisions over 5 segments of 200 ms each. Our baseline system has 272 features and a classification accuracy of 94.45%, while the final system with 8 features has a classification accuracy of 97.80%.
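
The final system shape described above (a small selected feature subset feeding a random forest) can be sketched as below. scikit-learn has no built-in correlation-based feature selection, so a simple univariate selector stands in for it here, and the parameter values are illustrative rather than the paper's.

```python
# Speech/non-speech detection pipeline sketch: feature selection down to
# 8 features, then a random forest classifier.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import make_pipeline

snd_model = make_pipeline(
    SelectKBest(mutual_info_classif, k=8),            # stand-in for CFS
    RandomForestClassifier(n_estimators=100, random_state=0),
)
# Usage (hypothetical data): snd_model.fit(X_train, y_train)
#                            y_pred = snd_model.predict(X_test)
```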


International Conference on Acoustics, Speech, and Signal Processing | 2010

Tuning phone decoders for language identification

C. Santhosh Kumar; Haizhou Li; Rong Tong; Pavel Matejka; Lukas Burget; Jan Cernocky

The phonotactic approach, phone recognition followed by language modeling, is one of the most popular approaches to language identification (LID). In this work, we explore how the language identification accuracy of a phone decoder can be enhanced by varying the acoustic resolution of the phone decoder, and subsequently how multiresolution versions of the same decoder can be integrated to improve the LID accuracy. We use mutual information to select the optimum set of phones for a specific acoustic resolution. Further, we propose strategies for building multilingual systems suitable for LID applications, and subsequently fine-tune these systems to enhance the overall accuracy.
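
The phone-selection idea can be sketched with mutual information between per-utterance phone statistics and the language label; only the top-ranked phones are retained for a given acoustic resolution. Using raw per-utterance phone counts as the statistic is an assumption for illustration.

```python
# Rank phones by mutual information with the language label and keep the top n.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def select_phones(phone_counts, languages, n_keep):
    """phone_counts: (n_utts, n_phones) counts; languages: (n_utts,) labels."""
    mi = mutual_info_classif(phone_counts, languages, discrete_features=True)
    return np.argsort(mi)[::-1][:n_keep]       # indices of the most informative phones
```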

Collaboration


Dive into C. Santhosh Kumar's collaboration.

Top Co-Authors

R. Gopinath, Amrita Vishwa Vidyapeetham
Anand Kumar, Amrita Institute of Medical Sciences and Research Centre
K. T. Sreekumar, Amrita Vishwa Vidyapeetham
K. Arun Das, Amrita Vishwa Vidyapeetham
A Anand Kumar, Amrita Vishwa Vidyapeetham
Anju Prabha, Amrita Vishwa Vidyapeetham
H Haritha, Amrita Vishwa Vidyapeetham