Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Anil Kumar Vuppala is active.

Publication


Featured research published by Anil Kumar Vuppala.


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Vowel Onset Point Detection for Low Bit Rate Coded Speech

Anil Kumar Vuppala; Jainath Yadav; Saswat Chakrabarti; K. S. Rao

In this paper, we propose a method for detecting vowel onset points (VOPs) in low bit rate coded speech. The VOP is the instant at which the onset of a vowel takes place in the speech signal. VOPs play an important role in applications such as consonant-vowel (CV) unit recognition and speech rate modification. The proposed VOP detection method is based on the spectral energy present in the glottal closure region of the speech signal. The speech coders considered in this study are Global System for Mobile Communications (GSM) full rate, code-excited linear prediction (CELP), and mixed-excitation linear prediction (MELP). The TIMIT database and CV units collected from a broadcast news corpus are used for evaluation. Performance of the proposed method is compared with an existing method, which uses a combination of evidence from the excitation source, spectral peak energy, and modulation spectrum. The proposed VOP detection method shows a significant improvement in performance over the existing method for both clean and coded speech. The effectiveness of the proposed VOP detection method is also analyzed for CV recognition by using the VOP as an anchor point.
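A minimal sketch of the core idea, assuming glottal closure instants (GCIs) are already available (e.g., from a zero-frequency filter) and that a short window after each GCI approximates the glottal closure region; the window length, smoothing, and peak-picking thresholds below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.ndimage import uniform_filter1d

def vop_evidence_from_gci(speech, gci_samples, fs, win_ms=3.0):
    """Illustrative VOP evidence: spectral energy in a short window
    following each glottal closure instant (GCI).

    speech      : 1-D speech signal
    gci_samples : 1-D array of GCI locations (sample indices)
    """
    gci_samples = np.asarray(gci_samples)
    win = int(win_ms * 1e-3 * fs)
    energy = np.zeros(len(gci_samples))
    for i, g in enumerate(gci_samples):
        seg = speech[g:g + win]
        if len(seg) < win:
            break
        spec = np.abs(np.fft.rfft(seg * np.hamming(win)))
        energy[i] = np.sum(spec ** 2)            # spectral energy in the closure region
    evidence = uniform_filter1d(energy, size=5)  # smooth the per-GCI contour
    evidence /= evidence.max() + 1e-12
    rise = np.diff(evidence, prepend=evidence[0])
    # Sharp rises in the energy contour are taken as candidate VOPs.
    peaks, _ = find_peaks(rise, height=0.1, distance=5)
    return gci_samples[peaks]
```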


Speech Communication | 2013

Non-uniform time scale modification using instants of significant excitation and vowel onset points

K. Sreenivasa Rao; Anil Kumar Vuppala

In this paper, a non-uniform time scale modification (TSM) method is proposed for increasing or decreasing the speech rate. The proposed method modifies the durations of vowel and pause segments by different modification factors. Vowel segments are modified by factors based on their identities, and pause segments by uniform factors based on the desired speaking rate. Consonant and transition (consonant-to-vowel) segments are not modified in the proposed TSM. The modification factors are derived from an analysis of slow and fast speech collected from professional radio artists. In the proposed TSM method, vowel onset points (VOPs) are used to mark the consonant, transition, and vowel regions, and instants of significant excitation (ISE) are used to perform the required TSM. VOPs indicate the instants at which the onsets of vowels take place. ISEs, also known as epochs, indicate the instants of glottal closure during voiced speech and some random excitations, such as burst onsets, during non-voiced speech. In this work, VOPs are determined using multiple sources of evidence from the excitation source, spectral peaks, modulation spectrum, and uniformity in epoch intervals. The ISEs are determined using a zero-frequency filter method. The performance of the proposed non-uniform TSM scheme is compared with uniform and existing non-uniform TSM schemes based on epoch-based and time-domain pitch synchronous overlap and add (TD-PSOLA) methods.
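A rough sketch of the non-uniform idea, using librosa's phase-vocoder time stretching as a stand-in for the epoch-based (ISE) modification described in the paper; the segment labels, the factor values, and the helper name `nonuniform_tsm` are illustrative assumptions.

```python
import numpy as np
import librosa

def nonuniform_tsm(speech, fs, segments, vowel_factor=1.4, pause_factor=1.8):
    """Non-uniform TSM: stretch only vowel and pause segments.

    segments : list of (start_sec, end_sec, label), label in
               {'vowel', 'pause', 'consonant', 'transition'}
    A factor > 1 lengthens the segment (slower speech), < 1 shortens it.
    """
    out = []
    for start, end, label in segments:
        chunk = speech[int(start * fs):int(end * fs)]
        if label == 'vowel':
            rate = 1.0 / vowel_factor        # librosa: rate > 1 shortens audio
        elif label == 'pause':
            rate = 1.0 / pause_factor
        else:
            rate = 1.0                       # consonants/transitions left untouched
        out.append(librosa.effects.time_stretch(chunk, rate=rate) if rate != 1.0 else chunk)
    return np.concatenate(out)
```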


National Conference on Communications | 2012

IITKGP-MLILSC speech database for language identification

Sudhamay Maity; Anil Kumar Vuppala; K. Sreenivasa Rao; Dipanjan Nandi

In this paper, we introduce a speech database consisting of 27 Indian languages for analyzing the language-specific information present in speech. In the context of Indian languages, a systematic analysis of various speech features and classification models for automatic language identification has not been performed, because of the lack of a proper speech corpus covering the majority of Indian languages. With this motivation, we have initiated the task of developing a multilingual speech corpus in Indian languages. In this paper, spectral features are explored for investigating the presence of language-specific information. Mel-frequency cepstral coefficients (MFCCs) and linear predictive cepstral coefficients (LPCCs) are used for representing the spectral information. Gaussian mixture models (GMMs) are developed to capture the language-specific information present in the spectral features. The performance of the language identification system is analyzed for speaker-dependent and speaker-independent cases. The recognition performance is observed to be 96% and 45% for the speaker-dependent and speaker-independent cases, respectively.
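A compact sketch of the MFCC + GMM pipeline described above, using librosa and scikit-learn; the file-path inputs, mixture count, sampling rate, and MFCC settings are placeholders, not the paper's configuration.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(path, sr=8000, n_mfcc=13):
    """Frame-level MFCC features for one utterance."""
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # shape (frames, n_mfcc)

def train_language_models(train_files_by_lang, n_components=64):
    """Train one GMM per language on pooled training features."""
    models = {}
    for lang, files in train_files_by_lang.items():
        feats = np.vstack([mfcc_features(f) for f in files])
        gmm = GaussianMixture(n_components=n_components, covariance_type='diag')
        models[lang] = gmm.fit(feats)
    return models

def identify(models, test_file):
    """Pick the language whose GMM gives the highest average frame log-likelihood."""
    feats = mfcc_features(test_file)
    return max(models, key=lambda lang: models[lang].score(feats))
```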


International Journal of Speech Technology | 2013

Vowel onset point detection for noisy speech using spectral energy at formant frequencies

Anil Kumar Vuppala; K. Sreenivasa Rao

In this paper, we propose a method for robust detection of vowel onset points (VOPs) from noisy speech. The proposed VOP detection method exploits the spectral energy at the formant frequencies of the speech segments present in the glottal closure region. In this work, formants are extracted using the group delay function, and glottal closure instants are extracted using a zero-frequency filter based method. Performance of the proposed VOP detection method is compared with an existing method, which uses a combination of evidence from the excitation source, spectral peak energy, and modulation spectrum. Speech data from the TIMIT database and noise samples from the NOISEX database are used for analyzing the performance of the VOP detection methods. A significant improvement in VOP detection performance is observed with the proposed method compared to the existing method.
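A simplified numpy sketch of zero-frequency filtering for epoch (glottal closure instant) extraction, the component this and the related papers rely on; the trend-removal window, the number of mean-subtraction passes, and the zero-crossing rule are standard choices, not necessarily the paper's exact settings.

```python
import numpy as np

def zero_frequency_epochs(speech, fs, avg_pitch_ms=5.0):
    """Epoch (glottal closure instant) estimates via zero-frequency filtering.

    The differenced speech is passed through a cascade of two zero-frequency
    resonators (equivalent to four cumulative sums), the polynomial trend is
    removed by repeated local-mean subtraction over roughly one average pitch
    period, and the positive-going zero crossings of the result are the epochs.
    """
    x = np.diff(speech.astype(float), prepend=float(speech[0]))
    y = x
    for _ in range(4):                  # two 0-Hz resonators = four integrations
        y = np.cumsum(y)
    win = max(3, int(avg_pitch_ms * 1e-3 * fs) | 1)   # odd window length
    zff = y
    for _ in range(3):                  # successive mean removal kills the trend
        trend = np.convolve(zff, np.ones(win) / win, mode='same')
        zff = zff - trend
    epochs = np.where((zff[:-1] < 0) & (zff[1:] >= 0))[0] + 1
    return epochs
```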


Circuits Systems and Signal Processing | 2012

Spotting and Recognition of Consonant-Vowel Units from Continuous Speech Using Accurate Detection of Vowel Onset Points

Anil Kumar Vuppala; K. Sreenivasa Rao; Saswat Chakrabarti

In this paper, we propose an efficient approach to spotting and recognition of consonant-vowel (CV) units from continuous speech using accurate detection of vowel onset points (VOPs). Existing methods for VOP detection suffer from limited accuracy, spurious VOPs, and missed VOPs. The proposed VOP detection is designed to overcome most of the shortcomings of the existing methods and to provide accurate detection of VOPs for improving the performance of spotting and recognition of CV units. The proposed VOP detection method operates in two levels. At the first level, VOPs are detected by combining complementary evidence from the excitation source, spectral peaks, and modulation spectrum. At the second level, hypothesized VOPs are verified (genuine or spurious) and their positions are corrected using the uniform epoch intervals present in the vowel regions. The spotted CV units are recognized using a two-stage CV recognizer, which consists of hidden Markov models (HMMs) at the first stage for recognizing the vowel category of a CV unit and support vector machines (SVMs) at the second stage for recognizing the consonant category. Performance of spotting and recognition of CV units from continuous speech is evaluated using a Telugu broadcast news speech corpus.
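A schematic sketch of the first level only: summing normalized evidence contours and picking well-separated peaks as hypothesized VOPs. The evidence extraction itself (excitation source, spectral peaks, modulation spectrum) and the second-level epoch-interval verification are not shown, and the smoothing and thresholds are illustrative assumptions.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.ndimage import gaussian_filter1d

def hypothesize_vops(evidences, frame_shift_ms=10.0, min_gap_ms=50.0):
    """First-level VOP hypotheses: sum the normalized evidence contours,
    smooth the sum, and pick well-separated peaks.

    evidences : list of 1-D arrays, one value per frame (e.g. evidence from
                the excitation source, spectral peaks, modulation spectrum).
    Returns hypothesized VOP times in seconds.
    """
    combined = np.zeros_like(np.asarray(evidences[0], dtype=float))
    for ev in evidences:
        ev = np.asarray(ev, dtype=float)
        combined += ev / (np.max(np.abs(ev)) + 1e-12)   # normalize each contour
    combined = gaussian_filter1d(combined, sigma=2)
    min_gap_frames = max(1, int(min_gap_ms / frame_shift_ms))
    peaks, _ = find_peaks(combined, distance=min_gap_frames,
                          height=0.3 * combined.max())
    return peaks * frame_shift_ms / 1000.0
```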


International Conference on Signal Processing | 2010

Robust speaker recognition on mobile devices

K. Sreenivasa Rao; Anil Kumar Vuppala; Saswat Chakrabarti; Leena Dutta

In this paper, we explore different models and methods for improving the performance of a text-independent speaker identification system for mobile devices. The major issues in speaker recognition for mobile devices are (i) the presence of varying background environments, (ii) the effect of speech coding introduced by the mobile device, and (iii) impairments due to the wireless channel. We propose multi-SNR multi-environment speaker models and speech enhancement (preprocessing) methods for improving the performance of the speaker recognition system in the mobile environment. For this study, we have simulated five different background environments (car, factory, high-frequency, pink, and white Gaussian noise) using NOISEX data. Speaker recognition studies are carried out on the TIMIT, cellular, and microphone speech databases. Autoassociative neural network models are explored for developing the multi-SNR multi-environment speaker models. The results indicate that the proposed speaker models and speech enhancement preprocessing methods improve speaker recognition performance in the presence of different noisy environments.
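A toy sketch of the multi-SNR multi-environment modelling idea, approximating the autoassociative neural network (AANN) with a scikit-learn MLP autoencoder trained on MFCCs pooled across noise types and SNR levels; the feature setup, network shape, and reconstruction-error scoring are assumptions for illustration, not the paper's configuration.

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPRegressor

def mfcc(path, sr=8000):
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

def train_speaker_models(train_files_by_speaker):
    """One autoassociative (autoencoder-style) model per speaker, trained on
    features pooled over several noise types and SNR levels."""
    models = {}
    for spk, files in train_files_by_speaker.items():
        feats = np.vstack([mfcc(f) for f in files])
        aann = MLPRegressor(hidden_layer_sizes=(38, 4, 38),   # narrow compression layer
                            activation='tanh', max_iter=300)
        models[spk] = aann.fit(feats, feats)   # learn to reconstruct the input
    return models

def identify_speaker(models, test_file):
    """The identified speaker is the model with the smallest reconstruction error."""
    feats = mfcc(test_file)
    def err(spk):
        return np.mean((models[spk].predict(feats) - feats) ** 2)
    return min(models, key=err)
```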


International Journal of Speech Technology | 2011

Recognition of consonant-vowel (CV) units under background noise using combined temporal and spectral preprocessing

Anil Kumar Vuppala; K. Sreenivasa Rao; Saswat Chakrabarti; P. Krishnamoorthy; S. R. M. Prasanna

This paper proposes hybrid classification models and preprocessing methods for enhancing consonant-vowel (CV) recognition in the presence of background noise. Background noise is one of the major degradations in real-world environments and strongly affects the performance of speech recognition systems. In this work, combined temporal and spectral processing (TSP) methods are explored as preprocessing to improve CV recognition performance. The proposed CV recognition method is carried out in two levels to reduce the confusability among the large number of CV classes. In the first level, the vowel category of a CV unit is recognized, and in the second level, the consonant category is recognized. At each level, complementary evidence from hybrid models consisting of support vector machines (SVMs) and hidden Markov models (HMMs) is combined to enhance the recognition performance. Performance of the proposed CV recognition system is evaluated on a Telugu broadcast database for white and vehicle noise. The proposed preprocessing methods and hybrid classification models improve the recognition performance compared to existing methods.


International Conference on Contemporary Computing | 2010

Effect of Speech Coding on Recognition of Consonant-Vowel (CV) Units

Anil Kumar Vuppala; Saswat Chakrabarti; K. Sreenivasa Rao

The rapid growth of mobile users is creating a great deal of interest in the development of robust speech systems for wireless environments. The major challenges in adapting present speech processing technology to mobile wireless systems are: (1) the effect of varying background conditions in the mobile environment, (2) degradations introduced by speech coders, and (3) errors introduced by wireless radio channels. In this paper, we analyze the effect of different low bit rate speech coders on the recognition of consonant-vowel (CV) units in Indian languages using monolithic SVM and hybrid HMM-SVM models. The speech coders considered in this work are GSM full rate (ETSI 06.10), CELP (FS-1016), and MELP (TI 2.4 kbps). The results show that coding has a significant effect on the recognition of CV units.


International Journal of Signal and Imaging Systems Engineering | 2013

Improved speaker identification in wireless environment

Anil Kumar Vuppala; K. Sreenivasa Rao; Saswat Chakrabarti

The increasing use of wireless mobile systems is creating a great deal of interest in the development of robust speech systems for wireless environments. The major degradations in the wireless environment are the acoustic environment, speech coding, and transmission errors. In this paper, we address the problem of speaker identification (SI) from coded and cellular speech. Since speaker-specific characteristics are preserved in steady vowel segments of speech even after coding, the features extracted from these steady vowel regions are used to build the SI system. We propose a method to determine the steady vowel region from the speech signal using vowel onset points and epochs. SI studies are carried out using cellular, coded, and microphone speech databases. Autoassociative neural network (AANN) models are explored for developing the SI models. The speech coders considered in this work are GSM, CELP, and MELP. A significant improvement in the performance of the SI system is observed with the proposed approach.


IEEE Students' Technology Symposium | 2010

Continuous digit recognition in mobile environment

Sabin Kafley; Anil Kumar Vuppala; Arun Chauhan; K. Sreenivasa Rao

This paper deals with the recognition of digits uttered in a continuous manner in a mobile environment (e.g., speaking a telephone number). In the mobile environment, the major issues are background noise, coding, and channel impairments. Most existing work deals only with the recognition of noisy speech; in this paper, we explore the recognition of speech under noisy coded conditions and show that recognition performance is improved by spectral processing techniques. We also show that speech coding does not significantly affect the performance of digit recognition. Finally, we compare the spectral methods used for preprocessing and conclude that the minimum mean square error (MMSE) method [1] produces better results than the spectral subtraction method [2].
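A bare-bones sketch of the spectral subtraction preprocessing mentioned above (the MMSE enhancer is more involved and omitted); the noise estimate from the first few frames and the over-subtraction/floor settings are illustrative choices rather than the paper's parameters.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs, noise_frames=6, alpha=2.0, floor=0.02):
    """Basic magnitude spectral subtraction.

    The noise magnitude spectrum is estimated from the first few frames
    (assumed speech-free), scaled by an over-subtraction factor `alpha`,
    subtracted from each frame, and floored to avoid negative magnitudes;
    the noisy phase is reused for reconstruction.
    """
    f, t, X = stft(noisy, fs=fs, nperseg=256)
    mag, phase = np.abs(X), np.angle(X)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    clean_mag = np.maximum(mag - alpha * noise_mag, floor * noise_mag)
    _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=256)
    return enhanced
```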

Collaboration


Dive into Anil Kumar Vuppala's collaborations.

Top Co-Authors

Hari Krishna Vydana, International Institute of Information Technology
K. Sreenivasa Rao, Indian Institute of Technology Kharagpur
Suryakanth V. Gangashetty, International Institute of Information Technology
Saswat Chakrabarti, Indian Institute of Technology Kharagpur
Krishna Gurugubelli, International Institute of Information Technology
K. N. R. K. Raju Alluri, International Institute of Information Technology
Ramakrishna Thirumuru, International Institute of Information Technology
Manish Shrivastava, International Institute of Information Technology
Ravi Kumar Vuddagiri, International Institute of Information Technology
Sivanand Achanta, International Institute of Information Technology