S. R. Mahadeva Prasanna
Indian Institute of Technology Guwahati
Publications
Featured research published by S. R. Mahadeva Prasanna.
Speech Communication | 2006
S. R. Mahadeva Prasanna; Cheedella S. Gupta; B. Yegnanarayana
In this paper, we demonstrate through different experimental studies that the excitation component of speech can be exploited for speaker recognition. The linear prediction (LP) residual is used as a representation of the excitation information in speech. The speaker-specific information in the excitation of voiced speech is captured using autoassociative neural network (AANN) models. The decrease in error during training and the recognition of correct speakers during testing demonstrate that the excitation component of speech contains speaker-specific information and that it is indeed captured by the AANN models. A study of the effect of different LP orders demonstrates that, for speech sampled at 8 kHz, the LP residual extracted using an LP order in the range 8-20 best represents the speaker-specific excitation information. It is also demonstrated that the proposed speaker recognition system using excitation information and AANN models requires significantly less data during both training and testing than a speaker recognition system using vocal tract information. Finally, speaker recognition studies on the NIST 2002 database demonstrate that, even though the recognition performance from the excitation information alone is poor, combining it with evidence from the vocal tract information yields a significant improvement in performance. This result demonstrates the complementary nature of the excitation component of speech.
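The LP residual used here as the excitation representation is simply the prediction error left after inverse-filtering the speech with its LP coefficients. A minimal sketch of that step, using the autocorrelation method of LP analysis (the LP order of 10 and the frame handling are illustrative assumptions, not values from the paper):

```python
# Sketch: extracting the LP residual of a speech frame as a proxy for the
# excitation source. Order and framing here are illustrative assumptions.
import numpy as np
from scipy.linalg import solve_toeplitz

def lp_coefficients(frame, order=10):
    """Autocorrelation-method LPC: solve the normal equations R a = r."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Toeplitz system: sum_k a_k R(|i-k|) = R(i), i = 1..order
    return solve_toeplitz((r[:order], r[:order]), r[1:order + 1])

def lp_residual(frame, order=10):
    """Residual e[n] = s[n] - sum_k a_k s[n-k] (the prediction error)."""
    a = lp_coefficients(frame, order)
    pred = np.zeros_like(frame)
    for k, ak in enumerate(a, start=1):
        pred[k:] += ak * frame[:-k]
    return frame - pred
```

For a strongly predictable signal the residual energy is far below the signal energy, which is why most of the vocal tract information is removed and the residual mainly carries source information.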
IEEE Transactions on Audio, Speech, and Language Processing | 2009
S. R. Mahadeva Prasanna; B. V. Sandeep Reddy; P. Krishnamoorthy
The vowel onset point (VOP) is the instant at which the onset of a vowel takes place during speech production. Significant changes occur in the energies of the excitation source, the spectral peaks, and the modulation spectrum at the VOP. This paper demonstrates the independent use of each of these three energies in detecting VOPs. Since each energy represents a different aspect of speech production, they may carry complementary information about the VOP; the individual evidences are therefore combined for detecting VOPs. The error rates, measured as the ratio of missing and spurious detections to the total number of VOPs and evaluated on sentences taken from the TIMIT database, are 6.92%, 8.8%, 6.13%, and 4.0% for source, spectral peaks, modulation spectrum, and combined information, respectively. The combined method thus improves VOP detection by 2.13% over the best performing individual method.
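The combination step described above can be sketched as follows: each detector produces a per-sample evidence signal, the signals are peak-normalized and summed, and peaks of the combined signal become VOP hypotheses. The normalization, equal weighting, and 50 ms minimum peak separation are illustrative assumptions, not details from the paper:

```python
# Sketch of evidence combination for VOP detection: normalize each evidence
# signal, sum them, and pick peaks of the combined signal. The 50 ms minimum
# separation between hypotheses is an assumed, illustrative value.
import numpy as np
from scipy.signal import find_peaks

def combine_vop_evidence(evidences, fs=8000, min_sep_ms=50):
    norm = [e / (np.max(np.abs(e)) + 1e-12) for e in evidences]
    combined = np.sum(norm, axis=0)
    peaks, _ = find_peaks(combined, distance=int(fs * min_sep_ms / 1000))
    return peaks, combined
```

Because the three evidences peak at slightly different instants for a true VOP but rarely agree on a spurious one, summing them reinforces genuine onsets and suppresses isolated false alarms.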
international conference on acoustics, speech, and signal processing | 2002
B. Yegnanarayana; S. R. Mahadeva Prasanna; K. Sreenivasa Rao
This paper proposes an approach for processing speech from multiple microphones to enhance speech degraded by noise and reverberation. The approach exploits features of the excitation source in speech production. In particular, the characteristics of voiced speech can be used to derive a coherently added signal from the linear prediction (LP) residuals of the degraded speech data from the different microphones. A weight function is derived from the coherently added signal. For coherent addition, the time delay between a pair of microphones is estimated using the source information present in the LP residual. The enhanced speech is generated by exciting a time-varying all-pole filter with the weighted LP residual.
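The delay estimation needed for coherent addition can be sketched with a plain cross-correlation; in the paper the correlation is computed on the LP residuals of the two channels, while here generic signals stand in for them:

```python
# Sketch: estimating the inter-microphone time delay by cross-correlating
# two channel signals (in the paper, their LP residuals).
import numpy as np

def estimate_delay(x, y):
    """Return the lag (in samples) by which y is delayed relative to x."""
    corr = np.correlate(y, x, mode="full")
    return int(np.argmax(corr)) - (len(x) - 1)
```

Correlating residuals rather than the raw waveforms helps because the residual's strong peaks at glottal excitation instants survive reverberation better than the smoother vocal tract envelope, giving a sharper correlation peak.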
Iete Technical Review | 2012
Soyuj Kumar Sahoo; Tarun Choubisa; S. R. Mahadeva Prasanna
Abstract This paper provides a review of multimodal biometric person authentication systems. The paper begins with an introduction to biometrics, their advantages and disadvantages, and authentication systems that use them. A brief discussion of the selection criteria for different biometrics is also given. This is followed by a discussion of the classification of biometric systems and their strengths and limitations. Detailed descriptions of the multimodal biometric person authentication system, its different modes of operation, and integration scenarios are also provided. Considering the importance of information fusion in the multi-biometric approach, a separate section is dedicated to the different levels of fusion (sensor-level, feature-level, score-level, rank-level, and abstract-level) and to the different rules of fusion. This paper also presents an overview of performance parameters and error rates for biometric person authentication systems. A separate section is devoted to recent trends in the biometrics field, namely adaptive biometric systems, analysis of complementary and supplementary information, and physiological biometrics. The paper concludes with a discussion of the issues currently hindering the deployment of multimodal biometric person authentication systems and the possible scope for future work.
Iete Technical Review | 2009
H. S. Jayanna; S. R. Mahadeva Prasanna
Abstract A speaker recognition system may be viewed as working in four stages: analysis, feature extraction, modeling, and testing. This paper gives an overview of the major techniques developed in each of these stages. Such a review helps in understanding the developments that have taken place in each stage as well as the available choices of techniques, along with their relative merits and demerits. A comparative study of different techniques is given at the end of each section to justify the choices made in state-of-the-art speaker recognition systems. The paper concludes with a discussion of possible future directions for the development of techniques in each stage.
ieee region 10 conference | 2008
P. Kartik; S. R. Mahadeva Prasanna; R. V. S. S. Vara Prasad
In this work, we present a multimodal biometric system using speech and signature features. A speaker recognition system is built using Mel-frequency cepstral coefficients (MFCC) for feature extraction and vector quantization (VQ) for modeling. An offline signature recognition system is also built, using vertical and horizontal projection profiles (VPP and HPP) and the discrete cosine transform (DCT) for feature extraction. A multimodal biometric database with speech and signature data collected from 30 users is used for the study. A multimodal biometric system is demonstrated using score-level fusion of the speaker and signature recognition systems, with the sum rule used to fuse the biometric scores. Experimental results show the efficacy of the multimodal biometric system using speech and signature features when the biometric data are affected by noise.
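Score-level fusion with the sum rule can be sketched in a few lines. The abstract specifies only the sum rule; the min-max normalization and equal weighting below are common companions to it but are assumptions here:

```python
# Sketch of score-level fusion via the sum rule: normalize each modality's
# scores to [0, 1] (min-max normalization is an assumed choice) and average.
import numpy as np

def min_max_norm(scores):
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

def sum_rule_fusion(speech_scores, signature_scores):
    return (min_max_norm(speech_scores) + min_max_norm(signature_scores)) / 2
```

Normalizing before summing matters because speaker and signature matchers produce scores on incommensurate scales; without it, one modality would silently dominate the fused decision.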
international symposium on neural networks | 2002
C.S. Gupta; S. R. Mahadeva Prasanna; B. Yegnanarayana
We demonstrate the usefulness of excitation source information for text-dependent speaker verification. The nature of vibration of the vocal folds may be unique to a given speaker. This can be studied by considering vowels, since the excitation in this case is due only to glottal vibration. The linear prediction (LP) residual contains mostly source information. We propose autoassociative neural network (AANN) models for capturing the speaker-specific source information present in the LP residual. Speaker models are built for each vowel to study the extent of speaker information in each vowel. Using this knowledge, an online speaker verification system is developed. This study demonstrates that the excitation source indeed contains significant speaker information, which can be exploited for speaker recognition tasks.
international conference on acoustics, speech, and signal processing | 2002
S. R. Mahadeva Prasanna; Jinu Mariam Zachariah
Sound units in many languages are syllabic in nature, and frequently used syllables are of the consonant-vowel (CV) type. The vowel onset point (VOP) is an important event in CV units. Knowledge of VOPs helps in many applications such as speech recognition, speaker recognition, speech enhancement, begin-end detection, segmentation of speech into vowel/nonvowel-like units, and finding the duration of vowels. In this paper we describe parameters and features useful for manually identifying the VOPs of different types of CV units. An automatic algorithm, motivated by the nature of production and perception of speech, is proposed for detecting VOPs in continuous speech. A speech signal is the result of exciting a time-varying vocal tract system with a time-varying excitation, and changes in both the source and system characteristics around the VOP are useful for its detection; in this paper we use the changes in the source characteristics. The performance of the proposed algorithm is evaluated using 25 sentences in which a total of 236 VOPs were identified manually; 216 of these are detected within a resolution of ±30 ms. Compared to the energy-based approach, VOP-based begin-end detection significantly improves the performance of a text-dependent speaker verification system. For a telephone database of 32 speakers consisting of 480 genuine
international conference on signal processing | 2004
K. Sri Rama Murty; S. R. Mahadeva Prasanna; B. Yegnanarayana
This paper demonstrates the presence of speaker-specific information in the residual phase using autoassociative neural network (AANN) models. The residual phase is extracted from the speech signal after eliminating the vocal tract information by linear prediction (LP) analysis. AANN models are used to capture the speaker-specific information present in the residual phase. The speaker recognition studies indicate that the residual phase contains significant speaker-specific information and that it is indeed captured by the AANN models. In this study we also demonstrate that, in voiced speech segments, the regions around the instants of glottal closure are more speaker-specific than other regions.
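Assuming the common definition of the residual phase as the cosine of the phase of the analytic signal of the LP residual, its extraction can be sketched as below; the residual itself is assumed to have been obtained earlier by inverse LP filtering:

```python
# Sketch: residual phase as the cosine of the analytic phase of the LP
# residual, i.e. the residual divided by its Hilbert envelope.
import numpy as np
from scipy.signal import hilbert

def residual_phase(residual):
    analytic = hilbert(residual)
    return np.real(analytic) / (np.abs(analytic) + 1e-12)
```

Dividing by the Hilbert envelope strips the amplitude information from the residual, so whatever speaker discrimination remains must come from the phase structure alone.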
Digital Signal Processing | 2016
Rajib Sharma; S. R. Mahadeva Prasanna
Abstract The objective of this work is to obtain meaningful time-domain components, or intrinsic mode functions (IMFs), of the speech signal using empirical mode decomposition (EMD), with reduced mode mixing and in a time-efficient manner. This work focuses on two aspects: first, extracting IMFs of the speech signal that better reflect its higher-frequency spectrum; and second, obtaining a better representation and distribution of the vocal tract resonances of the speech signal in its IMFs than standard EMD provides. To this effect, modifications are proposed to the EMD algorithm for processing speech signals, based on the critical role of the interpolation points (IPs) used for cubic spline interpolation in EMD. The effect of using different sets of IPs, other than the extrema of the residue as used in standard EMD, is analyzed. It is found that having more IPs is beneficial only up to a certain limit, after which the characteristic dyadic filterbank nature of EMD breaks down. For certain sets of IPs, these modified EMD processes perform better than EMD, giving better frequency separability between the IMFs and an enhanced representation of the higher-frequency content of the signal. A detailed study of the distribution of the formants in the IMFs of the speech signal is carried out using linear prediction (LP) analysis of the IMFs. It is found that the IMFs of the EMD variants have a far better distribution of the formant structure within them, with reduced overlap among their filter spectra, compared to standard EMD. Consequently, when applied to formant estimation of voiced speech using LP analysis, the IMFs of the modified EMD processes cumulatively exhibit superior performance to standard EMD, or to the speech signal itself, under both clean and noisy conditions.
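The sifting step at the heart of standard EMD, whose interpolation points this paper varies, can be sketched as follows. Real EMD iterates with a data-driven stopping criterion and then repeats on the residue to extract further modes; the fixed iteration count and minimum-extrema guard here are illustrative simplifications:

```python
# Minimal sketch of one EMD sifting pass: cubic-spline envelopes through the
# maxima and minima of the signal, with the mean envelope subtracted
# repeatedly. A fixed iteration count replaces the usual stopping criterion.
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift(x, n_iter=10):
    t = np.arange(len(x))
    h = np.asarray(x, dtype=float).copy()
    for _ in range(n_iter):
        maxima = argrelextrema(h, np.greater)[0]
        minima = argrelextrema(h, np.less)[0]
        if len(maxima) < 4 or len(minima) < 4:
            break  # too few extrema to fit the envelope splines
        upper = CubicSpline(maxima, h[maxima])(t)
        lower = CubicSpline(minima, h[minima])(t)
        h = h - (upper + lower) / 2.0
    return h  # candidate IMF; x - h is the residue for the next mode
```

The paper's modifications amount to choosing the spline's interpolation points differently from the extrema used above; the sketch makes clear why that choice shapes each IMF's frequency content.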