Mangesh S. Deshpande
Shri Guru Gobind Singhji Institute of Engineering and Technology
Publication
Featured research published by Mangesh S. Deshpande.
International Conference on Emerging Trends in Engineering and Technology | 2008
Mangesh S. Deshpande; Raghunath S. Holambe
This paper presents a closed-set, text-independent speaker identification system using continuous density hidden Markov models (CDHMMs). Each registered speaker has a separate HMM, which is trained using the Baum-Welch algorithm. The system performance has been studied for different system parameters, such as the number of states, the number of mixture components per state, and the amount of data required for training. An identification accuracy of 100% is achieved in experiments on the TIMIT database.
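The identification step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each speaker's HMM has already been trained (Baum-Welch is not shown), uses diagonal-covariance Gaussian mixtures per state, and scores a feature sequence with the forward algorithm in the log domain. The function names and the model dictionary layout are hypothetical.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    # log(0) is treated as -inf so that forbidden transitions are allowed
    return math.log(p) if p > 0 else NEG_INF

def log_sum_exp(values):
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_gaussian(x, mean, var):
    # Log density of a diagonal-covariance Gaussian at feature vector x
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - mu) ** 2 / v
                      for xi, mu, v in zip(x, mean, var))

def log_emission(x, mixture):
    # mixture: list of (weight, mean, var) components for one HMM state
    return log_sum_exp([safe_log(w) + log_gaussian(x, mu, var)
                        for w, mu, var in mixture])

def sequence_log_likelihood(obs, model):
    # Forward algorithm: log P(obs | model)
    pi, A, states = model["pi"], model["A"], model["states"]
    n = len(states)
    alpha = [safe_log(pi[j]) + log_emission(obs[0], states[j]) for j in range(n)]
    for x in obs[1:]:
        alpha = [log_sum_exp([alpha[i] + safe_log(A[i][j]) for i in range(n)])
                 + log_emission(x, states[j]) for j in range(n)]
    return log_sum_exp(alpha)

def identify_speaker(obs, models):
    # Closed-set decision: the registered speaker whose HMM scores highest
    return max(models, key=lambda name: sequence_log_likelihood(obs, models[name]))

def toy_model(offset):
    # Hypothetical two-state model with one-dimensional features
    return {
        "pi": [0.6, 0.4],
        "A": [[0.7, 0.3], [0.4, 0.6]],
        "states": [
            [(1.0, [offset], [1.0])],
            [(0.5, [offset + 1.0], [1.0]), (0.5, [offset + 2.0], [1.0])],
        ],
    }

models = {"speaker_a": toy_model(0.0), "speaker_b": toy_model(5.0)}
print(identify_speaker([[0.2], [1.1], [1.8], [0.4]], models))
```

The toy parameters are invented for illustration only. In a real system the observations would be multi-dimensional feature vectors extracted frame by frame from speech.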
International Conference on Emerging Trends in Engineering and Technology | 2009
Mangesh S. Deshpande; Raghunath S. Holambe
Linear source-filter models have been widely used by researchers as a front-end for speaker identification systems. These front-ends use cepstral features derived from the power spectrum of the speech signal. However, it is well known that a significant part of the acoustic information cannot be modeled by the linear source-filter model, and thus the need for nonlinear features becomes apparent. In this paper, an attempt is made to investigate the use of the phase function of the analytic signal for deriving a representation of the frequencies present in the speech signal. The main objective of the paper is to present a novel parameterization of speech that is based on the nonlinear AM-FM speaker model in the context of closed-set speaker identification. The proposed features measure the amount of amplitude and frequency modulation and attempt to model aspects of the speaker-related information that the commonly used linear source-filter model fails to capture. To evaluate the robustness of the proposed features for speaker identification, clean speech from the TIMIT database is combined with car noise and babble noise from the NOISEX-92 database. The proposed feature set provides significant improvements in identification accuracy over conventional features such as MFCC under mismatched training and testing environments. The results show that better speaker identification rates are attainable under mismatched conditions, especially at low signal-to-noise ratio (SNR).
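The noisy test conditions mentioned above are typically created by scaling a noise recording so that the mixture reaches a chosen SNR. The sketch below shows one common way to do this, assuming both signals are lists of samples at the same sampling rate. The abstract does not describe the exact mixing procedure, so the function names and the synthetic signals are illustrative assumptions.

```python
import math

def mean_power(samples):
    return sum(s * s for s in samples) / len(samples)

def mix_at_snr(speech, noise, snr_db):
    # Scale the noise so that 10*log10(P_speech / P_noise) equals snr_db
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise segment is silent")
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * v for s, v in zip(speech, noise)]

def measured_snr_db(speech, mixed):
    # SNR of the mixture, recovered from the known clean signal
    residual = [m - s for m, s in zip(mixed, speech)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))

# Synthetic stand-ins for a clean utterance and a noise recording
speech = [math.sin(0.1 * n) for n in range(1000)]
noise = [((n * 37) % 11 - 5) / 5 for n in range(1200)]
mixed = mix_at_snr(speech, noise, 0.0)
print(round(measured_snr_db(speech, mixed), 6))
```

Here the power is averaged over the whole utterance. Other conventions, such as measuring power over speech-active frames only, would give different scaling.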
Archive | 2012
Raghunath S. Holambe; Mangesh S. Deshpande
Advances in Non-Linear Modeling for Speech Processing includes advanced topics in non-linear estimation and modeling techniques along with their applications to speaker recognition. A non-linear aeroacoustic modeling approach is used to estimate the important fine-structure speech events, which are not revealed by the short-time Fourier transform (STFT). This aeroacoustic modeling approach provides the impetus for the high-resolution Teager energy operator (TEO). This operator is characterized by a time resolution that can track rapid signal energy changes within a glottal cycle. Cepstral features such as linear prediction cepstral coefficients (LPCC) and mel frequency cepstral coefficients (MFCC) are computed from the magnitude spectrum of the speech frame, and the phase spectrum is neglected. To overcome the problem of neglecting the phase spectrum, the speech production system can be represented as an amplitude modulation-frequency modulation (AM-FM) model. To demodulate the speech signal, that is, to estimate the amplitude envelope and instantaneous frequency components, the energy separation algorithm (ESA) and the Hilbert transform demodulation (HTD) algorithm are discussed. Different features derived using the above non-linear modeling techniques are used to develop a speaker identification system. Finally, it is shown that the fusion of speech production and speech perception mechanisms can lead to a robust feature set.
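As a rough illustration of Hilbert transform demodulation, the sketch below builds the analytic signal by zeroing the negative-frequency half of the spectrum, then reads the amplitude envelope from its magnitude and the instantaneous frequency from its phase increments. It is a textbook formulation under stated assumptions, not the book's code: a naive O(N²) DFT keeps the example dependency-free, and the test signal and function names are hypothetical.

```python
import cmath
import math

def dft(x, inverse=False):
    # Naive discrete Fourier transform, adequate for short illustrative signals
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def analytic_signal(x):
    # Keep DC (and Nyquist for even n), double positive bins, zero negative bins
    n = len(x)
    spectrum = dft(x)
    for k in range(1, n):
        if n % 2 == 0 and k == n // 2:
            continue
        if k < (n + 1) // 2:
            spectrum[k] *= 2
        else:
            spectrum[k] = 0
    return dft(spectrum, inverse=True)

def hilbert_demodulate(x, fs):
    z = analytic_signal(x)
    envelope = [abs(v) for v in z]
    # Phase increment between neighbouring samples, converted to Hz
    inst_freq = [cmath.phase(z[i + 1] * z[i].conjugate()) * fs / (2 * math.pi)
                 for i in range(len(z) - 1)]
    return envelope, inst_freq

# Toy AM signal: 16 Hz carrier, 2 Hz amplitude modulation, one second at 128 Hz
fs, n = 128.0, 128
true_env = [1 + 0.5 * math.cos(2 * math.pi * 2 * t / fs) for t in range(n)]
x = [a * math.cos(2 * math.pi * 16 * t / fs) for t, a in enumerate(true_env)]
envelope, inst_freq = hilbert_demodulate(x, fs)
print(round(envelope[0], 6), round(inst_freq[0], 6))
```

The toy signal is periodic within the analysis window, so the recovered envelope and frequency match the true values closely. Real speech would normally be band-pass filtered into individual resonances before demodulation.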
International Journal of Biometrics | 2011
Mangesh S. Deshpande; Raghunath S. Holambe
A robust feature set, Teager Energy Operator based Cepstral Coefficients (TEOCC), is proposed in this paper for the speaker identification task. The Admissible Wavelet Packet (AWP) transform and the Teager Energy Operator (TEO) are used to obtain robust features. The proposed features significantly improve the speaker identification performance (76.25%) compared with the Mel Frequency Cepstral Coefficient (MFCC) features (57%) in the presence of car noise. The performance is evaluated using the TIMIT and NOISEX-92 databases. This paper shows that higher-frequency bands also carry more speaker-specific information and that the identification rate can be improved without additional processing of the signal to remove noise.
Advances in Recent Technologies in Communication and Computing | 2009
Mangesh S. Deshpande; Raghunath S. Holambe
In this paper, a nonlinear AM-FM speech model is used to extract robust features for speaker identification. The proposed features measure the amount of amplitude and frequency modulation that the commonly used linear source-filter model and the Mel frequency cepstral coefficient (MFCC) features fail to capture. From the short-time estimates of the frequency and bandwidth, a novel set of features is proposed. The robustness and discriminability of the features are investigated in comparison with the MFCC features using the clean speech corpus from the TIMIT database and noise from the NOISEX-92 database. The proposed feature set provides significant improvement in identification accuracy over the MFCC features under mismatched training and testing environments. The results show that better speaker identification rates are attainable under mismatched conditions, especially at low signal-to-noise ratio (SNR).
Archive | 2012
Raghunath S. Holambe; Mangesh S. Deshpande
We begin this chapter by discussing signal energy in general. We then look at an alternative definition, the Teager energy operator (TEO), and how it can be obtained by considering a second-order differential equation that describes the motion of an object suspended by a spring. This operator is interesting because it has a small time window, making it ideal for local (time) analysis of signals. The analysis of AM–FM signals using the TEO is probably the field where most of the research regarding the operator has been done so far. The energy separation algorithm using the TEO is then discussed, and finally its noise suppression capability is presented.
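For reference, the standard discrete-time TEO is Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), which needs only three neighbouring samples. The sketch below implements that operator together with DESA-2, one well-known discrete energy separation algorithm. Whether the chapter uses DESA-2 or another variant is not stated here, so that choice is an assumption, as are the function names.

```python
import math

def teager_energy(x):
    # Psi[x(n)] = x(n)^2 - x(n-1) * x(n+1), defined for interior samples only
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def desa2(x):
    # DESA-2 estimates, valid for frequencies 0 < omega < pi/2 rad/sample:
    #   omega(n) ~ 0.5 * acos(1 - Psi[y(n)] / (2 * Psi[x(n)]))
    #   |a(n)|   ~ 2 * Psi[x(n)] / sqrt(Psi[y(n)]),  with y(n) = x(n+1) - x(n-1)
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teager_energy(x)[1:-1]  # trimmed to align with psi_y
    psi_y = teager_energy(y)
    amps, freqs = [], []
    for px, py in zip(psi_x, psi_y):
        if px <= 0 or py <= 0:
            # Estimates are undefined where the energy is not positive
            amps.append(float("nan"))
            freqs.append(float("nan"))
            continue
        c = max(-1.0, min(1.0, 1 - py / (2 * px)))
        freqs.append(0.5 * math.acos(c))
        amps.append(2 * px / math.sqrt(py))
    return amps, freqs

# For a pure tone A*cos(omega*n + phi), the TEO equals A^2 * sin(omega)^2
A, omega, phi = 0.7, 0.3, 0.4
tone = [A * math.cos(omega * n + phi) for n in range(50)]
psi = teager_energy(tone)
amps, freqs = desa2(tone)
print(round(psi[0], 6), round(amps[0], 6), round(freqs[0], 6))
```

For a constant-amplitude sinusoid these estimates are exact up to rounding. For real AM–FM signals they are approximations that hold when the amplitude and frequency vary slowly compared with the carrier.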
Archive | 2012
Raghunath S. Holambe; Mangesh S. Deshpande
This chapter presents a survey of nonlinear methods for speech processing. Recent developments in nonlinear science have already found their way into a wide range of engineering disciplines, including digital signal processing. It is also important and challenging to develop a nonlinear framework for speech processing because of the well-known nonlinearities in the human speech production mechanism.
International Conference & Workshop on Emerging Trends in Technology | 2011
Mangesh S. Deshpande; Raghunath S. Holambe
The performance of speaker recognition systems degrades strongly in the presence of background noise such as babble noise. Speech babble is one of the most challenging types of noise interference because of its speaker- and speech-like characteristics. In contrast to existing work, the aim here is to improve noise robustness by focusing on the features only. To derive robust features, an amplitude modulation-frequency modulation (AM-FM) based speaker model is proposed, which combines the speech production and perception mechanisms. The performance is evaluated using the clean speech corpus from the TIMIT database combined with babble noise from the NOISEX-92 database. Experimental results show that the proposed features significantly improve the performance over the conventional Mel frequency cepstral coefficient (MFCC) features under mismatched training and testing environments.
Archive | 2012
Raghunath S. Holambe; Mangesh S. Deshpande
Speaker recognition refers to the task of recognizing people by their voices. In speaker recognition, one is interested in extracting and characterizing the speaker-specific information embedded in the speech signal. In a larger context, speaker recognition belongs to the field of biometrics, which refers to authenticating persons based on their physical and/or learned characteristics. There has long been a desire to be able to identify a person on the basis of his or her voice. For many years, judges, lawyers, detectives and law enforcement agencies have wanted to use forensic voice authentication to investigate a suspect or to confirm a judgment of guilt or innocence.
Archive | 2012
Raghunath S. Holambe; Mangesh S. Deshpande
Session variability is one of the main challenges in forensic speaker identification. This variability, in the form of mismatched environments, seriously degrades the identification performance. To address the problem of environment mismatch due to noise, different types of robust features are discussed in this chapter. In state-of-the-art features, the speech production system is modeled as a linear source-filter model. However, this modeling technique neglects some nonlinear aspects of speech production, which carry some speaker-specific information. Furthermore, the state-of-the-art features are based on either the speech production mechanism or the speech perception mechanism. To overcome such limitations of existing features, features derived using non-linear modeling techniques are proposed in the chapter. The proposed features, Teager energy operator based cepstral coefficients (TEOCC) and amplitude-frequency modulation (AM-FM) based ‘Q’ features, show significant improvement in speaker identification rate in mismatched environments. The performance of these features is evaluated for different types of noise signals in the NOISEX-92 database with clean training and noisy testing environments. When the signal is corrupted by car engine noise at 0 dB SNR, the speaker identification rate achieved is 57% using TEOCC features and 97% using AM-FM based ‘Q’ features, compared to 25.5% using MFCC features. It is shown that, with the proposed features, speaker identification accuracy can be increased in the presence of noise without any additional pre-processing of the signal to remove noise.