Sandipan Chakroborty
Indian Institute of Technology Kharagpur
Publication
Featured research published by Sandipan Chakroborty.
International Conference on Computing Theory and Applications | 2007
Sandipan Chakroborty; Anindya Lal Roy; Sourav Majumdar; Goutam Saha
A state-of-the-art speaker identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, mel-frequency cepstral coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, MFCC captures vocal tract characteristics more effectively in the lower frequency regions. This work proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker-specific cues present in the higher frequency zone. Unlike high-level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature improves on the baseline performance of the MFCC-based system. The proposition is validated by experiments conducted on two different kinds of databases, namely YOHO (microphone speech) and POLYCOST (telephone speech), with two different classifier paradigms, namely Gaussian Mixture Models (GMM) and Polynomial Classifier (PC), and for various model orders.
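The abstract above does not spell out how the complementary filter bank is built. As a minimal sketch, assuming that "complementary" means flipping a conventional triangular mel filter bank in both filter order and frequency axis so that the narrow, dense filters land in the high-frequency region, the construction could look as follows. The function names, the mel formula constants, and the parameters (20 filters, 129 bins, 8 kHz) are illustrative assumptions, not values taken from the paper.

```python
import math


def hz_to_mel(f):
    # Common mel-scale approximation (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_bins, sample_rate):
    """Triangular mel filters: one row per filter, one column per FFT bin (0..Nyquist)."""
    nyquist = sample_rate / 2.0
    mel_max = hz_to_mel(nyquist)
    # n_filters + 2 edge frequencies, equally spaced on the mel scale.
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [nyquist * k / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for i in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[i - 1], edges_hz[i], edges_hz[i + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def flip_filter_bank(bank):
    """Hypothetical complementary bank: reverse the filter order and the frequency axis."""
    return [row[::-1] for row in reversed(bank)]


mel_bank = mel_filter_bank(n_filters=20, n_bins=129, sample_rate=8000)
complementary_bank = flip_filter_bank(mel_bank)
```

In this sketch the flipped bank has its finest resolution near the Nyquist frequency, which is one way to emphasize higher-frequency cues; cepstral features would then be computed from each bank separately and the two resulting speaker models scored in parallel.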
International Conference on Industrial Technology | 2006
Sandipan Chakroborty; Anindya Lal Roy; Goutam Saha
A state-of-the-art speaker identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-frequency cepstral coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, MFCC captures vocal tract characteristics more effectively in the lower frequency regions. This work proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker-specific cues present in the higher frequency zone. Unlike high-level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature improves on the baseline performance of the MFCC-based system. The proposition is validated by experiments conducted on two different kinds of databases, namely YOHO (microphone speech) and POLYCOST (telephone speech), with a Gaussian mixture model (GMM) as the classifier for various model orders.
Speech Communication | 2008
Suman Senapati; Sandipan Chakroborty; Goutam Saha
In speech enhancement, Bayesian marginal models cannot explain the inter-scale statistical dependencies of different wavelet scales. Simple non-linear estimators for wavelet-based denoising assume that the wavelet coefficients in different scales are independent. However, wavelet coefficients have significant inter-scale dependencies. This paper introduces a new method that models the inter-scale dependency between the coefficients and their parents with a Circularly Symmetric Probability Density Function (CS-PDF) related to the family of Spherically Invariant Random Processes (SIRPs) in the Log Gabor Wavelet (LGW) domain; the corresponding joint shrinkage estimators are derived using Maximum a Posteriori (MAP) estimation theory. The proposed work presents two different joint shrinkage estimators. In the first, the inter-scale variance of LGW coefficients is kept constant, which gives a closed-form solution. In the second, a relatively more complex approach is presented where the variance is not constrained to be constant. It is also shown that the proposed methods perform better when speech uncertainty is taken into consideration. The robustness of the proposed frameworks is tested on 50 speakers of the POLYCOST and YOHO speech corpora in four different noisy environments against four established speech enhancement algorithms. Experimental results show that the proposed estimators yield a higher improvement in Segmental SNR (S-SNR) and lower Log Spectral Distortion (LSD) compared to other estimators. In the second evaluation, the proposed speech enhancement techniques are found to give more robust digit recognition in noisy conditions on the AURORA 2.0 speech corpus compared to competing methods.
International Journal of Biometrics | 2010
Md. Sahidullah; Sandipan Chakroborty; Goutam Saha
Conventional Speaker Identification (SI) systems utilise spectral features like Mel-Frequency Cepstral Coefficients (MFCC) or Perceptual Linear Prediction (PLP) as a frontend module. Line Spectral Pair Frequencies (LSF) are a popular alternative representation of Linear Prediction Coefficients (LPC). In this paper, an investigation is carried out to extract LSF from perceptually modified speech. A new feature set extracted from the residual signal is also proposed. An SI system based on this residual feature, which contains information complementary to spectral characteristics, shows improved performance when fused with the conventional spectral feature based system as well as with the proposed perceptually modified LSF.
International Conference on Neural Information Processing | 2004
Goutam Saha; Pankaj Kumar; Sandipan Chakroborty
In this paper we present a comparative study of the usefulness of four of the most popular feature extraction algorithms in an Artificial Neural Network based text-dependent speaker recognition system. The network uses a multi-layered perceptron with backpropagation learning. We show the performance of the network for two phrases with a population of 25 speakers. The results show that normalized Mel Frequency Cepstral Coefficients perform better in terms of false acceptance rate as well as network size for an admissible error rate.
IEEE India Conference | 2005
Goutam Saha; Suman Senapati; Sandipan Chakroborty
Automatic Speaker Recognition (ASR) needs a robust acoustic feature for representation of the speaker and an efficient modeling scheme to yield high recognition accuracy even in adverse conditions. This paper presents a noise study of an ASR system using Mel-Frequency Cepstral Coefficients (MFCC) and an Artificial Neural Network (ANN) classifier. Optimization in feature space using Fisher's F-Ratio score is done in order to develop a reduced speaker model in the no-noise condition (only ambient room noise is present) as well as in several noisy conditions. A new ranking scheme is also proposed in order to stabilize the rank of features across noise levels by taking the arithmetic mean of the F-Ratio scores obtained at various levels of Signal to Noise Ratio (SNR). The results are presented for a text-dependent ASR system with a 25-speaker database.
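The ranking scheme described above can be illustrated with a minimal sketch. It assumes the common definition of the F-ratio for a single feature (variance of the per-speaker means divided by the average within-speaker variance); the abstract does not state which exact formulation the paper uses, and the function names and toy data below are hypothetical.

```python
from statistics import mean, pvariance


def f_ratio(features_by_speaker):
    """Return one F-ratio per feature dimension.

    features_by_speaker: list with one entry per speaker, each a list of
    equal-length feature vectors.
    Assumes every feature has non-zero within-speaker variance.
    """
    dim = len(features_by_speaker[0][0])
    ratios = []
    for j in range(dim):
        per_speaker = [[vec[j] for vec in spk] for spk in features_by_speaker]
        speaker_means = [mean(vals) for vals in per_speaker]
        between = pvariance(speaker_means)                      # spread of speaker means
        within = mean(pvariance(vals) for vals in per_speaker)  # average spread inside a speaker
        ratios.append(between / within)
    return ratios


def rank_features(datasets_by_snr):
    """Average the F-ratio over SNR levels and rank features, best first."""
    per_level = [f_ratio(data) for data in datasets_by_snr]
    dim = len(per_level[0])
    avg = [mean(level[j] for level in per_level) for j in range(dim)]
    order = sorted(range(dim), key=lambda j: avg[j], reverse=True)
    return order, avg


# Toy data: two speakers, two features, two SNR levels (hypothetical values).
clean = [[[1.0, 5.0], [1.2, 6.0]], [[3.0, 5.5], [3.2, 6.5]]]
noisy = [[[0.8, 4.0], [1.6, 7.0]], [[2.8, 5.0], [3.6, 6.0]]]
order, avg = rank_features([clean, noisy])
```

Features at the front of `order` separate speakers well on average across noise levels, so a reduced model would keep only the top-ranked dimensions.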
IEEE Region 10 Conference | 2008
Bibhu Prasad Mishra; Sandipan Chakroborty; Goutam Saha
State-of-the-art Speaker Identification (SI) systems use Gaussian Mixture Models (GMM) for modeling speakers' data. Using GMM, a speaker can be identified accurately even from a large number of speakers when model complexity is large. However, lower order speaker models using GMM show poor accuracy because fewer Gaussians are involved. In the SI context, not much attention has been paid to improving accuracy for lower order models, although they have been used in real-time applications like hierarchical speaker pruning. In this paper, two different approaches are proposed using a Singular Value Decomposition (SVD) based Feature Transformer (FT) for improving accuracy, especially for lower order speaker models. The results show significant improvements over the baseline and are presented on two widely different public databases comprising more than 130 speakers.
IEEE India Conference | 2006
Suman Senapati; Sandipan Chakroborty; Goutam Saha
A speaker identification (SI) system needs an efficient feature extraction process and an appropriate speaker model developed from these features. The work introduces the fusion of log Gabor wavelet (LGW) and maximum a posteriori (MAP) estimator for a robust text-independent SI system. The focus of this paper is on robustness to degradations produced by transmission over a telephone channel. The complete experimental evaluation is conducted on the 49-speaker conversational telephone King-92 SI speech database with two well-known speaker models, i.e. Gaussian mixture model (GMM) and vector quantization (VQ). Comparisons are made with two different established methods as well as with the normal feature extraction procedure to show the robustness of the new approach in different time segments. The GMM attains 98.8% identification accuracy using 30 seconds of wideband speech utterances and 87.3% identification accuracy using 30 seconds of narrowband speech utterances, and is shown to outperform the other methods.
World Academy of Science, Engineering and Technology, International Journal of Electrical, Computer, Energetic, Electronic and Communication Engineering | 2008
Sandipan Chakroborty; Anindya Lal Roy; Goutam Saha
Speech Communication | 2010
Sandipan Chakroborty; Goutam Saha