Sheeraz Memon
RMIT University
Publication
Featured research published by Sheeraz Memon.
international conference on signal processing and communication systems | 2010
Md. Afzal Hossan; Sheeraz Memon; Mark A. Gregory
The Mel-Frequency Cepstral Coefficients (MFCC) method is a leading approach to speech feature extraction, and current research aims to identify performance enhancements. One recent MFCC implementation is the Delta-Delta MFCC, which improves speaker verification. In this paper, a new MFCC feature extraction method based on the distributed Discrete Cosine Transform (DCT-II) is presented. Speaker verification tests are proposed based on three feature extraction methods: conventional MFCC, Delta-Delta MFCC, and distributed DCT-II based Delta-Delta MFCC, each paired with a Gaussian Mixture Model (GMM) classifier.
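As a point of reference, the conventional MFCC and Delta-Delta MFCC baselines mentioned above can be sketched with standard tooling; the distributed DCT-II variant proposed in the paper is not reproduced here, and the file name and parameter values are illustrative assumptions.

```python
# Sketch of the conventional MFCC / Delta-Delta MFCC baseline (not the
# paper's distributed DCT-II variant). File name and parameters are
# illustrative assumptions.
import librosa
import numpy as np

signal, sr = librosa.load("speaker_utterance.wav", sr=16000)  # hypothetical file

# 13 static MFCCs per frame (librosa applies a DCT-II to the log mel energies)
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)

# First (delta) and second (delta-delta) temporal derivatives
delta = librosa.feature.delta(mfcc, order=1)
delta2 = librosa.feature.delta(mfcc, order=2)

# Stack into a 39-dimensional Delta-Delta MFCC vector per frame
features = np.vstack([mfcc, delta, delta2])  # shape: (39, n_frames)
```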
network and system security | 2009
Sheeraz Memon; Margaret Lech; Namunu Chinthaka Maddage
The introduction of Gaussian Mixture Models (GMMs) in the field of speaker verification has led to very good results. This paper illustrates an evolution of state-of-the-art speaker verification by highlighting the contribution of a recently established information-theoretic vector quantization technique. We explore the novel application of three vector quantization algorithms, namely K-means, Linde-Buzo-Gray (LBG) and Information Theoretic Vector Quantization (ITVQ), to efficient speaker verification. The Expectation Maximization (EM) algorithm used by the GMM requires a prohibitive number of iterations to converge. In this paper, K-means, LBG and ITVQ were tested as alternatives to EM. The GMM-ITVQ algorithm was found to be the most efficient alternative to GMM-EM, giving correct-classification rates at a level similar to GMM-EM. Finally, representative performance benchmarks and system behaviour experiments on the NIST SRE corpora are presented.
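A minimal sketch of the GMM-EM baseline discussed above, using scikit-learn; ITVQ is not available in standard libraries, so only the EM route (seeded with K-means) is shown, and the feature arrays, component count and iteration cap are assumptions.

```python
# GMM-EM speaker-verification sketch (scikit-learn). ITVQ is not in
# standard libraries; only the K-means-seeded EM baseline is shown.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_model(train_frames, n_components=32):
    """Fit a diagonal-covariance GMM to one speaker's feature frames."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag",
                          init_params="kmeans",  # K-means seeding before EM
                          max_iter=200)
    return gmm.fit(train_frames)

def verification_score(gmm, test_frames):
    """Average per-frame log-likelihood of a test utterance."""
    return gmm.score(test_frames)

# Usage with synthetic stand-in frames (real systems would use MFCCs):
rng = np.random.default_rng(0)
model = train_speaker_model(rng.normal(size=(2000, 39)))
print(verification_score(model, rng.normal(size=(500, 39))))
```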
international conference on computer, control and communication | 2009
Sheeraz Memon; Margaret Lech; Ling He
In recent years, different versions of the GMM classifier combined with MFCC features have become established speaker verification benchmarks. Although highly efficient, these systems suffer from computational complexity and occasional convergence problems. This study searched for alternative classification and feature extraction methods with similar classification efficiency that overcome some of the problems of the classical methods. Preliminary results are presented for two classification methods, the classical GMM and the ITVQ, and three feature extraction methods: MFCC, IMFCC and the MFCC/IMFCC fusion. The ITVQ did not show better results than the classical GMM classifier; however, the EER increase for the ITVQ was only 0.2%. The MFCC/IMFCC fusion proved to be the best feature extraction method; both the fusion and the IMFCC alone outperformed the classical MFCC method.
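The IMFCC idea can be approximated by reversing the mel filter bank along the frequency axis before the log and DCT stages, then fusing the two cepstra at the feature level; the sketch below shows that construction, though the paper's exact inverted-filter design may differ.

```python
# Hedged MFCC/IMFCC fusion sketch: IMFCC is approximated by flipping the
# mel filter bank along the frequency axis; the paper's design may differ.
import numpy as np
import librosa
from scipy.fft import dct

signal, sr = librosa.load("speaker_utterance.wav", sr=16000)  # hypothetical file
n_fft, n_mels, n_ceps = 512, 26, 13

spec = np.abs(librosa.stft(signal, n_fft=n_fft)) ** 2           # power spectrum
mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
inv_fb = mel_fb[:, ::-1]   # reversed frequency response -> "inverted mel"

def cepstra(fb, spec, n_ceps):
    log_energy = np.log(fb @ spec + 1e-10)
    return dct(log_energy, axis=0, norm="ortho")[:n_ceps]

mfcc = cepstra(mel_fb, spec, n_ceps)
imfcc = cepstra(inv_fb, spec, n_ceps)
fused = np.vstack([mfcc, imfcc])   # feature-level MFCC/IMFCC fusion per frame
```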
international conference on bioinformatics and biomedical engineering | 2009
Ling He; Margaret Lech; Namunu Chinthaka Maddage; Sheeraz Memon
The speech signal is an important tool for conveying information between humans; at the same time, it is an indicator of a speaker's emotions. In this paper, the automatic identification of affect from speech containing spontaneously expressed (not acted) emotions within different environments was investigated. The Teager Energy Operator-Perceptual Wavelet Packet (TEO-PWP) features as well as the Mel Frequency Cepstral Coefficients (MFCC) were used to model the emotions using two classifiers: the Gaussian mixture model (GMM) and the probabilistic neural network (PNN). The classification experiments were conducted using two data sets: SUSAS with three classes (high stress, moderate stress and neutral) and ORI with five classes (angry, happy, anxious, dysphoric and neutral). Depending on the features/classifier combination, the average classification results for the SUSAS data ranged from 61% to 95%, whereas the ORI data provided lower average rates ranging from 37% to 57%. The best overall performance was achieved using the TEO-PWP features in combination with the GMM classifier, giving an average of 94.75% correct classifications for the SUSAS data and 56.6% for the ORI data. Different arousal levels between the SUSAS and ORI emotional classes were suggested as the most likely cause of the difference in classification rates between the two data sets.
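The first stage of the TEO-PWP features is the discrete Teager Energy Operator; a minimal sketch of that operator follows (the perceptual wavelet packet decomposition the paper applies around it is not reproduced here).

```python
# Discrete Teager Energy Operator (TEO): psi[n] = x[n]^2 - x[n-1]*x[n+1].
# Only the TEO stage of TEO-PWP is shown; the wavelet packet stage is omitted.
import numpy as np

def teager_energy(x):
    """TEO for the inner samples of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# Toy usage: for a pure tone the TEO tracks amplitude-frequency energy
t = np.arange(0, 1, 1 / 16000)
print(teager_energy(np.sin(2 * np.pi * 440 * t)).mean())
```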
international multi-topic conference | 2008
Sheeraz Memon; Margaret Lech
This paper explores the application of an information-theoretic vector quantization algorithm, called VQIT, to speaker verification. Unlike the K-means and LBG vector quantization algorithms, VQIT has a physical interpretation and minimizes the quantization error efficiently. Vector quantization based speaker verification has proven successful: typically a codebook is trained to minimize the quantization error for data from an individual speaker. In this paper we use a set of 36 speakers from the TIMIT database, compute MFCC and LPC coefficients from the speech samples, apply them to the K-means, LBG and VQIT vector quantization algorithms, and suggest that VQIT performs better than the other VQ implementations. We also obtain results from the GMM classifier for the same coefficient data and compare them with VQIT.
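A codebook-based verification loop of the kind described here can be sketched with a plain K-means codebook standing in for the VQ stage (LBG or the paper's VQIT would replace the training step); the codebook size and data below are assumptions.

```python
# VQ speaker-verification sketch: a K-means codebook stands in for the VQ
# stage; the score is the mean quantization error of test frames against
# the claimed speaker's codebook (lower = better match).
import numpy as np
from sklearn.cluster import KMeans

def train_codebook(train_frames, codebook_size=64):
    return KMeans(n_clusters=codebook_size, n_init=10).fit(train_frames)

def quantization_distortion(codebook, test_frames):
    d = codebook.transform(test_frames)   # distances to all codewords
    return (d.min(axis=1) ** 2).mean()    # mean squared nearest-codeword error

rng = np.random.default_rng(1)
cb = train_codebook(rng.normal(size=(2000, 13)))    # stand-in MFCC frames
print(quantization_distortion(cb, rng.normal(size=(400, 13))))
```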
international conference on pattern recognition | 2010
Sheeraz Memon; Margaret Lech; Namunu Chinthaka Maddage
The expectation maximization (EM) algorithm is widely used to train the Gaussian mixture model (GMM), the state-of-the-art statistical modeling technique. Like the classical EM method, the proposed EM-Information Theoretic algorithm (EM-IT) adapts means, covariances and weights; however, this process is not conducted directly on feature vectors but on a smaller set of centroids derived by an information-theoretic procedure, which minimizes the divergence between the Parzen estimate of the feature vectors' distribution within a given Gaussian component and the centroids' distribution within the same component. The EM-IT algorithm was applied to the speaker verification problem using the NIST 2004 speech corpus and MFCC features with dynamic components. The results showed an improvement of the equal error rate (EER) by 1.5% over the classical EM approach. EM-IT also showed higher convergence rates compared with the EM method.
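For reference, one classical EM iteration for a diagonal-covariance GMM is sketched below; EM-IT applies these same weight/mean/covariance updates to information-theoretically derived centroids rather than raw frames, and that centroid derivation is not reproduced here.

```python
# One classical EM iteration for a diagonal-covariance GMM. EM-IT would run
# this update on ITVQ-derived centroids instead of raw feature vectors.
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, weights, means, variances):
    K = len(weights)
    # E-step: responsibilities gamma[n, k] = P(component k | frame n)
    like = np.stack([weights[k] * multivariate_normal.pdf(
        X, mean=means[k], cov=np.diag(variances[k])) for k in range(K)], axis=1)
    gamma = like / like.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means and diagonal variances
    Nk = gamma.sum(axis=0)
    new_weights = Nk / len(X)
    new_means = (gamma.T @ X) / Nk[:, None]
    new_vars = np.stack([(gamma[:, k:k + 1] * (X - new_means[k]) ** 2).sum(axis=0) / Nk[k]
                         for k in range(K)])
    return new_weights, new_means, new_vars
```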
international conference on bioinformatics and biomedical engineering | 2009
Sheeraz Memon; Namunu Chinthaka Maddage; Margaret Lech; Nicholas B. Allen
This study investigates the effects of a clinical environment on speaker recognition rates. Two sets of speakers were used: a clinical set containing speech recordings of 70 clinically depressed speakers and a control set containing 68 non-depressed speakers. MFCC features were used to produce statistical models of the speakers using four modeling methods: GMM_EM, GMM_K-means, GMM_LBG, and GMM_ITVQ. In all cases the speaker recognition rates for the depressed speakers were lower (60%-71%) than for the non-depressed speakers (79%-89%). We also analyze the performance of VQ-based Gaussian modeling and suggest that GMM-EM gives the highest recognition rates, although the performance of GMM-ITVQ is comparable. Experiments with the number of Gaussian mixture components varied between 1 and 1024 show that adding more mixtures increases the complexity and spreads the data more thinly across components, degrading the recognition rate. The results also suggest that the amount of training and test speech can strongly affect the recognition rates.
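The mixture-count effect reported above can be illustrated with a small sweep: fit GMMs with increasing numbers of components and track held-out likelihood, which falls once the training data are spread too thinly. The data, component counts and regularization below are synthetic assumptions.

```python
# Sketch of a mixture-count sweep: held-out log-likelihood eventually drops
# as components multiply and the data distribution is spread too thinly.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
train = rng.normal(size=(1500, 13))   # stand-in for one speaker's MFCC frames
test = rng.normal(size=(500, 13))     # stand-in held-out frames

for n_mix in [1, 4, 16, 64, 256, 1024]:
    gmm = GaussianMixture(n_components=n_mix, covariance_type="diag",
                          reg_covar=1e-3).fit(train)
    print(n_mix, gmm.score(test))     # average held-out log-likelihood per frame
```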
conference of the international speech communication association | 2008
Ling He; Margaret Lech; Sheeraz Memon; Nicholas B. Allen
european signal processing conference | 2008
Sheeraz Memon; Margaret Lech
Archive | 2009
Sheeraz Memon; Margaret Lech; Namunu Chinthaka Maddage; Ling He