Margaret Lech
RMIT University
Publications
Featured research published by Margaret Lech.
Biomedical Signal Processing and Control | 2011
Ling He; Margaret Lech; Namunu Chinthaka Maddage; Nicholas B. Allen
Two new approaches to the feature extraction process for automatic stress and emotion classification in speech are proposed and examined. The first method uses the empirical mode decomposition (EMD) of speech into intrinsic mode functions (IMFs) and calculates the average Renyi entropy of the IMF channels. The second method calculates the average spectral energy in the sub-bands of speech spectrograms and can be enhanced by anisotropic diffusion filtering of the spectrograms. In the second method, three types of sub-bands were examined: critical, Bark and ERB. The performance of the new features was compared with the conventional mel frequency cepstral coefficients (MFCC) method. The modeling and classification process applied the classical GMM and KNN algorithms. The experiments used two databases containing natural speech: SUSAS (annotated with three different levels of stress) and the Oregon Research Institute (ORI) data (annotated with five different emotions: neutral, angry, anxious, dysphoric, and happy). For the SUSAS data, the best average recognition rate of 77% was obtained when using spectrogram features calculated within ERB bands and combined with anisotropic filtering. For the ORI data, the best result of 53% was obtained with the same method but without anisotropic filtering. Both the GMM and KNN classifiers showed similar performance. These results indicate that spectrogram patterns give promising results for stress recognition; however, further improvements are needed for emotion recognition.
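For illustration, a minimal sketch of the first approach, assuming PyEMD as the EMD implementation and the normalized energy profile of each IMF as the probability estimate for the Renyi entropy (neither choice is specified in the abstract):

```python
# Sketch: EMD of a speech frame into IMFs, then average Renyi entropy over
# the IMF channels. Assumes PyEMD (pip install EMD-signal); the probability
# estimate (normalized squared amplitudes) is an assumption, not the paper's.
import numpy as np
from PyEMD import EMD

def renyi_entropy(x, alpha=2.0):
    """Renyi entropy of order alpha from the normalized energy profile of x."""
    p = x ** 2
    p = p / (p.sum() + 1e-12)            # treat the energy profile as a distribution
    return np.log((p ** alpha).sum()) / (1.0 - alpha)

def avg_imf_renyi_entropy(speech, alpha=2.0):
    """Average Renyi entropy over the intrinsic mode functions of one frame."""
    imfs = EMD()(speech)                 # rows are IMFs
    return np.mean([renyi_entropy(imf, alpha) for imf in imfs])

# Example on a synthetic chirp standing in for a voiced speech frame
fs = 8000
t = np.arange(0, 0.2, 1 / fs)
frame = np.sin(2 * np.pi * (100 + 300 * t) * t)
print(avg_imf_renyi_entropy(frame))
```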
international conference on acoustics, speech, and signal processing | 2010
Lu-Shih Alex Low; Namunu Chinthaka Maddage; Margaret Lech; Lisa Sheeber; Nicholas B. Allen
In this paper, we report the influence on classification accuracy, for a clinical speech dataset, of adding acoustic low-level descriptors (LLDs) belonging to prosodic features (i.e., pitch, formants, energy, jitter, shimmer) and spectral features (i.e., spectral flux, centroid, entropy and roll-off), along with their delta (Δ) and delta-delta (Δ-Δ) coefficients, to two baseline features: Mel frequency cepstral coefficients (MFCC) and the Teager energy critical-band based autocorrelation envelope (TEO-CB-Auto-Env). Extracted LLDs that increased accuracy when added to these baseline features were then modeled together using Gaussian mixture models and tested. A clinical dataset of speech from 139 adolescents, including 68 (49 girls and 19 boys) diagnosed as clinically depressed, was used in the classification experiments. For male subjects, the combination (TEO-CB-Auto-Env + Δ + Δ-Δ) + F0 + (LogE + Δ + Δ-Δ) + (Shimmer + Δ) + Spectral Flux + Spectral Roll-off gave the highest classification rate of 77.82%, while for female subjects TEO-CB-Auto-Env alone gave an accuracy of 74.74%.
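A hedged sketch of assembling a comparable LLD feature matrix with librosa (the MFCC, delta, and spectral descriptors above; TEO-CB-Auto-Env has no standard library implementation and is omitted here):

```python
# Sketch: frame-wise MFCCs with delta and delta-delta coefficients, plus
# spectral centroid and roll-off, stacked into one feature matrix.
import numpy as np
import librosa

def lld_features(y, sr):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    d1 = librosa.feature.delta(mfcc)             # delta (Δ) coefficients
    d2 = librosa.feature.delta(mfcc, order=2)    # delta-delta (Δ-Δ) coefficients
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
    return np.vstack([mfcc, d1, d2, centroid, rolloff])   # (features, frames)

# Bundled librosa example audio stands in for the clinical recordings
y, sr = librosa.load(librosa.example("trumpet"))
print(lld_features(y, sr).shape)
```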
digital image computing: techniques and applications | 2008
Seyed Mehdi Lajevardi; Margaret Lech
An efficient automatic facial expression recognition method is proposed. The method uses a set of characteristic features obtained by averaging the outputs of a Gabor filter bank with 5 frequencies and 8 different orientations, and then further reduces the dimensionality by means of principal component analysis. The performance of the proposed system was compared with the full Gabor filter bank method. The classification tasks were performed using the K-nearest neighbor (K-NN) classifier. The training and testing images were selected from the publicly available JAFFE database. The classification results show that the average Gabor filter (AGF) provides very high computational efficiency at the cost of a relatively small decrease in performance compared to the full Gabor filter features.
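A minimal sketch of the averaged Gabor filter (AGF) idea, with illustrative filter frequencies rather than the paper's exact settings:

```python
# Sketch: filter an image with a 5-frequency x 8-orientation Gabor bank,
# average the 40 responses, then reduce dimensionality with PCA.
import numpy as np
from scipy.signal import fftconvolve
from skimage.filters import gabor_kernel
from sklearn.decomposition import PCA

def averaged_gabor_features(image):
    responses = []
    for frequency in (0.05, 0.1, 0.2, 0.3, 0.4):      # 5 frequencies (assumed values)
        for theta in np.arange(8) * np.pi / 8:        # 8 orientations
            kernel = np.real(gabor_kernel(frequency, theta=theta))
            responses.append(fftconvolve(image, kernel, mode="same"))
    return np.mean(responses, axis=0).ravel()         # average over the 40 filters

images = [np.random.rand(64, 64) for _ in range(20)]  # stand-ins for JAFFE faces
X = np.array([averaged_gabor_features(im) for im in images])
X_reduced = PCA(n_components=10).fit_transform(X)
print(X_reduced.shape)                                # -> (20, 10)
```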
image and vision computing new zealand | 2008
Seyed Mehdi Lajevardi; Margaret Lech
A novel method for facial expression recognition from sequences of image frames is described and tested. The expression recognition system is fully automatic and consists of the following modules: face detection, maximum arousal detection, feature extraction, selection of optimal features, and facial expression recognition. The face detection is based on the AdaBoost algorithm and is followed by the extraction of frames with the maximum arousal (intensity) of emotion using the inter-frame mutual information criterion. The selected frames are then processed to generate characteristic features based on the log-Gabor filter method combined with an optimal feature selection process, which uses the MIFS algorithm. The system can automatically recognize six expressions: anger, disgust, fear, happiness, sadness and surprise. The selected features were classified using the Naive Bayesian (NB) classifier. The system was tested using image sequences from the Cohn-Kanade database. The percentage of correct classification increased from 68.9% for the non-optimized features to 79.5% for the optimized set of features.
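One plausible reading of the maximum-arousal selection step, assumed here, is to pick the frame with the lowest mutual information relative to the first, near-neutral frame; a sketch using binned pixel intensities:

```python
# Sketch: inter-frame mutual information via joint histograms of binned
# intensities. The "least similar to the first frame" criterion is an
# assumption; the abstract does not spell out the selection rule.
import numpy as np
from sklearn.metrics import mutual_info_score

def frame_mi(a, b, bins=32):
    """Mutual information between two grayscale frames via binned intensities."""
    ha = np.digitize(a.ravel(), np.linspace(a.min(), a.max(), bins))
    hb = np.digitize(b.ravel(), np.linspace(b.min(), b.max(), bins))
    return mutual_info_score(ha, hb)

def max_arousal_frame(frames):
    """Index of the frame least similar (lowest MI) to the first frame."""
    scores = [frame_mi(frames[0], f) for f in frames[1:]]
    return 1 + int(np.argmin(scores))

frames = [np.random.rand(48, 48) for _ in range(10)]  # stand-in image sequence
print(max_arousal_frame(frames))
```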
digital image computing: techniques and applications | 2008
Seyed Mehdi Lajevardi; Margaret Lech
This study proposes a classification-based facial expression recognition method using a bank of multilayer perceptron neural networks. Six different facial expressions were considered. Firstly, logarithmic Gabor filters were applied to extract the features. Optimal subsets of features were then selected for each expression, down-sampled and further reduced in size via principal component analysis (PCA). The arrays of eigenvectors were multiplied by the original log-Gabor features to form feature arrays concatenated into six data tensors, representing training sets for different emotions. Each tensor was then used to train one of the six parallel neural networks making each network most sensitive to a different emotion. The classification efficiency of the proposed method was tested on static images from the Cohn-Kanade database. The results were compared with the full set of log-Gabor features. The average percentage of the correct classifications varied across different expressions from 31% to 85% for the optimised sub-set of log-Gabor features and from 23% to 67% for the full set of features. The average correct classification rate was increased from 52% for the full set of the log-Gabor features, to 70% for the optimised sub-set of log-Gabor features.
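A minimal sketch of the parallel-network scheme, with sklearn's MLPClassifier standing in for the paper's multilayer perceptrons and random arrays standing in for the PCA-reduced log-Gabor features:

```python
# Sketch: six binary MLPs, each trained one-vs-rest to be most sensitive to
# one emotion; the final label is the network with the highest probability.
import numpy as np
from sklearn.neural_network import MLPClassifier

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 40))        # stand-in PCA-reduced log-Gabor features
y = rng.integers(0, 6, size=120)      # stand-in emotion labels

bank = []
for k in range(6):
    net = MLPClassifier(hidden_layer_sizes=(30,), max_iter=2000, random_state=k)
    net.fit(X, (y == k).astype(int))  # one network per emotion
    bank.append(net)

def classify(x):
    scores = [net.predict_proba(x.reshape(1, -1))[0, 1] for net in bank]
    return EMOTIONS[int(np.argmax(scores))]

print(classify(X[0]))
```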
international conference on natural computation | 2009
Ling He; Margaret Lech; Namunu Chinthaka Maddage; Nicholas B. Allen
This paper presents a new system for automatic stress detection in speech. In the feature extraction process, speech spectrograms were used as the primary features, and sigma-pi neuron cells were then employed to derive secondary features. The analysis was performed using three alternative sets of analytical frequency bands: critical bands, Bark scale bands and equivalent rectangular bandwidth (ERB) scale bands. The presented algorithm was tested at the vowel level using actual stressful speech utterances from the SUSAS (Speech Under Simulated and Actual Stress) database. The automatic stress-level classification was implemented using Gaussian mixture model (GMM) and k-nearest neighbor (KNN) classifiers. The strongest effect on the classification results came from the choice of frequency bands. The ERB scale provided the highest classification results, ranging from 67.84% to 73.76%. The classification results did not differ between data sets containing specific types of vowels and data sets containing mixtures of vowels. This indicates that the proposed method can be applied to voiced speech in speech-independent conditions.
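A second-order sigma-pi unit sums weighted products of its inputs; the pairing and weights below are illustrative assumptions, since the abstract does not give the network configuration:

```python
# Sketch: y = sum_k w_k * x[i_k] * x[j_k], a second-order sigma-pi neuron
# applied to primary spectrogram sub-band energies to derive one secondary
# feature. Adjacent-band pairing and uniform weights are assumptions.
import numpy as np

def sigma_pi_unit(x, weights, index_pairs):
    """Weighted sum of pairwise input products (sigma-pi neuron output)."""
    return sum(w * x[i] * x[j] for w, (i, j) in zip(weights, index_pairs))

bands = np.random.rand(21)                       # e.g. energies in ERB-scale bands
pairs = [(i, i + 1) for i in range(20)]          # adjacent-band products (assumed)
weights = np.ones(len(pairs)) / len(pairs)
print(sigma_pi_unit(bands, weights, pairs))
```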
ieee international conference on cognitive informatics | 2009
Lu-Shih Alex Low; Namunu Chinthaka Maddage; Margaret Lech; Nicholas B. Allen
With suicidal behavior being linked to depression that starts at an early age of a person's life, many investigators are trying to find early tell-tale signs to assist psychologists in detecting clinical depression through acoustic analysis of a patient's speech. The purpose of this paper was to study the effectiveness of Mel frequency cepstral coefficients (MFCCs) in capturing the overall mental state of a patient through the analysis of the various vocal emotions displayed during 20 minutes of problem-solving interaction sessions. We also propose both gender-based and gender-independent clinical depression models using Gaussian mixture models. Experiments on a corpus of 139 adolescent subjects indicate that incorporating both the first and second time derivatives of MFCCs can improve the overall classification accuracy by 3%. Gender differences proved to be a factor in improving the detection of clinically depressed subjects, with gender-based models outperforming the gender-independent models by 8%.
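A minimal sketch of the per-class GMM classification scheme, scoring a test recording against depressed and control models by average log-likelihood (gender-based modeling repeats this with gender-split training data); the feature arrays here are stand-ins:

```python
# Sketch: one GMM per class, trained on frame-wise features; a recording is
# assigned to the class whose model gives the higher average log-likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
depressed_feats = rng.normal(0.5, 1.0, size=(500, 39))   # stand-in MFCC+Δ+Δ-Δ frames
control_feats = rng.normal(-0.5, 1.0, size=(500, 39))

gmm_dep = GaussianMixture(n_components=8, covariance_type="diag").fit(depressed_feats)
gmm_con = GaussianMixture(n_components=8, covariance_type="diag").fit(control_feats)

def classify(frames):
    return "depressed" if gmm_dep.score(frames) > gmm_con.score(frames) else "control"

test = rng.normal(0.5, 1.0, size=(200, 39))
print(classify(test))
```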
network and system security | 2009
Sheeraz Memon; Margaret Lech; Namunu Chinthaka Maddage
The introduction of Gaussian Mixture Models (GMMs) to the field of speaker verification has led to very good results. This paper illustrates an evolution in state-of-the-art speaker verification by highlighting the contribution of a recently established information-theoretic vector quantization technique. We explore the novel application of three different vector quantization algorithms, namely K-means, Linde-Buzo-Gray (LBG) and Information Theoretic Vector Quantization (ITVQ), for efficient speaker verification. The Expectation Maximization (EM) algorithm used by the GMM requires a prohibitive number of iterations to converge. In this paper, alternatives to EM, including the K-means, LBG and ITVQ algorithms, were tested. The GMM-ITVQ algorithm was found to be the most efficient alternative to GMM-EM, giving correct classification rates at a similar level to those of GMM-EM. Finally, representative performance benchmarks and system behaviour experiments on the NIST SRE corpora are presented.
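A hedged sketch of LBG codebook training, built on sklearn's k-means; ITVQ is omitted, as it has no standard open-source implementation:

```python
# Sketch: Linde-Buzo-Gray codebook growth. Start from the global centroid,
# split every codeword by a small perturbation, and refine with k-means
# after each split until the target codebook size is reached.
import numpy as np
from sklearn.cluster import KMeans

def lbg_codebook(X, size, eps=1e-3):
    codebook = X.mean(axis=0, keepdims=True)      # start with one codeword
    while len(codebook) < size:
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])  # split
        km = KMeans(n_clusters=len(codebook), init=codebook, n_init=1).fit(X)
        codebook = km.cluster_centers_            # k-means refinement step
    return codebook

X = np.random.default_rng(2).normal(size=(1000, 12))  # stand-in cepstral vectors
print(lbg_codebook(X, 16).shape)                      # -> (16, 12)
```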
international conference on signal processing and communication systems | 2015
Haytham M. Fayek; Margaret Lech; Lawrence Cavedon
Most existing Speech Emotion Recognition (SER) systems rely on turn-wise processing, which aims to recognize emotions from complete utterances, and on an overly complicated pipeline marred by many preprocessing steps and hand-engineered features. To overcome both drawbacks, we propose a real-time SER system based on end-to-end deep learning. Namely, a Deep Neural Network (DNN) that recognizes emotions from a one-second frame of raw speech spectrograms is presented and investigated. This is achievable due to a deep hierarchical architecture, data augmentation, and sensible regularization. Promising results are reported on two databases: the eNTERFACE database and the Surrey Audio-Visual Expressed Emotion (SAVEE) database.
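A minimal sketch in this spirit, assuming an illustrative small CNN in PyTorch rather than the paper's exact architecture:

```python
# Sketch: a small CNN mapping a one-second spectrogram "image" directly to
# emotion logits. Layer sizes, input shape, and the six-class output are
# illustrative assumptions.
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    def __init__(self, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Dropout(0.5),                  # regularization, as emphasized above
        )
        self.classifier = nn.Linear(32 * 32 * 25, n_classes)

    def forward(self, x):                     # x: (batch, 1, 128, 100)
        z = self.features(x)                  # -> (batch, 32, 32, 25)
        return self.classifier(z.flatten(1))

# One second of speech as e.g. 128 mel bands x 100 frames
x = torch.randn(4, 1, 128, 100)
print(SpectrogramCNN()(x).shape)              # -> torch.Size([4, 6])
```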
international conference on acoustics, speech, and signal processing | 2012
Kuan Ee Brian Ooi; Lu-Shih Alex Low; Margaret Lech; Nicholas B. Allen
Previous studies of automated detection of Major Depression in adolescents based on acoustic speech analysis identified the glottal and Teager Energy features as the strongest correlates of depression. This study investigates the effectiveness of these features in the early prediction of Major Depression in adolescents using a fully automated speech analysis and classification system. The prediction was achieved through a binary classification of speech recordings from 15 adolescents who developed Major Depression within two years after the recordings were made and 15 adolescents who did not develop Major Depression within the same period. The results provide a proof of concept that acoustic speech analysis can be used in the early prediction of depression. The glottal features were the strongest predictors of depression, with 69% accuracy, 62% specificity and 76% sensitivity. The TEO feature derived from the glottal wave also gave good results, specifically when calculated over the frequency range of 1.3 kHz to 5.5 kHz.
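For reference, the discrete Teager Energy Operator underlying the TEO feature is psi[x](n) = x(n)^2 - x(n-1)x(n+1); a minimal sketch applying it to a band-passed signal over the 1.3-5.5 kHz range (the Butterworth band-pass stand-in is an assumption; glottal inverse filtering is beyond this sketch):

```python
# Sketch: discrete Teager Energy Operator on a band-limited waveform.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def teager_energy(x):
    """psi[x](n) = x(n)^2 - x(n-1) * x(n+1), valid for interior samples."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

fs = 16000
t = np.arange(0, 0.1, 1 / fs)
signal = np.sin(2 * np.pi * 2000 * t)                  # stand-in waveform
sos = butter(4, [1300, 5500], btype="bandpass", fs=fs, output="sos")
print(teager_energy(sosfiltfilt(sos, signal)).mean())  # mean TEO in the band
```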