Publication


Featured research published by Phu Ngoc Le.


ACM Multimedia | 2015

An Investigation of Annotation Delay Compensation and Output-Associative Fusion for Multimodal Continuous Emotion Prediction

Zhaocheng Huang; Ting Dang; Nicholas Cummins; Brian Stasak; Phu Ngoc Le; Vidhyasaharan Sethu; Julien Epps

Continuous emotion dimension prediction has increased in popularity over the last few years, as the shift away from discrete classification-based tasks has introduced more realism into emotion modeling. However, many questions remain, including how best to combine information from several modalities (e.g. audio, video). As part of the AV+EC 2015 Challenge, we investigate annotation delay compensation and propose a range of multimodal systems based on an output-associative fusion framework. The performance of the proposed systems is significantly higher than the challenge baseline, with the strongest-performing system yielding 66.7% and 53.9% relative increases in prediction accuracy over the AV+EC 2015 test set arousal and valence baselines respectively. Results also demonstrate the importance of annotation delay compensation for continuous emotion analysis. Of particular interest was the output-associative fusion framework, which performed very well in a number of significantly different configurations, highlighting that incorporating both affective dimensional dependencies and temporal information is a promising research direction for predicting emotion dimensions.
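
To make the annotation delay compensation idea concrete, below is a minimal Python sketch of the usual approach: shifting the continuous labels back in time so they line up with the features that produced them. The function name and the 4-frame delay are illustrative assumptions; in practice the delay is tuned rather than fixed.

```python
import numpy as np

def compensate_annotation_delay(features, labels, delay_frames):
    """Align continuous emotion labels with features by shifting the
    labels back in time by a fixed number of frames.

    Annotators react to events with a lag, so the label at frame t
    actually describes the audio/video around frame t - delay_frames.
    """
    if delay_frames <= 0:
        return features, labels
    # Drop the first `delay_frames` labels and the last `delay_frames`
    # feature frames so the two sequences stay the same length.
    return features[:-delay_frames], labels[delay_frames:]

# Hypothetical usage: 100 frames of 10-d features, 4-frame annotator lag.
feats = np.random.randn(100, 10)
gold = np.random.randn(100)
X, y = compensate_annotation_delay(feats, gold, delay_frames=4)
print(X.shape, y.shape)  # (96, 10) (96,)
```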


Speech Communication | 2011

Investigation of spectral centroid features for cognitive load classification

Phu Ngoc Le; Eliathamby Ambikairajah; Julien Epps; Vidhyasaharan Sethu; Eric H. C. Choi

Speech is a promising modality for the convenient measurement of cognitive load, and recent years have seen the development of several cognitive load classification systems. Many of these systems have utilised mel frequency cepstral coefficients (MFCCs) and prosodic features like pitch and intensity to discriminate between different cognitive load levels. However, the accuracies obtained by these systems are still not high enough to allow for their use outside of laboratory environments. One reason for this might be the imperfect acoustic description of speech provided by MFCCs. Since these features do not characterise the distribution of the spectral energy within subbands, in this paper we investigate the use of spectral centroid frequency (SCF) and spectral centroid amplitude (SCA) features, applying them to the problem of automatic cognitive load classification. The effect of varying the number of filters and the frequency scale used is also evaluated, in terms of the effectiveness of the resultant spectral centroid features in discriminating between cognitive loads. The results of classification experiments show that the spectral centroid features consistently and significantly outperform a baseline system employing MFCC, pitch, and intensity features. Experimental results reported in this paper indicate that the fusion of an SCF-based system with an SCA-based system yields relative reductions in error rate of 39% and 29% for two different cognitive load databases.
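
As a rough illustration of subband centroid features, the sketch below computes a per-band spectral centroid frequency and a simple centroid-amplitude proxy over rectangular bands. The filter shapes, band edges, and exact SCA definition in the paper may well differ; everything here is an illustrative assumption.

```python
import numpy as np

def subband_centroid_features(mag_spec, freqs, band_edges):
    """Per-subband spectral centroid frequency (SCF) and a simple
    spectral centroid amplitude (SCA) proxy from a magnitude spectrum.

    mag_spec  : (n_bins,) magnitude spectrum of one frame
    freqs     : (n_bins,) bin centre frequencies in Hz
    band_edges: list of (lo, hi) subband edges in Hz
    """
    scf, sca = [], []
    for lo, hi in band_edges:
        idx = (freqs >= lo) & (freqs < hi)
        m, f = mag_spec[idx], freqs[idx]
        scf.append((f * m).sum() / (m.sum() + 1e-12))  # magnitude-weighted mean frequency
        sca.append((f * m).sum() / (f.sum() + 1e-12))  # frequency-weighted mean amplitude
    return np.array(scf), np.array(sca)

# Hypothetical usage on a random frame (16 kHz, 512-point FFT).
spec = np.abs(np.fft.rfft(np.random.randn(512)))
freqs = np.fft.rfftfreq(512, d=1 / 16000)
edges = [(0, 500), (500, 1000), (1000, 2000), (2000, 4000), (4000, 8000)]
scf, sca = subband_centroid_features(spec, freqs, edges)
```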


International Conference on Pattern Recognition | 2010

A Study of Voice Source and Vocal Tract Filter Based Features in Cognitive Load Classification

Phu Ngoc Le; Julien Epps; Eric H. C. Choi; Eliathamby Ambikairajah

Speech has been recognized as an attractive method for the measurement of cognitive load. Previous approaches have used mel frequency cepstral coefficients (MFCCs) as discriminative features to classify cognitive load. The MFCCs contain information from both the voice source and the vocal tract, so the individual contributions of each to cognitive load variation are unclear. This paper aims to extract speech features related to either the voice source or the vocal tract and use them to discriminate between cognitive load levels, in order to identify the individual contribution of each to cognitive load measurement. Voice source-related features are then used to improve the performance of current cognitive load classification systems, using adapted Gaussian mixture models. Our experimental results show that the use of voice source features could yield around a 12% relative reduction in error rate compared with the baseline system based on MFCCs, intensity, and pitch contour.
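
One standard way to factor speech into source and filter, which the sketch below illustrates, is LPC inverse filtering: the all-pole LPC fit approximates the vocal tract, and the prediction residual approximates the voice source. This assumes librosa for the LPC fit and is a generic sketch, not the paper's exact feature extraction.

```python
import numpy as np
import librosa
from scipy.signal import lfilter, freqz

def source_tract_split(frame, lpc_order=12):
    """Split a windowed speech frame into a vocal tract estimate (the
    LPC envelope) and a voice source estimate (the LPC residual)."""
    a = librosa.lpc(frame, order=lpc_order)   # LPC polynomial, a[0] == 1.0
    residual = lfilter(a, [1.0], frame)       # inverse filter -> source estimate
    w, h = freqz([1.0], a, worN=256)          # 1/A(z) -> vocal tract envelope
    return residual, np.abs(h)

# Hypothetical usage on a Hann-windowed 25 ms frame at 16 kHz.
frame = np.hanning(400) * np.random.randn(400)
residual, envelope = source_tract_split(frame)
```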


International Conference on Communications | 2008

An improved soft threshold method for DCT speech enhancement

Phu Ngoc Le; Eliathamby Ambikairajah; Eric H. C. Choi

An improved soft threshold method for speech enhancement in the discrete cosine transform (DCT) domain is proposed in this paper. Rather than applying a threshold only to noise-dominant frames, as per traditional DCT-based approaches, our proposed approach also applies the thresholding process appropriately in signal-dominant frames. Experimental results show a quality improvement with our proposed method compared to traditional soft threshold methods.
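
The classical soft-threshold operation on DCT coefficients looks roughly like the sketch below. The paper's contribution, handling signal-dominant frames appropriately as well, is not reproduced here, and the fixed threshold stands in for one derived from a noise estimate.

```python
import numpy as np
from scipy.fft import dct, idct

def soft_threshold_dct(frame, threshold):
    """Denoise one speech frame by soft-thresholding its DCT coefficients.

    Soft thresholding shrinks every coefficient towards zero by
    `threshold`, zeroing those whose magnitude falls below it.
    """
    c = dct(frame, type=2, norm='ortho')
    c_shrunk = np.sign(c) * np.maximum(np.abs(c) - threshold, 0.0)
    return idct(c_shrunk, type=2, norm='ortho')

# Hypothetical usage: a noisy 256-sample frame; a real threshold would
# be derived from a noise estimate, not fixed like this.
noisy = np.sin(np.linspace(0, 8 * np.pi, 256)) + 0.3 * np.random.randn(256)
clean = soft_threshold_dct(noisy, threshold=0.5)
```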


International Conference on Information and Communication Security | 2009

A non-uniform subband approach to speech-based cognitive load classification

Phu Ngoc Le; Eliathamby Ambikairajah; Eric H. C. Choi; Julien Epps

Speech has recently been recognized as an attractive method for the measurement of cognitive load. Current speech-based cognitive load measurement systems utilize acoustic features derived from auditory-motivated frequency scales. This paper aims to investigate the distribution of speech information specific to cognitive load discrimination as a function of frequency. We found that this distribution is neither uniform nor very similar to the mel auditory scale, and based on our experiments, we propose a novel non-uniform filterbank for acoustic feature extraction to classify cognitive load. Experimental results showed that the use of the proposed filterbank provided a relative improvement of about 10% over the classification accuracy of the traditional cognitive load classification system based on a mel-scale filterbank.
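
A non-uniform filterbank of the general kind described can be built by placing triangular filters on freely chosen band edges, as in this sketch. The edges shown are hypothetical, not the ones proposed in the paper.

```python
import numpy as np

def triangular_filterbank(band_edges_hz, n_fft, sr):
    """Build triangular filters on an arbitrary (non-uniform) set of
    band edges, in the same spirit as a mel filterbank but with edges
    chosen freely rather than on an auditory scale.

    band_edges_hz: sequence of M+2 edge frequencies for M filters
    Returns an (M, n_fft//2 + 1) weight matrix.
    """
    freqs = np.fft.rfftfreq(n_fft, d=1 / sr)
    edges = np.asarray(band_edges_hz, dtype=float)
    fbank = np.zeros((len(edges) - 2, len(freqs)))
    for m in range(1, len(edges) - 1):
        lo, ctr, hi = edges[m - 1], edges[m], edges[m + 1]
        up = (freqs - lo) / (ctr - lo)      # rising slope
        down = (hi - freqs) / (hi - ctr)    # falling slope
        fbank[m - 1] = np.clip(np.minimum(up, down), 0.0, None)
    return fbank

# Hypothetical non-uniform edges concentrating filters below 1 kHz.
fb = triangular_filterbank([0, 120, 300, 550, 900, 1400, 2500, 4000, 8000],
                           n_fft=512, sr=16000)
```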


ACM Multimedia | 2016

Staircase Regression in OA RVM, Data Selection and Gender Dependency in AVEC 2016

Zhaocheng Huang; Brian Stasak; Ting Dang; Kalani Wataraka Gamage; Phu Ngoc Le; Vidhyasaharan Sethu; Julien Epps

Within the field of affective computing, human emotion and disorder/disease recognition have progressively attracted more interest in multimodal analysis. This submission to the Depression Classification and Continuous Emotion Prediction challenges of AVEC 2016 investigates both, with a focus on audio subsystems. For depression classification, we investigate token word selection, vocal tract coordination parameters computed from spectral centroid features, and gender-dependent classification systems. Token word selection performed very well on the development set. For emotion prediction, we investigate emotionally salient data selection based on emotion change, an output-associative regression approach based on the probabilistic outputs of relevance vector machine classifiers operating on low-high class pairs (OA RVM-SR), and gender-dependent systems. Experimental results from both the development and test sets show that the RVM-SR method under the OA framework can improve on OA RVM, which performed very well in the AV+EC 2015 challenge.
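
For readers unfamiliar with output-associative (OA) fusion, the sketch below shows the core idea with a plain linear second stage: one dimension is re-predicted from a temporal window of both arousal and valence first-stage outputs. The paper's OA RVM-SR system builds on relevance vector machine classifiers over low-high class pairs, which this simplified sketch does not attempt to reproduce.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def output_associative_fusion(pred_arousal, pred_valence, gold, win=3):
    """Second-stage 'output-associative' regression: predict one emotion
    dimension from a temporal window of first-stage arousal AND valence
    predictions, so cross-dimensional dependencies are exploited."""
    T = len(pred_arousal)
    X = []
    for t in range(T):
        # Clamp the window at the sequence boundaries.
        idx = np.clip(np.arange(t - win, t + win + 1), 0, T - 1)
        X.append(np.concatenate([pred_arousal[idx], pred_valence[idx]]))
    return LinearRegression().fit(np.array(X), gold)

# Hypothetical usage with random first-stage outputs.
a, v = np.random.randn(200), np.random.randn(200)
gold_arousal = np.random.randn(200)
model = output_associative_fusion(a, v, gold_arousal)
```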


Conference of the International Speech Communication Association | 2016

Investigation of Sub-Band Discriminative Information Between Spoofed and Genuine Speech

Kaavya Sriskandaraja; Vidhyasaharan Sethu; Phu Ngoc Le; Eliathamby Ambikairajah

A speaker verification system should include effective precautions against malicious spoofing attacks, and although some initial countermeasures have been proposed recently, this remains a challenging research problem. This paper investigates discrimination between spoofed and genuine speech as a function of frequency bands, across the speech bandwidth. Findings from our investigation inform some proposed filterbank design approaches for the discrimination of spoofed speech. Experiments conducted on the Spoofing and Anti-Spoofing (SAS) corpus using the proposed frequency-selective approach demonstrate an 11% relative improvement in terms of equal error rate compared with a conventional mel filterbank.
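
One common way to quantify per-band discrimination of the sort investigated here is a Fisher ratio over per-band features, as sketched below. The abstract does not state the paper's exact measure, so treat this as an assumption.

```python
import numpy as np

def band_fisher_ratios(genuine_feats, spoofed_feats):
    """Score how discriminative each subband feature is between genuine
    and spoofed speech with a per-dimension Fisher ratio: between-class
    separation over within-class spread.

    genuine_feats, spoofed_feats: (n_frames, n_bands) arrays of per-band
    energies (or similar features) from each class.
    """
    mu_g, mu_s = genuine_feats.mean(0), spoofed_feats.mean(0)
    var_g, var_s = genuine_feats.var(0), spoofed_feats.var(0)
    return (mu_g - mu_s) ** 2 / (var_g + var_s + 1e-12)

# Hypothetical usage: bands with high ratios would receive more filters
# in a frequency-selective filterbank design.
g = np.random.randn(1000, 20) + 0.5
s = np.random.randn(1000, 20)
print(band_fisher_ratios(g, s).round(2))
```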


IEEE-RIVF International Conference on Computing and Communication Technologies | 2009

Improvement of Vietnamese Tone Classification using FM and MFCC Features

Phu Ngoc Le; Eliathamby Ambikairajah; Eric H. C. Choi

This paper focuses on tone classification for Vietnamese speech. Traditionally, tones have been classified or recognized using the fundamental frequency (F0). However, our experimental results indicate that, along with the fundamental frequency, mel frequency cepstral coefficients (MFCCs) and frequency modulation (FM) features also carry a significant amount of tone information in Vietnamese speech. The proposed method therefore takes both of these feature types into account to improve classification accuracy. The experimental results show that the proposed classification system provides an improvement of 7.5% in accuracy compared to the conventional system based on F0 alone.
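
A minimal sketch of the fused-feature classification idea: concatenate F0, MFCC, and FM features per frame and train one Gaussian mixture model per tone, classifying by maximum likelihood. The classifier choice and dimensionalities here are illustrative assumptions, not necessarily the paper's setup.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_tone_gmms(features_by_tone, n_components=8):
    """Train one GMM per tone class on concatenated F0 + MFCC + FM
    feature vectors."""
    return {tone: GaussianMixture(n_components, covariance_type='diag',
                                  random_state=0).fit(X)
            for tone, X in features_by_tone.items()}

def classify(gmms, utterance_feats):
    """Classify an utterance by maximum average log-likelihood."""
    scores = {t: g.score(utterance_feats) for t, g in gmms.items()}
    return max(scores, key=scores.get)

# Hypothetical usage: 6 Vietnamese tones, 20-d fused feature vectors.
rng = np.random.default_rng(0)
train = {t: rng.normal(t, 1.0, size=(300, 20)) for t in range(6)}
gmms = train_tone_gmms(train)
print(classify(gmms, rng.normal(2, 1.0, size=(50, 20))))  # likely 2
```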


Physiological Measurement | 2017

Improving the quality and accuracy of non-invasive blood pressure measurement by visual inspection and automated signal processing of the Korotkoff sounds

Branko G. Celler; Phu Ngoc Le; Jim Basilakis; Eliathamby Ambikairajah

OBJECTIVE In this study we investigate inter-operator differences in determining systolic and diastolic pressure from auscultatory recordings of Korotkoff sounds. We introduce a new method to record and convert Korotkoff sounds to a high-fidelity sound file which can be replayed under optimal conditions by multiple operators for the independent determination of systolic and diastolic pressure points. APPROACH We have developed a digitised database of 643 non-invasive blood pressure (NIBP) records from 216 subjects. The Korotkoff signals of 310 good-quality records were digitised and the Korotkoff sounds converted to high-fidelity audio files. A randomly selected subset of 90 of these data files was used by an expert panel to independently detect systolic and diastolic points. We then developed a semi-automated method of visualising processed Korotkoff sounds, supported by simple algorithms to detect systolic and diastolic pressure points, which provided new insights into the reasons for the large differences recorded by the expert panel. MAIN RESULTS Detailed analysis of the 90 randomly selected records revealed that the peak root mean square (RMS) energy of the Korotkoff sounds ranged from 3.3 to 84 mV rms, with the lower bound below the audible range of 4-6 mV rms. The diastolic phase was below the minimum auditory threshold in 47/90 records, indicating that for approximately 50% of all records diastole could not be determined from Phase V silence. The maximum relative error recorded for systolic pressure between the two methods, auscultatory and visual/algorithmic, was 30.8 mmHg, with a mean error of 8.0 ± 5.4 mmHg. We explore the impact of the signal morphology and intensity of the Korotkoff sounds, as well as noise, cardiac arrhythmia and the hearing acuity of the operator, on the accuracy of the measurement. SIGNIFICANCE We conclude that large intra-personal variability in Korotkoff signal morphology and amplitude, as well as variations in the hearing acuity of the operator, makes accurate NIBP measurement using sphygmomanometry difficult, and that manual auscultation should not be used as the gold standard against which automated NIBP devices are calibrated. We propose an alternative method of visualising the energy of the Korotkoff sounds and applying simple algorithms to determine systolic and diastolic pressure points, which, whilst mimicking classical sphygmomanometry, eliminates the problems associated with operator hearing acuity and complex and variable Korotkoff signal morphology.
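
The RMS-energy visualisation at the heart of the proposed method can be approximated as below: compute a short-time RMS envelope of the Korotkoff recording and locate the first and last crossings of an audibility threshold as candidate systolic and diastolic points. This is a heavily simplified sketch; the paper's semi-automated method also relies on visual inspection, and the signal and threshold here are synthetic.

```python
import numpy as np

def rms_envelope(signal, sr, frame_ms=20):
    """Short-time RMS envelope of a Korotkoff sound recording."""
    n = int(sr * frame_ms / 1000)
    frames = signal[:len(signal) // n * n].reshape(-1, n)
    return np.sqrt((frames ** 2).mean(axis=1))

def threshold_crossings(env, threshold):
    """First and last frames where beat energy exceeds the threshold,
    serving as candidate systolic and diastolic points."""
    idx = np.flatnonzero(env >= threshold)
    return (idx[0], idx[-1]) if idx.size else (None, None)

# Hypothetical usage: synthetic 10 s recording at 2 kHz with a burst of
# 'beats' in the middle; a real threshold would come from the noise floor.
sr = 2000
x = 0.01 * np.random.randn(10 * sr)
x[3 * sr:7 * sr] += 0.2 * np.sin(2 * np.pi * 40 * np.arange(4 * sr) / sr)
env = rms_envelope(x, sr)
first, last = threshold_crossings(env, threshold=0.05)
```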


ACM Multimedia | 2017

Investigating Word Affect Features and Fusion of Probabilistic Predictions Incorporating Uncertainty in AVEC 2017

Ting Dang; Brian Stasak; Zhaocheng Huang; Sadari Jayawardena; Mia Atcheson; Munawar Hayat; Phu Ngoc Le; Vidhyasaharan Sethu; Roland Goecke; Julien Epps

Predicting emotion intensity and the severity of depression are both challenging and important problems within the broader field of affective computing. As part of AVEC 2017, we developed a number of systems to accomplish these tasks. In particular, word affect features, which derive human affect ratings (e.g. arousal and valence) from transcripts, were investigated for predicting depression severity and liking, showing great promise. A simple system based on the word affect features achieved an RMSE of 6.02 on the test set, yielding a relative improvement of 13.6% over the baseline. For the emotion prediction sub-challenge, we investigated multimodal fusion, which incorporated a measure of uncertainty associated with each prediction within an output-associative fusion framework for arousal and valence prediction, whilst liking prediction systems mainly focused on text-based features. Our best emotion prediction systems provided significant relative improvements over the baseline on the test set of 39.5%, 17.6%, and 29.3% for arousal, valence, and liking respectively. Of particular note, consistent improvements were observed when incorporating prediction uncertainty across various system configurations for predicting arousal and valence, suggesting the importance of taking prediction uncertainty into consideration for fusion and, more broadly, the advantages of probabilistic predictions.
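
Word affect features of the kind described can be sketched as a lexicon lookup plus aggregation, as below. The miniature lexicon and its ratings are invented for illustration; real systems draw on large published word-rating lexicons.

```python
import numpy as np

# Hypothetical miniature affect lexicon: word -> (arousal, valence).
# The words and values are illustrative only.
AFFECT_LEXICON = {
    "happy": (6.5, 8.2), "tired": (2.6, 3.9),
    "angry": (7.2, 2.5), "calm": (2.4, 6.9),
}

def word_affect_features(transcript):
    """Average the per-word arousal/valence ratings over the words of a
    transcript that appear in the lexicon, yielding a 2-d feature."""
    hits = [AFFECT_LEXICON[w] for w in transcript.lower().split()
            if w in AFFECT_LEXICON]
    if not hits:
        return np.array([np.nan, np.nan])
    return np.asarray(hits).mean(axis=0)

print(word_affect_features("I feel tired and a little angry"))
```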

Collaboration


Dive into Phu Ngoc Le's collaborations.

Top Co-Authors

Vidhyasaharan Sethu (University of New South Wales)
Julien Epps (University of New South Wales)
Brian Stasak (University of New South Wales)
Ting Dang (University of New South Wales)
Zhaocheng Huang (University of New South Wales)
Branko G. Celler (University of New South Wales)
Jayashri Ravishankar (University of New South Wales)
Kaavya Sriskandaraja (University of New South Wales)