Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Duc Le is active.

Publication


Featured research published by Duc Le.


IEEE Automatic Speech Recognition and Understanding Workshop | 2013

Emotion recognition from spontaneous speech using Hidden Markov models with deep belief networks

Duc Le; Emily Mower Provost

Research in emotion recognition seeks to develop insights into the temporal properties of emotion. However, automatic emotion recognition from spontaneous speech is challenging due to non-ideal recording conditions and highly ambiguous ground truth labels. Further, emotion recognition systems typically work with noisy high-dimensional data, rendering it difficult to find representative features and train an effective classifier. We tackle this problem by using Deep Belief Networks, which can model complex and non-linear high-level relationships between low-level features. We propose and evaluate a suite of hybrid classifiers based on Hidden Markov Models and Deep Belief Networks. We achieve state-of-the-art results on FAU Aibo, a benchmark dataset in emotion recognition [1]. Our work provides insights into important similarities and differences between speech and emotion.
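
The hybrid approach described above can be illustrated with a minimal sketch: a neural network (standing in for the Deep Belief Network) emits per-frame emotion-state posteriors, which are converted to scaled likelihoods and decoded with a Viterbi pass over an HMM. The shapes, priors, and transition probabilities below are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of the hybrid idea: a network (a stand-in for the Deep Belief
# Network) produces per-frame emotion-state posteriors, which are converted to
# scaled likelihoods and decoded with a Viterbi pass over an HMM.
# All shapes, priors, and transition probabilities are illustrative.
import numpy as np

def viterbi(log_emissions, log_trans, log_init):
    """log_emissions: (T, S) per-frame log-likelihoods for each HMM state."""
    T, S = log_emissions.shape
    delta = log_init + log_emissions[0]            # best log score ending in each state
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans        # (previous state, next state)
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emissions[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Fake per-frame posteriors, as if produced by a trained network (T=100 frames, S=4 states).
rng = np.random.default_rng(0)
posteriors = rng.dirichlet(np.ones(4), size=100)        # p(state | frame)
priors = posteriors.mean(axis=0)                        # p(state)
log_emissions = np.log(posteriors) - np.log(priors)     # scaled likelihoods log p(frame | state)

log_trans = np.log(np.full((4, 4), 0.05) + 0.80 * np.eye(4))   # sticky state transitions
log_init = np.log(np.full(4, 0.25))
state_path = viterbi(log_emissions, log_trans, log_init)
```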


Affective Computing and Intelligent Interaction | 2015

Data selection for acoustic emotion recognition: Analyzing and comparing utterance and sub-utterance selection strategies

Duc Le; Emily Mower Provost

Data selection is an important component of cross-corpus training and semi-supervised/active learning. However, its effect on acoustic emotion recognition is still not well understood. In this work, we perform an in-depth exploration of various data selection strategies for emotion classification from speech using classifier agreement as the selection metric. Our methods span both the traditional utterance level and the less explored sub-utterance level. A median unweighted average recall of 70.68%, comparable to the winner of the 2009 INTERSPEECH Emotion Challenge, was achieved on the FAU Aibo 2-class problem using less than 50% of the training data. Our results indicate that sub-utterance selection leads to slightly faster convergence and significantly more stable learning. In addition, diversifying instances in terms of classifier agreement produces a faster learning rate, whereas selecting those near the median results in higher stability. We show that the selected data instances can be explained intuitively based on their acoustic properties and position within an utterance. Our work helps provide a deeper understanding of the strengths, weaknesses, and trade-offs of different data selection strategies for speech emotion recognition.
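
One way to picture agreement-based selection is with a small committee of classifiers voting on pooled instances. The sketch below uses generic scikit-learn models and synthetic data as stand-ins for the paper's features and classifiers, and the two selection strategies only loosely mirror the diversity- and median-based strategies mentioned in the abstract.

```python
# Sketch of agreement-based data selection with a small classifier committee.
# Generic models and synthetic data stand in for the paper's features and classifiers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X_seed, y_seed = make_classification(n_samples=200, n_features=20, random_state=0)
X_pool, _ = make_classification(n_samples=1000, n_features=20, random_state=1)

committee = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=100, random_state=0),
    SVC(),
]
for clf in committee:
    clf.fit(X_seed, y_seed)

# Agreement score: fraction of committee members that vote with the majority.
votes = np.stack([clf.predict(X_pool) for clf in committee])   # (members, pool size)
majority = (votes.mean(axis=0) >= 0.5).astype(int)
agreement = (votes == majority).mean(axis=0)

order = np.argsort(agreement)
# Strategy A (diversify): spread selections across the agreement spectrum.
diverse_idx = order[::2]
# Strategy B (median): keep instances whose agreement is closest to the median.
median_idx = np.argsort(np.abs(agreement - np.median(agreement)))[: len(X_pool) // 2]
```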


IEEE Transactions on Audio, Speech, and Language Processing | 2016

Automatic Assessment of Speech Intelligibility for Individuals With Aphasia

Duc Le; Keli Licata; Carol Persad; Emily Mower Provost

Traditional in-person therapy may be difficult to access for individuals with aphasia due to the shortage of speech-language pathologists and high treatment cost. Computerized exercises offer a promising low-cost and constantly accessible supplement to in-person therapy. Unfortunately, the lack of feedback for verbal expression in existing programs hinders the applicability and effectiveness of this form of treatment. A prerequisite for producing meaningful feedback is speech intelligibility assessment. In this work, we investigate the feasibility of an automated system to assess three aspects of aphasic speech intelligibility: clarity, fluidity, and prosody. We introduce our aphasic speech corpus, which contains speech-based interaction between individuals with aphasia and a tablet-based application designed for therapeutic purposes. We present our method for eliciting reliable ground-truth labels for speech intelligibility based on the perceptual judgment of nonexpert human evaluators. We describe and analyze our feature set engineered for capturing pronunciation, rhythm, and intonation. We investigate the classification performance of our system under two conditions, one using human-labeled transcripts to drive feature extraction, and another using transcripts generated automatically. We show that some aspects of aphasic speech intelligibility can be estimated at human-level performance. Our results demonstrate the potential for the computerized treatment of aphasia and lay the groundwork for bridging the gap between human and automatic intelligibility assessment.
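
The rhythm and intonation dimensions can be approximated with simple signal-level statistics. The sketch below (pause statistics from an energy threshold, F0 statistics from pitch tracking) is a rough illustration of that kind of feature extraction, not the paper's engineered feature set; the threshold and feature names are assumptions.

```python
# Rough sketch of rhythm and intonation features of the kind the paper describes
# (pause statistics, pitch statistics). The threshold and feature set are illustrative.
import numpy as np
import librosa

def prosody_features(wav_path, sr=16000):
    y, sr = librosa.load(wav_path, sr=sr)

    # Rhythm/fluidity proxies: crude energy-based speech/pause segmentation.
    rms = librosa.feature.rms(y=y, frame_length=400, hop_length=160)[0]
    speech = rms > 0.1 * rms.max()                  # simple energy threshold (assumption)
    pause_ratio = 1.0 - speech.mean()
    n_pauses = int(np.sum(np.diff(speech.astype(int)) == -1))   # speech-to-pause transitions

    # Intonation proxies: F0 statistics over voiced frames.
    f0, voiced, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)
    f0 = f0[voiced & ~np.isnan(f0)]
    return {
        "pause_ratio": pause_ratio,
        "pauses_per_second": n_pauses / (len(y) / sr),
        "f0_mean": float(np.mean(f0)) if f0.size else 0.0,
        "f0_std": float(np.std(f0)) if f0.size else 0.0,
        "f0_range": float(np.ptp(f0)) if f0.size else 0.0,
    }
```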


Conference of the International Speech Communication Association | 2016

Improving automatic recognition of aphasic speech with AphasiaBank

Duc Le; Emily Mower Provost

Automatic recognition of aphasic speech is challenging due to various speech-language impairments associated with aphasia as well as a scarcity of training data appropriate for this speaker population. AphasiaBank, a shared database of multimedia interactions primarily used by clinicians to study aphasia, offers a promising source of data for Deep Neural Network acoustic modeling. In this paper, we establish the first large-vocabulary continuous speech recognition baseline on AphasiaBank and study recognition accuracy as a function of diagnoses. We investigate several out-of-domain adaptation methods and show that AphasiaBank data can be leveraged to significantly improve the recognition rate on a smaller aphasic speech corpus. This work helps broaden the understanding of aphasic speech recognition, demonstrates the potential of AphasiaBank, and guides researchers who wish to use this database for their own work.
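
A common form of out-of-domain adaptation, and one plausible reading of the setup above, is to pretrain a neural acoustic model on the large corpus and then fine-tune it on the small in-domain set with a reduced learning rate. The PyTorch sketch below uses placeholder dimensions and hypothetical data loaders (aphasiabank_loader, target_corpus_loader); it is not the paper's exact model or recipe.

```python
# Minimal pretrain-then-fine-tune sketch for out-of-domain adaptation.
# Architecture, dimensions, and data loaders are placeholders, not the paper's models.
import torch
import torch.nn as nn

class AcousticModel(nn.Module):
    def __init__(self, n_feats=40, n_hidden=512, n_senones=2000):
        super().__init__()
        self.hidden = nn.Sequential(
            nn.Linear(n_feats, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
        )
        self.output = nn.Linear(n_hidden, n_senones)

    def forward(self, x):
        return self.output(self.hidden(x))

def train(model, loader, lr, epochs):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for feats, senones in loader:
            opt.zero_grad()
            loss = loss_fn(model(feats), senones)
            loss.backward()
            opt.step()

model = AcousticModel()
# 1) Pretrain on the large out-of-domain corpus (hypothetical loader).
# train(model, aphasiabank_loader, lr=1e-3, epochs=10)
# 2) Fine-tune on the small in-domain corpus with a smaller learning rate.
# train(model, target_corpus_loader, lr=1e-4, epochs=5)
```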


International Conference on Acoustics, Speech, and Signal Processing | 2014

Automatic analysis of speech quality for aphasia treatment

Duc Le; Keli Licata; Elizabeth Mercado; Carol Persad; Emily Mower Provost

Aphasia is a common language disorder which can severely affect an individual's ability to communicate with others. Aphasia rehabilitation requires intensive practice accompanied by appropriate feedback, the latter of which is difficult to satisfy outside of therapy. In this paper, we take a first step towards developing an intelligent system capable of providing feedback to patients with aphasia through the automation of two typical therapeutic exercises, sentence building and picture description. We describe the natural speech corpus collected from our interaction with clients in the University of Michigan Aphasia Program (UMAP). We develop classifiers to automatically estimate speech quality based on human perceptual judgment. Our automatic prediction yields accuracies comparable to the average human evaluator. Our feature selection process gives insights into the factors that influence human evaluation. The results presented in this work provide support for the feasibility of this type of system.
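
The feature-selection-plus-classification flow described above might look roughly like the scikit-learn pipeline below; the synthetic features, labels, and choice of selector are placeholders rather than the paper's setup.

```python
# Hedged sketch of feature selection followed by classification; data and selector
# are placeholders, not the paper's features or labels.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=60, n_informative=10, random_state=0)
pipe = make_pipeline(SelectKBest(f_classif, k=15), SVC())
scores = cross_val_score(pipe, X, y, cv=5)

# Inspecting which features survive selection is the kind of analysis that yields
# insight into what drives the (here, simulated) perceptual labels.
pipe.fit(X, y)
selected = pipe.named_steps["selectkbest"].get_support(indices=True)
```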


Speech Communication | 2018

Automatic quantitative analysis of spontaneous aphasic speech

Duc Le; Keli Licata; Emily Mower Provost

Spontaneous speech analysis plays an important role in the study and treatment of aphasia, but can be difficult to perform manually due to the time-consuming nature of speech transcription and coding. Techniques in automatic speech recognition and assessment can potentially alleviate this problem by allowing clinicians to quickly process large amounts of speech data. However, automatic analysis of spontaneous aphasic speech has been relatively under-explored in the engineering literature, partly due to the limited amount of available data and difficulties associated with aphasic speech processing. In this work, we perform one of the first large-scale quantitative analyses of spontaneous aphasic speech based on automatic speech recognition (ASR) output. We describe our acoustic modeling method that sets a new recognition benchmark on AphasiaBank, a large-scale aphasic speech corpus. We propose a set of clinically relevant quantitative measures that are shown to be highly robust to automatic transcription errors. Finally, we demonstrate that these measures can be used to accurately predict the revised Western Aphasia Battery (WAB-R) Aphasia Quotient (AQ) without the need for manual transcripts. The results and techniques presented in our work will help advance the state-of-the-art in aphasic speech processing and make ASR-based technology for aphasia treatment more feasible in real-world clinical applications.
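
As a rough illustration, the pipeline from ASR output to severity prediction could look like the sketch below: a few generic quantitative measures (speech rate, type-token ratio, mean word length) are computed from a transcript and a regression model maps them to an AQ-like score. The measures and the synthetic data are placeholders, not the clinically relevant measures proposed in the paper.

```python
# Illustrative sketch: generic transcript-level measures plus a regression model
# predicting an AQ-like score. Measures and data are placeholders, not the paper's.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

def asr_measures(transcript_words, duration_sec):
    words = [w.lower() for w in transcript_words]
    return np.array([
        len(words) / max(duration_sec / 60.0, 1e-6),           # words per minute
        len(set(words)) / max(len(words), 1),                   # type-token ratio
        np.mean([len(w) for w in words]) if words else 0.0,     # mean word length
    ])

example = asr_measures(["the", "boy", "is", "running"], duration_sec=3.2)

# Synthetic stand-in for per-speaker measure vectors and WAB-R AQ scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
aq = 60 + X @ np.array([5.0, 10.0, 2.0]) + rng.normal(scale=5, size=80)

predicted_aq = cross_val_predict(Ridge(alpha=1.0), X, aq, cv=5)
correlation = np.corrcoef(predicted_aq, aq)[0, 1]
```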


International Conference on Multimodal Interfaces | 2016

Wild wild emotion: a multimodal ensemble approach

John Gideon; Biqiao Zhang; Zakaria Aldeneh; Yelin Kim; Soheil Khorram; Duc Le; Emily Mower Provost

Automatic emotion recognition from audio-visual data is a topic that has been broadly explored using data captured in the laboratory. However, these data are not necessarily representative of how emotion is manifested in the real world. In this paper, we describe our system for the 2016 Emotion Recognition in the Wild challenge. We use the Acted Facial Expressions in the Wild database 6.0 (AFEW 6.0), which contains short clips of popular TV shows and movies and has more variability in the data compared to laboratory recordings. We explore a set of features that incorporate information from facial expressions and speech, in addition to cues from the background music and overall scene. In particular, we propose the use of a feature set composed of dimensional emotion estimates trained from outside acoustic corpora. We design sets of multiclass and pairwise (one-versus-one) classifiers and fuse the resulting systems. Our fusion increases the performance from a baseline of 38.81% to 43.86% and from 40.47% to 46.88%, for validation and test sets, respectively. While the video features perform better than audio features alone, a combination of the two modalities achieves the greatest performance, with gains of 4.4% and 1.4%, with and without information gain, respectively. Because of the flexible design of the fusion, it is easily adaptable to other multimodal learning problems.
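
A minimal version of the multiclass-plus-pairwise fusion idea can be sketched with generic scikit-learn classifiers on synthetic features; the models, score normalization, and equal fusion weights below are assumptions, not the challenge system's actual components.

```python
# Sketch of multiclass + pairwise (one-versus-one) late fusion with generic models
# and synthetic features standing in for the audio-visual feature sets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=50, n_classes=7,
                           n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

multiclass = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pairwise = OneVsOneClassifier(SVC()).fit(X_tr, y_tr)

rf_scores = multiclass.predict_proba(X_te)               # (n, n_classes) probabilities
ovo_votes = pairwise.decision_function(X_te)             # (n, n_classes) aggregated votes
ovo_scores = np.exp(ovo_votes) / np.exp(ovo_votes).sum(axis=1, keepdims=True)  # softmax

# Late fusion: equal-weight average of the two systems' class scores.
fused_pred = (0.5 * rf_scores + 0.5 * ovo_scores).argmax(axis=1)
accuracy = (fused_pred == y_te).mean()
```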


Unknown Journal | 2014

Modeling pronunciation, rhythm, and intonation for automatic assessment of speech quality in aphasia rehabilitation

Duc Le; Emily Mower Provost


Conference of the International Speech Communication Association | 2017

Discretized continuous speech emotion recognition with multi-task deep recurrent neural network

Duc Le; Zakaria Aldeneh; Emily Mower Provost


Conference of the International Speech Communication Association | 2017

Automatic Paraphasia Detection from Aphasic Speech: A Preliminary Study

Duc Le; Keli Licata; Emily Mower Provost

Collaboration


Dive into Duc Le's collaboration.

Top Co-Authors

Keli Licata

University of Michigan

Je Hun Jeon

University of Texas at Dallas

John Gideon

University of Michigan