Publication


Featured research published by Khiet Phuong Truong.


Speech Communication | 2007

Automatic discrimination between laughter and speech

Khiet Phuong Truong; David A. van Leeuwen

Emotions can be recognized by audible paralinguistic cues in speech. By detecting these paralinguistic cues, which can consist of laughter, a trembling voice, coughs, changes in the intonation contour, etc., information about the speaker's state and emotion can be revealed. This paper describes the development of a gender-independent laugh detector with the aim of enabling automatic emotion recognition. Different types of features (spectral, prosodic) for laughter detection were investigated using different classification techniques (Gaussian Mixture Models, Support Vector Machines, Multi-Layer Perceptron) often used in language and speaker recognition. Classification experiments were carried out with short pre-segmented speech and laughter segments extracted from the ICSI Meeting Recorder Corpus (with a mean duration of approximately 2 s). Equal error rates of around 3% were obtained when tested on speaker-independent speech data. We found that a fusion between classifiers based on Gaussian Mixture Models and classifiers based on Support Vector Machines increases discriminative power. We also found that a fusion between classifiers that use spectral features and classifiers that use prosodic information usually increases the performance for discrimination between laughter and speech. Our acoustic measurements showed differences between laughter and speech in mean pitch and in the ratio of the durations of unvoiced to voiced portions, which indicate that these prosodic features are indeed useful for discrimination between laughter and speech.
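As a rough illustration of the modeling approach described above (a generative and a discriminative classifier whose scores are combined at the decision level), the sketch below trains a pair of GMMs and an SVM on placeholder feature vectors and fuses their normalized scores with an assumed equal weighting. It is not the authors' implementation: the features, fusion weights, and evaluation are stand-ins for illustration only.

```python
# Minimal sketch of GMM-based laughter/speech discrimination with score fusion.
# Not the authors' implementation: features are random placeholders, and the
# fusion weight is an assumption rather than a trained combiner.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholder "spectral" features per segment (e.g. mean MFCC vectors).
X_laugh = rng.normal(loc=1.0, size=(200, 13))
X_speech = rng.normal(loc=-1.0, size=(200, 13))
X = np.vstack([X_laugh, X_speech])
y = np.array([1] * 200 + [0] * 200)  # 1 = laughter, 0 = speech

# One GMM per class; the log-likelihood ratio acts as the classifier score.
gmm_laugh = GaussianMixture(n_components=4, random_state=0).fit(X_laugh)
gmm_speech = GaussianMixture(n_components=4, random_state=0).fit(X_speech)
gmm_scores = gmm_laugh.score_samples(X) - gmm_speech.score_samples(X)

# Discriminative counterpart: an SVM trained on the same features.
svm = SVC(probability=True, random_state=0).fit(X, y)
svm_scores = svm.predict_proba(X)[:, 1]

# Decision-level fusion: a simple weighted sum of normalized scores.
def zscore(s):
    return (s - s.mean()) / s.std()

fused = 0.5 * zscore(gmm_scores) + 0.5 * zscore(svm_scores)
print("fused score for first segment:", fused[0])
```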


Speech Communication | 2009

Comparing different approaches for automatic pronunciation error detection

Helmer Strik; Khiet Phuong Truong; Febe de Wet; Catia Cucchiarini

One of the biggest challenges in designing computer-assisted language learning (CALL) applications that provide automatic feedback on pronunciation errors consists in reliably detecting the pronunciation errors at such a detailed level that the information provided can be useful to learners. In our research we investigate pronunciation errors frequently made by foreigners learning Dutch as a second language. In the present paper we focus on the velar fricative /x/ and the velar plosive /k/. We compare four types of classifiers that can be used to detect erroneous pronunciations of these phones: two acoustic-phonetic classifiers (one of which employs Linear Discriminant Analysis (LDA)), a classifier based on cepstral coefficients in combination with LDA, and one based on confidence measures (the so-called Goodness Of Pronunciation score). The best results were obtained for the two LDA classifiers, which produced accuracy levels of about 85-93%.
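The following sketch illustrates the general idea behind the cepstral + LDA classifier: a linear discriminant trained to separate acceptable from erroneous realizations of a target phone. The feature vectors here are synthetic placeholders, and the whole setup is an assumption for illustration rather than the paper's pipeline.

```python
# Minimal sketch of the cepstral + LDA classifier idea: decide whether a
# realization of a target phone (e.g. /x/) is acceptable or mispronounced.
# Features are random placeholders standing in for cepstral coefficients.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Placeholder cepstral feature vectors for correct vs. erroneous realizations.
X_correct = rng.normal(loc=0.5, size=(150, 12))
X_error = rng.normal(loc=-0.5, size=(150, 12))
X = np.vstack([X_correct, X_error])
y = np.array([0] * 150 + [1] * 150)  # 1 = pronunciation error

lda = LinearDiscriminantAnalysis()
acc = cross_val_score(lda, X, y, cv=5).mean()
print(f"cross-validated detection accuracy: {acc:.2f}")
```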


Journal on Multimodal User Interfaces | 2011

Continuous Interaction with a Virtual Human

Dennis Reidsma; Iwan de Kok; Daniel Neiberg; Sathish Pammi; Bart van Straalen; Khiet Phuong Truong; Herwin van Welbergen

This paper presents our progress in developing a Virtual Human capable of being an attentive speaker. Such a Virtual Human should be able to attend to its interaction partner while it is speaking—and modify its communicative behavior on-the-fly based on what it observes in the behavior of its partner. We report new developments concerning a number of aspects, such as scheduling and interrupting multimodal behavior, automatic classification of listener responses, generation of response eliciting behavior, and strategies for generating appropriate reactions to listener responses. On the basis of this progress, a task-based setup for a responsive Virtual Human was implemented to carry out two user studies, the results of which are presented and discussed in this paper.
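Purely as a hypothetical illustration of one component mentioned above (reacting on the fly to classified listener responses), the sketch below maps a detected response category to a speaker-side reaction. The categories, threshold, and rules are invented for the example and do not reflect the system described in the paper.

```python
# Hypothetical sketch: choosing a reaction when a listener response
# (e.g. a nod or "uh-huh") is detected while the Virtual Human is speaking.
# The categories and rules are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class ListenerResponse:
    kind: str        # e.g. "continuer", "understanding", "confusion"
    confidence: float

def react(response: ListenerResponse) -> str:
    """Map a classified listener response to a speaker-side reaction."""
    if response.confidence < 0.5:
        return "ignore"              # low-confidence detection: keep talking
    if response.kind == "continuer":
        return "continue"            # acknowledge implicitly, keep the turn
    if response.kind == "confusion":
        return "pause_and_rephrase"  # interrupt planned behavior, elaborate
    return "acknowledge"             # e.g. a brief nod, then continue

print(react(ListenerResponse(kind="confusion", confidence=0.8)))
```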


Intelligent Virtual Agents | 2010

Backchannel strategies for artificial listeners

Ronald Walter Poppe; Khiet Phuong Truong; Dennis Reidsma; Dirk Heylen

We evaluate multimodal rule-based strategies for backchannel (BC) generation in face-to-face conversations. Such strategies can be used by artificial listeners to determine when to produce a BC in dialogs with human speakers. In this research, we consider features from the speaker's speech and gaze. We used six rule-based strategies to determine the placement of BCs. The BCs were performed by an intelligent virtual agent using nods and vocalizations. In a user perception experiment, participants were shown video fragments of a human speaker together with an artificial listener who produced BC behavior according to one of the strategies. Participants were asked to rate how likely they thought it was that the BC behavior had been performed by a human listener. We found that the number, timing and type of BCs had a significant effect on how human-like the BC behavior was perceived to be.
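As a hedged illustration of what a rule-based BC strategy of this kind could look like, the sketch below triggers a backchannel when the speaker pauses while gazing at the listener. The rule, the thresholds, and the refractory period are assumptions made for the example, not the strategies evaluated in the paper.

```python
# Illustrative sketch of a rule-based backchannel (BC) strategy:
# produce a BC when the speaker pauses and is gazing at the listener.
# Thresholds and the rule itself are assumptions for illustration only.
def backchannel_due(pause_ms: float, speaker_gazes_at_listener: bool,
                    ms_since_last_bc: float) -> bool:
    """Return True if a BC (nod or short vocalization) should be produced."""
    long_enough_pause = pause_ms >= 200          # assumed pause threshold
    not_too_frequent = ms_since_last_bc >= 2000  # assumed refractory period
    return long_enough_pause and speaker_gazes_at_listener and not_too_frequent

# Example: a 300 ms pause with mutual gaze, 5 s after the previous BC.
print(backchannel_due(300, True, 5000))  # -> True
```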


International Conference on Machine Learning | 2008

Decision-Level Fusion for Audio-Visual Laughter Detection

Boris Reuderink; Mannes Poel; Khiet Phuong Truong; Ronald Walter Poppe; Maja Pantic

Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging, but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is performed by fusing the results of separate audio and video classifiers on the decision level. This results in laughter detection with a significantly higher AUC-ROC than single-modality classification.
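The sketch below shows decision-level fusion in its simplest form: the posterior scores of separate audio and video classifiers are combined with a weighted sum, and each variant is scored with AUC-ROC. The scores, weights, and data are synthetic placeholders, not the classifiers or corpus used in the paper.

```python
# Minimal sketch of decision-level audio-visual fusion with AUC-ROC evaluation.
# Scores are synthetic placeholders; the fusion weights are assumptions.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=500)                 # 1 = laughter, 0 = no laughter

# Placeholder per-segment posterior scores from two single-modality classifiers.
audio_scores = np.clip(y * 0.6 + rng.normal(0.2, 0.3, 500), 0, 1)
video_scores = np.clip(y * 0.5 + rng.normal(0.25, 0.3, 500), 0, 1)

# Decision-level fusion: a weighted sum of the two posteriors.
fused_scores = 0.6 * audio_scores + 0.4 * video_scores

for name, s in [("audio", audio_scores), ("video", video_scores),
                ("fused", fused_scores)]:
    print(f"{name:>5} AUC-ROC: {roc_auc_score(y, s):.3f}")
```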


Speech Communication | 2012

Speech-based recognition of self-reported and observed emotion in a dimensional space

Khiet Phuong Truong; David A. van Leeuwen; Franciska de Jong

The differences between self-reported and observed emotion have only marginally been investigated in the context of speech-based automatic emotion recognition. We address this issue by comparing self-reported emotion ratings to observed emotion ratings and look at how differences between these two types of ratings affect the development and performance of automatic emotion recognizers developed with these ratings. A dimensional approach to emotion modeling is adopted: the ratings are based on continuous arousal and valence scales. We describe the TNO-Gaming Corpus that contains spontaneous vocal and facial expressions elicited via a multiplayer videogame and that includes emotion annotations obtained via self-report and observation by outside observers. Comparisons show that there are discrepancies between self-reported and observed emotion ratings which are also reflected in the performance of the emotion recognizers developed. Using Support Vector Regression in combination with acoustic and textual features, recognizers of arousal and valence are developed that can predict points in a 2-dimensional arousal-valence space. The results of these recognizers show that the self-reported emotion is much harder to recognize than the observed emotion, and that averaging ratings from multiple observers improves performance.
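As a minimal sketch of the regression setup described (Support Vector Regression mapping a feature vector to a point in the 2-dimensional arousal-valence space), the example below trains one SVR per dimension on synthetic placeholder data. The features, targets, and kernel choice are assumptions, not the TNO-Gaming Corpus setup.

```python
# Minimal sketch: two Support Vector Regressors predict a point in the
# 2-D arousal-valence space. Features and ratings are synthetic placeholders.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 20))                       # placeholder acoustic/textual features
arousal = X[:, 0] * 0.5 + rng.normal(0, 0.1, 300)    # synthetic continuous ratings
valence = X[:, 1] * 0.5 + rng.normal(0, 0.1, 300)

svr_arousal = SVR(kernel="rbf").fit(X, arousal)
svr_valence = SVR(kernel="rbf").fit(X, valence)

x_new = rng.normal(size=(1, 20))
point = (svr_arousal.predict(x_new)[0], svr_valence.predict(x_new)[0])
print(f"predicted (arousal, valence) = ({point[0]:.2f}, {point[1]:.2f})")
```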


International Conference on Foundations of Augmented Cognition | 2007

Unobtrusive multimodal emotion detection in adaptive interfaces: speech and facial expressions

Khiet Phuong Truong; David A. van Leeuwen; Mark A. Neerincx

Two unobtrusive modalities for automatic emotion recognition are discussed: speech and facial expressions. First, an overview is given of emotion recognition studies based on a combination of speech and facial expressions. We will identify difficulties concerning data collection, data fusion, system evaluation and emotion annotation that one is most likely to encounter in emotion recognition research. Further, we identify some of the possible applications for emotion recognition such as health monitoring or e-learning systems. Finally, we will discuss the growing need for developing agreed standards in automatic emotion recognition research.


Intelligent Virtual Agents | 2010

How turn-taking strategies influence users' impressions of an agent

Mark ter Maat; Khiet Phuong Truong; Dirk Heylen

Different turn-taking strategies of an agent influence the impression that people have of it. We recorded conversations of a human with an interviewing agent, controlled by a wizard and using a particular turn-taking strategy. A questionnaire with 27 semantic differential scales concerning personality, emotion, social skills and interviewing skills was used to capture these impressions. We show that it is possible to influence factors such as agreeableness, assertiveness, conversational skill and rapport by varying the agent’s turn-taking strategy.


International Conference on Foundations of Augmented Cognition | 2007

Measuring cognitive task load on a naval ship: implications of a real world environment

Marc Grootjen; Mark A. Neerincx; Jochum C. M. van Weert; Khiet Phuong Truong

Application of more and more automation in process control shifts the operator's task from manual to supervisory control. Increasing system autonomy, complexity and information fluctuations make it extremely difficult to develop static support concepts that cover all critical situations after implementing the system. Therefore, support systems in dynamic domains should be as dynamic as the domain itself. This paper elaborates on the state information needed from the operator to generate effective mitigation strategies. We describe implications of a real-world experiment on board three frigates of the Royal Netherlands Navy. Although new techniques allow us to measure, combine and gain insight into physiological, subjective and task information, many practical issues need to be solved.


Universal Access in the Information Society | 2009

Attuning speech-enabled interfaces to user and context for inclusive design: technology, methodology and practice

Mark A. Neerincx; Anita H. M. Cremers; Judith M. Kessens; David A. van Leeuwen; Khiet Phuong Truong

This paper presents a methodology for applying speech technology to compensate for sensory, motor, cognitive and affective usage difficulties. It distinguishes (1) an analysis of accessibility and technological issues for the identification of context-dependent user needs and corresponding opportunities to include speech in multimodal user interfaces, and (2) an iterative generate-and-test process to refine the interface prototype and its design rationale. Best practices show that such inclusion of speech technology, although still imperfect in itself, can enhance both the functional and affective information and communication technology experiences of specific user groups, such as persons with reading difficulties, hearing impairments or intellectual disabilities, children, and older adults.

Collaboration


Dive into Khiet Phuong Truong's collaboration.

Top Co-Authors

Maja Pantic

Imperial College London

Franciska de Jong

Erasmus University Rotterdam