Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Chi-Chun Lee is active.

Publication


Featured research published by Chi-Chun Lee.


Speech Communication | 2011

Emotion recognition using a hierarchical binary decision tree approach

Chi-Chun Lee; Emily Mower; Carlos Busso; Sungbok Lee; Shrikanth Narayanan

Automated emotion state tracking is a crucial element in the computational study of human communication behaviors. It is important to design robust and reliable emotion recognition systems that are suitable for real-world applications both to enhance analytical abilities to support human decision making and to design human-machine interfaces that facilitate efficient communication. We introduce a hierarchical computational structure to recognize emotions. The proposed structure maps an input speech utterance into one of the multiple emotion classes through subsequent layers of binary classifications. The key idea is that the levels in the tree are designed to solve the easiest classification tasks first, allowing us to mitigate error propagation. We evaluated the classification framework using acoustic features on two different emotional databases: the AIBO database and the USC IEMOCAP database. In the case of the AIBO database, we obtain a balanced recall on each of the individual emotion classes using this hierarchical structure. The average unweighted recall on the evaluation data set improves by 3.37% absolute (8.82% relative) over a Support Vector Machine baseline model. On the USC IEMOCAP database, we obtain an absolute improvement of 7.44% (14.58% relative) over a baseline Support Vector Machine model. The results demonstrate that the presented hierarchical approach is effective for classifying emotional utterances in multiple database contexts.
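A minimal sketch of the hierarchical binary-classification idea described in this abstract, with hypothetical class groupings (high vs. low arousal first) and synthetic data standing in for acoustic features:

```python
# Sketch of a hierarchical binary-classification scheme for emotion
# recognition: each tree level performs an "easy" binary split first, and
# harder distinctions are deferred to lower levels. Class groupings,
# features, and data here are illustrative only.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy utterance-level acoustic feature vectors for four emotion classes.
CLASSES = ["angry", "happy", "sad", "neutral"]
X = rng.normal(size=(400, 20)) + np.repeat(np.arange(4)[:, None], 100, 0)
y = np.repeat(np.arange(4), 100)

# Level 1: high arousal (angry/happy) vs. low arousal (sad/neutral),
# assumed here to be the easiest split. Level 2: resolve within each group.
level1 = SVC().fit(X, np.isin(y, [0, 1]).astype(int))
level2_high = SVC().fit(X[np.isin(y, [0, 1])], y[np.isin(y, [0, 1])])
level2_low = SVC().fit(X[np.isin(y, [2, 3])], y[np.isin(y, [2, 3])])

def classify(x):
    """Route an utterance-level feature vector through the binary tree."""
    x = x.reshape(1, -1)
    if level1.predict(x)[0] == 1:
        return CLASSES[level2_high.predict(x)[0]]
    return CLASSES[level2_low.predict(x)[0]]

print(classify(X[0]))  # expected to land in the high-arousal branch
```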


Affective Computing and Intelligent Interaction | 2009

Interpreting ambiguous emotional expressions

Emily Mower; Angeliki Metallinou; Chi-Chun Lee; Abe Kazemzadeh; Carlos Busso; Sungbok Lee; Shrikanth Narayanan

Emotion expression is a complex process involving dependencies based on time, speaker, context, mood, personality, and culture. Emotion classification algorithms designed for real-world application must be able to interpret the emotional content of an utterance or dialog given the modulations resulting from these and other dependencies. Algorithmic development often rests on the assumption that the input emotions are uniformly recognized by a pool of evaluators. However, this style of consistent prototypical emotion expression often does not exist outside of a laboratory environment. This paper presents methods for interpreting the emotional content of non-prototypical utterances. These methods include modeling across multiple time-scales and modeling interaction dynamics between interlocutors. This paper recommends classifying emotions based on emotional profiles, or soft-labels, of emotion expression rather than relying on just raw acoustic features or categorical hard labels. Emotion expression is both interactive and dynamic. Consequently, to accurately recognize emotional content, these aspects must be incorporated during algorithmic design to improve classification performance.
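The emotional-profile (soft label) representation mentioned above can be illustrated with a small sketch; the emotion inventory and evaluator ratings below are hypothetical:

```python
# Sketch of the "emotional profile" idea: instead of a single hard category,
# an utterance is represented by the distribution of labels assigned by
# multiple evaluators. Names and data are illustrative.
import numpy as np

EMOTIONS = ["angry", "happy", "sad", "neutral"]

def emotion_profile(evaluator_labels):
    """Turn a list of categorical ratings into a normalized soft label."""
    counts = np.array([evaluator_labels.count(e) for e in EMOTIONS], float)
    return counts / counts.sum()

# A non-prototypical utterance: evaluators disagree, so the profile is mixed.
profile = emotion_profile(["angry", "angry", "neutral", "sad"])
print(dict(zip(EMOTIONS, profile)))  # e.g. {'angry': 0.5, 'sad': 0.25, ...}

# A downstream classifier could then be trained to predict such profiles
# (e.g., via regression) rather than a single hard label.
```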


IEEE Transactions on Affective Computing | 2014

Robust Unsupervised Arousal Rating: A Rule-Based Framework with Knowledge-Inspired Vocal Features

Daniel Bone; Chi-Chun Lee; Shrikanth Narayanan

Studies in classifying affect from vocal cues have produced exceptional within-corpus results, especially for arousal (activation or stress); yet cross-corpora affect recognition has only recently garnered attention. An essential requirement of many behavioral studies is affect scoring that generalizes across different social contexts and data conditions. We present a robust, unsupervised (rule-based) method for providing a scale-continuous, bounded arousal rating operating on the vocal signal. The method incorporates just three knowledge-inspired features chosen based on empirical and theoretical evidence. It constructs a speaker's baseline model for each feature separately, and then computes single-feature arousal scores. Lastly, it advantageously fuses the single-feature arousal scores into a final rating without knowledge of the true affect. The baseline data is preferably labeled as neutral, but some initial evidence is provided to suggest that no labeled data is required in certain cases. The proposed method is compared to a state-of-the-art supervised technique which employs a high-dimensional feature set. The proposed framework achieves highly competitive performance with additional benefits. The measure is interpretable, scale-continuous as opposed to discrete, and can operate without any affective labeling. An accompanying MATLAB tool is made available with the paper.
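The overall structure of such a rule-based rating (per-speaker baseline, per-feature scores, label-free fusion) can be sketched as follows; the three placeholder features, the tanh squashing, and the plain-average fusion are assumptions for illustration, not the paper's exact rules:

```python
# Sketch of an unsupervised, rule-based arousal rating: a small set of
# knowledge-inspired vocal features (placeholders for, e.g., pitch,
# intensity, and a spectral energy ratio), a per-speaker baseline built from
# preferably neutral data, per-feature scores, and a simple fusion step.
import numpy as np

def arousal_rating(features, baseline_features):
    """features, baseline_features: arrays of shape (n_frames, 3)."""
    mu = baseline_features.mean(axis=0)
    sigma = baseline_features.std(axis=0) + 1e-8
    # Per-feature score: deviation from the speaker's baseline, squashed
    # into a bounded range with tanh (illustrative choice).
    per_feature = np.tanh((features.mean(axis=0) - mu) / sigma)
    # Fusion without affect labels: here a plain average (assumption).
    return float(per_feature.mean())

rng = np.random.default_rng(1)
baseline = rng.normal(0.0, 1.0, size=(200, 3))   # neutral speech segments
aroused = rng.normal(1.5, 1.0, size=(50, 3))     # a more "activated" segment
print(arousal_rating(aroused, baseline))          # positive, bounded in (-1, 1)
```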


Journal of Speech Language and Hearing Research | 2014

The Psychologist as an Interlocutor in Autism Spectrum Disorder Assessment: Insights From a Study of Spontaneous Prosody

Daniel Bone; Chi-Chun Lee; Matthew P. Black; Marian E. Williams; Sungbok Lee; Pat Levitt; Shrikanth Narayanan

PURPOSE The purpose of this study was to examine relationships between prosodic speech cues and autism spectrum disorder (ASD) severity, hypothesizing a mutually interactive relationship between the speech characteristics of the psychologist and the child. The authors objectively quantified acoustic-prosodic cues of the psychologist and of the child with ASD during spontaneous interaction, establishing a methodology for future large-sample analysis. METHOD Speech acoustic-prosodic features were semiautomatically derived from segments of semistructured interviews (Autism Diagnostic Observation Schedule, ADOS; Lord, Rutter, DiLavore, & Risi, 1999; Lord et al., 2012) with 28 children who had previously been diagnosed with ASD. Prosody was quantified in terms of intonation, volume, rate, and voice quality. Research hypotheses were tested via correlation as well as hierarchical and predictive regression between ADOS severity and prosodic cues. RESULTS Automatically extracted speech features demonstrated prosodic characteristics of dyadic interactions. As rated ASD severity increased, both the psychologist and the child demonstrated effects for turn-end pitch slope, and both spoke with atypical voice quality. The psychologist's acoustic cues predicted the child's symptom severity better than did the child's acoustic cues. CONCLUSION The psychologist, acting as evaluator and interlocutor, was shown to adjust his or her behavior in predictable ways based on the child's social-communicative impairments. The results support future study of speech prosody of both interaction partners during spontaneous conversation, while using automatic computational methods that allow for scalable analysis on much larger corpora.
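The core statistical step, relating an automatically extracted prosodic cue to rated severity via correlation, can be sketched in a few lines; the data below are synthetic and the feature is a placeholder, not the study's exact measure:

```python
# Sketch of correlating an automatically extracted prosodic cue (e.g., a
# turn-end pitch slope statistic) with a clinical severity score across
# children. Data are synthetic placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_children = 28
ados_severity = rng.integers(4, 11, size=n_children).astype(float)
# Hypothetical prosodic feature with a weak relationship to severity.
turn_end_pitch_slope = -0.3 * ados_severity + rng.normal(0, 1.5, n_children)

r, p = stats.spearmanr(turn_end_pitch_slope, ados_severity)
print(f"Spearman r = {r:.2f}, p = {p:.3f}")
```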


Journal of Autism and Developmental Disorders | 2015

Applying Machine Learning to Facilitate Autism Diagnostics: Pitfalls and Promises

Daniel Bone; Matthew S. Goodwin; Matthew P. Black; Chi-Chun Lee; Kartik Audhkhasi; Shrikanth Narayanan

Machine learning has immense potential to enhance diagnostic and intervention research in the behavioral sciences, and may be especially useful in investigations involving the highly prevalent and heterogeneous syndrome of autism spectrum disorder. However, use of machine learning in the absence of clinical domain expertise can be tenuous and lead to misinformed conclusions. To illustrate this concern, the current paper critically evaluates and attempts to reproduce results from two studies (Wall et al. in Transl Psychiatry 2(4):e100, 2012a; PloS One 7(8), 2012b) that claim to drastically reduce time to diagnose autism using machine learning. Our failure to generate comparable findings to those reported by Wall and colleagues using larger and more balanced data underscores several conceptual and methodological problems associated with these studies. We conclude with proposed best-practices when using machine learning in autism research, and highlight some especially promising areas for collaborative work at the intersection of computational and behavioral science.


Affective Computing and Intelligent Interaction | 2011

Affective state recognition in married couples' interactions using PCA-based vocal entrainment measures with multiple instance learning

Chi-Chun Lee; Athanasios Katsamanis; Matthew P. Black; Brian R. Baucom; Panayiotis G. Georgiou; Shrikanth Narayanan

Recently there has been an increase in efforts in Behavioral Signal Processing (BSP), which aims to bring quantitative analysis using signal processing techniques to the domain of observational coding. Currently, observational coding in fields such as psychology is based on subjective expert coding of abstract human interaction dynamics. In this work, we use a Multiple Instance Learning (MIL) framework, a saliency-based prediction model, with a signal-driven vocal entrainment measure as the feature to predict the affective state of a spouse in problem-solving interactions. We generate 18 MIL classifiers to capture the variable-length saliency of vocal entrainment, and use a cross-validation scheme with maximum accuracy and mutual information as the metric to select the best-performing classifier for each test couple. This method obtains a recognition accuracy of 53.93%, a 2.14% absolute (4.13% relative) improvement over a baseline Support Vector Machine model. Furthermore, this MIL-based framework has potential for identifying meaningful regions of interest for further detailed analysis of married couples' interactions.
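One way to picture a PCA-based entrainment-style similarity between two speakers' vocal feature streams is sketched below; this subspace-projection proxy is an illustrative assumption, not the paper's exact entrainment measure or MIL setup:

```python
# Sketch of a PCA-based similarity between two speakers' vocal feature
# streams: fit a low-dimensional PCA subspace on one speaker's features and
# measure how much of the other speaker's variance that subspace captures.
import numpy as np
from sklearn.decomposition import PCA

def subspace_similarity(feats_a, feats_b, n_components=3):
    """feats_a, feats_b: (n_frames, n_features) vocal feature matrices."""
    pca = PCA(n_components=n_components).fit(feats_a)
    projected = pca.inverse_transform(pca.transform(feats_b))
    residual = ((feats_b - projected) ** 2).sum()
    total = ((feats_b - feats_b.mean(axis=0)) ** 2).sum()
    # Fraction of B's variance explained by A's subspace (higher = more similar).
    return 1.0 - residual / total

rng = np.random.default_rng(3)
spouse_a = rng.normal(size=(300, 10))
spouse_b = 0.7 * spouse_a + 0.3 * rng.normal(size=(300, 10))  # partially "entrained"
print(subspace_similarity(spouse_a, spouse_b))
```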


International Conference on Acoustics, Speech, and Signal Processing | 2013

Using physiology and language cues for modeling verbal response latencies of children with ASD

Theodora Chaspari; Daniel Bone; James Gibson; Chi-Chun Lee; Shrikanth Narayanan

Signal-derived measures can provide effective ways towards quantifying human behavior. Verbal Response Latencies (VRLs) of children with Autism Spectrum Disorders (ASD) during conversational interactions are able to convey valuable information about their cognitive and social skills. Motivated by the inherent gap between the external behavior and inner affective state of children with ASD, we study their VRLs in relation to their explicit but also implicit behavioral cues. Explicit cues include the children's language use, while implicit cues are based on physiological signals. Using these cues, we perform classification and regression tasks to predict the duration type (short/long) and value of VRLs of children with ASD while they interacted with an Embodied Conversational Agent (ECA) and their parents. Since parents are active participants in these triadic interactions, we also take into account their linguistic and physiological behaviors. Our results suggest an association between VRLs and these externalized and internalized signal information streams, providing complementary views of the same problem.
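The classification half of such a setup (short vs. long latency from combined language and physiology cues) could look roughly like this; feature names and data are synthetic placeholders:

```python
# Sketch of predicting whether a verbal response latency (VRL) is short or
# long from a combination of explicit (language) and implicit (physiological)
# cues, framed as binary classification on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 200
language_feats = rng.normal(size=(n, 5))   # e.g., word counts, category scores
physio_feats = rng.normal(size=(n, 3))     # e.g., electrodermal statistics
X = np.hstack([language_feats, physio_feats])
# Synthetic binary target: short (0) vs. long (1) latency.
y = (X[:, 0] + X[:, 5] + rng.normal(0, 0.5, n) > 0).astype(int)

clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5).mean())  # chance level is 0.5
```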


Language Resources and Evaluation | 2016

The USC CreativeIT database of multimodal dyadic interactions: from speech and full body motion capture to continuous emotional annotations

Angeliki Metallinou; Zhaojun Yang; Chi-Chun Lee; Carlos Busso; Sharon Marie Carnicke; Shrikanth Narayanan

Improvised acting is a viable technique to study expressive human communication and to shed light on actors' creativity. The USC CreativeIT database provides a novel, freely available multimodal resource for the study of theatrical improvisation and rich expressive human behavior (speech and body language) in dyadic interactions. The theoretical design of the database is based on the well-established improvisation technique of Active Analysis in order to provide naturally induced affective, expressive, goal-driven interactions. This database contains dyadic theatrical improvisations performed by 16 actors, providing detailed full-body motion capture data and audio data of each participant in an interaction. The carefully engineered data collection, the improvisation design to elicit natural emotions and expressive speech and body language, as well as the well-developed annotation processes provide a gateway to study and model various aspects of theatrical performance, expressive behaviors, and human communication and interaction.


IEEE Access | 2017

Experimental Study on Extreme Learning Machine Applications for Speech Enhancement

Tassadaq Hussain; Sabato Marco Siniscalchi; Chi-Chun Lee; Syu-Siang Wang; Yu Tsao; Wen-Hung Liao

In wireless telephony and audio data mining applications, it is desirable that noise suppression can be made robust against changing noise conditions and operates in real time (or faster). The learning effectiveness and speed of artificial neural networks are therefore critical factors in applications for speech enhancement tasks. To address these issues, we present an extreme learning machine (ELM) framework, aimed at the effective and fast removal of background noise from a single-channel speech signal, based on a set of randomly chosen hidden units and analytically determined output weights. Because feature learning with shallow ELM may not be effective for natural signals, such as speech, even with a large number of hidden nodes, hierarchical ELM (H-ELM) architectures are deployed by leveraging sparse auto-encoders. In this manner, we not only keep all the advantages of deep models in approximating complicated functions and maintaining strong regression capabilities, but we also overcome the cumbersome and time-consuming features of both greedy layer-wise pre-training and back-propagation (BP)-based fine tuning schemes, which are typically adopted for training deep neural architectures. The proposed ELM framework was evaluated on the Aurora–4 speech database. The Aurora–4 task provides relatively limited training data, and test speech data corrupted with both additive noise and convolutive distortions for matched and mismatched channels and signal-to-noise ratio (SNR) conditions. In addition, the task includes a subset of testing data involving noise types and SNR levels that are not seen in the training data. The experimental results indicate that when the amount of training data is limited, both ELM- and H-ELM-based speech enhancement techniques consistently outperform the conventional BP-based shallow and deep learning algorithms, in terms of standardized objective evaluations, under various testing conditions.
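The defining property of an ELM (random, fixed hidden weights with analytically solved output weights) can be sketched as below; this single-layer regressor on toy data illustrates the idea only, not the paper's H-ELM architecture or speech features:

```python
# Sketch of a single-hidden-layer Extreme Learning Machine (ELM) regressor:
# hidden weights are random and fixed; only the output weights are solved
# analytically via regularized least squares. Framed as a generic
# noisy-to-clean mapping on synthetic vectors.
import numpy as np

class ELMRegressor:
    def __init__(self, n_hidden=256, reg=1e-3, seed=0):
        self.n_hidden, self.reg = n_hidden, reg
        self.rng = np.random.default_rng(seed)

    def fit(self, X, Y):
        d = X.shape[1]
        self.W = self.rng.normal(size=(d, self.n_hidden))  # random input weights
        self.b = self.rng.normal(size=self.n_hidden)        # random biases
        H = np.tanh(X @ self.W + self.b)                     # hidden activations
        # Analytic output weights: ridge-regularized least squares.
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ Y)
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

# Toy usage: learn to map "noisy" vectors back to "clean" ones.
rng = np.random.default_rng(5)
clean = rng.normal(size=(1000, 20))
noisy = clean + rng.normal(0, 0.3, size=clean.shape)
elm = ELMRegressor().fit(noisy, clean)
print(np.mean((elm.predict(noisy) - clean) ** 2))  # training MSE
```

Because training reduces to one linear solve, the fast, non-iterative training highlighted in the abstract follows directly from this structure.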


International Conference on Multimedia and Expo | 2013

Head motion synchrony and its correlation to affectivity in dyadic interactions

Bo Xiao; Panayiotis G. Georgiou; Chi-Chun Lee; Brian R. Baucom; Shrikanth Narayanan

Behavioral synchrony, or entrainment, is a phenomenon of great interest to psychologists and a challenging construct to quantify. In this work we study the synchrony behavior of head motion in human dyadic interactions. We model head motion using a Gaussian Mixture Model (GMM) of line spectral frequencies extracted from the motion vectors of the head. We quantify interlocutor head motion similarity through the Kullback-Leibler divergence of the GMM posteriors of their respective motion sequences. We use an audiovisual database of distressed couple interactions, extensively annotated by psychologists, to test two hypotheses using the derived similarity measure. We validate the first hypothesis, that people are more likely to increase their degree of synchrony as the interaction progresses, by comparing the first and second halves of the interaction. The second hypothesis tests whether the relative change of the similarity measure across these two halves is significantly correlated with the behavioral annotations by the domain experts. This work underscores the importance of head motion as an interaction cue, and the feasibility of using it in a computational model of synchrony behavior.
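A related illustrative computation is sketched below: fitting a GMM to each interlocutor's head-motion feature stream and estimating a symmetric KL divergence between the two models by Monte Carlo sampling. This is a simplified stand-in, not the paper's exact posterior-based measure, and the features are synthetic placeholders for line spectral frequencies:

```python
# Sketch of a GMM-based similarity between two interlocutors' head-motion
# feature streams: fit a GMM per stream and estimate a symmetric KL
# divergence between the models by sampling (GMM KL has no closed form).
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_symmetric_kl(feats_a, feats_b, n_components=4, n_samples=2000, seed=0):
    gmm_a = GaussianMixture(n_components=n_components, random_state=seed).fit(feats_a)
    gmm_b = GaussianMixture(n_components=n_components, random_state=seed).fit(feats_b)
    xa, _ = gmm_a.sample(n_samples)
    xb, _ = gmm_b.sample(n_samples)
    kl_ab = np.mean(gmm_a.score_samples(xa) - gmm_b.score_samples(xa))
    kl_ba = np.mean(gmm_b.score_samples(xb) - gmm_a.score_samples(xb))
    return kl_ab + kl_ba  # lower value = more similar motion statistics

rng = np.random.default_rng(6)
head_a = rng.normal(size=(500, 6))
head_b = head_a + rng.normal(0, 0.2, size=(500, 6))  # a closely "synchronized" partner
print(gmm_symmetric_kl(head_a, head_b))
```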

Collaboration


Dive into Chi-Chun Lee's collaborations.

Top Co-Authors

Shrikanth Narayanan, University of Southern California
Matthew P. Black, University of Southern California
Panayiotis G. Georgiou, University of Southern California
Daniel Bone, University of Southern California
Sungbok Lee, University of Southern California
Carlos Busso, University of Texas at Dallas
Emily Mower, University of Southern California