
Publication


Featured research published by Kun-Yi Huang.


International Conference on Acoustics, Speech, and Signal Processing | 2017

Mood detection from daily conversational speech using denoising autoencoder and LSTM

Kun-Yi Huang; Chung-Hsien Wu; Ming-Hsiang Su; Hsiang Chi Fu

In current studies, an extended subjective self-report method is generally used for measuring emotions. Even though it is commonly accepted that the speech emotion perceived by the listener is close to the emotion intended by the speaker, research has indicated that a mismatch still remains between them. In addition, individuals with different personalities generally express emotions differently. Based on these observations, in this study a support vector machine (SVM)-based emotion model is first developed to detect the perceived emotion in daily conversational speech. A denoising autoencoder (DAE) is then used to construct an emotion conversion model that characterizes the relationship between the perceived emotion and the expressed emotion of a subject with a specific personality. Finally, a long short-term memory (LSTM)-based mood model is constructed to model the temporal fluctuation of speech emotions for mood detection. Experimental results show that the proposed method achieved a detection accuracy of 64.5%, a 5.0% improvement over an HMM-based method.
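As a rough illustration of the pipeline described above, the following PyTorch sketch pairs a denoising autoencoder with an LSTM mood classifier; the module names, layer sizes, and six-dimensional emotion profiles are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Maps a perceived-emotion profile to an expressed-emotion profile."""
    def __init__(self, dim=6, hidden=32, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x):
        if self.training:                 # corrupt the input only while training
            x = x + self.noise_std * torch.randn_like(x)
        return self.decoder(self.encoder(x))

class MoodLSTM(nn.Module):
    """Aggregates a sequence of per-utterance emotion profiles into a mood label."""
    def __init__(self, dim=6, hidden=64, n_moods=2):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_moods)

    def forward(self, profiles):          # profiles: (batch, time, dim)
        _, (h, _) = self.lstm(profiles)
        return self.head(h[-1])           # logits over mood classes

# Example: one conversation of 10 turns with 6-dimensional emotion profiles
dae, mood = DenoisingAutoencoder(), MoodLSTM()
dae.eval()                                # disable input corruption at inference
profiles = torch.rand(1, 10, 6)           # stand-in for SVM emotion outputs
logits = mood(dae(profiles))
```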


International Conference on Acoustics, Speech, and Signal Processing | 2015

Affective structure modeling of speech using probabilistic context free grammar for emotion recognition

Kun-Yi Huang; Jia Kuan Lin; Yu Hsien Chiu; Chung-Hsien Wu

A complete emotional expression typically follows a complex temporal course in natural conversation. Related research on utterance-level and segment-level processing lacks an account of the underlying structure of emotional speech. In this study, a hierarchical affective structure of an emotional utterance, characterized by probabilistic context-free grammars (PCFGs), is proposed for emotion modeling. SVM-based emotion profiles are obtained and used to segment the utterance into emotionally consistent segments. Vector quantization is applied to convert the emotion profile of each segment into codewords. A binary tree, in which each node represents a codeword, is constructed to characterize the affective structure of the utterance modeled by the PCFG. Given an input utterance, the output emotion is determined by the PCFG-based emotion model with the highest likelihood over the speech segments, along with the score of the affective structure. For evaluation, experiments were conducted on the EMO-DB database and an expansion of it with longer utterances. Experimental results show that the proposed method achieved an emotion recognition accuracy of 87.22% on long utterances and outperformed an SVM-based method.
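The affective-structure scoring can be pictured with a toy binary PCFG over emotion codewords, as in the minimal sketch below; the grammar symbols, codewords, and rule probabilities are invented for illustration and are not the paper's trained model.

```python
from dataclasses import dataclass

# Rule probabilities: nonterminal -> (child, child), or nonterminal -> codeword
BINARY_RULES = {("S", ("E", "E")): 0.6, ("E", ("E", "E")): 0.3}
LEXICAL_RULES = {("E", "cw_happy"): 0.4, ("E", "cw_neutral"): 0.3}

@dataclass
class Node:
    symbol: str
    children: tuple = ()     # two child Nodes, or empty for a leaf
    codeword: str = ""       # terminal emotion codeword at a leaf

def likelihood(node: Node) -> float:
    """Probability of the tree = product of the rule probabilities used."""
    if not node.children:
        return LEXICAL_RULES.get((node.symbol, node.codeword), 1e-9)
    left, right = node.children
    p_rule = BINARY_RULES.get((node.symbol, (left.symbol, right.symbol)), 1e-9)
    return p_rule * likelihood(left) * likelihood(right)

tree = Node("S", (Node("E", codeword="cw_happy"),
                  Node("E", codeword="cw_neutral")))
print(likelihood(tree))      # 0.6 * 0.4 * 0.3
```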


Ambient Intelligence | 2017

Coupled HMM-based multimodal fusion for mood disorder detection through elicited audio–visual signals

Tsung Hsien Yang; Chung-Hsien Wu; Kun-Yi Huang; Ming-Hsiang Su

Mood disorders encompass a wide array of mood issues, including unipolar depression (UD) and bipolar disorder (BD). In the diagnostic evaluation of outpatients with a mood disorder, a high percentage of BD patients are initially misdiagnosed as having UD. Establishing an accurate distinction between BD and UD is crucial for a correct and early diagnosis, leading to improvements in treatment and course of illness. In this study, emotional videos are first used to elicit the patients' emotions. After watching each video clip, the patients' facial expressions and speech responses are collected while they are interviewed by a clinician. For mood disorder detection, facial action unit (AU) profiles and speech emotion profiles (EPs) are obtained using support vector machines (SVMs) built on facial and speech features adapted from two selected databases with a denoising autoencoder-based method. Finally, a coupled hidden Markov model (CHMM)-based fusion method is proposed to characterize the temporal information: the CHMM is modified to fuse the AUs and the EPs with respect to the six emotional videos. Experimental results show the advantage and efficacy of the CHMM-based fusion approach for mood disorder detection.
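One way to picture CHMM fusion is to run the forward algorithm over the joint (audio state, visual state) space, as in this toy NumPy sketch; the transition and emission tables are random placeholders, not the paper's trained parameters.

```python
import numpy as np

n_a, n_v, T = 2, 2, 5                         # audio states, visual states, frames
rng = np.random.default_rng(0)

def norm(x, axis=-1):
    return x / x.sum(axis=axis, keepdims=True)

# Coupled transitions: each stream's next state depends on both previous streams
trans_a = norm(rng.random((n_a, n_v, n_a)))   # P(a_t | a_{t-1}, v_{t-1})
trans_v = norm(rng.random((n_a, n_v, n_v)))   # P(v_t | a_{t-1}, v_{t-1})
emit_a = norm(rng.random((n_a, T)), axis=0)   # toy per-frame EP likelihoods
emit_v = norm(rng.random((n_v, T)), axis=0)   # toy per-frame AU likelihoods

alpha = np.full((n_a, n_v), 1.0 / (n_a * n_v)) * np.outer(emit_a[:, 0], emit_v[:, 0])
for t in range(1, T):
    new = np.zeros((n_a, n_v))
    for a in range(n_a):
        for v in range(n_v):
            # Sum over all joint predecessor states (a', v')
            new[a, v] = (alpha * trans_a[:, :, a] * trans_v[:, :, v]).sum()
    alpha = new * np.outer(emit_a[:, t], emit_v[:, t])
print("sequence likelihood:", alpha.sum())    # compare under BD vs. UD models
```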


International Symposium on Chinese Spoken Language Processing | 2016

Dialog state tracking for interview coaching using two-level LSTM

Ming-Hsiang Su; Chung-Hsien Wu; Kun-Yi Huang; Tsung Hsien Yang; Tsui Ching Huang

This study presents an approach to dialog state tracking (DST) in an interview conversation using a long short-term memory (LSTM) network and an artificial neural network (ANN). First, word embedding is employed for word representation using the word2vec model. Each input sentence is then mapped to a sentence hidden vector by an LSTM-based sentence model. The sentence hidden vectors are fed to an LSTM-based answer model that maps the interviewee's answer to an answer hidden vector. Finally, the answer hidden vector is used to detect the dialog state with an ANN-based dialog state detection model. To evaluate the proposed method, an interview conversation system was constructed, and an average accuracy of 89.93% was obtained for dialog state detection.
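A minimal PyTorch sketch of the two-level architecture follows; the embedding size, hidden sizes, and number of dialog states are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class TwoLevelDST(nn.Module):
    """Sentence-level LSTM -> answer-level LSTM -> ANN state classifier."""
    def __init__(self, emb_dim=100, sent_hidden=64, ans_hidden=64, n_states=10):
        super().__init__()
        self.sent_lstm = nn.LSTM(emb_dim, sent_hidden, batch_first=True)
        self.ans_lstm = nn.LSTM(sent_hidden, ans_hidden, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(ans_hidden, 32), nn.ReLU(), nn.Linear(32, n_states))

    def forward(self, answer):                # (n_sentences, n_words, emb_dim)
        _, (h_sent, _) = self.sent_lstm(answer)
        sent_vecs = h_sent[-1].unsqueeze(0)   # (1, n_sentences, sent_hidden)
        _, (h_ans, _) = self.ans_lstm(sent_vecs)
        return self.classifier(h_ans[-1])     # dialog-state logits

# An answer of 3 sentences, 12 words each, with word2vec-style embeddings
model = TwoLevelDST()
print(model(torch.rand(3, 12, 100)).shape)    # torch.Size([1, 10])
```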


International Symposium on Chinese Spoken Language Processing | 2016

Detection of mood disorder using speech emotion profiles and LSTM

Tsung Hsien Yang; Chung-Hsien Wu; Kun-Yi Huang; Ming-Hsiang Su

In mood disorder diagnosis, bipolar disorder (BD) patients are often misdiagnosed as having unipolar depression (UD) on initial presentation. Establishing an accurate distinction between BD and UD is crucial for a correct and early diagnosis, leading to improvements in treatment and course of illness. To deal with this misdiagnosis problem, this study elicited the subjects' emotions with six emotional video clips. After watching each clip, the subjects' speech responses were collected while they were interviewed by a clinician. In mood disorder detection, speech emotions play an important role in detecting manic or depressive symptoms. Speech emotion profiles (EPs) are therefore obtained using support vector machines (SVMs) built on speech features adapted from selected databases with a denoising autoencoder-based method. Finally, a long short-term memory (LSTM) recurrent neural network is employed to characterize the temporal information of the EPs with respect to the six emotional videos. Comparative experiments clearly show the advantage and efficacy of the LSTM-based approach for mood disorder detection.
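The EP extraction step can be sketched with scikit-learn, where an SVM's class-probability vector serves as the emotion profile of each elicited response; the feature dimension and four-class emotion set below are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X_train = rng.random((200, 40))              # stand-in acoustic features
y_train = rng.integers(0, 4, 200)            # 4 toy emotion classes

svm = SVC(probability=True).fit(X_train, y_train)

# One EP per elicited response: the SVM's class-probability vector
responses = rng.random((6, 40))              # six video-elicited responses
ep_sequence = svm.predict_proba(responses)   # shape (6, 4), one row per video
# ep_sequence would then be fed, in video order, to an LSTM classifier
# (e.g., the MoodLSTM sketch above) to separate BD from UD.
```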


International Conference on Asian Language Processing | 2016

Dialog state tracking and action selection using deep learning mechanism for interview coaching

Ming-Hsiang Su; Kun-Yi Huang; Tsung Hsien Yang; Kuan Jung Lai; Chung-Hsien Wu

The best way to prepare for an interview is to review the types of questions you may be asked and practice responding to them. An interview coaching system simulates an interviewer to provide mock interview practice sessions for users. Traditional interview coaching systems provide feedback, including facial preference, head nodding, response time, speaking rate, and volume, to let users know their performance in the mock interview, but most of these systems are trained on insufficient dialog data and provide only pre-designed interview questions. In this study, we propose an approach to dialog state tracking and action selection based on deep learning. First, an interview corpus is collected from 12 participants and annotated with dialog states and actions. Next, a long short-term memory network and an artificial neural network are employed to predict dialog states, and deep reinforcement learning (deep RL) is adopted to learn the relation between dialog states and actions. Finally, the selected action is used to generate the interview question for practice. To evaluate the proposed action selection method, an interview coaching system was constructed. Experimental results show the effectiveness of the proposed method for dialog state tracking and action selection.
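A toy Q-network sketch of the action-selection step follows; the state encoding, action set, reward, and update rule are invented placeholders standing in for whatever deep RL formulation the paper uses.

```python
import torch
import torch.nn as nn

n_state_dim, n_actions = 64, 8               # dialog-state vector, question types

q_net = nn.Sequential(nn.Linear(n_state_dim, 128), nn.ReLU(),
                      nn.Linear(128, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

state = torch.rand(1, n_state_dim)           # from the LSTM/ANN state tracker
action = q_net(state).argmax(dim=1).item()   # greedy interview-question choice

# One Q-learning update from a (state, action, reward, next_state) step
reward, next_state, gamma = 1.0, torch.rand(1, n_state_dim), 0.9
target = reward + gamma * q_net(next_state).max(dim=1).values.detach()
loss = nn.functional.mse_loss(q_net(state)[0, action], target.squeeze())
optimizer.zero_grad()
loss.backward()
optimizer.step()
```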


Conference of the International Speech Communication Association | 2016

Unipolar depression vs. bipolar disorder: An elicitation-based approach to short-term detection of mood disorder

Kun-Yi Huang; Chung-Hsien Wu; Yu Ting Kuo; Fong Lin Jang

Mood disorders include unipolar depression (UD) and bipolar disorder (BD). In this work, an elicitation-based approach to short-term detection of mood disorder from elicited speech responses is proposed. First, a long short-term memory (LSTM)-based classifier was constructed to generate an emotion likelihood for each segment of the elicited speech responses. The emotion likelihoods were then clustered into emotion codewords using the K-means algorithm. Latent semantic analysis (LSA) was adopted to model the latent relationship between the emotion codewords and the elicited responses, and the structural relationships among the emotion codewords in the LSA-based matrix were used to construct a latent affective structure model (LASM) characterizing each mood. For mood disorder detection, the similarity between the input speech's LASM and each mood-specific LASM is estimated, and the mood whose LASM is most similar to the input is taken as the detected mood. Experimental results show that the proposed LASM-based method achieved a detection accuracy of 73.3%, a 13.3% improvement over commonly used SVM-based classifiers.
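A rough NumPy/scikit-learn sketch of the LASM idea follows: cluster segment emotion likelihoods into codewords, build a codeword-by-response matrix, factor it with LSA, and compare structures by cosine similarity. All sizes and the 50-segments-per-response split are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(2)
likelihoods = rng.random((300, 4))           # per-segment emotion likelihoods
codes = KMeans(n_clusters=16, n_init=10).fit_predict(likelihoods)

# Codeword-by-response count matrix (toy: 50 segments per elicited response)
counts = np.zeros((16, 6))
for seg, code in enumerate(codes):
    counts[code, seg // 50] += 1

lasm = TruncatedSVD(n_components=3).fit_transform(counts)  # latent structure

def mood_score(input_lasm, mood_lasm):
    """Higher cosine similarity -> input structure closer to this mood."""
    return cosine_similarity(input_lasm.reshape(1, -1),
                             mood_lasm.reshape(1, -1))[0, 0]
```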


International Conference on Orange Technologies | 2015

Data collection of elicited facial expressions and speech responses for mood disorder detection

Kun-Yi Huang; Chung-Hsien Wu; Yu Ting Kuo; Hsiao Hsuan Yen; Fong Lin Jang; Yu Hsien Chiu

Mood disorders include unipolar depression (UD) and bipolar disorder (BD). In this work, the collection of a database of facial expressions and speech responses elicited by emotional videos for mood disorder detection is presented. First, the chi-squared test was used to evaluate agreement among the manual selections of eliciting videos from 11 candidate emotional videos. Six eliciting emotional videos were selected and then used to elicit the patients' speech responses and facial expressions. This paper describes the collection procedure for the elicited data and provides the statistics of the database. The collected mood database can be used for mood disorder detection and for studies on elicited emotions.
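As a sketch of the chi-squared agreement check, the SciPy snippet below tests whether annotator votes differ across candidate videos; the vote table is fabricated solely to show the test, not the paper's data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: 3 candidate videos; columns: annotator votes per target emotion
votes = np.array([[9, 1, 0],      # strong agreement that this elicits emotion A
                  [3, 4, 3],      # ambiguous video
                  [1, 0, 9]])     # strong agreement that this elicits emotion C
chi2, p, dof, expected = chi2_contingency(votes)
print(f"chi2={chi2:.2f}, p={p:.4f}")  # low p: vote pattern differs by video
```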


International Conference on Orange Technologies | 2015

Extraction and representation of nursing diagnosis for assisted assessment and affective analysis

Yu Hsien Chiu; Wei Hao Chen; Yu Wei Hung; Hsien Chang Wang; Kun-Yi Huang; Chung-Hsien Wu

Early detection of changes in a patient's condition and making an accurate nursing diagnosis are crucial in clinical practice. The current protocol for nursing diagnosis depends heavily on caregivers' experience, and the demand for processing large numbers of nursing documents and transferring expertise has become an important issue. This paper applies a computational linguistics approach to extract knowledge and a structural model of diagnosis for assisted assessment and for detecting changes in affective expression. A corpus of peer-reviewed case reports and nursing documents was collected for analysis. The evaluation results show that narrative texts with inherent structure and emotional information can be systematically extracted.


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2014

Automatic assessment of affective episodes for daily activities analysis

Yu-Hsien Chiu; Kun-Yi Huang; Hsiu-E Chiu; Wei-Hao Chen

Monitoring the health conditions and events of grandparent-headed families is important for increasing their quality of life and reducing care burdens, and affective episodes are significant indexes of behavior change. In this paper, we propose an information retrieval approach that extracts affect words from speech and written text to provide quantitative evidence of physical function and social interactivity for living support and health-related quality-of-life assessment. A hidden Markov model with a purpose-built behavior grammar network was adopted to transcribe speech. Combined with the written texts, an adjusted term frequency and a sliding-window method were used to extract and quantify affect words, and a quantitative index scored by a trigger-pair approach was applied to assess affective episodes with respect to time and place. Experimental results and a case study show that the proposed approach has encouraging potential for monitoring daily activity and family dialog, and its extension may provide an alternative way to obtain implicit information about emotional expression within a family.
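The sliding-window quantification step can be pictured with the small sketch below; the affect lexicon, window length, and step size are illustrative stand-ins, not the paper's resources.

```python
AFFECT_LEXICON = {"happy", "tired", "angry", "lonely"}

def affect_profile(tokens, window=10, step=5):
    """Affect-word frequency inside each sliding window of the transcript."""
    scores = []
    for start in range(0, max(1, len(tokens) - window + 1), step):
        chunk = tokens[start:start + window]
        scores.append(sum(t in AFFECT_LEXICON for t in chunk) / len(chunk))
    return scores                    # one quantitative index per time window

tokens = "today i feel happy but grandma was tired and lonely all day".split()
print(affect_profile(tokens, window=6, step=3))
```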

Collaboration


Dive into Kun-Yi Huang's collaborations.

Top Co-Authors

Chung-Hsien Wu, National Cheng Kung University
Ming-Hsiang Su, National Cheng Kung University
Qian-Bei Hong, National Cheng Kung University
Tsung Hsien Yang, National Cheng Kung University
Yu Hsien Chiu, Kaohsiung Medical University
Yu Ting Kuo, National Cheng Kung University
Chia-Hui Chou, National Cheng Kung University
Hsiang Chi Fu, National Cheng Kung University
Hsiao Hsuan Yen, National Cheng Kung University