Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ming-Hsiang Su is active.

Publication


Featured research published by Ming-Hsiang Su.


IEEE Transactions on Audio, Speech, and Language Processing | 2016

Exploiting turn-taking temporal evolution for personality trait perception in dyadic conversations

Ming-Hsiang Su; Chung-Hsien Wu; Yu Ting Zheng

In dyadic conversations, turn-taking is a dynamically evolving behavior strongly linked to paralinguistic communication. Turn-taking temporal evolution in a dyadic conversation is inevitable and can be incorporated into a modeling framework for characterizing and recognizing the personality traits (PTs) of two speakers. This study presents an approach to automatically predicting PTs in a dyadic conversation. First, a recurrent neural network (RNN) was used to model the relationship between Big Five Inventory 10 (BFI-10) items and the linguistic features of the spoken text in each turn of a speaker (speaker turn) to output a BFI-10 profile. The RNN's recurrent property characterizes the short-term temporal evolution of a dialog. Second, a coupled hidden Markov model (C-HMM) was employed to model the long-term turn-taking temporal evolution and cross-speaker contextual information for detecting the PTs of the two individuals over the entire dialog, represented by the BFI-10 profile sequence. The Mandarin Conversational Dialogue Corpus was used for evaluation. The results show that an average perception accuracy of 79.66% for the Big Five traits was achieved using five-fold cross-validation. Compared with conventional HMM- and support vector machine-based methods, the proposed approach achieved more favorable performance according to a statistical significance test. These encouraging results confirm the usability of the system for future applications.
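
The two-stage pipeline above can be illustrated with a small sketch of the first stage. The code below is a minimal, hypothetical implementation rather than the authors' code: a GRU stands in for the RNN, and the feature dimension, hidden size, and output activation are my own assumptions.

```python
import torch
import torch.nn as nn

class TurnBFI10RNN(nn.Module):
    """Maps the linguistic features of one speaker turn to a BFI-10 profile."""
    def __init__(self, feat_dim=300, hidden_dim=128, bfi_items=10):
        super().__init__()
        # Recurrent layer captures the short-term temporal evolution within a turn.
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        # Linear head maps the final hidden state to the ten BFI-10 item scores.
        self.head = nn.Linear(hidden_dim, bfi_items)

    def forward(self, turn_features):
        # turn_features: (batch, n_words, feat_dim) linguistic features of a speaker turn
        _, h_n = self.rnn(turn_features)
        return torch.sigmoid(self.head(h_n[-1]))  # (batch, 10) BFI-10 profile in [0, 1]

model = TurnBFI10RNN()
profile = model(torch.randn(1, 25, 300))  # e.g., one 25-word speaker turn
print(profile.shape)  # torch.Size([1, 10])
```

The per-turn BFI-10 profiles of both speakers would then form the observation sequence consumed by the C-HMM stage for whole-dialog trait perception.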


International Conference on Acoustics, Speech, and Signal Processing | 2017

Mood detection from daily conversational speech using denoising autoencoder and LSTM

Kun-Yi Huang; Chung-Hsien Wu; Ming-Hsiang Su; Hsiang Chi Fu

In current studies, an extended subjective self-report method is generally used for measuring emotions. Even though it is commonly accepted that the speech emotion perceived by the listener is close to the intended emotion conveyed by the speaker, research has indicated that a mismatch between them still remains. In addition, individuals with different personalities generally express emotions differently. Based on these observations, in this study a support vector machine (SVM)-based emotion model is first developed to detect perceived emotion from daily conversational speech. Then, a denoising autoencoder (DAE) is used to construct an emotion conversion model characterizing the relationship between the perceived emotion and the expressed emotion of the subject for a specific personality. Finally, a long short-term memory (LSTM)-based mood model is constructed to model the temporal fluctuation of speech emotions for mood detection. Experimental results show that the proposed method achieved a detection accuracy of 64.5%, a 5.0% improvement over the HMM-based method.
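
As a rough illustration of the final stage, the sketch below shows an LSTM that consumes a sequence of per-utterance emotion profiles and outputs a mood label; the number of emotion classes, mood classes, and hidden size are assumptions of mine, not values from the paper.

```python
import torch
import torch.nn as nn

class LSTMMoodModel(nn.Module):
    """Models the temporal fluctuation of speech emotions over a day of conversations."""
    def __init__(self, n_emotions=4, hidden_dim=64, n_moods=2):
        super().__init__()
        self.lstm = nn.LSTM(n_emotions, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, n_moods)

    def forward(self, emotion_profiles):
        # emotion_profiles: (batch, n_utterances, n_emotions), e.g. the DAE-converted
        # emotion profile of each utterance in temporal order
        _, (h_n, _) = self.lstm(emotion_profiles)
        return self.classifier(h_n[-1])  # mood logits for the whole sequence

model = LSTMMoodModel()
logits = model(torch.randn(8, 120, 4))  # 8 speakers, 120 utterances each
```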


Ambient Intelligence | 2017

Coupled HMM-based multimodal fusion for mood disorder detection through elicited audio–visual signals

Tsung Hsien Yang; Chung-Hsien Wu; Kun-Yi Huang; Ming-Hsiang Su

Mood disorders encompass a wide array of mood issues, including unipolar depression (UD) and bipolar disorder (BD). In the diagnostic evaluation of outpatients with mood disorders, a high percentage of BD patients are initially misdiagnosed as having UD. It is crucial to distinguish accurately between BD and UD to make a correct and early diagnosis, leading to improvements in treatment and course of illness. In this study, emotional videos are first used to elicit the patients' emotions. After the patients watch each video clip, their facial expressions and speech responses are collected while they are interviewed by a clinician. For mood disorder detection, facial action unit (AU) profiles and speech emotion profiles (EPs) are obtained using support vector machines (SVMs) built on facial and speech features adapted from two selected databases with a denoising autoencoder-based method. Finally, a coupled hidden Markov model (CHMM)-based fusion method is proposed to characterize the temporal information. The CHMM is modified to fuse the AUs and the EPs with respect to the six emotional videos. Experimental results show the promise and efficacy of the CHMM-based fusion approach for mood disorder detection.
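
The coupling idea behind the CHMM fusion can be sketched with the forward algorithm over the product state space, where each stream's next state depends on both streams' previous states. The following is a simplified, self-contained illustration under my own assumptions (discrete observations, no log-space scaling), not the paper's modified CHMM.

```python
import numpy as np

def chmm_forward_loglik(obs_a, obs_b, pi_a, pi_b, trans_a, trans_b, emit_a, emit_b):
    """Log-likelihood of two coupled observation streams (e.g. AU and EP sequences).

    obs_a, obs_b  : discrete observation index sequences of equal length T
    pi_a, pi_b    : initial state distributions, shapes (Na,) and (Nb,)
    trans_a       : P(a_t | a_{t-1}, b_{t-1}), shape (Na, Nb, Na)
    trans_b       : P(b_t | a_{t-1}, b_{t-1}), shape (Na, Nb, Nb)
    emit_a, emit_b: emission probabilities, shapes (Na, Va) and (Nb, Vb)
    """
    Na, Nb = len(pi_a), len(pi_b)
    # alpha[i, j] = P(observations up to time t, a_t = i, b_t = j)
    alpha = np.outer(pi_a * emit_a[:, obs_a[0]], pi_b * emit_b[:, obs_b[0]])
    for t in range(1, len(obs_a)):
        new_alpha = np.zeros((Na, Nb))
        for i in range(Na):
            for j in range(Nb):
                # Coupling: each chain's next state depends on both chains' previous states.
                new_alpha[i, j] = np.sum(alpha * trans_a[:, :, i] * trans_b[:, :, j])
        alpha = new_alpha * np.outer(emit_a[:, obs_a[t]], emit_b[:, obs_b[t]])
    return np.log(alpha.sum())
```

In a detection setting, one such model would typically be trained per class (e.g. BD vs. UD) and the class whose model yields the higher likelihood chosen; a practical implementation would also work in log space for numerical stability.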


International Symposium on Chinese Spoken Language Processing | 2016

Dialog state tracking for interview coaching using two-level LSTM

Ming-Hsiang Su; Chung-Hsien Wu; Kun-Yi Huang; Tsung Hsien Yang; Tsui Ching Huang

This study presents an approach to dialog state tracking (DST) in an interview conversation using a long short-term memory (LSTM) network and an artificial neural network (ANN). First, word embedding techniques are employed for word representation using the word2vec model. Then, each input sentence is encoded into a sentence hidden vector by the LSTM-based sentence model. The sentence hidden vectors are fed to the LSTM-based answer model, which maps the interviewee's answer to an answer hidden vector. Finally, the answer hidden vector is used to detect the dialog state with an ANN-based dialog state detection model. To evaluate the proposed method, an interview conversation system was constructed, and an average accuracy of 89.93% was obtained for dialog state detection.
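
A compact sketch of the two-level architecture described above is given below; it is an illustrative reconstruction rather than the released system, and the embedding dimension, hidden sizes, and number of dialog states are assumed values.

```python
import torch
import torch.nn as nn

class TwoLevelLSTMTracker(nn.Module):
    """Word-level LSTM -> sentence vectors -> answer-level LSTM -> ANN state detector."""
    def __init__(self, emb_dim=200, sent_dim=128, ans_dim=128, n_states=10):
        super().__init__()
        self.sentence_lstm = nn.LSTM(emb_dim, sent_dim, batch_first=True)
        self.answer_lstm = nn.LSTM(sent_dim, ans_dim, batch_first=True)
        self.state_ann = nn.Sequential(
            nn.Linear(ans_dim, 64), nn.ReLU(), nn.Linear(64, n_states))

    def forward(self, answer):
        # answer: (n_sentences, n_words, emb_dim) word2vec embeddings of one interviewee answer
        _, (h_sent, _) = self.sentence_lstm(answer)     # one hidden vector per sentence
        sent_vectors = h_sent[-1].unsqueeze(0)          # (1, n_sentences, sent_dim)
        _, (h_ans, _) = self.answer_lstm(sent_vectors)  # one hidden vector for the whole answer
        return self.state_ann(h_ans[-1])                # dialog-state logits

tracker = TwoLevelLSTMTracker()
logits = tracker(torch.randn(3, 20, 200))  # an answer of 3 sentences, 20 words each
```

In practice sentences have different lengths, so a real implementation would pad or pack them before the word-level LSTM; the fixed shapes here are only for illustration.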


International Conference on Asian Language Processing | 2016

Dialog state tracking and action selection using deep learning mechanism for interview coaching

Ming-Hsiang Su; Kun-Yi Huang; Tsung Hsien Yang; Kuan Jung Lai; Chung-Hsien Wu

The best way to prepare for an interview is to review the types of questions you may be asked and to practice responding to them. An interview coaching system simulates an interviewer to provide mock interview practice sessions for users. Traditional interview coaching systems provide feedback, including facial preference, head nodding, response time, speaking rate, and volume, so that users know their own performance in the mock interview. However, most of these systems are trained with insufficient dialog data and provide only pre-designed interview questions. In this study, we propose an approach to dialog state tracking and action selection based on deep learning methods. First, the interview corpus in this study is collected from 12 participants and annotated with dialog states and actions. Next, a long short-term memory network and an artificial neural network are employed to predict dialog states, and deep reinforcement learning (RL) is adopted to learn the relation between dialog states and actions. Finally, the selected action is used to generate the interview question for interview practice. To evaluate the proposed method in action selection, an interview coaching system was constructed. Experimental results show the effectiveness of the proposed method for dialog state tracking and action selection.
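
The action-selection step can be pictured with a small Q-network that scores candidate interview actions given the tracked dialog state. The snippet below is only a sketch in that spirit; the state and action dimensions, the epsilon-greedy policy, and the network shape are my assumptions rather than details reported in the paper.

```python
import torch
import torch.nn as nn

class ActionQNetwork(nn.Module):
    """Scores each candidate interview action given the current dialog-state vector."""
    def __init__(self, state_dim=10, n_actions=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, state):
        return self.net(state)  # one Q-value per action

def select_action(q_net, state, epsilon=0.1):
    # Epsilon-greedy: usually pick the highest-valued action, occasionally explore.
    if torch.rand(1).item() < epsilon:
        return torch.randint(q_net.net[-1].out_features, (1,)).item()
    with torch.no_grad():
        return q_net(state).argmax(dim=-1).item()

q_net = ActionQNetwork()
action = select_action(q_net, torch.randn(1, 10))  # state vector from the dialog state tracker
```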


International Symposium on Chinese Spoken Language Processing | 2014

Interlocutor personality perception based on BFI profiles and coupled HMMs in a dyadic conversation

Ming-Hsiang Su; Yu Ting Zheng; Chung-Hsien Wu

Previous studies have found systematic associations between personality and individual differences in interpersonal communication. Recently, researchers have used various features to analyze individual personality traits in speech, social media content, and essays, but few studies have focused on detecting the personality interaction between two interlocutors. This paper presents a new approach to automatically and simultaneously predicting the personalities of two interlocutors in a dyadic conversation. First, recurrent neural networks (RNNs) are adopted to project the linguistic features of the transcribed spoken text of the input speech onto the Big Five Inventory (BFI) space. Coupled hidden Markov models (coupled HMMs) are then used to predict the interlocutor personalities from the transcribed text of the two speakers, considering the conversational interaction in their dialogue turns. The Mandarin Conversational Dialogue Corpus (MCDC) was adopted to evaluate the performance on interlocutor personality perception. Experimental results show that the proposed approach achieved satisfactory results in predicting the personalities of two interlocutors at the same time.
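
Before the coupled HMMs can model the interaction, the two interlocutors' per-turn BFI vectors have to be aligned along a common turn axis. The helper below shows one plausible way to do this; the data layout and the zero initialization are assumptions for illustration, not the paper's preprocessing.

```python
import numpy as np

def pair_turn_profiles(turns):
    """Align both interlocutors' BFI vectors over the dialogue turns.

    turns: list of (speaker, bfi_vector) in dialogue order, speaker in {"A", "B"},
           where bfi_vector is the RNN projection of that speaker's turn.
    Returns two (T, 10) arrays holding each speaker's most recent profile at every turn,
    so the coupled HMM sees both chains evolving over the same time axis.
    """
    last = {"A": np.zeros(10), "B": np.zeros(10)}
    seq_a, seq_b = [], []
    for speaker, bfi in turns:
        last[speaker] = np.asarray(bfi, dtype=float)
        seq_a.append(last["A"].copy())
        seq_b.append(last["B"].copy())
    return np.stack(seq_a), np.stack(seq_b)

toy_turns = [("A", np.random.rand(10)), ("B", np.random.rand(10)), ("A", np.random.rand(10))]
a_seq, b_seq = pair_turn_profiles(toy_turns)
print(a_seq.shape, b_seq.shape)  # (3, 10) (3, 10)
```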


EURASIP Journal on Audio, Speech, and Music Processing | 2017

Miscommunication handling in spoken dialog systems based on error-aware dialog state detection

Chung-Hsien Wu; Ming-Hsiang Su; Wei Bin Liang


International Conference on Acoustics, Speech, and Signal Processing | 2018

Attention-based Dialog State Tracking for Conversational Interview Coaching

Ming-Hsiang Su; Chung-Hsien Wu; Kun-Yi Huang; Chu-Kwang Chen


Conference of the International Speech Communication Association | 2018

Follow-up Question Generation Using Pattern-based Seq2seq with a Small Corpus for Interview Coaching

Ming-Hsiang Su; Chung-Hsien Wu; Kun-Yi Huang; Qian-Bei Hong; Huai-Hung Huang


Affective Computing and Intelligent Interaction | 2018

Exploring Macroscopic Fluctuation of Facial Expression for Mood Disorder Classification

Qian-Bei Hong; Chung-Hsien Wu; Ming-Hsiang Su; Kun-Yi Huang

Collaboration


Dive into Ming-Hsiang Su's collaborations.

Top Co-Authors

Chung-Hsien Wu | National Cheng Kung University
Kun-Yi Huang | National Cheng Kung University
Qian-Bei Hong | National Cheng Kung University
Tsung Hsien Yang | National Cheng Kung University
Yu Ting Zheng | National Cheng Kung University
Chia-Cheng Chang | National Cheng Kung University
Chia-Hui Chou | National Cheng Kung University
Hsiang Chi Fu | National Cheng Kung University