Jun-Heng Yeh
Tatung University
Publication
Featured researches published by Jun-Heng Yeh.
international conference on pattern recognition | 2006
Tsang-Long Pao; Yu-Te Chen; Jun-Heng Yeh; Pei-Jia Li
The exploration of how we as human beings react to the world and interact with it and each other remains one of the greatest scientific challenges. The ability to recognize the emotional state of a person is perhaps the most important prerequisite for successful interpersonal social interaction. An automatic emotional speech recognition system can be characterized by the features used, the emotional categories investigated, the methods of collecting speech utterances, the languages, and the type of classifier used in the experiments. In this paper, we used SVM and NN classifiers together with a feature selection algorithm to classify five emotions in Mandarin emotional speech and compared their experimental results. The overall results reveal that the SVM classifier (84.2%) outperforms the NN classifier (80.8%) and detects anger perfectly, but confuses happiness with sadness, boredom and neutral. The NN classifier achieves better performance in recognizing sadness and neutral and differentiates happiness and boredom perfectly.
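As a rough illustration of the SVM-versus-NN comparison described above, the following scikit-learn sketch runs both classifiers behind the same feature selection step; the synthetic data, feature dimensions and hyperparameters are stand-ins for the paper's Mandarin corpus and settings, which are not reproduced here.

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 60))           # stand-in acoustic feature vectors
y = rng.integers(0, 5, size=500)         # 5 emotion labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("NN", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500))]:
    # identical scaling and feature selection for a fair comparison
    pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=30), clf)
    pipe.fit(X_tr, y_tr)
    print(name, "accuracy:", pipe.score(X_te, y_te))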
Computers in Human Behavior | 2011
Jun-Heng Yeh; Tsang-Long Pao; Ching-Yi Lin; Yao-Wei Tsai; Yu-Te Chen
Recognition of emotion in speech has recently matured into one of the key disciplines in speech analysis, serving next-generation human-machine interaction and communication. However, unlike automatic speech recognition, emotion recognition from an isolated word or phrase is inappropriate for conversation, because a complete emotional expression may span several sentences and may end on any word in the dialogue. In this paper, we present a segment-based emotion recognition approach for continuous Mandarin Chinese speech. In this approach, the unit of recognition is not a phrase or a sentence but an emotional expression in dialogue. To that end, the following procedures are presented: First, we evaluate the performance of several classifiers in short-sentence speech emotion recognition architectures. The experiments show that the WD-KNN classifier achieves the best accuracy for 5-class emotion recognition among the five classification techniques. We then implemented a continuous Mandarin Chinese speech emotion recognition system, based on WD-KNN, with an emotion radar chart that represents the intensity of each emotion component in the speech. The proposed approach shows how emotions can be recognized from speech signals and, in turn, how emotional states can be visualized.
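The WD-KNN decision rule is not spelled out in this abstract; the sketch below implements one plausible reading based on the authors' related papers: for each emotion class, distances to that class's k nearest training samples are combined with decreasing weights, and the class with the smallest weighted sum wins. The weight sequence and k are assumptions.

import numpy as np

def wd_knn_scores(x, X_train, y_train, classes, k=5, weights=None):
    # Per-class weighted sum of distances to the k nearest training
    # samples of that class; the smallest score is the predicted class.
    if weights is None:
        weights = 1.0 / np.arange(1, k + 1)   # assumed decreasing weights
    scores = {}
    for c in classes:
        d = np.sort(np.linalg.norm(X_train[y_train == c] - x, axis=1))[:k]
        scores[c] = float(np.dot(weights[:len(d)], d))
    return scores

rng = np.random.default_rng(0)
emotions = np.array(["anger", "happiness", "sadness", "neutral", "boredom"])
X_train = rng.normal(size=(250, 20))             # stand-in feature vectors
y_train = emotions[rng.integers(0, 5, size=250)]
scores = wd_knn_scores(rng.normal(size=20), X_train, y_train, emotions)
print(min(scores, key=scores.get))

Per-emotion intensities for the radar chart could then be derived from these scores, for example by normalizing their reciprocals so that closer classes show larger spokes.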
International Journal of Computational Linguistics & Chinese Language Processing | 2005
Tsang-Long Pao; Yu-Te Chen; Jun-Heng Yeh; Wen-Yuan Liao
The importance of automatically recognizing emotions in human speech has grown with the increasing role of spoken language interfaces in human-computer interaction applications. In this paper, a Mandarin-speech-based emotion classification method is presented. Five primary human emotions, including anger, boredom, happiness, neutral and sadness, are investigated. Combining different feature streams to obtain a more accurate result is a well-known statistical technique. For speech emotion recognition, we combined 16 LPC coefficients, 12 LPCC components, 16 LFPC components, 16 PLP coefficients, 20 MFCC components and jitter as the basic features to form the feature vector. Two corpora were employed. The recognizer presented in this paper is based on three classification techniques: LDA, K-NN and HMMs. Results show that the selected features are robust and effective for emotion recognition in the valence and arousal dimensions of the two corpora. Using the HMM emotion classification method, an average accuracy of 88.7% was achieved.
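Only part of this feature vector is easy to reproduce with off-the-shelf tools; a minimal librosa sketch for the LPC and MFCC portions is shown below. librosa has no built-in LPCC, LFPC or PLP extractors, and the jitter estimate here is a crude stand-in, so this approximates the described front end rather than the authors' extraction code.

import numpy as np
import librosa

def utterance_features(path):
    y, sr = librosa.load(path, sr=16000)
    lpc = librosa.lpc(y, order=16)[1:]                               # 16 LPC coefficients
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).mean(axis=1)  # 20 MFCC components
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)                    # pitch track
    periods = 1.0 / f0
    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)    # rough local jitter
    return np.concatenate([lpc, mfcc, [jitter]])

# usage: vec = utterance_features("utterance.wav")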
affective computing and intelligent interaction | 2005
Tsang-Long Pao; Yu-Te Chen; Jun-Heng Yeh; Wen-Yuan Liao
Combining different feature streams to obtain a more accurate experimental result is a well-known technique. The basic argument is that if the recognition errors of systems using the individual streams occur at different points, there is at least a chance that a combined system will be able to correct some of these errors by reference to the other streams. In an emotional speech recognition system, there are many ways in which this general principle can be applied. In this paper, we propose using feature selection and feature combination to improve speaker-dependent emotion recognition in Mandarin speech. Five basic emotions are investigated: anger, boredom, happiness, neutral and sadness. Combining multiple feature streams is clearly highly beneficial in our system. The best accuracy in recognizing the five emotions, 99.44%, is achieved by combining the MFCC, LPCC, RastaPLP and LFPC feature streams with the nearest class mean classifier.
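A small sketch of the feature-combination idea, using scikit-learn: each (synthetic) feature stream is scaled independently and concatenated before a nearest class mean (nearest centroid) classifier is fit. The stream dimensions are placeholders for the MFCC, LPCC, RastaPLP and LFPC streams named above.

import numpy as np
from sklearn.neighbors import NearestCentroid
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
streams = [rng.normal(size=(300, d)) for d in (20, 12, 13, 16)]       # fake streams
X = np.hstack([StandardScaler().fit_transform(s) for s in streams])   # combine
y = rng.integers(0, 5, size=300)                                      # 5 emotions

clf = NearestCentroid().fit(X[:240], y[:240])
print("accuracy:", clf.score(X[240:], y[240:]))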
IEICE Transactions on Information and Systems | 2008
Tsang-Long Pao; Yu-Te Chen; Jun-Heng Yeh
It is said that technology comes out of humanity. What is humanity? The very definition of humanity is emotion. Emotion is the basis for all human expression and the underlying theme behind everything that is done, said, thought or imagined. If computers are able to perceive and respond to human emotion, human-computer interaction will become more natural. Several classifiers have been adopted for automatically assigning an emotion category, such as anger, happiness or sadness, to a speech utterance. These classifiers were designed independently and tested on various emotional speech corpora, making it difficult to compare and evaluate their performance. In this paper, we first compared several popular classification methods and evaluated their performance by applying them to a Mandarin speech corpus consisting of five basic emotions: anger, happiness, boredom, sadness and neutral. The extracted feature streams contain MFCC, LPCC, and LPC. The experimental results show that the proposed WD-MKNN classifier achieves an accuracy of 81.4% for the 5-class emotion recognition and outperforms other classification techniques, including KNN, MKNN, DW-KNN, LDA, QDA, GMM, HMM, SVM, and BPNN. Then, to verify the advantage of the proposed method, we compared these classifiers by applying them to another Mandarin expressive speech corpus consisting of two emotions. The experimental results again show that the proposed WD-MKNN outperforms the others.
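The kind of classifier bake-off this abstract reports can be reproduced in outline with scikit-learn; the proposed WD-MKNN is not a library classifier, so the loop below covers only the comparable baselines (KNN, LDA, QDA, SVM, and an MLP standing in for the BPNN), on synthetic data.

import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 40))          # stand-in feature vectors
y = rng.integers(0, 5, size=400)        # 5 emotion labels

for name, clf in [("KNN", KNeighborsClassifier(5)),
                  ("LDA", LinearDiscriminantAnalysis()),
                  ("QDA", QuadraticDiscriminantAnalysis()),
                  ("SVM", SVC()),
                  ("BPNN", MLPClassifier(max_iter=500))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()   # 5-fold cross-validation
    print(f"{name}: {acc:.3f}")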
international conference on intelligent computing | 2007
Tsang-Long Pao; Yu-Te Chen; Jun-Heng Yeh; Yun-Maw Cheng; Yu-Yuan Lin
Emotion is fundamental to human experience, influencing cognition, perception and everyday tasks such as learning, communication and even rational decision-making. This aspect must be considered in human-computer interaction. In this paper, we compare four different weighting functions in weighted KNN-based classifiers to recognize five emotions, including anger, happiness, sadness, neutral and boredom, from Mandarin emotional speech. The classifiers studied include weighted KNN, weighted CAP, and weighted D-KNN. To give a baseline performance measure, we also adopt the traditional KNN classifier. The experimental results show that the Fibonacci weighting function outperforms the others in all weighted classifiers. The highest accuracy, 81.4%, is achieved with the weighted D-KNN classifier.
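One plausible reading of the Fibonacci weighting function is sketched below: the i-th nearest neighbor's vote is weighted by a decreasing Fibonacci sequence, so closer neighbors dominate. The exact ordering and normalization in the paper may differ; k and the tie-breaking here are assumptions.

import numpy as np
from collections import Counter

def fib_weights(k):
    # Fibonacci sequence, reversed so the nearest neighbor weighs most.
    w = [1, 1]
    while len(w) < k:
        w.append(w[-1] + w[-2])
    return np.array(w[:k][::-1], dtype=float)

def weighted_knn_predict(x, X_train, y_train, k=7):
    idx = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    votes = Counter()
    for w, i in zip(fib_weights(k), idx):
        votes[y_train[i]] += w
    return votes.most_common(1)[0][0]

rng = np.random.default_rng(3)
X_tr, y_tr = rng.normal(size=(100, 8)), rng.integers(0, 5, size=100)
print(weighted_knn_predict(rng.normal(size=8), X_tr, y_tr))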
international symposium on chinese spoken language processing | 2004
Tsang-Long Pao; Yu-Te Chen; Jun-Heng Yeh
In this paper, a Mandarin-speech-based emotion classification method is presented. Five primary human emotions, including anger, boredom, happiness, neutral and sadness, are investigated. In emotion classification of speech signals, the conventional features are statistics of fundamental frequency, loudness, duration and voice quality. However, the recognition accuracy of systems employing these features degrades substantially when more than two valence emotion categories are invoked. For speech emotion recognition, we select 16 LPC coefficients, 12 LPCC components, 16 LFPC components, 16 PLP coefficients, 20 MFCC components and jitter as the basic features to form the feature vector. A Mandarin corpus recorded by 12 non-professional speakers is employed. The recognizer presented in this paper is based on three recognition techniques: LDA, K-NN, and HMMs. Experimental results show that the selected features are robust and effective for emotion recognition, not only in the arousal dimension but also in the valence dimension.
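Of the three recognizers, the HMM one has the least obvious structure; a common recipe, sketched here with hmmlearn on synthetic frame sequences, is to train one Gaussian HMM per emotion and label a test utterance with the highest-scoring model. The state count and training setup are illustrative assumptions, not the paper's configuration.

import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(4)
emotions = ["anger", "boredom", "happiness", "neutral", "sadness"]
models = {}
for i, emo in enumerate(emotions):
    frames = rng.normal(loc=i, size=(600, 13))        # fake frame features
    m = GaussianHMM(n_components=3, n_iter=20)
    m.fit(frames, lengths=[200, 200, 200])            # three training utterances
    models[emo] = m

test = rng.normal(loc=2, size=(150, 13))              # resembles "happiness" here
print(max(models, key=lambda e: models[e].score(test)))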
intelligent information hiding and multimedia signal processing | 2007
Tsang-Long Pao; Charles S. Chien; Jun-Heng Yeh; Yu-Te Chen; Yun-Maw Cheng
Emotions play a significant role in decision-making, health, perception, human interaction and human intelligence. Automatic recognition of emotion in speech is very desirable because it enriches human-computer interaction, and it has become an important research area in recent years. However, to the best of our knowledge, no work has focused on automatic emotion tracking in continuous Mandarin emotional speech. In this paper, we present an emotion tracking system that divides an utterance into several independent segments, each of which contains a single emotional category. Experimental results reveal that the proposed system produces satisfactory results. On our testing database, composed of 279 utterances obtained by concatenating short sentences, an average accuracy of 83% is achieved using the weighted D-KNN classifier with LPCC and MFCC features.
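The segment-then-classify tracking idea can be outlined in a few lines: slide a window over frame-level features, classify each window independently, and report the resulting emotion sequence. The window and hop sizes, the stand-in KNN classifier, and summarizing each window by its mean frame are all simplifying assumptions.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def track_emotions(frames, clf, win=100, hop=50):
    # frames: (n_frames, n_dims) array; returns one label per window.
    labels = []
    for start in range(0, max(1, len(frames) - win + 1), hop):
        seg = frames[start:start + win]
        labels.append(clf.predict(seg.mean(axis=0, keepdims=True))[0])
    return labels

rng = np.random.default_rng(5)
clf = KNeighborsClassifier(5).fit(rng.normal(size=(200, 13)),
                                  rng.integers(0, 5, size=200))
print(track_emotions(rng.normal(size=(400, 13)), clf))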
intelligent information hiding and multimedia signal processing | 2007
Tsang-Long Pao; Charles S. Chien; Yu-Te Chen; Jun-Heng Yeh; Yun-Maw Cheng; Wen-Yuan Liao
An automatic emotional speech recognition system can be characterized by the selected features, the investigated emotional categories, the methods of collecting speech utterances, the languages, and the type of classifier used in the experiments. To date, several classifiers have been adopted independently and tested on numerous emotional speech corpora, but no single classifier is sufficient to classify the emotional classes optimally. In this paper, we focus on combination schemes of multiple classifiers to achieve the best possible recognition rate for the task of 5-class emotion recognition in Mandarin speech. The investigated classifiers include KNN, WKNN, WCAP, W-DKNN and SVM. The experimental results show that classifier combination schemes, including the majority voting method, the minimum misclassification method and the maximum accuracy method, perform better than the single classifiers in terms of overall accuracy, with improvements ranging from 0.9% to 6.5%.
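Of the combination schemes mentioned, majority voting is the most direct to sketch; scikit-learn's VotingClassifier with hard voting picks the class most base learners agree on. The base learners below are stand-ins for the KNN/SVM family named in the abstract, on synthetic data.

import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X, y = rng.normal(size=(300, 30)), rng.integers(0, 5, size=300)

vote = VotingClassifier([("knn3", KNeighborsClassifier(3)),
                         ("knn7", KNeighborsClassifier(7)),
                         ("svm", SVC())], voting="hard")
vote.fit(X[:240], y[:240])
print("combined accuracy:", vote.score(X[240:], y[240:]))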
systems, man and cybernetics | 2006
Tsang-Long Pao; Jun-Heng Yeh; Min-Yen Liu; Yung-Chang Hsu
The typhoon center location is important for weather forecasting and typhoon analysis. However, the appearance of the typhoon center as viewed in IR satellite cloud images varies in shape and size over time. At the genesis stage, the center of a typhoon is quite ambiguous. When the typhoon reaches a certain strength, an eye appears at the center. As the typhoon grows stronger, the eye tends to shrink in size and becomes clearer. When the typhoon hits land, its strength decreases and the eye may disappear. Only well-trained meteorologists can identify the typhoon center from a satellite cloud image when there is no eye. Since the portion surrounding the eye does the most damage, it is important to locate and track the center of a typhoon. In this paper, we propose a novel approach that partitions the satellite cloud image into slices and uses morphology operations and image classification methods to automatically locate the center of a typhoon with or without an eye. We applied our approach to locate and track the centers of different typhoons that occurred in recent years and achieved high accuracy.
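A toy version of the morphology-based eye search, assuming scikit-image and a synthetic IR-like image: threshold the bright cloud tops, clean the mask with opening and closing, and take the hole inside the largest cloud blob as a crude eye-center estimate. This only sketches the general idea; the paper's slice partitioning and classification steps are not reproduced.

import numpy as np
from skimage import measure, morphology

# synthetic image: dim background plus a bright cloud ring with a dark "eye"
rng = np.random.default_rng(7)
img = rng.random((200, 200)) * 0.3
yy, xx = np.mgrid[:200, :200]
img += np.exp(-((np.hypot(yy - 90, xx - 120) - 30) / 12.0) ** 2)

mask = img > 0.5                                    # bright cloud tops
mask = morphology.closing(morphology.opening(mask, morphology.disk(2)),
                          morphology.disk(2))       # remove speckle, seal gaps
labels = measure.label(mask)
largest = max(measure.regionprops(labels), key=lambda p: p.area)

filled = morphology.remove_small_holes(labels == largest.label, 5000)
hole = np.where(filled & ~mask)                     # pixels of the eye hole
center = ((hole[0].mean(), hole[1].mean()) if hole[0].size
          else largest.centroid)                    # fall back if no eye
print("estimated center (row, col):", center)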