Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Jen-Chun Lin is active.

Publication


Featured research published by Jen-Chun Lin.


IEEE Transactions on Multimedia | 2012

Error Weighted Semi-Coupled Hidden Markov Model for Audio-Visual Emotion Recognition

Jen-Chun Lin; Chung-Hsien Wu; Wen-Li Wei

This paper presents an approach to the automatic recognition of human emotions from audio-visual bimodal signals using an error weighted semi-coupled hidden Markov model (EWSC-HMM). The proposed approach combines an SC-HMM with a state-based bimodal alignment strategy and a Bayesian classifier weighting scheme to obtain the optimal emotion recognition result based on audio-visual bimodal fusion. The state-based bimodal alignment strategy in SC-HMM is proposed to align the temporal relation between audio and visual streams. The Bayesian classifier weighting scheme is then adopted to explore the contributions of the SC-HMM-based classifiers for different audio-visual feature pairs in order to obtain the emotion recognition output. For performance evaluation, two databases are considered: the MHMC posed database and the SEMAINE naturalistic database. Experimental results show that the proposed approach not only outperforms other fusion-based bimodal emotion recognition methods for posed expressions but also provides satisfactory results for naturalistic expressions.
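
To picture the fusion step described above, combining per-feature-pair SC-HMM classifiers through an error-derived weighting before choosing the emotion class, here is a minimal sketch. The weights, feature-pair names, likelihood values, and emotion labels are illustrative placeholders, not the paper's actual formulation.

```python
import numpy as np

# Hypothetical per-classifier log-likelihoods over four emotion classes,
# one SC-HMM classifier per audio-visual feature pair. Values are invented.
log_likelihoods = np.array([
    [-12.3, -10.1, -15.7, -11.9],   # classifier on a (prosody, facial-shape) pair
    [-11.8, -13.2, -14.0, -12.5],   # classifier on an (MFCC, facial-texture) pair
])

# Error-derived weights: classifiers with fewer held-out errors get larger
# weights (chosen arbitrarily here; the paper derives them from error statistics).
weights = np.array([0.6, 0.4])

# Weighted fusion: combine classifier scores and pick the best emotion class.
fused = weights @ log_likelihoods            # shape: (num_emotions,)
emotions = ["happy", "angry", "sad", "neutral"]
print("recognized emotion:", emotions[int(np.argmax(fused))])
```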


APSIPA Transactions on Signal and Information Processing | 2014

Survey on audiovisual emotion recognition: Databases, features, and data fusion strategies

Chung-Hsien Wu; Jen-Chun Lin; Wen Li Wei

Emotion recognition is the ability to identify what people would think someone is feeling from moment to moment and to understand the connection between his/her feelings and expressions. In today's world, the human–computer interaction (HCI) interface undoubtedly plays an important role in our daily life. Toward a harmonious HCI interface, automated analysis and recognition of human emotion has attracted increasing attention from researchers in multidisciplinary research fields. This paper provides a survey of theoretical and practical work offering new and broad views of the latest research in emotion recognition from bimodal information, including facial and vocal expressions. First, the currently available audiovisual emotion databases are described. Facial and vocal features and audiovisual bimodal data fusion methods for emotion recognition are then surveyed and discussed. Specifically, this survey also covers the recent emotion challenges held at several conferences. Conclusions outline and address some of the existing emotion recognition issues.


IEEE Transactions on Multimedia | 2013

Two-Level Hierarchical Alignment for Semi-Coupled HMM-Based Audiovisual Emotion Recognition With Temporal Course

Chung-Hsien Wu; Jen-Chun Lin; Wen-Li Wei

A complete emotional expression typically contains a complex temporal course in face-to-face natural conversation. To address this problem, a bimodal hidden Markov model (HMM)-based emotion recognition scheme, constructed in terms of sub-emotional states, which are defined to represent the temporal phases of onset, apex, and offset, is adopted to model the temporal course of an emotional expression for audio and visual signal streams. A two-level hierarchical alignment mechanism is proposed to align the relationship within and between the temporal phases in the audio and visual HMM sequences at the model and state levels in a proposed semi-coupled hidden Markov model (SC-HMM). Furthermore, by integrating a sub-emotion language model, which considers the temporal transition between sub-emotional states, the proposed two-level hierarchical alignment-based SC-HMM (2H-SC-HMM) can provide a constraint on allowable temporal structures to determine an optimal emotional state. Experimental results show that the proposed approach can yield satisfactory results on both the posed MHMC and the naturalistic SEMAINE databases, and that modeling the complex temporal structure is useful for improving emotion recognition performance, especially for the naturalistic database (i.e., natural conversation). The experimental results also confirm that the proposed 2H-SC-HMM can achieve acceptable performance for systems with sparse training data or under noisy conditions.
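
As a rough illustration of the sub-emotion language model idea, the toy sketch below scores candidate onset/apex/offset phase sequences with bigram transition probabilities, so that well-formed temporal courses score higher than ill-formed ones. The probabilities and phase labels are invented for illustration; the actual 2H-SC-HMM couples this constraint with model- and state-level audio-visual alignment.

```python
import math

# Toy "sub-emotion language model": bigram transition probabilities over
# temporal phases of an emotional expression. Values are illustrative only.
bigram = {
    ("<s>", "onset"): 0.9, ("<s>", "apex"): 0.1,
    ("onset", "onset"): 0.3, ("onset", "apex"): 0.7,
    ("apex", "apex"): 0.5, ("apex", "offset"): 0.5,
    ("offset", "offset"): 0.6, ("offset", "</s>"): 0.4,
}

def lm_log_prob(sequence):
    """Log-probability of a phase sequence; transitions not listed
    (e.g. apex -> onset) are treated as effectively disallowed."""
    score, prev = 0.0, "<s>"
    for phase in list(sequence) + ["</s>"]:
        score += math.log(bigram.get((prev, phase), 1e-12))
        prev = phase
    return score

# A plausible temporal course scores far higher than an implausible one.
print(lm_log_prob(["onset", "apex", "apex", "offset"]))
print(lm_log_prob(["offset", "onset", "apex"]))
```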


IEEE Transactions on Multimedia | 2013

Speaking Effect Removal on Emotion Recognition From Facial Expressions Based on Eigenface Conversion

Chung-Hsien Wu; Wen-Li Wei; Jen-Chun Lin; Wei-Yu Lee

The speaking effect is a crucial issue that may dramatically degrade performance in emotion recognition from facial expressions. To manage this problem, an eigenface conversion-based approach is proposed to remove the speaking effect from facial expressions and thereby improve the accuracy of emotion recognition. In the proposed approach, a context-dependent linear conversion function modeled by a statistical Gaussian Mixture Model (GMM) is constructed with parallel data from speaking and non-speaking facial expressions with emotions. To model the speaking effect in more detail, the conversion functions are categorized using a decision tree that considers the visual temporal context of the Articulatory Attribute (AA) classes of the corresponding input speech segments. To verify, from the reconstructed facial feature points, the identified quadrant of emotional expression on the Arousal-Valence (A-V) emotion plane, which is commonly used to dimensionally define the emotion classes, an expression template is constructed to represent the feature points of the non-speaking facial expressions for each quadrant. With the verified quadrant, a regression scheme is further employed to estimate the A-V values of the facial expression as a precise point in the A-V emotion plane. Experimental results show that the proposed method outperforms current approaches and demonstrates that removing the speaking effect from facial expressions is useful for improving the performance of emotion recognition.
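
A minimal sketch of a GMM-based conversion function in the spirit described above: fit a joint GMM (here with scikit-learn) on parallel speaking/non-speaking feature vectors and convert a speaking-face vector with the per-component conditional expectation. The synthetic data, dimensionality, and mixture size are assumptions; the paper's version is additionally context-dependent via a decision tree over Articulatory Attribute classes.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
d = 4                                       # feature dimension per modality (assumed)
x_train = rng.normal(size=(500, d))         # speaking-face features (parallel data)
y_train = x_train * 0.8 + 0.5 + 0.1 * rng.normal(size=(500, d))  # non-speaking targets

# Fit a joint GMM on stacked [speaking ; non-speaking] vectors.
joint = np.hstack([x_train, y_train])
gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
gmm.fit(joint)

def convert(x):
    """Map a speaking-face feature vector to its non-speaking estimate E[y | x]."""
    y_hat = np.zeros(d)
    resp = np.zeros(gmm.n_components)
    for k in range(gmm.n_components):
        mu_x, mu_y = gmm.means_[k][:d], gmm.means_[k][d:]
        S = gmm.covariances_[k]
        Sxx, Sxy = S[:d, :d], S[:d, d:]
        # Per-component conditional mean E[y | x, k].
        cond_mean = mu_y + (x - mu_x) @ np.linalg.solve(Sxx, Sxy)
        # Unnormalized responsibility of component k given x (constant terms cancel).
        diff = x - mu_x
        log_px = -0.5 * diff @ np.linalg.solve(Sxx, diff) - 0.5 * np.linalg.slogdet(Sxx)[1]
        resp[k] = gmm.weights_[k] * np.exp(log_px)
        y_hat += resp[k] * cond_mean
    return y_hat / resp.sum()

print(convert(x_train[0]))
```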


IEEE Transactions on Audio, Speech, and Language Processing | 2014

Exploiting Psychological Factors for Interaction Style Recognition in Spoken Conversation

Wen Li Wei; Chung-Hsien Wu; Jen-Chun Lin; Han Li

Determining how a speaker is engaged in a conversation is crucial for achieving harmonious interaction between computers and humans. In this study, a fusion approach was developed based on psychological factors to recognize Interaction Style (IS) in spoken conversation, which plays a key role in creating natural dialogue agents. The proposed Fused Cross-Correlation Model (FCCM) provides a unified probabilistic framework to model the relationships among the psychological factors of emotion, personality trait (PT), transient IS, and IS history, for recognizing IS. An emotional arousal-dependent speech recognizer was used to obtain the recognized spoken text for extracting linguistic features to estimate transient IS likelihood and recognize PT. A temporal course modeling approach and an emotional sub-state language model, based on the temporal phases of an emotional expression, were employed to obtain a better emotion recognition result. The experimental results indicate that the proposed FCCM yields satisfactory results in IS recognition and also demonstrate that combining psychological factors effectively improves IS recognition accuracy.


International Conference on Acoustics, Speech, and Signal Processing | 2013

Facial action unit prediction under partial occlusion based on Error Weighted Cross-Correlation Model

Jen-Chun Lin; Chung-Hsien Wu; Wen Li Wei

The occlusive effect is a crucial issue that may dramatically degrade the performance of facial expression recognition. As emotion recognition from facial expressions is based on features of the entire face, the occlusive effect remains a challenging problem to be solved. To manage this problem, an Error Weighted Cross-Correlation Model (EWCCM) is proposed to effectively predict the facial Action Unit (AU) under partial facial occlusion from non-occluded facial regions, providing the correct AU information for emotion recognition. The Gaussian Mixture Model (GMM)-based Cross-Correlation Model (CCM) in EWCCM is first proposed to not only model the extracted facial features but also construct the statistical dependency among features from paired facial regions for AU prediction. A Bayesian classifier weighting scheme is then adopted to explore the contributions of the GMM-based CCMs to enhance the prediction accuracy. Experimental results show that the proposed approach achieves promising performance.
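
One way to picture the cross-correlation modeling is a per-AU joint GMM over paired-region features that is marginalized to the visible region when the other region is occluded; the sketch below does exactly that on synthetic data. The region layout, AU labels, dimensions, and data are hypothetical, and the error-weighting step that combines several such models is omitted.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
d = 3                                        # dims per facial region (assumed)

# Per AU, fit a joint GMM over a paired region's features, e.g. [eye ; mouth].
models = {}
for au, center in (("AU12", 1.0), ("AU4", -1.0)):
    joint_feats = rng.normal(loc=center, size=(300, 2 * d))   # toy training data
    models[au] = GaussianMixture(n_components=2, covariance_type="full",
                                 random_state=0).fit(joint_feats)

def visible_log_likelihood(gmm, x_visible):
    """Score the visible (first d dims) block by marginalizing each component."""
    parts = [np.log(w) + multivariate_normal.logpdf(x_visible, mu[:d], S[:d, :d])
             for w, mu, S in zip(gmm.weights_, gmm.means_, gmm.covariances_)]
    return np.logaddexp.reduce(parts)

# Mouth region occluded: predict the AU from the visible eye-region features.
x_eye = rng.normal(loc=1.0, size=d)
scores = {au: visible_log_likelihood(g, x_eye) for au, g in models.items()}
print("predicted AU:", max(scores, key=scores.get))
```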


International Conference on Acoustics, Speech, and Signal Processing | 2016

DEMV-matchmaker: Emotional temporal course representation and deep similarity matching for automatic music video generation

Jen-Chun Lin; Wen-Li Wei; Hsin-Min Wang

This paper presents a deep similarity matching-based emotion-oriented music video (MV) generation system, called DEMV-matchmaker, which utilizes an emotion-oriented deep similarity matching (EDSM) metric as a bridge to connect music and video. Specifically, we adopt an emotional temporal course model (ETCM) to respectively learn the relationship between music and its emotional temporal phase sequence and the relationship between video and its emotional temporal phase sequence from an emotion-annotated MV corpus. An emotional temporal structure preserved histogram (ETPH) representation is proposed to keep the recognized emotional temporal phase sequence information for EDSM metric construction. A deep neural network (DNN) is then applied to learn an EDSM metric based on the ETPHs for the given positive (official) and negative (artificial) MV examples. For MV generation, the EDSM metric is applied to measure the similarity between ETPHs of video and music. The results of objective and subjective experiments demonstrate that DEMV-matchmaker performs well and can generate appealing music videos that can enhance the viewing and listening experience.
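
A compact sketch of a deep similarity metric in this spirit: a small feed-forward network maps a concatenated music/video ETPH pair to a match score in (0, 1), trained on positive and artificially mismatched pairs. The histogram dimensionality, architecture, and training data below are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

etph_dim = 16                                 # assumed ETPH dimensionality

# Feed-forward similarity network over concatenated [music-ETPH ; video-ETPH].
model = nn.Sequential(
    nn.Linear(2 * etph_dim, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

# Toy pairs: positives share structure (stand-ins for official MVs), negatives do not.
music = torch.rand(128, etph_dim)
video_pos = music + 0.05 * torch.randn(128, etph_dim)
video_neg = torch.rand(128, etph_dim)
x = torch.cat([torch.cat([music, video_pos], dim=1),
               torch.cat([music, video_neg], dim=1)], dim=0)
y = torch.cat([torch.ones(128, 1), torch.zeros(128, 1)], dim=0)

for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# At generation time, candidate videos for a music clip would be ranked by this score.
score = model(torch.cat([music[:1], video_pos[:1]], dim=1)).item()
print(f"match score for a matched pair: {score:.2f}")
```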


ACM Multimedia | 2015

EMV-matchmaker: Emotional Temporal Course Modeling and Matching for Automatic Music Video Generation

Jen-Chun Lin; Wen-Li Wei; Hsin-Min Wang

This paper presents a novel content-based emotion-oriented music video (MV) generation system, called EMV-matchmaker, which utilizes the emotional temporal phase sequence of the multimedia content as a bridge to connect music and video. Specifically, we adopt an emotional temporal course model (ETCM) to respectively learn the relationship between music and its emotional temporal phase sequence and the relationship between video and its emotional temporal phase sequence from an emotion-annotated MV corpus. Then, given a video clip (or a music clip), the visual (or acoustic) ETCM is applied to predict its emotional temporal phase sequence in a valence-arousal (VA) emotional space from the corresponding low-level visual (or acoustic) features. For MV generation, string matching is applied to measure the similarity between the emotional temporal phase sequences of video and music. The results of objective and subjective experiments demonstrate that EMV-matchmaker performs well and can generate appealing music videos that can enhance the viewing and listening experience.
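
Since the matching step operates on symbol sequences, a plain edit-distance comparison already conveys the idea: the sketch below ranks candidate music clips for a video by Levenshtein distance between their emotional temporal phase sequences. The phase symbols and clip names are illustrative, and the paper's actual string-matching details may differ.

```python
# Rank candidate music clips by edit distance to a video's phase sequence.
def edit_distance(a, b):
    """Classic Levenshtein distance between two symbol sequences."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,           # deletion
                                     dp[j - 1] + 1,       # insertion
                                     prev + (ca != cb))   # substitution
    return dp[-1]

video_phases = ["onset", "apex", "apex", "offset"]
candidates = {
    "music_a": ["onset", "apex", "offset"],
    "music_b": ["offset", "offset", "onset", "onset"],
}
best = min(candidates, key=lambda k: edit_distance(video_phases, candidates[k]))
print("best-matching music clip:", best)
```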


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2016

Audio-visual speech enhancement using deep neural networks

Jen-Cheng Hou; Syu-Siang Wang; Ying-Hui Lai; Jen-Chun Lin; Yu Tsao; Hsiu-Wen Chang; Hsin-Min Wang

This paper proposes a novel framework that integrates audio and visual information for speech enhancement. Most speech enhancement approaches consider audio features only when designing filters or transfer functions to convert noisy speech signals to clean ones. Visual data, which provide useful complementary information to audio data, have been integrated with audio data in many speech-related approaches to attain more effective speech processing performance. This paper presents our investigation into the use of visual features of lip motion as additional information to improve the enhancement capability of a deep neural network (DNN)-based speech enhancement framework. The experimental results show that the performance of the DNN with audio-visual inputs exceeds that of the DNN with audio inputs only in four standardized objective evaluations, confirming the effectiveness of including visual information in an audio-only speech enhancement framework.
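
The audio-visual mapping described above can be pictured as a feed-forward DNN that regresses clean log-spectral frames from concatenated noisy-audio and lip-motion features. The sketch below uses synthetic data and an assumed architecture; the dimensions and training details are placeholders rather than the system's actual configuration.

```python
import torch
import torch.nn as nn

n_freq, n_visual = 257, 30                    # assumed spectral / lip-feature dims

# DNN mapping [noisy log-spectrum ; lip-motion features] -> clean log-spectrum.
dnn = nn.Sequential(
    nn.Linear(n_freq + n_visual, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, n_freq),
)
optimizer = torch.optim.Adam(dnn.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Toy parallel training frames: (noisy audio, lip features) -> clean audio.
clean = torch.randn(256, n_freq)
noisy = clean + 0.3 * torch.randn(256, n_freq)
lips = torch.randn(256, n_visual)

for _ in range(100):
    optimizer.zero_grad()
    enhanced = dnn(torch.cat([noisy, lips], dim=1))
    loss = loss_fn(enhanced, clean)
    loss.backward()
    optimizer.step()
print(f"final training MSE: {loss.item():.4f}")
```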


International Conference on Orange Technologies | 2013

A probabilistic fusion strategy for audiovisual emotion recognition of sparse and noisy data

Jen-Chun Lin; Chung-Hsien Wu; Wen Li Wei

Due to diverse expression styles in real-world scenarios, recognizing human emotions is difficult without collecting sufficient and varied data for model training. Besides, emotion recognition from noisy data is another challenging problem to be solved. This work proposes a fusion strategy to alleviate the problems of noisy and sparse data in bimodal emotion recognition. Toward robust bimodal emotion recognition, a Semi-Coupled Hidden Markov Model (SC-HMM) based on a state-based bimodal alignment strategy is proposed to align the temporal relation between the states of the two component HMMs for the audio and visual streams. Based on this strategy, the SC-HMM can diminish the over-fitting problem and achieve better statistical dependency between the states of the audio and visual HMMs under sparse-data conditions, and it also better accommodates noisy conditions. Experimental results show that the proposed approach achieves promising performance.

Collaboration


Dive into Jen-Chun Lin's collaborations.

Top Co-Authors

Chung-Hsien Wu
National Cheng Kung University

Wen-Li Wei
National Cheng Kung University

Hsiao-Rong Tyan
Chung Yuan Christian University

Han Li
National Cheng Kung University