Publications


Featured research published by Yingchun Yang.


Ubiquitous Computing | 2010

GeeAir: a universal multimodal remote control device for home appliances

Gang Pan; Jiahui Wu; Daqing Zhang; Zhaohui Wu; Yingchun Yang; Shijian Li

In this paper, we present a handheld device called GeeAir for remotely controlling home appliances via a mixed modality of speech, gesture, joystick, button, and light. This solution is superior to existing universal remote controllers in that it can be used naturally by users with physical and vision impairments. By combining diverse interaction techniques in a single device, GeeAir enables different user groups to control home appliances effectively, satisfying even the unmet needs of physically and vision-impaired users while maintaining high usability and reliability. The experiments demonstrate that the GeeAir prototype achieves strong performance by standardizing a small set of verbal and gesture commands and introducing feedback mechanisms.
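
To make the idea of a small, standardized cross-modal command set concrete, here is a minimal dispatcher sketch in Python. The command names, modality mappings, and feedback messages are hypothetical illustrations, not GeeAir's actual vocabulary or API.

```python
# Hypothetical sketch of a multimodal command dispatcher in the spirit of GeeAir.
from dataclasses import dataclass
from typing import Optional

# A small standardized command vocabulary shared by all modalities.
COMMANDS = {"power_on", "power_off", "volume_up", "volume_down"}

@dataclass
class Event:
    modality: str   # "speech", "gesture", "joystick", or "button"
    token: str      # raw recognizer output, e.g. "swipe_up" or "turn it on"

# Per-modality mappings onto the shared command vocabulary (illustrative only).
SPEECH_MAP = {"turn it on": "power_on", "turn it off": "power_off", "louder": "volume_up"}
GESTURE_MAP = {"swipe_up": "volume_up", "swipe_down": "volume_down", "shake": "power_off"}

def dispatch(event: Event) -> Optional[str]:
    """Translate a modality-specific event into a standardized command."""
    table = {"speech": SPEECH_MAP, "gesture": GESTURE_MAP}.get(event.modality, {})
    command = table.get(event.token)
    if command in COMMANDS:
        print(f"[feedback] executing {command}")   # feedback to the user
        return command
    print("[feedback] command not recognized")     # explicit failure feedback
    return None

if __name__ == "__main__":
    dispatch(Event("gesture", "swipe_up"))
    dispatch(Event("speech", "turn it on"))
```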


International Conference on Biometrics | 2006

Dynamic Bayesian networks for audio-visual speaker recognition

Dongdong Li; Yingchun Yang; Zhaohui Wu

Audio-visual speaker recognition promises higher performance than any single-modal biometric system. This paper further improves a novel approach to bimodal speaker recognition based on Dynamic Bayesian Networks (DBNs). We investigate five different topologies for the feature-level fusion framework using DBNs and demonstrate that the performance of multimodal systems can be further improved by appropriately modeling the correlation between the speech features and the face features. An experiment conducted on a multimodal database of 54 users shows promising results, with an absolute improvement of about 7.44% in the best case and 3.13% in the worst case compared with a single-modal speaker recognition system.
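
The DBN topologies themselves are beyond a short example, but the feature-level fusion idea the paper builds on can be sketched with concatenated speech and face features scored by per-speaker GMMs. The synthetic features and the GMM classifier below are simplifying stand-ins, not the paper's models.

```python
# Sketch: feature-level audio-visual fusion by concatenation, scored with
# per-speaker GMMs. Synthetic features and GMMs stand in for the paper's
# speech/face features and DBN models.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def fuse(speech_feats, face_feats):
    """Frame-level fusion by concatenation (speech: T x Da, face: T x Dv)."""
    return np.hstack([speech_feats, face_feats])

# Toy enrollment data for two speakers.
train = {
    "spk_a": fuse(rng.normal(0.0, 1.0, (200, 12)), rng.normal(0.0, 1.0, (200, 8))),
    "spk_b": fuse(rng.normal(0.5, 1.0, (200, 12)), rng.normal(0.5, 1.0, (200, 8))),
}
models = {spk: GaussianMixture(n_components=4, random_state=0).fit(x)
          for spk, x in train.items()}

# Identification: pick the speaker whose model gives the highest average log-likelihood.
test = fuse(rng.normal(0.5, 1.0, (50, 12)), rng.normal(0.5, 1.0, (50, 8)))
scores = {spk: gmm.score(test) for spk, gmm in models.items()}
print("identified:", max(scores, key=scores.get))
```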


2006 IEEE Odyssey - The Speaker and Language Recognition Workshop | 2006

MASC: A Speech Corpus in Mandarin for Emotion Analysis and Affective Speaker Recognition

Tian Wu; Yingchun Yang; Zhaohui Wu; Dongdong Li

In this paper, a large emotional speech database, MASC (Mandarin Affective Speech Corpus), is introduced. The database contains recordings of 68 native speakers (23 female and 45 male) covering five emotional states: neutral, anger, elation, panic, and sadness. Each speaker reads 5 phrases and 10 sentences three times for each emotional state, plus 2 paragraphs in the neutral state only. These materials cover all the phonemes of Mandarin Chinese. The corpus is constructed for prosodic and linguistic investigation of emotion expression in Mandarin, and it can also be used for recognizing affectively stressed speakers. Furthermore, prosodic feature analysis and a speaker recognition baseline experiment are performed on this database.
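
For orientation, the per-speaker and total utterance counts implied by the figures quoted above can be worked out directly; this is plain arithmetic on the abstract's numbers, not an official statistic of the corpus.

```python
# Plain arithmetic on the counts quoted above (not an official MASC statistic):
# 68 speakers; 5 emotions; 5 phrases + 10 sentences, each read three times per
# emotion; plus 2 paragraphs in the neutral state only.
SPEAKERS = 68                                   # 23 female + 45 male
EMOTIONS = ["neutral", "anger", "elation", "panic", "sadness"]

per_emotion = (5 + 10) * 3                      # phrases + sentences, three readings each
per_speaker = per_emotion * len(EMOTIONS) + 2   # plus the 2 neutral-only paragraphs
total = per_speaker * SPEAKERS

print(per_speaker, total)                       # 227 per speaker, 15436 overall
```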


Affective Computing and Intelligent Interaction | 2005

Emotion-state conversion for speaker recognition

Dongdong Li; Yingchun Yang; Zhaohui Wu; Tian Wu

The performance of a speaker recognition system is easily disturbed by changes in the speaker's internal state. This ongoing work proposes a speech emotion-state conversion approach to improve the performance of speaker identification on affective speech. The features of neutral speech are modified according to statistical prosodic parameters of emotional utterances, and speaker models are generated from the converted speech. Experiments conducted on an emotion corpus with 14 emotion states show promising results, with performance improved by 7.2%.
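
A minimal sketch of what such a conversion could look like, assuming it reduces to matching the mean and standard deviation of a neutral F0 contour to per-emotion statistics; the function and the target values are illustrative, not the paper's actual mapping.

```python
# Sketch: shift/scale a neutral F0 contour toward assumed per-emotion
# statistics (mean, standard deviation). Values are illustrative only.
import numpy as np

def convert_contour(neutral_f0: np.ndarray, emo_mean: float, emo_std: float) -> np.ndarray:
    """Re-scale the voiced part of a neutral F0 contour to target statistics."""
    out = neutral_f0.astype(float).copy()
    voiced = out > 0                                  # unvoiced frames (0 Hz) stay untouched
    f0 = out[voiced]
    z = (f0 - f0.mean()) / (f0.std() + 1e-8)          # normalize the neutral contour
    out[voiced] = z * emo_std + emo_mean              # impose the emotion statistics
    return out

# Hypothetical "anger" statistics: higher and more variable pitch.
neutral = np.array([0, 180, 185, 190, 0, 175, 182], dtype=float)
print(convert_contour(neutral, emo_mean=230.0, emo_std=35.0))
```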


Affective Computing and Intelligent Interaction | 2005

Improving speaker recognition by training on emotion-added models

Tian Wu; Yingchun Yang; Zhaohui Wu

In speaker recognition applications, changes of emotional state are a main cause of errors. The ongoing work described in this contribution attempts to enhance the performance of automatic speaker recognition (ASR) systems on emotional speech. Two procedures that need only a small quantity of affective training data are applied to the ASR task, which is very practical in real-world situations. The method classifies emotional states by acoustic features and generates emotion-added models based on the resulting emotion grouping. Experiments performed on the Emotional Prosody Speech (EPS) corpus show significant improvements in equal error rates (EERs) and identification rates (IRs) compared with the baseline and comparative experiments.
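
The two procedures are described only at a high level, so the sketch below is a loose interpretation: emotions are first grouped by coarse acoustic statistics, and a speaker model is then refit on neutral data pooled with a small affective sample. All names, numbers, and the grouping criterion are placeholders rather than the paper's method.

```python
# Sketch: (1) group emotions by coarse acoustic statistics, (2) build an
# "emotion-added" speaker model by pooling neutral data with a small affective
# sample. Grouping criterion, features, and sizes are placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Toy per-emotion acoustic descriptors, e.g. [mean F0, mean energy].
emotion_stats = {
    "anger":   [220.0, 0.9],
    "elation": [215.0, 0.8],
    "panic":   [240.0, 1.0],
    "sadness": [150.0, 0.4],
}
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    np.array(list(emotion_stats.values())))
groups = {}
for emotion, g in zip(emotion_stats, labels):
    groups.setdefault(int(g), []).append(emotion)
print("emotion groups:", groups)

# Emotion-added model for one speaker: neutral frames plus a small affective sample.
neutral_frames = rng.normal(0.0, 1.0, (300, 13))
affective_frames = rng.normal(0.3, 1.2, (60, 13))
emotion_added_model = GaussianMixture(n_components=8, random_state=0).fit(
    np.vstack([neutral_frames, affective_frames]))
print("avg log-likelihood on neutral frames:", emotion_added_model.score(neutral_frames))
```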


International Conference on Acoustics, Speech, and Signal Processing | 2006

Rules Based Feature Modification for Affective Speaker Recognition

Zhaohui Wu; Dongdong Li; Yingchun Yang

One of the largest challenges in speaker recognition applications is dealing with speaker-emotion variability. In this paper, we further investigate rule-based feature modification for robust speaker recognition with emotional speech. Specifically, we learn prosodic feature modification rules from a small number of content-matched source-target pairs. Features carrying emotion information are adapted from the prevalent neutral features by applying the modification rules, and the converted features are trained together with the neutral features to build the speaker models. The effects of individual and combined modifications of duration, pitch, and amplitude are also studied on the EPST dataset, recorded by 8 professional actors with 14 kinds of emotion expressiveness. The results demonstrate that duration modifications play the most important role and that pitch modifications are more effective than amplitude modifications. A promising result is achieved, with the identification rate improved by 7.83% compared to traditional speaker recognition.
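
One plausible reading of such rules is a per-attribute ratio between emotional and neutral measurements learned from content-matched pairs, as sketched below; the single-ratio rule form and all numbers are assumptions for illustration only.

```python
# Sketch: learn simple modification "rules" from content-matched neutral/emotion
# pairs as average ratios for duration, pitch, and amplitude, then apply them to
# neutral features. The rule form and the numbers are illustrative assumptions.
import numpy as np

# Matched pairs: measurements of the same utterance spoken neutrally and angrily.
# Columns: [duration_s, mean_f0_hz, mean_amplitude]
neutral_pairs = np.array([[2.0, 180.0, 0.50], [1.5, 175.0, 0.45], [2.4, 182.0, 0.52]])
anger_pairs   = np.array([[1.6, 235.0, 0.72], [1.2, 228.0, 0.66], [2.0, 240.0, 0.75]])

# Rule = average ratio between emotional and neutral values per attribute.
rules = (anger_pairs / neutral_pairs).mean(axis=0)
print("duration/pitch/amplitude ratios:", rules)

def apply_rules(neutral_features: np.ndarray, ratios: np.ndarray) -> np.ndarray:
    """Convert neutral prosodic features into pseudo-emotional ones."""
    return neutral_features * ratios

print(apply_rules(np.array([1.8, 178.0, 0.48]), rules))
```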


International Conference on Pattern Recognition | 2008

Learning polynomial function based neutral-emotion GMM transformation for emotional speaker recognition

Zhenyu Shan; Yingchun Yang

One of the biggest challenges in speaker recognition is dealing with speaker-emotion variability. The basic problem is how to train the emotion GMMs of the speakers from their neutral speech and how to score feature vectors against these emotion GMMs. In this paper, we present a new neutral-emotion GMM transformation algorithm to overcome this limitation. A polynomial transformation function is learned to represent the relationship between the neutral and emotion GMMs and is used at test time to calculate scores against the emotion GMM. Experiments carried out on MASC show that the performance is improved, with an EER reduction of 39.5% from the baseline system.
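
A rough sketch of the transformation idea, under the assumptions that the polynomial is fit on development data and that only the GMM means are transformed (weights and covariances kept from the neutral model); the degree and the 1-D toy features are arbitrary choices, not the paper's setup.

```python
# Sketch: learn a polynomial mapping from neutral to emotional feature means on
# development data, then transform a new speaker's neutral GMM means with it.
# Keeping neutral weights/covariances and using 1-D toy features are assumptions.
import copy

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)

# Development relationship between neutral and emotional means (toy, 1-D).
dev_neutral = rng.normal(0.0, 1.0, 200)
dev_emotion = 1.1 * dev_neutral + 0.3 * dev_neutral ** 2 + 0.2
coeffs = np.polyfit(dev_neutral, dev_emotion, deg=2)        # f: neutral -> emotion

# New speaker: only neutral enrollment speech is available.
neutral_gmm = GaussianMixture(n_components=4, random_state=0).fit(
    rng.normal(0.0, 1.0, (400, 1)))

# Synthesize an "emotion" GMM by transforming only the component means.
emotion_gmm = copy.deepcopy(neutral_gmm)
emotion_gmm.means_ = np.polyval(coeffs, neutral_gmm.means_)

# Score emotional test frames against both models and compare log-likelihoods.
x = rng.normal(0.0, 1.0, (100, 1))
test_emotional = 1.1 * x + 0.3 * x ** 2 + 0.2               # toy "emotional" frames
print("neutral GMM:", neutral_gmm.score(test_emotional),
      "transformed GMM:", emotion_gmm.score(test_emotional))
```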


International Conference on Acoustics, Speech, and Signal Processing | 2013

Emotional speaker recognition based on i-vector through Atom Aligned Sparse Representation

Li Chen; Yingchun Yang

The i-vector algorithm was previously adopted to improve the performance of ASR (automatic speaker recognition) systems degraded by emotion variability. The usual variability compensation technique is LDA (Linear Discriminant Analysis), which assumes the variability is speaker-independent. However, this assumption does not hold for emotion variability, because we find that the pattern of emotion variability is speaker-dependent. Therefore, a novel emotion synthesis algorithm, AASR (Atom Aligned Sparse Representation), is proposed to characterize this speaker-dependent pattern and compensate the emotion variability within i-vectors. Experiments conducted on MASC show that, compared with the GMM-UBM algorithm and the conventional variability compensation algorithm LDA, our algorithm enhances both speaker identification and verification performance.
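
AASR itself is not reconstructed here, but the conventional i-vector + LDA compensation baseline that the paper argues against can be sketched as follows; the random vectors and cosine scoring are placeholders for a real i-vector front end.

```python
# Sketch of the i-vector + LDA compensation baseline (the approach the paper
# argues is insufficient because emotion variability is speaker-dependent).
# Random vectors stand in for real i-vectors; AASR is not reconstructed here.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
n_speakers, per_speaker, dim = 10, 20, 50

ivectors, speaker_ids = [], []
for s in range(n_speakers):
    center = rng.normal(0.0, 1.0, dim)                       # speaker-specific center
    # Within-speaker scatter stands in for emotion (and other) variability.
    ivectors.append(center + rng.normal(0.0, 0.5, (per_speaker, dim)))
    speaker_ids.extend([s] * per_speaker)
X, y = np.vstack(ivectors), np.array(speaker_ids)

# LDA keeps directions that separate speakers, treating all within-speaker
# scatter as nuisance, i.e. as if it were speaker-independent.
lda = LinearDiscriminantAnalysis(n_components=n_speakers - 1).fit(X, y)
compensated = lda.transform(X)

# Cosine scoring of one (re-used training) i-vector against speaker means.
test = lda.transform(X[:1])
means = np.vstack([compensated[y == s].mean(axis=0) for s in range(n_speakers)])
cos = (means @ test.T).ravel() / (np.linalg.norm(means, axis=1) * np.linalg.norm(test))
print("identified speaker:", int(np.argmax(cos)))
```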


Chinese Conference on Biometric Recognition | 2011

Applying emotional factor analysis and I-vector to emotional speaker recognition

Li Chen; Yingchun Yang

Emotion variability is an important factor that degrades the performance of speaker recognition systems. This paper borrows ideas from the Joint Factor Analysis (JFA) algorithm, based on the similarity between emotion effects and channel effects, and develops Emotional Factor Analysis (EFA) to address the emotion variability problem. The i-vector approach is also applied. Experiments carried out on MASC (Mandarin Affective Speech Corpus) show that the EFA and i-vector methods bring an IR increase of 7%-10% and an EER reduction of 3%-4% compared with the GMM-UBM system.
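
As a crude stand-in for the EFA idea of treating emotion like a channel factor, the sketch below estimates a low-rank "emotion" subspace from within-speaker scatter and projects it out; this is a PCA-style simplification, not the paper's EFA estimation or i-vector extraction.

```python
# Crude stand-in for the EFA idea: model observations as a speaker component
# plus a low-rank "emotion" offset, estimate that subspace from within-speaker
# scatter, and project it out. Toy supervectors only; not the paper's EFA.
import numpy as np

rng = np.random.default_rng(4)
dim, n_speakers, n_emotions = 40, 12, 5

speaker_sv = rng.normal(0.0, 1.0, (n_speakers, dim))        # per-speaker supervectors
true_U = rng.normal(0.0, 1.0, (dim, 2))                     # hidden emotion directions
obs = np.array([spk + true_U @ rng.normal(0.0, 0.8, 2)
                for spk in speaker_sv for _ in range(n_emotions)])
labels = np.repeat(np.arange(n_speakers), n_emotions)

# Estimate the emotion subspace from deviations around each speaker's mean.
speaker_means = np.vstack([obs[labels == s].mean(axis=0) for s in range(n_speakers)])
within = obs - speaker_means[labels]
_, _, vt = np.linalg.svd(within, full_matrices=False)
U_hat = vt[:2].T                                            # top-2 within-speaker directions

# Compensation: remove the component lying in the estimated emotion subspace
# (this also discards any speaker information along those directions).
compensated = obs - obs @ U_hat @ U_hat.T
comp_means = np.vstack([compensated[labels == s].mean(axis=0) for s in range(n_speakers)])
print("within-speaker scatter before/after:",
      within.var(), (compensated - comp_means[labels]).var())
```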


International Conference on Pattern Recognition | 2006

An UBM-Based Reference Space for Speaker Recognition

Zhenchun Lei; Yingchun Yang; Zhaohui Wu

The universal background model (UBM) represents the speaker-independent distribution of features, so it can be used to construct a reference space for speaker recognition. In the anchor model approach, each speaker utterance is located at one point in the anchor space. We instead construct the reference space using the Gaussian distributions of the universal background model rather than the virtual speakers of the anchor models, so that all utterances of one speaker mainly lie in a small portion of the whole space. We then use a support vector machine to separate this speaker-dependent portion from the rest of the space, whereas anchor models generally rely on a Euclidean distance measure. Experiments on the YOHO database show that our method achieves better performance than the widely used Euclidean-distance decision rule of the anchor models.
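
One way to read this construction is sketched below: each utterance is mapped to a fixed-length vector of mean Gaussian occupancies under the UBM, and an SVM separates the target speaker's region of that space from impostors. The occupancy-based mapping and the toy data are assumptions, not necessarily the paper's exact features.

```python
# Sketch: map each utterance to a fixed-length point in a UBM "reference space"
# (mean Gaussian occupancy per UBM component) and let an SVM separate the
# target speaker's region from impostors. Toy data; the paper's mapping may differ.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(5)

# Train a small UBM on pooled background features.
background = rng.normal(0.0, 1.0, (2000, 13))
ubm = GaussianMixture(n_components=16, random_state=0).fit(background)

def utterance_to_point(frames):
    """Map an utterance (T x D frames) to mean occupancy of each UBM Gaussian."""
    return ubm.predict_proba(frames).mean(axis=0)

# Toy target and impostor utterances.
target_utts = [rng.normal(0.3, 1.0, (150, 13)) for _ in range(10)]
impostor_utts = [rng.normal(-0.3, 1.0, (150, 13)) for _ in range(10)]
X = np.vstack([utterance_to_point(u) for u in target_utts + impostor_utts])
y = np.array([1] * 10 + [0] * 10)

# The SVM carves out the speaker-dependent portion of the reference space.
svm = SVC(kernel="rbf").fit(X, y)
probe = utterance_to_point(rng.normal(0.3, 1.0, (150, 13)))
print("accepted as target:", bool(svm.predict(probe.reshape(1, -1))[0]))
```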
