Publication


Featured research published by Wai-Yip Chan.


Speech Communication | 2011

Automatic speech emotion recognition using modulation spectral features

Siqing Wu; Tiago H. Falk; Wai-Yip Chan

In this study, modulation spectral features (MSFs) are proposed for the automatic recognition of human affective information from speech. The features are extracted from an auditory-inspired long-term spectro-temporal representation. Obtained using an auditory filterbank and a modulation filterbank for speech analysis, the representation captures both acoustic frequency and temporal modulation frequency components, thereby conveying information that is important for human speech perception but missing from conventional short-term spectral features. In an experiment assessing classification of discrete emotion categories, the MSFs show promising performance in comparison with features that are based on mel-frequency cepstral coefficients and perceptual linear prediction coefficients, two commonly used short-term spectral representations. The MSFs further yield a substantial improvement in recognition performance when used to augment prosodic features, which have been extensively used for emotion recognition. Using both types of features, an overall recognition rate of 91.6% is obtained for classifying seven emotion categories. Moreover, in an experiment assessing recognition of continuous emotions, the proposed features in combination with prosodic features attain estimation performance comparable to human evaluation.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

A Non-Intrusive Quality and Intelligibility Measure of Reverberant and Dereverberated Speech

Tiago H. Falk; Chenxi Zheng; Wai-Yip Chan

A modulation spectral representation is investigated for non-intrusive quality and intelligibility measurement of reverberant and dereverberated speech. The representation is obtained by means of an auditory-inspired filterbank analysis of critical-band temporal envelopes of the speech signal. Modulation spectral insights are used to develop an adaptive measure termed the speech-to-reverberation modulation energy ratio. Experimental results show that the proposed measure outperforms three standard algorithms for tasks involving estimation of multiple dimensions of perceived coloration, as well as quality measurement and intelligibility estimation of reverberant and dereverberated speech.


IEEE Signal Processing Magazine | 2011

Speech Quality Estimation: Models and Trends

Sebastian Möller; Wai-Yip Chan; Nicolas Côté; Tiago H. Falk; Alexander Raake; Marcel Wältermann

This article presents a tutorial overview of models for estimating the quality experienced by users of speech transmission and communication services. Such models can be classified as either parametric or signal based. Signal-based models use input speech signals measured at the electrical or acoustic interfaces of the transmission channel. Parametric models, on the other hand, depend on signal and system parameters estimated during network planning or at run time. This tutorial describes the underlying principles as well as advantages and limitations of existing models. It also presents new developments, thus serving as a guide to an appropriate usage of the multitude of current and emerging speech quality models.


IEEE Transactions on Communications | 1992

Enhanced multistage vector quantization by joint codebook design

Wai-Yip Chan; Smita Gupta; Allen Gersho

Multistage vector quantization (MSVQ) can achieve very low encoding and storage complexity in comparison to unstructured vector quantization. However, the conventional stage-by-stage design of the codebooks in MSVQ is suboptimal with respect to the overall performance measure. The authors introduce an algorithm for the joint design of the stage codebooks to optimize the overall performance. The performance improvement, although modest, is achieved with no effect on encoding or storage complexity and only a slight increase in design effort.
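To make the multistage structure concrete, the following is a minimal sketch of sequential MSVQ encoding and decoding, in which each stage quantizes the residual left by the previous stage. It illustrates only the encoder and decoder structure, not the joint codebook design algorithm of the paper. The two-stage codebooks in STAGES and all function names are hypothetical toy values chosen for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Codebook = Sequence[Vector]


def nearest(codebook: Codebook, x: Vector) -> int:
    """Return the index of the codevector closest to x (squared error)."""
    best_index, best_dist = 0, float("inf")
    for i, c in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(x, c))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def msvq_encode(x: Vector, stages: Sequence[Codebook]) -> Tuple[List[int], List[float]]:
    """Sequential search: each stage quantizes the previous stage's residual."""
    residual = list(x)
    indices = []
    for codebook in stages:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices, residual


def msvq_decode(indices: Sequence[int], stages: Sequence[Codebook]) -> List[float]:
    """Reconstruction is the sum of the selected codevectors, one per stage."""
    out = [0.0] * len(stages[0][0])
    for i, codebook in zip(indices, stages):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


# Hypothetical toy codebooks (assumption, not taken from the paper):
# a coarse first stage and a finer second stage.
STAGES = [
    [[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0], [4.0, -4.0]],
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
]

if __name__ == "__main__":
    idx, res = msvq_encode([5.0, 4.0], STAGES)
    print(idx, res, msvq_decode(idx, STAGES))
```

With these toy codebooks, two stages of four codevectors each store only 8 vectors and search 4 + 4 candidates, yet they address 4 × 4 = 16 reconstruction points. An unstructured quantizer with the same number of reconstruction points would store and search all 16.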


IEEE Transactions on Instrumentation and Measurement | 2010

Temporal Dynamics for Blind Measurement of Room Acoustical Parameters

Tiago H. Falk; Wai-Yip Chan

In this paper, short- and long-term temporal dynamic information is investigated for the blind measurement of room acoustical parameters. In particular, estimators of room reverberation time (T60) and direct-to-reverberant energy ratio (DRR) are proposed. Short-term temporal dynamic information is obtained from differential (delta) cepstral coefficients. The statistics computed from the zeroth-order delta cepstral sequence serve as input features to a support vector T60 estimator. Long-term temporal dynamic cues, on the other hand, are obtained from an auditory spectrotemporal representation of speech commonly referred to as the modulation spectrum. A measure termed the reverberation-to-speech modulation energy ratio, which is computed per modulation frequency band, is proposed and serves as input to T60 and DRR estimators. Experiments show that the proposed estimators outperform a baseline system in scenarios involving reverberant speech with and without the presence of acoustic background noise. Experiments also suggest that estimators of subjective perception of spectral coloration, reverberant tail effect, and overall speech quality can be obtained with an adaptive speech-to-reverberation modulation energy ratio measure.
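As an illustration of the short-term cue mentioned above, the sketch below computes delta coefficients from a per-frame sequence (for example, the zeroth cepstral coefficient) using the widely used regression formula, then summarizes them with simple statistics. The abstract does not say which statistics or window length the authors used, so the half_window value, the edge handling, and the choice of mean and variance are assumptions made for illustration.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence


def delta(sequence: Sequence[float], half_window: int = 2) -> List[float]:
    """Regression-based delta of a per-frame coefficient sequence.

    d[t] = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2),
    with frames beyond either end replaced by the nearest edge frame.
    """
    last = len(sequence) - 1
    norm = 2 * sum(n * n for n in range(1, half_window + 1))
    deltas = []
    for t in range(len(sequence)):
        acc = 0.0
        for n in range(1, half_window + 1):
            ahead = sequence[min(t + n, last)]   # clamp at the final frame
            behind = sequence[max(t - n, 0)]     # clamp at the first frame
            acc += n * (ahead - behind)
        deltas.append(acc / norm)
    return deltas


def delta_statistics(sequence: Sequence[float], half_window: int = 2) -> Dict[str, float]:
    """Summarize the delta sequence; the statistics chosen are an assumption."""
    d = delta(sequence, half_window)
    return {"mean": mean(d), "variance": pvariance(d)}


if __name__ == "__main__":
    c0 = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical zeroth-cepstral trajectory
    print(delta(c0))
    print(delta_statistics(c0))
```

In the setting the abstract describes, statistics of this kind would form the feature vector passed to a regressor. The sketch stops at feature extraction and does not attempt the support vector T60 estimator itself.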


International Conference on Machine Learning and Cybernetics | 2006

Robust GMM Based Gender Classification using Pitch and RASTA-PLP Parameters of Speech

Yumin Zeng; Zhenyang Wu; Tiago H. Falk; Wai-Yip Chan

A novel gender classification system is proposed based on Gaussian mixture models, which use the combined parameters of pitch and 10th-order relative spectral perceptual linear predictive (RASTA-PLP) coefficients to model the characteristics of male and female speech. The performance of the gender classification system has been evaluated under clean-speech, noisy-speech, and multi-language conditions. The simulations show that the proposed gender classifier performs very well: it is robust to noise and independent of language, with classification accuracy above 98% for all clean speech and remaining at 95% for most noisy speech, even when the SNR of the speech is degraded to 0 dB.


International Conference on Digital Signal Processing | 2009

Automatic recognition of speech emotion using long-term spectro-temporal features

Siqing Wu; Tiago H. Falk; Wai-Yip Chan

This paper proposes a novel feature type for the recognition of emotion from speech. The features are derived from a long-term spectro-temporal representation of speech. They are compared to short-term spectral features as well as popular prosodic features. Experimental results with the Berlin emotional speech database show that the proposed features outperform both types of compared features. An average recognition accuracy of 88.6% is achieved by combining the proposed and prosodic features to classify 7 discrete emotions. Moreover, the proposed features are evaluated on the VAM corpus to recognize continuous emotion primitives, where they attain estimation performance comparable to human evaluations.


International Conference on Image Processing | 1994

Approaches to layered coding for dual-rate wireless video transmission

Masoud R. K. Khansari; Awais Zakauddin; Wai-Yip Chan; Eric Dubois; Paul Mermelstein

Visual communications over wireless networks require the efficient and robust coding of video signals for transmission over wireless links having time-varying channel capacity. The authors compare several schemes for encoding video data into two priority streams, thereby enabling the transmission of video data over wireless links to be switched between two bit rates. An H.261 (p×64) algorithm is modified to implement each candidate scheme. The algorithms are evaluated for a microcellular wireless environment and a clear-channel bit rate of 65 kb/s. The results show that by combining layering with automatic-repeat request (wireless-)link control, almost-wireline visual quality can be achieved.


International Conference of the IEEE Engineering in Medicine and Biology Society | 2008

Modulation filtering for heart and lung sound separation from breath sound recordings

Tiago H. Falk; Wai-Yip Chan

Separation of heart and lung sounds from breath sound recordings is a challenging task due to the temporal and spectral overlap of the two signals. In this paper, the use of a spectro-temporal representation to improve signal separation is investigated. The representation is obtained by means of a frequency decomposition (termed modulation frequency) of temporal trajectories of short-term spectral components. Experiments described herein suggest that improved separability of heart (HS) and lung sounds (LS) is attained in the modulation frequency domain. Bandpass and bandstop modulation filters are designed to separate HS and LS signals from breath sound recordings, respectively. Visual and auditory inspection, quantitative analysis, as well as algorithm execution time are used to assess algorithm performance. Log-spectral distances below 1 dB corroborate our listening test which found no audible artifacts in separated heart and lung sound signals.
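The idea of a modulation frequency, that is, a frequency decomposition of the temporal trajectory of a short-term spectral component, can be sketched as follows. A plain DFT is applied here to a single synthetic envelope trajectory; this is a simplification and not the filterbank analysis or the modulation filters used in the paper. The frame rate, the synthetic envelope, and the function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def modulation_spectrum(envelope: Sequence[float], frame_rate: float) -> List[Tuple[float, float]]:
    """Return (modulation frequency in Hz, energy) pairs for one trajectory.

    The trajectory is one short-term spectral component tracked over frames;
    its mean is removed so the DC term does not dominate the spectrum.
    """
    n = len(envelope)
    avg = sum(envelope) / n
    centered = [e - avg for e in envelope]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                  for t, v in enumerate(centered))
        spectrum.append((k * frame_rate / n, abs(acc) ** 2 / n))
    return spectrum


def dominant_modulation_frequency(envelope: Sequence[float], frame_rate: float) -> float:
    """Modulation frequency (Hz) carrying the most energy."""
    return max(modulation_spectrum(envelope, frame_rate), key=lambda p: p[1])[0]


if __name__ == "__main__":
    frame_rate = 100.0  # hypothetical: 100 spectral frames per second
    # Synthetic envelope fluctuating 4 times per second (assumption).
    env = [1.0 + 0.5 * math.cos(2 * math.pi * 4.0 * t / frame_rate)
           for t in range(100)]
    print(dominant_modulation_frequency(env, frame_rate))
```

Signals whose envelopes fluctuate at different rates occupy different modulation frequency bins even when they overlap in time and in acoustic frequency. That is the property that bandpass and bandstop modulation filters, as described in the abstract, would exploit.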


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Hybrid Signal-and-Link-Parametric Speech Quality Measurement for VoIP Communications

Tiago H. Falk; Wai-Yip Chan

A hybrid signal-and-link-parametric approach to speech quality measurement for voice-over-Internet protocol (VoIP) communications is described. Connection parameters are used to determine a base quality representative of the transmission link. Degradation factors, computed from perceptual features extracted from the decoded speech signal, are used to quantify distortions not captured by the connection parameters. The algorithm is tested on speech degraded by acoustic noise, temporal clippings, and noise suppression artifacts, thus simulating degradations present in wireless-VoIP tandem connections. Hybrid measurement is shown to overcome the limitations of pure link parametric and pure signal-based measurement methods, resulting in better measurement accuracy for modern VoIP communications. In addition, the proposed algorithm incurs modest computational overhead relative to pure link parametric measurement and attains up to 88% reduction in processing time relative to the ITU-T standard P.563 signal-based algorithm.

Collaboration


Top Co-Authors

Tiago H. Falk, Institut national de la recherche scientifique

Allen Gersho, University of California

Ye Li, Queen's University

Jiandong Shen, Illinois Institute of Technology