
Publication

Featured research published by Kaisheng Yao.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2004

Speech enhancement by perceptual filter with sequential noise parameter estimation

Te-Won Lee; Kaisheng Yao

We report work on speech enhancement that combines sequential noise estimation and perceptual filtering. The sequential estimation employs an extension of the sequential EM-type algorithm: statistics of clean speech are modeled by hidden Markov models (HMMs), and noise is assumed to be Gaussian distributed with a time-varying mean vector (the noise parameter) to be estimated. The estimation process uses a non-linear function that relates speech statistics, noise, and the noisy observation. With the estimated noise parameter, the subtraction-type speech enhancement algorithm can be extended to non-stationary environments. In particular, a perceptual filter with frequency masking is constructed that trades off noise reduction against speech distortion, accounting for the sensitivity of speech recognition systems to speech distortion. Experiments on speech enhancement and speech recognition in non-stationary noise showed that this approach improves performance compared with alternative speech enhancement algorithms.
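The subtraction step above, driven by a sequentially estimated noise mean, can be sketched roughly as follows. This is a minimal illustration under assumed names (`spectral_subtract`, `alpha`, `floor`), not the authors' implementation; the over-subtraction factor `alpha` plays the role of the noise-reduction versus speech-distortion tradeoff described in the abstract.

```python
import numpy as np

def spectral_subtract(noisy_power, noise_mean, alpha=1.0, floor=0.01):
    """Subtraction-type enhancement with a time-varying noise estimate.

    noisy_power : (T, F) array of noisy speech power spectra
    noise_mean  : (T, F) array of estimated noise power (the sequentially
                  estimated, time-varying noise parameter)
    alpha       : over-subtraction factor trading noise reduction
                  against speech distortion
    floor       : spectral floor preventing negative power estimates
    """
    clean = noisy_power - alpha * noise_mean
    # Flooring keeps the estimate positive; a larger floor means less
    # noise reduction but also less speech distortion.
    return np.maximum(clean, floor * noisy_power)

# Toy example: constant speech power 1.0 buried under noise power 0.2.
noisy = np.full((5, 4), 1.2)
noise = np.full((5, 4), 0.2)
enhanced = spectral_subtract(noisy, noise)
```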


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2000

Residual noise compensation for robust speech recognition in nonstationary noise

Kaisheng Yao; Bertram E. Shi; Pascale Fung; Zhigang Cao

We present a model-based noise compensation algorithm for robust speech recognition in nonstationary noisy environments. The effect of noise is split into a stationary part, compensated by parallel model combination, and a time-varying residual. The evolution of residual noise parameters is represented by a set of state-space models, which are updated by Kalman prediction and a sequential maximum likelihood algorithm. Predictions of the residual noise parameters from different mixtures are fused, and the fused noise parameters are used to modify the linearized likelihood score of each mixture. Noise compensation proceeds in parallel with recognition. Experimental results demonstrate that the proposed algorithm improves recognition performance in highly nonstationary environments compared with parallel model combination alone.
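The Kalman prediction step for a residual noise parameter can be illustrated with a scalar state-space model. Everything here (the function name `kalman_step`, the values of the process and observation variances `q` and `r`) is an assumed toy setup, not the paper's multi-mixture formulation:

```python
import numpy as np

def kalman_step(x_prev, P_prev, y, a=1.0, q=0.0001, r=0.01):
    """One Kalman predict/update step for a scalar residual-noise
    parameter following the state-space model
        x_t = a * x_{t-1} + w,  w ~ N(0, q)
        y_t = x_t + v,          v ~ N(0, r)
    Returns the updated state mean and variance.
    """
    # Predict the state forward one frame.
    x_pred = a * x_prev
    P_pred = a * P_prev * a + q
    # Correct with the Kalman gain.
    K = P_pred / (P_pred + r)
    x_new = x_pred + K * (y - x_pred)
    P_new = (1.0 - K) * P_pred
    return x_new, P_new

# Track a constant residual noise level of 0.5 from noisy observations.
rng = np.random.default_rng(0)
truth = 0.5
x, P = 0.0, 1.0
for _ in range(200):
    y = truth + rng.normal(0.0, 0.1)
    x, P = kalman_step(x, P, y)
```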


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2002

Noise adaptive speech recognition in time-varying noise based on sequential Kullback proximal algorithm

Kaisheng Yao; Kuldip Kumar Paliwal; Satoshi Nakamura

We present a noise adaptive speech recognition approach in which time-varying noise parameter estimation and the Viterbi process are combined. The Viterbi process provides the approximated joint likelihood of active partial paths and the observation sequence, given the noise parameter sequence estimated up to the previous frame. After normalization, the joint likelihood approximates the posterior probabilities of state sequences for an EM-type recursive process, based on the sequential Kullback proximal algorithm, that estimates the current noise parameter. The combined process can easily be applied to continuous speech recognition in the presence of non-stationary noise. Experiments were conducted in simulated and real non-stationary noise. Results showed that the noise adaptive system provides significant improvements in word accuracy compared to the baseline system (without noise compensation) and the normal noise compensation system (which assumes the noise to be stationary).


Ambient Intelligence | 2010

Robust speech recognition under noisy ambient conditions

Kuldip Kumar Paliwal; Kaisheng Yao

Automatic speech recognition is critical in natural human-centric interfaces for ambient intelligence. The performance of an automatic speech recognition system, however, degrades drastically when there is a mismatch between training and testing conditions. The aim of robust speech recognition is to overcome the mismatch problem so the result is a moderate and graceful degradation in recognition performance. In this chapter, we provide a brief overview of an automatic speech recognition system, describe sources of speech variability that cause mismatch between training and testing, and discuss some of the current techniques to achieve robust speech recognition.


Speech Communication | 2005

Generative factor analyzed HMM for automatic speech recognition

Kaisheng Yao; Kuldip Kumar Paliwal; Te-Won Lee

We present a generative factor analyzed hidden Markov model (GFA-HMM) for automatic speech recognition. In a standard HMM, observation vectors are represented by mixtures of Gaussians (MoG) that depend on a discrete-valued hidden state sequence. The GFA-HMM introduces a hierarchy of continuous-valued latent representations of observation vectors, where latent vectors in one level are acoustic-unit dependent and latent vectors in a higher level are acoustic-unit independent. An expectation-maximization (EM) algorithm is derived for maximum likelihood estimation of the model. A set of experiments verifies the potential of the GFA-HMM as an alternative acoustic modeling technique. In one experiment, by varying the latent dimension and the number of mixture components in the latent spaces, the GFA-HMM attained a more compact representation than the standard HMM. In further experiments with various noise types and speaking styles, the GFA-HMM achieved statistically significant improvements over the standard HMM.
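The factor-analysis building block behind such models can be sketched as follows; this shows a plain, single-level factor analyzer's marginal likelihood and its parameter savings over a full-covariance Gaussian, with all names and dimensions chosen for illustration rather than taken from the paper:

```python
import numpy as np

def fa_marginal(x, mu, W, psi):
    """Log-likelihood of one observation under a factor analyzer:
        x = W z + mu + eps,  z ~ N(0, I),  eps ~ N(0, diag(psi)),
    so marginally x ~ N(mu, W W^T + diag(psi)).
    """
    d = len(mu)
    cov = W @ W.T + np.diag(psi)
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)

# A 39-dim feature vector (typical MFCCs plus deltas) with a 5-dim
# latent space needs d*k + 2*d = 273 parameters per component, versus
# d + d*(d+1)/2 = 819 for a full-covariance Gaussian: this is the
# "more compact representation" tradeoff.
d, k = 39, 5
rng = np.random.default_rng(2)
W = 0.1 * rng.normal(size=(d, k))
mu = np.zeros(d)
psi = np.ones(d)
ll = fa_marginal(rng.normal(size=d), mu, W, psi)
```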


EURASIP Journal on Advances in Signal Processing | 2004

Time-varying noise estimation for speech enhancement and recognition using sequential Monte Carlo method

Kaisheng Yao; Te-Won Lee

We present a method for sequentially estimating time-varying noise parameters, where the noise parameters are sequences of time-varying mean vectors representing the noise power in the log-spectral domain. The proposed sequential Monte Carlo method generates a set of particles in compliance with the prior distribution given by clean speech models. The noise parameters in this model evolve according to random walk functions, and the model uses extended Kalman filters to update the weight of each particle as a function of observed noisy speech signals, speech model parameters, and the evolved noise parameters in each particle. Finally, the updated noise parameter is obtained by minimum mean square error (MMSE) estimation over these particles. For efficient computation, residual resampling and Metropolis-Hastings smoothing are used. The proposed sequential estimation method is applied to noisy speech recognition and speech enhancement under strongly time-varying noise conditions; in both scenarios it outperforms alternative methods.
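The overall particle loop (random-walk evolution, weight update, MMSE estimate, resampling) can be sketched in much simplified form. Here a plain Gaussian observation likelihood stands in for the paper's extended Kalman filter weighting, and multinomial resampling stands in for residual resampling; all names and constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_step(particles, weights, y, walk_std=0.05, obs_std=0.1):
    """One simplified sequential Monte Carlo update for a scalar
    time-varying noise parameter."""
    # Evolve each particle by a random walk.
    particles = particles + rng.normal(0.0, walk_std, size=particles.shape)
    # Reweight by the observation likelihood (a stand-in for the
    # extended-Kalman-filter prediction likelihood in the paper).
    weights = weights * np.exp(-0.5 * ((y - particles) / obs_std) ** 2)
    weights /= weights.sum()
    # The MMSE estimate is the weighted mean over particles.
    estimate = float(np.dot(weights, particles))
    # Resample to concentrate particles in high-probability regions.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights, estimate

# Track a noise level that jumps mid-sequence, as in time-varying noise.
n = 500
particles = rng.normal(0.0, 1.0, n)
weights = np.full(n, 1.0 / n)
for t in range(150):
    truth = 0.2 if t < 75 else 0.8
    y = truth + rng.normal(0.0, 0.1)
    particles, weights, est = particle_step(particles, weights, y)
```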


IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2001

Time-varying noise compensation by sequential Monte Carlo method

Kaisheng Yao; Satoshi Nakamura

We present a sequential Monte Carlo method applied to additive noise compensation for robust speech recognition in time-varying noise. At each frame, the method generates a set of samples, approximating the posterior distribution of speech and noise parameters for given observation sequences to the current frame. An explicit model representing noise effects on speech features is used, so that an extended Kalman filter is constructed for each sample, generating an updated continuous state as the estimation of the noise parameter, and prediction likelihood as the weight of each sample for minimum mean square error inference of the time-varying noise parameter over these samples. A selection step and a smoothing step are used to improve efficiency. Through experiments, we observed significant performance improvement over that achieved by noise compensation with a stationary noise assumption. It also performed better than the sequential EM algorithm in machine-gun noise.


Neural Information Processing Systems (NIPS) | 2001

Sequential Noise Compensation by Sequential Monte Carlo Method

Kaisheng Yao; Satoshi Nakamura


Conference of the International Speech Communication Association (Interspeech) | 2001

Sequential Noise Compensation by A Sequential Kullback Proximal Algorithm

Kaisheng Yao; Kuldip Kumar Paliwal; Satoshi Nakamura


Conference of the International Speech Communication Association (Interspeech) | 2003

A speech processing front-end with eigenspace normalization for robust speech recognition in noisy automobile environments

Kaisheng Yao; Erik Visser; Oh-Wook Kwon; Te-Won Lee

Collaboration

Top co-authors:

Satoshi Nakamura, Nara Institute of Science and Technology
Bertram E. Shi, Hong Kong University of Science and Technology
Pascale Fung, Hong Kong University of Science and Technology
Jingdong Chen, Northwestern Polytechnical University
Erik Visser, University of California
Hoon-Young Cho, University of California
Oh-Wook Kwon, University of California