Publication


Featured research published by Yunxin Zhao.


Journal of the Acoustical Society of America | 1995

Word hypothesizer for continuous speech decoding using stressed-vowel centered bidirectional tree searches

Yunxin Zhao

A word hypothesis module for speech decoding consists of four submodules: vowel center detection, bidirectional tree searches around each vowel center, forward-backward pruning, and additional short-word hypotheses. By detecting the strong-energy vowel centers, a vowel-centered lexicon tree can be placed at each vowel center and searches can be performed in both the left and right directions, where only simple phone models are used for fast acoustic match. A stage-wise forward-backward technique computes the word-beginning and word-ending likelihood scores over the generated half-word lattice for further pruning of the lattice. To avoid missing short words with weak-energy vowel centers, a lexicon tree is compiled for these words and tree searches are performed between each pair of adjacent vowel centers. Integrating the word hypothesizer with a top-down Viterbi beam search in continuous speech decoding provides two-pass decoding, which significantly reduces computation time.


IEEE Transactions on Speech and Audio Processing | 1994

An acoustic-phonetic-based speaker adaptation technique for improving speaker-independent continuous speech recognition

Yunxin Zhao

A new speaker adaptation technique is proposed for improving speaker-independent continuous speech recognition based on a decomposition of spectral variation sources. In this technique, the spectral variations are separated into two categories, one acoustic and the other phone-specific, where each variation source is modeled by a linear transformation system. The technique consists of two sequential steps: first, acoustic normalization is performed, and second, phone model parameters are adapted. Speaker adaptation experiments on the TIMIT database using short calibration speech (5 s per speaker) have shown significant performance improvement over the baseline speaker-independent continuous speech recognition, where the recognition system uses Gaussian mixture density based hidden Markov models of phone units. For a vocabulary size of 853 and test set perplexity of 104, the recognition word accuracy has been improved from 86.9% for the baseline system to 90.5% after adaptation, corresponding to an error reduction of 27.5%. On a more difficult test set that contains an additional variation source due to recording channel mismatch, a more significant performance improvement has been obtained: for the same vocabulary and a test set perplexity of 101, the recognition word accuracy has been improved from 65.4% for the baseline to 86.0% after adaptation, corresponding to an error reduction of 59.5%.
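The error reductions quoted above follow directly from the reported word accuracies. A minimal sketch of the arithmetic, assuming the word error rate is taken as 100 minus the word accuracy (the function name is illustrative, not from the paper):

```python
def relative_error_reduction(baseline_acc, adapted_acc):
    """Relative reduction in word error rate, in percent.

    Assumption: word error rate = 100 - word accuracy (both in percent).
    """
    baseline_err = 100.0 - baseline_acc
    adapted_err = 100.0 - adapted_acc
    return 100.0 * (baseline_err - adapted_err) / baseline_err


# Word accuracies quoted in the abstract above.
print(round(relative_error_reduction(86.9, 90.5), 1))  # 27.5
print(round(relative_error_reduction(65.4, 86.0), 1))  # 59.5
```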


IEEE Transactions on Image Processing | 1996

Gaussian mixture density modeling, decomposition, and applications

Xinhua Zhuang; Yan Huang; Kannappan Palaniappan; Yunxin Zhao

We present a new approach to the modeling and decomposition of Gaussian mixtures by using robust statistical methods. The mixture distribution is viewed as a contaminated Gaussian density. Using this model and the model-fitting (MF) estimator, we propose a recursive algorithm called the Gaussian mixture density decomposition (GMDD) algorithm for successively identifying each Gaussian component in the mixture. The proposed decomposition scheme has advantages that are desirable but lacking in most existing techniques. In the GMDD algorithm, the number of components does not need to be specified a priori, the proportion of noisy data in the mixture can be large, the parameter estimation of each component is virtually independent of initialization, and the variability in the shape and size of the component densities in the mixture is taken into account. Gaussian mixture density modeling and decomposition has been widely applied in a variety of disciplines that require signal or waveform characterization for classification and recognition. We apply the proposed GMDD algorithm to the identification and extraction of clusters, and the estimation of unknown probability densities. Probability density estimation by identifying a decomposition using the GMDD algorithm, that is, a superposition of normal distributions, is successfully applied to automated cell classification. Computer experiments using both real data and simulated data demonstrate the validity and power of the GMDD algorithm for various models and different noise assumptions.


Pattern Recognition | 1996

Piecewise linear classifiers using binary tree structure and genetic algorithm

Bing-Bing Chai; Tong Huang; Xinhua Zhuang; Yunxin Zhao; Jack Sklansky

A linear decision binary tree structure is proposed for constructing piecewise linear classifiers, with the Genetic Algorithm (GA) being shaped and employed at each nonterminal node in order to search for a linear decision function that is optimal in the sense of maximum impurity reduction. The methodology works for both the two-class and multi-class cases. In comparison to several other well-known methods, the proposed Binary Tree-Genetic Algorithm (BTGA) is demonstrated to produce a much lower cross-validation misclassification rate. Finally, a modified BTGA is applied to the important pap smear cell classification. This results in a spectrum for the combination of the highest desirable sensitivity along with the lowest possible false alarm rate, ranging from 27.34% sensitivity, 0.62% false alarm rate to 97.02% sensitivity, 50.24% false alarm rate from resubstitution validation. The multiple choices offered by the spectrum for the sensitivity-false alarm rate combination will provide the flexibility needed for the pap smear slide classification.
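As an illustration of the quantity a search at each nonterminal node could maximize, the sketch below scores a candidate linear decision function by its impurity reduction. The abstract does not name the impurity measure, so Gini impurity is an assumption here, and the function names are hypothetical. The genetic search itself is not shown.

```python
import numpy as np


def gini(labels):
    """Gini impurity of a label array (0 for an empty or pure node)."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))


def impurity_reduction(X, y, w, b):
    """Impurity reduction from splitting samples on the side of w.x + b > 0.

    Assumption: Gini impurity, weighted by the child node sizes.
    """
    right = X @ w + b > 0
    n = len(y)
    n_right = int(right.sum())
    n_left = n - n_right
    children = (n_left * gini(y[~right]) + n_right * gini(y[right])) / n
    return gini(y) - children


# Toy two-class data separated by the second coordinate.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0, 0, 1, 1])
print(impurity_reduction(X, y, np.array([0.0, 1.0]), -0.5))  # 0.5
print(impurity_reduction(X, y, np.array([1.0, 0.0]), -0.5))  # 0.0
```

A candidate hyperplane that separates the classes perfectly recovers the full parent impurity, while an uninformative one scores zero, which is what makes the score usable as a fitness value.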


IEEE Transactions on Signal Processing | 1995

Gaussian mixture density modeling of non-Gaussian source for autoregressive process

Yunxin Zhao; Xinhua Zhuang; Sheu-Jen Ting

A new approach is taken to model non-Gaussian sources of AR processes using Gaussian mixture densities that are known to be effective for approximating wide varieties of probability distributions. A maximum likelihood estimation algorithm is derived for estimating the AR parameters by solving a generalized normal equation, and a clustering algorithm is used for estimating the parameters of Gaussian mixture density of the source signals. The correlation matrix of the generalized normal equation is not Toeplitz but is symmetric and in general positive definite. Higher order statistics of skewness and kurtosis are used for identifying the source distribution as being Gaussian or non-Gaussian and, consequently, determining the parameter estimation technique between the conventional method and the proposed method. Experiments on non-Gaussian source AR processes demonstrate that under high SNR conditions (SNR ≥ 20 dB), the proposed algorithm outperforms the conventional AR estimation algorithm and the cumulant-based algorithm by an order-of-magnitude reduction of average estimation errors. The proposed algorithm also has very low estimation errors with short data records. Finally, a maximum likelihood prediction method is formulated for non-Gaussian source AR processes that has shown potential in achieving higher efficiency signal coding than linear predictive coding.
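The Gaussian versus non-Gaussian check mentioned above relies on skewness and kurtosis. A minimal sketch of the sample statistics, assuming the standard moment-based definitions (the abstract does not give the exact estimators or decision thresholds, and the function name is illustrative):

```python
import numpy as np


def skewness_and_excess_kurtosis(x):
    """Moment-based sample skewness and excess kurtosis.

    Both are close to 0 for Gaussian data; large magnitudes suggest
    a non-Gaussian source.
    """
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    m3 = np.mean(centered ** 3)
    m4 = np.mean(centered ** 4)
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


rng = np.random.default_rng(0)
gaussian = rng.normal(size=50_000)
laplacian = rng.laplace(size=50_000)  # heavier tails than a Gaussian

print(skewness_and_excess_kurtosis(gaussian))
print(skewness_and_excess_kurtosis(laplacian))
```

For the Gaussian samples both statistics should be near zero, while the Laplacian samples should show a clearly positive excess kurtosis.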


IEEE Transactions on Speech and Audio Processing | 1993

A speaker-independent continuous speech recognition system using continuous mixture Gaussian density HMM of phoneme-sized units

Yunxin Zhao

The author describes a large vocabulary, speaker-independent, continuous speech recognition system which is based on hidden Markov modeling (HMM) of phoneme-sized acoustic units using continuous mixture Gaussian densities. A bottom-up merging algorithm is developed for estimating the parameters of the mixture Gaussian densities, where the resultant number of mixture components is proportional to both the sample size and dispersion of training data. A compression procedure is developed to construct a word transcription dictionary from the acoustic-phonetic labels of sentence utterances. A modified word-pair grammar using context-sensitive grammatical parts is incorporated to constrain task difficulty. The Viterbi beam search is used for decoding. The segmental K-means algorithm is implemented as a baseline for evaluating the bottom-up merging technique. The system has been evaluated on the TIMIT database (1990) for a vocabulary size of 853. For test set perplexities of 24, 104, and 853, the decoding word accuracies are 90.9%, 86.0%, and 62.9%, respectively. For the perplexity of 104, the decoding accuracy achieved by using the merging algorithm is 4.1% higher than that using the segmental K-means (22.8% error reduction), and the decoding accuracy using the compressed dictionary is 3.0% higher than that using a standard dictionary (18.1% error reduction).


IEEE Transactions on Speech and Audio Processing | 2000

Frequency-domain maximum likelihood estimation for automatic speech recognition in additive and convolutive noises

Yunxin Zhao

A feature estimation technique is proposed for speech signals that are degraded by both additive and convolutive noises. An EM algorithm is formulated in the frequency domain for identification of the magnitude response of the distortion channel and power spectrum of additive noise, and posterior estimates of short-time power spectra of speech are obtained based on the identified channel and noise. The estimated posterior power spectra are used to calculate perceptually-based linear prediction cepstral coefficients, and the estimated cepstral features and their temporal regression coefficients are used for automatic speech recognition using acoustic models trained from clean speech. Experiments were performed on speaker-independent continuous speech recognition, where the speech data were taken from the TIMIT database and were degraded by a distortion channel and simulated additive noises with white or colored spectral characteristics at various SNR levels. Experimental results indicate that the proposed technique leads to convergent identification of channel and noise and significantly improved recognition accuracy for speaker-independent continuous speech.


IEEE Transactions on Speech and Audio Processing | 1999

Adaptive co-channel speech separation and recognition

Kuan-Chieh Yen; Yunxin Zhao

An improved technique of co-channel speech separation, S-AADF/LMS, and its integration with automatic speech recognition is presented. The S-AADF/LMS technique is based on the algorithms of accelerated adaptive decorrelation filtering (AADF) and LMS noise cancellation, where a switching between the two algorithms is made depending upon the active/inactive status of the co-channel signal sources. The AADF improves the previous adaptive decorrelation algorithm in terms of system stability and estimation efficiency, and leads to better estimation of time-varying and reverberant channels. The S-AADF/LMS further improves the estimation accuracy when only one source signal remains active during certain periods of time. A coherence-function based source signal detection algorithm is also presented, which is successfully used in the switching between AADF and LMS and in extracting speech signals from leakage-corrupted background. Experiments were conducted under a simulated environment based on the measurements made of certain real room-acoustic conditions, and the results demonstrated the effectiveness of the proposed technique for co-channel speech separation and recognition.


International Conference on Acoustics, Speech, and Signal Processing | 2005

Improved confusion network algorithm and shortest path search from word lattice

Jian Xue; Yunxin Zhao

We propose a novel confusion network (CN) generation algorithm with linear time complexity, O(T), which is capable of transforming a very large lattice into a confusion network in negligible time. We further extend the confusion network concept to incorporate the case where a long word is split into short words. Finally, we develop a shortest path search algorithm that finds a sentence hypothesis from a word lattice to minimize the expected word error rate directly. The proposed algorithms are evaluated on the Switchboard task, where significant reduction of computation time was observed for the proposed confusion network algorithm as compared with a previously proposed confusion network algorithm, and improved word accuracy performance was observed for both the proposed CN algorithm and the shortest path algorithm as compared with one-best beam search decoding.
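To make the confusion network idea concrete, the sketch below reads a sentence hypothesis off an already-built network by picking the highest-posterior word in each slot. The data layout (a list of dicts mapping words to posteriors, with a hypothetical `<eps>` token for "no word") is an assumption for illustration. It does not reproduce the paper's O(T) construction algorithm or its shortest path search.

```python
def consensus_hypothesis(confusion_network, eps="<eps>"):
    """Pick the highest-posterior entry in each slot, skipping empty words.

    Assumption: each slot is a dict {word: posterior}; `eps` marks a deletion.
    """
    words = []
    for slot in confusion_network:
        best = max(slot, key=slot.get)
        if best != eps:
            words.append(best)
    return words


# Toy network with made-up posteriors.
cn = [
    {"i": 0.9, "eye": 0.1},
    {"<eps>": 0.6, "do": 0.4},
    {"recognize": 0.7, "recognized": 0.3},
    {"speech": 0.8, "beach": 0.2},
]
print(consensus_hypothesis(cn))  # ['i', 'recognize', 'speech']
```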


Speech Communication | 1998

An energy-constrained signal subspace method for speech enhancement and recognition in white and colored noises

Jun Huang; Yunxin Zhao

In this paper, an energy-constrained signal subspace (ECSS) method is proposed for speech enhancement and automatic speech recognition under additive noise conditions. The key idea is to match the short-time energy of the enhanced speech signal to the unbiased estimate of the short-time energy of the clean speech, which is proven very effective for improving the estimation of the noise-like, low-energy segments in continuous speech. The ECSS method is applied to both white and colored noises, where the additive colored noise is modelled by an autoregressive (AR) process. A modified covariance method is used to estimate the AR parameters of the colored noise, and a prewhitening filter is constructed based on the estimated parameters. The performances of the proposed algorithms were evaluated using the TI46 digit database and the TIMIT continuous speech database. It was found that the ECSS method can achieve very high word recognition accuracy (WRA) for the digits set under low SNR conditions. For the continuous speech data set, this method helped to improve the SNR by 2–6 dB and the WRA by 13.7–45.5% for the white noise and 18.6–55.9% for the colored noise under various SNR conditions.
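The prewhitening step can be illustrated with a short sketch. It assumes the AR coefficients are already known (the modified covariance estimation is not shown) and uses the sign convention n[t] = a_1 n[t-1] + ... + a_p n[t-p] + e[t], which is an assumption rather than something the abstract states. The function name is illustrative.

```python
import numpy as np


def prewhiten(x, ar_coeffs):
    """Apply the FIR inverse of an AR model: y[t] = x[t] - sum_k a_k x[t-k].

    Assumption: AR convention n[t] = sum_k a_k n[t-k] + e[t], zero initial state.
    """
    fir = np.concatenate(([1.0], -np.asarray(ar_coeffs, dtype=float)))
    return np.convolve(x, fir)[: len(x)]


# Simulate AR(1) colored noise driven by white noise e.
rng = np.random.default_rng(0)
e = rng.normal(size=1000)
a = 0.9
noise = np.zeros_like(e)
noise[0] = e[0]
for t in range(1, len(e)):
    noise[t] = a * noise[t - 1] + e[t]

# With the true coefficient, prewhitening recovers the white driving noise.
print(np.allclose(prewhiten(noise, [a]), e))  # True
```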

Collaboration


Dive into Yunxin Zhao's collaborations.

Top Co-Authors

Shaojun Wang, Wright State University

Jian Xue, University of Missouri

Rusheng Hu, University of Missouri

Xiaolong Li, University of Missouri

Rong Hu, University of Missouri

Fuchun Peng, University of Waterloo