Yu Ting Yeung
The Chinese University of Hong Kong
Publications
Featured research published by Yu Ting Yeung.
International Conference on Acoustics, Speech, and Signal Processing | 2016
Tan Lee; Yuanyuan Liu; Pei-Wen Huang; Jen-Tzung Chien; Wang Kong Lam; Yu Ting Yeung; Thomas Law; Kathy Y. S. Lee; Anthony Pak-Hin Kong; Sam-Po Law
This paper describes the application of state-of-the-art automatic speech recognition (ASR) systems to objective assessment of voice and speech disorders. Acoustical analysis of speech has long been considered a promising approach to non-invasive, objective assessment, but in the past the types and amount of speech materials used for acoustical assessment were very limited. With ASR technology, we are able to perform acoustical and linguistic analyses on large amounts of natural speech from impaired speakers. The present study focuses on Cantonese, a major Chinese dialect. Two representative disorders of speech production are investigated: dysphonia and aphasia. ASR experiments are carried out with continuous and spontaneous speech utterances from Cantonese-speaking patients. The results confirm the feasibility and potential of using natural speech for acoustical assessment of voice and speech disorders, and reveal challenging issues in acoustic modeling and language modeling of pathological speech.
Conference of the International Speech Communication Association | 2015
Ka-Ho Wong; Yu Ting Yeung; Patrick C. M. Wong; Gina-Anne Levow; Helen M. Meng
Imprecise articulation is one of the characteristics of dysarthric speech. This work develops a framework to automatically identify problematic articulatory patterns of dysarthric speakers in terms of distinctive features (DFs), which are effective for describing speech production. The identification of problematic articulatory patterns aims to assist speech therapists in developing intervention strategies. A multilayer perceptron (MLP) system is trained on non-dysarthric speech data for DF recognition, and agreement rates between the recognized DF values and the canonical values based on phonetic transcriptions are computed. For non-dysarthric speech, our system achieves an average agreement rate of 85.7%. The agreement rate for dysarthric speech declines by 1% to 3% in mild cases, 4% to 7% in moderate cases, and 7% to 12% in severe cases, compared with non-dysarthric speech. We observe that the DF disagreement patterns are consistent with the analysis of a speech
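The agreement-rate metric described above can be sketched as a frame-level comparison between recognized and canonical DF values. A minimal illustration; the function and the toy frame labels are hypothetical, not the authors' code:

```python
def df_agreement_rate(recognized, canonical):
    """Fraction of frames where the recognized distinctive-feature
    value matches the canonical value from the phonetic transcription."""
    assert len(recognized) == len(canonical)
    matches = sum(r == c for r, c in zip(recognized, canonical))
    return matches / len(canonical)

# Toy example: 6 frames of a binary DF (e.g. [+voice] / [-voice]);
# the recognizer disagrees with the transcription on one frame.
recognized = ['+', '+', '-', '-', '+', '-']
canonical  = ['+', '+', '-', '+', '+', '-']
rate = df_agreement_rate(recognized, canonical)  # 5/6
```

In practice the comparison would be run per DF and averaged over utterances, but the per-frame matching shown here is the core of the metric.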
International Conference on Acoustics, Speech, and Signal Processing | 2012
Yu Ting Yeung; Tan Lee; Cheung-Chi Leung
A single-microphone speech separation framework based on conditional random fields (CRFs) is proposed in this paper. Unlike the factorial HMM, a CRF makes no conditional independence assumption on observations, so different types of observations derived from the speech mixture can be integrated into the models through feature functions. As in the factorial HMM, the sources are assumed statistically independent. Under this assumption, the two-source single-microphone separation problem can be expressed as two independent linear-chain CRFs, turning separation into two pattern recognition problems with respect to the CRF models of the two sources. Experimental results show that by integrating initial separation outputs from the factorial HMM with the log power spectrum, fundamental frequency, and speaker likelihoods of the mixture, the CRF separation framework consistently improves on the factorial HMM results in terms of SNR, segmental SNR, and PESQ.
International Symposium on Chinese Spoken Language Processing | 2010
Houwei Cao; P. C. Ching; Tan Lee; Yu Ting Yeung
This paper addresses the problem of language modeling for LVCSR of Cantonese-English code-mixing utterances spoken in daily communication. In the absence of a sufficient amount of code-mixing text data, translation-based and semantics-based mappings are applied to n-grams to better estimate the probabilities of low-frequency and unseen mixed-language n-gram events. In the translation-based mapping scheme, a Cantonese-to-English translation dictionary is used to transcribe monolingual Cantonese n-grams into mixed-language n-grams. In the semantics-based mapping scheme, n-gram mapping is based on the meaning and syntactic function of the English words in the lexicon. Language models trained with the different mapping schemes are evaluated in terms of perplexity and in an LVCSR task. Experimental results confirm that the more mixed-language n-grams observed after mapping, the better the language model perplexity and the recognition performance. The proposed language models show significant improvement in the recognition of embedded English words compared with the baseline 3-gram LM. The best recognition accuracies attained are 63.9% for English words and 74.7% for Cantonese characters in code-mixing utterances.
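The perplexity measure used above to compare language models can be sketched as follows. A minimal illustration with hypothetical per-token probabilities, not the authors' setup:

```python
import math

def perplexity(log_probs):
    """Perplexity of a token sequence from per-token natural-log
    probabilities assigned by a language model: exp(-mean log p)."""
    return math.exp(-sum(log_probs) / len(log_probs))

# Toy example: a 4-token utterance where the model assigns each
# token probability 0.25; perplexity is then exactly 4.
lp = [math.log(0.25)] * 4
ppl = perplexity(lp)  # 4.0
```

Lower perplexity means the model finds the test text less surprising, which is why more observed mixed-language n-grams after mapping translate into better perplexity.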
International Conference on Acoustics, Speech, and Signal Processing | 2013
Feng Huang; Yu Ting Yeung; Tan Lee
To post-process the outputs of speech separation systems with harmonic enhancement, the fundamental frequency normally has to be estimated. This paper evaluates the performance of several representative robust pitch estimation algorithms on speech reconstructed from two-speaker mixture signals. The separation outputs of two state-of-the-art single-channel separation algorithms are used for the evaluation. A recently proposed sparsity-based pitch estimation method is applied to the separated speech, and a new pitch tracking algorithm is proposed. Experimental results show that on separated speech the proposed method consistently surpasses the others, with a significantly lower gross error rate that is similar to the gross error rates the other methods achieve on clean speech.
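A common definition of the gross error rate reported in pitch-estimation studies counts voiced frames whose estimate deviates from the reference by more than 20%. A minimal sketch with hypothetical frame values (the 20% tolerance is the conventional choice, not necessarily the exact one used in the paper):

```python
def gross_error_rate(f_est, f_ref, tol=0.2):
    """Gross pitch error: fraction of voiced reference frames where
    the estimated F0 deviates from the reference by more than tol."""
    voiced = [(e, r) for e, r in zip(f_est, f_ref) if r > 0]
    errors = sum(abs(e - r) > tol * r for e, r in voiced)
    return errors / len(voiced)

# Toy example: 4 voiced frames; one gross error (100 Hz -> 210 Hz,
# a typical octave-type mistake), the rest within tolerance.
est = [101.0, 210.0, 99.0, 150.0]
ref = [100.0, 100.0, 100.0, 150.0]
gpe = gross_error_rate(est, ref)  # 0.25
```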
International Conference on Acoustics, Speech, and Signal Processing | 2013
Yu Ting Yeung; Tan Lee; Cheung-Chi Leung
The use of dynamic conditional random field (DCRF) for model-based single-microphone speech separation is investigated. The speech sources are represented by acoustic state sequences from speaker-dependent acoustic models. The posterior probabilities of the source acoustic states given a speech mixture are inferred with a maximum entropy probability distribution which is represented by DCRF. The posterior probabilities are needed for minimum mean-square error estimation of the speech sources. Loopy belief propagation is applied for the inference. Averaged stochastic gradient descent and limited-memory BFGS are compared for parameter estimation. With the log-magnitude spectrum of the speech mixture as input observation, the proposed method achieves better separation performance in terms of Blind Source Separation Metrics (SDR, SAR, SIR) and PESQ than a factorial hidden Markov model baseline system in our experiments.
International Conference on Acoustics, Speech, and Signal Processing | 2016
Ka-Ho Wong; Wing Sum Yeung; Yu Ting Yeung; Helen M. Meng
Dysarthria is a motor speech disorder caused by neurological deficits. Understanding the articulatory problems of dysarthric speakers may help in designing suitable intervention strategies to improve their speech intelligibility. We have developed an automatic articulatory characteristics analysis framework based on distinctive feature (DF) recognition. We recruited 16 Cantonese dysarthric subjects with spinocerebellar ataxia (SCA) or cerebral palsy (CP) to support our research. To the best of our knowledge, this is among the first efforts to collect and automatically analyze Cantonese dysarthric speech. The framework shows close Pearson correlation with manual annotation of the subjects for most DFs and for the average DF error rates, indicating a potential way to describe the articulatory characteristics of dysarthric speech and assess it automatically.
IEEE Transactions on Audio, Speech, and Language Processing | 2015
Yu Ting Yeung; Tan Lee; Cheung-Chi Leung
We apply conditional random fields (CRFs) to single-microphone speech separation in a supervised learning scenario. We train the parameters with mixture data in which the sources compete at the same average signal power. Compared with factorial hidden Markov model (HMM) baselines, the CRF settings require less training mixture data to improve objective speech quality measures and speech recognition accuracy of the reconstructed sources, when the mixing ratios of training and testing mixture data are matched. The CRF settings also handle minor mixing ratio mismatch after adjusting the gain factors of the sources with non-linear mappings inspired by the mixture-maximization model. When the mixing ratio mismatch increases further, such that the speech mixture is dominated by only one source, the factorial HMM eventually catches up with and outperforms the CRF settings due to improved model accuracy. We also develop a convex statistical inference simplification based on linear-chain CRFs. The simplification achieves the same performance level as the original CRF settings after integrating additional observations.
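The mixture-maximization (mix-max) model mentioned above approximates the log-power spectrum of a mixture, bin by bin, by the log power of the louder source. A minimal sketch with hypothetical spectral values:

```python
def mixmax(log_power_a, log_power_b):
    """Mix-max approximation: each frequency bin of the mixture's
    log-power spectrum is approximated by the louder source's log
    power, since the dominant source masks the weaker one."""
    return [max(a, b) for a, b in zip(log_power_a, log_power_b)]

# Two sources over 3 frequency bins (log power).
src_a = [2.0, -1.0, 0.5]
src_b = [1.0,  0.0, 0.5]
mix = mixmax(src_a, src_b)  # [2.0, 0.0, 0.5]
```

This element-wise max is what makes gain adjustments of individual sources tractable in the log-spectral domain, which is the property the non-linear mappings above exploit.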
International Conference on Signal and Information Processing | 2013
Yu Ting Yeung; Tan Lee
A variational statistical inference method, referred to as the structured mean field method, is studied for the factorial hidden Markov model (HMM) formulation of the single-microphone speech separation problem. By decoupling the Markov chains of the individual speech sources, which are coupled through the mixing process of the speech mixture, the complexity of temporal inference is reduced to quadratic in the number of acoustic states of the sources. Speech separation and automatic speech recognition experiments are performed on the reconstructed speech. Experimental results show that the studied approximate inference method achieves separation results similar to those of the exact inference algorithm in terms of Perceptual Evaluation of Speech Quality (PESQ) and word error rate (WER).
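Word error rate (WER), one of the evaluation metrics above, is the Levenshtein distance between the reference and hypothesis word sequences divided by the reference length. A minimal sketch; the example sentences are hypothetical:

```python
def word_error_rate(ref, hyp):
    """WER = (substitutions + deletions + insertions) / len(ref),
    computed via Levenshtein distance over word sequences."""
    r, h = ref.split(), hyp.split()
    # d[i][j]: edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)

# One inserted word against a 3-word reference: WER = 1/3.
wer = word_error_rate("the cat sat", "the cat sat down")
```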
Conference of the International Speech Communication Association | 2015
Ka-Ho Wong; Yu Ting Yeung; Edwin Ho-yin Chan; Patrick C. M. Wong; Gina-Anne Levow; Helen M. Meng