Yen-Lu Chow
Apple Inc.
Publications
Featured research published by Yen-Lu Chow.
international conference on acoustics, speech, and signal processing | 1985
Richard M. Schwartz; Yen-Lu Chow; Owen Kimball; S. Roucos; M. Krasner; J. Makhoul
This paper describes the results of our work in designing a system for phonetic recognition of unrestricted continuous speech. We describe several algorithms used to recognize phonemes using context-dependent Hidden Markov Models of the phonemes. We present results for several variations of the parameters of the algorithms. In addition, we propose a technique that makes it possible to integrate traditional acoustic-phonetic features into a hidden Markov process. The categorical decisions usually associated with heuristic acoustic-phonetic algorithms are replaced by automated training techniques and global search strategies. The combination of general spectral information and specific acoustic-phonetic features is shown to result in more accurate phonetic recognition than either representation by itself.
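The global search strategy mentioned above is, in HMM systems of this era, typically a Viterbi-style dynamic program over the model states. The following is a minimal, self-contained sketch of Viterbi decoding for a discrete-observation HMM; it is a toy illustration of the technique, not the paper's system, and all parameter values in the usage below are invented.

```python
import numpy as np

def viterbi(log_A, log_B, obs, log_pi):
    """Most likely state sequence for a discrete-observation HMM.

    log_A[i, j]: log transition probability from state i to state j
    log_B[j, k]: log probability of emitting symbol k in state j
    obs:         sequence of observation symbol indices
    log_pi[i]:   log initial probability of state i
    """
    n_states = log_A.shape[0]
    T = len(obs)
    delta = np.full((T, n_states), -np.inf)   # best log score ending in each state
    psi = np.zeros((T, n_states), dtype=int)  # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A      # (from_state, to_state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    # Backtrack from the best final state.
    path = [int(delta[T - 1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]
```

With a sticky two-state model whose states prefer emitting symbols 0 and 1 respectively, the decoded path follows the observation stream.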
international conference on acoustics, speech, and signal processing | 1996
Jerome R. Bellegarda; John W. Butzberger; Yen-Lu Chow; Noah B. Coccaro; Devang K. Naik
A new approach is proposed for the clustering of words in a given vocabulary. The method is based on a paradigm first formulated in the context of information retrieval, called latent semantic analysis. This paradigm leads to a parsimonious vector representation of each word in a suitable vector space, where familiar clustering techniques can be applied. The distance measure selected in this space arises naturally from the problem formulation. Preliminary experiments indicate that the clusters produced are intuitively satisfactory. Because these clusters are semantic in nature, this approach may prove useful as a complement to conventional class-based statistical language modeling techniques.
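The core of the latent semantic analysis paradigm can be sketched in a few lines: factor a word-by-document count matrix with a truncated SVD, and each word's row in the reduced space is the parsimonious vector representation described above. The toy matrix, word list, and rank choice below are invented for illustration; the paper's actual corpus and distance measure are not reproduced here.

```python
import numpy as np

# Toy word-by-document count matrix (rows = words, columns = documents).
words = ["cat", "dog", "pet", "stock", "bond", "market"]
W = np.array([
    [3, 2, 0, 0],   # cat
    [2, 3, 0, 0],   # dog
    [4, 4, 1, 0],   # pet
    [0, 0, 3, 2],   # stock
    [0, 1, 2, 3],   # bond
    [0, 0, 4, 4],   # market
], dtype=float)

# A rank-k SVD gives each word a short "latent semantic" vector.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
word_vecs = U[:, :k] * s[:k]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Words that co-occur in the same documents end up close together,
# so standard clustering can be run on word_vecs.
sim_cat_dog = cosine(word_vecs[0], word_vecs[1])
sim_cat_stock = cosine(word_vecs[0], word_vecs[3])
```

In this toy example "cat" lands much nearer to "dog" than to "stock", which is the behavior the clustering step exploits.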
international conference on acoustics, speech, and signal processing | 1984
Richard M. Schwartz; Yen-Lu Chow; S. Roucos; Michael A. Krasner; John Makhoul
This paper discusses the use of the Hidden Markov Model (HMM) in phonetic recognition. In particular, we present improvements that deal with the problems of modeling the effect of phonetic context and the problem of robust pdf estimation. The effect of phonetic context is taken into account by conditioning the probability density functions (pdfs) of the acoustic parameters on the adjacent phonemes, only to the extent that there are sufficient tokens of the phoneme in that context. This partial conditioning is achieved by combining the conditioned and unconditioned pdf models with weights that depend on the confidence in each pdf estimate. This combination is shown to result in better performance than either model by itself. We also show that it is possible to obtain the computational advantages of using discrete probability densities without the usual requirement for large amounts of training data.
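The partial-conditioning idea above can be sketched as a count-weighted interpolation between a context-dependent pdf and its context-independent counterpart. The confidence weight below (a simple n/(n+k) form) is an illustrative choice, not the paper's exact formula.

```python
import numpy as np

def smoothed_pdf(cond_counts, uncond_pdf, k=5.0):
    """Interpolate a context-dependent discrete pdf with the
    context-independent pdf for the same phoneme.

    cond_counts: observation counts for the phoneme in this context
    uncond_pdf:  context-independent pdf (already normalized)
    k:           roughly, how many training tokens are needed before
                 the conditioned estimate is trusted (illustrative)
    """
    n = cond_counts.sum()
    lam = n / (n + k)                       # confidence in the conditioned pdf
    cond_pdf = cond_counts / n if n > 0 else uncond_pdf
    return lam * cond_pdf + (1.0 - lam) * uncond_pdf
```

With many tokens the result tracks the conditioned estimate; with none, it falls back entirely to the unconditioned model, which is the robustness behavior the abstract describes.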
international conference on acoustics, speech, and signal processing | 1994
Hsiao-Wuen Hon; Baosheng Yuan; Yen-Lu Chow; Shankar Narayan; Kai-Fu Lee
Although commercial dictation products are beginning to emerge for English, the existence of a convenient keyboard has prevented pervasive use of dictation. On the other hand, for non-alphabetic languages like Chinese, there is no convenient input method. Therefore, dictation may already be a more appealing input method for Chinese. In this paper, we demonstrate that our sub-syllable HMM recognizer and tone classifier are able to yield state-of-the-art Mandarin Chinese syllable and tone recognition performance (95.7% for syllables and 98.9% for tones). By combining the HMM syllable recognizer and tone classifier, the tonal syllable result (94%) appears adequate for a syllable-based dictation machine. Finally, to alleviate the homophone problem of syllable dictation, we developed a high-performance 5,000-word recognition system with 93% accuracy for the correct answer and 99% accuracy for the top 3 candidates.
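Combining a base-syllable recognizer with a separate tone classifier, as described above, can be sketched as picking the (syllable, tone) pair that maximizes the sum of the two models' log scores under an independence assumption. The scores below are invented for illustration and are not from the paper.

```python
# Hypothetical log-probability scores from two independent models.
syllable_scores = {"ma": -1.0, "mo": -2.5}           # base-syllable recognizer
tone_scores = {1: -0.8, 2: -1.5, 3: -2.0, 4: -1.2}   # tone classifier

# Best tonal syllable = pair maximizing the combined log score.
best = max(
    ((syl, tone, s + t)
     for syl, s in syllable_scores.items()
     for tone, t in tone_scores.items()),
    key=lambda x: x[2],
)
```

Here the combination picks "ma" with tone 1; a real system would also apply language-model constraints to resolve homophones, as the 5,000-word system in the abstract does.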
Journal of the Acoustical Society of America | 1998
Jerome R. Bellegarda; John W. Butzberger; Yen-Lu Chow
A system and method for performing speaker adaptation in a speech recognition system which includes a set of reference models corresponding to speech data from a plurality of speakers. The speech data is represented by a plurality of acoustic models and corresponding sub-events, and each sub-event includes one or more observations of speech data. A degree of lateral tying is computed between each pair of sub-events, wherein the degree of tying indicates the degree to which a first observation in a first sub-event contributes to the remaining sub-events. When adaptation data from a new speaker becomes available, a new observation from adaptation data is assigned to one of the sub-events. Each of the sub-events is then populated with the observations contained in the assigned sub-event based on the degree of lateral tying that was computed between each pair of sub-events. The reference models corresponding to the populated sub-events are then adapted to account for speech pattern idiosyncrasies of the new speaker, thereby reducing the error rate of the speech recognition system.
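The lateral-tying idea above can be sketched as follows: when a new adaptation observation is assigned to one sub-event, every other sub-event's model is also nudged toward it, in proportion to the precomputed degree of tying between the pair. This is an illustrative MAP-style mean update with invented names; the patent's actual formulas are not reproduced here.

```python
import numpy as np

def adapt_means(means, counts, tying, event, obs):
    """Update sub-event mean vectors with one adaptation observation.

    means:  (E, D) current mean vector per sub-event
    counts: (E,)   effective observation counts per sub-event
    tying:  (E, E) degree of lateral tying; tying[i, j] is how much an
            observation assigned to sub-event i contributes to sub-event j
    event:  index of the sub-event the new observation is assigned to
    obs:    (D,)   the new observation from the adaptation data
    """
    for j in range(means.shape[0]):
        w = tying[event, j]
        if w <= 0:
            continue
        counts[j] += w
        # Running weighted mean: move means[j] toward obs by w/counts[j].
        means[j] += (w / counts[j]) * (obs - means[j])
    return means, counts
```

Sub-events strongly tied to the one that actually received the observation adapt almost as much as it does, which is how a small amount of new-speaker data can update the whole model set.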
Journal of the Acoustical Society of America | 1998
Yen-Lu Chow; Erik P. Staats
A method and apparatus for detecting end points of speech activity in an input signal using spectral representation vectors performs beginning point detection using spectral representation vectors for the spectrum of each sample of the input signal and a spectral representation vector for the steady state portion of the input signal. The beginning point of speech is detected when the spectrum diverges from the steady state portion of the input signal. Once the beginning point has been detected, the spectral representation vectors of the input signal are used to determine the ending point of the sound in the signal. The ending point of speech is detected when the spectrum converges towards the steady state portion of the input signal. After both the beginning and ending of the sound are detected, vector quantization distortion can be used to classify the sound as speech or noise.
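The divergence/convergence test above can be sketched as thresholding the distance between each frame's spectral vector and the steady-state spectrum. The function, thresholds, and frame data below are invented for illustration; the patent's exact measures are not reproduced.

```python
import numpy as np

def detect_endpoints(frames, background, begin_thresh=2.0, end_thresh=1.0):
    """Find begin/end frames of speech in a spectral-vector sequence.

    frames:     (T, D) spectral representation vectors, one per frame
    background: (D,)   steady-state spectrum of the input signal
    Speech begins where the spectrum diverges from the steady state and
    ends where it converges back toward it (thresholds illustrative).
    """
    dist = np.linalg.norm(frames - background, axis=1)
    above = np.where(dist > begin_thresh)[0]
    if len(above) == 0:
        return None, None                 # no speech detected
    begin = int(above[0])
    end = begin
    for t in range(begin, len(dist)):
        if dist[t] > end_thresh:
            end = t                       # last frame still above the end threshold
    return begin, end
```

A classification pass (e.g. by vector quantization distortion, as in the abstract) would then decide whether the detected segment is speech or noise.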
Journal of the Acoustical Society of America | 1997
Yen-Lu Chow; Erik P. Staats
A method and apparatus for detecting speech activity in an input signal. The present invention includes performing begin point detection using power/zero crossing. Once the begin point has been detected, the present invention uses the cepstrum of the input signal to determine the endpoint of the sound in the signal. After both the beginning and ending of the sound are detected, the present invention uses vector quantization distortion to classify the sound as speech or noise.
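Begin-point detection by power and zero crossings, as described above, can be sketched per frame: voiced onsets show high energy, while unvoiced (fricative) onsets show low energy but a high zero-crossing rate, so either test can fire. Frame length and thresholds below are illustrative, not the patent's values.

```python
import numpy as np

def begin_point(signal, frame_len=160, power_thresh=0.01, zcr_thresh=0.3):
    """Return the index of the first frame that looks like speech,
    using frame power or zero-crossing rate, or None if none does."""
    n_frames = len(signal) // frame_len
    for f in range(n_frames):
        frame = signal[f * frame_len:(f + 1) * frame_len]
        power = float(np.mean(frame ** 2))
        # Fraction of adjacent sample pairs whose sign changes.
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
        if power > power_thresh or zcr > zcr_thresh:
            return f
    return None
```

After the begin point, the abstract's method switches to cepstral analysis for the endpoint and to vector quantization distortion for the speech/noise decision.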
international conference on acoustics, speech, and signal processing | 1988
Francis Kubala; Yen-Lu Chow; A. Derr; M.-W. Feng; O. Kimball; J. Makhoul; P. Price; J. Rohlicek; S. Roucos; Richard M. Schwartz; J. Vandegrift
The system was trained in a speaker-dependent mode on 28 minutes of speech from each of 8 speakers, and was tested on independent test material for each speaker. The system was tested with three artificial grammars spanning a broad perplexity range. The average performance of the system measured in percent word error was: 1.4% for a pattern grammar of perplexity 9, 7.5% for a word-pair grammar of perplexity 62, and 32.4% for a null grammar of perplexity 1000.
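For reference, perplexity measures the effective average branching factor a grammar imposes on the recognizer; a null grammar that allows any of N words with equal probability has perplexity exactly N, matching the "null grammar of perplexity 1000" above. A minimal sketch of the standard definition:

```python
import math

def perplexity(word_probs):
    """Perplexity of a language model over a test sequence:
    2 ** (average negative log2 probability assigned to each word)."""
    H = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2 ** H
```

A uniform distribution over 1000 words (p = 0.001 per word) yields perplexity 1000, while more constraining grammars assign higher probabilities and thus lower perplexity.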
human language technology | 1989
Richard M. Schwartz; Chris Barry; Yen-Lu Chow; Alan Derr; Ming-Whei Feng; Owen Kimball; Francis Kubala; John Makhoul; Jeffrey Vandegrift
In this paper we describe the algorithms used in the BBN BYBLOS Continuous Speech Recognition system. The BYBLOS system uses context-dependent hidden Markov models of phonemes to provide a robust model of phonetic coarticulation. We provide an update of the ongoing research aimed at improving the recognition accuracy. In the first experiment we confirm the large improvement in accuracy that can be derived by using spectral derivative parameters in the recognition. In particular, the word error rate is reduced by a factor of two. Currently the system achieves a word error rate of 2.9% when tested on the speaker-dependent part of the standard 1000-word DARPA Resource Management Database using the word-pair grammar supplied with the database. When no grammar is used, the error rate is 15.3%. Finally, we present a method for smoothing the discrete densities on the states of the HMM, which is intended to alleviate the problem of insufficient training for detailed phonetic models.
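Spectral derivative parameters of the kind credited above with the factor-of-two error reduction are commonly computed with a regression over a few neighboring frames. The following sketch uses the standard delta-coefficient construction; the BYBLOS front end's exact details may differ.

```python
import numpy as np

def delta_features(cepstra, width=2):
    """Append spectral-derivative (delta) parameters to cepstral frames.

    cepstra: (T, D) cepstral vectors, one row per frame
    width:   regression window half-width (frames on each side)
    Uses the usual regression formula
        d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2),
    with edge frames replicated at the boundaries.
    """
    T, D = cepstra.shape
    padded = np.pad(cepstra, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    deltas = np.zeros_like(cepstra)
    for k in range(1, width + 1):
        deltas += k * (padded[width + k:width + k + T]
                       - padded[width - k:width - k + T])
    deltas /= denom
    return np.hstack([cepstra, deltas])
```

On a linearly increasing cepstral trajectory the interior delta values recover the slope exactly, which is the sanity check used below.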
human language technology | 1990
Paul G. Bamberg; Yen-Lu Chow; Laurence Gillick; Robert Roth; Dean Sturtevant
We present a 1000-word continuous speech recognition (CSR) system that operates in real time on a personal computer (PC). The system, designed for large vocabulary natural language tasks, makes use of phonetic Hidden Markov models (HMM) and incorporates acoustic, phonetic, and linguistic sources of knowledge to achieve high recognition performance. We describe the various components of this system. We also present our strategy for achieving real time recognition on the PC. Using a 486-based PC with a 29K-based add-on board, the recognizer has been timed at 1.1 times real time.