Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Hong C. Leung is active.

Publication


Featured research published by Hong C. Leung.


International Conference on Acoustics, Speech, and Signal Processing | 1993

A comparative study of signal representations and classification techniques for speech recognition

Hong C. Leung; Benjamin Chigier; James R. Glass

The authors investigate the interactions of two important sets of techniques in speech recognition: signal representation and classification. In addition, in order to quantify the effect of the telephone network, experiments are performed on both wideband and telephone-quality speech. The spectral and cepstral signal processing techniques studied fall into a few major categories based on Fourier analyses, linear prediction, and auditory processing. The classification techniques examined are Gaussian, mixture Gaussians, and the multilayer perceptron (MLP). Results indicate that the MLP consistently produces lower error rates than the other two classifiers. When averaged across all three classifiers, the Bark auditory spectral coefficients (BASC) produce the lowest phonetic classification error rates. When evaluated in a stochastic segment framework using the MLP, BASC also produces the lowest word error rate.
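
As a rough illustration of the kind of classifier comparison described above, the sketch below trains a per-class Gaussian classifier and a multilayer perceptron on synthetic stand-in feature vectors. The data, feature dimensions, and model settings are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch (not the paper's setup): comparing a Gaussian classifier
# with a multilayer perceptron on hypothetical frame-level feature vectors.
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Stand-in for cepstral coefficients: 3 phone classes, 12-dim features.
X = np.vstack([rng.normal(mu, 1.0, size=(200, 12)) for mu in (-1.0, 0.0, 1.0)])
y = np.repeat([0, 1, 2], 200)

gaussian = QuadraticDiscriminantAnalysis().fit(X, y)  # one Gaussian per class
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                    random_state=0).fit(X, y)

print("Gaussian accuracy:", gaussian.score(X, y))
print("MLP accuracy:     ", mlp.score(X, y))
```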


International Conference on Acoustics, Speech, and Signal Processing | 1988

Some phonetic recognition experiments using artificial neural nets

Hong C. Leung; Victor W. Zue

This paper is concerned with the application of artificial neural nets to phonetic recognition. The goal is to investigate how the framework of multilayer perceptrons can be exploited in speech recognition when they are augmented with acoustic-phonetic knowledge. Major issues such as the choice of the error metric, the use of contextual information, and the determination of the training procedure are investigated within a set of experiments that attempt to recognize the 16 vowels in American English. The results, based on some 10,000 vowel tokens excised from 1000 sentences spoken by 200 speakers, indicate that top-choice accuracies of 54% and 67% can be achieved for the context-independent and context-dependent networks, respectively.
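
A simple way to supply the contextual information mentioned above is to concatenate each segment's features with those of its neighbors. The sketch below shows this windowing idea in NumPy; the window size and feature dimensions are assumptions for illustration, not the paper's configuration.

```python
# Minimal sketch (illustrative assumptions): building context-dependent
# inputs by stacking features of neighboring segments.
import numpy as np

def add_context(features, left=1, right=1):
    """Stack each frame with its `left` and `right` neighbors (edges padded)."""
    padded = np.pad(features, ((left, right), (0, 0)), mode="edge")
    return np.hstack([padded[i : i + len(features)]
                      for i in range(left + right + 1)])

frames = np.random.rand(10, 12)   # 10 segments, 12-dim features each
print(add_context(frames).shape)  # (10, 36): left + center + right
```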


Human Language Technology | 1990

Recent progress on the VOYAGER system

Victor W. Zue; James R. Glass; David Goodine; Hong C. Leung; Michael K. McCandless; Michael S. Phillips; Joseph Polifroni; Stephanie Seneff

The VOYAGER speech recognition system, which was described in some detail at the last DARPA meeting [9], is an urban exploration system that helps the user locate various sites in the area of Cambridge, Massachusetts. The system has a limited database of objects such as banks, restaurants, and post offices and can provide information about these objects (e.g., phone numbers, type of cuisine served) as well as navigational assistance between them. VOYAGER accepts both spoken and typed input and responds in the form of text, graphics, and synthesized speech. Since the last meeting, we have made several improvements to VOYAGER that have had an impact on the usability of the system.


Human Language Technology | 1989

The VOYAGER speech understanding system: a progress report

Victor W. Zue; James R. Glass; David Goodine; Hong C. Leung; Michael S. Phillips; Joseph Polifroni; Stephanie Seneff

As part of the DARPA Spoken Language System program, we recently initiated an effort in spoken language understanding. A spoken language system addresses applications in which speech is used for interactive problem solving between a person and a computer. In these applications, not only must the system convert the speech signal into text, it must also understand the linguistic structure of a sentence in order to generate the correct response. This paper describes our early experience with the development of the MIT VOYAGER spoken language system.


Human Language Technology | 1991

Development and preliminary evaluation of the MIT ATIS system

Stephanie Seneff; James R. Glass; David Goddeau; David Goodine; Lynette Hirschman; Hong C. Leung; Michael S. Phillips; Joseph Polifroni; Victor W. Zue

This paper represents a status report on the MIT ATIS system. The most significant new achievement is that we now have a speech-input mode. It is based on the MIT SUMMIT system using context independent phone models, and includes a word-pair grammar with perplexity 92 (on the June-90 test set). In addition, we have completely redesigned the back-end component, in order to emphasize portability and extensibility. The parser now produces an intermediate semantic frame representation, which serves as the focal point for all back-end operations, such as history management, text generation, and SQL query generation. Most of those aspects of the system that are tied to a particular domain are now entered through a set of tables associated with a small artificial language for decoding them. We have also improved the display of the database table, making it considerably easier for a subject to comprehend the information given. We report here on the results of the official DARPA February-91 evaluation, as well as on results of an evaluation on data collected at MIT, for both speech input and text input.
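
To make the role of the intermediate semantic frame concrete, the sketch below turns a flat frame into an SQL query. The frame schema (`table`, `columns`, `constraints`) is hypothetical; the MIT system's actual frame format is not specified here.

```python
# Minimal sketch with a hypothetical frame schema: generating an SQL query
# from an intermediate semantic frame, the role the paper assigns to the
# redesigned back end.
def frame_to_sql(frame):
    """Build a SELECT statement from a flat semantic frame."""
    clauses = [f"{slot} = '{value}'"
               for slot, value in frame["constraints"].items()]
    where = " AND ".join(clauses)
    return f"SELECT {', '.join(frame['columns'])} FROM {frame['table']} WHERE {where}"

frame = {
    "table": "flights",
    "columns": ["airline", "departure_time"],
    "constraints": {"origin": "BOSTON", "destination": "DALLAS"},
}
print(frame_to_sql(frame))
# SELECT airline, departure_time FROM flights
#   WHERE origin = 'BOSTON' AND destination = 'DALLAS'
```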


Human Language Technology | 1990

Recent progress on the SUMMIT system

Victor W. Zue; James R. Glass; David Goodine; Hong C. Leung; Michael S. Phillips; Joseph Polifroni; Stephanie Seneff

The SUMMIT system is a speaker-independent, continuous-speech recognition system that we have developed at MIT [12]. To date, the system has been ported to a variety of tasks with vocabulary sizes up to 1000 words and perplexities up to 73. The architecture of this system is a product of two guiding principles. First, we desired a framework that could be flexible and modular so that we could explore alternative strategies for embedding speech knowledge into the system. Second, we required that the system be stochastic and trainable from a large body of speech data to account for our current incomplete knowledge of the acoustic realization of speech. The current implementation of the system is a reflection of both of these ideas. SUMMIT differs from the majority of prevailing HMM approaches in many respects, ranging from its use of auditory models and selected acoustic measurements, to its segmental framework and use of pronunciation networks. In time, the specific implementation of these ideas will undoubtedly be modified as we discover superior techniques and approaches. Until phonetic and word recognition accuracies are competitive with those of human listeners, however, we believe it will be appropriate to incorporate both notions of flexibility and trainability into the system.
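
One of SUMMIT's distinctive ingredients mentioned above is the pronunciation network. The sketch below models such a network as a small directed graph whose arcs carry alternative phone realizations; the graph encoding and the example word are illustrative assumptions, not SUMMIT's internal representation.

```python
# Minimal sketch (illustrative only): a pronunciation network as a directed
# graph in which arcs carry alternative phone realizations, the kind of
# structure used in place of a single fixed pronunciation per word.
def expand(network, node="start", prefix=()):
    """Enumerate all phone sequences from `node` to 'end'."""
    if node == "end":
        yield prefix
        return
    for phone, nxt in network[node]:
        yield from expand(network, nxt, prefix + (phone,))

# "butter": /t/ may surface as a flap [dx] in American English.
butter = {
    "start": [("b", "n1")],
    "n1": [("ah", "n2")],
    "n2": [("t", "n3"), ("dx", "n3")],  # alternative realizations
    "n3": [("er", "end")],
}
for pron in expand(butter):
    print(" ".join(pron))  # b ah t er  /  b ah dx er
```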


Human Language Technology | 1989

Preliminary evaluation of the VOYAGER spoken language system

Victor W. Zue; James R. Glass; David Goodine; Hong C. Leung; Michael S. Phillips; Joseph Polifroni; Stephanie Seneff

VOYAGER is a speech understanding system currently under development at MIT. It provides information and navigational assistance for a geographical area within the city of Cambridge, Massachusetts. Recently, we have completed the initial implementation of the system. This paper describes the preliminary evaluation of VOYAGER, using a spontaneous speech database that was also recently collected.


Human Language Technology | 1991

Signal representation, attribute extraction, and the use of distinctive features for phonetic classification

Helen M. Meng; Victor W. Zue; Hong C. Leung

The study reported in this paper addresses three issues related to phonetic classification: 1) whether it is important to choose an appropriate signal representation, 2) whether there are any advantages in extracting acoustic attributes over directly using the spectral information, and 3) whether it is advantageous to introduce an intermediate set of linguistic units, i.e., distinctive features. To restrict the scope of our study, we focused on 16 vowels in American English, and investigated classification performance using an artificial neural network with nearly 22,000 vowel tokens from 550 speakers excised from the TIMIT corpus. Our results indicate that 1) the combined outputs of Seneff's auditory model outperform five other representations with both undegraded and noisy speech, 2) acoustic attributes give similar performance to raw spectral information, but at potentially considerable computational savings, and 3) the distinctive feature representation gives similar performance to direct vowel classification, but potentially offers a more flexible mechanism for describing context dependency.
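
To illustrate the intermediate distinctive-feature representation, the sketch below decodes a vowel from a vector of predicted feature values by nearest-neighbor matching. The feature inventory and values are simplified for illustration and are not the paper's exact feature set.

```python
# Minimal sketch (feature values are illustrative): decoding a vowel from an
# intermediate distinctive-feature representation by nearest-neighbor matching.
import numpy as np

# Columns: [high, low, front, back, round] for a few American English vowels.
FEATURES = {
    "iy": (1, 0, 1, 0, 0),
    "ae": (0, 1, 1, 0, 0),
    "uw": (1, 0, 0, 1, 1),
    "aa": (0, 1, 0, 1, 0),
}

def decode(predicted):
    """Pick the vowel whose feature vector is closest to the network output."""
    dist = lambda v: np.sum((np.array(v) - np.array(predicted)) ** 2)
    return min(FEATURES, key=lambda k: dist(FEATURES[k]))

print(decode((0.9, 0.1, 0.8, 0.2, 0.1)))  # -> iy
```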


Journal of the Acoustical Society of America | 1984

Automatic alignment of phonetic transcriptions with continuous speech

Victor W. Zue; Hong C. Leung

The alignment of a speech signal with its corresponding phonetic transcription is an essential process in speech research, since the time‐aligned transcription provides direct access to specific phonetic events in the signal. Traditionally, the alignment is done manually by a trained acoustic phonetician. The task, however, is prone to error and is extremely time consuming. This paper describes a system that performs the time alignment automatically. The alignment is achieved using a standard pattern classification algorithm and a dynamic programming algorithm, augmented with acoustic‐phonetic constraints. The speech signal is first segmented into six broad phonetic classes using a sequence of nonparametric pattern classifiers arranged in a binary decision tree. The output of this initial classification is then aligned with the transcription using a knowledge‐based dynamic programming algorithm. The aligned broad class segments provide “islands of reliability” for more detailed segmentation and refinement ...
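
The dynamic programming step can be illustrated with a standard minimum-edit-cost alignment between the recognized broad-class sequence and the transcription. The sketch below is the unconstrained textbook version; the paper's algorithm additionally applies acoustic-phonetic constraints.

```python
# Minimal sketch (standard dynamic programming, not the authors' constrained
# version): aligning a broad-class segment sequence with a transcription by
# minimum edit cost.
def align(segments, transcription, sub=1, indel=1):
    """Return the minimum edit cost between two label sequences."""
    n, m = len(segments), len(transcription)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if segments[i - 1] == transcription[j - 1] else sub
            d[i][j] = min(d[i - 1][j - 1] + cost,  # match / substitute
                          d[i - 1][j] + indel,     # extra segment
                          d[i][j - 1] + indel)     # missing segment
    return d[n][m]

print(align(["vowel", "stop", "vowel"],
            ["vowel", "stop", "fricative", "vowel"]))  # 1
```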


Journal of the Acoustical Society of America | 1988

Recognition of vowels using artificial neural networks

Hong C. Leung; Victor W. Zue

This paper is concerned with the application of artificial neural networks to phonetic recognition. This work is motivated by the observation that improved knowledge of feature extraction is often overshadowed by relative ignorance of how to combine the extracted features into a robust decision. The goal is to investigate how the self‐organizing framework of artificial neural networks can be exploited to enable different acoustic cues to interact. The investigation is couched in experiments that recognize the 16 vowels in American English, using some 10,000 tokens in all phonetic contexts. The tokens were extracted from 1000 sentences spoken by 140 males and 60 females. It was found that, by replacing the mean‐squared error metric with a weighted one to train a multilayer perceptron, better recognition accuracy and rank order statistics were consistently obtained. Using the two‐layer perceptron in a context‐independent manner, a top‐choice accuracy of 54% was achieved, which compares favorably with results reported in the ...
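
The weighted error metric mentioned above can be illustrated as a per-output weighting of the squared error. The sketch below is a minimal NumPy version; the paper's actual weighting scheme is not specified here, so the weights shown are hypothetical.

```python
# Minimal sketch (the weighting scheme here is hypothetical): a weighted
# mean-squared error of the kind substituted for the plain MSE when
# training a multilayer perceptron.
import numpy as np

def weighted_mse(outputs, targets, weights):
    """Weighted squared error per output unit, averaged over the batch."""
    return np.mean(weights * (outputs - targets) ** 2)

outputs = np.array([[0.8, 0.1], [0.3, 0.9]])
targets = np.array([[1.0, 0.0], [0.0, 1.0]])
weights = np.array([2.0, 1.0])  # e.g., weight the first output unit more
print(weighted_mse(outputs, targets, weights))
```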

Collaboration


Dive into Hong C. Leung's collaborations.

Top Co-Authors

Victor W. Zue, Massachusetts Institute of Technology
James R. Glass, Massachusetts Institute of Technology
Michael S. Phillips, Massachusetts Institute of Technology
Stephanie Seneff, Massachusetts Institute of Technology
David Goodine, Massachusetts Institute of Technology
Joseph Polifroni, Massachusetts Institute of Technology
David Goddeau, Massachusetts Institute of Technology
I. Lee Hetherington, Massachusetts Institute of Technology
Michael K. McCandless, Massachusetts Institute of Technology