Publication


Featured research published by Stanley F. Chen.


Computer Speech & Language | 1999

An empirical study of smoothing techniques for language modeling

Stanley F. Chen; Joshua T. Goodman

We survey the most widely-used algorithms for smoothing n-gram language models. We then present an extensive empirical comparison of several of these smoothing techniques, including those described by Jelinek and Mercer (1980), Katz (1987), Bell, Cleary and Witten (1990), Ney, Essen and Kneser (1994), and Kneser and Ney (1995). We investigate how factors such as training data size, training corpus (e.g. Brown vs. Wall Street Journal), count cutoffs, and n-gram order (bigram vs. trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. We find that these factors can significantly affect the relative performance of models, with the most significant factor being training data size. Since no previous comparisons have examined these factors systematically, this is the first thorough characterization of the relative performance of various algorithms. In addition, we introduce methodologies for analyzing smoothing algorithm efficacy in detail, and using these techniques we motivate a novel variation of Kneser-Ney smoothing that consistently outperforms all other algorithms evaluated. Finally, we present results showing that improved language model smoothing leads to improved speech recognition performance.
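
To make the family of methods concrete, here is a minimal Python sketch of interpolated Kneser-Ney smoothing for a bigram model. It assumes bigram counts are supplied as a Counter keyed by (history, word) pairs, and it illustrates the general Kneser-Ney recipe (absolute discounting plus interpolation with a continuation distribution), not the specific modified variant the paper motivates.

```python
from collections import Counter, defaultdict

def kneser_ney_bigram(bigram_counts, discount=0.75):
    """Interpolated Kneser-Ney probabilities for a bigram model (minimal sketch).

    bigram_counts: Counter mapping (history, word) -> count.
    Returns a function p(word, history). Words never seen in training get
    zero probability here; a real model would add an unknown-word floor.
    """
    history_totals = Counter()          # c(h) = sum_w c(h, w)
    continuation = defaultdict(set)     # histories h with c(h, w) > 0, keyed by w
    followers = defaultdict(set)        # distinct words following each history
    for (h, w), c in bigram_counts.items():
        history_totals[h] += c
        continuation[w].add(h)
        followers[h].add(w)

    total_bigram_types = sum(len(hs) for hs in continuation.values())

    def p_continuation(w):
        # Lower-order distribution: how many distinct histories precede w.
        return len(continuation[w]) / total_bigram_types

    def p(w, h):
        c_hw = bigram_counts.get((h, w), 0)
        c_h = history_totals[h]
        if c_h == 0:
            return p_continuation(w)    # unseen history: back off entirely
        # Absolute discounting, interpolated with the continuation distribution.
        lam = discount * len(followers[h]) / c_h
        return max(c_hw - discount, 0) / c_h + lam * p_continuation(w)

    return p
```

In the variant the paper motivates, the single discount is replaced by separate discounts for counts of one, two, and three or more; the sketch keeps one fixed discount for brevity.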


Meeting of the Association for Computational Linguistics | 1996

An Empirical Study of Smoothing Techniques for Language Modeling

Stanley F. Chen; Joshua T. Goodman

We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods.
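
As a rough illustration of the simplest technique in this family, the following Python sketch interpolates a maximum-likelihood bigram with a unigram and grid-searches a single interpolation weight on held-out data. The full Jelinek-Mercer method ties weights to buckets of history counts and estimates them with EM; the single weight, the add-one unigram floor, and the candidate grid below are simplifications introduced for this sketch.

```python
import math
from collections import Counter

def jelinek_mercer_weight(train_tokens, heldout_tokens,
                          lambdas=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Choose the weight lam in p(w|h) = lam*p_ML(w|h) + (1-lam)*p(w)
    that minimizes held-out cross-entropy (minimal sketch)."""
    unigrams = Counter(train_tokens)
    bigrams = Counter(zip(train_tokens, train_tokens[1:]))
    n = sum(unigrams.values())
    vocab = len(unigrams) + 1           # one extra slot for unseen words

    def p_uni(w):
        # Add-one floor so held-out words always get nonzero probability
        # (an illustration detail, not part of the method in the paper).
        return (unigrams[w] + 1) / (n + vocab)

    def p_interp(w, h, lam):
        ml = bigrams[(h, w)] / unigrams[h] if unigrams[h] else 0.0
        return lam * ml + (1 - lam) * p_uni(w)

    def heldout_cross_entropy(lam):
        pairs = list(zip(heldout_tokens, heldout_tokens[1:]))
        return -sum(math.log2(p_interp(w, h, lam)) for h, w in pairs) / len(pairs)

    return min(lambdas, key=heldout_cross_entropy)
```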


Meeting of the Association for Computational Linguistics | 1993

Aligning Sentences in Bilingual Corpora Using Lexical Information

Stanley F. Chen

In this paper, we describe a fast algorithm for aligning sentences with their translations in a bilingual corpus. Existing efficient algorithms ignore word identities and only consider sentence length (Brown et al., 1991b; Gale and Church, 1991). Our algorithm constructs a simple statistical word-to-word translation model on the fly during alignment. We find the alignment that maximizes the probability of generating the corpus with this translation model. We have achieved an error rate of approximately 0.4% on Canadian Hansard data, which is a significant improvement over previous results. The algorithm is language independent.
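
The search underlying aligners of this kind is typically a dynamic program over alignment "beads" (1-1, 1-0, 0-1, 2-1, 1-2 groupings of sentences). The Python sketch below shows that dynamic program with a pluggable, hypothetical bead_cost function: a length-based cost recovers a Gale-and-Church-style aligner, while a cost derived from a word-to-word translation model corresponds to the approach described here (estimating that model on the fly during alignment is not shown).

```python
import math

def align(src_sents, tgt_sents, bead_cost):
    """Dynamic-programming sentence alignment over 1-1, 1-0, 0-1, 2-1, 1-2 beads.

    bead_cost(src_chunk, tgt_chunk) -> negative log probability of the bead;
    this callable is a hypothetical stand-in for whichever scoring model is used.
    Returns a list of ((src_start, src_end), (tgt_start, tgt_end)) spans.
    """
    beads = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
    n, m = len(src_sents), len(tgt_sents)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == math.inf:
                continue
            for di, dj in beads:
                if i + di <= n and j + dj <= m:
                    c = cost[i][j] + bead_cost(src_sents[i:i + di],
                                               tgt_sents[j:j + dj])
                    if c < cost[i + di][j + dj]:
                        cost[i + di][j + dj] = c
                        back[i + di][j + dj] = (i, j)
    # Recover the best alignment by following back-pointers from (n, m).
    path, ij = [], (n, m)
    while ij != (0, 0):
        pi, pj = back[ij[0]][ij[1]]
        path.append(((pi, ij[0]), (pj, ij[1])))
        ij = (pi, pj)
    return list(reversed(path))
```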


IEEE Transactions on Speech and Audio Processing | 2000

A survey of smoothing techniques for ME models

Stanley F. Chen; Ronald Rosenfeld

In certain contexts, maximum entropy (ME) modeling can be viewed as maximum likelihood (ML) training for exponential models, and like other ML methods is prone to overfitting of training data. Several smoothing methods for ME models have been proposed to address this problem, but previous results do not make it clear how these smoothing methods compare with smoothing methods for other types of related models. In this work, we survey previous work in ME smoothing and compare the performance of several of these algorithms with conventional techniques for smoothing n-gram language models. Because of the mature body of research in n-gram model smoothing and the close connection between ME and conventional n-gram models, this domain is well-suited to gauge the performance of ME smoothing methods. Over a large number of data sets, we find that fuzzy ME smoothing performs as well as or better than all other algorithms under consideration. We contrast this method with previous n-gram smoothing methods to explain its superior performance.
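
One standard ME smoothing method in this literature is a Gaussian prior on the parameters, which amounts to an L2 penalty on the log-likelihood. The Python sketch below shows a single gradient-ascent step under that penalty; the feats/counts data structures and the learning rate are hypothetical stand-ins, and this is not claimed to be the specific method the paper finds to perform best.

```python
import numpy as np

def me_gradient_step(lam, feats, counts, sigma2=1.0, lr=0.1):
    """One gradient-ascent step on the penalized log-likelihood of a conditional
    exponential model p(y|x) proportional to exp(lam . f(x, y)), with a Gaussian
    prior (L2 penalty) on the parameters.

    feats[x][y]  : feature vector f(x, y) as a numpy array
    counts[x][y] : empirical count of (x, y) in the training data
    """
    grad = np.zeros_like(lam)
    for x, ys in feats.items():
        total = sum(counts[x].values())
        scores = {y: np.exp(lam @ f) for y, f in ys.items()}
        z = sum(scores.values())
        for y, f in ys.items():
            # Observed minus expected feature counts for this context.
            grad += counts[x].get(y, 0) * f - total * (scores[y] / z) * f
    grad -= lam / sigma2          # Gaussian prior pulls parameters toward zero
    return lam + lr * grad
```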


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Advances in speech transcription at IBM under the DARPA EARS program

Stanley F. Chen; Brian Kingsbury; Lidia Mangu; Daniel Povey; George Saon; Hagen Soltau; Geoffrey Zweig

This paper describes the technical and system building advances made in IBM's speech recognition technology over the course of the Defense Advanced Research Projects Agency (DARPA) Effective Affordable Reusable Speech-to-Text (EARS) program. At a technical level, these advances include the development of a new form of feature-based minimum phone error training (fMPE), the use of large-scale discriminatively trained full-covariance Gaussian models, the use of septaphone acoustic context in static decoding graphs, and improvements in basic decoding algorithms. At a system building level, the advances include a system architecture based on cross-adaptation and the incorporation of 2100 h of training data in every system component. We present results on English conversational telephony test data from the 2003 and 2004 NIST evaluations. The combination of technical advances and an order of magnitude more training data in 2004 reduced the error rate on the 2003 test set by approximately 21% relative (from 20.4% to 16.1%) over the most accurate system in the 2003 evaluation and produced the most accurate results on the 2004 test sets in every speed category.


Meeting of the Association for Computational Linguistics | 1995

Bayesian Grammar Induction for Language Modeling

Stanley F. Chen

We describe a corpus-based induction algorithm for probabilistic context-free grammars. The algorithm employs a greedy heuristic search within a Bayesian framework and a post-pass using the Inside-Outside algorithm. We compare the performance of our algorithm to n-gram models and the Inside-Outside algorithm on three language modeling tasks. In the two tasks where the training data is generated by a probabilistic context-free grammar, our algorithm outperforms the other techniques. The third task involves naturally occurring data, and on this task our algorithm does not perform as well as n-gram models but vastly outperforms the Inside-Outside algorithm.
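
At a high level, a greedy Bayesian search of this kind repeatedly applies the grammar modification that most improves the (unnormalized) log posterior, log P(G) + log P(data | G). The Python sketch below captures only that loop; propose_moves, log_prior, and log_likelihood are hypothetical stand-ins for the paper's actual move set, prior, and inside-algorithm likelihood.

```python
def greedy_grammar_induction(grammar, data, propose_moves, log_prior, log_likelihood):
    """Greedy heuristic search for a PCFG under a Bayesian objective (sketch).

    propose_moves(grammar)         -> candidate modified grammars (e.g. new rules)
    log_prior(grammar)             -> log P(G), favoring compact grammars
    log_likelihood(grammar, data)  -> log P(data | G)
    """
    def score(g):
        return log_prior(g) + log_likelihood(g, data)

    current, current_score = grammar, score(grammar)
    while True:
        candidates = [(score(g), g) for g in propose_moves(current)]
        if not candidates:
            break
        best_score, best = max(candidates, key=lambda t: t[0])
        if best_score <= current_score:
            break                      # no move improves the posterior; stop
        current, current_score = best, best_score
    return current                     # a post-pass (Inside-Outside) would follow
```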


North American Chapter of the Association for Computational Linguistics | 2009

Performance Prediction for Exponential Language Models

Stanley F. Chen

We investigate the task of performance prediction for language models belonging to the exponential family. First, we attempt to empirically discover a formula for predicting test set cross-entropy for n-gram language models. We build models over varying domains, data set sizes, and n-gram orders, and perform linear regression to see whether we can model test set performance as a simple function of training set performance and various model statistics. Remarkably, we find a simple relationship that predicts test set performance with a correlation of 0.9997. We analyze why this relationship holds and show that it holds for other exponential language models as well, including class-based models and minimum discrimination information models. Finally, we discuss how this relationship can be applied to improve language model performance.
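
The abstract does not reproduce the formula, but the regression setup it describes can be sketched as follows in Python, using made-up numbers purely for illustration: each trained model contributes its training cross-entropy and a parameter-magnitude statistic, and test cross-entropy is fit as a linear function of those quantities.

```python
import numpy as np

# Illustrative (made-up) measurements for a hypothetical set of exponential
# n-gram models: training cross-entropy (bits/word), sum of parameter
# magnitudes per training event, and observed test cross-entropy.
H_train = np.array([7.42, 7.91, 8.03, 7.55, 8.40, 7.70])
size    = np.array([0.55, 0.62, 0.48, 0.71, 0.33, 0.60])  # sum(|lambda_i|) / D
H_test  = np.array([7.95, 8.49, 8.48, 8.22, 8.71, 8.28])

# Regress test cross-entropy on training cross-entropy and the size statistic,
# in the spirit of the paper's linear-regression analysis.
X = np.column_stack([H_train, size, np.ones_like(H_train)])
coef, *_ = np.linalg.lstsq(X, H_test, rcond=None)
print("coefficients:", coef)
print("predicted test cross-entropy:", X @ coef)
```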


Computer Speech & Language | 1994

Automatic speech recognition in machine-aided translation

Peter F. Brown; Stanley F. Chen; S. Della Pietra; V. Della Pietra; Andrew Kehler; Robert L. Mercer

It has been observed that humans can translate nearly four times as quickly, with little loss in accuracy, simply by dictating rather than typing their translations. In this paper, we consider the integration of speech recognition into a translator's workstation. In particular, we show how to combine statistical models of speech, language and translation into a single system that decodes a sequence of words in a target language from a sequence of words in a source language together with an utterance of the target language sequence. We provide results demonstrating that the difficulty of the speech recognition task can be reduced by making use of information contained in the source text being translated.
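
One standard way to combine the three models the abstract mentions is a noisy-channel factorization; a hedged sketch (the paper's exact decomposition may differ):

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f, a)
        \;=\; \arg\max_{e} P(a \mid e)\, P(e \mid f)
        \;=\; \arg\max_{e} \underbrace{P(a \mid e)}_{\text{acoustic model}}
              \,\underbrace{P(f \mid e)}_{\text{translation model}}
              \,\underbrace{P(e)}_{\text{language model}}
```

Here f is the source-language sentence, e the target-language word sequence, and a the acoustics of the dictated translation; the first step assumes the acoustics depend on the source text only through e, and the second applies Bayes' rule to P(e | f).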


IEEE Automatic Speech Recognition and Understanding Workshop | 2009

Scaling shrinkage-based language models

Stanley F. Chen; Lidia Mangu; Bhuvana Ramabhadran; Ruhi Sarikaya; Abhinav Sethy

In [1], we show that a novel class-based language model, Model M, and the method of regularized minimum discrimination information (rMDI) models outperform comparable methods on moderate amounts of Wall Street Journal data. Both of these methods are motivated by the observation that shrinking the sum of parameter magnitudes in an exponential language model tends to improve performance [2]. In this paper, we investigate whether these shrinkage-based techniques also perform well on larger training sets and on other domains. First, we explain why good performance on large data sets is uncertain, by showing that gains relative to a baseline n-gram model tend to decrease as training set size increases. Next, we evaluate several methods for data/model combination with Model M and rMDI models on limited-scale domains, to uncover which techniques should work best on large domains. Finally, we apply these methods on a variety of medium-to-large-scale domains covering several languages, and show that Model M consistently provides significant gains over existing language models for state-of-the-art systems in both speech recognition and machine translation.
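
The shrinkage observation can be made concrete as a regularized training criterion. Below is a hedged sketch of an l1+l2-penalized objective of the kind used for regularized exponential language models; the abstract does not give the specific penalty or hyperparameters used for Model M and rMDI.

```latex
\mathcal{O}(\Lambda) \;=\; -\sum_{d=1}^{D} \log p_{\Lambda}(y_d \mid x_d)
  \;+\; \alpha \sum_i |\lambda_i| \;+\; \frac{1}{2\sigma^2} \sum_i \lambda_i^2
```

A smaller sum of parameter magnitudes, i.e. more shrinkage, is the statistic the paper links to better generalization.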


International Conference on Acoustics, Speech, and Signal Processing | 2006

Morpheme-Based Language Modeling for Arabic LVCSR

Ghinwa F. Choueiter; Daniel Povey; Stanley F. Chen; Geoffrey Zweig

In this paper, we concentrate on Arabic speech recognition. Taking advantage of the rich morphological structure of the language, we use morpheme-based language modeling to improve the word error rate. We propose a simple constraining method to rid the decoding output of illegal morpheme sequences. We report results for word and morpheme language models using medium (64k-word) and large (~800k-word) vocabularies; the morpheme LM obtains an absolute improvement of 2.4% for the former and only 0.2% for the latter. The 2.4% gain surpasses previous gains for morpheme-based LMs for Arabic, and the large-vocabulary runs represent the first comparative results for vocabularies of this size for any language. Finally, we analyze the performance of the morpheme LM on word OOVs.
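
The constraining step can be pictured as a filter on decoded morpheme sequences that cannot be stitched back into words. The Python sketch below uses a hypothetical marking scheme (prefixes end in '+', suffixes begin with '+', stems are bare); the paper's actual segmentation markers and constraint mechanism may differ, and the example tokens are illustrative only.

```python
def is_legal(morphemes):
    """Reject morpheme sequences that cannot be glued back into words.

    Hypothetical scheme: a word must have the shape prefix* stem suffix*,
    so every prefix needs a stem somewhere to its right and every suffix
    a stem somewhere to its left.
    """
    for i, m in enumerate(morphemes):
        if m.endswith('+'):                      # prefix: must not end a word
            nxt = morphemes[i + 1] if i + 1 < len(morphemes) else None
            if nxt is None or nxt.startswith('+'):
                return False
        if m.startswith('+'):                    # suffix: must not start a word
            prev = morphemes[i - 1] if i > 0 else None
            if prev is None or prev.endswith('+'):
                return False
    return True


# Illustrative tokens: prefix + stem + suffix is legal, prefix + suffix is not.
assert is_legal(["wa+", "ktub", "+hum"]) and not is_legal(["wa+", "+hum"])
```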
