Ronald Rosenfeld
Carnegie Mellon University
Publications
Featured research published by Ronald Rosenfeld.
Computer Speech & Language | 1996
Ronald Rosenfeld
An adaptive statistical language model is described, which successfully integrates long-distance linguistic information with other knowledge sources. Most existing statistical language models exploit only the immediate history of a text. To extract information from further back in the document's history, we propose and use trigger pairs as the basic information-bearing elements. This allows the model to adapt its expectations to the topic of discourse. Next, statistical evidence from multiple sources must be combined. Traditionally, linear interpolation and its variants have been used, but these are shown here to be seriously deficient. Instead, we apply the principle of Maximum Entropy (ME). Each information source gives rise to a set of constraints, to be imposed on the combined estimate. The intersection of these constraints is the set of probability functions which are consistent with all the information sources. The function with the highest entropy within that set is the ME solution. Given consistent statistical evidence, a unique ME solution is guaranteed to exist, and an iterative algorithm exists which is guaranteed to converge to it. The ME framework is extremely general: any phenomenon that can be described in terms of statistics of the text can be readily incorporated. An adaptive language model based on the ME approach was trained on the Wall Street Journal corpus and showed a 32–39% perplexity reduction over the baseline. When interfaced to SPHINX-II, Carnegie Mellon's speech recognizer, it reduced the recognizer's error rate by 10–14%. This illustrates the feasibility of incorporating many diverse knowledge sources in a single, unified statistical framework.
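As a rough illustration of how trigger pairs can be harvested, the sketch below (our own illustration, not the paper's implementation) scans each training document, records which word pairs (A, B) have A appearing somewhere before B in the same document, and ranks the pairs by a simplified mutual information score. The document iterator, the single-term MI score, and the top_k cutoff are assumptions made for this example.

```python
import math
from collections import Counter

def select_trigger_pairs(documents, top_k=1000):
    """Rank word pairs (A, B) by how strongly seeing A earlier in a document
    predicts seeing B later in the same document.

    documents: iterable of token lists, one list per document.
    """
    n_docs = 0
    doc_freq = Counter()    # documents containing each word
    pair_freq = Counter()   # documents in which A occurs before B
    for tokens in documents:
        n_docs += 1
        seen_so_far = set()
        triggered = set()   # (A, B) pairs already credited in this document
        for w in tokens:
            for a in seen_so_far:
                if a != w:
                    triggered.add((a, w))
            seen_so_far.add(w)
        for w in seen_so_far:
            doc_freq[w] += 1
        for pair in triggered:
            pair_freq[pair] += 1

    def score(pair):
        a, b = pair
        p_a = doc_freq[a] / n_docs
        p_b = doc_freq[b] / n_docs
        p_ab = pair_freq[pair] / n_docs
        # One term of the mutual information between "A seen earlier" and
        # "B occurs later"; a full treatment would average over all four
        # outcome combinations, which this sketch omits.
        return p_ab * math.log(p_ab / (p_a * p_b))

    return sorted(pair_freq, key=score, reverse=True)[:top_k]
```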
Proceedings of the IEEE | 2000
Ronald Rosenfeld
Statistical language models estimate the distribution of various natural language phenomena for the purpose of speech recognition and other language technologies. Since the first significant model was proposed in 1980, many attempts have been made to improve the state of the art. We review them, point to a few promising directions, and argue for a Bayesian approach to integration of linguistic theories with data.
Computer Speech & Language | 1992
Xuedong Huang; Fileno A. Alleva; Hsiao-Wuen Hon; Mei-Yuh Hwang; Ronald Rosenfeld
For speech recognizers to deal with increased task perplexity, speaker variation, and environment variation, improved recognition technology is critical. Steady progress has been made along these three dimensions at Carnegie Mellon. In this paper, we review the SPHINX-II speech recognition system and summarize our recent efforts on improved speech recognition.
international conference on acoustics, speech, and signal processing | 1993
Raymond Lau; Ronald Rosenfeld; Salim Roukos
Ongoing efforts at adaptive statistical language modeling are described. To extract information from the document history, trigger pairs are used as the basic information-bearing elements. To combine statistical evidence from multiple triggers, the principle of maximum entropy (ME) is used. To combine the trigger-based model with the static model, the latter is absorbed into the ME formalism. Given consistent statistical evidence, a unique ME solution is guaranteed to exist, and an iterative algorithm exists which is guaranteed to converge to it. Among the advantages of this approach are its simplicity, generality, and incremental nature. Among its disadvantages are its computational requirements. The model described here was trained on five million words of Wall Street Journal text. It used some 40,000 unigram constraints, 200,000 bigram constraints, 200,000 trigram constraints, and 60,000 trigger constraints. After 13 iterations, it produced a language model whose perplexity was 12% lower than that of a conventional trigram, as measured on independent data.
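The abstract does not name the iterative algorithm; the sketch below uses Generalized Iterative Scaling (GIS) as one standard choice for fitting a maximum entropy model to feature-expectation constraints. The toy event space, the slack feature, and the fixed iteration count are assumptions for illustration, not details from the paper.

```python
import math

def gis(events, features, empirical, iterations=100):
    """Generalized Iterative Scaling for a toy maximum entropy model.

    events:    list of outcomes x
    features:  list of functions f_i(x) returning non-negative values
    empirical: list of target expectations E_emp[f_i] (the constraints);
               assumed strictly positive and mutually consistent
    Returns feature weights lambda_i whose model expectations match the
    constraints.
    """
    # GIS requires feature values to sum to the same constant C for every
    # event; pad with a slack feature so that this holds.
    C = max(sum(f(x) for f in features) for x in events)
    slack = lambda x: C - sum(f(x) for f in features)
    all_feats = features + [slack]
    targets = empirical + [C - sum(empirical)]

    weights = [0.0] * len(all_feats)
    for _ in range(iterations):
        # Current model distribution p(x) proportional to exp(sum_i w_i f_i(x)).
        scores = [math.exp(sum(w * f(x) for w, f in zip(weights, all_feats)))
                  for x in events]
        z = sum(scores)
        probs = [s / z for s in scores]
        # Model expectation of each feature under p.
        expected = [sum(p * f(x) for p, x in zip(probs, events))
                    for f in all_feats]
        # Multiplicative (log-additive) GIS update; skip degenerate features.
        weights = [w if t <= 0 or e <= 0 else w + math.log(t / e) / C
                   for w, t, e in zip(weights, targets, expected)]
    return weights[:len(features)]
```

The slack feature is the usual device for satisfying GIS's requirement that feature values sum to the same constant for every event; it is an implementation convenience rather than part of the constraints themselves.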
IEEE Transactions on Speech and Audio Processing | 2000
Stanley F. Chen; Ronald Rosenfeld
In certain contexts, maximum entropy (ME) modeling can be viewed as maximum likelihood (ML) training for exponential models, and like other ML methods is prone to overfitting of training data. Several smoothing methods for ME models have been proposed to address this problem, but previous results do not make it clear how these smoothing methods compare with smoothing methods for other types of related models. In this work, we survey previous work in ME smoothing and compare the performance of several of these algorithms with conventional techniques for smoothing n-gram language models. Because of the mature body of research in n-gram model smoothing and the close connection between ME and conventional n-gram models, this domain is well-suited to gauge the performance of ME smoothing methods. Over a large number of data sets, we find that fuzzy ME smoothing performs as well as or better than all other algorithms under consideration. We contrast this method with previous n-gram smoothing methods to explain its superior performance.
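As one concrete example of smoothing an exponential model, the sketch below adds a Gaussian-prior (quadratic) penalty on the weights to the training objective, which shrinks parameters toward zero and limits overfitting. This is a generic illustration, not necessarily the fuzzy ME method evaluated in the paper, and the toy feature interface and variance sigma2 are assumptions.

```python
import math

def penalized_log_likelihood(weights, data, features, sigma2=1.0):
    """Log-likelihood of a toy conditional exponential model with a Gaussian prior.

    data:     list of (context, outcome) pairs
    features: list of functions f_i(context, outcome)
    The quadratic penalty -w_i^2 / (2 * sigma2) discourages large weights,
    one standard way to smooth maximum entropy models.
    """
    def score(c, y):
        return sum(w * f(c, y) for w, f in zip(weights, features))

    outcomes = sorted({y for _, y in data})
    ll = 0.0
    for c, y in data:
        z = sum(math.exp(score(c, yy)) for yy in outcomes)
        ll += score(c, y) - math.log(z)
    penalty = sum(w * w for w in weights) / (2.0 * sigma2)
    return ll - penalty
```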
international conference on acoustics, speech, and signal processing | 2001
Xiaojin Zhu; Ronald Rosenfeld
We propose a method for using the World Wide Web to acquire trigram estimates for statistical language modeling. We submit an N-gram as a phrase query to Web search engines. The search engines return the number of Web pages containing the phrase, from which the N-gram count is estimated. The N-gram counts are then used to form Web-based trigram probability estimates. We discuss the properties of such estimates, and methods to interpolate them with traditional corpus based trigram estimates. We show that the interpolated models improve speech recognition word error rate significantly over a small test set.
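A hedged sketch of the estimation and interpolation steps: the page_count function stands in for a search-engine query (no real API is assumed), and the interpolation weight lam is a placeholder, not a value from the paper.

```python
def web_trigram_probability(w1, w2, w3, page_count):
    """Estimate P(w3 | w1, w2) from Web page counts.

    page_count: caller-supplied function mapping an exact phrase to the number
    of pages a search engine reports for it (stubbed here; the paper's actual
    engines and query details are not assumed).
    """
    trigram_hits = page_count(f'"{w1} {w2} {w3}"')
    bigram_hits = page_count(f'"{w1} {w2}"')
    if bigram_hits == 0:
        return None  # no Web evidence for this context
    return trigram_hits / bigram_hits

def interpolated_probability(p_corpus, p_web, lam=0.7):
    """Linearly interpolate a corpus trigram estimate with a Web-based one.

    lam is a tunable weight on the corpus model (placeholder value); when no
    Web estimate is available, fall back to the corpus estimate alone.
    """
    if p_web is None:
        return p_corpus
    return lam * p_corpus + (1.0 - lam) * p_web
```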
international conference on spoken language processing | 1996
Kristie Seymore; Ronald Rosenfeld
When a trigram backoff language model is created from a large body of text, trigrams and bigrams that occur few times in the training text are often excluded from the model in order to decrease the model size. Generally, the elimination of n-grams with very low counts is believed not to significantly affect model performance. This project investigates the degradation of a trigram backoff model's perplexity and word error rates as bigram and trigram cutoffs are increased. The advantage of the reduction in model size is compared to the increase in word error rate and perplexity scores. More importantly, this project also investigates alternative ways of excluding bigrams and trigrams from a backoff language model, using criteria other than the number of times an n-gram occurs in the training text. Specifically, a difference method has been investigated in which the difference in the logs of the original and backed-off trigram and bigram probabilities is used as a basis for n-gram exclusion from the model. We show that excluding trigrams and bigrams based on a weighted version of this difference method results in better perplexity and word error rate performance than excluding trigrams and bigrams based on counts alone.
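A hedged sketch of the weighted difference criterion described above: an n-gram is kept only if the count-weighted gap between its explicit log probability and its backed-off log probability exceeds a threshold. The exact weighting scheme and the threshold value are assumptions made for this illustration.

```python
import math

def prune_ngrams(ngram_probs, backoff_prob, counts, threshold=0.05):
    """Select n-grams to drop from a backoff model.

    ngram_probs:  dict mapping an n-gram tuple to its explicit probability
    backoff_prob: function returning the probability the model would assign
                  to that n-gram after backing off to a shorter context
    counts:       dict of training counts, used here to weight the difference
    Keeps an n-gram only if removing it would change its log probability by
    more than `threshold`, scaled by how often it was seen in training.
    """
    kept, pruned = {}, []
    for ngram, p in ngram_probs.items():
        diff = math.log(p) - math.log(backoff_prob(ngram))
        weighted_diff = counts.get(ngram, 1) * diff
        if weighted_diff > threshold:
            kept[ngram] = p
        else:
            pruned.append(ngram)
    return kept, pruned
```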
Interactions | 2001
Ronald Rosenfeld; Dan R. Olsen; Alexander I. Rudnicky
In recent years speech recognition has reached the point of commercial viability, realizable on any off-the-shelf computer. This is a goal that has long been sought by both the research community and prospective users. Anyone who has used these technologies understands that the recognition has many flaws and that there is much still to be done. The recognition algorithms are not the whole story: there is still the question of how speech can and should actually be used. Related to this is the issue of tools for the development of speech-based applications. Achieving reliable, accurate speech recognition is similar to building an inexpensive mouse and keyboard: the underlying input technology is available, but the question of how to build the application interface still remains. We have been considering these problems for some time [Rosenfeld et al., 2000a]. In this paper we present some of our thoughts about the future of speech-based interaction. This paper is not a report of results we have obtained, but rather a vision of a future to be explored.
human language technology | 1993
Xuedong Huang; Fileno A. Alleva; Mei-Yuh Hwang; Ronald Rosenfeld
In the past year at Carnegie Mellon, steady progress has been made in the area of acoustic and language modeling. The result has been a dramatic reduction in speech recognition errors in the SPHINX-II system. In this paper, we review SPHINX-II and summarize our recent efforts on improved speech recognition. Recently, SPHINX-II achieved the lowest error rate in the November 1992 DARPA evaluations. For 5000-word, speaker-independent, continuous speech recognition, the error rate was reduced to 5%.
human language technology | 1992
Ronald Rosenfeld; Xuedong Huang
We describe two attempts to improve our stochastic language models. In the first, we identify a systematic overestimation in the traditional backoff model and use statistical reasoning to correct it. Our modification results in up to a 6% reduction in the perplexity of various tasks. Although the improvement is modest, it is achieved with hardly any increase in the complexity of the model. Both analysis and empirical data suggest that the modification is most suitable when training data is sparse. In the second attempt, we propose a new type of adaptive language model. Existing adaptive models use a dynamic cache, based on the history of the document seen up to that point. But another source of information in the history, within-document word sequence correlations, has not yet been tapped. We describe a model that attempts to capture this information, using a framework in which one word sequence triggers another, causing its estimated probability to be raised. We discuss various issues in the design of such a model, and describe our first attempt at building one. Our preliminary results include a perplexity reduction of between 10% and 32%, depending on the test set.
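For context on the first attempt, the sketch below shows the general shape of a traditional backoff estimate, the kind of model whose overestimation is being discussed (a Katz-style scheme is assumed): a seen n-gram uses its discounted relative frequency, and an unseen one backs off to a shorter history with a normalizing weight. The dictionaries, the unk_prob floor, and the recursion over shortened histories are assumed interfaces; the paper's actual correction is not reproduced here.

```python
def backoff_probability(w, history, discounted, alpha, unk_prob=1e-7):
    """Traditional backoff estimate P(w | history).

    discounted: dict {(history, w): discounted relative frequency} for n-grams
                seen in training (history is a tuple of preceding words)
    alpha:      dict {history: normalizing backoff weight}
    unk_prob:   small floor for words never seen at all (an assumption)
    """
    if (history, w) in discounted:
        # Observed n-gram: use its discounted relative frequency.
        return discounted[(history, w)]
    if not history:
        # Word unseen even as a unigram.
        return unk_prob
    # Unseen n-gram: back off to the shorter history, scaled by alpha(history).
    return alpha.get(history, 1.0) * backoff_probability(
        w, history[1:], discounted, alpha, unk_prob)
```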