Publication


Featured research published by Robert L. Mercer.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 1983

A Maximum Likelihood Approach to Continuous Speech Recognition

Lalit R. Bahl; Frederick Jelinek; Robert L. Mercer

Speech recognition is formulated as a problem of maximum likelihood decoding. This formulation requires statistical models of the speech production process. In this paper, we describe a number of statistical models for use in speech recognition. We give special attention to determining the parameters for such models from sparse data. We also describe two decoding methods, one appropriate for constrained artificial languages and one appropriate for more realistic decoding tasks. To illustrate the usefulness of the methods described, we review a number of decoding results that have been obtained with them.
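
The decoding rule the abstract formulates can be made concrete with a toy example. The sketch below is a minimal illustration in Python, not the paper's system: the acoustic and language-model log-probability tables are invented, and decoding is simply a search for the word sequence W maximizing P(A|W)P(W) for a fixed acoustic observation A.

```python
# Minimal sketch of maximum likelihood decoding: choose the word sequence W
# maximizing P(A | W) * P(W). All scores below are hypothetical.
acoustic_logprob = {                      # log P(A | W) from an acoustic model
    ("recognize", "speech"): -12.1,
    ("wreck", "a", "nice", "beach"): -11.8,
}
language_logprob = {                      # log P(W) from a language model
    ("recognize", "speech"): -4.2,
    ("wreck", "a", "nice", "beach"): -9.7,
}

def decode(candidates):
    """Return the candidate with the highest combined log score."""
    return max(candidates,
               key=lambda w: acoustic_logprob[w] + language_logprob[w])

print(" ".join(decode(acoustic_logprob)))  # -> recognize speech
```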


International Conference on Acoustics, Speech, and Signal Processing | 1986

Maximum mutual information estimation of hidden Markov model parameters for speech recognition

Lalit R. Bahl; Peter F. Brown; P. V. de Souza; Robert L. Mercer

A method for estimating the parameters of hidden Markov models of speech is described. Parameter values are chosen to maximize the mutual information between an acoustic observation sequence and the corresponding word sequence. Recognition results are presented comparing this method with maximum likelihood estimation.
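
A minimal sketch of the criterion, assuming the joint log-probabilities log P(A|W) + log P(W) are already available for a reference transcription and a set of competing word sequences (everything below is hypothetical, not the paper's HMM training procedure):

```python
import math

def mmi_objective(ref, candidates, acoustic_logp, lm_logp):
    """Mutual-information criterion for one utterance:
    log P(A, W_ref) - log sum over candidates W of P(A, W).
    Maximum likelihood estimation would maximize only the first term."""
    joint = lambda w: acoustic_logp[w] + lm_logp[w]   # log P(A|W) + log P(W)
    denom = math.log(sum(math.exp(joint(w)) for w in candidates))
    return joint(ref) - denom

# Hypothetical scores for one acoustic observation and two candidate words.
acoustic = {"yes": -1.0, "yet": -1.5}
lm = {"yes": -0.5, "yet": -1.0}
print(mmi_objective("yes", ["yes", "yet"], acoustic, lm))  # ~ -0.31
```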


Meeting of the Association for Computational Linguistics | 1991

Aligning Sentences in Parallel Corpora

Peter F. Brown; Jennifer Lai; Robert L. Mercer

In this paper we describe a statistical technique for aligning sentences with their translations in two parallel corpora. In addition to certain anchor points that are available in our data, the only information about the sentences that we use for calculating alignments is the number of tokens that they contain. Because we make no use of the lexical details of the sentence, the alignment computation is fast and therefore practical for application to very large collections of text. We have used this technique to align several million sentences in the English-French Hansard corpora and have achieved an accuracy in excess of 99% in a randomly selected set of 1000 sentence pairs that we checked by hand. We show that even without the benefit of anchor points the correlation between the lengths of aligned sentences is strong enough that we should expect to achieve an accuracy of between 96% and 97%. Thus, the technique may be applicable to a wider variety of texts than we have yet tried.
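
The central idea, aligning on token counts alone, can be sketched as a small dynamic program over "beads" of sentences. The cost function and skip penalty below are hypothetical stand-ins for the paper's probabilistic model:

```python
# Length-based alignment in the spirit of the paper (a sketch, not the
# authors' exact model): align sentences using only token counts, allowing
# 1-1, 1-0, 0-1, 2-1, and 1-2 correspondences ("beads").
def cost(src_len, tgt_len):
    return abs(src_len - tgt_len)           # hypothetical length-mismatch penalty

def align(src, tgt):
    """src, tgt: lists of sentence lengths in tokens. Returns aligned spans."""
    INF = float("inf")
    SKIP = 3.0                               # assumed extra cost for 1-0 / 0-1 beads
    n, m = len(src), len(tgt)
    best = [[(INF, None)] * (m + 1) for _ in range(n + 1)]
    best[0][0] = (0.0, None)
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j][0] == INF:
                continue
            for di, dj in ((1, 1), (1, 0), (0, 1), (2, 1), (1, 2)):
                if i + di > n or j + dj > m:
                    continue
                c = cost(sum(src[i:i + di]), sum(tgt[j:j + dj]))
                if di == 0 or dj == 0:
                    c += SKIP
                if best[i][j][0] + c < best[i + di][j + dj][0]:
                    best[i + di][j + dj] = (best[i][j][0] + c, (i, j, di, dj))
    path, i, j = [], n, m                    # trace back the cheapest alignment
    while best[i][j][1] is not None:
        i0, j0, di, dj = best[i][j][1]
        path.append(((i0, i0 + di), (j0, j0 + dj)))
        i, j = i0, j0
    return path[::-1]

print(align([10, 4, 12], [11, 15]))          # [((0, 1), (0, 1)), ((1, 3), (1, 2))]
```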


Meeting of the Association for Computational Linguistics | 1991

Word-Sense Disambiguation Using Statistical Methods

Peter F. Brown; Stephen A. Della Pietra; Vincent J. Della Pietra; Robert L. Mercer

We describe a statistical technique for assigning senses to words. An instance of a word is assigned a sense by asking a question about the context in which the word appears. The question is constructed to have high mutual information with the translation of that instance in another language. When we incorporated this method of assigning senses into our statistical machine translation system, the error rate of the system decreased by thirteen percent.
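
The question-construction criterion can be illustrated directly: compute the mutual information between a candidate yes/no question about the context and the translation of the word. The toy data below is hypothetical, loosely in the spirit of French "prendre" translating as "make" or "take":

```python
import math
from collections import Counter

def mutual_information(pairs):
    """I(Q; T) in nats for (answer, translation) pairs (a toy estimate)."""
    n = len(pairs)
    pq, pt, pqt = Counter(), Counter(), Counter(pairs)
    for q, t in pairs:
        pq[q] += 1
        pt[t] += 1
    return sum((c / n) * math.log((c / n) / ((pq[q] / n) * (pt[t] / n)))
               for (q, t), c in pqt.items())

# Hypothetical instances: the answer to a context question, paired with the
# English translation of "prendre" observed in that sentence.
data = [(True, "make"), (True, "make"), (True, "make"),
        (False, "take"), (False, "take"), (False, "take")]
print(mutual_information(data))   # log 2 ~ 0.693: a maximally informative question
```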


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1989

A tree-based statistical language model for natural language speech recognition

Lalit R. Bahl; Peter F. Brown; P. V. de Souza; Robert L. Mercer

The problem of predicting the next word a speaker will say, given the words already spoken, is discussed. Specifically, the problem is to estimate the probability that a given word will be the next word uttered. Algorithms are presented for automatically constructing a binary decision tree designed to estimate these probabilities. At each node of the tree there is a yes/no question relating to the words already spoken, and at each leaf there is a probability distribution over the allowable vocabulary. Ideally, these nodal questions can take the form of arbitrarily complex Boolean expressions, but computationally cheaper alternatives are also discussed. Some results obtained on a 5000-word vocabulary with a tree designed to predict the next word spoken from the preceding 20 words are included. The tree is compared to an equivalent trigram model and shown to be superior.
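
A minimal sketch of the node-splitting idea: among candidate yes/no questions about the words already spoken, prefer the one whose split gives the lowest average entropy of the next-word distribution. The training pairs and the candidate question below are hypothetical:

```python
import math
from collections import Counter

def entropy(words):
    n = len(words)
    return -sum((c / n) * math.log2(c / n) for c in Counter(words).values())

def split_entropy(pairs, question):
    """Average next-word entropy after splitting on a yes/no question of the
    history; lower is better when growing the tree (a sketch)."""
    yes = [w for h, w in pairs if question(h)]
    no = [w for h, w in pairs if not question(h)]
    n = len(pairs)
    return (len(yes) / n) * entropy(yes) + (len(no) / n) * entropy(no)

# Hypothetical training data: (history of preceding words, next word).
pairs = [(("the", "cat"), "sat"), (("the", "dog"), "sat"),
         (("i", "will"), "go"), (("we", "will"), "go")]
question = lambda h: h[-1] == "will"    # one candidate question about the history
print(split_entropy(pairs, question))   # 0.0: fully predicts the next word
```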


IEEE Transactions on Information Theory | 1975

Design of a linguistic statistical decoder for the recognition of continuous speech

Frederick Jelinek; Lalit R. Bahl; Robert L. Mercer

Most current attempts at automatic speech recognition are formulated in an artificial intelligence framework. In this paper we approach the problem from an information-theoretic point of view. We describe the overall structure of a linguistic statistical decoder (LSD) for the recognition of continuous speech. The input to the decoder is a string of phonetic symbols estimated by an acoustic processor (AP). For each phonetic string, the decoder finds the most likely input sentence. The decoder consists of four major subparts: 1) a statistical model of the language being recognized; 2) a phonemic dictionary and statistical phonological rules characterizing the speaker; 3) a phonetic matching algorithm that computes the similarity between phonetic strings, using the performance characteristics of the AP; 4) a word level search control. The details of each of the subparts and their interaction during the decoding process are discussed.
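
How the four subparts interact can be caricatured in a few lines of Python. The language-model scores, the pronunciation dictionary, and the crude phonetic match penalty below are all hypothetical stand-ins, not the paper's components:

```python
# Toy sketch of the decoder's four subparts working together.
lm_logprob = {"their cat": -3.0, "there cat": -6.0}   # 1) language model, log P(W)
pronunciation = {"their cat": "DH EH R K AE T",       # 2) phonemic dictionary
                 "there cat": "DH EH R K AE T"}

def match_logprob(ap_output, dict_pron):
    """3) Crude phonetic match: penalize each disagreeing symbol."""
    a, b = ap_output.split(), dict_pron.split()
    agree = sum(x == y for x, y in zip(a, b))
    return -2.0 * (max(len(a), len(b)) - agree)       # assumed mismatch penalty

def decode(phonetic_string, candidates):
    """4) Search: return the most likely sentence given the AP output."""
    return max(candidates, key=lambda s: lm_logprob[s]
               + match_logprob(phonetic_string, pronunciation[s]))

print(decode("DH EH R K AE T", ["their cat", "there cat"]))  # -> their cat
```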


Information Processing and Management | 1991

Context based spelling correction

Eric Mays; Fred J. Damerau; Robert L. Mercer

Some mistakes in spelling and typing produce correct words, such as typing “fig” when “fog” was intended. These errors are undetectable by traditional spelling correction techniques. In this paper we present a statistical technique capable of detecting and correcting some of these errors when they occur in sentences. Experimental results show that this technique is capable of detecting 76% of simple spelling errors and correcting 73%.
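
The statistical idea is a re-ranking over a confusion set of real words: weigh how likely each variant is in context against how likely the observed typo would be. The bigram and typo probabilities below are hypothetical stand-ins for the trained models such a system would use:

```python
# Noisy-channel sketch: a typed real word may be a slip for a neighbor in
# its confusion set; pick the variant that best explains the sentence.
confusion = {"fig": ["fig", "fog", "fit"]}           # real-word neighbors
bigram_logp = {("the", "fog"): -2.0, ("fog", "rolled"): -2.5,
               ("the", "fig"): -4.0, ("fig", "rolled"): -9.0,
               ("the", "fit"): -5.0, ("fit", "rolled"): -8.0}
typo_logp = {"fig": -0.1, "fog": -4.0, "fit": -4.5}  # log P(typed "fig" | intended w)

def correct(sentence, i):
    """Re-rank the confusion set of sentence[i] using its immediate context."""
    def score(w):
        return (typo_logp[w]
                + bigram_logp.get((sentence[i - 1], w), -20.0)
                + bigram_logp.get((w, sentence[i + 1]), -20.0))
    return max(confusion[sentence[i]], key=score)

print(correct(["the", "fig", "rolled", "in"], 1))    # -> fog
```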


Meeting of the Association for Computational Linguistics | 1993

Towards History-based Grammars: Using Richer Models for Probabilistic Parsing

Ezra Black; Frederick Jelinek; John Lafferty; David M. Magerman; Robert L. Mercer; Salim Roukos

We describe a generative probabilistic model of natural language, which we call HBG, that takes advantage of detailed linguistic information to resolve ambiguity. HBG incorporates lexical, syntactic, semantic, and structural information from the parse tree into the disambiguation process in a novel way. We use a corpus of bracketed sentences, called a Treebank, in combination with decision tree building to tease out the relevant aspects of a parse tree that will determine the correct parse of a sentence. This stands in contrast to the usual approach of further grammar tailoring via linguistic introspection in the hope of generating the correct parse. In head-to-head tests against one of the best existing robust probabilistic parsing models, which we call P-CFG, the HBG model significantly outperforms P-CFG, increasing the parsing accuracy rate from 60% to 75%, a 37% reduction in error.
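
The contrast with P-CFG comes down to what a rule probability is conditioned on. In the toy sketch below, a P-CFG conditions a rule only on its left-hand side, while the history-based estimate conditions on a single hypothetical "parent" feature; the actual HBG model uses far richer lexical, syntactic, and semantic features:

```python
# Hypothetical rule occurrences: (rule, history features at the point of use).
rules = [("NP -> DT NN", {"parent": "S"}), ("NP -> PRP", {"parent": "VP"}),
         ("NP -> DT NN", {"parent": "S"}), ("NP -> PRP", {"parent": "VP"}),
         ("NP -> DT NN", {"parent": "VP"})]

def pcfg_prob(rule):
    """P(rule | left-hand side): history-free, as in P-CFG."""
    lhs = rule.split(" ->")[0]
    same_lhs = [r for r, _ in rules if r.split(" ->")[0] == lhs]
    return sum(r == rule for r in same_lhs) / len(same_lhs)

def hbg_prob(rule, history):
    """P(rule | history features): sharper where the context is informative."""
    matching = [r for r, h in rules if h == history]
    return sum(r == rule for r in matching) / len(matching)

print(pcfg_prob("NP -> PRP"))                   # 0.4 over all NP expansions
print(hbg_prob("NP -> PRP", {"parent": "VP"}))  # ~0.67 under a VP parent
```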


Archive | 1992

Basic Methods of Probabilistic Context Free Grammars

Frederick Jelinek; John D. Lafferty; Robert L. Mercer

In automatic speech recognition, language models can be represented by Probabilistic Context Free Grammars (PCFGs). In this lecture we review some known algorithms which handle PCFGs; in particular an algorithm for the computation of the total probability that a PCFG generates a given sentence (Inside), an algorithm for finding the most probable parse tree (Viterbi), and an algorithm for the estimation of the probabilities of the rewriting rules of a PCFG given a corpus (Inside-Outside). Moreover, we introduce the Left-to-Right Inside algorithm, which computes the probability that successive applications of the grammar rewriting rules (beginning with the sentence start symbol s) produce a word string whose initial substring is a given one.
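
Of the algorithms reviewed, the Inside computation is short enough to sketch for a PCFG in Chomsky normal form. The toy grammar below is hypothetical:

```python
# Minimal Inside algorithm for a CNF PCFG: chart[i, j, A] is the probability
# that nonterminal A generates words i..j-1.
from collections import defaultdict

binary = {("S", ("NP", "VP")): 1.0, ("NP", ("DT", "NN")): 1.0}   # A -> B C
lexical = {("VP", "sleeps"): 1.0, ("DT", "the"): 1.0, ("NN", "dog"): 1.0}

def inside(words, start="S"):
    n = len(words)
    chart = defaultdict(float)
    for i, w in enumerate(words):                  # width-1 spans from A -> w
        for (a, word), p in lexical.items():
            if word == w:
                chart[i, i + 1, a] += p
    for width in range(2, n + 1):                  # wider spans, bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):              # split point
                for (a, (b, c)), p in binary.items():
                    chart[i, j, a] += p * chart[i, k, b] * chart[k, j, c]
    return chart[0, n, start]                      # total sentence probability

print(inside(["the", "dog", "sleeps"]))            # 1.0 with these toy rules
```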


International Conference on Acoustics, Speech, and Signal Processing | 1988

A new algorithm for the estimation of hidden Markov model parameters

Lalit R. Bahl; Peter F. Brown; P. V. de Souza; Robert L. Mercer

The problem of estimating the parameter values of hidden Markov word models for speech recognition is discussed. The authors argue that maximum-likelihood estimation of the parameters does not lead to values which maximize recognition accuracy, and they describe an alternative estimation procedure, called corrective training, that aims to minimize the number of recognition errors. Corrective training is similar to a well-known error-correcting training procedure for linear classifiers and works by iteratively adjusting the parameter values so as to make correct words more probable and incorrect words less probable. There are also strong parallels between corrective training and maximum mutual information estimation. The authors do not prove that the corrective training algorithm converges, but experimental evidence suggests that it does, and that it leads to significantly fewer recognition errors than maximum likelihood estimation.
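
The error-driven flavor of corrective training can be caricatured in a few lines. The per-word scores and the fixed-step update below are hypothetical stand-ins; the actual procedure adjusts hidden Markov model parameters:

```python
def corrective_training(score, data, step=0.6, rounds=5):
    """score: dict word -> adjustable score; data: (candidates, truth) pairs.
    After each misrecognition, raise the correct word's score and lower the
    wrong word's score, then iterate (a sketch; convergence not guaranteed)."""
    def recognize(candidates):
        return max(candidates, key=lambda w: score[w])
    for _ in range(rounds):
        errors = 0
        for candidates, truth in data:
            guess = recognize(candidates)
            if guess != truth:                 # error-driven update
                score[truth] += step
                score[guess] -= step
                errors += 1
        if errors == 0:
            break
    return score

scores = {"bit": 0.0, "bet": 1.0}
data = [(["bit", "bet"], "bit"), (["bit", "bet"], "bit")]
print(corrective_training(scores, data))       # -> {'bit': 0.6, 'bet': 0.4}
```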
