Stephen A. Della Pietra

Publication

Featured research published by Stephen A. Della Pietra.

meeting of the association for computational linguistics | 1991

Word-Sense Disambiguation Using Statistical Methods

Peter F. Brown; Stephen A. Della Pietra; Vincent J. Della Pietra; Robert L. Mercer

We describe a statistical technique for assigning senses to words. An instance of a word is assigned a sense by asking a question about the context in which the word appears. The question is constructed to have high mutual information with the translation of that instance in another language. When we incorporated this method of assigning senses into our statistical machine translation system, the error rate of the system decreased by thirteen percent.

human language technology | 1994

The Candide system for machine translation

Adam L. Berger; Peter F. Brown; Stephen A. Della Pietra; Vincent J. Della Pietra; John R. Gillett; John D. Lafferty; Robert L. Mercer; Harry Printz; Lubos Ures

We present an overview of Candide, a system for automatic translation of French text to English text. Candide uses methods of information theory and statistics to develop a probability model of the translation process. This model, which is made to accord as closely as possible with a large body of French and English sentence pairs, is then used to generate English translations of previously unseen French sentences. This paper provides a tutorial in these methods, discussions of the training and operation of the system, and a summary of test results.

human language technology | 1991

A statistical approach to sense disambiguation in machine translation

Peter F. Brown; Stephen A. Della Pietra; Vincent J. Della Pietra; Robert L. Mercer

human language technology | 1993

But dictionaries are data too

Peter F. Brown; Stephen A. Della Pietra; Vincent J. Della Pietra; Meredith J. Goldsmith; Jan Hajic; Robert L. Mercer; Surya Mohanty

Although empiricist approaches to machine translation depend vitally on data in the form of large bilingual corpora, bilingual dictionaries are also a source of information. We show how to model at least a part of the information contained in a bilingual dictionary so that we can treat a bilingual dictionary and a bilingual corpus as two facets of a unified collection of data from which to extract values for the parameters of a probabilistic machine translation system. We give an algorithm for obtaining maximum likelihood estimates of the parameters of a probabilistic model from this combined data and we show how these parameters are affected by inclusion of the dictionary for some sample words.

meeting of the association for computational linguistics | 1997

Fertility Models for Statistical Natural Language Understanding

Stephen A. Della Pietra; Mark E. Epstein; Salim Roukos; Todd Ward

Several recent efforts in statistical natural language understanding (NLU) have focused on generating clumps of English words from semantic meaning concepts (Miller et al., 1995; Levin and Pieraccini, 1995; Epstein et al., 1996; Epstein, 1996). This paper extends the IBM Machine Translation Group's concept of fertility (Brown et al., 1993) to the generation of clumps for natural language understanding. The basic underlying intuition is that a single concept may be expressed in English as many disjoint clumps of words. We present two fertility models which attempt to capture this phenomenon. The first is a Poisson model which leads to appealing computational simplicity. The second is a general nonparametric fertility model. The general model's parameters are bootstrapped from the Poisson model and updated by the EM algorithm. These fertility models can be used to impose clump fertility structure on top of preexisting clump generation models. Here, we present results for adding fertility structure to unigram, bigram, and headword clump generation models on ARPA's Air Travel Information Service (ATIS) domain.

human language technology | 1992

Dividing and conquering long sentences in a translation system

Peter F. Brown; Stephen A. Della Pietra; Vincent J. Della Pietra; Robert L. Mercer; Surya Mohanty

The time required for our translation system to handle a sentence of length l is a rapidly growing function of l. We describe here a method for analyzing a sentence into a series of pieces that can be translated sequentially. We show that for sentences with ten or fewer words, it is possible to decrease the translation time by 40% with almost no effect on translation accuracy. We argue that for longer sentences, the effect should be more dramatic.

Computational Linguistics | 1993