Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Ralf D. Brown is active.

Publication


Featured research published by Ralf D. Brown.


international conference on computational linguistics | 1996

Example-Based Machine Translation in the Pangloss system

Ralf D. Brown

The Pangloss Example-Based Machine Translation engine (PanEBMT) is a translation system requiring essentially no knowledge of the structure of a language, merely a large parallel corpus of example sentences and a bilingual dictionary. Input texts are segmented into sequences of words occurring in the corpus, for which translations are determined by subsentential alignment of the sentence pairs containing those sequences. These partial translations are then combined with the results of other translation engines to form the final translation produced by the Pangloss system. In an internal evaluation, PanEBMT achieved 70.2% coverage of unrestricted Spanish news-wire text, despite a simplistic subsentential alignment algorithm, a suboptimal dictionary, and a corpus from a different domain than the evaluation texts.
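As an illustration of the lookup step such a system relies on, the sketch below segments an input sentence into the longest word sequences found in a table of pre-aligned fragments. The fragment table, the toy Spanish-English entries, and the greedy longest-match policy are all assumptions for illustration, not PanEBMT's actual data structures or matching strategy.

```python
# Minimal sketch of the phrase-lookup step in example-based MT (not the actual
# PanEBMT code): segment the input into the longest word sequences that appear
# in a hypothetical table of pre-aligned source/target fragments.

def segment_and_translate(input_words, fragment_table):
    """Greedy longest-match segmentation against known source fragments.

    fragment_table maps a tuple of source words to a candidate target string;
    in the real system such fragments come from subsentential alignment of
    corpus sentence pairs rather than a fixed table.
    """
    i, partial_translations = 0, []
    while i < len(input_words):
        # Try the longest span starting at position i first.
        for j in range(len(input_words), i, -1):
            span = tuple(input_words[i:j])
            if span in fragment_table:
                partial_translations.append((span, fragment_table[span]))
                i = j
                break
        else:
            # No fragment covers this word; leave it untranslated.
            partial_translations.append(((input_words[i],), None))
            i += 1
    return partial_translations

# Toy usage with an invented Spanish-English fragment table.
table = {("el", "gato"): "the cat", ("duerme",): "sleeps"}
print(segment_and_translate(["el", "gato", "duerme"], table))
```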


international conference on computational linguistics | 1988

Anaphora resolution: a multi-strategy approach

Jaime G. Carbonell; Ralf D. Brown

Anaphora resolution has proven to be a very difficult problem; it requires the integrated application of syntactic, semantic, and pragmatic knowledge. This paper examines the hypothesis that instead of attempting to construct a monolithic method for resolving anaphora, the combination of multiple strategies, each exploiting a different knowledge source, proves more effective - theoretically and computationally. Cognitive plausibility is established in that human judgements of the optimal anaphoric referent accord with those of the strategy-based method, and human inability to determine a unique referent corresponds to the cases where different strategies offer conflicting candidates for the anaphoric referent.
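A minimal sketch of the combination idea, not the paper's implementation: each strategy independently scores candidate antecedents, the scores are summed, and a near-tie between the top candidates is flagged as ambiguous. The strategy functions, scores, and conflict margin below are invented for illustration.

```python
# Combine several anaphora-resolution strategies, each drawing on one
# knowledge source, and flag cases where they fail to single out a referent.

def resolve(candidates, strategies, conflict_margin=0.1):
    totals = {c: 0.0 for c in candidates}
    for strategy in strategies:           # e.g. syntactic agreement, recency,
        scores = strategy(candidates)     # semantic type constraints, ...
        for c, s in scores.items():
            totals[c] += s
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1:
        # Strategies conflict: the top candidates are too close to call.
        ambiguous = (ranked[0][1] - ranked[1][1]) < conflict_margin
    else:
        ambiguous = False
    return ranked[0][0], ambiguous

# Hypothetical strategies scoring the candidate antecedents of a pronoun.
syntactic = lambda cs: {"Mary": 1.0, "the report": 0.0}   # gender agreement
recency   = lambda cs: {"Mary": 0.4, "the report": 0.6}   # most recent mention
print(resolve(["Mary", "the report"], [syntactic, recency]))
```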


Artificial Intelligence | 1998

Translingual information retrieval: learning from bilingual corpora

Yiming Yang; Jaime G. Carbonell; Ralf D. Brown; Robert E. Frederking

Translingual information retrieval (TLIR) consists of providing a query in one language and searching document collections in one or more different languages. This paper introduces new TLIR methods and reports on comparative TLIR experiments with these new methods and with previously reported ones in a realistic setting. Methods fall into two categories: query translation and statistical-IR approaches establishing translingual associations. The results show that using bilingual corpora for automated extraction of term equivalences in context outperforms dictionary-based methods. Translingual versions of the Generalized Vector Space Model (GVSM) and Latent Semantic Indexing (LSI) perform well, as do translingual pseudo-relevance feedback (PRF) and Example-Based Term-in-context translation (EBT). All showed relatively small performance loss between monolingual and translingual versions, ranging from 87% to 101% of monolingual IR performance. Query translation based on a general machine-readable bilingual dictionary, heretofore the most popular method, did not match the performance of other, more sophisticated methods. Also, the previous very high LSI results in the literature based on “mate-finding” were superseded by more realistic relevance-based evaluations; LSI performance proved comparable to that of other statistical corpus-based methods.
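To make the corpus-based direction concrete, here is a hedged sketch of a GVSM-style translingual comparison: a query and a document are each represented by their similarities to one side of a parallel corpus, and retrieval compares those vectors. The toy corpus, vocabulary handling, and plain bag-of-words cosine are assumptions for illustration, not the paper's exact formulation.

```python
# A parallel corpus of N document pairs defines a shared N-dimensional space:
# a query in language A is described by its similarity to the A-side documents,
# a document in language B by its similarity to the B-side documents.
import numpy as np

def bow(text, vocab):
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in vocab:
            v[vocab[w]] += 1.0
    return v

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def gvsm_vector(text, side_docs, vocab):
    """Similarity of `text` to each training document on one side of the
    parallel corpus; this vector is the shared representation."""
    return np.array([cosine(bow(text, vocab), bow(d, vocab)) for d in side_docs])

# Toy parallel corpus (invented): English side and Spanish side, pair by pair.
en_side = ["machine translation system", "stock market news"]
es_side = ["sistema de traduccion automatica", "noticias del mercado de valores"]
en_vocab = {w: i for i, w in enumerate(sorted({w for d in en_side for w in d.split()}))}
es_vocab = {w: i for i, w in enumerate(sorted({w for d in es_side for w in d.split()}))}

query_vec = gvsm_vector("translation system", en_side, en_vocab)
doc_vec   = gvsm_vector("traduccion automatica", es_side, es_vocab)
print(cosine(query_vec, doc_vec))  # higher when query and doc align with the same pairs
```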


international conference on computational linguistics | 2000

Automated generalization of translation examples

Ralf D. Brown

Previous work has shown that adding generalization of the examples in the corpus of an example-based machine translation (EBMT) system can reduce the required amount of pretranslated example text by as much as an order of magnitude for Spanish-English and French-English EBMT. Using word clustering to automatically generalize the example corpus can provide the majority of this improvement for French-English with no manual intervention; the prior work required a large bilingual dictionary tagged with parts of speech and the manual creation of grammar rules. By seeding the clustering with a small amount of manually-created information, even better performance can be achieved. This paper describes a method whereby bilingual word clustering can be performed using standard monolingual document clustering techniques, and its effectiveness at reducing the size of the example corpus required.
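A minimal sketch of the generalization step, assuming word clusters are already available: words belonging to a cluster are replaced by the cluster label on both sides of a translation example, so one stored example matches many surface variants. The cluster tables and the French-English example below are invented.

```python
# Generalize a translation example with word clusters (assumed, not the
# paper's code): tokens in a cluster become the cluster label on both sides.

def generalize(pair, src_clusters, tgt_clusters):
    src, tgt = pair
    gen_src = [src_clusters.get(w, w) for w in src.split()]
    gen_tgt = [tgt_clusters.get(w, w) for w in tgt.split()]
    return " ".join(gen_src), " ".join(gen_tgt)

# Hypothetical clusters: weekday names map to one label in each language.
fr_clusters = {"lundi": "<DAY>", "mardi": "<DAY>"}
en_clusters = {"Monday": "<DAY>", "Tuesday": "<DAY>"}
example = ("je viendrai lundi", "I will come Monday")
print(generalize(example, fr_clusters, en_clusters))
# -> ('je viendrai <DAY>', 'I will come <DAY>'), which also covers "mardi"/"Tuesday"
```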


international conference on computational linguistics | 1990

Human-computer interaction for semantic disambiguation

Ralf D. Brown; Sergei Nirenburg

We describe a semi-automatic semantic disambiguator integrated in a knowledge-based machine translation system. It is used to bridge the analysis and generation stages in machine translation. The user interface of the disambiguator is built on mouse-based multiple-selection menus.


conference of the association for machine translation in the americas | 2002

Automatic Rule Learning for Resource-Limited MT

Jaime G. Carbonell; Katharina Probst; Erik Peterson; Christian Monson; Alon Lavie; Ralf D. Brown; Lori S. Levin

Machine Translation of minority languages presents unique challenges, including the paucity of bilingual training data and the unavailability of linguistically trained speakers. This paper focuses on a machine learning approach to transfer-based MT, where data in the form of translations and lexical alignments are elicited from bilingual speakers, and a seeded version-space learning algorithm formulates and refines transfer rules. A rule-generalization lattice is defined based on LFG-style f-structures, permitting generalization operators in the search for the most general rules consistent with the elicited data. The paper presents these methods and illustrates them with examples.
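As a heavily simplified illustration of rule generalization, not the seeded version-space algorithm or its f-structure lattice, the toy sketch below merges two elicited examples that differ in one aligned slot by replacing that slot with a variable.

```python
# Toy generalization operator over elicited translation pairs: where two
# structurally identical examples differ in an aligned slot, replace the slot
# with a variable to obtain a more general transfer rule.

def generalize_rules(rule_a, rule_b):
    """Each rule is (source_tokens, target_tokens); structures must match."""
    src_a, tgt_a = rule_a
    src_b, tgt_b = rule_b
    if len(src_a) != len(src_b) or len(tgt_a) != len(tgt_b):
        return None                       # structures differ; cannot merge here
    gen_src = [a if a == b else "X" for a, b in zip(src_a, src_b)]
    gen_tgt = [a if a == b else "X" for a, b in zip(tgt_a, tgt_b)]
    return gen_src, gen_tgt

# Two invented elicited examples that differ only in the noun slot.
r1 = (["the", "dog", "runs"], ["el", "perro", "corre"])
r2 = (["the", "cat", "runs"], ["el", "gato", "corre"])
print(generalize_rules(r1, r2))   # (['the', 'X', 'runs'], ['el', 'X', 'corre'])
```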


text speech and dialogue | 2013

Selecting and Weighting N-Grams to Identify 1100 Languages

Ralf D. Brown

This paper presents a language identification algorithm using cosine similarity against a filtered and weighted subset of the most frequent n-grams in training data with optional inter-string score smoothing, and its implementation in an open-source program. When applied to a collection of strings in 1100 languages containing at most 65 characters each, an average classification accuracy of over 99.2% is achieved with smoothing and 98.2% without. Compared to three other open-source language identification programs, the new program is both much more accurate and much faster at classifying short strings given such a large collection of languages.
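The general technique can be sketched in a few lines (this is not the released program): build a profile of the most frequent character n-grams for each language from training text, then classify a string by cosine similarity between its n-gram counts and each profile. The n-gram order, profile size, and toy training snippets below are arbitrary choices for illustration; the paper's filtering, weighting, and smoothing details are omitted.

```python
# Character n-gram profiles plus cosine similarity for language identification.
from collections import Counter

def ngrams(text, n=3):
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    common = set(a) & set(b)
    dot = sum(a[g] * b[g] for g in common)
    norm = (sum(v * v for v in a.values()) ** 0.5) * (sum(v * v for v in b.values()) ** 0.5)
    return dot / norm if norm else 0.0

def train(samples, n=3, top_k=500):
    """samples: {language: training text}; keep only the top_k most frequent
    n-grams per language, mirroring the idea of a filtered profile."""
    return {lang: Counter(dict(ngrams(text, n).most_common(top_k)))
            for lang, text in samples.items()}

def identify(text, profiles, n=3):
    return max(profiles, key=lambda lang: cosine(ngrams(text, n), profiles[lang]))

# Toy usage with two invented training snippets.
profiles = train({"en": "the quick brown fox jumps over the lazy dog",
                  "de": "der schnelle braune fuchs springt ueber den faulen hund"})
print(identify("the dog jumps", profiles))   # -> 'en'
```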


conference of the association for machine translation in the americas | 2004

A Modified Burrows-Wheeler Transform for Highly Scalable Example-Based Translation

Ralf D. Brown

The Burrows-Wheeler Transform (BWT) was originally developed for data compression, but can also be applied to indexing text. In this paper, an adaptation of the BWT to word-based indexing of the training corpus for an example-based machine translation (EBMT) system is presented. The adapted BWT embeds the necessary information to retrieve matched training instances without requiring any additional space and can be instantiated in a compressed form which reduces disk space and memory requirements by about 40% while still remaining searchable without decompression.
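For orientation, the sketch below constructs a word-level Burrows-Wheeler Transform by sorting rotations and verifies that it is invertible; the compressed representation and the FM-index backward search used for actual lookup are omitted. The "$" end marker and the naive rotation sort are illustrative simplifications, not the paper's construction.

```python
# Word-level BWT: construction by sorted rotations, plus the classic naive
# inversion to show the transform loses no information.

def word_bwt(tokens):
    seq = tokens + ["$"]                      # assumed end marker, sorts first
    rotations = [seq[i:] + seq[:i] for i in range(len(seq))]
    rotations.sort()
    # The BWT is the last column of the sorted rotation matrix.
    return [rot[-1] for rot in rotations]

def inverse_word_bwt(last_column):
    # Repeatedly prepend the last column and re-sort to rebuild the rotations.
    table = [[] for _ in last_column]
    for _ in last_column:
        table = sorted([last_column[i]] + table[i] for i in range(len(last_column)))
    row = next(r for r in table if r[-1] == "$")
    return row[:-1]

corpus = "to be or not to be".split()
bwt = word_bwt(corpus)
assert inverse_word_bwt(bwt) == corpus
print(bwt)
```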


north american chapter of the association for computational linguistics | 2006

Spectral Clustering for Example Based Machine Translation

Rashmi Gangadharaiah; Ralf D. Brown; Jaime G. Carbonell

Prior work has shown that generalization of data in an Example-Based Machine Translation (EBMT) system reduces the amount of pre-translated text required to achieve a certain level of accuracy (Brown, 2000). Several word clustering algorithms have been suggested to perform these generalizations, such as k-means clustering or Group Average Clustering. The hypothesis is that better contextual clustering can lead to better translation accuracy with limited training data. In this paper, we use a form of spectral clustering to cluster words, which is shown to yield as much as a 29.08% improvement over the baseline EBMT system.
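As a hedged illustration of the clustering step (not the paper's setup), the sketch below builds word co-occurrence context vectors from a toy corpus, turns their cosine similarities into an affinity matrix, and feeds it to scikit-learn's SpectralClustering with a precomputed affinity. The corpus, affinity choice, and cluster count are assumptions.

```python
# Cluster words by the similarity of their co-occurrence contexts using
# spectral clustering; assumes scikit-learn is available.
import numpy as np
from sklearn.cluster import SpectralClustering

sentences = ["the cat sleeps", "the dog sleeps", "buy the stock", "sell the stock"]
words = sorted({w for s in sentences for w in s.split()})
index = {w: i for i, w in enumerate(words)}

# Context vectors: count how often each word co-occurs with every other word.
contexts = np.zeros((len(words), len(words)))
for s in sentences:
    toks = s.split()
    for w in toks:
        for c in toks:
            if w != c:
                contexts[index[w], index[c]] += 1

# Cosine-similarity affinity matrix (symmetric, non-negative).
norms = np.linalg.norm(contexts, axis=1, keepdims=True)
norms[norms == 0] = 1.0
unit = contexts / norms
affinity = np.clip(unit @ unit.T, 0.0, None)

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
for w in words:
    print(w, labels[index[w]])
```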


international conference on human language technology research | 2001

Adapting an example-based translation system to Chinese

Ying Zhang; Ralf D. Brown; Robert E. Frederking

We describe an Example-Based Machine Translation (EBMT) system and the adaptations and enhancements made to create a Chinese-English translation system from the Hong Kong legal code and various other bilingual resources available from the Linguistic Data Consortium (LDC).

Collaboration


Dive into Ralf D. Brown's collaborations.

Top Co-Authors

Yiming Yang (Carnegie Mellon University)
Alon Lavie (Carnegie Mellon University)
Lori S. Levin (Carnegie Mellon University)
Peter J. Jansen (Carnegie Mellon University)
Jae Dong Kim (Carnegie Mellon University)
John D. Lafferty (Carnegie Mellon University)
Thomas Pierce (Carnegie Mellon University)
Xin Liu (Carnegie Mellon University)