Katharina Probst
Carnegie Mellon University
Publication
Featured research published by Katharina Probst.
Machine Translation | 2002
Katharina Probst; Lori S. Levin; Erik Peterson; Alon Lavie; Jaime G. Carbonell
The AVENUE project contains a run-time machine translation program that is surrounded by pre- and post-run-time modules. The post-run-time module selects among translation alternatives. The pre-run-time modules are concerned with elicitation of data and automatic learning of transfer rules in order to facilitate the development of machine translation between a language with extensive resources for natural language processing and a language with few resources for natural language processing. This paper describes the run-time transfer-based machine translation system as well as two of the pre-run-time modules: elicitation of data from the minority language and automated learning of transfer rules from the elicited data.
Speech Communication | 2002
Katharina Probst; Yan Ke; Maxine Eskenazi
In the past, educators relied on classroom observation to determine the relevance of various pedagogical techniques. Automated language learning now allows us to examine pedagogical questions in a much more rigorous manner. We can use a computer-assisted language learning (CALL) system as a base, tracing all user responses and controlling the information given out. We have thus used the Fluency system [Proceedings of Speech Technology in Language and Learning, 1998, p. 77] to answer the question of what voice a language learner should imitate when working on pronunciation. In this article, we will examine whether there should be a choice of model speakers and what characteristics of a model's voice may be important to match when there is a choice.
Conference of the Association for Machine Translation in the Americas | 2002
Jaime G. Carbonell; Katharina Probst; Erik Peterson; Christian Monson; Alon Lavie; Ralf D. Brown; Lori S. Levin
Machine Translation of minority languages presents unique challenges, including the paucity of bilingual training data and the unavailability of linguistically-trained speakers. This paper focuses on a machine learning approach to transfer-based MT, where data in the form of translations and lexical alignments are elicited from bilingual speakers, and a seeded version-space learning algorithm formulates and refines transfer rules. A rule-generalization lattice is defined based on LFG-style f-structures, permitting generalization operators in the search for the most general rules consistent with the elicited data. The paper presents these methods and illustrates examples.
North American Chapter of the Association for Computational Linguistics | 2003
Katharina Probst
We describe an approach to tagging a monolingual dictionary with linguistic features. In particular, we annotate the dictionary entries with parts of speech, number, and tense information. The algorithm uses a bilingual corpus as well as a statistical lexicon to find candidate training examples for specific feature values (e.g. plural). Then a similarity measure in the space defined by the training data serves to define a classifier for unseen data. We report evaluation results for a French dictionary, while the approach is general enough to be applied to any language pair. In a further step, we show that the proposed framework can be used to assign linguistic roles to extracted morphemes, e.g. noun plural markers. While the morphemes can be extracted using any algorithm, we present a simple algorithm for doing so. The emphasis hereby is not on the algorithm itself, but on the power of the framework to assign roles, which are ultimately indispensable for tasks such as Machine Translation.
Conference of the Association for Machine Translation in the Americas | 2004
Ariadna Font Llitjós; Katharina Probst; Jaime G. Carbonell
This paper compares a manually written MT grammar and a grammar learned automatically from an English-Spanish elicitation corpus with the ultimate purpose of automatically refining the translation rules. The experiment described here shows that the kind of automatic refinement operations required to correct a translation not only varies depending on the type of error, but also on the type of grammar. This paper describes the two types of grammars and gives a detailed error analysis of their output, indicating what kinds of refinements are required in each case.
Meeting of the Association for Computational Linguistics | 2002
Katharina Probst; Ralf D. Brown
We describe an approach to improve the bilingual cooccurrence dictionary that is used for word alignment, and evaluate the improved dictionary using a version of the Competitive Linking algorithm. We demonstrate a problem faced by the Competitive Linking algorithm and present an approach to ameliorate it. In particular, we rebuild the bilingual dictionary by clustering similar words in a language and assigning them a higher cooccurrence score with a given word in the other language than each single word would have otherwise. Experimental results show a significant improvement in precision and recall for word alignment when the improved dictionary is used.
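For readers unfamiliar with Competitive Linking, the following is a minimal sketch of the greedy one-to-one linking step as it is commonly described, not the specific version evaluated in the paper. The function name `competitive_linking`, the `scores` dictionary layout, and the toy scores are assumptions made for illustration; the clustering step that rebuilds the dictionary is not shown.

```python
from typing import Dict, List, Tuple


def competitive_linking(
    scores: Dict[Tuple[str, str], float],
    source_words: List[str],
    target_words: List[str],
) -> List[Tuple[int, int]]:
    """Greedily link word positions, best cooccurrence score first.

    Each source and target position is linked at most once.
    `scores` maps (source_word, target_word) to a cooccurrence score;
    this layout is a hypothetical choice for the sketch.
    """
    # Collect every candidate pair that has a positive score.
    candidates = [
        (scores[(s, t)], i, j)
        for i, s in enumerate(source_words)
        for j, t in enumerate(target_words)
        if scores.get((s, t), 0.0) > 0.0
    ]
    # Highest score first; ties broken by position for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    used_source, used_target = set(), set()
    links = []
    for _score, i, j in candidates:
        # Skip pairs where either word has already been linked.
        if i in used_source or j in used_target:
            continue
        links.append((i, j))
        used_source.add(i)
        used_target.add(j)
    return sorted(links)


# Toy example with made-up scores.
toy_scores = {
    ("the", "le"): 0.9,
    ("the", "chat"): 0.2,
    ("cat", "chat"): 0.8,
    ("cat", "le"): 0.3,
}
alignment = competitive_linking(toy_scores, ["the", "cat"], ["le", "chat"])
```

Because each word is linked at most once and the strongest pairs are claimed first, the quality of the result depends heavily on the cooccurrence scores, which is why an improved dictionary can change the alignment.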
Conference of the Association for Machine Translation in the Americas | 2004
Katharina Probst; Alon Lavie
We describe an approach to creating a small but diverse corpus in English that can be used to elicit information about any target language. The focus of the corpus is on structural information. The resulting bilingual corpus can then be used for natural language processing tasks such as inferring transfer mappings for Machine Translation. The corpus is sufficiently small that a bilingual user can translate and word-align it within a matter of hours. We describe how the corpus is created and how its structural diversity is ensured. We then argue that it is not necessary to introduce a large amount of redundancy into the corpus. This is shown by creating an increasingly redundant corpus and observing that the information gained converges as redundancy increases.
ACM Transactions on Asian Language Information Processing | 2003
Alon Lavie; Stephan Vogel; Lori S. Levin; Erik Peterson; Katharina Probst; Ariadna Font Llitjós; Rachel Reynolds; Jaime G. Carbonell; Richard J. Cohen
Proceedings of the IEEE | 2004
Alon Lavie; Erik Peterson; Katharina Probst; Shuly Wintner; Yaniv Eytani
Proceedings of the 9th Meeting of the European Association for Machine Translation (EAMT) | 2004
Alon Lavie; Katharina Probst; Erik Peterson; Stephan Vogel; Lori S. Levin; Ariadna Font-Llitjós; Jaime G. Carbonell