I. Dan Melamed
New York University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by I. Dan Melamed.
Computational Linguistics | 2000
I. Dan Melamed
Parallel texts (bitexts) have properties that distinguish them from other kinds of parallel data. First, most words translate to only one other word. Second, bitext correspondence is typically only partialmany words in each text have no clear equivalent in the other text. This article presents methods for biasing statistical translation models to reflect these properties. Evaluation with respect to independent human judgments has confirmed that translation models biased in this fashion are significantly more accurate than a baseline knowledge-free model. This article also shows how a statistical translation model can take advantage of preexisting knowledge that might be available about particular language pairs. Even the simplest kinds of language-specific knowledge, such as the distinction between content words and function words, are shown to reliably boost translation model performance on some tasks. Statistical models that reflect knowledge about the model domain combine the best of both the rationalist and empiricist paradigms.
north american chapter of the association for computational linguistics | 2003
I. Dan Melamed; Ryan Green; Joseph Turian
Machine translation can be evaluated using precision, recall, and the F-measure. These standard measures have significantly higher correlation with human judgments than recently proposed alternatives. More importantly, the standard measures have an intuitive interpretation, which can facilitate insights into how MT systems might be improved. The relevant software is publicly available.
meeting of the association for computational linguistics | 1997
I. Dan Melamed
Many multilingual NLP applications need to translate words between different languages, but cannot afford the computational expenses of inducing or applying a full translation model. For theses applications, we have designed a fast algorithm for estimating a partial translation model, which accounts for translational equivalence only at the word level. The models precision/recall trade-off can be directly controlled via one threshold parameter. This feature makes the model more suitable for applications that are not fully statistical. The models hidden parameters can be easily conditioned on information extrinsic to the model, providing an easy way to integrate pre-existing knowledge such as part-of-speech, dictionaries, word order, etc., Our model can link word tokens in parallel texts as well as other translation models in the literature. Unlike other translation models, it can automatically produce dictionary-sized translation lexicons, and it can do so with over 99% accuracy.
meeting of the association for computational linguistics | 2004
I. Dan Melamed
In an ordinary syntactic parser, the input is a string, and the grammar ranges over strings. This paper explores generalizations of ordinary parsing algorithms that allow the input to consist of string tuples and/or the grammar to range over string tuples. Such algorithms can infer the synchronous structures hidden in parallel texts. It turns out that these generalized parsers can do most of the work required to train and apply a syntax-aware statistical machine translation system.
meeting of the association for computational linguistics | 1997
I. Dan Melamed
The first step in most empirical work in multilingual NLP is to construct maps of the correspondence between texts and their translations (bitext maps). The Smooth Injective Map Recognizer (SIMR) algorithm presented here is a generic pattern recognition algorithm that is particularly well-suited to mapping bitext correspondence. SIMR is faster and significantly more accurate than other algorithms in the literature. The algorithm is robust enough to use on noisy texts, such as those resulting from OCR input, and on translations that are not very literal. SIMR encapsulates its language-specific heuristics, so that it can be ported to any language pair with a minimal effort.
north american chapter of the association for computational linguistics | 2003
I. Dan Melamed
Multitext Grammars (MTGs) generate arbitrarily many parallel texts via production rules of arbitrary length. Both ordinary MTGs and their bilexical subclass admit relatively efficient parsers. Yet, MTGs are more expressive than other synchronous formalisms for which parsers have been described in the literature. The combination of greater expressive power and relatively low cost of inference makes MTGs an attractive foundation for practical models of translational equivalence.
meeting of the association for computational linguistics | 2004
I. Dan Melamed; Giorgio Satta; Benjamin Wellington
Generalized Multitext Grammar (GMTG) is a synchronous grammar formalism that is weakly equivalent to Linear Context-Free Rewriting Systems (LCFRS), but retains much of the notational and intuitive simplicity of Context-Free Grammar (CFG). GMTG allows both synchronous and independent rewriting. Such flexibility facilitates more perspicuous modeling of parallel text than what is possible with other synchronous formalisms. This paper investigates the generative capacity of GMTG, proves that each component grammar of a GMTG retains its generative power, and proposes a generalization of Chomsky Normal Form, which is necessary for synchronous CKY-style parsing.
conference on applied natural language processing | 1997
Philip Resnik; I. Dan Melamed
We investigate the utility of an algorithm for translation lexicon acquisition (SABLE), used previously on a very large corpus to acquire general translation lexicons, when that algorithm is applied to a much smaller corpus to produce candidates for domain-specific translation lexicons.
international conference on computational linguistics | 1996
I. Dan Melamed
ADOMIT is an algorithm for Automatic Detection of OMIssions in Translations. The algorithm relics solely on geometric analysis of bitext maps and uses no linguistic information. This property allows it to deal equally well with omissions that do not correspond to linguistic units, such as might result from word-processing mishaps. ADOMIT has proven itself by discovering many errors in a hand-constructed gold standard for evaluating bitext mapping algorithms. Quantitative evaluation on simulated omissions showed that, even with todays poor bitext mapping technology, ADOMIT is a valuable quality control tool for translators and translation bureaus.
meeting of the association for computational linguistics | 2006
Joseph Turian; I. Dan Melamed
The present work advances the accuracy and training speed of discriminative parsing. Our discriminative parsing method has no generative component, yet surpasses a generative baseline on constituent parsing, and does so with minimal linguistic cleverness. Our model can incorporate arbitrary features of the input and parse state, and performs feature selection incrementally over an exponential feature space during training. We demonstrate the flexibility of our approach by testing it with several parsing strategies and various feature sets. Our implementation is freely available at: http://nlp.cs.nyu.edu/parser/.