Publications


Featured research published by Ido Dagan.


Meeting of the Association for Computational Linguistics | 2007

The Third PASCAL Recognizing Textual Entailment Challenge

Danilo Giampiccolo; Bernardo Magnini; Ido Dagan; Bill Dolan

This paper presents the Third PASCAL Recognising Textual Entailment Challenge (RTE-3), providing an overview of the dataset creation methodology and the submitted systems. In creating this year's dataset, a number of longer texts were introduced to make the challenge more oriented to realistic scenarios. Additionally, a pool of resources was offered so that the participants could share common tools. A pilot task was also set up, aimed at differentiating unknown entailments from identified contradictions and at providing justifications for overall system decisions. Twenty-six participants submitted 44 runs, using different approaches, generally presenting new entailment models and achieving higher scores than in the previous challenges.


Machine Learning | 1999

Similarity-Based Models of Word Cooccurrence Probabilities

Ido Dagan; Lillian Lee; Fernando Pereira

In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations “eat a peach” and “eat a beach” is more likely. Statistical NLP methods determine the likelihood of a word combination from its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in any given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on “most similar” words. We describe probabilistic word association models based on distributional word similarity, and apply them to two tasks, language modeling and pseudo-word disambiguation. In the language modeling task, a similarity-based model is used to improve probability estimates for unseen bigrams in a back-off language model. The similarity-based method yields a 20% perplexity improvement in the prediction of unseen bigrams and statistically significant reductions in speech-recognition error. We also compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency to avoid giving too much weight to easy-to-disambiguate high-frequency configurations. The similarity-based methods perform up to 40% better on this particular task.
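
To make the smoothing idea concrete, here is a minimal sketch, assuming a toy set of bigram counts and a hand-specified similarity list; the counts, the similarity weights and the function names (mle_prob, similarity_smoothed_prob) are hypothetical and not taken from the paper. An unseen bigram's conditional probability is estimated by pooling the estimates of conditioning words that are distributionally similar to the observed one.

    # Similarity-based estimate of P(w2 | w1) for an unseen bigram: pool the
    # maximum-likelihood estimates observed for conditioning words w1' that are
    # distributionally similar to w1, weighted by their similarity to w1.
    from collections import defaultdict

    bigram_counts = {          # toy counts, purely illustrative
        ("eat", "apple"): 5,
        ("devour", "peach"): 2,
        ("devour", "prey"): 3,
        ("consume", "peach"): 1,
        ("consume", "energy"): 4,
    }
    unigram_counts = defaultdict(int)
    for (w1, _), c in bigram_counts.items():
        unigram_counts[w1] += c

    # Hypothetical distributional similarities to the conditioning word "eat".
    similar_to = {"eat": {"devour": 0.6, "consume": 0.4}}

    def mle_prob(w1, w2):
        """Maximum-likelihood estimate of P(w2 | w1) from the toy counts."""
        if unigram_counts[w1] == 0:
            return 0.0
        return bigram_counts.get((w1, w2), 0) / unigram_counts[w1]

    def similarity_smoothed_prob(w1, w2):
        """Back off to similar conditioning words when (w1, w2) is unseen."""
        if bigram_counts.get((w1, w2), 0) > 0:
            return mle_prob(w1, w2)
        neighbours = similar_to.get(w1, {})
        norm = sum(neighbours.values())
        if norm == 0:
            return 0.0
        return sum(s * mle_prob(w, w2) for w, s in neighbours.items()) / norm

    print(similarity_smoothed_prob("eat", "peach"))  # pools evidence from "devour"/"consume"
    print(similarity_smoothed_prob("eat", "beach"))  # stays at zero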


Meeting of the Association for Computational Linguistics | 1991

Two Languages Are More Informative Than One

Ido Dagan; Alon Itai; Ulrike Schwall

This paper presents a new approach for resolving lexical ambiguities in one language using statistical data on lexical relations in another language. This approach exploits the differences between mappings of words to senses in different languages. We concentrate on the problem of target word selection in machine translation, for which the approach is directly applicable, and employ a statistical model for the selection mechanism. The model was evaluated using two sets of Hebrew and German examples and was found to be very useful for disambiguation.
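
As a rough illustration of the selection mechanism (not the paper's statistical model), the sketch below chooses among candidate English translations of an ambiguous source word by comparing how often each candidate co-occurs with the translation of its syntactic partner in a target-language corpus, abstaining when the evidence is not clearly one-sided. The counts, the word pairs and the threshold are all invented.

    # Pick a target translation for an ambiguous source word using target-language
    # co-occurrence statistics of the candidates with the translated partner word.
    target_cooccurrence = {   # hypothetical verb-object counts from an English corpus
        ("sign", "treaty"): 120,
        ("seal", "treaty"): 4,
        ("sign", "envelope"): 2,
        ("seal", "envelope"): 40,
    }

    def select_translation(candidates, partner, min_ratio=2.0):
        """Return the candidate whose co-occurrence with `partner` clearly dominates."""
        scored = sorted(
            ((target_cooccurrence.get((c, partner), 0), c) for c in candidates),
            reverse=True,
        )
        best, runner_up = scored[0], scored[1]
        if runner_up[0] == 0 or best[0] / runner_up[0] >= min_ratio:
            return best[1]
        return None  # abstain: the statistics do not discriminate

    print(select_translation(["sign", "seal"], "treaty"))    # -> "sign"
    print(select_translation(["sign", "seal"], "envelope"))  # -> "seal"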


Computer Speech & Language | 1995

Contextual Word Similarity and Estimation from Sparse Data

Ido Dagan; Shaul Marcus; Shaul Markovitch

In recent years there has been much interest in word co-occurrence relations, such as n-grams, verb–object combinations, or co-occurrence within a limited context. This paper discusses how to estimate the likelihood of co-occurrences that do not occur in the training data. We present a method that makes local analogies between each specific unobserved co-occurrence and other co-occurrences that contain similar words. These analogies are based on the assumption that similar word co-occurrences have similar values of mutual information. Accordingly, the word similarity metric captures similarities between vectors of mutual information values. Our evaluation suggests that this method performs better than existing, frequency-based, smoothing methods, and may provide an alternative to class-based models. A background survey is included, covering issues of lexical co-occurrence, data sparseness and smoothing, word similarity and clustering, and mutual information.
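
A minimal sketch of the underlying intuition, assuming toy co-occurrence counts: each word is mapped to a vector of (positive) pointwise mutual information values over its co-occurring words, and words with similar vectors are treated as similar. The cosine comparison used here is only a stand-in; the paper defines its own similarity metric over mutual information vectors.

    # Words as vectors of positive PMI with their co-occurring words; words with
    # similar vectors are considered similar. All counts are toy values.
    import math

    cooc = {  # word -> {co-occurring word: count}
        "peach": {"eat": 4, "ripe": 3, "tree": 2},
        "apple": {"eat": 5, "ripe": 2, "tree": 3},
        "beach": {"sandy": 6, "walk": 3},
    }

    total = sum(n for ctxs in cooc.values() for n in ctxs.values())
    word_total = {w: sum(ctxs.values()) for w, ctxs in cooc.items()}
    ctx_total = {}
    for ctxs in cooc.values():
        for c, n in ctxs.items():
            ctx_total[c] = ctx_total.get(c, 0) + n

    def pmi_vector(word):
        """Positive pointwise mutual information between `word` and each context word."""
        return {
            c: max(math.log(n * total / (word_total[word] * ctx_total[c])), 0.0)
            for c, n in cooc[word].items()
        }

    def cosine(u, v):
        dot = sum(w * v.get(c, 0.0) for c, w in u.items())
        norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
        return dot / norm if norm else 0.0

    print(cosine(pmi_vector("peach"), pmi_vector("apple")))  # high: shared contexts
    print(cosine(pmi_vector("peach"), pmi_vector("beach")))  # zero: no shared contexts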


Natural Language Engineering | 2009

Recognizing textual entailment: Rational, evaluation and approaches

Ido Dagan; Bill Dolan; Bernardo Magnini; Dan Roth

The goal of identifying textual entailment – whether one piece of text can be plausibly inferred from another – has emerged in recent years as a generic core problem in natural language understanding. Work in this area has been largely driven by the PASCAL Recognizing Textual Entailment (RTE) challenges, which are a series of annual competitive meetings. The current work exhibits strong ties to some earlier lines of research, particularly automatic acquisition of paraphrases and lexical semantic relationships and unsupervised inference in applications such as question answering, information extraction and summarization. It has also opened the way to newer lines of research on more involved inference methods, on knowledge representations needed to support this natural language understanding challenge and on the use of learning methods in this context. RTE has fostered an active and growing community of researchers focused on the problem of applied entailment. This special issue of the JNLE provides an opportunity to showcase some of the most important work in this emerging area.


Intelligent Information Systems | 1998

Mining Text Using Keyword Distributions

Ronen Feldman; Ido Dagan; Haym Hirsh

Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. This paper describes the KDT system for Knowledge Discovery in Text, in which documents are labeled by keywords, and knowledge discovery is performed by analyzing the co-occurrence frequencies of the various keywords labeling the documents. We show how this keyword-frequency approach supports a range of KDD operations, providing a suitable foundation for knowledge discovery and exploration for collections of unstructured text.
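
A toy sketch of the keyword-frequency view, assuming a handful of hypothetical keyword-labeled documents: pairwise co-occurrence counts of the keyword labels are collected and reported with simple support and confidence figures. This illustrates the kind of distributions KDT analyzes rather than the system itself.

    # Count how often keyword labels co-occur across a collection of labeled
    # documents; the documents and keywords below are invented.
    from collections import Counter
    from itertools import combinations

    labeled_docs = [
        {"iraq", "oil", "usa"},
        {"iraq", "usa"},
        {"oil", "opec"},
        {"iraq", "oil"},
    ]

    pair_counts = Counter()
    keyword_counts = Counter()
    for keywords in labeled_docs:
        keyword_counts.update(keywords)
        pair_counts.update(combinations(sorted(keywords), 2))

    # Simple support/confidence view of each co-occurring keyword pair.
    for (k1, k2), n in pair_counts.most_common():
        support = n / len(labeled_docs)
        confidence = n / keyword_counts[k1]
        print(f"{k1} & {k2}: {n} docs, support {support:.2f}, confidence {confidence:.2f}")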


Meeting of the Association for Computational Linguistics | 1999

Robust Bilingual Word Alignment for Machine Aided Translation

Ido Dagan; Kenneth Ward Church; William Gale

We have developed a new program called word_align for aligning parallel texts, such as the Canadian Hansards, that are available in two or more languages. The program takes the output of char_align (Church, 1993), a robust alternative to sentence-based alignment programs, and applies word-level constraints using a version of Brown et al.’s Model 2 (Brown et al., 1993), modified and extended to deal with robustness issues. Word_align was tested on a subset of the Canadian Hansards supplied by Simard (Simard et al., 1992). The combination of word_align plus char_align reduces the variance (average square error) by a factor of 5 over char_align alone. More importantly, because word_align and char_align were designed to work robustly on texts that are smaller and noisier than the Hansards, it has been possible to successfully deploy the programs at AT&T Language Line Services, a commercial translation service, to help them with difficult terminology.
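
For orientation, here is a stripped-down, Model 1-style EM estimation of word translation probabilities on an invented three-sentence parallel corpus. It is not word_align or its modified Model 2; it only illustrates how fractional alignment counts are accumulated and used to re-estimate a translation table, which is the machinery the paper builds on.

    # Plain EM re-estimation of t(f | e) on a tiny invented parallel corpus,
    # in the spirit of the IBM translation models (not word_align itself).
    from collections import defaultdict

    corpus = [
        (["the", "house"], ["la", "maison"]),
        (["the", "book"], ["le", "livre"]),
        (["a", "house"], ["une", "maison"]),
    ]

    english = {e for es, _ in corpus for e in es}
    french = {f for _, fs in corpus for f in fs}

    # Uniform initialization of the translation table t(f | e).
    t = {(f, e): 1.0 / len(french) for f in french for e in english}

    for _ in range(20):                       # a few EM iterations suffice here
        count = defaultdict(float)
        total = defaultdict(float)
        for es, fs in corpus:
            for f in fs:
                z = sum(t[(f, e)] for e in es)        # normalizer for this f
                for e in es:
                    c = t[(f, e)] / z                 # expected alignment count
                    count[(f, e)] += c
                    total[e] += c
        t = {(f, e): (count[(f, e)] / total[e] if total[e] else 0.0)
             for f in french for e in english}

    best = max(french, key=lambda f: t[(f, "house")])
    print(best, round(t[(best, "house")], 2))         # "maison" ends up on top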


Natural Language Engineering | 2010

Directional distributional similarity for lexical inference

Lili Kotlerman; Ido Dagan; Idan Szpektor; Maayan Zhitomirsky-Geffet

Distributional word similarity is most commonly perceived as a symmetric relation. Yet, directional relations are abundant in lexical semantics and in many Natural Language Processing (NLP) settings that require lexical inference, making symmetric similarity measures less suitable for their identification. This paper investigates the nature of directional (asymmetric) similarity measures that aim to quantify distributional feature inclusion. We identify desired properties of such measures for lexical inference, specify a particular measure based on Average Precision that addresses these properties, and demonstrate the empirical benefit of directional measures for two different NLP datasets.
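
A minimal sketch of the directional idea, assuming invented feature weights: the score measures how much of the (weighted) feature set of a narrower word is covered by the features of a broader word, so it is naturally asymmetric. This simplified weighted-coverage score is only illustrative; the measure proposed in the paper is based on Average Precision over ranked features.

    # Directional, inclusion-oriented similarity: weighted fraction of u's
    # features that also occur with v. Feature weights are toy values.
    features = {  # word -> {feature: weight}
        "poodle": {"bark": 3.0, "leash": 2.0, "groomed": 1.5},
        "dog":    {"bark": 4.0, "leash": 2.5, "bite": 2.0, "wag": 1.0},
    }

    def directional_inclusion(u, v):
        """Weighted proportion of u's features that are included in v's features."""
        fu, fv = features[u], features[v]
        covered = sum(w for f, w in fu.items() if f in fv)
        return covered / sum(fu.values())

    print(directional_inclusion("poodle", "dog"))  # high: poodle contexts are mostly dog contexts
    print(directional_inclusion("dog", "poodle"))  # lower: dog occurs where poodle does not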


Meeting of the Association for Computational Linguistics | 2005

The Distributional Inclusion Hypotheses and Lexical Entailment

Maayan Geffet; Ido Dagan

This paper suggests refinements for the Distributional Similarity Hypothesis. Our proposed hypotheses relate the distributional behavior of pairs of words to lexical entailment -- a tighter notion of semantic similarity that is required by many NLP applications. To automatically explore the validity of the defined hypotheses we developed an inclusion testing algorithm for characteristic features of two words, which incorporates corpus and web-based feature sampling to overcome data sparseness. The degree of hypotheses validity was then empirically tested and manually analyzed with respect to the word sense level. In addition, the above testing algorithm was exploited to improve lexical entailment acquisition.
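
To make the inclusion idea concrete, here is a toy test, assuming invented feature weights: the top-weighted "characteristic" features of the presumably entailing word are checked for inclusion in the feature set of the presumably entailed word. The paper's algorithm additionally samples features from the corpus and the web to overcome data sparseness; this sketch omits all of that.

    # Check whether the characteristic (top-k) features of one word are included
    # in the feature set of another word; weights below are invented.
    features = {  # word -> {feature: weight}
        "lion":   {"roar": 3.0, "mane": 2.5, "hunt": 2.0, "cage": 1.0},
        "animal": {"hunt": 3.0, "feed": 2.5, "cage": 2.0, "roar": 1.5, "mane": 0.5},
    }

    def characteristic_features(word, k=3):
        """The k highest-weighted features of `word`."""
        ranked = sorted(features[word], key=features[word].get, reverse=True)
        return set(ranked[:k])

    def inclusion_holds(narrow, broad, k=3):
        """True if all characteristic features of `narrow` also occur with `broad`."""
        return characteristic_features(narrow, k) <= set(features[broad])

    print(inclusion_holds("lion", "animal"))  # True: consistent with "lion" entailing "animal"
    print(inclusion_holds("animal", "lion"))  # False: "feed" never occurs with "lion"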


Discrete Applied Mathematics | 1988

Trapezoid graphs and their coloring

Ido Dagan; Martin Charles Golumbic; Ron Y. Pinter

We define trapezoid graphs, an extension of both interval and permutation graphs. We show that this new class properly contains the union of the two former classes, and that trapezoid graphs are equivalent to the incomparability graphs of partially ordered sets having interval order dimension at most two. We provide an optimal coloring algorithm for trapezoid graphs that runs in time O(nk), where n is the number of nodes and k is the chromatic number of the graph. Our coloring algorithm has direct applications to channel routing on integrated circuits.
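
To make the definition concrete, the sketch below builds the intersection relation between trapezoids (each given by an interval on a top line and an interval on a bottom line) and applies a plain first-fit greedy coloring. The greedy ordering is a heuristic chosen here for brevity; it is not the paper's optimal O(nk) coloring algorithm, and the trapezoids are invented.

    # A trapezoid is (top_left, top_right, bottom_left, bottom_right); two
    # trapezoids are adjacent iff they intersect, i.e. neither lies entirely
    # to the left of the other on both lines. Coordinates are toy values.
    trapezoids = [
        (0, 2, 0, 1),
        (1, 3, 2, 4),
        (4, 5, 3, 5),
        (2, 6, 0, 6),
    ]

    def left_of(t1, t2):
        """t1 lies entirely to the left of t2 on both the top and bottom lines."""
        return t1[1] < t2[0] and t1[3] < t2[2]

    def intersect(t1, t2):
        return not (left_of(t1, t2) or left_of(t2, t1))

    def greedy_coloring(traps):
        """First-fit coloring in order of leftmost top endpoint (heuristic only)."""
        order = sorted(range(len(traps)), key=lambda i: traps[i][0])
        colors = {}
        for i in order:
            used = {colors[j] for j in colors if intersect(traps[i], traps[j])}
            colors[i] = next(c for c in range(len(traps)) if c not in used)
        return colors

    print(greedy_coloring(trapezoids))  # a proper (not necessarily minimum) coloring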
