Oren Melamud
Bar-Ilan University
Publications
Featured research published by Oren Melamud.
Conference on Computational Natural Language Learning | 2016
Oren Melamud; Jacob Goldberger; Ido Dagan
Context representations are central to various NLP tasks, such as word sense disambiguation, named entity recognition, coreference resolution, and many more. In this work we present a neural model for efficiently learning a generic context embedding function from large corpora, using a bidirectional LSTM. With a very simple application of our context representations, we manage to surpass or nearly reach state-of-the-art results on sentence completion, lexical substitution and word sense disambiguation tasks, while substantially outperforming the popular context representation of averaged word embeddings. We release our code and pretrained models, suggesting they could be useful in a wide variety of NLP tasks.
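A minimal sketch of a bidirectional-LSTM context encoder in the spirit of this model, written in PyTorch; the layer sizes, module names, and feed-forward combination are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Encode the sentential context around a target word position.

    The left context is read left-to-right and the right context
    right-to-left by two LSTMs; their final states are combined by a
    feed-forward layer into a single context embedding. All sizes
    here are illustrative."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=600):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.left_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.right_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, embed_dim),
        )

    def forward(self, left_ids, right_ids):
        # left_ids: tokens before the target; right_ids: tokens after
        # the target, already reversed so the LSTM reads them inward.
        _, (h_left, _) = self.left_lstm(self.embed(left_ids))
        _, (h_right, _) = self.right_lstm(self.embed(right_ids))
        return self.mlp(torch.cat([h_left[-1], h_right[-1]], dim=-1))
```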
North American Chapter of the Association for Computational Linguistics | 2015
Oren Melamud; Omer Levy; Ido Dagan
The lexical substitution task requires identifying meaning-preserving substitutes for a target word instance in a given sentential context. Since its introduction in SemEval-2007, various models have addressed this challenge, mostly in an unsupervised setting. In this work we propose a simple model for lexical substitution, which is based on the popular skip-gram word embedding model. The novelty of our approach is in explicitly leveraging the context embeddings generated within the skip-gram model, which were so far considered only as an internal component of the learning process. Our model is efficient, very simple to implement, and at the same time achieves state-of-the-art results on lexical substitution tasks in an unsupervised setting.
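A sketch of the kind of scoring this makes possible, assuming skip-gram word vectors for the target and substitute plus skip-gram context vectors for the observed context words (all L2-normalized); the paper's actual measures weight the two terms somewhat differently.

```python
import numpy as np

def substitute_score(sub_vec, target_vec, context_vecs):
    """Combine a substitute's similarity to the target word with its
    average similarity to the skip-gram *context* embeddings of the
    surrounding words. Vectors are assumed L2-normalized, so dot
    products are cosine similarities."""
    target_sim = sub_vec @ target_vec
    context_sim = float(np.mean([sub_vec @ c for c in context_vecs]))
    return (target_sim + context_sim) / 2.0
```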
North American Chapter of the Association for Computational Linguistics | 2016
Oren Melamud; David McClosky; Siddharth Patwardhan; Mohit Bansal
We provide the first extensive evaluation of how using different types of context to learn skip-gram word embeddings affects performance on a wide range of intrinsic and extrinsic NLP tasks. Our results suggest that while intrinsic tasks tend to exhibit a clear preference for particular types of contexts and higher dimensionality, more careful tuning is required for finding the optimal settings for most of the extrinsic tasks that we considered. Furthermore, for these extrinsic tasks, we find that once the benefit from increasing the embedding dimensionality is mostly exhausted, simple concatenation of word embeddings, learned with different context types, can yield further performance gains. As an additional contribution, we propose a new variant of the skip-gram model that learns word embeddings from weighted contexts of substitute words.
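A sketch of the concatenation idea for extrinsic tasks, assuming pre-trained lookup tables keyed by context type; the table layout and zero-vector fallback are illustrative assumptions.

```python
import numpy as np

def concat_embeddings(word, tables):
    """Concatenate embeddings of `word` learned with different context
    types (e.g. window-based, dependency-based, substitute-based).
    `tables` maps a context-type name to a {word: vector} dict; a
    missing word falls back to a zero vector of the right size."""
    parts = []
    for table in tables.values():
        dim = len(next(iter(table.values())))
        parts.append(np.asarray(table.get(word, np.zeros(dim))))
    return np.concatenate(parts)
```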
North American Chapter of the Association for Computational Linguistics | 2015
Oren Melamud; Ido Dagan; Jacob Goldberger
Context representations are a key element in distributional models of word meaning. In contrast to typical representations based on neighboring words, a recently proposed approach suggests representing the context of a target word by a substitute vector, comprising the potential fillers for the target word slot in that context. In this work we first propose a variant of substitute vectors, which we find particularly suitable for measuring context similarity. Then, we propose a novel model for representing word meaning in context based on this context representation. Our model outperforms state-of-the-art results on lexical substitution tasks in an unsupervised setting.
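A sketch of how a substitute vector can be built, assuming a hypothetical `lm.score(left, word, right)` interface to an n-gram language model; keeping only the top-k fillers and renormalizing illustrates the general idea of a sparse substitute-based context representation.

```python
from collections import Counter

def substitute_vector(left, right, lm, vocab, top_k=100):
    """Represent the context (left, right) of a target slot by the
    distribution over candidate filler words for that slot. Each
    filler is scored by an n-gram language model; only the top-k
    fillers are kept and renormalized into a sparse vector."""
    scores = {w: lm.score(left, w, right) for w in vocab}
    top = Counter(scores).most_common(top_k)
    total = sum(s for _, s in top)
    return {w: s / total for w, s in top}
```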
Workshop on Innovative Use of NLP for Building Educational Applications | 2014
Torsten Zesch; Oren Melamud
Automatically generating challenging distractors for multiple-choice gap-fill items is still an unsolved problem. We propose to employ context-sensitive lexical inference rules in order to generate distractors that are semantically similar to the gap target word in some sense, but not in the particular sense induced by the gap-fill context. We hypothesize that such distractors should be particularly hard to distinguish from the correct answer. We focus on verbs as they are especially difficult to master for language learners and find that our approach is quite effective. In our test set of 20 items, our proposed method decreases the number of invalid distractors in 90% of the cases, and fully eliminates all of them in 65%. Further analysis on that dataset does not support our hypothesis regarding item difficulty as measured by average error rate of language learners. We conjecture that this may be due to limitations in our evaluation setting, which we plan to address in future work.
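A schematic sketch of the selection criterion, with `similar` and `fits_context` as assumed predicates (e.g. backed by the context-sensitive lexical inference rules the paper uses); it illustrates only the two-sided test, not the paper's concrete rule base.

```python
def select_distractors(target, context, candidates, similar, fits_context):
    """Keep candidates that are semantically related to the target in
    *some* sense (so they look plausible to the learner) but do not
    fit this particular gap-fill context (so they remain incorrect
    answers)."""
    return [c for c in candidates
            if similar(c, target) and not fits_context(c, context)]
```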
Conference on Computational Natural Language Learning | 2014
Oren Melamud; Ido Dagan; Jacob Goldberger; Idan Szpektor; Deniz Yuret
Most traditional distributional similarity models fail to capture syntagmatic patterns that group together multiple word features within the same joint context. In this work we introduce a novel generic distributional similarity scheme under which the power of probabilistic models can be leveraged to effectively model joint contexts. Based on this scheme, we implement a concrete model which utilizes probabilistic n-gram language models. Our evaluations suggest that this model is particularly well-suited for measuring similarity for verbs, which are known to exhibit richer syntagmatic patterns, while maintaining comparable or better performance with respect to competitive baselines for nouns. Following this, we propose our scheme as a framework for future semantic similarity models leveraging the substantial body of work that exists in probabilistic language modeling.
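A sketch of one way such a scheme can be instantiated, assuming a hypothetical `lm.logprob(tokens)` n-gram language-model interface and a `contexts_of` iterator over a word's observed joint contexts; the paper's concrete model is more involved.

```python
import math

def joint_context_similarity(w1, w2, contexts_of, lm):
    """Score how well w2 fits the joint contexts observed for w1:
    each whole context of w1 is re-scored by a probabilistic n-gram
    language model with w1 replaced by w2, and the log-probabilities
    are averaged."""
    scores = [lm.logprob(left + [w2] + right)
              for left, right in contexts_of(w1)]
    return math.fsum(scores) / len(scores)
```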
ACM Transactions on Algorithms | 2012
Yonatan Aumann; Moshe Lewenstein; Oren Melamud; Ron Y. Pinter; Zohar Yakhini
We introduce a generalization of interval graphs, which we call Dotted Interval Graphs (DIG). A dotted interval graph is an intersection graph of arithmetic progressions (dotted intervals). Coloring of dotted interval graphs naturally arises in the context of high-throughput genotyping. We study the properties of dotted interval graphs, with a focus on coloring. We show that any graph is a DIG, but that DIG_d graphs, that is, DIGs in which the arithmetic progressions have a jump of at most d, form a strict hierarchy. We show that coloring DIG_d graphs is NP-complete even for d = 2. For any fixed d, we provide a (5/6)d + o(d) approximation for the coloring of DIG_d graphs. Finally, we show that finding a maximum clique in DIG_d graphs is fixed-parameter tractable in d.
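A small sketch of the objects under study: dotted intervals as arithmetic progressions, their intersection graph, and a trivial first-fit coloring as a baseline (the paper's (5/6)d + o(d) approximation algorithm is substantially more refined than this).

```python
def dotted_interval(start, jump, length):
    """A 'dotted interval': the arithmetic progression
    start, start + jump, ..., start + (length - 1) * jump."""
    return {start + i * jump for i in range(length)}

def dig_edges(intervals):
    """Intersection graph of dotted intervals: vertices are indices,
    with an edge whenever two progressions share a point."""
    n = len(intervals)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if intervals[i] & intervals[j]]

def greedy_coloring(n, edges):
    """First-fit coloring baseline; uses at most max-degree + 1 colors."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    color = {}
    for v in range(n):
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(n) if c not in used)
    return color
```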
Meeting of the Association for Computational Linguistics | 2017
Oren Melamud; Jacob Goldberger
In this paper we define a measure of dependency between two random variables, based on the Jensen-Shannon (JS) divergence between their joint distribution and the product of their marginal distributions. Then, we show that word2vec’s skip-gram with negative sampling embedding algorithm finds the optimal low-dimensional approximation of this JS dependency measure between the words and their contexts. The gap between the optimal score and the low-dimensional approximation is demonstrated on a standard text corpus.
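The dependency measure itself is easy to state concretely; below is a sketch for a finite joint distribution given as a 2-D array (the paper's contribution is relating the low-dimensional approximation of this measure to SGNS, which this snippet does not cover).

```python
import numpy as np

def js_dependency(joint):
    """Jensen-Shannon divergence between a joint distribution P(w, c),
    given as a 2-D array summing to 1, and the product of its
    marginals P(w)P(c)."""
    p = joint
    q = np.outer(p.sum(axis=1), p.sum(axis=0))  # product of marginals
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # 0 * log(0) is taken as 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```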
Workshop on Innovative Use of NLP for Building Educational Applications | 2016
Michael Wojatzki; Oren Melamud; Torsten Zesch
Cloze tests, also known as gap-fill exercises, are a popular tool for acquiring and evaluating language proficiency. A major obstacle to automating the scoring of cloze tests is the still unsolved problem of gap filler ambiguity. To address this challenge, we present the concept of bundled gap filling, along with (1) an efficient computational model for automatically generating unambiguous gap bundle exercises, and (2) a disambiguation measure for guiding the construction of the exercises and validating their level of ambiguity. Our evaluation shows that our proposed exercises achieve a dramatic reduction in gap filler ambiguity, while our disambiguation measure can be effectively used to discard exercises that are nevertheless too ambiguous.
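A schematic sketch of the ambiguity notion behind gap bundles, with `fits(filler, gap)` as an assumed plausibility predicate; the paper's disambiguation measure is more sophisticated, but the target property is the same: every filler should fit exactly one gap.

```python
def bundle_ambiguity(gaps, fillers, fits):
    """Count ambiguous (filler, gap) assignments in a gap bundle: a
    bundle is unambiguous when every filler fits exactly one gap, so
    each filler can be placed uniquely."""
    ambiguous = 0
    for f in fillers:
        matches = sum(1 for g in gaps if fits(f, g))
        ambiguous += max(0, matches - 1)
    return ambiguous
```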
Meeting of the Association for Computational Linguistics | 2013
Oren Melamud; Jonathan Berant; Ido Dagan; Jacob Goldberger; Idan Szpektor