Shachar Mirkin
Xerox
Publications
Featured research published by Shachar Mirkin.
International Joint Conference on Natural Language Processing | 2009
Shachar Mirkin; Lucia Specia; Nicola Cancedda; Ido Dagan; Marc Dymetman; Idan Szpektor
This paper addresses the task of handling unknown terms in SMT. We propose using source-language monolingual models and resources to paraphrase the source text prior to translation. We further present a conceptual extension to prior work by allowing the translation of entailed texts rather than only paraphrases. A method for performing this process efficiently is presented and applied to some 2,500 sentences with unknown terms. Our experiments show that the proposed approach substantially increases the number of properly translated texts.
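The rewriting step can be illustrated with a toy sketch. It assumes a hypothetical entailment lexicon (`entailment_rules`, mapping a term to terms it entails, such as synonyms or more general words) and a set `known_vocab` standing in for the terms the SMT system can translate; neither comes from the paper. `rewrite_unknown_terms` is a hypothetical helper, not the authors' method, and the paper's scoring of candidates with source-language models is omitted.

```python
from typing import Dict, List, Set


def rewrite_unknown_terms(
    sentence: List[str],
    known_vocab: Set[str],
    entailment_rules: Dict[str, List[str]],
) -> List[List[str]]:
    """Hypothetical sketch: return candidate rewrites of `sentence` in which each
    unknown token is replaced by a term it entails that the MT system knows."""
    candidates: List[List[str]] = [[]]
    for token in sentence:
        if token in known_vocab:
            options = [token]
        else:
            # Keep only entailed replacements that the MT system can translate.
            options = [t for t in entailment_rules.get(token, []) if t in known_vocab]
            if not options:
                options = [token]  # no usable rule: leave the unknown term as-is
        # Extend every partial candidate with every allowed option for this token.
        candidates = [prefix + [opt] for prefix in candidates for opt in options]
    return candidates
```

For example, with `known_vocab = {"the", "cat", "ate", "a", "fish", "animal"}` and `entailment_rules = {"feline": ["cat", "animal"], "salmon": ["fish", "food"]}`, the sentence `the feline ate a salmon` yields `the cat ate a fish` and `the animal ate a fish`: the first replaces `feline` with a synonym, the second with a more general (entailed) term, and both generalize `salmon` to `fish`. `food` is discarded because it is not in the known vocabulary. Enumerating all combinations grows exponentially with the number of unknown terms, which is one reason an efficient method matters.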
Meeting of the Association for Computational Linguistics | 2006
Shachar Mirkin; Ido Dagan; Maayan Geffet
This paper addresses the problem of acquiring lexical semantic relationships, applied to the lexical entailment relation. Our main contribution is a novel conceptual integration of the two distinct paradigms for acquiring lexical relations: the pattern-based approach and the distributional similarity approach. The integrated method exploits the complementary information provided by the two approaches to obtain candidate relations and informative characterizing features. A small training set is then used to construct a more accurate supervised classifier, which shows a significant increase in both recall and precision over the original approaches.
Empirical Methods in Natural Language Processing | 2015
Shachar Mirkin; Scott Nowson; Caroline Brun; Julien Perez
Language use is known to be influenced by personality traits as well as by sociodemographic characteristics such as age or mother tongue. As a result, it is possible to automatically identify these traits of an author from her texts. It has recently been shown that knowledge of such dimensions can improve performance in NLP tasks such as topic and sentiment modeling. We posit that machine translation is another application that should be personalized. To motivate this, we explore whether translation preserves demographic and psychometric traits. We show that, largely, both translating the source training data into the target language and translating the target test data into the source language have a detrimental effect on the accuracy of predicting author traits. We argue that this supports the need for personal and personality-aware machine translation models.
Joint Conference on Lexical and Computational Semantics | 2014
Anand Gupta; Manpreet Kaur; Shachar Mirkin; Adarsh Singh; Aseem Goyal
Sentence connectivity is a textual characteristic that can be exploited when selecting sentences for a meaningful summary. However, existing summarization methods do not fully utilize its potential. This paper introduces a novel method for single-document text summarization. It poses text summarization as an optimization problem and attempts to solve it using Weighted Minimum Vertex Cover (WMVC), a graph-based algorithm. Textual entailment, an established indicator of semantic relationships between text units, is used to measure sentence connectivity and to construct the graph on which WMVC operates. Experiments on a standard summarization dataset show that the proposed algorithm outperforms related methods.
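The abstract does not say how the WMVC problem is solved. As a generic illustration only, the sketch below implements the classic local-ratio 2-approximation for weighted minimum vertex cover (Bar-Yehuda and Even), which is not necessarily the solver used in the paper. Vertices stand in for sentences and edges for entailment links; the sentence IDs, weights, and edges in the example are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def weighted_vertex_cover(
    weights: Dict[str, float],
    edges: List[Tuple[str, str]],
) -> Set[str]:
    """Local-ratio 2-approximation for weighted minimum vertex cover.

    For each edge not yet covered, subtract the smaller residual weight of its
    two endpoints from both; every endpoint whose residual reaches zero joins
    the cover, so each processed edge ends up covered.
    """
    residual = dict(weights)
    cover: Set[str] = set()
    for u, v in edges:
        if u in cover or v in cover:
            continue  # this edge is already covered
        delta = min(residual[u], residual[v])
        residual[u] -= delta
        residual[v] -= delta
        # The endpoint that supplied `delta` now has residual exactly zero.
        if residual[u] == 0:
            cover.add(u)
        if residual[v] == 0:
            cover.add(v)
    return cover
```

On a toy graph with weights `{"s1": 3, "s2": 1, "s3": 2, "s4": 1}` and edges `("s1", "s2")`, `("s2", "s3")`, `("s3", "s4")`, `("s1", "s3")`, the sketch returns `{"s2", "s3", "s4"}` (total weight 4), within the factor-2 guarantee relative to the optimum `{"s2", "s3"}` (total weight 3).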
European Conference on Machine Learning | 2013
William Darling; Cédric Archambeau; Shachar Mirkin; Guillaume Bouchard
In this paper, we propose a probabilistic framework for predicting the root causes of errors in data processing pipelines made up of several components when we only have access to partial feedback; that is, we are aware when some error has occurred in one or more of the components, but we do not know which one. The proposed error model enables us to direct the user feedback to the correct components in the pipeline to either automatically correct errors as they occur, retrain the component with assimilated training examples, or take other corrective action. We present the model and describe an Expectation Maximization (EM)-based algorithm to learn the model parameters and predict the error configuration. We demonstrate the accuracy and usefulness of our method first on synthetic data, and then on two distinct tasks: error correction in a 2-component opinion summarization system, and phrase error detection in statistical machine translation.
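The inference behind directing feedback to the right component can be illustrated with Bayes' rule. Assuming, purely for illustration, that components err independently with known prior probabilities p_i (in the paper the parameters are learned with EM, and the actual model may differ), the posterior that component i erred, given that some error was reported, is p_i divided by 1 minus the product of (1 - p_j) over all components j. This is the kind of quantity an E-step would compute. The helper `fault_posteriors` and the component names in the example are hypothetical.

```python
from math import prod
from typing import Dict


def fault_posteriors(prior_error: Dict[str, float]) -> Dict[str, float]:
    """P(component i erred | at least one component erred), assuming the
    components err independently with the given prior probabilities."""
    # Probability that at least one component erred.
    p_any = 1.0 - prod(1.0 - p for p in prior_error.values())
    if p_any == 0.0:
        raise ValueError("an error was reported but every prior is zero")
    # 'i erred' implies 'some component erred', so Bayes' rule reduces to p_i / P(any).
    return {name: p / p_any for name, p in prior_error.items()}
```

With hypothetical priors of 0.1 and 0.3 for a 2-component pipeline, P(some error) = 1 - 0.9 × 0.7 = 0.37, so the posteriors are about 0.27 and 0.81. They sum to more than 1 because both components may have erred at once.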
Text Analysis Conference | 2008
Roy Bar-Haim; Ido Dagan; Shachar Mirkin; Eyal Shnarch; Idan Szpektor; Jonathan Berant; Iddo Greental
Meeting of the Association for Computational Linguistics | 2010
Shachar Mirkin; Ido Dagan; Sebastian Padó
Archive | 2014
Shachar Mirkin; Sriram Venkatapathy; Marc Dymetman
Archive | 2014
Sriram Venkatapathy; Shachar Mirkin
Meeting of the Association for Computational Linguistics | 2009
Shachar Mirkin; Ido Dagan; Eyal Shnarch