Noam Ordan | Researchain

Publication

Featured research published by Noam Ordan.

Empirical Methods in Natural Language Processing | 2011

Language Models for Machine Translation: Original vs. Translated Texts

Gennadi Lembersky; Noam Ordan; Shuly Wintner

We investigate the differences between language models compiled from original target-language texts and those compiled from texts manually translated to the target language. Corroborating established observations of Translation Studies, we demonstrate that the latter are significantly better predictors of translated sentences than the former, and hence fit the reference set better. Furthermore, translated texts yield better language models for statistical machine translation than original texts.
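The comparison above can be illustrated with a toy experiment: train one language model on original text and one on translated text, then measure which one assigns lower perplexity to a translated reference set. This is a minimal sketch, assuming a whitespace-tokenized bigram model with add-one smoothing; the abstract does not state the n-gram order, smoothing method, or toolkit, and the three corpora below are made up.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def perplexity(model, sentences, vocab_size):
    """Per-token perplexity of `sentences` under add-one smoothed bigram probabilities."""
    unigrams, bigrams = model
    log_prob, n_tokens = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
            log_prob += math.log(p)
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)

# Made-up toy corpora; `reference` stands in for translated test sentences.
original = ["a dog ran in the park", "the cat sat on the mat"]
translated = ["the cat sat on the mat", "the cat sat on the chair"]
reference = ["the cat sat on the mat"]

# Shared vocabulary so both models smooth over the same event space.
vocab = {tok for s in original + translated + reference for tok in s.split()} | {"<s>", "</s>"}
ppl_original = perplexity(train_bigram_lm(original), reference, len(vocab))
ppl_translated = perplexity(train_bigram_lm(translated), reference, len(vocab))
print(f"original LM: {ppl_original:.2f}  translated LM: {ppl_translated:.2f}")
```

Lower perplexity means the model "fits the reference set better" in the sense the abstract uses; with real data the corpora would be large and comparable in size.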

Literary and Linguistic Computing | 2015

On the features of translationese

Vered Volansky; Noam Ordan; Shuly Wintner

Much research in translation studies indicates that translated texts are ontologically different from original non-translated ones. Translated texts, in any language, can be considered a dialect of that language, known as ‘translationese’. Several characteristics of translationese have been proposed as universal in a series of hypotheses. In this work, we test these hypotheses using a computational methodology that is based on supervised machine learning. We define several classifiers that implement various linguistically informed features, and assess the degree to which different sets of features can distinguish between translated and original texts. We demonstrate that some feature sets are indeed good indicators of translationese, thereby corroborating some hypotheses, whereas others perform much worse (sometimes at chance level), indicating that some ‘universal’ assumptions have to be reconsidered. In memoriam: Miriam Shlesinger, 1947–2012
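The classification setup can be sketched with one linguistically informed feature set: relative frequencies of function words. This is a minimal sketch under stated assumptions: the ten-word list, the training texts, and the labels "O" (original) and "T" (translated) are hypothetical, and a nearest-centroid classifier stands in for whatever supervised learner the study used.

```python
import math
from collections import Counter

# Hypothetical, tiny function-word list; real feature sets are far larger.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "which"]

def features(text):
    """Relative frequency of each function word, normalized by the text's token count."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    n = max(len(tokens), 1)
    return [counts[w] / n for w in FUNCTION_WORDS]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train(labelled_texts):
    """Map each label to the mean feature vector of its training texts."""
    by_label = {}
    for text, label in labelled_texts:
        by_label.setdefault(label, []).append(features(text))
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def classify(model, text):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    vec = features(text)
    return min(model, key=lambda label: math.dist(vec, model[label]))

# Made-up training data: "T" = translated, "O" = original.
training = [
    ("the house of the father of the man which is old", "T"),
    ("the book of the teacher which is on the table", "T"),
    ("dad's old house is nice", "O"),
    ("my teacher left her book here", "O"),
]
model = train(training)
print(classify(model, "the roof of the house which is red"))  # T
```

Swapping `features` for a different function is how one would compare feature sets, which mirrors the abstract's question of which sets separate translated from original texts.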

Computational Linguistics | 2013

Improving statistical machine translation by adapting translation models to translationese

Gennadi Lembersky; Noam Ordan; Shuly Wintner

Translation models used for statistical machine translation are compiled from parallel corpora that are manually translated. The common assumption is that parallel texts are symmetrical: The direction of translation is deemed irrelevant and is consequently ignored. Much research in Translation Studies indicates that the direction of translation matters, however, as translated language (translationese) has many unique properties. It has already been shown that phrase tables constructed from parallel corpora translated in the same direction as the translation task outperform those constructed from corpora translated in the opposite direction. We reconfirm that this is indeed the case, but emphasize the importance of also using texts translated in the “wrong” direction. We take advantage of information pertaining to the direction of translation in constructing phrase tables by adapting the translation model to the special properties of translationese. We explore two adaptation techniques: First, we create a mixture model by interpolating phrase tables trained on texts translated in the “right” and the “wrong” directions. The weights for the interpolation are determined by minimizing perplexity. Second, we define entropy-based measures that estimate the correspondence of target-language phrases to translationese, thereby eliminating the need to annotate the parallel corpus with information pertaining to the direction of translation. We show that incorporating these measures as features in the phrase tables of statistical machine translation systems results in consistent, statistically significant improvement in the quality of the translation.
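The first adaptation technique, a mixture of two phrase tables whose weight is chosen by minimizing perplexity, can be sketched as a linear interpolation plus a grid search. This is a minimal sketch, assuming each table is a single dictionary of p(target | source) scores, that perplexity is measured on a held-out list of phrase pairs, and that unseen pairs receive a small floor probability; the phrase pairs and probabilities below are made up, and real phrase tables carry several scores per entry.

```python
import math

def interpolate(p_right, p_wrong, lam):
    """Linear mixture lam * p_right + (1 - lam) * p_wrong over the union of phrase pairs."""
    pairs = set(p_right) | set(p_wrong)
    return {pair: lam * p_right.get(pair, 0.0) + (1 - lam) * p_wrong.get(pair, 0.0)
            for pair in pairs}

def perplexity(table, dev_pairs, floor=1e-6):
    """Perplexity of held-out phrase pairs; unseen pairs get a small floor probability."""
    log_sum = sum(math.log(max(table.get(pair, 0.0), floor)) for pair in dev_pairs)
    return math.exp(-log_sum / len(dev_pairs))

def best_lambda(p_right, p_wrong, dev_pairs, steps=20):
    """Grid-search the interpolation weight in [0, 1] that minimizes dev-set perplexity."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda lam: perplexity(interpolate(p_right, p_wrong, lam), dev_pairs))

# Made-up tables trained on "right"-direction and "wrong"-direction texts.
p_right = {("maison", "house"): 0.7, ("maison", "home"): 0.3}
p_wrong = {("maison", "house"): 0.4, ("maison", "home"): 0.4, ("maison", "building"): 0.2}
dev_pairs = [("maison", "house"), ("maison", "house"), ("maison", "home"), ("maison", "building")]

lam = best_lambda(p_right, p_wrong, dev_pairs)
print(lam)  # 0.1
```

Because the "wrong"-direction table covers a pair that the "right"-direction table misses, the optimum keeps some weight on it, which echoes the abstract's point that texts translated in the opposite direction remain useful.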

Literary and Linguistic Computing | 2016

Identifying translationese at the word and sub-word level

Ehud Alexander Avner; Noam Ordan; Shuly Wintner

We use text classification to distinguish automatically between original and translated texts in Hebrew, a morphologically complex language. To this end, we design several linguistically informed feature sets that capture word-level and sub-word-level (in particular, morphological) properties of Hebrew. Such features are abstract enough to allow for the development of accurate, robust classifiers, and they also lend themselves to linguistic interpretation. Careful evaluation shows that some of the classifiers we define are, indeed, highly accurate, and scale up nicely to domains that they were not trained on. In addition, analysis of the best features provides insight into the morphological properties of translated texts.

Workshop on Statistical Machine Translation | 2015

Statistical Machine Translation with Automatic Identification of Translationese

Naama Twitto; Noam Ordan; Shuly Wintner

Translated texts (in any language) are so markedly different from original ones that text classification techniques can be used to tease them apart. Previous work has shown that awareness of these differences can significantly improve statistical machine translation. These results, however, required meta-information on the ontological status of texts (original or translated), which is typically unavailable. In this work we show that the predictions of translationese classifiers are as good as meta-information. First, when a monolingual corpus in the target language is given, to be used for constructing a language model, predicting the translated portions of the corpus and using only them for the language model is as good as using the entire corpus. Second, identifying the portions of a parallel corpus that are translated in the direction of the translation task, and using only them for the translation model, is as good as using the entire corpus. We present results from several language pairs and various data sets, indicating that these results are robust and general.
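The selection step for the monolingual case can be sketched as: split the corpus into chunks, run a classifier on each chunk, and keep only the chunks predicted to be translated as language-model training data. This is a minimal sketch, assuming chunks of consecutive sentences (the chunk size of 2 is arbitrary) and a classifier exposed as a boolean function; `toy_classifier` is a hypothetical stand-in, not a trained translationese model.

```python
from typing import Callable, List

def chunk(sentences: List[str], size: int) -> List[List[str]]:
    """Group consecutive sentences into chunks of `size` (the last chunk may be shorter)."""
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]

def select_translated(sentences: List[str],
                      is_translated: Callable[[str], bool],
                      size: int = 2) -> List[str]:
    """Classify each chunk and keep the sentences of chunks predicted to be translated."""
    kept: List[str] = []
    for block in chunk(sentences, size):
        if is_translated(" ".join(block)):
            kept.extend(block)
    return kept

# Hypothetical stand-in for a trained translationese classifier.
def toy_classifier(text: str) -> bool:
    return "which" in text.split()

corpus = ["a b", "c which d", "e f", "g h"]
lm_training_data = select_translated(corpus, toy_classifier)
print(lm_training_data)  # ['a b', 'c which d']
```

The same filter applies to the parallel case by classifying the target side of each chunk of sentence pairs.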

North American Chapter of the Association for Computational Linguistics | 2015

USAAR-CHRONOS: Crawling the Web for Temporal Annotations

Liling Tan; Noam Ordan

This paper describes the USAAR-CHRONOS participation in the Diachronic Text Evaluation task of SemEval-2015, which asks systems to identify the time period of historical text snippets. We adapt a web crawler to retrieve the original source of the text snippets and determine the publication year of the retrieved texts from their URLs. We report a precision score of over 90% in identifying the text epoch. Additionally, by crawling and cleaning the website that hosts the source of the text snippets, we present Daikon, a corpus that can be used for future work on epoch identification from a diachronic perspective.
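Reading a publication year off a retrieved URL can be sketched with a regular expression that looks for a four-digit year not embedded in a longer number. This is a minimal sketch, assuming the year appears verbatim in the URL path; the 1700 to 2015 range and the `example.com` URLs are hypothetical, and the abstract does not say how the system handled URLs without a year.

```python
import re
from typing import Optional

# Hypothetical plausible-year range (1700-2015); adjust to the period the data covers.
YEAR_PATTERN = re.compile(r"(?<!\d)(1[7-9]\d{2}|200\d|201[0-5])(?!\d)")

def year_from_url(url: str) -> Optional[int]:
    """Return the first four-digit year in the URL that is not part of a longer number."""
    match = YEAR_PATTERN.search(url)
    return int(match.group(1)) if match else None

print(year_from_url("http://example.com/news/1923/05/12/story.html"))  # 1923
print(year_from_url("http://example.com/archive/item12345"))  # None
```

The lookbehind and lookahead keep identifiers such as `item12345` from being misread as a year.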

Meeting of the Association for Computational Linguistics | 2017

Found in Translation: Reconstructing Phylogenetic Language Trees from Translations

Ella Rabinovich; Noam Ordan; Shuly Wintner

Translation has played an important role in trade, law, commerce, politics, and literature for thousands of years. Translators have always tried to be invisible; ideal translations should look as if they were written originally in the target language. We show that traces of the source language remain in the translation product to the extent that it is possible to uncover the history of the source language by looking only at the translation. Specifically, we automatically reconstruct phylogenetic language trees from monolingual texts (translated from several source languages). The signal of the source language is so powerful that it is retained even after two phases of translation. This strongly indicates that source language interference is the most dominant characteristic of translated texts, overshadowing the more subtle signals of universal properties of translation.
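Reconstructing a tree from translations amounts to representing the texts translated from each source language as a feature vector and clustering those vectors hierarchically. This is a minimal sketch, assuming average-linkage agglomerative clustering over Euclidean distance as a stand-in (the abstract names neither the features nor the clustering method); the two-dimensional vectors below are made up so that the expected groupings are easy to see.

```python
import math
from itertools import combinations

def _average_distance(a, b):
    """Mean Euclidean distance over all cross-cluster pairs of member vectors."""
    return sum(math.dist(x, y) for x in a for y in b) / (len(a) * len(b))

def average_linkage_tree(vectors):
    """Agglomerative (average-linkage) clustering; returns the tree as nested tuples of labels."""
    # Each cluster is (subtree, member_vectors); start with one leaf per source language.
    clusters = [(label, [vec]) for label, vec in vectors.items()]
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest average distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: _average_distance(clusters[p[0]][1], clusters[p[1]][1]))
        (tree_i, members_i), (tree_j, members_j) = clusters[i], clusters[j]
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [((tree_i, tree_j), members_i + members_j)]
    return clusters[0][0]

# Made-up feature vectors for English texts translated from each source language.
profiles = {
    "French": [0.9, 0.1],
    "Italian": [0.8, 0.2],
    "German": [0.2, 0.8],
    "Dutch": [0.25, 0.75],
}
tree = average_linkage_tree(profiles)
print(tree)  # (('German', 'Dutch'), ('French', 'Italian'))
```

With real data, the resulting tree would be compared against a gold-standard phylogenetic tree of the source languages.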

Corpus Linguistics and Linguistic Theory | 2010

Translational conflicts between cognate languages: Arabic into Hebrew as case in point

Noam Ordan; Nimrod Hershberg; Miriam Shlesinger

In this study we examine ten hypotheses related to stylistic features of two Semitic languages, Hebrew and Arabic. Our assumption was that these hypotheses would enable us to discriminate texts written originally in Hebrew from those translated from Arabic. The ten hypotheses take into account a contrastive analytical view as well as recent research in Translation Studies. Being cognate languages, Hebrew and Arabic share morphological, phonetic and semantic features; consequently, certain Arabic forms trigger similar forms in Hebrew. We see this as accounting for the predominance of interference in the case of this (cognate) language pair and as evidence of an overarching hypothesis, whereby translation between cognate languages will entail interference, often superseding the norm of adherence to target-language standards. Eight of the ten hypotheses were borne out, to varying extents.

Meeting of the Association for Computational Linguistics | 2011