Amir Hazem
University of Nantes
Publication
Featured research published by Amir Hazem.
Natural Language Engineering | 2016
Emmanuel Morin; Amir Hazem
The main work in bilingual lexicon extraction from comparable corpora is based on the implicit hypothesis that corpora are balanced in terms of size. However, the historical context-based projection method is relatively insensitive to the size of each part of the comparable corpus. Within this context, we have carried out a study on the influence of unbalanced specialized comparable corpora on the quality of bilingual terminology extraction through different experiments. Moreover, we have introduced a strategy into the context-based projection method to re-estimate word co-occurrence observations. This is done by using smoothing or prediction techniques that boost the observations of word co-occurrences, which are mainly useful for the smallest part of an unbalanced comparable corpus. Our results show that the use of unbalanced specialized comparable corpora yields a significant improvement in the quality of extracted lexicons.
Meeting of the Association for Computational Linguistics | 2014
Emmanuel Morin; Amir Hazem
The main work in bilingual lexicon extraction from comparable corpora is based on the implicit hypothesis that corpora are balanced. However, the historical context-based projection method dedicated to this task is relatively insensitive to the sizes of each part of the comparable corpus. Within this context, we have carried out a study on the influence of unbalanced specialized comparable corpora on the quality of bilingual terminology extraction through different experiments. Moreover, we have introduced a regression model that boosts the observations of word co-occurrences used in the context-based projection method. Our results show that the use of unbalanced specialized comparable corpora induces a significant gain in the quality of extracted lexicons.
International Conference on Computational Linguistics | 2012
Amir Hazem; Emmanuel Morin
In this paper, we present a new way of looking at the problem of bilingual lexicon extraction from comparable corpora, mainly inspired by the information retrieval (IR) domain and, more specifically, by question-answering systems (QAS). By analogy to QAS, we consider a word to be translated as a part of a question extracted from a source language, and we try to find the correct translation, assuming that it is contained in the correct answer to that question extracted from the target language. The methods traditionally dedicated to the task of bilingual lexicon extraction from comparable corpora tend to represent all the contexts of a word in a single vector and thus give a general representation of all its contexts. We believe that a local representation of the contexts of a word, given by a window that corresponds to the query, is more appropriate, as it gives more importance to local information that could be swallowed up in the volume if represented and treated in a single whole-context vector. We show that the empirical results obtained are competitive with the standard approach traditionally dedicated to this task.
Meeting of the Association for Computational Linguistics | 2015
Emmanuel Morin; Amir Hazem; Florian Boudin; Elizaveta Loginova-Clouet
This paper describes the LINA system for the BUCC 2015 shared track. Following (Enright and Kondrak, 2007), our system identifies comparable documents by collecting counts of hapax words. We extend this method by filtering out document pairs sharing target documents using pigeonhole reasoning and cross-lingual information.
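The hapax-counting idea in the abstract above can be illustrated with a minimal sketch. It is not the LINA system: the whitespace tokenizer, the per-document definition of a hapax (a word occurring exactly once in a document), and the function names `hapax_words`, `best_pairs`, and `filter_shared_targets` are all assumptions made for illustration. The filter keeps only the highest-scoring source document for each target, which is one simple reading of "pigeonhole reasoning"; the cross-lingual information mentioned in the abstract is not modelled.

```python
from collections import Counter


def hapax_words(text):
    """Return the words that occur exactly once in a document (assumed definition)."""
    counts = Counter(text.lower().split())
    return {word for word, count in counts.items() if count == 1}


def best_pairs(source_docs, target_docs):
    """Pair each source document with the target sharing the most hapax words.

    Both arguments map a document id to its raw text; target_docs must be non-empty.
    """
    tgt_hapax = {doc_id: hapax_words(text) for doc_id, text in target_docs.items()}
    pairs = {}
    for src_id, text in source_docs.items():
        src_hapax = hapax_words(text)
        # Ties on the score are broken by the target id (tuple comparison).
        score, tgt_id = max((len(src_hapax & th), t) for t, th in tgt_hapax.items())
        if score > 0:
            pairs[src_id] = (tgt_id, score)
    return pairs


def filter_shared_targets(pairs):
    """Hypothetical filter: when several sources claim one target, keep the best one."""
    best = {}
    for src_id, (tgt_id, score) in pairs.items():
        if tgt_id not in best or score > best[tgt_id][1]:
            best[tgt_id] = (src_id, score)
    return {src_id: (tgt_id, score) for tgt_id, (src_id, score) in best.items()}
```

Shared hapax words here are identical strings in both languages (names, numbers, cognates), so the sketch only works when such overlap exists between the two sides of the corpus.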
International Conference on Computational Linguistics | 2014
Amir Hazem; Emmanuel Morin
This paper proposes two strategies for combining a window-based and a syntax-based context representation for the task of bilingual lexicon extraction from comparable corpora. The first strategy involves combining the scores assigned to translations by both models and using them for ranking and selection; the second strategy involves a combination of the context features provided by the two models prior to applying the lexicon extraction method. The reported results show that the combination of the two context representations significantly improves the performance of bilingual lexicon extraction compared to using each of the representations individually.
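The two combination strategies described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the cosine similarity, the linear interpolation with a `weight` parameter, and the names `combine_scores` and `combine_features` are hypothetical choices. Context vectors are represented as dictionaries mapping a context feature to its association weight.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combine_scores(window_score, syntax_score, weight=0.5):
    """Strategy 1 (assumed form): interpolate the scores given by the two models."""
    return weight * window_score + (1 - weight) * syntax_score


def combine_features(window_vec, syntax_vec):
    """Strategy 2 (assumed form): merge both feature sets into one vector.

    Keys are tagged with their origin so that a window feature and a syntactic
    feature with the same name stay distinct.
    """
    merged = {("win", k): v for k, v in window_vec.items()}
    merged.update({("syn", k): v for k, v in syntax_vec.items()})
    return merged
```

With the first strategy, each candidate translation is scored separately by both models and ranked by the interpolated score. With the second, source and target words are each given one merged vector, and a single similarity such as `cosine` ranks the candidates.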
Language Resources and Evaluation | 2012
Amir Hazem; Emmanuel Morin
Meeting of the Association for Computational Linguistics | 2011
Amir Hazem; Emmanuel Morin; Sebastián Peña Saldarriaga
TALN 2011 - Conférence sur le Traitement Automatique des Langues Naturelles | 2011
Li Bo; Éric Gaussier; Emmanuel Morin; Amir Hazem
Language Resources and Evaluation | 2014
Béatrice Daille; Amir Hazem
Meeting of the Association for Computational Linguistics | 2013
Amir Hazem; Emmanuel Morin
Collaboration
Amir Hazem's collaborations.
Institut de Recherche en Communications et Cybernétique de Nantes