
Publication


Featured research published by Eric Gaussier.


european conference on information retrieval | 2005

A probabilistic interpretation of precision, recall and F-score, with implication for evaluation

Cyril Goutte; Eric Gaussier

We address the problems of 1/ assessing the confidence of the standard point estimates, precision, recall and F-score, and 2/ comparing the results, in terms of precision, recall and F-score, obtained using two different methods. To do so, we use a probabilistic setting which allows us to obtain posterior distributions on these performance indicators, rather than point estimates. This framework is applied to the case where different methods are run on different datasets from the same source, as well as the standard situation where competing results are obtained on the same data.
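As a rough illustration of the idea in this abstract, the sketch below draws posterior samples of precision, recall and F1 from confusion counts instead of reporting single point estimates. It assumes a symmetric Dirichlet prior over the (true positive, false positive, false negative) proportions, sampled through independent Gamma variates; the function names, the `prior` parameter and the counts in the usage lines are illustrative assumptions, not taken from the paper.

```python
import random


def sample_prf(tp, fp, fn, prior=1.0, n_samples=20000, seed=0):
    """Draw posterior samples of (precision, recall, F1).

    Assumption: a symmetric Dirichlet prior with parameter `prior` on the
    TP/FP/FN proportions. A Dirichlet draw is obtained by normalising
    independent Gamma variates, so the ratios below can be formed directly.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        g_tp = rng.gammavariate(tp + prior, 1.0)
        g_fp = rng.gammavariate(fp + prior, 1.0)
        g_fn = rng.gammavariate(fn + prior, 1.0)
        precision = g_tp / (g_tp + g_fp)
        recall = g_tp / (g_tp + g_fn)
        f1 = 2.0 * g_tp / (2.0 * g_tp + g_fp + g_fn)
        samples.append((precision, recall, f1))
    return samples


def credible_interval(values, level=0.95):
    """Central credible interval estimated from a list of samples."""
    ordered = sorted(values)
    lo_idx = int((1.0 - level) / 2.0 * len(ordered))
    hi_idx = int((1.0 + level) / 2.0 * len(ordered)) - 1
    return ordered[lo_idx], ordered[hi_idx]


def prob_a_beats_b(counts_a, counts_b, n_samples=20000):
    """Estimate P(F1 of A > F1 of B) for two methods evaluated on
    independent datasets (hypothetical helper for illustration)."""
    f1_a = [s[2] for s in sample_prf(*counts_a, n_samples=n_samples, seed=1)]
    f1_b = [s[2] for s in sample_prf(*counts_b, n_samples=n_samples, seed=2)]
    wins = sum(1 for a, b in zip(f1_a, f1_b) if a > b)
    return wins / n_samples


# Made-up counts: (TP, FP, FN) for two systems.
samples = sample_prf(80, 20, 40)
precision_interval = credible_interval([s[0] for s in samples])
p_better = prob_a_beats_b((80, 20, 40), (60, 40, 60))
```

The interval and the comparison probability are Monte Carlo estimates, so they vary slightly with the seed and the number of samples.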


international acm sigir conference on research and development in information retrieval | 2005

Relation between PLSA and NMF and implications

Eric Gaussier; Cyril Goutte

Non-negative Matrix Factorization (NMF, [5]) and Probabilistic Latent Semantic Analysis (PLSA, [4]) have been successfully applied to a number of text analysis tasks such as document clustering. Despite their different inspirations, both methods are instances of multinomial PCA [1]. We further explore this relationship and first show that PLSA solves the problem of NMF with KL divergence, and then explore the implications of this relationship.
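To make the stated relationship concrete, here is a minimal sketch, assuming the standard multiplicative updates for NMF under the generalized KL divergence, followed by a rescaling of the factors into PLSA-style distributions P(z), P(w|z) and P(d|z). The function names, the small `eps` smoothing constant and the random initialisation are illustrative choices, not details from the paper.

```python
import numpy as np


def kl_divergence(V, R, eps=1e-12):
    """Generalized KL divergence D(V || R) between non-negative matrices."""
    return float(np.sum(V * np.log((V + eps) / (R + eps)) - V + R))


def nmf_kl(V, k, n_iter=200, seed=0, eps=1e-12):
    """Factorise V (words x documents) as W @ H with multiplicative
    updates for the KL objective. Returns W, H and the KL history."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, m)) + 0.1
    history = []
    for _ in range(n_iter):
        R = W @ H + eps
        # W_ik <- W_ik * sum_j H_kj V_ij / R_ij / sum_j H_kj
        W *= ((V / R) @ H.T) / (H.sum(axis=1) + eps)
        R = W @ H + eps
        # H_kj <- H_kj * sum_i W_ik V_ij / R_ij / sum_i W_ik
        H *= (W.T @ (V / R)) / (W.sum(axis=0)[:, None] + eps)
        history.append(kl_divergence(V, W @ H))
    return W, H, history


def to_plsa(W, H):
    """Rescale NMF factors into P(z), P(w|z), P(d|z) so that
    sum_z P(z) P(w|z) P(d|z) equals W @ H normalised to sum to 1."""
    a = W.sum(axis=0)            # mass of each column of W
    b = H.sum(axis=1)            # mass of each row of H
    p_w_given_z = W / a
    p_d_given_z = H / b[:, None]
    p_z = a * b / np.sum(a * b)
    return p_z, p_w_given_z, p_d_given_z


# Toy word-document matrix, normalised to a joint distribution.
V = np.array([[4.0, 3.0, 0.0, 0.0],
              [3.0, 5.0, 1.0, 0.0],
              [0.0, 1.0, 4.0, 3.0],
              [0.0, 0.0, 2.0, 5.0]])
V /= V.sum()
W, H, history = nmf_kl(V, k=2)
p_z, p_w_given_z, p_d_given_z = to_plsa(W, H)
```

The rescaling in `to_plsa` only redistributes scale between the factors, so the product is unchanged up to the overall normalisation of `W @ H`.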


meeting of the association for computational linguistics | 2004

A Geometric View on Bilingual Lexicon Extraction from Comparable Corpora

Eric Gaussier; Jean-Michel Renders; Irina Matveeva; Cyril Goutte; Hervé Déjean

We present a geometric view on bilingual lexicon extraction from comparable corpora, which allows one to reinterpret the methods proposed so far and identify unresolved problems. This motivates three new methods that aim at solving these problems. Empirical evaluation shows the strengths and weaknesses of these methods, as well as a significant gain in the accuracy of extracted lexicons.


international acm sigir conference on research and development in information retrieval | 2010

Information-based models for ad hoc IR

Stéphane Clinchant; Eric Gaussier

We introduce in this paper the family of information-based models for ad hoc information retrieval. These models draw their inspiration from a long-standing hypothesis in IR, namely the fact that the difference in the behaviors of a word at the document and collection levels brings information on the significance of the word for the document. This hypothesis has been exploited in the 2-Poisson mixture models, in the notion of eliteness in BM25, and more recently in DFR models. We show here that, combined with notions related to burstiness, it can lead to simpler and better models.


international conference on computational linguistics | 2002

An approach based on multilingual thesauri and model combination for bilingual lexicon extraction

Hervé Déjean; Eric Gaussier; Fatiha Sadat

This paper focuses on exploiting different models and methods in bilingual lexicon extraction, either from parallel or comparable corpora, in specialized domains. First, special attention is given to the use of multilingual thesauri, and different search strategies based on such thesauri are investigated. Then, a method to combine the different models for bilingual lexicon extraction is presented. Our results show that the combination of the models significantly improves results, and that the use of the hierarchical information contained in our thesaurus, UMLS/MeSH, is of primary importance. Lastly, methods for bilingual terminology extraction and thesaurus enrichment are discussed.


empirical methods in natural language processing | 2005

Translating with Non-contiguous Phrases

Michel Simard; Nicola Cancedda; Bruno Cavestro; Marc Dymetman; Eric Gaussier; Cyril Goutte; Kenji Yamada; Philippe Langlais; Arne Mauser

This paper presents a phrase-based statistical machine translation method, based on non-contiguous phrases, i.e. phrases with gaps. A method for producing such phrases from word-aligned corpora is proposed. A statistical translation model that deals with such phrases is also presented, as well as a training method based on the maximization of translation accuracy, as measured with the NIST evaluation metric. Translations are produced by means of a beam-search decoder. Experimental results are presented that demonstrate how the proposed method allows better generalization from the training data.


european conference on information retrieval | 2002

A Hierarchical Model for Clustering and Categorising Documents

Eric Gaussier; Cyril Goutte; Kris Popat; Francine Chen

We propose a new hierarchical generative model for textual data, where words may be generated by topic specific distributions at any level in the hierarchy. This model is naturally well-suited to clustering documents in preset or automatically generated hierarchies, as well as categorising new documents in an existing hierarchy. Training algorithms are derived for both cases, and illustrated on real data by clustering news stories and categorising newsgroup messages. Finally, the generative model may be used to derive a Fisher kernel expressing similarity between documents.


meeting of the association for computational linguistics | 1998

Flow Network Models for Word Alignment and Terminology Extraction from Bilingual Corpora

Eric Gaussier

This paper presents a new model for word alignments between parallel sentences, which allows one to accurately estimate different parameters, in a computationally efficient way. An application of this model to bilingual terminology extraction, where terms are identified in one language and guessed, through the alignment process, in the other one, is also described. An experiment conducted on a small English-French parallel corpus gave results with high precision, demonstrating the validity of the model.


european conference on information retrieval | 2006

Lexical entailment for information retrieval

Stéphane Clinchant; Cyril Goutte; Eric Gaussier

Textual Entailment has recently been proposed as an application-independent task of recognising whether the meaning of one text may be inferred from another. This is potentially a key task in many NLP applications. In this contribution, we investigate the use of various lexical entailment models in Information Retrieval, using the language modelling framework. We show that lexical entailment potentially provides a significant boost in performance, similar to pseudo-relevance feedback, but at a lower computational cost. In addition, we show that the performance is relatively stable with respect to the corpus on which the lexical entailment measure is estimated.


north american chapter of the association for computational linguistics | 2003

Reducing parameter space for word alignment

Hervé Déjean; Eric Gaussier; Cyril Goutte; Kenji Yamada

This paper presents the experimental results of our attempts to reduce the size of the parameter space in word alignment algorithms. We use IBM Model 4 as a baseline. In order to reduce the parameter space, we pre-processed the training corpus using a word lemmatizer and a bilingual term extraction algorithm. Using these additional components, we obtained an improvement in the alignment error rate.
