Xavier Tannier
Centre national de la recherche scientifique
Publication
Featured research published by Xavier Tannier.
INEX'10 Proceedings of the 9th international conference on Initiative for the evaluation of XML retrieval: comparative evaluation of focused retrieval | 2010
Eric SanJuan; Patrice Bellot; Véronique Moriceau; Xavier Tannier
The INEX Question Answering track (QA@INEX) aims to evaluate a complex question-answering task using Wikipedia. The set of questions is composed of factoid, precise questions that expect short answers, as well as more complex questions that can be answered by several sentences or by an aggregation of texts from different documents.
Long answers were evaluated using the Kullback-Leibler (KL) divergence between n-gram distributions. This allowed summarization systems to participate. Most of them generated a readable extract of sentences from the top-ranked documents returned by a state-of-the-art document retrieval engine. Participants also tested several methods of question disambiguation.
Evaluation was carried out on a pool of real questions from OverBlog and Yahoo! Answers. Results tend to show that the baseline-restricted focused IR system minimizes KL divergence but lacks readability, whereas summarization systems tend to use longer, standalone sentences, which improves readability but increases KL divergence.
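As a rough illustration of the evaluation idea described in this abstract, the sketch below computes a KL divergence between the n-gram distributions of a reference text and a candidate answer. It is not the track's official scorer: the whitespace tokenization, the add-alpha smoothing, the direction of the divergence (reference relative to candidate), and the function names `ngram_counts` and `kl_divergence` are all assumptions made for this example.

```python
import math
from collections import Counter


def ngram_counts(text, n=1):
    """Count word n-grams using naive lowercase whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def kl_divergence(reference, candidate, n=1, alpha=1.0):
    """KL(P_reference || P_candidate) over the joint n-gram vocabulary.

    Add-alpha smoothing (an assumption of this sketch) keeps every
    probability non-zero so the logarithm is always defined.
    """
    ref = ngram_counts(reference, n)
    cand = ngram_counts(candidate, n)
    vocab = set(ref) | set(cand)
    if not vocab:
        return 0.0
    ref_total = sum(ref.values()) + alpha * len(vocab)
    cand_total = sum(cand.values()) + alpha * len(vocab)
    divergence = 0.0
    for gram in vocab:
        p = (ref[gram] + alpha) / ref_total
        q = (cand[gram] + alpha) / cand_total
        divergence += p * math.log(p / q)
    return divergence
```

A lower value means the candidate's n-gram distribution is closer to the reference's; an answer identical to the reference scores zero. This also hints at the trade-off reported above: longer, more readable sentences bring in extra n-grams that move the distribution away from the reference.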
Journal of the American Medical Informatics Association | 2013
Cyril Grouin; Natalia Grabar; Thierry Hamon; Sophie Rosset; Xavier Tannier; Pierre Zweigenbaum
OBJECTIVE: To identify the temporal relations between clinical events and temporal expressions in clinical reports, as defined in the i2b2/VA 2012 challenge.
DESIGN: To detect clinical events, we used rules and Conditional Random Fields. We built Random Forest models to identify event modality and polarity. To identify temporal expressions we built on the HeidelTime system. To detect temporal relations, we systematically studied their breakdown into distinct situations; we designed an oracle method to determine the most prominent situations and the most suitable associated classifiers, and combined their results.
RESULTS: We achieved F-measures of 0.8307 for event identification, based on rules, and 0.8385 for temporal expression identification. In the temporal relation task, we identified nine main situations in three groups, experimentally confirming shared intuitions: within-sentence relations, section-related time, and across-sentence relations. Logistic regression and Naïve Bayes performed best on the first and third groups, and decision trees on the second. We reached a global F-measure of 0.6231, improving on our official submission by 7.5 points.
CONCLUSIONS: Carefully hand-crafted rules obtained good results for the detection of events and temporal expressions, while a combination of classifiers improved temporal link prediction. The characterization of the oracle recall of situations allowed us to point to directions where further work would be most useful for temporal relation detection: within-sentence relations and linking History of Present Illness events to the admission date. We suggest that the systematic situation breakdown proposed in this paper could also help improve other systems addressing this task.
international conference on computational linguistics | 2008
Caroline Hagège; Xavier Tannier
In this paper we present the work carried out at Xerox Research Centre Europe to build a robust temporal text processor. The aim of this processor is to extract events described in texts and to link them, when possible, to a temporal anchor. Another goal is to establish a temporal ordering between the events expressed in texts. One original aspect of this work is that the temporal processor is coupled with a syntactic-semantic analyzer. The temporal module thus takes advantage of syntactic and semantic information extracted from the text and, at the same time, syntactic and semantic processing benefits from the temporal processing performed. As a result, the analysis and management of temporal information is combined with other kinds of syntactic and semantic information, making possible a more refined text understanding processor that takes the temporal dimension into account.
Journal of Artificial Intelligence Research | 2011
Xavier Tannier; Philippe Muller
Temporal information has been the focus of recent attention in information extraction, leading to some standardization effort, in particular for the task of relating events in a text. This task raises the problem of comparing two annotations of a given text, because relations between events in a story are intrinsically interdependent and cannot be evaluated separately. A proper evaluation measure is also crucial in the context of a machine learning approach to the problem. Finding a common comparison referent at the text level is not obvious, and we argue here in favor of a shift from event-based measures to measures on a unique textual object, a minimal underlying temporal graph, or more formally the transitive reduction of the graph of relations between event boundaries. We support this proposal with an investigation of its properties on synthetic data and on a well-known temporal corpus.
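To make the notion of a transitive reduction concrete, the sketch below removes every edge that is implied by a longer path in a directed acyclic graph. It is only an illustration under simplifying assumptions, not the paper's measure: nodes stand for event boundaries, every edge means strictly "before" (so the graph is acyclic), boundaries that are simultaneous are assumed to have been merged beforehand, and the function name `transitive_reduction` is hypothetical.

```python
def transitive_reduction(edges):
    """Return the transitive reduction of a DAG given as a set of (u, v) edges.

    An edge (u, v) is dropped when v can still be reached from u through
    another direct successor of u, i.e. when the edge is implied by a
    longer path. Assumes the input graph has no cycles.
    """
    successors = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)
        successors.setdefault(v, set())

    def reachable(start):
        """All nodes reachable from start by a path of at least one edge."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in successors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reduced = set()
    for u, v in edges:
        implied = any(v in reachable(w) for w in successors[u] if w != v)
        if not implied:
            reduced.add((u, v))
    return reduced


# Three boundaries in sequence, plus one redundant "before" relation.
annotated = {("e1_end", "e2_start"), ("e2_start", "e3_start"), ("e1_end", "e3_start")}
minimal = transitive_reduction(annotated)
# minimal == {("e1_end", "e2_start"), ("e2_start", "e3_start")}
```

Two annotations that differ only in such redundant relations reduce to the same minimal graph, which is the intuition behind comparing annotations on that single object rather than relation by relation.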
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval | 2009
Véronique Moriceau; Eric SanJuan; Xavier Tannier; Patrice Bellot
QA@INEX aims to evaluate a complex question-answering task. In such a task, the set of questions is composed of factoid, precise questions that expect short answers, as well as more complex questions that can be answered by several sentences or by an aggregation of texts from different documents. Question-answering, XML/passage retrieval and automatic summarization are combined in order to get closer to real information needs. This paper presents the groundwork carried out in 2009 to determine the tasks and a novel evaluation methodology that will be used in 2010.
Information Processing and Management | 2016
Patrice Bellot; Véronique Moriceau; Josiane Mothe; Eric SanJuan; Xavier Tannier
A full summary report on the four-year-long Tweet Contextualization task. A detailed account of the evaluation metrics and framework we developed for tweet contextualization evaluation. A deep analysis of the approaches suggested by the participants, categorized by method. A description of the data made available to the community.
Microblogging platforms such as Twitter are increasingly used for on-line client and market analysis. This motivated the proposal of a new Tweet Contextualization track at the CLEF INEX lab. The objective of this task was to help a user understand a tweet by providing a short explanatory summary (500 words). This summary should be built automatically using resources like Wikipedia and generated by extracting relevant passages and aggregating them into a coherent summary.
Over the four years the task ran, results show that the best systems combine NLP techniques with more traditional methods. More precisely, the best-performing systems combine passage retrieval, sentence segmentation and scoring, named entity recognition, part-of-speech (POS) analysis, anaphora detection, a content diversity measure, and sentence reordering.
This paper provides a full summary report on the four-year-long task. While yearly overviews focused on system results, in this paper we provide a detailed report on the approaches proposed by the participants, which can be considered the state of the art for this task. As an important result of the four-year competition, we also describe the open-access resources that have been built and collected. The evaluation measures for automatic summarization designed in DUC or MUC were not appropriate for evaluating tweet contextualization; we explain why and describe in detail the LogSim measure used to evaluate the informativeness of the produced contexts or summaries. Finally, we also mention the lessons we learned, which are worth considering when designing a task.
international conference on computational linguistics | 2012
Béatrice Arnulphy; Xavier Tannier; Anne Vilnat
In this paper, we propose a method for automatically creating weighted lexicons of event names. Almost all names of events are ambiguous in context (i.e., they can be interpreted in an eventive or non-eventive reading). Therefore, weights representing the relative eventiveness of a noun can help disambiguate event detection in texts.
We applied our method to both French and English corpora. We performed an evaluation based on a machine-learning approach, which shows that using weighted lexicons can be a good way to improve event extraction. We also present a study of the corpus size needed to create a valuable lexicon.
string processing and information retrieval | 2012
Clément de Groc; Xavier Tannier
In this article, we apply a graph-based approach to pseudo-relevance feedback. We model term co-occurrences in a fixed window or at the document level as a graph and apply a random walk algorithm to select expansion terms. Evaluation of the proposed approach on several standard TREC and CLEF collections, including the recent TREC-Microblog dataset, shows that this approach is in line with state-of-the-art pseudo-relevance feedback models.
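The pipeline described in this abstract (build a term co-occurrence graph from feedback documents, then run a random walk to pick expansion terms) can be sketched as follows. The abstract does not say which random walk is used, so this example assumes a personalized PageRank that restarts at the query terms; the damping factor, iteration count, whitespace tokenization, and all function names are likewise assumptions of the sketch rather than the authors' settings.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(documents, window=None):
    """Undirected weighted graph; an edge weight counts co-occurrences.

    window=None counts co-occurrence at the document level; otherwise two
    terms co-occur when they fall within `window` consecutive tokens.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for doc in documents:
        tokens = doc.lower().split()
        if window is None:
            pairs = combinations(sorted(set(tokens)), 2)
        else:
            pairs = ((tokens[i], tokens[j])
                     for i in range(len(tokens))
                     for j in range(i + 1, min(i + window, len(tokens)))
                     if tokens[i] != tokens[j])
        for a, b in pairs:
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def random_walk_scores(graph, query_terms, damping=0.85, iterations=50):
    """Personalized PageRank by power iteration, restarting at query terms."""
    nodes = list(graph)
    seeds = [t for t in query_terms if t in graph]
    if not seeds:
        return {}
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        updated = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            total = sum(graph[n].values())
            for neighbor, weight in graph[n].items():
                updated[neighbor] += damping * scores[n] * weight / total
        scores = updated
    return scores


def expansion_terms(documents, query_terms, k=3, window=None):
    """Return the k highest-scoring terms that are not already in the query."""
    graph = cooccurrence_graph(documents, window)
    scores = random_walk_scores(graph, query_terms)
    candidates = [t for t in scores if t not in query_terms]
    return sorted(candidates, key=lambda t: (-scores[t], t))[:k]
```

In a pseudo-relevance feedback setting, `documents` would be the top-ranked results of an initial retrieval run, and the returned terms would be appended to the original query before a second run.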
Document numérique | 2015
Patrice Bellot; Véronique Moriceau; Josiane Mothe; Eric SanJuan; Xavier Tannier
This article addresses the evaluation of tweet contextualization. Contextualization is defined as a summary that puts back into context a text which, because of its size, does not contain all the elements a reader needs to understand its content. We define an evaluation framework for tweet contextualization that can be generalized to other short texts. We propose a reference collection as well as ad hoc evaluation measures. This evaluation framework was successfully tested in the INEX Tweet Contextualization campaign. In light of the results obtained during this campaign, we discuss the proposed measures and the results obtained by the participants.
cross language evaluation forum | 2009
Xavier Tannier; Véronique Moriceau
FIDJI is an open-domain question-answering system for French. Its main goal is to validate answers by checking that all the information given in the question is retrieved in the supporting texts. This paper presents FIDJI's results at ResPubliQA 2009, as well as additional experiments that bring to light the role of linguistic modules in this particular campaign.