Publication


Featured research published by Antoine Doucet.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012

Report on INEX 2008

T. Beckers; Patrice Bellot; Gianluca Demartini; Ludovic Denoyer; C.M. de Vries; Antoine Doucet; Khairun Nisa Fachry; Norbert Fuhr; Patrick Gallinari; Shlomo Geva; Wei-Che Huang; Tereza Iofciu; Jaap Kamps; Gabriella Kazai; Marijn Koolen; Sangeetha Kutty; Monica Landoni; Miro Lehtonen; Véronique Moriceau; Richi Nayak; Ragnar Nordlie; Nils Pharo; Eric SanJuan; Ralf Schenkel; Xavier Tannier; Martin Theobald; James A. Thom; Andrew Trotman; A.P. de Vries

INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2008 evaluation campaign, which consisted of a wide range of tracks: Ad hoc, Book, Efficiency, Entity Ranking, Interactive, QA, Link the Wiki, and XML Mining.


International Workshop of the Initiative for the Evaluation of XML Retrieval | 2006

Unsupervised Classification of Text-Centric XML Document Collections

Antoine Doucet; Miro Lehtonen

This paper addresses the problem of the unsupervised classification of text-centric XML documents. In the context of the INEX mining track 2006, we present methods to exploit the inherent structural information of XML documents in the document clustering process. Using the k-means algorithm, we experimented with a couple of feature sets and found that a promising direction is to use structural information as a preliminary means to detect and put aside structural outliers. This approach improves the semantic quality of the clustering significantly more than combining the structural and textual feature sets does.
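As a rough illustration of the idea in this abstract, the sketch below first sets aside documents whose element-name profile is far from the collection average, then runs k-means on the text of the remaining documents. It is a minimal sketch, not the authors' implementation: the cosine-to-centroid outlier test, the `threshold` value, the bag-of-words text features and all function names are assumptions made for the example.

```python
import math
import random
from collections import Counter

def tag_profile(tags):
    """Relative frequency of each element name in one document."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds the values key by key
    return {k: s / len(vectors) for k, s in total.items()}

def split_structural_outliers(docs, threshold=0.5):
    """docs: list of (tags, words). Assumed rule: a document is an outlier
    when its tag profile has low cosine similarity to the mean profile."""
    profiles = [tag_profile(tags) for tags, _ in docs]
    centroid = mean_vector(profiles)
    inliers, outliers = [], []
    for i, profile in enumerate(profiles):
        target = inliers if cosine(profile, centroid) >= threshold else outliers
        target.append(i)
    return inliers, outliers

def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means with cosine similarity; returns one cluster id per vector."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return assignment

# Toy collection (invented): four article-like documents and one table-like one.
article = ["article", "sec", "p", "p"]
docs = [
    (article, ["xml", "retrieval", "index"]),
    (article, ["xml", "retrieval", "query"]),
    (article, ["epidemic", "disease", "news"]),
    (article, ["epidemic", "disease", "outbreak"]),
    (["table", "row", "cell", "cell"], ["xml", "disease"]),
]
inliers, outliers = split_structural_outliers(docs)
text_vectors = [dict(Counter(docs[i][1])) for i in inliers]
labels = kmeans(text_vectors, k=2)
```

In this toy run the table-like document is put aside before clustering, so its mixed vocabulary does not pull the two textual clusters together.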


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2006

Advanced document description, a sequential approach

Antoine Doucet

This dissertation addresses the problems of the extraction, selection and exploitation of word sequences, with a particular focus on the applicability to document collections of any type and written in any language.


International Journal on Document Analysis and Recognition | 2011

Setting up a competition framework for the evaluation of structure extraction from OCR-ed books

Antoine Doucet; Gabriella Kazai; Bodin Dresevic; Aleksandar Uzelac; Bogdan Radakovic; Nikola Todic

This paper describes the setup of the Book Structure Extraction competition run at ICDAR 2009. The goal of the competition was to evaluate and compare automatic techniques for deriving structure information from digitized books, which could then be used to aid navigation inside the books. More specifically, the task that participants faced was to construct hyperlinked tables of contents for a collection of 1,000 digitized books. This paper describes the setup of the competition and its challenges. It introduces and discusses the book collection used in the task, the collaborative construction of the ground truth, the evaluation measures, and the evaluation results. The paper also introduces a data set to be used freely for research evaluation purposes.


INEX'10 Proceedings of the 9th international conference on Initiative for the evaluation of XML retrieval: comparative evaluation of focused retrieval | 2010

Overview of the INEX 2010 book track: scaling up the evaluation using crowdsourcing

Gabriella Kazai; Marijn Koolen; Jaap Kamps; Antoine Doucet; Monica Landoni

The goal of the INEX Book Track is to evaluate approaches for supporting users in searching, navigating and reading the full texts of digitized books. The investigation is focused around four tasks: 1) Best Books to Reference, 2) Prove It, 3) Structure Extraction, and 4) Active Reading. In this paper, we report on the setup and the results of these tasks in 2010. The main outcome of the track lies in the changes to the methodology for constructing the test collection for the evaluation of the Best Books and Prove It search tasks. In an effort to scale up the evaluation, we explored the use of crowdsourcing both to create the test topics and then to gather the relevance labels for the topics over a corpus of 50k digitized books. The resulting test collection construction methodology combines editorial judgments contributed by INEX participants with crowdsourced relevance labels. We provide an analysis of the crowdsourced data and conclude that - with appropriate task design - crowdsourcing does provide a suitable framework for the evaluation of book search approaches.
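One simple way to picture how editorial judgments and crowdsourced labels might be combined into a single set of relevance judgments is sketched below. The abstract does not describe the actual aggregation procedure, so everything here is an assumption made for illustration: the majority vote, the rule that ties are dropped, the rule that editorial judgments take precedence, and the names `majority_label` and `build_qrels`.

```python
from collections import Counter

def majority_label(votes):
    """Return the most frequent label, or None when there is no clear winner."""
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two most frequent labels
    return ranked[0][0]

def build_qrels(crowd_votes, editorial):
    """crowd_votes: {(topic, book): [labels]}, editorial: {(topic, book): label}.
    Assumed policy: crowd majority first, editorial judgments override."""
    qrels = {}
    for key, votes in crowd_votes.items():
        label = majority_label(votes)
        if label is not None:
            qrels[key] = label
    qrels.update(editorial)
    return qrels

# Hypothetical topic and book identifiers, 1 = relevant, 0 = not relevant.
crowd_votes = {
    ("t1", "b1"): [1, 1, 0],
    ("t1", "b2"): [0, 1],
    ("t1", "b3"): [0, 0, 1],
}
editorial = {("t1", "b3"): 1}
qrels = build_qrels(crowd_votes, editorial)
```

Here the tied pair `("t1", "b2")` receives no label, and the editorial judgment replaces the crowd majority for `("t1", "b3")`.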


Advances in Focused Retrieval | 2009

Overview of the INEX 2008 Book Track

Gabriella Kazai; Antoine Doucet; Monica Landoni

This paper provides an overview of the INEX 2008 Book Track. Now in its second year, the track aimed at broadening its scope by investigating topics of interest in the fields of information retrieval, human computer interaction, digital libraries, and eBooks. The main topics of investigation were defined around challenges for supporting users in reading, searching, and navigating the full texts of digitized books. Based on these themes, four tasks were defined: 1) The Book Retrieval task aimed at comparing traditional and book-specific retrieval approaches, 2) the Page in Context task aimed at evaluating the value of focused retrieval approaches for searching books, 3) the Structure Extraction task aimed to test automatic techniques for deriving structure from OCR and layout information, and 4) the Active Reading task aimed to explore suitable user interfaces for eBooks enabling reading, annotation, review, and summary across multiple books. We report on the setup and results of each of these tasks.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2014

Document summarization based on word associations

Oskar Gross; Antoine Doucet; Hannu Toivonen

In the age of big data, automatic methods for creating summaries of documents become increasingly important. In this paper we propose a novel, unsupervised method for (multi-)document summarization. In an unsupervised and language-independent fashion, this approach relies on the strength of word associations in the set of documents to be summarized. The summaries are generated by picking sentences which cover the most specific word associations of the document(s). We measure the performance on the DUC 2007 dataset. Our experiments indicate that the proposed method is the best-performing unsupervised summarization method in the state-of-the-art that makes no use of human-curated knowledge bases.
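The selection step described in this abstract, picking sentences that cover the strongest word associations, can be illustrated with a small greedy sketch. This is not the paper's method: the association score below is a simple pointwise-mutual-information stand-in computed from sentence co-occurrence, and the minimum co-occurrence count, the greedy coverage loop and the function names are all assumptions made for the example.

```python
import math
import re
from collections import Counter
from itertools import combinations

def tokenize(sentence):
    """Sorted set of lowercase alphabetic tokens, so word pairs are canonical."""
    return sorted(set(re.findall(r"[a-z]+", sentence.lower())))

def association_weights(sentences):
    """Weight word pairs that co-occur in sentences more often than chance."""
    n = len(sentences)
    word_df, pair_df = Counter(), Counter()
    for s in sentences:
        words = tokenize(s)
        word_df.update(words)
        pair_df.update(combinations(words, 2))
    weights = {}
    for (a, b), c in pair_df.items():
        if c < 2:
            continue  # assumed cut-off: ignore pairs seen only once
        pmi = math.log(c * n / (word_df[a] * word_df[b]))
        if pmi > 0:
            weights[(a, b)] = pmi * c
    return weights

def summarize(sentences, max_sentences=2):
    """Greedily pick sentences that cover the most not-yet-covered weight."""
    weights = association_weights(sentences)
    covered, chosen = set(), []
    candidates = list(range(len(sentences)))

    def gain(i):
        pairs = set(combinations(tokenize(sentences[i]), 2)) - covered
        return sum(weights.get(p, 0.0) for p in sorted(pairs))

    while candidates and len(chosen) < max_sentences:
        best = max(candidates, key=gain)
        if gain(best) == 0:
            break
        covered |= set(combinations(tokenize(sentences[best]), 2))
        chosen.append(best)
        candidates.remove(best)
    return [sentences[i] for i in sorted(chosen)]

# Invented toy document.
sentences = [
    "Floods hit the coastal town.",
    "Floods hit the coastal town again on Monday.",
    "The mayor thanked volunteers.",
    "Volunteers rebuilt the school.",
    "The school was rebuilt by volunteers.",
]
summary = summarize(sentences)
```

Because a word that occurs in every sentence (here "the") has no positive association with anything, it contributes nothing to the score, which fits the language-independent spirit of the approach without needing a stop-word list.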


Lecture Notes in Computer Science | 2009

Overview of the INEX 2009 book track

Gabriella Kazai; Antoine Doucet; Marijn Koolen; Monica Landoni

The goal of the INEX 2009 Book Track is to evaluate approaches for supporting users in reading, searching, and navigating the full texts of digitized books. The investigation is focused around four tasks: 1) the Book Retrieval task aims at comparing traditional and book-specific retrieval approaches, 2) the Focused Book Search task evaluates focused retrieval approaches for searching books, 3) the Structure Extraction task tests automatic techniques for deriving structure from OCR and layout information, and 4) the Active Reading task aims to explore suitable user interfaces for eBooks enabling reading, annotation, review, and summary across multiple books. We report on the setup and the results of the track.


Cross Language Evaluation Forum | 2013

Overview of INEX 2013

Patrice Bellot; Antoine Doucet; Shlomo Geva; Sairam Gurajada; Jaap Kamps; Gabriella Kazai; Marijn Koolen; Arunav Mishra; Véronique Moriceau; Josiane Mothe; Michael Preminger; Eric SanJuan; Ralf Schenkel; Xavier Tannier; Martin Theobald; Matthew Trappett; Qiuyue Wang

INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2013 evaluation campaign, which consisted of four activities addressing three themes: searching professional and user-generated data (Social Book Search track); searching structured or semantic data (Linked Data track); and focused retrieval (Snippet Retrieval and Tweet Contextualization tracks). INEX 2013 was an exciting year for INEX in which we consolidated the collaboration with other activities in CLEF and for the second time ran our workshop as part of the CLEF labs in order to facilitate knowledge transfer between the evaluation forums. This paper gives an overview of all the INEX 2013 tracks, their aims and tasks, and the test collections built, and gives an initial analysis of the results.


IEEE International Conference on Healthcare Informatics | 2013

Any Language Early Detection of Epidemic Diseases from Web News Streams

Romain Brixtel; Gaël Lejeune; Antoine Doucet; Nadine Lucas

In this paper, we introduce a multilingual epidemiological news surveillance system. Its main contribution is its ability to extract epidemic events in any language, hence succeeding where state-of-the-art surveillance systems usually fail: the objective of reactivity. Most systems indeed focus on a selected list of languages, deemed important. However, evidence shows that events are first described in the local language, and translated to other languages later, if and only if they contain important information. Hence, while systems handling only a sample of human languages may indeed succeed at extracting epidemic events, they will only do so after someone else has detected the importance of the news and made the decision to translate it. Thus, with events first described in other languages, such automated systems, which can only detect events that were already detected by humans, are essentially irrelevant for early detection. To overcome this weakness of the state of the art in terms of reactivity, we designed a system that can detect epidemiological events in any language, without requiring any translation, be it automated or human-written. The solution presented in this paper relies on properties that may be called language universals. First, we observe and exploit properties of the news genre that remain unchanged, whatever the writing language. Second, we handle language variations, such as declensions, by processing text at the character level, rather than at the word level. This additionally makes it possible to handle various writing systems in a similar fashion. We present experiments with 5 languages, stereotypical of different language families and writing systems: English, Chinese, Greek, Polish and Russian. Our system, DAnIEL, achieves an average F-measure score around 85%, slightly below top-performing systems for the languages that such systems are able to handle. However, its performance is superior for morphologically rich languages. And it performs of course infinitely better for the languages that other systems are not able to handle: the richest system in the state of the art handles around 10 languages, while there exist about 6,000 languages in the world, 300 of which are spoken by more than one million people. The DAnIEL system is able to process each of them.
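To see why character-level processing helps with declensions, consider matching a small lexicon of disease names against raw text without any tokenization or stemming. The sketch below is an illustration under stated assumptions, not the DAnIEL algorithm: it uses the longest common substring between a lexicon entry and the text, and the `min_ratio` threshold, the toy lexicon and the function names are hypothetical.

```python
def longest_common_substring(a, b):
    """Length of the longest run of characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def detect_terms(text, lexicon, min_ratio=0.75):
    """Return lexicon entries that share a long enough character run with text.
    Assumed rule: the shared run must cover min_ratio of the entry's length."""
    text = text.lower()
    found = []
    for term in lexicon:
        overlap = longest_common_substring(term.lower(), text)
        if overlap / len(term) >= min_ratio:
            found.append(term)
    return found

# Hypothetical lexicon; the Polish sentence uses the inflected form "grypy".
lexicon = ["grypa", "cholera", "denga"]
polish = detect_terms("W regionie odnotowano nowe przypadki grypy.", lexicon)
english = detect_terms("Officials confirmed a cholera outbreak.", lexicon)
```

The Polish entry "grypa" is matched through the shared stem "gryp" even though the text contains a different case ending, which a word-level exact match would miss. The same function works unchanged on any script, since it only compares characters.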

Collaboration


Dive into Antoine Doucet's collaboration.

Top Co-Authors


Oskar Gross

University of Helsinki

Jaap Kamps

University of Amsterdam
