Publication


Featured research published by Olivier Ferret.


International Joint Conference on Natural Language Processing | 2015

Generative Event Schema Induction with Entity Disambiguation

Kiem-Hieu Nguyen; Xavier Tannier; Olivier Ferret; Romaric Besançon

This paper presents a generative model for event schema induction. Previous methods in the literature use only head words to represent entities. However, elements other than head words contain useful information. For instance, "an armed man" is more discriminative than "man". Our model takes this information into account and represents it precisely using probabilistic topic distributions. We show that such information plays an important role in parameter estimation: above all, it makes topic distributions more coherent and more discriminative. Experimental results on a benchmark dataset confirm this improvement empirically.


Conference on Information and Knowledge Management | 2011

Filtering and clustering relations for unsupervised information extraction in open domain

Wei Wang; Romaric Besançon; Olivier Ferret; Brigitte Grau

Information Extraction has recently been extended to new areas by loosening the constraints on the strict definition of the extracted information, which allows the design of more open information extraction systems. In this new domain of unsupervised information extraction, we focus on the task of extracting and characterizing a priori unknown relations between a given set of entity types. One of the challenges of this task is dealing with the large number of candidate relations obtained when extracting them from a large corpus. In this paper, we propose an approach for filtering such candidate relations based on heuristics and machine learning models. More precisely, we show that the best model for this task is a Conditional Random Field model, according to evaluations performed on a manually annotated corpus of about one thousand relations. We also tackle the problem of identifying semantically similar relations by clustering large sets of them. This clustering is achieved by combining a classical clustering algorithm with a method for the efficient identification of highly similar relation pairs. Finally, we evaluate the impact of our relation filtering on this semantic clustering with both internal and external measures. Results show that the filtering procedure doubles the recall of the clustering while keeping the same precision.


Meeting of the Association for Computational Linguistics | 2006

Enhancing Electronic Dictionaries with an Index Based on Associations

Olivier Ferret; Michael Zock

A good dictionary contains not only many entries and a lot of information about each of them, but also adequate means to reveal the stored information. Information access depends crucially on the quality of the index. We present here some ideas on how a dictionary could be enhanced to help a speaker/writer find the word s/he is looking for. To this end, we suggest adding to an existing electronic resource an index based on the notion of association. We also present preliminary work on how a subset of such associations, for example topical associations, can be acquired by filtering a network of lexical co-occurrences extracted from a corpus.


International Journal of Speech Technology | 2010

Deliberate word access: an intuition, a roadmap and some preliminary empirical results

Michael Zock; Olivier Ferret; Didier Schwab

Words undoubtedly play a major role in language production, so finding them is of vital importance, whether for writing or for speaking (spontaneous discourse production, simultaneous translation). Words are stored in a dictionary, and the general belief is that the more entries, the better. Yet to be truly useful, the resource should contain not only many entries and a lot of information about each of them, but also adequate navigational means to reveal the stored information. Information access depends crucially on the organization of the data (words) and on the access keys (meaning/form), two factors that are largely overlooked. We present here some ideas on how an existing electronic dictionary could be enhanced to help a speaker/writer find the word s/he is looking for. To this end, we suggest adding to an existing electronic dictionary an index based on the notion of association, i.e. words co-occurring in a well-balanced corpus, the latter being assumed to represent the average citizen's knowledge of the world. Before describing our approach, we briefly take a critical look at the work of colleagues on automatic, spontaneous or deliberate language production (that is, computer-generated language, simulation of the mental lexicon, or WordNet (WN)) to see how adequate it is with regard to our goal.


Cross-Language Evaluation Forum | 2003

Concept-based searching and merging for Multilingual information retrieval: First experiments at CLEF 2003

Romaric Besançon; Gaël de Chalendar; Olivier Ferret; Christian Fluhr; Olivier Mesnard; Hubert Naets

This article presents the LIC2M cross-lingual retrieval system, which participated in the Small Multilingual Track of CLEF 2003. The system is based on a deep linguistic analysis of documents and queries that aims at categorizing them in terms of concepts, and it implements an original search algorithm, inherited from the SPIRIT (EMIR) system, that takes this categorization into account.


European Conference on Artificial Intelligence | 2012

Combining bootstrapping and feature selection for improving a distributional thesaurus

Olivier Ferret

Work on distributional thesauri has now widely shown that the relations in these thesauri are mainly reliable for high-frequency words and capture semantic relatedness rather than strict semantic similarity. In this article, we propose a method for improving such a thesaurus by rebalancing it in favor of middle- and low-frequency words. This method is based on a bootstrapping mechanism: a set of positive and negative examples of semantically related words is selected in an unsupervised way from the results of the initial measure and used to train a supervised classifier. This classifier is then applied to rerank the initial semantic neighbors. We evaluate the benefit of this reranking for a large set of English nouns with various frequencies.


International Conference on Natural Language Processing | 2010

Using temporal cues for segmenting texts into events

Ludovic Jean-Louis; Romaric Besançon; Olivier Ferret

One of the early applications of Information Extraction, motivated by the need for intelligence tools, is the detection of events in news articles. This detection can be difficult when a news article mentions several occurrences of events of the same kind, which is often done for comparison purposes. In this article, we propose new approaches for segmenting the text of news articles into units that each relate to a single event, in order to help identify the relevant information associated with the main event of the news. We present two approaches that use statistical machine learning models (HMM and CRF) exploiting temporal information extracted from the texts as a basis for this segmentation. The evaluation of these approaches in the domain of seismic events shows that a robust and generic approach can achieve results at least as good as those obtained with a specialized heuristic approach.


Revue des Sciences et Technologies de l'Information - Série RIA : Revue d'Intelligence Artificielle | 2004

REGAL, un système pour la visualisation sélective de documents

Javier Couto; Olivier Ferret; Brigitte Grau; Nicolas Hernandez; Agata Jackiewicz; Jean-Luc Minel; Sylvie Porhiel

Information retrieval systems generally return a ranked list of documents in which only the title, and possibly a snippet containing the words of the request, allow a user to judge a document's relevance to her initial request. This kind of result leads the user to browse many documents before satisfying her information need. In order to improve information retrieval, we have studied text visualization: which information has to be shown, and how? Our system, REGAL (RÉsumé Guidé par les Attentes du Lecteur), automatically extracts the visualized information from texts by applying a thematic analysis that requires neither pre-existing structuring nor formatting of the texts, and that is based on the combination of two criteria: lexical cohesion and cue phrases. KEYWORDS: text visualization, textual navigation, dynamic summarization, thematic analysis.


Archive | 2015

Typing Relations in Distributional Thesauri

Olivier Ferret

Dictionaries are important tools for language producers, but they are rarely organized for easy access to words from concepts. Such access can be facilitated by the presence of relations between words in dictionaries, which makes associative lookup possible. Lexical associations can be extracted quite easily from a corpus as first- or second-order co-occurrence relations. However, these associations face two related problems: they are noisy, and the type of relations on which they are based is implicit. In this article, we propose to address the second problem to some extent by studying the types of relations that can be found in distributional thesauri. More precisely, this study relies on a reference lexical network, WordNet in our case, in which the type of the relations is known. This reference network is used first to directly identify the relations of the thesauri that are present in it, and also to characterize, through the detection of patterns of composition of known relations, new kinds of relations that do not appear explicitly in it.


Cross-Language Evaluation Forum | 2006

Finding answers in the Œdipe system by extracting and applying linguistic patterns

Romaric Besançon; Mehdi Embarek; Olivier Ferret

The simple techniques used in the Œdipe question answering system developed by the CEA-LIST/LIC2M did not perform well on definition questions in past CLEF-QA campaigns. In this article, we present a new module for this QA system dedicated to this type of question. This module is based on the automatic learning of lexico-syntactic patterns from examples. These patterns are then used to extract short answers to definition questions. This technique was tested in the French monolingual track of the CLEF-QA 2006 evaluation, and the results show an improvement on this type of question compared to our previous participation.

Collaboration


Dive into Olivier Ferret's collaborations.

Top Co-Authors


Brigitte Grau

Centre national de la recherche scientifique


Gaël de Chalendar

Centre national de la recherche scientifique


Aurélie Névéol

National Institutes of Health


Anne Vilnat

Centre national de la recherche scientifique


Isabelle Robba

Centre national de la recherche scientifique


Martine Hurault-Plantet

Centre national de la recherche scientifique


Christian Jacquemin

Centre national de la recherche scientifique


Laura Monceaux

Centre national de la recherche scientifique


Vincent Claveau

Centre national de la recherche scientifique