Thierry Poibeau
École Normale Supérieure
Publication
Featured research published by Thierry Poibeau.
Computational Linguistics in the Netherlands | 2000
Thierry Poibeau; Leila Kosseim
This paper discusses the influence of the corpus on the automatic identification of proper names in texts. Techniques developed for the newswire genre are generally not sufficient to deal with larger corpora containing texts that do not follow strict writing constraints (for example, e-mail messages, transcriptions of oral conversations, etc.). After a brief review of the research performed on news texts, we present some of the problems involved in the analysis of two different corpora: e-mails and hand-transcribed telephone conversations. Once the sources of errors have been presented, we then describe an approach to adapt a proper name extraction system developed for newspaper texts to the analysis of e-mail
Multi-source, Multilingual Information Extraction and Summarization | 2013
Horacio Saggion; Thierry Poibeau
Automatic text summarization, the computer-based production of condensed versions of documents, is an important technology for the information society. Without summaries it would be practically impossible for human beings to get access to the ever growing mass of information available online. Although research in text summarization is over 50 years old, some efforts are still needed given the insufficient quality of automatic summaries and the number of interesting summarization topics being proposed in different contexts by end users (“domain-specific summaries”, “opinion-oriented summaries”, “update summaries”, etc.). This paper gives a short overview of summarization methods and evaluation.
International Conference on Computational Linguistics | 2004
Erick Alphonse; Sophie Aubin; Philippe Bessières; Gilles Bisson; Thierry Hamon; Sandrine Lagarrigue; Adeline Nazarenko; Alain-Pierre Manine; Claire Nédellec; Mohamed Ould Abdel Vetah; Thierry Poibeau; Davy Weissenbacher
This paper gives an overview of the Caderige project. This project involves teams from different areas (biology, machine learning, natural language processing) in order to develop high-level analysis tools for extracting structured information from biological bibliographical databases, especially Medline. The paper gives an overview of the approach and compares it to the state of the art.
Conference of the European Chapter of the Association for Computational Linguistics | 2003
Thierry Poibeau
This paper presents a multilingual system designed to recognize named entities in a wide variety of languages (currently more than 12). The system includes original strategies for handling a wide variety of character encodings, along with analysis strategies and algorithms to process these languages.
Archive | 2012
Thierry Poibeau; Horacio Saggion; Jakub Piskorski; Roman Yangarber
Information extraction (IE) and text summarization (TS) are powerful technologies for finding relevant pieces of information in text and presenting them to the user in condensed form. The ongoing information explosion makes IE and TS critical for successful functioning within the information society. These technologies face particular challenges due to the inherent multi-source nature of the information explosion. The technologies must now handle not isolated texts or individual narratives, but rather large-scale repositories and streams, generally in multiple languages, containing a multiplicity of perspectives, opinions, or commentaries on particular topics, entities or events. There is thus a need to adapt existing techniques and develop new ones to deal with these challenges. This volume contains a selection of papers that present a variety of methodologies for content identification and extraction, as well as for content fusion and regeneration. The chapters cover various aspects of the challenges, depending on the nature of the information sought (names vs. events) and the nature of the sources (news streams vs. image captions vs. scientific research papers, etc.). This volume aims to offer a broad and representative sample of studies from this very active research field.
LREC 2008 Workshop on Sentiment Analysis: Emotion, Metaphor, Ontology and Terminology | 2011
Michel Généreux; Thierry Poibeau; Moshe Koppel
Given a corpus of financial news items labelled according to the market reaction following their publication, we investigate ‘contemporaneous’ and forward-looking stock price movements. Our approach is to provide a pool of relevant textual features to a machine learning algorithm to detect substantial stock price variations. Our two working hypotheses are that the market reaction to a news item is a good indicator for labelling financial news items, and that a machine learning algorithm can be trained on those news items to build models detecting price movement effectively.
International Conference on Computational Linguistics | 2002
Thierry Poibeau; Dominique Dutoit
This paper presents a module dedicated to the elaboration of linguistic resources for a versatile Information Extraction system. In order to decrease the time spent on the elaboration of resources for the IE system and guide the end-user in a new domain, we suggest using a machine learning system that helps define new templates and associated resources. This knowledge is automatically derived from the text collection, in interaction with a large semantic network.
International Conference on Computational Linguistics | 2002
Dominique Dutoit; Thierry Poibeau
In this paper, we present a rich semantic network based on a differential analysis. We then detail implemented measures that take into account common and differential features between words. In the final section, we describe some industrial applications.
Joint Conference on Lexical and Computational Semantics | 2015
Pablo Ruiz; Thierry Poibeau
An English entity linking (EL) workflow is presented, which combines the annotations of five public open source EL services. The annotations are combined through a weighted voting scheme inspired by the ROVER method, which had not been previously tested on EL outputs. The combined results improved over each individual system's results, as evaluated on four different golden sets.
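As a rough illustration of weighted voting over entity linking outputs, the sketch below combines annotations from several systems and keeps, for each text span, the entity with the largest share of the total weight. It is a minimal sketch and not the authors' implementation: the function name `combine_annotations`, the system names, the weights, and the acceptance threshold are all hypothetical, and the alignment step of ROVER is replaced here by exact matching on span offsets.

```python
from collections import defaultdict


def combine_annotations(system_outputs, weights, threshold=0.5):
    """Combine entity annotations from several systems by weighted voting.

    system_outputs: dict mapping a system name to a list of
        (start, end, entity) tuples (hypothetical format).
    weights: dict mapping a system name to a non-negative weight
        (hypothetical values, e.g. derived from held-out precision).
    threshold: minimum share of the total weight the winning entity
        must reach for the span to be kept (an assumption of this sketch).
    """
    total_weight = sum(weights.values())

    # votes[(start, end)][entity] accumulates the weight of every
    # system that proposed this entity for this exact span.
    votes = defaultdict(lambda: defaultdict(float))
    for system, annotations in system_outputs.items():
        for start, end, entity in annotations:
            votes[(start, end)][entity] += weights[system]

    combined = []
    for (start, end), candidates in sorted(votes.items()):
        # Pick the entity with the highest accumulated weight.
        entity, score = max(candidates.items(), key=lambda kv: kv[1])
        if score / total_weight >= threshold:
            combined.append((start, end, entity))
    return combined


if __name__ == "__main__":
    outputs = {
        "sys_a": [(0, 5, "Paris"), (10, 15, "Apple_Inc.")],
        "sys_b": [(0, 5, "Paris"), (10, 15, "Apple_Inc.")],
        "sys_c": [(0, 5, "Paris_Hilton"), (20, 24, "Nice")],
    }
    weights = {"sys_a": 5, "sys_b": 3, "sys_c": 2}
    print(combine_annotations(outputs, weights))
```

In this toy input, the span proposed only by the lowest-weighted system falls below the threshold and is dropped, while the two spans backed by the higher-weighted systems are kept.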
Cognitive Processing | 2012
Gilles Col; Jeanne Aptekman; Stéphanie Girault; Thierry Poibeau
We would like to propose a new model of meaning construction based on language comprehension considered as a dynamic process during which the meaning of each linguistic unit and the global meaning of the sentence are determined simultaneously. This model, which may be called “gestalt compositionality,” is radically opposed to the classic compositional mechanism advocated by linguistic formalism based on the primacy of syntax. The process considers the syntactic structure of an utterance as the product of meaning construction rather than its source. The comprehension of an utterance is consequently directly based on the interaction between the different basic components of this utterance: lexical units, grammatical markers, positional relations between units, and more generally, basic “constructions” in the sense of Construction Grammar. Thus, meaning is really the result of a gestalt compositional process insomuch as the contribution of each basic component depends on the contribution of the other components present in the utterance. We show a first attempt at modeling from French and English examples.