
Publication


Featured research published by Gaël Lejeune.


IEEE International Conference on Healthcare Informatics | 2013

Any Language Early Detection of Epidemic Diseases from Web News Streams

Romain Brixtel; Gaël Lejeune; Antoine Doucet; Nadine Lucas

In this paper, we introduce a multilingual epidemiological news surveillance system. Its main contribution is its ability to extract epidemic events in any language, hence succeeding where state-of-the-art surveillance systems usually fail: the objective of reactivity. Most systems indeed focus on a selected list of languages, deemed important. However, evidence shows that events are first described in the local language, and translated to other languages later, if and only if they contain important information. Hence, while systems handling only a sample of human languages may indeed succeed at extracting epidemic events, they will only do so after someone else has detected the importance of the news and made the decision to translate it. Thus, when events are first described in other languages, such automated systems, which can only detect events that were already detected by humans, are essentially irrelevant for early detection. To overcome this weakness of the state of the art in terms of reactivity, we designed a system that can detect epidemiological events in any language, without requiring any translation, be it automated or human-written. The solution presented in this paper relies on properties that may be called language universals. First, we observe and exploit properties of the news genre that remain unchanged, whatever the writing language. Second, we handle language variations, such as declensions, by processing text at the character level, rather than at the word level. This additionally makes it possible to handle various writing systems in a similar fashion. We present experiments with 5 languages, stereotypical of different language families and writing systems: English, Chinese, Greek, Polish and Russian. Our system, DAnIEL, achieves an average F-measure score around 85%, slightly below top-performing systems for the languages that such systems are able to handle. However, its performance is superior for morphologically rich languages. It of course performs infinitely better for the languages that other systems are not able to handle: the richest system in the state of the art handles around 10 languages, while there exist about 6,000 languages in the world, 300 of which are spoken by more than one million people. The DAnIEL system is able to process each of them.
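To illustrate why character-level processing copes with declensions, here is a minimal Python sketch. It is not DAnIEL's actual algorithm: the function names, the longest-common-substring criterion, and the 0.75 threshold are all assumptions chosen for illustration.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest run of consecutive characters shared by a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def mentions(term: str, text: str, min_ratio: float = 0.75) -> bool:
    """True if a large enough share of `term` occurs contiguously in `text`.

    `min_ratio` is a hypothetical threshold, not a value from the paper.
    """
    term, text = term.lower(), text.lower()
    if not term:
        return False
    shared = longest_common_substring(term, text)
    return len(shared) / len(term) >= min_ratio


# Polish "grypa" (flu) appears here in the genitive form "grypy":
# the shared string "gryp" still covers 4 of the 5 characters.
print(mentions("grypa", "Epidemia grypy w Polsce"))
# Russian "грипп" appears as "гриппа"; no tokenizer or stemmer is needed.
print(mentions("грипп", "Вспышка гриппа в Москве"))
```

A word-level exact match would miss both inflected forms, whereas the character-level comparison needs no language-specific resource beyond the seed term itself.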


Artificial Intelligence in Medicine | 2015

Multilingual event extraction for epidemic detection

Gaël Lejeune; Romain Brixtel; Antoine Doucet; Nadine Lucas

OBJECTIVE This paper presents a multilingual news surveillance system applied to tele-epidemiology. It has been shown that multilingual approaches improve timeliness in detection of epidemic events across the globe, eliminating the wait for local news to be translated into major languages. We present here a system to extract epidemic events in potentially any language, provided a Wikipedia seed for common disease names exists.
METHODS The Daniel system presented herein relies on properties that are common to news writing (the journalistic genre), the most useful being repetition and saliency. Wikipedia is used to screen common disease names to be matched with repeated character strings. Language variations, such as declensions, are handled by processing text at the character level, rather than at the word level. This additionally makes it possible to handle various writing systems in a similar fashion.
MATERIAL As no multilingual ground truth existed to evaluate the Daniel system, we built a multilingual corpus from the Web, and collected annotations from native speakers of Chinese, English, Greek, Polish and Russian, with no connection to or interest in the Daniel system. This data set is freely available online, and can be used for the evaluation of other event extraction systems.
RESULTS Experiments for 5 languages out of 17 tested are detailed in this paper: Chinese, English, Greek, Polish and Russian. The Daniel system achieves an average F-measure of 82% in these 5 languages. It reaches 87% on BEcorpus, the state-of-the-art corpus in English, slightly below top-performing systems, which are tailored with numerous language-specific resources. The consistent performance of Daniel on multiple languages is an important contribution to the reactivity and the coverage of epidemiological event detection systems.
CONCLUSIONS Most event extraction systems rely on extensive resources that are language-specific. While their sophistication induces excellent results (over 90% precision and recall), it restricts their coverage in terms of languages and geographic areas. In contrast, in order to detect epidemic events in any language, the Daniel system only requires a list of a few hundred disease names and locations, which can actually be acquired automatically. The system can perform consistently well on any language, with precision and recall around 82% on average, according to this paper's evaluation. Daniel's character-based approach is especially interesting for morphologically rich and low-resourced languages. Because it has few resources to exploit and relies on state-of-the-art string-matching algorithms, Daniel can process thousands of documents per minute on a simple laptop. In the context of epidemic surveillance, reactivity and geographic coverage are of primary importance, since no one knows where the next event will strike, and therefore in what vernacular language it will first be reported. By being able to process any language, the Daniel system offers unique coverage for poorly endowed languages, and can complement state-of-the-art techniques for major languages.
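The combination of repetition, saliency, and a seed list of disease names can be sketched roughly as follows in Python. This is a sketch under stated assumptions, not the published Daniel implementation: treating the first two non-empty lines (title and first paragraph) as the salient zone, the function names, and the 0.75 ratio are all hypothetical.

```python
def repeated_in_head_and_body(name: str, head: str, body: str,
                              min_ratio: float = 0.75) -> bool:
    """True if a long enough chunk of `name` occurs in BOTH head and body.

    `min_ratio` is a hypothetical threshold chosen for illustration.
    """
    name, head, body = name.lower(), head.lower(), body.lower()
    n = len(name)
    for size in range(n, 0, -1):              # try the longest chunks first
        for start in range(n - size + 1):
            chunk = name[start:start + size]
            if chunk in head and chunk in body:
                # First hit is the longest repeated chunk of the name.
                return size / n >= min_ratio
    return False


def detect_diseases(article: str, seed_names: list[str]) -> list[str]:
    """Return the seed names repeated between the salient head and the rest."""
    paragraphs = [p for p in article.split("\n") if p.strip()]
    head = " ".join(paragraphs[:2])   # assumption: title + first paragraph
    body = " ".join(paragraphs[2:])
    return [n for n in seed_names if repeated_in_head_and_body(n, head, body)]


article = (
    "Dengue outbreak reported in the north\n"
    "Health officials confirmed new dengue cases on Monday.\n"
    "Hospitals are on alert.\n"
    "More dengue patients are expected, officials said."
)
print(detect_diseases(article, ["dengue", "cholera", "measles"]))
```

In this sketch a disease name that appears only in passing, outside the salient zone, is not reported, which mirrors the idea that news articles repeat their main topic in prominent positions.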


Artificial Intelligence in Medicine in Europe | 2013

Added-Value of Automatic Multilingual Text Analysis for Epidemic Surveillance

Gaël Lejeune; Romain Brixtel; Charlotte Lecluze; Antoine Doucet; Nadine Lucas

The early detection of disease outbreaks is an important objective of epidemic surveillance. Web news is one of the information sources for detecting epidemic events as soon as possible, but analyzing the tens of thousands of articles published daily is costly. Recently, automatic systems have been devoted to epidemiological surveillance. The main issue for these systems is to process more languages at a limited cost. However, existing systems mainly process major languages (English, French, Russian, Spanish…). Thus, when the first news reporting a disease is in a minor language, the timeliness of event detection is worsened. In this paper, we test an automatic style-based method, designed to fill the gaps of existing automatic systems. It is parsimonious in resources and specially designed for multilingual issues. The events detected by the human-moderated ProMED-mail between November 2011 and January 2012 are used as a reference dataset and compared to events detected in 17 languages by the DAnIEL system from web articles published in the same time window. We show how being able to process press articles in less widely spoken languages allows quicker detection of epidemic events in some regions of the world.


International Conference on NLP | 2012

DAnIEL: Language Independent Character-Based News Surveillance

Gaël Lejeune; Romain Brixtel; Antoine Doucet; Nadine Lucas

This study aims at developing a news surveillance system able to address multilingual web corpora. As an example of a domain where multilingual capacity is crucial, we focus on Epidemic Surveillance. This task necessitates worldwide coverage of news in order to detect new events as quickly as possible, anywhere, whatever the language in which they are first reported. In this study, text genre is used rather than sentence analysis. The news-genre properties allow us to assess the thematic relevance of news, filtered with the help of a specialised lexicon that is automatically collected on Wikipedia. Afterwards, a more detailed analysis of text-specific properties is applied to relevant documents to better characterize the epidemic event (i.e., which disease spreads where?). Results from 400 documents in each language demonstrate the interest of this multilingual approach with light resources. DAnIEL achieves an F1-measure score around 85%. Two issues are addressed: the first is morphologically rich languages, e.g. Greek, Polish and Russian as compared to English. The second is event location detection as related to disease detection. This system provides a reliable alternative to the generic IE architecture that is constrained by the lack of numerous components in many languages.
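For readers unfamiliar with the metric, the F1-measure is the harmonic mean of precision and recall. The short Python sketch below shows the standard computation; the counts in the usage line are made up for illustration and are not results from the paper.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall, 0.0 when undefined."""
    predicted = true_pos + false_pos      # documents the system flagged
    relevant = true_pos + false_neg       # documents annotators flagged
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / relevant if relevant else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct detections, 2 false alarms, 2 misses.
print(round(f1_score(8, 2, 2), 2))
```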


User Centric Media | 2009

A Proposal for a Multilingual Epidemic Surveillance System

Gaël Lejeune; Mohamed Hatmi; Antoine Doucet; Silja Huttunen; Nadine Lucas

In epidemic surveillance, monitoring numerous languages is an important issue. In this paper we present a system designed to work on French, Spanish and English. The originality of our system is that we use only a few resources to perform our information extraction tasks. Instead of using ontologies, we use structure patterns of newspaper articles. The results on these three languages are encouraging at this preliminary stage, and we will present a few examples of interesting experiments in other languages.


Proceedings of the 4th Workshop on Cross Lingual Information Access | 2010

Filtering news for epidemic surveillance: towards processing more languages with fewer resources

Gaël Lejeune; Antoine Doucet; Roman Yangarber; Nadine Lucas


Language Resources and Evaluation | 2016

Ambiguity Diagnosis for Terms in Digital Humanities

Béatrice Daille; Evelyne Jacquey; Gaël Lejeune; Luis Melo; Yannick Toussaint


Traitement Automatique des Langues Naturelles 2015, DEFT | 2015

A stylometric approach for opinion mining

Gaël Lejeune; Frédéric Dumonceaux


Traitement Automatique des Langues Naturelles 2015 | 2015

Évaluation intrinsèque et extrinsèque du nettoyage de pages Web (Intrinsic and extrinsic evaluation of Web page cleaning)

Gaël Lejeune; Romain Brixtel; Charlotte Lecluze


Traitement Automatique des Langues Naturelles 2015 | 2015

Towards diagnosing ambiguity of candidate terms

Gaël Lejeune; Béatrice Daille

Collaboration


Dive into Gaël Lejeune's collaborations.

Top Co-Authors

Antoine Doucet

University of La Rochelle
