Cédrick Fairon

Université catholique de Louvain

Publication

Featured research published by Cédrick Fairon.

Journal of Biomedical Semantics | 2012

Annotation Analysis for Testing Drug Safety Signals using Unstructured Clinical Notes

Paea LePendu; Srinivasan V Iyer; Cédrick Fairon; Nigam H. Shah

Background: The electronic surveillance for adverse drug events is largely based upon the analysis of coded data from reporting systems. Yet the vast majority of electronic health data lies embedded within the free text of clinical notes and is not gathered into centralized repositories. With the increasing access to large volumes of electronic medical data, in particular the clinical notes, it may be possible to computationally encode and to test drug safety signals in an active manner.

Results: We describe the application of simple annotation tools on clinical text and the mining of the resulting annotations to compute the risk of getting a myocardial infarction for patients with rheumatoid arthritis who take Vioxx. Our analysis clearly reveals elevated risks for myocardial infarction in rheumatoid arthritis patients taking Vioxx (odds ratio 2.06) before 2005.

Conclusions: Our results show that it is possible to apply annotation analysis methods for testing hypotheses about drug safety using electronic medical records.
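To make the reported statistic concrete, here is a minimal sketch of how an odds ratio is computed from a 2x2 table of exposure against outcome, with an approximate 95% confidence interval (Woolf's log method). The counts are invented purely for illustration and are not taken from the paper; the function names are likewise hypothetical.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    return (a * d) / (b * c)


def woolf_ci(a, b, c, d, z=1.96):
    """Approximate confidence interval on the log-odds scale (Woolf's method)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Illustrative counts only (NOT the study's data).
a, b, c, d = 30, 70, 20, 80
print(f"odds ratio = {odds_ratio(a, b, c, d):.2f}")
low, high = woolf_ci(a, b, c, d)
print(f"approx. 95% CI = ({low:.2f}, {high:.2f})")
```

An odds ratio above 1 indicates that the outcome is more common among the exposed group; the interval shows how much uncertainty the sample size leaves around that estimate.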

European Conference on Information Retrieval | 2013

Serelex: search and visualization of semantically related words

Alexander Panchenko; Pavel Romanov; Olga Morozova; Hubert Naets; Andrey Philippovich; Alexey Romanov; Cédrick Fairon

We present Serelex, a system that provides, given a query in English, a list of semantically related words. The terms are ranked according to an original semantic similarity measure learnt from a huge corpus. The system performs comparably to dictionary-based baselines, but does not require any semantic resource such as WordNet. Our study shows that users are completely satisfied with 70% of the query results.

North American Chapter of the Association for Computational Linguistics | 2016

TAXI at SemEval-2016 Task 13: a Taxonomy Induction Method based on Lexico-Syntactic Patterns, Substrings and Focused Crawling

Alexander Panchenko; Stefano Faralli; Eugen Ruppert; Steffen Remus; Hubert Naets; Cédrick Fairon; Simone Paolo Ponzetto; Chris Biemann

We present a system for taxonomy construction that reached the first place in all subtasks of the SemEval 2016 challenge on Taxonomy Extraction Evaluation. Our simple yet effective approach harvests hypernyms with substring inclusion and Hearst-style lexico-syntactic patterns from domain-specific texts obtained via language-model-based focused crawling. Extracted taxonomies are evaluated on English, Dutch, French and Italian for three domains each (Food, Environment and Science). Evaluations against a gold standard and by human judgment show that our method outperforms more complex and knowledge-rich approaches on most domains and languages. Furthermore, to adapt the method to a new domain or language, only a small amount of manual labour is needed.
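The two hypernym sources named in the abstract can be illustrated with a small sketch. This is not the TAXI implementation: it assumes a single "X such as Y, Z and W" pattern over single-word terms, and treats a term as a hyponym of another when it ends with that term as a whole word. All function names are hypothetical.

```python
import re

# One Hearst-style pattern: "<hypernym> such as <hyponym>[, <hyponym>]* [and|or <hyponym>]"
SUCH_AS = re.compile(
    r"\b(\w+) such as (\w+(?:(?:, and |, or |, | and | or )\w+)*)"
)
SEPARATORS = re.compile(r", and |, or |, | and | or ")


def hearst_pairs(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1).lower()
        for hyponym in SEPARATORS.split(match.group(2)):
            pairs.append((hyponym.lower(), hypernym))
    return pairs


def substring_pairs(terms):
    """Return (hyponym, hypernym) pairs where the hypernym is a word-level suffix."""
    lowered = [t.lower() for t in terms]
    pairs = []
    for hypo in lowered:
        for hyper in lowered:
            if hypo != hyper and hypo.endswith(" " + hyper):
                pairs.append((hypo, hyper))
    return pairs


print(hearst_pairs("They sell fruits such as apples, pears and plums."))
print(substring_pairs(["science", "computer science", "food"]))
```

A real system would need many more patterns, multi-word term handling, and per-language adaptation; the sketch only shows why both signals are cheap to extract.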

European Conference on Information Retrieval | 2013

Towards detection of child sexual abuse media: categorization of the associated filenames

Alexander Panchenko; Richard Beaufort; Hubert Naets; Cédrick Fairon

This paper approaches the problem of automatic pedophile content identification. We present a system for filename categorization, which is trained to identify suspicious files on P2P networks. In our initial experiments, we used regular pornography data as a substitute for child pornography. Our system separates filenames of pornographic media from the others with an accuracy of 91–97%.

Conference of the European Chapter of the Association for Computational Linguistics | 2006

Corporator: a tool for creating RSS-based specialized corpora

Cédrick Fairon

This paper presents a new approach and software for collecting specialized corpora on the Web. This approach takes advantage of a very popular XML-based standard used on the Web for sharing content among websites: RSS (Really Simple Syndication). After a brief introduction to RSS, we explain the value of this type of data source for corpus development. Finally, we present Corporator, an open-source tool designed for collecting corpora from RSS feeds.
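As an illustration of why RSS suits corpus collection, the sketch below reads the `item` elements of an RSS 2.0 document with Python's standard library and keeps their title, link and description as corpus records. It is not Corporator's code; the sample feed, its URLs and the function name are made up for the example, and a real collector would fetch feeds over the network and follow the links to the full articles.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed used only for this example.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example health news</title>
    <item>
      <title>First story</title>
      <link>http://example.org/1</link>
      <description>Text of the first story.</description>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.org/2</link>
      <description>Text of the second story.</description>
    </item>
  </channel>
</rss>"""


def items_from_rss(xml_text):
    """Return one record per <item>, with missing fields as empty strings."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        records.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "text": (item.findtext("description") or "").strip(),
        })
    return records


for record in items_from_rss(SAMPLE_FEED):
    print(record["title"], "|", record["link"])
```

Because feeds are usually organized by topic, running such a reader periodically over a chosen set of feeds yields a corpus that is already thematically classified.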

Language and Computers | 2006

I'm like, "Hey, it works!": Using GlossaNet to find attestations of the quotative (be) like in English-language newspapers

Cédrick Fairon; John Victor Singler

We present a study of a particular type of quotative that occurs frequently in American Vernacular English and might be becoming part of Standard English: (be) like. To evaluate how this quotative is spreading in written English, we have used GlossaNet, an automatic system that monitors newspapers and analyzes their texts using the programs and linguistic resources of a corpus parser.

Recent Advances in Natural Language Processing | 2017

Using NLP for Enhancing Second Language Acquisition

Leonardo Zilio; Rodrigo Wilkens; Cédrick Fairon

This study presents SMILLE, a system that draws on the Noticing Hypothesis and on input enhancements, addressing the lack of salience of grammatical information in online documents chosen by a given user. By means of input enhancements, the system can draw the user’s attention to grammar, which could possibly lead to a higher intake-per-input ratio for metalinguistic information. The system receives an online document as input and processes it with a combination of a parser and hand-written rules to detect its grammatical structures. The input text can be freely chosen by the user, providing a more engaging experience and reflecting the user’s interests. The system can enhance a total of 107 fine-grained types of grammatical structures that are based on the CEFR. An evaluation of some of those structures resulted in an overall precision of 87%.

International Conference on Advanced Learning Technologies | 2017

Adaptive System for Language Learning

Leonardo Zilio; Cédrick Fairon

This paper presents a system that combines NLP and hand-written rules for enhancing the text of authentic Web pages based on the needs of a specific language learner. It uses the Stanford CoreNLP system to process texts, and applies hand-written rules for retrieving language information that is relevant according to a given Common European Framework of Reference for Languages (CEFR) level. After the text content of the Web page is processed, it is presented to the user with enhancements of various language structures. These enhancements are meant to draw the user's attention to linguistic structures that are present in the text, so that the reading activity covers not only the meaning of the text but also reinforces language learning activities.

North American Chapter of the Association for Computational Linguistics | 2016

CENTAL at SemEval-2016 Task 12: a linguistically fed CRF model for medical and temporal information extraction

Charlotte Hansart; Damien De Meyere; Patrick Watrin; André Bittar; Cédrick Fairon

In this paper, we describe the system developed for our participation in the Clinical TempEval task of SemEval 2016 (task 12). Our team focused on the subtasks of span and attribute identification from raw text and proposed a system that integrates both statistical and linguistic approaches. Our system is based on Conditional Random Fields with high-precision linguistic features.

Journal of French Language Studies | 2017

Social media, spontaneous writing and dictation. Spelling variation

Louise-Amélie Cougnon; Lénaïs Maskens; Sophie Roekhaut; Cédrick Fairon

This study investigates the hypothesis of young people having the multi-skills required to switch between formal and informal communication. We collected samples of the written output of students across different media and communication situations. The results obtained through dictation tests show that the students’ level is relatively low, with a majority of grammatical errors. The analysis of linguistic forms common to the corpora indicates that all the participants use traditional spelling in at least one of them. Lastly, we present a qualitative analysis of spelling variation and an overview of the teenagers’ linguistic representations.
