Publication


Featured research published by Sophie Aubin.


International Conference on Natural Language Processing | 2006

Improving term extraction with terminological resources

Sophie Aubin; Thierry Hamon

Studies of different term extractors on a corpus from the biomedical domain revealed decreasing performance when the tools were applied to highly technical texts. Because existing tools were difficult or impossible to customize, we developed a tunable term extractor. It combines linguistically based rules with the reuse of existing terminologies, i.e. exogenous disambiguation. Experiments reported here show that combining the two strategies allows the extraction of a greater number of term candidates with a higher level of reliability. We further describe the extraction process, involving both endogenous and exogenous disambiguation, implemented in the term extractor YaTeA.


BMC Bioinformatics | 2006

Lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches

Sampo Pyysalo; Tapio Salakoski; Sophie Aubin; Adeline Nazarenko

Background: We study the adaptation of the Link Grammar Parser to the biomedical sublanguage, with a focus on domain terms not found in a general parser lexicon. Using two biomedical corpora, we implement and evaluate three approaches to addressing unknown words: automatic lexicon expansion, the use of morphological clues, and disambiguation using a part-of-speech tagger. We evaluate each approach separately for its effect on parsing performance and consider combinations of these approaches.

Results: In addition to a 45% increase in parsing efficiency, we find that the best approach, incorporating information from a domain part-of-speech tagger, offers a statistically significant 10% relative decrease in error.

Conclusion: When available, a high-quality domain part-of-speech tagger is the best solution to unknown-word issues in the domain adaptation of a general parser. In the absence of such a resource, surface clues can provide remarkably good coverage and performance when tuned to the domain. The adapted parser is available under an open-source license.


International Conference on Computational Linguistics | 2004

Event-based information extraction for the biomedical domain: the Caderige project

Erick Alphonse; Sophie Aubin; Philippe Bessières; Gilles Bisson; Thierry Hamon; Sandrine Lagarrigue; Adeline Nazarenko; Alain-Pierre Manine; Claire Nédellec; Mohamed Ould Abdel Vetah; Thierry Poibeau; Davy Weissenbacher

This paper gives an overview of the Caderige project. The project involves teams from different areas (biology, machine learning, natural language processing) working to develop high-level analysis tools for extracting structured information from biological bibliographic databases, especially Medline. The paper presents the approach and compares it with the state of the art.


Knowledge Acquisition, Modeling and Management | 2010

Building large lexicalized ontologies from text: a use case in automatic indexing of biotechnology patents

Claire Nédellec; Wiktoria Golik; Sophie Aubin; Robert Bossy

This paper presents a tool, TyDI, and the methods tested in building a termino-ontology, i.e. a lexicalized ontology intended for fine-grained indexing in semantic search applications. TyDI provides facilities that let knowledge engineers and domain experts collaborate efficiently to validate, organize and conceptualize terms extracted from a corpus. A use case on biotechnology patent search demonstrates TyDI's potential.


Database | 2016

Text mining resources for the life sciences

Piotr Przybyła; Matthew Shardlow; Sophie Aubin; Robert Bossy; Richard Eckart de Castilho; Stelios Piperidis; John McNaught; Sophia Ananiadou

Text mining is a powerful technology for quickly distilling key information from vast quantities of biomedical literature. However, to harness this power, the researcher must be well versed in the availability, suitability, adaptability, interoperability and comparative accuracy of current text mining resources. In this survey, we give an overview of the text mining resources that exist in the life sciences to help researchers, especially those employed in biocuration, to engage with text mining in their own work. We categorize the various resources under three sections: Content Discovery looks at where and how to find biomedical publications for text mining; Knowledge Encoding describes the formats used to represent the different levels of information associated with content that enable text mining, including those formats used to carry such information between processes; Tools and Services gives an overview of workflow management systems that can be used to rapidly configure and compare domain- and task-specific processes, via access to a wide range of pre-built tools. We also provide links to relevant repositories in each section to enable readers to find resources relevant to their own area of interest. Throughout this work we give special focus to resources that are interoperable, that is, those that have the crucial ability to share information, enabling smooth integration and reusability.


F1000Research | 2017

Developing data interoperability using standards: A wheat community use case

Esther Dzale Yeumo; Michael Alaux; Elizabeth Arnaud; Sophie Aubin; Ute Baumann; Patrice Buche; Laurel Cooper; Hanna Ćwiek-Kupczyńska; Robert Davey; Richard Fulss; Clement Jonquet; Marie-Angélique Laporte; Pierre Larmande; Cyril Pommier; Vassilis Protonotarios; Carmen Reverte; Rosemary Shrestha; Imma Subirats; Aravind Venkatesan; Alex Whan; Hadi Quesneville

In this article, we present a joint effort of the wheat research community, along with data and ontology experts, to develop wheat data interoperability guidelines. Interoperability is the ability of two or more systems and devices to cooperate, exchange data, and interpret that shared information. Interoperability is a growing concern for the wheat scientific community, and for agriculture in general, as the need to interpret the deluge of data obtained through high-throughput technologies grows. Agreeing on common data formats, metadata, and vocabulary standards is an important step toward the level of data interoperability required to add value by encouraging data sharing and, in turn, to facilitate the extraction of new information from existing and new datasets. Over a period of more than 18 months, the RDA Wheat Data Interoperability Working Group (WDI-WG) surveyed the wheat research community about its use of data standards, then discussed and selected a set of recommendations based on consensual criteria. The recommendations promote standards for the data types that the wheat research community identified as the most important for the coming years: nucleotide sequence variants, genome annotations, phenotypes, germplasm data, gene expression experiments, and physical maps. For each of these data types, the guidelines recommend best practices for the use of data formats, metadata standards and ontologies. In addition to the best practices, the guidelines provide examples of tools and implementations that are likely to facilitate adoption of the recommendations. To maximize adoption, the WDI-WG used a community-driven approach that involved the wheat research community from the start, took its needs and practices into account, and provided it with a framework to keep the recommendations up to date. We also report on the potential of this approach to be generalized to other (agricultural) domains.


Archive | 2018

Gestion des connaissances en agroécologie (Knowledge management in agroecology)

Luce Trouche; Laurence Guichard; Sophie Aubin

Agroecology Knowledge Management is an application ontology for describing and organizing the knowledge used to design innovative crop systems.


Recent Advances in Natural Language Processing | 2006

Adapting a general parser to a sublanguage

Sophie Aubin; Adeline Nazarenko; Claire Nédellec


arXiv: Artificial Intelligence | 2007

A robust linguistic platform for efficient and domain specific web content analysis

Thierry Hamon; Adeline Nazarenko; Thierry Poibeau; Sophie Aubin; Julien Derivière


ICBO: International Conference on Biomedical Ontologies | 2016

Reusing the NCBO BioPortal technology for agronomy to build AgroPortal

Clement Jonquet; Anne Toulet; Elizabeth Arnaud; Sophie Aubin; Esther Dzalé-Yeumo; Vincent Emonet; John Graybeal; Mark A. Musen; Cyril Pommier; Pierre Larmande

Collaboration


Top Co-Authors

Claire Nédellec
Institut national de la recherche agronomique

Thierry Poibeau
École Normale Supérieure

Philippe Bessières
Institut national de la recherche agronomique

Robert Bossy
Institut national de la recherche agronomique

Anne Toulet
University of Montpellier