Fleur Mougin
University of Bordeaux
Publication
Featured research published by Fleur Mougin.
Medical Informatics Europe | 2009
Paul Avillach; Fleur Mougin; Michel Joubert; Frantz Thiessard; Antoine Pariente; Jean-Charles Dufour; Gianluca Trifirò; Giovanni Polimeni; Maria Antonietta Catania; Carlo Giaquinto; Giampiero Mazzaglia; Gianluca Baio; Ron M. C. Herings; Rosa Gini; Julia Hippisley-Cox; Mariam Molokhia; Lars Pedersen; Annie Fourrier-Réglat; Miriam Sturkenboom; Marius Fieschi
The overall objective of the EU-ADR project is the design, development, and validation of a computerised system that exploits data from electronic health records and biomedical databases for the early detection of adverse drug reactions. Eight different databases, containing health records of more than 30 million European citizens, are involved in the project. A single query cannot be run across all of these databases because of their heterogeneity: they include both medical record and claims databases, use four different terminologies for coding diagnoses, and describe free-text information in two languages. The aim of our study was to provide database owners with a common basis for building their queries. Using the UMLS, we provided a list of medical concepts, with their corresponding terms and codes in the four terminologies, to be used when retrieving the relevant information on the events of interest from the databases.
Studies in Health Technology and Informatics | 2010
Paul Avillach; Michel Joubert; Frantz Thiessard; Gianluca Trifirò; Jean-Charles Dufour; Antoine Pariente; Fleur Mougin; Giovanni Polimeni; Maria Antonietta Catania; Carlo Giaquinto; Giampiero Mazzaglia; C. Fornari; Ron M. C. Herings; Rosa Gini; Julia Hippisley-Cox; Mariam Molokhia; Lars Pedersen; Annie Fourrier-Réglat; Miriam Sturkenboom; Marius Fieschi
The overall objective of the EU-ADR project is the design, development, and validation of a computerised system that exploits data from electronic health records and biomedical databases for the early detection of adverse drug reactions. Eight different databases, containing health records of more than 30 million European citizens, are involved in the project. A single query cannot be run across all of these databases because of their heterogeneity: they include both medical record and claims databases, use four different terminologies for coding diagnoses, and describe free-text information in two languages. The aim of our study was to provide database owners with a common basis for building their queries. Using the UMLS, we provided a list of medical concepts, with their corresponding terms and codes in the four terminologies, to be used when retrieving the relevant information on the events of interest from the databases.
Journal of Biomedical Informatics | 2009
Fleur Mougin; Olivier Bodenreider; Anita Burgun
OBJECTIVES Polysemy is a frequent issue in biomedical terminologies. In the Unified Medical Language System (UMLS), polysemous terms are either represented as several independent concepts or clustered into a single, multiply-categorized concept. The objective of this study is to analyze polysemous concepts in the UMLS through their categorization and hierarchical relations for auditing purposes. METHODS We used the association of a concept with multiple Semantic Groups (SGs) as a surrogate for polysemy. We first extracted multi-SG (MSG) concepts from the UMLS Metathesaurus and characterized them in terms of the combinations of SGs with which they are associated. We then clustered MSG concepts in order to identify major types of polysemy. We also analyzed the inheritance of SGs in MSG concepts. Finally, we manually reviewed the categorization of the MSG concepts for auditing purposes. RESULTS The 1208 MSG concepts in the Metathesaurus are associated with 30 distinct pairs of SGs. We created 75 semantically homogeneous clusters of MSG concepts; 276 MSG concepts could not be clustered for lack of hierarchical relations. The clusters were characterized by the most frequent pairs of semantic types of their constituent MSG concepts. MSG concepts exhibit limited semantic compatibility with their parent and child concepts. A large majority of MSG concepts (92%) are adequately categorized. Examples of miscategorized concepts are presented. CONCLUSION This work is a systematic analysis and manual review of all concepts categorized by multiple SGs in the UMLS. The correctly categorized MSG concepts do reflect polysemy in the UMLS Metathesaurus. The analysis of inheritance of SGs proved useful for auditing concept categorization in the UMLS.
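The first step of the method above, flagging a concept as polysemous when its semantic types fall into more than one Semantic Group, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the semantic-type-to-SG table is a small toy subset (the full assignment covers every UMLS semantic type), and the concept identifiers `CUI_1` to `CUI_4` are hypothetical.

```python
from collections import Counter

# Toy subset of the UMLS semantic type -> Semantic Group (SG) assignments.
# Illustrative only; the real table covers all semantic types.
SEMTYPE_TO_SG = {
    "Disease or Syndrome": "DISO",
    "Pharmacologic Substance": "CHEM",
    "Amino Acid, Peptide, or Protein": "CHEM",
    "Gene or Genome": "GENE",
    "Body Part, Organ, or Organ Component": "ANAT",
}


def semantic_groups(semtypes):
    """Return the set of SGs reached from a concept's semantic types."""
    return frozenset(SEMTYPE_TO_SG[st] for st in semtypes)


def find_msg_concepts(concept_semtypes):
    """Keep only concepts whose semantic types span more than one SG."""
    msg = {}
    for cui, semtypes in concept_semtypes.items():
        groups = semantic_groups(semtypes)
        if len(groups) > 1:
            msg[cui] = groups
    return msg


def count_sg_combinations(msg_concepts):
    """Count how often each combination of SGs occurs among MSG concepts."""
    return Counter(tuple(sorted(groups)) for groups in msg_concepts.values())


# Hypothetical concept identifiers with their semantic types.
concepts = {
    "CUI_1": ["Gene or Genome", "Amino Acid, Peptide, or Protein"],
    "CUI_2": ["Disease or Syndrome"],
    "CUI_3": ["Pharmacologic Substance", "Amino Acid, Peptide, or Protein"],  # both CHEM
    "CUI_4": ["Amino Acid, Peptide, or Protein", "Gene or Genome"],
}

msg = find_msg_concepts(concepts)
print(sorted(msg))                 # ['CUI_1', 'CUI_4']
print(count_sg_combinations(msg))  # Counter({('CHEM', 'GENE'): 2})
```

Note that `CUI_3` has two semantic types but only one SG, so it is not counted as polysemous; working at the SG level rather than the semantic-type level is what makes the surrogate coarse enough to ignore such fine-grained double typing.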
Artificial Intelligence in Medicine in Europe | 2011
Fleur Mougin; Marie Dupuch; Natalia Grabar
MedDRA is used to index pharmacovigilance spontaneous reports. However, since spontaneous reports cover only a small proportion of existing adverse drug reactions, clinical reports are being seriously considered as an additional source. SNOMED CT is used to index clinical data in many countries, yet the current mapping between MedDRA and SNOMED CT through the UMLS reaches only 42%. In this work, we propose to improve this mapping through an automatic lexical-based approach. We obtained 308 direct mappings of a MedDRA term to a SNOMED CT concept. After segmenting MedDRA terms, we identified 535 full mappings associating a MedDRA term with one or more SNOMED CT concepts. The direct approach yielded 199 (64.6%) correct mappings, while segmentation raised this number to 423 (79.1%). Overall, segmentation increased both the number and the proportion of correct mappings.
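The two-stage lexical approach described above (direct match first, then a "full" mapping in which every segment of the MedDRA term must match a SNOMED CT concept) can be sketched as below. This is a hypothetical illustration: the normalization and the segmentation rule (splitting on commas, "and", and "or") are assumptions made for the example, not the paper's actual rules, and the concept identifiers `SCT_A` to `SCT_C` are placeholders rather than real SNOMED CT identifiers.

```python
import re


def normalize(term):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", term.lower()).split())


def build_index(snomed_terms):
    """Map each normalized SNOMED CT term to the concept(s) it names."""
    index = {}
    for concept_id, terms in snomed_terms.items():
        for term in terms:
            index.setdefault(normalize(term), set()).add(concept_id)
    return index


def segment(term):
    """Hypothetical segmentation rule: split on commas and on 'and'/'or'."""
    parts = re.split(r"\s*,\s*|\s+and\s+|\s+or\s+", term.lower())
    return [normalize(p) for p in parts if normalize(p)]


def map_term(meddra_term, index):
    """Try a direct lexical match first, then a segment-by-segment match."""
    direct = index.get(normalize(meddra_term))
    if direct:
        return "direct", [direct]
    segments = segment(meddra_term)
    if len(segments) > 1:
        hits = [index.get(s) for s in segments]
        if all(hits):  # "full" mapping: every segment must match
            return "segmented", hits
    return "unmapped", []


# Toy SNOMED CT terms keyed by hypothetical concept identifiers.
index = build_index({"SCT_A": ["Nausea"], "SCT_B": ["Vomiting"], "SCT_C": ["Headache"]})

print(map_term("Headache", index))             # ('direct', [{'SCT_C'}])
print(map_term("Nausea and vomiting", index))  # ('segmented', [{'SCT_A'}, {'SCT_B'}])
print(map_term("Rash", index))                 # ('unmapped', [])
```

Requiring every segment to match (rather than accepting partial hits) mirrors the notion of a full mapping in the abstract and keeps a coordinated term from being mapped to only half of its meaning.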
Journal of Biomedical Informatics | 2014
Khadim Dramé; Gayo Diallo; Fleur Delva; Jean-François Dartigues; Evelyne Mouillet; Roger Salamon; Fleur Mougin
Ontologies are useful tools for sharing and exchanging knowledge. However, ontology construction is complex and often time-consuming. In this paper, we present a method for building a bilingual domain ontology from textual and termino-ontological resources, intended for semantic annotation and information retrieval of textual documents. This method combines two approaches: ontology learning from texts and the reuse of existing terminological resources. It consists of four steps: (i) term extraction from domain-specific corpora (in French and English) using textual analysis tools, (ii) clustering of terms into concepts organized according to the UMLS Metathesaurus, (iii) ontology enrichment through the alignment of French and English terms using parallel corpora and the integration of new concepts, and (iv) refinement and validation of results by domain experts. These validated results are formalized into a domain ontology dedicated to Alzheimer's disease and related syndromes, which is available online (http://lesim.isped.u-bordeaux2.fr/SemBiP/ressources/ontoAD.owl). The ontology currently includes 5765 concepts linked by 7499 taxonomic relationships and 10,889 non-taxonomic relationships. Among these results, 439 concepts absent from the UMLS were created and 608 new synonymous French terms were added. The proposed method is sufficiently flexible to be applied to other domains.
International Health Informatics Symposium | 2012
Fleur Mougin; Anita Burgun; Olivier Bodenreider
The Anatomical Therapeutic Chemical (ATC) classification system is widely used in Europe for the classification and coding of drugs. However, ATC is not well integrated with other medication terminologies (e.g., NDF-RT, the National Drug File-Reference Terminology), which hinders the integration of data coded to these two systems. In this work, we propose to map ATC to NDF-RT via the Unified Medical Language System (UMLS), which integrates several medication terminologies, including NDF-RT but not ATC. Only half of ATC terms were successfully mapped to the UMLS using automatic lexical techniques, resulting in very few overlapping drug-class pairs between ATC and NDF-RT. To improve these results, we manually mapped cardiovascular ATC and NDF-RT classes, which increased the number of common drug-class pairs from 39 to 128. We believe that the discovery of mappings between ATC and NDF-RT classes could be further automated and made more effective by identifying mappings between the drugs in these classes.
Journal of the American Medical Informatics Association | 2014
Fleur Mougin; Natalia Grabar
OBJECTIVE This work focuses on multiply-related Unified Medical Language System (UMLS) concepts, that is, concepts associated through multiple relations. The relations involved in such situations are audited to determine whether they are provided by source vocabularies or result from the integration of these vocabularies within the UMLS. METHODS We study the compatibility of the multiple relations that associate the concepts under investigation and try to explain why they co-occur. To this end, we analyze the relations at both the concept and term levels. In addition, we randomly select 288 concepts associated through contradictory relations and analyze them manually. RESULTS At the UMLS scale, only 0.7% of combinations of relations are contradictory, while homogeneous combinations are observed in one-third of situations. At the scale of source vocabularies, one-third do not contain more than one relation between the concepts under investigation. Among the remaining source vocabularies, seven mainly present multiple non-homogeneous relations between terms. Analysis at the term level also shows that the source vocabularies are responsible for the presence of multiply-related concepts in the UMLS in only a quarter of cases. These results are available at: http://www.isped.u-bordeaux2.fr/ArticleJAMIA/results_multiply_related_concepts.aspx. DISCUSSION Manual analysis was useful to explain the differences in how relations between terms are conceptualized across source vocabularies. The exploitation of source relations was helpful for understanding why some source vocabularies describe multiple relations between a given pair of terms.
Data Integration in the Life Sciences | 2008
Fleur Mougin; Anita Burgun; Olivier Bodenreider; Julie Chabalier; Olivier Loréal; Pierre Le Beux
The information needed by biologists and physicians for research purposes is distributed over many heterogeneous sources. Integration systems provide a single, centralized and homogeneous interface for users to query multiple information sources simultaneously. The major limitation of integration systems, including mediator-based systems, is that the tasks involved in their creation and maintenance remain largely manual. To address this limitation, we developed automated methods for facilitating the creation of a mediator-based system. We first implemented an automatic method for acquiring the local schemas of the sources to be integrated. We derived the global schema from the UMLS. Finally, we proposed schema- and instance-based approaches to mapping data elements from the local schemas to the global schema. To illustrate the applicability of our methods, we created a mediator-based system integrating eleven biomedical sources. This prototype is operational, available on the Internet (http://www.med.univ-rennes1.fr/cgi-bin/mougin/These/system.pl), and its evolution is managed semi-automatically.
Sprachwissenschaft | 2017
Thierry Hamon; Natalia Grabar; Fleur Mougin
Recent and intensive research in the biomedical area has led to the accumulation and dissemination of biomedical knowledge through various knowledge bases increasingly available on the Web. Exploiting this knowledge requires creating links between these bases and using them jointly. Linked Data, the SPARQL language, and natural language question answering interfaces provide interesting solutions for querying such knowledge bases. However, while using biomedical Linked Data is crucial, life-science researchers may have difficulty using SPARQL. Interfaces based on natural language question answering are recognized as suitable for querying knowledge bases. In this paper, we propose a method for translating natural language questions into SPARQL queries. We use Natural Language Processing tools, semantic resources, and RDF triple descriptions. We designed a four-step method that linguistically and semantically annotates questions, performs an abstraction of these questions, builds a representation of the SPARQL queries, and finally generates the queries. The method was designed on 50 questions over three biomedical knowledge bases used in task 2 of the QALD-4 challenge framework and evaluated on 27 new questions. It achieves good performance, with an F-measure of 0.78 on the test set. The method for translating questions into SPARQL queries is implemented as a Perl module and is available at http://search.cpan.org/~thhamon/RDF-NLP-SPARQLQuery/.
Journal of Biomedical Informatics | 2017
Jean Noël Nikiema; Vianney Jouhet; Fleur Mougin
In oncology, the reuse of data is hampered by the heterogeneity of terminologies, so these distinct terminologies need to be semantically integrated. Using a third terminology as an intermediary is a conventional approach for integrating two terminologies that are not highly structured. The aim of our study was to use SNOMED CT for integrating ICD-10 and ICD-O3. We used two complementary resources, mapping tables provided by SNOMED CT and the NCI Metathesaurus, in order to find mappings between ICD-10 or ICD-O3 concepts and SNOMED CT concepts. We used the SNOMED CT structure to filter inconsistent mappings, as well as to disambiguate multiple mappings. Based on the remaining mappings, we used semantic relations from SNOMED CT to establish links between ICD-10 and ICD-O3. Overall, the coverage of ICD-O3 and ICD-10 codes was over 88%. Finally, we obtained an integration of 24% (203/852) of ICD-10 concepts with 86% (888/1032) of ICD-O3 morphology concepts combined with 39% (127/330) of ICD-O3 topography concepts. Comparing our results with the 23,684 ICD-O3 pairs mapped to ICD-10 concepts in the SEER conversion file, we found 17,447 pairs of ICD-O3 concepts in common, of which 11,932 were integrated with the same ICD-10 concept as in the SEER conversion file. The automated process leverages logical definitions of SNOMED CT concepts. While the low quality of some of these definitions negatively affected the integration process, identifying such situations made it possible to indirectly audit the structure of SNOMED CT.
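The pivot-based integration described above can be illustrated with a toy model in which an ICD-O3 (morphology, topography) pair is linked to an ICD-10 code when the SNOMED CT disorder mapped to that code has a logical definition whose morphology and site subsume the SNOMED CT concepts mapped from the pair. This is a simplified sketch under stated assumptions: the identifiers such as `M_ADENOCARCINOMA` and `D_MALIGNANT_NEOPLASM_OF_BREAST` are hypothetical placeholders (not real SNOMED CT identifiers), the mappings are toy data, and the filtering and disambiguation steps of the study are omitted.

```python
# Toy SNOMED CT fragments (hypothetical identifiers, not real SCTIDs).
IS_A = {  # child -> parents
    "M_ADENOCARCINOMA": {"M_MALIGNANT_NEOPLASM"},
    "M_MALIGNANT_NEOPLASM": set(),
    "S_BREAST": set(),
    "S_LUNG": set(),
}

# Simplified logical definitions: disorder -> (associated morphology, finding site).
DEFINITIONS = {
    "D_MALIGNANT_NEOPLASM_OF_BREAST": ("M_MALIGNANT_NEOPLASM", "S_BREAST"),
    "D_MALIGNANT_NEOPLASM_OF_LUNG": ("M_MALIGNANT_NEOPLASM", "S_LUNG"),
}

# Toy mappings from each classification to SNOMED CT.
ICD10_TO_SCT = {"C50.9": "D_MALIGNANT_NEOPLASM_OF_BREAST", "C34.9": "D_MALIGNANT_NEOPLASM_OF_LUNG"}
ICDO3_MORPH_TO_SCT = {"8140/3": "M_ADENOCARCINOMA"}
ICDO3_TOPO_TO_SCT = {"C50.9": "S_BREAST", "C34.9": "S_LUNG"}


def self_and_ancestors(concept):
    """Return the concept plus all its ancestors in the is-a hierarchy."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, ()))
    return seen


def integrate():
    """Link (ICD-O3 morphology, ICD-O3 topography) pairs to ICD-10 codes."""
    links = {}
    for morph_code, morph_sct in ICDO3_MORPH_TO_SCT.items():
        for topo_code, topo_sct in ICDO3_TOPO_TO_SCT.items():
            for icd10_code, disorder in ICD10_TO_SCT.items():
                d_morph, d_site = DEFINITIONS[disorder]
                # The disorder's definition must subsume both mapped concepts.
                if d_morph in self_and_ancestors(morph_sct) and d_site in self_and_ancestors(topo_sct):
                    links.setdefault((morph_code, topo_code), set()).add(icd10_code)
    return links


print(integrate())
# {('8140/3', 'C50.9'): {'C50.9'}, ('8140/3', 'C34.9'): {'C34.9'}}
```

The sketch also shows why the quality of logical definitions matters: if a disorder's definition carried the wrong site or morphology, the pair would be linked to the wrong ICD-10 code or not linked at all.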