Publication


Featured research published by Emilie Pasche.


Database | 2013

Managing the data deluge: data-driven GO category assignment improves while complexity of functional annotation increases

Julien Gobeill; Emilie Pasche; Dina Vishnyakova; Patrick Ruch

The available curated data lag behind the current biological knowledge contained in the literature. Text mining can assist biologists and curators to locate and access this knowledge, for instance by characterizing the functional profile of publications. Gene Ontology (GO) category assignment in free text already supports various applications, such as powering ontology-based search engines, finding curation-relevant articles (triage) or helping curators to identify and encode functions. Popular text mining tools for GO classification are based on so-called thesaurus-based (or dictionary-based) approaches, which exploit similarities between the input text and GO terms themselves. But their effectiveness remains limited owing to the complex nature of GO terms, which rarely occur in text. In contrast, machine learning approaches exploit similarities between the input text and already curated instances contained in a knowledge base to infer a functional profile. GO Annotations (GOA) and MEDLINE make it possible to exploit a growing number of curated abstracts (97,000 in November 2012) to populate this knowledge base. Our study compares a state-of-the-art thesaurus-based system with a machine learning system (based on a k-Nearest Neighbours algorithm) on the task of proposing a functional profile for unseen MEDLINE abstracts, and shows how resources and performance have evolved. Systems are evaluated on their ability to propose, for a given abstract, the GO terms (2.8 on average) used for curation in GOA. We show that since 2006, although a massive effort was put into adding synonyms to GO (+300%), the effectiveness of our thesaurus-based system has remained roughly constant, moving from 0.28 to 0.31 for Recall at 20 (R20). In contrast, thanks to the growth of its knowledge base, our machine learning system has steadily improved, rising from an R20 of 0.38 in 2006 to 0.56 in 2012. Integrated into semi-automatic workflows or fully automatic pipelines, such systems provide increasingly effective assistance to biologists. Database URL: http://eagl.unige.ch/GOCat/
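To make the machine-learning approach concrete, here is a minimal sketch of k-Nearest-Neighbours GO assignment: a new abstract inherits the GO terms that most often annotate its nearest curated neighbours. The toy corpus, GO identifiers and parameter values below are illustrative assumptions; GOCat's actual knowledge base and weighting are far richer.

```python
# Minimal k-NN sketch of data-driven GO category assignment (toy data).
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical knowledge base: curated abstracts paired with GOA annotations.
curated_abstracts = [
    "The kinase phosphorylates the receptor and activates signalling.",
    "This transporter mediates glucose uptake across the membrane.",
    "The transcription factor binds DNA and represses gene expression.",
]
curated_go_terms = [
    ["GO:0016301", "GO:0004672"],   # kinase activity
    ["GO:0005355"],                 # glucose transmembrane transporter
    ["GO:0003700", "GO:0003677"],   # DNA-binding transcription factor
]

vectorizer = TfidfVectorizer(stop_words="english")
kb_vectors = vectorizer.fit_transform(curated_abstracts)
knn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(kb_vectors)

def functional_profile(abstract: str, top_n: int = 3):
    """Rank GO terms by how often they annotate the nearest curated abstracts."""
    _, idx = knn.kneighbors(vectorizer.transform([abstract]))
    votes = Counter(term for i in idx[0] for term in curated_go_terms[i])
    return votes.most_common(top_n)

print(functional_profile("A novel kinase that phosphorylates membrane receptors"))
```

The key property is the one the paper measures: as the curated knowledge base grows, the neighbourhood becomes more informative and the profile improves, without any change to the GO terms themselves.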


Journal of Medical Internet Research | 2012

Building a transnational biosurveillance network using semantic web technologies: requirements, design, and preliminary evaluation

Douglas Teodoro; Emilie Pasche; Julien Gobeill; Stéphane Paul Emonet; Patrick Ruch; Christian Lovis

Background: Antimicrobial resistance has reached globally alarming levels and is becoming a major public health threat. A lack of efficacious antimicrobial resistance surveillance systems was identified as one of the causes of increasing resistance, owing to the lag time between the emergence of new resistances and alerts to care providers. Several initiatives to track drug resistance evolution have been developed. However, no effective real-time and source-independent antimicrobial resistance monitoring system is publicly available.

Objective: To design and implement an architecture that can provide real-time, source-independent antimicrobial resistance monitoring to support transnational resistance surveillance. In particular, we investigated the use of a Semantic Web-based model to foster integration and interoperability of interinstitutional and cross-border microbiology laboratory databases.

Methods: Following the agile software development methodology, we derived the main requirements for effective antimicrobial resistance monitoring, from which we proposed a decentralized monitoring architecture based on the Semantic Web stack. The architecture uses an ontology-driven approach to promote the integration of a network of sentinel hospitals or laboratories. Local databases are wrapped into semantic data repositories that automatically expose local, computationally formalized laboratory information on the Web. A central source mediator, based on local reasoning, coordinates access to the semantic endpoints. On the user side, a user-friendly Web interface provides access to, and graphical visualization of, the integrated views.

Results: We designed and implemented the online Antimicrobial Resistance Trend Monitoring System (ARTEMIS) in a pilot network of seven European health care institutions sharing more than 70 million triples of information about drug resistance and consumption. Evaluation of the computing performance of the mediator showed that, on average, query response time was a few seconds (mean 4.3, SD 0.1×10² seconds). Clinical pertinence assessment showed that resistance trends automatically calculated by ARTEMIS had a strong positive correlation with both the European Antimicrobial Resistance Surveillance Network (EARS-Net) (ρ = .86, P < .001) and the Sentinel Surveillance of Antibiotic Resistance in Switzerland (SEARCH) (ρ = .84, P < .001) systems. Furthermore, mean resistance rates extracted by ARTEMIS were not significantly different from those of either EARS-Net (∆ = ±0.130; 95% confidence interval –0 to 0.030; P < .001) or SEARCH (∆ = ±0.042; 95% confidence interval –0.004 to 0.028; P = .004).

Conclusions: We introduce a distributed monitoring architecture that can be used to build transnational antimicrobial resistance surveillance networks. The results indicate that the Semantic Web-based approach provides an efficient and reliable solution for developing eHealth architectures that enable online antimicrobial resistance monitoring from heterogeneous data sources. In the future, we expect more health care institutions to join the ARTEMIS network, so that it can provide a European and wider biosurveillance network able to detect emerging bacterial resistance in a multinational context and support public health actions.
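As a concrete illustration of the mediator pattern described in the Methods, the sketch below dispatches one aggregate SPARQL query to a set of sentinel endpoints and pools the results centrally. The endpoint URLs and the ontology vocabulary are hypothetical placeholders, not the actual ARTEMIS schema; the sketch assumes the SPARQLWrapper library.

```python
# Federated querying sketch: one query, many semantic endpoints (hypothetical).
from SPARQLWrapper import SPARQLWrapper, JSON

SENTINEL_ENDPOINTS = [
    "http://hospital-a.example.org/sparql",   # hypothetical endpoint
    "http://hospital-b.example.org/sparql",   # hypothetical endpoint
]

QUERY = """
PREFIX ex: <http://example.org/amr#>
SELECT ?organism ?antibiotic (COUNT(?isolate) AS ?resistant)
WHERE {
  ?isolate ex:organism ?organism ;
           ex:testedAgainst ?antibiotic ;
           ex:interpretation ex:Resistant .
}
GROUP BY ?organism ?antibiotic
"""

def federated_resistance_counts():
    """Send the query to every sentinel endpoint and pool the bindings."""
    rows = []
    for endpoint in SENTINEL_ENDPOINTS:
        client = SPARQLWrapper(endpoint)
        client.setQuery(QUERY)
        client.setReturnFormat(JSON)
        rows.extend(client.query().convert()["results"]["bindings"])
    return rows
```

The design choice this illustrates is the one the paper argues for: each site exposes its own data through a semantic wrapper, so adding an institution to the network means adding an endpoint URL, not writing a new importer.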


Database | 2015

Deep Question Answering for protein annotation

Julien Gobeill; Arnaud Gaudinat; Emilie Pasche; Dina Vishnyakova; Pascale Gaudet; Amos Marc Bairoch; Patrick Ruch

Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often have to deal with too many documents to find the appropriate information in a reasonable time. In this perspective, question-answering (QA) engines are designed to display answers that were automatically extracted from the retrieved documents. Standard QA engines in the literature process a user question, then retrieve relevant documents and finally extract possible answers from these documents using various named-entity recognition processes. In our study, we try to answer complex genomics questions, which can be adequately answered only using Gene Ontology (GO) concepts. Such complex answers cannot be found using state-of-the-art dictionary- and redundancy-based QA engines. We compare the effectiveness of two dictionary-based classifiers for extracting correct GO answers from a large set of 100 retrieved abstracts per question. In the same way, we also investigate the power of GOCat, a GO supervised classifier. GOCat exploits the GOA database to propose GO concepts that were annotated by curators for similar abstracts. This approach is called deep QA, as it adds an original classification step and exploits curated biological data to infer answers that are not explicitly mentioned in the retrieved documents. We show that for complex answers such as protein functional descriptions, the redundancy phenomenon has a limited effect. Similarly, usual dictionary-based approaches are relatively ineffective. In contrast, we demonstrate how existing curated data, beyond information extraction, can be exploited by a supervised classifier such as GOCat to massively improve both the quantity and the quality of the answers, with a +100% improvement for both recall and precision. Database URL: http://eagl.unige.ch/DeepQA4PA/
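The deep-QA idea (retrieve first, then classify and aggregate) fits in a few lines. The sketch below is a generic outline under the assumption that a search engine and a GO categorizer are supplied as callables; they stand in for EAGLi and GOCat, which are not reproduced here.

```python
# Generic deep-QA outline: retrieve, classify, pool votes into answers.
from collections import Counter
from typing import Callable, Iterable, List

def deep_qa(question: str,
            search: Callable[[str, int], Iterable[str]],
            categorize: Callable[[str], Iterable[str]],
            n_abstracts: int = 100,
            top_n: int = 10) -> List[str]:
    """Retrieve abstracts for the question, classify each into GO concepts,
    and pool the predictions into a ranked answer list."""
    votes = Counter()
    for abstract in search(question, n_abstracts):
        # Every classified abstract votes for its GO concepts, so answers
        # need not appear verbatim in any single retrieved document.
        votes.update(categorize(abstract))
    return [term for term, _ in votes.most_common(top_n)]

# Usage with your own engines: deep_qa(q, search=my_ir_engine, categorize=my_gocat)
```

The classification step is what distinguishes this from redundancy-based QA: the categorizer can propose a GO concept learned from similar curated abstracts even when no retrieved document states it explicitly.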


Cross Language Evaluation Forum | 2009

Simple pre and post processing strategies for patent searching in CLEF intellectual property track 2009

Julien Gobeill; Emilie Pasche; Douglas Teodoro; Patrick Ruch

The objective of the 2009 CLEF-IP Track was to find documents that constitute prior art for a given patent. We explored a wide range of simple pre-processing and post-processing strategies, using Mean Average Precision (MAP) for evaluation purposes. Once the best document representation was determined, we tuned a classical Information Retrieval engine to perform the retrieval step. Finally, we explored two different post-processing strategies. In our experiments, filtering with complete IPC codes led to greater improvements than filtering with 4-digit IPC codes. The second post-processing strategy was to exploit the citations of retrieved patents in order to boost the scores of cited patents. Combining all selected strategies, we computed optimal runs that reached a MAP of 0.122 for the training set and a MAP of 0.129 for the official 2009 CLEF-IP XL set.
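A minimal sketch of the two post-processing strategies, under assumed data structures (a ranked result list, per-patent IPC code sets and citation lists); the boost weight is an arbitrary illustration, not the authors' tuned value.

```python
# Sketch of IPC filtering followed by citation boosting on a ranked run.
def postprocess(results, topic_ipc, ipc_codes, citations, boost=0.1):
    """results:   list of (patent_id, score) pairs from the retrieval engine
    topic_ipc: set of complete IPC codes of the topic patent
    ipc_codes: dict patent_id -> set of complete IPC codes
    citations: dict patent_id -> list of patent_ids it cites"""
    # 1) IPC filtering: keep patents sharing a complete IPC code with the
    #    topic (complete codes worked better than 4-digit codes).
    scores = {pid: s for pid, s in results
              if ipc_codes.get(pid, set()) & topic_ipc}
    # 2) Citation boosting: a retrieved patent passes part of its score on
    #    to the retrieved patents it cites.
    for pid, score in list(scores.items()):
        for cited in citations.get(pid, ()):
            if cited in scores:
                scores[cited] += boost * score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```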


Database | 2012

Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database

Dina Vishnyakova; Emilie Pasche; Patrick Ruch

We report on the original integration of an automatic text categorization pipeline, ToxiCat (Toxicogenomic Categorizer), which we developed to perform biomedical document classification and prioritization in order to speed up the curation of the Comparative Toxicogenomics Database (CTD). The task is essentially a binary classification task, in which a scoring function is used to rank a selected set of articles. Components of a question-answering system are then used to extract CTD-specific annotations from the ranked list of articles. The ranking function is generated using a Support Vector Machine, which combines three main modules: an information retrieval engine for MEDLINE (EAGLi), a gene normalization service (NormaGene) developed for a previous BioCreative campaign, and a set of answering components and entity recognizers for diseases and chemicals. The main components of the pipeline are publicly available both as a web application and as web services. The specific integration performed for the BioCreative competition is available via a web user interface at http://pingu.unige.ch:8080/Toxicat.
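The triage step can be sketched with scikit-learn: a linear SVM is trained on labelled abstracts and its decision values serve as ranking scores. The two-document training set is a toy placeholder for the CTD corpus, and the bag-of-words features are far simpler than ToxiCat's three combined modules.

```python
# SVM-based prioritization sketch: decision values as ranking scores.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = [
    "Arsenic exposure altered gene expression in liver.",    # curation-relevant
    "We review the history of the institution's founding.",  # not relevant
]
train_labels = [1, 0]

ranker = make_pipeline(TfidfVectorizer(), LinearSVC())
ranker.fit(train_texts, train_labels)

def prioritize(abstracts):
    """Rank candidate abstracts by SVM decision value, highest first."""
    scores = ranker.decision_function(abstracts)
    return sorted(zip(abstracts, scores), key=lambda kv: kv[1], reverse=True)

print(prioritize(["Cadmium modulated transcription of stress genes."]))
```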


PLOS ONE | 2013

Assisted knowledge discovery for the maintenance of clinical guidelines

Emilie Pasche; Patrick Ruch; Douglas Teodoro; Angela Huttner; Stéphan Juergen Harbarth; Julien Gobeill; Rolf Wipfli; Christian Lovis

Background: Improving antibiotic prescribing practices is an important public-health priority given widespread antimicrobial resistance. Establishing clinical practice guidelines is crucial to this effort, but their development is a complex task and their quality is directly related to the methodology and the source of knowledge used.

Objective: We present the design and evaluation of a tool (KART) that aims to facilitate the creation and maintenance of clinical practice guidelines using information retrieval techniques.

Methods: KART consists of three main modules: 1) a literature-based medical knowledge extraction module, built upon a specialized question-answering engine; 2) a module to normalize clinical recommendations based on automatic text categorizers; and 3) a module to manage clinical knowledge, which formalizes and stores clinical recommendations for further use. The evaluation of the usability and utility of KART followed the cognitive walkthrough methodology.

Results: KART was designed and implemented as a standalone web application. The quantitative evaluation of the medical knowledge extraction module showed that 53% of the clinical recommendations generated by KART are consistent with existing clinical guidelines. The user-based evaluation confirmed this result, showing that KART was able to find a relevant antibiotic for half of the clinical scenarios tested. The automatic normalization of the recommendations produced mixed results among end users.

Conclusions: We have developed an innovative approach to clinical guideline development and maintenance in a context where available knowledge is increasing at a rate that cannot be sustained by humans. In contrast to existing knowledge authoring tools, KART not only provides assistance to normalize, formalize and store clinical recommendations, but also aims to facilitate knowledge building.
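As an illustration of what the third module might persist, here is a minimal sketch of a formalized clinical recommendation record. The field names and example values are assumptions for illustration, not KART's actual schema.

```python
# Hypothetical record structure for a normalized, storable recommendation.
from dataclasses import dataclass

@dataclass
class Recommendation:
    condition: str      # clinical condition (hypothetical field)
    antibiotic: str     # normalized drug name (hypothetical field)
    dosage: str         # normalized dosage text (hypothetical field)
    source_pmid: str    # provenance: the supporting MEDLINE record
    confidence: float   # score from the question-answering engine

# Tiny in-memory store; the real module persists recommendations for reuse.
knowledge_base: list = []

def store(rec: Recommendation) -> None:
    """Formalize and retain a recommendation for later guideline authoring."""
    knowledge_base.append(rec)

store(Recommendation("community-acquired pneumonia", "amoxicillin",
                     "1 g three times daily", "PMID:00000000", 0.74))
```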


BMC Proceedings | 2011

KART, a knowledge authoring and refinement tool for clinical guidelines development

Emilie Pasche; Douglas Teodoro; Julien Gobeill; Dina Vishnyakova; Patrick Ruch; Christian Lovis

Optimal antibiotic prescriptions rely on evidence-based clinical guidelines, but creating such guidelines requires a time-consuming systematic review of the literature. We aim to facilitate this process by proposing an innovative tool that extracts antibiotic treatments from the literature.


Database | 2016

neXtA5: accelerating annotation of articles via automated approaches in neXtProt

Luc Mottin; Julien Gobeill; Emilie Pasche; Pierre-André Michel; Isabelle Cusin; Pascale Gaudet; Patrick Ruch

The rapid increase in the number of published articles poses a challenge for curated databases to remain up to date. To help the scientific community and database curators deal with this issue, we have developed an application, neXtA5, which prioritizes the literature for specific curation requirements. neXtA5 is a curation service composed of three main elements. The first component is a named-entity recognition module, which annotates MEDLINE along some predefined axes; this report focuses on three axes: Diseases, and the Molecular Function and Biological Process sub-ontologies of the Gene Ontology (GO). The automatic annotations are stored in a local database, BioMed, for each annotation axis. Additional entities such as species and chemical compounds are also identified. The second component is an existing search engine, which retrieves the most relevant MEDLINE records for any given query. The third component uses the content of BioMed to generate an axis-specific ranking, which takes into account the density of named entities as stored in the BioMed database. The two ranked lists are ultimately merged using a linear combination, which has been specifically tuned to support the annotation of each axis; the fine-tuning of the coefficients is formally reported for each axis-driven search. Compared with PubMed, the system used by most curators, the improvements are as follows: +231% for Diseases, +236% for Molecular Function and +3153% for Biological Process, when measuring the precision of the top-returned PMID (P0 or mean reciprocal rank). The current search methods significantly improve the search effectiveness of curators for three important curation axes. Further experiments are being performed to extend the curation types, in particular protein–protein interactions, which require specific relationship extraction capabilities. In parallel, user-friendly interfaces powered by a set of JSON web services are being implemented into the neXtProt annotation pipeline. Available at: http://babar.unige.ch:8082/neXtA5 Database URL: http://babar.unige.ch:8082/neXtA5/fetcher.jsp
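The final merging step is a plain linear combination of two scores per PMID. Below is a minimal sketch, assuming normalized search-engine and entity-density scores keyed by PMID; the coefficient value is illustrative, since the paper tunes it separately per axis.

```python
# Linear-combination merge of two rankings, sketched over score dictionaries.
def merge_rankings(search_scores: dict, density_scores: dict, alpha: float = 0.7):
    """Merge a search-engine ranking with an entity-density ranking.
    Scores are assumed normalized to [0, 1]; alpha is tuned per axis."""
    pmids = set(search_scores) | set(density_scores)
    combined = {
        pmid: alpha * search_scores.get(pmid, 0.0)
              + (1 - alpha) * density_scores.get(pmid, 0.0)
        for pmid in pmids
    }
    return sorted(pmids, key=combined.get, reverse=True)

# Example: merge_rankings({"123": 0.9, "456": 0.4}, {"456": 0.8, "789": 0.5})
```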


Swiss Medical Informatics | 2009

Integration of biomedical data using federated databases

Douglas Teodoro; Emilie Pasche; Rolf Wipfli; Julien Gobeill; Rémy Choquet; Christel Daniel; Patrick Ruch; Christian Lovis

The expansion of biomedical knowledge, the reduction in computing costs and the spread of IT facilities have led to an escalation in the volume of biomedical electronic data. However, these data are rarely integrated and analysed, owing to a lack of specialised tools. This paper presents a pilot system that will be used in the European FP7 DebugIT project to integrate biomedical data from several healthcare centres across Europe. The system aims at solving complex problems arising from the technical and semantic heterogeneity intrinsic to these kinds of data sources, as well as from the limited reliability of distributed systems.


Text Retrieval Conference | 2009

Report on the TREC 2009 Experiments: Chemical IR Track

Julien Gobeill; Douglas Teodoro; Emilie Pasche; Patrick Ruch

Collaboration


Dive into Emilie Pasche's collaborations.

Top Co-Authors

Patrick Ruch, École Polytechnique Fédérale de Lausanne

Julien Gobeill, University of Applied Sciences Western Switzerland

Arnaud Gaudinat, École Normale Supérieure

Luc Mottin, Swiss Institute of Bioinformatics