Publication


Featured research published by Julien Gobeill.


PLOS Biology | 2013

The COMBREX Project: Design, Methodology, and Initial Results

Brian P. Anton; Yi-Chien Chang; Peter Brown; Han-Pil Choi; Lina L. Faller; Jyotsna Guleria; Zhenjun Hu; Niels Klitgord; Ami Levy-Moonshine; Almaz Maksad; Varun Mazumdar; Mark McGettrick; Lais Osmani; Revonda Pokrzywa; John Rachlin; Rajeswari Swaminathan; Benjamin Allen; Genevieve Housman; Caitlin Monahan; Krista Rochussen; Kevin Tao; Ashok S. Bhagwat; Steven E. Brenner; Linda Columbus; Valérie de Crécy-Lagard; Donald J. Ferguson; Alexey Fomenkov; Giovanni Gadda; Richard D. Morgan; Andrei L. Osterman

Experimental data exists for only a vanishingly small fraction of sequenced microbial genes. This community page discusses the progress made by the COMBREX project to address this important issue using both computational and experimental resources.


BMC Medical Informatics and Decision Making | 2008

Automatic medical encoding with SNOMED categories

Patrick Ruch; Julien Gobeill; Christian Lovis; Antoine Geissbuhler

Background: In this paper, we describe the design and preliminary evaluation of a new type of tool to speed up the encoding of episodes of care using the SNOMED CT terminology.

Methods: The proposed system can be used either as a search tool to browse the terminology or as a categorization tool to support automatic annotation of textual contents with SNOMED concepts. The general strategy is similar for both tools and is based on the fusion of two complementary retrieval strategies with thesaural resources. The first classification module uses a traditional vector-space retrieval engine, which has been fine-tuned for the task, while the second classifier is based on regular variations of the term list. For evaluating the system, we use a sample of MEDLINE. SNOMED CT categories have been restricted to Medical Subject Headings (MeSH) using the SNOMED-MeSH mapping provided by the UMLS (version 2006).

Results: Consistent with previous investigations of biomedical terminologies, our results show that the performance of the hybrid system is significantly improved compared with each single module. For top returned concepts, a precision at high ranks (P0) of more than 80% is observed. In addition, a manual and qualitative evaluation on a dozen MEDLINE abstracts suggests that SNOMED CT could represent an improvement over existing medical terminologies such as MeSH.

Conclusion: Although the precision of the SNOMED categorizer seems sufficient to help professional encoders, clinical benchmarks as well as usability studies are needed to assess the impact of our SNOMED encoding method in real settings.

Availability: The system is available for research purposes at http://eagl.unige.ch/SNOCat.
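The fusion of two complementary classifiers described in this abstract can be sketched as a weighted combination of normalized score lists. This is a minimal illustration only; the function names and the 50/50 weighting are assumptions, not the published implementation.

```python
def fuse_rankings(vector_space, pattern_based, weight=0.5):
    """Merge two {concept: score} dicts by a weighted sum of
    min-max-normalized scores, returning concepts best-first."""
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero for flat lists
        return {c: (s - lo) / span for c, s in scores.items()}

    a, b = normalize(vector_space), normalize(pattern_based)
    fused = {c: weight * a.get(c, 0.0) + (1 - weight) * b.get(c, 0.0)
             for c in set(a) | set(b)}
    return sorted(fused, key=fused.get, reverse=True)
```

A concept proposed by both modules accumulates evidence from each, which is why such hybrids tend to outperform either module alone.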


Database | 2014

Overview of the gene ontology task at BioCreative IV

Yuqing Mao; Kimberly Van Auken; Donghui Li; Cecilia N. Arighi; Peter McQuilton; G. Thomas Hayman; Susan Tweedie; Mary L. Schaeffer; Stanley J. F. Laulederkind; Shur Jen Wang; Julien Gobeill; Patrick Ruch; Anh Tuan Luu; Jung Jae Kim; Jung-Hsien Chiang; Yu De Chen; Chia Jung Yang; Hongfang Liu; Dongqing Zhu; Yanpeng Li; Hong Yu; Ehsan Emadzadeh; Graciela Gonzalez; Jian Ming Chen; Hong Jie Dai; Zhiyong Lu

Gene Ontology (GO) annotation is a common task among model organism databases (MODs) for capturing gene function data from journal articles. It is a time-consuming and labor-intensive task, and is thus often considered as one of the bottlenecks in literature curation. There is a growing need for semiautomated or fully automated GO curation techniques that will help database curators to rapidly and accurately identify gene function information in full-length articles. Despite multiple attempts in the past, few studies have proven to be useful with regard to assisting real-world GO curation. The shortage of sentence-level training data and opportunities for interaction between text-mining developers and GO curators has limited the advances in algorithm development and corresponding use in practical circumstances. To this end, we organized a text-mining challenge task for literature-based GO annotation in BioCreative IV. More specifically, we developed two subtasks: (i) to automatically locate text passages that contain GO-relevant information (a text retrieval task) and (ii) to automatically identify relevant GO terms for the genes in a given article (a concept-recognition task). With the support from five MODs, we provided teams with >4000 unique text passages that served as the basis for each GO annotation in our task data. Such evidence text information has long been recognized as critical for text-mining algorithm development but was never made available because of the high cost of curation. In total, seven teams participated in the challenge task. From the team results, we conclude that the state of the art in automatically mining GO terms from literature has improved over the past decade while much progress is still needed for computer-assisted GO curation. 
Future work should focus on addressing remaining technical challenges for improved performance of automatic GO concept recognition and incorporating practical benefits of text-mining tools into real-world GO annotation. Database URL: http://www.biocreative.org/tasks/biocreative-iv/track-4-GO/.
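Scoring a team's predicted GO terms against the curated gold annotations for an article, as in the concept-recognition subtask above, reduces to set-overlap precision/recall/F1. A minimal sketch (identifiers are toy examples):

```python
def prf(gold, predicted):
    """Precision, recall and F1 of predicted GO identifiers
    against the gold-standard annotations for one article."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # correctly recovered annotations
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```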


Database | 2013

Managing the data deluge: data-driven GO category assignment improves while complexity of functional annotation increases

Julien Gobeill; Emilie Pasche; Dina Vishnyakova; Patrick Ruch

The available curated data lag behind current biological knowledge contained in the literature. Text mining can assist biologists and curators to locate and access this knowledge, for instance by characterizing the functional profile of publications. Gene Ontology (GO) category assignment in free text already supports various applications, such as powering ontology-based search engines, finding curation-relevant articles (triage) or helping the curator to identify and encode functions. Popular text-mining tools for GO classification are based on so-called thesaurus-based (or dictionary-based) approaches, which exploit similarities between the input text and the GO terms themselves. But their effectiveness remains limited owing to the complex nature of GO terms, which rarely occur in text. In contrast, machine learning approaches exploit similarities between the input text and already curated instances contained in a knowledge base to infer a functional profile. GO Annotations (GOA) and MEDLINE make it possible to exploit a growing amount of curated abstracts (97 000 in November 2012) for populating this knowledge base. Our study compares a state-of-the-art thesaurus-based system with a machine learning system (based on a k-Nearest Neighbours algorithm) for the task of proposing a functional profile for unseen MEDLINE abstracts, and shows how resources and performances have evolved. Systems are evaluated on their ability to propose, for a given abstract, the GO terms (2.8 on average) used for curation in GOA. We show that since 2006, although a massive effort was put into adding synonyms in GO (+300%), the effectiveness of our thesaurus-based system has remained nearly constant, moving from 0.28 to 0.31 for Recall at 20 (R20). In contrast, thanks to the growth of its knowledge base, our machine learning system has steadily improved, rising from 0.38 for R20 in 2006 to 0.56 in 2012. Integrated in semi-automatic workflows or in fully automatic pipelines, such systems are increasingly effective at assisting biologists. Database URL: http://eagl.unige.ch/GOCat/
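The k-Nearest Neighbours strategy described in this abstract can be sketched as follows: given an unseen abstract, find the k most similar curated abstracts and let their GO annotations vote. This toy bag-of-words version is an illustration only, not the GOCat implementation.

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_go_profile(abstract, knowledge_base, k=3):
    """Propose GO terms for an unseen abstract by voting among its
    k most similar curated (text, go_terms) pairs."""
    query = Counter(abstract.lower().split())
    neighbours = sorted(
        knowledge_base,
        key=lambda entry: cosine(query, Counter(entry[0].lower().split())),
        reverse=True)[:k]
    votes = Counter(term for _, terms in neighbours for term in terms)
    return [term for term, _ in votes.most_common()]
```

Because the proposals come from terms curators actually assigned to similar abstracts, the classifier improves as the curated knowledge base grows, which is the trend the study reports.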


Journal of Medical Internet Research | 2012

Building a transnational biosurveillance network using semantic web technologies: requirements, design, and preliminary evaluation.

Douglas Teodoro; Emilie Pasche; Julien Gobeill; Stéphane Paul Emonet; Patrick Ruch; Christian Lovis

Background: Antimicrobial resistance has reached globally alarming levels and is becoming a major public health threat. Lack of efficacious antimicrobial resistance surveillance systems was identified as one of the causes of increasing resistance, due to the lag time between new resistances and alerts to care providers. Several initiatives to track drug resistance evolution have been developed. However, no effective real-time and source-independent antimicrobial resistance monitoring system is publicly available.

Objective: To design and implement an architecture that can provide real-time and source-independent antimicrobial resistance monitoring to support transnational resistance surveillance. In particular, we investigated the use of a Semantic Web-based model to foster integration and interoperability of interinstitutional and cross-border microbiology laboratory databases.

Methods: Following the agile software development methodology, we derived the main requirements needed for effective antimicrobial resistance monitoring, from which we proposed a decentralized monitoring architecture based on the Semantic Web stack. The architecture uses an ontology-driven approach to promote the integration of a network of sentinel hospitals or laboratories. Local databases are wrapped into semantic data repositories that automatically expose local computing-formalized laboratory information on the Web. A central source mediator, based on local reasoning, coordinates the access to the semantic end points. On the user side, a user-friendly Web interface provides access and graphical visualization to the integrated views.

Results: We designed and implemented the online Antimicrobial Resistance Trend Monitoring System (ARTEMIS) in a pilot network of seven European health care institutions sharing 70+ million triples of information about drug resistance and consumption. Evaluation of the computing performance of the mediator demonstrated that, on average, query response time was a few seconds (mean 4.3, SD 0.1×10² seconds). Clinical pertinence assessment showed that resistance trends automatically calculated by ARTEMIS had a strong positive correlation with the European Antimicrobial Resistance Surveillance Network (EARS-Net) (ρ = .86, P < .001) and the Sentinel Surveillance of Antibiotic Resistance in Switzerland (SEARCH) (ρ = .84, P < .001) systems. Furthermore, mean resistance rates extracted by ARTEMIS were not significantly different from those of either EARS-Net (∆ = ±0.130; 95% confidence interval –0 to 0.030; P < .001) or SEARCH (∆ = ±0.042; 95% confidence interval –0.004 to 0.028; P = .004).

Conclusions: We introduce a distributed monitoring architecture that can be used to build transnational antimicrobial resistance surveillance networks. Results indicated that the Semantic Web-based approach provided an efficient and reliable solution for the development of eHealth architectures that enable online antimicrobial resistance monitoring from heterogeneous data sources. In the future, we expect that more health care institutions can join the ARTEMIS network so that it can provide a large European and wider biosurveillance network that can be used to detect emerging bacterial resistance in a multinational context and support public health actions.
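The clinical pertinence assessment above reports Spearman's ρ between ARTEMIS trends and the reference surveillance systems. As a minimal sketch (no tie handling, toy data rather than the study's), the statistic can be computed like this:

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation for paired samples without ties."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # Classical formula: rho = 1 - 6 * sum(d^2) / (n(n^2 - 1))
    return 1 - 6 * d2 / (n * (n * n - 1))
```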


BMC Bioinformatics | 2008

Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction

Julien Gobeill; Imad Tbahriti; Frédéric Ehrler; Anaïs Mottaz; Anne-Lise Veuthey; Patrick Ruch

Background: This paper describes and evaluates a sentence selection engine that extracts a GeneRiF (Gene Reference into Functions) as defined in ENTREZ-Gene based on a MEDLINE record. Inputs for this task include both a gene and a pointer to a MEDLINE reference. In the suggested approach we merge two independent sentence extraction strategies. The first proposed strategy (LASt) uses argumentative features, inspired by discourse-analysis models. The second extraction scheme (GOEx) uses an automatic text categorizer to estimate the density of Gene Ontology categories in every sentence, thus providing a full ranking of all possible candidate GeneRiFs. A combination of the two approaches is proposed, which also aims at reducing the size of the selected segment by filtering out non-content-bearing rhetorical phrases.

Results: Based on the TREC-2003 Genomics collection for GeneRiF identification, the LASt extraction strategy is already competitive (52.78%). When used in a combined approach, the extraction task clearly shows improvement, achieving a Dice score of over 57% (+10%).

Conclusions: Argumentative representation levels and conceptual density estimation using Gene Ontology contents appear complementary for functional annotation in proteomics.
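The Dice score used above to compare an extracted sentence with the reference GeneRiF can be sketched as token-set overlap; this illustrative version ignores stemming and stop words.

```python
def dice(candidate, reference):
    """Dice coefficient between the token sets of two sentences:
    2 * |A ∩ B| / (|A| + |B|), in [0, 1]."""
    a = set(candidate.lower().split())
    b = set(reference.lower().split())
    if not a and not b:
        return 1.0  # two empty strings match trivially
    return 2 * len(a & b) / (len(a) + len(b))
```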


BMC Bioinformatics | 2013

Application of text-mining for updating protein post-translational modification annotation in UniProtKB.

Anne-Lise Veuthey; Alan Bridge; Julien Gobeill; Patrick Ruch; Johanna McEntyre; Lydie Bougueleret; Ioannis Xenarios

Background: The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information we have developed a system which extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB.

Results: The procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, according to the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites annotated based on the results of large-scale proteomics experiments.

Conclusions: The information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, provided that a thorough understanding of the working process and requirements is first obtained. This system can be accessed at http://eagl.unige.ch/PTM/.
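The pattern-matching step described above can be sketched with a regular expression that pulls the modification type and residue position out of a sentence. The pattern below is a hypothetical simplification for illustration, not the production rules used for UniProtKB.

```python
import re

# Matches phrases such as "phosphorylation of Ser-123" or "acetylation at Lys320"
PTM_PATTERN = re.compile(
    r"(?P<mod>phosphorylation|acetylation|methylation|ubiquitination)"
    r"\s+(?:of|at|on)\s+"
    r"(?P<residue>Ser|Thr|Tyr|Lys|Arg)[- ]?(?P<position>\d+)",
    re.IGNORECASE,
)

def extract_ptms(sentence):
    """Return (modification, residue, position) triples found in a sentence."""
    return [(m.group("mod").lower(), m.group("residue"), int(m.group("position")))
            for m in PTM_PATTERN.finditer(sentence)]
```

Real rule sets must also resolve which protein the site belongs to, which is why the published system additionally ranks protein candidates.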


Database | 2015

Deep Question Answering for protein annotation

Julien Gobeill; Arnaud Gaudinat; Emilie Pasche; Dina Vishnyakova; Pascale Gaudet; Amos Marc Bairoch; Patrick Ruch

Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often have to deal with too many documents to efficiently find the appropriate information in a reasonable time. In this perspective, question-answering (QA) engines are designed to display answers, which were automatically extracted from the retrieved documents. Standard QA engines in the literature process a user question, then retrieve relevant documents and finally extract some possible answers out of these documents using various named-entity recognition processes. In our study, we try to answer complex genomics questions, which can be adequately answered only using Gene Ontology (GO) concepts. Such complex answers cannot be found using state-of-the-art dictionary- and redundancy-based QA engines. We compare the effectiveness of two dictionary-based classifiers for extracting correct GO answers from a large set of 100 retrieved abstracts per question. In the same way, we also investigate the power of GOCat, a GO supervised classifier. GOCat exploits the GOA database to propose GO concepts that were annotated by curators for similar abstracts. This approach is called deep QA, as it adds an original classification step, and exploits curated biological data to infer answers, which are not explicitly mentioned in the retrieved documents. We show that for complex answers such as protein functional descriptions, the redundancy phenomenon has a limited effect. Similarly, usual dictionary-based approaches are relatively ineffective. In contrast, we demonstrate how existing curated data, beyond information extraction, can be exploited by a supervised classifier, such as GOCat, to massively improve both the quantity and the quality of the answers, with a +100% improvement for both recall and precision. Database URL: http://eagl.unige.ch/DeepQA4PA/
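The deep-QA idea above, answering with concepts proposed by a supervised classifier over the retrieved text rather than extracting strings that must appear verbatim, can be caricatured as a two-stage pipeline. The `retrieve` and `classify` callables below are placeholder stubs, not the DeepQA4PA components.

```python
def deep_qa(question, retrieve, classify, n_docs=100, n_answers=10):
    """Retrieve abstracts for a question, then accumulate the GO concepts
    a supervised classifier proposes for each retrieved abstract."""
    abstracts = retrieve(question)[:n_docs]
    votes = {}
    for text in abstracts:
        for concept, score in classify(text):
            votes[concept] = votes.get(concept, 0.0) + score
    # Concepts supported across many retrieved abstracts rise to the top
    return sorted(votes, key=votes.get, reverse=True)[:n_answers]
```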


Cross-Language Evaluation Forum | 2009

Simple pre and post processing strategies for patent searching in CLEF intellectual property track 2009

Julien Gobeill; Emilie Pasche; Douglas Teodoro; Patrick Ruch

The objective of the 2009 CLEF-IP Track was to find documents that constitute prior art for a given patent. We explored a wide range of simple pre-processing and post-processing strategies, using Mean Average Precision (MAP) for evaluation purposes. Once the best document representation was determined, we tuned a classical Information Retrieval engine in order to perform the retrieval step. Finally, we explored two different post-processing strategies. In our experiments, using the complete IPC codes for filtering purposes led to greater improvements than using 4-digit IPC codes. The second post-processing strategy was to exploit the citations of retrieved patents in order to boost the scores of cited patents. Combining all selected strategies, we computed optimal runs that reached a MAP of 0.122 for the training set, and a MAP of 0.129 for the official 2009 CLEF-IP XL set.
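The citation-based post-processing described above, where retrieved patents boost the scores of the patents they cite, can be sketched as a re-ranking pass. The boost factor is an illustrative assumption, not the tuned value from the experiments.

```python
def boost_cited(ranking, citations, boost=0.2):
    """ranking: {patent_id: retrieval score}; citations: {patent_id: [cited ids]}.
    Each retrieved patent transfers a fraction of its score to the patents
    it cites; results are then re-sorted best-first."""
    scores = dict(ranking)
    for patent, score in ranking.items():
        for cited in citations.get(patent, []):
            scores[cited] = scores.get(cited, 0.0) + boost * score
    return sorted(scores, key=scores.get, reverse=True)
```

Patents cited by several highly ranked results accumulate boosts, which is the prior-art intuition the strategy exploits.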


Cross-Language Evaluation Forum | 2008

Query and document expansion with Medical Subject Headings terms at medical ImageCLEF 2008

Julien Gobeill; Patrick Ruch; Xin Zhou

In this paper, we report on query and document expansion using Medical Subject Headings (MeSH) terms designed for medical ImageCLEF 2008. In this collection, the MeSH terms describing an image could be obtained in two different ways: either collected from the associated MEDLINE paper, or extracted from the associated caption. We compared document expansion using both. From a baseline of 0.136 for Mean Average Precision (MAP), we reached a MAP of 0.176 (+29%) with the first method, and 0.154 (+13%) with the second. In-depth analyses show how both strategies were beneficial, as they covered different aspects of the image. Finally, we combined them in order to produce a significantly better run (0.254 MAP, +86%). Combining the MeSH terms obtained with both methods hence gives a better representation of the images for document expansion.
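Document expansion of this kind, appending descriptor terms to a document's indexable text so that queries can match vocabulary absent from the caption, can be sketched as below. The combination step simply unions the two MeSH sources; the descriptor names are toy examples.

```python
def expand_document(caption, mesh_from_medline, mesh_from_caption):
    """Index the caption plus the union of MeSH descriptors collected from
    the associated MEDLINE record and extracted from the caption itself."""
    terms = sorted(set(mesh_from_medline) | set(mesh_from_caption))
    return caption + " " + " ".join(terms) if terms else caption
```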

Collaboration

Top co-authors of Julien Gobeill:

Patrick Ruch (Swiss Institute of Bioinformatics)
Arnaud Gaudinat (École Normale Supérieure)
Luc Mottin (Swiss Institute of Bioinformatics)
Anaïs Mottaz (Swiss Institute of Bioinformatics)