Johanna McEntyre
European Bioinformatics Institute
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Johanna McEntyre.
Database | 2016
Qinghua Wang; Shabbir Syed Abdul; Lara Monteiro Almeida; Sophia Ananiadou; Yalbi Itzel Balderas-Martínez; Riza Theresa Batista-Navarro; David Campos; Lucy Chilton; Hui-Jou Chou; Gabriela Contreras; Laurel Cooper; Hong-Jie Dai; Barbra Ferrell; Juliane Fluck; Socorro Gama-Castro; Nancy George; Georgios V. Gkoutos; Afroza Khanam Irin; Lars Juhl Jensen; Silvia Jimenez; Toni Rose Jue; Ingrid M. Keseler; Sumit Madan; Sérgio Matos; Peter McQuilton; Marija Milacic; Matthew Mort; Jeyakumar Natarajan; Evangelos Pafilis; Emiliano Pereira
Fully automated text mining (TM) systems promote efficient literature searching, retrieval, and review but are not sufficient to produce ready-to-consume curated documents. These systems are not meant to replace biocurators, but instead to assist them in one or more literature curation steps. To do so, the user interface is an important aspect that needs to be considered for tool adoption. The BioCreative Interactive task (IAT) is a track designed for exploring user-system interactions, promoting development of useful TM tools, and providing a communication channel between the biocuration and the TM communities. In BioCreative V, the IAT track followed a format similar to previous interactive tracks, where the utility and usability of TM tools, as well as the generation of use cases, have been the focal points. The proposed curation tasks are user-centric and formally evaluated by biocurators. In BioCreative V IAT, seven TM systems and 43 biocurators participated. Two levels of user participation were offered to broaden curator involvement and obtain more feedback on usability aspects. The full level participation involved training on the system, curation of a set of documents with and without TM assistance, tracking of time-on-task, and completion of a user survey. The partial level participation was designed to focus on usability aspects of the interface and not the performance per se. In this case, biocurators navigated the system by performing pre-designed tasks and then were asked whether they were able to achieve the task and the level of difficulty in completing the task. In this manuscript, we describe the development of the interactive task, from planning to execution and discuss major findings for the systems tested. Database URL: http://www.biocreative.org
Nature | 2001
Tyra G. Wolfsberg; Johanna McEntyre; Gregory D. Schuler
There are a number of ways to investigate the structure, function and evolution of the human genome. These include examining the morphology of normal and abnormal chromosomes, constructing maps of genomic landmarks, following the genetic transmission of phenotypes and DNA sequence variations, and characterizing thousands of individual genes. To this list we can now add the elucidation of the genomic DNA sequence, albeit at ‘working draft’ accuracy. The current challenge is to weave together these disparate types of data to produce the information infrastructure needed to support the next generation of biomedical research. Here we provide an overview of the different sources of information about the human genome and how modern information technology, in particular the internet, allows us to link them together.
Nucleic Acids Research | 2011
Johanna McEntyre; Sophia Ananiadou; Stephen Andrews; William J. Black; Richard Boulderstone; Paula Buttery; David Chaplin; Sandeepreddy Chevuru; Norman Cobley; Lee Ann Coleman; Paul Davey; Bharti Gupta; Lesley Haji-Gholam; Craig Hawkins; Alan Horne; Simon J. Hubbard; Jee Hyub Kim; Ian Lewin; Vic Lyte; Ross MacIntyre; Sami Mansoor; Linda Mason; John McNaught; Elizabeth Newbold; Chikashi Nobata; Ernest Ong; Sharmila Pillai; Dietrich Rebholz-Schuhmann; Heather Rosie; Rob Rowbotham
UK PubMed Central (UKPMC) is a full-text article database that extends the functionality of the original PubMed Central (PMC) repository. The UKPMC project was launched as the first ‘mirror’ site to PMC, which in analogy to the International Nucleotide Sequence Database Collaboration, aims to provide international preservation of the open and free-access biomedical literature. UKPMC (http://ukpmc.ac.uk) has undergone considerable development since its inception in 2007 and now includes both a UKPMC and PubMed search, as well as access to other records such as Agricola, Patents and recent biomedical theses. UKPMC also differs from PubMed/PMC in that the full text and abstract information can be searched in an integrated manner from one input box. Furthermore, UKPMC contains ‘Cited By’ information as an alternative way to navigate the literature and has incorporated text-mining approaches to semantically enrich content and integrate it with related database resources. Finally, UKPMC also offers added-value services (UKPMC+) that enable grantees to deposit manuscripts, link papers to grants, publish online portfolios and view citation information on their papers. Here we describe UKPMC and clarify the relationship between PMC and UKPMC, providing historical context and future directions, 10 years on from when PMC was first launched.
Nucleic Acids Research | 2017
Gautier Koscielny; Peter An; Denise R. Carvalho-Silva; Jennifer A. Cham; Luca Fumis; Rippa Gasparyan; Samiul Hasan; Nikiforos Karamanis; Michael Maguire; Eliseo Papa; Andrea Pierleoni; Miguel Pignatelli; Theo Platt; Francis Rowland; Priyanka Wankar; A. Patrícia Bento; Tony Burdett; Antonio Fabregat; Simon A. Forbes; Anna Gaulton; Cristina Yenyxe Gonzalez; Henning Hermjakob; Anne Hersey; Steven Jupe; Şenay Kafkas; Maria Keays; Catherine Leroy; Francisco-Javier Lopez; María Paula Magariños; James Malone
We have designed and developed a data integration and visualization platform that provides evidence about the association of known and potential drug targets with diseases. The platform is designed to support identification and prioritization of biological targets for follow-up. Each drug target is linked to a disease using integrated genome-wide data from a broad range of data sources. The platform provides either a target-centric workflow to identify diseases that may be associated with a specific target, or a disease-centric workflow to identify targets that may be associated with a specific disease. Users can easily transition between these target- and disease-centric workflows. The Open Targets Validation Platform is accessible at https://www.targetvalidation.org.
Nucleic Acids Research | 2014
Yuci Gou; Florian Graff; Philip Rossiter; Francesco Talo; Vid Vartak; Lee-Ann Coleman; Craig Hawkins; Anna Kinsey; Sami Mansoor; Victoria Morris; Rob Rowbotham; David A. G. Chapman; Oliver Kilian; Ross MacIntyre; Yogesh Patel; Sophia Ananiadou; William J. Black; John McNaught; Rafal Rak; Andrew Rowley; Senay Kafkas; Jyothi Katuri; Jee-Hyub Kim; Nikos Marinos; Johanna McEntyre; Andrew Morrison; Xingjun Pi
This article describes recent developments of Europe PMC (http://europepmc.org), the leading database for life science literature. Formerly known as UKPMC, the service was rebranded in November 2012 as Europe PMC to reflect the scope of the funding agencies that support it. Several new developments have enriched Europe PMC considerably since then. Europe PMC now offers RESTful web services to access both articles and grants, powerful search tools such as citation-count sort order and data citation features, a service to add publications to your ORCID, a variety of export formats, and an External Links service that enables any related resource to be linked from Europe PMC content.
Journal of Biomedical Semantics | 2015
Şenay Kafkas; Jee-Hyub Kim; Xingjun Pi; Johanna McEntyre
BackgroundIn this study, we present an analysis of data citation practices in full text research articles and their corresponding supplementary data files, made available in the Open Access set of articles from Europe PubMed Central. Our aim is to investigate whether supplementary data files should be considered as a source of information for integrating the literature with biomolecular databases.ResultsUsing text-mining methods to identify and extract a variety of core biological database accession numbers, we found that the supplemental data files contain many more database citations than the body of the article, and that those citations often take the form of a relatively small number of articles citing large collections of accession numbers in text-based files. Moreover, citation of value-added databases derived from submission databases (such as Pfam, UniProt or Ensembl) is common, demonstrating the reuse of these resources as datasets in themselves. All the database accession numbers extracted from the supplementary data are publicly accessible from http://dx.doi.org/10.5281/zenodo.11771.ConclusionsOur study suggests that supplementary data should be considered when linking articles with data, in curation pipelines, and in information retrieval tasks in order to make full use of the entire research article. These observations highlight the need to improve the management of supplemental data in general, in order to make this information more discoverable and useful.
PLOS Biology | 2017
Julie McMurry; Nick Juty; Niklas Blomberg; Tony Burdett; Tom Conlin; Nathalie Conte; Mélanie Courtot; John Deck; Michel Dumontier; Donal Fellows; Alejandra Gonzalez-Beltran; Philipp Gormanns; Jeffrey S. Grethe; Janna Hastings; Jean-Karim Hériché; Henning Hermjakob; Jon Ison; Rafael C. Jimenez; Simon Jupp; John Kunze; Camille Laibe; Nicolas Le Novère; James Malone; María Martín; Johanna McEntyre; Chris Morris; Juha Muilu; Wolfgang Müller; Philippe Rocca-Serra; Susanna-Assunta Sansone
In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
PLOS ONE | 2013
Şenay Kafkas; Jee-Hyub Kim; Johanna McEntyre
Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires that there are structured cross-references at the level of individual articles and biological records. Here, we describe the current patterns of how database entries are cited in research articles, based on analysis of the full text Open Access articles available from Europe PMC. Focusing on citation of entries in the European Nucleotide Archive (ENA), UniProt and Protein Data Bank, Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Many thousands of new literature-database relationships are found by text mining, since these relationships are also not present in the set of articles cited by database records. We recommend that structured annotation of database records in articles is extended to other databases, such as ArrayExpress and Pfam, entries from which are also cited widely in the literature. The very high precision and high-throughput of this text-mining pipeline makes this activity possible both accurately and at low cost, which will allow the development of new integrated data services.
BMC Bioinformatics | 2013
Anne-Lise Veuthey; Alan Bridge; Julien Gobeill; Patrick Ruch; Johanna McEntyre; Lydie Bougueleret; Ioannis Xenarios
BackgroundThe annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information we have developed a system which extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB.ResultsThe procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, according to the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites annotated based on the results of large scale proteomics experiments.ConclusionsThe information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, providing that a thorough understanding of the working process and requirements are first obtained. This system can be accessed at http://eagl.unige.ch/PTM/.
Database | 2016
Ayush Singhal; Robert Leaman; Natalie L. Catlett; Thomas Lemberger; Johanna McEntyre; Shawn W. Polson; Ioannis Xenarios; Cecilia N. Arighi; Zhiyong Lu
Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system ‘accuracy’ remains a challenge and identify several additional common difficulties and potential research directions including (i) the ‘scalability’ issue due to the increasing need of mining information from millions of full-text articles, (ii) the ‘interoperability’ issue of integrating various text-mining systems into existing curation workflows and (iii) the ‘reusability’ issue on the difficulty of applying trained systems to text genres that are not seen previously during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. Finally, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators.