Colin R. Batchelor | Researchain

Publication

Featured research published by Colin R. Batchelor.

Empirical Methods in Natural Language Processing | 2009

Towards Domain-Independent Argumentative Zoning: Evidence from Chemistry and Computational Linguistics

Simone Teufel; Advaith Siddharthan; Colin R. Batchelor

Argumentative Zoning (AZ) is an analysis of the argumentative and rhetorical structure of a scientific paper. It has been shown to be reliably used by independent human coders, and has proven useful for various information access tasks. Annotation experiments have, however, so far been restricted to one discipline, computational linguistics (CL). Here, we present a more informative AZ scheme with 15 categories in place of the original 7, and show that it can be applied to the life sciences as well as to CL. We use a domain expert to encode basic knowledge about the subject (such as terminology and domain-specific rules for individual categories) as part of the annotation guidelines. Our results show that non-expert human coders can then use these guidelines to reliably annotate this scheme in two domains, chemistry and computational linguistics.

Genome Biology | 2010

A standard variation file format for human genome sequences

Martin G. Reese; Barry Moore; Colin R. Batchelor; Fidel Salas; Fiona Cunningham; Gabor T. Marth; Lincoln Stein; Paul Flicek; Mark Yandell; Karen Eilbeck

Here we describe the Genome Variation Format (GVF) and the 10Gen dataset. GVF, an extension of Generic Feature Format version 3 (GFF3), is a simple tab-delimited format for DNA variant files, which uses Sequence Ontology to describe genome variation data. The 10Gen dataset, ten human genomes in GVF format, is freely available for community analysis from the Sequence Ontology website and from an Amazon elastic block storage (EBS) snapshot for use in Amazon's EC2 cloud computing environment.
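
To make the "tab-delimited extension of GFF3" description concrete, here is a minimal Python sketch of reading one GVF-style record. It assumes the standard nine GFF3 columns and `key=value` attributes separated by semicolons. The example line, its values, and the names `Feature` and `parse_gvf_line` are hypothetical illustrations, not part of the paper or of any official GVF tooling.

```python
from dataclasses import dataclass

# Hypothetical record: the nine tab-separated GFF3 columns, with
# GVF-style attributes in the last column. Values are made up.
EXAMPLE_LINE = (
    "chr1\texample\tSNV\t1000\t1000\t.\t+\t.\t"
    "ID=var1;Variant_seq=A,G;Reference_seq=G"
)


@dataclass
class Feature:
    seqid: str
    source: str
    feature_type: str  # a Sequence Ontology term such as SNV
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: dict


def parse_gvf_line(line: str) -> Feature:
    """Split one data line into its nine columns and parse the attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, source, feature_type, start, end, score, strand, phase, attr_text = fields

    attributes = {}
    for pair in attr_text.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # Multiple values for one key are comma-separated.
        attributes[key] = value.split(",")

    return Feature(seqid, source, feature_type, int(start), int(end),
                   score, strand, phase, attributes)


feature = parse_gvf_line(EXAMPLE_LINE)
print(feature.feature_type, feature.start, feature.attributes["Variant_seq"])
```

The sketch skips comment and pragma lines (those starting with `#`) and does no URL-decoding of attribute values; a real reader would need both.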

Bioinformatics | 2012

Automatic recognition of conceptualization zones in scientific articles and two life science applications

Maria Liakata; Shyamasree Saha; Simon Dobnik; Colin R. Batchelor; Dietrich Rebholz-Schuhmann

Motivation: Scholarly biomedical publications report on the findings of a research investigation. Scientists use a well-established discourse structure to relate their work to the state of the art, express their own motivation and hypotheses and report on their methods, results and conclusions. In previous work, we have proposed ways to explicitly annotate the structure of scientific investigations in scholarly publications. Here we present the means to facilitate automatic access to the scientific discourse of articles by automating the recognition of 11 categories at the sentence level, which we call Core Scientific Concepts (CoreSCs). These include: Hypothesis, Motivation, Goal, Object, Background, Method, Experiment, Model, Observation, Result and Conclusion. CoreSCs provide the structure and context to all statements and relations within an article and their automatic recognition can greatly facilitate biomedical information extraction by characterizing the different types of facts, hypotheses and evidence available in a scientific publication. Results: We have trained and compared machine learning classifiers (support vector machines and conditional random fields) on a corpus of 265 full articles in biochemistry and chemistry to automatically recognize CoreSCs. We have evaluated our automatic classifications against a manually annotated gold standard, and have achieved promising accuracies with ‘Experiment’, ‘Background’ and ‘Model’ being the categories with the highest F1-scores (76%, 62% and 53%, respectively). We have analysed the task of CoreSC annotation both from a sentence classification as well as sequence labelling perspective and we present a detailed feature evaluation. The most discriminative features are local sentence features such as unigrams, bigrams and grammatical dependencies while features encoding the document structure, such as section headings, also play an important role for some of the categories. 
We discuss the usefulness of automatically generated CoreSCs in two biomedical applications as well as work in progress. Availability: A web-based tool for the automatic annotation of articles with CoreSCs and corresponding documentation is available online at http://www.sapientaproject.com/software. The site http://www.sapientaproject.com also contains detailed information pertaining to CoreSC annotation and links to annotation guidelines as well as a corpus of manually annotated articles, which served as our training data. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
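
The per-category F1-scores quoted in this abstract combine precision and recall for each label. The following Python sketch shows how such scores can be computed from sentence-level gold and predicted labels. The function names and the four-sentence toy data are assumptions for illustration only; they are not the authors' evaluation code or data.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_category_f1(gold: list, predicted: list) -> dict:
    """Compute one F1-score per label seen in either sequence."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        scores[label] = f1_score(tp, fp, fn)
    return scores


# Toy data: one label per sentence, using three of the CoreSC category names.
gold = ["Background", "Method", "Method", "Result"]
predicted = ["Background", "Method", "Result", "Result"]
scores = per_category_f1(gold, predicted)
for label, score in scores.items():
    print(f"{label}: {score:.2f}")
```

Reporting the score per category, rather than a single accuracy figure, shows which categories a classifier handles well and which it confuses.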

Journal of Biomedical Semantics | 2011

The Translational Medicine Ontology and Knowledge Base: driving personalized medicine by bridging the gap between bench and bedside

Joanne S. Luciano; Bosse Andersson; Colin R. Batchelor; Olivier Bodenreider; Timothy W.I. Clark; Christine Denney; Christopher Domarew; Thomas Gambet; Lee Harland; Anja Jentzsch; Vipul Kashyap; Peter Kos; Julia Kozlovsky; Timothy Lebo; Scott M Marshall; James P. McCusker; Deborah L. McGuinness; Chimezie Ogbuji; Elgar Pichler; Robert L Powers; Eric Prud’hommeaux; Matthias Samwald; Lynn M. Schriml; Peter J. Tonellato; Patricia L. Whetzel; Jun Zhao; Susie Stephens; Michel Dumontier

Background: Translational medicine requires the integration of knowledge using heterogeneous data from health care to the life sciences. Here, we describe a collaborative effort to produce a prototype Translational Medicine Knowledge Base (TMKB) capable of answering questions relating to clinical practice and pharmaceutical drug discovery. Results: We developed the Translational Medicine Ontology (TMO) as a unifying ontology to integrate chemical, genomic and proteomic data with disease, treatment, and electronic health records. We demonstrate the use of Semantic Web technologies in the integration of patient and biomedical data, and reveal how such a knowledge base can aid physicians in providing tailored patient care and facilitate the recruitment of patients into active clinical trials. Thus, patients, physicians and researchers may explore the knowledge base to better understand therapeutic options, efficacy, and mechanisms of action. Conclusions: This work takes an important step in using Semantic Web technologies to facilitate integration of relevant, distributed, external sources and progress towards a computational platform to support personalized medicine. Availability: TMO can be downloaded from http://code.google.com/p/translationalmedicineontology and TMKB can be accessed at http://tm.semanticscience.org/sparql.

Journal of Biomedical Informatics | 2011

Evolution of the Sequence Ontology terms and relationships

Christopher J. Mungall; Colin R. Batchelor; Karen Eilbeck

The Sequence Ontology is an established ontology, with a large user community, for the purpose of genomic annotation. We are reforming the ontology to provide better terms and relationships to describe the features of biological sequence, for both genomic and derived sequence. The SO is working within the guidelines of the OBO Foundry to provide interoperability between SO and the other related OBO ontologies. Here, we report changes and improvements made to SO including new relationships to better define the mereological, spatial and temporal aspects of biological sequence.

Meeting of the Association for Computational Linguistics | 2007

Annotation of Chemical Named Entities

Peter T. Corbett; Colin R. Batchelor; Simone Teufel

We describe the annotation of chemical named entities in scientific text. A set of annotation guidelines defines 5 types of named entities, and provides instructions for the resolution of special cases. A corpus of full-text chemistry papers was annotated, with an inter-annotator agreement F score of 93%. An investigation of named entity recognition using LingPipe suggests that F scores of 63% are possible without customisation, and scores of 74% are possible with the addition of custom tokenisation and the use of dictionaries.

Applied Ontology | 2011

The RNA Ontology (RNAO): An ontology for integrating RNA sequence and structure data

Robert Hoehndorf; Colin R. Batchelor; Thomas Bittner; Michel Dumontier; Karen Eilbeck; Rob Knight; Christopher J. Mungall; Jane S. Richardson; Jesse Stombaugh; Eric Westhof; Craig L. Zirbel; Neocles B. Leontis

Biomedical Ontologies integrate diverse biomedical data, enable intelligent data-mining and help translate basic research into useful clinical knowledge. We present the RNA Ontology (RNAO), an ontology for integrating diverse RNA data, including RNA sequences and sequence alignments, three-dimensional structures, and biochemical and functional data. For example, individual atomic resolution RNA structures have broader significance as representatives of classes of homologous molecules, which can differ significantly in sequence while sharing core structural features and common roles or functions. Thus, structural data gain value by being linked to homologous sequences in genomic data and databases of sequence alignments. Likewise, the value of genomic data is enhanced by annotation of shared structural features, especially when these can be linked to specific functions. Moreover, the significance of biochemical, functional and mutational analyses of RNA molecules is most fully understood when linked to molecular structures and phylogenies. To achieve these goals, RNAO provides logically rigorous definitions of the components of RNA primary, secondary and tertiary structure and the relations between these entities. RNAO is being developed to comply with the developing standards of the Open Biomedical Ontologies (OBO) Consortium. The RNAO can be accessed at http://code.google.com/p/rnao/.

Journal of Cheminformatics | 2015

PubChemRDF: towards the semantic annotation of PubChem compound and substance databases

Gang Fu; Colin R. Batchelor; Michel Dumontier; Janna Hastings; Egon Willighagen; Evan Bolton

Background: PubChem is an open repository for chemical structures, biological activities and biomedical annotations. Semantic Web technologies are emerging as an increasingly important approach to distribute and integrate scientific data. Exposing PubChem data to Semantic Web services may help enable automated data integration and management, as well as facilitate interoperable web applications. Description: This work, one of a series covering the PubChemRDF project, describes an approach to translate PubChem Substance and Compound information into Resource Description Framework (RDF) format. Basic examples are provided to demonstrate its use. The aim of this effort is to provide two new primary benefits to researchers in a cost-effective manner. Firstly, we aim to remove the inherent limitations of using the web-based resource PubChem by allowing a researcher to use readily available semantic technologies (namely, RDF triple stores and their corresponding SPARQL query engines) to query and analyze PubChem data on local computing resources. Secondly, this work intends to help improve data sharing, analysis, and integration of PubChem data to resources external to NCBI and across scientific domains, by means of the association of PubChem data to existing ontological frameworks, including CHEMical INFormation ontology, Semanticscience Integrated Ontology, and others. Conclusions: With the goal of semantically describing information available in the PubChem archive, pre-existing ontological frameworks were used, rather than creating new ones. Semantic relationships between compounds and substances, chemical descriptors associated with compounds and substances, interrelationships between chemicals, as well as provenance and attribute metadata of substances are described.

BMC Genomics | 2013

Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology

David P. Hill; Nico Adams; Mike Bada; Colin R. Batchelor; Tanya Z. Berardini; Heiko Dietze; Harold J. Drabkin; Marcus Ennis; Rebecca E. Foulger; Midori A. Harris; Janna Hastings; Namrata Kale; Paula de Matos; Christopher J. Mungall; Gareth Owen; Paola Roncaglia; Christoph Steinbeck; Steve Turner; Jane Lomax

Background: The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI. Results: We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI. Conclusions: The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from http://purl.obolibrary.org/obo/go/extensions/go-plus.owl.

Meeting of the Association for Computational Linguistics | 2007

Semantic enrichment of journal articles using chemical named entity recognition

Colin R. Batchelor; Peter T. Corbett

We describe the semantic enrichment of journal articles with chemical structures and biomedical ontology terms using Oscar, a program for chemical named entity recognition (NER). We describe how Oscar works and how it can be adapted for general NER. We discuss its implementation in a real publishing workflow and possible applications for enriched articles.