Publication


Featured research published by Christopher J. O. Baker.


Journal of Biomedical Semantics | 2014

The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery

Michel Dumontier; Christopher J. O. Baker; Joachim Baran; Alison Callahan; Leonid L. Chepelev; José Cruz-Toledo; Nicholas Del Rio; Geraint Duck; Laura I. Furlong; Nichealla Keath; Dana Klassen; James P. McCusker; Núria Queralt-Rosinach; Matthias Samwald; Natalia Villanueva-Rosales; Mark D. Wilkinson; Robert Hoehndorf

The Semanticscience Integrated Ontology (SIO) is an ontology to facilitate biomedical knowledge discovery. SIO features a simple upper level comprised of essential types and relations for the rich description of arbitrary (real, hypothesized, virtual, fictional) objects, processes and their attributes. SIO specifies simple design patterns to describe and associate qualities, capabilities, functions, quantities, and informational entities including textual, geometrical, and mathematical entities, and provides specific extensions in the domains of chemistry, biology, biochemistry, and bioinformatics. SIO provides an ontological foundation for the Bio2RDF linked data for the life sciences project and is used for semantic integration and discovery for SADI-based semantic web services. SIO is freely available to all users under a creative commons by attribution license. See website for further information: http://sio.semanticscience.org.


Journal of Biomedical Semantics | 2011

Assessment of NER solutions against the first and second CALBC Silver Standard Corpus

Dietrich Rebholz-Schuhmann; Antonio Jimeno Yepes; Chen Li; Senay Kafkas; Ian Lewin; Ning Kang; Peter Corbett; David Milward; Ekaterina Buyko; Elena Beisswanger; Kerstin Hornbostel; Alexandre Kouznetsov; René Witte; Jonas B. Laurila; Christopher J. O. Baker; Cheng-Ju Kuo; Simone Clematide; Fabio Rinaldi; Richárd Farkas; György Móra; Kazuo Hara; Laura I. Furlong; Michael Rautschka; Mariana Neves; Alberto Pascual-Montano; Qi Wei; Nigel Collier; Faisal Mahbub Chowdhury; Alberto Lavelli; Rafael Berlanga

Background: Competitions in text mining have been used to measure the performance of automatic text processing solutions against a manually annotated gold standard corpus (GSC). The preparation of the GSC is time-consuming and costly, and the final corpus consists of at most a few thousand documents annotated with a limited set of semantic groups. To overcome these shortcomings, the CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus with four different semantic groups through the harmonisation of annotations from automatic text mining solutions, the first version of the Silver Standard Corpus (SSC-I). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO) and species (SPE). This corpus has been used for the First CALBC Challenge, which asked the participants to annotate the corpus with their text processing solutions.
Results: All four PPs from the CALBC project and, in addition, 12 challenge participants (CPs) contributed annotated data sets for an evaluation against the SSC-I. CPs could ignore the training data and deliver the annotations from their genuine annotation system, or could train a machine-learning approach on the provided pre-annotated data. In general, the performances of the annotation solutions were lower for entities from the categories CHED and PRGE in comparison to the identification of entities categorized as DISO and SPE. The best performance over all semantic groups was achieved by two annotation solutions that had been trained on the SSC-I. The data sets from participants were used to generate the harmonised Silver Standard Corpus II (SSC-II), if the participant did not make use of the annotated data set from the SSC-I for training purposes. The performances of the participants’ solutions were again measured against the SSC-II. The performances of the annotation solutions again showed better results for DISO and SPE in comparison to CHED and PRGE.
Conclusions: The SSC-I delivers a large set of annotations (1,121,705) for a large number of documents (100,000 Medline abstracts). The annotations cover four different semantic groups and are sufficiently homogeneous to be reproduced with a trained classifier leading to an average F-measure of 85%. Benchmarking the annotation solutions against the SSC-II leads to better performance for the CPs’ annotation solutions in comparison to the SSC-I.


BMC Bioinformatics | 2008

Towards ontology-driven navigation of the lipid bibliosphere

Christopher J. O. Baker; Rajaraman Kanagasabai; Wee Tiong Ang; Anitha Veeramani; Hong-Sang Low; Markus R. Wenk

Background: The indexing of scientific literature and content is a relevant and contemporary requirement within life science information systems. Navigating information available in legacy formats continues to be a challenge both in enterprise and academic domains. The emergence of semantic web technologies and their fusion with artificial intelligence techniques has provided a new toolkit with which to address these data integration challenges. In the emerging field of lipidomics such navigation challenges are barriers to the translation of scientific results into actionable knowledge, critical to the treatment of diseases such as Alzheimer's syndrome, Mycobacterium infections and cancer.
Results: We present a literature-driven workflow involving document delivery and natural language processing steps generating tagged sentences containing lipid, protein and disease names, which are instantiated to a custom-designed lipid ontology. We describe the design challenges in capturing lipid nomenclature, the mandate of the ontology and its role as query model in the navigation of the lipid bibliosphere. We illustrate the extent of the description logic-based A-box query capability provided by the instantiated ontology using a graphical query composer to query sentences describing lipid-protein and lipid-disease correlations.
Conclusion: As scientists accept the need to readjust the manner in which we search for information and derive knowledge, we illustrate a system that can constrain the literature explosion and knowledge navigation problems. Specifically, we have focussed on solving this challenge for lipidomics researchers who have to deal with the lack of standardized vocabulary, differing classification schemes, and a wide array of synonyms before being able to derive scientific insights. The use of the OWL-DL variant of the Web Ontology Language (OWL) and description logic reasoning is pivotal in this regard, providing the lipid scientist with advanced query access to the results of text mining algorithms instantiated into the ontology. The visual query paradigm assists in the adoption of this technology.


Information Systems Frontiers | 2006

Mutation Mining--A Prospector's Tale

Christopher J. O. Baker; René Witte

Protein structure visualization tools render images that allow the user to explore structural features of a protein. Context specific information relating to a particular protein or protein family is, however, not easily integrated and must be uploaded from databases or provided through manual curation of input files. Protein Engineers spend considerable time iteratively reviewing both literature and protein structure visualizations manually annotated with mutated residues. Meanwhile, text mining tools are increasingly used to extract specific units of raw text from scientific literature and have demonstrated the potential to support the activities of Protein Engineers.
The transfer of mutation specific raw-text annotations to protein structures requires integrated data processing pipelines that can co-ordinate information retrieval, information extraction, protein sequence retrieval, sequence alignment and mutant residue mapping. We describe the Mutation Miner pipeline designed for this purpose and present case study evaluations of the key steps in the process. Starting with literature about mutations made to three protein families (haloalkane dehalogenase, bi-phenyl dioxygenase, and xylanase), we enumerate relevant documents available for text mining analysis, the available electronic formats, and the number of mutations made to a given protein family. We review the efficiency of NLP driven protein sequence retrieval from databases and report on the effectiveness of Mutation Miner in mapping annotations to protein structure visualizations. We highlight the feasibility and practicability of the approach.


Journal of Web Semantics | 2006

Semantic web infrastructure for fungal enzyme biotechnologists

Christopher J. O. Baker; Arash Shaban-Nejad; Xiao Su; Volker Haarslev; Greg Butler

The FungalWeb Ontology seeks to support various data integration needs of enzyme biotechnology from inception to product roll. Serving as a knowledgebase for decision support, the conceptualization seeks to link fungal species with enzymes, enzyme substrates, enzyme classifications, enzyme modifications, enzyme related intellectual property, enzyme retail and applications. The ontology, developed in the OWL language, is the result of the integration of numerous biological database schemas, web accessible text resources and components of existing ontologies. We assess the quantity of implicit knowledge in the FungalWeb Ontology by analyzing the range of tags in the OWL files and along with other description logic (DL) computable metrics of the ontology, contrast it with other publicly available bio-ontologies. Thereafter, we demonstrate how the FungalWeb Ontology supports its broad remit required in fungal biotechnology by (i) presenting application scenarios (ii) presenting the conceptualizations of the ontological frame able to support these scenarios and (iii) suggesting semantic queries typical of a fungal enzymologist involved in product development. Recognizing the complexity of the ontology query process for the non-technical manager we introduce a simplified query tool, Ontoligent Interactive Query (OntoIQ) that allows the user to browse and build queries from a selection of query patterns and ontology content. The OntoIQ interface supports users not familiar with writing DL syntax allowing them access to the ontology with expressive description logic reasoning tools. Finally we discuss the challenges encountered during the development of semantic infrastructure for fungal enzyme biotechnologists.


BMC Genomics | 2010

Algorithms and semantic infrastructure for mutation impact extraction and grounding

Jonas B. Laurila; Nona Naderi; René Witte; Alexandre Riazanov; Alexandre Kouznetsov; Christopher J. O. Baker

Background: Mutation impact extraction is a hitherto unaccomplished task in state of the art mutation extraction systems. Protein mutations and their impacts on protein properties are hidden in scientific literature, making them poorly accessible for protein engineers and inaccessible for phenotype-prediction systems that currently depend on manually curated genomic variation databases.
Results: We present the first rule-based approach for the extraction of mutation impacts on protein properties, categorizing their directionality as positive, negative or neutral. Furthermore, protein and mutation mentions are grounded to their respective UniProtKB IDs, and selected protein properties, namely protein functions, to concepts found in the Gene Ontology. The extracted entities are populated to an OWL-DL Mutation Impact ontology facilitating complex querying for mutation impacts using SPARQL. We illustrate retrieval of proteins and mutant sequences for a given direction of impact on specific protein properties. Moreover, we provide programmatic access to the data through semantic web services using the SADI (Semantic Automated Discovery and Integration) framework.
Conclusion: We address the problem of access to legacy mutation data in unstructured form through the creation of novel mutation impact extraction methods which are evaluated on a corpus of full-text articles on haloalkane dehalogenases, tagged by domain experts. Our approaches show state of the art levels of precision and recall for Mutation Grounding and a respectable level of precision but lower recall for the task of Mutant-Impact relation extraction. The system is deployed using text mining and semantic web technologies with the goal of publishing to a broad spectrum of consumers.


Bioinformatics | 2011

OrganismTagger: detection, normalization and grounding of organism entities in biomedical documents

Nona Naderi; Thomas Kappler; Christopher J. O. Baker; René Witte

Motivation: Semantic tagging of organism mentions in full-text articles is an important part of literature mining and semantic enrichment solutions. Tagged organism mentions also play a pivotal role in disambiguating other entities in a text, such as proteins. A high-precision organism tagging system must be able to detect the numerous forms of organism mentions, including common names as well as the traditional taxonomic groups: genus, species and strains. In addition, such a system must resolve abbreviations and acronyms, assign the scientific name and if possible link the detected mention to the NCBI Taxonomy database for further semantic queries and literature navigation.
Results: We present the OrganismTagger, a hybrid rule-based/machine learning system to extract organism mentions from the literature. It includes tools for automatically generating lexical and ontological resources from a copy of the NCBI Taxonomy database, thereby facilitating system updates by end users. Its novel ontology-based resources can also be reused in other semantic mining and linked data tasks. Each detected organism mention is normalized to a canonical name through the resolution of acronyms and abbreviations and subsequently grounded with an NCBI Taxonomy database ID. In particular, our system combines a novel machine-learning approach with rule-based and lexical methods for detecting strain mentions in documents. On our manually annotated OT corpus, the OrganismTagger achieves a precision of 95%, a recall of 94% and a grounding accuracy of 97.5%. On the manually annotated corpus of Linnaeus-100, the results show a precision of 99%, recall of 97% and grounding accuracy of 97.4%.
Availability: The OrganismTagger, including supporting tools, resources, training data and manual annotations, as well as end user and developer documentation, is freely available under an open-source license at http://www.semanticsoftware.info/organism-tagger.
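
The abstract above reports precision and recall for each corpus but no combined score. As a minimal sketch, assuming the standard balanced F-measure (the harmonic mean of precision and recall), the two figures can be combined as follows. The function name `f_measure` is hypothetical, and the resulting F1 values are derived here by arithmetic; they are not reported by the authors, whose averaging method is not stated in the abstract.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score, a weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 score. This helper is illustrative only
    and is not part of the OrganismTagger distribution.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was found correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall exactly as stated in the abstract above.
reported = {
    "OT corpus": (0.95, 0.94),
    "Linnaeus-100": (0.99, 0.97),
}

for corpus, (p, r) in reported.items():
    # The F1 value printed here is derived, not taken from the publication.
    print(f"{corpus}: precision={p:.2f} recall={r:.2f} F1={f_measure(p, r):.3f}")
```

Because the harmonic mean is pulled toward the lower of the two inputs, a system cannot reach a high F1 by trading away recall for precision or the reverse.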


Comparative Biochemistry and Physiology Part D: Genomics and Proteomics | 2013

Classifying chemical mode of action using gene networks and machine learning: A case study with the herbicide linuron

Anna Ornostay; Andrew M. Cowie; Matthew M. Hindle; Christopher J. O. Baker; Christopher J. Martyniuk

The herbicide linuron (LIN) is an endocrine disruptor with an anti-androgenic mode of action. The objectives of this study were to (1) improve knowledge of androgen and anti-androgen signaling in the teleostean ovary and (2) assess the ability of gene networks and machine learning to classify LIN as an anti-androgen using transcriptomic data. Ovarian explants from vitellogenic fathead minnows (FHMs) were exposed to three concentrations of either 5α-dihydrotestosterone (DHT), flutamide (FLUT), or LIN for 12 h. Ovaries exposed to DHT showed a significant increase in 17β-estradiol (E2) production while FLUT and LIN had no effect on E2. To improve understanding of androgen receptor (AR) signaling in the ovary, a reciprocal gene expression network was constructed for DHT and FLUT using pathway analysis, and these data suggested that steroid metabolism, translation, and DNA replication are processes regulated through AR signaling in the ovary. Sub-network enrichment analysis revealed that FLUT and LIN shared more regulated gene networks with each other than with DHT. Using transcriptomic datasets from different fish species, machine learning algorithms classified LIN successfully with other anti-androgens. This study advances knowledge regarding molecular signaling cascades in the ovary that are responsive to androgens and anti-androgens and provides proof of concept that gene network analysis and machine learning can classify priority chemicals using experimental transcriptomic data collected from different fish species.


Integrated Environmental Assessment and Management | 2011

Toward a knowledge infrastructure for traits-based ecological risk assessment

Donald J. Baird; Christopher J. O. Baker; Robert B. Brua; Mehrdad Hajibabaei; Kearon McNicol; Timothy Pascoe; Dick de Zwart

The trait approach has already shown significant potential as a tool for understanding natural variation among species in sensitivity to contaminants in the process of ecological risk assessment. However, to realize its full potential, a defined nomenclature for traits is urgently required, and significant effort is required to populate databases of species-trait relationships. Recently, there have been significant advances in information management and discovery in the area of the semantic web. Combined with continuing progress in biological trait knowledge, these suggest that the time is right for a reevaluation of how trait information from divergent research traditions is collated and made available for end users in the field of environmental management. Although there has already been a great deal of work on traits, the information is scattered throughout databases, literature, and undiscovered sources. Further progress will require better leverage of these existing data and research to fill in the gaps. We review and discuss a number of technical and social challenges to bringing together existing information and moving toward a new, collaborative approach. Finally, we outline a path toward enhanced knowledge discovery within the traits domain space, showing that, by linking knowledge management infrastructure, semantic metadata (trait ontologies), and Web 2.0 and 3.0 technologies, we can begin to construct a dedicated platform for traits-based ecological risk assessment (TERA) science.


Archive | 2007

Semantic Web Approach to Database Integration in the Life Sciences

Kei-Hoi Cheung; Andrew Smith; Kevin Y. Yip; Christopher J. O. Baker; Mark Gerstein

This chapter describes the challenges involved in the integration of databases storing diverse but related types of life sciences data. A major challenge in this regard is the syntactic and semantic heterogeneity of life sciences databases. There is a strong need for standardizing the syntactic and semantic data representations. We discuss how to address this by using the emerging Semantic Web technologies based on the Resource Description Framework (RDF) standard. This chapter presents two use cases, namely YeastHub and LinkHub, which demonstrate how to use the latest RDF database technology to build data warehouses that facilitate integration of genomic/proteomic data and identifiers.
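
RDF models data as subject-predicate-object triples, so integrating two sources amounts to taking the union of their triples and following shared identifiers. The following is a minimal sketch of that idea in plain Python, not the YeastHub or LinkHub implementation. Every identifier in it (the `ex:` names, `source_a`, `source_b`, `match`, `identifiers_for_gene`) is hypothetical and chosen only for illustration.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical triples from two separate databases; all identifiers are made up.
source_a: set[Triple] = {
    ("ex:gene1", "ex:encodes", "ex:proteinA"),
    ("ex:gene2", "ex:encodes", "ex:proteinB"),
}
source_b: set[Triple] = {
    ("ex:proteinA", "ex:sameAs", "ex:dbX_0001"),
    ("ex:proteinA", "ex:hasFunction", "ex:kinase_activity"),
}


def match(graph: set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Integration step: merging two triple sets is a plain set union.
merged = source_a | source_b


def identifiers_for_gene(graph: set[Triple], gene: str) -> list[str]:
    """Follow gene -> protein -> external identifier across the merged graph."""
    found = []
    for _, _, protein in match(graph, s=gene, p="ex:encodes"):
        for _, _, ident in match(graph, s=protein, p="ex:sameAs"):
            found.append(ident)
    return sorted(found)


print(identifiers_for_gene(merged, "ex:gene1"))
```

The query only succeeds on the merged graph, because the gene-to-protein link and the protein-to-identifier link come from different sources. A real deployment would use an RDF store and a query language such as SPARQL rather than list comprehensions.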

Collaboration


Top Co-Authors

Alexandre Riazanov

University of New Brunswick

Artjom Klein

University of New Brunswick

Jonas B. Laurila

University of New Brunswick
