James P. Balhoff

University of North Carolina at Chapel Hill

Publications

Featured research published by James P. Balhoff.

PLOS Biology | 2015

Finding Our Way through Phenotypes

Andrew R. Deans; Suzanna E. Lewis; Eva Huala; Salvatore S. Anzaldo; Michael Ashburner; James P. Balhoff; David C. Blackburn; Judith A. Blake; J. Gordon Burleigh; Bruno Chanet; Laurel Cooper; Mélanie Courtot; Sándor Csösz; Hong Cui; Wasila M. Dahdul; Sandip Das; T. Alexander Dececchi; Agnes Dettai; Rui Diogo; Robert E. Druzinsky; Michel Dumontier; Nico M. Franz; Frank Friedrich; George V. Gkoutos; Melissa Haendel; Luke J. Harmon; Terry F. Hayamizu; Yongqun He; Heather M. Hines; Nizar Ibrahim

Imagine if we could compute across phenotype data as easily as genomic data; this article calls for efforts to realize this vision and discusses the potential benefits.

Genome Research | 2008

Development and application of a phylogenomic toolkit: Resolving the evolutionary history of Madagascar’s lemurs

Julie E. Horvath; David W. Weisrock; Stephanie L. Embry; Isabella Fiorentino; James P. Balhoff; Peter M. Kappeler; Gregory A. Wray; Huntington F. Willard; Anne D. Yoder

Lemurs and the other strepsirrhine primates are of great interest to the primate genomics community due to their phylogenetic placement as the sister lineage to all other primates. Previous attempts to resolve the phylogeny of lemurs employed limited mitochondrial or small nuclear data sets, with many relationships poorly supported or entirely unresolved. We used genomic resources to develop 11 novel markers from nine chromosomes, representing approximately 9 kb of nuclear sequence data. In combination with previously published nuclear and mitochondrial loci, this yields a data set of more than 16 kb and adds approximately 275 kb of DNA sequence to current databases. Our phylogenetic analyses confirm hypotheses of lemuriform monophyly and provide robust resolution of the phylogenetic relationships among the five lemuriform families. We verify that the genus Daubentonia is the sister lineage to all other lemurs. The Cheirogaleidae and Lepilemuridae are sister taxa and together form the sister lineage to the Indriidae; this clade is the sister lineage to the Lemuridae. Divergence time estimates indicate that lemurs are an ancient group, with their initial diversification occurring around the Cretaceous-Tertiary boundary. Given the power of this data set to resolve branches in a notoriously problematic area of primate phylogeny, we anticipate that our phylogenomic toolkit will be of value to other studies of primate phylogeny and diversification. Moreover, the methods applied will be broadly applicable to other taxonomic groups where phylogenetic relationships have been notoriously difficult to resolve.

Trends in Ecology and Evolution | 2012

Time to change how we describe biodiversity

Andrew R. Deans; Matthew J. Yoder; James P. Balhoff

Taxonomists are arguably the most active annotators of the natural world, collecting and publishing millions of phenotype data annually through descriptions of new taxa. By formalizing these data, preferably as they are collected, taxonomists stand to contribute a data set with research potential that rivals or even surpasses genomics. Over a decade of electronic innovation and debate has initiated a revolution in the way that the biodiversity is described. Here, we opine that a new generation of semantically based digital scaffolding, presently in various stages of completeness, and a commitment by taxonomists and their colleagues to undertake this transformation, are required to complete the taxonomic revolution and critically broaden the relevance of its products.

PLOS ONE | 2010

Evolutionary Characters, Phenotypes and Ontologies: Curating Data from the Systematic Biology Literature

Wasila M. Dahdul; James P. Balhoff; Jeffrey M. Engeman; Terry Grande; Eric J. Hilton; Cartik R. Kothari; Hilmar Lapp; John G. Lundberg; Peter E. Midford; Monte Westerfield; Paula M. Mabee

Background: The wealth of phenotypic descriptions documented in the published articles, monographs, and dissertations of phylogenetic systematics is traditionally reported in a free-text format, and it is therefore largely inaccessible for linkage to biological databases for genetics, development, and phenotypes, and difficult to manage for large-scale integrative work. The Phenoscape project aims to represent these complex and detailed descriptions with rich and formal semantics that are amenable to computation and integration with phenotype data from other fields of biology. This entails reconceptualizing the traditional free-text characters into the computable Entity-Quality (EQ) formalism using ontologies. Methodology/Principal Findings: We used ontologies and the EQ formalism to curate a collection of 47 phylogenetic studies on ostariophysan fishes (including catfishes, characins, minnows, knifefishes) and their relatives with the goal of integrating these complex phenotype descriptions with information from an existing model organism database (zebrafish, http://zfin.org). We developed a curation workflow for the collection of character, taxonomic and specimen data from these publications. A total of 4,617 phenotypic characters (10,512 states) for 3,449 taxa, primarily species, were curated into EQ formalism (for a total of 12,861 EQ statements) using anatomical and taxonomic terms from teleost-specific ontologies (Teleost Anatomy Ontology and Teleost Taxonomy Ontology) in combination with terms from a quality ontology (Phenotype and Trait Ontology). Standards and guidelines for consistently and accurately representing phenotypes were developed in response to the challenges that were evident from two annotation experiments and from feedback from curators. Conclusions/Significance: The challenges we encountered and many of the curation standards and methods for improving consistency that we developed are generally applicable to any effort to represent phenotypes using ontologies. This is because an ontological representation of the detailed variations in phenotype, whether between mutant or wildtype, among individual humans, or across the diversity of species, requires a process by which a precise combination of terms from domain ontologies are selected and organized according to logical relations. The efficiencies that we have developed in this process will be useful for any attempt to annotate complex phenotypic descriptions using ontologies. We also discuss some ramifications of EQ representation for the domain of systematics.
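The Entity-Quality idea in this abstract can be illustrated with a small data structure. The following is a minimal sketch, not the Phenoscape data model: the class names `Term` and `EQStatement`, the taxon label, and every identifier shown are hypothetical placeholders rather than real ontology IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """An ontology term: an identifier plus a human-readable label."""
    id: str
    label: str


@dataclass(frozen=True)
class EQStatement:
    """One Entity-Quality annotation attached to a taxon."""
    taxon: Term
    entity: Term   # the anatomical structure being described
    quality: Term  # the quality that the structure exhibits

    def describe(self) -> str:
        return f"{self.taxon.label}: {self.entity.label} is {self.quality.label}"


# Placeholder identifiers for illustration only, not real ontology IDs.
taxon = Term("TTO:EXAMPLE1", "Example catfish species")
entity = Term("TAO:EXAMPLE2", "dorsal fin")
quality = Term("PATO:EXAMPLE3", "absent")

statement = EQStatement(taxon, entity, quality)
print(statement.describe())
```

A free-text character state such as "dorsal fin absent" thus becomes a combination of terms that software can compare across studies, which is the property the abstract describes as computable.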

PLOS ONE | 2010

Phenex: ontological annotation of phenotypic diversity.

James P. Balhoff; Wasila M. Dahdul; Cartik R. Kothari; Hilmar Lapp; John G. Lundberg; Paula M. Mabee; Peter E. Midford; Monte Westerfield

Background: Phenotypic differences among species have long been systematically itemized and described by biologists in the process of investigating phylogenetic relationships and trait evolution. Traditionally, these descriptions have been expressed in natural language within the context of individual journal publications or monographs. As such, this rich store of phenotype data has been largely unavailable for statistical and computational comparisons across studies or integration with other biological knowledge. Methodology/Principal Findings: Here we describe Phenex, a platform-independent desktop application designed to facilitate efficient and consistent annotation of phenotypic similarities and differences using Entity-Quality syntax, drawing on terms from community ontologies for anatomical entities, phenotypic qualities, and taxonomic names. Phenex can be configured to load only those ontologies pertinent to a taxonomic group of interest. The graphical user interface was optimized for evolutionary biologists accustomed to working with lists of taxa, characters, character states, and character-by-taxon matrices. Conclusions/Significance: Annotation of phenotypic data using ontologies and globally unique taxonomic identifiers will allow biologists to integrate phenotypic data from different organisms and studies, leveraging decades of work in systematics and comparative morphology.

Systematic Biology | 2010

The Teleost Anatomy Ontology: Anatomical Representation for the Genomics Age

Wasila M. Dahdul; John G. Lundberg; Peter E. Midford; James P. Balhoff; Hilmar Lapp; Melissa Haendel; Monte Westerfield; Paula M. Mabee

The rich knowledge of morphological variation among organisms reported in the systematic literature has remained in free-text format, impractical for use in large-scale synthetic phylogenetic work. This noncomputable format has also precluded linkage to the large knowledgebase of genomic, genetic, developmental, and phenotype data in model organism databases. We have undertaken an effort to prototype a curated, ontology-based evolutionary morphology database that maps to these genetic databases (http://kb.phenoscape.org) to facilitate investigation into the mechanistic basis and evolution of phenotypic diversity. Among the first requirements in establishing this database was the development of a multispecies anatomy ontology with the goal of capturing anatomical data in a systematic and computable manner. An ontology is a formal representation of a set of concepts with defined relationships between those concepts. Multispecies anatomy ontologies in particular are an efficient way to represent the diversity of morphological structures in a clade of organisms, but they present challenges in their development relative to single-species anatomy ontologies. Here, we describe the Teleost Anatomy Ontology (TAO), a multispecies anatomy ontology for teleost fishes derived from the Zebrafish Anatomical Ontology (ZFA) for the purpose of annotating varying morphological features across species. To facilitate interoperability with other anatomy ontologies, TAO uses the Common Anatomy Reference Ontology as a template for its upper level nodes, and TAO and ZFA are synchronized, with zebrafish terms specified as subtypes of teleost terms. We found that the details of ontology architecture have ramifications for querying, and we present general challenges in developing a multispecies anatomy ontology, including refinement of definitions, taxon-specific relationships among terms, and representation of taxonomically variable developmental pathways.

Database | 2013

An overview of the BioCreative 2012 Workshop Track III: interactive text mining task.

Cecilia N. Arighi; Ben Carterette; K. Bretonnel Cohen; Martin Krallinger; W. John Wilbur; Petra Fey; Robert Dodson; Laurel Cooper; Ceri E. Van Slyke; Wasila M. Dahdul; Paula M. Mabee; Donghui Li; Bethany Harris; Marc Gillespie; Silvia Jimenez; Phoebe M. Roberts; Lisa Matthews; Kevin G. Becker; Harold J. Drabkin; Susan M. Bello; Luana Licata; Andrew Chatr-aryamontri; Mary L. Schaeffer; Julie Park; Melissa Haendel; Kimberly Van Auken; Yuling Li; Juancarlos Chan; Hans-Michael Müller; Hong Cui

In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. 
The analysis of this survey highlights how important task completion is to the biocurators’ overall experience of a system, regardless of the system’s high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV.

Journal of Biomedical Semantics | 2014

Unification of multi-species vertebrate anatomy ontologies for comparative biology in Uberon

Melissa Haendel; James P. Balhoff; Frederic B. Bastian; David C. Blackburn; Judith A. Blake; Yvonne M. Bradford; Aurélie Comte; Wasila M. Dahdul; Thomas Dececchi; Robert E. Druzinsky; Terry F. Hayamizu; Nizar Ibrahim; Suzanna E. Lewis; Paula M. Mabee; Anne Niknejad; Marc Robinson-Rechavi; Paul C. Sereno; Christopher J. Mungall

Background: Elucidating disease and developmental dysfunction requires understanding variation in phenotype. Single-species model organism anatomy ontologies (ssAOs) have been established to represent this variation. Multi-species anatomy ontologies (msAOs; vertebrate skeletal, vertebrate homologous, teleost, amphibian AOs) have been developed to represent ‘natural’ phenotypic variation across species. Our aim has been to integrate ssAOs and msAOs for various purposes, including establishing links between phenotypic variation and candidate genes. Results: Previously, msAOs contained a mixture of unique and overlapping content. This hampered integration and coordination due to the need to maintain cross-references or inter-ontology equivalence axioms to the ssAOs, or to perform large-scale obsolescence and modular import. Here we present the unification of anatomy ontologies into Uberon, a single ontology resource that enables interoperability among disparate data and research groups. As a consequence, independent development of TAO, VSAO, AAO, and vHOG has been discontinued. Conclusions: The newly broadened Uberon ontology is a unified cross-taxon resource for metazoans (animals) that has been substantially expanded to include a broad diversity of vertebrate anatomical structures, permitting reasoning across anatomical variation in extinct and extant taxa. Uberon is a core resource that supports single- and cross-species queries for candidate genes using annotations for phenotypes from the systematics, biodiversity, medical, and model organism communities, while also providing entities for logical definitions in the Cell and Gene Ontologies. The ontology release files associated with the ontology merge described in this manuscript are available at: http://purl.obolibrary.org/obo/uberon/releases/2013-02-21/ Current ontology release files are always available at: http://purl.obolibrary.org/obo/uberon/releases/

Systematic Biology | 2012

NeXML: Rich, Extensible, and Verifiable Representation of Comparative Data and Metadata

Rutger A. Vos; James P. Balhoff; Jason Caravas; Mark T. Holder; Hilmar Lapp; Wayne P. Maddison; Peter E. Midford; Anurag Priyam; Jeet Sukumaran; Xuhua Xia; Arlin Stoltzfus

In scientific research, integration and synthesis require a common understanding of where data come from, how much they can be trusted, and what they may be used for. To make such an understanding computer-accessible requires standards for exchanging richly annotated data. The challenges of conveying reusable data are particularly acute in regard to evolutionary comparative analysis, which comprises an ever-expanding list of data types, methods, research aims, and subdisciplines. To facilitate interoperability in evolutionary comparative analysis, we present NeXML, an XML standard (inspired by the current standard, NEXUS) that supports exchange of richly annotated comparative data. NeXML defines syntax for operational taxonomic units, character-state matrices, and phylogenetic trees and networks. Documents can be validated unambiguously. Importantly, any data element can be annotated, to an arbitrary degree of richness, using a system that is both flexible and rigorous. We describe how the use of NeXML by the TreeBASE and Phenoscape projects satisfies user needs that cannot be satisfied with other available file formats. By relying on XML Schema Definition, the design of NeXML facilitates the development and deployment of software for processing, transforming, and querying documents. The adoption of NeXML for practical use is facilitated by the availability of (1) an online manual with code samples and a reference to all defined elements and attributes, (2) programming toolkits in most of the languages used commonly in evolutionary informatics, and (3) input–output support in several widely used software applications. An active, open, community-based development process enables future revision and expansion of NeXML.

Journal of Applied Ichthyology | 2012

500,000 fish phenotypes: The new informatics landscape for evolutionary and developmental biology of the vertebrate skeleton

Paula M. Mabee; James P. Balhoff; Wasila M. Dahdul; Hilmar Lapp; Peter E. Midford; Monte Westerfield

The rich phenotypic diversity that characterizes the vertebrate skeleton results from evolutionary changes in regulation of genes that drive development. Although relatively little is known about the genes that underlie the skeletal variation among fish species, significant knowledge of genetics and development is available for zebrafish. Because developmental processes are highly conserved, this knowledge can be leveraged for understanding the evolution of skeletal diversity. We developed the Phenoscape Knowledgebase (KB; http://kb.phenoscape.org) to yield testable hypotheses of candidate genes involved in skeletal evolution. We developed a community anatomy ontology for fishes and ontology-based methods to represent complex free-text character descriptions of species in a computable format. With these tools, we populated the KB with comparative morphological data from the literature on over 2500 teleost fishes (mainly Ostariophysi) resulting in over 500,000 taxon phenotype annotations. The KB integrates these data with similarly structured phenotype data from zebrafish genes (http://zfin.org). Using ontology-based reasoning, candidate genes can be inferred for the phenotypes that vary across taxa, thereby uniting genetic and phenotypic data to formulate evo-devo hypotheses. The morphological data in the KB can be browsed, sorted, and aggregated in ways that provide unprecedented possibilities for data mining and discovery.
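The inference step mentioned in this abstract, matching phenotypes that vary across taxa against gene phenotypes through an ontology, might look roughly like the toy sketch below. It is an illustration under stated assumptions, not the Knowledgebase's actual reasoning: the `IS_A` hierarchy, the gene names, and the matching rule (same quality, with the gene's entity equal to or a subtype of the queried entity) are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical child -> parents "is_a" links in a tiny anatomy hierarchy.
IS_A: Dict[str, Set[str]] = {
    "pectoral fin": {"paired fin"},
    "pelvic fin": {"paired fin"},
    "paired fin": {"fin"},
    "fin": set(),
}

# Hypothetical gene phenotype annotations: (gene, entity, quality).
GENE_PHENOTYPES: List[Tuple[str, str, str]] = [
    ("geneA", "pectoral fin", "absent"),
    ("geneB", "pelvic fin", "decreased size"),
    ("geneC", "paired fin", "absent"),
]


def ancestors(term: str) -> Set[str]:
    """Return every term reachable from `term` by following is_a links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in IS_A.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumes(general: str, specific: str) -> bool:
    """True if `specific` is the same term as `general` or one of its subtypes."""
    return general == specific or general in ancestors(specific)


def candidate_genes(entity: str, quality: str) -> List[str]:
    """Genes whose phenotype has the same quality and a subsumed entity."""
    return sorted(
        gene
        for gene, gene_entity, gene_quality in GENE_PHENOTYPES
        if gene_quality == quality and subsumes(entity, gene_entity)
    )


print(candidate_genes("paired fin", "absent"))
```

Querying at the level of "paired fin" retrieves a gene annotated to the more specific "pectoral fin" as well as one annotated to "paired fin" itself, which is the kind of match that plain string comparison of free-text descriptions would miss.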
