
Publication


Featured research published by Michael Bada.


Journal of Biomedical Informatics | 2011

Cross-product extensions of the Gene Ontology

Christopher J. Mungall; Michael Bada; Tanya Z. Berardini; Jennifer I. Deegan; Amelia Ireland; Midori A. Harris; David P. Hill; Jane Lomax

The Gene Ontology (GO) consists of nearly 30,000 classes for describing the activities and locations of gene products. Manual maintenance of an ontology of this size is a considerable effort, and errors and inconsistencies inevitably arise. Reasoners can be used to assist with ontology development, automatically placing classes in a subsumption hierarchy based on their properties. However, the historic lack of computable definitions within the GO has prevented the use of these tools. In this paper, we present preliminary results of an ongoing effort to normalize the GO by explicitly stating the definitions of compositional classes in a form that can be used by reasoners. These definitions are partitioned into mutually exclusive cross-product sets, many of which reference other OBO Foundry candidate ontologies for chemical entities, proteins, biological qualities and anatomical entities. Using these logical definitions we are gradually beginning to automate many aspects of ontology development, detecting errors and filling in missing relationships. These definitions also enhance the GO by weaving it into the fabric of a wider collection of interoperating ontologies, increasing opportunities for data integration and enhancing genomic analyses.
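To make the idea of reasoning over compositional definitions concrete, the following is a minimal sketch, not the reasoner or definitions used in the paper. It assumes each compositional class is defined as a genus plus one (relation, filler) pair and infers a subclass link when the genus and the filler of one definition are each subsumed by those of another. The class names, the `has_output` relation, and the toy hierarchy are illustrative assumptions.

```python
# Toy is_a hierarchy: child -> set of direct parents (all names are illustrative).
IS_A = {
    "cysteine": {"amino acid"},
    "amino acid": {"chemical entity"},
    "biosynthetic process": {"metabolic process"},
}

# Hypothetical logical definitions: class -> (genus, relation, filler).
DEFINITIONS = {
    "amino acid biosynthetic process": ("biosynthetic process", "has_output", "amino acid"),
    "cysteine biosynthetic process": ("biosynthetic process", "has_output", "cysteine"),
}


def ancestors(term):
    """Return the term itself plus everything reachable through is_a links."""
    seen, stack = {term}, [term]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_is_a(definitions):
    """Infer (subclass, superclass) pairs from genus/differentia definitions."""
    pairs = set()
    for sub, (sub_genus, sub_rel, sub_filler) in definitions.items():
        for sup, (sup_genus, sup_rel, sup_filler) in definitions.items():
            if sub == sup or sub_rel != sup_rel:
                continue
            # Both the genus and the filler of the subclass must be subsumed.
            if sup_genus in ancestors(sub_genus) and sup_filler in ancestors(sub_filler):
                pairs.add((sub, sup))
    return pairs


if __name__ == "__main__":
    for sub, sup in sorted(inferred_is_a(DEFINITIONS)):
        print(f"{sub} is_a {sup}")
```

A real description-logic reasoner handles far more than this single pattern; the sketch only shows why explicit definitions let missing links be filled in mechanically.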


BMC Bioinformatics | 2012

Concept annotation in the CRAFT corpus

Michael Bada; Miriam Eckert; Donald Evans; Kristin Garcia; Krista Shipley; Dmitry Sitnikov; William A. Baumgartner; K. Bretonnel Cohen; Karin Verspoor; Judith A. Blake; Lawrence Hunter

Background: Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. Results: This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. Conclusions: As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems.
The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.


BMC Bioinformatics | 2012

A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools

Karin Verspoor; Kevin Bretonnel Cohen; Arrick Lanfranchi; Colin Warner; Helen L. Johnson; Christophe Roeder; Jinho D. Choi; Christopher S. Funk; Yuriy Malenkiy; Miriam Eckert; Nianwen Xue; William A. Baumgartner; Michael Bada; Martha Palmer; Lawrence Hunter

Background: We introduce the linguistic annotation of a corpus of 97 full-text biomedical publications, known as the Colorado Richly Annotated Full Text (CRAFT) corpus. We further assess the performance of existing tools for performing sentence splitting, tokenization, syntactic parsing, and named entity recognition on this corpus. Results: Many biomedical natural language processing systems demonstrated large differences between their previously published results and their performance on the CRAFT corpus when tested with the publicly available models or rule sets. Trainable systems differed widely with respect to their ability to build high-performing models based on this data. Conclusions: The finding that some systems were able to train high-performing models based on this corpus is additional evidence, beyond high inter-annotator agreement, that the quality of the CRAFT corpus is high. The overall poor performance of various systems indicates that considerable work needs to be done to enable natural language processing systems to work well when the input is full-text journal articles. The CRAFT corpus provides a valuable resource to the biomedical natural language processing community for evaluation and training of new models for biomedical full-text publications.


BMC Bioinformatics | 2014

Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters

Christopher S. Funk; William A. Baumgartner; Benjamin Garcia; Christophe Roeder; Michael Bada; K. Bretonnel Cohen; Lawrence Hunter; Karin Verspoor

Background: Ontological concepts are useful for many different biomedical tasks. Concepts are difficult to recognize in text due to a disconnect between what is captured in an ontology and how the concepts are expressed in text. There are many recognizers for specific ontologies, but a general approach for concept recognition is an open problem. Results: Three dictionary-based systems (MetaMap, NCBO Annotator, and ConceptMapper) are evaluated on eight biomedical ontologies in the Colorado Richly Annotated Full-Text (CRAFT) Corpus. Over 1,000 parameter combinations are examined, and best-performing parameters for each system-ontology pair are presented. Conclusions: Baselines for concept recognition by three systems on eight biomedical ontologies are established (F-measures range from 0.14 to 0.83). Out of the three systems we tested, ConceptMapper is generally the best-performing system; it produces the highest F-measure on seven out of eight ontologies. Default parameters are not ideal for most systems on most ontologies; by changing parameters, F-measure can be increased by up to 0.4. In addition to the best-performing parameters, suggestions for choosing parameters based on ontology characteristics are presented.
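The F-measures reported above combine precision and recall over predicted concept annotations. The following is a minimal sketch of that computation, assuming exact matching on (start offset, end offset, concept identifier) triples; the matching criterion used in the paper may differ, and the identifiers shown are placeholders, not real ontology IDs.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted annotations against gold annotations by exact match."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Each annotation is (start, end, concept_id); all values are made up.
gold = {(0, 4, "ONT:1"), (10, 17, "ONT:2"), (25, 32, "ONT:3")}
predicted = {(0, 4, "ONT:1"), (10, 17, "ONT:9"), (25, 32, "ONT:3"), (40, 45, "ONT:4")}

if __name__ == "__main__":
    p, r, f = precision_recall_f1(gold, predicted)
    print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Sweeping a recognizer's parameters and re-running a scorer like this for each combination is one way to arrive at per-ontology best settings of the kind the abstract describes.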


Pacific Symposium on Biocomputing | 2005

Evaluation of lexical methods for detecting relationships between concepts from multiple ontologies

Helen L. Johnson; K. Bretonnel Cohen; William A. Baumgartner; Zhiyong Lu; Michael Bada; Todd Kester; Hyunmin Kim; Lawrence Hunter

We used exact term matching, stemming, and inclusion of synonyms, implemented via the Lucene information retrieval library, to discover relationships between the Gene Ontology and three other OBO ontologies: ChEBI, Cell Type, and BRENDA Tissue. Proposed relationships were evaluated by domain experts. We discovered 91,385 relationships between the ontologies. The methods varied widely in correctness. Based on these results, we recommend careful evaluation of all matching strategies before use, including exact string matching. The full set of relationships is available at compbio.uchsc.edu/dependencies.
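The three matching strategies named above can be illustrated with a small self-contained sketch. It does not use Lucene: the plural-stripping `naive_stem` function stands in for a real stemmer, and the term lists and synonym table are invented for illustration only.

```python
import re


def tokens(text):
    """Lowercase a term name and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_stem(token):
    """Very crude stand-in for a stemmer: strip one plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def contains_phrase(haystack, needle):
    """True if the token list `needle` occurs contiguously in `haystack`."""
    n = len(needle)
    return n > 0 and any(haystack[i:i + n] == needle
                         for i in range(len(haystack) - n + 1))


def find_relationships(go_terms, other_terms, stem=False, synonyms=None):
    """Propose (GO term, other term) pairs where the other term's name,
    or one of its synonyms, appears inside the GO term name."""
    synonyms = synonyms or {}

    def normalize(text):
        toks = tokens(text)
        return [naive_stem(t) for t in toks] if stem else toks

    proposals = set()
    for go_term in go_terms:
        go_tokens = normalize(go_term)
        for other in other_terms:
            names = [other, *synonyms.get(other, [])]
            if any(contains_phrase(go_tokens, normalize(n)) for n in names):
                proposals.add((go_term, other))
    return proposals


GO_TERMS = ["T cell activation", "erythrocyte development", "glutamate binding"]
OTHER_TERMS = ["T cell", "red blood cell", "glutamates"]
SYNONYMS = {"red blood cell": ["erythrocyte"]}

if __name__ == "__main__":
    print(sorted(find_relationships(GO_TERMS, OTHER_TERMS)))
    print(sorted(find_relationships(GO_TERMS, OTHER_TERMS, stem=True, synonyms=SYNONYMS)))
```

Each relaxation (stemming, synonyms) proposes more pairs, which is why the abstract's advice to have proposals checked by domain experts applies to every strategy, including exact matching.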


International Conference on Management of Data | 2004

Using reasoning to guide annotation with Gene Ontology terms in GOAT

Michael Bada; Daniele Turi; Robin McEntire; Robert Stevens

High-quality annotation of biological data is central to bioinformatics. Annotation using terms from ontologies provides reliable computational access to data. The Gene Ontology (GO), a structured controlled vocabulary of nearly 17,000 terms, is becoming the de facto standard for describing the functionality of gene products. Many prominent biomedical databases use GO as a source of terms for functional annotation of their gene-product entries to promote consistent querying and interoperability. However, current annotation editors do not constrain the choice of GO terms users may enter for a given gene product, potentially resulting in an inconsistent or even nonsensical description. Furthermore, the process of annotation is largely an unguided one in which the user must wade through large GO subtrees in search of terms. Relying upon a reasoner loaded with a DAML+OIL version of GO and an instance store of mined GO-term-to-GO-term associations, GOAT aims to aid the user in the annotation of gene products with GO terms by displaying those field values that are most likely to be appropriate based on previously entered terms. This can result in a reduction in biologically inconsistent combinations of GO terms and a less tedious annotation process on the part of the user.


Journal of Biomedical Informatics | 2007

Enrichment of OBO ontologies

Michael Bada; Lawrence Hunter

This paper describes a frame-based integration of the three GO subontologies, the Chemical Entities of Biological Interest ontology, and the Cell Type Ontology in which relationships are modeled in a way that better captures the semantics between biological concepts represented by the terms, rather than between the terms themselves, than previous frame-based efforts. We also describe a methodology for creating suggested enriching assertions by identifying patterns in GO terms, mapping these patterns to new, specific relationships, and matching term substrings to concepts. Using this methodology, a predicted assertion was made for 62% of GO terms that matched one of 31 patterns, and 97% of these predicted assertions were assessed to be valid, resulting in an initial set of over 4000 assertions. Furthermore, this methodology programmatically integrates assertions into an ontology such that each assertion is fully consistent with respect to higher (i.e., more general) relevant class and slot levels.
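The pattern-based step described above (identify a pattern in a GO term name, map it to a relationship, and match the remaining substring to a concept) can be sketched as follows. The two patterns, the relation names `binds` and `transports`, and the toy concept table are hypothetical; they are not the 31 patterns or the relationships from the paper.

```python
import re

# Hypothetical name patterns, each mapped to a hypothetical relationship.
PATTERNS = [
    (re.compile(r"(?P<filler>.+) binding"), "binds"),
    (re.compile(r"(?P<filler>.+) transport"), "transports"),
]

# Toy lookup of concept names to placeholder identifiers.
CONCEPTS = {"calcium ion": "CHEBI:toy1", "glucose": "CHEBI:toy2"}


def suggest_assertions(go_terms):
    """Return (GO term, relation, concept id) suggestions for terms whose
    name fits a pattern and whose captured substring names a known concept."""
    suggestions = []
    for term in go_terms:
        for pattern, relation in PATTERNS:
            match = pattern.fullmatch(term)
            if match and match.group("filler") in CONCEPTS:
                suggestions.append((term, relation, CONCEPTS[match.group("filler")]))
                break  # at most one suggestion per term in this sketch
    return suggestions


if __name__ == "__main__":
    terms = ["calcium ion binding", "glucose transport",
             "unfolded protein binding", "cell cycle"]
    for suggestion in suggest_assertions(terms):
        print(suggestion)
```

In this sketch a term such as "unfolded protein binding" fits a pattern but yields no suggestion because its substring has no entry in the concept table, which loosely mirrors why only a fraction of pattern-matching terms receive a predicted assertion.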


BMC Bioinformatics | 2015

KaBOB: ontology-based semantic integration of biomedical databases

Kevin Livingston; Michael Bada; William A. Baumgartner; Lawrence Hunter

Background: The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in semantic data integration in order to establish shared identity and shared meaning across heterogeneous biomedical data sources. Results: We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrating it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license. Conclusions: KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats.
KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.
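One of the processes mentioned in the abstract, aggregating sets of identifiers that denote the same biomedical concept, can be illustrated with a union-find structure over pairwise cross-references. This is a minimal sketch under that assumption, not KaBOB's implementation, and the identifiers are placeholders.

```python
def aggregate_identifiers(equivalences):
    """Group identifiers into sets connected by pairwise equivalence links."""
    parent = {}

    def find(ident):
        # Register unseen identifiers as their own root, then walk to the
        # root while shortening the path (path halving).
        parent.setdefault(ident, ident)
        while parent[ident] != ident:
            parent[ident] = parent[parent[ident]]
            ident = parent[ident]
        return ident

    for left, right in equivalences:
        parent[find(left)] = find(right)

    groups = {}
    for ident in list(parent):
        groups.setdefault(find(ident), set()).add(ident)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Placeholder cross-references between three hypothetical databases.
    links = [("dbA:1", "dbB:x"), ("dbB:x", "dbC:7"), ("dbA:2", "dbB:y")]
    for group in aggregate_identifiers(links):
        print(group)
```

Transitive grouping of this kind merges everything reachable through any link, so a single incorrect cross-reference can fuse two distinct concepts; a production pipeline would need safeguards that this sketch omits.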


Bioinformatics | 2008

Identification of OBO nonalignments and its implications for OBO enrichment

Michael Bada; Lawrence Hunter

Motivation: Existing projects that focus on the semiautomatic addition of links between existing terms in the Open Biomedical Ontologies can take advantage of reasoners that can make new inferences between terms that are based on the added formal definitions and that reflect nonalignments between the linked terms. However, these projects require that these definitions be necessary and sufficient, a strong requirement that often does not hold. If such definitions cannot be added, the reasoners cannot point to the nonalignments through the suggestion of new inferences. Results: We describe a methodology by which we have identified over 1900 instances of nonredundant nonalignments between terms from the Gene Ontology (GO) biological process (BP), cellular component (CC) and molecular function (MF) ontologies, Chemical Entities of Biological Interest (ChEBI) and the Cell Type Ontology (CL). Many of the 39.8% of these nonalignments whose object terms are more atomic than the subject terms are not currently examined in other ontology-enrichment projects because the necessary and sufficient conditions required for the inferences are not currently examined. Analysis of the ratios of nonalignments to assertions from which the nonalignments were identified suggests that BP–MF, BP–BP, BP–CL and CC–CC terms are relatively well aligned, while ChEBI–MF, BP–ChEBI and CC–MF terms are relatively poorly aligned. We propose four ways to resolve an identified nonalignment and recommend an analogous implementation of our methodology in ontology-enrichment tools to identify types of nonalignments that are currently not detected. Availability: The nonalignments discussed in this article may be viewed at http://compbio.uchsc.edu/Hunter_lab/Bada/nonalignments_2008_03_06.html. Code for the generation of these nonalignments is available upon request. Contact: [email protected]


Journal of Biomedical Informatics | 2011

Desiderata for ontologies to be used in semantic annotation of biomedical documents

Michael Bada; Lawrence Hunter

A wealth of knowledge valuable to the translational research scientist is contained within the vast biomedical literature, but this knowledge is typically in the form of natural language. Sophisticated natural-language-processing systems are needed to translate text into unambiguous formal representations grounded in high-quality consensus ontologies, and these systems in turn rely on gold-standard corpora of annotated documents for training and testing. To this end, we are constructing the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-text biomedical journal articles that are being manually annotated with the entire sets of terms from select vocabularies, predominantly from the Open Biomedical Ontologies (OBO) library. Our efforts in building this corpus have illuminated infelicities of these ontologies with respect to the semantic annotation of biomedical documents, and we propose desiderata whose implementation could substantially improve their utility in this task; these include the integration of overlapping terms across OBOs, the resolution of OBO-specific ambiguities, the integration of the BFO with the OBOs and the use of mid-level ontologies, the inclusion of noncanonical instances, and the expansion of relations and realizable entities.

Collaboration


Top Co-Authors

Lawrence Hunter, University of Colorado Denver

K. Bretonnel Cohen, University of Colorado Denver

Martha Palmer, University of Colorado Boulder

Christopher J. Mungall, Lawrence Berkeley National Laboratory

Christopher S. Funk, University of Colorado Denver

Helen L. Johnson, University of Colorado Denver