Philip V. Ogren
Mayo Clinic
Publication
Featured research published by Philip V. Ogren.
Journal of the American Medical Informatics Association | 2010
Guergana Savova; James J. Masanz; Philip V. Ogren; Jiaping Zheng; Sunghwan Sohn; Karin Kipper-Schuler; Christopher G. Chute
We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. The cTAKES builds on existing open-source technologies: the Unstructured Information Management Architecture framework and the OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans, and accuracy for concept mapping, negation, and status attributes for exact and overlapping spans of 0.957, 0.943, 0.859, and 0.580, 0.939, and 0.839, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text.
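The abstract above reports separate F-scores for exact and overlapping spans. As a rough illustration of how those two matching criteria differ, here is a minimal sketch. It assumes spans are half-open `(start, end)` character offsets and that a span counts as correct if it matches any span on the other side; the abstract does not spell out the actual evaluation protocol, and the function names and toy data are hypothetical.

```python
# Sketch only: span representation, matching rule, and data are assumptions,
# not the evaluation procedure used in the cTAKES paper.

def exact(a, b):
    """Two spans match exactly when both boundaries agree."""
    return a == b


def overlaps(a, b):
    """Two half-open (start, end) spans overlap when they share a position."""
    return a[0] < b[1] and b[0] < a[1]


def span_f_score(gold, predicted, match):
    """F-score where a span is correct if it matches any span on the other side."""
    matched_pred = sum(1 for p in predicted if any(match(p, g) for g in gold))
    matched_gold = sum(1 for g in gold if any(match(g, p) for p in predicted))
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy annotations (character offsets).
gold = [(0, 5), (10, 20), (30, 40)]
predicted = [(0, 5), (12, 20), (50, 55)]

exact_f = span_f_score(gold, predicted, exact)
overlap_f = span_f_score(gold, predicted, overlaps)
print(f"exact F={exact_f:.3f}, overlapping F={overlap_f:.3f}")
```

In this toy case the prediction `(12, 20)` misses under exact matching but counts under overlap matching, so the overlapping score is higher, which is the same direction as the 0.715 versus 0.824 gap reported above.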
BMC Bioinformatics | 2008
Lawrence Hunter; Zhiyong Lu; James Firby; William A. Baumgartner; Helen L. Johnson; Philip V. Ogren; K. Bretonnel Cohen
Background: Information extraction (IE) efforts are widely acknowledged to be important in harnessing the rapid advance of biomedical knowledge, particularly in areas where important factual information is published in a diverse literature. Here we report on the design, implementation and several evaluations of OpenDMAP, an ontology-driven, integrated concept analysis system. It significantly advances the state of the art in information extraction by leveraging knowledge in ontological resources, integrating diverse text processing applications, and using an expanded pattern language that allows the mixing of syntactic and semantic elements and variable ordering.
Results: OpenDMAP information extraction systems were produced for extracting protein transport assertions (transport), protein-protein interaction assertions (interaction) and assertions that a gene is expressed in a cell type (expression). Evaluations were performed on each system, resulting in F-scores ranging from 0.26 to 0.72 (precision 0.39 to 0.85, recall 0.16 to 0.85). Additionally, each of these systems was run over all abstracts in MEDLINE, producing a total of 72,460 transport instances, 265,795 interaction instances and 176,153 expression instances.
Conclusion: OpenDMAP advances the performance standards for extracting protein-protein interaction predications from the full texts of biomedical research articles. Furthermore, this level of performance appears to generalize to other information extraction tasks, including extracting information about predicates of more than two arguments. The output of the information extraction system is always constructed from elements of an ontology, ensuring that the knowledge representation is grounded with respect to a carefully constructed model of reality. The results of these efforts can be used to increase the efficiency of manual curation efforts and to provide additional features in systems that integrate multiple sources for information extraction.
The open source OpenDMAP code library is freely available at http://bionlp.sourceforge.net/
Language and Technology Conference | 2006
Philip V. Ogren
A general-purpose text annotation tool called Knowtator is introduced. Knowtator facilitates the manual creation of annotated corpora that can be used for evaluating or training a variety of natural language processing systems. Building on the strengths of the widely used Protégé knowledge representation system, Knowtator has been developed as a Protégé plug-in that leverages Protégé's knowledge representation capabilities to specify annotation schemas. Knowtator's unique advantage over other annotation tools is the ease with which complex annotation schemas (e.g. schemas which have constrained relationships between annotation types) can be defined and incorporated into use. Knowtator is available under the Mozilla Public License 1.1 at http://bionlp.sourceforge.net/Knowtator.
Pacific Symposium on Biocomputing | 2003
Philip V. Ogren; Kevin Bretonnel Cohen; George K. Acquaah-Mensah; Jens Eberlein; Lawrence Hunter
An analysis of the term names in the Gene Ontology reveals the prevalence of substring relations between terms: 65.3% of all GO terms contain another GO term as a proper substring. This substring relation often coincides with a derivational relationship between the terms. For example, the term "regulation of cell proliferation" (GO:0042127) is derived from the term "cell proliferation" (GO:0008283) by addition of the phrase "regulation of". Further, we note that particular substrings which are not themselves GO terms (e.g. "regulation of" in the preceding example) recur frequently and in consistent subtrees of the ontology, and that these frequently occurring substrings often indicate interesting semantic relationships between the related terms. We describe the extent of these phenomena (substring relations between terms, and the recurrence of derivational phrases such as "regulation of") and propose that these phenomena can be exploited in various ways to make the information in GO more computationally accessible, to construct a conceptually richer representation of the data encoded in the ontology, and to assist in the analysis of natural language texts.
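The substring relation described in this abstract can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' analysis: it uses plain character-level substring matching, a five-term toy list rather than the Gene Ontology itself, and hypothetical function names.

```python
# Sketch only: toy terms and plain substring matching are assumptions;
# the paper's actual matching rules are not given in the abstract.

def proper_substring_fraction(terms):
    """Fraction of terms that contain another, different term as a substring."""
    unique = set(terms)
    if not unique:
        return 0.0
    hits = sum(
        1 for term in unique
        if any(other != term and other in term for other in unique)
    )
    return hits / len(unique)


def derivational_residue(term, base):
    """Text left over after removing the first occurrence of base from term."""
    return term.replace(base, "", 1).strip()


# Toy term list; only the two 'cell proliferation' names come from the abstract.
TOY_TERMS = [
    "cell proliferation",
    "regulation of cell proliferation",
    "cell growth",
    "regulation of cell growth",
    "transport",
]

fraction = proper_substring_fraction(TOY_TERMS)
residue = derivational_residue(
    "regulation of cell proliferation", "cell proliferation"
)
print(fraction, repr(residue))
```

Counting how often each residue (here "regulation of") recurs across term pairs would be one way to surface the frequently occurring derivational phrases the abstract mentions. A real analysis would probably match on word boundaries so that short terms do not match inside unrelated words.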
Intelligent Systems in Molecular Biology | 2005
K. Bretonnel Cohen; Lynne M. Fox; Philip V. Ogren; Lawrence Hunter
This paper classifies six publicly available biomedical corpora according to various corpus design features and characteristics. We then present usage data for the six corpora. We show that corpora that are carefully annotated with respect to structural and linguistic characteristics and that are distributed in standard formats are more widely used than corpora that are not. These findings have implications for the design of the next generation of biomedical corpora.
North American Chapter of the Association for Computational Linguistics | 2009
K. Bretonnel Cohen; Karin Verspoor; Helen L. Johnson; Christophe Roeder; Philip V. Ogren; William A. Baumgartner; Elizabeth K. White; Lawrence Hunter
We approached the problems of event detection, argument identification, and negation and speculation detection as one of concept recognition and analysis. Our methodology involved using the OpenDMAP semantic parser with manually-written rules. We achieved state-of-the-art precision for two of the three tasks, scoring the highest of 24 teams at precision of 71.81 on Task 1 and the highest of 6 teams at precision of 70.97 on Task 2. The OpenDMAP system and the rule set are available at bionlp.sourceforge.net.
Journal of Biomedical Informatics | 2008
Guergana Savova; Anni Coden; Igor L. Sominsky; Rie Johnson; Philip V. Ogren; Piet C. de Groen; Christopher G. Chute
The aim of this study is to explore the word sense disambiguation (WSD) problem across two biomedical domains: biomedical literature and clinical notes. A supervised machine learning technique was used for the WSD task. One of the challenges addressed is the creation of a suitable clinical corpus with manual sense annotations. This corpus in conjunction with the WSD set from the National Library of Medicine provided the basis for the evaluation of our method across multiple domains and for the comparison of our results to published ones. Noteworthy is that only 20% of the most relevant ambiguous terms within a domain overlap between the two domains, having more senses associated with them in the clinical space than in the biomedical literature space. Experimentation with 28 different feature sets rendered a system achieving an average F-score of 0.82 on the clinical data and 0.86 on the biomedical literature.
Pacific Symposium on Biocomputing | 2004
Philip V. Ogren; K. Bretonnel Cohen; Lawrence Hunter
In this paper we argue that a richer underlying representational model for the Gene Ontology that captures the implicit compositional structure of GO terms could have a positive impact on two activities crucial to the success of GO: ontology curation and database annotation. We show that many of the new terms added to GO in a one-year span appear to be compositional variations of other terms. We found that 90.2% of the 3,652 new terms added between July 2003 and July 2004 exhibited characteristics of compositionality. We also examine annotations available from the GO Consortium website that are either manually curated or automatically generated. We found that 74.5% and 63.2% of GO terms are seldom, if ever, used in manual and automatic annotations, respectively. We show that there are features that tend to distinguish terms that are used from those that are not. In order to characterize the effect of compositionality on the combinatorial properties of GO, we employ finite state automata that represent sets of GO terms. This representational tool demonstrates how ontologies can grow very fast, and also shows that small conceptual changes can directly result in a large number of changes to the terminology. We argue that the curation and annotation findings we report are influenced by the combinatorial properties that present themselves in an ontology that does not have a model that properly captures the compositional structure of its terms.
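The combinatorial point made in this abstract can be illustrated with a very small acyclic automaton. The sketch below assumes a linear chain of states in which each "slot" lists the alternative phrases that label the transitions to the next state. The slots, phrases, and function name are hypothetical and have not been checked against the Gene Ontology or the automata used in the paper.

```python
from itertools import product

# Sketch only: a linear acyclic automaton written as a list of slots.
# An empty string marks an optional slot that may be skipped.
SLOTS = [
    ["", "regulation of", "positive regulation of", "negative regulation of"],
    ["cell", "T cell", "B cell"],
    ["proliferation", "differentiation"],
]


def generate_terms(slots):
    """Enumerate every term name the slot chain accepts."""
    return [
        " ".join(part for part in combo if part)
        for combo in product(*slots)
    ]


terms_before = generate_terms(SLOTS)

# One small conceptual change: a single new process in the last slot.
extended = SLOTS[:2] + [SLOTS[2] + ["activation"]]
terms_after = generate_terms(extended)

print(len(terms_before), len(terms_after))
```

Adding one alternative to the last slot multiplies through the other slots, so a single conceptual addition yields twelve new term names in this toy setup. That is the kind of growth the abstract attributes to compositional structure.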
Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP 2009) | 2009
Philip V. Ogren; Steven Bethard
We summarize our experiences building a comprehensive suite of tests for a statistical natural language processing toolkit, ClearTK. We describe some of the challenges we encountered, introduce a software project that emerged from these efforts, summarize our resulting test suite, and discuss some of the lessons learned.
Computational Intelligence | 2011
K. Bretonnel Cohen; Karin Verspoor; Helen L. Johnson; Christophe Roeder; Philip V. Ogren; William A. Baumgartner; Elizabeth K. White; Hannah Tipney; Lawrence Hunter
We approached the problems of event detection, argument identification, and negation and speculation detection in the BioNLP’09 information extraction challenge through concept recognition and analysis. Our methodology involved using the OpenDMAP semantic parser with manually written rules. The original OpenDMAP system was updated for this challenge with a broad ontology defined for the events of interest, new linguistic patterns for those events, and specialized coordination handling. We achieved state‐of‐the‐art precision for two of the three tasks, scoring the highest of 24 teams at precision of 71.81 on Task 1 and the highest of 6 teams at precision of 70.97 on Task 2. We provide a detailed analysis of the training data and show that a number of trigger words were ambiguous as to event type, even when their arguments are constrained by semantic class. The data is also shown to have a number of missing annotations. Analysis of a sampling of the comparatively small number of false positives returned by our system shows that major causes of this type of error were failing to recognize second themes in two‐theme events, failing to recognize events when they were the arguments to other events, failure to recognize nontheme arguments, and sentence segmentation errors. We show that specifically handling coordination had a small but important impact on the overall performance of the system. The OpenDMAP system and the rule set are available at http://bionlp.sourceforge.net.