James D. Buntrock
Mayo Clinic
Publication
Featured research published by James D. Buntrock.
Journal of Biomedical Informatics | 2005
Sergey V. Pakhomov; James D. Buntrock; Christopher G. Chute
This paper addresses a very specific problem of identifying patients diagnosed with a specific condition for potential recruitment in a clinical trial or an epidemiological study. We present a simple machine learning method for identifying patients diagnosed with congestive heart failure and other related conditions by automatically classifying clinical notes dictated at Mayo Clinic. This method relies on an automatic classifier trained on comparable amounts of positive and negative samples of clinical notes previously categorized by human experts. The documents are represented as feature vectors, where features are a mix of demographic information as well as single words and concept mappings to MeSH and HICDA classification systems. We compare two simple and efficient classification algorithms (Naïve Bayes and Perceptron) and a baseline term spotting method with respect to their accuracy and recall on positive samples. Depending on the test set, we find that Naïve Bayes yields better recall on positive samples (95 vs. 86%) but worse accuracy than Perceptron (57 vs. 65%). Both algorithms perform better than the baseline with recall on positive samples of 71% and accuracy of 54%.
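The comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy notes, the whitespace tokenizer, and the class names `NaiveBayes` and `Perceptron` are assumptions made for illustration, and features are single words only (no demographic data or MeSH/HICDA concept mappings).

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens stand in for the paper's features.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over single words."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)
        self.n_docs = len(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, doc):
        best_class, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.priors[c] / self.n_docs)
            for w in tokenize(doc):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log(
                        (self.word_counts[c][w] + 1) / (total + len(self.vocab))
                    )
            if score > best_score:
                best_class, best_score = c, score
        return best_class


class Perceptron:
    """Binary perceptron over word-presence counts; labels are 1 or 0."""

    def __init__(self, epochs=10):
        self.epochs = epochs
        self.weights = {}
        self.bias = 0.0

    def fit(self, docs, labels):
        for _ in range(self.epochs):
            for doc, label in zip(docs, labels):
                if self.predict(doc) != label:
                    # Move the weights toward the correct side on a mistake.
                    delta = 1.0 if label == 1 else -1.0
                    for w in tokenize(doc):
                        self.weights[w] = self.weights.get(w, 0.0) + delta
                    self.bias += delta
        return self

    def predict(self, doc):
        score = self.bias + sum(self.weights.get(w, 0.0) for w in tokenize(doc))
        return 1 if score > 0 else 0


def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall on positive samples)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    positive_preds = [p for t, p in zip(y_true, y_pred) if t == 1]
    recall = sum(p == 1 for p in positive_preds) / len(positive_preds) if positive_preds else 0.0
    return correct / len(y_true), recall


# Hypothetical toy notes: 1 = congestive heart failure, 0 = other.
train_docs = [
    "patient with congestive heart failure and edema",
    "chronic heart failure with shortness of breath",
    "routine visit for seasonal allergies",
    "knee pain after a fall no cardiac complaints",
]
train_labels = [1, 1, 0, 0]
test_docs = ["heart failure with edema", "seasonal allergies visit"]
test_labels = [1, 0]

for model in (NaiveBayes(), Perceptron()):
    model.fit(train_docs, train_labels)
    preds = [model.predict(d) for d in test_docs]
    print(type(model).__name__, accuracy_and_recall(test_labels, preds))
```

On data this small both classifiers separate the classes trivially; the trade-off between recall and accuracy reported in the abstract only appears on realistic, noisy notes.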
Journal of the American Medical Informatics Association | 2009
Jyotishman Pathak; Harold R. Solbrig; James D. Buntrock; Thomas M. Johnson; Christopher G. Chute
Many biomedical terminologies, classifications, and ontological resources, such as the NCI Thesaurus (NCIT), International Classification of Diseases (ICD), Systematized Nomenclature of Medicine (SNOMED), Current Procedural Terminology (CPT), and Gene Ontology (GO), have been developed and used to build a variety of IT applications in biology, biomedicine, and health care settings. However, virtually all these resources involve incompatible formats, are based on different modeling languages, and lack appropriate tooling and programming interfaces (APIs), all of which hinders their wide-scale adoption and usage in a variety of application contexts. The Lexical Grid (LexGrid) project introduced in this paper is an ongoing community-driven initiative, coordinated by the Mayo Clinic Division of Biomedical Statistics and Informatics, designed to bridge this gap using a common terminology model called the LexGrid model. The key aspect of the model is to accommodate multiple vocabulary and ontology distribution formats and to support multiple data stores for federated vocabulary distribution. The model provides a foundation for building consistent and standardized APIs to access multiple vocabularies that support lexical search queries, hierarchy navigation, and a rich set of features such as recursive subsumption (e.g., get all the children of the concept penicillin). Existing LexGrid implementations include the LexBIG API as well as a reference implementation of the HL7 Common Terminology Services (CTS) specification providing programmatic access via Java, Web, and Grid services.
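Recursive subsumption, as mentioned in this abstract, amounts to collecting every concept reachable through child links. The sketch below shows the idea on a plain dictionary; it does not use the LexGrid model or the LexBIG API, and the function name `descendants` and the toy hierarchy are assumptions for illustration only.

```python
def descendants(children, concept):
    """Return every concept below `concept` in a parent -> children mapping."""
    seen = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in seen:  # guards against revisiting shared children
                seen.add(child)
                stack.append(child)
    return seen


# Hypothetical toy hierarchy, not taken from any real terminology release.
TOY_HIERARCHY = {
    "penicillin": ["penicillin G", "penicillin V", "aminopenicillin"],
    "aminopenicillin": ["amoxicillin", "ampicillin"],
}

print(sorted(descendants(TOY_HIERARCHY, "penicillin")))
```

An iterative traversal with a `seen` set is used instead of plain recursion so that a concept with several parents is reported once and deep hierarchies do not hit the recursion limit.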
Meeting of the Association for Computational Linguistics | 2005
Serguei V. S. Pakhomov; James D. Buntrock; Patrick H. Duffy
This paper presents the results of the development of a high-throughput, real-time, modularized text analysis and information retrieval system that identifies clinically relevant entities in clinical notes, maps the entities to several standardized nomenclatures, and makes them available for subsequent information retrieval and data mining. The performance of the system was validated on a small collection of 351 documents partitioned into 4 query topics and manually examined by 3 physicians and 3 nurse abstractors for relevance to the query topics. We find that simple key phrase searching results in 73% recall and 77% precision. A combination of NLP approaches to indexing improves the recall to 92% while lowering the precision to 67%.
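The recall and precision figures in this abstract follow the usual retrieval definitions, which the sketch below computes from sets of document IDs. The IDs and the two example result sets are hypothetical; they only illustrate how a broader search can raise recall while lowering precision, and they are not the study's data.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given sets of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical relevance judgments for one query topic.
relevant_docs = {1, 2, 3, 4}
narrow_search = {1, 2}                    # finds little, but all of it is relevant
broad_search = {1, 2, 3, 4, 5, 6, 7, 8}   # finds everything, plus irrelevant documents

print("narrow:", precision_recall(narrow_search, relevant_docs))
print("broad: ", precision_recall(broad_search, relevant_docs))
```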
Studies in Health Technology and Informatics | 2004
Serguei V. S. Pakhomov; James D. Buntrock; Christopher G. Chute
Classification of diagnoses (a.k.a. coding) is the central part of current concept-based medical IR systems. Some classification systems contain over 30,000 distinct codes, which makes classifying clinical documents a time-consuming, labor-intensive, and error-prone process. This paper presents a simple methodology for cleaning up and reusing existing manually coded diagnostic statements, mainly extracted from clinical notes, to build predictive models using a sparse-feature implementation of a Naïve Bayes classifier. One of the problems addressed is that diagnostic statements often contain several diagnoses and are assigned several codes, resulting in a multi-class classification problem. We investigate one possible way of addressing this problem by introducing compound (multiple code) categories. We present experimental results of classifying >16,000 randomly selected diagnostic strings into 19 top level categories. A small improvement (3%) from using compound categories over simple categories indicates that using multiple code categories is a promising solution, although clearly in need of further research and refinement.
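One way to read the compound-category idea is that each distinct combination of codes becomes its own class label, so a statement with several codes still receives exactly one label. The sketch below shows that transformation only; the function name `compound_category`, the `+` separator, and the code strings are assumptions, not the paper's actual categories.

```python
from collections import Counter


def compound_category(codes):
    """Collapse the codes assigned to one statement into a single class label."""
    # Sorting makes the label independent of the order the codes were recorded in.
    return "+".join(sorted(set(codes)))


# Hypothetical diagnostic statements with made-up top-level category codes.
statements = [
    ("hypertension", ["CAT_B"]),
    ("diabetes with hypertension", ["CAT_A", "CAT_B"]),
    ("hypertension and diabetes", ["CAT_B", "CAT_A"]),
]

labels = [compound_category(codes) for _, codes in statements]
print(Counter(labels))
```

The resulting labels can be fed to any single-label classifier, at the cost of a larger and sparser label set.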
Meeting of the Association for Computational Linguistics | 2003
Sergey V. Pakhomov; James D. Buntrock; Christopher G. Chute
This paper addresses a very specific problem that happens to be common in health science research. We present a machine learning based method for identifying patients diagnosed with congestive heart failure and other related conditions by automatically classifying clinical notes. This method relies on a Perceptron neural network classifier trained on comparable amounts of positive and negative samples of clinical notes previously categorized by human experts. The documents are represented as feature vectors where features are a mix of single words and concept mappings to MeSH and HICDA ontologies. The method is designed and implemented to support a particular epidemiological study but has broader implications for clinical research. In this paper, we describe the method and present experimental classification results based on classification accuracy and positive predictive value.
Journal of the American Medical Informatics Association | 2008
Guergana Savova; Philip V. Ogren; Patrick H. Duffy; James D. Buntrock; Christopher G. Chute
American Medical Informatics Association Annual Symposium | 2001
Peter L. Elkin; Alexander Ruggieri; Steven H. Brown; James D. Buntrock; Brent A. Bauer; Dietlind L. Wahner-Roedler; Scott C. Litin; Julie Beinborn; Kent R. Bailey; Larry R. Bergstrom
Annual Symposium on Computer Application in Medical Care | 1995
Christopher G. Chute; D. L. Crowson; James D. Buntrock
American Medical Informatics Association Annual Symposium | 2006
Philip V. Ogren; Guergana Savova; James D. Buntrock; Christopher G. Chute
Annual Symposium on Computer Application in Medical Care | 1994
Christopher G. Chute; Yiming Yang; James D. Buntrock