

Publication


Featured research published by Kevin J. Mitchell.


Journal of the American Medical Informatics Association | 2010

caTIES: a grid based system for coding and retrieval of surgical pathology reports and tissue specimens in support of translational research.

Rebecca S. Crowley; Melissa Castine; Kevin J. Mitchell; Girish Chavan; Tara McSherry; Michael Feldman

The authors report on the development of the Cancer Tissue Information Extraction System (caTIES), an application that supports collaborative tissue banking and text mining by leveraging existing natural language processing methods and algorithms, grid communication and security frameworks, and query visualization methods. The system fills an important need for text-derived clinical data in translational research, such as tissue banking and clinical trials. The design of caTIES addresses three critical issues for informatics support of translational research: (1) federation of research data sources derived from clinical systems; (2) expressive graphical interfaces for concept-based text mining; and (3) a regulatory and security model for supporting multi-center collaborative research. Implementation of the system at several cancer centers across the country is creating a potential network of caTIES repositories that could provide millions of de-identified clinical reports to users. The system provides an end-to-end application of medical natural language processing to support multi-institutional translational research programs.


BMC Bioinformatics | 2016

NOBLE – Flexible concept recognition for large-scale biomedical natural language processing

Eugene Tseytlin; Kevin J. Mitchell; Elizabeth Legowski; Julia Corrigan; Girish Chavan; Rebecca S. Jacobson

Background: Natural language processing (NLP) applications are increasingly important in biomedical data analysis, knowledge engineering, and decision support. Concept recognition is an important component task for NLP pipelines, and can be either general-purpose or domain-specific. We describe a novel, flexible, and general-purpose concept recognition component for NLP pipelines, and compare its speed and accuracy against five commonly used alternatives on both a biological and a clinical corpus. NOBLE Coder implements a general algorithm for matching terms to concepts from an arbitrary vocabulary set. The system's matching options can be configured individually or in combination to yield specific system behavior for a variety of NLP tasks. The software is open source, freely available, and easily integrated into UIMA or GATE. We benchmarked the speed and accuracy of the system against the CRAFT and ShARe corpora as reference standards and compared it to MMTx, MGrep, Concept Mapper, cTAKES Dictionary Lookup Annotator, and cTAKES Fast Dictionary Lookup Annotator.

Results: We describe key advantages of the NOBLE Coder system and associated tools, including its greedy algorithm, configurable matching strategies, and multiple terminology input formats. These features provide unique functionality when compared with existing alternatives, including state-of-the-art systems. On two benchmarking tasks, NOBLE's performance exceeded commonly used alternatives, performing almost as well as the most advanced systems. Error analysis revealed differences in error profiles among systems.

Conclusion: NOBLE Coder is comparable to other widely used concept recognition systems in terms of accuracy and speed. Advantages of NOBLE Coder include its interactive terminology builder tool, ease of configuration, and adaptability to various domains and tasks. NOBLE provides a term-to-concept matching system suitable for general concept recognition in biomedical NLP pipelines.
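The greedy term-to-concept matching the abstract highlights can be illustrated with a short sketch. This is a hypothetical simplification, not NOBLE Coder's actual code; the vocabulary, concept identifiers, and longest-match-first policy are illustrative assumptions.

```python
# Greedy longest-match lookup of phrases against a term-to-concept vocabulary.
# Illustrative only: NOBLE Coder's real algorithm supports multiple configurable
# matching strategies; this sketch shows the basic greedy idea.

def greedy_concept_match(tokens, vocabulary, max_len=5):
    """Scan left to right, always consuming the longest phrase found in the vocabulary."""
    matches = []
    i = 0
    while i < len(tokens):
        found = None
        # Try the longest candidate span first (greedy).
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in vocabulary:
                found = (i, i + n, vocabulary[phrase])
                break
        if found:
            matches.append(found)
            i = found[1]  # resume after the matched span
        else:
            i += 1
    return matches

# Example with two overlapping vocabulary entries; the longer match wins.
vocab = {"lung cancer": "C0242379", "cancer": "C0006826"}
print(greedy_concept_match("patient with lung cancer confirmed".split(), vocab))
# [(2, 4, 'C0242379')]
```

Greedy matching trades exhaustiveness for speed, which matters when coding millions of reports.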


PLOS ONE | 2014

Feature Engineering and a Proposed Decision-Support System for Systematic Reviewers of Medical Evidence

Tanja Bekhuis; Eugene Tseytlin; Kevin J. Mitchell; Dina Demner-Fushman

Objectives: Evidence-based medicine depends on the timely synthesis of research findings. An important source of synthesized evidence resides in systematic reviews. However, a bottleneck in review production involves dual screening of citations with titles and abstracts to find eligible studies. For this research, we tested the effect of various kinds of textual information (features) on the performance of a machine learning classifier. Based on our findings, we propose an automated system to reduce screening burden, as well as offer quality assurance.

Methods: We built a database of citations from 5 systematic reviews that varied with respect to domain, topic, and sponsor. Consensus judgments regarding eligibility were inferred from published reports. We extracted 5 feature sets from citations: alphabetic, alphanumeric+, indexing, features mapped to concepts in systematic reviews, and topic models. To simulate a two-person team, we divided the data into random halves. We optimized the parameters of a Bayesian classifier, then trained and tested models on alternate data halves. Overall, we conducted 50 independent tests.

Results: All tests of summary performance (mean F3) surpassed the corresponding baseline, P<0.0001. The ranks for mean F3, precision, and classification error were statistically different across feature sets averaged over reviews; P-values for Friedman's test were .045, .002, and .002, respectively. Differences in ranks for mean recall were not statistically significant. Alphanumeric+ features were associated with the best performance; the mean reduction in screening burden for this feature type ranged from 88% to 98% for the second pass through citations and from 38% to 48% overall.

Conclusions: A computer-assisted decision support system based on our methods could substantially reduce the burden of screening citations for systematic review teams and solo reviewers. Additionally, such a system could deliver quality assurance, both by confirming concordant decisions and by naming studies associated with discordant decisions for further consideration.
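The F3 summary measure reported above is the general F-beta score with beta = 3, which weights recall nine times as heavily as precision (beta squared = 9), reflecting that missing an eligible citation is costlier than re-reading an ineligible one. A minimal reference implementation:

```python
def f_beta(precision, recall, beta=3.0):
    """F-beta score: (1 + b^2) * P * R / (b^2 * P + R).
    beta=3 (as in the study's mean F3) strongly favors recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# A classifier with modest precision but high recall still scores well under F3.
print(round(f_beta(0.5, 0.9), 3))  # 0.833
```

Under plain F1 the same classifier would score only about 0.64, which is why F3 is the better summary statistic for screening tasks.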


Cancer Research | 2015

A Federated Network for Translational Cancer Research Using Clinical Data and Biospecimens

Rebecca S. Jacobson; Michael J. Becich; Roni J. Bollag; Girish Chavan; Julia Corrigan; Rajiv Dhir; Michael Feldman; Carmelo Gaudioso; Elizabeth Legowski; Nita J. Maihle; Kevin J. Mitchell; Monica Murphy; Mayurapriyan Sakthivel; Eugene Tseytlin; JoEllen Weaver

Advances in cancer research and personalized medicine will require significant new bridging infrastructures, including more robust biorepositories that link human tissue to clinical phenotypes and outcomes. In order to meet that challenge, four cancer centers formed the Text Information Extraction System (TIES) Cancer Research Network, a federated network that facilitates data and biospecimen sharing among member institutions. Member sites can access pathology data that are de-identified and processed with the TIES natural language processing system, which creates a repository of rich phenotype data linked to clinical biospecimens. TIES incorporates multiple security and privacy best practices that, combined with legal agreements, network policies, and procedures, enable regulatory compliance. The TIES Cancer Research Network now provides integrated access to investigators at all member institutions, where multiple investigator-driven pilot projects are underway. Examples of federated search across the network illustrate the potential impact on translational research, particularly for studies involving rare cancers, rare phenotypes, and specific biologic behaviors. The network satisfies several key desiderata including local control of data and credentialing, inclusion of rich phenotype information, and applicability to diverse research objectives. The TIES Cancer Research Network presents a model for a national data and biospecimen network.


Journal of Biomedical Informatics | 2017

Automated annotation and classification of BI-RADS assessment from radiology reports

Sergio M. Castro; Eugene Tseytlin; Olga Medvedeva; Kevin J. Mitchell; Shyam Visweswaran; Tanja Bekhuis; Rebecca S. Jacobson

The Breast Imaging Reporting and Data System (BI-RADS) was developed to reduce variation in the descriptions of findings. Manual analysis of breast radiology report data is challenging but is necessary for clinical and healthcare quality assurance activities. The objective of this study is to develop a natural language processing (NLP) system for automated extraction of BI-RADS categories from breast radiology reports. We evaluated an existing rule-based NLP algorithm, and then we developed and evaluated our own method using a supervised machine learning approach. We divided the BI-RADS category extraction task into two specific tasks: (1) annotation of all BI-RADS category values within a report, and (2) classification of the laterality of each BI-RADS category value. We used one algorithm for task 1 and evaluated three algorithms for task 2. Across all evaluations and model training, we used a total of 2159 radiology reports from 18 hospitals, from 2003 to 2015. Performance with the existing rule-based algorithm was not satisfactory. Conditional random fields showed high performance for task 1, with an F-1 measure of 0.95. The rules from partial decision trees (PART) algorithm showed the best performance across classes for task 2, with a weighted F-1 measure of 0.91 for BI-RADS 0-6 and 0.93 for BI-RADS 3-5. Classification performance by class showed that performance improved for all classes from Naïve Bayes to Support Vector Machine (SVM), and again from SVM to PART. Our system is able to annotate and classify all BI-RADS mentions present in a single radiology report and can serve as the foundation for future studies that will leverage automated BI-RADS annotation to provide feedback to radiologists as part of a learning health system loop.
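To give a sense of task 1 (annotating BI-RADS category values), here is a toy regex baseline. The study found rule-based extraction unsatisfactory and used conditional random fields instead; this pattern is purely illustrative and far simpler than the evaluated algorithms.

```python
import re

# Toy pattern for BI-RADS category mentions: handles "BI-RADS 4",
# "BIRADS 4", and "BI-RADS Category 4". Real reports are far messier.
BIRADS_PATTERN = re.compile(r"\bBI-?RADS\s*(?:category\s*)?([0-6])\b", re.IGNORECASE)

def annotate_birads(report_text):
    """Return (character offset, category value) pairs for each mention."""
    return [(m.start(), m.group(1)) for m in BIRADS_PATTERN.finditer(report_text)]

text = "RIGHT BREAST: BI-RADS Category 2. LEFT BREAST: BIRADS 4."
print(annotate_birads(text))  # [(14, '2'), (47, '4')]
```

Linking each extracted value to "RIGHT" or "LEFT" context is task 2, which is why the authors treated laterality as a separate classification problem.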


Bioinformatics and Biomedicine | 2015

A prototype for a hybrid system to support systematic review teams: A case study of organ transplantation

Tanja Bekhuis; Eugene Tseytlin; Kevin J. Mitchell

We describe a prototype for a hybrid system designed to reduce the number of citations needed to re-screen (NNRS) by systematic reviewers, where citations include titles, abstracts, and metadata. The system obviates the need for screening the entire set of citations a second time, which is typically done to control human error. The reference set is based on a complex review about organ transplantation (N=10,796 citations). Data were split into 50% training and test sets, randomly stratified for the percentage of eligible citations. The system consists of a rule-based module and a machine-learning (ML) module. The former substantially reduces the number of negative citations passed to the ML module and improves class imbalance. Relative to the baseline, the system reduces classification error (5.6% vs 2.9%), thereby reducing NNRS by 47.3% (300 vs 158). We discuss the implications of de-emphasizing sensitivity (recall) in favor of specificity and negative predictive value to reduce screening burden.
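The two-stage design can be sketched as rules first, classifier second. The rule and classifier below are placeholders, not the study's actual modules, and the final arithmetic simply reproduces the reported NNRS reduction.

```python
# Hypothetical two-stage screening pipeline in the spirit of the prototype:
# cheap eligibility rules discard obvious negatives, then an ML classifier
# labels the survivors. Both components here are placeholders.

def rule_filter(citation):
    """Stage 1: keep only citations that pass a trivial topical rule."""
    return "transplant" in citation.lower()

def hybrid_screen(citations, ml_classifier):
    survivors = [c for c in citations if rule_filter(c)]   # fewer negatives, better balance
    return [c for c in survivors if ml_classifier(c)]      # stage 2: ML decision

# Reported effect: cutting classification error from 5.6% to 2.9%
# reduced the number needed to re-screen from 300 to 158 citations.
print(f"NNRS reduced by {(300 - 158) / 300:.1%}")  # NNRS reduced by 47.3%
```

Filtering before classification also mitigates class imbalance, since most discarded citations are negatives.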


Studies in Health Technology and Informatics | 2004

Implementation and evaluation of a negation tagger in a pipeline-based system for information extraction from pathology reports

Kevin J. Mitchell; Michael J. Becich; Jules J. Berman; Wendy W. Chapman; John Gilbertson; Dilip Gupta; James Harrison; Elizabeth Legowski; Rebecca S. Crowley

We have developed a pipeline-based system for automated annotation of surgical pathology reports with UMLS terms that builds on GATE, an open-source architecture for language engineering. The system includes a module for detecting and annotating negated concepts, which implements the NegEx algorithm, originally described for use in discharge summaries and radiology reports. We describe the implementation of the system and an early evaluation of the Negation Tagger. Our results are encouraging. In the key Final Diagnosis section, with almost no modification of the algorithm or phrase lists, the system performs with a precision of 0.84 and a recall of 0.80 against a gold-standard corpus of negation annotations created by a modified Delphi technique by a panel of pathologists. Further work will focus on refining the Negation Tagger and UMLS Tagger and on adding processing resources for annotating free-text pathology reports.
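The NegEx idea the system implements (flag a concept as negated when a negation phrase precedes it within a small window) can be sketched as follows. The trigger list and window size are illustrative assumptions; the published NegEx algorithm is more elaborate, handling pseudo-negation phrases and scope termination.

```python
# Toy NegEx-style check: a concept is negated if a trigger phrase occurs
# within the few tokens before it. Illustrative only.
NEGATION_TRIGGERS = ["no", "denies", "without", "negative for", "no evidence of"]

def is_negated(tokens, concept_index, window=5):
    """Return True if a negation trigger appears in the window before the concept."""
    start = max(0, concept_index - window)
    # Pad with spaces so triggers match whole words, not substrings.
    context = " " + " ".join(tokens[start:concept_index]).lower() + " "
    return any(f" {trigger} " in context for trigger in NEGATION_TRIGGERS)

tokens = "final diagnosis : no evidence of malignancy".split()
print(is_negated(tokens, tokens.index("malignancy")))  # True
```

Whole-word matching matters here: a substring check would falsely find "no" inside words like "diagnosis".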




American Medical Informatics Association Annual Symposium | 2005

Automating tissue bank annotation from pathology reports - comparison to a gold standard expert annotation set.

Kaihong Liu; Kevin J. Mitchell; Wendy W. Chapman; Rebecca S. Crowley


Methods of Information in Medicine | 2013

Formative Evaluation of Ontology Learning Methods for Entity Discovery by Using Existing Ontologies as Reference Standards

Kaihong Liu; Kevin J. Mitchell; Wendy W. Chapman; Guergana Savova; N. Sioutos; Daniel L. Rubin; Rebecca S. Crowley

Collaboration

Top co-authors of Kevin J. Mitchell:

Dilip Gupta (University of Pittsburgh)
Girish Chavan (University of Pittsburgh)
Tanja Bekhuis (University of Pittsburgh)
James Harrison (University of Pittsburgh)