Publication

Featured research published by Cheryl Clark.


International Journal of Medical Informatics | 2010

The MITRE Identification Scrubber Toolkit: Design, training, and assessment

John S. Aberdeen; Samuel Bayer; Reyyan Yeniterzi; Benjamin Wellner; Cheryl Clark; David A. Hanauer; Bradley Malin; Lynette Hirschman

PURPOSE Medical records must often be stripped of patient identifiers, or de-identified, before being shared. De-identification by humans is time-consuming, and existing software is limited in its generality. The open source MITRE Identification Scrubber Toolkit (MIST) provides an environment to support rapid tailoring of automated de-identification to different document types, using automatically learned classifiers to de-identify and protect sensitive information. METHODS MIST was evaluated with four classes of patient records from the Vanderbilt University Medical Center: discharge summaries, laboratory reports, letters, and order summaries. We trained and tested MIST on each class of record separately, as well as on pooled sets of records. We measured precision, recall, F-measure and accuracy at the word level for the detection of patient identifiers as designated by the HIPAA Safe Harbor Rule. RESULTS MIST was applied to medical records that differed in the amounts and types of protected health information (PHI): lab reports contained only two types of PHI (dates, names) compared to discharge summaries, which were much richer. Performance of the de-identification tool depended on record class; F-measure results were 0.996 for order summaries, 0.996 for discharge summaries, 0.943 for letters and 0.934 for laboratory reports. Experiments suggest the tool requires several hundred training exemplars to reach an F-measure of at least 0.9. CONCLUSIONS The MIST toolkit makes possible the rapid tailoring of automated de-identification to particular document types and supports the transition of the de-identification software to medical end users, avoiding the need for developers to have access to original medical records. We are making the MIST toolkit available under an open source license to encourage its application to diverse data sets at multiple institutions.
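The word-level precision, recall, and F-measure scores reported in this abstract follow the standard definitions over token-level PHI labels. A minimal sketch (the boolean token labels are hypothetical inputs, not data from the study):

```python
def prf(gold, pred):
    """Word-level precision, recall, and F-measure for PHI detection.

    gold, pred: sequences of booleans, one per token, True where the
    token is (or is predicted to be) protected health information.
    """
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```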


Journal of the American Medical Informatics Association | 2008

Identifying Smokers with a Medical Extraction System

Cheryl Clark; Kathleen Good; Lesley Jezierny; Melissa Macpherson; Brian Wilson; Urszula Chajewska

The Clinical Language Understanding group at Nuance Communications has developed a medical information extraction system that combines a rule-based extraction engine with machine learning algorithms to identify and categorize references to patient smoking in clinical reports. The extraction engine identifies smoking references; documents that contain no smoking references are classified as UNKNOWN. For the remaining documents, the extraction engine uses linguistic analysis to associate features such as status and time to smoking mentions. Machine learning is used to classify the documents based on these features. This approach shows overall accuracy in the 90s on all data sets used. Classification using engine-generated and word-based features outperforms classification using only word-based features for all data sets, although the difference gets smaller as the data set size increases. These techniques could be applied to identify other risk factors, such as drug and alcohol use, or a family history of a disease.
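The two-stage pipeline described in this abstract can be sketched as follows. This is a toy illustration, not Nuance's system: the regular expression and the placeholder label are hypothetical, and the real engine's linguistic analysis and ML classification over status/time features are stubbed out.

```python
import re

# Toy rule-based pass: find smoking mentions in a clinical report.
SMOKING = re.compile(r"\b(smok\w+|tobacco|cigarette\w*)\b", re.IGNORECASE)

def classify(report: str) -> str:
    """Documents with no smoking references are classified as UNKNOWN;
    the rest would go to an ML classifier over engine-generated features
    (stubbed here with a placeholder label)."""
    mentions = SMOKING.findall(report)
    if not mentions:
        return "UNKNOWN"
    return "SMOKING-MENTIONED"
```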


PLOS ONE | 2014

Negation’s Not Solved: Generalizability Versus Optimizability in Clinical Natural Language Processing

Stephen T. Wu; Timothy A. Miller; James J. Masanz; Matt Coarr; Scott R. Halgrim; David Carrell; Cheryl Clark

A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been “solved.” This work proposes that an optimizable solution does not equal a generalizable solution. We introduce a new machine learning-based Polarity Module for detecting negation in clinical text, and extensively compare its performance across domains. Using four manually annotated corpora of clinical text, we show that negation detection performance suffers when there is no in-domain development (for manual methods) or training data (for machine learning-based methods). Various factors (e.g., annotation guidelines, named entity characteristics, the amount of data, and lexical and syntactic context) play a role in making generalizability difficult, but none completely explains the phenomenon. Furthermore, generalizability remains challenging because it is unclear whether to use a single source for accurate data, combine all sources into a single model, or apply domain adaptation methods. The most reliable means to improve negation detection is to manually annotate in-domain training data (or, perhaps, manually modify rules); this is a strategy for optimizing performance, rather than generalizing it. These results suggest a direction for future work in domain-adaptive and task-adaptive methods for clinical NLP.
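For context, rule-based negation detectors of the kind this paper compares against often work by searching for a negation cue in a window before the concept. A minimal NegEx-style sketch (not the paper's Polarity Module; the cue list and window size are hypothetical):

```python
# Cues and window size chosen for illustration only.
NEG_CUES = {"no", "denies", "without", "negative for"}

def is_negated(sentence: str, concept: str) -> bool:
    """Return True if a negation cue appears in the 30 characters
    preceding the concept mention."""
    s = sentence.lower()
    idx = s.find(concept.lower())
    if idx == -1:
        return False
    window = s[max(0, idx - 30):idx]
    return any(cue in window for cue in NEG_CUES)
```

Substring matching like this is exactly the kind of brittle, domain-sensitive heuristic whose generalizability the paper calls into question.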


International Journal of Medical Informatics | 2013

Bootstrapping a de-identification system for narrative patient records: Cost-performance tradeoffs

David A. Hanauer; John S. Aberdeen; Samuel Bayer; Benjamin Wellner; Cheryl Clark; Kai Zheng; Lynette Hirschman

PURPOSE We describe an experiment to build a de-identification system for clinical records using the open source MITRE Identification Scrubber Toolkit (MIST). We quantify the human annotation effort needed to produce a system that de-identifies at high accuracy. METHODS Using two types of clinical records (history and physical notes, and social work notes), we iteratively built statistical de-identification models by annotating 10 notes, training a model, applying the model to another 10 notes, correcting the model's output, and training from the resulting larger set of annotated notes. This was repeated for 20 rounds of 10 notes each, and then an additional 6 rounds of 20 notes each, and a final round of 40 notes. At each stage, we measured precision, recall, and F-score, and compared these to the amount of annotation time needed to complete the round. RESULTS After the initial 10-note round (33min of annotation time) we achieved an F-score of 0.89. After just over 8h of annotation time (round 21) we achieved an F-score of 0.95. The number of annotation actions needed, as well as the time needed, decreased in later rounds as model performance improved. Accuracy on history and physical notes exceeded that of social work notes, suggesting that the wider variety and contexts for protected health information (PHI) in social work notes are more difficult to model. CONCLUSIONS It is possible, with modest effort, to build a functioning de-identification system de novo using the MIST framework. The resulting system achieved performance comparable to other high-performing de-identification systems.
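The annotation schedule described above (20 rounds of 10 notes, 6 rounds of 20, and a final round of 40) can be tallied directly:

```python
# Annotation schedule from the abstract: 20 rounds of 10 notes,
# then 6 rounds of 20, then a final round of 40.
rounds = [10] * 20 + [20] * 6 + [40]

cumulative = []
total = 0
for n in rounds:
    total += n
    cumulative.append(total)

print(len(rounds))      # 27 rounds in all
print(cumulative[20])   # notes annotated by round 21 (~8h): 220
print(cumulative[-1])   # total notes annotated: 360
```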


Journal of the American Medical Informatics Association | 2014

MedXN: an open source medication extraction and normalization tool for clinical text

Sunghwan Sohn; Cheryl Clark; Scott R. Halgrim; Sean P. Murphy; Christopher G. Chute; Hongfang Liu

OBJECTIVE We developed the Medication Extraction and Normalization (MedXN) system to extract comprehensive medication information and normalize it to the most appropriate RxNorm concept unique identifier (RxCUI) as specifically as possible. METHODS Medication descriptions in clinical notes were decomposed into medication name and attributes, which were separately extracted using RxNorm dictionary lookup and regular expressions. Then, each medication name and its attributes were combined together according to RxNorm convention to find the most appropriate RxNorm representation. To do this, we employed serialized hierarchical steps implemented in Apache's Unstructured Information Management Architecture (UIMA). We also performed synonym expansion, removed false medications, and employed inference rules to improve the medication extraction and normalization performance. RESULTS An evaluation on test data of 397 medication mentions showed F-measures of 0.975 for medication name and over 0.90 for most attributes. The RxCUI assignment produced F-measures of 0.932 for medication name and 0.864 for full medication information. Most false negative RxCUI assignments in full medication information are due to human assumption of missing attributes and medication names in the gold standard. CONCLUSIONS The MedXN system (http://sourceforge.net/projects/ohnlp/files/MedXN/) was able to extract comprehensive medication information with high accuracy and demonstrated good normalization capability to RxCUI as long as explicit evidence existed. More sophisticated inference rules might result in further improvements to specific RxCUI assignments for incomplete medication descriptions.
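The decomposition step described above — splitting a medication mention into a name plus attributes — can be illustrated with a toy pattern. The real MedXN uses RxNorm dictionary lookup inside Apache UIMA; the regular expression below is a hypothetical simplification for illustration only:

```python
import re

# Toy decomposition of a medication mention into name + attributes.
MED = re.compile(
    r"(?P<name>[A-Za-z]+)\s+"
    r"(?P<strength>\d+(?:\.\d+)?\s*mg)"
    r"(?:\s+(?P<form>tablet|capsule))?",
    re.IGNORECASE,
)

m = MED.search("aspirin 81 mg tablet daily")
print(m.group("name"), m.group("strength"), m.group("form"))
```

In the actual system each extracted part would then be recombined according to RxNorm convention to find the most specific matching RxCUI.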


Biomedical Informatics Insights | 2013

Analysis of Cross-Institutional Medication Description Patterns in Clinical Narratives

Sunghwan Sohn; Cheryl Clark; Scott R. Halgrim; Sean P. Murphy; Siddhartha Jonnalagadda; Kavishwar B. Wagholikar; Stephen T. Wu; Christopher G. Chute; Hongfang Liu

A large amount of medication information resides in the unstructured text found in electronic medical records, which requires advanced techniques to be properly mined. In clinical notes, medication information follows certain semantic patterns (eg, medication, dosage, frequency, and mode). Some medication descriptions contain additional word(s) between medication attributes. Therefore, it is essential to understand the semantic patterns as well as the patterns of the context interspersed among them (ie, context patterns) to effectively extract comprehensive medication information. In this paper we examined both semantic and context patterns, and compared those found in Mayo Clinic and i2b2 challenge data. We found that some variations exist between the institutions but the dominant patterns are common.


document recognition and retrieval | 2011

Online medical symbol recognition using a Tablet PC

Amlan Kundu; Qian Hu; Stanley Boykin; Cheryl Clark; Randy Fish; Stephen Jones; Stephen R. Moore

In this paper we describe a scheme to enhance the usability of a Tablet PC's handwriting recognition system by including medical symbols that are not a part of the Tablet PC's symbol library. The goal of this work is to make handwriting recognition more useful for medical professionals accustomed to using medical symbols in medical records. To demonstrate that this new symbol recognition module is robust and expandable, we report results on both a medical symbol set and an expanded symbol test set which includes selected mathematical symbols.


Journal of Biomedical Informatics | 2017

Automatic classification of RDoC positive valence severity with a neural network

Cheryl Clark; Ben Wellner; Rachel Davis; John S. Aberdeen; Lynette Hirschman

OBJECTIVE Our objective was to develop a machine learning-based system to determine the severity of Positive Valence symptoms for a patient, based on information included in their initial psychiatric evaluation. Severity was rated by experts on an ordinal scale of 0-3 as follows: 0 (absent=no symptoms), 1 (mild=modest significance), 2 (moderate=requires treatment), 3 (severe=causes substantial impairment). MATERIALS AND METHODS We treated the task of assigning Positive Valence severity as a text classification problem. During development, we experimented with regularized multinomial logistic regression classifiers, gradient boosted trees, and feedforward, fully-connected neural networks. We found both regularization and feature selection via mutual information to be very important in preventing models from overfitting the data. Our best configuration was a neural network with three fully connected hidden layers with rectified linear unit activations. RESULTS Our best performing system achieved a score of 77.86%. The evaluation metric is an inverse normalization of the Mean Absolute Error presented as a percentage number between 0 and 100, where 100 means the highest performance. Error analysis showed that 90% of the system errors involved neighboring severity categories. CONCLUSION Machine learning text classification techniques with feature selection can be trained to recognize broad differences in Positive Valence symptom severity with a modest amount of training data (in this case 600 documents, 167 of which were unannotated). An increase in the amount of annotated data can increase accuracy of symptom severity classification by several percentage points. Additional features and/or a larger training corpus may further improve accuracy.
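One plausible reading of the inverse-normalized Mean Absolute Error metric described above is sketched below; the exact normalization used in the evaluation may differ, so treat the formula as an assumption:

```python
def inverse_normalized_mae(gold, pred, max_severity=3):
    """Mean absolute error on the 0-3 ordinal severity scale,
    inverted and scaled to 0-100 so that 100 means perfect agreement.
    (Assumed formulation; the evaluation's exact normalization may differ.)"""
    mae = sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)
    return (1 - mae / max_severity) * 100
```

Under this reading, predicting a neighboring severity category everywhere would still score well, which is consistent with 90% of errors involving neighboring categories.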


Omics A Journal of Integrative Biology | 2008

Habitat-lite: A GSC case study based on free text terms for environmental metadata

Lynette Hirschman; Cheryl Clark; K. Bretonnel Cohen; Scott A. Mardis; Joanne S. Luciano; Renzo Kottmann; James R. Cole; Victor Markowitz; Nikos C. Kyrpides; Norman Morrison; Lynn M. Schriml; Dawn Field


Journal of the American Medical Informatics Association | 2011

MITRE system for clinical assertion status classification

Cheryl Clark; John S. Aberdeen; Matthew Coarr; David Tresner-Kirsch; Ben Wellner; Alexander S. Yeh; Lynette Hirschman

Collaboration


Dive into Cheryl Clark's collaborations.

Top Co-Authors
Scott R. Halgrim

Group Health Research Institute
