Publication


Featured research published by Amber Stubbs.


Journal of Biomedical Informatics | 2015

Automated systems for the de-identification of longitudinal clinical narratives

Amber Stubbs; Christopher Kotfila; Özlem Uzuner

The 2014 i2b2/UTHealth Natural Language Processing (NLP) shared task featured four tracks. The first of these was the de-identification track focused on identifying protected health information (PHI) in longitudinal clinical narratives. The longitudinal nature of clinical narratives calls particular attention to details of information that, while benign on their own in separate records, can lead to identification of patients in combination in longitudinal records. Accordingly, the 2014 de-identification track addressed a broader set of entities and PHI than covered by the Health Insurance Portability and Accountability Act - the focus of the de-identification shared task that was organized in 2006. Ten teams tackled the 2014 de-identification task and submitted 22 system outputs for evaluation. Each team was evaluated on their best performing system output. Three of the 10 systems achieved F1 scores over .90, and seven of the top 10 scored over .75. The most successful systems combined conditional random fields and hand-written rules. Our findings indicate that automated systems can be very effective for this task, but that de-identification is not yet a solved problem.


Journal of Biomedical Informatics | 2015

Identifying risk factors for heart disease over time

Amber Stubbs; Christopher Kotfila; Hua Xu; Özlem Uzuner

The second track of the 2014 i2b2/UTHealth natural language processing shared task focused on identifying medical risk factors related to Coronary Artery Disease (CAD) in the narratives of longitudinal medical records of diabetic patients. The risk factors included hypertension, hyperlipidemia, obesity, smoking status, and family history, as well as diabetes and CAD, and indicators that suggest the presence of those diseases. In addition to identifying the risk factors, this track of the 2014 i2b2/UTHealth shared task studied the presence and progression of the risk factors in longitudinal medical records. Twenty teams participated in this track, and submitted 49 system runs for evaluation. Six of the top 10 teams achieved F1 scores over 0.90, and all 10 scored over 0.87. The most successful system used a combination of additional annotations, external lexicons, hand-written rules and Support Vector Machines. The results of this track indicate that identification of risk factors and their progression over time is well within the reach of automated systems.


Journal of Biomedical Informatics | 2015

Annotating longitudinal clinical narratives for de-identification

Amber Stubbs; Özlem Uzuner

The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on the de-identification of longitudinal medical records. For this track, we de-identified a set of 1304 longitudinal medical records describing 296 patients. This corpus was de-identified under a broad interpretation of the HIPAA guidelines using double-annotation followed by arbitration, rounds of sanity checking, and proofreading. The average token-based F1 measure for the annotators compared to the gold standard was 0.927. The resulting annotations were used both to de-identify the data and to set the gold standard for the de-identification track of the 2014 i2b2/UTHealth shared task. All annotated private health information was replaced with realistic surrogates automatically and then read over and corrected manually. The resulting corpus is the first of its kind made available for de-identification research. This corpus was first used for the 2014 i2b2/UTHealth shared task, during which the systems achieved a mean F-measure of 0.872 and a maximum F-measure of 0.964 using entity-based micro-averaged evaluations.
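To make the reported metric concrete, here is a minimal sketch of an entity-based micro-averaged F-measure. It assumes that an entity counts as correct only when its start offset, end offset, and PHI type all match the gold standard exactly; the function name and the tuple layout are illustrative, and the shared task's own evaluation script may apply different matching rules.

```python
from typing import Sequence, Set, Tuple

# Hypothetical entity representation: (start offset, end offset, PHI type).
Entity = Tuple[int, int, str]


def micro_f1(gold_docs: Sequence[Set[Entity]],
             system_docs: Sequence[Set[Entity]]) -> float:
    """Micro-averaged F1: pool counts over all documents, then compute P, R, F1."""
    tp = fp = fn = 0
    for gold, system in zip(gold_docs, system_docs):
        tp += len(gold & system)   # entities found with exact span and type
        fp += len(system - gold)   # spurious system entities
        fn += len(gold - system)   # gold entities the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [{(0, 5, "NAME"), (10, 14, "DATE")}, {(3, 8, "HOSPITAL")}]
    system = [{(0, 5, "NAME")}, {(3, 8, "HOSPITAL"), (20, 25, "DATE")}]
    print(round(micro_f1(gold, system), 3))
```

Because counts are pooled before precision and recall are computed, documents with many PHI entities weigh more heavily than documents with few, which is the defining property of micro-averaging.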


Journal of Biomedical Informatics | 2015

Creation of a new longitudinal corpus of clinical narratives

Vishesh Kumar; Amber Stubbs; Stanley Y. Shaw; Özlem Uzuner

The 2014 i2b2/UTHealth Natural Language Processing (NLP) shared task featured a new longitudinal corpus of 1304 records representing 296 diabetic patients. The corpus contains three cohorts: patients who have a diagnosis of coronary artery disease (CAD) in their first record, and continue to have it in subsequent records; patients who do not have a diagnosis of CAD in the first record, but develop it by the last record; patients who do not have a diagnosis of CAD in any record. This paper details the process used to select records for this corpus and provides an overview of novel research uses for this corpus. This corpus is the only annotated corpus of longitudinal clinical narratives currently available for research to the general research community.


Linguistic Annotation Workshop | 2007

Combining Independent Syntactic and Semantic Annotation Schemes

Marc Verhagen; Amber Stubbs; James Pustejovsky

We present MAIS, a UIMA-based environment for combining information from various annotated resources. Each resource contains one mode of linguistic annotation and remains independent from the other resources. Interactions between annotations are defined based on use cases.


Journal of Biomedical Informatics | 2015

Practical applications for natural language processing in clinical research

Özlem Uzuner; Amber Stubbs

Capstone shared task for 8 years of i2b2 challenges. Co-organized with UTHealth. Four tracks: de-identification, risk factor extraction, software usability, and novel data use. Participation from around the world, from academia and industry. Data sets available for research beyond the lifetime of i2b2, at i2b2.org/NLP.


Journal of Biomedical Informatics | 2017

Symptom severity prediction from neuropsychiatric clinical records: Overview of 2016 CEGS N-GRID Shared Tasks Track 2.

Michele Filannino; Amber Stubbs; Özlem Uzuner

The second track of the CEGS N-GRID 2016 natural language processing shared tasks focused on predicting symptom severity from neuropsychiatric clinical records. For the first time, initial psychiatric evaluation records have been collected, de-identified, annotated and shared with the scientific community. One hundred ten researchers organized in twenty-four teams participated in this track and submitted sixty-five system runs for evaluation. The top ten teams each achieved an inverse normalized macro-averaged mean absolute error score over 0.80. The top performing system employed an ensemble of six different machine learning-based classifiers to achieve a score of 0.86. The task proved to be generally easy, with the exception of two specific classes of records: records with very few but crucial positive valence signals, and records describing patients predominantly affected by negative rather than positive valence. Those cases proved to be very challenging for most of the systems. Further research is required to consider the task solved. Overall, the results of this track demonstrate the effectiveness of data-driven approaches to the task of symptom severity classification.
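The abstract does not define the inverse normalized macro-averaged mean absolute error, so the following is only one plausible formulation, offered as a sketch. It assumes ordinal severity labels 0 to 3 (an assumption, not stated above), computes the mean absolute error separately for each gold class, divides it by the largest error possible for that class, averages across classes, and subtracts the result from 1 so that higher is better.

```python
from collections import defaultdict
from typing import Sequence


def inverse_normalized_mmae(gold: Sequence[int],
                            predicted: Sequence[int],
                            labels: Sequence[int] = (0, 1, 2, 3)) -> float:
    """Return 1 minus the normalized macro-averaged MAE (1.0 = perfect)."""
    lo, hi = min(labels), max(labels)
    errors = defaultdict(list)
    for g, p in zip(gold, predicted):
        errors[g].append(abs(g - p))

    per_class = []
    for label in labels:
        if not errors[label]:
            continue  # skip classes that never occur in the gold standard
        worst = max(label - lo, hi - label)  # largest possible error for this class
        mae = sum(errors[label]) / len(errors[label])
        per_class.append(mae / worst)

    if not per_class:
        raise ValueError("gold contains none of the expected labels")
    return 1.0 - sum(per_class) / len(per_class)


if __name__ == "__main__":
    print(inverse_normalized_mmae([0, 0, 3, 3], [0, 1, 3, 1]))
```

Macro-averaging over gold classes keeps a rare severity level from being drowned out by a frequent one, which matters when most records fall into a single class.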


Journal of Biomedical Informatics | 2013

Editorial: Chronology of your health events: Approaches to extracting temporal relations from medical narratives

Özlem Uzuner; Amber Stubbs; Weiyi Sun

Participants in the 2012 i2b2 Shared-Task and Workshop on Challenges in Natural Language Processing for Clinical Data created a variety of systems for processing temporal relations in clinical records. The different ways of conceptualizing the shared-task tracks reflect the complexity of temporal analysis of narratives even for humans, and the use of hybrid systems, world knowledge, and other sources of linguistic information reflects the difficulty of formulating temporal analysis for automated methods. Despite their promising results, and significant advancement on the state of the art in temporal relations in medical records, the 2012 i2b2 challenge systems only scratched the surface in this task. Open questions remain about the applicability of the developed systems to real-life practical questions, such as the determination of the progression of diseases in patients, for example, heart disease in diabetic populations. Nevertheless, the 2012 i2b2 Challenge corpus of temporal annotations remains a valuable asset to the medical NLP community, and we hope it will serve as the basis for further innovation in temporal relations, resulting in systems that can be applied to real-life clinical problems.


Medical Data Privacy Handbook | 2015

Challenges in Synthesizing Surrogate PHI in Narrative EMRs

Amber Stubbs; Özlem Uzuner; Christopher Kotfila; Ira Goldstein; Peter Szolovits

Preparing narrative medical records for use outside of their originating institutions requires that protected health information (PHI) be removed from the records. If researchers intend to use these records for natural language processing, then preparing the medical documents requires two steps: (1) identifying the PHI and (2) replacing the PHI with realistic surrogates. In this chapter we discuss the challenges associated with generating these realistic surrogates and describe the algorithms we used to prepare the 2014 i2b2/UTHealth shared task corpus for distribution and use in a natural language processing task focused on de-identification.
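The second step, replacing identified PHI with surrogates, can be illustrated with a minimal sketch. This is not the chapter's algorithm: the surrogate lists, the span format, and the function name are all hypothetical, and realistic surrogate generation has to handle many cases this toy version ignores (dates, ages, name variants, and so on). The sketch only shows two basic ideas: replacing spans from right to left so earlier offsets stay valid, and mapping repeated mentions of the same string to the same surrogate.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical surrogate pools, for illustration only.
SURROGATES: Dict[str, List[str]] = {
    "NAME": ["Alvarez", "Okafor", "Lindqvist"],
    "HOSPITAL": ["Riverbend Clinic", "Northgate Medical Center"],
}

Span = Tuple[int, int, str]  # (start offset, end offset, PHI type)


def replace_phi(text: str, spans: List[Span], seed: int = 0) -> str:
    """Replace each annotated PHI span with a surrogate of the same type."""
    rng = random.Random(seed)
    mapping: Dict[Tuple[str, str], str] = {}
    out = text
    # Work from the end of the text so that earlier offsets remain valid.
    for start, end, phi_type in sorted(spans, reverse=True):
        key = (phi_type, text[start:end])
        if key not in mapping:
            mapping[key] = rng.choice(SURROGATES[phi_type])
        out = out[:start] + mapping[key] + out[end:]
    return out


if __name__ == "__main__":
    note = "Mr. Smith saw Dr. Jones. Smith returned."
    phi = [(4, 9, "NAME"), (18, 23, "NAME"), (25, 30, "NAME")]
    print(replace_phi(note, phi))
```

Keeping a per-document mapping preserves co-reference: both mentions of the same patient name receive the same surrogate, so the narrative stays readable for downstream natural language processing.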


Handbook of Linguistic Annotation | 2017

De-identification of Medical Records Through Annotation

Amber Stubbs; Özlem Uzuner

Before medical records can be shared outside of a hospital or medical group, all of the information that identifies the patient (called protected health information, or PHI) must be removed. In this paper, we examine different methodologies for performing de-identification annotation in order to determine which is most effective at ensuring that all identifying information is removed. We used serial (i.e., multiple annotators working in succession) and parallel (i.e., multiple annotators working independently) annotation paradigms on two different corpora, one unannotated and the other pre-annotated for PHI. Our evaluation revealed that neither annotation paradigm was superior to the other, regardless of whether the corpus was pre-annotated or unannotated.

Collaboration


Dive into Amber Stubbs's collaborations.

Top Co-Authors

Hua Xu, University of Texas Health Science Center at Houston

Anna Rumshisky, University of Massachusetts Lowell

Kai Zheng, University of Michigan

Anupama E. Gururaj, University of Texas at Austin