
Publication


Featured research published by Doug Redd.


Journal of Biomedical Informatics | 2015

Regular expression-based learning to extract bodyweight values from clinical notes

Maureen A. Murtaugh; Bryan Gibson; Doug Redd; Qing Zeng-Treitler

BACKGROUND Bodyweight-related measures (weight, height, BMI, abdominal circumference) are extremely important for clinical care, research, and quality improvement. These and other vital signs data are frequently missing from structured tables of electronic health records. However, they are often recorded as text within clinical notes. In this project we sought to develop and validate a learning algorithm that would extract bodyweight-related measures from clinical notes in the Veterans Administration (VA) Electronic Health Record to complement the structured data used in clinical research.
METHODS We developed the Regular Expression Discovery Extractor (REDEx), a supervised learning algorithm that generates regular expressions from a training set. The regular expressions generated by REDEx were then used to extract the numerical values of interest. To train the algorithm we created a corpus of 268 outpatient primary care notes that were annotated by two annotators. This annotation served to develop the annotation process and identify terms associated with bodyweight-related measures for training the supervised learning algorithm. Snippets from an additional 300 outpatient primary care notes were subsequently annotated independently by two reviewers to complete the training set. Inter-annotator agreement was calculated. REDEx was applied to a separate test set of 3561 notes to generate a dataset of weights extracted from text. We estimated the number of unique individuals who would otherwise not have bodyweight-related measures recorded in the CDW and the number of additional bodyweight-related measures that would be captured.
RESULTS REDEx's performance was: accuracy = 98.3%, precision = 98.8%, recall = 98.3%, F = 98.5%. In the dataset of weights from 3561 notes, 7.7% of notes contained bodyweight-related measures that were not available as structured data. In addition, 2 additional bodyweight-related measures were identified per individual per year.
CONCLUSION Bodyweight-related measures are frequently stored as text in clinical notes. A supervised learning algorithm can be used to extract these data. Implications for clinical care, epidemiology, and quality improvement efforts are discussed.
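The core idea of regex-based numeric extraction can be sketched as follows. This is a toy illustration, not the REDEx algorithm itself: REDEx *learns* its regular expressions from annotated training snippets, whereas the cue terms and pattern below are hand-written assumptions.

```python
import re

def build_value_pattern(cue_terms):
    """Compile a regex that captures a number following any cue term.

    Hand-written for illustration; REDEx discovers such patterns from
    annotated training snippets rather than taking them as input."""
    cues = "|".join(re.escape(t) for t in cue_terms)
    return re.compile(rf"\b(?:{cues})\s*[:=]?\s*(\d+(?:\.\d+)?)", re.IGNORECASE)

def extract_values(note, pattern):
    """Return all numeric values the pattern captures in a note."""
    return [float(v) for v in pattern.findall(note)]

weight_pattern = build_value_pattern(["weight", "wt"])
note = "Vitals today: Wt: 185.5 lbs. Patient reports weight 185 at home."
print(extract_values(note, weight_pattern))  # [185.5, 185.0]
```

Applied to free text, such a pattern recovers values that never reach the structured vitals table.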


Journal of Health and Medical Informatics | 2013

Characterizing Clinical Text and Sublanguage: A Case Study of the VA Clinical Notes

Qing T. Zeng; Doug Redd; Guy Divita; Samah Jarad; Cynthia Brandt; Jonathan R. Nebeker

Objective: To characterize text and sublanguage in medical records to better address challenges within Natural Language Processing (NLP) tasks such as information extraction, word sense disambiguation, information retrieval, and text summarization. Text and sublanguage analysis is needed to scale up NLP development for large and diverse free-text clinical data sets. Design: This is a quantitative descriptive study that analyzes the text and sublanguage characteristics of a very large Veterans Affairs (VA) clinical note corpus (569 million notes) to guide the customization of NLP for VA notes. Methods: We randomly sampled 100,000 notes from the top 100 most frequently appearing document types. We examined surface features and used those features to identify sublanguage groups using unsupervised clustering. Results: Using the text features, we were able to characterize each of the 100 document types and identify 16 distinct sublanguage groups. The identified sublanguages reflect different clinical domains and types of encounters within the sample corpus. We also found much variance within each of the document types. Such characteristics will facilitate the tuning and crafting of NLP tools. Conclusion: Using a diverse and large sample of clinical text, we were able to show there are a relatively large number of sublanguages and variance both within and between document types. These findings will guide NLP development to create more customizable and generalizable solutions across medical domains and sublanguages.
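The surface-feature clustering described above can be sketched with a toy feature extractor and a minimal k-means; both are hypothetical stand-ins for the study's actual feature set and clustering method.

```python
import random

def surface_features(note):
    """Two toy surface features: average token length and the fraction of
    numeric tokens (illustrative stand-ins for the study's features)."""
    tokens = note.split()
    avg_len = sum(len(t) for t in tokens) / len(tokens)
    numeric = sum(t.strip(".,:%").replace(".", "").isdigit() for t in tokens)
    return (avg_len, numeric / len(tokens))

def kmeans(points, k, iterations=20, seed=0):
    """A minimal k-means clusterer over tuples of feature values."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group (keep old if empty).
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, groups

notes = [
    "wt 80 bp 120 hr 70",                 # number-heavy, vitals-like note
    "na 140 k 4.0 cr 1.1",                # number-heavy, lab-like note
    "patient denies chest pain today",    # narrative note
    "discussed plan of care with family", # narrative note
]
centroids, groups = kmeans([surface_features(n) for n in notes], k=2)
```

Number-heavy notes and narrative notes land in different clusters because their surface statistics differ, which is the intuition behind sublanguage grouping.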


eGEMs (Generating Evidence & Methods to improve patient outcomes) | 2016

v3NLP Framework: Tools to Build Applications for Extracting Concepts from Clinical Text.

Guy Divita; Marjorie E. Carter; Le-Thuy T. Tran; Doug Redd; Qing T. Zeng; Scott L. DuVall; Matthew H. Samore; Adi V. Gundlapalli

Introduction: Substantial amounts of clinically significant information are contained only within the narrative of the clinical notes in electronic medical records. The v3NLP Framework is a set of “best-of-breed” functionalities developed to transform this information into structured data for use in quality improvement, research, population health surveillance, and decision support. Background: MetaMap, cTAKES, and similar well-known natural language processing (NLP) tools do not have sufficient scalability out of the box. The v3NLP Framework evolved out of the necessity to scale these tools up and to provide a framework to customize and tune techniques that fit a variety of tasks, including document classification, tuned concept extraction for specific conditions, patient classification, and information retrieval. Innovation: Beyond scalability, several projects developed with the v3NLP Framework have been efficacy tested and benchmarked. While the v3NLP Framework includes annotators, pipelines, and applications, its functionalities enable developers to create novel annotators and to place annotators into pipelines and scaled applications. Discussion: The v3NLP Framework has been successfully utilized in many projects, including general concept extraction, risk factors for homelessness among veterans, and identification of mentions of the presence of an indwelling urinary catheter. Projects as diverse as predicting colonization with methicillin-resistant Staphylococcus aureus and extracting references to military sexual trauma are being built using v3NLP Framework components. Conclusion: The v3NLP Framework is a set of functionalities and components that provide Java developers with the ability to create novel annotators and to place those annotators into pipelines and applications to extract concepts from clinical text. There are scale-up and scale-out functionalities to process large numbers of records.
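A minimal sketch of the annotator-pipeline pattern the framework describes. The class and function names here are hypothetical, not the actual v3NLP Java API; one annotator flags indwelling-catheter mentions, a task named in the abstract.

```python
import re
from dataclasses import dataclass

@dataclass
class Annotation:
    """A labeled span over the document text."""
    label: str
    start: int
    end: int
    text: str

def tokenizer(doc, annotations):
    """Annotator 1: mark every whitespace-delimited token."""
    for m in re.finditer(r"\S+", doc):
        annotations.append(Annotation("Token", m.start(), m.end(), m.group()))
    return annotations

def catheter_annotator(doc, annotations):
    """Annotator 2: flag mentions of an indwelling urinary catheter."""
    for m in re.finditer(r"\b(?:foley|indwelling catheter)\b", doc, re.IGNORECASE):
        annotations.append(Annotation("CatheterMention", m.start(), m.end(), m.group()))
    return annotations

def run_pipeline(doc, annotators):
    """Run each annotator in order, accumulating annotations."""
    annotations = []
    for annotate in annotators:
        annotations = annotate(doc, annotations)
    return annotations

doc = "Patient has a Foley in place since admission."
annotations = run_pipeline(doc, [tokenizer, catheter_annotator])
```

New annotators slot into the pipeline list without changing the runner, which is the extensibility property the abstract emphasizes.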


Computers in Biology and Medicine | 2014

Informatics can identify systemic sclerosis (SSc) patients at risk for scleroderma renal crisis

Doug Redd; Tracy M. Frech; Maureen A. Murtaugh; Julia Rhiannon; Qing T. Zeng

BACKGROUND Electronic medical records (EMR) provide an ideal opportunity for the detection, diagnosis, and management of systemic sclerosis (SSc) patients within the Veterans Health Administration (VHA). The objective of this project was to use informatics to identify potential SSc patients in the VHA who were on prednisone, in order to inform an outreach project to prevent scleroderma renal crisis (SRC). METHODS The electronic medical data for this study came from the Veterans Informatics and Computing Infrastructure (VINCI). For natural language processing (NLP) analysis, a set of retrieval criteria was developed for documents expected to have a high correlation to SSc. Two annotators reviewed the ratings to assemble a single adjudicated set of ratings, from which a support vector machine (SVM) based document classifier was trained. Any patient having at least one document positively classified for SSc was considered positive for SSc, and the use of prednisone ≥10 mg in the clinical document was reviewed to determine whether it was an active medication on the prescription list. RESULTS In the VHA, there were 4272 patients with a diagnosis of SSc determined by the presence of an ICD-9 code. Of these patients, 1118 (21%) had the use of prednisone ≥10 mg. Of these, 26 had a concurrent diagnosis of hypertension and thus should not be on prednisone. Using NLP, an additional 16,522 patients were identified as possible SSc, highlighting that cases of SSc may exist in the VHA that are unidentified by ICD-9. A 10-fold cross-validation of the classifier resulted in a precision (positive predictive value) of 0.814, recall (sensitivity) of 0.973, and f-measure of 0.873. CONCLUSIONS Our study demonstrated that current clinical practice in the VHA includes the potentially dangerous use of prednisone for veterans with SSc. This study also suggests there may be many undetected cases of SSc, and that NLP can successfully identify these patients.
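The evaluation machinery reported above, precision (PPV), recall (sensitivity), f-measure, and a 10-fold split, can be sketched generically; the counts in the example are illustrative, not the study's data.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision (positive predictive value), recall (sensitivity), and
    f-measure from true-positive / false-positive / false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def ten_fold_splits(items, k=10):
    """Yield (train, test) partitions for k-fold cross-validation."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Illustrative counts only: 8 true positives, 2 false positives, 2 false negatives.
p, r, f = precision_recall_f1(tp=8, fp=2, fn=2)
splits = list(ten_fold_splits(list(range(100))))
```

Each document gets classified exactly once as a test item across the ten folds, which is what makes the cross-validated estimates unbiased by a single lucky split.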


hawaii international conference on system sciences | 2013

Improve Retrieval Performance on Clinical Notes: A Comparison of Four Methods

Doug Redd; Thomas C. Rindflesch; Jonathan R. Nebeker; Qing Zeng-Treitler

Query expansion is a commonly used approach to improving search results. Specific expansion methods, however, are expected to have different results. We have developed three different expansion methods using knowledge derived from a medical thesaurus, the medical literature, and clinical notes. Since the three sources each have strengths and weaknesses, we hypothesized that combining them would lead to better retrieval performance. Evaluation was performed for the three query expansion techniques and an ensemble method on two sets of clinical notes. 11-point interpolated average precision, MAP, and P(10) scores were calculated, which indicate that topic-model-based expansion has the best results and the predication-based method the worst. This finding points to the potential of topic modeling methods as well as the challenge in integrating different knowledge sources.
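A minimal sketch of thesaurus-style synonym expansion, one of the three methods compared; the tiny thesaurus here is an illustrative stand-in for the medical knowledge sources the paper actually draws on.

```python
def expand_query(terms, thesaurus):
    """Expand a query with synonyms of each term, preserving order and
    skipping duplicates."""
    expanded = list(terms)
    for term in terms:
        for synonym in thesaurus.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded

# Toy thesaurus for illustration; the paper derives expansion terms from a
# medical thesaurus, the medical literature, and clinical notes.
thesaurus = {"hypertension": ["high blood pressure", "htn"]}
print(expand_query(["hypertension", "medication"], thesaurus))
# ['hypertension', 'medication', 'high blood pressure', 'htn']
```

An ensemble method in this spirit would union (or re-weight) the expansions produced from several knowledge sources before retrieval.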


Journal of Medical Systems | 2017

An Evolving Ecosystem for Natural Language Processing in Department of Veterans Affairs

Jennifer H. Garvin; Megha Kalsy; Cynthia Brandt; Stephen L. Luther; Guy Divita; Gregory Coronado; Doug Redd; Carrie M. Christensen; Brent Hill; Natalie Kelly; Qing Zeng Treitler

In an ideal clinical Natural Language Processing (NLP) ecosystem, researchers and developers would be able to collaborate with others, undertake validation of NLP systems, components, and related resources, and disseminate them. We captured requirements and formative evaluation data from Veterans Affairs (VA) Clinical NLP Ecosystem stakeholders using semi-structured interviews and meeting discussions. We developed a coding rubric to code the interviews and assessed inter-coder reliability using percent agreement and the kappa statistic. We undertook 15 interviews and held two workshop discussions. The main areas of requirements related to design and functionality, resources, and information. Stakeholders also confirmed the vision of the second generation of the Ecosystem, and recommendations included adding mechanisms to better understand terms, measuring collaboration to demonstrate value, and datasets/tools to navigate spelling errors and consumer language, among others. Stakeholders also recommended the capability to communicate with developers working on the next version of the VA electronic health record (VistA Evolution), a mechanism to automatically monitor downloads of tools, and automatic summaries of the downloads for Ecosystem contributors and funders. After three rounds of coding and discussion, we determined the percent agreement of the two coders to be 97.2% and the kappa to be 0.7851. The vision of the VA Clinical NLP Ecosystem met stakeholder needs. The interviews and discussions provided key requirements that inform the design of the VA Clinical NLP Ecosystem.
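The two inter-coder reliability measures used above, percent agreement and Cohen's kappa, can be computed as follows; the example ratings are illustrative, not the study's coding data.

```python
def percent_agreement(ratings_a, ratings_b):
    """Fraction of items on which two coders assigned the same code."""
    return sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(ratings_a)
    labels = set(ratings_a) | set(ratings_b)
    p_observed = percent_agreement(ratings_a, ratings_b)
    p_chance = sum(
        (ratings_a.count(label) / n) * (ratings_b.count(label) / n)
        for label in labels
    )
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical codes from two coders over four interview segments.
coder_1 = ["design", "design", "resources", "design"]
coder_2 = ["design", "resources", "resources", "design"]
print(percent_agreement(coder_1, coder_2), cohens_kappa(coder_1, coder_2))
```

Kappa is lower than raw agreement whenever the coders could have matched by chance, which is why the study reports both (97.2% agreement vs. kappa 0.7851).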


Computers in Biology and Medicine | 2015

Maximizing clinical cohort size using free text queries

Adi V. Gundlapalli; Doug Redd; Bryan Gibson; Marjorie E. Carter; Chris Korhonen; Jonathan R. Nebeker; Matthew H. Samore; Qing Zeng-Treitler

BACKGROUND Cohort identification is important in both population health management and research. In this project we sought to assess the use of text queries for cohort identification; specifically, we sought to determine the incremental value of unstructured data queries when added to structured queries for patient cohort identification. METHODS Three cohort identification tasks were evaluated: identification of individuals taking Ginkgo biloba and warfarin simultaneously (Ginkgo/Warfarin), individuals who were overweight, and individuals with uncontrolled diabetes (UCD). We assessed the increase in cohort size when unstructured data queries were added to structured data queries. The positive predictive value of unstructured data queries was assessed by manual chart review of a random sample of 500 patients. RESULTS For Ginkgo/Warfarin, text query increased the cohort size from 9 to 28,924 over the cohort identified by query of pharmacy data only. For the weight-related tasks, text search increased the cohort by 5-29% compared to the cohort identified by query of the vitals table. For the UCD task, text query increased the cohort size by 2-43% compared to the cohort identified by query of laboratory results or ICD codes. The positive predictive values for text searches were 52% for Ginkgo/Warfarin, 19-94% for the weight cohorts, and 44% for UCD. DISCUSSION This project demonstrates the value and limitations of free-text queries in patient cohort identification from large data sets. The clinical domain and the prevalence of the inclusion and exclusion criteria in the patient population influence the utility and yield of this approach.
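The incremental-value calculation described above can be sketched with sets of patient identifiers; the IDs and counts here are illustrative, not the study's data.

```python
def combined_cohort(structured_ids, text_ids):
    """Union of a structured-data cohort and a text-query cohort, plus the
    patients contributed only by the text query (the incremental value)."""
    added_by_text = text_ids - structured_ids
    return structured_ids | text_ids, added_by_text

def positive_predictive_value(n_confirmed, n_reviewed):
    """PPV estimated from a manual chart review of a sample of query hits."""
    return n_confirmed / n_reviewed

structured = {101, 102, 103}        # hypothetical hits from a structured-table query
text = {103, 104, 105, 106}         # hypothetical hits from a free-text query
cohort, added = combined_cohort(structured, text)
print(sorted(cohort), sorted(added))
```

The chart-review step matters because the text query's extra hits are only as useful as their PPV; a large `added` set with low PPV mostly adds review burden.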


international conference on data mining | 2013

Preface to data mining in biomedical informatics and healthcare

Carlo Barbieri; Cynthia Brandt; Samah Jamal Fodeh; Christopher Gillies; José David Martín-Guerrero; Daniela Stan Raicu; Mohammad Reza Siadat; Claudia Amato; Sameer K. Antani; Paul Bradley; Hamidreza Chitsaz; Rosa L. Figueroa; Jacob D. Furst; Adam E. Gaweda; Maryellen L. Giger; Juan Gómez; Ali Haddad; Kourosh Jafari-Khouzani; Jesse Lingeman; Paulo J. G. Lisboa; Flavio Mari; Theophilus Ogunyemi; Doug Redd; Ishwar K. Sethi; Hamid Soltanian-Zadeh; Emilio Soria; Gautam B. Singh; Szilárd Vajda

In the last decade, healthcare institutions, pharmaceutical companies, and other organizations have started to aggregate biomedical and clinical data in electronic databases. Mining these databases yields promising new threads of knowledge that could lead to a variety of beneficial outcomes for the entire community, from improving patients’ quality of life to saving public healthcare costs or increasing the efficiency of private healthcare companies. Given the complexity of biomedical and clinical information, it is important to use the proper tools to gain valuable insights from the available data.


american medical informatics association annual symposium | 2012

Synonym, Topic Model and Predicate-Based Query Expansion for Retrieving Clinical Documents

Qing T. Zeng; Doug Redd; Thomas C. Rindflesch; Jonathan R. Nebeker


american medical informatics association annual symposium | 2015

Ginkgo and Warfarin Interaction in a Large Veterans Administration Population

Gregory J. Stoddard; Melissa Archer; Laura Shane-McWhorter; Bruce E. Bray; Doug Redd; Joshua Proulx; Qing Zeng-Treitler

Collaboration


Dive into Doug Redd's collaborations.

Top Co-Authors
Thomas C. Rindflesch

National Institutes of Health
