
Publications


Featured research published by Rebecca S. Jacobson.


BMC Bioinformatics | 2016

NOBLE – Flexible concept recognition for large-scale biomedical natural language processing

Eugene Tseytlin; Kevin J. Mitchell; Elizabeth Legowski; Julia Corrigan; Girish Chavan; Rebecca S. Jacobson

Background: Natural language processing (NLP) applications are increasingly important in biomedical data analysis, knowledge engineering, and decision support. Concept recognition is an important component task for NLP pipelines, and can be either general-purpose or domain-specific. We describe a novel, flexible, and general-purpose concept recognition component for NLP pipelines, and compare its speed and accuracy against five commonly used alternatives on both a biological and clinical corpus.

NOBLE Coder implements a general algorithm for matching terms to concepts from an arbitrary vocabulary set. The system's matching options can be configured individually or in combination to yield specific system behavior for a variety of NLP tasks. The software is open source, freely available, and easily integrated into UIMA or GATE. We benchmarked speed and accuracy of the system against the CRAFT and ShARe corpora as reference standards and compared it to MMTx, MGrep, Concept Mapper, cTAKES Dictionary Lookup Annotator, and cTAKES Fast Dictionary Lookup Annotator.

Results: We describe key advantages of the NOBLE Coder system and associated tools, including its greedy algorithm, configurable matching strategies, and multiple terminology input formats. These features provide unique functionality when compared with existing alternatives, including state-of-the-art systems. On two benchmarking tasks, NOBLE's performance exceeded commonly used alternatives, performing almost as well as the most advanced systems. Error analysis revealed differences in error profiles among systems.

Conclusion: NOBLE Coder is comparable to other widely used concept recognition systems in terms of accuracy and speed. Advantages of NOBLE Coder include its interactive terminology builder tool, ease of configuration, and adaptability to various domains and tasks. NOBLE provides a term-to-concept matching system suitable for general concept recognition in biomedical NLP pipelines.
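The published NOBLE Coder is a Java tool with far richer matching options, but a minimal sketch of greedy, longest-match term-to-concept lookup conveys the core idea the abstract describes. The toy vocabulary, concept codes, and tokenization below are illustrative assumptions, not the tool's actual API.

```python
# Minimal sketch of greedy longest-match concept recognition.
# The vocabulary is a toy stand-in for a real terminology (e.g., one
# derived from the NCI Thesaurus); concept IDs are hypothetical.

VOCAB = {
    ("breast", "carcinoma"): "C4872",
    ("carcinoma",): "C2916",
    ("estrogen", "receptor"): "C17063",
}
MAX_TERM_LEN = max(len(term) for term in VOCAB)

def greedy_match(text: str):
    """Return (start_token, matched_tokens, concept_id) triples,
    always preferring the longest vocabulary term at each position."""
    tokens = [t.strip(".,;:") for t in text.lower().split()]
    matches, i = [], 0
    while i < len(tokens):
        for span in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + span])
            if candidate in VOCAB:
                matches.append((i, candidate, VOCAB[candidate]))
                i += span          # greedy: consume the matched tokens
                break
        else:
            i += 1                 # no term starts here; advance one token
    return matches

print(greedy_match("Invasive breast carcinoma, estrogen receptor positive"))
```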


Implementation Science | 2015

Computer-supported feedback message tailoring: theory-informed adaptation of clinical audit and feedback for learning and behavior change.

Zach Landis-Lewis; Jamie C. Brehaut; Harry Hochheiser; Gerald P. Douglas; Rebecca S. Jacobson

Background: Evidence shows that clinical audit and feedback can significantly improve compliance with desired practice, but it is unclear when and how it is effective. Audit and feedback is likely to be more effective when feedback messages can influence barriers to behavior change, but barriers to change differ across individual health-care providers, stemming from differences in providers' individual characteristics.

Discussion: The purpose of this article is to invite debate and direct research attention towards a novel audit and feedback component that could enable interventions to adapt to barriers to behavior change for individual health-care providers: computer-supported tailoring of feedback messages. We argue that, by leveraging available clinical data, theory-informed knowledge about behavior change, and the knowledge of clinical supervisors or peers who deliver feedback messages, a software application that supports feedback message tailoring could improve feedback message relevance for barriers to behavior change, thereby increasing the effectiveness of audit and feedback interventions. We describe a prototype system that supports the provision of tailored feedback messages by generating a menu of graphical and textual messages with associated descriptions of targeted barriers to behavior change. Supervisors could use the menu to select messages based on their awareness of each feedback recipient's specific barriers to behavior change. We anticipate that such a system, if designed appropriately, could guide supervisors towards giving more effective feedback for health-care providers.

Summary: A foundation of evidence and knowledge in related health research domains supports the development of feedback message tailoring systems for clinical audit and feedback. Creating and evaluating computer-supported feedback tailoring tools is a promising approach to improving the effectiveness of clinical audit and feedback.
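The article describes the prototype only at a conceptual level. As a hypothetical sketch, the structure below shows one way feedback messages could be linked to the barriers they target so a supervisor can filter a menu by a recipient's known barriers; the class names, fields, and example messages are assumptions, not the authors' implementation.

```python
# Hypothetical sketch: a menu of feedback messages, each annotated with the
# barrier to behavior change it targets, from which a supervisor selects.
from dataclasses import dataclass

@dataclass
class FeedbackMessage:
    text: str          # textual message (or caption for a graphical one)
    barrier: str       # targeted barrier to behavior change
    kind: str          # "text" or "graph"

MENU = [
    FeedbackMessage("Your clinic documented follow-up dates for 62% of patients "
                    "this quarter (target: 90%).",
                    barrier="lack of awareness", kind="text"),
    FeedbackMessage("Trend chart of follow-up documentation over the last 4 quarters.",
                    barrier="low outcome expectancy", kind="graph"),
]

def messages_for(barriers):
    """Filter the menu to messages relevant to a recipient's known barriers."""
    return [m for m in MENU if m.barrier in barriers]

print(messages_for({"lack of awareness"}))
```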


BMC Medical Informatics and Decision Making | 2016

An information model for computable cancer phenotypes

Harry Hochheiser; Melissa Castine; David J. Harris; Guergana Savova; Rebecca S. Jacobson

Background: Standards, methods, and tools supporting the integration of clinical data and genomic information are an area of significant need and rapid growth in biomedical informatics. Integration of cancer clinical data and cancer genomic information poses unique challenges because of the high volume and complexity of clinical data, as well as the heterogeneity and instability of cancer genome data when compared with germline data. Current information models of clinical and genomic data are not sufficiently expressive to represent individual observations and to aggregate those observations into longitudinal summaries over the course of cancer care. These models are acutely needed to support the development of systems and tools for generating the so-called clinical "deep phenotype" of individual cancer patients, a process which remains almost entirely manual in cancer research and precision medicine.

Methods: Reviews of existing ontologies and interviews with cancer researchers were used to inform iterative development of a cancer phenotype information model. We translated a subset of the Fast Healthcare Interoperability Resources (FHIR) models into the OWL 2 Description Logic (DL) representation, and added extensions as needed for modeling cancer phenotypes with terms derived from the NCI Thesaurus. Models were validated with domain experts and evaluated against competency questions.

Results: The DeepPhe information model represents cancer phenotype data at increasing levels of abstraction, from mentions in clinical documents to summaries of key events and findings. We describe the model using breast cancer as an example, depicting methods to represent phenotypic features of cancers, tumors, treatment regimens, and specific biologic behaviors that span the entire course of a patient's disease.

Conclusions: We present a multi-scale information model for representing individual document mentions, document-level classifications, episodes along a disease course, and phenotype summarization, linking individual observations to high-level summaries in support of subsequent integration and analysis.
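As a rough illustration of the multi-scale structure described above (mention, episode, patient-level summary), the hypothetical Python classes below show how observations at different levels of abstraction could be linked. The real DeepPhe model is expressed in OWL 2 DL with FHIR-derived resources; these dataclasses and field names are illustrative assumptions only.

```python
# Hypothetical sketch of a multi-scale cancer phenotype model: mentions in
# clinical documents roll up into episodes, which roll up into a
# patient-level phenotype summary.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Mention:                      # a concept found in one clinical document
    concept: str                    # e.g., an NCI Thesaurus code
    document_id: str
    span: tuple

@dataclass
class Episode:                      # a phase of care along the disease course
    label: str                      # e.g., "diagnosis", "treatment"
    mentions: List[Mention] = field(default_factory=list)

@dataclass
class PhenotypeSummary:             # patient-level rollup across episodes
    patient_id: str
    episodes: List[Episode] = field(default_factory=list)

    def concepts(self):
        """All distinct concepts observed for this patient."""
        return {m.concept for e in self.episodes for m in e.mentions}

summary = PhenotypeSummary("pt-001", [
    Episode("diagnosis", [Mention("C4872", "note-17", (104, 120))]),
])
print(summary.concepts())
```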


International Journal of Medical Informatics | 2015

Barriers to using eHealth data for clinical performance feedback in Malawi: A case study.

Zach Landis-Lewis; Ronald Manjomo; Oliver Jintha Gadabu; Matthew Kam; Bertha N. Simwaka; Susan L. Zickmund; Frank Chimbwandira; Gerald P. Douglas; Rebecca S. Jacobson

Introduction: Sub-optimal performance of healthcare providers in low-income countries is a critical and persistent global problem. The use of electronic health information technology (eHealth) in these settings is creating large-scale opportunities to automate performance measurement and provision of feedback to individual healthcare providers, to support clinical learning and behavior change. An electronic medical record system (EMR) deployed in 66 antiretroviral therapy clinics in Malawi collects data that supervisors use to provide quarterly, clinic-level performance feedback. Understanding barriers to provision of eHealth-based performance feedback for individual healthcare providers in this setting could present a relatively low-cost opportunity to significantly improve the quality of care.

Objective: The aims of this study were to identify and describe barriers to using EMR data for individualized audit and feedback for healthcare providers in Malawi and to consider how to design technology to overcome these barriers.

Methods: We conducted a qualitative study using interviews, observations, and informant feedback in eight public hospitals in Malawi where an EMR system is used. We interviewed 32 healthcare providers and conducted seven hours of observation of system use.

Results: We identified four key barriers to the use of EMR data for clinical performance feedback: provider rotations, disruptions to care processes, user acceptance of eHealth, and performance indicator lifespan. Each of these factors varied across sites and affected the quality of EMR data that could be used for the purpose of generating performance feedback for individual healthcare providers.

Conclusion: Using routinely collected eHealth data to generate individualized performance feedback shows potential at large scale for improving clinical performance in low-resource settings. However, technology used for this purpose must accommodate ongoing changes in barriers to eHealth data use. Understanding the clinical setting as a complex adaptive system (CAS) may enable designers of technology to effectively model change processes to mitigate these barriers.


Cancer Research | 2015

A Federated Network for Translational Cancer Research Using Clinical Data and Biospecimens

Rebecca S. Jacobson; Michael J. Becich; Roni J. Bollag; Girish Chavan; Julia Corrigan; Rajiv Dhir; Michael Feldman; Carmelo Gaudioso; Elizabeth Legowski; Nita J. Maihle; Kevin J. Mitchell; Monica Murphy; Mayurapriyan Sakthivel; Eugene Tseytlin; JoEllen Weaver

Advances in cancer research and personalized medicine will require significant new bridging infrastructures, including more robust biorepositories that link human tissue to clinical phenotypes and outcomes. In order to meet that challenge, four cancer centers formed the Text Information Extraction System (TIES) Cancer Research Network, a federated network that facilitates data and biospecimen sharing among member institutions. Member sites can access pathology data that are de-identified and processed with the TIES natural language processing system, which creates a repository of rich phenotype data linked to clinical biospecimens. TIES incorporates multiple security and privacy best practices that, combined with legal agreements, network policies, and procedures, enable regulatory compliance. The TIES Cancer Research Network now provides integrated access to investigators at all member institutions, where multiple investigator-driven pilot projects are underway. Examples of federated search across the network illustrate the potential impact on translational research, particularly for studies involving rare cancers, rare phenotypes, and specific biologic behaviors. The network satisfies several key desiderata including local control of data and credentialing, inclusion of rich phenotype information, and applicability to diverse research objectives. The TIES Cancer Research Network presents a model for a national data and biospecimen network.
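The abstract describes the network's architecture only at a high level. Purely as a hypothetical sketch of the federated pattern it implies, the snippet below fans a de-identified case-count query out to member sites and aggregates only the counts; the site names, query format, and local search functions are invented for illustration and are not the TIES API.

```python
# Hypothetical sketch of federated search: each member institution answers a
# de-identified case-count query over its own NLP-processed pathology reports,
# and the coordinator only aggregates counts (data never leave the site).

def local_case_count(node_name, query):
    """Stand-in for a member site's local, de-identified search."""
    fake_index = {
        "site_a": {"adenoid cystic carcinoma": 12},
        "site_b": {"adenoid cystic carcinoma": 7},
    }
    return fake_index.get(node_name, {}).get(query, 0)

def federated_count(nodes, query):
    """Fan the query out to every member node and return per-site counts."""
    return {node: local_case_count(node, query) for node in nodes}

print(federated_count(["site_a", "site_b"], "adenoid cystic carcinoma"))
```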


PLOS ONE | 2016

TCGA Expedition: A Data Acquisition and Management System for TCGA Data.

Uma Chandran; Olga Medvedeva; M. Michael Barmada; Philip D. Blood; Anish Chakka; Soumya Luthra; Antonio G. Ferreira; Kim F. Wong; Adrian V. Lee; Zhihui Zhang; Robert Budden; J. Ray Scott; Annerose Berndt; Jeremy M. Berg; Rebecca S. Jacobson

Background: The Cancer Genome Atlas Project (TCGA) is a National Cancer Institute effort to profile at least 500 cases of 20 different tumor types using genomic platforms and to make these data, both raw and processed, available to all researchers. TCGA data are currently over 1.2 petabytes in size and include whole genome sequence (WGS), whole exome sequence, methylation, RNA expression, proteomic, and clinical datasets. Publicly accessible TCGA data are released through public portals, but many challenges exist in navigating and using data obtained from these sites. We developed TCGA Expedition to support the research community focused on computational methods for cancer research. Data obtained, versioned, and archived using TCGA Expedition support command-line access at high-performance computing facilities as well as some functionality with third-party tools. For a subset of TCGA data collected at the University of Pittsburgh, we also re-associate TCGA data with de-identified data from the electronic health records. Here we describe the software as well as the architecture of our repository, methods for loading TCGA data to multiple platforms, and security and regulatory controls that conform to federal best practices.

Results: TCGA Expedition software consists of a set of scripts written in Bash, Python, and Java that download, extract, harmonize, version, and store all TCGA data and metadata. The software generates a versioned, participant- and sample-centered local TCGA data directory with metadata structures that directly reference the local data files as well as the original data files. The software supports flexible searches of the data via a web portal, user-centric data tracking tools, and data provenance tools. Using this software, we created a collaborative repository, the Pittsburgh Genome Resource Repository (PGRR), that enabled investigators at our institution to work with all TCGA data formats and to interrogate these data with analysis pipelines and associated tools. WGS data are especially challenging for individual investigators to use, due to issues with downloading, storage, and processing; having locally accessible WGS BAM files has proven invaluable.

Conclusion: Our open-source, freely available TCGA Expedition software can be used to create a local collaborative infrastructure for acquiring, managing, and analyzing TCGA data and other large public datasets.
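The actual TCGA Expedition pipeline is a substantially larger set of Bash, Python, and Java scripts. The fragment below is only a hypothetical sketch of the versioned, participant- and sample-centered directory layout and provenance metadata the abstract describes; the paths, manifest fields, and checksum choice are assumptions.

```python
# Hypothetical sketch: archive a downloaded TCGA file under a versioned,
# participant/sample-centered directory and record provenance metadata.
import hashlib
import json
import shutil
from datetime import date
from pathlib import Path

REPO = Path("pgrr_repo")   # local repository root (invented name)

def archive(src: Path, participant: str, sample: str, platform: str, version: str):
    """Copy one data file into the versioned layout and write a metadata record."""
    dest_dir = REPO / participant / sample / platform / version
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)                      # keep the original untouched
    meta = {
        "participant": participant,
        "sample": sample,
        "platform": platform,
        "version": version,
        "source_file": str(src),                 # reference to the original file
        "archived_file": str(dest),              # reference to the local copy
        "md5": hashlib.md5(dest.read_bytes()).hexdigest(),
        "archived_on": date.today().isoformat(),
    }
    (dest_dir / (src.name + ".meta.json")).write_text(json.dumps(meta, indent=2))
    return meta

# Example call (hypothetical paths and identifiers):
# archive(Path("downloads/TCGA-XX-1234.bam"), "TCGA-XX-1234", "01A", "WGS", "v1")
```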


JMIR Research Protocols | 2016

Using Nonexperts for Annotating Pharmacokinetic Drug-Drug Interaction Mentions in Product Labeling: A Feasibility Study

Harry Hochheiser; Yifan Ning; A. Hernandez; Rebecca S. Jacobson; Richard D. Boyce

Background: Because vital details of potential pharmacokinetic drug-drug interactions are often described in free-text structured product labels, manual curation is a necessary but expensive step in the development of electronic drug-drug interaction information resources. The use of nonexperts to annotate potential drug-drug interaction (PDDI) mentions in drug product labels may be a means of lessening the burden of manual curation.

Objective: Our goal was to explore the practicality of using nonexpert participants to annotate drug-drug interaction descriptions from structured product labels. By presenting annotation tasks to both pharmacy experts and relatively naïve participants, we hoped to demonstrate the feasibility of using nonexpert annotators for drug-drug interaction annotation. We were also interested in exploring whether and to what extent natural language processing (NLP) preannotation helped improve task completion time, accuracy, and subjective satisfaction.

Methods: Two experts and 4 nonexperts were asked to annotate 208 structured product label sections under 4 conditions completed sequentially: (1) no NLP assistance, (2) preannotation of drug mentions, (3) preannotation of drug mentions and PDDIs, and (4) a repeat of the no-annotation condition. Results were evaluated within the 2 groups and relative to an existing gold standard. Participants were asked to report the time required to complete tasks and their perceptions of task difficulty.

Results: One of the experts and 3 of the nonexperts completed all tasks. Annotation results from the nonexpert group were relatively strong in every scenario and better than the performance of the NLP pipeline. The expert and 2 of the nonexperts were able to complete most tasks in less than 3 hours. Usability perceptions were generally positive (3.67 for the expert, mean of 3.33 for the nonexperts).

Conclusions: The results suggest that nonexpert annotation might be a feasible option for comprehensive labeling of annotated PDDIs across a broader range of drug product labels. Preannotation of drug mentions may ease the annotation task. However, preannotation of PDDIs, as operationalized in this study, presented the participants with difficulties. Future work should test whether these issues can be addressed by the use of better-performing NLP and a different approach to presenting the PDDI preannotations to users during the annotation workflow.
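The study's scoring procedure is not detailed in the abstract. As a generic illustration of comparing a participant's span annotations against a gold standard, the sketch below computes exact-match precision, recall, and F1; the (start, end, label) span format, the example spans, and the strict matching criterion are assumptions.

```python
# Generic sketch: score one annotator's PDDI span annotations against a gold
# standard using exact-match precision, recall, and F1.

def prf(predicted, gold):
    """Exact-match precision/recall/F1 over sets of (start, end, label) spans."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

gold = [(10, 25, "pddi"), (40, 58, "pddi")]          # invented toy spans
annotator = [(10, 25, "pddi"), (60, 72, "pddi")]
print(prf(annotator, gold))   # (0.5, 0.5, 0.5)
```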


PLOS ONE | 2015

Needs Assessment for Research Use of High-Throughput Sequencing at a Large Academic Medical Center.

Albert Geskin; Elizabeth Legowski; Anish Chakka; Uma Chandran; M. Michael Barmada; William A. LaFramboise; Jeremy Berg; Rebecca S. Jacobson

Next Generation Sequencing (NGS) methods are driving profound changes in biomedical research, with a growing impact on patient care. Many academic medical centers are evaluating potential models to prepare for the rapid increase in NGS information needs. This study sought to investigate (1) how and where sequencing data is generated and analyzed, (2) research objectives and goals for NGS, (3) workforce capacity and unmet needs, (4) storage capacity and unmet needs, (5) available and anticipated funding resources, and (6) future challenges. As a precursor to informed decision making at our institution, we undertook a systematic needs assessment of investigators using survey methods. We recruited 331 investigators from over 60 departments and divisions at the University of Pittsburgh Schools of Health Sciences and had 140 respondents, or a 42% response rate. Results suggest that both sequencing and analysis bottlenecks currently exist. Significant educational needs were identified, including both investigator-focused needs, such as selection of NGS methods suitable for specific research objectives, and program-focused needs, such as support for training an analytic workforce. The absence of centralized infrastructure was identified as an important institutional gap. Key principles for organizations managing this change were formulated based on the survey responses. This needs assessment provides an in-depth case study which may be useful to other academic medical centers as they identify and plan for future needs.


Journal of Biomedical Informatics | 2017

Automated annotation and classification of BI-RADS assessment from radiology reports

Sergio M. Castro; Eugene Tseytlin; Olga Medvedeva; Kevin J. Mitchell; Shyam Visweswaran; Tanja Bekhuis; Rebecca S. Jacobson

The Breast Imaging Reporting and Data System (BI-RADS) was developed to reduce variation in the descriptions of findings. Manual analysis of breast radiology report data is challenging but is necessary for clinical and healthcare quality assurance activities. The objective of this study was to develop a natural language processing (NLP) system for automated extraction of BI-RADS categories from breast radiology reports. We evaluated an existing rule-based NLP algorithm, and then we developed and evaluated our own method using a supervised machine learning approach. We divided the BI-RADS category extraction task into two specific tasks: (1) annotation of all BI-RADS category values within a report, and (2) classification of the laterality of each BI-RADS category value. We used one algorithm for task 1 and evaluated three algorithms for task 2. Across all evaluations and model training, we used a total of 2159 radiology reports from 18 hospitals, from 2003 to 2015. Performance with the existing rule-based algorithm was not satisfactory. Conditional random fields showed high performance for task 1, with an F1 measure of 0.95. The rules-from-partial-decision-trees (PART) algorithm showed the best performance across classes for task 2, with a weighted F1 measure of 0.91 for BI-RADS 0-6 and 0.93 for BI-RADS 3-5. Classification performance by class showed that performance improved for all classes from Naïve Bayes to Support Vector Machine (SVM), and again from SVM to PART. Our system is able to annotate and classify all BI-RADS mentions present in a single radiology report and can serve as the foundation for future studies that will leverage automated BI-RADS annotation to provide feedback to radiologists as part of a learning health system loop.
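The study's models were trained on annotated radiology reports that are not reproduced here. As a rough stand-in for the task 2 (laterality classification) step, the sketch below trains a bag-of-words SVM, one of the classifiers compared in the paper, on invented toy snippets; scikit-learn is assumed to be available, and the examples and labels are not the study's corpus.

```python
# Illustrative sketch of laterality classification (task 2) as a supervised
# text classifier. Requires scikit-learn; training snippets are toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = [
    "right breast: BI-RADS category 2, benign finding",
    "left breast ultrasound, BI-RADS 4, suspicious abnormality",
    "bilateral screening mammogram, BI-RADS 1",
]
train_labels = ["right", "left", "bilateral"]

# Word/bigram TF-IDF features feeding a linear SVM classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_texts, train_labels)

print(clf.predict(["left breast MRI, BI-RADS category 5"]))
```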


Cancer Research | 2017

DeepPhe: A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records

Guergana Savova; Eugene Tseytlin; Sean Finan; Melissa Castine; Timothy A. Miller; Olga Medvedeva; David J. Harris; Harry Hochheiser; Chen Lin; Girish Chavan; Rebecca S. Jacobson

Precise phenotype information is needed to understand the effects of genetic and epigenetic changes on tumor behavior and responsiveness. Extraction and representation of cancer phenotypes is currently performed mostly manually, making it difficult to correlate phenotypic data to genomic data. In addition, genomic data are being produced at an increasingly faster pace, exacerbating the problem. The DeepPhe software enables automated extraction of detailed phenotype information from the electronic medical records of cancer patients. The system implements advanced natural language processing and knowledge engineering methods within a flexible modular architecture, and was evaluated using a manually annotated dataset of University of Pittsburgh Medical Center breast cancer patients. The resulting platform provides critical and missing computational methods for computational phenotyping. Working in tandem with advanced analysis of high-throughput sequencing, these approaches will further accelerate the transition to precision cancer treatment. Cancer Res; 77(21); e115-8. ©2017 AACR.
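DeepPhe's actual implementation is built from Java NLP components. Purely as an illustration of the staged, modular flow the abstract describes (document-level extraction feeding episode grouping and patient-level summarization), here is a hypothetical pipeline skeleton; all function names and the toy documents are assumptions.

```python
# Hypothetical skeleton of a modular phenotype-extraction pipeline: each
# stage consumes the output of the previous one (document -> episode ->
# patient summary). Stubs stand in for the real NLP components.

def extract_mentions(documents):
    """Document-level step: find cancer-related concept mentions (stub)."""
    return [{"doc": d["id"], "concept": c} for d in documents for c in d.get("concepts", [])]

def group_into_episodes(mentions):
    """Group mentions into care episodes (stub: one episode per document)."""
    episodes = {}
    for m in mentions:
        episodes.setdefault(m["doc"], []).append(m["concept"])
    return [{"episode": doc, "concepts": cs} for doc, cs in episodes.items()]

def summarize_patient(episodes):
    """Patient-level phenotype summary across all episodes."""
    return sorted({c for e in episodes for c in e["concepts"]})

docs = [{"id": "note-1", "concepts": ["invasive ductal carcinoma", "ER positive"]},
        {"id": "note-2", "concepts": ["ER positive", "mastectomy"]}]
print(summarize_patient(group_into_episodes(extract_mentions(docs))))
```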

Collaboration


Dive into Rebecca S. Jacobson's collaborations.

Top Co-Authors

Anish Chakka, University of Pittsburgh

Girish Chavan, University of Pittsburgh

Guergana Savova, Boston Children's Hospital

Olga Medvedeva, University of Pittsburgh

Uma Chandran, University of Pittsburgh