
Publication


Featured research published by Suzanne Tamang.


Drug Safety | 2014

Text mining for adverse drug events: the promise, challenges, and state of the art.

Rave Harpaz; Alison Callahan; Suzanne Tamang; Yen S. Low; David Odgers; Sam Finlayson; Kenneth Jung; Paea LePendu; Nigam H. Shah

Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the detection and assessment of adverse drug events (ADEs). This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources—such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs—that are amenable to text mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address the remaining technical challenges associated with text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance.
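
To make the task concrete, here is a minimal sketch of the simplest form of ADE text mining: flagging co-mentions of drug and event terms in a note. The term lists and co-occurrence window are illustrative assumptions; production pharmacovigilance pipelines add negation handling, concept normalization, and statistical signal detection.

```python
import re

# Illustrative (not the authors') term lists of drugs and adverse events.
DRUGS = {"warfarin", "metformin", "lisinopril"}
EVENTS = {"bleeding", "nausea", "rash", "hypoglycemia"}

def ade_comentions(note: str, window: int = 10):
    """Yield (drug, event) pairs co-occurring within `window` tokens,
    a crude proxy for a candidate adverse drug event mention."""
    tokens = re.findall(r"[a-z]+", note.lower())
    for i, tok in enumerate(tokens):
        if tok in DRUGS:
            for near in tokens[max(0, i - window): i + window + 1]:
                if near in EVENTS:
                    yield (tok, near)

note = "Patient on warfarin presented with gi bleeding; warfarin held."
print(set(ade_comentions(note)))  # {('warfarin', 'bleeding')}
```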


Asia Information Retrieval Symposium | 2010

Top-down and Bottom-up: A Combined Approach to Slot Filling

Zheng Chen; Suzanne Tamang; Adam Lee; Xiang Li; Marissa Passantino; Heng Ji

The Slot Filling task requires a system to automatically distill information from a large document collection and return answers for a query entity with specified attributes (‘slots’), and to use them to expand Wikipedia infoboxes. We describe two bottom-up Information Extraction style pipelines and a top-down Question Answering style pipeline to address this task. We propose several novel approaches to enhance these pipelines, including statistical answer re-ranking and Markov Logic Network-based cross-slot reasoning. We demonstrate that our system achieves state-of-the-art performance, with 3.1% higher precision and 2.6% higher recall compared with the best system in the KBP2009 evaluation.
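
The statistical answer re-ranking idea can be sketched as scoring each candidate slot fill with a learned model and sorting. The feature names and weights below are illustrative assumptions, not the paper's actual model (which also includes the Markov Logic Network reasoning not shown here).

```python
# A minimal sketch of statistical answer re-ranking for slot filling,
# assuming each pipeline emits (answer, features) candidates.
def rerank(candidates, weights):
    """Score each candidate answer with a linear model over its features
    and return answers sorted best-first."""
    def score(feats):
        return sum(weights.get(name, 0.0) * val for name, val in feats.items())
    return sorted(candidates, key=lambda c: score(c["features"]), reverse=True)

candidates = [
    {"answer": "Beijing",  "features": {"pipeline_votes": 2, "pattern_conf": 0.7}},
    {"answer": "Shanghai", "features": {"pipeline_votes": 1, "pattern_conf": 0.9}},
]
weights = {"pipeline_votes": 1.0, "pattern_conf": 0.5}  # illustrative weights
best = rerank(candidates, weights)[0]["answer"]
print(best)  # "Beijing" (score 2.35 vs 1.45)
```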


Knowledge and Information Systems | 2014

Tackling representation, annotation and classification challenges for temporal knowledge base population

Heng Ji; Taylor Cassidy; Qi Li; Suzanne Tamang

Temporal Information Extraction (TIE) plays an important role in many natural language processing and database applications. Temporal slot filling (TSF) is a new and ambitious TIE task prepared for the knowledge base population (KBP2011) track of the NIST Text Analysis Conference. TSF requires systems to discover temporally bound facts about entities and their attributes in order to populate a structured knowledge base. In this paper, we provide an overview of the unique challenges of this new task and our novel approaches to address them. We present challenges from three perspectives: (1) Temporal information representation: we review the relevant linguistic semantic theories of temporal information and their limitations, motivating the need to develop a new (4-tuple) representation framework for the task. (2) Annotation acquisition: the lack of substantial labeled training data for supervised learning is a limiting factor in the design of TSF systems. Our work examines the use of multi-class logistic regression methods to improve the labeling quality of training data obtained by distant supervision. (3) Temporal information classification: another key challenge lies in capturing relations between salient text elements separated by a long context. We develop two approaches for temporal classification and combine them through cross-document aggregation: a flat approach that uses lexical context and shallow dependency features, and a structured approach that captures long syntactic contexts by using a dependency path kernel tailored for this task. Experimental results demonstrated that our annotation enhancement approach dramatically increased the speed of the training procedure (by almost 100 times), and that the flat and structured classification approaches were complementary, together yielding a state-of-the-art TSF system.
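
A minimal sketch of the 4-tuple representation, under the common KBP reading that a fact's start time lies in [t1, t2] and its end time in [t3, t4]. The bound-tightening rule for combining evidence across documents is an illustrative assumption rather than the paper's exact aggregation method.

```python
from dataclasses import dataclass
from typing import Optional

Date = Optional[int]  # e.g., a year; None means no constraint

@dataclass
class Temporal4Tuple:
    t1: Date  # earliest possible start
    t2: Date  # latest possible start
    t3: Date  # earliest possible end
    t4: Date  # latest possible end

def tighten(a: Temporal4Tuple, b: Temporal4Tuple) -> Temporal4Tuple:
    """Combine two pieces of evidence about the same fact by
    intersecting their start/end windows (illustrative rule)."""
    def hi(x, y):  # tighter lower bound
        return y if x is None else (x if y is None else max(x, y))
    def lo(x, y):  # tighter upper bound
        return y if x is None else (x if y is None else min(x, y))
    return Temporal4Tuple(hi(a.t1, b.t1), lo(a.t2, b.t2),
                          hi(a.t3, b.t3), lo(a.t4, b.t4))

# "employed by X since at least 2005" + "left X no later than 2009"
e1 = Temporal4Tuple(None, 2005, 2005, None)
e2 = Temporal4Tuple(None, None, None, 2009)
print(tighten(e1, e2))  # Temporal4Tuple(t1=None, t2=2005, t3=2005, t4=2009)
```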


Journal of Oncology Practice | 2015

Detecting Unplanned Care From Clinician Notes in Electronic Health Records

Suzanne Tamang; Manali I. Patel; Douglas W. Blayney; Julie Lawrence Kuznetsov; Samuel G. Finlayson; Yohan Vetteth; Nigam H. Shah

PURPOSE Reductions in unplanned episodes of care, such as emergency department visits and unplanned hospitalizations, are important quality outcome measures. However, many events are documented only in free-text clinician notes and are labor intensive to detect by manual medical record review. METHODS We studied 308,096 free-text machine-readable documents linked to individual entries in our electronic health records, representing care for patients with breast, GI, or thoracic cancer, whose treatment was initiated at one academic medical center, Stanford Health Care (SHC). Using a clinical text-mining tool, we detected unplanned episodes documented in clinician notes (for non-SHC visits) or in coded encounter data for SHC-delivered care and the most frequent symptoms documented in emergency department (ED) notes. RESULTS Combined reporting increased the identification of patients with one or more unplanned care visits by 32% (15% using coded data; 20% using all the data) among patients with 3 months of follow-up and by 21% (23% using coded data; 28% using all the data) among those with 1 year of follow-up. Based on the textual analysis of SHC ED notes, pain (75%), followed by nausea (54%), vomiting (47%), infection (36%), fever (28%), and anemia (27%), were the most frequent symptoms mentioned. Pain, nausea, and vomiting co-occurred in 35% of all ED encounter notes. CONCLUSION The text-mining methods we describe can be applied to automatically review free-text clinician notes to detect unplanned episodes of care mentioned in these notes. These methods have broad application for quality improvement efforts in which events of interest occur outside of a network that allows for patient data sharing.
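
A minimal sketch of the symptom-frequency tally reported in the Results, assuming ED notes are available as plain strings; the symptom lexicon is an illustrative stand-in for the clinical text-mining tool used in the study.

```python
from collections import Counter

SYMPTOMS = ["pain", "nausea", "vomiting", "infection", "fever", "anemia"]

def symptom_rates(ed_notes):
    """Fraction of ED notes mentioning each symptom term."""
    counts = Counter()
    for note in ed_notes:
        text = note.lower()
        for s in SYMPTOMS:
            if s in text:
                counts[s] += 1
    n = len(ed_notes)
    return {s: counts[s] / n for s in SYMPTOMS}

notes = [
    "Chief complaint: abdominal pain with nausea and vomiting.",
    "Fever, likely infection; pain controlled.",
]
print(symptom_rates(notes))  # pain: 1.0, nausea: 0.5, vomiting: 0.5, ...
```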


eGEMs (Generating Evidence & Methods to improve patient outcomes) | 2016

New Paradigms for Patient-Centered Outcomes Research in Electronic Medical Records: An Example of Detecting Urinary Incontinence Following Prostatectomy.

Tina Hernandez-Boussard; Suzanne Tamang; Douglas W. Blayney; James D. Brooks; Nigam H. Shah

Introduction: National initiatives to develop quality metrics emphasize the need to include patient-centered outcomes. Patient-centered outcomes are complex, require documentation of patient communications, and have not been routinely collected by healthcare providers. The widespread implementation of electronic health records (EHRs) offers opportunities to assess patient-centered outcomes within the routine healthcare delivery system. The objective of this study was to test the feasibility and accuracy of identifying patient-centered outcomes within the EHR. Methods: Data from patients with localized prostate cancer undergoing prostatectomy were used to develop and test algorithms to accurately identify patient-centered outcomes in post-operative EHRs; we used urinary incontinence as the use case. Standard data mining techniques were used to extract and annotate free text and structured data to assess urinary incontinence recorded within the EHRs. Results: A total of 5,349 prostate cancer patients were identified in our EHR system between 1998 and 2013. Among these EHRs, 30.3% had a text mention of urinary incontinence within 90 days post-operatively, compared with less than 1.0% with a structured data field for urinary incontinence (i.e., an ICD-9 code). Our workflow had good precision and recall for urinary incontinence (positive predictive value: 0.73; sensitivity: 0.84). Discussion: Our data indicate that important patient-centered outcomes, such as urinary incontinence, are being captured in EHRs as free text and highlight the long-standing importance of accurate clinician documentation. Standard data mining algorithms can accurately and efficiently identify these outcomes in existing EHRs; the complete assessment of these outcomes is essential to move practice into the patient-centered realm of healthcare.
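
The reported precision and recall correspond to a standard comparison of text-mined flags against a reference standard such as manual chart review. A small sketch, with made-up patient IDs:

```python
def ppv_sensitivity(predicted, actual):
    """predicted/actual: sets of patient IDs flagged for the outcome.
    Returns (positive predictive value, sensitivity)."""
    tp = len(predicted & actual)                       # true positives
    ppv = tp / len(predicted) if predicted else 0.0    # precision
    sensitivity = tp / len(actual) if actual else 0.0  # recall
    return ppv, sensitivity

predicted = {1, 2, 3, 4}  # patients the text-mining workflow flagged
actual = {2, 3, 4, 5}     # patients with incontinence on chart review
print(ppv_sensitivity(predicted, actual))  # (0.75, 0.75)
```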


Proceedings of the 2011 Workshop on Data Mining for Medicine and Healthcare | 2011

Using semi-parametric clustering applied to electronic health record time series data

Suzanne Tamang; Simon Parsons

We describe a flexible framework for biomedical time series clustering that aims to facilitate the use of temporal information derived from EHRs in a meaningful way. As a case study, we use a dataset indicating the presence of physician-ordered glucose tests for a population of hospitalized patients and aim to group individuals with similar disease status. Our approach pairs Hidden Markov Models (HMMs), which abstract variable-length temporal information, with non-parametric spectral clustering to reveal inherent group structure. We focus on systematically comparing the performance of our approach with two alternative clustering methods that use various time series statistics instead of HMM-based temporal features. Intrinsic evaluation of cluster quality shows a dramatic improvement using the HMM-based feature set, generating clusters that indicate more than 90% of patients are similar to members of their own cluster and distinct from patients in neighboring clusters.
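
One plausible instantiation of this pipeline, assuming hmmlearn and scikit-learn: fit a small HMM per patient series, build a symmetrized per-observation log-likelihood affinity between series, and cluster the affinity matrix spectrally. The state count, affinity transform, and toy data are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM
from sklearn.cluster import SpectralClustering

def fit_hmm(series, n_states=2):
    """Fit a small Gaussian HMM to one patient's 1-D time series."""
    X = np.asarray(series, dtype=float).reshape(-1, 1)
    model = GaussianHMM(n_components=n_states, n_iter=50, random_state=0)
    model.fit(X)
    return model

def cluster_series(all_series, n_clusters=2):
    models = [fit_hmm(s) for s in all_series]
    n = len(all_series)
    loglik = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            Xj = np.asarray(all_series[j], dtype=float).reshape(-1, 1)
            # how well model i explains series j, per observation
            loglik[i, j] = models[i].score(Xj) / len(all_series[j])
    sim = (loglik + loglik.T) / 2.0      # symmetrize
    affinity = np.exp(sim - sim.max())   # map to positive affinities
    sc = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                            random_state=0)
    return sc.fit_predict(affinity)

series = [[5, 6, 5, 7, 6, 5, 6, 7], [6, 5, 6, 6, 7, 5, 6, 5],
          [90, 95, 92, 97, 93, 96, 91, 94], [88, 91, 94, 90, 92, 89, 93, 95]]
print(cluster_series(series))  # expected: first two grouped, last two grouped
```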


Proceedings of the 1st International Workshop on Search and Mining Entity-Relationship Data | 2011

Adding smarter systems instead of human annotators: re-ranking for system combination

Suzanne Tamang; Heng Ji

Using a Knowledge Base Population (KBP) slot filling task as a case study, we describe a re-ranking framework in the context of two experimental settings: (1) high transparency, in which a few pipelines share similar resources and can provide the developer with detailed intermediate answer results; and (2) low transparency, in which many systems use diverse resources and serve as black boxes, with no intermediate system results available. In both settings, our results show that statistical re-ranking can effectively combine automated systems, achieving better performance than the best state-of-the-art individual system (a 6.6% absolute improvement in F-score) and alternative combination methods. Furthermore, to create labeled data for system development and assessment, information extraction tasks often require expensive human annotators to sift through the vast amounts of information contained in a large-scale corpus. In this paper, we demonstrate the impact of our learning-to-rank framework for combining output from multiple slot filling systems to populate entity-attribute facts in a knowledge base. We show that our approach can be used to create answer keys more efficiently and at lower cost (a 63.5% reduction) than laborious human annotation.
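
The cost saving comes from pooling: assessors judge only the distinct answers proposed by the combined systems rather than reading the whole corpus. A minimal sketch, with hypothetical query and slot names:

```python
def pooled_answers(system_outputs):
    """system_outputs: list of {(query, slot): {answers}} dicts, one per
    system. Returns the distinct (query, slot, answer) triples that an
    assessor would need to judge to build an answer key."""
    pool = set()
    for output in system_outputs:
        for (query, slot), answers in output.items():
            for ans in answers:
                pool.add((query, slot, ans))
    return pool

sys_a = {("Obama", "employee_of"): {"US government", "Harvard"}}
sys_b = {("Obama", "employee_of"): {"US government"}}
pool = pooled_answers([sys_a, sys_b])
print(len(pool))  # 2 distinct triples to judge, not 3 raw submissions
```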


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2011

A toolkit for knowledge base population

Zheng Chen; Suzanne Tamang; Adam Lee; Heng Ji

The main goal of knowledge base population (KBP) is to distill entity information (e.g., facts of a person) from multiple unstructured and semi-structured data sources, and incorporate the information into a knowledge base (KB). In this work, we intend to release an open source KBP toolkit that is publicly available for research purposes.


BMJ Open | 2017

Predicting patient ‘cost blooms’ in Denmark: a longitudinal population-based study

Suzanne Tamang; Arnold Milstein; Henrik Toft Sørensen; Lars Pedersen; Lester W. Mackey; Jean-Raymond Betterton; Lucas Janson; Nigam H. Shah

Objectives To compare the ability of standard versus enhanced models to predict future high-cost patients, especially those who move from a lower to the upper decile of per capita healthcare expenditures within 1 year—that is, ‘cost bloomers’. Design We developed alternative models to predict being in the upper decile of healthcare expenditures in year 2 of a sample, based on data from year 1. Our 6 alternative models ranged from a standard cost-prediction model with 4 variables (ie, traditional model features), to our largest enhanced model with 1053 non-traditional model features. To quantify any increases in predictive power that enhanced models achieved over standard tools, we compared the prospective predictive performance of each model. Participants and Setting We used the population of Western Denmark between 2004 and 2011 (2 146 801 individuals) to predict future high-cost patients and characterise high-cost patient subgroups. Using the most recent 2-year period (2010–2011) for model evaluation, our whole-population model used a cohort of 1 557 950 individuals with a full year of active residency in year 1 (2010). Our cost-bloom model excluded the 155 795 individuals who were already high cost at the population level in year 1, resulting in 1 402 155 individuals for prediction of cost bloomers in year 2 (2011). Primary outcome measures Using unseen data from a future year, we evaluated each model's prospective predictive performance by calculating the ratio of predicted high-cost patient expenditures to the actual high-cost patient expenditures in Year 2—that is, cost capture. Results Our best enhanced model achieved a 21% and 30% improvement in cost capture over a standard diagnosis-based model for predicting population-level high-cost patients and cost bloomers, respectively. Conclusions In combination with modern statistical learning methods for analysing large data sets, models enhanced with a large and diverse set of features led to better performance—especially for predicting future cost bloomers.
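
The cost-capture metric can be sketched directly from its definition: the year-2 spending of the top-k patients ranked by the model, divided by the year-2 spending of the actual top-k spenders. The scores and costs below are illustrative toy values.

```python
import numpy as np

def cost_capture(risk_scores, year2_costs, k):
    """Ratio of year-2 spending captured by the model's top-k picks to
    the spending of the true top-k (e.g., the actual high-cost decile)."""
    risk_scores = np.asarray(risk_scores)
    year2_costs = np.asarray(year2_costs)
    predicted_top = np.argsort(-risk_scores)[:k]  # model's top-k patients
    actual_top = np.argsort(-year2_costs)[:k]     # true top-k spenders
    return year2_costs[predicted_top].sum() / year2_costs[actual_top].sum()

scores = [0.9, 0.2, 0.8, 0.1, 0.5]
costs = [100, 10, 50, 5, 80]
print(cost_capture(scores, costs, k=2))  # (100+50)/(100+80) ~= 0.83
```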


JAMA Internal Medicine | 2018

Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data

Milena A. Gianfrancesco; Suzanne Tamang; Jinoos Yazdany; Gabriela Schmajuk

A promise of machine learning in health care is the avoidance of biases in diagnosis and treatment; a computer algorithm could objectively synthesize and interpret the data in the medical record. Integration of machine learning with clinical decision support tools, such as computerized alerts or diagnostic support, may offer physicians and others who provide health care targeted and timely information that can improve clinical decisions. Machine learning algorithms, however, may also be subject to biases. The biases include those related to missing data and patients not identified by algorithms, sample size and underestimation, and misclassification and measurement error. There is concern that biases and deficiencies in the data used by machine learning algorithms may contribute to socioeconomic disparities in health care. This Special Communication outlines the potential biases that may be introduced into machine learning–based clinical decision support tools that use electronic health record data and proposes potential solutions to the problems of overreliance on automation, algorithms based on biased data, and algorithms that do not provide information that is clinically meaningful. Existing health care disparities should not be amplified by thoughtless or excessive reliance on machines.

Collaboration


Suzanne Tamang's top co-authors and their affiliations:

Heng Ji (Rensselaer Polytechnic Institute)
Zheng Chen (City University of New York)
Adam Lee (City University of New York)
Reinharth D (Long Island Jewish Medical Center)
Qi Li (City University of New York)