
Publication


Featured research published by Yoni Halpern.


Journal of the American Medical Informatics Association | 2016

Electronic medical record phenotyping using the anchor and learn framework

Yoni Halpern; Steven Horng; Youngduck Choi; David Sontag

BACKGROUND Electronic medical records (EMRs) hold a tremendous amount of information about patients that is relevant to determining the optimal approach to patient care. As medicine becomes increasingly precise, a patient's electronic medical record phenotype will play an important role in triggering clinical decision support systems that can deliver personalized recommendations in real time. Learning with anchors presents a method of efficiently learning statistically driven phenotypes with minimal manual intervention.

MATERIALS AND METHODS We developed a phenotype library that uses both structured and unstructured data from the EMR to represent patients for real-time clinical decision support. Eight of the phenotypes were evaluated using retrospective EMR data on emergency department patients against a set of prospectively gathered gold-standard labels.

RESULTS We built a phenotype library with 42 publicly available phenotype definitions. Using information from triage time, the phenotype classifiers have an area under the ROC curve (AUC) of 0.89 for infection, 0.88 for cancer, 0.85 for immunosuppressed, 0.93 for septic shock, 0.87 for nursing home, 0.83 for anticoagulated, 0.89 for cardiac etiology, and 0.90 for pneumonia. Using information available at the time of disposition from the emergency department, the AUC values are 0.91 for infection, 0.95 for cancer, 0.90 for immunosuppressed, 0.97 for septic shock, 0.91 for nursing home, 0.94 for anticoagulated, 0.92 for cardiac etiology, and 0.97 for pneumonia.

DISCUSSION The resulting phenotypes are interpretable, fast to build, and perform comparably to statistically learned phenotypes developed with 5000 manually labeled patients.

CONCLUSION Learning with anchors is an attractive option for building a large public repository of phenotype definitions that can be used for a range of health IT applications, including real-time decision support.
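The anchor-and-learn framework described in the abstract above can be sketched in miniature: an anchor (a highly specific clue, such as a particular medication or code) supplies noisy surrogate labels, and a classifier is trained on the remaining words with the anchor itself censored, so it learns to recognize the phenotype even when the anchor is absent. Everything below, including the corpus, the `abx` anchor token, and the log-odds scorer, is an illustrative stand-in, not the paper's actual pipeline.

```python
import math
from collections import Counter

# Hypothetical anchor token standing in for "infection"; real anchors are
# clinician-chosen EMR signals such as medication orders or ICD codes.
ANCHOR = "abx"

def surrogate_labels(notes, anchor=ANCHOR):
    """Step 1: notes containing the anchor become noisy positives;
    all other notes are treated as negatives/unlabeled."""
    return [1 if anchor in note.split() else 0 for note in notes]

def train_word_scores(notes, labels, smoothing=1.0):
    """Step 2: learn per-word log-odds from the surrogate labels,
    censoring the anchor so the model must generalize beyond it."""
    pos, neg = Counter(), Counter()
    for note, y in zip(notes, labels):
        for w in set(note.split()):
            if w == ANCHOR:
                continue  # censor the anchor itself
            (pos if y else neg)[w] += 1
    n_pos = sum(labels) + smoothing
    n_neg = len(labels) - sum(labels) + smoothing
    vocab = set(pos) | set(neg)
    return {w: math.log((pos[w] + smoothing) / n_pos)
              - math.log((neg[w] + smoothing) / n_neg)
            for w in vocab}

def score(note, weights):
    """Sum of learned log-odds over the note's distinct words."""
    return sum(weights.get(w, 0.0) for w in set(note.split()))

notes = [
    "fever cough abx",
    "fever chills abx",
    "ankle sprain ice",
    "knee pain ice",
]
labels = surrogate_labels(notes)
weights = train_word_scores(notes, labels)

# A new note WITHOUT the anchor still scores higher when it shares
# infection-related vocabulary with the anchored notes.
print(score("fever cough", weights) > score("ankle pain", weights))  # prints True
```

The key property shown here is that the anchor is used only to label, never as a feature, which is what lets the learned phenotype fire on patients whose records lack the anchor.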


PLOS ONE | 2017

Creating an automated trigger for sepsis clinical decision support at emergency department triage using machine learning

Steven Horng; David Sontag; Yoni Halpern; Yacine Jernite; Nathan I. Shapiro; Larry A. Nathanson

Objective To demonstrate the incremental benefit of using free text data in addition to vital sign and demographic data to identify patients with suspected infection in the emergency department.

Methods This was a retrospective, observational cohort study performed at a tertiary academic teaching hospital. All consecutive ED patient visits between 12/17/08 and 2/17/13 were included. No patients were excluded. The primary outcome measure was infection diagnosed in the emergency department, defined as a patient having an infection-related ED ICD-9-CM discharge diagnosis. Patients were randomly allocated to train (64%), validate (20%), and test (16%) data sets. After preprocessing the free text using bigram and negation detection, we built four models to predict infection, incrementally adding vital signs, chief complaint, and free text nursing assessment. We used two different methods to represent free text: a bag-of-words model and a topic model. We then used a support vector machine to build the prediction model. We calculated the area under the receiver operating characteristic curve to compare the discriminatory power of each model.

Results A total of 230,936 patient visits were included in the study. Approximately 14% of patients had the primary outcome of diagnosed infection. The area under the ROC curve (AUC) for the vitals model, which used only vital signs and demographic data, was 0.67 for the training data set, 0.67 for the validation data set, and 0.67 (95% CI 0.65–0.69) for the test data set. The AUC for the chief complaint model, which also included demographic and vital sign data, was 0.84 for the training data set, 0.83 for the validation data set, and 0.83 (95% CI 0.81–0.84) for the test data set. The best performing methods made use of all of the free text. In particular, the AUC for the bag-of-words model was 0.89 for the training data set, 0.86 for the validation data set, and 0.86 (95% CI 0.85–0.87) for the test data set. The AUC for the topic model was 0.86 for the training data set, 0.86 for the validation data set, and 0.85 (95% CI 0.84–0.86) for the test data set.

Conclusion Compared to previous work that only used structured data such as vital signs and demographic information, utilizing free text drastically improves the discriminatory ability (increase in AUC from 0.67 to 0.86) of identifying infection.
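The "bigram and negation detection" preprocessing the Methods mention can be sketched as a simple NegEx-style pass that prefixes tokens in a short window after a negation cue, then emits unigrams and bigrams as bag-of-words features. The cue list and three-token scope below are illustrative simplifications of real clinical negation detectors.

```python
# Minimal negation cue list; real systems use the much larger NegEx lexicon.
NEGATION_CUES = {"no", "denies", "without"}

def preprocess(text, window=3):
    """Mark up to `window` tokens after a negation cue with a neg_ prefix,
    then return unigram + bigram features for a bag-of-words model."""
    tokens, out, neg_left = text.lower().split(), [], 0
    for t in tokens:
        if t in NEGATION_CUES:
            neg_left = window  # open a negation scope
            continue
        out.append(("neg_" + t) if neg_left > 0 else t)
        neg_left = max(0, neg_left - 1)
    # unigrams plus adjacent-pair bigrams
    return out + [f"{a}_{b}" for a, b in zip(out, out[1:])]

# "denies fever" must not look like "fever" to the classifier:
print(preprocess("denies fever has cough"))
```

Distinguishing `neg_fever` from `fever` is what keeps documented rule-outs ("denies fever") from being counted as evidence of infection by the downstream support vector machine.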


Scientific Reports | 2017

Learning a Health Knowledge Graph from Electronic Medical Records

Maya Rotmensch; Yoni Halpern; Abdulhakim Tlimat; Steven Horng; David Sontag

Demand for clinical decision support systems in medicine and self-diagnostic symptom checkers has substantially increased in recent years. Existing platforms rely on knowledge bases manually compiled through a labor-intensive process or automatically derived using simple pairwise statistics. This study explored an automated process to learn high-quality knowledge bases linking diseases and symptoms directly from electronic medical records. Medical concepts were extracted from 273,174 de-identified patient records, and maximum likelihood estimation of three probabilistic models (logistic regression, a naive Bayes classifier, and a Bayesian network with noisy-OR gates) was used to automatically construct knowledge graphs. A graph of disease-symptom relationships was elicited from the learned parameters, and the constructed knowledge graphs were evaluated and validated, with permission, against Google's manually constructed knowledge graph and against expert physician opinions. Our study shows that direct and automated construction of high-quality health knowledge graphs from medical records using rudimentary concept extraction is feasible. The noisy-OR model produces a high-quality knowledge graph, reaching a precision of 0.85 at a recall of 0.6 in the clinical evaluation. Noisy OR significantly outperforms all tested models across evaluation frameworks (p < 0.01).


Knowledge Discovery and Data Mining | 2017

Learning to Count Mosquitoes for the Sterile Insect Technique

Yaniv Ovadia; Yoni Halpern; Dilip Krishnan; Josh Livni; Daniel Newburger; Ryan Poplin; Tiantian Zha; D. Sculley

Mosquito-borne illnesses such as dengue, chikungunya, and Zika are major global health problems, which are not yet addressable with vaccines and must be countered by reducing mosquito populations. The Sterile Insect Technique (SIT) is a promising alternative to pesticides; however, effective SIT relies on minimal releases of female insects. This paper describes a multi-objective convolutional neural net to significantly streamline the process of counting male and female mosquitoes released from a SIT factory and provides a statistical basis for verifying strict contamination rate limits from these counts despite measurement noise. These results are a promising indication that such methods may dramatically reduce the cost of effective SIT methods in practice.


bioRxiv | 2017

Contextual Autocomplete: A Novel User Interface Using Machine Learning to Improve Ontology Usage and Structured Data Capture for Presenting Problems in the Emergency Department

Nathaniel R Greenbaum; Yacine Jernite; Yoni Halpern; Shelley Calder; Larry A. Nathanson; David Sontag; Steven Horng

Objective To determine the effect of contextual autocomplete, a user interface that uses machine learning, on the efficiency and quality of documentation of presenting problems (chief complaints) in the emergency department (ED).

Materials and Methods We used contextual autocomplete, a user interface that ranks concepts by their predicted probability, to help nurses enter data about a patient's reason for visiting the ED. Predicted probabilities were calculated using a previously derived model based on triage vital signs and a brief free-text note. We evaluated the percentage and quality of structured data captured using a prospective before-and-after study design.

Results A total of 279,231 patient encounters were analyzed. Structured data capture improved from 26.2% to 97.2% (p<0.0001). During the post-implementation period, presenting problems were more complete (3.35 vs. 3.66; p=0.0004), similarly precise (3.59 vs. 3.74; p=0.1), and higher in overall quality (3.38 vs. 3.72; p=0.0002). Our system reduced the mean number of keystrokes required to document a presenting problem from 11.6 to 0.6 (p<0.0001), a 95% reduction.

Discussion We have demonstrated a technique that captures structured data on nearly all patients. We estimate that our system reduces the number of man-hours required annually to type presenting problems at our institution from 92.5 hours to 4.8 hours.

Conclusion Implementation of a contextual autocomplete system resulted in improved structured data capture, ontology usage compliance, and data quality.
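The core ranking step of contextual autocomplete can be sketched as filtering an ontology by the typed prefix and sorting by context-conditional probability. The probabilities below are placeholders for what the triage-based model would emit, and the function name is ours; with a well-calibrated model the top suggestion is often correct before any typing, which is how mean keystrokes can fall to 0.6:

```python
def contextual_autocomplete(prefix, concept_probs):
    """Return ontology concepts matching the typed prefix, ranked by
    context-conditional probability (highest first)."""
    matches = [c for c in concept_probs if c.startswith(prefix.lower())]
    return sorted(matches, key=lambda c: -concept_probs[c])

# Hypothetical model output for a febrile, tachycardic triage patient.
probs = {"fever": 0.30, "femur fracture": 0.02, "fatigue": 0.10}

# Before any keystroke, the context alone already ranks suggestions:
print(contextual_autocomplete("", probs))
# After typing "fe", non-matching concepts are filtered out:
print(contextual_autocomplete("fe", probs))
```

Because the candidate list is always drawn from the ontology, every accepted suggestion is a structured, coded concept, which is what drives the ontology-compliance gains reported above.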


bioRxiv | 2017

Derivation and Validation of a Record Linkage Algorithm between EMS and the Emergency Department

Colby Redfield; Abdulhakim Tlimat; Yoni Halpern; David Schoenfeld; Edward Ullman; David Sontag; Larry A. Nathanson; Steven Horng

Background Linking EMS electronic patient care reports (ePCRs) to ED records can provide clinicians access to vital information that can alter management. It can also create rich databases for research and quality improvement. Unfortunately, previous attempts at ePCR-to-ED record linkage have had limited success.

Objective To derive and validate an automated record linkage algorithm between EMS ePCRs and ED records using supervised machine learning.

Methods All consecutive ePCRs from a single EMS provider between June 2013 and June 2015 were included. A primary reviewer matched ePCRs to a list of ED patients to create a gold standard. Age, gender, last name, first name, social security number (SSN), and date of birth (DOB) were extracted. Data were randomly split into 80%/20% training and test data sets. We derived missing indicators, identical indicators, edit distances, and percent differences. A multivariate logistic regression model was trained using 5-fold cross-validation (label k-fold), L2 regularization, and class re-weighting.

Results A total of 14,032 ePCRs were included in the study. Inter-rater reliability between the primary and secondary reviewers had a Kappa of 0.9. The algorithm had a sensitivity of 99.4%, a PPV of 99.9%, and an AUC of 0.99 in both the training and test sets. DOB match had the highest odds ratio of 16.9, followed by last name match (10.6). SSN match had an odds ratio of 3.8.

Conclusions We were able to successfully derive and validate a probabilistic record linkage algorithm from a single EMS ePCR provider to our hospital EMR.
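The feature derivation the Methods describe (per-field missing indicators, identical indicators, and edit distances) can be sketched directly. The field names and records below are hypothetical, and the length-normalized distance is one plausible reading of the "percent difference" features:

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,        # deletion
                           cur[-1] + 1,        # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def link_features(ems, ed):
    """Per-field linkage features for one candidate ePCR/ED record pair:
    missing indicator, exact-match indicator, normalized edit distance."""
    feats = {}
    for field in ("last_name", "dob", "ssn"):
        a, b = ems.get(field), ed.get(field)
        feats[field + "_missing"] = int(a is None or b is None)
        if a is None or b is None:
            feats[field + "_match"] = 0
            feats[field + "_dist"] = 1.0  # maximal distance when unknown
        else:
            feats[field + "_match"] = int(a == b)
            feats[field + "_dist"] = edit_distance(a, b) / max(len(a), len(b), 1)
    return feats

# A near-match pair: typo in last name, identical DOB, SSN missing on one side.
f = link_features({"last_name": "smith", "dob": "1980-01-02", "ssn": None},
                  {"last_name": "smyth", "dob": "1980-01-02", "ssn": "123456789"})
print(f["dob_match"], f["ssn_missing"], round(f["last_name_dist"], 2))
```

A logistic regression trained on such feature vectors then outputs a match probability per candidate pair, which is the probabilistic linkage the Conclusions refer to.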


American Medical Informatics Association Annual Symposium | 2014

Using Anchors to Estimate Clinical State without Labeled Data.

Yoni Halpern; Youngduck Choi; Steven Horng; David Sontag


arXiv: Machine Learning | 2016

Clinical Tagging with Joint Probabilistic Models

Yoni Halpern; Steven Horng; David Sontag


arXiv: Machine Learning | 2015

Anchored Discrete Factor Analysis.

Yoni Halpern; Steven Horng; David Sontag


arXiv: Machine Learning | 2017

No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World

Shreya Shankar; Yoni Halpern; Eric Breck; James Atwood; Jimbo Wilson; D. Sculley

Collaboration


Dive into Yoni Halpern's collaborations.

Top Co-Authors

Steven Horng, Beth Israel Deaconess Medical Center
Larry A. Nathanson, Beth Israel Deaconess Medical Center
Abdulhakim Tlimat, Beth Israel Deaconess Medical Center
Ankur Moitra, Massachusetts Institute of Technology
Colby Redfield, Beth Israel Deaconess Medical Center