
Publication


Featured research published by John Zech.


Radiology | 2018

Natural Language–based Machine Learning Models for the Annotation of Clinical Radiology Reports

John Zech; Margaret Pain; J. Titano; Marcus A. Badgeley; Javin Schefflein; Andres Su; Anthony B. Costa; Joshua B. Bederson; Joseph Lehar; Eric K. Oermann

Purpose: To compare different methods for generating features from radiology reports and to develop a method to automatically identify findings in these reports.

Materials and Methods: In this study, 96,303 head computed tomography (CT) reports were obtained. The linguistic complexity of these reports was compared with that of alternative corpora. Head CT reports were preprocessed, and machine-analyzable features were constructed by using bag-of-words (BOW), word embedding, and latent Dirichlet allocation-based approaches. Ultimately, 1,004 head CT reports were manually labeled for findings of interest by physicians, and a subset of these were deemed critical findings. Lasso logistic regression was used to train models for physician-assigned labels on 602 of the 1,004 head CT reports (60%) using the constructed features, and the performance of these models was validated on the held-out 402 reports (40%). Models were scored by area under the receiver operating characteristic curve (AUC), and aggregate AUC statistics were reported for (a) all labels, (b) critical labels, and (c) the presence of any critical finding in a report. Sensitivity, specificity, accuracy, and F1 score were reported for the best-performing model's (a) predictions of all labels and (b) identification of reports containing critical findings.

Results: The best-performing model (BOW with unigrams, bigrams, and trigrams plus average word-embedding vector) had a held-out AUC of 0.966 for identifying the presence of any critical head CT finding and an average AUC of 0.957 across all head CT findings. Sensitivity and specificity for identifying the presence of any critical finding were 92.59% (175 of 189) and 89.67% (191 of 213), respectively. Average sensitivity and specificity across all findings were 90.25% (1,898 of 2,103) and 91.72% (18,351 of 20,007), respectively. Simpler BOW methods achieved results competitive with those of more sophisticated approaches, with an average AUC for the presence of any critical finding of 0.951 for unigram BOW versus 0.966 for the best-performing model. The Yule's I of the head CT corpus was 34, markedly lower than that of the Reuters corpus (103) or i2b2 discharge summaries (271), indicating lower linguistic complexity.

Conclusion: Automated methods can be used to identify findings in radiology reports. The success of this approach benefits from the standardized language of these reports. With this method, a large labeled corpus can be generated for applications such as deep learning.
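The feature-construction and modeling pipeline described above lends itself to a short illustration. The sketch below is not the authors' code; it uses scikit-learn to build unigram-through-trigram bag-of-words features, fit an L1-penalized ("lasso") logistic regression, and score a held-out split by AUC. The report texts and labels are toy placeholders, and the 60/40 split only mirrors the proportions described in the abstract.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Toy stand-ins for head CT report impressions and their labels.
reports = [
    "no acute intracranial hemorrhage or mass effect",
    "acute subdural hematoma with midline shift",
    "chronic microvascular ischemic changes, no acute findings",
    "large right mca territory infarct with edema",
    "unremarkable noncontrast head ct",
    "acute intraparenchymal hemorrhage in the left basal ganglia",
] * 20                                  # repeated so the toy split is non-trivial
labels = [0, 1, 0, 1, 0, 1] * 20        # 1 = critical finding present

# Unigram, bigram, and trigram bag-of-words features.
vectorizer = CountVectorizer(ngram_range=(1, 3))
X = vectorizer.fit_transform(reports)

# 60% train / 40% held-out, mirroring the proportions in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.4, random_state=0, stratify=labels
)

# L1-penalized ("lasso") logistic regression.
model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
model.fit(X_train, y_train)

probs = model.predict_proba(X_test)[:, 1]
print("held-out AUC:", roc_auc_score(y_test, probs))
```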


Nature Medicine | 2018

Automated deep-neural-network surveillance of cranial images for acute neurologic events

J. Titano; Marcus A. Badgeley; Javin Schefflein; Margaret Pain; Andres Su; Michael Cai; Nathaniel C. Swinburne; John Zech; Jun Kim; Joshua B. Bederson; J Mocco; Burton P. Drayer; Joseph Lehar; Samuel K. Cho; Anthony B. Costa; Eric K. Oermann

Rapid diagnosis and treatment of acute neurological illnesses such as stroke, hemorrhage, and hydrocephalus are critical to achieving positive outcomes and preserving neurologic function: 'time is brain' [1-5]. Although these disorders are often recognizable by their symptoms, the critical means of their diagnosis is rapid imaging [6-10]. Computer-aided surveillance of acute neurologic events in cranial imaging has the potential to triage radiology workflow, thus decreasing time to treatment and improving outcomes. Substantial clinical work has focused on computer-assisted diagnosis (CAD), whereas technical work in volumetric image analysis has focused primarily on segmentation. 3D convolutional neural networks (3D-CNNs) have primarily been used for supervised classification on 3D modeling and light detection and ranging (LiDAR) data [11-15]. Here, we demonstrate a 3D-CNN architecture that performs weakly supervised classification to screen head CT images for acute neurologic events. Features were automatically learned from a clinical radiology dataset comprising 37,236 head CTs annotated with a semisupervised natural-language processing (NLP) framework [16]. We demonstrate the effectiveness of our approach to triage radiology workflow and accelerate the time to diagnosis from minutes to seconds through a randomized, double-blinded, prospective trial in a simulated clinical environment.

In summary, a deep-learning algorithm is developed to provide rapid and accurate diagnosis of clinical 3D head CT-scan images to triage and prioritize urgent neurological events, thus potentially accelerating time to diagnosis and care in clinical settings.
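As a rough illustration of the volume-level (weakly supervised) classification setup described above, the PyTorch sketch below maps a whole head-CT volume to a single probability of an acute finding. The layer sizes, input shape, and pooling scheme are illustrative assumptions, not the architecture published in the paper.

```python
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    """Toy 3D-CNN: a whole CT volume in, one study-level probability out."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),   # pool the whole volume to one feature vector
        )
        self.classifier = nn.Linear(16, 1)

    def forward(self, x):
        # x: (batch, 1, depth, height, width) CT volume
        h = self.features(x).flatten(1)
        return torch.sigmoid(self.classifier(h))

model = Tiny3DCNN()
volume = torch.randn(2, 1, 32, 64, 64)   # toy batch of two volumes
print(model(volume).shape)               # (2, 1) study-level probabilities
```

Only the volume-level label ("acute finding present or not") supervises training here, which is what makes the classification weakly supervised: no slice- or voxel-level annotation is required.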


Applied Clinical Informatics | 2016

Measuring the Degree of Unmatched Patient Records in a Health Information Exchange Using Exact Matching

John Zech; Gregg Husk; Thomas Moore; Jason S. Shapiro

Background: Health information exchange (HIE) facilitates the exchange of patient information across different healthcare organizations. To match patient records across sites, HIEs usually rely on a master patient index (MPI), a database responsible for determining which medical records at different healthcare facilities belong to the same patient. A single patient's records may be improperly split across multiple profiles in the MPI.

Objectives: We investigated how often two individuals shared the same first name, last name, and date of birth in the Social Security Death Master File (SSDMF), a US government database containing over 85 million individuals, to determine the feasibility of using exact matching as a split-record detection tool. We demonstrated how a method based on exact record matching could be used to partially measure the degree of probable split patient records in the MPI of an HIE.

Methods: We calculated the percentage of individuals who were uniquely identified in the SSDMF using first name, last name, and date of birth. We defined a measure consisting of the average number of unique identifiers associated with a given first name, last name, and date of birth. We calculated a reference value for this measure on a subsample of SSDMF data and compared it to data from a functioning HIE.

Results: We found that it was unlikely for two individuals to share the same first name, last name, and date of birth in a large US database including over 85 million individuals: 98.81% of individuals were uniquely identified in this dataset using only these three items. We compared the value of our measure on a subsample of Social Security data (1.00089) to that of HIE data (1.1238) and found a significant difference (t-test p-value < 0.001).

Conclusions: This method may assist HIEs in detecting split patient records.
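The exact-matching measure is simple to express in code. Below is a minimal sketch, not the authors' implementation: it groups toy records on the (first name, last name, date of birth) key, reports the share of individuals uniquely identified by that key, and computes the average number of distinct identifiers per key, the statistic compared between the SSDMF subsample and the HIE above.

```python
from collections import defaultdict

# Toy records: (first name, last name, date of birth, unique identifier).
records = [
    ("JOHN", "SMITH", "1950-01-01", "A1"),
    ("JOHN", "SMITH", "1950-01-01", "A2"),   # two individuals share this key
    ("MARY", "JONES", "1948-07-12", "B1"),
    ("ALAN", "TURING", "1912-06-23", "C1"),
]

# Group distinct identifiers by the exact-match key.
ids_by_key = defaultdict(set)
for first, last, dob, identifier in records:
    ids_by_key[(first, last, dob)].add(identifier)

# Share of individuals whose key maps to exactly one identifier.
unique_individuals = sum(len(ids) for ids in ids_by_key.values() if len(ids) == 1)
total_individuals = sum(len(ids) for ids in ids_by_key.values())
print("uniquely identified:", 100 * unique_individuals / total_individuals, "%")

# The measure described above: average number of distinct identifiers
# associated with each (first name, last name, date of birth) combination.
measure = total_individuals / len(ids_by_key)
print("avg identifiers per key:", measure)
```

A value of the measure near 1.0 indicates that exact matching on these three fields rarely merges distinct people, while larger values suggest collisions (or, in an MPI, probable split records).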


Bioinformatics | 2018

CANDI: an R package and Shiny app for annotating radiographs and evaluating computer-aided diagnosis

Marcus A. Badgeley; Manway Liu; Benjamin S. Glicksberg; Mark Shervey; John Zech; Khader Shameer; Joseph Lehar; Eric K. Oermann; Michael V. McConnell; Thomas M Snyder; Joel T. Dudley

Motivation: Radiologists have used algorithms for Computer-Aided Diagnosis (CAD) for decades. These algorithms use machine learning with engineered features, and there have been mixed findings on whether they improve radiologists' interpretations. Deep learning offers superior performance but requires more training data and has not been evaluated in joint algorithm-radiologist decision systems.

Results: We developed the Computer-Aided Note and Diagnosis Interface (CANDI) for collaboratively annotating radiographs and evaluating how algorithms alter human interpretation. The annotation app collects classification, segmentation, and image captioning training data, and the evaluation app randomizes the availability of CAD tools to facilitate clinical trials on radiologist enhancement.

Availability and implementation: Demonstrations and source code are hosted at https://candi.nextgenhealthcare.org and https://github.com/mbadge/candi, respectively, under the GPL-3 license.

Supplementary information: Supplementary material is available at Bioinformatics online.
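CANDI itself is an R package with Shiny apps; purely to illustrate the core idea of the evaluation app described above (randomizing whether CAD output is visible for each case a reader interprets), here is a small Python sketch. All case names and the balanced two-arm scheme are hypothetical, not taken from the package.

```python
import random

# Toy list of study identifiers awaiting interpretation in a reader session.
cases = [f"case_{i:03d}" for i in range(8)]

# Balanced randomization: half of the cases are read with CAD output
# visible, half without, in a shuffled order.
random.seed(42)
arms = ["CAD shown", "CAD hidden"] * (len(cases) // 2)
random.shuffle(arms)

for case, arm in zip(cases, arms):
    print(case, "->", arm)
```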


Journal of the American Medical Informatics Association | 2015

Identifying homelessness using health information exchange data

John Zech; Gregg Husk; Thomas Moore; Gilad J. Kuperman; Jason S. Shapiro


arXiv: Computer Vision and Pattern Recognition | 2018

Confounding variables can degrade generalization performance of radiological deep learning models

John Zech; Marcus A. Badgeley; Manway Liu; Anthony B. Costa; J. Titano; Eric K. Oermann


Journal of Vascular and Interventional Radiology | 2018

Safety and Outcomes of Transradial Access in Patients with International Normalized Ratio 1.5 or above

J. Titano; D. Biederman; John Zech; R. Korff; M. Ranade; R. Patel; E. Kim; F. Nowakowski; R. Lookstein; A. Fischman


Annals of Translational Medicine | 2018

Detecting insertion, substitution, and deletion errors in radiology reports using neural sequence-to-sequence models

John Zech; Jessica Forde; J. Titano; Deepak Kaji; Anthony B. Costa; Eric K. Oermann


Journal of Vascular and Interventional Radiology | 2017

Safety and feasibility of transradial access in patients with elevated INR

J. Titano; D. Biederman; John Zech; R. Korff; M. Ranade; R. Patel; N. Tabori; E. Kim; F. Nowakowski; R. Lookstein; A. Fischman


Journal of Vascular and Interventional Radiology | 2017

Effect of transjugular intrahepatic portosystemic shunt creation on spleen volume

M. Syed; John Zech; A. Fischman; N. Tabori; F. Nowakowski; E. Kim; R. Lookstein; R. Patel

Collaboration


Dive into John Zech's collaboration network.

Top Co-Authors

J. Titano (Icahn School of Medicine at Mount Sinai)
Anthony B. Costa (Icahn School of Medicine at Mount Sinai)
Marcus A. Badgeley (Icahn School of Medicine at Mount Sinai)
A. Fischman (Icahn School of Medicine at Mount Sinai)
F. Nowakowski (Icahn School of Medicine at Mount Sinai)
Gregg Husk (Beth Israel Medical Center)
Jason S. Shapiro (Icahn School of Medicine at Mount Sinai)
R. Lookstein (Icahn School of Medicine at Mount Sinai)
R. Patel (Icahn School of Medicine at Mount Sinai)