Scott M. Lundberg | Researchain

Publication

Featured research published by Scott M. Lundberg.

Genome Biology | 2016

ChromNet: Learning the human chromatin network from all ENCODE ChIP-seq data

Scott M. Lundberg; William B. Tu; Brian Raught; Linda Z. Penn; Michael M. Hoffman; Su-In Lee

A cell’s epigenome arises from interactions among regulatory factors—transcription factors and histone modifications—co-localized at particular genomic regions. We developed a novel statistical method, ChromNet, to infer a network of these interactions, the chromatin network, by inferring conditional-dependence relationships among a large number of ChIP-seq data sets. We applied ChromNet to all available 1451 ChIP-seq data sets from the ENCODE Project, and showed that ChromNet revealed previously known physical interactions better than alternative approaches. We experimentally validated one of the previously unreported interactions, MYC–HCFC1. An interactive visualization tool is available at http://chromnet.cs.washington.edu.

bioRxiv | 2017

Explainable machine learning predictions to help anesthesiologists prevent hypoxemia during surgery

Scott M. Lundberg; Bala G. Nair; Monica S. Vavilala; Mayumi Horibe; Michael J. Eisses; Trevor Adams; David E. Liston; Daniel King-Wai Low; Shu-Fang Newman; Jerry Kim; Su-In Lee

Hypoxemia causes serious patient harm, and while anesthesiologists strive to avoid hypoxemia during surgery, they cannot reliably predict which patients will have intraoperative hypoxemia. Using minute-by-minute EMR data from fifty thousand surgeries, we developed and tested a machine-learning-based system called Prescience that predicts real-time hypoxemia risk and presents an explanation of factors contributing to that risk during general anesthesia. Prescience improved anesthesiologists’ performance when providing interpretable hypoxemia risks with contributing factors. The results suggest that if anesthesiologists currently anticipate 15% of events, then with Prescience assistance they could anticipate 30% of events, or an estimated additional 2.4 million annually in the US, a large portion of which may be preventable because they are attributable to modifiable factors. The prediction explanations are broadly consistent with the literature and anesthesiologists’ prior knowledge. Prescience can also improve clinical understanding of hypoxemia risk during anesthesia by providing general insights into the exact changes in risk induced by certain patient or procedure characteristics. Making predictions of complex medical machine learning models (such as Prescience) interpretable has broad applicability to other data-driven prediction tasks in medicine.

Nature Biomedical Engineering | 2018

Explainable machine-learning predictions for the prevention of hypoxaemia during surgery

Scott M. Lundberg; Bala G. Nair; Monica S. Vavilala; Mayumi Horibe; Michael J. Eisses; Trevor Adams; David E. Liston; Daniel King-Wai Low; Shu-Fang Newman; Jerry Kim; Su-In Lee

Although anaesthesiologists strive to avoid hypoxaemia during surgery, reliably predicting future intraoperative hypoxaemia is not possible at present. Here, we report the development and testing of a machine-learning-based system that predicts the risk of hypoxaemia and provides explanations of the risk factors in real time during general anaesthesia. The system, which was trained on minute-by-minute data from the electronic medical records of over 50,000 surgeries, improved the performance of anaesthesiologists by providing interpretable hypoxaemia risks and contributing factors. The explanations for the predictions are broadly consistent with the literature and with prior knowledge from anaesthesiologists. Our results suggest that if anaesthesiologists currently anticipate 15% of hypoxaemia events, with the assistance of this system they could anticipate 30%, a large portion of which may benefit from early intervention because they are associated with modifiable factors. The system can help improve the clinical understanding of hypoxaemia risk during anaesthesia care by providing general insights into the exact changes in risk induced by certain characteristics of the patient or procedure. An alert system based on machine learning and trained on surgical data from electronic medical records helps anaesthesiologists prevent hypoxaemia during surgery by providing interpretable real-time predictions.

bioRxiv | 2018

AIControl: Replacing matched control experiments with machine learning improves ChIP-seq peak identification

Naozumi Hiranuma; Scott M. Lundberg; Su-In Lee

Motivation: Accurately identifying the binding sites of regulatory proteins remains a central and unresolved challenge in molecular biology. The most commonly used experimental technique to determine binding locations of transcription factors is chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq). Because ChIP-seq is highly susceptible to background noise, the current practice obtains one matched “control” ChIP-seq dataset and estimates position-wise background distributions using ChIP-seq signals from nearby positions (e.g., within 5,000-10,000 bp). This approach poses the following four problems. (1) Incorporating such a large window of nearby positions may cause an inaccurate estimation of position-specific background distributions. (2) There are multiple ways to obtain control ChIP-seq datasets, but current peak calling methods constrain users to only one. (3) A single matched control dataset may not capture all sources of noise signals. (4) Generating a matched control dataset incurs additional time and cost. Methods: We introduce the AIControl framework, which replaces matched control experiments by automatically learning weighted contributions of a large number of publicly available control ChIP-seq datasets to generate position-specific background noise distributions. We thereby avoid the cost of running control experiments while simultaneously increasing binding location accuracy. Specifically, AIControl can (1) obtain a precise, position-specific background distribution (i.e., no need to use a window), (2) use machine learning to systematically select the most appropriate set of control datasets in a data-driven way, (3) capture noise sources that may be missed by one matched control dataset, and (4) remove the need for costly and time-consuming matched control experiments.
Results: We applied AIControl to 410 ChIP-seq datasets from tier 1 or 2 cell lines in the Encyclopedia of DNA Elements (ENCODE) database whose transcription factors have motif information. AIControl used 455 control ChIP-seq datasets from 107 cell lines and 9 laboratories to impute background ChIP-seq noise signals. Without using matched control datasets, AIControl identified peaks that were more enriched for putative binding sites than those identified by other popular peak callers that used a matched control dataset. Additionally, our framework reduced the reproducibility between two ChIP-seq datasets whose associated transcription factors had no documented interactions, which suggests that AIControl better removes reproducible confounding effects. Finally, we demonstrated that our framework improves the quality of downstream analysis by showing that binding sites it identified recover documented protein interactions more accurately. Conclusion: AIControl removes the need to generate additional matched control data and accurately predicts binding events even when testing on cell lines not represented in ENCODE’s control datasets. The implementation can be accessed at https://github.com/suinleelab/AIControl.
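
The idea of "learning weighted contributions" of many control datasets can be illustrated with a small sketch. It is not the AIControl implementation. It assumes, for illustration only, that the weights come from a ridge-regularized least-squares fit of the target signal on the control signals. The function name `fit_control_weights`, the `ridge` parameter, and the simulated data are all hypothetical.

```python
import numpy as np

def fit_control_weights(controls, target, ridge=1.0):
    """Fit one weight per control dataset by ridge-regularized least squares.

    controls: array of shape (positions, n_controls), binned read counts
    target:   array of shape (positions,), binned read counts of the target
    Assumption for this sketch: background is a weighted sum of controls.
    """
    n_controls = controls.shape[1]
    gram = controls.T @ controls + ridge * np.eye(n_controls)
    return np.linalg.solve(gram, controls.T @ target)

# Simulated data (hypothetical): four control tracks, one of them irrelevant.
rng = np.random.default_rng(1)
controls = rng.poisson(5.0, size=(5000, 4)).astype(float)
true_weights = np.array([0.6, 0.0, 0.3, 0.1])
target = controls @ true_weights + rng.normal(scale=0.5, size=5000)

weights = fit_control_weights(controls, target)
background = controls @ weights  # position-specific background estimate
print(np.round(weights, 2))
```

In this toy setting the fit is made once across all positions and then evaluated at each position, so the background estimate is position-specific without averaging over a window of neighbouring positions.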

bioRxiv | 2017

DeepATAC: A deep-learning method to predict regulatory factor binding activity from ATAC-seq signals

Naozumi Hiranuma; Scott M. Lundberg; Su-In Lee

Determining the binding locations of regulatory factors, such as transcription factors and histone modifications, is essential to both basic biology research and many clinical applications. Obtaining such genome-wide location maps directly is often invasive and resource-intensive, so it is common to impute binding locations from DNA sequence or measures of chromatin accessibility. We introduce DeepATAC, a deep-learning approach for imputing binding locations that uses both DNA sequence and chromatin accessibility as measured by ATAC-seq. DeepATAC significantly outperforms current approaches such as FIMO motif predictions overlapped with ATAC-seq peaks, and models based only on DNA sequence, such as DeepSEA. Visualizing the input importances for the DeepATAC model reveals DNA sequence motifs and ATAC-seq signal patterns that are important for predicting binding events. The Keras implementation and analysis pipelines of DeepATAC are available at https://github.com/hiranumn/deepatac.

bioRxiv | 2015

Learning the human chromatin network from all ENCODE ChIP-seq data

Scott M. Lundberg; William B. Tu; Brian Raught; Linda Z. Penn; Michael M. Hoffman; Su-In Lee

Introduction: A cell’s epigenome arises from interactions among regulatory factors — transcription factors, histone modifications, and other DNA-associated proteins — co-localized at particular genomic regions. Identifying the network of interactions among regulatory factors, the chromatin network, is of paramount importance in understanding epigenome regulation. Methods: We developed a novel computational approach, ChromNet, to infer the chromatin network from a set of ChIP-seq datasets. ChromNet has four key features that enable its use on large collections of ChIP-seq data. First, rather than using pairwise co-localization of factors along the genome, ChromNet identifies conditional dependence relationships that better discriminate direct and indirect interactions. Second, our novel statistical technique, the group graphical model, improves inference of conditional dependence on highly correlated datasets. Such datasets are common because some transcription factors form a complex and the same transcription factor is often assayed in different laboratories or cell types. Third, ChromNet’s computationally efficient method and the group graphical model enable the learning of a joint network across all cell types, which greatly increases the scope of possible interactions. We have shown that this results in a significantly higher fold enrichment for validated protein interactions. Fourth, ChromNet provides an efficient way to identify the genomic context that drives a particular network edge, which provides a more comprehensive understanding of regulatory factor interactions. Results: We applied ChromNet to all available ChIP-seq data from the ENCODE Project, consisting of 1451 ChIP-seq datasets, which revealed previously known physical interactions better than alternative approaches. ChromNet also identified previously unreported regulatory factor interactions. We experimentally validated one of these interactions, between the MYC and HCFC1 transcription factors. 
Discussion: ChromNet provides a useful tool for understanding the interactions among regulatory factors and identifying novel interactions. We have provided an interactive web-based visualization of the full ENCODE chromatin network and the ability to incorporate custom datasets at http://chromnet.cs.washington.edu.
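
The difference between pairwise co-localization and conditional dependence can be shown with a minimal sketch. It is not ChromNet itself and does not include the group graphical model. It uses the standard result that partial correlations can be read from the inverse of the correlation matrix. The three simulated signals `a`, `b`, `c` and the function name `partial_correlations` are hypothetical.

```python
import numpy as np

def partial_correlations(data):
    """Return (correlation, partial correlation) for column variables.

    Partial correlations come from the inverse correlation (precision)
    matrix: rho_ij = -P_ij / sqrt(P_ii * P_jj).
    """
    corr = np.corrcoef(data, rowvar=False)
    precision = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return corr, partial

# Simulated chain a -> b -> c: a and c are related only through b.
rng = np.random.default_rng(0)
n = 20000
a = rng.normal(size=n)
b = a + 0.5 * rng.normal(size=n)
c = b + 0.5 * rng.normal(size=n)

corr, partial = partial_correlations(np.column_stack([a, b, c]))
print("corr(a, c)    =", round(corr[0, 2], 2))
print("partial(a, c) =", round(partial[0, 2], 2))
```

Here `a` and `c` are strongly correlated pairwise, yet their partial correlation given `b` is close to zero, which is the sense in which conditional dependence separates direct from indirect relationships.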

Neural Information Processing Systems | 2017