David van Dijk

Memorial Sloan Kettering Cancer Center

Publication

Featured research published by David van Dijk.

bioRxiv | 2017

MAGIC: A diffusion-based imputation method reveals gene-gene interactions in single-cell RNA-sequencing data

David van Dijk; Juozas Nainys; Roshan Sharma; Pooja Kathail; Ambrose Carr; Kevin R. Moon; Linas Mazutis; Guy Wolf; Smita Krishnaswamy; Dana Pe'er

Single-cell RNA-sequencing is fast becoming a major technology that is revolutionizing biological discovery in fields such as development, immunology and cancer. The ability to simultaneously measure thousands of genes at single cell resolution allows, among other prospects, for the possibility of learning gene regulatory networks at large scales. However, scRNA-seq technologies suffer from many sources of significant technical noise, the most prominent of which is ‘dropout’ due to inefficient mRNA capture. This results in data that has a high degree of sparsity, with typically only ~10% non-zero values. To address this, we developed MAGIC (Markov Affinity-based Graph Imputation of Cells), a method for imputing missing values, and restoring the structure of the data. After MAGIC, we find that two- and three-dimensional gene interactions are restored and that MAGIC is able to impute complex and non-linear shapes of interactions. MAGIC also retains cluster structure, enhances cluster-specific gene interactions and restores trajectories, as demonstrated in mouse retinal bipolar cells, hematopoiesis, and our newly generated epithelial-to-mesenchymal transition dataset.
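The diffusion idea behind this kind of imputation can be illustrated with a minimal sketch. It is not the published MAGIC implementation: the fixed-bandwidth Gaussian kernel, the function name `diffusion_impute`, and the parameters `sigma` and `t` are assumptions made here for illustration. The sketch builds cell-cell affinities, row-normalizes them into a Markov matrix, powers that matrix to diffuse over the graph, and uses the result to average expression across similar cells.

```python
import numpy as np

def diffusion_impute(X, sigma=1.0, t=3):
    """Smooth a cells-by-genes matrix X by diffusing over a cell-cell graph.

    Illustrative assumptions: a fixed-bandwidth Gaussian kernel (sigma)
    and t diffusion steps; neither value comes from the abstract.
    """
    # Squared Euclidean distances between all pairs of cells.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Gaussian affinities between cells.
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    # Row-normalize so each row is a probability distribution (Markov matrix).
    M = A / A.sum(axis=1, keepdims=True)
    # Powering M lets information travel t steps along the graph.
    Mt = np.linalg.matrix_power(M, t)
    # Each cell becomes a weighted average of the cells it can reach.
    return Mt @ X

# Toy example: the zero in cell 1, gene 1 mimics a dropout event.
X_toy = np.array([[1.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
X_smoothed = diffusion_impute(X_toy, sigma=1.0, t=2)
```

Because each row of the powered Markov matrix sums to one, every imputed value is a weighted average of observed values for that gene, so the output stays within the observed range of each gene.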

bioRxiv | 2018

Exploring Single-Cell Data with Deep Multitasking Neural Networks

Matthew Amodio; David van Dijk; K. Srinivasan; William S. Chen; Hussein Mohsen; Kevin R. Moon; Allison M. Campbell; Yujiao Zhao; Xiaomei Wang; Manjunatha M. Venkataswamy; Anita Desai; V. Ravi; Priti Kumar; Ruth R. Montgomery; Guy Wolf; Smita Krishnaswamy

Biomedical researchers are generating high-throughput, high-dimensional single-cell data at a staggering rate. As costs of data generation decrease, experimental design is moving towards measurement of many different single-cell samples in the same dataset. These samples can correspond to different patients, conditions, or treatments. While scalability of methods to datasets of these sizes is a challenge on its own, dealing with large-scale experimental design presents a whole new set of problems, including batch effects and sample comparison issues. Currently, there are no computational tools that can both handle large amounts of data in a scalable manner (many cells) and at the same time deal with many samples (many patients or conditions). Moreover, data analysis currently involves the use of different tools that each operate on their own data representation, not guaranteeing a synchronized analysis pipeline. For instance, data visualization methods can be disjoint and mismatched with the clustering method. For this purpose, we present SAUCIE, a deep neural network that leverages the high degree of parallelization and scalability offered by neural networks, as well as the deep representation of data that can be learned by them to perform many single-cell data analysis tasks, all on a unified representation. A well-known limitation of neural networks is their interpretability. Our key contributions here are newly formulated regularizations (penalties) that render features learned in hidden layers of the neural network interpretable. When large multi-patient datasets are fed into SAUCIE, the various hidden layers contain denoised and batch-corrected data, a low dimensional visualization, unsupervised clustering, as well as other information that can be used to explore the data. We show this capability by analyzing a newly generated 180-sample dataset consisting of T cells from dengue patients in India, measured with mass cytometry.
We show that SAUCIE, for the first time, can batch correct and process this 11-million cell data to identify cluster-based signatures of acute dengue infection and create a patient manifold, stratifying immune response to dengue on the basis of single-cell measurements.

bioRxiv | 2017

PHATE: A Dimensionality Reduction Method for Visualizing Trajectory Structures in High-Dimensional Biological Data

Kevin R. Moon; David van Dijk; Zheng Wang; William C. Chen; Matthew J. Hirn; Ronald R. Coifman; Natalia B. Ivanova; Guy Wolf; Smita Krishnaswamy

With the advent of high-throughput technologies measuring high-dimensional biological data, there is a pressing need for visualization tools that reveal the structure and emergent patterns of data in an intuitive form. We present PHATE, a visualization method that captures both local and global nonlinear structure in data by an information-geometry distance between datapoints. We perform extensive comparison between PHATE and other tools on a variety of artificial and biological datasets, and find that it consistently preserves a range of patterns in data including continual progressions, branches, and clusters. We show that PHATE is applicable to a wide variety of datatypes including mass cytometry, single-cell RNA-sequencing, Hi-C, and gut microbiome data, where it can generate interpretable insights into the underlying systems. Finally, we use PHATE to explore a newly generated scRNA-seq dataset of human germ layer differentiation. Here, PHATE reveals a dynamic picture of the main developmental branches in unparalleled detail.

In recent years, dimensionality reduction methods have become critical for visualization, exploration, and interpretation of high-throughput, high-dimensional biological data, as they enable the extraction of major trends in the data while discarding noise. However, biological data contains a type of predominant structure that is not preserved in commonly used methods such as PCA and tSNE, namely, branching progression structure. This structure, which is often non-linear, arises from underlying biological processes such as differentiation, graded responses to stimuli, and population drift, which generate cellular (or population) diversity. We propose a novel, affinity-preserving embedding called PHATE (Potential of Heat-diffusion for Affinity-based Trajectory Embedding), designed explicitly to preserve progression structure in data.
PHATE provides a denoised, two- or three-dimensional visualization of the complete branching trajectory structure in high-dimensional data. It uses heat-diffusion processes, which naturally denoise the data, to compute cell-cell affinities. Then, PHATE creates a diffusion-potential geometry by free-energy potentials of these processes. This geometry captures high-dimensional trajectory structures, while enabling a natural embedding of the intrinsic data geometry. This embedding accurately visualizes trajectories and data distances, without requiring strict assumptions typically used by path-finding and tree-fitting algorithms, which have recently been used for pseudotime orderings or tree-renderings of cellular data. Furthermore, PHATE supports a wide range of data exploration tasks by providing interpretable overlays on top of the visualization. We show that such overlays can emphasize and reveal trajectory end-points, branch points and associated split-decisions, progression-forming variables (e.g., specific genes), and paths between developmental events in cellular state-space. We demonstrate PHATE on single-cell RNA sequencing and mass cytometry data pertaining to embryoid body differentiation, iPSC reprogramming, and hematopoiesis in the bone marrow. We also demonstrate PHATE on non-single cell data including single-nucleotide polymorphism (SNP) measurements of European populations, and 16S sequencing of gut microbiota.

In the era of ‘Big Data’ there is a pressing need for tools that provide human interpretable visualizations of emergent patterns in high-throughput high-dimensional data. Further, to enable insightful data exploration, such visualizations should faithfully capture and emphasize emergent structures and patterns without enforcing prior assumptions on the shape or form of the data.
In this paper, we present PHATE (Potential of Heat-diffusion for Affinity-based Transition Embedding) - an unsupervised low-dimensional embedding for visualization of data that is aimed at solving these issues. Unlike previous methods that are commonly used for visualization, such as PCA and tSNE, PHATE is able to capture and highlight both local and global structure in the data. In particular, in addition to clustering patterns, PHATE also uncovers and emphasizes progression and transitions (when they exist) in the data, which are often missed in other visualization-capable methods. Such patterns are especially important in biological data that contain, for example, single-cell phenotypes at different phases of differentiation, patients at different stages of disease progression, and gut microbial compositions that vary gradually between individuals, even of the same enterotype. The embedding provided by PHATE is based on a novel informational distance that captures long-range nonlinear relations in the data by computing energy potentials of data-adaptive diffusion processes. We demonstrate the effectiveness of the produced visualization in revealing insights on a wide variety of biomedical data, including single-cell RNA-sequencing, mass cytometry, gut microbiome sequencing, human SNP data, Hi-C data, as well as non-biomedical data, such as Facebook network and facial image data. In order to validate the capability of PHATE to enable exploratory analysis, we generate a new dataset of 31,000 single cells from a human embryoid body differentiation system. Here, PHATE provides a comprehensive picture of the differentiation process, while visualizing major and minor branching trajectories in the data. We validate that all known cell types are recapitulated in the PHATE embedding in proper organization.
Furthermore, the global picture of the system offered by PHATE allows us to connect parts of the developmental progression and characterize novel regulators associated with developmental lineages.
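The pipeline these abstracts describe (diffusion-based affinities, a potential derived from the diffused probabilities, then a low-dimensional embedding of the resulting distances) can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the fixed-bandwidth Gaussian kernel, the negative-log potential with a small offset `eps`, the use of classical MDS for the final embedding, and the name `potential_embedding` are all assumptions introduced here.

```python
import numpy as np

def potential_embedding(X, sigma=1.0, t=5, n_components=2, eps=1e-7):
    """Embed the rows of X using distances between diffusion potentials.

    Illustrative assumptions: Gaussian kernel with bandwidth sigma,
    potential U = -log(P^t + eps), classical MDS for the embedding.
    """
    # Pairwise squared distances and Gaussian affinities between points.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    # Markov transition matrix and its t-step diffusion.
    P = K / K.sum(axis=1, keepdims=True)
    Pt = np.linalg.matrix_power(P, t)
    # Potential representation: negative log of diffused probabilities.
    U = -np.log(Pt + eps)
    # Squared Euclidean distances between the potential rows.
    usq = np.sum(U ** 2, axis=1)
    D2 = np.maximum(usq[:, None] + usq[None, :] - 2.0 * U @ U.T, 0.0)
    # Classical MDS: double-center, then keep the top eigenvectors.
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    w, v = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:n_components]
    return v[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Toy example: noisy points along a curved one-dimensional progression.
rng = np.random.default_rng(0)
s = np.linspace(0.0, 1.0, 20)[:, None]
X_toy = np.hstack([s, s ** 2]) + 0.01 * rng.standard_normal((20, 2))
Y_toy = potential_embedding(X_toy, sigma=0.3, t=3)
```

The log transform is what makes distances sensitive to small diffusion probabilities, so points far apart along a progression remain distinguishable in the embedding; the offset `eps` only guards against taking the log of zero.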

bioRxiv | 2017

Visualizing Transitions and Structure for High Dimensional Data Exploration

Kevin R. Moon; David van Dijk; Zheng Wang; Daniel Burkhardt; William C. Chen; Antonia van den Elzen; Matthew J. Hirn; Ronald R. Coifman; Natalia B. Ivanova; Guy Wolf; Smita Krishnaswamy

bioRxiv | 2018

Visualizing Transitions and Structure for Biological Data Exploration

Kevin R. Moon; David van Dijk; Zheng Wang; Scott Gigante; Daniel Burkhardt; William C. Chen; Antonia van den Elzen; Matthew J. Hirn; Ronald R. Coifman; Natalia B. Ivanova; Guy Wolf; Smita Krishnaswamy

JCI insight | 2016

PD-1 marks dysfunctional regulatory T cells in malignant gliomas

Daniel E. Lowther; Brittany A. Goods; Liliana E. Lucca; Benjamin Lerner; David van Dijk; Amanda L. Hernandez; Xiangguo Duan; Murat Gunel; Vlad Coric; Smita Krishnaswamy; J. Christopher Love; David A. Hafler

Cell | 2018

Recovering Gene Interactions from Single-Cell Data Using Data Diffusion

David van Dijk; Roshan Sharma; Juozas Nainys; Kristina Yim; Pooja Kathail; Ambrose Carr; Cassandra Burdziak; Kevin R. Moon; Christine L. Chaffer; Diwakar R. Pattabiraman; Brian Bierie; Linas Mazutis; Guy Wolf; Smita Krishnaswamy; Dana Pe’er

Current Opinion in Systems Biology | 2018