Guy Wolf | Researchain

Publication

Featured researches published by Guy Wolf.

bioRxiv | 2017

MAGIC: A diffusion-based imputation method reveals gene-gene interactions in single-cell RNA-sequencing data

David van Dijk; Juozas Nainys; Roshan Sharma; Pooja Kathail; Ambrose Carr; Kevin R. Moon; Linas Mazutis; Guy Wolf; Smita Krishnaswamy; Dana Pe'er

Single-cell RNA-sequencing is fast becoming a major technology that is revolutionizing biological discovery in fields such as development, immunology and cancer. The ability to simultaneously measure thousands of genes at single cell resolution allows, among other prospects, for the possibility of learning gene regulatory networks at large scales. However, scRNA-seq technologies suffer from many sources of significant technical noise, the most prominent of which is ‘dropout’ due to inefficient mRNA capture. This results in data that has a high degree of sparsity, with typically only ~10% non-zero values. To address this, we developed MAGIC (Markov Affinity-based Graph Imputation of Cells), a method for imputing missing values, and restoring the structure of the data. After MAGIC, we find that two- and three-dimensional gene interactions are restored and that MAGIC is able to impute complex and non-linear shapes of interactions. MAGIC also retains cluster structure, enhances cluster-specific gene interactions and restores trajectories, as demonstrated in mouse retinal bipolar cells, hematopoiesis, and our newly generated epithelial-to-mesenchymal transition dataset.
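
The abstract above names the ingredients of MAGIC (a Markov matrix built from cell-cell affinities, then diffusion over the resulting graph) without giving details. The following is a minimal sketch of that general idea, not the authors' implementation: the fixed-bandwidth Gaussian kernel, the values of `sigma` and `t`, the helper names, and the toy data are all assumptions made for illustration.

```python
import math

def markov_matrix(cells, sigma=2.0):
    """Row-normalize Gaussian cell-cell affinities into a Markov matrix."""
    matrix = []
    for ci in cells:
        affinities = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(ci, cj)) / (2 * sigma ** 2))
            for cj in cells
        ]
        total = sum(affinities)
        matrix.append([a / total for a in affinities])
    return matrix

def matmul(A, B):
    """Plain-Python product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def diffuse(cells, t=3, sigma=2.0):
    """Apply the Markov matrix t times, which equals M^t @ cells."""
    M = markov_matrix(cells, sigma)
    smoothed = [list(c) for c in cells]
    for _ in range(t):
        smoothed = matmul(M, smoothed)
    return smoothed

# Hypothetical toy data: rows are cells, columns are genes.
# Cell 1 has a simulated dropout (0.0) in gene 2, although the
# other cells of its group express that gene.
cells = [
    [5.0, 5.0, 4.0],
    [5.0, 5.0, 0.0],
    [5.0, 5.0, 4.2],
    [0.0, 0.0, 0.1],
    [0.0, 0.0, 0.0],
]
imputed = diffuse(cells, t=3, sigma=2.0)
print(imputed[1][2])  # dropout entry, pulled toward its neighbours' values
```

Because every diffusion step replaces each cell by a weighted average of its neighbours, the dropout entry moves toward the values of similar cells while cells in the other group stay near zero. In this sketch the affinities are computed once from the raw data; how the kernel and diffusion time are chosen in practice is outside what the abstract states.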

bioRxiv | 2018

Exploring Single-Cell Data with Deep Multitasking Neural Networks

Matthew Amodio; David van Dijk; K. Srinivasan; William S. Chen; Hussein Mohsen; Kevin R. Moon; Allison M. Campbell; Yujiao Zhao; Xiaomei Wang; Manjunatha M. Venkataswamy; Anita Desai; V. Ravi; Priti Kumar; Ruth R. Montgomery; Guy Wolf; Smita Krishnaswamy

Biomedical researchers are generating high-throughput, high-dimensional single-cell data at a staggering rate. As costs of data generation decrease, experimental design is moving towards measurement of many different single-cell samples in the same dataset. These samples can correspond to different patients, conditions, or treatments. While scalability of methods to datasets of these sizes is a challenge on its own, dealing with large-scale experimental design presents a whole new set of problems, including batch effects and sample comparison issues. Currently, there are no computational tools that can both handle large amounts of data in a scalable manner (many cells) and at the same time deal with many samples (many patients or conditions). Moreover, data analysis currently involves the use of different tools that each operate on their own data representation, not guaranteeing a synchronized analysis pipeline. For instance, data visualization methods can be disjoint and mismatched with the clustering method. For this purpose, we present SAUCIE, a deep neural network that leverages the high degree of parallelization and scalability offered by neural networks, as well as the deep representation of data that can be learned by them, to perform many single-cell data analysis tasks, all on a unified representation. A well-known limitation of neural networks is their lack of interpretability. Our key contributions here are newly formulated regularizations (penalties) that render features learned in hidden layers of the neural network interpretable. When large multi-patient datasets are fed into SAUCIE, the various hidden layers contain denoised and batch-corrected data, a low-dimensional visualization, unsupervised clustering, as well as other information that can be used to explore the data. We show this capability by analyzing a newly generated 180-sample dataset consisting of T cells from dengue patients in India, measured with mass cytometry. We show that SAUCIE, for the first time, can batch correct and process this 11-million-cell dataset to identify cluster-based signatures of acute dengue infection and create a patient manifold, stratifying immune response to dengue on the basis of single-cell measurements.

bioRxiv | 2017

PHATE: A Dimensionality Reduction Method for Visualizing Trajectory Structures in High-Dimensional Biological Data

Kevin R. Moon; David van Dijk; Zheng Wang; William C. Chen; Matthew J. Hirn; Ronald R. Coifman; Natalia B. Ivanova; Guy Wolf; Smita Krishnaswamy

With the advent of high-throughput technologies measuring high-dimensional biological data, there is a pressing need for visualization tools that reveal the structure and emergent patterns of data in an intuitive form. We present PHATE, a visualization method that captures both local and global nonlinear structure in data by an information-geometry distance between datapoints. We perform extensive comparison between PHATE and other tools on a variety of artificial and biological datasets, and find that it consistently preserves a range of patterns in data including continual progressions, branches, and clusters. We show that PHATE is applicable to a wide variety of datatypes including mass cytometry, single-cell RNA-sequencing, Hi-C, and gut microbiome data, where it can generate interpretable insights into the underlying systems. Finally, we use PHATE to explore a newly generated scRNA-seq dataset of human germ layer differentiation. Here, PHATE reveals a dynamic picture of the main developmental branches in unparalleled detail.

In recent years, dimensionality reduction methods have become critical for visualization, exploration, and interpretation of high-throughput, high-dimensional biological data, as they enable the extraction of major trends in the data while discarding noise. However, biological data contains a type of predominant structure that is not preserved in commonly used methods such as PCA and tSNE, namely, branching progression structure. This structure, which is often non-linear, arises from underlying biological processes such as differentiation, graded responses to stimuli, and population drift, which generate cellular (or population) diversity. We propose a novel, affinity-preserving embedding called PHATE (Potential of Heat-diffusion for Affinity-based Trajectory Embedding), designed explicitly to preserve progression structure in data. PHATE provides a denoised, two- or three-dimensional visualization of the complete branching trajectory structure in high-dimensional data. It uses heat-diffusion processes, which naturally denoise the data, to compute cell-cell affinities. Then, PHATE creates a diffusion-potential geometry by free-energy potentials of these processes. This geometry captures high-dimensional trajectory structures, while enabling a natural embedding of the intrinsic data geometry. This embedding accurately visualizes trajectories and data distances, without requiring strict assumptions typically used by path-finding and tree-fitting algorithms, which have recently been used for pseudotime orderings or tree-renderings of cellular data. Furthermore, PHATE supports a wide range of data exploration tasks by providing interpretable overlays on top of the visualization. We show that such overlays can emphasize and reveal trajectory end-points, branch points and associated split-decisions, progression-forming variables (e.g., specific genes), and paths between developmental events in cellular state-space. We demonstrate PHATE on single-cell RNA sequencing and mass cytometry data pertaining to embryoid body differentiation, iPSC reprogramming, and hematopoiesis in the bone marrow. We also demonstrate PHATE on non-single-cell data including single-nucleotide polymorphism (SNP) measurements of European populations, and 16S sequencing of gut microbiota.

In the era of ‘Big Data’ there is a pressing need for tools that provide human-interpretable visualizations of emergent patterns in high-throughput high-dimensional data. Further, to enable insightful data exploration, such visualizations should faithfully capture and emphasize emergent structures and patterns without enforcing prior assumptions on the shape or form of the data. In this paper, we present PHATE (Potential of Heat-diffusion for Affinity-based Transition Embedding), an unsupervised low-dimensional embedding for visualization of data that is aimed at solving these issues. Unlike previous methods that are commonly used for visualization, such as PCA and tSNE, PHATE is able to capture and highlight both local and global structure in the data. In particular, in addition to clustering patterns, PHATE also uncovers and emphasizes progression and transitions (when they exist) in the data, which are often missed in other visualization-capable methods. Such patterns are especially important in biological data that contain, for example, single-cell phenotypes at different phases of differentiation, patients at different stages of disease progression, and gut microbial compositions that vary gradually between individuals, even of the same enterotype. The embedding provided by PHATE is based on a novel informational distance that captures long-range nonlinear relations in the data by computing energy potentials of data-adaptive diffusion processes. We demonstrate the effectiveness of the produced visualization in revealing insights on a wide variety of biomedical data, including single-cell RNA-sequencing, mass cytometry, gut microbiome sequencing, human SNP data, Hi-C data, as well as non-biomedical data, such as Facebook network and facial image data. In order to validate the capability of PHATE to enable exploratory analysis, we generate a new dataset of 31,000 single cells from a human embryoid body differentiation system. Here, PHATE provides a comprehensive picture of the differentiation process, while visualizing major and minor branching trajectories in the data. We validate that all known cell types are recapitulated in the PHATE embedding in proper organization. Furthermore, the global picture of the system offered by PHATE allows us to connect parts of the developmental progression and characterize novel regulators associated with developmental lineages.
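
The abstracts above describe a distance built from potentials of a diffusion process. As a rough illustration of that idea only, the sketch below diffuses a Markov matrix for `t` steps, takes a negative log of the resulting transition probabilities as a "potential", and compares data points by the Euclidean distance between their potential vectors. The fixed-bandwidth Gaussian kernel (the abstract mentions data-adaptive processes), the `-log` form of the potential, the `eps` guard, and all function names are assumptions; the final low-dimensional embedding step is omitted.

```python
import math

def transition_matrix(points, sigma=1.0):
    """Markov matrix obtained by row-normalizing Gaussian affinities."""
    rows = []
    for p in points:
        affinities = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2 * sigma ** 2))
            for q in points
        ]
        total = sum(affinities)
        rows.append([a / total for a in affinities])
    return rows

def matmul(A, B):
    """Plain-Python product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def potential_distances(points, t=2, sigma=1.0, eps=1e-12):
    """Pairwise distances between -log(P^t) rows (assumed potential form)."""
    P = transition_matrix(points, sigma)
    Pt = P
    for _ in range(t - 1):
        Pt = matmul(Pt, P)
    # eps guards against log(0) for transitions with vanishing probability.
    U = [[-math.log(v + eps) for v in row] for row in Pt]
    n = len(points)
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(U[i], U[j]))) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical 1-D toy data: two well-separated groups of points.
points = [[0.0], [0.5], [1.0], [5.0], [5.5]]
D = potential_distances(points, t=2, sigma=1.0)
print(D[0][1], D[0][3])  # within-group vs. across-group potential distance
```

The log transform makes small transition probabilities count, so two points that reach the rest of the data in very different ways end up far apart even when their raw probability vectors look similar. A distance matrix like `D` could then be passed to any multidimensional scaling routine to obtain two or three coordinates per point.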

bioRxiv | 2017

Visualizing Transitions and Structure for High Dimensional Data Exploration

Kevin R. Moon; David van Dijk; Zheng Wang; Daniel Burkhardt; William C. Chen; Antonia van den Elzen; Matthew J. Hirn; Ronald R. Coifman; Natalia B. Ivanova; Guy Wolf; Smita Krishnaswamy

IEEE Transactions on Signal Processing | 2016

Rigid Motion Model for Audio Source Separation

Guy Wolf; Stéphane Mallat; Shihab Shamma

We introduce a single-channel blind source separation algorithm for audio mixtures. It uses a strategy that is similar to rigid object segregation in videos. A velocity field is defined over the wavelet time-frequency plane. It captures the time evolution of amplitude modulations and harmonic frequencies. Several audio sources are segregated by separating their velocity fields under a harmonic rigidity assumption. Signals are then reconstructed from wavelet coefficients in different harmonic templates. The resulting monaural blind source separation is demonstrated on mixtures of speech, singing voice, music, and noise audio signals.

International Workshop on Machine Learning for Signal Processing | 2014

Audio source separation with time-frequency velocities

Guy Wolf; Stéphane Mallat; Shihab Shamma

Separating complex audio sources from a single measurement channel, with no training data, is highly challenging. We introduce a new approach, which relies on the time dynamics of rigid audio models, based on harmonic templates. The velocity vectors of such models are defined and computed in a time-frequency scalogram calculated with a wavelet transform. Similarly to rigid object segmentation in videos, multiple audio sources are discriminated by approximating their velocity vectors with low-dimensional models. The different audio sources are segmented by optimizing a harmonic template selection, which provides piecewise constant velocity approximations. Numerical experiments give examples of blind source separation from single channel audio signals.

Archive | 2014

Parameter Rating by Diffusion Gradient

Guy Wolf; Amir Averbuch; Pekka Neittaanmäki

Anomaly detection is a central task in high-dimensional data analysis. It can be performed by using dimensionality reduction methods to obtain a low-dimensional representation of the data, which reveals the geometry and the patterns that exist and govern it. Usually, anomaly detection methods classify high-dimensional vectors that represent data points as either normal or abnormal. Revealing the parameters (i.e., features) that cause detected abnormal behaviors is critical in many applications. However, this problem is not addressed by recent anomaly-detection methods and, specifically, by nonparametric methods, which are based on feature-free analysis of the data. In this chapter, we provide an algorithm that rates (i.e., ranks) the parameters that cause an abnormal behavior to occur. We assume that the anomalies have already been detected by other anomaly detection methods and they are treated in this chapter as prior knowledge. Our algorithm is based on the underlying potential of the diffusion process that is used in Diffusion Maps (DM) for dimensionality reduction. We show that the gradient of this potential indicates the direction from an anomalous data point to a cluster that represents a normal behavior. We use this direction to rate the parameters that cause the abnormal behavior to occur. The algorithm was applied successfully to rate the measured parameters from process control and networking applications.
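
To make the idea of rating parameters by a potential gradient concrete, here is a small sketch under simplifying assumptions. It does not use the Diffusion Maps potential from the chapter; instead it substitutes a kernel-density potential U(x) = -log(sum_i exp(-|x - y_i|^2 / (2 sigma^2))) over the normal points y_i, whose gradient has a closed form. The function names, `sigma`, and the toy data are hypothetical.

```python
import math

def potential_gradient(x, normal_points, sigma=2.0):
    """Closed-form gradient of the assumed kernel-density potential at x."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))
        for y in normal_points
    ]
    total = sum(weights)
    # dU/dx_k = sum_i w_i * (x_k - y_ik) / (sigma^2 * sum_i w_i)
    return [
        sum(w * (x[k] - y[k]) for w, y in zip(weights, normal_points))
        / (total * sigma ** 2)
        for k in range(len(x))
    ]

def rate_parameters(x, normal_points, sigma=2.0):
    """Rank feature indices by the magnitude of their gradient component."""
    grad = potential_gradient(x, normal_points, sigma)
    return sorted(range(len(x)), key=lambda k: abs(grad[k]), reverse=True)

# Hypothetical data: a cluster of normal points and one anomaly that was
# already flagged by some other detector (treated as prior knowledge).
normal_points = [
    [0.0, 0.0, 0.0],
    [0.2, -0.1, 0.1],
    [-0.1, 0.1, -0.2],
    [0.1, 0.2, 0.0],
]
anomaly = [0.1, 5.0, 0.5]
ranking = rate_parameters(anomaly, normal_points)
print(ranking)  # feature indices, most responsible first
```

With this potential, the negative gradient points from the anomaly back toward the normal cluster, so the features with the largest gradient components are the ones that would have to change most to return to normal behavior.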

bioRxiv | 2018

Visualizing Transitions and Structure for Biological Data Exploration

Kevin R. Moon; David van Dijk; Zheng Wang; Scott Gigante; Daniel Burkhardt; William C. Chen; Antonia van den Elzen; Matthew J. Hirn; Ronald R. Coifman; Natalia B. Ivanova; Guy Wolf; Smita Krishnaswamy

bioRxiv | 2018

Embedding the single-cell sample manifold to reveal insights into cancer pathogenesis and disease heterogeneity

William S. Chen; N. Zivanovic; D. van Dijk; Guy Wolf; B. Bodenmiller; Smita Krishnaswamy

Previously, the effect of a drug on a cell population was measured based on simple metrics such as cell viability. However, as single-cell technologies are becoming more advanced, drug screen experiments can now be conducted with more complex readouts such as gene expression profiles of individual cells. The increasing complexity of measurements from these multi-sample experiments calls for more sophisticated analytical approaches than are currently available. We developed a novel method called PhEMD (Phenotypic Earth Mover’s Distance) and show that it can be used to embed the space of drug perturbations on the basis of the drugs’ effects on cell populations. When testing PhEMD on a newly-generated, 300-sample CyTOF kinase inhibition screen experiment, we find that the state space of the perturbation conditions is surprisingly low-dimensional and that the network of drugs demonstrates manifold structure. We show that because of the fairly simple manifold geometry of the 300 samples, we can accurately capture the full range of drug effects using a dictionary of only 30 experimental conditions. We also show that new drugs can be added to our PhEMD embedding using similarities inferred from other characterizations of drugs using a technique called Nyström extension. Our findings suggest that large-scale drug screens can be conducted by measuring only a small fraction of the drugs using the most expensive high-throughput single-cell technologies; the effects of other drugs may be inferred by mapping and extending the perturbation space. We additionally show that PhEMD can be useful for analyzing other types of single-cell samples, such as patient tumor biopsies, by mapping the patient state space in a similar way as the drug state space. We demonstrate that PhEMD is scalable, compatible with leading batch effect correction techniques, and generalizable to multiple experimental designs. Altogether, our analyses suggest that PhEMD may facilitate drug discovery efforts and help uncover the network geometry of a collection of single-cell samples.

Single-cell data are now being collected in large quantities across multiple samples and gene profiling runs. This introduces the need for computational methods that can compare and stratify samples that are represented themselves as complex, high-dimensional objects. We introduce PhEMD as an analytical approach that can be used for this purpose. PhEMD uses Earth Mover’s Distance (EMD), a distance between probability distributions that is sensitive to differences at multiple levels of granularity, in order to compute an accurate measure of dissimilarity between single-cell samples. PhEMD then generates a low-dimensional embedding of the samples based on this dissimilarity. We demonstrate the utility of the PhEMD sample embedding by using it to subtype melanoma and clear-cell renal cell carcinomas based on their immune cell profiles. These analyses reveal sources of inter-sample heterogeneity that have potentially clinically actionable implications, given the recent adoption of immunotherapy as an effective treatment for these cancers. We also apply PhEMD to a newly-generated 300-sample CyTOF drug screen experiment, where the effects of 233 kinase inhibitors are measured at the single-cell resolution in 33 protein dimensions. In doing so, we find that PhEMD reveals novel insights into the effects of small-molecule inhibitors on breast cancer cell subpopulations undergoing epithelial-to-mesenchymal transition. Finally, by leveraging the Nyström extension method for diffusion maps, we demonstrate that the results of PhEMD can be integrated with other data sources and data types to predict the single-cell phenotypes of samples not directly profiled. Our analyses demonstrate that PhEMD is highly scalable and compatible with leading batch effect correction techniques, allowing for the simultaneous comparison of many single-cell samples.
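
As a small illustration of comparing samples with Earth Mover's Distance, the sketch below treats each sample as a histogram of cell proportions over the same ordered set of cell states and computes the EMD between every pair of samples. It relies on a strong simplification that is not taken from the abstracts: the cell states are assumed to lie on a line with unit spacing, in which case EMD reduces to the summed absolute difference of the cumulative distributions. The sample names and proportions are invented.

```python
def emd_1d(hist_a, hist_b):
    """EMD between two histograms over the same ordered, unit-spaced bins."""
    total_a, total_b = sum(hist_a), sum(hist_b)
    cdf_gap = 0.0   # running difference between the two cumulative sums
    distance = 0.0
    for a, b in zip(hist_a, hist_b):
        cdf_gap += a / total_a - b / total_b
        distance += abs(cdf_gap)
    return distance

def pairwise_emd(histograms):
    """Symmetric matrix of EMD values between all pairs of samples."""
    return [[emd_1d(a, b) for b in histograms] for a in histograms]

# Hypothetical samples: proportion of cells in three ordered cell states.
samples = {
    "control": [0.8, 0.2, 0.0],
    "drug_a": [0.7, 0.3, 0.0],
    "drug_b": [0.1, 0.2, 0.7],
}
names = list(samples)
D = pairwise_emd([samples[name] for name in names])
for name, row in zip(names, D):
    print(name, [round(v, 3) for v in row])
```

A sample-by-sample matrix such as `D` is the kind of input a manifold-learning or embedding method can take to place each sample as a single point. With a general ground distance between cell subpopulations, EMD requires solving a transport problem rather than the cumulative-sum shortcut used here.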

Machine Learning | 2016

Learning from patches by efficient spectral decomposition of a structured kernel

Moshe Salhov; Amit Bermanis; Guy Wolf; Amir Averbuch

We present a kernel-based method that learns from small neighborhoods (patches) of multidimensional data points. This method is based on spectral decomposition of a large structured kernel accompanied by an out-of-sample extension method. In many cases, the performance of a spectral-based learning mechanism is limited due to the use of a distance metric among the multidimensional data points in the kernel construction. Recently, different distance metrics have been proposed that are based on a spectral decomposition of an appropriate kernel prior to the application of learning mechanisms. The diffusion distance metric is a typical example where a distance metric is computed by incorporating the relation of a single measurement to the entire input dataset. A method called patch-to-tensor embedding (PTE) generalizes the diffusion distance metric by incorporating matrix similarity relations into the kernel construction, replacing its scalar entries with matrices. The use of multidimensional similarities in PTE-based spectral decomposition results in a bigger kernel that significantly increases its computational complexity. In this paper, we propose an efficient dictionary construction that approximates the oversized PTE kernel and its associated spectral decomposition. It is supplemented by providing an out-of-sample extension for vector fields. Furthermore, the approximation error is analyzed and the advantages of the proposed dictionary construction are demonstrated on several image processing tasks.
