Publication


Featured research published by Geet Duggal.


Nature Methods | 2017

Salmon provides fast and bias-aware quantification of transcript expression

Rob Patro; Geet Duggal; Michael I. Love; Rafael A. Irizarry; Carl Kingsford

We introduce Salmon, a lightweight method for quantifying transcript abundance from RNA–seq reads. Salmon combines a new dual-phase parallel inference algorithm and feature-rich bias models with an ultra-fast read mapping procedure. It is the first transcriptome-wide quantifier to correct for fragment GC-content bias, which, as we demonstrate here, substantially improves the accuracy of abundance estimates and the sensitivity of subsequent differential expression analysis.


Algorithms for Molecular Biology | 2014

Identification of alternative topological domains in chromatin

Darya Filippova; Robert Patro; Geet Duggal; Carl Kingsford

Chromosome conformation capture experiments have led to the discovery of dense, contiguous, megabase-sized topological domains that are similar across cell types and conserved across species. These domains are strongly correlated with a number of chromatin markers and have since been included in a number of analyses. However, functionally-relevant domains may exist at multiple length scales. We introduce a new and efficient algorithm that is able to capture persistent domains across various resolutions by adjusting a single scale parameter. The ensemble of domains we identify allows us to quantify the degree to which the domain structure is hierarchical as opposed to overlapping, and our analysis reveals a pronounced hierarchical structure in which larger stable domains tend to completely contain smaller domains. The identified novel domains are substantially different from domains reported previously and are highly enriched for insulating factor CTCF binding and histone marks at the boundaries.


bioRxiv | 2015

Salmon: Accurate, Versatile and Ultrafast Quantification from RNA-seq Data using Lightweight-Alignment

Rob Patro; Geet Duggal; Carl Kingsford

Transcript quantification is a central task in the analysis of RNA-seq data. Accurate computational methods for the quantification of transcript abundances are essential for downstream analysis. However, most existing approaches are much slower than is necessary for their degree of accuracy. We introduce Salmon, a novel method and software tool for transcript quantification that exhibits state-of-the-art accuracy while being significantly faster than most other tools. Salmon achieves this through the combined application of a two-phase inference procedure, a reduced data representation, and a novel lightweight read alignment algorithm. Salmon is written in C++11, and is available under the GPL v3 license as open-source software at https://combine-lab.github.io/salmon.


Nucleic Acids Research | 2014

Higher-order chromatin domains link eQTLs with the expression of far-away genes

Geet Duggal; Hao Wang; Carl Kingsford

Distal expression quantitative trait loci (distal eQTLs) are genetic mutations that affect the expression of genes genomically far away. However, the mechanisms that cause a distal eQTL to modulate gene expression are not yet clear. Recent high-resolution chromosome conformation capture experiments along with a growing database of eQTLs provide an opportunity to understand the spatial mechanisms influencing distal eQTL associations on a genome-wide scale. We test the hypothesis that spatial proximity contributes to eQTL-gene regulation in the context of the higher-order domain structure of chromatin as determined from recent Hi-C chromosome conformation experiments. This analysis suggests that the large-scale topology of chromatin is coupled with eQTL associations by providing evidence that eQTLs are in general spatially close to their target genes, occur often around topological domain boundaries and preferentially associate with genes across domains. We also find that within-domain eQTLs that overlap with regulatory elements such as promoters and enhancers are spatially more close than the overall set of within-domain eQTLs, suggesting that spatial proximity derived from the domain structure in chromatin plays an important role in the regulation of gene expression.


bioRxiv | 2016

Salmon provides accurate, fast, and bias-aware transcript expression estimates using dual-phase inference

Rob Patro; Geet Duggal; Michael I. Love; Rafael A. Irizarry; Carl Kingsford

We introduce Salmon, a new method for quantifying transcript abundance from RNA-seq reads that is highly accurate and very fast. Salmon is the first transcriptome-wide quantifier to model and correct for fragment GC content bias, which we demonstrate substantially improves the accuracy of abundance estimates and the reliability of subsequent differential expression analysis compared to existing methods that do not account for these biases. Salmon achieves its speed and accuracy by combining a new dual-phase parallel inference algorithm and feature-rich bias models with an ultra-fast read mapping procedure. These innovations yield both exceptional accuracy and order-of-magnitude speed benefits over alignment-based methods.


Knowledge Discovery and Data Mining | 2012

The missing models: a data-driven approach for learning how networks grow

Robert Patro; Geet Duggal; Emre Sefer; Hao Wang; Darya Filippova; Carl Kingsford

Probabilistic models of network growth have been extensively studied as idealized representations of network evolution. Models, such as the Kronecker model, duplication-based models, and preferential attachment models, have been used for tasks such as representing null models, detecting anomalies, algorithm testing, and developing an understanding of various mechanistic growth processes. However, developing a new growth model to fit observed properties of a network is a difficult task, and as new networks are studied, new models must constantly be developed. Here, we present a framework, called GrowCode, for the automatic discovery of novel growth models that match user-specified topological features in undirected graphs. GrowCode introduces a set of basic commands that are general enough to encode several previously developed models. Coupling this formal representation with an optimization approach, we show that GrowCode is able to discover models for protein interaction networks, autonomous systems networks, and scientific collaboration networks that closely match properties such as the degree distribution, the clustering coefficient, and assortativity that are observed in real networks of these classes. Additional tests on simulated networks show that the models learned by GrowCode generate distributions of graphs with similar variance as existing models for these classes.


Research in Computational Molecular Biology | 2015

Deconvolution of Ensemble Chromatin Interaction Data Reveals the Latent Mixing Structures in Cell Subpopulations

Emre Sefer; Geet Duggal; Carl Kingsford

Chromosome conformation capture (3C) experiments provide a window into the spatial packing of a genome in three dimensions within the cell. This structure has been shown to be highly correlated with gene regulation, cancer mutations, and other genomic functions. However, 3C provides mixed measurements on a population of typically millions of cells, each with a different genome structure due to the fluidity of the genome and differing cell states. Here, we present several algorithms to deconvolve these measured 3C matrices into estimations of the contact matrices for each subpopulation of cells and relative densities of each subpopulation. We formulate the problem as that of choosing matrices and densities that minimize the Frobenius distance between the observed 3C matrix and the weighted sum of the estimated subpopulation matrices. Results on HeLa 5C and mouse and bacteria Hi-C data demonstrate the methods’ effectiveness. We also show that domain boundaries from deconvolved matrices are often more enriched or depleted for regulatory chromatin markers when compared to boundaries from convolved matrices.
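The objective described in this abstract can be illustrated with a minimal sketch. It only evaluates the Frobenius distance between an observed matrix and a density-weighted sum of candidate subpopulation matrices; it does not implement the paper's optimization algorithms. The function names and the toy matrices are hypothetical and chosen purely for illustration.

```python
import math


def weighted_sum(matrices, densities):
    """Return the density-weighted sum of equally sized square matrices."""
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, densities)) for j in range(n)]
        for i in range(n)
    ]


def frobenius_distance(a, b):
    """Frobenius distance: square root of the sum of squared entry differences."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


def deconvolution_objective(observed, matrices, densities):
    """Distance between the observed matrix and the weighted mixture.

    A deconvolution method would search for matrices and densities that make
    this value small; this sketch only evaluates it for given candidates.
    """
    return frobenius_distance(observed, weighted_sum(matrices, densities))


# Toy example (made-up numbers): two subpopulations mixed 25% / 75%.
sub_a = [[4.0, 0.0], [0.0, 4.0]]
sub_b = [[0.0, 4.0], [4.0, 0.0]]
observed = [[1.0, 3.0], [3.0, 1.0]]
score = deconvolution_objective(observed, [sub_a, sub_b], [0.25, 0.75])
```

Here the observed matrix is exactly the 25%/75% mixture, so the objective evaluates to zero; any other candidate densities or matrices would give a positive value.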


International Conference on Bioinformatics | 2013

Topological properties of chromosome conformation graphs reflect spatial proximities within chromatin

Hao Wang; Geet Duggal; Robert Patro; Michelle Girvan; Sridhar Hannenhalli; Carl Kingsford

Recent chromosome conformation capture (3C) experiments produce genome-wide networks of chromatin interactions to help to study how chromosome structures relate to genomic functions. We investigate whether properties of chromatin interaction graphs based on shortest paths, maximum flows, and dense cores correlate with the spatial proximity in a three-dimensional model of the yeast genome. We demonstrate that within automatically-detected dense subgraphs, which correspond to spatially compact cores of interacting chromatin, these properties are well-correlated with spatial volume. We show that all tested methods are able to identify spatially compact sets when the test sets contain fragments from several chromosomes. We use a framework for systematically evaluating whether a method can accurately assess the spatial enrichment of a set of genomic loci for a hypothesized biological function. In such regions, we observe that the sets of fragments contained in the maximum density subgraph overlap highly with the sets of fragments in the spatially compact cores. Further, we observe that all methods agree on the spatial closeness of the yeast genomic annotations. Together, we show that compared to the more computationally complex and expensive three-dimensional embedding approach, the topological features of 3C graphs can be used to directly detect spatial closeness.


Algorithms for Molecular Biology | 2013

Resolving spatial inconsistencies in chromosome conformation measurements

Geet Duggal; Robert Patro; Emre Sefer; Hao Wang; Darya Filippova; Samir Khuller; Carl Kingsford

Background: Chromosome structure is closely related to its function and Chromosome Conformation Capture (3C) is a widely used technique for exploring spatial properties of chromosomes. 3C interaction frequencies are usually associated with spatial distances. However, the raw data from 3C experiments is an aggregation of interactions from many cells, and the spatial distances of any given interaction are uncertain. Results: We introduce a new method for filtering 3C interactions that selects subsets of interactions that obey metric constraints of various strictness. We demonstrate that, although the problem is computationally hard, near-optimal results are often attainable in practice using well-designed heuristics and approximation algorithms. Further, we show that, compared with a standard technique, this metric filtering approach leads to (a) subgraphs with higher statistical significance, (b) lower embedding error, (c) lower sensitivity to initial conditions of the embedding algorithm, and (d) structures with better agreement with light microscopy measurements. Our filtering scheme is applicable for a strict frequency-to-distance mapping and a more relaxed mapping from frequency to a range of distances. Conclusions: Our filtering method for 3C data considers both metric consistency and statistical confidence simultaneously resulting in lower-error embeddings that are biologically more plausible.
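To make the notion of a metric constraint concrete, the sketch below checks the triangle inequality on a set of pairwise distances and reports the triples that violate it. It is not the paper's filtering method: it only detects inconsistent triples and does not select an optimal subset of interactions. It also assumes distances have already been derived from interaction frequencies by some mapping. The function name, the fragment labels, and the distance values are hypothetical.

```python
from itertools import combinations


def triangle_violations(dist):
    """Return fragment triples whose pairwise distances break the triangle inequality.

    `dist` maps frozenset({u, v}) to a non-negative distance. Triples that
    lack any of their three pairwise distances are skipped.
    """
    nodes = sorted({node for pair in dist for node in pair})
    violations = []
    for u, v, w in combinations(nodes, 3):
        sides = [frozenset((u, v)), frozenset((v, w)), frozenset((u, w))]
        if not all(side in dist for side in sides):
            continue
        shortest, middle, longest = sorted(dist[side] for side in sides)
        # A metric requires the longest side to be at most the sum of the other two.
        if longest > shortest + middle:
            violations.append((u, v, w))
    return violations


# Toy example (made-up distances): A-C is too long to be consistent with A-B and B-C.
distances = {
    frozenset(("A", "B")): 1.0,
    frozenset(("B", "C")): 1.0,
    frozenset(("A", "C")): 3.0,
}
inconsistent = triangle_violations(distances)
```

In this toy input the only triple is reported as inconsistent, because a distance of 3.0 between A and C cannot coexist with distances of 1.0 on the other two sides.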


PLOS ONE | 2012

Interpreting patterns of gene expression: signatures of coregulation, the data processing inequality, and triplet motifs.

Wai Lim Ku; Geet Duggal; Yuan Li; Michelle Girvan; Edward Ott

Various methods of reconstructing transcriptional regulatory networks infer transcriptional regulatory interactions (TRIs) between strongly coexpressed gene pairs (as determined from microarray experiments measuring mRNA levels). Alternatively, however, the coexpression of two genes might imply that they are coregulated by one or more transcription factors (TFs), and do not necessarily share a direct regulatory interaction. We explore whether and under what circumstances gene pairs with a high degree of coexpression are more likely to indicate TRIs, coregulation or both. Here we use established TRIs in combination with microarray expression data from both Escherichia coli (a prokaryote) and Saccharomyces cerevisiae (a eukaryote) to assess the accuracy of predictions of coregulated gene pairs and TRIs from coexpressed gene pairs. We find that coexpressed gene pairs are more likely to indicate coregulation than TRIs for Saccharomyces cerevisiae, but the incidence of TRIs in highly coexpressed gene pairs is higher for Escherichia coli. The data processing inequality (DPI) has previously been applied for the inference of TRIs. We consider the case where a transcription factor gene is known to regulate two genes (one of which is a transcription factor gene) that are known not to regulate one another. According to the DPI, the non-interacting gene pairs should have the smallest mutual information among all pairs in the triplets. While this is sometimes the case for Escherichia coli, we find that it is almost always not the case for Saccharomyces cerevisiae. This brings into question the usefulness of the DPI sometimes employed to infer TRIs from expression data. Finally, we observe that when a TF gene is known to regulate two other genes, it is rarely the case that one regulatory interaction is positively correlated and the other interaction is negatively correlated. Typically both are either positively or negatively correlated.
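The data processing inequality test described in this abstract can be sketched as follows. The sketch assumes expression profiles have already been discretized into a small number of levels, and it uses a plain plug-in estimate of mutual information; the estimator used in the paper may differ. The function names and the toy profiles are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in joint.items()
    )


def dpi_consistent(tf, gene_a, gene_b, tol=1e-12):
    """Check the DPI expectation for a triplet where `tf` regulates both genes.

    If gene_a and gene_b do not regulate one another, the DPI predicts that
    their mutual information is the smallest of the three pairs.
    """
    mi_ab = mutual_information(gene_a, gene_b)
    mi_tf_a = mutual_information(tf, gene_a)
    mi_tf_b = mutual_information(tf, gene_b)
    return mi_ab <= min(mi_tf_a, mi_tf_b) + tol


# Toy discretized profiles (made-up): both targets track each other but not the TF.
tf_levels = [0, 0, 1, 1]
target_a = [0, 1, 0, 1]
target_b = [0, 1, 0, 1]
follows_dpi = dpi_consistent(tf_levels, target_a, target_b)
```

In this toy triplet the two target genes are more informative about each other than either is about the transcription factor, so the check returns False, which is the kind of outcome the abstract reports as common for Saccharomyces cerevisiae.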

Collaboration


Dive into Geet Duggal's collaborations.

Top Co-Authors

Carl Kingsford

Carnegie Mellon University

Darya Filippova

Carnegie Mellon University

Emre Sefer

Carnegie Mellon University

Hao Wang

Carnegie Mellon University

Rob Patro

Stony Brook University
