Saket Navlakha

Salk Institute for Biological Studies

Publication

Featured research published by Saket Navlakha.

Bioinformatics | 2010

The power of protein interaction networks for associating genes with diseases

Saket Navlakha; Carl Kingsford

Motivation: Understanding the association between genetic diseases and their causal genes is an important problem concerning human health. With the recent influx of high-throughput data describing interactions between gene products, scientists have been provided a new avenue through which these associations can be inferred. Despite the recent interest in this problem, however, there is little understanding of the relative benefits and drawbacks underlying the proposed techniques. Results: We assessed the utility of physical protein interactions for determining gene–disease associations by examining the performance of seven recently developed computational methods (plus several of their variants). We found that random-walk approaches individually outperform clustering and neighborhood approaches, although most methods make predictions not made by any other method. We show how combining these methods into a consensus method yields Pareto optimal performance. We also quantified how a diffuse topological distribution of disease-related proteins negatively affects prediction quality and are thus able to identify diseases especially amenable to network-based predictions and others for which additional information sources are absolutely required. Availability: The predictions made by each algorithm considered are available online at http://www.cbcb.umd.edu/DiseaseNet Supplementary information: Supplementary data are available at Bioinformatics online.
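Random walk with restart is one common random-walk scheme for this kind of network-based gene prioritization. The minimal Python sketch below is illustrative only and is not any of the seven methods the paper evaluates. It assumes an unweighted, undirected interaction network given as an edge list, a set of known disease genes used as seeds (assumed to appear in the network), and an arbitrarily chosen restart probability of 0.3. Each node's score is its steady-state visit probability, so candidates close to the seeds rank higher.

```python
from collections import defaultdict


def random_walk_with_restart(edges, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Rank nodes by the steady-state visit probability of a walk that, at each
    step, jumps back to a uniformly chosen seed with probability `restart` and
    otherwise moves to a uniformly chosen neighbor."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = list(neighbors)
    # Restart distribution: uniform over the seeds (assumed to be in the network).
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            # Spread the non-restart probability mass of u evenly over its neighbors.
            share = (1.0 - restart) * p[u] / len(neighbors[u])
            for v in neighbors[u]:
                nxt[v] += share
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


if __name__ == "__main__":
    toy_edges = [("A", "B"), ("B", "C"), ("C", "D")]  # hypothetical network
    scores = random_walk_with_restart(toy_edges, seeds={"A"})
    for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(gene, round(score, 3))
```

On the hypothetical chain A-B-C-D with A as the only seed, scores fall off with distance from A, and because every node has at least one neighbor the scores always sum to 1.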

International Conference on Management of Data | 2008

Graph summarization with bounded error

Saket Navlakha; Rajeev Rastogi; Nisheeth Shrivastava

We propose a highly compact two-part representation of a given graph G consisting of a graph summary and a set of corrections. The graph summary is an aggregate graph in which each node corresponds to a set of nodes in G, and each edge represents the edges between all pairs of nodes in the two sets. The corrections portion, in turn, specifies the list of edge corrections that should be applied to the summary to recreate G. Our representations allow for both lossless and lossy graph compression with bounds on the introduced error. Further, in combination with the MDL principle, they yield highly intuitive coarse-level summaries of the input graph G. We develop algorithms to construct highly compressed graph representations with small sizes and guaranteed accuracy, and validate our approach through an extensive set of experiments with multiple real-life graph data sets. To the best of our knowledge, this is the first work to compute graph summaries using the MDL principle, and use the summaries (along with corrections) to compress graphs with bounded error.
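To make the two-part representation concrete, the Python sketch below decodes a summary plus corrections back into the edge set of G. It covers only decompression, not the paper's algorithms for building a compact summary. The encoding is an assumption for illustration: supernodes partition the nodes of G, a superedge (X, X) stands for all pairs inside X, and each correction is a signed edge, "+" to add and "-" to remove.

```python
from itertools import combinations, product


def expand_summary(supernodes, superedges, corrections):
    """Rebuild the undirected edge set of G from a summary plus corrections.

    supernodes:  dict mapping supernode name -> set of original nodes
                 (assumed to partition the nodes of G)
    superedges:  iterable of (X, Y) supernode pairs; X == Y means every
                 pair of distinct nodes inside X is connected
    corrections: iterable of (sign, u, v); '+' adds edge u-v, '-' removes it
    """
    edges = set()
    for x, y in superedges:
        if x == y:
            pairs = combinations(supernodes[x], 2)
        else:
            pairs = product(supernodes[x], supernodes[y])
        for u, v in pairs:
            edges.add(frozenset((u, v)))
    for sign, u, v in corrections:
        edge = frozenset((u, v))
        if sign == "+":
            edges.add(edge)
        elif sign == "-":
            edges.discard(edge)
        else:
            raise ValueError(f"unknown correction sign: {sign!r}")
    return edges


if __name__ == "__main__":
    supernodes = {"S1": {"a", "b"}, "S2": {"c", "d"}}
    superedges = [("S1", "S2")]  # stands for a-c, a-d, b-c, b-d
    corrections = [("-", "b", "d"), ("+", "a", "b")]
    graph = expand_summary(supernodes, superedges, corrections)
    for edge in sorted(tuple(sorted(e)) for e in graph):
        print(edge)
```

In this hypothetical example, one superedge and two corrections reconstruct the four edges a-b, a-c, a-d, and b-c.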

BMC Bioinformatics | 2010

Alignment and clustering of phylogenetic markers - implications for microbial diversity studies

James Robert White; Saket Navlakha; Niranjan Nagarajan; Mohammadreza Ghodsi; Carl Kingsford; Mihai Pop

Background: Molecular studies of microbial diversity have provided many insights into the bacterial communities inhabiting the human body and the environment. A common first step in such studies is a survey of conserved marker genes (primarily 16S rRNA) to characterize the taxonomic composition and diversity of these communities. To date, however, there exists significant variability in analysis methods employed in these studies. Results: Here we provide a critical assessment of current analysis methodologies that cluster sequences into operational taxonomic units (OTUs) and demonstrate that small changes in algorithm parameters can lead to significantly varying results. Our analysis provides strong evidence that the species-level diversity estimates produced using common OTU methodologies are inflated due to overly stringent parameter choices. We further describe an example of how semi-supervised clustering can produce OTUs that are more robust to changes in algorithm parameters. Conclusions: Our results highlight the need for systematic and open evaluation of data analysis methodologies, especially as targeted 16S rRNA diversity studies are increasingly relying on high-throughput sequencing technologies. All data and results from our study are available through the JGI FAMeS website http://fames.jgi-psf.org/.

Molecular Systems Biology | 2014

Algorithms in nature: the convergence of systems biology and computational thinking

Saket Navlakha; Ziv Bar-Joseph

Computer science and biology have enjoyed a long and fruitful relationship for decades. Biologists rely on computational methods to analyze and integrate large data sets, while several computational methods were inspired by the high‐level design principles of biological systems. Recently, these two directions have been converging. In this review, we argue that thinking computationally about biological processes may lead to more accurate models, which in turn can be used to improve the design of algorithms. We discuss the similar mechanisms and requirements shared by computational and biological processes and then present several recent studies that apply this joint analysis strategy to problems related to coordination, network analysis, and tracking and vision. We also discuss additional biological processes that can be studied in a similar manner and link them to potential computational problems. With the rapid accumulation of data detailing the inner workings of biological systems, we expect this direction of coupling biological and computational studies to greatly expand in the future.

Journal of Computational Biology | 2009

Revealing biological modules via graph summarization

Saket Navlakha; Michael C. Schatz; Carl Kingsford

The division of a protein interaction network into biologically meaningful modules can aid the automated detection of protein complexes and the prediction of biological processes, and can uncover the global organization of the cell. We propose the use of a graph summarization (GS) technique, based on graph compression, to cluster protein interaction graphs into biologically relevant modules. The method is motivated by defining a biological module as a set of proteins that have similar sets of interaction partners. We show this definition, put into practice by a GS algorithm, reveals modules that are more biologically enriched than those found by other methods. We also apply GS to predict complex memberships, biological processes, and co-complexed pairs and show that in most settings GS is preferable to existing methods of protein interaction graph clustering.
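The module definition above (proteins with similar sets of interaction partners) can be illustrated with a simple overlap score. The Python sketch below uses Jaccard similarity of neighbor sets as an assumed stand-in; it is not the compression-based GS algorithm described in the abstract, and the 0.5 threshold and toy proteins are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """|A & B| / |A | B|, defined as 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_partner_pairs(edges, threshold=0.5):
    """Return (score, u, v) for protein pairs whose interaction-partner sets
    have Jaccard similarity >= threshold, highest score first."""
    partners = defaultdict(set)
    for u, v in edges:
        partners[u].add(v)
        partners[v].add(u)
    scored = []
    for u, v in combinations(sorted(partners), 2):
        score = jaccard(partners[u], partners[v])
        if score >= threshold:
            scored.append((score, u, v))
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored


if __name__ == "__main__":
    # Hypothetical network: P1 and P2 share all partners; P3 shares only Z.
    toy = [("P1", "X"), ("P1", "Y"), ("P1", "Z"),
           ("P2", "X"), ("P2", "Y"), ("P2", "Z"), ("P3", "Z")]
    for score, u, v in similar_partner_pairs(toy):
        print(f"{u} {v} {score:.2f}")
```

In the toy network, P1 and P2 score 1.0 even though they do not interact with each other, which is the kind of pair this definition groups together; X and Y also score 1.0, while P3 falls below the threshold with both P1 and P2.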

PLOS Computational Biology | 2011

Network Archaeology: Uncovering Ancient Networks from Present-Day Interactions

Saket Navlakha; Carl Kingsford

What proteins interacted in a long-extinct ancestor of yeast? How have different members of a protein complex assembled together over time? Our ability to answer such questions has been limited by the unavailability of ancestral protein-protein interaction (PPI) networks. To overcome this limitation, we propose several novel algorithms to reconstruct the growth history of a present-day network. Our likelihood-based method finds a probable previous state of the graph by applying an assumed growth model backwards in time. This approach retains node identities so that the history of individual nodes can be tracked. Using this methodology, we estimate protein ages in the yeast PPI network that are in good agreement with sequence-based estimates of age and with structural features of protein complexes. Further, by comparing the quality of the inferred histories for several different growth models (duplication-mutation with complementarity, forest fire, and preferential attachment), we provide additional evidence that a duplication-based model captures many features of PPI network growth better than models designed to mimic social network growth. From the reconstructed history, we model the arrival time of extant and ancestral interactions and predict that complexes have significantly re-wired over time and that new edges tend to form within existing complexes. We also hypothesize a distribution of per-protein duplication rates, track the change of the network's clustering coefficient, and predict paralogous relationships between extant proteins that are likely to be complementary to the relationships inferred using sequence alone. Finally, we infer plausible parameters for the model, thereby predicting the relative probability of various evolutionary events. The success of these algorithms indicates that parts of the history of the yeast PPI network are encoded in its present-day form.

Communications of the ACM | 2015

Distributed information processing in biological and computational systems

Saket Navlakha; Ziv Bar-Joseph

Exploring the similarities and differences between distributed computations in biological and computational systems.

International Semantic Web Conference | 2011

Link prediction for annotation graphs using graph summarization

Andreas Thor; Philip Anderson; Louiqa Raschid; Saket Navlakha; Barna Saha; Samir Khuller; Xiao-Ning Zhang

Annotation graph datasets are a natural representation of scientific knowledge. They are common in the life sciences, where genes or proteins are annotated with controlled vocabulary terms (CV terms) from ontologies. The W3C Linking Open Data (LOD) initiative and semantic Web technologies are playing a leading role in making such datasets widely available. Scientists can mine these datasets to discover patterns of annotation. While ontology alignment and integration across datasets have been explored in the context of the semantic Web, there is no current approach to mine such patterns in annotation graph datasets. In this paper, we propose a novel approach for link prediction, a preliminary task in discovering more complex patterns. Our prediction is based on a complementary methodology of graph summarization (GS) and dense subgraphs (DSG). GS can exploit and summarize knowledge captured within the ontologies and in the annotation patterns. DSG uses the ontology structure, in particular the distance between CV terms, to filter the graph and to find promising subgraphs. We develop a scoring function based on multiple heuristics to rank the predictions. We perform an extensive evaluation on Arabidopsis thaliana genes.

Research in Computational Molecular Biology | 2009

Finding Biologically Accurate Clusterings in Hierarchical Tree Decompositions Using the Variation of Information

Saket Navlakha; James Robert White; Niranjan Nagarajan; Mihai Pop; Carl Kingsford

Hierarchical clustering is a popular method for grouping together similar elements based on a distance measure between them. In many cases, annotations for some elements are known beforehand, which can aid the clustering process. We present a novel approach for decomposing a hierarchical clustering into the clusters that optimally match a set of known annotations, as measured by the variation of information metric. Our approach is general and does not require the user to enter the number of clusters desired. We apply it to two biological domains: finding protein complexes within protein interaction networks and identifying species within metagenomic DNA samples. For these two applications, we test the quality of our clusters by using them to predict complex and species membership, respectively. We find that our approach generally outperforms the commonly used heuristic methods.
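For reference, the variation of information between two clusterings A and B of the same elements is VI(A, B) = H(A) + H(B) - 2 I(A; B), where H is entropy and I is mutual information; it is 0 exactly when the two clusterings agree up to relabeling. The Python sketch below computes only this metric (in nats), not the paper's procedure for choosing clusters from a hierarchical tree. Representing each clustering as a parallel list of labels is an assumption for illustration.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """VI(A, B) = H(A) + H(B) - 2 I(A; B) in nats.

    labels_a, labels_b: parallel sequences giving each element's cluster label
    under the two clusterings (e.g. predicted cluster vs. known annotation).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("clusterings must label the same, non-empty set of elements")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    # Mutual information from the joint distribution of label pairs.
    mi = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    return h_a + h_b - 2 * mi


if __name__ == "__main__":
    print(variation_of_information([0, 0, 1, 1], ["x", "x", "y", "y"]))
    print(variation_of_information([0, 0, 0, 0], [0, 1, 2, 3]))
```

The first call compares two clusterings that differ only in label names, so VI is (up to rounding) 0; the second compares one big cluster with four singletons, giving log 4.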

PLOS Computational Biology | 2012

A Network-based Approach for Predicting Missing Pathway Interactions

Saket Navlakha; Anthony Gitter; Ziv Bar-Joseph

Embedded within large-scale protein interaction networks are signaling pathways that encode response cascades in the cell. Unfortunately, even for well-studied species like S. cerevisiae, only a fraction of all true protein interactions are known, which makes it difficult to reason about the exact flow of signals and the corresponding causal relations in the network. To help address this problem, we introduce a framework for predicting new interactions that aid connectivity between upstream proteins (sources) and downstream transcription factors (targets) of a particular pathway. Our algorithms attempt to globally minimize the distance between sources and targets by finding a small set of shortcut edges to add to the network. Unlike existing algorithms for predicting general protein interactions, by focusing on proteins involved in specific responses our approach homes in on pathway-consistent interactions. We applied our method to extend pathways in osmotic stress response in yeast and identified several missing interactions, some of which are supported by published reports. We also performed experiments that support a novel interaction not previously reported. Our framework is general and may be applicable to edge prediction problems in other domains.
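The shortcut-edge idea can be illustrated with a simple greedy heuristic: repeatedly add whichever candidate edge most reduces the total source-to-target distance. The Python sketch below is an assumed illustration, not the authors' algorithm. It treats the network as unweighted and undirected, uses hop counts from breadth-first search, charges a hypothetical fixed penalty for unreachable source-target pairs, and draws shortcuts from a caller-supplied candidate list.

```python
from collections import deque


def bfs_distances(adj, start):
    """Hop distances from start in an unweighted, undirected graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def total_distance(adj, sources, targets, unreachable):
    """Sum of source-to-target hop distances; unreachable pairs cost `unreachable`."""
    total = 0
    for s in sources:
        dist = bfs_distances(adj, s)
        total += sum(dist.get(t, unreachable) for t in targets)
    return total


def greedy_shortcuts(edges, sources, targets, candidates, k, unreachable=100):
    """Add up to k candidate edges, each time picking the one that most
    reduces total_distance; stop early when no candidate helps."""
    adj = {}

    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def unlink(u, v):
        adj[u].discard(v)
        adj[v].discard(u)

    for u, v in edges:
        link(u, v)
    best = total_distance(adj, sources, targets, unreachable)
    chosen = []
    for _ in range(k):
        pick = None
        for u, v in candidates:
            if v in adj.get(u, ()):
                continue  # already an edge (original or previously chosen)
            link(u, v)  # try the edge, score it, then undo
            score = total_distance(adj, sources, targets, unreachable)
            unlink(u, v)
            if score < best:
                best, pick = score, (u, v)
        if pick is None:
            break
        link(*pick)
        chosen.append(pick)
    return chosen, best


if __name__ == "__main__":
    path = [("S", "a"), ("a", "b"), ("b", "c"), ("c", "T")]  # hypothetical pathway
    print(greedy_shortcuts(path, ["S"], ["T"], [("S", "b"), ("a", "T"), ("a", "c")], k=2))
```

On the hypothetical chain S-a-b-c-T, the shortcut a-T cuts the source-to-target distance from 4 to 2; no remaining candidate improves on that, so the loop stops after one edge. Each round re-runs BFS for every candidate, which is simple but scales poorly to large networks.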
