Sagi Snir
University of Haifa
Publication
Featured research published by Sagi Snir.
Bioinformatics | 2006
Guohua Jin; Luay Nakhleh; Sagi Snir; Tamir Tuller
MOTIVATION: Horizontal gene transfer (HGT) is believed to be ubiquitous among bacteria, and plays a major role in their genome diversification as well as their ability to develop resistance to antibiotics. In light of its evolutionary significance and implications for human health, developing accurate and efficient methods for detecting and reconstructing HGT is imperative. RESULTS: In this article we provide a new HGT-oriented likelihood framework for many problems that involve phylogeny-based HGT detection and reconstruction. Besides the formulation of various likelihood criteria, we show that most of these problems are NP-hard, and offer heuristics for efficient and accurate reconstruction of HGT under these criteria. We implemented our heuristics and used them to analyze biological as well as synthetic data. In both cases, our criteria and heuristics exhibited very good performance with respect to identifying the correct number of HGT events as well as inferring their correct location on the species tree. AVAILABILITY: Implementation of the criteria as well as heuristics and hardness proofs are available from the authors upon request. Hardness proofs can also be downloaded at http://www.cs.tau.ac.il/~tamirtul/MLNET/Supp-ML.pdf
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2010
Sagi Snir; Satish Rao
Accurate phylogenetic reconstruction methods are currently limited to a maximum of a few dozen taxa. Supertree methods construct a large tree over a large set of taxa from a set of small trees over overlapping subsets of the complete taxa set. Hence, in order to construct the tree of life over a million and a half different species, the use of a supertree method over the product of accurate methods is inevitable. Perhaps the simplest version of this task that is still widely applicable, yet quite challenging, is quartet-based reconstruction. This problem lies at the root of many tree reconstruction methods, and theoretical as well as experimental results have been reported. Nevertheless, dealing with false, conflicting quartet trees remains problematic. In this paper, we describe an algorithm for constructing a tree from a set of input quartet trees even with a significant fraction of errors. We show empirically that conflicts in the inputs are handled satisfactorily and that the algorithm is significantly more accurate and faster than the Matrix Representation with Parsimony (MRP) methods that have previously been most successful in dealing with supertrees. Our algorithm is based on a divide and conquer algorithm where our divide step uses a semidefinite programming (SDP) formulation of MaxCut. We remark that this builds on previous work of ours [29] for piecing together trees from rooted triplet trees. The recursion for unrooted quartets, however, is more complicated in that even with a completely consistent set of quartet trees the problem is NP-hard, as opposed to the problem for triplets, where there is a linear time algorithm. This complexity leads to several issues and some solutions of possible independent interest.
Bioinformatics | 2007
Guohua Jin; Luay Nakhleh; Sagi Snir; Tamir Tuller
MOTIVATION: Phylogenies, the evolutionary histories of groups of organisms, play a major role in representing relationships among biological entities. Although many biological processes can be effectively modeled as tree-like relationships, others, such as hybrid speciation and horizontal gene transfer (HGT), result in networks, rather than trees, of relationships. Hybrid speciation is a significant evolutionary mechanism in plants, fish and other groups of species. HGT plays a major role in bacterial genome diversification and is a significant mechanism by which bacteria develop resistance to antibiotics. Maximum parsimony is one of the most commonly used criteria for phylogenetic tree inference. Roughly speaking, inference based on this criterion seeks the tree that minimizes the amount of evolution. In 1990, Jotun Hein proposed using this criterion for inferring the evolution of sequences subject to recombination. Preliminary results on small synthetic datasets in Nakhleh et al. (2005) demonstrated the criterion's application to phylogenetic network reconstruction in general and HGT detection in particular. However, the naive algorithms used by the authors are inapplicable to large datasets due to their demanding computational requirements. Further, no rigorous theoretical analysis of computing the criterion was given, nor was it tested on biological data. RESULTS: In the present work we prove that the problem of scoring the parsimony of a phylogenetic network is NP-hard and provide an improved fixed-parameter tractable algorithm for it. Further, we devise efficient heuristics for parsimony-based reconstruction of phylogenetic networks. We test our methods on both synthetic and biological data (rbcL gene in bacteria) and obtain very promising results.
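To make the phrase "minimizes the amount of evolution" concrete, the following is a minimal sketch of parsimony scoring in the simpler tree case only, using the standard Fitch procedure for a single character on a rooted binary tree. It is not the network algorithm from the paper; the function name `fitch_score` and the nested-tuple tree encoding are assumptions made for illustration.

```python
def fitch_score(tree, leaf_state):
    """Fitch parsimony score of one character on a rooted binary tree.

    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    leaf_state: dict mapping each leaf name to its observed state.
    Both encodings are illustrative assumptions, not taken from the paper.
    """
    def visit(node):
        # A leaf contributes its own state and no changes.
        if isinstance(node, str):
            return {leaf_state[node]}, 0
        left_states, left_cost = visit(node[0])
        right_states, right_cost = visit(node[1])
        common = left_states & right_states
        if common:
            # The children can agree on a state: no extra change needed.
            return common, left_cost + right_cost
        # The children disagree: one state change is charged at this node.
        return left_states | right_states, left_cost + right_cost + 1

    return visit(tree)[1]


states = {"a": "A", "b": "A", "c": "C", "d": "C"}
good_tree = (("a", "b"), ("c", "d"))   # groups identical states together
poor_tree = (("a", "c"), ("b", "d"))   # separates identical states
print(fitch_score(good_tree, states), fitch_score(poor_tree, states))
```

Under the maximum parsimony criterion the first tree is preferred because it explains the data with fewer changes. Scoring a network is harder because, roughly, the score must be optimized over the trees contained in the network.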
Molecular Phylogenetics and Evolution | 2012
Sagi Snir; Satish Rao
Accurate phylogenetic reconstruction methods are inherently computationally heavy and therefore are limited to relatively small numbers of taxa. Supertree construction is the task of amalgamating small trees over partial sets into a big tree over the complete taxa set. The need for fast and accurate supertree methods has become crucial due to the enormous number of new genomic sequences generated by modern technology and the desire to use them for classification purposes. In particular, the Assembling the Tree of Life (ATOL) program aims at constructing the evolutionary history of all living organisms on Earth. When dealing with unrooted trees, a quartet (an unrooted tree over four taxa) is the most basic piece of phylogenetic information. Therefore, quartet amalgamation stands at the heart of any supertree problem, as it concerns combining many minimal pieces of information into a single, coherent, and more comprehensive piece of information. We have devised an extremely fast algorithm for quartet amalgamation and implemented it in very efficient code. The new code can handle over a hundred million quartet trees over several hundred taxa with very high accuracy.
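As an illustration of what it means for a tree to agree with a quartet, here is a small sketch that measures the fraction of input quartets a candidate tree satisfies. It assumes the tree is given by its nontrivial splits (one side of each internal edge, as a set of taxa); this encoding and the names `satisfies` and `fraction_satisfied` are hypothetical and are not the amalgamation algorithm described in the abstract.

```python
def satisfies(splits, quartet):
    """Return True if some split separates {a, b} from {c, d}.

    splits: iterable of sets, each holding the taxa on one side of an edge.
    quartet: ((a, b), (c, d)), meaning the quartet tree ab|cd.
    """
    (a, b), (c, d) = quartet
    for side in splits:
        ab_in = a in side and b in side
        cd_in = c in side and d in side
        ab_out = a not in side and b not in side
        cd_out = c not in side and d not in side
        if (ab_in and cd_out) or (cd_in and ab_out):
            return True
    return False


def fraction_satisfied(splits, quartets):
    """Fraction of the given quartets that the tree (as splits) satisfies."""
    quartets = list(quartets)
    if not quartets:
        return 1.0
    return sum(satisfies(splits, q) for q in quartets) / len(quartets)


# Unrooted tree on taxa a..e with internal edges {a,b}|{c,d,e} and {c,d}|{a,b,e}.
tree_splits = [frozenset("ab"), frozenset("cd")]
inputs = [(("a", "b"), ("c", "d")),   # agrees with the tree
          (("a", "c"), ("b", "d")),   # conflicts with the tree
          (("a", "b"), ("c", "e"))]   # agrees with the tree
print(fraction_satisfied(tree_splits, inputs))
```

A score like this is one plausible way to compare candidate supertrees against a noisy quartet set, since conflicting inputs make it impossible to satisfy every quartet.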
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2006
Sagi Snir; Satish Rao
Supertree methods are used to construct a large tree over a large set of taxa from a set of small trees over overlapping subsets of the complete taxa set. Since accurate reconstruction methods are currently limited to a maximum of a few dozen taxa, the use of a supertree method in order to construct the tree of life is inevitable. Supertree methods are broadly divided according to the input trees: when the input trees are unrooted, the basic reconstruction unit is a quartet tree. In this case, the basic decision problem of whether there exists a tree that agrees with all quartets is NP-complete. On the other hand, when the input trees are rooted, the basic reconstruction unit is a rooted triplet and the above decision problem has a polynomial time algorithm. However, when there is no tree which agrees with all triplets, it would be desirable to find the tree that agrees with the maximum number of triplets. This optimization problem was shown to be NP-hard. Current heuristic approaches perform min cut on a graph representing the triplet inconsistency and return a tree that is guaranteed to satisfy some required properties. In this work, we present a different heuristic approach that guarantees the properties provided by the current methods and give experimental evidence that it significantly outperforms currently used methods. This method is based on a divide and conquer approach, where the min cut in the divide step is replaced by a max cut in a variant of the same graph. The latter is achieved by a lightweight semidefinite programming-like heuristic that leads to very fast running times.
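The polynomial-time decision procedure for rooted triplets mentioned above can be sketched along the lines of the classic BUILD recursion of Aho et al. (named here as background knowledge; the abstract does not name it). This is a minimal sketch of the decision problem only, not the max-cut heuristic of the paper; the function name `build` and the tuple encoding `(a, b, c)` for the triplet ab|c are assumptions.

```python
def build(taxa, triplets):
    """Return a tree (nested tuples) displaying all triplets, or None.

    taxa: iterable of leaf names (strings).
    triplets: iterable of (a, b, c), meaning a and b are closer to each
              other than either is to c (the rooted triplet ab|c).
    """
    taxa = set(taxa)
    if len(taxa) == 1:
        return next(iter(taxa))
    # Only triplets whose three leaves are all present constrain this step.
    local = [t for t in triplets if set(t) <= taxa]

    # Union-find: connect a and b for every local triplet ab|c.
    parent = {x: x for x in taxa}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, _ in local:
        parent[find(a)] = find(b)

    groups = {}
    for x in taxa:
        groups.setdefault(find(x), set()).add(x)
    if len(groups) == 1:
        return None  # graph is connected: no tree agrees with all triplets

    subtrees = []
    # Sort the components so the output is deterministic.
    for group in sorted(groups.values(), key=min):
        sub = build(group, local)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)


print(build("abc", [("a", "b", "c")]))                    # consistent input
print(build("abc", [("a", "b", "c"), ("a", "c", "b")]))   # conflicting input
```

When the graph built in the divide step is connected, this exact procedure simply fails; heuristics of the kind described in the abstract instead cut the graph so that the recursion can continue while violating as few triplets as possible.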
Journal of Computational Biology | 2008
Sagi Snir; Tandy J. Warnow; Satish Rao
Quartet-based phylogeny reconstruction methods, such as Quartet Puzzling, were introduced in the hope that they might be competitive with maximum likelihood methods, without being as computationally intensive. However, despite the numerous quartet-based methods that have been developed, their performance in simulation has been disappointing. In particular, Ranwez and Gascuel, the developers of one of the best quartet methods, conjecture that quartet-based methods have inherent limitations that make them unable to produce trees as accurate as neighbor joining or maximum parsimony. In this paper, we present Short Quartet Puzzling, a new quartet-based phylogeny reconstruction algorithm, and we demonstrate the improved topological accuracy of the new method over maximum parsimony and neighbor joining, disproving the conjecture of Ranwez and Gascuel. We also show a dramatic improvement over Quartet Puzzling. Thus, while our new method is not compared to any ML method (as it is not expected to be as accurate as the best of these), this study shows that quartet methods are not as limited in performance as was previously conjectured, and opens the possibility of further improvements through new algorithmic designs.
Computing and Combinatorics Conference | 2007
Benny Chor; Michael R. Fellows; Mark A. Ragan; Igor Razgon; Frances A. Rosamond; Sagi Snir
An r-component connected coloring of a graph is a coloring of the vertices so that each color class induces a subgraph having at most r connected components. The concept has been well-studied for r = 1, in the case of trees, under the rubric of convex coloring, used in modeling perfect phylogenies. Several applications in bioinformatics of connected coloring problems on general graphs are discussed, including analysis of protein-protein interaction networks and protein structure graphs, and of phylogenetic relationships modeled by splits trees. We investigate the r-COMPONENT CONNECTED COLORING COMPLETION (r-CCC) problem, which takes as input a partially colored graph having k uncolored vertices, and asks whether the partial coloring can be completed to an r-component connected coloring. For r = 1 this problem is shown to be NP-hard, but fixed-parameter tractable when parameterized by the number of uncolored vertices, solvable in time O*(8^k). We also show that the 1-CCC problem, parameterized (only) by the treewidth t of the graph, is fixed-parameter tractable; we show this by a method that is of independent interest. The r-CCC problem is shown to be W[1]-hard, when parameterized by the treewidth bound t, for any r ≥ 2. Our proof also shows that the problem is NP-complete for r = 2, for general graphs.
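The definition of an r-component connected coloring can be checked directly on a fully colored graph. The sketch below counts, for each color, the connected components of the subgraph induced by that color class; it only verifies a given coloring and does not solve the completion problem studied in the paper. The function names and the adjacency-dict encoding are assumptions made for illustration.

```python
from collections import defaultdict


def components_per_color(adj, color):
    """Count connected components of the subgraph induced by each color.

    adj: dict mapping each vertex to an iterable of its neighbours
         (undirected graph, every vertex present as a key).
    color: dict mapping each vertex to its color label.
    """
    seen = set()
    counts = defaultdict(int)
    for start in adj:
        if start in seen:
            continue
        # A new, unvisited vertex opens a new component of its color class.
        counts[color[start]] += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                # Stay inside the color class of the starting vertex.
                if w not in seen and color[w] == color[start]:
                    seen.add(w)
                    stack.append(w)
    return dict(counts)


def is_r_component_connected(adj, color, r):
    """True if every color class induces at most r connected components."""
    return all(n <= r for n in components_per_color(adj, color).values())


# Path a - b - c - d colored alternately: each color class has two components.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
coloring = {"a": "red", "b": "blue", "c": "red", "d": "blue"}
print(is_r_component_connected(path, coloring, 1),
      is_r_component_connected(path, coloring, 2))
```

For r = 1 on trees this check coincides with testing whether a coloring is convex, the case the abstract relates to perfect phylogenies.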
PLOS ONE | 2012
Yarin Hadid; Attila Németh; Sagi Snir; Tomáš Pavlíček; Gábor Csorba; Miklós Kázmér; Ágnes Major; Sergey Mezhzherin; Mikhail Rusin; Yüksel Coşkun; Eviatar Nevo
The concept of climate variability facilitating adaptive radiation supported by the “Court Jester” hypothesis is disputed by the “Red Queen” one, but the prevalence of one or the other might be scale-dependent. We report on a detailed, comprehensive phylo-geographic study on the ∼4 kb mtDNA sequence in underground blind mole rats of the family Spalacidae (or subfamily Spalacinae) from the East Mediterranean steppes. Our study aimed at testing the presence of periodicities in branching patterns on a constructed phylogenetic tree and at searching for congruence between branching events, tectonic history and paleoclimates. In contrast to the strong support for the majority of the branching events on the tree, the absence of support in a few instances indicates that network-like evolution could exist in spalacids. In our tree, robust support was given, in concordance with paleontological data, for the separation of spalacids from muroid rodents during the first half of the Miocene when open, grass-dominated habitats were established. Marine barriers formed between Anatolia and the Balkans could have facilitated the separation of the lineage “Spalax” from the lineage “Nannospalax” and of the clade “leucodon” from the clade “xanthodon”. The separation of the clade “ehrenbergi” occurred during the late stages of the tectonically induced uplift of the Anatolian high plateaus and mountains, whereas the separation of the clade “vasvarii” took place when the rapidly uplifting Taurus mountain range prevented the Mediterranean rainfalls from reaching the Central Anatolian Plateau. The separation of Spalax antiquus and S. graecus occurred when the southeastern Carpathians were uplifted. Despite the role played by tectonic events, branching events that show periodicity corresponding to 400-kyr and 100-kyr eccentricity bands illuminate the important role of orbital fluctuations on adaptive radiation in spalacids. 
At the given scale, our results support the “Court Jester” hypothesis over the “Red Queen” one.
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2009
Guohua Jin; Luay Nakhleh; Sagi Snir; Tamir Tuller
Phylogenies, the evolutionary histories of groups of organisms, play a major role in representing the interrelationships among biological entities. Many methods for reconstructing and studying such phylogenies have been proposed, almost all of which assume that the underlying history of a given set of species can be represented by a binary tree. Although many biological processes can be effectively modeled and summarized in this fashion, others cannot: recombination, hybrid speciation, and horizontal gene transfer result in networks of relationships rather than trees of relationships. In previous work, we formulated a maximum parsimony (MP) criterion for reconstructing and evaluating phylogenetic networks, and demonstrated its quality on biological as well as synthetic data sets. In this paper, we provide further theoretical results as well as a very fast heuristic algorithm for the MP criterion of phylogenetic networks. In particular, we provide a novel combinatorial definition of phylogenetic networks in terms of “forbidden cycles,” and provide detailed hardness and hardness of approximation proofs for the “small” MP problem. We demonstrate the performance of our heuristic in terms of time and accuracy on both biological and synthetic data sets. Finally, we explain the difference between our model and a similar one formulated by Nguyen et al., and describe the implications of this difference on the hardness and approximation results.
PLOS Computational Biology | 2012
Sagi Snir; Yuri I. Wolf; Eugene V. Koonin
A fundamental observation of comparative genomics is that the distribution of evolution rates across the complete sets of orthologous genes in pairs of related genomes remains virtually unchanged throughout the evolution of life, from bacteria to mammals. The most straightforward explanation for the conservation of this distribution appears to be that the relative evolution rates of all genes remain nearly constant, or in other words, that evolutionary rates of different genes are strongly correlated within each evolving genome. This correlation could be explained by a model that we denote the Universal PaceMaker (UPM) of genome evolution. The UPM model posits that the rate of evolution changes synchronously across genome-wide sets of genes in all evolving lineages. Alternatively, however, the correlation between the evolutionary rates of genes could be a simple consequence of the molecular clock (MC). We sought to differentiate between the MC and UPM models by fitting thousands of phylogenetic trees for bacterial and archaeal genes to supertrees that reflect the dominant trend of vertical descent in the evolution of archaea and bacteria and that were constrained according to the two models. The goodness of fit for the UPM model was better than the fit for the MC model, with overwhelming statistical significance, although, similarly to the MC, the UPM is strongly overdispersed. Thus, the results of this analysis reveal a universal, genome-wide pacemaker of evolution that could have been in operation throughout the history of life.