Erfan Sayyari
University of California, San Diego
Publications
Featured research published by Erfan Sayyari.
Molecular Biology and Evolution | 2016
Erfan Sayyari; Siavash Mirarab
Species tree reconstruction is complicated by effects of incomplete lineage sorting, commonly modeled by the multi-species coalescent model (MSC). While there has been substantial progress in developing methods that estimate a species tree given a collection of gene trees, less attention has been paid to fast and accurate methods of quantifying support. In this article, we propose a fast algorithm to compute quartet-based support for each branch of a given species tree with regard to a given set of gene trees. We then show how the quartet support can be used in the context of the MSC to compute (1) the local posterior probability (PP) that the branch is in the species tree and (2) the length of the branch in coalescent units. We evaluate the precision and recall of the local PP on a wide set of simulated and biological datasets, and show that it has very high precision and improved recall compared with multi-locus bootstrapping. The estimated branch lengths are highly accurate when gene tree estimation error is low, but are underestimated when gene tree estimation error increases. Computation of both the branch length and local PP is implemented as new features in ASTRAL.
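The link between quartet support and branch length in coalescent units can be illustrated with a short sketch. It assumes the standard MSC result that a gene tree quartet matches the species tree topology around an internal branch of length d with probability 1 - (2/3)e^(-d), and it inverts that relationship. The function name and arguments are hypothetical, and the estimator implemented in ASTRAL may differ in its details.

```python
import math

def coalescent_branch_length(z_main, z_alt1, z_alt2):
    """Estimate an internal branch length in coalescent units.

    z_main is the number of gene tree quartets that agree with the
    species tree topology around the branch; z_alt1 and z_alt2 count
    the two alternative topologies. (Hypothetical helper, not ASTRAL's API.)
    """
    n = z_main + z_alt1 + z_alt2
    if n == 0:
        raise ValueError("no quartets observed around this branch")
    theta = z_main / n  # observed quartet support for the main topology
    if theta <= 1 / 3:
        # Support at or below 1/3 is what a zero-length branch predicts.
        return 0.0
    if theta >= 1.0:
        # No discordance observed: the estimate is unbounded.
        return math.inf
    # Invert theta = 1 - (2/3) * exp(-d).
    return -math.log(1.5 * (1.0 - theta))
```

For example, coalescent_branch_length(2, 1, 0) returns ln 2, about 0.693 coalescent units. When gene tree estimation error inflates discordance, theta drops and the estimate shrinks, which is in line with the underestimation noted in the abstract.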
Research in Computational Molecular Biology | 2017
Chao Zhang; Erfan Sayyari; Siavash Mirarab
Discordances between species trees and gene trees can complicate phylogenetic reconstruction. ASTRAL is a leading method for inferring species trees given gene trees while accounting for incomplete lineage sorting. It finds the tree that shares the maximum number of quartets with input trees, drawing bipartitions from a predefined set of bipartitions X. In this paper, we introduce ASTRAL-III, which substantially improves on ASTRAL-II in terms of running time by handling polytomies more efficiently, exploiting similarities between gene trees, and trimming unnecessary parts of the search space. The asymptotic running time in the presence of polytomies is reduced from \(O(n^3k|X|^{1.726})\) for n species and k genes to \(O(D|X|^{1.726})\), where \(D=O(nk)\) is the sum of degrees of all unique nodes in input trees. ASTRAL-III enables us to test whether contracting low-support branches in gene trees improves the accuracy by reducing noise. In extensive simulations and on real data, we show that removing branches with very low support improves accuracy while overly aggressive filtering is harmful.
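The optimization criterion, the number of quartet topologies a candidate species tree shares with the input gene trees, can be made concrete with a brute-force sketch. This is not ASTRAL's dynamic programming algorithm: it enumerates every four-leaf subset, so it is only practical for tiny trees. Trees are assumed to be nested tuples of string leaf names, and all function names are hypothetical.

```python
from itertools import combinations

def clusters(tree):
    """Return (leaf set, list of clusters) for a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clusters = clusters(child)
        leaves |= child_leaves
        found.extend(child_clusters)
    found.append(leaves)  # the cluster below this internal node
    return leaves, found

def quartet_topologies(tree):
    """Map each resolved four-leaf subset to its split, e.g. {A,B} | {C,D}."""
    leaves, all_clusters = clusters(tree)
    topologies = {}
    for quartet in combinations(sorted(leaves), 4):
        q = frozenset(quartet)
        for cluster in all_clusters:
            pair = cluster & q
            # A cluster holding exactly two of the four leaves fixes the split.
            if len(pair) == 2:
                topologies[q] = frozenset([pair, q - pair])
                break
    return topologies

def quartet_score(species_tree, gene_trees):
    """Count quartet topologies shared between a species tree and gene trees."""
    species_quartets = quartet_topologies(species_tree)
    score = 0
    for gene_tree in gene_trees:
        gene_quartets = quartet_topologies(gene_tree)
        score += sum(1 for q, split in species_quartets.items()
                     if gene_quartets.get(q) == split)
    return score

species = ((("A", "B"), "C"), ("D", "E"))
discordant_gene = ((("A", "C"), "B"), ("D", "E"))
print(quartet_score(species, [species, discordant_gene]))  # prints 8
```

The species tree shares all 5 quartets with itself and 3 with the discordant gene tree, hence the score of 8. Quartets that a gene tree leaves unresolved (because of a polytomy) or that involve missing leaves simply do not count toward the score in this sketch.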
Molecular Biology and Evolution | 2017
Erfan Sayyari; James B. Whitfield; Siavash Mirarab
Species tree reconstruction from genome-wide data is increasingly being attempted, in most cases using a two-step approach of first estimating individual gene trees and then summarizing them to obtain a species tree. The accuracy of this approach, which promises to account for gene tree discordance, depends on the quality of the inferred gene trees. At the same time, phylogenomic and phylotranscriptomic analyses typically use involved bioinformatics pipelines for data preparation. Errors and shortcomings resulting from these preprocessing steps may impact the species tree analyses at the other end of the pipeline. In this article, we first show that the presence of fragmentary data for some species in a gene alignment, as often seen on real data, can result in substantial deterioration of gene trees, and as a result, the species tree. We then investigate a simple filtering strategy where individual fragmentary sequences are removed from individual genes but the rest of the gene is retained. Both in simulations and by reanalyzing a large insect phylotranscriptomic data set, we show the effectiveness of this simple filtering strategy.
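The filtering strategy can be sketched as follows: within each gene alignment, drop only those sequences that are mostly gaps and keep the rest of the gene. This is a minimal illustration under stated assumptions. The 0.5 threshold, the gap characters, and the function name are all hypothetical choices for this example and are not taken from the paper.

```python
def filter_fragments(alignment, min_fraction=0.5, gap_chars="-?"):
    """Remove fragmentary sequences from one gene alignment.

    alignment maps a species name to its aligned sequence (equal lengths).
    A sequence is kept if at least min_fraction of its sites are not gaps.
    Both the threshold and the gap characters are assumptions.
    """
    kept = {}
    for name, seq in alignment.items():
        if not seq:
            continue  # skip empty records
        non_gap = sum(1 for ch in seq if ch not in gap_chars)
        if non_gap / len(seq) >= min_fraction:
            kept[name] = seq
    return kept

gene = {
    "sp1": "ACGTACGT",
    "sp2": "AC------",  # fragmentary: only 2 of 8 sites present
    "sp3": "ACGT--GT",
}
filtered = filter_fragments(gene)
print(sorted(filtered))  # prints ['sp1', 'sp3']
```

Applying the function gene by gene removes a species only from the genes where its sequence is fragmentary, so the species still contributes to the gene trees where its data are adequate.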
BMC Genomics | 2016
Erfan Sayyari; Siavash Mirarab
Background: Inferring species trees from gene trees using coalescent-based summary methods has been the subject of much attention, yet new scalable and accurate methods are needed. Results: We introduce DISTIQUE, a new statistically consistent summary method for inferring species trees from gene trees under the coalescent model. We generalize our results to arbitrary phylogenetic inference problems; we show that two arbitrarily chosen leaves, called anchors, can be used to estimate relative distances between all other pairs of leaves by inferring relevant quartet trees. This results in a family of distance-based tree inference methods, with running times ranging from quadratic to quartic in the number of leaves. Conclusions: We show in simulated studies that DISTIQUE has comparable accuracy to leading coalescent-based summary methods and reduced running times.
BMC Bioinformatics | 2018
Chao Zhang; Maryam Rabiee; Erfan Sayyari; Siavash Mirarab
Background: Evolutionary histories can be discordant across the genome, and such discordances need to be considered in reconstructing the species phylogeny. ASTRAL is one of the leading methods for inferring species trees from gene trees while accounting for gene tree discordance. ASTRAL uses dynamic programming to search for the tree that shares the maximum number of quartet topologies with input gene trees, restricting itself to a predefined set of bipartitions. Results: We introduce ASTRAL-III, which substantially improves the running time of ASTRAL-II and guarantees polynomial running time as a function of both the number of species (n) and the number of genes (k). ASTRAL-III limits the bipartition constraint set (X) to grow at most linearly with n and k. Moreover, it handles polytomies more efficiently than ASTRAL-II, exploits similarities between gene trees better, and uses several techniques to avoid searching parts of the search space that are mathematically guaranteed not to include the optimal tree. The asymptotic running time of ASTRAL-III in the presence of polytomies is \(O((nk)^{1.726} D)\), where \(D=O(nk)\) is the sum of degrees of all unique nodes in input trees. The running time improvements enable us to test whether contracting low-support branches in gene trees improves the accuracy by reducing noise. In extensive simulations, we show that removing branches with very low support (e.g., below 10%) improves accuracy while overly aggressive filtering is harmful. We observe on a biological avian phylogenomic dataset of 14K genes that contracting low-support branches greatly improves results. Conclusions: ASTRAL-III is a faster version of the ASTRAL method for phylogenetic reconstruction and can scale up to 10,000 species. With ASTRAL-III, low-support branches can be removed, resulting in improved accuracy.
PLOS ONE | 2017
Uyen Mai; Erfan Sayyari; Siavash Mirarab
Phylogenetic trees inferred using commonly used models of sequence evolution are unrooted, but the root position matters both for interpretation and downstream applications. This issue has been long recognized; however, whether the potential for discordance between the species tree and gene trees impacts methods of rooting a phylogenetic tree has not been extensively studied. In this paper, we introduce a new method of rooting a tree based on its branch length distribution; our method, which minimizes the variance of root to tip distances, is inspired by the traditional midpoint rerooting and is justified when deviations from the strict molecular clock are random. Like midpoint rerooting, the method can be implemented in a linear-time algorithm. In extensive simulations that consider discordance between gene trees and the species tree, we show that the new method is more accurate than midpoint rerooting, but its relative accuracy compared to using outgroups to root gene trees depends on the size of the dataset and levels of deviations from the strict clock. We show high levels of error for all methods of rooting estimated gene trees due to factors that include effects of gene tree discordance, deviations from the clock, and gene tree estimation error. Our simulations, however, did not reveal significant differences between two equivalent methods for species tree estimation that use rooted and unrooted input, namely, STAR and NJst. Nevertheless, our results point to limitations of existing scalable rooting methods.
Genes | 2018
Erfan Sayyari; Siavash Mirarab
Phylogenetic species trees typically represent the speciation history as a bifurcating tree. Speciation events that simultaneously create more than two descendants, thereby creating polytomies in the phylogeny, are possible. Moreover, the inability to resolve relationships is often shown as a (soft) polytomy. Both types of polytomies have been traditionally studied in the context of gene tree reconstruction from sequence data. However, polytomies in the species tree cannot be detected or ruled out without considering gene tree discordance. In this paper, we describe a statistical test based on properties of the multi-species coalescent model to test the null hypothesis that a branch in an estimated species tree should be replaced by a polytomy. On both simulated and biological datasets, we show that the null hypothesis is rejected for all but the shortest branches, and in most cases, it is retained for true polytomies. The test, available as part of the Accurate Species TRee ALgorithm (ASTRAL) package, can help systematists decide whether their datasets are sufficient to resolve specific relationships of interest.
Hawaii International Conference on System Sciences | 2016
Nadir Weibel; So-One Hwang; Steven Rick; Erfan Sayyari; Dan Lenzen; James D. Hollan
Gestures, the visible body movements that are ubiquitous in human behavior, are key elements of natural communication. Understanding them is fundamental to designing computing applications with more natural forms of interaction. Both sign languages and everyday gestures reveal the rich signal capacity of this modality. However, although research is developing at a fast pace, we still lack in-depth understanding of the elements that create the underlying symbolic signals. This is partly due to a lack of tools for studying communicative movements in context. We introduce a novel approach to address this problem based on unobtrusive depth cameras and develop an infrastructure supporting naturalistic data collection. While we focus on sign language and gestures, the tools we developed are applicable to other types of body-based research applications. We report on the quality of data collection, and we show how our approach can lead to novel insights and understanding of communicative movements.
Molecular Phylogenetics and Evolution | 2018
Erfan Sayyari; James B. Whitfield; Siavash Mirarab
Molecular Phylogenetics and Evolution | 2018
Maryam Rabiee; Erfan Sayyari; Siavash Mirarab