Derrick J. Zwickl
University of Arizona
Publication
Featured research published by Derrick J. Zwickl.
Systematic Biology | 2002
Derrick J. Zwickl; David M. Hillis
Several authors have argued recently that extensive taxon sampling has a positive and important effect on the accuracy of phylogenetic estimates. However, other authors have argued that there is little benefit of extensive taxon sampling, and so phylogenetic problems can or should be reduced to a few exemplar taxa as a means of reducing the computational complexity of the phylogenetic analysis. In this paper we examined five aspects of study design that may have led to these different perspectives. First, we considered the measurement of phylogenetic error across a wide range of taxon sample sizes, and conclude that the expected error based on randomly selecting trees (which varies by taxon sample size) must be considered in evaluating error in studies of the effects of taxon sampling. Second, we addressed the scope of the phylogenetic problems defined by different samples of taxa, and argue that phylogenetic scope needs to be considered in evaluating the importance of taxon-sampling strategies. Third, we examined the claim that fast and simple tree searches are as effective as more thorough searches at finding near-optimal trees that minimize error. We show that a more complete search of tree space reduces phylogenetic error, especially as the taxon sample size increases. Fourth, we examined the effects of simple versus complex simulation models on taxonomic sampling studies. Although benefits of taxon sampling are apparent for all models, data generated under more complex models of evolution produce higher overall levels of error and show greater positive effects of increased taxon sampling. Fifth, we asked if different phylogenetic optimality criteria show different effects of taxon sampling. Although we found strong differences in effectiveness of different optimality criteria as a function of taxon sample size, increased taxon sampling improved the results from all the common optimality criteria. 
Nonetheless, the method that showed the lowest overall performance (minimum evolution) also showed the least improvement from increased taxon sampling. Taking each of these results into account reinforces the conclusion that increased sampling of taxa is one of the most important ways to increase overall phylogenetic accuracy.
Molecular Phylogenetics and Evolution | 2002
Thomas P. Wilcox; Derrick J. Zwickl; Tracy A. Heath; David M. Hillis
Four New World genera of dwarf boas (Exiliboa, Trachyboa, Tropidophis, and Ungaliophis) have been placed by many systematists in a single group (traditionally called Tropidophiidae). However, the monophyly of this group has been questioned in several studies. Moreover, the overall relationships among basal snake lineages, including the placement of the dwarf boas, are poorly understood. We obtained mtDNA sequence data for 12S, 16S, and intervening tRNA-val genes from 23 species of snakes representing most major snake lineages, including all four genera of New World dwarf boas. We then examined the phylogenetic position of these species by estimating the phylogeny of the basal snakes. Our phylogenetic analysis suggests that New World dwarf boas are not monophyletic. Instead, we find Exiliboa and Ungaliophis to be most closely related to sand boas (Erycinae), boas (Boinae), and advanced snakes (Caenophidea), whereas Tropidophis and Trachyboa form an independent clade that separated relatively early in snake radiation. Our estimate of snake phylogeny differs significantly in other ways from some previous estimates of snake phylogeny. For instance, pythons do not cluster with boas and sand boas, but instead show a strong relationship with Loxocemus and Xenopeltis. Additionally, uropeltids cluster strongly with Cylindrophis, and together are embedded in what has previously been considered the macrostomatan radiation. These relationships are supported by both bootstrapping (parametric and nonparametric approaches) and Bayesian analysis, although Bayesian support values are consistently higher than those obtained from nonparametric bootstrapping. Simulations show that Bayesian support values represent much better estimates of phylogenetic accuracy than do nonparametric bootstrap support values, at least under the conditions of our study.
Systematic Biology | 2002
David D. Pollock; Derrick J. Zwickl; Jimmy A. McGuire; David M. Hillis
Until recently, it was believed that complex phylogenies might be extremely difficult to reconstruct due to the phenomenal rate of increase in the number of possible phylogenies as the number of taxa increases. However, Hillis (1996) showed through simulation that, for at least one complex phylogeny of angiosperms with 228 taxa, reconstruction was far more accurate than expected, even with relatively modest amounts of DNA sequence data. This led to a flurry of papers on the subject of taxon sampling and phylogenetic reconstruction, with focus quickly shifting from the question of whether complex phylogenies can be reconstructed to whether and how much an existing phylogeny can be improved through increased taxon sampling (Hillis, 1998; Kim, 1998; Poe, 1998; Poe and Swofford, 1999; Pollock and Bruno, 2000; Rannala et al., 1998; Yang, 1998). Although a statistician might intuitively believe that it is generally better (or at least no worse) to increase the amount of data to resolve a question in statistical inference, the benefits of taxon addition for phylogenetic inference remain controversial. Some researchers have argued that taxon addition can decrease accuracy (Kim, 1996, 1998), while others believe that increased sampling improves accuracy (Graybeal, 1998; Hillis, 1996, 1998; Murphy et al., 2001; Poe, 1998; Pollock and Bruno, 2000; Pollock et al., 2000; Soltis et al., 1999). The reasons that different papers come to apparently contradictory conclusions deserve careful consideration. An often cited factor affecting the benefits of taxon addition is the phenomenon of long-branch attraction (LBA). Some phylogenetic methods have a bias toward preferential clustering of long branches, leading to erroneous results when those long branches do not actually represent a monophyletic assemblage (Felsenstein, 1978; Hendy and Penny, 1989). 
This phenomenon has been cited in favor of increased taxon sampling, since sampling can be designed to break up long branches (Hillis, 1998). However, increased sampling has also been implicated as a potential cause of LBA because addition of a new long branch may wrongly attract a pre-existing long branch that had previously been inferred correctly (Poe and Swofford, 1999; Rannala et al., 1998). LBA may also explain some simulations that have found problems in phylogeny estimation when sampling outside the taxonomic group of interest (but see Pollock and Bruno [2000] for an alternative explanation). Outside sampling in these simulations tended to add long branches, which tended to attract the longest unbroken branch in the group of interest (Hillis, 1998; Rannala et al., 1998). The degree to which LBA is a problem depends greatly on the method of analysis, and LBA is much less of a problem for maximum likelihood (ML) than for parsimony or distance methods (Bruno and Halpern, 1999). A recent paper on the subject of taxon addition (Rosenberg and Kumar, 2001) concludes that increased taxon sampling is of little benefit to phylogenetic inference when compared to increasing sequence length. We disagree with their interpretation and believe that their data support the importance of increased taxon sampling. In addition, some of their data were simulated under extreme conditions (i.e., substitution rates that were very high or low, or sequences that were unreasonably short). Large error values and nonlinear relationships at these extremes make it difficult to interpret effects for the majority of the range, and averaging across the entire range is inappropriate. Moreover, we do not believe that Rosenberg and Kumar (2001) used the most appropriate metric to measure the relative effect of taxon addition. Our reanalysis of their simulated data indicates that increased taxon sampling is highly beneficial for phylogenetic inference.
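The growth in the number of possible phylogenies mentioned in this abstract can be made concrete with a standard combinatorial result: the number of distinct unrooted, fully bifurcating trees on n labeled taxa is (2n - 5)!! for n ≥ 3. The sketch below is illustrative only and is not taken from the paper; the function name `num_unrooted_trees` is an assumption chosen for this example.

```python
def num_unrooted_trees(n_taxa: int) -> int:
    """Count unrooted, fully bifurcating, labeled trees: (2n - 5)!! for n >= 3.

    Hypothetical helper for illustration; not part of any phylogenetics package.
    """
    if n_taxa < 3:
        return 1  # with fewer than three taxa there is only one topology
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 228):
        trees = num_unrooted_trees(n)
        # Report the number of decimal digits rather than the full integer.
        print(f"{n} taxa: {len(str(trees))} digit(s)")
```

Python integers have arbitrary precision, so the count for 228 taxa (the size of the angiosperm phylogeny cited above) is computed exactly rather than overflowing.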
Systematic Biology | 2003
David M. Hillis; David D. Pollock; Jimmy A. McGuire; Derrick J. Zwickl
Rosenberg and Kumar (2001) addressed the importance of taxon sampling in phylogenetic analysis and concluded that phylogenetic error is “largely independent of taxon sample size” (2001:10756) and that their “results do not provide evidence in favor of adding taxa to problematic phylogenies” (2001:10756). In response to these conclusions, Zwickl and Hillis (2002) and Pollock et al. (2002) conducted additional simulations and reanalyzed the data presented by Rosenberg and Kumar (2001). Zwickl and Hillis and Pollock et al. showed that these conclusions of Rosenberg and Kumar could not be supported either by analyses of their original data or by new simulations that corrected a number of deficiencies in Rosenberg and Kumar’s original experimental design. Both Zwickl and Hillis and Pollock et al. found that increased taxon sampling resulted in greatly reduced phylogenetic estimation error, and Pollock et al. showed that the benefits of increased taxon sampling were similar to adding an equivalent amount of sequence length for the same taxa (in the ranges simulated by Rosenberg and Kumar). In their response, Rosenberg and Kumar (2002) focused on a slightly different conclusion from that in their original paper, which was that “longer sequences, rather than extensive sampling, will better improve the accuracy of phylogenetic inference” (2001:10751). In 2001, Rosenberg and Kumar argued that the beneficial effect of increasing taxa was 10-fold lower than the beneficial effect of increasing sequence length and that the effects of increased taxon sampling for the same genes were negligible (“largely independently” of phylogenetic error). Rosenberg and Kumar (2002) have now concluded that the beneficial effect of increasing taxon sample size is not small, but they suggested that the benefit comes simply from the overall increase in size of the data matrix (the total number of characters × taxa). 
Furthermore, they maintained that there is a greater benefit to increasing the total sequence length for few taxa than can be obtained by increasing taxon sampling for the same genes. Here, we discuss the two sets of conclusions reached by Rosenberg and Kumar (2001, 2002).
Systematic Biology | 2012
Daniel L. Ayres; Aaron E. Darling; Derrick J. Zwickl; Peter Beerli; Mark T. Holder; Paul O. Lewis; John P. Huelsenbeck; Fredrik Ronquist; David L. Swofford; Michael P. Cummings; Andrew Rambaut; Marc A. Suchard
Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.
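For readers unfamiliar with what a "phylogenetic likelihood calculation" involves, the following is a minimal pure-Python sketch of Felsenstein's pruning algorithm under the Jukes-Cantor (JC69) model. It does not use or reproduce the BEAGLE API; the tree encoding (a leaf is a sequence string, an internal node is a list of `(child, branch_length)` pairs) and all function names are assumptions made for this example.

```python
import math

BASES = "ACGT"


def jc69_prob(i: str, j: str, t: float) -> float:
    """JC69 probability of state i changing to j over branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial(node, site: int) -> list:
    """Conditional likelihoods of the data below `node`, one per state."""
    if isinstance(node, str):
        # Leaf: probability 1 for the observed base, 0 for the others.
        return [1.0 if b == node[site] else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        child_partial = partial(child, site)
        for i, a in enumerate(BASES):
            # Sum over the child's possible states, weighted by P(a -> b | t).
            result[i] *= sum(
                jc69_prob(a, b, t) * child_partial[j]
                for j, b in enumerate(BASES)
            )
    return result


def log_likelihood(tree, n_sites: int) -> float:
    """Log-likelihood summed over sites, with equal base frequencies at the root."""
    total = 0.0
    for s in range(n_sites):
        total += math.log(sum(0.25 * p for p in partial(tree, s)))
    return total


if __name__ == "__main__":
    # Hypothetical three-taxon tree: ((seq1, seq2), seq3) with made-up branch lengths.
    tree = [([("ACGT", 0.1), ("ACGA", 0.1)], 0.05), ("ACTT", 0.2)]
    print(log_likelihood(tree, 4))
```

The inner loops over states and sites are independent of one another, which is the kind of fine-grained parallelism that the abstract says GPU and SIMD implementations exploit.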
PLOS ONE | 2013
Jerome C. Regier; Charles Mitter; Andreas Zwick; Adam L. Bazinet; Michael P. Cummings; Akito Y. Kawahara; Jae-Cheon Sohn; Derrick J. Zwickl; Soowon Cho; Donald R. Davis; Joaquin Baixeras; John W. Brown; Cynthia Sims Parr; Susan J. Weller; David C. Lees; Kim T. Mitter
Background: Higher-level relationships within the Lepidoptera, and particularly within the species-rich subclade Ditrysia, are generally not well understood, although recent studies have yielded progress. We present the most comprehensive molecular analysis of lepidopteran phylogeny to date, focusing on relationships among superfamilies. Methodology/Principal Findings: 483 taxa spanning 115 of 124 families were sampled for 19 protein-coding nuclear genes, from which maximum likelihood tree estimates and bootstrap percentages were obtained using GARLI. Assessment of heuristic search effectiveness showed that better trees and higher bootstrap percentages probably remain to be discovered even after 1000 or more search replicates, but further search proved impractical even with grid computing. Other analyses explored the effects of sampling nonsynonymous change only versus partitioned and unpartitioned total nucleotide change; deletion of rogue taxa; and compositional heterogeneity. Relationships among the non-ditrysian lineages previously inferred from morphology were largely confirmed, plus some new ones, with strong support. Robust support was also found for divergences among non-apoditrysian lineages of Ditrysia, but only rarely so within Apoditrysia. Paraphyly for Tineoidea is strongly supported by analysis of nonsynonymous-only signal; conflicting, strong support for tineoid monophyly when synonymous signal was added back is shown to result from compositional heterogeneity. Conclusions/Significance: Support for among-superfamily relationships outside the Apoditrysia is now generally strong. Comparable support is mostly lacking within Apoditrysia, but dramatically increased bootstrap percentages for some nodes after rogue taxon removal, and concordance with other evidence, strongly suggest that our picture of apoditrysian phylogeny is approximately correct. This study highlights the challenge of finding optimal topologies when analyzing hundreds of taxa. 
It also shows that some nodes get strong support only when analysis is restricted to nonsynonymous change, while total change is necessary for strong support of others. Thus, multiple types of analyses will be necessary to fully resolve lepidopteran phylogeny.
Systematic Biology | 2014
Adam L. Bazinet; Derrick J. Zwickl; Michael P. Cummings
We introduce molecularevolution.org, a publicly available gateway for high-throughput, maximum-likelihood phylogenetic analysis powered by grid computing. The gateway features a GARLI 2.0 web service that enables a user to quickly and easily submit thousands of maximum likelihood tree searches or bootstrap searches that are executed in parallel on distributed computing resources. The GARLI web service allows one to easily specify partitioned substitution models using a graphical interface, and it performs sophisticated post-processing of phylogenetic results. Although the GARLI web service has been used by the research community for over three years, here we formally announce the availability of the service, describe its capabilities, highlight new features and recent improvements, and provide details about how the grid system efficiently delivers high-quality phylogenetic results. [GARLI, gateway, grid computing, maximum likelihood, molecular evolution portal, phylogenetics, web service.]
Systematic Biology | 2008
Tracy A. Heath; Derrick J. Zwickl; Junhyong Kim; David M. Hillis
Proceedings of the National Academy of Sciences of the United States of America | 2010
Matthew E. Arnegard; Derrick J. Zwickl; Ying Lu; Harold H. Zakon
The genetic basis of parallel innovation remains poorly understood due to the rarity of independent origins of the same complex trait among model organisms. We focus on two groups of teleost fishes that independently gained myogenic electric organs underlying electrical communication. Earlier work suggested that a voltage-gated sodium channel gene (Scn4aa), which arose by whole-genome duplication, was neofunctionalized for expression in electric organ and subsequently experienced strong positive selection. However, it was not possible to determine if these changes were temporally linked to the independent origins of myogenic electric organs in both lineages. Here, we test predictions of such a relationship. We show that Scn4aa co-option and rapid sequence evolution were tightly coupled to the two origins of electric organ, providing strong evidence that Scn4aa contributed to parallel innovations underlying the evolutionary diversification of each electric fish group. Independent evolution of electric organs and Scn4aa co-option occurred more than 100 million years following the origin of Scn4aa by duplication. During subsequent diversification of the electrical communication channels, amino acid substitutions in both groups occurred in the same regions of the sodium channel that likely contribute to electric signal variation. Thus, the phenotypic similarities between independent electric fish groups are also associated with striking parallelism at genetic and molecular levels. Our results show that gene duplication can contribute to remarkably similar innovations in repeatable ways even after long waiting periods between gene duplication and the origins of novelty.
Proceedings of the National Academy of Sciences of the United States of America | 2010
Diane I. Scaduto; Jeremy M. Brown; Wade C. Haaland; Derrick J. Zwickl; David M. Hillis; Michael L. Metzker
Phylogenetic analysis has been widely used to test the a priori hypothesis of epidemiological clustering in suspected transmission chains of HIV-1. Among studies showing strong support for relatedness between HIV samples obtained from infected individuals, evidence for the direction of transmission between epidemiologically related pairs has been lacking. During transmission of HIV, a genetic bottleneck occurs, resulting in the paraphyly of source viruses with respect to those of the recipient. This paraphyly establishes the direction of transmission, from which the source can then be inferred. Here, we present methods and results from two criminal cases, State of Washington v Anthony Eugene Whitfield, case number 04-1-0617-5 (Superior Court of the State of Washington, Thurston County, 2004) and State of Texas v Philippe Padieu, case numbers 219-82276-07, 219-82277-07, 219-82278-07, 219-82279-07, 219-82280-07, and 219-82705-07 (219th Judicial District Court, Collin County, TX, 2009), which provided evidence that direction can be established from blinded case samples. The observed paraphyly from each case study led to the identification of an inferred source (i.e., index case), whose identity was revealed at trial to be that of the defendant.
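The logic of inferring direction from paraphyly can be sketched in a few lines. The example below is a deliberately simplified illustration, not the method used in the cases described: it assumes a rooted tree encoded as nested tuples of leaf labels, treats "paraphyletic" as "not forming a clade while the other individual's sequences do", and ignores statistical support. All names and the example tree are hypothetical.

```python
def leaves(tree) -> set:
    """Collect the leaf labels of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return {tree}
    out = set()
    for child in tree:
        out |= leaves(child)
    return out


def is_monophyletic(tree, group) -> bool:
    """True if some subtree contains exactly the labels in `group`."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


def infer_source(tree, person_a, person_b):
    """Return 'a' or 'b' for the inferred source, or None if undetermined.

    Simplifying assumption: the recipient's sequences form a clade, while
    the source's sequences do not because the recipient clade nests in them.
    """
    a_mono = is_monophyletic(tree, person_a)
    b_mono = is_monophyletic(tree, person_b)
    if b_mono and not a_mono:
        return "a"
    if a_mono and not b_mono:
        return "b"
    return None


if __name__ == "__main__":
    # Hypothetical rooted tree: sequences R1 and R2 nest within S1..S3.
    tree = ("OUT", (("S1", ("S2", ("R1", "R2"))), "S3"))
    print(infer_source(tree, {"S1", "S2", "S3"}, {"R1", "R2"}))
```

When both sets of sequences form clades, or neither does, the function returns `None`, reflecting that direction cannot be read from topology alone in those situations.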