David A. Duchêne
University of Sydney
Publication
Featured research published by David A. Duchêne.
Molecular Biology and Evolution | 2015
Sebastián Duchêne; David A. Duchêne; Edward C. Holmes; Simon Y. W. Ho
Rates and timescales of viral evolution can be estimated using phylogenetic analyses of time-structured molecular sequences. This involves the use of molecular-clock methods, calibrated by the sampling times of the viral sequences. However, the spread of these sampling times is not always sufficient to allow the substitution rate to be estimated accurately. We conducted Bayesian phylogenetic analyses of simulated virus data to evaluate the performance of the date-randomization test, which is sometimes used to investigate whether time-structured data sets have temporal signal. An estimate of the substitution rate passes this test if its mean does not fall within the 95% credible intervals of rate estimates obtained using replicate data sets in which the sampling times have been randomized. We find that the test sometimes fails to detect rate estimates from data with no temporal signal. This error can be minimized by using a more conservative criterion, whereby the 95% credible interval of the estimate with correct sampling times should not overlap with those obtained with randomized sampling times. We also investigated the behavior of the test when the sampling times are not uniformly distributed throughout the tree, which sometimes occurs in empirical data sets. The test performs poorly in these circumstances, such that a modification to the randomization scheme is needed. Finally, we illustrate the behavior of the test in analyses of nucleotide sequences of cereal yellow dwarf virus. Our results validate the use of the date-randomization test and allow us to propose guidelines for interpretation of its results.
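The two pass criteria described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function names and the representation of a credible interval as a `(lower, upper)` tuple are hypothetical, and the sketch takes the rate estimates as given rather than running any phylogenetic analysis.

```python
from typing import List, Tuple

# Hypothetical representation: (lower, upper) bounds of a 95% credible interval.
Interval = Tuple[float, float]


def passes_standard(mean_rate: float, randomized: List[Interval]) -> bool:
    """Standard criterion: the mean rate from the correct sampling times
    lies outside every interval obtained from date-randomized replicates."""
    return all(not (lo <= mean_rate <= hi) for lo, hi in randomized)


def passes_conservative(true_interval: Interval, randomized: List[Interval]) -> bool:
    """Conservative criterion: the interval from the correct sampling times
    overlaps none of the intervals from date-randomized replicates."""
    t_lo, t_hi = true_interval
    return all(t_hi < lo or hi < t_lo for lo, hi in randomized)


# Made-up numbers, for illustration only.
randomized = [(1e-5, 4e-4), (2e-5, 5e-4)]
print(passes_standard(8e-4, randomized))
print(passes_conservative((4.5e-4, 1.2e-3), randomized))
```

With these made-up values the estimate passes the standard criterion but fails the conservative one, because its interval overlaps the second randomized interval.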
Molecular Ecology Resources | 2015
Simon Y. W. Ho; Sebastián Duchêne; David A. Duchêne
Evolutionary timescales can be estimated from genetic data using phylogenetic methods based on the molecular clock. To account for molecular rate variation among lineages, a number of relaxed‐clock models have been developed. Some of these models assume that rates vary among lineages in an autocorrelated manner, so that closely related species share similar rates. In contrast, uncorrelated relaxed clocks allow all of the branch‐specific rates to be drawn from a single distribution, without assuming any correlation between rates along neighbouring branches. There is uncertainty about which of these two classes of relaxed‐clock models is more appropriate for biological data. We present an R package, NELSI, that allows the evolution of DNA sequences to be simulated according to a range of clock models. Using data generated by this package, we assessed the ability of two Bayesian phylogenetic methods to distinguish among different relaxed‐clock models and to quantify rate variation among lineages. The results of our analyses show that rate autocorrelation is typically difficult to detect, even when there is complete taxon sampling. This provides a potential explanation for past failures to detect rate autocorrelation in a range of data sets.
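The difference between the two classes of relaxed clock can be illustrated with a minimal Python sketch. It is not the NELSI API (NELSI is an R package), and every name here is hypothetical. It assumes a tree stored as a child-to-parent mapping, lognormally distributed rates, and, for simplicity, an autocorrelated model whose variance does not scale with branch length.

```python
import math
import random
from typing import Dict, Optional

# Hypothetical tree representation: node name -> parent name (None for the root).
Tree = Dict[str, Optional[str]]


def uncorrelated_rates(parents: Tree, mean_log_rate: float, sd_log: float,
                       rng: random.Random) -> Dict[str, float]:
    """Uncorrelated clock: every branch rate is drawn from one shared lognormal."""
    return {node: rng.lognormvariate(mean_log_rate, sd_log)
            for node, parent in parents.items() if parent is not None}


def autocorrelated_rates(parents: Tree, root_rate: float, sd_log: float,
                         rng: random.Random) -> Dict[str, float]:
    """Autocorrelated clock: each branch rate is drawn from a lognormal
    centred (on the log scale) on the rate of its parent branch."""
    rates: Dict[str, float] = {}

    def rate_of(node: str) -> float:
        parent = parents[node]
        if parent is None:
            return root_rate
        if node not in rates:
            rates[node] = rng.lognormvariate(math.log(rate_of(parent)), sd_log)
        return rates[node]

    for node in parents:
        rate_of(node)
    return rates


tree = {"root": None, "a": "root", "b": "root", "c": "a", "d": "a"}
rng = random.Random(1)
print(uncorrelated_rates(tree, math.log(1e-3), 0.5, rng))
print(autocorrelated_rates(tree, 1e-3, 0.5, rng))
```

In the autocorrelated sketch, the rates on branches `c` and `d` depend on the rate drawn for `a`, whereas in the uncorrelated sketch all four branches are independent draws.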
BMC Evolutionary Biology | 2013
David A. Duchêne; Lindell Bromham
Background: Many factors have been identified as correlates of the rate of molecular evolution, such as body size and generation length. Analysis of many molecular phylogenies has also revealed correlations between substitution rates and clade size, suggesting a link between rates of molecular evolution and the process of diversification. However, it is not known whether this relationship applies to all lineages and all sequences. Here, in order to investigate how widespread this phenomenon is, we investigate patterns of substitution in chloroplast genomes of the diverse angiosperm family Proteaceae. We used DNA sequences from six chloroplast genes (6278 bp alignment with 62 taxa) to test for a correlation between diversification and the rate of substitutions. Results: Using phylogenetically-independent sister pairs, we show that species-rich lineages of Proteaceae tend to have significantly higher chloroplast substitution rates, for both synonymous and non-synonymous substitutions. Conclusions: We show that the rate of molecular evolution in chloroplast genomes is correlated with net diversification rates in this large plant family. We discuss the possible causes of this relationship, including molecular evolution driving diversification, speciation increasing the rate of substitutions, or a third factor causing an indirect link between molecular and diversification rates. The link between the synonymous substitution rate and clade size is consistent with a role for the mutation rate of chloroplasts driving the speed of reproductive isolation. We find no significant differences in the ratio of non-synonymous to synonymous substitutions between lineages differing in net diversification rate; therefore, we detect no signal of population size changes or alteration in selection pressures that might be causing this relationship.
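A sister-pair comparison of this kind can be sketched with a simple sign test. The abstract does not say which statistical test the authors used, so the sign test here is a stand-in chosen for illustration, and the function name and input layout are hypothetical.

```python
from math import comb
from typing import List, Tuple


def sister_pair_sign_test(pairs: List[Tuple[float, float]]) -> Tuple[int, int, float]:
    """Two-sided sign test on sister pairs.

    Each pair is (rate in the species-richer clade, rate in the species-poorer
    clade). Returns (pairs where the richer clade is faster, informative pairs,
    two-sided p-value under a 50:50 null). Tied pairs are dropped.
    """
    diffs = [rich - poor for rich, poor in pairs if rich != poor]
    n = len(diffs)
    k = sum(d > 0 for d in diffs)
    # Probability of a split at least as extreme as the observed one, in one tail.
    tail = sum(comb(n, i) for i in range(max(k, n - k), n + 1)) / 2 ** n
    return k, n, min(1.0, 2 * tail)


# Made-up substitution rates, for illustration only.
pairs = [(0.012, 0.009), (0.020, 0.015), (0.008, 0.007), (0.011, 0.010),
         (0.030, 0.021), (0.014, 0.013), (0.019, 0.012), (0.016, 0.011)]
print(sister_pair_sign_test(pairs))
```

Because each sister pair shares a common ancestor and the same amount of time since divergence, the pairs can be treated as independent observations, which is what makes a paired test appropriate here.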
Molecular Biology and Evolution | 2015
David A. Duchêne; Sebastián Duchêne; Edward C. Holmes; Simon Y. W. Ho
Molecular clock models are commonly used to estimate evolutionary rates and timescales from nucleotide sequences. The goal of these models is to account for rate variation among lineages, such that they are assumed to be adequate descriptions of the processes that generated the data. A common approach for selecting a clock model for a data set of interest is to examine a set of candidates and to select the model that provides the best statistical fit. However, this can lead to unreliable estimates if all the candidate models are actually inadequate. For this reason, a method of evaluating absolute model performance is critical. We describe a method that uses posterior predictive simulations to assess the adequacy of clock models. We test the power of this approach using simulated data and find that the method is sensitive to bias in the estimates of branch lengths, which tends to occur when using underparameterized clock models. We also compare the performance of the multinomial test statistic, originally developed to assess the adequacy of substitution models, but find that it has low power in identifying the adequacy of clock models. We illustrate the performance of our method using empirical data sets from coronaviruses, simian immunodeficiency virus, killer whales, and marine turtles. Our results indicate that methods of investigating model adequacy, including the one proposed here, should be routinely used in combination with traditional model selection in evolutionary studies. This will reveal whether a broader range of clock models needs to be considered in phylogenetic analysis.
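The general logic of a posterior predictive check can be sketched as below. This is a generic illustration, not the authors' implementation: the abstract does not specify the test statistic used for clock models, so the function simply takes a statistic computed on the empirical data and the same statistic computed on data sets simulated from the posterior. The function name is hypothetical.

```python
from typing import Sequence


def posterior_predictive_p(empirical_stat: float,
                           simulated_stats: Sequence[float]) -> float:
    """Proportion of posterior predictive simulations whose test statistic
    is less than or equal to the statistic from the empirical data.

    Values close to 0 or 1 suggest that the model does not reproduce this
    feature of the data, which would indicate that the model is inadequate.
    """
    if not simulated_stats:
        raise ValueError("at least one simulated statistic is required")
    below = sum(s <= empirical_stat for s in simulated_stats)
    return below / len(simulated_stats)


# Made-up statistics, for illustration only.
print(posterior_predictive_p(5.0, [1, 2, 3, 4, 6, 7, 8, 9, 10, 11]))
```

Unlike a comparison of relative fit among candidate models, this check can flag a problem even when every candidate is inadequate, since the empirical statistic is compared with what the fitted model itself predicts.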
Molecular Ecology Resources | 2015
David A. Duchêne; Sebastián Duchêne; Simon Y. W. Ho
Phylogenetic estimation of evolutionary timescales has become routine in biology, forming the basis of a wide range of evolutionary and ecological studies. However, there are various sources of bias that can affect these estimates. We investigated whether tree imbalance, a property that is commonly observed in phylogenetic trees, can lead to reduced accuracy or precision of phylogenetic timescale estimates. We analysed simulated data sets with calibrations at internal nodes and at the tips, taking into consideration different calibration schemes and levels of tree imbalance. We also investigated the effect of tree imbalance on two empirical data sets: mitogenomes from primates and serial samples of the African swine fever virus. In analyses calibrated using dated, heterochronous tips, we found that tree imbalance had a detrimental impact on precision and produced a bias in which the overall timescale was underestimated. A pronounced effect was observed in analyses with shallow calibrations. The greatest decreases in accuracy usually occurred in the age estimates for medium and deep nodes of the tree. In contrast, analyses calibrated at internal nodes did not display a reduction in estimation accuracy or precision due to tree imbalance. Our results suggest that molecular‐clock analyses can be improved by increasing taxon sampling, with the specific aims of including deeper calibrations, breaking up long branches and reducing tree imbalance.
Molecular Phylogenetics and Evolution | 2013
David A. Duchêne; Selma O. Klanten; Philip L. Munday; Jürgen Herler; Lynne van Herwerden
Molecular Biology and Evolution | 2017
David A. Duchêne; Sebastián Duchêne; Simon Y. W. Ho
In statistical phylogenetic analyses of DNA sequences, models of evolutionary change commonly assume that base composition is stationary through time and across lineages. This assumption is violated by many data sets, but it is unclear whether the magnitude of these violations is sufficient to mislead phylogenetic inference. We investigated the impacts of compositional heterogeneity on phylogenetic estimates using a method for assessing model adequacy. Based on a detailed simulation study, we found that common frequentist criteria are highly conservative, such that the model is often rejected when the phylogenetic estimates do not show clear signs of bias. We propose new criteria and provide guidelines for their usage. We apply these criteria to genome-scale data from 40 birds and find that loci with severely non-homogeneous base composition are uncommon. Our results show the importance of using well-informed diagnostic statistics when testing model adequacy for phylogenomic analyses.
Systematic Biology | 2018
David A. Duchêne; Jason G. Bragg; Sebastián Duchêne; Linda E. Neaves; Sally Potter; Craig Moritz; Rebecca N. Johnson; Simon Y. W. Ho; Mark D. B. Eldridge
A fundamental challenge in resolving evolutionary relationships across the tree of life is to account for heterogeneity in the evolutionary signal across loci. Studies of marsupial mammals have demonstrated that this heterogeneity can be substantial, leaving considerable uncertainty in the evolutionary timescale and relationships within the group. Using simulations and a new phylogenomic data set comprising nucleotide sequences of 1550 loci from 18 of the 22 extant marsupial families, we demonstrate the power of a method for identifying clusters of loci that support different phylogenetic trees. We find two distinct clusters of loci, each providing an estimate of the species tree that matches previously proposed resolutions of the marsupial phylogeny. We also identify a well‐supported placement for the enigmatic marsupial moles (Notoryctes) that contradicts previous molecular estimates but is consistent with morphological evidence. The pattern of gene‐tree variation across tree‐space is characterized by changes in information content, GC content, substitution‐model adequacy, and signatures of purifying selection in the data. In a simulation study, we show that incomplete lineage sorting can explain the division of loci into the two tree‐topology clusters, as found in our phylogenomic analysis of marsupials. We also demonstrate the potential benefits of minimizing uncertainty from phylogenetic conflict for molecular dating. Our analyses reveal that Australasian marsupials appeared in the early Paleocene, whereas the diversification of present‐day families occurred primarily during the late Eocene and early Oligocene. Our methods provide an intuitive framework for improving the accuracy and precision of phylogenetic inference and molecular dating using genome‐scale data.
Genome Biology and Evolution | 2016
Simon Y. W. Ho; Amanda X. Y. Chen; Luana S. F. Lins; David A. Duchêne; Nathan Lo
The molecular clock is a valuable and widely used tool for estimating evolutionary rates and timescales in biological research. There has been considerable progress in the theory and practice of molecular clocks over the past five decades. Although the idea of a molecular clock was originally put forward in the context of protein evolution and advanced using various biochemical techniques, it is now primarily applied to analyses of DNA sequences. An interesting but very underappreciated aspect of molecular clocks is that they can be based on genetic data other than DNA or protein sequences. For example, evolutionary timescales can be estimated using microsatellites, protein folds, and even the extent of recombination. These genome features hold great potential for molecular dating, particularly in cases where nucleotide sequences might be uninformative or unreliable. Here we present an outline of the different genetic data types that have been used for molecular dating, and we describe the features that good molecular clocks should possess. We hope that our article inspires further work on the genome as an evolutionary timepiece.
BMC Evolutionary Biology | 2016
Sebastián Duchêne; David A. Duchêne; Francesca Di Giallonardo; John-Sebastian Eden; Jemma L. Geoghegan; Kathryn E. Holt; Simon Y. W. Ho; Edward C. Holmes
Background: Recent developments in Bayesian phylogenetic models have increased the range of inferences that can be drawn from molecular sequence data. Accordingly, model selection has become an important component of phylogenetic analysis. Methods of model selection generally consider the likelihood of the data under the model in question. In the context of Bayesian phylogenetics, the most common approach involves estimating the marginal likelihood, which is typically done by integrating the likelihood across model parameters, weighted by the prior. Although this method is accurate, it is sensitive to the presence of improper priors. We explored an alternative approach based on cross-validation that is widely used in evolutionary analysis. This involves comparing models according to their predictive performance. Results: We analysed simulated data and a range of viral and bacterial data sets using a cross-validation approach to compare a variety of molecular clock and demographic models. Our results show that cross-validation can be effective in distinguishing between strict- and relaxed-clock models and in identifying demographic models that allow growth in population size over time. In most of our empirical data analyses, the model selected using cross-validation was able to match that selected using marginal-likelihood estimation. The accuracy of cross-validation appears to improve with longer sequence data, particularly when distinguishing between relaxed-clock models. Conclusions: Cross-validation is a useful method for Bayesian phylogenetic model selection. This method can be readily implemented even when considering complex models where selecting an appropriate prior for all parameters may be difficult.