
Publication


Featured research published by Huateng Huang.


Systematic Biology | 2016

Unforeseen Consequences of Excluding Missing Data from Next-Generation Sequences: Simulation Study of RAD Sequences

Huateng Huang; L. Lacey Knowles

There is a lack of consensus on how next-generation sequence (NGS) data should be considered for phylogenetic and phylogeographic estimates, with some studies excluding loci with missing data, whereas others include them, even when sequences are missing from a large number of individuals. Here, we use simulations, focusing specifically on RAD (Restriction site Associated DNA) sequences, to highlight some of the unforeseen consequences of excluding missing data from next-generation sequencing. Specifically, we show that in addition to the obvious effects associated with reducing the amount of data used to make historical inferences, the decisions we make about missing data (such as the minimum number of individuals with a sequence for a locus to be included in the study) also impact the types of loci sampled for a study. In particular, as the tolerance for missing data becomes more stringent, the mutational spectrum represented in the sampled loci becomes truncated such that loci with the highest mutation rates are disproportionately excluded. This effect is exacerbated further by factors involved in the preparation of the genomic library (i.e., the use of reduced representation libraries, as well as the coverage) and the taxonomic diversity represented in the library (i.e., the level of divergence among the individuals). We demonstrate that the intuitive appeal of being conservative by removing loci may be misguided. [Next-generation sequencing; phylogenetic; phylogeography; RADseq; RADtags; species delimitation.]
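
The truncation effect described in this abstract can be illustrated with a small back-of-the-envelope calculation. The sketch below is not the simulation used in the paper. It assumes, purely for illustration, that an individual lacks a locus when its restriction site has mutated, that mutations follow a Poisson process, and that individuals are independent of one another (real samples share a genealogy, so they are not). The function names, the 8-base recognition site, and the two mutation rates are hypothetical choices.

```python
from math import comb, exp


def p_individual_has_locus(mu, branch_time=1.0, site_len=8):
    """Chance that one individual's restriction site carries no mutation.

    Assumes a Poisson mutation process at per-site rate `mu` over
    `branch_time`, acting on a recognition site of `site_len` bases.
    """
    return exp(-mu * branch_time * site_len)


def p_locus_retained(mu, n_individuals, min_individuals,
                     branch_time=1.0, site_len=8):
    """Chance that at least `min_individuals` of `n_individuals` have data.

    Simplification: individuals are treated as independent draws, so the
    count of individuals with data is binomial.
    """
    p = p_individual_has_locus(mu, branch_time, site_len)
    return sum(
        comb(n_individuals, k) * p**k * (1 - p) ** (n_individuals - k)
        for k in range(min_individuals, n_individuals + 1)
    )


if __name__ == "__main__":
    n = 20
    # Compare a slowly and a rapidly mutating locus under stricter filters.
    for min_ind in (5, 10, 15, 20):
        slow = p_locus_retained(0.001, n, min_ind)
        fast = p_locus_retained(0.05, n, min_ind)
        print(f"min_individuals={min_ind:2d}  slow={slow:.3f}  "
              f"fast={fast:.3f}  fast/slow={fast / slow:.4f}")
```

Under these assumptions, raising `min_individuals` lowers the retention probability of the fast-mutating locus much more sharply than that of the slow one, which is the qualitative pattern the abstract reports.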


Systematic Biology | 2010

Sources of Error Inherent in Species-Tree Estimation: Impact of Mutational and Coalescent Effects on Accuracy and Implications for Choosing among Different Methods

Huateng Huang; Qixin He; Laura Kubatko; L. Lacey Knowles

Discord in the estimated gene trees among loci can be attributed to both the process of mutation and incomplete lineage sorting. Effectively modeling these two sources of variation, mutational and coalescent variance, provides two distinct challenges for phylogenetic studies. Despite extensive investigation on mutational models for gene-tree estimation over the past two decades and recent attention to modeling of the coalescent process for phylogenetic estimation, the effects of these two variances have yet to be evaluated simultaneously. Here, we partition the effects of mutational and coalescent processes on phylogenetic accuracy by comparing the accuracy of species trees estimated from gene trees (i.e., the actual coalescent genealogies) with that of species trees estimated from estimated gene trees (i.e., trees estimated from nucleotide sequences, which contain both coalescent and mutational variance). Not only is there a significant contribution of both mutational and coalescent variance to errors in species-tree estimates, but the relative magnitude of the effects on the accuracy of species-tree estimation also differs systematically depending on 1) the timing of divergence, 2) the sampling design, and 3) the method used for species-tree estimation. These findings explain why using more information contained in gene trees (e.g., topology and branch lengths as opposed to just topology) does not necessarily translate into pronounced gains in accuracy, highlighting the strengths and limits of different methods for species-tree estimation. Differences in accuracy scores between methods for different sampling regimes also emphasize that it would be a mistake to assume that more computationally intensive species-tree estimation procedures will always provide better estimates of species trees. To the contrary, the performance of a method depends not only on the method per se but also on the compatibilities between the input genetic data and the method as determined by the relative impact of mutational and coalescent variance.


Systematic Biology | 2009

Maximum Likelihood Estimates of Species Trees: How Accuracy of Phylogenetic Inference Depends upon the Divergence History and Sampling Design

John E. McCormack; Huateng Huang; L. Lacey Knowles

The understanding that gene trees are often in discord with each other and with the species trees that contain them has led researchers to methods that incorporate the inherent stochasticity of genetic processes in the phylogenetic estimation procedure. Recently developed methods for species-tree estimation that not only consider the retention and sorting of ancestral polymorphism but also quantify the actual probabilities of incomplete lineage sorting are expected to provide an improvement over earlier summary-statistic based approaches that discard much of the information content of gene trees. However, these new methods have yet to be tested on truly challenging evolutionary histories such as those marked by recent rapid speciation where high levels of incomplete lineage sorting and discord among gene trees predominate. Here, we test a new maximum-likelihood method that incorporates stochastic models of both nucleotide substitution and lineage sorting for species-tree estimation. Using a simulation approach, we consider a broad range of species-tree topologies under 2 scenarios representing moderate and severe incomplete lineage sorting. We show that the maximum-likelihood method results in more accurate species trees than a summary-statistic based approach, demonstrating that information contained in discordant gene trees can be effectively extracted using a full probabilistic model. Moreover, we demonstrate that the shape of the original species tree (i.e., the relative lengths of internal branches) has a significant impact on whether the species tree is estimated accurately. In the speciation histories explored here, it is not just the recent origin of species that affects the accuracy of the estimates but the variance in relative species divergence times as well. Additionally, we show that sampling effort (number of individuals and/or loci) and sampling design (ratio of individuals to loci) are both important factors affecting the accuracy of species-tree estimates, which is again affected by the relative timing of divergence among species. The inherent difficulties of estimating relationships when species have undergone a recent radiation are discussed, and in particular, the limitations with maximum-likelihood estimates of species trees that do not consider uncertainty in the estimated gene trees of individual loci. Thus, despite substantial improvements over current summary-statistic based approaches, and the increased sophistication of procedures that incorporate the process of gene lineage coalescence, recent radiations still appear to pose daunting challenges for phylogenetics.


Systematic Biology | 2009

What is the danger of the anomaly zone for empirical phylogenetics?

Huateng Huang; L. Lacey Knowles

The increasing number of observations of gene trees with discordant topologies in phylogenetic studies has raised awareness about the problems of incongruence between species trees and gene trees. Moreover, theoretical treatments focusing on the impact of coalescent variance on phylogenetic study have also identified situations where the most probable gene trees are ones that do not match the underlying species tree (i.e., anomalous gene trees [AGTs]). However, although the theoretical proof of the existence of AGTs is alarming, the actual risk that AGTs pose to empirical phylogenetic study is far from clear. Establishing the conditions (i.e., the branch lengths in a species tree) for which AGTs are possible does not address the critical issue of how prevalent they might be. Furthermore, theoretical characterization of the species trees for which AGTs may pose a problem (i.e., the anomaly zone or the species histories for which AGTs are theoretically possible) is based on consideration of just one source of variance that contributes to species tree and gene tree discord: gene lineage coalescence. Yet, empirical data contain another important stochastic component: mutational variance. Estimated gene trees will differ from the underlying gene trees (i.e., the actual genealogy) because of the random process of mutation. Here, we take a simulation approach to investigate the prevalence of AGTs among estimated gene trees, thereby characterizing the boundaries of the anomaly zone taking into account both coalescent and mutational variances. We also determine the frequency of realized AGTs, which is critical to putting the theoretical work on AGTs into a realistic biological context. Two salient results emerge from this investigation. First, our results show that mutational variance can indeed expand the parameter space (i.e., the relative branch lengths in a species tree) where AGTs might be observed in empirical data. By exploring the underlying cause for the expanded anomaly zone, we identify aspects of empirical data relevant to avoiding the problems that AGTs pose for species tree inference from multilocus data. Second, for the empirical species histories where AGTs are possible, unresolved trees, not AGTs, predominate in the pool of estimated gene trees. This result suggests that the risk of AGTs, while they exist in theory, may rarely be realized in practice. By considering the biological realities of both mutational and coalescent variances, the study has refined, and redefined, what the actual challenges are for empirical phylogenetic study of recently diverged taxa that have speciated rapidly: AGTs themselves are unlikely to pose a significant danger to empirical phylogenetic study.


Systematic Biology | 2016

A Robust Semi-Parametric Test for Detecting Trait-Dependent Diversification

Daniel L. Rabosky; Huateng Huang

Rates of species diversification vary widely across the tree of life and there is considerable interest in identifying organismal traits that correlate with rates of speciation and extinction. However, it has been challenging to develop methodological frameworks for testing hypotheses about trait-dependent diversification that are robust to phylogenetic pseudoreplication and to directionally biased rates of character change. We describe a semi-parametric test for trait-dependent diversification that explicitly requires replicated associations between character states and diversification rates to detect effects. To use the method, diversification rates are reconstructed across a phylogenetic tree with no consideration of character states. A test statistic is then computed to measure the association between species-level traits and the corresponding diversification rate estimates at the tips of the tree. The empirical value of the test statistic is compared to a null distribution that is generated by structured permutations of evolutionary rates across the phylogeny. The test is applicable to binary discrete characters as well as continuous-valued traits and can accommodate extremely sparse sampling of character states at the tips of the tree. We apply the test to several empirical data sets and demonstrate that the method has acceptable Type I error rates.
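
The general shape of such a permutation test can be sketched in a few lines. The code below is a simplified illustration, not the authors' implementation. It assumes that tips are grouped into hypothetical "rate regimes" that each share one diversification rate, and it interprets "structured permutation" as shuffling rates among regimes rather than among individual tips. The absolute Pearson correlation is used as a stand-in test statistic, and all names and data are invented for the example.

```python
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def permutation_test(traits, regimes, regime_rates, n_perm=1000, seed=0):
    """Return (observed statistic, permutation p-value).

    traits       : trait value for each tip
    regimes      : regime label for each tip (hypothetical grouping)
    regime_rates : dict mapping regime label -> diversification rate
    Rates are shuffled among regimes, so tips in one regime always move
    together and do not count as independent observations.
    """
    rng = random.Random(seed)
    tip_rates = [regime_rates[r] for r in regimes]
    observed = abs(pearson(traits, tip_rates))

    labels = sorted(regime_rates)
    rates = [regime_rates[label] for label in labels]
    as_extreme = 0
    for _ in range(n_perm):
        shuffled = rates[:]
        rng.shuffle(shuffled)
        mapping = dict(zip(labels, shuffled))
        permuted = [mapping[r] for r in regimes]
        if abs(pearson(traits, permuted)) >= observed:
            as_extreme += 1
    return observed, as_extreme / n_perm


if __name__ == "__main__":
    traits = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9, 5.0, 5.2]
    regimes = ["a", "a", "a", "b", "b", "b", "c", "c"]
    rates = {"a": 0.1, "b": 0.2, "c": 0.3}
    stat, p_value = permutation_test(traits, regimes, rates)
    print(f"statistic={stat:.3f}  p={p_value:.3f}")
```

With only three regimes there are just six possible rate assignments, so the p-value cannot become small however strong the tip-level correlation looks. This mirrors the abstract's point that replicated associations are required to detect an effect.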


Molecular Phylogenetics and Evolution | 2014

How low can you go? The effects of mutation rate on the accuracy of species-tree estimation

Hayley C. Lanier; Huateng Huang; L. Lacey Knowles

Although species-tree methods have been widely adopted for multi-locus data, little consideration has been given to the source and character of the loci used in these approaches. Decisions about which loci to target in empirical studies are typically constrained by availability, technology and funds - characteristics that are not typically considered in simulation studies. As a result, most real-world datasets often combine one or two variable loci (such as mtDNA or chloroplast loci) with multiple lower-variation loci to estimate species trees. These locus selections impact the accuracy and the resolution of a phylogeny. Furthermore, the fact that using a larger sample of loci can result in lower posterior probabilities has been used as an excuse to drop loci from an analysis. Here we address these issues directly through a simulation approach designed to mimic situations arising in empirical datasets by combining loci with differing mutation rates. We show that low-variation loci can be utilized in species-tree analyses that account for gene-tree uncertainty (e.g., a Bayesian framework), whereas maximum likelihood approaches show no improvement in accuracy when low-variation loci are added. We demonstrate that limited phylogenetic signal associated with low-variation loci constrains gains in species-tree estimation accuracy when adding loci. Lastly, we demonstrate that the inclusion of only a handful of loci with higher mutation rates, and hence greater phylogenetic information content, can make a tremendous difference in the accuracy of species-tree estimates, suggesting that empiricists should consider the quality, and not just quantity, of loci in multi-locus phylogenetic analyses.


Molecular Ecology | 2008

Molecular evidence of a peripatric origin for two sympatric species of field crickets (Gryllus rubens and G. texensis) revealed from coalescent simulations and population genetic tests

David A. Gray; Huateng Huang; L. Lacey Knowles

Species pairs that differ primarily in characters involved in mating interactions and are largely sympatric raise intriguing questions about the mode of speciation. When species divergence is relatively recent, the footprint of the demographic history during speciation might be preserved and used to reconstruct the biogeography of species divergence. In this study, patterns of genetic variation were examined throughout the geographical range of two cryptic sister taxa of field crickets, Gryllus texensis and G. rubens; mitochondrial cytochrome oxidase I (COI) was sequenced in 365 individuals sampled from 48 localities. Despite significant molecular divergence between the species, they were not reciprocally monophyletic. We devised several analyses to statistically explore what historical processes might have given rise to this genealogical structure. The analyses indicated that the biogeographical pattern of genetic variation does not support a model of recent gene flow between species. Instead, coalescent simulations suggested that the genealogical structure within G. texensis, namely a deep split between two geographically overlapping clades, reflects historical substructure within G. texensis. Additional tests that consider the concentration of G. rubens haplotypes in one of the two G. texensis genetic clusters suggest a model of speciation in which G. rubens was derived from one lineage of a geographically subdivided ancestor. These results indicate that, despite the contemporary sympatry of G. texensis and G. rubens, the data are indicative of a peripatric origin in which G. rubens was derived from one of the two historical partitions in the species currently recognized as G. texensis. This proposed model of species divergence suggests how the interplay of geography and selection may give rise to new species, although this requires testing with multilocus data. Specifically, the model highlights how geographical partitioning of ancestral variation in the past may augment the selectively driven divergence of characters involved in the reproductive isolation of the species today.


Proceedings of the Royal Society B: Biological Sciences | 2015

Minimal effects of latitude on present-day speciation rates in New World birds

Daniel L. Rabosky; Pascal O. Title; Huateng Huang

The tropics contain far greater numbers of species than temperate regions, suggesting that rates of species formation might differ systematically between tropical and non-tropical areas. We tested this hypothesis by reconstructing the history of speciation in New World (NW) land birds using BAMM, a Bayesian framework for modelling complex evolutionary dynamics on phylogenetic trees. We estimated marginal distributions of present-day speciation rates for each of 2571 species of birds. The present-day rate of speciation varies approximately 30-fold across NW birds, but there is no difference in the rate distributions for tropical and temperate taxa. Using macroevolutionary cohort analysis, we demonstrate that clades with high tropical membership do not produce species more rapidly than temperate clades. For nearly any value of present-day speciation rate, there are far more species in the tropics than the temperate zone. Any effects of latitude on speciation rate are marginal in comparison to the dramatic variation in rates among clades.


Molecular Phylogenetics and Evolution | 2014

Do estimated and actual species phylogenies match? Evaluation of East African cichlid radiations

Huateng Huang; Lucy A. P. Tran; L. Lacey Knowles

A large number of published phylogenetic estimates are based on a single locus or the concatenation of multiple loci, even though genealogies of single or concatenated loci may not accurately reflect the true history of species diversification (i.e., the species tree). The increased availability of genomic data, coupled with new computational methods, improves resolution of species relationships beyond what was possible in the past. Such developments will no doubt benefit future phylogenetic studies. It remains unclear how robust phylogenies that predate these developments (i.e., the bulk of phylogenetic studies) are to departures from the assumption of strict gene tree-species tree concordance. Here, we present a parametric bootstrap (PBST) approach that assesses the reliability of past phylogenetic estimates in which gene tree-species tree discord was ignored. We focus on a universal cause of discord, the random loss of gene lineages from genetic drift, and apply the method in a meta-analysis of East African cichlids, a group encompassing historical scenarios that are particularly challenging for phylogenetic estimation. Although we identify some evolutionary relationships that are robust to gene tree discord, many past phylogenetic estimates of cichlids are not. We discuss the utility of the PBST method for evaluating the robustness of gene tree-based phylogenetic estimations in general as well as for testing the clade-specific performance of species tree estimation methods and designing sampling strategies that increase the accuracy of estimated species relationships.


Proceedings of the Royal Society B: Biological Sciences | 2017

Genetic diversity is largely unpredictable but scales with museum occurrences in a species-rich clade of Australian lizards

Sonal Singhal; Huateng Huang; Pascal O. Title; Stephen C. Donnellan; Iris Holmes; Daniel L. Rabosky

Genetic diversity is a fundamental characteristic of species and is affected by many factors, including mutation rate, population size, life history and demography. To better understand the processes that influence levels of genetic diversity across taxa, we collected genome-wide restriction-associated DNA data from more than 500 individuals spanning 76 nominal species of Australian scincid lizards in the genus Ctenotus. To avoid potential biases associated with variation in taxonomic practice across the group, we used coalescent-based species delimitation to delineate 83 species-level lineages within the genus for downstream analyses. We then used these genetic data to infer levels of within-population genetic diversity. Using a phylogenetically informed approach, we tested whether variation in genetic diversity could be explained by population size, environmental heterogeneity or historical demography. We find that the strongest predictor of genetic diversity is a novel proxy for census population size: the number of vouchered occurrences in museum databases. However, museum occurrences only explain a limited proportion of the variance in genetic diversity, suggesting that genetic diversity might be difficult to predict at shallower phylogenetic scales.

Collaboration


Top Co-Authors

Iris Holmes

University of Michigan

David A. Gray

California State University