Xing-Xing Shen
Vanderbilt University
Publications
Featured research published by Xing-Xing Shen.
Nature Ecology and Evolution | 2017
Xing-Xing Shen; Chris Todd Hittinger; Antonis Rokas
Phylogenomic studies have resolved countless branches of the tree of life, but remain strongly contradictory on certain contentious relationships. Here, we use a maximum likelihood framework to quantify the distribution of phylogenetic signal among genes and sites for 17 contentious branches and 6 well-established control branches in plant, animal and fungal phylogenomic data matrices. We find that resolution in some of these 17 branches rests on a single gene or a few sites, and that removal of a single gene in concatenation analyses or a single site from every gene in coalescence-based analyses diminishes support and can alter the inferred topology. These results suggest that tiny subsets of very large data matrices drive the resolution of specific internodes, providing a dissection of the distribution of support and observed incongruence in phylogenomic analyses. We submit that quantifying the distribution of phylogenetic signal in phylogenomic data is essential for evaluating whether branches, especially contentious ones, are truly resolved. Finally, we offer one detailed example of such an evaluation for the controversy regarding the earliest-branching metazoan phylum, for which examination of the distributions of gene-wise and site-wise phylogenetic signal across eight data matrices consistently supports ctenophores as the sister group to all other metazoans.
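A minimal Python sketch of gene-wise and site-wise signal, assuming per-site log-likelihoods for two alternative topologies (T1, T2) have already been computed on the same concatenated alignment and that gene boundaries are known. The function names and the partition layout are hypothetical. The sketch only sums fixed site scores, so it approximates, rather than reproduces, a removal experiment in which the tree and model would be re-estimated after dropping a gene.

```python
from typing import Dict, List, Tuple


def gene_wise_signal(
    site_lnl_t1: List[float],
    site_lnl_t2: List[float],
    partitions: Dict[str, Tuple[int, int]],
) -> Dict[str, float]:
    """Per-gene signal (delta GLS) for topology T1 versus T2.

    Site-wise signal (delta SLS) is lnL(T1) - lnL(T2) at each site; the
    gene-wise value is its sum over the gene's sites. Positive values favour
    T1, negative values favour T2. `partitions` maps a gene name to a
    half-open (start, end) range of alignment columns (hypothetical layout).
    """
    if len(site_lnl_t1) != len(site_lnl_t2):
        raise ValueError("site log-likelihood vectors must have equal length")
    delta_sls = [a - b for a, b in zip(site_lnl_t1, site_lnl_t2)]
    return {gene: sum(delta_sls[start:end]) for gene, (start, end) in partitions.items()}


def drop_most_influential_gene(dgls: Dict[str, float]) -> Tuple[str, float, float]:
    """Remove the gene with the largest |delta GLS| and report the summed
    signal before and after (fixed site scores, no re-optimisation)."""
    total = sum(dgls.values())
    gene = max(dgls, key=lambda g: abs(dgls[g]))
    return gene, total, total - dgls[gene]


if __name__ == "__main__":
    # Toy values: g1 and g2 weakly favour T1, g3 strongly favours T2.
    t1 = [-10.0, -11.0, -9.0, -8.0, -20.0, -25.0]
    t2 = [-10.4, -11.3, -9.2, -8.3, -15.0, -20.0]
    dgls = gene_wise_signal(t1, t2, {"g1": (0, 2), "g2": (2, 4), "g3": (4, 6)})
    gene, before, after = drop_most_influential_gene(dgls)
    # The summed signal favours T2 with g3 included and T1 once g3 is dropped.
    print(gene, round(before, 3), round(after, 3))
```

In the toy data a single gene carries enough signal to flip the sign of the total, which is the kind of pattern the abstract describes for some contentious branches.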
G3: Genes, Genomes, Genetics | 2016
Xing-Xing Shen; Xiaofan Zhou; Jacek Kominek; Cletus P. Kurtzman; Chris Todd Hittinger; Antonis Rokas
Understanding the phylogenetic relationships among the yeasts of the subphylum Saccharomycotina is a prerequisite for understanding the evolution of their metabolisms and ecological lifestyles. In the last two decades, the use of rDNA and multilocus data sets has greatly advanced our understanding of the yeast phylogeny, but many deep relationships remain unsupported. In contrast, phylogenomic analyses have involved relatively few taxa and lineages that were often selected with limited considerations for covering the breadth of yeast biodiversity. Here we used genome sequence data from 86 publicly available yeast genomes representing nine of the 11 known major lineages and 10 nonyeast fungal outgroups to generate a 1233-gene, 96-taxon data matrix. Species phylogenies reconstructed using two different methods (concatenation and coalescence) and two data matrices (amino acids or the first two codon positions) yielded identical and highly supported relationships between the nine major lineages. Aside from the lineage consisting of the family Pichiaceae, all other lineages were monophyletic. Most interrelationships among yeast species were robust across the two methods and data matrices. However, eight of the 93 internodes conflicted between analyses or data sets, including the placements of: the clade defined by species that have reassigned the CUG codon to encode serine, instead of leucine; the clade defined by a whole genome duplication; and the species Ascoidea rubescens. These phylogenomic analyses provide a robust roadmap for future comparative work across the yeast subphylum in the disciplines of taxonomy, molecular genetics, evolutionary biology, ecology, and biotechnology. To further this end, we have also provided a BLAST server to query the 86 Saccharomycotina genomes, which can be found at http://y1000plus.org/blast.
Molecular Biology and Evolution | 2018
Xiaofan Zhou; Xing-Xing Shen; Chris Todd Hittinger; Antonis Rokas
The sizes of the data matrices assembled to resolve branches of the tree of life have increased dramatically, motivating the development of programs for fast, yet accurate, inference. For example, several different fast programs have been developed in the very popular maximum likelihood framework, including RAxML/ExaML, PhyML, IQ‐TREE, and FastTree. Although these programs are widely used, a systematic evaluation and comparison of their performance using empirical genome‐scale data matrices has so far been lacking. To fill this gap, we evaluated these four programs on 19 empirical phylogenomic data sets with hundreds to thousands of genes and up to 200 taxa with respect to likelihood maximization, tree topology, and computational speed. For single‐gene tree inference, we found that the more exhaustive and slower strategies (ten tree searches per alignment) outperformed faster strategies (one tree search per alignment) using RAxML, PhyML, or IQ‐TREE. Interestingly, single‐gene trees inferred by the three programs yielded comparable coalescent‐based species tree estimations. For concatenation‐based species tree inference, IQ‐TREE consistently achieved the best‐observed likelihoods for all data sets, and RAxML/ExaML was a close second. In contrast, PhyML often failed to complete concatenation‐based analyses, whereas FastTree was the fastest but generated lower likelihood values and more dissimilar tree topologies in both types of analyses. Finally, data matrix properties, such as the number of taxa and the strength of phylogenetic signal, sometimes substantially influenced the programs’ relative performance. Our results provide real‐world gene and species tree phylogenetic inference benchmarks to inform the design and execution of large‐scale phylogenomic data analyses.
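The "best-observed likelihood" comparison can be illustrated with a small Python sketch. It assumes one final log-likelihood per program per data set, with `None` recording a run that did not finish; the function names, program labels and numbers are placeholders for illustration, not values from the study.

```python
from typing import Dict, Optional

Results = Dict[str, Dict[str, Optional[float]]]


def delta_lnl(results: Results) -> Results:
    """Difference between each program's lnL and the best lnL observed for
    that data set. 0.0 marks the best-observed value, negative numbers show
    how far a program fell short, and None (a run that did not finish) is
    passed through unchanged."""
    out: Results = {}
    for dataset, by_program in results.items():
        finished = [v for v in by_program.values() if v is not None]
        best = max(finished) if finished else None
        out[dataset] = {
            program: None if (v is None or best is None) else v - best
            for program, v in by_program.items()
        }
    return out


def count_best(deltas: Results, tol: float = 1e-6) -> Dict[str, int]:
    """Number of data sets on which each program reached the best-observed
    lnL, allowing a small numerical tolerance."""
    counts: Dict[str, int] = {}
    for by_program in deltas.values():
        for program, d in by_program.items():
            hit = d is not None and d >= -tol
            counts[program] = counts.get(program, 0) + int(hit)
    return counts


if __name__ == "__main__":
    # Placeholder programs and numbers, for illustration only.
    toy = {
        "ds1": {"prog_a": -1000.0, "prog_b": -1002.5, "prog_c": None},
        "ds2": {"prog_a": -500.0, "prog_b": -500.0, "prog_c": -530.0},
    }
    print(count_best(delta_lnl(toy)))
```

Measuring each program against the best value found by any program on the same data set keeps the comparison meaningful even though absolute lnL values differ greatly between data sets.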
Genome Biology and Evolution | 2016
Xing-Xing Shen; Leonidas Salichos; Antonis Rokas
Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the average percent pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, guanine-cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal, in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, alongside the number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals.
These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers.
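Treeness/RCV, one of the properties named above, can be computed from a gene tree's branch lengths and its alignment. The Python sketch below assumes the commonly used definitions: treeness is the share of total tree length on internal branches, and RCV averages the absolute deviation of each taxon's character counts from the across-taxon mean. Ignoring gap and missing-data characters is an assumption of this sketch, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence

GAPS = set("-?")  # assumption: gaps/missing data are ignored when counting states


def treeness(internal: Sequence[float], terminal: Sequence[float]) -> float:
    """Share of total tree length that lies on internal branches."""
    total = sum(internal) + sum(terminal)
    return sum(internal) / total


def rcv(alignment: Dict[str, str]) -> float:
    """Relative composition variability of an aligned gene.

    RCV = sum_i sum_j |c_ij - mean_i| / (n * t), with c_ij the count of
    character state i in taxon j, mean_i its mean across taxa, n the number
    of taxa and t the alignment length.
    """
    seqs = list(alignment.values())
    n, t = len(seqs), len(seqs[0])
    counts = [Counter(ch for ch in s.upper() if ch not in GAPS) for s in seqs]
    states = set().union(*counts)
    total = 0.0
    for state in states:
        per_taxon = [c[state] for c in counts]  # Counter returns 0 for absent states
        mean = sum(per_taxon) / n
        total += sum(abs(x - mean) for x in per_taxon)
    return total / (n * t)


def treeness_over_rcv(internal: Sequence[float], terminal: Sequence[float],
                      alignment: Dict[str, str]) -> float:
    """Treeness divided by RCV; returns inf when composition is perfectly
    homogeneous (RCV == 0)."""
    r = rcv(alignment)
    return float("inf") if r == 0 else treeness(internal, terminal) / r
```

For example, an alignment {"a": "AAAA", "b": "CCCC"} has maximal compositional heterogeneity for two taxa (RCV = 1.0), whereas {"a": "ACGT", "b": "TGCA"} has RCV = 0.0. Higher treeness/RCV is generally read as more tree-like signal relative to compositional bias.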
eLife | 2018
Carla Gonçalves; Jennifer H. Wisecaver; Jacek Kominek; Madalena Salema Oom; Maria José Leandro; Xing-Xing Shen; Dana A. Opulente; Xiaofan Zhou; David Peris; Cletus P. Kurtzman; Chris Todd Hittinger; Antonis Rokas; Paula Gonçalves
Fructophily is a rare trait that consists of the preference for fructose over other carbon sources. Here, we show that in a yeast lineage (the Wickerhamiella/Starmerella, W/S clade) composed of fructophilic species thriving in the high-sugar floral niche, the acquisition of fructophily is concurrent with a wider remodeling of central carbon metabolism. Coupling comparative genomics with biochemical and genetic approaches, we gathered ample evidence for the loss of alcoholic fermentation in an ancestor of the W/S clade and subsequent reinstatement through either horizontal acquisition of homologous bacterial genes or modification of a pre-existing yeast gene. An enzyme required for sucrose assimilation was also acquired from bacteria, suggesting that the genetic novelties identified in the W/S clade may be related to adaptation to the high-sugar environment. This work shows how even central carbon metabolism can be remodeled by a surge of horizontal gene transfer (HGT) events.
The EMBO Journal | 2018
Rongxin Shi; Elwood A. Mullins; Xing-Xing Shen; Kori T. Lay; Philip K. Yuen; Sheila S. David; Antonis Rokas; Brandt F. Eichman
DNA glycosylases preserve genome integrity and define the specificity of the base excision repair pathway for discrete, detrimental modifications, and thus, the mechanisms by which glycosylases locate DNA damage are of particular interest. Bacterial AlkC and AlkD are specific for cationic alkylated nucleobases and have a distinctive HEAT‐like repeat (HLR) fold. AlkD uses a unique non‐base‐flipping mechanism that enables excision of bulky lesions more commonly associated with nucleotide excision repair. In contrast, AlkC has a much narrower specificity for small lesions, principally N3‐methyladenine (3mA). Here, we describe how AlkC selects for and excises 3mA using a non‐base‐flipping strategy distinct from that of AlkD. A crystal structure resembling a catalytic intermediate complex shows how AlkC uses unique HLR and immunoglobulin‐like domains to induce a sharp kink in the DNA, exposing the damaged nucleobase to active site residues that project into the DNA. This active site can accommodate and excise N3‐methylcytosine (3mC) and N1‐methyladenine (1mA), which are also repaired by AlkB‐catalyzed oxidative demethylation, providing a potential alternative mechanism for repair of these lesions in bacteria.
Nature Communications | 2018
Tadeusz Krassowski; Aisling Y. Coughlan; Xing-Xing Shen; Xiaofan Zhou; Jacek Kominek; Dana A. Opulente; Robert Riley; Igor V. Grigoriev; Nikunj Maheshwari; Denis C. Shields; Cletus P. Kurtzman; Chris Todd Hittinger; Antonis Rokas; Kenneth H. Wolfe
The genetic code used in nuclear genes is almost universal, but here we report that it changed three times in parallel during the evolution of budding yeasts. All three changes were reassignments of the codon CUG, which is translated as serine (in 2 yeast clades), alanine (1 clade), or the ‘universal’ leucine (2 clades). The newly discovered Ser2 clade is in the final stages of a genetic code transition. Most species in this clade have genes for both a novel tRNASer(CAG) and an ancestral tRNALeu(CAG) to read CUG, but only tRNASer(CAG) is used in standard growth conditions. The coexistence of these alloacceptor tRNA genes indicates that the genetic code transition occurred via an ambiguous translation phase. We propose that the three parallel reassignments of CUG were not driven by natural selection in favor of their effects on the proteome, but by selection to eliminate the ancestral tRNALeu(CAG).
The genetic code for amino acids is nearly universal, and among eukaryotic nuclear genomes the only known reassignments are of codon CUG in yeasts. Here, the authors identify a third independent CUG transition in budding yeasts that is still ongoing, with alternative tRNAs present in the genome.
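What a CUG reassignment means for a protein can be shown with a toy translator. This Python sketch builds the standard genetic code and lets the caller choose how CTG (CUG in mRNA) is read; the function name and the `cug` parameter are hypothetical, and the sketch ignores everything about real yeast codes other than that one codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str, cug: str = "L") -> str:
    """Translate a DNA coding sequence, reading CTG (CUG in mRNA) as `cug`.

    cug="L" gives the standard code; "S" or "A" mimic the serine and alanine
    reassignments. Only CTG changes; every other codon keeps its standard
    meaning, and stop codons are rendered as "*".
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    table = dict(STANDARD, CTG=cug)  # copy of the standard table with CTG overridden
    return "".join(table[cds[i:i + 3]] for i in range(0, len(cds), 3))
```

For example, `translate("ATGCTGAAA")` gives "MLK" under the standard code, "MSK" with `cug="S"` and "MAK" with `cug="A"`, while other leucine codons such as CTT are unaffected.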
bioRxiv | 2018
Jacob L. Steenwyk; Xing-Xing Shen; Abigail L. Lind; Gustavo G Goldman; Antonis Rokas
The filamentous fungal family Aspergillaceae contains > 1,000 known species, mostly in the genera Aspergillus and Penicillium. Fungi in Aspergillaceae display a wide range of lifestyles, including several that are of relevance to human affairs. For example, several species are used as industrial workhorses, food fermenters, or platforms for drug discovery (e.g., Aspergillus niger, Penicillium camemberti), while others are dangerous human and plant pathogens (e.g., Aspergillus fumigatus, Penicillium digitatum). Reconstructing the phylogeny and timeline of the family’s diversification is the first step toward understanding how its diverse range of lifestyles evolved. To infer a robust phylogeny for Aspergillaceae and pinpoint poorly resolved branches and their likely underlying contributors, we used 81 genomes spanning the diversity of Aspergillus and Penicillium to construct a 1,668-gene data matrix. Phylogenies of the nucleotide and amino acid versions of this full data matrix were generated using three different maximum likelihood schemes (i.e., gene-partitioned, unpartitioned, and coalescence). We also used the same three schemes to infer phylogenies from five additional 834-gene data matrices constructed by subsampling the top 50% of genes according to different criteria associated with strong phylogenetic signal (alignment length, average bootstrap value, taxon completeness, treeness / relative composition variability, and number of variable sites). Examination of the topological agreement among these 36 phylogenies and measures of internode certainty identified 12 / 78 (15.4%) bipartitions that were incongruent. 
Patterns of incongruence across these 12 bipartitions fell into three categories: (i) low levels of incongruence for 2 shallow bipartitions, most likely stemming from incomplete lineage sorting, (ii) high levels of incongruence for 3 shallow bipartitions, most likely stemming from hybridization or introgression (or very high levels of incomplete lineage sorting), and (iii) varying levels of incongruence for 7 deeper bipartitions, most likely stemming from reconstruction artifacts associated with poor taxon sampling. Relaxed molecular clock analyses suggest that Aspergillaceae likely originated in the lower Cretaceous, 125.1 (95% Confidence Interval (CI): 146.7 - 102.1) million years ago (mya), with the origins of the Aspergillus and Penicillium genera dating back to 84.3 mya (95% CI: 90.9 - 77.6) and 77.4 mya (95% CI: 94.0 - 61.0), respectively. Our results provide a robust evolutionary and temporal framework for comparative genomic analyses in Aspergillaceae, while our general approach provides a widely applicable template for phylogenomic identification of resolved and contentious branches in densely genome-sequenced lineages across the tree of life.
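The gene subsampling and the congruence check described above can be sketched in Python under simplifying assumptions: trees are given directly as lists of bipartitions (taxon sets) rather than parsed from Newick, a bipartition counts as incongruent when it is absent from at least one of the compared trees, and internode certainty is not computed. All function names are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, List, Set

Split = FrozenSet[str]


def top_half(gene_scores: Dict[str, float]) -> List[str]:
    """Genes in the top 50% for a criterion where larger is better
    (e.g. alignment length or number of variable sites)."""
    ranked = sorted(gene_scores, key=lambda g: gene_scores[g], reverse=True)
    return ranked[: len(ranked) // 2]


def canonical(split: Iterable[str], taxa: FrozenSet[str], anchor: str) -> Split:
    """Write a bipartition as the side that does not contain `anchor`, so
    {A,B}|{C,D,E} and {C,D,E}|{A,B} compare equal."""
    side = frozenset(split)
    return taxa - side if anchor in side else side


def incongruent_splits(reference: List[Set[str]], others: List[List[Set[str]]],
                       taxa: Set[str]) -> List[Split]:
    """Bipartitions of the reference tree missing from at least one other tree."""
    all_taxa = frozenset(taxa)
    anchor = min(all_taxa)

    def as_set(tree: List[Set[str]]) -> Set[Split]:
        return {canonical(s, all_taxa, anchor) for s in tree}

    ref = as_set(reference)
    other_sets = [as_set(t) for t in others]
    bad = [s for s in ref if any(s not in o for o in other_sets)]
    return sorted(bad, key=sorted)  # deterministic order for reporting


if __name__ == "__main__":
    # 6 matrices (full + 5 subsamples) x 2 data types x 3 schemes = 36 analyses.
    analyses = list(product(range(6), ["nt", "aa"],
                            ["gene-partitioned", "unpartitioned", "coalescence"]))
    print(len(analyses))
    print(f"{100 * 12 / 78:.1f}% of bipartitions incongruent")
```

Taking the top half of 1,668 genes yields the 834-gene matrices, and the design arithmetic in the usage block recovers the 36 phylogenies and the 12 / 78 (15.4%) figure quoted in the abstract.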
bioRxiv | 2018
Jacek Kominek; Drew T. Doering; Dana A. Opulente; Xing-Xing Shen; Xiaofan Zhou; Jeremy DeVirgilio; Amanda Beth Hulfachor; Cletus P. Kurtzman; Antonis Rokas; Chris Todd Hittinger
Operons are a hallmark of bacterial genomes, where they allow concerted expression of multiple functionally related genes as single polycistronic transcripts. They are rare in eukaryotes, where each gene usually drives expression of its own independent messenger RNAs. Here we report the horizontal operon transfer of a catecholate-class siderophore biosynthesis pathway from Enterobacteriaceae into a group of closely related yeast taxa. We further show that the co-linearly arranged secondary metabolism genes are actively expressed, exhibit mainly eukaryotic transcriptional features, and enable the sequestration and uptake of iron. After transfer to the eukaryotic host, several genetic changes occurred, including the acquisition of polyadenylation sites, structural rearrangements, integration of eukaryotic genes, and secondary loss in some lineages. We conclude that the operon genes were likely captured in the shared insect gut habitat, modified for eukaryotic gene expression, and maintained by selection to adapt to the highly competitive, iron-limited environment.
Proceedings of the National Academy of Sciences of the United States of America | 2018
David J. Krause; Jacek Kominek; Dana A. Opulente; Xing-Xing Shen; Xiaofan Zhou; Quinn K. Langdon; Jeremy DeVirgilio; Amanda Beth Hulfachor; Cletus P. Kurtzman; Antonis Rokas; Chris Todd Hittinger
Significance: Evolutionary and comparative genomics, combined with reverse genetics, have the power to identify and characterize new biology. Here, we use these approaches in several nontraditional model species of budding yeasts to characterize a budding yeast secondary metabolite gene cluster, a set of genes responsible for production and reutilization of the siderophore pulcherrimin. We also use this information to assign roles in pulcherrimin utilization for two previously uncharacterized Saccharomyces cerevisiae genes. The evolution of this gene cluster in budding yeasts suggests an ecological role for pulcherrimin akin to other microbial public goods systems.
Secondary metabolites are key to how organisms from all domains of life interact with their environment and each other. The iron-binding molecule pulcherrimin was described a century ago, but the genes responsible for its production in budding yeasts have remained uncharacterized. Here, we used phylogenomic footprinting on 90 genomes across the budding yeast subphylum Saccharomycotina to identify the gene cluster associated with pulcherrimin production. Using targeted gene replacements in Kluyveromyces lactis, we characterized the four genes that make up the cluster, which likely encode two pulcherriminic acid biosynthesis enzymes, a pulcherrimin transporter, and a transcription factor involved in both biosynthesis and transport. The requirement of a functional putative transporter to utilize extracellular pulcherrimin-complexed iron demonstrates that pulcherriminic acid is a siderophore, a chelator that binds iron outside the cell for subsequent uptake. Surprisingly, we identified homologs of the putative transporter and transcription factor genes in multiple yeast genera that lacked the biosynthesis genes and could not make pulcherrimin, including the model yeast Saccharomyces cerevisiae.
We deleted these previously uncharacterized genes and showed they are also required for pulcherrimin utilization in S. cerevisiae, raising the possibility that other genes of unknown function are linked to secondary metabolism. Phylogenetic analyses of this gene cluster suggest that pulcherrimin biosynthesis and utilization were ancestral to budding yeasts, but the biosynthesis genes and, subsequently, the utilization genes, were lost in many lineages, mirroring other microbial public goods systems that lead to the rise of cheater organisms.