Publication


Featured research published by Yi-Chieh Wu.


Science | 2015

Extensive introgression in a malaria vector species complex revealed by phylogenomics

Michael Fontaine; James B. Pease; Aaron Steele; Robert M. Waterhouse; Daniel E. Neafsey; Igor V. Sharakhov; Xiaofang Jiang; Andrew Brantley Hall; Flaminia Catteruccia; Evdoxia G. Kakani; Sara N. Mitchell; Yi-Chieh Wu; Hilary A. Smith; R. Rebecca Love; Mara K. N. Lawniczak; Michel A. Slotman; Scott J. Emrich; Matthew W. Hahn; Nora J. Besansky

Introduction: The notion that species boundaries can be porous to introgression is increasingly accepted. Yet the broader role of introgression in evolution remains contentious and poorly documented, partly because of the challenges involved in accurately identifying introgression in the very groups where it is most likely to occur. Recently diverged species often have incomplete reproductive barriers and may hybridize where they overlap. However, because of retention and stochastic sorting of ancestral polymorphisms, inference of the correct species branching order is notoriously challenging for recent speciation events, especially those closely spaced in time. Without knowledge of species relationships, it is impossible to identify instances of introgression.

Rationale: Since the discovery that the single mosquito taxon described in 1902 as Anopheles gambiae was actually a complex of several closely related and morphologically indistinguishable sibling species, the correct species branching order has remained controversial and unresolved. This Afrotropical complex contains the world’s most important vectors of human malaria, owing to their close association with humans, as well as minor vectors and species that do not bite humans. On the basis of ecology and behavior, one might predict phylogenetic clustering of the three highly anthropophilic vector species. However, previous phylogenetic analyses of the complex based on a limited number of markers strongly disagree about relationships between the major vectors, potentially because of historical introgression between them. To investigate the history of the species complex, we used whole-genome reference assemblies, as well as dozens of resequenced individuals from the field.

Results: We observed a large amount of phylogenetic discordance between trees generated from the autosomes and X chromosome. The autosomes, which make up the majority of the genome, overwhelmingly supported the grouping of the three major vectors of malaria, An. gambiae, An. coluzzii, and An. arabiensis. In stark contrast, the X chromosome strongly supported the grouping of An. arabiensis with a species that plays no role in malaria transmission, An. quadriannulatus. Although the whole-genome consensus phylogeny unequivocally agrees with the autosomal topology, we found that the topology most often located on the X chromosome follows the historical species branching order, with pervasive introgression on the autosomes producing relationships that group the three highly anthropophilic species together. With knowledge of the correct species branching order, we are further able to uncover introgression between another species pair, as well as a complex history of balancing selection, introgression, and local adaptation of a large autosomal inversion that confers aridity tolerance.

Conclusion: We identify the correct species branching order of the An. gambiae species complex, resolving a contentious phylogeny. Notably, lineages leading to the principal vectors of human malaria were among the first in the complex to radiate and are not most closely related to each other. Pervasive autosomal introgression between these human malaria vectors, including nonsister vector species, suggests that traits enhancing vectorial capacity can be acquired not only through de novo mutation but also through a more rapid process of interspecific genetic exchange.

Figure caption: Time-lapse photographs of an adult anopheline mosquito emerging from its pupal case.

Related items in Science: D. E. Neafsey et al., Science 347, 1258522 (2015)

Introgressive hybridization is now recognized as a widespread phenomenon, but its role in evolution remains contested. Here, we use newly available reference genome assemblies to investigate phylogenetic relationships and introgression in a medically important group of Afrotropical mosquito sibling species. We have identified the correct species branching order to resolve a contentious phylogeny and show that lineages leading to the principal vectors of human malaria were among the first to split. Pervasive autosomal introgression between these malaria vectors means that only a small fraction of the genome, mainly on the X chromosome, has not crossed species boundaries. Our results suggest that traits enhancing vectorial capacity may be gained through interspecific gene flow, including between nonsister species.

Mosquito adaptability across genomes: Virtually everyone has first-hand experience with mosquitoes. Few recognize the subtle biological distinctions among these bloodsucking flies that render some bites mere nuisances and others the initiation of a potentially life-threatening infection. By sequencing the genomes of several mosquitoes in depth, Neafsey et al. and Fontaine et al. reveal clues that explain the mystery of why only some species of one genus of mosquitoes are capable of transmitting human malaria (see the Perspective by Clark and Messer). Science, this issue 10.1126/science.1258524 and 10.1126/science.1258522; see also p. 27

Comparison of several genomes reveals the genetic history of mosquitoes’ ability to vector malaria among humans. [Also see Perspective by Clark and Messer]


Nature | 2014

Comparative analysis of regulatory information and circuits across distant species

Alan P. Boyle; Carlos L. Araya; Cathleen M. Brdlik; Philip Cayting; Chao Cheng; Yong Cheng; Kathryn E. Gardner; LaDeana W. Hillier; J. Janette; Lixia Jiang; Dionna M. Kasper; Trupti Kawli; Pouya Kheradpour; Anshul Kundaje; Jingyi Jessica Li; Lijia Ma; Wei Niu; E. Jay Rehm; Joel Rozowsky; Matthew Slattery; Rebecca Spokony; Robert Terrell; Dionne Vafeados; Daifeng Wang; Peter Weisdepp; Yi-Chieh Wu; Dan Xie; Koon Kiu Yan; Elise A. Feingold; Peter J. Good

Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.


Systematic Biology | 2013

TreeFix: statistically informed gene tree error correction using species trees.

Yi-Chieh Wu; Matthew D. Rasmussen; Mukul S. Bansal; Manolis Kellis

Accurate gene tree reconstruction is a fundamental problem in phylogenetics, with many important applications. However, sequence data alone often lack enough information to confidently support one gene tree topology over many competing alternatives. Here, we present a novel framework for combining sequence data and species tree information, and we describe an implementation of this framework in TreeFix, a new phylogenetic program for improving gene tree reconstructions. Given a gene tree (preferably computed using a maximum-likelihood phylogenetic program), TreeFix finds a “statistically equivalent” gene tree that minimizes a species tree-based cost function. We have applied TreeFix to 2 clades of 12 Drosophila and 16 fungal genomes, as well as to simulated phylogenies and show that it dramatically improves reconstructions compared with current state-of-the-art programs. Given its accuracy, speed, and simplicity, TreeFix should be applicable to a wide range of analyses and have many important implications for future investigations of gene evolution. The source code and a sample data set are available at http://compbio.mit.edu/treefix.


Genome Research | 2014

Most parsimonious reconciliation in the presence of gene duplication, loss, and deep coalescence using labeled coalescent trees

Yi-Chieh Wu; Matthew D. Rasmussen; Mukul S. Bansal; Manolis Kellis

Accurate gene tree-species tree reconciliation is fundamental to inferring the evolutionary history of a gene family. However, although it has long been appreciated that population-related effects such as incomplete lineage sorting (ILS) can dramatically affect the gene tree, many of the most popular reconciliation methods consider discordance only due to gene duplication and loss (and sometimes horizontal gene transfer). Methods that do model ILS are either highly parameterized or consider a restricted set of histories, thus limiting their applicability and accuracy. To address these challenges, we present a novel algorithm DLCpar for inferring a most parsimonious (MP) history of a gene family in the presence of duplications, losses, and ILS. Our algorithm relies on a new reconciliation structure, the labeled coalescent tree (LCT), that simultaneously describes coalescent and duplication-loss history. We show that the LCT representation enables an exhaustive and efficient search over the space of reconciliations, and, for most gene families, the least common ancestor (LCA) mapping is an optimal solution for the species mapping between the gene tree and species tree in an MP LCT. Applying our algorithm to a variety of clades, including flies, fungi, and primates, as well as to simulated phylogenies, we achieve high accuracy, comparable to sophisticated probabilistic reconciliation methods, at reduced run time and with far fewer parameters. These properties enable inferences of the complex evolution of gene families across a broad range of species and large data sets.
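As an illustration of the LCA mapping mentioned in this abstract, the following is a minimal sketch for the classic duplication-loss setting only. It is not DLCpar, and it does not model ILS or labeled coalescent trees. The nested-tuple tree encoding, the function names, and the toy gene names are assumptions made for this example.

```python
# Trees are nested 2-tuples with string leaves (an assumed toy encoding).

def clades(tree):
    """Return (leaf set of `tree`, list of every clade in `tree`)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    left, left_all = clades(tree[0])
    right, right_all = clades(tree[1])
    node = left | right
    return node, left_all + right_all + [node]


def lca(species_clades, species):
    """Smallest species-tree clade that contains every member of `species`."""
    return min((c for c in species_clades if species <= c), key=len)


def reconcile(gene_tree, species_clades, gene_to_species):
    """Map each gene-tree node to a species clade via LCA mapping.

    Returns (clade mapped to the root, number of inferred duplications).
    A node counts as a duplication when it maps to the same clade as
    one of its children.
    """
    if isinstance(gene_tree, str):
        species = frozenset([gene_to_species(gene_tree)])
        return lca(species_clades, species), 0
    left_map, left_dups = reconcile(gene_tree[0], species_clades, gene_to_species)
    right_map, right_dups = reconcile(gene_tree[1], species_clades, gene_to_species)
    node_map = lca(species_clades, left_map | right_map)
    is_dup = node_map in (left_map, right_map)
    return node_map, left_dups + right_dups + int(is_dup)


if __name__ == "__main__":
    species_tree = (("A", "B"), "C")
    _, species_clades = clades(species_tree)
    # Hypothetical naming rule: gene "a1" belongs to species "A".
    to_species = lambda gene: gene[0].upper()
    gene_tree = (("a1", "b1"), ("a2", "c1"))
    root_map, dups = reconcile(gene_tree, species_clades, to_species)
    print(sorted(root_map), dups)
```

In the toy gene tree, the root maps to the same species clade as its right child, so the sketch infers a single duplication at the root.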


Molecular Biology and Evolution | 2012

Evolution at the Subgene Level: Domain Rearrangements in the Drosophila Phylogeny

Yi-Chieh Wu; Matthew D. Rasmussen; Manolis Kellis

Although the possibility of gene evolution by domain rearrangements has long been appreciated, current methods for reconstructing and systematically analyzing gene family evolution are limited to events such as duplication, loss, and sometimes, horizontal transfer. However, within the Drosophila clade, we find domain rearrangements occur in 35.9% of gene families, and thus, any comprehensive study of gene evolution in these species will need to account for such events. Here, we present a new computational model and algorithm for reconstructing gene evolution at the domain level. We develop a method for detecting homologous domains between genes and present a phylogenetic algorithm for reconstructing maximum parsimony evolutionary histories that include domain generation, duplication, loss, merge (fusion), and split (fission) events. Using this method, we find that genes involved in fusion and fission are enriched in signaling and development, suggesting that domain rearrangements and reuse may be crucial in these processes. We also find that fusion is more abundant than fission, and that fusion and fission events occur predominantly alongside duplication, with 92.5% and 34.3% of fusion and fission events retaining ancestral architectures in the duplicated copies. We provide a catalog of ∼9,000 genes that undergo domain rearrangement across nine sequenced species, along with possible mechanisms for their formation. These results dramatically expand on evolution at the subgene level and offer several insights into how new genes and functions arise between species.


Bioinformatics | 2014

Pareto-optimal phylogenetic tree reconciliation

Ran Libeskind-Hadas; Yi-Chieh Wu; Mukul S. Bansal; Manolis Kellis

Motivation: Phylogenetic tree reconciliation is a widely used method for reconstructing the evolutionary histories of gene families and species, hosts and parasites and other dependent pairs of entities. Reconciliation is typically performed using maximum parsimony, in which each evolutionary event type is assigned a cost and the objective is to find a reconciliation of minimum total cost. It is generally understood that reconciliations are sensitive to event costs, but little is understood about the relationship between event costs and solutions. Moreover, choosing appropriate event costs is a notoriously difficult problem. Results: We address this problem by giving an efficient algorithm for computing Pareto-optimal sets of reconciliations, thus providing the first systematic method for understanding the relationship between event costs and reconciliations. This, in turn, results in new techniques for computing event support values and, for cophylogenetic analyses, performing robust statistical tests. We provide new software tools and demonstrate their use on a number of datasets from evolutionary genomic and cophylogenetic studies. Availability and implementation: Our Python tools are freely available at www.cs.hmc.edu/~hadas/xscape. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
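To make the link between event costs and Pareto optimality concrete, here is a small sketch under stated assumptions. It is not the paper's algorithm or the xscape tools: it simply filters a hand-written list of hypothetical candidate reconciliations, each summarized as a (duplications, losses) count vector, and then shows which candidate wins under different cost settings.

```python
def dominates(p, q):
    """True if p is no worse than q in every event count and differs from q."""
    return all(a <= b for a, b in zip(p, q)) and p != q


def pareto_front(points):
    """Keep the event-count vectors that no other vector dominates."""
    unique = sorted(set(points))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


def best_under_costs(points, costs):
    """Return the vector with minimum total cost for the given event costs."""
    return min(points, key=lambda p: sum(c * n for c, n in zip(costs, p)))


if __name__ == "__main__":
    # Hypothetical (duplications, losses) counts for five candidate reconciliations.
    candidates = [(0, 6), (1, 3), (2, 3), (3, 0), (2, 4)]
    front = pareto_front(candidates)
    print("Pareto front:", front)
    for costs in [(1, 1), (2, 1), (5, 1)]:
        print(costs, "->", best_under_costs(candidates, costs))
```

With positive event costs, a dominated vector can never be the unique minimum-cost solution, so varying the costs only moves the optimum among the vectors on the front. That is why summarizing the front is enough to see how the solution depends on the costs.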


Bioinformatics | 2015

Improved gene tree error correction in the presence of horizontal gene transfer

Mukul S. Bansal; Yi-Chieh Wu; Eric J. Alm; Manolis Kellis

Motivation: The accurate inference of gene trees is a necessary step in many evolutionary studies. Although the problem of accurate gene tree inference has received considerable attention, most existing methods are only applicable to gene families unaffected by horizontal gene transfer. As a result, the accurate inference of gene trees affected by horizontal gene transfer remains a largely unaddressed problem. Results: In this study, we introduce a new and highly effective method for gene tree error correction in the presence of horizontal gene transfer. Our method efficiently models horizontal gene transfers, gene duplications and losses, and uses a statistical hypothesis testing framework [Shimodaira–Hasegawa (SH) test] to balance sequence likelihood with topological information from a known species tree. Using a thorough simulation study, we show that existing phylogenetic methods yield inaccurate gene trees when applied to horizontally transferred gene families and that our method dramatically improves gene tree accuracy. We apply our method to a dataset of 11 cyanobacterial species and demonstrate the large impact of gene tree accuracy on downstream evolutionary analyses. Availability and implementation: An implementation of our method is available at http://compbio.mit.edu/treefix-dtl/ Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


bioRxiv | 2014

Phylogenetic Identification and Functional Characterization of Orthologs and Paralogs across Human, Mouse, Fly, and Worm

Yi-Chieh Wu; Mukul S. Bansal; Matthew D. Rasmussen; Javier Herrero; Manolis Kellis

Model organisms can serve the biological and medical community by enabling the study of conserved gene families and pathways in experimentally-tractable systems. Their use, however, hinges on the ability to reliably identify evolutionary orthologs and paralogs with high accuracy, which can be a great challenge at both small and large evolutionary distances. Here, we present a phylogenomics-based approach for the identification of orthologous and paralogous genes in human, mouse, fly, and worm, which forms the foundation of the comparative analyses of the modENCODE and mouse ENCODE projects. We study a median of 16,101 genes across 2 mammalian genomes (human, mouse), 12 Drosophila genomes, 5 Caenorhabditis genomes, and an outgroup yeast genome, and demonstrate that accurate inference of evolutionary relationships and events across these species must account for frequent gene-tree topology errors due to both incomplete lineage sorting and insufficient phylogenetic signal. Furthermore, we show that integration of two separate phylogenomic pipelines yields increased accuracy, suggesting that their sources of error are independent, and finally, we leverage the resulting annotation of homologous genes to study the functional impact of gene duplication and loss in the context of rich gene expression and functional genomic datasets of the modENCODE, mouse ENCODE, and human ENCODE projects.


International Symposium on Bioinformatics Research and Applications | 2017

Coestimation of Gene Trees and Reconciliations Under a Duplication-Loss-Coalescence Model

Bo Zhang; Yi-Chieh Wu

Accurate gene tree-species tree reconciliation is fundamental to understanding evolutionary processes across species. However, within eukaryotes, the most popular algorithms consider only a restricted set of evolutionary events, typically modeling only duplications and losses or only coalescences. Recent work has unified duplications, losses, and coalescences through an intermediate locus tree; however, the associated reconciliation algorithms assume that the gene tree is known and do not account for gene tree reconstruction error. Here, we demonstrate that independent reconstruction of the gene tree followed by reconciliation substantially degrades accuracy compared to using the true gene tree. To address this challenge, we present DLC-Coestimation, a Bayesian method that simultaneously reconstructs the gene tree and reconciles it with the species tree. We have applied our method to two clades of flies and fungi and demonstrate that it outperforms existing approaches in ortholog, duplication, and loss inference. This work demonstrates the utility of coestimation methods for inferences under joint phylogenetic and population genomic models.


BMC Bioinformatics | 2017

Reconciliation feasibility in the presence of gene duplication, loss, and coalescence with multiple individuals per species

Jennifer Rogers; Andrew Fishberg; Nora Youngs; Yi-Chieh Wu

Background: In phylogenetics, we often seek to reconcile gene trees with species trees within the framework of an evolutionary model. While the most popular models for eukaryotic species allow for only gene duplication and gene loss or only multispecies coalescence, recent work has combined these phenomena through a reconciliation structure, the labeled coalescent tree (LCT), that simultaneously describes the duplication-loss and coalescent history of a gene family. However, the LCT makes the simplifying assumption that only one individual is sampled per species whereas, with advances in gene sequencing, we now have access to multiple samples per species. Results: We demonstrate that with these additional samples, there exist gene tree topologies that are impossible to reconcile with any species tree. In particular, the multiple samples enforce new constraints on the placement of duplications within a valid reconciliation. To model these constraints, we extend the LCT to a new structure, the partially labeled coalescent tree (PLCT) and demonstrate how to use the PLCT to evaluate the feasibility of a gene tree topology. We apply our algorithm to two clades of apes and flies to characterize possible sources of infeasibility. Conclusion: Going forward, we believe that this model represents a first step towards understanding reconciliations in duplication-loss-coalescence models with multiple samples per species.

Collaboration


Dive into Yi-Chieh Wu's collaborations.

Top Co-Authors

Manolis Kellis
Massachusetts Institute of Technology

Matthew D. Rasmussen
Massachusetts Institute of Technology

Mukul S. Bansal
University of Connecticut

Pouya Kheradpour
Massachusetts Institute of Technology

Abhishek Sarkar
Massachusetts Institute of Technology

Andreas R. Pfenning
Howard Hughes Medical Institute