Publication


Featured research published by Matthew D. Rasmussen.


Nature | 2011

A high-resolution map of human evolutionary constraint using 29 mammals

Kerstin Lindblad-Toh; Manuel Garber; Or Zuk; Michael F. Lin; Brian J. Parker; Stefan Washietl; Pouya Kheradpour; Jason Ernst; Gregory Jordan; Evan Mauceli; Lucas D. Ward; Craig B. Lowe; Alisha K. Holloway; Michele Clamp; Sante Gnerre; Jessica Alföldi; Kathryn Beal; Jean Chang; Hiram Clawson; James Cuff; Federica Di Palma; Stephen Fitzgerald; Paul Flicek; Mitchell Guttman; Melissa J. Hubisz; David B. Jaffe; Irwin Jungreis; W. James Kent; Dennis Kostka; Marcia Lara

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.


Nature | 2009

Evolution of pathogenicity and sexual reproduction in eight Candida genomes

Geraldine Butler; Matthew D. Rasmussen; Michael F. Lin; Manuel A. S. Santos; Sharadha Sakthikumar; Carol A. Munro; Esther Rheinbay; Manfred Grabherr; Anja Forche; Jennifer L. Reedy; Ino Agrafioti; Martha B. Arnaud; Steven Bates; Alistair J. P. Brown; Sascha Brunke; Maria C. Costanzo; David A. Fitzpatrick; Piet W. J. de Groot; David Harris; Lois L. Hoyer; Bernhard Hube; Frans M. Klis; Chinnappa D. Kodira; Nicola Lennard; Mary E. Logue; Ronny Martin; Aaron M. Neiman; Elissavet Nikolaou; Michael A. Quail; Janet Quinn

Candida species are the most common cause of opportunistic fungal infection worldwide. Here we report the genome sequences of six Candida species and compare these and related pathogens and non-pathogens. There are significant expansions of cell wall, secreted and transporter gene families in pathogenic species, suggesting adaptations associated with virulence. Large genomic tracts are homozygous in three diploid species, possibly resulting from recent recombination events. Surprisingly, key components of the mating and meiosis pathways are missing from several species. These include major differences at the mating-type loci (MTL); Lodderomyces elongisporus lacks MTL, and components of the a1/α2 cell identity determinant were lost in other species, raising questions about how mating and cell types are controlled. Analysis of the CUG leucine-to-serine genetic-code change reveals that 99% of ancestral CUG codons were erased and new ones arose elsewhere. Lastly, we revise the Candida albicans gene catalogue, identifying many new genes.


Nature | 2007

Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures

Alexander Stark; Michael F. Lin; Pouya Kheradpour; Jakob Skou Pedersen; Leopold Parts; Joseph W. Carlson; Madeline A. Crosby; Matthew D. Rasmussen; Sushmita Roy; Ameya N. Deoras; J. Graham Ruby; Julius Brennecke; Harvard FlyBase curators; Berkeley Drosophila Genome Project; Emily Hodges; Angie S. Hinrichs; Anat Caspi; Benedict Paten; Seung-Won Park; Mira V. Han; Morgan L. Maeder; Benjamin J. Polansky; Bryanne E. Robson; Stein Aerts; Jacques van Helden; Bassem A. Hassan; Donald G. Gilbert; Deborah A. Eastman; Michael D. Rice; Michael Weir

Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or ‘evolutionary signatures’, dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.


PLOS Genetics | 2014

Genome-wide inference of ancestral recombination graphs

Matthew D. Rasmussen; Melissa J. Hubisz; Ilan Gronau; Adam Siepel

The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the “ancestral recombination graph” (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n − 1 chromosomes, an operation we call “threading.” Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps.
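Threading draws a placement for the new chromosome from a posterior rather than picking a single best one, so the underlying HMM has to be sampled. A standard way to draw a hidden-state path exactly from a discrete HMM posterior is forward filtering followed by stochastic traceback. The Python sketch below shows only that generic technique. The function name `ffbs` and the toy two-state matrices are hypothetical, and the code does not implement the sequentially Markov coalescent or ARGweaver's actual state space. In a threading setting, each hidden state would plausibly stand for a point on the local tree (a branch and a discretized time) where the new lineage joins; that mapping is an assumption, not shown here.

```python
import random
from typing import List, Sequence


def ffbs(
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Sequence[float]],
    obs: Sequence[int],
    rng: random.Random,
) -> List[int]:
    """Draw one hidden-state path from P(path | obs) for a discrete HMM.

    init[i]     : P(state_0 = i)
    trans[i][j] : P(state_{t+1} = j | state_t = i)
    emit[i][o]  : P(obs_t = o | state_t = i)
    """
    k = len(init)
    states = range(k)

    # Forward filtering: alpha[t][j] is proportional to P(obs_0..obs_t, state_t = j).
    # Each column is renormalized to sum to 1 so long sequences do not underflow.
    alpha: List[List[float]] = []
    col = [init[j] * emit[j][obs[0]] for j in states]
    for t in range(len(obs)):
        if t > 0:
            prev = alpha[-1]
            col = [
                emit[j][obs[t]] * sum(prev[i] * trans[i][j] for i in states)
                for j in states
            ]
        total = sum(col)
        alpha.append([c / total for c in col])

    # Stochastic traceback: sample the last state from the final forward column,
    # then each earlier state given the state after it, walking right to left.
    # P(state_t = i | state_{t+1} = j, all obs) is proportional to alpha[t][i] * trans[i][j].
    path = [rng.choices(states, weights=alpha[-1])[0]]
    for t in range(len(obs) - 2, -1, -1):
        nxt = path[-1]
        weights = [alpha[t][i] * trans[i][nxt] for i in states]
        path.append(rng.choices(states, weights=weights)[0])
    path.reverse()
    return path


if __name__ == "__main__":
    # Hypothetical two-state toy model; not a coalescent model.
    init = [0.5, 0.5]
    trans = [[0.9, 0.1], [0.1, 0.9]]
    emit = [[0.8, 0.2], [0.3, 0.7]]
    print(ffbs(init, trans, emit, [0, 0, 1, 1, 1], random.Random(1)))
```

With identity emissions every observation pins its state, so the sampled path reproduces the observations; with noisier emissions, repeated calls return different paths in proportion to their posterior probability, which is the property an MCMC sampler built from repeated threading relies on.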


Molecular Biology and Evolution | 2011

A Bayesian Approach for Fast and Accurate Gene Tree Reconstruction

Matthew D. Rasmussen; Manolis Kellis

Recent sequencing and computing advances have enabled phylogenetic analyses to expand to both entire genomes and large clades, thus requiring more efficient and accurate methods designed specifically for the phylogenomic context. Here, we present SPIMAP, an efficient Bayesian method for reconstructing gene trees in the presence of a known species tree. We observe many improvements in reconstruction accuracy, achieved by modeling multiple aspects of evolution, including gene duplication and loss (DL) rates, speciation times, and correlated substitution rate variation across both species and loci. We have implemented and applied this method on two clades of fully sequenced species, 12 Drosophila and 16 fungal genomes as well as simulated phylogenies and find dramatic improvements in reconstruction accuracy as compared with the most popular existing methods, including those that take the species tree into account. We find that reconstruction inaccuracies of traditional phylogenetic methods overestimate the number of DL events by as much as 2–3-fold, whereas our method achieves significantly higher accuracy. We feel that the results and methods presented here will have many important implications for future investigations of gene evolution.


Genome Research | 2012

Unified modeling of gene duplication, loss, and coalescence using a locus tree

Matthew D. Rasmussen; Manolis Kellis

Gene phylogenies provide a rich source of information about the way evolution shapes genomes, populations, and phenotypes. In addition to substitutions, evolutionary events such as gene duplication and loss (as well as horizontal transfer) play a major role in gene evolution, and many phylogenetic models have been developed in order to reconstruct and study these events. However, these models typically make the simplifying assumption that population-related effects such as incomplete lineage sorting (ILS) are negligible. While this assumption may have been reasonable in some settings, it has become increasingly problematic as increased genome sequencing has led to denser phylogenies, where effects such as ILS are more prominent. To address this challenge, we present a new probabilistic model, DLCoal, that defines gene duplication and loss in a population setting, such that coalescence and ILS can be directly addressed. Interestingly, this model implies that in addition to the usual gene tree and species tree, there exists a third tree, the locus tree, which will likely have many applications. Using this model, we develop the first general reconciliation method that accurately infers gene duplications and losses in the presence of ILS, and we show its improved inference of orthologs, paralogs, duplications, and losses for a variety of clades, including flies, fungi, and primates. Also, our simulations show that gene duplications increase the frequency of ILS, further illustrating the importance of a joint model. Going forward, we believe that this unified model can offer insights to questions in both phylogenetics and population genetics.


Systematic Biology | 2013

TreeFix: statistically informed gene tree error correction using species trees

Yi-Chieh Wu; Matthew D. Rasmussen; Mukul S. Bansal; Manolis Kellis

Accurate gene tree reconstruction is a fundamental problem in phylogenetics, with many important applications. However, sequence data alone often lack enough information to confidently support one gene tree topology over many competing alternatives. Here, we present a novel framework for combining sequence data and species tree information, and we describe an implementation of this framework in TreeFix, a new phylogenetic program for improving gene tree reconstructions. Given a gene tree (preferably computed using a maximum-likelihood phylogenetic program), TreeFix finds a “statistically equivalent” gene tree that minimizes a species tree-based cost function. We have applied TreeFix to 2 clades of 12 Drosophila and 16 fungal genomes, as well as to simulated phylogenies and show that it dramatically improves reconstructions compared with current state-of-the-art programs. Given its accuracy, speed, and simplicity, TreeFix should be applicable to a wide range of analyses and have many important implications for future investigations of gene evolution. The source code and a sample data set are available at http://compbio.mit.edu/treefix.


PLOS Computational Biology | 2008

Performance and Scalability of Discriminative Metrics for Comparative Gene Identification in 12 Drosophila Genomes

Michael F. Lin; Ameya N. Deoras; Matthew D. Rasmussen; Manolis Kellis

Comparative genomics of multiple related species is a powerful methodology for the discovery of functional genomic elements, and its power should increase with the number of species compared. Here, we use 12 Drosophila genomes to study the power of comparative genomics metrics to distinguish between protein-coding and non-coding regions. First, we study the relative power of different comparative metrics and their relationship to single-species metrics. We find that even relatively simple multi-species metrics robustly outperform advanced single-species metrics, especially for shorter exons (≤240 nt), which are common in animal genomes. Moreover, the two capture largely independent features of protein-coding genes, with different sensitivity/specificity trade-offs, such that their combinations lead to even greater discriminatory power. In addition, we study how discovery power scales with the number and phylogenetic distance of the genomes compared. We find that species at a broad range of distances are comparably effective informants for pairwise comparative gene identification, but that these are surpassed by multi-species comparisons at similar evolutionary divergence. In particular, while pairwise discovery power plateaued at larger distances and never outperformed the most advanced single-species metrics, multi-species comparisons continued to benefit even from the most distant species with no apparent saturation. Last, we find that genes in functional categories typically considered fast-evolving can nonetheless be recovered at very high rates using comparative methods. Our results have implications for comparative genomics analyses in any species, including the human.


Genome Research | 2014

Most parsimonious reconciliation in the presence of gene duplication, loss, and deep coalescence using labeled coalescent trees

Yi-Chieh Wu; Matthew D. Rasmussen; Mukul S. Bansal; Manolis Kellis

Accurate gene tree-species tree reconciliation is fundamental to inferring the evolutionary history of a gene family. However, although it has long been appreciated that population-related effects such as incomplete lineage sorting (ILS) can dramatically affect the gene tree, many of the most popular reconciliation methods consider discordance only due to gene duplication and loss (and sometimes horizontal gene transfer). Methods that do model ILS are either highly parameterized or consider a restricted set of histories, thus limiting their applicability and accuracy. To address these challenges, we present a novel algorithm DLCpar for inferring a most parsimonious (MP) history of a gene family in the presence of duplications, losses, and ILS. Our algorithm relies on a new reconciliation structure, the labeled coalescent tree (LCT), that simultaneously describes coalescent and duplication-loss history. We show that the LCT representation enables an exhaustive and efficient search over the space of reconciliations, and, for most gene families, the least common ancestor (LCA) mapping is an optimal solution for the species mapping between the gene tree and species tree in an MP LCT. Applying our algorithm to a variety of clades, including flies, fungi, and primates, as well as to simulated phylogenies, we achieve high accuracy, comparable to sophisticated probabilistic reconciliation methods, at reduced run time and with far fewer parameters. These properties enable inferences of the complex evolution of gene families across a broad range of species and large data sets.
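The least common ancestor (LCA) mapping mentioned above is the classic duplication-loss reconciliation: each gene leaf maps to its species, each internal gene node maps to the LCA of its children's species-tree nodes, and a gene node that maps to the same species node as one of its children is a duplication. The Python sketch below implements that textbook mapping plus the usual unit count of duplications and losses for a binary species tree. It does not model ILS and is not DLCpar's labeled coalescent tree search; the `Node` class and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None


def build(name: str, *children: Node) -> Node:
    """Create a node and set the parent pointers of its children."""
    node = Node(name, list(children))
    for child in children:
        child.parent = node
    return node


def postorder(root: Node) -> Iterator[Node]:
    for child in root.children:
        yield from postorder(child)
    yield root


def depths(root: Node) -> Dict[Node, int]:
    """Number of edges from the root to every node."""
    depth: Dict[Node, int] = {}
    stack = [(root, 0)]
    while stack:
        node, d = stack.pop()
        depth[node] = d
        stack.extend((child, d + 1) for child in node.children)
    return depth


def lca(a: Node, b: Node, depth: Dict[Node, int]) -> Node:
    # Lift the deeper node to the same depth, then lift both until they meet.
    while depth[a] > depth[b]:
        a = a.parent
    while depth[b] > depth[a]:
        b = b.parent
    while a is not b:
        a, b = a.parent, b.parent
    return a


def reconcile(gene_root: Node, species_root: Node,
              leaf_species: Dict[str, str]) -> Dict[Node, Node]:
    """LCA-map every gene-tree node to a species-tree node."""
    depth = depths(species_root)
    species_leaves = {n.name: n for n in postorder(species_root) if not n.children}
    mapping: Dict[Node, Node] = {}
    for g in postorder(gene_root):  # children are mapped before their parent
        if not g.children:
            mapping[g] = species_leaves[leaf_species[g.name]]
        else:
            s = mapping[g.children[0]]
            for child in g.children[1:]:
                s = lca(s, mapping[child], depth)
            mapping[g] = s
    return mapping


def count_dups_losses(gene_root: Node, species_root: Node,
                      mapping: Dict[Node, Node]) -> Tuple[int, int]:
    """Unit duplication and loss counts, assuming a binary species tree."""
    depth = depths(species_root)
    dups = losses = 0
    for g in postorder(gene_root):
        if not g.children:
            continue
        s = mapping[g]
        is_dup = any(mapping[c] is s for c in g.children)
        if is_dup:
            dups += 1
        for c in g.children:
            # Every species node skipped between s and the child's mapping is a loss;
            # at a speciation, one step of that gap is the speciation itself.
            gap = depth[mapping[c]] - depth[s]
            losses += gap if is_dup else gap - 1
    return dups, losses


if __name__ == "__main__":
    species = build("root", build("AB", build("A"), build("B")), build("C"))
    # An extra copy arose in the A/B ancestor and was later lost in B.
    gene = build("g", build("x", build("a1"), build("b1")), build("a2"))
    m = reconcile(gene, species, {"a1": "A", "b1": "B", "a2": "A"})
    print(m[gene].name, count_dups_losses(gene, species, m))  # AB (1, 1)
```

The LCA mapping places every gene node as low in the species tree as its descendants allow, which is why, for this simple duplication-loss cost, it yields the minimum counts; the abstract's point is that it remains optimal for the species mapping in most gene families even once ILS is modeled.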


Molecular Biology and Evolution | 2012

Evolution at the Subgene Level: Domain Rearrangements in the Drosophila Phylogeny

Yi-Chieh Wu; Matthew D. Rasmussen; Manolis Kellis

Although the possibility of gene evolution by domain rearrangements has long been appreciated, current methods for reconstructing and systematically analyzing gene family evolution are limited to events such as duplication, loss, and sometimes, horizontal transfer. However, within the Drosophila clade, we find domain rearrangements occur in 35.9% of gene families, and thus, any comprehensive study of gene evolution in these species will need to account for such events. Here, we present a new computational model and algorithm for reconstructing gene evolution at the domain level. We develop a method for detecting homologous domains between genes and present a phylogenetic algorithm for reconstructing maximum parsimony evolutionary histories that include domain generation, duplication, loss, merge (fusion), and split (fission) events. Using this method, we find that genes involved in fusion and fission are enriched in signaling and development, suggesting that domain rearrangements and reuse may be crucial in these processes. We also find that fusion is more abundant than fission, and that fusion and fission events occur predominantly alongside duplication, with 92.5% and 34.3% of fusion and fission events retaining ancestral architectures in the duplicated copies. We provide a catalog of ∼9,000 genes that undergo domain rearrangement across nine sequenced species, along with possible mechanisms for their formation. These results dramatically expand on evolution at the subgene level and offer several insights into how new genes and functions arise between species.

Collaboration


Matthew D. Rasmussen's collaborations.

Top Co-Authors

Manolis Kellis

Massachusetts Institute of Technology

Yi-Chieh Wu

Massachusetts Institute of Technology

Michael F. Lin

Massachusetts Institute of Technology

Mukul S. Bansal

University of Connecticut

Sharad Vikram

University of California

Adam Siepel

Cold Spring Harbor Laboratory

Pouya Kheradpour

Massachusetts Institute of Technology

Ameya N. Deoras

Massachusetts Institute of Technology
