Publication


Featured research published by Diego Mallo.


Systematic Biology | 2016

SimPhy: Phylogenomic Simulation of Gene, Locus, and Species Trees.

Diego Mallo; Leonardo de Oliveira Martins; David Posada

We present a fast and flexible software package, SimPhy, for the simulation of multiple gene families evolving under incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer (all three potentially leading to species tree/gene tree discordance), as well as gene conversion. SimPhy implements a hierarchical phylogenetic model in which the evolution of species, locus, and gene trees is governed by global and local parameters (e.g., genome-wide, species-specific, locus-specific) that can be fixed or sampled from a priori statistical distributions. SimPhy also incorporates comprehensive models of substitution rate variation among lineages (uncorrelated relaxed clocks) and the capability of simulating partitioned nucleotide, codon, and protein multilocus sequence alignments under a plethora of substitution models using the program INDELible. We validate SimPhy's output using theoretical expectations and other programs, and show that it scales extremely well with complex models and/or large trees, being an order of magnitude faster than the most similar program (DLCoal-Sim). In addition, we demonstrate how SimPhy can be useful to understand interactions among different evolutionary processes, conducting a simulation study to characterize the systematic overestimation of the duplication time when using standard reconciliation methods. SimPhy is available at https://github.com/adamallo/SimPhy, where users can find the source code, precompiled executables, a detailed manual and example cases.


Environmental Microbiology | 2015

Diversity and distribution of unicellular opisthokonts along the European coast analysed using high-throughput sequencing.

Javier del Campo; Diego Mallo; Ramon Massana; Colomban de Vargas; Thomas A. Richards; Iñaki Ruiz-Trillo

The opisthokonts are one of the major supergroups of eukaryotes. They comprise two major clades: (i) the Metazoa and their unicellular relatives and (ii) the Fungi and their unicellular relatives. There is, however, little knowledge of the role of opisthokont microbes in many natural environments, especially among non-metazoan and non-fungal opisthokonts. Here, we begin to address this gap by analysing high-throughput 18S rDNA and 18S rRNA sequencing data from different European coastal sites, sampled at different size fractions and depths. In particular, we analyse the diversity and abundance of choanoflagellates, filastereans, ichthyosporeans, nucleariids, corallochytreans and their related lineages. Our results show the great diversity of choanoflagellates in coastal waters as well as a relevant representation of the ichthyosporeans and the uncultured marine opisthokonts (MAOP). Furthermore, we describe a new lineage of marine fonticulids (MAFO) that appears to be abundant in sediments. Taken together, our work points to a greater potential ecological role for unicellular opisthokonts than previously appreciated in marine environments, both in the water column and in sediments, and also provides evidence of novel opisthokont phylogenetic lineages. This study highlights the importance of high-throughput sequencing approaches to unravel the diversity and distribution of both known and novel eukaryotic lineages.


Systematic Biology | 2016

A Bayesian Supertree Model for Genome-Wide Species Tree Reconstruction

Leonardo de Oliveira Martins; Diego Mallo; David Posada

Current phylogenomic data sets highlight the need for species tree methods able to deal with several sources of gene tree/species tree incongruence. At the same time, we need to make the most of all available data. Most species tree methods deal with single processes of phylogenetic discordance, namely, gene duplication and loss, incomplete lineage sorting (ILS) or horizontal gene transfer. In this manuscript, we address the problem of species tree inference from multilocus, genome-wide data sets regardless of the presence of gene duplication and loss and ILS, and therefore without the need to identify orthologs or to use a single individual per species. We do this by extending the idea of Maximum Likelihood (ML) supertrees to a hierarchical Bayesian model where several sources of gene tree/species tree disagreement can be accounted for in a modular manner. We implemented this model in a computer program called guenomu, whose inputs are posterior distributions of unrooted gene tree topologies for multiple gene families, and whose output is the posterior distribution of rooted species tree topologies. We conducted extensive simulations to evaluate the performance of our approach in comparison with other species tree approaches able to deal with more than one leaf from the same species. Our method ranked best under simulated data sets, in spite of ignoring branch lengths, and performed well on empirical data, as well as being fast enough to analyze relatively large data sets. Our Bayesian supertree method was also very successful in obtaining better estimates of gene trees, by reducing the uncertainty in their distributions. In addition, our results show that under complex simulation scenarios, gene tree parsimony is also a competitive approach once we consider its speed, in contrast to more sophisticated models.


Philosophical Transactions of the Royal Society B | 2016

Multilocus inference of species trees and DNA barcoding.

Diego Mallo; David Posada

The unprecedented amount of data resulting from next-generation sequencing has opened a new era in phylogenetic estimation. Although large datasets should, in theory, increase phylogenetic resolution, massive, multilocus datasets have uncovered a great deal of phylogenetic incongruence among different genomic regions, due both to stochastic error and to the action of different evolutionary processes such as incomplete lineage sorting, gene duplication and loss and horizontal gene transfer. This incongruence violates one of the fundamental assumptions of the DNA barcoding approach, which assumes that gene history and species history are identical. In this review, we explain some of the most important challenges we will have to face to reconstruct the history of species, and the advantages and disadvantages of different strategies for the phylogenetic analysis of multilocus data. In particular, we describe the evolutionary events that can generate species tree/gene tree discordance, compare the most popular methods for species tree reconstruction, highlight the challenges we need to face when using them and discuss their potential utility in barcoding. Current barcoding methods sacrifice a great amount of statistical power by only considering one locus, and a transition to multilocus barcodes would not only improve current barcoding methods, but also facilitate an eventual transition to species-tree-based barcoding strategies, which could better accommodate scenarios where the barcode gap is too small or nonexistent. This article is part of the themed issue ‘From DNA barcodes to biomes’.


Systematic Biology | 2014

Unsorted Homology within Locus and Species Trees

Diego Mallo; Leonardo de Oliveira Martins; David Posada

The concept of homology lies at the root of evolutionary biology. Since the seminal work of Fitch (1970), three main categories of homology relationships have been defined at the molecular level: orthology, paralogy, and xenology. In brief, if two gene copies arose by duplication they are paralogs, whereas if they arose through speciation they are orthologs. If one of them was transferred from a contemporaneous species, we call them xenologs (Supplementary Fig. S1 in Supplementary Material online, available at http://dx.doi.org/10.5061/dryad.87k57; see Gray and Fitch (1983); Fitch (2000)). Indeed, these terms were coined under a phylogenetic framework in which species were represented by single individuals, and as such they have remained very much intact during the last four decades—although particular cases within these categories have received specific names (Mindell and Meyer 2001). However, advances in sequencing technology have changed the field, and it is now very common to collect data sets containing multiple gene loci and/or multiple individuals per species. In general, such genome-wide data sets not only have unveiled extensive phylogenomic incongruence (Jeffroy et al. 2006; Salichos and Rokas 2013) but have brought back to the spotlight the consideration of how ancestral polymorphisms sort within populations (Edwards 2009). Altogether, phylogenomic data make imperative the explicit distinction between organismal and gene histories. Let us consider phylogenetic relationships at three different levels: species, loci, and gene copies (Fig. 1). The distinction between species/population trees and gene trees has been known for decades (Goodman et al. 1979; Pamilo and Nei 1988; Takahata 1989), whereas the introduction of locus trees into these models is very recent (Rasmussen and Kellis 2012). In brief, a species tree depicts the evolutionary history of the sampled organisms. 
In this case, the nodes represent speciation events, connected by branches that reflect the population history along these periods, and where their widths represent effective population size (Ne) and their lengths represent time (usually in years or number of generations). Apart from speciations, only evolutionary processes that affect species as a whole are represented at this level, like hybridization. Note that species trees are equivalent to population trees when the organismal units of interest are conspecific populations. In this case, the nodes of the population trees represent isolation events. In general, we will refer to “species” as any diverging, interbreeding group of individuals regardless of its taxonomic rank. On the other hand, a locus tree represents the evolutionary history of the sampled loci for a given gene family (see Rasmussen and Kellis 2012). Since the loci exist inside individuals evolving as part of a population, the locus tree is embedded within the species tree. In a locus tree, the nodes depict either genetic divergence due to speciation in the embedding species tree or locus-level events such as duplication, losses, or horizontal gene transfers, whereas the branch lengths and widths represent time and Ne, respectively. Here, we assume that the locus-level events get immediately fixed in the population, so these Ne are equivalent to those in the species tree and are the same for every locus. Finally, a gene tree represents the evolutionary history of the sampled gene copies that evolve inside the locus tree. Gene tree nodes indicate coalescent events, which looking forward in time correspond to the process of DNA replication and divergence, and that can occur around the speciation time, well before (deep coalescence) or afterwards (migration in population trees). The branches of the gene tree usually represent the number of substitutions per site, and can also represent number of generations or other measures of time.
Importantly, these three historical layers do not necessarily coincide. True species/population trees can differ from true locus trees due to gene duplications, losses, and/or horizontal gene transfers, whereas true gene trees can differ from their embedding locus and species trees if there is incomplete lineage sorting (ILS) (Maddison 1997; Page and Charleston 1997) (and migration in the case of population trees). In this regard,


bioRxiv | 2018

CryptSim: Modeling the evolutionary dynamics of the progression of Barrett's esophagus to esophageal adenocarcinoma

Diego Mallo; Rumen Kostadinov; Luis Cisneros; Mary K. Kuhner; Carlo C. Maley

To alleviate the overdiagnosis and overtreatment of premalignant conditions, we need to predict their progression to cancer and, therefore, the dynamics of an evolutionary process. However, monitoring evolutionary processes in vivo is extremely challenging. Computer simulations constitute an attractive alternative, allowing us to study these dynamics based on a set of evolutionary parameters. We introduce CryptSim, a simulator of crypt evolution inspired by Barrett’s esophagus. We detail the most relevant computational strategies it implements, and perform a simulation study showing that the interaction between neighboring crypts may play a crucial role in carcinogenesis.


Bioinformatics | 2018

RecPhyloXML: a format for reconciled gene trees

Wandrille Duchemin; Guillaume Gence; Anne-Muriel Arigon Chifolleau; Lars Arvestad; Mukul S. Bansal; Vincent Berry; Bastien Boussau; François Chevenet; Nicolas Comte; Adrian A. Davin; Christophe Dessimoz; David Dylus; Damir Hasic; Diego Mallo; Rémi Planel; David Posada; Celine Scornavacca; Gergely J. Szöllősi; Louxin Zhang; Eric Tannier; Vincent Daubin

Motivation: A reconciliation is an annotation of the nodes of a gene tree with evolutionary events (for example, speciation, gene duplication, transfer, loss, etc.) along with a mapping onto a species tree. Many algorithms and software packages produce or use reconciliations, but they often use different reconciliation formats, which differ in the type of events considered or in whether the species tree is dated. This complicates the comparison and communication between different programs. Results: Here, we gather a consortium of software developers in gene tree/species tree reconciliation to propose and endorse a format that aims to promote an integrative, albeit flexible, specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools, such as a reconciled tree visualizer and conversion utilities. Availability and implementation: http://phylariane.univ-lyon1.fr/recphyloxml/.


Genome Biology | 2016

When (distant) relatives stay too long: implications for cancer medicine.

Diego Chowell; Amy M. Boddy; Diego Mallo; Marc Tollis; Carlo C. Maley

Whole-genome analyses of human medulloblastomas show that the dominant clone at relapse is present as a rare subclone at primary diagnosis.


Archive | 2013

Phylogenomics reveals polyphyly of haploscleromorph clades and provides insight into the early evolution of sponges

Guifré Torruella; Diego Mallo; Alicia R. Pérez-Porro; Sally P. Leys; Iñaki Ruiz-Trillo; Gonzalo Giribet; Ana Riesgo

Work presented at the Ninth World Sponge Conference, held in Fremantle (Australia) from 4 to 8 November 2013.


Pattern Recognition in Computational Molecular Biology: Techniques and Approaches | 2015

23. Diverse Considerations for Successful Phylogenetic Tree Reconstruction: Impacts from Model Misspecification, Recombination, Homoplasy, and Pattern Recognition

Diego Mallo; Agustín Sánchez-Cobos; Miguel Arenas

Collaboration


Diego Mallo's collaborations.

Top Co-Authors

Carlo C. Maley

Arizona State University

Amy M. Boddy

Arizona State University

Mary K. Kuhner

University of Washington

Agustín Sánchez-Cobos

Spanish National Research Council

Ramon Massana

Spanish National Research Council