Fernando Izquierdo-Carrasco

Heidelberg Institute for Theoretical Studies

Publication

Featured research published by Fernando Izquierdo-Carrasco.

Nature Methods | 2013

Metagenomic species profiling using universal phylogenetic marker genes

Shinichi Sunagawa; Daniel R. Mende; Georg Zeller; Fernando Izquierdo-Carrasco; Simon A. Berger; Jens Roat Kultima; Luis Pedro Coelho; Manimozhiyan Arumugam; Julien Tap; Henrik Bjørn Nielsen; Simon Rasmussen; Søren Brunak; Oluf Pedersen; Francisco Guarner; Willem M. de Vos; Jun Wang; Junhua Li; Joël Doré; S. Dusko Ehrlich; Alexandros Stamatakis; Peer Bork

To quantify known and unknown microorganisms at species-level resolution using shotgun sequencing data, we developed a method that establishes metagenomic operational taxonomic units (mOTUs) based on single-copy phylogenetic marker genes. Applied to 252 human fecal samples, the method revealed that on average 43% of the species abundance and 58% of the richness cannot be captured by current reference genome–based methods. An implementation of the method is available at http://www.bork.embl.de/software/mOTU/.

Bioinformatics | 2012

RAxML-Light

Alexandros Stamatakis; Andre J. Aberer; Christian Goll; Stephen A. Smith; Simon A. Berger; Fernando Izquierdo-Carrasco

Motivation: Due to advances in molecular sequencing and the increasingly rapid collection of molecular data, the field of phyloinformatics is transforming into a computational science. Therefore, new tools are required that can be deployed in supercomputing environments and that scale to hundreds or thousands of cores. Results: We describe RAxML-Light, a tool for large-scale phylogenetic inference on supercomputers under maximum likelihood. It implements a light-weight checkpointing mechanism, deploys 128-bit (SSE3) and 256-bit (AVX) vector intrinsics, offers two orthogonal memory saving techniques and provides a fine-grain production-level message passing interface parallelization of the likelihood function. To demonstrate scalability and robustness of the code, we inferred a phylogeny on a simulated DNA alignment (1481 taxa, 20 000 000 bp) using 672 cores. This dataset requires one terabyte of RAM to compute the likelihood score on a single tree. Code Availability: https://github.com/stamatak/RAxML-Light-1.0.5 Data Availability: http://www.exelixis-lab.org/onLineMaterial.tar.bz2 Contact: [email protected] Supplementary Information: Supplementary data are available at Bioinformatics online.

BMC Bioinformatics | 2011

Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees

Fernando Izquierdo-Carrasco; Stephen A. Smith; Alexandros Stamatakis

Background: The rapid accumulation of molecular sequence data, driven by novel wet-lab sequencing technologies, poses new challenges for large-scale maximum likelihood-based phylogenetic analyses on trees with more than 30,000 taxa and several genes. The three main computational challenges are: numerical stability, the scalability of search algorithms, and the high memory requirements for computing the likelihood. Results: We introduce methods for solving these three key problems and provide respective proof-of-concept implementations in RAxML. The mechanisms presented here are not RAxML-specific and can thus be applied to any likelihood-based (Bayesian or maximum likelihood) tree inference program. We develop a new search strategy that can reduce the time required for tree inferences by more than 50% while yielding equally good trees (in the statistical sense) for well-chosen starting trees. We present an adaptation of the Subtree Equality Vector technique for phylogenomic datasets with missing data (already available in RAxML v728) that can reduce execution times and memory requirements by up to 50%. Finally, we discuss issues pertaining to the numerical stability of the Γ model of rate heterogeneity on very large trees and argue in favor of rate heterogeneity models that use a single rate or rate category for each site to resolve these problems. Conclusions: We address three major issues pertaining to large-scale tree reconstruction under maximum likelihood and propose respective solutions. Respective proof-of-concept/production-level implementations of our ideas are made available as open-source code.
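As a generic illustration of why numerical stability becomes a concern when many small probabilities are combined, the following minimal Python sketch contrasts a plain floating-point product with a sum of logarithms. It is not the analysis from the paper and does not model the Γ distribution or RAxML's scaling scheme; the function names and the values 0.01 and 200 are hypothetical and chosen only for demonstration.

```python
import math


def product_of_probabilities(p: float, n: int) -> float:
    """Multiply the probability p by itself n times in double precision."""
    result = 1.0
    for _ in range(n):
        result *= p
    return result


def sum_of_log_probabilities(p: float, n: int) -> float:
    """Compute the same quantity in log space, which avoids underflow."""
    return n * math.log(p)


# Hypothetical values: 200 factors of 0.01 give 1e-400, which is below
# the smallest positive double, so the plain product collapses to zero.
plain = product_of_probabilities(0.01, 200)
logged = sum_of_log_probabilities(0.01, 200)
print("plain product:", plain)
print("log-space sum:", logged)
```

The plain product loses all information once it underflows, while the log-space value stays finite. This is the general reason likelihood codes rescale or work with logarithms.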

Bioinformatics | 2014

PUmPER: phylogenies updated perpetually

Fernando Izquierdo-Carrasco; John Cazes; Stephen A. Smith; Alexandros Stamatakis

Summary: New sequence data useful for phylogenetic and evolutionary analyses continues to be added to public databases. The construction of multiple sequence alignments and inference of huge phylogenies comprising large taxonomic groups are expensive tasks, both in terms of man hours and computational resources. Therefore, maintaining comprehensive phylogenies, based on representative and up-to-date molecular sequences, is challenging. PUmPER is a framework that can perpetually construct multi-gene alignments (with PHLAWD) and phylogenetic trees (with ExaML or RAxML-Light) for a given NCBI taxonomic group. When sufficient numbers of new gene sequences for the selected taxonomic group have accumulated in GenBank, PUmPER automatically extends the alignment and infers extended phylogenetic trees by using previously inferred smaller trees as starting topologies. Using our framework, large phylogenetic trees can be perpetually updated without human intervention. Importantly, resulting phylogenies are not statistically significantly worse than trees inferred from scratch. Availability and implementation: PUmPER can run in stand-alone mode on a single server, or offload the computationally expensive phylogenetic searches to a parallel computing cluster. Source code, documentation, and tutorials are available at https://github.com/fizquierdo/perpetually-updated-trees. Contact: [email protected] Supplementary information: Supplementary Material is available at Bioinformatics online.

IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2013

Boosting the Performance of Bayesian Divergence Time Estimation with the Phylogenetic Likelihood Library

Diego Darriba; Andre J. Aberer; Tomáš Flouri; Tracy A. Heath; Fernando Izquierdo-Carrasco; Alexandros Stamatakis

We present a substantially improved and parallelized version of DPPDiv, a software tool for estimating species divergence times and lineage-specific substitution rates on a fixed tree topology. The improvement is achieved by integrating the DPPDiv code with the Phylogenetic Likelihood Library (PLL), a fast, optimized, and parallelized collection of functions for conducting likelihood computations on phylogenetic trees. We show that integrating the PLL into a likelihood-based application is straightforward: it took the first author (DD) a programming effort of only one month, without prior knowledge of either DPPDiv or the PLL. We achieve sequential speedups ranging from a factor of two to three and near-optimal parallel speedups up to 48 threads on sufficiently large datasets. Hence, with a programming effort of one month, we were able to improve DPPDiv's time-to-solution on parallel systems by two orders of magnitude and also to substantially improve its ability to infer divergence times on large-scale datasets.

International Symposium on Bioinformatics Research and Applications | 2013

Heuristic Algorithms for the Protein Model Assignment Problem

Jörg Hauser; Kassian Kobert; Fernando Izquierdo-Carrasco; Karen Meusemann; Bernhard Misof; Michael Gertz; Alexandros Stamatakis

Assigning an optimal combination of empirical amino acid substitution models (e.g., WAG, LG, MTART) to partitioned multi-gene datasets when branch lengths across partitions are linked is suspected to be an NP-hard problem. Given p partitions and the approximately 20 empirical protein models that are available, one needs to compute the log likelihood score of 20^p possible model-to-partition assignments to obtain the optimal assignment.
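To give a sense of how quickly the exhaustive search space grows, the minimal Python sketch below counts and enumerates model-to-partition assignments. It only illustrates the combinatorics stated in the abstract, not the heuristics proposed in the paper; the function names are hypothetical, and the three-model list is a small subset used to keep the enumeration short.

```python
from itertools import product

# Three of the model names mentioned in the abstract, used as a toy subset.
MODELS = ["WAG", "LG", "MTART"]


def count_assignments(num_models: int, num_partitions: int) -> int:
    """Number of ways to assign one model to each partition."""
    return num_models ** num_partitions


def enumerate_assignments(models, num_partitions):
    """Lazily yield every model-to-partition assignment as a tuple."""
    return product(models, repeat=num_partitions)


# With about 20 models, the count grows as 20^p in the partition count p.
for p in (1, 2, 5, 10):
    print(f"p = {p:2d}: {count_assignments(20, p):,} assignments")

# Exhaustive enumeration is only practical for tiny instances like this one.
for assignment in enumerate_assignments(MODELS, 2):
    print(assignment)
```

Each enumerated assignment would require its own likelihood evaluation, which is why exhaustive search becomes impractical beyond a handful of partitions.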

International Parallel and Distributed Processing Symposium | 2012

Inference of Huge Trees under Maximum Likelihood

Fernando Izquierdo-Carrasco; Alexandros Stamatakis

The wide adoption of Next-Generation Sequencing technologies in recent years has generated an avalanche of genetic data, which poses new challenges for large-scale maximum likelihood-based phylogenetic analyses. Improving the scalability of search algorithms and reducing the high memory requirements for computing the likelihood represent major computational challenges in this context. We have introduced methods for solving these key problems and provided respective proof-of-concept implementations. Moreover, we have developed a new tree search strategy that can reduce run times by more than 50% while yielding equally good trees (in the statistical sense). To reduce memory requirements, we explored the applicability of external memory (out-of-core) algorithms as well as a concept that trades memory for additional computations in the likelihood function. The latter concept induces only a surprisingly small increase in overall execution times. When trading 50% of the required RAM for additional computations, the average execution time increase caused by these additional computations amounts to only 15%. All concepts presented here are sufficiently generic that they can be applied to all programs that rely on the phylogenetic likelihood function. The approaches we have developed will thereby help to enable large-scale inferences of whole-genome phylogenies.

Briefings in Bioinformatics | 2011