Publication


Featured research published by Lucas Czech.


Nature Ecology and Evolution | 2017

Parasites dominate hyperdiverse soil protist communities in Neotropical rainforests

Frédéric Mahé; Colomban de Vargas; David Bass; Lucas Czech; Alexandros Stamatakis; Enrique Lara; David Singer; Jordan Mayor; John Bunge; Sarah Sernaker; Tobias Siemensmeyer; Isabelle Trautmann; Sarah Romac; Cédric Berney; Alexey Kozlov; Edward A. D. Mitchell; Christophe V. W. Seppey; Elianne Sirnæs Egge; Guillaume Lentendu; Rainer Wirth; Gabriel Trueba; Micah Dunthorn

High animal and plant richness in tropical rainforest communities has long intrigued naturalists. It is unknown if similar hyperdiversity patterns are reflected at the microbial scale with unicellular eukaryotes (protists). Here we show, using environmental metabarcoding of soil samples and a phylogeny-aware cleaning step, that protist communities in Neotropical rainforests are hyperdiverse and dominated by the parasitic Apicomplexa, which infect arthropods and other animals. These host-specific parasites potentially contribute to the high animal diversity in the forests by reducing population growth in a density-dependent manner. By contrast, too few operational taxonomic units (OTUs) of Oomycota were found to broadly drive high tropical tree diversity in a host-specific manner under the Janzen-Connell model. Extremely high OTU diversity and high heterogeneity between samples within the same forests suggest that protists, not arthropods, are the most diverse eukaryotes in tropical rainforests. Our data show that protists play a large role in tropical terrestrial ecosystems long viewed as being dominated by macroorganisms.


Journal of Eukaryotic Microbiology | 2017

UniEuk: Time to Speak a Common Language in Protistology!

Cédric Berney; Andreea Ciuprina; Sara J. Bender; Juliet Brodie; Virginia P. Edgcomb; Eunsoo Kim; Jeena Rajan; Laura Wegener Parfrey; Sina Adl; Stéphane Audic; David Bass; David A. Caron; Guy Cochrane; Lucas Czech; Micah Dunthorn; Stefan Geisen; Frank Oliver Glöckner; Frédéric Mahé; Christian Quast; Jonathan Z. Kaye; Alastair G. B. Simpson; Alexandros Stamatakis; Javier Campo; Pelin Yilmaz; Colomban de Vargas

Universal taxonomic frameworks have been critical tools to structure the fields of botany, zoology, mycology, and bacteriology as well as their large research communities. Animals, plants, and fungi have relatively solid, stable morpho‐taxonomies built over the last three centuries, while bacteria have been classified for the last three decades under a coherent molecular taxonomic framework. By contrast, no such common language exists for microbial eukaryotes, even though environmental ‘‐omics’ surveys suggest that protists make up most of the organismal and genetic complexity of our planet's ecosystems! With the current deluge of eukaryotic meta‐omics data, we urgently need to build up a universal eukaryotic taxonomy bridging the protist ‐omics age to the fragile, centuries‐old body of classical knowledge that has effectively linked protist taxa to morphological, physiological, and ecological information. UniEuk is an open, inclusive, community‐based and expert‐driven international initiative to build a flexible, adaptive universal taxonomic framework for eukaryotes. It unites three complementary modules, EukRef, EukBank, and EukMap, which use phylogenetic markers, environmental metabarcoding surveys, and expert knowledge to inform the taxonomic framework. The UniEuk taxonomy is directly implemented in the European Nucleotide Archive at EMBL‐EBI, ensuring its broad use and long‐term preservation as a reference taxonomy for eukaryotes.


Molecular Biology and Evolution | 2017

A Critical Review on the Use of Support Values in Tree Viewers and Bioinformatics Toolkits

Lucas Czech; Jaime Huerta-Cepas; Alexandros Stamatakis

Phylogenetic trees are routinely visualized to present and interpret the evolutionary relationships of species. Most empirical evolutionary data studies contain a visualization of the inferred tree with branch support values. Ambiguous semantics in tree file formats can lead to erroneous tree visualizations and therefore to incorrect interpretations of phylogenetic analyses. Here, we discuss problems that arise when displaying branch values on trees after rerooting. Branch values are typically stored as node labels in the widely-used Newick tree format. However, such values are attributes of branches. Storing them as node labels can therefore yield errors when rerooting trees. This depends on the mostly implicit semantics that tools deploy to interpret node labels. We reviewed ten tree viewers and ten bioinformatics toolkits that can display and reroot trees. We found that 14 out of 20 of these tools do not permit users to select the semantics of node labels. Thus, unaware users might obtain incorrect results when rooting trees. We illustrate such incorrect mappings for several test cases and real examples taken from the literature. This review has already led to improvements in eight tools. We suggest tools should provide options that explicitly force users to define the semantics of node labels.
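
The rerooting problem described in this abstract can be illustrated with a toy example. The sketch below is not taken from the paper or from any of the reviewed tools. It assumes a hypothetical five-taxon Newick tree, (A,B,((C,D)80,E)90);, with made-up inner node names R, P and Q, and it reads every node label as the value of the branch directly above that node. Under that reading, moving the root from R to Q while leaving the labels on their nodes attaches the value 90 to the branch that originally carried 80, and the other branch loses its value.

```python
from typing import Dict, FrozenSet, Optional

# Unrooted topology of the hypothetical Newick tree (A,B,((C,D)80,E)90);
# R, P and Q are illustrative names for the three inner nodes.
ADJ = {
    "R": ["A", "B", "P"],
    "P": ["R", "Q", "E"],
    "Q": ["P", "C", "D"],
    "A": ["R"], "B": ["R"], "C": ["Q"], "D": ["Q"], "E": ["P"],
}

# Inner node labels exactly as they appear in the Newick string.
NODE_LABELS = {"P": "90", "Q": "80"}


def leaves_below(node: str, parent: Optional[str]) -> FrozenSet[str]:
    """Collect the leaf names reachable from `node` without passing `parent`."""
    children = [n for n in ADJ[node] if n != parent]
    if not children:
        return frozenset([node])
    return frozenset().union(*(leaves_below(c, node) for c in children))


def split_labels(root: str) -> Dict[FrozenSet[str], str]:
    """Map each labelled branch to its value for a given rooting.

    A node label is interpreted as the value of the branch above that node.
    Each branch is identified by the side of its split that lacks taxon "A",
    so the same branch gets the same key under every rooting.
    """
    all_leaves = leaves_below(root, None)
    result: Dict[FrozenSet[str], str] = {}

    def visit(node: str, parent: Optional[str]) -> None:
        for child in ADJ[node]:
            if child == parent:
                continue
            if child in NODE_LABELS:
                below = leaves_below(child, node)
                side = below if "A" not in below else all_leaves - below
                result[side] = NODE_LABELS[child]
            visit(child, node)

    visit(root, None)
    return result


original = split_labels("R")  # rooting as written in the Newick string
rerooted = split_labels("Q")  # same labels left on the same nodes, new root
print(original)
print(rerooted)
```

With the original root, the branch separating C and D from the rest carries 80 and the branch separating C, D and E from A and B carries 90. After rerooting at Q, the only labelled branch is the C, D branch, and it now shows 90. Keying values by split rather than by node, as the sketch does for comparison, is one way to make such mismatches visible.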


Journal of Eukaryotic Microbiology | 2018

Clarifying the Relationships between Microsporidia and Cryptomycota

David Bass; Lucas Czech; Bryony A. P. Williams; Cédric Berney; Micah Dunthorn; Frédéric Mahé; Guifré Torruella; Grant D. Stentiford; T. Williams

Some protists with microsporidian‐like cell biological characters, including Mitosporidium, Paramicrosporidium, and Nucleophaga, have SSU rRNA gene sequences that are much less divergent than canonical Microsporidia. We analysed the phylogenetic placement and environmental diversity of microsporidian‐like lineages that group near the base of the fungal radiation and show that they group in a clade with metchnikovellids and canonical microsporidians, to the exclusion of the clade including Rozella, in line with what is currently known of their morphology and cell biology. These results show that the phylogenetic scope of Microsporidia has been greatly underestimated. We propose that much of the lineage diversity previously thought to be cryptomycotan/rozellid is actually microsporidian, offering new insights into the evolution of the highly specialized parasitism of canonical Microsporidia. This insight has important implications for our understanding of opisthokont evolution and ecology, and is important for accurate interpretation of environmental diversity. Our analyses also demonstrate that many opisthosporidian (aphelid+rozellid+microsporidian) SSU V4 OTUs from Neotropical forest soils group with the short‐branching Microsporidia, consistent with the abundance of their protist and arthropod hosts in soils. This novel diversity of Microsporidia provides a unique opportunity to investigate the evolutionary origins of a highly specialized clade of major animal parasites.


bioRxiv | 2017

Quartet-based computations of internode certainty provide accurate and robust measures of phylogenetic incongruence

Xiaofan Zhou; Sarah Lutteropp; Lucas Czech; Alexandros Stamatakis; Moritz von Looz; Antonis Rokas

Incongruence, or topological conflict, is prevalent in genome-scale data sets but relatively few measures have been developed to quantify it. Internode Certainty (IC) and related measures were recently introduced to explicitly quantify the level of incongruence of a given internode (or internal branch) among a set of phylogenetic trees and complement regular branch support statistics in assessing the confidence of the inferred phylogenetic relationships. Since most phylogenomic studies contain data partitions (e.g., genes) with missing taxa and IC scores stem from the frequencies of bipartitions (or splits) on a set of trees, the calculation of IC scores requires adjusting the frequencies of bipartitions from these partial gene trees. However, when the proportion of missing data is high, current approaches that adjust bipartition frequencies in partial gene trees tend to overestimate IC scores and alternative adjustment approaches differ substantially from each other in their scores. To overcome these issues, we developed three new measures for calculating internode certainty that are based on the frequencies of quartets, which naturally apply to both comprehensive and partial trees. Our comparison of these new quartet-based measures to previous bipartition-based measures on simulated data shows that: 1) on comprehensive trees, both types of measures yield highly similar IC scores; 2) on partial trees, quartet-based measures generate more accurate IC scores; and 3) quartet-based measures are more robust to the absence of phylogenetic signal and errors in the phylogenetic relationships to be assessed. Additionally, analysis of 15 empirical phylogenomic data sets using our quartet-based measures suggests that numerous relationships remain unresolved despite the availability of genome-scale data. 
Finally, we provide an efficient open-source implementation of these quartet-based measures in the program QuartetScores, which is freely available at https://github.com/algomaus/QuartetScores.
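
For orientation, the bipartition-based internode certainty that this abstract uses as its baseline can be sketched as follows. This is a minimal illustration, assuming the two-way form of the score as commonly described in the earlier IC literature: the bipartition of interest is compared against its single most prevalent conflicting bipartition, and the score is one plus the sum of p * log2(p) over their two relative frequencies. It is not the quartet-based measure implemented in QuartetScores, it does not perform the frequency adjustment for partial trees, and the function name internode_certainty is hypothetical.

```python
import math


def internode_certainty(freq_main: float, freq_conflict: float) -> float:
    """Two-way internode certainty from raw bipartition frequencies.

    freq_main:     how often the bipartition of interest occurs in the tree set
    freq_conflict: how often its most prevalent conflicting bipartition occurs
    """
    total = freq_main + freq_conflict
    if freq_main < 0 or freq_conflict < 0 or total <= 0:
        raise ValueError("frequencies must be non-negative and not both zero")

    ic = 1.0  # log2(2), the maximum entropy of a two-way choice
    for freq in (freq_main, freq_conflict):
        p = freq / total
        if p > 0:  # treat 0 * log2(0) as 0
            ic += p * math.log2(p)
    return ic


print(internode_certainty(100, 0))  # no conflict at all: 1.0
print(internode_certainty(50, 50))  # two equally frequent alternatives: 0.0
print(internode_certainty(75, 25))
```

Values near 1 indicate that the conflicting bipartition is rare, and values near 0 indicate that both alternatives are almost equally supported.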


bioRxiv | 2016

Soil Protists in Three Neotropical Rainforests are Hyperdiverse and Dominated by Parasites

Frédéric Mahé; Colomban de Vargas; David Bass; Lucas Czech; Alexandros Stamatakis; Enrique Lara; Jordan Mayor; John Bunge; Sarah Sernaker; Tobias Siemensmeyer; Isabelle Trautmann; Sarah Romac; Cédric Berney; Alexey Kozlov; Edward A. D. Mitchell; Christophe V. W. Seppey; David Singer; Elianne Sirnæs Egge; Rainer Wirth; Gabriel Trueba; Micah Dunthorn

Animal and plant richness in tropical rainforests has long intrigued naturalists. More recent work has revealed that parasites contribute to high tropical tree diversity (Bagchi et al., 2014; Terborgh, 2012) and that arthropods are the most diverse eukaryotes in these forests (Erwin, 1982; Basset et al., 2012). It is unknown if similar patterns are reflected at the microbial scale with unicellular eukaryotes or protists. Here we show, using environmental metabarcoding and a novel phylogeny-aware cleaning step, that protists inhabiting Neotropical rainforest soils are hyperdiverse and dominated by the parasitic Apicomplexa, which infect arthropods and other animals. These host-specific protist parasites potentially contribute to the high animal diversity in the forests by reducing population growth in a density-dependent manner. By contrast, we found too few Oomycota to broadly drive high tropical tree diversity in a host-specific manner under the Janzen-Connell model (Janzen, 1970; Connell, 1970). Extremely high OTU diversity and high heterogeneity between samples within the same forests suggest that protists, not arthropods, are the most diverse eukaryotes in tropical rainforests. Our data show that microbes play a large role in tropical terrestrial ecosystems long viewed as being dominated by macro-organisms.


bioRxiv | 2015

Do Phylogenetic Tree Viewers correctly display Support Values?

Lucas Czech; Alexandros Stamatakis

Phylogenetic trees are routinely visualized to present and interpret the evolutionary relationships of the species that are being studied. Virtually all empirical evolutionary data studies contain a visualization of the inferred tree with support values using one of the popular and highly cited (e.g., TreeView, Dendroscope, FigTree, Archaeopteryx, etc.) tree viewing tools. As a consequence, programming errors or ambiguous semantics in tree file formats can lead to erroneous tree visualizations and consequently incorrect interpretations of phylogenetic analyses. Here, we discuss the problems that can and do arise when displaying branch support values on trees. Presumably for historical reasons, branch support values (e.g., bootstrap support or Bayesian posterior probabilities) are typically stored as node labels in the widely-used Newick tree format. However, support values are attributes of branches (bipartitions) in unrooted phylogenetic trees. Therefore, storing support values as node labels can potentially lead to incorrect support-value-to-bipartition mappings when re-rooting trees in tree viewers. This depends on the mostly implicit semantics of tree viewers for interpreting node labels. To assess the potential impact of these ambiguous and predominantly implicit semantics of support values, we analyzed 10 distinct tree viewers. We find that most of them exhibit some sort of incorrect or unexpected behavior when re-rooting trees with support values. We find that Dendroscope interprets Newick node labels as simply that, node labels in Newick trees. However, if they are meant to represent branch support values, the support value to branch mapping is incorrect when re-rooting trees with Dendroscope. We illustrate such an incorrect mapping by example of an empirical phylogenetic study.
As a solution, we suggest that (i) branch support values should exclusively be stored as meta-data associated to branches (and not nodes), and (ii) if this is not feasible, tree viewers should include a user dialogue that explicitly forces users to define if node labels shall be interpreted as node or branch labels, prior to tree visualization.

Phylogenetic trees are routinely visualized to present and interpret the evolutionary relationships of species. Virtually all empirical evolutionary data studies contain a visualization of the inferred tree with branch support values. Ambiguous semantics in tree file formats can lead to erroneous tree visualizations and therefore to incorrect interpretations of phylogenetic analyses. Here, we discuss problems that can and do arise when displaying branch values on trees after re-rooting. Branch values are typically stored as node labels in the widely-used Newick tree format. However, such values are attributes of branches. Storing them as node labels can therefore yield errors when re-rooting trees. This depends on the mostly implicit semantics that tools deploy to interpret node labels. We reviewed 10 tree viewers and 10 bioinformatics toolkits that can display and re-root trees. We found that 14 out of 20 of these tools do not permit users to select the semantics of node labels. Thus, unaware users might obtain incorrect results when rooting trees inferred by common phylogenetic inference programs. We illustrate such incorrect mappings for several test cases and real examples taken from the literature. This review has already led to improvements and workarounds in 8 of the tested tools. We suggest tools should provide an option that explicitly forces users to define the semantics of node labels.


bioRxiv | 2018

Scalable Methods for Post-Processing, Visualizing, and Analyzing Phylogenetic Placements

Lucas Czech; Alexandros Stamatakis

The exponential decrease in molecular sequencing cost generates unprecedented amounts of data. Hence, scalable methods to analyze these data are required. Phylogenetic (or Evolutionary) Placement methods identify the evolutionary provenance of anonymous sequences with respect to a given reference phylogeny. This increasingly popular method is deployed for scrutinizing metagenomic samples from environments such as water, soil, or the human gut. Here, we present novel and, more importantly, highly scalable methods for analyzing phylogenetic placements of metagenomic samples. More specifically, we introduce methods for (a) visualizing differences between samples and their correlation with associated meta-data on the reference phylogeny, (b) clustering similar samples using a variant of the k-means method, and (c) finding phylogenetic factors using an adaptation of the Phylofactorization method. These methods make it possible to interpret metagenomic data in a phylogenetic context, to find patterns in the data, and to identify branches of the phylogeny that are driving these patterns. To demonstrate the scalability and utility of our methods, as well as to provide exemplary interpretations of our methods, we applied them to 3 publicly available datasets comprising 9782 samples with a total of approximately 168 million sequences. The results indicate that new biological insights can be attained via our methods.


Systematic Biology | 2018

EPA-ng: Massively Parallel Evolutionary Placement of Genetic Sequences

Pierre Barbera; Alexey Kozlov; Lucas Czech; Benoit Morel; Diego Darriba; Tomáš Flouri; Alexandros Stamatakis

Next generation sequencing (NGS) technologies have led to a ubiquity of molecular sequence data. This data avalanche is particularly challenging in metagenetics, which focuses on taxonomic identification of sequences obtained from diverse microbial environments. Phylogenetic placement methods determine how these sequences fit into an evolutionary context. Previous implementations of phylogenetic placement algorithms, such as the evolutionary placement algorithm (EPA) included in RAxML, or PPLACER, are being increasingly used for this purpose. However, due to the steady progress in NGS technologies, the current implementations face substantial scalability limitations. Herein, we present EPA‐NG, a complete reimplementation of the EPA that is substantially faster, offers a distributed memory parallelization, and integrates concepts from both RAxML‐EPA and PPLACER. EPA‐NG can be executed on standard shared memory, as well as on distributed memory systems (e.g., computing clusters). To demonstrate the scalability of EPA‐NG, we placed 1 billion metagenetic reads from the Tara Oceans Project onto a reference tree with 3748 taxa in just under 7 h, using 2048 cores. Our performance assessment shows that EPA‐NG outperforms RAxML‐EPA and PPLACER by up to a factor of 30 in sequential execution mode, while attaining comparable parallel efficiency on shared memory systems. We further show that the distributed memory parallelization of EPA‐NG scales well up to 2048 cores. EPA‐NG is available under the AGPLv3 license: https://github.com/Pbdas/epa-ng.


Bioinformatics | 2018

Methods for Automatic Reference Trees and Multilevel Phylogenetic Placement

Lucas Czech; Pierre Barbera; Alexandros Stamatakis

Motivation: In most metagenomic sequencing studies, the initial analysis step consists in assessing the evolutionary provenance of the sequences. Phylogenetic (or Evolutionary) Placement methods can be employed to determine the evolutionary position of sequences with respect to a given reference phylogeny. These placement methods do however face certain limitations: The manual selection of reference sequences is labor-intensive; the computational effort to infer reference phylogenies is substantially larger than for methods that rely on sequence similarity; the number of taxa in the reference phylogeny should be small enough to allow for visually inspecting the results.
Results: We present algorithms to overcome the above limitations. First, we introduce a method to automatically construct representative sequences from databases to infer reference phylogenies. Second, we present an approach for conducting large-scale phylogenetic placements on nested phylogenies. Third, we describe a preprocessing pipeline that allows for handling huge sequence datasets. Our experiments on empirical data show that our methods substantially accelerate the workflow and yield highly accurate placement results.
Availability and implementation: Freely available under GPLv3 at http://github.com/lczech/gappa.
Supplementary information: Supplementary data are available at Bioinformatics online.

Collaboration


Dive into Lucas Czech's collaborations.

Top Co-Authors

Alexandros Stamatakis

Karlsruhe Institute of Technology

David Bass

Centre for Environment

Frédéric Mahé

Kaiserslautern University of Technology

Micah Dunthorn

Kaiserslautern University of Technology

Alexey Kozlov

Heidelberg Institute for Theoretical Studies

Isabelle Trautmann

Kaiserslautern University of Technology

Pierre Barbera

Heidelberg Institute for Theoretical Studies

Rainer Wirth

Kaiserslautern University of Technology
