Alexander F. Auch | Researchain

Publication

Featured research published by Alexander F. Auch.

BMC Bioinformatics | 2013

Genome sequence-based species delimitation with confidence intervals and improved distance functions

Jan P. Meier-Kolthoff; Alexander F. Auch; Hans-Peter Klenk; Markus Göker

Background: For the last 25 years, species delimitation in prokaryotes (Archaea and Bacteria) has been based to a large extent on DNA-DNA hybridization (DDH), a tedious lab procedure designed in the early 1970s that served its purpose astonishingly well in the absence of deciphered genome sequences. With the rapid progress in genome sequencing, the time has come to directly use the now available and easy-to-generate genome sequences for the delimitation of species. GBDP (Genome Blast Distance Phylogeny) infers genome-to-genome distances between pairs of entirely or partially sequenced genomes, a digital, highly reliable estimator for the relatedness of genomes. Its application as an in-silico replacement for DDH was recently introduced. The main challenge in implementing such an application is to produce digital DDH values that mimic wet-lab DDH values as closely as possible, to ensure consistency in the prokaryotic species concept. Results: Correlation and regression analyses were used to determine the best-performing methods and the most influential parameters. GBDP was further enriched with a set of new features, such as confidence intervals for intergenomic distances obtained via resampling or via the statistical models for DDH prediction, and an additional family of distance functions. As in previous analyses, GBDP obtained the highest agreement with wet-lab DDH among all tested methods, but improved models led to a further increase in the accuracy of DDH prediction. Confidence intervals yielded stable results when inferred from the statistical models, whereas those obtained via resampling showed marked differences between the underlying distance functions. Conclusions: Despite the high accuracy of GBDP-based DDH prediction, inferences from limited empirical data are always associated with a certain degree of uncertainty. It is thus crucial to enrich in-silico DDH replacements with confidence-interval estimation, enabling the user to statistically evaluate the outcomes.
Such methodological advancements, easily accessible through the web service at http://ggdc.dsmz.de, are crucial steps towards a consistent and truly genome sequence-based classification of microorganisms.
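The resampling route to confidence intervals mentioned above can be illustrated with a minimal Python sketch. It is not GBDP's implementation: the identity-based distance, the representation of HSPs as (length, identical positions) tuples, resampling whole HSPs with replacement, and the percentile interval are all assumptions made for illustration, and the HSP values are invented.

```python
import random

def identity_distance(hsps):
    """Hypothetical distance: 1 - (identical positions / total HSP length)."""
    total_len = sum(length for length, _ in hsps)
    total_ident = sum(ident for _, ident in hsps)
    return 1.0 - total_ident / total_len

def bootstrap_ci(hsps, distance, replicates=1000, alpha=0.05, seed=1):
    """Percentile interval from replicates that resample HSPs with replacement."""
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    n = len(hsps)
    stats = sorted(
        distance([hsps[rng.randrange(n)] for _ in range(n)])
        for _ in range(replicates)
    )
    lo = stats[int(alpha / 2 * replicates)]
    hi = stats[int((1 - alpha / 2) * replicates) - 1]
    return lo, hi

# Invented HSPs as (alignment length, identical positions).
hsps = [(1200, 1100), (800, 760), (450, 400), (2000, 1880), (300, 250)]
point = identity_distance(hsps)
low, high = bootstrap_ci(hsps, identity_distance)
print(f"d = {point:.4f}, 95% interval = [{low:.4f}, {high:.4f}]")
```

The width of such an interval depends on how the chosen distance function reacts to the resampled HSPs, which is in line with the observation that resampling-based intervals differed markedly between distance functions.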

Standards in Genomic Sciences | 2010

Digital DNA-DNA hybridization for microbial species delineation by means of genome-to-genome sequence comparison

Alexander F. Auch; Mathias von Jan; Hans-Peter Klenk; Markus Göker

The pragmatic species concept for Bacteria and Archaea is ultimately based on DNA-DNA hybridization (DDH). While enabling the taxonomist, in principle, to obtain an estimate of the overall similarity between the genomes of two strains, this technique is tedious and error-prone and cannot be used to incrementally build up a comparative database. Recent technological progress in the area of genome sequencing calls for bioinformatics methods to replace the wet-lab DDH by in-silico genome-to-genome comparison. Here we investigate state-of-the-art methods for inferring whole-genome distances with respect to their ability to mimic DDH. Algorithms to efficiently determine high-scoring segment pairs or maximally unique matches perform well as a basis for inferring intergenomic distances. The examined distance functions, which are able to cope with heavily reduced genomes and repetitive sequence regions, outperform previously described ones with regard to their correlation with DDH and their error ratios in emulating it. Simulation of incompletely sequenced genomes indicates that some distance formulas are very robust against missing fractions of genomic information. Digitally derived genome-to-genome distances show a better correlation with 16S rRNA gene sequence distances than DDH values do. The future perspectives of genome-informed taxonomy are discussed, and the investigated methods are made available as a web service for genome-based species delineation.
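To make HSP-based intergenomic distances concrete, the following Python sketch computes two distances of the general kind discussed in this line of work. One is based on the fraction of both genomes covered by HSPs, the other on the fraction of identical positions within HSPs. The exact formulas, the `HSPSummary` structure and the numbers are illustrative assumptions rather than the paper's definitions. The identity-based variant does not use genome lengths at all, which is one plausible reason why such formulas could be less sensitive to missing fractions of a genome.

```python
from dataclasses import dataclass

@dataclass
class HSPSummary:
    """Totals for one search direction (query genome against subject genome)."""
    covered: int     # total HSP length on the query genome
    identical: int   # total identical positions within those HSPs

def coverage_distance(xy, yx, len_x, len_y):
    """Share of both genomes NOT covered by HSPs; depends on genome lengths."""
    return 1.0 - (xy.covered + yx.covered) / (len_x + len_y)

def identity_distance(xy, yx):
    """Share of non-identical positions within HSPs; genome lengths do not enter."""
    return 1.0 - (xy.identical + yx.identical) / (xy.covered + yx.covered)

# Made-up totals for two genomes of 4.0 and 4.2 Mbp.
xy = HSPSummary(covered=3_600_000, identical=3_420_000)
yx = HSPSummary(covered=3_650_000, identical=3_467_500)
print(round(coverage_distance(xy, yx, 4_000_000, 4_200_000), 4))  # 0.1159
print(round(identity_distance(xy, yx), 4))                        # 0.05
```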

PLOS ONE | 2008

MetaSim—A Sequencing Simulator for Genomics and Metagenomics

Daniel C. Richter; Felix Ott; Alexander F. Auch; Ramona Schmid; Daniel H. Huson

Background: The new research field of metagenomics is providing exciting insights into various, previously unclassified ecological systems. Next-generation sequencing technologies are producing a rapid increase in environmental data in public databases. There is a great need for specialized software solutions and statistical methods for dealing with complex metagenome data sets. Methodology/Principal Findings: To facilitate the development and improvement of metagenomic tools and the planning of metagenomic projects, we introduce a sequencing simulator called MetaSim. Our software can be used to generate collections of synthetic reads that reflect the diverse taxonomical composition of typical metagenome data sets. Based on a database of given genomes, the program allows the user to design a metagenome by specifying the number of genomes present at different levels of the NCBI taxonomy, and then to collect reads from the metagenome using a simulation of a number of different sequencing technologies. A population sampler optionally produces evolved sequences based on source genomes and a given evolutionary tree. Conclusions/Significance: MetaSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software.
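The core of a metagenome read simulator can be sketched in a few lines of Python. This toy version is not MetaSim and reproduces neither its interface nor its technology-specific error models. The function name `simulate_reads`, the abundance weights, the uniform choice of read start and the single substitution-error rate are assumptions chosen for illustration.

```python
import random

def simulate_reads(genomes, abundances, n_reads, read_len, error_rate=0.01, seed=0):
    """Draw fixed-length reads from a toy metagenome.

    genomes: dict name -> sequence; abundances: dict name -> relative weight.
    Each read picks a genome by weight and a uniform start position, then
    receives random substitutions at rate error_rate (a crude error model).
    """
    rng = random.Random(seed)
    names = list(genomes)
    weights = [abundances[n] for n in names]
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights)[0]
        seq = genomes[name]
        start = rng.randrange(len(seq) - read_len + 1)
        read = list(seq[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((name, start, "".join(read)))  # keep the true origin
    return reads

# Two random 5 kbp "genomes" mixed at a 3:1 ratio.
gen_rng = random.Random(42)
genomes = {g: "".join(gen_rng.choice("ACGT") for _ in range(5000)) for g in ("taxonA", "taxonB")}
reads = simulate_reads(genomes, {"taxonA": 3, "taxonB": 1}, n_reads=1000, read_len=100)
print(sum(1 for name, _, _ in reads if name == "taxonA"), "reads from taxonA")
```

Recording the source genome and start position of each read, as done here, is what makes simulated data useful as a benchmark: a classifier's assignments can be checked against the known origin.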

Standards in Genomic Sciences | 2010

Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs

Alexander F. Auch; Hans-Peter Klenk; Markus Göker

DNA-DNA hybridization (DDH) is a widely applied wet-lab technique to obtain an estimate of the overall similarity between the genomes of two organisms. Microbiologists chose to base the species concept for prokaryotes ultimately on DDH as a pragmatic approach to deciding on the recognition of novel species, a choice that also allowed a relatively high degree of standardization compared to other areas of taxonomy. However, DDH is tedious and error-prone and, first and foremost, cannot be used to incrementally establish a comparative database. Recent studies have shown that in-silico methods for the comparison of genome sequences can be used to replace DDH. Considering the ongoing rapid technological progress of sequencing methods, genome-based prokaryote taxonomy is coming within reach. However, calculating distances between genomes depends on multiple choices of software and program settings. We here provide an overview of the modifications that can be applied to distance methods based on high-scoring segment pairs (HSPs) or maximally unique matches (MUMs) and that need to be documented. General recommendations on determining HSPs using BLAST or other algorithms are also provided. As a reference implementation, we introduce the GGDC web server (http://ggdc.gbdp.org).
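Even a small aggregation step involves the kind of settings the SOP asks to document. The following Python sketch sums HSP alignment lengths and identities from BLAST+ tabular output (`-outfmt 6`, with its 12 default columns). The e-value cutoff, the reconstruction of identities from percent identity and the absence of any merging of overlapping HSPs are assumptions of this sketch, and each of them would change the resulting distance.

```python
import csv
import io

def hsp_totals(tabular_text, max_evalue=1e-5):
    """Sum alignment lengths and approximate identities from -outfmt 6 text.

    Default columns: qaccver saccver pident length mismatch gapopen
    qstart qend sstart send evalue bitscore. Identities are rebuilt from
    pident * length / 100, so they are approximate. Overlapping HSPs are
    NOT merged here. The e-value cutoff is a hypothetical setting that
    should be reported together with any distance derived from the totals.
    """
    aln_len = identical = 0
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        pident, length, evalue = float(row[2]), int(row[3]), float(row[10])
        if evalue > max_evalue:
            continue
        aln_len += length
        identical += round(pident * length / 100)
    return aln_len, identical

sample = (
    "q1\ts1\t95.00\t1000\t50\t0\t1\t1000\t1\t1000\t0.0\t1700\n"
    "q1\ts1\t80.00\t200\t40\t0\t1\t200\t5000\t5199\t1e-30\t250\n"
    "q1\ts1\t70.00\t50\t15\t0\t1\t50\t9000\t9049\t0.5\t30\n"
)
print(hsp_totals(sample))  # (1200, 1110): the third HSP fails the cutoff
```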

BMC Bioinformatics | 2009

Methods for comparative metagenomics

Daniel H. Huson; Daniel C. Richter; Suparna Mitra; Alexander F. Auch; Stephan C. Schuster

Background: Metagenomics is a rapidly growing field of research that aims at studying uncultured organisms to understand the true diversity of microbes, their functions, cooperation and evolution, in environments such as soil, water, ancient remains of animals, or the digestive system of animals and humans. The recent development of ultra-high-throughput sequencing technologies, which do not require cloning or PCR amplification and can produce huge numbers of DNA reads at an affordable cost, has boosted the number and scope of metagenomic sequencing projects. Increasingly, there is a need for new ways of comparing multiple metagenomic datasets, and for fast and user-friendly implementations of such approaches. Results: This paper introduces a number of new methods for interactively exploring, analyzing and comparing multiple metagenomic datasets, which will be made freely available in a new, comparative version 2.0 of the stand-alone metagenome analysis tool MEGAN. Conclusion: There is a great need for powerful and user-friendly tools for comparative analysis of metagenomic data, and MEGAN 2.0 will help to fill this gap.

Bioinformatics | 2007

CopyCat: cophylogenetic analysis tool

Jan P. Meier-Kolthoff; Alexander F. Auch; Daniel H. Huson; Markus Göker

We have developed the software CopyCat, which provides easy and fast access to cophylogenetic analyses. It incorporates a wrapper for the program ParaFit, which conducts a statistical test for the presence of congruence between host and parasite phylogenies. CopyCat offers various features, such as the creation of customized host-parasite association data and the computation of phylogenetic host/parasite trees based on the NCBI taxonomy. Availability: CopyCat and its manual are freely available at http://www-ab.informatik.uni-tuebingen.de/software/copycat. Supplementary information: Results of the real-world example can be found at http://www-ab.informatik.uni-tuebingen.de/software/copycat or at Bioinformatics online.
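ParaFit's own statistic is not reproduced here. As a simpler stand-in that shows the general logic of a permutation test for host-parasite congruence, the following Python sketch correlates host distances with parasite distances over all pairs of host-parasite links. It then compares the observed value with values obtained after randomly reassigning hosts to parasites. The statistic, the toy trees and all function names are assumptions for illustration.

```python
import random
from itertools import combinations

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def link_statistic(links, host_d, para_d):
    """Correlate host and parasite distances over all pairs of (host, parasite) links."""
    hs, ps = [], []
    for (h1, p1), (h2, p2) in combinations(links, 2):
        hs.append(host_d[h1][h2])
        ps.append(para_d[p1][p2])
    return pearson(hs, ps)

def permutation_test(links, host_d, para_d, n_perm=999, seed=0):
    """One-sided test: how often does a random host reassignment fit at least as well?"""
    rng = random.Random(seed)
    observed = link_statistic(links, host_d, para_d)
    hosts = [h for h, _ in links]
    parasites = [p for _, p in links]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(hosts)  # randomly reassign hosts to parasites
        if link_statistic(list(zip(hosts, parasites)), host_d, para_d) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

def cherry_distances(names):
    """Toy patristic distances for ((n0,n1),(n2,n3)): 2 within a cherry, 4 across."""
    return {x: {y: 0 if x == y else (2 if i // 2 == j // 2 else 4)
                for j, y in enumerate(names)}
            for i, x in enumerate(names)}

host_d = cherry_distances(["A", "B", "C", "D"])
para_d = cherry_distances(["a", "b", "c", "d"])
links = [("A", "a"), ("B", "b"), ("C", "c"), ("D", "d")]
observed, p_value = permutation_test(links, host_d, para_d)
print(observed, p_value)
```

With only four links, even perfectly congruent trees give a p-value of roughly 1/3, because 8 of the 24 possible host reassignments preserve the tree structure. Permutation tests of this kind therefore need reasonably many associations to have power.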

BMC Bioinformatics | 2007

AxPcoords & parallel AxParafit: statistical co-phylogenetic analyses on thousands of taxa

Alexandros Stamatakis; Alexander F. Auch; Jan P. Meier-Kolthoff; Markus Göker

Background: Current tools for co-phylogenetic analyses are not able to cope with the continuous accumulation of phylogenetic data. The sophisticated statistical test for host-parasite co-phylogenetic analyses implemented in Parafit cannot handle large datasets in reasonable time. The Parafit and DistPCoA programs are by far the most compute-intensive components of the Parafit analysis pipeline. We present AxParafit and AxPcoords (Ax stands for Accelerated), which are highly optimized versions of Parafit and DistPCoA, respectively. Results: Both programs have been entirely rewritten in C. Via optimization of the algorithm and the C code, as well as integration of highly tuned BLAS and LAPACK methods, AxParafit runs 5–61 times faster than Parafit with a lower memory footprint (up to 35% reduction), and the performance benefit increases with growing dataset size. The MPI-based parallel implementation of AxParafit shows good scalability on up to 128 processors, even on medium-sized datasets. The parallel analysis with AxParafit on 128 CPUs for a medium-sized dataset with a 512 by 512 association matrix is more than 1,200/128 times faster per processor than the sequential Parafit run. AxPcoords is 8–26 times faster than DistPCoA and numerically stable on large datasets. We outline the substantial benefits of using parallel AxParafit by way of a large-scale empirical study on smut fungi and their host plants. To the best of our knowledge, this study represents the largest co-phylogenetic analysis to date. Conclusion: The highly efficient AxPcoords and AxParafit programs allow for large-scale co-phylogenetic analyses on several thousands of taxa for the first time. In addition, AxParafit and AxPcoords have been integrated into the easy-to-use CopyCat tool.
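The per-processor figure can be unpacked with a short calculation. This assumes the phrase means an overall speed-up S of more than 1,200 for the parallel AxParafit run on p = 128 CPUs relative to the sequential Parafit run. That reading is an interpretation of the wording. Under it, the speed-up per processor is:

```latex
S = \frac{T_{\mathrm{Parafit,\ sequential}}}{T_{\mathrm{AxParafit},\ p = 128}} > 1200,
\qquad
\frac{S}{p} > \frac{1200}{128} = 9.375
```

A per-processor value well above 1 is consistent with the 5–61-fold gain that the optimized sequential code alone achieves over Parafit.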

Evolutionary Bioinformatics | 2010

A Clustering Optimization Strategy for Molecular Taxonomy Applied to Planktonic Foraminifera SSU rDNA

Markus Göker; Guido W. Grimm; Alexander F. Auch; Ralf Aurahs; Michal Kucera

Identifying species is challenging in the case of organisms for which primarily molecular data are available. Even if morphological features are available, molecular taxonomy is often necessary to revise taxonomic concepts and to analyze environmental DNA sequences. However, clustering approaches to delineate molecular operational taxonomic units often rely on arbitrary parameter choices. Also, distance calculation is difficult for highly alignment-ambiguous sequences. Here, we applied a recently described clustering optimization method to highly divergent planktonic foraminifera SSU rDNA sequences. We determined the distance function and the clustering setting that result in the highest agreement with morphological reference data. Alignment-free distance calculation, when adapted for use with partly non-homologous sequences caused by distinct primer pairs, outperformed multiple sequence alignment. Clustering optimization offers new perspectives for the barcoding of species diversity and for environmental sequencing. It bridges the gap between traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both genetic divergence and given species concepts.
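The optimization idea of choosing the clustering setting that agrees best with reference data can be sketched in Python. Single-linkage clustering at a distance threshold and the unadjusted Rand index are stand-ins chosen here for brevity. The study's actual clustering methods and agreement measure may differ, and the distances and reference labels are invented.

```python
from itertools import combinations

def single_linkage(items, dist, threshold):
    """Clusters = connected components of the graph linking pairs with dist <= threshold."""
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)
    return {x: find(x) for x in items}

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree (same vs different cluster)."""
    pairs = list(combinations(list(labels_a), 2))
    agree = sum((labels_a[x] == labels_a[y]) == (labels_b[x] == labels_b[y]) for x, y in pairs)
    return agree / len(pairs)

def best_threshold(items, dist, reference, thresholds):
    """Return the threshold whose clustering agrees best with the reference partition."""
    return max(thresholds, key=lambda t: rand_index(single_linkage(items, dist, t), reference))

items = ["s1", "s2", "s3", "s4", "s5"]
close = {("s1", "s2"): 0.02, ("s3", "s4"): 0.03, ("s4", "s5"): 0.04, ("s3", "s5"): 0.05}
dist = {frozenset(p): close.get(p, 0.20) for p in combinations(items, 2)}
reference = {"s1": "X", "s2": "X", "s3": "Y", "s4": "Y", "s5": "Y"}  # e.g. morphospecies
best = best_threshold(items, dist, reference, [0.01, 0.03, 0.05, 0.25])
print(best)  # 0.05
```

Thresholds that are too strict split the reference species, and thresholds that are too loose merge them. The optimum lies where the clustering reproduces the reference partition best.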

Concurrency and Computation: Practice and Experience | 2014

Highly parallelized inference of large genome-based phylogenies

Jan P. Meier-Kolthoff; Alexander F. Auch; Hans-Peter Klenk; Markus Göker

Genome Blast Distance Phylogeny (GBDP) infers distances and phylogenetic relationships between organisms from completely or partially sequenced genomes. It is well suited for parallelization because pairwise distances are calculated independently. As exemplar data for a high-performance cluster implementation that executes many pairwise genome comparisons in parallel, we here used sequences from the Genomic Encyclopedia of Bacteria and Archaea project. Phylogenies were inferred from genome-scale nucleotide and amino acid data with all variants of GBDP, including novel adaptations to amino acid sequences and approaches yielding trees with branch support. The dependence of phylogenetic accuracy, average branch support, and performance indicators such as running time and disk space consumption on details of genome comparison, distance calculation, and phylogenetic inference was examined in detail. If combined with conservative measures for branch support, GBDP appears to infer reasonable phylogenetic relationships of microorganisms at a comparatively low computational cost. Due to the linear speed-up of the cluster, benchmarks reveal an overall computation time of less than 24 h for the 7750 pairwise genome/proteome comparisons of the Genomic Encyclopedia of Bacteria and Archaea data set, compared with an estimated running time of about 30 days for the non-parallelized version.
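Because every pairwise comparison is independent, the work can be distributed with a plain parallel map. The Python sketch below uses a thread pool only to stay self-contained. CPU-bound comparisons would normally run as separate processes or cluster jobs. The `compare` function is a hypothetical placeholder (a mismatch fraction between toy sequences), not a GBDP comparison.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def compare(pair):
    """Hypothetical stand-in for one genome-to-genome comparison: mismatch fraction."""
    (name_a, seq_a), (name_b, seq_b) = pair
    mismatches = sum(x != y for x, y in zip(seq_a, seq_b))
    return name_a, name_b, mismatches / min(len(seq_a), len(seq_b))

def all_pairwise(genomes, workers=4):
    """Submit every unordered pair as an independent task; map preserves input order."""
    pairs = list(combinations(genomes.items(), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compare, pairs))

genomes = {"g1": "ACGTACGT", "g2": "ACGTTCGT", "g3": "TCGTACGA"}
results = all_pairwise(genomes)
for row in results:
    print(row)
```

For n genomes there are n(n-1)/2 unordered pairs, so the workload grows quadratically with the number of genomes, which is what makes parallel execution worthwhile. The abstract's own figures (about 30 days sequentially versus under 24 h on the cluster) correspond to a speed-up of more than 30-fold.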

IEEE International Conference on High Performance Computing, Data, and Analytics | 2009

Large-Scale Co-Phylogenetic Analysis on the Grid

Heinz Stockinger; Alexander F. Auch; Markus Göker; Jan P. Meier-Kolthoff; Alexandros Stamatakis

Phylogenetic data analysis represents an extremely compute-intensive area of Bioinformatics and thus requires high-performance technologies. Another compute- and memory-intensive problem is that of host-parasite co-phylogenetic analysis: given two phylogenetic trees, one for the hosts (e.g., mammals) and one for their respective parasites (e.g., lice), the question arises whether host and parasite trees are more similar to each other than expected by chance alone. CopyCat is an easy-to-use tool that allows biologists to conduct such co-phylogenetic studies within an elaborate statistical framework based on the highly optimized sequential and parallel AxParafit program. We have developed enhanced versions of these tools that efficiently exploit a Grid environment and therefore facilitate large-scale data analyses. Furthermore, we developed a freely accessible client tool that provides co-phylogenetic analysis capabilities. Since the computational bulk of the problem is embarrassingly parallel, it is well suited to a computational Grid, which reduces the response time of large-scale analyses.
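The embarrassingly parallel structure can be illustrated by splitting a permutation test into chunks that each use their own random seed and share no state. This assumes that the bulk of the work consists of independent permutation replicates. The Python sketch below uses a toy sum-of-products statistic in place of the real co-phylogenetic statistic. A thread pool stands in for Grid nodes, and all names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def statistic(xs, ys):
    """Toy association statistic: sum of products (stand-in for the real test statistic)."""
    return sum(x * y for x, y in zip(xs, ys))

def count_exceedances(xs, ys, observed, n_perm, seed):
    """One independent work unit: permute ys n_perm times using its own seed."""
    rng = random.Random(seed)
    ys = list(ys)  # private copy, so chunks never share mutable state
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ys)
        if statistic(xs, ys) >= observed:
            hits += 1
    return hits

def chunked_permutation_test(xs, ys, total_perm=9999, chunks=8, workers=4):
    """Split the permutations into independent chunks, then pool the counts."""
    observed = statistic(xs, ys)
    sizes = [total_perm // chunks + (i < total_perm % chunks) for i in range(chunks)]
    tasks = [(size, seed) for seed, size in enumerate(sizes)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = sum(pool.map(lambda t: count_exceedances(xs, ys, observed, *t), tasks))
    return observed, (hits + 1) / (total_perm + 1)

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 2, 3, 4, 5, 6]
observed, p_value = chunked_permutation_test(xs, ys)
print(observed, p_value)
```

Seeding each chunk separately keeps the result reproducible regardless of how many workers run or in which order the chunks finish.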
