Publication


Featured research published by Markus Göker.


BMC Bioinformatics | 2013

Genome sequence-based species delimitation with confidence intervals and improved distance functions

Jan P. Meier-Kolthoff; Alexander F. Auch; Hans-Peter Klenk; Markus Göker

Background: For the last 25 years, species delimitation in prokaryotes (Archaea and Bacteria) was to a large extent based on DNA-DNA hybridization (DDH), a tedious lab procedure designed in the early 1970s that served its purpose astonishingly well in the absence of deciphered genome sequences. With the rapid progress in genome sequencing, the time has come to directly use the now available and easy-to-generate genome sequences for the delimitation of species. GBDP (Genome Blast Distance Phylogeny) infers genome-to-genome distances between pairs of entirely or partially sequenced genomes, a digital, highly reliable estimator for the relatedness of genomes. Its application as an in-silico replacement for DDH was recently introduced. The main challenge in the implementation of such an application is to produce digital DDH values that mimic the wet-lab DDH values as closely as possible to ensure consistency in the prokaryotic species concept.
Results: Correlation and regression analyses were used to determine the best-performing methods and the most influential parameters. GBDP was further enriched with a set of new features such as confidence intervals for intergenomic distances obtained via resampling or via the statistical models for DDH prediction and an additional family of distance functions. As in previous analyses, GBDP obtained the highest agreement with wet-lab DDH among all tested methods, but improved models led to a further increase in the accuracy of DDH prediction. Confidence intervals yielded stable results when inferred from the statistical models, whereas those obtained via resampling showed marked differences between the underlying distance functions.
Conclusions: Despite the high accuracy of GBDP-based DDH prediction, inferences from limited empirical data are always associated with a certain degree of uncertainty. It is thus crucial to enrich in-silico DDH replacements with confidence-interval estimation, enabling the user to statistically evaluate the outcomes. Such methodological advancements, easily accessible through the web service at http://ggdc.dsmz.de, are crucial steps towards a consistent and truly genome sequence-based classification of microorganisms.
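
The idea of obtaining confidence intervals via resampling can be illustrated with a generic percentile bootstrap. This is only a minimal sketch: it is not GBDP's actual resampling scheme or distance function, and the function names (`bootstrap_ci`, `mean`) and the per-segment dissimilarity values are hypothetical, chosen purely for illustration.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def bootstrap_ci(values, statistic, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(values).

    Resamples the observations with replacement, recomputes the statistic
    for every resample, and returns the empirical alpha/2 and 1 - alpha/2
    quantiles of the resulting estimates.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(values)
    estimates = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower_index = int((alpha / 2) * n_resamples)
    upper_index = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return estimates[lower_index], estimates[upper_index]


# Hypothetical per-segment dissimilarities between two genomes (made-up data).
segment_dissimilarities = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09, 0.13, 0.10]
low, high = bootstrap_ci(segment_dissimilarities, mean)
print(f"point estimate {mean(segment_dissimilarities):.3f}, "
      f"95% interval [{low:.3f}, {high:.3f}]")
```

A wide interval signals that the point estimate alone should not be used to decide whether two genomes fall on the same side of a species threshold.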


Standards in Genomic Sciences | 2010

Digital DNA-DNA hybridization for microbial species delineation by means of genome-to-genome sequence comparison

Alexander F. Auch; Mathias von Jan; Hans-Peter Klenk; Markus Göker

The pragmatic species concept for Bacteria and Archaea is ultimately based on DNA-DNA hybridization (DDH). While enabling the taxonomist, in principle, to obtain an estimate of the overall similarity between the genomes of two strains, this technique is tedious and error-prone and cannot be used to incrementally build up a comparative database. Recent technological progress in the area of genome sequencing calls for bioinformatics methods to replace the wet-lab DDH by in-silico genome-to-genome comparison. Here we investigate state-of-the-art methods for inferring whole-genome distances in their ability to mimic DDH. Algorithms to efficiently determine high-scoring segment pairs or maximally unique matches perform well as a basis of inferring intergenomic distances. The examined distance functions, which are able to cope with heavily reduced genomes and repetitive sequence regions, outperform previously described ones regarding the correlation with and error ratios in emulating DDH. Simulation of incompletely sequenced genomes indicates that some distance formulas are very robust against missing fractions of genomic information. Digitally derived genome-to-genome distances show a better correlation with 16S rRNA gene sequence distances than DDH values. The future perspectives of genome-informed taxonomy are discussed, and the investigated methods are made available as a web service for genome-based species delineation.


Standards in Genomic Sciences | 2010

Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs

Alexander F. Auch; Hans-Peter Klenk; Markus Göker

DNA-DNA hybridization (DDH) is a widely applied wet-lab technique to obtain an estimate of the overall similarity between the genomes of two organisms. Microbiologists chose to base the species concept for prokaryotes ultimately on DDH as a pragmatic approach for deciding about the recognition of novel species, a choice that also allowed a relatively high degree of standardization compared to other areas of taxonomy. However, DDH is tedious and error-prone and, first and foremost, cannot be used to incrementally establish a comparative database. Recent studies have shown that in-silico methods for the comparison of genome sequences can be used to replace DDH. Considering the ongoing rapid technological progress of sequencing methods, genome-based prokaryote taxonomy is coming within reach. However, calculating distances between genomes depends on multiple choices for software and program settings. We here provide an overview of the modifications that can be applied to distance methods based on high-scoring segment pairs (HSPs) or maximally unique matches (MUMs) and that need to be documented. General recommendations on determining HSPs using BLAST or other algorithms are also provided. As a reference implementation, we introduce the GGDC web server (http://ggdc.gbdp.org).


Mycologia | 2002

Phylogenetic relationships of the downy mildews (Peronosporales) and related groups based on nuclear large subunit ribosomal DNA sequences

A. Riethmüller; Hermann Voglmayr; Markus Göker; Michael Weiß; Franz Oberwinkler

In order to investigate phylogenetic relationships of the Peronosporomycetes (Oomycetes), nuclear large subunit ribosomal DNA sequences containing the D1 and D2 regions from 92 species belonging to the orders Peronosporales, Pythiales, Leptomitales, Rhipidiales, Saprolegniales and Sclerosporales were analyzed. The data were analyzed applying methods of neighbor-joining as well as maximum parsimony, both statistically supported using the bootstrap method. The results confirm the major division between the Pythiales and Peronosporales on the one hand and the Saprolegniales, Leptomitales, and Rhipidiales on the other. The Sclerosporales were shown to be polyphyletic; while Sclerosporaceae are nested within the Peronosporaceae, the Verrucalvaceae are merged within the Saprolegniales. Within the Peronosporomycetidae, Pythiales as well as Peronosporales as currently defined are polyphyletic. The well supported Albugo clade appears to be the most basal lineage, followed by a Pythium-Lagenidium clade. The third, highly supported clade comprises the Peronosporaceae together with Sclerospora, Phytophthora, and Peronophythora. Peronophythora is placed within Phytophthora, indicating that both genera should be merged. Bremiella seems to be polyphyletic within the genus Plasmopara, suggesting a transfer to Plasmopara. The species of Peronospora do not appear as a monophyletic group. Peronospora species growing on Brassicaceae form a highly supported clade.


International Journal of Systematic and Evolutionary Microbiology | 2013

Chryseobacterium hispalense sp. nov., a plant-growth-promoting bacterium isolated from a rainwater pond in an olive plant nursery, and emended descriptions of Chryseobacterium defluvii, Chryseobacterium indologenes, Chryseobacterium wanjuense and Chryseobacterium gregarium

Maria del Carmen Montero-Calasanz; Markus Göker; Manfred Rohde; Cathrin Spröer; Peter Schumann; Hans-Jürgen Busse; Michael Schmid; Brian J. Tindall; Hans-Peter Klenk; M. Camacho

A novel non-motile, Gram-staining-negative, yellow-pigmented bacterium, designated AG13(T), isolated from a rain water pond at a plant nursery in Spain and characterized as a plant-growth-promoting bacterium, was investigated to determine its taxonomic status. The isolate grew best over a temperature range of 15-40 °C, at pH 5.0-8.0 and with 0-4 % (w/v) NaCl. Chemotaxonomic and molecular characteristics of the isolate matched those described for members of the genus Chryseobacterium. The DNA G+C content of the novel strain was 37.2 mol%. The strain had a polyamine pattern with sym-homospermidine as the major compound and produced flexirubin-type pigments. MK-6 was the dominant menaquinone and the major cellular fatty acids were iso-C15 : 0, C17 : 1ω9c and iso-C17 : 0 3-OH. The main polar lipids were phosphatidylethanolamine, aminolipids and several unidentified lipids. The 16S rRNA gene showed 92.0-97.2 % sequence similarity with those of the members of the genus Chryseobacterium. Based on chemotaxonomic and phenotypic traits, and DNA-DNA hybridizations with the type strains of the most closely related species, the isolate is proposed to represent a novel species, Chryseobacterium hispalense, type strain AG13(T) ( = DSM 25574(T) = CCUG 63019(T)). Emended descriptions of the species Chryseobacterium defluvii, Chryseobacterium indologenes, Chryseobacterium wanjuense and Chryseobacterium gregarium are also provided.


International Journal of Systematic and Evolutionary Microbiology | 2014

Taxonomic use of DNA G+C content and DNA–DNA hybridization in the genomic age

Jan P. Meier-Kolthoff; Hans-Peter Klenk; Markus Göker

The G+C content of a genome is frequently used in taxonomic descriptions of species and genera. In the past it has been determined using conventional, indirect methods, but it is nowadays reasonable to calculate the DNA G+C content directly from the increasingly available and affordable genome sequences. The expected increase in accuracy, however, might alter the way in which the G+C content is used for drawing taxonomic conclusions. We here re-estimate the literature assumption that the G+C content can vary up to 3-5 % within species using genomic datasets. The resulting G+C content differences are compared with DNA-DNA hybridization (DDH) similarities calculated in silico using the GGDC web server, with 70% similarity as the gold standard threshold for species boundaries. The results indicate that the G+C content, if computed from genome sequences, varies no more than 1% within species. Statistical models based on larger differences alone can reject the hypothesis that two strains belong to the same species. Because DDH similarities between two non-type strains occur in the genomic datasets, we also examine to what extent and under which conditions such a similarity could be <70% even though the similarity of either strain to a type strain was ≥ 70%. In theory, their similarity could be as low as 50%, whereas empirical data suggest a boundary closer (but not identical) to 70%. However, it is shown that using a 50% boundary would not affect the conclusions regarding the DNA G+C content. Hence, we suggest that discrepancies between G+C content data provided in species descriptions on the one hand and those recalculated after genome sequencing on the other hand ≥ 1% are due to significant inaccuracies of the applied conventional methods and accordingly call for emendations of species descriptions.
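
Calculating the G+C content directly from a genome sequence amounts to simple counting, which the following minimal sketch illustrates. The function names and toy sequences are hypothetical, ambiguous bases such as N are simply ignored (an assumption, not something the abstract specifies), and the 1% cutoff only echoes the within-species range reported above; it does not reproduce the statistical models used in the paper.

```python
def gc_content(sequence):
    """Return the G+C content of a DNA sequence in percent.

    Only unambiguous bases (A, C, G, T) are counted; anything else,
    such as N, is ignored. This handling is an assumption of the sketch.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


def gc_difference(seq_a, seq_b):
    """Absolute difference in G+C content (percentage points)."""
    return abs(gc_content(seq_a) - gc_content(seq_b))


def exceeds_within_species_range(seq_a, seq_b, max_difference=1.0):
    """True if the G+C difference is larger than the assumed 1% range."""
    return gc_difference(seq_a, seq_b) > max_difference


# Toy sequences; real genomes would be read from FASTA files.
strain_a = "ATGCGCGCTTAACCGG"
strain_b = "ATGCGCGCTTAACCGA"
print(round(gc_content(strain_a), 2), round(gc_content(strain_b), 2))
print(exceeds_within_species_range(strain_a, strain_b))
```

For a genome with several replicons or contigs, one simple choice would be to concatenate all sequences before counting, so that each base contributes equally.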


PLOS Biology | 2014

Genomic Encyclopedia of Bacteria and Archaea: Sequencing a Myriad of Type Strains

Nikos C. Kyrpides; Philip Hugenholtz; Jonathan A. Eisen; Tanja Woyke; Markus Göker; Charles Thomas Parker; Rudolf Amann; Brian Beck; Patrick Chain; Jongsik Chun; Rita R. Colwell; Antoine Danchin; Peter Dawyndt; Tom Dedeurwaerdere; Edward F. DeLong; John C. Detter; Paul De Vos; Timothy J. Donohue; Xiu Zhu Dong; Dusko S. Ehrlich; Claire M. Fraser; Richard A. Gibbs; Jack A. Gilbert; Paul Gilna; Frank Oliver Glöckner; Janet K. Jansson; Jay D. Keasling; Rob Knight; David P. Labeda; Alla Lapidus

This manuscript calls for an international effort to generate a comprehensive catalog from genome sequences of all the archaeal and bacterial type strains.


PLOS ONE | 2012

Visualization and Curve-Parameter Estimation Strategies for Efficient Exploration of Phenotype Microarray Kinetics

Lea A. I. Vaas; Johannes Sikorski; Victoria Michael; Markus Göker; Hans-Peter Klenk

Background: The Phenotype MicroArray (OmniLog® PM) system is able to simultaneously capture a large number of phenotypes by recording an organism's respiration over time on distinct substrates. This technique targets the object of natural selection itself, the phenotype, whereas previously addressed ‘-omics’ techniques merely study components that finally contribute to it. The recording of respiration over time, however, adds a longitudinal dimension to the data. To optimally exploit this information, it must be extracted from the shapes of the recorded curves and displayed in analogy to conventional growth curves.
Methodology: The free software environment R was explored for both visualizing and fitting of PM respiration curves. Approaches using either a model fit (and commonly applied growth models) or a smoothing spline were evaluated. Their reliability in inferring curve parameters and confidence intervals was compared to the native OmniLog® PM analysis software. We consider the post-processing of the estimated parameters, the optimal classification of curve shapes and the detection of significant differences between them, as well as practically relevant questions such as detecting the impact of cultivation times and the minimum required number of experimental repeats.
Conclusions: We provide a comprehensive framework for data visualization and parameter estimation according to user choices. A flexible graphical representation strategy for displaying the results is proposed, including 95% confidence intervals for the estimated parameters. The spline approach is less prone to irregular curve shapes than fitting any of the considered models or using the native PM software for calculating both point estimates and confidence intervals. These can serve as a starting point for the automated post-processing of PM data, providing much more information than the strict dichotomization into positive and negative reactions. Our results form the basis for a freely available R package for the analysis of PM data.


Standards in Genomic Sciences | 2009

Complete genome sequence of Kytococcus sedentarius type strain (541T)

David Sims; Thomas Brettin; John C. Detter; Cliff Han; Alla Lapidus; Alex Copeland; Tijana Glavina del Rio; Matt Nolan; Feng Chen; Susan Lucas; Hope Tice; Jan-Fang Cheng; David Bruce; Lynne Goodwin; Sam Pitluck; Galina Ovchinnikova; Amrita Pati; Natalia Ivanova; Konstantinos Mavromatis; Amy Chen; Krishna Palaniappan; Patrik D’haeseleer; Patrick Chain; Jim Bristow; Jonathan A. Eisen; Victor Markowitz; Philip Hugenholtz; Susanne Schneider; Markus Göker; Rüdiger Pukall

Kytococcus sedentarius (ZoBell and Upham 1944) Stackebrandt et al. 1995 is the type strain of the species, and is of phylogenetic interest because of its location in the Dermacoccaceae, a poorly studied family within the actinobacterial suborder Micrococcineae. K. sedentarius is known for the production of oligoketide antibiotics as well as for its role as an opportunistic pathogen causing valve endocarditis, hemorrhagic pneumonia, and pitted keratolysis. It is strictly aerobic and can only grow when several amino acids are provided in the medium. The strain described in this report is a free-living, nonmotile, Gram-positive bacterium, originally isolated from a marine environment. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the family Dermacoccaceae and the 2,785,024 bp long single replicon genome with its 2639 protein-coding and 64 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.


Standards in Genomic Sciences | 2014

Complete genome sequence of DSM 30083T, the type strain (U5/41T) of Escherichia coli, and a proposal for delineating subspecies in microbial taxonomy

Jan P. Meier-Kolthoff; Richard L. Hahnke; Jörn Petersen; Carmen Scheuner; Victoria Michael; Anne Fiebig; Christine Rohde; Manfred Rohde; Berthold Fartmann; Lynne Goodwin; Olga Chertkov; T. B. K. Reddy; Amrita Pati; Natalia Ivanova; Victor Markowitz; Nikos C. Kyrpides; Tanja Woyke; Markus Göker; Hans-Peter Klenk

Although Escherichia coli is the most widely studied bacterial model organism and often considered to be the model bacterium per se, its type strain had until now been overlooked in microbial genomics. As a part of the Genomic Encyclopedia of Bacteria and Archaea project, we here describe the features of E. coli DSM 30083T together with its genome sequence and annotation as well as novel aspects of its phenotype. The 5,038,133 bp genome sequence includes 4,762 protein-coding genes and 175 RNA genes as well as a single plasmid. Affiliation of a set of 250 genome-sequenced E. coli strains, Shigella and outgroup strains to the type strain of E. coli was investigated using digital DNA:DNA-hybridization (dDDH) similarities and differences in genomic G+C content. As in the majority of previous studies, results show Shigella spp. embedded within E. coli and in most cases forming a single subgroup of it. Phylogenomic trees also recover the proposed E. coli phylotypes as monophyla with minor exceptions and place DSM 30083T in phylotype B2 with E. coli S88 as its closest neighbor. The widely used lab strain K-12 is not only genomically but also physiologically strongly different from the type strain. However, the phylotypes do not express a uniform level of character divergence as measured using dDDH; thus, an alternative arrangement is proposed and discussed in the context of bacterial subspecies. Analyses of the genome sequences of a large number of E. coli strains and of strains from > 100 other bacterial genera indicate a value of 79-80% dDDH as the most promising threshold for delineating subspecies, which in turn suggests the presence of five subspecies within E. coli.
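
How the dDDH thresholds mentioned in these abstracts could be applied to a single pairwise value is sketched below. This is a hedged illustration only: the function name `classify_pair` is hypothetical, the subspecies cutoff is set to 79% as one assumed choice within the reported 79-80% range, and the sketch ignores the confidence intervals that should accompany any real dDDH estimate.

```python
def classify_pair(dddh_percent, species_threshold=70.0, subspecies_threshold=79.0):
    """Classify a strain pair from a single dDDH similarity (in percent).

    Values at or above the species threshold are treated as the same
    species; values at or above the subspecies threshold as the same
    subspecies. Both defaults are assumptions taken from the thresholds
    quoted in the abstracts, not from any reference implementation.
    """
    if not 0.0 <= dddh_percent <= 100.0:
        raise ValueError("dDDH similarity must lie between 0 and 100")
    if dddh_percent < species_threshold:
        return "different species"
    if dddh_percent < subspecies_threshold:
        return "same species, different subspecies"
    return "same species, same subspecies"


# Made-up dDDH values for three hypothetical strain pairs.
for value in (45.3, 74.8, 92.1):
    print(value, "->", classify_pair(value))
```

A pair whose confidence interval straddles a threshold would need closer inspection rather than a hard label.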

Collaboration


Top Co-Authors

Lynne Goodwin
Los Alamos National Laboratory

Amrita Pati
Joint Genome Institute

Sam Pitluck
Joint Genome Institute

Amy Chen
Joint Genome Institute

Manfred Rohde
Lawrence Livermore National Laboratory

Susan Lucas
Joint Genome Institute

Krishna Palaniappan
Lawrence Berkeley National Laboratory

Alla Lapidus
Saint Petersburg State University

Matt Nolan
Joint Genome Institute