Steffen Schober | Researchain

Publication

Featured research published by Steffen Schober.

BMC Genomics | 2014

Comparison of strand-specific transcriptomes of enterohemorrhagic Escherichia coli O157:H7 EDL933 (EHEC) under eleven different environmental conditions including radish sprouts and cattle feces

Richard Landstorfer; Svenja Simon; Steffen Schober; Daniel A. Keim; Siegfried Scherer; Klaus Neuhaus

Background: Multiple infection sources for enterohemorrhagic Escherichia coli O157:H7 (EHEC) are known, including animal products, fruit and vegetables. The ecology of this pathogen outside its human host is largely unknown and one third of its annotated genes are still hypothetical. To identify genetic determinants expressed under a variety of environmental factors, we applied strand-specific RNA-sequencing, comparing the SOLiD and Illumina systems.

Results: Transcriptomes of EHEC were sequenced under 11 different biotic and abiotic conditions: LB medium at pH4, pH7, pH9, or at 15°C; LB with nitrite or trimethoprim-sulfamethoxazole; LB-agar surface, M9 minimal medium, spinach leaf juice, surface of living radish sprouts, and cattle feces. Of 5379 annotated genes in strain EDL933 (genome and plasmid), a surprising minority of only 144 had null sequencing reads under all conditions. We therefore developed a statistical method to distinguish weakly transcribed genes from background transcription. We find that 96% of all genes and 91.5% of the hypothetical genes exhibit a significant transcriptional signal under at least one condition. Comparing SOLiD and Illumina systems, we find a high correlation between both approaches for fold-changes of the induced or repressed genes. The pathogenicity island LEE showed highest transcriptional activity in LB medium, minimal medium, and after treatment with antibiotics. Unique sets of genes, including many hypothetical genes, are highly up-regulated on radish sprouts, cattle feces, or in the presence of antibiotics. Furthermore, we observed induction of the shiga-toxin carrying phages by antibiotics and confirmed active biofilm related genes on radish sprouts, in cattle feces, and on agar plates.

Conclusions: Since only a minority of genes (2.7%) were not active under any condition tested (null reads), we suggest that the assumption of significant genome over-annotations is wrong. Environmental transcriptomics uncovered hitherto unknown gene functions and unique regulatory patterns in EHEC. For instance, the environmental function of azoR had been elusive, but this gene is highly active on radish sprouts. Thus, NGS-transcriptomics is an appropriate technique to propose new roles of hypothetical genes and to guide future research.

IEEE Transactions on Information Theory | 2014

Canalizing Boolean Functions Maximize Mutual Information

Johannes Georg Klotz; David Kracht; Martin Bossert; Steffen Schober

Information processing in biologically motivated Boolean networks is of interest in recent information theoretic research. One measure to quantify this ability is the well-known mutual information. Using Fourier analysis, we show that canalizing functions maximize mutual information between a single input variable and the outcome of a function with fixed expectation. A similar result can be obtained for the mutual information between a set of input variables and the output. Further, if the expectation of the function is not fixed, we obtain that the mutual information is maximized by a function only dependent on this single variable, i.e., the dictatorship function. We prove our findings for Boolean functions with uniformly distributed as well as product distributed input variables.
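The two notions in this abstract can be made concrete for small functions. The following is a minimal brute-force sketch, not the Fourier-analytic argument of the paper: it enumerates the truth table, checks whether a function is canalizing, and computes the mutual information between one input and the output, assuming uniformly distributed inputs. The function names and the example functions are illustrative assumptions.

```python
from itertools import product
from math import log2


def truth_table(f, n):
    """Map every n-bit input tuple to the output of f."""
    return {x: f(x) for x in product((0, 1), repeat=n)}


def is_canalizing(f, n):
    """True if fixing some input i to some value a forces a constant output."""
    table = truth_table(f, n)
    for i in range(n):
        for a in (0, 1):
            outputs = {y for x, y in table.items() if x[i] == a}
            if len(outputs) == 1:
                return True
    return False


def mutual_information(f, n, i):
    """I(X_i; f(X)) in bits, assuming uniformly distributed inputs."""
    table = truth_table(f, n)
    weight = 1 / len(table)
    joint = {}
    for x, y in table.items():
        joint[(x[i], y)] = joint.get((x[i], y), 0.0) + weight
    px = {a: sum(p for (xa, _), p in joint.items() if xa == a) for a in (0, 1)}
    py = {b: sum(p for (_, yb), p in joint.items() if yb == b) for b in (0, 1)}
    return sum(p * log2(p / (px[a] * py[b])) for (a, b), p in joint.items() if p > 0)


# Illustrative two-input functions (assumptions, not taken from the paper).
and_fn = lambda x: x[0] & x[1]   # canalizing: x[0] = 0 forces output 0
xor_fn = lambda x: x[0] ^ x[1]   # not canalizing
dictator = lambda x: x[0]        # depends on the first input only

print(is_canalizing(and_fn, 2), is_canalizing(xor_fn, 2))
print(mutual_information(dictator, 2, 0))  # 1.0 bit
print(mutual_information(xor_fn, 2, 0))    # 0.0 bits
```

The dictatorship function reaches the full one bit, while XOR, which is not canalizing, carries no information about any single input.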

PLOS ONE | 2013

Short Barcodes for Next Generation Sequencing

Katharina Mir; Klaus Neuhaus; Martin Bossert; Steffen Schober

We consider the design and evaluation of short barcodes, with a length between six and eight nucleotides, used for parallel sequencing on platforms where substitution errors dominate. Such codes should not only have good error correction properties; their code words should also fulfil certain biological constraints (experimental parameters). We compare published barcodes with codes obtained by two new construction methods, one based on the currently best known linear codes and the other a simple randomized construction. The evaluation is done with respect to the error correction capabilities, the barcode size, the experimental parameters, and fundamental bounds on the code size and distance properties. We provide a list of codes for lengths between six and eight nucleotides, where for length eight, two substitution errors can be corrected. In fact, no code with larger minimum distance can exist.
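How error correction capability follows from a barcode set can be sketched in a few lines. A code with minimum Hamming distance d corrects up to floor((d - 1) / 2) substitution errors, so correcting two substitutions requires d of at least 5. The barcode set below is a toy example invented for illustration, not one of the published codes, and GC content is only assumed here as one plausible experimental parameter.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(codes):
    """Smallest pairwise Hamming distance in the code."""
    return min(hamming(a, b) for a, b in combinations(codes, 2))


def correctable_substitutions(codes):
    """Guaranteed number of correctable substitutions: floor((d - 1) / 2)."""
    return (min_distance(codes) - 1) // 2


def gc_content(word):
    """Fraction of G and C nucleotides (assumed biological constraint)."""
    return sum(c in "GC" for c in word) / len(word)


# Toy set, hypothetical: long homopolymer runs like these would likely
# violate real experimental constraints.
toy_codes = ["AAAAAA", "AAACCC", "CCCAAA", "CCCCCC"]

print(min_distance(toy_codes))               # 3
print(correctable_substitutions(toy_codes))  # 1
print([gc_content(w) for w in toy_codes])
```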

Eurasip Journal on Bioinformatics and Systems Biology | 2013

Properties of Boolean networks and methods for their tests

Johannes Georg Klotz; Ronny Feuer; Oliver Sawodny; Martin Bossert; Michael Ederer; Steffen Schober

Transcriptional regulation networks are often modeled as Boolean networks. We discuss certain properties of Boolean functions (BFs) that are considered important in such networks, namely, membership in the classes of unate or canalizing functions. Of further interest is the average sensitivity (AS) of functions. In this article, we discuss several algorithms to test the properties of interest. To test canalizing properties of functions, we apply spectral techniques, which can also be used to characterize the AS of functions as well as the influences of variables in unate BFs. Further, we provide and review upper and lower bounds on the AS of unate BFs based on the spectral representation. Finally, we apply these methods to a transcriptional regulation network of Escherichia coli, which controls central parts of the E. coli metabolism. We find that all functions are unate. In addition, the analysis of the AS of the network reveals an exceptional robustness against transient fluctuations of the binary variables.
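For small functions, the properties named above can be checked by exhaustive enumeration. The sketch below is a brute-force stand-in for the spectral techniques discussed in the article, not a reimplementation of them. It assumes uniformly distributed inputs: the influence of variable i is the probability that flipping it changes the output, and the average sensitivity is the sum of the influences. A function is unate if it is monotone (increasing or decreasing) in each variable. All names are illustrative.

```python
from itertools import product


def influence(f, n, i):
    """Probability, over uniform inputs, that flipping input i changes f."""
    flips = 0
    for x in product((0, 1), repeat=n):
        y = x[:i] + (1 - x[i],) + x[i + 1:]
        flips += f(x) != f(y)
    return flips / 2 ** n


def average_sensitivity(f, n):
    """Sum of the influences of all n variables."""
    return sum(influence(f, n, i) for i in range(n))


def is_unate(f, n):
    """True if f is monotone increasing or decreasing in every variable."""
    for i in range(n):
        increases = decreases = False
        for x in product((0, 1), repeat=n):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]
                if f(y) > f(x):
                    increases = True
                elif f(y) < f(x):
                    decreases = True
        if increases and decreases:
            return False
    return True


# Illustrative functions (assumptions, not from the analyzed network).
and_fn = lambda x: x[0] & x[1]
xor_fn = lambda x: x[0] ^ x[1]

print(average_sensitivity(and_fn, 2), is_unate(and_fn, 2))  # 1.0 True
print(average_sensitivity(xor_fn, 2), is_unate(xor_fn, 2))  # 2.0 False
```

A low average sensitivity means that a random single-bit flip rarely changes the output, which is the sense of robustness against transient fluctuations used in the abstract.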

PLOS ONE | 2012

Predicting statistical properties of open reading frames in bacterial genomes.

Katharina Mir; Klaus Neuhaus; Siegfried Scherer; Martin Bossert; Steffen Schober

An analytical model based on the statistical properties of Open Reading Frames (ORFs) of eubacterial genomes such as codon composition and sequence length of all reading frames was developed. This new model predicts the average length, maximum length as well as the length distribution of the ORFs of 70 species with GC contents varying between 21% and 74%. Furthermore, the number of annotated genes is predicted with high accordance. However, the ORF length distribution in the five alternative reading frames shows interesting deviations from the predicted distribution. In particular, long ORFs appear more often than expected statistically. The unexpected depletion of stop codons in these alternative open reading frames cannot completely be explained by a biased codon usage in the +1 frame. While it is unknown if the stop codon depletion has a biological function, it could be due to a protein coding capacity of alternative ORFs exerting a selection pressure which prevents the fixation of stop codon mutations. The comparison of the analytical model with bacterial genomes, therefore, leads to a hypothesis suggesting novel gene candidates which can now be investigated in subsequent wet lab experiments.
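To see why GC content matters for ORF length, consider a deliberately simplified null model. This is an assumption for illustration and not the analytical model of the paper, which works from codon composition. If nucleotides are drawn independently with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2, the probability that a random codon is a stop codon (TAA, TAG, TGA) follows directly, and the number of codons up to and including the first stop is geometrically distributed.

```python
def stop_codon_probability(gc):
    """P(random codon is TAA, TAG or TGA) under independent nucleotides."""
    if not 0.0 <= gc < 1.0:
        raise ValueError("gc must be in [0, 1)")
    at = (1.0 - gc) / 2.0   # P(A) = P(T)
    g = gc / 2.0            # P(G) = P(C)
    # TAA contributes at**3; TAG and TGA contribute at**2 * g each.
    return at ** 3 + 2.0 * at ** 2 * g


def expected_orf_codons(gc):
    """Mean number of codons up to and including the first stop codon."""
    return 1.0 / stop_codon_probability(gc)


# The GC range quoted in the abstract, plus the balanced case.
for gc in (0.21, 0.50, 0.74):
    print(f"GC = {gc:.2f}: expected length = {expected_orf_codons(gc):.1f} codons")
```

Under this toy model, stop codons become rarer as GC content rises, so random ORFs get longer. Deviations of real genomes from such a baseline are the kind of signal the abstract describes for the alternative reading frames.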

PLOS ONE | 2014

Selection pressure in alternative reading frames.

Katharina Mir; Steffen Schober

Overlapping genes are two protein-coding sequences sharing a significant part of the same DNA locus in different reading frames. Although an increasing number of examples have recently been found in bacteria, the underlying mechanisms of their evolution are unknown. In this work we explore how selective pressure in a protein-coding sequence influences its overlapping genes in alternative reading frames. We model evolution using a time-continuous Markov process and derive the corresponding model for the remaining frames to quantify selection pressure and genetic noise. Our findings lead to the presumption that, once information is embedded in the reverse reading frame −2 (relative to the mother gene in +1), purifying selection in the protein-coding reading frame automatically protects the sequences in both frames. We also found that this coincides with the fact that the genetic noise, measured using the conditional entropy, is minimal in frame −2 under selection in the coding frame.

BMC Genomics | 2017

Differentiation of ncRNAs from small mRNAs in Escherichia coli O157:H7 EDL933 (EHEC) by combined RNAseq and RIBOseq – ryhB encodes the regulatory RNA RyhB and a peptide, RyhP

Klaus Neuhaus; Richard Landstorfer; Svenja Simon; Steffen Schober; Patrick R. Wright; Cameron Smith; Rolf Backofen; Romy Wecko; Daniel A. Keim; Siegfried Scherer

Background: While NGS allows rapid global detection of transcripts, it remains difficult to distinguish ncRNAs from short mRNAs. To detect potentially translated RNAs, we developed an improved protocol for bacterial ribosomal footprinting (RIBOseq). This allowed distinguishing ncRNA from mRNA in EHEC. A high ratio of ribosomal footprints per transcript (ribosomal coverage value, RCV) is expected to indicate a translated RNA, while a low RCV should point to a non-translated RNA.

Results: Based on their low RCV, 150 novel non-translated EHEC transcripts were identified as putative ncRNAs, representing both antisense and intergenic transcripts, 74 of which had expressed homologs in E. coli MG1655. Bioinformatics analysis predicted statistically significant target regulons for 15 of the intergenic transcripts; experimental analysis revealed 4-fold or higher differential expression of 46 novel ncRNAs in different growth media. Out of 329 annotated EHEC ncRNAs, 52 showed an RCV similar to protein-coding genes. Of those, 16 had RIBOseq patterns matching annotated genes in other Enterobacteriaceae, and 11 seem to possess a Shine-Dalgarno sequence, suggesting that such ncRNAs may encode small proteins instead of being solely non-coding. To support that the RIBOseq signals reflect translation, we tested the ribosomal-footprint covered ORF of ryhB and found a phenotype for the encoded peptide under iron-limiting conditions.

Conclusion: Determination of the RCV is a useful approach for a rapid first-step differentiation between bacterial ncRNAs and small mRNAs. Further, many known ncRNAs may encode proteins as well.
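The RCV idea, footprints per transcript, amounts to a per-gene ratio. The sketch below assumes that both read counts have already been normalized comparably (for example by library size and gene length). The threshold value and the example numbers are purely hypothetical and are not taken from the study.

```python
def rcv(ribo_reads, rna_reads):
    """Ribosomal coverage value: footprint reads per transcript read."""
    if rna_reads <= 0:
        raise ValueError("transcript read count must be positive")
    return ribo_reads / rna_reads


def classify(ribo_reads, rna_reads, threshold):
    """Label a transcript by comparing its RCV to a chosen cutoff."""
    if rcv(ribo_reads, rna_reads) >= threshold:
        return "putatively translated"
    return "putative ncRNA"


# Hypothetical normalized counts and a hypothetical cutoff of 0.5.
HYPOTHETICAL_THRESHOLD = 0.5
print(classify(120.0, 100.0, HYPOTHETICAL_THRESHOLD))  # putatively translated
print(classify(2.0, 100.0, HYPOTHETICAL_THRESHOLD))    # putative ncRNA
```

In practice a cutoff would have to be calibrated, for instance against the RCV distribution of annotated protein-coding genes, which the abstract uses as its point of comparison.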

BMC Bioinformatics | 2015

Insertion and deletion correcting DNA barcodes based on watermarks

David Kracht; Steffen Schober

Background: Barcode multiplexing is a key strategy for sharing the rising capacity of next-generation sequencing devices: synthetic DNA tags, called barcodes, are attached to natural DNA fragments within the library preparation procedure. Different libraries can be individually labeled with barcodes for a joint sequencing procedure. A post-processing step is needed to sort the sequencing data according to their origin, utilizing these DNA labels. The final separation step is called demultiplexing and is mainly determined by the characteristics of the DNA code words used as labels.

Currently, there are two different strategies for barcoding: one is based on the Hamming distance, the other uses the edit metric to measure distances between code words. The theory of channel coding provides well-known code constructions for the Hamming metric. They provide a large number of code words with variable lengths and maximal correction capability regarding substitution errors. However, some sequencing platforms are known to have exceptionally high numbers of insertion or deletion errors. Barcodes based on the edit distance can take insertion and deletion errors into account in the decoding process. Unfortunately, no explicit code construction is known that gives optimal codes for the edit metric.

Results: In the present work we focus on an entirely different perspective to obtain DNA barcodes. We consider a concatenated code construction, producing so-called watermark codes, which were first proposed by Davey and Mackay to communicate via binary channels with synchronization errors. We adapt and extend the concepts of watermark codes to use them for DNA sequencing. Moreover, we provide an exemplary set of barcodes that are experimentally compatible with common next-generation sequencing platforms. Finally, a realistic simulation scenario is used to evaluate the proposed codes and to show that the watermark concept is suitable for DNA sequencing applications.

Conclusion: Our adaptation of watermark codes enables the construction of barcodes that are capable of correcting substitution, insertion, and deletion errors. The presented approach has the advantage of not needing any markers or technical sequences to recover the position of the barcode in the sequencing reads, which poses a significant restriction with other approaches.
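The difference between the two metrics mentioned in the background is easy to demonstrate. The sketch below is not the watermark construction itself. It only compares the Hamming distance with the edit (Levenshtein) distance, computed by the standard dynamic program, on a hypothetical barcode whose read lost its first base and picked up a trailing base from the adjacent sequence.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length words."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Hypothetical barcode and a read shifted by one deletion at the front.
barcode = "ACGTACGT"
shifted = "CGTACGTA"

print(hamming(barcode, shifted))        # 8: every position mismatches
print(edit_distance(barcode, shifted))  # 2: one deletion plus one insertion
```

A single synchronization error makes the read look maximally distant in the Hamming metric, which is why substitution-oriented codes struggle on platforms with frequent insertions and deletions.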

Eurasip Journal on Bioinformatics and Systems Biology | 2013

Harmonic analysis of Boolean networks: determinative power and perturbations

Reinhard Heckel; Steffen Schober; Martin Bossert

Consider a large Boolean network with a feed forward structure. Given a probability distribution on the inputs, can one find, possibly small, collections of input nodes that determine the states of most other nodes in the network? To answer this question, a notion that quantifies the determinative power of an input over the states of the nodes in the network is needed. We argue that the mutual information (MI) between a given subset of the inputs X={X1,...,Xn} of some node i and its associated function fi(X) quantifies the determinative power of this set of inputs over node i. We compare the determinative power of a set of inputs to the sensitivity to perturbations to these inputs, and find that, maybe surprisingly, an input that has large sensitivity to perturbations does not necessarily have large determinative power. However, for unate functions, which play an important role in genetic regulatory networks, we find a direct relation between MI and sensitivity to perturbations. As an application of our results, we analyze the large-scale regulatory network of Escherichia coli. We identify the most determinative nodes and show that a small subset of those reduces the overall uncertainty of the network state significantly. Furthermore, the network is found to be tolerant to perturbations of its inputs.

BMC Evolutionary Biology | 2015

Evidence for the recent origin of a bacterial protein-coding, overlapping orphan gene by evolutionary overprinting

Lea Fellner; Svenja Simon; Christian Scherling; Michael Witting; Steffen Schober; Christine Polte; Philippe Schmitt-Kopplin; Daniel A. Keim; Siegfried Scherer; Klaus Neuhaus

Background: Gene duplication is believed to be the classical way to form novel genes, but overprinting may be an important alternative. Overprinting allows entirely novel proteins to evolve de novo, i.e., formerly non-coding open reading frames within functional genes become expressed. Only three cases have been described for Escherichia coli. Here, a fourth example is presented.

Results: RNA sequencing revealed an open reading frame weakly transcribed in cow dung, coding for 101 residues and embedded completely in the −2 reading frame of citC in enterohemorrhagic E. coli. This gene is designated novel overlapping gene, nog1. The promoter region fused to gfp exhibits specific activities and 5’ rapid amplification of cDNA ends indicated the transcriptional start 40-bp upstream of the start codon. nog1 was strand-specifically arrested in translation by a nonsense mutation silent in citC. This Nog1-mutant showed a phenotype in competitive growth against wild type in the presence of MgCl2. Small differences in metabolite concentrations were also found. Bioinformatic analyses propose Nog1 to be inner membrane-bound and to possess at least one membrane-spanning domain. A phylogenetic analysis suggests that the orphan gene nog1 arose by overprinting after Escherichia/Shigella separated from the other γ-proteobacteria.

Conclusions: Since nog1 is of recent origin, non-essential, short, weakly expressed and only marginally involved in E. coli’s central metabolism, we propose that this gene is in an initial stage of evolution. While we present specific experimental evidence for the existence of a fourth overlapping gene in enterohemorrhagic E. coli, we believe that this may be an initial finding only and overlapping genes in bacteria may be more common than is currently assumed by microbiologists.