Jonas Behr
Max Planck Society
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Jonas Behr.
Nature | 2011
Xiangchao Gan; Oliver Stegle; Jonas Behr; Joshua G. Steffen; Philipp Drewe; Katie L. Hildebrand; Rune Lyngsoe; Sebastian J. Schultheiss; Edward J. Osborne; Vipin T. Sreedharan; André Kahles; Regina Bohnert; Géraldine Jean; Paul S. Derwent; Paul J. Kersey; Eric J. Belfield; Nicholas P. Harberd; Eric Kemen; Christopher Toomajian; Paula X. Kover; Richard M. Clark; Gunnar Rätsch; Richard Mott
Genetic differences between Arabidopsis thaliana accessions underlie the plant’s extensive phenotypic variation, and until now these have been interpreted largely in the context of the annotated reference accession Col-0. Here we report the sequencing, assembly and annotation of the genomes of 18 natural A. thaliana accessions, and their transcriptomes. When assessed on the basis of the reference annotation, one-third of protein-coding genes are predicted to be disrupted in at least one accession. However, re-annotation of each genome revealed that alternative gene models often restore coding potential. Gene expression in seedlings differed for nearly half of expressed genes and was frequently associated with cis variants within 5 kilobases, as were intron retention alternative splicing events. Sequence and expression variation is most pronounced in genes that respond to the biotic environment. Our data further promote evolutionary and functional studies in A. thaliana, especially the MAGIC genetic reference population descended from these accessions.
BMC Bioinformatics | 2007
Sören Sonnenburg; Gabriele Schweikert; Petra Philips; Jonas Behr; Gunnar Rätsch
BackgroundFor splice site recognition, one has to solve two classification problems: discriminating true from decoy splice sites for both acceptor and donor sites. Gene finding systems typically rely on Markov Chains to solve these tasks.ResultsIn this work we consider Support Vector Machines for splice site recognition. We employ the so-called weighted degree kernel which turns out well suited for this task, as we will illustrate in several experiments where we compare its prediction accuracy with that of recently proposed systems. We apply our method to the genome-wide recognition of splice sites in Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, and Homo sapiens. Our performance estimates indicate that splice sites can be recognized very accurately in these genomes and that our method outperforms many other methods including Markov Chains, GeneSplicer and SpliceMachine. We provide genome-wide predictions of splice sites and a stand-alone prediction tool ready to be used for incorporation in a gene finder.AvailabilityData, splits, additional information on the model selection, the whole genome predictions, as well as the stand-alone prediction tool are available for download at http://www.fml.mpg.de/raetsch/projects/splice.
The Plant Cell | 2013
Gabriele Drechsel; André Kahles; Anil K. Kesarwani; Eva Stauffer; Jonas Behr; Philipp Drewe; Gunnar Rätsch; Andreas Wachter
Alternative precursor mRNA splicing represents a major mechanism to increase transcriptome diversity in higher eukaryotes. This work demonstrates that a substantial fraction of the alternative splicing variants from Arabidopsis thaliana is subject to turn over via the RNA quality-control mechanism nonsense-mediated decay, having important implications for the regulation of gene expression. The nonsense-mediated decay (NMD) surveillance pathway can recognize erroneous transcripts and physiological mRNAs, such as precursor mRNA alternative splicing (AS) variants. Currently, information on the global extent of coupled AS and NMD remains scarce and even absent for any plant species. To address this, we conducted transcriptome-wide splicing studies using Arabidopsis thaliana mutants in the NMD factor homologs UP FRAMESHIFT1 (UPF1) and UPF3 as well as wild-type samples treated with the translation inhibitor cycloheximide. Our analyses revealed that at least 17.4% of all multi-exon, protein-coding genes produce splicing variants that are targeted by NMD. Moreover, we provide evidence that UPF1 and UPF3 act in a translation-independent mRNA decay pathway. Importantly, 92.3% of the NMD-responsive mRNAs exhibit classical NMD-eliciting features, supporting their authenticity as direct targets. Genes generating NMD-sensitive AS variants function in diverse biological processes, including signaling and protein modification, for which NaCl stress–modulated AS-NMD was found. Besides mRNAs, numerous noncoding RNAs and transcripts derived from intergenic regions were shown to be NMD responsive. In summary, we provide evidence for a major function of AS-coupled NMD in shaping the Arabidopsis transcriptome, having fundamental implications in gene regulation and quality control of transcript processing.
Genome Research | 2009
Gabriele Schweikert; Alexander Zien; Georg Zeller; Jonas Behr; Christoph Dieterich; Cheng Soon Ong; Petra Philips; Fabio De Bona; Lisa Hartmann; Anja Bohlen; Nina Krüger; Sören Sonnenburg; Gunnar Rätsch
We present a highly accurate gene-prediction system for eukaryotic genomes, called mGene. It combines in an unprecedented manner the flexibility of generalized hidden Markov models (gHMMs) with the predictive power of modern machine learning methods, such as Support Vector Machines (SVMs). Its excellent performance was proved in an objective competition based on the genome of the nematode Caenorhabditis elegans. Considering the average of sensitivity and specificity, the developmental version of mGene exhibited the best prediction performance on nucleotide, exon, and transcript level for ab initio and multiple-genome gene-prediction tasks. The fully developed version shows superior performance in 10 out of 12 evaluation criteria compared with the other participating gene finders, including Fgenesh++ and Augustus. An in-depth analysis of mGenes genome-wide predictions revealed that approximately 2200 predicted genes were not contained in the current genome annotation. Testing a subset of 57 of these genes by RT-PCR and sequencing, we confirmed expression for 24 (42%) of them. mGene missed 300 annotated genes, out of which 205 were unconfirmed. RT-PCR testing of 24 of these genes resulted in a success rate of merely 8%. These findings suggest that even the gene catalog of a well-studied organism such as C. elegans can be substantially improved by mGenes predictions. We also provide gene predictions for the four nematodes C. briggsae, C. brenneri, C. japonica, and C. remanei. Comparing the resulting proteomes among these organisms and to the known protein universe, we identified many species-specific gene inventions. In a quality assessment of several available annotations for these genomes, we find that mGenes predictions are most accurate.
Bioinformatics | 2013
Jonas Behr; André Kahles; Yi Zhong; Vipin T. Sreedharan; Philipp Drewe; Gunnar Rätsch
Motivation: High-throughput sequencing of mRNA (RNA-Seq) has led to tremendous improvements in the detection of expressed genes and reconstruction of RNA transcripts. However, the extensive dynamic range of gene expression, technical limitations and biases, as well as the observed complexity of the transcriptional landscape, pose profound computational challenges for transcriptome reconstruction. Results: We present the novel framework MITIE (Mixed Integer Transcript IdEntification) for simultaneous transcript reconstruction and quantification. We define a likelihood function based on the negative binomial distribution, use a regularization approach to select a few transcripts collectively explaining the observed read data and show how to find the optimal solution using Mixed Integer Programming. MITIE can (i) take advantage of known transcripts, (ii) reconstruct and quantify transcripts simultaneously in multiple samples, and (iii) resolve the location of multi-mapping reads. It is designed for genome- and assembly-based transcriptome reconstruction. We present an extensive study based on realistic simulated RNA-Seq data. When compared with state-of-the-art approaches, MITIE proves to be significantly more sensitive and overall more accurate. Moreover, MITIE yields substantial performance gains when used with multiple samples. We applied our system to 38 Drosophila melanogaster modENCODE RNA-Seq libraries and estimated the sensitivity of reconstructing omitted transcript annotations and the specificity with respect to annotated transcripts. Our results corroborate that a well-motivated objective paired with appropriate optimization techniques lead to significant improvements over the state-of-the-art in transcriptome reconstruction. Availability: MITIE is implemented in C++ and is available from http://bioweb.me/mitie under the GPL license. Contact: [email protected] and [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
Nucleic Acids Research | 2009
Gabriele Schweikert; Jonas Behr; Alexander Zien; Georg Zeller; Cheng Soon Ong; Sören Sonnenburg; Gunnar Rätsch
We describe mGene.web, a web service for the genome-wide prediction of protein coding genes from eukaryotic DNA sequences. It offers pre-trained models for the recognition of gene structures including untranslated regions in an increasing number of organisms. With mGene.web, users have the additional possibility to train the system with their own data for other organisms on the push of a button, a functionality that will greatly accelerate the annotation of newly sequenced genomes. The system is built in a highly modular way, such that individual components of the framework, like the promoter prediction tool or the splice site predictor, can be used autonomously. The underlying gene finding system mGene is based on discriminative machine learning techniques and its high accuracy has been demonstrated in an international competition on nematode genomes. mGene.web is available at http://www.mgene.org/web, it is free of charge and can be used for eukaryotic genomes of small to moderate size (several hundred Mbp).
BMC Bioinformatics | 2009
Regina Bohnert; Jonas Behr; Gunnar Rätsch
Motivation Novel high-throughput sequencing technologies open exciting new approaches to transcriptome profiling. Sequencing transcript populations of interest, e.g. from different tissues or variable stress conditions, with RNA sequencing (RNA-Seq) [1] generates millions of short reads. Accurately aligned to a reference genome, they provide digital counts and thus facilitate transcript quantification. As the observed read counts only provide the summation of all expressed sequences at one locus, the inference of the underlying transcript abundances is crucial for further quantitative analyses.
The Plant Cell | 2016
Lisa Hartmann; Philipp Drewe-Boß; Theresa Wießner; Gabriele Wagner; Sascha Geue; Hsin-Chieh Lee; Dominik M. Obermüller; André Kahles; Jonas Behr; Fabian Sinz; Gunnar Rätsch; Andreas Wachter
Alternative splicing is an integration point of light and sugar signaling pathways in Arabidopsis. Plants use light as source of energy and information to detect diurnal rhythms and seasonal changes. Sensing changing light conditions is critical to adjust plant metabolism and to initiate developmental transitions. Here, we analyzed transcriptome-wide alterations in gene expression and alternative splicing (AS) of etiolated seedlings undergoing photomorphogenesis upon exposure to blue, red, or white light. Our analysis revealed massive transcriptome reprogramming as reflected by differential expression of ∼20% of all genes and changes in several hundred AS events. For more than 60% of all regulated AS events, light promoted the production of a presumably protein-coding variant at the expense of an mRNA with nonsense-mediated decay-triggering features. Accordingly, AS of the putative splicing factor REDUCED RED-LIGHT RESPONSES IN CRY1CRY2 BACKGROUND1, previously identified as a red light signaling component, was shifted to the functional variant under light. Downstream analyses of candidate AS events pointed at a role of photoreceptor signaling only in monochromatic but not in white light. Furthermore, we demonstrated similar AS changes upon light exposure and exogenous sugar supply, with a critical involvement of kinase signaling. We propose that AS is an integration point of signaling pathways that sense and transmit information regarding the energy availability in plants.
BMC Bioinformatics | 2010
Jonas Behr; Regina Bohnert; Georg Zeller; Gabriele Schweikert; Lisa Hartmann; Gunnar Rätsch
An increasingly large number of novel genomes is being sequenced and the task of automatic genome annotation has never been more important. The current revolution in sequencing technologies also allows us to obtain a detailed picture of the whole complement of expressed RNA transcripts. We have developed a novel de novo gene finding system mGene.ngs that combines the benefits of accurate ab initio gene finding with the rich information obtained in RNA sequencing (RNA-seq) experiments. The system is based on the recently developed accurate gene finding system mGene [1], which employs state-of-the-art prediction techniques and which has been shown to perform very well compared to established gene finding systems [2]. In contrast to many HMM-based gene finders, mGene has the conceptual advantage of being very flexible in terms of incorporating heterogeneous input data. The employed inference techniques can exploit the transcriptome information already at the learning stage to appropriately adapt to the relevance of the different evidences. We show that these advantages can be translated into more accurate gene predictions. Moreover, we developed extensions of mGene.ngs to predict and quantify alternative RNA transcripts. To provide de novo genome annotations based on RNA-seq experiments, we first construct a preliminary, highly specific gene set for genes that are well-covered with RNA-seq reads. In a second step, we train predictors for genomic signals on the preliminary gene set. In the third step we train mGene.ngs, using the preliminary gene models while taking advantage of the RNA-seq read coverage and genomic signal predictions. We illustrate the power of our approach for the C. elegans genome and 50M paired-end RNA-seq reads (Illumina; 76nt). Figure 1 shows transcript level evaluation results for all annotated genes (WS200) as a function of the expression level. The ab initio mGene-based system (blue) trained on the annotation achieves an average transcript-level F-score of 49.9%. We achieve a slightly better performance (51.8%) for the de novo annotation system (green) using RNA-seq reads, but without considering the existing genome annotation. If we use the RNA-seq reads and train on the existing annotation (red), we achieve 57.6%, and can therefore take advantage of the previous annotation. We find it remarkable that for medium to high expressed genes the de novo gene predictions are as similar to the genome annotation as the predictions of the system, that has seen parts of the annotation in training. Comparing these results to predictions from the recently published method cufflinks [3] (black) reveals that cufflinks seems not to be able to appropriately adapt to the RNA-seq data at hand. Investigating the contribution of individual features we found that spliced read alignments suggesting introns help most to increase the gene prediction performance; 91.6% of the achieved total improvement is due to spliced read alignments. The read coverage alone is much less informative and only leads to improvements similar to the ones achieved with transcriptome tiling arrays. We employed the developed annotation strategy for the re-annotation of the C. briggsae genome, for which only few transcriptome sequences are available yet. We can show that the new annotation is considerably more accurate than previous ones and additionally includes alternative RNA isoforms. mGene.ngs will be released as open source software on http://mgene.org and is already available as Galaxy-based web-service at http://galaxy.fml.mpg.de. * Correspondence: [email protected] Friedrich Miescher Laboratory of the Max Planck Society, Tubingen, Germany Full list of author information is available at the end of the article Behr et al. BMC Bioinformatics 2010, 11(Suppl 10):O8 http://www.biomedcentral.com/1471-2105/11/S10/O8
Bioinformatics | 2016
André Kahles; Jonas Behr; Gunnar Rätsch
Abstract Motivation: Mapping high-throughput sequencing data to a reference genome is an essential step for most analysis pipelines aiming at the computational analysis of genome and transcriptome sequencing data. Breaking ties between equally well mapping locations poses a severe problem not only during the alignment phase but also has significant impact on the results of downstream analyses. We present the multi-mapper resolution (MMR) tool that infers optimal mapping locations from the coverage density of other mapped reads. Results: Filtering alignments with MMR can significantly improve the performance of downstream analyses like transcript quantitation and differential testing. We illustrate that the accuracy (Spearman correlation) of transcript quantification increases by 15% when using reads of length 51. In addition, MMR decreases the alignment file sizes by more than 50%, and this leads to a reduced running time of the quantification tool. Our efficient implementation of the MMR algorithm is easily applicable as a post-processing step to existing alignment files in BAM format. Its complexity scales linearly with the number of alignments and requires no further inputs. Availability and implementation: Open source code and documentation are available for download at http://github.com/ratschlab/mmr. Comprehensive testing results and further information can be found at http://bioweb.me/mmr. Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.