Gabriele Schweikert
Max Planck Society
Publication
Featured research published by Gabriele Schweikert.
BMC Bioinformatics | 2007
Sören Sonnenburg; Gabriele Schweikert; Petra Philips; Jonas Behr; Gunnar Rätsch
Background: For splice site recognition, one has to solve two classification problems: discriminating true from decoy splice sites for both acceptor and donor sites. Gene finding systems typically rely on Markov Chains to solve these tasks. Results: In this work we consider Support Vector Machines for splice site recognition. We employ the so-called weighted degree kernel, which turns out to be well suited for this task, as we illustrate in several experiments where we compare its prediction accuracy with that of recently proposed systems. We apply our method to the genome-wide recognition of splice sites in Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, and Homo sapiens. Our performance estimates indicate that splice sites can be recognized very accurately in these genomes and that our method outperforms many other methods including Markov Chains, GeneSplicer and SpliceMachine. We provide genome-wide predictions of splice sites and a stand-alone prediction tool ready to be incorporated into a gene finder. Availability: Data, splits, additional information on the model selection, the whole genome predictions, as well as the stand-alone prediction tool are available for download at http://www.fml.mpg.de/raetsch/projects/splice.
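As a rough illustration of the weighted degree kernel mentioned in the abstract above, the following minimal sketch compares two equal-length sequence windows by counting matching substrings at identical positions, for every substring length up to a maximum degree. The weights beta_d = 2(D - d + 1) / (D(D + 1)) are the ones commonly used with this kernel; the abstract states neither the weights nor the degree, so both are assumptions here, and the function name `wd_kernel` is hypothetical. This is not the authors' implementation.

```python
def wd_kernel(x: str, y: str, degree: int = 3) -> float:
    """Weighted degree kernel between two equal-length sequences.

    Counts substrings of length d = 1..degree that match at the same
    position in both sequences, weighting shorter substrings more heavily.
    The weighting scheme is an assumption (not given in the abstract).
    """
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    if degree < 1:
        raise ValueError("degree must be at least 1")

    length = len(x)
    total = 0.0
    for d in range(1, degree + 1):
        # Assumed weight for substrings of length d.
        beta = 2.0 * (degree - d + 1) / (degree * (degree + 1))
        # Number of positions where the length-d substrings agree.
        matches = sum(
            1 for pos in range(length - d + 1) if x[pos:pos + d] == y[pos:pos + d]
        )
        total += beta * matches
    return total
```

Because matches are only counted at identical positions, the kernel is sensitive to where a motif occurs relative to the candidate splice site, which is why it is applied to fixed-length windows centred on the site.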
Genome Research | 2009
Gabriele Schweikert; Alexander Zien; Georg Zeller; Jonas Behr; Christoph Dieterich; Cheng Soon Ong; Petra Philips; Fabio De Bona; Lisa Hartmann; Anja Bohlen; Nina Krüger; Sören Sonnenburg; Gunnar Rätsch
We present a highly accurate gene-prediction system for eukaryotic genomes, called mGene. It combines in an unprecedented manner the flexibility of generalized hidden Markov models (gHMMs) with the predictive power of modern machine learning methods, such as Support Vector Machines (SVMs). Its excellent performance was demonstrated in an objective competition based on the genome of the nematode Caenorhabditis elegans. Considering the average of sensitivity and specificity, the developmental version of mGene exhibited the best prediction performance at the nucleotide, exon, and transcript levels for ab initio and multiple-genome gene-prediction tasks. The fully developed version shows superior performance in 10 out of 12 evaluation criteria compared with the other participating gene finders, including Fgenesh++ and Augustus. An in-depth analysis of mGene's genome-wide predictions revealed that approximately 2200 predicted genes were not contained in the current genome annotation. Testing a subset of 57 of these genes by RT-PCR and sequencing, we confirmed expression for 24 (42%) of them. mGene missed 300 annotated genes, out of which 205 were unconfirmed. RT-PCR testing of 24 of these genes resulted in a success rate of merely 8%. These findings suggest that even the gene catalog of a well-studied organism such as C. elegans can be substantially improved by mGene's predictions. We also provide gene predictions for the four nematodes C. briggsae, C. brenneri, C. japonica, and C. remanei. Comparing the resulting proteomes among these organisms and to the known protein universe, we identified many species-specific gene inventions. In a quality assessment of several available annotations for these genomes, we find that mGene's predictions are most accurate.
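The ranking criterion named in the abstract above, the average of sensitivity and specificity, can be sketched as follows. The abstract does not define the two terms; this sketch assumes the convention common in gene-finding evaluations, where sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP). The function name and the exon coordinates in the usage note are hypothetical.

```python
from typing import Hashable, Set, Tuple


def sn_sp_average(
    predicted: Set[Hashable], annotated: Set[Hashable]
) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, their mean) for two feature sets.

    A feature (e.g. an exon given as a (start, end) pair) counts as a true
    positive only if it appears identically in both sets.
    Assumed definitions: Sn = TP / (TP + FN), Sp = TP / (TP + FP).
    """
    true_positives = len(predicted & annotated)
    sensitivity = true_positives / len(annotated) if annotated else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity, (sensitivity + specificity) / 2
```

For example, with four annotated exons and three predicted exons of which two match exactly, sensitivity is 2/4, specificity is 2/3, and their average is 7/12. The same function applies at the nucleotide or transcript level if the set elements are positions or whole transcript structures.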
Nucleic Acids Research | 2009
Gabriele Schweikert; Jonas Behr; Alexander Zien; Georg Zeller; Cheng Soon Ong; Sören Sonnenburg; Gunnar Rätsch
We describe mGene.web, a web service for the genome-wide prediction of protein coding genes from eukaryotic DNA sequences. It offers pre-trained models for the recognition of gene structures including untranslated regions in an increasing number of organisms. With mGene.web, users can additionally train the system with their own data for other organisms at the push of a button, a functionality that will greatly accelerate the annotation of newly sequenced genomes. The system is built in a highly modular way, such that individual components of the framework, like the promoter prediction tool or the splice site predictor, can be used autonomously. The underlying gene finding system mGene is based on discriminative machine learning techniques and its high accuracy has been demonstrated in an international competition on nematode genomes. mGene.web is available at http://www.mgene.org/web; it is free of charge and can be used for eukaryotic genomes of small to moderate size (several hundred Mbp).
BMC Bioinformatics | 2010
Jonas Behr; Regina Bohnert; Georg Zeller; Gabriele Schweikert; Lisa Hartmann; Gunnar Rätsch
An increasingly large number of novel genomes is being sequenced and the task of automatic genome annotation has never been more important. The current revolution in sequencing technologies also allows us to obtain a detailed picture of the whole complement of expressed RNA transcripts. We have developed a novel de novo gene finding system, mGene.ngs, that combines the benefits of accurate ab initio gene finding with the rich information obtained in RNA sequencing (RNA-seq) experiments. The system is based on the recently developed accurate gene finding system mGene [1], which employs state-of-the-art prediction techniques and which has been shown to perform very well compared to established gene finding systems [2]. In contrast to many HMM-based gene finders, mGene has the conceptual advantage of being very flexible in terms of incorporating heterogeneous input data. The employed inference techniques can exploit the transcriptome information already at the learning stage to appropriately adapt to the relevance of the different sources of evidence. We show that these advantages can be translated into more accurate gene predictions. Moreover, we developed extensions of mGene.ngs to predict and quantify alternative RNA transcripts. To provide de novo genome annotations based on RNA-seq experiments, we first construct a preliminary, highly specific gene set for genes that are well covered with RNA-seq reads. In a second step, we train predictors for genomic signals on the preliminary gene set. In the third step we train mGene.ngs, using the preliminary gene models while taking advantage of the RNA-seq read coverage and genomic signal predictions. We illustrate the power of our approach for the C. elegans genome and 50M paired-end RNA-seq reads (Illumina; 76nt). Figure 1 shows transcript-level evaluation results for all annotated genes (WS200) as a function of the expression level.
The ab initio mGene-based system (blue) trained on the annotation achieves an average transcript-level F-score of 49.9%. We achieve a slightly better performance (51.8%) for the de novo annotation system (green) using RNA-seq reads, but without considering the existing genome annotation. If we use the RNA-seq reads and train on the existing annotation (red), we achieve 57.6%, and can therefore take advantage of the previous annotation. We find it remarkable that for medium to highly expressed genes the de novo gene predictions are as similar to the genome annotation as the predictions of the system that has seen parts of the annotation in training. Comparing these results to predictions from the recently published method Cufflinks [3] (black) reveals that Cufflinks seems unable to adapt appropriately to the RNA-seq data at hand. Investigating the contribution of individual features, we found that spliced read alignments suggesting introns help most to increase the gene prediction performance; 91.6% of the achieved total improvement is due to spliced read alignments. The read coverage alone is much less informative and only leads to improvements similar to the ones achieved with transcriptome tiling arrays. We employed the developed annotation strategy for the re-annotation of the C. briggsae genome, for which only few transcriptome sequences are available so far. We can show that the new annotation is considerably more accurate than previous ones and additionally includes alternative RNA isoforms. mGene.ngs will be released as open source software on http://mgene.org and is already available as a Galaxy-based web service at http://galaxy.fml.mpg.de.
Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany. Behr et al. BMC Bioinformatics 2010, 11(Suppl 10):O8 http://www.biomedcentral.com/1471-2105/11/S10/O8
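The abstract above reports that spliced read alignments suggesting introns contribute most of the improvement. As a hedged illustration of how such intron evidence can be derived from alignments, the sketch below assumes reads are aligned in SAM format, where the CIGAR operation `N` marks a skipped reference region (an intron for RNA-seq reads). The abstract does not say how mGene.ngs extracts this feature, so the functions and their names are hypothetical and not the authors' code.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = frozenset("MDN=X")


def introns_from_alignment(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return intron intervals (1-based, inclusive) implied by one alignment.

    `pos` is the 1-based leftmost reference position of the alignment.
    """
    introns = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            # The skipped region spans reference positions ref .. ref + n - 1.
            introns.append((ref, ref + n - 1))
        if op in REF_CONSUMING:
            ref += n
    return introns


def intron_support(alignments: Iterable[Tuple[int, str]]) -> Counter:
    """Count how many alignments support each intron interval."""
    support: Counter = Counter()
    for pos, cigar in alignments:
        support.update(introns_from_alignment(pos, cigar))
    return support
```

For a 76 nt read aligned at position 100 with CIGAR `30M500N46M`, the implied intron covers reference positions 130 to 629. Counting support across reads gives a per-intron score that a gene finder could use as one input feature alongside read coverage.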
Science | 2007
Richard M. Clark; Gabriele Schweikert; Christopher Toomajian; Stephan Ossowski; Georg Zeller; Paul Shinn; Norman Warthmann; Tina T. Hu; Glenn Fu; David A. Hinds; Huaming Chen; Kelly A. Frazer; Daniel H. Huson; Bernhard Schölkopf; Magnus Nordborg; Gunnar Rätsch; Joseph R. Ecker; Detlef Weigel
Neural Information Processing Systems | 2008
Gabriele Schweikert; Gunnar Rätsch; Christian Widmer; Bernhard Schölkopf
Structure | 2005
Vladan Lučić; Ting Yang; Gabriele Schweikert; Friedrich Förster; Wolfgang Baumeister
F1000Research | 2011
Jonas Behr; Regina Bohnert; André Kahles; Gabriele Schweikert; Georg Zeller; Lisa Hartmann; Gunnar Rätsch
Archive | 2010
Gabriele Schweikert
Worm Genomics and Systems Biology meeting | 2008
Gabriele Schweikert; Georg Zeller; Alexander Zien; Jonas Behr; Sören Sonnenburg; Petra Philips; Cheng Soon Ong; Gunnar Rätsch