Publication


Featured research published by Sven Degroeve.


Nucleic Acids Research | 2005

Large-scale structural analysis of the core promoter in mammalian and plant genomes

Kobe Florquin; Yvan Saeys; Sven Degroeve; Pierre Rouzé; Yves Van de Peer

DNA encodes at least two independent levels of functional information. The first level is for encoding proteins and sequence targets for DNA-binding factors, while the second one is contained in the physical and structural properties of the DNA molecule itself. Although the physical and structural properties are ultimately determined by the nucleotide sequence itself, the cell exploits these properties in a way in which the sequence itself plays no role other than to support or facilitate certain spatial structures. In this work, we focus on these structural properties, comparing them between different organisms and assessing their ability to describe the core promoter. We prove the existence of distinct types of core promoters, based on a clustering of their structural profiles. These results indicate that the structural profiles are highly conserved within plants (Arabidopsis and rice) and animals (human and mouse), but differ considerably between plants and animals. Furthermore, we demonstrate that these structural profiles can be an alternative way of describing the core promoter, in addition to more classical motif or IUPAC-based approaches. Using the structural profiles as discriminatory elements to separate promoter regions from non-promoter regions, reliable models can be built to identify core-promoter regions using a strictly computational approach.


Bioinformatics | 2005

SpliceMachine: predicting splice sites from high-dimensional local context representations

Sven Degroeve; Yvan Saeys; Bernard De Baets; Pierre Rouzé; Yves Van de Peer

MOTIVATION: In this age of complete genome sequencing, finding the location and structure of genes is crucial for further molecular research. The accurate prediction of intron boundaries largely facilitates the correct prediction of gene structure in nuclear genomes. Many tools for localizing these boundaries on DNA sequences have been developed and are available to researchers through the internet. Nevertheless, these tools still make many false positive predictions. RESULTS: This manuscript presents a novel publicly available splice site prediction tool named SpliceMachine that (i) shows state-of-the-art prediction performance on Arabidopsis thaliana and human sequences, (ii) performs a computationally fast annotation and (iii) can be trained by users on their own data. AVAILABILITY: Results, figures and software are available at http://www.bioinformatics.psb.ugent.be/supplementary_data/ CONTACT: [email protected]; [email protected].


BMC Bioinformatics | 2004

Feature selection for splice site prediction: A new method using EDA-based feature ranking

Yvan Saeys; Sven Degroeve; Dirk Aeyels; Pierre Rouzé; Yves Van de Peer

Background: The identification of relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and a faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data. Results: In this paper we present a novel method for feature subset selection applied to splice site prediction, based on estimation of distribution algorithms, a more general framework of genetic algorithms. From the estimated distribution of the algorithm, a feature ranking is derived. Afterwards this ranking is used to iteratively discard features. We apply this technique to the problem of splice site prediction, and show how it can be used to gain insight into the underlying biological process of splicing. Conclusion: We show that this technique proves to be more robust than the traditional use of estimation of distribution algorithms for feature selection: instead of returning a single best subset of features (as they normally do) this method provides a dynamical view of the feature selection process, like the traditional sequential wrapper methods. However, the method is faster than the traditional techniques, and scales better to datasets described by a large number of features.
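The idea of deriving a feature ranking from the estimated distribution can be illustrated with a minimal sketch. It assumes the simplest kind of estimation of distribution algorithm (independent per-feature inclusion probabilities) and a toy fitness function; the names `eda_feature_ranking`, `toy_fitness` and `INFORMATIVE` are hypothetical, and the paper itself scores feature subsets with a splice site classifier rather than with this stand-in.

```python
import random


def eda_feature_ranking(fitness, n_features, pop_size=60, n_select=20,
                        generations=15, seed=0):
    """Rank features by the inclusion probabilities learned by a simple EDA.

    Assumption: a univariate model, i.e. one independent probability per
    feature. `fitness` maps a 0/1 feature mask to a score (higher is better).
    """
    rng = random.Random(seed)
    probs = [0.5] * n_features  # start with no preference for any feature
    for _ in range(generations):
        # Sample a population of feature subsets from the current distribution.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        # Keep the best-scoring subsets.
        population.sort(key=fitness, reverse=True)
        selected = population[:n_select]
        # Re-estimate the distribution from the selected subsets.
        for j in range(n_features):
            freq = sum(mask[j] for mask in selected) / n_select
            # Clamp so that no feature is ruled in or out for good.
            probs[j] = min(0.95, max(0.05, freq))
    # The ranking is read directly off the estimated distribution.
    ranking = sorted(range(n_features), key=lambda j: probs[j], reverse=True)
    return ranking, probs


# Toy stand-in for classifier performance (an assumption for illustration):
# features 0 and 3 are useful, and every selected feature carries a small cost.
INFORMATIVE = {0, 3}


def toy_fitness(mask):
    gain = sum(mask[j] for j in INFORMATIVE)
    cost = 0.2 * sum(mask)
    return gain - cost


if __name__ == "__main__":
    ranking, probs = eda_feature_ranking(toy_fitness, n_features=6)
    print("ranking:", ranking)
    print("inclusion probabilities:", probs)
```

Features at the bottom of such a ranking would be the first candidates to discard in the iterative procedure the abstract describes.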


Journal of Proteome Research | 2011

Analysis of the resolution limitations of peptide identification algorithms

Niklaas Colaert; Sven Degroeve; Kenny Helsens; Lennart Martens

Proteome identification using peptide-centric proteomics techniques is a routinely used analysis technique. One of the most powerful and popular methods for the identification of peptides from MS/MS spectra is protein database matching using search engines. Significance thresholding through false discovery rate (FDR) estimation by target/decoy searches is used to ensure the retention of predominantly confident assignments of MS/MS spectra to peptides. However, shortcomings have become apparent when such decoy searches are used to estimate the FDR. To study these shortcomings, we here introduce a novel kind of decoy database that contains isobaric mutated versions of the peptides that were identified in the original search. Because of the supervised way in which the entrapment sequences are generated, we call this a directed decoy database. Since the peptides found in our directed decoy database are thus specifically designed to look quite similar to the forward identifications, the limitations of the existing search algorithms in making correct calls in such strongly confusing situations can be analyzed. Interestingly, for the vast majority of confidently identified peptide identifications, a directed decoy peptide-to-spectrum match can be found that has a better or equal match score than the forward match score, highlighting an important issue in the interpretation of peptide identifications in present-day high-throughput proteomics.
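For readers unfamiliar with target/decoy searches, the following is a minimal sketch of one common FDR estimate: the number of decoy matches at or above a score threshold divided by the number of target matches at or above it. The abstract does not state which estimator the authors use, so this variant, the function name `estimate_fdr` and the example scores are all assumptions made for illustration.

```python
def estimate_fdr(psms, threshold):
    """Estimate the FDR among target matches scoring at or above `threshold`.

    `psms` is a list of (score, is_decoy) pairs, one per peptide-to-spectrum
    match. Assumption: the simple decoys/targets estimator.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing to be wrong about
    return decoys / targets


# Made-up scores for illustration only.
psms = [
    (95.0, False), (90.0, False), (85.0, True), (80.0, False),
    (70.0, False), (60.0, True), (50.0, False),
]

if __name__ == "__main__":
    for threshold in (90.0, 75.0, 55.0):
        print(threshold, estimate_fdr(psms, threshold))
```

The directed decoy database described above probes a weakness that such an estimate cannot show on its own: a decoy built to resemble a confident identification may score as well as the identification itself.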


Proteomics | 2011

A posteriori quality control for the curation and reuse of public proteomics data

Joseph M. Foster; Sven Degroeve; Laurent Gatto; Matthieu Visser; Rui Wang; Johannes Griss; Rolf Apweiler; Lennart Martens

Proteomics is a rapidly expanding field encompassing a multitude of complex techniques and data types. To date much effort has been devoted to achieving the highest possible coverage of proteomes with the aim to inform future developments in basic biology as well as in clinical settings. As a result, growing amounts of data have been deposited in publicly available proteomics databases. These data are in turn increasingly reused for orthogonal downstream purposes such as data mining and machine learning. These downstream uses, however, need ways to validate a posteriori whether a particular data set is suitable for the envisioned purpose. Furthermore, the (semi‐)automatic curation of repository data is dependent on analyses that can highlight misannotation and edge conditions for data sets. Such curation is an important prerequisite for efficient proteomics data reuse in the life sciences in general. We therefore present here a selection of quality control metrics and approaches for the a posteriori detection of potential issues encountered in typical proteomics data sets. We illustrate our metrics by relying on publicly available data from the Proteomics Identifications Database (PRIDE), and simultaneously show the usefulness of the large body of PRIDE data as a means to derive empirical background distributions for relevant metrics.


Bioinformatics | 2013

MS2PIP: a tool for MS/MS peak intensity prediction

Sven Degroeve; Lennart Martens

MOTIVATION: Tandem mass spectrometry provides the means to match mass spectrometry signal observations with the chemical entities that generated them. The technology produces signal spectra that contain information about the chemical dissociation pattern of a peptide that was forced to fragment using methods like collision-induced dissociation. The ability to predict these MS² signals and to understand this fragmentation process is important for sensitive high-throughput proteomics research. RESULTS: We present a new tool called MS2PIP for predicting the intensity of the most important fragment ion signal peaks from a peptide sequence. MS2PIP pre-processes a large dataset with confident peptide-to-spectrum matches to facilitate data-driven model induction using a random forest regression learning algorithm. The intensity predictions of MS2PIP were evaluated on several independent evaluation sets and found to correlate significantly better with the observed fragment-ion intensities as compared with the current state-of-the-art PeptideART tool. AVAILABILITY: MS2PIP code is available for both training and predicting at http://compomics.com/.


Nature Methods | 2011

Combining quantitative proteomics data processing workflows for greater sensitivity

Niklaas Colaert; Christophe Van Huele; Sven Degroeve; An Staes; Joël Vandekerckhove; Kris Gevaert; Lennart Martens

We here describe a normalization method to combine quantitative proteomics data. By merging the output of two popular quantification software packages, we obtained a 20% increase (on average) in the number of quantified human proteins without suffering from a loss of quality. Our integrative workflow is freely available through our user-friendly, open-source Rover software (http://compomics-rover.googlecode.com/).


Journal of Proteome Research | 2013

Predicting tryptic cleavage from proteomics data using decision tree ensembles

Thomas Fannes; Elien Vandermarliere; Leander Schietgat; Sven Degroeve; Lennart Martens; Jan Ramon

Trypsin is the workhorse protease in mass spectrometry-based proteomics experiments and is used to digest proteins into more readily analyzable peptides. To identify these peptides after mass spectrometric analysis, the actual digestion has to be mimicked as faithfully as possible in silico. In this paper we introduce CP-DT (Cleavage Prediction with Decision Trees), an algorithm based on a decision tree ensemble that was learned on publicly available peptide identification data from the PRIDE repository. We demonstrate that CP-DT is able to accurately predict tryptic cleavage: tests on three independent data sets show that CP-DT significantly outperforms the Keil rules that are currently used to predict tryptic cleavage. Moreover, the trees generated by CP-DT can make predictions efficiently and are interpretable by domain experts.
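To make the notion of in silico digestion concrete, the sketch below applies the widely used simplified trypsin rule: cleave after lysine (K) or arginine (R) unless the next residue is proline (P). This is neither the full set of Keil rules nor CP-DT; it is only the baseline kind of rule that such predictors aim to improve on, and the function name `digest` and the example sequence are assumptions.

```python
def digest(protein):
    """Split a protein sequence with the simplified trypsin rule.

    Assumption: cleave C-terminal to K or R, except when followed by P.
    Missed cleavages and all other context effects are ignored.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Trailing peptide that does not end in K or R.
        peptides.append(protein[start:])
    return peptides


if __name__ == "__main__":
    # Made-up sequence: the R before P is not cleaved.
    print(digest("MKWVTFRPLLKA"))
```

A learned model such as a decision tree ensemble would replace the fixed `residue in "KR"` test with a prediction based on the residues surrounding each candidate site.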


Analytical and Bioanalytical Chemistry | 2012

Towards a human proteomics atlas

Giulia Gonnelli; Niels Hulstaert; Sven Degroeve; Lennart Martens

Proteomics research has taken up an increasingly important role in life sciences over the past few years. Due to a strong push from publishers and funders alike, the community has also started to freely share its data in earnest, making use of public repositories such as the highly popular PRIDE database at EMBL-EBI. Reuse of these publicly available data has so far been confined to rather specific, targeted reanalyses, but this limited reuse is set to expand dramatically as repositories continue to grow exponentially. Examples of large-scale reuse are readily found in other omics disciplines, where more comprehensive public data have already accumulated over longer periods. Here, a typical example of integrative data reuse is provided by the construction of so-called expression atlases. We here therefore investigate the issues involved in using the human data currently stored in the PRIDE database to construct a robust, tissue-specific protein expression atlas from tandem-MS based label-free quantification.


European Conference on Principles of Data Mining and Knowledge Discovery | 2004

Digging into acceptor splice site prediction: an iterative feature selection approach

Yvan Saeys; Sven Degroeve; Yves Van de Peer

Feature selection techniques are often used to reduce data dimensionality, increase classification performance, and gain insight into the processes that generated the data. In this paper, we describe an iterative procedure of feature selection and feature construction steps, improving the classification of acceptor splice sites, an important subtask of gene prediction. We show that acceptor prediction can benefit from feature selection, and describe how feature selection techniques can be used to gain new insights into the classification of acceptor sites. This is illustrated by the identification of a new, biologically motivated feature: the AG-scanning feature. The results described in this paper contribute both to the domain of gene prediction, and to research in feature selection techniques, describing a new wrapper-based feature weighting method that aids in knowledge discovery when dealing with complex datasets.
