Publication


Featured research published by Celine Vens.


Machine Learning | 2008

Decision trees for hierarchical multi-label classification

Celine Vens; Jan Struyf; Leander Schietgat; Sašo Džeroski; Hendrik Blockeel

Hierarchical multi-label classification (HMC) is a variant of classification where instances may belong to multiple classes at the same time and these classes are organized in a hierarchy. This article presents several approaches to the induction of decision trees for HMC, as well as an empirical study of their use in functional genomics. We compare learning a single HMC tree (which makes predictions for all classes together) to two approaches that learn a set of regular classification trees (one for each class). The first approach defines an independent single-label classification task for each class (SC). Obviously, the hierarchy introduces dependencies between the classes. While they are ignored by the first approach, they are exploited by the second approach, named hierarchical single-label classification (HSC). Depending on the application at hand, the hierarchy of classes can be such that each class has at most one parent (tree structure) or such that classes may have multiple parents (DAG structure). The latter case has not been considered before and we show how the HMC and HSC approaches can be modified to support this setting. We compare the three approaches on 24 yeast data sets using as classification schemes MIPS’s FunCat (tree structure) and the Gene Ontology (DAG structure). We show that HMC trees outperform HSC and SC trees along three dimensions: predictive accuracy, model size, and induction time. We conclude that HMC trees should definitely be considered in HMC tasks where interpretable models are desired.
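
For readers who want a concrete picture of the single-label baseline (SC) discussed above, the sketch below trains one binary decision tree per class and then enforces hierarchy consistency at prediction time (a child class is kept only if its parent is also predicted). All data, class names, and the hierarchy are invented placeholders; the paper's own HMC algorithm, which learns a single tree for all classes jointly, is not reproduced here.

```python
# Minimal sketch of the SC baseline: one binary decision tree per class,
# plus a simple post-hoc step that respects a toy class hierarchy.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                       # toy feature matrix
classes = ["metabolism", "metabolism/amino-acid", "transport"]
parent = {"metabolism/amino-acid": "metabolism"}     # toy tree-shaped hierarchy
Y = {c: rng.integers(0, 2, size=200) for c in classes}  # toy binary labels per class

# SC baseline: an independent binary tree per class.
trees = {c: DecisionTreeClassifier(max_depth=3).fit(X, Y[c]) for c in classes}

def predict_hmc(x):
    """Predict a label set, dropping children whose parent was not predicted."""
    pred = {c: bool(trees[c].predict(x.reshape(1, -1))[0]) for c in classes}
    for child, par in parent.items():
        pred[child] = pred[child] and pred[par]      # enforce hierarchy constraint
    return {c for c, v in pred.items() if v}

print(predict_hmc(X[0]))
```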


BMC Bioinformatics | 2010

Predicting gene function using hierarchical multi-label decision tree ensembles

Leander Schietgat; Celine Vens; Jan Struyf; Hendrik Blockeel; Dragi Kocev; Sašo Džeroski

Background: S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability. Results: We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use. Conclusions: Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.
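
As a hedged illustration of the ensemble idea, the snippet below fits a scikit-learn random forest on a synthetic multi-label target so that every tree predicts all function labels at once. It stands in for, but does not reproduce, the hierarchy-aware decision tree ensembles described in the paper; all data are placeholders.

```python
# Illustrative ensemble of trees that each predict the full vector of
# (toy) gene-function labels for an ORF.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))             # toy ORF feature vectors
Y = rng.integers(0, 2, size=(300, 5))      # toy binary matrix: 5 function labels

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=1)
forest = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, Y_tr)

Y_hat = forest.predict(X_te)               # one row of function predictions per ORF
print("per-label accuracy:", (Y_hat == Y_te).mean(axis=0))
```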


Pattern Recognition | 2013

Tree ensembles for predicting structured outputs

Dragi Kocev; Celine Vens; Jan Struyf; Sašo Džeroski

In this paper, we address the task of learning models for predicting structured outputs. We consider both global and local predictions of structured outputs, the former based on a single model that predicts the entire output structure and the latter based on a collection of models, each predicting a component of the output structure. We use ensemble methods and apply them in the context of predicting structured outputs. We propose to build ensemble models consisting of predictive clustering trees, which generalize classification trees: these have been used for predicting different types of structured outputs, both locally and globally. More specifically, we develop methods for learning two types of ensembles (bagging and random forests) of predictive clustering trees for global and local predictions of different types of structured outputs. The types of outputs considered correspond to different predictive modeling tasks: multi-target regression, multi-target classification, and hierarchical multi-label classification. Each of the combinations can be applied both in the context of global prediction (producing a single ensemble) or local prediction (producing a collection of ensembles). We conduct an extensive experimental evaluation across a range of benchmark datasets for each of the three types of structured outputs. We compare ensembles for global and local prediction, as well as single trees for global prediction and tree collections for local prediction, both in terms of predictive performance and in terms of efficiency (running times and model complexity). The results show that both global and local tree ensembles perform better than the single model counterparts in terms of predictive power. Global and local tree ensembles perform equally well, with global ensembles being more efficient and producing smaller models, as well as needing fewer trees in the ensemble to achieve the maximal performance.
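
The following sketch contrasts the global and local settings for multi-target regression, using standard scikit-learn random forests as stand-ins for the predictive clustering tree ensembles studied in the paper. The dataset is synthetic and the comparison is purely illustrative.

```python
# Global prediction: one ensemble for the whole target vector.
# Local prediction: one ensemble per target component.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, Y = make_regression(n_samples=500, n_features=15, n_targets=3, noise=5.0,
                       random_state=2)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=2)

# Global: a single multi-output forest.
global_rf = RandomForestRegressor(n_estimators=100, random_state=2).fit(X_tr, Y_tr)

# Local: a separate forest per target.
local_rfs = [RandomForestRegressor(n_estimators=100, random_state=2).fit(X_tr, Y_tr[:, j])
             for j in range(Y_tr.shape[1])]
local_pred = np.column_stack([m.predict(X_te) for m in local_rfs])

print("global R^2:", r2_score(Y_te, global_rf.predict(X_te)))
print("local  R^2:", r2_score(Y_te, local_pred))
```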


European Conference on Machine Learning | 2007

Ensembles of Multi-Objective Decision Trees

Dragi Kocev; Celine Vens; Jan Struyf; Sašo Džeroski

Ensemble methods are able to improve the predictive performance of many base classifiers. Up till now, they have been applied to classifiers that predict a single target attribute. Given the non-trivial interactions that may occur among the different targets in multi-objective prediction tasks, it is unclear whether ensemble methods also improve the performance in this setting. In this paper, we consider two ensemble learning techniques, bagging and random forests, and apply them to multi-objective decision trees (MODTs), which are decision trees that predict multiple target attributes at once. We empirically investigate the performance of ensembles of MODTs. Our most important conclusions are: (1) ensembles of MODTs yield better predictive performance than MODTs, and (2) ensembles of MODTs are equally good, or better than ensembles of single-objective decision trees, i.e., a set of ensembles for each target. Moreover, ensembles of MODTs have smaller model size and are faster to learn than ensembles of single-objective decision trees.
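
A minimal sketch of bagging multi-objective decision trees, assuming scikit-learn's multi-output DecisionTreeClassifier as a stand-in for MODTs; the toy data and ensemble size are arbitrary choices, not the paper's experimental setup.

```python
# Bootstrap-aggregate trees that predict several related targets at once.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 12))
Y = (X[:, :3] + rng.normal(scale=0.5, size=(400, 3)) > 0).astype(int)  # 3 related targets

def bagged_modts(X, Y, n_trees=25):
    """Bagging of multi-output decision trees."""
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))        # bootstrap sample
        trees.append(DecisionTreeClassifier().fit(X[idx], Y[idx]))
    return trees

def predict(trees, X):
    votes = np.stack([t.predict(X) for t in trees])       # (n_trees, n, n_targets)
    return (votes.mean(axis=0) >= 0.5).astype(int)        # majority vote per target

ensemble = bagged_modts(X, Y)
print("training accuracy per target:", (predict(ensemble, X) == Y).mean(axis=0))
```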


Inductive Logic Programming | 2006

First order random forests: Learning relational classifiers with complex aggregates

Anneleen Van Assche; Celine Vens; Hendrik Blockeel; Sašo Džeroski

In relational learning, predictions for an individual are based not only on its own properties but also on the properties of a set of related individuals. Relational classifiers differ with respect to how they handle these sets: some use properties of the set as a whole (using aggregation), some refer to properties of specific individuals of the set, however, most classifiers do not combine both. This imposes an undesirable bias on these learners. This article describes a learning approach that avoids this bias, using first order random forests. Essentially, an ensemble of decision trees is constructed in which tests are first order logic queries. These queries may contain aggregate functions, the argument of which may again be a first order logic query. The introduction of aggregate functions in first order logic, as well as upgrading the forest’s uniform feature sampling procedure to the space of first order logic, generates a number of complications. We address these and propose a solution for them. The resulting first order random forest induction algorithm has been implemented and integrated in the ACE-ilProlog system, and experimentally evaluated on a variety of datasets. The results indicate that first order random forests with complex aggregates are an efficient and effective approach towards learning relational classifiers that involve aggregates over complex selections.
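
The sketch below illustrates the notion of a complex aggregate (an aggregate restricted to a selected subset of related records) by propositionalizing a toy relational dataset and feeding the resulting features to a random forest. Table and column names are invented; the paper itself places such aggregates inside the first order logic tests of the trees rather than in a fixed preprocessing step.

```python
# Simple aggregate: count of all related records.
# Complex aggregate: mean over the subset of records satisfying a selection.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
customers = pd.DataFrame({"cid": range(100),
                          "label": rng.integers(0, 2, size=100)})
orders = pd.DataFrame({"cid": rng.integers(0, 100, size=1000),
                       "amount": rng.exponential(50, size=1000)})

feats = orders.groupby("cid").agg(n_orders=("amount", "size"))
big = orders[orders["amount"] > 75].groupby("cid").agg(mean_big=("amount", "mean"))
table = customers.join(feats, on="cid").join(big, on="cid").fillna(0.0)

forest = RandomForestClassifier(n_estimators=100, random_state=4)
forest.fit(table[["n_orders", "mean_big"]], table["label"])
print(dict(zip(["n_orders", "mean_big"], forest.feature_importances_)))
```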


International Conference on Data Mining | 2011

Random Forest Based Feature Induction

Celine Vens; Fabrizio Costa

We propose a simple yet effective strategy to induce a task dependent feature representation using ensembles of random decision trees. The new feature mapping is efficient in space and time, and provides a metric transformation that is non parametric and not implicit in nature (i.e. not expressed via a kernel matrix), nor limited to the transductive setup. The main advantage of the proposed mapping lies in its flexibility to adapt to several types of learning tasks ranging from regression to multi-label classification, and to deal in a natural way with missing values. Finally, we provide an extensive empirical study of the properties of the learned feature representation over real and artificial datasets.
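
A hedged illustration of the general leaf-membership idea: each instance is mapped to the identifiers of the leaves it reaches in a fitted tree ensemble, and these identifiers are one-hot encoded into a sparse, task-dependent representation. This mirrors the spirit of the approach but is not the authors' exact construction, and it omits the regression, multi-label, and missing-value aspects mentioned above.

```python
# Induce features from the leaves reached in a random forest.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=300, n_features=20, random_state=5)

forest = RandomForestClassifier(n_estimators=30, max_depth=4, random_state=5).fit(X, y)
leaves = forest.apply(X)                       # (n_samples, n_trees) leaf indices
mapping = OneHotEncoder(handle_unknown="ignore").fit(leaves)
X_new = mapping.transform(leaves)              # sparse induced feature matrix

print("induced representation:", X_new.shape)
```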


Inductive Logic Programming | 2004

First Order Random Forests with Complex Aggregates

Celine Vens; Anneleen Van Assche; Hendrik Blockeel; Sašo Džeroski

Random forest induction is a bagging method that randomly samples the feature set at each node in a decision tree. In propositional learning, the method has been shown to work well when lots of features are available. This certainly is the case in first order learning, especially when aggregate functions, combined with selection conditions on the set to be aggregated, are included in the feature space. In this paper, we introduce a random forest based approach to learning first order theories with aggregates. We experimentally validate and compare several variants: first order random forests without aggregates, with simple aggregates, and with complex aggregates in the feature set.


Science | 2017

Predicting human olfactory perception from chemical features of odor molecules

Andreas Keller; Richard C. Gerkin; Yuanfang Guan; Amit Dhurandhar; Gábor Turu; Bence Szalai; Yusuke Ihara; Chung Wen Yu; Russ Wolfinger; Celine Vens; Leander Schietgat; Kurt De Grave; Raquel Norel; Gustavo Stolovitzky; Guillermo A. Cecchi; Leslie B. Vosshall; Pablo Meyer

How will this molecule smell? We still do not understand what a given substance will smell like. Keller et al. launched an international crowd-sourced competition in which many teams tried to solve how the smell of a molecule will be perceived by humans. The teams were given access to a database of responses from subjects who had sniffed a large number of molecules and been asked to rate each smell across a range of different qualities. The teams were also given a comprehensive list of the physical and chemical features of the molecules smelled. The teams produced algorithms to predict the correspondence between the quality of each smell and a given molecule. The best models that emerged from this challenge could accurately predict how a new molecule would smell.

It is still not possible to predict whether a given molecule will have a perceived odor or what olfactory percept it will produce. We therefore organized the crowd-sourced DREAM Olfaction Prediction Challenge. Using a large olfactory psychophysical data set, teams developed machine-learning algorithms to predict sensory attributes of molecules based on their chemoinformatic features. The resulting models accurately predicted odor intensity and pleasantness and also successfully predicted 8 among 19 rated semantic descriptors (“garlic,” “fish,” “sweet,” “fruit,” “burnt,” “spices,” “flower,” and “sour”). Regularized linear models performed nearly as well as random forest–based ones, with a predictive accuracy that closely approaches a key theoretical limit. These models help to predict the perceptual qualities of virtually any molecule with high accuracy and also reverse-engineer the smell of a molecule.
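
The abstract contrasts regularized linear models with random forest based models for predicting perceptual ratings from chemoinformatic descriptors. The sketch below reproduces only the shape of that comparison on synthetic data; the actual DREAM challenge features and ratings are not used.

```python
# Compare a regularized linear model with a random forest on stand-in data
# (X mimics molecular descriptors, y a single rated attribute).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=400, n_features=100, n_informative=20, noise=10.0,
                       random_state=6)

linear = RidgeCV(alphas=np.logspace(-3, 3, 13))
forest = RandomForestRegressor(n_estimators=200, random_state=6)

print("ridge  R^2:", cross_val_score(linear, X, y, cv=5).mean())
print("forest R^2:", cross_val_score(forest, X, y, cv=5).mean())
```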


European Conference on Principles of Data Mining and Knowledge Discovery | 2006

Refining aggregate conditions in relational learning

Celine Vens; Jan Ramon; Hendrik Blockeel

In relational learning, predictions for an individual are based not only on its own properties but also on the properties of a set of related individuals. Many systems use aggregates to summarize this set. Features thus introduced compare the result of an aggregate function to a threshold. We consider the case where the set to be aggregated is generated by a complex query and present a framework for refining such complex aggregate conditions along three dimensions: the aggregate function, the query used to generate the set, and the threshold value. The proposed aggregate refinement operator allows a more efficient search through the hypothesis space and thus can be beneficial for many relational learners that use aggregates. As an example application, we have implemented the refinement operator in a relational decision tree induction system. Experimental results show a significant efficiency gain in comparison with the use of a less advanced refinement operator.
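
A toy sketch of the three refinement dimensions named above: a candidate condition is a triple (aggregate function, selection on the related set, threshold), and the operator enumerates refinements along each dimension. The aggregate functions, selections, and thresholds below are illustrative placeholders, not the paper's first order refinement operator.

```python
# Enumerate candidate aggregate conditions over a toy set of related values.
import numpy as np

rng = np.random.default_rng(7)
related = rng.exponential(50, size=200)          # toy set of related values

AGGREGATES = {"count": lambda s: len(s),
              "max": lambda s: max(s, default=0),
              "avg": lambda s: float(np.mean(s)) if len(s) else 0.0}
SELECTIONS = {"all": lambda v: True,
              "large": lambda v: v > 75,
              "small": lambda v: v < 10}

def refine(values, thresholds=(1, 5, 10, 50, 100)):
    """Yield (aggregate, selection, threshold, truth value) conditions."""
    for agg_name, agg in AGGREGATES.items():
        for sel_name, sel in SELECTIONS.items():
            subset = [v for v in values if sel(v)]
            for t in thresholds:
                yield (agg_name, sel_name, t, agg(subset) >= t)

for condition in list(refine(related))[:5]:
    print(condition)
```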


Cytometry Part A | 2016

A benchmark for evaluation of algorithms for identification of cellular correlates of clinical outcomes.

Nima Aghaeepour; Pratip K. Chattopadhyay; Maria Chikina; Tom Dhaene; Sofie Van Gassen; Miron B. Kursa; Bart N. Lambrecht; Mehrnoush Malek; Geoffrey J. McLachlan; Yu Qian; Peng Qiu; Yvan Saeys; Rick Stanton; Dong Tong; Celine Vens; Slawomir Walkowiak; Kui Wang; Greg Finak; Raphael Gottardo; Tim R. Mosmann; Garry P. Nolan; Richard H. Scheuermann; Ryan R. Brinkman

The Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP) challenges were established to compare the performance of computational methods for identifying cell populations in multidimensional flow cytometry data. Here we report the results of FlowCAP‐IV where algorithms from seven different research groups predicted the time to progression to AIDS among a cohort of 384 HIV+ subjects, using antigen‐stimulated peripheral blood mononuclear cell (PBMC) samples analyzed with a 14‐color staining panel. Two approaches (FlowReMi.1 and flowDensity‐flowType‐RchyOptimyx) provided statistically significant predictive value in the blinded test set. Manual validation of submitted results indicated that unbiased analysis of single cell phenotypes could reveal unexpected cell types that correlated with outcomes of interest in high dimensional flow cytometry datasets.

Collaboration


Dive into Celine Vens's collaborations.

Top Co-Authors

Hendrik Blockeel
Katholieke Universiteit Leuven

Eduardo De Paula Costa
Katholieke Universiteit Leuven

Anneleen Van Assche
Katholieke Universiteit Leuven

Leander Schietgat
Katholieke Universiteit Leuven

Jan Struyf
Katholieke Universiteit Leuven

Konstantinos Pliakos
Aristotle University of Thessaloniki

Jan Ramon
Katholieke Universiteit Leuven

Sašo Džeroski
Katholieke Universiteit Leuven