
Publications


Featured research published by Anneleen Van Assche.


International Conference on Multiple Classifier Systems | 2003

Ensemble methods for noise elimination in classification problems

Sofie Verbaeten; Anneleen Van Assche

Ensemble methods combine a set of classifiers to construct a new classifier that is (often) more accurate than any of its component classifiers. In this paper, we use ensemble methods to identify noisy training examples. More precisely, we consider the problem of mislabeled training examples in classification tasks, and address this problem by pre-processing the training set, i.e. by identifying and removing outliers from the training set. We study a number of filter techniques that are based on well-known ensemble methods like cross-validated committees, bagging and boosting. We evaluate these techniques in an Inductive Logic Programming setting and use a first order decision tree algorithm to construct the ensembles.
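
The filtering idea can be sketched propositionally (the paper works in an ILP setting with first order decision trees; this toy uses decision stumps, bagging, and an invented 1-D dataset, so every name and value here is hypothetical):

```python
import random

# Toy 1-D dataset: the true concept is "x > 5"; the example at index 6
# (x = 9, label 0) is deliberately mislabeled.
data = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1), (9, 0), (10, 1)]

def train_stump(sample):
    # Fit a decision stump: the threshold with the fewest errors on the sample.
    best_t, best_err = 0, float("inf")
    for t in range(0, 11):
        err = sum((x > t) != bool(y) for x, y in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def noise_filter(data, n_models=25, seed=0):
    # Majority-vote filter: an example is flagged as noise when most members
    # of an ensemble, each trained on a bootstrap resample, misclassify it.
    rng = random.Random(seed)
    votes = [0] * len(data)
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # bootstrap resample (bagging)
        t = train_stump(sample)
        for i, (x, y) in enumerate(data):
            if (x > t) != bool(y):
                votes[i] += 1
    return [i for i, v in enumerate(votes) if v > n_models / 2]

print(noise_filter(data))  # flags the mislabeled example's index: [6]
```

The clean examples are fit by nearly every bootstrap stump, so only the mislabeled one accumulates a majority of disagreement votes.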


Inductive Logic Programming | 2006

First order random forests: Learning relational classifiers with complex aggregates

Anneleen Van Assche; Celine Vens; Hendrik Blockeel; Sašo Džeroski

In relational learning, predictions for an individual are based not only on its own properties but also on the properties of a set of related individuals. Relational classifiers differ with respect to how they handle these sets: some use properties of the set as a whole (using aggregation), some refer to properties of specific individuals of the set, but most classifiers do not combine both. This imposes an undesirable bias on these learners. This article describes a learning approach that avoids this bias, using first order random forests. Essentially, an ensemble of decision trees is constructed in which tests are first order logic queries. These queries may contain aggregate functions, the argument of which may again be a first order logic query. The introduction of aggregate functions in first order logic, as well as upgrading the forest’s uniform feature sampling procedure to the space of first order logic, generates a number of complications. We address these complications and propose solutions for them. The resulting first order random forest induction algorithm has been implemented and integrated in the ACE-ilProlog system, and experimentally evaluated on a variety of datasets. The results indicate that first order random forests with complex aggregates are an efficient and effective approach towards learning relational classifiers that involve aggregates over complex selections.
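
A "complex aggregate" in the sense of this abstract is an aggregate function whose argument is itself a selection query over the related set. A minimal propositional sketch, with entirely invented relational data and names:

```python
# Hypothetical relational data: each example (a customer) is linked to a set
# of related records (transactions as (item, price) pairs).
customers = {
    "ann": [("wine", 20), ("cheese", 12), ("bread", 2)],
    "bob": [("beer", 5), ("bread", 2)],
}

def complex_aggregate(transactions, min_price=10):
    # Aggregate (count) over a selection condition on the related set:
    # count(t in transactions(C) where price(t) > min_price)
    return len([t for t in transactions if t[1] > min_price])

# A node test in a first order tree could then compare the aggregate
# to a threshold, e.g. "customer has at least 2 expensive transactions".
def node_test(customer):
    return complex_aggregate(customers[customer]) >= 2

print(node_test("ann"), node_test("bob"))  # → True False
```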


Inductive Logic Programming | 2004

First Order Random Forests with Complex Aggregates

Celine Vens; Anneleen Van Assche; Hendrik Blockeel; Sašo Džeroski

Random forest induction is a bagging method that randomly samples the feature set at each node in a decision tree. In propositional learning, the method has been shown to work well when lots of features are available. This certainly is the case in first order learning, especially when aggregate functions, combined with selection conditions on the set to be aggregated, are included in the feature space. In this paper, we introduce a random forest based approach to learning first order theories with aggregates. We experimentally validate and compare several variants: first order random forests without aggregates, with simple aggregates, and with complex aggregates in the feature set.
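
The per-node feature sampling the abstract refers to can be sketched as follows (a propositional toy with depth-1 trees, so per-tree and per-node sampling coincide; data and parameters are invented):

```python
import random

def gini(groups):
    # Weighted Gini impurity of a candidate binary split.
    total = sum(len(g) for g in groups)
    score = 0.0
    for g in groups:
        if g:
            p = sum(y for _, y in g) / len(g)
            score += 2 * p * (1 - p) * len(g) / total
    return score

def majority(rows):
    return int(sum(y for _, y in rows) >= len(rows) / 2)

def train_stump(rows, n_features, rng):
    # The random forest step: at this node, consider only a random subset
    # of the features, then pick the best (feature, threshold) split in it.
    d = len(rows[0][0])
    best = None
    for f in rng.sample(range(d), n_features):
        for x, _ in rows:
            left = [r for r in rows if r[0][f] < x[f]]
            right = [r for r in rows if r[0][f] >= x[f]]
            g = gini([left, right])
            if best is None or g < best[0]:
                best = (g, f, x[f],
                        majority(left or rows), majority(right or rows))
    _, f, t, left_label, right_label = best
    return lambda z: left_label if z[f] < t else right_label

def forest_predict(stumps, z):
    # Unweighted majority vote over the ensemble.
    return int(sum(s(z) for s in stumps) > len(stumps) / 2)

rng = random.Random(7)
# Toy data: class = 1 iff feature 0 >= 5; feature 1 is pure noise.
rows = [((i, rng.random()), int(i >= 5)) for i in range(10)]
stumps = [train_stump([rng.choice(rows) for _ in rows], 1, rng)
          for _ in range(21)]
print(forest_predict(stumps, (9, 0.5)), forest_predict(stumps, (0, 0.5)))
```

Each member sees a bootstrap resample and only one of the two features, yet the majority vote recovers the true concept; the first order version in the paper replaces the feature space with first order logic queries.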


European Conference on Machine Learning | 2007

Seeing the Forest Through the Trees: Learning a Comprehensible Model from an Ensemble

Anneleen Van Assche; Hendrik Blockeel

Ensemble methods are popular learning methods that usually increase the predictive accuracy of a classifier, though at the cost of interpretability and insight into the decision process. In this paper we aim to overcome this issue of comprehensibility by learning a single decision tree that approximates an ensemble of decision trees. The new model is obtained by exploiting the class distributions predicted by the ensemble. These are employed to compute heuristics for deciding which tests are to be used in the new tree. As such we acquire a model that is able to give insight into the decision process, while being more accurate than the single model directly learned on the data. The proposed method is experimentally evaluated on a large number of UCI data sets, and compared to an existing approach that makes use of artificially generated data.
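
The core step, choosing a test from ensemble-predicted class distributions rather than raw labels, can be sketched like this (the ensemble is replaced by a hypothetical fixed probability function; in the paper it would be a trained ensemble of trees):

```python
import math

def ensemble_proba(x):
    # Stand-in for the trained ensemble's averaged class-1 probability
    # (invented here: high only where both features are large).
    return 0.9 if x[0] > 3 and x[1] > 2 else 0.1

def entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def region_entropy(points):
    # Class entropy of a region as predicted by the ensemble, not read off labels.
    p = sum(ensemble_proba(x) for x in points) / len(points)
    return entropy(p)

def best_test(points):
    # Pick the split with the highest information gain, where the gain is
    # computed from the ensemble's predicted class distributions.
    base, best = region_entropy(points), None
    for f in range(len(points[0])):
        for t in sorted({x[f] for x in points}):
            left = [x for x in points if x[f] <= t]
            right = [x for x in points if x[f] > t]
            if not right:
                continue
            after = (len(left) * region_entropy(left)
                     + len(right) * region_entropy(right)) / len(points)
            if best is None or base - after > best[0]:
                best = (base - after, f, t)
    return best[1], best[2]

points = [(a, b) for a in range(6) for b in range(5)]
print(best_test(points))  # → (0, 3): split on feature 0 at threshold 3
```

The chosen root test isolates the high-probability region along feature 0 first; recursing on each side would grow the single comprehensible tree.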


Inductive Logic Programming | 2007

Seeing the Forest Through the Trees

Anneleen Van Assche; Hendrik Blockeel

Ensemble methods are popular learning methods that are usually able to increase the predictive accuracy of a classifier. On the other hand, this comes at the cost of interpretability, and insight into the decision process of an ensemble is hard to obtain. This is a major reason why ensemble methods have not been extensively used in the setting of inductive logic programming. In this paper we aim to overcome this issue of comprehensibility by learning a single first order interpretable model that approximates the first order ensemble. The new model is obtained by exploiting the class distributions predicted by the ensemble. These are employed to compute heuristics for deciding which tests are to be used in the new model. As such we obtain a model that is able to give insight into the decision process of the ensemble, while being more accurate than the single model directly learned on the data.


European Conference on Machine Learning | 2006

Bagging using statistical queries

Anneleen Van Assche; Hendrik Blockeel

Bagging is an ensemble method that relies on random resampling of a data set to construct models for the ensemble. When only statistics about the data are available, but no individual examples, the straightforward resampling procedure cannot be implemented. The question is then whether bagging can somehow be simulated. In this paper we propose a method that, instead of computing certain heuristics (such as information gain) from a resampled version of the data, estimates the probability distribution of these heuristics under random resampling, and then samples from this distribution. The resulting method is not entirely equivalent to bagging because it ignores certain dependencies among statistics. Nevertheless, experiments show that this “simulated bagging” yields accuracy similar to bagging, while being equally efficient and more generally applicable.
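
A minimal sketch of the idea, under the simplifying assumption that the statistics for one candidate test are its four contingency-table counts: under bootstrap resampling those counts jointly follow a multinomial distribution, so the heuristic's distribution can be sampled without ever touching individual examples (counts here are invented):

```python
import math
import random

def entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gain(lp, ln, rp, rn):
    # Information gain of a binary test computed from the four cell counts
    # (left-pos, left-neg, right-pos, right-neg) alone; no examples needed.
    n = lp + ln + rp + rn
    g = entropy((lp + rp) / n)
    if lp + ln:
        g -= (lp + ln) / n * entropy(lp / (lp + ln))
    if rp + rn:
        g -= (rp + rn) / n * entropy(rp / (rp + rn))
    return g

def simulated_bagging_gain(cells, rng):
    # Instead of resampling examples, resample the statistics: a bootstrap
    # resample distributes n draws over the cells multinomially.
    n = sum(cells)
    draw = rng.choices(range(len(cells)), weights=cells, k=n)
    boot = [draw.count(i) for i in range(len(cells))]
    return gain(*boot)

rng = random.Random(0)
cells = (40, 10, 5, 45)  # counts obtained via statistical queries
gains = [simulated_bagging_gain(cells, rng) for _ in range(10)]
print(round(gain(*cells), 3))  # gain on the full statistics: 0.397
```

Each sampled gain plays the role of the heuristic a bagged member would have computed; as the abstract notes, dependencies between the statistics of different candidate tests are ignored in this cell-wise simulation.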


Lecture Notes in Computer Science | 2004

First order random forests with complex aggregates

Celine Vens; Anneleen Van Assche; Hendrik Blockeel; Sašo Džeroski

For the last ten years a lot of work has been devoted to propositionalization techniques in relational learning. These techniques change the representation of relational problems to attribute-value problems in order to use well-known learning algorithms to solve them. Propositionalization approaches have been successfully applied to various problems but are still considered ad hoc techniques. In this paper, we study these techniques in the larger context of macro-operators as techniques to improve the heuristic search. The macro-operator paradigm enables us to propose a unified view of propositionalization and to discuss its current limitations. We show that a whole new class of approaches can be developed in relational learning which extends the idea of changes of representation to more suited learning languages. As a first step, we propose different languages that provide a better compromise than current propositionalization techniques between the cost of building macro-operators and the cost of learning. It is known that ILP problems can be reformulated either into attribute-value or multi-instance problems. With the macro-operator approach, we see that we can target a new representation language we name multi-table. This new language is more expressive than attribute-value but is simpler than multi-instance. Moreover, it is PAC-learnable under weak constraints. Finally, we suggest that relational learning can benefit from both the problem solving and the attribute-value learning community by focusing on the design of effective macro-operator approaches.


Archive | 2009

The ACE Data Mining System User's Manual

Hendrik Blockeel; Luc Dehaspe; Jan Ramon; Jan Struyf; Anneleen Van Assche; Celine Vens; Daan Fierens


International Conference on Machine Learning | 2005

A Random Forest Approach to Relational Learning

Anneleen Van Assche; Celine Vens; Hendrik Blockeel; Sašo Džeroski


Inductive Logic Programming | 2003

First order alternating decision trees

Anneleen Van Assche; Darek Krzywania; Joris Vaneyghen; Jan Struyf; Hendrik Blockeel

Collaboration


Anneleen Van Assche's top co-authors:

Hendrik Blockeel (Katholieke Universiteit Leuven)
Celine Vens (Katholieke Universiteit Leuven)
Sofie Verbaeten (Katholieke Universiteit Leuven)
Jan Struyf (Katholieke Universiteit Leuven)
Sašo Džeroski (Katholieke Universiteit Leuven)
Jan Ramon (Katholieke Universiteit Leuven)
Daan Fierens (Katholieke Universiteit Leuven)
Luc Dehaspe (Katholieke Universiteit Leuven)
Werner Uwents (Katholieke Universiteit Leuven)
Joaquin Vanschoren (Eindhoven University of Technology)