Stijn Vanderlooy | Researchain

Publication

Featured research published by Stijn Vanderlooy.

Pattern Recognition | 2010

Combining predictions in pairwise classification: An optimal adaptive voting strategy and its relation to weighted voting

Eyke Hüllermeier; Stijn Vanderlooy

Weighted voting is the commonly used strategy for combining predictions in pairwise classification. Even though it shows good classification performance in practice, it is often criticized for lacking a sound theoretical justification. In this paper, we study the problem of combining predictions within a formal framework of label ranking and, under some model assumptions, derive a generalized voting strategy in which predictions are properly adapted according to the strengths of the corresponding base classifiers. We call this strategy adaptive voting and show that it is optimal in the sense of yielding a MAP prediction of the class label of a test instance. Moreover, we offer a theoretical justification for weighted voting by showing that it yields a good approximation of the optimal adaptive voting prediction. This result is further corroborated by empirical evidence from experiments with real and synthetic data sets showing that, even though adaptive voting is sometimes able to achieve consistent improvements, weighted voting is in general quite competitive, all the more in cases where the aforementioned model assumptions underlying adaptive voting are not met. In this sense, weighted voting appears to be a more robust aggregation strategy.
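
To make the weighted voting strategy concrete, here is a minimal Python sketch. It assumes each pairwise base classifier outputs an estimated probability that the first class of its pair is preferred over the second; the function name `weighted_vote` and the example numbers are hypothetical and not taken from the paper. The adaptive voting strategy derived in the paper is not shown.

```python
from itertools import combinations


def weighted_vote(pairwise, classes):
    """Combine pairwise predictions by weighted voting.

    pairwise[(a, b)] is the (assumed) estimated probability that class a
    is preferred over class b, given for every pair in the order produced
    by itertools.combinations(classes, 2).
    """
    scores = {c: 0.0 for c in classes}
    for a, b in combinations(classes, 2):
        p = pairwise[(a, b)]
        scores[a] += p          # vote for a, weighted by the prediction
        scores[b] += 1.0 - p    # complementary vote for b
    # Predict the class with the largest sum of weighted votes.
    best = max(classes, key=lambda c: scores[c])
    return best, scores


# Hypothetical outputs of three pairwise classifiers for one test instance.
pairwise = {("A", "B"): 0.9, ("A", "C"): 0.6, ("B", "C"): 0.3}
label, scores = weighted_vote(pairwise, ["A", "B", "C"])
print(label, scores)
```

Sorting the classes by their scores, rather than taking only the maximum, would give a label ranking instead of a single class prediction.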

European Conference on Machine Learning | 2009

Binary Decomposition Methods for Multipartite Ranking

Johannes Fürnkranz; Eyke Hüllermeier; Stijn Vanderlooy

Bipartite ranking refers to the problem of learning a ranking function from a training set of positively and negatively labeled examples. Applied to a set of unlabeled instances, a ranking function is expected to establish a total order in which positive instances precede negative ones. The performance of a ranking function is typically measured in terms of the AUC. In this paper, we study the problem of multipartite ranking, an extension of bipartite ranking to the multi-class case. In this regard, we discuss extensions of the AUC metric which are suitable as evaluation criteria for multipartite rankings. Moreover, to learn multipartite ranking functions, we propose methods on the basis of binary decomposition techniques that have previously been used for multi-class and ordinal classification. We compare these methods both analytically and experimentally, not only against each other but also to existing methods applicable to the same problem.
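
As an illustration of how the AUC can be extended to more than two ordered classes, the sketch below computes the fraction of instance pairs with different labels that the ranking function orders correctly (often called the C-index). Whether this is exactly one of the criteria discussed in the paper is an assumption; the function name `c_index` and the data are hypothetical.

```python
def c_index(scores, labels):
    """Fraction of pairs with labels[i] < labels[j] that are ranked correctly.

    A pair counts as correct when the instance with the higher label also
    receives the higher score; tied scores count as half correct.
    With only two classes this reduces to the usual AUC.
    """
    correct = 0.0
    total = 0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if labels[i] < labels[j]:
                total += 1
                if scores[i] < scores[j]:
                    correct += 1.0
                elif scores[i] == scores[j]:
                    correct += 0.5
    if total == 0:
        raise ValueError("need instances from at least two different classes")
    return correct / total


# Hypothetical scores for four instances from three ordered classes (1 < 2 < 3).
value = c_index([0.1, 0.4, 0.3, 0.9], [1, 1, 2, 3])
print(value)
```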

IEEE Transactions on Fuzzy Systems | 2009

Why Fuzzy Decision Trees are Good Rankers

Eyke Hüllermeier; Stijn Vanderlooy

Several fuzzy extensions of decision tree induction, which is an established machine-learning method, have already been proposed in the literature. So far, however, fuzzy decision trees have almost exclusively been used for the performance task of classification. In this paper, we show that a fuzzy extension of decision trees is arguably more useful for another performance task, namely ranking. Roughly, the goal of ranking is to order a set of instances from most likely positive to most likely negative. The motivation for applying fuzzy decision trees to this problem originates from recent investigations of the ranking performance of conventional decision trees. These investigations will be continued and complemented in this paper. Our results reveal some properties that seem to be crucial for a good ranking performance, properties that are better and more naturally offered by fuzzy than by conventional decision trees. Most notably, a fuzzy decision tree produces scores in terms of membership degrees on a fine-granular scale. Using these membership degrees as a ranking criterion, a key problem of conventional decision trees is solved in an elegant way, namely the question of how to break ties between instances in the same leaf or, more generally, between equally scored instances.

Machine Learning | 2008

A critical analysis of variants of the AUC

Stijn Vanderlooy; Eyke Hüllermeier

The area under the ROC curve, or AUC, has been widely used to assess the ranking performance of binary scoring classifiers. Given a sample, the metric considers the ordering of positive and negative instances, i.e., the sign of the corresponding score differences. From a model evaluation and selection point of view, it may appear unreasonable to ignore the absolute value of these differences. For this reason, several variants of the AUC metric that take score differences into account have recently been proposed. In this paper, we present a unified framework for these metrics and provide a formal analysis. We conjecture that, despite their intuitive appeal, actually none of the variants is effective, at least with regard to model evaluation and selection. An extensive empirical analysis corroborates this conjecture. Our findings also shed light on recent research dealing with the construction of AUC-optimizing classifiers.
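
The difference between counting only the sign of a score difference and also using its magnitude can be sketched as follows. The first returned value is the standard AUC computed over all positive-negative pairs. The second is a purely illustrative variant that weights each correctly ordered pair by its score difference; it is an assumption made for this example and is not claimed to be one of the specific variants analyzed in the paper.

```python
def auc_and_weighted(pos_scores, neg_scores):
    """Return (AUC, difference-weighted variant) over all positive-negative pairs."""
    pairs = len(pos_scores) * len(neg_scores)
    if pairs == 0:
        raise ValueError("need at least one positive and one negative score")
    auc = 0.0
    weighted = 0.0
    for p in pos_scores:
        for n in neg_scores:
            diff = p - n
            if diff > 0:
                auc += 1.0        # AUC uses only the sign of the difference
                weighted += diff  # the illustrative variant also uses its size
            elif diff == 0:
                auc += 0.5        # ties count as half a correct pair
    return auc / pairs, weighted / pairs


# Hypothetical classifier scores.
auc, weighted = auc_and_weighted([0.9, 0.6], [0.7, 0.2])
print(auc, weighted)
```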

Intelligent Data Analysis | 2009

The ROC isometrics approach to construct reliable classifiers

Stijn Vanderlooy; Ida G. Sprinkhuizen-Kuyper; Evgueni N. Smirnov; H. Jaap van den Herik

We address the problem of applying machine-learning classifiers in domains where incorrect classifications have severe consequences. In these domains we propose to apply classifiers only when their performance can be defined by the domain expert prior to classification. The classifiers so obtained are called reliable classifiers. In the article we present three main contributions. First, we establish the effect on an ROC curve when ambiguous instances are left unclassified. Second, we propose the ROC isometrics approach to tune and transform a classifier in such a way that it becomes reliable. Third, we provide an empirical evaluation of the approach. From our analysis and experimental evaluation we may conclude that the ROC isometrics approach is an effective and efficient approach to construct reliable classifiers. In addition, a discussion about related work clearly shows the benefits of the approach when compared with existing approaches that also have the option to leave ambiguous instances unclassified.
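
The idea of leaving ambiguous instances unclassified can be illustrated with a simple two-threshold rule on classifier scores. This is only a sketch of abstention in general: it does not implement the ROC isometrics approach, and the thresholds, function names, and data below are hypothetical.

```python
def classify_with_reject(scores, lower, upper):
    """Label 1 if score >= upper, 0 if score <= lower, otherwise abstain (None)."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    preds = []
    for s in scores:
        if s >= upper:
            preds.append(1)
        elif s <= lower:
            preds.append(0)
        else:
            preds.append(None)  # ambiguous instance, left unclassified
    return preds


def evaluate(preds, labels):
    """Return (accuracy on classified instances, fraction left unclassified)."""
    kept = [(p, y) for p, y in zip(preds, labels) if p is not None]
    rejected = 1.0 - len(kept) / len(preds)
    accuracy = sum(p == y for p, y in kept) / len(kept) if kept else float("nan")
    return accuracy, rejected


# Hypothetical scores and true labels.
scores = [0.95, 0.8, 0.55, 0.45, 0.3, 0.05]
labels = [1, 1, 0, 1, 0, 0]
acc_all, rej_all = evaluate(classify_with_reject(scores, 0.5, 0.5), labels)
acc_rej, rej_rej = evaluate(classify_with_reject(scores, 0.4, 0.6), labels)
print(acc_all, rej_all, acc_rej, rej_rej)
```

In this toy data, widening the band between the two thresholds trades coverage for accuracy on the instances that are still classified.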

Neuroinformatics | 2008

Non-parametric Algorithmic Generation of Neuronal Morphologies

Benjamin Torben-Nielsen; Stijn Vanderlooy; Eric O. Postma

Generation algorithms allow for the generation of Virtual Neurons (VNs) from a small set of morphological properties. The set describes the morphological properties of real neurons in terms of statistical descriptors such as the number of branches and segment lengths (among others). The majority of reconstruction algorithms use the observed properties to estimate the parameters of a priori fixed probability distributions in order to construct statistical descriptors that fit well with the observed data. In this article, we present a non-parametric generation algorithm based on kernel density estimators (KDEs). The new algorithm is called KDE-Neuron and has three advantages over parametric reconstruction algorithms: (1) no a priori specifications about the distributions underlying the real data, (2) peculiarities in the biological data will be reflected in the VNs, and (3) ability to reconstruct different cell types. We experimentally generated motor neurons and granule cells, and statistically validated the obtained results. Moreover, we assessed the quality of the prototype data set and observed that our generated neurons are as good as the prototype data in terms of the used statistical descriptors. The opportunities and limitations of data-driven algorithmic reconstruction of neurons are discussed.
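
Sampling from a kernel density estimator, the non-parametric ingredient described above, can be sketched in a few lines. The sketch assumes a one-dimensional Gaussian kernel with a hand-picked bandwidth and hypothetical segment-length measurements; it is not the KDE-Neuron algorithm itself, which is not specified in this abstract.

```python
import random


def sample_from_kde(observations, bandwidth, n_samples, seed=0):
    """Draw samples from a Gaussian KDE fitted to one-dimensional observations.

    Sampling from a KDE amounts to picking one observation uniformly at
    random and adding kernel noise with standard deviation `bandwidth`.
    """
    if not observations:
        raise ValueError("need at least one observation")
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        center = rng.choice(observations)
        samples.append(rng.gauss(center, bandwidth))
    return samples


# Hypothetical measured segment lengths (arbitrary units).
segment_lengths = [12.0, 15.5, 9.8, 20.1, 14.2]
generated = sample_from_kde(segment_lengths, bandwidth=1.0, n_samples=5000)
print(sum(generated) / len(generated))
```

Because no distribution family is fixed in advance, peculiarities of the observed data, such as multiple modes, carry over to the generated values. A practical implementation would also need to handle constraints such as non-negative lengths, which this sketch ignores.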

Journal of Computational Biology | 2007

On the reliable identification of plant sequences containing a polyadenylation site

Ilkka Havukkala; Stijn Vanderlooy

It is a challenging task to predict with high reliability whether plant genomic sequences contain a polyadenylation (polyA) site or not. In this paper, we solve the task by means of a systematic machine-learning procedure applied to a dataset of 1000 Arabidopsis thaliana sequences flanking polyA sites. Our procedure consists of three steps. In the first step, we extract informative features from the sequences using the highly informative k-mer windows approach. Experiments with five classifiers show that the best performance is approximately 83%. In the second step, we improve performance to 95% by reducing the number of features using linear discriminant analysis, followed by applying the linear discriminant classifier. In the third step, we apply the transductive confidence machines approach and the receiver operating characteristic isometrics approach. The resulting two classifiers enable presetting any desired performance by dealing carefully with sequences for which it is unclear whether they contain polyA sites or not. For example, in our case study, we obtain 99% performance by leaving 26% of the sequences unclassified, and 100% performance by leaving 40% of the sequences unclassified. This is clearly useful for experimental verification of putative polyA sites in the laboratory. The novel methods in our machine-learning procedure should find applications in several areas of bioinformatics.
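
The feature-extraction step can be illustrated with plain overlapping k-mer counts. The details of the k-mer windows approach used in the paper are not given in this abstract, so the sketch below is an assumption that shows only the basic idea; the function name and the example sequence are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count all overlapping substrings of length k in a nucleotide sequence."""
    seq = sequence.upper()
    if k <= 0 or k > len(seq):
        raise ValueError("k must be between 1 and the sequence length")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical short sequence.
counts = kmer_counts("AATAAA", 2)
print(counts)
```

The counts for a fixed list of k-mers can then be arranged into a feature vector and passed to any classifier.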

European Conference on Principles of Data Mining and Knowledge Discovery | 2007

A comparison of two approaches to classify with guaranteed performance

Stijn Vanderlooy; Ida G. Sprinkhuizen-Kuyper

The recently introduced transductive confidence machine approach and the ROC isometrics approach provide a framework to extend classifiers such that their performance can be set by the user prior to classification. In this paper we use the k-nearest neighbour classifier in order to provide an extensive empirical evaluation and comparison of the approaches. From our results we may conclude that the approaches are competing and promising generally applicable machine learning tools.

European Conference on Machine Learning | 2008