Willem Waegeman | Researchain

Publication

Featured research published by Willem Waegeman.

Machine Learning | 2012

On label dependence and loss minimization in multi-label classification

Krzysztof Dembczyński; Willem Waegeman; Weiwei Cheng; Eyke Hüllermeier

Most multi-label classification (MLC) methods proposed in recent years are intended to exploit, in one way or another, dependencies between the class labels. When such methods are compared with simple binary relevance learning as a baseline, any gain in performance is normally explained by the fact that the baseline ignores such dependencies. Without questioning the correctness of such studies, one has to admit that a blanket explanation of that kind hides many subtle details, and indeed, the underlying mechanisms and true reasons for the improvements reported in experimental studies are rarely laid bare. Rather than proposing yet another MLC algorithm, the aim of this paper is to elaborate more closely on the idea of exploiting label dependence, thereby contributing to a better understanding of MLC. Adopting a statistical perspective, we claim that two types of label dependence should be distinguished, namely conditional and marginal dependence. Subsequently, we present three scenarios in which the exploitation of one of these types of dependence may boost the predictive performance of a classifier. In this regard, a close connection with loss minimization is established, showing that the benefit of exploiting label dependence also depends on the type of loss to be minimized. Concrete theoretical results are presented for two representative loss functions, namely the Hamming loss and the subset 0/1 loss. In addition, we give an overview of state-of-the-art decomposition algorithms for MLC and we try to reveal the reasons for their effectiveness. Our conclusions are supported by carefully designed experiments on synthetic and benchmark data.
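
As a minimal illustration of why the choice of loss matters, the sketch below computes the risk-minimizing prediction under the Hamming loss and under the subset 0/1 loss for a single instance. The conditional distribution `joint` is an invented toy example, and the helper names are assumptions of this sketch, not code from the paper.

```python
from itertools import product


def hamming_loss(y_true, y_pred):
    """Fraction of individual labels that are predicted incorrectly."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def subset_zero_one_loss(y_true, y_pred):
    """1 if the predicted label vector differs in any position, else 0."""
    return int(tuple(y_true) != tuple(y_pred))


def expected_loss(loss, prediction, joint):
    """Expected loss of one prediction under a distribution over label vectors."""
    return sum(prob * loss(labels, prediction) for labels, prob in joint.items())


def risk_minimizer(loss, joint, n_labels):
    """Brute-force search over all binary label vectors (toy sizes only)."""
    candidates = product([0, 1], repeat=n_labels)
    return min(candidates, key=lambda y: expected_loss(loss, y, joint))


# Hypothetical conditional distribution P(y1, y2 | x) for one instance.
joint = {(0, 0): 0.4, (1, 0): 0.3, (1, 1): 0.3}

best_hamming = risk_minimizer(hamming_loss, joint, 2)
best_subset = risk_minimizer(subset_zero_one_loss, joint, 2)
print(best_hamming, best_subset)
```

In this toy case the Hamming-optimal prediction is the vector of marginal modes, (1, 0), while the subset 0/1-optimal prediction is the joint mode, (0, 0), so the two losses call for different predictions on the same instance.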

Systematic and Applied Microbiology | 2011

Bacterial species identification from MALDI-TOF mass spectra through data analysis and machine learning

Katrien De Bruyne; Bram Slabbinck; Willem Waegeman; Paul Vauterin; Bernard De Baets; Peter Vandamme

At present, MALDI-TOF MS methodology for the characterization of bacteria varies considerably, for example in sample preparation methods, matrix solutions, organic solvents, acquisition methods and data analysis methods. After evaluation of the existing methods, a standard protocol was developed to generate MALDI-TOF mass spectra from a collection of reference strains belonging to the genera Leuconostoc, Fructobacillus and Lactococcus. Bacterial cells were harvested after 24 h of growth at 28°C on the media MRS or TSA. Mass spectra were generated, using the CHCA matrix combined with a 50:48:2 acetonitrile:water:trifluoroacetic acid matrix solution, and analyzed by the cell smear method and the cell extract method. After a data preprocessing step, the resulting high quality data set was used for PCA, distance calculation and multi-dimensional scaling. Using these analyses, species-specific information in the MALDI-TOF mass spectra could be demonstrated. As a next step, the spectra, as well as the binary character set derived from these spectra, were successfully used for species identification within the genera Leuconostoc, Fructobacillus, and Lactococcus. Using MALDI-TOF MS identification libraries for Leuconostoc and Fructobacillus strains, 84% of the MALDI-TOF mass spectra were correctly identified at the species level. Similarly, the same analysis strategy within the genus Lactococcus resulted in 94% correct identifications, taking species and subspecies levels into consideration. Finally, two machine learning techniques were evaluated as alternative species identification tools. The two techniques, support vector machines and random forests, resulted in accuracies between 94% and 98% for the identification of Leuconostoc and Fructobacillus species, respectively.

Pattern Recognition Letters | 2008

ROC analysis in ordinal regression learning

Willem Waegeman; Bernard De Baets; Luc Boullart

Nowadays the area under the receiver operating characteristics (ROC) curve, which corresponds to the Wilcoxon-Mann-Whitney test statistic, is increasingly used as a performance measure for binary classification systems. In this article we present a natural generalization of this concept for more than two ordered categories, a setting known as ordinal regression. Our extension of the Wilcoxon-Mann-Whitney statistic corresponds to the volume under an r-dimensional surface (VUS) for r ordered categories and differs from extensions recently proposed for multi-class classification. VUS evaluates the ranking returned by an ordinal regression model rather than measuring the error rate, which is especially advantageous with skewed class or cost distributions. We give theoretical and experimental evidence of the advantages and different behavior of VUS compared to error rate, mean absolute error and other ranking-based performance measures for ordinal regression. The results demonstrate that the models produced by ordinal regression algorithms minimizing the error rate or a preference learning based loss do not necessarily impose a good ranking on the data.
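
The idea of extending the Wilcoxon-Mann-Whitney statistic to r ordered categories can be sketched by brute force: draw one object from each category and count the fraction of such r-tuples whose predicted scores are strictly increasing. The function name and the convention that ties count as incorrect are assumptions of this sketch; enumerating all tuples is only practical for tiny data sets.

```python
from itertools import product


def volume_under_surface(scores, labels):
    """Fraction of r-tuples (one object per ordered class) ranked correctly.

    Ties between scores are counted as incorrect in this sketch.
    """
    classes = sorted(set(labels))
    # Group the predicted scores by class, in class order.
    groups = [[s for s, y in zip(scores, labels) if y == c] for c in classes]
    total = 0
    correct = 0
    for combo in product(*groups):
        total += 1
        if all(a < b for a, b in zip(combo, combo[1:])):
            correct += 1
    return correct / total


# Toy example with three ordered classes and two objects per class.
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
labels = [1, 1, 2, 2, 3, 3]
print(volume_under_surface(scores, labels))
```

With only two classes the same function reduces to the usual pairwise estimate of the area under the ROC curve.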

Computational Statistics & Data Analysis | 2011

An experimental comparison of cross-validation techniques for estimating the area under the ROC curve

Antti Airola; Tapio Pahikkala; Willem Waegeman; Bernard De Baets; Tapio Salakoski

Reliable estimation of the classification performance of inferred predictive models is difficult when working with small data sets. Cross-validation is in this case a typical strategy for estimating the performance. However, many standard approaches to cross-validation suffer from extensive bias or variance when the area under the ROC curve (AUC) is used as the performance measure. This issue is explored through an extensive simulation study. Leave-pair-out cross-validation is proposed for conditional AUC-estimation, as it is almost unbiased, and its deviation variance is as low as that of the best alternative approaches. When using regularized least-squares based learners, efficient algorithms exist for calculating the leave-pair-out cross-validation estimate.
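
Leave-pair-out cross-validation can be sketched as follows: every positive-negative pair is held out in turn, a model is trained on the remaining examples, and the estimate is the fraction of held-out pairs in which the positive example receives the higher score. The `centroid_scorer` below is a deliberately simple stand-in learner chosen for this sketch; it is not the regularized least-squares learner mentioned in the abstract, and none of the efficient algorithms are reproduced here.

```python
def centroid_scorer(train_X, train_y, test_X):
    """Stand-in learner: project onto (positive mean - negative mean)."""
    dim = len(train_X[0])
    pos = [x for x, y in zip(train_X, train_y) if y == 1]
    neg = [x for x, y in zip(train_X, train_y) if y == 0]
    mean_pos = [sum(x[d] for x in pos) / len(pos) for d in range(dim)]
    mean_neg = [sum(x[d] for x in neg) / len(neg) for d in range(dim)]
    direction = [p - n for p, n in zip(mean_pos, mean_neg)]
    return [sum(w * v for w, v in zip(direction, x)) for x in test_X]


def leave_pair_out_auc(X, y, fit_predict):
    """AUC estimate obtained by holding out each positive-negative pair."""
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [i for i, label in enumerate(y) if label == 0]
    if len(pos_idx) < 2 or len(neg_idx) < 2:
        raise ValueError("need at least two examples of each class")
    total = 0.0
    for i in pos_idx:
        for j in neg_idx:
            train = [k for k in range(len(y)) if k not in (i, j)]
            s_pos, s_neg = fit_predict(
                [X[k] for k in train], [y[k] for k in train], [X[i], X[j]]
            )
            if s_pos > s_neg:
                total += 1.0
            elif s_pos == s_neg:
                total += 0.5  # ties count as half a correct ranking
    return total / (len(pos_idx) * len(neg_idx))


# Toy one-dimensional data in which the classes are perfectly separable.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
print(leave_pair_out_auc(X, y, centroid_scorer))
```

This naive version retrains the model once per pair, so its cost grows with the product of the class sizes.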

The ISME Journal | 2017

Absolute quantification of microbial taxon abundances

Ruben Props; Frederiek-Maarten Kerckhof; Peter Rubbens; Jo De Vrieze; Emma Hernandez Sanabria; Willem Waegeman; Pieter Monsieurs; Frederik Hammes; Nico Boon

High-throughput amplicon sequencing has become a well-established approach for microbial community profiling. Correlating shifts in the relative abundances of bacterial taxa with environmental gradients is the goal of many microbiome surveys. As the abundances generated by this technology are semi-quantitative by definition, the observed dynamics may not accurately reflect those of the actual taxon densities. We combined the sequencing approach (16S rRNA gene) with robust single-cell enumeration technologies (flow cytometry) to quantify the absolute taxon abundances. A detailed longitudinal analysis of the absolute abundances resulted in distinct abundance profiles that were less ambiguous and expressed in units that can be directly compared across studies. We further provide evidence that the enrichment of taxa (increase in relative abundance) does not necessarily relate to the outgrowth of taxa (increase in absolute abundance). Our results highlight that both relative and absolute abundances should be considered for a comprehensive biological interpretation of microbiome surveys.
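
The distinction between enrichment and outgrowth can be illustrated with a small calculation, assuming that absolute abundance is obtained by scaling each taxon's relative abundance by the total cell density of the sample. The read counts, cell densities, and function name below are invented for illustration and are not data from the study.

```python
def absolute_abundances(read_counts, total_cells_per_ml):
    """Scale relative abundances (from read counts) by the total cell density."""
    total_reads = sum(read_counts.values())
    return {
        taxon: count / total_reads * total_cells_per_ml
        for taxon, count in read_counts.items()
    }


# Hypothetical samples at two time points: sequencing reads and total cell counts.
t0 = absolute_abundances({"taxon_A": 200, "taxon_B": 800}, total_cells_per_ml=1e6)
t1 = absolute_abundances({"taxon_A": 500, "taxon_B": 500}, total_cells_per_ml=2e5)

# taxon_A rises from 20% to 50% of the reads (enrichment), yet its absolute
# density falls because the community as a whole shrank (no outgrowth).
print(t0["taxon_A"], t1["taxon_A"])
```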

European Conference on Artificial Intelligence | 2012

An analysis of chaining in multi-label classification

Krzysztof Dembczyński; Willem Waegeman; Eyke Hüllermeier

The idea of classifier chains has recently been introduced as a promising technique for multi-label classification. However, despite being intuitively appealing and showing strong performance in empirical studies, still very little is known about the main principles underlying this type of method. In this paper, we provide a detailed probabilistic analysis of classifier chains from a risk minimization perspective, thereby helping to gain a better understanding of this approach. As a main result, we clarify that the original chaining method seeks to approximate the joint mode of the conditional distribution of label vectors in a greedy manner. As a result of a theoretical regret analysis, we conclude that this approach can perform quite poorly in terms of subset 0/1 loss. Therefore, we present an enhanced inference procedure for which the worst-case regret can be upper-bounded far more tightly. In addition, we show that a probabilistic variant of chaining, which can be utilized for any loss function, becomes tractable by using Monte Carlo sampling. Finally, we present experimental results confirming the validity of our theoretical findings.
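
The gap between greedy chaining and the joint mode can be seen in a toy case. The sketch below assumes access to the exact conditional probabilities, which a real classifier chain would only estimate with trained classifiers; the distribution and helper names are invented for illustration.

```python
def conditional(joint, prefix, value):
    """P(y_k = value | y_1..y_{k-1} = prefix), computed from the joint table."""
    k = len(prefix)
    numerator = sum(p for y, p in joint.items() if y[:k] == prefix and y[k] == value)
    denominator = sum(p for y, p in joint.items() if y[:k] == prefix)
    return numerator / denominator


def greedy_chain(joint, n_labels):
    """Fix each label in turn to its most probable value given earlier choices."""
    prefix = ()
    for _ in range(n_labels):
        best = max((0, 1), key=lambda v: conditional(joint, prefix, v))
        prefix += (best,)
    return prefix


def joint_mode(joint):
    """Most probable complete label vector."""
    return max(joint, key=joint.get)


# Hypothetical conditional distribution P(y1, y2 | x) for one instance.
joint = {(0, 0): 0.4, (1, 0): 0.35, (1, 1): 0.25}

print(greedy_chain(joint, 2), joint_mode(joint))
```

Here the greedy pass commits to y1 = 1 because its marginal probability is 0.6, and ends at (1, 0) with probability 0.35, whereas the joint mode is (0, 0) with probability 0.4.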

Computational Statistics & Data Analysis | 2012

Learning partial ordinal class memberships with kernel-based proportional odds models

Jan Verwaeren; Willem Waegeman; Bernard De Baets

As an extension of multi-class classification, machine learning algorithms have been proposed that are able to deal with situations in which the class labels are defined in a non-crisp way. Objects exhibit in that sense a degree of membership to several classes. In a similar setting, models are developed here for classification problems where an order relation is specified on the classes (i.e., non-crisp ordinal regression problems). As for traditional (crisp) ordinal regression problems, it is argued that the order relation on the classes should be reflected by the model structure as well as the performance measure used to evaluate the model. These arguments lead to a natural extension of the well-known proportional odds model for non-crisp ordinal regression problems, in which the underlying latent variable is not necessarily restricted to the class of linear models (by using kernel methods).
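
For orientation, the standard (crisp) proportional odds model maps a latent value and a set of increasing thresholds to class probabilities through cumulative logistic terms. The sketch below shows only this basic building block; the latent value could come from a linear or kernel-based function, and the extension to partial class memberships described in the abstract is not reproduced. The function name and numbers are assumptions of this sketch.

```python
import math


def proportional_odds_probs(latent, thresholds):
    """Class probabilities P(Y = k | x) for ordered classes.

    Uses P(Y <= k | x) = sigmoid(threshold_k - latent) with increasing thresholds;
    r classes require r - 1 thresholds.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    cumulative = [1.0 / (1.0 + math.exp(-(t - latent))) for t in thresholds] + [1.0]
    # Differences of consecutive cumulative probabilities give per-class probabilities.
    return [cumulative[0]] + [c - p for p, c in zip(cumulative, cumulative[1:])]


# Three ordered classes, hence two thresholds; the latent value is illustrative.
probs = proportional_odds_probs(latent=0.3, thresholds=[-1.0, 1.0])
print(probs)
```

Raising the latent value shifts probability mass toward the higher classes, which is how the order relation on the classes is reflected in the model structure.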

Research in Microbiology | 2013

Exploration and prediction of interactions between methanotrophs and heterotrophs

Michiel Stock; Sven Hoefman; Frederiek-Maarten Kerckhof; Nico Boon; Paul De Vos; Bernard De Baets; Kim Heylen; Willem Waegeman

Methanotrophs can form the basis of a methane-driven food web on which heterotrophic microorganisms can feed. In return, these heterotrophs can stimulate growth of methanotrophs in co-culture by providing growth additives. However, only a few specific interactions are currently known. We incubated nine methanotrophs with 25 heterotrophic strains in a pairwise miniaturized co-cultivation setup. Through principal component analysis and k-means clustering, methanotrophs and heterotrophs could be grouped according to their interaction behaviour, suggesting strain-dependent methanotroph-heterotroph complementarity. Co-cultivation significantly enhanced the growth parameters of three methanotrophs. This was most pronounced for Methylomonas sp. M5, with a threefold increase in maximum density and a fourfold increase in maximum increase in density in co-culture with Cupriavidus taiwanensis LMG 19424. In contrast, co-cultivation with Methylobacterium radiotolerans LMG 2269 and Pseudomonas aeruginosa LMG 12228 inhibited growth of most methanotrophs. Functional genomic analysis suggested the importance of vitamin metabolism for co-cultivation success. The generated data set was then successfully exploited as a proof-of-principle for predictive modelling of co-culture responses based on other interactions of the same heterotrophs and methanotrophs, yielding values of the area under the receiver operating characteristic curve of 0.73 upon 50% missing values for the maximum increase in density parameter. As such, these modelling-based tools were shown to hold great promise in reducing the amount of data that needs to be generated when conducting large co-cultivation studies.

European Journal of Operational Research | 2010

Learning intransitive reciprocal relations with kernel methods

Tapio Pahikkala; Willem Waegeman; Evgeni Tsivtsivadze; Tapio Salakoski; Bernard De Baets

In different fields like decision making, psychology, game theory and biology, it has been observed that paired-comparison data like preference relations defined by humans and animals can be intransitive. Intransitive relations cannot be modeled with existing machine learning methods like ranking models, because these models exhibit strong transitivity properties. More specifically, in a stochastic context, where often the reciprocity property characterizes probabilistic relations such as choice probabilities, it has been formally shown that ranking models always satisfy the well-known strong stochastic transitivity property. Given this limitation of ranking models, we present a new kernel function that together with the regularized least-squares algorithm is capable of inferring intransitive reciprocal relations in problems where transitivity violations cannot be considered as noise. In this approach it is the kernel function that defines the transition from learning transitive to learning intransitive relations, and the Kronecker-product is introduced for representing the latter type of relations. In addition, we empirically demonstrate on two benchmark problems, one in game theory and one in theoretical biology, that our algorithm outperforms methods not capable of learning intransitive reciprocal relations.

Machine Learning | 2013

Efficient regularized least-squares algorithms for conditional ranking on relational data

Tapio Pahikkala; Antti Airola; Michiel Stock; Bernard De Baets; Willem Waegeman

In domains like bioinformatics, information retrieval and social network analysis, one can find learning tasks where the goal consists of inferring a ranking of objects, conditioned on a particular target object. We present a general kernel framework for learning conditional rankings from various types of relational data, where rankings can be conditioned on unseen data objects. We propose efficient algorithms for conditional ranking by optimizing squared regression and ranking loss functions. We show theoretically that learning with the ranking loss is likely to generalize better than learning with the regression loss. Further, we prove that symmetry or reciprocity properties of relations can be efficiently enforced in the learned models. Experiments on synthetic and real-world data illustrate that the proposed methods deliver state-of-the-art performance in terms of predictive power and computational efficiency. We also show empirically that incorporating symmetry or reciprocity properties can improve the generalization performance.
