Albrecht Zimmermann
Katholieke Universiteit Leuven
Publication
Featured research published by Albrecht Zimmermann.
international conference on data mining | 2007
Björn Bringmann; Albrecht Zimmermann
Constrained pattern mining extracts patterns based on their individual merit. Usually this results in far more patterns than a human expert or a machine learning technique could make use of. Often different patterns or combinations of patterns cover a similar subset of the examples, thus being redundant and not carrying any new information. To remove the redundant information contained in such pattern sets, we propose a general heuristic approach for selecting a small subset of patterns. We identify several selection techniques for use in this general algorithm and evaluate them on several data sets. The results show that the approach succeeds in severely reducing the number of patterns, while at the same time apparently retaining much of the original information. Additionally, the experiments show that reducing the pattern set indeed improves the quality of classification results. Both results show that the approach is very well suited for the goals we aim at.
Knowledge and Information Systems | 2009
Björn Bringmann; Albrecht Zimmermann
Constrained pattern mining extracts patterns based on their individual merit. Usually this results in far more patterns than a human expert or a machine learning technique could make use of. Often different patterns or combinations of patterns cover a similar subset of the examples, thus being redundant and not carrying any new information. To remove the redundant information contained in such pattern sets, we propose two general heuristic algorithms, Bouncer and Picker, for selecting a small subset of patterns. We identify several selection techniques for use in these general algorithms and evaluate them on several data sets. The results show that both techniques succeed in severely reducing the number of patterns, while at the same time apparently retaining much of the original information. Additionally, the experiments show that reducing the pattern set indeed improves the quality of classification results. Both results show that the developed solutions are very well suited for the goals we aim at.
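The abstracts above do not spell out the selection criteria, so the following Python sketch only illustrates the general idea under an assumption of ours: each pattern is represented by the set of example indices it covers, and a candidate is kept only if it splits at least one group of examples that the already-selected patterns cannot tell apart. The functions `partition` and `select_patterns` are hypothetical names, and this is not claimed to be the exact Bouncer or Picker procedure.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple


def partition(patterns: Sequence[FrozenSet[int]], n_examples: int) -> Dict[Tuple[bool, ...], List[int]]:
    """Group example indices by their binary encoding under `patterns`.

    Each pattern is given as the set of example indices it covers."""
    blocks: Dict[Tuple[bool, ...], List[int]] = {}
    for ex in range(n_examples):
        key = tuple(ex in p for p in patterns)
        blocks.setdefault(key, []).append(ex)
    return blocks


def select_patterns(candidates: Sequence[FrozenSet[int]], n_examples: int) -> List[FrozenSet[int]]:
    """Greedily keep a candidate only if it refines the partition induced by
    the patterns selected so far (a hypothetical redundancy criterion)."""
    selected: List[FrozenSet[int]] = []
    n_blocks = len(partition(selected, n_examples))  # one block if there are examples
    for p in candidates:
        new_n_blocks = len(partition(selected + [p], n_examples))
        if new_n_blocks > n_blocks:  # p separates examples the others could not
            selected.append(p)
            n_blocks = new_n_blocks
    return selected


cands = [frozenset({0, 1, 2}), frozenset({0, 1, 2}), frozenset({0, 1}),
         frozenset({3}), frozenset({2, 3})]
# Keeps the 1st, 3rd and 4th candidate; the duplicate and {2, 3} add no new split.
print(select_patterns(cands, 5))
```

Processing candidates in a fixed order makes the result order-dependent, which is typical of greedy heuristics of this kind.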
Machine Learning | 2009
Albrecht Zimmermann; Luc De Raedt
We introduce the problem of cluster-grouping and show that it can be considered a subtask in several important data mining tasks, such as subgroup discovery, mining correlated patterns, clustering and classification. The algorithm CG for solving cluster-grouping problems is then introduced, and it is incorporated as a component in several existing and novel algorithms for tackling subgroup discovery, clustering and classification. The resulting systems are empirically compared to state-of-the-art systems such as CN2, CBA, Ripper, Autoclass and CobWeb. The results indicate that the CG algorithm can be useful as a generic local pattern mining component in a wide variety of data mining and machine learning algorithms.
european conference on machine learning | 2010
Albrecht Zimmermann; Björn Bringmann; Ulrich Rückert
In structure-activity relationship (SAR) modelling, one aims to find classifiers that predict the biological or chemical activity of a compound from its molecular graph. Many approaches to SAR use sets of binary substructure features, which test for the occurrence of certain substructures in the molecular graph. As an alternative to enumerating very large sets of frequent patterns, numerous pattern set mining and pattern set selection techniques have been proposed. Existing approaches can be broadly classified into those that focus on minimizing correspondences, that is, the number of pairs of training instances from different classes with identical encodings, and those that focus on maximizing the number of equivalence classes, that is, unique encodings in the training data. In this paper we evaluate a number of techniques to investigate which criterion is a better indicator of predictive accuracy. We find that minimizing correspondences is a necessary but not sufficient condition for good predictive accuracy, that equivalence classes are a better indicator of success, and that it is important to have a good match between training set size and pattern set size. Based on these results we propose a new, improved algorithm which performs local minimization of correspondences, yet evaluates the effect of patterns on equivalence classes globally. Experiments demonstrate its efficacy and its superior run-time behavior.
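Both criteria compared in this paper can be computed directly from the binary encodings. A minimal Python sketch, assuming each molecule is reduced to a set of substructure identifiers (a real SAR pipeline would instead test subgraph occurrence in the molecular graph); `encode`, `equivalence_classes` and `correspondences` are illustrative names, not taken from the paper:

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Encoding = Tuple[bool, ...]


def encode(instances: Sequence[Set[Hashable]], patterns: Sequence[Hashable]) -> List[Encoding]:
    """Binary substructure features: does instance i contain pattern j?"""
    return [tuple(p in inst for p in patterns) for inst in instances]


def equivalence_classes(codes: Sequence[Encoding]) -> int:
    """Number of unique encodings in the training data."""
    return len(set(codes))


def correspondences(codes: Sequence[Encoding], labels: Sequence[Hashable]) -> int:
    """Number of pairs of instances from different classes with identical encodings."""
    by_code: Dict[Encoding, Counter] = defaultdict(Counter)
    for code, y in zip(codes, labels):
        by_code[code][y] += 1
    total = 0
    for counts in by_code.values():
        n = sum(counts.values())
        all_pairs = n * (n - 1) // 2
        same_class_pairs = sum(c * (c - 1) // 2 for c in counts.values())
        total += all_pairs - same_class_pairs  # pairs that differ in class
    return total


# Toy data: each "molecule" is reduced to the set of substructure ids it contains.
mols = [{"a", "b"}, {"a"}, {"a", "b"}, {"c"}]
labels = [1, 0, 0, 1]
for pats in (["a"], ["a", "b"]):
    codes = encode(mols, pats)
    print(pats, equivalence_classes(codes), correspondences(codes, labels))
```

On this toy data, adding pattern "b" raises the number of equivalence classes from 2 to 3 and lowers the number of correspondences from 2 to 1.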
discovery science | 2008
Albrecht Zimmermann
Decision trees are among the most effective and interpretable classification algorithms, while ensemble techniques have been proven to alleviate problems regarding over-fitting and variance. On the other hand, decision trees tend to lack stability under small changes in the data, whereas an ensemble of trees is challenging to interpret. We propose the technique of Ensemble-Trees, which uses ensembles of rules within the test nodes to reduce over-fitting and variance effects. Validating the technique experimentally, we find improvements in performance compared to ensembles of pruned trees, but also that the technique does less to reduce structural instability than could be expected.
international conference on data mining | 2011
Tias Guns; Siegfried Nijssen; Albrecht Zimmermann; Luc De Raedt
Recently, constraint programming has been proposed as a declarative framework for constraint-based pattern mining. In constraint programming, a problem is modelled in terms of constraints, and search is done by a general solver. Similar to most pattern mining algorithms, these solvers typically employ exhaustive depth-first search, where constraints are used to prune the search space and make the search viable. In this paper we investigate the use of a similar declarative approach for the problem of pattern set mining, in which one searches for a small and useful set of patterns. In contrast to pattern mining, however, exhaustive search is not common in pattern set mining, since the search space is often far too large to make such an approach practical. We investigate an approach that aims to make general pattern set mining feasible by using a recently developed general solver that supports exhaustive as well as heuristic search. The key idea in this solver is that, in addition to a declarative specification of the constraints, a high-level declarative description of the search is given. By separating the model and the search from the solver, the approach offers the advantage of reusing constraints and search strategies declaratively, while also allowing fast heuristic search.
intelligent data analysis | 2014
Albrecht Zimmermann
Frequent episode mining has been proposed as a data mining task for recovering sequential patterns from temporal data sequences, and several approaches have been introduced over the last fifteen years. These techniques have, however, never been compared against each other in a large-scale comparison, mainly because the existing real-life data is prevented from entering the public domain by non-disclosure agreements. We perform such a comparison for the first time. To get around the problem of proprietary data, we employ a data generator that is based on a number of real-life observations and capable of generating data that mimics the real-life data at our disposal. Artificial data offers the additional advantage that the underlying patterns are known, which is typically not the case for real-life data. Thus, we can evaluate for the first time the ability of mining approaches to recover patterns that are embedded in noise. Our experiments indicate that temporal constraints affect the effectiveness of episode mining more than occurrence semantics do. They also indicate that recovering underlying patterns when several phenomena are present at the same time is rather difficult, and that there is a need to develop better significance measures and techniques for dealing with sets of episodes.
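To make the interplay of occurrence semantics and temporal constraints concrete, here is a sketch of one well-known semantics, minimal occurrences of a serial episode, combined with a maximum-span constraint. It assumes events are (time, type) pairs sorted by strictly increasing time; the function name and the choice of this particular semantics are ours for illustration and do not describe the set of methods compared in the paper.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, str]  # (timestamp, event type)


def minimal_occurrences(events: Sequence[Event], episode: Sequence[str],
                        max_span: int) -> List[Tuple[int, int]]:
    """(start, end) times of minimal occurrences of a serial episode whose
    span end - start is at most max_span. Assumes strictly increasing times."""
    result: List[Tuple[int, int]] = []
    best_start = -1  # latest start index of any occurrence ending earlier
    for j, (_, etype) in enumerate(events):
        if etype != episode[-1]:
            continue
        # Match the remaining episode symbols backwards, always taking the
        # latest possible event, which yields the latest start ending at j.
        k, i, start = len(episode) - 2, j - 1, j
        while k >= 0 and i >= 0:
            if events[i][1] == episode[k]:
                start = i
                k -= 1
            i -= 1
        if k >= 0:
            continue  # no occurrence of the episode ends at index j
        if start > best_start:  # no earlier occurrence lies inside [start, j]
            best_start = start
            t_start, t_end = events[start][0], events[j][0]
            if t_end - t_start <= max_span:  # temporal (window) constraint
                result.append((t_start, t_end))
    return result


events = [(1, "A"), (2, "B"), (3, "A"), (4, "C"), (5, "B"), (9, "A"), (15, "B")]
print(minimal_occurrences(events, ["A", "B"], max_span=10))  # [(1, 2), (3, 5), (9, 15)]
print(minimal_occurrences(events, ["A", "B"], max_span=3))   # [(1, 2), (3, 5)]
```

Tightening `max_span` from 10 to 3 drops the occurrence (9, 15) while the occurrence semantics stay the same, which shows how a temporal constraint alone can change an episode's support.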
knowledge discovery and data mining | 2013
Albrecht Zimmermann
Itemset mining approaches, while having been studied for more than 15 years, have been evaluated only on a handful of data sets. In particular, they have never been evaluated on data sets for which the ground truth was known. Thus, it is currently unknown whether itemset mining techniques actually recover underlying patterns. Since the weakness of the algorithmically attractive support/confidence framework became apparent early on, a number of interestingness measures have been proposed. Their utility, however, has not been evaluated, except for attempts to establish congruence with expert opinions. Using an extension of the Quest generator proposed in the original itemset mining paper, we propose to evaluate these measures objectively for the first time, showing how many non-relevant patterns slip through the cracks.
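For readers unfamiliar with the support/confidence framework, the following Python sketch computes support and confidence for an association rule, together with lift as one example of an interestingness measure. Which measures the paper evaluates is not stated in this abstract, so lift is our choice for illustration, and the function names are hypothetical.

```python
from typing import AbstractSet, Sequence

Transactions = Sequence[AbstractSet[str]]


def support(itemset: AbstractSet[str], transactions: Transactions) -> float:
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent: AbstractSet[str], consequent: AbstractSet[str],
               transactions: Transactions) -> float:
    """Estimate of P(consequent | antecedent); assumes the antecedent occurs."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def lift(antecedent: AbstractSet[str], consequent: AbstractSet[str],
         transactions: Transactions) -> float:
    """Confidence relative to the consequent's baseline support; 1.0 means independence."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


tx = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(round(confidence({"bread"}, {"milk"}, tx), 3))  # 0.667
print(round(lift({"bread"}, {"milk"}, tx), 3))        # 0.889
```

The rule {bread} → {milk} has a confidence of about 0.67, yet its lift is below 1, meaning bread and milk co-occur less often than independence would predict. This is the kind of weakness of the support/confidence framework that motivated alternative interestingness measures.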
Inductive Databases and Constraint-Based Data Mining | 2010
Björn Bringmann; Siegfried Nijssen; Albrecht Zimmermann
Using pattern mining techniques for building predictive models is currently a popular topic of research. The aim of these techniques is to obtain classifiers with better predictive performance than greedily constructed models, as well as to allow the construction of predictive models for data not represented as attribute-value vectors. In this chapter we provide an overview of recent techniques we developed for integrating pattern mining and classification tasks. The techniques span the entire range, from approaches that select relevant patterns from a previously mined set for propositionalization of the data, through inducing pattern-based rule sets, to algorithms that integrate pattern mining and model construction. We also provide an overview of the algorithms most closely related to our approaches in order to put our techniques in context.
intelligent information systems | 2015
Albrecht Zimmermann
Itemset mining approaches, while having been studied for more than 15 years, have been evaluated only on a handful of data sets. In particular, they have never been evaluated on data sets for which the ground truth was known. Thus, it is currently unknown whether itemset mining techniques actually recover underlying patterns. Since the weakness of the algorithmically attractive support/confidence framework became apparent early on, a number of interestingness measures have been proposed. Their utility, however, has not been evaluated, except for attempts to establish congruence with expert opinions. Using an extension of the Quest generator proposed in the original itemset mining paper, we propose to evaluate these measures objectively for the first time, showing how many non-relevant patterns slip through the cracks.