Peter A. Flach
University of Bristol
Publications
Featured research published by Peter A. Flach.
European Conference on Machine Learning | 2009
Tarek Abudawood; Peter A. Flach
Subgroup discovery aims at finding subsets of a population whose class distribution is significantly different from the overall distribution. It has previously been investigated predominantly in a two-class context. This paper investigates multi-class subgroup discovery methods. We consider six evaluation measures for multi-class subgroups, four of them new, and study their theoretical properties. We extend the two-class subgroup discovery algorithm CN2-SD to incorporate the new evaluation measures and a new weighting scheme inspired by AdaBoost. We demonstrate the usefulness of multi-class subgroup discovery experimentally, using discovered subgroups as features for a decision tree learner. Not only is the number of leaves of the decision tree reduced by a factor of 8 to 16 on average, but significant improvements in accuracy and AUC are achieved with particular evaluation measures and settings. Similar performance improvements can be observed when using naive Bayes.
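For illustration, here is a minimal Python sketch of a multi-class subgroup quality measure in the spirit of those studied: a weighted-relative-accuracy-style score summed over classes. It is not one of the paper's six measures, whose exact definitions are not reproduced here.

# Illustrative multi-class subgroup quality measure (an assumption, not one of
# the paper's exact six measures). `subgroup_counts[i]` is the number of
# subgroup examples in class i, `class_counts[i]` the number in the population.

def multiclass_wracc(subgroup_counts, class_counts):
    n = sum(class_counts)               # population size
    n_sub = sum(subgroup_counts)        # subgroup size
    coverage = n_sub / n                # p(subgroup)
    score = 0.0
    for n_ci_sub, n_ci in zip(subgroup_counts, class_counts):
        p_ci_given_sub = n_ci_sub / n_sub   # class distribution inside subgroup
        p_ci = n_ci / n                     # overall class distribution
        score += coverage * abs(p_ci_given_sub - p_ci)
    return score

# Example: a subgroup of 30 examples in a 3-class population of 300.
print(multiclass_wracc([20, 5, 5], [100, 100, 100]))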
Conference on Learning Theory | 2003
Thomas Gärtner; Peter A. Flach; Stefan Wrobel
As most ‘real-world’ data is structured, research in kernel methods has begun investigating kernels for various kinds of structured data. Graphs are among the most widely used tools for modeling structured data. An interesting and important challenge is thus to investigate kernels on instances that are represented by graphs. So far, only very specific graphs such as trees and strings have been considered.
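As a concrete illustration of a graph kernel, the following sketch counts common walks between two unlabelled graphs via the adjacency matrix of their direct product graph. This is a standard construction used here as an assumption, not the specific kernels the paper analyses.

import numpy as np

# Minimal walk-counting graph kernel sketch (illustrative). Counts pairs of
# walks of length up to `max_len` common to both graphs, via the adjacency
# matrix of their direct product graph. Vertices are assumed unlabelled, so
# the product graph is a plain Kronecker product.

def walk_kernel(A1, A2, max_len=4, decay=0.5):
    Ax = np.kron(A1, A2)                # adjacency of the direct product graph
    k, power = 0.0, np.eye(Ax.shape[0])
    for step in range(1, max_len + 1):
        power = power @ Ax              # walks of length `step` in the product
        k += (decay ** step) * power.sum()
    return k

# Example: a triangle versus a 3-vertex path.
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(walk_kernel(triangle, path))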
Inductive Logic Programming | 1999
Nada Lavrač; Peter A. Flach; Blaz Zupan
Numerous measures are used for performance evaluation in machine learning. In predictive knowledge discovery, the most frequently used measure is classification accuracy. With new tasks being addressed in knowledge discovery, new measures appear. In descriptive knowledge discovery, where induced rules are not primarily intended for classification, new measures used are novelty in clausal and subgroup discovery, and support and confidence in association rule learning. Additional measures are needed because many descriptive knowledge discovery tasks involve inducing a large set of redundant rules, and the problem becomes one of ranking and filtering the induced rule set. In this paper we develop a unifying view on some of the existing measures for predictive and descriptive induction. We provide a common terminology and notation by means of contingency tables. We demonstrate how to trade off these measures by using what we call weighted relative accuracy. The paper furthermore demonstrates that many rule evaluation measures developed for predictive knowledge discovery can be adapted to descriptive knowledge discovery tasks.
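The weighted relative accuracy of a rule H ← B can be computed directly from the four cells of its contingency table. This sketch follows the standard definition WRAcc = p(B) · (p(H|B) − p(H)); the variable names are mine.

# Weighted relative accuracy from a 2x2 contingency table, following the
# standard definition WRAcc(H <- B) = p(B) * (p(H|B) - p(H)).

def wracc(n_hb, n_hnb, n_nhb, n_nhnb):
    n = n_hb + n_hnb + n_nhb + n_nhnb   # total number of examples
    p_b = (n_hb + n_nhb) / n            # coverage of the rule body
    p_h = (n_hb + n_hnb) / n            # prior of the rule head
    p_h_given_b = n_hb / (n_hb + n_nhb) # rule precision
    return p_b * (p_h_given_b - p_h)

# Example: the body covers 40 of 100 examples, 30 of them in the head class;
# the head class has 50 examples overall.
print(wracc(n_hb=30, n_hnb=20, n_nhb=10, n_nhnb=40))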
Relational Data Mining | 2001
Stefan Kramer; Nada Lavrač; Peter A. Flach
This chapter surveys methods that transform a relational representation of a learning problem into a propositional (feature-based, attribute-value) representation. This kind of representation change is known as propositionalization. With such an approach, feature construction can be decoupled from model construction. It has been shown that in many relational data mining applications this can be done without loss of predictive performance. After reviewing both general-purpose and domain-dependent propositionalization approaches from the literature, an extension to the LINUS propositionalization method that overcomes the system's earlier inability to deal with non-determinate local variables is described.
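A toy sketch of the idea: each propositional feature existentially summarizes a one-to-many relation, so a single attribute-value row can stand in for relational structure. The data and feature set below are invented for illustration; this is not the LINUS method itself.

# Toy propositionalization: Boolean (existential) features over a
# one-to-many relation, one feature per first-order condition.

customers = {1: {"age": 34}, 2: {"age": 51}}
purchases = [
    {"customer": 1, "item": "book", "price": 12.0},
    {"customer": 1, "item": "dvd", "price": 25.0},
    {"customer": 2, "item": "book", "price": 8.0},
]

# Each feature asks: does there EXIST a related purchase satisfying a condition?
features = {
    "bought_book":      lambda p: p["item"] == "book",
    "bought_expensive": lambda p: p["price"] > 20.0,
}

def propositionalize(cid):
    row = {"age": customers[cid]["age"]}
    related = [p for p in purchases if p["customer"] == cid]
    for name, cond in features.items():
        row[name] = any(cond(p) for p in related)
    return row

for cid in customers:
    print(cid, propositionalize(cid))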
Archive | 2001
Luc De Raedt; Peter A. Flach
This paper presents a missing link between Plotkin’s least general generalization formalism and generalization on the Order Sorted Feature (OSF) foundation. A feature term (or ψ-term) is an extended logic term based on ordered sorts and is a normal form of an OSF-term. An axiomatic definition of ψ-term generalization is given as a set of OSF clause generalization rules, and the least generality of the axiomatic definition is proven in the sense of Plotkin’s least general generalization (lgg). The correctness of the definition is established on the basis of the axiomatic foundation. An operational definition of the least general generalization of clauses based on ψ-terms is also shown as a realization of the axiomatic definition.
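For reference, here is a minimal implementation of Plotkin's lgg on ordinary first-order terms, the formalism the paper lifts to ψ-terms. The sketch handles plain terms only, not ordered sorts or OSF clauses.

# Plotkin's least general generalization (lgg) on first-order terms.
# Terms are tuples (functor, args...); anything else is a constant.

def lgg(t1, t2, subst=None):
    if subst is None:
        subst = {}
    # Identical constants generalize to themselves.
    if t1 == t2 and not isinstance(t1, tuple):
        return t1
    # Compound terms with the same functor and arity generalize componentwise.
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        return (t1[0],) + tuple(lgg(a, b, subst) for a, b in zip(t1[1:], t2[1:]))
    # Otherwise introduce a variable, reusing the same variable for the same
    # pair of subterms (Plotkin's rule).
    key = (t1, t2)
    if key not in subst:
        subst[key] = f"X{len(subst)}"
    return subst[key]

# lgg(f(a, g(a)), f(b, g(b))) = f(X0, g(X0))
print(lgg(("f", "a", ("g", "a")), ("f", "b", ("g", "b"))))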
Machine Learning | 2005
Johannes Fürnkranz; Peter A. Flach
This paper provides an analysis of the behavior of separate-and-conquer or covering rule learning algorithms by visualizing their evaluation metrics and their dynamics in coverage space, a variant of ROC space. Our results show that most commonly used metrics, including accuracy, weighted relative accuracy, entropy, and Gini index, are equivalent to one of two fundamental prototypes: precision, which tries to optimize the area under the ROC curve for unknown costs, and a cost-weighted difference between covered positive and negative examples, which tries to find the optimal point under known or assumed costs. We also show that a straightforward generalization of the m-estimate trades off these two prototypes. Furthermore, our results show that stopping and filtering criteria like CN2’s significance test focus on identifying significant deviations from random classification, which does not necessarily avoid overfitting. We also identify a problem with Foil’s MDL-based encoding length restriction, which proves to be largely equivalent to a variable threshold on the recall of the rule. In general, we interpret these results as evidence that, contrary to common conception, pre-pruning heuristics are not very well understood and deserve more investigation.
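In coverage space a rule is a point (n, p) of covered negatives and positives, and the two prototype metrics and the m-estimate can be written down directly. These are the standard formulas; the paper's exact parameterization may differ.

# Rule evaluation metrics at a point (n, p) in coverage space, where
# p = covered positives, n = covered negatives, and P, N are the totals.

def precision(p, n):
    return p / (p + n)

def cost_weighted_diff(p, n, cost=1.0):
    # Linear isometrics in coverage space; `cost` trades off negatives.
    return p - cost * n

def m_estimate(p, n, P, N, m=2.0):
    # Smoothed precision: for m -> 0 it recovers plain precision, and as m
    # grows it interpolates toward the prior P/(P+N).
    prior = P / (P + N)
    return (p + m * prior) / (p + n + m)

# A rule covering 40 positives and 10 negatives, with P = 100, N = 100:
print(precision(40, 10), cost_weighted_diff(40, 10), m_estimate(40, 10, 100, 100))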
Machine Learning | 2004
Thomas Gärtner; John W. Lloyd; Peter A. Flach
This paper brings together two strands of machine learning of increasing importance: kernel methods and highly structured data. We propose a general method for constructing a kernel following the syntactic structure of the data, as defined by its type signature in a higher-order logic. Our main theoretical result is the positive definiteness of any kernel thus defined. We report encouraging experimental results on a range of real-world data sets. By converting our kernel to a distance pseudo-metric for 1-nearest neighbour, we were able to improve the best accuracy from the literature on the Diterpene data set by more than 10%.
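The kernel-to-distance conversion mentioned above is the usual one, d(x, y) = sqrt(k(x,x) − 2·k(x,y) + k(y,y)). A minimal sketch with a stand-in dot-product kernel follows; the paper's kernels instead follow the data's type signature.

import math

# 1-nearest-neighbour classification using the distance pseudo-metric
# induced by a kernel. The kernel here is a placeholder dot product.

def kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def kernel_distance(x, y):
    return math.sqrt(kernel(x, x) - 2 * kernel(x, y) + kernel(y, y))

def one_nn(query, labelled):
    # labelled: list of (instance, label) pairs
    return min(labelled, key=lambda il: kernel_distance(query, il[0]))[1]

train = [((0.0, 1.0), "pos"), ((3.0, 3.0), "neg")]
print(one_nn((0.5, 0.5), train))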
Inductive Logic Programming | 2002
Thomas Gärtner; John W. Lloyd; Peter A. Flach
Learning from structured data is becoming increasingly important. However, most prior work on kernel methods has focused on learning from attribute-value data. Only recently have researchers started investigating kernels for structured data. This paper describes how kernel definitions can be simplified by identifying the structure of the data and how kernels can be defined on this structure. We propose a kernel for structured data, prove that it is positive definite, and show how it can be adapted in practical applications.
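In the same spirit, a kernel can be defined recursively over the structure of the data. The following sketch is my own construction, not the paper's definition: a matching kernel on atoms, products over tuples, and sums over set cross pairs, all of which preserve positive definiteness.

# A kernel that follows the structure of the data. Products and sums of
# positive definite kernels are positive definite, so the result is too.

def k_struct(x, y):
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        prod = 1.0
        for xi, yi in zip(x, y):        # tuples: product of component kernels
            prod *= k_struct(xi, yi)
        return prod
    if isinstance(x, frozenset) and isinstance(y, frozenset):
        # sets: sum of the kernel over all cross pairs
        return sum(k_struct(xi, yi) for xi in x for yi in y)
    return 1.0 if x == y else 0.0       # atoms: matching kernel

a = ("red", frozenset({"a", "b"}))
b = ("red", frozenset({"b", "c"}))
print(k_struct(a, b))                    # 1 * (one matching cross pair) = 1.0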
Machine Learning | 2001
Peter A. Flach; Nicolas Lachiche
This paper deals with learning first-order logic rules from data lacking an explicit classification predicate. Consequently, the learned rules are not restricted to predicate definitions as in supervised inductive logic programming. First-order logic offers the ability to deal with structured, multi-relational knowledge. Possible applications include first-order knowledge discovery, induction of integrity constraints in databases, multiple predicate learning, and learning mixed theories of predicate definitions and integrity constraints. One of the contributions of our work is a heuristic measure of confirmation, trading off novelty and satisfaction of the rule. The approach has been implemented in the Tertius system. The system performs an optimal best-first search, finding the k most confirmed hypotheses, and includes a non-redundant refinement operator to avoid duplicates in the search. Tertius can be adapted to many different domains by tuning its parameters, and it can deal either with individual-based representations by upgrading propositional representations to first-order, or with general logical rules. We describe a number of experiments demonstrating the feasibility and flexibility of our approach.
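A generic best-first search for the k most confirmed hypotheses might look as follows. The refinement operator, confirmation measure, and optimistic bound below are placeholders, not Tertius' own.

import heapq

# Best-first search skeleton returning the k best hypotheses. `optimistic`
# must upper-bound the confirmation achievable by any refinement, which
# enables optimal (admissible) pruning against the current k-th best score.

def best_first_search(root, refine, confirm, optimistic, k=3):
    frontier = [(-optimistic(root), root)]
    best = []                                    # (confirmation, hypothesis)
    while frontier:
        neg_bound, hyp = heapq.heappop(frontier)
        if len(best) == k and -neg_bound <= best[-1][0]:
            break                                # no refinement can enter top k
        best.append((confirm(hyp), hyp))
        best.sort(key=lambda ch: -ch[0])
        best = best[:k]
        for child in refine(hyp):                # non-redundant refinements
            heapq.heappush(frontier, (-optimistic(child), child))
    return best

# Toy usage: hypotheses are ordered tuples of literals from a fixed pool.
pool = ["p", "q", "r"]
scores = {"p": 0.4, "q": 0.7, "r": 0.2}
refine = lambda h: [h + (l,) for l in pool if not h or l > h[-1]]
confirm = lambda h: sum(scores[l] for l in h) / (1 + len(h)) if h else 0.0
optimistic = lambda h: confirm(h) + 0.5          # loose admissible bound
print(best_first_search((), refine, confirm, optimistic, k=2))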
Inductive Logic Programming | 2003
Mark-A. Krogel; Simon Rawles; Filip Železný; Peter A. Flach; Nada Lavrač; Stefan Wrobel
Propositionalization has already been shown to be a promising approach for robustly and effectively handling relational data sets for knowledge discovery. In this paper, we compare up-to-date methods for propositionalization from two main groups: logic-oriented and database-oriented techniques. Experiments using several learning tasks – both ILP benchmarks and tasks from recent international data mining competitions – show that both groups have their specific advantages. While logic-oriented methods can handle complex background knowledge and provide expressive first-order models, database-oriented methods can be more efficient, especially on larger data sets. The accuracies obtained vary, such that combining the features produced by both groups seems a valuable further avenue.