Paul H. Leng
University of Liverpool
Publication
Featured research published by Paul H. Leng.
Data Mining and Knowledge Discovery | 2004
Frans Coenen; Graham Goulbourne; Paul H. Leng
A well-known approach to Knowledge Discovery in Databases involves the identification of association rules linking database attributes. Extracting all possible association rules from a database, however, is a computationally intractable problem, because of the combinatorial explosion in the number of sets of attributes for which incidence-counts must be computed. Existing methods for dealing with this may involve multiple passes of the database, and tend still to cope badly with densely-packed database records. We describe here a class of methods we have introduced that begin by using a single database pass to perform a partial computation of the totals required, storing these in the form of a set enumeration tree, which is created in time linear to the size of the database. Algorithms for using this structure to complete the count summations are discussed, and a method is described, derived from the well-known Apriori algorithm. Results are presented demonstrating the performance advantage to be gained from the use of this approach. Finally, we discuss possible further applications of the method.
IEEE Transactions on Knowledge and Data Engineering | 2004
Frans Coenen; Paul H. Leng; Shakil Ahmed
Two new structures for association rule mining (ARM), the T-tree, and the P-tree, together with associated algorithms, are described. The authors demonstrate that the structures and algorithms offer significant advantages in terms of storage and execution time.
Knowledge Based Systems | 2000
Graham Goulbourne; Frans Coenen; Paul H. Leng
This paper presents new algorithms for the extraction of association rules from binary databases. Most existing methods operate by generating “candidate” sets, representing combinations of attributes which may be associated, and then testing the database to establish the degree of association. This may involve multiple database passes, and is also likely to encounter problems when dealing with “dense” data due to the increase in the number of sets under consideration. Our method uses a single pass of the database to perform a partial computation of support for all sets encountered in the database, storing this in the form of a set enumeration tree. We describe algorithms for generating this tree and for using it to generate association rules.
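The single-pass idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a flat dictionary in place of the set enumeration tree, and the names `partial_supports` and `total_support` are hypothetical. One pass records how often each distinct attribute set occurs (a partial support), and the total support of any set is then completed by summing over the recorded supersets.

```python
from collections import Counter


def partial_supports(records):
    """Single database pass: count how often each distinct attribute set occurs.

    Assumption: a flat Counter stands in for the set enumeration tree.
    """
    return Counter(frozenset(record) for record in records)


def total_support(itemset, partial):
    """Complete the count: sum the partial counts of every recorded superset."""
    target = frozenset(itemset)
    return sum(count for seen, count in partial.items() if target <= seen)


# Toy binary database: each record is the set of attributes that are present.
records = [{"a", "b"}, {"a", "b", "c"}, {"a", "b"}, {"b", "c"}]
partial = partial_supports(records)
```

Because duplicate records collapse into one entry, the second stage works over distinct sets rather than raw records, which is where a saving on repetitive data would be expected to come from.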
Data and Knowledge Engineering | 2007
Frans Coenen; Paul H. Leng
Classification Association Rule Mining (CARM) systems operate by applying an Association Rule Mining (ARM) method to obtain classification rules from a training set of previously classified data. The rules thus generated will be influenced by the choice of ARM parameters employed by the algorithm (typically support and confidence threshold values). In this paper we examine the effect that this choice has on the predictive accuracy of CARM methods. We show that the accuracy can almost always be improved by a suitable choice of parameters, and describe a hill-climbing method for finding the best parameter settings. We also demonstrate that the proposed hill-climbing method is most effective when coupled with a fast CARM algorithm such as the TFPC algorithm, which is also described.
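A generic hill-climbing search over the two thresholds might look like the sketch below. It is an illustration under stated assumptions, not the procedure from the paper: `accuracy` is a hypothetical callable that would build and evaluate a classifier for a given (support, confidence) pair, thresholds are treated as percentages, and the step-halving schedule is an arbitrary choice.

```python
def hill_climb(accuracy, support, confidence, step=8.0, min_step=1.0):
    """Greedy search over (support, confidence) thresholds, in percent.

    `accuracy` is an assumed callable (support, confidence) -> score.
    """
    best = accuracy(support, confidence)
    while step >= min_step:
        moved = False
        # Try one step in each direction along each threshold axis.
        for ds, dc in ((step, 0), (-step, 0), (0, step), (0, -step)):
            s, c = support + ds, confidence + dc
            if not (0 < s <= 100 and 0 < c <= 100):
                continue  # stay inside the valid threshold range
            score = accuracy(s, c)
            if score > best:
                best, support, confidence, moved = score, s, c, True
        if not moved:
            step /= 2  # no neighbour improves: refine the step size
    return support, confidence, best


def toy_accuracy(s, c):
    """Stand-in objective with a single peak at support=3, confidence=60."""
    return -((s - 3) ** 2 + (c - 60) ** 2)


best_support, best_confidence, best_score = hill_climb(toy_accuracy, 5.0, 50.0)
```

Each call to `accuracy` stands for a full mine-and-evaluate cycle, which is consistent with the abstract's point that the search benefits from a fast underlying CARM algorithm.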
Knowledge Discovery and Data Mining | 2005
Frans Coenen; Paul H. Leng; Lu Zhang
One application of Association Rule Mining (ARM) is to identify Classification Association Rules (CARs) that can be used to classify future instances from the same population as the data being mined. Most CARM methods first mine the data for candidate rules, then prune these using coverage analysis of the training data. In this paper we describe a CARM algorithm that avoids the need for coverage analysis, and a technique for tuning its threshold parameters to obtain more accurate classification. We present results to show this approach can achieve better accuracy than comparable alternatives at lower cost.
European Conference on Principles of Data Mining and Knowledge Discovery | 2001
Frans Coenen; Graham Goulbourne; Paul H. Leng
The problem of extracting all association rules from within a binary database is well-known. Existing methods may involve multiple passes of the database, and cope badly with densely-packed database records because of the combinatorial explosion in the number of sets of attributes for which incidence-counts must be computed. We describe here a class of methods we have introduced that begin by using a single database pass to perform a partial computation of the totals required, storing these in the form of a set enumeration tree, which is created in time linear to the size of the database. Algorithms for using this structure to complete the count summations are discussed, and a method is described, derived from the well-known Apriori algorithm. Results are presented demonstrating the performance advantage to be gained from the use of this approach.
International Conference on Data Mining | 2004
Frans Coenen; Paul H. Leng
In this paper a number of classification rule evaluation measures are considered. In particular the authors review the use of a variety of selection techniques used to order classification rules contained in a classifier, and a number of mechanisms used to classify unseen data. The authors demonstrate that rule ordering founded on the size of antecedent works well given certain conditions.
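As a rough illustration of ordering rules by antecedent size, the sketch below sorts rules so that larger antecedents come first and classifies a record with the first rule that matches. The direction of the ordering (more specific rules first), the first-match policy, and the names `order_by_antecedent_size` and `classify` are all assumptions made for the example; the abstract does not specify them.

```python
def order_by_antecedent_size(rules):
    """Sort (antecedent, label) rules so larger antecedents come first.

    Assumption: "more specific first"; sorted() is stable, so ties keep
    their original relative order.
    """
    return sorted(rules, key=lambda rule: len(rule[0]), reverse=True)


def classify(record, ordered_rules, default=None):
    """Return the label of the first rule whose antecedent the record contains."""
    for antecedent, label in ordered_rules:
        if antecedent <= record:
            return label
    return default


rules = [
    (frozenset({"a"}), "x"),
    (frozenset({"a", "b"}), "y"),
]
ordered = order_by_antecedent_size(rules)
```

With this ordering, a record containing both `a` and `b` is labelled by the two-attribute rule before the more general single-attribute rule is considered.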
International Conference on Data Mining | 2003
Frans Coenen; Paul H. Leng; Shakil Ahmed
We consider a technique (DATA-VP) for distributed (and parallel) association rule mining that makes use of a vertical partitioning technique to distribute the input data amongst processors. The proposed vertical partitioning is facilitated by a novel compressed set enumeration tree data structure (the T-tree), and an associated mining algorithm (Apriori-T), that allows for computationally effective distributed/parallel ARM when compared with existing approaches.
International Conference on Data Mining | 2005
Frans Coenen; Paul H. Leng
In this paper we examine the effect that the choice of support and confidence thresholds has on the accuracy of classifiers obtained by classification association rule mining. We show that accuracy can almost always be improved by a suitable choice of threshold values, and we describe a method for finding the best values. We present results that demonstrate this approach can obtain higher accuracy without the need for coverage analysis of the training data.
Knowledge Based Systems | 2009
Kamal Ali Albashiri; Frans Coenen; Paul H. Leng
In this paper, we describe EMADS, an extendible multi-agent data mining system. The EMADS vision is that of a community of data mining agents, contributed by many individuals, interacting under decentralised control to address data mining requests. EMADS is seen both as an end user application and a research tool. This paper details the EMADS vision, the associated conceptual framework and the current implementation. Although EMADS may be applied to many data mining tasks, the study described here, for the sake of brevity, concentrates on agent based data classification. A full description of EMADS is presented.