Paul H. Leng | Researchain

Publication

Featured research published by Paul H. Leng.

Data Mining and Knowledge Discovery | 2004

Tree Structures for Mining Association Rules

Frans Coenen; Graham Goulbourne; Paul H. Leng

A well-known approach to Knowledge Discovery in Databases involves the identification of association rules linking database attributes. Extracting all possible association rules from a database, however, is a computationally intractable problem, because of the combinatorial explosion in the number of sets of attributes for which incidence-counts must be computed. Existing methods for dealing with this may involve multiple passes of the database, and still tend to cope badly with densely-packed database records. We describe here a class of methods we have introduced that begin by using a single database pass to perform a partial computation of the totals required, storing these in the form of a set enumeration tree, which is created in time linear in the size of the database. Algorithms for using this structure to complete the count summations are discussed, and a method is described, derived from the well-known Apriori algorithm. Results are presented demonstrating the performance advantage to be gained from the use of this approach. Finally, we discuss possible further applications of the method.

IEEE Transactions on Knowledge and Data Engineering | 2004

Data structure for association rule mining: T-trees and P-trees

Frans Coenen; Paul H. Leng; Shakil Ahmed

Two new structures for association rule mining (ARM), the T-tree, and the P-tree, together with associated algorithms, are described. The authors demonstrate that the structures and algorithms offer significant advantages in terms of storage and execution time.

Knowledge Based Systems | 2000

Algorithms for computing association rules using a partial-support tree

Graham Goulbourne; Frans Coenen; Paul H. Leng

This paper presents new algorithms for the extraction of association rules from binary databases. Most existing methods operate by generating “candidate” sets, representing combinations of attributes which may be associated, and then testing the database to establish the degree of association. This may involve multiple database passes, and is also likely to encounter problems when dealing with “dense” data due to the increase in the number of sets under consideration. Our method uses a single pass of the database to perform a partial computation of support for all sets encountered in the database, storing this in the form of a set enumeration tree. We describe algorithms for generating this tree and for using it to generate association rules.
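
The idea of partial support can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes a flat dictionary in place of the set enumeration tree, and the names `partial_supports` and `total_support` and the toy records are hypothetical. One pass counts each distinct record, and the total support of an item set is then obtained by summing the partial counts of every stored set that contains it.

```python
from collections import Counter


def partial_supports(records):
    """Single pass: count how often each distinct record (as an item set) occurs."""
    return Counter(frozenset(record) for record in records)


def total_support(itemset, partial):
    """Sum the partial counts of every stored set that contains `itemset`."""
    target = frozenset(itemset)
    return sum(count for stored, count in partial.items() if target <= stored)


# Toy binary database (hypothetical data): each record is the set of attributes present.
records = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"a", "b", "c"},
    {"b"},
]

partial = partial_supports(records)
support_ab = total_support({"a", "b"}, partial)
# Confidence of the rule a -> b is support(a, b) / support(a).
confidence_a_to_b = support_ab / total_support({"a"}, partial)
```

A tree layout, as described in the abstract, would let the summation visit only the relevant stored sets instead of scanning them all as this sketch does.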

Data and Knowledge Engineering | 2007

The effect of threshold values on association rule based classification accuracy

Frans Coenen; Paul H. Leng

Classification Association Rule Mining (CARM) systems operate by applying an Association Rule Mining (ARM) method to obtain classification rules from a training set of previously classified data. The rules thus generated will be influenced by the choice of ARM parameters employed by the algorithm (typically support and confidence threshold values). In this paper we examine the effect that this choice has on the predictive accuracy of CARM methods. We show that the accuracy can almost always be improved by a suitable choice of parameters, and describe a hill-climbing method for finding the best parameter settings. We also demonstrate that the proposed hill-climbing method is most effective when coupled with a fast CARM algorithm such as the TFPC algorithm which is also described.
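
A generic hill-climbing search over the two thresholds might look like the following Python sketch. The abstract does not give the details of the authors' procedure, so everything here is an assumption: the function `hill_climb`, the step-halving schedule, the percentage scale for both thresholds, and the stand-in `toy_accuracy` function, which replaces the real step of training and evaluating a classifier.

```python
def hill_climb(evaluate, start, step, min_step, bounds):
    """Climb towards a local optimum of evaluate(support, confidence).

    At each round the four axis-aligned neighbours of the current point are
    scored; the best improving neighbour becomes the new point. When no
    neighbour improves, the step size is halved until it drops below min_step.
    """
    best = start
    best_score = evaluate(*best)
    while step >= min_step:
        improved = False
        support, confidence = best
        for d_sup, d_conf in ((step, 0), (-step, 0), (0, step), (0, -step)):
            candidate = (support + d_sup, confidence + d_conf)
            if not all(lo <= v <= hi for v, (lo, hi) in zip(candidate, bounds)):
                continue  # skip thresholds outside the allowed range
            score = evaluate(*candidate)
            if score > best_score:
                best, best_score, improved = candidate, score, True
        if not improved:
            step /= 2
    return best, best_score


def toy_accuracy(support, confidence):
    """Hypothetical accuracy surface peaking at support=1%, confidence=70%."""
    return 1.0 - (support - 1.0) ** 2 / 100 - (confidence - 70.0) ** 2 / 10000


best_thresholds, best_accuracy = hill_climb(
    toy_accuracy,
    start=(5.0, 50.0),
    step=4.0,
    min_step=0.25,
    bounds=((0.0, 100.0), (0.0, 100.0)),
)
```

Each call to `evaluate` stands for a full mining and classification run, which is why such a search benefits from a fast underlying CARM algorithm.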

Knowledge Discovery and Data Mining | 2005

Threshold tuning for improved classification association rule mining

Frans Coenen; Paul H. Leng; Lu Zhang

One application of Association Rule Mining (ARM) is to identify Classification Association Rules (CARs) that can be used to classify future instances from the same population as the data being mined. Most CARM methods first mine the data for candidate rules, then prune these using coverage analysis of the training data. In this paper we describe a CARM algorithm that avoids the need for coverage analysis, and a technique for tuning its threshold parameters to obtain more accurate classification. We present results to show this approach can achieve better accuracy than comparable alternatives at lower cost.

European Conference on Principles of Data Mining and Knowledge Discovery | 2001

Computing Association Rules Using Partial Totals

Frans Coenen; Graham Goulbourne; Paul H. Leng

The problem of extracting all association rules from within a binary database is well-known. Existing methods may involve multiple passes of the database, and cope badly with densely-packed database records because of the combinatorial explosion in the number of sets of attributes for which incidence-counts must be computed. We describe here a class of methods we have introduced that begin by using a single database pass to perform a partial computation of the totals required, storing these in the form of a set enumeration tree, which is created in time linear in the size of the database. Algorithms for using this structure to complete the count summations are discussed, and a method is described, derived from the well-known Apriori algorithm. Results are presented demonstrating the performance advantage to be gained from the use of this approach.

International Conference on Data Mining | 2004

An evaluation of approaches to classification rule selection

Frans Coenen; Paul H. Leng

In this paper a number of classification rule evaluation measures are considered. In particular the authors review the use of a variety of selection techniques used to order classification rules contained in a classifier, and a number of mechanisms used to classify unseen data. The authors demonstrate that rule ordering founded on the size of antecedent works well given certain conditions.
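
As a rough illustration of ordering rules by antecedent size, the Python sketch below sorts a small rule list and classifies records with the first rule that fires. The abstract does not say whether larger or smaller antecedents come first; placing larger (more specific) antecedents first, breaking ties by confidence, and the names `order_rules` and `classify` are all assumptions made for the example, as are the toy rules.

```python
def order_rules(rules):
    """Order rules by antecedent size, largest first (an assumption),
    breaking ties by higher confidence."""
    return sorted(rules, key=lambda rule: (-len(rule[0]), -rule[2]))


def classify(record, ordered_rules, default):
    """Return the class of the first rule whose antecedent is contained in the record."""
    for antecedent, label, _confidence in ordered_rules:
        if antecedent <= record:
            return label
    return default


# Hypothetical rules: (antecedent, class label, confidence).
rules = [
    (frozenset({"a"}), "x", 0.9),
    (frozenset({"a", "b"}), "y", 0.8),
    (frozenset({"c"}), "x", 0.7),
]

ordered = order_rules(rules)
label_ab = classify({"a", "b"}, ordered, default="unknown")
label_a = classify({"a"}, ordered, default="unknown")
label_d = classify({"d"}, ordered, default="unknown")
```

With this ordering, the record containing both `a` and `b` is classified by the more specific two-item rule even though the single-item rule has higher confidence.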

International Conference on Data Mining | 2003

T-trees, vertical partitioning and distributed association rule mining

Frans Coenen; Paul H. Leng; Shakil Ahmed

We consider a technique (DATA-VP) for distributed (and parallel) association rule mining that uses vertical partitioning to distribute the input data amongst processors. The proposed vertical partitioning is facilitated by a novel compressed set enumeration tree data structure (the T-tree), and an associated mining algorithm (Apriori-T), that allows for computationally effective distributed/parallel ARM when compared with existing approaches.

International Conference on Data Mining | 2005

Obtaining best parameter values for accurate classification

Frans Coenen; Paul H. Leng

In this paper we examine the effect that the choice of support and confidence thresholds has on the accuracy of classifiers obtained by classification association rule mining. We show that accuracy can almost always be improved by a suitable choice of threshold values, and we describe a method for finding the best values. We present results that demonstrate this approach can obtain higher accuracy without the need for coverage analysis of the training data.

Knowledge Based Systems | 2009

EMADS: An extendible multi-agent data miner

Kamal Ali Albashiri; Frans Coenen; Paul H. Leng

In this paper, we describe EMADS, an extendible multi-agent data mining system. The EMADS vision is that of a community of data mining agents, contributed by many individuals, interacting under decentralised control to address data mining requests. EMADS is seen both as an end user application and a research tool. This paper details the EMADS vision, the associated conceptual framework and the current implementation. Although EMADS may be applied to many data mining tasks, the study described here, for the sake of brevity, concentrates on agent based data classification. A full description of EMADS is presented.
