Graeme Richards
University of East Anglia
Publications
Featured research published by Graeme Richards.
Journal of Mathematical Modelling and Algorithms | 2006
Alan P. Reynolds; Graeme Richards; B. de la Iglesia; Victor J. Rayward-Smith
Previous research has resulted in a number of different algorithms for rule discovery. Two approaches discussed here, the ‘all-rules’ algorithm and multi-objective metaheuristics, both result in the production of a large number of partial classification rules, or ‘nuggets’, for describing different subsets of the records in the class of interest. This paper describes the application of a number of different clustering algorithms to these rules, in order to identify similar rules and to better understand the data.
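As an illustration of the kind of post-processing described above, the sketch below groups rules by the similarity of the record sets they cover. The Jaccard measure and the greedy grouping scheme are assumptions made for illustration; the paper applies several established clustering algorithms rather than this exact procedure.

```python
# Minimal sketch: grouping partial-classification rules ("nuggets") by the
# similarity of the record sets they cover.  The Jaccard measure and the
# greedy grouping below are illustrative assumptions, not the paper's method.

def jaccard(a: set, b: set) -> float:
    """Similarity between two sets of covered record ids."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def group_rules(coverages: dict, threshold: float = 0.6) -> list:
    """Greedily place each rule into the first group whose seed rule
    covers a sufficiently similar set of records."""
    groups = []  # each group is a list of rule ids; the first is the seed
    for rule_id, covered in coverages.items():
        for group in groups:
            if jaccard(covered, coverages[group[0]]) >= threshold:
                group.append(rule_id)
                break
        else:
            groups.append([rule_id])
    return groups

# Toy example: rule id -> ids of class-of-interest records it covers.
coverages = {"r1": {1, 2, 3, 4}, "r2": {2, 3, 4, 5}, "r3": {8, 9}}
print(group_rules(coverages))   # [['r1', 'r2'], ['r3']]
```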
Intelligent Data Engineering and Automated Learning | 2004
Alan P. Reynolds; Graeme Richards; Victor J. Rayward-Smith
Earlier research has resulted in the production of an ‘all-rules’ algorithm for data mining that produces all conjunctive rules with confidence and coverage above given thresholds. While this is a useful tool, it may produce a large number of rules. This paper describes the application of two clustering algorithms to these rules, in order to identify sets of similar rules and to better understand the data.
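The abstract above turns on two rule-quality measures, confidence and coverage. The sketch below shows one way such measures can be computed and thresholded; it is a brute-force illustration, not the all-rules search itself, and the definition of coverage used here (fraction of all records matched by the rule) is an assumption.

```python
# Minimal sketch of a confidence/coverage filter: keep every conjunctive rule
# whose confidence and coverage exceed given thresholds.  The brute-force
# evaluation below is illustrative only; the actual algorithm prunes the
# search space rather than enumerating and testing candidate rules.

def matches(record: dict, conditions: dict) -> bool:
    return all(record.get(attr) == val for attr, val in conditions.items())

def evaluate(rule: dict, data: list, target: tuple):
    """Return (coverage, confidence) of `rule` for the class target=(attr, value)."""
    covered = [r for r in data if matches(r, rule)]
    if not covered:
        return 0.0, 0.0
    hits = sum(1 for r in covered if r[target[0]] == target[1])
    return len(covered) / len(data), hits / len(covered)

data = [
    {"age": "old",   "smoker": "yes", "risk": "high"},
    {"age": "old",   "smoker": "no",  "risk": "high"},
    {"age": "young", "smoker": "no",  "risk": "low"},
    {"age": "young", "smoker": "yes", "risk": "low"},
]
candidates = [{"age": "old"}, {"smoker": "yes"}]
for rule in candidates:
    cov, conf = evaluate(rule, data, ("risk", "high"))
    if cov >= 0.25 and conf >= 0.75:          # example thresholds
        print(rule, f"coverage={cov:.2f} confidence={conf:.2f}")
```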
European Journal of Operational Research | 2006
B. de la Iglesia; Graeme Richards; M.S. Philpott; Victor J. Rayward-Smith
In this paper, we present an application of multi-objective metaheuristics to the field of data mining. We introduce the data mining task of nugget discovery (also known as partial classification) and show how the multi-objective metaheuristic algorithm NSGA II can be modified to solve this problem. We also present an alternative algorithm for the same task, the ARAC algorithm, which can find all rules that are best according to some measures of interest subject to certain constraints. The ARAC algorithm provides an excellent basis for comparison with the results of the multi-objective metaheuristic algorithm as it can deliver the Pareto optimal front consisting of all partial classification rules that lie in the upper confidence/coverage border, for databases of limited size. We present the results of experiments with various well-known databases for both algorithms. We also discuss how the two methods can be used complementarily for large databases to deliver a set of best rules according to some predefined criteria, providing a powerful tool for knowledge discovery in databases.
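The comparison above relies on the upper confidence/coverage border, i.e. the Pareto optimal set of partial classification rules. A minimal sketch of extracting that front from already evaluated rules is given below; the rule names and scores are made up.

```python
# Minimal sketch: extract the rules on the upper confidence/coverage border
# (the Pareto front) from a list of evaluated rules.  Scores are illustrative.

def pareto_front(rules):
    """Keep rules not dominated on both coverage and confidence."""
    front = []
    for name, cov, conf in rules:
        dominated = any(
            (c >= cov and f >= conf) and (c > cov or f > conf)
            for _, c, f in rules
        )
        if not dominated:
            front.append((name, cov, conf))
    return front

rules = [("r1", 0.30, 0.98), ("r2", 0.50, 0.85),
         ("r3", 0.70, 0.60), ("r4", 0.45, 0.85)]
print(pareto_front(rules))   # r4 is dominated by r2; r1, r2, r3 remain
```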
International Conference of the IEEE Engineering in Medicine and Biology Society | 2005
Wenjia Wang; Graeme Richards; Sarah Rea
This paper presents research on developing data mining ensembles for predicting the risk of osteoporosis in women. Osteoporosis is a bone disease that commonly occurs among postmenopausal women; no effective treatments are currently available, other than prevention, which requires early diagnosis. However, early detection of the disease is very difficult. This research aims to devise an intelligent diagnosis support system, using data mining ensemble technology, to assist general practitioners in assessing patients’ risk of developing osteoporosis. The paper describes methods for constructing effective ensembles by measuring diversity between individual predictors. Hybrid ensembles are implemented using neural networks and decision trees. The ensembles built for predicting osteoporosis are evaluated on real-world data, and the results indicate that the hybrid ensembles have a relatively high level of diversity and are thus able to improve prediction accuracy.
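The abstract above hinges on measuring diversity between individual predictors. The sketch below shows one common pairwise diversity measure, the disagreement rate; the abstract does not name the measure actually used, so this is purely illustrative.

```python
# Minimal sketch of a pairwise diversity measure (the disagreement rate) that
# could be used when selecting members of a hybrid ensemble.  The measure and
# the toy predictions are assumptions for illustration only.

from itertools import combinations

def disagreement(preds_a, preds_b):
    """Fraction of cases on which two classifiers give different labels."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def mean_pairwise_diversity(all_preds):
    pairs = list(combinations(all_preds, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)

# Toy predictions from three candidate members on five validation cases.
predictions = [
    [1, 0, 1, 1, 0],   # e.g. a decision tree
    [1, 1, 1, 0, 0],   # e.g. a neural network
    [0, 0, 1, 1, 1],   # another tree
]
print(mean_pairwise_diversity(predictions))   # higher = more diverse
```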
International Conference on Data Mining | 2001
Graeme Richards; Victor J. Rayward-Smith
In this paper we address the problem of finding all association rules in tabular data. An algorithm, ARA, is presented for finding rules in tabular data that satisfy clearly specified constraints. ARA is based on the Dense-Miner algorithm but includes an additional constraint and an improved method of calculating support. ARA is tested and compared with our implementation of Dense-Miner; we conclude that ARA is usually more efficient than Dense-Miner, and often considerably so. We also consider the potential for modifying the constraints used in ARA in order to find more general rules.
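One of the claimed improvements above is a better method of calculating support. The sketch below shows a generic way to make repeated support counts cheap, by intersecting pre-computed row-id sets; this is an assumption for illustration and not the actual ARA or Dense-Miner optimisation.

```python
# Minimal sketch of support counting via pre-computed row-id sets, one generic
# way to speed up repeated support calculations over tabular data.  This is
# illustrative only; the ARA optimisation is not described in the abstract.

def index_conditions(data):
    """Map each (attribute, value) condition to the set of row ids it covers."""
    index = {}
    for row_id, record in enumerate(data):
        for attr, val in record.items():
            index.setdefault((attr, val), set()).add(row_id)
    return index

def support(conditions, index, n_rows):
    """Support of a conjunction = size of the intersection of its row sets."""
    rows = set.intersection(*(index[c] for c in conditions))
    return len(rows) / n_rows

data = [
    {"age": "old",   "smoker": "yes"},
    {"age": "old",   "smoker": "no"},
    {"age": "young", "smoker": "yes"},
]
idx = index_conditions(data)
print(support([("age", "old"), ("smoker", "yes")], idx, len(data)))  # 0.33...
```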
Intelligent Information Systems | 2013
Jon Hills; Anthony J. Bagnall; Beatriz de la Iglesia; Graeme Richards
Association rule mining can provide genuine insight into the data being analysed; however, rule sets can be extremely large, and therefore difficult and time-consuming for the user to interpret. We propose reducing the size of Apriori rule sets by removing overlapping rules, and compare this approach with two standard methods for reducing rule set size: increasing the minimum confidence parameter, and increasing the minimum antecedent support parameter. We evaluate the rule sets in terms of confidence and coverage, as well as two rule interestingness measures that favour rules with antecedent conditions that are poor individual predictors of the target class, as we assume that these represent potentially interesting rules. We also examine the distribution of the rules graphically, to assess whether particular classes of rules are eliminated. We show that removing overlapping rules substantially reduces rule set size in most cases, and alters the character of a rule set less than if the standard parameters are used to constrain the rule set to the same size. Based on our results, we aim to extend the Apriori algorithm to incorporate the suppression of overlapping rules.
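The proposal above is to shrink Apriori rule sets by removing overlapping rules. The sketch below uses one plausible notion of overlap, dropping a rule when a rule of at least equal confidence covers most of the same records; the paper's precise overlap criterion may differ.

```python
# Minimal sketch of suppressing "overlapping" rules: drop a rule when another
# rule of at least equal confidence covers most of the same records.  The
# overlap criterion (coverage-set containment above a threshold) is an
# assumption; the paper's definition of overlap may differ.

def prune_overlapping(rules, threshold=0.9):
    """rules: list of dicts with 'name', 'confidence' and 'covered' (a set)."""
    kept = []
    for rule in sorted(rules, key=lambda r: -r["confidence"]):
        overlaps = any(
            len(rule["covered"] & k["covered"]) / len(rule["covered"]) >= threshold
            for k in kept
        )
        if not overlaps:
            kept.append(rule)
    return kept

rules = [
    {"name": "r1", "confidence": 0.95, "covered": {1, 2, 3, 4, 5}},
    {"name": "r2", "confidence": 0.90, "covered": {1, 2, 3, 4, 6}},   # mostly r1
    {"name": "r3", "confidence": 0.80, "covered": {7, 8, 9}},
]
print([r["name"] for r in prune_overlapping(rules, threshold=0.8)])  # ['r1', 'r3']
```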
Journal of Intelligent Information Systems | 2012
Graeme Richards; Wenjia Wang
An ensemble in machine learning is defined as a set of models (such as classifiers or predictors) that are induced individually from data, using one or more machine learning algorithms for a given task, and then work collectively in the hope of generating improved decisions. In this paper we investigate the factors that influence ensemble performance, which mainly include the accuracy of individual classifiers, the diversity between classifiers, the number of classifiers in an ensemble and the decision fusion strategy. Among them, diversity is believed to be a key factor, but it is more complex and difficult to measure quantitatively, and it was thus chosen as the focus of this study, together with its relationships with the other factors. A technique was devised to build ensembles from decision trees induced with randomly selected features. Three sets of experiments were performed using 12 benchmark datasets, and the results indicate that (i) a high level of diversity indeed makes an ensemble more accurate and robust compared with individual models; and (ii) small ensembles can produce results as good as, or better than, large ensembles provided that appropriate (e.g. more diverse) models are selected for inclusion. This implies that, when scaling up to larger databases, the increased efficiency of smaller ensembles becomes more significant and beneficial. As a case study, ensembles were built based on these findings for a real-world application, osteoporosis classification, and in each of the three datasets used the ensembles outperformed individual decision trees consistently and reliably.
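The technique described above builds decision trees on randomly selected features and fuses their decisions. A minimal sketch of that idea, using scikit-learn and a public dataset rather than the authors' implementation and data, is given below.

```python
# Minimal sketch (scikit-learn, not the authors' implementation) of decision
# trees induced on randomly selected feature subsets, combined by majority vote.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)

members = []
for _ in range(7):
    feats = rng.choice(X.shape[1], size=2, replace=False)   # random feature subset
    tree = DecisionTreeClassifier(max_depth=3).fit(X[:, feats], y)
    members.append((feats, tree))

def ensemble_predict(x_row):
    """Majority vote over the member trees, each seeing only its own features."""
    votes = [tree.predict(x_row[feats].reshape(1, -1))[0] for feats, tree in members]
    return np.bincount(votes).argmax()

print(ensemble_predict(X[0]), y[0])   # ensemble prediction vs. true label
```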
Intelligent Data Engineering and Automated Learning | 2004
Karl J. Brazier; Graeme Richards; Wenjia Wang
Implicit fitness sharing is an approach to the stimulation of speciation in evolutionary computation for problems where the fitness of an individual is determined as its success rate over a number of trials against a collection of succeed/fail tests. By fixing the reward available for each test, individuals succeeding in a particular test depress the size of one another’s fitness gain and hence implicitly co-operate with those succeeding in other tests. An important class of problems of this form is attribute-value learning of classifiers. Here, it is recognised that the combination of diverse classifiers has the potential to enhance performance in comparison with the use of the best obtainable individual classifiers. However, proposed prescriptive measures of the required diversity have inherent limitations from which we would expect the diversity emerging from the self-organisation of a speciating evolutionary simulation to be free. The approach was tested on a number of popularly used real-world data sets and produced encouraging results in terms of accuracy and stability.
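The fitness-sharing scheme above fixes the reward for each test and splits it among the individuals that pass it, so success on rarely solved tests is worth more. A minimal sketch of that computation follows; the individuals and test results are made up.

```python
# Minimal sketch of implicit fitness sharing: each test carries a fixed reward
# that is split equally among the individuals that pass it, so success on
# rarely solved tests is worth more.  The toy data below is made up.

def shared_fitness(results, reward_per_test=1.0):
    """results[i][t] is True if individual i passes test t."""
    n_tests = len(results[0])
    solvers = [sum(ind[t] for ind in results) for t in range(n_tests)]
    return [
        sum(reward_per_test / solvers[t] for t in range(n_tests) if ind[t])
        for ind in results
    ]

# Three individuals, four succeed/fail tests.
results = [
    [True,  True,  False, False],
    [True,  True,  True,  False],
    [True,  False, False, True],
]
print(shared_fitness(results))
# The third individual gets full credit for the last test (it is the only
# solver), even though its raw success rate equals the first individual's.
```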
International Joint Conference on Neural Networks | 2006
Graeme Richards; Wenjia Wang
An ensemble is viewed as a machine learning system that combines multiple models to work collectively, in the hope of producing better performance than that of the individuals. However, an ensemble’s accuracy cannot be easily determined, as it involves several factors, e.g. the accuracy of individual models, the diversity between member models, the decision-making strategy and the number of members, and the relationships between these factors are unclear. This paper, taking random decision tree ensembles as a testing platform, investigates these relationships and the strategies for creating ensembles from randomly generated trees. Specifically, we devised three sets of procedures for conducting experiments using twelve data sets from the UCI repository to determine the importance of individual model accuracy and the diversity between decision tree models within an ensemble. The main findings of the investigations are presented and discussed in the paper.
Artificial Intelligence in Medicine | 2001
Graeme Richards; Victor J. Rayward-Smith; P. H. Sönksen; S. Carey; C. Weng