B. de la Iglesia
University of East Anglia
Publications
Featured research published by B. de la Iglesia.
Journal of Mathematical Modelling and Algorithms | 2006
Alan P. Reynolds; Graeme Richards; B. de la Iglesia; Victor J. Rayward-Smith
Previous research has resulted in a number of different algorithms for rule discovery. Two approaches discussed here, the ‘all-rules’ algorithm and multi-objective metaheuristics, both result in the production of a large number of partial classification rules, or ‘nuggets’, for describing different subsets of the records in the class of interest. This paper describes the application of a number of different clustering algorithms to these rules, in order to identify similar rules and to better understand the data.
Congress on Evolutionary Computation | 2003
B. de la Iglesia; M.S. Philpott; Anthony J. Bagnall; Victor J. Rayward-Smith
In data mining, nugget discovery is the discovery of interesting classification rules that apply to a target class. In previous research, heuristic methods (genetic algorithms, simulated annealing and tabu search) have been used to optimise a single measure of interest. This paper proposes the use of multi-objective optimisation evolutionary algorithms to allow the user to interactively select a number of interest measures and deliver the best nuggets (an approximation to the Pareto-optimal set) according to those measures. Initial experiments are conducted on a number of databases, using an implementation of the fast elitist non-dominated sorting genetic algorithm (NSGA), and two well-known measures of interest. Comparisons with the results obtained using modern heuristic methods are presented. Results indicate the potential of multi-objective evolutionary algorithms for the task of nugget discovery.
European Journal of Operational Research | 2006
B. de la Iglesia; Graeme Richards; M.S. Philpott; Victor J. Rayward-Smith
In this paper, we present an application of multi-objective metaheuristics to the field of data mining. We introduce the data mining task of nugget discovery (also known as partial classification) and show how the multi-objective metaheuristic algorithm NSGA II can be modified to solve this problem. We also present an alternative algorithm for the same task, the ARAC algorithm, which can find all rules that are best according to some measures of interest subject to certain constraints. The ARAC algorithm provides an excellent basis for comparison with the results of the multi-objective metaheuristic algorithm as it can deliver the Pareto optimal front consisting of all partial classification rules that lie in the upper confidence/coverage border, for databases of limited size. We present the results of experiments with various well-known databases for both algorithms. We also discuss how the two methods can be used complementarily for large databases to deliver a set of best rules according to some predefined criteria, providing a powerful tool for knowledge discovery in databases.
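As a rough illustration of the two measures named in the abstract above, the sketch below computes confidence and coverage for a single partial classification rule. It assumes the common definitions (confidence is the share of records matching the rule that belong to the class of interest; coverage is the share of records in the class that the rule matches); the abstract does not give formulas. The function name `rule_measures` and the toy records are hypothetical and not taken from the paper.

```python
def rule_measures(records, antecedent, target):
    """Return (confidence, coverage) of the rule 'antecedent => target'.

    Assumed definitions (not stated in the abstract):
      confidence = |antecedent and target| / |antecedent|
      coverage   = |antecedent and target| / |target|
    """
    matched = [r for r in records if antecedent(r)]
    in_class = [r for r in records if target(r)]
    both = [r for r in matched if target(r)]
    confidence = len(both) / len(matched) if matched else 0.0
    coverage = len(both) / len(in_class) if in_class else 0.0
    return confidence, coverage


# Hypothetical toy data: the class of interest is buys == True.
records = [
    {"age": 25, "income": "low", "buys": True},
    {"age": 32, "income": "high", "buys": True},
    {"age": 47, "income": "high", "buys": False},
    {"age": 51, "income": "low", "buys": False},
    {"age": 29, "income": "high", "buys": True},
]

# Rule: age < 30 => buys
conf, cov = rule_measures(records, lambda r: r["age"] < 30, lambda r: r["buys"])
print(conf, cov)
```

In this toy case the rule matches two records, both in the class, so it is fully confident but covers only two of the three class records. That trade-off is what makes the task multi-objective.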
International Joint Conference on Neural Networks | 2006
Alan P. Reynolds; B. de la Iglesia
Previous research produced a multi-objective metaheuristic for partial classification, where rule dominance is determined through the comparison of rules based on just two objectives: rule confidence and coverage. The user is presented with a set of descriptions of the class of interest from which he may select a subset. This paper presents two enhancements to this algorithm, describing how the use of modified dominance relations may increase the diversity of rules presented to the user and how clustering techniques may be used to aid in the presentation of the potentially large sets of rules generated.
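The standard two-objective dominance comparison mentioned in the abstract above can be sketched as follows. This is a minimal illustration of plain Pareto dominance on confidence and coverage, not the modified dominance relations the paper proposes, which the abstract does not specify. The `Rule` type and the example values are hypothetical.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    name: str
    confidence: float
    coverage: float


def dominates(a: Rule, b: Rule) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.confidence >= b.confidence and a.coverage >= b.coverage
    better = a.confidence > b.confidence or a.coverage > b.coverage
    return no_worse and better


def pareto_front(rules: List[Rule]) -> List[Rule]:
    """Keep only the rules that no other rule dominates."""
    return [r for r in rules if not any(dominates(o, r) for o in rules)]


# Hypothetical rules with made-up objective values.
rules = [
    Rule("r1", 0.90, 0.20),
    Rule("r2", 0.70, 0.55),
    Rule("r3", 0.65, 0.50),  # dominated by r2
    Rule("r4", 0.40, 0.95),
    Rule("r5", 0.90, 0.10),  # dominated by r1
]

front = pareto_front(rules)
print([r.name for r in front])
```

The quadratic filter is adequate for small rule sets. An evolutionary algorithm such as NSGA II applies the same comparison inside its non-dominated sorting step.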
Methods of Information in Medicine | 2011
Joao H. Bettencourt-Silva; B. de la Iglesia; Simon T. Donell; Victor J. Rayward-Smith
BACKGROUND: The information present in Hospital Information Systems (HIS) is heterogeneous and is used primarily by health practitioners to support and improve patient care. Conducting clinical research, data analyses or knowledge discovery projects using electronic patient data in secondary care centres relies on accurate data collection, which is often an ad-hoc process poorly described in the literature. OBJECTIVES: This paper aims to facilitate and expand on the process of retrieving and collating patient-centric data from multiple HIS for the purpose of creating a research database. The development of a process roadmap for this purpose illustrates and exposes the constraints and drawbacks of undertaking such work in secondary care centres. METHODS: A data collection exercise was carried out using a combined approach based on segments of well-established data mining and knowledge discovery methodologies, previous work on clinical data integration and local expert consultation. A case study on prostate cancer was carried out at an English regional National Health Service (NHS) hospital. RESULTS: The process for data retrieval described in this paper allowed patient-centric data, pertaining to the case study on prostate cancer, to be successfully collected from multiple heterogeneous hospital sources, and collated in a format suitable for further clinical research. CONCLUSIONS: The data collection exercise described in this paper exposes the lengthy and difficult journey of retrieving and collating patient-centric, multi-source data from a hospital, which is a non-trivial task, and one which will greatly benefit from further attention from researchers and hospital IT management.
Computational Intelligence and Security | 2005
Hong Yan Yi; B. de la Iglesia; Victor J. Rayward-Smith
Taxonomies are exploited to generate improved decision trees. Experiments show very considerable improvements in tree simplicity can be achieved with little or no loss of accuracy.
Multiple Criteria Decision Making | 2007
Alan P. Reynolds; B. de la Iglesia
The most successful multi-objective metaheuristics, such as NSGA II and SPEA 2, usually apply a form of elitism in the search. However, there are multi-objective problems where this approach leads to a major loss of population diversity early in the search. In earlier work, the authors applied a multi-objective metaheuristic to the problem of rule induction for predictive classification, minimizing rule complexity and misclassification costs. While high quality results were obtained, this problem was found to suffer from such a loss of diversity. This paper describes the use of both linear combinations of objectives and modified dominance relations to control population diversity, producing higher quality results in shorter run times.
Archive | 2001
B. de la Iglesia; C.M. Howard; Victor J. Rayward-Smith
Knowledge discovery in databases (KDD) is an iterative multi-stage process for extracting useful, non-trivial information from large databases. Each stage of the process presents numerous choices to the user that can significantly change the outcome of the project. This methodology, presented in the form of a roadmap, emphasises the importance of the early stages of the KDD process and shows how careful planning can lead to a successful and well-managed project. The content is the result of expertise acquired through research and a wide range of practical experiences; the work is of value to KDD experts and novices alike. Each stage, from specification to exploitation, is described in detail with suggested approaches, resources and questions that should be considered. The final section describes how the methodology has been successfully used in the design of a commercial KDD toolkit.
International Conference on Information and Communication Technologies | 2005
B. de la Iglesia; Alan P. Reynolds
In this paper we explore the application of powerful optimisers known as metaheuristic algorithms to problems within the data mining domain. We introduce some well-known data mining problems, and show how they can be formulated as optimisation problems. We then review the use of metaheuristics in this context. In particular, we focus on the task of partial classification and show how multi-objective metaheuristics have produced results that are comparable to the best known techniques but more scalable to large databases. We conclude by reinforcing the importance of research on the areas of metaheuristics for optimisation and data mining. The combination of robust methods for solving real-life problems in a reasonable time and the ability to apply these methods to the analysis of large repositories of data may hold the key for success in many other scientific and commercial application areas.
Metaheuristics | 2004
B. de la Iglesia; Jan-Jaap Wesselink; Victor J. Rayward-Smith; Jo Dicks; Ian N. Roberts; Vincent Robert; Teun Boekhout
This paper describes new approaches to classification/identification of biological data. It is expected that the work may be extensible to other domains such as the medical domain or fault diagnostic problems. Organisms are often classified according to the value of tests which are used for measuring some characteristic of the organism. When selecting a suitable test set it is important to choose one of minimum cost. Equally, when classification models are constructed for the posterior identification of unnamed individuals it is important to produce optimal models in terms of identification performance and cost. In this paper, we first describe the problem of selecting an economic test set for classification. We develop a criterion for differentiation of organisms which may encompass fuzzy differentiability. Then, we describe the problem of using batches of tests sequentially for identification of unknown organisms, and we explore the problem of constructing the best sequence of batches of tests in terms of cost and identification performance. We discuss how metaheuristic algorithms may be used in the solution of these problems. We also present an application of the above to the problem of yeast classification and identification.
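To make the economic test set problem in the abstract above concrete, the sketch below uses a simple greedy heuristic: repeatedly pick the test that separates the most still-undifferentiated pairs of organisms per unit cost. This is a stand-in for illustration only. The paper discusses metaheuristic algorithms and a possibly fuzzy differentiation criterion, neither of which is reproduced here. Two organisms are assumed to be differentiated when some chosen test gives them different results, and the function name, organisms, tests and costs are all hypothetical.

```python
from itertools import combinations


def greedy_test_set(profiles, costs):
    """Greedily choose tests until every pair of organisms is differentiated.

    profiles: organism name -> {test name -> result}
    costs:    test name -> positive cost
    Assumption: a pair is differentiated if any chosen test gives different results.
    """
    pairs = set(combinations(sorted(profiles), 2))
    chosen = []
    while pairs:
        best, best_ratio, best_split = None, 0.0, set()
        for test, cost in costs.items():
            if test in chosen:
                continue
            # Pairs this test would newly separate.
            split = {(a, b) for a, b in pairs
                     if profiles[a][test] != profiles[b][test]}
            ratio = len(split) / cost
            if ratio > best_ratio:
                best, best_ratio, best_split = test, ratio, split
        if best is None:
            raise ValueError("remaining pairs cannot be differentiated")
        chosen.append(best)
        pairs -= best_split
    return chosen


# Hypothetical organisms, tests and costs.
profiles = {
    "A": {"t1": 0, "t2": 0, "t3": 0},
    "B": {"t1": 0, "t2": 1, "t3": 1},
    "C": {"t1": 1, "t2": 0, "t3": 2},
    "D": {"t1": 1, "t2": 1, "t3": 3},
}
costs = {"t1": 1.0, "t2": 1.0, "t3": 5.0}

chosen = greedy_test_set(profiles, costs)
print(chosen, sum(costs[t] for t in chosen))
```

Here the expensive test `t3` separates every pair on its own, yet the two cheap tests together do the same at lower total cost. A greedy choice is not guaranteed to be optimal in general, which is one reason to consider metaheuristic search for this kind of problem.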