
Publications


Featured research published by Beatriz de la Iglesia.


International Conference on Evolutionary Multi-Criterion Optimization | 2005

Developments on a multi-objective metaheuristic (MOMH) algorithm for finding interesting sets of classification rules

Beatriz de la Iglesia; Alan P. Reynolds; Victor J. Rayward-Smith

In this paper, we experiment with a combination of innovative approaches to rule induction to encourage the production of interesting sets of classification rules. These include multi-objective metaheuristics to induce the rules; measures of rule dissimilarity to encourage the production of dissimilar rules; and rule clustering algorithms to evaluate the results obtained. Our previous implementation of NSGA-II for rule induction produces a set of cc-optimal rules (coverage-confidence optimal rules). Among the set of rules produced there may be rules that are very similar. We explore the concept of rule similarity and experiment with a number of modifications of the crowding distance to increase the diversity of the partial classification rules produced by the multi-objective algorithm.
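To make the crowding-distance modification concrete, the sketch below scales NSGA-II's standard crowding distance by a rule's average dissimilarity to the rest of the front, so near-duplicate rules are down-weighted. This is a minimal illustration rather than the paper's implementation: the attribute-set encoding of rules and the Jaccard dissimilarity measure are assumptions made for the example.

```python
# Minimal sketch: biasing NSGA-II's crowding distance with rule dissimilarity.
# Rules are encoded here as frozensets of attribute tests (an assumption for
# illustration); dissimilarity is the Jaccard distance between two rules.

def jaccard_dissimilarity(rule_a, rule_b):
    """1 - |A intersect B| / |A union B| over the tests used by two rules."""
    union = rule_a | rule_b
    if not union:
        return 0.0
    return 1.0 - len(rule_a & rule_b) / len(union)

def modified_crowding(front, objectives):
    """Standard objective-space crowding distance, scaled by each rule's mean
    dissimilarity to the rest of the front, so that near-duplicate rules are
    penalised even when their coverage/confidence values differ."""
    n = len(front)
    distance = [0.0] * n
    for m in range(len(objectives[0])):
        order = sorted(range(n), key=lambda i: objectives[i][m])
        distance[order[0]] = distance[order[-1]] = float("inf")
        span = (objectives[order[-1]][m] - objectives[order[0]][m]) or 1.0
        for k in range(1, n - 1):
            distance[order[k]] += (objectives[order[k + 1]][m]
                                   - objectives[order[k - 1]][m]) / span
    for i in range(n):
        others = [jaccard_dissimilarity(front[i], front[j])
                  for j in range(n) if j != i]
        diversity = sum(others) / len(others) if others else 1.0
        if distance[i] != float("inf"):
            distance[i] *= diversity
    return distance

# Example: three partial classification rules with (coverage, confidence).
rules = [frozenset({"age>50", "bp=high"}),
         frozenset({"age>50", "bp=high", "smoker"}),
         frozenset({"chol>6"})]
print(modified_crowding(rules, [(0.4, 0.9), (0.3, 0.95), (0.5, 0.7)]))
```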


Heart | 2011

Performance of the ASSIGN cardiovascular disease risk score on a UK cohort of patients from general practice

Beatriz de la Iglesia; John F. Potter; Neil Poulter; Margaret M. Robins; Jane Skinner

Objective: To evaluate the performance of ASSIGN against the Framingham equations for predicting 10 year risk of cardiovascular disease in a UK cohort of patients from general practice and to make the evaluation comparable to an independent evaluation of QRISK on the same cohort.
Design: Prospective open cohort study.
Setting: 288 practices from England and Wales contributing to The Health Improvement Network (THIN) database.
Participants: Patients registered with 288 UK practices for some period between January 1995 and March 2006. The number of records available was 1 787 169.
Main outcome measures: First diagnosis of myocardial infarction, coronary heart disease, stroke and transient ischaemic attacks recorded.
Methods: We implemented the Anderson Framingham Coronary Heart Disease and Stroke models, ASSIGN, and a more recent Framingham Cox proportional-hazards model and analysed their calibration and discrimination.
Results: Calibration showed that all models tested over-estimated risk, particularly for men. ASSIGN showed better discrimination, with higher AUROC (0.756/0.792 for men/women), D statistic (1.35/1.58 for men/women), and R2 (30.47%/37.39% for men/women). The performance of ASSIGN was comparable to that of QRISK on the same cohort. Models agreed on 93–97% of categorical (high/lower) risk assessments and, when they disagreed, ASSIGN was often closer to the estimated Kaplan-Meier incidence. ASSIGN also provided a steeper gradient of deprivation and discriminated between those with and without recorded family history of CVD. The estimated incidence was twice/three times as high for women/men with a recorded family history of CVD.
Conclusions: For systematic CVD risk assessment all models could usefully be applied, but ASSIGN improved on the gradient of deprivation and accounted for recorded family history whereas the Framingham equations did not. However, all models display relatively low specificity and sensitivity. An additional conclusion is that the recording of family history of CVD in primary care databases needs to improve given its importance in risk assessment.
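The discrimination comparison reported here follows a standard pattern that can be sketched with scikit-learn: score two risk models against observed outcomes by AUROC, with a crude calibration check. The data below are synthetic placeholders with no clinical meaning; only the evaluation pattern reflects the study.

```python
# Sketch of the discrimination comparison: AUROC of two risk scores against
# observed 10-year CVD outcomes, plus a crude calibration check. All numbers
# are synthetic placeholders with no clinical meaning.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
outcome = rng.binomial(1, 0.1, size=5000)  # 1 = CVD event within 10 years

# Hypothetical predicted risks from two models (stand-ins for ASSIGN and a
# Framingham equation), correlated with the outcome by construction.
assign = np.clip(0.08 + 0.15 * outcome + rng.normal(0, 0.05, 5000), 0, 1)
framingham = np.clip(0.08 + 0.10 * outcome + rng.normal(0, 0.06, 5000), 0, 1)

for name, risk in [("ASSIGN", assign), ("Framingham", framingham)]:
    print(f"{name}: AUROC={roc_auc_score(outcome, risk):.3f}, "
          f"mean predicted={risk.mean():.3f}, observed={outcome.mean():.3f}")
```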


Soft Computing | 2008

A multi-objective GRASP for partial classification

Alan P. Reynolds; Beatriz de la Iglesia

Metaheuristic algorithms have been used successfully in a number of data mining contexts and specifically in the production of classification rules. Classification rules describe a class of interest or a subset of this class, and as such may also be used as an aid in prediction. The production and selection of classification rules for a particular class of the database is often referred to as partial classification. Since partial classification rules are often evaluated according to a number of conflicting objectives, the generation of such rules is a task that is well suited to a multi-objective (MO) metaheuristic approach. In this paper we discuss how to adapt well known MO algorithms for the task of partial classification. Additionally, we introduce a new MO algorithm for this task based on a greedy randomized adaptive search procedure (GRASP). GRASP has been applied to a number of problems in combinatorial optimization, but it has very seldom been used in an MO setting, and generally only through repeated optimization of single objective problems, using either linear combinations of the objectives or additional constraints. The approach presented takes advantage of some specific characteristics of the data mining problem being solved, allowing for the very effective construction of a set of solutions that form the starting point for the local search phase of the GRASP. The resulting algorithm is guided solely by the concepts of dominance and Pareto-optimality. We present experimental results for our partial classification GRASP and other MO metaheuristics. These show that such algorithms are generally very well suited to this data mining task and, furthermore, the GRASP brings additional efficiency to the search for partial classification rules.
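A minimal sketch of a dominance-guided GRASP construction step is given below, assuming a conjunctive rule encoding (a set of attribute tests) and (coverage, confidence) as the two maximised objectives; the restricted-candidate-list scoring is an illustrative choice, not the paper's exact procedure.

```python
# Minimal sketch of a dominance-guided GRASP step for partial classification.
# A rule is a set of conditions; objectives are (coverage, confidence), both
# maximised. Candidate scoring and the toy data are assumptions.
import random

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (maximisation)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def evaluate(rule, records, target):
    """(coverage, confidence) of a conjunctive rule over labelled records."""
    covered = [lbl for rec, lbl in records if rule <= rec]
    if not covered:
        return (0.0, 0.0)
    hits = sum(1 for lbl in covered if lbl == target)
    return (len(covered) / len(records), hits / len(covered))

def grasp_construct(conditions, records, target, alpha=0.5, max_len=2):
    """Greedy randomised construction: repeatedly add a condition drawn from
    the restricted candidate list (top alpha fraction by confidence)."""
    random.seed(7)
    rule = frozenset()
    for _ in range(max_len):
        scored = sorted(((evaluate(rule | {c}, records, target)[1], c)
                         for c in conditions - rule), reverse=True)
        if not scored:
            break
        rcl = scored[:max(1, int(alpha * len(scored)))]
        rule = rule | {random.choice(rcl)[1]}
    return rule, evaluate(rule, records, target)

# Toy data: each record is the set of attribute tests it satisfies, plus a label.
records = [(frozenset({"a", "b"}), 1), (frozenset({"a"}), 0),
           (frozenset({"b", "c"}), 1), (frozenset({"c"}), 0)]
rule, objs = grasp_construct({"a", "b", "c"}, records, target=1)
print(rule, objs, dominates((0.5, 1.0), objs))
```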


Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery | 2013

Evolutionary computation for feature selection in classification problems

Beatriz de la Iglesia

Feature subset selection (FSS) has received a great deal of attention in statistics, machine learning, and data mining. Real-world data analyzed by data mining algorithms can involve a large number of redundant or irrelevant features, or simply too many features for a learning algorithm to handle efficiently. Feature selection is becoming essential as databases grow in size and complexity. The selection process is expected to bring benefits in terms of better performing models, computational efficiency, and simpler, more understandable models. Evolutionary computation (EC) encompasses a number of nature-inspired techniques such as genetic algorithms, genetic programming, ant colony optimization, and particle swarm optimization algorithms. Such techniques are well suited to feature selection because the representation of a feature subset is straightforward and the evaluation can also be easily accomplished through the use of wrapper or filter algorithms. Furthermore, the capability of such heuristic algorithms to efficiently search large search spaces is of great advantage to the feature selection problem. Here, we review the use of different EC paradigms for feature selection in classification problems. We discuss details of each implementation, including representation, evaluation, and validation. The review enables us to uncover the best EC algorithms for FSS and to point at future research directions. WIREs Data Mining Knowl Discov 2013, 3:381–407. doi: 10.1002/widm.1106
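As an illustration of the wrapper approach the review discusses, the sketch below runs a plain genetic algorithm over bit-mask feature subsets, scoring each subset by cross-validated k-NN accuracy. The operators, dataset, and hyperparameters are illustrative defaults, not recommendations from the review.

```python
# Sketch of GA-based wrapper feature selection: individuals are bit masks
# over features, fitness is cross-validated k-NN accuracy on the selected
# subset. Operators and hyperparameters are illustrative, not prescriptive.
import random
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
n_features = X.shape[1]
random.seed(1)

def fitness(mask):
    """Wrapper evaluation: CV accuracy of a classifier on the masked subset."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # empty subsets are worthless
    return cross_val_score(KNeighborsClassifier(), X[:, idx], y, cv=3).mean()

def evolve(pop_size=20, generations=10, p_mut=0.05):
    pop = [[random.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]              # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_features)  # one-point crossover
            children.append([bit ^ (random.random() < p_mut)  # bit-flip mutation
                             for bit in a[:cut] + b[cut:]])
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print("selected:", [i for i, b in enumerate(best) if b],
      "cv accuracy:", round(fitness(best), 3))
```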


International Conference on Evolutionary Multi-Criterion Optimization | 2007

Rule induction for classification using multi-objective genetic programming

Alan P. Reynolds; Beatriz de la Iglesia

Multi-objective metaheuristics have previously been applied to partial classification, where the objective is to produce simple, easy to understand rules that describe subsets of a class of interest. While this provides a useful aid in descriptive data mining, it is difficult to see how the rules produced can be combined usefully to make a predictive classifier. This paper describes how, by using a more complex representation of the rules, it is possible to produce effective classifiers for two-class problems. Furthermore, through the use of multi-objective genetic programming, the user can be provided with a selection of classifiers providing different trade-offs between the misclassification costs and the overall model complexity.
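A rough sketch of the richer representation described, a boolean expression tree over attribute tests scored on the two competing objectives, follows; the node encoding and cost weights are assumptions made for the example.

```python
# Sketch of a rule as a boolean expression tree, scored on misclassification
# cost and model complexity. The tuple-based node encoding and the unit cost
# weights are assumptions for illustration.

def evaluate_tree(node, record):
    """Recursively evaluate a rule tree: ('and'|'or', left, right) nodes, or
    a leaf predicate name tested for membership in the record's test set."""
    if isinstance(node, str):
        return node in record
    op, left, right = node
    l, r = evaluate_tree(left, record), evaluate_tree(right, record)
    return (l and r) if op == "and" else (l or r)

def complexity(node):
    """Complexity objective: total node count of the expression tree."""
    if isinstance(node, str):
        return 1
    return 1 + complexity(node[1]) + complexity(node[2])

def misclassification_cost(node, records, c_fp=1.0, c_fn=1.0):
    """Cost objective over labelled records (class 1 = class of interest)."""
    cost = 0.0
    for record, label in records:
        predicted = evaluate_tree(node, record)
        if predicted and label == 0:
            cost += c_fp
        elif not predicted and label == 1:
            cost += c_fn
    return cost

# A two-class toy problem and one candidate classifier from a trade-off front.
records = [({"age>50", "smoker"}, 1), ({"age>50"}, 0),
           ({"smoker", "chol>6"}, 1), (set(), 0)]
tree = ("or", ("and", "age>50", "smoker"), "chol>6")
print(misclassification_cost(tree, records), complexity(tree))
```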


Bioinformatics | 2002

Determining a unique defining DNA sequence for yeast species using hashing techniques

Jan-Jaap Wesselink; Beatriz de la Iglesia; Stephen A. James; Jo Dicks; Ian N. Roberts; Victor J. Rayward-Smith

Motivation: Yeasts are often still identified with physiological growth tests, which are both time consuming and unsuitable for detection of a mixture of organisms. Hence, there is a need for molecular methods to identify yeast species.
Results: A hashing technique has been developed to search for unique DNA sequences in 702 26S rRNA genes. A unique DNA sequence has been found for almost every yeast species described to date. The locations of the unique defining sequences are in accordance with the variability map of large subunit ribosomal RNA and provide detail of the evolution of the D1/D2 region. This approach will be applicable to the rapid identification of unique sequences in other DNA sequence sets.
Availability: Freely available upon request from the authors.
Supplementary information: Results are available at http://www.sys.uea.ac.uk/~jjw/project/paper
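The hashing idea can be sketched as follows: index every length-k substring of each species' sequence in one table, then keep the k-mers that occur in exactly one species. The toy sequences and the choice of k below are placeholders, not the 26S rRNA data.

```python
# Sketch of the hashing technique: hash all k-mers of every sequence into one
# table, then report, per species, the k-mers found in that species alone.
from collections import defaultdict

def unique_defining_kmers(sequences, k):
    """Map each k-mer to the set of species containing it, then return the
    k-mers that are unique to a single species."""
    table = defaultdict(set)
    for species, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            table[seq[i:i + k]].add(species)
    unique = defaultdict(list)
    for kmer, owners in table.items():
        if len(owners) == 1:
            unique[next(iter(owners))].append(kmer)
    return unique

# Toy sequences standing in for the D1/D2 region of three species.
toy = {"S. cerevisiae": "ACGTACGGT",
       "C. albicans":   "ACGTTCGGA",
       "K. lactis":     "ACGTACGGA"}
for species, kmers in unique_defining_kmers(toy, k=5).items():
    print(species, kmers)
```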


Genetic and Evolutionary Computation Conference | 2009

A multiobjective GRASP for rule selection

Alan P. Reynolds; David Corne; Beatriz de la Iglesia

This paper describes the application of a multiobjective GRASP to rule selection, where previously generated simple rules are combined to give rule sets that minimize complexity and misclassification cost. As rule selection performance depends heavily on the diversity and quality of the previously generated rules, this paper also investigates a range of multiobjective approaches for creating this initial rule set and the effect on the quality of the resulting classifier.
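A minimal sketch of the selection step appears below, assuming each pre-generated rule is summarised by a per-record firing vector and a record is classified positive if any selected rule fires; the greedy randomised construction with a restricted candidate list is one plausible reading of the approach, not the paper's exact algorithm.

```python
# Sketch of rule selection: combine pre-generated simple rules into a rule
# set, trading off rule-set size against misclassification. The candidate
# pool and labels are toy values.
import random

def rule_set_errors(selected, firings, labels):
    """Misclassifications when a record is positive iff any selected rule fires."""
    errors = 0
    for i, label in enumerate(labels):
        predicted = any(firings[r][i] for r in selected)
        errors += predicted != label
    return errors

def randomized_selection(firings, labels, alpha=0.5, seed=3):
    """Greedy randomised construction: at each step, add a rule drawn from
    the best alpha-fraction of candidates by error reduction; stop when no
    candidate improves. Returns (rule set, errors, rule-set size)."""
    random.seed(seed)
    selected, candidates = set(), set(firings)
    current = rule_set_errors(selected, firings, labels)
    while candidates:
        gains = sorted(((current - rule_set_errors(selected | {r}, firings,
                                                   labels), r)
                        for r in candidates), reverse=True)
        improving = [g for g in gains if g[0] > 0]
        if not improving:
            break
        pick = random.choice(improving[:max(1, int(alpha * len(improving)))])[1]
        selected.add(pick)
        candidates.discard(pick)
        current = rule_set_errors(selected, firings, labels)
    return selected, current, len(selected)

# Three candidate rules, each with a per-record firing vector, plus labels.
firings = {"r1": [1, 1, 0, 0, 0], "r2": [0, 0, 1, 0, 1], "r3": [1, 0, 0, 1, 0]}
labels = [1, 1, 1, 0, 0]
print(randomized_selection(firings, labels))
```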


JMIR Medical Informatics | 2015

Building Data-Driven Pathways From Routinely Collected Hospital Data: A Case Study on Prostate Cancer

Joao H. Bettencourt-Silva; Jeremy Clark; Colin S. Cooper; Rob Mills; Victor J. Rayward-Smith; Beatriz de la Iglesia

Background: Routinely collected data in hospitals is complex, typically heterogeneous, and scattered across multiple Hospital Information Systems (HIS). This big data, created as a byproduct of health care activities, has the potential to provide a better understanding of diseases, unearth hidden patterns, and improve services and cost. The extent and uses of such data rely on its quality, which is not consistently checked, nor fully understood. Nevertheless, using routine data for the construction of data-driven clinical pathways, describing processes and trends, is a key topic receiving increasing attention in the literature. Traditional algorithms do not cope well with unstructured processes or data, and do not produce clinically meaningful visualizations. Supporting systems that provide additional information, context, and quality assurance inspection are needed.
Objective: The objective of the study is to explore how routine hospital data can be used to develop data-driven pathways that describe the journeys that patients take through care, and their potential uses in biomedical research; it proposes a framework for the construction, quality assessment, and visualization of patient pathways for clinical studies and decision support using a case study on prostate cancer.
Methods: Data pertaining to prostate cancer patients were extracted from a large UK hospital from eight different HIS, validated, and complemented with information from the local cancer registry. Data-driven pathways were built for each of the 1904 patients and an expert knowledge base, containing rules on the prostate cancer biomarker, was used to assess the completeness and utility of the pathways for a specific clinical study. Software components were built to provide meaningful visualizations for the constructed pathways.
Results: The proposed framework and pathway formalism enable the summarization, visualization, and querying of complex patient-centric clinical information, as well as the computation of quality indicators and dimensions. A novel graphical representation of the pathways allows the synthesis of such information.
Conclusions: Clinical pathways built from routinely collected hospital data can unearth information about patients and diseases that may otherwise be unavailable or overlooked in hospitals. Data-driven clinical pathways allow for heterogeneous data (ie, semistructured and unstructured data) to be collated over a unified data model and for data quality dimensions to be assessed. This work has enabled further research on prostate cancer and its biomarkers, and on the development and application of methods to mine, compare, analyze, and visualize pathways constructed from routine data. This is an important development for the reuse of big data in hospitals.
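The core pathway-construction step, collating time-stamped events from several source systems into one chronologically ordered patient pathway, can be sketched briefly. All field names and events below are invented for illustration; the paper's framework also covers quality assessment and visualization, which are not shown.

```python
# Sketch: collate time-stamped events for one patient from multiple hospital
# systems into a single ordered pathway, then render a compact textual view.
from datetime import date

# Events as extracted from different source systems: (source, date, event).
raw_events = [
    ("pathology", date(2010, 3, 1),  "PSA 6.2 ng/mL"),
    ("clinic",    date(2010, 2, 10), "urology referral"),
    ("pathology", date(2010, 4, 2),  "biopsy: Gleason 3+4"),
    ("theatre",   date(2010, 6, 15), "radical prostatectomy"),
]

def build_pathway(events):
    """Order events chronologically into a single patient pathway."""
    return sorted(events, key=lambda e: e[1])

def render(pathway):
    """One-line-per-step textual view, with days elapsed since the first event."""
    start = pathway[0][1]
    for source, when, event in pathway:
        print(f"day {(when - start).days:>3} [{source}] {event}")

render(build_pathway(raw_events))
```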


Lecture Notes in Computer Science | 2003

Algorithms for identification key generation and optimization with application to yeast identification

Alan P. Reynolds; Jo Dicks; Ian N. Roberts; Jan-Jaap Wesselink; Beatriz de la Iglesia; Vincent Robert; Teun Boekhout; Victor J. Rayward-Smith

Algorithms for the automated creation of low cost identification keys are described, and theoretical and empirical justifications are provided. The algorithms are shown to handle differing test costs, prior probabilities for each potential diagnosis, and tests that produce uncertain results. The approach is then extended to cover situations where more than one measure of cost is of importance, by allowing tests to be performed in batches. Experiments are performed on a real-world case study involving the identification of yeasts.
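One plausible sketch of a greedy key-construction step: at each node of the key, choose the test with the lowest cost per bit of expected entropy reduction, given prior probabilities over the candidate species. The tests, costs, and priors below are toy values, not the yeast case study, and the cost-per-bit criterion is an illustrative heuristic rather than the paper's exact objective.

```python
# Sketch: pick the next test in an identification key by minimising cost per
# bit of expected entropy reduction over the species priors.
import math

def entropy(priors):
    """Shannon entropy (bits) of a distribution over candidate species."""
    return -sum(p * math.log2(p) for p in priors.values() if p > 0)

def best_test(species_priors, tests):
    """tests: name -> (cost, {species: outcome}). Choose the cheapest test
    per bit of expected entropy reduction."""
    base = entropy(species_priors)
    best, best_ratio = None, float("inf")
    for name, (cost, outcomes) in tests.items():
        groups = {}
        for sp, p in species_priors.items():
            groups.setdefault(outcomes[sp], {})[sp] = p
        remaining = 0.0
        for g in groups.values():
            mass = sum(g.values())
            remaining += mass * entropy({sp: p / mass for sp, p in g.items()})
        gain = base - remaining
        if gain > 1e-9 and cost / gain < best_ratio:
            best, best_ratio = name, cost / gain
    return best

# Toy priors and tests: a cheap growth test vs an expensive sequencing test.
priors = {"A": 0.5, "B": 0.3, "C": 0.2}
tests = {"growth@37C": (1.0, {"A": "+", "B": "-", "C": "-"}),
         "PCR-D1D2":   (5.0, {"A": "x", "B": "y", "C": "z"})}
print(best_test(priors, tests))
```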


Privacy, Security, Risk and Trust | 2012

Non-linear Dimensionality Reduction for Privacy-Preserving Data Classification

Khaled Alotaibi; Victor J. Rayward-Smith; Wenjia Wang; Beatriz de la Iglesia

Many techniques have been proposed to protect the privacy of data outsourced for analysis by external parties. However, most of these techniques distort the underlying data properties and therefore hinder data mining algorithms from discovering patterns. The aim of Privacy-Preserving Data Mining (PPDM) is to generate a data-friendly transformation that maintains both the privacy and the utility of the data. We have proposed a novel privacy-preserving framework based on non-linear dimensionality reduction (i.e. non-metric multidimensional scaling) to perturb the original data. The perturbed data exhibited good utility in terms of distance preservation between objects. This was tested on a clustering task with good results. In this paper, we test our novel PPDM approach on a classification task using a k-Nearest Neighbour (k-NN) classification algorithm. We compare the classification results obtained from both the original and the perturbed data and find them to be much the same, particularly at lower dimensionalities. We show that, for distance-based classification, our approach preserves the utility of the data while hiding the private details.
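The evaluation described can be sketched with scikit-learn: perturb a dataset with non-metric MDS, then compare cross-validated k-NN accuracy on the original and perturbed data. The dataset, number of neighbours, and target dimensionality below are illustrative choices, not the paper's experimental setup.

```python
# Sketch: perturb data via non-metric multidimensional scaling, then compare
# k-NN accuracy on original vs perturbed data. Parameters are illustrative.
from sklearn.datasets import load_iris
from sklearn.manifold import MDS
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Non-metric MDS preserves only the rank order of inter-point distances,
# which is the property a distance-based classifier such as k-NN relies on.
X_perturbed = MDS(n_components=2, metric=False,
                  random_state=0).fit_transform(X)

knn = KNeighborsClassifier(n_neighbors=5)
for name, data in [("original", X), ("perturbed", X_perturbed)]:
    acc = cross_val_score(knn, data, y, cv=5).mean()
    print(f"{name}: k-NN accuracy = {acc:.3f}")
```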

Collaboration


Dive into Beatriz de la Iglesia's collaboration.

Top Co-Authors

Fathi H. Saad (University of East Anglia)
Duncan Bell (University Campus Suffolk)
Khaled Alotaibi (University of East Anglia)
Aalaa Mojahed (University of East Anglia)
John F. Potter (University of East Anglia)
Oliver Kirkland (University of East Anglia)