Richard J. Bolton
Imperial College London
Publication
Featured research published by Richard J. Bolton.
Journal of Applied Statistics | 2004
David J. Hand; Richard J. Bolton
Modern statistical data analysis is predominantly model-driven, seeking to decompose an observed data distribution in terms of major underlying descriptive features modified by some stochastic variation. A large part of data mining is also concerned with this exercise. However, another fundamental part of data mining is concerned with detecting anomalies amongst the vast mass of the data: the small deviations, unusual observations, unexpected clusters of observations, or surprising blips in the data, which the model does not explain. We call such anomalies patterns. For sound reasons, which are outlined in the paper, the data mining community has tended to focus on the algorithmic aspects of pattern discovery, and has not developed any general underlying theoretical base. However, such a base is important for any technology: it helps to steer the direction in which the technology develops, as well as serving to provide a basis from which algorithms can be compared, and to indicate which problems are the important ones waiting to be solved. This paper attempts to provide such a theoretical base, linking the ideas to statistical work in spatial epidemiology, scan statistics, outlier detection, and other areas. One of the striking characteristics of work on pattern discovery is that the ideas have been developed in several theoretical arenas, and also in several application domains, with little apparent awareness of the fundamentally common nature of the problem. Like model building, pattern discovery is fundamentally an inferential activity, and is an area in which statisticians can make very significant contributions.
Lecture Notes in Computer Science | 2002
Richard J. Bolton; David J. Hand; Niall M. Adams
The problem of spurious apparent patterns arising by chance is a fundamental one for pattern detection. Classical approaches, based on adjustments such as the Bonferroni procedure, are arguably not appropriate in a data mining context. Instead, methods based on the false discovery rate - the proportion of flagged patterns which do not represent an underlying reality - may be more relevant. We describe such procedures and illustrate their application on a marketing dataset.
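The contrast drawn in this abstract can be sketched in a few lines. The abstract does not say which false discovery rate procedure the authors use. The sketch below assumes the Benjamini-Hochberg step-up procedure, a standard FDR-controlling method, and compares it with a Bonferroni adjustment. The function names and the p-values are illustrative assumptions, not taken from the paper.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag a pattern only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (assumed here, not named in the source).

    Sort the p-values, find the largest rank k such that
    p_(k) <= (k / m) * alpha, and flag the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    flagged = [False] * m
    for idx in order[:k_max]:
        flagged[idx] = True
    return flagged


# Hypothetical p-values for ten candidate patterns (made up for illustration).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

n_bonferroni = sum(bonferroni(p_values))
n_bh = sum(benjamini_hochberg(p_values))
print("Bonferroni flags:", n_bonferroni)
print("Benjamini-Hochberg flags:", n_bh)
```

On these made-up values the FDR-based rule flags more candidate patterns than Bonferroni does. That is the usual trade-off: it tolerates a controlled proportion of false discoveries instead of guarding against any single false positive.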
Statistics and Computing | 2003
Richard J. Bolton; David J. Hand; Andrew R. Webb
Principal Components Analysis (PCA) is traditionally a linear technique for projecting multidimensional data onto lower dimensional subspaces with minimal loss of variance. However, there are several applications where the data lie in a lower dimensional subspace that is not linear; in these cases linear PCA is not the optimal method to recover this subspace and thus account for the largest proportion of variance in the data. Nonlinear PCA addresses the nonlinearity problem by relaxing the linear restrictions on standard PCA. We investigate both linear and nonlinear approaches to PCA both exclusively and in combination. In particular we introduce a combination of projection pursuit and nonlinear regression for nonlinear PCA. We compare the success of PCA techniques in variance recovery by applying linear, nonlinear and hybrid methods to some simulated and real data sets. We show that the best linear projection that captures the structure in the data (in the sense that the original data can be reconstructed from the projection) is not necessarily a (linear) principal component. We also show that the ability of certain nonlinear projections to capture data structure is affected by the choice of constraint in the eigendecomposition of a nonlinear transform of the data. Similar success in recovering data structure was observed for both linear and nonlinear projections.
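A small numerical sketch may help show why linear PCA can fall short when the data lie on a curved, lower-dimensional structure. It does not implement the paper's projection pursuit method. It only applies ordinary linear PCA to hypothetical two-dimensional points on a parabola, a one-dimensional curve, and reports the share of variance captured by the first principal component. The data set and the function name are assumptions made for illustration.

```python
import math


def first_component_share(points):
    """Share of total variance explained by the first linear principal component.

    Uses the closed-form eigenvalues of the 2x2 covariance matrix,
    so it works only for two-dimensional data.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n

    trace = sxx + syy
    det = sxx * syy - sxy ** 2
    # Guard against tiny negative values caused by rounding.
    disc = math.sqrt(max(trace ** 2 - 4.0 * det, 0.0))
    largest_eigenvalue = (trace + disc) / 2.0
    return largest_eigenvalue / trace


# Hypothetical data: points on the parabola y = x^2 for x in [-1, 1].
points = [(i / 100, (i / 100) ** 2) for i in range(-100, 101)]

share = first_component_share(points)
print(round(share, 3))
```

Every point here is determined by the single parameter x, yet one linear component explains only roughly 0.79 of the variance. A nonlinear one-dimensional description could in principle recover all of it.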
International Conference on Data Mining | 2001
Richard J. Bolton; David J. Hand
The authors consider the question of uncertainty of detected patterns in data mining. In particular, we develop statistical tests for patterns found in continuous data, indicating the significance of these patterns in terms of the probability that they have occurred by chance. We examine the performance of these tests on patterns detected in several large data sets, including a data set describing the locations of earthquakes in California and another describing flow cytometry measurements on phytoplankton.
Statistical Science | 2002
Richard J. Bolton; David J. Hand
Archive | 2002
Richard J. Bolton; David J. Hand
Knowledge Discovery and Data Mining | 2003
Richard J. Bolton; Niall M. Adams
Archive | 2001
Richard J. Bolton; David J. Hand
Proceedings of the ESF Exploratory Workshop on Pattern Detection and Discovery | 2002
David J. Hand; Niall M. Adams; Richard J. Bolton
ESF Exploratory Workshops | 2002
David J. Hand; Niall M. Adams; Richard J. Bolton