David John Weston
Imperial College London
Publication
Featured research published by David John Weston.
The Annals of Applied Statistics | 2010
Nicholas A. Heard; David John Weston; Kiriaki Platanioti; David J. Hand
Learning the network structure of a large graph is computationally demanding, and dynamically monitoring the network over time for any changes in structure threatens to be more challenging still. This paper presents a two-stage method for anomaly detection in dynamic graphs: the first stage uses simple, conjugate Bayesian models for discrete time counting processes to track the pairwise links of all nodes in the graph to assess normality of behavior; the second stage applies standard network inference tools on a greatly reduced subset of potentially anomalous nodes. The utility of the method is demonstrated on simulated and real data sets.
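Here is a minimal sketch of the first stage. It assumes the simplest conjugate pair for discrete-time counts: the number of events on one link per period is Poisson, with a Gamma prior on its rate. The abstract does not say which conjugate models the paper uses, so the prior values, the function names and the tail-probability score are illustrative assumptions.

```python
import math

def posterior(alpha, beta, counts):
    """Gamma(alpha, beta) prior (shape, rate) on a Poisson rate, updated with the
    counts observed on one link: Gamma(alpha + sum(counts), beta + len(counts))."""
    return alpha + sum(counts), beta + len(counts)

def predictive_pmf(k, alpha, beta):
    """Posterior predictive probability of k events in the next period
    (a negative binomial distribution), computed on the log scale."""
    log_p = (math.lgamma(alpha + k) - math.lgamma(alpha) - math.lgamma(k + 1)
             + alpha * math.log(beta / (beta + 1.0))
             - k * math.log(beta + 1.0))
    return math.exp(log_p)

def upper_tail_pvalue(k_obs, alpha, beta):
    """P(K >= k_obs) under the predictive; a small value marks an unusual burst."""
    return 1.0 - sum(predictive_pmf(k, alpha, beta) for k in range(k_obs))

# One link's recent history, with a vague Gamma(1, 1) prior (hypothetical values).
a, b = posterior(1.0, 1.0, [2, 3, 1, 2, 2, 3, 2])
p_typical = upper_tail_pvalue(2, a, b)   # an ordinary period
p_burst = upper_tail_pvalue(12, a, b)    # a sudden surge of contact
```

Only links with small tail probabilities would be handed to the second stage. The closed-form update is what makes scoring every link cheap.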
Data Mining and Knowledge Discovery | 2009
Christopher Whitrow; David J. Hand; Piotr Juszczak; David John Weston; Niall M. Adams
The problem of preprocessing transaction data for supervised fraud classification is considered. It is impractical to present an entire series of transactions to a fraud detection system, partly because of the very high dimensionality of such data but also because of the heterogeneity of the transactions. Hence, a framework for transaction aggregation is considered and its effectiveness is evaluated against transaction-level detection, using a variety of classification methods and a realistic cost-based performance measure. These methods are applied in two case studies using real data. Transaction aggregation is found to be advantageous in many but not all circumstances. Also, the length of the aggregation period has a large impact upon performance. Aggregation seems particularly effective when a random forest is used for classification. Moreover, random forests were found to perform better than other classification methods, including SVMs, logistic regression and KNN. Aggregation also has the advantage of not requiring precisely labeled data and may be more robust to the effects of population drift.
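The aggregation idea can be sketched as follows. The sketch assumes each transaction is an (account, date, amount) record and that the aggregates are simple summaries over a trailing window. The feature set and the function name are hypothetical, since the abstract does not list the paper's own aggregates. The `days` parameter plays the role of the aggregation period, whose length the authors found to matter.

```python
from collections import defaultdict
from datetime import date, timedelta

def aggregate(transactions, end, days):
    """Summarise each account's transactions in the window (end - days, end]."""
    start = end - timedelta(days=days)
    groups = defaultdict(list)
    for account, day, amount in transactions:
        if start < day <= end:
            groups[account].append(amount)
    features = {}
    for account, amounts in groups.items():
        features[account] = {
            "count": len(amounts),
            "total": sum(amounts),
            "mean": sum(amounts) / len(amounts),
            "max": max(amounts),
        }
    return features

# Hypothetical transactions: (account, date, amount).
txns = [
    ("A", date(2024, 1, 1), 20.0),
    ("A", date(2024, 1, 5), 35.0),
    ("A", date(2024, 1, 6), 400.0),
    ("B", date(2024, 1, 6), 12.5),
]
short_window = aggregate(txns, end=date(2024, 1, 7), days=3)
long_window = aggregate(txns, end=date(2024, 1, 7), days=7)
```

The per-account feature vectors, rather than raw transactions, would then be passed to a classifier such as a random forest.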
Journal of the Operational Research Society | 2008
David J. Hand; Christopher Whitrow; Niall M. Adams; Piotr Juszczak; David John Weston
In predictive data mining, algorithms will be both optimized and compared using a measure of predictive performance. Different measures will yield different results, and it follows that it is crucial to match the measure to the true objectives. In this paper, we explore the desirable characteristics of measures for constructing and evaluating tools for mining plastic card data to detect fraud. We define two measures, one based on minimizing the overall cost to the card company, and the other based on minimizing the amount of fraud given the maximum number of investigations the card company can afford to make. We also describe a plot, analogous to the standard ROC curve, for displaying the performance trace of an algorithm as the relative costs of the two kinds of misclassification (classing a fraudulent transaction as legitimate, or vice versa) are varied.
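The two kinds of measure can be illustrated with a small sketch. For the first measure it assumes a flat cost per false alarm and per missed fraud. For the second it assumes the card company can investigate only a fixed number of the highest-scoring transactions. The abstract does not give the paper's exact definitions, and all names here are hypothetical. Sweeping the ratio of the two costs through `min_cost` gives a rough analogue of the cost-ratio plot.

```python
def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Overall cost of flagging every transaction with score >= threshold.
    labels: 1 = fraud, 0 = legitimate."""
    cost = 0.0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if flagged and y == 0:
            cost += cost_fp          # legitimate transaction investigated
        elif not flagged and y == 1:
            cost += cost_fn          # fraud let through
    return cost

def min_cost(scores, labels, cost_fp, cost_fn):
    """Lowest achievable cost over all thresholds, including flagging nothing."""
    thresholds = sorted(set(scores)) + [float("inf")]
    return min(total_cost(scores, labels, t, cost_fp, cost_fn) for t in thresholds)

def fraud_missed_with_budget(scores, labels, amounts, budget):
    """Fraud amount that escapes when only the `budget` highest-scoring
    transactions can be investigated."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    investigated = set(order[:budget])
    return sum(a for i, (y, a) in enumerate(zip(labels, amounts))
               if y == 1 and i not in investigated)

# Hypothetical scored transactions.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
amounts = [500.0, 40.0, 120.0, 60.0, 10.0]
# Minimum cost as the missed-fraud cost varies, with false-alarm cost fixed at 1.
curve = [(r, min_cost(scores, labels, 1.0, r)) for r in (0.5, 1.0, 10.0)]
```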
Advanced Data Analysis and Classification | 2008
David John Weston; David J. Hand; Niall M. Adams; Christopher Whitrow; Piotr Juszczak
Peer group analysis is an unsupervised method for monitoring behaviour over time. In the context of plastic card fraud detection, this technique can be used to find anomalous transactions. These are transactions that deviate strongly from their peer group and are flagged as potentially fraudulent. Time alignment, the quality of the peer groups and the timeliness of assigning fraud flags to transactions are described. We demonstrate the ability to detect fraud using peer groups with real credit card transaction data and define a novel method for evaluating performance.
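Here is a much-simplified sketch of peer group analysis. It assumes each account is summarised as a numeric series on a common, already aligned time grid, whereas the paper treats time alignment as a problem in its own right. Peers are the k accounts nearest the target over a training window. After that window, the target is scored by how far it lies from the peer-group mean, in peer standard deviations. The Euclidean distance, the scoring rule and the function names are assumptions for illustration.

```python
import math

def peer_group(series, target, k, train_len):
    """Indices of the k series closest to series[target] (Euclidean distance)
    over the first train_len time points."""
    def dist(i):
        return math.sqrt(sum((series[target][t] - series[i][t]) ** 2
                             for t in range(train_len)))
    others = [i for i in range(len(series)) if i != target]
    return sorted(others, key=dist)[:k]

def peer_scores(series, target, peers, train_len):
    """For each time point after training, how many peer-group standard
    deviations the target lies from the peer-group mean."""
    scores = []
    for t in range(train_len, len(series[target])):
        values = [series[i][t] for i in peers]
        mean = sum(values) / len(values)
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        sd = sd or 1.0   # guard: identical peer values would give sd == 0
        scores.append(abs(series[target][t] - mean) / sd)
    return scores

# Hypothetical weekly spend for five accounts; account 0 is the target.
series = [
    [10, 11, 10, 12, 11, 60],
    [10, 10, 11, 12, 12, 12],
    [11, 11, 10, 11, 11, 13],
    [50, 52, 51, 49, 50, 51],
    [90, 88, 91, 92, 90, 89],
]
peers = peer_group(series, target=0, k=2, train_len=3)
scores = peer_scores(series, target=0, peers=peers, train_len=3)
```

Here only the final week would be flagged. The target stays within one peer standard deviation until its spend jumps to 60.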
Computational Statistics & Data Analysis | 2008
Piotr Juszczak; Niall M. Adams; David J. Hand; Christopher Whitrow; David John Weston
Detecting fraudulent plastic card transactions is an important and challenging problem. The challenges arise from a number of factors including the sheer volume of transactions financial institutions have to process, the asynchronous and heterogeneous nature of transactions, and the adaptive behaviour of fraudsters. In this fraud detection problem, the performance of a supervised two-class classification approach is compared with the performance of an unsupervised one-class classification approach. Attention is focussed primarily on one-class classification approaches. Useful representations of transaction records, and ways of combining different one-class classifiers, are described. Assessment of performance for such problems is complicated by the need for timely decision making. Performance assessment measures are discussed, and the performance of a number of one- and two-class classification methods is assessed using two large, real-world personal banking data sets.
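As one example of a one-class approach, here is a k-nearest-neighbour data description. The model sees only legitimate transactions. A new transaction is flagged when it lies further from its k-th nearest legitimate neighbour than most training points do. The abstract does not name the one-class classifiers used, so this choice, the two-feature representation and the quantile threshold are all assumptions. `math.dist` requires Python 3.8 or later.

```python
import math

def knn_distance(x, data, k):
    """Distance from point x to its k-th nearest neighbour in data."""
    return sorted(math.dist(x, d) for d in data)[k - 1]

def fit_threshold(train, k, quantile):
    """Empirical quantile of leave-one-out k-NN distances on legitimate data."""
    scores = sorted(knn_distance(x, train[:i] + train[i + 1:], k)
                    for i, x in enumerate(train))
    return scores[min(len(scores) - 1, int(quantile * len(scores)))]

def is_outlier(x, train, k, threshold):
    """Flag x when it is further from the legitimate data than the threshold."""
    return knn_distance(x, train, k) > threshold

# Hypothetical two-feature representation of legitimate transactions.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
threshold = fit_threshold(train, k=1, quantile=0.95)
```

Because no fraud labels are needed to fit the model, this style of detector sidesteps the class imbalance that complicates the two-class approach. The cost is that it flags anything unusual, not only fraud.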
PLOS ONE | 2012
David John Weston; Niall M. Adams; Richard A. Russell; David A. Stephens; Paul S. Freemont
There is considerable interest in cell biology in determining whether, and to what extent, the spatial arrangement of nuclear objects affects nuclear function. A common approach to this question involves analyzing a collection of images produced using some form of fluorescence microscopy. We assume that these images have been successfully pre-processed and that a spatial point pattern representation of the objects of interest within the nuclear boundary is available. Typically in these scenarios the number of objects per nucleus is low, which limits the ability of standard analysis procedures to demonstrate the existence of spatial preference in the pattern. There are broadly two common approaches to looking for structure in these spatial point patterns. Either the spatial point pattern for each image is analyzed individually, or a simple normalization is performed and the patterns are aggregated. In this paper we use synthetic spatial point patterns drawn from predefined point processes to demonstrate how difficult it is to distinguish a pattern from complete spatial randomness using these techniques. It follows that interesting spatial preferences in the arrangement of nuclear objects are easy to miss. The impact of this problem is also illustrated on data related to the configuration of PML nuclear bodies in mammalian fibroblast cells.
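The difficulty can be explored with a Monte Carlo test of complete spatial randomness (CSR). This sketch uses the unit square in place of a nuclear boundary and the mean nearest-neighbour distance as the test statistic. Both are simplifications chosen for illustration, and the function names are hypothetical. Trying it on patterns with only a handful of points is a quick way to see the low power the paper describes.

```python
import math
import random

def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbour (needs >= 2 points)."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def csr_pvalue(points, n_sim=999, rng=None):
    """One-sided Monte Carlo p-value for clustering against CSR in the unit
    square: the share of simulated patterns (same number of points) whose mean
    nearest-neighbour distance is at least as small as the observed one."""
    rng = rng or random.Random(0)
    n = len(points)
    observed = mean_nn_distance(points)
    as_extreme = 0
    for _ in range(n_sim):
        sim = [(rng.random(), rng.random()) for _ in range(n)]
        if mean_nn_distance(sim) <= observed:
            as_extreme += 1
    return (as_extreme + 1) / (n_sim + 1)

# A tightly clustered hypothetical pattern of 20 points.
rng = random.Random(1)
clustered = [(0.5 + 0.01 * rng.random(), 0.5 + 0.01 * rng.random())
             for _ in range(20)]
p_clustered = csr_pvalue(clustered)
```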
Intelligent Data Analysis | 2014
David John Weston
Using space-filling curves to order multidimensional data has been found to be useful in a variety of application domains. This paper examines the space-filling curve induced ordering of multidimensional data that has been transformed using shape preserving transformations. It is demonstrated that, although the orderings are not invariant under these transformations, the probability of an ordering is dependent on the geometrical configuration of the multidimensional data. This novel property extends the potential applicability of space-filling curves and is demonstrated by constructing novel features for shape matching.
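For readers unfamiliar with space-filling curve orderings, here is a sketch that orders points in the unit square by a Hilbert curve index, computed with the standard bit-manipulation algorithm. Repeating the ordering under random rotations then gives an empirical distribution over orderings. The choice of the Hilbert curve, the grid resolution, rotation about the square's centre and the function names are all assumptions. The abstract does not specify which curve or shape-preserving transformations the paper uses.

```python
import math
import random
from collections import Counter

def hilbert_index(n, x, y):
    """Position of integer cell (x, y) along a Hilbert curve filling an
    n-by-n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the next sub-quadrant has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

def cell_key(p, n):
    """Hilbert index of the grid cell containing p, clamped to the grid."""
    x = min(max(int(p[0] * n), 0), n - 1)
    y = min(max(int(p[1] * n), 0), n - 1)
    return hilbert_index(n, x, y)

def hilbert_order(points, n=1024):
    """Point indices sorted by their position along the curve."""
    return sorted(range(len(points)), key=lambda i: cell_key(points[i], n))

def ordering_frequencies(points, trials=2000, seed=0):
    """Count how often each ordering occurs when the configuration is rotated
    by a uniformly random angle about (0.5, 0.5)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        c, s = math.cos(theta), math.sin(theta)
        rotated = [(0.5 + c * (x - 0.5) - s * (y - 0.5),
                    0.5 + s * (x - 0.5) + c * (y - 0.5)) for x, y in points]
        counts[tuple(hilbert_order(rotated))] += 1
    return counts

freqs = ordering_frequencies([(0.3, 0.3), (0.6, 0.5), (0.4, 0.7)], trials=500)
```

Comparing the returned frequencies for different point configurations is one way to see the dependence on geometry that the abstract describes.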
GfKl | 2012
David John Weston; Niall M. Adams; Yoonseong Kim; David J. Hand
There has been increasing interest in deploying data mining methods for fault detection. For the case where we have potentially large numbers of devices to monitor, we propose to use peer group analysis to identify faults. First, we identify the peer group of each device. This consists of other devices that have behaved similarly. We then monitor the behaviour of a device by measuring how well the peer group tracks the device. Should the device’s behaviour deviate strongly from its peer group we flag the behaviour as an outlier. An outlier is used to indicate the potential occurrence of a fault. A device exhibiting outlier behaviour from its peer group need not be an outlier to the population of devices. Indeed a device exhibiting behaviour typical for the population of devices might deviate sufficiently far from its peer group to be flagged as an outlier. We demonstrate the usefulness of this property for detecting faults by monitoring the data output from a collection of privately run weather stations across the UK.
Intelligent Data Analysis | 2017
Abul Hasan; Mark Levene; David John Weston
Despite advances in concept extraction from free text, finding meaningful health-related information in online patient forums still poses a significant challenge. Here we demonstrate how structured information can be extracted from posts in such forums by forming relationships between a drug/treatment and a symptom or side effect, including the polarity/sentiment of the patient. In particular, a rule-based natural language processing (NLP) system is deployed, in which information in sentences is linked together through anaphora resolution. Our NLP relationship extraction system provides a strong baseline, achieving an F1 score of over 80% in discovering these relationships in the posts we analysed.
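Here is a toy sketch of rule-based relationship extraction, scored with F1 (the harmonic mean of precision and recall). The lexicons, cue words and the pronoun rule are all hypothetical; the pronoun rule is a crude stand-in for anaphora resolution. The abstract does not describe the paper's actual rules.

```python
import re

# Hypothetical lexicons; a real system would use curated drug and symptom lists.
DRUGS = {"ibuprofen", "metformin"}
SYMPTOMS = {"headache", "nausea"}
POSITIVE = {"helped", "relieved", "cured"}
NEGATIVE = {"caused", "gave", "worse"}
PRONOUNS = {"it", "this"}

def extract(post):
    """Return (drug, symptom, polarity) triples found in a forum post.
    A pronoun is resolved to the most recently mentioned drug."""
    triples = set()
    last_drug = None
    for sentence in re.split(r"[.!?]", post.lower()):
        words = re.findall(r"[a-z]+", sentence)
        drug = next((w for w in words if w in DRUGS), None)
        if drug is None and any(w in PRONOUNS for w in words):
            drug = last_drug                  # may still be None
        if drug is not None:
            last_drug = drug
        symptoms = [w for w in words if w in SYMPTOMS]
        if drug is None or not symptoms:
            continue
        if any(w in POSITIVE for w in words):
            polarity = "positive"
        elif any(w in NEGATIVE for w in words):
            polarity = "negative"
        else:
            polarity = "neutral"
        for symptom in symptoms:
            triples.add((drug, symptom, polarity))
    return triples

def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over sets of triples."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

post = "I started metformin last week. It gave me nausea. Ibuprofen helped my headache."
found = extract(post)
gold = {("metformin", "nausea", "negative"),
        ("ibuprofen", "headache", "positive"),
        ("metformin", "headache", "negative")}
score = f1_score(found, gold)
```

In this example the extractor finds two of the three gold relations with no false positives. Precision is therefore 1, recall is 2/3, and F1 is 0.8.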
Archive | 2017
Niall M. Adams; Allan Tucker; David John Weston
This paper shows how state-of-the-art deep learning methods can be combined to successfully tackle a new classification task related to chairlift security using visual information. In particular, we show that with an effective architecture and some domain adaptation components, we can learn an end-to-end model that could be deployed in ski resorts to improve the security of chairlift passengers. Our experiments show that our method gives better results than already deployed hand-tuned systems when using all the available data and very promising results on new unseen chairlifts.