Gordon J. Ross
Imperial College London
Publication
Featured research published by Gordon J. Ross.
Pattern Recognition Letters | 2012
Gordon J. Ross; Niall M. Adams; Dimitris K. Tasoulis; David J. Hand
Classifying streaming data requires the development of methods which are computationally efficient and able to cope with changes in the underlying distribution of the stream, a phenomenon known in the literature as concept drift. We propose a new method for detecting concept drift which uses an exponentially weighted moving average (EWMA) chart to monitor the misclassification rate of a streaming classifier. Our approach is modular and can hence be run in parallel with any underlying classifier to provide an additional layer of concept drift detection. Moreover, our method is computationally efficient with O(1) overhead and works in a fully online manner with no need to store data points in memory. Unlike many existing approaches to concept drift detection, our method allows the rate of false positive detections to be controlled and kept constant over time.
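The EWMA monitoring step can be sketched in a few lines. The snippet below is an illustrative Python reimplementation rather than the authors' code: the control-limit width L is treated as a fixed tuning constant, whereas the paper chooses limits so that the false positive rate stays constant over time.

```python
# Illustrative sketch of monitoring a binary error stream with an EWMA chart.
# The control-limit width L is a hypothetical tuning constant; the paper
# derives limits that fix the false positive rate, which is not reproduced here.

class EWMADriftDetector:
    def __init__(self, lam=0.2, L=3.0):
        self.lam = lam      # EWMA smoothing weight
        self.L = L          # control-limit width (assumed constant here)
        self.p_hat = 0.0    # running estimate of the pre-change error rate
        self.z = 0.0        # EWMA of the error stream
        self.t = 0          # number of observations seen

    def update(self, error):
        """error is 1 if the classifier misclassified the point, else 0.
        Returns True when drift is signalled."""
        self.t += 1
        self.p_hat += (error - self.p_hat) / self.t
        self.z = (1 - self.lam) * self.z + self.lam * error
        # variance of the EWMA statistic under a constant Bernoulli(p_hat) stream
        var = (self.p_hat * (1 - self.p_hat) * self.lam / (2 - self.lam)
               * (1 - (1 - self.lam) ** (2 * self.t)))
        return self.z > self.p_hat + self.L * var ** 0.5
```

Each incoming point requires only a constant amount of work and memory, which is what makes the O(1) overhead claim possible.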
Technometrics | 2011
Gordon J. Ross; Dimitris K. Tasoulis; Niall M. Adams
The analysis of data streams requires methods which can cope with a very high volume of data points. Under the requirement that algorithms must have constant computational complexity and a fixed amount of memory, we develop a framework for detecting changes in data streams when the distributional form of the stream variables is unknown. We consider the general problem of detecting a change in the location and/or scale parameter of a stream of random variables, and adapt several nonparametric hypothesis tests to create a streaming change detection algorithm. This algorithm uses a test statistic with a null distribution independent of the data. This allows a desired rate of false alarms to be maintained for any stream even when its distribution is unknown. Our method is based on hypothesis tests which involve ranking data points, and we propose a method for calculating these ranks online in a manner which respects the constraints of data stream analysis.
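As a rough illustration of the rank-based idea, the sketch below recomputes a Mann-Whitney test at every candidate split of a bounded window. The published algorithm instead maintains the ranks online within fixed memory and calibrates the threshold of the maximised statistic so that the false alarm rate is controlled; neither refinement is reproduced here, and the alpha cut-off is a placeholder.

```python
# Hedged sketch: windowed, rank-based change detection via Mann-Whitney tests.
from scipy.stats import mannwhitneyu

def detect_change(window, alpha=0.001, min_seg=20):
    """Return the most likely change point index in `window`, or None."""
    best_p, best_k = 1.0, None
    for k in range(min_seg, len(window) - min_seg):
        # compare the two segments induced by a split at position k
        _, p = mannwhitneyu(window[:k], window[k:], alternative="two-sided")
        if p < best_p:
            best_p, best_k = p, k
    # taking the minimum p-value over all splits inflates the false alarm
    # rate; the paper controls this via the null distribution of the
    # maximised statistic, which this sketch does not reproduce
    return best_k if best_p < alpha else None
```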
Intelligent Data Analysis | 2007
Dimitris K. Tasoulis; Gordon J. Ross; Niall M. Adams
The increasing availability of streaming data is a consequence of the continuing advancement of data acquisition technology. Such data provides new challenges to the various data analysis communities. Clustering has long been a fundamental procedure for acquiring knowledge from data, and new tools are emerging that allow the clustering of data streams. However, the dynamic, temporal components of streaming data provide extra challenges to the development of stream clustering and associated visualisation techniques. In this work we combine a streaming clustering framework with an extension of a static cluster visualisation method, in order to construct a surface that graphically represents the clustering structure of the data stream. The proposed method, OpticsStream, provides intuitive representations of the clustering structure as well as the manner in which this structure changes through time.
Physica A: Statistical Mechanics and its Applications | 2013
Gordon J. Ross
The volatility of financial instruments is rarely constant, and usually varies over time. This creates a phenomenon called volatility clustering, where large price movements on one day are followed by similarly large movements on successive days, creating temporal clusters. The GARCH model, which treats volatility as a drift process, is commonly used to capture this behaviour. However, research suggests that volatility is often better described by a structural break model, where the volatility undergoes abrupt jumps in addition to drift. Most efforts to integrate these jumps into the GARCH methodology have resulted in models which are either very computationally demanding, or which make problematic assumptions about the distribution of the instruments, often assuming that they are Gaussian. We present a new approach which uses ideas from nonparametric statistics to identify structural break points without making such distributional assumptions, and then models drift separately within each identified regime. Using our method, we investigate the volatility of several major stock indexes, and find that our approach can potentially give an improved fit compared to more commonly used techniques.
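A minimal sketch of the two-stage idea, assuming the structural break points have already been located by a separate nonparametric detector, is given below. The per-regime GARCH(1,1) fit uses the third-party arch package; the break detection itself is not shown.

```python
# Sketch: fit an independent GARCH(1,1) inside each regime delimited by
# previously detected structural break points (assumed given via `breaks`).
import numpy as np
from arch import arch_model  # third-party package, assumed available

def fit_regime_garch(returns, breaks):
    """returns: 1-D array of returns; breaks: sorted list of break indices."""
    fits = []
    edges = [0, *breaks, len(returns)]
    for lo, hi in zip(edges[:-1], edges[1:]):
        segment = np.asarray(returns[lo:hi])
        # volatility drift is modelled separately within each regime
        res = arch_model(segment, vol="Garch", p=1, q=1).fit(disp="off")
        fits.append((lo, hi, res))
    return fits
```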
ACM Symposium on Applied Computing | 2009
Gordon J. Ross; Dimitris K. Tasoulis; Niall M. Adams
Regime switching models, in which the state of the world is locally stationary, are a useful abstraction for many continuous valued data streams. In this paper we develop an online framework for the challenging problem of jointly predicting and annotating streaming data as it arrives. The framework consists of three sequential modules: prediction, change detection and regime annotation, each of which may be instantiated in a number of ways. We describe a specific realisation of this framework with the prediction module implemented using recursive least squares, and change detection implemented using CUSUM techniques. The annotation step involves associating a label with each regime, implemented here using a confidence interval approach. Experiments with simulated data show that this methodology can provide an annotation that is consistent with ground truth. Finally, the method is illustrated with foreign exchange data.
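The sketch below shows the shape of such a pipeline under simplifying assumptions: an AR(1) predictor fitted by recursive least squares, a two-sided CUSUM on the one-step-ahead residuals, and regimes annotated with a simple incrementing identifier rather than the confidence-interval labelling used in the paper. The CUSUM parameters h and k are placeholders.

```python
# Hedged sketch of a predict / detect / annotate pipeline for a scalar stream.
class RLSPredictor:
    def __init__(self, forgetting=0.99):
        self.w = 0.0        # AR(1) coefficient estimate
        self.p = 1000.0     # inverse covariance (large = uninformative prior)
        self.lam = forgetting

    def step(self, x_prev, x_now):
        pred = self.w * x_prev
        err = x_now - pred
        g = self.p * x_prev / (self.lam + x_prev * self.p * x_prev)
        self.w += g * err
        self.p = (self.p - g * x_prev * self.p) / self.lam
        return pred, err

def run_pipeline(stream, h=5.0, k=0.5):
    """Yield (prediction, regime_id) for every point after the first."""
    rls, regime, pos, neg = RLSPredictor(), 0, 0.0, 0.0
    for x_prev, x_now in zip(stream, stream[1:]):
        pred, err = rls.step(x_prev, x_now)
        # two-sided CUSUM on residuals signals a regime change
        pos = max(0.0, pos + err - k)
        neg = max(0.0, neg - err - k)
        if pos > h or neg > h:
            regime, pos, neg = regime + 1, 0.0, 0.0
            rls = RLSPredictor()  # restart the predictor in the new regime
        yield pred, regime
```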
Computational Statistics | 2013
Gordon J. Ross; Dimitris K. Tasoulis; Niall M. Adams
The task of monitoring for a change in the mean of a sequence of Bernoulli random variables has been widely studied. However, most existing approaches make at least one of the following assumptions, which may be violated in many real-world situations: (1) the pre-change value of the Bernoulli parameter is known in advance, (2) computational efficiency is not paramount, and (3) enough observations occur between change points to allow asymptotic approximations to be used. We develop a novel change detection method based on Fisher’s exact test which does not make any of these assumptions. We show that our method can be implemented in a computationally efficient manner, and is hence suited to sequential monitoring where new observations are constantly being received over time. We assess our method’s performance empirically using simulated data, and find that it is comparable to the optimal CUSUM scheme, which assumes both the pre- and post-change values of the parameter to be known.
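To convey the idea, the sketch below recomputes Fisher's exact test at every split of a bounded window of binary observations. The computational shortcuts and false alarm control developed in the paper are not reproduced, and the alpha cut-off is a placeholder.

```python
# Hedged sketch: detect a change in a Bernoulli stream with Fisher's exact test.
from scipy.stats import fisher_exact

def fisher_change_point(bits, alpha=0.001, min_seg=10):
    """bits: list of 0/1 observations. Returns a split index or None."""
    n, total_ones = len(bits), sum(bits)
    ones_left = 0
    best_p, best_k = 1.0, None
    for k in range(1, n):
        ones_left += bits[k - 1]
        if k < min_seg or n - k < min_seg:
            continue
        # 2x2 table of successes/failures before and after the split
        table = [[ones_left, k - ones_left],
                 [total_ones - ones_left, (n - k) - (total_ones - ones_left)]]
        _, p = fisher_exact(table)
        if p < best_p:
            best_p, best_k = p, k
    return best_k if best_p < alpha else None
```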
Physical Review E | 2014
Gordon J. Ross
We investigate the tendency for financial instruments to form clusters when there are multiple factors influencing the correlation structure. Specifically, we consider a stock portfolio which contains companies from different industrial sectors, located in several different countries. Both sector membership and geography combine to create a complex clustering structure where companies seem to first be divided based on sector, with geographical subclusters emerging within each industrial sector. We argue that standard techniques for detecting overlapping clusters and communities are not able to capture this type of structure and show how robust regression techniques can instead be used to remove the influence of both sector and geography from the correlation matrix separately. Our analysis reveals that prior to the 2008 financial crisis, companies did not tend to form clusters based on geography. This changed immediately following the crisis, with geography becoming a more important determinant of clustering structure.
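The residual-correlation idea can be sketched as follows; the simple group-average factors and the Huber norm are assumptions made for illustration rather than the paper's exact construction.

```python
# Rough sketch: strip sector and country effects from each stock's returns
# with a robust regression, then look at the correlations of the residuals.
import numpy as np
import statsmodels.api as sm

def residual_correlations(returns, sector_ids, country_ids):
    """returns: (T, N) array; sector_ids, country_ids: length-N NumPy label arrays."""
    T, N = returns.shape
    residuals = np.empty_like(returns)
    for j in range(N):
        # group-average factors for this stock's sector and country
        sector_factor = returns[:, sector_ids == sector_ids[j]].mean(axis=1)
        country_factor = returns[:, country_ids == country_ids[j]].mean(axis=1)
        X = sm.add_constant(np.column_stack([sector_factor, country_factor]))
        fit = sm.RLM(returns[:, j], X, M=sm.robust.norms.HuberT()).fit()
        residuals[:, j] = fit.resid
    return np.corrcoef(residuals, rowvar=False)
```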
Statistics and Computing | 2014
Gordon J. Ross
It is commonly required to detect change points in sequences of random variables. In the most difficult setting of this problem, change detection must be performed sequentially with new observations being constantly received over time. Further, the parameters of both the pre- and post-change distributions may be unknown. In Hawkins and Zamba (Technometrics 47(2):164–173, 2005), the sequential generalised likelihood ratio test was introduced for detecting changes in this context, under the assumption that the observations follow a Gaussian distribution. However, we show that the asymptotic approximation used in their test statistic causes it to be conservative even when a large number of observations is available. We propose an improved procedure which is more efficient, in the sense of detecting changes faster, in all situations. We also show that similar issues arise in other parametric change detection contexts, which we illustrate by introducing a novel monitoring procedure for sequences of exponentially distributed random variables, an important topic in time-to-failure modelling.
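For reference, the core GLR statistic for a change in the mean and/or variance of Gaussian observations can be computed as below. The substance of the paper lies in how the detection threshold is chosen so the procedure is not conservative; this sketch simply returns the maximised statistic and the best split, leaving the threshold to the user.

```python
# Sketch of the generalised likelihood ratio statistic for a Gaussian change.
import numpy as np

def glr_statistic(x, min_seg=5):
    """Return (maximised GLR statistic over splits, argmax split index)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s0 = x.var()  # MLE variance under 'no change'
    best, best_k = -np.inf, None
    for k in range(min_seg, n - min_seg):
        s1, s2 = x[:k].var(), x[k:].var()  # segment MLE variances
        if s1 <= 0 or s2 <= 0:
            continue
        stat = n * np.log(s0) - k * np.log(s1) - (n - k) * np.log(s2)
        if stat > best:
            best, best_k = stat, k
    return best, best_k
```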
Advances in Geographic Information Systems | 2016
Apostolos Pyrgelis; Emiliano De Cristofaro; Gordon J. Ross
Location data can be extremely useful to study commuting patterns and disruptions, as well as to predict real-time traffic volumes. At the same time, however, the fine-grained collection of user locations raises serious privacy concerns, as this can reveal sensitive information about the users, such as lifestyle, political and religious inclinations, or even identities. In this paper, we study the feasibility of crowd-sourced mobility analytics over aggregate location information: users periodically report their location, using a privacy-preserving aggregation protocol, so that the server can only recover aggregates - i.e., how many, but not which, users are in a region at a given time. We experiment with real-world mobility datasets obtained from the Transport for London authority and the San Francisco Cabs network, and present a novel methodology based on time series modeling that is geared to forecast traffic volumes in regions of interest and to detect mobility anomalies in them. In the presence of anomalies, we also make enhanced traffic volume predictions by feeding our model with additional information from correlated regions. Finally, we present and evaluate a mobile app prototype, called Mobility Data Donors (MDD), in terms of computation, communication, and energy overhead, demonstrating the real-world deployability of our techniques.
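The abstract does not name the specific time series models used, so the sketch below is purely illustrative: a seasonal-average forecaster for aggregate counts with a residual-based anomaly flag, intended only to convey the forecast-then-flag workflow on aggregate location data.

```python
# Illustrative sketch only: seasonal-average forecasting of aggregate counts
# for one region, flagging points that deviate strongly from the forecast.
import numpy as np

def forecast_and_flag(counts, period=24, n_sigma=3.0):
    """counts: 1-D array of aggregate counts at a fixed interval.
    Returns (forecasts, anomaly_mask) for the second half of the series."""
    counts = np.asarray(counts, dtype=float)
    start = len(counts) // 2
    forecasts, anomalies = [], []
    for t in range(start, len(counts)):
        # all past observations at the same phase of the daily cycle
        same_phase = counts[t % period:t:period]
        pred = same_phase.mean()
        spread = same_phase.std() + 1e-9
        forecasts.append(pred)
        anomalies.append(abs(counts[t] - pred) > n_sigma * spread)
    return np.array(forecasts), np.array(anomalies)
```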
Availability, Reliability and Security | 2016
Enrico Mariconti; Jeremiah Onaolapo; Gordon J. Ross; Gianluca Stringhini
This work uses statistical classification techniques to learn about the different network behavior patterns demonstrated by targeted malware and generic malware. Targeted malware is a recent type of threat, involving bespoke software that has been created to target a specific victim. It is considered a more dangerous threat than generic malware, because a targeted attack can cause more serious damage to the victim. Our work aims to automatically distinguish between the network activity generated by the two types of malware, which then allows samples of malware to be classified as being either targeted or generic. For a network administrator, such knowledge is important because it helps them understand which threats require particular attention. Because a network administrator usually handles many alarms simultaneously, this capability is particularly relevant. We set up a sandbox and infected virtual machines with malware, recording all resulting malware activity on the network. Using the network packets produced by the malware samples, we extract features to classify their behavior. Before performing classification, we carefully analyze the features and the dataset to study all their details and gain a deeper understanding of the malware under study. Our use of statistical classifiers is shown to give excellent results in some cases, achieving an accuracy of almost 96% in distinguishing between the two types of malware. We can conclude that the network behaviors of the two types of malicious code are very different.
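As an illustration of the classification step only, the sketch below trains a random forest on per-sample feature vectors summarising network activity. The feature descriptions and the choice of classifier are assumptions of the sketch, not details taken from the paper.

```python
# Hedged sketch: cross-validated classification of malware samples as
# targeted (1) or generic (0) from summary features of their network traffic.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def classify_malware(features, labels):
    """features: (n_samples, n_features) matrix of per-sample traffic summaries
    (e.g. packet counts, byte counts, number of distinct destinations);
    labels: 1 for targeted malware, 0 for generic malware."""
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    scores = cross_val_score(clf, features, labels, cv=5, scoring="accuracy")
    return scores.mean(), scores.std()
```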