Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Jonathan R. Wells is active.

Publication


Featured research published by Jonathan R. Wells.


Machine Learning | 2011

Feature-subspace aggregating: ensembles for stable and unstable learners

Kai Ming Ting; Jonathan R. Wells; Swee Chuan Tan; Shyh Wei Teng; Geoffrey I. Webb

This paper introduces a new ensemble approach, Feature-Subspace Aggregating (Feating), which builds local models instead of global models. Feating is a generic ensemble approach that can enhance the predictive performance of both stable and unstable learners; in contrast, most existing ensemble approaches can improve the predictive performance of unstable learners only. Our analysis shows that the new approach reduces the execution time to generate a model in an ensemble through an increased level of localisation in Feating. Our empirical evaluation shows that Feating performs significantly better than Boosting, Random Subspace and Bagging in terms of predictive accuracy when the stable learner SVM is used as the base learner. The speed-up achieved by Feating makes SVM ensembles feasible for large data sets where they would otherwise be infeasible. When SVM is the preferred base learner, we show that Feating SVM performs better than Boosting decision trees and Random Forests. We further demonstrate that Feating also substantially reduces the error of another stable learner, k-nearest neighbour, and an unstable learner, decision tree.
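As a rough, hypothetical illustration of the local-model idea (not the authors' implementation), the sketch below builds each ensemble member from a random small feature subset, discretises that subset into quantile bins, and trains one local SVM per occupied bin; prediction routes a query to its bin in every member and takes a majority vote. The published Feating enumerates feature subsets systematically via level trees rather than sampling them, so the class name, defaults and binning scheme here are all assumptions.

```python
# Hypothetical sketch of Feating's local-model idea, assuming scikit-learn.
import numpy as np
from sklearn.svm import SVC

class FeatingSketch:
    def __init__(self, n_members=10, subset_size=2, n_bins=3, seed=0):
        self.n_members, self.subset_size, self.n_bins = n_members, subset_size, n_bins
        self.rng = np.random.default_rng(seed)
        self.members = []  # (feature_indices, bin_edges, {bin_key: local model})

    def _keys(self, X, feats, edges):
        # Map each row to a tuple of bin indices over the chosen feature subset.
        return [tuple(int(np.searchsorted(edges[j], row[feats[j]]))
                      for j in range(len(feats))) for row in X]

    def fit(self, X, y):
        for _ in range(self.n_members):
            feats = self.rng.choice(X.shape[1], self.subset_size, replace=False)
            edges = [np.quantile(X[:, f], np.linspace(0, 1, self.n_bins + 1)[1:-1])
                     for f in feats]
            models = {}
            keys = self._keys(X, feats, edges)
            for key in set(keys):
                mask = np.array([k == key for k in keys])
                if len(np.unique(y[mask])) > 1:
                    models[key] = SVC().fit(X[mask], y[mask])  # local SVM per region
                else:
                    models[key] = int(y[mask][0])              # single-class region
            self.members.append((feats, edges, models))
        return self

    def predict(self, X):
        votes = np.zeros((len(X), len(self.members)), dtype=int)
        for m, (feats, edges, models) in enumerate(self.members):
            for i, key in enumerate(self._keys(X, feats, edges)):
                model = models.get(key, 0)  # unseen region: fall back to label 0
                votes[i, m] = (model.predict(X[i:i + 1])[0]
                               if hasattr(model, "predict") else model)
        # Majority vote; assumes non-negative integer class labels.
        return np.array([np.bincount(row).argmax() for row in votes])
```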


international conference on data mining | 2010

Multi-dimensional Mass Estimation and Mass-based Clustering

Kai Ming Ting; Jonathan R. Wells

Mass estimation, an alternative to density estimation, has been shown recently to be an effective base modelling mechanism for three data mining tasks of regression, information retrieval and anomaly detection. This paper advances this work in two directions. First, we generalise the previously proposed one-dimensional mass estimation to multi-dimensional mass estimation, and significantly reduce the time complexity to O(ψh) from O(ψ^h), making it feasible for a full range of generic problems. Second, we introduce the first clustering method based on mass; it is unique because it does not employ any distance or density measure. The structure of the new mass model enables different parts of a cluster to be identified and merged without expensive evaluations. The characteristics of the new clustering method are: (i) it can identify arbitrary-shape clusters; (ii) it is significantly faster than existing density-based or distance-based methods; and (iii) it is noise-tolerant.
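A minimal sketch of tree-based mass estimation in the spirit of the paper, with simplifying assumptions: random half-space style trees of depth h are built over the training set (the paper builds them over small subsamples of size ψ), and the mass of a query is the number of training points in the leaf region containing it, averaged over the ensemble. All names and defaults are illustrative.

```python
# Illustrative tree-based mass estimator (assumed simplification of the paper).
import numpy as np

def build_tree(X, lo, hi, depth, rng):
    # Split a random dimension at a random cut point until depth h is reached;
    # each leaf stores the mass (training-point count) of its region.
    if depth == 0 or len(X) <= 1:
        return {"mass": len(X)}
    d = rng.integers(lo.shape[0])
    cut = rng.uniform(lo[d], hi[d])
    hi_left, lo_right = hi.copy(), lo.copy()
    hi_left[d] = lo_right[d] = cut
    return {"dim": d, "cut": cut,
            "left": build_tree(X[X[:, d] < cut], lo, hi_left, depth - 1, rng),
            "right": build_tree(X[X[:, d] >= cut], lo_right, hi, depth - 1, rng)}

def mass_estimate(X_train, X_query, n_trees=25, h=8, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    trees = [build_tree(X_train, lo, hi, h, rng) for _ in range(n_trees)]
    out = []
    for x in X_query:
        leaf_masses = []
        for node in trees:
            while "dim" in node:  # descend to the leaf region containing x
                node = node["left"] if x[node["dim"]] < node["cut"] else node["right"]
            leaf_masses.append(node["mass"])
        out.append(float(np.mean(leaf_masses)))
    return np.array(out)
```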


Knowledge and Information Systems | 2013

DEMass: a new density estimator for big data

Kai Ming Ting; Takashi Washio; Jonathan R. Wells; Fei Tony Liu; Sunil Aryal

Density estimation is the ubiquitous base modelling mechanism employed for many tasks including clustering, classification, anomaly detection and information retrieval. Commonly used density estimation methods such as kernel density estimator and k-nearest neighbour density estimator have high time and space complexities which render them inapplicable in problems with big data. This weakness sets the fundamental limit in existing algorithms for all these tasks. We propose the first density estimation method, having average-case sub-linear time complexity and constant space complexity in the number of instances, that stretches this fundamental limit to an extent that dealing with millions of data points can now be done easily and quickly. We provide an asymptotic analysis of the new density estimator and verify the generality of the method by replacing existing density estimators with the new one in three current density-based algorithms, namely DBSCAN, LOF and Bayesian classifiers, representing three different data mining tasks of clustering, anomaly detection and classification. Our empirical evaluation results show that the new density estimation method significantly improves their time and space complexities, while maintaining or improving their task-specific performances. The new method empowers these algorithms, currently limited to small data sizes only, to process big data, setting a new benchmark for what density-based algorithms can achieve.
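Where the previous entry estimates mass, DEMass turns mass into density. A hedged sketch of that step, assuming the same random half-space trees as in the mass sketch above: each leaf additionally records the volume of its region, and the density estimate for a query is leaf mass divided by (n times leaf volume), averaged over trees. This is an illustration of the idea, not the published DEMass machinery.

```python
# Illustrative mass-to-density estimator (assumed simplification of DEMass).
import numpy as np

def build_tree_v(X, lo, hi, depth, rng):
    # Random half-space trees as in the mass sketch, but leaves also record
    # the volume of their region so density = mass / (n * volume) is available.
    if depth == 0 or len(X) <= 1:
        return {"mass": len(X), "volume": float(np.prod(hi - lo))}
    d = rng.integers(lo.shape[0])
    cut = rng.uniform(lo[d], hi[d])
    hi_left, lo_right = hi.copy(), lo.copy()
    hi_left[d] = lo_right[d] = cut
    return {"dim": d, "cut": cut,
            "left": build_tree_v(X[X[:, d] < cut], lo, hi_left, depth - 1, rng),
            "right": build_tree_v(X[X[:, d] >= cut], lo_right, hi, depth - 1, rng)}

def density_estimate(X_train, X_query, n_trees=25, h=8, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    trees = [build_tree_v(X_train, lo, hi, h, rng) for _ in range(n_trees)]
    n = len(X_train)
    out = []
    for x in X_query:
        ests = []
        for node in trees:
            while "dim" in node:  # descend to the leaf region containing x
                node = node["left"] if x[node["dim"]] < node["cut"] else node["right"]
            ests.append(node["mass"] / (n * node["volume"]) if node["volume"] > 0 else 0.0)
        out.append(float(np.mean(ests)))
    return np.array(out)
```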


Pattern Recognition | 2014

LiNearN: A new approach to nearest neighbour density estimator

Jonathan R. Wells; Kai Ming Ting; Takashi Washio

Despite their widespread use, nearest neighbour density estimators have two fundamental limitations: O(n^2) time complexity and O(n) space complexity. Both limitations constrain nearest neighbour density estimators to small data sets only. Recent progress using indexing schemes has improved this to near-linear time complexity only. We propose a new approach, called LiNearN for Linear time Nearest Neighbour algorithm, that yields the first nearest neighbour density estimator having O(n) time complexity and constant space complexity, as far as we know. This is achieved without using any indexing scheme, because LiNearN uses a subsampling approach in which the subsample sizes are significantly smaller than the data size. Like existing density estimators, the new density estimator has a parameter to trade off between bias and variance, as our asymptotic analysis reveals. We show that algorithms based on the new nearest neighbour density estimator can easily scale up to data sets with millions of instances in anomaly detection and clustering tasks.
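A hedged sketch of the subsampling idea the abstract describes, assuming a plain 1-NN estimator: draw t small random subsamples, measure the query's nearest neighbour distance within each, convert that distance to a density via the volume of the d-dimensional ball, and average. Function names and defaults are assumptions, not the paper's.

```python
# Illustrative subsampled 1-NN density estimator (names are assumptions).
import numpy as np
from math import gamma, pi

def ball_volume(r, d):
    # Volume of a d-dimensional ball of radius r.
    return (pi ** (d / 2) / gamma(d / 2 + 1)) * r ** d

def subsampled_nn_density(X_train, X_query, n_subsamples=50, psi=32, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X_train.shape
    subs = [X_train[rng.choice(n, size=min(psi, n), replace=False)]
            for _ in range(n_subsamples)]                  # fixed-size subsamples
    out = []
    for x in X_query:
        ests = []
        for S in subs:
            r = np.sqrt(((S - x) ** 2).sum(axis=1)).min()  # 1-NN distance in S
            ests.append(1.0 / (len(S) * ball_volume(max(r, 1e-12), d)))
        out.append(float(np.mean(ests)))                   # average over subsamples
    return np.array(out)
```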


international conference on data mining | 2014

Efficient Anomaly Detection by Isolation Using Nearest Neighbour Ensemble

Tharindu Rukshan Bandaragoda; Kai Ming Ting; David W. Albrecht; Fei Tony Liu; Jonathan R. Wells

This paper presents iNNE (isolation using Nearest Neighbour Ensemble), an efficient nearest neighbour-based anomaly detection method by isolation. iNNE runs significantly faster than existing nearest neighbour-based methods such as Local Outlier Factor, especially in data sets having thousands of dimensions or millions of instances, because the proposed method has linear time complexity and constant space complexity. Compared with the existing tree-based isolation method iForest, the proposed isolation method overcomes three weaknesses of iForest that we have identified: its inability to detect local anomalies, anomalies with a low number of relevant attributes, and anomalies that are surrounded by normal instances.
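A hedged sketch of the isolation mechanism described above, simplified from the paper: each member draws a small subsample and puts a hypersphere around every subsample point with radius equal to its nearest neighbour distance within the subsample. A query covered by no sphere scores 1; otherwise its score is 1 - tau(nn(c))/tau(c) for the smallest covering sphere c. Scores are averaged over members, and values near 1 indicate anomalies. Parameter defaults are illustrative.

```python
# Illustrative iNNE-style anomaly scorer (simplified; defaults are assumptions).
import numpy as np

def inne_scores(X_train, X_query, n_members=100, psi=16, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X_train)
    members = []
    for _ in range(n_members):
        S = X_train[rng.choice(n, size=min(psi, n), replace=False)]
        D = np.sqrt(((S[:, None, :] - S[None, :, :]) ** 2).sum(-1))
        np.fill_diagonal(D, np.inf)
        nn = D.argmin(axis=1)                    # each centre's nearest neighbour
        tau = np.maximum(D.min(axis=1), 1e-12)   # sphere radius = 1-NN distance
        members.append((S, tau, nn))
    scores = []
    for x in X_query:
        s = []
        for S, tau, nn in members:
            d = np.sqrt(((S - x) ** 2).sum(axis=1))
            covered = np.where(d <= tau)[0]
            if covered.size == 0:
                s.append(1.0)                         # covered by no sphere: isolated
            else:
                c = covered[tau[covered].argmin()]    # smallest covering sphere
                s.append(1.0 - tau[nn[c]] / tau[c])   # neighbour's radius ratio
        scores.append(float(np.mean(s)))
    return np.array(scores)                           # values near 1 => anomalous
```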


Machine Learning | 2017

Defying the gravity of learning curve: a characteristic of nearest neighbour anomaly detectors

Kai Ming Ting; Takashi Washio; Jonathan R. Wells; Sunil Aryal

Conventional wisdom in machine learning says that all algorithms are expected to follow the trajectory of a learning curve, often colloquially referred to as 'more data the better'. We call this 'the gravity of learning curve', and it is assumed that no learning algorithms are 'gravity-defiant'. Contrary to the conventional wisdom, this paper provides the theoretical analysis and the empirical evidence that nearest neighbour anomaly detectors are gravity-defiant algorithms.


computational intelligence | 2013

LOCAL MODELS—THE KEY TO BOOSTING STABLE LEARNERS SUCCESSFULLY

Kai Ming Ting; Lian Zhu; Jonathan R. Wells



pacific-asia conference on knowledge discovery and data mining | 2014

Improving iForest with Relative Mass

Sunil Aryal; Kai Ming Ting; Jonathan R. Wells; Takashi Washio



multiple classifier systems | 2009

FaSS: Ensembles for Stable Learners

Kai Ming Ting; Jonathan R. Wells; Swee Chuan Tan; Shyh Wei Teng; Geoffrey I. Webb



computational intelligence | 2018

Isolation-based anomaly detection using nearest-neighbor ensembles

Tharindu Rukshan Bandaragoda; Kai Ming Ting; David W. Albrecht; Fei Tony Liu; Ye Zhu; Jonathan R. Wells


Collaboration


Dive into Jonathan R. Wells's collaboration network.

Top Co-Authors


Kai Ming Ting

Federation University Australia
