Publication


Featured research published by Robert J. Durrant.


Knowledge Discovery and Data Mining | 2010

Compressed Fisher linear discriminant analysis: classification of randomly projected data

Robert J. Durrant; Ata Kabán

We consider random projections in conjunction with classification, specifically the analysis of Fisher's Linear Discriminant (FLD) classifier in randomly projected data spaces. Unlike previous analyses of other classifiers in this setting, we avoid the unnatural effects that arise when one insists that all pairwise distances are approximately preserved under projection. We impose no sparsity or underlying low-dimensional structure constraints on the data; we instead take advantage of the class structure inherent in the problem. We obtain a reasonably tight upper bound on the estimated misclassification error on average over the random choice of the projection, which, in contrast to early distance-preserving approaches, tightens in a natural way as the number of training examples increases. It follows that, for good generalisation of FLD, the required projection dimension grows logarithmically with the number of classes. We also show that the error contribution of a covariance misspecification is always no worse in the low-dimensional space than in the initial high-dimensional space. We contrast our findings with previous related work and discuss our insights.
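A minimal sketch of the setting analysed here, learning and applying FLD entirely in a randomly projected space. This is not the authors' code: the Gaussian projection matrix, the toy two-class data, the dimensions and the use of scikit-learn's LinearDiscriminantAnalysis as the FLD implementation are all illustrative assumptions.

# Illustrative sketch only: FLD learned and applied in a randomly projected space.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
d, k, n = 1000, 20, 500                      # ambient dim, projected dim, training size

def sample(m):
    """Two Gaussian classes with different means in the ambient space (toy data)."""
    X = np.vstack([rng.normal(0.0, 1.0, (m // 2, d)),
                   rng.normal(0.3, 1.0, (m // 2, d))])
    return X, np.repeat([0, 1], m // 2)

X_train, y_train = sample(n)
X_test, y_test = sample(1000)

# Random projection matrix with i.i.d. N(0, 1/k) entries, so that squared norms
# are preserved in expectation after projecting to k dimensions.
R = rng.normal(0.0, 1.0 / np.sqrt(k), (d, k))

# Both learning and classification happen in the k-dimensional projected space.
clf = LinearDiscriminantAnalysis().fit(X_train @ R, y_train)
print("test accuracy in the projected space:", clf.score(X_test @ R, y_test))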


Machine Learning | 2015

Random projections as regularizers: learning a linear discriminant from fewer observations than dimensions

Robert J. Durrant; Ata Kabán

We prove theoretical guarantees for an averaging ensemble of randomly projected Fisher linear discriminant classifiers, focusing on the case when there are fewer training observations than data dimensions. The specific form and simplicity of this ensemble permit a direct and much more detailed analysis than existing generic tools in previous works. In particular, we are able to derive the exact form of the generalisation error of our ensemble, conditional on the training set, and based on this we give theoretical guarantees which directly link the performance of the ensemble to that of the corresponding linear discriminant learned in the full data space. To the best of our knowledge these are the first theoretical results to prove such an explicit link for any classifier and classifier ensemble pair. Furthermore, we show that the randomly projected ensemble is equivalent to applying a sophisticated regularisation scheme to the linear discriminant learned in the original data space, and this prevents overfitting in conditions of small sample size where pseudo-inverse FLD learned in the data space is provably poor. Our ensemble is learned from a set of randomly projected representations of the original high-dimensional data, and therefore data can be collected, stored and processed in this compressed form. We confirm our theoretical findings with experiments, and demonstrate the utility of our approach on several datasets from the bioinformatics domain and one very high-dimensional dataset from the drug discovery domain, both settings in which fewer observations than dimensions are the norm.
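A minimal sketch of the kind of ensemble analysed here: each member is an FLD classifier learned on its own random projection of the data, and the members' decision scores are averaged. The ensemble size, dimensions, toy data and use of scikit-learn are illustrative assumptions, not the authors' experimental setup.

# Illustrative sketch only: averaging ensemble of randomly projected FLD classifiers,
# in the n < d regime (fewer observations than dimensions).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
d, k, n, M = 2000, 10, 100, 50               # ambient dim, projected dim, samples, ensemble size

def sample(m):
    X = np.vstack([rng.normal(0.0, 1.0, (m // 2, d)),
                   rng.normal(0.3, 1.0, (m // 2, d))])
    return X, np.repeat([0, 1], m // 2)

X_train, y_train = sample(n)                  # n < d: pseudo-inverse FLD in the data space would overfit
X_test, y_test = sample(400)

# Each member sees its own random k-dimensional projection of the same training set;
# the ensemble prediction averages the members' signed decision scores.
scores = np.zeros(len(y_test))
for _ in range(M):
    R = rng.normal(0.0, 1.0 / np.sqrt(k), (d, k))
    member = LinearDiscriminantAnalysis().fit(X_train @ R, y_train)
    scores += member.decision_function(X_test @ R)
y_pred = (scores / M > 0).astype(int)
print("ensemble test accuracy:", np.mean(y_pred == y_test))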


Evolutionary Computation | 2016

Toward large-scale continuous EDA: A random matrix theory perspective

Ata Kabán; Jakramate Bootkrajang; Robert J. Durrant

Estimation of distribution algorithms (EDAs) are a major branch of evolutionary algorithms (EAs) with some unique advantages in principle. They are able to take advantage of correlation structure to drive the search more efficiently, and they are able to provide insights about the structure of the search space. However, model building in high dimensions is extremely challenging, and as a result existing EDAs may become less attractive in large-scale problems because of the associated large computational requirements. Large-scale continuous global optimisation is key to many modern-day real-world problems, and scaling up EAs to large-scale problems has become one of the biggest challenges of the field. This paper pins down some fundamental roots of the problem and makes a start at developing a new and generic framework to yield effective and efficient EDA-type algorithms for large-scale continuous global optimisation problems. Our concept is to introduce an ensemble of random projections of the set of fittest search points to low dimensions as the basis for a new and generic divide-and-conquer methodology. Our ideas are rooted in the theory of random projections developed in theoretical computer science, and in developing and analysing our framework we exploit some recent results in non-asymptotic random matrix theory.
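A heavily simplified sketch of the general idea described here, not the paper's algorithm: at each generation the fittest points are projected into several random low-dimensional subspaces, a Gaussian model is fitted and sampled in each, and the back-projected samples are combined. The toy sphere objective, dimensions, ensemble size and the particular way samples are averaged are all assumptions made for illustration.

# Illustrative sketch only: an EDA-type search that builds its model of the fittest
# points in an ensemble of random low-dimensional projections.
import numpy as np

rng = np.random.default_rng(2)
d, k, M = 100, 3, 10                          # ambient dim, projected dim, ensemble size
pop, elite, iters = 200, 50, 100

def sphere(X):
    """Toy objective: minimise the squared distance to the origin."""
    return np.sum(X ** 2, axis=1)

X = rng.normal(0.0, 5.0, (pop, d))            # initial population
for _ in range(iters):
    top = X[np.argsort(sphere(X))[:elite]]    # selection: keep the fittest points
    mu = top.mean(axis=0)
    centred = top - mu
    new = np.tile(mu, (pop, 1))
    for _ in range(M):
        # Project the centred elites to k dimensions, fit and sample a Gaussian there,
        # then map the samples back with the transpose of the projection.
        R = rng.normal(0.0, 1.0 / np.sqrt(k), (k, d))   # E[R^T R] = I_d with this scaling
        cov = np.cov(centred @ R.T, rowvar=False)
        S = rng.multivariate_normal(np.zeros(k), cov, pop)
        new += (S @ R) / M
    X = new
print("best objective value found:", sphere(X).min())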


European Conference on Machine Learning | 2008

Learning with Lq < 1 vs L1-Norm Regularisation with Exponentially Many Irrelevant Features

Ata Kabán; Robert J. Durrant

We study the use of fractional norms for regularisation in supervised learning from high-dimensional data, in conditions of a large number of irrelevant features, focusing on logistic regression. We develop a variational method for parameter estimation, and show an equivalence between two approximations recently proposed in the statistics literature. Building on previous work by A. Ng, we show that fractional-norm regularised logistic regression enjoys a sample complexity that grows logarithmically with the data dimension and polynomially with the number of relevant dimensions. In addition, extensive empirical testing indicates that fractional-norm regularisation is more suitable than the L1 norm in cases where the number of relevant features is very small, and it works very well despite a large number of irrelevant features.
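A minimal sketch of the kind of estimator being discussed, fitting logistic regression with an Lq (q < 1) penalty by plain gradient descent on a smoothed penalty. This is not the paper's variational method; the synthetic data, the value of q, the smoothing constant and the step size are all illustrative assumptions.

# Illustrative sketch only: logistic regression with a fractional-norm (Lq, q < 1)
# penalty, many irrelevant features and only a few relevant ones.
import numpy as np

rng = np.random.default_rng(3)
n, d, d_rel = 200, 500, 5                     # samples, features, relevant features
q, lam, eps, lr = 0.5, 0.1, 1e-2, 0.05        # fractional exponent, penalty weight, smoothing, step size

w_true = np.zeros(d)
w_true[:d_rel] = 2.0                          # only the first d_rel features matter
X = rng.normal(size=(n, d))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

# Gradient descent on: mean logistic loss + lam * sum_j (|w_j| + eps)^q.
w = np.zeros(d)
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted class-1 probabilities
    grad_loss = X.T @ (p - y) / n
    grad_pen = lam * q * np.sign(w) * (np.abs(w) + eps) ** (q - 1.0)
    w -= lr * (grad_loss + grad_pen)

print("weights with |w| > 0.1:", int(np.sum(np.abs(w) > 0.1)),
      "(true number of relevant features:", d_rel, ")")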


Algorithmic Learning Theory | 2013

Dimension-Adaptive Bounds on Compressive FLD Classification

Ata Kabán; Robert J. Durrant

Efficient dimensionality reduction by random projections (RP) is gaining popularity, hence the learning guarantees achievable in RP spaces are of great interest. In the finite-dimensional setting, it has been shown for the compressive Fisher Linear Discriminant (FLD) classifier that for good generalisation the required target dimension grows only as the log of the number of classes and is not adversely affected by the number of projected data points. However, these bounds depend on the dimensionality d of the original data space. In this paper we give further guarantees that remove d from the bounds, under certain conditions of regularity on the data density structure. In particular, if the data density does not fill the ambient space then the error of compressive FLD is independent of the ambient dimension and depends only on a notion of ‘intrinsic dimension’.


International Conference on Pattern Recognition | 2010

A Bound on the Performance of LDA in Randomly Projected Data Spaces

Robert J. Durrant; Ata Kabán

We consider the problem of classification in nonadaptive dimensionality reduction. Specifically, we bound the increase in classification error of Fisher’s Linear Discriminant classifier resulting from randomly projecting the high dimensional data into a lower dimensional space and both learning the classifier and performing the classification in the projected space. Our bound is reasonably tight, and unlike existing bounds on learning from randomly projected data, it becomes tighter as the quantity of training data increases without requiring any sparsity structure from the data.


Congress on Evolutionary Computation | 2016

How effective is Cauchy-EDA in high dimensions?

Momodou L. Sanyang; Robert J. Durrant; Ata Kabán

We consider the problem of high-dimensional black-box optimisation via Estimation of Distribution Algorithms (EDAs), and the use of heavy-tailed search distributions in this setting. Some authors have suggested that employing a heavy-tailed search distribution, such as a Cauchy, may help EDA better explore a high-dimensional search space. However, other authors have found Cauchy search distributions to be less effective than Gaussian search distributions in high-dimensional problems. In this paper, we set out to resolve this controversy. To achieve this we run extensive experiments on a battery of high-dimensional test functions, and develop theory which shows that small search steps are always more likely to move the search distribution towards the global optimum than large ones and that, in particular, large search steps in high-dimensional spaces do badly in this respect with high probability. We hypothesise that, since exploration by large steps is mostly counterproductive in high dimensions, and since the fraction of good directions decays exponentially fast with increasing dimension, one should instead focus mainly on finding the right direction in which to move the search distribution. We propose a minor change to standard Gaussian EDA which implicitly achieves this aim, and our experiments on a sequence of test functions confirm the good performance of our new approach.
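A small numerical illustration of the intuition above, estimating the probability that a single random step from a point at unit distance from the optimum actually reduces that distance, for modest Gaussian steps versus heavy-tailed Cauchy steps as the dimension grows. The step scales, dimensions and sample sizes are assumptions chosen for illustration, not the paper's experimental protocol.

# Illustrative sketch only: how often does a random step reduce the distance to the
# optimum? The fraction of improving steps shrinks as the dimension grows, and much
# faster for the heavy-tailed Cauchy steps, which are frequently large.
import numpy as np

rng = np.random.default_rng(4)
trials = 20000

def p_improve(d, step_sampler):
    """P(step reduces distance to the optimum), starting at unit distance from it."""
    x = np.zeros(d)
    x[0] = 1.0                                # current point; optimum at the origin
    steps = step_sampler((trials, d))
    return np.mean(np.linalg.norm(x + steps, axis=1) < 1.0)

for d in (2, 20, 200):
    gaussian = lambda shape: rng.normal(0.0, 0.1 / np.sqrt(d), shape)      # step norm around 0.1
    cauchy = lambda shape: rng.standard_cauchy(shape) * 0.1 / np.sqrt(d)   # same scale, heavy tails
    print(f"d={d:4d}  Gaussian: {p_improve(d, gaussian):.3f}  Cauchy: {p_improve(d, cauchy):.3f}")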


Machine Learning | 2017

Foreword: special issue for the journal track of the 8th Asian conference on machine learning (ACML 2016)

Robert J. Durrant; Kee-Eung Kim; Geoffrey Holmes; Stephen Marsland; Masashi Sugiyama; Zhi-Hua Zhou

We, the guest editors, welcome you to this special issue of Machine Learning comprising papers accepted to the journal track of the 8th Asian conference on machine learning (ACML 2016), held at the University of Waikato, Hamilton, New Zealand. This year's ACML was the first to run a dedicated journal track alongside the usual proceedings track: we believe the experiment was a success and we are delighted to share these contributions with you. This issue of Machine Learning covers a range of areas of current interest in machine learning, and these papers represent some of the state of the art in our fascinating field. Unlike in previous years, this ACML special issue does not contain extended versions of selected conference submissions; rather, these are full journal papers presenting original work.


Diagnostic Microbiology and Infectious Disease | 2017

Rapid molecular diagnosis of the Mycobacterium tuberculosis Rangipo strain responsible for the largest recurring TB cluster in New Zealand

Claire V. Mulholland; Ali Ruthe; Raymond T. Cursons; Robert J. Durrant; Noel Karalus; Kathryn Coley; James E. Bower; Elizabeth Permina; Megan J. Coleman; Sally A. Roberts; Vickery L. Arcus; Gregory M. Cook; Htin Lin Aung

Despite New Zealand being a low-tuberculosis (TB) burden country, there are disproportionately high rates of TB in particular populations. Here, we report a rapid molecular diagnosis of the Mycobacterium tuberculosis Rangipo strain responsible for the largest recurring TB cluster in New Zealand.


Journal of Complexity | 2009

When is 'nearest neighbour' meaningful: A converse theorem and implications

Robert J. Durrant; Ata Kabán

Collaboration


Dive into Robert J. Durrant's collaborations.

Top Co-Authors

Ata Kabán

University of Birmingham

Ali Ruthe

University of Waikato
