Adam Craig Pocock
University of Manchester
Publications
Featured research published by Adam Craig Pocock.
Electronic Notes in Theoretical Computer Science | 2010
Jeremy Singer; Gavin Brown; Mikel Luján; Adam Craig Pocock; Paraskevas Yiapanis
Fundamental nano-patterns are simple, static, binary properties of Java methods, such as ObjectCreator and Recursive. We present a provisional catalogue of 17 such nano-patterns. We report statistical and information-theoretic metrics to show the frequency of nano-pattern occurrence in a large corpus of open-source Java projects. We proceed to give two example case studies that demonstrate potential applications for nano-patterns. The first study involves a quantitative comparison of two popular Java benchmarking suites in terms of their relative object-orientedness and diversity. The second study involves applying machine learning techniques to program comprehension, using method nano-patterns as learning features. In both studies, nano-patterns provide concise summaries of Java methods to enable efficient and effective analysis.
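To make the idea concrete, here is a hypothetical Java method illustrating two of the nano-patterns named above; the class and method names are invented for illustration and are not taken from the catalogue.

```java
// Hypothetical example: `chain` exhibits both the Recursive nano-pattern
// (it calls itself) and the ObjectCreator nano-pattern (it allocates
// objects with `new`).
public class NanoPatternExample {

    static class Node {
        final Node next;
        Node(Node next) { this.next = next; }
    }

    // Builds a linked chain of n nodes.
    static Node chain(int n) {
        if (n == 0) return null;
        return new Node(chain(n - 1)); // allocation + self-call
    }
}
```

Because nano-patterns are static and binary, a method can be summarised as a fixed-length boolean vector, which is what makes them convenient as machine learning features in the second case study.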
ieee international symposium on workload characterization | 2010
Nikolas Ioannou; Jeremy Singer; Salman Khan; Polychronis Xekalakis; Paraskevas Yiapanis; Adam Craig Pocock; Gavin Brown; Mikel Luján; Ian Watson; Marcelo Cintra
Thread-Level Speculation (TLS) facilitates the extraction of parallel threads from sequential applications. Most prior work has focused on developing the compiler and architecture for this execution paradigm, and such studies have often concentrated narrowly on a specific design point. Other studies have attempted to assess how well TLS performs if some architectural or compiler constraint is relaxed. Unfortunately, these previous studies have failed to truly assess the performance potential of TLS, because they have been bound to a specific TLS architecture and have ignored one or another important TLS design choice, such as support for out-of-order task spawn or for intermediate checkpointing.
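As a rough intuition for the execution model (a toy sketch, not the architectures evaluated in the paper), TLS can be viewed as optimistic concurrency: a task runs speculatively against a snapshot of shared state, validates that no conflicting write occurred, and either commits or is squashed and re-executed. The version-counter scheme below is an assumption made purely for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

// Toy model of speculation: compute from a snapshot, validate, then
// commit or squash-and-retry. Real TLS does this with compiler and
// hardware support at thread granularity; this only mirrors the idea,
// and a production scheme would also guard against in-progress writes.
public class SpeculationSketch {
    private final AtomicInteger version = new AtomicInteger();
    private volatile int sharedValue = 0;

    int speculate(IntUnaryOperator task) {
        while (true) {
            int v = version.get();               // checkpoint the version
            int input = sharedValue;             // speculative read
            int result = task.applyAsInt(input); // compute off the snapshot
            if (version.get() == v) {            // validate: no writer ran
                return result;                   // commit
            }
            // squash: a conflicting write occurred, re-execute the task
        }
    }

    void write(int value) {                      // a conflicting writer
        sharedValue = value;
        version.incrementAndGet();
    }
}
```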
international conference on multiple classifier systems | 2010
Adam Craig Pocock; Paraskevas Yiapanis; Jeremy Singer; Mikel Luján; Gavin Brown
Oza's Online Boosting algorithm provides a version of AdaBoost which can be trained in an online way for stationary problems. One perspective is that this enables the power of the boosting framework to be applied to datasets which are too large to fit into memory. The online boosting algorithm assumes the data to be independent and identically distributed (i.i.d.) and therefore has no provision for concept drift. We present an algorithm called Online Non-Stationary Boosting (ONSBoost) that, like Online Boosting, uses a static ensemble size without generating new members each time new examples are presented, but also adapts to a changing data distribution. We evaluate the new algorithm against Online Boosting, using the STAGGER dataset and three challenging datasets derived from a learning problem inside a parallelising virtual machine. We find that the new algorithm provides equivalent performance on the STAGGER dataset and an improvement of up to 3% on the parallelisation datasets.
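For context, the stationary baseline works roughly as follows: a minimal sketch of the Poisson-based update at the heart of Oza's Online Boosting, with an assumed OnlineLearner interface. ONSBoost's drift-adaptation step is not shown.

```java
import java.util.Random;

// Assumed interface for an incrementally trainable base learner.
interface OnlineLearner {
    void train(double[] x, int y);   // single-example update
    int predict(double[] x);
}

class OnlineBooster {
    private final OnlineLearner[] ensemble; // static ensemble size
    private final double[] lambdaCorrect;   // weight mass classified correctly
    private final double[] lambdaWrong;     // weight mass misclassified
    private final Random rng = new Random(42);

    OnlineBooster(OnlineLearner[] ensemble) {
        this.ensemble = ensemble;
        this.lambdaCorrect = new double[ensemble.length];
        this.lambdaWrong = new double[ensemble.length];
    }

    // Present one example to every ensemble member in sequence.
    void update(double[] x, int y) {
        double lambda = 1.0; // the example's running weight
        for (int t = 0; t < ensemble.length; t++) {
            int k = poisson(lambda); // simulate weighted resampling
            for (int i = 0; i < k; i++) ensemble[t].train(x, y);
            if (ensemble[t].predict(x) == y) {
                lambdaCorrect[t] += lambda;
                lambda *= (lambdaCorrect[t] + lambdaWrong[t]) / (2 * lambdaCorrect[t]);
            } else {
                lambdaWrong[t] += lambda;
                lambda *= (lambdaCorrect[t] + lambdaWrong[t]) / (2 * lambdaWrong[t]);
            }
        }
    }

    // Knuth's method for sampling k ~ Poisson(lambda).
    private int poisson(double lambda) {
        double limit = Math.exp(-lambda), p = 1.0;
        int k = 0;
        do { k++; p *= rng.nextDouble(); } while (p > limit);
        return k - 1;
    }
}
```

Each example is shown to each member k ~ Poisson(λ) times, simulating AdaBoost's weighted resampling; λ grows along the ensemble when a member misclassifies, so later members focus on the harder examples.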
international conference on big data | 2015
Anthony Kleerekoper; Michael Pappas; Adam Craig Pocock; Gavin Brown; Mikel Luján
With the growth of high-dimensional data, feature selection is a vital component of machine learning as well as an important stand-alone data analytics tool. Without it, the computational cost of big data analytics can become unmanageable, and spurious correlations and noise can reduce the accuracy of any results. Feature selection removes irrelevant and redundant information, leading to faster, more reliable data analysis. Feature selection techniques based on information theory are among the fastest known, and the Manchester AnalyticS Toolkit (MAST) provides an efficient, parallel and scalable implementation of these methods. This paper considers a number of data structures for storing the frequency counters that underpin MAST. We show that preprocessing the data to reduce the number of zero-valued counters in an array structure results in an order-of-magnitude reduction in both memory usage and execution time compared to state-of-the-art structures that use explicit mappings to avoid zero-valued counters. We also describe a number of parallel processing techniques that enable MAST to scale linearly with the number of processors, even on NUMA architectures. MAST targets scale-up servers rather than scale-out clusters, and we show that it performs orders of magnitude faster than existing tools. Moreover, we show that MAST is 3.5 times faster than a scale-out solution built for Spark running on the same server. As an example of MAST's performance, we were able to process a dataset of 100 million examples and 100,000 features in under 10 minutes on a four-socket server, with each socket containing an 8-core Intel Xeon E5-4620 processor.
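The counters in question are the joint frequency counts from which mutual information (and similar information-theoretic scores) are computed. The sketch below shows the dense-array layout in minimal form; the method name and the assumption that features are already mapped to small integer state indices (the preprocessing alluded to above) are illustrative, not MAST's actual API.

```java
// Minimal sketch: mutual information I(X;Y) between two discretised
// features, estimated from a flat array of joint frequency counters.
public class MutualInformation {

    // x[i] in [0, xStates), y[i] in [0, yStates) for every sample i.
    static double mi(int[] x, int[] y, int xStates, int yStates) {
        int n = x.length;
        int[] joint = new int[xStates * yStates]; // dense counter array
        int[] px = new int[xStates];
        int[] py = new int[yStates];
        for (int i = 0; i < n; i++) {
            joint[x[i] * yStates + y[i]]++;
            px[x[i]]++;
            py[y[i]]++;
        }
        double miSum = 0.0;
        for (int a = 0; a < xStates; a++) {
            for (int b = 0; b < yStates; b++) {
                int c = joint[a * yStates + b];
                if (c == 0) continue;            // skip zero-valued counters
                double pxy = (double) c / n;
                double pxa = (double) px[a] / n;
                double pyb = (double) py[b] / n;
                miSum += pxy * Math.log(pxy / (pxa * pyb));
            }
        }
        return miSum / Math.log(2); // convert nats to bits
    }
}
```

Scoring a feature thus reduces to one pass over the examples plus one pass over the counter array, which is why keeping that array small and cache-friendly dominates both the memory footprint and the running time.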
Journal of Machine Learning Research | 2012
Gavin Brown; Adam Craig Pocock; Ming-Jie Zhao; Mikel Luján
Journal of Machine Learning Research | 2013
Ming-Jie Zhao; Narayanan Unny Edakunni; Adam Craig Pocock; Gavin Brown
neural information processing systems | 2014
Jean-Baptiste Tristan; Daniel Huang; Joseph Tassarotti; Adam Craig Pocock; Stephen J. Green; Guy L. Steele
[Thesis]. Manchester, UK: The University of Manchester | 2012
Adam Craig Pocock
international conference on artificial intelligence and statistics | 2012
Adam Craig Pocock; Mikel Luján; Gavin Brown