Andy Nisbet

Manchester Metropolitan University

Publication

Featured research published by Andy Nisbet.

Interactive Technology and Smart Education | 2005

Personal Investigator: A therapeutic 3D game for adolescent psychotherapy

David Coyle; Mark Matthews; John Sharry; Andy Nisbet; Gavin J. Doherty

Although mental health problems increase markedly during the adolescent years, therapists often find it difficult to engage with adolescents. The majority of disturbed adolescents do not receive professional mental health care, and of those who do, fewer still fully engage with the therapeutic process (Offer et al. 1991; US Surgeon General 1999). Personal Investigator (PI) is a 3D computer game specifically designed to help adolescents overcome mental health problems such as depression and to engage more easily with professional mental health care services. PI is an implementation of a new computer-mediated model for how therapists and adolescents can engage. The model has its theoretical foundations in play therapy and therapeutic storytelling and applies current research on the educational use of computer gaming and interactive narrative systems to these foundations. Previously demonstrated benefits of computer games and interactive narrative systems in education include increased motivation, increased self-esteem, improved problem solving and discussion skills, and improved storytelling skills (Bruckman 1997; Bers 2001; Robertson 2001; Robertson and Oberlander 2002; Bers et al. 2003; Squire 2003). PI aims to take advantage of these benefits in a mental health care setting. PI incorporates a goal-oriented, strengths-based model of psychotherapy called Solution Focused Therapy (SFT). By engaging adolescents in a client-centred way, it aims to build stronger therapeutic relationships between therapists and adolescents. PI is the first game to integrate this established psychotherapy approach into an engaging online 3D game. Results are presented from trials of PI with four adolescents referred to clinics for issues including anxiety and behaviour problems, attempted suicide, and social skills difficulties.

Journal of Parallel and Distributed Computing | 2013

Enhancing data parallelism for Ant Colony Optimization on GPUs

José M. Cecilia; José M. García; Andy Nisbet; Martyn Amos; Manuel Ujaldon

Graphics Processing Units (GPUs) have evolved into highly parallel, fully programmable architectures over the past five years, and the advent of CUDA has facilitated their use in many real-world applications. In this paper, we deal with a GPU implementation of Ant Colony Optimization (ACO), a population-based optimization method which comprises two major stages: tour construction and pheromone update. Because of its inherently parallel nature, ACO is well-suited to GPU implementation, but it also poses significant challenges due to irregular memory access patterns. Our contribution within this context is threefold: (1) a data parallelism scheme for tour construction tailored to GPUs, (2) novel GPU programming strategies for the pheromone update stage, and (3) a new mechanism called I-Roulette to replicate the classic roulette wheel while improving GPU parallelism. Applied to the Traveling Salesman Problem (TSP), our implementation achieves speed-ups exceeding 20x for each of the two stages of the ACO algorithm when compared to a sequential version running on a similar single-threaded high-end CPU. Moreover, an extensive discussion of different implementation paths on GPUs shows how to deal with parallel graph connected components. This, in turn, suggests a broader area of inquiry, in which algorithm designers may learn to adapt similar optimization methods to GPU architectures.
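
To make the contrast between the classic roulette wheel and a more GPU-friendly selection step concrete, here is a minimal Python sketch. `roulette_select` is the textbook sequential wheel, which needs a running sum over all candidate weights. `independent_roulette_select` is a hypothetical, independent-roulette-style variant in which each candidate draws its own random number and a max-reduction picks the winner, the kind of step that maps naturally onto one GPU thread per city. Both function names are illustrative assumptions, and the second function is only an approximation: its selection probabilities are not in general proportional to the weights, so it should not be read as the paper's exact I-Roulette mechanism.

```python
import random
from typing import Sequence


def roulette_select(weights: Sequence[float], rng: random.Random) -> int:
    """Classic roulette wheel: pick index i with probability
    weights[i] / sum(weights). Zero-weight (e.g. visited) cities are skipped."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("no selectable candidate")
    threshold = rng.random() * total  # spin the wheel once
    cumulative = 0.0
    last_positive = -1
    for i, w in enumerate(weights):  # sequential running sum over all candidates
        if w > 0:
            last_positive = i
            cumulative += w
            if cumulative > threshold:
                return i
    return last_positive  # guard against floating-point round-off


def independent_roulette_select(weights: Sequence[float], rng: random.Random) -> int:
    """Hypothetical independent-roulette-style selection: each candidate scales
    its own random draw by its weight (one GPU thread each on a device), and a
    max-reduction picks the winner. No running sum is needed, but the resulting
    probabilities only approximate weights[i] / sum(weights)."""
    best_index, best_score = -1, -1.0
    for i, w in enumerate(weights):
        score = rng.random() * w  # independent per candidate
        if w > 0 and score > best_score:  # this comparison is the reduction step
            best_index, best_score = i, score
    if best_index < 0:
        raise ValueError("no selectable candidate")
    return best_index


rng = random.Random(42)
weights = [0.0, 1.0, 0.0, 3.0]  # cities 0 and 2 already visited
print(roulette_select(weights, rng), independent_roulette_select(weights, rng))
```

The design point is the data dependency: the classic wheel needs a prefix sum before any choice can be made, whereas the independent variant only needs a reduction, which GPUs handle well.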

Science of Computer Programming | 2005

The case for virtual register machines

David Gregg; Andrew Beatty; Kevin Casey; Brian Davis; Andy Nisbet

Virtual machines (VMs) are a popular target for language implementers. A long-running question in the design of virtual machines has been whether stack or register architectures can be implemented more efficiently with an interpreter. Many designers favour stack architectures since the location of operands is implicit in the stack pointer. In contrast, the operands of register machine instructions must be specified explicitly. In this paper, we present a working system for translating stack-based Java virtual machine (JVM) code to a simple register code. We describe the translation process, the complicated parts of the JVM which make translation more difficult, and the optimisations needed to eliminate copy instructions. Experimental results show that a register format reduced the number of executed instructions by 34.88%, while increasing the number of bytecode loads by an average of 44.81%. Overall, this corresponds to an increase of 2.32 loads for each dispatch removed. We believe that the high cost of dispatches makes register machines attractive even at the cost of increased loads.
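
The dispatch-versus-operand trade-off can be illustrated with two toy interpreters in Python. The opcodes, the tuple encoding and the treatment of local-variable slots as registers are assumptions made for this sketch, not the paper's translation scheme. Computing `c = a + b` takes four dispatches in stack form but only one in register form, while the single register instruction carries three explicit operand fields that must be fetched.

```python
from typing import List, Tuple


def run_stack(code: List[Tuple], local_vars: List[int]) -> int:
    """Interpret toy stack code; return the number of instruction dispatches."""
    stack: List[int] = []
    dispatches = 0
    for op, *args in code:
        dispatches += 1  # one dispatch per executed instruction
        if op == "load":  # push a local variable
            stack.append(local_vars[args[0]])
        elif op == "store":  # pop into a local variable
            local_vars[args[0]] = stack.pop()
        elif op == "add":  # operands are implicit: the top two stack slots
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return dispatches


def run_register(code: List[Tuple], regs: List[int]) -> int:
    """Interpret toy register code; return the number of instruction dispatches."""
    dispatches = 0
    for op, *args in code:
        dispatches += 1
        if op == "add":  # operands are explicit: dst, src1, src2 must be fetched
            dst, src1, src2 = args
            regs[dst] = regs[src1] + regs[src2]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return dispatches


# c = a + b with a, b, c held in slots 0, 1, 2
stack_code = [("load", 0), ("load", 1), ("add",), ("store", 2)]
register_code = [("add", 2, 0, 1)]
print(run_stack(stack_code, [2, 3, 0]), run_register(register_code, [2, 3, 0]))
```

In a real interpreter each dispatch typically costs an indirect branch, which is why trading several dispatches for a few extra operand loads can pay off.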

IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2011

Parallelization strategies for ant colony optimisation on GPUs

José M. Cecilia; José M. García; Manuel Ujaldon; Andy Nisbet; Martyn Amos

Ant Colony Optimisation (ACO) is an effective population-based meta-heuristic for the solution of a wide variety of problems. As a population-based algorithm, its computation is intrinsically massively parallel, and it is therefore theoretically well-suited for implementation on Graphics Processing Units (GPUs). The ACO algorithm comprises two main stages: Tour construction and Pheromone update. The former has been previously implemented on the GPU, using a task-based parallelism approach. However, up until now, the latter has always been implemented on the CPU. In this paper, we discuss several parallelisation strategies for both stages of the ACO algorithm on the GPU. We propose an alternative data-based parallelism scheme for Tour construction, which fits better on the GPU architecture. We also describe novel GPU programming strategies for the Pheromone update stage. Our results show a total speed-up exceeding 28x for the Tour construction stage, and 20x for Pheromone update, and suggest that ACO is a potentially fruitful area for future research in the GPU domain.
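
The Pheromone update stage can be made concrete with the standard Ant System rule, tau_ij <- (1 - rho) * tau_ij + (sum over ants k whose tour uses edge (i, j) of 1 / L_k), where L_k is the length of ant k's tour. The sequential Python reference below is a sketch under that assumption: the abstract does not give the exact update rule or parameter values, the name `pheromone_update` and the default rho = 0.5 are illustrative, and none of the paper's GPU strategies are reproduced. The comments mark where a GPU version has to deal with concurrent writes, for example with atomic additions or by having each edge gather its own contributions.

```python
from typing import List, Sequence


def tour_length(tour: Sequence[int], dist: Sequence[Sequence[float]]) -> float:
    """Length of a closed tour (returns to its starting city)."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def pheromone_update(tau: List[List[float]], tours: Sequence[Sequence[int]],
                     dist: Sequence[Sequence[float]], rho: float = 0.5) -> None:
    """Ant System rule, in place: evaporate every edge, then deposit 1 / L_k."""
    n = len(tau)
    # Evaporation: every entry is scaled independently (trivially data-parallel).
    for i in range(n):
        for j in range(n):
            tau[i][j] *= 1.0 - rho
    # Deposit: each ant adds 1 / L_k to every edge on its tour. Different ants can
    # hit the same edge, so a parallel version has to resolve write conflicts.
    for tour in tours:
        deposit = 1.0 / tour_length(tour, dist)
        for i in range(len(tour)):
            a, b = tour[i], tour[(i + 1) % len(tour)]
            tau[a][b] += deposit
            tau[b][a] += deposit  # symmetric TSP: both directions


dist = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
tau = [[1.0] * 3 for _ in range(3)]
pheromone_update(tau, tours=[[0, 1, 2], [1, 0, 2]], dist=dist)
print(tau)
```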

Compiler Construction | 2004

Stochastic Bit-Width Approximation Using Extreme Value Theory for Customizable Processors

Emre Özer; Andy Nisbet; David Gregg

Application-specific logic can be generated with a balance and mix of functional units tailored to match an application’s computational requirements. The area and power consumption of application-specific functional units, registers and memory blocks are heavily dependent on the bit-widths of operands used in computations. The actual bit-width required to store the values assigned to a variable during execution of a program will not in general match the built-in C data types with fixed sizes of 8, 16, 32 and 64 bits. Thus, precious area is wasted if the built-in data type sizes are used to declare the size of operands. A novel stochastic bit-width approximation technique is introduced to estimate the required bit-width of integer variables using Extreme Value Theory. Results are presented to demonstrate reductions in bit-widths, area and power consumption when the probability of overflow/underflow occurring is varied from 0.1 to infinitesimal levels. Our experimental results show that stochastic bit-width approximation yields an overall 32% reduction in area and an overall 21% reduction in design power consumption on an FPGA chip for nine embedded benchmarks.
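
The core idea, fitting an extreme value distribution to the largest values observed across simulation runs and reading off a bound whose exceedance probability is small, can be sketched in Python. Assumptions: the sketch fits a Gumbel (type I) distribution by the method of moments to per-run maxima of a variable's magnitude; the abstract does not say which extreme value distribution or fitting procedure the authors used, and all function names and sample data are hypothetical. The bound comes from inverting the Gumbel CDF F(x) = exp(-exp(-(x - mu) / beta)) at 1 - p, where p plays the role of the overflow/underflow probability.

```python
import math
import statistics
from typing import Sequence, Tuple

EULER_GAMMA = 0.5772156649015329


def fit_gumbel(maxima: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments fit of a Gumbel (type I extreme value) distribution.
    Needs at least two observations."""
    mean = statistics.fmean(maxima)
    std = statistics.stdev(maxima)
    beta = std * math.sqrt(6.0) / math.pi  # scale: variance = (pi * beta)**2 / 6
    mu = mean - EULER_GAMMA * beta         # location: mean = mu + gamma * beta
    return mu, beta


def gumbel_level(mu: float, beta: float, p_exceed: float) -> float:
    """Value x such that P(max > x) = p_exceed under the fitted model."""
    # log1p keeps tiny probabilities (e.g. 1e-20) from rounding 1 - p to 1.0
    return mu - beta * math.log(-math.log1p(-p_exceed))


def bits_for(max_magnitude: int, signed: bool = True) -> int:
    """Smallest width holding 0..max_magnitude (unsigned) or
    -max_magnitude..max_magnitude in two's complement (signed)."""
    width = max(1, max_magnitude.bit_length())
    return width + 1 if signed else width


def estimate_bitwidth(run_maxima: Sequence[float], p_exceed: float,
                      signed: bool = True) -> int:
    """Bit-width whose overflow probability per run is about p_exceed (model-based)."""
    mu, beta = fit_gumbel(run_maxima)
    bound = max(0, math.ceil(gumbel_level(mu, beta, p_exceed)))
    return bits_for(bound, signed)


# Hypothetical per-run maxima of |v| for one variable, from ten simulation runs
maxima = [100, 102, 98, 105, 101, 99, 103, 97, 104, 100]
for p in (0.1, 1e-6, 1e-12):
    print(p, estimate_bitwidth(maxima, p))
```

Lowering `p_exceed` pushes the bound further into the tail and so widens the variable, which mirrors the trade-off the abstract describes when the overflow probability is varied.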

Opto-Ireland 2002: Optical Metrology, Imaging, and Machine Vision | 2003

Correction of geometric image distortion using FPGAs

David Eadie; Fergal Shevlin; Andy Nisbet

Many image processing systems have real-time performance constraints. Systems implemented on general-purpose processors maximize performance by keeping the small, fixed number of available functional units, such as adders and multipliers, busy. In this paper we investigate the use of programmable logic devices to accelerate the execution of an application. Field Programmable Gate Arrays (FPGAs) can be programmed to generate application-specific logic that alters the balance and type(s) of functional units to match application characteristics. We introduce an application that corrects geometric image distortion. Real number support is a requirement in most image processing applications, so we examine the suitability of fixed-point, floating-point and logarithmic number systems for an FPGA implementation of this application. Performance results are presented in terms of (1) execution time and (2) FPGA logic resource requirements.
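
Of the three number systems compared, fixed point is the simplest to picture: a real value is stored as an integer scaled by 2^frac_bits, so a multiply becomes an integer multiply followed by a shift, which is generally far cheaper in FPGA logic than a floating-point multiplier. The Python sketch below illustrates only that representation; the choice of 12 fractional bits and the function names are hypothetical, and the abstract does not state which formats the paper used.

```python
def to_fixed(x: float, frac_bits: int) -> int:
    """Quantise x to a signed fixed-point integer with frac_bits fractional bits."""
    return round(x * (1 << frac_bits))


def from_fixed(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_mul(a: int, b: int, frac_bits: int) -> int:
    """Fixed-point multiply: an integer multiply plus a shift. The raw product
    has 2 * frac_bits fractional bits; adding half an LSB before the arithmetic
    right shift rounds to nearest (ties toward +infinity). Needs frac_bits >= 1."""
    return (a * b + (1 << (frac_bits - 1))) >> frac_bits


FRAC = 12  # hypothetical Q-format choice: 12 fractional bits
x, y = to_fixed(1.5, FRAC), to_fixed(-2.25, FRAC)
print(from_fixed(fixed_mul(x, y, FRAC), FRAC))  # prints -3.375
```

The cost of this cheapness is a fixed range and a fixed absolute precision of 2^-frac_bits, which is what makes the comparison with floating-point and logarithmic formats worthwhile.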

ACM Transactions on Embedded Computing Systems | 2008

A stochastic bitwidth estimation technique for compact and low-power custom processors

Emre Özer; Andy Nisbet; David Gregg

There is an increasing trend toward compiling from C to custom hardware for designing embedded systems in which the area and power consumption of application-specific functional units, registers, and memory blocks are heavily dependent on the bit-widths of integer operands used in computations. The actual bit-width required to store the values assigned to an integer variable during the execution of a program will not, in general, match the built-in C data types. Thus, precious area is wasted if the built-in data type sizes are used to declare the size of integer operands. In this paper, we introduce stochastic bit-width estimation that follows a simulation-based probabilistic approach to estimate the bit-widths of integer variables using extreme value theory. The estimation technique is also empirically compared to two compile-time integer bit-width analysis techniques. Our experimental results show that the stochastic bit-width estimation technique dramatically reduces integer bit-widths and, therefore, enables more compact and power-efficient custom hardware designs than the compile-time integer bit-width analysis techniques. Across ten integer applications implemented on an FPGA chip, stochastic bit-width estimation attains reductions of up to 37% in custom hardware area and up to 30% in logic power consumption.
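
For comparison, a compile-time analysis has to bound every value without running the program. One common form is interval (range) propagation, sketched below in Python with a hypothetical `Interval` type; the abstract does not say whether either of the two compile-time techniques evaluated works this way. Because each operation takes the worst case over its operand ranges, widths grow quickly, and an accumulator updated in a loop with an unknown trip count cannot be bounded at all, which plausibly explains why a simulation-based estimate can come out narrower.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Conservative integer range [lo, hi] for a variable or expression."""
    lo: int
    hi: int

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other: "Interval") -> "Interval":
        # The extremes of a product lie at the corners of the operand ranges.
        corners = [self.lo * other.lo, self.lo * other.hi,
                   self.hi * other.lo, self.hi * other.hi]
        return Interval(min(corners), max(corners))

    def bits(self) -> int:
        """Unsigned width if the range is non-negative, else two's complement."""
        if self.lo >= 0:
            return max(1, self.hi.bit_length())
        # need n with -2**(n-1) <= lo and hi <= 2**(n-1) - 1
        magnitude = max(-self.lo - 1, self.hi)
        return magnitude.bit_length() + 1


pixel = Interval(0, 255)  # e.g. an 8-bit input sample
s = pixel + pixel         # [0, 510]
print(s.bits(), (s * s).bits())
```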

The Journal of Supercomputing | 2013

Enhancing GPU parallelism in nature-inspired algorithms

José M. Cecilia; Andy Nisbet; Martyn Amos; José M. García; Manuel Ujaldon

We present GPU implementations of two different nature-inspired optimization methods for well-known optimization problems. Ant Colony Optimization (ACO) is a two-stage population-based method modelled on the foraging behaviour of ants, while P systems provide a high-level computational modelling framework that combines the structure and dynamic aspects of biological systems (in particular, their parallel and non-deterministic nature). Our methods focus on exploiting data parallelism and the memory hierarchy to obtain GPU speed-ups exceeding 20x for each of the two stages of the ACO algorithm, and 16x for P systems, when compared to sequential versions running on a single-threaded high-end CPU. Additionally, we compare performance between GPU generations to validate hardware enhancements introduced by Nvidia’s Fermi architecture.

Field-Programmable Logic and Applications | 2006

High Performance Scientific Computing Using FPGAs with IEEE Floating Point and Logarithmic Arithmetic for Lattice QCD

Owen Callanan; David Gregg; Andy Nisbet; Mike Peardon

The recent development of large FPGAs, along with the availability of a variety of floating-point cores, has made it possible to implement high-performance matrix and vector kernel operations on FPGAs. In this paper we seek to evaluate the performance of FPGAs for real scientific computations by implementing Lattice QCD, one of the classic scientific computing problems. Lattice QCD is the focus of considerable research work worldwide, including two custom ASIC-based solutions. Our results give significant insights into the usefulness of FPGAs for scientific computing. We also seek to evaluate two different number systems available for running scientific computations on FPGAs. To do this we implement FPGA-based lattice QCD processors using both double-precision IEEE floating-point and single-precision-equivalent Logarithmic Number System (LNS) cores, and compare their performance with that of two ASIC-based solutions targeted at lattice QCD and with PC-cluster-based solutions.
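
The appeal of a Logarithmic Number System on an FPGA is that multiplication reduces to an addition of logarithms, while addition and subtraction need a nonlinear correction term: for |a| >= |b|, log2(|a| +/- |b|) = log2|a| + log2(1 +/- 2^(log2|b| - log2|a|)). The Python model below is a hedged sketch of that arithmetic only. It keeps the logarithm as a Python float, whereas LNS hardware cores store it in fixed point and usually approximate the correction functions with lookup tables; the class name and the explicit zero flag are assumptions of the sketch, not details of the cores used in the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LNS:
    """Value = sign * 2**log; zero needs a flag because log2(0) is undefined."""
    sign: int           # +1 or -1
    log: float          # log2 of the magnitude
    is_zero: bool = False

    @staticmethod
    def from_float(x: float) -> "LNS":
        if x == 0.0:
            return LNS(1, 0.0, True)
        return LNS(1 if x > 0 else -1, math.log2(abs(x)))

    def to_float(self) -> float:
        return 0.0 if self.is_zero else self.sign * 2.0 ** self.log

    def __mul__(self, other: "LNS") -> "LNS":
        # Multiplication is an addition of logarithms plus a sign product.
        if self.is_zero or other.is_zero:
            return LNS(1, 0.0, True)
        return LNS(self.sign * other.sign, self.log + other.log)

    def __add__(self, other: "LNS") -> "LNS":
        # Addition needs the correction term log2(1 +/- 2**d), d <= 0.
        if self.is_zero:
            return other
        if other.is_zero:
            return self
        big, small = (self, other) if self.log >= other.log else (other, self)
        d = small.log - big.log
        if big.sign == small.sign:  # magnitudes add
            return LNS(big.sign, big.log + math.log2(1.0 + 2.0 ** d))
        if d == 0.0:                # equal magnitudes, opposite signs
            return LNS(1, 0.0, True)
        return LNS(big.sign, big.log + math.log2(1.0 - 2.0 ** d))


a, b = LNS.from_float(3.0), LNS.from_float(-0.5)
print((a * b).to_float(), (a + b).to_float())
```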

International Parallel and Distributed Processing Symposium | 2005

FPGA implementation of a lattice quantum chromodynamics algorithm using logarithmic arithmetic

Owen Callanan; Andy Nisbet; Emre Özer; James Sexton; David Gregg

In this paper, we discuss the implementation of a lattice quantum chromodynamics (QCD) application on a Xilinx Virtex-II FPGA device on an Alpha Data ADM-XRC-II board using Handel-C and logarithmic arithmetic. The specific algorithm implemented is the Wilson Dirac Fermion Vector times matrix product operation. QCD is the scientific theory that describes the interactions of various types of sub-atomic particles. Lattice QCD is the use of computer simulations to prove aspects of this theory. The research described in this paper aims to investigate whether FPGAs and logarithmic arithmetic are a viable compute platform for high performance computing by implementing lattice QCD for this platform. We have achieved competitive performance of at least 936 MFlops per node, executing 14.2 floating-point-equivalent operations per cycle, which is far higher than previous solutions proposed for lattice QCD simulations.
