Paul H. J. Kelly | Researchain

Publication

Featured research published by Paul H. J. Kelly.

International Conference on Parallel Architectures and Languages Europe | 1993

Parallel Programming Using Skeleton Functions

John Darlington; A. J. Field; Peter G. Harrison; Paul H. J. Kelly; David W. N. Sharp; Qian Wu

Programming parallel machines is notoriously difficult. Factors contributing to this difficulty include the complexity of concurrency, the effect of resource allocation on performance and the current diversity of parallel machine models. The net result is that effective portability, which depends crucially on the predictability of performance, has been lost. Functional programming languages have been put forward as solutions to these problems, because of the availability of implicit parallelism. However, performance will be generally poor unless the issue of resource allocation is addressed explicitly, diminishing the advantage of using a functional language in the first place.

ACM Transactions on Programming Languages and Systems | 2007

Efficient field-sensitive pointer analysis of C

David J. Pearce; Paul H. J. Kelly; Chris Hankin

The subject of this article is flow- and context-insensitive pointer analysis. We present a novel approach for precisely modelling struct variables and indirect function calls. Our method emphasises efficiency and simplicity and is based on a simple language of set constraints. We obtain an O(v^4) bound on the time needed to solve a set of constraints from this language, where v is the number of constraint variables. This gives, for the first time, some insight into the hardness of performing field-sensitive pointer analysis of C. Furthermore, we experimentally evaluate the time versus precision trade-off for our method by comparing against the field-insensitive equivalent. Our benchmark suite consists of 11 common C programs ranging in size from 15,000 to 200,000 lines of code. Our results indicate the field-sensitive analysis is more expensive to compute, but yields significantly better precision. In addition, our technique has been integrated into the latest release (version 4.1) of the GNU Compiler GCC. Finally, we identify several previously unknown issues with an alternative and less precise approach to modelling struct variables, known as field-based analysis.
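To make the phrase "set constraints" concrete, here is a minimal Python sketch of the field-insensitive baseline that the abstract compares against. It is not the authors' field-sensitive constraint language or solver; the function name `solve`, the four constraint kinds and the tuple encoding are assumptions chosen for illustration, and the naive fixpoint loop makes no attempt at efficiency.

```python
from collections import defaultdict


def solve(addr_of, copies, loads, stores):
    """Naive fixpoint solver for inclusion-based points-to constraints.

    Hypothetical encoding (not taken from the paper):
      addr_of: (p, x) for  p = &x   ->  x is in pts(p)
      copies:  (p, q) for  p = q    ->  pts(p) includes pts(q)
      loads:   (p, q) for  p = *q   ->  pts(p) includes pts(t) for each t in pts(q)
      stores:  (p, q) for  *p = q   ->  pts(t) includes pts(q) for each t in pts(p)
    """
    pts = defaultdict(set)
    for p, x in addr_of:
        pts[p].add(x)

    def flow(dst, src):
        """Add pts(src) to pts(dst); report whether pts(dst) grew."""
        before = len(pts[dst])
        pts[dst] |= pts[src]
        return len(pts[dst]) != before

    changed = True
    while changed:  # repeat until no points-to set grows
        changed = False
        for p, q in copies:
            changed |= flow(p, q)
        for p, q in loads:
            for t in list(pts[q]):
                changed |= flow(p, t)
        for p, q in stores:
            for t in list(pts[p]):
                changed |= flow(t, q)
    return dict(pts)


# C fragment being modelled:  p = &a;  q = &b;  *p = q;  r = *p;  s = r;
result = solve(
    addr_of=[("p", "a"), ("q", "b")],
    copies=[("s", "r")],
    loads=[("r", "p")],
    stores=[("p", "q")],
)
```

Because every variable is treated as a single cell, a struct with several pointer fields would be collapsed into one points-to set here, which is the loss of precision that a field-sensitive analysis is meant to avoid.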

International Symposium on Mixed and Augmented Reality | 2014

Dense planar SLAM

Renato F. Salas-Moreno; Ben Glocker; Paul H. J. Kelly; Andrew J. Davison

Using higher-level entities during mapping has the potential to improve camera localisation performance and give substantial perception capabilities to real-time 3D SLAM systems. We present an efficient new real-time approach which densely maps an environment using bounded planes and surfels extracted from depth images (like those produced by RGB-D sensors or dense multi-view stereo reconstruction). Our method offers the every-pixel descriptive power of the latest dense SLAM approaches, but takes advantage directly of the planarity of many parts of real-world scenes via a data-driven process to directly regularize planar regions and represent their accurate extent efficiently using an occupancy approach with on-line compression. Large areas can be mapped efficiently and with useful semantic planar structure which enables intuitive and useful AR applications such as using any wall or other planar surface in a scene to display a user's content.

ACM Journal of Experimental Algorithms | 2007

A dynamic topological sort algorithm for directed acyclic graphs

David J. Pearce; Paul H. J. Kelly

We consider the problem of maintaining the topological order of a directed acyclic graph (DAG) in the presence of edge insertions and deletions. We present a new algorithm and, although this has inferior time complexity compared with the best previously known result, we find that its simplicity leads to better performance in practice. In addition, we provide an empirical comparison against the three main alternatives over a large number of random DAGs. The results show our algorithm is the best for sparse digraphs and only a constant factor slower than the best on dense digraphs.
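The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the problem it addresses: keeping a topological order valid as edges are inserted, by repairing just the region of the order between the endpoints of a violating edge. The class name `DynamicTopoOrder` and its methods are hypothetical, and this should be read as an illustration in that spirit rather than the authors' implementation.

```python
class DynamicTopoOrder:
    """Keeps a topological order of a DAG valid while edges are inserted."""

    def __init__(self, n):
        self.succ = [set() for _ in range(n)]
        self.pred = [set() for _ in range(n)]
        self.ord = list(range(n))  # ord[v] = position of node v in the order

    def add_edge(self, x, y):
        """Insert edge x -> y; raise ValueError if it would create a cycle."""
        if x == y:
            raise ValueError("self-loop")
        lb, ub = self.ord[y], self.ord[x]
        if lb < ub:  # y currently sits before x, so the order must be repaired
            # Nodes reachable from y that lie at or before x in the order.
            fwd = self._reach(y, self.succ, lambda v: self.ord[v] <= ub)
            if x in fwd:
                raise ValueError("edge would create a cycle")
            # Nodes that reach x and lie at or after y in the order.
            bwd = self._reach(x, self.pred, lambda v: self.ord[v] >= lb)
            self._reorder(bwd, fwd)
        self.succ[x].add(y)
        self.pred[y].add(x)

    def remove_edge(self, x, y):
        """Deleting an edge never invalidates a topological order."""
        self.succ[x].discard(y)
        self.pred[y].discard(x)

    def _reach(self, start, adj, in_window):
        """Iterative depth-first search restricted to the affected window."""
        seen = {start}
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in seen and in_window(w):
                    seen.add(w)
                    stack.append(w)
        return seen

    def _reorder(self, bwd, fwd):
        """Reuse the affected positions: ancestors of x first, then descendants of y."""
        nodes = sorted(bwd, key=lambda v: self.ord[v]) + sorted(
            fwd, key=lambda v: self.ord[v]
        )
        slots = sorted(self.ord[v] for v in nodes)
        for v, s in zip(nodes, slots):
            self.ord[v] = s


g = DynamicTopoOrder(4)
g.add_edge(3, 2)  # violates the initial order 0, 1, 2, 3 and triggers a repair
g.add_edge(2, 0)
```

Nodes outside the window between the two endpoints keep their positions, which is why this style of repair tends to be cheap when the graph is sparse.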

ACM Transactions on Mathematical Software | 2017

Firedrake: Automating the Finite Element Method by Composing Abstractions

Florian Rathgeber; David A. Ham; Lawrence Mitchell; Fabio Luporini; Andrew T. T. McRae; Gheorghe-Teodor Bercea; Graham Markall; Paul H. J. Kelly

Firedrake is a new tool for automating the numerical solution of partial differential equations. Firedrake adopts the domain-specific language for the finite element method of the FEniCS project, but with a pure Python runtime-only implementation centred on the composition of several existing and new abstractions for particular aspects of scientific computing. The result is a more complete separation of concerns which eases the incorporation of separate contributions from computer scientists, numerical analysts and application specialists. These contributions may add functionality, or improve performance. Firedrake benefits from automatically applying new optimisations. This includes factorising mixed function spaces, transforming and vectorising inner loops, and intrinsically supporting block matrix operations. Importantly, Firedrake presents a simple public API for escaping the UFL abstraction. This allows users to implement common operations that fall outside pure variational formulations, such as flux-limiters.

Software - Practice and Experience | 2007

Profiling with AspectJ

David J. Pearce; Matthew Alexander Webster; Robert Francis Berry; Paul H. J. Kelly


European Conference on Computer Systems | 2011

Symbolic crosschecking of floating-point and SIMD code

Peter Collingbourne; Cristian Cadar; Paul H. J. Kelly

We present an effective technique for crosschecking an IEEE 754 floating-point program and its SIMD-vectorized version, implemented in KLEE-FP, an extension to the KLEE symbolic execution tool that supports symbolic reasoning on the equivalence between floating-point values. The key insight behind our approach is that floating-point values are only reliably equal if they are essentially built by the same operations. As a result, our technique works by lowering the Intel Streaming SIMD Extensions (SSE) instruction set to primitive integer and floating-point operations, and then using an algorithm based on symbolic expression matching augmented with canonicalization rules. Under symbolic execution, we have to verify equivalence along every feasible control-flow path. We reduce the branching factor of this process by aggressively merging conditionals, if-converting branches into select operations via an aggressive phi-node folding transformation. We applied KLEE-FP to OpenCV, a popular open source computer vision library. KLEE-FP was able to successfully crosscheck 51 SIMD/SSE implementations against their corresponding scalar versions, proving the bounded equivalence of 41 of them (i.e., on images up to a certain size), and finding inconsistencies in the other 10.
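The insight that floating-point values are only reliably equal when built by the same operations can be seen in a small Python sketch. It has nothing to do with KLEE-FP or OpenCV themselves; the functions `scalar_sum` and `lane_sum` and the input data are hypothetical, with `lane_sum` imitating the reordering that a vectorized reduction typically introduces.

```python
def scalar_sum(xs):
    """Left-to-right accumulation, as a plain scalar loop would do it."""
    total = 0.0
    for x in xs:
        total += x
    return total


def lane_sum(xs, lanes=2):
    """Accumulate into separate lanes, then combine the lanes.

    This imitates the order of additions in a SIMD-style reduction.
    """
    partial = [0.0] * lanes
    for i, x in enumerate(xs):
        partial[i % lanes] += x
    total = 0.0
    for p in partial:
        total += p
    return total


# Mathematically both sums equal 2, but IEEE 754 doubles cannot represent
# 1e16 + 1, so the order in which the additions happen changes the result.
data = [1e16, 1.0, -1e16, 1.0]
scalar_result = scalar_sum(data)
vector_result = lane_sum(data)
```

Here the scalar loop loses the first `1.0` when it is added to `1e16`, while the two-lane version cancels the large values within one lane and keeps both small ones in the other, so the two results differ even though each function is a correct sum in exact arithmetic.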

International Conference on Robotics and Automation | 2015

Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM

Luigi Nardi; Bruno Bodin; M. Zeeshan Zia; John Mawer; Andy Nisbet; Paul H. J. Kelly; Andrew J. Davison; Mikel Luján; Michael F. P. O'Boyle; Graham D. Riley; Nigel P. Topham; Stephen B. Furber

Real-time dense computer vision and SLAM offer great potential for a new level of scene modelling, tracking and real environmental interaction for many types of robot, but their high computational requirements mean that use on mass market embedded platforms is challenging. Meanwhile, trends in low-cost, low-power processing are towards massive parallelism and heterogeneity, making it difficult for robotics and vision researchers to implement their algorithms in a performance-portable way. In this paper we introduce SLAMBench, a publicly-available software framework which represents a starting point for quantitative, comparable and validatable experimental research to investigate trade-offs in performance, accuracy and energy consumption of a dense RGB-D SLAM system. SLAMBench provides a KinectFusion implementation in C++, OpenMP, OpenCL and CUDA, and harnesses the ICL-NUIM dataset of synthetic RGB-D sequences with trajectory and scene ground truth for reliable accuracy comparison of different implementations and algorithms. We present an analysis and breakdown of the constituent algorithmic elements of KinectFusion, and experimentally investigate their execution time on a variety of multicore and GPU-accelerated platforms. For a popular embedded platform, we also present an analysis of energy efficiency for different configuration alternatives.

The Computer Journal | 2012

Performance Analysis and Optimization of the OP2 Framework on Many-Core Architectures

Michael B. Giles; Gihan R. Mudalige; Z. Sharif; Graham Markall; Paul H. J. Kelly

This paper presents a benchmarking, performance analysis and optimization study of the OP2 ‘active’ library, which provides an abstraction framework for the parallel execution of unstructured mesh applications. OP2 aims to decouple the scientific specification of the application from its parallel implementation, and thereby achieve code longevity and near-optimal performance through re-targeting the application to execute on different multi-core/many-core hardware. Runtime performance results are presented for a representative unstructured mesh application on a variety of many-core processor systems, including traditional X86 architectures from Intel (Xeon based on the older Penryn and current Nehalem micro-architectures) and GPU offerings from NVIDIA (GTX260, Tesla C2050). Our analysis demonstrates the contrasting performance between the use of CPU (OpenMP) and GPU (CUDA) parallel implementations for the solution of an industrial-sized unstructured mesh consisting of about 1.5 million edges. Results show the significance of choosing the correct partition and thread-block configuration, the factors limiting the GPU performance and insights into optimizations for improved performance.

Lecture Notes in Computer Science | 2002

GILK: A Dynamic Instrumentation Tool for the Linux Kernel

David J. Pearce; Paul H. J. Kelly; Tony Field; Uli Harder

This paper describes a dynamic instrumentation tool for the Linux Kernel which allows a stock Linux kernel to be modified while in execution, with instruments implemented as kernel modules. The Intel x86 architecture poses a particular problem, due to variable length instructions, which this paper addresses for the first time. Finally, we present a short case study illustrating its use in understanding I/O behaviour in the kernel. The source code is freely available for download.
