Oliver Pell | Researchain

Publication

Featured research published by Oliver Pell.

international symposium on microarchitecture | 2011

Beyond Traditional Microprocessors for Geoscience High-Performance Computing Applications

Olav Lindtjorn; Robert G. Clapp; Oliver Pell; H Fu; Michael J. Flynn; Haohuan Fu

The oil and gas industry is a major user of high-performance computing, and geoscience computational cycles are dominated by kernels that are relatively few and well defined. This project explores accelerating geoscience applications using FPGA-based hardware, optimizing the algorithm and the hardware to achieve maximum performance. This approach can deliver speedup of 20 to 70 times compared with a conventional HPC node.

ACM Sigarch Computer Architecture News | 2011

Surviving the end of frequency scaling with reconfigurable dataflow computing

Oliver Pell; Oskar Mencer

Over the past decade x86 processors have come to dominate the world's largest supercomputers. However, in the future conventional multicore processors are unlikely to be able to deliver the necessary performance per $ and per W to achieve exascale performance. Heterogeneous computing is emerging as a powerful alternative to conventional multi-core to help address these challenges. In this paper we describe our approach to Maximum Performance Computing: building application-specific computers which complement conventional x86 processors with high performance dataflow engines implemented on FPGA to provide 10-100x improvements in performance and performance/W. We describe the MaxCompiler programming system, which allows software engineers to create dataflow engines optimized for their particular applications, and discuss an example application that has been accelerated using this methodology.

IEEE Micro | 2014

Scaling Reverse Time Migration Performance through Reconfigurable Dataflow Engines

Haohuan Fu; Lin Gan; Robert G. Clapp; Huabin Ruan; Oliver Pell; Oskar Mencer; Michael J. Flynn; Xiaomeng Huang; Guangwen Yang

Seismic migrations dominate about 90 percent of the computation cycles in the oil and gas industry. With the demand to handle high-density data and more complicated physics models, migration applications always call for more computing power, and they adopt new architectures quickly. Current multicore and many-core architectures have significantly improved the density of computational resources within a chip, but they also have made memory bandwidth a bottleneck that stops the scaling of performance over the increased number of cores. In this article, the authors present their reverse time migration design based on reconfigurable data-flow engines. Combining both algorithmic and architectural optimizations, they manage to achieve a balanced utilization of various resources (computational logic, local buffers, memory bandwidth, and so on) in the system, with none of them becoming the performance bottleneck. Their data-flow design provides performance equivalent to 72 Intel CPU cores, and achieves 10 times higher power efficiency than the multicore CPU architecture.

field programmable logic and applications | 2012

Exploiting run-time reconfiguration in stencil computation

Xinyu Niu; Qiwei Jin; Wayne Luk; Qiang Liu; Oliver Pell

Stencil computation is computationally intensive and required by many applications. This paper proposes an approach to exploit run-time reconfigurability of field-programmable accelerators for stencil computation. System throughput is optimized by partitioning, analysing and scheduling tasks in applications to remove idle functions. To evaluate the proposed approach, Reverse Time Migration (RTM), a high performance application, is developed. Our optimized run-time reconfigurable solution, which targets a Virtex-6 FPGA in a Maxeler MAX3424A system, achieves an improved throughput of 102.8 GFlop/s, up to two orders of magnitude faster than the CPU reference designs, 1.59 times faster than the best published GPU and FPGA results, and 1.45 times faster than an optimized static implementation.

field-programmable logic and applications | 2006

Comparing FPGAs to Graphics Accelerators and the Playstation 2 Using a Unified Source Description

Lee W. Howes; Paul Price; Oskar Mencer; Olav Beckmann; Oliver Pell

Field programmable gate arrays (FPGAs), graphics processing units (GPUs) and Sony's Playstation 2 vector units offer scope for hardware acceleration of applications. We compare the performance of these architectures using a unified description based on A Stream Compiler (ASC) for FPGAs, which has been extended to target GPUs and PS2 vector units. Programming these architectures from a single description enables us to reason about optimizations for the different architectures. Using the ASC description we implement a Monte Carlo simulation, a fast Fourier transform (FFT) and a weighted sum algorithm. Our results show that without much optimization the GPU is suited to the Monte Carlo simulation, while the weighted sum is better suited to PS2 vector units. FPGA implementations benefit particularly from architecture-specific optimizations, which ASC allows us to easily implement by adding simple annotations to the shared code.

high performance computational finance | 2010

Accelerating the computation of portfolios of tranched credit derivatives

Stephen Weston; Jean-Tristan Marin; James Barry Spooner; Oliver Pell; Oskar Mencer

Huge growth in the trading and complexity of credit derivative instruments over the past five years has driven the need for ever more computationally demanding mathematical models. This has led to massive growth in data center compute capacity, power and cooling requirements. We report the results of an on-going joint project between J.P. Morgan and specialist acceleration solutions provider Maxeler Technologies to improve the price-performance for calculating the value and risk of a large complex credit derivatives portfolio. Our results show that valuing tranches of Collateralized Default Obligations (CDOs) on Maxeler accelerated systems is over 30 times faster per cubic foot and per Watt than solutions using standard multi-core Intel Xeon processors. We also report some preliminary results of further work that extends the approach to classes of interest rate derivatives.

IEEE Transactions on Parallel and Distributed Systems | 2013

Finite-Difference Wave Propagation Modeling on Special-Purpose Dataflow Machines

Oliver Pell; Jacob A. Bower; Robert G. Dimond; Oskar Mencer; Michael J. Flynn

Modeling wave propagation through the earth is an important application in geoscience. We present a framework for wave propagation modeling on special-purpose hardware, which dramatically improves the application performance compared to conventional CPUs. We utilize custom hardware platforms consisting of a mix of x86 CPUs and dataflow engines connected by high-bandwidth communication links. Application programmers describe their algorithms in a domain-specific language using Java syntax, with special dataflow semantics overlaid on top of the Java language. The application-specific dataflow engines run at hundreds of MHz with massive parallelism and deliver high performance/Watt, up to 30 times more energy efficient than conventional CPUs. The power efficiency of this approach suggests that dataflow computing may have a key role to play in the improvements in power efficiency necessary to reach exascale computing.

international symposium on parallel and distributed computing | 2008

Finding Speedup in Parallel Processors

Michael J. Flynn; Robert G. Dimond; Oskar Mencer; Oliver Pell

While the recent focus of architects and programmers has been on multi-core, the alternative of a processor node plus an array-oriented accelerator has some significant advantages, especially in compute-intensive static applications. We propose an acceleration methodology based on FPGA arrays (but in principle it could be GPU or Cell based). The methodology uses a comprehensive application analysis supported by high performance FPGA hardware. The analysis provides a dataflow graph of the application, which is replicated in SIMD for multiple data strips until limited by the pin bandwidth, then pipelined (MISD) until circuit limited. An oil exploration application shows the possibility of speedup of over 300x over an Intel Xeon.

Seg Technical Program Expanded Abstracts | 2008

An implementation of the acoustic wave equation on FPGAs

Rob Dimond; Oliver Pell; Tamas Nemeth; Wei Liu; Joe Stefani; Ray Ergas

Hardware accelerators as co-processors are emerging as a powerful solution to computationally intensive problems. A standard desktop PC or cluster node can be augmented with additional hardware dedicated to providing substantially increased performance for particular applications. Previous efforts have shown that FPGA-based hardware accelerators can offer order-of-magnitude greater performance than conventional CPUs, providing the target algorithm performs a large number of operations per data point. FPGAs are off-the-shelf chips with a configurable ‘sea’ of logic and memory that can be used to implement digital circuits. FPGAs can be attached to the compute system either through the main system bus or as PCI Express cards (or similar) and are typically configured as highly parallel stream processors. FPGA acceleration has been successfully demonstrated in a variety of application domains including computational finance (Zhang et al., 2005), fluid dynamics (Sano et al., 2007), cryptography (Cheung et al., 2005) and seismic processing (Bean and Gray, 1997; He et al., 2005a; He et al., 2005b; Pell and Clapp, 2007).

Seg Technical Program Expanded Abstracts | 2009

Accelerating 3D Convolution Using Streaming Architectures On FPGAs

Haohuan Fu; Robert G. Clapp; Oskar Mencer; Oliver Pell
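
For readers unfamiliar with the kernel named in the last title above, 3D convolution can be pictured in software as a stencil sweep over a grid. The following is a minimal plain-Python sketch for illustration only. It is not code from any of the listed papers, and the function name `convolve3d_7pt`, the 7-point stencil shape and the two coefficients are assumptions chosen for brevity.

```python
def convolve3d_7pt(grid, c_center, c_neighbor):
    """Apply a 7-point stencil to the interior of a 3D grid.

    `grid` is a nested list indexed as grid[z][y][x]. Boundary points
    are left at 0.0. The stencil shape and the coefficients are
    illustrative assumptions, not values from the listed papers.
    """
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                # Sum of the six face-adjacent neighbours of (z, y, x).
                neighbors = (
                    grid[z - 1][y][x] + grid[z + 1][y][x]
                    + grid[z][y - 1][x] + grid[z][y + 1][x]
                    + grid[z][y][x - 1] + grid[z][y][x + 1]
                )
                out[z][y][x] = c_center * grid[z][y][x] + c_neighbor * neighbors
    return out


# Usage: a 3x3x3 grid of ones has exactly one interior point.
ones = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
result = convolve3d_7pt(ones, c_center=-6.0, c_neighbor=1.0)
```

In this sketch each interior output reads seven input points and performs two multiplications and six additions. Every output point is independent of the others and the access pattern is fixed, which is the property that makes such kernels candidates for streaming hardware.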
