
Publication


Featured research published by André Rigland Brodtkorb.


Scientific Programming | 2010

State-of-the-art in heterogeneous computing

André Rigland Brodtkorb; Christopher Dyken; Trond Runar Hagen; Jon M. Hjelmervik; Olaf O. Storaasli

Node level heterogeneous architectures have become attractive during the last decade for several reasons: compared to traditional symmetric CPUs, they offer high peak performance and are energy and/or cost efficient. With the increase of fine-grained parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a good overview and understanding of these architectures. We give an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs). We present a review of hardware, available software tools, and an overview of state-of-the-art techniques and algorithms. Furthermore, we present a qualitative and quantitative comparison of the architectures, and give our view on the future of heterogeneous computing.


Journal of Parallel and Distributed Computing | 2013

Graphics processing unit (GPU) programming strategies and trends in GPU computing

André Rigland Brodtkorb; Trond Runar Hagen; Martin Lilleeng Sætra

Over the last decade, there has been a growing interest in the use of graphics processing units (GPUs) for non-graphics applications. From early academic proof-of-concept papers around the year 2000, the use of GPUs has now matured to a point where there are countless industrial applications. Together with the expanding use of GPUs, we have also seen a tremendous development in the programming languages and tools, and getting started programming GPUs has never been easier. However, whilst getting started with GPU programming can be simple, being able to fully utilize GPU hardware is an art that can take months or years to master. The aim of this article is to simplify this process, by giving an overview of current GPU programming strategies, profile-driven development, and an outlook to future trends.
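The profile-driven development mentioned above typically starts by measuring kernel time on the device itself. The sketch below is a minimal, hypothetical illustration of that first step using CUDA events; the kernel, problem size, and launch configuration are assumptions for illustration, not taken from the paper.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel used only as a timing target: scales a vector in place.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // CUDA events give device-side timestamps, the usual starting point
    // of profile-driven development before moving to hardware profilers.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

From there, one would move on to hardware profilers to determine whether a kernel is bound by memory bandwidth, compute, or latency, and optimize accordingly.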


Parallel Computing | 2010

Shallow water simulations on multiple GPUs

Martin Lilleeng Sætra; André Rigland Brodtkorb

We present a state-of-the-art shallow water simulator running on multiple GPUs. Our implementation is based on an explicit high-resolution finite volume scheme suitable for modeling dam breaks and flooding. We use row domain decomposition to enable multi-GPU computations, and perform traditional CUDA block decomposition within each GPU for further parallelism. Our implementation shows near-perfect weak and strong scaling, and enables simulation of domains consisting of up to 235 million cells at a rate of over 1.2 gigacells per second using four Fermi-generation GPUs. The code is thoroughly benchmarked on three different systems, spanning both high-performance and commodity-level hardware.
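As a hedged sketch of the row domain decomposition described above: the grid is split row-wise between two GPUs, and after each time step the rows adjacent to the split are exchanged as ghost rows so that each GPU's stencil sees valid neighbour data. The grid dimensions, the one-ghost-row layout, and the function name are assumptions for illustration; the paper's implementation additionally overlaps such exchanges with computation to achieve its scaling.

```cuda
#include <cuda_runtime.h>

// Grid of HEIGHT x WIDTH cells split row-wise between two GPUs.
// GPU 0 owns rows [0, HALF) plus one ghost row stored at index HALF;
// GPU 1 owns rows [HALF, HEIGHT) stored from index 1 plus one ghost
// row stored at index 0. (This layout is an assumption of the sketch.)
const int    WIDTH     = 1024;
const int    HEIGHT    = 1024;
const int    HALF      = HEIGHT / 2;
const size_t ROW_BYTES = WIDTH * sizeof(float);

void exchange_ghost_rows(float* d_upper /* on GPU 0 */,
                         float* d_lower /* on GPU 1 */) {
    // Last interior row of GPU 0 -> ghost row (index 0) on GPU 1.
    cudaMemcpyPeer(d_lower, 1,
                   d_upper + (size_t)(HALF - 1) * WIDTH, 0, ROW_BYTES);
    // First interior row of GPU 1 -> ghost row (index HALF) on GPU 0.
    cudaMemcpyPeer(d_upper + (size_t)HALF * WIDTH, 0,
                   d_lower + (size_t)1 * WIDTH, 1, ROW_BYTES);
}
```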


Computing and Visualization in Science | 2010

Simulation and visualization of the Saint-Venant system using GPUs

André Rigland Brodtkorb; Trond Runar Hagen; Knut-Andreas Lie; Jostein R. Natvig

We consider three high-resolution schemes for computing shallow-water waves as described by the Saint-Venant system and discuss how to develop highly efficient implementations using graphics processing units (GPUs). The schemes are well-balanced for lake-at-rest problems, handle dry states, and support linear friction models. The first two schemes handle dry states by switching variables in the reconstruction step, so that bilinear reconstructions are computed using physical variables for small water depths and conserved variables elsewhere. In the third scheme, reconstructed slopes are modified in cells containing dry zones to ensure non-negative values at integration points. We discuss how single- and double-precision arithmetic affects accuracy, efficiency, scalability, and resource utilization for our implementations, and demonstrate that all three schemes map very well to current GPU hardware. We have also implemented direct and close-to-photo-realistic visualization of simulation results on the GPU, giving visual simulations at interactive speeds for reasonably sized grids.
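For reference, the Saint-Venant system that the schemes above discretize can be written in conservation form as below. This is the standard two-dimensional statement with a bed-slope term and a linear friction source term, not an excerpt from the paper.

```latex
\[
\frac{\partial}{\partial t}
\begin{bmatrix} h \\ hu \\ hv \end{bmatrix}
+ \frac{\partial}{\partial x}
\begin{bmatrix} hu \\ hu^2 + \tfrac{1}{2} g h^2 \\ huv \end{bmatrix}
+ \frac{\partial}{\partial y}
\begin{bmatrix} hv \\ huv \\ hv^2 + \tfrac{1}{2} g h^2 \end{bmatrix}
=
\begin{bmatrix} 0 \\ -g h\,\partial_x B - \kappa u \\ -g h\,\partial_y B - \kappa v \end{bmatrix}
\]
```

Here $h$ is the water depth, $(u,v)$ the depth-averaged velocity, $g$ the gravitational acceleration, $B(x,y)$ the bottom topography, and $\kappa$ a linear friction coefficient; the dry-state handling above concerns the reconstruction of these variables where $h \to 0$.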


EURO Journal on Transportation and Logistics | 2013

GPU computing in discrete optimization. Part II: Survey focused on routing problems

Christian Ferdinand Schulz; Geir Hasle; André Rigland Brodtkorb; Trond Runar Hagen

In many cases there is still a large gap between the performance of current optimization technology and the requirements of real-world applications. As in the past, performance will improve through a combination of more powerful solution methods and a general performance increase of computers. These factors are not independent. Due to physical limits, hardware development no longer results in higher speed for sequential algorithms, but rather in increased parallelism. Modern commodity PCs include a multi-core CPU and at least one GPU, providing a low-cost, easily accessible heterogeneous environment for high-performance computing. New solution methods that combine task parallelization and stream processing are needed to fully exploit modern computer architectures and profit from future hardware developments. This paper is the second in a series of two. Part I gives a tutorial-style introduction to modern PC architectures and GPU programming. Part II gives a broad survey of the literature on parallel computing in discrete optimization targeted at modern PCs, with special focus on routing problems. We assume that the reader is familiar with GPU programming, and refer readers who need an introduction to Part I. We conclude with lessons learnt, directions for future research, and prospects.


EURO Journal on Transportation and Logistics | 2013

GPU computing in discrete optimization. Part I: Introduction to the GPU

André Rigland Brodtkorb; Trond Runar Hagen; Christian Ferdinand Schulz; Geir Hasle

In many cases there is still a large gap between the performance of current optimization technology and the requirements of real-world applications. As in the past, performance will improve through a combination of more powerful solution methods and a general performance increase of computers. These factors are not independent. Due to physical limits, hardware development no longer results in higher speed for sequential algorithms, but rather in increased parallelism. Modern commodity PCs include a multi-core CPU and at least one GPU, providing a low-cost, easily accessible heterogeneous environment for high-performance computing. New solution methods that combine task parallelization and stream processing are needed to fully exploit modern computer architectures and profit from future hardware developments. This paper is the first in a series of two; its goal is to give a tutorial-style introduction to modern PC architectures and GPU programming. We start with a short historical account of modern mainstream computer architectures and a brief description of parallel computing. This is followed by the evolution of modern GPUs, before a GPU programming example is given. Strategies and guidelines for program development are also discussed. Part II gives a broad survey of the existing literature on parallel computing in discrete optimization targeted at modern PCs, with special focus on papers on routing problems. We conclude with lessons learnt, directions for future research, and prospects.
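The GPU programming example referred to above is not reproduced here; as a hedged stand-in, the snippet below shows the kind of minimal CUDA vector addition that such tutorial introductions usually begin with. The names and sizes are illustrative only.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread adds one pair of elements: the data-parallel "hello world" of GPU computing.
__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 16;
    const size_t bytes = n * sizeof(float);

    // Host allocations and initialization.
    float* h_a = (float*)malloc(bytes);
    float* h_b = (float*)malloc(bytes);
    float* h_c = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Device allocations and host-to-device copies.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover n elements, then copy back.
    vec_add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", h_c[0]);  // expect 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```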


Journal of Scientific Computing | 2015

Efficient GPU-Implementation of Adaptive Mesh Refinement for the Shallow-Water Equations

Martin Lilleeng Sætra; André Rigland Brodtkorb; Knut-Andreas Lie

The shallow-water equations model hydrostatic flow below a free surface for cases in which the ratio between the vertical and horizontal length scales is small and are used to describe waves in lakes, rivers, oceans, and the atmosphere. The equations admit discontinuous solutions, and numerical solutions are typically computed using high-resolution schemes. For many practical problems, there is a need to increase the grid resolution locally to capture complicated structures or steep gradients in the solution. An efficient method to this end is adaptive mesh refinement (AMR), which recursively refines the grid in parts of the domain and adaptively updates the refinement as the simulation progresses. Several authors have demonstrated that the explicit stencil computations of high-resolution schemes map particularly well to many-core architectures seen in hardware accelerators such as graphics processing units (GPUs). Herein, we present the first full GPU-implementation of a block-based AMR method for the second-order Kurganov–Petrova central scheme. We discuss implementation details, potential pitfalls, and key insights, and present a series of performance and accuracy tests. Although it is only presented for a particular case herein, we believe our approach to GPU-implementation of AMR is transferable to other hyperbolic conservation laws, numerical schemes, and architectures similar to the GPU.
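As a hedged sketch of one ingredient of block-based AMR, the kernel below initializes a 2x-refined child block from its parent block by piecewise-constant prolongation. The data layout and kernel name are assumptions, and a production scheme such as the one in the paper would use a conservative, higher-order reconstruction instead.

```cuda
#include <cuda_runtime.h>

// Prolongation: each cell of a 2x-refined fine block takes the value of
// its parent coarse cell. Piecewise-constant copy for brevity only.
__global__ void prolongate(const float* coarse, int coarse_w, int coarse_h,
                           float* fine /* (2*coarse_w) x (2*coarse_h) */) {
    int fx = blockIdx.x * blockDim.x + threadIdx.x;
    int fy = blockIdx.y * blockDim.y + threadIdx.y;
    int fine_w = 2 * coarse_w;
    int fine_h = 2 * coarse_h;
    if (fx >= fine_w || fy >= fine_h) return;

    int cx = fx / 2;   // parent coarse cell indices
    int cy = fy / 2;
    fine[fy * fine_w + fx] = coarse[cy * coarse_w + cx];
}
```

The matching restriction step (averaging fine cells back onto the coarse grid) and flux corrections along refinement boundaries are what keep the overall scheme conservative.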


Complex, Intelligent and Software Intensive Systems | 2008

The Graphics Processor as a Mathematical Coprocessor in MATLAB

André Rigland Brodtkorb

We present an interface to the graphics processing unit (GPU) from MATLAB, and four algorithms from numerical linear algebra available through this interface: matrix-matrix multiplication, Gauss-Jordan elimination, PLU factorization, and tridiagonal Gaussian elimination. In addition to being a high-level abstraction of the GPU, the interface offers background processing, enabling computations to be executed on the CPU simultaneously. The algorithms are shown to be up to 31 times faster than highly optimized CPU code. The algorithms have only been tested on single-precision hardware, but will easily run on new double-precision hardware.
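The interface itself lives on the MATLAB side, but the GPU back end of an operation like the matrix-matrix multiplication ultimately comes down to a kernel of the kind sketched below. This naive version is purely illustrative and not the paper's implementation; speedups of the reported magnitude rely on tiled, shared-memory kernels or vendor BLAS libraries.

```cuda
#include <cuda_runtime.h>

// Naive single-precision matrix-matrix multiply, C = A * B, for
// row-major N x N matrices. One thread computes one element of C.
__global__ void matmul(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= N || col >= N) return;

    float sum = 0.0f;
    for (int k = 0; k < N; ++k)
        sum += A[row * N + k] * B[k * N + col];
    C[row * N + col] = sum;
}
```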


Advanced Video and Signal Based Surveillance | 2013

Real-time online camera synchronization for volume carving on GPU

Torkel Andreas Haufmann; André Rigland Brodtkorb; Asbjørn Berge; Anna Kim

Volume carving is a well-known technique for reconstructing a 3D scene from a set of 2D images, using features detected in the individual cameras together with the camera parameters. Spatial calibration of the cameras is well understood, but the resulting carved volume is very sensitive to temporal offsets between the cameras. Automatic synchronization between the cameras is therefore desirable. In this paper, we present a highly efficient implementation of volume carving and synchronization on a heterogeneous system fitted with commodity GPUs, using an improved version of the algorithm in [1]. An online, real-time synchronization system is described and evaluated on surveillance video of an indoor scene. Improvements to state-of-the-art CPU-based algorithms are described.
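Volume carving itself maps naturally to the GPU: each voxel can independently test whether it projects onto the silhouette seen by a camera. The kernel below is a hypothetical sketch of one carving pass for a single camera; the projection-matrix layout, silhouette format, and voxel-grid parameters are assumptions, and the paper's synchronization machinery is not shown.

```cuda
#include <cuda_runtime.h>

// Hypothetical sketch of one carving pass for a single camera.
// P is a row-major 3x4 projection matrix; sil is a binary silhouette
// image (nonzero = foreground). A voxel survives only if every camera
// projects it onto foreground, so occupancy is carved per camera.
__global__ void carve(unsigned char* occupancy, int nx, int ny, int nz,
                      float voxel_size, const float* P,
                      const unsigned char* sil, int img_w, int img_h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x >= nx || y >= ny || z >= nz) return;

    // Voxel center in world coordinates (grid anchored at the origin).
    float wx = (x + 0.5f) * voxel_size;
    float wy = (y + 0.5f) * voxel_size;
    float wz = (z + 0.5f) * voxel_size;

    // Project with the 3x4 matrix: (u, v, w) = P * (wx, wy, wz, 1).
    float u = P[0] * wx + P[1] * wy + P[2]  * wz + P[3];
    float v = P[4] * wx + P[5] * wy + P[6]  * wz + P[7];
    float w = P[8] * wx + P[9] * wy + P[10] * wz + P[11];
    int px = (int)(u / w);
    int py = (int)(v / w);

    bool foreground = px >= 0 && px < img_w && py >= 0 && py < img_h
                      && sil[py * img_w + px] != 0;
    if (!foreground) occupancy[(z * ny + y) * nx + x] = 0;  // carve away
}
```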


Mathematical Methods for Curves and Surfaces | 2008

A comparison of three commodity-level parallel architectures: multi-core CPU, Cell BE and GPU

André Rigland Brodtkorb; Trond Runar Hagen

We explore three commodity parallel architectures: multi-core CPUs, the Cell BE processor, and graphics processing units. We have implemented four algorithms on these three architectures: solving the heat equation, inpainting using the heat equation, computing the Mandelbrot set, and MJPEG movie compression. We use these four algorithms to exemplify the benefits and drawbacks of each parallel architecture.
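Of the four benchmark algorithms, the explicit heat-equation solver is the archetypal data-parallel stencil. The kernel below is an illustrative sketch of one explicit Euler step on the GPU, not the paper's code; the grid layout and the stability parameter r are assumptions.

```cuda
#include <cuda_runtime.h>

// One explicit Euler step of the 2D heat equation on an nx x ny grid.
// r = alpha * dt / dx^2 should satisfy r <= 0.25 for stability.
// Only interior cells are updated; boundary values are assumed to be
// maintained by the caller (e.g., fixed Dirichlet boundaries).
__global__ void heat_step(const float* u, float* u_new, int nx, int ny, float r) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < 1 || j < 1 || i >= nx - 1 || j >= ny - 1) return;

    int c = j * nx + i;
    u_new[c] = u[c] + r * (u[c - 1] + u[c + 1] + u[c - nx] + u[c + nx] - 4.0f * u[c]);
}
```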

Collaboration


Dive into André Rigland Brodtkorb's collaboration.

Top Co-Authors


F. Michel

Technische Universität Darmstadt


M. Weiler

Technische Universität Darmstadt


T. Gierlinger

Technische Universität Darmstadt
