Helmar Burkhart | Researchain

Publication

Featured research published by Helmar Burkhart.

International Parallel and Distributed Processing Symposium | 2011

PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures

Matthias Christen; Olaf Schenk; Helmar Burkhart

Stencil calculations comprise an important class of kernels in many scientific computing applications, ranging from simple PDE solvers to constituent kernels in multigrid methods as well as image processing applications. In such solvers, stencil kernels are often the dominant part of the computation, so an efficient parallel implementation of the kernel is crucial for reducing the time to solution. However, on today's complex hardware microarchitectures, meticulous architecture-specific tuning is required to elicit the machine's full compute power. We present Patus, a code generation and auto-tuning framework for stencil computations targeted at multi- and manycore processors, such as multicore CPUs and graphics processing units. Patus makes it possible to generate compute kernels from a specification of the stencil operation and a parallelization and optimization strategy, and it leverages the auto-tuning methodology to optimize strategy-dependent parameters for the given hardware architecture.
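To make the term "stencil kernel" concrete, here is a minimal sketch of one sweep of a 5-point averaging stencil on a 2D grid. It is not Patus code and does not reflect the Patus specification language; the function names `jacobi_step` and `run_sweeps` are hypothetical, and the pure-Python loops only illustrate the access pattern that a tuned kernel would have to optimize.

```python
def jacobi_step(grid):
    """Apply one 5-point averaging stencil sweep to the interior of a 2D grid.

    Illustrative only: each interior point is replaced by the mean of its
    four direct neighbours; boundary values are left unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy, so boundary values carry over
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def run_sweeps(grid, sweeps):
    """Repeat the stencil sweep a fixed number of times (hypothetical helper)."""
    for _ in range(sweeps):
        grid = jacobi_step(grid)
    return grid
```

Each output point reads only a few neighbouring inputs and does very little arithmetic per value loaded, which is why such kernels tend to be limited by memory bandwidth and why architecture-specific tuning matters.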

IEEE Transactions on Computers | 1989

Performance-measurement tools in a multiprocessor environment

Helmar Burkhart; Roland Millen

A family of monitoring facilities is proposed which are used in combination, e.g. a breakpoint monitor used for debugging purposes, a mailbox monitor for analysis of synchronization traffic, and a bus monitor for measurements of bus load. These tools are used in multi-monitor mode, for which both a common programming-language interface and a user interface are provided. Design concepts are presented, along with the overall structure of such an integrated monitoring tool set in a multiprocessor environment. How a combination of hardware and software monitors is embedded into a MODULA-2 multiprocessor environment is outlined, as a case study.

Journal of Parallel and Distributed Computing | 2008

Algorithmic performance studies on graphics processing units

Olaf Schenk; Matthias Christen; Helmar Burkhart

We report on our experience with integrating and using graphics processing units (GPUs) as fast parallel floating-point co-processors to accelerate two fundamental computational scientific kernels: sparse direct factorization and nonlinear interior-point optimization. Since a full re-implementation of these complex kernels is typically not feasible, we identify the matrix-matrix multiplication as a first natural entry point for a minimally invasive integration of GPUs. We investigate the performance on the NVIDIA GeForce 8800 multicore chip, initially designed for intensive gaming applications. We exploit the architectural features of the GeForce 8800 GPU to design an efficient GPU-parallel sparse matrix solver. A prototype approach to leverage the bandwidth and computing power of GPUs for these matrix kernel operations is demonstrated, resulting in an overall performance of over 110 GFlop/s on the desktop for large matrices and over 38 GFlop/s for sparse matrices arising in real applications. We use our GPU algorithm for PDE-constrained optimization problems and demonstrate that the commodity GPU is a useful co-processor for scientific applications.

Future Generation Computer Systems | 2003

An Interdisciplinary Virtual Laboratory on Nanoscience

M. Guggisberg; Peter Fornaro; T. Gyalog; Helmar Burkhart

The Swiss Virtual Campus project "Virtual Nanoscience Laboratory" realises a virtual laboratory for the booming field of nanoscience. Nanoscience laboratories are expensive, and only major companies and organizations sponsored by research programmes can afford to use them. With a concept of distance education, complex and sensitive experimental equipment can be shared through the Internet. Three main topics are realized in the framework of a virtual laboratory: user management, communication and co-operation, and the control of virtual experiments. The basic architecture is based on a multitiered client-server model. Each nanoscience experiment is implemented as a stand-alone web service.

International Parallel and Distributed Processing Symposium | 2009

Parallel data-locality aware stencil computations on modern micro-architectures

Matthias Christen; Olaf Schenk; Esra Neufeld; Peter Messmer; Helmar Burkhart

Novel micro-architectures including the Cell Broadband Engine Architecture and graphics processing units are attractive platforms for compute-intensive simulations. This paper focuses on stencil computations arising in the context of a biomedical simulation and presents performance benchmarks on both the Cell BE and GPUs and contrasts them with a benchmark on a traditional CPU system. Due to the low arithmetic intensity of stencil computations, typically only a fraction of the peak performance of the compute hardware is reached. An algorithm is presented, which reduces the bandwidth requirements and thereby improves performance by exploiting temporal locality of the data. We report on performance improvements over CPU implementations.

Computer Science - Research and Development | 2011

Automatic code generation and tuning for stencil kernels on modern shared memory architectures

Matthias Christen; Olaf Schenk; Helmar Burkhart

In this paper, we present Patus, a code generation and auto-tuning framework for stencil computations targeted at multi- and manycore processors, such as multicore CPUs and graphics processing units. Patus, which stands for “Parallel Autotuned Stencils,” generates a compute kernel from a specification of the stencil operation and a strategy which describes the parallelization and optimization to be applied, and leverages the autotuning methodology to optimize strategy-specific parameters for the given hardware architecture.

Parallel Computing | 2012

An auction-based weighted matching implementation on massively parallel architectures

Madan Sathe; Olaf Schenk; Helmar Burkhart

Maximum weighted matchings represent a fundamental kernel in massive graph analysis and occur in a wide range of real-life applications. Here, a parallel auction-based matching algorithm is developed, which is able to tackle matchings in very large, dense, and sparse bipartite graphs. It will be demonstrated that the convergence of the auction algorithm crucially depends on two different ε-scaling strategies. The auction algorithm including the ε-scaling strategies has been implemented using a hybrid MPI-OpenMP programming model, and its performance is validated in various applications from bioinformatics, computer vision, and sparse linear algebra. It is concluded that for dense bipartite graphs, the auction algorithm scales well, and for sparse bipartite graphs at least a substantial speedup is achieved against alternative approaches that are based on augmenting path algorithms.
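For readers unfamiliar with auction-based matching, the following is a minimal sequential sketch of the textbook auction algorithm with a simple geometric ε-scaling schedule on a dense, square benefit matrix. It is not the paper's parallel MPI-OpenMP implementation, and the schedule is an assumption chosen for illustration rather than one of the two strategies the abstract refers to; the names `auction_phase`, `auction_matching`, and `factor` are hypothetical.

```python
def auction_phase(benefit, prices, eps):
    """Run one auction until every person holds an object (dense case only)."""
    n = len(benefit)
    owner = [None] * n      # owner[j]: person currently holding object j
    assigned = [None] * n   # assigned[i]: object currently held by person i
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        # Net value of each object for person i at the current prices.
        values = [benefit[i][j] - prices[j] for j in range(n)]
        best = max(range(n), key=values.__getitem__)
        second = max((values[j] for j in range(n) if j != best),
                     default=values[best])
        # Bid: raise the price by the value gap plus eps.
        prices[best] += values[best] - second + eps
        previous = owner[best]
        if previous is not None:
            assigned[previous] = None
            unassigned.append(previous)
        owner[best] = i
        assigned[i] = best
    return assigned


def auction_matching(benefit, factor=4.0):
    """Repeat auction phases with shrinking eps, reusing prices between phases.

    Assumes integer benefits, for which a final eps below 1/n yields an
    optimal assignment. The geometric reduction by `factor` is an assumption.
    """
    n = len(benefit)
    prices = [0.0] * n
    eps_final = 1.0 / (n + 1)
    eps = max(eps_final, max(abs(b) for row in benefit for b in row) / 2.0)
    while True:
        assigned = auction_phase(benefit, prices, eps)
        if eps <= eps_final:
            return assigned
        eps = max(eps_final, eps / factor)
```

A call such as `auction_matching([[10, 5, 8], [7, 9, 3], [6, 4, 12]])` returns, for each row (person), the index of the column (object) it is matched to. Starting with a large ε lets prices rise quickly in early phases, while the small final ε restores accuracy, which is the general idea behind ε-scaling.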

Databases and Social Networks | 2011

Social-data storage-systems

Nicolas Ruflin; Helmar Burkhart; Sven Rizzotti

The amount of social data produced by a wide variety of social platforms grows every day. Storing and querying this huge amount of data in almost real time is a challenge for storage systems, which must scale up to hundreds or thousands of nodes. These systems also have to handle the graph structure of social data and the diverse and changing structure of every single node. In this paper, we describe five storage system types on the basis of eight current open source storage system solutions in order to analyze their application potential.

Archive | 1994

Steps Towards Reusability and Portability in Parallel Programming

Helmar Burkhart; Stephan Gutzwiller

Skeleton-oriented programming is a new technique that aims towards reusability of software components in massively parallel systems. Carefully tested and efficiently implemented coordination schemes and data distributions are collected in a library of algorithmic skeletons. Programmers inspect the library, access the appropriate element and fill in the application-dependent parts. Our approach has several benefits, such as improved portability, reusability, and correctness of software.
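The idea of filling application-dependent parts into a prepackaged coordination scheme can be sketched as follows. This is only an illustration of the general skeleton concept using Python's standard library, not the authors' skeleton library; the name `farm_reduce` and its parameters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def farm_reduce(worker, combine, items, initial, max_workers=4):
    """A hypothetical 'farm followed by reduce' skeleton.

    The skeleton owns the coordination (distributing items to workers and
    collecting results in order); the caller supplies only the
    application-dependent `worker` and `combine` functions.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(worker, items))
    return reduce(combine, partials, initial)


# Application-dependent parts filled in by the programmer:
sum_of_squares = farm_reduce(lambda x: x * x, lambda a, b: a + b, range(5), 0)
```

Because the coordination logic lives in one tested place, the same user functions could in principle be run by a different implementation of the skeleton without changing application code, which is the portability and reusability argument made above.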

International Conference on Advanced Learning Technologies | 2006

Problem-Based Learning Using Mobile Devices

Christian Wattinger; Duc Phuong Nguyen; Peter Fornaro; M. Guggisberg; T. Gyalog; Helmar Burkhart

Small handheld devices such as PDAs and smartphones are becoming more and more popular. Thus, we are seeing a growing number of projects using mobile devices for educational purposes. Due to the limitations of screen size, CPU performance, and memory size, software development for mobile devices is challenging. A more critical issue is the lack of pedagogic and didactic concepts for the use of mobile devices in education. In this paper, we present a system using mobile devices which supports research experiences for students in a laboratory context. The system allows students to efficiently monitor and control their scientific experiments at any time, from anywhere. We describe several usage scenarios in the area of nanoscience studies, where the remote control of microscopes is mandatory. We present our system architecture and describe the implementation based on open-source software.