Sven H. M. Buijssen
Technical University of Dortmund
Publications
Featured research published by Sven H. M. Buijssen.
Parallel Computing | 2007
Dominik Göddeke; Robert Strzodka; Jamaludin Mohd-Yusof; Patrick S. McCormick; Sven H. M. Buijssen; Matthias Grajewski; Stefan Turek
The first part of this paper surveys co-processor approaches for commodity-based clusters in general, not only with respect to raw performance but also in view of their system integration and power consumption. We then extend previous work on a small GPU cluster by exploring the heterogeneous hardware approach for a large-scale system with up to 160 nodes. Starting from a conventional commodity-based cluster, we leverage the high bandwidth of graphics processing units (GPUs) to increase the overall system bandwidth, which is the decisive performance factor in this scenario. Thus, even the addition of low-end, out-of-date GPUs leads to improvements in both performance- and power-related metrics.
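For a bandwidth-bound code, runtime is essentially the amount of data moved divided by the sustained memory bandwidth. The following back-of-envelope C++ sketch illustrates why raising the system bandwidth pays off directly; the problem size and bandwidth figures are assumptions chosen for illustration, not measurements from the paper.

#include <cstdio>

int main() {
    // All figures below are placeholder assumptions for illustration only.
    const double n           = 1.0e8;     // number of unknowns
    const double bytes_per_n = 3.0 * 8.0; // daxpy-like update: read x, read y, write y (doubles)
    const double bw_cpu_gbs  = 10.0;      // assumed sustained host memory bandwidth (GB/s)
    const double bw_gpu_gbs  = 50.0;      // assumed sustained GPU memory bandwidth (GB/s)

    const double bytes = n * bytes_per_n;
    std::printf("CPU estimate: %.3f s\n", bytes / (bw_cpu_gbs * 1.0e9));
    std::printf("GPU estimate: %.3f s\n", bytes / (bw_gpu_gbs * 1.0e9));
    return 0;
}

Under these assumed figures the GPU variant finishes five times sooner simply because it moves the same data over a wider memory bus, which is the effect the abstract identifies as decisive.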
International Conference on High Performance Computing and Simulation | 2009
Dominik Göddeke; Sven H. M. Buijssen; Hilmar Wobker; Stefan Turek
We have previously suggested a minimally invasive approach to include hardware accelerators in an existing large-scale parallel finite element PDE solver toolkit and implemented it in our software FEAST. Our concept has the important advantage that applications built on top of FEAST benefit from the acceleration immediately, without changes to application code. In this paper we explore the limitations of our approach by accelerating a Navier-Stokes solver. This nonlinear saddle point problem is much more involved than our previous tests and does not exhibit an equally favourable acceleration potential: not all computational work is concentrated inside the linear solver. Nonetheless, we are able to achieve speedups of more than a factor of two on a small GPU-enhanced cluster. We conclude with a discussion of how our concept can be altered to improve the acceleration further.
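The limited acceleration potential can be made concrete with a simple Amdahl-type estimate (an illustration with assumed numbers, not the paper's model or measurements): if only the fraction f of the runtime spent inside the linear solver is accelerated by a local factor s, the total speedup is bounded by 1/(1-f).

#include <cstdio>

// Total speedup when only a fraction f of the runtime (here: the linear solver)
// is accelerated by a local factor s.
double total_speedup(double f, double s) {
    return 1.0 / ((1.0 - f) + f / s);
}

int main() {
    const double f = 0.66;  // assumed fraction of runtime inside the accelerated solver
    const double s = 10.0;  // assumed local speedup of that part
    std::printf("total speedup: %.2fx, upper bound 1/(1-f) = %.2fx\n",
                total_speedup(f, s), 1.0 / (1.0 - f));
    return 0;
}

With these assumed values the estimate lands in the factor-two range reported above, and the bound shows that further gains require moving work outside the linear solver onto the accelerator as well.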
Archive | 2008
Chr. Becker; Sven H. M. Buijssen; Stefan Turek
Modern processors reach their performance speedup not merely by increasing clock frequency, but to a greater extent by fundamental changes and extensions of the processor architecture itself. These extensions require the application developer to adapt programming techniques in order to exploit the existing performance potential. Otherwise the situation may arise that the processor becomes nominally faster, but the application does not run faster [3, 4]. A limiting factor for computations is memory access. There is an ever-increasing discrepancy between CPU cycle time and main storage access time, and fetching data is expensive in terms of CPU idle time. To narrow this gap, a fast temporary storage between CPU and main storage was introduced, the so-called cache. The basic idea of a cache is to store data according to the locality-of-reference principle: latency is reduced if a subsequently requested datum is found in the faster cache instead of having to be transferred from slow main storage. Given sufficient locality of the data, i.e. if the data of preceding accesses is still cached, the number of accesses served by the cache will exceed those going to slow main storage, and throughput can be increased significantly. Memory access does not become faster automatically for arbitrary access patterns, but only if the program mainly uses data that is already in the cache. This requires appropriate adjustments to be made to the applications [2].
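The locality-of-reference principle can be illustrated with a standard example (generic C++, not taken from the text above): two loops performing identical arithmetic over a row-major array differ drastically in cache behaviour depending on the traversal order.

#include <cstdio>
#include <cstddef>
#include <vector>

// Row-wise traversal of a row-major array: consecutive accesses hit the same
// cache line, so each fetched line is fully used before eviction.
double sum_row_major(const std::vector<double>& a, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            s += a[i * n + j];
    return s;
}

// Column-wise traversal of the same array: each access jumps n elements ahead,
// so almost every access misses the cache and goes to main storage.
double sum_col_major(const std::vector<double>& a, std::size_t n) {
    double s = 0.0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            s += a[i * n + j];
    return s;
}

int main() {
    const std::size_t n = 2048;  // assumed size, large enough to exceed the cache
    const std::vector<double> a(n * n, 1.0);
    std::printf("%f %f\n", sum_row_major(a, n), sum_col_major(a, n));
    return 0;
}

In the second variant the large stride defeats the cache, so throughput is limited by main-storage access; this is exactly the gap between cache and main storage that the abstract describes.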
Archive | 2005
Sven H. M. Buijssen; Stefan Turek
Parallel multigrid methods are among the most prominent tools for solving huge systems of (non-)linear equations arising from the discretisation of PDEs, for instance in Computational Fluid Dynamics (CFD). However, the quality of (parallel) multigrid methods with regard to numerical and computational complexity stands and falls with the smoothing algorithms (“smoothers”) used. Since the inherently recursive character of many global smoothers (SOR, ILU) often impedes a direct parallelisation, the application of block smoothers is an alternative. However, because the recursive character is weakened, the total numerical efficiency, and with it the overall efficiency, may decrease in comparison with the sequential version. In this paper, we show the consequences of such a strategy for the resulting total efficiency when it is incorporated into the parallel CFD solver parpp3d++ for 3D incompressible flow on the Hitachi SR8000-F1. Moreover, we analyse the losses of parallel efficiency caused by communication costs and by reduced numerical efficiency on several modern parallel computer platforms.
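The block-smoother idea can be sketched with a generic block Jacobi smoother (an illustration only, not the smoother used in parpp3d++): each block is smoothed independently, with the couplings across block boundaries frozen at the previous iterate.

#include <algorithm>
#include <cstddef>
#include <vector>

// One block-Jacobi smoothing step for the 1D Poisson stencil (-1, 2, -1) with
// right-hand side b: each block runs a local Gauss-Seidel sweep and uses the
// previous iterate for couplings that cross block boundaries, so all blocks are
// independent of each other and can be processed in parallel.
void block_jacobi_smooth(std::vector<double>& x, const std::vector<double>& b,
                         std::size_t block_size) {
    const std::vector<double> x_old = x;  // frozen values for inter-block couplings
    const std::size_t n = x.size();
    for (std::size_t start = 0; start < n; start += block_size) {
        const std::size_t end = std::min(start + block_size, n);
        for (std::size_t i = start; i < end; ++i) {
            const double left  = (i == 0)     ? 0.0 : (i == start   ? x_old[i - 1] : x[i - 1]);
            const double right = (i + 1 == n) ? 0.0 : (i + 1 == end ? x_old[i + 1] : x[i + 1]);
            x[i] = 0.5 * (b[i] + left + right);
        }
    }
}

int main() {
    std::vector<double> x(64, 0.0), b(64, 1.0);  // assumed sizes for illustration
    for (int sweep = 0; sweep < 3; ++sweep)
        block_jacobi_smooth(x, b, 16);
    return 0;
}

Freezing the cross-block couplings is what removes the global recursion of smoothers such as SOR or ILU and, at the same time, what weakens the numerical efficiency discussed in the abstract.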
International Supercomputing Conference | 2010
Stefan Turek; Dominik Göddeke; Christian Becker; Sven H. M. Buijssen; Hilmar Wobker
Chemical Engineering & Technology | 2005
Carsten Schmitt; David W. Agar; Frank Platte; Sven H. M. Buijssen; Beate Pawlowski; Matthias Duisberg
Archive | 2011
Stefan Turek; Dominik Göddeke; Sven H. M. Buijssen; Hilmar Wobker
European Conference on Parallel Processing | 2002
Sven H. M. Buijssen; Stefan Turek
Archive | 2011
Stefan Turek; Dominik Göddeke; Christian Becker; Sven H. M. Buijssen; Hilmar Wobker