
Publication


Featured research published by Ulrich Rüde.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2013

Multiphysics simulations: Challenges and opportunities

David E. Keyes; Lois Curfman McInnes; Carol S. Woodward; William Gropp; Eric Myra; Michael Pernice; John B. Bell; Jed Brown; Alain Clo; Jeffrey M. Connors; Emil M. Constantinescu; Donald Estep; Kate Evans; Charbel Farhat; Ammar Hakim; Glenn E. Hammond; Glen A. Hansen; Judith C. Hill; Tobin Isaac; Kirk E. Jordan; Dinesh K. Kaushik; Efthimios Kaxiras; Alice Koniges; Kihwan Lee; Aaron Lott; Qiming Lu; John Harold Magerlein; Reed M. Maxwell; Michael McCourt; Miriam Mehl

We consider multiphysics applications from algorithmic and architectural perspectives, where “algorithmic” includes both mathematical analysis and computational complexity, and “architectural” includes both software and hardware environments. Many diverse multiphysics applications can be reduced, en route to their computational simulation, to a common algebraic coupling paradigm. Mathematical analysis of multiphysics coupling in this form is not always practical for realistic applications, but model problems representative of applications discussed herein can provide insight. A variety of software frameworks for multiphysics applications have been constructed and refined within disciplinary communities and executed on leading-edge computer systems. We examine several of these, expose some commonalities among them, and attempt to extrapolate best practices to future systems. From our study, we summarize challenges and forecast opportunities.
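The "common algebraic coupling paradigm" can be illustrated, assuming for simplicity a two-field coupling, as a pair of residual equations with a block Jacobian; this is a generic sketch for orientation, not the specific formulation used in the paper:

\[
F_1(u_1, u_2) = 0, \qquad F_2(u_1, u_2) = 0,
\qquad
J = \begin{pmatrix}
\partial F_1/\partial u_1 & \partial F_1/\partial u_2 \\
\partial F_2/\partial u_1 & \partial F_2/\partial u_2
\end{pmatrix}.
\]

Operator-split schemes solve the diagonal blocks alternately, while fully coupled (Newton-type) schemes work with the complete Jacobian; the trade-off between the two approaches is a recurring theme in multiphysics coupling.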


Parallel Processing Letters | 2003

Optimization and Profiling of the Cache Performance of Parallel Lattice Boltzmann Codes

Thomas Pohl; Markus Kowarschik; Jens Wilke; Klaus Iglberger; Ulrich Rüde

When designing and implementing highly efficient scientific applications for parallel computers such as clusters of workstations, it is essential to consider and to optimize the single-CPU performance of the codes. For this purpose, it is particularly important that the codes respect the hierarchical memory designs that computer architects employ in order to hide the effects of the growing gap between CPU performance and main memory speed. In this article, we present techniques to enhance the single-CPU efficiency of lattice Boltzmann methods, which are commonly used in computational fluid dynamics. We show various performance results for both 2D and 3D codes in order to emphasize the effectiveness of our optimization techniques.
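One representative cache optimization for such stencil-like sweeps is spatial loop blocking, where the grid is traversed in tiles small enough to stay resident in cache. The sketch below illustrates the idea in C++ on a toy 2D five-point update; the grid size, block size, and update rule are assumptions for illustration, not the actual kernels from the paper.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    const std::size_t nx = 1024, ny = 1024;   // assumed grid dimensions
    const std::size_t bx = 64, by = 64;       // block sizes chosen to fit in cache
    std::vector<double> src(nx * ny, 1.0), dst(nx * ny, 0.0);

    // Traverse the grid block by block so that each block's working set
    // stays in cache while it is updated.
    for (std::size_t jb = 1; jb + 1 < ny; jb += by) {
        for (std::size_t ib = 1; ib + 1 < nx; ib += bx) {
            const std::size_t jend = std::min(jb + by, ny - 1);
            const std::size_t iend = std::min(ib + bx, nx - 1);
            for (std::size_t j = jb; j < jend; ++j) {
                for (std::size_t i = ib; i < iend; ++i) {
                    // Toy 5-point average as a stand-in for a collide-stream step.
                    dst[j * nx + i] = 0.25 * (src[j * nx + i - 1] + src[j * nx + i + 1] +
                                              src[(j - 1) * nx + i] + src[(j + 1) * nx + i]);
                }
            }
        }
    }
    std::cout << "center value: " << dst[(ny / 2) * nx + nx / 2] << "\n";
}
```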


Journal of Computational Science | 2011

WaLBerla: HPC software design for computational engineering simulations

Christian Feichtinger; Stefan Donath; Harald Köstler; Jan Götz; Ulrich Rüde

WaLBerla (Widely applicable Lattice-Boltzmann from Erlangen) is a massively parallel software framework supporting a wide range of physical phenomena. This article describes the software designs realizing the major goal of the framework, a good balance between expandability and scalable, highly optimized, hardware-dependent, special purpose kernels. To demonstrate our designs, we discuss the coupling of our Lattice-Boltzmann fluid flow solver and a method for fluid structure interaction. Additionally, we show a software design for heterogeneous computations on GPU and CPU utilizing optimized kernels. Finally, we estimate the software quality of the framework on the basis of software quality factors.
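The balance between expandability and specialized kernels is commonly achieved by hiding hardware-specific kernels behind a generic per-time-step interface. The C++ sketch below shows that general idea only; the class and function names are illustrative assumptions and do not reflect the actual waLBerla interfaces.

```cpp
#include <iostream>
#include <memory>
#include <vector>

// Generic interface every kernel implements, keeping the framework expandable.
struct Sweep {
    virtual ~Sweep() = default;
    virtual void run() = 0;
};

// Specialized, hand-optimized kernels hide SIMD or GPU details behind the interface.
struct OptimizedCpuLbmSweep : Sweep {
    void run() override { std::cout << "optimized CPU LBM kernel\n"; }
};

struct GpuLbmSweep : Sweep {
    void run() override { std::cout << "GPU LBM kernel\n"; }
};

int main() {
    std::vector<std::unique_ptr<Sweep>> timestep;
    timestep.push_back(std::make_unique<OptimizedCpuLbmSweep>());
    timestep.push_back(std::make_unique<GpuLbmSweep>());
    for (auto& s : timestep) s->run();   // the framework iterates over the sweeps each time step
}
```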


Computing in Science and Engineering | 2006

A Massively Parallel Multigrid Method for Finite Elements

Benjamin Karl Bergen; Tobias Gradl; Ulrich Rüde; Frank Hülsemann

The hierarchical hybrid grid framework supports the parallel implementation of multigrid solvers for finite element problems. Specifically, it generates extremely fine meshes by using a structured refinement of an unstructured base mesh. For special problems with piecewise uniform material parameters, this leads to the possibility of stencil-based operations, which save substantial memory and permit a very efficient implementation of the multigrid method.
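For reference, a multigrid V-cycle is built from the standard two-grid correction sketched below (a textbook formulation, not specific to the hierarchical hybrid grid framework), applied recursively to the coarse-grid solve:

\[
u_h \leftarrow S_h^{\nu_1} u_h, \qquad
r_h = f_h - A_h u_h, \qquad
e_H = A_H^{-1} R\, r_h, \qquad
u_h \leftarrow u_h + P\, e_H, \qquad
u_h \leftarrow S_h^{\nu_2} u_h,
\]

where $S_h$ is the smoother, $R$ the restriction, $P$ the prolongation, and $\nu_1$, $\nu_2$ the numbers of pre- and post-smoothing steps. In the stencil-based setting mentioned above, $A_h$ is applied via fixed stencils rather than an assembled sparse matrix.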


Archive | 2006

Parallel Geometric Multigrid

Frank Hülsemann; Markus Kowarschik; Marcus Mohr; Ulrich Rüde

Multigrid methods are among the fastest numerical algorithms for the solution of large sparse systems of linear equations. While these algorithms exhibit asymptotically optimal computational complexity, their efficient parallelisation is hampered by the poor computation-to-communication ratio on the coarse grids. Our contribution discusses parallelisation techniques for geometric multigrid methods. It covers both theoretical approaches and practical implementation issues that may guide code development.
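The coarse-grid bottleneck can be made concrete with a rough surface-to-volume estimate (a standard back-of-the-envelope argument, not taken from the chapter): if each process owns a cubic subdomain with $m_\ell$ points per side on level $\ell$ of a $d$-dimensional grid hierarchy, then

\[
\frac{\text{computation per smoothing step}}{\text{communication volume}}
\;\sim\; \frac{m_\ell^{\,d}}{m_\ell^{\,d-1}} \;=\; m_\ell,
\qquad m_\ell \approx \frac{m_0}{2^{\ell}},
\]

so the ratio shrinks geometrically with each coarsening step, and on the coarsest levels communication latency dominates the cost of a cycle.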


Archive | 2006

Parallel Lattice Boltzmann Methods for CFD Applications

Carolin Körner; Thomas Pohl; Ulrich Rüde; Nils Thürey; Thomas Zeiser

The lattice Boltzmann method (LBM) has evolved into a promising alternative to the well-established methods based on finite elements/volumes for computational fluid dynamics simulations. Ease of implementation, extensibility, and computational efficiency are the major reasons for LBM’s growing field of application and increasing popularity. In this paper we give a brief introduction to the involved theory and equations for LBM, present various techniques to increase the single-CPU performance, outline the parallelization of a standard LBM implementation, and show performance results. In order to demonstrate the straightforward extensibility of LBM, we then focus on an application in material science involving fluid flows with free surfaces. We discuss the required extensions to handle this complex scenario, and the impact on the parallelization technique.
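For context, the single-relaxation-time (BGK) lattice Boltzmann update commonly takes the form below (a standard textbook statement; the paper may use a different collision operator or notation):

\[
f_i(\mathbf{x} + \mathbf{c}_i \Delta t,\; t + \Delta t)
= f_i(\mathbf{x}, t) - \frac{1}{\tau}\left[ f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t) \right],
\qquad
\rho = \sum_i f_i, \quad \rho\,\mathbf{u} = \sum_i \mathbf{c}_i f_i,
\]

where the $f_i$ are particle distribution functions along the discrete lattice velocities $\mathbf{c}_i$, $\tau$ is the dimensionless relaxation time, and $f_i^{\mathrm{eq}}$ is the local equilibrium distribution. Each time step is a local collision followed by streaming to neighbouring cells, which is what makes the method easy to implement and to parallelize.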


Symposium on Computer Animation | 2006

Animation of open water phenomena with coupled shallow water and free surface simulations

Nils Thürey; Ulrich Rüde; Marc Stamminger

The goal of this paper is to perform simulations that capture fluid effects from small drops up to the propagation of large waves. To achieve this, we present a hybrid simulation method that couples a two-dimensional shallow water simulation with a full three-dimensional free surface fluid simulation. We explain the approximations imposed by the shallow water model, and how to parametrize it according to the parameters of a 3D simulation. Each simulation is used to initialize double layered boundary conditions for the other one. The area covered by the 2D region can be an order of magnitude larger than the 3D region without significantly affecting the overall computation time. The 3D region can furthermore be easily moved within the 2D region during the course of the simulation. To achieve realistic results, we combine our simulation method with a physically based model to generate and animate drops. For their generation we make use of the fluid turbulence model, and animate them with a simplified drag calculation. This allows simulations with relatively low resolutions.
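The two-dimensional part of such a coupling is typically based on the depth-averaged shallow water equations; one standard conservative form, with bed slope and friction source terms omitted, is given here for orientation (not as the exact model of the paper):

\[
\partial_t h + \partial_x (hu) + \partial_y (hv) = 0,
\]
\[
\partial_t (hu) + \partial_x\!\left(hu^2 + \tfrac{1}{2} g h^2\right) + \partial_y (huv) = 0,
\]
\[
\partial_t (hv) + \partial_x (huv) + \partial_y\!\left(hv^2 + \tfrac{1}{2} g h^2\right) = 0,
\]

where $h$ is the water depth, $(u, v)$ the depth-averaged velocity, and $g$ the gravitational acceleration. The model is valid when the horizontal length scales are much larger than the depth, which is exactly the regime delegated to the 2D solver in the hybrid scheme.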


IEEE International Conference on High Performance Computing, Data, and Analytics | 2013

A framework for hybrid parallel flow simulations with a trillion cells in complex geometries

Christian Godenschwager; Florian Schornbaum; Martin Bauer; Harald Köstler; Ulrich Rüde

waLBerla is a massively parallel software framework for simulating complex flows with the lattice Boltzmann method (LBM). Performance and scalability results are presented for SuperMUC, the world's fastest x86-based supercomputer, ranked number 6 on the Top500 list, and JUQUEEN, a Blue Gene/Q system ranked as number 5. We reach resolutions with more than one trillion cells and perform up to 1.93 trillion cell updates per second using 1.8 million threads. The design and implementation of waLBerla is driven by a careful analysis of the performance on current petascale supercomputers. Our fully distributed data structures and algorithms allow for efficient, massively parallel simulations on these machines. Elaborate node level optimizations and vectorization using SIMD instructions result in highly optimized compute kernels for the single- and two-relaxation-time LBM. Excellent weak and strong scaling is achieved for a complex vascular geometry of the human coronary tree.
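For a sense of scale, dividing the two reported figures gives the per-thread throughput:

\[
\frac{1.93 \times 10^{12}\ \text{cell updates/s}}{1.8 \times 10^{6}\ \text{threads}}
\;\approx\; 1.07 \times 10^{6}\ \text{cell updates per thread and second.}
\]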


Computer Methods in Applied Mechanics and Engineering | 1994

Extrapolation, combination, and sparse grid techniques for elliptic boundary value problems

Hans-Joachim Bungartz; Michael Griebel; Ulrich Rüde

Several variants of extrapolation can be used for the solution of partial differential equations, including Richardson extrapolation, truncation error extrapolation, and extrapolation of the functional. In multi-dimensional problems, multivariate extrapolation exploits asymptotic error expansions in the different mesh parameters. A particularly interesting case is the combination technique, which uses all grids whose mesh spacings have a constant product across the coordinate directions. A related technique is the sparse grid finite element technique, which can be interpreted as a combination extrapolation of the functional.
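In two dimensions the combination technique is usually written as follows (a standard formulation from the sparse grid literature; the notation is assumed here, not copied from the paper): with $u_{i,j}$ denoting the solution on the grid with mesh widths $h_x = 2^{-i}$ and $h_y = 2^{-j}$,

\[
u_n^{c} \;=\; \sum_{i+j=n} u_{i,j} \;-\; \sum_{i+j=n-1} u_{i,j},
\]

which combines the solutions on all anisotropic grids of level sum $n$ (and $n-1$) into an approximation that matches the accuracy of the full grid at a small fraction of its cost. Classical Richardson extrapolation is the one-dimensional analogue: for a method of order $p$, $u \approx \bigl(2^{p} u_{h/2} - u_h\bigr)/\bigl(2^{p} - 1\bigr)$ cancels the leading error term.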


Parallel Computing | 2011

A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters

Christian Feichtinger; Johannes Habich; Harald Köstler; Georg Hager; Ulrich Rüde; Gerhard Wellein

Sustaining a large fraction of single GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. We address this issue in the context of a lattice Boltzmann flow solver that is integrated in the WaLBerla software framework. Our multi-GPU implementation uses a block-structured MPI parallelization and is suitable for load balancing and heterogeneous computations on CPUs and GPUs. The overhead required for multi-GPU simulations is discussed in detail. It is demonstrated that a large fraction of the kernel performance can be sustained for weak scaling on InfiniBand clusters, leading to excellent parallel efficiency. However, in strong scaling scenarios using multiple GPUs is much less efficient than running CPU-only simulations on IBM BG/P and x86-based clusters. Hence, a cost analysis must determine the best course of action for a particular simulation task and hardware configuration. Finally we present weak scaling results of heterogeneous simulations conducted on CPUs and GPUs simultaneously, using clusters equipped with varying node configurations.
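The heterogeneous load balancing described above amounts to giving each device a share of the (equally sized) patches proportional to its sustained throughput. The following C++ sketch illustrates that idea only; the device names, speed ratios, and simple 1D patch numbering are assumptions for illustration, not the waLBerla API or its actual balancing scheme.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

struct Device {
    std::string name;
    double relativeSpeed;   // assumed sustained throughput relative to one CPU node
};

int main() {
    const int numPatches = 96;   // assumed number of equally sized patches
    std::vector<Device> devices = {{"gpu0", 6.0}, {"gpu1", 6.0}, {"cpu", 1.0}};

    double totalSpeed = 0.0;
    for (const auto& d : devices) totalSpeed += d.relativeSpeed;

    // Assign each device a contiguous range of patches proportional to its speed,
    // so that all devices finish a time step at roughly the same time.
    int next = 0;
    for (std::size_t k = 0; k < devices.size(); ++k) {
        int share = static_cast<int>(numPatches * devices[k].relativeSpeed / totalSpeed + 0.5);
        if (k + 1 == devices.size()) share = numPatches - next;   // remainder goes to the last device
        std::cout << devices[k].name << " gets patches [" << next << ", " << next + share << ")\n";
        next += share;
    }
}
```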

Collaboration


Dive into Ulrich Rüde's collaboration network.

Top Co-Authors

Harald Köstler (University of Erlangen-Nuremberg)
Klaus Iglberger (University of Erlangen-Nuremberg)
Björn Gmeiner (University of Erlangen-Nuremberg)
Nils Thürey (University of Erlangen-Nuremberg)
Christian Feichtinger (University of Erlangen-Nuremberg)
Jan Götz (University of Erlangen-Nuremberg)
Markus Kowarschik (University of Erlangen-Nuremberg)
Martin Bauer (University of Erlangen-Nuremberg)
Stefan Donath (University of Erlangen-Nuremberg)
Marcus Mohr (University of Erlangen-Nuremberg)