Thomas Zeiser
University of Erlangen-Nuremberg
Publications
Featured research published by Thomas Zeiser.
Chemical Engineering Science | 2003
Hannsjörg Freund; Thomas Zeiser; Florian Huber; Elias Klemm; Gunther Brenner; Franz Durst; Gerhard Emig
Randomly packed fixed-bed reactors are widely used in the chemical process industries. Their design is usually based on pseudo-homogeneous model equations with averaged semi-empirical parameters. However, this design concept fails for low tube-to-particle diameter ratios (aspect ratios), where local phenomena dominate. The complete three-dimensional (3D) structure of the packing must therefore be considered in order to resolve the local inhomogeneities. New numerical methods and increasing computational power allow us to simulate single-phase reacting flows in such reactors in detail, based exclusively on material properties and a 3D description of the geometry, i.e., without semi-empirical data. The successive simulation steps (packing generation, fluid flow and species calculation) and their validation with experimental data are described in this paper. To synthetically generate realistic random packings of spherical particles, we apply a Monte-Carlo method. The subsequent numerical simulation of the 3D flow field and the coupled mass transport of reacting species is carried out by means of lattice Boltzmann methods. The simulation results reveal that not only the local behaviour but also integral quantities such as the pressure drop depend markedly on the local structural properties of the packing, a feature which is neglected when correlations with averaged values are used.
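For illustration, the following is a minimal C sketch of one common way to generate such a packing: random sequential addition with an overlap test inside a cylindrical tube. The authors' actual Monte-Carlo scheme is more sophisticated; all sizes, names and constants below are illustrative assumptions, not taken from the paper.

```c
/* Minimal sketch: random sequential addition of equal spheres into a
 * cylindrical tube, rejecting placements that overlap earlier spheres.
 * Illustrative only; NOT the authors' Monte-Carlo scheme. */
#include <stdio.h>
#include <stdlib.h>

#define MAX_SPHERES 1000

typedef struct { double x, y, z; } Vec3;

static int overlaps(const Vec3 *s, int n, Vec3 c, double r) {
    for (int i = 0; i < n; ++i) {
        double dx = s[i].x - c.x, dy = s[i].y - c.y, dz = s[i].z - c.z;
        if (dx*dx + dy*dy + dz*dz < 4.0*r*r)  /* centre distance < 2r */
            return 1;
    }
    return 0;
}

int main(void) {
    const double R = 5.0, r = 1.0, H = 20.0; /* tube radius, sphere radius, height */
    Vec3 spheres[MAX_SPHERES];
    int n = 0;
    srand(42);
    for (long trial = 0; trial < 1000000 && n < MAX_SPHERES; ++trial) {
        Vec3 c = {
            ((double)rand()/RAND_MAX * 2.0 - 1.0) * (R - r),
            ((double)rand()/RAND_MAX * 2.0 - 1.0) * (R - r),
            (double)rand()/RAND_MAX * (H - 2.0*r) + r
        };
        if (c.x*c.x + c.y*c.y > (R - r)*(R - r)) continue; /* centre must stay inside wall */
        if (!overlaps(spheres, n, c, r)) spheres[n++] = c;
    }
    printf("placed %d spheres\n", n);
    return 0;
}
```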
Computer Software and Applications Conference | 2009
Gerhard Wellein; Georg Hager; Thomas Zeiser; Markus Wittmann; H. Fehske
We present a pipelined wavefront parallelization approach for stencil-based computations. Within a fixed spatial domain, successive wavefronts are executed by threads scheduled to a multicore processor chip with a shared outer-level cache. By re-using data from cache in the successive wavefronts, this multicore-aware parallelization strategy employs temporal blocking in a simple and efficient way. We use the Jacobi algorithm in three dimensions as a prototype of stencil-based computations and demonstrate the efficiency of our approach on the latest generations of Intel's x86 quad- and hexa-core processors.
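The dependency that makes such wavefronts legal can be shown in a single-threaded C sketch of a 3D 7-point Jacobi: once plane k of step t has been written, plane k-1 of step t+1 reads only data that is still cache-hot. Grid size and names below are illustrative; the paper additionally pipelines these wavefronts across threads sharing a cache.

```c
/* Minimal single-thread sketch of temporal blocking via a wavefront over
 * the k planes of a 3D 7-point Jacobi; two time steps are fused. */
#include <string.h>

#define N 64
#define IDX(i, j, k) ((size_t)(k) * N * N + (size_t)(j) * N + (i))

static void jacobi_plane(double *dst, const double *src, int k) {
    for (int j = 1; j < N - 1; ++j)
        for (int i = 1; i < N - 1; ++i)
            dst[IDX(i, j, k)] = (src[IDX(i - 1, j, k)] + src[IDX(i + 1, j, k)] +
                                 src[IDX(i, j - 1, k)] + src[IDX(i, j + 1, k)] +
                                 src[IDX(i, j, k - 1)] + src[IDX(i, j, k + 1)]) / 6.0;
}

/* Two fused sweeps: once plane k of step t is in 'mid', plane k-1 of
 * step t+1 needs only 'mid' planes k-2..k, which are still in cache. */
void jacobi_fused2(double *out, double *mid, const double *in) {
    memcpy(mid, in, (size_t)N * N * N * sizeof(double)); /* keep boundary planes valid */
    memcpy(out, in, (size_t)N * N * N * sizeof(double));
    for (int k = 1; k < N - 1; ++k) {
        jacobi_plane(mid, in, k);                   /* step t,   plane k   */
        if (k >= 2) jacobi_plane(out, mid, k - 1);  /* step t+1, plane k-1 */
    }
    jacobi_plane(out, mid, N - 2);                  /* last plane of step t+1 */
}
```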
Archive | 2006
Carolin Körner; Thomas Pohl; Ulrich Rüde; Nils Thürey; Thomas Zeiser
The lattice Boltzmann method (LBM) has evolved into a promising alternative to the well-established methods based on finite elements/volumes for computational fluid dynamics simulations. Ease of implementation, extensibility, and computational efficiency are the major reasons for LBM’s growing field of application and increasing popularity. In this paper we give a brief introduction to the theory and equations involved in LBM, present various techniques to increase the single-CPU performance, outline the parallelization of a standard LBM implementation, and show performance results. In order to demonstrate the straightforward extensibility of LBM, we then focus on an application in material science involving fluid flows with free surfaces. We discuss the extensions required to handle this complex scenario, and their impact on the parallelization technique.
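To give a flavour of the core algorithm, here is a minimal two-dimensional (D2Q9) BGK collide-and-stream step in C. The paper's codes are three-dimensional and heavily optimised; the layout and names below are illustrative assumptions only.

```c
/* Minimal D2Q9 BGK lattice Boltzmann update: compute moments, relax
 * towards equilibrium, stream into a second array (periodic domain).
 * fsrc must be initialised to equilibrium values before the first step. */
#include <stddef.h>

#define NX 128
#define NY 128
#define Q  9

static const int    cx[Q] = { 0, 1, 0,-1, 0, 1,-1,-1, 1 };
static const int    cy[Q] = { 0, 0, 1, 0,-1, 1, 1,-1,-1 };
static const double w[Q]  = { 4.0/9, 1.0/9, 1.0/9, 1.0/9, 1.0/9,
                              1.0/36, 1.0/36, 1.0/36, 1.0/36 };

#define F(f, x, y, q) f[((size_t)(q) * NY + (y)) * NX + (x)]  /* SoA layout */

void lbm_step(double *fdst, const double *fsrc, double omega) {
    for (int y = 0; y < NY; ++y) {
        for (int x = 0; x < NX; ++x) {
            double rho = 0.0, ux = 0.0, uy = 0.0;   /* local moments */
            for (int q = 0; q < Q; ++q) {
                double fq = F(fsrc, x, y, q);
                rho += fq; ux += fq * cx[q]; uy += fq * cy[q];
            }
            ux /= rho; uy /= rho;
            double usq = ux*ux + uy*uy;
            for (int q = 0; q < Q; ++q) {           /* BGK collision + streaming */
                double cu  = cx[q]*ux + cy[q]*uy;
                double feq = w[q] * rho * (1.0 + 3.0*cu + 4.5*cu*cu - 1.5*usq);
                double fpost = F(fsrc, x, y, q) - omega * (F(fsrc, x, y, q) - feq);
                int xd = (x + cx[q] + NX) % NX, yd = (y + cy[q] + NY) % NY;
                F(fdst, xd, yd, q) = fpost;
            }
        }
    }
}
```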
Journal of Computational Physics | 2008
Lilit Axner; Joerg M. Bernsdorf; Thomas Zeiser; Peter Lammers; Jan Linxweiler; Alfons G. Hoekstra
We develop a performance prediction model for a parallelized sparse lattice Boltzmann solver and present performance results for simulations of flow in a variety of complex geometries. A special focus is on partitioning and memory/load balancing strategy for geometries with a high solid fraction and/or complex topology such as porous media, fissured rocks and geometries from medical applications. The topology of the lattice nodes representing the fluid fraction of the computational domain is mapped on a graph. Graph decomposition is performed with both multilevel recursive-bisection and multilevel k-way schemes based on modified Kernighan-Lin and Fiduccia-Mattheyses partitioning algorithms. Performance results and optimization strategies are presented for a variety of platforms, showing a parallel efficiency of almost 80% for the largest problem size. A good agreement between the performance model and experimental results is demonstrated.
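The node-to-graph mapping can be sketched as follows: fluid voxels are compacted into node IDs and their face-neighbour links are written in CSR form, the input format expected by graph partitioners such as METIS (whose METIS_PartGraphKway routine implements a multilevel k-way scheme). This is an illustrative reconstruction in C, not the authors' code.

```c
/* Sketch: map the fluid voxels of a 3D mask onto a CSR adjacency graph
 * over fluid nodes only, using 6-point (face) connectivity. */
#include <stdlib.h>

/* mask[idx] != 0 marks a fluid voxel; returns the number of fluid nodes. */
int build_fluid_graph(const unsigned char *mask, int nx, int ny, int nz,
                      int **xadj_out, int **adjncy_out) {
    const int dx[6] = { 1,-1, 0, 0, 0, 0 };
    const int dy[6] = { 0, 0, 1,-1, 0, 0 };
    const int dz[6] = { 0, 0, 0, 0, 1,-1 };
    size_t ncells = (size_t)nx * ny * nz;
    int *id = malloc(ncells * sizeof *id);   /* voxel -> compact node id */
    int nfluid = 0;
    for (size_t c = 0; c < ncells; ++c)
        id[c] = mask[c] ? nfluid++ : -1;

    int *xadj   = malloc(((size_t)nfluid + 1) * sizeof *xadj);
    int *adjncy = malloc((size_t)nfluid * 6 * sizeof *adjncy); /* upper bound */
    int n = 0, e = 0;
    for (int z = 0; z < nz; ++z)
      for (int y = 0; y < ny; ++y)
        for (int x = 0; x < nx; ++x) {
            size_t c = ((size_t)z * ny + y) * nx + x;
            if (!mask[c]) continue;
            xadj[n++] = e;
            for (int d = 0; d < 6; ++d) {
                int X = x + dx[d], Y = y + dy[d], Z = z + dz[d];
                if (X < 0 || X >= nx || Y < 0 || Y >= ny || Z < 0 || Z >= nz)
                    continue;
                size_t nb = ((size_t)Z * ny + Y) * nx + X;
                if (mask[nb]) adjncy[e++] = id[nb];
            }
        }
    xadj[n] = e;
    free(id);
    *xadj_out = xadj; *adjncy_out = adjncy;
    return nfluid;
}
```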
Philosophical Transactions of the Royal Society A | 2002
Thomas Zeiser; Martin Steven; Hannsjörg Freund; Peter Lammers; Gunther Brenner; Franz Durst; Jörg Bernsdorf
The pressure drop of technical devices is a crucial property for their design and operation. In this paper, we show how the results of lattice Boltzmann simulations can be used in science and engineering to improve the physical understanding of the pressure drop and the flow inhomogeneities in porous media, especially in sphere-packed fixed-bed reactors with low aspect ratios. Commonly used pressure drop correlations are based on simplified assumptions such as the capillary or tortuosity model, which do not reflect all hydrodynamic effects. Consequently, empirical correlations for certain classes of media have been introduced in the past to bridge the gap between the models and the experimental findings. As is shown in this paper by the detailed analysis of the velocity field in the void space of packed beds, the pressure drop is due to more complex hydrodynamics than considered in the above-mentioned models. With the help of lattice Boltzmann simulations, we were able to analyse the different contributions to the total dissipation, namely shear and deformation of the fluid, for different geometries over a wide range of Reynolds numbers. We further show that the actual length of the flow paths changes considerably with the radial and circumferential position.
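As a concrete reference point for the "commonly used pressure drop correlations" mentioned above, the classic Ergun equation predicts the pressure gradient from averaged quantities alone. The small C sketch below states it explicitly; the numerical values are illustrative and not taken from the paper.

```c
/* Ergun correlation for packed-bed pressure drop per unit length:
 * dp/L = 150*mu*(1-eps)^2*u / (eps^3*d^2) + 1.75*rho*(1-eps)*u^2 / (eps^3*d)
 * u: superficial velocity, eps: void fraction, d: particle diameter. */
#include <stdio.h>

double ergun_dp_per_length(double u, double eps, double d,
                           double mu, double rho) {
    double one_eps = 1.0 - eps, e3 = eps * eps * eps;
    return 150.0 * mu * one_eps * one_eps * u / (e3 * d * d)
         + 1.75 * rho * one_eps * u * u / (e3 * d);
}

int main(void) {
    /* illustrative values: air at room conditions, 5 mm spheres */
    double dp = ergun_dp_per_length(1.0 /* m/s */, 0.4, 5e-3,
                                    1.8e-5 /* Pa s */, 1.2 /* kg/m^3 */);
    printf("Ergun pressure gradient: %.1f Pa/m\n", dp);
    return 0;
}
```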
International Parallel and Distributed Processing Symposium | 2008
Georg Hager; Thomas Zeiser; Gerhard Wellein
Processor and system architectures that feature multiple memory controllers are prone to show bottlenecks and erratic performance numbers on codes with regular access patterns. Although such effects are well known in the form of cache thrashing and aliasing conflicts, they become more severe when main memory access is involved. Using the new Sun UltraSPARC T2 processor as a prototypical multi-core design, we analyze performance patterns in low-level and application benchmarks and show ways to circumvent bottlenecks by careful data layout and padding.
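The padding remedy can be shown in a few lines of C: with a power-of-two leading dimension, every element of a column maps to the same small group of cache sets (and memory banks), so column traversals thrash; a small pad breaks the pattern. Sizes below are illustrative.

```c
/* Sketch of array padding against cache thrashing and bank conflicts. */
#define N   1024            /* power-of-two leading dimension: conflict-prone */
#define PAD 16              /* a few extra elements break the aliasing        */

static double a[N][N + PAD]; /* padded: column walks now spread over sets */

double column_sum(int col) {
    double s = 0.0;
    for (int row = 0; row < N; ++row)
        s += a[row][col];    /* stride is N+PAD doubles instead of N */
    return s;
}
```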
Progress in Computational Fluid Dynamics | 2008
Thomas Zeiser; Gerhard Wellein; A. Nitsure; Klaus Iglberger; Ulrich Rüde; Georg Hager
In this report we propose a parallel cache-oblivious spatial and temporal blocking algorithm for the lattice Boltzmann method in three spatial dimensions. The algorithm was originally proposed by Frigo et al. (1999) and divides the space-time domain of stencil-based methods in an optimal way, independently of any external parameters such as the cache size. In view of the increasing gap between processor speed and memory performance, this approach offers a promising path to increased cache utilisation. We find that even a straightforward cache-oblivious implementation can reduce memory traffic by at least a factor of two compared with a highly optimised standard kernel, and improves scalability for shared-memory parallelisation. Due to the recursive structure of the algorithm we use an unconventional parallelisation scheme based on task queuing.
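The recursion is easiest to see in one spatial dimension. The C sketch below follows the published Frigo/Strumpen trapezoid decomposition (the paper applies the idea to 3D LBM): wide trapezoids are cut in space, tall ones in time. The 3-point Jacobi kernel and fixed boundaries are illustrative assumptions.

```c
/* Cache-oblivious space-time recursion (Frigo/Strumpen style) for a
 * 1D 3-point Jacobi with fixed (Dirichlet) boundaries at x=0 and x=NX-1.
 * Initialise BOTH time-level buffers so the boundary values exist in each;
 * then walk1(1, T+1, 1, 0, NX-1, 0) performs T sweeps over the grid. */
#define NX 1024
static double u[2][NX];        /* two time levels, selected by t parity */

static void kernel(int t, int x) {
    const double *src = u[(t - 1) & 1];
    u[t & 1][x] = (src[x - 1] + src[x] + src[x + 1]) / 3.0;
}

/* walk the trapezoid {t0<=t<t1, x0+dx0*(t-t0) <= x < x1+dx1*(t-t0)} */
static void walk1(int t0, int t1, int x0, int dx0, int x1, int dx1) {
    int dt = t1 - t0;
    if (dt == 1) {
        for (int x = x0; x < x1; ++x) kernel(t0, x);
    } else if (dt > 1) {
        if (2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * dt) {   /* wide: space cut */
            int xm = (2 * (x0 + x1) + (2 + dx0 + dx1) * dt) / 4;
            walk1(t0, t1, x0, dx0, xm, -1);
            walk1(t0, t1, xm, -1, x1, dx1);
        } else {                                            /* tall: time cut */
            int s = dt / 2;
            walk1(t0, t0 + s, x0, dx0, x1, dx1);
            walk1(t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1);
        }
    }
}
```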
Advances in Engineering Software | 2011
Johannes Habich; Thomas Zeiser; Georg Hager; Gerhard Wellein
This paper presents implementation strategies and optimization approaches for a D3Q19 lattice Boltzmann flow solver on NVIDIA graphics processing units (GPUs). Using the STREAM benchmarks we demonstrate the GPU parallelization approach and obtain an upper limit for the flow solver performance. We discuss the GPU-specific implementation of the solver with a focus on memory alignment and register shortage. The optimized code is up to an order of magnitude faster than standard two-socket x86 servers with AMD Barcelona or Intel Nehalem CPUs. We further analyze data transfer rates for the PCI Express bus to evaluate the potential benefits of multi-GPU parallelism in a cluster environment.
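The memory-alignment concern typically translates into a padded structure-of-arrays layout, so that consecutive threads touch consecutive, aligned addresses. The C sketch below shows the indexing idea only; the paper's actual code is CUDA, and all names and the padding granularity here are illustrative assumptions.

```c
/* Padded structure-of-arrays (SoA) layout for D3Q19 distributions, the
 * usual choice for coalesced GPU (and SIMD) memory access. */
#include <stddef.h>

#define Q     19             /* D3Q19 discrete velocities              */
#define ALIGN 32             /* pad each x-line to a multiple of this  */

typedef struct {
    int nx, ny, nz;
    size_t nxp;              /* padded x extent                        */
    double *f;               /* Q * nz * ny * nxp values               */
} Lattice;

static size_t padded_x(int nx) {
    return ((size_t)nx + ALIGN - 1) / ALIGN * ALIGN;
}

static size_t f_index(const Lattice *L, int q, int x, int y, int z) {
    /* q varies slowest: all values of one direction are contiguous, so a
     * warp/vector reading one direction for consecutive x is coalesced. */
    return (((size_t)q * L->nz + z) * L->ny + y) * L->nxp + x;
}
```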
Parallel Processing Letters | 2009
Thomas Zeiser; Georg Hager; Gerhard Wellein
Classic vector systems have all but vanished from recent TOP500 lists. Looking at the recently introduced NEC SX-9 series, we benchmark its memory subsystem using the low-level vector triad and employ the kernel of an advanced lattice Boltzmann flow solver to demonstrate that classic vector systems still combine excellent performance with a well-established optimization approach. To investigate the multi-node performance, the flow field in a real porous medium is simulated using the hybrid MPI/OpenMP parallel ILBDC lattice Boltzmann application code. Results for a commodity Intel Nehalem-based cluster are provided for comparison. Clusters can keep up with the vector systems; however, they require massive parallelism and thus much more effort to provide a good domain decomposition.
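The low-level vector triad is the kernel A(i) = B(i) + C(i)*D(i), a standard memory-bandwidth probe. Below is a minimal C version with naive timing; array size and repetition count are illustrative.

```c
/* Vector triad micro-benchmark: a[i] = b[i] + c[i] * d[i]. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    const size_t n = 1 << 24;   /* illustrative size, well beyond cache */
    const int reps = 10;
    double *a = malloc(n * sizeof *a), *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c), *d = malloc(n * sizeof *d);
    for (size_t i = 0; i < n; ++i) { b[i] = 1.0; c[i] = 2.0; d[i] = 0.5; }

    clock_t t0 = clock();
    for (int rep = 0; rep < reps; ++rep)
        for (size_t i = 0; i < n; ++i)
            a[i] = b[i] + c[i] * d[i];          /* the triad kernel */
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    /* 2 flops per iteration; 4 words moved (3 loads + 1 store) */
    printf("%.2f MFlop/s, %.2f GB/s (check: %f)\n",
           2.0 * reps * n / secs / 1e6,
           4.0 * sizeof(double) * reps * n / secs / 1e9,
           a[n / 2]);                           /* keeps the loop live */
    free(a); free(b); free(c); free(d);
    return 0;
}
```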
Parallel Computational Fluid Dynamics 2005: Theory and Applications | 2006
Gerhard Wellein; Peter Lammers; Georg Hager; Stefan Donath; Thomas Zeiser
The chapter discusses the performance of the lattice Boltzmann method (LBM) on commodity “off-the-shelf” clusters with Intel Xeon processors, tailored HPC systems, and a NEC SX8 vector system. The chapter describes the main architectural differences and comments on single-processor performance as well as optimization strategies. The parallel performance of a large-scale simulation running on up to 2000 processors, providing 2 TFlop/s of sustained performance, is evaluated and presented in the chapter. In the past decade, the LBM has been established as an alternative for the numerical simulation of incompressible flows. One major reason for the success of LBM is the simplicity of its core algorithm, which allows both easy adaptation to complex application scenarios and extension to additional physical or chemical effects. Because LBM is a direct method, the use of extensive computer resources is often mandatory. Thus, LBM has attracted a lot of attention in the high-performance computing (HPC) community. An important feature of many LBM codes is that the core algorithm can be reduced to a few manageable subroutines, facilitating deep performance analysis followed by precise code and data layout optimization.
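That "few manageable subroutines" structure can be sketched as a skeleton time loop in C. The stub bodies and names below are illustrative placeholders, not the actual code discussed in the chapter.

```c
/* Skeleton of a typical LBM solver: a short time loop over a handful of
 * subroutines, each of which can be profiled and tuned in isolation. */
typedef struct { double *f, *ftmp; /* distributions, double-buffered */ } Lattice;

static void collide(Lattice *lat)    { (void)lat; /* node-local BGK/MRT relaxation   */ }
static void stream(Lattice *lat)     { (void)lat; /* shift along the Q directions;
                                                     memory-bound, the tuning target */ }
static void boundaries(Lattice *lat) { (void)lat; /* bounce-back, in-/outflow        */ }

static void swap_buffers(Lattice *lat) {
    double *t = lat->f; lat->f = lat->ftmp; lat->ftmp = t;
}

void run(Lattice *lat, int nsteps) {
    for (int t = 0; t < nsteps; ++t) {
        collide(lat);
        stream(lat);
        boundaries(lat);
        swap_buffers(lat);   /* double buffering between time steps */
    }
}
```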