Christian Obrecht
University of Lyon
Publications
Featured research published by Christian Obrecht.
Computers & Mathematics With Applications | 2010
Frédéric Kuznik; Christian Obrecht; Gilles Rusaouen; Jean-Jacques Roux
Graphics Processing Units (GPUs), originally developed for computer games, now provide computational power for scientific applications. In this paper, we develop a general purpose Lattice Boltzmann code that runs entirely on a single GPU. The results show that: (1) single precision floating point arithmetic is sufficient for LBM computation in comparison to double precision; (2) the implementation of LBM on GPUs allows us to achieve up to about one billion lattice updates per second using single precision floating point; (3) GPUs provide an inexpensive alternative to large clusters for fluid dynamics prediction.
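The collide-and-stream structure that makes the LBM so amenable to GPUs can be illustrated with a minimal sketch. This is not the paper's CUDA code: it is a plain NumPy version of a standard D2Q9 BGK update on a periodic domain, written for clarity rather than speed; the relaxation time and lattice weights are textbook values, not taken from the paper.

```python
import numpy as np

# D2Q9 lattice: 9 discrete velocities and the matching equilibrium weights.
c = np.array([(0,0),(1,0),(0,1),(-1,0),(0,-1),(1,1),(-1,1),(-1,-1),(1,-1)])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

def equilibrium(rho, ux, uy):
    """Second-order BGK equilibrium distribution for all 9 directions."""
    cu = c[:, 0, None, None]*ux + c[:, 1, None, None]*uy
    usq = ux**2 + uy**2
    return rho * w[:, None, None] * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

def lbm_step(f, tau=0.6):
    """One collide-and-stream update on a periodic domain."""
    rho = f.sum(axis=0)                                  # density moment
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho     # velocity moments
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    f = f - (f - equilibrium(rho, ux, uy)) / tau         # BGK collision
    for i in range(9):                                   # propagation
        f[i] = np.roll(f[i], shift=(c[i, 0], c[i, 1]), axis=(0, 1))
    return f
```

The single- versus double-precision comparison discussed in the abstract amounts, in such a code, to choosing the dtype of `f`; every node performs the same arithmetic, which is why the method parallelizes so regularly.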
Computers & Mathematics With Applications | 2011
Christian Obrecht; Frédéric Kuznik; Bernard Tourancheau; Jean-Jacques Roux
Emerging many-core processors, like CUDA capable NVIDIA GPUs, are promising platforms for regular parallel algorithms such as the Lattice Boltzmann Method (LBM). Since the global memory of graphics devices shows high latency and LBM is data intensive, the memory access pattern is an important issue for achieving good performance. Whenever possible, global memory loads and stores should be coalesced and aligned, but the propagation phase in LBM can lead to frequent misaligned memory accesses. Most previous CUDA implementations of 3D LBM addressed this problem by using low latency on-chip shared memory. Instead, our CUDA implementation of LBM follows carefully chosen data transfer schemes in global memory. For the 3D lid-driven cavity test case, we obtained up to 86% of the maximal global memory throughput on NVIDIA's GT200. We show that, as a consequence, highly efficient implementations of LBM on GPUs are possible, even for complex models.
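Why the propagation phase misaligns accesses can be seen from the index arithmetic alone. The sketch below is illustrative, not the authors' kernel: it assumes a structure-of-arrays layout with x as the fastest-varying axis and computes the global-memory offsets one warp of threads would touch when fetching its upwind neighbors.

```python
def soa_index(x, y, z, nx, ny):
    """Linear offset of node (x, y, z) in a structure-of-arrays layout;
    threads with consecutive x hit consecutive addresses."""
    return x + nx * (y + ny * z)

def warp_addresses(cx, cy, cz, y, z, nx, ny, warp=32):
    """Offsets read by one warp (threads at consecutive x in row (y, z))
    when fetching the upwind node along the lattice direction (cx, cy, cz)."""
    return [soa_index(x - cx, y - cy, z - cz, nx, ny) for x in range(warp)]
```

For a direction with cx = 0, the warp reads a contiguous, segment-aligned run of addresses (fully coalesced). For cx = ±1, the run is still contiguous but shifted by one element, so it straddles an alignment boundary and costs an extra memory transaction; this is the misalignment the abstract's data transfer schemes are designed around.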
Computers & Mathematics With Applications | 2013
Christian Obrecht; Frédéric Kuznik; Bernard Tourancheau; Jean-Jacques Roux
The lattice Boltzmann method (LBM) is an increasingly popular approach for solving fluid flows in a wide range of applications. The LBM yields regular, data-parallel computations; hence, it is especially well fitted to massively parallel hardware such as graphics processing units (GPUs). Up to now, though, single-GPU implementations of the LBM have been of moderate practical interest since the on-board memory of GPU-based computing devices is too scarce for large scale simulations. In this paper, we present a multi-GPU LBM solver based on the well-known D3Q19 MRT model. Using appropriate hardware, we managed to run our program on six Tesla C1060 computing devices in parallel. We observed up to 2.15x10^9 node updates per second for the lid-driven cubic cavity test case. It is worth mentioning that such performance is comparable to that obtained with large high performance clusters or massively parallel supercomputers. Our solver enabled us to perform high resolution simulations for large Reynolds numbers without facing numerical instabilities. However, we observed symmetry breaking effects in long-running simulations of unsteady flows. We describe the different levels of precision we implemented, showing that these effects are due to round-off errors, and we discuss their relative impact on performance.
Parallel Computing | 2013
Christian Obrecht; Frédéric Kuznik; Bernard Tourancheau; Jean-Jacques Roux
The lattice Boltzmann method (LBM) is an innovative and promising approach in computational fluid dynamics. From an algorithmic standpoint it reduces to a regular data parallel procedure and is therefore well-suited to high performance computations. Numerous works report efficient implementations of the LBM for the GPU, but very few mention multi-GPU versions and even fewer GPU cluster implementations. Yet, to be of practical interest, GPU LBM solvers need to be able to perform large scale simulations. In the present contribution, we describe an efficient LBM implementation for CUDA GPU clusters. Our solver consists of a set of MPI communication routines and a CUDA kernel specifically designed to handle three-dimensional partitioning of the computation domain. Performance measurements were carried out on a small cluster. We show that the results are satisfying, both in terms of data throughput and parallelisation efficiency.
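Three-dimensional partitioning of the computation domain boils down to laying the MPI ranks out on a Cartesian process grid and finding the face neighbors each sub-domain must exchange halo layers with. The following sketch shows one conventional rank-to-coordinates mapping; the paper's actual decomposition and communication routines are not available here, so the function names and the rank ordering are illustrative assumptions.

```python
def coords(rank, px, py, pz):
    """Cartesian coordinates of a rank in a px x py x pz process grid
    (rank varies fastest along x, mirroring typical node indexing)."""
    return rank % px, (rank // px) % py, rank // (px * py)

def neighbor(rank, axis, step, px, py, pz):
    """Rank of the face neighbor along `axis` (0, 1 or 2) in direction
    `step` (+1 or -1), or None at the domain boundary. Sub-domains
    exchange their boundary (halo) layers with exactly these ranks."""
    c = list(coords(rank, px, py, pz))
    dims = (px, py, pz)
    c[axis] += step
    if not 0 <= c[axis] < dims[axis]:
        return None
    return c[0] + px * (c[1] + py * c[2])
```

In an MPI code, each rank would post non-blocking sends and receives toward its up-to-six non-None neighbors per time step, overlapping the exchange with computation on the sub-domain interior.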
IEEE International Conference on High Performance Computing, Data and Analytics | 2011
Christian Obrecht; Frédéric Kuznik; Bernard Tourancheau; Jean-Jacques Roux
In this paper, we describe the implementation of a multi-graphics-processing-unit (GPU) fluid flow solver based on the lattice Boltzmann method (LBM). The LBM is a novel approach in computational fluid dynamics, with numerous interesting features from a computational, numerical, and physical standpoint. Our program is based on CUDA and uses POSIX threads to manage multiple computation devices. Using recently released hardware, our solver may therefore run eight GPUs in parallel, which allows us to perform simulations at a rather large scale. Performance and scalability are excellent, the speedup over sequential implementations being of at least two orders of magnitude. In addition, we discuss tiling and communication issues for present and forthcoming implementations.
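The one-host-thread-per-device pattern mentioned in the abstract can be sketched in a few lines. This is a schematic stand-in, not the authors' POSIX threads code: Python threads play the role of pthreads, a dummy counter stands in for the CUDA kernel launch, and a barrier models the synchronization needed before sub-domain boundaries are exchanged each time step.

```python
import threading

def run_solver(num_devices, steps, results):
    """One host thread per compute device; the barrier stands in for the
    boundary exchange that must complete before the next time step."""
    barrier = threading.Barrier(num_devices)

    def worker(dev):
        done = 0
        for _ in range(steps):
            # a CUDA kernel launch for device `dev` would go here
            done += 1
            barrier.wait()   # all devices finish the step, then exchange
        results[dev] = done

    threads = [threading.Thread(target=worker, args=(d,))
               for d in range(num_devices)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The key property the barrier enforces is lock-step time advancement: no device may start step n+1 while a neighbor is still producing the step-n boundary data it will read.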
IEEE International Conference on High Performance Computing, Data and Analytics | 2010
Christian Obrecht; Frédéric Kuznik; Bernard Tourancheau; Jean-Jacques Roux
In this work, we investigate the global memory access mechanism on recent GPUs. For the purpose of this study, we created specific benchmark programs, which allowed us to explore the scheduling of global memory transactions. Thus, we formulate a model capable of estimating the execution time for a large class of applications. Our main goal is to facilitate optimisation of regular data-parallel applications on GPUs. As an example, we finally describe our CUDA implementations of LBM flow solvers on which our model was able to estimate performance with less than 5% relative error.
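The paper's actual model is built from benchmarked transaction scheduling and is not reproduced here; the sketch below only conveys the general shape of such an estimate for a memory-bound, data-parallel kernel, where run time is dominated by total bytes moved divided by sustained throughput. The parameter names and the sample figures are illustrative assumptions.

```python
def estimated_time(loads, stores, bytes_per_access, throughput_gbps,
                   overhead_s=0.0):
    """Crude throughput-bound estimate for a memory-bound kernel:
    time = fixed overhead + total bytes moved / sustained throughput."""
    total_bytes = (loads + stores) * bytes_per_access
    return overhead_s + total_bytes / (throughput_gbps * 1e9)

def node_updates_per_second(nodes, time_s):
    """Convert a per-step run time into the usual LBM throughput metric."""
    return nodes / time_s

# Example: a D3Q19-like kernel reads and writes 19 single-precision values
# per node per step; assume a 128^3 grid and 100 GB/s sustained throughput.
n = 128**3
t = estimated_time(19 * n, 19 * n, 4, 100.0)
```

Refinements of this idea, such as the one the paper develops, account for transaction granularity, alignment, and latency hiding, which is what brings the relative error down to a few percent.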
Environmental Fluid Mechanics | 2015
Christian Obrecht; Frédéric Kuznik; Lucie Merlier; Jean-Jacques Roux; Bernard Tourancheau
The lattice Boltzmann method (LBM) is an innovative approach in computational fluid dynamics (CFD). Due to the underlying lattice structure, the LBM is inherently parallel and therefore well suited for high performance computing. Its application to outdoor aeraulic studies is promising, e.g. applied to complex urban configurations, as an alternative to the commonplace Reynolds-averaged Navier–Stokes and large eddy simulation methods based on the Navier–Stokes equations. Emerging many-core devices, such as graphics processing units (GPUs), nowadays make it possible to run very large scale simulations on rather inexpensive hardware. In this paper, we present simulation results obtained using our multi-GPU LBM solver. For validation purposes, we study the flow around a wall-mounted cube and show agreement with previously published experimental results. Furthermore, we discuss larger scale flow simulations involving nine cubes which demonstrate the practicability of CFD simulations in building external aeraulics.
Journal of Computational Physics | 2014
Christian Obrecht; Pietro Asinari; Frédéric Kuznik; Jean-Jacques Roux
The link-wise artificial compressibility method (LW-ACM) is a recent formulation of the artificial compressibility method for solving the incompressible Navier-Stokes equations. Two implementations of the LW-ACM in three dimensions on CUDA enabled GPUs are described. The first one is a modified version of a state-of-the-art CUDA implementation of the lattice Boltzmann method (LBM), showing that an existing GPU LBM solver might easily be adapted to LW-ACM. The second one follows a novel approach, which leads to a performance increase of up to 1.8x compared to the LBM implementation considered here, while reducing the memory requirements by a factor of 5.25. Large-scale simulations of the lid-driven cubic cavity at Reynolds number Re=2000 were performed for both LW-ACM and LBM. Comparison of the simulation results against spectral elements reference data shows that LW-ACM performs almost as well as multiple-relaxation-time LBM in terms of accuracy.
Computers & Mathematics With Applications | 2016
Christian Obrecht; Pietro Asinari; Frédéric Kuznik; Jean Jacques Roux
The link-wise artificial compressibility method (LW-ACM) is a novel formulation of the artificial compressibility method for the incompressible Navier–Stokes equations showing strong analogies with the lattice Boltzmann method (LBM). The LW-ACM operates on regular Cartesian meshes and is therefore well-suited for massively parallel processors such as graphics processing units (GPUs). In this work, we describe the GPU implementation of a three-dimensional thermal flow solver based on a double-population LW-ACM model. Focusing on large scale simulations of the differentially heated cubic cavity, we compare the present method to hybrid approaches based on either multiple-relaxation-time LBM (MRT-LBM) or LW-ACM, where the energy equation is solved through finite differences on a compact stencil. Since thermal LW-ACM requires only the storing of fluid density and velocity in addition to temperature, both double-population thermal LW-ACM and hybrid thermal LW-ACM reduce the memory requirements by a factor of 4.4 compared to a D3Q19 hybrid thermal LBM implementation following a two-grid approach. Using a single graphics card featuring 6 GiB of memory (instead of the widespread but ambiguous GB and KB notations, we use the notations of the International System of Quantities, namely 1 GiB = 2^30 B, 1 KiB = 2^10 B, and 1 kB = 10^3 B), we were able to perform single-precision computations on meshes containing up to 536^3 nodes, i.e. about 154 million nodes. We show that all three methods are comparable both in terms of accuracy and performance on recent GPUs. For Rayleigh numbers ranging from 10^4 to 10^6, the thermal fluxes as well as the flow features are in similarly good agreement with reference values from the literature.
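The mesh-size claim can be checked with simple arithmetic. The sketch below verifies that 536^3 is indeed about 154 million nodes and that a plausible footprint fits within 6 GiB; the assumption of five scalar fields (density, three velocity components, temperature) held in two copies is an illustrative reading of the abstract, not the paper's documented storage scheme.

```python
def mesh_bytes(n, scalars_per_node, copies=2, bytes_per_scalar=4):
    """Single-precision memory footprint of an n^3 mesh holding
    `scalars_per_node` fields in `copies` buffers (e.g. ping-pong)."""
    return n**3 * scalars_per_node * copies * bytes_per_scalar

nodes = 536**3                            # about 154 million nodes
footprint_gib = mesh_bytes(536, 5) / 2**30  # assumed 5 fields, 2 copies
```

Under these assumptions the footprint comes to roughly 5.7 GiB, consistent with the abstract's statement that a 6 GiB card suffices at this resolution.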
Computers & Mathematics With Applications | 2013
Christian Obrecht; Frédéric Kuznik; Bernard Tourancheau; Jean-Jacques Roux
Interpolated bounce-back boundary conditions for the lattice Boltzmann method (LBM) make the accurate representation of complex geometries possible. In the present work, we describe an implementation of a linearly interpolated bounce-back (LIBB) boundary condition for graphics processing units (GPUs). To validate our code, we simulated the flow past a sphere in a square channel. At low Reynolds numbers, results are in good agreement with experimental data. Moreover, we give an estimate of the critical Reynolds number for transition from steady to periodic flow. Performance recorded on a single-node server with eight GPU-based computing devices ranged up to 2.63x10^9 fluid node updates per second. Comparison with a simple bounce-back version of the solver shows that the impact of LIBB on performance is fairly low.
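For readers unfamiliar with the technique, one widely used linear interpolated bounce-back rule (in the style of Bouzidi et al.) is sketched below; the paper's exact formulation may differ, so treat this as an illustrative instance rather than the authors' scheme. Here q in (0, 1] is the fraction of the lattice link between a fluid node and its solid neighbor that lies on the fluid side of the wall.

```python
def libb(q, f_i_here, f_i_upwind, f_opp_here):
    """Linearly interpolated bounce-back for one boundary link.

    q          -- wall position along the link, in (0, 1]
    f_i_here   -- post-collision population at the boundary fluid node,
                  pointing toward the wall
    f_i_upwind -- same direction, one node further from the wall
    f_opp_here -- post-collision population at the boundary node in the
                  opposite (wall-to-fluid) direction
    Returns the incoming population after reflection at the wall.
    """
    if q < 0.5:
        return 2*q*f_i_here + (1 - 2*q)*f_i_upwind
    return f_i_here / (2*q) + (1 - 1/(2*q)) * f_opp_here
```

Note that q = 1/2 recovers the simple bounce-back rule (the wall sits exactly halfway along the link), which is why the performance overhead of LIBB relative to simple bounce-back is modest: only the interpolation coefficients and one extra neighbor load are added per boundary link.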