The International Conference on High Performance Computing in Asia-Pacific Region Companion | 2021

An efficient halo approach for Euler-Lagrange simulations based on MPI-3 shared memory


Abstract


Euler-Lagrange methods are a common approach for simulating dispersed particle-laden flows, e.g. in turbomachinery. In this approach, the fluid is treated as a continuous phase with an Eulerian field solver, whereas the Lagrangian movement of the dispersed phase is described through the equations of motion of each individual particle. In high-performance computing, the load of the fluid phase depends only on the degrees of freedom, so load balancing can be performed a priori, thereby ensuring optimal scaling. However, the discrete phase introduces local load imbalances that cannot easily be predicted, since generally neither the spatial particle distribution nor the computational cost of advancing particles relative to the fluid integration is known a priori. Runtime load balancing alleviates this problem by adjusting the local load on each processor according to information gathered during the simulation [4]. Since the load balancing step becomes part of the simulation time, its performance and scaling on modern HPC systems are of crucial importance. In this talk, we first present the FLEXI framework for the Euler-Lagrange system, then introduce the previous approach and highlight its difficulties. FLEXI is a high-order accurate, massively parallel CFD framework based on the Discontinuous Galerkin Spectral Element Method (DGSEM). It has shown excellent scaling properties for the fluid phase and was recently extended with particle tracking capabilities [1], developed together with the PICLas framework [2]. In FLEXI, the mesh is stored in the HDF5 format, which allows parallel access, with the elements presorted along a space-filling curve (SFC). This approach has proven suitable for fluid simulations, since each processor requires and accesses only its local mesh information, thereby reducing I/O on the underlying file system [3].
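To make the SFC ordering and runtime load balancing more concrete, here is a minimal sketch, not FLEXI's implementation. The abstract does not say which curve FLEXI uses, so a 2D Morton (Z-order) curve stands in for brevity. The partitioning step is a generic prefix-sum cut of the SFC-ordered element list by per-element weights; it is not necessarily the scheme of [4]. The weights are assumed to be measured per-element costs (fluid plus particle work), and all function names (`morton_key_2d`, `sfc_order`, `partition_by_weight`) are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Sequence, Tuple


def morton_key_2d(ix: int, iy: int, bits: int = 16) -> int:
    """Z-order (Morton) key: interleave the bits of two integer cell coordinates."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)      # x bits go to even bit positions
        key |= ((iy >> b) & 1) << (2 * b + 1)  # y bits go to odd bit positions
    return key


def sfc_order(cells: Sequence[Tuple[int, int]]) -> List[int]:
    """Element indices sorted along the Morton curve (the 'presorted' mesh order)."""
    return sorted(range(len(cells)), key=lambda e: morton_key_2d(*cells[e]))


def partition_by_weight(weights: Sequence[float], nranks: int) -> List[int]:
    """Cut an SFC-ordered list of per-element weights into nranks contiguous
    chunks of roughly equal total weight. Rank r owns elements
    offsets[r]:offsets[r + 1]."""
    n = len(weights)
    prefix = list(accumulate(weights))  # prefix[i] = total weight of elements 0..i
    total = prefix[-1] if prefix else 0.0
    offsets = [0]
    for r in range(1, nranks):
        if total > 0:
            # first element at which the cumulative weight reaches r/nranks of
            # the total; the cut is placed right after that element
            cut = bisect_left(prefix, total * r / nranks) + 1
        else:
            cut = n * r // nranks  # no load information: split by element count
        offsets.append(min(max(cut, offsets[-1]), n))  # keep offsets monotone, in range
    offsets.append(n)
    return offsets
```

Because each rank receives a contiguous range of the presorted list, it can read exactly its slice of the mesh file. For example, with weights `[4, 1, 1, 1, 1]`, where the first element carries many particles, two ranks get offsets `[0, 1, 5]`: one rank holds the heavy element alone and the other holds the remaining four.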
However, to retain high computational efficiency, the particle phase needs additional mesh information about the region surrounding the local fluid domain, since particles can cross the local domain boundary at any point during a time step. In previous implementations, this “halo region” information was communicated individually between processors, causing significant CPU and network load for an extended period of time during initialization and during each load balancing step. Therefore, we propose a method developed from scratch that uses modern MPI calls and overcomes most of the challenges of the previous approach. This reworked method uses MPI-3 shared memory to make the mesh information available to all processors on a compute node. We perform a two-step, communication-free identification of all relevant mesh elements for each compute node. Furthermore, by making the mesh information accessible to all processors sharing local memory, we eliminate redundant calculations and reduce data duplication. We conclude by presenting examples of large-scale computations of particle-laden flows in complex turbomachinery systems and give an outlook on the next research challenges.
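The abstract does not spell out the two identification steps, so what follows is only a minimal sketch of one plausible communication-free scheme. It rests on three hypothetical assumptions. First, every rank on a node can read the full array of element bounding boxes; in an MPI-3 code such an array could live in a window allocated with `MPI_Win_allocate_shared` on a node communicator obtained from `MPI_Comm_split_type` with `MPI_COMM_TYPE_SHARED`, and here a plain Python list stands in for that window. Second, the ranks of one node own a contiguous SFC range `[first, last)`. Third, a scalar `halo_width`, for instance maximum particle speed times time step, is sufficient. The function names are illustrative.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (lower corner, upper corner) of an axis-aligned box


def node_bounding_box(elem_boxes: Sequence[Box], first: int, last: int) -> Box:
    """Step 1: bounding box of all elements owned by the ranks of this node,
    assumed to be the contiguous SFC range [first, last)."""
    lo = tuple(min(elem_boxes[e][0][d] for e in range(first, last)) for d in range(3))
    hi = tuple(max(elem_boxes[e][1][d] for e in range(first, last)) for d in range(3))
    return lo, hi


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes intersect (touching counts as overlap)."""
    return all(a[0][d] <= b[1][d] and b[0][d] <= a[1][d] for d in range(3))


def halo_elements(elem_boxes: Sequence[Box], first: int, last: int,
                  halo_width: float) -> List[int]:
    """Step 2: all non-owned elements whose box intersects the node box grown
    by halo_width. Only the node-local copy of the mesh is read, so no
    messages are exchanged between processors."""
    lo, hi = node_bounding_box(elem_boxes, first, last)
    grown = (tuple(x - halo_width for x in lo), tuple(x + halo_width for x in hi))
    return [e for e, box in enumerate(elem_boxes)
            if not first <= e < last and boxes_overlap(box, grown)]
```

Consider a row of six unit cubes along x where the node owns elements 2 and 3. A `halo_width` of 0.5 selects elements `[1, 4]`, and a width of 1.5 selects `[0, 1, 4, 5]`. A single enlarged node box is conservative: for elongated or non-convex SFC domains it may select more elements than strictly needed. However, it never misses an element whose bounding box lies within `halo_width` of an owned element's bounding box.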

DOI 10.1145/3440722.3440904
Language English
