A three-dimensional unified gas-kinetic wave-particle solver for flow computation in all regimes
AA three-dimensional unified gas-kinetic wave-particlesolver for flow computation in all regimes
Yipei Chen a , Yajun Zhu c , Kun Xu a,b, ∗ a Department of Mathematics, Hong Kong University of Science and Technology, HongKong, China b Department of Mechanical and Aerospace Engineering, Hong Kong University ofScience and Technology, Hong Kong, China c National Key Laboratory of Science and Technology on Aerodynamic Design andResearch, Northwestern Polytechnical University, Xian, Shaanxi 710072, China
Abstract
In this paper, the unified gas-kinetic wave-particle (UGKWP) method hasbeen constructed on three-dimensional unstructured mesh with parallel com-puting for multiscale flow simulation. Following the direct modeling method-ology of the unified gas-kinetic scheme (UGKS), the UGKWP method mod-els the flow dynamics uniformly in different regime and gets the local cell’sKnudsen number dependent numerical solution directly without the require-ment of kinetic scale cell resolution. The UGKWP method is composed ofevolution of deterministic wave and stochastic particles. With the dynamicwave-particle decomposition, the UGKWP method is able to capture thecontinuum wave interaction and rarefied particle transport under a unifiedframework and achieves the high efficiency in different flow regime. TheUGKWP flow solver is validated by many three-dimensional test cases ofdifferent Mach and Knudsen numbers, which include 3D shock tube prob-lem, lid-driven cavity flow, high-speed flow passing through a cubic object,and hypersonic flow around a space vehicle. The parallel performance hasbeen tested on the Tianhe-2 supercomputer, and reasonable parallel per-formance has been observed up to one thousand core processing. Due towave-particle formulation, the UGKWP method has great potential in solv-ing three-dimensional multiscale transport with the co-existence of contin- ∗ Corresponding author.
Email address: [email protected] (Kun Xu)
Preprint submitted to Physics of Fluids July 28, 2020 a r X i v : . [ phy s i c s . c o m p - ph ] J u l um and rarefied flow regimes, especially for the high-speed rarefied andcontinuum flow around space vehicle in near space flight. Keywords:
Unified wave-particle method, Multiscale transport, Rarefiedand continuum flow simulation, Hypersonic flow
1. Introduction
Non-equilibrium flow appears in a wide range of applications, such as there-entry of spacecraft in upper planetary atmospheres, vacuum devices, andfluid-structure interaction in Microelectromechanical systems. For example,for a vehicle in a near-space flight at Mach number 6 and Reynolds number5000, the local Knudsen number defined by Kn local = l |∇ ρ | /ρ with the meanfree path l can cover a wide range of values with five orders of magnitudedifference[1]. To simulate such a multiscale problem, it requires numericalalgorithm to capture both equilibrium and non-equilibrium flow in differentregime, such as the hydrodynamic regime in the highly compressible leadingedge, the whole transition regime across the vehicle surface, and the rarefiedregime in the highly expanding trailing edge.The well-known Euler and Navier-Stokes-Fourier (NSF) equations are hy-drodynamic equations, which are valid for the continuum flow. They becomeinaccurate in the near continuum and transition regimes. On the other hand,the Boltzmann equation models the gas dynamics in the kinetic scale of par-ticle mean free path and collision time. The solution in all flow regimescan be obtained by solving the Boltzmann equation under the kinetic scaleresolution. However, the high-dimensionality of the equation, nonlinearityof collision term, and its integro-differential nature make the deterministicBoltzmann solver extremely expensive in memory requirement and compu-tational cost. Instead, for practical high-speed non-equilibrium flow compu-tation, the direct simulation Monte Carlo (DSMC) method [2] becomes themain choice due to its high efficiency for solving the Boltzmann equation fromthe stochastic particle approach. Similar to the modeling in the derivationof the Boltzmann equation, the separation of particle transport and collisionin DSMC enforces the numerical mesh size and time step to be less than theparticle mean free path and collision time, i.e., the so-called kinetic scale ofDSMC modeling.Developed at Sandia National Laboratories, stochastic parallel rarefied-gas time-accurate analyzer (SPARTA)[3] is an open source 2 & 3D DSMC2imulator optimized for exascale parallel computing and embed with bothstatic and dynamic load balancing across processors. Particles in SPARTAadvect through a hierarchical oct-tree based Cartesian grid that overlays thesimulation box. Additionally, dsmcFoam[4] and its upgrade release dsmcFoam+[5]have been developed within the framework of OpenFOAM[6–8], which no-tably features with dynamic load balancing on arbitrary 2D/3D polyhedronmesh, molecular vibrational and electronic energy modes, chemical reac-tions and gravitational force. Other DSMC codes such as MONACO[9],SMILE[10], DAC[11] with different mesh topologies and collision treatmentscan be found in the literature. Apart from stochastic solvers, the deter-ministic numerical scheme for Boltzmann and kinetic model equations withdiscrete velocity points have been extensively studied in the last severaldecades. Nesvetay-3D[12] is an implicit solver on unstructured mesh de-veloped by Titarev et al. with both physical and velocity space decomposedparallelization. Recently, Zhu et al. has implemented discrete unified gas ki-netic scheme (DUGKS)[13, 14] with the Shakhov collision model[15] nameddugksFoam[16]. Unlike the traditional DVM method, DUGKS is a multiscalesolver, and the time step is not restricted by the particle collision time dueto the coupled treatment of particle transport and collision. Besides the tra-ditional physical space decomposition parallel strategy, dugksFoam featuresa parallel computing ability based on the velocity space decomposition.The recently developed unified gas-kinetic wave-particle (UGKWP) methodis a multiscale method for all flow regimes[17, 18], and is used in othermultiscale transport simulation as well, such as photon transport[19]. TheUGKWP is constructed under the unified gas-kinetic scheme (UGKS) frame-work [20]. Instead of using discrete velocity method in UGKS, the UGKWPuses both hydrodynamic wave and stochastic particles to model the flow evo-lution, where a time and scale-dependent flux function is constructed throughthe coupled wave and particle transport across a cell interface in order to up-date both macroscopic flow variables and microscopic gas distribution func-tion inside each control volume. Due to the adaptive wave-particle decom-position, the hydrodynamic equilibrium flow and the kinetic non-equilibriumparticle free transport can be simulated efficiently by their separate repre-sentations and their dynamical coupling according to the local cell Knudsennumber, i.e., Kn c = τ / ∆ t with the particle relaxation time τ and numericaltime step ∆ t . The UGKWP method has unified preserving property[21] topresent the physical solution in all flow regime from the kinetic scale trans-port to the Navier-Stokes wave propagation without the constraint on the3umerical cell size and time step being less than the particle mean free pathand collision time.The particle number in UGKWP is proportional to exp ( − /Kn c ), whichis a function of the cell Knudsen number. In the continuum flow regime, theparticle number will be significantly reduced due to the small cell Knudsennumber. In the highly rarefied regime, similar to DSMC, the particle willplay a dominant role in the flow evolution with the association of statisti-cal noise. The steady-state solution can be obtained from the averaging oftime-accurate evolution solution. Due to the decoupled treatment of parti-cle transport and collision, DSMC requires the cell size to be a fraction ofthe particle mean free path and becomes very expensive in the transitionand near continuum flow regime. In contrast, the UGKS and UGKWP haveno such requirement with coupled particle transport and collision process inthe gas evolution and flux construction[22]. Moreover, the DSMC methodhandles the collision process by selecting particle collision pairs. In the lowKnudsen number case, intensive collisions have to be dealt with, which makeDSMC impractical in the near continuum flow simulation, such as the fly-ing vehicle at an altitude below 80 km . In contrast, the UGKWP methodremoves collisional particles and re-samples them from the updated macro-scopic flow variables. The modeling of collective effect of particles’ collisionwithin a time step in UGKWP doesn’t require the time step and cell size tobe less than the particle collision time and mean free path. As a result, theUGKWP is suitable for multiscale flow computation. Furthermore, with theimplementation of particles in UGKWP, the ray effect[23], which is observedin DVM-type schemes in the highly non-equilibrium regime due to the in-adequate numerical resolution in the particle velocity space, can be totallyavoided. A series of 1D/2D test cases covering a wide range of Mach andKnudsen numbers have been conducted extensively to validate the scheme[17, 18]. In this work, the UGKWP method will be extended to 3D unstruc-tured mesh with parallel computing via spatial decomposition, and the codeis potentially applicable to 3D flow simulation with arbitrary geometries inall flow regimes. The development from a 2D to a 3D code makes great effortto solve all problems related to the complex geometry, reconstruction, mul-tidimensional flux from wave-particle decomposition, particle tracking, andparallelization.The rest of the paper is organized as follows. In section 2, the numer-ical procedure of the UGKWP method on unstructured mesh is presented.Section 3 covers the construction of 3D UGKWP on parallel framework. In4ection 4, several numerical examples, including the 3D Sod shock tube in-side a square-column, Lid-driven cubic cavity flow, and the high-speed flowaround a cube and space vehicle, will be computed to demonstrate the per-formance of the current algorithm in multiscale flow simulation. Conclusionand further developments are given in the last section.
2. 3D Unified gas-kinetic wave-particle method
In this section, the unified gas-kinetic wave-particle (UGKWP) methodwill be introduced. In the classical kinetic theory, the Boltzmann equationreads ∂f∂t + u · ∇ x f = Q ( f ) , (1)where f ( x , u , t ) is the gas distribution function, which depends on the particlevelocity u ∈ R , physical space position x ∈ R , and time t ∈ R + . Q ( f ) is thenonlinear Boltzmann collision operator. In many applications, the collisionterm is usually simplified by other relaxation-type collision models S ( f ), suchas Bhatnagar-Gross-Krook (BGK)[24], the ellipsoidal statistical BGK (ES-BGK)[25], and the Shakhov model[15]. In general, the kinetic model can bewritten as ∂f∂t + u · ∇ x f = S ( f ) . (2)Throughout this paper, the BGK relaxation model f t + u · ∇ x f = g − fτ (3)will be used to construct the UGKWP method. Here τ denotes the relaxationtime, which is related the dynamic viscosity coefficient µ and the pressure p ,i.e., τ = µ/p . The local equilibrium state g is the Maxwellian distribution g = ρ (cid:18) λπ (cid:19) K exp[ − λ (( u − U ) + ξ )] , (4)with density ρ , macroscopic velocity U , internal degree of freedom K , andthe internal variable ξ = ( ξ , . . . , ξ K ). λ is related to the temperature T by λ = m/ (2 k B T ) = 1 / (2 RT ). Here, m and k B represent the molecular massand the Boltzmann constant, respectively. R = k B /m is the specific gas5onstant. Typically, the relaxation parameter in the kinetic model can becalculated through τ = µp = µ ref p (cid:18) TT ref (cid:19) ω , (5)where µ ref , T ref are the reference viscosity coefficient and temperature, and ω is power index, which related to Variable Hard Sphere (VHS) or VariableSoft Sphere (VSS) Models. The unified scheme is direct modeling in the discretized space (cid:80) i Ω i ⊂ R and time t n ∈ R + [22]. The cell averaged conservative flow variables W i =( ρ i , ( ρ U ) i , ( ρE ) i ) on a physical cell Ω i is defined as W i = 1 | Ω i | (cid:90) Ω i W ( x )d x , (6)and the cell averaged distribution function f i on physical cell Ω i is defined as f i = 1 | Ω i | (cid:90) Ω i f ( x )d x . (7)In terms of conservative flow variables, from t n to t n +1 on cell Ω i , the dis-cretized conservation laws for W i and f i are W n +1 i = W ni − | Ω i | (cid:88) j ∈ N ( i ) F ij | S ij | , (8)and f n +1 i = f ni − | Ω i | (cid:88) j ∈ N ( i ) F ij | S ij | + (cid:90) t n +1 t n S ( f i )d t, (9)where N ( i ) denotes the set of the interface-adjacent neighboring cells of cell i , and cell j is one of the neighbors. The interface between cells i and j isrepresented by the subscript ij . Hence, | S ij | and n ij are referred to the areaof the interface ij and the unit normal vector of the interface ij pointing fromcell i to cell j . F ij and F ij denotes the macroscopic and microscopic fluxesacross the interface, respectively. | Ω i | is the volume of cell i , ∆ t = t n +1 − t n denotes the discretized time step. 6t should be noted that Eqs.(8) and (9) are the fundamental physical lawson the scale of mesh size and time step, which describe the conservations ofmacroscopic flow variables and microscopic gas distribution function. Themacroscopic conservative flow variables, their fluxes, and the flux for theparticle transport are related to the moments of the gas distribution functionthrough W i = (cid:90) f i ψ dΞ , (10) F ij = (cid:90) t n +1 t n (cid:90) u · n ij f ij ( t ) ψ dΞd t, (11)and F ij = (cid:90) t n +1 t n u · n ij f ij ( t )d t, (12)where f ij ( t ) is the time-dependent distribution function on the cell interface, ψ = (1 , u , ( u + ξ )) is collision invariants, dΞ = d u d ξ , d u = d u d v d w , andd ξ = d ξ d ξ · · · d ξ K . The BGK relaxation term satisfies the compatibilitycondition (cid:90) S ( f ) ψ dΞ = (cid:90) g − fτ ψ dΞ = (13)for the mass, momentum, and energy conservations during the particle colli-sion process.The multiscale flow evolution in the unified algorithm relies on the con-struction of the flux function at the cell interfaces. The time-dependent gasdistribution function f ij ( t ) couples particle free streaming and collision de-termines the flow physics in different regime, which is based on the integralsolution of the BGK model f ( x , t ) = 1 τ (cid:90) tt g ( x (cid:48) , t (cid:48) ) e − ( t − t (cid:48) ) /τ d t (cid:48) + e − ( t − t ) /τ f ( x − u ( t − t )) , (14)where x is the point for the evaluation of the local gas distribution function, x (cid:48) = x − u ( t − t (cid:48) ) is the particle trajectory. Typically, x is denoted as x ij , thecenter of a cell interface for flux evaluation. f ( x ) is the initial distributionfunction around x at the beginning of each step t = t n , and g ( x , t ) is theequilibrium state distributed around x and t . Specifically, for second-orderaccuracy, with transformation t = t − t , x = x − x , the local expansionsare g ( x , t ) = g + g x · x + g t t, (15)7nd f ( x ) = f + f x · x . (16)The time-dependent distribution function at the center of cell interface x ij can be constructed as f ij ( t ) = c g + c g x · u + c g t (cid:124) (cid:123)(cid:122) (cid:125) f eqij ( t ) + c f + c f x · u (cid:124) (cid:123)(cid:122) (cid:125) f frij ( t ) , (17)with the coefficients c = 1 − e − t/τ ,c = te − t/τ − τ (1 − e − t/τ ) ,c = t − τ (1 − e − t/τ ) ,c = e − t/τ ,c = − te − t/τ . (18)Note that f eqij ( t ) and f frij ( t ) are the terms related to the evolution of the localequilibrium state g ( x , t ) and the initial distribution function f ( x ), respec-tively. The initial gas distribution function f in Eq. (17) is reconstructedfrom the updated gas distribution function at t n , which has the form f ( x ij , t ) = (cid:40) f ni + ( ∇ x f ) ni · ( x ij − x i ) − u · ( ∇ x f ) ni ( t − t n ) , n ij · u ≥ ,f nj + ( ∇ x f ) nj · ( x ij − x j ) − u · ( ∇ x f ) nj ( t − t n ) , n ij · u < , (19)where f ni and f nj are the initial distribution functions at neighboring cellsaround the cell interface ij . Here ( ∇ x f ) ni is the spatial gradient of the initialdistribution function inside the cell i and can be reconstructed via leastsquare with Venkatakrishnan’s limiter[26] or Barth and Jespersen limiter[27].The local equilibrium state g in Eq. (17) is computed from the compat-ibility condition W = (cid:90) g ψ dΞ = (cid:90) f ψ dΞ , (20)and the spatial and temporal derivatives of the equilibriums state can beobtained through the micro-macro relationship W x = (cid:90) g x ψ dΞ = (cid:90) f x ψ dΞ , W t = − (cid:90) u · g x ψ dΞ . (21)8quations (14) and (17) present a transition process from the initial non-equilibrium distribution function to the equilibrium one with the incrementof particle collision. It shows an evolution process from the kinetic to the hy-drodynamic scale, and the real solution depends on the local parameter τ / ∆ t ,i.e., the local cell Knudsen number. Specifically, the integrated microscopicflux over a time step gives F ij = (cid:90) ∆ t u · n ij f ij ( t )d t = u · n ij ( q g + q g x · u + q g t ) (cid:124) (cid:123)(cid:122) (cid:125) F eqij + u · n ij ( q f + q f x · u ) (cid:124) (cid:123)(cid:122) (cid:125) F frij , (22)where F eqij and F frij are the equilibrium microscopic flux and the free trans-port microscopic flux, respectively. Similarly, the macroscopic fluxes for con-servative variables are splitting into the equilibrium flux F eqij and the freestreaming flux F frij F ij = (cid:90) ∆ t (cid:90) u · n ij f ij ( t ) ψ dΞd t = (cid:90) F eqij ψ dΞ (cid:124) (cid:123)(cid:122) (cid:125) F eqij + (cid:90) F frij ψ dΞ (cid:124) (cid:123)(cid:122) (cid:125) F frij , (23)with the coefficients q = ∆ t − τ (1 − e − ∆ t/τ ) ,q = 2 τ (1 − e − ∆ t/τ ) − τ ∆ t − τ ∆ te − ∆ t/τ ,q = ∆ t − τ ∆ t + τ (1 − e − ∆ t/τ ) ,q = τ (1 − e − ∆ t/τ ) ,q = τ ∆ te − ∆ t/τ − τ (1 − e − ∆ t/τ ) . (24)With the variation of τ / ∆ t , Eq. (22) and (23) can provide multiscale flowevolution solution. When ∆ t (cid:29) τ , only the terms F eqij with q ≈ ∆ t and q ≈ ∆ t / t (cid:28) τ , F frij with q ≈ ∆ t and q ≈ ∆ t / f i isfurther discretized in the particle velocity space with discrete velocity points9 k to capture the non-equilibrium distribution function. Compared withmany other DVM with separate particle free-streaming and collision, themesh size and time step in UGKS are not limited by the particle mean freepath and collision time due to their coupled evolution solution for the fluxevaluation. Moreover, the NS solutions can be obtained automatically byUGKS in the continuum regime even with ∆ t (cid:29) τ , such as for the laminarboundary layer solution at high Reynolds number. For UGKWP, instead ofdiscretizing the particle velocity space, the particle will be used directly torepresent the non-equilibrium gas distribution function. The integral solution of the kinetic model equation (14) can be rewrittenas f ( x , t ) = (1 − e − t/τ ) g p ( x , t ) + e − t/τ f ( x − u t ) , (25)where g p = g + (cid:18) te − t/τ − e − t/τ − τ (cid:19) u · ∇ x g + (cid:18) t − e − t/τ − τ (cid:19) ∂ t g. (26)Equation (25) states that the distribution function at time t is a combinationof the initial distribution function f and the modified equilibrium state g .The probability for the particle without suffering collision at time t is e − t/τ .Otherwise, it will collide with other particle and the post-collision distribu-tion is determined by the distribution g p . The cumulative distribution forparticle free streaming at time t f is given by F ( t ) = ( t f ≤ t ) = e − t/τ . (27)A particle P k ( m k , x k , u k , e k ) can be represented by its mass m k , position x k ,velocity u k , and internal energy e k . Its free transport time is t f = min( − τ ln( η ) , ∆ t ) , (28)where η is a random number generated from a uniform distribution on theinterval (0 , η ∼ U (0 , x ∗ of the particle freetransport up to time t f can be accurately tracked, x ∗ k = x nk + u k t f , (29)where the particle velocity u k keeps the same value.10ccording to the time t f assigned to each particle, these particles with t f = ∆ t are called collisionless particles P f , and the particles with t f < ∆ t are called collisional particles P c . The collisional particles P c,k should bedeleted at the collision time t f and re-sampled from distribution function g p with the updated macroscopic quantities W n +1 i via sampling, i.e., x n +1 k ∼ U (Ω i ) , u n +1 k ∼ g p ( W n +1 i ) . (30)The position of the re-sampled particle x n +1 k is uniformly distributed insidethe cell Ω i where the collision happens. Similarly to the DSMC method, theinternal energy e n +1 k is sampled according to the temperature and internaldegree of freedom K . The particles mass m k can be prescribed and will bediscussed later. The above scheme is the unified gas-kinetic particle (UGKP)method.Theoretically, in the next time step, the re-sampled equilibrium particleswill be reclassified into collisionless and collisional particles again according tothe free transport time t f , and only the collisionless particles will be retainedat the end of the next time step. Since the collisional particle will disappearin the next time step, it is not necessary to re-sample it, and its dynamicimpact, such as the contribution to the flux, can be calculated analytically.Therefore, in order to reduce the noise variance in near continuum regimeand avoid re-sampling collisional particles repeatedly, only the collisionlessparticles in the hydrodynamic wave W hi = W i − W pi need to be re-sampledat the beginning of each time step. This is the basic idea of the unified gas-kinetic wave-particle (UGKWP) method. Here, W pi is the total conservativequantities of collisionless particles remained in cell Ω i at the end of each timestep, W pi = 1 | Ω i | (cid:88) x k ∈ Ω i φ k , (31)where the vector φ k = m k (1 , u k , ( u k + e k )) denotes the mass, momentum,and energy carried by the particle P k .Based on the cumulative distribution Eq.(27), the proportion of the colli-sionless particles can be evaluated in each cell at the beginning of each timestep. The total mass density of the re-sampled collisionless particle takesa portion of the updated hydrodynamic wave density ρ h from the previoustime step ρ hp = e − ∆ t/τ ρ h . (32)11ased on this observation, the particle evolution procedure of the UGKWPmethod can be summarized as1. Obtain free streaming time t f,k for the remaining particles P nf,k .2. Sample the collisionless particles P nf,k from hydrodynamic wave withdistribution g p ( W n ). Note that the collisionless particles with totalmass density ρ hp,n = e − ∆ t/τ ρ h,n have the free streaming time t f = ∆ t .3. Stream all the particles and classified into two categories, i,e, collision-less particles P n +1 f,k and collisional particles P ∗ c,k .4. Keep collisionless particles P n +1 f,k , and remove collisional particles P ∗ c,k .Calculate total conservative quantities of the remained collisionless par-ticles W p,n +1 according to Eq. (31). The conservative quantities ofcollisional particles W h,n +1 are obtained from the updated total con-servative quantities W n +1 in Eq.(8) as W h,n +1 = W n +1 − W p,n +1 . Thedetailed formulation for the update of W n +1 will be presented in thenext subsection.The interplay of waves (collisional) and particles (collisionless) in theUGKWP method is illustrated through a series of figures in fig. 1. From thediagram, the multi-efficiency property of UGKWP[17] is clearly indicated,i.e., the computational efficiency of UGKWP goes to the high efficient ap-proach in the corresponding regime. For example, in near continuum regime,i.e., τ →
0, the proportion of collisionless particle decreases exponentially.The UGKWP becomes a scheme without particles, and its computationalcost is comparable to a traditional NS solver. On the other hand, for highlynon-equilibrium hypersonic flow, such as τ (cid:29) ∆ t , the particles will play adominant role to capture the non-equilibrium transport, and the efficiencyof the scheme will go to the particle method, such as DSMC.It has been shown in [17] that the UGKWP method is a kinetic equationsolver in the rarefied regime and preserves the Navier-Stokes solution in thecontinuum regime with the particles re-sampled from the first-order approx-imation of g p . Even though the particles are sampled uniformly inside thecontrol volume, the spatial accuracy can be still kept in the near continuumregime, because the portion of particles e − ∆ t/τ is minimal and the hydrody-namic wave evolution is dominant by the updated W , which is computedanalytically with second-order accuracy. The DSMC method requires ∆ t tobe less than the particle mean collision time, which is equivalent to t f = ∆ t .The free transport time in UGKWP method is obtained from the samplingprocess in Eq. (28), where the particle collisional effect, such as evolving12 ℎ −Δ / −Δ / − − Δ / − Δ / (a) ℎ −Δ / −Δ / − − Δ / − Δ / (b) −Δ / −Δ / − − Δ / − Δ / ℎ (c) ℎ −Δ / −Δ / − − Δ / − Δ / (d) Figure 1: The diagram illustrates the interplay of waves and particles in the UGKWPmethod. Grey block: waves, hollow circle: collisionless particles, solid circle: collisionalparticles. (a) Initial field; (b) Classification of the collisionless particles and collisionalparticles for the part of W p according to the free transport time t f ; (c) Sample collision-less particles W hp from hydrodynamic waves W h ; (d) Update on both macroscopic andmicroscopic level. to the equilibrium distribution g p , has been modeled in the scheme throughthe evolution solution in (25). Without using this evolution solution withtime accumulating particle collision effect, or any other equivalent form, it isimpossible to design a multiscale method, which can recover the NS solutionin the continuum flow regime. The UGKWP updates the macroscopic variables for each control vol-ume in Eq. (8). The equilibrium part flux F eqij is directly calculated fromthe macroscopic flow field as given by Eq. (23). In UGKWP, the calcula-tion of the free streaming flux F frij will be divided into two parts. The freestreaming flux from collisional hydrodynamic waves of (1 − e − ∆ t/τ ) W h canbe calculated analytically. The other free streaming flux from collisionlessparticles of hydrodynamic waves e − ∆ t/τ W h and the remained particles W p can be evaluated by counting the particles passing through the cell interfaceduring a time step. The free streaming flux contributed from the collisional13ydrodynamic waves of (1 − e − ∆ t/τ ) W h on the cell interface ij is F fr,waveij = F fr,UGKSij ( W h ) − F fr,DV Mij ( W hp )= (cid:90) u · n ij (cid:20) ( q g h + q u · g h x ) − e − ∆ t/τ (cid:90) ∆ t ( g h − t u · g h x )d t (cid:21) ψ dΞ= (cid:90) u · n ij (cid:20) ( q − ∆ te − ∆ t/τ ) g h + ( q + ∆ t e − ∆ t/τ ) u · g h x (cid:21) ψ dΞ , (33)where g h is the Maxwellian distribution with temperature and average ve-locity determined by total macroscopic variables W , but the density by W h . g h x is the spatial derivative of the Maxwellian distribution, which can be ob-tained from the reconstruction of W and W h . The free streaming flux F frij in Eq. (23) is computed partially by particles and partially by the contribu-tion of g h ( W h ) analytically. In addition, the subtraction of F fr,DV Mij ( W hp )from F fr,UGKSij ( W h ) aims to remove the free transport fluxes which are stillcalculated by the collisionless particle P f sampled from W hp . The total non-equilibrium free streaming flux F frij also includes the contribution from theremaining particles P k from the previous time step. During the free transportprocess, the contribution to the numerical fluxes of cell i can be obtained bycounting the particles across the cell interfaces, F fr,pi = (cid:88) x n +1 k , x ∗ k ∈ Ω i φ k − (cid:88) x nk ∈ Ω i φ k . (34)Finally, the updates of the conservative flow variables in the UGKWP methodare W n +1 i = W ni − | Ω i | (cid:88) j ∈ N ( i ) F eqij | S ij | − | Ω i | (cid:88) j ∈ N ( i ) F frij | S ij | , = W ni − | Ω i | (cid:88) j ∈ N ( i ) F eqij | S ij | − | Ω i | (cid:88) j ∈ N ( i ) F fr,waveij | S ij | + F fr,pi | Ω i | . (35) (a) Time step on unstructured meshFollow the implementation in [28], the time step for unsteady flow simu-lation is obtained from ∆ t = C min i Ω i Λ xi + Λ yi + Λ zi , (36)14ith Courant number C typically satisfied 0 < C < i Λ xi = ( | U i | + c )∆ S xi , Λ yi = ( | V i | + c )∆ S yi , Λ zi = ( | W i | + c )∆ S zi , (37)where c = 3 σ i = 3 √ RT i is approximately the sound speed, U i = ( U i , V i , W i )is the macroscopic velocity. The variables ∆ S xi , ∆ S yi , and ∆ S zi , respectively,represent projections of the control volume on the y-z-, x-z-, and x-y-plane,which are given by ∆ S xi = 12 (cid:88) j ∈ N ( i ) | S xij | , ∆ S yi = 12 (cid:88) j ∈ N ( i ) | S yij | , ∆ S zi = 12 (cid:88) j ∈ N ( i ) | S zij | , (38)where S xij , S yij , and S zij denote the x-, y-, and the z-component of the facevector S ij = | S ij | n ij .(b) Particle samplingAt the beginning of each time step, the collisionless particles of hydrody-namic waves will be sampled in pairs from Maxwellian distribution function g ( W n ). Specifically, given with the macroscopic velocity U = ( U, V, W ), tem-perature T , and vector X that sampled from the normal distribution, a pair ofparticles with microscopic velocities u = U + √ RT X and u (cid:48) = U − √ RT X will be sampled. To determine the sampling particle number N sam in thecell, a prescribed preference number N ref is required. Further, the referencemass m ref can be determined from the total particle mass and the referencenumber N ref m ref = ( ρ p + ρ hp ) | Ω | N ref = ( ρ p + e − ∆ t/τ ρ h ) | Ω | N ref . (39)Once the reference mass m ref is available, the number of particles to besampled symmetrically is determined by N sam = 2 (cid:24) ρ hp | Ω | m ref (cid:25) = 2 (cid:24) e − ∆ t/τ ρ h | Ω | m ref (cid:25) . (40)15f the reference mass m ref is the same for all cells, then the total number ofparticles per cell would be exactly equal to N ref . In this way, the total numberof particles in each cell can be controlled around the given reference number N ref in near continuum regime regardless of mesh distribution. Moreover,the minimum number of particles N min per cell can be prescribed to adjustthe sampled particles’ number such that N sam = max { N sam , N min − N left } , (41)where N left is the collisionless particles left at the initial of each time step.Finally, the sampled mass weight m sam for each sampled particle is m sam = ρ hp | Ω | N sam = e − ∆ t/τ ρ h | Ω | N sam , (42)which guarantees that the total sampled mass is exactly equal to ρ hp | Ω | .(c) Time averagingFor steady-state solution, the flow field ¯W starts to be averaged after agiven time step N avg , ¯W = (cid:80) n>N avg ∆ t n W n (cid:80) n>N avg ∆ t n (43)where ∆ t n = t n − t n − . The averaged flow field ¯W is assumed to be convergentif the relative change in two-successive steps is less than a given tolerance,such as ε = 10 − . Then, the flow variables, such as the temperature ¯ T andmacroscopic velocity ¯U , can be obtained from the averaged conservative flowvariables ¯W .(d) Numerical dissipationThe UGKWP targets the continuum and rarefied flow. In the continuumflow regime, the strong shock structure is usually unresolved by the meshsize. Therefore, numerical dissipation is added through relaxation time toenlarge the shock thickness to the mesh size scale, τ num = µP + C | P l − P r || P l + P r | ∆ t, (44)where P l and P r are the reconstructed pressures at the left and right side ofthe cell interface and C is a constant, such as C = 10 for strong shock inthe continuum regime. 16e) Boundary conditionThe proper treatment of boundary condition is crucial for a numericalscheme. For a diffusive wall condition with normal direction n pointing to-ward the computational domain, the incoming distribution function f in ( t ) atboundary is given by Eq.(17). The distribution function of emitted particlesfrom the wall has a Maxwellian distribution g w = ρ w (cid:18) πRT w (cid:19) K exp (cid:20) − ( u − U w ) + ξ RT w (cid:21) , (45)where T w and U w are prescribed wall temperature and velocity. Based onthe non-penetration condition, ρ w in the above Maxwellian is given by (cid:90) ∆ t (cid:90) n · ( u − U w ) < n · ( u − U w ) f in ( t )d u d t = ∆ t (cid:90) n · ( u − U w ) ≥ n · ( u − U w ) g w d u . (46)
3. 3D UGKWP code and parallelization
UGKWP solver is constructed under a finite volume framework on 3Dunstructured mesh. It includes not only the reconstruction and flux evalua-tion module as in the traditional finite volume solver, but also the particlesampling and tracking module as in pure particle method.
The main components of UGKWP solver are sketched in fig. 2. As shownin the diagram, the program starts with the pre- and post-processor module,where the mesh partition, initialization, setup of boundary condition, andparallel IO are handled inside. In UGKWP solver, the numerical proceduresare organized into a macroscopic field level and a microscopic particle level.Accordingly, the macroscopic components surrounded by the blue dash lineconsist of the parallel data transfer, reconstruction of the macroscopic gradi-ent, and macroscopic flux calculation with boundary treatment. The greendash block contains the components for microscopic particles which are storedin the doubly-linked list. There are frequent operations, such as tracking par-ticles and calculating the macroscopic fluxes. The insert/delete operation isefficient in the scenario of parallel transfer and sampling/elimination of theparticles. 17
GKWP solver
Pre/Post prosessor
Parallel IOMesh partition for MPIInitialization && Setupof boundary condition Sample and set particle statusSample particles from Compute macro flux and Sample inflow/outflow particlesMove particles and particle boundary treatmentUpdate the conservative variables
Calculate
Steady averageReconstruct the gradientof and Parallel data transfer Pack up particles for parallel transferCalculate the micro flux
Figure 2: The structure and main components of the UGKWP solver
The parallelization of the current code adopts Message Passing Inter-face (MPI) based on the physical mesh decomposition. Every MPI processdeals with a non-overlapping sub-domain, and the information like conserva-tive variables and particles are communicated with the neighboring domainthrough corresponding boundaries. Since the macroscopic solver has second-order accuracy, no padding area of sud-domain is required.The code has been tested on Tianhe-2, located in the National Supercom-puter Center in Guangzhou, China. Tianhe-2 contains 16,000 nodes that eachof which possesses one Intel Xeon E5-2692 12 Cores @2.2 GHz CPU and 88gigabytes of memory (64 used by the Ivy Bridge processors, and 8 gigabytesfor each of the Xeon Phi processors). The computing nodes of Tianhe-2 areinterconnected by TH Express-2 network.To test the parallel efficiency and scalability of the UGKWP, the code iscompiled using Intel C/C++ compiler of version 18.0.0 with -O3 optimizationflag, and are linked to the MPICH2 with a customized GLEX channel. Sincethe scaling is problem-specific (depending on Knudsen number and preferencenumber of particles per cells N ref ), multiple test cases and several factors18ffecting the performance will be analyzed in a series of three-dimensionallid-driven cavity flow tests. Firstly, only the parallel speedup of pure macroscopic field computationusing different MPI processes is measured to eliminate the computation ofparticles and communication of particle parallel transfer. Without the in-volvement of particle generation and particle transportation, the UGKWPdegenerates to the gas-kinetic scheme (GKS) for the continuum flow compu-tation [29]. As it becomes a deterministic solver, for simplicity, the Knudsennumber is fixed at 10 − in the following parallel computation. The averagedrunning time (wall clock time) of a single iteration step is measured, and themeasurement is ensured to be over 100 seconds, and no IO time is counted.To investigate the Amdahls law (strong scaling) at different fixed problemsize, the physical domain is discretized as D , where D is the number of cellsalong with each direction with the values 64 , ,
256 separately. To test theGustafsons law (weak scaling), we concern the speedup for a scaled problemsize to the number of processors. Hence, D = 64 on one node with 24 coresis chosen as the baseline, i.e., keeping the number of cells per processors as64 /
24, and the grid size increased simultaneously as the increment of thenumber of processors P . The corresponding speedup is measured based onthe averaged single-node simulation time, i.e., S P = 24 T p /T .Both strong and weak scaling analysis is plotted in fig. 3. The solidred line represents the ideal linear speedup. From the diagram, the overallcomputational time of the various physical grid sizes scales well with thenumber of processing cores (or MPI processes). Although strong scaling isvery sensitive towards the serial fraction of the program and the communica-tion overhead (e.g., synchronization) could further degrade performance, theworst efficiency still has 73 .
8% for the case D = 64 with P = 960 numberof processors. Moreover, it is observed that strong scaling performance in-creases considerably as the increase of grid size. The strong scaling parallelefficiency for the largest problem size D = 256 can reach 85 .
4% despite theusage of P = 1536 number of processing cores. Finally, the weak scaling isalso verified by increasing both the job size and the number of processingcores, and a satisfactory weak scaling efficiency up to 92% has been achievedeven with P = 1536 number of processing cores.19 umber of cores (P) S p ee dup ( S P ) IdealD = 64D = 128D = 256Weak scalling
Figure 3: Strong and weak scaling analysis without the involvement of particles
Next, the particles are included in the experiment to explore the scala-bility and parallel efficiency of the implementation. Problems with differentnumbers of cells D , Knudsen number Kn , and preference number of par-ticles per cell N ref are run across different nodes. The maximum problemsize is limited by the total memory available on each node, and no IO time isrecorded. The average computing time (wall clock time) for a single iterationstep is averaged from 1001 to 1100 steps to ensure sufficient running timeand reach a steady-state solution.The averaged CPU time (second) per step against the number of cores isshown in fig. 4. The results indicate that the scaling performance is good forall simulations because the CPU time decreases almost linearly as the increasein the number of processing cores. Actually, from table 1 and table 2, theparallel efficiency E P = S P /P for cases with D ≥
72 is over 86% even with P = 864 number of processing cores.Furthermore, several interesting patterns have been observed. First, forthe cases of both N ref = 50 and N ref = 500, we can observe that the absoluteCPU time raises not only as of the increment of grid size but also the Knud-sen number. The reason is that the mean free path of the particles becomeslarge at a high Knudsen number, which produces the unbalanced distributionof particles among different sub-domains as well as in the processing cores.Secondly, the scaling performance deteriorates as the shrinkage of grid size,especially in high-Knudsen number cases, because the proportion of commu-20ication time would increase as the number of cells per core declined, andthe uneven effect of particles distributed among processing cores would alsobe amplified. Accordingly, the worst parallel efficiency E = 68 .
7% is ob-served for the case D = 36 at Kn = 1 with respect to P = 864 number ofcores. Lastly, the parallel efficiency would increase as the reference numberof particles becomes larger.Another interesting phenomenon is that maximum parallel efficiency canbe greater than one. For instance, the maximum parallel efficiency observedis E = 123 .
2% and achieves at the cases of D = 72 , Kn = 10 − and N ref = 500 using P = 864 cores. Actually, the parallel efficiency in transitionregime Kn = 10 − is even higher than that in the near continuum regime Kn = 10 − with the same grid size D and reference number of particles N ref . Besides, the worst parallel efficiency is even larger than 86 .
3% for allcases at Kn = 10 − , − . This counterintuitive parallel efficiency in thetransition regime is probably due to the doubly-linked list data structure forstoring particles. In the transition or near continuum regime, the collisionbetween particles is intensive, and the frequent elimination/resampling ofparticles involves frequent delete/insert operation in memory. Nonetheless,the bottleneck caused by the implemented data structure can be alleviatedthrough the replacement of a sequence container, like the STL vector.In summary, intensive parallel tests of cavity flow show satisfactory strongand weak scaling performance of the UGKWP code. The current implemen-tation becomes a valuable tool for simulating complex flow problems acrossthousands of processing cores in parallel computation. The actual parallelefficiency might vary for particular simulation setup. Table 1: Parallel efficiency for cases with N ref = 50 on Tianhe-2 Nodes Cores D = 72 D = 120 Kn = 10 − Kn = 10 − Kn = 1 Kn = 10 − Kn = 10 − .
1% 101 .
7% 101 .
9% 103 .
2% 103 . .
8% 102 .
6% 97 .
4% 98 .
1% 102 . .
5% 101 .
7% 95 .
8% 97 .
1% 105 . .
2% 98 .
8% 85 .
9% 96 .
6% 112 . .
3% 91 .
2% 88 .
5% 94 .
3% 109 . umber of cores (P) C P U t i m e ( s ) D = 72, Kn = 10 D = 72, Kn = 10 D = 72, Kn = 1D = 120, Kn = 10 D = 120, Kn = 10 Ideal (a)
Number of cores (P) C P U t i m e ( s ) D = 72, Kn = 10 D = 72, Kn = 10 D = 72, Kn = 1D = 48, Kn = 1D = 36, Kn = 1
Ideal (b)
Figure 4: Performance scaling on Tianhe-2 where each node has 24 cores (a) N ref = 50and (b) N ref = 500Table 2: Parallel efficiency for cases with N ref = 500 on Tianhe-2 Nodes Cores D = 72 D = 48 D = 36 Kn = 10 − Kn = 10 − Kn = 1 Kn = 1 Kn = 11 24 100% 100% 100% 100% 100%4 96 101 .
2% 106 .
7% 96 .
4% 92% 92%8 192 100 .
3% 116 .
3% 96 .
5% 92 .
4% 86 . .
7% 119 .
3% 99 .
5% 87 .
2% 82 . .
1% 119 .
3% 98 .
2% 82 .
5% 72 . .
2% 94 .
8% 80 .
8% 68 . . Numerical examples In this section, the accuracy and computational efficiency of the UGKWPsolver will be evaluated through many test cases with a wide range of Knud-sen and Mach numbers. The numerical Sod shock tube problem in 3D,lid-driven cubic cavity flow, high-speed flow passing through a cube, and theflow around a space vehicle, are tested. The results are compared with thosefrom UGKS/DUGKS and DSMC. Without a special statement, the diffusiveboundary condition with full accommodation is applied for the isothermalwalls. The code is compiled with GCC version 7.5.0, and all computationsare carried out on a workstation with [Dual CPU] Intel R (cid:13) Xeon(R) Platinum8168 @ 2.70GHz with 48 cores and 270 GB memory unless indicated other-wise.
The Sod shock tube problem is simulated inside a square-column fordiatomic gas at different Knudsen numbers to validate the current UGKWPmethod, and the result is compared with the 1D UGKS solution.In this test case, the following non-dimensionalization is usedˆ ρ = ρρ ∞ , ˆ U = UC ∞ , ˆ V = VC ∞ , ˆ W = WC ∞ , ˆ T = TT ∞ , ˆ P = Pρ ∞ C ∞ , ˆ t = tt ∞ , ˆ x = xL , C ∞ = (cid:114) k B T ∞ m , t ∞ = LC ∞ , and the initial condition for the non-dimensional variables is( ˆ ρ, ˆ U , ˆ V , ˆ W , ˆ P ) = (cid:26) (1 , , , , , < ˆ x < . , (0 . , , , , . , . < ˆ x < . (47)For UGKWP simulation, the physical domain is a [0 , × [ − . , . × [ − . , . × × N ref = 200 , , , , Kn = 10 − , − , − , . , ,
10 respectively. Leastsquare reconstruction with Venkatakrishnan limiter is utilized. For the UGKSsimulation, the 1D physical domain [0 ,
1] is discretized uniformly with 100cells. Composed Newton-Cotes quadrature with 101 velocity points in range[ − ,
6] is fixed to discretize the one-dimensional velocity space. van Leer lim-iter is used for the reconstruction of both conservative variables and discrete23istribution function. The left and right boundaries are treated as far-field,and the others are treated as symmetric planes. The CFL number for bothUGKWP and UGKS simulation is 0 .
9, and the reference viscosity is given inEq.(5) with ω = 0 .
74. The results at the time t = 0 .
12 in all flow regimesare presented.The density, velocity, and temperature obtained by the UGKS and theUGKWP method at different Knudsen numbers are plotted in Figures 5 to 10,where the three-dimensional flow field computed by UGKWP is projectedto one-dimensional along the x-direction by taking ensemble average overthe cells on y-z plane. No time averaging is applied, and the statisticalnoise is satisfactory for this unsteady flow simulation. For all the cases indifferent flow regimes, the 3D UGKWP solutions agree well with the 1DUGKS data. The slight difference is due to different limiters, i.e., van Leerlimiter for UGKS and Venkatakrishnan limiter for UGKWP. The capabilityof the UGKWP method for numerical simulations in both continuum andrarefied regime is confirmed.The distinguishable feature of multi-efficiency[17] can also be demon-strated here. For UGKS, the computational costs for all Knudsen numbercases will be on the same order since its discretization of the particle velocityspace is the same. While for the UGKWP method, the computational costis reduced at small Knudsen number, e.g., Kn = 10 − in near continuumregime, where the hydrodynamic wave is dominant, and few particles aresampled and tracked. The computational cost of UGKWP for 3D simulationis admissible because only a few hundred or thousand particles are enoughto adaptively discretize the velocity space, whereas it becomes possible that101 mesh points in the velocity space may be required in DVM-based UGKSfor high speed flow. For steady-state simulation, the number of particles canbe reduced further since the statistical noise can be reduced through thetemporal ensemble. For low-speed flow, the UGKWP method is applied to study the three-dimensional lid-driven cubic cavity flow in the transition regime, and theresults are compared with the solution predicted by dugksFoam[16].The side length of the cubic cavity is L = 1 m with the computationaldomain [0 , × [0 , × [0 , hexa-hedrons with the cell size gradually increased towards to the cavity center.The ratio of the cell size in the center and the boundary is about 2. The24 oordinate X D e n s i t y UGKWP 3DUGKS 1D (a)
Coordinate X X V e l o c i t y U UGKWP 3DUGKS 1D (b)
Coordinate X T e m p e r a t u r e UGKWP 3DUGKS 1D (c)
Figure 5: Sod shock tube at Kn = 10 − . (a) Density, (b) X-Velocity U, and (c) Temper-ature. Coordinate X D e n s i t y UGKWP 3DUGKS 1D (a)
Coordinate X X V e l o c i t y U UGKWP 3DUGKS 1D (b)
Coordinate X T e m p e r a t u r e UGKWP 3DUGKS 1D (c)
Figure 6: Sod shock tube at Kn = 10 − . (a) Density, (b) X-Velocity U, and (c) Temper-ature. Coordinate X D e n s i t y UGKWP 3DUGKS 1D (a)
Coordinate X X V e l o c i t y U UGKWP 3DUGKS 1D (b)
Coordinate X T e m p e r a t u r e UGKWP 3DUGKS 1D (c)
Figure 7: Sod shock tube at Kn = 10 − . (a) Density, (b) X-Velocity U, and (c) Temper-ature. oordinate X D e n s i t y UGKWP 3DUGKS 1D (a)
Coordinate X X V e l o c i t y U UGKWP 3DUGKS 1D (b)
Coordinate X T e m p e r a t u r e UGKWP 3DUGKS 1D (c)
Figure 8: Sod shock tube at Kn = 0 .
1. (a) Density, (b) X-Velocity U, and (c) Temperature.
Coordinate X D e n s i t y UGKWP 3DUGKS 1D (a)
Coordinate X X V e l o c i t y U UGKWP 3DUGKS 1D (b)
Coordinate X T e m p e r a t u r e UGKWP 3DUGKS 1D (c)
Figure 9: Sod shock tube at Kn = 1. (a) Density, (b) X-Velocity U, and (c) Temperature. Coordinate X D e n s i t y UGKWP 3DUGKS 1D (a)
Coordinate X X V e l o c i t y U UGKWP 3DUGKS 1D (b)
Coordinate X T e m p e r a t u r e UGKWP 3DUGKS 1D (c)
Figure 10: Sod shock tube at Kn = 10. (a) Density, (b) X-Velocity U, and (c) Tempera-ture. U w = 50 m/s , while the other walls are kept fixed. All side-walls have the diffusive boundary condition and keep a uniform temperature T w = 273 K . The cavity is assumed to consist of monatomic argon gas withmolecular mass m = 6 . × − kg and diameter d = 4 . × − m . TheKnudsen number is Kn = λ/L = 0 . λ is cal-culated from the initial uniform gas density by λ = m/ ( √ πd ρ ). The gasviscosity depends on the temperature by Eq. (5) with reference temperature T ref = T w = 273 K and reference viscosity µ ref given by variable hard sphere(VHS) model with ω = 0 . N ref =5000 reference number of simulation particles is used. The time-averagingis starting from 1000 steps in order to reduce the statistical noises of highmoments quantities, such as the temperature. The CFL number is set tobe 0 .
95, and the least square reconstruction with Venkatakrishnan limiter isemployed for the gradient calculation. Physical space parallelization with 48cores is adopted for UGKWP.In the dugksFoam simulation, the three-dimensional velocity space is dis-credited using 28 half-range Gauss-Hermit quadrature points in each direc-tion. The CFL number is set to be 0 .
8. The gradients are calculated by leastSquare. The Prandtl number is fixed as
P r = 1 . U and V ), match well. For the temper-ature, as a higher moment quantity, the UGKWP solutions generally agreewith that of dugksFoam, but still exhibit relatively large statistical noise,although a long time averaging has been performed in UGKWP. It is alsonoteworthy that the noise incurred by three-dimensional particles in realthree-dimensional simulation is larger than that in the two-dimensional sim-ulation with particles without Z-direction velocity, e.g., 2D Cavity flow[18].The computational time for dugksFoam is around 154 . . × − . The UGKWP solu-27ion takes 93 . . Figure 11: Comparison of the temperature iso-surfaces predicted by DUGKS (left) andUGKWP (right).
The first case is a supersonic rarefied gas flow passing through a cubeat
M a = 2 and Kn = 1. The cube center is located at (0 , , m . The surfaces of the cube are diffusive wall boundarycondition with a constant temperature T w = 273 K . Due to the symmetry,only a quadrant of the cube is simulated by UGKWP. The computationaldomain [ − , × [0 , × [0 ,
8] is discretized by (32+14+34) × (7+34) × (7+34) =28 Z Density1.21E 061.17E 061.16E 061.15E 061.15E 061.14E 061.14E 061.12E 061.11E 061.08E 06 X Z (a) X Z Temperature X Z (b) X Z U2522.52017.51512.5107.552.50 2.5 5 X Z (c) X Z W1086420 2 4 6 8 10
8 6 6 4 4 2 2022 44 6810 X Z (d) Figure 12: Symmetric X-Z cut-plane contour of cavity flow at Kn = 0 . × ×
41 cells with uniformly distributed grids on the surface of the cube.The cell size is stretched from the cube surface with a ratio of 1 . .
083 at the rear and lateral sides of the cube.The inflow is monatomic argon gas with molecular mass m = 6 . × − kg and diameter d = 4 . × − m . The CFL number for UGKWP simulationis 0 .
9, and the reference viscosity is given by the variable hard sphere (VHS)model with ω = 0 .
81. To capture the lowest temperature that appearsat the rear of the cube caused by the expanding flow, a large number ofparticles N ref = N min = 5000 is required. The simulation is carried out withfirst-order spatial accuracy to reduce the noises caused by unreliable wavereconstruction. The time-averaging starts from 1400 steps with an initialfield computed by 1000 steps GKS. The simulation runs 120 . × ×
91. Each cell has 50 particles on average, and thetime step is 2 . × − s . The averaging begins from 1000 steps and continuesfor 68000 steps, which take 128 . τ (cid:29) ∆ t , the simulation particle increases,and the stochastic noise becomes significant. Instead of choosing collisionpairs in DSMC, UGKWP re-sample the collisional particles according to thecell averaged temperature and macroscopic velocity. When the temperaturehas a small variance, such as in the low-speed cavity flow, the inadequate30article number could deteriorate the temperature with noise. Then, the re-sampled collisional particles even inherit the inaccuracy and poison the low-temperature area at the rear part of the cube, where artificial heating withover-estimated temperature appears. Even though with the above weakness,UGKWP can perform simulation on a coarse mesh than that used in theDSMC. Consequently, UGKWP and DSMC have comparable computationalcost in the rarefied regime.In DVM method [33], the implicit discretization with memory reductiontechnique on GPU is implemented. The full cube is simulated with a physicalgrid 191 × ×
181 and a velocity grid 48 . The velocity points are distributeduniformly to cover a range of [ − √ RT w , √ RT w ] , and the trapezoidal ruleis used to calculate the moments. The simulation takes approximately 20hours, with 41 iteration steps on the Tesla K40 GPU. Figure 14 shows thedetailed comparisons of the temperature, density, and velocities distributionson the X-Z symmetry plane with DVM solutions. Similar to the comparisonwith DSMC, the shock thickness and the separation distance between thedensity and temperature profiles have small variations between UGKWPand implicit DVM solutions, where the Shakhov model is used in DVM.Again, the UGKWP uses the BGK model. The differences in the shockstructure solution between the BGK and Shakhov model have been presentedin [35]. As for computational time, owing to the implicit treatment andimplementation of GPU acceleration, the DVM simulation is around an orderfaster than the current UGKWP simulation at such a low Mach number M a = 2.
To highlight the efficiency and capability of UGKWP, hypersonic flow at
M a = 20 is simulated in the transition regime ( Kn = 0 . K , and the wall temperature is 300 K . The cube is contained in a volumewith base and top side lengths of a = 16 m, b = 10 m , and height h = 14 m .As shown in Figure 15, a unstructured mesh with total 420702 cells is gener-ated, which is composed of 8305 hexahedra, 52 prisms, 25030 pyramids and387315 tetrahedra with a minimum cell height 0 . m near the cube wall.Distinguishable from the rarefied case, the number of particles required dropsdramatically. Here, the reference and minimum number of particles per cell N ref = N min = 400 are used. The simulation is conducted with the leastsquare reconstruction and Barth and Jespersen limiter. An initial field is31 Z Temperature519.0491.7464.4437.0409.7382.4355.1327.8300.4273.1245.8 (a) X Z Density2.7E 071.8E 071.2E 071.0E 079.3E 088.9E 088.5E 087.9E 086.5E 084.8E 082.3E 085.1E 09 (b) X Z U600.0556.7513.3470.0426.7383.3340.0296.7253.3210.0166.7123.380.0 (c) X Z W100.077.354.531.89.1 13.6 36.4 59.1 81.8 104.5 127.3 150.0 (d)
Figure 13: Comparison of distributions between UGKWP and DSMC on the X-Z symmet-ric cut-plane at
M a = 2 and Kn = 1. Dashed red lines with colored background representthe UGKWP result, and the solid white lines denote the DSMC solution. (a) temperaturecontour, (b) density contour, (c) contour of U (X-component velocity) and (d) contour of W (Z-component velocity). Z Temperature519.0491.7464.4437.0409.7382.4355.1327.8300.4273.1245.8 (a) X Z Density2.7E 071.6E 071.1E 079.7E 089.0E 088.9E 087.8E 086.5E 084.4E 082.5E 086.3E 09 (b) X Z U600.0556.7513.3470.0426.7383.3340.0296.7253.3210.0166.7123.380.0 (c) X Z W100.077.354.531.89.1 13.6 36.4 59.1 81.8 104.5 127.3 150.0 (d)
Figure 14: Comparison of distributions between UGKWP and DVM on the X-Z symmetriccut-plane at
M a = 2 and Kn = 1. Dashed red lines with colored background representthe UGKWP result, and the solid white lines denote the DVM solution. (a) temperaturecontour, (b) density contour, (c) contour of U (X-component velocity) and (d) contour of W (Z-component velocity). . . XYZ (a)
XYZ (b)
Figure 15: Unstructured mesh configuration at
M a = 20 and Kn = 0 .
05 (a) Full viewand (b) local enlargement
Figure 16 shows the distributions of temperature, density, and velocitieson the X-Z symmetry plane. The hypersonic flow computation in the tran-sition regime is a challenge for both stochastic and deterministic methods.For the DSMC, an extremely fine mesh in physical space is required. For theDVM-type deterministic solvers, a tremendous amount of discrete velocitypoints becomes necessary. The UGKWP is an idealized method for the hy-personic flow in all flow regimes. Due to the high Mach number, even with alow inflow temperature of 56 K , the maximum temperature inside the shockregion can get to 6500 K and over. In the future, the physics associating withhigh temperature, such as ionization and chemical reaction, will be added inUGKWP.To further illustrate the multiscale nature of the simulation, the localKnudsen number [36] based on the gradient Kn GLL = l |∇ ρ | /ρ for the abovetwo cases are presented in fig. 17, which presents five orders of magnitudedifference.The decline of parallel efficiency in the rarefied regime, as presented insection 3.2, can be visualized through the averaged number of particles per34 Z
4 2 0 2 4 6 8 6 4 20246
Temperature65005983.335466.6749504433.333916.6734002883.332366.6718501333.33816.66730056 (a) X Z
4 2 0 2 4 6 8 6 4 20246
Density5.16E 063.48E 062.82E 062.70E 062.39E 061.82E 061.36E 069.15E 075.09E 074.60E 073.80E 072.37E 075.80E 08 (b) X Z
4 2 0 2 4 6 8 6 4 20246
U28002600240022002000180016001400120010008006004002000 (c) X Z
4 2 0 2 4 6 8 6 4 20246
W7006005004003002001000 100 200 300 400 500 600 700 (d)
Figure 16: Symmetric X-Z cut-plane contour of various flow fields at at
M a = 20 and Kn = 0 .
05. (a) temperature contour, (b) density contour, (c) contour of U (X-componentvelocity) and (d) contour of W (Z-component velocity). Z
6 4 2 0 2 4 6 8012345678 Kn GLL (a) X Z
4 2 0 2 4 6 80246 Kn GLL (b)
Figure 17: Local Knudsen number contour on symmetric X-Z cut-plane at (a)
M a = 2 and Kn = 1 and (b) M a = 20 and Kn = 0 . cell in the simulation. Figure 18 shows the distribution of normalized particlenumber per cell N/N ref on symmetric X-Z cut-plane at
M a = 2 , Kn = 1and
M a = 20 , Kn = 0 .
05. The probability of particle collision in the cellbecomes lower with the increment of τ / ∆ t . Therefore, the particle tends tokeep free streaming and concentrate in the rearward of the computationaldomain, especially in the cell with a large volume. This mechanism causesan imbalance in the distributions of particles across different CPU cores.Nevertheless, this problem can be mitigated by implementing dynamic loadbalancing as used in the DSMC implementation. The last example is hypersonic flow at Mach numbers 6 and 10 over aspace vehicle in the transition regimes Kn = 10 − . This case shows theefficiency and capability of UGKWP for simulating three-dimension hyper-sonic flow over complex geometry configuration. The angle of attack is 20 ◦ degrees in this case. As seen in fig. 19, the unstructured mesh of 560593cells consists of 15277 pyramids and 545316 tetrahedra with minimum cellheight 0 . L near the front of the vehicle surface. The reference length forthe definition of Knudsen number is L = 0 . m . The boundary conditionon the vehicle surface is a diffusive one, on which the temperature maintainsat T w = 300 K . Due to the symmetry, only half of the vehicle is simulated.36 Z
6 4 2 0 2 4 6 8012345678
N/N ref (a) X Z N/N ref (b)
Figure 18: Distribution of normalized particle number per cell on symmetric X-Z cut-planeat (a)
M a = 2 and Kn = 1 and (b) M a = 20 and Kn = 0 . The inflow is monatomic argon gas with molecular mass m = 6 . × − kg and diameter d = 4 . × − m at T ∞ = 300 K . The CFL number for thesimulation is 0 .
95, and the reference viscosity is given by the variable hardsphere (VHS) model with ω = 0 .
81. The least square reconstruction withVenkatakrishnan limiter is used in the simulation.Figure 20 presents the distribution of temperature, heat flux, pressure,local Knudsen number, and streamlines around the vehicle at Mach number6. Figure 21 shows the solutions at Mach number 10. Even the free-streamKnudsen number is relatively small, no vortex flow is observed in the rearpart of the vehicle, see figs. 20d and 21d, which is observed in the sim-ulation of near continuum flow [1]. Meanwhile, from figs. 20c and 21c, thedensity-based local Knudsen number Kn GLL can cover a wide range of valueswith five orders of magnitude difference. Therefore, a multi-scale method,like UGKWP, is necessary to capture the flow physics in different regimescorrectly. As presented in figs. 20a and 21a, a high-temperature region isdetected at the leeward side despite the low intensity of heat exchange uponvehicle surface. This is mainly caused by particle collisions in the strongrecompression region with a relatively low free-stream Knudsen number.As for the computational cost, for the
M a = 6 case, the initial fieldis obtained by GKS with 6000 local time stepping, and the time-averagingstarts after 12000 steps of UGKWP computation. The simulation runs 24 . M a = 10, N ref = N min = 400 particles is used.The simulation is conducted on Tianhe-2 with 8 nodes or 192 cores, andit takes 22 . (a) (b) Figure 19: Surface mesh of space vehicle (a) local enlargement, (b) global view
5. Conclusion and further improvements
In this paper, a unified gas-kinetic wave-particle (UGKWP) method isconstructed on three-dimensional unstructured mesh with parallel computingon supercomputer. The scheme is validated for flow simulation in both con-tinuum and rarefied regimes at different flow speeds. Compared with otherpopular flow solvers, such as the DSMC method and the deterministic DVM-based Boltzmann solver, the UGKWP has multiscale property and is efficientin simulating 3D supersonic/hypersonic flow, especially in the transition andnear continuum flow regime. However, for practical engineering applications,further optimization and extension have to be implemented in the code tomake it more efficient and comprehensive. It is expected that the dynamicload balancing implementation can enhance the parallel efficiency in rarefied38 a) (b)(c) (d)
Figure 20: Space vehicle at
M a = 6 and Kn = 10 − . (a) Temperature and surfacedistribution of heat flux, (b) Pressure distribution, (c) Kundsen number distribution, (d)Streamlines color by magnitude of velocity. a) (b)(c) (d) Figure 21: Space vehicle at
M a = 10 and Kn = 10 − . (a) Temperature and surfacedistribution of heat flux, (b) Pressure distribution, (c) Kundsen number distribution, (d)Streamlines color by magnitude of velocity. Author’s contributions
All authors contributed equally to this work.
Acknowledgments
This work was supported by Hong Kong research grant council (16206617),National Natural Science Foundation of China (Grant Nos. 11772281, 91852114),and the National Numerical Windtunnel project.
Data Availability
The data that support the findings of this study are available from thecorresponding author upon reasonable request.
ReferencesReferences [1] D. Jiang, M. Mao, J. Li, X. Deng, An implicit parallel ugks solver forflows covering various regimes, Advances in Aerodynamics 1 (1) (2019)8.[2] G. A. Bird, J. Brady, Molecular gas dynamics and the direct simulationof gas flows, Vol. 5, Clarendon press Oxford, 1994.413] M. A. Gallis, Stochastic PArallel Rarefied-gas Time-accurate Analyzer.,Tech. rep., Sandia National Lab.(SNL-NM), Albuquerque, NM (UnitedStates) (2015).[4] T. Scanlon, E. Roohi, C. White, M. Darbandi, J. Reese, An open source,parallel DSMC code for rarefied gas flows in arbitrary geometries, Com-puters & Fluids 39 (10) (2010) 2078–2089.[5] C. White, M. K. Borg, T. J. Scanlon, S. M. Longshaw, B. John, D. Emer-son, J. M. Reese, dsmcFoam+: An OpenFOAM based direct simulationMonte Carlo solver, Computer Physics Communications 224 (2018) 22–43.[6] H. G. Weller, G. Tabor, H. Jasak, C. Fureby, A tensorial approach tocomputational continuum mechanics using object-oriented techniques,Computers in physics 12 (6) (1998) 620–631.[7] C. J. Greenshields, OpenFOAM - The Open Source CFD Toolbox-UserGuide, OpenFOAM Foundation Ltd 2 (0).[8] The OpenFOAM Foundation, Openfoam, [Online; accessed 12-January-2020] (2020).URL https://openfoam.org/https://openfoam.org/