[PDF] A three-dimensional unified gas-kinetic wave-particle solver for flow computation in all regimes

Abstract

In this paper, the unified gas-kinetic wave-particle (UGKWP) method has been constructed on three-dimensional unstructured mesh with parallel computing for multiscale flow simulation. Following the direct modeling methodology of the unified gas-kinetic scheme (UGKS), the UGKWP method models the flow dynamics uniformly in different regime and gets the local cell's Knudsen number dependent numerical solution directly without the requirement of kinetic scale cell resolution. The UGKWP method is composed of evolution of deterministic wave and stochastic particles. With the dynamic wave-particle decomposition, the UGKWP method is able to capture the continuum wave interaction and rarefied particle transport under a unified framework and achieves the high efficiency in different flow regime. The UGKWP flow solver is validated by many three-dimensional test cases of different Mach and Knudsen numbers, which include 3D shock tube problem, lid-driven cavity flow, high-speed flow passing through a cubic object, and hypersonic flow around a space vehicle. The parallel performance has been tested on the Tianhe-2 supercomputer, and reasonable parallel performance has been observed up to one thousand core processing. Due to wave-particle formulation, the UGKWP method has great potential in solving three-dimensional multiscale transport with the co-existence of continuum and rarefied flow regimes, especially for the high-speed rarefied and continuum flow around space vehicle in near space flight.

Full PDF

AA three-dimensional uniﬁed gas-kinetic wave-particlesolver for ﬂow computation in all regimes

Yipei Chen a , Yajun Zhu c , Kun Xu a,b, ∗ a Department of Mathematics, Hong Kong University of Science and Technology, HongKong, China b Department of Mechanical and Aerospace Engineering, Hong Kong University ofScience and Technology, Hong Kong, China c National Key Laboratory of Science and Technology on Aerodynamic Design andResearch, Northwestern Polytechnical University, Xian, Shaanxi 710072, China

Abstract

In this paper, the uniﬁed gas-kinetic wave-particle (UGKWP) method hasbeen constructed on three-dimensional unstructured mesh with parallel com-puting for multiscale ﬂow simulation. Following the direct modeling method-ology of the uniﬁed gas-kinetic scheme (UGKS), the UGKWP method mod-els the ﬂow dynamics uniformly in diﬀerent regime and gets the local cell’sKnudsen number dependent numerical solution directly without the require-ment of kinetic scale cell resolution. The UGKWP method is composed ofevolution of deterministic wave and stochastic particles. With the dynamicwave-particle decomposition, the UGKWP method is able to capture thecontinuum wave interaction and rareﬁed particle transport under a uniﬁedframework and achieves the high eﬃciency in diﬀerent ﬂow regime. TheUGKWP ﬂow solver is validated by many three-dimensional test cases ofdiﬀerent Mach and Knudsen numbers, which include 3D shock tube prob-lem, lid-driven cavity ﬂow, high-speed ﬂow passing through a cubic object,and hypersonic ﬂow around a space vehicle. The parallel performance hasbeen tested on the Tianhe-2 supercomputer, and reasonable parallel per-formance has been observed up to one thousand core processing. Due towave-particle formulation, the UGKWP method has great potential in solv-ing three-dimensional multiscale transport with the co-existence of contin- ∗ Corresponding author.

Email address: [email protected] (Kun Xu)

Preprint submitted to Physics of Fluids July 28, 2020 a r X i v : . [ phy s i c s . c o m p - ph ] J u l um and rareﬁed ﬂow regimes, especially for the high-speed rareﬁed andcontinuum ﬂow around space vehicle in near space ﬂight. Keywords:

Uniﬁed wave-particle method, Multiscale transport, Rareﬁedand continuum ﬂow simulation, Hypersonic ﬂow

1. Introduction

Non-equilibrium ﬂow appears in a wide range of applications, such as there-entry of spacecraft in upper planetary atmospheres, vacuum devices, andﬂuid-structure interaction in Microelectromechanical systems. For example,for a vehicle in a near-space ﬂight at Mach number 6 and Reynolds number5000, the local Knudsen number deﬁned by Kn local = l |∇ ρ | /ρ with the meanfree path l can cover a wide range of values with ﬁve orders of magnitudediﬀerence[1]. To simulate such a multiscale problem, it requires numericalalgorithm to capture both equilibrium and non-equilibrium ﬂow in diﬀerentregime, such as the hydrodynamic regime in the highly compressible leadingedge, the whole transition regime across the vehicle surface, and the rareﬁedregime in the highly expanding trailing edge.The well-known Euler and Navier-Stokes-Fourier (NSF) equations are hy-drodynamic equations, which are valid for the continuum ﬂow. They becomeinaccurate in the near continuum and transition regimes. On the other hand,the Boltzmann equation models the gas dynamics in the kinetic scale of par-ticle mean free path and collision time. The solution in all ﬂow regimescan be obtained by solving the Boltzmann equation under the kinetic scaleresolution. However, the high-dimensionality of the equation, nonlinearityof collision term, and its integro-diﬀerential nature make the deterministicBoltzmann solver extremely expensive in memory requirement and compu-tational cost. Instead, for practical high-speed non-equilibrium ﬂow compu-tation, the direct simulation Monte Carlo (DSMC) method [2] becomes themain choice due to its high eﬃciency for solving the Boltzmann equation fromthe stochastic particle approach. Similar to the modeling in the derivationof the Boltzmann equation, the separation of particle transport and collisionin DSMC enforces the numerical mesh size and time step to be less than theparticle mean free path and collision time, i.e., the so-called kinetic scale ofDSMC modeling.Developed at Sandia National Laboratories, stochastic parallel rareﬁed-gas time-accurate analyzer (SPARTA)[3] is an open source 2 & 3D DSMC2imulator optimized for exascale parallel computing and embed with bothstatic and dynamic load balancing across processors. Particles in SPARTAadvect through a hierarchical oct-tree based Cartesian grid that overlays thesimulation box. Additionally, dsmcFoam[4] and its upgrade release dsmcFoam+[5]have been developed within the framework of OpenFOAM[6–8], which no-tably features with dynamic load balancing on arbitrary 2D/3D polyhedronmesh, molecular vibrational and electronic energy modes, chemical reac-tions and gravitational force. Other DSMC codes such as MONACO[9],SMILE[10], DAC[11] with diﬀerent mesh topologies and collision treatmentscan be found in the literature. Apart from stochastic solvers, the deter-ministic numerical scheme for Boltzmann and kinetic model equations withdiscrete velocity points have been extensively studied in the last severaldecades. Nesvetay-3D[12] is an implicit solver on unstructured mesh de-veloped by Titarev et al. with both physical and velocity space decomposedparallelization. Recently, Zhu et al. has implemented discrete uniﬁed gas ki-netic scheme (DUGKS)[13, 14] with the Shakhov collision model[15] nameddugksFoam[16]. Unlike the traditional DVM method, DUGKS is a multiscalesolver, and the time step is not restricted by the particle collision time dueto the coupled treatment of particle transport and collision. Besides the tra-ditional physical space decomposition parallel strategy, dugksFoam featuresa parallel computing ability based on the velocity space decomposition.The recently developed uniﬁed gas-kinetic wave-particle (UGKWP) methodis a multiscale method for all ﬂow regimes[17, 18], and is used in othermultiscale transport simulation as well, such as photon transport[19]. TheUGKWP is constructed under the uniﬁed gas-kinetic scheme (UGKS) frame-work [20]. Instead of using discrete velocity method in UGKS, the UGKWPuses both hydrodynamic wave and stochastic particles to model the ﬂow evo-lution, where a time and scale-dependent ﬂux function is constructed throughthe coupled wave and particle transport across a cell interface in order to up-date both macroscopic ﬂow variables and microscopic gas distribution func-tion inside each control volume. Due to the adaptive wave-particle decom-position, the hydrodynamic equilibrium ﬂow and the kinetic non-equilibriumparticle free transport can be simulated eﬃciently by their separate repre-sentations and their dynamical coupling according to the local cell Knudsennumber, i.e., Kn c = τ / ∆ t with the particle relaxation time τ and numericaltime step ∆ t . The UGKWP method has uniﬁed preserving property[21] topresent the physical solution in all ﬂow regime from the kinetic scale trans-port to the Navier-Stokes wave propagation without the constraint on the3umerical cell size and time step being less than the particle mean free pathand collision time.The particle number in UGKWP is proportional to exp ( − /Kn c ), whichis a function of the cell Knudsen number. In the continuum ﬂow regime, theparticle number will be signiﬁcantly reduced due to the small cell Knudsennumber. In the highly rareﬁed regime, similar to DSMC, the particle willplay a dominant role in the ﬂow evolution with the association of statisti-cal noise. The steady-state solution can be obtained from the averaging oftime-accurate evolution solution. Due to the decoupled treatment of parti-cle transport and collision, DSMC requires the cell size to be a fraction ofthe particle mean free path and becomes very expensive in the transitionand near continuum ﬂow regime. In contrast, the UGKS and UGKWP haveno such requirement with coupled particle transport and collision process inthe gas evolution and ﬂux construction[22]. Moreover, the DSMC methodhandles the collision process by selecting particle collision pairs. In the lowKnudsen number case, intensive collisions have to be dealt with, which makeDSMC impractical in the near continuum ﬂow simulation, such as the ﬂy-ing vehicle at an altitude below 80 km . In contrast, the UGKWP methodremoves collisional particles and re-samples them from the updated macro-scopic ﬂow variables. The modeling of collective eﬀect of particles’ collisionwithin a time step in UGKWP doesn’t require the time step and cell size tobe less than the particle collision time and mean free path. As a result, theUGKWP is suitable for multiscale ﬂow computation. Furthermore, with theimplementation of particles in UGKWP, the ray eﬀect[23], which is observedin DVM-type schemes in the highly non-equilibrium regime due to the in-adequate numerical resolution in the particle velocity space, can be totallyavoided. A series of 1D/2D test cases covering a wide range of Mach andKnudsen numbers have been conducted extensively to validate the scheme[17, 18]. In this work, the UGKWP method will be extended to 3D unstruc-tured mesh with parallel computing via spatial decomposition, and the codeis potentially applicable to 3D ﬂow simulation with arbitrary geometries inall ﬂow regimes. The development from a 2D to a 3D code makes great eﬀortto solve all problems related to the complex geometry, reconstruction, mul-tidimensional ﬂux from wave-particle decomposition, particle tracking, andparallelization.The rest of the paper is organized as follows. In section 2, the numer-ical procedure of the UGKWP method on unstructured mesh is presented.Section 3 covers the construction of 3D UGKWP on parallel framework. In4ection 4, several numerical examples, including the 3D Sod shock tube in-side a square-column, Lid-driven cubic cavity ﬂow, and the high-speed ﬂowaround a cube and space vehicle, will be computed to demonstrate the per-formance of the current algorithm in multiscale ﬂow simulation. Conclusionand further developments are given in the last section.

2. 3D Uniﬁed gas-kinetic wave-particle method

In this section, the uniﬁed gas-kinetic wave-particle (UGKWP) methodwill be introduced. In the classical kinetic theory, the Boltzmann equationreads ∂f∂t + u · ∇ x f = Q ( f ) , (1)where f ( x , u , t ) is the gas distribution function, which depends on the particlevelocity u ∈ R , physical space position x ∈ R , and time t ∈ R + . Q ( f ) is thenonlinear Boltzmann collision operator. In many applications, the collisionterm is usually simpliﬁed by other relaxation-type collision models S ( f ), suchas Bhatnagar-Gross-Krook (BGK)[24], the ellipsoidal statistical BGK (ES-BGK)[25], and the Shakhov model[15]. In general, the kinetic model can bewritten as ∂f∂t + u · ∇ x f = S ( f ) . (2)Throughout this paper, the BGK relaxation model f t + u · ∇ x f = g − fτ (3)will be used to construct the UGKWP method. Here τ denotes the relaxationtime, which is related the dynamic viscosity coeﬃcient µ and the pressure p ,i.e., τ = µ/p . The local equilibrium state g is the Maxwellian distribution g = ρ (cid:18) λπ (cid:19) K exp[ − λ (( u − U ) + ξ )] , (4)with density ρ , macroscopic velocity U , internal degree of freedom K , andthe internal variable ξ = ( ξ , . . . , ξ K ). λ is related to the temperature T by λ = m/ (2 k B T ) = 1 / (2 RT ). Here, m and k B represent the molecular massand the Boltzmann constant, respectively. R = k B /m is the speciﬁc gas5onstant. Typically, the relaxation parameter in the kinetic model can becalculated through τ = µp = µ ref p (cid:18) TT ref (cid:19) ω , (5)where µ ref , T ref are the reference viscosity coeﬃcient and temperature, and ω is power index, which related to Variable Hard Sphere (VHS) or VariableSoft Sphere (VSS) Models. The uniﬁed scheme is direct modeling in the discretized space (cid:80) i Ω i ⊂ R and time t n ∈ R + [22]. The cell averaged conservative ﬂow variables W i =( ρ i , ( ρ U ) i , ( ρE ) i ) on a physical cell Ω i is deﬁned as W i = 1 | Ω i | (cid:90) Ω i W ( x )d x , (6)and the cell averaged distribution function f i on physical cell Ω i is deﬁned as f i = 1 | Ω i | (cid:90) Ω i f ( x )d x . (7)In terms of conservative ﬂow variables, from t n to t n +1 on cell Ω i , the dis-cretized conservation laws for W i and f i are W n +1 i = W ni − | Ω i | (cid:88) j ∈ N ( i ) F ij | S ij | , (8)and f n +1 i = f ni − | Ω i | (cid:88) j ∈ N ( i ) F ij | S ij | + (cid:90) t n +1 t n S ( f i )d t, (9)where N ( i ) denotes the set of the interface-adjacent neighboring cells of cell i , and cell j is one of the neighbors. The interface between cells i and j isrepresented by the subscript ij . Hence, | S ij | and n ij are referred to the areaof the interface ij and the unit normal vector of the interface ij pointing fromcell i to cell j . F ij and F ij denotes the macroscopic and microscopic ﬂuxesacross the interface, respectively. | Ω i | is the volume of cell i , ∆ t = t n +1 − t n denotes the discretized time step. 6t should be noted that Eqs.(8) and (9) are the fundamental physical lawson the scale of mesh size and time step, which describe the conservations ofmacroscopic ﬂow variables and microscopic gas distribution function. Themacroscopic conservative ﬂow variables, their ﬂuxes, and the ﬂux for theparticle transport are related to the moments of the gas distribution functionthrough W i = (cid:90) f i ψ dΞ , (10) F ij = (cid:90) t n +1 t n (cid:90) u · n ij f ij ( t ) ψ dΞd t, (11)and F ij = (cid:90) t n +1 t n u · n ij f ij ( t )d t, (12)where f ij ( t ) is the time-dependent distribution function on the cell interface, ψ = (1 , u , ( u + ξ )) is collision invariants, dΞ = d u d ξ , d u = d u d v d w , andd ξ = d ξ d ξ · · · d ξ K . The BGK relaxation term satisﬁes the compatibilitycondition (cid:90) S ( f ) ψ dΞ = (cid:90) g − fτ ψ dΞ = (13)for the mass, momentum, and energy conservations during the particle colli-sion process.The multiscale ﬂow evolution in the uniﬁed algorithm relies on the con-struction of the ﬂux function at the cell interfaces. The time-dependent gasdistribution function f ij ( t ) couples particle free streaming and collision de-termines the ﬂow physics in diﬀerent regime, which is based on the integralsolution of the BGK model f ( x , t ) = 1 τ (cid:90) tt g ( x (cid:48) , t (cid:48) ) e − ( t − t (cid:48) ) /τ d t (cid:48) + e − ( t − t ) /τ f ( x − u ( t − t )) , (14)where x is the point for the evaluation of the local gas distribution function, x (cid:48) = x − u ( t − t (cid:48) ) is the particle trajectory. Typically, x is denoted as x ij , thecenter of a cell interface for ﬂux evaluation. f ( x ) is the initial distributionfunction around x at the beginning of each step t = t n , and g ( x , t ) is theequilibrium state distributed around x and t . Speciﬁcally, for second-orderaccuracy, with transformation t = t − t , x = x − x , the local expansionsare g ( x , t ) = g + g x · x + g t t, (15)7nd f ( x ) = f + f x · x . (16)The time-dependent distribution function at the center of cell interface x ij can be constructed as f ij ( t ) = c g + c g x · u + c g t (cid:124) (cid:123)(cid:122) (cid:125) f eqij ( t ) + c f + c f x · u (cid:124) (cid:123)(cid:122) (cid:125) f frij ( t ) , (17)with the coeﬃcients c = 1 − e − t/τ ,c = te − t/τ − τ (1 − e − t/τ ) ,c = t − τ (1 − e − t/τ ) ,c = e − t/τ ,c = − te − t/τ . (18)Note that f eqij ( t ) and f frij ( t ) are the terms related to the evolution of the localequilibrium state g ( x , t ) and the initial distribution function f ( x ), respec-tively. The initial gas distribution function f in Eq. (17) is reconstructedfrom the updated gas distribution function at t n , which has the form f ( x ij , t ) = (cid:40) f ni + ( ∇ x f ) ni · ( x ij − x i ) − u · ( ∇ x f ) ni ( t − t n ) , n ij · u ≥ ,f nj + ( ∇ x f ) nj · ( x ij − x j ) − u · ( ∇ x f ) nj ( t − t n ) , n ij · u < , (19)where f ni and f nj are the initial distribution functions at neighboring cellsaround the cell interface ij . Here ( ∇ x f ) ni is the spatial gradient of the initialdistribution function inside the cell i and can be reconstructed via leastsquare with Venkatakrishnan’s limiter[26] or Barth and Jespersen limiter[27].The local equilibrium state g in Eq. (17) is computed from the compat-ibility condition W = (cid:90) g ψ dΞ = (cid:90) f ψ dΞ , (20)and the spatial and temporal derivatives of the equilibriums state can beobtained through the micro-macro relationship W x = (cid:90) g x ψ dΞ = (cid:90) f x ψ dΞ , W t = − (cid:90) u · g x ψ dΞ . (21)8quations (14) and (17) present a transition process from the initial non-equilibrium distribution function to the equilibrium one with the incrementof particle collision. It shows an evolution process from the kinetic to the hy-drodynamic scale, and the real solution depends on the local parameter τ / ∆ t ,i.e., the local cell Knudsen number. Speciﬁcally, the integrated microscopicﬂux over a time step gives F ij = (cid:90) ∆ t u · n ij f ij ( t )d t = u · n ij ( q g + q g x · u + q g t ) (cid:124) (cid:123)(cid:122) (cid:125) F eqij + u · n ij ( q f + q f x · u ) (cid:124) (cid:123)(cid:122) (cid:125) F frij , (22)where F eqij and F frij are the equilibrium microscopic ﬂux and the free trans-port microscopic ﬂux, respectively. Similarly, the macroscopic ﬂuxes for con-servative variables are splitting into the equilibrium ﬂux F eqij and the freestreaming ﬂux F frij F ij = (cid:90) ∆ t (cid:90) u · n ij f ij ( t ) ψ dΞd t = (cid:90) F eqij ψ dΞ (cid:124) (cid:123)(cid:122) (cid:125) F eqij + (cid:90) F frij ψ dΞ (cid:124) (cid:123)(cid:122) (cid:125) F frij , (23)with the coeﬃcients q = ∆ t − τ (1 − e − ∆ t/τ ) ,q = 2 τ (1 − e − ∆ t/τ ) − τ ∆ t − τ ∆ te − ∆ t/τ ,q = ∆ t − τ ∆ t + τ (1 − e − ∆ t/τ ) ,q = τ (1 − e − ∆ t/τ ) ,q = τ ∆ te − ∆ t/τ − τ (1 − e − ∆ t/τ ) . (24)With the variation of τ / ∆ t , Eq. (22) and (23) can provide multiscale ﬂowevolution solution. When ∆ t (cid:29) τ , only the terms F eqij with q ≈ ∆ t and q ≈ ∆ t / t (cid:28) τ , F frij with q ≈ ∆ t and q ≈ ∆ t / f i isfurther discretized in the particle velocity space with discrete velocity points9 k to capture the non-equilibrium distribution function. Compared withmany other DVM with separate particle free-streaming and collision, themesh size and time step in UGKS are not limited by the particle mean freepath and collision time due to their coupled evolution solution for the ﬂuxevaluation. Moreover, the NS solutions can be obtained automatically byUGKS in the continuum regime even with ∆ t (cid:29) τ , such as for the laminarboundary layer solution at high Reynolds number. For UGKWP, instead ofdiscretizing the particle velocity space, the particle will be used directly torepresent the non-equilibrium gas distribution function. The integral solution of the kinetic model equation (14) can be rewrittenas f ( x , t ) = (1 − e − t/τ ) g p ( x , t ) + e − t/τ f ( x − u t ) , (25)where g p = g + (cid:18) te − t/τ − e − t/τ − τ (cid:19) u · ∇ x g + (cid:18) t − e − t/τ − τ (cid:19) ∂ t g. (26)Equation (25) states that the distribution function at time t is a combinationof the initial distribution function f and the modiﬁed equilibrium state g .The probability for the particle without suﬀering collision at time t is e − t/τ .Otherwise, it will collide with other particle and the post-collision distribu-tion is determined by the distribution g p . The cumulative distribution forparticle free streaming at time t f is given by F ( t ) = ( t f ≤ t ) = e − t/τ . (27)A particle P k ( m k , x k , u k , e k ) can be represented by its mass m k , position x k ,velocity u k , and internal energy e k . Its free transport time is t f = min( − τ ln( η ) , ∆ t ) , (28)where η is a random number generated from a uniform distribution on theinterval (0 , η ∼ U (0 , x ∗ of the particle freetransport up to time t f can be accurately tracked, x ∗ k = x nk + u k t f , (29)where the particle velocity u k keeps the same value.10ccording to the time t f assigned to each particle, these particles with t f = ∆ t are called collisionless particles P f , and the particles with t f < ∆ t are called collisional particles P c . The collisional particles P c,k should bedeleted at the collision time t f and re-sampled from distribution function g p with the updated macroscopic quantities W n +1 i via sampling, i.e., x n +1 k ∼ U (Ω i ) , u n +1 k ∼ g p ( W n +1 i ) . (30)The position of the re-sampled particle x n +1 k is uniformly distributed insidethe cell Ω i where the collision happens. Similarly to the DSMC method, theinternal energy e n +1 k is sampled according to the temperature and internaldegree of freedom K . The particles mass m k can be prescribed and will bediscussed later. The above scheme is the uniﬁed gas-kinetic particle (UGKP)method.Theoretically, in the next time step, the re-sampled equilibrium particleswill be reclassiﬁed into collisionless and collisional particles again according tothe free transport time t f , and only the collisionless particles will be retainedat the end of the next time step. Since the collisional particle will disappearin the next time step, it is not necessary to re-sample it, and its dynamicimpact, such as the contribution to the ﬂux, can be calculated analytically.Therefore, in order to reduce the noise variance in near continuum regimeand avoid re-sampling collisional particles repeatedly, only the collisionlessparticles in the hydrodynamic wave W hi = W i − W pi need to be re-sampledat the beginning of each time step. This is the basic idea of the uniﬁed gas-kinetic wave-particle (UGKWP) method. Here, W pi is the total conservativequantities of collisionless particles remained in cell Ω i at the end of each timestep, W pi = 1 | Ω i | (cid:88) x k ∈ Ω i φ k , (31)where the vector φ k = m k (1 , u k , ( u k + e k )) denotes the mass, momentum,and energy carried by the particle P k .Based on the cumulative distribution Eq.(27), the proportion of the colli-sionless particles can be evaluated in each cell at the beginning of each timestep. The total mass density of the re-sampled collisionless particle takesa portion of the updated hydrodynamic wave density ρ h from the previoustime step ρ hp = e − ∆ t/τ ρ h . (32)11ased on this observation, the particle evolution procedure of the UGKWPmethod can be summarized as1. Obtain free streaming time t f,k for the remaining particles P nf,k .2. Sample the collisionless particles P nf,k from hydrodynamic wave withdistribution g p ( W n ). Note that the collisionless particles with totalmass density ρ hp,n = e − ∆ t/τ ρ h,n have the free streaming time t f = ∆ t .3. Stream all the particles and classiﬁed into two categories, i,e, collision-less particles P n +1 f,k and collisional particles P ∗ c,k .4. Keep collisionless particles P n +1 f,k , and remove collisional particles P ∗ c,k .Calculate total conservative quantities of the remained collisionless par-ticles W p,n +1 according to Eq. (31). The conservative quantities ofcollisional particles W h,n +1 are obtained from the updated total con-servative quantities W n +1 in Eq.(8) as W h,n +1 = W n +1 − W p,n +1 . Thedetailed formulation for the update of W n +1 will be presented in thenext subsection.The interplay of waves (collisional) and particles (collisionless) in theUGKWP method is illustrated through a series of ﬁgures in ﬁg. 1. From thediagram, the multi-eﬃciency property of UGKWP[17] is clearly indicated,i.e., the computational eﬃciency of UGKWP goes to the high eﬃcient ap-proach in the corresponding regime. For example, in near continuum regime,i.e., τ →

0, the proportion of collisionless particle decreases exponentially.The UGKWP becomes a scheme without particles, and its computationalcost is comparable to a traditional NS solver. On the other hand, for highlynon-equilibrium hypersonic ﬂow, such as τ (cid:29) ∆ t , the particles will play adominant role to capture the non-equilibrium transport, and the eﬃciencyof the scheme will go to the particle method, such as DSMC.It has been shown in [17] that the UGKWP method is a kinetic equationsolver in the rareﬁed regime and preserves the Navier-Stokes solution in thecontinuum regime with the particles re-sampled from the ﬁrst-order approx-imation of g p . Even though the particles are sampled uniformly inside thecontrol volume, the spatial accuracy can be still kept in the near continuumregime, because the portion of particles e − ∆ t/τ is minimal and the hydrody-namic wave evolution is dominant by the updated W , which is computedanalytically with second-order accuracy. The DSMC method requires ∆ t tobe less than the particle mean collision time, which is equivalent to t f = ∆ t .The free transport time in UGKWP method is obtained from the samplingprocess in Eq. (28), where the particle collisional eﬀect, such as evolving12 ℎ −Δ / −Δ / − − Δ / − Δ / (a) ℎ −Δ / −Δ / − − Δ / − Δ / (b) −Δ / −Δ / − − Δ / − Δ / ℎ (c) ℎ −Δ / −Δ / − − Δ / − Δ / (d) Figure 1: The diagram illustrates the interplay of waves and particles in the UGKWPmethod. Grey block: waves, hollow circle: collisionless particles, solid circle: collisionalparticles. (a) Initial ﬁeld; (b) Classiﬁcation of the collisionless particles and collisionalparticles for the part of W p according to the free transport time t f ; (c) Sample collision-less particles W hp from hydrodynamic waves W h ; (d) Update on both macroscopic andmicroscopic level. to the equilibrium distribution g p , has been modeled in the scheme throughthe evolution solution in (25). Without using this evolution solution withtime accumulating particle collision eﬀect, or any other equivalent form, it isimpossible to design a multiscale method, which can recover the NS solutionin the continuum ﬂow regime. The UGKWP updates the macroscopic variables for each control vol-ume in Eq. (8). The equilibrium part ﬂux F eqij is directly calculated fromthe macroscopic ﬂow ﬁeld as given by Eq. (23). In UGKWP, the calcula-tion of the free streaming ﬂux F frij will be divided into two parts. The freestreaming ﬂux from collisional hydrodynamic waves of (1 − e − ∆ t/τ ) W h canbe calculated analytically. The other free streaming ﬂux from collisionlessparticles of hydrodynamic waves e − ∆ t/τ W h and the remained particles W p can be evaluated by counting the particles passing through the cell interfaceduring a time step. The free streaming ﬂux contributed from the collisional13ydrodynamic waves of (1 − e − ∆ t/τ ) W h on the cell interface ij is F fr,waveij = F fr,UGKSij ( W h ) − F fr,DV Mij ( W hp )= (cid:90) u · n ij (cid:20) ( q g h + q u · g h x ) − e − ∆ t/τ (cid:90) ∆ t ( g h − t u · g h x )d t (cid:21) ψ dΞ= (cid:90) u · n ij (cid:20) ( q − ∆ te − ∆ t/τ ) g h + ( q + ∆ t e − ∆ t/τ ) u · g h x (cid:21) ψ dΞ , (33)where g h is the Maxwellian distribution with temperature and average ve-locity determined by total macroscopic variables W , but the density by W h . g h x is the spatial derivative of the Maxwellian distribution, which can be ob-tained from the reconstruction of W and W h . The free streaming ﬂux F frij in Eq. (23) is computed partially by particles and partially by the contribu-tion of g h ( W h ) analytically. In addition, the subtraction of F fr,DV Mij ( W hp )from F fr,UGKSij ( W h ) aims to remove the free transport ﬂuxes which are stillcalculated by the collisionless particle P f sampled from W hp . The total non-equilibrium free streaming ﬂux F frij also includes the contribution from theremaining particles P k from the previous time step. During the free transportprocess, the contribution to the numerical ﬂuxes of cell i can be obtained bycounting the particles across the cell interfaces, F fr,pi = (cid:88) x n +1 k , x ∗ k ∈ Ω i φ k − (cid:88) x nk ∈ Ω i φ k . (34)Finally, the updates of the conservative ﬂow variables in the UGKWP methodare W n +1 i = W ni − | Ω i | (cid:88) j ∈ N ( i ) F eqij | S ij | − | Ω i | (cid:88) j ∈ N ( i ) F frij | S ij | , = W ni − | Ω i | (cid:88) j ∈ N ( i ) F eqij | S ij | − | Ω i | (cid:88) j ∈ N ( i ) F fr,waveij | S ij | + F fr,pi | Ω i | . (35) (a) Time step on unstructured meshFollow the implementation in [28], the time step for unsteady ﬂow simu-lation is obtained from ∆ t = C min i Ω i Λ xi + Λ yi + Λ zi , (36)14ith Courant number C typically satisﬁed 0 < C < i Λ xi = ( | U i | + c )∆ S xi , Λ yi = ( | V i | + c )∆ S yi , Λ zi = ( | W i | + c )∆ S zi , (37)where c = 3 σ i = 3 √ RT i is approximately the sound speed, U i = ( U i , V i , W i )is the macroscopic velocity. The variables ∆ S xi , ∆ S yi , and ∆ S zi , respectively,represent projections of the control volume on the y-z-, x-z-, and x-y-plane,which are given by ∆ S xi = 12 (cid:88) j ∈ N ( i ) | S xij | , ∆ S yi = 12 (cid:88) j ∈ N ( i ) | S yij | , ∆ S zi = 12 (cid:88) j ∈ N ( i ) | S zij | , (38)where S xij , S yij , and S zij denote the x-, y-, and the z-component of the facevector S ij = | S ij | n ij .(b) Particle samplingAt the beginning of each time step, the collisionless particles of hydrody-namic waves will be sampled in pairs from Maxwellian distribution function g ( W n ). Speciﬁcally, given with the macroscopic velocity U = ( U, V, W ), tem-perature T , and vector X that sampled from the normal distribution, a pair ofparticles with microscopic velocities u = U + √ RT X and u (cid:48) = U − √ RT X will be sampled. To determine the sampling particle number N sam in thecell, a prescribed preference number N ref is required. Further, the referencemass m ref can be determined from the total particle mass and the referencenumber N ref m ref = ( ρ p + ρ hp ) | Ω | N ref = ( ρ p + e − ∆ t/τ ρ h ) | Ω | N ref . (39)Once the reference mass m ref is available, the number of particles to besampled symmetrically is determined by N sam = 2 (cid:24) ρ hp | Ω | m ref (cid:25) = 2 (cid:24) e − ∆ t/τ ρ h | Ω | m ref (cid:25) . (40)15f the reference mass m ref is the same for all cells, then the total number ofparticles per cell would be exactly equal to N ref . In this way, the total numberof particles in each cell can be controlled around the given reference number N ref in near continuum regime regardless of mesh distribution. Moreover,the minimum number of particles N min per cell can be prescribed to adjustthe sampled particles’ number such that N sam = max { N sam , N min − N left } , (41)where N left is the collisionless particles left at the initial of each time step.Finally, the sampled mass weight m sam for each sampled particle is m sam = ρ hp | Ω | N sam = e − ∆ t/τ ρ h | Ω | N sam , (42)which guarantees that the total sampled mass is exactly equal to ρ hp | Ω | .(c) Time averagingFor steady-state solution, the ﬂow ﬁeld ¯W starts to be averaged after agiven time step N avg , ¯W = (cid:80) n>N avg ∆ t n W n (cid:80) n>N avg ∆ t n (43)where ∆ t n = t n − t n − . The averaged ﬂow ﬁeld ¯W is assumed to be convergentif the relative change in two-successive steps is less than a given tolerance,such as ε = 10 − . Then, the ﬂow variables, such as the temperature ¯ T andmacroscopic velocity ¯U , can be obtained from the averaged conservative ﬂowvariables ¯W .(d) Numerical dissipationThe UGKWP targets the continuum and rareﬁed ﬂow. In the continuumﬂow regime, the strong shock structure is usually unresolved by the meshsize. Therefore, numerical dissipation is added through relaxation time toenlarge the shock thickness to the mesh size scale, τ num = µP + C | P l − P r || P l + P r | ∆ t, (44)where P l and P r are the reconstructed pressures at the left and right side ofthe cell interface and C is a constant, such as C = 10 for strong shock inthe continuum regime. 16e) Boundary conditionThe proper treatment of boundary condition is crucial for a numericalscheme. For a diﬀusive wall condition with normal direction n pointing to-ward the computational domain, the incoming distribution function f in ( t ) atboundary is given by Eq.(17). The distribution function of emitted particlesfrom the wall has a Maxwellian distribution g w = ρ w (cid:18) πRT w (cid:19) K exp (cid:20) − ( u − U w ) + ξ RT w (cid:21) , (45)where T w and U w are prescribed wall temperature and velocity. Based onthe non-penetration condition, ρ w in the above Maxwellian is given by (cid:90) ∆ t (cid:90) n · ( u − U w ) < n · ( u − U w ) f in ( t )d u d t = ∆ t (cid:90) n · ( u − U w ) ≥ n · ( u − U w ) g w d u . (46)

3. 3D UGKWP code and parallelization

UGKWP solver is constructed under a ﬁnite volume framework on 3Dunstructured mesh. It includes not only the reconstruction and ﬂux evalua-tion module as in the traditional ﬁnite volume solver, but also the particlesampling and tracking module as in pure particle method.

The main components of UGKWP solver are sketched in ﬁg. 2. As shownin the diagram, the program starts with the pre- and post-processor module,where the mesh partition, initialization, setup of boundary condition, andparallel IO are handled inside. In UGKWP solver, the numerical proceduresare organized into a macroscopic ﬁeld level and a microscopic particle level.Accordingly, the macroscopic components surrounded by the blue dash lineconsist of the parallel data transfer, reconstruction of the macroscopic gradi-ent, and macroscopic ﬂux calculation with boundary treatment. The greendash block contains the components for microscopic particles which are storedin the doubly-linked list. There are frequent operations, such as tracking par-ticles and calculating the macroscopic ﬂuxes. The insert/delete operation iseﬃcient in the scenario of parallel transfer and sampling/elimination of theparticles. 17

GKWP solver

Pre/Post prosessor

Parallel IOMesh partition for MPIInitialization && Setupof boundary condition Sample and set particle statusSample particles from Compute macro ﬂux and Sample inﬂow/outﬂow particlesMove particles and particle boundary treatmentUpdate the conservative variables

Calculate

Steady averageReconstruct the gradientof and Parallel data transfer Pack up particles for parallel transferCalculate the micro ﬂux

Figure 2: The structure and main components of the UGKWP solver

The parallelization of the current code adopts Message Passing Inter-face (MPI) based on the physical mesh decomposition. Every MPI processdeals with a non-overlapping sub-domain, and the information like conserva-tive variables and particles are communicated with the neighboring domainthrough corresponding boundaries. Since the macroscopic solver has second-order accuracy, no padding area of sud-domain is required.The code has been tested on Tianhe-2, located in the National Supercom-puter Center in Guangzhou, China. Tianhe-2 contains 16,000 nodes that eachof which possesses one Intel Xeon E5-2692 12 Cores @2.2 GHz CPU and 88gigabytes of memory (64 used by the Ivy Bridge processors, and 8 gigabytesfor each of the Xeon Phi processors). The computing nodes of Tianhe-2 areinterconnected by TH Express-2 network.To test the parallel eﬃciency and scalability of the UGKWP, the code iscompiled using Intel C/C++ compiler of version 18.0.0 with -O3 optimizationﬂag, and are linked to the MPICH2 with a customized GLEX channel. Sincethe scaling is problem-speciﬁc (depending on Knudsen number and preferencenumber of particles per cells N ref ), multiple test cases and several factors18ﬀecting the performance will be analyzed in a series of three-dimensionallid-driven cavity ﬂow tests. Firstly, only the parallel speedup of pure macroscopic ﬁeld computationusing diﬀerent MPI processes is measured to eliminate the computation ofparticles and communication of particle parallel transfer. Without the in-volvement of particle generation and particle transportation, the UGKWPdegenerates to the gas-kinetic scheme (GKS) for the continuum ﬂow compu-tation [29]. As it becomes a deterministic solver, for simplicity, the Knudsennumber is ﬁxed at 10 − in the following parallel computation. The averagedrunning time (wall clock time) of a single iteration step is measured, and themeasurement is ensured to be over 100 seconds, and no IO time is counted.To investigate the Amdahls law (strong scaling) at diﬀerent ﬁxed problemsize, the physical domain is discretized as D , where D is the number of cellsalong with each direction with the values 64 , ,

256 separately. To test theGustafsons law (weak scaling), we concern the speedup for a scaled problemsize to the number of processors. Hence, D = 64 on one node with 24 coresis chosen as the baseline, i.e., keeping the number of cells per processors as64 /

24, and the grid size increased simultaneously as the increment of thenumber of processors P . The corresponding speedup is measured based onthe averaged single-node simulation time, i.e., S P = 24 T p /T .Both strong and weak scaling analysis is plotted in ﬁg. 3. The solidred line represents the ideal linear speedup. From the diagram, the overallcomputational time of the various physical grid sizes scales well with thenumber of processing cores (or MPI processes). Although strong scaling isvery sensitive towards the serial fraction of the program and the communica-tion overhead (e.g., synchronization) could further degrade performance, theworst eﬃciency still has 73 .

8% for the case D = 64 with P = 960 numberof processors. Moreover, it is observed that strong scaling performance in-creases considerably as the increase of grid size. The strong scaling paralleleﬃciency for the largest problem size D = 256 can reach 85 .

4% despite theusage of P = 1536 number of processing cores. Finally, the weak scaling isalso veriﬁed by increasing both the job size and the number of processingcores, and a satisfactory weak scaling eﬃciency up to 92% has been achievedeven with P = 1536 number of processing cores.19 umber of cores (P) S p ee dup ( S P ) IdealD = 64D = 128D = 256Weak scalling

Figure 3: Strong and weak scaling analysis without the involvement of particles

Next, the particles are included in the experiment to explore the scala-bility and parallel eﬃciency of the implementation. Problems with diﬀerentnumbers of cells D , Knudsen number Kn , and preference number of par-ticles per cell N ref are run across diﬀerent nodes. The maximum problemsize is limited by the total memory available on each node, and no IO time isrecorded. The average computing time (wall clock time) for a single iterationstep is averaged from 1001 to 1100 steps to ensure suﬃcient running timeand reach a steady-state solution.The averaged CPU time (second) per step against the number of cores isshown in ﬁg. 4. The results indicate that the scaling performance is good forall simulations because the CPU time decreases almost linearly as the increasein the number of processing cores. Actually, from table 1 and table 2, theparallel eﬃciency E P = S P /P for cases with D ≥

72 is over 86% even with P = 864 number of processing cores.Furthermore, several interesting patterns have been observed. First, forthe cases of both N ref = 50 and N ref = 500, we can observe that the absoluteCPU time raises not only as of the increment of grid size but also the Knud-sen number. The reason is that the mean free path of the particles becomeslarge at a high Knudsen number, which produces the unbalanced distributionof particles among diﬀerent sub-domains as well as in the processing cores.Secondly, the scaling performance deteriorates as the shrinkage of grid size,especially in high-Knudsen number cases, because the proportion of commu-20ication time would increase as the number of cells per core declined, andthe uneven eﬀect of particles distributed among processing cores would alsobe ampliﬁed. Accordingly, the worst parallel eﬃciency E = 68 .

7% is ob-served for the case D = 36 at Kn = 1 with respect to P = 864 number ofcores. Lastly, the parallel eﬃciency would increase as the reference numberof particles becomes larger.Another interesting phenomenon is that maximum parallel eﬃciency canbe greater than one. For instance, the maximum parallel eﬃciency observedis E = 123 .

2% and achieves at the cases of D = 72 , Kn = 10 − and N ref = 500 using P = 864 cores. Actually, the parallel eﬃciency in transitionregime Kn = 10 − is even higher than that in the near continuum regime Kn = 10 − with the same grid size D and reference number of particles N ref . Besides, the worst parallel eﬃciency is even larger than 86 .

3% for allcases at Kn = 10 − , − . This counterintuitive parallel eﬃciency in thetransition regime is probably due to the doubly-linked list data structure forstoring particles. In the transition or near continuum regime, the collisionbetween particles is intensive, and the frequent elimination/resampling ofparticles involves frequent delete/insert operation in memory. Nonetheless,the bottleneck caused by the implemented data structure can be alleviatedthrough the replacement of a sequence container, like the STL vector.In summary, intensive parallel tests of cavity ﬂow show satisfactory strongand weak scaling performance of the UGKWP code. The current implemen-tation becomes a valuable tool for simulating complex ﬂow problems acrossthousands of processing cores in parallel computation. The actual paralleleﬃciency might vary for particular simulation setup. Table 1: Parallel eﬃciency for cases with N ref = 50 on Tianhe-2 Nodes Cores D = 72 D = 120 Kn = 10 − Kn = 10 − Kn = 1 Kn = 10 − Kn = 10 − .

1% 101 .

7% 101 .

9% 103 .

2% 103 . .

8% 102 .

6% 97 .

4% 98 .

1% 102 . .

5% 101 .

7% 95 .

8% 97 .

1% 105 . .

2% 98 .

8% 85 .

9% 96 .

6% 112 . .

3% 91 .

2% 88 .

5% 94 .

3% 109 . umber of cores (P) C P U t i m e ( s ) D = 72, Kn = 10 D = 72, Kn = 10 D = 72, Kn = 1D = 120, Kn = 10 D = 120, Kn = 10 Ideal (a)

Number of cores (P) C P U t i m e ( s ) D = 72, Kn = 10 D = 72, Kn = 10 D = 72, Kn = 1D = 48, Kn = 1D = 36, Kn = 1

Ideal (b)

Figure 4: Performance scaling on Tianhe-2 where each node has 24 cores (a) N ref = 50and (b) N ref = 500Table 2: Parallel eﬃciency for cases with N ref = 500 on Tianhe-2 Nodes Cores D = 72 D = 48 D = 36 Kn = 10 − Kn = 10 − Kn = 1 Kn = 1 Kn = 11 24 100% 100% 100% 100% 100%4 96 101 .

2% 106 .

7% 96 .

4% 92% 92%8 192 100 .

3% 116 .

3% 96 .

5% 92 .

4% 86 . .

7% 119 .

3% 99 .

5% 87 .

2% 82 . .

1% 119 .

3% 98 .

2% 82 .

5% 72 . .

2% 94 .

8% 80 .

8% 68 . . Numerical examples In this section, the accuracy and computational eﬃciency of the UGKWPsolver will be evaluated through many test cases with a wide range of Knud-sen and Mach numbers. The numerical Sod shock tube problem in 3D,lid-driven cubic cavity ﬂow, high-speed ﬂow passing through a cube, and theﬂow around a space vehicle, are tested. The results are compared with thosefrom UGKS/DUGKS and DSMC. Without a special statement, the diﬀusiveboundary condition with full accommodation is applied for the isothermalwalls. The code is compiled with GCC version 7.5.0, and all computationsare carried out on a workstation with [Dual CPU] Intel R (cid:13) Xeon(R) Platinum8168 @ 2.70GHz with 48 cores and 270 GB memory unless indicated other-wise.

The Sod shock tube problem is simulated inside a square-column fordiatomic gas at diﬀerent Knudsen numbers to validate the current UGKWPmethod, and the result is compared with the 1D UGKS solution.In this test case, the following non-dimensionalization is usedˆ ρ = ρρ ∞ , ˆ U = UC ∞ , ˆ V = VC ∞ , ˆ W = WC ∞ , ˆ T = TT ∞ , ˆ P = Pρ ∞ C ∞ , ˆ t = tt ∞ , ˆ x = xL , C ∞ = (cid:114) k B T ∞ m , t ∞ = LC ∞ , and the initial condition for the non-dimensional variables is( ˆ ρ, ˆ U , ˆ V , ˆ W , ˆ P ) = (cid:26) (1 , , , , , < ˆ x < . , (0 . , , , , . , . < ˆ x < . (47)For UGKWP simulation, the physical domain is a [0 , × [ − . , . × [ − . , . × × N ref = 200 , , , , Kn = 10 − , − , − , . , ,

10 respectively. Leastsquare reconstruction with Venkatakrishnan limiter is utilized. For the UGKSsimulation, the 1D physical domain [0 ,

1] is discretized uniformly with 100cells. Composed Newton-Cotes quadrature with 101 velocity points in range[ − ,

6] is ﬁxed to discretize the one-dimensional velocity space. van Leer lim-iter is used for the reconstruction of both conservative variables and discrete23istribution function. The left and right boundaries are treated as far-ﬁeld,and the others are treated as symmetric planes. The CFL number for bothUGKWP and UGKS simulation is 0 .

9, and the reference viscosity is given inEq.(5) with ω = 0 .

74. The results at the time t = 0 .

12 in all ﬂow regimesare presented.The density, velocity, and temperature obtained by the UGKS and theUGKWP method at diﬀerent Knudsen numbers are plotted in Figures 5 to 10,where the three-dimensional ﬂow ﬁeld computed by UGKWP is projectedto one-dimensional along the x-direction by taking ensemble average overthe cells on y-z plane. No time averaging is applied, and the statisticalnoise is satisfactory for this unsteady ﬂow simulation. For all the cases indiﬀerent ﬂow regimes, the 3D UGKWP solutions agree well with the 1DUGKS data. The slight diﬀerence is due to diﬀerent limiters, i.e., van Leerlimiter for UGKS and Venkatakrishnan limiter for UGKWP. The capabilityof the UGKWP method for numerical simulations in both continuum andrareﬁed regime is conﬁrmed.The distinguishable feature of multi-eﬃciency[17] can also be demon-strated here. For UGKS, the computational costs for all Knudsen numbercases will be on the same order since its discretization of the particle velocityspace is the same. While for the UGKWP method, the computational costis reduced at small Knudsen number, e.g., Kn = 10 − in near continuumregime, where the hydrodynamic wave is dominant, and few particles aresampled and tracked. The computational cost of UGKWP for 3D simulationis admissible because only a few hundred or thousand particles are enoughto adaptively discretize the velocity space, whereas it becomes possible that101 mesh points in the velocity space may be required in DVM-based UGKSfor high speed ﬂow. For steady-state simulation, the number of particles canbe reduced further since the statistical noise can be reduced through thetemporal ensemble. For low-speed ﬂow, the UGKWP method is applied to study the three-dimensional lid-driven cubic cavity ﬂow in the transition regime, and theresults are compared with the solution predicted by dugksFoam[16].The side length of the cubic cavity is L = 1 m with the computationaldomain [0 , × [0 , × [0 , hexa-hedrons with the cell size gradually increased towards to the cavity center.The ratio of the cell size in the center and the boundary is about 2. The24 oordinate X D e n s i t y UGKWP 3DUGKS 1D (a)

Coordinate X X V e l o c i t y U UGKWP 3DUGKS 1D (b)