Analytic Solution to the Piecewise Linear Interface Construction Problem and its Application in Curvature Calculation for Volume-of-Fluid Simulation Codes
AAnalytic Solution to the Piecewise Linear Interface ConstructionProblem and its Application in Curvature Calculation forVolume-of-Fluid Simulation Codes
Moritz Lehmann Stephan GekleBiofluid Simulation and Modeling – Theoretische Physik VIUniversity of Bayreuth, 95440 Bayreuth, GermanyTel.: +49 921 55 3094, eMail: [email protected] 19, 2020
Keywords
PLIC, plane-cube intersection, Volume-of-Fluid, LatticeBoltzmann Method, GPU, OpenCL
Abstract
The plane-cube intersection problem has been aroundin literature since 1984 and iterative solutions to ithave been used as part of piecewise linear interfaceconstruction (PLIC) in computational fluid dynamicssimulation codes ever since. In many cases, PLIC isthe bottleneck of these simulations regarding computetime, so a faster, analytic solution to the plane-cubeintersection would greatly reduce compute time for suchsimulations. We derive an analytic solution for allintersection cases and compare it to the one previoussolution from Scardovelli and Zaleski (Ruben Scardovelliand Stephane Zaleski. ”Analytical relations connectinglinear interfaces and volume fractions in rectangulargrids”. In: Journal of Computational Physics 164.1(2000), pp. 228 237.), which we further improveto include edge cases and micro-optimize to reducearithmetic operations and branching. We then extendour comparison regarding compute time and accuracy toinclude two different iterative solutions as well. We findthat the best choice depends on the employed hardwareplatform: on the CPU, Newton-Raphson is fastest withvectorization while analytic solutions perform betterwithout. The reason for this is that vectorizationinstruction sets do not include trigonometric functionsas used in the analytic solutions. On the GPU, thefastest method is our optimized version of the analyticSZ solution.We finally provide details on one of the applicationsof PLIC – curvature calculation for the Volume-of- Fluid model used for free surface fluid simulations incombination with the lattice Boltzmann method.
Piecewise linear interface construction (PLIC) – firstoccurring in literature for 2D in 1982 [1] and for 3D in1984 [2] – refers to the problem of calculating the offsetalong the given normal vector of a plane intersectinga unit cube for a given truncated volume. There arefive possible intersection cases (cf. figure 1), of whichthe numbers (1), (2) and (5) have been already solvedin the original 1984 work by Youngs [2], but the cubicpolynomial cases (3) and (4) – resigned as impossible toalgebraically invert [3] – in the majority of literature areapproximated by a Newton-Raphson iterative solution.Nevertheless, there does exist an analytic solution byScardovelli and Zaleski (SZ) [4] and a single documentedimplementation thereof in Fortran [5] which also includesan approximative version termed APPLIC.Here, we formulate the PLIC problem from the groundup – first in the inverse direction – and derive analternative analytic solution for all intersection cases byinverting the inverse formulation. We then compare ournovel solution with (i) the original SZ solution, (ii) animproved and micro-optimized version of the SZ solutiondeveloped in the present work, (iii) an iterative solutionusing Newton-Raphson and (iv) an iterative solutionusing nested intervals. Depending on the availablemicroarchitecture (GPU/CPU), vectorization may beavailable, which strongly favors multiplications andadditions while not speeding up trigonometric functions,impacting which of the algorithms is fastest.Among the applications for PLIC are Volume-of-Fluidsimulation codes such as
FluidX3D [6, 7] and others [8–17], often in conjunction with GPU implementations [6,1 a r X i v : . [ phy s i c s . c o m p - ph ] J un Plane-Cube Intersection Analytic solution to the PLIC Problem7, 18–24] of the lattice Boltzmann method [25–27], usedfor simulating free surface fluid flows. In particular, thesesimulations work on a cubic lattice with every latticepoint having a fill level assigned to it and PLIC is usedin the process of surface reconstruction during curvaturecalculation for calculating physical surface tension effects[6, 8]. In the final section of this work, we providea detailed overview on the state-of-the-art curvaturecalculation procedure using PLIC.
Inputs to the PLIC algorithm are the truncated volume V ∈ [0 ,
1] and the (normalized) normal vector of theplane (cid:126)n = ( n x , n y , n z ) T , | (cid:126)n | = n x + n y + n z = 1. Thedesired output is the plane offset from the origin (centerof the unit cube) along the normal vector d V , ( n x , n y , n z ) T → d (1)where d ∈ [ − | n x | + | n y | + | n z | , | n x | + | n y | + | n z | ]. Theinterval is determined by the normal vector orientation:depending on the normal vector, the maximum possibledistance from the cube center to be still at least touchingthe cube in one point varies between (normal vectorparallel to one of the coordinate system axes) and √ (normal vector along the space diagonal). To reduce the amount of possible cases and to avoidhaving to consider all possible intersections of the planeand cube edges – following the scheme in [2] and [14] – the normal vector is component-wise mirrored intopositive. The mirrored normal vector components aresorted ascending for their magnitude such that 0 ≤ n ≤ n ≤ n ≤
1. Because (cid:126)n is normalized, the absolute valueof its largest component n is always greater than zero. n := min( | n x | , | n y | , | n z | ) ≥ n := max( | n x | , | n y | , | n z | ) > n := | n x | + | n y | + | n z | − n − n ≥ V ( d ) is symmetric around d = 0 andincreasing monotonically, the reduced-symmetry-volume V ∈ [0 , ] is limited to the lower half of the intersectionvolume V and the upper half is reconstructed from We note that in [14] in equations (21) and (23) respectivelythe ”+” should be a ” − ” and in equation (24) the ” > ” should bea ” < ”. symmetry. V := 12 − (cid:12)(cid:12)(cid:12)(cid:12) V − (cid:12)(cid:12)(cid:12)(cid:12) (5) V =sign( d ) (cid:18) − V (cid:19) + 12 (6)This symmetry condition for the case V > is nowapplied to d , and the coordinate origin is shifted from(0 , ,
0) (center of the unit cube) to ( − , − , − )(bottom left corner in the back in fig. 1), resulting in thedistance d ∈ [0 , n + n + n ] in reduced symmetry space: d := n + n + n − | d | (7) d =sign (cid:18) V − (cid:19) (cid:18) n + n + n − d (cid:19) (8)With this reduction in symmetry, there are only fivedifferent intersection cases remaining (see figure 1).2 Plane-Cube Intersection Analytic solution to the PLIC Problem (5)(4)(3)(2)(1) s s s s s s s s s s s s s s s d d d d d Figure 1: All possible intersection cases of a plane and a unit cube. The truncated volume of cases (1) to (4) is atetrahedral pyramid with zero (1), one (2), two (3) or all three (4) corners extending outside of the unit cube beingcut-off tetrahedral pyramids themselves.
In order to derive the analytic PLIC solution, firstthe inverse problem is formulated in equations – againfollowing the scheme in [2]. In the inverse problem, theintersection volume is calculated from the plane offsetand normal vector as inputs. At first, the intersectionpoints s , s and s of the plane with the coordinatesystem axes (see figure 1) are determined: s := dn ≥ s := dn ≥ s := dn (9) Now one calculates the actual volume in reducedsymmetry space. The approach is to calculate the volumeof the tetrahedral pyramid formed by the plane andthe coordinate system axes and, if necessary, subtractthe volumes of zero, one, two or all three corners thatextend beyond 1. For case (3), an additional conditionis required to mutually exclude case (5), in which thebottom four corners of the cube are located beneath theplane. V = s s s if s < s s (cid:18) s − ( s − (cid:16) − s (cid:17) (cid:19) if s ≥ s ≤ s (cid:18) s s − ( s − s (cid:16) − s (cid:17) − ( s − s (cid:16) − s (cid:17) (cid:19) if s ≥ s ≤ s ( s − ≤ s (cid:18) s s s − ( s − s s (cid:16) − s (cid:17) − ( s − s s (cid:16) − s (cid:17) − ( s − s s (cid:16) − s (cid:17) (cid:19) if s ≥ s (cid:16) − s − s (cid:17) otherwise (10)To shorten equation (10), s , s and s are substituted and the expression is simplified, yielding V = 16 n n n · d (1) if d < n ( d − ( d − n ) ) (2) if n ≤ d ≤ n ( d − ( d − n ) − ( d − n ) ) (3) if n ≤ d ≤ min( n + n , n )( d − ( d − n ) − ( d − n ) − ( d − n ) ) (4) if n ≤ d n n ( d − ( n + n )) (5) if min( n + n , n ) ≤ d ≤ n (11)which is already quite a bit more friendly andcompletes the inverse PLIC formulation in conjunction with equations (2) to (4), (7) and (6). The condition forcase (5) is the remaining free sector of the possible range3 Plane-Cube Intersection Analytic solution to the PLIC Problemof d mutually excluded by the other four cases. Listing1 shows the fully optimized OpenCL C implementationof the inverse PLIC solution. float plic_cube_inverse(const float d0, const float3 n) { // unit cube - (cid:44) → plane intersection: plane offset d0, normal vector n -> volume V0 (cid:44) → in [0,1]const float n1 = fmin(fmin(fabs(n.x), fabs(n.y)), fabs(n.z)); // eliminate (cid:44) → most cases due to symmetryconst float n3 = fmax(fmax(fabs(n.x), fabs(n.y)), fabs(n.z));const float n2 = fabs(n.x)-n1+fabs(n.y)+fabs(n.z)-n3;const float d = 0.5f ∗ (n1+n2+n3)-fabs(d0); // calculate PLIC with reduced (cid:44) → symmetry , shift origin from (0.0,0.0,0.0) -> (0.5,0.5,0.5)float V; // 0.0<=V<=0.5if(fmin(n1+n2, n3)<=d && d<=n3) { // case (5)V = (d-0.5f ∗ (n1+n2))/n3; // avoid division by n1 and n2} else if(d
13 atan2( y, x ) (cid:19) = (12)= c − a (1 − i √ √ x + i y − b (1 + i √ (cid:112) x + i y (13)For better readability, a few expressions are pre-defined.Hereby the normalization condition n + n + n = 1 isapplied. x := 81 n n ( n + n − V n ) > y := (cid:113) n n ) − x ≥ a := √ n n (16) b := 1 √
432 (17) c := n + n (18) t := 9 ( n + n + n ) −
18 (19) x := 324 n n n (1 − V ) ≥ y := (cid:113) t − x ≥ a := 1 √ t (22) b := 1 √ c := n + n + n d = d = √ V n n n (1) if d < n d = n + (cid:113) V n n − n (2) if n ≤ d ≤ n d = f ( x , y , a , b , c ) (3) if n ≤ d ≤ min( n + n , n ) d = f ( x , y , a , b , c ) (4) if n ≤ d d = V n + n + n (5) if min( n + n , n ) ≤ d ≤ n (25)4 Plane-Cube Intersection Analytic solution to the PLIC Problemin conjunction with equations (2) to (4), (5), (8), (12)and (14) to (24).In equation (25) it is noteworthy that the condi-tions for the five different cases are determined aposteriori by the result itself. This means that each casehas to be evaluated successively and for the resultingvalue d the respective condition has to be tested. Ifthe condition is true, calculation is stopped and d isreturned. If the condition is false, the next case has tobe evaluated and so on, until the last case is reached.The order in which the cases are computed andchecked can be optimized to calculate the most difficult and infrequent cases last, when the probability is highthat one of the easier and more frequent cases hasalready been chosen. ’Frequent’ here refers to somecases appearing more often than others with randomized V and (cid:126)n as expected in typical PLIC applications.Here also special considerations for edge cases (moreon that below) need to be taken into account to avoidpossible divisions by zero. With this in mind, the order(5) → (2) → (1) → (3) → (4) is preferred.Additional speedup can be gained by noting that theimplicit condition involving d can be replaced by anexplicit condition involving V for cases (1), (2) and (5): d = d = √ V n n n (1) if 6 V n n < n d = n + (cid:113) V n n − n (2) if 3 n ( V n + n − n ) ≤ n ≤ V n n d = f ( x , y , a , b , c ) (3) if n ≤ d ≤ min( n + n , n ) d = f ( x , y , a , b , c ) (4) if n ≤ d d = V n + n + n (5) if n + n ≤ V n (26)Since V is known, these conditions are checked a prioriin order to avoid root function calls if the condition isfalse.For even more speedup, all redundant mathematicaloperations are reduced to a minimum by pre-calculatingthem to variables (micro-optimization) and conditionchecks mutually excluded by previous checks are skipped,especially all conditions for the very last case. In theimplementation order (5) → (2) → (1) → (3) → (4)the conditions for case (3) simplify to d ≤ n , whichwill mutually exclusive decide between cases (3) and (4).Since both d and d are very complicated expressions,here no simplified a priori condition is formulated.In equations (19), (21) and (25), the argument of thesquare root may be negative before the case condition istested. In this case – since in the actual code, floating-point exception handling is turned off for performancereasons – the resulting NaN of a square root of anegative number would not be captured in the casecondition, leading to a wrong result. An additional fdim function call in the square root solves this issue. Inthe implementation, we artificially exclude the edge case x = 0 in order to instead of atan2( y, x ) use the fasteratan( y/x ), giving the algorithm a 15% speedup. In casebranching would be undesirable, bit masking is also anoption, but bit masking turned out to be slower even onGPUs.Two edge cases still need to be taken into carefulconsideration: n = 0 (2D) and n = 1 (1D). n = 0restricts (cid:126)n to be in a 2D plane of two coordinate systemaxes and n = 1 restricts (cid:126)n to be parallel to one of the coordinate system axes. For the (1D) case, n = 1and the solution is always (5), so n = n = 0 andmin( n + n , n ) = 0, simplifying equation (25) withoutloss of generality to d = (cid:110) V (5) if 0 ≤ d ≤ d = V (28)without any conditions, so it is a necessary requirementthat both additional conditions in eq. (27) must befulfilled automatically. d is in the range 0 ≤ d ≤ n + n + n , so here in the special case we have 0 ≤ d ≤ = ≤
1, which means 0 ≤ d ≤ n = 0 ( n = 0 is excluded heresince it is already covered in the (1D) case, so here n > < n ≤ n ≤ n + n , n ) = n ,simplifying eq. (25) without loss of generality to d = (cid:40) √ V n n (2) if 0 ≤ d ≤ n V n + n (5) if n ≤ d ≤ n (29)A clean derivation of the 2D case yields d = (cid:40) √ V n n (2) if d ≤ n V n + n (5) if n ≤ d (30)which has simpler conditions, so again it is a necessaryrequirement that both additional conditions in eq.5 Plane-Cube Intersection Analytic solution to the PLIC Problem(29) must be fulfilled automatically. d is in the range0 ≤ d ≤ n + n + n , so here with the special conditionswe have 0 ≤ d ≤ n + n ≤ n + n = n , meaning thatboth 0 ≤ d and d ≤ n are indeed fulfilled automatically.In the above chosen (5) → (2) → (1) → (3) → (4)implementation order, the 1D and 2D special cases arealready covered in (5) and (2) at the beginning, so theyboth are excluded in the remaining intersection cases(1), (3) and (4), meaning that there n , n , n > x > float plic_cube_reduced(const float V, const float n1, const float n2, const (cid:44) → float n3) {const float n1pn2=n1+n2, n3xV=n3 ∗ V;if(n1pn2 <=2.0f ∗ n3xV) return n3xV+0.5f ∗ n1pn2; // case (5)const float V6n2n3=6.0f ∗ n2 ∗ n3xV, sqn1=sq(n1);if(V6n2n3 >=sq(n1) && 3.0f ∗ n2 ∗ (2.0f ∗ n3xV+n1-n2)<=sqn1) return 0.5f ∗ n1 (cid:44) → +0.28867513f ∗ sqrt(24.0f ∗ n2 ∗ n3xV-sqn1); // case (2)if(V6n2n3
The analytic PLIC solution by Scardovelli and Zaleskifrom 2000 [4] has been implemented in Fortran in 2016by Kawano [5] where it is used as comparison to theapproximative APPLIC method. Here we focus onthe exact SZ solution. The SZ solution is particularlyinteresting in that it builds upon the L -normalizedplane normal vector instead of the more common L normalization as used in our own solution in section2.3. We first translate the Fortran implementation toOpenCL C, make it compatible with an L -normalizedplane normal vector as input and rescale the result fromthe original [0 ,
1] to [ − | n x | + | n y | + | n z | , | n x | + | n y | + | n z | ]. Theimplementation is provided in listing 3. float plic_cube(const float V0, const float3 n) { // unit cube - plane (cid:44) → intersection: volume V0 in [0,1], normal vector n -> plane offset (cid:44) → d0const float l = fabs(n.x)+fabs(n.y)+fabs(n.z); // length in L1 norm const float ax=fabs(n.x)/l, ay=fabs(n.y)/l, az=fabs(n.z)/l, w=0.5f-fabs(V0 (cid:44) → -0.5f); // eliminate symmetry casesconst float vm1 = fmin(fmin(ax, ay), az);const float vm3 = fmax(fmax(ax, ay), az);const float vm2 = fdim(1.0f, vm1+vm3); // ensure vm2 >=0const float vm12 = vm1+vm2;float alpha = 0.0f;const float v1 = sq(vm1)/(6.0f ∗ vm2 ∗ vm3+1E-25f);const float w6 = 6.0f ∗ vm1 ∗ vm2 ∗ vm3 ∗ w;if(w
For comparison, we also provide iterative solutions usingnested intervals in listing 5 and Newton-Raphson inlisting 6. int log2_fast(const float x) { // evil log2 hack: log2(x)=(as_uint(x)>>23)- (cid:44) → (cid:44) → float n3) {const float n1pn2=n1+n2, n3xV=n3 ∗ V;if(n1pn2 <=2.0f ∗ n3xV) return n3xV+0.5f ∗ n1pn2; // case (5)const float V6n2n3=6.0f ∗ n2 ∗ n3xV, sqn1=sq(n1);if(V6n2n3 >=sq(n1) && 3.0f ∗ n2 ∗ (2.0f ∗ n3xV+n1-n2)<=sqn1) return 0.5f ∗ n1 (cid:44) → +0.28867513f ∗ sqrt(24.0f ∗ n2 ∗ n3xV-sqn1); // case (2)if(V6n2n3
Apart from floating-point errors, the SZ solution inlisting 4 and our own solution in listing 2 produceidentical results. For accuracy comparison, we define theerror as E i := | plic cube inverse(plic cube( V , (cid:126)n i ) , (cid:126)n i ) − V | (31)with plic cube inverse( d , (cid:126)n i ) referring to the implemen-tation in listing 1 and plic cube( V , (cid:126)n i ) referring to thevarious PLIC implementations. We define the averageerror as follows: E avg := 1 N L
N L − (cid:88) i =0 ( E i ) (32)The execution time for a ’blank run’ E i, blank := | plic cube inverse( V − , (cid:126)n i ) − V | (33)containing the time for memory loads and stores aswell as compute time for the inverse PLIC algorithm ismeasured and later subtracted from the execution timesof the different PLIC variants in order to isolate theirexecution time. The time is also divided by the numberof PLIC function evaluations ( N L ) in order to obtainthe time for a single execution.
In this test, (cid:126)n i is set to N = 4096 different normalvectors, of which one is (1 , , T , one is ( √ , √ , T ,510 are random 2D directions in the x - y -plane and theremaining 3584 are random 3D directions. For each ofthese, the volume V is varied in the interval [0 ,
1] – edgecases included – in L = 4096 equally spaced steps. Thetest is executed on a single core of a Coffee Lake IntelCore i7-8700K CPU at 4 . /O2 , /Oi , /Ot , /Qpar , /arch:AVX2 , /fp:fast and /fp:except- compiler flagsset. The results are presented in table 1 below.When AVX2 vectorization is available, Newton-Raphsonis considerably faster than all the other solutions.However we note that our benchmark is a particularlysynthetic scenario where vectorization is easily achievablefor the compiler: The benchmark just calculates PLICrepeatedly in a loop. With FP32 and AVX2, everyeight consecutive iterations are combined into onevectorized iteration, completing in far fewer clock cyclesthan the eight iterations separately would. In real-world applications, PLIC is not computed repeatedly inmultiples of eight, greatly limiting vectorization potentialsuch that the strict criteria for auto-vectorization [28]might not be met. Surprisingly, our analytic solutionand the optimized SZ solution are within margin of errorregarding compute time.In the edge cases of the normal vector (cid:126)n = (1 , , T and the volume V ∈ { , } , the SZ solution by Kawanoreturns -NaN , which propagates through the entire erroraveraging procedure. For the IEEE-754 FP32 floating-point format, the machine epsilon is at (cid:15) = 5 . · − ,meaning that for all other PLIC variants the averageerror E avg is within machine precision.Without compiler optimization ( /Od , /Qpar- ), theexecution time results are very different as shownin table 2. With AVX2 vectorization not available,Newton-Raphson considerably falls behind the analyticsolutions. Since in many applications the target platform for thePLIC algorithm is the GPU, we also benchmark thedifferent variants in OpenCL on an Nvidia Titan XpGPU (3840 CUDA cores at 1582 MHz, driver version442 .
59, OpenCL 1 . , , T , one is ( √ , √ , T , 8388606 are random 2Ddirections in the x - y -plane and the remaining 58720256are random 3D directions. For these normal vectors,PLIC is run in parallel and for each of them, the volume V is varied in the interval [0 ,
1] – edge cases included– in L = 256 equally spaced steps in series. Formore accurate results through averaging, this test isrun 64 times and the mean execution time is averagedand divided by the number of parallel and serial PLICcomputations, resulting in an average compute time ofmore than three magnitudes shorter than for single- core CPU execution as listed in table 3 below. Eventhough the tests are designed differently for the CPUand GPU, this comparison is appropriate because inboth cases the same algorithms are computed and thehardware is fully saturated. The table also includes thenumber of arithmetic operations (floating-point, integerand bit operations combined) N a and the number ofbranching operations N b in PTX assembly [29]. GPUsare especially bad at branching, so a small N b is desired.These operation counts – in analogy to the compute time– refer to the isolated PLIC variants with the background’blank run’ for memory loads and stores as well as theinverse PLIC algorithm subtracted. N a indicates thatNewton-Raphson is unrolled by the compiler whereasnested intervals is not. The number of iterations toreach machine precision for nested intervals depends onthe initial interval width, which is a function of thenormal vector components. For Newton-Raphson, thenumber of branching operations N b is smallest, resultingin rather good performance. Nevertheless, our optimizedimplementation of the SZ solution pulls ahead rathersignificantly and thus is preferred by us.8 Plane-Sphere Intersection Analytic solution to the PLIC ProblemPLIC variant execution time / ns E avg . ± . . · − . ± . . · − . ± . . · − . ± . − NaN5 nested intervals 270 . ± . . · − Table 1: Comparison of execution time and accuracy of the different PLIC variants with compiler optimizationsenabled. PLIC variant execution time / ns E avg . ± . . · − . ± . . · − . ± . − NaN6 Newton-Raphson 180 . ± . . · − . ± . . · − Table 2: Comparison of execution time and accuracy of the different PLIC variants with compiler optimizationsdisabled. PLIC variant execution time / ps N a N b E avg . ± . . · − . ± . − NaN2 our analytic solution 19 . ± . . · − . ± . . · − . ± . . · − Table 3: Comparison of execution time and accuracy of the different PLIC variants in OpenCL on a GPU.
The PLIC problem can be extended to spherical cellswith unit volume (see figure 2): d Figure 2: Plane-sphere intersection. Here only oneintersection case is possible.1 = V = 43 π r (34) r = (cid:114) π (35)The offset along the plane normal vector is the desiredresult while the truncated volume is given. Since wehave a sphere, the normal vector direction of the plane is irrelevant. V → d (36)Calculating the inverse PLIC (getting the volume V from a given offset d ∈ [ − r, r ]) is straight-forward andcovered by the ’spherical cap’ equation: V = π r + d ) (2 r − d ) ∈ [0 ,
1] (37)Calculating the inverse function of the above equationagain is quite difficult due to it being a third orderpolynomial with three complex solutions, but by usingthe same trick as in the plane-cube intersection case(equations (12) and (13)), this real expression isobtained: d = (cid:114) π sin (cid:18) π −
13 atan2 (cid:18) (cid:113) V − V , V − (cid:19)(cid:19) (38)Whilst the plane-sphere intersection solution is notparticularly useful for VoF-LBM simulations, it mightbe of interest for some completely different applications.9 Application: Curvature Calculation for VoF-LBM on the GPU Analytic solution to the PLIC Problem Volume-of-Fluid (VoF) is a model to simulate a sharp,freely moving interface between a fluid and gas phase ina Cartesian lattice [8–11]. While it can be coupled to anyflow solver, here we focus on its usage in conjunction withthe Lattice-Boltzmann-method (LBM). The interface isensured to be exactly one lattice cell thick at any time(illustrated in figure 3). As an indicator for each latticepoint type, the fill level ϕ is introduced, whereby for fluid lattice points ϕ = 1, for interface > ϕ > gas ϕ = 0: ϕ ( (cid:126)x, t ) := m ( (cid:126)x, t ) ρ ( (cid:126)x, t ) = 1 if (cid:126)x is fluid ∈ ]0 ,
1[ if (cid:126)x is interface = 0 if (cid:126)x is gas (39)Here ρ is the density provided by the lattice Boltzmannmethod (LBM) and m is the fluid mass. m is a conservedquantity and cannot be gained or lost, only moved withinthe simulation box. Figure 3: The idea of the Volume-of-Fluid modelillustrated in 2D: A sharp interface (black curved line)divides the gas phase (white cells) from the fluid phase(dark blue cells). Lattice points are located at the centerof each cell. All cells through which the interface extendsare called interface cells (light blue). Every lattice cellhas a fill level ϕ ∈ [0 ,
1] assigned to it, which is ϕ = 0 for gas , ϕ = 1 for fluid and ϕ ∈ ]0 ,
1[ for interface – basedon where exactly the sharp interface cuts through.The key difficulty of modeling a free surface on adiscretized lattice is to obtain the surface curvature,which is a necessary ingredient for calculating the surfacetension via the Young-Laplace pressure∆ p = 2 σ κ (40)with κ = R denoting the local mean curvatureand σ denoting the surface tension parameter of thesimulated fluid. The equation is easy in principle, but calculating κ from the discretized interface geometry isnot. Specifically, discretized interface here means thatonly a local 3 neighborhood of fill levels ϕ ∈ [0 , interface lattice point. ϕ , ..., ϕ → κ (41)The most common algorithm in literature [8, 11] is thecurvature calculation via a least-squares paraboloid fitfrom a neighborhood of points located on the interface.It assumes the local interface to be a paraboloid, thespecifics of which will be given in the following sections.Finding an appropriate set of neighboring points on theinterface requires the PLIC solution. Piecewise linear interface construction works on a 3 neighborhood of an interface lattice point (illustratedin 2D in figure 4a). Within this neighborhood, onlyinterface points other than the center interface point –which is always interface – are considered as candidatesfor the later fitting procedure (figure 4b). The difficultpart is to accurately obtain the heights z i of at leastfive neighboring points located on the true interface(figure 4c). For this, first the normal vector (cid:126)n of thecenter interface point is calculated via the Parker-Youngsapproximation as described in the appendix 10.1 in eq.(53). A new coordinate system is introduced with itsfirst base vector (cid:126)b z defined as this normal vector. Then,the cross product with an arbitrary vector such as (cid:126)r := (0 . , . , . T (42)which is always non-colinear with (cid:126)b z just by randomchance is calculated to provide second and thirdorthonormal vectors (cid:126)b z := (cid:126)n (43) (cid:126)b y := (cid:126)b z × (cid:126)r | (cid:126)b z × (cid:126)r | (44) (cid:126)b x := (cid:126)b y × (cid:126)b z (45)forming the new coordinate system in which the z-axis iscolinear with the surface normal and the center interfacepoint is in the origin. Now the relative positions (cid:126)e i (equal to the D3Q27 LBM streaming directions, eq. (52))of all neighboring interface lattice points are gatheredand transformed into the rotated coordinate system.During this step, the approximate interface position ofneighboring interface points (streaming directions, figure4b) is corrected to the much more accurate interface10 Application: Curvature Calculation for VoF-LBM on the GPU Analytic solution to the PLIC Problem (a) (b) (c) (d) (e) Figure 4: The curvature calculation procedure with PLIC illustrated in 2D. (a) The fill levels of the interfacelattice points indicate the position of the true interface (not known; here illustrated as a black curve), but onlya 3 -neighborhood of these fill levels is available in memory. For obtaining the local curvature, the steps are to(b) identify all interface neighbors (eq. (39)), (c) correct the relative interface neighbor positions with the PLICoffset (section 2), (d) rotate/translate these now PLIC-corrected points into a coordinate system (eq. (46)) with thePLIC-corrected center point being the origin and the z-axis being colinear with the local surface normal (appendix10.1) and finally (e) perform a paraboloid fit with these points (appendix 10.3).position via the PLIC plane-cube intersection solution(section 2, figure 4c and 4d): (cid:126)p i = x i y i z i := (cid:126)e i · (cid:126)b x (cid:126)e i · (cid:126)b y (cid:126)e i · (cid:126)b z + d ( ϕ i , (cid:126)n ) − d ( ϕ , (cid:126)n ) (46)Here i is only the subset of { , .., } for which 0 < ϕ i < d ( V = ϕ i , (cid:126)n ) denotes thePLIC function (equation (8)). Note that d ( V = ϕ , (cid:126)n )only needs to be calculated once while d ( V = ϕ i , (cid:126)n ) hasto be calculated for each neighboring interface point andthat the normal vectors of neighboring interface latticepoints are approximated to be equal to the normal vectorof the center lattice point. In theory, going with theseparately calculated neighbor normal vectors – whichwould require either an additional data buffer for normalvectors in memory or alternatively a 5 neighborhoodwhich would break the multi-GPU capabilities of thecode – should be more accurate, but our practical testsindicated no noticeable difference.The set of points (cid:126)p i is then used to fit a localparaboloid. This paraboloid (figure 4e) here lacks avertical offset parameter as that is handled alreadyby the center point being defined as the origin,reducing computational cost to a LU-decomposition ofdimensionality N = 5. The paraboloid has the form z ( x, y ) = Ax + By + Cxy + Hx + Iy =: (cid:126)x · (cid:126)Q (47)with (cid:126)x := ( A, B, C, H, I ) T (48) (cid:126)Q := ( x , y , x y, x, y ) T (49)The solution vector (cid:126)x and thus the fitting parametersare calculated following the procedure in appendix 10.3. Finally, the constants A , B , C , H and I are inserted intothe analytic equation for the curvature (62), completingthe algorithm. The analytic plane-cube intersection solution presentedin this work has originally been developed for the VoF-LBM GPU simulation code
FluidX3D , where we couldobserve a significant speedup compared to when aniterative nested-intervals solution is used. To illustratethis particular application of PLIC, we show a simulationof a 5 mm diameter raindrop impact at terminal velocityin figure 5. The parameters for this simulation are Re = 35195, We = 5702, Fr = 41 . Ca = 0 . Bo =3 . FluidX3D is documented ingreat detail in [6].11 Application: Curvature Calculation for VoF-LBM on the GPU Analytic solution to the PLIC Problem (a) t = 0 ms (b) t = 1 ms (c) t = 2 ms(d) t = 3 ms (e) t = 4 ms (f) t = 5 ms Figure 5: A 5 mm diameter raindrop impacting a lake at 9 . ms mean sea level pressure terminal velocity [30] and10 ◦ C water temperature simulated with the VoF-LBM GPU simulation code
FluidX3D at a lattice resolution of400 × ×
340 with the D3Q19 discretization and the SRT collision operator. The simulation box in SI-units hasthe dimensions 5 .
00 cm × .
00 cm × .
25 cm and the pool height is 2 .
00 cm. Compute time for this simulation is lessthan one minute on a single Nvidia Titan Xp GPU. Visualization is done with a custom GPU implementation ofthe marching cubes algorithm [31–33]. 12 Declarations Analytic solution to the PLIC Problem
We derived an analytic solution to the PLIC problemand compared it to the existing solution by Scardovelliand Zaleski [4] in two variants: an implementationby Kawano [5] and an improved and micro-optimizedversion thereof. We furthermore compared thesethree analytic solutions to two iterative solutions usingNewton-Raphson and nested intervals. We providesOpenCL C implementations of all variants as well as theinverse PLIC formulation.We observed that in a synthetic benchmark scenariowhere AVX2 vectorization is available on the CPU,the Newton-Raphson solution (listing 6) is considerablyfaster than all other solutions because the analyticsolutions require trigonometric functions which cannotbe vectorized. Our benchmark calculated PLICrepeatedly in a loop. With FP32 and AVX2, every eightconsecutive iterations are combined into one vectorizediteration, completing in far fewer clock cycles thanthe eight iterations separately would. In real-worldapplications however, PLIC is not computed repeatedlyin multiples of eight, greatly limiting vectorizationpotential.Without vectorization, on both the CPU and GPU theanalytic solutions are faster, with our micro-optimizedversion of the SZ solution as presented in listing 4 beingfastest. For a generic PLIC problem, this is the solutionwe recommend.In the most common application of PLIC –curvature calculation for Volume-of-Fluid LBM, whichwe presented in some detail, bringing together the variousrequired parts from literature – profiling revealed PLICto be the main bottleneck regarding compute time Here,the presented fast PLIC solution led to significant speedup of VoF calculations. We hope that our findings willalso make other simulation codes more computationallyefficient.
We gratefully acknowledge computing time provided bythe SuperMUC system of the Leibniz Rechenzentrum,Garching. We further acknowledge support throughthe computational resources provided by the BavarianPolymer Institute. We acknowledge the NVIDIA Cor-poration for donating a Titan Xp GPU for our research.
ML acknowledges funding by the Deutsche Forschungs-gemeinschaft (DFG, German Research Foundation) -Project number 391977956 - SFB 1357 ”Microplastics”(subproject B04).
The authors declare no potential conflicts of interests.
All data is archived and available upon request.
All code beyond what is provided in the listings isarchived and available upon request.
Analytic calculations were done by ML (section 2) andSG (section 3). ML did programming and testing. MLwrote the initial draft and created the illustrations. MLand SG did review and editing on the draft.13 References Analytic solution to the PLIC Problem [1] David L Youngs. “Time-dependent multi-materialflow with large fluid distortion”. In:
Numericalmethods for fluid dynamics (1982).[2] David L Youngs. “An interface tracking methodfor a 3D Eulerian hydrodynamics code”. In:
Atomic Weapons Research Establishment (AWRE)Technical Report
Computers & Mathematics with Applications
Journal ofComputational Physics
Computers & Fluids
134 (2016),pp. 130–145.[6] Moritz Lehmann.
High Performance Free SurfaceLBM on GPUs . 2019.[7] Fabian Husl.
MPI-based multi-GPU extension ofthe Lattice Boltzmann Method . 2019.[8] Simon Bogner, Ulrich R¨ude, and Jens Harting.“Curvature estimation from a volume-of-fluidindicator function for the simulation of surfacetension and wetting with a free-surface latticeBoltzmann method”. In:
Physical Review E
Journalof Statistical Physics
Technical Report05-4. University ofErlangen-Nuremberg, Germany (2005).[11] Thomas Pohl.
High performance simulation of freesurface flows using the lattice Boltzmann method .Verlag Dr. Hut, 2008.[12] Martin Schreiber and DTMP Neumann. “GPUbased simulation and visualization of fluids withfree surfaces”. PhD thesis. Diploma Thesis,Technische Universit¨at M¨unchen, 2010.[13] St´ephane Popinet. “An accurate adaptive solverfor surface-tension-driven interfacial flows”. In:
Journal of Computational Physics
International Journal of Computational Fluid Dy-namics
International journal for numerical methods influids
Computers & fluids
Computers & Mathematics with Applications
Computers & Mathematicswith Applications
Computers & Mathematics with Applications .IEEE. 2018, pp. 825–834.[22] Mark J Mawson and Alistair J Revell. “Memorytransfer optimization for a lattice Boltzmannsolver on Kepler architecture nVidia GPUs”. In:
Computer Physics Communications
Computers & Mathematics with Applications
Computers& Mathematics with Applications
Springer International Publishing
The mathematical theory of non-uniform gases: an account of the kinetic theoryof viscosity, thermal conduction and diffusion ingases . Cambridge university press, 1990.[27] Acep Purqon et al. “Accuracy and NumericalStabilty Analysis of Lattice Boltzmann Methodwith Multiple Relaxation Time for IncompressibleFlows”. In:
Journal of Physics: Conference Series .Vol. 877. 1. IOP Publishing. 2017, p. 012035.[28] Microsoft Corporation.
Vectorizer and parallelizermessages . 2019. url : https://docs.microsoft.com/en-us/cpp/error-messages/tool-errors/vectorizer - and - parallelizer - messages ?view=vs-2019 (visited on 05/11/2020).[29] NVIDIA Corporation. Parallel Thread ExecutionISA Version 6.4 . 2019. url : https : / / docs .nvidia . com / cuda / parallel - thread -execution/index.html (visited on 10/25/2019).[30] Federico Porc`u et al. “Effects of altitude onmaximum raindrop size and fall velocity aslimited by collisional breakup”. In: Journal of theatmospheric sciences
Polygonising a scalar field . 1994.[32] William E Lorensen and Harvey E Cline. “March-ing cubes: A high resolution 3D surface construc-tion algorithm”. In:
ACM siggraph computer graph-ics
Journal of Computer Graphics Techniques Vol
Two and three dimen-sional Eulerian simulation of fluid flow with ma-terial interfaces . Atomic Weapons Establishment,1992.[35] Andrew N Pressley.
Elementary differential geom-etry . Springer Science & Business Media, 2010.[36] Elsa Abbena, Simon Salamon, and Alfred Gray.
Modern differential geometry of curves and sur-faces with Mathematica . Chapman and Hall/CRC,2017.[37] Jingyi Yu et al. “Focal surfaces of discretegeometry”. In:
ACM International ConferenceProceeding Series . Vol. 257. 2007, pp. 23–32.[38] Zvi Harel. “Curvature of curves and surfaces–aparabolic approach”. In:
Department of Mathemat-ics, Technion–Israel Institute of Technology (1995).[39] Yan-Bin Jia. “Gaussian and Mean Curvatures”. In:(2018). [40] David Eberly. “Least squares fitting of data”. In:
Chapel Hill, NC: Magic Software (2000).15 Appendix A: PLIC Inversion with Mathematica Analytic solution to the PLIC Problem
In[1]:= $Assumptions = { x, y, a, b, c, V, n , n , n } ∈ Reals
Out[1]= ( x y a b c V n n n ) ∈ In[2]:= f : = c - a * ( - I * ( / ))/ ( x + I * y ) ^ ( / )- b * ( + I * ( / ))* ( x + I * y ) ^ ( / ) fFullSimplify [ ComplexExpand [ Re [ f ]]] Out[3]= c - - ⅈ a ( x + ⅈ y ) / - + ⅈ ( x + ⅈ y ) / Out[4]= c + a + b x + y / - Cos Arg [ x + ⅈ y ] + Arg [ x + ⅈ y ] x + y / In[5]:=
V1 : = d^3 / ( * n * n * n ) V1Solve [ V ⩵ V1, d ] Out[6]= d n n Out[7]= d → -(- ) / V / n / n / n / , d → / V / n / n / n / , d → (- ) / / V / n / n / n / In[8]:=
V2 : = ( d^3 -( d - n ) ^3 )/ ( * n * n * n ) V2Solve [ V ⩵ V2, d ] Out[9]= d -( d - n ) n n Out[10]= d →
12 n - - n +
24V n n , d →
12 n + - n +
24V n n In[11]:=
V3 : = ( d^3 -( d - n ) ^3 -( d - n ) ^3 )/ ( * n * n * n ) V3Solve [ V ⩵ V3, d ] Out[12]= d -( d - n ) -( d - n ) n n Out[13]= d → n + n + × / n n n + n - n n + - n + n + n - n n / + n + n - n n + - n + n + n - n n / × / , d → n + n - × / + ⅈ n n n + n - n n + - n + n + n - n n / - × / - ⅈ n + n - n n +- n + n + n - n n / , d → n + n - × / - ⅈ n n n + n - n n + - n + n + n - n n / - × / + ⅈ n + n - n n + - n + n + n - n n / plic_solution.nb In[14]:=
V4 : = ( d^3 -( d - n ) ^3 -( d - n ) ^3 -( d - n ) ^3 )/ ( * n * n * n ) V4Solve [ V ⩵ V4, d ] Out[15]= d -( d - n ) -( d - n ) -( d - n ) n n Out[16]= d → ( n + n + n )- - ( n + n + n ) + n + n + n × / n n - n n +( n n - n n ) + - ( n + n + n ) + n + n + n / + × / n n - n n +( n n - n n ) + - ( n + n + n ) + n + n + n / , d → ( n + n + n )+ + ⅈ - ( n + n + n ) + n + n + n × / n n - n n +( n n - n n ) + - ( n + n + n ) + n + n + n / - × / - ⅈ n n - n n +( n n - n n ) + - ( n + n + n ) + n + n + n / , d → ( n + n + n )+ - ⅈ - ( n + n + n ) + n + n + n × / n n - n n +( n n - n n ) + - ( n + n + n ) + n + n + n / - × / + ⅈ n n - n n +( n n - n n ) + - ( n + n + n ) + n + n + n / plic_solution.nb In[17]:=
V5 : = ( d -( n + n )/ )/ n V5Solve [ V ⩵ V5, d ] Out[18]= d + (- n - n ) n Out[19]= d → ( n + n +
2V n ) plic_solution.nb
160 Appendix B Analytic solution to the PLIC Problem
10 Appendix B: Paraboloid Cur-vature, Interface Normal andLeast-Squares Fit Neighborhood
Calculating the normal vector on an interface latticepoint in a 3 neighborhood in which all fill levels ϕ i areknown works by applying the gradient to the fill levels: ∇ ϕ ( x, y, z ) = ∂∂x ϕ ( x, y, z ) ∂∂y ϕ ( x, y, z ) ∂∂z ϕ ( x, y, z ) ≈ ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ −− ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ = 118 (cid:88) i =1 (cid:126)e i ϕ i (50)This is called the center of mass (CM) method: (cid:126)n CM := − (cid:80) i =1 (cid:126)e i ϕ i | (cid:80) i =1 (cid:126)e i ϕ i | (51) (cid:126)e i are the directions from the center point of the 3 -neighborhood to all of its 26 neighbors including itself: (cid:126)e i = ± ± ± ± ± ± ± ± ∓
10 0 ± ± ± ∓ ± ± ± ∓ ±
10 0 0 ± ± ± ∓ ∓ ± ∓ ± ± , i ∈ [0 ,
26] (52)Another more accurate approach is the Parker-Youngs(PY) approximation [11, 34] which assigns differentweights to the gradient components: (cid:126)n PY := − (cid:80) i =1 w i (cid:126)e i ϕ i | (cid:80) i =1 w i (cid:126)e i ϕ i | (53)with w i := | (cid:126)c i | = 12 for | (cid:126)c i | = √
24 for | (cid:126)c i | = √ ◦ while for PY it is approximately 1 ◦ . For thesurface curvature algorithms below, the more accurateand equally fast PY method is used. A paraboloid curve is described by z = f ( x, y ) = A x + B y + C x y + H x + I y + J (55)where A , B , C , H , I and J are fitting parameters.For such a 2D surface in 3D space in the Monge patch x, y, z = f ( x, y ), the mean curvature [35, p.185][36–39]is κ := f xx (cid:0) f y + 1 (cid:1) + f yy (cid:0) f x + 1 (cid:1) − f xy f x f y (cid:16)(cid:113) f x + f y + 1 (cid:17) (56)The partial derivatives of eq. (55) evaluated at the point( x = 0 , y = 0) are f xx | x = y =0 = 2 A (57) f yy | x = y =0 = 2 B (58) f xy | x = y =0 = C (59) f x | x = y =0 = 2 A x + C y + H | x = y =0 = H (60) f y | x = y =0 = 2 B y + C x + I | x = y =0 = I (61)so that the mean curvature for the paraboloid at theorigin is given by κ := A ( I + 1) + B ( H + 1) − C H I (cid:0) √ H + I + 1 (cid:1) . (62)We note here that [8] in equation (13) have an erroneousfactor 2 and that [13] use a different definition of themean curvature. The strategy for finding the requiredfitting parameters is to apply a least-squares fit on aneighborhood of points on the interface.170 Appendix B Analytic solution to the PLIC Problem The least-squares method [40] is a procedure for fittingan analytic curve – here a Monge patch – to a set ofdiscretized points located nearby the analytic curve. Thegeneral idea is to define the total error as a generalexpression of all fitting parameters and the entire set ofdiscretized points and then find its global minimum byzeroing its gradient.The analytic curve first needs to be written in a dotproduct form z ( x, y ) = (cid:126)x · (cid:126)Q (63)with (cid:126)x being defined as the vector of parameters thatdefine the curve and (cid:126)Q = (cid:126)Q ( x, y ) being an expressionof the continuous coordinates x and y . This equationis then discretized to a set of individual data points( x i , y i , z i ) z i ( x i , y i ) ≈ (cid:126)x · (cid:126)Q i (64)with (cid:126)Q i = (cid:126)Q i ( x i , y i ) being a vector containingexpressions only dependent on a discretized set of points( x i , y i ) whose corresponding z -component z i is locatedclose to the curve. In this notation, the error E betweenthe z-positions of the analytic curve (cid:126)x · (cid:126)Q and a set ofz-positions of at least N neighboring interface points z i is defined by summing up the squared differences E ( (cid:126)x ) = N (cid:88) i =0 ( (cid:126)x · (cid:126)Q i − z i ) (65)whereby N denotes the dimensionality which is equal tothe number of desired fitting parameters. The gradientof the error E is calculated and set to zero, where theerror must have a global minimum: ∇ E ( (cid:126)x ) = 2 N (cid:88) i =0 ( (cid:126)x · (cid:126)Q i − z i ) (cid:126)Q i = 0 (66)With some algebra, this equation is then transformedinto a linear equation (cid:32) N (cid:88) i =0 (cid:126)Q i (cid:126)Q i T (cid:33) (cid:126)x = N (cid:88) i =0 z i (cid:126)Q i (67) M := N (cid:88) i =0 (cid:126)Q i (cid:126)Q i T (cid:126)b := N (cid:88) i =0 z i (cid:126)Q i (68) M (cid:126)x = (cid:126)b (69)which is solved by LU-decomposition and provides thedesired solution (cid:126)x that uniquely defines the curve.Note that the matrix M is always symmetrical,meaning that only the upper half and diagonal haveto be calculated explicitly and the lower half is copied over. This reduces computational cost significantly dueto every matrix element being a sum over an expressiondepending on all fitted points. In case there are lessthan N data points available (lattice points next tosolid boundaries may have less interface neighbors), theregular fitting will not work. Instead, then the amount offitting parameters is decreased to match the number ofavailable data points by reducing dimensionality in theLU-decomposition. The ignored fitting parameters willremain zero.Finally, from the solution vector (cid:126)x(cid:126)x