[PDF] Analytic Solution to the Piecewise Linear Interface Construction Problem and its Application in Curvature Calculation for Volume-of-Fluid Simulation Codes

Abstract

The plane-cube intersection problem has been around in literature since 1984 and iterative solutions to it have been used as part of piecewise linear interface construction (PLIC) in computational fluid dynamics simulation codes ever since. In many cases, PLIC is the bottleneck of these simulations regarding compute time, so a faster, analytic solution to the plane-cube intersection would greatly reduce compute time for such simulations. We derive an analytic solution for all intersection cases and compare it to the one previous solution from Scardovelli and Zaleski (Ruben Scardovelli and Stephane Zaleski. "Analytical relations connecting linear interfaces and volume fractions in rectangular grids". In: Journal of Computational Physics 164.1 (2000), pp. 228-237.), which we further improve to include edge cases and micro-optimize to reduce arithmetic operations and branching. We then extend our comparison regarding compute time and accuracy to include two different iterative solutions as well. We find that the best choice depends on the employed hardware platform: on the CPU, Newton-Raphson is fastest with vectorization while analytic solutions perform better without. The reason for this is that vectorization instruction sets do not include trigonometric functions as used in the analytic solutions. On the GPU, the fastest method is our optimized version of the analytic SZ solution. We finally provide details on one of the applications of PLIC: curvature calculation for the Volume-of-Fluid model used for free surface fluid simulations in combination with the lattice Boltzmann method.

Full PDF

AAnalytic Solution to the Piecewise Linear Interface ConstructionProblem and its Application in Curvature Calculation forVolume-of-Fluid Simulation Codes

Moritz Lehmann Stephan GekleBioﬂuid Simulation and Modeling – Theoretische Physik VIUniversity of Bayreuth, 95440 Bayreuth, GermanyTel.: +49 921 55 3094, eMail: [email protected] 19, 2020

Keywords

PLIC, plane-cube intersection, Volume-of-Fluid, LatticeBoltzmann Method, GPU, OpenCL

Abstract

The plane-cube intersection problem has been aroundin literature since 1984 and iterative solutions to ithave been used as part of piecewise linear interfaceconstruction (PLIC) in computational ﬂuid dynamicssimulation codes ever since. In many cases, PLIC isthe bottleneck of these simulations regarding computetime, so a faster, analytic solution to the plane-cubeintersection would greatly reduce compute time for suchsimulations. We derive an analytic solution for allintersection cases and compare it to the one previoussolution from Scardovelli and Zaleski (Ruben Scardovelliand Stephane Zaleski. ”Analytical relations connectinglinear interfaces and volume fractions in rectangulargrids”. In: Journal of Computational Physics 164.1(2000), pp. 228 237.), which we further improveto include edge cases and micro-optimize to reducearithmetic operations and branching. We then extendour comparison regarding compute time and accuracy toinclude two diﬀerent iterative solutions as well. We ﬁndthat the best choice depends on the employed hardwareplatform: on the CPU, Newton-Raphson is fastest withvectorization while analytic solutions perform betterwithout. The reason for this is that vectorizationinstruction sets do not include trigonometric functionsas used in the analytic solutions. On the GPU, thefastest method is our optimized version of the analyticSZ solution.We ﬁnally provide details on one of the applicationsof PLIC – curvature calculation for the Volume-of- Fluid model used for free surface ﬂuid simulations incombination with the lattice Boltzmann method.

Piecewise linear interface construction (PLIC) – ﬁrstoccurring in literature for 2D in 1982 [1] and for 3D in1984 [2] – refers to the problem of calculating the oﬀsetalong the given normal vector of a plane intersectinga unit cube for a given truncated volume. There areﬁve possible intersection cases (cf. ﬁgure 1), of whichthe numbers (1), (2) and (5) have been already solvedin the original 1984 work by Youngs [2], but the cubicpolynomial cases (3) and (4) – resigned as impossible toalgebraically invert [3] – in the majority of literature areapproximated by a Newton-Raphson iterative solution.Nevertheless, there does exist an analytic solution byScardovelli and Zaleski (SZ) [4] and a single documentedimplementation thereof in Fortran [5] which also includesan approximative version termed APPLIC.Here, we formulate the PLIC problem from the groundup – ﬁrst in the inverse direction – and derive analternative analytic solution for all intersection cases byinverting the inverse formulation. We then compare ournovel solution with (i) the original SZ solution, (ii) animproved and micro-optimized version of the SZ solutiondeveloped in the present work, (iii) an iterative solutionusing Newton-Raphson and (iv) an iterative solutionusing nested intervals. Depending on the availablemicroarchitecture (GPU/CPU), vectorization may beavailable, which strongly favors multiplications andadditions while not speeding up trigonometric functions,impacting which of the algorithms is fastest.Among the applications for PLIC are Volume-of-Fluidsimulation codes such as

FluidX3D [6, 7] and others [8–17], often in conjunction with GPU implementations [6,1 a r X i v : . [ phy s i c s . c o m p - ph ] J un Plane-Cube Intersection Analytic solution to the PLIC Problem7, 18–24] of the lattice Boltzmann method [25–27], usedfor simulating free surface ﬂuid ﬂows. In particular, thesesimulations work on a cubic lattice with every latticepoint having a ﬁll level assigned to it and PLIC is usedin the process of surface reconstruction during curvaturecalculation for calculating physical surface tension eﬀects[6, 8]. In the ﬁnal section of this work, we providea detailed overview on the state-of-the-art curvaturecalculation procedure using PLIC.

Inputs to the PLIC algorithm are the truncated volume V ∈ [0 ,

1] and the (normalized) normal vector of theplane (cid:126)n = ( n x , n y , n z ) T , | (cid:126)n | = n x + n y + n z = 1. Thedesired output is the plane oﬀset from the origin (centerof the unit cube) along the normal vector d V , ( n x , n y , n z ) T → d (1)where d ∈ [ − | n x | + | n y | + | n z | , | n x | + | n y | + | n z | ]. Theinterval is determined by the normal vector orientation:depending on the normal vector, the maximum possibledistance from the cube center to be still at least touchingthe cube in one point varies between (normal vectorparallel to one of the coordinate system axes) and √ (normal vector along the space diagonal). To reduce the amount of possible cases and to avoidhaving to consider all possible intersections of the planeand cube edges – following the scheme in [2] and [14] – the normal vector is component-wise mirrored intopositive. The mirrored normal vector components aresorted ascending for their magnitude such that 0 ≤ n ≤ n ≤ n ≤

1. Because (cid:126)n is normalized, the absolute valueof its largest component n is always greater than zero. n := min( | n x | , | n y | , | n z | ) ≥ n := max( | n x | , | n y | , | n z | ) > n := | n x | + | n y | + | n z | − n − n ≥ V ( d ) is symmetric around d = 0 andincreasing monotonically, the reduced-symmetry-volume V ∈ [0 , ] is limited to the lower half of the intersectionvolume V and the upper half is reconstructed from We note that in [14] in equations (21) and (23) respectivelythe ”+” should be a ” − ” and in equation (24) the ” > ” should bea ” < ”. symmetry. V := 12 − (cid:12)(cid:12)(cid:12)(cid:12) V − (cid:12)(cid:12)(cid:12)(cid:12) (5) V =sign( d ) (cid:18) − V (cid:19) + 12 (6)This symmetry condition for the case V > is nowapplied to d , and the coordinate origin is shifted from(0 , ,

0) (center of the unit cube) to ( − , − , − )(bottom left corner in the back in ﬁg. 1), resulting in thedistance d ∈ [0 , n + n + n ] in reduced symmetry space: d := n + n + n − | d | (7) d =sign (cid:18) V − (cid:19) (cid:18) n + n + n − d (cid:19) (8)With this reduction in symmetry, there are only ﬁvediﬀerent intersection cases remaining (see ﬁgure 1).2 Plane-Cube Intersection Analytic solution to the PLIC Problem (5)(4)(3)(2)(1) s s s s s s s s s s s s s s s d d d d d Figure 1: All possible intersection cases of a plane and a unit cube. The truncated volume of cases (1) to (4) is atetrahedral pyramid with zero (1), one (2), two (3) or all three (4) corners extending outside of the unit cube beingcut-oﬀ tetrahedral pyramids themselves.

In order to derive the analytic PLIC solution, ﬁrstthe inverse problem is formulated in equations – againfollowing the scheme in [2]. In the inverse problem, theintersection volume is calculated from the plane oﬀsetand normal vector as inputs. At ﬁrst, the intersectionpoints s , s and s of the plane with the coordinatesystem axes (see ﬁgure 1) are determined: s := dn ≥ s := dn ≥ s := dn (9) Now one calculates the actual volume in reducedsymmetry space. The approach is to calculate the volumeof the tetrahedral pyramid formed by the plane andthe coordinate system axes and, if necessary, subtractthe volumes of zero, one, two or all three corners thatextend beyond 1. For case (3), an additional conditionis required to mutually exclude case (5), in which thebottom four corners of the cube are located beneath theplane. V =  s s s if s < s s (cid:18) s − ( s − (cid:16) − s (cid:17) (cid:19) if s ≥ s ≤ s (cid:18) s s − ( s − s (cid:16) − s (cid:17) − ( s − s (cid:16) − s (cid:17) (cid:19) if s ≥ s ≤ s ( s − ≤ s (cid:18) s s s − ( s − s s (cid:16) − s (cid:17) − ( s − s s (cid:16) − s (cid:17) − ( s − s s (cid:16) − s (cid:17) (cid:19) if s ≥ s (cid:16) − s − s (cid:17) otherwise (10)To shorten equation (10), s , s and s are substituted and the expression is simpliﬁed, yielding V = 16 n n n ·  d (1) if d < n ( d − ( d − n ) ) (2) if n ≤ d ≤ n ( d − ( d − n ) − ( d − n ) ) (3) if n ≤ d ≤ min( n + n , n )( d − ( d − n ) − ( d − n ) − ( d − n ) ) (4) if n ≤ d n n ( d − ( n + n )) (5) if min( n + n , n ) ≤ d ≤ n (11)which is already quite a bit more friendly andcompletes the inverse PLIC formulation in conjunction with equations (2) to (4), (7) and (6). The condition forcase (5) is the remaining free sector of the possible range3 Plane-Cube Intersection Analytic solution to the PLIC Problemof d mutually excluded by the other four cases. Listing1 shows the fully optimized OpenCL C implementationof the inverse PLIC solution. float plic_cube_inverse(const float d0, const float3 n) { // unit cube - (cid:44) → plane intersection: plane offset d0, normal vector n -> volume V0 (cid:44) → in [0,1]const float n1 = fmin(fmin(fabs(n.x), fabs(n.y)), fabs(n.z)); // eliminate (cid:44) → most cases due to symmetryconst float n3 = fmax(fmax(fabs(n.x), fabs(n.y)), fabs(n.z));const float n2 = fabs(n.x)-n1+fabs(n.y)+fabs(n.z)-n3;const float d = 0.5f ∗ (n1+n2+n3)-fabs(d0); // calculate PLIC with reduced (cid:44) → symmetry , shift origin from (0.0,0.0,0.0) -> (0.5,0.5,0.5)float V; // 0.0<=V<=0.5if(fmin(n1+n2, n3)<=d && d<=n3) { // case (5)V = (d-0.5f ∗ (n1+n2))/n3; // avoid division by n1 and n2} else if(d0.5} Listing 1: Fully optimized OpenCL C implementationof the inverse PLIC solution. To avoid branchingbetween cases (3) and (4), in the implementation the fdimf(x,y):=max(x-y,0) function is used. Case (2)cannot be included with another fdimf(x,y) because incase (2) division by n must be avoided since it could bezero. Equation (11) is now inverted for each case individually.Cases (1), (2) and (5) are easy, but cases (3) and (4) arenon-trivial third order polynomials. Here we make useof the tool Mathematica, which outputs three complexsolutions for cases (3) and (4) each (section 9), of whichthe third solutions respectively are the right ones astheir imaginary parts in the desired range are zero aftersimpliﬁcation. Luckily, both are of the same overall form(eq. (13)). However, a complex solution is not useful heresince the expected result is a real number – a problemknown as the casus irreducibilis – and most programminglanguages cannot deal with complex numbers natively.It would also lead to unwanted computational overheadto carry along the imaginary part during computation,which in the end will be zero anyway. To overcomethis, we again make use of Mathematica to simplify thegeneral form of the complex solution (eq. (13)) in orderto obtain the real, trigonometric solution, which is theneven further simpliﬁed using the trigonometric identity √ α ) − cos( α ) = − π − α ) such that the numberof trigonometric functions (which are computationallyexpensive compared to simpler operations such asadditions or multiplications) is minimized (eq. (12)).f ( x, y, a, b, c ) := c − a + b (cid:112) x + y (cid:112) x + y sin (cid:18) π −

13 atan2( y, x ) (cid:19) = (12)= c − a (1 − i √ √ x + i y − b (1 + i √ (cid:112) x + i y (13)For better readability, a few expressions are pre-deﬁned.Hereby the normalization condition n + n + n = 1 isapplied. x := 81 n n ( n + n − V n ) > y := (cid:113) n n ) − x ≥ a := √ n n (16) b := 1 √

432 (17) c := n + n (18) t := 9 ( n + n + n ) −

18 (19) x := 324 n n n (1 − V ) ≥ y := (cid:113) t − x ≥ a := 1 √ t (22) b := 1 √ c := n + n + n d =  d = √ V n n n (1) if d < n d = n + (cid:113) V n n − n (2) if n ≤ d ≤ n d = f ( x , y , a , b , c ) (3) if n ≤ d ≤ min( n + n , n ) d = f ( x , y , a , b , c ) (4) if n ≤ d d = V n + n + n (5) if min( n + n , n ) ≤ d ≤ n (25)4 Plane-Cube Intersection Analytic solution to the PLIC Problemin conjunction with equations (2) to (4), (5), (8), (12)and (14) to (24).In equation (25) it is noteworthy that the condi-tions for the ﬁve diﬀerent cases are determined aposteriori by the result itself. This means that each casehas to be evaluated successively and for the resultingvalue d the respective condition has to be tested. Ifthe condition is true, calculation is stopped and d isreturned. If the condition is false, the next case has tobe evaluated and so on, until the last case is reached.The order in which the cases are computed andchecked can be optimized to calculate the most diﬃcult and infrequent cases last, when the probability is highthat one of the easier and more frequent cases hasalready been chosen. ’Frequent’ here refers to somecases appearing more often than others with randomized V and (cid:126)n as expected in typical PLIC applications.Here also special considerations for edge cases (moreon that below) need to be taken into account to avoidpossible divisions by zero. With this in mind, the order(5) → (2) → (1) → (3) → (4) is preferred.Additional speedup can be gained by noting that theimplicit condition involving d can be replaced by anexplicit condition involving V for cases (1), (2) and (5): d =  d = √ V n n n (1) if 6 V n n < n d = n + (cid:113) V n n − n (2) if 3 n ( V n + n − n ) ≤ n ≤ V n n d = f ( x , y , a , b , c ) (3) if n ≤ d ≤ min( n + n , n ) d = f ( x , y , a , b , c ) (4) if n ≤ d d = V n + n + n (5) if n + n ≤ V n (26)Since V is known, these conditions are checked a prioriin order to avoid root function calls if the condition isfalse.For even more speedup, all redundant mathematicaloperations are reduced to a minimum by pre-calculatingthem to variables (micro-optimization) and conditionchecks mutually excluded by previous checks are skipped,especially all conditions for the very last case. In theimplementation order (5) → (2) → (1) → (3) → (4)the conditions for case (3) simplify to d ≤ n , whichwill mutually exclusive decide between cases (3) and (4).Since both d and d are very complicated expressions,here no simpliﬁed a priori condition is formulated.In equations (19), (21) and (25), the argument of thesquare root may be negative before the case condition istested. In this case – since in the actual code, ﬂoating-point exception handling is turned oﬀ for performancereasons – the resulting NaN of a square root of anegative number would not be captured in the casecondition, leading to a wrong result. An additional fdim function call in the square root solves this issue. Inthe implementation, we artiﬁcially exclude the edge case x = 0 in order to instead of atan2( y, x ) use the fasteratan( y/x ), giving the algorithm a 15% speedup. In casebranching would be undesirable, bit masking is also anoption, but bit masking turned out to be slower even onGPUs.Two edge cases still need to be taken into carefulconsideration: n = 0 (2D) and n = 1 (1D). n = 0restricts (cid:126)n to be in a 2D plane of two coordinate systemaxes and n = 1 restricts (cid:126)n to be parallel to one of the coordinate system axes. For the (1D) case, n = 1and the solution is always (5), so n = n = 0 andmin( n + n , n ) = 0, simplifying equation (25) withoutloss of generality to d = (cid:110) V (5) if 0 ≤ d ≤ d = V (28)without any conditions, so it is a necessary requirementthat both additional conditions in eq. (27) must befulﬁlled automatically. d is in the range 0 ≤ d ≤ n + n + n , so here in the special case we have 0 ≤ d ≤ = ≤

1, which means 0 ≤ d ≤ n = 0 ( n = 0 is excluded heresince it is already covered in the (1D) case, so here n > < n ≤ n ≤ n + n , n ) = n ,simplifying eq. (25) without loss of generality to d = (cid:40) √ V n n (2) if 0 ≤ d ≤ n V n + n (5) if n ≤ d ≤ n (29)A clean derivation of the 2D case yields d = (cid:40) √ V n n (2) if d ≤ n V n + n (5) if n ≤ d (30)which has simpler conditions, so again it is a necessaryrequirement that both additional conditions in eq.5 Plane-Cube Intersection Analytic solution to the PLIC Problem(29) must be fulﬁlled automatically. d is in the range0 ≤ d ≤ n + n + n , so here with the special conditionswe have 0 ≤ d ≤ n + n ≤ n + n = n , meaning thatboth 0 ≤ d and d ≤ n are indeed fulﬁlled automatically.In the above chosen (5) → (2) → (1) → (3) → (4)implementation order, the 1D and 2D special cases arealready covered in (5) and (2) at the beginning, so theyboth are excluded in the remaining intersection cases(1), (3) and (4), meaning that there n , n , n > x > float plic_cube_reduced(const float V, const float n1, const float n2, const (cid:44) → float n3) {const float n1pn2=n1+n2, n3xV=n3 ∗ V;if(n1pn2 <=2.0f ∗ n3xV) return n3xV+0.5f ∗ n1pn2; // case (5)const float V6n2n3=6.0f ∗ n2 ∗ n3xV, sqn1=sq(n1);if(V6n2n3 >=sq(n1) && 3.0f ∗ n2 ∗ (2.0f ∗ n3xV+n1-n2)<=sqn1) return 0.5f ∗ n1 (cid:44) → +0.28867513f ∗ sqrt(24.0f ∗ n2 ∗ n3xV-sqn1); // case (2)if(V6n2n3 0const float y32 = fdim(23328.0f ∗ cb(n1xn2), sq(x3)); // y3>=0const float u3 = cbrt(sq(x3)+y32);const float d3 = n1pn2-(7.5595264f ∗ n1xn2+0.26456684f ∗ u3) ∗ rsqrt(u3) ∗ sin (cid:44) → (0.5235988f-0.33333334f ∗ atan(sqrt(y32)/x3)); // x3>0if(d3<=n3) return d3; // case (3)const float t4 = 9.0f ∗ sq(n1pn2+n3)-18.0f;const float x4 = fmax(n1xn2 ∗ n3 ∗ (324.0f-648.0f ∗ V), 1.1754944E-38f); // (cid:44) → avoid edge case V==0.5 to make x4>0const float y42 = 4.0f ∗ cb(t4)-sq(x4); // y4>=0const float u4 = cbrt(sq(x4)+y42);const float d4 = 0.5f ∗ (n1pn2+n3)-(0.20998684f ∗ t4+0.13228342f ∗ u4) ∗ rsqrt(u4) (cid:44) → ∗ sin(0.5235988f-0.33333334f ∗ atan(sqrt(y42)/x4)); // x4>0return d4; // case (4)}float plic_cube(const float V0, const float3 n) { // unit cube - plane (cid:44) → intersection: volume V0 in [0,1], normal vector n -> plane offset (cid:44) → d0const float ax=fabs(n.x), ay=fabs(n.y), az=fabs(n.z), V=0.5f-fabs(V0-0.5f) (cid:44) → ; // eliminate symmetry casesconst float n1 = fmin(fmin(ax, ay), az);const float n3 = fmax(fmax(ax, ay), az);const float n2 = ax-n1+ay+az-n3;const float d = plic_cube_reduced(V, n1, n2, n3); // calculate PLIC with (cid:44) → reduced symmetryreturn copysign(0.5f ∗ (n1+n2+n3)-d, V0-0.5f); // apply symmetry for V0>0.5} Listing 2: Fully optimized OpenCL C implementation ofour analytic PLIC solution.

The analytic PLIC solution by Scardovelli and Zaleskifrom 2000 [4] has been implemented in Fortran in 2016by Kawano [5] where it is used as comparison to theapproximative APPLIC method. Here we focus onthe exact SZ solution. The SZ solution is particularlyinteresting in that it builds upon the L -normalizedplane normal vector instead of the more common L normalization as used in our own solution in section2.3. We ﬁrst translate the Fortran implementation toOpenCL C, make it compatible with an L -normalizedplane normal vector as input and rescale the result fromthe original [0 ,

1] to [ − | n x | + | n y | + | n z | , | n x | + | n y | + | n z | ]. Theimplementation is provided in listing 3. float plic_cube(const float V0, const float3 n) { // unit cube - plane (cid:44) → intersection: volume V0 in [0,1], normal vector n -> plane offset (cid:44) → d0const float l = fabs(n.x)+fabs(n.y)+fabs(n.z); // length in L1 norm const float ax=fabs(n.x)/l, ay=fabs(n.y)/l, az=fabs(n.z)/l, w=0.5f-fabs(V0 (cid:44) → -0.5f); // eliminate symmetry casesconst float vm1 = fmin(fmin(ax, ay), az);const float vm3 = fmax(fmax(ax, ay), az);const float vm2 = fdim(1.0f, vm1+vm3); // ensure vm2 >=0const float vm12 = vm1+vm2;float alpha = 0.0f;const float v1 = sq(vm1)/(6.0f ∗ vm2 ∗ vm3+1E-25f);const float w6 = 6.0f ∗ vm1 ∗ vm2 ∗ vm3 ∗ w;if(w0.5} Listing 3: The Fortran implementation [5] of the analyticSZ PLIC solution [4] translated to OpenCL C withoutfurther optimization.When closely studying the implementation in listing 3,we notice that the 1D edge case is poorly handled despitethe addition of a tiny constant to the denominator forthe calculation of v1 . In the edge cases of the normalvector (cid:126)n = (1 , , T and the volume V ∈ { , } , thealgorithm returns -NaN both in our OpenCL and inthe original Fortran implementation. By checking cases(5) and (2) ﬁrst and by avoiding divisions by vm1 and vm2 before the checks for cases (5) and (2) respectively,we improve handling of the 1D edge case. We furtherapply some micro-optimization to signiﬁcantly reducethe number of arithmetic operations, resulting in anoptimized implementation of the SZ solution shown inlisting 4 below. float plic_cube_reduced(const float V, const float n1, const float n2, const (cid:44) → float n3) { // optimized solution from SZ and Kawanoconst float n12=n1+n2, n3V=n3 ∗ V;if(n12 <=2.0f ∗ n3V) return n3V+0.5f ∗ n12; // case (5)const float sqn1=sq(n1), n26=6.0f ∗ n2, v1=sqn1/n26; // after case (5) check (cid:44) → n2>0 is trueif(v1<=n3V && n3V0 is (cid:44) → trueconst float sqn12=sqn1+sq(n2), V6cbn12=V6-cb(n1)-cb(n2);const bool case34 = n3V plane offset (cid:44) → d0const float ax=fabs(n.x), ay=fabs(n.y), az=fabs(n.z), V=0.5f-fabs(V0-0.5f) (cid:44) → , l=ax+ay+az; // eliminate symmetry cases , normalize n using L1 (cid:44) → norm const float n1 = fmin(fmin(ax, ay), az)/l;const float n3 = fmax(fmax(ax, ay), az)/l;const float n2 = fdim(1.0f, n1+n3); // ensure n2>=0const float d = plic_cube_reduced(V, n1, n2, n3); // calculate PLIC with (cid:44) → reduced symmetryreturn l ∗ copysign(0.5f-d, V0-0.5f); // rescale result and apply symmetry (cid:44) → for V0>0.5} Listing 4: Fully optimized OpenCL C implementation ofthe SZ PLIC solution.

For comparison, we also provide iterative solutions usingnested intervals in listing 5 and Newton-Raphson inlisting 6. int log2_fast(const float x) { // evil log2 hack: log2(x)=(as_uint(x)>>23)- (cid:44) → (cid:44) → float n3) {const float n1pn2=n1+n2, n3xV=n3 ∗ V;if(n1pn2 <=2.0f ∗ n3xV) return n3xV+0.5f ∗ n1pn2; // case (5)const float V6n2n3=6.0f ∗ n2 ∗ n3xV, sqn1=sq(n1);if(V6n2n3 >=sq(n1) && 3.0f ∗ n2 ∗ (2.0f ∗ n3xV+n1-n2)<=sqn1) return 0.5f ∗ n1 (cid:44) → +0.28867513f ∗ sqrt(24.0f ∗ n2 ∗ n3xV-sqn1); // case (2)if(V6n2n3 0.5} Listing 5: An OpenCL C implementation of the iterativenested intervals solution for cases (3) and (4). float plic_cube_reduced(const float V, const float n1, const float n2, const (cid:44) → float n3) {const float n1pn2=n1+n2, n3xV=n3 ∗ V;if(n1pn2 <=2.0f ∗ n3xV) return n3xV+0.5f ∗ n1pn2; // case (5)const float V6n2n3=6.0f ∗ n2 ∗ n3xV, sqn1=sq(n1);if(V6n2n3 >=sq(n1) && 3.0f ∗ n2 ∗ (2.0f ∗ n3xV+n1-n2)<=sqn1) return 0.5f ∗ n1 (cid:44) → +0.28867513f ∗ sqrt(24.0f ∗ n2 ∗ n3xV-sqn1); // case (2)const float V6n1n2n3 = V6n2n3 ∗ n1;float dmin, dmax, d;if(V6n2n3 0.5} Listing 6: OpenCL C implementation of the iterativeNewton-Raphson solution for cases (1), (3) and (4).Calculating case (1) with Newton-Raphson as wellinstead of the cbrt() function results in a very small, butnoticeable improvement in performance when executedon the CPU with AVX2 vectorization.

Apart from ﬂoating-point errors, the SZ solution inlisting 4 and our own solution in listing 2 produceidentical results. For accuracy comparison, we deﬁne theerror as E i := | plic cube inverse(plic cube( V , (cid:126)n i ) , (cid:126)n i ) − V | (31)with plic cube inverse( d , (cid:126)n i ) referring to the implemen-tation in listing 1 and plic cube( V , (cid:126)n i ) referring to thevarious PLIC implementations. We deﬁne the averageerror as follows: E avg := 1 N L

N L − (cid:88) i =0 ( E i ) (32)The execution time for a ’blank run’ E i, blank := | plic cube inverse( V − , (cid:126)n i ) − V | (33)containing the time for memory loads and stores aswell as compute time for the inverse PLIC algorithm ismeasured and later subtracted from the execution timesof the diﬀerent PLIC variants in order to isolate theirexecution time. The time is also divided by the numberof PLIC function evaluations ( N L ) in order to obtainthe time for a single execution.

In this test, (cid:126)n i is set to N = 4096 diﬀerent normalvectors, of which one is (1 , , T , one is ( √ , √ , T ,510 are random 2D directions in the x - y -plane and theremaining 3584 are random 3D directions. For each ofthese, the volume V is varied in the interval [0 ,

1] – edgecases included – in L = 4096 equally spaced steps. Thetest is executed on a single core of a Coﬀee Lake IntelCore i7-8700K CPU at 4 . /O2 , /Oi , /Ot , /Qpar , /arch:AVX2 , /fp:fast and /fp:except- compiler ﬂagsset. The results are presented in table 1 below.When AVX2 vectorization is available, Newton-Raphsonis considerably faster than all the other solutions.However we note that our benchmark is a particularlysynthetic scenario where vectorization is easily achievablefor the compiler: The benchmark just calculates PLICrepeatedly in a loop. With FP32 and AVX2, everyeight consecutive iterations are combined into onevectorized iteration, completing in far fewer clock cyclesthan the eight iterations separately would. In real-world applications, PLIC is not computed repeatedly inmultiples of eight, greatly limiting vectorization potentialsuch that the strict criteria for auto-vectorization [28]might not be met. Surprisingly, our analytic solutionand the optimized SZ solution are within margin of errorregarding compute time.In the edge cases of the normal vector (cid:126)n = (1 , , T and the volume V ∈ { , } , the SZ solution by Kawanoreturns -NaN , which propagates through the entire erroraveraging procedure. For the IEEE-754 FP32 ﬂoating-point format, the machine epsilon is at (cid:15) = 5 . · − ,meaning that for all other PLIC variants the averageerror E avg is within machine precision.Without compiler optimization ( /Od , /Qpar- ), theexecution time results are very diﬀerent as shownin table 2. With AVX2 vectorization not available,Newton-Raphson considerably falls behind the analyticsolutions. Since in many applications the target platform for thePLIC algorithm is the GPU, we also benchmark thediﬀerent variants in OpenCL on an Nvidia Titan XpGPU (3840 CUDA cores at 1582 MHz, driver version442 .

59, OpenCL 1 . , , T , one is ( √ , √ , T , 8388606 are random 2Ddirections in the x - y -plane and the remaining 58720256are random 3D directions. For these normal vectors,PLIC is run in parallel and for each of them, the volume V is varied in the interval [0 ,

1] – edge cases included– in L = 256 equally spaced steps in series. Formore accurate results through averaging, this test isrun 64 times and the mean execution time is averagedand divided by the number of parallel and serial PLICcomputations, resulting in an average compute time ofmore than three magnitudes shorter than for single- core CPU execution as listed in table 3 below. Eventhough the tests are designed diﬀerently for the CPUand GPU, this comparison is appropriate because inboth cases the same algorithms are computed and thehardware is fully saturated. The table also includes thenumber of arithmetic operations (ﬂoating-point, integerand bit operations combined) N a and the number ofbranching operations N b in PTX assembly [29]. GPUsare especially bad at branching, so a small N b is desired.These operation counts – in analogy to the compute time– refer to the isolated PLIC variants with the background’blank run’ for memory loads and stores as well as theinverse PLIC algorithm subtracted. N a indicates thatNewton-Raphson is unrolled by the compiler whereasnested intervals is not. The number of iterations toreach machine precision for nested intervals depends onthe initial interval width, which is a function of thenormal vector components. For Newton-Raphson, thenumber of branching operations N b is smallest, resultingin rather good performance. Nevertheless, our optimizedimplementation of the SZ solution pulls ahead rathersigniﬁcantly and thus is preferred by us.8 Plane-Sphere Intersection Analytic solution to the PLIC ProblemPLIC variant execution time / ns E avg . ± . . · − . ± . . · − . ± . . · − . ± . − NaN5 nested intervals 270 . ± . . · − Table 1: Comparison of execution time and accuracy of the diﬀerent PLIC variants with compiler optimizationsenabled. PLIC variant execution time / ns E avg . ± . . · − . ± . . · − . ± . − NaN6 Newton-Raphson 180 . ± . . · − . ± . . · − Table 2: Comparison of execution time and accuracy of the diﬀerent PLIC variants with compiler optimizationsdisabled. PLIC variant execution time / ps N a N b E avg . ± . . · − . ± . − NaN2 our analytic solution 19 . ± . . · − . ± . . · − . ± . . · − Table 3: Comparison of execution time and accuracy of the diﬀerent PLIC variants in OpenCL on a GPU.

The PLIC problem can be extended to spherical cellswith unit volume (see ﬁgure 2): d Figure 2: Plane-sphere intersection. Here only oneintersection case is possible.1 = V = 43 π r (34) r = (cid:114) π (35)The oﬀset along the plane normal vector is the desiredresult while the truncated volume is given. Since wehave a sphere, the normal vector direction of the plane is irrelevant. V → d (36)Calculating the inverse PLIC (getting the volume V from a given oﬀset d ∈ [ − r, r ]) is straight-forward andcovered by the ’spherical cap’ equation: V = π r + d ) (2 r − d ) ∈ [0 ,

1] (37)Calculating the inverse function of the above equationagain is quite diﬃcult due to it being a third orderpolynomial with three complex solutions, but by usingthe same trick as in the plane-cube intersection case(equations (12) and (13)), this real expression isobtained: d = (cid:114) π sin (cid:18) π −

13 atan2 (cid:18) (cid:113) V − V , V − (cid:19)(cid:19) (38)Whilst the plane-sphere intersection solution is notparticularly useful for VoF-LBM simulations, it mightbe of interest for some completely diﬀerent applications.9 Application: Curvature Calculation for VoF-LBM on the GPU Analytic solution to the PLIC Problem Volume-of-Fluid (VoF) is a model to simulate a sharp,freely moving interface between a ﬂuid and gas phase ina Cartesian lattice [8–11]. While it can be coupled to anyﬂow solver, here we focus on its usage in conjunction withthe Lattice-Boltzmann-method (LBM). The interface isensured to be exactly one lattice cell thick at any time(illustrated in ﬁgure 3). As an indicator for each latticepoint type, the ﬁll level ϕ is introduced, whereby for ﬂuid lattice points ϕ = 1, for interface > ϕ > gas ϕ = 0: ϕ ( (cid:126)x, t ) := m ( (cid:126)x, t ) ρ ( (cid:126)x, t )  = 1 if (cid:126)x is ﬂuid ∈ ]0 ,

1[ if (cid:126)x is interface = 0 if (cid:126)x is gas (39)Here ρ is the density provided by the lattice Boltzmannmethod (LBM) and m is the ﬂuid mass. m is a conservedquantity and cannot be gained or lost, only moved withinthe simulation box. Figure 3: The idea of the Volume-of-Fluid modelillustrated in 2D: A sharp interface (black curved line)divides the gas phase (white cells) from the ﬂuid phase(dark blue cells). Lattice points are located at the centerof each cell. All cells through which the interface extendsare called interface cells (light blue). Every lattice cellhas a ﬁll level ϕ ∈ [0 ,

1] assigned to it, which is ϕ = 0 for gas , ϕ = 1 for ﬂuid and ϕ ∈ ]0 ,

1[ for interface – basedon where exactly the sharp interface cuts through.The key diﬃculty of modeling a free surface on adiscretized lattice is to obtain the surface curvature,which is a necessary ingredient for calculating the surfacetension via the Young-Laplace pressure∆ p = 2 σ κ (40)with κ = R denoting the local mean curvatureand σ denoting the surface tension parameter of thesimulated ﬂuid. The equation is easy in principle, but calculating κ from the discretized interface geometry isnot. Speciﬁcally, discretized interface here means thatonly a local 3 neighborhood of ﬁll levels ϕ ∈ [0 , interface lattice point. ϕ , ..., ϕ → κ (41)The most common algorithm in literature [8, 11] is thecurvature calculation via a least-squares paraboloid ﬁtfrom a neighborhood of points located on the interface.It assumes the local interface to be a paraboloid, thespeciﬁcs of which will be given in the following sections.Finding an appropriate set of neighboring points on theinterface requires the PLIC solution. Piecewise linear interface construction works on a 3 neighborhood of an interface lattice point (illustratedin 2D in ﬁgure 4a). Within this neighborhood, onlyinterface points other than the center interface point –which is always interface – are considered as candidatesfor the later ﬁtting procedure (ﬁgure 4b). The diﬃcultpart is to accurately obtain the heights z i of at leastﬁve neighboring points located on the true interface(ﬁgure 4c). For this, ﬁrst the normal vector (cid:126)n of thecenter interface point is calculated via the Parker-Youngsapproximation as described in the appendix 10.1 in eq.(53). A new coordinate system is introduced with itsﬁrst base vector (cid:126)b z deﬁned as this normal vector. Then,the cross product with an arbitrary vector such as (cid:126)r := (0 . , . , . T (42)which is always non-colinear with (cid:126)b z just by randomchance is calculated to provide second and thirdorthonormal vectors (cid:126)b z := (cid:126)n (43) (cid:126)b y := (cid:126)b z × (cid:126)r | (cid:126)b z × (cid:126)r | (44) (cid:126)b x := (cid:126)b y × (cid:126)b z (45)forming the new coordinate system in which the z-axis iscolinear with the surface normal and the center interfacepoint is in the origin. Now the relative positions (cid:126)e i (equal to the D3Q27 LBM streaming directions, eq. (52))of all neighboring interface lattice points are gatheredand transformed into the rotated coordinate system.During this step, the approximate interface position ofneighboring interface points (streaming directions, ﬁgure4b) is corrected to the much more accurate interface10 Application: Curvature Calculation for VoF-LBM on the GPU Analytic solution to the PLIC Problem (a) (b) (c) (d) (e) Figure 4: The curvature calculation procedure with PLIC illustrated in 2D. (a) The ﬁll levels of the interfacelattice points indicate the position of the true interface (not known; here illustrated as a black curve), but onlya 3 -neighborhood of these ﬁll levels is available in memory. For obtaining the local curvature, the steps are to(b) identify all interface neighbors (eq. (39)), (c) correct the relative interface neighbor positions with the PLICoﬀset (section 2), (d) rotate/translate these now PLIC-corrected points into a coordinate system (eq. (46)) with thePLIC-corrected center point being the origin and the z-axis being colinear with the local surface normal (appendix10.1) and ﬁnally (e) perform a paraboloid ﬁt with these points (appendix 10.3).position via the PLIC plane-cube intersection solution(section 2, ﬁgure 4c and 4d): (cid:126)p i =  x i y i z i  :=  (cid:126)e i · (cid:126)b x (cid:126)e i · (cid:126)b y (cid:126)e i · (cid:126)b z + d ( ϕ i , (cid:126)n ) − d ( ϕ , (cid:126)n )  (46)Here i is only the subset of { , .., } for which 0 < ϕ i < d ( V = ϕ i , (cid:126)n ) denotes thePLIC function (equation (8)). Note that d ( V = ϕ , (cid:126)n )only needs to be calculated once while d ( V = ϕ i , (cid:126)n ) hasto be calculated for each neighboring interface point andthat the normal vectors of neighboring interface latticepoints are approximated to be equal to the normal vectorof the center lattice point. In theory, going with theseparately calculated neighbor normal vectors – whichwould require either an additional data buﬀer for normalvectors in memory or alternatively a 5 neighborhoodwhich would break the multi-GPU capabilities of thecode – should be more accurate, but our practical testsindicated no noticeable diﬀerence.The set of points (cid:126)p i is then used to ﬁt a localparaboloid. This paraboloid (ﬁgure 4e) here lacks avertical oﬀset parameter as that is handled alreadyby the center point being deﬁned as the origin,reducing computational cost to a LU-decomposition ofdimensionality N = 5. The paraboloid has the form z ( x, y ) = Ax + By + Cxy + Hx + Iy =: (cid:126)x · (cid:126)Q (47)with (cid:126)x := ( A, B, C, H, I ) T (48) (cid:126)Q := ( x , y , x y, x, y ) T (49)The solution vector (cid:126)x and thus the ﬁtting parametersare calculated following the procedure in appendix 10.3. Finally, the constants A , B , C , H and I are inserted intothe analytic equation for the curvature (62), completingthe algorithm. The analytic plane-cube intersection solution presentedin this work has originally been developed for the VoF-LBM GPU simulation code

FluidX3D , where we couldobserve a signiﬁcant speedup compared to when aniterative nested-intervals solution is used. To illustratethis particular application of PLIC, we show a simulationof a 5 mm diameter raindrop impact at terminal velocityin ﬁgure 5. The parameters for this simulation are Re = 35195, We = 5702, Fr = 41 . Ca = 0 . Bo =3 . FluidX3D is documented ingreat detail in [6].11 Application: Curvature Calculation for VoF-LBM on the GPU Analytic solution to the PLIC Problem (a) t = 0 ms (b) t = 1 ms (c) t = 2 ms(d) t = 3 ms (e) t = 4 ms (f) t = 5 ms Figure 5: A 5 mm diameter raindrop impacting a lake at 9 . ms mean sea level pressure terminal velocity [30] and10 ◦ C water temperature simulated with the VoF-LBM GPU simulation code

FluidX3D at a lattice resolution of400 × ×

340 with the D3Q19 discretization and the SRT collision operator. The simulation box in SI-units hasthe dimensions 5 .

00 cm × .

25 cm and the pool height is 2 .

00 cm. Compute time for this simulation is lessthan one minute on a single Nvidia Titan Xp GPU. Visualization is done with a custom GPU implementation ofthe marching cubes algorithm [31–33]. 12 Declarations Analytic solution to the PLIC Problem

We derived an analytic solution to the PLIC problemand compared it to the existing solution by Scardovelliand Zaleski [4] in two variants: an implementationby Kawano [5] and an improved and micro-optimizedversion thereof. We furthermore compared thesethree analytic solutions to two iterative solutions usingNewton-Raphson and nested intervals. We providesOpenCL C implementations of all variants as well as theinverse PLIC formulation.We observed that in a synthetic benchmark scenariowhere AVX2 vectorization is available on the CPU,the Newton-Raphson solution (listing 6) is considerablyfaster than all other solutions because the analyticsolutions require trigonometric functions which cannotbe vectorized. Our benchmark calculated PLICrepeatedly in a loop. With FP32 and AVX2, every eightconsecutive iterations are combined into one vectorizediteration, completing in far fewer clock cycles thanthe eight iterations separately would. In real-worldapplications however, PLIC is not computed repeatedlyin multiples of eight, greatly limiting vectorizationpotential.Without vectorization, on both the CPU and GPU theanalytic solutions are faster, with our micro-optimizedversion of the SZ solution as presented in listing 4 beingfastest. For a generic PLIC problem, this is the solutionwe recommend.In the most common application of PLIC –curvature calculation for Volume-of-Fluid LBM, whichwe presented in some detail, bringing together the variousrequired parts from literature – proﬁling revealed PLICto be the main bottleneck regarding compute time Here,the presented fast PLIC solution led to signiﬁcant speedup of VoF calculations. We hope that our ﬁndings willalso make other simulation codes more computationallyeﬃcient.

We gratefully acknowledge computing time provided bythe SuperMUC system of the Leibniz Rechenzentrum,Garching. We further acknowledge support throughthe computational resources provided by the BavarianPolymer Institute. We acknowledge the NVIDIA Cor-poration for donating a Titan Xp GPU for our research.

ML acknowledges funding by the Deutsche Forschungs-gemeinschaft (DFG, German Research Foundation) -Project number 391977956 - SFB 1357 ”Microplastics”(subproject B04).

The authors declare no potential conﬂicts of interests.

All data is archived and available upon request.

All code beyond what is provided in the listings isarchived and available upon request.

Analytic calculations were done by ML (section 2) andSG (section 3). ML did programming and testing. MLwrote the initial draft and created the illustrations. MLand SG did review and editing on the draft.13 References Analytic solution to the PLIC Problem [1] David L Youngs. “Time-dependent multi-materialﬂow with large ﬂuid distortion”. In:

Numericalmethods for ﬂuid dynamics (1982).[2] David L Youngs. “An interface tracking methodfor a 3D Eulerian hydrodynamics code”. In:

Atomic Weapons Research Establishment (AWRE)Technical Report

Computers & Mathematics with Applications

Journal ofComputational Physics

Computers & Fluids

134 (2016),pp. 130–145.[6] Moritz Lehmann.

High Performance Free SurfaceLBM on GPUs . 2019.[7] Fabian Husl.

MPI-based multi-GPU extension ofthe Lattice Boltzmann Method . 2019.[8] Simon Bogner, Ulrich R¨ude, and Jens Harting.“Curvature estimation from a volume-of-ﬂuidindicator function for the simulation of surfacetension and wetting with a free-surface latticeBoltzmann method”. In:

Physical Review E

Journalof Statistical Physics

Technical Report05-4. University ofErlangen-Nuremberg, Germany (2005).[11] Thomas Pohl.

High performance simulation of freesurface ﬂows using the lattice Boltzmann method .Verlag Dr. Hut, 2008.[12] Martin Schreiber and DTMP Neumann. “GPUbased simulation and visualization of ﬂuids withfree surfaces”. PhD thesis. Diploma Thesis,Technische Universit¨at M¨unchen, 2010.[13] St´ephane Popinet. “An accurate adaptive solverfor surface-tension-driven interfacial ﬂows”. In:

Journal of Computational Physics

International Journal of Computational Fluid Dy-namics

International journal for numerical methods inﬂuids

Computers & ﬂuids

Computers & Mathematics with Applications

Computers & Mathematicswith Applications

Computers & Mathematics with Applications .IEEE. 2018, pp. 825–834.[22] Mark J Mawson and Alistair J Revell. “Memorytransfer optimization for a lattice Boltzmannsolver on Kepler architecture nVidia GPUs”. In:

Computer Physics Communications

Computers & Mathematics with Applications

Computers& Mathematics with Applications

Springer International Publishing

The mathematical theory of non-uniform gases: an account of the kinetic theoryof viscosity, thermal conduction and diﬀusion ingases . Cambridge university press, 1990.[27] Acep Purqon et al. “Accuracy and NumericalStabilty Analysis of Lattice Boltzmann Methodwith Multiple Relaxation Time for IncompressibleFlows”. In:

Journal of Physics: Conference Series .Vol. 877. 1. IOP Publishing. 2017, p. 012035.[28] Microsoft Corporation.

Vectorizer and parallelizermessages . 2019. url : https://docs.microsoft.com/en-us/cpp/error-messages/tool-errors/vectorizer - and - parallelizer - messages ?view=vs-2019 (visited on 05/11/2020).[29] NVIDIA Corporation. Parallel Thread ExecutionISA Version 6.4 . 2019. url : https : / / docs .nvidia . com / cuda / parallel - thread -execution/index.html (visited on 10/25/2019).[30] Federico Porc`u et al. “Eﬀects of altitude onmaximum raindrop size and fall velocity aslimited by collisional breakup”. In: Journal of theatmospheric sciences

Polygonising a scalar ﬁeld . 1994.[32] William E Lorensen and Harvey E Cline. “March-ing cubes: A high resolution 3D surface construc-tion algorithm”. In:

ACM siggraph computer graph-ics

Journal of Computer Graphics Techniques Vol

Two and three dimen-sional Eulerian simulation of ﬂuid ﬂow with ma-terial interfaces . Atomic Weapons Establishment,1992.[35] Andrew N Pressley.

Elementary diﬀerential geom-etry . Springer Science & Business Media, 2010.[36] Elsa Abbena, Simon Salamon, and Alfred Gray.

Modern diﬀerential geometry of curves and sur-faces with Mathematica . Chapman and Hall/CRC,2017.[37] Jingyi Yu et al. “Focal surfaces of discretegeometry”. In:

ACM International ConferenceProceeding Series . Vol. 257. 2007, pp. 23–32.[38] Zvi Harel. “Curvature of curves and surfaces–aparabolic approach”. In:

Department of Mathemat-ics, Technion–Israel Institute of Technology (1995).[39] Yan-Bin Jia. “Gaussian and Mean Curvatures”. In:(2018). [40] David Eberly. “Least squares ﬁtting of data”. In:

Chapel Hill, NC: Magic Software (2000).15 Appendix A: PLIC Inversion with Mathematica Analytic solution to the PLIC Problem

In[1]:= $Assumptions = { x, y, a, b, c, V, n , n , n } ∈ Reals

Out[1]= ( x y a b c V n n n ) ∈  In[2]:= f : = c - a * ( - I * ( / ))/ ( x + I * y ) ^ ( / )- b * ( + I * ( / ))* ( x + I * y ) ^ ( / ) fFullSimplify [ ComplexExpand [ Re [ f ]]] Out[3]= c -  - ⅈ  a ( x + ⅈ y ) / - + ⅈ ( x + ⅈ y ) / Out[4]= c +  a + b  x + y  /  - Cos  Arg [ x + ⅈ y ]  +  Arg [ x + ⅈ y ]  x + y  / In[5]:=

V1 : = d^3 / ( * n * n * n ) V1Solve [ V ⩵ V1, d ] Out[6]= d n n Out[7]=  d → -(- ) / V / n / n / n /  ,  d → / V / n / n / n /  ,  d → (- ) / / V / n / n / n /  In[8]:=

V2 : = ( d^3 -( d - n ) ^3 )/ ( * n * n * n ) V2Solve [ V ⩵ V2, d ] Out[9]= d -( d - n ) n n Out[10]=  d →

12 n - - n +

24V n n  ,  d →

12 n + - n +

24V n n  In[11]:=

V3 : = ( d^3 -( d - n ) ^3 -( d - n ) ^3 )/ ( * n * n * n ) V3Solve [ V ⩵ V3, d ] Out[12]= d -( d - n ) -( d - n ) n n Out[13]=  d → n + n + × / n n n + n - n n + - n +  n + n - n n  / + n + n - n n + - n +  n + n - n n  / × /  ,  d → n + n - × /  + ⅈ  n n n + n - n n + - n +  n + n - n n  / - × / - ⅈ n + n - n n +- n +  n + n - n n  /  ,  d → n + n - × /  - ⅈ  n n n + n - n n + - n +  n + n - n n  / - × / + ⅈ n + n - n n + - n +  n + n - n n  /  plic_solution.nb In[14]:=

V4 : = ( d^3 -( d - n ) ^3 -( d - n ) ^3 -( d - n ) ^3 )/ ( * n * n * n ) V4Solve [ V ⩵ V4, d ] Out[15]= d -( d - n ) -( d - n ) -( d - n ) n n Out[16]=  d → ( n + n + n )-  - ( n + n + n ) +  n + n + n  × / n n - n n +( n n - n n ) +  - ( n + n + n ) +  n + n + n  / + × / n n - n n +( n n - n n ) +  - ( n + n + n ) +  n + n + n  /  ,  d → ( n + n + n )+ + ⅈ  - ( n + n + n ) +  n + n + n   × / n n - n n +( n n - n n ) +  - ( n + n + n ) +  n + n + n  / - × / - ⅈ n n - n n +( n n - n n ) +  - ( n + n + n ) +  n + n + n  /  ,  d → ( n + n + n )+ - ⅈ  - ( n + n + n ) +  n + n + n   × / n n - n n +( n n - n n ) +  - ( n + n + n ) +  n + n + n  / - × / + ⅈ n n - n n +( n n - n n ) +  - ( n + n + n ) +  n + n + n  /  plic_solution.nb In[17]:=

V5 : = ( d -( n + n )/ )/ n V5Solve [ V ⩵ V5, d ] Out[18]= d + (- n - n ) n Out[19]=  d → ( n + n +

2V n )  plic_solution.nb

160 Appendix B Analytic solution to the PLIC Problem

10 Appendix B: Paraboloid Cur-vature, Interface Normal andLeast-Squares Fit Neighborhood

Calculating the normal vector on an interface latticepoint in a 3 neighborhood in which all ﬁll levels ϕ i areknown works by applying the gradient to the ﬁll levels: ∇ ϕ ( x, y, z ) =  ∂∂x ϕ ( x, y, z ) ∂∂y ϕ ( x, y, z ) ∂∂z ϕ ( x, y, z )  ≈  ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ  −−  ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ + ϕ  = 118 (cid:88) i =1 (cid:126)e i ϕ i (50)This is called the center of mass (CM) method: (cid:126)n CM := − (cid:80) i =1 (cid:126)e i ϕ i | (cid:80) i =1 (cid:126)e i ϕ i | (51) (cid:126)e i are the directions from the center point of the 3 -neighborhood to all of its 26 neighbors including itself: (cid:126)e i =  ± ± ± ± ± ± ± ± ∓

10 0 ± ± ± ∓ ± ± ± ∓ ±

10 0 0 ± ± ± ∓ ∓ ± ∓ ± ±  , i ∈ [0 ,

26] (52)Another more accurate approach is the Parker-Youngs(PY) approximation [11, 34] which assigns diﬀerentweights to the gradient components: (cid:126)n PY := − (cid:80) i =1 w i (cid:126)e i ϕ i | (cid:80) i =1 w i (cid:126)e i ϕ i | (53)with w i :=  | (cid:126)c i | = 12 for | (cid:126)c i | = √

24 for | (cid:126)c i | = √ ◦ while for PY it is approximately 1 ◦ . For thesurface curvature algorithms below, the more accurateand equally fast PY method is used. A paraboloid curve is described by z = f ( x, y ) = A x + B y + C x y + H x + I y + J (55)where A , B , C , H , I and J are ﬁtting parameters.For such a 2D surface in 3D space in the Monge patch x, y, z = f ( x, y ), the mean curvature [35, p.185][36–39]is κ := f xx (cid:0) f y + 1 (cid:1) + f yy (cid:0) f x + 1 (cid:1) − f xy f x f y (cid:16)(cid:113) f x + f y + 1 (cid:17) (56)The partial derivatives of eq. (55) evaluated at the point( x = 0 , y = 0) are f xx | x = y =0 = 2 A (57) f yy | x = y =0 = 2 B (58) f xy | x = y =0 = C (59) f x | x = y =0 = 2 A x + C y + H | x = y =0 = H (60) f y | x = y =0 = 2 B y + C x + I | x = y =0 = I (61)so that the mean curvature for the paraboloid at theorigin is given by κ := A ( I + 1) + B ( H + 1) − C H I (cid:0) √ H + I + 1 (cid:1) . (62)We note here that [8] in equation (13) have an erroneousfactor 2 and that [13] use a diﬀerent deﬁnition of themean curvature. The strategy for ﬁnding the requiredﬁtting parameters is to apply a least-squares ﬁt on aneighborhood of points on the interface.170 Appendix B Analytic solution to the PLIC Problem The least-squares method [40] is a procedure for ﬁttingan analytic curve – here a Monge patch – to a set ofdiscretized points located nearby the analytic curve. Thegeneral idea is to deﬁne the total error as a generalexpression of all ﬁtting parameters and the entire set ofdiscretized points and then ﬁnd its global minimum byzeroing its gradient.The analytic curve ﬁrst needs to be written in a dotproduct form z ( x, y ) = (cid:126)x · (cid:126)Q (63)with (cid:126)x being deﬁned as the vector of parameters thatdeﬁne the curve and (cid:126)Q = (cid:126)Q ( x, y ) being an expressionof the continuous coordinates x and y . This equationis then discretized to a set of individual data points( x i , y i , z i ) z i ( x i , y i ) ≈ (cid:126)x · (cid:126)Q i (64)with (cid:126)Q i = (cid:126)Q i ( x i , y i ) being a vector containingexpressions only dependent on a discretized set of points( x i , y i ) whose corresponding z -component z i is locatedclose to the curve. In this notation, the error E betweenthe z-positions of the analytic curve (cid:126)x · (cid:126)Q and a set ofz-positions of at least N neighboring interface points z i is deﬁned by summing up the squared diﬀerences E ( (cid:126)x ) = N (cid:88) i =0 ( (cid:126)x · (cid:126)Q i − z i ) (65)whereby N denotes the dimensionality which is equal tothe number of desired ﬁtting parameters. The gradientof the error E is calculated and set to zero, where theerror must have a global minimum: ∇ E ( (cid:126)x ) = 2 N (cid:88) i =0 ( (cid:126)x · (cid:126)Q i − z i ) (cid:126)Q i = 0 (66)With some algebra, this equation is then transformedinto a linear equation (cid:32) N (cid:88) i =0 (cid:126)Q i (cid:126)Q i T (cid:33) (cid:126)x = N (cid:88) i =0 z i (cid:126)Q i (67) M := N (cid:88) i =0 (cid:126)Q i (cid:126)Q i T (cid:126)b := N (cid:88) i =0 z i (cid:126)Q i (68) M (cid:126)x = (cid:126)b (69)which is solved by LU-decomposition and provides thedesired solution (cid:126)x that uniquely deﬁnes the curve.Note that the matrix M is always symmetrical,meaning that only the upper half and diagonal haveto be calculated explicitly and the lower half is copied over. This reduces computational cost signiﬁcantly dueto every matrix element being a sum over an expressiondepending on all ﬁtted points. In case there are lessthan N data points available (lattice points next tosolid boundaries may have less interface neighbors), theregular ﬁtting will not work. Instead, then the amount ofﬁtting parameters is decreased to match the number ofavailable data points by reducing dimensionality in theLU-decomposition. The ignored ﬁtting parameters willremain zero.Finally, from the solution vector (cid:126)x(cid:126)x