[PDF] Quality tetrahedral mesh generation with HXT

Abstract

We proposed, in a recent paper (10.1002/nme.5987), a fast 3D parallel Delaunay kernel for tetrahedral mesh generation. This kernel was however incomplete in the sense that it lacked the necessary mesh improvement tools. The present paper builds on that previous work and proposes a fast parallel mesh improvement stage that delivers high-quality tetrahedral meshes compared to alternative open-source mesh generators. Our mesh improvement toolkit includes edge removal and improved Laplacian smoothing as well as a brand new operator called the Growing SPR Cavity, which can be regarded as the mother of all flips. The paper describes the workflow of the new mesh improvement schedule, as well as the details of the implementation. The result of this research is a series of open-source scalable software components, called HXT, whose overall efficiency is demonstrated on practical examples by means of a detailed comparative benchmark with two open-source mesh generators: Gmsh and TetGen.

Full PDF

RReceived: Added at production Revised: Added at production Accepted: Added at productionDOI: xxx/xxxx

ARTICLE TYPE

Quality tetrahedral mesh generation with HXT

Célestin Marot* | Jean-François Remacle Institute of Mechanics, Materials and CivilEngineering, Université catholique deLouvain, Louvain-la-Neuve, Belgium

Correspondence

Célestin Marot, Institute of Mechanics,Materials and Civil Engineering,Université catholique de Louvain,Avenue Georges Lemaitre 4, bte L4.05.02,1348 Louvain-la-Neuve, Belgium.Email: [email protected] InformationThis research was supported by theEuropean Union’s Horizon 2020 researchand innovation programme,ERC-2015AdG-694020.

Summary

We proposed, in a recent paper , a fast 3D parallel Delaunay kernel for tetrahedralmesh generation. This kernel was however incomplete in the sense that it lackedthe necessary mesh improvement tools. The present paper builds on that previouswork and proposes a fast parallel mesh improvement stage that delivers high-qualitytetrahedral meshes compared to alternative open-source mesh generators. Our meshimprovement toolkit includes edge removal and improved Laplacian smoothing aswell as a brand new operator called the Growing SPR Cavity, which can be regardedas the mother of all ﬂips . The paper describes the workﬂow of the new mesh improve-ment schedule, as well as the details of the implementation. The result of this researchis a series of open-source scalable software components, called HXT, whose overalleﬃciency is demonstrated on practical examples by means of a detailed comparativebenchmark with two open-source mesh generators: Gmsh and TetGen. KEYWORDS:

HXT, Growing SPR Cavity, quality, tetrahedral, mesh, improvement

Tetrahedral meshes are the geometrical support for most ﬁnite element discretizations. The size and the shape of the generatedtetrahedral elements must however be controlled cautiously to ensure reliable numerical simulations in industrial applications.The majority of popular tetrahedral mesh generators are based on a Delaunay kernel, because Delaunay-based algorithms arefast, especially in 3D. They are also robust and consume relatively little memory. Yet, pure

Delaunay meshes are known tocontain near-zero volume elements, called slivers , and a mesh improvement stage is mandatory if one wishes to end up with ahigh-quality computational mesh.We proposed, in a recent paper , techniques to compute a Delaunay triangulation of three billion tetrahedra in less than oneminute using multiple threads. We also explained how to extend that algorithm to obtain an eﬃcient parallel tetrahedral meshgenerator. This mesh generator was however incomplete in the sense that it did not provide any mesh improvement process. Thepresent paper builds on the 2019 paper and proposes a mesh improvement strategy that is both highly parallel and eﬃcient, andgenerates high quality meshes.Starting with an existing surface mesh, tetrahedral mesh generation is usually broken down into four steps.1. Empty Mesh : A triangulation (tetrahedralization) of the volume, based on all points of the surface mesh and with noadditional interior nodes (hence the name empty mesh), except for some isolated user-speciﬁed points in the interior ofthe volume, is ﬁrst generated.2.

Boundary Recovery : The recovery step locally modiﬁes the empty mesh to match the triangular facets of the surfacemesh as well as some possible embedded user-speciﬁed constrained triangles and lines. a r X i v : . [ c s . C G ] A ug Reﬁnement : The empty mesh is iteratively reﬁned by adding points in the interior of the volumes while always preservinga valid mesh at each iteration. The reﬁnement step ends up when all tetrahedra are smaller in size than a user-prescribedsizemap.4.

Improvement : Whereas the reﬁnement step controls the size of the generated tetrahedra, the improvement step ﬁnalizesthe mesh by locally eliminating badly shaped tetrahedra, and optimizes the quality of the mesh by means of speciﬁctopological operations and vertex relocations.Steps 1 and 3 were described in detail in our 2019 paper . Step 2 is currently done using TetGen’s implementation . Finally,Step 4, which is the central topic in this paper, relies on a novel mesh improvement strategy based on an innovative local meshmodiﬁcation operator called Growing SPR Cavity . The overall eﬃciency of the resulting full-ﬂedged open-source tetrahedralmesh generation library is then discussed in detail. Some implementation details are included in this paper, be the authors areaware that the devil is very much into the details. Being commited to provide reproducible research , the code and meshes usedin this paper’s test cases are thus all made available at https://git.immc.ucl.ac.be/marotc/tetmesher_benchmark.The computer code supporting this paper is available in Gmsh 4.6.0 and above versions through the -algo hxt parameteror the Mesh.Algorithm3D=10 option. HXT successfully passes all Gmsh 3D benchmarks and, as discussed below, is fasterand generates meshes of higher quality than other open-source implementations, including Gmsh’s native 3D mesher.The paper is structured as follows. Section 2 contains a concise review of existing mesh improvement operations. Wheneverour implementation diﬀers from others, the diﬀerences are brieﬂy explained. Section 3 is devoted to the presentation of ournew

Growing SPR Cavity operation, which is the main contribution of this paper. In section 4, we detail our mesh improvementschedule, its parallelization and the quality improvement that can be obtained. Finally, in Section 5, we present our complete tetra-hedral mesh generator algorithm (HXT) and compare it against Gmsh and TetGen, two other reference open-source tetrahedralmesh generation softwares.

Mesh improvement operations are modiﬁcations of the mesh aiming at increasing its overall quality, the latter being essentiallydetermined by the quality of its worst quality element . Therefore, when we shall speak of the quality of a mesh or of a cavity,we shall in practice refer to the quality of its worst tetrahedron. The value of a quality measure is expected to be inverselyproportional to the interpolation error associated with the tetrahedron in some discretization scheme. If the tetrahedron is notvalid, i.e., if it is inverted or ﬂat, the quality measure should then be non-positive. Two examples of quality measure, the Gammaand SICN measure, are described in Appendix A.3 and A.4.A cavity is a volume corresponding to a set of face-connected tetrahedra. A mesh modiﬁcation operates on a cavity, which is bydeﬁnition the part of the mesh that is modiﬁed. Among all possible mesh modiﬁcations, mesh improvement operations are thosethat strictly increase the quality of a cavity, from an original tetrahedralization of quality 𝑞 𝑎 to an improved tetrahedralizationof quality 𝑞 𝑏 > 𝑞 𝑎 . Any mesh modiﬁcation can be decomposed into a succession of elementary moves, called bistellar ﬂips orPachner moves . The 1-4 move adds a point inside a cavity formed by the removal of a single tetrahedron, and ﬁlls it with 4 newtetrahedra. The 4-1 move is the opposite, it removes a point. Both operations are illustrated on Figure 1a. The two remainingPachner moves are the 2-3 and 3-2 ﬂips, which do neither add nor remove any points in the tetrahedralization (see Figure 1b). The purpose of the mesh reﬁnement step is to obtain a distribution of points all over the volume whose distance to the closestneighbour is prescribed by a given mesh size ﬁeld. Adding or removing vertices during the subsequent mesh improvementphase is therefore likely to disrupt the reﬁned point distribution and see it deviate from the prescribed size map. The simplestmesh improvement schedule one may think of is thus composed of 2-3 and 3-2 ﬂips only, the simplest operations that do notadd or remove points, and are executed whenever they improve the quality of the tetrahedralization in their respective cavity.This strategy is able to eliminate eﬃciently most ill-shaped tetrahedra. However, this hill-climbing method often reaches localmaxima where 2-3 or 3-2 ﬂip no longer improve the mesh quality although the overall quality is not yet optimal. To overcome https://gitlab.onelab.info/gmsh/gmsh/-/tree/master/contrib/hxt (a) The 1-4 move (point insertion) and 4-1 move (point removal) (b)

The 2-3 and 3-2 ﬂips

FIGURE 1

Pachner movesthis limitation, combinations of multiple ﬂips can be applied at once whenever one can check they result in a tetrahedralizationof better quality. This is equivalent with creating more complex topological operations. For example, the 4-4 ﬂip, which is anoperation on a cavity of 6 points and 4 tetrahedra with only one interior edge (see Figure 2a), can be obtained by doing a 2-3ﬂip that creates a new interior edge, followed by a 3-2 ﬂip that removes the initial edge. (a)

The 4-4 ﬂip e d g e r e m o v a l m u l t i - f a c e r e m o v a l e d g e r e m o v a l m u l t i - f a c e r e m o v a l multi-face retriangulation (b) Edge removal, multi-face removal and multi-face retriangulation

FIGURE 2

Examples of composite topological operations

The most useful topological operation is the edge removal operation. It is a generalization of the 3-2 and 4-4 ﬂips, starting froma cavity with 𝑁 ≥ tetrahedra surrounding a unique edge 𝑎𝑏 (see Figure 2b). The operation removes ﬁrst all tetrahedra adjacentto that edge, and creates instead 𝑁 − 2 upper tetrahedra connected to 𝑎 and 𝑁 − 2 lower tetrahedra connected to 𝑏 . The facetsbetween lower and upper tetrahedra form a 2D sandwiched triangulation, and Shewchuk has proposed an algorithm, based on dynamic programming concepts, that ﬁnds the 2D triangulation resulting in an optimal tetrahedralization . In contrast, Sipresents a more versatile implementation of the edge removal operation that recursively removes other edges that prevents thecurrent edge removal from producing a valid tetrahedralization . This forms a tree of edge removal operations, where the leafsare removed with a sequence of 2-3 ﬂips terminated by a ﬁnal 3-2 ﬂip. Our implementation of edge removal does neither usedynamic programming nor sequences of ﬂips. It is rather a brute-force approach inspired by the ﬁrst description of edge removalby Freitag and Ollivier-Gooch . In a nutshell, the principle is as follows. Each triangle of the sandwiched 2D triangulation isassigned a quality, which is the minimum of the corresponding upper and lower tetrahedron quality. If the quality of a triangle isless than the quality of the original tetrahedralization, the triangle is marked bad. Using precomputed tables giving all possibletriangulations up to 𝑁 = 7 in which this triangle is found, it is then possible to eliminate triangulations involving bad triangles.Whenever all triangulations are eliminated, the edge removal is not performed. If, on the other hand, several tetrahedralizationsare possible, their respective overall quality are computed, and the best candidate is selected. As Freitag and Ollivier-Goochnoted, having more than 7 tetrahedra around an edge is exceptional, and edge removal is also less likely to succeed in those cases.Our approach favors thus simplicity over asymptotic complexity, although all approaches have of course their pros and consin practice. In average, our edge removal terminates within about 1 microsecond for cavities of about 5 tetrahedra, on modernhardware.The inverse of the edge removal was coined multi-face removal by Shewchuk . The operations that derive from edge removalare shown in Figure 2b. This group of operations includes a third operation, named Mutli-face retriangulation by Misztal etal. . Mutli-face retriangulation can be regarded either as a combination of an edge removal and a multi-face removal, or as asequence of 4-4 ﬂips modifying the 2D sandwiched triangulation. Although multi-face removal and multi-face retriangulationhave proved their eﬀectiveness, their implementation is more involved than edge removal, and they are far less used. We decidednot to implement them, because they are covered by the Growing SPR Cavity operation anyway. This mother of all ﬂips isdetailed in section 3.

A vertex relocation, or smoothing operation, in the context of tetrahedral mesh optimization, is an operation that changes theposition of a point in order to improve the quality of the adjacent tetrahedra. Smoothing methods have been extensively studiedin the past 25 years , and their objective is twofold: improve the overall quality of the mesh and space out points appropriatelyso that subsequent topological transformations can further improve the mesh. Smoothing and topological transformations areindeed more eﬀective when combined . The most used smoothing technique, called Laplacian smoothing , simply relocates apoint at the centroid of the set of points to which is it connected by a mesh edge. As for ﬂipping, Laplacian smoothing is appliedonly if it eﬀectively improves the quality of the mesh. In our mesh generation library

HXT , Laplacian smoothing is combined witha golden-section search of the optimal relocation on the segment between the original position and the centroid. This approachis eﬀective in practice, although the objective function may have local maxima over the segment, and the golden-section searchis not guaranteed to identify the largest one. In their work , Freitag et al. studied this optimized Laplacian smoothing technique,among other techniques. A good review of classical mesh improvement operations is found in Klingner’s Ph.D. Dissertation . Klingner reports thereinthat large quality improvements are obtained by using point insertion and edge contraction. We have not implement thoseoperations so far, but we will consider them for later updates. They are indeed more complex than one might think at ﬁrst sight.It is not enough to insert or remove points and check on whether the quality is improved. The position of adjacent points mustalso often be modiﬁed to obtain tetrahedra of better quality and, more importantly, to respect the meshsize map.Valid mesh of higher quality may be obtained in some situations if a slight modiﬁcation of the surface mesh is allowed. Thishowever implies relying on a CAD software during the meshing process to evaluate the distance to the parametric deﬁnition ofthe initial surface. A more lightweight approach considering a modiﬁcation of the surface mesh itself, up to an approximatedHausdorﬀ distance threshold, would still mean bookkeeping in memory an unmodiﬁed version of the surface mesh. As we areaiming at a simple and parallelizable mesh generator, we chose to not consider at all operations that modiﬁes the surface mesh.In some cases, this is even an asset, as there are situations where one wants the surface mesh to remain strictly unchanged, forinstance, if the surface is an interface between independent parts of an assembly. Boundary modiﬁcation is however an option FIGURE 3

The 2-2 ﬂip. Triangles of the surface mesh are shaded in orange.worth being considered for future implementations, as it has shown its eﬀectiveness . The most elementary surface meshmodiﬁcation operation simply ﬂips a pair of adjacent tetrahedra, which are themselves adjacent to two boundary triangles onthe surface mesh. This operation is called a and is illustrated in Figure 3. FIGURE 4

2D illustration of the SPR search tree

Small Polyhedron Reconnection (SPR) is a branch and bound type algorithm for ﬁnding the best of all possible tetrahedraliza-tion of a cavity. It was named and ﬁrst implemented by Liu et al. in 2006 . Finding whether a polyhedron can be triangulatedor not is already a NP-complete problem . Finding the best triangulation is an even more diﬃcult NP-hard problem. The SPRalgorithm has indeed factorial complexity with regard to the number of points 𝑛 in the cavity. It starts from a selected face, andrecursively ﬁlls up the cavity with well-shaped tetrahedra, trying out each of the 𝑛 − 3 remaining points, until the best triangula-tion is found. The basic algorithm thus covers the whole tree of possible triangulations, but simple optimizations allow pruningmost branches

14, 16 . Figure 4 shows the search tree for a simple 2D cavity.The choice of the cavity on which to apply the SPR operation is a matter that received very little attention, although theperformances strongly depend on the geometry and topology of the cavity. In this paper, we propose a new algorithm called

Growing SPR Cavity (GSC), whose basic idea is to increment the number of points gradually, and to apply the SPR at each stepuntil a better triangulation is found. As the SPR algorithm has factorial complexity, the cost of repeating SPR operations from 4to 𝑛 points is not much higher than computing directly the best triangulation with 𝑛 points. In addition, most of the SPR structurecan be reused from one iteration to the next, and a better triangulation of the cavity is usually found within few iterations. Thelimit to the maximum number of points has been set to 𝑛 = 32 , and GSC abandons if no better triangulation has been foundwhen the cavity has reached 32 points. We explained in how the SPR algorithm can be optimized by storing, in a 𝑛 table, the quality (and thus also the validity)of all tetrahedra the algorithm already encountered. By limiting the number of points to 32, we can allocate an acceptably large table at the beginning of the GSC algorithm and keep it from one iteration of the GSC algorithm to another. C A (a) C A (b) C A (c) C new A (d) FIGURE 5

Growing SPR Cavity in 2D. Triangles in the cavity  are colored in orange, and triangles adjacent to the cavity  are colored in cyan. (a) : at ﬁrst, the cavity only contains a bad triangle. (b) and (c) : every point outside the cavity is a vertexof at most one triangle in  , so we add the triangle in  with the worst quality to  . (d) : the SPR algorithm found a bettertriangulation for  , the triangulation of the cavity is replaced and the GSC algorithm ends. Algorithm 1

The

Growing SPR Cavity (GSC) operation, applied on a bad tetrahedron 𝑡 of a mesh  function GSC( 𝑡 ,  )  ← 𝑡 ⊳ the cavity is the tetrahedron at ﬁrst 𝑛 ← ⊳ the number of points in  while 𝑛 < do  ← E XTEND C AVITY (  , 𝐶 , 𝑛 ) ⊳ add 𝑝 𝑘 +1 and all tetrahedra with 4 vertices in  𝑘 +1 𝑛 ← 𝑛 + 1  𝑏 ← SPR(  ) ⊳ ﬁnd the optimal triangulation of  if  𝑏 ≠  then return (  ⧵  ) ∪  𝑏 ⊳ Mesh is improved end while return  ⊳ Failed to improve the mesh end function

The GSC pseudocode given in algorithm 1 can be explained as follows. As its initial cavity, GSC starts with a tetrahedronof bad quality which needs to be optimized. Let the four point of this initial tetrahedron be denoted as { 𝑝 , 𝑝 , 𝑝 , 𝑝 } . Now, let  𝑘 be a cavity containing a set of 𝑘 points  𝑘 = { 𝑝 , 𝑝 , … , 𝑝 𝑘 } . To iteratively obtain  𝑘 +1 , a point 𝑝 𝑘 +1 is added and everytetrahedra whose four points are in  𝑘 +1 =  𝑘 ∪ 𝑝 𝑘 +1 are also added. The selection of the next point of 𝑝 𝑘 +1 is a heuristic based ona simple intuition comforted by experience in testing the SPR algorithm. We observed that it is in general easier to ﬁnd a bettertetrahedralization for a cavity with few points and many tetrahedra than for a cavity with few tetrahedra per point. Therefore,the point 𝑝 𝑘 +1 is chosen so as to add as many tetrahedra as possible. Let  𝑘 denote the set of tetrahedra sharing at least onefacet with a tetrahedron in  𝑘 . In practice, GSC adds the most connected point , i.e., the point adjacent to the largest number oftetrahedra in  𝑘 . If several points are adjacent to 𝑚 tetrahedra in  𝑘 , the sum of the qualities of those 𝑚 tetrahedra is evaluatedfor each point, and the point with the lowest sum is selected as a tiebreaker rule. Figure 5 illustrates the GSC algorithm in 2D,where the tiebreaker rule has been used twice. This rule is however empirical and diﬀerent alternative rule were also tested:selecting the point with the highest sum of qualities, with the maximum or minimum quality, with the quality function replaced by the volume or by the height associated to the boundary facet. The proposed tiebreaker rule consistently gave better results inour full mesh improvement schedule, as detailed in section 4.In 2D, the triangles to be added to the cavity, i.e. with all 3 points in  𝑘 +1 =  𝑘 ∪ 𝑝 𝑘 +1 , are always in  𝑘 . However, in 3Dthere might be tetrahedra that are neither in  𝑘 nor in  𝑘 , but still have all their vertices in  𝑘 +1 . Therefore, in reality, the GSCalgorithm does not always add the optimal point that result in the addition of as many tetrahedra as possible. Choosing the mostconnected point and not the optimal point is however simpler and faster in practice. (a) Tetrahedra with 𝛾 < . before (above) andafter (below) the mesh improvement step on the Rotor mesh (b)

Tetrahedra with 𝛾 < . before (above) andafter (below) the mesh improvement step on the Piston mesh (c)

Tetrahedra with 𝛾 < . before (above) andafter (below) the mesh improvement step on the Rim mesh

FIGURE 6

Eﬀect of HXT’s mesh improvement step on bad tetrahedra. The threshold for being considered a bad tetrahedronwas set to 𝛾 𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 = 0 . . Almost all bad tetrahedra of the Rotor , Piston and

Rim meshes (speciﬁed in Table A1) were improvedabove this threshold.Our mesh improvement strategy includes a Laplacian smoothing phase, an edge removal phase and the GSC. As Laplaciansmoothing and edge removal are approximately faster than GSC, they are used in priority, whereas GSC is used as alast resort technique to unlock processes trapped in local maxima of the mesh quality objective function. The pseudocode for asimpliﬁed serial mesh improvement schedule is presented in Algorithm 2. The for loop on line 7 will be called the

SER loop inthe following (Smoothing Edge Removal) and the loop where the GSC operation is applied, on line 17, will be called

GSC loop .Both loops iterate over bad tetrahedra, which are tetrahedra with a quality under a user-deﬁned threshold. Using the Gammaquality function detailed in Appendix A.3, our mesh improvement strategy is able to eliminate most tetrahedra with 𝛾 < . ,as shown in Figure 6, where one can see that only a few bad tetrahedra subsist. Algorithm 2

The proposed serial mesh improvement schedule tries to improve each tetrahedron 𝜏 from a list of bad tetrahedra  in a mesh  . Bad tetrahedra are tetrahedra with a quality smaller than a user-deﬁned threshold 𝑞 𝑚𝑖𝑛 function M ESH I MPROVEMENT (  , 𝑞 𝑚𝑖𝑛 ) do modifGSC ← ⊳ count modiﬁcations of the mesh by Growing SPR Cavity do modifSER ← ⊳ count modiﬁcations of the mesh by Smoothing or Edge Removal  ← G ET B AD T ETRAHEDRA (  , 𝑞 𝑚𝑖𝑛 ) for 𝜏 ∈  do ⊳ the SER loop improved ← False for point ∈ 𝜏 and ¬ improved do improved ← L APLACIAN S MOOTHING (  , point) for edge ∈ 𝜏 and ¬ improved do improved ← EDGE R EMOVAL (  , edge) if improved then modifSER ← modifSER + 1 while modifSER >  ← G ET B AD T ETS (  , 𝑞 𝑚𝑖𝑛 ) for 𝜏 ∈  do ⊳ the GSC loop if GSC(  , 𝑡𝑎𝑢 ) then modifGSC ← modifGSC + 1 while modifGSC > end function The mesh improvement strategy described above can be parallelized pretty much in the same way as the Delaunay reﬁnement.The parallel shared-memory Delaunay kernel introduced in our previous article partitions the domain on basis of a 3D Moorecurve deﬁned in such a way that all partitions contain the same amount of points to be inserted. A point is in a partition if itsMoore index is in the range that deﬁnes the partition. A tetrahedron, then, is considered to belong to a partition if at least 3 of its4 vertices lay in that partition. When the cavity created for the insertion of a new point overlaps the boundary between diﬀerentpartitions, the operation is suspended, until all other insertions have been tried. At that moment, a new Moore curve and hencenew partitions are created, and the insertion loop resumes with the points whose insertion was suspended. The number of parallelthreads, and hence the number of partitions, is determined at the beginning of each sweep by the percentage 𝜌 of suspended pointinsertions during the previous sweep. If 𝜌 = 1 , a single thread is used so that the termination of the algorithm is guaranteed.Similarly, for mesh improvement, the space ﬁlling curve is partitioned so as to equally distribute bad tetrahedra over thethreads. Figure 7 shows partitioning examples for 8 threads on 3 diﬀerent meshes. The number of threads is again decided infunction of the percentage of suspended operations—due to a conﬂict with another partition— in the previous sweep. The SERloop is thus executed repeatedly with decreasing numbers of threads, until all partition conﬂicts have been resolved. The sameprocedure is used for the GSC loop as well. This parallel algorithm scales well in case of very big meshes only, for essentiallytwo reasons. Firstly, elements crossing partition boundaries represent a larger portion of space in small meshes than in largemeshes, yielding thus mechanically more conﬂicts. This is less of an issue with mesh reﬁnement because cavities are usuallysmaller for point insertion than for the Growing SPR Cavity. The second reason is that the time spent in one execution of theSER or of the GSC loop is very small, typically in the millisecond range. Therefore, the overhead of launching new threads andcomputing Moore indices for the whole mesh is signiﬁcant. Figure 8a compares the scaling in the case of two highly-reﬁnedmodels. The mesh generation was done on the 64 physical cores of an Intel Xeon Phi 7210 machine, running at 1.3 GHz. Theoverall eﬀectiveness and performance of our mesh improvement algorithm is analyzed in section 5.4. (a) Rotor mesh partitions (b)

Piston mesh partitions (c)

Rim mesh partitions

FIGURE 7

Partitions based on the 3D Moore curve ordering. These partitions were created almost instantaneously at the startof the mesh improvement step, running on 8 threads.1 2 4 8 16 32 64 Number of threads T i m e [ s ] Aircraft500 thin ﬁbersperfect scaling (a)

Scaling of tetrahedral mesh improvement on 2 very large meshes. (b)

The 2 diﬀerent meshes used within the scaling graph. Above, an Aircraftwith 637 million tetrahedra, the interior being also meshed. Below, a cubewith 500 thin ﬁbers that has 351 million tetrahedra.

FIGURE 8

Scaling of our parallel mesh improvement schedule on 2 huge meshes

Before comparing the performance of our tetrahedral mesh generator with TetGen and Gmsh, the similarities and key diﬀerencesbetween the diﬀerent implementations are brieﬂy recalled. Besides being all open-source and free, the three considered softwaretools are also structured very similarly. As explained in the introduction, the mesh generation process can be splitted into fourdistinct steps: creation of an empty mesh , boundary recovery , mesh reﬁnement and mesh improvement .Although it could be sensible to work on improving the quality of tetrahedra right at the moment of their creation, none of theopen source mesh generators that are considered here use such a strategy. Separating mesh reﬁnement and mesh improvementhas several advantages. The mesh reﬁnement process essentially creates around times the number of tetrahedra that are presentin the ﬁnal mesh. Avoiding the computation of element quality during reﬁnement therefore allows a substantial gain in eﬃciency.Moreover, having separated pieces of code for reﬁnement and improvement allows to programmers to focus on smaller tasks andgoals. It also allows using meshing capabilities of our code in a modular fashion: it is indeed possible to create a mesh with Gmsh,reﬁne it with TetGen, and optimize it with our HXT algorithm, without duplicating any part of the process. Another advantage isthat the implementations of the diﬀerent steps can be analyzed separately, and their respective execution times be compared on diﬀerent models. The results of our benchmark for all 4 steps of the tetrahedral mesh generation, including the speciﬁcation ofrelevant hardware characteristics and compilation ﬂags, are given in Appendix A. More speciﬁcally, element numbers, timingsand CPU/memory usages are reported in Table A1. Comparative performances per step, except for the boundary recovery step,and per mesh generator are shown as bar plots in Figure A1. The ﬁrst step, i.e., the creation of the empty mesh, consists in computing a Delaunay tetrahedralization through all points ofthe surface mesh plus some possible user-deﬁned interiors points. Since a Delaunay tetrahedralization is unique, provided a simulation of simplicity is used in conjunction with robust adaptive predicates , all three programs generate identical emptymeshes. The ordering of the tetrahedra however, can largely vary from one packege to another. The ordering is even non-deterministic with HXT, because threads can reserve chunks of memory corresponding to a set of 8192 tetrahedra at diﬀerentmoments from one run to another. To get rid of this non-deterministic behaviour, HXT oﬀers a reproducible mode, whichreorders deterministically tetrahedra in the lexicographic order of their nodes. The performance of TetGen, Gmsh and HXT(with and without the reproducible mode), for the diﬀerent meshes shown in Table A1, are reported in Figure A1b. HXT is thefastest for the empty mesh step, but not because of parallelization. Most tetrahedra of the empty mesh indeed cross the domainfrom side to side. With our partitioning method based on a space-ﬁlling curve, points that are at diﬀerent extremities of thedomain have very little chances of being in the same partition. Because a tetrahedron is only considered in a partition when atleast three of its vertices are in that partition, parallelization is not very eﬀective at this step. The speed diﬀerence between HXTand TetGen, or between HXT and Gmsh, is rather explained here by the good serial performances of our Delaunay kernel . All 3 software tools rely internally on TetGen’s boundary recovery code. Table A1 shows however that HXT is a bit slowerthan TetGen for boundary recovery, because of the back and forth conversion of the mesh between its own data structure andTetGen’s format. Gmsh applies the same, albeit much slower, kind of conversion. Moreover, the order under which tetrahedra arestored in memory can either slow down or speedup the boundary recovery process, which explains timing diﬀerences observedbetween HXT in normal or reproducible mode. As the other parts of HXT are parallelized and optimized for heavy workload,boundary recovery is the bottleneck of our code for large meshes. In contrast, HXT usually performs very well on small meshes.We suspect that TetGen’s algorithm for locating missing facets or edges has a superlinear complexity with respect to the size ofthe mesh.

Mesh reﬁnement is the step where our parallel HXT algorithm really stands out. The ways TetGen, HXT or Gmsh proceed toreﬁne the mesh are rather diﬀerent, but Figure A1a nonetheless shows that the diﬀerent software tools generate meshes withonly slightly diﬀerent numbers of tetrahedra. TetGen was given the options q1.1/14 , causing it to reﬁne only tetrahedra witha radius-edge ratio larger than . or a minimum dihedral angle smaller than ◦ . No meshsize constraint was given, althoughTetGen allows it. TetGen therefore adds point only to optimize the quality of elements. In contrast, HXT and Gmsh add a newpoint 𝑝 𝑘 +1 inside a tetrahedron 𝜏 if the insertion does not create an edge shorter than the prescribed meshsize, which is the valuelinearly interpolated from meshsizes at the vertices of 𝜏 . At the beginning of the reﬁnement step, nodal meshsizes are evaluatedfrom the surface mesh. and Gmsh thus do not reﬁne with the goal of improving the quality of the elements. All three softwaresdo however end up with meshes with very similar numbers of tetrahedra. Still, timings for mesh reﬁnement and improvementindicated in Table A1, Figure A1c and A1d, are expressed in second per million tetrahedra to alleviate the eﬀect of the diﬀerentreﬁnement strategies. HXT is at least 5 times faster than Gmsh and TetGen for mesh reﬁnement, consistently generating morethan one million tetrahedra per second when the other mesh generators peak at

300 000 tetrahedra per second. On the model, HXT reaches nearly 3 million tetrahedra per second, which is 132 times faster than Gmsh. HXT is also eﬃcientin terms of memory usage. For the

100 fibers mesh, the peak resident set size does not exceed 77 bytes per tetrahedron. Ifmemory allocations not depending on the size of the mesh are put aside, HXT consumes only about 60 bytes per tetrahedron. For the mesh improvement step, TetGen was given the -o/130 option, setting to ◦ the target maximum dihedral angle . Incontrast, HXT and Gmsh target a minimum inradius to longest edge ratio 𝛾 𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 = 0 . . As Gmsh and HXT avoid modifyingthe surface mesh, TetGen was also prevented from modifying the surface mesh by enabling the -Y option. In addition to thetimings of Table A1, mesh quality has also been compared after the mesh improvement step for 3 geometries ( Rotor , Piston and

Rim ) and 3 quality measures: • the dihedral angles, plotted in Figure A2 • the inradius to longest edge ratio ( 𝛾 ), shown on the bar plot of Figure A3. The tetrahedra with 𝛾 < . before and afterHXT’s mesh improvement step are also shown in Figure 6 • the signed inverse condition number (SICN), plotted in Figure A4HXT gives comparatively the best results regarding each of those three quality measures, even for the maximum dihedralangles which is the quality measure used by TetGen during the mesh improvements step. Figure A2 indeed shows noticeablysmaller maximum dihedral angles for HXT compared to Gmsh and TetGen for each of the 3 considered meshes. This diﬀerenceis explained by the eﬀectiveness of the new Growing SPR Cavity operation described in section 3. The performance of HXT iseven more striking when looking at the minimum inradius to longest edge ratio and the minimum SICN:Rotor Piston Rim 𝛾 SICN ∡ 𝑚𝑖𝑛 ∡ 𝑚𝑎𝑥 𝛾 SICN ∡ 𝑚𝑖𝑛 ∡ 𝑚𝑎𝑥 𝛾 SICN ∡ 𝑚𝑖𝑛 ∡ 𝑚𝑎𝑥 Gmsh 0.014 0.077 4.06 174.76 0.23 0.23 11.24 161.40 0.0094 0.034 1.59 177.22TetGen 0.037 0.16 5.10 168.28 0.15 0.30 14.40 156.12 0.073 0.15 6.70 168.26HXT (ours) 0.13 0.16 5.10 149.2 0.29 0.39 13.76 146.64 0.16 0.17 5.92 149.09

TABLE 1

Minimum 𝛾 and SICN measure among all tetrahedra, minimum and maximum dihedral angle among all dihedralangles of the mesh, for the Rotor , Piston and

Rim meshes with the 3 diﬀerent open-source software tools tested.HXT’s mesh improvement is parallelized and has very fast smoothing and edge removal operations. Table A1 and FigureA1d indeed indicate that HXT is also the fastest when it comes to mesh improvement, but not by much. The reason for this issimple: TetGen and Gmsh stop optimizing whenever the smoothing and edge removal operations become ineﬀective, whereasHXT, then, starts its ﬁrst GSC sweep.

This paper has presented the parallel tetrahedral mesh generator HXT, and demonstrated its eﬃciency by means of a detailedcomparative benchmark with two concurrent open-source software tools: Gmsh and TetGen. Although the obtained performanceare very satisfactory, there is room for improvement. The ﬁrst and easiest improvement will be the implementation of edgecontraction, point insertion and 2-2 ﬂips. Another particularly challenging task is the parallelization of the boundary recoverystep, which is the last non-parallelized step in the whole meshing process. Thanks to the acquired experience, improvement onthe boundary recovery procedures of TetGen is now deﬁnitely within reach, probably but adding our parallelization strategybased on Moore curves. The novel GSC operation could also help gaining performance, and could be adapted as a last resortoperation to recover constrained facets or edges. Other parallelizations strategies should also be tested for the mesh improvementstage.

ACKNOWLEDGMENTS

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020research and innovation programme (grant agreement ERC-2015-AdG-694020). We are grateful to the authors of the ABC Dataset , for providing the Rim and

Piston geometries (respectively model 40 and 9733). Surface meshes were generated thanksto Gmsh.

References

15. Ruppert J, Seidel R. On the diﬃculty of triangulating three-dimensional Nonconvex Polyhedra. Discrete & ComputationalGeometry. 1992 Mar;7(3):227–253. Available from: http://link.springer.com/10.1007/BF02187840.16. Marot C, Verhetsel K, Remacle JF. Reviving the Search for Optimal Tetrahedralizations. In: Proceedings of the 28th Inter-national Meshing Roundtable. Buﬀalo, New York, USA: Zenodo; 2020. Available from: https://doi.org/10.5281/zenodo.3653420.17. Edelsbrunner H, Mücke EP. Simulation of simplicity: a technique to cope with degenerate cases in geometric algorithms.ACM Trans Graph. 1990;9(1):66–104. Available from: http://doi.acm.org/10.1145/77635.77639.18. Shewchuk JR. Adaptive precision ﬂoating-point arithmetic and fast robust geometric predicates. Discrete & ComputationalGeometry. 1997;18(3):305–363.19. TetGen, Version 1.5, User’s Manual;. Available from: http://wias-berlin.de/software/tetgen/1.5/doc/manual/index.html.20. Koch S, Matveev A, Jiang Z, Williams F, Artemov A, Burnaev E, et al. ABC: A Big CAD Model Dataset For GeometricDeep Learning. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2019. . APPENDIXA BENCHMARKS

Benchmarks use the average of 5 runs on an Intel Ⓡ Core TM i7-6700HQ with 4 cores running at 3.5GHz and 16Gb of RAM.We use Gmsh 4.6.0 with the -optimize_threshold=0.35 option and HXT’s tetMesh_CLI executable provided in gmsh/-contrib/hxt/tetMesh/ with default options. We use TetGen 1.5.1 with "-BNEFIVVYp -q1.1/14 -O7 -o/130" options. TetGenwas compiled with gcc -O3 , Gmsh and HXT with gcc -O3 -march=native . See section 5 for an in-depth discussion on theresults. The code for the benchmarks is available at https://git.immc.ucl.ac.be/marotc/tetmesher_benchmark A.1 Speed

Figure A1 shows bar plots of diﬀerent performance measures, further detailed in Table A1. ﬁ b e r s K nu c k l e R i m ﬁ b e r s R o t o r P i s t on ⋅ t e t . GmshTetGenOursOurs (reprod. mode) (a)

Final number of tetrahedra ﬁ b e r s K nu c k l e R i m ﬁ b e r s R o t o r P i s t on T i m e [ s ] GmshTetGenOursOurs (reprod. mode) (b)

Tetrahedrizing the empty mesh ﬁ b e r s K nu c k l e R i m ﬁ b e r s R o t o r P i s t on ⋅ −5 T i m e / t e t . [ s /t e t ] GmshTetGenOursOurs (reprod. mode) (c)

Mesh reﬁnement ﬁ b e r s K nu c k l e R i m ﬁ b e r s R o t o r P i s t on ⋅ −6 T i m e / t e t . [ s /t e t ] GmshTetGenOursOurs (reprod. mode) (d)

Mesh improvement

FIGURE A1

Performance benchmark bar plots models Rotor Knuckle Rim 100 ﬁbers 300 ﬁbers Piston Surface mesh number of points

115 052 435 423 397 985 47 759 328 661 27 187 number of triangles

230 232 870 914 796 030 94 994 656 162 54 374

Gmsh ﬁnal number of points

138 652 1 435 736 964 213 583 860 2 583 840 52 823 ﬁnal number of tetrahedra

493 632 7 536 855 4 729 987 3 580 914 15 680 789 241 982

Empty Mesh [ 𝑠 ] 2.31 13.78 9.68 0.95 9.78 0.52Boundary Recovery [ 𝑠 ] 2.43 33.039 32.93 6.30 0.07 23.92Mesh Reﬁnement [ 𝜇𝑠 ∕ tet] 14.61 36.73 28.50 28.72 45.12 20.69Mesh Improvement [ 𝜇𝑠 ∕ tet] 4.14 7.16 5.96 5.73 8.24 4.23Max. mem. usage [ 𝐵 ∕ tet] 1226.70 558.86 569.25 524.24 493.53 865.92CPU % 99.9 99.9 99.9 99.9 99.9 99.9 TetGen ﬁnal number of points

169 755 2 140 715 1 101 619 614 938 3 183 439 59 876 ﬁnal number of tetrahedra

670 594 11 442 634 5 382 975 3 606 324 18 589 149 275 491

Empty Mesh [ 𝑠 ] 1.78 8.13 6.32 0.63 4.81 0.42Boundary Recovery [ 𝑠 ] 1.23 6.21 4.70 0.54 4.60 0.32Mesh Reﬁnement [ 𝜇𝑠 ∕ tet] 3.19 5.23 4.76 3.49 4.14 3.98Mesh Improvement [ 𝜇𝑠 ∕ tet] 0.82 1.00 0.94 0.88 0.94 0.73Max. mem. usage [ 𝐵 ∕ tet] 903.46 314.12 474.32 204.72 227.18 614.19CPU % 99.9 99.9 99.9 99.9 99.9 99.9 HXT (ours) ﬁnal number of points

146 957 2 121 555 . . . ﬁnal number of tetrahedra

541 929 . . . . . Empty Mesh [ 𝑠 ] 0.36 1.57 0.87 0.16 1.13 0.12Boundary Recovery [ 𝑠 ] 1.91 8.54 6.64 0.72 6.53 0.47Mesh Reﬁnement [ 𝜇𝑠 ∕ tet] 0.64 0.50 0.47 0.35 0.34 0.62Mesh Improvement [ 𝜇𝑠 ∕ tet] 0.78 0.65 0.11 0.36 0.31 0.25Max. mem. usage [ 𝐵 ∕ tet] 1032.73 221.15 328.88 80.86 77.13 646.72CPU % 292.2 383.0 326.4 523.8 479.4 351.2 HXT (ours)reproducible mode ﬁnal number of points

146 967 2 121 944 1 184 569 1 162 848 4 348 050 57 505 ﬁnal number of tetrahedra

542 012 11 716 723 6 064 318 7 141 786 26 506 921 268 493

Empty Mesh [ 𝑠 ] 0.35 1.67 1.35 0.19 1.35 0.12Boundary Recovery [ 𝑠 ] 1.60 7.60 6.10 0.64 5.74 0.42Mesh Reﬁnement [ 𝜇𝑠 ∕ tet] 0.65 0.83 0.78 0.51 0.75 0.75Mesh Improvement [ 𝜇𝑠 ∕ tet] 0.61 0.64 0.90 0.36 0.24 0.15Max. mem. usage [ 𝐵 ∕ tet] 1031.42 221.40 328.60 123.45 113.98 654.47CPU % 301.4 444.6 391.4 577.4 514.8 379.2 TABLE A1

Performance benchmark table A.2 Dihedral Angles

The dihedral angles formed by each pair of facets of a tetrahedron are customary measures to look at because of their conceptualsimplicity. However, this measure does not directly correspond to any type of error that the discretization induces during a ﬁniteelement simulation . ⋅ a ng l e s Gmsh - Rotor ⋅ Gmsh - Piston ⋅ Gmsh - Rim ⋅ a ng l e s Tetgen - Rotor ⋅ Tetgen - Piston ⋅ Tetgen - Rim ⋅ dihedral angle [ ◦ ] a ng l e s Ours - Rotor ⋅ dihedral angle [ ◦ ]Ours - Piston ⋅ dihedral angle [ ◦ ]Ours - Rim FIGURE A2

Histograms of the dihedral angles. Lower and upper bound are marked with red vertical lines. The average ismarked with a blue line. A.3 Gamma 𝛾 = √

24 3 𝑉 | 𝑒 𝑚𝑎𝑥 | ( 𝐴 + 𝐴 + 𝐴 + 𝐴 ) = √ 𝑟 𝑖𝑛 | 𝑒 𝑚𝑎𝑥 | , where 𝑉 is the volume of the tetrahedron, | 𝑒 𝑚𝑎𝑥 | is the length of the longest edge, 𝐴 𝑖 is the area of the 𝑖 th face and 𝑟 𝑖𝑛 is theinradius of the tetrahedron. The factor √ is added such that the optimal tetrahedron, which is a regular tetrahedron, has aquality 𝛾 = 1 . This quality measure penalize all tetrahedra according to their associated interpolation error . . . . . ⋅ t e t . Gmsh - Rotor . . . . . ⋅ Gmsh - Piston . . . . ⋅ Gmsh - Rim . . . . ⋅ t e t . Tetgen - Rotor . . . . . ⋅ Tetgen - Piston . . . . ⋅ Tetgen - Rim . . . . ⋅ 𝛾 t e t . Ours - Rotor . . . . . ⋅ 𝛾 Ours - Piston . . . . ⋅ 𝛾 Ours - Rim

FIGURE A3

Histograms of the quality measure 𝛾 . Lower bound is marked with a red vertical line. The average is marked witha blue line. A.4 Signed inverse condition number (SICN)

The inverse condition number in Frobenius norm SICN = 𝜅 ( 𝑆 ) for each tetrahedron. 𝜅 ( 𝑆 ) is the condition number of the lineartransformation matrix 𝑆 between a tetrahedron of the mesh and a regular tetrahedron. the SICN is directly proportional to thegreatest lower bound for the distance of 𝑆 to the set of singular matrices . . . . . ⋅ t e t . Gmsh - Rotor . . . . . ⋅ Gmsh - Piston . . . . ⋅ Gmsh - Rim . . . . ⋅ t e t . Tetgen - Rotor . . . . . . ⋅ Tetgen - Piston . . . . ⋅ Tetgen - Rim . . . . ⋅ SICN t e t . Ours - Rotor . . . . . . ⋅ SICNOurs - Piston . . . . ⋅ SICNOurs - Rim