Quality tetrahedral mesh generation with HXT
RReceived: Added at production Revised: Added at production Accepted: Added at productionDOI: xxx/xxxx
ARTICLE TYPE
Quality tetrahedral mesh generation with HXT
Célestin Marot* | Jean-François Remacle Institute of Mechanics, Materials and CivilEngineering, Université catholique deLouvain, Louvain-la-Neuve, Belgium
Correspondence
Célestin Marot, Institute of Mechanics,Materials and Civil Engineering,Université catholique de Louvain,Avenue Georges Lemaitre 4, bte L4.05.02,1348 Louvain-la-Neuve, Belgium.Email: [email protected] InformationThis research was supported by theEuropean Union’s Horizon 2020 researchand innovation programme,ERC-2015AdG-694020.
Summary
We proposed, in a recent paper , a fast 3D parallel Delaunay kernel for tetrahedralmesh generation. This kernel was however incomplete in the sense that it lackedthe necessary mesh improvement tools. The present paper builds on that previouswork and proposes a fast parallel mesh improvement stage that delivers high-qualitytetrahedral meshes compared to alternative open-source mesh generators. Our meshimprovement toolkit includes edge removal and improved Laplacian smoothing aswell as a brand new operator called the Growing SPR Cavity, which can be regardedas the mother of all flips . The paper describes the workflow of the new mesh improve-ment schedule, as well as the details of the implementation. The result of this researchis a series of open-source scalable software components, called HXT, whose overallefficiency is demonstrated on practical examples by means of a detailed comparativebenchmark with two open-source mesh generators: Gmsh and TetGen. KEYWORDS:
HXT, Growing SPR Cavity, quality, tetrahedral, mesh, improvement
Tetrahedral meshes are the geometrical support for most finite element discretizations. The size and the shape of the generatedtetrahedral elements must however be controlled cautiously to ensure reliable numerical simulations in industrial applications.The majority of popular tetrahedral mesh generators are based on a Delaunay kernel, because Delaunay-based algorithms arefast, especially in 3D. They are also robust and consume relatively little memory. Yet, pure
Delaunay meshes are known tocontain near-zero volume elements, called slivers , and a mesh improvement stage is mandatory if one wishes to end up with ahigh-quality computational mesh.We proposed, in a recent paper , techniques to compute a Delaunay triangulation of three billion tetrahedra in less than oneminute using multiple threads. We also explained how to extend that algorithm to obtain an efficient parallel tetrahedral meshgenerator. This mesh generator was however incomplete in the sense that it did not provide any mesh improvement process. Thepresent paper builds on the 2019 paper and proposes a mesh improvement strategy that is both highly parallel and efficient, andgenerates high quality meshes.Starting with an existing surface mesh, tetrahedral mesh generation is usually broken down into four steps.1. Empty Mesh : A triangulation (tetrahedralization) of the volume, based on all points of the surface mesh and with noadditional interior nodes (hence the name empty mesh), except for some isolated user-specified points in the interior ofthe volume, is first generated.2.
Boundary Recovery : The recovery step locally modifies the empty mesh to match the triangular facets of the surfacemesh as well as some possible embedded user-specified constrained triangles and lines. a r X i v : . [ c s . C G ] A ug Refinement : The empty mesh is iteratively refined by adding points in the interior of the volumes while always preservinga valid mesh at each iteration. The refinement step ends up when all tetrahedra are smaller in size than a user-prescribedsizemap.4.
Improvement : Whereas the refinement step controls the size of the generated tetrahedra, the improvement step finalizesthe mesh by locally eliminating badly shaped tetrahedra, and optimizes the quality of the mesh by means of specifictopological operations and vertex relocations.Steps 1 and 3 were described in detail in our 2019 paper . Step 2 is currently done using TetGen’s implementation . Finally,Step 4, which is the central topic in this paper, relies on a novel mesh improvement strategy based on an innovative local meshmodification operator called Growing SPR Cavity . The overall efficiency of the resulting full-fledged open-source tetrahedralmesh generation library is then discussed in detail. Some implementation details are included in this paper, be the authors areaware that the devil is very much into the details. Being commited to provide reproducible research , the code and meshes usedin this paper’s test cases are thus all made available at https://git.immc.ucl.ac.be/marotc/tetmesher_benchmark.The computer code supporting this paper is available in Gmsh 4.6.0 and above versions through the -algo hxt parameteror the Mesh.Algorithm3D=10 option. HXT successfully passes all Gmsh 3D benchmarks and, as discussed below, is fasterand generates meshes of higher quality than other open-source implementations, including Gmsh’s native 3D mesher.The paper is structured as follows. Section 2 contains a concise review of existing mesh improvement operations. Wheneverour implementation differs from others, the differences are briefly explained. Section 3 is devoted to the presentation of ournew
Growing SPR Cavity operation, which is the main contribution of this paper. In section 4, we detail our mesh improvementschedule, its parallelization and the quality improvement that can be obtained. Finally, in Section 5, we present our complete tetra-hedral mesh generator algorithm (HXT) and compare it against Gmsh and TetGen, two other reference open-source tetrahedralmesh generation softwares.
Mesh improvement operations are modifications of the mesh aiming at increasing its overall quality, the latter being essentiallydetermined by the quality of its worst quality element . Therefore, when we shall speak of the quality of a mesh or of a cavity,we shall in practice refer to the quality of its worst tetrahedron. The value of a quality measure is expected to be inverselyproportional to the interpolation error associated with the tetrahedron in some discretization scheme. If the tetrahedron is notvalid, i.e., if it is inverted or flat, the quality measure should then be non-positive. Two examples of quality measure, the Gammaand SICN measure, are described in Appendix A.3 and A.4.A cavity is a volume corresponding to a set of face-connected tetrahedra. A mesh modification operates on a cavity, which is bydefinition the part of the mesh that is modified. Among all possible mesh modifications, mesh improvement operations are thosethat strictly increase the quality of a cavity, from an original tetrahedralization of quality 𝑞 𝑎 to an improved tetrahedralizationof quality 𝑞 𝑏 > 𝑞 𝑎 . Any mesh modification can be decomposed into a succession of elementary moves, called bistellar flips orPachner moves . The 1-4 move adds a point inside a cavity formed by the removal of a single tetrahedron, and fills it with 4 newtetrahedra. The 4-1 move is the opposite, it removes a point. Both operations are illustrated on Figure 1a. The two remainingPachner moves are the 2-3 and 3-2 flips, which do neither add nor remove any points in the tetrahedralization (see Figure 1b). The purpose of the mesh refinement step is to obtain a distribution of points all over the volume whose distance to the closestneighbour is prescribed by a given mesh size field. Adding or removing vertices during the subsequent mesh improvementphase is therefore likely to disrupt the refined point distribution and see it deviate from the prescribed size map. The simplestmesh improvement schedule one may think of is thus composed of 2-3 and 3-2 flips only, the simplest operations that do notadd or remove points, and are executed whenever they improve the quality of the tetrahedralization in their respective cavity.This strategy is able to eliminate efficiently most ill-shaped tetrahedra. However, this hill-climbing method often reaches localmaxima where 2-3 or 3-2 flip no longer improve the mesh quality although the overall quality is not yet optimal. To overcome https://gitlab.onelab.info/gmsh/gmsh/-/tree/master/contrib/hxt (a) The 1-4 move (point insertion) and 4-1 move (point removal) (b)
The 2-3 and 3-2 flips
FIGURE 1
Pachner movesthis limitation, combinations of multiple flips can be applied at once whenever one can check they result in a tetrahedralizationof better quality. This is equivalent with creating more complex topological operations. For example, the 4-4 flip, which is anoperation on a cavity of 6 points and 4 tetrahedra with only one interior edge (see Figure 2a), can be obtained by doing a 2-3flip that creates a new interior edge, followed by a 3-2 flip that removes the initial edge. (a)
The 4-4 flip e d g e r e m o v a l m u l t i - f a c e r e m o v a l e d g e r e m o v a l m u l t i - f a c e r e m o v a l multi-face retriangulation (b) Edge removal, multi-face removal and multi-face retriangulation
FIGURE 2
Examples of composite topological operations
The most useful topological operation is the edge removal operation. It is a generalization of the 3-2 and 4-4 flips, starting froma cavity with 𝑁 ≥ tetrahedra surrounding a unique edge 𝑎𝑏 (see Figure 2b). The operation removes first all tetrahedra adjacentto that edge, and creates instead 𝑁 − 2 upper tetrahedra connected to 𝑎 and 𝑁 − 2 lower tetrahedra connected to 𝑏 . The facetsbetween lower and upper tetrahedra form a 2D sandwiched triangulation, and Shewchuk has proposed an algorithm, based on dynamic programming concepts, that finds the 2D triangulation resulting in an optimal tetrahedralization . In contrast, Sipresents a more versatile implementation of the edge removal operation that recursively removes other edges that prevents thecurrent edge removal from producing a valid tetrahedralization . This forms a tree of edge removal operations, where the leafsare removed with a sequence of 2-3 flips terminated by a final 3-2 flip. Our implementation of edge removal does neither usedynamic programming nor sequences of flips. It is rather a brute-force approach inspired by the first description of edge removalby Freitag and Ollivier-Gooch . In a nutshell, the principle is as follows. Each triangle of the sandwiched 2D triangulation isassigned a quality, which is the minimum of the corresponding upper and lower tetrahedron quality. If the quality of a triangle isless than the quality of the original tetrahedralization, the triangle is marked bad. Using precomputed tables giving all possibletriangulations up to 𝑁 = 7 in which this triangle is found, it is then possible to eliminate triangulations involving bad triangles.Whenever all triangulations are eliminated, the edge removal is not performed. If, on the other hand, several tetrahedralizationsare possible, their respective overall quality are computed, and the best candidate is selected. As Freitag and Ollivier-Goochnoted, having more than 7 tetrahedra around an edge is exceptional, and edge removal is also less likely to succeed in those cases.Our approach favors thus simplicity over asymptotic complexity, although all approaches have of course their pros and consin practice. In average, our edge removal terminates within about 1 microsecond for cavities of about 5 tetrahedra, on modernhardware.The inverse of the edge removal was coined multi-face removal by Shewchuk . The operations that derive from edge removalare shown in Figure 2b. This group of operations includes a third operation, named Mutli-face retriangulation by Misztal etal. . Mutli-face retriangulation can be regarded either as a combination of an edge removal and a multi-face removal, or as asequence of 4-4 flips modifying the 2D sandwiched triangulation. Although multi-face removal and multi-face retriangulationhave proved their effectiveness, their implementation is more involved than edge removal, and they are far less used. We decidednot to implement them, because they are covered by the Growing SPR Cavity operation anyway. This mother of all flips isdetailed in section 3.
A vertex relocation, or smoothing operation, in the context of tetrahedral mesh optimization, is an operation that changes theposition of a point in order to improve the quality of the adjacent tetrahedra. Smoothing methods have been extensively studiedin the past 25 years , and their objective is twofold: improve the overall quality of the mesh and space out points appropriatelyso that subsequent topological transformations can further improve the mesh. Smoothing and topological transformations areindeed more effective when combined . The most used smoothing technique, called Laplacian smoothing , simply relocates apoint at the centroid of the set of points to which is it connected by a mesh edge. As for flipping, Laplacian smoothing is appliedonly if it effectively improves the quality of the mesh. In our mesh generation library
HXT , Laplacian smoothing is combined witha golden-section search of the optimal relocation on the segment between the original position and the centroid. This approachis effective in practice, although the objective function may have local maxima over the segment, and the golden-section searchis not guaranteed to identify the largest one. In their work , Freitag et al. studied this optimized Laplacian smoothing technique,among other techniques. A good review of classical mesh improvement operations is found in Klingner’s Ph.D. Dissertation . Klingner reports thereinthat large quality improvements are obtained by using point insertion and edge contraction. We have not implement thoseoperations so far, but we will consider them for later updates. They are indeed more complex than one might think at first sight.It is not enough to insert or remove points and check on whether the quality is improved. The position of adjacent points mustalso often be modified to obtain tetrahedra of better quality and, more importantly, to respect the meshsize map.Valid mesh of higher quality may be obtained in some situations if a slight modification of the surface mesh is allowed. Thishowever implies relying on a CAD software during the meshing process to evaluate the distance to the parametric definition ofthe initial surface. A more lightweight approach considering a modification of the surface mesh itself, up to an approximatedHausdorff distance threshold, would still mean bookkeeping in memory an unmodified version of the surface mesh. As we areaiming at a simple and parallelizable mesh generator, we chose to not consider at all operations that modifies the surface mesh.In some cases, this is even an asset, as there are situations where one wants the surface mesh to remain strictly unchanged, forinstance, if the surface is an interface between independent parts of an assembly. Boundary modification is however an option FIGURE 3
The 2-2 flip. Triangles of the surface mesh are shaded in orange.worth being considered for future implementations, as it has shown its effectiveness . The most elementary surface meshmodification operation simply flips a pair of adjacent tetrahedra, which are themselves adjacent to two boundary triangles onthe surface mesh. This operation is called a and is illustrated in Figure 3. FIGURE 4
2D illustration of the SPR search tree
Small Polyhedron Reconnection (SPR) is a branch and bound type algorithm for finding the best of all possible tetrahedraliza-tion of a cavity. It was named and first implemented by Liu et al. in 2006 . Finding whether a polyhedron can be triangulatedor not is already a NP-complete problem . Finding the best triangulation is an even more difficult NP-hard problem. The SPRalgorithm has indeed factorial complexity with regard to the number of points 𝑛 in the cavity. It starts from a selected face, andrecursively fills up the cavity with well-shaped tetrahedra, trying out each of the 𝑛 − 3 remaining points, until the best triangula-tion is found. The basic algorithm thus covers the whole tree of possible triangulations, but simple optimizations allow pruningmost branches
14, 16 . Figure 4 shows the search tree for a simple 2D cavity.The choice of the cavity on which to apply the SPR operation is a matter that received very little attention, although theperformances strongly depend on the geometry and topology of the cavity. In this paper, we propose a new algorithm called
Growing SPR Cavity (GSC), whose basic idea is to increment the number of points gradually, and to apply the SPR at each stepuntil a better triangulation is found. As the SPR algorithm has factorial complexity, the cost of repeating SPR operations from 4to 𝑛 points is not much higher than computing directly the best triangulation with 𝑛 points. In addition, most of the SPR structurecan be reused from one iteration to the next, and a better triangulation of the cavity is usually found within few iterations. Thelimit to the maximum number of points has been set to 𝑛 = 32 , and GSC abandons if no better triangulation has been foundwhen the cavity has reached 32 points. We explained in how the SPR algorithm can be optimized by storing, in a 𝑛 table, the quality (and thus also the validity)of all tetrahedra the algorithm already encountered. By limiting the number of points to 32, we can allocate an acceptably large table at the beginning of the GSC algorithm and keep it from one iteration of the GSC algorithm to another. C A (a) C A (b) C A (c) C new A (d) FIGURE 5
Growing SPR Cavity in 2D. Triangles in the cavity are colored in orange, and triangles adjacent to the cavity are colored in cyan. (a) : at first, the cavity only contains a bad triangle. (b) and (c) : every point outside the cavity is a vertexof at most one triangle in , so we add the triangle in with the worst quality to . (d) : the SPR algorithm found a bettertriangulation for , the triangulation of the cavity is replaced and the GSC algorithm ends. Algorithm 1
The
Growing SPR Cavity (GSC) operation, applied on a bad tetrahedron 𝑡 of a mesh function GSC( 𝑡 , ) ← 𝑡 ⊳ the cavity is the tetrahedron at first 𝑛 ← ⊳ the number of points in while 𝑛 < do ← E XTEND C AVITY ( , 𝐶 , 𝑛 ) ⊳ add 𝑝 𝑘 +1 and all tetrahedra with 4 vertices in 𝑘 +1 𝑛 ← 𝑛 + 1 𝑏 ← SPR( ) ⊳ find the optimal triangulation of if 𝑏 ≠ then return ( ⧵ ) ∪ 𝑏 ⊳ Mesh is improved end while return ⊳ Failed to improve the mesh end function
The GSC pseudocode given in algorithm 1 can be explained as follows. As its initial cavity, GSC starts with a tetrahedronof bad quality which needs to be optimized. Let the four point of this initial tetrahedron be denoted as { 𝑝 , 𝑝 , 𝑝 , 𝑝 } . Now, let 𝑘 be a cavity containing a set of 𝑘 points 𝑘 = { 𝑝 , 𝑝 , … , 𝑝 𝑘 } . To iteratively obtain 𝑘 +1 , a point 𝑝 𝑘 +1 is added and everytetrahedra whose four points are in 𝑘 +1 = 𝑘 ∪ 𝑝 𝑘 +1 are also added. The selection of the next point of 𝑝 𝑘 +1 is a heuristic based ona simple intuition comforted by experience in testing the SPR algorithm. We observed that it is in general easier to find a bettertetrahedralization for a cavity with few points and many tetrahedra than for a cavity with few tetrahedra per point. Therefore,the point 𝑝 𝑘 +1 is chosen so as to add as many tetrahedra as possible. Let 𝑘 denote the set of tetrahedra sharing at least onefacet with a tetrahedron in 𝑘 . In practice, GSC adds the most connected point , i.e., the point adjacent to the largest number oftetrahedra in 𝑘 . If several points are adjacent to 𝑚 tetrahedra in 𝑘 , the sum of the qualities of those 𝑚 tetrahedra is evaluatedfor each point, and the point with the lowest sum is selected as a tiebreaker rule. Figure 5 illustrates the GSC algorithm in 2D,where the tiebreaker rule has been used twice. This rule is however empirical and different alternative rule were also tested:selecting the point with the highest sum of qualities, with the maximum or minimum quality, with the quality function replaced by the volume or by the height associated to the boundary facet. The proposed tiebreaker rule consistently gave better results inour full mesh improvement schedule, as detailed in section 4.In 2D, the triangles to be added to the cavity, i.e. with all 3 points in 𝑘 +1 = 𝑘 ∪ 𝑝 𝑘 +1 , are always in 𝑘 . However, in 3Dthere might be tetrahedra that are neither in 𝑘 nor in 𝑘 , but still have all their vertices in 𝑘 +1 . Therefore, in reality, the GSCalgorithm does not always add the optimal point that result in the addition of as many tetrahedra as possible. Choosing the mostconnected point and not the optimal point is however simpler and faster in practice. (a) Tetrahedra with 𝛾 < . before (above) andafter (below) the mesh improvement step on the Rotor mesh (b)
Tetrahedra with 𝛾 < . before (above) andafter (below) the mesh improvement step on the Piston mesh (c)
Tetrahedra with 𝛾 < . before (above) andafter (below) the mesh improvement step on the Rim mesh
FIGURE 6
Effect of HXT’s mesh improvement step on bad tetrahedra. The threshold for being considered a bad tetrahedronwas set to 𝛾 𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 = 0 . . Almost all bad tetrahedra of the Rotor , Piston and
Rim meshes (specified in Table A1) were improvedabove this threshold.Our mesh improvement strategy includes a Laplacian smoothing phase, an edge removal phase and the GSC. As Laplaciansmoothing and edge removal are approximately faster than GSC, they are used in priority, whereas GSC is used as alast resort technique to unlock processes trapped in local maxima of the mesh quality objective function. The pseudocode for asimplified serial mesh improvement schedule is presented in Algorithm 2. The for loop on line 7 will be called the
SER loop inthe following (Smoothing Edge Removal) and the loop where the GSC operation is applied, on line 17, will be called
GSC loop .Both loops iterate over bad tetrahedra, which are tetrahedra with a quality under a user-defined threshold. Using the Gammaquality function detailed in Appendix A.3, our mesh improvement strategy is able to eliminate most tetrahedra with 𝛾 < . ,as shown in Figure 6, where one can see that only a few bad tetrahedra subsist. Algorithm 2
The proposed serial mesh improvement schedule tries to improve each tetrahedron 𝜏 from a list of bad tetrahedra in a mesh . Bad tetrahedra are tetrahedra with a quality smaller than a user-defined threshold 𝑞 𝑚𝑖𝑛 function M ESH I MPROVEMENT ( , 𝑞 𝑚𝑖𝑛 ) do modifGSC ← ⊳ count modifications of the mesh by Growing SPR Cavity do modifSER ← ⊳ count modifications of the mesh by Smoothing or Edge Removal ← G ET B AD T ETRAHEDRA ( , 𝑞 𝑚𝑖𝑛 ) for 𝜏 ∈ do ⊳ the SER loop improved ← False for point ∈ 𝜏 and ¬ improved do improved ← L APLACIAN S MOOTHING ( , point) for edge ∈ 𝜏 and ¬ improved do improved ← EDGE R EMOVAL ( , edge) if improved then modifSER ← modifSER + 1 while modifSER > ← G ET B AD T ETS ( , 𝑞 𝑚𝑖𝑛 ) for 𝜏 ∈ do ⊳ the GSC loop if GSC( , 𝑡𝑎𝑢 ) then modifGSC ← modifGSC + 1 while modifGSC > end function The mesh improvement strategy described above can be parallelized pretty much in the same way as the Delaunay refinement.The parallel shared-memory Delaunay kernel introduced in our previous article partitions the domain on basis of a 3D Moorecurve defined in such a way that all partitions contain the same amount of points to be inserted. A point is in a partition if itsMoore index is in the range that defines the partition. A tetrahedron, then, is considered to belong to a partition if at least 3 of its4 vertices lay in that partition. When the cavity created for the insertion of a new point overlaps the boundary between differentpartitions, the operation is suspended, until all other insertions have been tried. At that moment, a new Moore curve and hencenew partitions are created, and the insertion loop resumes with the points whose insertion was suspended. The number of parallelthreads, and hence the number of partitions, is determined at the beginning of each sweep by the percentage 𝜌 of suspended pointinsertions during the previous sweep. If 𝜌 = 1 , a single thread is used so that the termination of the algorithm is guaranteed.Similarly, for mesh improvement, the space filling curve is partitioned so as to equally distribute bad tetrahedra over thethreads. Figure 7 shows partitioning examples for 8 threads on 3 different meshes. The number of threads is again decided infunction of the percentage of suspended operations—due to a conflict with another partition— in the previous sweep. The SERloop is thus executed repeatedly with decreasing numbers of threads, until all partition conflicts have been resolved. The sameprocedure is used for the GSC loop as well. This parallel algorithm scales well in case of very big meshes only, for essentiallytwo reasons. Firstly, elements crossing partition boundaries represent a larger portion of space in small meshes than in largemeshes, yielding thus mechanically more conflicts. This is less of an issue with mesh refinement because cavities are usuallysmaller for point insertion than for the Growing SPR Cavity. The second reason is that the time spent in one execution of theSER or of the GSC loop is very small, typically in the millisecond range. Therefore, the overhead of launching new threads andcomputing Moore indices for the whole mesh is significant. Figure 8a compares the scaling in the case of two highly-refinedmodels. The mesh generation was done on the 64 physical cores of an Intel Xeon Phi 7210 machine, running at 1.3 GHz. Theoverall effectiveness and performance of our mesh improvement algorithm is analyzed in section 5.4. (a) Rotor mesh partitions (b)
Piston mesh partitions (c)
Rim mesh partitions
FIGURE 7
Partitions based on the 3D Moore curve ordering. These partitions were created almost instantaneously at the startof the mesh improvement step, running on 8 threads.1 2 4 8 16 32 64 Number of threads T i m e [ s ] Aircraft500 thin fibersperfect scaling (a)
Scaling of tetrahedral mesh improvement on 2 very large meshes. (b)
The 2 different meshes used within the scaling graph. Above, an Aircraftwith 637 million tetrahedra, the interior being also meshed. Below, a cubewith 500 thin fibers that has 351 million tetrahedra.
FIGURE 8
Scaling of our parallel mesh improvement schedule on 2 huge meshes
Before comparing the performance of our tetrahedral mesh generator with TetGen and Gmsh, the similarities and key differencesbetween the different implementations are briefly recalled. Besides being all open-source and free, the three considered softwaretools are also structured very similarly. As explained in the introduction, the mesh generation process can be splitted into fourdistinct steps: creation of an empty mesh , boundary recovery , mesh refinement and mesh improvement .Although it could be sensible to work on improving the quality of tetrahedra right at the moment of their creation, none of theopen source mesh generators that are considered here use such a strategy. Separating mesh refinement and mesh improvementhas several advantages. The mesh refinement process essentially creates around times the number of tetrahedra that are presentin the final mesh. Avoiding the computation of element quality during refinement therefore allows a substantial gain in efficiency.Moreover, having separated pieces of code for refinement and improvement allows to programmers to focus on smaller tasks andgoals. It also allows using meshing capabilities of our code in a modular fashion: it is indeed possible to create a mesh with Gmsh,refine it with TetGen, and optimize it with our HXT algorithm, without duplicating any part of the process. Another advantage isthat the implementations of the different steps can be analyzed separately, and their respective execution times be compared on different models. The results of our benchmark for all 4 steps of the tetrahedral mesh generation, including the specification ofrelevant hardware characteristics and compilation flags, are given in Appendix A. More specifically, element numbers, timingsand CPU/memory usages are reported in Table A1. Comparative performances per step, except for the boundary recovery step,and per mesh generator are shown as bar plots in Figure A1. The first step, i.e., the creation of the empty mesh, consists in computing a Delaunay tetrahedralization through all points ofthe surface mesh plus some possible user-defined interiors points. Since a Delaunay tetrahedralization is unique, provided a simulation of simplicity is used in conjunction with robust adaptive predicates , all three programs generate identical emptymeshes. The ordering of the tetrahedra however, can largely vary from one packege to another. The ordering is even non-deterministic with HXT, because threads can reserve chunks of memory corresponding to a set of 8192 tetrahedra at differentmoments from one run to another. To get rid of this non-deterministic behaviour, HXT offers a reproducible mode, whichreorders deterministically tetrahedra in the lexicographic order of their nodes. The performance of TetGen, Gmsh and HXT(with and without the reproducible mode), for the different meshes shown in Table A1, are reported in Figure A1b. HXT is thefastest for the empty mesh step, but not because of parallelization. Most tetrahedra of the empty mesh indeed cross the domainfrom side to side. With our partitioning method based on a space-filling curve, points that are at different extremities of thedomain have very little chances of being in the same partition. Because a tetrahedron is only considered in a partition when atleast three of its vertices are in that partition, parallelization is not very effective at this step. The speed difference between HXTand TetGen, or between HXT and Gmsh, is rather explained here by the good serial performances of our Delaunay kernel . All 3 software tools rely internally on TetGen’s boundary recovery code. Table A1 shows however that HXT is a bit slowerthan TetGen for boundary recovery, because of the back and forth conversion of the mesh between its own data structure andTetGen’s format. Gmsh applies the same, albeit much slower, kind of conversion. Moreover, the order under which tetrahedra arestored in memory can either slow down or speedup the boundary recovery process, which explains timing differences observedbetween HXT in normal or reproducible mode. As the other parts of HXT are parallelized and optimized for heavy workload,boundary recovery is the bottleneck of our code for large meshes. In contrast, HXT usually performs very well on small meshes.We suspect that TetGen’s algorithm for locating missing facets or edges has a superlinear complexity with respect to the size ofthe mesh.
Mesh refinement is the step where our parallel HXT algorithm really stands out. The ways TetGen, HXT or Gmsh proceed torefine the mesh are rather different, but Figure A1a nonetheless shows that the different software tools generate meshes withonly slightly different numbers of tetrahedra. TetGen was given the options q1.1/14 , causing it to refine only tetrahedra witha radius-edge ratio larger than . or a minimum dihedral angle smaller than ◦ . No meshsize constraint was given, althoughTetGen allows it. TetGen therefore adds point only to optimize the quality of elements. In contrast, HXT and Gmsh add a newpoint 𝑝 𝑘 +1 inside a tetrahedron 𝜏 if the insertion does not create an edge shorter than the prescribed meshsize, which is the valuelinearly interpolated from meshsizes at the vertices of 𝜏 . At the beginning of the refinement step, nodal meshsizes are evaluatedfrom the surface mesh. and Gmsh thus do not refine with the goal of improving the quality of the elements. All three softwaresdo however end up with meshes with very similar numbers of tetrahedra. Still, timings for mesh refinement and improvementindicated in Table A1, Figure A1c and A1d, are expressed in second per million tetrahedra to alleviate the effect of the differentrefinement strategies. HXT is at least 5 times faster than Gmsh and TetGen for mesh refinement, consistently generating morethan one million tetrahedra per second when the other mesh generators peak at
300 000 tetrahedra per second. On the model, HXT reaches nearly 3 million tetrahedra per second, which is 132 times faster than Gmsh. HXT is also efficientin terms of memory usage. For the
100 fibers mesh, the peak resident set size does not exceed 77 bytes per tetrahedron. Ifmemory allocations not depending on the size of the mesh are put aside, HXT consumes only about 60 bytes per tetrahedron. For the mesh improvement step, TetGen was given the -o/130 option, setting to ◦ the target maximum dihedral angle . Incontrast, HXT and Gmsh target a minimum inradius to longest edge ratio 𝛾 𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 = 0 . . As Gmsh and HXT avoid modifyingthe surface mesh, TetGen was also prevented from modifying the surface mesh by enabling the -Y option. In addition to thetimings of Table A1, mesh quality has also been compared after the mesh improvement step for 3 geometries ( Rotor , Piston and
Rim ) and 3 quality measures: • the dihedral angles, plotted in Figure A2 • the inradius to longest edge ratio ( 𝛾 ), shown on the bar plot of Figure A3. The tetrahedra with 𝛾 < . before and afterHXT’s mesh improvement step are also shown in Figure 6 • the signed inverse condition number (SICN), plotted in Figure A4HXT gives comparatively the best results regarding each of those three quality measures, even for the maximum dihedralangles which is the quality measure used by TetGen during the mesh improvements step. Figure A2 indeed shows noticeablysmaller maximum dihedral angles for HXT compared to Gmsh and TetGen for each of the 3 considered meshes. This differenceis explained by the effectiveness of the new Growing SPR Cavity operation described in section 3. The performance of HXT iseven more striking when looking at the minimum inradius to longest edge ratio and the minimum SICN:Rotor Piston Rim 𝛾 SICN ∡ 𝑚𝑖𝑛 ∡ 𝑚𝑎𝑥 𝛾 SICN ∡ 𝑚𝑖𝑛 ∡ 𝑚𝑎𝑥 𝛾 SICN ∡ 𝑚𝑖𝑛 ∡ 𝑚𝑎𝑥 Gmsh 0.014 0.077 4.06 174.76 0.23 0.23 11.24 161.40 0.0094 0.034 1.59 177.22TetGen 0.037 0.16 5.10 168.28 0.15 0.30 14.40 156.12 0.073 0.15 6.70 168.26HXT (ours) 0.13 0.16 5.10 149.2 0.29 0.39 13.76 146.64 0.16 0.17 5.92 149.09
TABLE 1
Minimum 𝛾 and SICN measure among all tetrahedra, minimum and maximum dihedral angle among all dihedralangles of the mesh, for the Rotor , Piston and
Rim meshes with the 3 different open-source software tools tested.HXT’s mesh improvement is parallelized and has very fast smoothing and edge removal operations. Table A1 and FigureA1d indeed indicate that HXT is also the fastest when it comes to mesh improvement, but not by much. The reason for this issimple: TetGen and Gmsh stop optimizing whenever the smoothing and edge removal operations become ineffective, whereasHXT, then, starts its first GSC sweep.
This paper has presented the parallel tetrahedral mesh generator HXT, and demonstrated its efficiency by means of a detailedcomparative benchmark with two concurrent open-source software tools: Gmsh and TetGen. Although the obtained performanceare very satisfactory, there is room for improvement. The first and easiest improvement will be the implementation of edgecontraction, point insertion and 2-2 flips. Another particularly challenging task is the parallelization of the boundary recoverystep, which is the last non-parallelized step in the whole meshing process. Thanks to the acquired experience, improvement onthe boundary recovery procedures of TetGen is now definitely within reach, probably but adding our parallelization strategybased on Moore curves. The novel GSC operation could also help gaining performance, and could be adapted as a last resortoperation to recover constrained facets or edges. Other parallelizations strategies should also be tested for the mesh improvementstage.
ACKNOWLEDGMENTS
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020research and innovation programme (grant agreement ERC-2015-AdG-694020). We are grateful to the authors of the ABC Dataset , for providing the Rim and
Piston geometries (respectively model 40 and 9733). Surface meshes were generated thanksto Gmsh.
References
15. Ruppert J, Seidel R. On the difficulty of triangulating three-dimensional Nonconvex Polyhedra. Discrete & ComputationalGeometry. 1992 Mar;7(3):227–253. Available from: http://link.springer.com/10.1007/BF02187840.16. Marot C, Verhetsel K, Remacle JF. Reviving the Search for Optimal Tetrahedralizations. In: Proceedings of the 28th Inter-national Meshing Roundtable. Buffalo, New York, USA: Zenodo; 2020. Available from: https://doi.org/10.5281/zenodo.3653420.17. Edelsbrunner H, Mücke EP. Simulation of simplicity: a technique to cope with degenerate cases in geometric algorithms.ACM Trans Graph. 1990;9(1):66–104. Available from: http://doi.acm.org/10.1145/77635.77639.18. Shewchuk JR. Adaptive precision floating-point arithmetic and fast robust geometric predicates. Discrete & ComputationalGeometry. 1997;18(3):305–363.19. TetGen, Version 1.5, User’s Manual;. Available from: http://wias-berlin.de/software/tetgen/1.5/doc/manual/index.html.20. Koch S, Matveev A, Jiang Z, Williams F, Artemov A, Burnaev E, et al. ABC: A Big CAD Model Dataset For GeometricDeep Learning. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2019. . APPENDIXA BENCHMARKS
Benchmarks use the average of 5 runs on an Intel Ⓡ Core TM i7-6700HQ with 4 cores running at 3.5GHz and 16Gb of RAM.We use Gmsh 4.6.0 with the -optimize_threshold=0.35 option and HXT’s tetMesh_CLI executable provided in gmsh/-contrib/hxt/tetMesh/ with default options. We use TetGen 1.5.1 with "-BNEFIVVYp -q1.1/14 -O7 -o/130" options. TetGenwas compiled with gcc -O3 , Gmsh and HXT with gcc -O3 -march=native . See section 5 for an in-depth discussion on theresults. The code for the benchmarks is available at https://git.immc.ucl.ac.be/marotc/tetmesher_benchmark A.1 Speed
Figure A1 shows bar plots of different performance measures, further detailed in Table A1. fi b e r s K nu c k l e R i m fi b e r s R o t o r P i s t on ⋅ t e t . GmshTetGenOursOurs (reprod. mode) (a)
Final number of tetrahedra fi b e r s K nu c k l e R i m fi b e r s R o t o r P i s t on T i m e [ s ] GmshTetGenOursOurs (reprod. mode) (b)
Tetrahedrizing the empty mesh fi b e r s K nu c k l e R i m fi b e r s R o t o r P i s t on ⋅ −5 T i m e / t e t . [ s /t e t ] GmshTetGenOursOurs (reprod. mode) (c)
Mesh refinement fi b e r s K nu c k l e R i m fi b e r s R o t o r P i s t on ⋅ −6 T i m e / t e t . [ s /t e t ] GmshTetGenOursOurs (reprod. mode) (d)
Mesh improvement
FIGURE A1
Performance benchmark bar plots models Rotor Knuckle Rim 100 fibers 300 fibers Piston Surface mesh number of points
115 052 435 423 397 985 47 759 328 661 27 187 number of triangles
230 232 870 914 796 030 94 994 656 162 54 374
Gmsh final number of points
138 652 1 435 736 964 213 583 860 2 583 840 52 823 final number of tetrahedra
493 632 7 536 855 4 729 987 3 580 914 15 680 789 241 982
Empty Mesh [ 𝑠 ] 2.31 13.78 9.68 0.95 9.78 0.52Boundary Recovery [ 𝑠 ] 2.43 33.039 32.93 6.30 0.07 23.92Mesh Refinement [ 𝜇𝑠 ∕ tet] 14.61 36.73 28.50 28.72 45.12 20.69Mesh Improvement [ 𝜇𝑠 ∕ tet] 4.14 7.16 5.96 5.73 8.24 4.23Max. mem. usage [ 𝐵 ∕ tet] 1226.70 558.86 569.25 524.24 493.53 865.92CPU % 99.9 99.9 99.9 99.9 99.9 99.9 TetGen final number of points
169 755 2 140 715 1 101 619 614 938 3 183 439 59 876 final number of tetrahedra
670 594 11 442 634 5 382 975 3 606 324 18 589 149 275 491
Empty Mesh [ 𝑠 ] 1.78 8.13 6.32 0.63 4.81 0.42Boundary Recovery [ 𝑠 ] 1.23 6.21 4.70 0.54 4.60 0.32Mesh Refinement [ 𝜇𝑠 ∕ tet] 3.19 5.23 4.76 3.49 4.14 3.98Mesh Improvement [ 𝜇𝑠 ∕ tet] 0.82 1.00 0.94 0.88 0.94 0.73Max. mem. usage [ 𝐵 ∕ tet] 903.46 314.12 474.32 204.72 227.18 614.19CPU % 99.9 99.9 99.9 99.9 99.9 99.9 HXT (ours) final number of points
146 957 2 121 555 . . . final number of tetrahedra
541 929 . . . . . Empty Mesh [ 𝑠 ] 0.36 1.57 0.87 0.16 1.13 0.12Boundary Recovery [ 𝑠 ] 1.91 8.54 6.64 0.72 6.53 0.47Mesh Refinement [ 𝜇𝑠 ∕ tet] 0.64 0.50 0.47 0.35 0.34 0.62Mesh Improvement [ 𝜇𝑠 ∕ tet] 0.78 0.65 0.11 0.36 0.31 0.25Max. mem. usage [ 𝐵 ∕ tet] 1032.73 221.15 328.88 80.86 77.13 646.72CPU % 292.2 383.0 326.4 523.8 479.4 351.2 HXT (ours)reproducible mode final number of points
146 967 2 121 944 1 184 569 1 162 848 4 348 050 57 505 final number of tetrahedra
542 012 11 716 723 6 064 318 7 141 786 26 506 921 268 493
Empty Mesh [ 𝑠 ] 0.35 1.67 1.35 0.19 1.35 0.12Boundary Recovery [ 𝑠 ] 1.60 7.60 6.10 0.64 5.74 0.42Mesh Refinement [ 𝜇𝑠 ∕ tet] 0.65 0.83 0.78 0.51 0.75 0.75Mesh Improvement [ 𝜇𝑠 ∕ tet] 0.61 0.64 0.90 0.36 0.24 0.15Max. mem. usage [ 𝐵 ∕ tet] 1031.42 221.40 328.60 123.45 113.98 654.47CPU % 301.4 444.6 391.4 577.4 514.8 379.2 TABLE A1
Performance benchmark table A.2 Dihedral Angles
The dihedral angles formed by each pair of facets of a tetrahedron are customary measures to look at because of their conceptualsimplicity. However, this measure does not directly correspond to any type of error that the discretization induces during a finiteelement simulation . ⋅ a ng l e s Gmsh - Rotor ⋅ Gmsh - Piston ⋅ Gmsh - Rim ⋅ a ng l e s Tetgen - Rotor ⋅ Tetgen - Piston ⋅ Tetgen - Rim ⋅ dihedral angle [ ◦ ] a ng l e s Ours - Rotor ⋅ dihedral angle [ ◦ ]Ours - Piston ⋅ dihedral angle [ ◦ ]Ours - Rim FIGURE A2
Histograms of the dihedral angles. Lower and upper bound are marked with red vertical lines. The average ismarked with a blue line. A.3 Gamma 𝛾 = √
24 3 𝑉 | 𝑒 𝑚𝑎𝑥 | ( 𝐴 + 𝐴 + 𝐴 + 𝐴 ) = √ 𝑟 𝑖𝑛 | 𝑒 𝑚𝑎𝑥 | , where 𝑉 is the volume of the tetrahedron, | 𝑒 𝑚𝑎𝑥 | is the length of the longest edge, 𝐴 𝑖 is the area of the 𝑖 th face and 𝑟 𝑖𝑛 is theinradius of the tetrahedron. The factor √ is added such that the optimal tetrahedron, which is a regular tetrahedron, has aquality 𝛾 = 1 . This quality measure penalize all tetrahedra according to their associated interpolation error . . . . . ⋅ t e t . Gmsh - Rotor . . . . . ⋅ Gmsh - Piston . . . . ⋅ Gmsh - Rim . . . . ⋅ t e t . Tetgen - Rotor . . . . . ⋅ Tetgen - Piston . . . . ⋅ Tetgen - Rim . . . . ⋅ 𝛾 t e t . Ours - Rotor . . . . . ⋅ 𝛾 Ours - Piston . . . . ⋅ 𝛾 Ours - Rim
FIGURE A3
Histograms of the quality measure 𝛾 . Lower bound is marked with a red vertical line. The average is marked witha blue line. A.4 Signed inverse condition number (SICN)
The inverse condition number in Frobenius norm SICN = 𝜅 ( 𝑆 ) for each tetrahedron. 𝜅 ( 𝑆 ) is the condition number of the lineartransformation matrix 𝑆 between a tetrahedron of the mesh and a regular tetrahedron. the SICN is directly proportional to thegreatest lower bound for the distance of 𝑆 to the set of singular matrices . . . . . ⋅ t e t . Gmsh - Rotor . . . . . ⋅ Gmsh - Piston . . . . ⋅ Gmsh - Rim . . . . ⋅ t e t . Tetgen - Rotor . . . . . . ⋅ Tetgen - Piston . . . . ⋅ Tetgen - Rim . . . . ⋅ SICN t e t . Ours - Rotor . . . . . . ⋅ SICNOurs - Piston . . . . ⋅ SICNOurs - Rim