Computing Three-dimensional Constrained Delaunay Refinement Using the GPU

Abstract

We propose the first GPU algorithm for the 3D triangulation refinement problem. For an input of a piecewise linear complex G and a constant B , it produces, by adding Steiner points, a constrained Delaunay triangulation conforming to G and containing tetrahedra mostly of radius-edge ratios smaller than B . Our implementation of the algorithm shows that it can be an order of magnitude faster than the best CPU algorithm while using a similar amount of Steiner points to produce triangulations of comparable quality.


Zhenghai Chen

School of Computing, National University of Singapore

Tiow-Seng Tan

School of Computing, National University of Singapore

CCS CONCEPTS

• Theory of computation → Computational geometry; • Computing methodologies → Graphics processors

KEYWORDS

GPGPU, Computational Geometry, Mesh Refinement, Finite Element Analysis

Constrained Delaunay triangulations (CDTs) are used in various engineering and scientific applications, such as finite element methods and interpolation. Such a CDT T, in general, is obtained from a so-called piecewise linear complex (PLC) G containing a point set P, an edge set E (where each edge has its endpoints in P), and a polygon set F (where each polygon has its boundary edges in E). All vertices, edges and polygons of G also appear in T as vertices, unions of edges, and unions of triangles, respectively; we also say T conforms to G in this case. For our discussion, we call an edge in E a segment, an edge in T which is also a part (or the whole) of some segment a subsegment, and a triangle in T which is also a part (or the whole) of some polygon of F a subface.

For a given constant B and a CDT T of G as input, the constrained Delaunay refinement problem is to add vertices, called Steiner points, into T to eliminate or split most, if not all, bad tetrahedra to generate a new CDT of G. (A tetrahedron t is bad if the ratio of the radius of its circumsphere to the length of its shortest edge is larger than B.) A solution to the problem should also aim to add few Steiner points. The TetGen software by Si [3] is the best CPU solution known to the problem. It can, however, take minutes to hours to compute CDTs for some typical inputs from applications. We thus explore the use of the GPU to address this problem.
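To make the definition of a bad tetrahedron concrete, the following is a minimal sketch of the radius-edge ratio test in plain Python. The function names (`circumradius`, `radius_edge_ratio`, `is_bad`) are chosen here for illustration and are not taken from TetGen or gQM3D; the sketch uses ordinary floating-point arithmetic, whereas a robust mesher would normally rely on exact or adaptive predicates.

```python
import itertools
import math


def sub(p, q):
    return tuple(pi - qi for pi, qi in zip(p, q))


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def circumradius(a, b, c, d):
    """Radius of the sphere through the four vertices of a tetrahedron."""
    u, v, w = sub(b, a), sub(c, a), sub(d, a)
    det = 2.0 * dot(u, cross(v, w))          # 12 times the signed volume
    if det == 0.0:
        raise ValueError("degenerate (flat) tetrahedron")
    vw, wu, uv = cross(v, w), cross(w, u), cross(u, v)
    # Offset from vertex a to the circumcenter.
    offset = tuple((dot(u, u) * vw[i] + dot(v, v) * wu[i] + dot(w, w) * uv[i]) / det
                   for i in range(3))
    return math.sqrt(dot(offset, offset))


def radius_edge_ratio(tet):
    """Circumradius divided by the length of the shortest of the six edges."""
    shortest = min(math.dist(p, q) for p, q in itertools.combinations(tet, 2))
    return circumradius(*tet) / shortest


def is_bad(tet, B):
    """A tetrahedron is bad if its radius-edge ratio is larger than B."""
    return radius_edge_ratio(tet) > B
```

For a regular tetrahedron the ratio is sqrt(6)/4 (about 0.612), so it passes any bound B above that value, while a tetrahedron with one very short edge is flagged as bad.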

Our proposed algorithm gQM3D follows the general Delaunay refinement paradigm, where subsegments, subfaces and bad tetrahedra, collectively called elements, are split in this order over many rounds until there are no more bad tetrahedra. In each round, many elements are split in parallel by many GPU threads. The algorithm first calculates the so-called splitting points that can split elements into smaller ones, then decides on a subset of them to be Steiner points for actual insertion into the triangulation T. Note first that a splitting point is calculated by a GPU thread as the midpoint of a subsegment, the center of the circumcircle of a subface, or the center of the circumsphere of a tetrahedron. Note second that not all splitting points calculated can be inserted as Steiner points in the same round, as together they can create undesirably short edges in T and cause non-termination of the algorithm. So the algorithm must filter away some splitting points.

For a splitting point p, its Delaunay region is the set of elements (subfaces or tetrahedra) that will become non-Delaunay (with their circumcircles or circumspheres, respectively, enclosing p) if p is inserted as a Steiner point into T. We know that for two splitting points with disjoint Delaunay regions, their insertions into T will not result in them forming an edge in T (while T is maintained as a constrained Delaunay triangulation at the end of each round). As such, and to achieve good speedup on the GPU, our algorithm seeks to identify a large set of splitting points with mutually disjoint Delaunay regions in each round. So the problem becomes how to identify disjoint Delaunay regions efficiently.

The trivial way of having one thread take care of one splitting point to calculate its Delaunay region is inefficient, as different threads can need vastly different amounts of computation to process Delaunay regions of different sizes.
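The three kinds of splitting points and the membership test behind a Delaunay region can be sketched as follows. This is an illustrative Python sketch only: the helper names are hypothetical, the arithmetic is plain floating point, and `encloses` covers only the tetrahedron case and ignores the constraints of a CDT.

```python
import math


def sub(p, q):
    return tuple(pi - qi for pi, qi in zip(p, q))


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def segment_midpoint(a, b):
    """Splitting point of a subsegment."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


def triangle_circumcenter(a, b, c):
    """Splitting point of a subface: center of its circumcircle, in 3D."""
    u, v = sub(b, a), sub(c, a)
    n = cross(u, v)                           # normal of the triangle's plane
    nu, vn = cross(n, u), cross(v, n)
    denom = 2.0 * dot(n, n)
    return tuple(a[i] + (dot(v, v) * nu[i] + dot(u, u) * vn[i]) / denom
                 for i in range(3))


def tet_circumcenter(a, b, c, d):
    """Splitting point of a tetrahedron: center of its circumsphere."""
    u, v, w = sub(b, a), sub(c, a), sub(d, a)
    det = 2.0 * dot(u, cross(v, w))
    vw, wu, uv = cross(v, w), cross(w, u), cross(u, v)
    return tuple(a[i] + (dot(u, u) * vw[i] + dot(v, v) * wu[i]
                         + dot(w, w) * uv[i]) / det
                 for i in range(3))


def encloses(tet, p):
    """True if p lies strictly inside the circumsphere of tet, i.e. tet would
    belong to the Delaunay region of p (tetrahedron case only)."""
    center = tet_circumcenter(*tet)
    return math.dist(center, p) < math.dist(center, tet[0])
```

The cost of collecting all elements for which such a test succeeds depends on how many elements the region contains, which is the source of the load imbalance noted above.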
Instead, a good approach should deploy a number of threads in proportion to the size of a Delaunay region so that each thread does a more or less similar amount of work. Such a desirable regularized work approach is developed in our grow-and-blast scheme, as outlined in the next paragraph.

Initially, a thread is assigned to the element where the splitting point is located. This element is also a part of the Delaunay region of the splitting point. The thread then checks the neighbors (subfaces and tetrahedra) of this element to decide whether they are also a part of the Delaunay region of the splitting point. Each such neighbor is marked (grown) as a part of the Delaunay region, and a thread is assigned to it to perform the same kind of checking and marking subsequently. When an element appears as a neighbor to many elements and is to be marked into more than one Delaunay region, only one marking is allowed, while the other regions, with predetermined lower priorities, must be stopped (blasted) and their corresponding splitting points filtered away. The Delaunay regions that remain are mutually disjoint, and their corresponding splitting points are inserted concurrently into T as Steiner points.
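The grow-and-blast idea can be illustrated with a sequential simulation. The sketch below is not the authors' CUDA implementation: it assumes a hypothetical adjacency map `neighbors`, a caller-supplied predicate `in_region(element, point)` standing in for the circumsphere test, a `seeds` map from each splitting point to the element containing it, and distinct numeric priorities. Each (element, point) pair in the frontier plays the role of one GPU thread, so the number of work items grows with the size of a region.

```python
def grow_and_blast(neighbors, in_region, seeds, priority):
    """Return {point: set of elements} for the splitting points that survive.

    neighbors: dict element -> list of adjacent elements (assumed input)
    in_region: callable (element, point) -> bool (assumed predicate)
    seeds:     dict point -> element in which the point is located
    priority:  dict point -> number; higher wins, values assumed distinct
    """
    owner = {}        # element -> splitting point that marked it
    blasted = set()   # splitting points that have been filtered away
    frontier = []     # (element, point) pairs, one simulated thread each

    def try_mark(element, p):
        """Mark element for p; on a clash, blast the lower-priority point."""
        q = owner.get(element)
        if q == p:
            return False                       # already marked by p
        if q is not None and q not in blasted:
            loser = p if priority[p] < priority[q] else q
            blasted.add(loser)
            if loser == p:
                return False                   # p is stopped here
        owner[element] = p
        return True

    for p, element in seeds.items():
        if try_mark(element, p):
            frontier.append((element, p))

    while frontier:                            # one iteration per growth step
        next_frontier = []
        for element, p in frontier:            # conceptually done in parallel
            for n in neighbors[element]:
                if p in blasted:
                    break                      # this region has been stopped
                if in_region(n, p) and try_mark(n, p):
                    next_frontier.append((n, p))
        frontier = next_frontier

    regions = {}
    for element, p in owner.items():
        if p not in blasted:
            regions.setdefault(p, set()).add(element)
    return regions
```

In a real GPU kernel the marking step would need an atomic operation so that concurrent threads resolve a clash consistently; here the sequential loop hides that detail. Every surviving point owns all elements of its region, and each element has a single owner, so the returned regions are mutually disjoint.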

I3D ’19, 21-23 May 2019, Montreal, Quebec, Canada. Zhenghai Chen and Tiow-Seng Tan

Table 1: Comparison among algorithms with 25K input points of the ball distribution. "Tets" denotes tetrahedra. (Column headings: γ, followed by groups of TetGen, gQM3D and gQM3D+.)

All experiments are conducted on a PC with an Intel i7-7700K 4.2GHz CPU, 32GB of DDR4 RAM and a GTX 1080 Ti graphics card with 11GB of video memory. TetGen is the main CPU software we compare against our gQM3D, which is implemented with the CUDA programming model. During our experimentation, we noticed that gQM3D has no particular advantage over the CPU approach for the initial part of the computation. We thus replace this part of gQM3D with TetGen on the CPU to obtain a variant called gQM3D+. We note that CGAL [1] and TetWild [2] are not part of the comparison for now, as they address a slightly different problem that allows output not conforming to the input PLCs.

Figure 1: Speedup of gQM3D (left) and gQM3D+ (right). Axes: rectangles to points ratio, speedup.

Table 1 and Figure 1 report the running time and triangulation quality obtained with synthetic PLCs with points of different distributions. γ is the ratio of the number of polygons (which are mainly rectangles) to the number of points in the input PLC. Both gQM3D and gQM3D+ can achieve a speedup of an order of magnitude while generating outputs of similar size to those of TetGen. Figure 2 shows (in cut-off views) the comparison of the output triangulations of a real-world object for TetGen and gQM3D. The outputs have similar sizes, with the latter having slightly more Steiner points but fewer bad tetrahedra. Both triangulations have a similar distribution of dihedral angles (ranging from 0° to 180°), as shown in the inserted line graphs, and are thus equally good triangulations.

Figure 2: The output triangulations of a triceratops generated by TetGen (left) and gQM3D (right).

We propose the first GPU algorithm for the constrained Delaunay refinement problem. It is designed with regularized work in mind to suit GPU computation. With this work and our continuing effort to optimize our implementations of gQM3D and gQM3D+, the computation of a quality triangulation can possibly become an integral part of interactive engineering or scientific applications. In addition, the approach and strategy used in this work are of independent interest for studying other variants of 3D and surface triangulation problems, such as those addressed by CGAL and TetWild, and for realizing them on the GPU.

REFERENCES

[1] Pierre Alliez, Clément Jamin, Laurent Rineau, Stéphane Tayeb, Jane Tournois, and Mariette Yvinec. 2018. 3D Mesh Generation. In CGAL User and Reference Manual (4.13 ed.). CGAL Editorial Board. https://doc.cgal.org/4.13/Manual/packages.html
[2] ACM Trans. Graph. 37, 4, Article 60 (July 2018), 14 pages. https://doi.org/10.1145/3197517.3201353
[3] Hang Si. 2015. TetGen, a Delaunay-Based Quality Tetrahedral Mesh Generator.