A tetrahedral space-filling curve for non-conforming adaptive meshes
AA tetrahedral space-filling curve for non-conforming adaptive meshes
CARSTEN BURSTEDDE ∗ AND
JOHANNES HOLKE ∗† Abstract.
We introduce a space-filling curve for triangular and tetrahedral red-refinement thatcan be computed using bitwise interleaving operations similar to the well-known Z-order or Mortoncurve for cubical meshes. To store sufficient information for random access, we define a low-memoryencoding using 10 bytes per triangle and 14 bytes per tetrahedron. We present algorithms thatcompute the parent, children, and face-neighbors of a mesh element in constant time, as well as thenext and previous element in the space-filling curve and whether a given element is on the boundaryof the root simplex or not. Our presentation concludes with a scalability demonstration that createsand adapts selected meshes on a large distributed-memory system.
Key words.
Forest of octrees, parallel adaptive mesh refinement, Morton code, high perfor-mance computing, nonconforming simplicial mesh, space-filling curve
AMS subject classifications.
1. Introduction.
Conforming adaptive mesh refinement for simplicial (trian-gular and tetrahedral) meshes is one of the most successful concepts in numericalmathematics and computational science and engineering; see, e.g., [3, 21, 50]. Sim-plices provide high flexibility in meshing to arbitrary domain geometries [46, 47] andcan be mapped to a reference simplex using an elementwise constant Jacobian inmost cases, which allows for an efficient numerical implementation. Discretizationand integration methods of various kinds are available for both low and high ordersof accuracy.Large-scale scientific computing requires fast and scalable algorithms for (1) adap-tive refinement and coarsening (AMR) as well as (2) parallel partitioning. One classof methods for AMR exploits the properties of Delaunay triangulations [23, 26, 45],while partitioning of unstructured meshes is often approached by formulating themesh topology as a graph. These triangulations are usually conforming; that is, ele-ments intersect only along whole faces and edges. Common application codes such asFEniCS [33], PLUM [35,36], OpenFOAM [37], or MOAB from the SIGMA toolkit [53]delegate graph partitioning to third-party software like ParMETIS [29] or Scotch [20].Graph-based algorithms have been advanced to target millions of processes and bil-lions of elements [18,42,49]. Still, increasing the scalability and decreasing the absoluteruntime and memory demands of distributed implementations remains a challenge,and the lack of an obvious parent-child structure in many unstructured meshing ap-proaches prevents certain use cases.Nonconforming AMR in combination with recursive refinement makes refinementand coarsening nearly trivial operations. The additional mathematical logic to enablehanging faces and edges is well understood for both continuous and discontinuousdiscretizations [1, 22, 30, 43]. It is local to the loop over the finite elements or volumesand transparent to most of the numerical pipeline, thus offering the possibility toextend existing conforming codes. The resolution may be as coarse as any chosenroot mesh (a mesh which is not intended for further coarsening), which poses only aslight limit to geometric flexibility.The challenge of efficient partitioning of meshes may be addressed using space- ∗ Institut f¨ur Numerische Simulation (INS) and Hausdorff Center for Mathematics (HCM),Rheinische Friedrich-Wilhelms-Universit¨at Bonn, Germany. † Corresponding author ([email protected])1 a r X i v : . [ c s . D C ] A p r C, BURSTEDDE, J. HOLKE filling curves (SFC). Instead of working on the NP-hard graph partitioning problem[12], SFCs are used to approximately solve the partitioning problem in linear runtime[4, 57]. Common SFCs establish an equivalence between the adaptive mesh and atree, where the mesh elements correspond one-to-one to the leaves of the tree. Theyalso define a total order among the leaves that can be used to store application datalinearly in the same order as the elements [15]. The original concept is specified fora cubic domain and can be traced back to over a hundred years ago [25, 39] . TheSierpinski curve [48] and generalized versions of it are the most common SFCs onsimplicial meshes. These were described by B¨ansch [8], Kossaczk`y [31], and others;see e.g. [4,44]. The computation of parents and children, for example, usually involvesa loop over the refinement levels, equivalent to following the path between the treeroot and a leaf.When using hypercubes as mesh primitives, the logic of common SFCs can beformulated with remarkable simplicity due to the local tensor product structure [39,51,52,54,57]. It is also well-suited to compute topological entities like face-neighbors,children, and parents of given mesh elements. The Morton curve [34], in particular,allows one to design algorithms for these tasks (see, e.g., [16]) whose runtime is level-independent, i.e., does not depend on the depth of the refinement tree. Moreover,SFCs are memory efficient, since for a mesh element only the coordinates of oneanchor node plus its refinement level have to be stored. Today, this principle hasfound its way into widely used software libraries [5, 6]. Memory usage can be furtheroptimized by incremental encoding along the SFC [4, 13, 55].Using SFCs on hexahedral meshes is exceptionally fast and scalable [14, 28, 41].This fact has not only been exploited in writing simulation codes using hexahedralmeshes, but also by approaches that use the hexahedral SFC as an instrument topartition simplices, mapping them into a surrounding cube [2]. However, hexahedralAMR is more limited geometrically than simplicial AMR. One indication for this isthe (lack of) availability of (open-source) mesh generators that operate exclusivelyon cubes. Furthermore, assuming an existing simplicial code, rewriting it in terms ofcubical meshes will be out of the question if the code is of sufficient size, complexity,or maturity. For these reasons, in this paper, we attempt to establish a new SFCfor triangles and tetrahedra that has many of the favorable properties known forhexahedra.Our starting point is to divide the simplices in a refined mesh into two (twodimensions, 2D), respectively six (three dimensions, 3D), different types and selectingfor each type a specific reordering for Bey’s red-refinement [9]. This type and thecoordinates of one vertex serve as a unique identifier, the Tet-id, of the simplex inquestion. In particular, we do not require storing the type of all parent simplices tothe root, as one might naively imagine. We then propose a Morton-like coordinatemapping that can be computed from the Tet-id and gives rise to an SFC. Based onthis logic, we develop constant-time algorithms (that is, not requiring a loop overthe refinement levels) to compute the tetrahedral-Morton identifiers of any parent,child, face-neighbor, and SFC-successor/predecessor and to decide whether for twogiven elements one is an ancestor of the other. We conclude with scalability tests ofmesh creation and adaptation with over 8 . × tetrahedral mesh elements on upto 131,072 cores of the JUQUEEN supercomputer and 786,432 cores of MIRA.
2. Mesh refinement on simplices.
Our aim is to define and examine a newSFC for triangles and tetrahedra by adding ordering prescriptions to the noncon-forming Bey-refinement (also called red-refinement) [9, 10, 56]. We briefly restate the
TETRAHEDRAL SPACE-FILLING CURVE (cid:126)x (cid:126)x (cid:126)x (cid:126)x (cid:126)x (cid:126)x T T T T (cid:126)x (cid:126)x (cid:126)x (cid:126)x (cid:126)x (cid:126)x (cid:126)x (cid:126)x Fig. 1 . Left: The refinement scheme for triangles in two dimensions. A triangle T =[ (cid:126)x , (cid:126)x , (cid:126)x ] ⊂ R is refined by dividing each face at the midpoints (cid:126)x ij . We obtain four smallertriangles, all similar to T . Right: The situation in three dimensions. If we divide the edges of thetetrahedron T = [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] ⊂ R in half, we get four smaller tetrahedra (similar to T ) and aninner octahedron. By dividing the octahedron along any of its three diagonals (shown dashed) wefinally end up with partitioning T into eight smaller tetrahedra, all having the same volume. Therefinement rule of Bey is obtained by always choosing the diagonal from (cid:126)x to (cid:126)x and numberingthe corners of the children according to (2) . red-refinement in this section and contrast it with the well-known conforming (orred/green) refinement.We refer to triangles and tetrahedra as d -simplices, where d ∈ { , } specifies thedimension. It is sometimes convenient to drop d from this notation. A d -simplex T ⊆ R d is uniquely determined by its d +1 affine-independent corner nodes (cid:126)x , . . . , (cid:126)x d ∈ R d .Their order is significant, and therefore we write T = [ (cid:126)x , (cid:126)x , (cid:126)x ] in 2D , (1a) T = [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] in 3D . (1b)We define (cid:126)x as the anchor node of T . By (cid:126)x ij we denote the midpoint between (cid:126)x i and (cid:126)x j , thus (cid:126)x ij = ( (cid:126)x i + (cid:126)x j ). Bey’s rule is a prescription for subdividing a sim-plex. It is one instance of the so-called red-refinement, where all faces of a simplexare subdivided simultaneously.
Definition Given a d -simplex T = [ (cid:126)x , . . . , (cid:126)x d ] ⊂ R d , the refinement rule ofBey consists of cutting off four subsimplices at the corners (as in Figure 1). In 3Dthe remaining octahedron is then divided along the diagonal from (cid:126)x to (cid:126)x . Beynumbers the d resulting subsimplices as follows. (2a) 2 D : T := [ (cid:126)x , (cid:126)x , (cid:126)x ] , T := [ (cid:126)x , (cid:126)x , (cid:126)x ] ,T := [ (cid:126)x , (cid:126)x , (cid:126)x ] , T := [ (cid:126)x , (cid:126)x , (cid:126)x ] , (2b) 3 D : T := [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] , T := [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] ,T := [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] , T := [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] ,T := [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] , T := [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] ,T := [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] , T := [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] . C, BURSTEDDE, J. HOLKE
XY S S c = (cid:18) (cid:19) c c (cid:18) (cid:19) = c S S S S S X YZ S c c c c c c c c (3a) 2D (cid:126)x (cid:126)x (cid:126)x S c c c S c c c (3b) 3D (cid:126)x (cid:126)x (cid:126)x (cid:126)x S c c c c S c c c c S c c c c S c c c c S c c c c S c c c c Fig. 2 . The basic triangle (2D) and tetrahedra types (3D) obtained by dividing [0 , d intosimplices of varying types, denoted by a subscript. Top left: The unit square can be divided into twotriangles sharing the edge from (0 , T to (1 , T . We denote these triangles by S and S . Thefour corners of the square are numbered c , . . . , c in yx -order. Top right: The corner nodes of S and S in terms of the square corners. Bottom left (exploded view): In three dimensions the unitcube can be divided into six tetrahedra, all sharing the edge from the origin to (1 , , T . We denotethese tetrahedra by S , . . . , S . The eight corners of the cube are numbered c , . . . , c in zyx -order(redrawn and modified with permission [9]). Bottom right: The corner nodes of the six tetrahedra S , . . . , S in terms of the cube corners. Definition In this paper, a refinement of a d -simplex S denotes a set S of d -dimensional subsimplices of S that can be constructed from S via successive re-finement, where only the finest simplices belong to the actual refinement. Thus allrefinements can be obtained applying the following rules: • S = { S } is a refinement of S . • If S (cid:48) is a refinement of S , then S = ( S (cid:48) \ { T } ) ∪ { T , . . . , T d − } is arefinement for every T ∈ S (cid:48) .We explicitly allow nonuniform refinements and thus nonconforming faces and edges. Definition The T i from (2) are called the children of T , and T is called the parent of the T i , written T = P ( T i ) . Therefore, we also call the T i siblings of eachother. If a d -simplex T belongs to a refinement of another d -simplex S , then T is a descendant of S , and S is an ancestor of T . The number (cid:96) of refining steps neededto obtain T from S is unique and called the level of T (with respect to S ); we write (cid:96) = (cid:96) ( T ) . Usually S is clear from the context, and therefore we omit it in the notation.By definition, T is an ancestor and descendant of itself. Consider the six tetrahedra S , . . . , S ⊂ R displayed in Figure 2. These tetra-hedra form a triangulation of the unit cube. The results and algorithms in this paper TETRAHEDRAL SPACE-FILLING CURVE refining the cube (cid:47) (cid:47) triangulating the cube (cid:15) (cid:15) triangulating each cube (cid:15) (cid:15) refining each cube (cid:47) (cid:47) triangulating each cube (cid:15) (cid:15) refiningeach tetrahedron (cid:47) (cid:47) refiningeach tetrahedron (cid:47) (cid:47) Fig. 3 . Triangulating a cube according to Figure 2 and then refining the tetrahedra via Bey’srefinement rule results in the same mesh as first refining the cube into eight subcubes and afterwardtriangulating each of these cubes. Each occurring tetrahedron is uniquely determined by the subcubeit lies in plus its type. The same situation can be observed in 2D if we restrict our view to one sideof the cube. rely on the following property [9].
Property Refining the six tetrahedra from the triangulation of the unit cubesimultaneously to level (cid:96) results in the same mesh as first refining the unit cube to level (cid:96) and then triangulating each smaller cube with the six tetrahedra S , . . . , S , scaledby a factor of − (cid:96) (see Figure 3). The same behavior can be observed in 2D when theunit square is divided into two triangles. It is worthnoting that, although the methods and algorithms presented in this paper apply tored-refined meshes with hanging nodes, it is possible to augment them to create mesheswithout hanging nodes. For this we may use red/green or red/green/blue refinementmethods [7, 17].After the red-refinement step we may add an additional and possibly nonlocalrefinement operation that ensures a maximum level difference of 1 between neigh-boring simplices. Such an operation is also called 2:1 balance [27, 52, 54]; its fulldescription, however, would exceed the scope of the present paper. Hanging nodesare then resolved by bisecting those simplices with hanging nodes (green/blue refine-ment) [4, section 12.1.3]. The 2D case is shown in Figure 4.If one of the newly created simplices shall be further refined, the bisection isreversed, the original simplex is red-refined, and the balancing and green-refinementis repeated. This may void the nesting property of certain discrete function spaces,yet applications may still prefer this approach over the manual implementation ofhanging node constraints.
3. A tetrahedral Morton index.
The Morton index or Z-order for a cube ina hexahedral mesh is computed by bitwise interleaving the coordinates of the anchornode of the cube [34]. In this section we present an index for d -simplices that also C, BURSTEDDE, J. HOLKE
Fig. 4 . To resolve hanging nodes we can execute an additional step of green- or blue-refinementafter the last refinement step [7,17]. Here we show the 2D refinement rules. Left: green (1 hangingnode). Right: blue (2 hanging nodes). If a triangle has 3 hanging nodes it is red-refined. uses the bitwise interleaving approach, the tetrahedral Morton index (TM-index). Todefine the TM-index we look at refinements of a reference simplex, which we discussin Section 3.1 below. For each d -simplex in a refinement of the reference simplex wedefine a unique identifier, the so-called Tet-id, which serves as the input to compute theTM-index and for all algorithms related to it. This Tet-id consists of the coordinatesof the anchor node of the considered simplex plus one additional number, the typeof the simplex. We define the Tet-id and type in Section 3.2. We then define theTM-index in Section 3.3 and in the following subsections. We show that it possesseslocality properties similar to those of the Morton index. One novel aspect of thisconstruction lies in logically including the types of the simplex and all its parents inthe interleaving, while only using the type of the simplex itself in the algorithms. Throughout the rest of this paper, let L be agiven maximal refinement level. Instead of the unit cube [0 , d , we consider thescaled cube [0 , L ] d , ensuring that all node coordinates in a refinement up to level L are integers. Suppose we are given some d -simplex T ⊂ R d together with a refinement S of T . By mapping T affine-linearly to 2 L S the refinement S is mapped to arefinement S (cid:48) of 2 L S . Therefore, to examine SFCs on refinements of T , it sufficesto examine SFCs on 2 L S . Thus, we only consider refinements of the d -simplex T d := 2 L S . Let T d be the set of all possible descendants of this d -simplex with levelsmaller than or equal to L ; thus(4) T d = (cid:8) T | T is a descendant of T d with 0 ≤ (cid:96) ( T ) ≤ L (cid:9) . Any refinement (up to level L ) of T d is a subset of T d , and for each T ∈ T d thereexists at least one refinement T of T d with T ∈ T . In this context, we refer to T d as the root simplex . Furthermore, let L d denote the set of all possible anchor nodecoordinates of d -simplices in T d , thus(5) L = (cid:8) [0 , L ) ∩ Z | y ≤ x (cid:9) , L = (cid:8) [0 , L ) ∩ Z | y ≤ z ≤ x (cid:9) . Note that we could have chosen any other of the S i (scaled by 2 L ) as the root simplexand we do not see any advantage or disadvantage in doing so. d -simplex. Making use of Property 4, wedefine the following.
Definition Each d -simplex T ∈ T d of level (cid:96) lies in a d -cube of the hexahedralmesh that is part of a uniform level (cid:96) refinement of [0 , L ] d . This specific cube is the associated cube of T and denoted by Q T . The d -simplex T is a scaled and shiftedversion of exactly one of the six tetrahedra S i that constitute the unit cube, and wedefine the type of T as this number, type( T ) := i . TETRAHEDRAL SPACE-FILLING CURVE T T T T b 0 0 0 0 11 1 1 1 0 Ct Child3D T , . . . , T T T T T b 0 0 4 5 2 11 1 3 2 5 02 2 0 1 4 33 3 5 4 1 24 4 2 3 0 55 5 1 0 3 4 Table 1
For a d -simplex T of type b the table gives the types Ct ( T ) , . . . , Ct ( T d − ) of T (cid:48) s children. Thecorner-children T , T , T (and in 3D also T ) always have the same type as T . The anchor node of a subcube of level (cid:96) is the particular corner of that cube with thesmallest x -, y - (and z -) coordinates. This means that for each simplex T in the refine-ment from Figure 3 the anchor node of T and the anchor node of its associated cubecoincide. Any two d -simplices in T d with the same associated cube are distinguishableby their type.From Bey’s observation from Figure 3 it follows that any simplex in T d can beobtained by specifying a level (cid:96) , then choosing one level (cid:96) subcube of the root cubeand finally fixing a type. This provides motivation for the following definition. Definition
For T = [ (cid:126)x , . . . , (cid:126)x d ] ∈ T d we define the Tet-id of T asthe tuple of its anchor node and type; thus (6) Tet-id ( T ) := ( (cid:126)x , type( T )) . Corollary Let
T, T (cid:48) ∈ T d . Then T = T (cid:48) if and only if their Tet-ids andlevels are the same. Note that in an arbitrary adaptive mesh there can be simplices with different levelsand each simplex T has an associated cube of level (cid:96) ( T ). In particular, simplices withthe same anchor node can have different associated cubes if their levels are not equal.Since any simplex in T d can be specified by the Tet-id and level, the Tet-id providesan important tool for our work. The construction of the TM-index in the next sectionand the algorithms that we present in Section 4 rely on the Tet-id as the fundamentaldata of a simplex. All information about a mesh can be extracted from the Tet-idand level of each element.Since the root simplex has type 0, in a uniform refinement more simplices havetype 0 than any other type. However, a close examination of Table 1 together with ashort inductive argument leads to the following proposition. Proposition In the limit (cid:96) → ∞ the different types of simplices in a uniformlevel (cid:96) refinement of T d occur in equal ratios. In addition to the anchorcoordinates the TM-index also depends on the types of all ancestors of a simplex. Inorder to define the TM-index we start by giving a formal definition of the interleavingoperation and some additional information.
Definition We define the interleaving a ˙ ⊥ b of two n -tuples a = ( a n − ,. . . , a ) and b = ( b n − , . . . , b ) as the n -tuple obtained by alternating the entries C, BURSTEDDE, J. HOLKE of a and b : (7) a ˙ ⊥ b := ( a n − , b n − , . . . , a , b ) . The interleaving of more than two n -tuples a , . . . , a m is defined analogously as the mn -tuple (8) a ˙ ⊥ · · · ˙ ⊥ a m := ( a n − , a n − , . . . , a mn − , a n − , . . . , a m − , a m ) . Remark
The TM-index of a d -simplex T ∈ T d that we are going to defineis constructed by interleaving d + 1 L -tuples, where the first d are the binary repre-sentations of the coordinates of T ’s anchor node and the last is the tuple consistingof the types of the ancestors of T . Definition
Let T ∈ T be a tetrahedron of refinement level (cid:96) with anchornode (cid:126)x = ( x, y, z ) T ∈ L . Since x, y, z ∈ N with ≤ x, y, z < L , we can expressthem as binary numbers with L digits, writing (9) x = L− (cid:88) j =0 x j j , y = L− (cid:88) j =0 y j j , z = L− (cid:88) j =0 z j j . We define the L -tuples X , Y , and Z as the L -tuples consisting of the binary digits of x , y , and z ; thus, X = X ( T ) := ( x L− , . . . , x ) , (10a) Y = Y ( T ) := ( y L− , . . . , y ) , (10b) Z = Z ( T ) := ( z L− , . . . , z ) . (10c) In 2D we get the same definitions with X and Y , leaving out the z -coordinate. Definition
For a T ∈ T d of level (cid:96) and each ≤ j ≤ (cid:96) let T j be the (unique)ancestor of T of level j . In particular, T (cid:96) = T . We define B ( T ) as the L -tupleconsisting of the types of T ’s ancestors in the first (cid:96) entries, starting with T . Thelast L − (cid:96) entries of B ( T ) are zero: (11) B = B ( T ) := type( T ) , type( T ) , . . . , type( T ) (cid:124) (cid:123)(cid:122) (cid:125) (cid:96) entries , , . . . , , Thus, if we write B as an L -tuple with indexed entries b i (12) B = B ( T ) = ( b L− , . . . , b ) ∈ { , . . . , d ! − } L , then the i th entry b i is given as (13) b i = (cid:40) type( T L− i ) L − ≥ i ≥ L − (cid:96), L − (cid:96) > i ≥ . Definition
13 (tetrahedral Morton Index).
We define the tetrahedral Mortonindex ( TM-index ) m ( T ) of a d -simplex T ∈ T d as the interleaving of the L -tuples Z (for tetrahedra), Y , X and B . Thus, (14a) m ( T ) := Y ˙ ⊥ X ˙ ⊥ B TETRAHEDRAL SPACE-FILLING CURVE for triangles and (14b) m ( T ) := Z ˙ ⊥ Y ˙ ⊥ X ˙ ⊥ B for tetrahedra. This index resembles the well-known Morton index or Z-order for d -dimensional cubes,which we denote by (cid:101) m here. For such a cube Q the Morton index is usually definedas the bitwise interleaving of its coordinates. Thus (cid:101) m ( Q ) = Z ˙ ⊥ Y ˙ ⊥ X , respectively, (cid:101) m ( Q ) = Y ˙ ⊥ X ; see [16, 34, 52].As we show in Section 4, the TM-index can be computed from the Tet-id of T with no further information given. Thus, in an implementation it is not necessary tostore the L -tuple B .The TM-index of a d -simplex builds up from packs of d bits z i (for tetrahedra), y i , and x i followed by a type b i ∈ { , . . . , d ! − } . Since d ! = 2 < d = 2, we caninterpret the 2D TM-index as a quarternary number with digits ( y i x i ) and b i :(15a) m ( T ) = (( y L− x L− ) , b L− , . . . , ( y x ) , b ) = L− (cid:88) i =0 (cid:0) (2 y i + x i )4 i +1 + b i i (cid:1) . Similarly we can interpret it as an octal number with digits ( z i y i x i ) and b i for d = 3,since then d ! = 6 < m ( T ) = (( z L− y L− x L− ) , b L− , . . . , ( z y x ) , b ) = L− (cid:88) i =0 (cid:0) (4 z i + 2 y i + x i )8 i +1 + b i i (cid:1) . The entries in these numbers are only nonzero up to the level (cid:96) of T , since x L− i = y L− i = ( z L− i =) b L− i = 0 for all i > (cid:96) . The octal/quarternary representation (15)directly gives an order on the TM-indices, and therefore it is possible to construct anSFC from it, which we examine further in Section 3.6. We use m ( T ) to denote boththe ( d + 1) L -tuple from (14) and the number given by (15).Let us look at Figure 5 for a short example to motivate this definition of the TM-index. Since the anchor coordinates and the type together with the level uniquelydetermine a d -simplex in T d , one could ask why we do not define the index to be(( Z ˙ ⊥ ) Y ˙ ⊥ X, type( T )), a pair consisting of the Morton index of the associated cubeof T and the type of T . This index was introduced for triangles in a slightly modifiedversion as semiquadcodes in [38] and would certainly require less information sincethe computation of the sequence B would not be necessary. However, it results in anSFC that traverses the leaf cubes of a refinement in the usual Z-order and inside ofeach cube it traverses the d ! different simplices in the order S , . . . , S d ! − . As a result,there can be simplices T whose children are not traversed as a group, which meansthat there is a tetrahedron T (cid:48) , which is not an ancestor or descendant of T , such thatsome child T i of T is traversed before T (cid:48) and T (cid:48) is traversed before another child T j of T . Theorem 16 states that this problem does not occur with the TM-index thatwe have defined above. Figure 5 compares the two approaches for a uniform level 2refinement of T . There is another in-terpretation of the TM-index, which is particularly useful for the AMR algorithms0
C, BURSTEDDE, J. HOLKE
Y Xm T = ( Y ˙ ⊥ X, type( T )) Y Xm T = Y ˙ ⊥ X ˙ ⊥ B Fig. 5 . Comparing a straightforward definition of a Morton-type SFC with our approach. Left:The SFC arising from taking the Morton order of the quadrants and only dividing into triangles onthe last level. Thus the index is ( Y ˙ ⊥ X, type( T )) . As we see on the two coarse triangles that areshaded, the children of a level 1 triangle are not necessarily traversed before any other triangle istraversed. Such a locality property would be desirable, and therefore this index is not suitable forour purposes. Right: The SFC arising from the TM-index from our Definition 13. We see that foreach level 1 triangle its four children are traversed as a group. Theorem 16 states that this propertyholds for any parent triangle/tetrahedron. The order in which the children are traversed depends(only) on the type of the parent and is different from Bey’s order given by (2) . presented in Section 4. In order to define it we introduce the concept of the so-calledcube-id. According to Figure 2 we number the 2 d corners of a d -dimensional cube by c , . . . , c d − in a zyx -order ( x varies fastest). When refining a cube to 2 d children,each child has exactly one of the c i as a corner, and it is therefore convenient tonumber the children by c , . . . , c d − as well. For the child c i we call the number i the cube-id of that child; see Figure 6 for an illustration. Since each cube Q thatis not the root has a unique parent, it also has a unique cube-id. This cube-id caneasily be computed by interleaving the last significant bits of the z - (in 3D), y -, and x -coordinates of Q ’s anchor node. Definition
Because each d -simplex T ∈ T d of level (cid:96) has a unique associatedcube we define the cube-id of T to be the cube-id of the associated cube of T , that is,the d -cube of level (cid:96) that has the same anchor node as T . If X , Y (and Z ) are as is in Definition 11 then we can write the cube-id of T ’s ancestorsas(16) cube-id( T i ) = ( y i x i ) in 2D , cube-id( T i ) = ( z i y i x i ) in 3D , and therefore using (15) we can rewrite the TM-index of T as(17) m ( T ) = (cube-id( T ) , type( T ) , . . . , cube-id( T (cid:96) ) , type( T (cid:96) ) , , . . . , d . This resembles the Morton index of the associated cube Q T of T , since we can writethis as(18) (cid:101) m ( Q T ) = (cube-id( Q ) , . . . , cube-id( Q (cid:96) ) , , . . . , d . In this section we show that the TM-indexhas properties similar to that of the Morton index for cubes. We are particularly in-terested in locality properties stating that refining a simplex only locally changes the
TETRAHEDRAL SPACE-FILLING CURVE XYc c c c X YZc c c c c c c
45 6701 23 c Fig. 6 . Left: A square is refined to four children, each of which corresponds to a corner of thesquare. The number of the corner is the cube-id of that child. Right: In three dimensions a cube isrefined to eight children. Their cube-ids and corner numbers are shown as well. order given by the TM-index. We begin by stating the uniqueness of the TM-indexif additionally a level (cid:96) is given.
Proposition
Together with a refinement level (cid:96) , the TM-index m ( T ) unique-ly determines a d -simplex in T d .Proof. If (cid:96) = 0, then there is only one simplex of level (cid:96) in T d , which is T d . Solet (cid:96) > m = m ( T ) be given as in (14), and let (cid:96) be the level of T . From m wecan compute the x - , y - (and z -) coordinates of the associated cube of T . We can alsocompute the type of T from the TM-index. By Corollary 7 this information uniquelydetermines T .For the Morton index (cid:101) m for cubes the following important properties are known [52]:(i) The Morton indices of the descendants of a parent cube are larger than or equalto the index of the parent cube.(ii) A Morton index of a cube Q is the prefix of an index of a cube P of higher levelthan Q if and only if P is a descendant of Q .(iii) Refining only changes the SFC locally. Thus, if Q is a cube and P is a cubewith (cid:101) m ( Q ) < (cid:101) m ( P ) and P is not a descendant of Q , then (cid:101) m ( Q (cid:48) ) < (cid:101) m ( P ) foreach descendant Q (cid:48) of Q .Property (iii) defines a hierarchic invariant of the SFC that is specific to our con-struction (see Figure 5). We show below that properties (i), (ii) and (iii) hold for d -simplices and the TM-index described by (14). Theorem
For arbitrary d -simplices T (cid:54) = S ∈ T d the TM-index satisfies thefollowing:(i) If S is a descendant of T then m ( T ) ≤ m ( S ) .(ii) If (cid:96) ( T ) < (cid:96) ( S ) , then m ( T ) is a prefix of m ( S ) if and only if S is a descendantof T .(iii) If m ( T ) < m ( S ) and S is no descendant of T , then for each descendant T (cid:48) of T we have (19) m ( T ) ≤ m ( T (cid:48) ) < m ( S ) . The proof of Theorem 16 requires some work and we need to show a technical resultfirst. Hereby, we consider only the 3D case, since for 2D the argument is completelyanalogous. We define an embedding of the set of all TM-indices into the set of Mortonindices for 6D cubes. Since the properties (i)–(iii) hold for these cubes it follows that2
C, BURSTEDDE, J. HOLKE they hold for tetrahedra as well. To this end, for a given tetrahedron T ∈ T weinterpret each entry b j of B ( T ) as a 3-digit binary number(20) b j = ( b j b j b j ) , which is possible since b j ∈ { , . . . , } . We obtain three new L -tuples B , B , B satisfying(21) B = B ˙ ⊥ B ˙ ⊥ B , and thus we can rewrite the TM-index as(22) m ( T ) = Z ˙ ⊥ Y ˙ ⊥ X ˙ ⊥ B ˙ ⊥ B ˙ ⊥ B . Note that we can interpret each B i as an L -digit binary number for which we have0 ≤ B i < L . Now let Q denote the set of all 6D cubes that are a child of the cube Q := [0 , L ] :(23) Q = { Q | Q is a descendant of Q of level 0 ≤ (cid:96) ≤ L } . Since a cube Q ∈ Q is uniquely determined by the six coordinates ( x , . . . , x ) ofits anchor node plus its level (cid:96) , we also write Q = Q ( x ,...,x ) ,(cid:96) . Note that theMorton index for a cube can be defined as the bitwise interleaving of its anchor nodecoordinates [34]:(24) (cid:101) m ( Q ) = X ˙ ⊥ X ˙ ⊥ X ˙ ⊥ X ˙ ⊥ X ˙ ⊥ X . Proposition
The map (25) Φ : T −→ Q ,T (cid:55)−→ Q ( B ( T ) ,B ( T ) ,B ( T ) ,x ( T ) ,y ( T ) ,z ( T )) ,(cid:96) ( T ) is injective and satisfies (26) (cid:101) m (Φ( T )) = m ( T ) . Furthermore, it fulfills the property that T (cid:48) is a child of T if and only if Φ( T (cid:48) ) is achild of Φ( T ) .Proof. The equation m ( T ) = (cid:101) m (Φ( T )) follows directly from the definitions of theTM-indices on T and Q . From Lemma 15 we conclude that Φ is injective. Now let T (cid:48) , T ∈ T , where T (cid:48) is a child of T . Furthermore, let (cid:96) = (cid:96) ( T ). We know that Q (cid:48) := Φ( T (cid:48) ) is a child of Q := Φ( T ) if and only if for each i ∈ { , . . . , } it holds that(27) x i ( Q (cid:48) ) ∈ (cid:110) x i ( Q ) , x i ( Q ) + 2 L− ( (cid:96) +1) (cid:111) . Because of the underlying cube structure (compare Figure 3) we know that the x -coordinate of the anchor node of T (cid:48) satisfies(28) x ( T (cid:48) ) ∈ (cid:110) x ( T ) , x ( T ) + 2 L− ( (cid:96) +1) (cid:111) , and likewise for Y ( T (cid:48) ) and Z ( T (cid:48) ). Therefore, (27) holds for i = 3 , ,
5. By definition B j ( T (cid:48) ) is the same as B j ( T ) except at position L − ( (cid:96) + 1), where(29) B j ( T (cid:48) ) L− ( (cid:96) +1) = b j L− ( (cid:96) +1) ( T (cid:48) ) ∈ { , } TETRAHEDRAL SPACE-FILLING CURVE B j ( T ) L− ( (cid:96) +1) = 0 . Hence, we conclude that (27) also holds for i = 0 , ,
2. So Φ( T (cid:48) ) is a child of Φ( T ).To show the other implication, let us assume that Φ( T (cid:48) ) is a child of Φ( T ). Since (cid:96) ( T (cid:48) ) = (cid:96) (Φ( T (cid:48) )) > T (cid:48) has a parent and we denote it by P . In the argument abovewe have shown that Φ( P ) is the parent of Φ( T (cid:48) ) and because each cube has a uniqueparent the identity Φ( P ) = Φ( T ) must hold. Therefore, we get P = T since Φ isinjective; thus, T (cid:48) is the child of T .Inductively we conclude that T (cid:48) is a descendant of T if and only if Φ( T (cid:48) ) is a descen-dant of Φ( T ). Now Theorem 16 follows, because the desired properties (i)–(iii) holdfor the Morton index of cubes. By interpretingthe TM-indices as 2 d -ary numbers as in (15) we get a total order on the set of allpossible TM-indices, and therefore it gives rise to an SFC for any refinement S of T d . In this section we further examine the SFC derived from the TM-index. We givehere a recursive description of it, similarly to how it is done for the Sierpinski curveand other cubical SFC by Haverkort and van Walderveen [24].Part (iii) of Theorem 16 tells us that the descendants of a simplex T are traversedbefore any other simplices with a higher TM-index than T are traversed. However, theorder that the children of T have relative to each other can be different to the orderof children of another simplex T (cid:48) . In particular the order of the simplices defined bythe TM-index differs from the order (2) defined by Bey. We observe this behavior in2D in Figure 5 on the right-hand side: For the level 1 triangles of type 0 the childrenare traversed in the order(31) T , T , T , T and the children of the level 1 triangle of type 1 are traversed in the order(32) T , T , T , T . In fact, the order of the children of a simplex T depends only on the type of T , as weshow in the following Proposition. Proposition If T, T (cid:48) ∈ T d are two d -simplices of given type b = type( T ) =type( T (cid:48) ) , then there exists a unique permutation σ ≡ σ b of (cid:8) , . . . , d − (cid:9) such that (33) m ( T σ (0) ) < m ( T σ (1) ) < · · · < m ( T σ (2 d − ) , and m ( T (cid:48) σ (0) ) < m ( T (cid:48) σ (1) ) < · · · < m ( T (cid:48) σ (2 d − ) . Thus, the children of T and the children of T (cid:48) are in the same order with respect totheir TM-index.Proof. By ordering the children of T and T (cid:48) with respect to their TM-indices, weobtain σ and σ (cid:48) with(34) m ( T σ (0) ) < m ( T σ (1) ) < · · · < m ( T σ (2 d − ) ,m ( T (cid:48) σ (cid:48) (0) ) < m ( T (cid:48) σ (cid:48) (1) ) < · · · < m ( T (cid:48) σ (cid:48) (2 d − ) . C, BURSTEDDE, J. HOLKE
These permutations are well-defined and unique with this property because differentsimplices of the same level never have the same TM-index; see Proposition 15. Itremains to show that σ (cid:48) = σ . Let (cid:96) = (cid:96) ( T ) and (cid:96) (cid:48) = (cid:96) ( T (cid:48) ). Since the TM-indices ofthe children of T do all agree up to level (cid:96) , we see, using the notation from (15), thattheir order σ depends only on the d + 1 numbers ( z is omitted for d = 2)(35) z L− ( (cid:96) +1) ( T i ) , y L− ( (cid:96) +1) ( T i ) , x L− ( (cid:96) +1) ( T i ) and b L− ( (cid:96) +1) ( T i ) . The same argument applies to σ (cid:48) and (cid:96) (cid:48) . From now on we carry out the computationsfor d = 3. Since type( T ) = type( T (cid:48) ) we can write(36) T = λT (cid:48) + (cid:126)c, with(37) λ = 2 (cid:96) (cid:48) − (cid:96) , (cid:126)c = x ( T ) − x ( T (cid:48) ) y ( T ) − y ( T (cid:48) ) z ( T ) − z ( T (cid:48) ) . Since the refinement rules (2) commute with scaling and translation we also obtain(38) T i = λT (cid:48) i + (cid:126)c for the children of T and T (cid:48) and therefore(39) b L− ( (cid:96) +1) ( T i ) = type( T i ) = type( T (cid:48) i ) = b L− ( (cid:96) (cid:48) +1) ( T (cid:48) i )for 0 ≤ i < d . Furthermore, we have(40) x L− ( (cid:96) +1) ( T i ) = ( x ( T i ) − x ( T ))2 −L +( (cid:96) +1) from which we derive(41) x L− ( (cid:96) +1) ( T i ) = λ ( x ( T (cid:48) i ) − x ( T (cid:48) ))2 −L +( (cid:96) +1) = 2 (cid:96) (cid:48) − (cid:96) ( x ( T (cid:48) i ) − x ( T (cid:48) ))2 −L +( (cid:96) +1) = ( x ( T (cid:48) i ) − x ( T (cid:48) ))2 −L +( (cid:96) (cid:48) +1) = x L− ( (cid:96) (cid:48) +1) ( T (cid:48) i ) , and analogously(42) y L− ( (cid:96) +1) ( T i ) = y L− ( (cid:96) (cid:48) +1) ( T (cid:48) i ) ,z L− ( (cid:96) +1) ( T i ) = z L− ( (cid:96) (cid:48) +1) ( T (cid:48) i ) . This shows that the tetrahedral Morton order of the children of T and T (cid:48) are thesame and σ (cid:48) must equal σ . Definition
Let T ∈ T d such that T ’s parent P has type b and T is the i thchild of P according to Bey’s order (2) , ≤ i < d . We call the number σ b ( i ) the local index of the d -simplex T and use the notation (43) I loc ( T ) := σ b ( i ) to denote the child number in the TM-ordering, subsequently written TM-child. Bydefinition, the local index of the root simplex is zero, I loc ( T d ) := 0 . Table 2 lists thelocal indices for each parent type. TETRAHEDRAL SPACE-FILLING CURVE I loc Child2D T T T T b 0 0 1 3 21 0 2 3 1 I loc Child3D T T T T T T T T b 0 0 1 4 7 2 3 6 51 0 1 5 7 2 3 6 42 0 3 4 7 1 2 6 53 0 1 6 7 2 3 4 54 0 3 5 7 1 2 4 65 0 3 6 7 2 1 4 5 Table 2
The local index for the children of a d -simplex T according to the TM-ordering. For each type b ,the d children T , . . . , T d − of a simplex of this type can be ordered according to their TM-indices.The position of T i according to the TM-order is the local index I loc ( T i ) = σ b ( i ) . type = 0R: 31 20 RF RR type = 1F: 0 32 1 RFF F Fig. 7 . Left: Using the notation from [24] we recursively describe the SFC arising from theTM-index for triangles. The number inside each child triangle is its local index. R denotes therefinement scheme for type 0 triangles and F for type 1 triangles. This pattern can be obtained fromTables 1 and 2. Right: The SFC for an example adaptive refinement of the root triangle. Thus, we know for each type 0 ≤ b < d ! how the children of a tetrahedron of type b are traversed. This gives us an approach for describing the SFC arising from theTM-index in a recursive fashion [24]. By specifying for each possible type b the orderand types of the children of a type b simplex, we can build up the SFC. In Figure 7we describe the SFC for triangles in this way. In three dimensions it is not convenientto draw the six pictures for the different types, but the SFC can be derived similarlyfrom Tables 1 and 2.
4. Algorithms on simplices.
In this section we present fundamental algo-rithms that operate on d -simplices in T d . These algorithms include computations ofparent and child simplices, computation of face-neighbors and computations involvedwith the TM-index. To simplify the notation we carry out all algorithms for tetrahe-dra and then describe how to modify them for triangles. We introduce the data type Tet and do not distinguish between the abstract concept of a
Tet and the geomet-ric object (tetrahedron or triangle) that it represents. The data type
Tet T has thefollowing members: • T.(cid:96) — the refinement level of T ; • T.(cid:126)x = (
T.x, T.y, T.z ) — the x - , y - and z -coordinates of T ’s anchor node, alsosometimes referred to as T.x , T.x , and T.x ; • T.b — the type of T .In 2D computations the parameter T.z is not present. To avoid confusion we use the6
C, BURSTEDDE, J. HOLKE notation (cid:126)x to denote vectors in Z d and x (without arrow) for integers, thus numbersin Z . From Corollary 7 we know that the values stored in a Tet suffice to uniquelyidentify a d -simplex T ∈ T . Remark
20 (Storage requirement).
The algorithms that we present in this sectiononly need this data as input for a simplex resulting in a fixed storage size per
Tet . If,for example, the maximum level L is 32 or less, then the coordinates can be storedin one -byte integer per dimension, while the level and type occupy one byte each,leading to a total storage of (44) 2 × bytes per Tet in 2D, × bytes per Tet in 3D.
Remark
21 (Runtime).
Most of these algorithms run in constant time indepen-dent of the maximum level L . The only operations using a loop over the level L or T.(cid:96) ,thus having O ( L ) runtime, are computing the consecutive index from a Tet and ini-tializing a
Tet according to a given consecutive index. Hence, we show how to replacerepetitive calls of these relatively involved algorithms by more efficient constant-timeones. d -simplex. The coordinates of the d + 1 nodes ofa d -simplex T can be obtained easily from its Tet-id, the relation (3), and simplearithmetic: If T is a d -simplex of level (cid:96) , type b and anchor node (cid:126)x ∈ Z d , then(45) T = 2 L− (cid:96) S b + (cid:126)x . Hence, in order to compute the coordinates of T we can take the coordinates of S b ,as given in (3), and then use relation (45). A closer look at (3) reveals that it is notnecessary to examine all coordinates of S b in order to compute the x i , but that theycan also be computed arithmetically. This computation is carried out in Algorithm4.1. Algorithm 4.1:
Coordinates ( Tet T ) X ← ( T.(cid:126)x, , , h ← L− (cid:96) ; i ← (cid:98) T.b (cid:99) ; /* Replace with i ← T.b for 2D */ if T.b % then j ← ( i + 2) % else j ← ( i + 1) % X [1] ← X [0] + he i ; X [2] ← X [1] + he j ; /* Replace with X [2] ← X [0] + ( h, h ) T for 2D */ X [3] ← X [0] + ( h, h, h ) T ; /* Remove this line for 2D */ return X ; In this section we describe how to compute the Tet-idsof the parent P ( T ) and of the 2 d children T i , 0 ≤ i < d , of a given d -simplex T ∈ T d .Computing the anchor node coordinates of the parent is easy, since their first T.(cid:96) − T ’s anchor node and the rest of their bits is zero. TETRAHEDRAL SPACE-FILLING CURVE Pt( c, b ) b2D 0 1 c c, b ) b3D 0 1 2 3 4 5 c Z YX
Fig. 8 . The type of the parent of a d -simplex T can be determined from T ’s cube-id c and type b . Left: The values of P t from (46) in the 2D case. Middle: The values of the function
P t in the3D case. Right: Two examples showing the computation in 3D. (1) The small tetrahedron in thetop left corner (orange) has cube-id and type , and its parent (green) can be seen to have type .(2) The small tetrahedron at the bottom right (red) has cube-id and type , and its parent (blue)has type . For computing the type of P ( T ), we need the function(46) Pt : (cid:8) , . . . , d − (cid:9) × { , . . . , d ! − } −→ { , . . . , d ! − } , (cube-id( T ) , T.b ) (cid:55)−→ P.b, giving the type of T ’s parent in dependence of its cube-id and type. In Figure 8 welist all values of this function for d ∈ { , } . Algorithm 4.2: c-id ( Tet T ,int (cid:96) ) i ← h ← L− (cid:96) i | = ( T.x & h ) ? 1 : 0 i | = ( T.y & h ) ? 2 : 0 i | = ( T.z & h ) ? 4 : 0 /* Remove this line for 2D */ return i Algorithm 4.3:
Parent ( Tet T ) h ← L− T.(cid:96) P.(cid:96) ← T.(cid:96) − P.x ← T.x & ¬ h P.y ← T.y & ¬ h P.z ← T.z & ¬ h /* Remove this line for 2D */ P.b ← Pt( c-id ( T, T.(cid:96) ) , T.b ) /* See (46) and Figure 8 for Pt */ return P The algorithm
Parent to compute the parent of T now puts these two ideastogether, computing the coordinates and type of P ( T ). Algorithm 4.3 shows animplementation. It uses Algorithm 4.2 to compute the cube-id of a d -simplex.For the computation of one child T i of T for a given i ∈ (cid:8) , . . . , d − (cid:9) we lookat Bey’s definition of the subsimplices in (2) and see that in order to compute the8 C, BURSTEDDE, J. HOLKE anchor node of the child we need to know some of the node coordinates (cid:126)x , . . . , (cid:126)x d of the parent simplex T . These can be obtained via Algorithm 4.1. However, it ismore efficient to compute only those coordinates of T that are actually necessary.To compute the Tet-id of T i we also need to know its type. The type of T i dependsonly on the type of T , and in the algorithm we use the function Ct (children type)to compute this type. Ct is effectively an evaluation of Table 1. Algorithm 4.4 showsnow how to compute the coordinates of the i th child of T in Bey’s order.When we would like to compute the i th child of a d -simplex T of type b withrespect to the tetrahedral Morton order (thus the child T k of T with I loc ( T k ) = i )we just call Algorithm 4.4 with σ − b ( i ) as input. The permutations σ − b are availablefrom Table 2; see (43) and Algorithm 4.5. Algorithm 4.4:
Child ( Tet T ,int i ) X ← Coordinates ( T ) if i = 0 then j ← else if i ∈ { , , } then j ← else if i ∈ { , , } then j ← else if i = 3 then j ← /* If i = 3 then j ← for 2D */ T i .(cid:126)x ← ( X [0] + X [ j ]) T i .b ← Ct ( T.b, i ) /* See Table 1 */ Algorithm 4.5:
TM-Child ( Tet T ,int i ) return Child ( T , σ − T.b ( i )) /* See Table 2 */ Many applications—e.g., finite element methods—need to gather information about the face-neighboring simplices of a given simplexin a refinement. In this section we describe a level-independent constant-runtimealgorithm to compute the Tet-id of the same level neighbor along a specific face f ofa given d -simplex T . This algorithm is very lightweight since it only requires a fewarithmetic computations involving the Tet-id of T and the number f . In comparisonto other approaches to finding neighbors in constant time [11, 32], our algorithm doesnot involve the computation of any of T ’s ancestors.The d + 1 faces of a d -simplex T = [ (cid:126)x , . . . , (cid:126)x d ] are numbered f , . . . , f d in sucha way that face f i is the face not containing the node (cid:126)x i . To examine the situationwhere two d -simplices of the same level share a common face, let T (cid:96)d denote a uniformrefinement of T d of a given level 0 ≤ (cid:96) ≤ L ,(47) T (cid:96)d := (cid:8) T | T is a descendant of T d of level (cid:96) (cid:9) ⊂ T d . T d can be seen as the disjoint union of all the T (cid:96)d :(48) T d = L (cid:91) (cid:96) =0 T (cid:96)d . Given a d -simplex T ∈ T (cid:96)d and a face number i ∈ { , . . . , d } , denote T ’s neighbor in T (cid:96)d across face f = f i by N f ( T ), and denote the face number of the neighbor simplex TETRAHEDRAL SPACE-FILLING CURVE Z YX
Type 1Type 4Type 3 Type 0
Fig. 9 . A tetrahedron T of type (in the middle, red) and its four face neighbors (blue) oftypes , , , and , drawn with their associated cubes (exploded view). We see that the type of T ’sneighbors depends only on its type, while their node coordinates relative to T ’s depend additionallyon T ’s level. (Color available online.) N f ( T ) across which T is its neighbor by ˜ f ( T ). Hence, the relation(49) N ˜ f ( T ) ( N f ( T )) = T holds for each face f of T .Our aim is to compute the Tet-id of N f ( T ) and ˜ f ( T ) from the Tet-id of T . Usingthe underlying cube structure this problem can be solved for each occuring type of d -simplex separately, and the solution scheme is independent of the coordinates of T and of (cid:96) . In Figure 9 the situation for a tetrahedron of type 5 is illustrated, and inTables 3 and 4 we present the general solution for each type.Using these tables, a constant-time computation of the Tet-id of N f ( T ) and of˜ f ( T ) from the Tet-id of T is possible, and the 3D case is carried out in Algorithm 4.6.Note that this algorithm uses arithmetic expressions in T.b to avoid the sixfold dis-tinction of cases.
Remark
To find existing neighbors in a nonuniform refinement we use Al-gorithm 4.6 in combination with
Parent and
TM-Child and comparison functions.Of course it is possible that N f ( T ) does not belong to T (cid:96)d any longer. If this isthe case, then f was part of the boundary of the root simplex T d . We describe in thenext section how we can decide in constant time whether N f ( T ) is in T (cid:96)d or not. For completeness, we summarize the geometry and numbers of d -simplices tou-ching each other via a corner ( d = 2 or d = 3) or edge (only d = 3). In this paper wedo not list these neighboring tetrahedra in detail.For d = 3 each corner in the mesh T (cid:96) has 24 adjacent tetrahedra; thus eachtetrahedron has at each corner 23 other tetrahedra that share this particular corner.For d = 2 the situation is similar, with six triangles meeting at each corner. Toexamine the number of adjacent tetrahedra to an edge we distinguish three types ofedges in T (cid:96)d :1. edges that are also edges in the underlying hexahedral mesh;2. edges that are the diagonal of a side of a cube in the hexahedral mesh;3. edges that correspond to the inner diagonal of a cube in the hexahedral mesh.Edges of the first and third kind have six adjacent tetrahedra each, and edges of thesecond kind do have four adjacent tetrahedra each. When computing neighboring d -simplices it is possible that the neighbor in question does not belong to the rootsimplex T d but lies outside of it. If we look at face-neighbors of a d -simplex T , the0 C, BURSTEDDE, J. HOLKE f T.b = 0 N f .b N f .x T.x + h T.x T.x N f .y T.y T.y T.y − h ˜ f T.b = 1 N f .b N f .x T.x T.x T.x − h N f .y T.y + h T.y T.y ˜ f Table 3
Face neighbors in 2D. For each possible type b ∈ { , } of a triangle T and each of its faces f = f i , i ∈ { , , } , the type, anchor node coordinates, and corresponding face number ˜ f of T ’sneighbor across f are shown. In the 2D case we can directly compute N .b = 1 − T.b and ˜ f = 2 − f .Here, h = 2 L− (cid:96) refers to the length of one square of level (cid:96) . f T.b = 0 N f .b N f .x T.x + h T.x T.x T.x N f .y T.y T.y T.y T.y − h N f .z T.z T.z T.z T.z ˜ f T.b = 1 N f .b N f .x T.x + h T.x T.x T.x N f .y T.y T.y T.y T.y N f .z T.z T.z T.z T.z − h ˜ f T.b = 2 N f .b N f .x T.x T.x T.x T.x N f .y T.y + h T.y T.y T.y N f .z T.z T.z T.z T.z − h ˜ f f T.b = 3 N f .b N f .x T.x T.x T.x T.x − h N f .y T.y + h T.y T.y T.y N f .z T.z T.z T.z T.z ˜ f T.b = 4 N f .b N f .x T.x T.x T.x T.x − h N f .y T.y T.y T.y T.y N f .z T.z + h T.z T.z T.z ˜ f T.b = 5 N f .b N f .x T.x T.x T.x T.x N f .y T.y T.y T.y T.y − h N f .z T.z + h T.z T.z T.z ˜ f Table 4
Face neighbors in 3D. For each possible type b ∈ { , , , , , } of a tetrahedron T and eachof its faces f = f i , i ∈ { , , , } the type N f ( T ) .b of T ’s neighbor across f , its coordinates of theanchor node N f ( T ) .x , N f ( T ) .y , N f ( T ) .z and the corresponding face number ˜ f ( T ) , across which T is N f ( T ) ’s neighbor, are shown. fact that the considered neighbor lies outside means that the respective face was onthe boundary of T d . In order to check whether a computed d -simplex is outside thebase simplex, we investigate a more general problem: Given anchor node coordinates( x , y ) T ∈ Z , respectively ( x , y , z ) T ∈ Z , of type b a level (cid:96) , decide whether thecorresponding d -simplex N lies inside or outside of the root tetrahedron T d : N ∈ T (cid:96)d or N / ∈ T (cid:96)d . At the end of this section we furthermore generalize this to the problemof deciding for any two d -simplices N and T whether or not N lies outside of T . Wesolve this problem in constant time and independent of the levels of N and T .We examine the 3D case. Looking at T we observe that two of its boundary facescorrespond to faces of the root cube, namely, the intersections of T with the y = 0and the x = 2 L planes. The other two boundary faces of T are the intersections withthe x = z and the y = z planes. Thus, the boundary of T can be described as the TETRAHEDRAL SPACE-FILLING CURVE Algorithm 4.6:
Face-neighbor ( Tet T ,int f ) b ← T.b , x ← T.x , x ← T.x , x ← T.x if f = 1 or f = 2 then ˜ f ← f if ( b % and f = 2) or ( b % (cid:54) = 0 and f = 1) then b ← b + 1 else b ← b − else ˜ f ← − f h ← L− T.(cid:96) if f = 0 then /* f = 0 */ i ← b div x i ← T.x i + h b ← b + ( b % else /* f = 3 */ i ← ( b + 3) % div x i ← T.x i − h b ← b + ( b % return ( x , x , x , b % , ˜ f )intersection of T with those planes. We refer to the latter two planes as E and E .Let N be a tetrahedron with anchor node ( x , y , z ) T ∈ Z of type b and level (cid:96) and denote with ( x i , y i , z i ) T the coordinates of node i of N . Since ( x i , y i , z i ) T ≥ ( x , y , z ) T (componentwise), we directly conclude that if x ≥ L or y < N / ∈ T . Because the outer normal vectors of T on the two faces intersecting with E and E are(50) (cid:126)n = 1 √ − and (cid:126)n = 1 √ − , we also conclude that N / ∈ T if z − x > y − z >
0. Now we have alreadycovered all the cases except those where the anchor node of N lies directly in E or E . In these cases we cannot solve the problem by looking at the coordinates of theanchor node alone, since there exist tetrahedra T (cid:48) ∈ T with anchor nodes lying in oneof these planes (see Figure 10 for an illustration of the analogous case in 2D). Thisdepends on the type of the tetrahedron in question. We observe that a tetrahedron T (cid:48) ∈ T with anchor node lying in E can have the types 0 ,1, or 2, and a tetrahedronwith anchor node lying in E can have the types 0, 4, or 5. We conclude that tocheck whether N is outside of the root tetrahedron we have to check if any one of sixconditions is fulfilled. In fact these conditions fit into the general form below with x i = x , x j = y , x k = z , and T as the root tetrahedron; thus T.x = T.y = T.z = 0 and
T.b = 0.These generalized conditions solve the problem to check for any two given tetra-hedra N and T , whether N lies outside of T or not. Proposition
Given two d -simplices N, T with
N.(cid:96) > T.(cid:96) , then N is outsideof the simplex T —which is equivalent to saying that N is no descendant of T —if and C, BURSTEDDE, J. HOLKE
T.b
2D 0 1 x i x yx j y x T.b
3D 0 1 2 3 4 5 x i x x y y z zx j y z z x x yx k z y x z y x Table 5
Following the general scheme described in this section to compute whether a given d -simplex N lies outside of another given d -simplex T , we give the coordinates x i , x j , and x k in dependence ofthe type of T . only if at least one of the following conditions is fulfilled.For 2D, N.x i − T.x i ≥ L− T.(cid:96) , (51a) N.x j − T.x j < , (51b) ( N.x j − T.x j ) − ( N.x i − T.x i ) > , (51c) N.x i − T.x i = N.x j − T.x j and N.b = (cid:26) if T.b = 0 , if T.b = 1 . (51d) For 3D,
N.x i − T.x i ≥ L− T.(cid:96) , (52a) N.x j − T.x j < , (52b)( N.x k − T.x k ) − ( N.x i − T.x i ) > , (52c)( N.x j − T.x j ) − ( N.x k − T.x k ) > , (52d) N.x j − T.x j = N.x k − T.x k and N.b ∈ (cid:26) { T.b + 1 , T.b + 2 , T.b + 3 } , if T.b is even, { T.b − , T.b − , T.b − } , if T.b is odd, (52e)
N.x k − T.x k = N.x i − T.x i and N.b ∈ (cid:26) { T.b − , T.b − , T.b − } , if T.b is even, { T.b + 1 , T.b + 2 , T.b + 3 } , if T.b is odd. (52f)
N.x j − T.x j = N.x k − T.x k and N.x i − T.x i = N.x k − T.x k and N.b (cid:54) = T.b (52g)
The coordinates x i , x j , and x k are chosen in dependence of the type of T accordingto Table 5. TETRAHEDRAL SPACE-FILLING CURVE XY Cases 3 and 4:( N .x < N .y ) or( N .x = N .y and N .b = 1) Case 1: N .x ≥ L Case 2: N .y < b =0 b =1 Fig. 10 . A uniform level 2 refinement ofthe triangle T in 2D and triangles lying outsideof it with their anchor nodes marked. Whendeciding whether a triangle with given anchornode coordinates lies outside of T there arefour cases to consider, one for each face of T .For the two faces lying parallel to the X -axis,respectively, Y -axis, it suffices to check whetherthe x -coordinate is greater than or equal to L ,or whether the y -coordinate is smaller than .Similarly one can conclude that the triangle liesoutside of T if its x -coordinate is smaller thanits y -coordinate. If both coordinates agree (andnone of the previous cases applies) then thegiven triangle is outside T if and only if itstype is . Proof.
By shifting N by ( − T.(cid:126)x ) we reduce the problem to checking whether theshifted d -simplex lies outside of a d -simplex with anchor node (cid:126)
0, level
T.(cid:96) and type
T.b . For d = 3 the proof is analogous to the above argument, where we consideredthe case b = 0 and (cid:96) = 0. In two dimensions the situation is even simpler, since thereexists only one face of the root triangle that is not a coordinate axis (see Figure 10). In contrast to the Morton index for cubes, the TM-index for d -simplices does not produce a consecutive range of numbers. Therefore,two simplices T and T (cid:48) of level (cid:96) that are direct successors/predecessors with respectto the tetrahedral Morton order do not necessarily fulfill m ( T ) = m ( T (cid:48) ) ± d ( L− (cid:96) ) or m ( T ) = m ( T (cid:48) ) ±
1. For d = 3 this follows directly from the fact that each b j occupiesthree bits, but there are only six values that each b j can assume, since there are onlysix different types. In 2D this follows from the fact that not every combination ofanchor node coordinates and type can occur for triangles in T , the triangle withanchor node (0 ,
0) and type 1 being one example. This also means that the largestoccurring TM-index is bigger than 2 d L − (cid:96) a consecutive index I ( T ) ∈ (cid:8) , . . . , d(cid:96) − (cid:9) , such that(53) ∀ T, S ∈ T (cid:96)d : I ( T ) < I ( S ) ⇔ m ( T ) < m ( S ) . This index can also be understood as a bijection(54) I : (cid:8) m ∈ N | m = m ( T ) for a T ∈ T (cid:96)d (cid:9) ∼ = −→ (cid:8) , . . . , d(cid:96) − (cid:9) , mapping the TM-indices of level (cid:96) d -simplices to a consecutive range of numbers. It isobvious that I ( T d ) = 0. The index I ( T ) can be easily computed as the (cid:96) -digit 2 d -arynumber consisting of the local indices as digits, thus(55) I ( T ) = ( I loc ( T ) , . . . , I loc ( T (cid:96) )) d . Algorithm 4.7 shows an implementation of this computation. It can be donedirectly from the Tet-id of T , and thus it is not necessary to compute the TM-indexof T first.4 C, BURSTEDDE, J. HOLKE
Algorithm 4.7: I ( Tet T ) I ← b ← T.b for T.(cid:96) ≥ i ≥ do c ← c-id ( T, i ) I ← I + 8 i I b loc ( c ) /* See Table 6; multiply with i for 2D */ b ← Pt( c, b ) return I The inverse operation of computing T from I ( T ) and a given level (cid:96) can be carriedout in a similar fashion; see Algorithm 4.8. For each 0 ≤ i ≤ (cid:96) we look up the type b and the cube-id of T i from I loc ( T i ) and the type of Parent ( T i ) = T i − (startingwith type( T ) = 0) via Tables 7 and 8. From the cube-ids we can build up the anchornode coordinates of T . The last computed type is the type of T . The runtime of thisalgorithm is O ( T.(cid:96) ). Algorithm 4.8: T (consecutive index I ,int (cid:96) ) T.x, T.y, T.z ← b ← for ≤ i ≤ (cid:96) do Get I loc ( T i ) from I /* See (55) */ c ← c-id ( T i ) , b ← T i .b /* See Tables 7 and 8 */ if c & 1 then T.x ← T.x + 2 L− i if c & 2 then T.y ← T.y + 2 L− i if c & 4 then T.z ← T.z + 2 L− i /* Remove this line for 2D */ T.b ← b return T Similar to Algorithm 4.7 is Algorithm
Is valid , which decides whether a givenindex m ∈ [0 , L ) ∩ Z is in fact a TM-index for a tetrahedron. Thus, in the spirit ofSection 3.5 we can decide whether a given 6D cube is in the image of map (25) thatembeds T into the set of 6D subcubes of [0 , L ] . The runtime of Is valid is O ( L ). Algorithm 4.9:
Is valid ( m ∈ [0 , L ) ∩ Z , (cid:96) ) I ← k ← L − i ) for (cid:96) ≥ i ≥ do b ← ( m k , m k +1 , m k +2 ) c ← ( m k +3 , m k +4 , m k +5 ) k ← L − i + 1) if ( m k , m k +1 , m k +2 ) (cid:54) = Pt ( c, b ) then /* Take (0 , , if i = 1 */ return False return True
The consecutive index simplifies the relation between the TM-index of a simplexand its position in the SFC. In the special case of a uniform mesh, the consecutive
TETRAHEDRAL SPACE-FILLING CURVE I b loc ( c ) cube-id c2D 0 1 2 3b 0 0 1 1 31 0 2 2 3 I b loc ( c ) cube-id c3D 0 1 2 3 4 5 6 7b 0 0 1 1 4 1 4 4 71 0 1 2 5 2 5 4 72 0 2 3 4 1 6 5 73 0 3 1 5 2 4 6 74 0 2 2 6 3 5 5 75 0 3 3 6 3 6 6 7 Table 6
The local index of a tetrahedron T ∈ T in dependence of its cube-id c and type b . cube-id( T ) I loc ( T )2D 0 1 2 3 P.b T ) I loc ( T )3D 0 1 2 3 4 5 6 7 P.b
Table 7
For a tetrahedron T ∈ T of local Index I loc whose parent P has type P.b we give the cube-id of T . index and the position are identical. Calculating the TM-index corresponding toa particular consecutive index is occasionally needed in higher-level algorithms. Thisis relatively expensive, since it involves a loop over all refinement levels, thus some10 to 30 in extreme cases. However often the task is to compute a whole range of d -simplices. This occurs, for example, when creating an initial uniform refinementof a given mesh (see Algorithm New in Section 5.1). That is, for a given consecutiveindex I , a level (cid:96) , and a count n , find the n level- (cid:96) simplices following the d -simplexcorresponding to the consecutive index I , that is, the d -simplices corresponding tothe n consecutive indices I, I + 1 , . . . , I + n −
1. Ideally, this operation should runlinearly in n , independent of (cid:96) , but if we used Algorithm 4.8 to create each of the n + 1 simplices we would have a runtime of O ( n L ). In order to achieve the desiredlinear runtime we introduce the operations Successor and
Predecessor that run inaverage O (1) time. These operations compute from a given d -simplex T of level (cid:96) withconsecutive index I T the d -simplex T (cid:48) whose consecutive index is I T + 1, respectively, I T −
1. Thus, T (cid:48) is the next level (cid:96) simplex in the SFC after T (resp. the previous one).Algorithm 4.10, which we introduce to solve this problem does not require knowledgeabout the consecutive indices I T and I T ± T we only need to reverse the sign in Line 3 in the Successor recursion subroutine of Algorithm 4.10.
Lemma
Algorithm 4.10 has constant average runtime (independent of L ).Proof. Because each operation in the algorithm can be executed in constant time,6
C, BURSTEDDE, J. HOLKE
T.b I loc ( T )2D 0 1 2 3 P.b
T.b I loc ( T )3D 0 1 2 3 4 5 6 7 P.b
Table 8
For a tetrahedron T ∈ T of local Index I loc whose parent P has type P.b we give the type of T . Algorithm 4.10:
Successor ( Tet T ) return Successor recursion ( T, T, T.(cid:96) ) Function
Successor recursion ( Tet T , Tet T (cid:48) ,int (cid:96) ) c ← c-id ( T, (cid:96) ) From c and b look up i := I loc ( T (cid:96) ) /* See Table 6 */ i ← ( i + 1) % if i = 0 then /* Enter recursion (in rare cases) */ T (cid:48) ← Successor recursion ( T, T (cid:48) , (cid:96) − ˆ b ← T (cid:48) .b /* ˆ b stores the type of T (cid:48) (cid:96) − */ else ˆ b ← Pt( c, b ) From ˆ b and I loc = i look up ( c (cid:48) , b (cid:48) ) /* See Tables 7 and 8 */ Set the level (cid:96) entries of T (cid:48) .x, T (cid:48) .y and T (cid:48) .z to c (cid:48) T (cid:48) .b ← b (cid:48) return T (cid:48) the average runtime is nc , where c is a constant (independent of L ) and n − i cycles through 0 to 2 d − d th step, allowing for a geometric series argument.We see in Algorithm 4.10 the usefulness of the consecutive index. Because we areusing this index instead of the TM-index, computing the index of the successor/pre-decessor only requires adding/subtracting 1 to the given index. On the other hand,computing the TM-index of a successor/predecessor would involve more subtle com-putations. TETRAHEDRAL SPACE-FILLING CURVE
5. High-level AMR algorithms.
To develop the complete AMR functionalityrequired by numerical applications, we aim at a forest of quad-/octrees in the spiritof [16, 28]. Key top-level algorithms are: • New . Given an input mesh of conforming simplices, each considered a rootsimplex, generate an initial refinement. • Adapt . Adapt (refine and coarsen) a mesh according to a given criterion. • Partition . Partition a mesh among all processes such that the load is bal-anced, possibly according to weights. • Balance . Establish a 2:1 size condition between neighbors in a given refinedmesh. The levels of any two neighboring simplices must differ by at most 1. • Ghost . For each process, assemble the layer of directly neighboring elementsowned by other processes. • Iterate . Iterate through the local mesh, executing a callback function oneach element and on all interelement interfaces.Since partitioning via SFC only uses the SFC index as information, we refer to al-ready existing descriptions of
Partition for hexahedral or simplicial SFCs [16, 40].
Balance , Ghost , and
Iterate are sophisticated parallel algorithms and require addi-tional theoretical work, which is beyond the scope of this paper.Here, we briefly describe
New and
Adapt . In the forest-of-trees approach we modelan adaptive mesh by a coarse mesh of level 0 d -simplices, the trees . Such a coarsemesh could be specified manually for simple geometries, or obtained from executinga mesh generator. Each level 0 simplex is identified with the root simplex T d andthen refined adaptively to produce the fine and potentially nonconforming mesh of d -simplices. These simplices are partitioned among all processes; thus each processholds a range of trees, of which the first and last may be incomplete: Their leaves aredivided between multiple processes.An entity F of the structure forest consists of the following entries • F .C — the coarse mesh; • F . K — the process-local trees; • F . E k — for each local tree k the list of process-local simplices in tetrahedralMorton order.We acknowledge that New and
Adapt are essentially communication-free, but still servewell to exercise some of the fundamental algorithms described earlier.
The
New algorithm creates a partitioned uniform level (cid:96) refined forestfrom a given coarse mesh. To achieve this, we first compute the first and last d -simplices belonging to the current process p . From this range we can calculate whichtrees belong to p and for each of these trees, the consecutive index of the first andlast d -simplices on this tree. We then create the first simplex in a tree by a call to T (Algorithm 4.8). In contrast to the New algorithm in [16] we create the remainingsimplices by calls to
Successor instead of T to avoid the O ( (cid:96) ) runtime of T in thecase of simplices. Our numerical tests, displayed in Figure 11, show that the runtimeof New is in fact linear in the number of elements and does not depend on the level (cid:96) .Within the algorithm, K denotes the number of trees in the coarse mesh and P thenumber of processes.After New returns, the process local number of elements is known, and per-elementdata can be allocated linearly in an array of structures, or a structure of arrays,depending on the specifics of the application.
The
Adapt algorithm modifies an existing forest by refining andcoarsening the d -simplices of a given forest according to a callback function. It does8 C, BURSTEDDE, J. HOLKE
Algorithm 5.1:
New ( Coarse Mesh C , int (cid:96) ) n ← d(cid:96) , N ← nK ; /* d -simplices per tree and global number of d -simplices */ g first ← (cid:98) N p/P (cid:99) , g last ← (cid:98) N ( p + 1) /P (cid:99) − /* Global numbers of firstand last.. */ k first ← (cid:98) g first /n (cid:99) , k last ← (cid:98) g last /n (cid:99) ; /* ..local simplex and local treerange */ for t ∈ { k first , . . . , k last } do e first ← ( t = k first ) ? g first − nt : 0; e last ← ( t = k last ) ? g last − nt : n − T ← T ( e first , (cid:96) ); /* Call Algorithm 4.8 */ E k ← { T } ; for e ∈ { e first , . . . , e last − } do T ← Successor ( T ); E k ← E k ∪ { T } this by traversing the d -simplices of each tree in tetrahedral Morton order and passingthem to the callback function. If the current d -simplex and its 2 d − d d -simplices as input plus the index ofthe current tree. In both cases, a return value greater than zero means that the firstinput d -simplex should be refined, and thus its 2 d children are added in tetrahedralMorton order to the new forest. Additionally, if the input consists of 2 d simplices,they form a family, and a return value smaller than zero means that this family shouldbe coarsened, thus replaced by their parent. If the callback function returns zero, thefirst given d -simplex remains unchanged and is added to the new forest, and Adapt continues with the next d -simplex in the current tree. The Adapt algorithm createsa new forest from the given one and can handle recursive refinement/coarsening. Forthe recursive part we make use of the following reasonable assumptions: • A d -simplex that was created in a refine step will not be coarsened duringthe same adapt call. • A d -simplex that was created in a coarsening step will not be refined duringthe same adapt call.From these assumptions we conclude that for recursive refinement we only have toconsider those d -simplices that were created in a previous refinement step and that weonly have to care about recursive coarsening directly after we processed a d -simplexthat was not refined and could be the last d -simplex in a family. If refinement andcoarsening are not done recursively, the runtime of Adapt is linear in the number of d -simplices of the given forest.An application will generally project or otherwise transform data from the previ-ous to the adapted mesh. This can be done within the adaptation callback, which isknown to proceed linearly through the local elements, or after Adapt returns if a copyof the old mesh has been retained. In the latter case, one would allocate element datafor the adapted mesh and then iterate over the old and the new data simultaneously,performing the projection in the order of the SFC. Once this is done, the old dataand the previous mesh are deallocated [15].
TETRAHEDRAL SPACE-FILLING CURVE R un t i m e [ s ] Number of processesStrong scaling of
New
New level 8New level 10Ideal strong scaling
Runtime tests for
New . .
059 –8 8 . .
451 7 .
649 6 . .
58 7 . . . . . .
116 –9 6 . .
898 7 . . .
15 7 . Fig. 11 . Runtime tests for
New on JUQUEEN. Left: Two strong scaling studies. A new uniformlevel 8 (circles) and level 10 (triangles) refinement of a coarse mesh of 512 root tetrahedra, carriedout with 16 up to 2,048 processes and 1,024 up to 131,072 processes with 16 processes per computenode. Right: The data shows that the runtime of
New is linear in the number of generated elementsand does not additionally depend on the level. The uniform refinement is created from a coarsemesh of 512 root tetrahedra. For the first computation on 64 processes we use 1 process per computenode and for the computation on 256 processes we use 2 processes per node.
6. Performance evaluation.
Given the design of the algorithms discussed inthis paper, we expect runtimes that are precisely proportional to the number of el-ements and independent of the level of refinement. To verify this, we present scal-ing and runtime tests for New and
Adapt on the JUQUEEN supercomputer at theForschungszentrum Juelich [19], an IBM BlueGene/Q system with 28,672 nodes con-sisting of 16 IBM PowerPC A2 @ 1 . . × tetrahedra with 13 millionelements per process.The first two tests are a strong scaling (up to 131k processes) and a runtimestudy of New in 3D, shown in Figure 11. For both tests we use a coarse mesh of 512tetrahedra. We time the
New algorithm with input level 8 (resp. level 10 for highernumbers of processes). We execute the runtime study to examine whether
New hasthe proposed level-independent linear runtime in the number of generated tetrahedra,which can be read from the results presented in the Table in Figure 11.The last test is
Adapt with a recursive nonuniform refinement pattern. Thestarting point for all runs is a mesh obtained by uniformly refining a coarse meshof 512 tetrahedra to a given initial level k . This mesh is then refined recursivelyusing a single Adapt call, where only the tetrahedra of types 0 and 3 whose leveldoes not exceed the fine level k + 5 are refined recursively. The resulting mesh oneach tetrahedron resembles a fractal pattern similar to the Sierpinski tetrahedron.We perform several strong and weak scaling runs on JUQUEEN starting with 128processes and scaling up to 131,072. The setting is 16 processes per compute node. Wefinally do another strong scaling run on the full system of the MIRA supercomputerat the Argonne Leadership Computing Facility with 786,432 processes and again 16processes per compute node. Figure 12 shows our runtime results.
7. Conclusion.
We present a new encoding for adaptive nonconforming triangu-lar and tetrahedral mesh refinement based on Bey’s red-refinement rule. We identify Version v0.1 is available at https://bitbucket.org/cburstedde/t8code.git C, BURSTEDDE, J. HOLKE R un t i m e [ s ] Number of processesStrong scaling of
Adapt , refine types 0 and 3
Adapt from level 4 to 9level 5 to 10level 6 to 11level 7 to 12Ideal weak scalingIdeal strong scaling
Fig. 12 . Strong scaling for
Adapt with a fractal refinement pattern. Starting from an initiallevel k on a coarse mesh of 512 tetrahedra we refine recursively to a maximal final level k + 5 .The refinement callback is such that only subtetrahedra of types 0 and 3 are refined. Left: Strongand weak scaling on JUQUEEN with up to 131,072 processes and strong scaling on MIRA withup to 786,432 processes. On both systems we use 16 processes per compute node. The level 12mesh consists of 858,588,635,136 tetrahedra. Right: An initial level 0 and final level 3 refinementaccording to the fractal pattern. The subtetrahedra of levels 1 and 2 are transparent. six different types of tetrahedra (and two types of triangles) and prescribe an orderingof the children for each of these types that differs from Bey’s original order. By intro-ducing an embedding of the mesh elements into a Cartesian coordinate structure, wedefine a tetrahedral Morton index that can be computed using bitwise interleavingsimilar to the Morton index for cubes. This tetrahedral Morton index shares someproperties with the well-known cubical one and allows for a memory-efficient randomaccess storage of the mesh elements.Exploiting the Cartesian coordinate structure, we develop several constant-timealgorithms on simplices. These include computing the parent, the children, and theface-neighbors of a given mesh element, as well as computing the next and previouselements according to the SFC.In view of providing a complete suite of parallel dynamic AMR capabilities, theconstructions and algorithms described in this paper are just the beginning. A repar-titioning algorithm following our SFC, for example, is easy to imagine, but challengingto implement if the tree connectivity is to be partitioned dynamically, and if globalshared metadata shall be reduced from being proportional to the number of ranks tothe number of compute nodes. The present paper provides atomic building blocksthat can be used in high-level algorithms for 2:1 balancing [27] and the computationof ghost elements and generalized topology iteration [28]; regardless, these algorithmsstill have to be written and are likely to raise questions that we will have to addressin future work. Notwithstanding the above, we believe that the choices presented inthis paper are sustainable for maintaining extreme scalability in the long term. Acknowledgments.
The authors gratefully acknowledge support by the BonnHausdorff Center for Mathematics (HCM) funded by the Deutsche Forschungsgemein-schaft (DFG). Author Holke acknowledges additional support by the Bonn Interna-tional Graduate School for Mathematics (BIGS) as part of HCM.We would like to thank Tobin Isaac (University of Chicago) for his thoughts onsoftware interfacing.We also would like to thank the Gauss Centre for Supercomputing (GCS) for pro-viding computing time through the John von Neumann Institute for Computing (NIC)on the GCS share of the supercomputer JUQUEEN [19] at the J¨ulich Supercomput-ing Centre (JSC). GCS is the alliance of the three national supercomputing centers
TETRAHEDRAL SPACE-FILLING CURVE
REFERENCES[1]
V. Akc¸elik, J. Bielak, G. Biros, I. Epanomeritakis, A. Fernandez, O. Ghattas, E. J.Kim, J. Lopez, D. R. O’Hallaron, T. Tu, and J. Urbanic , High resolution forward andinverse earthquake modeling on terascale computers , in SC03: Proceedings of the Interna-tional Conference for High Performance Computing, Networking, Storage, and Analysis,ACM/IEEE, 2003. Gordon Bell Prize for Special Achievement.[2]
F. Alauzet and A. Loseille , On the use of space filling curves for parallel anisotropic meshadaptation , in Proceedings of the 18th International Meshing Roundtable, B. W. Clark,ed., Springer, 2009, pp. 337–357, doi:10.1007/978-3-642-04319-2 20, http://dx.doi.org/10.1007/978-3-642-04319-2 20.[3]
I. Babuska and W. C. Rheinboldt , Error estimates for adaptive finite element computations
M. Bader , Space-Filling Curves: An Introduction with Applications in Scientific Computing ,Texts in Computational Science and Engineering, Springer, 2012.[5]
W. Bangerth, C. Burstedde, T. Heister, and M. Kronbichler , Algorithms and data struc-tures for massively parallel generic adaptive finite element codes , ACM Transactions onMathematical Software, 38 (2011), pp. 14:1–14:28.[6]
W. Bangerth, R. Hartmann, and G. Kanschat , deal.II – a general-purpose object-orientedfinite element library , ACM Transactions on Mathematical Software, 33 (2007), p. 24,doi:10.1145/1268776.1268779.[7] R. E. Bank, A. H. Sherman, and A. Weiser , Some refinement algorithms and data structuresfor regular local mesh refinement , in Scientific Computing, Applications of Mathematicsand Computing to the Physical Sciences, R. S. Stapleman, ed., vol. 1, IMACS/North-Holland Publishing Co., 1983.[8]
E. B¨ansch , Local mesh refinement in 2 and 3 dimensions
J. Bey , Der BPX-Vorkonditionierer in drei Dimensionen: Gitterverfeinerung, Parallelisierungund Simulation , Universit¨at Heidelberg, (1992). Preprint.[10]
J. Bey , Simplicial grid refinement: on Freudenthal’s algorithm and the optimal number of con-gruence classes , Numerische Mathematik, 85 (2000), pp. 1–29, doi:10.1007/s002110050475,http://dx.doi.org/10.1007/s002110050475.[11]
K. Brix, R. Massjung, and A. Voss , Refinement and connectivity algorithms for adaptivediscontinuous Galerkin methods , SIAM Journal on Scientific Computing, 33 (2011), pp. 66–101.[12]
T. N. Bui and C. Jones , Finding good approximate vertex and edge partitions is NP-hard
H.-J. Bungartz, M. Mehl, and T. Weinzierl , A parallel adaptive Cartesian PDE solverusing space–filling curves , Euro-Par 2006 Parallel Processing, (2006), pp. 1064–1074.[14]
C. Burstedde, O. Ghattas, M. Gurnis, T. Isaac, G. Stadler, T. Warburton, and L. C.Wilcox , Extreme-scale AMR , in SC10: Proceedings of the International Conference forHigh Performance Computing, Networking, Storage and Analysis, ACM/IEEE, 2010.[15]
C. Burstedde, O. Ghattas, G. Stadler, T. Tu, and L. C. Wilcox , Towards adaptive meshPDE simulations on petascale computers , in Proceedings of Teragrid ’08, 2008.[16]
C. Burstedde, L. C. Wilcox, and O. Ghattas , p4est : Scalable algorithms for parallel adap-tive mesh refinement on forests of octrees , SIAM Journal on Scientific Computing, 33(2011), pp. 1103–1133, doi:10.1137/100791634.[17] C. Carstensen , An adaptive mesh-refining algorithm allowing for an H stable L projection C, BURSTEDDE, J. HOLKE onto Courant finite element spaces , Constr. Approx., 20 (2004), pp. 549–564.[18]
U. Catalyurek, E. Boman, K. Devine, D. Bozdag, R. Heaphy, and L. Riesen , Hypergraph-based dynamic load balancing for adaptive scientific computations , in Proc. of 21st Inter-national Parallel and Distributed Processing Symposium (IPDPS’07), IEEE, 2007.[19]
J. S. Centre , JUQUEEN: IBM Blue Gene/Q supercomputer system at the J¨ulich Supercom-puting Centre , Journal of large-scale research facilities, A1 (2015), http://dx.doi.org/10.17815/jlsrf-1-18. http://dx.doi.org/10.17815/jlsrf-1-18.[20]
C. Chevalier and F. Pellegrini , PT-Scotch: A tool for efficient parallel graph ordering ,Parallel Computing, 34 (2008), pp. 318–331, doi:10.1016/j.parco.2007.12.001, http://dx.doi.org/10.1016/j.parco.2007.12.001.[21]
W. D¨orfler , A convergent adaptive algorithm for Poisson’s equation
P. F. Fischer, G. W. Kruse, and F. Loth , Spectral element methods for transitionalflows in complex geometries , Journal of Scientific Computing, 17 (2002), pp. 81–98,doi:10.1023/A:1015188211796.[23]
N. Golias and R. Dutton , Delaunay triangulation and 3D adaptive mesh gen-eration
H. Haverkort and F. van Walderveen , Locality and bounding-box quality of two-dimensionalspace-filling curves , Computational Geometry, 43 (2010), pp. 131–174.[25]
D. Hilbert , ¨Uber die stetige Abbildung einer Linie auf ein Fl¨achenst¨uck , Mathematische An-nalen, 38 (1891), pp. 459–460.[26] Ø. Hjelle and M. Dæhlen , Triangulations and Applications , Mathematics and Visualization,Springer, 2006, https://books.google.de/books?id=cRXAe8CVbBYC.[27]
T. Isaac, C. Burstedde, and O. Ghattas , Low-cost parallel algorithms for 2:1 octree bal-ance , in Proceedings of the 26th IEEE International Parallel & Distributed ProcessingSymposium, IEEE, 2012. http://dx.doi.org/10.1109/IPDPS.2012.47.[28]
T. Isaac, C. Burstedde, L. C. Wilcox, and O. Ghattas , Recursive algorithms for dis-tributed forests of octrees , SIAM Journal on Scientific Computing, 37 (2015), pp. C497–C531, doi:10.1137/140970963, http://arxiv.org/abs/1406.0089.[29]
G. Karypis and V. Kumar , A parallel algorithm for multilevel graph partitioning and sparsematrix ordering , Journal of Parallel and Distributed Computing, 48 (1998), pp. 71–95.[30]
D. A. Kopriva, S. L. Woodruff, and M. Y. Hussaini , Computation of electromagnetic scat-tering with a non-conforming discontinuous spectral element method , International Journalfor Numerical Methods in Engineering, 53 (2002), pp. 105–122, doi:10.1002/nme.394.[31]
I. Kossaczk`y , A recursive approach to local mesh refinement in two and three dimen-sions
M. Lee, L. De Floriani, and H. Samet , Constant-time neighbor finding in hierarchical tetra-hedral meshes , in SMI 2001 International Conference on Shape Modeling and Applications,May 2001, pp. 286–295, doi:10.1109/SMA.2001.923400.[33]
A. Logg, K.-A. Mardal, and G. N. Wells , eds.,
Automated Solution of Differential Equa-tions by the Finite Element Method , vol. 84 of Lecture Notes in Computational Scienceand Engineering, Springer, 2012, doi:10.1007/978-3-642-23099-8.[34]
G. M. Morton , A computer oriented geodetic data base; and a new technique in file sequencing ,tech. report, IBM Ltd., 1966.[35]
L. Oliker and R. Biswas , PLUM: Parallel load balancing for adaptive unstructuredmeshes
L. Oliker, R. Biswas, and H. N. Gabow , Parallel tetrahedral mesh adaptation with dy-namic load balancing , Parallel Computing, 26 (2000), pp. 1583–1608, doi:10.1016/S0167-8191(00)00047-8, http://dx.doi.org/10.1016/S0167-8191(00)00047-8.[37]
OpenCFD , OpenFOAM – The Open Source CFD Toolbox – User’s Guide , OpenCFD Ltd.,United Kingdom, 1.4 ed., 11 2007.[38]
E. J. Otoo and H. Zhu , Indexing on spherical surfaces using semi-quadcodes , in Advances inSpatial Databases, D. Abel and B. Chin Ooi, eds., vol. 692 of Lecture Notes in ComputerScience, Springer, 1993, pp. 510–529, doi:10.1007/3-540-56869-7 29, http://dx.doi.org/10.1007/3-540-56869-7 29.[39]
G. Peano , Sur une courbe, qui remplit toute une aire plane , Math. Ann., 36 (1890), pp. 157– TETRAHEDRAL SPACE-FILLING CURVE J. R. Pilkington and S. B. Baden , Partitioning with spacefilling curves , tech. report, Dept.of Computer Science and Engineering, University of California, San Diego, 1994.[41]
A. Rahimian, I. Lashuk, S. Veerapaneni, A. Chandramowlishwaran, D. Malhotra,L. Moon, R. Sampath, A. Shringarpure, J. Vetter, R. Vuduc, et al. , Petascaledirect numerical simulation of blood flow on 200k cores and heterogeneous architectures ,in Proceedings of the 2010 ACM/IEEE International Conference for High PerformanceComputing, Networking, Storage and Analysis, IEEE Computer Society, 2010, pp. 1–11.[42]
M. Rasquin, C. Smith, K. Chitale, E. S. Seol, B. A. Matthews, J. L. Martin, O. Sahni,R. M. Loy, M. S. Shephard, and K. E. Jansen , Scalable implicit flow solver for realisticwing simulations with flow control , Computing in Science Engineering, 16 (2014), pp. 13–21, doi:10.1109/MCSE.2014.75.[43]
W. C. Rheinboldt and C. K. Mesztenyi , On a data structure for adaptive finite elementmesh refinements , ACM Transactions on Mathematical Software, 6 (1980), pp. 166–187,doi:10.1145/355887.355891.[44]
H. Sagan , Space-Filling Curves , Springer, 1994.[45]
J. Shewchuk , Delaunay Refinement Mesh Generation , PhD thesis, Carnegie Mellon University,Pittsburgh, May 1997. CMU CS Tech Report CMU-CS-97-137.[46]
J. R. Shewchuk , Triangle: Engineering a 2D quality mesh generator and delaunay triangula-tor , in Applied Computational Geometry: Towards Geometric Engineering, M. C. Lin andD. Manocha, eds., vol. 1148 of Lecture Notes in Computer Science, Springer-Verlag, 1996,pp. 203–222. From the First ACM Workshop on Applied Computational Geometry.[47]
H. Si , TetGen—A Quality Tetrahedral Mesh Generator and Three-Dimensional Delaunay Tri-angulator , Weierstrass Institute for Applied Analysis and Stochastics, Berlin, 2006.[48]
W. Sierpi´nski , Sur une nouvelle courbe continue qui remplit toute une aire plane , Bulletin del’Acad´emie des Sciences de Cracovie, S´eries A (1912), pp. 462–478.[49]
C. W. Smith, M. Rasquin, D. Ibanez, K. E. Jansen, and M. S. Shephard , Application spe-cific mesh partition improvement
D. A. Steinman, J. S. Milner, C. J. Norley, S. P. Lownie, and D. W. Holdsworth , Image-based computational simulation of flow dynamics in a giant intracranial aneurysm ,American Journal of Neuroradiology, 24 (2003), pp. 559–566.[51]
J. R. Stewart and H. C. Edwards , A framework approach for developing parallel adaptivemultiphysics applications , Finite Elements in Analysis and Design, 40 (2004), pp. 1599–1617, doi:10.1016/j.finel.2003.10.006.[52]
H. Sundar, R. Sampath, and G. Biros , Bottom-up construction and 2:1 balance refinement oflinear octrees in parallel , SIAM Journal on Scientific Computing, 30 (2008), pp. 2675–2708,doi:10.1137/070681727.[53]
T. J. Tautges, R. Meyers, K. Merkley, C. Stimpson, and C. Ernst , MOAB: A mesh-oriented database , SAND2004-1592, Sandia National Laboratories, Apr. 2004. Report.[54]
T. Tu, D. R. O’Hallaron, and O. Ghattas , Scalable parallel octree meshing forterascale applications , in SC ’05: Proceedings of the International Conference forHigh Performance Computing, Networking, Storage, and Analysis, ACM/IEEE, 2005,doi:http://dx.doi.org/10.1109/SC.2005.61.[55]
T. Weinzierl and M. Mehl , Peano—a traversal and storage scheme for octree-like adaptiveCartesian multiscale grids , SIAM Journal on Scientific Computing, 33 (2011), pp. 2732–2760, http://link.aip.org/link/?SCE/33/2732.[56]
S. Zhang , Successive subdivisions of tetrahedra and multigrid methods on tetrahedral meshes ,Houston Journal of Mathematics, 21 (1995).[57]