[PDF] A tetrahedral space-filling curve for non-conforming adaptive meshes

Abstract

We introduce a space-filling curve for triangular and tetrahedral red-refinement that can be computed using bitwise interleaving operations similar to the well-known Z-order or Morton curve for cubical meshes. To store sufficient information for random access, we define a low-memory encoding using 10 bytes per triangle and 14 bytes per tetrahedron. We present algorithms that compute the parent, children, and face-neighbors of a mesh element in constant time, as well as the next and previous element in the space-filling curve and whether a given element is on the boundary of the root simplex or not. Our presentation concludes with a scalability demonstration that creates and adapts selected meshes on a large distributed-memory system.

Full PDF

AA tetrahedral space-ﬁlling curve for non-conforming adaptive meshes

CARSTEN BURSTEDDE ∗ AND

JOHANNES HOLKE ∗† Abstract.

We introduce a space-ﬁlling curve for triangular and tetrahedral red-reﬁnement thatcan be computed using bitwise interleaving operations similar to the well-known Z-order or Mortoncurve for cubical meshes. To store suﬃcient information for random access, we deﬁne a low-memoryencoding using 10 bytes per triangle and 14 bytes per tetrahedron. We present algorithms thatcompute the parent, children, and face-neighbors of a mesh element in constant time, as well as thenext and previous element in the space-ﬁlling curve and whether a given element is on the boundaryof the root simplex or not. Our presentation concludes with a scalability demonstration that createsand adapts selected meshes on a large distributed-memory system.

Key words.

Forest of octrees, parallel adaptive mesh reﬁnement, Morton code, high perfor-mance computing, nonconforming simplicial mesh, space-ﬁlling curve

AMS subject classiﬁcations.

1. Introduction.

Conforming adaptive mesh reﬁnement for simplicial (trian-gular and tetrahedral) meshes is one of the most successful concepts in numericalmathematics and computational science and engineering; see, e.g., [3, 21, 50]. Sim-plices provide high ﬂexibility in meshing to arbitrary domain geometries [46, 47] andcan be mapped to a reference simplex using an elementwise constant Jacobian inmost cases, which allows for an eﬃcient numerical implementation. Discretizationand integration methods of various kinds are available for both low and high ordersof accuracy.Large-scale scientiﬁc computing requires fast and scalable algorithms for (1) adap-tive reﬁnement and coarsening (AMR) as well as (2) parallel partitioning. One classof methods for AMR exploits the properties of Delaunay triangulations [23, 26, 45],while partitioning of unstructured meshes is often approached by formulating themesh topology as a graph. These triangulations are usually conforming; that is, ele-ments intersect only along whole faces and edges. Common application codes such asFEniCS [33], PLUM [35,36], OpenFOAM [37], or MOAB from the SIGMA toolkit [53]delegate graph partitioning to third-party software like ParMETIS [29] or Scotch [20].Graph-based algorithms have been advanced to target millions of processes and bil-lions of elements [18,42,49]. Still, increasing the scalability and decreasing the absoluteruntime and memory demands of distributed implementations remains a challenge,and the lack of an obvious parent-child structure in many unstructured meshing ap-proaches prevents certain use cases.Nonconforming AMR in combination with recursive reﬁnement makes reﬁnementand coarsening nearly trivial operations. The additional mathematical logic to enablehanging faces and edges is well understood for both continuous and discontinuousdiscretizations [1, 22, 30, 43]. It is local to the loop over the ﬁnite elements or volumesand transparent to most of the numerical pipeline, thus oﬀering the possibility toextend existing conforming codes. The resolution may be as coarse as any chosenroot mesh (a mesh which is not intended for further coarsening), which poses only aslight limit to geometric ﬂexibility.The challenge of eﬃcient partitioning of meshes may be addressed using space- ∗ Institut f¨ur Numerische Simulation (INS) and Hausdorﬀ Center for Mathematics (HCM),Rheinische Friedrich-Wilhelms-Universit¨at Bonn, Germany. † Corresponding author ([email protected])1 a r X i v : . [ c s . D C ] A p r C, BURSTEDDE, J. HOLKE ﬁlling curves (SFC). Instead of working on the NP-hard graph partitioning problem[12], SFCs are used to approximately solve the partitioning problem in linear runtime[4, 57]. Common SFCs establish an equivalence between the adaptive mesh and atree, where the mesh elements correspond one-to-one to the leaves of the tree. Theyalso deﬁne a total order among the leaves that can be used to store application datalinearly in the same order as the elements [15]. The original concept is speciﬁed fora cubic domain and can be traced back to over a hundred years ago [25, 39] . TheSierpinski curve [48] and generalized versions of it are the most common SFCs onsimplicial meshes. These were described by B¨ansch [8], Kossaczk`y [31], and others;see e.g. [4,44]. The computation of parents and children, for example, usually involvesa loop over the reﬁnement levels, equivalent to following the path between the treeroot and a leaf.When using hypercubes as mesh primitives, the logic of common SFCs can beformulated with remarkable simplicity due to the local tensor product structure [39,51,52,54,57]. It is also well-suited to compute topological entities like face-neighbors,children, and parents of given mesh elements. The Morton curve [34], in particular,allows one to design algorithms for these tasks (see, e.g., [16]) whose runtime is level-independent, i.e., does not depend on the depth of the reﬁnement tree. Moreover,SFCs are memory eﬃcient, since for a mesh element only the coordinates of oneanchor node plus its reﬁnement level have to be stored. Today, this principle hasfound its way into widely used software libraries [5, 6]. Memory usage can be furtheroptimized by incremental encoding along the SFC [4, 13, 55].Using SFCs on hexahedral meshes is exceptionally fast and scalable [14, 28, 41].This fact has not only been exploited in writing simulation codes using hexahedralmeshes, but also by approaches that use the hexahedral SFC as an instrument topartition simplices, mapping them into a surrounding cube [2]. However, hexahedralAMR is more limited geometrically than simplicial AMR. One indication for this isthe (lack of) availability of (open-source) mesh generators that operate exclusivelyon cubes. Furthermore, assuming an existing simplicial code, rewriting it in terms ofcubical meshes will be out of the question if the code is of suﬃcient size, complexity,or maturity. For these reasons, in this paper, we attempt to establish a new SFCfor triangles and tetrahedra that has many of the favorable properties known forhexahedra.Our starting point is to divide the simplices in a reﬁned mesh into two (twodimensions, 2D), respectively six (three dimensions, 3D), diﬀerent types and selectingfor each type a speciﬁc reordering for Bey’s red-reﬁnement [9]. This type and thecoordinates of one vertex serve as a unique identiﬁer, the Tet-id, of the simplex inquestion. In particular, we do not require storing the type of all parent simplices tothe root, as one might naively imagine. We then propose a Morton-like coordinatemapping that can be computed from the Tet-id and gives rise to an SFC. Based onthis logic, we develop constant-time algorithms (that is, not requiring a loop overthe reﬁnement levels) to compute the tetrahedral-Morton identiﬁers of any parent,child, face-neighbor, and SFC-successor/predecessor and to decide whether for twogiven elements one is an ancestor of the other. We conclude with scalability tests ofmesh creation and adaptation with over 8 . × tetrahedral mesh elements on upto 131,072 cores of the JUQUEEN supercomputer and 786,432 cores of MIRA.

2. Mesh reﬁnement on simplices.

Our aim is to deﬁne and examine a newSFC for triangles and tetrahedra by adding ordering prescriptions to the noncon-forming Bey-reﬁnement (also called red-reﬁnement) [9, 10, 56]. We brieﬂy restate the

TETRAHEDRAL SPACE-FILLING CURVE (cid:126)x (cid:126)x (cid:126)x (cid:126)x (cid:126)x (cid:126)x T T T T (cid:126)x (cid:126)x (cid:126)x (cid:126)x (cid:126)x (cid:126)x (cid:126)x (cid:126)x Fig. 1 . Left: The reﬁnement scheme for triangles in two dimensions. A triangle T =[ (cid:126)x , (cid:126)x , (cid:126)x ] ⊂ R is reﬁned by dividing each face at the midpoints (cid:126)x ij . We obtain four smallertriangles, all similar to T . Right: The situation in three dimensions. If we divide the edges of thetetrahedron T = [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] ⊂ R in half, we get four smaller tetrahedra (similar to T ) and aninner octahedron. By dividing the octahedron along any of its three diagonals (shown dashed) weﬁnally end up with partitioning T into eight smaller tetrahedra, all having the same volume. Thereﬁnement rule of Bey is obtained by always choosing the diagonal from (cid:126)x to (cid:126)x and numberingthe corners of the children according to (2) . red-reﬁnement in this section and contrast it with the well-known conforming (orred/green) reﬁnement.We refer to triangles and tetrahedra as d -simplices, where d ∈ { , } speciﬁes thedimension. It is sometimes convenient to drop d from this notation. A d -simplex T ⊆ R d is uniquely determined by its d +1 aﬃne-independent corner nodes (cid:126)x , . . . , (cid:126)x d ∈ R d .Their order is signiﬁcant, and therefore we write T = [ (cid:126)x , (cid:126)x , (cid:126)x ] in 2D , (1a) T = [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] in 3D . (1b)We deﬁne (cid:126)x as the anchor node of T . By (cid:126)x ij we denote the midpoint between (cid:126)x i and (cid:126)x j , thus (cid:126)x ij = ( (cid:126)x i + (cid:126)x j ). Bey’s rule is a prescription for subdividing a sim-plex. It is one instance of the so-called red-reﬁnement, where all faces of a simplexare subdivided simultaneously.

Definition Given a d -simplex T = [ (cid:126)x , . . . , (cid:126)x d ] ⊂ R d , the reﬁnement rule ofBey consists of cutting oﬀ four subsimplices at the corners (as in Figure 1). In 3Dthe remaining octahedron is then divided along the diagonal from (cid:126)x to (cid:126)x . Beynumbers the d resulting subsimplices as follows. (2a) 2 D : T := [ (cid:126)x , (cid:126)x , (cid:126)x ] , T := [ (cid:126)x , (cid:126)x , (cid:126)x ] ,T := [ (cid:126)x , (cid:126)x , (cid:126)x ] , T := [ (cid:126)x , (cid:126)x , (cid:126)x ] , (2b) 3 D : T := [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] , T := [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] ,T := [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] , T := [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] ,T := [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] , T := [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] ,T := [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] , T := [ (cid:126)x , (cid:126)x , (cid:126)x , (cid:126)x ] . C, BURSTEDDE, J. HOLKE

XY S S c = (cid:18) (cid:19) c c (cid:18) (cid:19) = c S S S S S X YZ S c c c c c c c c (3a) 2D (cid:126)x (cid:126)x (cid:126)x S c c c S c c c (3b) 3D (cid:126)x (cid:126)x (cid:126)x (cid:126)x S c c c c S c c c c S c c c c S c c c c S c c c c S c c c c Fig. 2 . The basic triangle (2D) and tetrahedra types (3D) obtained by dividing [0 , d intosimplices of varying types, denoted by a subscript. Top left: The unit square can be divided into twotriangles sharing the edge from (0 , T to (1 , T . We denote these triangles by S and S . Thefour corners of the square are numbered c , . . . , c in yx -order. Top right: The corner nodes of S and S in terms of the square corners. Bottom left (exploded view): In three dimensions the unitcube can be divided into six tetrahedra, all sharing the edge from the origin to (1 , , T . We denotethese tetrahedra by S , . . . , S . The eight corners of the cube are numbered c , . . . , c in zyx -order(redrawn and modiﬁed with permission [9]). Bottom right: The corner nodes of the six tetrahedra S , . . . , S in terms of the cube corners. Definition In this paper, a reﬁnement of a d -simplex S denotes a set S of d -dimensional subsimplices of S that can be constructed from S via successive re-ﬁnement, where only the ﬁnest simplices belong to the actual reﬁnement. Thus allreﬁnements can be obtained applying the following rules: • S = { S } is a reﬁnement of S . • If S (cid:48) is a reﬁnement of S , then S = ( S (cid:48) \ { T } ) ∪ { T , . . . , T d − } is areﬁnement for every T ∈ S (cid:48) .We explicitly allow nonuniform reﬁnements and thus nonconforming faces and edges. Definition The T i from (2) are called the children of T , and T is called the parent of the T i , written T = P ( T i ) . Therefore, we also call the T i siblings of eachother. If a d -simplex T belongs to a reﬁnement of another d -simplex S , then T is a descendant of S , and S is an ancestor of T . The number (cid:96) of reﬁning steps neededto obtain T from S is unique and called the level of T (with respect to S ); we write (cid:96) = (cid:96) ( T ) . Usually S is clear from the context, and therefore we omit it in the notation.By deﬁnition, T is an ancestor and descendant of itself. Consider the six tetrahedra S , . . . , S ⊂ R displayed in Figure 2. These tetra-hedra form a triangulation of the unit cube. The results and algorithms in this paper TETRAHEDRAL SPACE-FILLING CURVE reﬁning the cube (cid:47) (cid:47) triangulating the cube (cid:15) (cid:15) triangulating each cube (cid:15) (cid:15) reﬁning each cube (cid:47) (cid:47) triangulating each cube (cid:15) (cid:15) reﬁningeach tetrahedron (cid:47) (cid:47) reﬁningeach tetrahedron (cid:47) (cid:47) Fig. 3 . Triangulating a cube according to Figure 2 and then reﬁning the tetrahedra via Bey’sreﬁnement rule results in the same mesh as ﬁrst reﬁning the cube into eight subcubes and afterwardtriangulating each of these cubes. Each occurring tetrahedron is uniquely determined by the subcubeit lies in plus its type. The same situation can be observed in 2D if we restrict our view to one sideof the cube. rely on the following property [9].

Property Reﬁning the six tetrahedra from the triangulation of the unit cubesimultaneously to level (cid:96) results in the same mesh as ﬁrst reﬁning the unit cube to level (cid:96) and then triangulating each smaller cube with the six tetrahedra S , . . . , S , scaledby a factor of − (cid:96) (see Figure 3). The same behavior can be observed in 2D when theunit square is divided into two triangles. It is worthnoting that, although the methods and algorithms presented in this paper apply tored-reﬁned meshes with hanging nodes, it is possible to augment them to create mesheswithout hanging nodes. For this we may use red/green or red/green/blue reﬁnementmethods [7, 17].After the red-reﬁnement step we may add an additional and possibly nonlocalreﬁnement operation that ensures a maximum level diﬀerence of 1 between neigh-boring simplices. Such an operation is also called 2:1 balance [27, 52, 54]; its fulldescription, however, would exceed the scope of the present paper. Hanging nodesare then resolved by bisecting those simplices with hanging nodes (green/blue reﬁne-ment) [4, section 12.1.3]. The 2D case is shown in Figure 4.If one of the newly created simplices shall be further reﬁned, the bisection isreversed, the original simplex is red-reﬁned, and the balancing and green-reﬁnementis repeated. This may void the nesting property of certain discrete function spaces,yet applications may still prefer this approach over the manual implementation ofhanging node constraints.

3. A tetrahedral Morton index.

The Morton index or Z-order for a cube ina hexahedral mesh is computed by bitwise interleaving the coordinates of the anchornode of the cube [34]. In this section we present an index for d -simplices that also C, BURSTEDDE, J. HOLKE

Fig. 4 . To resolve hanging nodes we can execute an additional step of green- or blue-reﬁnementafter the last reﬁnement step [7,17]. Here we show the 2D reﬁnement rules. Left: green (1 hangingnode). Right: blue (2 hanging nodes). If a triangle has 3 hanging nodes it is red-reﬁned. uses the bitwise interleaving approach, the tetrahedral Morton index (TM-index). Todeﬁne the TM-index we look at reﬁnements of a reference simplex, which we discussin Section 3.1 below. For each d -simplex in a reﬁnement of the reference simplex wedeﬁne a unique identiﬁer, the so-called Tet-id, which serves as the input to compute theTM-index and for all algorithms related to it. This Tet-id consists of the coordinatesof the anchor node of the considered simplex plus one additional number, the typeof the simplex. We deﬁne the Tet-id and type in Section 3.2. We then deﬁne theTM-index in Section 3.3 and in the following subsections. We show that it possesseslocality properties similar to those of the Morton index. One novel aspect of thisconstruction lies in logically including the types of the simplex and all its parents inthe interleaving, while only using the type of the simplex itself in the algorithms. Throughout the rest of this paper, let L be agiven maximal reﬁnement level. Instead of the unit cube [0 , d , we consider thescaled cube [0 , L ] d , ensuring that all node coordinates in a reﬁnement up to level L are integers. Suppose we are given some d -simplex T ⊂ R d together with a reﬁnement S of T . By mapping T aﬃne-linearly to 2 L S the reﬁnement S is mapped to areﬁnement S (cid:48) of 2 L S . Therefore, to examine SFCs on reﬁnements of T , it suﬃcesto examine SFCs on 2 L S . Thus, we only consider reﬁnements of the d -simplex T d := 2 L S . Let T d be the set of all possible descendants of this d -simplex with levelsmaller than or equal to L ; thus(4) T d = (cid:8) T | T is a descendant of T d with 0 ≤ (cid:96) ( T ) ≤ L (cid:9) . Any reﬁnement (up to level L ) of T d is a subset of T d , and for each T ∈ T d thereexists at least one reﬁnement T of T d with T ∈ T . In this context, we refer to T d as the root simplex . Furthermore, let L d denote the set of all possible anchor nodecoordinates of d -simplices in T d , thus(5) L = (cid:8) [0 , L ) ∩ Z | y ≤ x (cid:9) , L = (cid:8) [0 , L ) ∩ Z | y ≤ z ≤ x (cid:9) . Note that we could have chosen any other of the S i (scaled by 2 L ) as the root simplexand we do not see any advantage or disadvantage in doing so. d -simplex. Making use of Property 4, wedeﬁne the following.

Definition Each d -simplex T ∈ T d of level (cid:96) lies in a d -cube of the hexahedralmesh that is part of a uniform level (cid:96) reﬁnement of [0 , L ] d . This speciﬁc cube is the associated cube of T and denoted by Q T . The d -simplex T is a scaled and shiftedversion of exactly one of the six tetrahedra S i that constitute the unit cube, and wedeﬁne the type of T as this number, type( T ) := i . TETRAHEDRAL SPACE-FILLING CURVE T T T T b 0 0 0 0 11 1 1 1 0 Ct Child3D T , . . . , T T T T T b 0 0 4 5 2 11 1 3 2 5 02 2 0 1 4 33 3 5 4 1 24 4 2 3 0 55 5 1 0 3 4 Table 1

For a d -simplex T of type b the table gives the types Ct ( T ) , . . . , Ct ( T d − ) of T (cid:48) s children. Thecorner-children T , T , T (and in 3D also T ) always have the same type as T . The anchor node of a subcube of level (cid:96) is the particular corner of that cube with thesmallest x -, y - (and z -) coordinates. This means that for each simplex T in the reﬁne-ment from Figure 3 the anchor node of T and the anchor node of its associated cubecoincide. Any two d -simplices in T d with the same associated cube are distinguishableby their type.From Bey’s observation from Figure 3 it follows that any simplex in T d can beobtained by specifying a level (cid:96) , then choosing one level (cid:96) subcube of the root cubeand ﬁnally ﬁxing a type. This provides motivation for the following deﬁnition. Definition

For T = [ (cid:126)x , . . . , (cid:126)x d ] ∈ T d we deﬁne the Tet-id of T asthe tuple of its anchor node and type; thus (6) Tet-id ( T ) := ( (cid:126)x , type( T )) . Corollary Let

T, T (cid:48) ∈ T d . Then T = T (cid:48) if and only if their Tet-ids andlevels are the same. Note that in an arbitrary adaptive mesh there can be simplices with diﬀerent levelsand each simplex T has an associated cube of level (cid:96) ( T ). In particular, simplices withthe same anchor node can have diﬀerent associated cubes if their levels are not equal.Since any simplex in T d can be speciﬁed by the Tet-id and level, the Tet-id providesan important tool for our work. The construction of the TM-index in the next sectionand the algorithms that we present in Section 4 rely on the Tet-id as the fundamentaldata of a simplex. All information about a mesh can be extracted from the Tet-idand level of each element.Since the root simplex has type 0, in a uniform reﬁnement more simplices havetype 0 than any other type. However, a close examination of Table 1 together with ashort inductive argument leads to the following proposition. Proposition In the limit (cid:96) → ∞ the diﬀerent types of simplices in a uniformlevel (cid:96) reﬁnement of T d occur in equal ratios. In addition to the anchorcoordinates the TM-index also depends on the types of all ancestors of a simplex. Inorder to deﬁne the TM-index we start by giving a formal deﬁnition of the interleavingoperation and some additional information.

Definition We deﬁne the interleaving a ˙ ⊥ b of two n -tuples a = ( a n − ,. . . , a ) and b = ( b n − , . . . , b ) as the n -tuple obtained by alternating the entries C, BURSTEDDE, J. HOLKE of a and b : (7) a ˙ ⊥ b := ( a n − , b n − , . . . , a , b ) . The interleaving of more than two n -tuples a , . . . , a m is deﬁned analogously as the mn -tuple (8) a ˙ ⊥ · · · ˙ ⊥ a m := ( a n − , a n − , . . . , a mn − , a n − , . . . , a m − , a m ) . Remark

The TM-index of a d -simplex T ∈ T d that we are going to deﬁneis constructed by interleaving d + 1 L -tuples, where the ﬁrst d are the binary repre-sentations of the coordinates of T ’s anchor node and the last is the tuple consistingof the types of the ancestors of T . Definition

Let T ∈ T be a tetrahedron of reﬁnement level (cid:96) with anchornode (cid:126)x = ( x, y, z ) T ∈ L . Since x, y, z ∈ N with ≤ x, y, z < L , we can expressthem as binary numbers with L digits, writing (9) x = L− (cid:88) j =0 x j j , y = L− (cid:88) j =0 y j j , z = L− (cid:88) j =0 z j j . We deﬁne the L -tuples X , Y , and Z as the L -tuples consisting of the binary digits of x , y , and z ; thus, X = X ( T ) := ( x L− , . . . , x ) , (10a) Y = Y ( T ) := ( y L− , . . . , y ) , (10b) Z = Z ( T ) := ( z L− , . . . , z ) . (10c) In 2D we get the same deﬁnitions with X and Y , leaving out the z -coordinate. Definition

For a T ∈ T d of level (cid:96) and each ≤ j ≤ (cid:96) let T j be the (unique)ancestor of T of level j . In particular, T (cid:96) = T . We deﬁne B ( T ) as the L -tupleconsisting of the types of T ’s ancestors in the ﬁrst (cid:96) entries, starting with T . Thelast L − (cid:96) entries of B ( T ) are zero: (11) B = B ( T ) :=  type( T ) , type( T ) , . . . , type( T ) (cid:124) (cid:123)(cid:122) (cid:125) (cid:96) entries , , . . . ,  , Thus, if we write B as an L -tuple with indexed entries b i (12) B = B ( T ) = ( b L− , . . . , b ) ∈ { , . . . , d ! − } L , then the i th entry b i is given as (13) b i = (cid:40) type( T L− i ) L − ≥ i ≥ L − (cid:96), L − (cid:96) > i ≥ . Definition

13 (tetrahedral Morton Index).

We deﬁne the tetrahedral Mortonindex ( TM-index ) m ( T ) of a d -simplex T ∈ T d as the interleaving of the L -tuples Z (for tetrahedra), Y , X and B . Thus, (14a) m ( T ) := Y ˙ ⊥ X ˙ ⊥ B TETRAHEDRAL SPACE-FILLING CURVE for triangles and (14b) m ( T ) := Z ˙ ⊥ Y ˙ ⊥ X ˙ ⊥ B for tetrahedra. This index resembles the well-known Morton index or Z-order for d -dimensional cubes,which we denote by (cid:101) m here. For such a cube Q the Morton index is usually deﬁnedas the bitwise interleaving of its coordinates. Thus (cid:101) m ( Q ) = Z ˙ ⊥ Y ˙ ⊥ X , respectively, (cid:101) m ( Q ) = Y ˙ ⊥ X ; see [16, 34, 52].As we show in Section 4, the TM-index can be computed from the Tet-id of T with no further information given. Thus, in an implementation it is not necessary tostore the L -tuple B .The TM-index of a d -simplex builds up from packs of d bits z i (for tetrahedra), y i , and x i followed by a type b i ∈ { , . . . , d ! − } . Since d ! = 2 < d = 2, we caninterpret the 2D TM-index as a quarternary number with digits ( y i x i ) and b i :(15a) m ( T ) = (( y L− x L− ) , b L− , . . . , ( y x ) , b ) = L− (cid:88) i =0 (cid:0) (2 y i + x i )4 i +1 + b i i (cid:1) . Similarly we can interpret it as an octal number with digits ( z i y i x i ) and b i for d = 3,since then d ! = 6 < m ( T ) = (( z L− y L− x L− ) , b L− , . . . , ( z y x ) , b ) = L− (cid:88) i =0 (cid:0) (4 z i + 2 y i + x i )8 i +1 + b i i (cid:1) . The entries in these numbers are only nonzero up to the level (cid:96) of T , since x L− i = y L− i = ( z L− i =) b L− i = 0 for all i > (cid:96) . The octal/quarternary representation (15)directly gives an order on the TM-indices, and therefore it is possible to construct anSFC from it, which we examine further in Section 3.6. We use m ( T ) to denote boththe ( d + 1) L -tuple from (14) and the number given by (15).Let us look at Figure 5 for a short example to motivate this deﬁnition of the TM-index. Since the anchor coordinates and the type together with the level uniquelydetermine a d -simplex in T d , one could ask why we do not deﬁne the index to be(( Z ˙ ⊥ ) Y ˙ ⊥ X, type( T )), a pair consisting of the Morton index of the associated cubeof T and the type of T . This index was introduced for triangles in a slightly modiﬁedversion as semiquadcodes in [38] and would certainly require less information sincethe computation of the sequence B would not be necessary. However, it results in anSFC that traverses the leaf cubes of a reﬁnement in the usual Z-order and inside ofeach cube it traverses the d ! diﬀerent simplices in the order S , . . . , S d ! − . As a result,there can be simplices T whose children are not traversed as a group, which meansthat there is a tetrahedron T (cid:48) , which is not an ancestor or descendant of T , such thatsome child T i of T is traversed before T (cid:48) and T (cid:48) is traversed before another child T j of T . Theorem 16 states that this problem does not occur with the TM-index thatwe have deﬁned above. Figure 5 compares the two approaches for a uniform level 2reﬁnement of T . There is another in-terpretation of the TM-index, which is particularly useful for the AMR algorithms0

C, BURSTEDDE, J. HOLKE

Y Xm T = ( Y ˙ ⊥ X, type( T )) Y Xm T = Y ˙ ⊥ X ˙ ⊥ B Fig. 5 . Comparing a straightforward deﬁnition of a Morton-type SFC with our approach. Left:The SFC arising from taking the Morton order of the quadrants and only dividing into triangles onthe last level. Thus the index is ( Y ˙ ⊥ X, type( T )) . As we see on the two coarse triangles that areshaded, the children of a level 1 triangle are not necessarily traversed before any other triangle istraversed. Such a locality property would be desirable, and therefore this index is not suitable forour purposes. Right: The SFC arising from the TM-index from our Deﬁnition 13. We see that foreach level 1 triangle its four children are traversed as a group. Theorem 16 states that this propertyholds for any parent triangle/tetrahedron. The order in which the children are traversed depends(only) on the type of the parent and is diﬀerent from Bey’s order given by (2) . presented in Section 4. In order to deﬁne it we introduce the concept of the so-calledcube-id. According to Figure 2 we number the 2 d corners of a d -dimensional cube by c , . . . , c d − in a zyx -order ( x varies fastest). When reﬁning a cube to 2 d children,each child has exactly one of the c i as a corner, and it is therefore convenient tonumber the children by c , . . . , c d − as well. For the child c i we call the number i the cube-id of that child; see Figure 6 for an illustration. Since each cube Q thatis not the root has a unique parent, it also has a unique cube-id. This cube-id caneasily be computed by interleaving the last signiﬁcant bits of the z - (in 3D), y -, and x -coordinates of Q ’s anchor node. Definition

Because each d -simplex T ∈ T d of level (cid:96) has a unique associatedcube we deﬁne the cube-id of T to be the cube-id of the associated cube of T , that is,the d -cube of level (cid:96) that has the same anchor node as T . If X , Y (and Z ) are as is in Deﬁnition 11 then we can write the cube-id of T ’s ancestorsas(16) cube-id( T i ) = ( y i x i ) in 2D , cube-id( T i ) = ( z i y i x i ) in 3D , and therefore using (15) we can rewrite the TM-index of T as(17) m ( T ) = (cube-id( T ) , type( T ) , . . . , cube-id( T (cid:96) ) , type( T (cid:96) ) , , . . . , d . This resembles the Morton index of the associated cube Q T of T , since we can writethis as(18) (cid:101) m ( Q T ) = (cube-id( Q ) , . . . , cube-id( Q (cid:96) ) , , . . . , d . In this section we show that the TM-indexhas properties similar to that of the Morton index for cubes. We are particularly in-terested in locality properties stating that reﬁning a simplex only locally changes the

TETRAHEDRAL SPACE-FILLING CURVE XYc c c c X YZc c c c c c c

45 6701 23 c Fig. 6 . Left: A square is reﬁned to four children, each of which corresponds to a corner of thesquare. The number of the corner is the cube-id of that child. Right: In three dimensions a cube isreﬁned to eight children. Their cube-ids and corner numbers are shown as well. order given by the TM-index. We begin by stating the uniqueness of the TM-indexif additionally a level (cid:96) is given.

Proposition

Together with a reﬁnement level (cid:96) , the TM-index m ( T ) unique-ly determines a d -simplex in T d .Proof. If (cid:96) = 0, then there is only one simplex of level (cid:96) in T d , which is T d . Solet (cid:96) > m = m ( T ) be given as in (14), and let (cid:96) be the level of T . From m wecan compute the x - , y - (and z -) coordinates of the associated cube of T . We can alsocompute the type of T from the TM-index. By Corollary 7 this information uniquelydetermines T .For the Morton index (cid:101) m for cubes the following important properties are known [52]:(i) The Morton indices of the descendants of a parent cube are larger than or equalto the index of the parent cube.(ii) A Morton index of a cube Q is the preﬁx of an index of a cube P of higher levelthan Q if and only if P is a descendant of Q .(iii) Reﬁning only changes the SFC locally. Thus, if Q is a cube and P is a cubewith (cid:101) m ( Q ) < (cid:101) m ( P ) and P is not a descendant of Q , then (cid:101) m ( Q (cid:48) ) < (cid:101) m ( P ) foreach descendant Q (cid:48) of Q .Property (iii) deﬁnes a hierarchic invariant of the SFC that is speciﬁc to our con-struction (see Figure 5). We show below that properties (i), (ii) and (iii) hold for d -simplices and the TM-index described by (14). Theorem

For arbitrary d -simplices T (cid:54) = S ∈ T d the TM-index satisﬁes thefollowing:(i) If S is a descendant of T then m ( T ) ≤ m ( S ) .(ii) If (cid:96) ( T ) < (cid:96) ( S ) , then m ( T ) is a preﬁx of m ( S ) if and only if S is a descendantof T .(iii) If m ( T ) < m ( S ) and S is no descendant of T , then for each descendant T (cid:48) of T we have (19) m ( T ) ≤ m ( T (cid:48) ) < m ( S ) . The proof of Theorem 16 requires some work and we need to show a technical resultﬁrst. Hereby, we consider only the 3D case, since for 2D the argument is completelyanalogous. We deﬁne an embedding of the set of all TM-indices into the set of Mortonindices for 6D cubes. Since the properties (i)–(iii) hold for these cubes it follows that2

C, BURSTEDDE, J. HOLKE they hold for tetrahedra as well. To this end, for a given tetrahedron T ∈ T weinterpret each entry b j of B ( T ) as a 3-digit binary number(20) b j = ( b j b j b j ) , which is possible since b j ∈ { , . . . , } . We obtain three new L -tuples B , B , B satisfying(21) B = B ˙ ⊥ B ˙ ⊥ B , and thus we can rewrite the TM-index as(22) m ( T ) = Z ˙ ⊥ Y ˙ ⊥ X ˙ ⊥ B ˙ ⊥ B ˙ ⊥ B . Note that we can interpret each B i as an L -digit binary number for which we have0 ≤ B i < L . Now let Q denote the set of all 6D cubes that are a child of the cube Q := [0 , L ] :(23) Q = { Q | Q is a descendant of Q of level 0 ≤ (cid:96) ≤ L } . Since a cube Q ∈ Q is uniquely determined by the six coordinates ( x , . . . , x ) ofits anchor node plus its level (cid:96) , we also write Q = Q ( x ,...,x ) ,(cid:96) . Note that theMorton index for a cube can be deﬁned as the bitwise interleaving of its anchor nodecoordinates [34]:(24) (cid:101) m ( Q ) = X ˙ ⊥ X ˙ ⊥ X ˙ ⊥ X ˙ ⊥ X ˙ ⊥ X . Proposition

The map (25) Φ : T −→ Q ,T (cid:55)−→ Q ( B ( T ) ,B ( T ) ,B ( T ) ,x ( T ) ,y ( T ) ,z ( T )) ,(cid:96) ( T ) is injective and satisﬁes (26) (cid:101) m (Φ( T )) = m ( T ) . Furthermore, it fulﬁlls the property that T (cid:48) is a child of T if and only if Φ( T (cid:48) ) is achild of Φ( T ) .Proof. The equation m ( T ) = (cid:101) m (Φ( T )) follows directly from the deﬁnitions of theTM-indices on T and Q . From Lemma 15 we conclude that Φ is injective. Now let T (cid:48) , T ∈ T , where T (cid:48) is a child of T . Furthermore, let (cid:96) = (cid:96) ( T ). We know that Q (cid:48) := Φ( T (cid:48) ) is a child of Q := Φ( T ) if and only if for each i ∈ { , . . . , } it holds that(27) x i ( Q (cid:48) ) ∈ (cid:110) x i ( Q ) , x i ( Q ) + 2 L− ( (cid:96) +1) (cid:111) . Because of the underlying cube structure (compare Figure 3) we know that the x -coordinate of the anchor node of T (cid:48) satisﬁes(28) x ( T (cid:48) ) ∈ (cid:110) x ( T ) , x ( T ) + 2 L− ( (cid:96) +1) (cid:111) , and likewise for Y ( T (cid:48) ) and Z ( T (cid:48) ). Therefore, (27) holds for i = 3 , ,

5. By deﬁnition B j ( T (cid:48) ) is the same as B j ( T ) except at position L − ( (cid:96) + 1), where(29) B j ( T (cid:48) ) L− ( (cid:96) +1) = b j L− ( (cid:96) +1) ( T (cid:48) ) ∈ { , } TETRAHEDRAL SPACE-FILLING CURVE B j ( T ) L− ( (cid:96) +1) = 0 . Hence, we conclude that (27) also holds for i = 0 , ,

2. So Φ( T (cid:48) ) is a child of Φ( T ).To show the other implication, let us assume that Φ( T (cid:48) ) is a child of Φ( T ). Since (cid:96) ( T (cid:48) ) = (cid:96) (Φ( T (cid:48) )) > T (cid:48) has a parent and we denote it by P . In the argument abovewe have shown that Φ( P ) is the parent of Φ( T (cid:48) ) and because each cube has a uniqueparent the identity Φ( P ) = Φ( T ) must hold. Therefore, we get P = T since Φ isinjective; thus, T (cid:48) is the child of T .Inductively we conclude that T (cid:48) is a descendant of T if and only if Φ( T (cid:48) ) is a descen-dant of Φ( T ). Now Theorem 16 follows, because the desired properties (i)–(iii) holdfor the Morton index of cubes. By interpretingthe TM-indices as 2 d -ary numbers as in (15) we get a total order on the set of allpossible TM-indices, and therefore it gives rise to an SFC for any reﬁnement S of T d . In this section we further examine the SFC derived from the TM-index. We givehere a recursive description of it, similarly to how it is done for the Sierpinski curveand other cubical SFC by Haverkort and van Walderveen [24].Part (iii) of Theorem 16 tells us that the descendants of a simplex T are traversedbefore any other simplices with a higher TM-index than T are traversed. However, theorder that the children of T have relative to each other can be diﬀerent to the orderof children of another simplex T (cid:48) . In particular the order of the simplices deﬁned bythe TM-index diﬀers from the order (2) deﬁned by Bey. We observe this behavior in2D in Figure 5 on the right-hand side: For the level 1 triangles of type 0 the childrenare traversed in the order(31) T , T , T , T and the children of the level 1 triangle of type 1 are traversed in the order(32) T , T , T , T . In fact, the order of the children of a simplex T depends only on the type of T , as weshow in the following Proposition. Proposition If T, T (cid:48) ∈ T d are two d -simplices of given type b = type( T ) =type( T (cid:48) ) , then there exists a unique permutation σ ≡ σ b of (cid:8) , . . . , d − (cid:9) such that (33) m ( T σ (0) ) < m ( T σ (1) ) < · · · < m ( T σ (2 d − ) , and m ( T (cid:48) σ (0) ) < m ( T (cid:48) σ (1) ) < · · · < m ( T (cid:48) σ (2 d − ) . Thus, the children of T and the children of T (cid:48) are in the same order with respect totheir TM-index.Proof. By ordering the children of T and T (cid:48) with respect to their TM-indices, weobtain σ and σ (cid:48) with(34) m ( T σ (0) ) < m ( T σ (1) ) < · · · < m ( T σ (2 d − ) ,m ( T (cid:48) σ (cid:48) (0) ) < m ( T (cid:48) σ (cid:48) (1) ) < · · · < m ( T (cid:48) σ (cid:48) (2 d − ) . C, BURSTEDDE, J. HOLKE

These permutations are well-deﬁned and unique with this property because diﬀerentsimplices of the same level never have the same TM-index; see Proposition 15. Itremains to show that σ (cid:48) = σ . Let (cid:96) = (cid:96) ( T ) and (cid:96) (cid:48) = (cid:96) ( T (cid:48) ). Since the TM-indices ofthe children of T do all agree up to level (cid:96) , we see, using the notation from (15), thattheir order σ depends only on the d + 1 numbers ( z is omitted for d = 2)(35) z L− ( (cid:96) +1) ( T i ) , y L− ( (cid:96) +1) ( T i ) , x L− ( (cid:96) +1) ( T i ) and b L− ( (cid:96) +1) ( T i ) . The same argument applies to σ (cid:48) and (cid:96) (cid:48) . From now on we carry out the computationsfor d = 3. Since type( T ) = type( T (cid:48) ) we can write(36) T = λT (cid:48) + (cid:126)c, with(37) λ = 2 (cid:96) (cid:48) − (cid:96) , (cid:126)c =  x ( T ) − x ( T (cid:48) ) y ( T ) − y ( T (cid:48) ) z ( T ) − z ( T (cid:48) )  . Since the reﬁnement rules (2) commute with scaling and translation we also obtain(38) T i = λT (cid:48) i + (cid:126)c for the children of T and T (cid:48) and therefore(39) b L− ( (cid:96) +1) ( T i ) = type( T i ) = type( T (cid:48) i ) = b L− ( (cid:96) (cid:48) +1) ( T (cid:48) i )for 0 ≤ i < d . Furthermore, we have(40) x L− ( (cid:96) +1) ( T i ) = ( x ( T i ) − x ( T ))2 −L +( (cid:96) +1) from which we derive(41) x L− ( (cid:96) +1) ( T i ) = λ ( x ( T (cid:48) i ) − x ( T (cid:48) ))2 −L +( (cid:96) +1) = 2 (cid:96) (cid:48) − (cid:96) ( x ( T (cid:48) i ) − x ( T (cid:48) ))2 −L +( (cid:96) +1) = ( x ( T (cid:48) i ) − x ( T (cid:48) ))2 −L +( (cid:96) (cid:48) +1) = x L− ( (cid:96) (cid:48) +1) ( T (cid:48) i ) , and analogously(42) y L− ( (cid:96) +1) ( T i ) = y L− ( (cid:96) (cid:48) +1) ( T (cid:48) i ) ,z L− ( (cid:96) +1) ( T i ) = z L− ( (cid:96) (cid:48) +1) ( T (cid:48) i ) . This shows that the tetrahedral Morton order of the children of T and T (cid:48) are thesame and σ (cid:48) must equal σ . Definition

Let T ∈ T d such that T ’s parent P has type b and T is the i thchild of P according to Bey’s order (2) , ≤ i < d . We call the number σ b ( i ) the local index of the d -simplex T and use the notation (43) I loc ( T ) := σ b ( i ) to denote the child number in the TM-ordering, subsequently written TM-child. Bydeﬁnition, the local index of the root simplex is zero, I loc ( T d ) := 0 . Table 2 lists thelocal indices for each parent type. TETRAHEDRAL SPACE-FILLING CURVE I loc Child2D T T T T b 0 0 1 3 21 0 2 3 1 I loc Child3D T T T T T T T T b 0 0 1 4 7 2 3 6 51 0 1 5 7 2 3 6 42 0 3 4 7 1 2 6 53 0 1 6 7 2 3 4 54 0 3 5 7 1 2 4 65 0 3 6 7 2 1 4 5 Table 2

The local index for the children of a d -simplex T according to the TM-ordering. For each type b ,the d children T , . . . , T d − of a simplex of this type can be ordered according to their TM-indices.The position of T i according to the TM-order is the local index I loc ( T i ) = σ b ( i ) . type = 0R: 31 20 RF RR type = 1F: 0 32 1 RFF F Fig. 7 . Left: Using the notation from [24] we recursively describe the SFC arising from theTM-index for triangles. The number inside each child triangle is its local index. R denotes thereﬁnement scheme for type 0 triangles and F for type 1 triangles. This pattern can be obtained fromTables 1 and 2. Right: The SFC for an example adaptive reﬁnement of the root triangle. Thus, we know for each type 0 ≤ b < d ! how the children of a tetrahedron of type b are traversed. This gives us an approach for describing the SFC arising from theTM-index in a recursive fashion [24]. By specifying for each possible type b the orderand types of the children of a type b simplex, we can build up the SFC. In Figure 7we describe the SFC for triangles in this way. In three dimensions it is not convenientto draw the six pictures for the diﬀerent types, but the SFC can be derived similarlyfrom Tables 1 and 2.

4. Algorithms on simplices.

In this section we present fundamental algo-rithms that operate on d -simplices in T d . These algorithms include computations ofparent and child simplices, computation of face-neighbors and computations involvedwith the TM-index. To simplify the notation we carry out all algorithms for tetrahe-dra and then describe how to modify them for triangles. We introduce the data type Tet and do not distinguish between the abstract concept of a

Tet and the geomet-ric object (tetrahedron or triangle) that it represents. The data type

Tet T has thefollowing members: • T.(cid:96) — the reﬁnement level of T ; • T.(cid:126)x = (

T.x, T.y, T.z ) — the x - , y - and z -coordinates of T ’s anchor node, alsosometimes referred to as T.x , T.x , and T.x ; • T.b — the type of T .In 2D computations the parameter T.z is not present. To avoid confusion we use the6

C, BURSTEDDE, J. HOLKE notation (cid:126)x to denote vectors in Z d and x (without arrow) for integers, thus numbersin Z . From Corollary 7 we know that the values stored in a Tet suﬃce to uniquelyidentify a d -simplex T ∈ T . Remark

20 (Storage requirement).

The algorithms that we present in this sectiononly need this data as input for a simplex resulting in a ﬁxed storage size per

Tet . If,for example, the maximum level L is 32 or less, then the coordinates can be storedin one -byte integer per dimension, while the level and type occupy one byte each,leading to a total storage of (44) 2 × bytes per Tet in 2D, × bytes per Tet in 3D.

Remark

21 (Runtime).

Most of these algorithms run in constant time indepen-dent of the maximum level L . The only operations using a loop over the level L or T.(cid:96) ,thus having O ( L ) runtime, are computing the consecutive index from a Tet and ini-tializing a

Tet according to a given consecutive index. Hence, we show how to replacerepetitive calls of these relatively involved algorithms by more eﬃcient constant-timeones. d -simplex. The coordinates of the d + 1 nodes ofa d -simplex T can be obtained easily from its Tet-id, the relation (3), and simplearithmetic: If T is a d -simplex of level (cid:96) , type b and anchor node (cid:126)x ∈ Z d , then(45) T = 2 L− (cid:96) S b + (cid:126)x . Hence, in order to compute the coordinates of T we can take the coordinates of S b ,as given in (3), and then use relation (45). A closer look at (3) reveals that it is notnecessary to examine all coordinates of S b in order to compute the x i , but that theycan also be computed arithmetically. This computation is carried out in Algorithm4.1. Algorithm 4.1:

Coordinates ( Tet T ) X ← ( T.(cid:126)x, , , h ← L− (cid:96) ; i ← (cid:98) T.b (cid:99) ; /* Replace with i ← T.b for 2D */ if T.b % then j ← ( i + 2) % else j ← ( i + 1) % X [1] ← X [0] + he i ; X [2] ← X [1] + he j ; /* Replace with X [2] ← X [0] + ( h, h ) T for 2D */ X [3] ← X [0] + ( h, h, h ) T ; /* Remove this line for 2D */ return X ; In this section we describe how to compute the Tet-idsof the parent P ( T ) and of the 2 d children T i , 0 ≤ i < d , of a given d -simplex T ∈ T d .Computing the anchor node coordinates of the parent is easy, since their ﬁrst T.(cid:96) − T ’s anchor node and the rest of their bits is zero. TETRAHEDRAL SPACE-FILLING CURVE Pt( c, b ) b2D 0 1 c c, b ) b3D 0 1 2 3 4 5 c Z YX

Fig. 8 . The type of the parent of a d -simplex T can be determined from T ’s cube-id c and type b . Left: The values of P t from (46) in the 2D case. Middle: The values of the function

P t in the3D case. Right: Two examples showing the computation in 3D. (1) The small tetrahedron in thetop left corner (orange) has cube-id and type , and its parent (green) can be seen to have type .(2) The small tetrahedron at the bottom right (red) has cube-id and type , and its parent (blue)has type . For computing the type of P ( T ), we need the function(46) Pt : (cid:8) , . . . , d − (cid:9) × { , . . . , d ! − } −→ { , . . . , d ! − } , (cube-id( T ) , T.b ) (cid:55)−→ P.b, giving the type of T ’s parent in dependence of its cube-id and type. In Figure 8 welist all values of this function for d ∈ { , } . Algorithm 4.2: c-id ( Tet T ,int (cid:96) ) i ← h ← L− (cid:96) i | = ( T.x & h ) ? 1 : 0 i | = ( T.y & h ) ? 2 : 0 i | = ( T.z & h ) ? 4 : 0 /* Remove this line for 2D */ return i Algorithm 4.3:

Parent ( Tet T ) h ← L− T.(cid:96) P.(cid:96) ← T.(cid:96) − P.x ← T.x & ¬ h P.y ← T.y & ¬ h P.z ← T.z & ¬ h /* Remove this line for 2D */ P.b ← Pt( c-id ( T, T.(cid:96) ) , T.b ) /* See (46) and Figure 8 for Pt */ return P The algorithm

Parent to compute the parent of T now puts these two ideastogether, computing the coordinates and type of P ( T ). Algorithm 4.3 shows animplementation. It uses Algorithm 4.2 to compute the cube-id of a d -simplex.For the computation of one child T i of T for a given i ∈ (cid:8) , . . . , d − (cid:9) we lookat Bey’s deﬁnition of the subsimplices in (2) and see that in order to compute the8 C, BURSTEDDE, J. HOLKE anchor node of the child we need to know some of the node coordinates (cid:126)x , . . . , (cid:126)x d of the parent simplex T . These can be obtained via Algorithm 4.1. However, it ismore eﬃcient to compute only those coordinates of T that are actually necessary.To compute the Tet-id of T i we also need to know its type. The type of T i dependsonly on the type of T , and in the algorithm we use the function Ct (children type)to compute this type. Ct is eﬀectively an evaluation of Table 1. Algorithm 4.4 showsnow how to compute the coordinates of the i th child of T in Bey’s order.When we would like to compute the i th child of a d -simplex T of type b withrespect to the tetrahedral Morton order (thus the child T k of T with I loc ( T k ) = i )we just call Algorithm 4.4 with σ − b ( i ) as input. The permutations σ − b are availablefrom Table 2; see (43) and Algorithm 4.5. Algorithm 4.4:

Child ( Tet T ,int i ) X ← Coordinates ( T ) if i = 0 then j ← else if i ∈ { , , } then j ← else if i ∈ { , , } then j ← else if i = 3 then j ← /* If i = 3 then j ← for 2D */ T i .(cid:126)x ← ( X [0] + X [ j ]) T i .b ← Ct ( T.b, i ) /* See Table 1 */ Algorithm 4.5:

TM-Child ( Tet T ,int i ) return Child ( T , σ − T.b ( i )) /* See Table 2 */ Many applications—e.g., ﬁnite element methods—need to gather information about the face-neighboring simplices of a given simplexin a reﬁnement. In this section we describe a level-independent constant-runtimealgorithm to compute the Tet-id of the same level neighbor along a speciﬁc face f ofa given d -simplex T . This algorithm is very lightweight since it only requires a fewarithmetic computations involving the Tet-id of T and the number f . In comparisonto other approaches to ﬁnding neighbors in constant time [11, 32], our algorithm doesnot involve the computation of any of T ’s ancestors.The d + 1 faces of a d -simplex T = [ (cid:126)x , . . . , (cid:126)x d ] are numbered f , . . . , f d in sucha way that face f i is the face not containing the node (cid:126)x i . To examine the situationwhere two d -simplices of the same level share a common face, let T (cid:96)d denote a uniformreﬁnement of T d of a given level 0 ≤ (cid:96) ≤ L ,(47) T (cid:96)d := (cid:8) T | T is a descendant of T d of level (cid:96) (cid:9) ⊂ T d . T d can be seen as the disjoint union of all the T (cid:96)d :(48) T d = L (cid:91) (cid:96) =0 T (cid:96)d . Given a d -simplex T ∈ T (cid:96)d and a face number i ∈ { , . . . , d } , denote T ’s neighbor in T (cid:96)d across face f = f i by N f ( T ), and denote the face number of the neighbor simplex TETRAHEDRAL SPACE-FILLING CURVE Z YX

Type 1Type 4Type 3 Type 0

Fig. 9 . A tetrahedron T of type (in the middle, red) and its four face neighbors (blue) oftypes , , , and , drawn with their associated cubes (exploded view). We see that the type of T ’sneighbors depends only on its type, while their node coordinates relative to T ’s depend additionallyon T ’s level. (Color available online.) N f ( T ) across which T is its neighbor by ˜ f ( T ). Hence, the relation(49) N ˜ f ( T ) ( N f ( T )) = T holds for each face f of T .Our aim is to compute the Tet-id of N f ( T ) and ˜ f ( T ) from the Tet-id of T . Usingthe underlying cube structure this problem can be solved for each occuring type of d -simplex separately, and the solution scheme is independent of the coordinates of T and of (cid:96) . In Figure 9 the situation for a tetrahedron of type 5 is illustrated, and inTables 3 and 4 we present the general solution for each type.Using these tables, a constant-time computation of the Tet-id of N f ( T ) and of˜ f ( T ) from the Tet-id of T is possible, and the 3D case is carried out in Algorithm 4.6.Note that this algorithm uses arithmetic expressions in T.b to avoid the sixfold dis-tinction of cases.

Remark

To ﬁnd existing neighbors in a nonuniform reﬁnement we use Al-gorithm 4.6 in combination with

Parent and

TM-Child and comparison functions.Of course it is possible that N f ( T ) does not belong to T (cid:96)d any longer. If this isthe case, then f was part of the boundary of the root simplex T d . We describe in thenext section how we can decide in constant time whether N f ( T ) is in T (cid:96)d or not. For completeness, we summarize the geometry and numbers of d -simplices tou-ching each other via a corner ( d = 2 or d = 3) or edge (only d = 3). In this paper wedo not list these neighboring tetrahedra in detail.For d = 3 each corner in the mesh T (cid:96) has 24 adjacent tetrahedra; thus eachtetrahedron has at each corner 23 other tetrahedra that share this particular corner.For d = 2 the situation is similar, with six triangles meeting at each corner. Toexamine the number of adjacent tetrahedra to an edge we distinguish three types ofedges in T (cid:96)d :1. edges that are also edges in the underlying hexahedral mesh;2. edges that are the diagonal of a side of a cube in the hexahedral mesh;3. edges that correspond to the inner diagonal of a cube in the hexahedral mesh.Edges of the ﬁrst and third kind have six adjacent tetrahedra each, and edges of thesecond kind do have four adjacent tetrahedra each. When computing neighboring d -simplices it is possible that the neighbor in question does not belong to the rootsimplex T d but lies outside of it. If we look at face-neighbors of a d -simplex T , the0 C, BURSTEDDE, J. HOLKE f T.b = 0 N f .b N f .x T.x + h T.x T.x N f .y T.y T.y T.y − h ˜ f T.b = 1 N f .b N f .x T.x T.x T.x − h N f .y T.y + h T.y T.y ˜ f Table 3

Face neighbors in 2D. For each possible type b ∈ { , } of a triangle T and each of its faces f = f i , i ∈ { , , } , the type, anchor node coordinates, and corresponding face number ˜ f of T ’sneighbor across f are shown. In the 2D case we can directly compute N .b = 1 − T.b and ˜ f = 2 − f .Here, h = 2 L− (cid:96) refers to the length of one square of level (cid:96) . f T.b = 0 N f .b N f .x T.x + h T.x T.x T.x N f .y T.y T.y T.y T.y − h N f .z T.z T.z T.z T.z ˜ f T.b = 1 N f .b N f .x T.x + h T.x T.x T.x N f .y T.y T.y T.y T.y N f .z T.z T.z T.z T.z − h ˜ f T.b = 2 N f .b N f .x T.x T.x T.x T.x N f .y T.y + h T.y T.y T.y N f .z T.z T.z T.z T.z − h ˜ f f T.b = 3 N f .b N f .x T.x T.x T.x T.x − h N f .y T.y + h T.y T.y T.y N f .z T.z T.z T.z T.z ˜ f T.b = 4 N f .b N f .x T.x T.x T.x T.x − h N f .y T.y T.y T.y T.y N f .z T.z + h T.z T.z T.z ˜ f T.b = 5 N f .b N f .x T.x T.x T.x T.x N f .y T.y T.y T.y T.y − h N f .z T.z + h T.z T.z T.z ˜ f Table 4

Face neighbors in 3D. For each possible type b ∈ { , , , , , } of a tetrahedron T and eachof its faces f = f i , i ∈ { , , , } the type N f ( T ) .b of T ’s neighbor across f , its coordinates of theanchor node N f ( T ) .x , N f ( T ) .y , N f ( T ) .z and the corresponding face number ˜ f ( T ) , across which T is N f ( T ) ’s neighbor, are shown. fact that the considered neighbor lies outside means that the respective face was onthe boundary of T d . In order to check whether a computed d -simplex is outside thebase simplex, we investigate a more general problem: Given anchor node coordinates( x , y ) T ∈ Z , respectively ( x , y , z ) T ∈ Z , of type b a level (cid:96) , decide whether thecorresponding d -simplex N lies inside or outside of the root tetrahedron T d : N ∈ T (cid:96)d or N / ∈ T (cid:96)d . At the end of this section we furthermore generalize this to the problemof deciding for any two d -simplices N and T whether or not N lies outside of T . Wesolve this problem in constant time and independent of the levels of N and T .We examine the 3D case. Looking at T we observe that two of its boundary facescorrespond to faces of the root cube, namely, the intersections of T with the y = 0and the x = 2 L planes. The other two boundary faces of T are the intersections withthe x = z and the y = z planes. Thus, the boundary of T can be described as the TETRAHEDRAL SPACE-FILLING CURVE Algorithm 4.6:

Face-neighbor ( Tet T ,int f ) b ← T.b , x ← T.x , x ← T.x , x ← T.x if f = 1 or f = 2 then ˜ f ← f if ( b % and f = 2) or ( b % (cid:54) = 0 and f = 1) then b ← b + 1 else b ← b − else ˜ f ← − f h ← L− T.(cid:96) if f = 0 then /* f = 0 */ i ← b div x i ← T.x i + h b ← b + ( b % else /* f = 3 */ i ← ( b + 3) % div x i ← T.x i − h b ← b + ( b % return ( x , x , x , b % , ˜ f )intersection of T with those planes. We refer to the latter two planes as E and E .Let N be a tetrahedron with anchor node ( x , y , z ) T ∈ Z of type b and level (cid:96) and denote with ( x i , y i , z i ) T the coordinates of node i of N . Since ( x i , y i , z i ) T ≥ ( x , y , z ) T (componentwise), we directly conclude that if x ≥ L or y < N / ∈ T . Because the outer normal vectors of T on the two faces intersecting with E and E are(50) (cid:126)n = 1 √  −  and (cid:126)n = 1 √  −  , we also conclude that N / ∈ T if z − x > y − z >

0. Now we have alreadycovered all the cases except those where the anchor node of N lies directly in E or E . In these cases we cannot solve the problem by looking at the coordinates of theanchor node alone, since there exist tetrahedra T (cid:48) ∈ T with anchor nodes lying in oneof these planes (see Figure 10 for an illustration of the analogous case in 2D). Thisdepends on the type of the tetrahedron in question. We observe that a tetrahedron T (cid:48) ∈ T with anchor node lying in E can have the types 0 ,1, or 2, and a tetrahedronwith anchor node lying in E can have the types 0, 4, or 5. We conclude that tocheck whether N is outside of the root tetrahedron we have to check if any one of sixconditions is fulﬁlled. In fact these conditions ﬁt into the general form below with x i = x , x j = y , x k = z , and T as the root tetrahedron; thus T.x = T.y = T.z = 0 and

T.b = 0.These generalized conditions solve the problem to check for any two given tetra-hedra N and T , whether N lies outside of T or not. Proposition

Given two d -simplices N, T with

N.(cid:96) > T.(cid:96) , then N is outsideof the simplex T —which is equivalent to saying that N is no descendant of T —if and C, BURSTEDDE, J. HOLKE

T.b

2D 0 1 x i x yx j y x T.b

3D 0 1 2 3 4 5 x i x x y y z zx j y z z x x yx k z y x z y x Table 5

Following the general scheme described in this section to compute whether a given d -simplex N lies outside of another given d -simplex T , we give the coordinates x i , x j , and x k in dependence ofthe type of T . only if at least one of the following conditions is fulﬁlled.For 2D, N.x i − T.x i ≥ L− T.(cid:96) , (51a) N.x j − T.x j < , (51b) ( N.x j − T.x j ) − ( N.x i − T.x i ) > , (51c) N.x i − T.x i = N.x j − T.x j and N.b = (cid:26) if T.b = 0 , if T.b = 1 . (51d) For 3D,

N.x i − T.x i ≥ L− T.(cid:96) , (52a) N.x j − T.x j < , (52b)( N.x k − T.x k ) − ( N.x i − T.x i ) > , (52c)( N.x j − T.x j ) − ( N.x k − T.x k ) > , (52d) N.x j − T.x j = N.x k − T.x k and N.b ∈ (cid:26) { T.b + 1 , T.b + 2 , T.b + 3 } , if T.b is even, { T.b − , T.b − , T.b − } , if T.b is odd, (52e)

N.x k − T.x k = N.x i − T.x i and N.b ∈ (cid:26) { T.b − , T.b − , T.b − } , if T.b is even, { T.b + 1 , T.b + 2 , T.b + 3 } , if T.b is odd. (52f)

N.x j − T.x j = N.x k − T.x k and N.x i − T.x i = N.x k − T.x k and N.b (cid:54) = T.b (52g)

The coordinates x i , x j , and x k are chosen in dependence of the type of T accordingto Table 5. TETRAHEDRAL SPACE-FILLING CURVE XY Cases 3 and 4:( N .x < N .y ) or( N .x = N .y and N .b = 1) Case 1: N .x ≥ L Case 2: N .y < b =0 b =1 Fig. 10 . A uniform level 2 reﬁnement ofthe triangle T in 2D and triangles lying outsideof it with their anchor nodes marked. Whendeciding whether a triangle with given anchornode coordinates lies outside of T there arefour cases to consider, one for each face of T .For the two faces lying parallel to the X -axis,respectively, Y -axis, it suﬃces to check whetherthe x -coordinate is greater than or equal to L ,or whether the y -coordinate is smaller than .Similarly one can conclude that the triangle liesoutside of T if its x -coordinate is smaller thanits y -coordinate. If both coordinates agree (andnone of the previous cases applies) then thegiven triangle is outside T if and only if itstype is . Proof.

By shifting N by ( − T.(cid:126)x ) we reduce the problem to checking whether theshifted d -simplex lies outside of a d -simplex with anchor node (cid:126)

0, level

T.(cid:96) and type

T.b . For d = 3 the proof is analogous to the above argument, where we consideredthe case b = 0 and (cid:96) = 0. In two dimensions the situation is even simpler, since thereexists only one face of the root triangle that is not a coordinate axis (see Figure 10). In contrast to the Morton index for cubes, the TM-index for d -simplices does not produce a consecutive range of numbers. Therefore,two simplices T and T (cid:48) of level (cid:96) that are direct successors/predecessors with respectto the tetrahedral Morton order do not necessarily fulﬁll m ( T ) = m ( T (cid:48) ) ± d ( L− (cid:96) ) or m ( T ) = m ( T (cid:48) ) ±

1. For d = 3 this follows directly from the fact that each b j occupiesthree bits, but there are only six values that each b j can assume, since there are onlysix diﬀerent types. In 2D this follows from the fact that not every combination ofanchor node coordinates and type can occur for triangles in T , the triangle withanchor node (0 ,

0) and type 1 being one example. This also means that the largestoccurring TM-index is bigger than 2 d L − (cid:96) a consecutive index I ( T ) ∈ (cid:8) , . . . , d(cid:96) − (cid:9) , such that(53) ∀ T, S ∈ T (cid:96)d : I ( T ) < I ( S ) ⇔ m ( T ) < m ( S ) . This index can also be understood as a bijection(54) I : (cid:8) m ∈ N | m = m ( T ) for a T ∈ T (cid:96)d (cid:9) ∼ = −→ (cid:8) , . . . , d(cid:96) − (cid:9) , mapping the TM-indices of level (cid:96) d -simplices to a consecutive range of numbers. It isobvious that I ( T d ) = 0. The index I ( T ) can be easily computed as the (cid:96) -digit 2 d -arynumber consisting of the local indices as digits, thus(55) I ( T ) = ( I loc ( T ) , . . . , I loc ( T (cid:96) )) d . Algorithm 4.7 shows an implementation of this computation. It can be donedirectly from the Tet-id of T , and thus it is not necessary to compute the TM-indexof T ﬁrst.4 C, BURSTEDDE, J. HOLKE

Algorithm 4.7: I ( Tet T ) I ← b ← T.b for T.(cid:96) ≥ i ≥ do c ← c-id ( T, i ) I ← I + 8 i I b loc ( c ) /* See Table 6; multiply with i for 2D */ b ← Pt( c, b ) return I The inverse operation of computing T from I ( T ) and a given level (cid:96) can be carriedout in a similar fashion; see Algorithm 4.8. For each 0 ≤ i ≤ (cid:96) we look up the type b and the cube-id of T i from I loc ( T i ) and the type of Parent ( T i ) = T i − (startingwith type( T ) = 0) via Tables 7 and 8. From the cube-ids we can build up the anchornode coordinates of T . The last computed type is the type of T . The runtime of thisalgorithm is O ( T.(cid:96) ). Algorithm 4.8: T (consecutive index I ,int (cid:96) ) T.x, T.y, T.z ← b ← for ≤ i ≤ (cid:96) do Get I loc ( T i ) from I /* See (55) */ c ← c-id ( T i ) , b ← T i .b /* See Tables 7 and 8 */ if c & 1 then T.x ← T.x + 2 L− i if c & 2 then T.y ← T.y + 2 L− i if c & 4 then T.z ← T.z + 2 L− i /* Remove this line for 2D */ T.b ← b return T Similar to Algorithm 4.7 is Algorithm

Is valid , which decides whether a givenindex m ∈ [0 , L ) ∩ Z is in fact a TM-index for a tetrahedron. Thus, in the spirit ofSection 3.5 we can decide whether a given 6D cube is in the image of map (25) thatembeds T into the set of 6D subcubes of [0 , L ] . The runtime of Is valid is O ( L ). Algorithm 4.9:

Is valid ( m ∈ [0 , L ) ∩ Z , (cid:96) ) I ← k ← L − i ) for (cid:96) ≥ i ≥ do b ← ( m k , m k +1 , m k +2 ) c ← ( m k +3 , m k +4 , m k +5 ) k ← L − i + 1) if ( m k , m k +1 , m k +2 ) (cid:54) = Pt ( c, b ) then /* Take (0 , , if i = 1 */ return False return True

The consecutive index simpliﬁes the relation between the TM-index of a simplexand its position in the SFC. In the special case of a uniform mesh, the consecutive

TETRAHEDRAL SPACE-FILLING CURVE I b loc ( c ) cube-id c2D 0 1 2 3b 0 0 1 1 31 0 2 2 3 I b loc ( c ) cube-id c3D 0 1 2 3 4 5 6 7b 0 0 1 1 4 1 4 4 71 0 1 2 5 2 5 4 72 0 2 3 4 1 6 5 73 0 3 1 5 2 4 6 74 0 2 2 6 3 5 5 75 0 3 3 6 3 6 6 7 Table 6

The local index of a tetrahedron T ∈ T in dependence of its cube-id c and type b . cube-id( T ) I loc ( T )2D 0 1 2 3 P.b T ) I loc ( T )3D 0 1 2 3 4 5 6 7 P.b

Table 7

For a tetrahedron T ∈ T of local Index I loc whose parent P has type P.b we give the cube-id of T . index and the position are identical. Calculating the TM-index corresponding toa particular consecutive index is occasionally needed in higher-level algorithms. Thisis relatively expensive, since it involves a loop over all reﬁnement levels, thus some10 to 30 in extreme cases. However often the task is to compute a whole range of d -simplices. This occurs, for example, when creating an initial uniform reﬁnementof a given mesh (see Algorithm New in Section 5.1). That is, for a given consecutiveindex I , a level (cid:96) , and a count n , ﬁnd the n level- (cid:96) simplices following the d -simplexcorresponding to the consecutive index I , that is, the d -simplices corresponding tothe n consecutive indices I, I + 1 , . . . , I + n −

1. Ideally, this operation should runlinearly in n , independent of (cid:96) , but if we used Algorithm 4.8 to create each of the n + 1 simplices we would have a runtime of O ( n L ). In order to achieve the desiredlinear runtime we introduce the operations Successor and

Predecessor that run inaverage O (1) time. These operations compute from a given d -simplex T of level (cid:96) withconsecutive index I T the d -simplex T (cid:48) whose consecutive index is I T + 1, respectively, I T −

1. Thus, T (cid:48) is the next level (cid:96) simplex in the SFC after T (resp. the previous one).Algorithm 4.10, which we introduce to solve this problem does not require knowledgeabout the consecutive indices I T and I T ± T we only need to reverse the sign in Line 3 in the Successor recursion subroutine of Algorithm 4.10.

Lemma

Algorithm 4.10 has constant average runtime (independent of L ).Proof. Because each operation in the algorithm can be executed in constant time,6

C, BURSTEDDE, J. HOLKE

T.b I loc ( T )2D 0 1 2 3 P.b

T.b I loc ( T )3D 0 1 2 3 4 5 6 7 P.b

Table 8

For a tetrahedron T ∈ T of local Index I loc whose parent P has type P.b we give the type of T . Algorithm 4.10:

Successor ( Tet T ) return Successor recursion ( T, T, T.(cid:96) ) Function

Successor recursion ( Tet T , Tet T (cid:48) ,int (cid:96) ) c ← c-id ( T, (cid:96) ) From c and b look up i := I loc ( T (cid:96) ) /* See Table 6 */ i ← ( i + 1) % if i = 0 then /* Enter recursion (in rare cases) */ T (cid:48) ← Successor recursion ( T, T (cid:48) , (cid:96) − ˆ b ← T (cid:48) .b /* ˆ b stores the type of T (cid:48) (cid:96) − */ else ˆ b ← Pt( c, b ) From ˆ b and I loc = i look up ( c (cid:48) , b (cid:48) ) /* See Tables 7 and 8 */ Set the level (cid:96) entries of T (cid:48) .x, T (cid:48) .y and T (cid:48) .z to c (cid:48) T (cid:48) .b ← b (cid:48) return T (cid:48) the average runtime is nc , where c is a constant (independent of L ) and n − i cycles through 0 to 2 d − d th step, allowing for a geometric series argument.We see in Algorithm 4.10 the usefulness of the consecutive index. Because we areusing this index instead of the TM-index, computing the index of the successor/pre-decessor only requires adding/subtracting 1 to the given index. On the other hand,computing the TM-index of a successor/predecessor would involve more subtle com-putations. TETRAHEDRAL SPACE-FILLING CURVE

5. High-level AMR algorithms.

To develop the complete AMR functionalityrequired by numerical applications, we aim at a forest of quad-/octrees in the spiritof [16, 28]. Key top-level algorithms are: • New . Given an input mesh of conforming simplices, each considered a rootsimplex, generate an initial reﬁnement. • Adapt . Adapt (reﬁne and coarsen) a mesh according to a given criterion. • Partition . Partition a mesh among all processes such that the load is bal-anced, possibly according to weights. • Balance . Establish a 2:1 size condition between neighbors in a given reﬁnedmesh. The levels of any two neighboring simplices must diﬀer by at most 1. • Ghost . For each process, assemble the layer of directly neighboring elementsowned by other processes. • Iterate . Iterate through the local mesh, executing a callback function oneach element and on all interelement interfaces.Since partitioning via SFC only uses the SFC index as information, we refer to al-ready existing descriptions of

Partition for hexahedral or simplicial SFCs [16, 40].

Balance , Ghost , and

Iterate are sophisticated parallel algorithms and require addi-tional theoretical work, which is beyond the scope of this paper.Here, we brieﬂy describe

New and

Adapt . In the forest-of-trees approach we modelan adaptive mesh by a coarse mesh of level 0 d -simplices, the trees . Such a coarsemesh could be speciﬁed manually for simple geometries, or obtained from executinga mesh generator. Each level 0 simplex is identiﬁed with the root simplex T d andthen reﬁned adaptively to produce the ﬁne and potentially nonconforming mesh of d -simplices. These simplices are partitioned among all processes; thus each processholds a range of trees, of which the ﬁrst and last may be incomplete: Their leaves aredivided between multiple processes.An entity F of the structure forest consists of the following entries • F .C — the coarse mesh; • F . K — the process-local trees; • F . E k — for each local tree k the list of process-local simplices in tetrahedralMorton order.We acknowledge that New and

Adapt are essentially communication-free, but still servewell to exercise some of the fundamental algorithms described earlier.

The

New algorithm creates a partitioned uniform level (cid:96) reﬁned forestfrom a given coarse mesh. To achieve this, we ﬁrst compute the ﬁrst and last d -simplices belonging to the current process p . From this range we can calculate whichtrees belong to p and for each of these trees, the consecutive index of the ﬁrst andlast d -simplices on this tree. We then create the ﬁrst simplex in a tree by a call to T (Algorithm 4.8). In contrast to the New algorithm in [16] we create the remainingsimplices by calls to

Successor instead of T to avoid the O ( (cid:96) ) runtime of T in thecase of simplices. Our numerical tests, displayed in Figure 11, show that the runtimeof New is in fact linear in the number of elements and does not depend on the level (cid:96) .Within the algorithm, K denotes the number of trees in the coarse mesh and P thenumber of processes.After New returns, the process local number of elements is known, and per-elementdata can be allocated linearly in an array of structures, or a structure of arrays,depending on the speciﬁcs of the application.

The

Adapt algorithm modiﬁes an existing forest by reﬁning andcoarsening the d -simplices of a given forest according to a callback function. It does8 C, BURSTEDDE, J. HOLKE

Algorithm 5.1:

New ( Coarse Mesh C , int (cid:96) ) n ← d(cid:96) , N ← nK ; /* d -simplices per tree and global number of d -simplices */ g ﬁrst ← (cid:98) N p/P (cid:99) , g last ← (cid:98) N ( p + 1) /P (cid:99) − /* Global numbers of firstand last.. */ k ﬁrst ← (cid:98) g ﬁrst /n (cid:99) , k last ← (cid:98) g last /n (cid:99) ; /* ..local simplex and local treerange */ for t ∈ { k ﬁrst , . . . , k last } do e ﬁrst ← ( t = k ﬁrst ) ? g ﬁrst − nt : 0; e last ← ( t = k last ) ? g last − nt : n − T ← T ( e ﬁrst , (cid:96) ); /* Call Algorithm 4.8 */ E k ← { T } ; for e ∈ { e ﬁrst , . . . , e last − } do T ← Successor ( T ); E k ← E k ∪ { T } this by traversing the d -simplices of each tree in tetrahedral Morton order and passingthem to the callback function. If the current d -simplex and its 2 d − d d -simplices as input plus the index ofthe current tree. In both cases, a return value greater than zero means that the ﬁrstinput d -simplex should be reﬁned, and thus its 2 d children are added in tetrahedralMorton order to the new forest. Additionally, if the input consists of 2 d simplices,they form a family, and a return value smaller than zero means that this family shouldbe coarsened, thus replaced by their parent. If the callback function returns zero, theﬁrst given d -simplex remains unchanged and is added to the new forest, and Adapt continues with the next d -simplex in the current tree. The Adapt algorithm createsa new forest from the given one and can handle recursive reﬁnement/coarsening. Forthe recursive part we make use of the following reasonable assumptions: • A d -simplex that was created in a reﬁne step will not be coarsened duringthe same adapt call. • A d -simplex that was created in a coarsening step will not be reﬁned duringthe same adapt call.From these assumptions we conclude that for recursive reﬁnement we only have toconsider those d -simplices that were created in a previous reﬁnement step and that weonly have to care about recursive coarsening directly after we processed a d -simplexthat was not reﬁned and could be the last d -simplex in a family. If reﬁnement andcoarsening are not done recursively, the runtime of Adapt is linear in the number of d -simplices of the given forest.An application will generally project or otherwise transform data from the previ-ous to the adapted mesh. This can be done within the adaptation callback, which isknown to proceed linearly through the local elements, or after Adapt returns if a copyof the old mesh has been retained. In the latter case, one would allocate element datafor the adapted mesh and then iterate over the old and the new data simultaneously,performing the projection in the order of the SFC. Once this is done, the old dataand the previous mesh are deallocated [15].

TETRAHEDRAL SPACE-FILLING CURVE R un t i m e [ s ] Number of processesStrong scaling of

New

New level 8New level 10Ideal strong scaling

Runtime tests for

New . .

059 –8 8 . .

451 7 .

649 6 . .

58 7 . . . . . .

116 –9 6 . .

898 7 . . .

15 7 . Fig. 11 . Runtime tests for

New on JUQUEEN. Left: Two strong scaling studies. A new uniformlevel 8 (circles) and level 10 (triangles) reﬁnement of a coarse mesh of 512 root tetrahedra, carriedout with 16 up to 2,048 processes and 1,024 up to 131,072 processes with 16 processes per computenode. Right: The data shows that the runtime of

New is linear in the number of generated elementsand does not additionally depend on the level. The uniform reﬁnement is created from a coarsemesh of 512 root tetrahedra. For the ﬁrst computation on 64 processes we use 1 process per computenode and for the computation on 256 processes we use 2 processes per node.

6. Performance evaluation.

Given the design of the algorithms discussed inthis paper, we expect runtimes that are precisely proportional to the number of el-ements and independent of the level of reﬁnement. To verify this, we present scal-ing and runtime tests for New and

Adapt on the JUQUEEN supercomputer at theForschungszentrum Juelich [19], an IBM BlueGene/Q system with 28,672 nodes con-sisting of 16 IBM PowerPC A2 @ 1 . . × tetrahedra with 13 millionelements per process.The ﬁrst two tests are a strong scaling (up to 131k processes) and a runtimestudy of New in 3D, shown in Figure 11. For both tests we use a coarse mesh of 512tetrahedra. We time the

New algorithm with input level 8 (resp. level 10 for highernumbers of processes). We execute the runtime study to examine whether

New hasthe proposed level-independent linear runtime in the number of generated tetrahedra,which can be read from the results presented in the Table in Figure 11.The last test is

Adapt with a recursive nonuniform reﬁnement pattern. Thestarting point for all runs is a mesh obtained by uniformly reﬁning a coarse meshof 512 tetrahedra to a given initial level k . This mesh is then reﬁned recursivelyusing a single Adapt call, where only the tetrahedra of types 0 and 3 whose leveldoes not exceed the ﬁne level k + 5 are reﬁned recursively. The resulting mesh oneach tetrahedron resembles a fractal pattern similar to the Sierpinski tetrahedron.We perform several strong and weak scaling runs on JUQUEEN starting with 128processes and scaling up to 131,072. The setting is 16 processes per compute node. Weﬁnally do another strong scaling run on the full system of the MIRA supercomputerat the Argonne Leadership Computing Facility with 786,432 processes and again 16processes per compute node. Figure 12 shows our runtime results.

7. Conclusion.

We present a new encoding for adaptive nonconforming triangu-lar and tetrahedral mesh reﬁnement based on Bey’s red-reﬁnement rule. We identify Version v0.1 is available at https://bitbucket.org/cburstedde/t8code.git C, BURSTEDDE, J. HOLKE R un t i m e [ s ] Number of processesStrong scaling of

Adapt , reﬁne types 0 and 3

Adapt from level 4 to 9level 5 to 10level 6 to 11level 7 to 12Ideal weak scalingIdeal strong scaling

Fig. 12 . Strong scaling for

Adapt with a fractal reﬁnement pattern. Starting from an initiallevel k on a coarse mesh of 512 tetrahedra we reﬁne recursively to a maximal ﬁnal level k + 5 .The reﬁnement callback is such that only subtetrahedra of types 0 and 3 are reﬁned. Left: Strongand weak scaling on JUQUEEN with up to 131,072 processes and strong scaling on MIRA withup to 786,432 processes. On both systems we use 16 processes per compute node. The level 12mesh consists of 858,588,635,136 tetrahedra. Right: An initial level 0 and ﬁnal level 3 reﬁnementaccording to the fractal pattern. The subtetrahedra of levels 1 and 2 are transparent. six diﬀerent types of tetrahedra (and two types of triangles) and prescribe an orderingof the children for each of these types that diﬀers from Bey’s original order. By intro-ducing an embedding of the mesh elements into a Cartesian coordinate structure, wedeﬁne a tetrahedral Morton index that can be computed using bitwise interleavingsimilar to the Morton index for cubes. This tetrahedral Morton index shares someproperties with the well-known cubical one and allows for a memory-eﬃcient randomaccess storage of the mesh elements.Exploiting the Cartesian coordinate structure, we develop several constant-timealgorithms on simplices. These include computing the parent, the children, and theface-neighbors of a given mesh element, as well as computing the next and previouselements according to the SFC.In view of providing a complete suite of parallel dynamic AMR capabilities, theconstructions and algorithms described in this paper are just the beginning. A repar-titioning algorithm following our SFC, for example, is easy to imagine, but challengingto implement if the tree connectivity is to be partitioned dynamically, and if globalshared metadata shall be reduced from being proportional to the number of ranks tothe number of compute nodes. The present paper provides atomic building blocksthat can be used in high-level algorithms for 2:1 balancing [27] and the computationof ghost elements and generalized topology iteration [28]; regardless, these algorithmsstill have to be written and are likely to raise questions that we will have to addressin future work. Notwithstanding the above, we believe that the choices presented inthis paper are sustainable for maintaining extreme scalability in the long term. Acknowledgments.

The authors gratefully acknowledge support by the BonnHausdorﬀ Center for Mathematics (HCM) funded by the Deutsche Forschungsgemein-schaft (DFG). Author Holke acknowledges additional support by the Bonn Interna-tional Graduate School for Mathematics (BIGS) as part of HCM.We would like to thank Tobin Isaac (University of Chicago) for his thoughts onsoftware interfacing.We also would like to thank the Gauss Centre for Supercomputing (GCS) for pro-viding computing time through the John von Neumann Institute for Computing (NIC)on the GCS share of the supercomputer JUQUEEN [19] at the J¨ulich Supercomput-ing Centre (JSC). GCS is the alliance of the three national supercomputing centers

TETRAHEDRAL SPACE-FILLING CURVE

REFERENCES[1]

V. Akc¸elik, J. Bielak, G. Biros, I. Epanomeritakis, A. Fernandez, O. Ghattas, E. J.Kim, J. Lopez, D. R. O’Hallaron, T. Tu, and J. Urbanic , High resolution forward andinverse earthquake modeling on terascale computers , in SC03: Proceedings of the Interna-tional Conference for High Performance Computing, Networking, Storage, and Analysis,ACM/IEEE, 2003. Gordon Bell Prize for Special Achievement.[2]

F. Alauzet and A. Loseille , On the use of space ﬁlling curves for parallel anisotropic meshadaptation , in Proceedings of the 18th International Meshing Roundtable, B. W. Clark,ed., Springer, 2009, pp. 337–357, doi:10.1007/978-3-642-04319-2 20, http://dx.doi.org/10.1007/978-3-642-04319-2 20.[3]

I. Babuska and W. C. Rheinboldt , Error estimates for adaptive ﬁnite element computations

M. Bader , Space-Filling Curves: An Introduction with Applications in Scientiﬁc Computing ,Texts in Computational Science and Engineering, Springer, 2012.[5]

W. Bangerth, C. Burstedde, T. Heister, and M. Kronbichler , Algorithms and data struc-tures for massively parallel generic adaptive ﬁnite element codes , ACM Transactions onMathematical Software, 38 (2011), pp. 14:1–14:28.[6]

W. Bangerth, R. Hartmann, and G. Kanschat , deal.II – a general-purpose object-orientedﬁnite element library , ACM Transactions on Mathematical Software, 33 (2007), p. 24,doi:10.1145/1268776.1268779.[7] R. E. Bank, A. H. Sherman, and A. Weiser , Some reﬁnement algorithms and data structuresfor regular local mesh reﬁnement , in Scientiﬁc Computing, Applications of Mathematicsand Computing to the Physical Sciences, R. S. Stapleman, ed., vol. 1, IMACS/North-Holland Publishing Co., 1983.[8]

E. B¨ansch , Local mesh reﬁnement in 2 and 3 dimensions

J. Bey , Der BPX-Vorkonditionierer in drei Dimensionen: Gitterverfeinerung, Parallelisierungund Simulation , Universit¨at Heidelberg, (1992). Preprint.[10]

J. Bey , Simplicial grid reﬁnement: on Freudenthal’s algorithm and the optimal number of con-gruence classes , Numerische Mathematik, 85 (2000), pp. 1–29, doi:10.1007/s002110050475,http://dx.doi.org/10.1007/s002110050475.[11]

K. Brix, R. Massjung, and A. Voss , Reﬁnement and connectivity algorithms for adaptivediscontinuous Galerkin methods , SIAM Journal on Scientiﬁc Computing, 33 (2011), pp. 66–101.[12]

T. N. Bui and C. Jones , Finding good approximate vertex and edge partitions is NP-hard

H.-J. Bungartz, M. Mehl, and T. Weinzierl , A parallel adaptive Cartesian PDE solverusing space–ﬁlling curves , Euro-Par 2006 Parallel Processing, (2006), pp. 1064–1074.[14]

C. Burstedde, O. Ghattas, M. Gurnis, T. Isaac, G. Stadler, T. Warburton, and L. C.Wilcox , Extreme-scale AMR , in SC10: Proceedings of the International Conference forHigh Performance Computing, Networking, Storage and Analysis, ACM/IEEE, 2010.[15]

C. Burstedde, O. Ghattas, G. Stadler, T. Tu, and L. C. Wilcox , Towards adaptive meshPDE simulations on petascale computers , in Proceedings of Teragrid ’08, 2008.[16]

C. Burstedde, L. C. Wilcox, and O. Ghattas , p4est : Scalable algorithms for parallel adap-tive mesh reﬁnement on forests of octrees , SIAM Journal on Scientiﬁc Computing, 33(2011), pp. 1103–1133, doi:10.1137/100791634.[17] C. Carstensen , An adaptive mesh-reﬁning algorithm allowing for an H stable L projection C, BURSTEDDE, J. HOLKE onto Courant ﬁnite element spaces , Constr. Approx., 20 (2004), pp. 549–564.[18]

U. Catalyurek, E. Boman, K. Devine, D. Bozdag, R. Heaphy, and L. Riesen , Hypergraph-based dynamic load balancing for adaptive scientiﬁc computations , in Proc. of 21st Inter-national Parallel and Distributed Processing Symposium (IPDPS’07), IEEE, 2007.[19]

J. S. Centre , JUQUEEN: IBM Blue Gene/Q supercomputer system at the J¨ulich Supercom-puting Centre , Journal of large-scale research facilities, A1 (2015), http://dx.doi.org/10.17815/jlsrf-1-18. http://dx.doi.org/10.17815/jlsrf-1-18.[20]

C. Chevalier and F. Pellegrini , PT-Scotch: A tool for eﬃcient parallel graph ordering ,Parallel Computing, 34 (2008), pp. 318–331, doi:10.1016/j.parco.2007.12.001, http://dx.doi.org/10.1016/j.parco.2007.12.001.[21]

W. D¨orfler , A convergent adaptive algorithm for Poisson’s equation

P. F. Fischer, G. W. Kruse, and F. Loth , Spectral element methods for transitionalﬂows in complex geometries , Journal of Scientiﬁc Computing, 17 (2002), pp. 81–98,doi:10.1023/A:1015188211796.[23]

N. Golias and R. Dutton , Delaunay triangulation and 3D adaptive mesh gen-eration

H. Haverkort and F. van Walderveen , Locality and bounding-box quality of two-dimensionalspace-ﬁlling curves , Computational Geometry, 43 (2010), pp. 131–174.[25]

D. Hilbert , ¨Uber die stetige Abbildung einer Linie auf ein Fl¨achenst¨uck , Mathematische An-nalen, 38 (1891), pp. 459–460.[26] Ø. Hjelle and M. Dæhlen , Triangulations and Applications , Mathematics and Visualization,Springer, 2006, https://books.google.de/books?id=cRXAe8CVbBYC.[27]

T. Isaac, C. Burstedde, and O. Ghattas , Low-cost parallel algorithms for 2:1 octree bal-ance , in Proceedings of the 26th IEEE International Parallel & Distributed ProcessingSymposium, IEEE, 2012. http://dx.doi.org/10.1109/IPDPS.2012.47.[28]

T. Isaac, C. Burstedde, L. C. Wilcox, and O. Ghattas , Recursive algorithms for dis-tributed forests of octrees , SIAM Journal on Scientiﬁc Computing, 37 (2015), pp. C497–C531, doi:10.1137/140970963, http://arxiv.org/abs/1406.0089.[29]

G. Karypis and V. Kumar , A parallel algorithm for multilevel graph partitioning and sparsematrix ordering , Journal of Parallel and Distributed Computing, 48 (1998), pp. 71–95.[30]

D. A. Kopriva, S. L. Woodruff, and M. Y. Hussaini , Computation of electromagnetic scat-tering with a non-conforming discontinuous spectral element method , International Journalfor Numerical Methods in Engineering, 53 (2002), pp. 105–122, doi:10.1002/nme.394.[31]

I. Kossaczk`y , A recursive approach to local mesh reﬁnement in two and three dimen-sions

M. Lee, L. De Floriani, and H. Samet , Constant-time neighbor ﬁnding in hierarchical tetra-hedral meshes , in SMI 2001 International Conference on Shape Modeling and Applications,May 2001, pp. 286–295, doi:10.1109/SMA.2001.923400.[33]

A. Logg, K.-A. Mardal, and G. N. Wells , eds.,

Automated Solution of Diﬀerential Equa-tions by the Finite Element Method , vol. 84 of Lecture Notes in Computational Scienceand Engineering, Springer, 2012, doi:10.1007/978-3-642-23099-8.[34]

G. M. Morton , A computer oriented geodetic data base; and a new technique in ﬁle sequencing ,tech. report, IBM Ltd., 1966.[35]

L. Oliker and R. Biswas , PLUM: Parallel load balancing for adaptive unstructuredmeshes

L. Oliker, R. Biswas, and H. N. Gabow , Parallel tetrahedral mesh adaptation with dy-namic load balancing , Parallel Computing, 26 (2000), pp. 1583–1608, doi:10.1016/S0167-8191(00)00047-8, http://dx.doi.org/10.1016/S0167-8191(00)00047-8.[37]

OpenCFD , OpenFOAM – The Open Source CFD Toolbox – User’s Guide , OpenCFD Ltd.,United Kingdom, 1.4 ed., 11 2007.[38]

E. J. Otoo and H. Zhu , Indexing on spherical surfaces using semi-quadcodes , in Advances inSpatial Databases, D. Abel and B. Chin Ooi, eds., vol. 692 of Lecture Notes in ComputerScience, Springer, 1993, pp. 510–529, doi:10.1007/3-540-56869-7 29, http://dx.doi.org/10.1007/3-540-56869-7 29.[39]

G. Peano , Sur une courbe, qui remplit toute une aire plane , Math. Ann., 36 (1890), pp. 157– TETRAHEDRAL SPACE-FILLING CURVE J. R. Pilkington and S. B. Baden , Partitioning with spaceﬁlling curves , tech. report, Dept.of Computer Science and Engineering, University of California, San Diego, 1994.[41]

A. Rahimian, I. Lashuk, S. Veerapaneni, A. Chandramowlishwaran, D. Malhotra,L. Moon, R. Sampath, A. Shringarpure, J. Vetter, R. Vuduc, et al. , Petascaledirect numerical simulation of blood ﬂow on 200k cores and heterogeneous architectures ,in Proceedings of the 2010 ACM/IEEE International Conference for High PerformanceComputing, Networking, Storage and Analysis, IEEE Computer Society, 2010, pp. 1–11.[42]

M. Rasquin, C. Smith, K. Chitale, E. S. Seol, B. A. Matthews, J. L. Martin, O. Sahni,R. M. Loy, M. S. Shephard, and K. E. Jansen , Scalable implicit ﬂow solver for realisticwing simulations with ﬂow control , Computing in Science Engineering, 16 (2014), pp. 13–21, doi:10.1109/MCSE.2014.75.[43]

W. C. Rheinboldt and C. K. Mesztenyi , On a data structure for adaptive ﬁnite elementmesh reﬁnements , ACM Transactions on Mathematical Software, 6 (1980), pp. 166–187,doi:10.1145/355887.355891.[44]

H. Sagan , Space-Filling Curves , Springer, 1994.[45]

J. Shewchuk , Delaunay Reﬁnement Mesh Generation , PhD thesis, Carnegie Mellon University,Pittsburgh, May 1997. CMU CS Tech Report CMU-CS-97-137.[46]

J. R. Shewchuk , Triangle: Engineering a 2D quality mesh generator and delaunay triangula-tor , in Applied Computational Geometry: Towards Geometric Engineering, M. C. Lin andD. Manocha, eds., vol. 1148 of Lecture Notes in Computer Science, Springer-Verlag, 1996,pp. 203–222. From the First ACM Workshop on Applied Computational Geometry.[47]

H. Si , TetGen—A Quality Tetrahedral Mesh Generator and Three-Dimensional Delaunay Tri-angulator , Weierstrass Institute for Applied Analysis and Stochastics, Berlin, 2006.[48]

W. Sierpi´nski , Sur une nouvelle courbe continue qui remplit toute une aire plane , Bulletin del’Acad´emie des Sciences de Cracovie, S´eries A (1912), pp. 462–478.[49]

C. W. Smith, M. Rasquin, D. Ibanez, K. E. Jansen, and M. S. Shephard , Application spe-ciﬁc mesh partition improvement

D. A. Steinman, J. S. Milner, C. J. Norley, S. P. Lownie, and D. W. Holdsworth , Image-based computational simulation of ﬂow dynamics in a giant intracranial aneurysm ,American Journal of Neuroradiology, 24 (2003), pp. 559–566.[51]

J. R. Stewart and H. C. Edwards , A framework approach for developing parallel adaptivemultiphysics applications , Finite Elements in Analysis and Design, 40 (2004), pp. 1599–1617, doi:10.1016/j.ﬁnel.2003.10.006.[52]

H. Sundar, R. Sampath, and G. Biros , Bottom-up construction and 2:1 balance reﬁnement oflinear octrees in parallel , SIAM Journal on Scientiﬁc Computing, 30 (2008), pp. 2675–2708,doi:10.1137/070681727.[53]

T. J. Tautges, R. Meyers, K. Merkley, C. Stimpson, and C. Ernst , MOAB: A mesh-oriented database , SAND2004-1592, Sandia National Laboratories, Apr. 2004. Report.[54]

T. Tu, D. R. O’Hallaron, and O. Ghattas , Scalable parallel octree meshing forterascale applications , in SC ’05: Proceedings of the International Conference forHigh Performance Computing, Networking, Storage, and Analysis, ACM/IEEE, 2005,doi:http://dx.doi.org/10.1109/SC.2005.61.[55]

T. Weinzierl and M. Mehl , Peano—a traversal and storage scheme for octree-like adaptiveCartesian multiscale grids , SIAM Journal on Scientiﬁc Computing, 33 (2011), pp. 2732–2760, http://link.aip.org/link/?SCE/33/2732.[56]

S. Zhang , Successive subdivisions of tetrahedra and multigrid methods on tetrahedral meshes ,Houston Journal of Mathematics, 21 (1995).[57]