[PDF] GRAthena++: puncture evolutions on vertex-centered oct-tree AMR

Abstract

Numerical relativity is central to the investigation of astrophysical sources in the dynamical and strong-field gravity regime, such as binary black hole and neutron star coalescences. Current challenges set by gravitational-wave and multi-messenger astronomy call for highly performant and scalable codes on modern massively-parallel architectures. We present GR-Athena++, a general-relativistic, high-order, vertex-centered solver that extends the oct-tree, adaptive mesh refinement capabilities of the astrophysical (radiation) magnetohydrodynamics code Athena++. To simulate dynamical space-times GR-Athena++ uses the Z4c evolution scheme of numerical relativity coupled to the moving puncture gauge. We demonstrate stable and accurate binary black hole merger evolutions via extensive convergence testing, cross-code validation, and verification against state-of-the-art effective-one-body waveforms. GR-Athena++ leverages the task-based parallelism paradigm of Athena++ to achieve excellent scalability. We measure strong scaling efficiencies above 95\% for up to \sim 1.2\times10^4 CPUs and excellent weak scaling is shown up to \sim 10^5 CPUs in a production binary black hole setup with adaptive mesh refinement. GR-Athena++ thus allows for the robust simulation of compact binary coalescences and offers a viable path towards numerical relativity at exascale.

Full PDF

DDraft version January 22, 2021

Typeset using L A TEX twocolumn style in AASTeX63

GR-Athena++ : puncture evolutions on vertex-centered oct-tree AMR

Boris Daszuta, Francesco Zappa, William Cook, David Radice,

2, 3, 4

Sebastiano Bernuzzi, andViktoriya Morozova

2, 3 Theoretisch-Physikalisches Institut, Friedrich-Schiller-Universität Jena, 07743, Jena, Germany Institute for Gravitation and the Cosmos, The Pennsylvania State University, University Park, PA 16802, USA Department of Physics, The Pennsylvania State University, University Park, PA 16802, USA Department of Astronomy and Astrophysics, The Pennsylvania State University, University Park, PA 16802, USA (Dated: January 22, 2021)

ABSTRACTNumerical relativity is central to the investigation of astrophysical sources in the dynamical andstrong-ﬁeld gravity regime, such as binary black hole and neutron star coalescences. Current chal-lenges set by gravitational-wave and multi-messenger astronomy call for highly performant and scal-able codes on modern massively-parallel architectures. We present

GR-Athena++ , a general-relativistic,high-order, vertex-centered solver that extends the oct-tree, adaptive mesh reﬁnement capabilities ofthe astrophysical (radiation) magnetohydrodynamics code

Athena++ . To simulate dynamical space-times

GR-Athena++ uses the Z4 c evolution scheme of numerical relativity coupled to the moving punc-ture gauge. We demonstrate stable and accurate binary black hole merger evolutions via extensiveconvergence testing, cross-code validation, and veriﬁcation against state-of-the-art eﬀective-one-bodywaveforms. GR-Athena++ leverages the task-based parallelism paradigm of

Athena++ to achieve ex-cellent scalability. We measure strong scaling eﬃciencies above for up to ∼ . × CPUs andexcellent weak scaling is shown up to ∼ CPUs in a production binary black hole setup with adaptivemesh reﬁnement.

GR-Athena++ thus allows for the robust simulation of compact binary coalescencesand oﬀers a viable path towards numerical relativity at exascale. INTRODUCTIONNumerical relativity (NR) provides robust techniquesfor constructing numerical solutions to the Einstein ﬁeldequations (EFE). The phenomenology of astrophysicalinspiral and merger events, such as those between bi-nary constituents involving (variously) black holes (BH)or neutron stars (NS) can be succinctly described withgravitational waves (GW) computed with NR Pretorius(2005); Baker et al. (2007); Campanelli et al. (2006);Shibata & Uryu (2000). This has crucially assisted inthe recent detection of such events by the LIGO andVirgo collaborations Abbott et al. (2016a,b, 2017a). Asthe operating sensitivity of these detectors is improvedAbbott et al. (2020) and new (KAGRA Akutsu et al.(2020)), approved (LISA Amaro-Seoane et al. (2017)),or proposed (Einstein telescope Punturo et al. (2010),Cosmic explorer Abbott et al. (2017b)) designs come on-line a concomitant enlargement of the physical parame-ter space that may be experimentally probed is oﬀered.These experimental eﬀorts are complemented by pub-licly available catalogs of simulation data provided bythe NR community (see e.g. Dietrich et al. (2018); Boyleet al. (2019); Healy et al. (2019); Jani et al. (2016)). On the NR side there is thus a pressing requirementto better resolve and characterize the underlying physicsduring simulation of binary black holes (BBH) in evermore extreme conﬁgurations, such as higher mass ratioLousto et al. (2010); Nakano et al. (2011) or to providediscrimination between candidate models in descriptionof binary NS Radice et al. (2020); Bernuzzi (2020) orBH-NS Shibata & Taniguchi (2011) events. Such sim-ulations can be extremely demanding from the point ofview of computational resources and viability typicallyhinges upon the availability of high performance com-puting (HPC) infrastructure Huerta et al. (2019). Thusaccurate NR codes that remain performant as HPC re-sources are scaled up and simultaneously allow for thescope of input physics to be simply extended are crucial.An important concern for NR investigations of the bi-nary merger problem is treatment of features sensitive towidely-varying length and time-scales. An approach in-spired by Berger-Oliger Berger & Oliger (1984) (see alsoBerger & Colella (1989)) is based on the introduction ofa sequence of hierarchically, well-nested patches (usuallyboxes) of increasing resolution and decreasing diametercentered at the constituents of the binary. The rela- a r X i v : . [ g r- q c ] J a n tive spatial positions of such patch conﬁgurations maybe arranged to automatically track the time-evolutionof the aforementioned compact objects. This has beena common approach adopted for NR codes that buildupon the open source Cactus framework Goodale et al.(2003) and utilize the

Carpet thorn Schnetter et al.(2004) (see also the

Einstein toolkit Loﬄer et al. (2012)for an overview tailored to astrophysical applications).Some notable code implementations based on

Cactus are

Llama

Pollney et al. (2011); Reisswig et al. (2013),

McLachlan

Brown et al. (2009),

LEAN

Sperhake (2007),

LazEv

Zlochower et al. (2005), and

Maya

Herrmann et al.(2007) furthermore within this framework magnetohy-drodynamics (MHD) may be coupled through use of

GRHydro

Mösta et al. (2014) or

WhiskyTHC

Radice et al.(2014). Other examples of non-

Cactus codes adoptingthe Berger-Oliger approach include

BAM

Brügmann et al.(2008); Galaviz et al. (2010); Thierfelder et al. (2011),

AMSS-NCKU

Cao et al. (2008). Elements of this approachare also shared by the recent

GRChombo

Clough et al.(2015). Thus far, all code-bases discussed here make(at least some) use of Cartesian grid coordinatizationsand involve use of ﬁnite-diﬀerence (FD) approximantsto derivative operators.Unfortunately here a priori speciﬁcation of patch hier-archies is usually required which makes capturing emer-gent features at unexpected locations challenging. Moreimportantly for Berger-Oliger the overhead of synchro-nization of solution data between patches on diﬀer-ing levels can incur heavy performance penalties whichspoil scaling in modern highly-parallel HPC architec-tures Stout et al. (1997).Other approaches such as those based on pseudo-spectral methods are represented by

SpeC

Szilagyi et al.(2009) where multi-patch decomposition of the compu-tational domain is made using a combination of topo-logical spheres and cylinders. The SXS collaborationBoyle et al. (2019) has used

SpeC to produce some ofthe longest and most-accurate binary GW to date, albeitBBH mass ratio (deﬁned q := m /m where m i are con-stituent masses and m ≥ m ) for publicly available GRtemplates remains conﬁned to q ≤ . Closely relatedare eﬀorts based on the discontinuous Galerkin (DG)method such as bamps Hilditch et al. (2016); Bugneret al. (2016) and

SpECTRE

Kidder et al. (2017). An-other recently pursued alternative has been an attemptto eliminate the need for reﬁnement altogether throughgeneration of adapted, problem-speciﬁc curvilinear-gridsas recently demonstrated in

SENR/NRPy+

Mewes et al.(2020, 2018); Ruchlin et al. (2018).A blend of beneﬁts that ameliorates some of the dis-advantages of the above approaches is oﬀered in block- based adaptive mesh reﬁnement (AMR) strategies Stoutet al. (1997). Crucially, in contrast to hierarchical,nested patches, for block-based AMR each physical po-sition on a computational domain is covered by oneand only one level. This reduces the problem of syn-chronizing the data of a solution over diﬀering levels tocommunication between block boundaries (as in DG)which when logically arranged into an oct-tree (in spatial dimensions – see e.g. Burstedde et al. (2011))can greatly improve computational eﬃciency throughpreservation of data locality in memory. Furthermore,making use of task-based parallelism as the computa-tional model greatly facilitates the overlap of communi-cation and computation. Additionally, great ﬂexibilityis maintained in how the computational domain can bereﬁned. The recent work of Dendro-GR

Fernando et al.(2018) utilizes such an approach in treatment of the vac-uum sector of the EFE with the BSSNOK formulationNakamura et al. (1987); Shibata & Nakamura (1995);Baumgarte & Shapiro (1999) where solution representa-tion is in terms of adaptive wavelets and choice of regionsto reﬁne controlled by the wavelet expansion itself Holm-ström (1999). Scaling in terms of HPC performanceappears to have been convincingly demonstrated with

Dendro-GR for a mock q = 10 BBH event, although nu-merical accuracy and full scale evolutions, together withGW characteristics that would potentially be calculatedduring production runs are not presented for any q .In this work we present our eﬀort to build upon apublic version of Athena++

White et al. (2016); Felker& Stone (2018); Stone et al. (2020) where changes tocore functionality have been made, together with intro-duction of new modules targeted towards solution of theEFE. We refer to these new features as

GR-Athena++ .Originally

Athena++ was conceived as a framework forpurely non- and special-relativistic MHD, as well asGRMHD for stationary space-times, which adopts manyof the mature and robust numerical algorithms of Stoneet al. (2008) in a modern

C++ design centered aroundblock-based AMR. Key design elements include nativesupport for Cartesian and curvilinear coordinates and aparticular focus on future-prooﬁng through code mod-ularity. The computational model is task-based andembeds hybrid parallelism through dual use of mes-sage passing interface (MPI) and threading via OpenMP(OMP). In addition to excellent scaling properties onHPC infrastructure, modularity and modern code prac-tices have allowed for extension of

Athena++ to hetero-geneous architectures through

Kokkos

Carter Edwardset al. (2014) resulting in performant MHD calculationon graphics processing units with

K-Athena

Grete et al.(2019) (see also parthenon

Miller et al. (2021)).These attractive properties served as a strong moti-vation in development of

GR-Athena++ where we haveimplemented the Z4 c formulation Bernuzzi & Hilditch(2010); Ruiz et al. (2011); Weyhausen et al. (2012);Hilditch et al. (2013) of NR utilizing the (moving) punc-ture gauge Brandt & Brügmann (1997); Baker et al.(2007); Campanelli et al. (2006). We provide accu-rate and eﬃcient extensions to derivative approximantsthrough (templated) arbitrary-order FD based on Al-ﬁeri et al. (2018). Our introduction of vertex-centered(VC) variable treatment (extending core cell- and face-centered functionality) is motivated by a desire to matchany selected FD order in calculations that involve AMR.Furthermore our implementation of the level to leveltransfer operators that occur in AMR takes advantageof the particular structure over sampled nodes at dif-fering grid levels to simultaneously improve computa-tional eﬃciency and accuracy. Within (GR-)Athena++ time-evolution is achieved through the standard methodof lines approach where scheme order may be speciﬁedﬂexibly. In this work the formal order of the spatialdiscretizations considered are th and th whereas thetemporal treatment is at th order.As calculating quantities such as GW typically in-volves integration over spherical surfaces we have intro-duced a module implementing geodesic spheres based onWang & Lee (2011). A primary demonstration of thisfunctionality is presented in direct, cross-code validationagainst BAM where extracted GW are computed from theWeyl scalar Ψ and the gravitational strain is examinedfor the prototype BH and BBH calibration problems ofBrügmann et al. (2008). Additionally, it is importantthat simulations can be carried out that provide dataof physical relevance to detection eﬀorts. To this endwe consider an equal mass BBH inspiral on an initiallyquasi-circular (i.e low eccentricity) co-orbit that resultsin a merger event. Initial data is based on the conﬁg-uration of Hannam et al. (2010). Veriﬁcation is madeutilizing the state-of-the-art, NR informed, eﬀective onebody model of TEOBResumS

Nagar et al. (2018).As we prioritized HPC eﬃciency it is thus importantthat

GR-Athena++ preserves the already impressive be-havior of

Athena++ where over parallel eﬃciencyis shown in weak scaling tests with uniform grids em-ploying up to ∼ . × CPUs for MHD/HD prob-lems Stone et al. (2020).

Dendro-GR code, in whichBBH are evolved using an oct-tree grid as we do in

GR-Athena++ but with diﬀerent strategies, demonstratesvery good scaling properties of q = 10 BBH evolutionsutilizing up to ∼ . × CPUs. In these tests, however,re-mesh and inter-grid transfer operations are disabled. In this work we aim to reach such performance for BBHevolutions with full AMR.The rest of this paper is organized as follows: In §2 weprovide further details on the computational approachtaken within (GR-)Athena++ together with the variousextensions we have made to core functionality. Subse-quently in §3 an overview of the Z4 c system we use inour calculations is provided together with description ofthe numerical algorithms employed. Reﬁnement strat-egy and details concerning grids are provided in §4. In §5we discuss results of extensive testing of GR-Athena++ onBH and BBH problems performing cross-code validationand assessing convergence properties whereupon in §6computational performance is detailed through strongand weak scaling tests. Finally §7 summarizes and con-cludes. METHOD

GR-Athena++ builds upon

Athena++ thus in order tospecify nomenclature, provide a self-contained descrip-tion, and explain our extensions, we ﬁrst brieﬂy recountsome details of the framework (see also White et al.(2016); Felker & Stone (2018); Stone et al. (2020)).In (GR-)Athena++ overall details about the domain Ω over which a problem is formulated are abstractedfrom the salient physics and contained within a classcalled the Mesh . Within the

Mesh an overall represen-tation of the domain as a logical n -rectangle is stored,together with details of coordinatization type (Carte-sian or more generally curvilinear), number of pointsalong each dimension for the coarsest sampling N M =( N M , · · · , N M d ) , and physical boundary conditions on ∂ Ω . In order to partition the domain we ﬁrst ﬁx a choice N B = ( N B , · · · , N B d ) where each element of N B mustdivide each element of N M component-wise. Then Ω is domain-decomposed through rectilinear sub-divisioninto a family of n -rectangles satisfying Ω = (cid:116) i ∈ Z Ω i ,where Z is the set of MeshBlock indices, correspond-ing to the ordering described in §2.1. Nearest-neighborelements are constrained to only diﬀer by a single sub-division at most. The

MeshBlock class stores propertiesof an element Ω i of the sub-division. In particular thenumber of points in the sampling of Ω i is controlledthrough the choice of N B . For purposes of communi-cation of data between nearest neighbor MeshBlock ob-jects the sampling over Ω i is extended by a thin layer ofso-called “ghost nodes” in each direction. Furthermorethe local values (with respect to the chosen, extendedsampling on Ω i ) of any discretized, dependent ﬁeld vari-ables of interest are stored within the MeshBlock .In both uniform grid ( ∀ i ∈ Z ) vol(Ω i ) = C and reﬁnedmeshes ( ∃ i, j ∈ Z ) vol(Ω i ) (cid:54) = vol(Ω j ) it is crucial toarrange inter- MeshBlock communication eﬃciently – tothis end the relationships between diﬀering

MeshBlock objects are arranged in a tree data structure, to whichwe now turn. 2.1.

Tree Structure of

Mesh

For the sake of exposition here and convenience inlater sections we now particularize to a Cartesian coor-dinatization though we emphasize that the general pic-ture (and our implementation) of the discussions hereand in §2.2 carry over to the curvilinear context withonly minor modiﬁcation. (GR-)Athena++ stores the logical relationship betweenthe

MeshBlock objects (i.e. Ω i ) involved in descrip-tion of a domain Ω within a tree data structure. Abinary-tree, quad-tree or oct-tree is utilized when d :=dim(Ω) = 1 , , respectively. The relevant tree is thenconstructed by ﬁrst selecting the minimum N such that N exceeds the largest number of Ω i along any dimen-sion. The root of the tree is assigned a logical levelof zero and the tree terminates at level N with every MeshBlock assigned to an appropriate leaf, though someleaves and nodes of the tree may remain empty. Datalocality is enhanced, as references to

MeshBlock objectsare stored such that a post-order, depth-ﬁrst traversalof the tree follows Morton order (also termed Z-order)Morton (1966). This order can be used to encode multi-dimensional coordinates into a linear index parametriz-ing a Z-shaped, space-ﬁlling curve where small changesin the parameter imply spatial coordinates that are closein a suitable sense Burstedde et al. (2019).As an example we consider a three-dimensional

Mesh described by ( N x , N y , N z ) = (2 , , MeshBlock ob-jects in each direction at ﬁxed physical level in Fig.1.Consider now a

Mesh with reﬁnement. Function dataat a ﬁxed physical level is transferred one level ﬁnerthrough use of a prolongation operator P ; dually, func-tion data may be coarsened by one physical level throughrestriction R . The number of physical reﬁnement levelsadded to a uniform level, domain-decomposed Ω is con-trolled by the parameter N L . By convention N L startsat zero. Subject to satisfaction of problem-dependentreﬁnement criteria, there may exist physical levels at , · · · , N L . When a given MeshBlock is reﬁned (coars-ened) d MeshBlock objects are constructed (destroyed).This is constrained to satisfy a reﬁnement ratiowhere nearest-neighbor

MeshBlock objects can diﬀer byat most one physical level.In Fig.2 we consider an example of a non-periodic Ω described by N x = N y = N z = 2 MeshBlock objectswith N L = 3 selected with reﬁnement introduced at the Figure 1.

Example of

Mesh partitioned uniformly by

MeshBlock objects indexed via Z-order and traced in redthrough constituent geometric centroids. The logical rela-tionship between Ω i is stored in an oct-tree. Empty leavesare suppressed though each populated node above logicallevel three has eight children. Notice that physical level p and logical level l are distinct. See text for further discus-sion. corner x max , z max . If periodicity conditions are imposedon ∂ Ω then additional reﬁnement may be required forboundary intersecting MeshBlock objects so as to main-tain the aforementioned inter-

MeshBlock reﬁne-ment ratio.2.2.

Vertex-centered Discretization

Natively

Athena++ supports cell-centered (CC) andface-centered (FC) description of variables, togetherwith calculation of line-averages on cell edges Stoneet al. (2020).

GR-Athena++ extends support to allowfor vertex-centering (VC). The modiﬁcations required toachieve this are extensive as core code must be changedin such a way so as to complement existing functionality.The modularity and good code practices of

Athena++ greatly facilitated matters. Our motivation for intro-duction of VC is a desire to ensure each stage of our nu-merical scheme maintains consistent (high) order whilesimultaneously maintaining eﬃciency of R and P op-erator choice and implementation. In the remainder ofthis section we brieﬂy describe this newly introducedfunctionality.2.2.1. VC and Communication: Fixed Physical Level

Unless otherwise stated in all remaining sections weﬁx N M and N B to be uniform in each dimension andrepresent each of these tuples with a single scalar. As a Figure 2.

Example of

Mesh partitioned and reﬁned by

MeshBlock objects indexed (labels explicitly indicated up tological level two) via Z-order and traced in red through con-stituent geometric centroids. The logical relationship be-tween p Ω i and neighbors is stored in an oct-tree. There areno unpopulated leaves. Notice that physical level p and logi-cal level l are distinct; coloring corresponds to physical level: p = 0 in black, p = 1 in dark green, p = 2 in blue, and p = 3 in purple. See text for further discussion. preliminary, x ∈ [ a, b ] is said to be vertex-centered whendiscretized as x I = a + Iδx where δx = ( b − a ) /N B and I = 0 , . . . , N B yielding N B + 1 total samples. In prac-tice, to this an additional N g so-called ghost nodes areappended which extend the interval by N g δx on bothsides. When d = dim(Ω j ) = 2 , an appropriate ten-sor product of such extended one-dimensional discretiza-tions is utilized. When a ﬁeld component V is sampledon such grids it is said to be VC. The additional ghostnodes form a layer that enables imposition of physicalboundary conditions and inter- MeshBlock communica-tion.Consider a domain decomposed into multiple

MeshBlock objects. Discretized variable data must becommunicated. An additional intricacy however arisesdue to the sharing of vertices at neighboring

MeshBlock interfaces that are not part of the ghost-layer. The num-ber of

MeshBlock objects a node is shared between isreferred to as the node-multiplicity.We illustrate this with a two-dimensional example.Let V be sampled on neighboring MeshBlock objects

Figure 3.

Schematic of (communicated) nodes on a two-dimensional

MeshBlock Ω i . The ghost-layer is shaded ingray with alternating shading demarcating diﬀering neigh-bor MeshBlock objects. Nodes marked with “ (cid:3) ” are interiorto Ω i and are unaﬀected as neighbor data is received – allother nodes are updated. Ghost-layer multiplicities are in-dicated for “ (cid:4) ” and dark-green where µ = 1 whereas nodesin “ (cid:7) ” and light-green have µ = 2 . Interface nodes alongedges are marked with “ • ” in light-green and correspond to µ = 2 whereas corner nodes marked “ (cid:63) ” in green correspondto µ = 4 . See text for further discussion. of ﬁxed physical level, where N B = 6 is chosen andghost-zone layer selected to have N g = 2 nodes. Fur-ther, we assume that Ω i is not on the physical boundaryof the domain. This entails that ( N B − nodes areinternal and the remainder require synchronization viadata received (i.e. populated) from neighboring blocksas depicted in Fig.(3). Note that independent commu-nication requests and buﬀers are posted for each neigh-bor. Communication from neighbors therefore has nopreferred order and consequently we follow an averagingapproach to achieve consistency as follows: All receiveddata is ﬁrst additively accumulated on the MeshBlock with node-multiplicity µ dynamically updated in an aux-iliary array of d elements based on the location of therelevant neighbor . After data from all relevant neigh-bors has been received, a ﬁnal division by µ is performed.This is done so as to not preferentially weight data fromany particular neighbor. In principle, it is possible toconstruct µ a priori, however, we elect to follow thisdynamical strategy to facilitate automatic treatment ofboundary conditions and avoid the additional complex-ity involved during Mesh reﬁnement. For a given node µ is uniform in all VC variables that are to becommunicated and hence need only be constructed once in theabsence of Mesh reﬁnement. The choice of d elements is made tosimultaneously treat communication over distinct physical levels(see §2.2.2). Communication: Distinct Physical Levels

We now consider a

Mesh featuring reﬁnement. Thefundamental description of variables between neighbor-ing

MeshBlock objects may therefore diﬀer by (at most)a single physical level (see also 2.1).A

MeshBlock at physical level p will be denoted by p Ω j and the corresponding collection of ﬁelds sampled usingVC discretization over the MeshBlock as F ( p Ω j ) . Inthis context V ∈ F ( p Ω j ) , has a complementary, coarseanalogue V c of ( (cid:98) N B / (cid:99) + 1) d samples further extendedby a coarse ghost-layer comprised of N cg nodes. Thesampling resolution for V c is thus half that used for V . In order to emphasize the physical level of a given MeshBlock and not blur the distinction between thetypes of samplings we also make use of the notation F c ( p Ω j ) ≡ F ( p − Ω j ) .In contrast to CC and FC as implemented in Athena++ our implementation of VC allows for N g and N cg to takeodd values and be independently varied. For simplicityof discussion we impose N g = N cg .When a Mesh involves multiple physical levels, priorto any communication of data, VC variables are initiallyrestricted so as to have a fundamental and coarse de-scription on each

MeshBlock excluding the ghost-layers.For logically Cartesian grids in particular, this turns outto be an inexpensive and exact operation (§2.2.3). Withthis initial step neighboring

MeshBlock objects at thesame physical level have V and V c communicated usingthe method described in §2.2.1.To describe our treatment when neighboring physi-cal levels diﬀer, consider a two-dimensional Mesh where N B = 8 and N g = 2 . Once more, we work within alocal portion of the full Mesh where the role of the phys-ical boundary may be ignored. Suppose p Ω A is neigh-bored by p +1 Ω B and p +1 Ω C to the east and the lattertwo MeshBlock objects share a common edge. Figure 4shows how the ghost-layer nodes of the ﬁner p +1 Ω B based on the coarser neighbor p Ω A are populated. Inthis situation data may be freely posted for communi-cation to the MeshBlock on the ﬁner level whereuponghost-zones of its coarse variable are populated. How-ever depending on the details of P , the prolongation op-eration over the ghost-layer is blocked in the sense thatthe entirety of the coarse ghost-layer of F c ( p +1 Ω B ) mustﬁrst be populated. Once fully populated, prolongationis carried out on the target MeshBlock . During synchro-nization of data from coarse to ﬁne levels interface nodesare maintained at the value of the ﬁner level.An example of the dual process of populating nodeson a coarser level involving p Ω A and p +1 Ω C is depictedin Fig.5. In this case (previously) restricted data of theﬁner MeshBlock interior is communicated, updating the

Figure 4.

Schematic of two-dimensional

MeshBlock p Ω A used to populate ghost-nodes of ﬁner MeshBlock p +1 Ω B . Lo-cal view of the Mesh depicts nearest-neighbor

MeshBlock con-nectivity and physical levels. Nodes over p Ω A , and p +1 Ω B together with coarse analogues are shown. Sampled values V ∈ F ( p Ω A ) that are to be sent are marked by “ • ” in darkgreen; this data is received and directly populates the ghost-nodes marked by “ (cid:4) ” in dark green, i.e., V ∈ F c ( p +1 Ω B ) .Once the remaining data for F c ( p +1 Ω B ) – marked by “ (cid:3) ”– is ﬁlled, and any multiplicity conditions (here suppressed)are accounted for, prolongation P : F c ( p +1 Ω B ) (cid:55)→ F ( p +1 Ω B ) can be performed in order to populate values at the ghost-nodes of p +1 Ω B marked by “ (cid:4) ” in purple. Notice that forthis procedure data at nodes on the neighbor interface re-main unchanged. See text for further discussion. Figure 5.

Schematic of two-dimensional

MeshBlock p +1 Ω C used to populate ghost-layer of coarser MeshBlock p Ω A . Lo-cally the Mesh has the same structure as in Fig.4. Nodes over p Ω A , and p +1 Ω C together with coarse analogues are shown.Prior to communication, sampled data of V ∈ F ( p +1 Ω C ) atnodes marked by “ • ” in purple must be restricted to popu-late data V c ∈ F c ( p +1 Ω C ) at nodes marked by “ • ” in darkgreen. This is then sent whereupon data at the nodes of p Ω A marked by “ (cid:4) ” in dark green is provided. During thisprocedure data at nodes on the neighbor interface are (addi-tively) updated (cf. Fig.4) and multiplicity conditions (heresuppressed) dynamically updated in an auxiliary array. Seetext for further discussion. common interface and ghost-layer of F ( p Ω A ) . In thissituation no blocking occurs. However, non-trivial mul-tiplicity conditions arise on the common neighbor inter-face. Furthermore, the equivalent operation involving p +1 Ω B instead of p +1 Ω C induces another edge withinthe ghost-layer of p Ω A . Finally, we note that valuesof F c ( p Ω A ) must also be updated. Thus another re-striction of samples of F c ( p +1 Ω C ) is also made and theoverall communication process repeated.The steps for the above communication procedure aresummarized in §2.2.4.2.2.3. Restriction and Prolongation

When a

Mesh is reﬁned restriction R and prolongation P operations are required. In GR-Athena++ these oper-ations for VC variables are implemented based on uni-variate Lagrange polynomial interpolation or productsthereof when dim(Ω) > with function data utilized atnodes centered about a target-point of interest.For a Mesh sampled according to a Cartesian coordi-natization

MeshBlock grids are uniformly spaced in eachdimension. This provides immediate simpliﬁcations to R and P which may be understood as follows. Con-sider interpolation of a smooth function V on a one-dimensional interval. A polynomial interpolant ˜ V of de-gree N with samples of the function V symmetricallyand uniformly spaced about x ∗ that passes through the N + 1 distinct points: I V := { ( x i , V ( x i )) | x i = x ∗ + iδx ∧ i ∈ {− N, . . . , N }} , is unique and may be written in Lagrange form Tre-fethen (2013): ˜ V ( x ) = N (cid:88) i = − N l i ( x ) V ( x i ) , (1)where the Lagrange cardinal polynomials satisfy l i ( x j ) = δ ij when x j is a node used during formationof I V and δ ij is the Kronecker delta. We use Eq.(1) (orappropriate product generalizations) in order to specify R . Given function data on a uniform, VC discretized in-terval suppose we wish to construct data on the interiorof a coarser overlapping interval that shares the sameend-points and is sampled at twice the spacing. We ﬁndthat points in the image of R (i.e. desired points overthe coarse grid) form a subset of points over the orig-inal ﬁne grid. Therefore the desired data may simplybe immediately copied (see Fig.5). This is eﬃcient andinvolves no approximation.Recall that the restriction operator is utilized duringtransfer of data from a MeshBlock to a coarser neighbor.Consider the case of the two-fold coarsened data that must be provided to the neighbor

MeshBlock . While R as speciﬁed here is an idempotent operation, some caremust be taken, because the spatial extent of the ghost-nodes to be populated is sampled by the non-ghost dataof the source MeshBlock . To ensure this is possible weimpose a constraint relating

MeshBlock sampling andghost-layer through: N B ≥ max(4 , N g − . (2)The above does not place any constraint on whether N g is even and therefore a choice of an odd or evennumber is allowed.Interpolation based on Eq.(1) is also utilized for pro-longation. Here function data is transferred to a ﬁner,uniformly sampled grid of half the spacing and conse-quently interspersed nodes coincide (see Fig.4) oﬀeringanother optimization in execution eﬃciency. Due tothe uniform structure of the source and target grids,interpolation at non-coincident nodes may be imple-mented through a weighted sum where weight factorscan be precomputed Berrut & Trefethen (2004). In prac-tice the width of the interpolation stencil we utilize is N = (cid:98)N cg / (cid:99) + 1 .Tailored, optimized routines for GR-Athena++ incor-porate the above simpliﬁcations for the case of logicallyCartesian grids.Finally, for later convenience we note that the d -rectangle [ x L , x R ] d as represented by a Mesh with Carte-sian coordinatization, N M points along each dimension,and N L physical levels of reﬁnement has a grid spacingon the ﬁnest level of: δx = x R − x L N M N L − . (3)2.2.4. Summary

We close discussion of VC by providing a compactsummary of the overall logic involved during synchro-nization of data between

MeshBlock objects for a prob-lem involving reﬁnement.At compile time N g and N cg are selected and C++ templates specify precomputed weights for any requisiteinterpolation during a computation (thus ﬁxing R and P ). A given problem of interest may then be executedfor some choice of N M , N B (subject to Eq.(2)), N L , andphysical grid.The following steps are taken when function data froma MeshBlock p Ω i is to be sent :i. Non-ghost data is restricted populating F c ( p Ω i ) .ii. Neighbor MeshBlock objects are iterated over andtreated according to the physical level of the targetneighbors and the communication buﬀers are popu-lated from: p − Relevant interior (and shared interface) nodesof F c ( p Ω i ) ; similarly F ( p Ω i ) is twice restricteddirectly to the communication buﬀer. p : Relevant interior (and shared interface) nodesof F ( p Ω i ) together with F c ( p Ω i ) . p + 1 : Relevant interior nodes not on the common in-terface from F ( p Ω i ) .The following steps are taken when function data on p Ω i is to be received :i. The ghost-layer of variable data for the given MeshBlock p Ω i is set to zero and any previously ac-cumulated multiplicites are reset.ii. Non-ghost data is restricted populating F c ( p Ω i ) .iii. Function data is independently received (unordered)from neighbor MeshBlock objects. Treatment againsplits based on physical level of the salient neighborwith additive updating of the following

MeshBlock -local function data: p − Relevant ghost-layer nodes of F c ( p Ω i ) . p : Relevant ghost-layer and interface nodes of F c ( p Ω i ) and F ( p Ω i ) . p + 1 : Relevant ghost-layer and interface nodes of F ( p Ω i ) and F c ( p Ω i ) .iv. Once all neighbor function data is received, divisionby multiplicity conditions is carried out.v. Regions of the ghost-layer involving a coarser levelneighbor may ﬁnally be prolongated.For local calculations (in the absence of distributed, MPIcommunications) operations are performed locally inmemory. Finally we emphasize that the base Athena++

CC and FC variables when required continue to simul-taneously function as explained in Stone et al. (2020).2.3.

Geodesic spheres

Calculating quantities such as the ADM mass, mo-mentum and gravitational radiation associated with anisolated system typically involves integration over spher-ical surfaces, the radii of which are controlled by a lim-iting procedure. In practice, a large but ﬁnite radiusis often selected during numerical work. Denote the -sphere of ﬁxed radius R by S R . The natural choice ofspherical coordinatization for S R involves uniform sam-pling in the polar and azimuthal angles ( ϑ, ϕ ) and it iswell-known that problems may arise at the poles duringdescription of geometric quantities; furthermore, pointstend to cluster there which may be undesirable from Figure 6.

Structure of the geodesic grid used by

GR-Athena++ . Left panel: an example of a low resolution (92vertices) geodesic grid highlighting the features of the grid.Right panel: a grid used for gravitational wave extraction inproduction simulations (9002 vertices). the stand-point of eﬃciency in some applications. In

GR-Athena++ we avoid these issues by instead workingwith triangulated geodesic spheres. In short, a geodesicsphere of radius R (denoted Q R ) may be viewed as theboundary of a convex polyhedron embedded in R withtriangular faces, i.e., a simplicial 2-sphere that is home-omorphic to S R . A sequence of geodesic spheres with anincreasing number of vertices (and consequently surfacetiling triangles) thus serves as a sequence of increasinglyaccurate approximants to S R ; see Fig.6To construct the geodesic grid we start from a regularicosahedron with 12 vertices and 20 plane equilateral tri-angular faces, embedded in a unit sphere. We reﬁne itusing the so called “non-recursive” approach describedin Wang & Lee (2011). In this approach, each planeequilateral triangle of the icosahedron is divided into n Q small equilateral triangles (each side of the triangleis split into n Q equal segments, where n Q is called thegrid level). The intersection points are projected ontothe unit sphere, and together with the original 12 ver-tices of the icosahedron they form the convex polyhedronused as a grid. The resulting polyhedron has n Q + 2 vertices, in which we deﬁne the desired physical quan-tities. The left panel of Fig.6 shows the grid consistingof 92 vertices ( n Q = 3 ), while the right panel shows thegrid consisting of 9002 vertices ( n Q = 30 ).Integrals on the sphere are computed with numeri-cal quadratures. To this aim we associate to each gridpoint a solid angle in the following way. We constructcells around each vertex of the grid by connecting thecircumcenters of any pair of triangular faces that sharea common edge. The resulting cells are mostly repre-sented by hexagons, apart from the 12 vertices of theoriginal icosahedron, which have only ﬁve neighbors andtherefore correspond to pentagonal cells. The solid an-gles subtended by the cells at the center of the sphereare used as weighting coeﬃcients when computing theaverages. The logical connection between the neighbor-ing cells is implemented as described in Randall et al.(2002).Using a geodesic grid ensures more even tiling of thesphere compared to the uniform latitude-longitude gridof similar resolution. The ratio between the solid anglescorresponding to the largest and smallest cells in the n Q = 30 grid is equal to . For comparison, a gridof comparable resolution with uniform sampling in thepolar and azimuthal angles (say, 67 ϑ angles and 134 ϕ angles, with the total of 8978 cells), would have theratio between the areas of the smallest and largest cells (cid:39) / sin( π/ (cid:39) . . Z4C

SYSTEM IN

GR-Athena++

In the Cauchy problem for the Einstein ﬁeld equations(EFE), a globally hyperbolic space-time M is foliatedby a family of non-intersecting spatial slices { Σ t } t ∈ R where the parametrizing time-function t is assumedglobally deﬁned. An initial slice Σ t is selected andwell-posed evolution equations based on the EFE mustbe prescribed. A variety of mature approaches exist tothis problem such as BSSNOK Nakamura et al. (1987);Shibata & Nakamura (1995); Baumgarte & Shapiro(1999) or those based on the generalized harmonic gauge(GHG) formulation Friedrich (1985); Pretorius (2005);Lindblom et al. (2006). A unifying framework is pro-vided in the Z4 approach Bona et al. (2003) where par-ticular cases of both GHG and BSSNOK formulationsmay be recovered (see Bona et al. (2010) and referencestherein). In particular Z4 c Bernuzzi & Hilditch (2010);Ruiz et al. (2011); Weyhausen et al. (2012); Hilditchet al. (2013) seeks to combine the strengths of theseother two approaches Cao & Hilditch (2012) thus moti-vating it as the choice of formulation for GR-Athena++ .In §3.1 and §3.2 we describe the overall idea behind nu-merical evolution with Z4 c and implementation within GR-Athena++ . Details on our method for wave extrac-tion (i.e., calculation of gravitational radiation) is pro-vided in §3.3 whereupon §3.4 closes with a brief descrip-tion of numerical methods we utilize.3.1.

Overview

At its core, the Z4 formulation Bona et al. (2003) seeksto stabilize the time-evolution problem through directaugmentation of the EFE via suitable introduction ofan auxiliary, dynamical vector ﬁeld Z a and ﬁrst-ordercovariant derivatives thereof. The approach admits nat-ural incorporation of constraint damping via explicitappearance of (freely chosen) parameters κ i Gundlachet al. (2005); Weyhausen et al. (2012). Recall that in the standard method of ADM-decomposition Arnowitt et al. (1959); Baumgarte &Shapiro (2010) one introduces a future-directed t a satis-fying t a ∇ a [ t ] = 1 and considers t a = αn a + β a where n a is a future-directed, time-like, unit normal n a toeach member of the foliation Σ t , α is the lapse and β a the shift. Subsequently geometric projections of am-bient ﬁelds, to (products of) the tangent and normalbundle(s) of Σ may be considered, which here leads toevolution equations for the augmented EFE. The evo-lution equations are written in terms of the variables (cid:0) γ ij , K ij , Θ , ˇ Z i (cid:1) where γ ij is the induced metric and K ij the extrinsic curvature associated with Σ ; Θ := − n a Z a and ˇ Z i := ⊥ ai Z a (with ⊥ ab := g ab + n a n b and g ab being the space-time metric) are the normal and spatialprojections of Z a respectively. Furthermore Hamilto-nian, momentum, and auxiliary vector constraints mustalso be satisﬁed C U := ( H , M i , Z a ) = 0 such that a nu-merical space-time is faithful to a solution of the stan-dard EFE. Importantly, for a space-time without bound-ary if C U = 0 for some element of the foliation Σ t ∗ then analytically this property extends for all t Bonaet al. (2003). This compatibility of C U with the evolu-tion is one crucial property for numerical calculationsallowing for a choice of free-evolution scheme. In such ascheme equations are discretized and initial data of in-terest is prepared so as to satisfy C U = 0 on Σ t , duringthe course of the time-evolution C U is monitored and itmust be veriﬁed that any accumulated numerical errorconverges away with increased resolution.In Z4 c Bernuzzi & Hilditch (2010); Hilditch et al.(2013) to fashion an evolution scheme an additional stepis taken wherein a spatial conformal degree of freedomis ﬁrst factored out via: ˜ γ ij := ψ − γ ij , ˜ A ij := ψ − (cid:16) K ij − Kγ ij (cid:17) ; (4)with K := K ij γ ij and ψ := ( γ/f ) / where γ and f are determinants of γ ij and some spatial referencemetric f ij respectively. Here we assume f ij is ﬂat andin Cartesian coordinates which immediately yields the algebraic constraints : C A := (cid:0) ln(˜ γ ) , ˜ γ ij ˜ A ij (cid:1) = 0 . (5)The expression C A = 0 must be continuously enforced during numerical evolution (see §3.4) to ensure consis-tency Cao & Hilditch (2012). Additionally we introduce From the point of view of computational eﬃciency this is trivialto accomplish but strictly speaking doing so entails a partially-constrained evolution scheme. χ := γ − / , ˆ K := K − (6) ˜Γ i := 2˜ γ ij Z j + ˜ γ ij ˜ γ kl ∂ l [˜ γ jk ] , (cid:98) Γ i := ˜ γ jk ˜Γ ijk ; (7)where the deﬁnition of χ implies that χ = ψ − . Col-lectively the Z4 c system is comprised of dynamical vari-ables (cid:0) χ, ˜ γ ij , ˆ K, ˜ A ij , Θ , ˜Γ i (cid:1) which are governed by the evolution equations : ∂ t [ χ ] = 23 χ (cid:16) α ( ˆ K + 2Θ) − ∂ i [ β i ] (cid:17) + β i ∂ i [ χ ] , (8) ∂ t [˜ γ ij ] = − α ˜ A ij + β k ∂ k [˜ γ ij ] −

23 ˜ γ ij ∂ k [ β k ]+ 2˜ γ k ( i ∂ j ) [ β k ] . (9) ∂ t [ ˆ K ] = − D i [D i [ α ]] + α (cid:20) ˜ A ij ˜ A ij + 13 ( ˆ K + 2Θ) (cid:21) + β i ∂ i [ ˆ K ] + ακ (1 − κ )Θ + 4 πα [ S + ρ ] , (10) ∂ t [ ˜ A ij ] = χ [ − D i [D j [ α ]] + α ( R ij − πS ij )] tf + α [( ˆ K + 2Θ) ˜ A ij − A ki ˜ A kj ] + β k ∂ k [ ˜ A ij ]+ 2 ˜ A k ( i ∂ j ) [ β k ] −

23 ˜ A ij ∂ k [ β k ] , (11) ∂ t [Θ] = α (cid:104) ˜ H − κ (2 + κ )Θ (cid:105) + β i ∂ i [Θ] , (12) ∂ t [˜Γ i ] = − A ij ∂ j [ α ] + 2 α (cid:104) ˜Γ ijk ˜ A jk −

32 ˜ A ij ∂ j [ln( χ )] − κ (˜Γ i − (cid:98) Γ i ) −

13 ˜ γ ij ∂ j [2 ˆ K + Θ] − π ˜ γ ij S j (cid:105) + ˜ γ jk ∂ k [ ∂ j [ β i ]] + 13 ˜ γ ij ∂ j [ ∂ k [ β k ]]+ β j ∂ j [˜Γ i ] − (cid:98) Γ j ∂ j [ β i ] + 23 (cid:98) Γ i ∂ j [ β j ]; (13)where in Eq.(11) the trace-free operation is computedwith respect to γ ij and ˜ H is deﬁned in Eq.(18). Deﬁni-tions of matter ﬁelds are based on projections of the de-composed space-time, energy-momentum-stress tensor: T ab = ρn a n b + 2 S ( a n b ) + S ab , (14)in terms of the energy density ρ := T ab n a n b , momentum S i := − T bc n b ⊥ ci , and spatial stress S ij := T cd ⊥ ci ⊥ dj with associated traces T := g ab T ab = − ρ + S and S := γ ij S ij . The intrinsic curvature appearing in Eq.(11) isdecomposed according to: R ij = ˜ R χij + ˜ R ij , (15) where in terms of the conformal connection ˜D i compat-ible with ˜ γ jk : ˜ R χij = 12 χ (cid:20) ˜D i [ ˜D j [ χ ]] + ˜ γ ij ˜D l [ ˜D l [ χ ]] − χ ˜D i [ χ ] ˜D j [ χ ] (cid:21) − χ ˜D l [ χ ] ˜D l [ χ ]˜ γ ij , (16)and: ˜ R ij = −

12 ˜ γ lm ∂ l [ ∂ m [˜ γ ij ]] + ˜ γ k ( i ∂ j ) [˜Γ k ] + (cid:98) Γ k ˜Γ ( ij ) k + ˜ γ lm (2˜Γ kl ( i ˜Γ j ) km + ˜Γ kim ˜Γ klj ) . (17)Furthermore, we emphasize that in Eq.(13) and Eq.(17)it is crucial to impose (cid:98) Γ i where it appears through thedeﬁnition of Eq.(7).The dynamical constraints in terms of transformedvariables ( ˜ H , ˜ M i , Θ , ˇ Z i ) may be monitored to assessthe quality of a numerical calculation: ˜ H := R − ˜ A ij ˜ A ij + 23 (cid:0) ˆ K + 2Θ (cid:1) − πρ = 0 , (18) ˜ M j := ˜D i [ ˜ A ij ] −

32 ˜ A ij ∂ i [ln( χ )] − ∂ j [ ˆ K + 2Θ] − πS j = 0 , (19) Θ =0 , ˇ Z i =˜Γ i − (cid:98) Γ i = 0 . (20)Depending on the quantities of interest we may alterna-tively monitor the original non-rescaled constraints C U .Furthermore, we introduce for later convenience a single,scalar-valued collective constraint monitor: C := (cid:113) H + γ ij M i M j + Θ + 4 γ ij ˇ Z i ˇ Z j . (21)Finally we note that we have made use of the freedom toadjust the system by non-principal parts prior to con-formal decomposition so as to have a result closer toBSSNOK (which may be obtained by taking the formallimit Θ → in Eqs.(8–13)).3.2. Gauge choice and boundary conditions

To close the Z4 c system it must be further supple-mented by gauge conditions (i.e., conditions on α and β i ) that specify how the various elements Σ t of the foli-ation piece together. Furthermore in this work the com-putational domain does not extend to spatial inﬁnityand consequently boundary conditions (BC) on ∂ Ω mustalso be imposed.In GR-Athena++ we make use of the puncture gaugecondition which consists of the Bona-Másso lapse Bona1et al. (1995) and the gamma-driver shift Alcubierre et al.(2003): ∂ t [ α ] = − µ L α ˆ K + β i ∂ i [ α ] ,∂ t [ β i ] = µ S α ˜Γ i − ηβ i + β j ∂ j [ β i ] . (22)In speciﬁcation of Eq.(22) we employ the lapsevariant µ L = 2 /α together with µ S = 1 /α . Initially a“precollapsed” lapse and zero-shift is set: α | t =0 = ψ − (cid:12)(cid:12) t =0 , β i (cid:12)(cid:12) t =0 = 0; (23)where the choice is motivated by a resulting reductionin initial gauge dynamics Campanelli et al. (2006). Theshift damping parameter η appearing in Eq.(22) reduceslong-term drifts in the metric variables Alcubierre et al.(2003) and serves to magnify the eﬀective spatial resolu-tion near a massive feature, which in turn reduces noisein its local motion and extracted gravitational wave-forms Brügmann et al. (2008) (see also §5). We adopt aﬁxed choice η = 2 /M where M is the total ADM massArnowitt et al. (2008) of the system throughout thiswork as it is known to lead to successful time evolutionof binary black holes (BBH) of comparable masses Brüg-mann et al. (2008) and improves stability more broadlyCao & Hilditch (2012). With a view towards potentialinvestigations of high mass ratio binaries we have alsoincorporated η damping conditions within GR-Athena++ based on BBH location as a function of time Purreret al. (2012) together with the conformal factor basedapproach of Nakano et al. (2011); Lousto et al. (2010);Müller & Brügmann (2010).When coupled to the puncture gauge with the choicesmade above Z4 c forms a PDE system that is stronglyhyperbolic Cao & Hilditch (2012) and consequently theinitial value problem is well-posed Bernuzzi & Hilditch(2010). Artiﬁcial introduction of a boundary at ﬁnitedistance complicates the analysis of numerical stabil-ity signiﬁcantly Hilditch & Ruiz (2018). For the ini-tial boundary value problem the analysis beneﬁts fromsymmetric hyperbolicity of the underlying evolutionarysystem however for Z4 c starting in fully second orderform this property does not appear to exist within alarge class of symmetrizers Cao & Hilditch (2012). Wedo not seek to address this issue further here. Ourboundary treatment follows an approach due to Hilditchet al. (2013). We consider a Cartesian coordinatiza-tion of the Mesh Ω as a compactly contained domainwithin Σ t capturing the physics of interest. On ∂ Ω Sommerfeld BC are imposed on the subset of dynam-ical ﬁelds { ˆ K, ˜Γ , Θ , ˜ A ij } . Though this choice is notoptimal, as it is not constraint-preserving, we have notexperienced issues on account of this. An interesting consideration for future work would be to incorporatewithin GR-Athena++ the constraint-preserving BC of e.g.Ruiz et al. (2011); Hilditch et al. (2013) (see also Rinneet al. (2009)). 3.3.

Wave extraction

To obtain the gravitational wave content of the space-time, we calculate the Weyl scalar Ψ , the projectionof the Weyl tensor onto an appropriately chosen nulltetrad k, l, m, ¯ m . We use the same deﬁnition of Ψ hereas Brügmann et al. (2008): Ψ = − R abcd k a ¯ m b k c ¯ m d , (24)where we have exchanged the Weyl tensor for the Rie-mann tensor since we extract the gravitational wavesin vacuum. The dimensional Riemann tensor is con-structed from split ADM variables using the Gauss-Codazzi relations as detailed in Brügmann et al. (2008).To construct the null tetrad we start from a spatial co-ordinate basis: φ i =( − y, x, , r i =( x, y, z ) , θ i = (cid:15) ikl φ k r l ; (25)which is then Gram-Schmidt orthonormalized. Thenewly formed orthonormal, spatial triad is extended tospace-time with th components set to . With this weconstruct the null tetrad: k = 1 √ n − ˆ r ) , l = 1 √ n + ˆ r ); m = 1 √ θ + i ˆ φ ) , ¯ m = 1 √ θ − i ˆ φ ); (26)where n a = (1 /α, − β i /α ) . Once the Weyl scalar isobtained we perform a multipolar decomposition ontospherical harmonics of spin-weight s = − , deﬁned asfollows : ψ (cid:96)m = (cid:90) π (cid:90) π Ψ ¯ − Y (cid:96)m sin ϑ d ϑ d ϕ, (27) − Y (cid:96)m = (cid:114) (cid:96) + 14 π d (cid:96)m ( ϑ ) e imϕ , (28) d (cid:96)m ( ϑ ) = k (cid:88) k = k ( − k (( (cid:96) + m )!( (cid:96) − m )!( (cid:96) − (cid:96) + 2)!) / ( (cid:96) + m − k )!( (cid:96) − k − k !( k − m + 2)! × (cid:18) cos (cid:18) ϑ (cid:19)(cid:19) (cid:96) + m − − k (cid:18) sin (cid:18) ϑ (cid:19)(cid:19) k − m +2 (29) k = max(0 , m − , (30) k = min( (cid:96) + m, (cid:96) − . (31) Convention here is that of Goldberg et al. (1967) up to a Condon-Shortley phase factor of ( − m . Ψ is ﬁrstcalculated at all grid points throughout the Mesh . Thisis then interpolated onto a set of geodesic spheres Q R (see §2.3) at given extraction radii R Q , over which theintegral in Eq.(27) is performed. Recall that the gridlevel parameter n Q controls the total number of sampleson on Q R as n Q + 2 . We select n Q through localmatching to an area element of a MeshBlock : n Q = (cid:115) πR Q δx −  . (32)The modes of the gravitational wave strain h are com-puted from the projected Weyl scalar by integratingtwice in time ψ (cid:96)m = ¨ h (cid:96)m , (33)The strain is then given by the mode-sum: R ( h + − ih × ) = ∞ (cid:88) (cid:96) =2 (cid:96) (cid:88) m = − (cid:96) h (cid:96)m ( t ) − Y (cid:96)m ( ϑ, ϕ ) . (34)Following the convention of the LIGO algorithms libraryLIGO Scientiﬁc Collaboration (2018) we set: Rh (cid:96)m = A (cid:96)m exp( − iφ (cid:96)m ) , (35)and the gravitational-wave frequency is: ω (cid:96)m = ddt φ (cid:96)m . (36)3.4. Numerical technique

We implement the Z4 c system described in §3.1 in GR-Athena++ based on the method of lines approachwhere ﬁeld variables may be chosen to obey a VC orCC discretization at compile time and time-evolution isperformed using the fourth order in time, four stage,low-storage RK S ] method of Ketcheson (2010).Generic spatial ﬁeld derivatives in the bulk (awayfrom ∂ Ω ) are computed with high-order, centered, ﬁnitediﬀerence (FD) stencils whereas shift advection termsuse stencils lopsided by one grid point Zlochower et al.(2005); Husa et al. (2008); Brügmann et al. (2008); Chir-vasa & Husa (2010). The implementation is based onAlﬁeri et al. (2018) and utilizes C++ templates to oﬀerﬂexibility in problem-speciﬁc accuracy demands with-out performance penalties. A similar approach is takenfor implementation of the R and P operators discussedin §2.2.3. With this a consistent, overall, formal or-der throughout the bulk of the computational domain ismaintained during calculations by compile-time speciﬁ-cation of the ghost-layer through choice of N g together with N cg . Throughout this work we take N g = N cg though for a VC discretization this is not a requirementwithin GR-Athena++ and may be tuned to the demandsof the desired

Mesh reﬁnement strategy. In the case of Z4 c and VC this translates to an order for spatial dis-cretization in the bulk of N g − .We emphasize that during calculation of FD, R and P approximants, special care has been taken in the or-dering and grouping of arithmetical operations so as toreduce accumulation of small ﬂoating-point diﬀerences.This is a particularly important consideration in thepresence of physical symmetries where linear instabil-ities may amplify unwanted features present in the op-erator approximants and lead to resultant, spurious ap-pearance of asymmetry during late-time solutions Stoneet al. (2020).For most calculations involving Z4 c the treatment ofthe physical boundary is non-trivial. GR-Athena++ ex-tends

Athena++ by providing the Sommerfeld BC mo-tivated in §3.2. To accomplish this, within every time-integrator substep an initial Lagrange extrapolation isperformed so as to populate the ghost-layer at ∂ Ω . Or-der is again controlled at compile-time and we typicallyselect N g + 1 points for the extrapolation, albeit nu-merical experiments did not indicate signiﬁcant changeswhen this choice was varied. The dynamical equationsof §3.1 and gauge conditions of §3.2 populate the sub-set of ﬁelds { χ, ˜ γ ij , α, β i } on nodes of ∂ Ω whereasfor { ˆ K, ˜Γ , Θ , ˜ A ij } Sommerfeld BC are imposed as inHilditch et al. (2013) where ﬁrst order spatial deriva-tives are approximated through second order accurate,centered FD; we have found this to be crucial for nu-merical stability.As observed in Cao & Hilditch (2012) in the absenceof algebraic constraint C A projection Z4 c is only weak-hyperbolic. Therefore in GR-Athena++ we enforce C A at each time-integrator substep. On the other hand acoarse indicator on the overall error during the courseof a calculation is provided through inspection of theconstraints C U together with C (of Eq.(21)). Note thatthese latter constraints are not enforced.In order to have conﬁdence in implementation detailswe have replicated a subset of tests from the “Appleswith Apples” test-bed suite Alcubierre et al. (2004);Babiuc et al. (2008); Cao & Hilditch (2012); Daverioet al. (2018) a discussion of which is provided in theappendix §A.The Z4 c system does not strictly impose any partic-ular underlying Mesh structure or reﬁnement strategyand consequently we use this freedom to improve eﬃ-ciency and accuracy by raising resolution only whereit is required. During evolution the Courant-Friedrich-3Lewy (CFL) condition must be satisﬁed. To achievethis in the context of reﬁnement, spatial resolution onthe most reﬁned level and the choice of

CFL factor itselfdetermines the global time-step that is applied on each

MeshBlock . Finally, in order to suppress high-frequencynumerical artifacts generated at

MeshBlock boundariesand not present in the physical solution we make useof high-order Kreiss-Oliger (KO) dissipation Kreiss &Oliger (1973); Gustafsson et al. (2013) of uniform fac-tor σ over all levels. In particular given a system oftime-evolution equations for a vector of variables u thereplacement ∂ t [ u ] ← ∂ t [ u ] + σ D [ u ] is made where D [ · ] is proportional to a spatial derivative of order N g − . MESH REFINEMENT FOR PUNCTURESBlack holes in

GR-Athena++ are modeled as in

BAM making use of the puncture formalism Brügmann et al.(2008). In numerical relativity, BH can be treated byadopting the Brill-Lindquist wormhole topology whichconsists of considering N black holes with N+1 asymp-totically ﬂat ends for the initial geometry. These ﬂatends are compactiﬁed and identiﬁed with points on R and the coordinate singularities at these points are called punctures . This allows one to produce black hole ini-tial data associating masses, momenta and spins to anynumber of black holes. The main application of thisformalism is binary black hole evolution.The adaptive mesh reﬁnement criterion implementedin GR-Athena++ for puncture evolution mimics the clas-sic box-in-box reﬁnement (used in e.g.

BAM , Cactus ),within the

Athena++ infrastructure. The main idea isto follow the punctures’ position during the evolutionand reﬁne the grid depending on the distance from eachpuncture. 4.1.

Punctures’ initial data

Black holes initial data are constructed follow-ing Brügmann et al. (2008). We consider as our ini-tial data the positive-deﬁnite metric and extrinsic cur-vature ( γ ij , K ij ) on a spatial hypersurface Σ withtime-like unit normal n i such that n i n i = − . Suchinitial data are constructed by means of the confor-mal, transverse-traceless decomposition of the initial-value equations York (1979). We can use the map ofEq.(4) and freely choose an initially conformally ﬂatbackground ˜ γ ij = δ ij and take a maximal slice, i.e.set K = 0 . Doing so, the momentum constraint be-comes ∂ j (cid:16) ψ ˜ A ij (cid:17) = 0 and admits Bowen-York solu-tions Bowen & York (1980) for an arbitrary number ofblack holes.The Hamiltonian constraint reduces to an ellipticequation for ψ , with solution (for N black holes with centers at r i ): ψ = 1 + N (cid:88) i =1 m i r i + u. (37)The variable ψ represents the initial value of ψ , which,based on its relation to χ , is evolved according to Eq.(8).In this equation the function u can be determined by anelliptic equation on R and is C at the punctures and C ∞ elsewhere. The variable m i is called the bare mass of a black hole and it coincides with the actual massonly in the Schwarzschild case. The total ADM mass ofeach black hole at the puncture is given by: M i = m i  u i + (cid:88) i (cid:54) = j m j d ij  , (38)where u i is the value of u at each puncture and d ij isthe coordinate distance between each pair of punctures.Ultimately, we denote the total mass of the system with M , which represents the physical mass scale of the prob-lem and thus all results will be reported accordingly.To produce BBH initial data following the above de-scription, we make use of an external C library based onthe pseudo-spectral approach of Ansorg et al. (2004),which is also used in the TwoPunctures thorn of Cactus . 4.2.

Puncture tracker

To follow the punctures’ position we need to solve anadditional ODE, which is not coupled to the Z4c system.Since the conformal factor χ vanishes at the puncture,Eq.(8) implies that Campanelli et al. (2006): ˙ x p ( t ) = − β | x p ( t ) , (39)where β | x p is the shift function evaluated at the punc-ture position. We solve this vectorial equation at everytimestep using an explicit Euler solver. Though BAM im-plements higher order methods to solve this equation(Crank–Nicolson method), the solution obtained withthe ﬁrst order Euler solver agrees with that of

BAM wherea comparison is made for two trajectories in the leftpanel of Fig.10. 4.3.

Oct-tree box-in-box In Athena++ , adaptive mesh reﬁnement (AMR) is im-plemented as follows: during the evolution a certain We adapted the public code into a stand-alone library thatmay be found at the URL https://bitbucket.org/bernuzzi/twopuncturesc/.

MeshBlock and conse-quently the code reﬁnes the particular

MeshBlock , de-reﬁnes it or does nothing. In the punctures’ case thecondition we employ relies on the punctures’ position.For a given

MeshBlock we ﬁrst calculate the distance min i || x i p − x MB || ∞ , where x p , x MB denote the punctureand MeshBlock positions respectively, and i labels eachpuncture that is present. This allows one to assess the theoretical reﬁnement level the MeshBlock should be in.If the reﬁnement level of the considered

MeshBlock isnot the same as the theoretical one just calculated, thenthe block is either reﬁned or de-reﬁned according to itscurrent level. The theoretical reﬁnement level is deter-mined by considering a classic box-in-box structure ofthe grid, in which each puncture is enclosed in a seriesof nested boxes centered on the puncture, all with thesame number of points but with increasing physical ex-tent, i.e. with decreasing resolution. In particular, eachbox has half of the resolution of the next one it contains.The presence of punctures and their position determinesa structure of nested boxes, in such a way that the small-est (and ﬁnest) imaginary box around a puncture deﬁnesthe highest reﬁnement level. The box containing it cor-responds to a lower reﬁnement level and so on up tothe th level which corresponds to the initial mesh it-self. Practically, to deﬁne a grid in (GR-)Athena++ oneneeds to specify the number of points of the initial meshgrid N M , the number of points per MeshBlock consti-tuting the mesh N B , and the number of total reﬁnementlevels N L up to which the grid has to be reﬁned. Follow-ing the procedure above, GR-Athena++ will reﬁne each

MeshBlock , producing an oct-tree box-in-box grid struc-ture. A visualization of this can be seen in Fig.7, inwhich the initial conﬁguration of two coalescing blackholes and a snapshot at later time are shown. Here N M = 64 and N B = 16 , thus the initial mesh is di-vided into MeshBlock s (level 0, see §2.1). Follow-ing the procedure described above,

MeshBlock s are sub-divided up to level 10, which corresponds to the smallest

MeshBlock s containing the two black holes. The ﬁnalgrid is composed of the initial

MeshBlock s, which arethose far from the punctures and thus untouched, andincreasingly smaller blocks (in terms of physical extent)for each reﬁnement level. Note that in the top panellevels 7, 8, 9, 10 are visible, while due to regridding atlater times in the bottom panel

MeshBlock s of level 7 For implementation reasons, x MB is deﬁned as follows: we con-sider a cube with same center, the edge of which is / of theedge of the original MeshBlock . x MB are the coordinates of thecorner of this cube which is closer to the closest puncture. (the ones closest to the edges of the plot) have beensubdivided and their children belong to level 8. Figure 7.

2D slice at z = 0 of a mesh grid produced with GR-Athena++ , setting 11 reﬁnement levels. Top panel: blackholes at initial position x ± = ( ± . , , M . Bottompanel: snapshot at M in which trackers are shown (redlines). The color code refers to the value of the conformalfactor χ . For clarity, in the ﬁgure only a subset of the totalslice is shown, therefore only highest levels are visible. It isalso possible to see the underlying box-in-box structure, inwhich boxes are made up of MeshBlock s. Grid conﬁgurations

In order to accurately and eﬃciently perform a BBHmerger simulation it is crucial to optimize the grid con-ﬁguration for a given problem. Thus to attain accuracyat reduced computational cost a balance must be strucksuch that the strong-ﬁeld dynamics are well-resolved and5their eﬀects propagate cleanly into the wave-zone for ex-traction. The former can be directly controlled based onthe oct-tree box-in-box reﬁnement criterion and ﬁxinga target puncture resolution δx p or equivalently, maxi-mum number of reﬁnement levels N L . For the wave-zonean extraction radius R must be selected and the under-lying reﬁnement strategy together with choices of N M and N B induce a resolution δx w . Finally, the maximumspatial extent x M of the computational domain Ω mustbe chosen to be suﬃciently large so as to mitigate anypotential spurious eﬀects due to imposed approximateboundary conditions.Unless otherwise stated we select Ω as the Cartesiancoordinatized cube [ − x M , x M ] which results in a res-olution on the most reﬁned level in the vicinity of apuncture of: δx p = 2 x M N M N L − . (40)Wave-zone resolution δx w may similarly be computedtaking N L = (cid:100) log (cid:0) x M R (cid:1) (cid:101) in the above where R is theextraction radius. For the calculations presented in thiswork, we typically select N B = 16 , which allows forup to th order accuracy for approximants to operatorspertaining to quantities appearing during spatial dis-cretization. The constraint on maximum approximantorder that can be selected for a choice of N B is given inEq.(2) which arises on account of the double restrictionoperation described in §2.2.3. Natively, GR-Athena++ supports up to th order which may be further extendedthrough simple modiﬁcation of the relevant C++ tem-plates.We have observed that a simple approach to furtheroptimize for eﬃciency once convergence properties areestablished is to modify N M and N B . PUNCTURE TESTSIn this section we present several tests of punctureevolutions to validate

GR-Athena++ . We compare ourresults against

BAM code and

TEOBResumS , used as bench-marks. We also demonstrate the convergence propertiesof our code for these tests.Unless otherwise stated, throughout this section weadopt tortoise coordinates, in which evolution time t is mapped as t → u ≡ t − r ∗ , where r ∗ = r +2 M log (cid:12)(cid:12) r M − (cid:12)(cid:12) and M, r are the total mass of thebinary system and the Schwarzschild coordinate, re-spectively. In waveform plots quantities are suitably Direct control on δx w is oﬀered in GR-Athena++ through optionalintroduction of a minimum reﬁnement level maintained over aball of radius R centered at C but for the results presented itwas not found to be necessary to utilize. rescaled by M (see §4.1) and by the symmetric massratio ν := M M ( M + M ) . The merger or time of merger is deﬁned as the time corresponding to the peak of the ( (cid:96) = 2 , m = 2) -mode of A (cid:96)m (deﬁned as in Eq.(35)).5.1. Single Spinning Puncture

In order to verify the evolution of a single black holepuncture, as well as the gravitational wave signal calcu-lated by

GR-Athena++ we perform a direct comparisonwith the established

BAM code. Using initial data gen-erated by the

TwoPunctures library for both codes, asused in similar tests of the

BAM code in Hilditch et al.(2013), we simulate the evolution of a single spinningpuncture, representing a Kerr black hole with dimen-sionless spin parameter a = 0 . . To this end we initial-ize two black holes, one with the target mass, M , andspin a = 0 . , and another with negligible mass, − M ,and zero spin, with a small initial separation, − M .These black holes merge soon after the simulation be-gins, and the resulting single black hole can be treated asour target Kerr BH. We use the static mesh reﬁnementof GR-Athena++ to construct a reﬁned grid around thepuncture that matches the resolution of the

BAM evolu-tion both at the puncture ( δx = 0 . M ) and in thewave zone ( δx = 0 . M ).To compare the two wave signals, we calculate thedominant (2 , mode of the strain from the expressionsabove in Eqs.(27–31, 33). In doing so, we perform twointegrations in the time domain (note this is diﬀerentto the frequency domain integration performed for the (2 , mode studied below for the black hole binary, ashere we have no well-motivated cut-oﬀ frequency avail-able Dietrich & Bernuzzi (2015)). These integrationsmay add an arbitrary quadratic polynomial in time ontothe strain as constants of integration Damour et al.(2008); Baiotti et al. (2009) and so, in the results pre-sented here, we ﬁt for, and then subtract, this quadratic.We further note this reconstruction has been shown tointroduce errors in the waveform ring-down Dietrich &Bernuzzi (2015).In Fig.8 we show the match between the two calcu-lated signals for the real part of the (2 , mode of thegravitational wave strain. These show consistency inthe phasing of the signal, with slight discrepancies inthe amplitude of the strain.We also demonstrate the convergence properties of thewaveforms in GR-Athena++ for this test. We performthe same simulation at a coarse, medium and ﬁne reso-lution (with ﬁnest grid spacings δx c = 0 . M, δx m = Hereafter the (2,2) mode, and similarly for other ( (cid:96), m ) modesand for all related quantities. Figure 8. (cid:60) ( Rh /M ) for a single spinning puncture in GR-Athena++ and

BAM with diﬀerence shown in black as afunction of Schwarzschild tortoise coordinate, u . Wave ex-tracted at R = 50 M . Figure 9.

The diﬀerence between the waveform at coarseand medium resolution ( δx c − δx m ) is consistent with thediﬀerence between the waveform at medium and ﬁne resolu-tion ( δx m − δx f ) when rescaled by the appropriate factor for th order convergence Q . Waves extracted at R = 50 M . M, δx f = 0 . M ), and show that the diﬀer-ence between the medium and ﬁne resolution waveformmatches the diﬀerence between the coarse and mediumwaveform, when rescaled by the factor Q n for n th orderconvergence, deﬁned as: Q n = δx nc − δx nm δx nm − δx nf . (41)In Fig.9 we show convergence properties for a simula-tion with th order accurate ﬁnite diﬀerencing operators.We see here that, rescaling assuming th order conver-gence, GR-Athena++ demonstrates under-convergence atinitial times and a consistent order of convergence at later times, although without a precise point-wise scal-ing. We note that for similar tests performed with the

BAM code in Hilditch et al. (2013) using initial data con-structed in the same manner, these same convergenceproperties are observed for the waveform of the spin-ning puncture problem.5.2.

Calibration evolution of two punctures

We validate

GR-Athena++ against binary black holeevolutions by comparing with

BAM and performing con-vergence tests. For these tests the two initial non-spinning black holes, with bare-mass m ± = 0 . M , arelocated on the x − axis, with x , ± ( t = 0) = ± . M ,and initial momenta directed along the y − axis, p ± ( t =0) = ∓ . M . The gauge is chosen as explained in§3.2. For the GR-Athena++ vs.

BAM comparison and forconvergence tests several runs at diﬀerent resolutionshave been performed. The grid conﬁguration for bothcodes is described in detail below. This initial setup re-sults in a ∼ . orbits evolution of the two black holes be-fore merger, which happens at evolution time t ∼ M ,as can be seen in Fig.10.In the next two subsections the (2 , mode of gravi-tational wave strain is calculated according to Eq.(33).5.2.1. GR-Athena++ vs.

BAM comparison

Athena++ and

BAM implement completely diﬀerentgrid structures. To compare the two codes, we try togenerate grids as similar as possible aiming to matchthe resolution at the puncture and the physical extentof the grid. In the case of

GR-Athena++ we choosegrid parameters N B = 16 , N L = 11 , x M = 3072 M and various resolutions N M = [96 , , , .This results in resolutions at the puncture of δx p =[ 1 . , . , . , . × − M (seeEq.(40)). For BAM the same is achieved in each corre-sponding simulation by considering 6 nested boxes of N M points and 10 smaller boxes (5 per puncture, cen-tered at each one) of N M / points and maximum spac-ing in the outermost grids ∆ x = [96 , , , , M respectively. In both cases all simulations are performedwith th order ﬁnite diﬀerences stencils for derivatives.Fig.10 shows very good agreement between the twocodes regarding black hole trajectories (left panel). Thiscan be also seen looking at the right panel, in which thetwo GW frequencies perfectly match. There is a dis-crepancy of 2% between GW amplitudes that convergesaway with increasing resolution.5.2.2. Convergence tests for

GR-Athena++

Convergence tests are performed on the same runsas the previous section plus an additional run madeat resolution N M = 384 , with N B = 24 resulting in7 Figure 10.

Comparison between

BAM and

GR-Athena++ of the trajectories of puncture − (left panel) and of gravitationalwaveforms (right panel) for resolution N M = 192 . Waveforms are extracted at a representative radius R = 120 M and mergertime is deﬁned as the amplitude peak time. Discrepancy between the two amplitudes is (cid:46) . Figure 11.

Convergence plots for calibration BBH evolution. Left and right plots correspond respectively to th order and th order ﬁnite diﬀerencing. In both case waves are extracted at R = 120 M . In bottom panels phase diﬀerences between resolutionsare rescaled according to Eq.(41) with respect to the blue line (corresponding to lowest resolutions). δx p = 0 . × − M . Moreover, we consider an-other set of runs employing th order ﬁnite diﬀerences,with the same grid setup as in the previous sectionbut halving the maximum extent of the physical grid,namely in this case x M = 1536 M . This has the ef-fect of doubling the resolution at the puncture. In thetop panels of Fig.11 we compare the gravitational wavestrain extracted at R = 120 M for every resolution, bothfor th and th order. In order to quantitatively investigate the eﬀect of res-olution on phase error we inspect diﬀerences in phasebetween runs in the bottom panels of Fig.11. InvertingEq.(35) allows us to write the phase diﬀerence as: | ∆ φ ( α, β ) | := (cid:12)(cid:12)(cid:12) φ [ h ] | α − φ [ h ] | β (cid:12)(cid:12)(cid:12) . (42)In the bottom panel of the th order plot (Fig.11 left)the red and green lines match, demonstrating th orderconvergence for the highest resolutions. In the th ordercase (Fig.11 right) the waveforms (and corresponding8 Figure 12.

Self-convergence test for the calibration BBHevolution. In the plot phase diﬀerence at merger with re-spect to the highest resolution available is reported on y -axis, against the resolution N M . Diamonds correspond to th order series, while dots refer to th order series. Purpleand cyan lines represent the theoretical convergence for bothcases. frequencies) lay on top of each other, so as to be indis-tinguishable, and this translates into smaller phase dif-ferences comparing the bottom panels of the two plots.Even though the red line, corresponding to the phasediﬀerence between the two highest resolution, is quitenoisy, th order convergence can be seen here for thethree highest resolutions as in the previous case. Thisbehavior is present in every extraction radius. In allcases, the plots show a convergent behavior with respectto the phase diﬀerences, i.e., the diﬀerences betweeneach pair of lines decreases with increasing resolution.Additionally, we check the accuracy of our convergencetests by evaluating the error on the phase diﬀerences, es-timated for each line as the diﬀerence between the phasegiven by the Richardson extrapolation formula and thephase corresponding to the highest resolution used tocalculate the Richardson extrapolated phase, similarlyto what is done in Bernuzzi & Dietrich (2016). For eachcorresponding phase diﬀerence line, we ﬁnd that this er-ror is always at least ∼ smaller than ∆ φ .Additionally, we show in Fig.12 a convergence plot inwhich phase diﬀerences are calculated at merger (see be-ginning of §5). This ﬁgure further conﬁrms the clean th order convergence of GR-Athena++ for the highest res-olution cases. For the th order case, comparing withthe other, phase diﬀerences are smaller and the conver-gence is faster. However, the noise in the phase diﬀer-ences with respect to the highest resolution (right plotin Fig.11, bottom panel) makes the convergence assess-ment less clean. 5.3. Two punctures evolution of ten orbits

Physically one anticipates that inspiral of astrophysi-cal binary systems is well-described by a signiﬁcant du-ration of co-orbit on a quasi-circular trajectory Peters &Mathews (1963); Peters (1964). This assumption is con-sistent with the events detected by the LIGO and Virgocollaborations Abbott et al. (2016a,b, 2017a). Conse-quently it is of considerable interest to also test the per-formance of

GR-Athena++ in this scenario. To this endwe evolve non-spinning, equal-mass, low eccentricity ini-tial data based on Hannam et al. (2010) where bare-massparameters are m ± = 0 . M and the puncturesare initially on-axis at x ± = ± . M with instan-taneous momenta p ± = ( ∓ . × − , ± . × − , M . This choice of parameters results in ∼ orbits prior to merger at t ∼ M . In comparison tothe calibration evolution this evolution is of signiﬁcantlylonger duration and therefore it is of interest to inves-tigate how waveform accuracy is aﬀected for a selectionof Mesh parameters that reduce computational resourcerequirements. In order to provide another comparisonthat is independent of

BAM here we provide a ﬁnal as-sessment on the quality of waveforms computed with

GR-Athena++ based on the NR informed, eﬀective-one-body model

TEOBResumS

Nagar et al. (2018).5.3.1.

Setup

The convergence studies performed for the calibra-tion BBH merger problem provide a guide as to howto choose resolution at the puncture δx p . Here we ﬁxthe MeshBlock sampling to N B = 16 and work at th order in the spatial discretization. For the Mesh sam-pling we select N M = 64 and construct a sequence ofgrid conﬁgurations where each value of δx p is reducedby a factor of / compared to the previous in Tab.1. ρ ( · ) N L x M δx p × − [ M ] . . . . . . . Table 1.

A distinct label ρ ( · ) is assigned to each run withcorresponding maximum number of reﬁnement levels N L andﬁxed physical extent x M of the Mesh (see §4.4). Resultantpuncture resolutions δx p and total number of MeshBlock ob-jects initially partitioning a

Mesh are provided.

That the choice of parameters in Tab.1 reduce overallcomputational resource requirements can be understood9

Figure 13.

Coordinate trajectories of both punctures( x + ( t ) in blue and x − ( t ) in orange) for parameter choice ρ h . as follows. Consider the choice of parameters made in ρ h and suppose N M and N L are varied while maintaining δx p ﬁxed. With this, the number of MeshBlock objectsrequired to partition the initial

Mesh changes. For ex-ample, taking N M = 128 and N L = 12 resulted in aninitial number of MeshBlock objects of .Whereas selecting N M = 256 and N L = 11 leads to initially. Generally, we found an approx-imate cubic scaling in as N M is scaled which isrelated to the dimensionality of the problem.In this section an extraction radius of R = 100 M isused. The CFL condition is . and a KO dissipationof σ = 0 . is chosen. Constraint damping parametersare selected as κ = 0 . and κ = 0 .The coordinate trajectories of the punctures for a cal-culation utilizing grid parameters ρ h of Tab.1 can beseen to satisfy ten orbits in Fig.13. This provides an ini-tial veriﬁcation of expected qualitative properties Han-nam et al. (2010) of the BBH inspiral and merger. Inorder to investigate the behavior of the constraints wefocus attention on the collective constraint C of Eq.(21).We display values of C in the orbital plane ( z = 0 ) atﬁxed times t = 500 M and t = 2100 M in Fig.14. Thegeneral properties of C discussed here we found to beshared between other simulations utilizing parametersfrom Tab.1. Crucially, this means that increasing re-ﬁnement in the vicinity of the puncture does not con-taminate the rest of the physical domain. In all cases wefound that away from the punctures values of C decreaseon average as the boundary of the computational do- Figure 14.

Values of the (normalized) collective constraint (cid:98) C ( x, y, z ) := C ( x, y, / max x, y C ( x, y, over the orbitalplane z = 0 for a simulation with ρ h of Tab.1. Upper panel:Evolution time is t = 500 M where max x, y C ( x, y, (cid:39) . . Lower panel: Evolution time is t = 2100 M where max x, y C ( x, y, (cid:39) . . As can be seen in both cases con-straint violation is greatest directly in the vicinity of thepunctures. See text for further discussion. main is approached. In particular, for the calculation in-volving parameters ρ h and during M ≤ t ≤ M as (cid:37) := (cid:112) x + y → M we found C ∼ − whichcontinues to decrease as (cid:37) → M to C ∼ − there-after plateauing at C ∼ − towards the boundary.We found qualitatively similar behavior when inspect-ing the Hamiltonian constraint. We remark that, evenin the continuum limit, constraints are not expected toconverge to zero in the entire domain for this solutionbecause punctures are excluded from R .Of principal interest for gravitational wave detectionis the strain. To this end we solve Eq.(33) for h in0 Figure 15.

The (2 , multipole of the GW strain normal-ized to the symmetric mass ratio ν = 1 / computed for sim-ulations based on parameters of Tab.1. Peak amplitude for achoice of ρ h occurs at u/M = 2037 . which indicates the endof the inspiral Bernuzzi et al. (2014). Dephasing as merger-time is approached reduces rapidly with increased resolution(see also Fig.16 though note that the legend diﬀers there).Note: horizontal axis scale changes at u/M = 1500 . the frequency domain making use of the FFI methodof Reisswig & Pollney (2011). A cut-oﬀ frequency of f = 1 / × / is chosen which is physically motivatedby inspecting the early time puncture trajectories of theinspiral. We display the resulting (2 , mode of thestrain for calculations using the parameters of Tab.1 inFig.15.The peak amplitude of h indicates the end of theinspiral Bernuzzi et al. (2014) and for a grid parameterchoice of ρ h occurs at u = 2037 . M . The maximumdeviation from this value for parameters investigated inFig.15 occurs when ρ ml is utilized resulting in ∆ u =10 . M . In order to directly quantify how the choiceof δx p aﬀects the phase error in the strain waveform asmerger time is approached we compute ∆ φ using Eq.(42)and show the result in Fig.16.As we have not modiﬁed resolution globally over thecomputational domain but rather considered the eﬀectof introducing additional reﬁnement levels in the vicinityof the punctures it is not clear what sort of convergenceshould be expected. Furthermore the extent to whicha time-integrator order below the order of the spatialdiscretization aﬀects GW waveform quality can also besomewhat delicate (see e.g. the super-convergence dis-cussion of Reisswig & Pollney (2011)). In Fig.16 (upperpanel) clean th order convergence in ∆ φ is not found forall u upon rescaling with the appropriate factors deter-mined through Eq.(41). An additional issue that com-plicates the discussion here is that the parameters ofTab.1 also vary the spatial extent of the computationaldomain potentially introducing a source of systematic Figure 16.

Phase diﬀerences ∆ φ between simulations in-volving parameters of Tab.1. Upper panel: A trend of ∆ φ ac-cumulating with time is present. Merger time correspondingto ρ h is indicated with a vertical black line at u/M = 2037 . .A decrease in ∆ φ occurs as δx p pairs of decreasing values arecompared. In order to mitigate a systematic eﬀect of variedspatial extent in the computational domain we also com-pute phase diﬀerences at ﬁxed x M such as ∆ φ ( ρ l , ρ m ) and ∆ φ ( ρ m , ρ h ) . These two diﬀerences are also shown rescaledwith Q and Q (cid:48) as computed using Eq.(41) under the as-sumption of th order spatial discretization. Lower panel:Phase diﬀerences at merger computed with reference datataken from the ρ h run are depicted as a function of punctureresolution. Data on the black reference curve would obey a th order convergence trend. See text for further discussion. error. For example, at merger ∆ φ ( ρ mh , ρ h ) (cid:39) × − and ∆ φ ( ρ m , ρ h ) (cid:39) × − though δx p ( ρ m ) > δx p ( ρ mh ) .In order to compensate for this eﬀect we consider phasediﬀerences at ﬁxed x M . In particular the lower panel ofFig.16 displays phase diﬀerences at merger where ρ h istaken as the reference value for all comparisons. Whiledisplayed ∆ φ are compatible with a th order trend the ρ vvl choice of parameters is likely of too low resolution tomake a robust claim on convergence order with respectto varying δx p . Nonetheless it is clear that judiciouschoice of reﬁnement level (and hence resolution local tothe punctures through δx p ) reduces GW phase error.5.3.2. EOB comparison

We compare the gravitational waveform from the orbit simulation to the state-of-the-art EOB model TEOBResumS

Nagar et al. (2018). The latter is in-formed by several existing NR datasets and faithfullymodels the two-body dynamics and radiation of spin-aligned BBH multipolar waveforms for a wide variety of1mass ratio and spins magnitudes. We focus again onthe (2 , mode of the gravitational wave strain. The GR-Athena++ ψ output mode is ﬁrst extrapolated tonull inﬁnity using the asymptotic extrapolation formulaLousto et al. (2010); Nakano (2015): lim r →∞ rψ ∼ A (cid:16) rψ − ( l − l + 2)2 r (cid:90) rψ d t (cid:17) , (43)where A ( r ) := 1 − M/r and r := R (1 + M/ (2 R )) with R being the (ﬁnite) GW extraction radius of anNR simulation. The extrapolated result is successivelyintegrated twice in time (Eq.(33)) using the FFI methodReisswig & Pollney (2011) to obtain the strain mode h . The waveform comparison is performed by suitablyaligning the two waveforms; the time and phase relativeshifts are determined by minimizing the L norm of thephase diﬀerences Bernuzzi et al. (2012).Figure 17 shows the two waveforms are compatiblewithin the NR errors. The accumulated EOBNR phasediﬀerences are of order (cid:39) . rad to merger and (cid:39) . radto the ring-down for the highest resolution GR-Athena++ simulation. The larger inaccuracy of the ring-down partis a resolution eﬀect related to the higher frequency ofthe wave; it can potentially be improved by adding areﬁnement level so as to better resolve the black holeremnant. The maximum relative amplitude diﬀerenceis of order (cid:39) . . The same comparison using the low-est resolution gives (cid:39) . rad at merger ( (cid:39) rad duringthe ring-down) and maximum relative amplitude diﬀer-ences of (cid:39) . . Overall, this analysis demonstrates that GR-Athena++ can produce high-quality data for wave-form modeling. SCALING TESTSTo check the performance of

GR-Athena++ and makesure it maintains the scalability properties of

Athena++ ,we conduct weak and strong scaling tests with the sameproblem setup as in §5.2. In these tests BBH evolu-tions of 20 Runge-Kutta time-steps are performed, withfull AMR and full production grids, in which N B = 16 , N L = 11 , x M = 1536 M are ﬁxed and making use ofhybrid MPI and OMP parallelization. These tests areperformed on the cluster SuperMUC-NG at LRZ. Specif-ically, on each node of the cluster (48 CPUs per node) 8MPI tasks with 6 OMP threads are launched, thus ﬁll-ing up the node. We ﬁnd very good results using up to2048 nodes ( ∼ ) CPUs. With respect to scaling testspresented in Dendro-GR

Fernando et al. (2018), our re-sults favorably compare both in the case of strong andweak scaling tests.6.1.

Strong scaling tests

To test the strong scaling behavior of the code indiﬀerent regimes of CPU numbers, we consider 6 res-olutions, namely N M = [64 , , , , , .For each of them we perform an evolution on i = N nodesmin , . . . , N nodesmax nodes, where N nodesmin/max representsome limits on the possible number of nodes that canbe used for each resolution. The presence of theseboundaries is due to the fact that for a certain res-olution a speciﬁc number of MeshBlock s is producedand consequently on the one hand a suﬃcient num-ber of CPUs is required to handle those

MeshBlock sand, on the other hand,

Athena++ ’s parallelization strat-egy is based on

MeshBlock s and so is not possible touse more OMP threads (and consequently CPUs) than

MeshBlock s. Fig.18 (left) shows that excellent strongscalability is obtained up to ∼ . × CPUs, witheﬃciency above . For the aforementioned reasons,for a given resolution it is not possible to achieve higheﬃciency when increasing CPU number above a cer-tain point. Fig.18 (right) shows that eﬃciency stronglydepends on the ratio between

MeshBlock number andCPUs. In particular, for each resolution, an eﬃciencyof above is obtained when there are at least 10

MeshBlock s/CPU. By contrast, the eﬃciency shown inthe strong scaling plot of Fernando et al. (2018) appearsto decrease faster when comparing to the brown line inFig.18 (left), although the two results are obtained inslightly diﬀerent regimes of CPU number.6.2.

Weak scaling tests

For weak scaling tests we use asymmetric gridsin terms of N M in each direction, while keeping MeshBlock s of a constant size N B = 16 . In particu-lar, we start with a run on a single node with a grid N xM = 128 , N yM = 64 , N zM = 64 . Then for 2 nodes wedouble N yM . For 4 nodes N zM is doubled as well and for 8nodes N xM is also doubled. We continue this up to 2048nodes. In this way, we are able to double the resourcestogether with the required computations, and we man-age to keep a ratio of ∼ MeshBlock s/CPUs in eachdiﬀerent run. The results are displayed in Fig.19. Weperformed these tests twice, once with the code compiledusing the Intel compiler and once with the code compiledusing the GNU compiler. The top panel of Fig.19 showsthat the total CPU time per MPI task remains constantup to CPUs employed, thus demonstrating excellentscalability in an unprecedented CPU number regime fora numerical relativity code. The bottom panel displaysthe same result, but focuses on how the execution timeis distributed between the main computational kernels.Notably, most of the computation time is spent in thecalculation of the right hand side of the equations, which2 − − − − − − − − − t − t mrg ) /M − . − . − . . . . . < ( R h / ( Mν )) GR-Athena++ < ( R h / ( Mν )) TEOBResumS − − − −

100 0 Mω GR-Athena++ Mω TEOBResumS

Figure 17.

Comparison between the

GR-Athena++

BBH q = 1 waveform and the semi-analytical eﬀective-one-body model TEOBResumS . The plot shows the (2 , multipole of the GW strain normalized to the symmetric mass ratio ν = 1 / and theinstantaneous GW frequency. The time is shifted to the mode amplitude peak that approximately deﬁnes the merger time. The GR-Athena++ waveform is from the highest resolution simulation ( ρ h of Tab.1), extracted at coordinate radius R = 100 M andextrapolated to null inﬁnity using Eq.(43). Note: horizontal axis scale changes at ( t − t mgr ) /M = − . Figure 18.

Strong scaling tests for

GR-Athena++ for several CPU number regimes. Left plot: speed-up (top panel) and eﬃciency(bottom panel) calculated with respect to the ﬁrst point of the series. In each series the second point corresponds to a theoreticalspeed-up by a factor of 2, the third point a factor of 4 and so on. Right plot: eﬃciency as a function of the

MeshBlock loadeach CPU carries. is indeed the expected behavior in absence of race con-ditions or other bottlenecks elsewhere in the code. Thediscrepancy between the height of each bar in the bot-tom panel of the plot and the dotted line in the top panelgive an estimate on the communication time among allMPI processes and OMP threads and the comparisonbetween the two plots suggests that this also has goodscaling behavior. SUMMARY AND CONCLUSIONIn this work we have presented

GR-Athena++ ; a vertex-centered extension of the block-based AMR frameworkof

Athena++ for numerical relativity (NR) calculations. To this end we described our introduction of vertex-centered (VC) discretization for ﬁeld variables whichmay be utilized for general problems. A principle ad-vantage of VC is that restriction of sampled functiondata from ﬁne to coarse grids that are interspersed andthe coarser of which has fully coincident grid-points is ef-ﬁciently achieved by copying of data. This procedure isformally exact. The dual operation of prolongation viainterpolation of data from coarse to ﬁner grids also takesadvantage of the aforementioned grid structure wherepossible.Another novel feature is our introduction of geodesicspheres in the sense of highly reﬁned, triangulated con-vex, spherical polyhedra. Placement on a

Mesh may3

Figure 19.

Weak scaling tests for

Athena++ , performed on SuperMUC-NG at LRZ compiling the code with GNU compilerand Intel compiler. Top panel reports total CPU times measured for rank 0 directly in the code with the

C++ function clock_t .CPU time in the bottom panel is measured instead using the proﬁling tool gprof ; here, times which count less than 2% of thetotal CPU time are neglected. Light (left) bars represent results obtained with GNU compiler, while dark (right) ones are forIntel compiler. be arbitrarily chosen without restriction on underlyingcoordinatization. An advantage of this is that the as-sociated vertices deﬁning a geodesic sphere achieve amore uniform spatial sampling distribution when con-trasted against traditional uniform spherical latitude-longitude grids at comparable resolution. Furthermore,the potential introduction of coordinate singularities isavoided. We demonstrated the utility of this approachwhen quantities need to be extracted based on sphericalquadratures.Our code implements the Z4 c formulation of NR of the χ moving-punctures variety. The overall spatial order ofthe scheme may be selected at compile time where easilyextensible C++ templates deﬁne the formal order for de- sired ﬁnite diﬀerence operators and VC restriction andprolongation operators. Time-evolution is performedwith a low-storage th order Runge-Kutta method. Ourprimary analysis tool is based on extraction of gravita-tional radiation waveforms within GR-Athena++ throughthe use of the Ψ Weyl scalar with numerical quadraturethereof based on the aforementioned geodesic spheres.Through a post-processing step we also analyze the grav-itational strain h .In order to assess the numerical properties of our im-plementation we have repeated a subset of the stan-dard Apples with Apples test-bed suite Alcubierre et al.(2004); Babiuc et al. (2008); Cao & Hilditch (2012); Dav-erio et al. (2018) and performed a variety of cross-code4validation tests against BAM . To accomplish the latterwithin the oct-tree AMR approach of

GR-Athena++ weconstructed a reﬁnement criterion so as to closely emu-late a nested box-in-box grid structure. Test problemsin the cross-code comparison involved a single spinningpuncture and the two puncture binary black hole (BBH)inspiral calibration problem of Brügmann et al. (2008).For overall th order spatial scheme selection a commen-surate th order convergence was cleanly observed. For th order, convergence was also achieved for the sameproblem, though less cleanly. As another external val-idation of GR-Athena++ and a demonstration of utilityfor potential future studies of high mass ratio BBH thatinvolve signiﬁcant evolution duration and where accu-rate phase extraction from the gravitational strain iscrucial, we investigated a quasi-circular ten orbit inspi-ral problem based on parameters from Hannam et al.(2010). Here the evolution was performed with th orderspatial discretization and comparison was made againstthe NR informed EOB model of TEOBResumS

Nagar et al.(2018). Using a

Mesh tailored to moderate computa-tional resources we found accumulated EOBNR phasediﬀerences of order (cid:39) . rad to merger and (cid:39) . radto the ring-down for the highest resolution calculation.These tests highlight that GR-Athena++ is accurate androbust for BBH inspiral calculations and a strong con-tender for construction of high-quality data for wave-form modeling.In addition to accuracy, eﬃciency of computationalresource utilization is crucial. The task-based compu-tational model for distribution of calculations within

Athena++ and concomitant impressive scalability prop-erties as available resources are increased we have foundto readily extend to

GR-Athena++ and the Z4 c systemwith VC discretization for a wide variety of problemspeciﬁcations. Indeed during strong scaling tests itwas found that eﬃciency above is reached up to ∼ . × CPUs, whereas in weak scaling tests almostperfect scaling is achieved up to ∼ CPUs. This in-dicates that for the high resolutions and consequently resources required for a potential calculation describingan intermediate mass ratio BBH inspiral

GR-Athena++ compares favorably with the code-base of

Dendro-GR

Fernando et al. (2018) in terms of scalability.Finally, as

GR-Athena++ builds upon the modularframework of

Athena++ we inherit all its extant infras-tructure which will enable incorporating an NR treat-ment of the matter sector in future work. It is our inten-tion to make

GR-Athena++ code developments publiclyavailable in future in coordination with the

Athena++ team. ACKNOWLEDGMENTSThe authors thank Bernd Brügmann, Alessandro Na-gar, and Jim Stone for discussions, and Nestor Ortiz forinitial work on

AwA pn56zo , pn68wi and pn98bu ). The authors acknowledge HLRS for fundingthis project by providing test account access on the su-percomputer HPE Apollo Hawk under the grant number ACID 44191, GRAthenaBBH .APPENDIX A. APPLES WITH APPLES TEST-BEDSIn order to provide a series of computationally in-expensive and standard tests of diﬀering formulationsin numerical relativity (tailored for the vacuum sec-tor) a suite of so-called “Apples with Apples” test-bedproblems (hereafter

AwA ) have been proposed Alcubierreet al. (2004); Babiuc et al. (2008). Our goal here is to ensure that the implementation of Z4 c within GR-Athena++ reﬂects the overall propertiesobserved in prior tests made based on the same formu-lation Cao & Hilditch (2012); Daverio et al. (2018) witha particular focus on elements directly relevant to grav-itational wave propagation and extraction. Speciﬁcally,we examine the AwA: robust stability §A.1, linearizedwave §A.2 and gauge-wave §A.3 tests.5Generically the

AwA tests are speciﬁed for three di-mensional periodic spatial grids (i.e. of T topology)where the eﬀective dynamics occur over one (or two)spatial dimensions. Dynamics in “trivial” directions arereduced to a thin layer which in GR-Athena++ is achievedby selecting the relevant component(s) of N M of the Mesh sampling to be equal to (selected chosen as toprobe potential “checker-board instability” Babiuc et al.(2008)). This entails the full Z4 c equations and imple-mentation of (§3.1) are active during a calculation. Di-rections involving non-trivial dynamics are sampled andtested on both cell-centered G CC and vertex-centered G VC grids: G CC := (cid:110) −

12 + (cid:16) n + 12 (cid:17) δx (cid:12)(cid:12)(cid:12) n ∈ { , . . . N − } (cid:111) , G VC := (cid:110) −

12 + nδx (cid:12)(cid:12)(cid:12) n ∈ { , . . . N } (cid:111) ; (A1)where δx = 1 /N and N = 50 ρ with ρ ∈ N servingto adjust resolution as required. Each direction is fur-ther extended by ghost-zones as described in §2. Noreﬁnement is present in this section though MeshBlock objects have N B chosen so as to partition the domainand allow for a further consistency check on the MPI-OMP hybrid parallelism. Selection of Z4 c parameters isas follows: Constraint damping κ = 0 . and κ = 0 with shift-damping typically selected as η = 2 . Kreiss-Oliger (KO) dissipation is taken as σ = 0 . and theCourant-Friedrich-Lewy (CFL) condition as / . Thesechoices are motivated by those made in prior work Cao& Hilditch (2012) and we comment on behavior upondeviation from them as tests are presented.For the tests performed here Z4 c is always coupled tothe puncture gauge described in §3.2 as this is of primaryinterest in this work. We take initial gauge conditionsto be: α | t =0 = 1 , β i (cid:12)(cid:12) t =0 = 0 . Furthermore, in order to facilitate comparison we set N g = 2 throughout such that spatial discretization is of nd order, though numerical experiments with N g = 3 reveal similar properties. Time-evolution is performedusing the th order RK S ] method of Ketcheson(2010). A.1. Robust stability

The robust stability test is performed with one eﬀec-tive spatial dimension and is designed to eﬃciently de-tect instability within numerical algorithms aﬀecting theprincipal part of the evolution system. An initial spatialslice of Minkowski is made where all Z4 c ﬁeld compo-nents (§3.1) have to each sampled grid point a resolution Figure 20.

Robust stability test. Errors are displayed for G VC with solid lines and G CC with solid lines marked “ • ”.Diﬀerent selections of parameter entering the Mesh samplingare: ρ = 1 in blue; ρ = 2 in orange; ρ = 4 in green; ρ = 8 inred; ρ = 16 in purple. Top panel: Rescaled L ∞ norm of col-lective constraint monitor. Bottom panel: Deviation fromﬂat metric. Consistent behavior between CC and VC dis-cretization is observed at all resolutions. See text for furtherdiscussion. dependent, distinct (i.e. independent) uniform randomvalue added: ε ∈ ( − − /ρ , − /ρ ) . (A2)Values are selected such that ε is below round-oﬀ indouble precision arithmetic. The variable ε models theeﬀect of ﬁnite machine precision and for a code to passthe test stable evolution must be observed. A code thatcannot pass this test would potentially have severe issueswith any evolution of smooth initial data. We evolve toa ﬁnal time T = 1000 and monitor (cid:107)C(cid:107) ∞ where C is de-ﬁned in Eq.(21) together with (cid:107) γ ij − δ ij (cid:107) ∞ as suggestedin Daverio et al. (2018). Results are displayed in Fig.20.We ﬁnd that over the duration of the time-evolution (cid:107)C(cid:107) ∞ plateaus for ρ ≥ with ρ = 1 appearing alsoto tend towards a plateau. Similar behavior is foundfor (cid:107)H(cid:107) ∞ . This indicates stability in the sense thatno spurious exponential modes are excited. Reducing η → is observed to lead to qualitatively similar generalbehavior. This was also observed in Cao & Hilditch(2012). As errors decrease as resolution is increased weconsider GR-Athena++ to pass this test.A.2.

Linearized wave

The purpose of the linearized wave test is to checkwhether a code can propagate a linearized gravitationalwave, a minimal necessity for reliable wave extractionfrom strong-ﬁeld sources Babiuc et al. (2008).An eﬀective one-dimensional test with dynamicsaligned along the x -axis is speciﬁed through spatial slic-6ing of: d s = − d t + d x + (1 + H )d y + (1 − H )d z , (A3)where: H ( x, t ) = A sin (cid:32) π ( x − t ) d (cid:33) , (A4)and d = 1 is set to match the periodicity of the un-derlying computational domain and A = 10 − selectedsuch that quadratic terms are on the order of numericalround-oﬀ in double precision arithmetic. Consequently,initially we have α = 1 and β i = 0 with non-trivialextrinsic curvature components: K yy = − ∂ t [ H ( x, t )] , K zz = 12 ∂ t [ H ( x, t )] . (A5)Time evolution is performed to a ﬁnal time of T = 1000 .Rather than displaying approximate sinusoidal pro-ﬁles at some ﬁnal time as in Alcubierre et al. (2004);Babiuc et al. (2008); Cao & Hilditch (2012) we follow asuggestion of Daverio et al. (2018) to instead considerthe spectra of data. To this end we compute the discreteFourier transform as: F k ( t ) := 1 N N (cid:88) n =1 (cid:0) γ zz ( t, x n ) − (cid:1) exp( − πik ( x n − t )) , where x n ∈ G ( · ) (see Eq.(A1)). In the case of VC dis-cretization the last point of the (periodic) grid is identi-ﬁed with the ﬁrst and therefore dropped from the sum-mation. Thus comparing Eq.(A4) a spectral measureof the relative error in the travelling wave amplitudeis provided through (cid:15) a ( t ) := || F ( t ) | − A | /A . The ab-solute phase error may be directly inspected through (cid:15) p ( t ) := | arg( F ( t )) − π/ | . We also compute the oﬀ-set of the numerical waveform relative to the amplitude (cid:15) o ( t ) = | F ( t ) | /A .The initial data are constraint violating Cao &Hilditch (2012) and the puncture gauge is not neces-sarily compatible with simple advection, nonetheless weﬁnd that to an excellent approximation the solution isa simple travelling wave with results of the analysis de-scribed above shown in Fig.21.In agreement with Cao & Hilditch (2012); Daverioet al. (2018) we ﬁnd the dominant source of error tobe in the phase of the propagating waveform where thecoarsest sampling ρ = 1 leads to a ﬁnal absolute phaseerror of (cid:39) . cf., the ﬁnest sampling ρ = 16 yielding (cid:39) . × − rad . For (cid:15) a ( t ) and (cid:15) p ( t ) we observe conver-gence with increasing resolution. For (cid:15) o ( T ) at T = 1000 increasing ρ tended to increase error albeit overall thisis acceptably within [7 . , × − ; this is compatiblewith the general behavior found in Daverio et al. (2018). Figure 21.

Linearized wave test with parameters and leg-end of Fig.20. Top panel: relative error in the travelling waveamplitude (cid:15) a ( t ; ρ ) . Middle panel: phase error (cid:15) p ( t ; ρ ) . Bot-tom panel: oﬀset of the wave relative to amplitude (cid:15) o ( t ; ρ ) .The CC and VC discretizations display mutually consistentbehavior in all variables for all t apart from (cid:15) o ( t ; 16) thoughthe overall trend is recovered as t → T . See text for discus-sion. As observed in Cao & Hilditch (2012) reducing κ → or η → leads to qualitatively very similar results. Wethus consider GR-Athena++ to pass this test.A.3.

Gauge wave

A gauge transformation of Minkowski space-timedeﬁnes this test with parameters selected so as toinvolve the full non-linear dynamics. One takes η ab ˙=diag( − , , , in Cartesian coordinates x (cid:48) a andtransforms: ( t (cid:48) , x (cid:48) , y (cid:48) , z (cid:48) ) → (cid:0) t + G + ( x, t ) , x − G ± ( x, t ) , y, z (cid:1) , where G ± ( x, t ) := ± ∂ t [ H ( x, t )] / (8 π ) with H deﬁned inEq.(A4). Two test cases are deﬁned: shifted where G − is selected for the transformation on the x (cid:48) componentwhereas for unshifted G + is chosen in both components.For the latter the induced metric is: γ xx =1 − H, γ yy = γ zz = 1; (A6)resulting in non-trivial extrinsic curvature component: K xx = ∂ t [ H ( x, t )]2 (cid:112) − H ( x, t ) . (A7)During calculations we evolve to a ﬁnal time of T =1000 . In contrast to the standard AwA speciﬁcation Alcu-bierre et al. (2004); Babiuc et al. (2008) we make use of7

Figure 22.

Gauge-wave test. Top panel: Error of met-ric components based on Eq.(A8). Legend indicates scal-ing applied (based on an assumed nd order spatial scheme).Rescaling indicates anticipated convergence is well-obeyed.Bottom panel: Hamiltonian constraint with displayed datafollowing the legend of Fig.20. It is clear that with increasedresolution constraint violation converges away. Consistentbehavior between CC and VC discretizations is observed atall samplings. See text for discussion. the puncture gauge where we found it crucial to select anon-zero shift-damping of η = 2 . In addition to inspec-tion of the constraints through (cid:107)H(cid:107) ∞ we also considerconvergence of the aggregate, induced metric quantity: (cid:15) γ ( t ; ρ ) = (cid:118)(cid:117)(cid:117)(cid:116) (cid:88) i,j =1 (cid:68) γ ij ( t ) | ρ − γ ij ( t ) | ρ (cid:69) RMS , (A8)where the root-mean-square (RMS) value is computed atﬁxed times over a Mesh sampled with N = 50 . In Fig.22 we plot (cid:15) γ ( t ; ρ ) for a choice of A = 1 / and d = 1 inEq.(A4) and ﬁnd a nd order rate of convergence as isexpected for N g = 2 .The behavior of the shifted case is identical for thestated parameters and therefore we do not show it.A two-dimensional variant of the gauge wave testmay also be considered where an initial spatial rota-tion of π/ is made. In particular, coordinates aremapped according to SO (2) (cid:51) R : ( x, y ) (cid:55)→ (ˆ x, ˆ y ) =( x + y, x − y ) / √ . This results in x -aligned propaga-tion mapped to a periodic diagonal trajectory in the ˆ x - ˆ y plane. Here as per AwA speciﬁcation we evolve toa ﬁnal time of T = 100 and selected resolutions basedon ρ ∈ { , , , } . For an amplitude of A = 1 / we similarly found a nd order rate of convergence with N g = 2 in rescaling of (cid:15) γ ( ρ ) . For the case of A = 1 / however we did not observe clean nd order convergence.Indeed the AwA speciﬁcation suggestion to use an evenhigher amplitude A = 1 / is well-known to cause issueswith stability in a variety of formulations and regard-less of puncture or harmonic gauge choice Daverio et al.(2018); Cao & Hilditch (2012); Boyle et al. (2007).We consider GR-Athena++ to pass this test in boththe one-dimensional (un)-shifted cases and in the two-dimensional unshifted case with the caveat that initialamplitude must be reduced.A.4.

AwA summary

We have demonstrated that

GR-Athena++ with Z4 ccoupled to the moving puncture gauge passes the AwA ro-bust stability (§A.1) and the one-dimensional linearizedwave (§A.2) tests. For the gauge wave tests (§A.3) weﬁnd that

GR-Athena++ passes for a choice of reducedinitial amplitude of the propagated wave.REFERENCES

Abbott, B. P., et al. 2016a, Phys. Rev. Lett., 116, 061102,doi: 10.1103/PhysRevLett.116.061102—. 2016b, Phys. Rev. Lett., 116, 241102,doi: 10.1103/PhysRevLett.116.241102—. 2017a, Phys. Rev. Lett., 119, 161101,doi: 10.1103/PhysRevLett.119.161101—. 2017b, Class. Quant. Grav., 34, 044001,doi: 10.1088/1361-6382/aa51f4—. 2020, Living Reviews in Relativity, 23,doi: 10.1007/s41114-020-00026-9Akutsu, T., et al. 2020, arXiv:2009.09305 [astro-ph,physics:gr-qc]. https://arxiv.org/abs/2009.09305 Alcubierre, M., Brügmann, B., Diener, P., et al. 2003,Phys.Rev., D67, 084023,doi: 10.1103/PhysRevD.67.084023Alcubierre, M., et al. 2004, Class. Quant. Grav., 21, 589,doi: 10.1088/0264-9381/21/2/019Alﬁeri, R., Bernuzzi, S., Perego, A., & Radice, D. 2018,Journal of Low Power Electronics and Applications, 8,doi: 10.3390/jlpea8020015Amaro-Seoane, et al. 2017, arXiv:1702.00786 [astro-ph].https://arxiv.org/abs/1702.00786Ansorg, M., Brügmann, B., & Tichy, W. 2004, Phys. Rev.,D70, 064011, doi: 10.1103/PhysRevD.70.064011Arnowitt, R. L., Deser, S., & Misner, C. W. 1959, Phys.Rev., 116, 1322, doi: 10.1103/PhysRev.116.1322 —. 2008, Gen. Rel. Grav., 40, 1997,doi: 10.1007/s10714-008-0661-1Babiuc, M., et al. 2008, Class. Quant. Grav., 25, 125012,doi: 10.1088/0264-9381/25/12/125012Baiotti, L., Bernuzzi, S., Corvino, G., De Pietri, R., &Nagar, A. 2009, Phys. Rev., D79, 024002,doi: 10.1103/PhysRevD.79.024002Baker, J. G., van Meter, J. R., McWilliams, S. T.,Centrella, J., & Kelly, B. J. 2007, Phys.Rev.Lett., 99,181101, doi: 10.1103/PhysRevLett.99.181101Baumgarte, T., & Shapiro, S. 2010, Numerical Relativity(Cambridge: Cambridge University Press)Baumgarte, T. W., & Shapiro, S. L. 1999, Phys. Rev., D59,024007, doi: 10.1103/PhysRevD.59.024007Berger, M. J., & Colella, P. 1989, Journal of ComputationalPhysics, 82, 64, doi: 10.1016/0021-9991(89)90035-1Berger, M. J., & Oliger, J. 1984, J.Comput.Phys., 53, 484Bernuzzi, S. 2020, Invited Review for GERG.https://arxiv.org/abs/2004.06419Bernuzzi, S., & Dietrich, T. 2016, Phys. Rev., D94, 064062,doi: 10.1103/PhysRevD.94.064062Bernuzzi, S., & Hilditch, D. 2010, Phys. Rev., D81, 084003,doi: 10.1103/PhysRevD.81.084003Bernuzzi, S., Nagar, A., Balmelli, S., Dietrich, T., & Ujevic,M. 2014, Phys.Rev.Lett., 112, 201101,doi: 10.1103/PhysRevLett.112.201101Bernuzzi, S., Thierfelder, M., & Brügmann, B. 2012,Phys.Rev., D85, 104030,doi: 10.1103/PhysRevD.85.104030Berrut, J.-P., & Trefethen, L. N. 2004, SIAM Review, 46,501, doi: 10.1137/S0036144502417715Bona, C., Bona-Casas, C., & Palenzuela, C. 2010, PhysicalReview D, 82, 124010, doi: 10.1103/PhysRevD.82.124010Bona, C., Ledvinka, T., Palenzuela, C., & Zacek, M. 2003,Phys. Rev., D67, 104005,doi: 10.1103/PhysRevD.67.104005Bona, C., Massó, J., Seidel, E., & Stela, J. 1995, Phys. Rev.Lett., 75, 600Bowen, J. M., & York, Jr., J. W. 1980, Phys. Rev., D21,2047, doi: 10.1103/PhysRevD.21.2047Boyle, M., Lindblom, L., Pfeiﬀer, H., Scheel, M., & Kidder,L. E. 2007, Physical Review D, 75, 024006,doi: 10.1103/PhysRevD.75.024006Boyle, M., et al. 2019, Class. Quant. Grav., 36, 195006,doi: 10.1088/1361-6382/ab34e2Brandt, S., & Brügmann, B. 1997, Phys. Rev. Lett., 78,3606, doi: 10.1103/PhysRevLett.78.3606Brown, D., Diener, P., Sarbach, O., Schnetter, E., & Tiglio,M. 2009, Physical Review D, 79, 044023,doi: 10.1103/PhysRevD.79.044023 Brügmann, B., Gonzalez, J. A., Hannam, M., et al. 2008,Phys.Rev., D77, 024027,doi: 10.1103/PhysRevD.77.024027Bugner, M., Dietrich, T., Bernuzzi, S., Weyhausen, A., &Brügmann, B. 2016, Phys. Rev., D94, 084004,doi: 10.1103/PhysRevD.94.084004Burstedde, C., Holke, J., & Isaac, T. 2019, Foundations ofComputational Mathematics, 19, 843,doi: 10.1007/s10208-018-9400-5Burstedde, C., Wilcox, L. C., & Ghattas, O. 2011, SIAMJournal on Scientiﬁc Computing, 33, 1103,doi: 10.1137/100791634Campanelli, M., Lousto, C. O., Marronetti, P., &Zlochower, Y. 2006, Phys. Rev. Lett., 96, 111101,doi: 10.1103/PhysRevLett.96.111101Cao, Z., & Hilditch, D. 2012, Phys.Rev., D85, 124032,doi: 10.1103/PhysRevD.85.124032Cao, Z., Yo, H.-J., & Yu, J.-P. 2008, Physical Review D, 78,124011, doi: 10.1103/PhysRevD.78.124011Carter Edwards, H., Trott, C. R., & Sunderland, D. 2014,Journal of Parallel and Distributed Computing, 74, 3202,doi: 10.1016/j.jpdc.2014.07.003Chirvasa, M., & Husa, S. 2010, Journal of ComputationalPhysics, 229, 2675, doi: 10.1016/j.jcp.2009.12.016Clough, K., Figueras, P., Finkel, H., et al. 2015.https://arxiv.org/abs/1503.03436Damour, T., Nagar, A., Hannam, M., Husa, S., &Brügmann, B. 2008, Phys. Rev., D78, 044039,doi: 10.1103/PhysRevD.78.044039Daverio, D., Dirian, Y., & Mitsou, E. 2018.https://arxiv.org/abs/1810.12346Dietrich, T., & Bernuzzi, S. 2015, Phys.Rev., D91, 044039,doi: 10.1103/PhysRevD.91.044039Dietrich, T., Radice, D., Bernuzzi, S., et al. 2018, Class.Quant. Grav., 35, 24LT01,doi: 10.1088/1361-6382/aaebc0Felker, K. G., & Stone, J. M. 2018, Journal ofComputational Physics, 375, 1365,doi: 10.1016/j.jcp.2018.08.025Fernando, M., Neilsen, D., Lim, H., Hirschmann, E., &Sundar, H. 2018, doi: 10.1137/18M1196972Friedrich, H. 1985, Communications in MathematicalPhysics, 100, 525, doi: 10.1007/BF01217728Galaviz, P., Bruegmann, B., & Cao, Z. 2010, PhysicalReview D, 82, 024005, doi: 10.1103/PhysRevD.82.024005Goldberg, J. N., MacFarlane, A. J., Newman, E. T.,Rohrlich, F., & Sudarshan, E. C. G. 1967, J. Math.Phys., 8, 2155 Goodale, T., Allen, G., Lanfermann, G., et al. 2003, inVector and Parallel Processing – VECPAR’2002, 5thInternational Conference, Lecture Notes in ComputerScience (Berlin: Springer)Grete, P., Glines, F. W., & O’Shea, B. W. 2019,arXiv:1905.04341 [astro-ph, physics:physics].https://arxiv.org/abs/1905.04341Gundlach, C., Martin-Garcia, J. M., Calabrese, G., &Hinder, I. 2005, Class. Quant. Grav., 22, 3767,doi: 10.1088/0264-9381/22/17/025Gustafsson, B., Kreiss, H.-O., & Oliger, J. 2013,Time-dependent problems and diﬀerence methods; 2nded., Pure and applied mathematics a wiley series of texts,monographs and tracts (Somerset: Wiley).https://cds.cern.ch/record/2122877Hannam, M., Husa, S., Ohme, F., Müller, D., & Brügmann,B. 2010, Phys. Rev., D82, 124008,doi: 10.1103/PhysRevD.82.124008Healy, J., Lousto, C. O., Lange, J., et al. 2019, Phys. Rev.D, 100, 024021, doi: 10.1103/PhysRevD.100.024021Herrmann, F., Hinder, I., Shoemaker, D., & Laguna, P.2007, Classical and Quantum Gravity, 24, S33,doi: 10.1088/0264-9381/24/12/S04Hilditch, D., Bernuzzi, S., Thierfelder, M., et al. 2013, Phys.Rev., D88, 084057, doi: 10.1103/PhysRevD.88.084057Hilditch, D., & Ruiz, M. 2018, Class. Quant. Grav., 35,015006, doi: 10.1088/1361-6382/aa96c6Hilditch, D., Weyhausen, A., & Brügmann, B. 2016, Phys.Rev., D93, 063006, doi: 10.1103/PhysRevD.93.063006Holmström, M. 1999, SIAM Journal on ScientiﬁcComputing, 21, 405, doi: 10.1137/S1064827597316278Huerta, E. A., Haas, R., Jha, S., Neubauer, M., & Katz,D. S. 2019, Computing and Software for Big Science, 3,5, doi: 10.1007/s41781-019-0022-7Husa, S., González, J. A., Hannam, M., Brügmann, B., &Sperhake, U. 2008, Class. Quant. Grav., 25, 105006,doi: 10.1088/0264-9381/25/10/105006Jani, K., Healy, J., Clark, J. A., et al. 2016, Class. Quant.Grav., 33, 204001, doi: 10.1088/0264-9381/33/20/204001Ketcheson, D. I. 2010, Journal of Computational Physics,229, 1763, doi: 10.1016/j.jcp.2009.11.006Kidder, L. E., et al. 2017, J. Comput. Phys., 335, 84,doi: 10.1016/j.jcp.2016.12.059Kreiss, H. O., & Oliger, J. 1973, Methods for theapproximate solution of time dependent problems(Geneva: International Council of Scientiﬁc Unions,World Meteorological Organization)LIGO Scientiﬁc Collaboration. 2018, LIGO AlgorithmLibrary - LALSuite, free software (GPL),doi: 10.7935/GT1W-FZ16 Lindblom, L., Scheel, M. A., Kidder, L. E., Owen, R., &Rinne, O. 2006, Class.Quant.Grav., 23, S447,doi: 10.1088/0264-9381/23/16/S09Loﬄer, F., et al. 2012, Class. Quant. Grav., 29, 115001,doi: 10.1088/0264-9381/29/11/115001Lousto, C. O., Nakano, H., Zlochower, Y., & Campanelli,M. 2010, Phys.Rev., D82, 104057,doi: 10.1103/PhysRevD.82.104057Mewes, V., Zlochower, Y., Campanelli, M., et al. 2020,Phys. Rev. D, 101, 104007,doi: 10.1103/PhysRevD.101.104007—. 2018, Physical Review D, 97, 084059,doi: 10.1103/PhysRevD.97.084059Miller, J., Dolence, J., Gaspar, A., et al. 2021, Parthenonperformance portable AMR framework,https://github.com/lanl/parthenonMorton, G. M. 1966, A computer oriented geodetic database and a new technique in ﬁle sequencing, Tech. rep.Mösta, P., Mundim, B. C., Faber, J. A., et al. 2014,Class.Quant.Grav., 31, 015005,doi: 10.1088/0264-9381/31/1/015005Müller, D., & Brügmann, B. 2010, Class. Quant. Grav., 27,114008, doi: 10.1088/0264-9381/27/11/114008Nagar, A., et al. 2018, Phys. Rev., D98, 104052,doi: 10.1103/PhysRevD.98.104052Nakamura, T., Oohara, K., & Kojima, Y. 1987, Prog.Theor. Phys. Suppl., 90, 1Nakano, H. 2015, Classical and Quantum Gravity, 32,177002, doi: 10.1088/0264-9381/32/17/177002Nakano, H., Zlochower, Y., Lousto, C. O., & Campanelli,M. 2011, Phys.Rev., D84, 124006,doi: 10.1103/PhysRevD.84.124006Peters, P. C. 1964, Phys. Rev., 136, B1224,doi: 10.1103/PhysRev.136.B1224Peters, P. C., & Mathews, J. 1963, Phys. Rev., 131, 435,doi: 10.1103/PhysRev.131.435Pollney, D., Reisswig, C., Schnetter, E., Dorband, N., &Diener, P. 2011, Phys. Rev., D83, 044045,doi: 10.1103/PhysRevD.83.044045Pretorius, F. 2005, Phys. Rev. Lett., 95, 121101,doi: 10.1103/PhysRevLett.95.121101Punturo, M., Abernathy, M., Acernese, F., et al. 2010,Class.Quant.Grav., 27, 194002,doi: 10.1088/0264-9381/27/19/194002Purrer, M., Husa, S., & Hannam, M. 2012, Phys. Rev. D,85, 124051, doi: 10.1103/PhysRevD.85.124051Radice, D., Bernuzzi, S., & Perego, A. 2020, Ann. Rev.Nucl. Part. Sci., 70,doi: 10.1146/annurev-nucl-013120-1145410