GRAthena++: puncture evolutions on vertex-centered oct-tree AMR
Boris Daszuta, Francesco Zappa, William Cook, David Radice, Sebastiano Bernuzzi, Viktoriya Morozova
DDraft version January 22, 2021
Typeset using L A TEX twocolumn style in AASTeX63
GR-Athena++ : puncture evolutions on vertex-centered oct-tree AMR
Boris Daszuta, Francesco Zappa, William Cook, David Radice,
2, 3, 4
Sebastiano Bernuzzi, andViktoriya Morozova
2, 3 Theoretisch-Physikalisches Institut, Friedrich-Schiller-Universität Jena, 07743, Jena, Germany Institute for Gravitation and the Cosmos, The Pennsylvania State University, University Park, PA 16802, USA Department of Physics, The Pennsylvania State University, University Park, PA 16802, USA Department of Astronomy and Astrophysics, The Pennsylvania State University, University Park, PA 16802, USA (Dated: January 22, 2021)
ABSTRACTNumerical relativity is central to the investigation of astrophysical sources in the dynamical andstrong-field gravity regime, such as binary black hole and neutron star coalescences. Current chal-lenges set by gravitational-wave and multi-messenger astronomy call for highly performant and scal-able codes on modern massively-parallel architectures. We present
GR-Athena++ , a general-relativistic,high-order, vertex-centered solver that extends the oct-tree, adaptive mesh refinement capabilities ofthe astrophysical (radiation) magnetohydrodynamics code
Athena++ . To simulate dynamical space-times
GR-Athena++ uses the Z4 c evolution scheme of numerical relativity coupled to the moving punc-ture gauge. We demonstrate stable and accurate binary black hole merger evolutions via extensiveconvergence testing, cross-code validation, and verification against state-of-the-art effective-one-bodywaveforms. GR-Athena++ leverages the task-based parallelism paradigm of
Athena++ to achieve ex-cellent scalability. We measure strong scaling efficiencies above for up to ∼ . × CPUs andexcellent weak scaling is shown up to ∼ CPUs in a production binary black hole setup with adaptivemesh refinement.
GR-Athena++ thus allows for the robust simulation of compact binary coalescencesand offers a viable path towards numerical relativity at exascale. INTRODUCTIONNumerical relativity (NR) provides robust techniquesfor constructing numerical solutions to the Einstein fieldequations (EFE). The phenomenology of astrophysicalinspiral and merger events, such as those between bi-nary constituents involving (variously) black holes (BH)or neutron stars (NS) can be succinctly described withgravitational waves (GW) computed with NR Pretorius(2005); Baker et al. (2007); Campanelli et al. (2006);Shibata & Uryu (2000). This has crucially assisted inthe recent detection of such events by the LIGO andVirgo collaborations Abbott et al. (2016a,b, 2017a). Asthe operating sensitivity of these detectors is improvedAbbott et al. (2020) and new (KAGRA Akutsu et al.(2020)), approved (LISA Amaro-Seoane et al. (2017)),or proposed (Einstein telescope Punturo et al. (2010),Cosmic explorer Abbott et al. (2017b)) designs come on-line a concomitant enlargement of the physical parame-ter space that may be experimentally probed is offered.These experimental efforts are complemented by pub-licly available catalogs of simulation data provided bythe NR community (see e.g. Dietrich et al. (2018); Boyleet al. (2019); Healy et al. (2019); Jani et al. (2016)). On the NR side there is thus a pressing requirementto better resolve and characterize the underlying physicsduring simulation of binary black holes (BBH) in evermore extreme configurations, such as higher mass ratioLousto et al. (2010); Nakano et al. (2011) or to providediscrimination between candidate models in descriptionof binary NS Radice et al. (2020); Bernuzzi (2020) orBH-NS Shibata & Taniguchi (2011) events. Such sim-ulations can be extremely demanding from the point ofview of computational resources and viability typicallyhinges upon the availability of high performance com-puting (HPC) infrastructure Huerta et al. (2019). Thusaccurate NR codes that remain performant as HPC re-sources are scaled up and simultaneously allow for thescope of input physics to be simply extended are crucial.An important concern for NR investigations of the bi-nary merger problem is treatment of features sensitive towidely-varying length and time-scales. An approach in-spired by Berger-Oliger Berger & Oliger (1984) (see alsoBerger & Colella (1989)) is based on the introduction ofa sequence of hierarchically, well-nested patches (usuallyboxes) of increasing resolution and decreasing diametercentered at the constituents of the binary. The rela- a r X i v : . [ g r- q c ] J a n tive spatial positions of such patch configurations maybe arranged to automatically track the time-evolutionof the aforementioned compact objects. This has beena common approach adopted for NR codes that buildupon the open source Cactus framework Goodale et al.(2003) and utilize the
Carpet thorn Schnetter et al.(2004) (see also the
Einstein toolkit Loffler et al. (2012)for an overview tailored to astrophysical applications).Some notable code implementations based on
Cactus are
Llama
Pollney et al. (2011); Reisswig et al. (2013),
McLachlan
Brown et al. (2009),
LEAN
Sperhake (2007),
LazEv
Zlochower et al. (2005), and
Maya
Herrmann et al.(2007) furthermore within this framework magnetohy-drodynamics (MHD) may be coupled through use of
GRHydro
Mösta et al. (2014) or
WhiskyTHC
Radice et al.(2014). Other examples of non-
Cactus codes adoptingthe Berger-Oliger approach include
BAM
Brügmann et al.(2008); Galaviz et al. (2010); Thierfelder et al. (2011),
AMSS-NCKU
Cao et al. (2008). Elements of this approachare also shared by the recent
GRChombo
Clough et al.(2015). Thus far, all code-bases discussed here make(at least some) use of Cartesian grid coordinatizationsand involve use of finite-difference (FD) approximantsto derivative operators.Unfortunately here a priori specification of patch hier-archies is usually required which makes capturing emer-gent features at unexpected locations challenging. Moreimportantly for Berger-Oliger the overhead of synchro-nization of solution data between patches on differ-ing levels can incur heavy performance penalties whichspoil scaling in modern highly-parallel HPC architec-tures Stout et al. (1997).Other approaches such as those based on pseudo-spectral methods are represented by
SpeC
Szilagyi et al.(2009) where multi-patch decomposition of the compu-tational domain is made using a combination of topo-logical spheres and cylinders. The SXS collaborationBoyle et al. (2019) has used
SpeC to produce some ofthe longest and most-accurate binary GW to date, albeitBBH mass ratio (defined q := m /m where m i are con-stituent masses and m ≥ m ) for publicly available GRtemplates remains confined to q ≤ . Closely relatedare efforts based on the discontinuous Galerkin (DG)method such as bamps Hilditch et al. (2016); Bugneret al. (2016) and
SpECTRE
Kidder et al. (2017). An-other recently pursued alternative has been an attemptto eliminate the need for refinement altogether throughgeneration of adapted, problem-specific curvilinear-gridsas recently demonstrated in
SENR/NRPy+
Mewes et al.(2020, 2018); Ruchlin et al. (2018).A blend of benefits that ameliorates some of the dis-advantages of the above approaches is offered in block- based adaptive mesh refinement (AMR) strategies Stoutet al. (1997). Crucially, in contrast to hierarchical,nested patches, for block-based AMR each physical po-sition on a computational domain is covered by oneand only one level. This reduces the problem of syn-chronizing the data of a solution over differing levels tocommunication between block boundaries (as in DG)which when logically arranged into an oct-tree (in spatial dimensions – see e.g. Burstedde et al. (2011))can greatly improve computational efficiency throughpreservation of data locality in memory. Furthermore,making use of task-based parallelism as the computa-tional model greatly facilitates the overlap of communi-cation and computation. Additionally, great flexibilityis maintained in how the computational domain can berefined. The recent work of Dendro-GR
Fernando et al.(2018) utilizes such an approach in treatment of the vac-uum sector of the EFE with the BSSNOK formulationNakamura et al. (1987); Shibata & Nakamura (1995);Baumgarte & Shapiro (1999) where solution representa-tion is in terms of adaptive wavelets and choice of regionsto refine controlled by the wavelet expansion itself Holm-ström (1999). Scaling in terms of HPC performanceappears to have been convincingly demonstrated with
Dendro-GR for a mock q = 10 BBH event, although nu-merical accuracy and full scale evolutions, together withGW characteristics that would potentially be calculatedduring production runs are not presented for any q .In this work we present our effort to build upon apublic version of Athena++
White et al. (2016); Felker& Stone (2018); Stone et al. (2020) where changes tocore functionality have been made, together with intro-duction of new modules targeted towards solution of theEFE. We refer to these new features as
GR-Athena++ .Originally
Athena++ was conceived as a framework forpurely non- and special-relativistic MHD, as well asGRMHD for stationary space-times, which adopts manyof the mature and robust numerical algorithms of Stoneet al. (2008) in a modern
C++ design centered aroundblock-based AMR. Key design elements include nativesupport for Cartesian and curvilinear coordinates and aparticular focus on future-proofing through code mod-ularity. The computational model is task-based andembeds hybrid parallelism through dual use of mes-sage passing interface (MPI) and threading via OpenMP(OMP). In addition to excellent scaling properties onHPC infrastructure, modularity and modern code prac-tices have allowed for extension of
Athena++ to hetero-geneous architectures through
Kokkos
Carter Edwardset al. (2014) resulting in performant MHD calculationon graphics processing units with
K-Athena
Grete et al.(2019) (see also parthenon
Miller et al. (2021)).These attractive properties served as a strong moti-vation in development of
GR-Athena++ where we haveimplemented the Z4 c formulation Bernuzzi & Hilditch(2010); Ruiz et al. (2011); Weyhausen et al. (2012);Hilditch et al. (2013) of NR utilizing the (moving) punc-ture gauge Brandt & Brügmann (1997); Baker et al.(2007); Campanelli et al. (2006). We provide accu-rate and efficient extensions to derivative approximantsthrough (templated) arbitrary-order FD based on Al-fieri et al. (2018). Our introduction of vertex-centered(VC) variable treatment (extending core cell- and face-centered functionality) is motivated by a desire to matchany selected FD order in calculations that involve AMR.Furthermore our implementation of the level to leveltransfer operators that occur in AMR takes advantageof the particular structure over sampled nodes at dif-fering grid levels to simultaneously improve computa-tional efficiency and accuracy. Within (GR-)Athena++ time-evolution is achieved through the standard methodof lines approach where scheme order may be specifiedflexibly. In this work the formal order of the spatialdiscretizations considered are th and th whereas thetemporal treatment is at th order.As calculating quantities such as GW typically in-volves integration over spherical surfaces we have intro-duced a module implementing geodesic spheres based onWang & Lee (2011). A primary demonstration of thisfunctionality is presented in direct, cross-code validationagainst BAM where extracted GW are computed from theWeyl scalar Ψ and the gravitational strain is examinedfor the prototype BH and BBH calibration problems ofBrügmann et al. (2008). Additionally, it is importantthat simulations can be carried out that provide dataof physical relevance to detection efforts. To this endwe consider an equal mass BBH inspiral on an initiallyquasi-circular (i.e low eccentricity) co-orbit that resultsin a merger event. Initial data is based on the config-uration of Hannam et al. (2010). Verification is madeutilizing the state-of-the-art, NR informed, effective onebody model of TEOBResumS
Nagar et al. (2018).As we prioritized HPC efficiency it is thus importantthat
GR-Athena++ preserves the already impressive be-havior of
Athena++ where over parallel efficiencyis shown in weak scaling tests with uniform grids em-ploying up to ∼ . × CPUs for MHD/HD prob-lems Stone et al. (2020).
Dendro-GR code, in whichBBH are evolved using an oct-tree grid as we do in
GR-Athena++ but with different strategies, demonstratesvery good scaling properties of q = 10 BBH evolutionsutilizing up to ∼ . × CPUs. In these tests, however,re-mesh and inter-grid transfer operations are disabled. In this work we aim to reach such performance for BBHevolutions with full AMR.The rest of this paper is organized as follows: In §2 weprovide further details on the computational approachtaken within (GR-)Athena++ together with the variousextensions we have made to core functionality. Subse-quently in §3 an overview of the Z4 c system we use inour calculations is provided together with description ofthe numerical algorithms employed. Refinement strat-egy and details concerning grids are provided in §4. In §5we discuss results of extensive testing of GR-Athena++ onBH and BBH problems performing cross-code validationand assessing convergence properties whereupon in §6computational performance is detailed through strongand weak scaling tests. Finally §7 summarizes and con-cludes. METHOD
GR-Athena++ builds upon
Athena++ thus in order tospecify nomenclature, provide a self-contained descrip-tion, and explain our extensions, we first briefly recountsome details of the framework (see also White et al.(2016); Felker & Stone (2018); Stone et al. (2020)).In (GR-)Athena++ overall details about the domain Ω over which a problem is formulated are abstractedfrom the salient physics and contained within a classcalled the Mesh . Within the
Mesh an overall represen-tation of the domain as a logical n -rectangle is stored,together with details of coordinatization type (Carte-sian or more generally curvilinear), number of pointsalong each dimension for the coarsest sampling N M =( N M , · · · , N M d ) , and physical boundary conditions on ∂ Ω . In order to partition the domain we first fix a choice N B = ( N B , · · · , N B d ) where each element of N B mustdivide each element of N M component-wise. Then Ω is domain-decomposed through rectilinear sub-divisioninto a family of n -rectangles satisfying Ω = (cid:116) i ∈ Z Ω i ,where Z is the set of MeshBlock indices, correspond-ing to the ordering described in §2.1. Nearest-neighborelements are constrained to only differ by a single sub-division at most. The
MeshBlock class stores propertiesof an element Ω i of the sub-division. In particular thenumber of points in the sampling of Ω i is controlledthrough the choice of N B . For purposes of communi-cation of data between nearest neighbor MeshBlock ob-jects the sampling over Ω i is extended by a thin layer ofso-called “ghost nodes” in each direction. Furthermorethe local values (with respect to the chosen, extendedsampling on Ω i ) of any discretized, dependent field vari-ables of interest are stored within the MeshBlock .In both uniform grid ( ∀ i ∈ Z ) vol(Ω i ) = C and refinedmeshes ( ∃ i, j ∈ Z ) vol(Ω i ) (cid:54) = vol(Ω j ) it is crucial toarrange inter- MeshBlock communication efficiently – tothis end the relationships between differing
MeshBlock objects are arranged in a tree data structure, to whichwe now turn. 2.1.
Tree Structure of
Mesh
For the sake of exposition here and convenience inlater sections we now particularize to a Cartesian coor-dinatization though we emphasize that the general pic-ture (and our implementation) of the discussions hereand in §2.2 carry over to the curvilinear context withonly minor modification. (GR-)Athena++ stores the logical relationship betweenthe
MeshBlock objects (i.e. Ω i ) involved in descrip-tion of a domain Ω within a tree data structure. Abinary-tree, quad-tree or oct-tree is utilized when d :=dim(Ω) = 1 , , respectively. The relevant tree is thenconstructed by first selecting the minimum N such that N exceeds the largest number of Ω i along any dimen-sion. The root of the tree is assigned a logical levelof zero and the tree terminates at level N with every MeshBlock assigned to an appropriate leaf, though someleaves and nodes of the tree may remain empty. Datalocality is enhanced, as references to
MeshBlock objectsare stored such that a post-order, depth-first traversalof the tree follows Morton order (also termed Z-order)Morton (1966). This order can be used to encode multi-dimensional coordinates into a linear index parametriz-ing a Z-shaped, space-filling curve where small changesin the parameter imply spatial coordinates that are closein a suitable sense Burstedde et al. (2019).As an example we consider a three-dimensional
Mesh described by ( N x , N y , N z ) = (2 , , MeshBlock ob-jects in each direction at fixed physical level in Fig.1.Consider now a
Mesh with refinement. Function dataat a fixed physical level is transferred one level finerthrough use of a prolongation operator P ; dually, func-tion data may be coarsened by one physical level throughrestriction R . The number of physical refinement levelsadded to a uniform level, domain-decomposed Ω is con-trolled by the parameter N L . By convention N L startsat zero. Subject to satisfaction of problem-dependentrefinement criteria, there may exist physical levels at , · · · , N L . When a given MeshBlock is refined (coars-ened) d MeshBlock objects are constructed (destroyed).This is constrained to satisfy a refinement ratiowhere nearest-neighbor
MeshBlock objects can differ byat most one physical level.In Fig.2 we consider an example of a non-periodic Ω described by N x = N y = N z = 2 MeshBlock objectswith N L = 3 selected with refinement introduced at the Figure 1.
Example of
Mesh partitioned uniformly by
MeshBlock objects indexed via Z-order and traced in redthrough constituent geometric centroids. The logical rela-tionship between Ω i is stored in an oct-tree. Empty leavesare suppressed though each populated node above logicallevel three has eight children. Notice that physical level p and logical level l are distinct. See text for further discus-sion. corner x max , z max . If periodicity conditions are imposedon ∂ Ω then additional refinement may be required forboundary intersecting MeshBlock objects so as to main-tain the aforementioned inter-
MeshBlock refine-ment ratio.2.2.
Vertex-centered Discretization
Natively
Athena++ supports cell-centered (CC) andface-centered (FC) description of variables, togetherwith calculation of line-averages on cell edges Stoneet al. (2020).
GR-Athena++ extends support to allowfor vertex-centering (VC). The modifications required toachieve this are extensive as core code must be changedin such a way so as to complement existing functionality.The modularity and good code practices of
Athena++ greatly facilitated matters. Our motivation for intro-duction of VC is a desire to ensure each stage of our nu-merical scheme maintains consistent (high) order whilesimultaneously maintaining efficiency of R and P op-erator choice and implementation. In the remainder ofthis section we briefly describe this newly introducedfunctionality.2.2.1. VC and Communication: Fixed Physical Level
Unless otherwise stated in all remaining sections wefix N M and N B to be uniform in each dimension andrepresent each of these tuples with a single scalar. As a Figure 2.
Example of
Mesh partitioned and refined by
MeshBlock objects indexed (labels explicitly indicated up tological level two) via Z-order and traced in red through con-stituent geometric centroids. The logical relationship be-tween p Ω i and neighbors is stored in an oct-tree. There areno unpopulated leaves. Notice that physical level p and logi-cal level l are distinct; coloring corresponds to physical level: p = 0 in black, p = 1 in dark green, p = 2 in blue, and p = 3 in purple. See text for further discussion. preliminary, x ∈ [ a, b ] is said to be vertex-centered whendiscretized as x I = a + Iδx where δx = ( b − a ) /N B and I = 0 , . . . , N B yielding N B + 1 total samples. In prac-tice, to this an additional N g so-called ghost nodes areappended which extend the interval by N g δx on bothsides. When d = dim(Ω j ) = 2 , an appropriate ten-sor product of such extended one-dimensional discretiza-tions is utilized. When a field component V is sampledon such grids it is said to be VC. The additional ghostnodes form a layer that enables imposition of physicalboundary conditions and inter- MeshBlock communica-tion.Consider a domain decomposed into multiple
MeshBlock objects. Discretized variable data must becommunicated. An additional intricacy however arisesdue to the sharing of vertices at neighboring
MeshBlock interfaces that are not part of the ghost-layer. The num-ber of
MeshBlock objects a node is shared between isreferred to as the node-multiplicity.We illustrate this with a two-dimensional example.Let V be sampled on neighboring MeshBlock objects
Figure 3.
Schematic of (communicated) nodes on a two-dimensional
MeshBlock Ω i . The ghost-layer is shaded ingray with alternating shading demarcating differing neigh-bor MeshBlock objects. Nodes marked with “ (cid:3) ” are interiorto Ω i and are unaffected as neighbor data is received – allother nodes are updated. Ghost-layer multiplicities are in-dicated for “ (cid:4) ” and dark-green where µ = 1 whereas nodesin “ (cid:7) ” and light-green have µ = 2 . Interface nodes alongedges are marked with “ • ” in light-green and correspond to µ = 2 whereas corner nodes marked “ (cid:63) ” in green correspondto µ = 4 . See text for further discussion. of fixed physical level, where N B = 6 is chosen andghost-zone layer selected to have N g = 2 nodes. Fur-ther, we assume that Ω i is not on the physical boundaryof the domain. This entails that ( N B − nodes areinternal and the remainder require synchronization viadata received (i.e. populated) from neighboring blocksas depicted in Fig.(3). Note that independent commu-nication requests and buffers are posted for each neigh-bor. Communication from neighbors therefore has nopreferred order and consequently we follow an averagingapproach to achieve consistency as follows: All receiveddata is first additively accumulated on the MeshBlock with node-multiplicity µ dynamically updated in an aux-iliary array of d elements based on the location of therelevant neighbor . After data from all relevant neigh-bors has been received, a final division by µ is performed.This is done so as to not preferentially weight data fromany particular neighbor. In principle, it is possible toconstruct µ a priori, however, we elect to follow thisdynamical strategy to facilitate automatic treatment ofboundary conditions and avoid the additional complex-ity involved during Mesh refinement. For a given node µ is uniform in all VC variables that are to becommunicated and hence need only be constructed once in theabsence of Mesh refinement. The choice of d elements is made tosimultaneously treat communication over distinct physical levels(see §2.2.2). Communication: Distinct Physical Levels
We now consider a
Mesh featuring refinement. Thefundamental description of variables between neighbor-ing
MeshBlock objects may therefore differ by (at most)a single physical level (see also 2.1).A
MeshBlock at physical level p will be denoted by p Ω j and the corresponding collection of fields sampled usingVC discretization over the MeshBlock as F ( p Ω j ) . Inthis context V ∈ F ( p Ω j ) , has a complementary, coarseanalogue V c of ( (cid:98) N B / (cid:99) + 1) d samples further extendedby a coarse ghost-layer comprised of N cg nodes. Thesampling resolution for V c is thus half that used for V . In order to emphasize the physical level of a given MeshBlock and not blur the distinction between thetypes of samplings we also make use of the notation F c ( p Ω j ) ≡ F ( p − Ω j ) .In contrast to CC and FC as implemented in Athena++ our implementation of VC allows for N g and N cg to takeodd values and be independently varied. For simplicityof discussion we impose N g = N cg .When a Mesh involves multiple physical levels, priorto any communication of data, VC variables are initiallyrestricted so as to have a fundamental and coarse de-scription on each
MeshBlock excluding the ghost-layers.For logically Cartesian grids in particular, this turns outto be an inexpensive and exact operation (§2.2.3). Withthis initial step neighboring
MeshBlock objects at thesame physical level have V and V c communicated usingthe method described in §2.2.1.To describe our treatment when neighboring physi-cal levels differ, consider a two-dimensional Mesh where N B = 8 and N g = 2 . Once more, we work within alocal portion of the full Mesh where the role of the phys-ical boundary may be ignored. Suppose p Ω A is neigh-bored by p +1 Ω B and p +1 Ω C to the east and the lattertwo MeshBlock objects share a common edge. Figure 4shows how the ghost-layer nodes of the finer p +1 Ω B based on the coarser neighbor p Ω A are populated. Inthis situation data may be freely posted for communi-cation to the MeshBlock on the finer level whereuponghost-zones of its coarse variable are populated. How-ever depending on the details of P , the prolongation op-eration over the ghost-layer is blocked in the sense thatthe entirety of the coarse ghost-layer of F c ( p +1 Ω B ) mustfirst be populated. Once fully populated, prolongationis carried out on the target MeshBlock . During synchro-nization of data from coarse to fine levels interface nodesare maintained at the value of the finer level.An example of the dual process of populating nodeson a coarser level involving p Ω A and p +1 Ω C is depictedin Fig.5. In this case (previously) restricted data of thefiner MeshBlock interior is communicated, updating the
Figure 4.
Schematic of two-dimensional
MeshBlock p Ω A used to populate ghost-nodes of finer MeshBlock p +1 Ω B . Lo-cal view of the Mesh depicts nearest-neighbor
MeshBlock con-nectivity and physical levels. Nodes over p Ω A , and p +1 Ω B together with coarse analogues are shown. Sampled values V ∈ F ( p Ω A ) that are to be sent are marked by “ • ” in darkgreen; this data is received and directly populates the ghost-nodes marked by “ (cid:4) ” in dark green, i.e., V ∈ F c ( p +1 Ω B ) .Once the remaining data for F c ( p +1 Ω B ) – marked by “ (cid:3) ”– is filled, and any multiplicity conditions (here suppressed)are accounted for, prolongation P : F c ( p +1 Ω B ) (cid:55)→ F ( p +1 Ω B ) can be performed in order to populate values at the ghost-nodes of p +1 Ω B marked by “ (cid:4) ” in purple. Notice that forthis procedure data at nodes on the neighbor interface re-main unchanged. See text for further discussion. Figure 5.
Schematic of two-dimensional
MeshBlock p +1 Ω C used to populate ghost-layer of coarser MeshBlock p Ω A . Lo-cally the Mesh has the same structure as in Fig.4. Nodes over p Ω A , and p +1 Ω C together with coarse analogues are shown.Prior to communication, sampled data of V ∈ F ( p +1 Ω C ) atnodes marked by “ • ” in purple must be restricted to popu-late data V c ∈ F c ( p +1 Ω C ) at nodes marked by “ • ” in darkgreen. This is then sent whereupon data at the nodes of p Ω A marked by “ (cid:4) ” in dark green is provided. During thisprocedure data at nodes on the neighbor interface are (addi-tively) updated (cf. Fig.4) and multiplicity conditions (heresuppressed) dynamically updated in an auxiliary array. Seetext for further discussion. common interface and ghost-layer of F ( p Ω A ) . In thissituation no blocking occurs. However, non-trivial mul-tiplicity conditions arise on the common neighbor inter-face. Furthermore, the equivalent operation involving p +1 Ω B instead of p +1 Ω C induces another edge withinthe ghost-layer of p Ω A . Finally, we note that valuesof F c ( p Ω A ) must also be updated. Thus another re-striction of samples of F c ( p +1 Ω C ) is also made and theoverall communication process repeated.The steps for the above communication procedure aresummarized in §2.2.4.2.2.3. Restriction and Prolongation
When a
Mesh is refined restriction R and prolongation P operations are required. In GR-Athena++ these oper-ations for VC variables are implemented based on uni-variate Lagrange polynomial interpolation or productsthereof when dim(Ω) > with function data utilized atnodes centered about a target-point of interest.For a Mesh sampled according to a Cartesian coordi-natization
MeshBlock grids are uniformly spaced in eachdimension. This provides immediate simplifications to R and P which may be understood as follows. Con-sider interpolation of a smooth function V on a one-dimensional interval. A polynomial interpolant ˜ V of de-gree N with samples of the function V symmetricallyand uniformly spaced about x ∗ that passes through the N + 1 distinct points: I V := { ( x i , V ( x i )) | x i = x ∗ + iδx ∧ i ∈ {− N, . . . , N }} , is unique and may be written in Lagrange form Tre-fethen (2013): ˜ V ( x ) = N (cid:88) i = − N l i ( x ) V ( x i ) , (1)where the Lagrange cardinal polynomials satisfy l i ( x j ) = δ ij when x j is a node used during formationof I V and δ ij is the Kronecker delta. We use Eq.(1) (orappropriate product generalizations) in order to specify R . Given function data on a uniform, VC discretized in-terval suppose we wish to construct data on the interiorof a coarser overlapping interval that shares the sameend-points and is sampled at twice the spacing. We findthat points in the image of R (i.e. desired points overthe coarse grid) form a subset of points over the orig-inal fine grid. Therefore the desired data may simplybe immediately copied (see Fig.5). This is efficient andinvolves no approximation.Recall that the restriction operator is utilized duringtransfer of data from a MeshBlock to a coarser neighbor.Consider the case of the two-fold coarsened data that must be provided to the neighbor
MeshBlock . While R as specified here is an idempotent operation, some caremust be taken, because the spatial extent of the ghost-nodes to be populated is sampled by the non-ghost dataof the source MeshBlock . To ensure this is possible weimpose a constraint relating
MeshBlock sampling andghost-layer through: N B ≥ max(4 , N g − . (2)The above does not place any constraint on whether N g is even and therefore a choice of an odd or evennumber is allowed.Interpolation based on Eq.(1) is also utilized for pro-longation. Here function data is transferred to a finer,uniformly sampled grid of half the spacing and conse-quently interspersed nodes coincide (see Fig.4) offeringanother optimization in execution efficiency. Due tothe uniform structure of the source and target grids,interpolation at non-coincident nodes may be imple-mented through a weighted sum where weight factorscan be precomputed Berrut & Trefethen (2004). In prac-tice the width of the interpolation stencil we utilize is N = (cid:98)N cg / (cid:99) + 1 .Tailored, optimized routines for GR-Athena++ incor-porate the above simplifications for the case of logicallyCartesian grids.Finally, for later convenience we note that the d -rectangle [ x L , x R ] d as represented by a Mesh with Carte-sian coordinatization, N M points along each dimension,and N L physical levels of refinement has a grid spacingon the finest level of: δx = x R − x L N M N L − . (3)2.2.4. Summary
We close discussion of VC by providing a compactsummary of the overall logic involved during synchro-nization of data between
MeshBlock objects for a prob-lem involving refinement.At compile time N g and N cg are selected and C++ templates specify precomputed weights for any requisiteinterpolation during a computation (thus fixing R and P ). A given problem of interest may then be executedfor some choice of N M , N B (subject to Eq.(2)), N L , andphysical grid.The following steps are taken when function data froma MeshBlock p Ω i is to be sent :i. Non-ghost data is restricted populating F c ( p Ω i ) .ii. Neighbor MeshBlock objects are iterated over andtreated according to the physical level of the targetneighbors and the communication buffers are popu-lated from: p − Relevant interior (and shared interface) nodesof F c ( p Ω i ) ; similarly F ( p Ω i ) is twice restricteddirectly to the communication buffer. p : Relevant interior (and shared interface) nodesof F ( p Ω i ) together with F c ( p Ω i ) . p + 1 : Relevant interior nodes not on the common in-terface from F ( p Ω i ) .The following steps are taken when function data on p Ω i is to be received :i. The ghost-layer of variable data for the given MeshBlock p Ω i is set to zero and any previously ac-cumulated multiplicites are reset.ii. Non-ghost data is restricted populating F c ( p Ω i ) .iii. Function data is independently received (unordered)from neighbor MeshBlock objects. Treatment againsplits based on physical level of the salient neighborwith additive updating of the following
MeshBlock -local function data: p − Relevant ghost-layer nodes of F c ( p Ω i ) . p : Relevant ghost-layer and interface nodes of F c ( p Ω i ) and F ( p Ω i ) . p + 1 : Relevant ghost-layer and interface nodes of F ( p Ω i ) and F c ( p Ω i ) .iv. Once all neighbor function data is received, divisionby multiplicity conditions is carried out.v. Regions of the ghost-layer involving a coarser levelneighbor may finally be prolongated.For local calculations (in the absence of distributed, MPIcommunications) operations are performed locally inmemory. Finally we emphasize that the base Athena++
CC and FC variables when required continue to simul-taneously function as explained in Stone et al. (2020).2.3.
Geodesic spheres
Calculating quantities such as the ADM mass, mo-mentum and gravitational radiation associated with anisolated system typically involves integration over spher-ical surfaces, the radii of which are controlled by a lim-iting procedure. In practice, a large but finite radiusis often selected during numerical work. Denote the -sphere of fixed radius R by S R . The natural choice ofspherical coordinatization for S R involves uniform sam-pling in the polar and azimuthal angles ( ϑ, ϕ ) and it iswell-known that problems may arise at the poles duringdescription of geometric quantities; furthermore, pointstend to cluster there which may be undesirable from Figure 6.
Structure of the geodesic grid used by
GR-Athena++ . Left panel: an example of a low resolution (92vertices) geodesic grid highlighting the features of the grid.Right panel: a grid used for gravitational wave extraction inproduction simulations (9002 vertices). the stand-point of efficiency in some applications. In
GR-Athena++ we avoid these issues by instead workingwith triangulated geodesic spheres. In short, a geodesicsphere of radius R (denoted Q R ) may be viewed as theboundary of a convex polyhedron embedded in R withtriangular faces, i.e., a simplicial 2-sphere that is home-omorphic to S R . A sequence of geodesic spheres with anincreasing number of vertices (and consequently surfacetiling triangles) thus serves as a sequence of increasinglyaccurate approximants to S R ; see Fig.6To construct the geodesic grid we start from a regularicosahedron with 12 vertices and 20 plane equilateral tri-angular faces, embedded in a unit sphere. We refine itusing the so called “non-recursive” approach describedin Wang & Lee (2011). In this approach, each planeequilateral triangle of the icosahedron is divided into n Q small equilateral triangles (each side of the triangleis split into n Q equal segments, where n Q is called thegrid level). The intersection points are projected ontothe unit sphere, and together with the original 12 ver-tices of the icosahedron they form the convex polyhedronused as a grid. The resulting polyhedron has n Q + 2 vertices, in which we define the desired physical quan-tities. The left panel of Fig.6 shows the grid consistingof 92 vertices ( n Q = 3 ), while the right panel shows thegrid consisting of 9002 vertices ( n Q = 30 ).Integrals on the sphere are computed with numeri-cal quadratures. To this aim we associate to each gridpoint a solid angle in the following way. We constructcells around each vertex of the grid by connecting thecircumcenters of any pair of triangular faces that sharea common edge. The resulting cells are mostly repre-sented by hexagons, apart from the 12 vertices of theoriginal icosahedron, which have only five neighbors andtherefore correspond to pentagonal cells. The solid an-gles subtended by the cells at the center of the sphereare used as weighting coefficients when computing theaverages. The logical connection between the neighbor-ing cells is implemented as described in Randall et al.(2002).Using a geodesic grid ensures more even tiling of thesphere compared to the uniform latitude-longitude gridof similar resolution. The ratio between the solid anglescorresponding to the largest and smallest cells in the n Q = 30 grid is equal to . For comparison, a gridof comparable resolution with uniform sampling in thepolar and azimuthal angles (say, 67 ϑ angles and 134 ϕ angles, with the total of 8978 cells), would have theratio between the areas of the smallest and largest cells (cid:39) / sin( π/ (cid:39) . . Z4C
SYSTEM IN
GR-Athena++
In the Cauchy problem for the Einstein field equations(EFE), a globally hyperbolic space-time M is foliatedby a family of non-intersecting spatial slices { Σ t } t ∈ R where the parametrizing time-function t is assumedglobally defined. An initial slice Σ t is selected andwell-posed evolution equations based on the EFE mustbe prescribed. A variety of mature approaches exist tothis problem such as BSSNOK Nakamura et al. (1987);Shibata & Nakamura (1995); Baumgarte & Shapiro(1999) or those based on the generalized harmonic gauge(GHG) formulation Friedrich (1985); Pretorius (2005);Lindblom et al. (2006). A unifying framework is pro-vided in the Z4 approach Bona et al. (2003) where par-ticular cases of both GHG and BSSNOK formulationsmay be recovered (see Bona et al. (2010) and referencestherein). In particular Z4 c Bernuzzi & Hilditch (2010);Ruiz et al. (2011); Weyhausen et al. (2012); Hilditchet al. (2013) seeks to combine the strengths of theseother two approaches Cao & Hilditch (2012) thus moti-vating it as the choice of formulation for GR-Athena++ .In §3.1 and §3.2 we describe the overall idea behind nu-merical evolution with Z4 c and implementation within GR-Athena++ . Details on our method for wave extrac-tion (i.e., calculation of gravitational radiation) is pro-vided in §3.3 whereupon §3.4 closes with a brief descrip-tion of numerical methods we utilize.3.1.
Overview
At its core, the Z4 formulation Bona et al. (2003) seeksto stabilize the time-evolution problem through directaugmentation of the EFE via suitable introduction ofan auxiliary, dynamical vector field Z a and first-ordercovariant derivatives thereof. The approach admits nat-ural incorporation of constraint damping via explicitappearance of (freely chosen) parameters κ i Gundlachet al. (2005); Weyhausen et al. (2012). Recall that in the standard method of ADM-decomposition Arnowitt et al. (1959); Baumgarte &Shapiro (2010) one introduces a future-directed t a satis-fying t a ∇ a [ t ] = 1 and considers t a = αn a + β a where n a is a future-directed, time-like, unit normal n a toeach member of the foliation Σ t , α is the lapse and β a the shift. Subsequently geometric projections of am-bient fields, to (products of) the tangent and normalbundle(s) of Σ may be considered, which here leads toevolution equations for the augmented EFE. The evo-lution equations are written in terms of the variables (cid:0) γ ij , K ij , Θ , ˇ Z i (cid:1) where γ ij is the induced metric and K ij the extrinsic curvature associated with Σ ; Θ := − n a Z a and ˇ Z i := ⊥ ai Z a (with ⊥ ab := g ab + n a n b and g ab being the space-time metric) are the normal and spatialprojections of Z a respectively. Furthermore Hamilto-nian, momentum, and auxiliary vector constraints mustalso be satisfied C U := ( H , M i , Z a ) = 0 such that a nu-merical space-time is faithful to a solution of the stan-dard EFE. Importantly, for a space-time without bound-ary if C U = 0 for some element of the foliation Σ t ∗ then analytically this property extends for all t Bonaet al. (2003). This compatibility of C U with the evolu-tion is one crucial property for numerical calculationsallowing for a choice of free-evolution scheme. In such ascheme equations are discretized and initial data of in-terest is prepared so as to satisfy C U = 0 on Σ t , duringthe course of the time-evolution C U is monitored and itmust be verified that any accumulated numerical errorconverges away with increased resolution.In Z4 c Bernuzzi & Hilditch (2010); Hilditch et al.(2013) to fashion an evolution scheme an additional stepis taken wherein a spatial conformal degree of freedomis first factored out via: ˜ γ ij := ψ − γ ij , ˜ A ij := ψ − (cid:16) K ij − Kγ ij (cid:17) ; (4)with K := K ij γ ij and ψ := ( γ/f ) / where γ and f are determinants of γ ij and some spatial referencemetric f ij respectively. Here we assume f ij is flat andin Cartesian coordinates which immediately yields the algebraic constraints : C A := (cid:0) ln(˜ γ ) , ˜ γ ij ˜ A ij (cid:1) = 0 . (5)The expression C A = 0 must be continuously enforced during numerical evolution (see §3.4) to ensure consis-tency Cao & Hilditch (2012). Additionally we introduce From the point of view of computational efficiency this is trivialto accomplish but strictly speaking doing so entails a partially-constrained evolution scheme. χ := γ − / , ˆ K := K − (6) ˜Γ i := 2˜ γ ij Z j + ˜ γ ij ˜ γ kl ∂ l [˜ γ jk ] , (cid:98) Γ i := ˜ γ jk ˜Γ ijk ; (7)where the definition of χ implies that χ = ψ − . Col-lectively the Z4 c system is comprised of dynamical vari-ables (cid:0) χ, ˜ γ ij , ˆ K, ˜ A ij , Θ , ˜Γ i (cid:1) which are governed by the evolution equations : ∂ t [ χ ] = 23 χ (cid:16) α ( ˆ K + 2Θ) − ∂ i [ β i ] (cid:17) + β i ∂ i [ χ ] , (8) ∂ t [˜ γ ij ] = − α ˜ A ij + β k ∂ k [˜ γ ij ] −
23 ˜ γ ij ∂ k [ β k ]+ 2˜ γ k ( i ∂ j ) [ β k ] . (9) ∂ t [ ˆ K ] = − D i [D i [ α ]] + α (cid:20) ˜ A ij ˜ A ij + 13 ( ˆ K + 2Θ) (cid:21) + β i ∂ i [ ˆ K ] + ακ (1 − κ )Θ + 4 πα [ S + ρ ] , (10) ∂ t [ ˜ A ij ] = χ [ − D i [D j [ α ]] + α ( R ij − πS ij )] tf + α [( ˆ K + 2Θ) ˜ A ij − A ki ˜ A kj ] + β k ∂ k [ ˜ A ij ]+ 2 ˜ A k ( i ∂ j ) [ β k ] −
23 ˜ A ij ∂ k [ β k ] , (11) ∂ t [Θ] = α (cid:104) ˜ H − κ (2 + κ )Θ (cid:105) + β i ∂ i [Θ] , (12) ∂ t [˜Γ i ] = − A ij ∂ j [ α ] + 2 α (cid:104) ˜Γ ijk ˜ A jk −
32 ˜ A ij ∂ j [ln( χ )] − κ (˜Γ i − (cid:98) Γ i ) −
13 ˜ γ ij ∂ j [2 ˆ K + Θ] − π ˜ γ ij S j (cid:105) + ˜ γ jk ∂ k [ ∂ j [ β i ]] + 13 ˜ γ ij ∂ j [ ∂ k [ β k ]]+ β j ∂ j [˜Γ i ] − (cid:98) Γ j ∂ j [ β i ] + 23 (cid:98) Γ i ∂ j [ β j ]; (13)where in Eq.(11) the trace-free operation is computedwith respect to γ ij and ˜ H is defined in Eq.(18). Defini-tions of matter fields are based on projections of the de-composed space-time, energy-momentum-stress tensor: T ab = ρn a n b + 2 S ( a n b ) + S ab , (14)in terms of the energy density ρ := T ab n a n b , momentum S i := − T bc n b ⊥ ci , and spatial stress S ij := T cd ⊥ ci ⊥ dj with associated traces T := g ab T ab = − ρ + S and S := γ ij S ij . The intrinsic curvature appearing in Eq.(11) isdecomposed according to: R ij = ˜ R χij + ˜ R ij , (15) where in terms of the conformal connection ˜D i compat-ible with ˜ γ jk : ˜ R χij = 12 χ (cid:20) ˜D i [ ˜D j [ χ ]] + ˜ γ ij ˜D l [ ˜D l [ χ ]] − χ ˜D i [ χ ] ˜D j [ χ ] (cid:21) − χ ˜D l [ χ ] ˜D l [ χ ]˜ γ ij , (16)and: ˜ R ij = −
12 ˜ γ lm ∂ l [ ∂ m [˜ γ ij ]] + ˜ γ k ( i ∂ j ) [˜Γ k ] + (cid:98) Γ k ˜Γ ( ij ) k + ˜ γ lm (2˜Γ kl ( i ˜Γ j ) km + ˜Γ kim ˜Γ klj ) . (17)Furthermore, we emphasize that in Eq.(13) and Eq.(17)it is crucial to impose (cid:98) Γ i where it appears through thedefinition of Eq.(7).The dynamical constraints in terms of transformedvariables ( ˜ H , ˜ M i , Θ , ˇ Z i ) may be monitored to assessthe quality of a numerical calculation: ˜ H := R − ˜ A ij ˜ A ij + 23 (cid:0) ˆ K + 2Θ (cid:1) − πρ = 0 , (18) ˜ M j := ˜D i [ ˜ A ij ] −
32 ˜ A ij ∂ i [ln( χ )] − ∂ j [ ˆ K + 2Θ] − πS j = 0 , (19) Θ =0 , ˇ Z i =˜Γ i − (cid:98) Γ i = 0 . (20)Depending on the quantities of interest we may alterna-tively monitor the original non-rescaled constraints C U .Furthermore, we introduce for later convenience a single,scalar-valued collective constraint monitor: C := (cid:113) H + γ ij M i M j + Θ + 4 γ ij ˇ Z i ˇ Z j . (21)Finally we note that we have made use of the freedom toadjust the system by non-principal parts prior to con-formal decomposition so as to have a result closer toBSSNOK (which may be obtained by taking the formallimit Θ → in Eqs.(8–13)).3.2. Gauge choice and boundary conditions
To close the Z4 c system it must be further supple-mented by gauge conditions (i.e., conditions on α and β i ) that specify how the various elements Σ t of the foli-ation piece together. Furthermore in this work the com-putational domain does not extend to spatial infinityand consequently boundary conditions (BC) on ∂ Ω mustalso be imposed.In GR-Athena++ we make use of the puncture gaugecondition which consists of the Bona-Másso lapse Bona1et al. (1995) and the gamma-driver shift Alcubierre et al.(2003): ∂ t [ α ] = − µ L α ˆ K + β i ∂ i [ α ] ,∂ t [ β i ] = µ S α ˜Γ i − ηβ i + β j ∂ j [ β i ] . (22)In specification of Eq.(22) we employ the lapsevariant µ L = 2 /α together with µ S = 1 /α . Initially a“precollapsed” lapse and zero-shift is set: α | t =0 = ψ − (cid:12)(cid:12) t =0 , β i (cid:12)(cid:12) t =0 = 0; (23)where the choice is motivated by a resulting reductionin initial gauge dynamics Campanelli et al. (2006). Theshift damping parameter η appearing in Eq.(22) reduceslong-term drifts in the metric variables Alcubierre et al.(2003) and serves to magnify the effective spatial resolu-tion near a massive feature, which in turn reduces noisein its local motion and extracted gravitational wave-forms Brügmann et al. (2008) (see also §5). We adopt afixed choice η = 2 /M where M is the total ADM massArnowitt et al. (2008) of the system throughout thiswork as it is known to lead to successful time evolutionof binary black holes (BBH) of comparable masses Brüg-mann et al. (2008) and improves stability more broadlyCao & Hilditch (2012). With a view towards potentialinvestigations of high mass ratio binaries we have alsoincorporated η damping conditions within GR-Athena++ based on BBH location as a function of time Purreret al. (2012) together with the conformal factor basedapproach of Nakano et al. (2011); Lousto et al. (2010);Müller & Brügmann (2010).When coupled to the puncture gauge with the choicesmade above Z4 c forms a PDE system that is stronglyhyperbolic Cao & Hilditch (2012) and consequently theinitial value problem is well-posed Bernuzzi & Hilditch(2010). Artificial introduction of a boundary at finitedistance complicates the analysis of numerical stabil-ity significantly Hilditch & Ruiz (2018). For the ini-tial boundary value problem the analysis benefits fromsymmetric hyperbolicity of the underlying evolutionarysystem however for Z4 c starting in fully second orderform this property does not appear to exist within alarge class of symmetrizers Cao & Hilditch (2012). Wedo not seek to address this issue further here. Ourboundary treatment follows an approach due to Hilditchet al. (2013). We consider a Cartesian coordinatiza-tion of the Mesh Ω as a compactly contained domainwithin Σ t capturing the physics of interest. On ∂ Ω Sommerfeld BC are imposed on the subset of dynam-ical fields { ˆ K, ˜Γ , Θ , ˜ A ij } . Though this choice is notoptimal, as it is not constraint-preserving, we have notexperienced issues on account of this. An interesting consideration for future work would be to incorporatewithin GR-Athena++ the constraint-preserving BC of e.g.Ruiz et al. (2011); Hilditch et al. (2013) (see also Rinneet al. (2009)). 3.3.
Wave extraction
To obtain the gravitational wave content of the space-time, we calculate the Weyl scalar Ψ , the projectionof the Weyl tensor onto an appropriately chosen nulltetrad k, l, m, ¯ m . We use the same definition of Ψ hereas Brügmann et al. (2008): Ψ = − R abcd k a ¯ m b k c ¯ m d , (24)where we have exchanged the Weyl tensor for the Rie-mann tensor since we extract the gravitational wavesin vacuum. The dimensional Riemann tensor is con-structed from split ADM variables using the Gauss-Codazzi relations as detailed in Brügmann et al. (2008).To construct the null tetrad we start from a spatial co-ordinate basis: φ i =( − y, x, , r i =( x, y, z ) , θ i = (cid:15) ikl φ k r l ; (25)which is then Gram-Schmidt orthonormalized. Thenewly formed orthonormal, spatial triad is extended tospace-time with th components set to . With this weconstruct the null tetrad: k = 1 √ n − ˆ r ) , l = 1 √ n + ˆ r ); m = 1 √ θ + i ˆ φ ) , ¯ m = 1 √ θ − i ˆ φ ); (26)where n a = (1 /α, − β i /α ) . Once the Weyl scalar isobtained we perform a multipolar decomposition ontospherical harmonics of spin-weight s = − , defined asfollows : ψ (cid:96)m = (cid:90) π (cid:90) π Ψ ¯ − Y (cid:96)m sin ϑ d ϑ d ϕ, (27) − Y (cid:96)m = (cid:114) (cid:96) + 14 π d (cid:96)m ( ϑ ) e imϕ , (28) d (cid:96)m ( ϑ ) = k (cid:88) k = k ( − k (( (cid:96) + m )!( (cid:96) − m )!( (cid:96) − (cid:96) + 2)!) / ( (cid:96) + m − k )!( (cid:96) − k − k !( k − m + 2)! × (cid:18) cos (cid:18) ϑ (cid:19)(cid:19) (cid:96) + m − − k (cid:18) sin (cid:18) ϑ (cid:19)(cid:19) k − m +2 (29) k = max(0 , m − , (30) k = min( (cid:96) + m, (cid:96) − . (31) Convention here is that of Goldberg et al. (1967) up to a Condon-Shortley phase factor of ( − m . Ψ is firstcalculated at all grid points throughout the Mesh . Thisis then interpolated onto a set of geodesic spheres Q R (see §2.3) at given extraction radii R Q , over which theintegral in Eq.(27) is performed. Recall that the gridlevel parameter n Q controls the total number of sampleson on Q R as n Q + 2 . We select n Q through localmatching to an area element of a MeshBlock : n Q = (cid:115) πR Q δx − . (32)The modes of the gravitational wave strain h are com-puted from the projected Weyl scalar by integratingtwice in time ψ (cid:96)m = ¨ h (cid:96)m , (33)The strain is then given by the mode-sum: R ( h + − ih × ) = ∞ (cid:88) (cid:96) =2 (cid:96) (cid:88) m = − (cid:96) h (cid:96)m ( t ) − Y (cid:96)m ( ϑ, ϕ ) . (34)Following the convention of the LIGO algorithms libraryLIGO Scientific Collaboration (2018) we set: Rh (cid:96)m = A (cid:96)m exp( − iφ (cid:96)m ) , (35)and the gravitational-wave frequency is: ω (cid:96)m = ddt φ (cid:96)m . (36)3.4. Numerical technique
We implement the Z4 c system described in §3.1 in GR-Athena++ based on the method of lines approachwhere field variables may be chosen to obey a VC orCC discretization at compile time and time-evolution isperformed using the fourth order in time, four stage,low-storage RK S ] method of Ketcheson (2010).Generic spatial field derivatives in the bulk (awayfrom ∂ Ω ) are computed with high-order, centered, finitedifference (FD) stencils whereas shift advection termsuse stencils lopsided by one grid point Zlochower et al.(2005); Husa et al. (2008); Brügmann et al. (2008); Chir-vasa & Husa (2010). The implementation is based onAlfieri et al. (2018) and utilizes C++ templates to offerflexibility in problem-specific accuracy demands with-out performance penalties. A similar approach is takenfor implementation of the R and P operators discussedin §2.2.3. With this a consistent, overall, formal or-der throughout the bulk of the computational domain ismaintained during calculations by compile-time specifi-cation of the ghost-layer through choice of N g together with N cg . Throughout this work we take N g = N cg though for a VC discretization this is not a requirementwithin GR-Athena++ and may be tuned to the demandsof the desired
Mesh refinement strategy. In the case of Z4 c and VC this translates to an order for spatial dis-cretization in the bulk of N g − .We emphasize that during calculation of FD, R and P approximants, special care has been taken in the or-dering and grouping of arithmetical operations so as toreduce accumulation of small floating-point differences.This is a particularly important consideration in thepresence of physical symmetries where linear instabil-ities may amplify unwanted features present in the op-erator approximants and lead to resultant, spurious ap-pearance of asymmetry during late-time solutions Stoneet al. (2020).For most calculations involving Z4 c the treatment ofthe physical boundary is non-trivial. GR-Athena++ ex-tends
Athena++ by providing the Sommerfeld BC mo-tivated in §3.2. To accomplish this, within every time-integrator substep an initial Lagrange extrapolation isperformed so as to populate the ghost-layer at ∂ Ω . Or-der is again controlled at compile-time and we typicallyselect N g + 1 points for the extrapolation, albeit nu-merical experiments did not indicate significant changeswhen this choice was varied. The dynamical equationsof §3.1 and gauge conditions of §3.2 populate the sub-set of fields { χ, ˜ γ ij , α, β i } on nodes of ∂ Ω whereasfor { ˆ K, ˜Γ , Θ , ˜ A ij } Sommerfeld BC are imposed as inHilditch et al. (2013) where first order spatial deriva-tives are approximated through second order accurate,centered FD; we have found this to be crucial for nu-merical stability.As observed in Cao & Hilditch (2012) in the absenceof algebraic constraint C A projection Z4 c is only weak-hyperbolic. Therefore in GR-Athena++ we enforce C A at each time-integrator substep. On the other hand acoarse indicator on the overall error during the courseof a calculation is provided through inspection of theconstraints C U together with C (of Eq.(21)). Note thatthese latter constraints are not enforced.In order to have confidence in implementation detailswe have replicated a subset of tests from the “Appleswith Apples” test-bed suite Alcubierre et al. (2004);Babiuc et al. (2008); Cao & Hilditch (2012); Daverioet al. (2018) a discussion of which is provided in theappendix §A.The Z4 c system does not strictly impose any partic-ular underlying Mesh structure or refinement strategyand consequently we use this freedom to improve effi-ciency and accuracy by raising resolution only whereit is required. During evolution the Courant-Friedrich-3Lewy (CFL) condition must be satisfied. To achievethis in the context of refinement, spatial resolution onthe most refined level and the choice of
CFL factor itselfdetermines the global time-step that is applied on each
MeshBlock . Finally, in order to suppress high-frequencynumerical artifacts generated at
MeshBlock boundariesand not present in the physical solution we make useof high-order Kreiss-Oliger (KO) dissipation Kreiss &Oliger (1973); Gustafsson et al. (2013) of uniform fac-tor σ over all levels. In particular given a system oftime-evolution equations for a vector of variables u thereplacement ∂ t [ u ] ← ∂ t [ u ] + σ D [ u ] is made where D [ · ] is proportional to a spatial derivative of order N g − . MESH REFINEMENT FOR PUNCTURESBlack holes in
GR-Athena++ are modeled as in
BAM making use of the puncture formalism Brügmann et al.(2008). In numerical relativity, BH can be treated byadopting the Brill-Lindquist wormhole topology whichconsists of considering N black holes with N+1 asymp-totically flat ends for the initial geometry. These flatends are compactified and identified with points on R and the coordinate singularities at these points are called punctures . This allows one to produce black hole ini-tial data associating masses, momenta and spins to anynumber of black holes. The main application of thisformalism is binary black hole evolution.The adaptive mesh refinement criterion implementedin GR-Athena++ for puncture evolution mimics the clas-sic box-in-box refinement (used in e.g.
BAM , Cactus ),within the
Athena++ infrastructure. The main idea isto follow the punctures’ position during the evolutionand refine the grid depending on the distance from eachpuncture. 4.1.
Punctures’ initial data
Black holes initial data are constructed follow-ing Brügmann et al. (2008). We consider as our ini-tial data the positive-definite metric and extrinsic cur-vature ( γ ij , K ij ) on a spatial hypersurface Σ withtime-like unit normal n i such that n i n i = − . Suchinitial data are constructed by means of the confor-mal, transverse-traceless decomposition of the initial-value equations York (1979). We can use the map ofEq.(4) and freely choose an initially conformally flatbackground ˜ γ ij = δ ij and take a maximal slice, i.e.set K = 0 . Doing so, the momentum constraint be-comes ∂ j (cid:16) ψ ˜ A ij (cid:17) = 0 and admits Bowen-York solu-tions Bowen & York (1980) for an arbitrary number ofblack holes.The Hamiltonian constraint reduces to an ellipticequation for ψ , with solution (for N black holes with centers at r i ): ψ = 1 + N (cid:88) i =1 m i r i + u. (37)The variable ψ represents the initial value of ψ , which,based on its relation to χ , is evolved according to Eq.(8).In this equation the function u can be determined by anelliptic equation on R and is C at the punctures and C ∞ elsewhere. The variable m i is called the bare mass of a black hole and it coincides with the actual massonly in the Schwarzschild case. The total ADM mass ofeach black hole at the puncture is given by: M i = m i u i + (cid:88) i (cid:54) = j m j d ij , (38)where u i is the value of u at each puncture and d ij isthe coordinate distance between each pair of punctures.Ultimately, we denote the total mass of the system with M , which represents the physical mass scale of the prob-lem and thus all results will be reported accordingly.To produce BBH initial data following the above de-scription, we make use of an external C library based onthe pseudo-spectral approach of Ansorg et al. (2004),which is also used in the TwoPunctures thorn of Cactus . 4.2.
Puncture tracker
To follow the punctures’ position we need to solve anadditional ODE, which is not coupled to the Z4c system.Since the conformal factor χ vanishes at the puncture,Eq.(8) implies that Campanelli et al. (2006): ˙ x p ( t ) = − β | x p ( t ) , (39)where β | x p is the shift function evaluated at the punc-ture position. We solve this vectorial equation at everytimestep using an explicit Euler solver. Though BAM im-plements higher order methods to solve this equation(Crank–Nicolson method), the solution obtained withthe first order Euler solver agrees with that of
BAM wherea comparison is made for two trajectories in the leftpanel of Fig.10. 4.3.
Oct-tree box-in-box In Athena++ , adaptive mesh refinement (AMR) is im-plemented as follows: during the evolution a certain We adapted the public code into a stand-alone library thatmay be found at the URL https://bitbucket.org/bernuzzi/twopuncturesc/.
MeshBlock and conse-quently the code refines the particular
MeshBlock , de-refines it or does nothing. In the punctures’ case thecondition we employ relies on the punctures’ position.For a given
MeshBlock we first calculate the distance min i || x i p − x MB || ∞ , where x p , x MB denote the punctureand MeshBlock positions respectively, and i labels eachpuncture that is present. This allows one to assess the theoretical refinement level the MeshBlock should be in.If the refinement level of the considered
MeshBlock isnot the same as the theoretical one just calculated, thenthe block is either refined or de-refined according to itscurrent level. The theoretical refinement level is deter-mined by considering a classic box-in-box structure ofthe grid, in which each puncture is enclosed in a seriesof nested boxes centered on the puncture, all with thesame number of points but with increasing physical ex-tent, i.e. with decreasing resolution. In particular, eachbox has half of the resolution of the next one it contains.The presence of punctures and their position determinesa structure of nested boxes, in such a way that the small-est (and finest) imaginary box around a puncture definesthe highest refinement level. The box containing it cor-responds to a lower refinement level and so on up tothe th level which corresponds to the initial mesh it-self. Practically, to define a grid in (GR-)Athena++ oneneeds to specify the number of points of the initial meshgrid N M , the number of points per MeshBlock consti-tuting the mesh N B , and the number of total refinementlevels N L up to which the grid has to be refined. Follow-ing the procedure above, GR-Athena++ will refine each
MeshBlock , producing an oct-tree box-in-box grid struc-ture. A visualization of this can be seen in Fig.7, inwhich the initial configuration of two coalescing blackholes and a snapshot at later time are shown. Here N M = 64 and N B = 16 , thus the initial mesh is di-vided into MeshBlock s (level 0, see §2.1). Follow-ing the procedure described above,
MeshBlock s are sub-divided up to level 10, which corresponds to the smallest
MeshBlock s containing the two black holes. The finalgrid is composed of the initial
MeshBlock s, which arethose far from the punctures and thus untouched, andincreasingly smaller blocks (in terms of physical extent)for each refinement level. Note that in the top panellevels 7, 8, 9, 10 are visible, while due to regridding atlater times in the bottom panel
MeshBlock s of level 7 For implementation reasons, x MB is defined as follows: we con-sider a cube with same center, the edge of which is / of theedge of the original MeshBlock . x MB are the coordinates of thecorner of this cube which is closer to the closest puncture. (the ones closest to the edges of the plot) have beensubdivided and their children belong to level 8. Figure 7.
2D slice at z = 0 of a mesh grid produced with GR-Athena++ , setting 11 refinement levels. Top panel: blackholes at initial position x ± = ( ± . , , M . Bottompanel: snapshot at M in which trackers are shown (redlines). The color code refers to the value of the conformalfactor χ . For clarity, in the figure only a subset of the totalslice is shown, therefore only highest levels are visible. It isalso possible to see the underlying box-in-box structure, inwhich boxes are made up of MeshBlock s. Grid configurations
In order to accurately and efficiently perform a BBHmerger simulation it is crucial to optimize the grid con-figuration for a given problem. Thus to attain accuracyat reduced computational cost a balance must be strucksuch that the strong-field dynamics are well-resolved and5their effects propagate cleanly into the wave-zone for ex-traction. The former can be directly controlled based onthe oct-tree box-in-box refinement criterion and fixinga target puncture resolution δx p or equivalently, maxi-mum number of refinement levels N L . For the wave-zonean extraction radius R must be selected and the under-lying refinement strategy together with choices of N M and N B induce a resolution δx w . Finally, the maximumspatial extent x M of the computational domain Ω mustbe chosen to be sufficiently large so as to mitigate anypotential spurious effects due to imposed approximateboundary conditions.Unless otherwise stated we select Ω as the Cartesiancoordinatized cube [ − x M , x M ] which results in a res-olution on the most refined level in the vicinity of apuncture of: δx p = 2 x M N M N L − . (40)Wave-zone resolution δx w may similarly be computedtaking N L = (cid:100) log (cid:0) x M R (cid:1) (cid:101) in the above where R is theextraction radius. For the calculations presented in thiswork, we typically select N B = 16 , which allows forup to th order accuracy for approximants to operatorspertaining to quantities appearing during spatial dis-cretization. The constraint on maximum approximantorder that can be selected for a choice of N B is given inEq.(2) which arises on account of the double restrictionoperation described in §2.2.3. Natively, GR-Athena++ supports up to th order which may be further extendedthrough simple modification of the relevant C++ tem-plates.We have observed that a simple approach to furtheroptimize for efficiency once convergence properties areestablished is to modify N M and N B . PUNCTURE TESTSIn this section we present several tests of punctureevolutions to validate
GR-Athena++ . We compare ourresults against
BAM code and
TEOBResumS , used as bench-marks. We also demonstrate the convergence propertiesof our code for these tests.Unless otherwise stated, throughout this section weadopt tortoise coordinates, in which evolution time t is mapped as t → u ≡ t − r ∗ , where r ∗ = r +2 M log (cid:12)(cid:12) r M − (cid:12)(cid:12) and M, r are the total mass of thebinary system and the Schwarzschild coordinate, re-spectively. In waveform plots quantities are suitably Direct control on δx w is offered in GR-Athena++ through optionalintroduction of a minimum refinement level maintained over aball of radius R centered at C but for the results presented itwas not found to be necessary to utilize. rescaled by M (see §4.1) and by the symmetric massratio ν := M M ( M + M ) . The merger or time of merger is defined as the time corresponding to the peak of the ( (cid:96) = 2 , m = 2) -mode of A (cid:96)m (defined as in Eq.(35)).5.1. Single Spinning Puncture
In order to verify the evolution of a single black holepuncture, as well as the gravitational wave signal calcu-lated by
GR-Athena++ we perform a direct comparisonwith the established
BAM code. Using initial data gen-erated by the
TwoPunctures library for both codes, asused in similar tests of the
BAM code in Hilditch et al.(2013), we simulate the evolution of a single spinningpuncture, representing a Kerr black hole with dimen-sionless spin parameter a = 0 . . To this end we initial-ize two black holes, one with the target mass, M , andspin a = 0 . , and another with negligible mass, − M ,and zero spin, with a small initial separation, − M .These black holes merge soon after the simulation be-gins, and the resulting single black hole can be treated asour target Kerr BH. We use the static mesh refinementof GR-Athena++ to construct a refined grid around thepuncture that matches the resolution of the
BAM evolu-tion both at the puncture ( δx = 0 . M ) and in thewave zone ( δx = 0 . M ).To compare the two wave signals, we calculate thedominant (2 , mode of the strain from the expressionsabove in Eqs.(27–31, 33). In doing so, we perform twointegrations in the time domain (note this is differentto the frequency domain integration performed for the (2 , mode studied below for the black hole binary, ashere we have no well-motivated cut-off frequency avail-able Dietrich & Bernuzzi (2015)). These integrationsmay add an arbitrary quadratic polynomial in time ontothe strain as constants of integration Damour et al.(2008); Baiotti et al. (2009) and so, in the results pre-sented here, we fit for, and then subtract, this quadratic.We further note this reconstruction has been shown tointroduce errors in the waveform ring-down Dietrich &Bernuzzi (2015).In Fig.8 we show the match between the two calcu-lated signals for the real part of the (2 , mode of thegravitational wave strain. These show consistency inthe phasing of the signal, with slight discrepancies inthe amplitude of the strain.We also demonstrate the convergence properties of thewaveforms in GR-Athena++ for this test. We performthe same simulation at a coarse, medium and fine reso-lution (with finest grid spacings δx c = 0 . M, δx m = Hereafter the (2,2) mode, and similarly for other ( (cid:96), m ) modesand for all related quantities. Figure 8. (cid:60) ( Rh /M ) for a single spinning puncture in GR-Athena++ and
BAM with difference shown in black as afunction of Schwarzschild tortoise coordinate, u . Wave ex-tracted at R = 50 M . Figure 9.
The difference between the waveform at coarseand medium resolution ( δx c − δx m ) is consistent with thedifference between the waveform at medium and fine resolu-tion ( δx m − δx f ) when rescaled by the appropriate factor for th order convergence Q . Waves extracted at R = 50 M . M, δx f = 0 . M ), and show that the differ-ence between the medium and fine resolution waveformmatches the difference between the coarse and mediumwaveform, when rescaled by the factor Q n for n th orderconvergence, defined as: Q n = δx nc − δx nm δx nm − δx nf . (41)In Fig.9 we show convergence properties for a simula-tion with th order accurate finite differencing operators.We see here that, rescaling assuming th order conver-gence, GR-Athena++ demonstrates under-convergence atinitial times and a consistent order of convergence at later times, although without a precise point-wise scal-ing. We note that for similar tests performed with the
BAM code in Hilditch et al. (2013) using initial data con-structed in the same manner, these same convergenceproperties are observed for the waveform of the spin-ning puncture problem.5.2.
Calibration evolution of two punctures
We validate
GR-Athena++ against binary black holeevolutions by comparing with
BAM and performing con-vergence tests. For these tests the two initial non-spinning black holes, with bare-mass m ± = 0 . M , arelocated on the x − axis, with x , ± ( t = 0) = ± . M ,and initial momenta directed along the y − axis, p ± ( t =0) = ∓ . M . The gauge is chosen as explained in§3.2. For the GR-Athena++ vs.
BAM comparison and forconvergence tests several runs at different resolutionshave been performed. The grid configuration for bothcodes is described in detail below. This initial setup re-sults in a ∼ . orbits evolution of the two black holes be-fore merger, which happens at evolution time t ∼ M ,as can be seen in Fig.10.In the next two subsections the (2 , mode of gravi-tational wave strain is calculated according to Eq.(33).5.2.1. GR-Athena++ vs.
BAM comparison
Athena++ and
BAM implement completely differentgrid structures. To compare the two codes, we try togenerate grids as similar as possible aiming to matchthe resolution at the puncture and the physical extentof the grid. In the case of
GR-Athena++ we choosegrid parameters N B = 16 , N L = 11 , x M = 3072 M and various resolutions N M = [96 , , , .This results in resolutions at the puncture of δx p =[ 1 . , . , . , . × − M (seeEq.(40)). For BAM the same is achieved in each corre-sponding simulation by considering 6 nested boxes of N M points and 10 smaller boxes (5 per puncture, cen-tered at each one) of N M / points and maximum spac-ing in the outermost grids ∆ x = [96 , , , , M respectively. In both cases all simulations are performedwith th order finite differences stencils for derivatives.Fig.10 shows very good agreement between the twocodes regarding black hole trajectories (left panel). Thiscan be also seen looking at the right panel, in which thetwo GW frequencies perfectly match. There is a dis-crepancy of 2% between GW amplitudes that convergesaway with increasing resolution.5.2.2. Convergence tests for
GR-Athena++
Convergence tests are performed on the same runsas the previous section plus an additional run madeat resolution N M = 384 , with N B = 24 resulting in7 Figure 10.
Comparison between
BAM and
GR-Athena++ of the trajectories of puncture − (left panel) and of gravitationalwaveforms (right panel) for resolution N M = 192 . Waveforms are extracted at a representative radius R = 120 M and mergertime is defined as the amplitude peak time. Discrepancy between the two amplitudes is (cid:46) . Figure 11.
Convergence plots for calibration BBH evolution. Left and right plots correspond respectively to th order and th order finite differencing. In both case waves are extracted at R = 120 M . In bottom panels phase differences between resolutionsare rescaled according to Eq.(41) with respect to the blue line (corresponding to lowest resolutions). δx p = 0 . × − M . Moreover, we consider an-other set of runs employing th order finite differences,with the same grid setup as in the previous sectionbut halving the maximum extent of the physical grid,namely in this case x M = 1536 M . This has the ef-fect of doubling the resolution at the puncture. In thetop panels of Fig.11 we compare the gravitational wavestrain extracted at R = 120 M for every resolution, bothfor th and th order. In order to quantitatively investigate the effect of res-olution on phase error we inspect differences in phasebetween runs in the bottom panels of Fig.11. InvertingEq.(35) allows us to write the phase difference as: | ∆ φ ( α, β ) | := (cid:12)(cid:12)(cid:12) φ [ h ] | α − φ [ h ] | β (cid:12)(cid:12)(cid:12) . (42)In the bottom panel of the th order plot (Fig.11 left)the red and green lines match, demonstrating th orderconvergence for the highest resolutions. In the th ordercase (Fig.11 right) the waveforms (and corresponding8 Figure 12.
Self-convergence test for the calibration BBHevolution. In the plot phase difference at merger with re-spect to the highest resolution available is reported on y -axis, against the resolution N M . Diamonds correspond to th order series, while dots refer to th order series. Purpleand cyan lines represent the theoretical convergence for bothcases. frequencies) lay on top of each other, so as to be indis-tinguishable, and this translates into smaller phase dif-ferences comparing the bottom panels of the two plots.Even though the red line, corresponding to the phasedifference between the two highest resolution, is quitenoisy, th order convergence can be seen here for thethree highest resolutions as in the previous case. Thisbehavior is present in every extraction radius. In allcases, the plots show a convergent behavior with respectto the phase differences, i.e., the differences betweeneach pair of lines decreases with increasing resolution.Additionally, we check the accuracy of our convergencetests by evaluating the error on the phase differences, es-timated for each line as the difference between the phasegiven by the Richardson extrapolation formula and thephase corresponding to the highest resolution used tocalculate the Richardson extrapolated phase, similarlyto what is done in Bernuzzi & Dietrich (2016). For eachcorresponding phase difference line, we find that this er-ror is always at least ∼ smaller than ∆ φ .Additionally, we show in Fig.12 a convergence plot inwhich phase differences are calculated at merger (see be-ginning of §5). This figure further confirms the clean th order convergence of GR-Athena++ for the highest res-olution cases. For the th order case, comparing withthe other, phase differences are smaller and the conver-gence is faster. However, the noise in the phase differ-ences with respect to the highest resolution (right plotin Fig.11, bottom panel) makes the convergence assess-ment less clean. 5.3. Two punctures evolution of ten orbits
Physically one anticipates that inspiral of astrophysi-cal binary systems is well-described by a significant du-ration of co-orbit on a quasi-circular trajectory Peters &Mathews (1963); Peters (1964). This assumption is con-sistent with the events detected by the LIGO and Virgocollaborations Abbott et al. (2016a,b, 2017a). Conse-quently it is of considerable interest to also test the per-formance of
GR-Athena++ in this scenario. To this endwe evolve non-spinning, equal-mass, low eccentricity ini-tial data based on Hannam et al. (2010) where bare-massparameters are m ± = 0 . M and the puncturesare initially on-axis at x ± = ± . M with instan-taneous momenta p ± = ( ∓ . × − , ± . × − , M . This choice of parameters results in ∼ orbits prior to merger at t ∼ M . In comparison tothe calibration evolution this evolution is of significantlylonger duration and therefore it is of interest to inves-tigate how waveform accuracy is affected for a selectionof Mesh parameters that reduce computational resourcerequirements. In order to provide another comparisonthat is independent of
BAM here we provide a final as-sessment on the quality of waveforms computed with
GR-Athena++ based on the NR informed, effective-one-body model
TEOBResumS
Nagar et al. (2018).5.3.1.
Setup
The convergence studies performed for the calibra-tion BBH merger problem provide a guide as to howto choose resolution at the puncture δx p . Here we fixthe MeshBlock sampling to N B = 16 and work at th order in the spatial discretization. For the Mesh sam-pling we select N M = 64 and construct a sequence ofgrid configurations where each value of δx p is reducedby a factor of / compared to the previous in Tab.1. ρ ( · ) N L x M δx p × − [ M ] . . . . . . . Table 1.
A distinct label ρ ( · ) is assigned to each run withcorresponding maximum number of refinement levels N L andfixed physical extent x M of the Mesh (see §4.4). Resultantpuncture resolutions δx p and total number of MeshBlock ob-jects initially partitioning a
Mesh are provided.
That the choice of parameters in Tab.1 reduce overallcomputational resource requirements can be understood9
Figure 13.
Coordinate trajectories of both punctures( x + ( t ) in blue and x − ( t ) in orange) for parameter choice ρ h . as follows. Consider the choice of parameters made in ρ h and suppose N M and N L are varied while maintaining δx p fixed. With this, the number of MeshBlock objectsrequired to partition the initial
Mesh changes. For ex-ample, taking N M = 128 and N L = 12 resulted in aninitial number of MeshBlock objects of .Whereas selecting N M = 256 and N L = 11 leads to initially. Generally, we found an approx-imate cubic scaling in as N M is scaled which isrelated to the dimensionality of the problem.In this section an extraction radius of R = 100 M isused. The CFL condition is . and a KO dissipationof σ = 0 . is chosen. Constraint damping parametersare selected as κ = 0 . and κ = 0 .The coordinate trajectories of the punctures for a cal-culation utilizing grid parameters ρ h of Tab.1 can beseen to satisfy ten orbits in Fig.13. This provides an ini-tial verification of expected qualitative properties Han-nam et al. (2010) of the BBH inspiral and merger. Inorder to investigate the behavior of the constraints wefocus attention on the collective constraint C of Eq.(21).We display values of C in the orbital plane ( z = 0 ) atfixed times t = 500 M and t = 2100 M in Fig.14. Thegeneral properties of C discussed here we found to beshared between other simulations utilizing parametersfrom Tab.1. Crucially, this means that increasing re-finement in the vicinity of the puncture does not con-taminate the rest of the physical domain. In all cases wefound that away from the punctures values of C decreaseon average as the boundary of the computational do- Figure 14.
Values of the (normalized) collective constraint (cid:98) C ( x, y, z ) := C ( x, y, / max x, y C ( x, y, over the orbitalplane z = 0 for a simulation with ρ h of Tab.1. Upper panel:Evolution time is t = 500 M where max x, y C ( x, y, (cid:39) . . Lower panel: Evolution time is t = 2100 M where max x, y C ( x, y, (cid:39) . . As can be seen in both cases con-straint violation is greatest directly in the vicinity of thepunctures. See text for further discussion. main is approached. In particular, for the calculation in-volving parameters ρ h and during M ≤ t ≤ M as (cid:37) := (cid:112) x + y → M we found C ∼ − whichcontinues to decrease as (cid:37) → M to C ∼ − there-after plateauing at C ∼ − towards the boundary.We found qualitatively similar behavior when inspect-ing the Hamiltonian constraint. We remark that, evenin the continuum limit, constraints are not expected toconverge to zero in the entire domain for this solutionbecause punctures are excluded from R .Of principal interest for gravitational wave detectionis the strain. To this end we solve Eq.(33) for h in0 Figure 15.
The (2 , multipole of the GW strain normal-ized to the symmetric mass ratio ν = 1 / computed for sim-ulations based on parameters of Tab.1. Peak amplitude for achoice of ρ h occurs at u/M = 2037 . which indicates the endof the inspiral Bernuzzi et al. (2014). Dephasing as merger-time is approached reduces rapidly with increased resolution(see also Fig.16 though note that the legend differs there).Note: horizontal axis scale changes at u/M = 1500 . the frequency domain making use of the FFI methodof Reisswig & Pollney (2011). A cut-off frequency of f = 1 / × / is chosen which is physically motivatedby inspecting the early time puncture trajectories of theinspiral. We display the resulting (2 , mode of thestrain for calculations using the parameters of Tab.1 inFig.15.The peak amplitude of h indicates the end of theinspiral Bernuzzi et al. (2014) and for a grid parameterchoice of ρ h occurs at u = 2037 . M . The maximumdeviation from this value for parameters investigated inFig.15 occurs when ρ ml is utilized resulting in ∆ u =10 . M . In order to directly quantify how the choiceof δx p affects the phase error in the strain waveform asmerger time is approached we compute ∆ φ using Eq.(42)and show the result in Fig.16.As we have not modified resolution globally over thecomputational domain but rather considered the effectof introducing additional refinement levels in the vicinityof the punctures it is not clear what sort of convergenceshould be expected. Furthermore the extent to whicha time-integrator order below the order of the spatialdiscretization affects GW waveform quality can also besomewhat delicate (see e.g. the super-convergence dis-cussion of Reisswig & Pollney (2011)). In Fig.16 (upperpanel) clean th order convergence in ∆ φ is not found forall u upon rescaling with the appropriate factors deter-mined through Eq.(41). An additional issue that com-plicates the discussion here is that the parameters ofTab.1 also vary the spatial extent of the computationaldomain potentially introducing a source of systematic Figure 16.
Phase differences ∆ φ between simulations in-volving parameters of Tab.1. Upper panel: A trend of ∆ φ ac-cumulating with time is present. Merger time correspondingto ρ h is indicated with a vertical black line at u/M = 2037 . .A decrease in ∆ φ occurs as δx p pairs of decreasing values arecompared. In order to mitigate a systematic effect of variedspatial extent in the computational domain we also com-pute phase differences at fixed x M such as ∆ φ ( ρ l , ρ m ) and ∆ φ ( ρ m , ρ h ) . These two differences are also shown rescaledwith Q and Q (cid:48) as computed using Eq.(41) under the as-sumption of th order spatial discretization. Lower panel:Phase differences at merger computed with reference datataken from the ρ h run are depicted as a function of punctureresolution. Data on the black reference curve would obey a th order convergence trend. See text for further discussion. error. For example, at merger ∆ φ ( ρ mh , ρ h ) (cid:39) × − and ∆ φ ( ρ m , ρ h ) (cid:39) × − though δx p ( ρ m ) > δx p ( ρ mh ) .In order to compensate for this effect we consider phasedifferences at fixed x M . In particular the lower panel ofFig.16 displays phase differences at merger where ρ h istaken as the reference value for all comparisons. Whiledisplayed ∆ φ are compatible with a th order trend the ρ vvl choice of parameters is likely of too low resolution tomake a robust claim on convergence order with respectto varying δx p . Nonetheless it is clear that judiciouschoice of refinement level (and hence resolution local tothe punctures through δx p ) reduces GW phase error.5.3.2. EOB comparison
We compare the gravitational waveform from the orbit simulation to the state-of-the-art EOB model TEOBResumS
Nagar et al. (2018). The latter is in-formed by several existing NR datasets and faithfullymodels the two-body dynamics and radiation of spin-aligned BBH multipolar waveforms for a wide variety of1mass ratio and spins magnitudes. We focus again onthe (2 , mode of the gravitational wave strain. The GR-Athena++ ψ output mode is first extrapolated tonull infinity using the asymptotic extrapolation formulaLousto et al. (2010); Nakano (2015): lim r →∞ rψ ∼ A (cid:16) rψ − ( l − l + 2)2 r (cid:90) rψ d t (cid:17) , (43)where A ( r ) := 1 − M/r and r := R (1 + M/ (2 R )) with R being the (finite) GW extraction radius of anNR simulation. The extrapolated result is successivelyintegrated twice in time (Eq.(33)) using the FFI methodReisswig & Pollney (2011) to obtain the strain mode h . The waveform comparison is performed by suitablyaligning the two waveforms; the time and phase relativeshifts are determined by minimizing the L norm of thephase differences Bernuzzi et al. (2012).Figure 17 shows the two waveforms are compatiblewithin the NR errors. The accumulated EOBNR phasedifferences are of order (cid:39) . rad to merger and (cid:39) . radto the ring-down for the highest resolution GR-Athena++ simulation. The larger inaccuracy of the ring-down partis a resolution effect related to the higher frequency ofthe wave; it can potentially be improved by adding arefinement level so as to better resolve the black holeremnant. The maximum relative amplitude differenceis of order (cid:39) . . The same comparison using the low-est resolution gives (cid:39) . rad at merger ( (cid:39) rad duringthe ring-down) and maximum relative amplitude differ-ences of (cid:39) . . Overall, this analysis demonstrates that GR-Athena++ can produce high-quality data for wave-form modeling. SCALING TESTSTo check the performance of
GR-Athena++ and makesure it maintains the scalability properties of
Athena++ ,we conduct weak and strong scaling tests with the sameproblem setup as in §5.2. In these tests BBH evolu-tions of 20 Runge-Kutta time-steps are performed, withfull AMR and full production grids, in which N B = 16 , N L = 11 , x M = 1536 M are fixed and making use ofhybrid MPI and OMP parallelization. These tests areperformed on the cluster SuperMUC-NG at LRZ. Specif-ically, on each node of the cluster (48 CPUs per node) 8MPI tasks with 6 OMP threads are launched, thus fill-ing up the node. We find very good results using up to2048 nodes ( ∼ ) CPUs. With respect to scaling testspresented in Dendro-GR
Fernando et al. (2018), our re-sults favorably compare both in the case of strong andweak scaling tests.6.1.
Strong scaling tests
To test the strong scaling behavior of the code indifferent regimes of CPU numbers, we consider 6 res-olutions, namely N M = [64 , , , , , .For each of them we perform an evolution on i = N nodesmin , . . . , N nodesmax nodes, where N nodesmin/max representsome limits on the possible number of nodes that canbe used for each resolution. The presence of theseboundaries is due to the fact that for a certain res-olution a specific number of MeshBlock s is producedand consequently on the one hand a sufficient num-ber of CPUs is required to handle those
MeshBlock sand, on the other hand,
Athena++ ’s parallelization strat-egy is based on
MeshBlock s and so is not possible touse more OMP threads (and consequently CPUs) than
MeshBlock s. Fig.18 (left) shows that excellent strongscalability is obtained up to ∼ . × CPUs, withefficiency above . For the aforementioned reasons,for a given resolution it is not possible to achieve highefficiency when increasing CPU number above a cer-tain point. Fig.18 (right) shows that efficiency stronglydepends on the ratio between
MeshBlock number andCPUs. In particular, for each resolution, an efficiencyof above is obtained when there are at least 10
MeshBlock s/CPU. By contrast, the efficiency shown inthe strong scaling plot of Fernando et al. (2018) appearsto decrease faster when comparing to the brown line inFig.18 (left), although the two results are obtained inslightly different regimes of CPU number.6.2.
Weak scaling tests
For weak scaling tests we use asymmetric gridsin terms of N M in each direction, while keeping MeshBlock s of a constant size N B = 16 . In particu-lar, we start with a run on a single node with a grid N xM = 128 , N yM = 64 , N zM = 64 . Then for 2 nodes wedouble N yM . For 4 nodes N zM is doubled as well and for 8nodes N xM is also doubled. We continue this up to 2048nodes. In this way, we are able to double the resourcestogether with the required computations, and we man-age to keep a ratio of ∼ MeshBlock s/CPUs in eachdifferent run. The results are displayed in Fig.19. Weperformed these tests twice, once with the code compiledusing the Intel compiler and once with the code compiledusing the GNU compiler. The top panel of Fig.19 showsthat the total CPU time per MPI task remains constantup to CPUs employed, thus demonstrating excellentscalability in an unprecedented CPU number regime fora numerical relativity code. The bottom panel displaysthe same result, but focuses on how the execution timeis distributed between the main computational kernels.Notably, most of the computation time is spent in thecalculation of the right hand side of the equations, which2 − − − − − − − − − t − t mrg ) /M − . − . − . . . . . < ( R h / ( Mν )) GR-Athena++ < ( R h / ( Mν )) TEOBResumS − − − −
100 0 Mω GR-Athena++ Mω TEOBResumS
Figure 17.
Comparison between the
GR-Athena++
BBH q = 1 waveform and the semi-analytical effective-one-body model TEOBResumS . The plot shows the (2 , multipole of the GW strain normalized to the symmetric mass ratio ν = 1 / and theinstantaneous GW frequency. The time is shifted to the mode amplitude peak that approximately defines the merger time. The GR-Athena++ waveform is from the highest resolution simulation ( ρ h of Tab.1), extracted at coordinate radius R = 100 M andextrapolated to null infinity using Eq.(43). Note: horizontal axis scale changes at ( t − t mgr ) /M = − . Figure 18.
Strong scaling tests for
GR-Athena++ for several CPU number regimes. Left plot: speed-up (top panel) and efficiency(bottom panel) calculated with respect to the first point of the series. In each series the second point corresponds to a theoreticalspeed-up by a factor of 2, the third point a factor of 4 and so on. Right plot: efficiency as a function of the
MeshBlock loadeach CPU carries. is indeed the expected behavior in absence of race con-ditions or other bottlenecks elsewhere in the code. Thediscrepancy between the height of each bar in the bot-tom panel of the plot and the dotted line in the top panelgive an estimate on the communication time among allMPI processes and OMP threads and the comparisonbetween the two plots suggests that this also has goodscaling behavior. SUMMARY AND CONCLUSIONIn this work we have presented
GR-Athena++ ; a vertex-centered extension of the block-based AMR frameworkof
Athena++ for numerical relativity (NR) calculations. To this end we described our introduction of vertex-centered (VC) discretization for field variables whichmay be utilized for general problems. A principle ad-vantage of VC is that restriction of sampled functiondata from fine to coarse grids that are interspersed andthe coarser of which has fully coincident grid-points is ef-ficiently achieved by copying of data. This procedure isformally exact. The dual operation of prolongation viainterpolation of data from coarse to finer grids also takesadvantage of the aforementioned grid structure wherepossible.Another novel feature is our introduction of geodesicspheres in the sense of highly refined, triangulated con-vex, spherical polyhedra. Placement on a
Mesh may3
Figure 19.
Weak scaling tests for
Athena++ , performed on SuperMUC-NG at LRZ compiling the code with GNU compilerand Intel compiler. Top panel reports total CPU times measured for rank 0 directly in the code with the
C++ function clock_t .CPU time in the bottom panel is measured instead using the profiling tool gprof ; here, times which count less than 2% of thetotal CPU time are neglected. Light (left) bars represent results obtained with GNU compiler, while dark (right) ones are forIntel compiler. be arbitrarily chosen without restriction on underlyingcoordinatization. An advantage of this is that the as-sociated vertices defining a geodesic sphere achieve amore uniform spatial sampling distribution when con-trasted against traditional uniform spherical latitude-longitude grids at comparable resolution. Furthermore,the potential introduction of coordinate singularities isavoided. We demonstrated the utility of this approachwhen quantities need to be extracted based on sphericalquadratures.Our code implements the Z4 c formulation of NR of the χ moving-punctures variety. The overall spatial order ofthe scheme may be selected at compile time where easilyextensible C++ templates define the formal order for de- sired finite difference operators and VC restriction andprolongation operators. Time-evolution is performedwith a low-storage th order Runge-Kutta method. Ourprimary analysis tool is based on extraction of gravita-tional radiation waveforms within GR-Athena++ throughthe use of the Ψ Weyl scalar with numerical quadraturethereof based on the aforementioned geodesic spheres.Through a post-processing step we also analyze the grav-itational strain h .In order to assess the numerical properties of our im-plementation we have repeated a subset of the stan-dard Apples with Apples test-bed suite Alcubierre et al.(2004); Babiuc et al. (2008); Cao & Hilditch (2012); Dav-erio et al. (2018) and performed a variety of cross-code4validation tests against BAM . To accomplish the latterwithin the oct-tree AMR approach of
GR-Athena++ weconstructed a refinement criterion so as to closely emu-late a nested box-in-box grid structure. Test problemsin the cross-code comparison involved a single spinningpuncture and the two puncture binary black hole (BBH)inspiral calibration problem of Brügmann et al. (2008).For overall th order spatial scheme selection a commen-surate th order convergence was cleanly observed. For th order, convergence was also achieved for the sameproblem, though less cleanly. As another external val-idation of GR-Athena++ and a demonstration of utilityfor potential future studies of high mass ratio BBH thatinvolve significant evolution duration and where accu-rate phase extraction from the gravitational strain iscrucial, we investigated a quasi-circular ten orbit inspi-ral problem based on parameters from Hannam et al.(2010). Here the evolution was performed with th orderspatial discretization and comparison was made againstthe NR informed EOB model of TEOBResumS
Nagar et al.(2018). Using a
Mesh tailored to moderate computa-tional resources we found accumulated EOBNR phasedifferences of order (cid:39) . rad to merger and (cid:39) . radto the ring-down for the highest resolution calculation.These tests highlight that GR-Athena++ is accurate androbust for BBH inspiral calculations and a strong con-tender for construction of high-quality data for wave-form modeling.In addition to accuracy, efficiency of computationalresource utilization is crucial. The task-based compu-tational model for distribution of calculations within
Athena++ and concomitant impressive scalability prop-erties as available resources are increased we have foundto readily extend to
GR-Athena++ and the Z4 c systemwith VC discretization for a wide variety of problemspecifications. Indeed during strong scaling tests itwas found that efficiency above is reached up to ∼ . × CPUs, whereas in weak scaling tests almostperfect scaling is achieved up to ∼ CPUs. This in-dicates that for the high resolutions and consequently resources required for a potential calculation describingan intermediate mass ratio BBH inspiral
GR-Athena++ compares favorably with the code-base of
Dendro-GR
Fernando et al. (2018) in terms of scalability.Finally, as
GR-Athena++ builds upon the modularframework of
Athena++ we inherit all its extant infras-tructure which will enable incorporating an NR treat-ment of the matter sector in future work. It is our inten-tion to make
GR-Athena++ code developments publiclyavailable in future in coordination with the
Athena++ team. ACKNOWLEDGMENTSThe authors thank Bernd Brügmann, Alessandro Na-gar, and Jim Stone for discussions, and Nestor Ortiz forinitial work on
AwA pn56zo , pn68wi and pn98bu ). The authors acknowledge HLRS for fundingthis project by providing test account access on the su-percomputer HPE Apollo Hawk under the grant number ACID 44191, GRAthenaBBH .APPENDIX A. APPLES WITH APPLES TEST-BEDSIn order to provide a series of computationally in-expensive and standard tests of differing formulationsin numerical relativity (tailored for the vacuum sec-tor) a suite of so-called “Apples with Apples” test-bedproblems (hereafter
AwA ) have been proposed Alcubierreet al. (2004); Babiuc et al. (2008). Our goal here is to ensure that the implementation of Z4 c within GR-Athena++ reflects the overall propertiesobserved in prior tests made based on the same formu-lation Cao & Hilditch (2012); Daverio et al. (2018) witha particular focus on elements directly relevant to grav-itational wave propagation and extraction. Specifically,we examine the AwA: robust stability §A.1, linearizedwave §A.2 and gauge-wave §A.3 tests.5Generically the
AwA tests are specified for three di-mensional periodic spatial grids (i.e. of T topology)where the effective dynamics occur over one (or two)spatial dimensions. Dynamics in “trivial” directions arereduced to a thin layer which in GR-Athena++ is achievedby selecting the relevant component(s) of N M of the Mesh sampling to be equal to (selected chosen as toprobe potential “checker-board instability” Babiuc et al.(2008)). This entails the full Z4 c equations and imple-mentation of (§3.1) are active during a calculation. Di-rections involving non-trivial dynamics are sampled andtested on both cell-centered G CC and vertex-centered G VC grids: G CC := (cid:110) −
12 + (cid:16) n + 12 (cid:17) δx (cid:12)(cid:12)(cid:12) n ∈ { , . . . N − } (cid:111) , G VC := (cid:110) −
12 + nδx (cid:12)(cid:12)(cid:12) n ∈ { , . . . N } (cid:111) ; (A1)where δx = 1 /N and N = 50 ρ with ρ ∈ N servingto adjust resolution as required. Each direction is fur-ther extended by ghost-zones as described in §2. Norefinement is present in this section though MeshBlock objects have N B chosen so as to partition the domainand allow for a further consistency check on the MPI-OMP hybrid parallelism. Selection of Z4 c parameters isas follows: Constraint damping κ = 0 . and κ = 0 with shift-damping typically selected as η = 2 . Kreiss-Oliger (KO) dissipation is taken as σ = 0 . and theCourant-Friedrich-Lewy (CFL) condition as / . Thesechoices are motivated by those made in prior work Cao& Hilditch (2012) and we comment on behavior upondeviation from them as tests are presented.For the tests performed here Z4 c is always coupled tothe puncture gauge described in §3.2 as this is of primaryinterest in this work. We take initial gauge conditionsto be: α | t =0 = 1 , β i (cid:12)(cid:12) t =0 = 0 . Furthermore, in order to facilitate comparison we set N g = 2 throughout such that spatial discretization is of nd order, though numerical experiments with N g = 3 reveal similar properties. Time-evolution is performedusing the th order RK S ] method of Ketcheson(2010). A.1. Robust stability
The robust stability test is performed with one effec-tive spatial dimension and is designed to efficiently de-tect instability within numerical algorithms affecting theprincipal part of the evolution system. An initial spatialslice of Minkowski is made where all Z4 c field compo-nents (§3.1) have to each sampled grid point a resolution Figure 20.
Robust stability test. Errors are displayed for G VC with solid lines and G CC with solid lines marked “ • ”.Different selections of parameter entering the Mesh samplingare: ρ = 1 in blue; ρ = 2 in orange; ρ = 4 in green; ρ = 8 inred; ρ = 16 in purple. Top panel: Rescaled L ∞ norm of col-lective constraint monitor. Bottom panel: Deviation fromflat metric. Consistent behavior between CC and VC dis-cretization is observed at all resolutions. See text for furtherdiscussion. dependent, distinct (i.e. independent) uniform randomvalue added: ε ∈ ( − − /ρ , − /ρ ) . (A2)Values are selected such that ε is below round-off indouble precision arithmetic. The variable ε models theeffect of finite machine precision and for a code to passthe test stable evolution must be observed. A code thatcannot pass this test would potentially have severe issueswith any evolution of smooth initial data. We evolve toa final time T = 1000 and monitor (cid:107)C(cid:107) ∞ where C is de-fined in Eq.(21) together with (cid:107) γ ij − δ ij (cid:107) ∞ as suggestedin Daverio et al. (2018). Results are displayed in Fig.20.We find that over the duration of the time-evolution (cid:107)C(cid:107) ∞ plateaus for ρ ≥ with ρ = 1 appearing alsoto tend towards a plateau. Similar behavior is foundfor (cid:107)H(cid:107) ∞ . This indicates stability in the sense thatno spurious exponential modes are excited. Reducing η → is observed to lead to qualitatively similar generalbehavior. This was also observed in Cao & Hilditch(2012). As errors decrease as resolution is increased weconsider GR-Athena++ to pass this test.A.2.
Linearized wave
The purpose of the linearized wave test is to checkwhether a code can propagate a linearized gravitationalwave, a minimal necessity for reliable wave extractionfrom strong-field sources Babiuc et al. (2008).An effective one-dimensional test with dynamicsaligned along the x -axis is specified through spatial slic-6ing of: d s = − d t + d x + (1 + H )d y + (1 − H )d z , (A3)where: H ( x, t ) = A sin (cid:32) π ( x − t ) d (cid:33) , (A4)and d = 1 is set to match the periodicity of the un-derlying computational domain and A = 10 − selectedsuch that quadratic terms are on the order of numericalround-off in double precision arithmetic. Consequently,initially we have α = 1 and β i = 0 with non-trivialextrinsic curvature components: K yy = − ∂ t [ H ( x, t )] , K zz = 12 ∂ t [ H ( x, t )] . (A5)Time evolution is performed to a final time of T = 1000 .Rather than displaying approximate sinusoidal pro-files at some final time as in Alcubierre et al. (2004);Babiuc et al. (2008); Cao & Hilditch (2012) we follow asuggestion of Daverio et al. (2018) to instead considerthe spectra of data. To this end we compute the discreteFourier transform as: F k ( t ) := 1 N N (cid:88) n =1 (cid:0) γ zz ( t, x n ) − (cid:1) exp( − πik ( x n − t )) , where x n ∈ G ( · ) (see Eq.(A1)). In the case of VC dis-cretization the last point of the (periodic) grid is identi-fied with the first and therefore dropped from the sum-mation. Thus comparing Eq.(A4) a spectral measureof the relative error in the travelling wave amplitudeis provided through (cid:15) a ( t ) := || F ( t ) | − A | /A . The ab-solute phase error may be directly inspected through (cid:15) p ( t ) := | arg( F ( t )) − π/ | . We also compute the off-set of the numerical waveform relative to the amplitude (cid:15) o ( t ) = | F ( t ) | /A .The initial data are constraint violating Cao &Hilditch (2012) and the puncture gauge is not neces-sarily compatible with simple advection, nonetheless wefind that to an excellent approximation the solution isa simple travelling wave with results of the analysis de-scribed above shown in Fig.21.In agreement with Cao & Hilditch (2012); Daverioet al. (2018) we find the dominant source of error tobe in the phase of the propagating waveform where thecoarsest sampling ρ = 1 leads to a final absolute phaseerror of (cid:39) . cf., the finest sampling ρ = 16 yielding (cid:39) . × − rad . For (cid:15) a ( t ) and (cid:15) p ( t ) we observe conver-gence with increasing resolution. For (cid:15) o ( T ) at T = 1000 increasing ρ tended to increase error albeit overall thisis acceptably within [7 . , × − ; this is compatiblewith the general behavior found in Daverio et al. (2018). Figure 21.
Linearized wave test with parameters and leg-end of Fig.20. Top panel: relative error in the travelling waveamplitude (cid:15) a ( t ; ρ ) . Middle panel: phase error (cid:15) p ( t ; ρ ) . Bot-tom panel: offset of the wave relative to amplitude (cid:15) o ( t ; ρ ) .The CC and VC discretizations display mutually consistentbehavior in all variables for all t apart from (cid:15) o ( t ; 16) thoughthe overall trend is recovered as t → T . See text for discus-sion. As observed in Cao & Hilditch (2012) reducing κ → or η → leads to qualitatively very similar results. Wethus consider GR-Athena++ to pass this test.A.3.
Gauge wave
A gauge transformation of Minkowski space-timedefines this test with parameters selected so as toinvolve the full non-linear dynamics. One takes η ab ˙=diag( − , , , in Cartesian coordinates x (cid:48) a andtransforms: ( t (cid:48) , x (cid:48) , y (cid:48) , z (cid:48) ) → (cid:0) t + G + ( x, t ) , x − G ± ( x, t ) , y, z (cid:1) , where G ± ( x, t ) := ± ∂ t [ H ( x, t )] / (8 π ) with H defined inEq.(A4). Two test cases are defined: shifted where G − is selected for the transformation on the x (cid:48) componentwhereas for unshifted G + is chosen in both components.For the latter the induced metric is: γ xx =1 − H, γ yy = γ zz = 1; (A6)resulting in non-trivial extrinsic curvature component: K xx = ∂ t [ H ( x, t )]2 (cid:112) − H ( x, t ) . (A7)During calculations we evolve to a final time of T =1000 . In contrast to the standard AwA specification Alcu-bierre et al. (2004); Babiuc et al. (2008) we make use of7
Figure 22.
Gauge-wave test. Top panel: Error of met-ric components based on Eq.(A8). Legend indicates scal-ing applied (based on an assumed nd order spatial scheme).Rescaling indicates anticipated convergence is well-obeyed.Bottom panel: Hamiltonian constraint with displayed datafollowing the legend of Fig.20. It is clear that with increasedresolution constraint violation converges away. Consistentbehavior between CC and VC discretizations is observed atall samplings. See text for discussion. the puncture gauge where we found it crucial to select anon-zero shift-damping of η = 2 . In addition to inspec-tion of the constraints through (cid:107)H(cid:107) ∞ we also considerconvergence of the aggregate, induced metric quantity: (cid:15) γ ( t ; ρ ) = (cid:118)(cid:117)(cid:117)(cid:116) (cid:88) i,j =1 (cid:68) γ ij ( t ) | ρ − γ ij ( t ) | ρ (cid:69) RMS , (A8)where the root-mean-square (RMS) value is computed atfixed times over a Mesh sampled with N = 50 . In Fig.22 we plot (cid:15) γ ( t ; ρ ) for a choice of A = 1 / and d = 1 inEq.(A4) and find a nd order rate of convergence as isexpected for N g = 2 .The behavior of the shifted case is identical for thestated parameters and therefore we do not show it.A two-dimensional variant of the gauge wave testmay also be considered where an initial spatial rota-tion of π/ is made. In particular, coordinates aremapped according to SO (2) (cid:51) R : ( x, y ) (cid:55)→ (ˆ x, ˆ y ) =( x + y, x − y ) / √ . This results in x -aligned propaga-tion mapped to a periodic diagonal trajectory in the ˆ x - ˆ y plane. Here as per AwA specification we evolve toa final time of T = 100 and selected resolutions basedon ρ ∈ { , , , } . For an amplitude of A = 1 / we similarly found a nd order rate of convergence with N g = 2 in rescaling of (cid:15) γ ( ρ ) . For the case of A = 1 / however we did not observe clean nd order convergence.Indeed the AwA specification suggestion to use an evenhigher amplitude A = 1 / is well-known to cause issueswith stability in a variety of formulations and regard-less of puncture or harmonic gauge choice Daverio et al.(2018); Cao & Hilditch (2012); Boyle et al. (2007).We consider GR-Athena++ to pass this test in boththe one-dimensional (un)-shifted cases and in the two-dimensional unshifted case with the caveat that initialamplitude must be reduced.A.4.
AwA summary
We have demonstrated that
GR-Athena++ with Z4 ccoupled to the moving puncture gauge passes the AwA ro-bust stability (§A.1) and the one-dimensional linearizedwave (§A.2) tests. For the gauge wave tests (§A.3) wefind that
GR-Athena++ passes for a choice of reducedinitial amplitude of the propagated wave.REFERENCES
Abbott, B. P., et al. 2016a, Phys. Rev. Lett., 116, 061102,doi: 10.1103/PhysRevLett.116.061102—. 2016b, Phys. Rev. Lett., 116, 241102,doi: 10.1103/PhysRevLett.116.241102—. 2017a, Phys. Rev. Lett., 119, 161101,doi: 10.1103/PhysRevLett.119.161101—. 2017b, Class. Quant. Grav., 34, 044001,doi: 10.1088/1361-6382/aa51f4—. 2020, Living Reviews in Relativity, 23,doi: 10.1007/s41114-020-00026-9Akutsu, T., et al. 2020, arXiv:2009.09305 [astro-ph,physics:gr-qc]. https://arxiv.org/abs/2009.09305 Alcubierre, M., Brügmann, B., Diener, P., et al. 2003,Phys.Rev., D67, 084023,doi: 10.1103/PhysRevD.67.084023Alcubierre, M., et al. 2004, Class. Quant. Grav., 21, 589,doi: 10.1088/0264-9381/21/2/019Alfieri, R., Bernuzzi, S., Perego, A., & Radice, D. 2018,Journal of Low Power Electronics and Applications, 8,doi: 10.3390/jlpea8020015Amaro-Seoane, et al. 2017, arXiv:1702.00786 [astro-ph].https://arxiv.org/abs/1702.00786Ansorg, M., Brügmann, B., & Tichy, W. 2004, Phys. Rev.,D70, 064011, doi: 10.1103/PhysRevD.70.064011Arnowitt, R. L., Deser, S., & Misner, C. W. 1959, Phys.Rev., 116, 1322, doi: 10.1103/PhysRev.116.1322 —. 2008, Gen. Rel. Grav., 40, 1997,doi: 10.1007/s10714-008-0661-1Babiuc, M., et al. 2008, Class. Quant. Grav., 25, 125012,doi: 10.1088/0264-9381/25/12/125012Baiotti, L., Bernuzzi, S., Corvino, G., De Pietri, R., &Nagar, A. 2009, Phys. Rev., D79, 024002,doi: 10.1103/PhysRevD.79.024002Baker, J. G., van Meter, J. R., McWilliams, S. T.,Centrella, J., & Kelly, B. J. 2007, Phys.Rev.Lett., 99,181101, doi: 10.1103/PhysRevLett.99.181101Baumgarte, T., & Shapiro, S. 2010, Numerical Relativity(Cambridge: Cambridge University Press)Baumgarte, T. W., & Shapiro, S. L. 1999, Phys. Rev., D59,024007, doi: 10.1103/PhysRevD.59.024007Berger, M. J., & Colella, P. 1989, Journal of ComputationalPhysics, 82, 64, doi: 10.1016/0021-9991(89)90035-1Berger, M. J., & Oliger, J. 1984, J.Comput.Phys., 53, 484Bernuzzi, S. 2020, Invited Review for GERG.https://arxiv.org/abs/2004.06419Bernuzzi, S., & Dietrich, T. 2016, Phys. Rev., D94, 064062,doi: 10.1103/PhysRevD.94.064062Bernuzzi, S., & Hilditch, D. 2010, Phys. Rev., D81, 084003,doi: 10.1103/PhysRevD.81.084003Bernuzzi, S., Nagar, A., Balmelli, S., Dietrich, T., & Ujevic,M. 2014, Phys.Rev.Lett., 112, 201101,doi: 10.1103/PhysRevLett.112.201101Bernuzzi, S., Thierfelder, M., & Brügmann, B. 2012,Phys.Rev., D85, 104030,doi: 10.1103/PhysRevD.85.104030Berrut, J.-P., & Trefethen, L. N. 2004, SIAM Review, 46,501, doi: 10.1137/S0036144502417715Bona, C., Bona-Casas, C., & Palenzuela, C. 2010, PhysicalReview D, 82, 124010, doi: 10.1103/PhysRevD.82.124010Bona, C., Ledvinka, T., Palenzuela, C., & Zacek, M. 2003,Phys. Rev., D67, 104005,doi: 10.1103/PhysRevD.67.104005Bona, C., Massó, J., Seidel, E., & Stela, J. 1995, Phys. Rev.Lett., 75, 600Bowen, J. M., & York, Jr., J. W. 1980, Phys. Rev., D21,2047, doi: 10.1103/PhysRevD.21.2047Boyle, M., Lindblom, L., Pfeiffer, H., Scheel, M., & Kidder,L. E. 2007, Physical Review D, 75, 024006,doi: 10.1103/PhysRevD.75.024006Boyle, M., et al. 2019, Class. Quant. Grav., 36, 195006,doi: 10.1088/1361-6382/ab34e2Brandt, S., & Brügmann, B. 1997, Phys. Rev. Lett., 78,3606, doi: 10.1103/PhysRevLett.78.3606Brown, D., Diener, P., Sarbach, O., Schnetter, E., & Tiglio,M. 2009, Physical Review D, 79, 044023,doi: 10.1103/PhysRevD.79.044023 Brügmann, B., Gonzalez, J. A., Hannam, M., et al. 2008,Phys.Rev., D77, 024027,doi: 10.1103/PhysRevD.77.024027Bugner, M., Dietrich, T., Bernuzzi, S., Weyhausen, A., &Brügmann, B. 2016, Phys. Rev., D94, 084004,doi: 10.1103/PhysRevD.94.084004Burstedde, C., Holke, J., & Isaac, T. 2019, Foundations ofComputational Mathematics, 19, 843,doi: 10.1007/s10208-018-9400-5Burstedde, C., Wilcox, L. C., & Ghattas, O. 2011, SIAMJournal on Scientific Computing, 33, 1103,doi: 10.1137/100791634Campanelli, M., Lousto, C. O., Marronetti, P., &Zlochower, Y. 2006, Phys. Rev. Lett., 96, 111101,doi: 10.1103/PhysRevLett.96.111101Cao, Z., & Hilditch, D. 2012, Phys.Rev., D85, 124032,doi: 10.1103/PhysRevD.85.124032Cao, Z., Yo, H.-J., & Yu, J.-P. 2008, Physical Review D, 78,124011, doi: 10.1103/PhysRevD.78.124011Carter Edwards, H., Trott, C. R., & Sunderland, D. 2014,Journal of Parallel and Distributed Computing, 74, 3202,doi: 10.1016/j.jpdc.2014.07.003Chirvasa, M., & Husa, S. 2010, Journal of ComputationalPhysics, 229, 2675, doi: 10.1016/j.jcp.2009.12.016Clough, K., Figueras, P., Finkel, H., et al. 2015.https://arxiv.org/abs/1503.03436Damour, T., Nagar, A., Hannam, M., Husa, S., &Brügmann, B. 2008, Phys. Rev., D78, 044039,doi: 10.1103/PhysRevD.78.044039Daverio, D., Dirian, Y., & Mitsou, E. 2018.https://arxiv.org/abs/1810.12346Dietrich, T., & Bernuzzi, S. 2015, Phys.Rev., D91, 044039,doi: 10.1103/PhysRevD.91.044039Dietrich, T., Radice, D., Bernuzzi, S., et al. 2018, Class.Quant. Grav., 35, 24LT01,doi: 10.1088/1361-6382/aaebc0Felker, K. G., & Stone, J. M. 2018, Journal ofComputational Physics, 375, 1365,doi: 10.1016/j.jcp.2018.08.025Fernando, M., Neilsen, D., Lim, H., Hirschmann, E., &Sundar, H. 2018, doi: 10.1137/18M1196972Friedrich, H. 1985, Communications in MathematicalPhysics, 100, 525, doi: 10.1007/BF01217728Galaviz, P., Bruegmann, B., & Cao, Z. 2010, PhysicalReview D, 82, 024005, doi: 10.1103/PhysRevD.82.024005Goldberg, J. N., MacFarlane, A. J., Newman, E. T.,Rohrlich, F., & Sudarshan, E. C. G. 1967, J. Math.Phys., 8, 2155 Goodale, T., Allen, G., Lanfermann, G., et al. 2003, inVector and Parallel Processing – VECPAR’2002, 5thInternational Conference, Lecture Notes in ComputerScience (Berlin: Springer)Grete, P., Glines, F. W., & O’Shea, B. W. 2019,arXiv:1905.04341 [astro-ph, physics:physics].https://arxiv.org/abs/1905.04341Gundlach, C., Martin-Garcia, J. M., Calabrese, G., &Hinder, I. 2005, Class. Quant. Grav., 22, 3767,doi: 10.1088/0264-9381/22/17/025Gustafsson, B., Kreiss, H.-O., & Oliger, J. 2013,Time-dependent problems and difference methods; 2nded., Pure and applied mathematics a wiley series of texts,monographs and tracts (Somerset: Wiley).https://cds.cern.ch/record/2122877Hannam, M., Husa, S., Ohme, F., Müller, D., & Brügmann,B. 2010, Phys. Rev., D82, 124008,doi: 10.1103/PhysRevD.82.124008Healy, J., Lousto, C. O., Lange, J., et al. 2019, Phys. Rev.D, 100, 024021, doi: 10.1103/PhysRevD.100.024021Herrmann, F., Hinder, I., Shoemaker, D., & Laguna, P.2007, Classical and Quantum Gravity, 24, S33,doi: 10.1088/0264-9381/24/12/S04Hilditch, D., Bernuzzi, S., Thierfelder, M., et al. 2013, Phys.Rev., D88, 084057, doi: 10.1103/PhysRevD.88.084057Hilditch, D., & Ruiz, M. 2018, Class. Quant. Grav., 35,015006, doi: 10.1088/1361-6382/aa96c6Hilditch, D., Weyhausen, A., & Brügmann, B. 2016, Phys.Rev., D93, 063006, doi: 10.1103/PhysRevD.93.063006Holmström, M. 1999, SIAM Journal on ScientificComputing, 21, 405, doi: 10.1137/S1064827597316278Huerta, E. A., Haas, R., Jha, S., Neubauer, M., & Katz,D. S. 2019, Computing and Software for Big Science, 3,5, doi: 10.1007/s41781-019-0022-7Husa, S., González, J. A., Hannam, M., Brügmann, B., &Sperhake, U. 2008, Class. Quant. Grav., 25, 105006,doi: 10.1088/0264-9381/25/10/105006Jani, K., Healy, J., Clark, J. A., et al. 2016, Class. Quant.Grav., 33, 204001, doi: 10.1088/0264-9381/33/20/204001Ketcheson, D. I. 2010, Journal of Computational Physics,229, 1763, doi: 10.1016/j.jcp.2009.11.006Kidder, L. E., et al. 2017, J. Comput. Phys., 335, 84,doi: 10.1016/j.jcp.2016.12.059Kreiss, H. O., & Oliger, J. 1973, Methods for theapproximate solution of time dependent problems(Geneva: International Council of Scientific Unions,World Meteorological Organization)LIGO Scientific Collaboration. 2018, LIGO AlgorithmLibrary - LALSuite, free software (GPL),doi: 10.7935/GT1W-FZ16 Lindblom, L., Scheel, M. A., Kidder, L. E., Owen, R., &Rinne, O. 2006, Class.Quant.Grav., 23, S447,doi: 10.1088/0264-9381/23/16/S09Loffler, F., et al. 2012, Class. Quant. Grav., 29, 115001,doi: 10.1088/0264-9381/29/11/115001Lousto, C. O., Nakano, H., Zlochower, Y., & Campanelli,M. 2010, Phys.Rev., D82, 104057,doi: 10.1103/PhysRevD.82.104057Mewes, V., Zlochower, Y., Campanelli, M., et al. 2020,Phys. Rev. D, 101, 104007,doi: 10.1103/PhysRevD.101.104007—. 2018, Physical Review D, 97, 084059,doi: 10.1103/PhysRevD.97.084059Miller, J., Dolence, J., Gaspar, A., et al. 2021, Parthenonperformance portable AMR framework,https://github.com/lanl/parthenonMorton, G. M. 1966, A computer oriented geodetic database and a new technique in file sequencing, Tech. rep.Mösta, P., Mundim, B. C., Faber, J. A., et al. 2014,Class.Quant.Grav., 31, 015005,doi: 10.1088/0264-9381/31/1/015005Müller, D., & Brügmann, B. 2010, Class. Quant. Grav., 27,114008, doi: 10.1088/0264-9381/27/11/114008Nagar, A., et al. 2018, Phys. Rev., D98, 104052,doi: 10.1103/PhysRevD.98.104052Nakamura, T., Oohara, K., & Kojima, Y. 1987, Prog.Theor. Phys. Suppl., 90, 1Nakano, H. 2015, Classical and Quantum Gravity, 32,177002, doi: 10.1088/0264-9381/32/17/177002Nakano, H., Zlochower, Y., Lousto, C. O., & Campanelli,M. 2011, Phys.Rev., D84, 124006,doi: 10.1103/PhysRevD.84.124006Peters, P. C. 1964, Phys. Rev., 136, B1224,doi: 10.1103/PhysRev.136.B1224Peters, P. C., & Mathews, J. 1963, Phys. Rev., 131, 435,doi: 10.1103/PhysRev.131.435Pollney, D., Reisswig, C., Schnetter, E., Dorband, N., &Diener, P. 2011, Phys. Rev., D83, 044045,doi: 10.1103/PhysRevD.83.044045Pretorius, F. 2005, Phys. Rev. Lett., 95, 121101,doi: 10.1103/PhysRevLett.95.121101Punturo, M., Abernathy, M., Acernese, F., et al. 2010,Class.Quant.Grav., 27, 194002,doi: 10.1088/0264-9381/27/19/194002Purrer, M., Husa, S., & Hannam, M. 2012, Phys. Rev. D,85, 124051, doi: 10.1103/PhysRevD.85.124051Radice, D., Bernuzzi, S., & Perego, A. 2020, Ann. Rev.Nucl. Part. Sci., 70,doi: 10.1146/annurev-nucl-013120-1145410