[PDF] The moving mesh code Shadowfax

Abstract

We introduce the moving mesh code Shadowfax, which can be used to evolve a mixture of gas, subject to the laws of hydrodynamics and gravity, and any collisionless fluid only subject to gravity, such as cold dark matter or stars. The code is written in C++ and its source code is made available to the scientific community under the GNU Affero General Public License. We outline the algorithm and the design of our implementation, and demonstrate its validity through the results of a set of basic test problems, which are also part of the public version. We also compare Shadowfax with a number of other publicly available codes using different hydrodynamical integration schemes, illustrating the advantages and disadvantages of the moving mesh technique.

Full PDF

TThe moving mesh code

Shadowfax

B. Vandenbroucke a, ∗ , S. De Rijcke a a Dept. Physics & Astronomy, Ghent University, Krijgslaan 281, S9, 9000 Gent, Belgium

Abstract

We introduce the moving mesh code

Shadowfax , which can be used to evolve a mixture of gas, subject to the laws ofhydrodynamics and gravity, and any collisionless ﬂuid only subject to gravity, such as cold dark matter or stars. Thecode is written in C++ and its source code is made available to the scientiﬁc community under the GNU Aﬀero GeneralPublic License. We outline the algorithm and the design of our implementation, and demonstrate its validity throughthe results of a set of basic test problems, which are also part of the public version. We also compare

Shadowfax with anumber of other publicly available codes using diﬀerent hydrodynamical integration schemes, illustrating the advantagesand disadvantages of the moving mesh technique.

Keywords: methods: numerical, hydrodynamics

1. Introduction

Modern simulations of galaxy formation and evolutioncrucially depend on an accurate treatment of the hydro-dynamics of the interstellar medium (ISM) (Vogelsbergeret al., 2014; Schaye et al., 2015). The ISM fuels star for-mation and is disrupted by stellar feedback, and it is thiscomplex interplay that at least partly governs the observ-able content of galaxies (Verbeke, Vandenbroucke, and DeRijcke, 2015). If we want to be able to compare simulatedgalaxies with observations, we need to properly resolvethese eﬀects.Hydrodynamics is also important on smaller scales,when simulating star-forming clouds (Greif et al., 2011;Dobbs, 2015), feedback from a single star (Geen et al.,2015), or even planet formation in a circumstellar disc(Duﬀell and MacFadyen, 2012). A robust hydrodynami-cal integration scheme, optionally extended with magneticﬁelds, self-gravity or radiation transport, is hence an in-dispensable tool for many astrophysical simulators.Historically, two major classes of hydrodynamical solvershave been developed : grid based Eulerian techniques (Teys-sier, 2002; Keppens et al., 2012), and particle-based La-grangian techniques (Springel, 2005; Price, 2012). Bothdiscretize the ﬂuid as a ﬁnite set of ﬂuid elements. In theformer, the ﬂuid elements are cells, usually deﬁned througha (hierarchical) Cartesian grid, which have a ﬁxed positionin space, but can be allowed to reﬁne or dereﬁne accordingto the quality of the integration. In the latter, the ﬂuid el-ements are particles, which move along with the ﬂow, withthe hydrodynamics being expressed as inter-particle forces. ∗ Corresponding author

Email addresses: [email protected] (B. Vandenbroucke), [email protected] (S. De Rijcke)

It is generally acknowledged that grid based Eulerian tech-niques are more accurate at solving the equations of hy-drodynamics, especially since many particle-based imple-mentations have fundamental diﬃculties in resolving hy-drodynamical instabilities (Agertz et al., 2007). Nonethe-less, Lagrangian techniques are widely used to simulatesystems with a high dynamic range, like cosmological sim-ulations and simulations of galaxies, since they more nat-urally concentrate computational resources on regions ofinterest, and provide a Galilean invariant reference frame.Recently, a new class of hydrodynamical solvers hasbeen developed, mainly through the work of Springel (2010),which aims to combine the advantages of Eulerian andLagrangian techniques (see also Duﬀell and MacFadyen,2011; Yalinewich, Steinberg, and Sari, 2015). This newtechnique uses a moving grid to discretize the ﬂuid, andcombines an unstructured grid based ﬁnite volume inte-gration scheme with the Lagrangian nature of a particlemethod. We will refer to this method as a moving meshtechnique .A number of moving mesh codes are presented in theliterature (Springel, 2010; Duﬀell and MacFadyen, 2011),but only two of them are publicly available : rich (Ya-linewich, Steinberg, and Sari, 2015), written in C++, and FVMHD3D , written in the parallel object-oriented lan-guage Charm++ (Gaburov, Johansen, and Levin, 2012).In this paper, we introduce the new, publicly availablemoving mesh code

Shadowfax (the logo of the code isshown in Figure 1).

Shadowfax is written in C++, andmakes ample use of the object-oriented capabilities of thelanguage to provide an easy to extend framework. Thecode is parallelized for use on distributed memory systems ascl:1410.005 https://github.com/egaburov/fvmhd3d Preprint submitted to Astronomy & Computing May 13, 2016 a r X i v : . [ a s t r o - ph . I M ] M a y igure 1: The Shadowfax logo. using the Message Passing Interface (MPI) , and makesuse of the open source Boost

C++ libraries to extend ba-sic C++ language features. The code supports input andoutput using the HDF5 library in a format compatiblewith the output of Gadget2 (Springel, 2005), gizmo (Hopkins, 2015) and swift (Gonnet et al., 2013). A userfriendly compilation process is guaranteed through the useof CMake .The hydrodynamical algorithm implemented in Shad-owfax is the same as described by Springel (2010), butwith an additional per-face slope limiter and ﬂux limiter,and optional alternative approximate Riemann solvers. Thegravitational calculation is the same as the tree force cal-culation in

Gadget2 (Springel, 2005), and uses the samerelative tree opening criterion and Ewald summation tech-nique for periodic boundary conditions. We have portedthis algorithm to an object-oriented version, which makesuse of compile-time polymorphism using C++ templates.This ensures a clear separation of the algorithmic detailsunderlying the tree walk from the actual physics involvedwith the gravitational calculation. This way, it is mucheasier to focus on one particular aspect of the code, e.g.scalibility, precision..., without needing to worry aboutother aspects.Likewise, we have separated the geometrical detailscontained in the moving mesh from the hydrodynamicalintegration as mush as possible, to make it easier to re-place parts of the algorithm (e.g. the Riemann solver, thegrid...) by simply implementing an alternative class.Our code is predominantly meant to be used in as-trophysical simulations of galaxy formation and evolution,but could have applications in other areas of science aswell, as it is not diﬃcult to replace the Euler equations ofhydrodynamics by e.g. the shallow water equations by im-plementing a diﬀerent Riemann solver. Furthermore, theVoronoi grid used to discretize the ﬂuid can also be usedfor other purposes, e.g. for the suppression of Poissonnoise in randomly sampled distributions through Lloyd’s ascl:0003.001 ascl:1410.003 http://icc.dur.ac.uk/swift/ https://cmake.org algorithm (Lloyd, 1982), or as density estimator in N-bodysimulations (Cloet-Osselaer et al., 2014).In this paper, we outline the basic working of Shad-owfax . We mainly focus on the C++ implementation andthe object-oriented design of our code, and compare ourcode with other hydrodynamical solvers on a number oftest problems. Although the current version of

Shadow-fax focusses more on design and accuracy than on per-formance, we also highlight some basic strong and weakscaling tests. Performance optimizations and extra physi-cal ingredients (e.g. gas cooling, star formation and stel-lar feedback...) will be added in future versions of thecode. The source code of

Shadowfax is publicly avail-able from https://github.com/AstroUGent/shadowfax ,and is distributed under the GNU Aﬀero General PublicLicense .

2. Algorithm

Many of the algorithms implemented in

Shadowfax were already discussed in Springel (2005) and Springel(2010). For completeness, we summarize them below andpoint out the diﬀerences where necessary.

Shadowfax is based on a ﬁnite volume method, whichsubdivides the computational box into a (large) number ofsmall cells. The hydrodynamical integration is governedby the exchange of ﬂuxes between these cells.These ﬂuxes involve the conserved variables : mass( m ), momentum ( p ) and total energy ( E ). The Eulerequations of hydrodynamics however are usually formu-lated in terms of primitive variables : density ( ρ ), ﬂow ve-locity ( v ) and pressure ( p ). The pressure is sometimesreplaced by the thermal energy ( u ) or some form of en-tropic function of the ﬂuid, by using the equation of state of the ﬂuid. In this work, we will always assume an idealgas, with an equation of state of the form p = ( γ − ρu, (1)where γ is the adiabatic index of the gas, for which we willadopt the value γ = 5 /

3, unless otherwise stated. The con-served variables and primitive variables can be convertedinto one another whenever a volume ( V ) is available, since m = ρV (2) p = m v (3) E = mu + 12 m v . (4)It is common practice to combine the conserved andprimitive variables into two state vectors , Q =  m p E  and W =  ρ v p  . (5) Q i for a cell i , duringan integration time step of length ∆ t , is then given by∆ Q i = − ∆ t (cid:88) j A ij F ij ( W i , W j , ∇ W i , ∇ W j , v ij , ∆ t ) , (6)where A ij is the surface area of the interface between cell i and cell j . F ij is the ﬂux between cell i and cell j , whichin general depends on the primitive variables of both cellsand their gradients, the velocity v ij of the face with respectto a frame of reference ﬁxed to the simulation box, and theintegration time step.When formulated in this way, the ﬁnite volume methodcan be applied to any discretization of the ﬂuid, as longas this discretization yields volumes to convert conservedvariables to primitive variables, and deﬁnes a concept ofneighbour relations between cells, and an associated sur-face area and velocity for the neighbour interface. It caneven be applied to mesh-free, particle-based methods (Hop-kins, 2015).In the case of a moving mesh method, the discretizationis given by an unstructured Voronoi mesh, a 2D example ofwhich is shown in Figure 2. The mesh is deﬁned by meansof a set of mesh generating points ( generators ), with thecell associated with a speciﬁc generator containing the re-gion of space closest to that generator. A Voronoi meshcan be deﬁned in D dimensions, but we will focus on thecases D = 2 and D = 3. The Voronoi mesh has the in-teresting property that it is relatively stable under smallmovements of the generators (Reem, 2011), so that cellsdeform continuously under a continuous movement of thegenerators. We exploit this property to allow the mesh tomove in between integration time steps. The surface areaof the faces will change linearly in between steps, so thatthe time averaged ﬂux over the entire time step will becorrect, even if it is calculated at one speciﬁc moment intime.By setting the velocities of the mesh generators equalto the local ﬂow velocity, the mesh will eﬀectively movealong with the ﬂuid, and we end up with a Lagrangianmethod. We can even reinterpret the generators as beingparticles, so that the moving mesh technique becomes atrue alternative for particle-based methods. However, theunderlying integration scheme uses the full strength of aﬁnite volume method, and hence will be more accurate.Note that the quality of the integration will depend on theshape of the cells, with highly irregular cells leading to lessaccuracy (Vogelsberger et al., 2012). To ensure cell reg-ularity, it is sometimes necessary to add extra correctionterms to the generator velocities. We employ the schemeof Springel (2010) and steer the cell generator towards thecentroid of its cell if the distance between generator posi-tion and cell centroid exceeds a fraction of the generic cellsize, i.e. the radius of a sphere with the same volume asthe cell.In the remainder of this section, we describe the mesh construction and ﬂux calculation in more detail, to intro-duce the concepts that are used in the discussion of the Shadowfax implementation.

We construct the Voronoi mesh through its dual De-launay triangulation. The latter is constructed using anincremental construction algorithm. In the default im-plementation, the relevant parts of the mesh are recon-structed for every time step. We also experimented witha mesh evolution algorithm (Vandenbroucke & De Rijcke,in preparation), which evolves the mesh instead of recon-structing it.As Richard Shewchuk (1997) pointed out, incremen-tal Delaunay construction algorithms can become unsta-ble due to numerical round-oﬀ error. To prevent thisfrom happening, we employ arbitrary exact arithmeticsfor all geometrical tests involved. Since the predicates ofRichard Shewchuk (1997) depend on a number of assump-tions on the internal CPU precision that are not met on allhardware architectures, we use the technique outlined bySpringel (2010) : we map the ﬂoating point coordinates ofthe mesh generators to the interval [1 ,

2] and use an inte-ger representation of the mantissa to exactly calculate theresult of a geometrical test if the numerical error couldlead to a wrong result. We pre-calculated the maximalsize of an integer necessary to store the exact result anduse the

Boost Multiprecision library to perform thecalculations using long integer arithmetics.The result of the mesh construction is a list of neigh-bours for every mesh generator, and an associated list offaces, with each face consisting of an ordered list of vertexpositions. We need to compute the volume and geometri-cal centroid of each cell from this, as well as the surfacearea and midpoint of every face.In 2D, the faces each contain only two vertices, withthe midpoints being the midpoints of the line segmentsformed by these two vertices. In this case, we order theneighbours and faces counter-clockwise around the posi-tion of the generator of the cell. The volume (2D surfacearea) of the cell is then given by the sum of the surfaceareas of the triangles that are formed by the ﬁrst vertexand two consecutive other vertices. The centroid of thecell is then the weighted average of the centroids of thesesame triangles.In 3D, we use a similar technique to obtain the sur-face area and midpoint of the individual faces. The totalvolume of the cell is then the sum of the volumes of thepyramids with the faces as base and the cell generator astop (the latter being guaranteed to lie inside the cell). Thecentroid is the weighted average of the centroids of thesepyramids.The mesh construction algorithm only works if all cells,including those at the borders of the simulation volume, .0 0.3 0.6 x y x Figure 2: Density colour plot for the 2D Liska and Wendroﬀ (2003) implosion test at t = 0 . Left: general view of the entire simulation box, right: zoom in on part of the central low density region. The Voronoi mesh used for the discretization of the gas has been overplotted. have boundaries. To make sure this is the case, we al-ways enclose all generators in a simulation box, which is acuboid. Cells at the boundaries are then completed by in-serting ghost generators. If we want periodic boundaries,we insert periodic copies of generators at the other side ofthe simulation box. We can also mirror the generator po-sitions themselves with respect to the boundary face of thesimulation box, in this case we have reﬂective boundaries.During the ﬂux calculation, these ghost generators theneither represent the cell of which they are a periodic copy,or the border cell itself, but with the sign of the ﬂow ve-locity along the boundary face normal reversed.

We employ a Monotonic Upwind Scheme for Conserva-tion Laws (MUSCL) combined with a Hancock predictionstep to estimate the ﬂuxes. This scheme is more com-monly referred to as a MUSCL-Hancock scheme. The ﬂuxcalculation consists of a cell based interpolation and inte-gration step, whereby the primitive variables at the centerof the cell (the centroid of the Voronoi cell) are interpo-lated to the position of the midpoint of the face using thegradients of the primitive variables in that cell, and arepredicted forward in time for half the time step using thesame gradients and the Euler equations in primitive form.These face reconstructed primitive variables are thenused as the input for a Riemann solver, to obtain appropri-ately averaged primitive quantities at the interface whichtake into account the local wave structure of the Eulerequations (Toro, 2009). This can be done using either an exact, iterative Riemann solver, or an approximate Rie-mann solver. The solution of the Riemann problem, W ∗ ,is then used to calculate the hydrodynamical ﬂuxes as F ( W ∗ ) =  ρ ∗ v ∗ ρ ∗ v ∗ v ∗ + p ∗ ρ ∗ ( u ∗ + v ∗ ) v ∗ + p ∗ v ∗  . (7)The Riemann problem is generally formulated in 1D,with the change in variables corresponding to a change in x coordinate. In an unstructured mesh, the face normalsare generally not aligned with the x axis, so that we needto rotate the primitive variables to a reference frame wherethe x axis is aligned with the surface normal of the face.This only aﬀects the ﬂuid velocity components. The so-lution of the Riemann problem then needs to be rotatedback to the original reference frame.When the cells are allowed to move, the faces will moveas well, and we need to transform to a frame of referencemoving along with the face before solving the Riemannproblem. This again only aﬀects the ﬂuid velocity compo-nents of the primitive variables.However, the ﬂux needs to be adapted in this case aswell : even if the hydrodynamical ﬂux through the facewould normally be zero, there will be net ﬂux, caused onlyby the movement of the cell. This ﬂux is given by F mov ( W ∗ ) = −  ρ ∗ v ij ρ ∗ v ∗ v ij ρ ∗ ( u ∗ + v ∗ ) v ij  . (8)4e can either add this correction ﬂux to the hydrodynam-ical ﬂux (Springel, 2010), or calculate the hydrodynamicalﬂux in a reference frame moving along with the face andcorrect for the movement afterwards (Pakmor, Bauer, andSpringel, 2011). A side eﬀect of the gradient interpolation step dis-cussed above is the possible introduction of new extremaof the primitive variables, which can cause spurious oscil-lations in the solution (Toro, 2009). These spurious oscil-lations can be avoided by constructing monotone schemes,whereby the gradients of a cell are limited so that no inter-polated primitive value can exceed the primitive variablesin one of the neighbouring cells. However, such monotoneschemes are no longer second order accurate in space, sothat a limiting procedure inevitably leads to a loss of accu-racy. It might therefore be better to use a somewhat lessrestrictive gradient limiter, as long as the spurious oscilla-tions do not dominate the local solution (Hopkins, 2015).We use the cell wide slope limiter described by Springel(2010).Apart from a cell wide slope limiting procedure thatlimits the gradients of the cell during the gradient calcula-tion (which is performed after the primitive variables havebeen calculated, but before the ﬂuxes are computed), it isalso possible to limit the interpolated values at the faces.The general idea of such a pair-wise slope limiter is to con-serve the wave structure of the Riemann problem. Thiswave structure generally consists of a central contact dis-continuity and a left and right wave, which can be eithera shock wave or a rarefaction wave (Toro, 2009). Spuri-ous oscillations arise when the solution of the Riemannproblem with the cell centered primitive variables as inputyields e.g. a left and right rarefaction wave, while the Rie-mann problem with the interpolated variables at the faceyields a left shock wave and a right rarefaction wave. Theleft shock is not present in the ﬁrst order solution, andintroduces a growing artefact in the solution. With a pair-wise limiter, we limit the left and right interpolated valuesin such a way that the original wave structure of the Rie-mann problem is the same, but the input values can stilldiﬀer from the cell centred values. We implemented thepair-wise slope limiter of Hopkins (2015).Even with appropriate slope limiters, it is still possi-ble that the calculated ﬂuxes are too large, i.e. exceed thevalue of the conserved variables in the cell. For the massand energy of the cell, this is fatal, since this can causenegative masses and energies, which are evidently unphys-ical. This can happen for example if the integration timestep is too large, if the gradient interpolation is done in anasymmetric way (due to pair-wise limiting), or if externalforces (e.g. gravity) contribute to the ﬂux. The former isnormally excluded by choosing an appropriate time stepcriterion. To prevent the latter from crashing the code,we implemented a ﬂux limiter , which ensures that the ﬂuxthrough a face can never be larger than a fraction of the value of the conserved variables inside the cell. This frac-tion is equal to the ratio of the surface area of the face tothe total surface area of the entire cell.We note that a ﬂux limiter is only used to ensure codestability, and in this sense is equal to resetting the mass orenergy of a cell to some very small value whenever they be-come negative. However, by limiting the ﬂux and not thecell quantities themselves, we ensure manifest conservationof mass and energy, which would otherwise be violated.

Gravity is added as an extra term in the momentumequation:∆ p i = − ∆ t (cid:88) j A ij F ij, p −

12 ∆ t ( m i, old ∇ i Φ old + m i, new ∇ i Φ new ) , (9)where m i, old and m i, new represent the mass inside the cellbefore and after the update of the mass respectively, and ∇ i Φ old and ∇ i Φ new represent the gravitational acceler-ation before and after the generator positions have beenupdated. F ij, p is the hydrodynamical ﬂux for the momen-tum, as given above.The gravitational acceleration is also taken into ac-count during the half step prediction, before the ﬂux cal-culation, and only aﬀects the velocity.Gravity also aﬀects the total energy of the cell. Simplyadding a term to the energy equation does not take intoaccount the movement of the mass that ﬂuxes through cellfaces during the time step, and leads to signiﬁcant energyerrors. We therefore use the following more involved equa-tion to update the energy of a cell:∆ E i = − ∆ t (cid:88) j A ij F ij,E −

12 ( m i, old w i, old ∇ i Φ old + m i, new w i, new ∇ i Φ new ) − (cid:88) j ∆ m ij ( r i − r j ) ( ∇ i Φ old + ∇ i Φ new ) , (10)where w i, old and w i, new represent the generator velocitiesbefore and after the update of the generator positions, and∆ m ij is the mass that ﬂuxed from cell i to cell j duringthe time step. The sum extends over all neighbours of thecell.We soften the gravitational acceleration for both thehydrodynamical and collisionless component using a splinekernel with ﬁxed softening length (Springel, 2005). A grav-itational time step criterion based on the size of the gravi-tational acceleration and the softening length is combinedwith the hydrodynamical time step criterion to set theparticle time steps.5 imulationSimulation::initializeVorTessManagerRiemannSolverTimeLineSimulation::loadParticleVectorSimulation::main loop VorTessExactRiemannSolverSnapshotWriterGadgetSnapshotWriterSnapshotReaderGadgetSnapshotReader Figure 3: General overview of a simulation, using the default Voronoimesh, Riemann solver and input/output classes. After the desiredimplementations of the main classes have been constructed duringan initialization step, the initial conditions are read in and stored ina

ParticleVector data structure that will be updated during the mainsimulation loop.

3. Implementation

The ﬁnite volume method and the actual discretiza-tion of the ﬂuid as a Voronoi mesh are clearly well sepa-rated concepts, so that it makes sense to separate them inthe design of a moving mesh code. It then becomes pos-sible to test the ﬁnite volume method using e.g. a ﬁxedCartesian mesh, which is computationally much cheaper toconstruct, or to treat the mesh generators as particles andcalculate volumes and interfaces using a mesh-free method(Hopkins, 2015).Furthermore, various parts of the ﬁnite volume methodcan be adapted, and could be considered to be run timeparameters for the code : the choice of Riemann solver, thechoice of slope limiter,... A good code design should makeit possible to easily exchange these components withoutaﬀecting other parts of the algorithm, and where possiblealso without the need to recompile the code.Figure 3 shows a general overview of a simulation, asit is implemented in the

Simulation class, introducing themain classes of interest. Figure 4 shows how these diﬀerentclasses interact during the main simulation loop.In this section, we describe how various aspects of thealgorithm were implemented in the public version of

Shad-owfax . We highlight important abstractions, but also in-dicate where our current version does not comply with the

Simulation::main loopTimeLine::step forwardContinue simulation? StopWrite snapshot? SnapshotWriter::write snapshotKick colisionless particleskick cell generatorsExchange hydrodynamicalﬂuxesVorTessManager::estimate gradientsVorTessManager::calculate ﬂuxes(RiemannSolver)Drift colissionless particlesdrift cell generatorsUpdate conserved variablesParticleVector::sort ParticleVector::update treeVorTessManager::update gridParticleVector::Tree::calculate accelerations < GravityWalker > Kick colisionless particleskick cell generatorsUpdate primitivevariables VorTessManager::get volumeTimeLine::calculate timestep yesyes

Figure 4: Overview of the main simulation loop. We combine aleapfrog integration for the gravitational force with a simple kick-drift scheme for the hydrodynamics. Geometrical properties like cellvolumes, neighbour relations and face properties are extracted fromthe

VorTessManager , gravitational forces are calculated using the

Tree managed by the

ParticleVector , using a template

GravityWalker .All time related stuﬀ is handled by the

TimeLine . We have isolated the Voronoi mesh in a

VorTessMan-ager class, which encapsulates the entire mesh into a singleobject. Particles can be added to the

VorTessManager asa pair consisting of a set of coordinates and an integer key,so that the properties of the cell or mesh-free particle canthen later be retrieved by means of that same key. The

VorTessManager itself decides how these properties arecalculated. This can be either by using a Voronoi mesh, byusing a ﬁxed Cartesian grid, or even by using a mesh-freemethod. The Voronoi mesh could be constructed from thepositions of the particles, but it can also be evolved fromthe Voronoi mesh from a previous time step. The publicversion of

Shadowfax for the moment only contains theordinary (non-evolved) Voronoi mesh and the ﬁxed Carte-sian grid. The evolving mesh will be added as part offuture work (Vandenbroucke & De Rijcke, in preparation).The ﬂuxes are calculated through faces, which could inprinciple be returned by the

VorTessManager as VorFace objects, which contain the surface area and midpoint ofthe face, as well as the indices of the two neighbouringgenerators. However, due to the historical developmentof the code and the complexities involved with parallelliz-ing the Voronoi mesh, the ﬂux calculation is implementedas a method of the

VorTessManager object for the mo-ment, with the entire interpolation, integration and ﬂuxestimation procedure being implemented as a method ofthe

VorFace object.

The ordinary Voronoi mesh is contained in a

VorTess object, which contains an actual list of the faces and ofthe cells, represented as

VorCell objects. The

VorCell ob-ject contains the necessary methods to calculate volumesand centroids, and in the current version of the code alsothe methods that estimate the gradients and calculate thegenerator velocities.When adding a particle to the

VorTess , the particlecoordinates are mapped to the range [1 ,

2] and stored in a

VorGen object, which represents a mesh generator. Thisobject is stored in a

DelTess object, which represents theDelaunay triangulation. The

DelTess object also containsa list of

Simplex objects, which represent the 2D trianglesor 3D tetrahedra that constitute the triangulation. Each

Simplex object has 3 or 4

VorGen objects as vertices (de-pending on the dimension). During the incremental con-struction algorithm, these simplices will change. When allparticles have been added to the

VorTess object, we call anappropriate method to signal this. At this point, a methodof the

DelTess object is called that will ensure complete-ness of the Delaunay triangulation (by adding inactive par-ticles that are neighbour to active particles, and ghost gen-erators for particles on other MPI processes). When thisis done, the actual Voronoi mesh is constructed by loopingover the

VorGen objects stored in the

DelTess and addinga

VorCell for every

VorGen . For every neighbour relationbetween

VorGen s, a

VorFace is created. At the end ofthe procedure, we also calculate the volumes, centroids,surface areas and midpoints of all cells and faces.

When the ﬂux calculation method of the

VorTessMan-ager object is called, we pass on a general

RiemannSolver object, which is used to solve the Riemann problem atthe faces, but also to convert between primitive and con-served variables, and to calculate the actual ﬂuxes. The

RiemannSolver class itself is just an interface, with the ac-tual implementation being deferred to child objects. Thepublic version of

Shadowfax implements two Riemannsolvers : an exact Riemann solver (

ExactRiemannSolver class), and a two rarefaction Riemann solver (

TRRSSolver class), which is an approximate solver that assumes a wavestructure containing two rarefaction waves.Which Riemann solver is used, is speciﬁed as a run timeparameter. To generate the appropriate

RiemannSolver implementation, a

RiemannSolverFactory class is used,which implements methods that convert a string represen-tation of a Riemann solver type to an object. A single

RiemannSolver object is created during program initial-ization and used throughout the simulation, so that the

RiemannSolver can store some valuable information aboutthe number of Riemann solver evaluations and the fractionof the run time spent in Riemann solver evaluations.Since the Riemann solver contains all information aboutthe actual equations being solved, it is possible to exper-iment with new equations of state by writing new

Rie-mannSolver implementations.For all tests presented in this paper, we use the exactRiemann solver.

Apart from the spatial discretization, the system ofequations is also discretized in time. We have implementedan individual time stepping scheme, in which cell timesteps are power of 2 subdivisions of the total simulationtime. Fluxes are always exchanged using the smallest timestep of the two cells involved, and the time step of a cellcan only increase again if it then becomes synchronizedwith other cells on the same time step level. The systemis always evolved using the smallest time step of all cells,7ut only those cells that are active at the beginning of thecurrent system time step are actually integrated.The size of the time step depends on the hydrodynam-ics and hence should ideally be set in the

RiemannSolver class. However, for problems involving gravity, there is alsoa gravitational time step criterion. Furthermore, the sizeof the time step is also governed by a stability parameter(called the Courant-Friedrichs-Lewy or CFL parameter),which is speciﬁed as a run time parameter for the program.We hence have opted to make the time step calculation amethod of a

TimeLine object, which takes both the hydro-dynamical time step criterion and the gravitational timestep criterion into account and applies the correct CFLparameter. To separate the hydrodynamical aspects, allhydrodynamically relevant information is encoded in twovelocities : the ﬂuid velocity and a sound speed, which iscalculated by a method of the

RiemannSolver object.We also implemented an advanced, tree walk basedtime step criterion to adapt the cell time step in the pres-ence of strong shocks. This criterion makes use of the tem-plate tree walks discussed below. However, for simulationsinvolving regular ﬂow and a high dynamic range, this cri-terion was found to be very expensive, so that we disablethis criterion by default. This criterion can be activatedby setting the corresponding run time parameter.The

TimeLine object also keeps track of the systemtime, and provides methods to convert ﬂoating point timevalues to an internally used integer timeline (which is moreconvenient when using a power of 2 time step hierarchy).The

TimeLine is also responsible for writing snapshot ﬁles.Snapshots are dumps of the positions and primitivevariables of the particles (along with other relevant quan-tities) at a particular system time. The time interval inbetween snapshots can be speciﬁed as a run time parame-ter and is stored in the

TimeLine , together with a counterthat keeps track of the number of snapshot ﬁles alreadywritten. Whenever the system time becomes equal to orlarger than the time for the next snapshot, a signal issent to an implementation of the

SnapshotWriter inter-face, which is responsible for writing the snapshot. Notethat this means that snapshots are not necessarily writtenat the exact time requested, if that time does not coincidewith an actual system time. This is a consequence of ourstrict power of 2 time step hierarchy and the fact that wedo not drift quantities in between system times. Usuallyhowever, the diﬀerence will be small. Note also that theuse of individual time steps means that not necessarily allcells will be active at the time when a snapshot is written.In this case, the primitive variables that were calculatedat the last point in time when a cell was active are usedfor that cell. Again, the diﬀerence will usually be small.The

SnapshotWriter interface deﬁnes a method thatwrites the actual snapshot. Two implementations are pro-vided :

ShadowfaxSnapshotWriter , which writes snap-shots in the historically native

Shadowfax format, and

GadgetSnapshotWriter , which writes snapshots in the sameformat that is also used for gizmo and swift , and was one of the possible snapshot formats for

Gadget2 . Both for-mats are based on the HDF5 ﬁle format.Initial condition ﬁles have the same format as snapshotﬁles, and are read in using the appropriate implementa-tion of the

SnapshotReader interface. Both the format ofthe initial conditions as that of the snapshot ﬁles can bespeciﬁed as a run time parameters for the program, thedefault format being the

Gadget format. Appropriateobjects are generated using the

SnapshotReaderFactory and

SnapshotWriterFactory classes.

Gravitational accelerations are calculated using a Barnes-Hut tree walk (Barnes and Hut, 1986), using a relative treeopening criterion (Springel, 2005). To this end, a hierar-chical octree is constructed : a

Tree object. This objectholds a pointer to a single implementation of the

Node interface, which is called the root of the tree. The

Node interface has two implementations,

TreeNode and

Leaf ,which correspond to respectively a node and a leaf of theoctree. Each

TreeNode has up to 4 or 8 children (depend-ing on the dimension), which themselves are also

Node implementations, and can be either a

TreeNode or a

Leaf .Each

Leaf holds a pointer to a single

Particle , which canbe either a

GasParticle or a

DMParticle (see below). Inparallel simulations, there is also a third type of

Node ,called a

PseudoNode , which is used to represent a

Tree-Node on another MPI process.The tree is constructed using an incremental construc-tion algorithm, and making use of the close link betweenthe levels of the tree and the levels of a Hilbert space ﬁllingcurve (Sundar et al., 2008), the latter also being used forthe domain decomposition (Springel, 2005). To this end,we ﬁrst calculate a Hilbert key for each particle, using theeﬃcient algorithm of Jin and Mellor-Crummey (2005). Weuse a global octree on each process, with nodes that areentirely on other processes being represented by pseudonodes.The gravitational tree walk itself is in essence the sameas in

Gadget2 , using the same Barnes-Hut tree algorithmwith relative opening criterion and Ewald summation totreat periodic boundaries, but it has been entirely rewrit-ten to increase readability and code reuse. To this end, wehave deﬁned a general

TreeWalker interface. Every treewalk, be it a tree walk to obtain gravitational accelera-tions, or a tree walk to ﬁnd the neighbours of a particlewithin a given radius, is then represented by an implemen-tation of this interface. The interface itself deﬁnes methodsthat correspond to the diﬀerent tasks during a tree walk :a method to check whether a node of the tree should beopened or treated approximately, a method called when aleaf is encountered, and a method that is called when apseudo node is encountered and checks whether the treewalk should be continued on another process. Apart fromthis, the interface also deﬁnes methods that initialize thevariables used during the tree walk, and a method that iscalled when the tree walk is ﬁnished.8very possible tree walk is implemented as a singlemethod of the

Tree class, using the concept of C++ tem-plates. To this end, the tree walk method takes the name ofa

TreeWalker implementation as a template argument, aswell as a list of particles for which the tree walk should beperformed. For every particle in the list, a corresponding

TreeWalker object is created and used to walk the tree :we open the root node of the tree and apply the methodthat checks if a node should be opened on its children. Ifa node should be opened, we continue with its children, ifnot, we execute all calculations that are necessary to getthe approximate contribution of the entire node to the treewalk, using the node properties. If a leaf is encountered,the properties of the corresponding particle are used, andwe continue with the next node or leaf on the same level.By using class methods to deﬁne the diﬀerent steps inthe tree walk, the code is a lot easier to read, since, forinstance, the code that decides if a node should be openednow is in a separate method that takes a

TreeNode assingle parameter. We do not need to worry about how toimplement the tree walk itself eﬃciently, only about whathappens when a node or leaf is encountered. By usingC++ templates, we limit the overhead usually involvedwith run time polymorphism, since the code for a speciﬁctype of

TreeWalker is generated at compile time.Our approach has some beneﬁts for MPI parallelliza-tion as well, since all explicit communication (and the de-sign of an eﬀective communication scheme) is limited tothe single tree walk method. The only tree walk speciﬁcthings we need to do, are deciding if communication is nec-essary (which is in most cases similar to deciding whetheror not to open a node), deciding what information to com-municate, and how the result of the tree walk on anotherprocess should be communicated back to the original pro-cess. To this end, our

TreeNode interface deﬁnes two sub-classes, called

Export and

Import . The diﬀerent physical components of the simulationare all represented by

Particle s, which is an abstract classholding a position and velocity, as well as an unique iden-tiﬁer (ID), which can be used to trace a single particlethroughout diﬀerent snapshots. Currently, two subtypesof

Particle are supported :

GasParticle s and

DMParti-cle s, representing the generators of the Voronoi mesh anda collisionless cold dark matter component respectively.We have adopted the name

GasParticle instead of cell toreﬂect the fact that our hydrodynamical method is justan alternative for common particle-based hydrodynamicalintegration schemes. For the gravitational calculation, thegas is treated as if it consists of particles.The

GasParticle class extends the particle data withtwo

StateVector members, representing the primitive andthe conserved quantities for that cell. It also holds a num-ber of other variables required by the hydrodynamical in-tegration scheme. The

DMParticle only holds a particle mass and some auxiliary variables for the gravitational cal-culation.The

Particle class itself extends the

Hilbert Object in-terface, which links a space ﬁlling Hilbert key to it. Everyclass that implements the

Hilbert Object interface can besorted using an eﬃcient

ParallelSorter , based on Siebertand Wolf (2010). The

ParallelSorter takes care of the do-main decomposition and data size based load-balancingaccross all MPI processes. The domain decompositionis currently only based on the number of particles, andtries to assign equal numbers to all processes. This simpleapproach only works for homogeneous setups with smallnumbers of processes, and will be replaced by a more ad-vanced cost-based domain decompisition in future versionsof the code.

Particle s of all types are stored in a

ParticleVector ,which is a specialized wrapper around two standard C++ vector objects. The class has member methods to sepa-rately access

Particle s of both types, and is also respon-sible for maintaining the

Tree and storing some generalinformation about the simulation.

We have adopted the strategy used by swift , andforce all quantities that are used as input or output ofour program to have units attached to them. To this end,both snapshot formats have appropriate blocks specifyingin what units the given data are expressed, given in termsof a reference unit system, which is either the SI or CGSsystem. If no units are speciﬁed for the initial conditionﬁle, SI units are assumed by default. The output units canbe chosen as a run time parameter, as well as the inter-nally used units. Run time parameters that should haveunits are assumed to be in internal units, the default beingSI units.In principle, it does not matter what units are usedinternally in our code, although using a system of units inwhich variables have values close to unity can be advan-tageous. Furthermore, all run time log information willbe expressed in internal units, so that it is useful to havesome idea of what units are used.To simplify working with units, we have implementeda

Unit class. This class stores the quantity for which theunit is used, expressed as a combination of the three basicquantities length , mass and time , and the value of the unitin SI units. Temperature and current should be added tothe set of basic quantities to be able to express all possiblequantities, but these are not used for the variables thatare currently evolved in

Shadowfax . Unit s can be multiplied with or divided by one another,yielding a new

Unit . To make working with quantitiesin this case possible, we have adopted some conventionsof how quantities should be combined. We also made itpossible to compare diﬀerent

Unit s, with two

Unit s beingcompatible if their quantities are equal. Quantities withcompatible

Unit s can be converted into each other by us-ing a

UnitConverter . A combination of a length, mass and9ime unit deﬁnes a complete unit system, a

UnitSet , whichis what we specify as a run time parameter for the pro-gram. We currently support three diﬀerent unit systems :SI units (m, kg, s), CGS units (cm, g, s) and galactic units(kpc, M (cid:12) , Gyr), which can be generated using the

Unit-SetGenerator class.Physical constants also have units, but usually theirvalue is ﬁxed. We have hard coded the values of relevantphysical constants (currently only the gravitational con-stant G = 6 . × − m kg − s − ) in SI units, andstore their value in internal units in a Physics object. Wehave provided a mechanism to override the physical valueof the gravitational constant at run time for test purposes,but there are no plans to generalize this approach for otherphysical constants.

Since

Shadowfax is meant to be used on small clus-ters for simulations that take several days or even weeksto complete, we need to address the possibility that runsmight be interrupted, either by limits on the use of infras-tructure, or by hardware failure. Since the initial conditionﬁles and snapshot ﬁles have the same format, it is alwayspossible to restart a run from the last snapshot. How-ever, since a snapshot is not necessarily written at a timewhen all cells are active, this will aﬀect the outcome of thesimulation.We therefore implemented an explicit restart mecha-nism that dumps the entire simulation to binary ﬁles andthen restarts it as if the run never stopped. This is rep-resented by a

RestartFile object. This object has twotemplate methods, write and read , with several specializa-tions, to write values in all sorts of formats to the binaryﬁle without the need to explicitly type cast anything. Allobjects in the simulation that need to be dumped to therestart ﬁle either implement a dump method, which writesmember variables to the restart ﬁle, or are written entirelyto the restart ﬁle using one of the write specializations.When restarting the run, objects are either created us-ing a special constructor which initializes values by read-ing them from the restart ﬁle, or are read entirely from theﬁle using an appropriate read specialization. To ensure aproper working of the restart mechanism, we only need tomake sure that values are written and read in the sameorder.Since restarting a run is only meant to be done whenthe run was somehow interrupted, the

RestartFile alsowrites a short summary ﬁle, containing compilation, runtime and version information about the program that cre-ated the dump. When restarting, this ﬁle is read in andthe information is compared with the running program.Only when the same version of the code, with the samecompilation time and the same run time environment isused, do we allow the code to restart.

The current version of the code still contains traces ofthe way in which it was originally developed, that do notcomply with our strict design goals. However, refactoringthe code to eliminate these infers a major update, and ispostponed to a future version of the code. We give a briefoverview below. • VorFace still contains the entire gradient reconstruc-tion and prediction step, as well as the ﬂux calcula-tion. These should be isolated into a new class thatuse the properties of the

VorFace , but can be decou-pled from its geometric meaning. • Similarly,

VorCell still contains the gradient estima-tion and the generator velocity calculation. Theseshould be calculated inside a new class that uses the

VorCell properties. • Currently, only the time step tree walk and the grav-itational tree walk use a template

TreeWalker , theneighbour search is still hardcoded in the

Tree class. • The MPI communication for the cells is strongly cou-pled to ghost

VorGen s stored in the

DelTess . As aresult, all cell communication has to go through the

VorTessManager , which is not ideal. • The domain decomposition does not take into ac-count the eﬀective computational cost on an MPIprocess, but is only based on particle number. • The

FixedGrid only works if the particles have spe-ciﬁc positions and do not change their positions. Thismeans the

VorTessManager should have control overthe positions of the particles, and no other class.

4. Setup and analysis

Apart from the main simulation program, the public re-lease of

Shadowfax also includes two auxiliary programsthat can be used to generate initial condition ﬁles, and toconvert snapshot ﬁles to VTK ﬁles that can be easily vi-sualized using common visualization packages. These arediscussed below.

Since the default ﬁle format for

Shadowfax is thesame as for a number of important other astrophysicalcodes, it is possible to use software written for those codesto generate initial conditions for

Shadowfax , or to anal-yse

Shadowfax results. We however also provide ourown initial condition generating software, which is a partof

Shadowfax . It is also worth noting that HDF5 has auser-friendly Python interface, h5py . icmakerXd , (with X being2 or 3), and is based on the initial condition mechanismimplemented in the AMR code ramses (Teyssier, 2002).The simulation box is divided into a number of geometrical regions , each having a geometrical shape and values forthe primitive hydrodynamical variables inside that region.Multiple regions can overlap, in this case the values for theregion that was last added are used in the overlap region.The shape of the region is set by deﬁning the positionof the origin o = ( o x , o y , o z ) of the region, and 3 widths, w x , w y , w z , together with an exponent e . The latter isused to determine if a point p = ( p x , p y , p z ) lies inside theregion. For this, the inequality (cid:20)(cid:18) o x − p x w x (cid:19) e + (cid:18) o y − p x w y (cid:19) e + (cid:18) o z − p z w z (cid:19) e (cid:21) e ≤ ramses scheme and do not onlyallow constant values for the primitive variables inside asingle region, but also more complex expressions, contain-ing mathematical operations (+, − , × , / ), basic math-ematical functions (cos , sin ...), mathematical constants( π ), and even coordinate expressions ( x , y , z and r ). Sup-port for these expressions is provided by Boost Spirit . icmakerXd has support for regular Cartesian setups,but also for random unstructured grids, whereby grid gen-erators are sampled according to the hydrodynamical den-sity of the regions, using rejection sampling. To this end,the (potentially) complex density proﬁles inside the diﬀer-ent regions are numerically integrated to obtain weighingfactors. Allowing complex expressions makes this compu-tationally expensive, but makes the program a powerfultool for initial condition generation.To smoothen out Poisson noise in random generatorsetups, we apply 10 iterations of Lloyd’s algorithm (Lloyd,1982). This yields a more regular initial mesh. Visualizing a Voronoi mesh is not so straightforward,especially in the case of a 3D mesh. Furthermore, the ver-tices of the Voronoi mesh are not written to the snapshotﬁles, so that the mesh needs to be recalculated if we wantto visualize the mesh corresponding to some snapshot ﬁle.This requires a Voronoi construction algorithm, and hencewe have written an auxiliary program, called vtkmakerXd that converts a regular snapshot to a dump of the Voronoimesh, in the VTK ﬁle format . This ﬁle format can beread by software that makes use of the powerful Visualiza-tion Toolkit (VTK), the most commonly used examplesbeing Paraview and VisIt . ascl:1011.007 https://wci.llnl.gov/simulation/computer-codes/visit We have also written a plugin for VisIt that reads inthe default (

Gadget ) snapshot format. It should alsobe possible to use the eXtensible Data Model and For-mat (XDMF) and the XDMF plugins for VisIt and Par-aview, an approach taken by e.g. swift . However, thecurrent version of Shadowfax does not write the neces-sary XDMF ﬁle, so that it needs to be manually created.

5. Basic tests

To validate the code, we have used it to evaluate anumber of test problems. These have been gathered intoa testsuite , together with the necessary ﬁles to create theinitial conditions, run the simulations, and analyze the re-sults. Where possible, we have provided analytic solutionsto compare with.These tests are meant to be run as a general code checkafter every signiﬁcant change in the code, and for this rea-son use relatively low resolution grids. We therefore willnot focus on obtaining the best possible accuracy. Rather,we will focus on accuracy when we compare

Shadowfax with other publicly available codes in a later section.In this section, we describe the physical test problemscurrently in the testsuite and discuss their results. Wedo not describe tests that are used to verify the properworking of the program itself, like the restarttest , whichchecks if the program correctly restarts from restart ﬁles.

This test problem consists of a uniform box with unitlength, in which a ﬂuid with density 0 .

125 and pressure 0 . .

25 is inserted.This test corresponds to one of the Riemann solver tests inToro (2009), but generalized to 2 or 3 dimensions to testgeometrical aspects of the code as well.The solution of this problem consists of an inward trav-elling rarefaction wave, a central contact discontinuity, andan outward travelling shock wave. The simulation resultsin 2 and 3 dimensions at time t = 0 . For this test, a vortex in hydrostatic equilibrium isevolved for some time to check the local conservation ofangular momentum. Inside a box with unit length andconstant density 1, a 2D azimuthal velocity proﬁle of theform (Springel, 2010) v φ ( r ) =  r ≤ r < . − r . ≤ r < .

40 0 . ≤ r (12) .0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 r ρ r Figure 5: Radial density proﬁle for the spherical overdensity test at time t = 0 . Left:

2D result, using 10,000 uniformly sampled cells.

Right:

3D result, using 100,000 uniformly sampled cells. The black dots are the simulation results, the red line corresponds to the solution of thehigh resolution 1D equivalent problem. To limit the number of data points, the simulation results have been binned, the standard deviationof the density within the bins is indicated by the error ﬂags. is balanced by a pressure proﬁle of the form p ( r ) =  r ≤ r < .

29 + r − r + 4 log (cid:0) r . (cid:1) . ≤ r < .

43 + 4 log (2) 0 . ≤ r. (13)In 3D, we use the same proﬁle and expand it to cylin-ders in the third dimension, in a box of size 1 × × /

3. Weevolve the setup to time t = 3, and compare the velocityand pressure proﬁle with the initial proﬁles. The resultsare shown in Figure 6.Both the 2D and 3D test conserve the hydrostatic equi-librium relatively well over a long time scale. This meanslocal angular momentum is conserved to a reasonable de-gree. This problem is meant to test the limits of the code,both of the Riemann solver, the mesh regularization al-gorithm, and the time step criterion. It consists of a boxwith unit length in which a cold medium with density 1and pressure 10 − is in rest. In the central cell, we set thepressure to a much higher value, which corresponds to anenergy input of 1, so that a strong explosion is initiated.To accurately capture the explosion, it is important that • the central cells are kept regular at the start of thesimulation • the cells surrounding the center are given small enoughindividual time steps to be active when the shock ar-rives • the Riemann solver is able to handle vacuum gener-ating conditions to correctly estimate ﬂuxes aroundthe central cellThe resulting shock proﬁle is self-similar and has ananalytic solution (Sedov, 1977), which is shown togetherwith the simulation results in Figure 7. Again, the sim-ulation results are in line with the analytic solution, andthe shock is well-resolved. This test is used to validate the gravitational part ofthe code, and uses cold dark matter instead of gas. Sincegravity is not guaranteed to work in 2D, this problem isonly provided in 3D.A Plummer sphere (Plummer, 1911) with mass 1000and scale parameter 1 is initiated inside a large box (theactual box is irrelevant for this speciﬁc problem, but

Shad-owfax always requires a simulation box to be present).The velocities of the particles are chosen so that the entireproblem is independent of time. For convenience, we setthe gravitational constant G = 1 for this problem. Weadopt a gravitational softening length of 0 . ∼

10 dynamical times. Figure 9 shows the relative error12 .0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 r | v | r Figure 6: Radial velocity proﬁle for the Gresho vortex test at time t = 3. Both the 2D and the 3D simulation use 10,000 uniformly sampledcells. The black dots represent the binned simulation results, with the error ﬂags indicating the standard deviation on the values within thebins. The red line represents the initial velocity proﬁle, which should remain constant. r ρ r Figure 7: Density proﬁle for the Sedov-Taylor blast wave test at time t = 0 .

1. Left: 2D version, using an initially Cartesian grid with 45 × × ×

45 cells. The black dots represent the binned simulation results, withthe error ﬂags indicating the standard deviation of the density values within the bins. The full red line is the analytical solution of Sedov(1977). -2 -1 r -5 -4 -3 -2 -1 ρ t =0t =1 Figure 8: Density proﬁle of the N-body test at the start and end ofa simulation using 10,648 cold dark matter particles. The density iscalculated by summing the masses of all particles within sphericalshells. The dots and triangles represent simulation results, the fullblue line is the theoretical Plummer density proﬁle from which theinitial condition is sampled. t −0.0010−0.00050.00000.00050.00100.00150.0020 E t o t − E t o t , E t o t , Figure 9: The relative energy error of the N-body test as a functionof time. The gray line shows the energy error at all times, the blueline shows the energy error at the times when all particles are active. on the total energy. We see that the system remains stableand that the total energy is quite accurately preserved.

This problem tests the coupling between hydrodynam-ics and gravity, and consists of a self-gravitating gas cloudwith mass 1 and radius 1, with a density proﬁle of the form(Springel, 2010) ρ ( r ) = (cid:40) π ( r +0 . r ≤

10 1 < r. (14)The cloud is initially at rest and has a very low pressureproﬁle of the form p ( r ) = (cid:40) . π ( r +0 . r ≤

10 1 < r, (15)corresponding to a low constant thermal energy. We setthe gravitational constant G = 1 for this test and onlyconsider a 3D version. The softening length is set to 0 . t = 0 .

81 are shown in Figure 10, to-gether with a high resolution result for the equivalent 1Dproblem obtained using a ﬁnite volume method on a ﬁxed1D grid, and the solution of the same setup using the SPHcode

Gadget2 . The relative error on the total energy isshown in Figure 11. The virializing shock is clearly re-solved and travels at the expected speed. We see thattotal energy ﬂuctuates when the shock is formed and sta-bilizes after the system has virialized, which indicates thatthe gravitational correction terms in our scheme are notentirely eﬀective at the current resolution. Better energyconservation can be obtained by using more resolution, butthis is too computationally expensive for the testsuite.

6. Convergence rate

The MUSCL-Hancock ﬁnite volume method implementedin

Shadowfax is nominally second order in both spaceand time. However, due to the use of slope limiters andﬂux limiters, the actual order of the method can be lowerin the presence of strong discontinuities. To test this, westudy the convergence rate as a function of the numberof cells for a number of diﬀerent tests: a one dimensionalsmooth travelling sound wave, a one dimensional shocktube and a two dimensional vortex.14 -2 -1 ρ −1.6−1.2−0.8−0.40.0 v r -2 -1 r P / ρ γ Figure 10: Proﬁles of the Evrard collapse test at time t = 0 .

81, whenthe central virializing shock has formed, but has not yet reached theboundary of the cloud. Top: density proﬁle, middle: radial velocityproﬁle, bottom: entropy proﬁle. The simulation used 20,000 cells intotal, some of which are vacuum cells surrounding the cloud. Theblack dots represent the binned simulation results, with the errorbars indicating the standard deviation on the density values withinthe bins. The red line is the solution of the equivalent 1D problem.The gray points are the solution for the same setup, but using theSPH code

Gadget2 . t −0.010−0.0050.0000.0050.0100.015 E t o t − E t o t , E t o t , Figure 11: The relative energy error for the Evrard collapse test asa function of time. The gray line represents the energy error at alltimes, the blue line represents the energy error at times when all cellsare active.

The initial condition for this test consists of a periodiccubic box with unit side containing a sound wave with asmall amplitude A = 10 − . The density is given by ρ ( x ) = 1 + A sin 2 πx, (16)the pressure is p ( x ) = ρ ( x ) γ /γ , with γ = 5 /

3, and thevelocity is zero everywhere. The wave travels with thesound speed, c s = 1 for this particular choice of pressure.Although this problem is formally one dimensional, westudy it in 3D, using N comoving cells, with N the num-ber of cells in one dimension. We use both an initial Carte-sian grid (which remains Cartesian throughout the simu-lation as the ﬂuid velocity is zero), and a random uniformgrid. We expect the convergence to be better in the formercase, as the Cartesian solution corresponds to a combina-tion of N

1D solutions. For a random uniform grid, thecells are irregularly shaped, and the problem is eﬀectivelymulti-dimensional.We study the convergence rate by comparing the solu-tion ρ i at t = 1, when the wave has travelled for one boxlength, with the initial condition ρ ( x i ) at t = 0. To thisend, we calculate the L1 normL1 = 1 N (cid:88) i | ρ i − ρ ( x i ) | , (17)for all cells i .Figure 12 shows the L1 norm as a function of the equiv-alent 1D cell number N . Both the Cartesian and the ran-dom uniform grid show close to N − scaling, indicatingthat the method is indeed second order in space and time.15 N -9 -8 -7 -6 L Cartesian∼N −2.31 N Uniform random∼N −2.04

Figure 12: L1 norm as a function of 1D cell number for a travellingsound wave in 3D. The blue dots are the simulation results, the redline is a least squares ﬁt.

A more demanding test problem is the Sod shock test,which consists of a reﬂective cubic box with unit length inwhich the density is given by ρ ( x ) = (cid:40) x < . .

25 0 . ≤ x, (18)and the pressure by ρ ( x ) = (cid:40) x < . . . ≤ x. (19)The initial velocity is zero everywhere. This problem is the1D equivalent of the spherical overdensity discussed as partof the testsuite, and its solution has the same characteristicwave components, including a contact discontinuity and ashock wave, for which we expect the convergence rate ofour scheme to be worse than second order.We evolve the test to t = 0 .

12, and calculate the L1norm in the same way as above, but with ρ ( x i ) now givenby the exact solution to the equivalent Riemann problemat t = 0 .

12, which can be found using an exact Riemannsolver.Figure 13 shows the L1 norm as a function of the equiv-alent 1D cell number N , again for both a Cartesian initialgrid and a random uniform initial grid. In this case, con-vergence is clearly worse than N − , and is not even ﬁrstorder in space and time. There no longer is a clear diﬀer-ence between the Cartesian grid and the uniform randomgrid. To study the convergence in a manifestly multi-dimensionalproblem, we also consider the 2D Gresho vortex, which N -3 -2 -1 L Cartesian∼N −0.84 N Uniform random∼N −0.84

Figure 13: L1 norm as a function of 1D cell number for a Sod shocktest in 3D. The blue dots are the simulation results, the red line is aleast squares ﬁt. we already encountered as part of the testsuite. We runfour simulations with respectively 1,000, 10,000, 50,000and 100,000 random uniform cells until t = 3, and calcu-late the L1 norm for the azimuthal velocities v φ,i using thetime independent initial condition v φ ( r ) given above:L1 = 1 N (cid:88) i | v φ,i − v φ ( r i ) | . (20)In Figure 14, we show the L1 norm as a function of the1D equivalent cell number N (cid:48) = √ N . Again, convergenceis not second order, but it is clearly better than in the caseof a strong shock, as the Gresho vortex is overall moresmooth.

7. Performance

Modern high performance computing systems consistof large numbers of computing nodes, each of which canhave multiple CPUs, which are made up of multiple com-puting cores. These systems are hence highly parallel andemploying their full power requires algorithms that canexploit this parallelism. Compared to the available com-puting power, memory is relatively scarce on most system,so that it is also important to minimize the memory im-print of an application.The current version of

Shadowfax was not optimizedfor usage on high performance systems, but some basicMPI communication instructions were added to make itrun on distributed memory systems. To fully exploit thepower of modern architectures, a hybrid algorithm thatcombines distributed memory parallelism with shared mem-ory parallelism is needed, as well as a better representation16 N ′ -4 -3 -2 -1 L ∼N −1.12 Figure 14: L1 norm as a function of 1D cell number for a 2D Greshovortex test. The blue dots represent simulation values, the red lineis a least squares ﬁt. of the Voronoi mesh that depends less on a globally con-structed mesh. This requires a thorough refactoring of thecode, which is left for future versions.In this section, we discuss the strong and weak scal-ing of the current version of the code, the contribution ofthe various components of the code to the total run time,and their memory imprint. For all these tests, we use theEvrard collapse test discussed above, which uses both thehydrodynamical solver and the N-body solver. Since thissetup is very inhomogeneous, our simple domain decom-position leads to serious load-imbalances. For comparison,we also show scaling results for the more homogeneousspherical overdensity test.

Strong scaling measures the decrease in total simula-tion run time when running the same simulation on anincreasingly large number of processes. Ideally, doublingthe amount of processes should half the total run time, butdue to communication overhead this can never be achieved.As communication scales with the surface area of the dif-ferent computational domains in the simulation, while thecomputation time per domain scales with the domain vol-ume, we expect the relative contribution of communicationto the total run time to increase with increasing processnumber. This poses a natural limit on the speed up thatcan be achieved for a given problem size.Figure 15 shows the strong scaling for two of the testproblems that are part of the testsuite: the 3D Evrard col-lapse and the 3D spherical overdensity. The tests were runon a single node of our local computing cluster, consistingof 4 2.7 GHz Intel Xeon CPUs with 8 cores each, usingversion 4.8.4 of the GNU compiler and OpenMPI 1.6.5.The top row shows the speedup, i.e. the ratio of the single S p ee d u p Evrard collapse Spherical overdensity

Number of processes M P I f r a c t i o n Number of processes

Figure 15: Speedup and MPI fraction for two diﬀerent tests from thetestsuite under strong scaling. process total run time and the parallel run time. The bot-tom row shows the fraction of the total run time spent inMPI functions: send and receive operations and idle timedue to load imbalances. We see that the speedup is ratherpoor, except for small process numbers, and that the MPIfraction increases signiﬁcantly with increasing number ofprocesses. This is to be expected, since load balancing isin no way optimized in the current version of the code.For a homogeneous set up like the spherical overdensitythis leads to reasonable scaling, but for strongly inhomo-geneous tests like the Evrard collapse this is disastrous.

Weak scaling is the scaling behaviour of the code whenincreasing the number of processes for a ﬁxed problem sizeper process, i.e. if we double the number of processes, wedouble the number of cells in the simulation as well. It is agood measure for how well the problem is split up over thediﬀerent processes, as we expect the amount of work andthe amount of communication per process to stay roughlyconstant.Figure 16 shows the weak scaling for the 3D Evrardcollapse and 3D spherical overdensity, with a nominal loadof 10,000 cells per process for both tests. The tests werecarried out on the same hardware as the strong scalingtests discussed above. Instead of the speedup, we nowshow the slowdown in the top row, i.e. the ratio of thetotal parallel run time and the serial run time. Again, wehave rather poor scaling.

An important part of a

Shadowfax simulation is theconstruction of the Voronoi mesh that is used for the hy-drodynamical integration. On average, the serial versionof our code can handle ≈ ,

000 Voronoi cells per second17 S l o w d o w n Evrard collapse Spherical overdensity

Number of processes M P I f r a c t i o n Number of processes

Figure 16: Slowdown and MPI fraction for two diﬀerent tests fromthe testsuite under weak scaling. The red dashed line correspondsto a 1:1 relation, while the full red lines correspond to unity. in 2D, and ≈ ,

000 Voronoi cells per second in 3D, in-cluding the treatment of reﬂective or periodic boundariesand the calculation of cell centroids and volumes, and facemidpoints and surface areas.Figure 17 shows the amount of time spent in variousparts of the code during serial runs with diﬀerent problemsizes. The current design of the code makes it very hard toextract the time spent in MPI communications from thetimes for the diﬀerent components, so that we do not showparallel results.The gravitational force calculation clearly makes up alarge fraction of the total run time, which increases if thenumber of cells is increased. Grid construction in 3D takesup about double the time that is spent in the actual hy-drodynamical integration. Only very little time is spent inother parts of the algorithm, e.g. particle sorting, tree con-struction, snapshot output... Program performance hencedoes not suﬀer due to the somewhat less eﬃcient but muchmore convenient object oriented design of these parts.

To test the memory consumption of the code, we used

Massif , a heap proﬁler which is part of the Valgrind instrumentation framework . Since we do not expect thememory imprint to vary much over time, we limited theEvrard test to t = 0 .

06 for this test.Figure 18 shows the fraction of the memory occupiedby various parts of the code during the ﬁnal

Massif snap-shot, when the code used 137 MB of memory in total (in-cluding empty blocks used for memory alignment). A largefraction of this memory is occupied by the Voronoi mesh, http://valgrind.org/docs/manual/ms-manual.html http://valgrind.org/ Number of cells R un t i m e f r a c t i o n Evrard collapsevariousgravitygrid constructionhydrodynamics Number of cells

Spherical overdensityvariousinitializationgrid constructionhydrodynamics

Figure 17: Fraction of total run time spent in diﬀerent parts of thecode.

Voronoi faces28.0%Delaunay grid 23.3%Gas particles 9.3%Voronoi cells28.5% Other10.9%

Figure 18: The fraction of the memory occupied by various parts ofthe code during an Evrard collapse test with 20,000 cells.

8. Comparison with other methods

In this section, we compare

Shadowfax with a num-ber of other publicly available hydrodynamical solvers, toqualify the advantages and disadvantages of the movingmesh method. All ﬁles necessary to run these tests areavailable from . It is a known problem of particle-based Lagrangianmethods like SPH that they have diﬃculties resolving Kelvin-Helmholtz instabilities (Agertz et al., 2007), since the ba-sic SPH equations do not cover discontinuous solutions,like shock waves and contact discontinuities. As Springel(2010) showed, a moving mesh method does not experi-ence these diﬃculties, as the Riemann problem based ﬁnitevolume method that is used does include discontinuous so-lutions.One of the main advantages of a Lagrangian methodover methods that use a ﬁxed discretization (like AMR),is the Galilean invariance of the method. This means thatthe ﬂux between two neighbouring cells will only dependon the relative velocity of two neighbouring cells, and noton their absolute velocities with respect to some referenceframe ﬁxed to the simulation box. As a result, we expecta moving mesh method to better resolve instabilities in aﬂuid that is moving with a high bulk velocity with respectto the simulation box reference frame. To quantify thisbehaviour, we set up a variant of the shearing layers testintroduced in Agertz et al. (2007).For this test, two layers which can have diﬀerent densi-ties are in pressure equilibrium inside a periodic box. Thelayers receive a velocity component parallel to the interfacebetween the layers, but with opposite sign, so that theyshear against each other. Due to the pressure equilibrium,the system is marginally stable : a small velocity com-ponent perpendicular to the interface between the layerswill exponentially grow to form a Kelvin-Helmholtz insta-bility. The instability ﬁrst goes through a linear phaseof exponential growth, after which the system becomeshighly non-linear. The simple setup described here is un-stable on all scales, so that instabilities can even be seededby numerical noise in the absence of diﬀusion. As a re-sult, the results are highly dependent of the resolutionand the details of the numerical scheme. This makes athorough comparison of diﬀerent methods completely im-possible (Lecoanet et al., 2016).As Hendrix and Keppens (2014) point out, introduc-ing a middle layer with a linear transition in ﬂow velocity in between the shearing layers will suppress small scaleinstabilities, so that the growth of instabilities no longerdepends on the numerical resolution (if it is high enough).The middle layer also introduces a maximally unstablewavelength, which we will seed. This gives us full con-trol over the instabilities that will grow. We ﬁrst discussthe non-linear growth of the instability in a setup with adensity contrast of 10 between the layers, to show that

Shadowfax qualitatively produces similar instabilities.We then study the convergence of the linear growth ratein a setup without density contrast.

We set up a periodic 2D box with unit length, in whichthe density is given by ρ ( x, y ) =  y < . . ≤ y ≤ .

751 0 . < y. (21)The x component of the velocity is given by v x =  − . y ≤ . − d − . y + d − . d . − d < y < .

25 + d . .

25 + d ≤ y ≤ . − d . − y + d − . d . − d < y < .

75 + d − . .

75 + d ≤ y, (22)with d = 0 .

025 the thickness of the middle layer. The y component of the velocity is v y = A sin (4 πx ) (cid:18) e − ( y − . σ + e − ( y − . σ (cid:19) , (23)with A = 0 . σ = 0 . p = 2 . Shadowfax ,with results obtained using the AMR code MPI-AMRVAC (Keppens et al., 2012), using the same initial condition(we use a Cartesian initial grid for the Shadowfax runs).MPI-AMRVAC supports diﬀerent hydrodynamical schemes;we use both a conservative ﬁnite diﬀerence scheme withglobal Lax-Friedrich splitting and a ﬁfth order spatial re-construction (hereafter called FD), and a ﬁnite volumescheme with a Harten-Lax-van Leer Contact (HLLC) solver(hereafter called FV). Both schemes use a fourth order ac-curate Runge-Kutta time integration scheme.To test the Galilean invariance of the code, we option-ally add a bulk velocity v bulk = 100 to the entire ﬂuid(corresponding to a Mach number of 155 in the high den-sity layer), so that the ﬂuid moves with respect to thesimulation box reference frame. The results of a low reso-lution run at time t = 1 . ascl:1208.014 .00.51.0 y S HADOWFAX

MPI-AMRVAC FV MPI-AMRVAC FD x y x x Figure 19: Density colour plot for the shearing layers test at time t = 1 .

5. The top row corresponds to simulations with v bulk = 0, while thebottom row corresponds to simulations with v bulk = 100. All simulations start from a 100 ×

100 Cartesian grid and have a ﬁxed number ofcells. The individual cells are shown, this explains the irregularities at the boundaries of the

Shadowfax plot. .00.51.0 y S HADOWFAX

MPI-AMRVAC FV x y x Figure 20: Density colour plot for the shearing layers test at time t =1 . ×

400 grid. The top row corresponds tosimulations with no bulk velocity, the bottom row has v bulk = 100. results, in the sense that they all produce the same largeinstabilities on the same timescale. There are some dif-ferences in the non-linear phase of the instability, whichis to be expected. With a bulk velocity, the results dra-matically change for the MPI-AMRVAC simulations. TheFD method does not produce any result at all, while theinstabilities in the FV simulation are clearly aﬀected bythe bulk velocity. There is also a clear imprint on therun time of the MPI-AMRVAC simulations, since a muchsmaller system time step is needed for the integration. Therun time and the results of the Shadowfax simulationsare not aﬀected by the bulk velocity, since the mesh ismoving along with the ﬂow. These results are conﬁrmedby the high resolution runs, shown in Figure 20.

In a setup without density contrast, the initial expo-nential growth of the instability only depends on the wave-length of the instability and the thickness of the middlelayer (Hendrix and Keppens, 2014). This means that sim-ulations of this initial phase should converge to the samegrowth rate, irrespective of the resolution or the methodthat is used.To test this, we run a variant of the shearing layerstest without density contrast. We still use a periodic boxwith unit length, but now the density and pressure in thebox are both constant and equal to 1. The x componentof the velocity is given by (22), while the y component of − − − − − − − E k i n , y S HADOWFAX × S HADOWFAX × S HADOWFAX × S HADOWFAX × . . . . . . . . . t − − − − − − − − E k i n , y M PI - AMRVAC × M PI - AMRVAC × M PI - AMRVAC × Figure 21: The kinetic energy in the y direction as a function of timefor the shearing layers test without a density contrast. the velocity is now given by v y = B sin (4 πx ) (cid:18) e − ( y − . d + e − ( y − . d (cid:19) , (24)with B = 0 . d = 0 . y direction. Initially, this en-ergy is set by our seed velocity, but as the instability grows,kinetic energy in the x direction is converted into extra ki-netic energy in the y direction, so that this energy will growexponentially, as does the y component of the velocity.Figure 21 shows the kinetic energy in the y directionas a function of time for our simulations and for gridswith diﬀerent resolutions. The high resolution Shadow-fax results are in good agreement with the high resolutionMPI-AMRVAC results, but the convergence is slower for

Shadowfax . This is illustrated in Figure 22, where weplot the relative diﬀerence between the diﬀerent simula-tions and the high resolution MPI-AMRVAC result, thatwe use as a reference solution.We ﬁtted an exponential function of the form B e At tothe kinetic energy in the y direction, in the time interval[1 . , . A to quantify the growth of the21 . − . . . ( E k i n , y − E k i n , y , r e f ) / E k i n , y , r e f S HADOWFAX × S HADOWFAX × S HADOWFAX × S HADOWFAX × . . . . . . . . . t − . − . . . ( E k i n , y − E k i n , y , r e f ) / E k i n , y , r e f M PI - AMRVAC × M PI - AMRVAC × Figure 22: The relative diﬀerence between the kinetic energy in the y direction for the 400 ×

400 MPI-AMRVAC simulation and that forthe other simulations, as a function of time.Table 1: Slope of an exponential ﬁt to the kinetic energy in the y direction in the time interval [1 . , .

5] for the shearing layers testswithout density contrast. simulation slope

Shadowfax ×

100 4.61

Shadowfax ×

200 5.43

Shadowfax ×

400 5.10

Shadowfax ×

800 5.10MPI-AMRVAC 100 ×

100 5.28MPI-AMRVAC 200 ×

200 5.14MPI-AMRVAC 400 ×

400 5.12 instability. The results are shown in Table 1. Both thehigh resolution

Shadowfax and MPI-AMRVAC resultsare converged.

Shadowfax versus swift

The mesh-free methods introduced by Hopkins (2015)use the same ﬁnite volume method used by

Shadowfax ,but use an SPH-like discretization of the ﬂuid to calcu-late volumes and interfaces between neighbouring parti-cles. Since this method does not require the construc-tion of a global unstructured mesh, it is computationallycheaper and has a better potential for parallel scalability.Just as SPH however, the method smooths out local reso-lution over a number of neighbouring particles, potentiallylowering the eﬀective resolution.We compare

Shadowfax with our own implementa-tion of a mesh-free method in the highly parallel SPH-code swift . This method combines the fast neighbour loopalgorithms of swift with a ﬁnite volume method that isthe same as implemented in Shadowfax , and uses thesame exact Riemann solver. Since

Shadowfax reads thesame initial condition ﬁle format as swift , we can usethe exact same initial condition for both simulations anddirectly compare the results. We also compare with thedefault SPH version of swift .Figure 23 shows the density proﬁle of a Sod shock test,which is the 1D equivalent of the spherical overdensity testdiscussed as part of the testsuite . A left high density, highpressure region with ρ = 1 and p = 1 connects with a lowdensity, low pressure region with ρ = 0 .

25 and p = 0 . x = 0 . × . × .

125 with periodic boundaries. The results are evolvedto time t = 0 .

12, when the left rarefaction wave, centraldiscontinuity and right shock have developed.The

Shadowfax result clearly follows the theoreticalcurve, while both swift results show minor deviationsaround the diﬀerent features in the proﬁle. The deviationsare strongest for the swift mesh-free results, although thenoise on the swift

SPH result is higher. We calculated χ values for all three simulations by summing the quadraticdiﬀerences between the particle densities and the theoret-ical densities for all particles. This yielded χ = 29 . Shadowfax , χ = 106 .

71 for swift mesh-free, and χ = 123 .

54 for swift

SPH. Not smoothing out the localresolution hence clearly leads to an overall higher accuracyof the moving mesh method.

The strong shock test proposed by Noh (1987) is avery challenging test with a known analytical solution. Itconsists of a reﬂective box with unit length, in which aﬂuid with unit density and a negligible thermal energy of1 × − is enclosed. The radial velocity of the ﬂuid is set https://gitlab.cosma.dur.ac.uk/swift/swiftsim/tree/gizmo . . . . . ρ − . − . . . . ρ − ρ t h e o r y S HADOWFAX . . . . . ρ − . − . . . ρ − ρ t h e o r y SWIFT mesh-free . . . . . . x . . . . . ρ . . . . . . x − . − . . . ρ − ρ t h e o r y SWIFT

SPH

Figure 23: Density proﬁle of the Sod shock test at t = 0 .

12. The black dots represent the binned density values, with the error ﬂags indicatingthe standard deviation on the values within the bins. The red line corresponds to the exact solution, which is the solution of the equivalentRiemann problem. The right column shows the diﬀerence between the density and the analytical solution. All simulations use the same initialcondition with 1,024,128 particles. − t = 0, so that the ﬂuid collapses on the originand causes a strong shock with a very high Mach number.The velocity of the shock front is v shock = 1 /

3, and theradial density proﬁle at time t > ρ ( r, t ) = (cid:40) r ≤ v shock t tr v shock t < r (25)in 2D.As is common for this problem, we restrict ourselves tothe upper right quadrant of the box. However, we do notuse the commonly used inﬂow boundary conditions for theupper and right boundaries (Springel, 2010), since Shad-owfax does not currently support inﬂow boundaries. Sincethe radial ﬂow of the ﬂuid creates a very low density cav-ity at these boundaries, this leads to numerical problemswhen the shock reaches the low density cavities. We willtherefore restrict the simulation to t = 0 . t = 0 . Shadowfax result. Wecalculated χ values for the simulations, yielding χ =1 . × and χ = 1 . × for Shadowfax , and χ = 2 . × and χ = 4 . × for MPI-AMRVAC. Another challenging test is the implosion test of Liskaand Wendroﬀ (2003). It consist of a periodic box of length0 . × .

6, in which a ﬂuid with unit density and pressureis initially at rest. In the center of the box, a rhombuswith width 0 . ρ = 0 . p = 0 .

14. Thehigh density ﬂuid will implode into this region and causea strong shock wave, that will travel back and forth in theperiodic box. As in Liska and Wendroﬀ (2003), we set theadiabatic index γ = 1 . Shadowfax results with resultsobtained using MPI-AMRVAC. It is interesting to see thatboth methods produce asymmetrical results in the center,while reproducing the same large scale shock wave, even at later times, when the shock wave has interacted withthe central instabilities. Just as before, the

Shadowfax instabilities seem to develop faster than the ones in theMPI-AMRVAC result, and they are also signiﬁcantly morechaotic, due to the co-moving character of the mesh. Forthe same mesh resolution, the

Shadowfax results showmore instabilities, indicating that a moving mesh has ahigher local resolution. Figure 2 shows a zoom of the

Shadowfax result at time t = 0 .

9. Discussion

In this paper, we introduced the public simulation code

Shadowfax . The code can be used to simulate a mix-ture of gas and cold dark matter, with an accurate treat-ment of the hydrodynamics and gravitational forces. Thediscretization of the gas is provided by an unstructuredVoronoi mesh, which is evolved in time to more accuratelyfollow the hydrodynamical ﬂow.The code is inspired by Springel (2010), but uses anobject-oriented design, which attempts to make the codeeasier to read and extend. Some parts of the code makeuse of advanced C++ language features, like templates, toimprove code reuse without signiﬁcant run time cost. Wehave attempted to separate the actual physics from the al-gorithmic details as much as possible. Much improvementis possible however, which will be the subject of futurework on the code.We have introduced the test problems that are cur-rently included in the public version of the code, as wellas some more involved tests that compare the code withother publicly available codes. These tests show that theLagrangian nature of the moving mesh method makes itbetter at resolving instabilities in regions with high Machnumbers than ﬁxed grid methods. On the other hand, amoving mesh is a lot more sensitive to instabilities thatare seeded numerically. When the growth rate of insta-bilities is studied, a moving mesh method requires higherresolution to converge to a consistent growth rate than aﬁxed grid method.The ﬁnite volume method used by

Shadowfax is al-most identical to that used by the mesh-free methods ofHopkins (2015), but has a higher eﬀective resolution forthe same number of particles, since the mesh-free dis-cretization of space smooths out resolution over a largenumber of neighbours.24 ρ − . − . − . − . . . . . . ρ − ρ t h e o r y S HADOWFAX × S HADOWFAX × .

00 0 .

05 0 .

10 0 .

15 0 .

20 0 . r ρ .

00 0 .

05 0 .

10 0 .

15 0 .

20 0 . r − . − . − . − . . . . . . ρ − ρ t h e o r y MPI-AMRVAC × MPI-AMRVAC × Figure 24: Radial density proﬁle for the Noh test at t = 0 .

5. The black and gray dots show the binned density values, with the error barsindicating the standard deviation of the values inside the bins. The red line corresponds to the analytical solution. The right column showsthe diﬀerence between the density and the analytical solution. .00.30.6 y S HADOWFAX

MPI-AMRVAC FV y x y x Figure 25: Density colour plots for the implosion test at diﬀerent times.

Top: t = 0 . middle: t = 0 . bottom: t = 0 . cknowledgments We thank the two anonymous referees for the construc-tive feedback that improved the quality of this manuscript,and for pointing out the details of our implementation weforgot to mention in the ﬁrst version. We thank the GhentUniversity Special Research Fund and the InteruniversityAttraction Poles Programme initiated by the Belgian Sci-ence Policy Oﬃce (IAP P7/08 CHARM) for ﬁnancial sup-port. We thank Robbert Verbeke for his contribution tothe exact Riemann solver, and Peter Camps for sharingthe

Vec class that originally was part of SKIRT withus, before SKIRT became public. BV thanks MatthieuSchaller and Tom Theuns for the discusions about ﬁnitevolume methods, and the coupling between hydrodynam-ics and gravity. We thank Volker Springel and RomainTeyssier for making their codes Gadget2 and ramses public. Special thanks also to Rony Keppens for sharinghis knowledge about grid methods and helping us settingup the MPI-AMRVAC simulations. Finally, we want tothank J. R. R. Tolkien for providing an extensive list ofunique names to give to computing infrastructure and sim-ulation codes.

References

Agertz, O., Moore, B., Stadel, J., Potter, D., Miniati, F., Read, J.,Mayer, L., Gawryszczak, A., Kravtsov, A., Nordlund, ˚A., Pearce,F., Quilis, V., Rudd, D., Springel, V., Stone, J., Tasker, E., Teys-sier, R., Wadsley, J., Walder, R., Sep. 2007. Fundamental diﬀer-ences between SPH and grid methods. Mon. Not. R. Astron. Soc.380, 963–978.Barnes, J., Hut, P., Dec. 1986. A hierarchical O(N log N) force-calculation algorithm. Nature 324, 446–449.Cloet-Osselaer, A., De Rijcke, S., Vandenbroucke, B., Schroyen, J.,Koleva, M., Verbeke, R., Aug. 2014. Numerical simulations ofdwarf galaxy merger trees. Mon. Not. R. Astron. Soc. 442, 2909–2925.Dobbs, C. L., Mar. 2015. The interstellar medium and star formationon kpc size scales. Mon. Not. R. Astron. Soc. 447, 3390–3401.Duﬀell, P. C., MacFadyen, A. I., Dec. 2011. TESS: A RelativisticHydrodynamics Code on a Moving Voronoi Mesh. Astrophys. J.Suppl. Ser. 197, 15.Duﬀell, P. C., MacFadyen, A. I., Aug. 2012. Global Calculations ofDensity Waves and Gap Formation in Protoplanetary Disks Usinga Moving Mesh. Astrophys. J. 755, 7.Gaburov, E., Johansen, A., Levin, Y., Oct. 2012. Magnetically Lev-itating Accretion Disks around Supermassive Black Holes. Astro-phys. J. 758, 103.Geen, S., Rosdahl, J., Blaizot, J., Devriendt, J., Slyz, A., Apr. 2015.A detailed study of feedback from a massive star. Mon. Not. R.Astron. Soc. 448, 3248–3264.Gonnet, P., Schaller, M., Theuns, T., Chalk, A. B. G., Sep. 2013.SWIFT: Fast algorithms for multi-resolution SPH on multi-corearchitectures. ArXiv e-prints.Greif, T. H., Springel, V., White, S. D. M., Glover, S. C. O., Clark,P. C., Smith, R. J., Klessen, R. S., Bromm, V., Aug. 2011. Simula-tions on a Moving Mesh: The Clustered Formation of PopulationIII Protostars. Astrophys. J. 737, 75.Hendrix, T., Keppens, R., Feb. 2014. Eﬀect of dust on Kelvin-Helmholtz instabilities. Astron. Astrophys. 562, A114. ascl:1109.003 Hopkins, P. F., Jun. 2015. A new class of accurate, mesh-free hy-drodynamic simulation methods. Mon. Not. R. Astron. Soc. 450,53–110.Jin, G., Mellor-Crummey, J., Mar. 2005. Sfcgen: A framework foreﬃcient generation of multi-dimensional space-ﬁlling curves by re-cursion. ACM Trans. Math. Softw. 31 (1), 120–148.URL http://doi.acm.org/10.1145/1055531.1055537 Keppens, R., Meliani, Z., van Marle, A., Delmont, P., Vlasis,A., van der Holst, B., 2012. Parallel, grid-adaptive approachesfor relativistic hydro and magnetohydrodynamics. J. Comput.Phys. 231 (3), 718744, special Issue: Computational PlasmaPhysicsSpecial Issue: Computational Plasma Physics.URL

Lecoanet, D., McCourt, M., Quataert, E., Burns, K. J., Vasil, G. M.,Oishi, J. S., Brown, B. P., Stone, J. M., O’Leary, R. M., Feb. 2016.A validated non-linear Kelvin-Helmholtz benchmark for numericalhydrodynamics. Mon. Not. R. Astron. Soc. 455, 4274–4288.Liska, R., Wendroﬀ, B., Mar. 2003. Comparison of several diﬀerenceschemes on 1d and 2d test problems for the euler equations. SIAMJ. Sci. Comput. 25 (3), 995–1017.URL http://dx.doi.org/10.1137/S1064827502402120

Lloyd, S., Mar 1982. Least squares quantization in pcm. EEE Trans.Inf. Theory 28 (2), 129–137.Noh, W. F., Sep. 1987. Errors for calculations of strong shocks usingan artiﬁcial viscosity and an artiﬁcial heat ﬂux. J. Comput. Phys.72, 78–120.Pakmor, R., Bauer, A., Springel, V., Dec. 2011. Magnetohydrody-namics on an unstructured moving grid. Mon. Not. R. Astron.Soc. 418, 1392–1401.Plummer, H. C., Mar. 1911. On the problem of distribution in glob-ular star clusters. Mon. Not. R. Astron. Soc. 71, 460–470.Price, D. J., Feb. 2012. Smoothed particle hydrodynamics and mag-netohydrodynamics. J. Comput. Phys. 231, 759–794.Reem, D., 2011. The geometric stability of voronoi diagrams with re-spect to small changes of the sites. In: Proceedings of the Twenty-seventh Annual Symposium on Computational Geometry. SoCG’11. ACM, New York, NY, USA, pp. 254–263.URL http://doi.acm.org/10.1145/1998196.1998234

Richard Shewchuk, J., 1997. Adaptive Precision Floating-PointArithmetic and Fast Robust Geometric Predicates. Discrete Com-put. Geom. 18 (3), 305–363.URL http://dx.doi.org/10.1007/PL00009321

Schaye, J., Crain, R. A., Bower, R. G., Furlong, M., Schaller, M.,Theuns, T., Dalla Vecchia, C., Frenk, C. S., McCarthy, I. G.,Helly, J. C., Jenkins, A., Rosas-Guevara, Y. M., White, S. D. M.,Baes, M., Booth, C. M., Camps, P., Navarro, J. F., Qu, Y.,Rahmati, A., Sawala, T., Thomas, P. A., Trayford, J., 2015. Theeagle project: simulating the evolution and assembly of galaxiesand their environments. Mon. Not. R. Astron. Soc. 446 (1),521–554.URL http://mnras.oxfordjournals.org/content/446/1/521.abstract

Sedov, L., 1977. Similitude et dimensions en mecanique. Edition Mir.Siebert, C., Wolf, F., 2010. A scalable parallel sorting algorithm usingexact splitting. Tech. rep., German Research School for SimulationSciences GmbH.Springel, V., Dec. 2005. The cosmological simulation codeGADGET-2. Mon. Not. R. Astron. Soc. 364, 1105–1134.Springel, V., Jan. 2010. E pur si muove: Galilean-invariant cosmo-logical hydrodynamical simulations on a moving mesh. Mon. Not.R. Astron. Soc. 401, 791–851.Sundar, H., Sampath, R. S., Biros, G., 2008. Bottom-Up Construc-tion and 2:1 Balance Reﬁnement of Linear Octrees in Parallel.SIAM J. Sci. Comput. 30 (5), 26752708.URL http://dx.doi.org/10.1137/070681727

Teyssier, R., Apr. 2002. Cosmological hydrodynamics with adaptivemesh reﬁnement. A new high resolution code called RAMSES.Astron. Astrophys. 385, 337–364.Toro, E. F., 2009. Riemann Solvers and Numerical Methods for FluidDynamics, 3rd Edition. Springer-Verlag, Berlin Heidelberg. erbeke, R., Vandenbroucke, B., De Rijcke, S., Dec. 2015. How theFirst Stars Shaped the Faintest Gas-dominated Dwarf Galaxies.Astrophys. J. 815, 85.Vogelsberger, M., Genel, S., Springel, V., Torrey, P., Sijacki, D., Xu,D., Snyder, G., Bird, S., Nelson, D., Hernquist, L., May 2014.Properties of galaxies reproduced by a hydrodynamic simulation.Nature 509, 177–182.Vogelsberger, M., Sijacki, D., Kereˇs, D., Springel, V., Hernquist,L., Oct. 2012. Moving mesh cosmology: numerical techniques andglobal statistics. Mon. Not. R. Astron. Soc. 425, 3024–3057.Yalinewich, A., Steinberg, E., Sari, R., 2015. RICH: Open-sourceHydrodynamic Simulation on a Moving Voronoi Mesh. Astrophys.J. Suppl. Ser. 216 (2), 35.URL http://stacks.iop.org/0067-0049/216/i=2/a=35http://stacks.iop.org/0067-0049/216/i=2/a=35