The moving mesh code Shadowfax
B. Vandenbroucke∗, S. De Rijcke (Dept. Physics & Astronomy, Ghent University, Krijgslaan 281, S9, 9000 Gent, Belgium)
Abstract
We introduce the moving mesh code
Shadowfax, which can be used to evolve a mixture of gas, subject to the laws of hydrodynamics and gravity, and any collisionless fluid only subject to gravity, such as cold dark matter or stars. The code is written in C++ and its source code is made available to the scientific community under the GNU Affero General Public License. We outline the algorithm and the design of our implementation, and demonstrate its validity through the results of a set of basic test problems, which are also part of the public version. We also compare Shadowfax with a number of other publicly available codes using different hydrodynamical integration schemes, illustrating the advantages and disadvantages of the moving mesh technique.
Keywords: methods: numerical, hydrodynamics
1. Introduction
Modern simulations of galaxy formation and evolution crucially depend on an accurate treatment of the hydrodynamics of the interstellar medium (ISM) (Vogelsberger et al., 2014; Schaye et al., 2015). The ISM fuels star formation and is disrupted by stellar feedback, and it is this complex interplay that at least partly governs the observable content of galaxies (Verbeke, Vandenbroucke, and De Rijcke, 2015). If we want to be able to compare simulated galaxies with observations, we need to properly resolve these effects.

Hydrodynamics is also important on smaller scales, when simulating star-forming clouds (Greif et al., 2011; Dobbs, 2015), feedback from a single star (Geen et al., 2015), or even planet formation in a circumstellar disc (Duffell and MacFadyen, 2012). A robust hydrodynamical integration scheme, optionally extended with magnetic fields, self-gravity or radiation transport, is hence an indispensable tool for many astrophysical simulators.

Historically, two major classes of hydrodynamical solvers have been developed: grid based Eulerian techniques (Teyssier, 2002; Keppens et al., 2012), and particle-based Lagrangian techniques (Springel, 2005; Price, 2012). Both discretize the fluid as a finite set of fluid elements. In the former, the fluid elements are cells, usually defined through a (hierarchical) Cartesian grid, which have a fixed position in space, but can be allowed to refine or derefine according to the quality of the integration. In the latter, the fluid elements are particles, which move along with the flow, with the hydrodynamics being expressed as inter-particle forces.

∗ Corresponding author
Email addresses: [email protected] (B. Vandenbroucke), [email protected] (S. De Rijcke)
It is generally acknowledged that grid based Eulerian techniques are more accurate at solving the equations of hydrodynamics, especially since many particle-based implementations have fundamental difficulties in resolving hydrodynamical instabilities (Agertz et al., 2007). Nonetheless, Lagrangian techniques are widely used to simulate systems with a high dynamic range, like cosmological simulations and simulations of galaxies, since they more naturally concentrate computational resources on regions of interest, and provide a Galilean invariant reference frame.

Recently, a new class of hydrodynamical solvers has been developed, mainly through the work of Springel (2010), which aims to combine the advantages of Eulerian and Lagrangian techniques (see also Duffell and MacFadyen, 2011; Yalinewich, Steinberg, and Sari, 2015). This new technique uses a moving grid to discretize the fluid, and combines an unstructured grid based finite volume integration scheme with the Lagrangian nature of a particle method. We will refer to this method as a moving mesh technique.

A number of moving mesh codes are presented in the literature (Springel, 2010; Duffell and MacFadyen, 2011), but only two of them are publicly available: rich (Yalinewich, Steinberg, and Sari, 2015), written in C++, and FVMHD3D, written in the parallel object-oriented language Charm++ (Gaburov, Johansen, and Levin, 2012). In this paper, we introduce the new, publicly available moving mesh code
Shadowfax (the logo of the code is shown in Figure 1).
Shadowfax is written in C++, and makes ample use of the object-oriented capabilities of the language to provide an easy to extend framework. The code is parallelized for use on distributed memory systems using the Message Passing Interface (MPI), and makes use of the open source Boost C++ libraries to extend basic C++ language features. The code supports input and output using the HDF5 library in a format compatible with the output of Gadget2 (Springel, 2005), gizmo (Hopkins, 2015) and swift (Gonnet et al., 2013). A user friendly compilation process is guaranteed through the use of CMake.

(rich: ascl:1410.005; FVMHD3D: https://github.com/egaburov/fvmhd3d)

Figure 1: The Shadowfax logo.

Preprint submitted to Astronomy & Computing, May 13, 2016

The hydrodynamical algorithm implemented in Shadowfax is the same as described by Springel (2010), but with an additional per-face slope limiter and flux limiter, and optional alternative approximate Riemann solvers. The gravitational calculation is the same as the tree force calculation in
Gadget2 (Springel, 2005), and uses the same relative tree opening criterion and Ewald summation technique for periodic boundary conditions. We have ported this algorithm to an object-oriented version, which makes use of compile-time polymorphism using C++ templates. This ensures a clear separation of the algorithmic details underlying the tree walk from the actual physics involved with the gravitational calculation. This way, it is much easier to focus on one particular aspect of the code, e.g. scalability or precision, without needing to worry about other aspects.

Likewise, we have separated the geometrical details contained in the moving mesh from the hydrodynamical integration as much as possible, to make it easier to replace parts of the algorithm (e.g. the Riemann solver or the grid) by simply implementing an alternative class.

Our code is predominantly meant to be used in astrophysical simulations of galaxy formation and evolution, but could have applications in other areas of science as well, as it is not difficult to replace the Euler equations of hydrodynamics by e.g. the shallow water equations by implementing a different Riemann solver. Furthermore, the Voronoi grid used to discretize the fluid can also be used for other purposes, e.g. for the suppression of Poisson noise in randomly sampled distributions through Lloyd's algorithm (Lloyd, 1982), or as a density estimator in N-body simulations (Cloet-Osselaer et al., 2014).

(Gadget2: ascl:0003.001; gizmo: ascl:1410.003; swift: http://icc.dur.ac.uk/swift/; CMake: https://cmake.org)

In this paper, we outline the basic working of Shadowfax. We mainly focus on the C++ implementation and the object-oriented design of our code, and compare our code with other hydrodynamical solvers on a number of test problems. Although the current version of Shadowfax focuses more on design and accuracy than on performance, we also highlight some basic strong and weak scaling tests. Performance optimizations and extra physical ingredients (e.g. gas cooling, star formation and stellar feedback) will be added in future versions of the code. The source code of Shadowfax is publicly available from https://github.com/AstroUGent/shadowfax, and is distributed under the GNU Affero General Public License.
2. Algorithm
Many of the algorithms implemented in
Shadowfax were already discussed in Springel (2005) and Springel (2010). For completeness, we summarize them below and point out the differences where necessary.
Shadowfax is based on a finite volume method, which subdivides the computational box into a (large) number of small cells. The hydrodynamical integration is governed by the exchange of fluxes between these cells.

These fluxes involve the conserved variables: mass (m), momentum (p) and total energy (E). The Euler equations of hydrodynamics however are usually formulated in terms of primitive variables: density (ρ), flow velocity (v) and pressure (p). The pressure is sometimes replaced by the thermal energy (u) or some form of entropic function of the fluid, by using the equation of state of the fluid. In this work, we will always assume an ideal gas, with an equation of state of the form

p = (γ − 1) ρ u,   (1)

where γ is the adiabatic index of the gas, for which we will adopt the value γ = 5/3, unless otherwise stated. The conserved variables and primitive variables can be converted into one another whenever a volume (V) is available, since

m = ρ V,   (2)
p = m v,   (3)
E = m u + (1/2) m v².   (4)

It is common practice to combine the conserved and primitive variables into two state vectors,

Q = (m, p, E)ᵀ and W = (ρ, v, p)ᵀ.   (5)

The change in the conserved variables Q_i for a cell i, during an integration time step of length ∆t, is then given by

∆Q_i = −∆t Σ_j A_ij F_ij(W_i, W_j, ∇W_i, ∇W_j, v_ij, ∆t),   (6)

where A_ij is the surface area of the interface between cell i and cell j. F_ij is the flux between cell i and cell j, which in general depends on the primitive variables of both cells and their gradients, the velocity v_ij of the face with respect to a frame of reference fixed to the simulation box, and the integration time step.

When formulated in this way, the finite volume method can be applied to any discretization of the fluid, as long as this discretization yields volumes to convert conserved variables to primitive variables, and defines a concept of neighbour relations between cells, and an associated surface area and velocity for the neighbour interface. It can even be applied to mesh-free, particle-based methods (Hopkins, 2015).

In the case of a moving mesh method, the discretization is given by an unstructured Voronoi mesh, a 2D example of which is shown in Figure 2. The mesh is defined by means of a set of mesh generating points (generators), with the cell associated with a specific generator containing the region of space closest to that generator. A Voronoi mesh can be defined in D dimensions, but we will focus on the cases D = 2 and D = 3. The Voronoi mesh has the interesting property that it is relatively stable under small movements of the generators (Reem, 2011), so that cells deform continuously under a continuous movement of the generators. We exploit this property to allow the mesh to move in between integration time steps.
The surface area of the faces will change linearly in between steps, so that the time averaged flux over the entire time step will be correct, even if it is calculated at one specific moment in time.

By setting the velocities of the mesh generators equal to the local flow velocity, the mesh will effectively move along with the fluid, and we end up with a Lagrangian method. We can even reinterpret the generators as being particles, so that the moving mesh technique becomes a true alternative for particle-based methods. However, the underlying integration scheme uses the full strength of a finite volume method, and hence will be more accurate. Note that the quality of the integration will depend on the shape of the cells, with highly irregular cells leading to less accuracy (Vogelsberger et al., 2012). To ensure cell regularity, it is sometimes necessary to add extra correction terms to the generator velocities. We employ the scheme of Springel (2010) and steer the cell generator towards the centroid of its cell if the distance between generator position and cell centroid exceeds a fraction of the generic cell size, i.e. the radius of a sphere with the same volume as the cell.

In the remainder of this section, we describe the mesh construction and flux calculation in more detail, to introduce the concepts that are used in the discussion of the Shadowfax implementation.
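The conversions (2)-(4), together with the ideal gas law (1), can be sketched in a few lines of C++ (the helper names here are hypothetical; in Shadowfax these conversions are handled by the RiemannSolver class, as discussed in Section 3). The sketch assumes a one-dimensional velocity:

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical state vectors for a 1D flow: Q = (m, p, E), W = (rho, v, p).
using State = std::array<double, 3>;

constexpr double GAMMA = 5.0 / 3.0; // adiabatic index, as adopted in the text

// W -> Q, given the cell volume V (equations (2)-(4), with the thermal
// energy u obtained from the ideal gas equation of state (1))
State primitive_to_conserved(const State& W, double V) {
    double rho = W[0], v = W[1], p = W[2];
    double m = rho * V;
    double u = p / ((GAMMA - 1.0) * rho); // from p = (gamma - 1) rho u
    return {m, m * v, m * u + 0.5 * m * v * v};
}

// Q -> W: the inverse conversion, again requiring the volume
State conserved_to_primitive(const State& Q, double V) {
    double m = Q[0], mom = Q[1], E = Q[2];
    double rho = m / V;
    double v = mom / m;
    double u = (E - 0.5 * mom * v) / m;
    return {rho, v, (GAMMA - 1.0) * rho * u};
}
```

A round trip through both functions recovers the original primitive variables up to floating point round-off.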
We construct the Voronoi mesh through its dual Delaunay triangulation. The latter is constructed using an incremental construction algorithm. In the default implementation, the relevant parts of the mesh are reconstructed for every time step. We also experimented with a mesh evolution algorithm (Vandenbroucke & De Rijcke, in preparation), which evolves the mesh instead of reconstructing it.

As Richard Shewchuk (1997) pointed out, incremental Delaunay construction algorithms can become unstable due to numerical round-off error. To prevent this from happening, we employ arbitrary exact arithmetic for all geometrical tests involved. Since the predicates of Richard Shewchuk (1997) depend on a number of assumptions on the internal CPU precision that are not met on all hardware architectures, we use the technique outlined by Springel (2010): we map the floating point coordinates of the mesh generators to the interval [1, 2] and use an integer representation of the mantissa to exactly calculate the result of a geometrical test if the numerical error could lead to a wrong result. We pre-calculated the maximal size of an integer necessary to store the exact result and use the
Boost Multiprecision library to perform the calculations using long integer arithmetic.

The result of the mesh construction is a list of neighbours for every mesh generator, and an associated list of faces, with each face consisting of an ordered list of vertex positions. We need to compute the volume and geometrical centroid of each cell from this, as well as the surface area and midpoint of every face.

In 2D, the faces each contain only two vertices, with the midpoints being the midpoints of the line segments formed by these two vertices. In this case, we order the neighbours and faces counter-clockwise around the position of the generator of the cell. The volume (2D surface area) of the cell is then given by the sum of the surface areas of the triangles that are formed by the first vertex and two consecutive other vertices. The centroid of the cell is then the weighted average of the centroids of these same triangles.

In 3D, we use a similar technique to obtain the surface area and midpoint of the individual faces. The total volume of the cell is then the sum of the volumes of the pyramids with the faces as base and the cell generator as top (the latter being guaranteed to lie inside the cell). The centroid is the weighted average of the centroids of these pyramids.

Figure 2: Density colour plot for the 2D Liska and Wendroff (2003) implosion test at t = 0. Left: general view of the entire simulation box, right: zoom in on part of the central low density region. The Voronoi mesh used for the discretization of the gas has been overplotted.

The mesh construction algorithm only works if all cells, including those at the borders of the simulation volume,
We can also mirror the generator po-sitions themselves with respect to the boundary face of thesimulation box, in this case we have reflective boundaries.During the flux calculation, these ghost generators theneither represent the cell of which they are a periodic copy,or the border cell itself, but with the sign of the flow ve-locity along the boundary face normal reversed.
We employ a Monotonic Upwind Scheme for Conservation Laws (MUSCL) combined with a Hancock prediction step to estimate the fluxes. This scheme is more commonly referred to as a MUSCL-Hancock scheme. The flux calculation consists of a cell based interpolation and integration step, whereby the primitive variables at the center of the cell (the centroid of the Voronoi cell) are interpolated to the position of the midpoint of the face using the gradients of the primitive variables in that cell, and are predicted forward in time for half the time step using the same gradients and the Euler equations in primitive form.

These face reconstructed primitive variables are then used as the input for a Riemann solver, to obtain appropriately averaged primitive quantities at the interface which take into account the local wave structure of the Euler equations (Toro, 2009). This can be done using either an exact, iterative Riemann solver, or an approximate Riemann solver. The solution of the Riemann problem, W∗, is then used to calculate the hydrodynamical fluxes as

F(W∗) = ( ρ∗ v∗, ρ∗ v∗ v∗ + p∗, ρ∗ (u∗ + (1/2) v∗²) v∗ + p∗ v∗ )ᵀ.   (7)

The Riemann problem is generally formulated in 1D, with the change in variables corresponding to a change in x coordinate. In an unstructured mesh, the face normals are generally not aligned with the x axis, so that we need to rotate the primitive variables to a reference frame where the x axis is aligned with the surface normal of the face. This only affects the fluid velocity components. The solution of the Riemann problem then needs to be rotated back to the original reference frame.

When the cells are allowed to move, the faces will move as well, and we need to transform to a frame of reference moving along with the face before solving the Riemann problem.
This again only affects the fluid velocity components of the primitive variables.

However, the flux needs to be adapted in this case as well: even if the hydrodynamical flux through the face would normally be zero, there will be a net flux, caused only by the movement of the cell. This flux is given by

F_mov(W∗) = −( ρ∗ v_ij, ρ∗ v∗ v_ij, ρ∗ (u∗ + (1/2) v∗²) v_ij )ᵀ.   (8)

We can either add this correction flux to the hydrodynamical flux (Springel, 2010), or calculate the hydrodynamical flux in a reference frame moving along with the face and correct for the movement afterwards (Pakmor, Bauer, and Springel, 2011).

A side effect of the gradient interpolation step discussed above is the possible introduction of new extrema of the primitive variables, which can cause spurious oscillations in the solution (Toro, 2009). These spurious oscillations can be avoided by constructing monotone schemes, whereby the gradients of a cell are limited so that no interpolated primitive value can exceed the primitive variables in one of the neighbouring cells. However, such monotone schemes are no longer second order accurate in space, so that a limiting procedure inevitably leads to a loss of accuracy. It might therefore be better to use a somewhat less restrictive gradient limiter, as long as the spurious oscillations do not dominate the local solution (Hopkins, 2015). We use the cell wide slope limiter described by Springel (2010).

Apart from a cell wide slope limiting procedure that limits the gradients of the cell during the gradient calculation (which is performed after the primitive variables have been calculated, but before the fluxes are computed), it is also possible to limit the interpolated values at the faces. The general idea of such a pair-wise slope limiter is to conserve the wave structure of the Riemann problem. This wave structure generally consists of a central contact discontinuity and a left and right wave, which can be either a shock wave or a rarefaction wave (Toro, 2009).
Spurious oscillations arise when the solution of the Riemann problem with the cell centred primitive variables as input yields e.g. a left and right rarefaction wave, while the Riemann problem with the interpolated variables at the face yields a left shock wave and a right rarefaction wave. The left shock is not present in the first order solution, and introduces a growing artefact in the solution. With a pair-wise limiter, we limit the left and right interpolated values in such a way that the original wave structure of the Riemann problem is preserved, but the input values can still differ from the cell centred values. We implemented the pair-wise slope limiter of Hopkins (2015).

Even with appropriate slope limiters, it is still possible that the calculated fluxes are too large, i.e. exceed the value of the conserved variables in the cell. For the mass and energy of the cell, this is fatal, since it can cause negative masses and energies, which are evidently unphysical. This can happen for example if the integration time step is too large, if the gradient interpolation is done in an asymmetric way (due to pair-wise limiting), or if external forces (e.g. gravity) contribute to the flux. The former is normally excluded by choosing an appropriate time step criterion. To prevent the latter from crashing the code, we implemented a flux limiter, which ensures that the flux through a face can never be larger than a fraction of the value of the conserved variables inside the cell. This fraction is equal to the ratio of the surface area of the face to the total surface area of the entire cell.

We note that the flux limiter is only used to ensure code stability, and in this sense is equivalent to resetting the mass or energy of a cell to some very small value whenever they become negative. However, by limiting the flux and not the cell quantities themselves, we ensure manifest conservation of mass and energy, which would otherwise be violated.
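As a minimal illustration of such a flux limiter (a hypothetical helper for a single scalar quantity; the actual implementation works per face on the full vector of conserved variables):

```cpp
#include <cassert>

// Sketch of the flux limiter described in the text: the total outflow of a
// conserved quantity through one face during a time step is clamped so that
// it can never exceed the fraction A_face / A_total of the content of the
// cell it drains, which keeps masses and energies positive.
double limit_flux(double flux_dt,    // F * A_ij * dt: outflow through the face
                  double cell_value, // conserved quantity in the draining cell
                  double face_area,
                  double total_area) {
    double max_outflow = cell_value * face_area / total_area;
    if (flux_dt > max_outflow) {
        return max_outflow; // limited: cell would otherwise be over-drained
    }
    return flux_dt;
}
```

For example, a face carrying one sixth of the total cell surface can drain at most one sixth of the cell content in a single step.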
Gravity is added as an extra term in the momentum equation:

∆p_i = −∆t Σ_j A_ij F_ij,p − (1/2) ∆t (m_i,old ∇_i Φ_old + m_i,new ∇_i Φ_new),   (9)

where m_i,old and m_i,new represent the mass inside the cell before and after the update of the mass respectively, and ∇_i Φ_old and ∇_i Φ_new represent the gravitational acceleration before and after the generator positions have been updated. F_ij,p is the hydrodynamical flux for the momentum, as given above.

The gravitational acceleration is also taken into account during the half step prediction, before the flux calculation, and only affects the velocity.

Gravity also affects the total energy of the cell. Simply adding a term to the energy equation does not take into account the movement of the mass that fluxes through cell faces during the time step, and leads to significant energy errors. We therefore use the following more involved equation to update the energy of a cell:

∆E_i = −∆t Σ_j A_ij F_ij,E − (1/2) ∆t (m_i,old w_i,old · ∇_i Φ_old + m_i,new w_i,new · ∇_i Φ_new) − (1/2) Σ_j ∆m_ij (r_i − r_j) · (∇_i Φ_old + ∇_i Φ_new),   (10)

where w_i,old and w_i,new represent the generator velocities before and after the update of the generator positions, and ∆m_ij is the mass that fluxed from cell i to cell j during the time step. The sum extends over all neighbours of the cell.

We soften the gravitational acceleration for both the hydrodynamical and collisionless component using a spline kernel with fixed softening length (Springel, 2005). A gravitational time step criterion based on the size of the gravitational acceleration and the softening length is combined with the hydrodynamical time step criterion to set the particle time steps.

Figure 3: General overview of a simulation, using the default Voronoi mesh, Riemann solver and input/output classes. After the desired implementations of the main classes have been constructed during an initialization step, the initial conditions are read in and stored in a
ParticleVector data structure that will be updated during the main simulation loop.
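The leapfrog time integration used for the gravitational force can be sketched generically as a kick-drift-kick step (a stand-in harmonic acceleration replaces the actual tree-based gravity here, and none of the names correspond to actual Shadowfax classes; the real loop interleaves the hydrodynamical fluxes with these kicks):

```cpp
#include <cassert>
#include <cmath>

// Generic kick-drift-kick (KDK) leapfrog step for a single 1D body, as a
// sketch of the gravitational part of the integration loop.
struct Body { double x, v; };

// stand-in acceleration: a harmonic potential instead of self-gravity
double accel(double x) { return -x; }

void leapfrog_step(Body& b, double dt) {
    b.v += 0.5 * dt * accel(b.x); // first half kick
    b.x += dt * b.v;              // drift
    b.v += 0.5 * dt * accel(b.x); // second half kick
}
```

The scheme is second order accurate and, for a fixed time step, conserves a nearby "shadow" energy, which is why it is the standard choice for gravitational dynamics.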
3. Implementation
The finite volume method and the actual discretization of the fluid as a Voronoi mesh are clearly well separated concepts, so that it makes sense to separate them in the design of a moving mesh code. It then becomes possible to test the finite volume method using e.g. a fixed Cartesian mesh, which is computationally much cheaper to construct, or to treat the mesh generators as particles and calculate volumes and interfaces using a mesh-free method (Hopkins, 2015).

Furthermore, various parts of the finite volume method can be adapted, and could be considered to be run time parameters for the code: the choice of Riemann solver, the choice of slope limiter, etc. A good code design should make it possible to easily exchange these components without affecting other parts of the algorithm, and where possible also without the need to recompile the code.

Figure 3 shows a general overview of a simulation, as it is implemented in the Simulation class, introducing the main classes of interest. Figure 4 shows how these different classes interact during the main simulation loop.

In this section, we describe how various aspects of the algorithm were implemented in the public version of Shadowfax. We highlight important abstractions, but also indicate where our current version does not comply with the
design principles outlined above.

Figure 4: Overview of the main simulation loop. We combine a leapfrog integration for the gravitational force with a simple kick-drift scheme for the hydrodynamics. Geometrical properties like cell volumes, neighbour relations and face properties are extracted from the VorTessManager, gravitational forces are calculated using the Tree managed by the ParticleVector, using a template GravityWalker. All time related bookkeeping is handled by the TimeLine.

We have isolated the Voronoi mesh in a
VorTessManager class, which encapsulates the entire mesh into a single object. Particles can be added to the VorTessManager as a pair consisting of a set of coordinates and an integer key, so that the properties of the cell or mesh-free particle can later be retrieved by means of that same key. The VorTessManager itself decides how these properties are calculated. This can be either by using a Voronoi mesh, by using a fixed Cartesian grid, or even by using a mesh-free method. The Voronoi mesh could be constructed from the positions of the particles, but it can also be evolved from the Voronoi mesh of a previous time step. The public version of Shadowfax for the moment only contains the ordinary (non-evolved) Voronoi mesh and the fixed Cartesian grid. The evolving mesh will be added as part of future work (Vandenbroucke & De Rijcke, in preparation).

The fluxes are calculated through faces, which could in principle be returned by the VorTessManager as VorFace objects, which contain the surface area and midpoint of the face, as well as the indices of the two neighbouring generators. However, due to the historical development of the code and the complexities involved with parallelizing the Voronoi mesh, the flux calculation is for the moment implemented as a method of the VorTessManager object, with the entire interpolation, integration and flux estimation procedure being implemented as a method of the VorFace object.
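The key-based interface described above can be reduced to a small sketch (all names and the placeholder volume calculation below are hypothetical, purely to illustrate the design: the caller only deals in coordinates and keys, while the manager hides how cell properties are obtained):

```cpp
#include <array>
#include <cassert>
#include <map>

// Hypothetical sketch of a key-based tessellation interface: generators are
// registered under an integer key, and cell properties are later queried by
// that same key. Whether the volumes come from a Voronoi mesh, a fixed
// Cartesian grid or a mesh-free method is hidden from the caller.
class TessellationManager {
public:
    void add_point(unsigned key, const std::array<double, 3>& x) {
        positions_[key] = x;
    }
    void construct() {
        // placeholder: give every cell the same volume, as a stand-in for an
        // actual Voronoi (or Cartesian, or mesh-free) construction
        for (const auto& kv : positions_) {
            volumes_[kv.first] = 1.0;
        }
    }
    double get_volume(unsigned key) const { return volumes_.at(key); }

private:
    std::map<unsigned, std::array<double, 3>> positions_;
    std::map<unsigned, double> volumes_;
};
```

The point of the design is that swapping the body of construct() changes the discretization without touching any code that consumes the volumes.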
The ordinary Voronoi mesh is contained in a VorTess object, which contains an actual list of the faces and of the cells, represented as VorCell objects. The VorCell object contains the necessary methods to calculate volumes and centroids, and in the current version of the code also the methods that estimate the gradients and calculate the generator velocities.

When adding a particle to the VorTess, the particle coordinates are mapped to the range [1, 2] and stored in a VorGen object, which represents a mesh generator. This object is stored in a DelTess object, which represents the Delaunay triangulation. The DelTess object also contains a list of Simplex objects, which represent the 2D triangles or 3D tetrahedra that constitute the triangulation. Each Simplex object has 3 or 4 VorGen objects as vertices (depending on the dimension). During the incremental construction algorithm, these simplices will change. When all particles have been added to the VorTess object, we call an appropriate method to signal this. At this point, a method of the DelTess object is called that will ensure completeness of the Delaunay triangulation (by adding inactive particles that are neighbours of active particles, and ghost generators for particles on other MPI processes). When this is done, the actual Voronoi mesh is constructed by looping over the VorGen objects stored in the DelTess and adding a VorCell for every VorGen. For every neighbour relation between VorGens, a VorFace is created. At the end of the procedure, we also calculate the volumes, centroids, surface areas and midpoints of all cells and faces.
When the flux calculation method of the VorTessManager object is called, we pass on a general RiemannSolver object, which is used to solve the Riemann problem at the faces, but also to convert between primitive and conserved variables, and to calculate the actual fluxes. The RiemannSolver class itself is just an interface, with the actual implementation being deferred to child classes. The public version of Shadowfax implements two Riemann solvers: an exact Riemann solver (ExactRiemannSolver class), and a two rarefaction Riemann solver (TRRSSolver class), which is an approximate solver that assumes a wave structure containing two rarefaction waves.

Which Riemann solver is used is specified as a run time parameter. To generate the appropriate RiemannSolver implementation, a RiemannSolverFactory class is used, which implements methods that convert a string representation of a Riemann solver type to an object. A single RiemannSolver object is created during program initialization and used throughout the simulation, so that the RiemannSolver can store some valuable information about the number of Riemann solver evaluations and the fraction of the run time spent in Riemann solver evaluations.

Since the Riemann solver contains all information about the actual equations being solved, it is possible to experiment with new equations of state by writing new RiemannSolver implementations.

For all tests presented in this paper, we use the exact Riemann solver.
Apart from the spatial discretization, the system of equations is also discretized in time. We have implemented an individual time stepping scheme, in which cell time steps are power of 2 subdivisions of the total simulation time. Fluxes are always exchanged using the smallest time step of the two cells involved, and the time step of a cell can only increase again if it then becomes synchronized with other cells on the same time step level. The system is always evolved using the smallest time step of all cells, but only those cells that are active at the beginning of the current system time step are actually integrated.

The size of the time step depends on the hydrodynamics and hence should ideally be set in the
RiemannSolver class. However, for problems involving gravity, there is also a gravitational time step criterion. Furthermore, the size of the time step is governed by a stability parameter (the Courant-Friedrichs-Lewy or CFL parameter), which is specified as a run time parameter for the program. We have hence opted to make the time step calculation a method of a TimeLine object, which takes both the hydrodynamical time step criterion and the gravitational time step criterion into account and applies the correct CFL parameter. To separate the hydrodynamical aspects, all hydrodynamically relevant information is encoded in two velocities: the fluid velocity and a sound speed, which is calculated by a method of the RiemannSolver object.

We also implemented an advanced, tree walk based time step criterion to adapt the cell time step in the presence of strong shocks. This criterion makes use of the template tree walks discussed below. However, for simulations involving regular flow and a high dynamic range, this criterion was found to be very expensive, so we disable it by default. It can be activated by setting the corresponding run time parameter.

The TimeLine object also keeps track of the system time, and provides methods to convert floating point time values to an internally used integer timeline (which is more convenient when using a power of 2 time step hierarchy). The TimeLine is also responsible for writing snapshot files.

Snapshots are dumps of the positions and primitive variables of the particles (along with other relevant quantities) at a particular system time. The time interval in between snapshots can be specified as a run time parameter and is stored in the TimeLine, together with a counter that keeps track of the number of snapshot files already written. Whenever the system time becomes equal to or larger than the time for the next snapshot, a signal is sent to an implementation of the SnapshotWriter interface, which is responsible for writing the snapshot. Note that this means that snapshots are not necessarily written at the exact time requested, if that time does not coincide with an actual system time. This is a consequence of our strict power of 2 time step hierarchy and the fact that we do not drift quantities in between system times. Usually however, the difference will be small. Note also that the use of individual time steps means that not necessarily all cells will be active at the time when a snapshot is written. In this case, the primitive variables that were calculated at the last point in time when the cell was active are used for that cell. Again, the difference will usually be small. The
SnapshotWriter interface defines a method that writes the actual snapshot. Two implementations are provided: ShadowfaxSnapshotWriter, which writes snapshots in the historically native Shadowfax format, and GadgetSnapshotWriter, which writes snapshots in the same format that is also used by gizmo and swift, and was one of the possible snapshot formats for Gadget2. Both formats are based on the HDF5 file format.

Initial condition files have the same format as snapshot files, and are read in using the appropriate implementation of the SnapshotReader interface. Both the format of the initial conditions and that of the snapshot files can be specified as run time parameters for the program, the default format being the Gadget format. Appropriate objects are generated using the SnapshotReaderFactory and SnapshotWriterFactory classes.
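The power of 2 integer timeline described above can be sketched in a few lines. The constants and helper functions below are our own illustration of the idea, not the actual TimeLine API: the full simulation interval maps onto a fixed integer range, and every allowed cell time step is a power of 2 subdivision of that range.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical sketch (not the actual Shadowfax TimeLine interface).
// The simulation interval [0, t_total] maps onto the integer range
// [0, 2^60]; every allowed time step is a power-of-2 subdivision.
constexpr uint64_t kTimelineLength = uint64_t(1) << 60;

// Round a desired physical time step down to the largest power-of-2
// subdivision of the total simulation time that does not exceed it.
uint64_t integer_timestep(double dt_desired, double t_total) {
    double fraction = dt_desired / t_total;
    // smallest k such that 2^-k <= fraction
    int k = static_cast<int>(std::ceil(-std::log2(fraction)));
    if (k < 0) {
        k = 0;
    }
    return kTimelineLength >> k;
}

// A cell may only double its time step if the larger step keeps it
// synchronized with the cells on the coarser time step level.
bool can_increase_timestep(uint64_t t_current, uint64_t dt_current) {
    uint64_t dt_new = dt_current << 1;
    return t_current % dt_new == 0;
}
```

For example, a desired time step of 0.3 times the total simulation time is rounded down to a quarter of the total time, the largest power of 2 fraction below it, and a cell sitting at integer time 2^60/8 with step 2^60/8 is not allowed to double its step, because it is not synchronized with the coarser level.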
Gravitational accelerations are calculated using a Barnes-Hut tree walk (Barnes and Hut, 1986), using a relative tree opening criterion (Springel, 2005). To this end, a hierarchical octree is constructed: a Tree object. This object holds a pointer to a single implementation of the Node interface, which is called the root of the tree. The Node interface has two implementations, TreeNode and Leaf, which correspond to a node and a leaf of the octree respectively. Each TreeNode has up to 4 or 8 children (depending on the dimension), which themselves are also Node implementations, and can be either a TreeNode or a Leaf. Each Leaf holds a pointer to a single Particle, which can be either a GasParticle or a DMParticle (see below). In parallel simulations, there is also a third type of Node, called a PseudoNode, which is used to represent a TreeNode on another MPI process.

The tree is constructed using an incremental construction algorithm, making use of the close link between the levels of the tree and the levels of a Hilbert space filling curve (Sundar et al., 2008), the latter also being used for the domain decomposition (Springel, 2005). To this end, we first calculate a Hilbert key for each particle, using the efficient algorithm of Jin and Mellor-Crummey (2005). We use a global octree on each process, with nodes that are entirely on other processes being represented by pseudo nodes.

The gravitational tree walk itself is in essence the same as in
Gadget2, using the same Barnes-Hut tree algorithm with relative opening criterion and Ewald summation to treat periodic boundaries, but it has been entirely rewritten to increase readability and code reuse. To this end, we have defined a general TreeWalker interface. Every tree walk, be it a tree walk to obtain gravitational accelerations, or a tree walk to find the neighbours of a particle within a given radius, is then represented by an implementation of this interface. The interface itself defines methods that correspond to the different tasks during a tree walk: a method to check whether a node of the tree should be opened or treated approximately, a method called when a leaf is encountered, and a method that is called when a pseudo node is encountered and checks whether the tree walk should be continued on another process. Apart from this, the interface also defines methods that initialize the variables used during the tree walk, and a method that is called when the tree walk is finished.

Every possible tree walk is implemented as a single method of the Tree class, using the concept of C++ templates. To this end, the tree walk method takes the name of a TreeWalker implementation as a template argument, as well as a list of particles for which the tree walk should be performed. For every particle in the list, a corresponding TreeWalker object is created and used to walk the tree: we open the root node of the tree and apply the method that checks if a node should be opened on its children. If a node should be opened, we continue with its children; if not, we execute all calculations that are necessary to get the approximate contribution of the entire node to the tree walk, using the node properties. If a leaf is encountered, the properties of the corresponding particle are used, and we continue with the next node or leaf on the same level.

By using class methods to define the different steps in the tree walk, the code is a lot easier to read, since, for instance, the code that decides if a node should be opened is now in a separate method that takes a TreeNode as its single parameter. We do not need to worry about how to implement the tree walk itself efficiently, only about what happens when a node or leaf is encountered. By using C++ templates, we limit the overhead usually involved with run time polymorphism, since the code for a specific type of TreeWalker is generated at compile time.

Our approach has some benefits for MPI parallelization as well, since all explicit communication (and the design of an effective communication scheme) is limited to the single tree walk method. The only tree walk specific things we need to do are deciding if communication is necessary (which is in most cases similar to deciding whether or not to open a node), deciding what information to communicate, and how the result of the tree walk on another process should be communicated back to the original process. To this end, our TreeWalker interface defines two subclasses, called Export and Import. The different physical components of the simulation are all represented by
Particles, which is an abstract class holding a position and velocity, as well as a unique identifier (ID), which can be used to trace a single particle throughout different snapshots. Currently, two subtypes of Particle are supported: GasParticles and DMParticles, representing the generators of the Voronoi mesh and a collisionless cold dark matter component respectively. We have adopted the name GasParticle instead of cell to reflect the fact that our hydrodynamical method is just an alternative for common particle-based hydrodynamical integration schemes. For the gravitational calculation, the gas is treated as if it consists of particles.

The GasParticle class extends the particle data with two StateVector members, representing the primitive and the conserved quantities for that cell. It also holds a number of other variables required by the hydrodynamical integration scheme. The DMParticle only holds a particle mass and some auxiliary variables for the gravitational calculation.

The Particle class itself extends the HilbertObject interface, which links a space filling Hilbert key to it. Every class that implements the HilbertObject interface can be sorted using an efficient ParallelSorter, based on Siebert and Wolf (2010). The ParallelSorter takes care of the domain decomposition and data size based load-balancing across all MPI processes. The domain decomposition is currently only based on the number of particles, and tries to assign equal numbers to all processes. This simple approach only works for homogeneous setups with small numbers of processes, and will be replaced by a more advanced cost-based domain decomposition in future versions of the code.
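The template tree walk pattern introduced above can be illustrated with a much-simplified sketch. SimpleNode, MassWalker and walk_tree below are hypothetical stand-ins for the actual Tree and TreeWalker classes: the walk logic is written once, while a policy class decides what happens at nodes and leaves, and the policy type is resolved at compile time, so there is no virtual-call overhead in the inner loop.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, much-simplified tree node (not the actual Shadowfax
// Node/TreeNode/Leaf hierarchy).
struct SimpleNode {
    double mass;                      // total mass of the node
    double size;                      // geometric size of the node
    std::vector<SimpleNode> children; // empty for a leaf
    bool is_leaf() const { return children.empty(); }
};

// Policy class in the spirit of a TreeWalker implementation: here it
// simply sums the mass, opening nodes larger than a threshold.
class MassWalker {
    double threshold_;
    double total_ = 0.;
public:
    explicit MassWalker(double threshold) : threshold_(threshold) {}
    bool should_open(const SimpleNode& n) const { return n.size > threshold_; }
    void use_node(const SimpleNode& n) { total_ += n.mass; }
    void use_leaf(const SimpleNode& n) { total_ += n.mass; }
    double result() const { return total_; }
};

// The single, reusable tree walk: the Walker type is a template
// argument, so the specialized code is generated at compile time.
template <typename Walker>
void walk_tree(const SimpleNode& node, Walker& walker) {
    if (node.is_leaf()) {
        walker.use_leaf(node);
    } else if (walker.should_open(node)) {
        for (const auto& child : node.children) {
            walk_tree(child, walker);
        }
    } else {
        walker.use_node(node);
    }
}
```

A gravity walker or a neighbour-search walker would only have to provide its own should_open, use_node and use_leaf methods; the traversal (and, in the real code, the MPI communication) stays in one place.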
Particles of all types are stored in a ParticleVector, which is a specialized wrapper around two standard C++ vector objects. The class has member methods to separately access Particles of both types, and is also responsible for maintaining the Tree and storing some general information about the simulation.
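A minimal sketch of such a two-vector wrapper might look as follows; the names are illustrative, not the actual ParticleVector implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative stand-ins for the actual particle classes.
struct GasParticleSketch { unsigned long id; };
struct DMParticleSketch { unsigned long id; };

// Hypothetical sketch of a ParticleVector-like wrapper: gas and dark
// matter particles live in two separate std::vectors, but are
// presented as one collection with typed access to either subset.
class ParticleVectorSketch {
    std::vector<GasParticleSketch> gas_;
    std::vector<DMParticleSketch> dm_;
public:
    void add(const GasParticleSketch& p) { gas_.push_back(p); }
    void add(const DMParticleSketch& p) { dm_.push_back(p); }
    std::size_t gas_size() const { return gas_.size(); }
    std::size_t dm_size() const { return dm_.size(); }
    std::size_t size() const { return gas_.size() + dm_.size(); }
    GasParticleSketch& gas(std::size_t i) { return gas_[i]; }
    DMParticleSketch& dm(std::size_t i) { return dm_[i]; }
};
```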
We have adopted the strategy used by swift, and force all quantities that are used as input or output of our program to have units attached to them. To this end, both snapshot formats have appropriate blocks specifying in what units the given data are expressed, given in terms of a reference unit system, which is either the SI or CGS system. If no units are specified in the initial condition file, SI units are assumed by default. The output units can be chosen as a run time parameter, as well as the internally used units. Run time parameters that should have units are assumed to be in internal units, the default being SI units.

In principle, it does not matter what units are used internally in our code, although using a system of units in which variables have values close to unity can be advantageous. Furthermore, all run time log information will be expressed in internal units, so it is useful to have some idea of what units are used.

To simplify working with units, we have implemented a Unit class. This class stores the quantity for which the unit is used, expressed as a combination of the three basic quantities length, mass and time, and the value of the unit in SI units. Temperature and current should be added to the set of basic quantities to be able to express all possible quantities, but these are not used for the variables that are currently evolved in Shadowfax.

Units can be multiplied with or divided by one another, yielding a new Unit. To make working with quantities in this case possible, we have adopted some conventions of how quantities should be combined. We also made it possible to compare different Units, with two Units being compatible if their quantities are equal. Quantities with compatible Units can be converted into each other by using a UnitConverter. A combination of a length, mass and time unit defines a complete unit system, a UnitSet, which is what we specify as a run time parameter for the program. We currently support three different unit systems: SI units (m, kg, s), CGS units (cm, g, s) and galactic units (kpc, M⊙, Gyr), which can be generated using the UnitSetGenerator class.

Physical constants also have units, but usually their value is fixed. We have hard coded the values of relevant physical constants (currently only the gravitational constant G = 6.674 × 10⁻¹¹ m³ kg⁻¹ s⁻²) in SI units, and store their value in internal units in a Physics object. We have provided a mechanism to override the value of the gravitational constant at run time for test purposes, but there are no plans to generalize this approach to other physical constants.
Since Shadowfax is meant to be used on small clusters for simulations that take several days or even weeks to complete, we need to address the possibility that runs might be interrupted, either by limits on the use of infrastructure, or by hardware failure. Since the initial condition files and snapshot files have the same format, it is always possible to restart a run from the last snapshot. However, since a snapshot is not necessarily written at a time when all cells are active, this will affect the outcome of the simulation.

We therefore implemented an explicit restart mechanism that dumps the entire simulation to binary files and then restarts it as if the run never stopped. This is represented by a RestartFile object. This object has two template methods, write and read, with several specializations, to write values in all sorts of formats to the binary file without the need to explicitly type cast anything. All objects in the simulation that need to be dumped to the restart file either implement a dump method, which writes member variables to the restart file, or are written entirely to the restart file using one of the write specializations. When restarting the run, objects are either created using a special constructor which initializes values by reading them from the restart file, or are read entirely from the file using an appropriate read specialization. To ensure a proper working of the restart mechanism, we only need to make sure that values are written and read in the same order.

Since restarting a run is only meant to be done when the run was somehow interrupted, the RestartFile also writes a short summary file, containing compilation, run time and version information about the program that created the dump. When restarting, this file is read in and the information is compared with the running program. Only when the same version of the code, with the same compilation time and the same run time environment, is used do we allow the code to restart.
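The ordered binary dump idea can be sketched as follows. RestartFileSketch is a hypothetical illustration of the write/read template pattern, not the actual RestartFile class; the only contract is that values are read back in exactly the order in which they were written.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical sketch of a RestartFile-style binary dump (not the
// actual class): templates write and read raw values without explicit
// casts, relying on a fixed write/read order.
class RestartFileSketch {
    std::fstream stream_;
public:
    RestartFileSketch(const std::string& name, bool writing) {
        std::ios_base::openmode mode =
            writing ? std::ios::out | std::ios::binary
                    : std::ios::in | std::ios::binary;
        stream_.open(name, mode);
    }
    // Write a trivially copyable value as raw bytes.
    template <typename T>
    void write(const T& value) {
        stream_.write(reinterpret_cast<const char*>(&value), sizeof(T));
    }
    // Read it back; must be called in the same order as write.
    template <typename T>
    void read(T& value) {
        stream_.read(reinterpret_cast<char*>(&value), sizeof(T));
    }
};
```

In the real code, compound objects would add a dump method (and a matching restart constructor) built on top of these primitives; class types with pointers or containers need their own specializations rather than a raw byte copy.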
The current version of the code still contains traces of the way in which it was originally developed, which do not comply with our strict design goals. However, refactoring the code to eliminate these implies a major update, and is postponed to a future version of the code. We give a brief overview below.

• VorFace still contains the entire gradient reconstruction and prediction step, as well as the flux calculation. These should be isolated into a new class that uses the properties of the VorFace, but can be decoupled from its geometric meaning.

• Similarly, VorCell still contains the gradient estimation and the generator velocity calculation. These should be calculated inside a new class that uses the VorCell properties.

• Currently, only the time step tree walk and the gravitational tree walk use a template TreeWalker; the neighbour search is still hardcoded in the Tree class.

• The MPI communication for the cells is strongly coupled to ghost VorGens stored in the DelTess. As a result, all cell communication has to go through the VorTessManager, which is not ideal.

• The domain decomposition does not take into account the effective computational cost on an MPI process, but is only based on particle number.

• The FixedGrid only works if the particles have specific positions and do not change their positions. This means the VorTessManager should have control over the positions of the particles, and no other class.
4. Setup and analysis
Apart from the main simulation program, the public release of Shadowfax also includes two auxiliary programs that can be used to generate initial condition files, and to convert snapshot files to VTK files that can be easily visualized using common visualization packages. These are discussed below.

Since the default file format for Shadowfax is the same as for a number of important other astrophysical codes, it is possible to use software written for those codes to generate initial conditions for Shadowfax, or to analyse Shadowfax results. We however also provide our own initial condition generating software, which is part of Shadowfax. It is also worth noting that HDF5 has a user-friendly Python interface, h5py.

Our initial condition program is called icmakerXd (with X being 2 or 3), and is based on the initial condition mechanism implemented in the AMR code ramses (Teyssier, 2002). The simulation box is divided into a number of geometrical regions, each having a geometrical shape and values for the primitive hydrodynamical variables inside that region. Multiple regions can overlap; in this case the values for the region that was last added are used in the overlap region.

The shape of the region is set by defining the position of the origin o = (o_x, o_y, o_z) of the region, and 3 widths, w_x, w_y, w_z, together with an exponent e. The latter is used to determine if a point p = (p_x, p_y, p_z) lies inside the region. For this, the inequality

[((o_x − p_x)/w_x)^e + ((o_y − p_y)/w_y)^e + ((o_z − p_z)/w_z)^e]^(1/e) ≤ 1

is evaluated. We extend the ramses scheme and do not only allow constant values for the primitive variables inside a single region, but also more complex expressions, containing mathematical operations (+, −, ×, /), basic mathematical functions (cos, sin, ...), mathematical constants (π), and even coordinate expressions (x, y, z and r). Support for these expressions is provided by Boost Spirit.

icmakerXd has support for regular Cartesian setups, but also for random unstructured grids, whereby grid generators are sampled according to the hydrodynamical density of the regions, using rejection sampling. To this end, the (potentially) complex density profiles inside the different regions are numerically integrated to obtain weighing factors. Allowing complex expressions makes this computationally expensive, but makes the program a powerful tool for initial condition generation.

To smoothen out Poisson noise in random generator setups, we apply 10 iterations of Lloyd's algorithm (Lloyd, 1982). This yields a more regular initial mesh.
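The region membership inequality above translates directly into code. The sketch below is our own helper (with absolute values added so that odd exponents behave sensibly, an assumption on our part); for e = 2 the region is an ellipsoid, and for large e it approaches a box.

```cpp
#include <cassert>
#include <cmath>

// Sketch of the icmakerXd region membership test: a point p lies
// inside a region with origin o, widths w and exponent e if the
// generalized-ellipsoid inequality evaluates to <= 1. The function
// name and the use of absolute values are our own assumptions.
bool point_in_region(const double p[3], const double o[3],
                     const double w[3], double e) {
    double sum = 0.;
    for (int i = 0; i < 3; ++i) {
        sum += std::pow(std::fabs((o[i] - p[i]) / w[i]), e);
    }
    return std::pow(sum, 1. / e) <= 1.;
}
```

A rejection sampler for a random unstructured grid would draw candidate positions in the box, keep those for which point_in_region holds for the desired region, and accept them with a probability proportional to the local density.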
Visualizing a Voronoi mesh is not so straightforward, especially in the case of a 3D mesh. Furthermore, the vertices of the Voronoi mesh are not written to the snapshot files, so the mesh needs to be recalculated if we want to visualize the mesh corresponding to some snapshot file. This requires a Voronoi construction algorithm, and hence we have written an auxiliary program, called vtkmakerXd, that converts a regular snapshot to a dump of the Voronoi mesh, in the VTK file format. This file format can be read by software that makes use of the powerful Visualization Toolkit (VTK), the most commonly used examples being Paraview and VisIt.

We have also written a plugin for VisIt that reads in the default (Gadget) snapshot format. It should also be possible to use the eXtensible Data Model and Format (XDMF) and the XDMF plugins for VisIt and Paraview, an approach taken by e.g. swift. However, the current version of Shadowfax does not write the necessary XDMF file, so it needs to be manually created.
5. Basic tests
To validate the code, we have used it to evaluate a number of test problems. These have been gathered into a testsuite, together with the necessary files to create the initial conditions, run the simulations, and analyze the results. Where possible, we have provided analytic solutions to compare with.

These tests are meant to be run as a general code check after every significant change in the code, and for this reason use relatively low resolution grids. We therefore will not focus on obtaining the best possible accuracy. Rather, we will focus on accuracy when we compare Shadowfax with other publicly available codes in a later section.

In this section, we describe the physical test problems currently in the testsuite and discuss their results. We do not describe tests that are used to verify the proper working of the program itself, like the restarttest, which checks if the program correctly restarts from restart files.
This test problem consists of a uniform box with unit length, in which a fluid with density 0.125 and pressure 0.1 is initially at rest, and in which a central sphere with radius 0.25, density 1 and pressure 1 is inserted. This test corresponds to one of the Riemann solver tests in Toro (2009), but generalized to 2 or 3 dimensions to test geometrical aspects of the code as well.

The solution of this problem consists of an inward travelling rarefaction wave, a central contact discontinuity, and an outward travelling shock wave. The simulation results in 2 and 3 dimensions at time t = 0.1 are shown in Figure 5.

Figure 5: Radial density profile for the spherical overdensity test at time t = 0.1. Left: 2D result, using 10,000 uniformly sampled cells. Right: 3D result, using 100,000 uniformly sampled cells. The black dots are the simulation results, the red line corresponds to the solution of the high resolution 1D equivalent problem. To limit the number of data points, the simulation results have been binned; the standard deviation of the density within the bins is indicated by the error flags.

For this test, a vortex in hydrostatic equilibrium is evolved for some time to check the local conservation of angular momentum. Inside a box with unit length and constant density 1, a 2D azimuthal velocity profile of the form (Springel, 2010)

v_φ(r) = 5r          0 ≤ r < 0.2
         2 − 5r      0.2 ≤ r < 0.4
         0           0.4 ≤ r          (12)

is balanced by a pressure profile of the form

p(r) = 5 + 25r²/2                          0 ≤ r < 0.2
       9 + 25r²/2 − 20r + 4 log(r/0.2)     0.2 ≤ r < 0.4
       3 + 4 log(2)                        0.4 ≤ r.          (13)

In 3D, we use the same profile and expand it to cylinders in the third dimension, in a box of size 1 × 1 × 1/3. We evolve the setup to time t = 3, and compare the velocity and pressure profiles with the initial profiles. The results are shown in Figure 6.

Both the 2D and 3D tests conserve the hydrostatic equilibrium relatively well over a long time scale. This means local angular momentum is conserved to a reasonable degree.

This problem is meant to test the limits of the code: the Riemann solver, the mesh regularization algorithm, and the time step criterion. It consists of a box with unit length in which a cold medium with density 1 and a negligibly small pressure is at rest. In the central cell, we set the pressure to a much higher value, which corresponds to an energy input of 1, so that a strong explosion is initiated. To accurately capture the explosion, it is important that

• the central cells are kept regular at the start of the simulation,

• the cells surrounding the centre are given small enough individual time steps to be active when the shock arrives,

• the Riemann solver is able to handle vacuum generating conditions to correctly estimate fluxes around the central cell.

The resulting shock profile is self-similar and has an analytic solution (Sedov, 1977), which is shown together with the simulation results in Figure 7. Again, the simulation results are in line with the analytic solution, and the shock is well-resolved.

This test is used to validate the gravitational part of the code, and uses cold dark matter instead of gas. Since gravity is not guaranteed to work in 2D, this problem is only provided in 3D.

A Plummer sphere (Plummer, 1911) with mass 1000 and scale parameter 1 is initiated inside a large box (the actual box is irrelevant for this specific problem, but Shadowfax always requires a simulation box to be present). The velocities of the particles are chosen so that the entire problem is independent of time. For convenience, we set the gravitational constant G = 1 for this problem. We adopt a small gravitational softening length and evolve the system for ∼ 10 dynamical times. Figure 9 shows the relative error on the total energy. We see that the system remains stable and that the total energy is quite accurately preserved.

Figure 6: Radial velocity profile for the Gresho vortex test at time t = 3. Both the 2D and the 3D simulation use 10,000 uniformly sampled cells. The black dots represent the binned simulation results, with the error flags indicating the standard deviation of the values within the bins. The red line represents the initial velocity profile, which should remain constant.

Figure 7: Density profile for the Sedov-Taylor blast wave test at time t = 0.1. Left: 2D version, using an initially Cartesian grid with 45 × 45 cells. Right: 3D version, using an initially Cartesian grid with 45 × 45 × 45 cells. The black dots represent the binned simulation results, with the error flags indicating the standard deviation of the density values within the bins. The full red line is the analytical solution of Sedov (1977).

Figure 8: Density profile of the N-body test at the start (t = 0) and end (t = 1) of a simulation using 10,648 cold dark matter particles. The density is calculated by summing the masses of all particles within spherical shells. The dots and triangles represent simulation results; the full blue line is the theoretical Plummer density profile from which the initial condition is sampled.

Figure 9: The relative energy error of the N-body test as a function of time. The gray line shows the energy error at all times, the blue line shows the energy error at the times when all particles are active.
This problem tests the coupling between hydrodynamics and gravity, and consists of a self-gravitating gas cloud with mass 1 and radius 1, with a centrally peaked density profile of the form (Springel, 2010)

ρ(r) ∝ 1/(r (r + ε))     r ≤ 1
       0                 1 < r,          (14)

with ε a small constant that softens the central singularity, and with the normalization fixed by the total mass. The cloud is initially at rest and has a very low pressure profile of the same form,

p(r) ∝ ρ(r),          (15)

corresponding to a low constant thermal energy. We set the gravitational constant G = 1 for this test and only consider a 3D version. A small gravitational softening length is used. The results at time t = 0.81 are shown in Figure 10, together with a high resolution result for the equivalent 1D problem obtained using a finite volume method on a fixed 1D grid, and the solution of the same setup using the SPH code Gadget2. The relative error on the total energy is shown in Figure 11.

The virializing shock is clearly resolved and travels at the expected speed. We see that the total energy fluctuates when the shock is formed and stabilizes after the system has virialized, which indicates that the gravitational correction terms in our scheme are not entirely effective at the current resolution. Better energy conservation can be obtained by using a higher resolution, but this is too computationally expensive for the testsuite.
6. Convergence rate
The MUSCL-Hancock finite volume method implemented in Shadowfax is nominally second order in both space and time. However, due to the use of slope limiters and flux limiters, the actual order of the method can be lower in the presence of strong discontinuities. To test this, we study the convergence rate as a function of the number of cells for a number of different tests: a one dimensional smooth travelling sound wave, a one dimensional shock tube and a two dimensional vortex.

Figure 10: Profiles of the Evrard collapse test at time t = 0.81, when the central virializing shock has formed, but has not yet reached the boundary of the cloud. Top: density profile; middle: radial velocity profile; bottom: entropy profile. The simulation used 20,000 cells in total, some of which are vacuum cells surrounding the cloud. The black dots represent the binned simulation results, with the error bars indicating the standard deviation of the density values within the bins. The red line is the solution of the equivalent 1D problem. The gray points are the solution for the same setup, but using the SPH code Gadget2.

Figure 11: The relative energy error for the Evrard collapse test as a function of time. The gray line represents the energy error at all times, the blue line represents the energy error at times when all cells are active.

The initial condition for this test consists of a periodic cubic box with unit side containing a sound wave with a small amplitude A. The density is given by

ρ(x) = 1 + A sin(2πx),          (16)

the pressure is p(x) = ρ(x)^γ/γ, with γ = 5/3, and the velocity is zero everywhere. The wave travels with the sound speed, c_s = 1 for this particular choice of pressure.

Although this problem is formally one dimensional, we study it in 3D, using N³ comoving cells, with N the number of cells in one dimension. We use both an initial Cartesian grid (which remains Cartesian throughout the simulation as the fluid velocity is zero), and a random uniform grid. We expect the convergence to be better in the former case, as the Cartesian solution corresponds to a combination of N² independent 1D solutions. For a random uniform grid, the cells are irregularly shaped, and the problem is effectively multi-dimensional.

We study the convergence rate by comparing the solution ρ_i at t = 1, when the wave has travelled for one box length, with the initial condition ρ(x_i) at t = 0. To this end, we calculate the L1 norm

L1 = (1/N) Σ_i |ρ_i − ρ(x_i)|,          (17)

where the sum runs over all cells i.

Figure 12 shows the L1 norm as a function of the equivalent 1D cell number N. Both the Cartesian and the random uniform grid show close to N⁻² scaling, indicating that the method is indeed second order in space and time.

Figure 12: L1 norm as a function of 1D cell number for a travelling sound wave in 3D, for an initially Cartesian grid (fitted slope ∼N⁻²·³¹) and a uniform random grid (fitted slope ∼N⁻²·⁰⁴). The blue dots are the simulation results, the red line is a least squares fit.
A more demanding test problem is the Sod shock test, which consists of a reflective cubic box with unit length in which the density is given by

ρ(x) = 1          x < 0.5
       0.125      0.5 ≤ x,          (18)

and the pressure by

p(x) = 1          x < 0.5
       0.1        0.5 ≤ x.          (19)

The initial velocity is zero everywhere. This problem is the 1D equivalent of the spherical overdensity discussed as part of the testsuite, and its solution has the same characteristic wave components, including a contact discontinuity and a shock wave, for which we expect the convergence rate of our scheme to be worse than second order.

We evolve the test to t = 0.12, and calculate the L1 norm in the same way as above, but with ρ(x_i) now given by the exact solution of the equivalent Riemann problem at t = 0.12, which can be found using an exact Riemann solver.

Figure 13 shows the L1 norm as a function of the equivalent 1D cell number N, again for both a Cartesian initial grid and a random uniform initial grid. In this case, convergence is clearly worse than N⁻², and is not even first order in space and time. There no longer is a clear difference between the Cartesian grid and the uniform random grid.

Figure 13: L1 norm as a function of 1D cell number for a Sod shock test in 3D, for an initially Cartesian grid (fitted slope ∼N⁻⁰·⁸⁴) and a uniform random grid (fitted slope ∼N⁻⁰·⁸⁴). The blue dots are the simulation results, the red line is a least squares fit.

To study the convergence in a manifestly multi-dimensional problem, we also consider the 2D Gresho vortex, which we already encountered as part of the testsuite. We run four simulations with respectively 1,000, 10,000, 50,000 and 100,000 random uniform cells until t = 3, and calculate the L1 norm for the azimuthal velocities v_φ,i using the time independent initial condition v_φ(r) given above:

L1 = (1/N) Σ_i |v_φ,i − v_φ(r_i)|.          (20)

In Figure 14, we show the L1 norm as a function of the 1D equivalent cell number N′ = √N. Again, convergence is not second order, but it is clearly better than in the case of a strong shock, as the Gresho vortex is overall more smooth.
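The L1-norm convergence measurements above can be reproduced with two short helpers. These functions are our own illustration of the procedure, not part of Shadowfax: one computes the L1 norm of equation (17), the other estimates the convergence order from two resolutions as the slope in log-log space.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// L1 norm of eq. (17): mean absolute deviation between a numerical
// solution and the exact solution, sampled at the same points.
double l1_norm(const std::vector<double>& numeric,
               const std::vector<double>& exact) {
    double sum = 0.;
    for (std::size_t i = 0; i < numeric.size(); ++i) {
        sum += std::fabs(numeric[i] - exact[i]);
    }
    return sum / numeric.size();
}

// Convergence order p such that L1 ~ N^-p, estimated from two
// (cell number, L1 norm) measurements.
double convergence_order(double n1, double l1a, double n2, double l1b) {
    return -std::log(l1b / l1a) / std::log(n2 / n1);
}
```

For a second order method, doubling the cell number should divide the L1 norm by four, which convergence_order reports as an order of 2.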
7. Performance
Modern high performance computing systems consistof large numbers of computing nodes, each of which canhave multiple CPUs, which are made up of multiple com-puting cores. These systems are hence highly parallel andemploying their full power requires algorithms that canexploit this parallelism. Compared to the available com-puting power, memory is relatively scarce on most system,so that it is also important to minimize the memory im-print of an application.The current version of
Shadowfax was not optimizedfor usage on high performance systems, but some basicMPI communication instructions were added to make itrun on distributed memory systems. To fully exploit thepower of modern architectures, a hybrid algorithm thatcombines distributed memory parallelism with shared mem-ory parallelism is needed, as well as a better representation16 N ′ -4 -3 -2 -1 L ∼N −1.12 Figure 14: L1 norm as a function of 1D cell number for a 2D Greshovortex test. The blue dots represent simulation values, the red lineis a least squares fit. of the Voronoi mesh that depends less on a globally con-structed mesh. This requires a thorough refactoring of thecode, which is left for future versions.In this section, we discuss the strong and weak scal-ing of the current version of the code, the contribution ofthe various components of the code to the total run time,and their memory imprint. For all these tests, we use theEvrard collapse test discussed above, which uses both thehydrodynamical solver and the N-body solver. Since thissetup is very inhomogeneous, our simple domain decom-position leads to serious load-imbalances. For comparison,we also show scaling results for the more homogeneousspherical overdensity test.
Strong scaling measures the decrease in total simulation run time when running the same simulation on an increasingly large number of processes. Ideally, doubling the number of processes should halve the total run time, but due to communication overhead this can never be achieved. As communication scales with the surface area of the different computational domains in the simulation, while the computation time per domain scales with the domain volume, we expect the relative contribution of communication to the total run time to increase with increasing process number. This poses a natural limit on the speedup that can be achieved for a given problem size.

Figure 15 shows the strong scaling for two of the test problems that are part of the testsuite: the 3D Evrard collapse and the 3D spherical overdensity. The tests were run on a single node of our local computing cluster, consisting of 4 2.7 GHz Intel Xeon CPUs with 8 cores each, using version 4.8.4 of the GNU compiler and OpenMPI 1.6.5. The top row shows the speedup, i.e. the ratio of the single process total run time and the parallel run time. The bottom row shows the fraction of the total run time spent in MPI functions: send and receive operations, and idle time due to load imbalances.

Figure 15: Speedup and MPI fraction for two different tests from the testsuite (Evrard collapse and spherical overdensity) under strong scaling.

We see that the speedup is rather poor, except for small process numbers, and that the MPI fraction increases significantly with increasing number of processes. This is to be expected, since load balancing is in no way optimized in the current version of the code. For a homogeneous setup like the spherical overdensity this leads to reasonable scaling, but for strongly inhomogeneous tests like the Evrard collapse this is disastrous.
Weak scaling is the scaling behaviour of the code when increasing the number of processes for a fixed problem size per process, i.e. if we double the number of processes, we double the number of cells in the simulation as well. It is a good measure for how well the problem is split up over the different processes, as we expect the amount of work and the amount of communication per process to stay roughly constant.

Figure 16 shows the weak scaling for the 3D Evrard collapse and 3D spherical overdensity, with a nominal load of 10,000 cells per process for both tests. The tests were carried out on the same hardware as the strong scaling tests discussed above. Instead of the speedup, we now show the slowdown in the top row, i.e. the ratio of the total parallel run time and the serial run time. Again, we have rather poor scaling.
An important part of a Shadowfax simulation is the construction of the Voronoi mesh that is used for the hydrodynamical integration. On average, the serial version of our code can handle ≈ …,000 Voronoi cells per second in 2D, and ≈ …,000 Voronoi cells per second in 3D, including the treatment of reflective or periodic boundaries and the calculation of cell centroids and volumes, and face midpoints and surface areas.

Figure 16: Slowdown and MPI fraction for two different tests from the testsuite under weak scaling. The red dashed line corresponds to a 1:1 relation, while the full red lines correspond to unity.

Figure 17 shows the amount of time spent in various parts of the code during serial runs with different problem sizes. The current design of the code makes it very hard to extract the time spent in MPI communications from the times for the different components, so that we do not show parallel results.

The gravitational force calculation clearly makes up a large fraction of the total run time, which increases if the number of cells is increased. Grid construction in 3D takes up about double the time that is spent in the actual hydrodynamical integration. Only very little time is spent in other parts of the algorithm, e.g. particle sorting, tree construction, snapshot output... Program performance hence does not suffer due to the somewhat less efficient but much more convenient object oriented design of these parts.
To test the memory consumption of the code, we used Massif, a heap profiler which is part of the Valgrind instrumentation framework (http://valgrind.org/; http://valgrind.org/docs/manual/ms-manual.html). Since we do not expect the memory imprint to vary much over time, we limited the Evrard test to t = 0.06 for this test.

Figure 18 shows the fraction of the memory occupied by various parts of the code during the final Massif snapshot, when the code used 137 MB of memory in total (including empty blocks used for memory alignment). A large fraction of this memory is occupied by the Voronoi mesh.

Figure 17: Fraction of total run time spent in different parts of the code as a function of the number of cells, for the Evrard collapse (various, gravity, grid construction, hydrodynamics) and the spherical overdensity (various, initialization, grid construction, hydrodynamics).

Figure 18: The fraction of the memory occupied by various parts of the code during an Evrard collapse test with 20,000 cells: Voronoi cells 28.5%, Voronoi faces 28.0%, Delaunay grid 23.3%, gas particles 9.3%, other 10.9%.
8. Comparison with other methods
In this section, we compare Shadowfax with a number of other publicly available hydrodynamical solvers, to illustrate the advantages and disadvantages of the moving mesh method. All files necessary to run these tests are available from .

It is a known problem of particle-based Lagrangian methods like SPH that they have difficulties resolving Kelvin-Helmholtz instabilities (Agertz et al., 2007), since the basic SPH equations do not cover discontinuous solutions, like shock waves and contact discontinuities. As Springel (2010) showed, a moving mesh method does not experience these difficulties, as the Riemann problem based finite volume method that is used does include discontinuous solutions.

One of the main advantages of a Lagrangian method over methods that use a fixed discretization (like AMR) is the Galilean invariance of the method. This means that the flux between two neighbouring cells will only depend on the relative velocity of the two cells, and not on their absolute velocities with respect to some reference frame fixed to the simulation box. As a result, we expect a moving mesh method to better resolve instabilities in a fluid that is moving with a high bulk velocity with respect to the simulation box reference frame. To quantify this behaviour, we set up a variant of the shearing layers test introduced in Agertz et al. (2007).

For this test, two layers which can have different densities are in pressure equilibrium inside a periodic box. The layers receive a velocity component parallel to the interface between the layers, but with opposite sign, so that they shear against each other. Due to the pressure equilibrium, the system is marginally stable: a small velocity component perpendicular to the interface between the layers will exponentially grow to form a Kelvin-Helmholtz instability. The instability first goes through a linear phase of exponential growth, after which the system becomes highly non-linear.

The simple setup described here is unstable on all scales, so that instabilities can even be seeded by numerical noise in the absence of diffusion. As a result, the results are highly dependent on the resolution and the details of the numerical scheme. This makes a thorough comparison of different methods completely impossible (Lecoanet et al., 2016).

As Hendrix and Keppens (2014) point out, introducing a middle layer with a linear transition in flow velocity in between the shearing layers will suppress small scale instabilities, so that the growth of instabilities no longer depends on the numerical resolution (if it is high enough). The middle layer also introduces a maximally unstable wavelength, which we will seed. This gives us full control over the instabilities that will grow. We first discuss the non-linear growth of the instability in a setup with a density contrast of 10 between the layers, to show that Shadowfax qualitatively produces similar instabilities. We then study the convergence of the linear growth rate in a setup without density contrast.
We set up a periodic 2D box with unit length, in which the density is given by

\[
\rho(x, y) = \begin{cases} 1 & y < 0.25 \\ 10 & 0.25 \leq y \leq 0.75 \\ 1 & 0.75 < y. \end{cases} \qquad (21)
\]

The x component of the velocity is given by

\[
v_x = \begin{cases}
-0.5 & y \leq 0.25 - d \\
-0.5 + \frac{y - 0.25 + d}{2d} & 0.25 - d < y < 0.25 + d \\
0.5 & 0.25 + d \leq y \leq 0.75 - d \\
0.5 - \frac{y - 0.75 + d}{2d} & 0.75 - d < y < 0.75 + d \\
-0.5 & 0.75 + d \leq y,
\end{cases} \qquad (22)
\]

with d = 0.025 the thickness of the middle layer. The y component of the velocity is

\[
v_y = A \sin(4\pi x) \left( e^{-(y - 0.25)^2 / (2\sigma^2)} + e^{-(y - 0.75)^2 / (2\sigma^2)} \right), \qquad (23)
\]

with A = 0.…, σ = 0.…, and a constant pressure p = 2.5.

We compare the results obtained with Shadowfax with results obtained using the AMR code MPI-AMRVAC (Keppens et al., 2012; ascl:1208.014), using the same initial condition (we use a Cartesian initial grid for the Shadowfax runs). MPI-AMRVAC supports different hydrodynamical schemes; we use both a conservative finite difference scheme with global Lax-Friedrich splitting and a fifth order spatial reconstruction (hereafter called FD), and a finite volume scheme with a Harten-Lax-van Leer Contact (HLLC) solver (hereafter called FV). Both schemes use a fourth order accurate Runge-Kutta time integration scheme.

To test the Galilean invariance of the code, we optionally add a bulk velocity v_bulk = 100 to the entire fluid (corresponding to a Mach number of 155 in the high density layer), so that the fluid moves with respect to the simulation box reference frame. The results of a low resolution run at time t = 1.5 are shown in Figure 19.
Figure 19: Density colour plot for the shearing layers test at time t = 1.5 (columns: Shadowfax, MPI-AMRVAC FV, MPI-AMRVAC FD). The top row corresponds to simulations with v_bulk = 0, while the bottom row corresponds to simulations with v_bulk = 100. All simulations start from a 100 × 100 Cartesian grid and have a fixed number of cells. The individual cells are shown, which explains the irregularities at the boundaries of the Shadowfax plot.

Figure 20: Density colour plot for the shearing layers test at time t = 1.5 on a 400 × 400 grid (columns: Shadowfax, MPI-AMRVAC FV). The top row corresponds to simulations with no bulk velocity, the bottom row has v_bulk = 100.

Without a bulk velocity, the methods produce similar results, in the sense that they all produce the same large instabilities on the same timescale. There are some differences in the non-linear phase of the instability, which is to be expected. With a bulk velocity, the results dramatically change for the MPI-AMRVAC simulations. The FD method does not produce any result at all, while the instabilities in the FV simulation are clearly affected by the bulk velocity. There is also a clear imprint on the run time of the MPI-AMRVAC simulations, since a much smaller system time step is needed for the integration. The run time and the results of the Shadowfax simulations are not affected by the bulk velocity, since the mesh is moving along with the flow. These results are confirmed by the high resolution runs, shown in Figure 20.
In a setup without density contrast, the initial exponential growth of the instability only depends on the wavelength of the instability and the thickness of the middle layer (Hendrix and Keppens, 2014). This means that simulations of this initial phase should converge to the same growth rate, irrespective of the resolution or the method that is used.

To test this, we run a variant of the shearing layers test without density contrast. We still use a periodic box with unit length, but now the density and pressure in the box are both constant and equal to 1. The x component of the velocity is given by (22), while the y component of the velocity is now given by

\[
v_y = B \sin(4\pi x) \left( e^{-(y - 0.25)^2 / (2d^2)} + e^{-(y - 0.75)^2 / (2d^2)} \right), \qquad (24)
\]

with B = 0.… and d = 0.…. We quantify the growth of the instability using the kinetic energy in the y direction. Initially, this energy is set by our seed velocity, but as the instability grows, kinetic energy in the x direction is converted into extra kinetic energy in the y direction, so that this energy will grow exponentially, as does the y component of the velocity.

Figure 21 shows the kinetic energy in the y direction as a function of time for our simulations and for grids with different resolutions. The high resolution Shadowfax results are in good agreement with the high resolution MPI-AMRVAC results, but the convergence is slower for Shadowfax. This is illustrated in Figure 22, where we plot the relative difference between the different simulations and the high resolution MPI-AMRVAC result, that we use as a reference solution.

Figure 21: The kinetic energy in the y direction as a function of time for the shearing layers test without a density contrast.

Figure 22: The relative difference between the kinetic energy in the y direction for the 400 × 400 MPI-AMRVAC simulation and that for the other simulations, as a function of time.

We fitted an exponential function of the form $B e^{At}$ to the kinetic energy in the y direction in the time interval [1.…, ….5], and use the fitted slope A to quantify the growth of the instability. The results are shown in Table 1. Both the high resolution Shadowfax and MPI-AMRVAC results are converged.

Table 1: Slope of an exponential fit to the kinetic energy in the y direction in the time interval [1.…, ….5] for the shearing layers tests without density contrast.

simulation             slope
Shadowfax 100 × 100    4.61
Shadowfax 200 × 200    5.43
Shadowfax 400 × 400    5.10
Shadowfax 800 × 800    5.10
MPI-AMRVAC 100 × 100   5.28
MPI-AMRVAC 200 × 200   5.14
MPI-AMRVAC 400 × 400   5.12
Shadowfax versus swift
The mesh-free methods introduced by Hopkins (2015) use the same finite volume method used by Shadowfax, but use an SPH-like discretization of the fluid to calculate volumes and interfaces between neighbouring particles. Since this method does not require the construction of a global unstructured mesh, it is computationally cheaper and has a better potential for parallel scalability. Just as SPH however, the method smooths out local resolution over a number of neighbouring particles, potentially lowering the effective resolution.

We compare Shadowfax with our own implementation of a mesh-free method in the highly parallel SPH code swift. This method combines the fast neighbour loop algorithms of swift with a finite volume method that is the same as implemented in Shadowfax, and uses the same exact Riemann solver. Since Shadowfax reads the same initial condition file format as swift, we can use the exact same initial condition for both simulations and directly compare the results. We also compare with the default SPH version of swift.

Figure 23 shows the density profile of a Sod shock test, which is the 1D equivalent of the spherical overdensity test discussed as part of the testsuite. A left high density, high pressure region with ρ = 1 and p = 1 connects with a low density, low pressure region with ρ = 0.25 and p = 0.… at x = 0.…. We use a box of dimensions … × … × 0.125 with periodic boundaries. The results are evolved to time t = 0.12, when the left rarefaction wave, central discontinuity and right shock have developed.

The Shadowfax result clearly follows the theoretical curve, while both swift results show minor deviations around the different features in the profile. The deviations are strongest for the swift mesh-free results, although the noise on the swift SPH result is higher. We calculated χ² values for all three simulations by summing the quadratic differences between the particle densities and the theoretical densities for all particles. This yielded χ² = 29.… for Shadowfax, χ² = 106.71 for swift mesh-free, and χ² = 123.54 for swift SPH. Not smoothing out the local resolution hence clearly leads to an overall higher accuracy of the moving mesh method.
The strong shock test proposed by Noh (1987) is a very challenging test with a known analytical solution. It consists of a reflective box with unit length, in which a fluid with unit density and a negligible thermal energy of 1 × 10^−… is enclosed. The radial velocity of the fluid is set to −1 at t = 0, so that the fluid collapses onto the origin and causes a strong shock with a very high Mach number. The velocity of the shock front is v_shock = 1/3, and the radial density profile at time t > 0 is given by

\[
\rho(r, t) = \begin{cases} 16 & r \leq v_{\mathrm{shock}} t \\ 1 + \dfrac{t}{r} & v_{\mathrm{shock}} t < r \end{cases} \qquad (25)
\]

in 2D.

As is common for this problem, we restrict ourselves to the upper right quadrant of the box. However, we do not use the commonly used inflow boundary conditions for the upper and right boundaries (Springel, 2010), since Shadowfax does not currently support inflow boundaries. Since the radial flow of the fluid creates a very low density cavity at these boundaries, this leads to numerical problems when the shock reaches the low density cavities. We will therefore restrict the simulation to t = 0.….

Figure 24 compares the radial density profile at t = 0.5 for two resolutions of the Shadowfax result with the corresponding MPI-AMRVAC results. We calculated χ² values for the simulations, yielding χ² = 1.… × 10^… and χ² = 1.… × 10^… for Shadowfax, and χ² = 2.… × 10^… and χ² = 4.… × 10^… for MPI-AMRVAC.

https://gitlab.cosma.dur.ac.uk/swift/swiftsim/tree/gizmo

Another challenging test is the implosion test of Liska and Wendroff (2003). It consists of a periodic box of dimensions 0.6 × 0.
6, in which a fluid with unit density and pressure is initially at rest. In the center of the box, a rhombus with width 0.… contains a fluid with ρ = 0.125 and p = 0.14. The high density fluid will implode into this region and cause a strong shock wave, that will travel back and forth in the periodic box. As in Liska and Wendroff (2003), we set the adiabatic index γ = 1.4.

Figure 25 compares the Shadowfax results with results obtained using MPI-AMRVAC. It is interesting to see that both methods produce asymmetrical results in the center, while reproducing the same large scale shock wave, even at later times, when the shock wave has interacted with the central instabilities. Just as before, the Shadowfax instabilities seem to develop faster than the ones in the MPI-AMRVAC result, and they are also significantly more chaotic, due to the co-moving character of the mesh. For the same mesh resolution, the Shadowfax results show more instabilities, indicating that a moving mesh has a higher local resolution. Figure 26 shows a zoom of the Shadowfax result at time t = 0.….
9. Discussion
In this paper, we introduced the public simulation code
Shadowfax. The code can be used to simulate a mixture of gas and cold dark matter, with an accurate treatment of the hydrodynamics and gravitational forces. The discretization of the gas is provided by an unstructured Voronoi mesh, which is evolved in time to more accurately follow the hydrodynamical flow.

The code is inspired by Springel (2010), but uses an object-oriented design, which attempts to make the code easier to read and extend. Some parts of the code make use of advanced C++ language features, like templates, to improve code reuse without significant run time cost. We have attempted to separate the actual physics from the algorithmic details as much as possible. Much improvement is possible however, which will be the subject of future work on the code.

We have introduced the test problems that are currently included in the public version of the code, as well as some more involved tests that compare the code with other publicly available codes. These tests show that the Lagrangian nature of the moving mesh method makes it better at resolving instabilities in regions with high Mach numbers than fixed grid methods. On the other hand, a moving mesh is a lot more sensitive to instabilities that are seeded numerically. When the growth rate of instabilities is studied, a moving mesh method requires a higher resolution to converge to a consistent growth rate than a fixed grid method.

The finite volume method used by
Shadowfax is almost identical to that used by the mesh-free methods of Hopkins (2015), but has a higher effective resolution for the same number of particles, since the mesh-free discretization of space smooths out resolution over a large number of neighbours.

Figure 24: Radial density profile for the Noh test at t = 0.5. The black and gray dots show the binned density values, with the error bars indicating the standard deviation of the values inside the bins. The red line corresponds to the analytical solution. The right column shows the difference between the density and the analytical solution.
Figure 25: Density colour plots for the implosion test (Shadowfax and MPI-AMRVAC FV) at different times. Top: t = 0.…; middle: t = 0.…; bottom: t = 0.….

Acknowledgments

We thank the two anonymous referees for the constructive feedback that improved the quality of this manuscript, and for pointing out the details of our implementation we forgot to mention in the first version. We thank the Ghent University Special Research Fund and the Interuniversity Attraction Poles Programme initiated by the Belgian Science Policy Office (IAP P7/08 CHARM) for financial support. We thank Robbert Verbeke for his contribution to the exact Riemann solver, and Peter Camps for sharing the Vec class that originally was part of SKIRT with us, before SKIRT became public. BV thanks Matthieu Schaller and Tom Theuns for the discussions about finite volume methods, and the coupling between hydrodynamics and gravity. We thank Volker Springel and Romain Teyssier for making their codes Gadget2 and ramses public. Special thanks also to Rony Keppens for sharing his knowledge about grid methods and helping us set up the MPI-AMRVAC simulations. Finally, we want to thank J. R. R. Tolkien for providing an extensive list of unique names to give to computing infrastructure and simulation codes.
References
Agertz, O., Moore, B., Stadel, J., Potter, D., Miniati, F., Read, J., Mayer, L., Gawryszczak, A., Kravtsov, A., Nordlund, Å., Pearce, F., Quilis, V., Rudd, D., Springel, V., Stone, J., Tasker, E., Teyssier, R., Sep. 2007. Fundamental differences between SPH and grid methods. Mon. Not. R. Astron. Soc. 380, 963–978.
Barnes, J., Hut, P., Dec. 1986. A hierarchical O(N log N) force-calculation algorithm. Nature 324, 446–449.
Cloet-Osselaer, A., De Rijcke, S., Vandenbroucke, B., Schroyen, J., Koleva, M., Verbeke, R., Aug. 2014. Numerical simulations of dwarf galaxy merger trees. Mon. Not. R. Astron. Soc. 442, 2909–2925.
Dobbs, C. L., Mar. 2015. The interstellar medium and star formation on kpc size scales. Mon. Not. R. Astron. Soc. 447, 3390–3401.
Duffell, P. C., MacFadyen, A. I., Dec. 2011. TESS: A Relativistic Hydrodynamics Code on a Moving Voronoi Mesh. Astrophys. J. Suppl. Ser. 197, 15.
Duffell, P. C., MacFadyen, A. I., Aug. 2012. Global Calculations of Density Waves and Gap Formation in Protoplanetary Disks Using a Moving Mesh. Astrophys. J. 755, 7.
Gaburov, E., Johansen, A., Levin, Y., Oct. 2012. Magnetically Levitating Accretion Disks around Supermassive Black Holes. Astrophys. J. 758, 103.
Geen, S., Rosdahl, J., Blaizot, J., Devriendt, J., Slyz, A., Apr. 2015. A detailed study of feedback from a massive star. Mon. Not. R. Astron. Soc. 448, 3248–3264.
Gonnet, P., Schaller, M., Theuns, T., Chalk, A. B. G., Sep. 2013. SWIFT: Fast algorithms for multi-resolution SPH on multi-core architectures. ArXiv e-prints.
Greif, T. H., Springel, V., White, S. D. M., Glover, S. C. O., Clark, P. C., Smith, R. J., Klessen, R. S., Bromm, V., Aug. 2011. Simulations on a Moving Mesh: The Clustered Formation of Population III Protostars. Astrophys. J. 737, 75.
Hendrix, T., Keppens, R., Feb. 2014. Effect of dust on Kelvin-Helmholtz instabilities. Astron. Astrophys. 562, A114.
Hopkins, P. F., Jun. 2015. A new class of accurate, mesh-free hydrodynamic simulation methods. Mon. Not. R. Astron. Soc. 450, 53–110.
Jin, G., Mellor-Crummey, J., Mar. 2005. Sfcgen: A framework for efficient generation of multi-dimensional space-filling curves by recursion. ACM Trans. Math. Softw. 31 (1), 120–148. URL http://doi.acm.org/10.1145/1055531.1055537
Keppens, R., Meliani, Z., van Marle, A., Delmont, P., Vlasis, A., van der Holst, B., 2012. Parallel, grid-adaptive approaches for relativistic hydro and magnetohydrodynamics. J. Comput. Phys. 231 (3), 718–744. Special Issue: Computational Plasma Physics.
Lecoanet, D., McCourt, M., Quataert, E., Burns, K. J., Vasil, G. M., Oishi, J. S., Brown, B. P., Stone, J. M., O'Leary, R. M., Feb. 2016. A validated non-linear Kelvin-Helmholtz benchmark for numerical hydrodynamics. Mon. Not. R. Astron. Soc. 455, 4274–4288.
Liska, R., Wendroff, B., Mar. 2003. Comparison of several difference schemes on 1d and 2d test problems for the euler equations. SIAM J. Sci. Comput. 25 (3), 995–1017. URL http://dx.doi.org/10.1137/S1064827502402120
Lloyd, S., Mar. 1982. Least squares quantization in PCM. IEEE Trans. Inf. Theory 28 (2), 129–137.
Noh, W. F., Sep. 1987. Errors for calculations of strong shocks using an artificial viscosity and an artificial heat flux. J. Comput. Phys. 72, 78–120.
Pakmor, R., Bauer, A., Springel, V., Dec. 2011. Magnetohydrodynamics on an unstructured moving grid. Mon. Not. R. Astron. Soc. 418, 1392–1401.
Plummer, H. C., Mar. 1911. On the problem of distribution in globular star clusters. Mon. Not. R. Astron. Soc. 71, 460–470.
Price, D. J., Feb. 2012. Smoothed particle hydrodynamics and magnetohydrodynamics. J. Comput. Phys. 231, 759–794.
Reem, D., 2011. The geometric stability of Voronoi diagrams with respect to small changes of the sites. In: Proceedings of the Twenty-seventh Annual Symposium on Computational Geometry. SoCG '11. ACM, New York, NY, USA, pp. 254–263. URL http://doi.acm.org/10.1145/1998196.1998234
Richard Shewchuk, J., 1997. Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates. Discrete Comput. Geom. 18 (3), 305–363. URL http://dx.doi.org/10.1007/PL00009321
Schaye, J., Crain, R. A., Bower, R. G., Furlong, M., Schaller, M., Theuns, T., Dalla Vecchia, C., Frenk, C. S., McCarthy, I. G., Helly, J. C., Jenkins, A., Rosas-Guevara, Y. M., White, S. D. M., Baes, M., Booth, C. M., Camps, P., Navarro, J. F., Qu, Y., Rahmati, A., Sawala, T., Thomas, P. A., Trayford, J., 2015. The EAGLE project: simulating the evolution and assembly of galaxies and their environments. Mon. Not. R. Astron. Soc. 446 (1), 521–554. URL http://mnras.oxfordjournals.org/content/446/1/521.abstract
Sedov, L., 1977. Similitude et dimensions en mecanique. Edition Mir.
Siebert, C., Wolf, F., 2010. A scalable parallel sorting algorithm using exact splitting. Tech. rep., German Research School for Simulation Sciences GmbH.
Springel, V., Dec. 2005. The cosmological simulation code GADGET-2. Mon. Not. R. Astron. Soc. 364, 1105–1134.
Springel, V., Jan. 2010. E pur si muove: Galilean-invariant cosmological hydrodynamical simulations on a moving mesh. Mon. Not. R. Astron. Soc. 401, 791–851.
Sundar, H., Sampath, R. S., Biros, G., 2008. Bottom-Up Construction and 2:1 Balance Refinement of Linear Octrees in Parallel. SIAM J. Sci. Comput. 30 (5), 2675–2708. URL http://dx.doi.org/10.1137/070681727
Teyssier, R., Apr. 2002. Cosmological hydrodynamics with adaptive mesh refinement. A new high resolution code called RAMSES. Astron. Astrophys. 385, 337–364.
Toro, E. F., 2009. Riemann Solvers and Numerical Methods for Fluid Dynamics, 3rd Edition. Springer-Verlag, Berlin Heidelberg.
Verbeke, R., Vandenbroucke, B., De Rijcke, S., Dec. 2015. How the First Stars Shaped the Faintest Gas-dominated Dwarf Galaxies. Astrophys. J. 815, 85.
Vogelsberger, M., Genel, S., Springel, V., Torrey, P., Sijacki, D., Xu, D., Snyder, G., Bird, S., Nelson, D., Hernquist, L., May 2014. Properties of galaxies reproduced by a hydrodynamic simulation. Nature 509, 177–182.
Vogelsberger, M., Sijacki, D., Kereš, D., Springel, V., Hernquist, L., Oct. 2012. Moving mesh cosmology: numerical techniques and global statistics. Mon. Not. R. Astron. Soc. 425, 3024–3057.
Yalinewich, A., Steinberg, E., Sari, R., 2015. RICH: Open-source Hydrodynamic Simulation on a Moving Voronoi Mesh. Astrophys. J. Suppl. Ser. 216 (2), 35. URL http://stacks.iop.org/0067-0049/216/i=2/a=35