REBOUND: An open-source multi-purpose N-body code for collisional dynamics
Hanno Rein (1) and Shang-Fei Liu (2,3)

(1) Institute for Advanced Study, 1 Einstein Drive, Princeton, NJ 08540; e-mail: [email protected]
(2) Kavli Institute for Astronomy and Astrophysics, Peking University, Beijing 100871, P. R. China
(3) Department of Astronomy, Peking University, Beijing 100871, P. R. China; e-mail: [email protected]
Submitted: 13 September 2011 – Accepted: 6 November 2011
ABSTRACT
REBOUND is a new multi-purpose N-body code which is freely available under an open-source license. It was designed for collisional dynamics such as planetary rings but can also solve the classical N-body problem. It is highly modular and can be customized easily to work on a wide variety of different problems in astrophysics and beyond. REBOUND comes with three symplectic integrators: leap-frog, the symplectic epicycle integrator (SEI) and a Wisdom-Holman mapping (WH). It supports open, periodic and shearing-sheet boundary conditions.
REBOUND can use a Barnes-Hut tree to calculate both self-gravity and collisions. These modules are fully parallelized with MPI as well as OpenMP. The former makes use of a static domain decomposition and a distributed essential tree. Two new collision detection modules based on a plane-sweep algorithm are also implemented. The performance of the plane-sweep algorithm is superior to a tree code for simulations in which one dimension is much longer than the other two and in simulations which are quasi-two dimensional with less than one million particles.

In this work, we discuss the different algorithms implemented in REBOUND, the philosophy behind the code's structure, as well as implementation-specific details of the different modules. We present results of accuracy and scaling tests which show that the code can run efficiently on both desktop machines and large computing clusters.

Key words.
Methods: numerical – Planets and satellites: rings – Proto-planetary disks
1. Introduction
REBOUND is a new open-source collisional N-body code. This code, and precursors of it, have already been used in a wide variety of publications (Rein & Papaloizou 2010; Crida et al. 2010; Rein et al. 2010; Rein & Liu in preparation; Rein & Latter in preparation). We believe that
REBOUND can be of great use for many different problems and have a wide reach in astrophysics and other disciplines. To our knowledge, there is currently no publicly available code for collisional dynamics capable of solving the problems described in this paper. This is why we decided to make it freely available under the open-source license GPLv3 (the full license is distributed together with REBOUND).

Collisional N-body simulations are extensively used in astrophysics. A classical application is a planetary ring (see e.g. Wisdom & Tremaine 1988; Salo 1991; Richardson 1994; Lewis & Stewart 2009; Rein & Papaloizou 2010; Michikoshi & Kokubo 2011, and references therein), which often has a collision time-scale that is much shorter than, or at least comparable to, an orbital time-scale. Self-gravity plays an important role, especially in the dense parts of Saturn's rings (Schmidt et al. 2009). These simulations are usually done in the shearing sheet approximation (Hill 1878).

Collisions are also important during planetesimal formation (Johansen et al. 2007; Rein et al. 2010; Johansen et al. in preparation). Collisions provide the dissipative mechanism to form a planetesimal out of a gravitationally bound swarm of boulders.
REBOUND can also be used with little modification in situations where only a statistical measure of the collision frequency is required, such as in transitional and debris discs. In such systems, individual collisions between particles are not modeled exactly, but are approximated by the use of super-particles (Stark & Kuchner 2009; Lithwick & Chiang 2007). Furthermore,
REBOUND can be used to simulate classical N-body problems involving entirely collision-less systems. A symplectic and mixed variable integrator can be used to follow the trajectories of both test-particles and massive particles.

We describe the general structure of the code, and how to obtain, compile and run it, in Sect. 2. The time-stepping scheme and our implementation of symplectic integrators are described in Sect. 3. The modules for gravity are described in Sect. 4. The algorithms for collision detection are discussed in Sect. 5. In Sect. 6, we present results of accuracy tests for different modules. We discuss the efficiency of the parallelization with the help of scaling tests in Sect. 7. We finally summarize in Sect. 8.
2. Overview of the code structure
REBOUND is written entirely in C and conforms to the ISO C99 standard. It compiles and runs on any modern computer platform which supports the POSIX standard, such as Linux, Unix and Mac OS X. In its simplest form,
REBOUND requires no external libraries to compile. Users are encouraged to install the OpenGL and GLUT libraries, which enable real-time and interactive 3D visualizations. LIBPNG is required to automatically save screen-shots. The code uses OpenMP for parallelization on shared memory systems. Support for OpenMP is built into modern compilers and requires no external libraries (for example recent versions of gcc). REBOUND also supports hybrid parallelization using both OpenMP and MPI simultaneously.
2.1. Downloading and compiling the code

The source code is hosted on the github platform and can be downloaded at http://github.com/hannorein/rebound/. A snapshot of the current repository is provided as tar and zip-files. However, users are encouraged to clone the entire repository with the revision control system git. The latter allows one to keep up-to-date with future updates. Contributions from users to the public repository are welcome. Once downloaded and extracted, one finds five main directories.

The entire source code can be found in the src/ directory. In the vast majority of cases, nothing in this directory needs to be modified. Many examples are provided in the examples/ directory. Each example comes with a problem file, named problem.c, and a makefile named
Makefile. To compile one of the examples, one has to run the make command in the corresponding directory. The code compilation is then performed in the following steps:

1. The makefile sets up environment variables which control various options such as the choice of compiler, code optimization, real-time visualization and parallelization.
2. It sets symbolic links, specifying the modules chosen for this problem (see below).
3. It calls the makefile in the src/ directory which compiles and links all source files.
4. The binary file is copied to the problem directory, from where it can be executed.

Documentation of the source code can be generated in the doc/ directory with doxygen. There is no static documentation available because the code structure depends crucially on the modules currently selected. To update the documentation with the current module selection, one can simply run make doc in any directory with a makefile.

In the directory tests/ one finds tests for accuracy and scaling as well as simple unit tests. The source code of the tests presented in Sects. 6 and 7 is included as well. The problem/ directory is the place to create new problems. It contains a template for that; any of the examples can also be used as a starting point for new problems. A minimal problem file is sketched below.
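The following is a minimal sketch of what a problem file might look like. The function names (problem_init, problem_output, problem_finish), the globals dt and boxsize, and the helpers init_box() and particles_add() follow the pattern suggested by the text, but should be treated as illustrative assumptions; consult the examples/ directory for the exact interface of the selected modules.

#include "main.h"       /* assumed REBOUND headers                 */
#include "particle.h"

/* Called once at start-up: set up the box and add particles.      */
void problem_init(int argc, char* argv[]){
    dt      = 1e-3;     /* time-step                               */
    boxsize = 10.0;     /* size of one cubic root box              */
    init_box();         /* initialize the computational domain     */

    struct particle star = {0};  /* zero-initialize all members    */
    star.m = 1.0;                /* one massive particle at origin */
    particles_add(star);
}

/* Called after every time-step: output or analyse data here.      */
void problem_output(){
}

/* Called once at the end of the simulation.                       */
void problem_finish(){
}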
2.2. Modules

REBOUND is extremely modular. The user has the choice between different gravity, collision, boundary and integration modules. It is also possible to implement completely new modules with minimal effort. Modules are chosen by setting symbolic links. Thus, there is no need to execute a configuration script prior to compiling the code. For example, there is one link, gravity.c, which points to one of the gravity modules gravity_*.c. The symbolic links are set in each problem makefile. Only this makefile has to be changed when a different module is used. Pre-compiler macros are set automatically for situations in which different modules need to know about each other.

This setup allows the user to work on multiple projects at the same time using different modules. When switching to another problem, nothing has to be set up and the problem can be compiled by simply typing make in the corresponding directory. To implement a new module, one can just copy an existing module to the problem directory, modify it and change the link in the makefile accordingly. Because no file in the src/ directory needs to be changed, one can easily keep REBOUND in sync with new versions (see e.g. http://gitref.org/ for an introduction to git).

2.3. Computational domain and boundary conditions

In REBOUND, the computational domain consists of a collection of cubic boxes. Any integer number of boxes can be used in each direction. This allows elongated boxes to be constructed out of cubic boxes. The cubic root boxes are also used for static domain decomposition when MPI is enabled. In that case, the number of root boxes has to be an integer multiple of the number of MPI nodes. When a tree is used for either gravity or collision detection, there is one tree structure per root box (see Sect. 4.2).
REBOUND comes with three different boundary conditions. Open boundaries (boundaries_open.c) remove every particle from the simulation that leaves the computational domain. Periodic boundary conditions (boundaries_periodic.c) are implemented with ghost boxes. Any number of ghost boxes can be used in each direction. Shear-periodic boundary conditions (boundaries_shear.c) can be used to simulate a shearing sheet; the ghost-box construction for this case is sketched below.
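To make the ghost-box construction concrete, the sketch below computes the coordinates of one ghost image of a particle. For shear-periodic boundaries, radially neighboring boxes are offset azimuthally by the accumulated shear; the factor (3/2) Ω L_x assumes a Keplerian shear profile as in the Hill approximation. The sign convention and all names are illustrative assumptions, not the code's actual interface.

#include <math.h>

/* Sketch: position and azimuthal velocity of the ghost image of a
 * particle shifted by i boxes radially and j boxes azimuthally.
 * Assumes the linearized shear flow vy = -(3/2) Omega x of the
 * Hill approximation; sign conventions are illustrative.          */
void ghost_image(double x, double y, double vy,
                 int i, int j,            /* ghost box indices     */
                 double Lx, double Ly,    /* box dimensions        */
                 double Omega, double t,  /* epicyclic freq., time */
                 double* gx, double* gy, double* gvy){
    double drift = -1.5*Omega*Lx*t*(double)i; /* accumulated shear */
    *gx  = x + i*Lx;
    *gy  = y + j*Ly + fmod(drift, Ly);        /* wrap into one box */
    *gvy = vy - 1.5*Omega*Lx*(double)i;       /* velocity offset   */
}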
3. Integrators
Several different integrators have been implemented in REBOUND. Although all of these integrators are second-order accurate and symplectic, their symplectic nature is formally lost as soon as self-gravity or collisions are approximated, or when velocity dependent forces are included.

All integrators follow the commonly used Drift-Kick-Drift (DKD) scheme but implement the three sub-steps differently. (We could have also chosen a KDK scheme but found that a DKD scheme performs slightly better.) We describe the particles' evolution in terms of a Hamiltonian H which can often be written as the sum of two Hamiltonians, H = H_1 + H_2. How the Hamiltonian is split into two parts depends on the integrator. Usually, one identifies H_1(p) as the kinetic part and H_2(q) as the potential part, where p and q are the canonical momenta and coordinates. During the first drift sub-step, the particles evolve under the Hamiltonian H_1 for half a time-step, dt/2.
Then, during the kick sub-step, the particles evolve under the Hamiltonian H_2 for a full time-step, dt. Finally, the particles evolve again for half a time-step under H_1. Note that the positions and velocities are synchronized in time only at the end of the DKD time-steps. We refer the reader to Saha & Tremaine (1992) and references therein for a detailed discussion of symplectic integrators.

REBOUND uses the same time-step for all particles. By default, the time-step does not change during the simulation because in all the examples that come with
REBOUND, the time-step can be naturally defined as a small fraction of the dynamical time of the system. However, it is straightforward to implement a variable time-step; this implementation depends strongly on the problem studied. Note that in general variable time-steps also break the symplectic nature of an integrator.
REBOUND does not choose the time-step automatically. It is up to the user to ensure that the time-step is small enough to not affect the results. This is especially important for highly collisional systems in which multiple collisions per time-step might occur, and in situations where the curvature of particle trajectories is large. The easiest way to ensure numerical convergence is to run the same simulation with different time-steps. We encourage users to do that whenever a new parameter regime is studied.

3.1. Leap-frog

Leap-frog is a second-order accurate and symplectic integrator for non-rotating frames. Here, the Hamiltonian is split into the kinetic part H_1 = p²/2 and the potential part H_2 = Φ(x). Both the drift and kick sub-steps are simple Euler steps. First the positions of all particles are advanced for half a time-step while keeping the velocities fixed. Then the velocities are advanced for one time-step while keeping the positions fixed. In the last sub-step the positions are again advanced for half a time-step. Leap-frog is implemented in the module integrator_leapfrog.c.

3.2. Wisdom-Holman mapping

A symplectic Wisdom-Holman mapping (WH, Wisdom & Holman 1991) is implemented as a module in integrator_wh.c. The implementation follows closely that of the SWIFT code. The WH mapping is a mixed variable integrator that calculates the Keplerian motion of two bodies orbiting each other exactly, up to machine precision, during the drift sub-step. Thus, it is very accurate for problems in which the particle motion is dominated by a central 1/r potential and perturbations added in the kick sub-step are small. However, the WH integrator is substantially slower than the leap-frog integrator because Kepler's equation is solved iteratively every time-step for every particle.

The integrator assumes that the central object has the index 0 in the particle array, that it is located at the origin and that it does not move. The coordinates of all particles are assumed to be in the heliocentric frame. During the sub-time-steps the coordinates are converted to Jacobi coordinates (and back) according to their index. The particle with index 1 has the first Jacobi index, and so on. This works best if the particles are sorted according to their semi-major axis. Note that this is not done automatically.

3.3. Symplectic epicycle integrator

The symplectic epicycle integrator (SEI, Rein & Tremaine 2011) for Hill's approximation (Hill 1878) is implemented in integrator_sei.c. When shear-periodic boundary conditions (boundaries_shear.c) are used, the Hill approximation is known as a shearing sheet.

SEI has similar properties to the Wisdom-Holman mapping in the case of the Kepler potential, but works in a rotating frame and is as fast as a standard non-symplectic integrator. The error after one time-step scales as the third power of the time-step times the ratio of the gravitational force over the Coriolis force (see Rein & Tremaine 2011, for more details on the performance of SEI). The epicyclic frequency Ω and the vertical epicyclic frequency Ω_z can be specified individually. This can be used to enhance the particle density in the mid-plane of a planetary ring and thus simulate the effect of self-gravity (see e.g. Schmidt et al. 2001).
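As a concrete illustration of the DKD scheme of Sect. 3.1, the following sketch shows one leap-frog time-step. The particle structure and function names are illustrative, not the exact contents of integrator_leapfrog.c; in the actual code the gravity module updates the accelerations between the two drifts.

struct particle {
    double x, y, z;       /* position                       */
    double vx, vy, vz;    /* velocity                       */
    double ax, ay, az;    /* acceleration (set by gravity)  */
};

/* First half of the DKD step: drift positions for dt/2.    */
void drift_half(struct particle* p, int N, double dt){
    for (int i=0; i<N; i++){
        p[i].x += 0.5*dt*p[i].vx;
        p[i].y += 0.5*dt*p[i].vy;
        p[i].z += 0.5*dt*p[i].vz;
    }
}

/* Second half: kick velocities for a full dt (the gravity
 * module has updated ax, ay, az in between), then drift
 * positions again for dt/2.                                */
void kick_and_drift(struct particle* p, int N, double dt){
    for (int i=0; i<N; i++){
        p[i].vx += dt*p[i].ax;
        p[i].vy += dt*p[i].ay;
        p[i].vz += dt*p[i].az;
    }
    drift_half(p, N, dt);
}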
4. Gravity
REBOUND is currently equipped with two (self-)gravity modules. A gravity module calculates, exactly or approximately, the acceleration onto each particle. For a particle with index i this is given by

  a_i = Σ_{j=0, j≠i}^{N_active−1}  G m_j (r_ji² + b²)^{−3/2} r_ji ,   (1)

where G is the gravitational constant, m_j the mass of particle j, and r_ji the relative position vector (of length r_ji) between particles j and i. The gravitational softening parameter b defaults to zero but can be set to a finite value in simulations where physical collisions between particles are not resolved and close encounters might lead to large unphysical accelerations. The variable N_active specifies the number of massive particles in the simulation. Particles with an index equal to or larger than N_active are treated as test-particles. By default, all particles are assumed to have mass and contribute to the sum in Eq. 1.

4.1. Direct summation

The direct summation module is implemented in gravity_direct.c and computes Eq. 1 directly. If there are N_active massive particles and N particles in total, the performance scales as O(N · N_active). Direct summation is therefore only efficient when the number of massive particles is small.

4.2. Octree

Barnes & Hut (1986, BH hereafter) proposed an algorithm to approximate Eq. 1, which can reduce the computation time drastically from O(N²) to O(N log N). The idea is straightforward: distant particles contribute to the gravitational force less than those nearby. By grouping particles hierarchically, one can separate particles into being far from or near to any one particle. The total mass and the center of mass of a group of particles which are far away can then be used as an approximation when calculating the long-range gravitational force. Contributions from individual particles are only considered when they are nearby.

We implement the BH algorithm in the module gravity_tree.c. The hierarchical structure is realized using a three-dimensional tree, called an octree. Each node represents a cubic cell which might have up to eight sub-cells with half the size. The root node of the tree contains all the particles, while the leaf nodes contain exactly one particle. The BH tree is initially constructed by adding particles one at a time to the root box, going down the tree recursively to smaller boxes until one reaches an empty leaf node, to which the particle is then added. If the leaf node already contains a particle, it is divided into eight sub-cells.

Every time the particles move, the tree needs to be updated using a tree reconstruction algorithm. We therefore keep track of any particle crossing the boundaries of the cell it is currently in. If it has moved outside, then the corresponding leaf node is destroyed and the particle is re-added to the tree as described above.
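The tree construction described above can be sketched as a short recursive routine. The data layout below is a simplification and all names are illustrative; the real code in gravity_tree.c additionally handles particle references, mass moments and the root boxes.

#include <stdlib.h>

struct oct_cell {
    double x, y, z, w;        /* cell center and width           */
    struct oct_cell* sub[8];  /* daughter cells, NULL if absent  */
    int pt;                   /* particle index, -1 if none      */
};

extern double px[], py[], pz[]; /* particle positions (illustrative) */

static struct oct_cell* cell_new(double x, double y, double z, double w){
    struct oct_cell* c = calloc(1, sizeof(*c));
    c->x = x; c->y = y; c->z = z; c->w = w; c->pt = -1;
    return c;
}

/* Add particle i to the tree: descend until an empty leaf is
 * found; if a leaf is already occupied, split it into eight
 * sub-cells and re-add the resident particle first.             */
void tree_add(struct oct_cell* c, int i){
    if (c->sub[0] == NULL && c->pt == -1){  /* empty leaf        */
        c->pt = i;
        return;
    }
    if (c->sub[0] == NULL){                 /* occupied leaf: split */
        double h = c->w/4.;
        for (int o=0; o<8; o++){
            c->sub[o] = cell_new(c->x + ((o&1)? h : -h),
                                 c->y + ((o&2)? h : -h),
                                 c->z + ((o&4)? h : -h),
                                 c->w/2.);
        }
        int resident = c->pt;
        c->pt = -1;
        tree_add(c, resident);              /* re-add old particle */
    }
    int o = (px[i] > c->x ? 1 : 0)          /* octant of particle i */
          | (py[i] > c->y ? 2 : 0)
          | (pz[i] > c->z ? 4 : 0);
    tree_add(c->sub[o], i);
}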
After initialization and reconstruction, we walk through the tree to update the total mass and the center of mass for each cell from the bottom up.

To calculate the gravitational forces on a given particle, one starts at the root node and descends into sub-cells as long as the cells are considered to be close to the particle. Let us define the opening angle as θ = w/R, where w is the width of the cell and R is the distance from the cell's center of mass to the particle. If the opening angle is smaller than a critical angle, θ < θ_crit, the total mass and center of mass of the cell are used to calculate the contribution to the gravitational force. Otherwise, the sub-cells are opened until the criterion is met. One has to choose θ_crit appropriately to achieve a balance between accuracy and speed.
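The tree walk with the opening angle criterion can be sketched as follows; the monopole term is the contribution of a cell to Eq. 1. The structure and names are again illustrative and omit the quadrupole option discussed below.

#include <math.h>

struct tree_cell {
    double mx, my, mz;        /* center of mass                 */
    double m;                 /* total mass of the cell         */
    double w;                 /* cell width                     */
    struct tree_cell* sub[8]; /* daughters, NULL for a leaf     */
};

static const double G          = 1.0;  /* gravitational constant  */
static const double b          = 0.0;  /* softening (cf. Eq. 1)   */
static const double theta_crit = 0.7;  /* critical opening angle  */

/* Accumulate the acceleration at (x,y,z) due to cell c: use the
 * monopole term if theta = w/R < theta_crit or if c is a leaf,
 * otherwise open the cell and descend into its daughters.       */
void tree_force(struct tree_cell* c, double x, double y, double z,
                double* ax, double* ay, double* az){
    if (c == NULL || c->m == 0.) return;
    double dx = c->mx - x, dy = c->my - y, dz = c->mz - z;
    double r2 = dx*dx + dy*dy + dz*dz;
    double R  = sqrt(r2);
    if (R == 0.) return;                 /* skip self-interaction  */
    if (c->sub[0] == NULL || c->w/R < theta_crit){
        double pre = G*c->m/pow(r2 + b*b, 1.5); /* monopole, Eq. 1 */
        *ax += pre*dx; *ay += pre*dy; *az += pre*dz;
    } else {
        for (int o=0; o<8; o++)          /* open the cell          */
            tree_force(c->sub[o], x, y, z, ax, ay, az);
    }
}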
REBOUND can also include the quadrupole tensor of each cell in the gravity calculation by setting the pre-compiler flag QUADRUPOLE. The quadrupole expansion (Hernquist 1987) is more accurate but also more time consuming for a fixed θ_crit. We discuss how the critical opening angle and the quadrupole expansion affect the accuracy in Sect. 6.1.

With REBOUND, a static domain decomposition is used for parallelizing the tree algorithm on distributed memory systems. Each MPI node contains one or more root boxes (see also Sect. 2.3) and all particles within these boxes belong to that node. The number of root boxes N_RB has to be a multiple of the number of MPI nodes N_MPI. For example, the setup illustrated in Fig. 1 uses 9 root boxes, allowing 1, 3 or 9 MPI nodes. By default, the domain decomposition is done first along the z direction, then along the y and x directions. If one uses 3 MPI nodes in the above example, the root boxes 0-2, 3-5 and 6-8 are each grouped together on one node.

Fig. 1. Illustration of the essential trees needed by root box 4. The different levels of the tree structure which need to be copied depend on the distance to the nearest boundary of root box 4 and the opening angle θ. See text for details.

The gravity calculation is parallelized using an essential tree (Salmon et al. 1990) and an all-to-all communication pattern. The essential tree contains only those cells of the local tree that might be accessed by the remote node during the force calculation. Each node prepares a total of N_RB − N_RB/N_MPI different essential trees. The cells that constitute the essential tree are copied into a buffer array and the daughter cell references therein are updated accordingly. The center of mass and quadrupole tensors (if enabled) are stored in the cell structure and automatically copied when a cell is copied. For that reason only the tree structure needs to be distributed, not individual particles. The buffer array is then sent to the other nodes using non-blocking MPI calls.

For example, suppose 9 MPI nodes are used, each node using exactly one tree in its domain. For that scenario the essential trees prepared for root box 4 are illustrated in Fig. 1. The essential trees include all cells which are close enough to the boundary of root box 4 that they might be opened during the force calculation of a particle in root box 4, according to the opening angle criterion.

In Sect. 7 we show that this parallelization is very efficient when the particle distribution is homogeneous and there are more than a few thousand particles on every node. When the number of particles per node is small, communication between nodes dominates the total runtime.
5. Collisions
REBOUND supports several different modules for collision detection, which are described in detail below. All of these methods search for collisions only approximately and might miss some of the collisions, or detect a collision where there is none. This is because either curved particle trajectories are approximated by straight lines (collisions_sweep.c and collisions_sweepphi.c), or particles have to be overlapping to collide (collisions_direct.c and collisions_tree.c). This is illustrated in Fig. 2.

Fig. 2. Different collision detection algorithms. Left: curved particle trajectories are approximated by straight lines. Right: trajectories are not approximated; particles only collide when they are overlapping. See text for details.

In all modules, the order of the collisions is randomized. This ensures that there is no preferred ordering which might lead to spurious correlations when one particle collides with multiple particles during one time-step. Note that REBOUND uses a fixed time-step for all particles. Therefore one has to ensure that the time-step is chosen small enough so that one particle does collide with no more than one other particle during one time-step, at least on average. See also the discussion in Sect. 3.

A free-slip, hard-sphere collision model is used. Individual collisions are resolved using momentum and energy conservation. A constant, or an arbitrary velocity dependent, normal coefficient of restitution ε can be specified to model inelastic collisions. The relative velocity after a collision is then given by

  v'_n = −ε v_n ,   v'_t = v_t ,   (2)

where v_n and v_t are the relative normal and tangential velocities before the collision. Particle spin is currently not supported.
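Resolving one collision according to Eq. 2 amounts to applying an impulse along the line connecting the two particle centers. The sketch below distributes the impulse so that total momentum is conserved; the particle structure and function name are illustrative.

#include <math.h>

struct particle { double x, y, z, vx, vy, vz, m; };

/* Resolve a hard-sphere collision (Eq. 2): reverse and scale the
 * normal component of the relative velocity by the coefficient of
 * restitution eps; the tangential component stays unchanged
 * (free-slip). The impulse is shared according to the masses so
 * that total momentum is conserved.                              */
void resolve_collision(struct particle* p1, struct particle* p2, double eps){
    double nx = p2->x - p1->x;    /* unit normal from p1 to p2    */
    double ny = p2->y - p1->y;
    double nz = p2->z - p1->z;
    double d  = sqrt(nx*nx + ny*ny + nz*nz);
    nx /= d; ny /= d; nz /= d;
    double vn = (p2->vx - p1->vx)*nx  /* normal relative velocity */
              + (p2->vy - p1->vy)*ny
              + (p2->vz - p1->vz)*nz;
    if (vn >= 0.) return;             /* only resolve approaching */
    double mu = p1->m*p2->m/(p1->m + p2->m);  /* reduced mass     */
    double J  = -(1.+eps)*vn*mu;      /* impulse: vn -> -eps*vn   */
    p1->vx -= J*nx/p1->m; p1->vy -= J*ny/p1->m; p1->vz -= J*nz/p1->m;
    p2->vx += J*nx/p2->m; p2->vy += J*ny/p2->m; p2->vz += J*nz/p2->m;
}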
5.1. Direct nearest neighbor search

A direct nearest neighbor collision search is the simplest collision module in REBOUND. It is implemented in collisions_direct.c. In this module, a collision is detected whenever two particles are overlapping at the end of the DKD time-step, i.e. the middle of the drift sub-step, where positions and velocities are synchronized in time (see Sect. 3). This is illustrated in the right panel of Fig. 2. The collision is then added to a collision queue. When all collisions have been detected, the collision queue is shuffled randomly. Each individual collision is then resolved after checking that the particles are approaching each other.

Every pair of particles is checked once per time-step, making the method scale as O(N²). Similar to the direct summation method for gravity, this is only useful for a small number of particles. For most cases, the nearest neighbor search using a tree is much faster (see next section).

5.2. Octree

The octree described in Sect. 4.2 can also be used to search for nearest neighbors. The module collisions_tree.c implements such a nearest neighbor search. It is parallelized with both OpenMP and MPI. It can be used in conjunction with any gravity module, but when both tree modules, gravity_tree.c and collisions_tree.c, are used simultaneously, only one tree structure is needed. When collisions_tree.c is the only tree module, the center of mass and multipole tensors are not calculated in tree cells.

To find overlapping particles for particle i, one starts at the root of the tree and descends into daughter cells as long as the distance of the particle to the cell center, r_ic, is smaller than a critical value:

  r_ic < R_i + R_max + (√3/2) w_c ,   (3)

where R_i is the size of the particle, R_max is the maximum size of a particle in the simulation and w_c is the width of the current cell. When two particles are found to be overlapping, a collision is added to the collision queue and resolved later in the same way as above.

If MPI is used, each node prepares the tree and particle structures that are close to the domain boundaries, as these might be needed by other nodes (see Fig. 1). This essential tree is sent to other nodes and temporarily added to the local tree structure. The nearest neighbor search can then be performed in the same way as in the serial version. The essential tree and particles are never modified on a remote node.

This essential tree is different from the essential tree used for the gravity calculation in two ways. First, this tree is needed at the end of the time-step, whereas the gravity tree is needed at the beginning of the kick sub-step. Second, the criterion for cell opening, Eq. 3, is different.

A nearest neighbor search using the octree takes on average O(log(N)) operations for one particle and therefore O(N log(N)) operations for all N particles.
Fig. 3. Illustration of the plane-sweep algorithm. The plane is intersecting the trajectories of particles 5 and 7. See text for details.
5.3. Plane-sweep algorithm

We further implement two collision detection modules based on a plane-sweep algorithm, in collisions_sweep.c and collisions_sweepphi.c. The plane-sweep algorithm makes use of a conceptual plane that is moved along one dimension. The original algorithm described by Bentley & Ottmann (1979) maintains a binary search tree in the orthogonal dimensions and keeps track of line crossings. In our implementation, we assume the dimension normal to the plane is much longer than the other dimensions. This allows us to simplify the Bentley-Ottmann algorithm and get rid of the binary search tree, which further speeds up the calculation.
In REBOUND the sweep is either performed along the x-direction or along the azimuthal angle φ (measured in the xy-plane from the origin). The sweep in the x direction can also be used in the shearing sheet. The sweep in the φ direction is useful for (narrow) rings in global simulations. Here, we only discuss the plane-sweep algorithm in the Cartesian case (along the x-direction) in detail. The φ sweep implementation is almost identical, except for the difference in periodicity and the need to calculate the angle and angular frequency for every particle at the beginning of the collision search.

Our plane-sweep algorithm can be described as follows (see also Fig. 3):

1. If a tree is not used to calculate self-gravity, the particles are sorted according to their x coordinate. (If a tree is used, the particles cannot be sorted: each tree cell keeps a reference to the particle it contains, and this reference would have to be updated every time a particle is moved in the particle array, leading to larger overhead.) During the first time-step, quicksort is used as the particles are most likely not pre-sorted. In subsequent time-steps, the adaptive sorting algorithm insertionsort is used. It can make use of the pre-sorted array from the previous time-step and has an average performance of O(N) as long as particles do not completely randomize their positions in one time-step.

2. The x coordinate of every particle before and after the drift step is inserted into an array SWEEPX. The trajectory is approximated by a line (see left panel of Fig. 2). In general, the real particle trajectories will be curved. In that case the positions are only approximately the start and end points of the particle trajectory. The particle radius is subtracted/added to the minimum/maximum x coordinate. The array contains 2N elements when all particles have been added.

3. If a tree is not used, the array SWEEPX is sorted with the x position as a key using the insertionsort algorithm. Because the particle array is pre-sorted, insertionsort runs in approximately O(N) operations. If a tree is used, the array is sorted with quicksort.

4. A conceptual plane with its normal vector in the x direction is inserted at the left side of the box. While going through the array SWEEPX, we move the plane towards the right one step at a time, according to the x coordinate of the current element in the array. We thus move the plane to the other side of the box in a total of 2N stops.

5. The plane is intersecting particle trajectories. We keep track of these intersections using a separate array SWEEPL. Whenever a particle appears for the first time in the array SWEEPX, the particle is added to the SWEEPL array. The particle is removed from the array SWEEPL when it appears in the array SWEEPX for the second time. In Fig. 3, the plane is between stop 10 and 11, intersecting the trajectories of particles 5 and 7.

6. When a new particle is inserted into the array SWEEPL, we check for collisions of this particle with any other particle in SWEEPL during the current time-step. The collision is recorded and resolved later. In Fig. 3 the array SWEEPL has two entries, particles 5 and 7. Those will be checked for collisions.

The time needed to search for a collision at each stop of the plane is O(N_SWEEPL), where N_SWEEPL is the number of elements in the array SWEEPL. This could be reduced with a binary search tree to O(log(N_SWEEPL)), as in the original algorithm by Bentley & Ottmann (1979). However, tests have shown that there is little to no performance gain for the problems studied with REBOUND, because a more complicated data structure has to be maintained. One entire time-step with the plane-sweep algorithm is completed in O(N · N_SWEEPL). It is then easy to see that this method can only be efficient when N_SWEEPL ≲ log(N), as a tree code is more efficient otherwise. Indeed, experiments have shown (see Sect. 7.4) that the plane-sweep algorithm is more efficient than a nearest neighbor search with an octree by many orders of magnitude for low dimensional systems in which N_SWEEPL is small.
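The core of the sweep loop can be sketched as follows. The event array plays the role of SWEEPX (already sorted by x) and the list plays the role of SWEEPL; check_overlap() stands in for the detailed trajectory intersection test. This is a simplified illustration, not the actual implementation in collisions_sweep.c.

#include <stdlib.h>

struct xevent { double x; int pt; };     /* one entry of SWEEPX  */

extern int  check_overlap(int p1, int p2);    /* illustrative    */
extern void record_collision(int p1, int p2); /* illustrative    */

/* Move the plane through the 2N pre-sorted stops in sweepx.
 * On a particle's first appearance, test it against every
 * trajectory currently intersected by the plane and add it to
 * the list; on its second appearance, remove it. Particle
 * indices are assumed to run from 0 to N-1.                     */
void sweep(struct xevent* sweepx, int N){
    int* sweepl = malloc(N*sizeof(int));   /* SWEEPL              */
    int* seen   = calloc(N, sizeof(int));  /* appearance counter  */
    int  nl     = 0;
    for (int e=0; e<2*N; e++){
        int pt = sweepx[e].pt;
        if (seen[pt]++ == 0){              /* first appearance    */
            for (int i=0; i<nl; i++)
                if (check_overlap(pt, sweepl[i]))
                    record_collision(pt, sweepl[i]);
            sweepl[nl++] = pt;
        } else {                           /* second appearance   */
            for (int i=0; i<nl; i++)
                if (sweepl[i] == pt){ sweepl[i] = sweepl[--nl]; break; }
        }
    }
    free(sweepl); free(seen);
}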
6. Test problems
We present several tests in this section which verify the implementation of all modules. First, we measure the accuracy of the tree code. Then we check for energy and momentum conservation. We use a long-term integration of the outer Solar System as a test of the symplectic WH integrator. Finally, we measure the viscosity in a planetary ring, which is a comprehensive test of both self-gravity and collisions.
6.1. Force accuracy

We measure the accuracy of the tree module gravity_tree.c by comparing the force onto each particle to the exact result obtained by direct summation (Eq. 1). We set up 1000 randomly distributed particles with different masses in a box. We do not use any ghost boxes and particles do not evolve. We sum up the absolute value of the acceleration error for each particle and normalize it with respect to the total acceleration (see Hernquist 1987, for more details).

This quantity is plotted as a function of the critical opening angle θ_crit in Fig. 4(a). One can see that the force quickly converges towards the correct value for small θ_crit. The quadrupole expansion performs about one order of magnitude better than the monopole expansion at moderate opening angles. In Fig. 4(b), the same error is plotted as a function of the computation time: for a fixed θ_crit the monopole expansion is faster, but for a fixed accuracy the quadrupole expansion is faster below a critical opening angle.

Fig. 4. Comparison of the monopole and quadrupole expansion. (a) Force accuracy as a function of the opening angle θ_crit. (b) Force accuracy as a function of the computation time.

6.2. Energy and momentum conservation

In a non-rotating simulation box with periodic boundaries and non-gravitating collisional particles, we test both momentum and energy conservation. Using a coefficient of restitution of unity (perfectly elastic collisions), the total momentum and energy is conserved up to machine precision for all collision detection algorithms.

6.3. Long-term integration of the outer Solar System

To test the long-term behavior of our implementation of the Wisdom-Holman mapping, we integrate the outer Solar System for 200 million years. We use the initial conditions given by Applegate et al. (1986) with 4 massive planets and Pluto as a test particle. The direct summation module has been used to calculate self-gravity. With a 40 day time-step and an integration time of 200 Myr, the total runtime on one CPU was less than 2 hours.

In Fig. 5, we plot the perihelion of Pluto as a function of time. One can clearly see the libration pattern with its two distinct frequencies, in agreement with Applegate et al. (1986).

Fig. 5. Libration pattern of the perihelion of Pluto ω as a function of time, showing two distinct libration frequencies.

6.4. Viscosity in planetary rings

Daisaka et al. (2001) calculate the viscosity in a planetary ring using numerical simulations. We repeat their analysis, as this is an excellent code test: the results depend on both self-gravity and collisions. The quasi-equilibrium state is dominated by either self-gravity or collisions, depending on the ratio of the Hill radius over the physical particle radius, r*_h.

In this simulation we use the octree implementation for gravity and the plane-sweep algorithm for collisions. The geometric optical depth τ is held constant and we use a constant coefficient of restitution of ε = 0.5. The separate parts of the viscosity are calculated directly as defined by Daisaka et al. (2001) for various r*_h and plotted in dimensionless units in Fig. 6.

The results are in good agreement with previous results. At large r*_h, the collisional part of the viscosity is slightly higher in our simulations when permanent particle clumps form. This is most likely due to the different treatment of collisions and some ambiguity in defining the collisional viscosity when particles are constantly touching each other (Daisaka, private communication).

Fig. 6. Individual components of the viscosity (translational, collisional and gravitational) as a function of the non-dimensional particle radius r*_h.
7. Scaling
Using the shearing sheet configuration with the tree modules gravity_tree.c and collisions_tree.c, we measure the scaling of REBOUND and the efficiency of the parallelization. The simulation parameters have been chosen to resemble those of a typical simulation of Saturn's A-ring, with an optical depth of order unity and a collision time-scale much shorter than one orbit. The opening angle is θ_crit = 0.7. The problem.c files for this and all other tests can be found in the tests/ directory.

All scaling tests have been performed on the IAS aurora cluster. Each node has dual quad-core 64-bit AMD Opteron Barcelona processors and 16 GB RAM. The nodes are interconnected with 4x DDR Infiniband.
7.1. Strong scaling

In the strong scaling test the average time to compute one time-step is measured as a function of the number of processors for a fixed total problem size (e.g. fixed total number of particles). We use only the MPI parallelization option. The results for simulations with N = 12.5k, 50k, 200k and 800k particles are plotted in Fig. 7(a). The scaling is close to linear as long as there are more than a few thousand particles on every node; for smaller particle numbers per node, communication between nodes dominates the runtime.

7.2. Weak scaling

In the weak scaling test we measure the average time to compute one time-step as a function of the number of processors for a fixed number of particles per processor. Again, we only use the MPI parallelization option. The results for simulations using N_pp = 25k, 50k and 100k particles per processor are plotted in Fig. 7(b). One can easily confirm that the runtime for a simulation using k processors is O(N_pp log(N_pp k)), as expected. By increasing the problem size, the communication per processor does not increase for the collision detection algorithm, as only direct neighbors need to be evaluated on each node. The runtime and communication for the gravity calculation increase logarithmically with the total number of particles (which is proportional to the number of processors in this case).

Fig. 7. Strong and weak scaling tests using a shearing sheet configuration with the gravity_tree.c and collisions_tree.c modules. (a) Strong scaling test: constant total problem size (12.5k, 50k, 200k and 800k particles), varying number of nodes. (b) Weak scaling test: constant problem size per node (25k, 50k and 100k particles per node).

These results show that
REBOUND's scaling is as good as it can possibly get with a tree code. The problem size is only limited by the total number of available processors.
7.3. OpenMP and MPI

The previous results use only MPI for parallelization.
REBOUND also supports parallelization with OpenMP for shared memory systems. OpenMP has the advantage over MPI that no communication is needed. On one node, different processes share the same memory and work on the same tree and particle structures. However, the tree building and reconstruction routines are not parallelized. These routines can only be parallelized efficiently when a domain decomposition is used (as used for MPI, see above).

Results of hybrid simulations using both OpenMP and MPI at the same time are shown in Fig. 8. We plot the average time to compute one time-step as a function of the number of OpenMP processes per MPI node. The total number of particles and processors (64) is kept fixed. One can see that OpenMP does indeed perform better than MPI when the particle number per node is small and the runtime is dominated by communication (see also Sect. 7.1). For large particle numbers, the difference between OpenMP and MPI is smaller, as the sequential tree reconstruction outweighs the gains. Eventually, for very large simulations, the sequential tree reconstruction dominates and the benefit of additional OpenMP processes vanishes.

Fig. 8. Comparison between OpenMP and MPI. Each run uses 64 CPU cores in total (hybrid OMP+MPI runs with 10k, 50k, 200k and 500k particles). A shearing sheet configuration with the gravity_tree.c and collisions_tree.c modules is used.

7.4. Collision detection algorithms

The collision modules described in Sect. 5 have very different scaling behaviors and are optimized for different situations. Here, we illustrate their scalings using two shearing sheet configurations with no self-gravity. We plot the average number of time-steps per second as a function of the problem size in Fig. 9 for the plane-sweep algorithm and both the octree and direct nearest neighbor collision search.

In simulations used in Fig. 9(a), we vary both the azimuthal size, L_y, and radial size, L_x, of the computational domain. The aspect ratio of the simulation box is kept constant. For the plane-sweep algorithm, the number of particle trajectories intersecting the plane scales as N_SWEEPL ∼ L_y ∼ √N. (Note that a disk is effectively a two dimensional system; in three dimensions N_SWEEPL ∼ L_y L_z ∼ N^(2/3).) Thus, the overall scaling of the plane-sweep method is O(N^1.5), which can be verified in Fig. 9(a). The tree and direct detection methods scale unsurprisingly as O(N log(N)) and O(N²), respectively.

For simulations used in Fig. 9(b), we vary the radial size of the computational domain and keep the azimuthal size fixed at 20 particle radii.
Fig. 9. Scalings of the plane-sweep algorithm, the octree and the direct nearest neighbor search as a function of particle number N. A shearing sheet configuration without self-gravity is used. (a) Varying the size of the simulation box, keeping a constant aspect ratio. (b) Varying the radial size of the simulation box, keeping a constant azimuthal width.

Thus, the aspect ratio changes and the box becomes very elongated for large particle numbers. If a tree is used in
REBOUND, an elongated box is implemented as many independent trees, each being a cubic root box (see Sect. 2.3). Because each tree needs to be accessed at least once during the collision search, this makes the tree code scale as O(N²) for large N, effectively becoming a direct nearest neighbor search. The plane-sweep algorithm, on the other hand, scales as O(N), as the number of particle trajectories intersecting the plane is constant, N_SWEEPL ∼ L_y = const. Again, the direct nearest neighbor search scales unsurprisingly as O(N²).

From these test cases, it is obvious that the choice of collision detection algorithm strongly depends on the problem. Also note that if the gravity module is using a tree, the collision search using the same tree comes at only a small additional cost. The plane-sweep module can be faster for non-self-gravitating simulations by many orders of magnitude, especially if the problem size is varied only in one dimension.
8. Summary
In this paper, we presented
REBOUND, a new open-source multi-purpose N-body code for collisional dynamics.
REBOUND is available for download at http://github.com/hannorein/rebound and can be redistributed freely under the GPLv3 license.

The code is written in a modular way, allowing users to choose between different numerical integrators, boundary conditions, self-gravity solvers and collision detection algorithms. With minimal effort, one can also implement completely new modules.

The octree self-gravity and collision detection modules are fully parallelized with MPI and OpenMP. We showed that both run efficiently on multi-core desktop machines as well as on large clusters. Results from a weak scaling test show that there is no practical limit on the maximum number of particles that REBOUND can handle efficiently, except by the number of available CPUs. We will use this in future work to conduct extremely elongated simulations that can span the entire circumference of Saturn's rings.

Two new collision detection methods based on a plane-sweep algorithm are implemented in REBOUND. We showed that the plane-sweep algorithm scales linearly with the number of particles for effectively low dimensional systems and is therefore superior to a nearest neighbor search with a tree. Examples of effectively low dimensional systems include very elongated simulation domains and narrow rings. Furthermore, the simpler data structure of the plane-sweep algorithm makes it also superior for quasi-two dimensional simulations with less than about one million particles.

Three different integrators have been implemented, for rotating and non-rotating frames. All of these integrators are symplectic. Exact long-term orbit integrations can be performed with a Wisdom-Holman mapping.

Given the already implemented features, as well as the open and modular nature of REBOUND, we expect that this code will find many applications both in the astrophysics community and beyond. For example, molecular dynamics and granular flows are subject areas where the methods implemented in
REBOUND can be readily applied. We strongly encourage users to contribute new algorithms and modules to REBOUND.
Acknowledgements.
We would like to thank the referee John Chambers for helpful comments and suggestions. We would also like to thank Scott Tremaine, Hiroshi Daisaka and Douglas Lin for their feedback during various stages of this project. Hanno Rein was supported by the Institute for Advanced Study and the NSF grant AST-0807444. Shang-Fei Liu acknowledges the support of the NSFC grant 11073002. Hanno Rein and Shang-Fei Liu would further like to thank the organizers of ISIMA 2011 and the Kavli Institute for Astronomy and Astrophysics in Beijing for their hospitality.
References
Applegate, J. H., Douglas, M. R., Gursel, Y., Sussman, G. J., & Wisdom, J. 1986, AJ, 92, 176
Barnes, J. & Hut, P. 1986, Nature, 324, 446
Bentley, J. & Ottmann, T. 1979, IEEE Transactions on Computers, C-28, 643
Crida, A., Papaloizou, J., Rein, H., Charnoz, S., & Salmon, J. 2010, AJ, submitted
Daisaka, H., Tanaka, H., & Ida, S. 2001, Icarus, 154, 296
Hernquist, L. 1987, ApJS, 64, 715
Hill, G. W. 1878, Astronomische Nachrichten, 91, 251