PPICsar2D: Public Release
Mikhail A. Belyaev
Astronomy Department, University of California [email protected]
ABSTRACT
PICsar2D is a 2.5D relativistic, electromagnetic, particle in cell code designed forstudying the pulsar magnetosphere. The source code and a suite of
Python analysisroutines can be downloaded from “https://github.com/astromb/PICsar2D.git”. Addi-tionally, the repository includes a step-by-step tutorial for compiling the code, runningit, and analyzing the output. This article is devoted to several new algorithmic ad-vances and numerical experiments. These include a new pair injection prescription atthe pulsar surface, a comparison of different pair injection techniques, a discussion ofparticle trajectories near the pulsar Y-line, and performance optimization of the code.
1. Introduction
PICsar2D is a particle in cell (PIC) code that was developed for studying the pulsar magne-tosphere (Belyaev 2015a). The code solves Maxwell’s equations on a mesh with a current sourceterm provided by particles. Electric and magnetic fields are advanced in time using standard finitedifference time domain techniques (Yee 1966), and currents are deposited to the grid via either theVillasenor-Buneman (Villasenor & Buneman 1992) or Esirkepov (Esirkepov 2001) algorithms. Theparticle positions and velocities are leapfrogged in time using either the relativistic version of theBoris mover (Boris 1970) or the Vay mover (Vay 2008). The advantage of the Vay mover is that itaccurately captures the ExB particle drift, even when the particle Larmor frequency is not resolvedin time. The code is charge-conservative (to machine precision), and therefore no divergence clean-ing is required. A current filtering algorithm that preserves the charge-conservative property of thecode is used to suppress high-frequency noise. Additional features of the code include radiationabsorbing boundary conditions and variable grid spacing in the radial and theta directions.
PICsar2D has already been used to simulate the pulsar magnetosphere (Belyaev 2015b). Here,we focus on new results pertaining to numerical experiments with pair production and parti-cle motion near the Y-line. Additionally, we present performance optimization techniques thathave been used in tuning the public release of the code. The source code and analysis rou-tines used to generate the plots in this paper and in Belyaev (2015a,b) can be downloaded from“https://github.com/astromb/PICsar2D.git”. The repository also includes a step-by-step tutorialthat details how to compile and run the code, as well as use the analysis routines. a r X i v : . [ a s t r o - ph . H E ] J u l
2. Numerical Experiments with New Pair Production Algorithm2.1. New Pair Production Algorithm
There have already been a number of works investigating the pulsar magnetosphere with PIC(Spitkovsky & Arons 2002; Chen & Beloborodov 2014; Cerutti et al. 2015; Belyaev 2015b; Philippovet al. 2015a,b). The numerical methods used in these studies are broadly consistent, but one majorsource of variation between them is the particle injection method.Charged particles are expected to be present in the pulsar magnetosphere for two reasons. First,particles are pulled off the neutron star, because they are unbound by the intense electromagneticfields at the surface (Goldreich & Julian 1969). Second pair production of positrons and electrons(e.g. via gamma rays on magnetic field lines (Sturrock 1971)) creates a pair plasma in the vicinityof the pulsar surface.The particle injection prescription in Belyaev (2015b) consisted of two parts. The first partmodeled the emission of charged particles from the neutron star via a surface charge emissionalgorithm. This algorithm uses Gauss’s law to compute the amount of charge accumulated on thepulsar surface each timestep and releases a fraction of it into the magnetosphere. The second partmodeled the presence of pairs in the magnetosphere. To mock up a pair plasma, Belyaev (2015b)injected pairs throughout a spherical volume of the simulation domain with radius of order thelight cylinder radius in such a way that E · B was forced towards zero everywhere throughout thisvolume. This volumetric injection prescription was ad hoc, but it did achieve its intended purposeof generating a pair plasma around the pulsar.Cerutti et al. (2015) implemented a different pair injection algorithm. They injected pairs atthe surface of the pulsar and gave them a small initial outward velocity . This injection methodfills the magnetosphere with pairs that come exclusively from the surface of the neutron star. Thus,it can be viewed as a subgrid model for a surface pair cascade.The algorithm of Cerutti et al. (2015) produces pairs everywhere on the surface of the neutronstar. However, Beloborodov (2008); Timokhin & Arons (2013) have shown that a surface paircascade is present only if certain criteria are satisfied locally: for surface pair production to occur, either the four-current is spacelike, or it is timelike but the current density is in the sense ofthe Goldreich-Julian density flowing into the star. If the four-current is timelike and the currentdensity is in the same sense as the Goldreich-Julian density flowing away from the star, pairs arenot produced locally. The Goldreich-Julian density (in flat spacetime) can be written as ρ GJ = − Ω · B πc (Gaussian units) . (1) The small initial velocity can be physically justified by the fact that pair production by gamma rays on magneticfield lines leads to initial pair velocities that are outwardly directed along field lines.
We now perform numerical experiments that run the gamut of different options for surfacepair injection within
PICsar2D . These experiments not only demonstrate different “states” that arepossible in the pulsar problem with PIC, but also highlight the effect that different pair productionprescriptions have on the solution. These numerical experiments can be run using the publiclyavailable version of the
PICsar2D code and require only modest computing resources.We perform four simulations of the aligned rotator with different pair production prescriptions.Features common to each of the simulations are 1) surface charge emission is turned on for theentire simulation; 2) the grid dimensions are N r × N θ = 513 ×
257 ( N r and N θ give the number ofgrid edges which is one more than the number of grid cells); 3) the grid is logarithmic in the radialdirection and has extent 1 < r/r ∗ < .
5; 4) an equal area theta grid is used in the θ direction toreduce particle shot noise near the poles (Belyaev 2015a) (the simulation extent in θ is to the poles,0 < θ < π ); 5) the light cylinder radius is at R lc = 4 r ∗ ; 6) the simulations contain only electrons andpositrons (i.e. no ions or other particle species are included); 7) the simulations are for an alignedrotator; 8) the initial magnetic field structure is a dipole; 9) the Vay particle pusher is used ratherthan the Boris pusher; 10) Esirkepov current deposition is used rather than Villasenor-Buneman(although this essentially has no effect on the results); 11) the current is filtered twice using thesame filtering algorithm as in (Belyaev 2015a) to reduce high frequency noise.Our four numerical experiments are labeled A1, A2, D1, and D2, respectively, and they differ 4 –Fig. 1.— Aligned rotator spindown luminosity for simulations A1, A2, D1, & D2. The red lineshows the total spindown luminosity, and the black and blue lines show the respective contributionsfrom the electromagnetic and particle components.only in the pair production prescription. All simulations are run for a duration of t = 3 . P ∗ ,where P ∗ = 2 π/ Ω ∗ is the pulsar period with the exception of simulation D2, which is run forfour times longer. Simulation D1 has no pair production, only surface charge emission, similar toKrause-Polstorff & Michel (1985); Spitkovsky & Arons (2002). Simulation A1 has pair productionon all field lines for all time, similar to Cerutti et al. (2015). Simulation D2 has pair production onall field lines turned on for t < . P ∗ , but pair production is turned off entirely for t > . P ∗ ,leaving only surface charge emission. Simulation A2 has pair production on all field lines turnedon for t < . P ∗ but for t > . P ∗ we use our new injection algorithm that only injects pairson field lines satisfying the criteria of Beloborodov (2008); Timokhin & Arons (2013). We take theframe dragging rate in simulation A2 to be ω LT ( r ∗ ) = . ∗ as in Belyaev & Parfrey (2016) whencomputing the four current over the polar cap, which enters into the pair production criteria. 5 –Note that in our simulations that include pairs, we only model surface pair production. Wedo not provide models for any pair production processes that have been suggested for the outermagnetosphere. Thus, any particles that are injected into the simulation domain are injected withinone grid cell of the pulsar surface. Because there are no particles within the simulation domaininitially, this means all particles in all simulations originate very near the surface of the neutronstar.The different prefix labels, “A” or “D” for each simulation, denote whether the pulsar is inan active or a dead state at the end of the simulation. An active state means that the spindownluminosity is comparable to the spindown luminosity of the force-free aligned rotator (Contopouloset al. 1999; Spitkovsky 2006), and a dead state means that the spindown luminosity is muchlower than this. Fig. 1 shows the total spindown luminosity as a function of radius, as well asthe individual contributions from the electromagnetic component (Poynting flux integrated over aspherical shell) and the particle component. The spindown luminosity is normalized to that of theforce-free aligned rotator ( L ) and is measured at the end of each simulation.Active simulations have a spindown luminosity comparable to the force-free case, whereasdead simulations have a spindown luminosity that is effectively zero (note the different scale onthe y -axis between the active and dead cases in Fig. 1). However, the spindown luminosity isnot exactly equal to the force-free luminosity, even for active simulations. Moreover, there isconversion of electromagnetic energy to particle kinetic energy beyond the light cylinder in theactive simulations, which is not possible in the force-free limit. Both of these effects arise with PICbecause the force-free approximation breaks down over a finite volume of the simulation domain.The magnetization is a useful parameter that probes the ratio between the magnetic andparticle energy densities and determines how close to force-free the simulation is throughout thedomain. Fig. 2 shows the magnetization for simulations A1 and A2, where we define σ ≡ B π ( γ + n + + γ − n − ) mc (Gaussian units) , (2)and γ + , n + and γ − , n − are the average gamma factors and densities of the positrons and electrons,respectively. Near the Y-line and in the equatorial current layer, the magnetization is low implyingthat particle energy dominates magnetic energy. Particle inertia is important in these regions andtends to open up magnetic field lines. Because the spindown luminosity depends sensitively on theexact location of the Y-line as L ∼ L ( R Y /R LC ) − , moving the Y-line radially inward by only 5%amounts to a 20% increase in the pulsar spindown luminosity.Next, we consider the reason why simulations D1 and D2 lead to dead pulsar states. Becausethe magnetospheric plasma is generally well-magnetized, currents flow along magnetic field lines inthe meridional plane. In the active pulsar case, a return current flows at the boundary between thezone of open and closed field lines inside the light cylinder. The return current layer requires bothelectrons and positrons to be present for two reasons. First, the four-current in the return currentlayer is spacelike, and hence requires charge carriers of both signs moving in opposite directions. 6 –Fig. 2.— Logarithm of the magnetization parameter, σ , for simulations A1 (left panel) and A2(right panel) at the end of the simulation. The black curves are the magnetic field lines, and thegray inner circle is the neutron star. Note that just because a region has a high magnetization doesnot mean it is force-free, since magnetic field in vacuum has infinite magnetization. In fact, theregions on either side of the current layer in simulation A2 are vacuum gaps.Second, field lines that thread the return current layer “turn over” in the meridional plane ( B z changes sign at some point on the field line), which implies that the Goldreich-Julian density(equation [1]) also changes sign on these field lines. Emission of surface charge alone providesparticles of a single sign, either positive or negative, which depends on the sign of ρ GJ at thefootpoint of the magnetic field line. Thus, for field lines that turn over, surface charge emissioncannot supply plasma past the turn over point. For these reasons, pulsars with only surface chargeemission (no pair production) cannot support a return current layer and are dead.Because dead pulsars do not support return currents, the region of space around a dead pulsaris typically referred to as an “electrosphere” rather than as a magnetosphere. Electrospheres werestudied extensively by Krause-Polstorff & Michel (1985); Michel & Li (1999), who found a spacecharge separated plasma characterized by domes of negative charge near the poles and a torus ofpositive charge near the equatorial midplane . The left panel of Fig. 3 shows the charge distributionin the electrosphere at the end of simulation D1. Note that despite the large vacuum gaps in themagnetosphere, E · B = 0 over the surface of the neutron star. This is because the surface chargeinjection algorithm does not allow charge to build up on the pulsar surface.The rotation profile in the equatorial plane (right panel of Fig. 3) can be used to check the The only difference between the aligned and anti-aligned rotator as regards the electrosphere is that the sign ofcharge carriers is flipped in the domes and in the torus. ∗ . The solid black curve is calculated at the end of thesimulation and shows the rotation profile corresponding to the distribution of charge shown in theleft panel. The dashed curve is the initial rotation profile which corresponds to vacuum outside theneutron star (no particles).simulation result. Particles on magnetic field lines that are immersed in plasma corotate with theneutron star by Ferraro’s isorotation law (or equivalently by the frozen-in flux theorem). In theright panel of Fig. 3, the plasma in the equatorial plane corotates up to a point R (cid:46) . R ∗ , andthen departs from corotation for R (cid:38) . R ∗ . The radius R ≈ . R ∗ corresponds to the crossingradius in the equatorial plane of the last field line that is completely immersed in plasma (left panelof Fig. 3). Magnetic field lines that cross the equatorial plane at a larger radius pass through aregion of vacuum between the domes and the torus. Hence, the plasma on these field lines is nolonger bound to corotate with the pulsar by Ferraro’s isorotation law.Simulation D1 is in a dead state for all time, but even an active pulsar transitions to a deadstate if pair production is turned off entirely. Fig. 4 shows the transition from an active to adead state in simulation D2. The red points represent positrons and the blue points electrons.At t = 3 . P ∗ the magnetosphere is in an active state with plasma present everywhere, sincepair production has been turned on since the star of the simulation. After pair production is 8 –Fig. 4.— Downsampled snapshots of the spatial distribution of particles (positrons red, electronsblue) for simulation D2. The black curves are magnetic field lines and the gray sphere is theneutron star. The upper left panel corresponds to the time right before pair production is switchedoff in the simulation. After pair production is turned off, an electrosphere develops on a timescalecomparable to the pulsar rotation period. Notice how the open field lines close and contract to adipole structure after the return current ceases to flow.turned off, current ceases to flow in the return current layer, and the pulsar circuit is quenched.The magnetosphere collapses to an electrosphere and the pulsar transitions to a dead state on atimescale comparable to the pulsar period. The spindown luminosity in simulation D2 is slightlyelevated compared to simulation D1 for several rotational periods but settles down to the same lowlevels as in simulation D1 (compare the two bottom panels of Fig. 1).Given that turning off pair production entirely in simulation D2 rapidly leads to a dead state, itis interesting that simulation A2 remains in an active state after t > . P ∗ , when pair productionis shut off on field lines not satisfying the pair production criteria of Beloborodov (2008); Timokhin& Arons (2013). Perhaps not surprisingly, the reason why simulation A2 remains active has to dowith the return current layer. Because the four-current in the return current layer is spacelike, thefootpoints of the magnetic field lines along which the return current flows sustain pair production 9 – particles per cell A1 particles per cell A2pair multiplicity A1 pair multiplicity A2 Fig. 5.— Upper row: Logarithm of the particles per cell at the end of the simulation ( t = 3 . P ∗ )for simulations A1 (left) and A2 (right). The black wedge on either side of the current layer insimulation A2 is a vacuum gap that is devoid of both electrons and positrons. Lower row: Pairmultiplicity for simulations A1 (left) and A2 (right). The large purple wedge for simulation A2polewards of the the return current layer corresponds to field lines which do not support pairproduction and are populated by only a single sign of charge (electrons). The black curves aremagnetic field lines, and the gray circle is the neutron star.according to the criteria of Beloborodov (2008); Timokhin & Arons (2013). Thus, the return currentis not quenched in simulation A2 after the pair production criteria are imposed, and the simulationremains in an active state indefinitely.Even though simulations A1 and A2 both support active pulsars, there are important differ- 10 –ences between them. For instance, there is no pair production within a large wedge in θ startingfrom the return current layer and extending towards the pole in simulation A2. As a result, vac-uum gaps which are devoid of plasma are present on either side of the current layer. The upperpanels of Fig. 5 show the particles per cell (electrons and positrons combined) for simulations A1and A2. Vacuum gaps in simulation A2 are the black regions adjacent to the current layer. Fieldlines threading the vacuum gaps in simulation A2 are ones that do not support pair productionaccording to the criteria of Beloborodov (2008); Timokhin & Arons (2013) and also turn over in themeridional plane. Surface charge emission only supplies a single sign of charge from the neutronstar surface. Thus, these field lines contain no plasma beyond the point where they turn over andthe Goldreich-Julian density changes sign.At mid-latitudes, the field lines in simulation A2 do not support pair production, but they nolonger turn over, so the Goldreich-Julian density has the same sign everywhere along the field line.In this case, surface charge emission can provide plasma everywhere on the field line, meaning thesemid-latitude open field lines are populated with a single species electron plasma. Higher latitudefield lines close to the polar axis support pair production, because the four-current (taking GR intoaccount) is spacelike (Belyaev & Parfrey 2016; Gralla et al. 2016).The distinction between field lines that support pair production and those that do not becomesclear if we examine the pair multiplicity, κ ≡ abs (cid:18) n + + n − n + − n − (cid:19) (pair multiplicity) , (3)where n + and n − are the positron and electron densities, respectively. The lower panels of Fig.5 show the pair multiplicity for simulations A1 and A2. In simulation A2, there is a large purplewedge at mid-latitudes that corresponds to field lines that do not support pair production andhave pair multiplicity equal to unity. Near the polar axis and in the return current layer the pairmultiplicity is higher, because surface pair production is turned on in these regions according tothe criteria of Beloborodov (2008); Timokhin & Arons (2013).The frame dragging rate, ω LT ( r ∗ ), affects the θ -extent of the high-latitude pair producingregions adjacent to the polar axis. As the frame dragging rate is decreased, the pair producingregions near the polar axis shrink in size, and for ω LT = 0 (flat spacetime) they vanish altogether.However, even without pair production near the polar axis, there is still pair production in thereturn current layer. Thus, the pulsar remains active even in flat spacetime, because the presenceof pairs in the return current layer is the fundamental ingredient that is necessary for the pulsarcircuit to function.To conclude the discussion of active pulsars, Fig. 6 shows the cell-averaged gamma factorfor positrons in simulations A1 (left panel) and A2 (right panel). A large wedge of field lines at Although some particles may be present in the gaps, there is not enough plasma to short out the acceleratingcomponent of the electric field along magnetic field lines.
11 –Fig. 6.— Logarithm of the cell-averaged positron gamma factor at end of simulation A1 (left) andA2 (right). Particle acceleration occurs near the Y-line and in the current sheet. The large blackwedge in the right panel is a region devoid of positrons (see discussion in text). The black curvesare magnetic field lines, and the gray circle is the neutron star.mid-latitudes above the return current layer do not support pair production in simulation A2, asalready discussed. Only electrons are present on these field lines, and thus there is a large blackwedge in the right hand panel of Fig. 6, which contains only a few scattered positrons. One featurethat simulations A1 and A2 have in common, though, is the acceleration of positrons in the currentlayer starting at the Y-line near the light-cylinder. This is evident in both panels of Fig. 6 as theincrease in gamma factor at the Y-line and in the current layer.The electric field near the Y-line and in the current layer transforms electromagnetic energy toparticle kinetic energy. One unanswered question is what is the underlying reason for the presenceof a strong accelerating field in the vicinity of the Y-line? It is not obvious that the electric field inpulsar PIC simulations is due to spontaneous magnetic reconnection, which is the focus of manyidealized reconnection studies. Rather, it may be that the accelerating electric field in the vicinityof the Y-line is necessitated by the global structure of the magnetosphere. If this hypothesis iscorrect, then the electric field is required to “straighten out” positron trajectories passing throughthe Y-line and make them more nearly radial. A better understanding of the accelerating electricfield near the Y-line in PIC studies and how it depends on problem parameters is crucial for drawingcorrect comparisons between simulations and observations of high energy emission from pulsars. 12 – positronelectron
Fig. 7.— Upper panel: Multicolored curve shows a sample positron trajectory that starts at thesurface of the neutron star and passes through the Y-line into the current layer. Color indicates thegamma factor of the particle along its trajectory. The black and white background image shows themeridional current density normalized to a canonical value appropriate for the force-free solution.Lower panel: As above but for a sample electron trajectory that enters the current layer. 13 –
3. Particle Trajectories in the Current Layer and Near the Y-line
We now discuss particle trajectories on open field lines for an active pulsar, focusing particularattention on trajectories passing through the Y-line and current layer beyond the light cylinder.The electric field present at the Y-line and in the current layer acts as a particle filter, whichaccelerates positrons and electrons in opposite directions. Positrons entering the current layer areaccelerated radially outward and undergo relativistic Speiser orbits (Uzdensky et al. 2011). If thecurrent layer were exactly in the equatorial midplane, the radially-directed electric field wouldstraighten positron trajectories making them more nearly radial and focusing positrons toward themidplane. However, the current layer itself is susceptible to kinking, which deflects particles awayfrom the midplane. In contrast to positrons, electrons entering the current layer are acceleratedinward towards the Y-line. These electrons cannot travel past the Y-line and are reflected at thecorotation zone due to magnetic mirroring. The orbits of these electrons are complex, because theyare radially trapped in the current layer near the Y-line. However, electrons can pop out of thecurrent layer in the θ direction and become entrained in the wind above it. Such electrons undergoExB drift in the nearly force-free electromagnetic fields just above the current layer.The upper panel of Fig. 7 shows the trajectory of a sample positron from simulation A1 thatpasses through the Y-line and into the current layer. As the positron approaches the Y-line, it isaccelerated by the radially-directed electric field. It continues accelerating as it enters the currentlayer and initially starts to be focused into the midplane. However, for R/R lc (cid:38) . θ direction, becomes entrained in the pulsar wind above it, andstarts to undergo ExB drift in a predominantly radial direction. Note that the electron trajectoryappears to end only because the end of the simulation has been reached. If the simulation wererun for longer, the electron would continue to ExB drift outwards, either indefinitely or until it isreabsorbed into the current layer.Fig. 8 shows sample electron trajectories (blue) and positron trajectories (red) for simulationsA1 that pass beyond R = . R LC . Electrons and positrons that do not pass through the Y-lineor enter the current layer follow trajectories along field lines in the meridional plane. The currentlayer is actually much narrower than it appears judging from the positron trajectories beyond R = 1 . R LC . This is because the current layer is kinked, and the latitudinal spread in positrontrajectories traces the amplitude of the kink, not the width of the current layer. This is possiblebecause the kink and the positrons in the current sheet propagate outward in phase at nearly the 14 –Fig. 8.— Sample trajectories for electrons (blue) and positrons (red) that pass beyond R = . R LC .Black curves are magnetic field lines and the gray sphere is the neutron star.speed of light.Fig. 8 demonstrates filtering of particles by the electric field, which effectively removes allelectrons from the current layer past a few light cylinder radii. We caution against reading toomuch into this, however, since pair multiplicities in global pulsar PIC simulations are unrealisticallylow for numerical reasons. Thus, electrons may not be entirely filtered out from the current layerin reality, and it would be interesting to increase the pair multiplicity in future simulations. Inparticular, it would be interesting to study how the pair multiplicity affects the strength of theelectric field near the Y-line and in the current sheet.
4. Performance Optimization
PICsar2D has been optimized to take advantage of SIMD vectorization and cache reuse onmodern processors. We focused on optimizing the mover, since this is the slowest step in thecode. In fact, the mover is significantly slower than in a Cartesian PIC code, because of theadditional coordinate transformations that are necessary in curvilinear coordinates. Fortunately, itis straightforward to achieve vectorization of the most time-intensive parts of the mover by adoptingan Array of Structure of Arrays (AoSoA) data layout for the particles.Storing the data in an AoSoA format is a standard technique used to achieve both goodvectorization and cache localization for particle codes (Decyk & Singh 2014). To understand why 15 –this is the case, it is useful to first consider the simpler example of an Array of Structures (AoS). Inthis case, the particle array using Fortran indexing notation (innermost index is the fastest) can berepresented as a real array with dimensions p[PTL SIZE, N PTL, PTL TYPE]. Here, PTL SIZEis the size of a particle (e.g. if the particle only has 3 components of velocity and 3 of position thenPTL SIZE = 6), N PTL is the maximum length of the particle list, and PTL TYPE determineshow many particle species there are (e.g. PTL TYPE = 2 if there are only positrons and electrons).Because indices over the particle components change fastest, this leads to same memory layout asan array of particle structures. AoS leads to good cache reuse, since we typically want to updateall of the components of a single particle before moving on to the next particle. However, it doesnot lead to good vectorization, because the same component (e.g. x -velocity) for particles that areadjacent in memory is at a stride of PTL SIZE.To take advantage of SIMD vectorization and update multiple particles in one clock cycle,the individual components of position and velocity for different particles must have unit stride inmemory. To achieve this, we can consider using a structure of arrays (SoA), which is the transposeof an AoS (individually for each species). Thus, the particle data is stored in memory as p[N PTL,PTL SIZE, PTL TYPE]. Vectorization is now possible, since individual components of positionand velocity for different particles have unit stride in memory. However, good cache utilization isno longer possible if N PTL*PTL SIZE*SIZEOF(REAL) is greater than the size of cache in bytes,which is typically the case if N PTL is large.To achieve both good vectorization and cache reuse, it is necessary to use an AoSoA memorylayout for the particle data. In this case, the particle array can be represented in memory asp[VLEN,PTL SIZE, N PTL/VLEN, PTL TYPE]. Here, VLEN is a new free parameter that isrelated to the SIMD vector length of the machine. Notice that VLEN=1 corresponds to an SoAmemory layout and VLEN=N PTL corresponds to an AoS memory layout. However, the realpower of introducing a new parameter is due to the following reasons. First, if VLEN is a multipleof the machine vector length, then vectorization is possible for the same reason as in the AoS case.Second, good cache reuse only requires that VLEN*PTL SIZE*SIZEOF(REAL) is greater thanthe size of cache in bytes. Third, it is typically possible to choose VLEN to achieve both goodvectorization and cache reuse in a machine-independent way (i.e. without having to change VLENfor running on a different processor type). In our experience, setting VLEN to have a size of 128in units of bytes achieves the maximum possible vectorization without exceeding the cache size oncurrent CPUs.Fig. 9 shows the relative mover performance when varying VLEN from VLEN = 1 (AoS limit)to VLEN = N PTL = 2 (SoA limit) in multiples of two for a pulsar test problem. At low valuesof VLEN, vectorization is inefficient, but at high values of VLEN, the cache size is exceeded, somemory accesses are inefficient. In between, 2 < VLEN < , there is a plateau in performancewhen vectorization and cache reuse are both optimal. Such a performance plateau will in generalexist for each processor type, but the upper and lower extent of the plateau are determined by theSIMD vector length and cache size, which vary from processor to processor. 16 –Fig. 9.— Mover speedup (larger is better) on a single core as a function of the VLEN parameterusing an AoSoA memory layout to store the particle data. The blue points show the case with SIMDvectorization turned on and the orange points with SIMD vectorization turned off (via compilersettings and compile-time options). The vertical dashed lines depict the L1, L2 and L3 cache sizes(the processor type is Sandy Bridge). Note the drop off in performance each time a cache boundaryis exceeded. Performance is particularly bad when the vector length is so long that the particledata can no longer fit inside the L2 cache. 17 –Fig. 10.— Fractional time spent in different subroutines (smaller is better) for a pulsar test problemon a single core. The different segments of each bar depict the total time spent in the the currentdeposit (lower segment), mover (middle segment), and all other subroutines combined (upper seg-ment). Different bars correspond to different types of simulations, and the identifying label undereach bar is split into three parts. The first letter is “v” if vectorization is turned on or “n” ifvectorization is turned off. The second string of letters determines the type of current depositionalgorithm: “E” for Esirkepov or “VB” for Villasenor-Buneman. The last letter determines the typeof mover: “V” for Vay and “B” for Boris. The processor type is Broadwell and VLEN = 32.Storing the particles in memory using an AoSoA layout works well for vectorizing most of thework done by the mover without sacrificing cache performance. However, it does not vectorize theinterpolation of field components to particle locations in the mover or the deposition of currents fromindividual particles to the grid in the current deposit step. In particular, the Villasenor-Bunemancurrent deposition algorithm does not vectorize well, and the Esirkepov current deposition routinevectorizes only partially. Nevertheless, implementing an AoSoA layout for the particles in memoryalready improves code performance substantially.Fig. 10 shows a histogram of the relative performance for various combinations of mover anddeposit algorithms with and without SIMD vectorization using an AoSoA memory layout for theparticles. SIMD vectorization dramatically reduces the amount of time spent by the code in themover. In fact, the amount of time spent in the mover becomes about equal to the amount of timespent in the current deposit step, and the overall simulation time is reduced by over a factor oftwo. 18 – REFERENCES
Beloborodov, A. M. 2008, ApJ, 683, L41Belyaev, M. A. 2015, New Astronomy, 36, 37Belyaev, M. A. 2015, MNRAS, 449, 2759Belyaev, M. A., & Parfrey, K. 2016, ApJ, 830, 119Boris, J. P. 1970, “Relativistic plasma simulation-optimization of a hybrid code”. Proceedings ofthe 4th Conference on Numerical Simulation of Plasmas. Naval Res. Lab., Washington, D.C.pp. 3 - 67.Cerutti, B., Philippov, A., Parfrey, K., & Spitkovsky, A. 2015, MNRAS, 448, 606Chen, A. Y., & Beloborodov, A. M. 2014, ApJ, 795, L22Contopoulos, I., Kazanas, D., & Fendt, C. 1999, ApJ, 511, 351Decyk, V. K., & Singh, T. V. 2014, Computer Physics Communications, 185, 708Esirkepov, T. Z. 2001, Computer Physics Communications, 135, 144Goldreich, P., & Julian, W. H. 1969, ApJ, 157, 869Gralla, S. E., Lupsasca, A., & Philippov, A. 2016, ApJ, 833, 258Krause-Polstorff, J., & Michel, F. C. 1985, MNRAS, 213, 43PMichel, F. C., & Li, H. 1999, Phys. Rep., 318, 227Philippov, A. A., Spitkovsky, A., & Cerutti, B. 2015, ApJ, 801, L19Philippov, A. A., Cerutti, B., Tchekhovskoy, A., & Spitkovsky, A. 2015, ApJ, 815, L19Spitkovsky, A., & Arons, J. 2002, Neutron Stars in Supernova Remnants, 271, 81Spitkovsky, A. 2006, ApJ, 648, L51Sturrock, P. A. 1971, ApJ, 164, 529Surmin, I. A., Bastrakov, S. I., Efimenko, E. S., et al. 2016, Computer Physics Communications,202, 204Timokhin, A. N., & Arons, J. 2013, MNRAS, 429, 20Uzdensky, D. A., Cerutti, B., & Begelman, M. C. 2011, ApJ, 737, L40Vay, J.-L. 2008, Physics of Plasmas, 15, 056701 19 –Villasenor, J., & Buneman, O. 1992, Computer Physics Communications, 69, 306Yee, K. 1966, IEEE Transactions on Antennas and Propagation, 14, 302