Asynchronous Liquids: Regional Time Stepping for Faster SPH and PCISPH
Prashant Goswami* · Christopher Batty
Received: date / Accepted: date
Abstract
This paper presents novel and efficient strategies to spatially adapt the amount of computational effort applied based on the local dynamics of a free surface flow, for both classic weakly compressible SPH (WCSPH) and predictive-corrective incompressible SPH (PCISPH). Using a convenient and readily parallelizable block-based approach, different regions of the fluid are assigned differing time steps and solved at different rates to minimize computational cost. Our approach for the WCSPH scheme extends an asynchronous SPH technique from the compressible flows of astrophysical phenomena to the incompressible free surface setting, and further accelerates it by entirely decoupling the time steps of widely spaced particles. Similarly, our approach to PCISPH adjusts the number of iterations of density correction applied to different regions, and asynchronously updates the neighborhood regions used to perform these corrections; this sharply reduces the computational cost of slowly deforming regions while preserving the standard density invariant. We demonstrate our approaches on a number of highly dynamic scenarios, showing that they can typically double the speed of a simulation compared to standard methods while achieving visually consistent results.
Keywords regional time stepping · asynchronous time integration · SPH · PCISPH
P. Goswami (*corresponding author)
Blekinge Institute of Technology, Sweden
Tel.: +0 (46) 455 - 38 58 26
Fax: +0 (46) 455 - 38 50 57
E-mail: [email protected]

C. Batty
University of Waterloo, Canada
Tel.: +1 (519) 888-4567
Fax: +1 (519) 885-1208
E-mail: [email protected]
1 Introduction

The
Smoothed Particle Hydrodynamics (SPH) method is a powerful and widely used approach to liquid animation [9,8,14]; among other benefits, it produces detailed splashing and droplet effects, supports seamless topological changes and preservation of liquid mass, and handles complex boundaries in a straightforward manner. However, capturing a sufficiently wide range of spatial scales in order to generate visually compelling results often requires large particle counts, and correspondingly long simulation times.

To date, several acceleration strategies have been proposed to tackle this challenge, including GPU or multi-core CPU methods that exploit parallelism (e.g., [15,13,16]) and spatially adaptive methods that coarsen the particles' spatial resolution away from the surface (e.g., [2,12]). We propose a new and complementary approach.

Across all SPH methods, the choice of time step remains a crucial factor in determining the overall computational cost. All else being equal, the smaller the time step, the more iterations must be taken to simulate a given span of time, and hence the longer the total time spent running the simulation. The standard time stepping strategy is to use a single global time step which is either held constant throughout the simulation, or varied as a function of the most rapidly deforming region of the flow to ensure stability and accuracy [18]. However, many practical fluid flows involve both slow movement and comparatively rapid movement, due to external forces, inflow/outflow boundaries, collisions with objects, and so forth. A global time step is often much too conservative in slow moving regions, leading to a great deal of wasted computational effort where a large time step would suffice.

Problems of this nature suggest the use of asynchronous time integration: different regions of a simulation should be computed at different rates in order to maximize efficiency while satisfying accuracy and stability restrictions.
Variations on this idea have been applied to animation problems in rigid bodies, cloth, deformable bodies, and collision processing [38,37,36], and it has a long history in mechanics (e.g. [4]). This general strategy has also been developed for certain SPH simulations in astrophysics [25,22] and for weakly compressible SPH (WCSPH) [14] in [1,39]. However, to our knowledge this concept has not been extended to animating free-surface flow with incompressible SPH methods like the PCISPH scheme [17].

In this paper, we introduce regional time stepping (RTS) approaches for both the WCSPH and PCISPH methods, in which computational effort is expended on different fluid regions in proportion to the speed of their local dynamics. In our numerical experiments, we were able to reduce simulation times by approximately a factor of two compared to global adaptive time stepping on realistic, highly dynamic scenes in which the entire connected body of fluid is in motion. Our algorithm relies on an efficient block-based technique to determine the different regions and support convenient parallelism. After reviewing related work, we will outline how to choose the regions and their corresponding time steps, and then describe how we effectively incorporate this central idea into each of the WCSPH and PCISPH schemes.

This paper is an extension of our earlier published short paper on RTS for WCSPH [1]. Here we extend that work to incorporate RTS into predictive-corrective incompressible SPH and also provide a more detailed exposition of our method.
2 Related Work

The smoothed particle hydrodynamics method, or SPH, was first applied to liquid animation by Müller et al. [9], although Desbrun and Cani [5,6] had earlier applied it to animating highly deformable bodies. Further background on the classic weakly compressible SPH scheme can be found in a review by Monaghan [8] and a paper by Becker and Teschner [14]. More recently, the predictive-corrective incompressible variant of SPH (PCISPH) introduced by Solenthaler and Pajarola [17] has been widely adopted because it allows significantly larger time steps while maintaining incompressibility.

The Navier-Stokes equations governing fluid flow naturally yield behavior spanning a wide range of spatial and temporal scales; depending on the application of interest, certain of these features are more relevant than others. For example, in liquid animation the surface motion and details often take priority, and this is exploited in spatially adaptive approaches that coarsen the particle scale further from the surface to reduce the total number of particles [2,11,12]. On the other hand, both GPU-based methods [10,11,15] and multi-core CPU-based methods [13] have also been proposed to accelerate SPH simulations, without necessarily relying on adaptivity. The application of asynchronous time stepping is largely orthogonal to many of these approaches, and therefore complementary; we demonstrate our new method within a parallel CPU-based SPH code.

As noted earlier, the time step is a crucial factor in the computational cost of SPH. The most common strategy to incorporate temporal adaptivity is to modify the global time step over the course of the simulation based on the maximum velocities and forces of the entire fluid body at a given time. This allows the simulation to proceed more rapidly during calm motions, but to take much smaller time steps when necessary to resolve very rapid motion. For example, this strategy was recently adapted to the PCISPH method by Ihmsen et al. [18].
Raveendran et al. [19] proposed a rather different multi-resolution strategy to allow large time steps: the SPH method is augmented with a broad-scale Eulerian projection method to provide a good initial guess at the fluid pressure. In contrast, our approaches are purely Lagrangian.

We are aware of no methods for liquid animation that exploit the possibility of varying the time step itself spatially. One partial exception is the method of Goswami and Pajarola [20] in which very slow moving particles are entirely frozen to save computational cost. Although this accelerates the simulation,
Fig. 1
Water gushing down stairs and along a corridor, simulated using our regional time stepping PCISPH method with 1.6M particles. We achieve a factor of 3.6 speed-up over a constant time step approach and a factor of 1.7 speed-up over global adaptive time stepping [Ihmsen et al. 2010]. Top: Particles colored by time step region. Middle: Opaque surface geometry. Bottom: Refractive water rendering.

its applicability is fairly limited and it can introduce objectionable dissipation effects in the fluid motion if applied aggressively.

SPH methods in astrophysics applications have employed asynchronous time integration strategies [21,25,22,3] to deal with large variations in time scales and stiffnesses. This setting differs from ours in that the target medium is typically compressible and does not involve a free surface. Our asynchronous approach for weakly compressible SPH builds on that of Serna et al. [22], augmenting it with a block-based approach that lets us smooth temporal variations between regions and skip a larger amount of computation in less active regions. Furthermore, we develop a novel regional time stepping method for PCISPH that extends many of the advantages of asynchrony to this setting as well.

Lastly, we note that while projection-based Eulerian methods for incompressible flow are inherently synchronous to some degree, Patel et al. [7] explored using distinct time steps for disjoint liquid bodies within the same simulation to gain some of the benefits of asynchrony.
3 Block-Based Regional Time Stepping

Our algorithm relies on a block-based architecture. If s is the initial particle spacing, we divide the simulation domain into a virtual grid, with each block having support radius r, such that r ≃ s. Thus each particle is contained by exactly one of the blocks in the simulation domain.

Such an arrangement has several benefits. For example, neighbors of all particles in a block can be computed efficiently by examining neighboring blocks. Each block can also be treated as a parallelization unit for computing the physics of the particles within it, as in the work of Goswami et al. [15].

However, the most important advantage of the block-based arrangement in our case is parallel region determination. The time steps for a given region are computed over these virtual blocks instead of at the particle level, under the reasonable assumption that liquid in a local area tends to be deforming at comparable rates. This method can then be efficiently parallelized by launching a thread per filled block instead of per particle. Particles falling within a block report their velocity and force up to the parent block, thereby avoiding any race or collision conditions.

3.1 Time Step Selection

Our simple block-based time step computation is illustrated in Figure 2, and comprises three steps:

1. All particles compute their velocity and total force.
2. Particles propagate their attributes to their parent (i.e., containing) block. A minimum time step is computed for the block based on the maximum force and velocity from its particles.
3. Each block's time step is propagated back to its particles.

This approach is used for both WCSPH and PCISPH. In what follows, ℜ_n denotes a region, or set of blocks, assigned to a given time step ∆t_n = n∆t_b, where ∆t_b is the base time step and n is a positive integer.
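The three-step reduction above can be sketched as follows. This is an illustrative Python reimplementation rather than the authors' C++/OpenMP code; the helper names (block_of, assign_block_time_steps) and the threshold constants are our own assumptions, and only a velocity-based criterion is applied for brevity.

```python
import math
from collections import defaultdict

def block_of(pos, r):
    """Map a particle position to its parent block index on a virtual grid of size r."""
    return tuple(int(math.floor(c / r)) for c in pos)

def assign_block_time_steps(particles, r, dt_base, n_max=3):
    """particles: list of dicts with 'pos', 'vmag', 'fmag'. Returns per-particle dt.
    Steps 1-2: reduce per-particle maxima up to each parent block; the block then
    picks the largest stable multiple of dt_base. Step 3: push that dt back down."""
    block_max = defaultdict(lambda: (0.0, 0.0))  # block -> (V_max, F_max)
    for p in particles:
        b = block_of(p['pos'], r)
        v, f = block_max[b]
        block_max[b] = (max(v, p['vmag']), max(f, p['fmag']))
    # Illustrative thresholds (alpha, beta_n); only the velocity criterion is used here.
    alpha, beta = 0.4, [float('inf'), 0.5, 0.25]
    block_dt = {}
    for b, (vmax, fmax) in block_max.items():
        n = 1
        for k in range(n_max, 0, -1):  # try the largest time step first
            if vmax == 0.0 or (k * dt_base * vmax) / r <= alpha * beta[min(k - 1, len(beta) - 1)]:
                n = k
                break
        block_dt[b] = n * dt_base
    # Step 3: propagate each block's time step back to its particles.
    return [block_dt[block_of(p['pos'], r)] for p in particles]
```

Particles sharing a block necessarily share a time step, which is what removes per-particle race conditions in the parallel version.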
The corresponding particle set is denoted by S_n.

A block is assigned to the region corresponding to the largest time step for which it satisfies a set of three criteria, according to its particles' maximum velocity and force. The first two criteria are:

∆t_n ≤ λ_v r / c_s    (1)

∆t_n ≤ λ_f √(r m / F_max)    (2)

These are standard time step conditions from the SPH literature (e.g., [6,14]); the first is a CFL condition, while the second accounts for sudden accelerations over a time step. In these equations, c_s is the speed of sound in the medium, m is the particle mass, F_max is the maximum force magnitude of particles in the block, and V_max is the maximum velocity magnitude of particles in the block. We set the remaining coefficients to λ_v ≤ 0.… and λ_f ≤ 0.…. The third criterion is:

∆t_n V_max / r ≤ α β_n    (3)

In essence, Equation 3 assigns to each particle a time step based on the fraction of its support radius that it would cover in a step moving at its current velocity. The threshold cutoffs β_n determine how the time steps are partitioned. For SPH all three criteria are applied, and we set α = 0.4. For PCISPH, larger time steps can safely be taken than the classic CFL condition would dictate, so Equation 1 is omitted when assigning blocks to regions, and we set α = 1.

In our implementation, β_1 = ∞, which ensures that ℜ_1 is assigned the smallest time step; for n ≥ 2, the initial coefficient follows directly from the CFL condition, and each successive β_n is obtained by scaling down the previous value.

Fig. 2
Block-based time step computation for particles, red (∆t_b), green (2∆t_b) and blue (3∆t_b), where ∆t_b is the base (smallest) time step. (a) Particles colored by the individual time steps they would ordinarily possess. (b) Particles pass their velocity and force values to their parent blocks. Each block is assigned the minimum required time step based on its particles. (c) The block propagates its computed time step back to the particles. (Particles whose time step has been altered are outlined in black.)

4 Regional Time Stepping for WCSPH

At a given time t_n, with positions x_i^n, velocities v_i^n, and accelerations a_i^n for each particle i, the following predictor step of length ∆t = t_{n+1} − t_n is taken by all particles to estimate new velocities and positions at time t_{n+1}:

x̃_i^{n+1} = x_i^n + v_i^n ∆t + ½ a_i^n (∆t)²    (4)

ṽ_i^{n+1} = v_i^n + a_i^n ∆t    (5)

Among the set of all particles, the time t_{n+1} will be the conclusion of a "true" time step for some, called active particles; for the remainder, this step is taken only to provide intermediate information to nearby particles. Next, only the active particles have their neighborhoods and accelerations re-evaluated at t_{n+1}, and their positions and velocities are corrected:

x_i^{n+1} = x̃_i^{n+1} + ⅙ (a_i^{n+1} − a_i^n) (δt)²    (6)

v_i^{n+1} = ṽ_i^{n+1} + ½ (a_i^{n+1} − a_i^n) δt    (7)

Here δt refers to the length of time between t_{n+1} and the last time that each specific particle's acceleration was evaluated (i.e., the length of its true time step). All other particles maintain their previous acceleration value.

The net effect is that particles taking large time steps assume constant acceleration over their true step, and the intermediate predictor steps approximate the necessary "substep" information required by nearby particles that may be taking smaller (or offset) time steps.
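A minimal sketch of this predictor-corrector scheme, for a single particle with scalar state. The ⅙ and ½ correction coefficients are the standard second-order choices (exact when acceleration varies linearly over the step) and are an assumption of this sketch, not a transcription of the authors' code.

```python
def predict(p, dt):
    """Predictor: constant-acceleration kinematics over a substep of length dt."""
    p['x'] = p['x'] + p['v'] * dt + 0.5 * p['a'] * dt * dt
    p['v'] = p['v'] + p['a'] * dt

def correct(p, a_new, dt_true):
    """Corrector applied at the end of a particle's true step of length dt_true."""
    da = a_new - p['a']
    p['x'] += da * dt_true * dt_true / 6.0  # restores second-order position
    p['v'] += da * dt_true / 2.0            # trapezoidal velocity update
    p['a'] = a_new
```

Because the predictor assumes constant acceleration, composing several substeps of the predictor yields exactly the same end-of-step state as one large step, which is why the substeps serve only as interpolated information for neighbors.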
Because acceleration is assumed constant (and position and velocity treated accordingly), the number of substeps taken does not change the final end-of-step positions or velocities for the particles being substepped, compared to taking a single large step; the substepping is merely an interpolation process.

The correction applied at the end of a particle's true time step maintains second order accuracy in position and velocity. Furthermore, without this correction naïve asynchronous simulations exhibit visual artifacts. As evident from the noisier surface and altered color distribution in the dual dam breaking scenario in Figure 3, the particles' natural motion and stability is disrupted, erroneously leading to smaller time steps.

A fully synchronous scheme would simply advance all particles in the simulation at the smallest global time step. We make the further observation that if all the particles within a given particle's neighborhood require only the same or larger time step, then no interpolated substeps need to be taken and the final result will be the same. Therefore in regions of our domain assigned large time steps, we can safely integrate all the contained particles at that time step without the need to perform any substepping whatsoever. This allows the simulation to remain synchronized overall, while correctly integrating different regions at appropriate rates and avoiding unnecessary computation.

The basic outline of our approach is presented in Algorithm 1. The first step assigns a time step to each block within the simulation domain. That is, we choose ℜ_n and S_n and update the global block-based neighborhood grid. To ensure that the time step varies gradually across the physical domain, which aids in simulating quite stiff incompressible flows, we locate the boundary between regions with different time steps, and determine the set of blocks ℜ_min on the side with the larger time step.
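This boundary smoothing can be sketched at the block level: blocks adjacent to a region with a smaller time step are detected and downgraded to that smaller step, forming ℜ_min. The dictionary-based 2D grid and helper name are illustrative assumptions.

```python
def smooth_region_boundaries(block_dt):
    """block_dt: dict mapping 2D block index -> time step multiple n (int).
    Returns (new assignment, set of blocks forming R_min)."""
    r_min = set()
    new_dt = dict(block_dt)
    for (bx, by), n in block_dt.items():
        # Examine the four face-adjacent neighbor blocks.
        neighbor_ns = [block_dt[(bx + dx, by + dy)]
                       for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                       if (bx + dx, by + dy) in block_dt]
        smallest = min(neighbor_ns, default=n)
        if smallest < n:
            new_dt[(bx, by)] = smallest  # adopt the neighbor's smaller step
            r_min.add((bx, by))          # this block belongs to R_min
    return new_dt, r_min
```

Note that the pass reads the original assignment while writing the new one, so the downgrade does not cascade across the whole region in a single pass; only the one-block-wide border layer is affected.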
This region is then assigned the smaller time step of its neighboring region, which is done efficiently at the block level by checking each block's neighbors.

In our algorithm, the particles maintain a few additional variables. validity is the number of smallest time steps for which a particle's most recently computed attributes are assumed valid (i.e., how many substeps remain before its true time step ends). compute is a Boolean flag that indicates whether the particle is currently active (i.e., requires re-computation of its acceleration, and end-of-step correction of its position and velocity), which occurs when validity ≤ 0. If the particle is active, its neighborhood set is determined and its local density and forces are computed. Otherwise, it skips these steps. At the end of each loop, the position and velocity of each particle is updated, and active particles have their velocities and positions corrected (lines 25-30), per Serna's scheme [22].

We make some additional observations. First, while the computation of time steps is determined per block, it is updated on the individual particles, which also track their own validity. Blocks do not have validity or history, and therefore all computations over blocks are valid only for a frame. Second, the algorithm is pre-emptive. That is, a particle can change its time step even before its validity expires (line 14 of Algorithm 1). This allows the method to maintain stability in the face of sudden accelerations, as often occurs in collisions with boundaries.
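The per-particle bookkeeping described above (cf. lines 11-16 of Algorithm 1) might look as follows; the dictionary layout and function name are illustrative assumptions. A particle becomes active either when its validity counter expires or, pre-emptively, when its parent block has been reassigned a different time step.

```python
def update_particle_state(p, block_n):
    """p: dict with 'validity', 'n' (time step multiple), 'compute'.
    block_n: the parent block's current time step multiple."""
    p['validity'] -= 1
    if p['validity'] <= 0 or block_n != p['n']:
        p['compute'] = True       # re-evaluate neighborhood, density, forces
        p['n'] = block_n
        p['validity'] = block_n   # attributes stay valid for n base substeps
    else:
        p['compute'] = False
    return p
```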
Fig. 3
The initial moments of the double dam break example using RTS WCSPH. Top: Without applying the second order velocity and position corrections of Serna et al., the particles quickly push apart and the smooth surface is disrupted. Bottom: With the corrections applied we achieve the expected behavior.
Algorithm 1
Regional Time Stepping for WCSPH

 1: for all particles i do
 2:   set validity_i = 0
 3: while (animating) do
 4:   update block neighborhood grid
 5:   /* Region determination (ℜ_i) */
 6:   for all i ∈ S do
 7:     update parent block maximum with v_i and F_i^total
 8:   for all blocks b do
 9:     compute new region membership per Section 3
10:   for all i ∈ S do
11:     decrement validity_i
12:     if (validity_i ≤ 0) ‖ (parent block has a different time step) then
13:       set compute_i = true
14:       update validity_i, timestep_i
15:     else
16:       compute_i = false
17:   /* Physics computation */
18:   for all i ∈ S do
19:     if compute_i then find neighborhood N_i
20:   for all i ∈ S do
21:     if compute_i then update ρ_i, p_i
22:   for all i ∈ S do
23:     if compute_i then update F^{p,v,g}
24:   for all i ∈ S do
25:     predict new v_i
26:     predict new x_i
27:     if compute_i then
28:       apply correction to v_i
29:       apply correction to x_i

The second observation we build on is that more localized density corrections can be highly effective. Raveendran et al. [19] noticed this, and exploited it by applying a post-process that spends extra iterations correcting only those particles with large remaining density errors after their core algorithm concludes. We instead make this observation a fundamental feature of our algorithm, locally applying a different number of density correction iterations based on the dynamics of different regions of the flow.

5.2 The Algorithm

At a high level, our algorithm works as follows: we pick a large time step ∆t_n, called the major time step, and divide it into n equal subintervals, or minor steps, so that ∆t_n = n∆t_b. (We used n = 4.) At the beginning of a major step, we assign blocks of particles to different regions based on their dynamics as in Section 3. On each minor step, all regions perform at least one iteration of density correction, and then participate in additional iterations depending on their region membership.

Note that our algorithm is therefore not truly asynchronous in the manner of our RTS SPH approach; all particles are advanced in synchronization. However, the computational expense of slow moving regions is dramatically reduced by lowering the number of correction iterations applied. On the other hand, we do update the particle neighborhoods asynchronously, in proportion to how fast they are likely to change, since neighborhood searches are a major expense in SPH algorithms. As in WCSPH, we maintain a validity variable for each particle that tracks how many (minor) time steps a particle's data is considered to be valid for, based on its region membership. Primarily, this means that we update the particle neighborhood only when its validity expires, and otherwise reuse its most recently computed neighborhood.

Pseudocode for our approach is given in Algorithm 2, and we describe its various elements below.

Fig. 4
A 1.5M particle double dam break simulated using our regional time stepping algorithm for WCSPH. Top: Particles colored by time step region. Bottom: Particles for corresponding frames from a different viewpoint and colored uniformly.
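The major/minor step structure described above can be sketched as a driver loop. Here iterations_for is a hypothetical schedule lookup (cf. the discussion of Figure 6), and the callback interface is our own simplification rather than the authors' design.

```python
def run_major_step(regions, n_minor, iterations_for, correct_region):
    """One major step of n_minor minor steps. regions: list of region ids
    (0 = fastest); correct_region(r, j): callback performing one density
    correction iteration on region r during minor step j."""
    total = {r: 0 for r in regions}
    for j in range(n_minor):
        for r in regions:
            # Every region receives at least one correction per minor step;
            # faster regions receive more, per the schedule lookup.
            iters = max(1, iterations_for(r, j))
            for _ in range(iters):
                correct_region(r, j)
                total[r] += 1
    return total
```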
Fig. 5
Block-based region determination. (a) Blocks are assigned time steps. (b) At borders between time step regions, we assign blocks with larger time steps (blue, 3∆t_b) the time step of their neighbour with a smaller time step (green, 2∆t_b) to smooth out region transitions. This region is called ℜ_min, shown with brown borders. (c) For RTS PCISPH, we must also identify the set of blocks ℜ_ob which border region changes on the side with the larger time step; this region is observed for density variations of particles that were previously resolved. (d) We also identify the region ℜ_e containing particles with remaining density errors regardless of their region, and augment ℜ_ob with the cells bordering this region.

Fig. 6
Our density correction schedule for PCISPH. Each column (a)-(d) represents one minor time step within a major time step of 4∆t_b. Each row within a column represents one iteration of density correction being applied. The colored blocks within each row indicate which regions have density correction performed on them at a given iteration within a minor step. Blocks labelled e correspond to iterations performed for all remaining particles whose density error hasn't been fully corrected.

The schedule we designed has the following properties. All regions receive a minimum of one iteration of density correction on every step, as seen in the first row, with the exception of the very first step within a major step, in which all regions receive two iterations. Additional iterations are then assigned based on region membership. The fastest moving (∆t_b) regions receive 3 iterations per step, as in classic PCISPH. The next fastest moving (2∆t_b) region receives 3 iterations for every two minor steps. Slower moving regions similarly receive fewer total iterations per major step. A per-particle flag turn is set to true if a given particle is scheduled to undergo density correction on the current iteration.

5.4 Local Density Correction

We sometimes find that a small number of particles have unresolved density errors after their assigned number of correction iterations is completed on a given step. We address this in a manner similar to Raveendran et al. [19]. We introduce an additional region ℜ_e (with particle set S_e); this constitutes all particles having error larger than a chosen threshold, irrespective of their time step. We follow the same error metric as given by Ihmsen et al. [18], in which the average density error ρ_err^avg of all particles and the maximum density error ρ_err^max of any particle should not exceed a certain value. Per Raveendran et al., this threshold is set to half the value of the largest allowed density error. The particles that compose S_e are corrected at every iteration of our schedule.
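Selecting the particle set S_e can be sketched as follows. Treating only compression (ρ > ρ0) as error, and expressing it relative to the rest density, are assumptions of this sketch; the helper name is our own.

```python
def select_error_set(densities, rho0, max_allowed_err):
    """densities: list of predicted particle densities; rho0: rest density;
    max_allowed_err: largest allowed relative density error.
    Returns the indices of particles forming S_e."""
    threshold = 0.5 * max_allowed_err  # half the largest allowed error
    s_e = []
    for i, rho in enumerate(densities):
        err = max(rho - rho0, 0.0) / rho0  # relative compression error only
        if err > threshold:
            s_e.append(i)
    return s_e
```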
If no such particles exist, we skip this step.

Since we often perform density correction on just a subset of all the particles, we need to know whether these corrections have disrupted the density of particles whose density was previously declared correct. Movement near or across the boundaries between regions can easily cause this to occur. To account for this we identify another region, ℜ_ob, the set of "observed" blocks that we monitor for changes, as shown in Figure 5. This is the set of blocks that border on either ℜ_e or a region with a smaller time step; its particle set is denoted S_ob. After each iteration, it suffices to check just the density of particles in S_ob for newly introduced errors, avoiding an expensive global check on all particles whose density is already correct.

5.5 Extra Correction Iterations

After the standard 3 iteration schedule is completed for a given minor step, if some error in density remains we do one of two things. We can either perform additional purely local corrections on the particles with remaining error, or run through additional global iterations by resetting to the top of the correction schedule for the current minor step. Local correction is often preferable since it minimizes the number of particles involved, but it may not always be efficient or feasible if the error is large or global. We therefore make this choice by considering whether the average compression error ρ_err^avg exceeds a threshold, since this is indicative of global density errors. If local iterations are initially selected, but they nonetheless fail to converge after a few iterations, we switch back to global corrections and revert the pressure estimates to their values from before local iterations began.

If a minor step exceeds 6 global iterations for correction, we terminate the major step early and temporarily reduce the length of subsequent major steps to 2∆t_b (i.e., two minor steps rather than four).
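Determining the observed block set ℜ_ob can be sketched as follows, on an illustrative 2D block grid with face adjacency only; the helper name is our own.

```python
def find_observed_blocks(block_dt, error_blocks):
    """block_dt: dict mapping block index -> time step multiple n;
    error_blocks: set of blocks containing particles of S_e.
    Returns the set R_ob of blocks to re-check after partial corrections."""
    r_ob = set()
    for (bx, by), n in block_dt.items():
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (bx + dx, by + dy)
            # A block is observed if it borders R_e or a smaller-time-step region.
            if nb in error_blocks or (nb in block_dt and block_dt[nb] < n):
                r_ob.add((bx, by))
                break
    return r_ob
```

After each correction iteration, only the particles inside these blocks need their densities re-checked, avoiding a global pass.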
This allows the global neighborhood grid to be completely updated more frequently, to better handle these large density changes. After 10 major steps in which we do not exceed 6 global iterations on any minor step, the major time step is reverted. However, we emphasize that in our experiments we found that for the majority of cases extra global operations aren't necessary; a few steps of local correction typically suffice.

For PCISPH we make one additional minor change to our region determination strategy: we expand the fastest regions by 4 layers of blocks each before beginning a major step, in a manner similar to ℜ_min. This is done to account for fast particles that may travel several blocks from their original position.

Fig. 7
For PCISPH particles in the fastest region, we search two layers of grid cells (red) when determining the particle neighborhood, rather than the usual one (green).

Particles in the fastest region update their neighborhood every minor step, particles in the next region update every second minor step, and particles in slower regions do so only once per major time step. We found that since particles in the slowest regions are also slow moving, postponing their neighborhood update to the next major step is satisfactory as well.

At each minor step, we also check if a particle now requires a smaller time step. If so, we mark its block and the neighboring blocks to update their time step. This allows the simulation to adapt to sudden changes.

When a particle's validity expires and its neighborhood needs updating, a naïve approach would globally update the particles belonging to each grid block and perform a new search within the updated neighboring blocks. This would be very costly, limiting the benefit of our asynchronous strategy. To cope with this, we take two steps. First, we initially overpopulate the grid at the start of each major step; that is, we use blocks slightly larger than r = 2s. This ensures that particles will nearly always find all the particles belonging to their neighborhood in their own or adjacent blocks, even if we keep the same block data over the entire major step.

Second, for the very fast particles in the fastest region, we expand the radius of our neighbor particle search to two grid blocks instead of one (see Figure 7). While this means we must check many more possible neighbor particles, this is justified because these particles are few in number and arise infrequently, yet are critical to stability. Doing this for the particles in slower regions as well would be substantially more expensive. Figure 8 illustrates that expanding the search radius for additional regions makes little visual difference, so we avoid doing so for the sake of efficiency.
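The update cadence just described can be captured in a small predicate. The 4-minor-step major step matches the n = 4 choice used in the paper; the function name and the 0-based region indexing are our own.

```python
def needs_neighborhood_update(region, minor_step):
    """region: 0 = fastest (smallest time step); minor_step: 0-based index
    within the major step. Returns True if this region refreshes its
    neighbor lists on this minor step."""
    if region == 0:
        return True                  # fastest region: every minor step
    if region == 1:
        return minor_step % 2 == 0   # every second minor step
    return minor_step == 0           # slower regions: once per major step
```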
(Moreover, standard PCISPH assumes a fixed neighborhood over large time steps [17], so the method is reasonably robust to small errors in neighborhood estimates.)

Algorithm 2
Regional Time Stepping for PCISPH

 1: while animating do
 2:   update block neighborhood grid
 3:   /* Region determination (ℜ_i) */
 4:   for all i ∈ S do
 5:     update parent block maximum with v_i and F_i^total
 6:   for all blocks b do
 7:     compute new region membership
 8:   expand the fastest regions
 9:   find observed blocks ℜ_ob and particle set S_ob
10:   for all i ∈ S do
11:     update timestep_i, validity_i, compute_i using parent block
12:   /* N minor time steps */
13:   for j = 1 to N do
14:     for all i ∈ S do
15:       if compute_i then update neighborhood N_i
16:     for all i ∈ S do
17:       if compute_i then update F^{v,g,ext}
18:     for all i ∈ S do
19:       initialize pressure p(t) = 0.0
20:       initialize pressure force F^p(t) = 0.0
21:     /* Density correction */
22:     DensityCorrectionRTS
23:     for all i ∈ S do
24:       compute new v_i(t+1)
25:       compute new x_i(t+1)
26:     for all i ∈ S do
27:       decrement validity_i
28:       if (validity_i ≤ 0) then
29:         set compute_i = true
30:         update validity_i
31:       else compute_i = false

Algorithm 3
DensityCorrectionRTS
 1: set minIterations to 3
 2: set S_e to NULL
 3: while (iter ≤ minIterations) ‖ (ρ*_err(t+1) ≥ η) do
 4:   for all i ∈ S do
 5:     predict velocity v*_i(t+1)
 6:     predict position x*_i(t+1)
 7:   for all i ∈ S do
 8:     if (i has turn) ‖ (i ∈ S_e) ‖ (i ∈ S_ob) then
 9:       predict density ρ*_i(t+1)
10:       update density variation ρ*_err(t+1)
11:       update pressure p_i(t) += f(ρ*_err(t+1))
12:   if (density error remains outside active regions) then
13:     update S_e, S_ob
14:   for all i ∈ S do
15:     if (i has turn) ‖ (i ∈ S_e) then
16:       compute pressure force F^p_i(t+1)
17:   iter++
18:   if (ρ*_err(t+1) ≥ η) & (iter ≥ minIterations) then
19:     minIterations++
20:   if (ρ^avg_err ≥ η_T) then
21:     reset to top of active step's correction schedule (i.e., extra global corrections)
22:   else
23:     perform local correction on S_e

6 Results

The proposed and baseline WCSPH/PCISPH methods were implemented on a Mac OS X 10.7.5 machine with a 3.2 GHz quad-core Intel processor, using C++ and the OpenMP API. The images were rendered offline with POV-Ray.

For WCSPH, we used ∆t_cfl for standard time stepping and ∆t_b = ∆t_cfl for RTS. For PCISPH, we compare global time stepping with a constant time step against both adaptive global stepping [18] and our method. For the sake of experimental similarity with Ihmsen's results, we used a constant time step of 0.…, and ∆t_b is chosen to be 0.…∆t_b in each major cycle. The user-chosen error threshold ρ_T (for whether to perform extra global or local corrections) is set to 0.25 in all our experiments.

Since we use a moderately restrictive value of α = 0.4, most particles fall into the first two regions. For the same reason, particles do not often take time steps larger than 3∆t_b. Our block-based RTS algorithm allows these transition regions and occasional very fast moving particles to be treated efficiently, while keeping the simulation stable and accurate.

Table 1 gives the performance speed-up of our methods in comparison to both WCSPH and PCISPH for several examples. Our methods yield simulations between 1.7 and 2.1 times faster than the comparison methods for the given examples. Table 2 provides relative timing breakdowns for our methods into physics computations, neighborhood searching, and regional time-stepping components. Figure 11 and our video examples illustrate that RTS approaches yield results that are visually consistent with the original SPH schemes. Furthermore, they work well even for challenging dynamic scenes involving frequent collisions with boundaries and obstacles.

Fig. 8
Extending the neighborhood search radius to two blocks for particles in additional regions results in little evident difference when compared to Figure 11(a), in which only the search radius for the fastest region is expanded.

Fig. 9
A highly turbulent flow involving several obstacles and boundaries, simulated with 4.5M particles using the proposed regional time stepping algorithm for PCISPH.

Fig. 10
A dam break scenario where immediately after releasing the dam we drop a secondblock of liquid on top, simulated with 2.2M particles using RTS WCSPH.Scene
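To make the bookkeeping of Algorithm 2 (lines 26-31) concrete, the following minimal Python sketch shows how per-particle validity counters and compute flags defer updates for slowly moving blocks. The names, block granularity, and the empty update body are illustrative assumptions only; this is a stand-in, not our C++/OpenMP implementation.

```python
# Minimal sketch of the validity/compute bookkeeping in Algorithm 2
# (lines 26-31). Names and block granularity are illustrative; this is
# not the paper's C++/OpenMP implementation.
from dataclasses import dataclass

@dataclass
class Block:
    step_multiple: int    # block time step, as a multiple of the base step dt_b

@dataclass
class Particle:
    block: Block
    validity: int = 0     # minor steps until cached data expires
    compute: bool = True  # recompute neighborhood/forces this minor step?

def minor_step(particles):
    """One minor step: update only particles flagged for computation,
    then advance each validity counter as in lines 26-31."""
    updated = 0
    for p in particles:
        if p.compute:
            updated += 1  # neighborhood, force, and position updates go here
    for p in particles:
        p.validity -= 1
        if p.validity <= 0:
            p.compute = True                     # line 29
            p.validity = p.block.step_multiple   # line 30: refresh from parent block
        else:
            p.compute = False                    # line 31
    return updated

# A fast block updated every minor step, and a slow block updated every 4th.
fast, slow = Block(step_multiple=1), Block(step_multiple=4)
particles = [Particle(fast), Particle(slow)]
counts = [minor_step(particles) for _ in range(8)]
print(counts)  # → [2, 2, 1, 1, 1, 2, 1, 1]: the slow particle skips most steps
```

The slow particle is recomputed only when its validity counter expires, which is the source of the savings reported in Table 1.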
Table 1  Performance comparisons of regional time stepping with other methods.
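The control flow of DensityCorrectionRTS (Algorithm 3) can likewise be illustrated on a toy scalar problem. In the sketch below, SPH density prediction is replaced by relaxation of scalar values toward a target; the turn-taking and the error set S_e are kept, while the obstacle set S_ob, minIterations growth, and the extra global/local corrections are simplified away. All names and values here are hypothetical.

```python
# Toy sketch of the control flow of DensityCorrectionRTS (Algorithm 3).
# SPH density prediction is replaced by relaxation of scalar values toward
# a target; turn-taking and the error set S_e are kept, while minIterations
# growth and the extra global/local corrections are simplified away.

def density_correction_rts(values, target, turn, error_set,
                           eta=1e-3, min_iterations=3, relax=0.5):
    """Correct only entries whose region 'has turn' or that appear in
    error_set; grow error_set when error shows up outside active regions."""
    it = 0
    while True:
        max_err = 0.0
        for i, v in enumerate(values):
            if i in turn or i in error_set:          # line 8 (S_ob omitted)
                err = v - target
                values[i] -= relax * err             # stand-in for the pressure update
                max_err = max(max_err, abs(err))
        it += 1
        # Lines 12-13: flag entries whose error lies outside the active regions.
        for i, v in enumerate(values):
            if abs(v - target) >= eta and i not in turn:
                error_set.add(i)
        if (it >= min_iterations and max_err < eta) or it > 100:
            break
    return it

# Usage: only index 0 "has turn"; index 1 is off-turn but erroneous, so it
# is picked up by the error set and corrected anyway.
vals = [1.2, 0.9, 1.0]
errors = set()
iters = density_correction_rts(vals, target=1.0, turn={0}, error_set=errors)
```

The key design point mirrored here is that off-turn entries are left untouched unless their error exceeds the threshold, at which point they join S_e and participate in subsequent corrections.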
Table 2  Timing breakdowns for each method. Computations specific to RTS constitute only 10-15% of the run-time.

          T_neighbor   T_physics   T_RTS
WCSPH        37%          42%       14%
PCISPH       28%          56%       12%

We have presented an efficient technique for regional time stepping for WCSPH and PCISPH. The proposed methods are simple to implement and can achieve significant speed-ups while maintaining behaviour consistent with the corresponding fully synchronous simulations. We envision several directions for future work. Two natural extensions would be implementing a GPU-based version, and combining our method with spatial adaptivity, such as Solenthaler's two-scale method [12]. Another promising research direction could be to integrate regional time stepping with implicit incompressible SPH [41], position based fluids [42] and divergence-free SPH [40]. It could also be useful to adjust the base step of our PCISPH scheme in a manner similar to the adaptive global time stepping method of Ihmsen et al. [18], in order to achieve the benefits of both. Finally, to reap the full benefits of asynchrony in a wider range of scenarios, we would like to explore methods that can couple SPH with other asynchronously integrated physical systems, such as deformable or rigid bodies.
All persons who meet authorship criteria are listed as authors, and all authors certify that they have participated sufficiently in the work to take public responsibility for the content, including participation in the concept, design, analysis, writing, or revision of the manuscript. Furthermore, each author certifies that this paper (and its contents) is original research work and has not been and will not be submitted to or published in any other publication.
References
1. Goswami, P., Batty, C.: Regional Time Stepping for SPH. Eurographics Short Papers (2014)
2. Adams, B., Pauly, M., Keiser, R., Guibas, L.J.: Adaptively sampled particle fluids. ACM Trans. Graph. 26(3) (2007)
3. Vanaverbeke, S., Keppens, R., Poedts, S., Boffin, H.: GRADSPH: A parallel smoothed particle hydrodynamics code for self-gravitating astrophysical fluid dynamics. Computer Physics Communications 180, 1164-1182 (2009)
4. Belytschko, T.: Partitioned and Adaptive Algorithms for Explicit Time Integration. Nonlinear Finite Element Analysis in Structural Mechanics, 572-584 (1981)
5. Desbrun, M., Cani, M.-P.: Smoothed particles: A new paradigm for animating highly deformable bodies. Proceedings of the Eurographics Workshop on Computer Animation and Simulation, 61-76 (1996)
6. Desbrun, M., Cani, M.-P.: Space-Time Adaptive Simulation of Highly Deformable Substances. Technical Report 3829, INRIA (1999)
7. Patel, S., Chu, A., Cohen, J., Pighin, F.: Fluid Simulation via Disjoint Translating Grids. ACM SIGGRAPH Sketches (2005)
8. Monaghan, J.J.: Smoothed Particle Hydrodynamics. Rep. Prog. Phys. 68, 1703-1759 (2005)
9. Müller, M., Charypar, D., Gross, M.: Particle-based fluid simulation for interactive applications. Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (SCA '03), 154-159 (2003)
10. Harada, T., Koshizuka, S., Kawaguchi, Y.: Smoothed Particle Hydrodynamics on GPUs. Proceedings of Computer Graphics International, 63-70 (2007)
11. Zhang, Y., Solenthaler, B., Pajarola, R.: Adaptive sampling and rendering of fluids on the GPU. Proceedings of the Fifth Eurographics/IEEE VGTC Conference on Point-Based Graphics, 137-146 (2008)
12. Solenthaler, B., Gross, M.: Two-scale particle simulation. ACM Trans. Graph. 30(4), 81:1-81:8 (2011)
13. Ihmsen, M., Akinci, N., Becker, M., Teschner, M.: A Parallel SPH Implementation on Multi-Core CPUs. Computer Graphics Forum 30(1), 99-112 (2011)
14. Becker, M., Teschner, M.: Weakly compressible SPH for free surface flows. Proceedings of the 2007 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 209-217 (2007)
15. Goswami, P., Schlegel, P., Solenthaler, B., Pajarola, R.: Interactive SPH simulation and rendering on the GPU. Proceedings of the 2010 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 55-64 (2010)
16. Goswami, P., Eliasson, A., Franzén, P.: Implicit Incompressible SPH on the GPU. Workshop on Virtual Reality Interaction and Physical Simulation, 23-29 (2015)
17. Solenthaler, B., Pajarola, R.: Predictive-corrective incompressible SPH. ACM Trans. Graph. 28(3), 40:1-40:6 (2009)
18. Ihmsen, M., Akinci, N., Gissler, M., Teschner, M.: Boundary Handling and Adaptive Time-stepping for PCISPH. Virtual Reality Interaction and Physical Simulation, 79-88 (2010)
19. Raveendran, K., Wojtan, C., Turk, G.: Hybrid smoothed particle hydrodynamics. Proceedings of the 2011 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 33-42 (2011)
20. Goswami, P., Pajarola, R.: Time Adaptive Approximate SPH. Proceedings of VRIPHYS, 19-28 (2011)
21. Hernquist, L., Katz, N.: TREESPH: A unification of SPH with the hierarchical tree method. Astrophysical Journal Supplement Series 70, 419-446 (1989)
22. Serna, A., Domínguez-Tenreiro, R., Sáiz, A.: Conservation Laws in Smooth Particle Hydrodynamics: The DEVA Code. The Astrophysical Journal 597(3), 878-892 (2003)
23. Saitoh, T.R., Makino, J.: FAST: A Fully Asynchronous Split Time-Integrator for a Self-Gravitating Fluid. Publications of the Astronomical Society of Japan 62, 301- (2010)
24. Zhang, F., Shen, X., Long, X., Zhao, B., Hu, L.: Real-time Particle Fluid Simulation with WCSPH. Proceedings of the Pacific Graphics Short Papers, 29-34 (2012)
25. Owen, J.M., Villumsen, J.V., Shapiro, P.R., Martel, H.: Adaptive Smoothed Particle Hydrodynamics: Methodology II. The Astrophysical Journal Supplement Series 116(2), 155- (1998)
26. Park, S.W., Linsen, L., Kreylos, O., Owens, J.D., Hamann, B.: Discrete Sibson interpolation. IEEE Transactions on Visualization and Computer Graphics 12(2), 243-253 (2006)
27. Pellacini, F., Vidimče, K., Lefohn, A., Mohr, A., Leone, M., Warren, J.: Lpics: a Hybrid Hardware-Accelerated Relighting Engine for Computer Cinematography. ACM Transactions on Graphics 24(3), 464-470 (2005)
28. Landis, H.: Global Illumination in Production. ACM SIGGRAPH 2002 Course