An efficient parallel immersed boundary algorithm using a pseudo-compressible fluid solver
Jeffrey K. Wiens∗, John M. Stockie

Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC, Canada, V5A 1S6
Abstract
We propose an efficient algorithm for the immersed boundary method on distributed-memory architectures that has the computational complexity of a completely explicit method and also has excellent parallel scaling. The algorithm utilizes the pseudo-compressibility method recently proposed by Guermond and Minev that uses a directional splitting strategy to discretize the incompressible Navier-Stokes equations, thereby reducing the linear systems to a series of one-dimensional tridiagonal systems. We perform numerical simulations of several fluid-structure interaction problems in two and three dimensions and study the accuracy and convergence rates of the proposed algorithm. We also compare the proposed algorithm with other second-order projection-based fluid solvers. Lastly, the execution time and scaling properties of the proposed algorithm are investigated and compared to alternate approaches.
Keywords: immersed boundary method, fluid-structure interaction, fractional step method, pseudo-compressibility method, domain decomposition, parallel algorithm
1. Introduction
The immersed boundary (IB) method is a mathematical framework for studying fluid-structure interaction that was originally developed by Peskin to simulate the flow of blood through a heart valve [49]. The IB method has been used in a wide variety of biofluids applications including blood flow through heart valves [26, 49], aerodynamics of the vocal cords [14], sperm motility [12], insect flight [41], and jellyfish feeding dynamics [32]. The method is also increasingly being applied in non-biological applications [43].
We acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC) through a Postgraduate Scholarship (JKW) and a Discovery Grant (JMS). The numerical simulations in this paper were performed using computing resources provided by WestGrid and Compute Canada.

∗ Corresponding author.
Email addresses: [email protected] (Jeffrey K. Wiens), [email protected] (John M. Stockie)
Preprint submitted to Journal of Computational Physics, May 20, 2014

The immersed boundary equations capture the dynamics of both fluid and immersed elastic structure using a mixture of Eulerian and Lagrangian variables: the fluid is represented using Eulerian coordinates that are fixed in space, and the immersed boundary is described by a set of moving Lagrangian coordinates. An essential component of the model is the Dirac delta function that mediates interactions between fluid and IB quantities in two ways. First of all, the immersed boundary exerts an elastic force (possibly singular) on the fluid through an external forcing term in the Navier-Stokes equations that is calculated using the current IB configuration. Secondly, the immersed boundary is constrained to move at the same velocity as the surrounding fluid, which is just the no-slip condition. The greatest advantage of this approach is that when the governing equations are discretized, no boundary-fitted coordinates are required to handle the solid structure and the influence of the immersed boundary on the fluid is captured solely through an external body force.

When devising a numerical method for solving the IB equations, a common approach is to use a fractional-step scheme in which the fluid is decoupled from the immersed boundary, thereby reducing the overall complexity of the method. Typically, these fractional-step schemes employ some permutation of the following steps:

• Velocity interpolation: the fluid velocity is interpolated onto the immersed boundary.

• IB evolution: the immersed boundary is evolved in time using the interpolated velocity field.

• Force spreading: calculate the force exerted by the immersed boundary and spread it onto the nearby fluid grid points, with the resulting force appearing as an external forcing term in the Navier-Stokes equations.
• Fluid solve: evolve the fluid variables in time using the external force calculated in the force spreading step.

Algorithms that fall into this category include Peskin's original method [49] as well as algorithms developed by Lai and Peskin [36], Griffith and Peskin [27], and many others. A popular recent implementation of fractional-step type is the IBAMR code [35] that supports distributed-memory parallelism and adaptive mesh refinement. This project grew out of Griffith's doctoral thesis [21] and was outlined in the papers [24, 27]. In the original IBAMR algorithm, the incompressible Navier-Stokes equations are solved using a second-order accurate projection scheme in which the viscous term is handled with an L-stable discretization [40, 61] while an explicit second-order Godunov scheme [11, 42] is applied to the nonlinear advection terms. The IB evolution equation is then integrated in time using a strong stability-preserving Runge-Kutta method [20]. Since IBAMR's release, drastic improvements have been made that increase both the accuracy and generality of the software [23, 26].

Fractional-step schemes often suffer from a severe time step restriction due to numerical stiffness that arises from an explicit treatment of the immersed boundary in the most commonly used splitting approaches [58]. Because of this limitation, many researchers have proposed new algorithms that couple the fluid and immersed boundary together in an implicit fashion, for example [8, 34, 37, 44, 47]. These methods alleviate the severe time step restriction, but do so at the expense of solving large nonlinear systems of algebraic equations in each time step.
Although these implicit schemes have been shown in some cases to be competitive with their explicit counterparts [48], there is not yet sufficient evidence to prefer one approach over the other, especially when considering parallel implementations.

Projection methods are a common class of fractional-step schemes for solving the incompressible Navier-Stokes equations, and are divided into two steps. First, the discretized momentum equations are integrated in time to obtain an intermediate velocity field that in general is not divergence-free. In the second step, the intermediate velocity is projected onto the space of divergence-free fields using the Hodge decomposition. The projection step typically requires the solution of large linear systems in each time step that are computationally costly and form a significant bottleneck in CFD codes. This cost is increased even more when a small time step is required for explicit implementations. Note that even though some researchers make use of unsplit discretizations of the Navier-Stokes equations [23, 48], there is significant benefit to be had by using a split-step projection method as a preconditioner [22]. Therefore, any improvements made to a multi-step fluid solver can reasonably be incorporated into unsplit schemes as well.

In this paper, we develop a fractional-step IB method that has the computational complexity of a completely explicit method and exhibits excellent parallel scaling on distributed-memory architectures. This is achieved by abandoning the projection method paradigm and instead adopting the pseudo-compressible fluid solver developed by Guermond and Minev [28, 29]. Pseudo-compressibility methods relax the incompressibility constraint by perturbing it in an appropriate manner, such as in Temam's penalty method [59], the artificial compressibility method [9], and Chorin's projection method [10, 53].
Guermond and Minev's algorithm differentiates itself by employing a directional-splitting strategy, thereby permitting the linear systems of size N^d × N^d typically arising in projection methods (where d = 2 or 3 is the problem dimension) to be replaced with a set of one-dimensional tridiagonal systems of size N × N. These tridiagonal systems can be solved efficiently on distributed-memory computing architectures by combining Thomas's algorithm with a Schur-complement technique. This allows the proposed IB algorithm to efficiently utilize parallel resources [18]. The only serious limitation of the IB algorithm is that it is restricted to simple geometries and boundary conditions due to the directional-splitting strategy adopted by Guermond and Minev. However, since IB practitioners often use a rectangular fluid domain with periodic boundary conditions, this is not a serious limitation. Instead, the IB method provides a natural setting to leverage the strengths of Guermond and Minev's algorithm, allowing complex geometries to be incorporated into the domain through an immersed boundary. This is a simple alternative to the fictitious domain procedure proposed by Angot et al. [1].

In section 2, we begin by stating the governing equations for the immersed boundary method. We continue by describing our proposed numerical scheme in section 3, where we incorporate the higher-order rotational form of Guermond and Minev's algorithm that discretizes an O(∆t²) perturbation of the Navier-Stokes equations to yield a formally O(∆t^{3/2}) accurate method. As a result, the proposed method has convergence properties similar to a fully second-order projection method, while maintaining the computational complexity of a completely explicit method. In section 4, we discuss implementation details and highlight the novel aspects of our algorithm.
Finally, in sections 5 and 6, we demonstrate the accuracy, efficiency, and parallel performance of our method by means of several test problems in 2D and 3D.
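To fix ideas, the four fractional steps listed above (velocity interpolation, IB evolution, force spreading, fluid solve) can be strung together in a toy one-dimensional analogue: a periodic velocity field, a single Lagrangian point attached to a fixed tether, and a simple hat kernel in place of a regularized delta function. Everything here (the 1D setting, the kernel, the forward Euler updates, and all names) is an illustrative stand-in, not the scheme developed in this paper:

```python
import numpy as np

N, L = 64, 1.0                     # grid size and periodic domain length
h, dt = L / N, 1e-3
rho, mu, k = 1.0, 0.01, 10.0       # density, viscosity, tether stiffness
x = (np.arange(N) + 0.5) * h       # cell-centre coordinates
u = np.zeros(N)                    # Eulerian velocity
X, X_tether = 0.30, 0.25           # Lagrangian point and its tether

def periodic_dist(a, b):
    """Signed distance on the periodic interval [0, L)."""
    d = a - b
    return (d + 0.5 * L) % L - 0.5 * L

def delta_h(r):
    """Hat kernel with support 2h (stands in for a regularized delta)."""
    a = np.abs(r) / h
    return np.where(a < 1.0, (1.0 - a) / h, 0.0)

for step in range(100):
    # 1. Velocity interpolation onto the immersed boundary point
    U = np.sum(u * delta_h(periodic_dist(x, X))) * h
    # 2. IB evolution (forward Euler for brevity)
    X = (X + dt * U) % L
    # 3. Force spreading: tether force, spread onto nearby grid points
    F = -k * periodic_dist(X, X_tether)
    f = F * delta_h(periodic_dist(x, X))
    # 4. Fluid solve: explicit diffusion plus the external body force
    lap = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / h**2
    u += dt * (mu * lap + f) / rho
```

The tethered point drags the surrounding "fluid" with it, and each pass through the loop touches every one of the four fractional steps exactly once.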
2. Immersed Boundary Equations
In this paper, we consider a d-dimensional Newtonian, incompressible fluid that fills a periodic box Ω = [0, H]^d having side length H and dimension d = 2 or 3. The fluid is specified using Eulerian coordinates, x = (x, y) in 2D or (x, y, z) in 3D. Immersed within the fluid is a neutrally-buoyant, elastic structure Γ ⊂ Ω that we assume is either a single one-dimensional elastic fiber, or else is constructed from a collection of such fibers. In other words, Γ can be a curve, surface or region. The immersed boundary can therefore be described using a fiber-based Lagrangian parameterization, in which the position along any fiber is described by a single parameter s. If there are multiple fibers making up Γ (for example, for a "thick" elastic region in 2D, or a surface in 3D) then a second parameter r is introduced to identify individual fibers. The Lagrangian parameters are assumed to be dimensionless and lie in the interval s, r ∈ [0, 1]. We state the equations for a single fiber in dimension d = 2; the extension to the three-dimensional case or to multiple fibers is straightforward. The fluid velocity u(x, t) = (u(x, t), v(x, t)) and pressure p(x, t) at location x and time t are governed by the incompressible Navier-Stokes equations

ρ (∂u/∂t + u · ∇u) + ∇p = µ∇²u + f,  (1)
∇ · u = 0,  (2)

where ρ is the fluid density and µ is the dynamic viscosity (both constants). The term f appearing on the right hand side of (1) is an elastic force arising from the immersed boundary that is given by

f(x, t) = ∫_Γ F(s, t) δ(x − X(s, t)) ds,  (3)

where x = X(s, t) = (X(s, t), Y(s, t)) represents the IB configuration and F(s, t) is the elastic force density. The delta function δ(x) = δ(x)δ(y) is a Cartesian product of 1D Dirac delta functions, and acts to "spread" the IB force from Γ onto adjacent fluid particles. In general, the force density F is a functional of the current IB configuration

F(s, t) = F[X(s, t)].
(4)

For example, the force density

F[X(s, t)] = σ ∂/∂s [ ∂X/∂s (1 − L / |∂X/∂s|) ]  (5)

corresponds to a single elastic fiber having "spring constant" σ and an equilibrium state in which the elastic strain |∂X/∂s| ≡ L.

The final equation needed to close the system is an evolution equation for the immersed boundary, which comes from the simple requirement that Γ must move at the local fluid velocity:

∂X(s, t)/∂t = u(X(s, t), t) = ∫_Ω u(x, t) δ(x − X(s, t)) dx.  (6)

This last equation is simply the no-slip condition, with the rightmost equality corresponding to the delta function convolution form being more convenient for numerical computations because of its resemblance to the IB forcing term (3). Periodic boundary conditions are imposed on both the fluid and the immersed structure, and appropriate initial values are prescribed for the fluid velocity u(x, 0) and IB position X(s, 0).
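For the special case of zero rest length (L = 0), the force density (5) reduces to σ ∂²X/∂s², and its discrete analogue is a scaled second difference of the fiber positions. The sketch below (with our own function and variable names) evaluates this on a closed fiber discretized at uniformly spaced parameter values:

```python
import numpy as np

def fiber_force(X, sigma, hs):
    """Discrete force density sigma * D-(D+ X) on a closed fiber with
    rest length L = 0: a scaled second difference of the positions."""
    T = (np.roll(X, -1, axis=0) - X) / hs            # forward difference D+ X
    return sigma * (T - np.roll(T, 1, axis=0)) / hs  # backward difference of T

Ns, sigma = 128, 2.0
hs = 1.0 / Ns
s = np.arange(Ns) * hs
X = np.stack([np.cos(2 * np.pi * s), np.sin(2 * np.pi * s)], axis=1)  # unit circle
F = fiber_force(X, sigma, hs)
```

For the unit circle, ∂²X/∂s² = −(2π)²X, so the computed force points radially inward with magnitude close to σ(2π)², consistent with a stretched fiber pulling itself toward its centre.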
3. Algorithm
We now provide a detailed description of our algorithm for solving the immersed boundary problem. The novelty in our approach derives first and foremost from the use of a pseudo-compressibility method for solving the incompressible Navier-Stokes equations, which is new in the IB context and is described in this section. The second novel aspect of our algorithm is in the implementation, which is detailed in section 4.
Pseudo-compressibility methods [53, 56] belong to a general class of numerical schemes for approximating the incompressible Navier-Stokes equations by appropriately relaxing the incompressibility constraint. An O(ε) perturbation of the governing equations is introduced in the following manner:

ρ (∂u_ε/∂t + u_ε · ∇u_ε) + ∇p_ε = µ∇²u_ε + f,  (7)
(ε/ρ) A p_ε + ∇ · u_ε = 0,  (8)

where various choices of the generic operator A lead to a number of familiar numerical schemes. For example, choosing A = I (the identity) corresponds to the penalty method of Temam [59], A = ∂_t yields the artificial compressibility method [9], A = −∇² is equivalent to Chorin's projection scheme [10, 53] (as long as the perturbation parameter is set equal to the time step, ε = ∆t), and A = −∇²∂_t yields Shen's method [55] (when ε = βρ(∆t)² for some positive constant β).

Recently, Guermond and Minev [28, 29] proposed a new pseudo-compressibility method with excellent parallel scaling properties. The first-order version of their method can be cast in the form of an O(ε)-perturbation such as in equations (7)–(8) with ε = ∆t and

A = (1 − ∂_xx)(1 − ∂_yy) in 2D,
A = (1 − ∂_xx)(1 − ∂_yy)(1 − ∂_zz) in 3D.

They also proposed an O(ε²) (second-order in time) variant that corresponds to the three-stage scheme

ρ (∂u_ε/∂t + u_ε · ∇u_ε) + ∇p_ε = µ∇²u_ε + f,  (9)
(ε/ρ) A ψ_ε + ∇ · u_ε = 0,  (10)
ε ∂p_ε/∂t = ψ_ε − χµ ∇ · u_ε,  (11)

where ψ_ε is an intermediate variable and χ ∈ (0,
1] is an adjustable parameter.

For both variants of the method, corresponding to either (9)–(10) or (9)–(11), the momentum equation is discretized in time using a Crank-Nicolson step and the viscous term is directionally-split using the technique proposed by Douglas [13]. The perturbed incompressibility constraint is solved using a straightforward discretization of the direction-split factors in the operator A that reduces to a set of one-dimensional tridiagonal systems. These simple linear systems can be solved very efficiently on a distributed-memory machine by combining Thomas's algorithm with a Schur-complement technique. This is achieved by expressing each tridiagonal system using block matrices and manipulating the original system into a set of block-structured systems and a Schur complement system. By solving these block-structured systems in parallel, the domain decomposition can be effectively parallelized.

It is important to note that Guermond and Minev's fluid solver cannot be recast as a pressure projection algorithm; nevertheless, it has been demonstrated both analytically [31] and computationally [30] to have comparable convergence properties to related projection methods. More precisely, the higher-order algorithm we apply here yields a formally O(∆t^{3/2}) accurate method for 2D flows, although in practice higher convergence rates are observed in both 2D and 3D computations.

The main disadvantage of the algorithm is that it is limited to simple (rectangular) geometries because of the use of directional-splitting. However, this is not a real disadvantage in the immersed boundary context because complex solid boundaries can be introduced by using immersed boundary points (attached to fixed "tether points") that are embedded within a regular computational domain.
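To illustrate the structure being exploited, the following serial sketch solves the perturbed constraint (1 − ∂xx)(1 − ∂yy)ψ = g on a periodic N × N grid as two batches of one-dimensional solves, one per direction. Dense solves stand in for the Thomas/Schur-complement tridiagonal machinery used in the actual parallel implementation, and all names are our own:

```python
import numpy as np

N, h = 32, 1.0 / 32

def one_d_factor(N, h):
    """Matrix of (I - Dxx) with periodic second differences on N points."""
    M = np.eye(N) * (1.0 + 2.0 / h**2)
    for i in range(N):
        M[i, (i - 1) % N] -= 1.0 / h**2
        M[i, (i + 1) % N] -= 1.0 / h**2
    return M

M = one_d_factor(N, h)
rng = np.random.default_rng(0)
g = rng.standard_normal((N, N))        # right-hand side, indexed [i, j] = [x, y]

tmp = np.linalg.solve(M, g)            # x-sweep: (I - Dxx) tmp = g
psi = np.linalg.solve(M, tmp.T).T      # y-sweep: (I - Dyy) psi = tmp
```

Each sweep is a set of N independent one-dimensional solves, which is exactly the granularity at which the directional-splitting strategy exposes parallelism.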
In this way, the IB method provides a simple and efficient alternative to the fictitious domain approach [1] and related methods that could be used to incorporate complex geometries into Guermond and Minev's fluid solver.

When discretizing the governing equations, we require two separate computational grids, one each for the Eulerian and Lagrangian variables. For simplicity, we state our discrete scheme for a two-dimensional fluid (d = 2) and a fiber consisting of a single one-dimensional closed curve. The immersed structure is discretized using N_s uniformly-spaced points s_k = k h_s in the interval [0, 1], where h_s = 1/N_s and k = 0, 1, ..., N_s −
1. As a short-hand, we denote discrete approximations of the IB position at time t_n = n∆t by

X^n_k ≈ (X(k h_s, t_n), Y(k h_s, t_n)),

where n = 0, 1, 2, .... Similarly, the fluid domain Ω = [0, H]² is divided into an N × N, uniform, rectangular mesh in which each cell has side length h = H/N. We employ a marker-and-cell (MAC) discretization [33] as illustrated in Figure 1, in which the pressure p^n_{i,j} ≈ p(x_{i,j}, t_n) is approximated at the cell center points

x_{i,j} = ((i + 1/2)h, (j + 1/2)h), for i, j = 0, 1, ..., N −
1. The velocities on the other hand are approximated at the edges of cells,

u^{E,n}_{i,j} = (u^{E,n}_{i,j}, v^{E,n}_{i,j}),

where u^{E,n}_{i,j} ≈ u(ih, (j + 1/2)h, t_n) and v^{E,n}_{i,j} ≈ v((i + 1/2)h, jh, t_n). The x-component of the fluid velocity is defined on the east and west cell edges, while the y-component is located on the north and south edges.

Next, we introduce the discrete difference operators that are used for approximating spatial derivatives. The second derivatives of a scalar Eulerian variable are replaced using the second-order centered difference stencils

D_xx p_{i,j} = (p_{i+1,j} − 2p_{i,j} + p_{i−1,j}) / h²  and  D_yy p_{i,j} = (p_{i,j+1} − 2p_{i,j} + p_{i,j−1}) / h².

The same operators may be applied to the vector velocity, so that for example

D_xx u^E_{i,j} = (D_xx u^E_{i,j}, D_xx v^E_{i,j}).

Since the fluid pressure and velocity variables are defined at different locations (i.e., cell centers and edges respectively), we also require difference operators whose input and output are at different locations, and for this purpose we indicate explicitly the locations of the input and output using a superscript of the form
Input → Output. For example, an operator with the superscript C → E takes a cell-centered input variable (denoted "C") and returns an output value located on a cell edge (denoted "E").

Figure 1: Location of fluid velocity and pressure variables on the staggered marker-and-cell (MAC) grid.

Using this notation, we may then define the discrete gradient operator G^{C→E} as

G^{C→E} p_{i,j} = ((p_{i,j} − p_{i−1,j}) / h, (p_{i,j} − p_{i,j−1}) / h),

which acts on the cell-centered pressure variable and returns a vector-valued quantity on the edges of a cell. Likewise, the discrete divergence of the edge-valued velocity is

D^{E→C} · u^E_{i,j} = (u_{i+1,j} − u_{i,j}) / h + (v_{i,j+1} − v_{i,j}) / h,

which returns a cell-centered value.

Difference formulas are also required for Lagrangian variables such as X_k, for which we use the first-order one-sided difference approximations:

D⁺_s X_k = (X_{k+1} − X_k) / h_s  and  D⁻_s X_k = (X_k − X_{k−1}) / h_s.

Finally, when discretizing the integrals in (3) and (6), we require a discrete approximation to the Dirac delta function. Here, we make use of the following approximation

δ_h(x) = (1/h²) φ(x/h) φ(y/h),

where

φ(r) = (1/8)(3 − 2|r| + sqrt(1 + 4|r| − 4r²))  if 0 ≤ |r| < 1,
φ(r) = (1/8)(5 − 2|r| − sqrt(−7 + 12|r| − 4r²))  if 1 ≤ |r| < 2,
φ(r) = 0  if 2 ≤ |r|.  (12)

Peskin [50] derives this and other regularized delta function kernels by imposing various desirable smoothness and interpolation properties. We have chosen the form of δ_h in equation (12) because numerical simulations have shown that it offers a good balance between accuracy and cost [6, 27, 57], not to mention that it is currently the approximate delta function that is most commonly applied in other IB simulations.

We are now prepared to describe our algorithm for the IB problem based on the fluid solver of Guermond and Minev [29], which we abbreviate "IB-GM". The fluid is evolved in time in two main stages, both of which reduce to solving one-dimensional tridiagonal linear systems.
In the first stage, the diffusion terms in the momentum equations are integrated in time using the directional-splitting technique proposed by Douglas [13]. The nonlinear advection term on the other hand is dealt with explicitly using the second-order Adams-Bashforth extrapolation

N^{n+1/2} = (3/2) N(u^{E,n}_{i,j}) − (1/2) N(u^{E,n−1}_{i,j}),  (13)

where N(•) is an approximation of the advection term u · ∇u. In this paper, we write the advection term in skew-symmetric form

N(u) ≈ (1/2) u · ∇u + (1/2) ∇ · (uu),  (14)

and then discretize the resulting expression using the second-order centered difference scheme studied by Morinishi et al. [45].

In the second stage, the correction term ψ is calculated using Guermond and Minev's splitting operator [29], and the actual pressure variable is updated using the higher-order variant of their algorithm corresponding to (9)–(11). For all simulations, we use ε = ∆t and a fixed value of the parameter χ.

For the remaining force spreading and velocity interpolation steps, we apply standard techniques. The integrals appearing in equations (3) and (6) are approximated to second order using the trapezoidal quadrature rule and the fiber evolution equation (6) is integrated using the second-order Adams-Bashforth extrapolation.

Assuming that the state variables are known at the (n − 1)st and nth time steps, the IB-GM algorithm proceeds as follows.

Step 1.
Evolve the IB position to time t^{n+1/2} = (n + 1/2)∆t:

1a. Interpolate the fluid velocity onto the immersed boundary points:

U^n_k = Σ_{i,j} u^{E,n}_{i,j} δ_h(x^E_{i,j} − X^n_k) h².

1b. Evolve the immersed boundary to time t^{n+1} using an Adams-Bashforth discretization of (6):

(X^{n+1}_k − X^n_k) / ∆t = (3/2) U^n_k − (1/2) U^{n−1}_k.

1c. Approximate the IB position at time t^{n+1/2} using the arithmetic average:

X^{n+1/2}_k = (1/2)(X^{n+1}_k + X^n_k).

Step 2.
Calculate the fluid forcing term:

2a. Approximate the IB force density at time t^{n+1/2} using (5):

F^{n+1/2}_k = σ D⁻_s [ D⁺_s X^{n+1/2}_k (1 − L / |D⁺_s X^{n+1/2}_k|) ].

2b. Spread the IB force density onto the fluid grid:

f^{E,n+1/2}_{i,j} = Σ_k F^{n+1/2}_k δ_h(x^E_{i,j} − X^{n+1/2}_k) h_s.

Step 3.
Solve the incompressible Navier-Stokes equations:

3a. Predict the fluid pressure at time t^{n+1/2}:

p^{∗,n+1/2}_{i,j} = p^{n−1/2}_{i,j} + ψ^{n−1/2}_{i,j}.

3b. Calculate an intermediate velocity u^{E,∗} by integrating the momentum equations explicitly:

ρ ( (u^{E,∗}_{i,j} − u^{E,n}_{i,j}) / ∆t + N^{n+1/2} ) = µ (D_xx + D_yy) u^{E,n}_{i,j} − G^{C→E} p^{∗,n+1/2}_{i,j} + f^{E,n+1/2}_{i,j}.

3c. Calculate a second intermediate velocity u^{E,∗∗} by solving the tridiagonal systems corresponding to the x-derivative in the directional-split Laplacian:

ρ (u^{E,∗∗}_{i,j} − u^{E,∗}_{i,j}) / ∆t = (µ/2) D_xx (u^{E,∗∗}_{i,j} − u^{E,n}_{i,j}).

3d. Compute the velocity at time t^{n+1} by solving the following tridiagonal systems corresponding to the y-derivative piece of the directional-split Laplacian for u^{E,n+1}_{i,j}:

ρ (u^{E,n+1}_{i,j} − u^{E,∗∗}_{i,j}) / ∆t = (µ/2) D_yy (u^{E,n+1}_{i,j} − u^{E,n}_{i,j}).

3e. Compute the pressure correction ψ^{n+1/2}_{i,j} by solving

(1 − D_xx)(1 − D_yy) ψ^{n+1/2}_{i,j} = −(ρ/∆t) D^{E→C} · u^{E,n+1}_{i,j}.

3f. Update the pressure at time t^{n+1/2} using

p^{n+1/2}_{i,j} = p^{n−1/2}_{i,j} + ψ^{n+1/2}_{i,j} − χµ D^{E→C} · ( (1/2)(u^{E,n+1}_{i,j} + u^{E,n}_{i,j}) ).

Note that in the first step of the algorithm with n = 0, we do not yet have an approximation of the solution at the previous time step, and therefore we make the following replacements:

• In Step 1b, approximate the fiber evolution equation using a first-order forward Euler approximation X^1_k = X^0_k + ∆t U^0_k.

• In Step 3a, set p^{∗,1/2}_{i,j} = 0.

• In Step 3b, the nonlinear term from equation (13) is replaced with N^{1/2} = N(u^{E,0}_{i,j}).
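As a sketch of the interpolation in Step 1a, the following evaluates the kernel φ from (12) and interpolates a cell-centred field onto one IB point (the helper names are ours). Because φ satisfies the partition-of-unity property Σ_j φ(r − j) = 1, interpolating a constant field reproduces the constant to rounding error:

```python
import numpy as np

def phi(r):
    """The 4-point kernel of equation (12)."""
    r = np.abs(np.asarray(r, dtype=float))
    out = np.zeros_like(r)
    m1 = r < 1.0
    m2 = (r >= 1.0) & (r < 2.0)
    out[m1] = (3 - 2 * r[m1] + np.sqrt(1 + 4 * r[m1] - 4 * r[m1]**2)) / 8
    out[m2] = (5 - 2 * r[m2] - np.sqrt(-7 + 12 * r[m2] - 4 * r[m2]**2)) / 8
    return out

def interpolate(u, xg, yg, X, h):
    """U = sum_ij u_ij * delta_h(x_ij - X) * h^2 for a single point X."""
    w = phi((xg - X[0]) / h) * phi((yg - X[1]) / h) / h**2
    return np.sum(u * w) * h**2

N = 32
h = 1.0 / N
xc = (np.arange(N) + 0.5) * h
xg, yg = np.meshgrid(xc, xc, indexing="ij")
u = np.full((N, N), 3.7)                                  # a constant field
U = interpolate(u, xg, yg, np.array([0.413, 0.287]), h)   # recovers 3.7
```

The force spreading of Step 2b uses the same kernel weights, so interpolation and spreading are adjoint to one another, a property that is important for conserving energy during the fluid-structure interaction.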
4. Parallel Implementation
Here we outline the details of the algorithm that relate specifically to the parallel implementation. Since a primary feature of our algorithm is its parallel scaling properties, it is important to discuss our implementation in order to understand the parallel characteristics of the method.
Suppose that the algorithm in section 3.4 is implemented on a distributed-memory computing machine with P = P_x · P_y processing nodes. The parallelization is performed by subdividing the rectangular domain Ω into equally-sized rectangular partitions {Ω_{ℓ,m}}, with ℓ = 1, 2, ..., P_x and m = 1, 2, ..., P_y, where P_x and P_y refer to the number of subdivisions in the x- and y-directions respectively. Each node is allocated a single domain partition Ω_{ℓ,m}, along with the values of the Eulerian and Lagrangian variables contained within it. For example, the (ℓ, m) node would contain in its memory the fluid variables u^E_{i,j} and p_{i,j} for all x_{i,j} ∈ Ω_{ℓ,m}, along with all immersed boundary data X_k and F_k such that X_k ∈ Ω_{ℓ,m}. This partitioning is illustrated for a simple 3 × 3 example in Figure 2(a).

Figure 2: (a) Parallel domain decomposition with P_x = P_y = 3. (b) Communication required to update ghost cell regions for a subdomain Ω_{ℓ,m}.

Our implementation avoids the need for general-purpose parallel linear solver libraries such as Hypre [15]. In particular, we use a fractional-step scheme that permits the immersed boundary and fluid to be treated independently. The IB component of the algorithm is discretized in the same manner as Griffith [23], who used an Adams-Bashforth discretization to reduce the number of floating-point operations. Since this is an explicit discretization, the discrete immersed boundary can be viewed as a simple particle system (a collection of IB material points connected by force-generating connections), which is a well-established class of problems in the parallel computing community [2, 3].
Therefore, the major differences in parallel implementation come from the discrete delta function, whose support allows particles to interact over multiple subdomains in the velocity interpolation and force spreading steps.

For the fluid portion of the algorithm, we use the GM fluid solver as described in [29] with the following minor modifications:

• the advection term is discretized in skew-symmetric form;

• periodic boundary conditions are imposed on the fluid domain;

• the directional-splitting order is rotated in each time step to reduce the possibility of a directional bias; and

• minor alterations are required to the parallel tridiagonal solver.

By using the directional split strategy of Guermond and Minev, the discretized fluid equations reduce to a sequence of one-dimensional problems, which is where parallelism is introduced directly into the discretization. The most significant departure from other common IB schemes is that the GM solver is a pseudo-compressibility method that only approximately satisfies the incompressibility constraint. It is yet to be seen how such a fluid solver will handle the near-singular body forces that occur naturally in IB problems. Therefore, a comprehensive numerical study is required to test the accuracy, convergence, and volume conservation of the method.

Next, we describe our approach for implementing data partitioning and communication, which makes use of infrastructure provided by MPI [17] and PETSc [5]. Since the fluid and immersed boundary are discretized on two separate grids, the data partitioning between nodes must be handled differently in each case. The partitioning of Eulerian variables is much simpler because the spatial locations are fixed in time and remain associated with the same node for the entire computation. In contrast, Lagrangian variables are free to move throughout the fluid domain and so a given IB point may move between two adjacent subdomains during the course of a single time step.
As a result, the data structure and communication patterns for the Lagrangian variables are more complex.

Consider the communication required for the update of fluid variables in each time step, for which the algorithm in section 3.4 requires the explicit computation of several discrete difference operators. For points located inside a subdomain Ω_{ℓ,m}, the discrete operators are easily computed; however, for points on the edge of a domain partition, a difference operator may require data that is not contained in the current node's local memory. For example, when calculating the discrete Laplacian (using the 5-point stencil), data at points adjacent to the given state variable are required. As a result, when an adjacent variable does not reside in Ω_{ℓ,m}, communication is required with a neighbouring node to obtain the required value. This communication is aggregated together using ghost cells that lie inside a strip surrounding each Ω_{ℓ,m}, as illustrated in Figure 2(b). The width of the ghost region is set equal to the support of the discrete delta function used in the velocity interpolation and force spreading steps; that is, two grid points in the case of the delta-approximation (12). When a difference operator is applied to a state variable stored in the (ℓ, m) node, the neighbouring nodes communicate the data contained in the ghost cells adjacent to Ω_{ℓ,m}. After the ghost region is filled, the discrete difference operators may then be calculated for all points in Ω_{ℓ,m}. When combined with the parallel linear solver discussed later in section 4.2, this parallel communication technique permits the fluid variables to be evolved in time.

As the IB points move through the fluid, the number of IB points residing in any particular subdomain can vary from one time step to the next. Therefore, the memory required to store the local IB data structure changes with time, as does the communication load.
These complications are dealt with by splitting the data structure defining the immersed boundary into two separate components corresponding to IB points and force connections. The IB point (IB) data structure contains the position and velocity of all IB points resident in a given subdomain, whereas the force connection (FC) data structure keeps track of all force-generating connections between these points. The force density calculations depend on spatial information and so the IB data structure requires a globally unique index (which we call the "primary key") that is referenced by the FC data structure (the "foreign key"). This relationship is illustrated in Figure 3, where the force connections shown are consistent with the elastic force function (5). If the IB data structure is represented as an associative array using PointID as the key (and referenced as
IB[PointID]) and FC[i] represents a specific element of the force connection array, then the force density calculation may be written as
FC[i].Fdens = (FC[i].sigma / h_s²) ( IB[FC[i].LPointID].X − 2 IB[FC[i].PointID].X + IB[FC[i].RPointID].X ),

where we have assumed here that the force parameter L = 0. The IB and FC data structures are stored in a hash table and vector (respectively) using the standard STL containers in C++.

Figure 3: (a) Relationship between the data structures for the IB points (IB) and elastic force connections (FC). (b) References from a chosen force connection to the corresponding IB points.

We are now prepared to summarize the complete parallel procedure that is used to evolve the fluid and immersed boundary. Keep in mind that at the beginning of each time step, a processing node contains only those IB points and force connections that reside in the corresponding subdomain. The individual solution steps are:

• Velocity interpolation:
Interpolate the fluid velocity onto the IB points and store the result in IB[•].U. This step requires fluid velocity data from the ghost region.

• Immersed boundary evolution: Evolve the IB points in time by updating IB[•].X = X^{n+1}_•. Note that the IB point position at the half time step (IB[•].Xh = X^{n+1/2}_•) must also be stored for the force spreading step.

• Immersed boundary communication: Send the data from IB points lying within the ghost region to the neighbouring processing nodes. Figure 4 illustrates how the IB points residing in the ghost region corresponding to Ω_{ℓ,m} are copied from Ω_{ℓ+1,m} (for both the full time step n+1 and the half-step n+1/2). The IB points with IB[•].PointID = k, k+1, k+2 and two force connections (with FC[i].PointID = k, k+1) are communicated to Ω_{ℓ,m}. The additional IB point is required to calculate the force density for FC[i].PointID = k+1. Because the IB point k−1 already resides in Ω_{ℓ,m}, the force density can be computed for FC[i].PointID = k without any additional communication.

• Force spreading: Calculate the force density for all IB points in Ω_{ℓ,m} and the surrounding ghost region at the time step n+1/2. Then spread the force density onto the Eulerian grid points residing in Ω_{ℓ,m}.

• Immersed boundary cleanup: Remove all IB points and corresponding force connections that do not reside in Ω_{ℓ,m} at time step n+1.

• Evolve fluid: Evolve the fluid variables in time using the parallel techniques discussed above. This requires communication with the neighbouring processing nodes to update the ghost cell region, and further communication is needed while solving the linear systems.
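A minimal serial sketch of the two containers and the force-density pass is given below. Field names follow Figure 3 where visible; the remaining names, and the second-difference form σ(X_L + X_R − 2X)/h_s² with rest length L = 0, are our own assumptions based on the force-density formula shown earlier:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <unordered_map>
#include <vector>

using Vec2 = std::array<double, 2>;

// IB point record: keyed globally by PointID (the "primary key").
struct IBPoint {
    Vec2 X{};      // position (at the half time step)
    Vec2 U{};      // interpolated fluid velocity
    Vec2 Fdens{};  // force density, filled in by computeForces()
};

// Force connection: references IB points by PointID (the "foreign key").
struct ForceConn {
    long PointID, LPointID, RPointID;  // centre, left and right neighbours
    double sigma;                      // elastic stiffness
};

// Second-difference elastic force density with rest length L = 0:
//   Fdens = sigma * (X_L + X_R - 2 X) / hs^2.
void computeForces(std::unordered_map<long, IBPoint>& ib,
                   const std::vector<ForceConn>& fc, double hs) {
    for (const ForceConn& c : fc) {
        const Vec2& XL = ib.at(c.LPointID).X;
        const Vec2& XR = ib.at(c.RPointID).X;
        IBPoint& p = ib.at(c.PointID);
        for (int d = 0; d < 2; ++d)
            p.Fdens[d] = c.sigma * (XL[d] + XR[d] - 2.0 * p.X[d]) / (hs * hs);
    }
}
```

As in the text, the hash table makes the lookup by PointID independent of which subdomain originally owned the point, which is what allows points arriving from a neighbour's ghost region to be inserted without renumbering.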
Figure 4: IB points inside the ghost region surrounding Ω_{ℓ,m} are communicated from Ω_{ℓ+1,m}.

Using the approach outlined above, each processing node only needs to communicate with its neighbouring nodes, the only exception being the linear solver, which we address in the next section. Since communication often hinders the performance of a parallel algorithm, this is the property that allows our method to scale so well. For example, if the problem size and number of processing nodes are doubled, then we would ideally want the execution time per time step to remain unchanged.

4.2. Linear Solver
A key remaining component of the algorithm outlined in section 3.4 is the solution of the tridiagonal linear systems arising in the fluid solver. When solving the momentum equations, the following linear systems arise:

(1 − (μΔt/ρ) D_xx) u^{E,**}_{i,j} = u^{E,*}_{i,j} − (μΔt/ρ) D_xx u^{E,n}_{i,j},   (15)

(1 − (μΔt/ρ) D_yy) u^{E,n+1}_{i,j} = u^{E,**}_{i,j} − (μΔt/ρ) D_yy u^{E,n}_{i,j},   (16)

while the pressure update step requires solving

(1 − D_xx)(1 − D_yy) ψ^{n+1/2}_{i,j} = −(ρ/Δt) D_{E→C} · u^{E,n+1}_{i,j}.

This last equation can be split into two steps,

(1 − D_xx) ψ^{*,n+1/2}_{i,j} = −(ρ/Δt) D_{E→C} · u^{E,n+1}_{i,j},   (17)

and

(1 − D_yy) ψ^{n+1/2}_{i,j} = ψ^{*,n+1/2}_{i,j},   (18)

where ψ^*_{i,j} is an intermediate variable. Note that each linear system in (15)–(18) involves a difference operator that acts in one spatial dimension only and decouples into a set of one-dimensional periodic (or cyclic) tridiagonal systems. For example, equations (15) and (17) consist of N tridiagonal systems of size N × N having the general form

A^{(j)} Ψ_{i,j} = b_{i,j},   (19)

for each j = 0, 1, ..., N − 1. Since node (ℓ, m) contains only fluid data residing in subdomain Ω_{ℓ,m}, these tridiagonal linear systems divide naturally between nodes. Each node solves those linear systems for which it has the corresponding data b_{i,j} ∈ Ω_{ℓ,m}, as illustrated in Figure 5. For example, when solving (19) along the x-direction, each processing node solves N/P_y linear systems and the total work is spread over P_x nodes. Similarly, when solving the corresponding systems along the y-direction, each processing node solves N/P_x systems spread over P_y nodes.

Each periodic tridiagonal system is solved directly using a Schur-complement technique [54, sec. 14.2.1]. This is achieved by rewriting the linear equations as a block-structured system where the interfaces between blocks correspond to those for the subdomains.
Figure 5: (a) Coupling direction for linear systems (15) and (18). (b) Coupling direction for linear systems (16) and (17). Each processing node participates in solving N/P_{x,y} tridiagonal systems and requires communication in the direction of the arrows.

To illustrate, let us consider an example with P = 2 processors only, for which the periodic tridiagonal system

\[
\begin{bmatrix}
a_1 & b_1 & & & & & & c_1\\
c_2 & a_2 & b_2 & & & & & \\
& \ddots & \ddots & \ddots & & & & \\
& & c_{M-1} & a_{M-1} & b_{M-1} & & & \\
& & & c_M & a_M & b_M & & \\
& & & & c_{M+1} & a_{M+1} & b_{M+1} & \\
& & & & & \ddots & \ddots & \ddots \\
b_N & & & & & & c_N & a_N
\end{bmatrix}
\begin{bmatrix}
y_1\\ x_2\\ \vdots\\ x_{M-1}\\ y_2\\ x_{M+1}\\ \vdots\\ x_N
\end{bmatrix}
=
\begin{bmatrix}
g_1\\ f_2\\ \vdots\\ f_{M-1}\\ g_2\\ f_{M+1}\\ \vdots\\ f_N
\end{bmatrix}
\]

arises from a single row of unknowns in Figure 5(a) (or a single column in Figure 5(b)). In this example, the unknowns at the subdomain boundary points have been renamed y_1 and y_2, so that the data (y_1, x_2, ..., x_{M-1}, g_1, f_2, ..., f_{M-1}) reside on processor 1 and (y_2, x_{M+1}, ..., x_N, g_2, f_{M+1}, ..., f_N) on processor 2. To isolate the coupling between subdomains, the rows in the matrix are reordered to shift the unknowns at the periodic subdomain boundaries (y_1 and y_2) to the last two rows, and then the columns are reordered to keep the diagonal entries on the main diagonal. This yields the equivalent linear system

\[
\begin{bmatrix}
a_2 & b_2 & & & & & c_2 & \\
& \ddots & \ddots & & & & \vdots & \\
& c_{M-1} & a_{M-1} & & & & & b_{M-1}\\
& & & a_{M+1} & b_{M+1} & & & c_{M+1}\\
& & & & \ddots & \ddots & & \vdots\\
& & & & c_N & a_N & b_N & \\
b_1 & & & & & c_1 & a_1 & \\
& & c_M & b_M & & & & a_M
\end{bmatrix}
\begin{bmatrix}
x_2\\ \vdots\\ x_{M-1}\\ x_{M+1}\\ \vdots\\ x_N\\ y_1\\ y_2
\end{bmatrix}
=
\begin{bmatrix}
f_2\\ \vdots\\ f_{M-1}\\ f_{M+1}\\ \vdots\\ f_N\\ g_1\\ g_2
\end{bmatrix},
\]

which has the block structure

\[
\begin{bmatrix}
B_1 & & E_1\\
& B_2 & E_2\\
F_1 & F_2 & C
\end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ y \end{bmatrix}
=
\begin{bmatrix} f_1\\ f_2\\ g \end{bmatrix}.
\]

In the more general situation with P subdomains, the block structure becomes

\[
\begin{bmatrix}
B_1 & & & E_1\\
& \ddots & & \vdots\\
& & B_P & E_P\\
F_1 & \cdots & F_P & C
\end{bmatrix}
\begin{bmatrix} x_1\\ \vdots\\ x_P\\ y \end{bmatrix}
=
\begin{bmatrix} f_1\\ \vdots\\ f_P\\ g \end{bmatrix},
\]

or more compactly

\[
\begin{bmatrix} B & E\\ F & C \end{bmatrix}
\begin{bmatrix} x\\ y \end{bmatrix}
=
\begin{bmatrix} f\\ g \end{bmatrix},   (20)
\]

where C ∈ R^{P×P}, B ∈ R^{(N−P)×(N−P)}, E ∈ R^{(N−P)×P}, and F ∈ R^{P×(N−P)}. Here, x and f denote the data located in the interior of a subdomain, while y and g denote the data residing on the interface between subdomains.
Next, we use the LU factorization to rewrite the block matrix from (20) as

\[
\begin{bmatrix} B & E\\ F & C \end{bmatrix}
=
\begin{bmatrix} I & 0\\ F B^{-1} & I \end{bmatrix}
\begin{bmatrix} B & E\\ 0 & S \end{bmatrix},
\]

where S = C − F B^{-1} E is the Schur complement. Using this factorized form, we can decompose the block system into the following three smaller problems:

B f* = f,   (21)
S y = g − F f*,   (22)
B x = B f* − E y.   (23)

Based on this decomposition, we can now summarize the solution procedure as follows:

• Local tridiagonal solver:
Each processor solves a local non-periodic tridiagonal system

B_p f*_p = f_p,

which can be solved efficiently using Thomas's algorithm. The matrices B_p are the non-periodic tridiagonal blocks in the block diagonal matrix B.

• Gather data to master node: Each processor sends three scalar values to the master node, corresponding to the first and last entries of the vector f*_p as well as the scalar g_p. Because F_p is sparse, only a few values are required to construct the right-hand side of the Schur complement system.

• Solve Schur complement system: On the master node, solve the reduced P × P Schur complement system (22). Based on the sparsity patterns of F and E, the Schur complement matrix S is periodic and tridiagonal, and therefore can be inverted efficiently using Thomas's algorithm.

• Scatter data from master node: The master node scatters two scalar values from y to each processor. Because of the sparsity of B^{-1}E, only a few values of y are required in the next step. Therefore, the p-th processor only requires the entries of y numbered p and mod(p + 1, P).

• Correct local solution: Each processor corrects its local solution

x_p = f*_p − B^{-1}_p E_p y,

using the local values f*_p computed in the first step.

The tridiagonal systems above can be parallelized very efficiently. As already indicated, this procedure only requires two collective communications (scatter and gather), and because global communication only occurs along one spatial direction, the communication overhead increases only marginally with the number of processors. A further cost savings derives from the fact that the tridiagonal systems do not change from one time step to the next, and so the matrices S and B^{-1}E can be precomputed.

The only potential bottleneck in this procedure is in solving the reduced Schur complement system (22). Since the reduced system is solved only on the master node, clock cycles on the remaining idle nodes are wasted at this time. Furthermore, this wasted time increases as the number of processors increases, since the Schur complement system grows with P. Fortunately, the IB algorithm never solves just a single tridiagonal system. For example, when solving (19) in the x-direction, P = P_x processing nodes work together to solve N/P_y tridiagonal systems, which together require solving N/P_y different Schur complement systems. This workload can be spread out evenly between the P processors, thereby keeping all processors occupied.
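The decomposition (21)–(23) can be exercised on a single periodic tridiagonal system. The serial sketch below is our own toy with two interior blocks, taking the interface unknowns to be the first entry of each subdomain; the production solver instead distributes the blocks across processors and batches many such systems. It follows the steps above: blockwise Thomas solves for B f* = f, formation of the small Schur system, its solution, and the local correction:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Thomas's algorithm for a non-periodic tridiagonal system:
// lo[i]*x[i-1] + di[i]*x[i] + up[i]*x[i+1] = rh[i], with lo[0] = up[n-1] = 0.
std::vector<double> thomas(std::vector<double> lo, std::vector<double> di,
                           std::vector<double> up, std::vector<double> rh) {
    const std::size_t n = di.size();
    for (std::size_t i = 1; i < n; ++i) {
        const double w = lo[i] / di[i - 1];
        di[i] -= w * up[i - 1];
        rh[i] -= w * rh[i - 1];
    }
    std::vector<double> x(n);
    x[n - 1] = rh[n - 1] / di[n - 1];
    for (std::size_t i = n - 1; i-- > 0;)
        x[i] = (rh[i] - up[i] * x[i + 1]) / di[i];
    return x;
}

// Solve the cyclic system a[i]x[i-1] + b[i]x[i] + c[i]x[i+1] = f[i]
// (indices mod N) via a Schur complement over the interface unknowns
// y = (x[0], x[M]), with interior blocks 1..M-1 and M+1..N-1 (N >= 6).
std::vector<double> cyclicSchur(const std::vector<double>& a,
                                const std::vector<double>& b,
                                const std::vector<double>& c,
                                const std::vector<double>& f) {
    const std::size_t N = b.size(), M = N / 2;

    // "Local tridiagonal solver": apply B^{-1} blockwise via Thomas solves.
    auto solveB = [&](const std::vector<double>& r) {
        std::vector<double> out(N, 0.0);
        const std::size_t first[2] = {1, M + 1}, last[2] = {M, N};
        for (int blk = 0; blk < 2; ++blk) {
            const std::size_t n = last[blk] - first[blk];
            std::vector<double> lo(n), di(n), up(n), rh(n);
            for (std::size_t k = 0; k < n; ++k) {
                const std::size_t g = first[blk] + k;
                lo[k] = (k == 0) ? 0.0 : a[g];
                di[k] = b[g];
                up[k] = (k + 1 == n) ? 0.0 : c[g];
                rh[k] = r[g];
            }
            const auto s = thomas(lo, di, up, rh);
            for (std::size_t k = 0; k < n; ++k) out[first[blk] + k] = s[k];
        }
        return out;
    };

    // f* = B^{-1} f, and the two columns of Z = B^{-1} E.
    std::vector<double> fi(f);
    fi[0] = fi[M] = 0.0;
    const auto fs = solveB(fi);
    std::vector<double> e0(N, 0.0), e1(N, 0.0);
    e0[1] = a[1];          e0[N - 1] = c[N - 1];   // column multiplying y0 = x[0]
    e1[M - 1] = c[M - 1];  e1[M + 1] = a[M + 1];   // column multiplying y1 = x[M]
    const auto z0 = solveB(e0), z1 = solveB(e1);

    // F picks out the interior neighbours of an interface row i.
    auto F = [&](std::size_t i, const std::vector<double>& v) {
        return a[i] * v[(i + N - 1) % N] + c[i] * v[(i + 1) % N];
    };

    // Schur system S y = g - F f*, with S = C - F B^{-1} E (here 2 x 2).
    const double S00 = b[0] - F(0, z0), S01 = -F(0, z1);
    const double S10 = -F(M, z0),       S11 = b[M] - F(M, z1);
    const double r0 = f[0] - F(0, fs),  r1 = f[M] - F(M, fs);
    const double det = S00 * S11 - S01 * S10;
    const double y0 = (S11 * r0 - S01 * r1) / det;
    const double y1 = (S00 * r1 - S10 * r0) / det;

    // "Correct local solution": x = f* - B^{-1} E y on the interior.
    std::vector<double> x(N);
    for (std::size_t i = 0; i < N; ++i) x[i] = fs[i] - z0[i] * y0 - z1[i] * y1;
    x[0] = y0;
    x[M] = y1;
    return x;
}
```

Here the Schur system is only 2 × 2 and is solved directly; in the parallel setting, S is the P × P periodic tridiagonal matrix described above, and the f* and correction steps run concurrently on the separate nodes.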
5. Numerical Results
To test the accuracy of our algorithm, we consider two model problems. The first is an idealized one-dimensional elliptical membrane with zero thickness that is immersed in a 2D fluid and undergoes a damped periodic motion. Here, the immersed boundary exerts a singular force on the fluid, resulting in a pressure jump across the membrane that reduces the method's order of accuracy. The second model problem is a generalization of the first in which we consider a thick immersed boundary, made up of multiple fiber layers, in which the elastic stiffness is reduced smoothly to a value of zero at the edges. By providing the immersed boundary in this example with a physical thickness, the external force is no longer singular, which then leads to higher-order convergence rates.
For our first 2D model problem, the initial configuration is an elliptical membrane with semi-axes r_1 and r_2, parameterized by

X(s, 0) = ( 1/2 + r_1 cos(2πs), 1/2 + r_2 sin(2πs) ),

with s ∈ [0, 1), and the fluid initially at rest, u(x, 0) = 0. We see from the solution snapshots in Figure 6 that the elastic membrane undergoes a damped periodic motion, oscillating back and forth between elliptical shapes having a semi-major axis aligned with the x- and y-directions. The amplitude of the oscillations decreases over time, and the membrane tends ultimately toward a circular equilibrium state with radius approximately equal to √(r_1 r_2) (which has the same area as the initial ellipse).

For this problem, we actually computed results using two immersed boundary algorithms corresponding to different fluid solvers. The first algorithm, denoted GM-IB, is the same one described in section 3.4 that uses Guermond and Minev's fluid solver. The second algorithm, denoted BCM-IB, is identical to the first except that the fluid solver is replaced with the second-order projection method described by Brown, Cortez, and Minion [7]. We take values of the parameters from Griffith [23] (fluid viscosity μ, density ρ = 1, the semi-axes r_1 and r_2, and a number of Lagrangian points N_s proportional to N). We then compare our numerical results for different choices of the membrane elastic stiffness (σ) and spatial discretization (N). Unless stated otherwise, the time step is chosen so that the simulation is stable on the finest spatial grid (with N = 512). This is a conservative choice of time step that attempts to avoid any unreasonable accumulation of errors in time, but it also provides limited information regarding the time step restrictions for the two methods. However, we observe in practice that there is little difference between the time step restrictions for the GM-IB and BCM-IB algorithms, although GM-IB does have a slightly stricter time step restriction than BCM-IB.

Because the fluid contained within the immersed boundary cannot escape, the area of the oscillating ellipse should remain constant in time. However, many other IB computations for this thin ellipse problem exhibit poor volume conservation, which manifests itself as an apparent "leakage" of fluid out of the immersed boundary.
The source of this volume conservation error is numerical error in the discrete divergence-free condition for the interpolated velocity field on the immersed boundary points, which can be non-zero even when the fluid solver guarantees that the velocity is discretely divergence-free on the Eulerian fluid grid [46, 51]. Griffith [23] observed that volume conservation can be improved by using a pressure-increment fluid solver instead of a pressure-free solver, and furthermore that fluid solvers based on a staggered grid tended to perform better than those using a collocated grid. We have employed both of these ideas in our proposed method, and so we expect to see significant improvement in volume conservation relative to other IB methods.

Figure 6: Snapshots of a thin oscillating ellipse using the GM-IB method, with parameters σ = 1 and N = 256 (panels at t = 0.03, 0.10, 0.20 and 0.41).

Figure 7: Results for the thin ellipse problem using the GM-IB method with σ = 1: (a) maximum and mean ellipse radius versus time for N = 64, 128, 256, 512, together with the equilibrium radius; (b) pressure along the x-axis with y = 0.5 at a fixed time, for N = 64, 128, 256, 512.

We begin by plotting the maximum and mean radii of the ellipse versus time in Figure 7(a), from which it is clear that the immersed boundary converges to a circular steady state having radius √(r_1 r_2). The BCM-IB results are indistinguishable from those using GM-IB, and so only the latter are depicted in this figure. The low rate of volume loss observed in both algorithms is consistent with the numerical experiments of Griffith [23]. Owing to the relatively high Reynolds number for this flow, there is a slight error in the oscillation frequency. We suspect that this frequency error could be reduced significantly by employing higher-order approximations in the nonlinear advection term and the IB evolution equation (6), such as has been done by Griffith [24]. Finally, we note that Figure 7(b) shows that the GM-IB algorithm captures the discontinuity in pressure without any visible oscillations.

We next estimate the error and convergence rate for both algorithms. Because the thin ellipse problem is characterized by a singular IB force, there is a discontinuity in velocity derivatives and pressure, and so our numerical scheme is limited to first-order accuracy.
We note that improvements in the convergence rate could be achieved by explicitly incorporating these discontinuities into the difference scheme, for example as is done in the immersed interface method [38, 39].

When reporting the error in a discrete variable q_N that is approximated on a grid at refinement level N, we use the notation

E[q; N] = ‖q_N − q_exact‖.   (24)

Because the exact solution for the thin ellipse problem is not known, we estimate q_exact using the approximate solution on the finest mesh, corresponding to N_f = 512 with the BCM-IB algorithm, and then take q_exact = I_{N_f→N} q_{N_f}, where I_{N_f→N} is an operator that interpolates the finest-mesh solution q_{N_f} onto the current coarse mesh with N points. We use the discrete ℓ² norm to estimate errors, which is calculated for an Eulerian quantity such as the pressure using

‖p_{i,j}‖ = ( h² Σ_{i,j} |p_{i,j}|² )^{1/2},   (25)

and similarly for a Lagrangian quantity such as the IB position using

‖X_k‖ = ( h_s Σ_k |X_k|² )^{1/2},   (26)

where |·| represents the absolute value in the first formula and the Euclidean distance in the second. The convergence rate can then be estimated using solutions q_N, q_{2N} and q_{4N} on successively finer grids as

R[q; N] = log₂( ‖q_N − I_{2N→N} q_{2N}‖ / ‖q_{2N} − I_{4N→2N} q_{4N}‖ ).   (27)

A summary of convergence rates and errors is given in Tables 1 and 2 for both the GM-IB and BCM-IB algorithms, taking different values of the elastic stiffness parameter σ. The error in all cases is measured at a time three-quarters through the ellipse's first oscillation, when the membrane is roughly circular in shape. Table 1 clearly shows that the two algorithms exhibit similar convergence rates for all state variables.
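The rate formula (27) is straightforward to exercise on manufactured data. In the sketch below (our own construction: a 1D grid function polluted with an exactly O(h²) error, restricted between nested grids by injection at coincident points), the estimated rate comes out as 2, as expected:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Discrete l2 norm on a 1D periodic grid with spacing h = 1/N,
// mirroring the Eulerian norm used in the error measure E[q;N].
double l2norm(const std::vector<double>& q) {
    double s = 0.0;
    for (double v : q) s += v * v;
    return std::sqrt(s / q.size());  // h * sum, with h = 1/N
}

// Sample q_N(x_i) = u(x_i) + C*h^2 on the grid x_i = i/N, i.e. a
// manufactured solution with a pure O(h^2) error.
std::vector<double> sampleQ(int N, double C) {
    const double PI = std::acos(-1.0), h = 1.0 / N;
    std::vector<double> q(N);
    for (int i = 0; i < N; ++i)
        q[i] = std::sin(2 * PI * i * h) + C * h * h;
    return q;
}

// Richardson-style rate estimate, as in (27):
//   R = log2( ||q_N - q_2N|| / ||q_2N - q_4N|| ),
// restricting each finer solution to the coarser grid by injection.
double rate(int N, double C) {
    auto qN = sampleQ(N, C), q2N = sampleQ(2 * N, C), q4N = sampleQ(4 * N, C);
    std::vector<double> d1(N), d2(2 * N);
    for (int i = 0; i < N; ++i) d1[i] = qN[i] - q2N[2 * i];
    for (int i = 0; i < 2 * N; ++i) d2[i] = q2N[i] - q4N[2 * i];
    return std::log2(l2norm(d1) / l2norm(d2));
}
```

The paper instead restricts the finest mesh by interpolation, but the principle is identical: successive differences between nested resolutions stand in for the unknown exact solution.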
First-order convergence is seen in both the fluid velocity and membrane position, while the pressure shows the expected reduction in accuracy to O(h^{1/2}) owing to the pressure discontinuity. The errors in Table 2 show that GM-IB and BCM-IB are virtually indistinguishable from each other, except for the error in the divergence-free condition, E[∇·u; N], where the BCM-IB algorithm appears to enforce the incompressibility constraint better than GM-IB. Because Guermond and Minev's fluid solver does not project the velocity field onto the space of divergence-free velocity fields (even approximately), it is not surprising that BCM-IB performs better in this regard. We remark that the magnitude of the fluid variables increases with the stiffness σ, so that the error increases as well (since E is defined as an absolute error measure); however, the relative error and convergence rates remain comparable as σ varies over several orders of magnitude.

Lastly, we examine the issue of volume conservation by considering the volume (area) of the membrane for the GM-IB and BCM-IB methods as the solution approaches steady state. Ideally, the membrane area should remain constant in time with a value of π r_1 r_2, because the fluid contained inside the immersed boundary cannot escape. For a membrane with an elastic stiffness of σ = 1, the volume conservation is illustrated in Table 3. For both numerical schemes, the loss of enclosed volume is less than one percent by the time the solution attains a quasi-steady state near t = 4. The same is true of the corresponding simulations using σ = 10 and σ = 0.1, where the quasi-steady state is reached at t = 2 and t = 4, respectively.

Table 1: Estimated ℓ² convergence rates R[u; N], R[p; N] and R[X; N] for the thin ellipse problem with three different parameter sets: (σ = 0.1, t = 1.06), (σ = 1, t = 0.31) and (σ = 10), comparing the GM-IB and BCM-IB algorithms.

Table 2: Estimated ℓ² errors E[u; N], E[p; N], E[X; N] and E[∇·u; N] for the thin ellipse problem with the same three parameter sets.

As reflected in the position errors E[X; N], the difference in volume conservation has negligible impact on the solution's accuracy. It is only when approaching the stability boundaries (in terms of the allowable time step) that the difference in volume conservation becomes noticeable. Lastly, when reducing the time step, the volume conservation in the GM-IB algorithm improves noticeably, which is not surprising since the GM fluid solver introduces an O(Δt) perturbation to the incompressibility constraint (8). Furthermore, from Table 4, we see that the leakage rate of the membrane is not affected by the time step and is nearly identical to that of BCM-IB.

Table 3: The loss of enclosed volume (relative error) of the immersed boundary at time t = 4 when σ = 1, for several grid resolutions N and time steps Δt.

Table 4: The temporal leakage rate of the membrane (relative error) over a time interval beginning at t = 2 with σ = 1, obtained using a linear least-squares fit.

Our second test problem involves the thick elastic shell pictured in Figure 8, which has been studied before by Griffith and Peskin [27]. This is a natural generalization of the thin ellipse problem, wherein the shell is treated using a nested sequence of elliptical immersed fibers. The purpose of this example is not only to illustrate the application of our algorithm to more general solid elastic structures, but also to demonstrate the genuine second-order accuracy of our numerical method for problems that are sufficiently smooth. To this end, we take an elliptical elastic shell with thickness γ, using two independent Lagrangian parameters s, r ∈ [0, 1], and specify the initial configuration by

X(s, r, 0) = ( 1/2 + (r_1 + γ(r − 1/2)) cos(2πs), 1/2 + (r_2 + γ(r − 1/2)) sin(2πs) ).

The shell is composed of circumferential fibers having an elastic stiffness that varies in the radial direction according to

σ(r) = 1 − cos(2πr).

Because the stiffness vanishes smoothly at the edges of the shell, the resulting fluid force f is a continuous function of x; this should be contrasted with the "thin ellipse" example, in which the fluid force is singular since it consists of a 1D delta distribution in the tangential direction along the membrane. As a result, we expect in this example to observe higher-order convergence, because the solution does not contain the discontinuities in pressure and velocity derivatives that were present in the thin ellipse problem. Unless otherwise indicated, we take parameter values (density ρ = 1, semi-axes r_1 and r_2, thickness γ, and Lagrangian resolutions N_s and N_r proportional to N) that are consistent with the computations in [27].

The dynamics of the thick ellipse problem illustrated in Figure 8 are qualitatively similar to those in the previous section, in that the elastic shell undergoes a damped oscillation. In Table 5, we present the ℓ² convergence rates in the solution for different values of the fluid viscosity μ. We also include the corresponding results computed by Griffith and Peskin [27] and observe that the GM-IB, BCM-IB, and Griffith-Peskin algorithms all exhibit remarkably similar convergence rates. The ℓ² errors for the GM-IB and BCM-IB methods are almost identical to Griffith and Peskin's, and so we have not reported them for this example. It is only when the viscosity is taken very small that any appreciable differences arise.

Table 5: Estimated ℓ² convergence rates R[q; 128] for the thick ellipse problem at time t = 0.4, for three values of the fluid viscosity μ. For comparison, Griffith's results [27] are reported in the final row. Since Griffith reports the component-wise convergence rate of the velocity field, we approximate R[u; 128] ≈ max(R[u; 128], R[v; 128]).
Lastly, we investigate the accuracy with which our discrete solution satisfies the discrete divergence-free condition for a variety of time steps and spatial discretizations. Our aim in this instance is to determine how well the fluid solver of Guermond and Minev approximates the incompressibility constraint, which is related to the volume conservation issue discussed in the thin ellipse example. Table 6 lists values of the error in the discrete divergence of velocity, E[∇·u; N], measured at a fixed time.

Figure 8: Snapshots of a thick oscillating ellipse using the GM-IB method, with N = 256 (panels at t = 0.01, 0.08, 0.20 and 0.32).

The error E[∇·u; N] increases slightly as the spatial discretization is refined, but decreases when a smaller time step is used. This last result is to be expected because Guermond and Minev employ an O(Δt) perturbation of the incompressibility constraint (8).

Table 6: Error in the divergence-free condition E[∇·u; N] for the thick ellipse problem using the GM-IB method, for N = 64 to 512 and a range of time steps Δt.
6. Parallel Performance Results
We now focus on comparing the parallel performance of our algorithm (GM-IB) with an analogous projection-based scheme (BCM-IB). We begin by comparing the performance difference between solving the pure Poisson problem that plays a central role in projection schemes, versus Guermond and Minev's directional-split counterpart. This captures the major differences between the fluid solvers used in the corresponding IB algorithms. We then follow by performing weak and strong scalability tests for the full immersed boundary problem.

In this section, we compare the performance of several Poisson solvers with our tridiagonal solver described in section 4.2. Since any standard projection scheme requires solving a Poisson problem, this performance study encapsulates the major differences between Guermond and Minev's fluid solver and other projection-based approaches (for example, that of Brown-Cortez-Minion). To illustrate the comparison, we consider the problem

A ψ = f(x) in Ω = [0, 1]^d,   ψ periodic on ∂Ω,   (28)

where

f(x) = sin(2πx) cos(2πy) when d = 2,   f(x) = sin(2πx) cos(2πy) cos(2πz) when d = 3,

and A is either the Laplacian operator (A = ∇²) or the directional-split operator A = (1 − ∂_xx)(1 − ∂_yy) (when d = 2).

When solving the Poisson problem, we compare with two other solvers: one based on FFTs and the other on multigrid. For both of these solvers we discretize the problem using a second-order finite difference scheme. In the FFT-based solver, the difference scheme is rewritten in terms of the Fourier coefficients and solved using the real-to-complex and complex-to-real transformations found in FFTW [16]. For the multigrid solver, we use a highly scalable multigrid preconditioner (PFMG) implemented in Hypre [15], used with a conjugate gradient solver. When performing the comparison for the directional-split problem, the discrete system decouples into a set of one-dimensional tridiagonal systems that we solve using the techniques described in section 4.2. The major differences here occur in terms of the domain partitioning, which for the FFT-based solver involves a slab decomposition, whereas the multigrid and directional-split solvers use square-like subdomains.

Throughout our performance study, times are collected using MPI and the best result of multiple runs is reported. All simulations are performed using the Bugaboo cluster managed by WestGrid [63], a member of the high-performance computing consortium Compute Canada.
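The reason the directional-split operator avoids a genuinely multidimensional solve is that each one-dimensional factor acts independently along grid lines. The following check (the grid size and test function are our own choices) applies the discrete factor (1 − D_xx) to the periodic eigenfunction sin(2πx) and recovers the known eigenvalue 1 + 4 sin²(πh)/h², confirming that each row of the directional-split discretization of (28) reduces to a single periodic tridiagonal solve:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Apply the 1D operator (1 - D_xx) on a periodic grid with spacing h,
// where D_xx is the standard second difference. This is the per-row
// (tridiagonal) factor of the directional-split operator.
std::vector<double> applyOneMinusDxx(const std::vector<double>& u, double h) {
    const int n = static_cast<int>(u.size());
    std::vector<double> v(n);
    for (int i = 0; i < n; ++i) {
        const double uxx =
            (u[(i + 1) % n] - 2.0 * u[i] + u[(i + n - 1) % n]) / (h * h);
        v[i] = u[i] - uxx;  // (1 - D_xx) u at grid point i
    }
    return v;
}
```

Since sin(2πx) is an eigenvector of the discrete second difference, the operator acts as multiplication by a positive scalar on each Fourier mode, which is also why (1 − D_xx) is always invertible, unlike the singular periodic Laplacian.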
This cluster consists of 12-core blades, each containing two Intel Xeon X5650 6-core processors (2.66 GHz), connected by Infiniband using a 288-port QLogic switch.

First, we evaluate the strong scaling property of each solver by running a sequence of simulations in which the problem size is held fixed as the number of processors increases. For the three-dimensional computations, the problems are solved on N = 128 and N = 256 grids, which are common resolutions used in 3D IB calculations. For the two-dimensional computations, the problems are solved on grids that are larger than usual (N = 2048 and 4096). The strong scaling results are given in Tables 7 and 8, which report the execution time T_P and parallel efficiency

E_P = T_1 / (P T_P),

for P processors. The parallel efficiency quantifies how well the processors are utilized throughout a computation, where a value of E_P = 1 corresponds to the ideal case and smaller values indicate a reduced parallel efficiency. Note that a reduction in efficiency is expected, since the serial computation involves no Schur complement systems but instead computes the tridiagonal systems directly.

In all parallel computations, we observe that the directional-split solver is strongly scalable (E_P > 0.8) and outperforms both Poisson solvers by a significant margin. Indeed, when comparing the directional-split solver to multigrid, there is an order of magnitude difference in execution time. For all multigrid computations, the conjugate gradient solver required 6 iterations, which makes the directional-split solver a factor of 2 to 5 times faster than a single multigrid iteration. When comparing the directional-split solver to the FFT-based solver, the difference in execution times is much smaller, particularly when using fewer processors. However, as the processor count increases, we still see a two-fold or greater performance improvement when using the directional-split solver.

Besides the performance improvements, the Guermond and Minev fluid solver has a few additional advantages over FFT-based fluid solvers. First of all, FFT-based solvers are restricted to periodic boundaries, while the directional-split solver has no such restriction. For example, the Guermond-Minev fluid solver can inexpensively compute driven-cavity and (periodic) channel flows without the use of immersed boundaries [30]. Secondly, the slab decomposition used by many FFT libraries (such as FFTW) can lead to serious load-balancing issues in the immersed boundary context, as indicated by Yau [64]. Note that this could be mitigated somewhat in 3D simulations by moving to a pencil decomposition, which consequently allows for more processors to be used in the computation [52].
Table 7: Execution time T_P and parallel efficiency E_P for the 2D problem (28) on an N² grid with P processors, comparing the multigrid, FFT and directional-split solvers for N = 2048 and N = 4096.

Next, we evaluate the weak scaling of each solver, in which the local problem size n^d (where N = n · P_x) is held fixed as the number of processors is increased. In the ideal case, the execution time should stay constant, so that the workload per processor does not change as the number of nodes increases. For 2D problems (d = 2), the local grid resolution on each subdomain is either n = 128 or 256, while for 3D problems (d = 3) we use either n = 32 or 64.

As expected [4, 29], both solvers are weakly scalable, since the execution time stays essentially constant as the problem size and number of processors increase. For the n = 128 simulations, the execution time jumps suddenly at P = 25, which is due to increased communication costs occurring between blades inside the same chassis. Since the blades in a chassis are connected through the same switch, where multiple cores share the same connection, and since the workload per processor is so small, a noticeable jump appears as a result of resource contention.

When comparing the execution time between solvers, the directional-split solver is again faster, by a factor that depends on the problem size N, the number of processors P, and the hardware configuration of the cluster; this is despite the fact that the Poisson problem was solved using two highly optimized libraries.

Table 8: Execution time T_P and parallel efficiency E_P for the 3D problem (28) on an N³ grid with P processors, comparing the multigrid, FFT and directional-split solvers for N = 128 and N = 256.
MG-256 (Time per Iteration)DS-256MG-128 (Time per Iteration)DS-128 (a)
Number of Processors ( P ) . . . . . . . . E x e c u t i o n T i m e ( T P ) Weak Scaling in 3D
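The speed advantage of the directional-split solver comes from reducing each implicit solve to one-dimensional tridiagonal systems. As an illustration (our own sketch, not the authors' implementation), the classic Thomas algorithm solves each such system in O(N) operations:

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with sub-diagonal a, diagonal b,
    super-diagonal c, and right-hand side d, in O(n) operations.
    Entries a[0] and c[-1] are unused."""
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back-substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Example: a 1D Helmholtz-like system (I - dt * d^2/dx^2) u = f of the
# kind that arises in each sweep of a directional-split step
# (grid size, dt, and h here are illustrative values only).
n, dt, h = 8, 0.01, 0.1
r = dt / h**2
a = [-r] * n
b = [1 + 2 * r] * n
c = [-r] * n
f = [1.0] * n
u = thomas_solve(a, b, c, f)
```

One such solve is performed per grid line and per coordinate direction, and lines are independent, which is what makes the method embarrassingly parallel within each sweep.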
Figure 9: Weak scaling of the multigrid and directional-split solvers when approximating problem (28) in 2D and 3D. For the 2D computations, MG-128 and MG-256 denote the execution time of a single multigrid iteration using local n = 128 and 256 grids; likewise, DS-128 and DS-256 denote the execution times for completely solving the directional-split problem. For the 3D computations, we use local n = 32 and n = 64 grids, with solvers labeled using the same naming convention.

The next example is designed to explore in more detail the parallel performance of GM-IB and BCM-IB (with Hypre) by computing a variation of the thin ellipse problem from section 5.1. Because our 2D computations are performed on a doubly-periodic fluid domain, the thin ellipse geometry is actually equivalent to an infinite array of identical elliptical membranes. This periodicity in the solution provides a simple mechanism for increasing the computational complexity of a simulation: we explicitly add multiple periodic copies while technically solving a problem whose solution is identical to that for a single membrane. Each copy of the original domain (see section 5.1) may then be handled by a different processing node, which allows us to explore the parallel performance in an idealized geometry.

Suppose that we would like to perform a parallel simulation using P = P_x · P_y processing nodes. On such a cluster, we can simulate a rectangular P_x × P_y array of identical ellipses, situated on the fluid domain Ω = [0, P_x] × [0, P_y]. We subdivide the domain into equal partitions so that each processor handles the unit-square subdomain Ω_{ℓ,m} = [ℓ − 1, ℓ] × [m − 1, m], for ℓ = 1, 2, ..., P_x and m = 1, 2, ..., P_y. If we denote by (x_{ℓ,m}, y_{ℓ,m}) the centroid of Ω_{ℓ,m}, then each such subdomain contains a single ellipse having the initial configuration

X_{ℓ,m}(s, 0) = ( x_{ℓ,m} + r₁ cos(2πs), y_{ℓ,m} + r₂ sin(2πs) ),

where s ∈ [0, 1] is the same Lagrangian parameter as before and r₁, r₂ are the ellipse semi-axes. In order to make the flow slightly more interesting, and to test the ability of our parallel algorithm to handle immersed boundaries that move between processing nodes, we impose a constant background fluid velocity field u(x, 0) instead of the zero initial velocity used in section 5.1. Snapshots of the solution for a 2 × 2 array are shown in Figure 10. We take ρ = 1, σ = 1, Lagrangian spacing h_s proportional to h, and time step ∆t proportional to h, and we compute up to time t = 1.00 using two values of the fluid mesh width h (corresponding to local n = 128 and n = 256 grids).

In the case of perfect parallel scaling, the execution time should remain constant between simulations because of our problem construction, in which doubling the problem size also doubles the number of nodes P. The problem therefore represents a weak scalability test for our algorithm, in which the workload per processing node remains constant as the number of nodes increases. The execution times for various array sizes (P_x, P_y) are summarized in Figure 11 for both IB solvers. The execution time remains roughly constant in both cases, which indicates that the GM-IB and BCM-IB implementations are essentially weakly scalable. Notice that there is a slight degradation in performance as P increases; for example, on the finer grid the GM-IB execution time increases by roughly 20% between P = 64 and P = 254, which is minimal considering the large variation in problem size. When comparing solvers, GM-IB outperforms BCM-IB by more than a factor of 5 in execution time. This difference in performance is largely due to the efficiency of the linear solvers.

Figure 10: Simulation of a 2 × 2 array of thin ellipses; panels (a)–(d) show the solution at t = 0.01, 0.17, 0.41, and 0.54.
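The array-of-ellipses construction above can be sketched in a few lines of code. The radii and point count used here are illustrative placeholders (the original parameter values were garbled in extraction), not the paper's actual values:

```python
import math

def ellipse_array(Px, Py, r1=0.25, r2=0.125, Ns=64):
    """Generate initial IB point positions for a Px-by-Py array of
    ellipses on the domain [0, Px] x [0, Py], one ellipse per
    unit-square subdomain Omega_{l,m} = [l-1, l] x [m-1, m].
    A sketch of the test-problem setup; r1, r2, Ns are illustrative."""
    points = {}
    for l in range(1, Px + 1):
        for m in range(1, Py + 1):
            xc, yc = l - 0.5, m - 0.5  # centroid of subdomain (l, m)
            points[(l, m)] = [
                (xc + r1 * math.cos(2 * math.pi * k / Ns),
                 yc + r2 * math.sin(2 * math.pi * k / Ns))
                for k in range(Ns)     # Lagrangian parameter s = k/Ns
            ]
    return points

# A 2 x 2 array of ellipses, one per processing node.
pts = ellipse_array(2, 2)
```

Each dictionary key corresponds to one processing node, so in the idealized geometry every node starts with exactly one ellipse in its own subdomain.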
Figure 11: Execution time (in seconds) for the multiple thin ellipse problem using the BCM-IB and GM-IB algorithms (P ellipses, P processors, local grids with n = 128 and 256).

Here, the BCM-IB solver uses conjugate gradient with a multigrid preconditioner (PFMG) implemented within Hypre [15], where the initial guess is set to the solution from the previous time step. For this particular problem, the multigrid solver typically requires 2 iterations to solve the momentum equations and 5 iterations for the projection step. Naturally, if either iteration count could be reduced, the performance of the BCM-IB solver would improve significantly. However, since a single multigrid iteration is already substantially slower than the directional-split solver (see section 6.1), GM-IB would continue to outperform BCM-IB.

For our final test case, we consider a three-dimensional example in which the immersed boundary is a cylindrical elastic shell. The cylinder initially has an elliptical cross-section with semi-axes r₁ and r₂, and is parameterized by

X(s, r, 0) = ( r, 1/2 + r₁ cos(2πs), 1/2 + r₂ sin(2πs) ),

using the two Lagrangian parameters s, r ∈ [0, 1]. The elastic force density is

F[X(s, r, t)] = σ_s ∂²X/∂s² + σ_r ∂/∂r [ ∂X/∂r ( 1 − L / |∂X/∂r| ) ],

which corresponds to an elastic shell made up of an interwoven mesh of one-dimensional elastic fibers. The s parameterization identifies individual fibers running around the elliptical cross-section of the cylinder, each having zero resting length and elastic stiffness σ_s. The r parameterization, on the other hand, describes fibers running axially along the length of the cylinder, each having a non-zero resting length L and stiffness σ_r. Since the domain is periodic in all directions, the ends of the cylinder are connected to their periodic copies so that there are no “cuts” along the fibers. This problem is essentially equivalent to the two-dimensional thin ellipse problem considered in section 5.1, the only difference being that the 2D problem does not have any fibers running along the (non-existent) third dimension. The 2D thin ellipse and 3D cylinder problems are only strictly equivalent when σ_r = 0; however, we take σ_r = σ_s = 1 and L = 1 in order to maintain the integrity of the elastic shell and to avoid any drifting of the elliptical cross-sections in the x-direction. The elastic shell is discretized using equally-spaced values of the Lagrangian parameters s and r. In the simulations that follow, we use ρ = 1, N_s = N, N_r = 3N, and time step ∆t proportional to 1/N, where N = 128 and N = 256.

The solution dynamics are illustrated by the snapshots pictured in Figure 12, and we observe that the 3D elastic shell oscillates at roughly the same frequency as the 2D ellipse shown in Figure 6. Although the geometry of this problem may seem to be somewhat of a special case because of the alignment of the axial fibers along the x-coordinate direction, this feature has no noticeable impact on parallel performance measurements.
Indeed, fiber alignment does not affect communication cost because all IB points and force connections residing in the ghost region are communicated regardless of whether or not they actually cross subdomain boundaries.

In Table 9, we present measurements of execution time and efficiency that illustrate the parallel scaling over the first 100 time steps, with the number of processors P varying between 1 and 128. In all runs, the domain is partitioned evenly between the P processing nodes using rectangular boxes. When the IB points are evenly distributed between all domain partitions, we observe good parallel efficiency: on average, we obtain a speedup factor of 1.85 when doubling the number of processors for this particular problem.

For larger runs, the parallel efficiency does deteriorate as the local subdomains shrink in size. For example, when subdividing an N = 128 grid with a (P_x, P_y, P_z) = (32, 2, 2) partition, each local subdomain contains only 4 × 64 × 64 points, so the entire local domain overlaps with ghost regions; every processor must then communicate all of the data within its subdomain to neighbouring processors. For this reason, the GM-IB algorithm performs remarkably well, given the circumstances.

Lastly, Table 9 also shows the execution time for situations where the immersed boundary is not evenly distributed between processors. Since no load balancing strategy is incorporated in our implementation, the parallel efficiency drops as the computational work becomes more unevenly divided, and the total execution time becomes increasingly dominated by the IB portion of the calculation. The parallel efficiency in this situation could be improved by partitioning the fluid domain in a dynamic manner, such as is done by IBAMR using SAMRAI in [25].
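The efficiency figures quoted above follow from the standard definition E_P = T_1 / (P · T_P); in particular, a sustained speedup of 1.85 per doubling of P corresponds to retaining a factor of 1.85/2 = 0.925 of the efficiency at each doubling. A small helper illustrating this arithmetic (our illustration, not the paper's code):

```python
def efficiency(T1, TP, P):
    """Parallel efficiency E_P = T1 / (P * T_P) for wall times
    T1 (one processor) and TP (P processors)."""
    return T1 / (P * TP)

def per_doubling_factor(speedup=1.85):
    """Fraction of efficiency retained when P doubles, given the
    observed speedup per doubling (1.85 for this problem)."""
    return speedup / 2.0

# After k doublings, efficiency relative to P = 1 is (1.85/2)**k.
# Seven doublings take P from 1 to 128.
eff_at_P128 = per_doubling_factor() ** 7
```

With the observed 1.85x speedup per doubling, the predicted efficiency at P = 128 is about 0.58, which is consistent with the deterioration described above for the largest runs.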
Figure 12: Snapshots (a)–(d) of an oscillating 3D cylindrical shell that is initially stretched outward along the x-direction.

Table 9: Execution time (in seconds) and efficiency for the 3D cylindrical shell problem for a fixed problem size while varying the number of processing nodes P, with partitions (P_x, P_y, P_z) ranging from (1, 1, 1) up to P = 128, for N = 128 and N = 256. [Numeric entries are not recoverable from this extraction.]

7. Conclusions

We have developed a new algorithm for the immersed boundary problem on distributed-memory parallel computers that is based on the pseudo-compressibility method of Guermond and Minev for solving the incompressible Navier-Stokes equations. The fundamental advantage of this fluid solver is the direction-splitting strategy applied to the incompressibility constraint, which reduces the problem to solving a series of tridiagonal linear systems with an extremely efficient parallel implementation.

Numerical computations demonstrate the ability of our method to simulate a wide range of immersed boundary problems, including not only 2D flows containing isolated fibers and thick membranes constructed of multiple nested fibers, but also 3D flows containing immersed elastic surfaces. The strong and weak scalability of our algorithm is demonstrated in tests with up to 256 distributed processors, where excellent speedups are observed. Furthermore, comparisons against FFT-based (FFTW [16]) and multigrid (Hypre [15]) solvers show substantial performance improvements when using the Guermond and Minev solver. Since our implementation does not apply any load balancing strategy, some degradation in parallel efficiency is observed in the immersed boundary portion of the computation when the elastic membrane is not equally divided between processors.

We believe that our computational approach is a promising one for solving fluid-structure interaction problems in which the solid elastic component takes up a large portion of the fluid domain, such as occurs with dense particle suspensions [60] or with very complex elastic structures distributed throughout the fluid. These are problems where local adaptive mesh refinement is less likely to offer any advantage, because of the need to use a nearly-uniform fine mesh over the entire domain in order to resolve the immersed boundary. It is for this class of problems that we expect our approach to offer significant advantages over methods such as that of Griffith et al.
[24].

We plan in future work to implement modifications to our algorithm that will improve the parallel scaling, particularly by improving memory access patterns for the Lagrangian portion of the calculation related to force spreading and velocity interpolation. We will also investigate code optimizations that aim to reduce cache misses and exploit on-chip parallelism. Lastly, work has started on applying this algorithm to study spherical membrane dynamics, particle sedimentation, and fiber suspensions.

References

[1] P. Angot, J. Keating, and P. D. Minev. A direction splitting algorithm for incompressible flow in complex geometries. Computer Methods in Applied Mechanics and Engineering, 217:111–120, 2012.
[2] K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, and S. W. Williams. The landscape of parallel computing research: A view from Berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, 2006.
[3] K. Asanovic, R. Bodik, J. Demmel, T. Keaveny, K. Keutzer, J. Kubiatowicz, N. Morgan, D. Patterson, K. Sen, and J. Wawrzynek. A view of the parallel computing landscape.
Communications of the ACM, 52(10):56–67, 2009.
[4] A. H. Baker, R. D. Falgout, T. V. Kolev, and U. M. Yang. Scaling hypre's multigrid solvers to 100,000 cores. In High-Performance Scientific Computing, pages 261–279. Springer, 2012.
[5] S. Balay, J. Brown, K. Buschelman, V. Eijkhout, W. Gropp, D. Kaushik, M. Knepley, L. C. McInnes, B. Smith, and H. Zhang. PETSc Users Manual, Revision 3.3. Report ANL-95/11, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, June 2012. Retrieved April 28, 2013.
[6] T. T. Bringley and C. S. Peskin. Validation of a simple method for representing spheres and slender bodies in an immersed boundary method for Stokes flow on an unbounded domain. Journal of Computational Physics, 227:5397–5425, 2008.
[7] D. L. Brown, R. Cortez, and M. L. Minion. Accurate projection methods for the incompressible Navier–Stokes equations. Journal of Computational Physics, 168(2):464–499, 2001.
[8] H. D. Ceniceros, J. E. Fisher, and A. M. Roma. Efficient solutions to robust, semi-implicit discretizations of the immersed boundary method. Journal of Computational Physics, 228(19):7137–7158, 2009.
[9] A. J. Chorin. A numerical method for solving incompressible viscous flow problems. Journal of Computational Physics, 2(1):12–26, 1967.
[10] A. J. Chorin. Numerical solution of the Navier-Stokes equations. Mathematics of Computation, 22(104):745–762, 1968.
[11] P. Colella. Multidimensional upwind methods for hyperbolic conservation laws. Journal of Computational Physics, 87(1):171–200, 1990.
[12] R. H. Dillon, L. J. Fauci, C. Omoto, and X. Yang. Fluid dynamic models of flagellar and ciliary beating. Annals of the New York Academy of Sciences, 1101(1):494–505, 2007.
[13] J. Douglas. Alternating direction methods for three space variables. Numerische Mathematik, 4:41–63, 1962.
[14] C. Duncan, G. Zhai, and R. Scherer. Modeling coupled aerodynamics and vocal fold dynamics using immersed boundary methods. Acoustical Society of America Journal, 120(5):2859–2871, 2006.
[15] R. D. Falgout, J. E. Jones, and U. M. Yang. The design and implementation of hypre, a library of parallel high performance preconditioners. In Numerical Solution of Partial Differential Equations on Parallel Computers, pages 267–294. Springer, 2006.
[16] M. Frigo and S. G. Johnson. FFTW: An adaptive software architecture for the FFT. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 1381–1384. IEEE, 1998.
[17] E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra, J. M. Squyres, V. Sahay, P. Kambadur, B. Barrett, A. Lumsdaine, R. H. Castain, D. J. Daniel, R. L. Graham, and T. S. Woodall. Open MPI: Goals, concept, and design of a next generation MPI implementation. In Proceedings, 11th European PVM/MPI Users' Group Meeting, pages 97–104, Budapest, Hungary, September 2004.
[18] M. Ganzha, K. Georgiev, I. Lirkov, S. Margenov, and M. Paprzycki. Highly parallel alternating directions algorithm for time dependent problems. In Third Conference on Application of Mathematics in Technical and Natural Sciences, volume 1404 of AIP Conference Proceedings, pages 210–217, Albena, Bulgaria, June 20–25, 2011.
[19] E. Givelberg and K. Yelick. Distributed immersed boundary simulation in Titanium. SIAM Journal on Scientific Computing, 28(4):1361–1378, 2006.
[20] S. Gottlieb, C. W. Shu, and E. Tadmor. Strong stability-preserving high-order time discretization methods. SIAM Review, 43(1):89–112, 2001.
[21] B. E. Griffith. Simulating the Blood-Muscle-Valve Mechanics of the Heart by an Adaptive and Parallel Version of the Immersed Boundary Method. PhD thesis, New York University, 2005.
[22] B. E. Griffith. An accurate and efficient method for the incompressible Navier-Stokes equations using the projection method as a preconditioner. Journal of Computational Physics, 228(20):7565–7595, 2009.
[23] B. E. Griffith. On the volume conservation of the immersed boundary method. Communications in Computational Physics, 12:401–432, 2012.
[24] B. E. Griffith, R. D. Hornung, D. M. McQueen, and C. S. Peskin. An adaptive, formally second order accurate version of the immersed boundary method. Journal of Computational Physics, 223(1):10–49, 2007.
[25] B. E. Griffith, R. D. Hornung, D. M. McQueen, and C. S. Peskin. Parallel and adaptive simulation of cardiac fluid dynamics. In M. Parashar and X. Li, editors, Advanced Computational Infrastructures for Parallel and Distributed Adaptive Applications, pages 105–130. John Wiley and Sons, 2009.
[26] B. E. Griffith, X. Y. Luo, D. M. McQueen, and C. S. Peskin. Simulating the fluid dynamics of natural and prosthetic heart valves using the immersed boundary method. International Journal of Applied Mechanics, 1(1):137–177, 2009.
[27] B. E. Griffith and C. S. Peskin. On the order of accuracy of the immersed boundary method: Higher order convergence rates for sufficiently smooth problems. Journal of Computational Physics, 208(1):75–105, 2005.
[28] J. L. Guermond and P. D. Minev. A new class of fractional step techniques for the incompressible Navier-Stokes equations using direction splitting. Comptes Rendus Mathematique, 348:581–585, 2010.
[29] J. L. Guermond and P. D. Minev. A new class of massively parallel direction splitting for the incompressible Navier-Stokes equations. Computer Methods in Applied Mechanics and Engineering, 200(23-24):2083–2093, 2011.
[30] J. L. Guermond and P. D. Minev. Start-up flow in a three-dimensional lid-driven cavity by means of a massively parallel direction splitting algorithm. International Journal for Numerical Methods in Fluids, 68(7):856–871, 2011.
[31] J. L. Guermond, P. D. Minev, and A. J. Salgado. Convergence analysis of a class of massively parallel direction splitting algorithms for the Navier-Stokes equations in simple domains. Mathematics of Computation, 81(280):1951, 2012.
[32] C. Hamlet, A. Santhanakrishnan, and L. A. Miller. A numerical study of the effects of bell pulsation dynamics and oral arms on the exchange currents generated by the upside-down jellyfish Cassiopea xamachana. Journal of Experimental Biology, 214:1911–1921, 2011.
[33] F. H. Harlow and J. E. Welch. Numerical calculation of time-dependent viscous incompressible flow of fluid with free surface. Physics of Fluids, 8(12):2182–2189, 1965.
[34] T. Y. Hou and Z. Shi. An efficient semi-implicit immersed boundary method for the Navier-Stokes equations. Journal of Computational Physics, 227(20):8968–8991, 2008.
[35] IBAMR: An adaptive and distributed-memory parallel implementation of the immersed boundary (IB) method. Retrieved April 28, 2013 from http://code.google.com/p/ibamr.
[36] M. C. Lai and C. S. Peskin. An immersed boundary method with formal second-order accuracy and reduced numerical viscosity. Journal of Computational Physics, 160(2):705–719, 2000.
[37] D. V. Le, J. White, J. Peraire, K. M. Lim, and B. C. Khoo. An implicit immersed boundary method for three-dimensional fluid-membrane interactions. Journal of Computational Physics, 228(22):8427–8445, 2009.
[38] L. Lee and R. J. LeVeque. An immersed interface method for incompressible Navier–Stokes equations. SIAM Journal on Scientific Computing, 25(3):832–856, 2003.
[39] R. J. LeVeque and Z. Li. Immersed interface methods for Stokes flow with elastic boundaries or surface tension. SIAM Journal on Scientific Computing, 18(3):709–735, 1997.
[40] P. McCorquodale, P. Colella, and H. Johansen. A Cartesian grid embedded boundary method for the heat equation on irregular domains. Journal of Computational Physics, 173(2):620–635, 2001.
[41] L. A. Miller and C. S. Peskin. Computational fluid dynamics of 'clap and fling' in the smallest insects. Journal of Experimental Biology, 208:195–212, 2005.
[42] M. L. Minion. On the stability of Godunov-projection methods for incompressible flow. Journal of Computational Physics, 123(2):435–449, 1996.
[43] R. Mittal and G. Iaccarino. Immersed boundary methods. Annual Review of Fluid Mechanics, 37:239–261, 2005.
[44] Y. Mori and C. S. Peskin. Implicit second-order immersed boundary methods with boundary mass. Computer Methods in Applied Mechanics and Engineering, 197(25-28):2049–2067, 2008.
[45] Y. Morinishi, T. Lund, O. Vasilyev, and P. Moin. Fully conservative higher order finite difference schemes for incompressible flow. Journal of Computational Physics, 143(1):90–124, 1998.
[46] E. P. Newren. Enhancing the Immersed Boundary Method: Stability, Volume Conservation, and Implicit Solvers. PhD thesis, Department of Mathematics, University of Utah, Salt Lake City, UT, May 2007.
[47] E. P. Newren, A. L. Fogelson, R. D. Guy, and R. M. Kirby. Unconditionally stable discretizations of the immersed boundary equations. Journal of Computational Physics, 222(2):702–719, 2007.
[48] E. P. Newren, A. L. Fogelson, R. D. Guy, and R. M. Kirby. A comparison of implicit solvers for the immersed boundary equations. Computer Methods in Applied Mechanics and Engineering, 197(25-28):2290–2304, 2008.
[49] C. S. Peskin. Flow patterns around heart valves: A numerical method. Journal of Computational Physics, 10:252–271, 1972.
[50] C. S. Peskin. The immersed boundary method. Acta Numerica, 11:479–517, 2002.
[51] C. S. Peskin and B. F. Printz. Improved volume conservation in the computation of flows with immersed elastic boundaries. Journal of Computational Physics, 105:33–46, 1993.
[52] M. Pippig. PFFT: An extension of FFTW to massively parallel architectures. SIAM Journal on Scientific Computing, 35(3):C213–C236, 2013.
[53] R. Rannacher. On Chorin's projection method for the incompressible Navier-Stokes equations. In J. Heywood, K. Masuda, R. Rautmann, and V. Solonnikov, editors, The Navier-Stokes Equations II – Theory and Numerical Methods, volume 1530 of Lecture Notes in Mathematics, pages 167–183. Springer, Berlin/Heidelberg, 1992.
[54] Y. Saad. Iterative Methods for Sparse Linear Systems, volume 620. PWS Publishing Company, Boston, 1996.
[55] J. Shen. On a new pseudo-compressibility method for the incompressible Navier-Stokes equations. Applied Numerical Mathematics, 21:71–90, 1996.
[56] J. Shen. Pseudo-compressibility methods for the unsteady incompressible Navier-Stokes equations. In Proceedings of the 1994 Beijing Symposium on Nonlinear Evolution Equations and Infinite Dynamical Systems, pages 68–78, 1997.
[57] J. M. Stockie. Analysis and Computation of Immersed Boundaries, with Application to Pulp Fibres. PhD thesis, Institute of Applied Mathematics, University of British Columbia, Vancouver, Canada, 1997. Available from https://circle.ubc.ca/handle/2429/7346.
[58] J. M. Stockie and B. R. Wetton. Analysis of stiffness in the immersed boundary method and implications for time-stepping schemes. Journal of Computational Physics, 154(1):41–64, 1999.
[59] R. Temam. Une méthode d'approximation de la solution des équations de Navier-Stokes. Bulletin de la Société Mathématique de France, 98(4):115–152, 1968.
[60] A.-K. Tornberg and M. J. Shelley. Simulating the dynamics and interactions of flexible fibers in Stokes flows. Journal of Computational Physics, 196:8–40, 2004.
[61] E. Twizell, A. Gumel, and M. Arigu. Second-order, L0-stable methods for the heat equation with time-dependent boundary conditions. Advances in Computational Mathematics, 6:333–352, 1996.
[62] M. Uhlmann. Simulation of particulate flows on multi-processor machines with distributed memory. Technical Report 1039, CIEMAT, Madrid, Spain, May 2003.
[63] WestGrid. QuickStart Guide to Bugaboo. Retrieved April 11, 2013.
[64] S. M. Yau. Experiences in using Titanium for simulation of immersed boundary biological systems. Master's thesis, UC Berkeley, 2002.