[PDF] Fast and spectrally accurate summation of 2-periodic Stokes potentials

Abstract

We derive a Ewald decomposition for the Stokeslet in planar periodicity and a novel PME-type O(N log N) method for the fast evaluation of the resulting sums. The decomposition is the natural 2P counterpart to the classical 3P decomposition by Hasimoto, and is given in an explicit form not found in the literature. Truncation error estimates are provided to aid in selecting parameters. The fast, PME-type, method appears to be the first fast method for computing Stokeslet Ewald sums in planar periodicity, and has three attractive properties: it is spectrally accurate; it uses the minimal amount of memory that a gridded Ewald method can use; and provides clarity regarding numerical errors and how to choose parameters. Analytical and numerical results are give to support this. We explore the practicalities of the proposed method, and survey the computational issues involved in applying it to 2-periodic boundary integral Stokes problems.

Full PDF

FFast and spectrally accurate summation of 2-periodic Stokespotentials

Dag Lindbo and Anna-Karin Tornberg Numerical Analysis, Royal Inst. of Tech. (KTH), 100 44 Stockholm, Sweden

November 3, 2018

Abstract

We derive a Ewald decomposition for the Stokeslet in planar periodicity and a novelPME-type O ( N log N ) method for the fast evaluation of the resulting sums. The decom-position is the natural 2P counterpart to the classical 3P decomposition by Hasimoto,and is given in an explicit form not found in the literature. Truncation error estimatesare provided to aid in selecting parameters. The fast, PME-type, method appears tobe the ﬁrst fast method for computing Stokeslet Ewald sums in planar periodicity, andhas three attractive properties: it is spectrally accurate; it uses the minimal amount ofmemory that a gridded Ewald method can use; and provides clarity regarding numericalerrors and how to choose parameters. Analytical and numerical results are give to supportthis. We explore the practicalities of the proposed method, and survey the computationalissues involved in applying it to 2-periodic boundary integral Stokes problems. Viscous ﬂow systems continuously enjoy much attention from various scientiﬁc disciplines,such as the widespread study of passive and motile suspensions and other applications inbio-, micro- and complex ﬂuidics. An example is the large body of work that involves thestudy of locomotion of small organisms, the basics of which are illustrated in a classic article byPurcell [55], with further developments in e.g. Koiller et al. [37]. Theoretical understanding ofvarious modes of propulsion [66, 65, 17, 8, 30, 29] is rapidly progressing apace with simulationmethods [32, 59]. If history is any guide, analytical results will feed into computationalmodels and rich computational studies of complex systems will emerge. There are also notablystrong interdisciplinary connections present in this area. A recent survey of the modelling ofbiomimetic ﬂuid ﬂow by Saha et al. [57] provides a broader perspective, including modelingand computation outside of the viscous ﬂow regime.Another area where there has been much activity is in simulation of suspensions of variousparticles in viscous ﬂows, motivated, for instance, by a desire to understand the formationof microstructures [27] and paper-making [46]. Here one ﬁnds detailed computational studiesof rigid or ﬂexible ﬁber suspension, such as the work by Saintillan et al. [58] and Tornberg, ∗ To whom correspondence should be addressed. Email: [email protected] a r X i v : . [ phy s i c s . f l u - dyn ] N ov helley and collaborators [69, 68, 61], as well as large body of work concerning suspensionsof various spheroids [7, 64] and related particles [41]. Complex structure formation is ob-served, based on accurate modeling of hydrodynamical interactions, in these computationalinvestigations.Alas, the present work shall not in itself elucidate these fascinating questions of collectivedynamics, but rather pursue algorithmic development that could help such investigations dealwith larger systems in future.The models under consideration in the works cited, and in countless other investigations,are based on the Stokes equations. A variety of numerical and analytical techniques areused, but a common feature is the use of singularity solutions. One class of methods is theso called distributed singularity approach, in which the various Green’s functions of Stokesequations are combined in point, line, or surface distributions to generate the ﬂow induced bya particular particle or distribution. In this way, slender bodies (e.g. ﬁbers) are representedby a line distribution of Stokes potentials along its centerline, as was suggested by Batchelor[3]. In an analogous way, spheroids can be represented by a combination of higher-orderGreen’s functions, see e.g. Zhou & Pozrikidis [73] and G¨otz [18].There are also direct boundary integral methods, as discussed in e.g. Pozrikids [53, 51],Anderson et al. [31, 5, 4], and others [35, 71, 72]. Another class of methods that has beenhighly successful is known as Stokesian dynamics , and is due to Brady and collaborators[6, 64]. All of these methods are based on distributions of Green’s functions that are eithersummed or integrated, often in intricate ways.Moreover, results free from ﬁnite-size eﬀects are often of interest; the well-trodden pathtowards which is applying periodic boundary conditions to a suﬃciently large system [64, 69,58]. Solvers that exploit periodic structure, such as the fast Ewald methods that we shallsurvey shortly, are often highly specialized. In particular, there are fundamental diﬀerencesbetween how full (i.e. applied in all three directions) and lower dimensional periodicity (i.e.periodicity with respect to one or two dimensions) enters in mathematical (cf. Pozrikidis[52]) and algorithmic terms. Whereas methods are relatively well established for fully periodicStokes problems [64, 58, 42], methods for problems in planar periodicity, e.g. when periodicityis applied to ( x, y ) and z is “free”, are less so (see e.g. [52, 28]). However, such systems,including wall-conﬁned systems, applications in biology (such as beating ﬂagella in planarconﬁgurations [9, 36, 38]), are of growing interest.The present work deals with the eﬃcient, O ( N log N ), and spectrally accurate summationof Stokes potentials in the 2P setting. To clarify, let Ω = [0 , L ) and consider Stokes equationsfor x ∈ Ω, −∇ p ( x ) + µ ∆ u ( x ) = f ( x ) ∇ · u ( x ) = 0 , (1)where u ( x ) denotes the velocity ﬁeld, p ( x ) the pressure, and f ( x ) the force applied to theﬂuid. The fundamental solution of Stokes ﬂow represents solutions of the singularly forcedStokes equations, i.e. (1) with f ( x ) = f δ ( x − x ).Introducing the Stokeslet , S ( x ) = I k x k + x ⊗ x k x k , (2)2here ( I ) ij = δ ij is the identity and x ⊗ x is the outer product x i x j , the solution of (1) infree space with f ( x ) = f δ ( x − x ) can be written: u ( x ) = 18 πµ Z R S ( x − y ) f δ ( y − x )d y = 18 πµ S ( x − x ) f . See e.g. the textbook by Pozrikidis [51, p. 22] for a derivation of (2). A solution u ( x ) , x ∈ Ω,that satisﬁes periodicity is expressed as an inﬁnite sum over periodic images of f convolvedwith S , the Stokeslet, u ( x ) = 18 πµ Z R S ( x − y ) X p ∈ Z d f δ ( y − x n + τ ( p ))d y = 18 πµ X p ∈ Z d S ( x − x n + τ ( p )) f , where d = 1 , , x ∈ R throughout) and τ : Z d → R is a translation into the periodic image p . Under fully periodic boundary conditions (3P) onewould let τ ( p ) = L p , and in the 2-periodic (2P) case we let τ ( p ) = [ L p , f denote an array of point forces, f ( x ) = N X n =1 f n δ ( x − x n ) , (3)where f n may contain any particular set of e.g. quadrature weights and physical constants.We focus on the evaluation the periodized Stokeslet sum u ( x ) = X p ∈ Z d N X n =1 S ( x − x n + τ ( p )) f n , . (4)Note that the terms in (4) decay as 1 /r , and (4) is obviously not absolutely summable.Takemoto et al. [67] discuss this in the related context of the electrostatic potential. However,the practical computation of (4) is plainly infeasible, even for small N , so fast methods areessential. Alternatives exist at this point, including pursuing extensions to the fast multipolemethod (FMM) by Greengard and Rokhlin [22]. There exist FMMs that incorporate peri-odicity in 3D, e.g. Kudin & Scuseria [40] for electrostatics, and FMM-related methods forStokes [19] in 2D with periodicity. However, the dominating framework for periodic problemsin this setting incorporates periodicity by using Fourier transforms. The basic principles forhow to proceed in that direction have long been clear, neatly divided into two stages:(a) Decompose the Stokeslet sum (4) into rapidly converging parts, e.g. by applying ideasfrom Ewald summation .(b) Devise a method to reduce the complexity of computing the decomposed Stokeslet sums,e.g. by means of FFT-based methods.The extent to which this has been realized depends on the periodic structure of theproblem. In the fully periodic (3P) setting, various decompositions exist that clarify ( a ), andmethods with O ( N log N ) complexity have been developed ( b ), as we survey in Section 2.3n the case of planar periodicity (2P), the picture is much less clear. Existing decompo-sitions pertaining to ( a ) are surveyed in Section 3, after which we derive an Ewald-type sumfor computing (4) (Section 3.2). A fast, O ( N log N ), method is presented in Section 4, which,as far as we know, is the ﬁrst reported method that deals with ( b ) in the 2-periodic setting.Interestingly, the fast method we propose ( b ) is, in some ways, simpler than the underlyingEwald sum ( a ) that we derive. As shall become clear, this has to do with a relationshipbetween the 2- and 3-periodic settings, which we exploit for the fast method.We demonstrate several appealing characteristics of the proposed method: spectral accu-racy; eﬃciency in both run-time and memory; clear error estimation and parameter selection;and a close and revealing correspondence to methods for the 3P problem. This will, we hope,facilitate future computational investigations of micro- and complex ﬂow systems, enablinglarge systems to be simulated accurately. We start by summarizing well-established results for the fully periodic case. Before interestin solving Stokes equations took oﬀ there was already a large body of work established on arelated problem in electrostatics – summing Coulomb potentials under periodicity (solving aPoisson problem, rather than Stokes) – known as Ewald summation. The basic principle isthat the potential, φ ∼ /r , is split into a short range part that is exponentially decaying,and a long range part that is very smooth (and thus exponentially convergent in reciprocalspace).Pioneering work by Hasimoto [24] showed that a 3-periodic vector ﬁeld u P ( x ) can becomputed in a Ewald-like manner, u P ( x m ) = X p ∈ Z N X n =1 A ( ξ, x m − x n + L p ) f n ++ 8 πL X k =0 B ( ξ, k ) e − k / ξ N X n =1 f n e − i k · ( x m − x n ) − ξ √ π f m , (5)where A ( ξ, x ) = 2 ξe − ξ r √ πr + erfc( ξr )2 r ! ( r I + x ⊗ x ) − ξ √ π e − ξ r I , (6)with r := k x k , and B ( ξ, k ) = k ξ ! k ( k I − k ⊗ k ) . (7)The parameter ξ >

0, which u P is independent of, is known as the Ewald parameter andcontrols the rate at which the two sums converge relative to each other. Note here that thetwo sums in (5) have the desired structure – the ﬁrst is a sum in real space that convergesroughly as e − ξ r , and the second is a sum in k -space that converges as roughly as e − k / ξ .Other decompositions exist, see Pozrikidis [52, Sec. 3.1, 4.1].4 .1 Fast methods in full periodicity The ∼ /r convergence of the original Stokeslet sum (4) is vastly improved by the Ewaldmethod (5). However, the complexity of computing the Ewald sum (5) is still severely limiting.There are two reasons for this: First, summing (5) for all x m has O ( N ) complexity. Secondly,the constant hidden in the formal complexity can be very large; in fact, it grows cubicallywith higher accuracy.Faster methods for the corresponding Poisson problem in electrostatics and molecularsimulation have been around for three decades, following work by Hockney & Eastwood [26];see the survey by Deserno & Holm [12], or recent work by the present authors [44].Such methods have been adapted for the 3P Stokes Ewald sum (5), staring with Sierou &Brady [64] (embedded in the framework of “Stokesian dynamics”). Their method is based onthe Particle Mesh Ewald (PME) method by Darden et al. [11]. Saintillan et al. [58], in theirmethod for sedimenting ﬁbers in Stokes ﬂow, base a fast method for the Stokeslet sum on areﬁnement of the PME method, known as

Smooth Particle Mesh Ewald (SPME) [14]. TheSPME method is known to be more accurate than its predecessor, though still of polynomialorder. Recognizing that the exponentially fast convergence of the underlying Ewald sums islost when a traditional PME approach is used, the present authors presented a spectrallyaccurate PME-type method for Stokes in [42].

Perhaps counter to intuition, one should not expect a method for 2-periodic systems tofollow by elementary manipulations of the 3P decomposition (5) – to the contrary, one isbest advised to start anew from the Stokeslet sum (4). It is also instructive to review thecorresponding transition from 3P to 2P in the electrostatics setting, as we do in [43]. Oneﬁnds that consolidation, on the level of agreeing on a preferred decomposition, has yet tohappen (cf. e.g. the 2P Ewald sum in Gryzbowski et al. [23] and a survey of non-Ewaldmethods by Mazars [50]), and that fast PME-type methods are considerably less mature thantheir 3P counterparts, though we hope that [43] contributes in this direction.There does exist 2P Ewald decompositions for Stokes – notably in Pozrikidis [52], relatingto earlier work by Ishii [28]. While it should be emphasized that [52] is among the mostvaluable references in the present context, we ﬁnd the results given therein lacking in tworespects: First, it turns out to be quite straight-forward to derive and present a 2P StokesEwald sum in explicit form, whereas Pozrikidis [52, Sec. 2.2] is content with giving a “gen-erating function” and a diﬀerential operator. Secondly, and more importantly, the results onoﬀer in [52] do not, at least to us, suggest how a fast PME-type method could be developed.Such a fast method being our objective, we now set out to derive a 2P Ewald sum forStokes in a suitable way. To give an overview, we start by deriving a pure k -space solutionto a 2-periodic Stokes problem that decays as z → ±∞ , which we then decompose using the“screening function” (15) proposed by Hernandez-Ortiz et al. [25] to generate the Hasimotodecomposition (5)-(7) in the 3-periodic setting. We then consider the family of solutionsobtained by adding a piecewise linear function, showing that this admits an end-result whichis diﬀerentiable. Under certain conditions, we show that these smooth solutions have ﬁnitelimits as z → ±∞ , as expected physically. The resolution of the smoothness/singularity issueis quite compactly discussed in related work [52, pp. 83-84], [28, p. 676]; this may suit somereaders more than others. 5igure 1: Planar periodicity (2P): Primary cell replicated inﬁnitely in the plane. The conventions for Fourier transform pairs used in the present work are given in AppendixB. As in [43], we start with observing that the periodic structure of a 2-periodic function h ( x )implies that its spectral representation will be mixed in the following sense: Let h ( x ) = h ( ρ, z )be periodic in the ( x, y )-plane and “free” in z , i.e. ρ := ( x, y ) ∈ Ω = [0 , L ) and z ∈ R . Thena Fourier representation of h is h ( ρ, z ) = 12 π X k Z ∞−∞ ˆ h ( k , κ ) e i k · ρ e iκz d κ, (8)where k ∈ π Z /L and κ ∈ R . We shall assume that h ( ρ, z → ±∞ ) decays faster than anyinverse power of z . In this setting, (8) exists. Also under the present assumptions, 2P versionsof several familiar results from spectral analysis hold, such as the Poisson summation formula,Parseval/Plancherel’s theorems and the convolution theorem, see [43]. Before we address Ewald summation for the 2P Stokes problem, we need to establish (slowlyconverging) solutions of the original problem that, crucially, vanishes as z → ±∞ . Considerthe Stokes equations, −∇ p ( x ) + µ ∆ u ( x ) + f ( x ) = 0 ∇ · u ( x ) = 0under 2-periodic boundary conditions with respect to Ω. Presently, assume that the sourceterm, f , has a mixed Fourier transform, ˆ f ( k , κ ), and expand u and q := ∇ p likewise, u ( x ) = u ( ρ, z ) = 12 π X k Z R ˆ u ( k , κ ) e i k · ρ e iκz d κ (9) ∇ p ( x ) = ∇ p ( ρ, z ) = 12 π X k Z R ˆ q ( k , κ ) e i k · ρ e iκz d κ. k := ( k , κ ) ∈ { π Z /L } × R ,ˆ q + µ k k k ˆ u = ˆ f i k · ˆ u = 0 . As u should vanish at inﬁnity we may omit summing over k = 0, because ˆ u (0 , κ ) = 0 isimplied. Eliminating the pressure gradient above, ˆ q ( k , κ ) = ( k ⊗ k )ˆ f ( k , κ ) / k k k (since q is agradient ˆ q ( k ) is parallel to k ), givesˆ u ( k ) = ˆ u ( k , κ ) = 1 µ (cid:18) I k k k − k ⊗ k k k k (cid:19) ˆ f ( k , κ ) , k = 0 . (10)Hence, in light of (9), u ( ρ, z ) = 12 πµ X k =0 Z R (cid:18) I k k k − k ⊗ k k k k (cid:19) ˆ f ( k , κ ) e i k · ρ e iκz d κ. (11)Recalling f ( x ) from (3), we absorb a factor 8 πµ into the coeﬃcients f n for convenience andconformity with convention. With that, (11) becomes˜ u ( ρ, z ) = 4 L X k =0 Z R (cid:18) I k k k − k ⊗ k k k k (cid:19) N X n =1 f n e i k · ( ρ − ρ n ) e iκ ( z − z n ) d κ, where the integrals are computable. We obtain (see Appendix A.1):˜ u ( ρ, z ) = 4 L X k =0 N X n =1 ˜ Q ( k , z − z n ) f n e i k · ( ρ − ρ n ) , (12)where ˜ Q ( k , z ) = e −k k k| z | k k k  π − ( π k k k| z | + π ) k k k k − π ( k k k| z | +1) k k k k k − iπzk − π ( k k k| z | +1) k k k k k π − ( π k k k| z | + π ) k k k k − iπzk − iπzk − iπzk ( π k k k| z | + π )  . (13)Note that ˜ u is well-deﬁned, but, as will be discussed in Section 3.2.2, fails to be diﬀerentiablein an ( x, y )-plane around each z n . A fruitful and ﬂexible approach to deriving Ewald sums is to compute convolutions of sourceterm with a so called screening function – this is the classical approach for the 3P Laplace(electrostatics) problem. In planar periodicity the singularities encountered are more severe,and the appropriate behavior in the limit z → ±∞ not as straight forward as the correspond-ing “tin foil” (or, for Stokes,“pressure gradient counters net force”) conditions that enter 3Pderivations. Our approach here is to construct a decomposition of (12) and then address theregularity issue.By “screening function” we mean any normalized function, k γ ( x ) k = 1, that decayssmoothly away from zero, such that a decomposition, f ( x ) = (( δ − γ ) ∗ f )( x ) + ( γ ∗ f )( x ) = f ( x ) − f γ ( x ) + f γ ( x ) , δ ( x ) denotes the Dirac measure on R and f γ ( x ) := ( f ∗ γ )( x ), is well deﬁned. Withthis, clearly, u ( x ) = 18 πµ Z R S ( x − y ) ( f ( y ) − f γ ( y )) d y + 18 πµ Z R S ( x − y ) f γ ( y )d y . (14)As noted, the standard derivation for the 3P Laplace case is to take γ as a pure Gaussian,compute the ﬁrst convolution directly and treat the second term in reciprocal space. In-terestingly, in a short paper sparse on details [25], Hernandez-Ortiz et al. give a screeningfunction, γ ( r ) = ξ √ π e − ξ r (cid:18) − ξ r (cid:19) , (15)that generates exactly the 3P Hasimoto decomposition (5)-(7). The ﬁrst term in (14) can be computed directly, as is standard in this context (though thecalculations are laborious), and one ﬁnds that Z R S ( x − y ) ( f ( y ) − f γ ( y )) d y = N X n =1 X p ∈ Z A ( ξ, x − x n + L p ) f n , (16)where A is the same (though summed over a two-dimensional lattice this time) vis-a-vis the3P decomposition (6).Note that a particle does not itself contribute to the potential ﬁeld, or ﬂow, it experiences,so it is natural to simply drop the term in (16) that corresponds to p = 0 when m = n .However, by the decomposition (14), part of the contribution that we’re trying to remove hasgone into the second term. By computing the diﬀerence between the free space Stokeslet andthe real-space term, we can ﬁnd this contribution. This is to be subtracted oﬀ, so we do itwith the opposite sign:lim k x k→ ( A ( x , ξ ) − S ( x )) = lim k x k→ (cid:18) − ( α + erf( ξ k x k )) I k x k + ( α − erf( ξ k x k )) x ⊗ x k x k (cid:19) = − ξ √ π I , where α := 2 ξπ − / k x k e − ξ k x k . By convention, this term is denoted “self interaction” whichcan lead to some confusion, since it’s the term removing self interaction that is included inthe k -space sum. Nonetheless, we have arrived at two parts of the 2P Stokeslet Ewald sum, u R ( x m ) := N X n =1 ∗ X p ∈ Z A ( ξ, x m − x n + L p ) f n (17) u self ( x m ) := − ξ √ π f m . (18)The notational inconvenience ∗ , signifying the omission of the term { p = 0 , n = m } in (17),is standard. 8 .2.2 The k-space sum Turning to the second term in the Stokeslet decomposition (14), ﬁrst note that both termsin (14) are solutions to Stokes equations. Moreover, both f γ and f − f γ are admissible underthe assumptions of Section 3.1.2. The Fourier transform of γ over R isˆ γ ( k ) = k ξ ! e − k / ξ . In light of (11) we let f γ = f ∗ γ generate a function u F that satisﬁes u F → z → ±∞ , u F ( ρ, z ) = 12 πµ X k =0 Z R (cid:18) I k k k − k ⊗ k k k k (cid:19) b f γ ( k , κ ) e i k · ρ e iκz d κ = 4 L X k =0 Z R (cid:18) I k k k − k ⊗ k k k k (cid:19) k k k ξ ! e −k k k / ξ N X n =1 f n e i k · ( ρ − ρ n ) e iκ ( z − z n ) d κ (19)= 4 L X k =0 Z R B ( k ) e −k k k / ξ N X n =1 f n e i k · ( ρ − ρ n ) e iκ ( z − z n ) d κ, where we have used Poisson summation and grouped similar Fourier coeﬃcients.For future reference, we note that this expression is the starting point from which wedevelop a fast PME-type method. Given the present mixed periodic setting, the close corre-spondence between (19) and (7) is entirely expected, and illuminating per se .As it transpires in Appendix A.2, the integrals above, Q ( k , z ) := e −k k k / ξ Z R (cid:18) I k k k − k ⊗ k k k k (cid:19) k k k ξ ! e − κ / ξ e iκz d κ, (20)can be computed. For clarity of notation is is useful to let Q ( k , z ) = Q I ( k , z ) + Q k ⊗ k ( k , z ) , where the meaning of the superscripts is implied from the terms in the ﬁrst factor under theintegral in (20). With this, and the computations in Appendix A.2, it follows that we maywrite u F as a sum, u F ( ρ m , z m ) = 4 L X k =0 N X n =1 (cid:18)(cid:0) Q I ( k , z mn ) + Re Q k ⊗ k ( k , z mn ) (cid:1) cos( k · ρ mn ) − + Im Q k ⊗ k ( k , z mn ) sin( k · ρ mn ) (cid:19) f n , (21)where Q I ( k , z ) = 2 J ( z )4 ξ + J ( k k k , z ) ! I (22)9nd Q k ⊗ k ( k , z ) = −  k (cid:16) J ξ + J (cid:17) k k (cid:16) J ξ + J (cid:17) k (cid:16) iK ξ + iK (cid:17) k k (cid:16) J ξ + J (cid:17) k (cid:16) J ξ + J (cid:17) k (cid:16) iK ξ + iK (cid:17) k (cid:16) iK ξ + iK (cid:17) k (cid:16) iK ξ + iK (cid:17) J ξ + J  ( k k k , z ) . (23)The terms J pq and K pq are short-hand for the various scalar integrals that can be identiﬁed bystudying the integrand in Q . With λ := e − k / ξ − ξ z (24) θ + := e kz erfc (cid:18) k ξ + ξz (cid:19) (25) θ − := e − kz erfc (cid:18) k ξ − ξz (cid:19) (26)we can write down the computed integrals as J ( z, k ) = √ πλξ (27) J ( z, k ) = π ( θ − + θ + )4 k (28) J ( z, k ) = √ πλ k ξ + π (cid:18) θ − + θ + k + ( θ − − θ + ) z k − θ − + θ + kξ (cid:19) (29) J ( z, k ) = 14 π ( − θ − − θ + ) k + √ πλξ (30) J ( z, k ) = π (cid:18) ( θ − + θ + ) k ξ + θ − + θ + k + 18 ( θ + − θ − ) z (cid:19) − √ πλ ξ (31) K ( z, k ) = π ( θ − − θ + )4 (32) K ( z, k ) = π (cid:18) θ + − θ − ξ + ( θ − + θ + ) z k (cid:19) . (33)Despite some diﬃculty of notation, the sum (21) is straight-forward to evaluate.Now, recall that we are presently aiming for a decomposition of the pure k -space solution(12), wherein the k = 0 term was dropped to ensure decay as z → ±∞ . However, by thedecomposition (14), part of the k = 0 mode has gone into the real-space sum (17) and mustbe subtracted oﬀ. Computing (12) explicitly has provided a k -space view of the real-spacesum, so that the term to be removed can be extracted as u ∗ ( z ) = lim k → (cid:16) ˆ u ( k ) − ˆ u F ( k ) (cid:17) = lim k → L N X n =1 ( ˜ Q ( k , z − z n ) − Q ( k , z − z n )) f n ! . Computing the desired limit, one ﬁndslim k → (cid:16) Q ( k , z ) − ˜ Q ( k , z ) (cid:17) =  a ( z ) 0 00 a ( z ) 00 0 0  , a ( z ) := π | z | − πz erf( zξ ) − e − z ξ √ π ξ , (34)so that u ∗ ( z ) = 4 L N X n =1 a ( z − z n ) I f n , I :=   . We now have the desired decomposition of (12),˜ u ( ρ m , z m ) = u R ( ρ m , z m ) + u F ( ρ m , z m ) − u ∗ ( z m ) + u self . Naturally, the non-smoothness in (12) is present here (in the term u ∗ ). However, we canrecover a smooth solution by relaxing the restriction that u → z → ±∞ . To any u that satisﬁes the Stokes equation (1) we can add a linear function. By that token, and inclose analogy to how the 2P Ewald sum for electrostatics was obtained in [43], we subtract apiecewise linear function, u F, k =0 ( x m ) = − πL N X n =1 | z m − z n | I f n − u ∗ = − L N X n =1 π ( z m − z n )erf (cid:0) ( z m − z n ) ξ (cid:1) + √ π ξ e − ( z m − z n ) ξ ! I f n . (35)With this, the derivation of the 2P Stokes Ewald sum is complete. We have that u ( x m ) = u ( ρ m , z m ) = u R ( ρ m , z m ) + u F ( ρ m , z m ) + u F, k =0 ( z m ) + u self , (36)from (17), (21), (35) and (18). These are nice and exponentially convergent sums, but,as discussed in the introduction, the complexity of evaluating them for a large system isdebilitating. Note that u F, k =0 contains terms that look like an unbounded shear ﬂow in z .However, if the coeﬃcients f n sum to zero in the ﬁrst two components, u F, k =0 tends to ﬁnitelimits as z → ±∞ . As in the electrostatics case, one can show that (cid:18) lim z →∞ u ( ρ, z ) − lim z →−∞ u ( ρ, z ) (cid:19) = 8 πL N X n =1 z n f n I , if N X n =1 ( f n ) j = 0 , j = 1 , . That is, in the planar directions ( x, y ) the ﬂow sees a transition (across z ∈ (0 , L ) continuously)proportional to the dipole moment in z. It’s noteworthy that the usefulness of Ewald sums (cf. Section 4.4, on parameter selection)rests on the availability of truncation error estimates. Therefore, we need to endow the 2PStokeslet Ewald sums, (17) and (21), with such estimates.11o this end, let the real-space sum (17) be truncated such that only interactions betweenparticles (including periodic images) within a distance r c from each other are included. Thatis, let u Rr c ( x m ) := N X n =1 ∗ X p ∈ Z r c ( k x m − x n + τ ( p ) k ) A ( ξ, x m − x n + τ ( p )) f n , where r c ( r ) denotes the indicator function which is one if r ≤ r c and zero otherwise. Esti-mating k u R − u Rr c k is most tractable in the RMS norm, e rms := vuut N N X n =1 ( u ( x n ) − u ∗ ( x n )) . (37)In the electrostatic case, Kolafa & Perram [39] have derived famous truncation error estimatesfor randomly scattered particles in RMS norm. However, the diagonal terms in A are A jj =erfc( ξr ) /r − ξ exp( − ξ r ) / √ π , whereas in the Laplace case only the former (and smaller)term is present. On the other hand, if we disregard the oﬀ-diagonal terms in A , the key stepin [39, Appendix A, Eq. (14)] becomes tractable for the Stokeslet, and we get estimates forcomponents j = 1 , , e R rms ,j ) ≈ Q j L Z π Z π Z ∞ r c A jj r sin( θ )d r d θ d φ = Q j L r c (cid:16) e − ξ r c − π erfc( ξr c ) (cid:17) + √ πξ erfc( √ ξr c ) ! , (38)where Q j := P Nn =1 ( f j ) n . In what follows, we suppress the vector index j .The estimate (38) is to be treated as such – it does not include the full operator and itis statistical, the latter meaning that it’s only valid if x m is randomly distributed. None theless, (38) is very predictive within its domain, as we illustrate in Figure 2. Here we have N = 10000, x m randomly from a uniform distribution over Ω = [0 , , f m from a uniformdistribution centered at zero and the RMS average was formed over 30 random x m . Finally,for selecting parameters one would ideally like to solve e R rms ( ξ ) = ε for ξ , but this does notappear tractable. Rather, we series expand the error estimate for large ξr c , obtaining that e R rms ≈ s Qr c L e − ξ r c , (39)and consequently, assuming that r c ≥ L ε / (4 Q ), ξ ( r c , ε ) ≈ r c vuuut log  ε s Qr c L  . The agreement between computed errors and the simpliﬁed estimate (39) is illustrated inFigure 2.For the k -space sum (21), truncated as k k k ≤ πk ∞ /L , it’s harder to follow Kolafa &Perram [39] and derive a corresponding estimate. Rather than to ignore the issue altogether,12 −15 −10 −5 r c Figure 2: Real-space sum truncation error (37) and estimate (38) of x -component, in RMS-norm, as a function of truncation radius. N = 10000, Ω = [0 , , ξ = 4 , ,

12 (top to bottom).As dashed, simpliﬁed estimate (39), hardly distinguishable.which would break the way parameters are selected (cf. Section 4.4), we take a heuristicapproach. Based on existing results in the RMS setting, it’s natural to suppose that anestimate of the form e F rms ≈ C √ Qk a ∞ ξ b e − ( πk ∞ ξL ) , for particular a, b , can be of value. A few trialruns quite clearly suggest that good agreement is obtained if a = − / b = 3. Moreover,we let C = L π − , obtaining a heuristic, or practical, error estimate for the truncation errorof the k -space sum (21), e F rms ≈ p Q ξ L π k / ∞ e − ( πk ∞ ξL ) . (40)In Figure 3 we give results that demonstrate that this estimate captures the truncation errorwell for a range of parameters. Here, ξ = 4 , ,

12 and L = 1 , N = 400 ,

200 and the RMS-average is formed over a random set of 30 points . We observe satisfying agreement, and, asappropriate, that the estimate is somewhat conservative.Pertaining to both the real- and k -space sums, the RMS measure will underestimate errorsif particles are not randomly scattered. In that case, it becomes appropriate to consider ∞ -norm estimates. The essential property, that the sums converge at least as fast as exp( − r c ξ )and exp( − ( πk ∞ / ( ξL )) ) respectively, still holds, though obtaining reliable estimates withdetailed constants, as in the RMS case, is bound to pose a challenge. In Section 2.1 we brieﬂy surveyed fast, O ( N log N ), methods for the fully periodic (3P)problem, and we shall adhere to same framework under 2P:a) The real-space sum (17) can be made arbitrarily cheap to compute by choosing theEwald parameter, ξ , suﬃciently large (cf. Section 4.1); The small systems used to evaluate the estimate (40) are small – limited by the computational cost ofevaluating (21) for large ξ . −15 −10 −5 k ∞ e r m s −15 −10 −5 k ∞ e r m s Figure 3: Truncation error of k -space 2P Stokeslet Ewald sum (21) in RMS norm (37) vs.number of modes k ∞ , i.e. k k k ≤ πk ∞ /L . Left: L = 1 , N = 400. Right: L = 3 , N = 200. Inboth panels, ξ = 4 , ,

12 (left to right), and (solid line) heuristic error estimate (40).b) by using fast Fourier transform (FFT), the (non-singular) k -space sum (21) can beevaluated in linear time (independently of ξ );c) the (2P-only) singularity contribution (35) is easily dealt with as a one-dimensionalinterpolation problem.It is really item ( b ) that is the heart of the matter, that and the glue that hold the partstogether: parameter estimation. In the 3P setting one starts from the Ewald sum (the secondterm in (5), in the Stokes case), and the steps to obtain a fast, FFT-based, method are quiteintuitive. That is not to say that the matter is simple; a rather large debt of insight is owedto the early pioneers in the ﬁeld, such as Hockney & Eastwood [26] and Darden et al. [11].Attempts to do the same thing in planar periodicity immediately run aground on the alge-braic structure of the corresponding k -space Ewald sum (21) (cf. e.g. Grzybowski et al. [23]for Laplace). Looking at the k -space 2P Stokes Ewald sum, and its constituent expressions,(21)-(33), it is not hard appreciate the challenge in ﬁnding transforms and approximationsthat mimic the 3P approach. In the electrostatics setting, a variety of non-Ewald (cf. Mazars[50]) and non-2P Ewald methods (cf. Arnold et al. [2] and the references therein), havebeen developed instead. These methods have not been adapted to Stokes (i.e. deriving cor-responding Lekner sums [50] or correction terms [2]), and it is quite evident that, at best, asubstantial eﬀort would be required to do so.What we propose, in contrast, starts from the mixed sum/integral (19). Formulating aFFT-based (PME) method becomes a quite straight-forward matter. It follows the same lineof reasoning that we set forth in [43]. First, however, we give a few remarks regarding item( a ). 14 .1 Real-space summation in linear time The truncated real-space sum u Rr c ( x m ) = N X n =1 ∗ X p ∈ Z r c ( k x m − x n + τ ( p ) k ) A ( ξ, x m − x n + L p ) f n , with A ( ξ, x ) = 2 ξe − ξ r √ πr + erfc( ξr )2 r ! ( r I + x ⊗ x ) − ξ √ π e − ξ r I , is, in order to beneﬁt from much related work, most conveniently written as u Rr c ( x m ) = X y ∈ NL m A ( ξ, x m − y ) f ( y ) , (41)where NL m = NL m ( r c ) denotes the set of near neighbors to x m (counting periodic images,if necessary). The point being that if | NL m | is constant, for all m , as N , the number ofsources, grows, then evaluating (41) for all x m has complexity O ( N ) instead of O ( N ). Thispresupposes that all neighbor lists NL m can be found in linear time, which is the case. Infact, it is quite elementary and we refer the reader to the textbook by Frenkel & Smit [15,Appendix F], and the recent paper [44] by the present authors where additional details aregiven.Note that each neighbor list, NL m , can be viewed as the non-zero pattern for a row in asparse matrix. In light of this, it’s natural to suggest that the evaluation of (41) for all x m be implemented as a sparse matrix-vector product, u R ← ˜ A ( r c , { x m } )vec( f ). The 3 N × N matrix ˜ A has a 3 × A ij , but eachblock has the same sparsity pattern. If (41) is to be evaluated for many f , as is the case inthe context of iterative solution of boundary integral equations (cf. Section 5.1), the matrixform saves a very large amount of redundant arithmetic [58]. In a setting where locations x m change, as in ﬁber simulations [68], the matrix elements of ˜ A have to be recomputed, but thesparsity pattern can be valid for several time steps . The fast method that we give here is based on earlier work by the present authors [42, 44, 43],and this exposition is somewhat condensed and, in contrast to our previous work, draws inits structure and notation from the elegant treatment of the 3P electrostatics case by Shanet al. [60].Instead of starting from the 2P k -space Stokes Ewald (21), we invoke the mixed sum-integral form (19) u F ( ρ m , z m ) = 4 L X k =0 Z R B ( k ) e −k k k / ξ N X n =1 f n e i k · ( ρ − ρ n ) e iκ ( z − z n ) d κ. Typically, the neighbor lists NL m are constructed with some margin, k x − y k ≤ r c + r , to allow particlesto move a distance r before the neighbor list has to be recomputed.

15t is fairly easy [43] to show that u F ( x m ) can be obtained as a convolution of a particularsmooth function, ψ : Ω × R → R , with a suitably scaled Gaussian centered in x m , u F ( ρ m , z m ) = C Z R Z Ω ψ ( ρ, z ) e − β k ρ − ρ m k ∗ e − β | z − z m | d ρ d z, (42)where C = ξ πη ! / , β = 2 ξ η and η > ψ , and show how it enters, we recall the decomposition(14) and note that we can view B ( k ) e −k k k / ξ as the k -space representation of the “regu-larized” Stokeslet ( S ∗ γ )( x ), as discussed at the outset of Section 3.2. In Section 3.2.2 weproceeded to obtain the Ewald sum (21) by convolving S ∗ γ with f ( x ) and laboring over theresulting integrals (the convolution itself was trivial though). Mesh-based Ewald methods de-viate at this point. The logic is to consider not f but a regularization, f η ( x ) = ( G ( η ) ∗ f )( x ),and adjust the Green’s function accordingly. That is, choose a kernel G and determine amodiﬁed Green’s function, ˜ B , such that u γ = ( S ∗ γ ( ξ ) ∗ f )( x ) = ( f ∗ G ( η ) ∗ ˜ B ∗ G ( η ))( x ) . (43)It turns out to be advantageous [44] to let G = C e − β k x k , so that, f η ( ρ, z ) = C N X n =1 e − β k ρ − ρ m k ∗ e − β | z − z m | f n , (44)by which it modiﬁed Green’s function, ˜ B = e − (1 − η ) k k k / ξ B ( k , κ ), follows. The convolution f η ∗ ˜ B is conveniently computed as a multiplication in frequency domain, b ψ ( k , κ ) = ( , k = 0 e − (1 − η )( k + κ ) / ξ B ( k , κ ) b f η ( k , κ ) , otherwise . (45)The last convolution in (43), ( ψ ∗ G ( η ))( x ), brings us back to (42). In [42, 44, 43] detailed andconstructive derivations are given instead of this sketch. To clarify, we give a few remarks,starting with a summary of the method: Summary of method

To compute the (non-singular) k -space contribution u F ( x m ) for all m the steps are asfollows: ( i ) evaluate f η (44) on a uniform grid over Ω; ( ii ) compute a mixed Fouriertransform to arrive at b f η ; ( iii ) multiply with modiﬁed Green’s function according to (45);( iv ) an inverse mixed transform yields ψ on a regular grid, so that; ( v ) the convolution(42) can be computed for all x m . In steps ( i ) and ( v ), the Gaussians are truncated tohave support on P grid points, for some P determined by the acceptable approximationerror. Fast transforms; uniform grids

PME-methods become eﬃcient by requiring that the reg-ularized charge distribution f η ( x ) be evaluated on a uniform grid. The transforms arethen handled by FFTs. 16 ixed transforms Recall the spectral representation (8) of 2P functions in terms of discreteFourier series in ( x, y ) and a continuous transform in z . Hence, the transforms f η → b f η and b ψ → ψ are mixed. This, together with the exclusion of k = 0 in (45) and theinclusion of the singularity contribution (35), is where our proposed 2P method deviatesfrom the well-established 3P methods. Fourier integral quadrature

Computing the mixed transforms requires two standard dis-crete Fourier transforms and an approximation of the Fourier integral (in the z -direction).The quadrature step has to be done in the same time-complexity as the discrete trans-forms; otherwise, the method would not be more eﬃcient than directly summing (21).This is discussed at length in [43], and it is shown to be quite satisfactory to use anFFT-based method (following Press et al. [54, Sec.13.9]) on a moderately oversampledgrid. Spectral accuracy

Trapezoidal quadrature applied to (42) is spectrally accurate, due to the C ∞ -regularity of the integrand. Detailed analysis on this appears in [44]. Moreover,the accuracy in computing the Fourier integral in the mixed transform using a simpleFFT-based approach strongly depends on the regularity of the integrand. Grid size corresponds to direct sum truncation

As is also extensively discussed in [44],the proposed method enjoys a close relationship between the Ewald sum (21) and thefast method of this section. A ﬁnite truncation | k | ≤ πk ∞ /L of the Ewald sum (21)corresponds to a grid of size M = 2 k ∞ + 1. Due to the construction of the PME-methodbased on Gaussians (with C ∞ -regularity), the truncation error from the Ewald sum car-ries over to the fast method, meaning that the appropriate grid size can be estimatedusing truncation error estimates for the Ewald sum. This is, as of yet, conjecture in theStokes 2P setting, veriﬁed numerically (cf. Figure 5, right). Parameters and truncation

The free parameter, η , can be used to control the width,denoted w , of the Gaussian used for the convolutions in (44), (42), and we ﬁnd it naturalto prescribe this width in terms of a number of grid points, w = LP/ (2 M ) = hP/ P ≤ M denote the number ofgrid points within the support of each Gaussian, as seen in Figure 4 (bottom). Moreover,it’s important to have control of the shape, parameterized with m >

0, of the Gaussians(independently of ξ ), see Figure 4 (top). It then follows naturally [44] to let η = (cid:18) wξm (cid:19) . (46) Periodicity

Above, k · k ∗ denotes that periodicity is implied. To prove that (42) equals(19) one needs this notion to be exact (which rules out the well-known “closest imageconvention”). Formally, as Gaussians lack compact support, it should be understoode.g. that e − β k x − x m k ∗ := P p ∈ Z e − β k x − x m + τ ( p ) k . After truncating the Gaussians, thisceases to be relevant – simply extend the domain to accommodate the support of thetruncated Gaussians; evaluate (44) on this domain, taking Euclidean distance betweenparticles and grid points; and then additively wrap the extended domain into the originalone. 17 ( x ) h h h m m m f η ( x ) x x Gaussian with supp P points on x ± w Figure 4: Top: Gaussians with diﬀerent shape parameters, m < m < m . Bottom: Gaus-sian with support on P grid points around x j Approximation errors

By approximation errors we mean the errors that the fast SE2Pmethod contributes (in addition to the spectrum truncation errors that are inheritedfrom a ﬁnite truncation of (21)). These fall into three categories: ( i ) the numericalerror from evaluating the integral (42) with a trapezoidal rule T P ; ( ii ) the truncationof the Gaussians; and, ﬁnally, ( iii ) the error in computing the Fourier integral in themixed transforms. Regarding ( i ) and ( ii ), we can prove a detailed error estimate [44,Thm. 3.1], namely that | T P − u F | ≤ C (cid:16) e − π P / (2 m ) + erfc (cid:16) m/ √ (cid:17)(cid:17) . (47)Naturally the ﬁrst term corresponds to ( i ) and the second to ( ii ). It also reveals that if m is taken large the ﬁrst term goes slower to zero, indicating that selecting m ≈ √ πP strikes a balance between the terms that is favorable for the total error. The ﬁnal,and 2P speciﬁc, error concerns the Fourier integral quadrature, and is, as noted above,examined in detail in [43]. Fast gridding

The main trade-oﬀ for a spectral method is that Gaussians have wider sup-port than e.g. the Cardinal B-splines (used in the SPME method [14]) have. By usingthe fast Gaussian gridding (FGG) procedure, proposed by Greengard & Lee for thenon-uniform FFT [21], we mediate this to a large extent. Again, this is examined in[44], and detailed algorithms are given. 18 −15 −10 −5 P s f = 1s f = 2s f = 3 −10 −5 k ∞ = (M−1)/2 e r m s Figure 5: Left: Approximation error in ∞ -norm for SE2P-Stokes method, and (dashed) errorestimate (47), as a function of the support of truncated Gaussians, P . Here, N = 20, M = 20, ξ = 4. Right: ( ∗ ) truncation error for the k -space Ewald sum (21), vs number of modes κ ∞ (i.e. | k | ≤ πκ ∞ /L ). As ( · ) error in SE2P-Stokes method vs. grid size, where P selected tomake approximation errors small (left). This isolates the “spectrum truncation” error of thefast method and shows that grid size in the fast method can be selected based error estimatespertaining only to the underlying Ewald sum. The last term to evaluate is the contribution from the singular integrals that were taken outof the k -space sum (21), u F, k =0 ( z m ) = − L N X n =1 g ( z m − z n ) I f n .g ( z ) := πz erf (cid:0) zξ (cid:1) + √ π ξ e − z ξ , (48)cf. the Laplace case [43]. Because g only depends on z it’s almost trivial to compute u F, k =0 using an interpolation approach. We advocate a Chebyshev method as follows: ( i ) evaluate u F, k =0 ( s ), where s denotes the set of M Gauss-points scaled to the relevant interval; ( ii )compute the M coeﬃcients of the interpolating Chebyshev polynomial, ζ ( z ), by means ofelementary recursions; ( iii ) evaluate u F, k =0 ( z m ) ≈ ζ ( z m ) using a numerically stable Clenshawformula [54, Sec. 5.4]. Properties of Chebyshev interpolation, including spectral accuracy,are discussed in e.g. Rivlin [56] and many elementary textbooks. A commonly cited drawback for Ewald methods, particularly for fast FFT-based ones, is thatthere are several parameters to choose appropriate values for. For 3P Laplace, this is exten-sively discussed by Deserno & Holm in their famous survey [12]. Recent results and extensivediscussion is found in [70]. In the 2P Laplace case, the situation is, unsurprisingly, quitemisty. Kawata & Nagashima [34] have proposed a method, which shares some foundations19ith the present work, where it’s suggest that an optimization approach be applied to tunea set of eight parameters. The situation is, hopefully, clariﬁed to a large extent in [43].We suggest that the parameters fall into three natural categories:1. The desired accuracy, ε , in the computed u with respect to a particular norm.2. The Ewald decomposition parameter, ξ , and the truncation of real- and k -space Ewaldsums, r c and k ∞ , for particular ε .3. Parameters relating to the fast method:a) the grid size, M , (in each dimension);b) the support, P , and shape, m , of Gaussians, in the FFT-based method for u F , andthe oversampling factor s f in the associated mixed transforms;c) the number of Gauss-points, M , in the interpolation method for u F, k =0 .Item one, choosing ε , comes ﬁrst – all other parameters will depend on it. We noted thatthe cost of the real-space sum can vary by orders of magnitude depending on it’s implemen-tation and whether a sparse matrix can be assembled and reused or not. With this in mind,we argue that the next consideration should be to select the real-space truncation r c suchthat the real-space is cheap to compute.Item two is clariﬁed by the truncation error estimates of Section 3.3. We noted that, interms of ε and r c we can take ξ = 1 r c vuuut log  ε s Qr c L  . (49)That is, ξ is chosen large enough that the error committed by truncating the real-space sumat r c is close to ε . Fourth, we invoke the equivalence between k -space truncation and grid inthe PME-type method. That is, the truncation error estimate (40) is inverted, k ∞ = √ Lξ π vuut W L / ξ Q / π / ε / ! , (50)where W ( · ) is Lambert W function [10], and the grid size selected as M = 2 k ∞ + 1.The remaining parameters, { P, m, s f } , are all accuracy parameters, pertaining to theapproximation errors added by the fast method, that don’t depend on the system. The onewith greatest impact on the computational cost is P , the discrete support of the truncatedGaussians, and (cf. Figure 5, left) rough estimates are that P = 15 is appropriate for accuracyaround 10 − and P = 23 is required for full (double-precision) accuracy. As previouslydetermined, it’s natural to let m = C √ πP , and we’ve found that this constant is best takenslightly below unity, C = 0 .

92. The FFT-oversampling in the z -direction, s f , should be atleast two, but should be four or six for higher accuracies (cf. Figure 5). Finally, the numberof Gauss-points in the 1D interpolation used to compute u F, k =0 should be small, M < Numerical results

In the introduction several areas of ongoing research involving boundary integral methods forStokes were noted, and we would like to put the present method into that context. This alsogives the opportunity to illustrate parameter selection and the practical characteristics of ourmethod.

As an example, consider a large number of rigid spheres, Γ i , i = 1 . . . N s , in 2P Stokes ﬂow. Let˜ f i denote the (unknown) Stokeslet density on Γ i . Under the constraint of planar periodicitywe write the velocity ﬁeld generated by all { ˜ f i , Γ i } as u ( x ) = 18 πµ X p ∈ Z N s X i =1 Z Γ i S ( x − y + τ ( p ))˜ f i ( y )dΓ( y ) . (51)Recall that the periodized Stokeslet sum (4) was given a well-deﬁned meaning by the 2PStokeslet Ewald decomposition (Section 3). The same logic shall apply to (51). We refer thereader to e.g. [51, 71] regarding which Green’s functions are required to represent particularﬂow cases, noting that for this external ﬂow it suﬃces to use the Stokeslet.The boundary integral setting is vastly expanded in comparison to the exposition hitherto.Issues that arise, as concisely reviewed in Ying et al. [71], include fast summation methods (towhich this work pertains), accurate quadrature rules (surveyed below) and domain boundaryrepresentation.Let each sphere Γ i be discretized with a set of quadrature points, x ij , j = 1 , . . . , N p , andlet T x denote integration with respect to x by a simple quadrature rule, T [ ζ ] := P j ζ ( x j ) w j ≈ R Γ ζ ( x )dΓ( x ). Moreover, let T ,k denote the so-called “punctured” rule, where w k ≡ x on some Γ i , the integral over Γ i in (51) is singular when p = 0. Discretizing it inthe Nystr¨om fashion, we apply the punctured rule, u i ( x m ) ∼ Z Γ i S ( x m − y )˜ f ( y )dΓ( y ) ≈ T ,m y [ S ( x m − y )˜ f ( y )] . This has one major beneﬁt and one drawback. The beneﬁt is that it renders the discretizationof (51) equivalent to the 2P Stokeslet ewald sum (36) – c.f. the exclusion of the term { n = m, p = 0 } in the real space sum (17). The drawback is that the punctured rule is only ﬁrstorder accurate.To obtain higher accuracy, without sacriﬁcing the structure that lets us apply the Ewalddecomposition, one can add a local correction, Υ m loc , to the punctured rule, u i ( x m ) ≈ T ,m y [ S ( x m − y )˜ f ( y )] + Υ m loc . Such local corrections to simple quadrature rules (that allow integrable singularities tobe handled) have been extensively investigated, and are, of course, unrelated to the planarperiodicity of the present problem. Progress has been on a case-by-case basis, with diﬀerentcorrections developed depending on the singularity and the geometry of the boundary, Γ.Early references here include Lyness [47] and Kapur & Rokhlin [33], with subsequent work21y e.g. Aguilar & Chen [1] and Marin et al. [49]. The latter gives formal proofs to theeﬀect, roughly, that on a uniform grid in d dimensions where the singularity has order q and p is the radius (in terms of grid points) of the modiﬁed quadrature rule, accuracy of order O ( h p +2+ q + d ) is attained. Furthermore, Marin et al. in [48] developed such formulas for theStokeslet, S , though only of the case of Γ being ﬂat.An alternative that could be expected to yield local corrections, in a way suitable for pair-ing with fast Ewald methods, is the contour integral formulation of Bazhlekov, Anderson andMeijer [4]. Other alternatives for quadrature over integrable singularities includes singularitysubtraction , as discussed in the rich survey by Pozrikidis [53], and methods based on carefullyselected variable transformations, as discussed in e.g. Sidi [62, 63] and Ying et al. [71].Moving on, for x m on Γ i , we discretize (51) as u ( x m ) ≈ πµ  T ,m y [ S ( x m − y ) ˜ f i ( y )] + Υ m loc + N s X j =1 X p ∈ Z \ T y [ S ( x m − y + τ ( p )) ˜ f j ( y )]  = 18 πµ (Ew ( x , f ) + Υ m loc ( S, f )) . (52)Here, x denotes the set of N = N s N p discretization points (i.e. x = { x ij : i = 1 , . . . , N s , j =1 , . . . , N p } ). Correspondingly, f denotes the set of discrete Stokeslet strengths multiplied byappropriate quadrature weights, f = { ˜ f ij w j } .The mapping f → u , for ﬁxed x , is to be understood as a linear system of dimension3 N × N ; knowing u , we can solve for f . Such an inversion is by necessity iterative, as themapping does not have a closed form. Indeed, the Ewald summation method is requiredto make the action of the periodized Stokeslet convergent (including constraints on f andphysical conditions at z → ±∞ ).The fast Ewald method of Section 4 acts as an O ( N log N ) “matvec” operator, for useinside an iterative solver (such as GMRES). The ultimate complexity then depends on theconvergence of this iterative procedure (whether the number of GMRES iterations dependson N or not), which leads to the expansive topic of preconditioning; Ying et al. [71] use amethod due to Greengard et al. [20], wherein additional references are found.In this context it’s natural to remark that a second-kind boundary integral formulation isexpected to be more advantageous with respect to iteration convergence than the ﬁrst-kindequation (51). This motivates the development of Ewald decompositions, and associated fastmethods, for the double-layer (“doublet” or “stresslet”) Stokes potential.Finally, it should be noted that investigations of physical systems generally pose theboundary integral problem in a diﬀerent form. Rather than having boundary conditions for u on all particles, kinematic conditions (which include boundary integration) are used and thesystem is closed using force and torque balances. Such a formulation is used for sedimentingﬁbers [69, 58, 68]. Ewald techniques apply and can handle the main computational task inthese sedimentation problems, though additional considerations emerge. Having posed a relevant boundary integral problem, we now discuss how to deploy our methodin detail and give an indication of how eﬃcient it is. Several of the topics touched upon in theprevious section, such as preconditioning and local quadrature correction terms, are beyond22igure 6: Nearly uniform point distribution on sphere by subdividing icosahedron. Left toright: N p = 12 , , We consider a system with N s spheres, each with a radius r s , in a domain Ω = [0 , L ) , andassume that the spheres’ center are separated by at least 3 r s . A nearly uniform distributionof points on each sphere, as seen in Figure 6, is obtained by subdividing an icosahedron [45].We target an accuracy ε ≈ − in the Ewald method. The cost of computing the 2PStokeslet Ewald sums (17) and (21) depends on the density of the system. Throughout, weshall take an initial number of spheres N s in a domain with L = L and then grow the systemat constant density to, say, L = 3 L .Assuming uniformity, the number of points within a ball with radius r c is N n ( r c ) =4 πr c N s N p / (3 L ). This represents the number of near neighbors that have to be consideredfor a particular real-space truncation radius, r c . The cost of computing the real-space sum(cf. Section 4.1) is roughly N s N p N n ( r c ) λ , where λ represents the arithmetic cost associatedwith each interaction, plus the cost of ﬁnding the neighbor list. The factor λ is one (i.e.one ﬂoating-point multiply-add) if the real-space sum is computed as a sparse matrix-vectormultiplication and on the order of 100-1000 otherwise (because of the computations of erf( · )and exp( · )).Table 1 attempts to give an overview of the relationship between real-space sum cost, thesystem size, and what is then implied for the k -space sum by the inverted error estimatesof Section 4.4. As expected, making the systems denser requires ξ to grow for the costof the real-space sum (in terms of near neighbors) to remain ﬁxed. Correspondingly, theFFT-grid M = 2 k ∞ grows to maintain a ﬁxed accuracy. In what remains, we shall take P = 16, the FFTs oversampled by a factor four in the z -direction and M = 80 Gauss pointsin interpolation method for (35). With r c = 1 / , ξ = 9 , M = 30, and x the points from 5randomly placed spheres, we compute RMS errors: 2 . × − and 6 . × − for the real-spacesum and SE2P method respectively. In Figures 7 and 8 we give timing results as the system is scaled up and show how the run-time is distributed between the steps in the algorithm. Here we ﬁx the average number ofnear neighbors at roughly 400 and 1000 respectively, and scale the system up under thatconstraint. The following remarks clarify the situation:23iven Derived N s = 100 , N p = 12 , N n = 100 r c = 0 . , ξ = 17 . , k ∞ = 27 N s = 100 , N p = 12 , N n = 1000 r c = 0 . , ξ = 8 . , k ∞ = 12 N s = 100 , N p = 42 , N n = 100 r c = 0 . , ξ = 26 . , k ∞ = 41 N s = 100 , N p = 42 , N n = 1000 r c = 0 . , ξ = 12 . , k ∞ = 19 N s = 1000 , N p = 12 , N n = 100 r c = 0 . , ξ = 37 . , k ∞ = 58 N s = 1000 , N p = 12 , N n = 1000 r c = 0 . , ξ = 17 . , k ∞ = 27 N s = 1000 , N p = 42 , N n = 100 r c = 0 . , ξ = 57 . , k ∞ = 89 N s = 1000 , N p = 42 , N n = 1000 r c = 0 . , ξ = 26 . , k ∞ = 41Table 1: Parameter examples for 2P Stokeslet Ewald summation, with ε = 10 − and L = 1. N s : number of spheres. N p : number of points per sphere. N n : number of near neighbors toaccount for in real-space sum. Right column: parameter values selected based on truncationerror estimates (39) and (40). Remark 1 (Real space computation)

The real space sum is evaluated in the matrix formas discussed in Section 4.1. The matrix is computed once and the time to do so is amor-tized over a hypothetical number of iterations: 50 (which is reasonable, see [58]). This pre-computation contains two parts: ﬁrst the matrix elements are computed (a), then the sparsematrix is ﬁnalized (b). The real-space contribution (17) is computed as a matrix-vector prod-uct (c). The bars in Figures 7 and 8 (left) are stacked from bottom to top in the order (a),(b), (c) – showing that, at 50 iterations, constructing the sparse matrix is still the dominatingcost.

Remark 2 (Computation of k-space method)

The tasks in the FFT-based method tocompute u F are as follows: (a) compute grid function (44) ; (b) compute oversampled trans-forms; (c) solve Poission problem (45) ; and (d) compute convolution with Gaussians (42) .The bar stacks in Figures 7 and 8 (right) breaks down the total time for the method in thesesteps from bottom to top.Our implementation was mostly in Matlab code, and, hence, FFTs were handled by thehighly optimized library FFTW [16] . The code for gridding (44) and integration (42) withthe FGG algorithm [21, 44] was implemented in C and the time-critical parts were hand-codedat machine level with SSE instructions and software prefetching. These implementations runclose to peak ﬂops of present hardware. An untuned implementation in C runs roughly at athird of the speed, making the gridding steps dominate transforms. Remark 3 (Computation of u F, k =0 ) The computation of u F, k =0 as discussed in Section4.3 involves computing πz erf (cid:0) zξ (cid:1) + √ π ξ e − z ξ for z = z m − z n , where n = 1 , . . . , N and m = 1 , . . . , M (the number of Gauss-points for the Chebyshev interpolation). In the iterative By convention, the fast Fourier transform is said to have complexity O ( N log N ), but this concept encom-passes a lot of variability which is seen in practice. For instance, an FFT of length 2 n will typically be severaltimes faster than a transform of length 2 n + 1. The grid sizes seen in the scaling tests here try to avoid thebest and worst case transform lengths, to faithfully represent the method. w a ll ti m e w a ll ti m e ξ = 9, r c = 1 / ε = 10 − , N n ≈ L = 1 1.2 1.4 1.6 1.8 2 2.2 N s = 80 138 220 328 467 640 852 N = 2880 4968 7920 11808 16812 23040 30672 M = 30 36 42 48 54 60 66Figure 7: Runtime to compute (36) as system is scaled up, keeping the number of nearneighbors ﬁxed at around 400, according to the table above. Left:

Real-space sum, cf.Remark 1.

Right:

FFT-based k -space method, cf. Remark 2. setting, where we wish to rapidly evaluate u F, k =0 for many f , this arithmetic should be pre-computed and stored in a matrix, A . Then the sum in (48) is computed by multiplying A with f i , i = 1 , . The interpolation procedure follows, which is very fast. Corresponding to theruns in Figures 7 and 8, the times to compute u F, k =0 were vanishingly small. We note that balancing the computational cost of the real-space sum and k -space fastmethod is non-trivial. At a basic level, it hinges on how dense the system is. Evaluating thereal-space sum using a neighbor list method (matrix-form or not) requires the list to be stored.In the iterative setting, the matrix form oﬀers superior performance – so much so that themain constraint of the real-space method becomes memory (i.e. limiting r c for a particular N ). In the fast k -space method the split in computational burden between transforms andgridding also depends on density – or rather, it depends on the Ewald parameter ξ , which hasto increase to keep the number of near neighbors to account for in the real-space sum ﬁxed ifthe system is made denser. In this paper we have derived a Ewald decomposition for the Stokeslet in planar periodicity(Section 3.2) and a PME-type O ( N log N ) method for the fast evaluation of the resultingexpression. The decomposition is the natural 2P counterpart to the classical 3P decompositionby Hasimoto [24]. Truncation error estimates are provided to aid in selecting parameters(Section 3.3). The fast, PME-type, method (Section 4) for computing the k -space sum (21)is based on a mixed sum/integral form (19), and is similar to the method by the present25 .6 0.8 1 1.2 1.4 1.6 x 10 w a ll ti m e w a ll ti m e ξ = 9, r c = 1 / ε = 10 − , N n ≈ L = 1 1.1 1.2 1.3 1.4 N s = 180 240 311 395 494 N = 6480 8640 11196 14220 17784 M = 30 33 36 39 42Figure 8: Runtime to compute (36) as system is scaled up, keeping the number of nearneighbors ﬁxed at around 1000, according to the table above. Left:

Real-space sum, cf.Remark 1.

Right:

FFT-based k -space method, cf. Remark 2.authors [43] for the corresponding problem in electrostatics. This appear to be the ﬁrst fastmethod for computing Stokeslet Ewald sums in planar periodicity, and has three attractiveproperties: it is spectrally accurate; it uses the minimal amount of memory that a griddedEwald method can use; and a clear view of numerical errors and how to choose parametersis provided (Section 4.4). The ﬁnal part explores the practicalities of the proposed method,and surveys the computational issues involved in applying it to 2-periodic boundary integralStokes problems. We presently pursue applications-oriented questions in this regard, andhope to communicate further results in the near future. A 2P Stokeslet sums, details

A.1 Explicit k-space form of Stokeslet sum

Here we compute ˜ u ( x m ) = 4 L X k =0 Z R (cid:18) I k k k − k ⊗ k k k k (cid:19) N X n =1 f n e i k · ( x m − x n ) d κ, (53)where k = ( k , κ ) = ( k , k , κ ) is the composition of the discrete and continuous transformvariables. The integrals in (53) are computable. We let˜ u ( x m ) = 4 L X k =0 N X n =1 ˜ Q ( k , z m − z n ) f n e i k · ( ρ m − ρ n ) , Q ( k , z ) := Z R (cid:18) I k k k − k ⊗ k k k k (cid:19) e iκz d κ = Z R (cid:18) I k k k + κ − k ⊗ k ( k k k + κ ) (cid:19) e iκz d κ =: ˜ Q + ˜ Q , where ˜ Q and ˜ Q denote integrals over the diagonal- and outer product terms respectively.Note that, since k = 0, k > L pq ( k, z ) := Z ∞ κ p cos( κz )( k + κ ) q d κ and M pq ( k, z ) := Z ∞ κ p sin( κz )( k + κ ) q d κ, for which we have, with X = L, M , that X pq +1 = − kq ∂X pq ∂k , q ∈ Z , q > . (54)Evidently, ˜ Q ( k , z ) = 2 I L ( k k k , z ) . (55)The integral here is known [74, 3.723(2), p. 424], L ( k, z ) = πe − k | z | k , k > . To simplify matters in computing ˜ Q – reducing the number of integrals to labor over – wenote that the integrand can be viewed as an element-wise multiplication of( k , ⊗ ( k ,

1) =  k k k k k k k k k k  and (1 , , κ ) ⊗ (1 , , κ )( k k k + κ ) = ( k k k + κ ) −  κ κκ κ κ  where the former is a constant under the integration. The latter demonstrates that there arethree integrals to compute and arrange as:˜ Q ( k , z ) = −  k L ( k k k , z ) k k L ( k k k , z ) ik M ( k k k , z ) k k L ( k k k , z ) k L ( k k k , z ) ik M ( k k k , z ) ik M ( k k k , z ) ik M ( k k k , z ) L ( k k k , z )  . Having already found L , (54) gives that L ( k, z ) = πe − k | z | ( k | z | + 1)4 k . L = L − k L , so that L ( k, z ) = − πe − k | z | ( k | z | − k , by (54). Finally, M ( k, z ) = − ∂∂z L ( k, z ) = πze − k | z | k . Thus, we have obtained an explicit form of the k -space 2P Stokeslet sum (53):˜ u ( x m ) = 4 L X k =0 N X n =1 ˜ Q ( k , z m − z n ) f n e i k · ( ρ m − ρ n ) , where ˜ Q ( k , z ) = 2  L − k L − k k L − ik M − k k L L − k L − ik M − ik M − ik M L − L  ( k k k , z )= e −k k k| z | k k k  π − ( π k k k| z | + π ) k k k k − π ( k k k| z | +1) k k k k k − iπzk − π ( k k k| z | +1) k k k k k π − ( π k k k| z | + π ) k k k k − iπzk − iπzk − iπzk ( π k k k| z | + π )  . A.2 Explicit k-space form of regularized Stokeslet sum

Our objective here is to compute the integral Q , given in (20), to obtain an explicit 2PStokeslet Ewald sum. The tensor B (7) contains a diagonal and an outer product term, andwe integrate them separately. That is, let Q ( k , z ) = Q I ( k , z ) + Q k ⊗ k ( k , z ) , where Q I ( k , z ) = I e −k k k / ξ Z R (cid:18) ξ + 1 k k k (cid:19) e − κ / ξ e iκz d κ = I e −k k k / ξ Z R (cid:18) ξ + 1 k k k + κ (cid:19) e − κ / ξ e iκz d κ and Q k ⊗ k ( k , z ) = − e −k k k / ξ Z R (cid:18) ξ k k k + 1 k k k (cid:19) ( k ⊗ k ) e − κ / ξ e iκz d κ = − e −k k k / ξ Z R (cid:18) ξ ( k k k + κ ) + 1( k k k + κ ) (cid:19) ( k ⊗ k ) e − κ / ξ e iκz d κ. For non-negative integers p and q we let˜ J pq ( k, z ) := Z ∞ κ p cos( κz )( k + κ ) q e − κ / ξ d κ, J pq ( k, z ) := e − k / ξ ˜ J pq ( k, z )28nd ˜ K pq ( k, z ) := Z ∞ κ p sin( κz )( k + κ ) q e − κ / ξ d κ, K pq ( k, z ) := e − k / ξ ˜ K pq ( k, z ) . With these deﬁnitions, it is evident that Q I ( k , z ) = 2 J ( z )4 ξ + J ( k k k , z ) ! I . The ﬁrst integral is elementary, ˜ J ( z ) = √ πξe − z ξ , (56)and the second can be found [74, 3.954(2), p. 504]˜ J ( k, z ) = πe k / ξ k (cid:18) e − kz erfc (cid:18) k ξ − zξ (cid:19) + e kz erfc (cid:18) k ξ + zξ (cid:19)(cid:19) , k > . (57)Moving on to Q k ⊗ k , one studies the symmetries of the integrand as when computing ˜ Q , toﬁnd that Q k ⊗ k ( k , z ) = −  k (cid:16) J ξ + J (cid:17) k k (cid:16) J ξ + J (cid:17) k (cid:16) iK ξ + iK (cid:17) k k (cid:16) J ξ + J (cid:17) k (cid:16) J ξ + J (cid:17) k (cid:16) iK ξ + iK (cid:17) k (cid:16) iK ξ + iK (cid:17) k (cid:16) iK ξ + iK (cid:17) J ξ + J  ( k k k , z ) . The integrals present are related, as previously, via (54) and obvious algebraic relationships.With ˜ J and ˜ J known, one ﬁnds ˜ J ( k, z ) = − k ∂∂k ˜ J ( k, z )˜ J ( k, z ) = ˜ J ( k, z ) − k ˜ J ( k, z )˜ J ( k, z ) = − k ∂∂k ˜ J ( k, z ) . Moving on to the anti-symmetric integrals K pq , one can ﬁnd [74, 3.954(1),p. 504]˜ K ( k, z ) = πe k / ξ (cid:18) e − kz erfc (cid:18) k ξ − zξ (cid:19) − e kz erfc (cid:18) k ξ + zξ (cid:19)(cid:19) , and, consequently, ˜ K ( k, z ) = − k ∂∂k ˜ K ( k, z ) . This completes the delicate matter of integration. There remains to ﬁnd compact expressionsfor the J pq - and K pq terms that were left as derivatives above. The forms of (56) and (57)suggest that we look for terms λ := e − k / ξ − ξ z θ + := e kz erfc (cid:18) k ξ + ξz (cid:19) θ − := e − kz erfc (cid:18) k ξ − ξz (cid:19) . With this, explicit forms of J pq and K pq , as given in (27)-(33) follow.29 Conventions for Fourier transforms and series

For compactness, we use a non-unitary form of the Fourier transform. For a function, f ( x ), x ∈ R n , which is periodic with respect to Ω ⊂ R n , we have the Fourier series deﬁned as f ( x ) = X k ˆ f k e i k · x ˆ f k = 1 | Ω | Z Ω f ( x ) e − i k · x d x, where k ∈ { π n /L : n ∈ Z n } . The corresponding integral transform for functions that decaysuﬃciently fast on R n is then naturally f ( x ) = 1(2 π ) n Z R n ˆ f (¯ κ ) e i ¯ κ · x d¯ κ ˆ f (¯ κ ) = Z R n f ( x ) e − i ¯ κ · x d x , ¯ κ ∈ R n . References [1] J. C. Aguilar and Y. Chen. High-order corrected trapezoidal quadrature rules for theCoulomb potential in three dimensions.

Comput. Math. Appl. , 49:625–631, February2005.[2] A. Arnold, J. de Joannis, and C. Holm. Electrostatics in periodic slab geometries. I.

J.Chem. Phys. , 117:2496–2502, 2002.[3] G. K. Batchelor. Slender-body theory for particles of arbitrary cross-section in Stokesﬂow.

J. Fluid Mech. , 44:419–440, 1970.[4] I. Bazhlekov, P. Anderson, and H. Meijer. Nonsingular boundary integral method fordeformable drops in viscous ﬂows.

Phys. Fluids , 16:1064–1081, April 2004.[5] I. Bazhlekov, P. Anderson, and H. Meijer. Numerical investigation of the eﬀect of in-soluble surfactants on drop deformation and breakup in simple shear ﬂows.

J. Colloid.Interf. Sci. , 298:369–394, 2006.[6] J. F. Brady and G. Bossis. Stokesian Dynamics.

Annu. Rev. Fluid Mech. , 20:111–157,1988.[7] J. E. Butler and E. S. G. Shaqfeh. Dynamic simulations of the inhomogeneous sedimen-tation of rigid ﬁbres.

J. Fluid Mech. , 468:205–237, October 2002.[8] S. Childress, S. E. Spagnolie, and T. Tokieda. A bug on a raft: recoil locomotion in aviscous ﬂuid.

J. Fluid Mech. , 669:527–556, February 2011.[9] N. Coq, A. Bricard, F.-D. Delapierre, L. Malaquin, O. du Roure, M. Fermigier, andD. Bartolo. Collective beating of artiﬁcial microcilia.

Phys. Rev. Lett. , 107:014501, Jun2011.[10] R. M. Corless, G. H. Gonnet, D. E. G. Hare, D. J. Jeﬀrey, and D. E. Knuth. On theLambert W function.

Adv. Comput. Math. , 5:329–359, 1996.3011] T. Darden, D. York, and L. Pedersen. Particle mesh Ewald - an N.log(N) method forEwald sums in large systems.

J. Chem. Phys. , 98:10089–10092, 1993.[12] M. Deserno and C. Holm. How to mesh up Ewald sums. I. A theoretical and numericalcomparison of various particle mesh routines.

J. Chem. Phys. , 109:7678–7693, 1998.[13] A. Dutt and V. Rokhlin. Fast Fourier-transforms for nonequispaced data.

SIAM J. Sci.Comput. , 14:1368–1393, 1993.[14] U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee, and L. G. Pedersen. Asmooth particle mesh Ewald method.

J. Chem. Phys. , 103:8577–8593, 1995.[15] D. Frenkel and B. Smit.

Understanding Molecular Simulation . Academic Press, 2 edition,11 2001.[16] M. Frigo and S. G. Johnson. The design and implementation of FFTW3.

P. IEEE ,93:216–231, 2005.[17] D. Gonzalez-Rodriguez and E. Lauga. Reciprocal locomotion of dense swimmers in Stokesﬂow.

J. Phys.-Condens. Mat. , 21:204103, May 2009.[18] T. G¨otz. Simulating particles in Stokes ﬂow.

J. Comput. Appl. Math. , 175:415–427, 2005.[19] L. Greengard and M. C. Kropinski. Integral equation methods for Stokes ﬂow in doubly-periodic domains.

J. Eng. Math. , 48:157–170, 2004.[20] L. Greengard, M. C. Kropinski, and A. Mayo. Integral equation methods for Stokes ﬂowand isotropic elasticity in the plane.

J. Comput. Phys. , 125:403–414, May 1996.[21] L Greengard and JY Lee. Accelerating the nonuniform fast Fourier transform.

SIAMRev. , 46(3):443–454, 2004.[22] L. Greengard and V. Rokhlin. A fast algorithm for particle simulations.

J. Comput.Phys. , 73:325–348, 1987.[23] A. Grzybowski, E. Gwozdz, and A. Brodka. Ewald summation of electrostatic inter-actions in molecular dynamics of a three-dimensional system with periodicity in twodirections.

Phys. Rev. B , 61:6706–6712, 2000.[24] H. Hasimoto. On the periodic fundamental solutions of the Stokes equations and theirapplication to viscous ﬂow past a cubic array of spheres.

J. Fluid Mech , 5:317–328, 1959.[25] J. P. Hernandez-Ortiz, J. J. de Pablo, and M. D. Graham. Fast computation of many-particle hydrodynamic and electrostatic interactions in a conﬁned geometry.

Phys. Rev.Lett. , 98, April 2007.[26] R.W. Hockney and J.W. Eastwood.

Computer Simulation Using Particles . McGraw-Hill,New York, 1981.[27] I. D. Hosein and C. M. Liddell. Convectively assembled asymmetric dimer-based colloidalcrystals.

Langmuir , 23:10479–10485, October 2007.3128] K. Ishii. Viscous-ﬂow past multiple planar arrays of small spheres.

J. Phys. Soc. Jpn. ,46:675–680, 1979.[29] T. Ishikawa, G. Sekiya, Y. Imai, and T. Yamaguchi. Hydrodynamic interactions betweentwo swimming bacteria.

Biophys. J. , 93:2217–2225, September 2007.[30] T. Ishikawa, M. P. Simmonds, and T. J. Pedley. Hydrodynamic interaction of twoswimming model micro-organisms.

J. Fluid Mech. , 568:119–160, December 2006.[31] P. J. A. Janssen and P. D. Anderson. Boundary-integral method for drop deformationbetween parallel plates.

Phys. Fluids , 19:043602, April 2007.[32] A. Kanevsky, M. J. Shelley, and A.-K. Tornberg. Modeling simple locomotors in Stokesﬂow.

J. Comput. Phys. , 229:958–977, February 2010.[33] S. Kapur and V. Rokhlin. High-order corrected trapezoidal quadrature rules for singularfunctions.

SIAM J. Numer. Anal. , 34:1331–1356, August 1997.[34] M. Kawata and U. Nagashima. Particle mesh Ewald method for three-dimensional sys-tems with two-dimensional periodicity.

Chem. Phys. Lett. , 340:165–172, 2001.[35] E. E. Keaveny and M. J. Shelley. Applying a second-kind boundary integral equationfor surface tractions in Stokes ﬂow.

J. Comput. Phys. , 230:2141–2159, March 2011.[36] S. N. Khaderi, M. G. H. M. Baltussen, P. D. Anderson, D. Ioan, J. M. J. den Toonder,and P. R. Onck. Nature-inspired microﬂuidic propulsion using magnetic actuation.

Phys.Rev. E , 79:046304, Apr 2009.[37] J. Koiller, K. Ehlers, and R. Montgomery. Problems and progress in microswimming.

J.Nonlinear Sci. , 6:507–541, November 1996.[38] G. Kokot, Vilfan M, N. Osterman, A. Vilfan, B. Kavcic, I. Poberaj, and D. Babic.Measurement of ﬂuid ﬂow generated by artiﬁcial cilia.

Biomicroﬂuidics , 5(3):034103,2011.[39] J. Kolafa and J. W. Perram. Cutoﬀ errors in the Ewald summation formulas for point-charge systems.

Mol. Simulat. , 9:351–368, 1992.[40] K. N. Kudin and G. E. Scuseria. Revisiting inﬁnite lattice sums with the periodic fastmultipole method.

J. Chem. Phys. , 121:2886–2890, 2004.[41] A. Kumar and J. J. L. Higdon. Particle mesh Ewald Stokesian dynamics simulations forsuspensions of non-spherical particles.

J. Fluid Mech. , 675:297–335, May 2011.[42] D. Lindbo and A.-K. Tornberg. Spectrally accurate fast summation for periodic Stokespotentials.

J. Comput. Phys. , 229:8994 – 9010, 2010.[43] D. Lindbo and A.-K. Tornberg. Fast and spectrally accurate Ewald summation for 2-periodic electrostatic systems. arXiv:1109.1667 (2011), http://arxiv.org/abs/1109.1667,2011.[44] D. Lindbo and A.-K. Tornberg. Spectral accuracy in fast Ewald-based methods forparticle simulations.

J. Comput. Phys. , 230:8744–8761, 2011.3245] G. Luigi. Matlab code: BuildSphere. http://giaccariluigi.altervista.org/blog/, April2009. Retreived 08/19/2011.[46] F. Lundell, L. D. S¨oderberg, and P. H. Alfredsson. Fluid mechanics of papermaking.

Annu. Rev. Fluid Mech. , 43:195–217, 2011.[47] J. N. Lyness. Error functional expansion for N-dimensional quadrature with an integrandfunction singular at a point.

Math. Comput. , 30:1–23, 1976.[48] O. Marin, K. Gustavsson, and A.-K. Tornberg. A wall treatment for conﬁned Stokesﬂow. In preparation, 2011.[49] O. Marin, O. Runborg, and A.-K. Tornberg. Corrected trapezoidal rules for a class ofsingular functions. In preparation, 2011.[50] M. Mazars. Lekner summations and Ewald summations for quasi-two-dimensional sys-tems.

Mol. Phys. , 103:1241–1260, 2005.[51] C. Pozrikidis.

Boundary Integral and Singularity Methods for Linearized Viscous Flow .Cambridge University Press, 1992.[52] C. Pozrikidis. Computation of periodic Green’s functions of Stokes ﬂow.

J. Eng. Math. ,30:79–96, 1996.[53] C. Pozrikidis. Interfacial dynamics for Stokes ﬂow.

J. Comput. Phys. , 169:250–301, 2001.[54] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery.

Numerical Recipes3rd Edition: The Art of Scientiﬁc Computing . Cambridge University Press, 2007.[55] E. M. Purcell. Life at low Reynolds-number.

Am. J. Phys. , 45:3–11, 1977.[56] T. J. Rivlin.

Chebyshev Polynomials: From Approximation Theory to Algebra and Num-ber Theory . Wiley-Interscience, 2 edition, 1990.[57] S. K. Saha and G. P. Celata. Advances in modelling of biomimetic ﬂuid ﬂow at diﬀerentscales.

Nanoscale Res. Lett. , 6:344, April 2011.[58] D. Saintillan, E. Darve, and E. S. G. Shaqfeh. A smooth particle-mesh Ewald algorithmfor Stokes suspension simulations: The sedimentation of ﬁbers.

Phys. Fluids , 17, March2005.[59] D. Saintillan and M. J. Shelley. Orientational order and instabilities in suspensions ofself-locomoting rods.

Phys. Rev. Lett. , 99:058102, August 2007.[60] Y. B. Shan, J. L. Klepeis, M. P. Eastwood, R. O. Dror, and D. E. Shaw. Gaussian splitEwald: A fast Ewald mesh method for molecular simulation.

J. Chem. Phys. , 122:054101,2005.[61] M. J. Shelley and T. Ueda. The Stokesian hydrodynamics of ﬂexing, stretching ﬁlaments.

Physica D , 146:221 – 245, 2000.[62] A. Sidi. Numerical integration over smooth surfaces in R-3 via class S(m) variable trans-formations. Part II: Singular integrands.

Appl. Math. Comput. , 181:291–309, October2006. 3363] A. Sidi. Further extension of a class of periodizing variable transformations for numericalintegration.

J. Comput. Appl. Math. , 221:132–149, November 2008.[64] A. Sierou and J. F. Brady. Accelerated Stokesian Dynamics simulations.

J. Fluid Mech. ,448:115–146, 2001.[65] S. E. Spagnolie and E. Lauga. Jet propulsion without inertia.

Phys. Fluids , 22:081902,August 2010.[66] S. E. Spagnolie and E. Lauga. The optimal elastic ﬂagellum.

Phys. Fluids , 22:031901,March 2010.[67] H. Takemoto, T. Ohyama, and A Tohsaki. Direct sum of Coulomb potential withoutambiguities of conditionally convergent series.

Prog. Theor. Phys. , 109:563–573, 2003.[68] A.-K. Tornberg and K. Gustavsson. A numerical method for simulations of rigid ﬁbersuspensions.

J. Comput. Phys. , 215:172–196, June 2006.[69] A.-K. Tornberg and M. J. Shelley. Simulating the dynamics and interactions of ﬂexibleﬁbers in Stokes ﬂows.

J. Comput. Phys. , 196:8–40, May 2004.[70] H. Wang, F. Dommert, and C. Holm. Optimizing working parameters of the smoothparticle mesh Ewald algorithm in terms of accuracy and eﬃciency rid.

J. Chem. Phys. ,133:034117, July 2010.[71] L. X. Ying, G. Biros, and D. Zorin. A high-order 3d boundary integral equation solverfor elliptic pdes in smooth domains.

J. Compup. Phys. , 219(1):247–275, 2006.[72] Y. N. Young, M. R. Booty, M. Siegel, and J. Li. Inﬂuence of surfactant solubility on thedeformation and breakup of a bubble or capillary jet in a viscous ﬂuid.

Phys. Fluids , 21,July 2009.[73] H. Zhou and C. Pozrikidis. Adaptive singularity method for Stokes ﬂow past particles.

J. Comput. Phys. , 117:79 – 89, 1995.[74] D. Zwillinger, editor.