A scalable computational platform for particulate Stokes suspensions
Wen Yan, Eduardo Corona, Dhairya Malhotra, Shravan Veerapaneni, Michael Shelley
AA scalable computational platform for particulate Stokes suspensions
Wen Yan a, ∗ , Eduardo Corona c , Dhairya Malhotra b , Shravan Veerapaneni d , Michael Shelley a,b a Center for Computational Biology, Flatiron Institute, Simons Foundation b Courant Institute of Mathematical Sciences, New York University c Department of Mathematics, New York Institute of Technology d Department of Mathematics, University of Michigan
Abstract
We describe a computational framework for simulating suspensions of rigid particles in Newtonian Stokesflow. One central building block is a collision-resolution algorithm that overcomes the numerical constraintsarising from particle collisions. This algorithm extends the well-known complementarity method for non-smoothmulti-body dynamics to resolve collisions in dense rigid body suspensions. This approach formulates the collisionresolution problem as a linear complementarity problem with geometric ‘non-overlapping’ constraints imposed ateach time-step. It is then reformulated as a constrained quadratic programming problem and the Barzilai-Borweinprojected gradient descent method is applied for its solution. This framework is designed to be applicable for anyconvex particle shape, e.g., spheres and spherocylinders, and applicable to any Stokes mobility solver, includingthe Rotne-Prager-Yamakawa approximation, Stokesian Dynamics, and PDE solvers (e.g., boundary integral andimmersed boundary methods). In particular, this method imposes Newton’s Third Law and records the entirecontact network. Further, we describe a fast, parallel, and spectrally-accurate boundary integral method tailoredfor spherical particles, capable of resolving lubrication effects. We show weak and strong parallel scalings up to8 × particles with approximately 4 × degrees of freedom on 1792 cores. We demonstrate the versatilityof this framework with several examples, including sedimentation of particle clusters, and active matter systemscomposed of ensembles of particles driven to rotate.
1. Introduction
Particulate suspensions are important to both technology and fundamental science. The presence of particlessuspended in fluid leads to rich rheological behaviors of the mixture [ ] , and novel applications. For example,shear-thickening colloidal fluids have been used to reinforce the Kevlar-woven fabrics [ ] in bulletproof vests.Colloidal suspensions are important model systems in the study of jamming [ ] and phase transitions [ ] . Activesuspensions [ ] are prototypes of active matter [ ] , where the immersed particles may be self-propelled, drivento rotate, or coupled together by biologically active cross-linkers [
7, 8 ] .Here we discuss simulating the dynamics of particulate suspensions in a Newtonian solvent. The accuratepredictions of the dynamic properties of such systems are key to understanding their behaviors and to conceivingnew applications. However, simulating such systems is difficult. First, the intrinsic length scales in Stokes suspen-sions can be large due to long-range many-body hydrodynamic interactions (HIs hereafter). Second, boundaryconditions (no-slip, slip, electrostatic, magnetic, etc.) on each particle must be accurately satisfied. Otherwise,for example, the osmotic pressure measured from dynamic simulations may show significant deviations from itstrue value due to the numerical error in resolving particle-particle collisions [ ] . Third, such mixture systemsmust often be tracked for long times due to slow relaxation processes or significant Brownian noise in the system.Many numerical methods have been developed to simulate suspensions in a Newtonian solvent. The Rotne-Prager-Yamakawa tensor [
10, 11, 12, 13, 14, 15, 16, 17 ] is a popular approximation to account for HIs. This ∗ Corresponding author
Email address: [email protected], [email protected] (Wen Yan)
Preprint submitted to Elsevier May 18, 2020 a r X i v : . [ c s . C E ] M a y ethod keeps the mobility matrix (cid:77) symmetric-positive-definite (SPD), but is rather crude for dense colloidalsuspensions. A more accurate method is Stokesian Dynamics [
18, 19, 20, 21, 22 ] , which splits the HIs into afar-field and near-field part. The far-field part represents the HIs through multipole expansions truncated atthe stresslet level. The near-field lubrication effects are then added pairwise between particles with asymptoticlubrication resistance functions. Stokesian Dynamics has also been extended to electrorheological suspensions [
23, 24 ] and non-spherical particles [ ] . However, it is difficult to improve the accuracy of Stokesian Dynamicsfurther due to the algebraic complexity of including higher force moments on particles beyond the stresslet level.In all such computational methods, the efficient handling of collisions between particles remains a long-standing problem. Theoretically, for smooth rigid particles with no-slip boundary conditions and moving underfinite forces, lubrication effects prevents their collision [ ] . However, collisions are inevitable in simulationsbecause numerically it is impossible to fully resolve the opposing lubrication effects with the finite time-stepsizes and finite accuracy of the HI solvers. Further, particles with surface roughness or surface slip velocity mayactually collide. This feature is believed to underly the observed loss of flow reversibility in the shearing of non-Brownian, “spherical” particle suspensions [ ] . In simulations, such collisions are usually handled, or resolved,by using a prescribed short-range, pairwise, repulsive potential between particles. For example, an exponentiallydecaying pairwise repulsive force is used in Stokesian Dynamics. Such pairwise potentials must be steep andtherefore imposes a restrictive upper bound on the time-step size to maintain stability. Penalty methods [ ] work similarly and suffer from the same stability constraints. To build a stable and efficient numerical scheme, we reconsider the collision resolution method. Pairwise po-tentials and penalty methods compute collision forces on each particle at the start of each time-step. Alternatively,collision forces can be solved for by imposing non-overlapping constraints between bodies on particle configura-tions over time. In this way, the stability issues induced by the stiffness induced by potentials or penalties can beavoided. Foss and Brady [ ] applied a ‘potential-free’ algorithm in this fashion, where overlaps are resolved byiteratively moving each overlapping pair of particles apart until a non-overlapping configuration is achieved. Thisscheme, however, may converge slowly in dense systems and does not allow many-body HI coupling betweenparticles. Another method is the constrained minimization scheme developed by Maury [ ] . This method worksas a prediction-correction scheme, where a predicted velocity U p for each particle is first computed, and then acorrected velocity U satisfying the no-overlap constraints is obtained by minimizing the L norm of the difference U − U p for overdamped systems. For underdamped systems, the objective function being optimized is modified toinclude the effects of mass and acceleration. This scheme, however, does not determine collision forces betweeneach colliding pair, making it difficult to compute the mechanical stress induced by collisions.Non-overlapping constraints can be coupled with collision forces in inelastic collisions to construct comple-mentarity constraints, since each close pair of particles must be at one of the two possible states: (1) they havecollided, and so the collision force is positive and their minimal separation is zero, or (2) they did not collide,and so the collision force is zero and the minimal separation is positive. The early development of these methodsformulated collision resolution problems in rigid body dynamics in various forms and probed their mathematicalproperties, for example, the existence of solutions [
30, 31, 32, 33, 34 ] . A linear complementarity formulationwas soon developed to practically simulate collections of frictionless [ ] and frictional [
36, 37 ] rigid particles.This formulation uses a first-order Euler temporal discretization and solves linear complementarity problems ateach time-step. This method has been proved [
38, 39 ] to generate convergent particle trajectories as the time-step ∆ t →
0. These early developments have been summarized by Stewart [ ] , and extended to handle stiffexternal forces [ ] and bilateral constraints such as mechanical joints [ ] . This early formulation, however,considered non-convex linear complementarity formulations for frictional particles and has been superseded bythe modern formulation of Anitescu [ ] , where a convex cone quadratic program is solved at each time-step.This convexity allows efficent numerical solution in large scale simulations for frictional granular flow problems [
44, 45, 46, 47, 48, 49, 50 ] , and open source software implementations of this method are now available [
51, 52 ] .Recently, this formulation has been extended to deformable particles [
53, 54 ] . To speed up these large scale sim-ulations, various iterative solvers have been discussed and compared in [
55, 56, 57, 58, 59 ] . Recent work by twoof the co-authors (Corona and Veerapaneni) and their co-workers demonstrated the scalability of these methods In one interpretation, collision resolution algorithms are also models for the irreversible processes associated with the collisions ofphysical, and hence rough, particles.
2n large distributed memory machines, simulating the frictional multibody dynamics of hundreds of millions ofgranular particles [ ] .For particulate Stokes flow problems, the recent work of Lu et al. [ ] applied a similar explicit contact con-straint enforcement approach for simulating deformable particles. Furthermore, this approach was extended tothree dimensional problems in [ ] . Both works demonstrated impressive gains in computational efficiency byimproving the stability and accuracy of the underlying numerical solvers. However, we note that their imple-mentation is tightly bound to a particular boundary integral (BI) fluid solver. Moreover, the computed collisionforces do not follow Newton’s Third Law where collision forces for a colliding pair must be equal and opposite.Therefore, although the collision forces are explicitly computed in this method, the collision stress may still beincorrect.A proper extension of the complementarity method to Stokes particulate suspensions is difficult because theparticle motion is overdamped in contrast to granular flows dominated by inertia. This requires significant re-formulation of the complementarity problem. The solver must also be improved because the complementarityproblem now involves a full dense matrix due to the many-body hydrodynamic coupling in Stokes suspensions.Yan et al. [ ] presented a method to resolve normal collisions for arbitrarily-shaped rigid particles in Stokessuspensions, as an extension of [ ] . The complementarity problem for collision resolution is reformulated anda different but more efficient solution algorithm is described, utilizing the symmetric-positive-definiteness ofthe mobility matrix. This approach is generic because the motion induced by collision force is represented bythis mobility matrix, and any hydrodynamic solver can be used. This method imposes Newton’s Third Law andguarantees the symmetry and translational invariance of the collision stress tensor. They validated this methodby computing the equation of state for Brownian spherocylinders, and demonstrated its application to variousself-propelled rod systems.Although the method is generic, results presented in [ ] neglected the many-body HI coupling. In thiswork, we extend this generic method to Stokes particulate suspensions with full many-body HIs. We use BImethods specialized for spheres and achieving spectral accuracy by spherical harmonic expansions for singularand near-singular integration [ ] . The implementation is fully parallelized with hybrid OpenMP and
MPI toachieve scalability. We demonstrate the application of this platform to various problems, such as sedimentationof particle ensembles, and the collective dynamics of rotor systems.The latter class of many-particle systems has been studied of late as an active matter system, showing aspectssuch as activity-induced phase separation [ ] , crystallization [ ] , odd rheological and surface flow dynamics [ ] , and forms of active turbulence [ ] . Motivated by the simulations of [ ] , Fig. 1 shows a large-scalesimulation of 20, 000 particles, densely packed and suspended on a plane in the fluid, with each particle drivenby an external out-of-plane torque. This simulation has roughly 6 × degrees of freedom, and was run on576 CPU cores. The simulation shows the development of an extensive and inhomogeneous fine-scaled collisionnetwork, as well as the development of large-scale collective rotation induced by long-ranged hydrodynamiccoupling. Of scientific interest is the interaction of multiple such ensembles, and the detailed structure of particleflows within them. 3 igure 1: A simulation snapshot of a monolayer of 20000 spherical particles with radii a at approximately 60% area fraction. A constantexternal torque T in the direction perpendicular to this monolayer is exerted on each particle to drive a counter-clockwise planar rotation ofthis mobolayer. (A) shows the entire simulation, with each particle colored by its total collision force magnitude. (B) shows the region insidethe white box. The left panel of (B) shows the hydrodynamic force density (traction) on the sphere surfaces induced by collisions, scaled by T / a . The center and right panels show the net collision forces, scaled by T / a , on each particle and on each collision constraint, respectively.The three panels of (B) are colored by the same colormap as in (A). . Problem formulation In this section we briefly summarize the well-known mobility problem, i.e., the motion of arbitrarily-shapedparticles driven by forces and torques and immersed in a 3D Newtonian Stokes flow. Here, for simplicity, thefluid occupies all space but the particle volumes. Hence, consider a suspension of N rigid particles, like thatshown in Fig. 1 but not limited to spherical shapes. Let { V j , Γ j , c j } Nj = denote the volume, boundary, and trackingpoints of the particles, respectively. Here, the ‘tracking point’ refers to any point on one particle, not limited toits center-of-mass, because in this manuscript we consider only overdamped dynamics of particles where inertiacan be neglected. Each particle moves with a translational velocity U j , e and angular velocity Ω j , e , in response toan externally applied body force F j , e and a body torque T j , e .Denoting the fluid viscosity by η , the fluid velocity by u , and the pressure by p , the fluid stress is σ = − p I + η (cid:2) ∇ u + ( ∇ u ) T (cid:3) . Neglecting inertial and body forces gives that ∇ · σ = −∇ p + η ∇ u = (cid:82) \ ∪ j V j , (1a) ∇ · u = (cid:82) \ ∪ j V j , (1b) u → x → ∞ . (1c)The traction f , i.e., the hydrodynamic force density on the particle surface applied by the fluid, is determinedby the stress tensor σ as f = σ · n , where the surface normal n points into the fluid domain. The solution ( u e , p e ) to Eq. (1)(a-c) is subject to the boundary conditions: u e = U j , e + Ω j , e × ( x − c j ) on Γ j , ∀ j , (2a) ˆ Γ j f e dS = F j , e , ∀ j , (2b) ˆ Γ j ( x − c j ) × f e dS = T j , e , ∀ j . (2c)There may also be collision a force F j , c and torque T j , c on each particle j applied by other particles. Theconsequent velocities ( U j , c , Ω j , c ) induced by F j , c and T j , c satisfy the same Stokes equation (1) and boundaryconditions: u c = U j , c + Ω j , c × ( x − c j ) on Γ j , ∀ j , (3a) ˆ Γ j f c dS = F j , c , ∀ j , (3b) ˆ Γ j ( x − c j ) × f c dS = T j , c , ∀ j . (3c)Due to the linearity of the Stokes equation (1) and boundary conditions Eqs. (2) and (3), the overall solutionis simply the superposition of the two separate mobility problems induced by external and collisional forces andtorques: u = u e + u c , p = p e + p c , f = f e + f c , U j = U j , e + U j , c , Ω j = Ω j , e + Ω j , c . (4)Conventionally, the solution to a mobility problem can be written compactly as (cid:85) = (cid:77) (cid:70) , where (cid:85) = (cid:128) . . . , U xj , U yj , U zj , Ω xj , Ω yj , Ω zj , . . . (cid:138) and (cid:70) = (cid:128) . . . , F xj , F yj , F zj , T xj , T yj , T zj , . . . (cid:138) are both column vectors with 6degrees of freedom per particle. (cid:77) is the mobility matrix , which is a dense square matrix containing all theinformation of the Stokes equation and boundary conditions. The solutions given by Eq. (2) and Eq. (3) can bewritten compactly as (cid:85) e = (cid:77) (cid:70) e and (cid:85) c = (cid:77) (cid:70) c , respectively.5 is usually not formed explicitly because of the difficulty and high cost of computing its entries, except incases where the many-body coupling of HIs is treated with very crude approximations [
10, 11 ] . Instead, a linearsystem Ax = b is usually constructed to solve the mobility problem. The linear operator A is usually computedfrom the geometry and boundary conditions, containing all the information of (cid:77) . The right hand side vector b is usually computed from (cid:70) , and the velocities (cid:85) are computed according to the solution x . The linear system Ax = b is ususally solved iteratively.There are a variety of methods to construct the linear system Ax = b . Stokesian Dynamics [
19, 25, 21 ] formsthe linear system using multipole expansions for both spherical and non-spherical particles. BI methods formsthe linear system using boundary integral operators and discretization of particle surfaces [ ] . When high levelsof accuracy are necessary, BI methods provide the most accurate and efficient solvers to the mobility problem;for example, see [
70, 64, 71, 72 ] and references therein.It is well-known that (cid:77) is symmetric-positive-definite [
18, 26 ] . Physically, this is related to the dissipativenature of a Stokes suspensions. The energy injected by driving forces and torques is always dissipated instanta-neously by the fluid flow. This is the key to our collision resolution algorithm.
3. Collision resolution algorithm
In this section we describe our collision resolution algorithm in detail, making no assumptions on the particleshape, or on the numerical method to solve the mobility problem (cid:85) = (cid:77) (cid:70) . The dynamics of particulate Stokessuspensions is overdamped, i.e., the particle inertia is negligible. In this case, collisions are inelastic with a zerocoefficient of restitution. Consequently, a colliding pair of particles remain in contact until they are driven apartby flow or collisions with other particles. This is different from granular flow, where inertial effects dominate andparticles may rebound after collisions, depending on their coefficients of restitution. We also ignore inter-particlefriction for Stokes suspensions since, physically, hydrodynamic lubrication effects dominate for smooth particlesclose to contact. In general, Eq. (4) can be generalized as: (cid:85) = (cid:85) nc + (cid:85) c . (5)Here the subscript nc stands for all non-collisional motion, such as the motion (cid:85) e driven by external forces andtorques. In other problems, (cid:85) nc may originate from prescribed motions or other physical processes, such asBrownian fluctuations or electrophoresis.The geometric configuration of a collection of rigid particles in 3D space is fully specified by the tracking pointlocation c j and the orientation unit quaternion θ j = { s , p } ∈ (cid:82) of each particle j . The temporal evolution of c j and θ j is given by: ˙ c j = U j , (6)˙ θ j = Ψ j Ω j , where Ψ j = (cid:20) − p Tj s j I − p j (cid:21) ∈ (cid:82) × . (7)Here we follow the rigid body kinematic equation using quaternions by Delong et al. [ ] . Similar to (cid:70) and (cid:85) ,we define the configuration (cid:67) as a column vector with 7 N entries, containing c j and θ j for all N particles. Theoverdamped equation of motion for these particles can be written compactly as:˙ (cid:67) = (cid:71) (cid:85) = (cid:71) (cid:85) nc + (cid:71) (cid:77) (cid:70) c . (8)where (cid:71) ∈ (cid:82) N × N is a block diagnoal matrix, containing 3 × × Ψ j corre-sponding to c j and θ j for each particle j , respectively. Note that (cid:71) depends on those quaternion θ components in (cid:67) , but not on those c components. If the orientation of each particle is represented with Euler angles or rotationmatrices, the kinematic equation can still be written in the form of Eq. (8), but the dimension of (cid:67) and the6efinition of each Ψ j block must both be adjusted accordingly. However, different representations of orientationdoes not affect the derivation of collision resolution algorithms [ ] .The equation of motion Eq. (8) should be augmented by geometric constraints to generate trajectories withoutoverlaps between particles. The constraints are simply that the minimal separation distance Φ (cid:96) ( (cid:67) ) betweeneach pair (cid:96) of close rigid objects remain non-negative for all configurations (cid:67) . In total there are potentially n c = N ( N + ) / (cid:96) , we denote the collision force magnitudebetween this pair of particles as γ (cid:96) . The pair ( Φ (cid:96) , γ (cid:96) ) must satisfy one of two conditions:No contact: Φ (cid:96) > γ (cid:96) =
0. (9)In contact: Φ (cid:96) = γ (cid:96) >
0. (10)Mathematically, this yields a complementarity problem , written vertically over the set of pairs as:0 ≤ Φ ⊥ γ ≥
0, (11)where Φ = ( Φ , Φ , ... ) ∈ (cid:82) n c and γ = ( γ , γ , ... ) ∈ (cid:82) n c denote the collections of minimal distances and contactforce magnitudes for each (cid:96) .For N rigid particles, let D (cid:96) ∈ (cid:82) N be a sparse column vector mapping the magnitude γ (cid:96) to the collision force(and torque) vector applied to each particle. We define a sparse matrix (cid:68) ∈ (cid:82) N × n c as a collection of all D (cid:96) : (cid:68) = (cid:2) D D . . . D n c (cid:3) ∈ (cid:82) N × n c , (cid:70) c = (cid:68) γ . (12)For each pair (cid:96) , D (cid:96) defined in this way has 12 non-zero entries for non-spherical shapes, corresponding to 3translational and 3 rotational degrees of freedom for each particle. For the special cases of two spheres, D (cid:96) has6 non-zero entries, as normal collision forces induce no torques on spheres. The entries of each D (cid:96) are explicitlygiven in [ ] . For rigid particles, there is an important relation between the transpose of (cid:68) and Φ [ ] : (cid:68) T = ( ∇ (cid:67) Φ ) (cid:71) . (13)This relation holds for different choices of orientation representations used in (cid:67) and (cid:71) , including quaternionsand Euler angles [ ] .Combining Eq. (8), CP (11), and Eq. (12), we reach a differential variational inequality (DVI) for N particlesand N ( N + ) / (cid:67) = (cid:71) (cid:85) nc + (cid:71) (cid:77) (cid:68) γ , (14a)0 ≤ Φ ( (cid:67) ) ⊥ γ ≥
0. (14b)Here (cid:71) , (cid:85) nc , (cid:77) , (cid:68) , and Φ depend only on the geometric configuration (cid:67) . This DVI is solvable and integrableover time once a relation between the configuration (cid:67) and the collision force γ is supplied, that is, a time-steppingscheme. Higher order schemes such as the Runge-Kutta scheme can be used, but for simplicity of presentationwe derive a first order Euler integration scheme here. With a given time-step ∆ t , the total number of constraints n c can be greatly reduced from N ( N + ) / O ( N ) , since only those pairs that are close enough to be possibly in contact within this time-step need to beincluded in the constraints. For example, for hexagonal close packing of spheres, every sphere is in contact with12 neighbors. This gives n c = N . We define (cid:65) as the set of pairs of bodies ‘close-to-collision’, that is, for whichthe minimal separation distance function Φ (cid:96) is smaller than a positive threshold δ : (cid:65) ( (cid:67) , δ ) = { (cid:96) | Φ (cid:96) ( (cid:67) ) ≤ δ } . (15)The choice of δ depends on the particle velocity (cid:85) and the time-step ∆ t , and should be taken large enough sothat no possible collisions are missed in the collision resolution algorithm. For each pair (cid:96) ∈ (cid:65) , the non-overlap7ondition can be simply stated as Φ (cid:96) ( (cid:67) ) ≥
0. Increasing δ gives a larger set (cid:65) , and increases the dimension ofthe CP (11). Empirically, we set δ to 30% of the sum of the radii for each pair of spherical particles.The time-stepping scheme should evolve a non-overlapping configuration (cid:67) k at t k to a non-overlapping (cid:67) k + at t k + . Therefore, for the discretized version of the DVI (14), the complementarity condition is betweenthe geometry at the end of this time-step Φ ( (cid:67) k + ) and the collision force at this time-step γ k : (cid:67) k + = (cid:67) k + (cid:71) k (cid:85) knc ∆ t + (cid:71) k (cid:77) k (cid:68) k γ k ∆ t , (16)0 ≤ Φ (cid:0) (cid:67) k + (cid:1) ⊥ γ k ≥
0. (17)This is a nonlinear complementarity problem (NCP) because the minimum separation Φ is a nonlinear functionof (cid:67) . For first-order Euler time-stepping, this NCP can be linearized as an LCP by taking a one-term Taylorexpansion of Φ over (cid:67) :0 ≤ Φ ( (cid:67) k ) + ( ∇ (cid:67) Φ ) k (cid:71) k (cid:85) knc ∆ t + ( ∇ (cid:67) Φ ) k (cid:71) k (cid:77) k (cid:68) k γ k ∆ t ⊥ γ k ≥
0. (18)At each time-step t k we solve the LCP (18) to compute the collision force magnitude γ . Then we use Eq. (16)to update the geometry of all the particles. In summary, the procedures of this LCP method include:1. Compute (cid:85) knc at time-step k .2. Construct the sparse matrix (cid:68) k given the geometric configuration (cid:67) k and apply threshold δ for possiblecontacts.3. Solve the LCP for γ k . (cid:85) kc and (cid:70) kc are computed simultaneously.4. Evolve the system to (cid:67) k + with (cid:85) knc + (cid:85) kc .This derivation is mostly the same as in the granular flow case [ ] , except that inertia and friction areignored. However, this does not simplify the problem, because the mobility matrix (cid:77) is a full dense matrix,in contrast to the block-diagonal inertial and moment-of-inertia matrix in the granular flow case. The high costinvolved in computing (cid:77) (cid:68) γ requires careful attention to the solution strategy. Once the solution to LCP (18)is found, a subset (cid:65) c of (cid:65) can be identified as the ‘active constraints’, where the collision force between twoparticles is non zero: (cid:65) c = { (cid:96) | γ (cid:96) > } ⊆ (cid:65) . (19) The success of the LCP collision resolution algorithm depends on the efficiency of the solver. In this sectionwe briefly discuss the state-of-the-art solvers used in this work. The superscript k denoting the time-steps aredropped in this subsection to simplify notation.The LCP defined in LCP (18) can be written as the following standard form, using Eq. (13):0 ≤ M γ + q ⊥ γ ≥
0, (20a) M = (cid:68) T (cid:77) (cid:68) , q = ∆ t Φ ( (cid:67) ) + (cid:68) T (cid:85) nc , (20b)where M and q have been scaled by 1 / ∆ t for convenience. The term (cid:68) T (cid:85) nc computes the (linearized) changein the minimum separation function Φ due to non-collisional motion. Each evaluation of M γ corresponds to thesolution of a mobility problem for the contact force (cid:70) c = (cid:68) γ . For large enough numbers of particles, it may thusbe preferable to rely on matrix-free operations instead of forming M explicitly.8 .3.1. Iterative solution methods The matrix M defined in LCP (20) is in general symmetric-positive-semidefinite (SPSD), because the mo-bility matrix (cid:77) is SPD. Therefore the LCP can be conveniently converted to a convex constrained quadraticprogramming (CQP) [ ] : γ = arg min γ ≥ (cid:149) f ( γ ) = γ T M γ + q T γ (cid:152) . (21)For convenience, we denote the gradient of the objective function f ( γ ) as g = ∇ f = M γ + q . From a physicsperspective, minimizing f ( γ ) can be understood as minimizing the total virtual work done by collision forces.Popular methods to iteratively solve this CQP includes the second-order minimum-map Newton method, and thefirst-order projected gradient descent (PGD) method.The minimum-map Newton method [ ] proceeds by reformulating the LCP as a root-finding problem for the L norm of the componentwise minimum-map function H = min ( γ , g ) , which means taking the smaller entry atevery corresponding component of the two vectors: H i = min { γ i , g i } . ϕ ( γ , g ) = (cid:107) min ( γ , g ) (cid:107) =
0. (22)The solution γ to the CQP (21) is reached when ϕ =
0. This Newton-type method can achieve high accuracy,but the cost is usually much higher than first order methods because a linear system must be iteratively solved atevery minimization step to compute the Newton step.Various PGD methods have been proposed to solve this CQP efficiently because the gradient g = M γ + q is straightforward to compute, and the constraint γ ≥ γ to 0 because negative components violate the non-negative constraint of the LCP problem. Thisis a ‘projection’ into the feasible region γ ≥
0, at each gradient descent step. Such first-order methods do notfind the root to Eq. (22). Instead, Eq. (22) is used to check the convergence with a prescribed residual tolerance ε tol . When ϕ ( γ , g ) < ε tol , the PGD iterations stop.The key ingredient in all PGD methods is to choose a proper step-size α j for the j -th GD step: γ j + = γ j − α j ∇ f ( γ j ) . The Accelerated PGD (APGD) by Mazhar et al. [ ] is shown to be a competitive choice in a recentsurvey article [ ] in the context of granular flow. However, the step size α j chosen by APGD converges onlywhen α j < / L , where L is the Lipschitz constant for the linear operator M and is usually not known a priori.Consequently, α j is adaptively adjusted at each iteration to fit a local estimated Lipschitz constant L j at the j -thstep. This adjustment process is prohibitively expensive in our context since the gradient g is evaluated a largenumber of times, each of which requires solving a mobility problem.A much more efficient method for Eq. (21) is the Barzilai-Borwein PGD (BBPGD) method [
75, 76, 77 ] . Thismethod has been successfully applied to our collision resolution method for cases where the many-body couplingof HIs are ignored [ ] . It constructs the step size α j from the previous two steps, and does not require any stepsize adjustment as APGD. A version of this method adapted to Eq. (21) is summarized by Algorithm 1. In thisalgorithm, the two different choices of step size α BB j and α BB j can be used either consistently throughout allsteps, or alternatively for odd and even j -th steps. We find that there is almost no difference in performance fordifferent choices of α BB j or α BB j (Step 15) in solving our problems, and we use α BB j for all results reported inthis work. In BBPGD, two evaluations, M γ and M g (Step 2 and 6), are necessary before the first iteration,and thereafter only one evaluation M γ j is needed per iteration. In our numerical tests BBPGD shows a similarconvergence rate as APGD, but each APGD iteration is significantly more expensive because of the necessity tocheck the Lipschitz condition. ε tol The error bound for the general case SPSD matrix M [ ] is highly complicated and there is no direct relationbetween the error bound to our residual function ϕ . However, for most practical simulations the matrix M isSPD. This is because if we assume that there is a vector γ (cid:54) = such that γ T M γ =
0, then (cid:68) γ = because (cid:77) is SPD in LCP (20). This means (cid:68) must be rank-deficient for its column vectors. There exists at least onecolumn vector D (cid:96) linearly dependent on others. Geometrically, this is possible but only for very special shapesand configurations of particles. In most cases, the column vectors (constraints) in (cid:68) are independent of eachother, and M = (cid:68) T (cid:77) (cid:68) is SPD. 9 lgorithm 1 The Barzilai-Borwein Projected Gradient Descent method Solve LCP (20) with initial guess γ . g = M γ + q . if ϕ ( γ , g ) < ε tol then Solution is γ . end if Simple gradient-descent step size: α = g T g / g T M g . for j = j max do Compute the descent step: γ j = γ j − − α BBj − g j − . Projection: set the negative components of γ j to zero. Compute the gradient: g j = M γ j + q . if ϕ ( γ j , g j ) < ε tol then Stop iteration, solution is γ j . end if Update: s j − = γ j − γ j − , y j − = g j − g j − . Update: α BB j = s Tj − s j − / s Tj − y j − or α BB j = s Tj − y j − / y Tj − y j − . Set α BBj to either α BB j or α BB j . end for This allows us to use the simpler error bound for SPD matrices [ ] . The absolute and relative error boundsfor Eq. (21) are between an arbitrary vector γ and the exact solution γ ∗ to the LCP (20): (cid:107) γ − γ ∗ (cid:107) ≤ (cid:107) M (cid:107) + λ min ϕ ( γ , g ) , (23) (cid:107) γ − γ ∗ (cid:107) (cid:107) γ ∗ (cid:107) ≤ cond ( M ) ( (cid:107) M (cid:107) + ) ϕ ( γ , g ) (cid:107) max ( − q ) (cid:107) . (24)Here λ min is the smallest eigenvalue of M , (cid:107) M (cid:107) is the L -norm of M , cond ( M ) is the condition number of M , and max ( ., . ) is the ‘componentwise maximum map function’, a counterpart to H = min ( ., . ) .However, λ min , (cid:107) M (cid:107) , and cond ( M ) are all very difficult to estimate, because M = (cid:68) T (cid:77) (cid:68) and only verycrude estimates of (cid:77) are available [ ] : cond ( (cid:77) ) ≈ O ( ) of dilute suspensions and cond ( (cid:77) ) ≈ O (cid:0) (cid:1) fordensely packed spherical particles. A detailed error bound analysis is beyond the scope of this work. Instead, wepick ε tol based on physical intuition and practical considerations.We choose to set ε tol as an absolute error bound, following Eq. (23). This is because the collision force fordifferent constraints can be orders of magnitude different, as illustrated by the center panel of Fig. 1 and inother results reported in Section §6 and Section §7. This is one of the physical features of rigid body suspensions,where the particle-particle interactions induced by perturbation may decay rapidly as the perturbation propagatesand the particle-particle interaction in some regions of the system may be orders of magnitude larger or smallerthan other regions. One example phenomena is Brinkman screening [ ] where the perturbation is ‘screened’hydrodynamically. If using the relative bound, as in Eq. (24), the large collision forces dominate the norm and thesmaller entries in γ may be completely inaccurate, leading to inaccurate trajectories or even possibly overlappingconfigurations. We set ε tol = − for all results reported in this work unless otherwise specified.In other works on granular flow [
56, 57 ] , there are other residual functions used besides ϕ ( γ , g ) . Thisis because in granular flow the matrix M involves only a block diagonal inertial and moment-of-inertia matrix,instead of a full dense mobility matrix (cid:77) . Various matrix splitting and manipulation techniques can then be usedto compute various residual functions. For our problem Eq. (21), we stick to ϕ ( γ , g ) because of its negligibleextra cost of computation at every gradient descent step, and because it is directly related to the theoretical errorbounds Eqs. (23) and (24). 10 .3.3. Extensions In the above discussion we constructed the basic form of our collision resolution algorithm. There are twoimportant extensions that can be easily incorporated.The first straightforward extension is to include collisions between particles and external boundaries. Herea ‘boundary’ refers to any object either static or moving with a prescribed velocity. In other words, a ‘boundary’does not appear in the mobility matrix (cid:77) . For each collision pair (cid:96) between a particle and a boundary, a vector D (cid:96) can be added to (cid:68) in Eq. (12). The only difference is that these D (cid:96) column vectors have 6 non-zero entriescorresponding to the degrees of freedom of the particle only. Once (cid:68) is constructed including all these particle-boundary collisions, the solution to Eq. (21) remains exactly the same.Another straightforward extension is to include particle motions other than those driven by external forces.For example, consider squirmers, which are rigid particle models for ciliated organisms [
82, 83 ] . Squirmerspropel themselves through a quiescent fluid, without any external driving force, by inducing a nonzero surfaceslip velocity u j , slip on the outer flow. For these particles, the swimming motion ( U j , swim , Ω j , swim ) can be computedby solving the Stokes equation (1) subject to the boundary conditions: u = u j , slip + U j , swim + Ω j , swim × ( x − c j ) on Γ j , ∀ j , (25a) ˆ Γ j f dS = ∀ j , (25b) ˆ Γ j ( x − c j ) × f dS = ∀ j . (25c)When external forces exist or squirmers may collide with each other, this set of boundary conditions Eq. (25) canbe superposed to the boundary conditions Eq. (2) and Eq. (3). The swimming velocities (cid:85) swim and externallydriven velocities (cid:85) e can be computed straightforwardly and independently of each other, without any knowledgeabout the collisional motion, at the beginning of each time-step. Then, we can apply the collision resolutionalgorithm described in this section to compute the collision velocities (cid:85) c , by simply setting the non-collisionalvelocities as the sum of swimming velocities and externally driven velocities: (cid:85) nc = (cid:85) swim + (cid:85) e . (26)
4. Boundary integral formulation for the mobility problem
The collision resolution method described above can be applied in conjunction with any mobility problemsolvers for arbitrarily-shaped convex bodies. Since one mobility problem needs to be solved at each evaluation
M γ during the LCP solution, the computational cost and accuracy strongly depend on the mobility solver beingused.In this work, we apply a recently developed ‘indirect’ BI formulation [ ] to solve the mobility problem. Whilethe PDEs Eq. (1) and Eq. (2) can be reformulated as an integral equation in a number of ways using potentialtheory [ ] , the advantage of this particular formulation is that second-kind integral equations are obtainedwithout introducing any additional unknowns and constraints such as those in the work by Power and Miranda [ ] . The discretized linear system can be solved rapidly with iterative solvers and, in most cases, 10 ∼ The reformulation of Eq. (1) into a set of integral equations relies on the following standard operators:Single Layer Operator: (cid:83) Γ [ ρ ]( x ) = ˆ Γ G ( x − y ) ρ ( y ) dS y , (27)Double Layer Operator: (cid:68) Γ [ ρ ]( x ) = ˆ Γ T ( x − y ) n ( y ) ρ ( y ) dS y , (28)Traction Operator: (cid:75) Γ [ ρ ]( x ) = ˆ Γ T ( x − y ) n ( x ) ρ ( y ) dS y . (29)11ere n is the outward surface normal vector to the rigid body boundary Γ , and G and T are the fundamentalsolutions to the Stokes equations. In particular, G and T are 2 rd and 3 nd rank tensors known as the Stokesletand traction kernels, respectively, and are given by G i j ( r ) = π (cid:18) δ i j | r | + r i r j | r | (cid:19) , (30) T i jk ( r ) = − π r i r j r k | r | . (31)Note that the viscosity η is set to unity here. For N rigid bodies suspended in free-space with respective boundaries Γ = { Γ j } Nj = , the indirect BI formulation of [ ] sets the fluid velocity u at an arbitrary point x in the fluid domainas u ( x ) = (cid:83) Γ [ ρ + ζ ]( x ) , (32)The unknown density functions ρ and ζ are determined as follows. The role of ρ is to match the given force andtorque (Eq. (2)d & e) on each body. It has been shown [
85, 70 ] that this can be achieved by setting ρ j on eachrigid body j as: ρ j ( x ) = F j | Γ j | + τ − j T j × (cid:0) x − c j (cid:1) , (33)where τ j is the moment of inertia tensor [ ] and | Γ j | is the surface area defined for particle j with surface Γ j . Forthe remaining unknown ζ , an integral equation can be obtained by enforcing that the internal stress generatedby ρ + ζ on each rigid body vanishes. This is accomplished by: (cid:129) (cid:73) + (cid:75) Γ + (cid:76) Γ (cid:139) [ ζ ]( x ) = − (cid:129) (cid:73) + (cid:75) Γ (cid:139) [ ρ ]( x ) , ∀ x ∈ Γ , (34)where for each Γ j , (cid:76) Γ j is a local operator defined for each particle independently, designed to remove the 6dimensional null space of the operator (cid:73) + (cid:75) Γ : (cid:76) Γ j [ ζ ]( x ) = | Γ j | ´ Γ j ζ j ( y ) dS y + τ − j (cid:128) ´ Γ j (cid:0) y − c j (cid:1) × ζ j ( y ) dS y (cid:138) × (cid:0) x − c j (cid:1) . We refer to [ ] for more details.In summary, the mobility problem is solved by first computing ρ using given forces and torques, then solvingEq. (34) for ζ , and then computing the fluid velocity u where needed with Eq. (32). The velocities U j , Ω j foreach rigid body can be computed by averaging u over the surface Γ j . In this approach, the operation (cid:85) = (cid:77) (cid:70) reqires an iterative solution to Eq. (34), instead of a simple explicit matrix-vector multiplication. The BI formulation Eq. (32) is applicable to arbitrarily-shaped rigid bodies, but the accuracy relies on the accu-rate evaluation of the operators (cid:75) and (cid:83) for the given geometry. For well-separated rigid bodies this is straight-forward via various standard surface discretization and smooth quadrature rules. However, for close pairs nearlyin contact, special techniques must be employed because the Stokes kernels G and T become nearly-singular .While such techniques are well-developed for two-dimensional problems (see [ ] and references therein), opti-mally handling arbitrary geometries in three dimensions is still an open problem and currently an active area ofresearch [
87, 88, 89 ] . For spherical geometries, however, an efficient method based on vectorial spherical har-monic (VSH) basis functions was recently developed in [ ] by two of the co-authors. Here, we briefly summarizethis VSH technique and apply it to develop a computational framework for spheres.Any smooth vector field ρ , e.g., the hydrodynamic traction, defined on a spherical surface Γ can be representedas an expansion over the VSH basis functions V mn , W mn , X mn : ρ = (cid:88) n ≥ − n ≤ m ≤ n (cid:12)(cid:12) V mn (cid:12)(cid:12) ˆ ρ Vn , m V mn + (cid:12)(cid:12) W mn (cid:12)(cid:12) ˆ ρ Wn , m W mn + (cid:12)(cid:12) X mn (cid:12)(cid:12) ˆ ρ Xn , m X mn , (35)12here the basis functions V mn , W mn , X mn are generated from the scalar spherical harmonic functions Y mn : G mn = ∇ Γ Y mn , (36) V mn = G mn − ( n + ) Y mn e r , (37) W mn = G mn + nY mn e r , (38) X mn = e r × G mn , (39)where ∇ Γ = ∂∂ θ e θ + θ ∂∂ φ e φ is the surface gradient operator. Note that when n = V = − Y e r does notvanish, being just a vector field pointing inward on the unit sphere surface.The coefficients ˆ ρ Vn , m , ˆ ρ Wn , m , ˆ ρ Xn , m are the inner product of ρ and the basis functions on the spherical surface:ˆ ρ Vn , m = (cid:10) ρ , V mn (cid:11) , ˆ ρ Wn , m = (cid:10) ρ , W mn (cid:11) , ˆ ρ Xn , m = (cid:10) ρ , X mn (cid:11) , (40)where 〈 u , v 〉 = ˆ S uv dS . (41)For real-valued ρ , the n , m and n , − m coefficients are complex conjugates to each other. In this work we followthe notational convention for spherical harmonics, and write Y mn as Y mn ( θ , φ ) = (cid:118)(cid:116) n + π (cid:118)(cid:116) ( n − m ) ! ( n + m ) ! P mn ( cos θ ) e im φ , (42)where P mn ( x ) = n n ! ( − ) m ( − x ) m / ∂ n + m ∂ x n + m (cid:0) x − (cid:1) n , (43)are Legendre polynomials. The P ± mn satisfy: P − mn ( x ) = ( − ) m ( n − m ) ! ( n + m ) ! P mn ( x ) . (44)The expansion Eq. (35) is spectrally convergent for a smooth density ρ . Similarly, for a target point not onthe spherical surface Γ , the value of the BI operators evaluated at this point (cid:83) Γ [ ρ ]( x ) , (cid:68) Γ [ ρ ]( x ) , and (cid:75) Γ [ ρ ]( x ) can all be expanded as a series summation over the VSH basis. For example, for a point x outside Γ , we firstwrite x = ( r , θ , φ ) in the spherical coordinate system of Γ . Then, the velocity u ( r , θ , φ ) = (cid:83) Γ [ ρ ]( x ) and thecorresponding fluid pressure p ( r , θ , φ ) governed by Stokes equation can be expressed by: u ( r , θ , φ ) = (cid:88) n , m f Vn , m ( r ) V mn + f Wn , m ( r ) W mn + f Xn , m ( r ) X mn , (45) p ( r , θ , φ ) = (cid:88) n , m g n , m ( r ) Y mn . (46)The mapping from the VSH coefficients for ρ in Eq. (35) to f Vn , m ( r ) , f Wn , m ( r ) , f Xn , m ( r ) , g n , m ( r ) is linear due to thelinearity of Stokes equation, and is diagonal: f Vn , m ( r ) = n ( n + )( n + ) r − ( n + ) (cid:12)(cid:12) V mn (cid:12)(cid:12) ˆ ρ Vn , m + n + n + (cid:0) r − ( n + ) − r − n (cid:1) (cid:12)(cid:12) W mn (cid:12)(cid:12) ˆ ρ Wn , m , (47a) f Wn , m ( r ) = n + ( n + )( n − ) r − n (cid:12)(cid:12) W mn (cid:12)(cid:12) ˆ ρ Wn , m , (47b) f Xn , m ( r ) = n + r − ( n + ) (cid:12)(cid:12) X mn (cid:12)(cid:12) ˆ ρ Xn , m , (47c) g n , m ( r ) = nr − ( n + ) (cid:12)(cid:12) W mn (cid:12)(cid:12) ˆ ρ Wn , m . (47d)13imilar diagonalized relations have been derived by Corona and Veerapaneni [ ] for both (cid:83) and (cid:68) operators,and for x both inside and outside Γ .For the traction operator (cid:75) , this mapping is no longer diagonal. In other words, the mapping from the VSHcoefficients for ρ to the coefficients of (cid:75) Γ [ ρ ]( x ) is a dense matrix. In this work, we derive a general analyticalrelation, for the target point x both inside and outside Γ , of this full dense mapping using Eq. (45). The traction t at each target point x is defined as: t = σ · n = (cid:0) − p I + ∇ u + ∇ u T (cid:1) · n , (48)where p has been given in Eq. (45) and (47). The velocity gradient tensor ∇ u has 9 components in the ( r , θ , φ ) spherical coordinate system: ∇ u = (cid:88) n , m g rrnm g r θ nm g r φ nm g θ rnm g θθ nm g θφ nm g φ rnm g φθ nm g φφ nm . (49)Each g nm can be analytically computed with the functions f Vnm , f Wnm , f Xnm in Eq. (47). The detailed expressions aregiven in Appendix A. The velocity gradient ∇ u is then transformed from the spherical coordinate system r , θ , φ to the Cartesian coordinate system by standard tensor rotation rules.For a smooth density ρ defined on a spherical surface Γ , (cid:83) Γ [ ρ ]( x ) , (cid:68) Γ [ ρ ]( x ) , and (cid:75) Γ [ ρ ]( x ) are again spec-trally convergent. This important feature allows us to represent lubrication effects efficiently with only a fewspherical harmonic modes. In simulations of many spheres, the operators (cid:83) Γ [ ρ ]( x ) , (cid:68) Γ [ ρ ]( x ) , and (cid:75) Γ [ ρ ]( x ) are first directly evaluated on discretized spherical harmonics grid points [ ] defined on the surface of eachsphere by Kernel Independent Fast Multipole Method (KIFMM). However, the results evaluated by KIFMM areinaccurate when spheres are close to each other, because the Stokes kernels G i j and T i jk are singular To overcomethis inaccuracy, for each close pair of spheres i , j we first transform the density ρ from the surface grid points tothe VSH coefficients ˆ ρ Vn , m , ˆ ρ Wn , m , and ˆ ρ Vn , m . Then, the VSH representation is used to evaluate the accurate valuesof the boundary integral operators (cid:83) Γ [ ρ ]( x ) , (cid:68) Γ [ ρ ]( x ) , and (cid:75) Γ [ ρ ]( x ) . The close pairs are detected by checkingthe center-to-center distance (cid:12)(cid:12) c i − c j (cid:12)(cid:12) . If this distance is smaller than a threshold value β ( R i + R j ) , the VSH rep-resentation is used. In this work, we usually choose β ∈ ( ) . We observed no benefits in accuracy for larger β , and the cost of close-pair VSH corrections quickly increases with increasing β .
5. Implementation
Proper implementation is necessary to achieve a scalable computational framework with modern
MPI+OpenMP parallelism that maximizes the efficiency on high-core-count CPUs. In this section we describe the four majorcomponents in our implementation.
The mobility problem is solved via GMRES iteration of the BI Eq. (34), where the operator (cid:75) must be evalu-ated once every GMRES iteration. The operator (cid:75) is an all-to-all operator, where the traction kernel T is evalu-ated between every pair of the spherical harmonic grid points on all spheres. In total, there are ( p + )( p + ) N points for N spheres with order p spherical harmonics. The operators (cid:83) and (cid:68) are also evaluated in this all-to-all style. These are standard operations and can be computed by KIFMM with O ( N ) cost. In this work, we usethe fully parallelized KIFMM package PVFMM [ ] , and code the kernel functions as optimized AVX2 intrinsicinstructions to fully utilize the 256-bit
SIMD capability of modern CPUs. The developed code is open-sourced as
STKFMM on GitHub. We verified the accuracy of the operator evaluations to machine precision, and benchmarkedthe scalability to thousands of cores on a CPU cluster interconnected by Intel Omni-Path fabrics. https://github.com/wenyan4work/STKFMM .2. Near neighbor detection Once the BI operators are evaluated with KIFMM, near-field corrections must be performed for close-to-contactpairs of spheres with the VSH representation. This step requires efficient detection of all close-to-contact pairs.This is a standard neighbor detection operation and can be efficiently completed with algorithms such as cell listor k-D tree. However, although there are a few high-performance libraries publicly available such as FDPS [ ] ,DataTransferKit [ ] , and LibGeoDecomp [ ] , there are some important features still missing. For example, thenecessary VSH data, including grid point coordinates, values, and expansion coefficients, needs to be efficientlymigrated between the MPI ranks, and customizable serialization and de-serialization are necessary to allow VSHdata with different order p to be sent and received as MPI messages.Therefore we built a custom near neighbor detection module based on a Morton-coded octree. This nearneighbor module is fully parallelized with both
OpenMP and
MPI . Once the near neighbor pairs are detected, thenecessary data is transferred in msgpack binary format with the msgpack-c library. Each pair is dispatchedto one
OpenMP thread to compute the near corrections, implemented in a thread-safe way. This near neighbordetection module is also used to identify the possibly colliding pairs to construct the set for all constraints (cid:65) .Then the sparse matrix (cid:68) is constructed for the LCP (20), distributed on all
MPI ranks.
Proper load balancing is necessary to ensure scalability over
MPI parallelism, and the balancing of both theLCP and the mobility problem must be considered. The balancing of the mobility problem Eq. (34) is handledthrough the KIFMM routine and the near neighbor detection routine. An adaptive octree with 2:1 balancing isbuilt inside the PVFMM library and decomposed to each MPI rank to allow efficient KIFMM evaluation. Similardecomposition is also used in the near neighbor detection module so that each
MPI rank handles roughly the samenumber of near neighbors pairs to perform VSH corrections. The decomposition for the LCP is slightly different.In LCP (20), each contact pair appears only once in the vector γ and the geometric matrix (cid:68) . We use a simple buteffective strategy to partition γ and (cid:68) . Each particle is labeled with an index i , which is globally unique acrossall MPI ranks and randomly initialized in the beginning of simulations, so that i is uncorrelated with its spatiallocation c i . When the pair of particles i , j is detected to be possibly colliding, the minimum separation Φ i j , thesparse colliding geometry column vector D i j , and the unknown collision force γ i j , are determined to be ownedby the MPI rank which owns the smaller index between i , j . The transposed matrix (cid:68) T is then constructed incompressed-row-storage (CRS) format where the rows are partitioned to each MPI rank. Φ and γ are partitionedin the same way as (cid:68) T . Compared to the matrix-free implementation where the matrix (cid:68) T and (cid:68) are not explicitlyconstructed [ ] , our implementation takes more storage space but allows the utilization of the highly optimizedsparse linear algebra functions implemented in the libraries Tpetra and
Kokkos in Trilinos . For spherical harmonics of order p , we compute the VSH expansion coefficients from function values onthe ( p + )( p + ) spherical harmonic grid points using a spherical harmonic transform (SHT). We do thisby computing 2 p + φ direction and a Legendre transform in the θ direction for each sphere. The DFT is computed using the FFTW3 functions [ ] and requires O (cid:0) p log p (cid:1) workper sphere. We compute the Legendre transform using matrix-vector products, and this requires O (cid:0) p (cid:1) workper sphere. We do not use a fast Legendre transform (FLT) since it is advantageous only for very large p . Formultiple spheres, we parallelize using OpenMP by partitioning the spheres across threads. When the number ofspheres is greater than the number of threads, we use blocking to compute multiple transforms together. Thisallows us to use matrix-matrix products for the Legendre transform, and these are computed efficiently using anoptimized BLAS implementation. We use Intel Math Kernel Library for both the FFTW3 and BLAS functions. Wecan similarly compute function values on the spherical harmonic grid from the spherical harmonic coefficientsusing the inverse SHT. For vector valued functions, we can compute the coefficients for the representation in the https://github.com/msgpack/msgpack-c https://trilinos.org [ ] . Similarly, we can compute thegrid values from the VSH coefficients using three inverse SHTs.Once the VSH coefficients for the density function ρ have been computed for each sphere, we can thencompute the coefficients for the velocity u , the pressure p and the traction t for each sphere-target pair using thediagonal operators discussed in Section 4.2. Finally, we evaluate the VSH expansions to get the potentials. Thisrequires evaluating the VSH basis functions and has O (cid:0) p (cid:1) computational cost for each sphere-target pair. Sincethe positions of the spheres only change between time-steps, we could precompute these basis functions at eachtime step and reuse them during the linear solve. However, this would require O (cid:0) p (cid:1) memory for each sphereand near target pair; therefore, we do not do this precomputation in our current implementation.
6. Results
In this section we report the numerical accuracy and performance results for suspensions of spherical particlesin unbounded Stokes flow.
In this section we probe the accuracy of the mobility solver discussed in Section §4, for a few static configu-rations. The collision resolution algorithm is not used.
We begin with a convergence test for a static configuration where lubrication forces are important. A pair ofnearby spheres with radius a are driven by the same force F g , i.e. sedimentation, or torque T as illustrated bythe schematics in Figs. 2 and 3, respectively. To evaluate convergence, U and Ω are solved for both spheres, foreach case, at various gap separations ε . The errors are shown in Fig. 2 for the sedimentation case and in Fig. 3for the rotation case. In each figure, the left panel shows the convergence error (absolute value) relative to theresults generated by p = (cid:12)(cid:12) U g , error ( p ) (cid:12)(cid:12) = (cid:12)(cid:12) U g ( p ) − U g ( ) (cid:12)(cid:12) and | Ω error | = | Ω ( p ) − Ω ( ) | , of the particle onthe left.It is also well-known that spherical harmonic grids are significantly denser around the poles in comparisonto areas close to the equator. This non-uniformity across the sphere surface may affect the capability to resolvethe highly non-uniform distribution of hydrodynamic force f induced by lubrication effects. Because of this, thespherical harmonics grid is randomly oriented for each sphere at different ε . This randomness induces someasymmetry error in the computed U and Ω , defined as the difference of computed velocity for the two particles ∆ U g = U g ,1 − U g ,2 and ∆Ω = Ω − Ω , as shown in the right panels of Figs. 2 and 3.The results show that the asymmetry error is always on the same order as the convergence error. Therefore,the user does not need to pick a particular pole orientation to ‘better resolve’ the lubrication effect. Further,spectral convergence holds until the separation ε/ a is comparable with 1 / p . This is because for such small gapsthe hydrodynamic traction has a large peak in the near-contact region due to lubrication effects. Also, the errorfor the rotation case is larger than for the sedimentation case, because when the two spheres are rotating in thesame direction, the fluid in the near-contact region has a very large shear rate due to the relative motion of thetwo sphere surfaces. Accurate benchmark data for lubrication effects is surprisingly hard to find because the dominating methodin this setting was asymptotic expansions [
18, 26 ] with questionable convergence beyond 3 digits of accuracy [ ] . To our knowledge the most accurate lubrication benchmark is given by Wilson [ ] to 10 digits of accuracyfor a few particular geometries of several spheres. We use these results to evaluate the accuracy of our mobilitysolver based on KIFMM and VSH for three equidistant spheres of equal radii each driven by a constant force F perpendicular to their common plane. The results for the relative error of the translational velocity U andangular velocity Ω are shown in Fig. 4 as a function of the gap separation distance ε between each pair of spheres.The results in Fig. 4 show accuracy similar to the two-particle benchmarks shown previously in Fig. 2 andFig. 3. It is clear that the spectral convergence with increasing p is kept until ε/ a is comparable to 1 / p , which isroughly at ε/ a ≈ p =
24. When ε further decreases, the error decreases slowly with increasing p .16 igure 2: The convergence error and asymmetry error for the sedimentation velocity U g . The inset plot shows the sedimentation velocity incomparison to the single sphere Stokes solution F g / ( πη a ) .Figure 3: The convergence error and asymmetry error for the angular velocity Ω . The inset plot shows the angular velocity in comparisonto the single sphere Stokes solution Ω = T / ( πη a ) . igure 4: The relative error of translational and angular velocity in comparison to high accuracy reference data [ ] . The error is limited to10 − as this reference data has 10 digits of accuracy. Lubrication effects are added explicitly in the popular Stokesian Dynamics method [
18, 21 ] , where the asymp-totic analytical functions for particle hydrodynamic traction multipoles are tabulated and added to a far-fieldexpansion of the mobility matrix. The analytic functions for multipoles are truncated at the stresslet level. There-fore even at ε/ a = [ ] . Even with a low order grid p =
8, theVSH approach has accuracy on par with Stokesian Dynamics.For dynamic simulations with many possible collisions, to reduce computation cost we generally use a loworder grid with p =
6, 8, etc., because when particles are close to contact the benefits of increasing p are notsignificant enough to justify the extra cost. In particular, we set the collision radius a c to be slightly largerthan the particle radius a , and resolve collisions for each particle at the collision radius. Usually a c / a = [
19, 96, 97 ] in particle-tracking simulations in suspension mechanics, where a softrepulsive pairwise potential which acts at the length-scale a c is used to prevent particle overlaps. For example,in Stokesian Dynamics an exponentially decaying pairwise repulsive force is usually added over the length-scale ε ≈ ( ∼ ) a to prevent the spheres from overlapping [
19, 96 ] . In this work, we actually choose a c basedon the choice of p , for example, a c = a when p =
24 and a c = a when p = In this section we probe the behavior of the mobility solver and the LCP solver with a dynamic problem, wheretwo spheres of equal radius a are dragged by constant equal and opposite external forces ± F along the horizontal x -axis, with the spheres starting with a vertical offset a from each other. The two spheres then approach androll over each other. The collision radius is set to be a c = a , i.e., the collision force is non-zero when theseparation ε/ a = p =
24 is randomly chosen for each sphere, and differentrandom orientations are used for computing (cid:85) nc and (cid:85) c . The time-step ∆ t = η a / F is fixed for the resultsreported in Fig. 5, and the simulation remains stable if ∆ t is increased by a factor of 10. The residual toleranceof BBPGD is set to 10 − . The simulation shown in Fig. 5 includes 4 stages, as illustrated by the snapshots (A),(B), (C), and (D). In each snapshot, the left panel shows the hydrodynamic traction induced by the externalforce F , with the grey arrows showing the total velocity U nc + U c for each sphere. The right panel shows thehydrodynamic traction induced by collision forces. The performance of BBPGD and APGD solvers for the LCP arecompared in Fig. 6.In Fig. 5(A), the particles are close enough and are determined to be possibly in contact. The LCP (20) isconstructed as a scalar problem, because there is only one possible contact. The LCP solver then determinesthat no actual collision happens, as shown by the zero hydrodynamic traction in the right panel of Fig. 5(A). InFig. 5(B), the LCP is constructed similarly, but the LCP solver finds that the collision force is non-zero. This isshown by the non-zero hydrodynamic traction induced by collision in Fig. 5(B). In Fig. 5(C), the particles are18bout to roll over each other so the collision force is tiny. In Fig. 5(D), the collision force is zero because thespheres are instantaneously touching each other and are moving towards a collision-free configuration. Figure 5: A pair of particles dragged by equal and opposite external forces to roll over each other. In (A), (B), (C), and (D), the left panelshows the hydrodynamic traction induced by the external force F with colored surface and vectors on the spherical harmonic grid. The greyarrows show the total velocity U nc + U c of each particle. The right panels show the hydrodynamic traction induced by the collision force. The performance history of BBPGD and APGD, shown in Fig. 6, shows the difference between the four stagescorresponding to the snapshots. The ‘steps’ count shows how many GD steps are used in the solver, and the‘MVOPs’ number shows how many evaluations of
M γ are invoked during the GD steps, for each time-step. Theactual cost (and running time) scales with the number of ‘MVOPs’ because one mobility problem is solved withineach evaluation of
M γ . For stages (A) and (D), only 1 MVOP is necessary for BBPGD because the solver finds thatthe zero initial guess of collision force already solves the problem, without the necessity of computing GD steps.In stage (B) and (C0), BBPGD is able to converge with only 1 GD step for this scalar LCP. Recall in Algorithm 1that the first BBPGD step takes 2 MVOPs, and then only 1 MVOP per step. It is clear that in all cases BBPGDhas much lower cost compared to APGD. We observed similar performance advantages of BBPGD in many-bodysimulations, and all the rest results reported in this paper are computed with BBPGD only.
Figure 6: The performance of BBPGD and APGD solvers at each time-step for the simulation shown in Fig. 5. The residual is set to 10 − forboth cases. BBPGD significantly outperforms APGD. .3. Sedimentation of a dense cluster The sedimentation of 1000 monodisperse spheres is simulated as a more demanding and realistic testingproblem. The spheres are initially placed in a dense spherical cluster of volume fraction ≈ a and sediments due to the samegravitational force F pointing towards − z direction. The collision radius for each particle is set to 5% larger than a . The time-step is set to ∆ t = η a / F . p = m = (cid:65) , and thevelocity of each particle. In general, particles in a crowded environment sediment faster than isolated particles,because their close neighbors drag fluid together with them, effectively reducing the relative friction betweenparticle and fluid. Therefore, the particles close to the cluster surface sediment much slower than the particlesclose to the cluster core. These slower particles tend to lag behind and accumulate at the trailing point of thecluster. At that point, these slow particles form a dilute structure locally, and become even slower as shownin snapshots (B) and (C) of Fig. 7. Eventually, a tail of slow particles form behind the sedimenting cluster, asillustrated in snapshot (D) of Fig. 7 showing the structure at the end of the simulation t = η a / F , where eachparticle is colored by the hydrodynamic traction induced by collisions. This is a well-known phenomena [ ] ,and has also been observed in sedimenting clouds of fibers [ ] .Because of this phenomena, the cluster of spheres becomes less dense during sedimentation. Consequently,the number of BBPGD steps necessary to resolve collisions gradually decreases. The performance of the BBPGDsolver is shown in Fig. 8, where the convergence tolerance of BBPGD is set to ε tol = − . In our computational framework there are several tunable parameters, including the order p of sphericalharmonic representation, the time-step ∆ t , the BBPGD solver tolerance ε tol , and the collision radius a c . In thissection we report the effect of these four parameters on the accuracy of the solver by repeating the sedimentationcluster simulation of 1000 spheres done in the previous subsection but for a short amount of time t = η a / F .The simulation error is measured by the relative L error of the vector (cid:85) ∈ R at the end of the simulation,relative to a more accurate reference (cid:85) re f : ε = (cid:13)(cid:13) (cid:85) − (cid:85) re f (cid:13)(cid:13) (cid:13)(cid:13) (cid:85) re f (cid:13)(cid:13) . (50)We fix m =
10 (equivalent point density along each octree box edge) for KIFMM evaluations, and the GMRESconvergence residual to 10 − .We first vary ∆ t over the range ( − ) η a / F , while fixing p = a c = a , and ε tol = − . The time-stepping as implemented in Eq. (16) has first order accuracy, and the result shown in Table 1 shows a consistentbehavior at the smaller time-steps. We note that the largest time-step, ∆ t = η a / F is 20 times larger thanthe time-step used in the previous section, and for this large time-step each particle may move 2 ∼ ∆ t / ( η a / F ) ε Table 1: The effect of varying the time-step ∆ t upon the relative error ε . The reference case uses ∆ t = η a / F , with p = a c = a ,and ε tol = − . We next study the effect of varying the BBPGD tolerance ε tol between 10 − to 10 − , while fixing other pa-rameters to p = a c = a , and ∆ t = η a / F . The results, reported in Table 2 , show that tuning ε tol igure 7: The sedimentation of a dense cluster of 1000 monodisperse spheres with volume fraction ≈ F pointing toward − z direction, the hydrodynamictraction induced by collision forces in the cluster, the collision force network corresponding to the set of constraints (cid:65) , and the velocity ofeach particle. The snapshot (D) shows the structure at the end of the simulation, at t = η a / F , where each particle is colored by thehydrodynamic traction induced by collisions. A video of this sedimentation process is available in the Supplemental Material. igure 8: The performance of the BBPGD solver for the sedimentation simulation shown in Fig. 7. The simulation starts from a collision-freeconfiguration. The time-step is ∆ t = η a / F . The convergence tolerance of BBPGD is set to ε tol = − . has little effect on the simulation accuracy. This is because the most difficult part of the BBPGD solver is theidentification of the active set (cid:65) c , i.e., determining where are the actual collisions. Once this set is identified,the BBPGD solver converges quickly, as demonstrated by the average number of BBPGD steps in Table 2, wheredecreasing ε tol by an order of magnitude only requires 2 or 3 more BBPGD steps. Dai and Fletcher [ ] havethoroughly analyzed this behavior of BBPGD algorithm. ε tol − − − ε × − × − Table 2: The effect of varying the tolerance ε tol of BBPGD on the relative error ε . The reference case uses ε tol = − . Other parametersare fixed to p = a c = a , and ∆ t = η a / F . ε = The effect of changing the order of the spherical harmonics expansion is shown in Table 3 , where p is variedfrom 8 to 20, with p =
20 used as the reference case, and with other parameters held fixed at a c = a , ε tol = − , and ∆ t = η a / F . The error ε shows slow convergence with increasing p . This is becauseparticles are very close to each other in this dense cluster simulation, and as we have shown in Fig. 4, the errorof the mobility solver decreases slowly in this regime. This is the intrinsic difficulty of resolving lubricationeffects efficiently in dynamic simulations, and remains an open problem. The reference data used in Fig. 4 iscomputed by a highly accurate method which is not generally feasible in dynamic simulations. Because of thecost in increasing the order of the spherical harmonics expansion, we do not use high order expansions in thelarge scale simulations of the next section . p ε Table 3: The effect of varying p , the order of the spherical harmonics expansion, upon the relative error ε . The reference case uses p = a c = a , ε tol = − , ∆ t = η a / F . Finally, we investigate the effect of the collision radius a c . Table 4 reports the relative error ε for a c varyingfrom 1.001 a to 1.05 a , and fixing p = ε tol = − , and ∆ t = η a / F . ε decreases monotonically withdecreasing a c , but the convergence is slow. This is again related to the difficulty in resolving lubrication effectswhen particles are close to each other, because changing a c only affects those particles that may get closer. Inthis regime the accuracy of spherical harmonics expansion is only modestly accurate, even with a high order22xpansion. Therefore, there is generally no need to pick a very small value of a c , and we will use fairly largevalues of a c in the next section for large scale simulations. a c / a − ε Table 4: The effect of varying the collision radius a c upon the relative error ε . The reference case uses a c = a , with other parametersfixed to p = ε tol = − , and ∆ t = η a / F . The sedimentation of particles randomly placed in a cubic box in an unbounded fluid is also used to benchmarkthe scaling of our implementation, using up to 80000 spheres on 1792 cores (64 nodes). Weak and strong scalingis measured on a cluster where each node is equipped with two 14-core Intel Xeon CPU E5-2680 v4 running at2.40GHz. Hyper-Threading is turned off. One
MPI rank is launched for each CPU socket, and every
MPI ranklaunches 14
OpenMP threads. In total, 28 cores are used on each node, and the nodes are interconnected by a100 Gb / s Intel Omni-Path (OPA) network. Intel MPI compilers and libraries are used, together with Intel MKLlibraries for BLAS, LAPACK and FFTW3 functions. The radius a of each particle is randomly generated from alog-normal distribution with standard parameters µ = a and σ = a . The collision radius a c = a is takenfor each sphere. The time-step is fixed at ∆ t = η a / F , where F is the sedimentation force applied on eachparticle. p = m =
10 (equivalent point density along each octree boxedge) is used for the KIFMM evaluations, throughout these benchmarks. The largest system of 8 × sphereshas 486 degrees of freedom for hydrodynamic traction on each sphere and ∼ × degrees of freedom intotal.The running time for the five major operations are measured and reported in Fig. 9, 10, and 11. ‘Far Setup’refers to the time to compute the coordinates of spherical harmonic grid points and to build an adaptive octree forKIFMM evaluations. ‘Far Traction’ refers to the evaluation of the traction operator (cid:75) in the BI equation Eq. (34).‘Far SL’ refers to the evaluation of the single layer operator (cid:83) to evaluate the fluid velocity u on grid points afterthe traction ρ is found by solving Eq. (34). ‘Near Setup’ refers to the time to perform near neighbor detection toprepare the VSH evaluations. ‘Near Correction’ refers to the time for VSH corrections. The cost of constructingthe LCP sparse matrix (cid:68) and applying the sparse matrix-vector multiplications during LCP solution is not shownhere because it is negligible compared to the total cost of mobility solutions.Fig. 9 shows the strong scaling for 10, 000 spheres on up to 224 cores (8 nodes) and Fig. 10 shows thestrong scaling for 80, 000 spheres on up to 1792 cores (64 nodes). In both cases the same initial configurationat approximately 6% volume fraction is used throughout all runs on different numbers of cores. The left panelsof Fig. 9 and 10 show the total wall time for different operations for 11 time-steps. The right panels show theaverage wall time for performing each operation once. The black dashed line denotes the ideal scaling with100% parallel efficiency for reference. In these scaling tests, spherical harmonic grids with the same order p butdifferent orientation are used for mobility and collision problems. Therefore, at each time-step ‘Far Setup’ and‘Near Setup’ are both executed twice, i.e., once for the mobility problem and once for the collision problem. The‘Far Traction’ is evaluated once at each GMRES iteration of solving the mobility problem Eq. (34), but the ‘Far SL’is evaluated only when GMRES converges. In general, ‘Far Setup’ and ‘Far SL’ are executed the same number oftimes. The parallel efficiency of the ‘Near Correction’ part is close to ideal except for the largest case on 1792 cores,shown in Fig. 10, where we suspect there is some load-unbalancing due to domain decomposition. The efficiencyof ‘Far Traction’ and ‘Far SL’ are a bit lower than the near correction part, but they are consistent with the resultsdemonstrated for the library PVFMM [ ] , which we used to perform KIFMM evaluations. The efficiency of ‘FarSetup’ is even lower, but since it runs only once per time-step and takes only a small fraction of the total runningtime, there is almost no noticeable benefit in optimizing this part. The wall time for ‘Near Setup’ is negligible andinvisible in both Fig. 9 and Fig. 11. 23 igure 9: The strong scaling of 10, 000 spheres for 11 time-steps on up to 224 cores. The left panel shows the total time for each operation.‘Far Setup’ and ‘Near Setup’ both run 22 times. ‘Far SL’ runs 21 times and ‘Far Traction’ runs 298 times. ‘Near Correction’ runs 21 + = × spheres for 11 time-steps on up to 1792 cores. The left panel shows the total time for each operation.‘Far Setup’ and ‘Near Setup’ both run 22 times. ‘Far SL’ runs 21 times and ‘Far Traction’ runs 298 times. ‘Near Correction’ runs 21 + = .5.2. Weak scaling Since the cost of performing near corrections grows with the number of near pairs, we keep the system volumefraction approximately constant by adjusting the box size according to the number of particles. Fig. 11 showsthe average time for performing each operation once during a 10-time-step simulation, for two volume fractions3% and 6%. Ideally a flat line is expected if the parallel efficiency is 100%. Here the running time of ‘Far Setup’grows faster but other operations are not significantly far from the ideal case. This non-ideal scaling of ‘Far Setup’does not matter in real simulations because this operation is performed only once when particles move, that is,once for each time-step. Therefore the net cost of this ‘Far Setup’ is far less than the total cost in other operations,as can be seen from the left panels in Fig. 9 and Fig. 10.
Figure 11: The weak scaling at different volume fractions. The time for performing each operation once is measured. The left panel showsthe scaling with 625 spheres per node, ranging from 2500 spheres (4 nodes) to 4 × spheres (64 nodes). The right panel shows the scalingwith spheres per node, ranging from 5000 spheres (4 nodes) to 80, 000 spheres (64 nodes).
7. An active-matter case study: Suspensions of Stokes rotors
Recently, suspensions of active particles have been intensely investigated as real-world realizations of activematter [ ] . Active matter refers to multiscale materials whose microscopic constituents perform directed work onthe system, leading to large-scale dynamics such as self-organization. A canonical example of an active suspensionis a bacterial bath, wherein microorganisms interact through the flow-fields created by their self-propulsion [ ] .These systems can evince instabilities towards alignment and unpredictable large-scale flows sometimes termed“bacterial turbulence” [ ] . Similarly complex dynamics appears in suspensions of microtubules which are“polarity sorted” by the directed motion of cross-linking motor-proteins [ ] .A very different kind of active suspension consists of immersed particles that are driven to rotate, say ratherthan swim or sort, with that rotation again creating flow fields that can create large-scale coupling and dynamics.Such rotor systems are typically – but not always [ ] – driven by external means, such as a rotating magneticfield. Such systems can show activity-induced phase separation [ ] , crystallization [ ] , odd surface dynamicsand rheology [ ] , and forms of active turbulence [ ] .Here we use the methods developed here to study the dynamics of closely packed rotor systems, showing thedevelopment of large-scale dynamics. In each example we assume the same external torque T is applied to everyparticle. Within each time-step, we first solve a mobility problem to compute the velocity (cid:85) nc driven by T . Thisis followed by applying the collision resolution method described in Section §3 to compute the collision velocity (cid:85) c . A fixed time-step ∆ t is used throughout, and taken large enough so that each particle may move about20% of its radius at each time-step. This time-step is roughly two to three orders of magnitude larger than theusual choices in Stokes suspension simulations [
9, 21 ] . We do this to demonstrate the stability of our collisionresolution algorithm. For all simulations in this section, p = m = .1. A cluster of
10, 000 rotors
We first probe the dynamics of a spherical cluster of 10, 000 polydisperse (in diameter) spherical rotors in anunbounded fluid. A constant torque T along the z -axis is applied to each sphere. The radius of each particle israndomly generated from a log-normal distribution with standard parameters µ = a and σ = a , where theprobability density function is defined as p ( x ) = x σ (cid:112) π exp (cid:128) − ( ln x − µ ) σ (cid:138) . The cluster is approximately sphericaland the volume fraction of spheres is approximately 10%. A fixed time-step, ∆ t = η a / T , is used. Thecollision radius a c is set to 10% larger than the radius for each particle.In the absence of hydrodynamic interactions, each particle would rotate at constant rate about its z − axis,with larger particles rotating more slowly than smaller ones (given the constant driving torque). Figure 12,at t = η a / T , shows the effect of hydrodynamic and steric coupling in creating large-scale rotation of thecluster. The left panel shows the hydrodynamic traction magnitude across each rotor surface. The middle panelshows the instantaneous velocity magnitude of each particle, where blue ones are slow and green ones are fast.The right panel shows the trajectory of each particle, starting from t =
0, and colored by each particle’s velocitymagnitude at each time-step. The trajectories show that this cluster is rotating relative to a common axis in the z direction through the cluster center. The simulation runs till t = η a / T , and no visible expansion or shrinkingof this cluster is observed. Figure 12: Snapshots at t = η a / T of a cluster driven by a fixed torque T on each sphere. The left panel shows the hydrodynamic tractionmagnitude across each rotor surface. The middle and the right panels are both colored by the instantaneous center velocity magnitude of eachsphere at t = η a / T , showing the particle structure at this time-step and the particle trajectories starting from the initial configuration,respectively. A video of this simulation is available in the Supplemental Material. To analyze this global rotation, we set up a cylindrical coordinate system ( r , θ , z ) , where r , z = U of each particle onto this cylindrical coordinatesystem and take the angular component U θ . Then we estimate the normalized distribution P ( U θ , r ) using dataaccumulated over this entire simulation. The result is shown as a two dimensional histogram in Fig. 13. Thedistribution P ( U θ , r ) shows that the overall motion is close to that of a rigid body rotation U θ ∝ r , where particleswithin the cluster all rotate about the central z -axis with roughly the same angular velocity. Only relatively fewparticles, far from the cluster center with approximately r / a >
45, moves more slowly than the cluster’s globalrotation. The expectation is that collectively induced velocities will decay as r − for r >>
1, as the cluster willappear as a rotlet singularity in the far-field.In this system, for each time step approximately 2000 possible collisions are included in the collision resolutionsolver, and approximately 250 collisions actually happen. That is, the size of the set (cid:65) is roughly 2000 and thesize of the set (cid:65) c is roughly 250. The performance of the collision resolution algorithm over the course of thesimulation is shown in Fig. 14. The simulation starts from a collision-free configuration, and then collisions aregenerated and reaches a steady state as the cluster remains close to a rigid body rotation, as shown above. Inthe beginning roughly 14 BBPGD steps are necessary to resolve the collisions and later this number increasesto roughly 20. In comparison to the sedimentation case reported in Section §6, this rotor simulation involves10 times the number of particles but the number of necessary BBPGD steps only slightly increases. Empirically,the number of BBPGD steps scales much slower than the number of particles. This feature makes this collision26 igure 13: The normalized velocity distribution P ( U θ , r ) for rotors in the cluster. The distribution P is normalized so that ´ ∞ ´ ∞ P ( U θ , r ) π rdrdU =
1. The data is accumulated over the entire simulation of t = η a / T . resolution algorithm suitable for large scale simulations. Figure 14: The performance of the BBPGD solver for 10, 000 rotors in a spherical cluster. The simulation starts from a configuration withmany collisions. The time-step ∆ t = η a / T . The convergence tolerance of BBPGD is set to ε tol = − .
20, 000 rotors
Recent experiments and simulations have studied the dynamics of rotors confined to a surface within a fluid [ ] , or atop a solid substrate [ ] , with the driving external torque perpendicular to the surface or substrate.We use this setup to analyze the internal processes of a cluster of rotors, most especially the evolution of thecollision network, using a disk of 20, 000 monodisperse rotors confined in a monolayer, as shown in Fig. 1. Weset ∆ t = η a / T . The area fraction of particles is approximately 60% for this simulation, much denser thanthe previous example. As the collision radius of each particle is also set to a c = a , the effective area fractionfor collision resolution is around 70%. As a result, the rotors show a good deal of hexagonal ordering, as shownin Fig. 15. Even at such high densities, the BBPGD collision resolution solver takes about 20 descent steps pertime step, similar to the spherical cluster example shown in Fig. 14.The dynamics of the rotors is detailed in Fig. 15, where the hydrodynamic force density, i.e. the traction f , thenet collision force (cid:70) c on each particle, and the collision force magnitude γ for each contact constraint are shownin the left, center, and right panels, respectively. Figure 15 (A) and (B) show a snapshot at times t = η a / T and t = η a / T , respectively. The comparison between (A) and (B) clearly indicates that there are manysmall-scale motions caused by collisions, while the entire cluster is rotating collectively and differentially.To examine this rotation, we compute the normalized distribution P ( U θ , r ) as in Fig. 13. This is shown inFig. 16. The radius of the disk is R = a , and U θ ∝ r holds only near the center of the disk, and at the disk27 igure 15: Snapshots of a quarter of the rotor disk at times t = η a / T (A) and 350 η a / T (B). Counterclockwise global rotation is drivenby the torque T on each rotor, perpendicular to the disk. For (A) and (B), the left panels show the magnitude of hydrodynamic tractioninduced by collisions on each rotor surface, the middle panel shows the net collision force on each particle, and the right panel shows thecollision force γ (cid:96) on each contact constraint (cid:96) . The rightmost dashed circle marks a set of rotors that form a transient collision chain appearingin (B) only. Each of the other two white dashed circles on the left in (A) and (B) mark a single persistent collision chain. A video of thissimulation is available in the Supplemental Material. U θ ( r ) with a logarithmic divergence at the disk edge: U θ T / ( η a ) ≈ A log ( R − r ) + B , as r → R . (51)This calculation is detailed in Appendix B for the reader’s interest. In Fig. 16 we show a fit (black dashed curve)to this form, i.e. for A and B , from our numerical data in the range r > a . It appears that the simulation hasachieved sufficient scale to describe well a continuum of forced particles. Figure 16: The normalized velocity distribution P ( U θ , r ) for rotors in the monolayer disk. The distribution P is normalized so that ´ ∞ ´ ∞ P ( U θ , r ) π rdrdU =
1. The data is accumulated over the entire simulation of t = η a / T . The dashed curve is the function A log ( R − r ) + B , where A and B are estimated by fitting to the simulation data in the range r / a > To quantify the time-scales induced by collisions, we analyze the lifetime of each collision constraint withcollision force γ (cid:96) , for both the set (cid:65) for all constraints, and its subset (cid:65) c for active constraints. For the set (cid:65) thelifetime of a constraint is defined as the number of time-steps k during which the constraint remain included inthe LCP solver. For the subset (cid:65) c , the lifetime is defined as the number of time-steps during which the solution γ (cid:96) > (cid:96) . The normalized distributions for these two cases are shown in Fig. 17. Figure 17: The normalized distribution P ( k ) of constraint lifetimes. P ( k ) is normalized so that (cid:80) k max P ( k ) =
1, where k is the number oftime-steps, and k max is the total number of time-steps. The blue and yellow distributions show P ( k ) for the active constraint set (cid:65) c and theall constraint set (cid:65) , respectively. Figure 17 shows that while many constraints in the active set (cid:65) c last for only 1 time-step, there is a broadregion of short-lived constraints lasting between 2 and 20 time-steps. These longer-lived active constraints are of29elatively lower probability, but are not rare. However, almost no constraints in the set (cid:65) have a lifetime shorterthan 10 time-steps. The majority of them last more than 200 time-steps. This is because the set (cid:65) depends solelyon the particles’ geometric configuration. While the rotation rate is differential in this simulation, neighboringpairs of particles rarely change their relative positions, and so the set (cid:65) barely changes from step to step. Thisis shown in the right panels of Fig. 15(A) and (B). In contrast, the active constraint set (cid:65) c depends not only ongeometry, but also on the velocity (cid:85) nc , i.e., the particles’ tendency to collide with each other. For this many-body problem, slight relative displacements of particles may induce significant changes in relative velocities, andgenerate very different (cid:65) c at different time-steps. As a final note, collisions may form ‘collision force chains’when a few close-by particles run into one another like a chain. The two white dashed circles on the left inFig. 15(A) and (B) mark persisting collision chains, while the rightmost white dashed circles mark a transientcollision chain appearing in (B) only.
20, 000 rotors
In a last example, we place a square sheet of 20, 000 rotors on the z = T on each particle but now along the y -axis. Having the putative rotation axis aligned with the layer is a verydifferent kind of forcing from the previous examples, and is conceptually akin to a vortex layer or sheet immersedin an inviscid fluid. We set ∆ t = η a / T . The initial area fraction is still ≈
60% and we set the collision radius a c = a as in the previous examples.The top and side views of simulation snapshots are shown in Fig. 18 and 19, where subfigures (A) - (D) showthe magnitude of hydrodynamic traction, the collision force on each particle, and the collision force γ (cid:96) on eachconstraint, at times t = η a / T , 170 η a / T , 300 η a / T and 689 η a / T .Fig. 18(A) shows the initial stage of system evolution, where particles driven by the torque tend to roll overeach other and generate many collisions within the sheet, but with the activated constraints (right panel) beingrather isolated instead of forming chains. At t = η a / T , (B) shows the peak time of collision force in thissimulated process and, as shown in the right panel, that many force chains are forming. Later, as shown insubfigure (C), the collision force decreases because narrow regions void of particles start to form. The moststriking feature is the formation of strings of particles, or rollers in the vortical dynamics parlance, along thedirection of the torque T , as shown in Fig. 18(D). These are reminiscent, perhaps, of the Kelvin-Helmholtz rollsthat form from flat vortex sheets and layers [ ] .The formation of these rollers can also be seen from the side view Fig. 20, where the arrangement is similarto Fig. 19 but only 10% of particles close to the edge are shown. Inside each roller, the hydrodynamic tractiondistributed across the rotor surface induced by the torque T and collisions can be clearly seen. The formationof these chains of rotors are related to the flow generated by the torques. Since all rotors are rotating alongthe same + y direction, a global flow is induced towards the + x direction above the sheet, and towards the − x direction below the sheet. This causes a jump of fluid velocity across the sheet. Further, because collectively thesheet maintains the thin layer geometry as shown in the side view Fig. 19, this jump of fluid velocity persists fora long time, and keeps driving the formation of chains. Again, this is similar to the Kelvin-Helmholtz instability,where the fluid velocity jumps across a vortical layer separating two fluids. However, for our rotor system theReynolds number is zero, rather than infinity, and reflects the precise balance of drive and dissipation, rather thanof dissipationless conservation laws and Hamiltonian structure (though see [ ] ). A complete investigation ofthese analogies is beyond the scope of this work on numerical methods , and we leave it for a future study.30 igure 18: The top view of simulation snapshots for 20, 000 rotors on the z = T aligned with the y -axis. (A),(B), (C), and (D) are taken at t = η a / T , 170 η a / T , 300 η a / T and 689 η a / T , respectively. The left panels show the magnitude ofhydrodynamic traction induced by collisions on each rotor surface, the middle panels show the net collision force on each particle, and theright panels show the collision force γ (cid:96) on each contact constraint (cid:96) . This arrangement is the same as the previous example Fig. 15. A videoof this simulation is available in the Supplemental Material. igure 19: A side view of simulation snapshots for 20, 000 rotors on the z = T aligned with the y -axis. The threepanels of (A), (B), (C), and (D) show the same visualizations as in Fig. 18, viewed from the negative y -axis.Figure 20: The side view of simulation snapshots at t = η a / T , showing only those particles close to the bottom edge of the structureshown in Fig. 18. The left panels in (A) and (B) show the hydrodynamic traction induced by the torque T on each particle, and the rightpanels show the hydrodynamic traction induced by collisions. The colormap is the same as the left panels in Fig. 18. . Conclusions In this work we described a computational framework for simulating particulate Stokes suspensions. A keycomponent is the collision resolution algorithm we extended from the LCP method for underdamped (inertial)granular flow [
35, 39 ] to overdamped Stokes suspensions. The LCP is constructed at every time-step basedon the non-overlapping geometric constraints and coupled to an explicit time-stepping scheme. The LCP is thenconverted to a CQP, utilizing the fact that the mobility matrix (cid:77) is SPD, which is efficiently solved with the BBPGDmethod. This collision resolution algorithm addresses two important drawbacks in traditional collision resolutionmethods based on pairwise repulsive potentials: (i) the temporal stiffness induced by repulsive potentials, and(ii) the particles becoming effectively soft since the repulsive potentials cannot be infinitely stiff. This collisionresolution method does not require explicit construction of the mobility matrix (cid:77) . Any mobility solver that isable to compute (cid:85) with given force (cid:70) can be used within this collision resolution algorithm as long as (cid:77) iskept SPD. Further, the particles do not have to be spherical [ ] .We then demonstrated the application of this method to suspensions of spherical particles, where the mobilitysolver is based on a new second-kind BI equation [ ] . In particular, VSH expansions [ ] are utilized to maintainhigh accuracy of the BI operators for close pairs of spheres. Consequently, this specialized mobility solver is well-conditioned in all test cases, even when particles are very close. The stability and scalability of our algorithm isdemonstrated in Section §6, where we implemented these methods with full MPI and
OpenMP parallelism. In thesedimentation and rotation tests, time-stepping remains stable although very large time-steps are purposefullyused to demonstrate stability, even when particles move about 10 ∼
20% of their radii within one time-step. Inscalability benchmarks, systems of up to 8 × spheres on 1792 cores are demonstrated. We believe the codecan be successfully scaled to much larger systems on larger machines. Finally, in Section §7, we demonstrated theapplication of our method to suspensions of driven rotors, which illustrate the collectively-induced system-scalerotational motions, intricate microscopic collision networks, and a Kelvin-Helmholtz-like instability.In comparison to a few other methods based on geometric constraints [
29, 61 ] , a key advantage of our methodis that the collision force between each collision pair is individually computed and recorded. This preserves theentire collision force network information, which is necessary for computing the system collision stress. Ourapproach requires computing the collision force as a separate mobility problem, instead of embedding the mini-mization problem into the mobility solver, as done by Lu et al. [
61, 62 ] in their work on deformable bodies (notethat while the LCP collision resolution method is derived for rigid particles, it does allow the particles to deformoutside this collision resolution stage). Therefore, our method has a higher cost because the dense mobility ma-trix (cid:77) appears in the matrix A of the LCP (20). This extra cost can potentially be reduced by the matrix-splittingsubspace optimization method [ ] , where in most stages of the minimization process only a block-diagonalpart of (cid:77) is used. We leave this to future work.In other work we have demonstrated the applicability of our collision resolution scheme to evolving assembliesof Brownian spherocylinders [ ] . There, we simulated dynamics of a system of growing and dividing cells, wherecell sizes increase when they grow and decrease when they divide, and steric interactions are central to how the“colony” grows. Similarly, this computational framework can be directly applied to many other interesting physicsand engineering problems, such as confined suspensions of swimmers [ ] and cell packing in biofilms [ ] .
9. Acknowledgements
We thank E. Lushi for useful conversations. EC and SV acknowledge support from NSF under grants DMS-1454010 and DMS-1719834. DM acknowledges support from the Office of Naval Research under award numberN00014-17-1-2451 and Simons Foundation / SFARI(560651, AB). The work of SV was also supported by theFlatiron Institute, a division of the Simons Foundation. MJS acknowledges the support of NSF Grants DMR-1420073 (NYU MRSEC), DMS-1463962, and DMS-1620331.Our implementation of this framework will be released on GitHub (https: // github.com / wenyan4work / SphereSimulator)as an open-source software following the publication of this article.33 ppendix A. The VSH expansion of traction operator
The first row of Eq. (49): g rrnm = (cid:0) n f Wnm (cid:48) ( r ) − ( n + ) f Vnm (cid:48) ( r ) (cid:1) Y mn ( θ , φ ) (A.1) g r θ nm = Y mn ( θ , φ )( − m ( n + ) f Vnm ( r ) cot ( θ ) + m ( n − ) f Wnm ( r ) cot ( θ ) + im f Xnm ( r ) csc ( θ )) r − e − i φ (cid:112) − m ( m + ) + n + n (( n + ) f Vnm ( r ) − n f Wnm ( r ) + f Wnm ( r )) Y m + n ( θ , φ ) r (A.2) g r φ nm = me − i φ csc ( θ )( sin ( φ ) − i cos ( φ )) Y mn ( θ , φ )(( n + ) f Vnm ( r ) − n f Wnm ( r ) + f Wnm ( r ) − i f Xnm ( r ) cos ( θ )) r − e − i φ f Xnm ( r ) (cid:112) − m ( m + ) + n + nY m + n ( θ , φ ) r (A.3)The second row: g θ rnm = m csc ( θ ) Y mn ( θ , φ ) (cid:0) cos ( θ ) (cid:0) f Vnm (cid:48) ( r ) + f Wnm (cid:48) ( r ) (cid:1) − i f Xnm (cid:48) ( r ) (cid:1) + e − i φ Y m + n ( θ , φ ) (cid:128)(cid:198) ( n − m )( m + n + ) f Vnm (cid:48) ( r ) + (cid:198) ( n − m )( m + n + ) f Wnm (cid:48) ( r ) (cid:138) (A.4) g θθ nm = (cid:20) f Vnm ( r ) (cid:0) − m (cid:0) csc ( θ ) − m cot ( θ ) (cid:1) − n − (cid:1) r + f Wnm ( r ) (cid:0) m cot ( θ ) − m csc ( θ ) + n (cid:1) r − i ( m − ) m f Xnm ( r ) cot ( θ ) csc ( θ ) r (cid:21) Y mn ( θ , φ )+ (cid:130) e − i φ (cid:112) − m ( m + ) + n + n csc ( θ )(( m + ) cos ( θ )( f Vnm ( r ) + f Wnm ( r )) − im f Xnm ( r )) r (cid:140) Y m + n ( θ , φ )+ (cid:130) e − i φ (cid:112) ( m − n )( m − n + )( m + n + )( m + n + )( f Vnm ( r ) + f Wnm ( r )) r (cid:140) Y m + n ( θ , φ ) (A.5) g θφ nm = mY mn ( θ , φ )( f Xnm ( r ) + ( m − ) csc ( θ )( f Xnm ( r ) csc ( θ ) + i cot ( θ )( f Vnm ( r ) + f Wnm ( r )))) r + (cid:20) ime − i φ f Vnm ( r ) (cid:112) − m ( m + ) + n + n csc ( θ ) r + ime − i φ f Wnm ( r ) (cid:112) − m ( m + ) + n + n csc ( θ ) r − e − i φ f Xnm ( r ) (cid:112) − m ( m + ) + n + n cot ( θ ) r (cid:21) Y m + n ( θ , φ ) (A.6)34he third row: g φ rnm = m csc ( θ ) Y mn ( θ , φ ) (cid:0) cos ( θ ) f Xnm (cid:48) ( r ) + i (cid:0) f Vnm (cid:48) ( r ) + f Wnm (cid:48) ( r ) (cid:1)(cid:1) + e − i φ (cid:198) ( n − m )( m + n + ) f Xnm (cid:48) ( r ) Y m + n ( θ , φ ) (A.7) g φθ nm = (cid:20) i ( m − ) m f Vnm ( r ) cot ( θ ) csc ( θ ) r + i ( m − ) m f Wnm ( r ) cot ( θ ) csc ( θ ) r + m f Xnm ( r ) csc ( θ )( m cos ( θ ) + m − ) r (cid:21) Y mn ( θ , φ ) (cid:20) ime − i φ f Vnm ( r ) (cid:112) − m ( m + ) + n + n csc ( θ ) r + ime − i φ f Wnm ( r ) (cid:112) − m ( m + ) + n + n csc ( θ ) r + ( m + ) e − i φ f Xnm ( r ) (cid:112) − m ( m + ) + n + n cot ( θ ) r (cid:21) Y m + n ( θ , φ ) (cid:20) ime − i φ f Vnm ( r ) (cid:112) − m ( m + ) + n + n csc ( θ ) r + ime − i φ f Wnm ( r ) (cid:112) − m ( m + ) + n + n csc ( θ ) r + ( m + ) e − i φ f Xnm ( r ) (cid:112) − m ( m + ) + n + n cot ( θ ) r (cid:21) Y m + n ( θ , φ ) (A.8) g φφ nm = (cid:20) f Vnm ( r ) (cid:0) − ( m − ) m csc ( θ ) − m − n − (cid:1) r + f Wnm ( r ) (cid:0) m (cid:0) cot ( θ ) − m csc ( θ ) (cid:1) + n (cid:1) r + i ( m − ) m f Xnm ( r ) cot ( θ ) csc ( θ ) r (cid:21) Y mn ( θ , φ ) (cid:20) e − i φ f Vnm ( r ) (cid:112) − m ( m + ) + n + n cot ( θ ) r + e − i φ f Wnm ( r ) (cid:112) − m ( m + ) + n + n cot ( θ ) r + ime − i φ f Xnm ( r ) (cid:112) − m ( m + ) + n + n csc ( θ ) r (cid:21) Y m + n ( θ , φ ) (A.9) Appendix B. The singularity of velocity close to the disk edge of Stokes rotlets
Appendix B.1. Geometry and setup
We consider a domain D of a disk on z = D = { ( x , y ) | x + y < R } (B.1)We denote the number density of particles within this disk as n . A constant torque T toward + z direction isexerted on each particle. We assume that each particle follows the fluid velocity induced by other particles. Eachparticle induces a rotational fluid flow u j , as given by the Stokes rotlet velocity field: u j = ε jlm πµ r m r T l , (B.2)where ε jlm is the Levi-Civita tensor. r is the vector pointing from the rotlet to the point where u j is evaluated.Due to the symmetry of this disk D , we consider a point ( s , 0 ) on the x -axis. The velocity u at this point isaligned with the y -axis, given by an integral over the rotlets on this entire disk:8 πµ nT u ( s ) = R π r ˆ π s − r cos θ (cid:0) ( s − r cos θ ) + r sin θ (cid:1) d θ d r , (B.3)35here r , θ is the cylindrical coordinate system used to denote the rotlet point in this disk of radius R . ffl denotesthe principal value of this integral, because this integral involves a high order of singularity at the point ( r , 0 ) .The integral over θ can be computed to get the closed form: F ( r ) = K (cid:128) rs ( r + s ) (cid:138) s ( r + s ) + E (cid:128) rs ( r + s ) (cid:138) s ( s − r ) , (B.4)where K ( x ) is the complete elliptic integral of the first kind [ ] , and E ( x ) is the complete elliptic integral ofthe second kind [ ] . K ( x ) is singular at x =
1, i.e., r = s . So the two terms of F ( r ) are both singular at r = s .As a result, the velocity u ( s ) becomes: 8 πµ nT u ( s ) = R F ( r ) π rd r , (B.5)where 0 < s < R . The principal value can be computed as this limit as δ → + :8 πµ nT u ( s ) = lim δ → + (cid:18) ˆ s − δ F ( r ) π rd r + ˆ Rs + δ F ( r ) π rd r (cid:19) . (B.6) Appendix B.2. Principal value and singularity
Now we discuss the singularity in a small region [ s − a , s + b ] around s , where s − a < s − δ < s < s + δ < s + b .In this small region, we use the asymptotic expansion of the integrand r F ( r ) around r = s [ ] : r F ( r ) ≈ − ( s − r ) − log (cid:0) s (cid:1) − + ( ) s − r − s when r < s (B.7) ≈ − ( r − s ) − log (cid:0) s (cid:1) − + ( ) s − r − s when r > s (B.8)In the small region [ s − a , s + b ] , the integral can be asymptotically computed: s + bs − a r F ( r ) d r = ˆ s − δ s − a F ( r ) rd r + ˆ s + bs + δ F ( r ) rd r (B.9) ≈ ( δ − a ) log (cid:0) s (cid:1) + s log (cid:0) a δ (cid:1) + ( log ( ) − )( a − δ ) − a log ( a ) + δ log ( δ ) s + ( b − δ ) (cid:0) − log (cid:0) s (cid:1) − + log ( ) (cid:1) + s log (cid:0) δ b (cid:1) − b log ( b ) + δ log ( δ ) s (B.10) = ( log ( ) − )( a + b − δ ) − ( a + b ) log (cid:0) s (cid:1) + s log (cid:0) ab (cid:1) − a log ( a ) − b log ( b ) + δ log (cid:128) δ s (cid:138) s (B.11)The limit as δ → δ → s + bs − a r F ( r ) d r = (cid:16) ab (cid:17) + − a log (cid:0) s (cid:1) − a + a log ( ) − a log ( a ) − b log (cid:0) s (cid:1) − b + b log ( ) − b log ( b ) s (B.12)This gives the behavior of the singularity around the point ( s , 0 ) .When the point ( s , 0 ) is close to the edge of the disk, i.e., s → R − : b = R − s → + (B.13)36he behavior of this pole can be calculated as the limit of b → + : (cid:18) lim δ → ˆ s + bs − a r F ( r ) d r (cid:19) = − b − R b log b + O ( b ) + C ( a ) as b → + (B.14)where we have used s → R − . O ( b ) is of order b or higher. C ( a ) is a function involving a and R only.In sum, for a target point at ( s , 0 ) close to the disk edge, the velocity shows a logarithm singularity dominatedby the log b term: u = A log ( R − s ) + B + O (cid:129) R − sR log ( R − s ) (cid:139) , (B.15)where A and B are two constants to be determined. ReferencesReferences [ ] N. J. Wagner, J. F. Brady, Shear thickening in colloidal dispersions, Physics Today 62 (2009) 27–32. doi: . [ ] Y. S. Lee, E. D. Wetzel, N. J. Wagner, The ballistic impact characteristics of kevlar (r) woven fabrics impregnated with a colloidal shearthickening fluid, Journal of Materials Science 38 (2003) 2825–2833. doi:
Doi10.1023/A:1024424200221 . [ ] V. Trappe, V. Prasad, L. Cipelletti, P. N. Segre, D. A. Weitz, Jamming phase diagram for attractive particles, Nature 411 (2001) 772–775.doi: . [ ] Anderson, Lekkerkerker, Insights into phase transition kinetics from colloid science, Nature (2002). [ ] D. Saintillan, M. J. Shelley, Active suspensions and their nonlinear models, Comptes Rendus Physique 14 (2013) 497–517. doi: . [ ] M. C. Marchetti, J.-F. Joanny, S. Ramaswamy, T. B. Liverpool, J. Prost, M. Rao, R. A. Simha, Hydrodynamics of soft active matter,Reviews of Modern Physics 85 (2013) 1143. [ ] M. J. Shelley, The dynamics of microtubule / motor-protein assemblies in biology and physics, Annual Review of Fluid Mechanics 48(2016) 487–506. [ ] D. Needleman, Z. Dogic, Active matter at the interface between materials science and cell biology, Nature Reviews Materials 2 (2017)17048. [ ] D. R. Foss, J. F. Brady, Brownian dynamics simulation of hard-sphere colloidal dispersions, Journal of Rheology 44 (2000) 629–651.doi:
Doi10.1122/1.551104 . [ ] J. Rotne, S. Prager, Variational treatment of hydrodynamic interaction in polymers, The Journal of Chemical Physics 50 (1969)4831–4837. doi: . [ ] H. Yamakawa, Transport properties of polymer chains in dilute solution: Hydrodynamic interaction, The Journal of Chemical Physics53 (1970) 436–443. doi: . [ ] P. J. Zuk, E. Wajnryb, K. A. Mizerski, P. Szymczak, Rotne-Prager-Yamakawa approximation for different-sized particles in applicationto macromolecular bead models, Journal of Fluid Mechanics 741 (2014). URL: http://journals.cambridge.org/article_S002211201300668X . doi: . [ ] E. Wajnryb, K. A. Mizerski, P. J. Zuk, P. Szymczak, Generalization of the Rotne-Prager-Yamakawa mobility and shear disturbancetensors, Journal of Fluid Mechanics 731 (2013). URL: http://journals.cambridge.org/article_S0022112013004023 .doi: . [ ] K. A. Mizerski, E. Wajnryb, P. J. Zuk, P. Szymczak, The Rotne-Prager-Yamakawa approximation for periodic systems in a shear flow,The Journal of Chemical Physics 140 (2014) 184103. doi: . [ ] C. W. J. Beenakker, Ewald sum of the Rotne-Prager tensor, Journal of Chemical Physics 85 (1986) 1581–1582. doi:
Doi10.1063/1.451199 . [ ] Z. Liang, Z. Gimbutas, L. Greengard, J. Huang, S. Jiang, A fast multipole method for the Rotne-Prager-Yamakawa tensor and itsapplications, Journal of Computational Physics 234 (2013) 133–139. doi: . [ ] W. Guan, X. Cheng, J. Huang, G. Huber, W. Li, J. A. McCammon, B. Zhang, RPYFMM: Parallel adaptive fast multipole method forRotne-Prager-Yamakawa tensor in biomolecular hydrodynamics simulations, Computer Physics Communications 227 (2018) 99–108.doi: . [ ] L. Durlofsky, J. F. Brady, G. Bossis, Dynamic simulation of hydrodynamically interacting particles, Journal of Fluid Mechanics 180(1987) 21–49. doi:
Doi10.1017/S002211208700171x . [ ] J. F. Brady, G. Bossis, Stokesian dynamics, Annual Review of Fluid Mechanics 20 (1988) 111–157. doi: . [ ] T. N. Phung, J. F. Brady, G. Bossis, Stokesian dynamics simulation of brownian suspensions, Journal of Fluid Mechanics 313 (1996)181–207. doi: . [ ] A. Sierou, J. F. Brady, Accelerated stokesian dynamics simulations, Journal of Fluid Mechanics 448 (2001) 115–146. doi: . ] M. Wang, J. F. Brady, Spectral ewald acceleration of stokesian dynamics for polydisperse suspensions, Journal of ComputationalPhysics 306 (2016) 443–477. [ ] R. T. Bonnecaze, J. F. Brady, Yield stresses in electrorheological fluids, Journal of Rheology 36 (1992) 73–115. doi:
Doi10.1122/1.550343 . [ ] R. T. Bonnecaze, J. F. Brady, Dynamic simulation of an electrorheological fluid, Journal of Chemical Physics 96 (1992) 2183–2202.doi:
Doi10.1063/1.462070 . [ ] I. L. Claeys, J. F. Brady, Suspensions of prolate spheroids in stokes flow. part 1. dynamics of a finite number of particles in an unboundedfluid, Journal of Fluid Mechanics 251 (1993-06) 411–442. doi: . [ ] S. Kim, S. J. Karrila, Microhydrodynamics: Principles and Selected Applications, Courier Corporation, 2005. [ ] D. J. Pine, J. P. Gollub, J. F. Brady, A. M. Leshansky, Chaos and threshold for irreversibility in sheared suspensions, Nature 438 (2005)997. [ ] J. a. Janela, A. Lefebvre, B. Maury, A penalty method for the simulation of fluid - rigid body interaction, ESAIM: Proceedings 14(2005) 115–123. doi: . [ ] B. Maury, A time-stepping scheme for inelastic collisions, Numerische Mathematik 102 (2006) 649–679. doi: . [ ] P. A. Cundall, O. D. Strack, A discrete numerical model for granular assemblies, geotechnique 29 (1979) 47–65. [ ] P. Lötstedt, Mechanical Systems of Rigid Bodies Subject to Unilateral Constraints, SIAM Journal on Applied Mathematics 42 (1982)281–296. doi: . [ ] Chunsheng Cai, B. Roth, On the spatial motion of a rigid body with point contact, in: 1987 IEEE International Conference on Roboticsand Automation Proceedings, volume 4, 1987, pp. 686–695. doi: . [ ] D. J. Montana, The Kinematics of Contact and Grasp, The International Journal of Robotics Research 7 (1988) 17–32. doi: . [ ] D. Baraff, Issues in computing contact forces for non-penetrating rigid bodies, Algorithmica 10 (1993) 292. doi: . [ ] M. Anitescu, J. F. Cremer, F. A. Potra, Formulating Three-Dimensional Contact Dynamics Problems, Mechanics of Structures andMachines 24 (1996) 405–437. doi: . [ ] D. E. Stewart, J. C. Trinkle, An implicit time-stepping scheme for rigid body dynamics with inelastic collisions and coulomb friction,International Journal for Numerical Methods in Engineering 39 (1996) 2673–2691. [ ] M. Anitescu, F. A. Potra, Formulating Dynamic Multi-Rigid-Body Contact Problems with Friction as Solvable Linear ComplementarityProblems, Nonlinear Dynamics 14 (1997) 231–247. doi: . [ ] D. E. Stewart, Convergence of a Time-Stepping Scheme for Rigid-Body Dynamics and Resolution of Painlevé’s Problem, Archive forRational Mechanics and Analysis 145 (1998) 215–260. doi: . [ ] M. Anitescu, F. A. Potra, D. E. Stewart, Time-stepping for three-dimensional rigid body dynamics, Computer Methods in AppliedMechanics and Engineering 177 (1999) 183–197. doi: . [ ] D. Stewart, Rigid-Body Dynamics with Friction and Impact, SIAM Review 42 (2000) 3–39. doi: . [ ] M. Anitescu, F. A. Potra, A time-stepping method for stiff multibody dynamics with contact and friction, International Journal forNumerical Methods in Engineering 55 (2002) 753–784. doi: . [ ] M. Anitescu, G. D. Hart, A constraint-stabilized time-stepping approach for rigid multibody dynamics with joints, contact and friction,International Journal for Numerical Methods in Engineering 60 (2004) 2335–2371. doi: . [ ] M. Anitescu, Optimization-based simulation of nonsmooth rigid multibody dynamics, Mathematical Programming 105 (2006) 113–143. doi: . [ ] A. Tasora, D. Negrut, M. Anitescu, Large-scale parallel multi-body dynamics with frictional contact on the graphical processing unit,Proceedings of the Institution of Mechanical Engineers, Part K: Journal of Multi-body Dynamics 222 (2008) 315–326. doi: . [ ] A. Tasora, M. Anitescu, A Fast NCP Solver for Large Rigid-Body Problems with Contacts, Friction, and Joints, in: Multibody Dynamics,Computational Methods in Applied Sciences, Springer, Dordrecht, 2009, pp. 45–55. 10.1007 / [ ] A. Tasora, M. Anitescu, A Convex Complementarity Approach for Simulating Large Granular Flows, Journal of Computational andNonlinear Dynamics 5 (2010) 031004–031004–10. doi: . [ ] A. Tasora, M. Anitescu, A matrix-free cone complementarity approach for solving large-scale, nonsmooth, rigid body dynamics,Computer Methods in Applied Mechanics and Engineering 200 (2011) 439–453. doi: . [ ] A. Tasora, D. Negrut, M. Anitescu, GPU-Based Parallel Computing for the Simulation of Complex Multibody Systems with Unilateraland Bilateral Constraints: An Overview, in: Multibody Dynamics, Computational Methods in Applied Sciences, Springer, Dordrecht,2011, pp. 283–307. 10.1007 / [ ] D. Negrut, A. Tasora, H. Mazhar, T. Heyn, P. Hahn, Leveraging parallel computing in multibody dynamics, Multibody System Dynamics27 (2012) 95–117. doi: . [ ] A. Tasora, M. Anitescu, A complementarity-based rolling friction model for rigid contacts, Meccanica 48 (2013) 1643–1659. doi: . [ ] T. Heyn, H. Mazhar, A. Pazouki, D. Melanz, A. Seidl, J. Madsen, A. Bartholomew, D. Negrut, D. Lamb, A. Tasora, Chrono: A ParallelPhysics Library for Rigid-Body, Flexible-Body, and Fluid Dynamics, ASME 2013 International Design Engineering Technical Confer-ences and Computers and Information in Engineering Conference 7B (2013) V07BT10A050. doi: . [ ] H. Mazhar, T. Heyn, A. Pazouki, D. Melanz, A. Seidl, A. Bartholomew, A. Tasora, D. Negrut, CHRONO: A parallel multi-physics libraryfor rigid-body, flexible-body, and fluid dynamics, Mechanical Sciences 4 (2013) 49–64. doi: . [ ] A. Tasora, M. Anitescu, S. Negrini, D. Negrut, A compliant visco-plastic particle contact model based on differential variationalinequalities, International Journal of Non-Linear Mechanics 53 (2013) 2–12. doi: . [ ] A. Pazouki, M. Kwarta, K. Williams, W. Likos, R. Serban, P. Jayakumar, D. Negrut, Compliant contact versus rigid contact: A comparison n the context of granular dynamics, Physical Review E 96 (2017) 042905. doi: . [ ] M. Anitescu, A. Tasora, An iterative approach for cone complementarity problems for nonsmooth dynamics, Computational Optimiza-tion and Applications 47 (2010) 207–235. doi: . [ ] T. Heyn, M. Anitescu, A. Tasora, D. Negrut, Using Krylov subspace and spectral methods for solving complementarity problems inmany-body contact dynamics simulation, International Journal for Numerical Methods in Engineering 95 (2013) 541–561. doi: . [ ] H. Mazhar, T. Heyn, D. Negrut, A. Tasora, Using Nesterov’s Method to Accelerate Multibody Dynamics with Friction and Contact, ACMTrans. Graph. 34 (2015) 32:1–32:14. doi: . [ ] D. Melanz, L. Fang, P. Jayakumar, D. Negrut, A comparison of numerical methods for solving multibody dynamics problems withfrictional contact modeled via differential variational inequalities, Computer Methods in Applied Mechanics and Engineering 320(2017) 668–693. doi: . [ ] E. Corona, D. Gorsich, P. Jayakumar, S. Veerapaneni, Tensor train accelerated solvers for nonsmooth rigid body dynamics, arXivpreprint arXiv:1808.02558 (2018). [ ] S. De, E. Corona, P. Jayakumar, S. Veerapaneni, Scalable solvers for cone complementarity problems in frictional multibody dynamics,Proceedings of IEEE Conference on High Performance Extreme Computing To appear (2019). [ ] L. Lu, A. Rahimian, D. Zorin, Contact-aware simulations of particulate Stokesian suspensions, Journal of Computational Physics 347(2017) 160–182. URL: . doi: . [ ] L. Lu, A. Rahimian, D. Zorin, Parallel contact-aware simulations of deformable particles in 3d stokes flow, arXiv preprintarXiv:1812.04719 (2018). [ ] W. Yan, H. Zhang, M. Shelley, Computing collision stress in assemblies of active spherocylinders: applications of a fast and genericgeometric method, The Journal of Chemical Physics 150 (2019). doi: arXiv:1811.04736 . [ ] E. Corona, S. Veerapaneni, Boundary integral equation analysis for suspension of spheres in stokes flow, Journal of ComputationalPhysics 362 (2018-06) 327–345. doi: . [ ] K. Yeo, E. Lushi, P. M. Vlahovska, Collective Dynamics in a Binary Mixture of Hydrodynamically Coupled Microrotors, Physical ReviewLetters 114 (2015) 188301. doi: . [ ] N. Oppenheimer, D. B. Stein, M. J. Shelley, Hydrosteric crystallization of rotating membrane inclusions, to appear, Physical ReviewLetters (2019). [ ] V. Soni, E. Bililign, S. Magkiriadou, S. Sacanna, D. Bartolo, M. J. Shelley, W. T. M. Irvine, The odd free surface flows of a colloidalchiral fluid, to appear, Nature Physics (2019). [ ] G. Kokot, S. Das, R. G. Winkler, G. Gompper, I. S. Aranson, A. Snezhko, Active turbulence in a gas of self-assembled spinners,Proceedings of the National Academy of Sciences 114 (2017) 12870–12875. doi: . [ ] H. Power, G. Miranda, Second kind integral equation formulation of stokes’ flows past a particle of arbitrary shape, SIAM Journal onApplied Mathematics 47 (1987) 689–698. [ ] E. Corona, L. Greengard, M. Rachh, S. Veerapaneni, An integral equation formulation for rigid bodies in stokes flow in three dimen-sions, Journal of Computational Physics 332 (2017-03) 504–519. doi: . [ ] A.-K. Tornberg, K. Gustavsson, A numerical method for simulations of rigid fiber suspensions, Journal of Computational Physics 215(2006) 172–196. URL: . doi: . [ ] K. Gustavsson, A.-K. Tornberg, Gravity induced sedimentation of slender fibers, Physics of Fluids (1994-present) 21 (2009)123301. URL: http://scitation.aip.org/content/aip/journal/pof2/21/12/10.1063/1.3273091 . doi: . [ ] S. Delong, F. B. Usabiaga, A. Donev, Brownian dynamics of confined rigid bodies, The Journal of Chemical Physics 143 (2015) 144107.doi: . [ ] S. Niebe, K. Erleben, Numerical methods for linear complementarity problems in physics-based animation, Morgan & Claypool Pub-lishers, 2015. [ ] R. Fletcher, On the Barzilai-Borwein method, in: L. Qi, K. Teo, X. Yang (Eds.), Optimization and Control with Applications, SpringerUS, Boston, MA, 2005, pp. 235–256. [ ] Y.-H. Dai, R. Fletcher, Projected Barzilai-Borwein methods for large-scale box-constrained quadratic programming, NumerischeMathematik 100 (2005) 21–47. doi: . [ ] Y.-H. Dai, W. W. Hager, K. Schittkowski, H. Zhang, The cyclic Barzilai-Borwein method for unconstrained optimization, IMA Journalof Numerical Analysis 26 (2006) 604–627. doi: . [ ] O. L. Mangasarian, T. H. Shiau, Error bounds for monotone linear complementarity problems, Mathematical Programming 36 (1986)81–89. doi: . [ ] Y. Lin, J. Pang, Iterative Methods for Large Convex Quadratic Programs: A Survey, SIAM Journal on Control and Optimization 25(1987) 383–411. doi: . [ ] S. Jiang, Z. Liang, J. Huang, A fast algorithm for Brownian dynamics simulation with hydrodynamic interactions, Mathematics ofComputation 82 (2013) 1631–1645. doi: . [ ] H. C. Brinkman, A calculation of the viscous force exerted by a flowing fluid on a dense swarm of particles, Flow, Turbulence andCombustion 1 (1949) 27. URL: https://doi.org/10.1007/BF02120313 . doi: . [ ] J. R. Blake, A spherical envelope approach to ciliary propulsion, Journal of Fluid Mechanics 46 (1971) 199–208. [ ] T. Ishikawa, T. Pedley, Coherent structures in monolayers of swimming particles, Physical review letters 100 (2008) 088103. [ ] C. Pozrikidis, Boundary Integral and Singularity Methods for Linearized Viscous Flow, Cambridge Texts in Applied Mathematics,Cambridge University Press, 1992. [ ] M. Rachh, L. Greengard, Integral equation methods for elastance and mobility problems in two dimensions, SIAM Journal on umerical Analysis 54 (2016) 2889–2909. [ ] B. Wu, H. Zhu, A. Barnett, S. Veerapaneni, Solution of stokes flow in complex nonsmooth 2d geometries via a linear-scaling high-orderadaptive integral equation scheme, Journal of Computational Physics Under review (2019). [ ] M. Siegel, A.-K. Tornberg, A local target specific quadrature by expansion method for evaluation of layer potentials in 3D,Journal of Computational Physics 364 (2018) 365–392. URL: . doi: . [ ] M. Wala, A. Klöckner, A fast algorithm for Quadrature by Expansion in three dimensions, Journal of Computational Physics 388(2019) 655–689. doi: . [ ] C. Pérez-Arancibia, L. M. Faria, C. Turc, Harmonic density interpolation methods for high-order evaluation of Laplace layer potentialsin 2D and 3D, Journal of Computational Physics 376 (2019) 411–434. doi: . [ ] D. Malhotra, G. Biros, PVFMM: A Parallel Kernel Independent FMM for Particle and Volume Potentials, Communications in Computa-tional Physics 18 (2015) 808–830. URL: http://journals.cambridge.org/article_S181524061500081X . doi: . [ ] M. Iwasawa, A. Tanikawa, N. Hosono, K. Nitadori, T. Muranushi, J. Makino, Implementation and performance of FDPS: A frameworkfor developing parallel particle simulation codes, Publications of the Astronomical Society of Japan 68 (2016) 54. URL: http://pasj.oxfordjournals.org/lookup/doi/10.1093/pasj/psw053 . doi: . [ ] S. R. Slattery, Mesh-free data transfer algorithms for partitioned multiphysics problems: Conservation, accuracy, and parallelism,Journal of Computational Physics 307 (2016) 164–188. URL: . doi: . [ ] A. Schäfer, D. Fey, LibGeoDecomp: A Grid-Enabled Library for Geometric Decomposition Codes, in: A. Lastovetsky, T. Kechadi,J. Dongarra (Eds.), Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science,Springer Berlin Heidelberg, 2008, pp. 285–294. [ ] M. Frigo, S. Johnson, The design and implementation of FFTW3, Proceedings of the IEEE 93 (2005) 216 – 231. doi: . [ ] H. J. Wilson, Stokes flow past three spheres, Journal of Computational Physics 245 (2013) 302–316. URL: . doi: . [ ] A. Sierou, J. F. Brady, Rheology and microstructure in concentrated noncolloidal suspensions, Journal of Rheology 46 (2002) 1031–1056. URL: http://sor.scitation.org/doi/abs/10.1122/1.1501925 . doi: . [ ] A. M. Fiore, J. W. Swan, Fast Stokesian dynamics, Journal of Fluid Mechanics 878 (2019) 544–597. doi: . [ ] É. Guazzelli, Sedimentation of small particles: how can such a simple problem be so difficult?, Comptes Rendus Mécanique 334(2006) 539–544. doi: . [ ] J. Park, B. Metzger, É. Guazzelli, J. E. Butler, A cloud of rigid fibres sedimenting in a viscous fluid, Journal of Fluid Mechanics 648(2010) 351–362. doi: . [ ] C. Dombrowski, L. Cisneros, S. Chatkaew, R. E. Goldstein, J. O. Kessler, Self-concentration and large-scale coherence in bacterialdynamics, Physical review letters 93 (2004) 098103. [ ] D. Saintillan, M. J. Shelley, Instabilities and Pattern Formation in Active Particle Suspensions: Kinetic Theory and Continuum Simu-lations, Physical Review Letters 100 (2008) 178103. doi: , 00248. [ ] T. Sanchez, D. T. Chen, S. J. DeCamp, M. Heymann, Z. Dogic, Spontaneous motion in hierarchically assembled active matter, Nature491 (2012) 431. [ ] M. K. Kuimova, Mapping viscosity in cells using molecular rotors, Physical Chemistry Chemical Physics 14 (2012) 12671–12686.doi: . [ ] R. Krasny, Desingularization of periodic vortex sheet roll-up, Journal of Computational Physics 65 (1986) 292–313. [ ] G. Baker, M. Shelley, On the connection between thin vortex layers and vortex sheets, Journal of Fluid Mechanics 215 (1990) 161–194. [ ] E. Lushi, P. M. Vlahovska, Periodic and Chaotic Orbits of Plane-Confined Micro-rotors in Creeping Flows, Journal of Nonlinear Science25 (2015) 1111–1123. doi: . [ ] D. Robinson, L. Feng, J. Nocedal, J. Pang, Subspace Accelerated Matrix Splitting Algorithms for Asymmetric and Symmetric LinearComplementarity Problems, SIAM Journal on Optimization 23 (2013) 1371–1397. URL: http://epubs.siam.org/doi/abs/10.1137/110845094 . doi: . [ ] S. Y. Reigh, L. Zhu, F. Gallaire, E. Lauga, Swimming with a cage: Low-Reynolds-number locomotion inside a droplet, Soft Matter 13(2017) 3161–3173. doi: . [ ] R. Hartmann, P. K. Singh, P. Pearce, R. Mok, B. Song, F. Díaz-Pascual, J. Dunkel, K. Drescher, Emergence of three-dimensional orderand structure in growing biofilms, Nature Physics (2018) 1. doi: . [ ] E. W. Weisstein, Complete elliptic integral of the first kind, http://mathworld.wolfram.com/CompleteEllipticIntegraloftheFirstKind.html , ????. [ ] E. W. Weisstein, Complete elliptic integral of the second kind, http://mathworld.wolfram.com/CompleteEllipticIntegraloftheSecondKind.html , ????. [ ] NIST digital library of mathematical functions, https://dlmf.nist.gov/ , 2019., 2019.