Hierarchical Interpolative Factorization for Elliptic Operators: Integral Equations
KENNETH L. HO
Stanford University
AND LEXING YING
Stanford University
Abstract
This paper introduces the hierarchical interpolative factorization for integral equations (HIF-IE) associated with elliptic problems in two and three dimensions. This factorization takes the form of an approximate generalized LU decomposition that permits the efficient application of the discretized operator and its inverse. HIF-IE is based on the recursive skeletonization algorithm but incorporates a novel combination of two key features: (1) a matrix factorization framework for sparsifying structured dense matrices and (2) a recursive dimensional reduction strategy to decrease the cost. Thus, higher-dimensional problems are effectively mapped to one dimension, and we conjecture that constructing, applying, and inverting the factorization all have linear or quasilinear complexity. Numerical experiments support this claim and further demonstrate the performance of our algorithm as a generalized fast multipole method, direct solver, and preconditioner. HIF-IE is compatible with geometric adaptivity and can handle both boundary and volume problems. MATLAB codes are freely available.
Communications on Pure and Applied Mathematics, Vol. 000, 0001–0039 (2000)

1 Introduction

This paper considers integral equations (IEs) of the form

    a(x)u(x) + b(x) ∫_Ω K(‖x − y‖) c(y) u(y) dΩ(y) = f(x),   x ∈ Ω ⊂ ℝ^d,   (1.1)

associated with elliptic partial differential equations (PDEs), where a(x), b(x), c(x), and f(x) are given functions; the integral kernel K(r) is related to the fundamental solution of the underlying PDE; and d = 2 or 3 is the dimension. Such equations arise, for example, in the following settings:

(1) Consider the interior Dirichlet Laplace problem

    Δu(x) = 0,   x ∈ D ⊂ ℝ^d,   (1.2a)
    u(x) = f(x),   x ∈ ∂D ≡ Γ   (1.2b)

in a smooth, simply connected domain, which can be solved by writing u(x) as the double-layer potential

    u(x) = ∫_Γ ∂G/∂ν_y(‖x − y‖) σ(y) dΓ(y),   x ∈ D   (1.3)

over an unknown surface density σ(x), where

    G(r) = { −1/(2π) log r,   d = 2
           {  1/(4πr),        d = 3   (1.4)

and ν_y is the unit outer normal at y ∈ Γ. By construction, (1.3) satisfies (1.2a). To enforce the boundary condition (1.2b), take the limit as x → Γ and use standard results from potential theory [31] to obtain

    −(1/2)σ(x) + ∫_Γ ∂G/∂ν_y(‖x − y‖) σ(y) dΓ(y) = f(x),   x ∈ Γ,   (1.5)

where the integral is defined in the principal value sense. This is a boundary IE for σ(x) of the form (1.1) (up to a straightforward generalization to matrix-valued kernels).

Alternatively, one could use the single-layer potential representation

    u(x) = ∫_Γ G(‖x − y‖) σ(y) dΓ(y),   x ∈ D,

which immediately gives the IE

    ∫_Γ G(‖x − y‖) σ(y) dΓ(y) = f(x),   x ∈ Γ

upon taking the limit as x → Γ since the integral is well-defined.
Note that this has a(x) ≡ 0 and so constitutes a first-kind IE, whereas (1.5) has a(x) ≠ 0; second-kind IEs such as (1.5) are usually well-conditioned.

(2) Consider the divergence-form PDE

    ∇ · (a(x)∇u(x)) = f(x),   x ∈ Ω ⊂ ℝ^d

and let

    u(x) = ∫_Ω G(‖x − y‖) σ(y) dΩ(y),

where G(r) is as defined in (1.4). Then the PDE becomes the volume IE

    a(x)σ(x) + ∇a(x) · ∫_Ω ∇_x G(‖x − y‖) σ(y) dΩ(y) = f(x),   x ∈ Ω

upon substitution, which again has the form (1.1).
IEs can similarly be derived for many of the PDEs of classical physics including the Laplace, Helmholtz, Stokes, and time-harmonic Maxwell equations. In such cases, the kernel function K(r) is typically singular near zero but otherwise smooth with non-compact support. For this paper, we will also require that K(r) not be too oscillatory.

Discretization of (1.1) using, e.g., the Nyström, collocation, or Galerkin method leads to a linear system

    Au = f,   (1.6)

where A ∈ ℂ^{N×N} is dense with u and f the discrete analogues of u(x) and f(x), respectively. This paper is concerned with the efficient factorization and solution of such systems.

Numerical methods for solving (1.6) can be classified into several groups. The first consists of classical direct methods like Gaussian elimination or other standard matrix factorizations [26], which compute the solution exactly (in principle, to machine precision, up to conditioning) without iteration. These methods are useful when N is small. However, since A is dense, such algorithms generally have O(N^3) complexity, which quickly makes them infeasible as N increases.

The second group is that of iterative methods, among the most popular of which are Krylov subspace methods such as conjugate gradient [38, 49] or GMRES [47]. The number of iterations required depends on the problem and is typically small for second-kind IEs but can grow rapidly for first-kind ones. The main computational cost is the calculation of matrix-vector products at each iteration. Combined with fast multipole methods (FMMs) [22, 28, 29, 54] or other accelerated matrix multiplication schemes [5, 36], such techniques can yield asymptotically optimal or near-optimal solvers with O(N) or O(N log N) complexity. However, iterative methods are not as robust as their direct counterparts, especially when a(x), b(x), or c(x) lacks regularity or has high contrast. In such cases, convergence can be slow and specialized preconditioners are often needed.
Furthermore, iterative methods can be inefficient for systems involving multiple right-hand sides or low-rank updates, which is an important setting for many applications of increasing interest, including time stepping, inverse problems, and design.

The third group covers rank-structured direct solvers, which exploit the observation that certain off-diagonal blocks of A are numerically low-rank in order to dramatically lower the cost. The seminal work in this area is due to Hackbusch et al. [32, 34, 35], whose H- and H^2-matrices have been shown to achieve linear or quasilinear complexity. Although their work has had significant theoretical impact, in practice, the constants implicit in the asymptotic scalings tend to be large due to the recursive nature of the inversion algorithms and the use of expensive hierarchical matrix-matrix multiplication.
More recent developments aimed at improving practical performance include solvers for hierarchically semiseparable (HSS) matrices [10, 11, 51] and methods based on recursive skeletonization (RS) [25, 27, 39, 43], among other related schemes [2, 8, 13]. These can be viewed as special cases of H-matrices and are optimal in one dimension (1D) (e.g., boundary IEs on curves) but have superlinear complexities in higher dimensions. In particular, RS proceeds analogously to the nested dissection multifrontal method (MF) for sparse linear systems [19, 23], with the so-called skeletons characterizing the off-diagonal blocks corresponding to the separator fronts. These grow as O(N^{1/2}) in two dimensions (2D) and O(N^{2/3}) in three dimensions (3D), resulting in solver complexities of O(N^{3/2}) and O(N^2), respectively.

Recently, Corona, Martinsson, and Zorin [16] constructed an O(N) RS solver in 2D by exploiting further structure among the skeletons and using hierarchical matrix algebra. The principal observation is that for a broad class of integral kernels, the generic behavior of RS is to retain degrees of freedom (DOFs) only along the boundary of each cell in a domain partitioning. Thus, 2D problems are reduced to 1D, and the large skeleton matrices accumulated throughout the algorithm can be handled efficiently using 1D HSS techniques. However, this approach is quite involved and has yet to be realized in 3D or in complicated geometries.
In this paper, we introduce the hierarchical interpolative factorization for IEs (HIF-IE), which produces an approximate generalized LU decomposition of A with linear or quasilinear complexity estimates. HIF-IE is based on RS but augments it with a novel combination of two key features: (1) a matrix factorization formulation via a sparsification framework similar to that developed in [11, 50, 51] and (2) a recursive dimensional reduction scheme as pioneered in [16]. Unlike [16], however, which keeps large skeleton sets but works with them implicitly using fast structured methods, our sparsification approach allows us to reduce the skeletons explicitly. This obviates the need for internal hierarchical matrix representations, which substantially simplifies the algorithm and enables it to extend naturally to 3D and to complex geometries, in addition to promoting a more direct view of the dimensional reduction process.

Figure 1.1 shows a schematic of HIF-IE as compared to RS in 2D. In RS (top), the domain is partitioned into a set of square cells at each level of a tree hierarchy. Each cell is skeletonized from the finest level to the coarsest, leaving DOFs only along cell interfaces. The size of these interfaces evidently grows as we march up the tree, which ultimately leads to the observed O(N^{3/2}) complexity.

In contrast, in HIF-IE (bottom), we start by skeletonizing the cells at the finest level as in RS but, before proceeding further, perform an additional level of edge skeletonization by grouping the remaining DOFs by cell edge. This respects the 1D structure of the interface geometry and allows more DOFs to be eliminated. The combination of cell and edge compression is then repeated up the tree, with the result that the skeleton growth is now suppressed.
The reduction from 2D (square cells) to 1D (edges) to zero dimensions (0D) (points) is completely explicit. Extension to 3D is immediate by skeletonizing cubic cells, then faces, then edges at each level to execute a reduction from 3D to 2D to 1D to 0D. This tight control of the skeleton size is essential for achieving near-optimal scaling.

Once the factorization has been constructed, it can be used to rapidly apply both A and A^{−1}, thereby serving as a generalized FMM, direct solver, or preconditioner (depending on the accuracy). Other capabilities are possible, too, though they will not be pursued here. As such, HIF-IE is considerably more general than many previous non-factorization-based fast direct solvers [10, 16, 25, 39, 43], which facilitate only the application of the inverse.

Extensive numerical experiments reveal strong evidence for quasilinear complexity and demonstrate that HIF-IE can accurately approximate various integral operators in both boundary and volume settings with high practical efficiency.

The remainder of this paper is organized as follows. In Section 2, we introduce the basic tools needed for our algorithm, including an efficient matrix sparsification operation that we call skeletonization. In Section 3, we describe the recursive skeletonization factorization (RSF), a reformulation of RS using our new factorization approach. This will serve to familiarize the reader with our sparsification framework as well as to highlight the fundamental difficulty associated with RS methods in 2D and 3D. In Section 4, we present HIF-IE as an extension of RSF with additional levels of skeletonization corresponding to recursive dimensional
reductions. Although we cannot yet provide a rigorous complexity analysis, estimates based on well-supported rank assumptions suggest that HIF-IE achieves linear or quasilinear complexity. This conjecture is borne out by numerical experiments, which we detail in Section 5. Finally, Section 6 concludes with some discussion and future directions.
2 Preliminaries

In this section, we first list our notational conventions and then describe the basic elements of our algorithm.

Uppercase letters will generally denote matrices, while the lowercase letters c, p, q, r, and s denote ordered sets of indices, each of which is associated with a DOF in the problem. For a given index set c, its cardinality is written |c|. The (unordered) complement of c is given by c^C, with the parent set to be understood from the context. The uppercase letter C is reserved to denote a collection of disjoint index sets.

Given a matrix A, A_{pq} is the submatrix with rows and columns restricted to the index sets p and q, respectively. We also use the MATLAB notation A_{:,q} to denote the submatrix with columns restricted to q. Throughout, ‖·‖ refers to the 2-norm.

Let

    A = [ A_{pp}  A_{pq}         ]
        [ A_{qp}  A_{qq}  A_{qr} ]   (2.1)
        [         A_{rq}  A_{rr} ]

be a matrix defined over the indices (p, q, r). This matrix structure often appears in sparse PDE problems, where, for example, p corresponds to the interior DOFs of a region D, q to the DOFs on the boundary ∂D, and r to the external region Ω \ D̄, which should be thought of as large. In this setting, the DOFs p and r are separated by q and hence do not directly interact, resulting in the form (2.1).

Our first tool is quite standard and concerns the efficient elimination of DOFs from such sparse matrices.

Lemma 2.1.
Let A be given by (2.1) and write A_{pp} = L_p D_p U_p in factored form, where L_p and U_p are unit triangular matrices (up to permutation). If A_{pp} is nonsingular, then

    R_p^* A S_p = [ D_p                  ]
                  [       B_{qq}  A_{qr} ],   (2.2)
                  [       A_{rq}  A_{rr} ]

where

    R_p^* = [ I                              ] [ L_p^{-1}       ]
            [ −A_{qp} U_p^{-1} D_p^{-1}  I   ] [           I    ],
            [                             I  ] [              I ]

    S_p = [ U_p^{-1}       ] [ I  −D_p^{-1} L_p^{-1} A_{pq}    ]
          [           I    ] [    I                            ]
          [              I ] [                               I ]

and B_{qq} = A_{qq} − A_{qp} A_{pp}^{-1} A_{pq} is the associated Schur complement.

Note that the indices p have been decoupled from the rest. Regarding the subsystem in (2.2) over the indices (q, r) only, we may therefore say that the DOFs p have been eliminated. The operators R_p and S_p carry out this elimination, which furthermore is particularly efficient since the interactions involving the large index set r are unchanged.

Our next tool is the interpolative decomposition (ID) [14] for low-rank matrices, which we present in a somewhat nonstandard form below.
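As a quick sanity check, the elimination of Lemma 2.1 can be verified numerically. The following Python sketch is our own illustration (the paper's released codes are MATLAB); for simplicity it builds the elimination operators directly from A_pp^{-1} rather than the L_p D_p U_p factors, so the leading block of the result is A_pp instead of D_p:

```python
import numpy as np

rng = np.random.default_rng(0)
np_, nq, nr = 4, 3, 5  # sizes of the index sets p, q, r

# Random block matrix with the sparsity pattern (2.1): A_pr = A_rp = 0.
App = rng.standard_normal((np_, np_)) + np_ * np.eye(np_)  # diagonally dominant
Apq = rng.standard_normal((np_, nq))
Aqp = rng.standard_normal((nq, np_))
Aqq = rng.standard_normal((nq, nq))
Aqr = rng.standard_normal((nq, nr))
Arq = rng.standard_normal((nr, nq))
Arr = rng.standard_normal((nr, nr))
Zpr = np.zeros((np_, nr))
A = np.block([[App, Apq, Zpr], [Aqp, Aqq, Aqr], [Zpr.T, Arq, Arr]])

# Elimination operators of Lemma 2.1, built from A_pp^{-1} directly.
I = np.eye(np_ + nq + nr)
Rs = I.copy(); Rs[np_:np_ + nq, :np_] = -Aqp @ np.linalg.inv(App)  # R_p^*
S = I.copy(); S[:np_, np_:np_ + nq] = -np.linalg.inv(App) @ Apq    # S_p

B = Rs @ A @ S
Bqq = Aqq - Aqp @ np.linalg.inv(App) @ Apq  # Schur complement

# p is decoupled: the off-diagonal blocks in row/column p vanish ...
assert np.allclose(B[:np_, np_:], 0) and np.allclose(B[np_:, :np_], 0)
# ... and the (q, q) block is the Schur complement B_qq.
assert np.allclose(B[np_:np_ + nq, np_:np_ + nq], Bqq)
```

Note that the interactions among r (the large external set) are untouched, which is the source of the efficiency claimed in the lemma.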
Lemma 2.2.
Let A = A_{:,q} ∈ ℂ^{m×n} with rank k ≤ min(m, n). Then there exist a partitioning q = q̂ ∪ q̌ with |q̂| = k and a matrix T_q ∈ ℂ^{k×(n−k)} such that A_{:,q̌} = A_{:,q̂} T_q.

Proof. Let

    AΠ = QR = Q [ R_1  R_2 ]

be a so-called thin pivoted QR decomposition of A, where Q ∈ ℂ^{m×k} is unitary, R ∈ ℂ^{k×n} is upper triangular, and the permutation matrix Π ∈ {0, 1}^{n×n} has been chosen so that R_1 ∈ ℂ^{k×k} is nonsingular. Then identifying the first k pivots with q̂ and the remainder with q̌,

    A_{:,q̌} = QR_2 = (QR_1)(R_1^{-1} R_2) ≡ A_{:,q̂} T_q

for T_q = R_1^{-1} R_2. □

The ID can also be written more traditionally as

    A = A_{:,q̂} [ I  T_q ] Π,

where Π is the permutation matrix associated with the ordering (q̂, q̌). We call q̂ and q̌ the skeleton and redundant indices, respectively. Lemma 2.2 states that the redundant columns of A can be interpolated from its skeleton columns. The following shows that the ID can also be viewed as a sparsification operator.

Corollary 2.3.
Let A = A_{:,q} be a low-rank matrix. If q = q̂ ∪ q̌ and T_q are such that A_{:,q̌} = A_{:,q̂} T_q, then

    [ A_{:,q̌}  A_{:,q̂} ] [  I      ]  =  [ 0  A_{:,q̂} ].
                          [ −T_q  I ]
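The proof of Lemma 2.2 translates directly into code. Below is a minimal Python sketch (our own; the helper name `interp_decomp` is ours) built on SciPy's pivoted QR, used here in the adaptive sense with the relative magnitudes of the pivots selecting the rank:

```python
import numpy as np
from scipy.linalg import qr

def interp_decomp(A, tol=1e-12):
    """Column ID of A: skeleton/redundant split (qhat, qcheck) and T
    with A[:, qcheck] ≈ A[:, qhat] @ T, following Lemma 2.2."""
    Q, R, perm = qr(A, mode="economic", pivoting=True)
    # Adaptive rank: truncate by relative magnitude of the R diagonal.
    diag = np.abs(np.diag(R))
    k = int(np.sum(diag > tol * diag[0]))
    qhat, qcheck = perm[:k], perm[k:]
    T = np.linalg.solve(R[:k, :k], R[:k, k:])  # T = R1^{-1} R2
    return qhat, qcheck, T

# Example: a numerically low-rank matrix (well-separated 2D Laplace kernel).
rng = np.random.default_rng(0)
x = rng.uniform(2, 3, (50, 2))  # sources, separated from the targets below
y = rng.uniform(0, 1, (40, 2))  # targets
A = -np.log(np.linalg.norm(x[:, None] - y[None, :], axis=-1))

qhat, qcheck, T = interp_decomp(A, tol=1e-10)
err = np.linalg.norm(A[:, qcheck] - A[:, qhat] @ T) / np.linalg.norm(A)
assert err < 1e-6           # redundant columns are recovered accurately
assert len(qhat) < A.shape[1]  # and the skeleton is genuinely smaller
```

The pivot-based truncation is exactly the adaptive-rank usage described below (2.4); a strong rank-revealing QR would give the sharper guarantees of (2.4) at somewhat higher cost.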
In general, let A_{:,q̌} = A_{:,q̂} T_q + E for some error matrix E and characterize the ID by the functions α(n, k) and β(n, k) such that

    ‖T_q‖ ≤ α(n, k),   ‖E‖ ≤ β(n, k) σ_{k+1}(A),   (2.3)

where σ_{k+1}(A) is the (k + 1)st largest singular value of A. If |α(n, k)| and |β(n, k)| are not too large, then (2.3) implies that the reconstruction of A_{:,q̌} is stable and accurate. There exists an ID with

    α(n, k) = √(f² k(n − k)),   β(n, k) = √(1 + f² k(n − k))   (2.4)

for f = 1, but computing it can take exponential time, requiring the combinatorial maximization of a submatrix determinant. However, an ID satisfying (2.4) with any f > 1 can be computed in only O(kmn) operations. Fast algorithms based on random sampling are also available [37], but these can incur some loss of accuracy (see also Section 4.3).

The ID can be applied in both fixed and adaptive rank settings. In the former, the rank k is specified, while, in the latter, the approximation error is specified and the rank adjusted to achieve (an estimate of) it. Hereafter, we consider the ID only in the adaptive sense, using the relative magnitudes of the pivots to adaptively select k such that ‖E‖ ≲ ε‖A‖ for any specified relative precision ε > 0.

We now combine Lemmas 2.1 and 2.2 to efficiently eliminate redundant DOFs from dense matrices with low-rank off-diagonal blocks.
Lemma 2.4.
Let

    A = [ A_{pp}  A_{pq} ]
        [ A_{qp}  A_{qq} ]

with A_{pq} and A_{qp} low-rank, and let p = p̂ ∪ p̌ and T_p be such that

    [ A_{q p̌}   ]  =  [ A_{q p̂}   ] T_p,
    [ A_{p̌ q}^* ]     [ A_{p̂ q}^* ]

i.e., A_{q p̌} = A_{q p̂} T_p and A_{p̌ q} = T_p^* A_{p̂ q}. Without loss of generality, write

    A = [ A_{p̌ p̌}  A_{p̌ p̂}  A_{p̌ q} ]
        [ A_{p̂ p̌}  A_{p̂ p̂}  A_{p̂ q} ]
        [ A_{q p̌}  A_{q p̂}  A_{qq}  ]

and define

    Q_p = [ I          ]
          [ −T_p  I    ].
          [           I ]
Then

    Q_p^* A Q_p = [ B_{p̌ p̌}  B_{p̌ p̂}           ]
                  [ B_{p̂ p̌}  A_{p̂ p̂}  A_{p̂ q} ],   (2.5)
                  [           A_{q p̂}  A_{qq}   ]

where

    B_{p̌ p̌} = A_{p̌ p̌} − T_p^* A_{p̂ p̌} − A_{p̌ p̂} T_p + T_p^* A_{p̂ p̂} T_p,
    B_{p̌ p̂} = A_{p̌ p̂} − T_p^* A_{p̂ p̂},
    B_{p̂ p̌} = A_{p̂ p̌} − A_{p̂ p̂} T_p,

so

    R_{p̌}^* Q_p^* A Q_p S_{p̌} = [ D_{p̌}                      ]
                                 [        B_{p̂ p̂}  A_{p̂ q}  ]  ≡  Z_p(A),   (2.6)
                                 [        A_{q p̂}   A_{qq}    ]

where R_{p̌} and S_{p̌} are the elimination operators of Lemma 2.1 associated with p̌ and B_{p̂ p̂} = A_{p̂ p̂} − B_{p̂ p̌} B_{p̌ p̌}^{-1} B_{p̌ p̂}, assuming that B_{p̌ p̌} is nonsingular.

In essence, the ID sparsifies A by decoupling p̌ from q, thereby allowing it to be eliminated using efficient sparse techniques. We refer to this procedure as skeletonization since only the skeletons p̂ remain. Note that the interactions involving q = p^C are unchanged. A very similar approach has previously been described in the context of HSS ULV decompositions [11] by combining the structure-preserving rank-revealing factorization [53] with reduced matrices [50].

In general, the ID often only approximately sparsifies A (for example, if its off-diagonal blocks are low-rank only to a specified numerical precision) so that (2.5) and consequently (2.6) need not hold exactly. In such cases, the skeletonization operator Z_p(·) should be interpreted as also including an intermediate truncation step that enforces sparsity explicitly. For notational convenience, however, we will continue to identify the left- and right-hand sides of (2.6) by writing Z_p(A) ≈ R_{p̌}^* Q_p^* A Q_p S_{p̌}, with the truncation to be understood implicitly.

In this paper, we often work with a collection C of disjoint index sets, where A_{c, c^C} and A_{c^C, c} are numerically low-rank for all c ∈ C. Applying Lemma 2.4 to all c ∈ C gives

    Z_C(A) ≈ U^* A V,   U = ∏_{c ∈ C} Q_c R_{č},   V = ∏_{c ∈ C} Q_c S_{č},

where the redundant DOFs č for each c ∈ C have been decoupled from the rest and the matrix products over C can be taken in any order.
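A one-cell instance of this skeletonization can be checked numerically. The sketch below (our own Python; the pivoted-QR ID helper is ours) stacks the off-diagonal blocks as in Lemma 2.4, forms Q_p, and verifies that the redundant DOFs p̌ decouple from q = p^C up to the ID tolerance:

```python
import numpy as np
from scipy.linalg import qr

def interp_decomp(A, tol):
    # Column ID via pivoted QR: A[:, sk] spans A[:, rd] (cf. Lemma 2.2).
    Q, R, perm = qr(A, mode="economic", pivoting=True)
    diag = np.abs(np.diag(R))
    k = int(np.sum(diag > tol * diag[0]))
    return perm[:k], perm[k:], np.linalg.solve(R[:k, :k], R[:k, k:])

rng = np.random.default_rng(1)
# DOFs p in one cell, q = p^C well outside; smooth kernel => low-rank coupling.
xp = rng.uniform(0, 1, (50, 2))
xq = rng.uniform(2, 3, (60, 2))
x = np.vstack([xp, xq])
R = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
A = -np.log(R + np.eye(110)) + 10 * np.eye(110)  # log kernel + identity shift

npts = 50
p, q = np.arange(npts), np.arange(npts, 110)
# Stack A_{q,p} and A_{p,q}^* so a single ID handles both sides (Lemma 2.4).
W = np.vstack([A[np.ix_(q, p)], A[np.ix_(p, q)].conj().T])
sk, rd, T = interp_decomp(W, tol=1e-10)

# Q_p in global coordinates: subtract skeleton columns/rows times T.
Qp = np.eye(110)
Qp[np.ix_(p[sk], p[rd])] = -T
B = Qp.conj().T @ A @ Qp

# The redundant DOFs now couple to q only at the ID tolerance level ...
coupling = max(np.linalg.norm(B[np.ix_(p[rd], q)]),
               np.linalg.norm(B[np.ix_(q, p[rd])]))
assert coupling < 1e-6 * np.linalg.norm(A)
assert len(sk) < npts  # ... and compression actually occurred
```

After this step, the redundant DOFs can be eliminated exactly as in Lemma 2.1 (operating only on the small block B_{p̌ p̌}), which is the content of (2.6).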
The resulting skeletonized matrix Z_C(A) is significantly sparsified and has a block diagonal structure over the index groups

    θ = ( ⋃_{c ∈ C} {č} ) ∪ { s \ ⋃_{c ∈ C} č },

where the outer union is to be understood as acting on collections of index sets and s = {1, ..., N} is the set of all indices.

3 Recursive Skeletonization Factorization

In this section, we present RSF, a reformulation of RS [25, 27, 39, 43] as a matrix factorization using the sparsification view of skeletonization as developed in Lemma 2.4. Mathematically, RSF is identical to RS but expresses the matrix A as a (multiplicative) multilevel generalized LU decomposition instead of as an additive hierarchical low-rank update. This representation enables much simpler algorithms for applying A and A^{−1} as well as establishes a direct connection with MF [19, 23] for sparse matrices, which produces a (strict) LU decomposition using Lemma 2.1. Indeed, RSF is essentially just MF with pre-sparsification via the ID at each level. This point of view places methods for structured dense and sparse matrices within a common framework, which provides a potential means to transfer techniques from one class to the other.

Note that because RSF is based on elimination, it requires that certain intermediate matrices be invertible, which in general means that A must be square. This is a slight limitation when compared to RS, which can be used, for example, as a generalized FMM [25, 39] or least squares solver [40] for rectangular matrices.

We begin with a detailed description of RSF in 2D before extending to 3D in the natural way (the 1D case will not be treated but should be obvious from the discussion). The same presentation framework will also be used for HIF-IE in Section 4, which we hope will help make clear the specific innovations responsible for its improved complexity estimates.

Consider the IE (1.1) on Ω = (0, 1)², discretized using a piecewise constant collocation method over a uniform n × n grid for simplicity.
More general domains and discretizations can be handled without difficulty, but the current setting will serve to illustrate the main ideas. Let h be the step size in each direction and assume that n = 1/h = 2^L m, where m = O(1) is a small integer. Integer pairs j = (j_1, j_2) index the elements Ω_j = h(j_1 − 1, j_1) × h(j_2 − 1, j_2) and their centers x_j = h(j_1 − 1/2, j_2 − 1/2) for 1 ≤ j_1, j_2 ≤ n. With {x_j} as the collocation points, the discrete system (1.6) reads

    a_i u_i + b_i ∑_j K_{ij} c_j u_j = f_i

at each x_i, where a_j = a(x_j), b_j = b(x_j), c_j = c(x_j), and f_j = f(x_j); u_j is the approximation to u(x_j); and

    K_{ij} = ∫_{Ω_j} K(‖x_i − y‖) dΩ(y).   (3.1)

[Figure 3.1: active DOFs at levels ℓ = 0, 1, 2, 3 of RSF in 2D.]
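To make the setup concrete, the following Python sketch (ours; the paper's released codes are MATLAB) assembles this collocation system for the 2D Laplace kernel (1.4) on a small grid, approximating (3.1) by a one-point quadrature K_ij ≈ G(‖x_i − x_j‖)h² off the diagonal and a crude constant self term (the actual method would evaluate the singular self-interaction accurately):

```python
import numpy as np

n = 16
h = 1.0 / n  # grid size and step for a small illustrative example
j1, j2 = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
x = np.column_stack([(j1.ravel() + 0.5) * h, (j2.ravel() + 0.5) * h])
N = n * n

# G(r) = -log(r)/(2*pi) for d = 2, cf. (1.4).
r = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
np.fill_diagonal(r, 1.0)               # placeholder to avoid log(0)
K = -np.log(r) / (2 * np.pi) * h**2    # one-point quadrature for (3.1)
# Crude constant self term (an assumption for illustration only).
np.fill_diagonal(K, h**2 * (1 - np.log(h / 2)) / (2 * np.pi))

# Discrete system: a_i u_i + b_i * sum_j K_ij c_j u_j = f_i.
a = np.ones(N); b = np.ones(N); c = np.ones(N)  # e.g. a second-kind IE
A = np.diag(a) + np.diag(b) @ K @ np.diag(c)

assert A.shape == (N, N) and np.isfinite(A).all()
assert np.allclose(A, A.T)  # symmetric here since b = c = 1
```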
Note that A is not stored since it is dense; rather, its entries are generated as needed. The total number of DOFs is N = n², each of which is associated with a point x_j and an index in s.

The algorithm proceeds by eliminating DOFs level by level. At each level ℓ, the set of DOFs that have not been eliminated are called active with indices s_ℓ. Initially, we set A_0 = A and s_0 = s. Figure 3.1 shows the active DOFs at each level for a representative example.

Level 0. Defined are A_0 and s_0. Partition Ω into the Voronoi cells [4]

    mh(j_1 − 1, j_1) × mh(j_2 − 1, j_2)

of width mh about the centers mh(j_1 − 1/2, j_2 − 1/2) for 1 ≤ j_1, j_2 ≤ 2^L = n/m. Let C_0 be the collection of index sets corresponding to the active DOFs of each cell. Clearly, ⋃_{c ∈ C_0} c = s_0. Then skeletonization with respect to C_0 gives

    A_1 = Z_{C_0}(A_0) ≈ U_0^* A_0 V_0,   U_0 = ∏_{c ∈ C_0} Q_c R_{č},   V_0 = ∏_{c ∈ C_0} Q_c S_{č},

where the DOFs ⋃_{c ∈ C_0} č have been eliminated (and marked inactive). Let s_1 = s_0 \ ⋃_{c ∈ C_0} č = ⋃_{c ∈ C_0} ĉ be the remaining active DOFs. The matrix A_1 is block diagonal with block partitioning

    θ_1 = ( ⋃_{c ∈ C_0} {č} ) ∪ {s_1}.

Level ℓ.
Defined at this stage are A_ℓ and s_ℓ. Partition Ω into the Voronoi cells

    2^ℓ mh(j_1 − 1, j_1) × 2^ℓ mh(j_2 − 1, j_2)

of width 2^ℓ mh about the centers 2^ℓ mh(j_1 − 1/2, j_2 − 1/2) for 1 ≤ j_1, j_2 ≤ 2^{L−ℓ}. Let C_ℓ be the collection of index sets corresponding to the active DOFs of each cell. Clearly, ⋃_{c ∈ C_ℓ} c = s_ℓ. Skeletonization with respect to C_ℓ then gives

    A_{ℓ+1} = Z_{C_ℓ}(A_ℓ) ≈ U_ℓ^* A_ℓ V_ℓ,   U_ℓ = ∏_{c ∈ C_ℓ} Q_c R_{č},   V_ℓ = ∏_{c ∈ C_ℓ} Q_c S_{č},

where the DOFs ⋃_{c ∈ C_ℓ} č have been eliminated. The matrix A_{ℓ+1} is block diagonal with block partitioning

    θ_{ℓ+1} = ( ⋃_{c ∈ C_0} {č} ) ∪ ··· ∪ ( ⋃_{c ∈ C_ℓ} {č} ) ∪ {s_{ℓ+1}},

where s_{ℓ+1} = s_ℓ \ ⋃_{c ∈ C_ℓ} č = ⋃_{c ∈ C_ℓ} ĉ.

Level L. Finally, we have A_L and s_L, where D ≡ A_L is block diagonal with block partitioning

    θ_L = ( ⋃_{c ∈ C_0} {č} ) ∪ ··· ∪ ( ⋃_{c ∈ C_{L−1}} {č} ) ∪ {s_L}.

Combining the approximation over all levels gives

    D ≈ U_{L−1}^* ··· U_0^* A V_0 ··· V_{L−1},

where each U_ℓ and V_ℓ are products of unit triangular matrices, each of which can be inverted simply by negating its off-diagonal entries. Therefore,

    A ≈ U_0^{−*} ··· U_{L−1}^{−*} D V_{L−1}^{−1} ··· V_0^{−1} ≡ F,   (3.2a)
    A^{−1} ≈ V_0 ··· V_{L−1} D^{−1} U_{L−1}^* ··· U_0^* = F^{−1}.   (3.2b)

The factorization F permits fast multiplication and can be used as a generalized FMM. Its inverse F^{−1} can be used as a direct solver at high accuracy or as a preconditioner otherwise. If D is stored in factored form, e.g., as an LU decomposition, then the same factorization can readily be used for both tasks.
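The inversion-by-negation holds for each elementary factor individually: every Q_c, R_č, S_č is (block) unit triangular with a single nonzero off-diagonal block, so its strictly triangular part N satisfies N² = 0 and hence (I + N)^{−1} = I − N. A quick Python check (our own sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
k, nred = 6, 4

# Elementary unit triangular factor, e.g. Q_p of Lemma 2.4:
# the identity plus a single off-diagonal block -T.
T = rng.standard_normal((k, nred))
Q = np.eye(k + nred)
Q[nred:, :nred] = -T      # block (skeleton rows, redundant columns)

Qinv = np.eye(k + nred)
Qinv[nred:, :nred] = T    # invert by negating the off-diagonal entries

# (I + N)(I - N) = I - N^2 = I since N^2 = 0 for a single off-diagonal block.
assert np.allclose(Q @ Qinv, np.eye(k + nred))
assert np.allclose(Qinv @ Q, np.eye(k + nred))
```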
We call (3.2) an (approximate) generalized LU decomposition since while each U_ℓ and V_ℓ are composed of triangular factors, they are not themselves triangular, being the product of both upper and lower triangular matrices. We emphasize that F and F^{−1} are not assembled explicitly and are applied only in factored form.

The entire procedure is summarized compactly as Algorithm 3.1. In general, we construct the cell partitioning at each level using an adaptive quadtree [48], which recursively subdivides the domain until each node contains only O(1) DOFs.
Algorithm 3.1
RSF.

    A_0 = A                                         ▷ initialize
    for ℓ = 0, 1, ..., L − 1 do                     ▷ loop from finest to coarsest level
        A_{ℓ+1} = Z_{C_ℓ}(A_ℓ) ≈ U_ℓ^* A_ℓ V_ℓ     ▷ skeletonize cells
    end for
    A ≈ U_0^{−*} ··· U_{L−1}^{−*} A_L V_{L−1}^{−1} ··· V_0^{−1}   ▷ generalized LU decomposition

[Figure 3.2: active DOFs at levels ℓ = 0, 1, 2 of RSF in 3D.]
Consider now the analogous setting in 3D, where Ω = (0, 1)³ is discretized using a uniform n × n × n grid with Ω_j = h(j_1 − 1, j_1) × h(j_2 − 1, j_2) × h(j_3 − 1, j_3) and x_j = h(j_1 − 1/2, j_2 − 1/2, j_3 − 1/2) for j = (j_1, j_2, j_3). The total number of DOFs is N = n³.

The algorithm extends in the natural way with cubic cells 2^ℓ mh(j_1 − 1, j_1) × 2^ℓ mh(j_2 − 1, j_2) × 2^ℓ mh(j_3 − 1, j_3) about the centers 2^ℓ mh(j_1 − 1/2, j_2 − 1/2, j_3 − 1/2) replacing the square cells in 2D at level ℓ for 1 ≤ j_1, j_2, j_3 ≤ 2^{L−ℓ}. With this modification, the rest of the algorithm remains unchanged. Figure 3.2 shows the active DOFs at each level for a representative example. The output is again a factorization of the form (3.2). General geometries can be treated using an adaptive octree.

A dominant contribution to the cost of RSF is computing IDs for skeletonization. The basic operation required is the construction of an ID of

    W_{ℓ,c} = [ (A_ℓ)_{c^C, c}   ]
              [ (A_ℓ)_{c, c^C}^* ],

where c ∈ C_ℓ and c^C = s_ℓ \ c, following Lemma 2.4. We hereafter drop the dependence on ℓ for notational convenience. Observe that W_c is a tall-and-skinny matrix of size O(N) × |c|, so forming its ID takes at least O(N|c|) work. The total number of index sets c ∈ C_ℓ over all ℓ is O(N), so considering all W_c yields a lower bound of O(N²) on the total work and hence on the complexity of RSF.

In principle, it is straightforward to substantially accelerate the algorithm by reconstructing an ID of W_c from that of a much smaller matrix Y_c. All that is needed is that the rows of Y_c span those of W_c, i.e., R(W_c^*) ⊆ R(Y_c^*), where R(·) denotes the matrix range.

Lemma 3.1.
Let W = XY with column indices q. If q = q̂ ∪ q̌ and T_q are such that Y_{:,q̌} = Y_{:,q̂} T_q, then

    W_{:,q̌} = XY_{:,q̌} = XY_{:,q̂} T_q = W_{:,q̂} T_q.

In other words, an ID of Y_c gives an ID of W_c = X_c Y_c. Note that we make no explicit reference to X_c; only its existence is assumed. Of course, such a small matrix Y_c always exists since rank(W_c) ≤ |c|; the difficulty lies in finding Y_c a priori.

For elliptic problems, the integral kernel K(r) typically satisfies some form of Green's theorem, in which its values inside a region D ⊂ Ω can be recovered from its values on the boundary Γ = ∂D. Consider, for example, the Laplace kernel (1.4) and let ϕ(x) = G(‖x − x_0‖) be the harmonic field in D due to an exterior source x_0 ∈ Ω \ D̄. Then

    ϕ(x) = ∫_Γ [ ϕ(y) ∂G/∂ν_y(‖x − y‖) − ∂ϕ/∂ν_y(y) G(‖x − y‖) ] dΓ(y),   x ∈ D,

i.e., the "incoming" field ϕ(x) lives in the span of single- and double-layer interactions with Γ. In practice, we will use this fact only when x ∈ D is sufficiently separated from Γ (see below), in which case the double-layer term can often even be omitted since the corresponding discrete spaces are equal to high precision. Outgoing interactions can essentially be treated in the same way using the "transpose" of this idea.

In such cases, a suitable Y_c can readily be constructed. To see this, let B denote the cell containing the DOFs c and draw a local "proxy" surface Γ around B (Figure 3.3). This partitions c^C as c^C = c_N ∪ c_F, where c_N consists of all DOFs interior to Γ (the near field) and c_F consists of the rest (the far field). By Green's theorem, the interactions involving c_F can be represented by artificial "equivalent" interactions with Γ.

[Figure 3.3: the DOFs in a cell B can be represented by their interactions with an artificial local proxy surface Γ and with all DOFs interior to Γ.]
Therefore, discretizing Γ with equivalent DOFs c_E, we assert that:

Lemma 3.2.
Consider (1.1) with b(x) ≡ c(x) ≡ 1 and let all quantities be as defined in the preceding discussion. Then, up to discretization error (see [45]),

    R(A_{c_F, c}^*) ⊆ R(Y_{c_E, c}^*),

where (Y_{c_E, c})_{ij} = K(‖x_i^E − x_j‖) for {x_j} and {x_j^E} the points identified with the DOFs c and c_E, respectively.
Proof.
This immediately follows from Green's theorem upon recognizing that A_{c_F, c} contains interactions involving only the original kernel function K(r). This must be checked because A_{:,c} may have Schur complement interactions (SCIs), i.e., those corresponding to the matrix B_{p̂ p̂} in (2.6), accumulated from skeletonization at previous levels, over which we do not have analytic control. However, due to the hierarchical nature of the domain partitioning, any such SCIs must be restricted to the diagonal block A_{cc}. Thus, Green's theorem applies. □

Lemma 3.3.
Consider (1.1) with general b(x) and c(x). Then, up to discretization error,

    R(A_{c_F, c}^*) ⊆ R(Y_{c_E, c}^*)   and   R(A_{c, c_F}) ⊆ R(Y_{c, c_E}),

where

    (Y_{c_E, c})_{ij} = K(‖x_i^E − x_j‖) c(x_j),   (Y_{c, c_E})_{ij} = b(x_i) K(‖x_i − x_j^E‖).

Proof.
The functions b(x) and c(x) act as diagonal multipliers, so A_{c_F, c} = B_{c_F} Ã_{c_F, c} C_c, where Ã_{c_F, c} is the corresponding interaction matrix with b(x) ≡ c(x) ≡ 1, and B_{c_F} = diag(b(x_i^F)) and C_c = diag(c(x_j)) for {x_j^F} the points attached to c_F. By Lemma 3.2, Ã_{c_F, c} = X̃_{c_E, c} Ỹ_{c_E, c} for some X̃_{c_E, c}, so

    A_{c_F, c} = B_{c_F} X̃_{c_E, c} Ỹ_{c_E, c} C_c = (B_{c_F} X̃_{c_E, c})(Ỹ_{c_E, c} C_c) ≡ X_{c_E, c} Y_{c_E, c}.

A similar argument with A_{c, c_F} = B_c Ã_{c, c_F} C_{c_F} analogously defined and

    Ã_{c, c_F} = Ã_{c_F, c}^T = Ỹ_{c_E, c}^T X̃_{c_E, c}^T ≡ Ỹ_{c, c_E} X̃_{c, c_E}

proves that A_{c, c_F} = Y_{c, c_E} X_{c, c_E} for some X_{c, c_E}. □

If Γ is separated from B, for example as in Figure 3.3, then standard multipole estimates [28, 29] show that we only need |c_E| = O(log^{d−1}(1/ε)) to satisfy Green's theorem to any precision ε. In particular, for fixed ε, we can choose |c_E| to be constant. Therefore, Lemma 3.3 gives

    W_c ≈ X_c Y_c ≡ X_c [ A_{c_N, c}   ]
                        [ A_{c, c_N}^* ]   (3.3)
                        [ Y_{c_E, c}   ]
                        [ Y_{c, c_E}^* ]

for some X_c, where Y_c has size O(|c_N| + 1) × |c| with |c_N| = O(|c|) typically. Lemma 3.1 then reduces the global compression of W_c to the local compression of Y_c. This so-called proxy trick has also been employed by [14, 16, 25, 27, 39, 43, 44, 46, 54] and is crucial for reducing the asymptotic complexity. For numerical stability, we include the quadrature weights for the integral (3.1) in Y_{c_E, c} and Y_{c, c_E} so that the various components of Y_c are all of the same order.

In this paper, for a cell B with scaled width 1 centered at the origin, we take as Γ the circle (sphere in 3D) of radius 3/2; modest values of |c_E| have been experimentally validated to reproduce interactions via the Laplace kernel (1.4) with ε ∼ 10^{−15}. This approach is more efficient than the "supercell" proxy of [27, 39], which takes as Γ the outer boundary of the 3 × 3 (× 3) cell block centered at B, by factors of 4/π ≈ 1.27 in 2D and 6/π ≈ 1.91 in 3D (the volume ratio of the cube to the sphere of equal diameter).

We now investigate the computational complexity of RSF. For this, we need to estimate the skeleton size |ĉ| for a typical index set c ∈ C_ℓ at level ℓ. Denote this quantity by k_ℓ and let n_ℓ = (2^ℓ m)^d = O(2^{dℓ}) be the number of DOFs (both active and inactive) in each cell. From Figures 3.1 and 3.2, it is clear that skeletons tend to cluster around cell interfaces, which can again be justified by Green's theorem, so k_ℓ = O(n_ℓ^{1/2}) = O(2^ℓ) in 2D and k_ℓ = O(n_ℓ^{2/3}) = O(4^ℓ) in 3D. Indeed, this can be verified using standard multipole estimates by noting that k_ℓ is on the order of the interaction rank between two adjacent cells at level ℓ, which can be analyzed via recursive subdivision to expose well-separated structures (Figure 3.4). This yields the more detailed result

    k_ℓ = { O(ℓ),           d = 1
          { O(2^{(d−1)ℓ}),  d ≥ 2,   (3.4)

which, in fact, holds for d equal to the intrinsic dimension rather than the ambient dimension.

Theorem 3.4 ([39, 43]). Assume that (3.4) holds. Then the cost of constructing the factorization F in (3.2) using RSF with accelerated compression is

    t_f = O(2^{dL} m^{3d}) + ∑_{ℓ=1}^{L} 2^{d(L−ℓ)} O(k_ℓ^3) = { O(N),             d = 1
                                                               { O(N^{3(1−1/d)}),  d ≥ 2,   (3.5)

while that of applying F or F^{−1} is

    t_{a/s} = O(2^{dL} m^{2d}) + ∑_{ℓ=1}^{L} 2^{d(L−ℓ)} O(k_ℓ^2) = { O(N),             d = 1
                                                                   { O(N log N),       d = 2
                                                                   { O(N^{2(1−1/d)}),  d ≥ 3.   (3.6)
Proof.
Consider first the factorization cost $t_f$. There are $2^{d(L-\ell)}$ cells at level $\ell$, where each cell $c \in C_\ell$ requires the calculation of an ID of $Y_c$ in (3.3) as well as various local matrix operations at a total cost of $O(|c|^3)$, assuming that $|c^N| = O(|c|)$. But $|c| = m^d$ for $\ell = 0$, while $|c| = O(k_{\ell-1})$ for $\ell \geq 1$ since the DOFs of $c$ are obtained by merging the skeletons of $2^d$ cells at level $\ell - 1$. A similar decomposition holds for $t_{a/s}$ by observing that each $c \in C_\ell$ requires local matrix-vector products with cost $O(|c|^2)$. $\square$

Remark 3.5. If a tree is used, then there is also a cost of $O(N \log N)$ for tree construction, but the associated constant is tiny and so we can ignore it for all practical purposes.

The memory cost to store $F$ itself is clearly $m_f = O(t_{a/s})$ and so is also given by (3.6). From Theorem 3.4, it is immediate that RSF behaves just like MF, with the geometric growth of $k_\ell$ in 2D and 3D leading to suboptimal complexities.

Corollary 3.6. If
$$k_\ell = O(k\ell) \qquad (3.7)$$
for some constant $k$, then $t_f = O(Nk^2)$ and $t_{a/s} = O(Nk)$.

Proof. From (3.5), $t_f = O(2^{dL}(m^{3d} + k^3))$, so choosing $m^d = O(k)$ gives $N = n^d = (2^L m)^d = O(2^{dL} k)$ and $t_f = O(2^{dL} k^3) = O(Nk^2)$. Similarly, $t_{a/s} = O(2^{dL}(m^d + k^2)) = O(2^{dL} k^2) = O(Nk)$. $\square$

This is a more precise version of the 1D result that will be useful later when discussing HIF-IE.
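The proxy compression of Lemma 3.3 and (3.3) is easy to exercise numerically. The sketch below is a hypothetical Python/NumPy illustration (the released codes are MATLAB): it checks that the far-field interactions of a cluster are numerically contained in the row space of its interactions with a small set of proxy points on the circle of radius $3/2$, exactly as the text describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def K(X, Y):
    # 2D Laplace kernel (1.4): K(r) = -log(r) / (2*pi)
    r = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return -np.log(r) / (2 * np.pi)

# Cluster c in the unit cell B = [0,1]^2; far field c^F outside the
# surrounding 3x3 cell block [-1,2]^2; proxy surface c^E: 64 points on
# the circle of radius 3/2 about the cell center.
c = rng.random((200, 2))
pts = rng.random((5000, 2)) * 11 - 5                      # candidates in [-5,6]^2
far = pts[np.any((pts < -1) | (pts > 2), axis=1)][:500]   # outside [-1,2]^2
th = 2 * np.pi * np.arange(64) / 64
proxy = 0.5 + 1.5 * np.column_stack([np.cos(th), np.sin(th)])

A_fc = K(far, c)      # far-field interactions A_{c^F, c}
Y_ec = K(proxy, c)    # proxy interactions    Y_{c^E, c}

# Rows of A_fc should lie (numerically) in the row space of Y_ec:
C, *_ = np.linalg.lstsq(Y_ec.T, A_fc.T, rcond=None)
resid = np.linalg.norm(Y_ec.T @ C - A_fc.T) / np.linalg.norm(A_fc)
print(resid)  # small: the 500 x 200 far-field block is captured via 64 proxy rows
```

This is why compressing the small matrix $Y_c$ suffices in place of the global block $W_c$.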
In this section, we present HIF-IE, which builds upon RSF by introducing additional levels of skeletonization in order to effectively reduce all problems to 1D. Considering the 2D case for concreteness, the main idea is simply to employ an additional level $\ell + 1/2$ of edge skeletonization after each level $\ell$ of cell skeletonization by partitioning $\Omega$ according to the cell edges near which the surviving active DOFs cluster. This fully exploits the 1D geometry of the active DOFs. However, the algorithm is complicated by the fact that the cell and edge partitions are non-nested, so different index groups may now interact via SCIs. Such SCIs do not lend themselves easily to analysis and we have yet to prove a statement like (3.4) on their ranks. Nevertheless, extensive numerical experiments by ourselves (Section 5) and others [16] reveal that very similar bounds appear to be obeyed. This suggests that SCIs do not need to be treated in any significantly different way, and we hereafter assume that interaction rank is completely determined by geometry.

Figure 4.1. Active DOFs at each level $\ell$ of HIF-IE in 2D.
The overall approach of HIF-IE is closely related to that of [16], but our sparsification framework permits a much simpler implementation and analysis. As with RSF, we begin first in 2D before extending to 3D.
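The clustering of active DOFs near cell interfaces that motivates the extra edge levels can be observed directly in a toy computation. The following Python sketch (an illustrative stand-in, not the released MATLAB code) skeletonizes one cell of a 2D Laplace volume discretization via pivoted QR and checks that (i) the cell compresses substantially and (ii) the selected skeletons sit closer to the cell boundary than average.

```python
import numpy as np
from scipy.linalg import qr

# Uniform 64x64 grid on the unit square; cell = the 16x16 block [0,1/4)^2.
n = 64
h = 1.0 / n
x = (np.arange(n) + 0.5) * h
X, Y = np.meshgrid(x, x, indexing="ij")
pts = np.column_stack([X.ravel(), Y.ravel()])
in_cell = np.all(pts < 0.25, axis=1)
cell, rest = pts[in_cell], pts[~in_cell]

# Interactions of the rest of the domain with the cell (Laplace kernel (1.4)).
r = np.linalg.norm(rest[:, None] - cell[None, :], axis=-1)
A_rc = -np.log(r) / (2 * np.pi)

# ID via pivoted QR: skeleton = first k pivot columns.
_, R, piv = qr(A_rc, mode="economic", pivoting=True)
tol = 1e-9
k = int(np.sum(np.abs(np.diag(R)) > tol * np.abs(R[0, 0])))
skel = cell[piv[:k]]

# (i) compression: k well below the 256 DOFs in the cell.
print(k)

# (ii) skeletons cluster at the cell boundary.
def dist_to_boundary(p):
    return np.minimum.reduce([p[:, 0], p[:, 1], 0.25 - p[:, 0], 0.25 - p[:, 1]])
print(dist_to_boundary(skel).mean(), dist_to_boundary(cell).mean())
```

The surviving skeletons trace out the cell edges, i.e., an essentially 1D set, which is exactly what the fractional levels then compress.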
Assume the same setup as in Section 3.1. HIF-IE supplements cell skeletonization (2D to 1D) at level $\ell$ with edge skeletonization (1D to 0D) at level $\ell + 1/2$ for $\ell = 0, 1, \ldots, L - 1$. Figure 4.1 shows the active DOFs at each level for a representative example.
Level $\ell$.
Partition $\Omega$ into Voronoi cells about the cell centers $2^\ell m h (j_1 - 1/2, j_2 - 1/2)$ for $1 \leq j_1, j_2 \leq 2^{L-\ell}$. Let $C_\ell$ be the collection of index sets corresponding to the active DOFs of each cell. Skeletonization with respect to $C_\ell$ then gives
$$A_{\ell+1/2} = Z_{C_\ell}(A_\ell) \approx U_\ell^* A_\ell V_\ell, \qquad U_\ell = \prod_{c \in C_\ell} Q_c R_{\check{c}}, \quad V_\ell = \prod_{c \in C_\ell} Q_c S_{\check{c}},$$
where the DOFs $\bigcup_{c \in C_\ell} \check{c}$ have been eliminated.

Level $\ell + 1/2$. Partition $\Omega$ into Voronoi cells about the edge centers $2^\ell m h (j_1, j_2 - 1/2)$ for $1 \leq j_1 \leq 2^{L-\ell} - 1$, $1 \leq j_2 \leq 2^{L-\ell}$ and $2^\ell m h (j_1 - 1/2, j_2)$ for $1 \leq j_1 \leq 2^{L-\ell}$, $1 \leq j_2 \leq 2^{L-\ell} - 1$. Let $C_{\ell+1/2}$ be the collection of index sets corresponding to the active DOFs of each cell. Skeletonization with respect to $C_{\ell+1/2}$ then gives
$$A_{\ell+1} = Z_{C_{\ell+1/2}}(A_{\ell+1/2}) \approx U_{\ell+1/2}^* A_{\ell+1/2} V_{\ell+1/2}, \qquad U_{\ell+1/2} = \prod_{c \in C_{\ell+1/2}} Q_c R_{\check{c}}, \quad V_{\ell+1/2} = \prod_{c \in C_{\ell+1/2}} Q_c S_{\check{c}},$$
where the DOFs $\bigcup_{c \in C_{\ell+1/2}} \check{c}$ have been eliminated.

Level $L$. Combining the approximation over all levels gives
$$D \equiv A_L \approx U_{L-1/2}^* \cdots U_{1/2}^* U_0^* A V_0 V_{1/2} \cdots V_{L-1/2},$$
so
$$A \approx U_0^{-*} U_{1/2}^{-*} \cdots U_{L-1/2}^{-*} D V_{L-1/2}^{-1} \cdots V_{1/2}^{-1} V_0^{-1} \equiv F, \qquad (4.1a)$$
$$A^{-1} \approx V_0 V_{1/2} \cdots V_{L-1/2} D^{-1} U_{L-1/2}^* \cdots U_{1/2}^* U_0^* = F^{-1}. \qquad (4.1b)$$
This is a factorization of exactly the same type as that in (3.2) (but with twice the number of factors). The entire procedure is summarized as Algorithm 4.1.

Algorithm 4.1
HIF-IE in 2D.
$A_0 = A$ ▷ initialize
for $\ell = 0, 1, \ldots, L - 1$ do ▷ loop from finest to coarsest level
  $A_{\ell+1/2} = Z_{C_\ell}(A_\ell) \approx U_\ell^* A_\ell V_\ell$ ▷ skeletonize cells
  $A_{\ell+1} = Z_{C_{\ell+1/2}}(A_{\ell+1/2}) \approx U_{\ell+1/2}^* A_{\ell+1/2} V_{\ell+1/2}$ ▷ skeletonize edges
end for
$A \approx U_0^{-*} U_{1/2}^{-*} \cdots U_{L-1/2}^{-*} A_L V_{L-1/2}^{-1} \cdots V_{1/2}^{-1} V_0^{-1} \equiv F$ ▷ generalized LU decomposition

Assume the same setup as in Section 3.2. HIF-IE now performs two rounds of additional dimensional reduction over RSF by supplementing cell skeletonization (3D to 2D) at level $\ell$ with face skeletonization (2D to 1D) at level $\ell + 1/3$ and edge skeletonization (1D to 0D) at level $\ell + 2/3$ for $\ell = 0, 1, \ldots, L - 1$. Figure 4.2 shows the active DOFs at each level for a representative example.
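One step of the skeletonization operator $Z_{C_\ell}$ driving Algorithms 4.1 and 4.2 can be sketched as follows: an illustrative Python reimplementation under simplifying assumptions (symmetric kernel matrix, a single cluster, pivoted QR in place of the paper's ID routine). After the sparsifying transform $Q_c$, the redundant DOFs $\check c$ decouple from the rest of the domain up to the ID tolerance and can then be eliminated by a Schur complement (Lemma 2.1).

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(0)
pts = rng.random((1000, 2))
c = np.flatnonzero(np.all(pts < 0.35, axis=1))     # cluster: a corner cell
r = np.setdiff1d(np.arange(len(pts)), c)

# Symmetric Laplace-type kernel matrix with a regularized diagonal.
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
np.fill_diagonal(d, 1.0)
A = -np.log(d) / (2 * np.pi)
np.fill_diagonal(A, 1.0)

# ID of the off-diagonal block: A[r, cres] ~= A[r, cskel] @ T.
_, R, piv = qr(A[np.ix_(r, c)], mode="economic", pivoting=True)
tol = 1e-9
k = int(np.sum(np.abs(np.diag(R)) > tol * np.abs(R[0, 0])))
T = solve_triangular(R[:k, :k], R[:k, k:], lower=False)
cskel, cres = c[piv[:k]], c[piv[k:]]

# Sparsifying transform Q: subtract interpolated skeleton columns (and,
# by symmetry of A, rows) from the redundant ones.
n = len(pts)
Q = np.eye(n)
Q[np.ix_(cskel, cres)] = -T
B = Q.T @ A @ Q

# The redundant DOFs now interact (almost) only within the cluster; a
# Schur complement on B[cres, cres] would eliminate them, leaving cskel active.
decoupled = np.linalg.norm(B[np.ix_(r, cres)]) / np.linalg.norm(A)
print(k, len(c), decoupled)   # decoupled ~ ID tolerance
```

Repeating this over all cells (and then edges/faces) of each level reproduces the sweep in the algorithm listings.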
Level $\ell$.
Partition $\Omega$ into Voronoi cells about the cell centers $2^\ell m h (j_1 - 1/2, j_2 - 1/2, j_3 - 1/2)$ for $1 \leq j_1, j_2, j_3 \leq 2^{L-\ell}$. Let $C_\ell$ be the collection of index sets corresponding to the active DOFs of each cell. Skeletonization with respect to $C_\ell$ then gives
$$A_{\ell+1/3} = Z_{C_\ell}(A_\ell) \approx U_\ell^* A_\ell V_\ell, \qquad U_\ell = \prod_{c \in C_\ell} Q_c R_{\check{c}}, \quad V_\ell = \prod_{c \in C_\ell} Q_c S_{\check{c}},$$
where the DOFs $\bigcup_{c \in C_\ell} \check{c}$ have been eliminated.

Figure 4.2. Active DOFs at each level $\ell$ of HIF-IE in 3D.
Level $\ell + 1/3$. Partition $\Omega$ into Voronoi cells about the face centers
$$2^\ell m h \Big(j_1, j_2 - \tfrac{1}{2}, j_3 - \tfrac{1}{2}\Big), \quad 1 \leq j_1 \leq 2^{L-\ell} - 1, \ 1 \leq j_2, j_3 \leq 2^{L-\ell},$$
$$2^\ell m h \Big(j_1 - \tfrac{1}{2}, j_2, j_3 - \tfrac{1}{2}\Big), \quad 1 \leq j_2 \leq 2^{L-\ell} - 1, \ 1 \leq j_1, j_3 \leq 2^{L-\ell},$$
$$2^\ell m h \Big(j_1 - \tfrac{1}{2}, j_2 - \tfrac{1}{2}, j_3\Big), \quad 1 \leq j_3 \leq 2^{L-\ell} - 1, \ 1 \leq j_1, j_2 \leq 2^{L-\ell}.$$
Let $C_{\ell+1/3}$ be the collection of index sets corresponding to the active DOFs of each cell. Skeletonization with respect to $C_{\ell+1/3}$ then gives
$$A_{\ell+2/3} = Z_{C_{\ell+1/3}}(A_{\ell+1/3}) \approx U_{\ell+1/3}^* A_{\ell+1/3} V_{\ell+1/3}, \qquad U_{\ell+1/3} = \prod_{c \in C_{\ell+1/3}} Q_c R_{\check{c}}, \quad V_{\ell+1/3} = \prod_{c \in C_{\ell+1/3}} Q_c S_{\check{c}},$$
where the DOFs $\bigcup_{c \in C_{\ell+1/3}} \check{c}$ have been eliminated.
Level $\ell + 2/3$. Partition $\Omega$ into Voronoi cells about the edge centers
$$2^\ell m h \Big(j_1, j_2, j_3 - \tfrac{1}{2}\Big), \quad 1 \leq j_1, j_2 \leq 2^{L-\ell} - 1, \ 1 \leq j_3 \leq 2^{L-\ell},$$
$$2^\ell m h \Big(j_1, j_2 - \tfrac{1}{2}, j_3\Big), \quad 1 \leq j_1, j_3 \leq 2^{L-\ell} - 1, \ 1 \leq j_2 \leq 2^{L-\ell},$$
$$2^\ell m h \Big(j_1 - \tfrac{1}{2}, j_2, j_3\Big), \quad 1 \leq j_2, j_3 \leq 2^{L-\ell} - 1, \ 1 \leq j_1 \leq 2^{L-\ell}.$$
Let $C_{\ell+2/3}$ be the collection of index sets corresponding to the active DOFs of each cell. Skeletonization with respect to $C_{\ell+2/3}$ then gives
$$A_{\ell+1} = Z_{C_{\ell+2/3}}(A_{\ell+2/3}) \approx U_{\ell+2/3}^* A_{\ell+2/3} V_{\ell+2/3}, \qquad U_{\ell+2/3} = \prod_{c \in C_{\ell+2/3}} Q_c R_{\check{c}}, \quad V_{\ell+2/3} = \prod_{c \in C_{\ell+2/3}} Q_c S_{\check{c}},$$
where the DOFs $\bigcup_{c \in C_{\ell+2/3}} \check{c}$ have been eliminated.

Level $L$. Combining the approximation over all levels gives
$$D \equiv A_L \approx U_{L-1/3}^* \cdots U_{2/3}^* U_{1/3}^* U_0^* A V_0 V_{1/3} V_{2/3} \cdots V_{L-1/3},$$
so
$$A \approx U_0^{-*} U_{1/3}^{-*} U_{2/3}^{-*} \cdots U_{L-1/3}^{-*} D V_{L-1/3}^{-1} \cdots V_{2/3}^{-1} V_{1/3}^{-1} V_0^{-1} \equiv F, \qquad (4.2a)$$
$$A^{-1} \approx V_0 V_{1/3} V_{2/3} \cdots V_{L-1/3} D^{-1} U_{L-1/3}^* \cdots U_{2/3}^* U_{1/3}^* U_0^* = F^{-1}. \qquad (4.2b)$$
This procedure is summarized as Algorithm 4.2.

Algorithm 4.2
HIF-IE in 3D.
$A_0 = A$ ▷ initialize
for $\ell = 0, 1, \ldots, L - 1$ do ▷ loop from finest to coarsest level
  $A_{\ell+1/3} = Z_{C_\ell}(A_\ell) \approx U_\ell^* A_\ell V_\ell$ ▷ skeletonize cells
  $A_{\ell+2/3} = Z_{C_{\ell+1/3}}(A_{\ell+1/3}) \approx U_{\ell+1/3}^* A_{\ell+1/3} V_{\ell+1/3}$ ▷ skeletonize faces
  $A_{\ell+1} = Z_{C_{\ell+2/3}}(A_{\ell+2/3}) \approx U_{\ell+2/3}^* A_{\ell+2/3} V_{\ell+2/3}$ ▷ skeletonize edges
end for
$A \approx U_0^{-*} U_{1/3}^{-*} \cdots U_{L-1/3}^{-*} A_L V_{L-1/3}^{-1} \cdots V_{1/3}^{-1} V_0^{-1} \equiv F$ ▷ generalized LU decomposition

Proxy compression still applies, provided that we make some minor modifications to account for SCIs, which we generally have access to only numerically and so cannot evaluate at arbitrary points as needed in Lemma 3.3. Specifically, for a given index set $c$, we now expand $c^N$ by including all DOFs that interact with $c$ via SCIs in addition to those interior to $\Gamma$ as in Section 3.3. The far field $c^F = c^C \setminus c^N$ then consists only of original kernel interactions, so Lemma 3.3 holds. It remains to observe that SCIs are local due to the domain partitioning strategy. Thus, all $c^N$ reside in an immediate neighborhood of $c$ and we again conclude that $|c^N| = O(|c|)$.

Even with this acceleration, however, the ID still manifests as a computational bottleneck. To combat this, we also tried fast randomized methods [37] based on compressing $\Phi_c Y_c$, where $\Phi_c$ is a small Gaussian random sampling matrix. We found that the resulting ID was inaccurate when $Y_c$ contained SCIs. This could be remedied by considering instead $\Phi_c (Y_c Y_c^*)^\gamma Y_c$ for some small integer $\gamma = 1, 2, \ldots$, but the expense of the extra multiplications usually outweighed any efficiency gains.

The algorithms presented so far are highly accurate for first-kind IEs in that $\|A - F\| / \|A\| = O(\varepsilon)$, where $\varepsilon$ is the input precision to the ID (Section 5).
For second-kind IEs, however, we see a systematic deterioration of the relative error roughly as $O(N\varepsilon)$ as $N \to \infty$. This instability can be explained as follows. Let $A$ be a typical second-kind IE matrix discretization. Then the diagonal entries of $A$ are $O(1)$, while its off-diagonal entries are $O(1/N)$. Since the interpolation matrix, say, $T_p$ from the ID has entries of order $O(1)$, the same is true of $B_{\check{p}\check{p}}$, $B_{\check{p}\hat{p}}$, and $B_{\hat{p}\check{p}}$ in (2.5). Therefore, the entries of the Schur complement $B_{\hat{p}\hat{p}}$ in (2.6) are $O(1)$, i.e., SCIs dominate kernel interactions by a factor of $O(N)$.

Lemma 4.1.
Assume the setting of the discussion above and let $c \in C_\ell$ be such that $Y_c$ in (3.3) contains SCIs. Then $\|Y_c\| = O(1)$, so the ID of $Y_c$ has absolute error $\|E_c\| = O(\varepsilon)$.

Consider now the process of "unfolding" the factorization $F$ from the middle matrix $D \equiv A_L$ outward. This is accomplished by undoing the skeletonization operation for each $c \in C_\ell$ in reverse order, at each step reconstructing $(A_\ell)_{:,\check{c}}$ and $(A_\ell)_{\check{c},:}$ from $(A_{\ell+1/d})_{:,\hat{c}}$ and $(A_{\ell+1/d})_{\hat{c},:}$. Restricting attention to 2D for concreteness, we start at level $L$ with interactions between the DOFs $s_L$ as depicted in Figure 4.3 (left). By Lemma 4.1, un-skeletonizing each edge $c \in C_{L-1/2}$ induces an error in the interactions between the edges $e_1$ and $e_2$ as labeled in the figure (center) of absolute magnitude $O(\varepsilon)$. At the next level, un-skeletonizing the shaded cell $c \in C_{L-1}$ which they bound then relies on the approximate interactions between $e_1$ and $e_2$. This spreads the $O(\varepsilon)$ error over the reconstructed cell interactions, which is small for SCIs acting internally to each cell $c \in C_{L-1}$ (omitting level $L - 1/2$) but large relative to the kernel interactions between distinct cells $B_1$ and $B_2$ (right); indeed, the relative error for the latter is $O(N\varepsilon)$. These corrupted interactions are then used for reconstruction at the next level and are eventually spread throughout the whole matrix. The same argument clearly holds in 3D.

This analysis suggests that the only fix is to skeletonize at effective precision $O(\varepsilon/N)$ so that kernel interactions are accurately reconstructed. This is equivalent to ensuring that both scales in $Y_c$ are well approximated by the ID. Following this intuition, we decompose $Y_c$ as $Y_c = Y_c^K + Y_c^S$, where $Y_c^K$ consists purely of kernel interactions, and set $\rho_c \varepsilon$ for $\rho_c = \min(1, \|Y_c^K\| / \|Y_c^S\|)$ as the local compression tolerance, which we note uses increased precision only when necessary.

The two-scale structure of $Y_c$ also enables an additional optimization as can be seen by studying the sparsity patterns of SCIs. Figure 4.4 shows an example configuration in 2D after cell skeletonization at level $\ell$, which leaves a collection of edges at level $\ell + 1/2$, each composed of two half-edges consisting of skeletons from the two cells on either side (left). Let $c = g_1 \cup g_2 \in C_{\ell+1/2}$ be a given edge with indices partitioned by half-edge, and let $Y_{g_j}$ be the submatrix of $Y_c$ corresponding to $g_j$. Then $Y_{g_1}^S$ and $Y_{g_2}^S$ (analogously defined) have different nonzero structures, so $Y_{g_1}$ and $Y_{g_2}$ have large entries in different row blocks (right). The stable interpolation of $Y_c$ hence requires that all interpolation coefficients from one half-edge to the other be $O(1/N)$ since otherwise the reconstruction of, say, $Y_{g_1}$ will have large errors in rows where $Y_{g_2}^S$ is nonzero. As $N \to \infty$, these cross-interpolation coefficients must therefore vanish and the compression of $Y_c$ decouples into the compression of $Y_{g_1}$ and $Y_{g_2}$ separately. We enforce this asymptotic decoupling explicitly, which moreover provides an acceleration due to the cubic cost of the ID. The ID of $Y_c$ is then given by $\hat{c} = (\hat{g}_1, \hat{g}_2)$, $\check{c} = (\check{g}_1, \check{g}_2)$, and $T_c = \operatorname{diag}(T_{g_1}, T_{g_2})$, where $g_j = \hat{g}_j \cup \check{g}_j$ and $T_{g_j}$ define the ID of $Y_{g_j}$. We use the compression tolerance $\rho_{g_j} \varepsilon$ with $\rho_{g_j} = \min(1, \|Y_{g_j}^K\| / \|Y_{g_j}^S\|)$ locally for each $g_j$.

In general, we define the subsets $\{g_j\}$ algebraically according to the sparsity pattern of $Y_c^S$, which can be done using the matrix indicator function
$$(S(A))_{ij} = \begin{cases} 0, & A_{ij} = 0 \\ 1, & A_{ij} \neq 0. \end{cases}$$

Lemma 4.2.
Let $B = S(A)^* S(A)$ for some matrix $A$. Then $A_{:,i}$ and $A_{:,j}$ have the same sparsity pattern if and only if $B_{ij} = \max(\|A_{:,i}\|_0, \|A_{:,j}\|_0)$, where $\|\cdot\|_0$ counts the number of nonzeros.

Analysis of HIF-IE is impeded by the compression of SCIs, for which we do not have rigorous bounds on the interaction rank. Nonetheless, ample numerical evidence suggests that SCIs behave very similarly to standard kernel interactions. For the sake of analysis, we hence assume that the same rank estimates apply, from which we have (3.7) for all $\ell \geq 0$.

Theorem 4.3.
Assume that (3.7) holds. Then the cost of constructing the factorization $F$ in (4.1) or (4.2) using HIF-IE with accelerated compression is $t_f = O(N)$, while that of applying $F$ or $F^{-1}$ is $t_{a/s} = O(N)$.

Proof. This is essentially just a restatement of Corollary 3.6 (but with the sum now taken also over fractional levels). $\square$
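The indicator-function test used to form the half-edge groups $\{g_j\}$ reduces sparsity-pattern comparison to a single matrix product. A minimal NumPy sketch (the criterion shown matches the reconstruction of Lemma 4.2 above, with nonzero counts on the diagonal of $B$):

```python
import numpy as np

def S(A):
    # Matrix indicator function: (S(A))_ij = 1 if A_ij != 0, else 0.
    return (A != 0).astype(int)

def same_pattern(A, i, j):
    # Columns i and j share a sparsity pattern iff the (i, j) entry of
    # B = S(A)^T S(A) equals the larger of the nonzero counts B_ii, B_jj.
    B = S(A).T @ S(A)
    return B[i, j] == max(B[i, i], B[j, j])

A = np.array([[1, 2, 0],
              [0, 0, 3],
              [4, 5, 0]])
print(same_pattern(A, 0, 1), same_pattern(A, 0, 2))  # True False
```

Here $B_{ij}$ counts rows where both columns are nonzero, so it can equal the larger count only when the two patterns coincide.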
Corollary 4.4.
For second-kind IEs,
$$t_f = \begin{cases} O(N \log N), & d = 2 \\ O(N \log^3 N), & d = 3, \end{cases} \qquad t_{a/s} = \begin{cases} O(N \log \log N), & d = 2 \\ O(N \log N), & d = 3. \end{cases}$$

Proof. According to the modifications of Section 4.4, there are now two effective ID tolerances: $\varepsilon$ for all $c \in C_\ell$ such that $Y_c^S = 0$ and $O(\varepsilon/N)$ otherwise. The former is used for all initial levels $\ell \leq \lambda$ before SCIs have become widespread (i.e., before any meaningful dimensional reduction has occurred), and the latter for all $\ell > \lambda$. But using precision $O(\varepsilon/N)$ yields a rank estimate with constant of proportionality $O(\log^\delta N)$, where $\delta$ is the intrinsic dimension of the DOF cluster $c$ [28, 29], so the amount of compression depends on $N$. Thus, $\lambda = \lambda(N)$ and our first task is to determine its form.

The crossover level $\lambda$ can be obtained by balancing the typical size $|c|$ of an edge (2D and 3D) or face (3D only) with its skeleton size $|\hat{c}|$. In 2D, this is $2^\lambda \sim \lambda \log N$, where the left-hand side gives the size of an edge at level $\lambda$, and the right-hand side the estimated rank for SCI compression. Therefore, $\lambda \sim \log \log N$. In 3D, there are two crossover levels $\lambda_1$ and $\lambda_2$ corresponding to face and edge compression, respectively, with $\lambda = \max(\lambda_1, \lambda_2)$:
$$4^{\lambda_1} \sim \lambda_1 \log^2 N, \qquad 2^{\lambda_2} \sim \lambda_2 \log N.$$
Hence, $\lambda_1 \sim \log \log N$ and $\lambda_2 \sim \log \log N$, so $\lambda \sim \log \log N$.

The cost of constructing $F$ for second-kind IEs is then
$$t_f = O(2^{dL} m^{3d}) + \sum_{\ell=1}^{\lambda} 2^{d(L-\ell)} O(2^{3(d-1)\ell}) + \sideset{}{'}\sum_{\ell=\lambda}^{L} O(2^{d(L-\ell)} k_\ell^3),$$
where prime notation denotes summation over all levels, both integer and fractional, and $k_\ell$ is as given in (3.7) with $k = O(\log N)$. The first sum corresponds to running RSF on the initial levels and reduces to
$$\sum_{\ell=1}^{\lambda} 2^{d(L-\ell)} O(2^{3(d-1)\ell}) = \begin{cases} O(N \log N), & d = 2 \\ O(N \log^3 N), & d = 3, \end{cases}$$
while the second can be interpreted as the cost of the standard HIF-IE (without modification) applied to the remaining
$$O(2^{-\lambda} N) = \begin{cases} O(N/\log N), & d = 2 \\ O(N/\log N), & d = 3 \end{cases}$$
DOFs but with precision $O(\varepsilon/N)$. By Corollary 3.6, this is
$$\sideset{}{'}\sum_{\ell=\lambda}^{L} O(2^{d(L-\ell)} k_\ell^3) = \begin{cases} O(N \log N), & d = 2 \\ O(N \log^3 N), & d = 3, \end{cases}$$
so, adding all terms, we derive $t_f$ as claimed. A similar argument for
$$t_{a/s} = O(2^{dL} m^{2d}) + \sum_{\ell=1}^{\lambda} 2^{d(L-\ell)} O(2^{2(d-1)\ell}) + \sideset{}{'}\sum_{\ell=\lambda}^{L} O(2^{d(L-\ell)} k_\ell^2)$$
completes the proof. $\square$

Remark 4.5. Like Theorem 3.4 for RSF, the parameter $d$ in Corollary 4.4 can also be regarded as the intrinsic dimension.

In this section, we demonstrate the efficiency of HIF-IE by reporting numerical results for some benchmark problems in 2D and 3D. All algorithms and examples were implemented in MATLAB and are freely available at https://github.com/klho/FLAM/. In what follows, we refer to RSF as rskelf2 in 2D and rskelf3 in 3D. Similarly, we call HIF-IE hifie2 and hifie3, respectively, with hifie2x and hifie3x denoting their second-kind IE counterparts. All codes are fully adaptive and built on quadtrees in 2D and octrees in 3D. The average block size $|c|$ at level 0 (and hence the tree depth $L$) was chosen so that $|c| \sim |\hat{c}|$.
In select cases, the first few fractional levels of HIF-IE were skipped to optimize the running time. Symmetry was exploited wherever possible by compressing
$$Y_c' = \begin{bmatrix} A_{c^N,c} \\ Y_{c^E,c} \end{bmatrix}$$
instead of the full matrix $Y_c$ in (3.3), which reduces the cost by about a factor of 2. Diagonal blocks, i.e., $A_{pp}$ in Lemma 2.1, were factored using the (partially pivoted) LDL decomposition if $A$ is symmetric and the LU decomposition otherwise.

For each example, the following, if applicable, are given:
• $\varepsilon$: base relative precision of the ID;
• $N$: total number of DOFs in the problem;
• $|s_L|$: number of active DOFs remaining at the highest level;
• $t_f$: wall clock time for constructing the factorization $F$ in seconds;
• $m_f$: memory required to store $F$ in GB;
• $t_{a/s}$: wall clock time for applying $F$ or $F^{-1}$ in seconds;
• $e_a$: a posteriori estimate of $\|A - F\| / \|A\|$ (see below);
• $e_s$: a posteriori estimate of $\|I - AF^{-1}\| \geq \|A^{-1} - F^{-1}\| / \|A^{-1}\|$;
• $n_i$: number of iterations to solve (1.6) using GMRES with preconditioner $F^{-1}$ to a tolerance of $10^{-12}$, where $f$ is a standard uniform random vector (ill-conditioned systems only).

The operator errors $e_a$ and $e_s$ were estimated using power iteration with a standard uniform random start vector [18, 42] and a convergence criterion of $10^{-2}$ relative precision in the matrix norm. This procedure requires the application of both $A$ and $A^*$, which for translation-invariant kernels was done using fast Fourier convolution [9] and for non-translation-invariant kernels using an ID-based kernel-independent FMM [44, 46] at precision $10^{-15}$. The same methods were also used to apply $A$ when solving (1.6) iteratively.

For simplicity, all IEs were discretized using a piecewise constant collocation method as in Section 3.
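The power-iteration estimator behind $e_a$ and $e_s$ is simple to sketch. The Python version below (an illustration, not the MATLAB implementation) estimates $\|A\|_2$ by power iteration on $A^*A$ given only matvec routines, which is exactly why fast application of $A$ and $A^*$ suffices.

```python
import numpy as np

def norm2_est(mv, rmv, n, tol=1e-8, maxiter=300, seed=0):
    # Estimate ||A||_2 by power iteration on A^* A, given matvec functions
    # mv(x) = A x and rmv(y) = A^* y (applied by FFT convolution or FMM
    # in the paper's setting).
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    est = 0.0
    for _ in range(maxiter):
        x /= np.linalg.norm(x)
        y = mv(x)
        new = np.linalg.norm(y)
        x = rmv(y)
        if abs(new - est) <= tol * new:
            break
        est = new
    return new

# Check against a dense matrix with known top singular value 3.
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((60, 60)))
V, _ = np.linalg.qr(rng.standard_normal((60, 60)))
A = U @ np.diag(np.geomspace(3.0, 1e-3, 60)) @ V.T
est = norm2_est(lambda x: A @ x, lambda y: A.T @ y, 60)
print(est)  # ~ 3
```

Estimating $e_a$, say, then amounts to calling this with mv(x) = $Ax - Fx$ and its adjoint.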
Certain near-field interactions (to be defined for each case) were computed using adaptive quadrature, while all other interactions were handled as simple one-point approximations, e.g., $K_{ij} = K(\|x_i - x_j\|) h^d$ in (3.1). All computations were performed in MATLAB R2010b on a single core (without parallelization) of an Intel Xeon E7-4820 CPU at 2.0 GHz on a 64-bit Linux server with 256 GB of RAM.

We begin first in 2D, where we present three examples.
Table 5.1. Factorization data ($|s_L|$, $t_f$ in seconds, $m_f$ in GB) for rskelf2 and hifie2 on Example 1 at each $\varepsilon$ and $N$.

Table 5.2. Application/solve data ($t_{a/s}$, $e_a$, $e_s$, $n_i$) for rskelf2 and hifie2 on Example 1.

Example 1
Consider (1.1) with $a(x) \equiv 0$, $b(x) \equiv c(x) \equiv 1$, $K(r) = -1/(2\pi) \log r$, and $\Omega = (0,1)^2$, i.e., a first-kind volume IE in the unit square, discretized over a uniform $n \times n$ grid. The diagonal entries $K_{ii}$ are computed adaptively, while all $K_{ij}$ for $i \neq j$ are approximated using one-point quadratures. We factored the resulting matrix $A$ using both rskelf2 and hifie2 at $\varepsilon = 10^{-3}$, $10^{-6}$, and $10^{-9}$. The data are summarized in Tables 5.1 and 5.2 with scaling results shown in Figure 5.1.

It is evident that $|s_L| \sim k_L$ behaves as predicted, with HIF-IE achieving significant compression over RSF. Consequently, we find strong support for asymptotic complexities consistent with Theorems 3.4 and 4.3. For all problem sizes tested, $t_f$ and $m_f$ are always smaller for HIF-IE, though $t_{a/s}$ is quite comparable. This is because $t_{a/s}$ is dominated by memory access (at least in our current implementation), which also explains its relative insensitivity to $\varepsilon$. Furthermore, we observe that $t_{a/s} \ll t_f$ for both methods, which makes them ideally suited to systems involving multiple right-hand sides.

Figure 5.1. Scaling results for Example 1. Timings $t_f$ (circles) and $t_{a/s}$ (squares) and storage requirements $m_f$ (diamonds) are shown for rskelf2 (white) and hifie2 (black) at fixed precision $\varepsilon$. Included also are reference scalings (gray dashed lines) of $O(N)$ and $O(N^{3/2})$ (left, from bottom to top), and $O(N)$ and $O(N \log N)$ (right). The lines for $t_{a/s}$ (bottom left) lie nearly on top of each other.

The forward approximation error $e_a = O(\varepsilon)$ for all $N$ and seems to increase only very mildly, if at all, with $N$. This indicates that the local accuracy of the ID provides a good estimate of the overall accuracy of the algorithm, which is not easy to prove since the multilevel matrix factors constituting $F$ are not unitary. On the other hand, we expect the inverse approximation error to scale as $e_s = O(\kappa(A) e_a)$, where $\kappa(A) = \|A\| \|A^{-1}\|$ is the condition number of $A$, and indeed we see that $e_s$ is much larger due to the ill-conditioning of the first-kind system. When using $F^{-1}$ to precondition GMRES, however, the number of iterations required is always very small. This indicates that $F^{-1}$ is a highly effective preconditioner.

Example 2
Consider now the same setup as in Example 1 but with $a(x) \equiv 1$. This gives a well-conditioned second-kind IE, which we factored using rskelf2, hifie2, and hifie2x. The data are summarized in Tables 5.3 and 5.4 with scaling results in Figure 5.2.

As expected, results for rskelf2 are essentially the same as those in Example 1 since the off-diagonal interactions at each level are identical. We also see the breakdown of hifie2, which still has linear complexity but fails to properly approximate $A$ as predicted in Section 4.4. This is remedied by hifie2x, which achieves $e_a, e_s = O(\varepsilon)$ but with a slight increase in cost. In particular, it appears to scale somewhat faster than linearly but remains consistent with Corollary 4.4.

Table 5.3. Factorization data ($|s_L|$, $t_f$, $m_f$) for rskelf2, hifie2, and hifie2x on Example 2.

Table 5.4. Application/solve data ($t_{a/s}$, $e_a$, $e_s$) for rskelf2, hifie2, and hifie2x on Example 2.
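The two scales blamed for the hifie2 breakdown are easy to exhibit for this discretization: the diagonal is $O(1)$, the off-diagonal kernel entries shrink roughly like $\log N / N$, and the system stays well-conditioned. A crude Python sketch (one-point quadrature everywhere and the singular diagonal of the integral part simply zeroed — not the paper's quadrature):

```python
import numpy as np

def second_kind_matrix(n):
    # a = b = c = 1, K(r) = -log(r)/(2*pi) on a uniform n x n grid over
    # (0,1)^2 with cell area h^2 as the quadrature weight.
    h = 1.0 / n
    x = (np.arange(n) + 0.5) * h
    X, Y = np.meshgrid(x, x, indexing="ij")
    pts = np.column_stack([X.ravel(), Y.ravel()])
    r = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    np.fill_diagonal(r, 1.0)              # placeholder; diagonal reset below
    A = -np.log(r) / (2 * np.pi) * h * h
    np.fill_diagonal(A, 0.0)              # crude: drop the singular diagonal
    return np.eye(n * n) + A

results = []
for n in (8, 16, 32):
    A = second_kind_matrix(n)
    off = np.abs(A - np.diag(np.diag(A))).max()
    results.append((n * n, off, np.linalg.cond(A)))
    print(results[-1])  # off-diagonal shrinks with N; cond stays O(1)
```

Any $O(\varepsilon)$ absolute error committed against the $O(1)$ Schur-complement scale is thus $O(N\varepsilon)$ relative to the kernel scale, which is precisely what hifie2x's rescaled tolerances repair.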
Figure 5.2. Scaling results for Example 2, shown for rskelf2 (white), hifie2 (gray), and hifie2x (black) at fixed precision $\varepsilon$. Included also are reference scalings of $O(N)$, $O(N \log N)$, and $O(N^{3/2})$ (left); and $O(N)$ and $O(N \log N)$ (right). All other notation as in Figure 5.1.

Table 5.5. Factorization data ($\kappa$, $|s_L|$, $t_f$, $m_f$) for rskelf2, hifie2, and hifie2x on Example 3.

Table 5.6. Application/solve data ($t_{a/s}$, $e_a$, $e_s$, $n_i$) for rskelf2, hifie2, and hifie2x on Example 3.

Example 3
We then turn to the Lippmann-Schwinger equation
$$\sigma(x) + k^2 \int_\Omega K(\|x - y\|) \omega(y) \sigma(y) \, d\Omega(y) = f(x), \qquad x \in \Omega = (0,1)^2$$
for Helmholtz scattering, where $k = 2\pi\kappa$ is the frequency of the incoming wave with $\kappa$ the number of wavelengths in $\Omega$; $K(r) = (i/4) H_0^{(1)}(kr)$ is the fundamental solution of the associated Helmholtz equation satisfying the Sommerfeld radiation condition, where $i$ is the imaginary unit and $H_0^{(1)}(\cdot)$ is the zeroth order Hankel function of the first kind; and $\omega(x)$ is a continuous function representing the scatterer. We refer the interested reader to [15] for details. Assuming that $\omega(x) \geq 0$, this can be symmetrized by the change of variables $u(x) = \sqrt{\omega(x)}\, \sigma(x)$ as
$$u(x) + k\sqrt{\omega(x)} \int_\Omega K(\|x - y\|) \big[k\sqrt{\omega(y)}\big] u(y) \, d\Omega(y) = \sqrt{\omega(x)}\, f(x), \qquad x \in \Omega, \qquad (5.1)$$
i.e., (1.1) with $a(x) \equiv 1$ and $b(x) \equiv c(x) = k\sqrt{\omega(x)}$. We took a Gaussian bump $\omega(x) = \exp(-c\|x - x_0\|^2)$ for $x_0 = (1/2, 1/2)$ as the scatterer and discretized (5.1) using a uniform grid with quadratures as computed in Example 1. The frequency $k$ was increased with $n = \sqrt{N}$ to keep the number of DOFs per wavelength fixed at 32. Data for rskelf2, hifie2, and hifie2x with $\kappa = 8$, 16, and 32 at $\varepsilon = 10^{-6}$ are shown in Tables 5.5 and 5.6.

Overall, the results are similar to those in Example 2 but with added computational expense due to working over $\mathbb{C}$ and computing $H_0^{(1)}(kr)$. Moreover, although (5.1) is formally a second-kind IE, it becomes increasingly first-kind as $k \to \infty$. Thus, the problem is somewhat ill-conditioned, as reflected in the deterioration of $e_a$ and $e_s$ even for hifie2x. Nevertheless, $F^{-1}$ remains a very good preconditioner, with $n_i = O(1)$ for hifie2x. Interestingly, despite its inaccuracy, hifie2 is also quite effective for preconditioning: experimentally, we observe that $n_i = O(\log N)$, which can be justified as follows.

Lemma 5.1.
If $A = I + E$ with $\varepsilon_0 = \|E\| < 1$, then the number of iterations for GMRES to solve (1.6) to any target precision $\tilde{\varepsilon} > 0$ is
$$n_i \leq \frac{\log \tilde{\varepsilon}}{\log \varepsilon_0}.$$

Proof. Let $u_k$ be the $k$th iterate with residual $r_k = A u_k - f$. Then the relative residual satisfies
$$\frac{\|r_k\|}{\|f\|} \leq \min_{p \in \mathcal{P}_k} \|p(A)\|,$$
where $\mathcal{P}_k$ is the set of all polynomials $p$ of degree at most $k$ such that $p(0) = 1$. Take $p(z) = (1 - z)^k$. Then $\|p(A)\| \leq \|I - A\|^k = \|E\|^k = \varepsilon_0^k$, so $\|r_k\| / \|f\| \leq \varepsilon_0^k$. Setting the left-hand side equal to $\tilde{\varepsilon}$ yields $n_i \equiv k \leq \log \tilde{\varepsilon} / \log \varepsilon_0$. $\square$
Let $F = A + E$ and $F^{-1} = A^{-1} + G$ with $\|E\| \leq CN\varepsilon\|A\|$ and $\|G\| \leq CN\varepsilon\kappa(A)\|A^{-1}\|$ for some constant $C$ such that $CN\varepsilon\kappa(A) \ll 1$. Then the number of iterations for GMRES to solve (1.6) with preconditioner $F^{-1}$ is
$$n_i \sim \Big(1 + \log_{1/\varepsilon} CN\kappa(A)\Big) \frac{\log \tilde{\varepsilon}}{\log \varepsilon}.$$

Proof.
The preconditioned matrix is $F^{-1}A = F^{-1}(F - E) = I - F^{-1}E$, where
$$\|F^{-1}E\| \leq \big(\|A^{-1}\| + \|G\|\big)\|E\| \leq CN\varepsilon\kappa(A)\big(1 + CN\varepsilon\kappa(A)\big) \sim CN\varepsilon\kappa(A),$$
so Lemma 5.1 gives
$$n_i \sim \frac{\log \tilde{\varepsilon}}{\log CN\varepsilon\kappa(A)} = \frac{\log \tilde{\varepsilon}}{\log \varepsilon + \log CN\kappa(A)} = \bigg(1 + \frac{\log CN\kappa(A)}{\log \varepsilon}\bigg)^{-1} \frac{\log \tilde{\varepsilon}}{\log \varepsilon} = \Big(1 - \log_{1/\varepsilon} CN\kappa(A)\Big)^{-1} \frac{\log \tilde{\varepsilon}}{\log \varepsilon}.$$
But $CN\kappa(A) \ll 1/\varepsilon$, so $\log_{1/\varepsilon} CN\kappa(A) \ll 1$. The claim now follows by first-order expansion of the term in parentheses. $\square$
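Lemma 5.1's bound is easy to check numerically without even running GMRES: the simple Richardson iteration $x \leftarrow x + (f - Ax)$ has residual $r_k = (-E)^k f$, and GMRES, being residual-optimal over the same Krylov space, can only do better. A small sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps0, tol = 100, 0.1, 1e-12

# A = I + E with ||E|| = eps0 < 1, the setting of Lemma 5.1.
E = rng.standard_normal((n, n))
E *= eps0 / np.linalg.norm(E, 2)
A = np.eye(n) + E
f = rng.standard_normal(n)

# Richardson iteration: r_{k+1} = (I - A) r_k = -E r_k, so
# ||r_k|| / ||f|| <= eps0^k and n_i <= log(tol)/log(eps0).
x = np.zeros(n)
k = 0
while np.linalg.norm(f - A @ x) > tol * np.linalg.norm(f):
    x += f - A @ x
    k += 1
print(k)  # at most ceil(log(1e-12)/log(0.1)) = 12 iterations
```

In practice the count is often smaller than the bound, since $\|E^k f\|$ typically decays faster than $\|E\|^k \|f\|$.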
We remark that HIF-IE is effective only at low to moderate frequency since the rank structures employed break down as $k \to \infty$. In the limit, the only compression possible is due to Green's theorem, with HIF-IE reducing to RSF for volume IEs. The situation is yet worse for boundary IEs, for which no compression at all is available in general, and both RSF and HIF-IE revert to having $O(N^3)$ complexity.

We next present three examples in 3D: a boundary IE and two volume IEs as in Examples 1 and 2.

Table 5.7. Factorization data ($|s_L|$, $t_f$, $m_f$) for rskelf3, hifie3, and hifie3x on Example 4.

Table 5.8. Application/solve data ($t_{a/s}$, $e_a$, $e_s$) for rskelf3, hifie3, and hifie3x on Example 4.

Example 4
Consider the second-kind boundary IE (1.5) on the unit sphere $\Gamma = S^2$, where $G(r)$ is as defined in (1.4). It is possible to reparametrize $\Gamma$ in 2D and then use 2D algorithms, but we ran the full 3D solvers here. We represented $\Gamma$ as a collection of flat triangles and discretized via a centroid collocation scheme. Near-field interactions for all centroids within a local neighborhood of radius $h$ about each triangle, where $h$ is the average triangle diameter, were computed using fourth-order tensor-product Gauss-Legendre quadrature. This gives a linear system (1.6) with unsymmetric $A$. Data for rskelf3, hifie3, and hifie3x at $\varepsilon = 10^{-3}$ and $10^{-6}$ are shown in Tables 5.7 and 5.8 with scaling results in Figure 5.3.

Since $\Gamma$ is a 2D surface, $d = 2$ and RSF has $O(N^{3/2})$ complexity, as observed. However, the skeleton size is substantially larger than in 2D, so the corresponding costs are much higher. The same is true for HIF-IE, which achieves quasilinear complexity as predicted in Theorem 4.3 and Corollary 4.4. As before, $e_a, e_s = O(\varepsilon)$ for hifie3x but suffer for hifie3.

Figure 5.3. Scaling results for Example 4, shown for rskelf3 (white), hifie3 (gray), and hifie3x (black) at fixed precision $\varepsilon$; all other notation as in Figure 5.1.

We also tested the accuracy of our algorithms in solving the associated PDE (1.2) by constructing an interior harmonic field
$$v(x) = \sum_j G(\|x - y_j\|) q_j, \qquad x \in D$$
due to 16 random exterior sources $\{y_j\}$ with $\|y_j\| = 2$, where the "charge" strengths $q_j$ were drawn from the standard uniform distribution. This induces the boundary data $f(x) = v(x)|_{x \in \Gamma}$, which returns the charge density $\sigma(x)$ upon solving (1.5). The field $u(x)$ due to $\sigma(x)$ via the double-layer potential (1.3) is then, in principle, identical to $v(x)$ by uniqueness of the boundary value problem. This equality was assessed by evaluating both $u(x)$ and $v(x)$ at 16 random interior targets $\{z_j\}$ with $\|z_j\| = 1/2$. The relative error between $\{u(z_j)\}$ and $\{v(z_j)\}$ is shown in Table 5.9, from which we observe that rskelf3 and hifie3x are both able to solve the PDE up to the discretization or approximation error.

Table 5.9. Relative interior solution errors for rskelf3, hifie3, and hifie3x at each $\varepsilon$ and $N$ for Example 4.

Example 5
Now consider the 3D analogue of Example 1, i.e., (1.1) with a(x) ≡ b(x) ≡ 0, c(x) ≡ 1, K(r) = 1/(4πr), and Ω = (0, 1)^3, discretized over a uniform grid with adaptive quadratures for the diagonal entries. Data for rskelf3 and hifie3 at ε = 10^{-3} and 10^{-6} are given in Tables 5.10 and 5.11 with scaling results in Figure 5.4.

It is immediate that t_f = O(N^2) and t_a/s = O(N^{4/3}) for RSF, which considerably degrades its performance for large N. Indeed, we were unable to run rskelf3 for the largest N because of the excessive memory cost. In contrast, HIF-IE scales much better, though it does not quite achieve the O(N) complexity stated in Theorem 4.3: the empirical scaling for t_f, for instance, is slightly superlinear. We believe this to be a consequence of the large interaction ranks in 3D, which make the asymptotic regime rather difficult to reach. Still, even the experimental growth rate of k_ℓ ≃ O(ℓ) would be sufficient for theoretical O(N log N) complexity. In parallel with Example 1, e_a = O(ε) but e_s is somewhat larger due to ill-conditioning. We found F^{-1} to be a very effective preconditioner throughout.

TABLE 5.10. Factorization data for rskelf3 and hifie3: skeleton size |s_L|, factorization time t_f, and memory m_f at each N and precision ε. (Numeric data omitted.)

TABLE 5.11. Application/solve times t_a/s, errors e_a, e_s, and iteration counts n_i for rskelf3 and hifie3 at each N and precision ε. (Numeric data omitted.)

FIGURE 5.4. Scaling results for rskelf3 (white) and hifie3 (black) at precision ε = 10^{-3}. Dotted lines denote extrapolated values. Included also are reference scalings of O(N) and O(N^2) (left), and O(N) and O(N^{4/3}) (right); all other notation as in Figure 5.1.
TABLE 5.12. Factorization data for rskelf3, hifie3, and hifie3x: skeleton size |s_L|, factorization time t_f, and memory m_f at each N and precision ε. (Numeric data omitted.)

TABLE 5.13. Application/solve times t_a/s and errors e_a, e_s for rskelf3, hifie3, and hifie3x at each N and precision ε. (Numeric data omitted.)

FIGURE 5.5. Scaling results for rskelf3 (white), hifie3 (gray), and hifie3x (black) at precision ε = 10^{-3}. Included also are reference scalings of O(N), O(N log N), and O(N^2) (left), and O(N), O(N log N), and O(N^{4/3}) (right). All other notation as in Figure 5.4.

Example 6
Finally, we consider the 3D analogue of Example 2, i.e., Example 5 but with a(x) ≡ 1. This is a well-conditioned second-kind IE, which we factored using rskelf3, hifie3, and hifie3x. The data are summarized in Tables 5.12 and 5.13 with scaling results shown in Figure 5.5.

Algorithms rskelf3 and hifie3 behave very similarly as in Example 5, but with some error propagation for hifie3 as discussed in Section 4.4. Full accuracy is restored using hifie3x, but at the cost of significantly larger skeleton sizes. The empirical complexity of hifie3x hence suffers but remains quite favorable compared to that of rskelf3. We also find a good fit with the complexity estimates of Corollary 4.4, though the presumed penalty for not yet reaching the asymptotic regime may imply that the proposed bounds are overly pessimistic.

Conclusions

In this paper, we have introduced HIF-IE for the efficient factorization of discretized integral operators associated with elliptic PDEs in 2D and 3D. HIF-IE combines a novel matrix sparsification framework with recursive dimensional reduction to construct an approximate generalized LU decomposition at estimated quasilinear cost. The latter enables significant compression over RS and is critical for improving the asymptotic complexity, while the former substantially simplifies the algorithm and permits its formulation as a factorization. This representation allows the rapid application of both the matrix and its inverse, and therefore provides a generalized FMM, direct solver, or preconditioner, depending on the accuracy. We have also presented RSF, a factorization formulation of RS [25, 27, 39, 43] that is closely related to MF [19, 23] for sparse matrices. Indeed, a key observation underlying both RSF and HIF-IE is that structured dense matrices can be sparsified very efficiently via the ID.
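To make this observation concrete, the following sketch computes an ID of a smooth kernel interaction block via column-pivoted QR (one standard route; the paper's own MATLAB codes are separate, and all sizes and tolerances here are illustrative): a handful of "skeleton" columns, plus an interpolation matrix, reproduce the entire block to the requested precision.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

# Off-diagonal blocks of the 2D Laplace kernel between well-separated
# point sets are numerically low-rank; an ID exposes this explicitly.
rng = np.random.default_rng(0)
src = rng.random((500, 2))            # sources in the unit square
trg = rng.random((500, 2)) + 3.0      # well-separated targets
A = -np.log(np.linalg.norm(trg[:, None] - src[None, :], axis=-1))

eps = 1e-9
Q, R, perm = qr(A, mode="economic", pivoting=True)
# numerical rank: pivoted QR makes |diag(R)| non-increasing
k = int(np.sum(np.abs(np.diag(R)) > eps * np.abs(R[0, 0])))

# interpolation matrix: redundant columns in terms of skeleton columns
T = solve_triangular(R[:k, :k], R[:k, k:])
skel = perm[:k]                       # retained (skeleton) column indices

# reconstruction: A[:, perm] ~= A[:, skel] @ [I | T]
P = np.hstack([np.eye(k), T])
err = np.linalg.norm(A[:, perm] - A[:, skel] @ P) / np.linalg.norm(A)
print(k, err)                         # k << 500, err on the order of eps
```

The point is that the compressed block is expressed through a subset of its own columns, which is what allows skeletonization to eliminate the redundant degrees of freedom in place.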
This suggests that well-developed sparse techniques can be applied, and we anticipate that fully exploring this implication will lead to new fast algorithms for dense linear algebra.

The skeletonization operator at the core of RSF and HIF-IE can be interpreted in several ways. For example, we can view it as an approximate local change of basis in order to gain sparsity. Unlike traditional approaches [1, 7, 17], however, this basis is determined optimally on the fly using the ID. Skeletonization can also be regarded as adaptive numerical upscaling or as implementing specialized restriction and prolongation operators in the context of multigrid methods [33].

Although we have presently only considered matrices arising from IEs, the same methods can also be applied (with minor modification) to various general structured matrices such as those encountered in Gaussian process modeling [3, 12] or sparse differential formulations of PDEs [6, 24, 52]. In particular, HIF-IE can be heavily specialized to the latter setting by explicitly taking advantage of existing sparsity. The resulting hierarchical interpolative factorization for differential equations (HIF-DE) is described in the companion paper [41] and likewise achieves estimated linear or quasilinear complexity in 2D and 3D.

Some important directions for future research include:

• Obtaining analytical estimates of the interaction rank for SCIs, even for the simple case of the Laplace kernel (1.4). This would enable a much more precise understanding of the complexity of HIF-IE, which has yet to be rigorously established.
• Parallelizing RSF and HIF-IE, both of which are organized according to a tree structure where each node at a given level can be processed independently of the rest. The parallelization of HIF-IE holds particular promise and should have significant impact on practical scientific computing.

• Investigating alternative strategies for reducing skeleton sizes in 3D, which can still be quite large, especially at high precision. New ideas may be required to build truly large-scale direct solvers.

• Understanding the extent to which our current techniques can be adapted to highly oscillatory kernels, which possess rank structures of a different type than that exploited here [20, 21]. Such high-frequency problems can be extremely difficult to solve by iteration and present a prime target area for future fast direct methods.
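The level-by-level concurrency noted in the parallelization item can be sketched as follows (a toy scheduling skeleton, not the paper's implementation; `skeletonize_cell` is a hypothetical stand-in for the per-cell ID-and-eliminate work):

```python
from concurrent.futures import ThreadPoolExecutor

def skeletonize_cell(cell):
    # placeholder for the per-cell work: compress the cell's interactions
    # via the ID, then eliminate its redundant degrees of freedom
    return f"factored {cell}"

def factor(tree_levels):
    # levels are processed sequentially from leaves to root, but every
    # cell within one level is independent and can run in parallel
    results = []
    with ThreadPoolExecutor() as pool:
        for cells in tree_levels:
            results.extend(pool.map(skeletonize_cell, cells))
    return results

# a tiny 3-level "tree": 8 leaf cells, then 4 cells, then the root
levels = [[f"L0-c{i}" for i in range(8)],
          [f"L1-c{i}" for i in range(4)],
          ["root"]]
print(factor(levels))
```

Only the level-to-level ordering is sequential, which is why a tree-structured factorization of this kind is a natural candidate for parallel implementation.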
Acknowledgment.
We would like to thank Leslie Greengard for many helpful discussions, Lenya Ryzhik for providing computing resources, and the anonymous referees for their careful reading of the manuscript, which has improved the paper tremendously. K.L.H. was partially supported by the National Science Foundation under award DMS-1203554. L.Y. was partially supported by the National Science Foundation under award DMS-1328230 and the U.S. Department of Energy's Advanced Scientific Computing Research program under award DE-FC02-13ER26134/DE-SC0009409.
Bibliography

[1] Alpert, B.; Beylkin, G.; Coifman, R.; Rokhlin, V. Wavelet-like bases for the fast solution of second-kind integral equations. SIAM J. Sci. Comput. (1993), no. 1, 159–184.
[2] Ambikasaran, S.; Darve, E. An O(N log N) fast direct solver for partial hierarchically semiseparable matrices. J. Sci. Comput. (2013), 477–501.
[3] Ambikasaran, S.; Foreman-Mackey, D.; Greengard, L.; Hogg, D. W.; O'Neil, M. Fast direct methods for Gaussian processes and analysis of NASA Kepler mission data. Preprint, arXiv:1403.6015 [math.NA].
[4] Aurenhammer, F. Voronoi diagrams — A survey of a fundamental geometric data structure. ACM Comput. Surv. (1991), no. 3, 345–405.
[5] Barnes, J.; Hut, P. A hierarchical O(N log N) force-calculation algorithm. Nature (1986), no. 4, 446–449.
[6] Bebendorf, M.; Hackbusch, W. Existence of H-matrix approximants to the inverse FE-matrix of elliptic operators with L∞-coefficients. Numer. Math. (2003), 1–28.
[7] Beylkin, G.; Coifman, R.; Rokhlin, V. Fast wavelet transforms and numerical algorithms I. Comm. Pure Appl. Math. (1991), 141–183.
[8] Bremer, J. A fast direct solver for the integral equations of scattering theory on planar curves with corners. J. Comput. Phys. (2012), 1879–1899.
[9] Brigham, E. O. The Fast Fourier Transform and Its Applications. Prentice Hall, Englewood Cliffs, 1988.
[10] Chandrasekaran, S.; Dewilde, P.; Gu, M.; Lyons, W.; Pals, T. A fast solver for HSS representations via sparse matrices. SIAM J. Matrix Anal. Appl. (2006), no. 1, 67–81.
[11] Chandrasekaran, S.; Gu, M.; Pals, T. A fast ULV decomposition solver for hierarchically semiseparable representations. SIAM J. Matrix Anal. Appl. (2006), no. 3, 603–622.
[12] Chen, J.; Wang, L.; Anitescu, M. A fast summation tree code for the Matérn kernel. SIAM J. Sci. Comput. (2014), no. 1, A289–A309.
[13] Chen, Y. A fast, direct algorithm for the Lippmann-Schwinger integral equation in two dimensions. Adv. Comput. Math. (2002), 175–190.
[14] Cheng, H.; Gimbutas, Z.; Martinsson, P. G.; Rokhlin, V. On the compression of low rank matrices. SIAM J. Sci. Comput. (2005), no. 4, 1389–1404.
[15] Colton, D.; Kress, R. Inverse Acoustic and Electromagnetic Scattering Theory. Applied Mathematical Sciences, vol. 93. Springer-Verlag, Berlin, 1992.
[16] Corona, E.; Martinsson, P.-G.; Zorin, D. An O(N) direct solver for integral equations on the plane. Appl. Comput. Harmon. Anal. (2015), 284–317.
[17] Dahmen, W. Wavelet and multiscale methods for operator equations. Acta Numer. (1997), 55–228.
[18] Dixon, J. D. Estimating extremal eigenvalues and condition numbers of matrices. SIAM J. Numer. Anal. (1983), no. 4, 812–814.
[19] Duff, I. S.; Reid, J. K. The multifrontal solution of indefinite sparse symmetric linear systems. ACM Trans. Math. Software (1983), no. 3, 302–325.
[20] Engquist, B.; Ying, L. A fast directional algorithm for high frequency acoustic scattering in two dimensions. Comm. Math. Sci. (2009), no. 2, 327–345.
[21] Engquist, B.; Ying, L. Fast directional multilevel algorithms for oscillatory kernels. SIAM J. Sci. Comput. (2007), no. 4, 1710–1737.
[22] Fong, W.; Darve, E. The black-box fast multipole method. J. Comput. Phys. (2009), 8712–8725.
[23] George, A. Nested dissection of a regular finite element mesh. SIAM J. Numer. Anal. (1973), no. 2, 345–363.
[24] Gillman, A.; Martinsson, P. G. An O(N) algorithm for constructing the solution operator to 2D elliptic boundary value problems in the absence of body loads. Adv. Comput. Math. (2014), 773–796.
[25] Gillman, A.; Young, P. M.; Martinsson, P.-G. A direct solver with O(N) complexity for integral equations on one-dimensional domains. Front. Math. China (2012), no. 2, 217–247.
[26] Golub, G. H.; Van Loan, C. F. Matrix Computations, 3rd ed. Johns Hopkins University Press, Baltimore, 1996.
[27] Greengard, L.; Gueyffier, D.; Martinsson, P.-G.; Rokhlin, V. Fast direct solvers for integral equations in complex three-dimensional domains. Acta Numer. (2009), 243–275.
[28] Greengard, L.; Rokhlin, V. A fast algorithm for particle simulations. J. Comput. Phys. (1987), 325–348.
[29] Greengard, L.; Rokhlin, V. A new version of the Fast Multipole Method for the Laplace equation in three dimensions. Acta Numer. (1997), 229–269.
[30] Gu, M.; Eisenstat, S. C. Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM J. Sci. Comput. (1996), no. 4, 848–869.
[31] Guenther, R. B.; Lee, J. W. Partial Differential Equations of Mathematical Physics and Integral Equations. Prentice Hall, Englewood Cliffs, 1988.
[32] Hackbusch, W. A sparse matrix arithmetic based on H-matrices. Part I: Introduction to H-matrices. Computing (1999), 89–108.
[33] Hackbusch, W. Multi-Grid Methods and Applications. Springer, Berlin, 1985.
[34] Hackbusch, W.; Börm, S. Data-sparse approximation by adaptive H^2-matrices. Computing (2002), 1–35.
[35] Hackbusch, W.; Khoromskij, B. N. A sparse H-matrix arithmetic. Part II: Application to multi-dimensional problems. Computing (2000), 21–47.
[36] Hackbusch, W.; Nowak, Z. P. On the fast matrix multiplication in the boundary element method by panel clustering. Numer. Math. (1989), 463–491.
[37] Halko, N.; Martinsson, P. G.; Tropp, J. A. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. (2011), no. 2, 217–288.
[38] Hestenes, M. R.; Stiefel, E. Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Stand. (1952), no. 6, 409–436.
[39] Ho, K. L.; Greengard, L. A fast direct solver for structured linear systems by recursive skeletonization. SIAM J. Sci. Comput. (2012), no. 5, A2507–A2532.
[40] Ho, K. L.; Greengard, L. A fast semidirect least squares algorithm for hierarchically block separable matrices. SIAM J. Matrix Anal. Appl. (2014), no. 2, 725–748.
[41] Ho, K. L.; Ying, L. Hierarchical interpolative factorization for elliptic operators: differential equations. Submitted to Comm. Pure Appl. Math.
[42] Kuczyński, J.; Woźniakowski, H. Estimating the largest eigenvalue by the power and Lanczos algorithms with a random start. SIAM J. Matrix Anal. Appl. (1992), no. 4, 1094–1122.
[43] Martinsson, P. G.; Rokhlin, V. A fast direct solver for boundary integral equations in two dimensions. J. Comput. Phys. (2005), 1–23.
[44] Martinsson, P. G.; Rokhlin, V. An accelerated kernel-independent fast multipole method in one dimension. SIAM J. Sci. Comput. (2007), no. 3, 1160–1178.
[45] Martinsson, P.-G.; Rokhlin, V.; Tygert, M. On interpolation and integration in finite-dimensional spaces of bounded functions. Commun. Appl. Math. Comput. Sci. (2006), no. 1, 133–142.
[46] Pan, X.-M.; Wei, J.-G.; Peng, Z.; Sheng, X.-Q. A fast algorithm for multiscale electromagnetic problems using interpolative decomposition and multilevel fast multipole algorithm. Radio Sci. (2012), RS1011.
[47] Saad, Y.; Schultz, M. H. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. (1986), no. 3, 856–869.
[48] Samet, H. The quadtree and related hierarchical data structures. ACM Comput. Surv. (1984), no. 2, 187–260.
[49] van der Vorst, H. A. Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. (1992), no. 2, 631–644.
[50] Xia, J. Efficient structured multifrontal factorization for general large sparse matrices. SIAM J. Sci. Comput. (2013), no. 2, A832–A860.
[51] Xia, J.; Chandrasekaran, S.; Gu, M.; Li, X. S. Fast algorithms for hierarchically semiseparable matrices. Numer. Linear Algebra Appl. (2010), 953–976.
[52] Xia, J.; Chandrasekaran, S.; Gu, M.; Li, X. S. Superfast multifrontal method for large structured linear systems of equations. SIAM J. Matrix Anal. Appl. (2009), no. 3, 1382–1411.
[53] Xia, J.; Xi, Y.; Gu, M. A superfast structured solver for Toeplitz linear systems via randomized sampling. SIAM J. Matrix Anal. Appl. (2012), no. 3, 837–858.
[54] Ying, L.; Biros, G.; Zorin, D. A kernel-independent adaptive fast multipole algorithm in two and three dimensions. J. Comput. Phys. (2004), 591–626.