Review of Cyclic Reduction for Parallel Solution of Hermitian Positive Definite Block-Tridiagonal Linear Systems
Martin Neuenhofen

July 3, 2018
Abstract
Cyclic reduction is a method for the solution of (block-)tridiagonal linear systems. In this note we review the method, tailored to Hermitian positive definite banded linear systems. The reviewed method has the following advantages: it is numerically stable without pivoting; it is suitable for parallel computations; and, in the presented form, it uses fewer computations by exploiting symmetry. Like Cholesky's method, the reviewed method breaks down when the matrix is not positive definite, offering a robust way of determining positive definiteness.
Brief Summary
Equations (9) give formulas to split the block-tridiagonal linear system (1) into two separate block-tridiagonal systems of half the dimension. This can be used for solving the system in a parallel divide-and-conquer approach. The resulting method is described in Algorithm 1.
We skip the introductory section and literature review on cyclic reduction methods and move directly to the problem. We consider the linear system
\[
\underbrace{\begin{pmatrix}
A_1 & B_1^{\mathsf H} & & & \\
B_1 & A_2 & B_2^{\mathsf H} & & \\
 & B_2 & A_3 & \ddots & \\
 & & \ddots & \ddots & B_{N-1}^{\mathsf H} \\
 & & & B_{N-1} & A_N
\end{pmatrix}}_{=:\,\mathbf A}
\cdot
\underbrace{\begin{pmatrix} \mathbf x_1 \\ \mathbf x_2 \\ \vdots \\ \mathbf x_N \end{pmatrix}}_{=:\,\mathbf x}
=
\underbrace{\begin{pmatrix} \mathbf y_1 \\ \mathbf y_2 \\ \vdots \\ \mathbf y_N \end{pmatrix}}_{=:\,\mathbf y}, \tag{1}
\]
where the matrices
\begin{align}
A_j &\in \mathbb C^{m\times m} &&\forall\, j = 1,\dots,N, \tag{2a}\\
B_j &\in \mathbb C^{m\times m} &&\forall\, j = 1,\dots,(N-1), \tag{2b}\\
\mathbf y_j &\in \mathbb C^{m\times k} &&\forall\, j = 1,\dots,N \tag{2c}
\end{align}
are given. Here $m, k \in \mathbb N$ are integers that shall be much smaller than $N \in \mathbb N$. The matrix $\mathbf A \in \mathbb C^{(N\cdot m)\times(N\cdot m)}$ shall be positive definite. Sought are the matrices
\[
\mathbf x_j \in \mathbb C^{m\times k} \quad \forall\, j = 1,\dots,N. \tag{3}
\]
We assume $N \in 2\cdot\mathbb N$, i.e., $N$ is even.

Derivation of the Method
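To fix ideas before the derivation, the following sketch assembles a small instance of system (1) with NumPy. The concrete sizes, the random block contents, and the diagonal shift $25\cdot I$ (used only to force positive definiteness) are illustrative assumptions, not part of the method.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, k = 6, 3, 2  # N even; block size m and number of right-hand sides k are small vs. N

# Off-diagonal blocks B_1, ..., B_{N-1}: arbitrary complex matrices
B = [rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)) for _ in range(N - 1)]

# Diagonal blocks A_1, ..., A_N: Hermitian, shifted so that the whole matrix is positive definite
A_diag = []
for _ in range(N):
    H = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    A_diag.append(H @ H.conj().T + 25.0 * np.eye(m))

# Assemble the dense (N*m) x (N*m) matrix A of system (1)
A = np.zeros((N * m, N * m), dtype=complex)
for j in range(N):
    A[j*m:(j+1)*m, j*m:(j+1)*m] = A_diag[j]
for j in range(N - 1):
    A[(j+1)*m:(j+2)*m, j*m:(j+1)*m] = B[j]            # subdiagonal block B_j
    A[j*m:(j+1)*m, (j+1)*m:(j+2)*m] = B[j].conj().T   # superdiagonal block B_j^H

assert np.allclose(A, A.conj().T)          # A is Hermitian
assert np.linalg.eigvalsh(A).min() > 0.0   # and positive definite
```

This dense assembly is only for illustration and cross-checking; the method itself works on the blocks $A_j$, $B_j$ directly.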
The idea of cyclic reduction is based on a special reordering. Putting the block rows and columns of $\mathbf A$ in the order
\[
1,\,3,\,5,\,7,\dots,(N-1),\ 2,\,4,\,6,\,8,\dots,N, \tag{4}
\]
we obtain the rearranged matrix
\[
\begin{pmatrix}
A_1 & & & & B_1^{\mathsf H} & & & \\
& A_3 & & & B_2 & B_3^{\mathsf H} & & \\
& & \ddots & & & \ddots & \ddots & \\
& & & A_{N-1} & & & B_{N-2} & B_{N-1}^{\mathsf H} \\
B_1 & B_2^{\mathsf H} & & & A_2 & & & \\
& B_3 & B_4^{\mathsf H} & & & A_4 & & \\
& & \ddots & \ddots & & & \ddots & \\
& & & B_{N-1} & & & & A_N
\end{pmatrix}.
\]
The four sub-blocks in the above matrix we denote by $\mathbf D_1, \mathbf D_2, \mathbf C \in \mathbb C^{(N/2\cdot m)\times(N/2\cdot m)}$. I.e., for the above matrix we use the compact writing
\[
\begin{pmatrix} \mathbf D_1 & \mathbf C^{\mathsf H} \\ \mathbf C & \mathbf D_2 \end{pmatrix}.
\]
These three matrices $\mathbf D_1, \mathbf D_2, \mathbf C$ can be used to construct two separate linear systems for the solution components $\mathbf x_j$. As a benefit, the two separated linear systems have only half the dimension of the original system. In particular, one of the two systems involves only the components $\mathbf x_j$ of odd indices $j = 1, 3, 5, \dots, (N-1)$. We clarify this in the following.

To be able to write the systems in compact form, we use the notation of odd and even vectors
\[
\mathbf x_o = \begin{pmatrix} \mathbf x_1 \\ \mathbf x_3 \\ \vdots \\ \mathbf x_{N-1} \end{pmatrix},\quad
\mathbf x_e = \begin{pmatrix} \mathbf x_2 \\ \mathbf x_4 \\ \vdots \\ \mathbf x_N \end{pmatrix},\quad
\mathbf y_o = \begin{pmatrix} \mathbf y_1 \\ \mathbf y_3 \\ \vdots \\ \mathbf y_{N-1} \end{pmatrix},\quad
\mathbf y_e = \begin{pmatrix} \mathbf y_2 \\ \mathbf y_4 \\ \vdots \\ \mathbf y_N \end{pmatrix}.
\]
The decoupled linear systems for $\mathbf x_o$ and $\mathbf x_e$ are:

Odd system:
\[
\underbrace{(\mathbf D_1 - \mathbf C^{\mathsf H}\cdot \mathbf D_2^{-1}\cdot \mathbf C)}_{=:\,\mathbf U}\cdot \mathbf x_o
= \underbrace{\mathbf y_o - \mathbf C^{\mathsf H}\cdot \mathbf D_2^{-1}\cdot \mathbf y_e}_{=:\,\mathbf u} \tag{5}
\]

Even system:
\[
\underbrace{(\mathbf D_2 - \mathbf C\cdot \mathbf D_1^{-1}\cdot \mathbf C^{\mathsf H})}_{=:\,\mathbf V}\cdot \mathbf x_e
= \underbrace{\mathbf y_e - \mathbf C\cdot \mathbf D_1^{-1}\cdot \mathbf y_o}_{=:\,\mathbf v} \tag{6}
\]

The odd system has the system matrix $\mathbf U$ and right-hand side $\mathbf u$. The even system has $\mathbf V$ and $\mathbf v$. In the following we give formulas for these matrices and vectors.

Formulas for the Odd and Even System
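Before the blockwise formulas are stated, the reordering (4) and the odd system (5) can be checked numerically with dense matrices. In the sketch below all concrete sizes and the diagonal shift are illustrative assumptions; the permutation and Schur-complement steps follow the text.

```python
import numpy as np

rng = np.random.default_rng(1)
N, m, k = 6, 3, 2

# Assemble a Hermitian positive definite block-tridiagonal A and right-hand side y, as in (1)
Bs = [rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)) for _ in range(N - 1)]
A = np.zeros((N * m, N * m), dtype=complex)
for j in range(N):
    H = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    A[j*m:(j+1)*m, j*m:(j+1)*m] = H @ H.conj().T + 25.0 * np.eye(m)
for j in range(N - 1):
    A[(j+1)*m:(j+2)*m, j*m:(j+1)*m] = Bs[j]
    A[j*m:(j+1)*m, (j+1)*m:(j+2)*m] = Bs[j].conj().T
y = rng.standard_normal((N * m, k))

# Block permutation (4): odd block indices 1,3,...,N-1 first, then even 2,4,...,N
order = list(range(0, N, 2)) + list(range(1, N, 2))   # 0-based block indices
perm = np.concatenate([np.arange(b * m, (b + 1) * m) for b in order])
Ap = A[np.ix_(perm, perm)]                            # equals [[D1, C^H], [C, D2]]
h = (N // 2) * m
D1, CH = Ap[:h, :h], Ap[:h, h:]
C, D2 = Ap[h:, :h], Ap[h:, h:]
yo, ye = y[perm][:h], y[perm][h:]

# Odd system (5): (D1 - C^H D2^{-1} C) x_o = y_o - C^H D2^{-1} y_e
U = D1 - CH @ np.linalg.solve(D2, C)
u = yo - CH @ np.linalg.solve(D2, ye)
xo = np.linalg.solve(U, u)

x = np.linalg.solve(A, y)            # reference solution of the full system (1)
assert np.allclose(xo, x[perm][:h])  # the odd-indexed solution blocks are recovered
```

The even system (6) is verified in the same way, with the roles of $\mathbf D_1$ and $\mathbf D_2$ exchanged.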
We now give explicit formulas for the matrices $\mathbf U, \mathbf V \in \mathbb C^{(N/2\cdot m)\times(N/2\cdot m)}$ and the right-hand sides $\mathbf u, \mathbf v \in \mathbb C^{(N/2\cdot m)\times k}$ that build the odd and the even system. The matrices and vectors have the block structure below:
\[
\mathbf U = \begin{pmatrix}
U_1 & E_1^{\mathsf H} & & \\
E_1 & U_2 & \ddots & \\
& \ddots & \ddots & E_{N/2-1}^{\mathsf H} \\
& & E_{N/2-1} & U_{N/2}
\end{pmatrix},\quad
\mathbf u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_{N/2} \end{pmatrix}; \tag{7a}
\]
\[
\mathbf V = \begin{pmatrix}
V_1 & F_1^{\mathsf H} & & \\
F_1 & V_2 & \ddots & \\
& \ddots & \ddots & F_{N/2-1}^{\mathsf H} \\
& & F_{N/2-1} & V_{N/2}
\end{pmatrix},\quad
\mathbf v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_{N/2} \end{pmatrix}. \tag{7b}
\]
Here appear the inner matrices
\begin{align}
U_j, V_j &\in \mathbb C^{m\times m} &&\forall\, j = 1,\dots,(N/2), \tag{8a}\\
E_j, F_j &\in \mathbb C^{m\times m} &&\forall\, j = 1,\dots,(N/2-1), \tag{8b}\\
u_j, v_j &\in \mathbb C^{m\times k} &&\forall\, j = 1,\dots,(N/2). \tag{8c}
\end{align}
The formulas for these inner matrices can be found by insertion. We present them below:
\begin{align}
U_j &= A_{2j-1} - B_{2j-2}\cdot A_{2j-2}^{-1}\cdot B_{2j-2}^{\mathsf H} - B_{2j-1}^{\mathsf H}\cdot A_{2j}^{-1}\cdot B_{2j-1} \tag{9a}\\
E_j &= -B_{2j}\cdot A_{2j}^{-1}\cdot B_{2j-1} \tag{9b}\\
u_j &= \mathbf y_{2j-1} - B_{2j-2}\cdot A_{2j-2}^{-1}\cdot \mathbf y_{2j-2} - B_{2j-1}^{\mathsf H}\cdot A_{2j}^{-1}\cdot \mathbf y_{2j} \tag{9c}\\
V_j &= A_{2j} - B_{2j-1}\cdot A_{2j-1}^{-1}\cdot B_{2j-1}^{\mathsf H} - B_{2j}^{\mathsf H}\cdot A_{2j+1}^{-1}\cdot B_{2j} \tag{9d}\\
F_j &= -B_{2j+1}\cdot A_{2j+1}^{-1}\cdot B_{2j} \tag{9e}\\
v_j &= \mathbf y_{2j} - B_{2j-1}\cdot A_{2j-1}^{-1}\cdot \mathbf y_{2j-1} - B_{2j}^{\mathsf H}\cdot A_{2j+1}^{-1}\cdot \mathbf y_{2j+1} \tag{9f}
\end{align}
The matrices on the left can be computed in parallel for $j = 1, 2, 3, \dots, (N/2)$.

It is important to note that in the above we have, for the sake of compact formulas, used identical equations for all $j$. However, for $j = 1$ for instance, the formula for $U_1$ accesses the matrix $B_0$, which does not exist. To repair this detail, we define the following auxiliary matrices that appear in the above formulas:
\[
B_0 := \mathbf 0 \in \mathbb C^{m\times m},\quad B_N := \mathbf 0 \in \mathbb C^{m\times m},
\]
\[
A_0 := \mathbf I \in \mathbb C^{m\times m},\quad A_{N+1} := \mathbf I \in \mathbb C^{m\times m}.
\]
Correspondingly, in (9c) and (9f) the vectors $\mathbf y_0 := \mathbf 0$ and $\mathbf y_{N+1} := \mathbf 0$ are understood; since $B_0$ and $B_N$ are zero, their values never contribute.

Formulation of the Method
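As a sanity check on the formulas (9) and the auxiliary-matrix convention, one can compare the blockwise results against the dense Schur complement $\mathbf D_1 - \mathbf C^{\mathsf H}\mathbf D_2^{-1}\mathbf C$. The NumPy sketch below does this for $U_1$ and $E_1$; all sizes and the diagonal shift are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, m = 6, 3

# Blocks, stored in dicts with 1-based indices as in the text
Bs = {j: rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)) for j in range(1, N)}
As = {}
for j in range(1, N + 1):
    H = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    As[j] = H @ H.conj().T + 25.0 * np.eye(m)

# Auxiliary matrices: B_0 = B_N = 0 and A_0 = A_{N+1} = I
Bs[0] = np.zeros((m, m)); Bs[N] = np.zeros((m, m))
As[0] = np.eye(m);        As[N + 1] = np.eye(m)
inv, Hc = np.linalg.inv, (lambda M: M.conj().T)

# Formulas (9a) and (9b) for the blocks of the odd system matrix U
U = {j: As[2*j-1] - Bs[2*j-2] @ inv(As[2*j-2]) @ Hc(Bs[2*j-2])
        - Hc(Bs[2*j-1]) @ inv(As[2*j]) @ Bs[2*j-1] for j in range(1, N//2 + 1)}
E = {j: -Bs[2*j] @ inv(As[2*j]) @ Bs[2*j-1] for j in range(1, N//2)}

# Dense cross-check: Schur complement of the even part in the reordered matrix
A = np.zeros((N * m, N * m), dtype=complex)
for j in range(1, N + 1):
    A[(j-1)*m:j*m, (j-1)*m:j*m] = As[j]
for j in range(1, N):
    A[j*m:(j+1)*m, (j-1)*m:j*m] = Bs[j]
    A[(j-1)*m:j*m, j*m:(j+1)*m] = Hc(Bs[j])
odd = np.concatenate([np.arange((j-1)*m, j*m) for j in range(1, N + 1, 2)])
even = np.concatenate([np.arange((j-1)*m, j*m) for j in range(2, N + 1, 2)])
D1, D2 = A[np.ix_(odd, odd)], A[np.ix_(even, even)]
C = A[np.ix_(even, odd)]
U_dense = D1 - Hc(C) @ np.linalg.solve(D2, C)

assert np.allclose(U_dense[:m, :m], U[1])      # diagonal block U_1, formula (9a)
assert np.allclose(U_dense[m:2*m, :m], E[1])   # subdiagonal block E_1, formula (9b)
```

The blocks $V_j$, $F_j$ and the right-hand sides follow the same pattern with the roles of $\mathbf D_1$ and $\mathbf D_2$ exchanged.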
With the above ingredients, we are able to describe how cyclic reduction is used to solve the linear system (1). The method consists of three phases.

In the first phase, given the data (2), the method uses the formulas in (9) to compute the block matrices (8) for the odd system (5) and the even system (6).

In the second phase, cyclic reduction is used recursively to solve the odd and the even system for their respective solutions $\mathbf x_o, \mathbf x_e$. I.e., we consider (5) and (6) as instances of (1). This works for the following reason: we assumed that $\mathbf A$ is positive definite. If the assumption holds, then $\mathbf U, \mathbf V$ are positive definite as well, because they are Schur complements of $\mathbf A$.

In the third phase, we rearrange the vectors $\mathbf x_o, \mathbf x_e$ into the global solution vector $\mathbf x$ of (1). This completes the method.

Determination of Positive Definiteness
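The three phases can be sketched as a sequential recursion; the Cholesky call in its base case is also where indefiniteness of $\mathbf A$ would surface, which is the topic of this section. The sketch below is illustrative: it assumes $N$ is a power of two (so every recursion level has even length), does not distribute work across nodes, and uses explicit inverses for brevity rather than the stable Cholesky-based substitutions discussed later.

```python
import numpy as np

def cyclic_reduction_solve(A_blocks, B_blocks, y_blocks):
    """Solve system (1) given lists [A_1..A_N], [B_1..B_{N-1}], [y_1..y_N].
    N must be a power of two so that every recursion level is even."""
    N, m = len(A_blocks), A_blocks[0].shape[0]
    if N == 1:  # base case: Cholesky solve; raises LinAlgError if not positive definite
        L = np.linalg.cholesky(A_blocks[0])
        return [np.linalg.solve(L.conj().T, np.linalg.solve(L, y_blocks[0]))]
    Z, I = np.zeros((m, m)), np.eye(m)
    Bp = [Z] + list(B_blocks) + [Z]      # auxiliary B_0 = B_N = 0
    Ap = [I] + list(A_blocks) + [I]      # auxiliary A_0 = A_{N+1} = I
    yp = [np.zeros_like(y_blocks[0])] + list(y_blocks) + [np.zeros_like(y_blocks[0])]
    inv, H = np.linalg.inv, (lambda M: M.conj().T)
    # Phase 1: formulas (9) build the odd system (U, E, u) and the even system (V, F, v)
    U = [Ap[2*j-1] - Bp[2*j-2] @ inv(Ap[2*j-2]) @ H(Bp[2*j-2])
         - H(Bp[2*j-1]) @ inv(Ap[2*j]) @ Bp[2*j-1] for j in range(1, N//2 + 1)]
    E = [-Bp[2*j] @ inv(Ap[2*j]) @ Bp[2*j-1] for j in range(1, N//2)]
    u = [yp[2*j-1] - Bp[2*j-2] @ inv(Ap[2*j-2]) @ yp[2*j-2]
         - H(Bp[2*j-1]) @ inv(Ap[2*j]) @ yp[2*j] for j in range(1, N//2 + 1)]
    V = [Ap[2*j] - Bp[2*j-1] @ inv(Ap[2*j-1]) @ H(Bp[2*j-1])
         - H(Bp[2*j]) @ inv(Ap[2*j+1]) @ Bp[2*j] for j in range(1, N//2 + 1)]
    F = [-Bp[2*j+1] @ inv(Ap[2*j+1]) @ Bp[2*j] for j in range(1, N//2)]
    v = [yp[2*j] - Bp[2*j-1] @ inv(Ap[2*j-1]) @ yp[2*j-1]
         - H(Bp[2*j]) @ inv(Ap[2*j+1]) @ yp[2*j+1] for j in range(1, N//2 + 1)]
    # Phase 2: recurse on the half-sized systems (5) and (6)
    xo = cyclic_reduction_solve(U, E, u)
    xe = cyclic_reduction_solve(V, F, v)
    # Phase 3: reassemble x from x_o and x_e
    x = [None] * N
    x[0::2], x[1::2] = xo, xe
    return x
```

In a real implementation, the lists `U, E, u` and `V, F, v` would be produced by disjoint groups of nodes, as in Algorithm 1.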
If the matrix $\mathbf A$ is indeed positive definite, then all recursive calls of cyclic reduction will succeed, until at the bottom there arise positive definite linear systems of size $m\times m$. These shall be solved with Cholesky's method.

If, in contrast to the assumption, the matrix $\mathbf A$ is not positive definite, then eventually at least one of the matrices $\mathbf U, \mathbf V$ will not be well-defined, and thus the method will break down. This breakdown appears in the form that Cholesky's method determines indefiniteness for one of the matrices $A_j$ in (9).

Computing Model
For our description of the parallel method, we consider a parallel computing system that consists of $N$ computing nodes; each has an index, counted as $1, 2, 3, \dots, N$. The nodes are connected in a communication network that consists of cables. The length of each cable equals the distance of the respective nodes in space. Information travels along these cables at a limited speed.

Idea of the Parallel Method
The idea for a parallel cyclic reduction consists of two ingredients:

1. Parallel instantiation of odd and even systems: The equations (9) are evaluated in parallel over $j = 1, \dots, (N/2)$. If $N$ parallel computing units are given, then the nodes $1, 2, \dots, N/2$ can compute the matrices for the odd system, while the nodes $N/2+1, \dots, N$ can compute the matrices for the even system.

2. Parallel recursive solution of the split systems: The linear systems (5) and (6) are solved recursively for $\mathbf x_o$ and $\mathbf x_e$. To this end, the same method as for (1) can be used.

A pseudo-code for this procedure is given in Algorithm 1. The algorithm assumes $N \in 2\cdot\mathbb N$.

Time-Complexity and Communication
The pure time-complexity of parallel computations is in $\mathcal O(\log(N)\cdot m^3)$: the recursion has $\log_2(N)$ levels, and on each level every node performs a bounded number of operations with $m\times m$ matrices.

But there is an additional cost for the communication of data between the computing nodes. This cost is discussed below. The method has significant non-local communication. We describe this in the following, where $j$ ranges from $1$ to $N/2$: for the computation of the odd system, the node $j$ receives data from the nodes $2j-2$, $2j-1$, and $2j$. Analogously, for the computation of the even system, the node $N/2+j$ receives data from the nodes $2j-1$, $2j$, and $2j+1$.

Depending on the communication network of the parallel computing system, this non-local communication can be very expensive. In fact, it may result in a time complexity that exceeds the time spent in actual parallel computations. Non-local communication is, however, avoidable in parallel methods for solving block-tridiagonal linear systems. In [1] we provide an accurate description of a method (with directly implementable code for an MPI cluster) that works only via local communications. The communication network used for that method is a binary tree, the cheapest possible way of connecting all nodes.

Algorithm 1 Cyclic Reduction Method

procedure Solver($\mathbf A$, $\mathbf y$)
  if $\dim(\mathbf A)$ is sufficiently small then
    Solve $\mathbf A\cdot\mathbf x = \mathbf y$ for $\mathbf x$, using Cholesky's method.
    return $\mathbf x$
  end if
  // parallel for-loop for nodes $j = 1, 2, \dots, (N/2)$
  for $j = 1, 2, \dots, N/2$ do
    Compute $U_j, E_j, u_j$ from formulas (9).
  end for
  // concurrent parallel for-loop for nodes $j = (N/2)+1, (N/2)+2, \dots, N$
  for $j = 1, 2, \dots, N/2$ do
    Compute $V_j, F_j, v_j$ from formulas (9).
  end for
  // The odd and even systems $\mathbf U, \mathbf V$ are given by $U_j, E_j, V_j, F_j, u_j, v_j$ according to (7)
  $\mathbf x_o$ := Solver($\mathbf U$, $\mathbf u$)   // solve on nodes $j = 1, 2, \dots, (N/2)$
  $\mathbf x_e$ := Solver($\mathbf V$, $\mathbf v$)   // solve on nodes $j = (N/2)+1, (N/2)+2, \dots, N$
  Reassemble $\mathbf x$ from $\mathbf x_o, \mathbf x_e$.
  return $\mathbf x$
end procedure

If $\mathbf A$ is positive definite, then the matrices $\mathbf D_1, \mathbf D_2$ are regular, and the matrices $\mathbf U$ and $\mathbf V$ are positive definite. If $\mathbf A$ is positive definite, and Cholesky decomposition with forward and backward substitution is used to compute the formulas (9), then the computed matrices $\mathbf U$ and $\mathbf V$ are bit-wise Hermitian, and the algorithm is numerically stable. In particular, the substitution must be applied as follows: given a formula like
\[
Z := G^{\mathsf H}\cdot M^{-1}\cdot G,
\]
we must use the Cholesky decomposition $M = L\cdot L^{\mathsf H}$ in the following way:
\[
L := \operatorname{Chol}(M),\qquad \tilde G := L\backslash G,\qquad Z := \tilde G^{\mathsf H}\cdot\tilde G.
\]
Using the cyclic reduction in a recursive way, or solving the split systems with an arbitrary forward-stable numerical algorithm, results in an overall method that is numerically stable.

The described algorithm can also be used to determine whether a matrix $\mathbf A$ at hand is positive definite. Namely, this is the case if and only if all split subsystems are positive definite. Hence, we can stop the recursive splitting when the submatrices are small enough, and then check, e.g. with Cholesky, for the positive definiteness of $\mathbf U, \mathbf V$.

References

[1] M. P. Neuenhofen. A time-optimal algorithm for solving (block-)tridiagonal linear systems of dimension n on a distributed computer of n nodes.