Review of Cyclic Reduction for Parallel Solution of Hermitian Positive Definite Block-Tridiagonal Linear Systems
Martin Neuenhofen

July 3, 2018
Abstract
Cyclic reduction is a method for the solution of (block-)tridiagonal linear systems. In this note we review the method, tailored to Hermitian positive definite banded linear systems. The reviewed method has the following advantages: it is numerically stable without pivoting; it is suitable for parallel computations; and, in the presented form, it uses fewer computations by exploiting symmetry. Like Cholesky's method, the reviewed method breaks down when the matrix is not positive definite, offering a robust way of determining positive definiteness.
Brief Summary
Equations (9) give formulas to split the block-tridiagonal linear system (1) into two separate block-tridiagonal systems of half the dimension. This can be used for solving the system in a parallel divide-and-conquer approach. The resulting method is described in Algorithm 1.
We skip the introductory section and literature review on cyclic reduction methods and move directly to the problem. We consider the linear system
\[
\underbrace{\begin{pmatrix}
A_1 & B_1^{\mathsf H} & & & \\
B_1 & A_2 & B_2^{\mathsf H} & & \\
 & B_2 & A_3 & \ddots & \\
 & & \ddots & \ddots & B_{N-1}^{\mathsf H} \\
 & & & B_{N-1} & A_N
\end{pmatrix}}_{=:\,\mathbf A}
\cdot
\underbrace{\begin{pmatrix} \mathbf x_1 \\ \mathbf x_2 \\ \vdots \\ \mathbf x_N \end{pmatrix}}_{=:\,\mathbf x}
=
\underbrace{\begin{pmatrix} \mathbf y_1 \\ \mathbf y_2 \\ \vdots \\ \mathbf y_N \end{pmatrix}}_{=:\,\mathbf y}, \tag{1}
\]
where the matrices
\begin{align}
A_j &\in \mathbb C^{m\times m} &&\forall\, j = 1,\dots,N, \tag{2a}\\
B_j &\in \mathbb C^{m\times m} &&\forall\, j = 1,\dots,(N-1), \tag{2b}\\
\mathbf y_j &\in \mathbb C^{m\times k} &&\forall\, j = 1,\dots,N \tag{2c}
\end{align}
are given. Here $m, k \in \mathbb N$ are integers that shall be much smaller than $N \in \mathbb N$. The matrix $\mathbf A \in \mathbb C^{(N\cdot m)\times(N\cdot m)}$ shall be positive definite. Sought are the matrices
\[
\mathbf x_j \in \mathbb C^{m\times k} \quad \forall\, j = 1,\dots,N. \tag{3}
\]
We assume $N \in 2\cdot\mathbb N$, i.e., $N$ is even.

Derivation of the Method
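To fix ideas before the derivation, the following sketch assembles a small instance of system (1) with NumPy. The concrete sizes, the random block contents, and the diagonal shift $25\cdot I$ (used only to force positive definiteness) are illustrative assumptions, not part of the method.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, k = 6, 3, 2  # N even; block size m and number of right-hand sides k are small vs. N

# Off-diagonal blocks B_1, ..., B_{N-1}: arbitrary complex matrices
B = [rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)) for _ in range(N - 1)]

# Diagonal blocks A_1, ..., A_N: Hermitian, shifted so that the whole matrix is positive definite
A_diag = []
for _ in range(N):
    H = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    A_diag.append(H @ H.conj().T + 25.0 * np.eye(m))

# Assemble the dense (N*m) x (N*m) matrix A of system (1)
A = np.zeros((N * m, N * m), dtype=complex)
for j in range(N):
    A[j*m:(j+1)*m, j*m:(j+1)*m] = A_diag[j]
for j in range(N - 1):
    A[(j+1)*m:(j+2)*m, j*m:(j+1)*m] = B[j]            # subdiagonal block B_j
    A[j*m:(j+1)*m, (j+1)*m:(j+2)*m] = B[j].conj().T   # superdiagonal block B_j^H

assert np.allclose(A, A.conj().T)          # A is Hermitian
assert np.linalg.eigvalsh(A).min() > 0.0   # and positive definite
```

This dense assembly is only for illustration and cross-checking; the method itself works on the blocks $A_j$, $B_j$ directly.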
The idea of cyclic reduction is based on a special reordering. Putting the block rows and columns of $\mathbf A$ in the order
\[
1,\,3,\,5,\,7,\dots,(N-1),\ 2,\,4,\,6,\,8,\dots,N, \tag{4}
\]
we obtain the rearranged matrix
\[
\begin{pmatrix}
A_1 & & & & B_1^{\mathsf H} & & & \\
& A_3 & & & B_2 & B_3^{\mathsf H} & & \\
& & \ddots & & & \ddots & \ddots & \\
& & & A_{N-1} & & & B_{N-2} & B_{N-1}^{\mathsf H} \\
B_1 & B_2^{\mathsf H} & & & A_2 & & & \\
& B_3 & B_4^{\mathsf H} & & & A_4 & & \\
& & \ddots & \ddots & & & \ddots & \\
& & & B_{N-1} & & & & A_N
\end{pmatrix}.
\]
The four sub-blocks in the above matrix we denote by $\mathbf D_1, \mathbf D_2, \mathbf C \in \mathbb C^{(N/2\cdot m)\times(N/2\cdot m)}$. I.e., for the above matrix we use the compact writing
\[
\begin{pmatrix} \mathbf D_1 & \mathbf C^{\mathsf H} \\ \mathbf C & \mathbf D_2 \end{pmatrix}.
\]
These three matrices $\mathbf D_1, \mathbf D_2, \mathbf C$ can be used to construct two separate linear systems for the solution components $\mathbf x_j$. As a benefit, the two separated linear systems have only half the dimension of the original system. In particular, one of the two systems involves only the components $\mathbf x_j$ of odd indices $j = 1, 3, 5, \dots, (N-1)$. We clarify this in the following.

To be able to write the systems in compact form, we use the notation of odd and even vectors
\[
\mathbf x_o = \begin{pmatrix} \mathbf x_1 \\ \mathbf x_3 \\ \vdots \\ \mathbf x_{N-1} \end{pmatrix},\quad
\mathbf x_e = \begin{pmatrix} \mathbf x_2 \\ \mathbf x_4 \\ \vdots \\ \mathbf x_N \end{pmatrix},\quad
\mathbf y_o = \begin{pmatrix} \mathbf y_1 \\ \mathbf y_3 \\ \vdots \\ \mathbf y_{N-1} \end{pmatrix},\quad
\mathbf y_e = \begin{pmatrix} \mathbf y_2 \\ \mathbf y_4 \\ \vdots \\ \mathbf y_N \end{pmatrix}.
\]
The decoupled linear systems for $\mathbf x_o$ and $\mathbf x_e$ are:

Odd system:
\[
\underbrace{(\mathbf D_1 - \mathbf C^{\mathsf H}\cdot \mathbf D_2^{-1}\cdot \mathbf C)}_{=:\,\mathbf U}\cdot \mathbf x_o
= \underbrace{\mathbf y_o - \mathbf C^{\mathsf H}\cdot \mathbf D_2^{-1}\cdot \mathbf y_e}_{=:\,\mathbf u} \tag{5}
\]

Even system:
\[
\underbrace{(\mathbf D_2 - \mathbf C\cdot \mathbf D_1^{-1}\cdot \mathbf C^{\mathsf H})}_{=:\,\mathbf V}\cdot \mathbf x_e
= \underbrace{\mathbf y_e - \mathbf C\cdot \mathbf D_1^{-1}\cdot \mathbf y_o}_{=:\,\mathbf v} \tag{6}
\]

The odd system has the system matrix $\mathbf U$ and right-hand side $\mathbf u$. The even system has $\mathbf V$ and $\mathbf v$. In the following we give formulas for these matrices and vectors.

Formulas for the Odd and Even System
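Before the blockwise formulas are stated, the reordering (4) and the odd system (5) can be checked numerically with dense matrices. In the sketch below all concrete sizes and the diagonal shift are illustrative assumptions; the permutation and Schur-complement steps follow the text.

```python
import numpy as np

rng = np.random.default_rng(1)
N, m, k = 6, 3, 2

# Assemble a Hermitian positive definite block-tridiagonal A and right-hand side y, as in (1)
Bs = [rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)) for _ in range(N - 1)]
A = np.zeros((N * m, N * m), dtype=complex)
for j in range(N):
    H = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    A[j*m:(j+1)*m, j*m:(j+1)*m] = H @ H.conj().T + 25.0 * np.eye(m)
for j in range(N - 1):
    A[(j+1)*m:(j+2)*m, j*m:(j+1)*m] = Bs[j]
    A[j*m:(j+1)*m, (j+1)*m:(j+2)*m] = Bs[j].conj().T
y = rng.standard_normal((N * m, k))

# Block permutation (4): odd block indices 1,3,...,N-1 first, then even 2,4,...,N
order = list(range(0, N, 2)) + list(range(1, N, 2))   # 0-based block indices
perm = np.concatenate([np.arange(b * m, (b + 1) * m) for b in order])
Ap = A[np.ix_(perm, perm)]                            # equals [[D1, C^H], [C, D2]]
h = (N // 2) * m
D1, CH = Ap[:h, :h], Ap[:h, h:]
C, D2 = Ap[h:, :h], Ap[h:, h:]
yo, ye = y[perm][:h], y[perm][h:]

# Odd system (5): (D1 - C^H D2^{-1} C) x_o = y_o - C^H D2^{-1} y_e
U = D1 - CH @ np.linalg.solve(D2, C)
u = yo - CH @ np.linalg.solve(D2, ye)
xo = np.linalg.solve(U, u)

x = np.linalg.solve(A, y)            # reference solution of the full system (1)
assert np.allclose(xo, x[perm][:h])  # the odd-indexed solution blocks are recovered
```

The even system (6) is verified in the same way, with the roles of $\mathbf D_1$ and $\mathbf D_2$ exchanged.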
We now give explicit formulas for the matrices $\mathbf U, \mathbf V \in \mathbb C^{(N/2\cdot m)\times(N/2\cdot m)}$ and the right-hand sides $\mathbf u, \mathbf v \in \mathbb C^{(N/2\cdot m)\times k}$ that build the odd and the even system. The matrices and vectors have the block structure below:
\[
\mathbf U = \begin{pmatrix}
U_1 & E_1^{\mathsf H} & & \\
E_1 & U_2 & \ddots & \\
& \ddots & \ddots & E_{N/2-1}^{\mathsf H} \\
& & E_{N/2-1} & U_{N/2}
\end{pmatrix},\quad
\mathbf u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_{N/2} \end{pmatrix}; \tag{7a}
\]
\[
\mathbf V = \begin{pmatrix}
V_1 & F_1^{\mathsf H} & & \\
F_1 & V_2 & \ddots & \\
& \ddots & \ddots & F_{N/2-1}^{\mathsf H} \\
& & F_{N/2-1} & V_{N/2}
\end{pmatrix},\quad
\mathbf v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_{N/2} \end{pmatrix}. \tag{7b}
\]
Here appear the inner matrices
\begin{align}
U_j, V_j &\in \mathbb C^{m\times m} &&\forall\, j = 1,\dots,(N/2), \tag{8a}\\
E_j, F_j &\in \mathbb C^{m\times m} &&\forall\, j = 1,\dots,(N/2-1), \tag{8b}\\
u_j, v_j &\in \mathbb C^{m\times k} &&\forall\, j = 1,\dots,(N/2). \tag{8c}
\end{align}
The formulas for these inner matrices can be found by insertion. We present them below:
\begin{align}
U_j &= A_{2j-1} - B_{2j-2}\cdot A_{2j-2}^{-1}\cdot B_{2j-2}^{\mathsf H} - B_{2j-1}^{\mathsf H}\cdot A_{2j}^{-1}\cdot B_{2j-1} \tag{9a}\\
E_j &= -B_{2j}\cdot A_{2j}^{-1}\cdot B_{2j-1} \tag{9b}\\
u_j &= \mathbf y_{2j-1} - B_{2j-2}\cdot A_{2j-2}^{-1}\cdot \mathbf y_{2j-2} - B_{2j-1}^{\mathsf H}\cdot A_{2j}^{-1}\cdot \mathbf y_{2j} \tag{9c}\\
V_j &= A_{2j} - B_{2j-1}\cdot A_{2j-1}^{-1}\cdot B_{2j-1}^{\mathsf H} - B_{2j}^{\mathsf H}\cdot A_{2j+1}^{-1}\cdot B_{2j} \tag{9d}\\
F_j &= -B_{2j+1}\cdot A_{2j+1}^{-1}\cdot B_{2j} \tag{9e}\\
v_j &= \mathbf y_{2j} - B_{2j-1}\cdot A_{2j-1}^{-1}\cdot \mathbf y_{2j-1} - B_{2j}^{\mathsf H}\cdot A_{2j+1}^{-1}\cdot \mathbf y_{2j+1} \tag{9f}
\end{align}
The matrices on the left can be computed in parallel for $j = 1, 2, 3, \dots, (N/2)$.

It is important to note that in the above we have, for the sake of compact formulas, used identical equations for all $j$. However, for $j = 1$ for instance, the formula for $U_1$ accesses the matrix $B_0$, which does not exist. To repair this detail, we define the following auxiliary matrices that appear in the above formulas:
\[
B_0 := \mathbf 0 \in \mathbb C^{m\times m},\quad B_N := \mathbf 0 \in \mathbb C^{m\times m},
\]
\[
A_0 := \mathbf I \in \mathbb C^{m\times m},\quad A_{N+1} := \mathbf I \in \mathbb C^{m\times m}.
\]
Correspondingly, in (9c) and (9f) the vectors $\mathbf y_0 := \mathbf 0$ and $\mathbf y_{N+1} := \mathbf 0$ are understood; since $B_0$ and $B_N$ are zero, their values never contribute.

Formulation of the Method
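As a sanity check on the formulas (9) and the auxiliary-matrix convention, one can compare the blockwise results against the dense Schur complement $\mathbf D_1 - \mathbf C^{\mathsf H}\mathbf D_2^{-1}\mathbf C$. The NumPy sketch below does this for $U_1$ and $E_1$; all sizes and the diagonal shift are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, m = 6, 3

# Blocks, stored in dicts with 1-based indices as in the text
Bs = {j: rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)) for j in range(1, N)}
As = {}
for j in range(1, N + 1):
    H = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    As[j] = H @ H.conj().T + 25.0 * np.eye(m)

# Auxiliary matrices: B_0 = B_N = 0 and A_0 = A_{N+1} = I
Bs[0] = np.zeros((m, m)); Bs[N] = np.zeros((m, m))
As[0] = np.eye(m);        As[N + 1] = np.eye(m)
inv, Hc = np.linalg.inv, (lambda M: M.conj().T)

# Formulas (9a) and (9b) for the blocks of the odd system matrix U
U = {j: As[2*j-1] - Bs[2*j-2] @ inv(As[2*j-2]) @ Hc(Bs[2*j-2])
        - Hc(Bs[2*j-1]) @ inv(As[2*j]) @ Bs[2*j-1] for j in range(1, N//2 + 1)}
E = {j: -Bs[2*j] @ inv(As[2*j]) @ Bs[2*j-1] for j in range(1, N//2)}

# Dense cross-check: Schur complement of the even part in the reordered matrix
A = np.zeros((N * m, N * m), dtype=complex)
for j in range(1, N + 1):
    A[(j-1)*m:j*m, (j-1)*m:j*m] = As[j]
for j in range(1, N):
    A[j*m:(j+1)*m, (j-1)*m:j*m] = Bs[j]
    A[(j-1)*m:j*m, j*m:(j+1)*m] = Hc(Bs[j])
odd = np.concatenate([np.arange((j-1)*m, j*m) for j in range(1, N + 1, 2)])
even = np.concatenate([np.arange((j-1)*m, j*m) for j in range(2, N + 1, 2)])
D1, D2 = A[np.ix_(odd, odd)], A[np.ix_(even, even)]
C = A[np.ix_(even, odd)]
U_dense = D1 - Hc(C) @ np.linalg.solve(D2, C)

assert np.allclose(U_dense[:m, :m], U[1])      # diagonal block U_1, formula (9a)
assert np.allclose(U_dense[m:2*m, :m], E[1])   # subdiagonal block E_1, formula (9b)
```

The blocks $V_j$, $F_j$ and the right-hand sides follow the same pattern with the roles of $\mathbf D_1$ and $\mathbf D_2$ exchanged.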
With the above ingredients, we are able to describe how cyclic reduction is used to solve the linear system (1). The method consists of three phases.

In the first phase, given the data (2), the method uses the formulas in (9) to compute the block matrices (8) for the odd system (5) and the even system (6).

In the second phase, cyclic reduction is used recursively to solve the odd and the even system for their respective solutions $\mathbf x_o, \mathbf x_e$. I.e., we consider (5) and (6) as instances of (1). This works for the following reason: we assumed that $\mathbf A$ is positive definite. If the assumption holds, then $\mathbf U, \mathbf V$ are positive definite as well, because they are Schur complements of $\mathbf A$.

In the third phase, we rearrange the vectors $\mathbf x_o, \mathbf x_e$ into the global solution vector $\mathbf x$ of (1). This completes the method.

Determination of Positive Definiteness
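The three phases can be sketched as a sequential recursion; the Cholesky call in its base case is also where indefiniteness of $\mathbf A$ would surface, which is the topic of this section. The sketch below is illustrative: it assumes $N$ is a power of two (so every recursion level has even length), does not distribute work across nodes, and uses explicit inverses for brevity rather than the stable Cholesky-based substitutions discussed later.

```python
import numpy as np

def cyclic_reduction_solve(A_blocks, B_blocks, y_blocks):
    """Solve system (1) given lists [A_1..A_N], [B_1..B_{N-1}], [y_1..y_N].
    N must be a power of two so that every recursion level is even."""
    N, m = len(A_blocks), A_blocks[0].shape[0]
    if N == 1:  # base case: Cholesky solve; raises LinAlgError if not positive definite
        L = np.linalg.cholesky(A_blocks[0])
        return [np.linalg.solve(L.conj().T, np.linalg.solve(L, y_blocks[0]))]
    Z, I = np.zeros((m, m)), np.eye(m)
    Bp = [Z] + list(B_blocks) + [Z]      # auxiliary B_0 = B_N = 0
    Ap = [I] + list(A_blocks) + [I]      # auxiliary A_0 = A_{N+1} = I
    yp = [np.zeros_like(y_blocks[0])] + list(y_blocks) + [np.zeros_like(y_blocks[0])]
    inv, H = np.linalg.inv, (lambda M: M.conj().T)
    # Phase 1: formulas (9) build the odd system (U, E, u) and the even system (V, F, v)
    U = [Ap[2*j-1] - Bp[2*j-2] @ inv(Ap[2*j-2]) @ H(Bp[2*j-2])
         - H(Bp[2*j-1]) @ inv(Ap[2*j]) @ Bp[2*j-1] for j in range(1, N//2 + 1)]
    E = [-Bp[2*j] @ inv(Ap[2*j]) @ Bp[2*j-1] for j in range(1, N//2)]
    u = [yp[2*j-1] - Bp[2*j-2] @ inv(Ap[2*j-2]) @ yp[2*j-2]
         - H(Bp[2*j-1]) @ inv(Ap[2*j]) @ yp[2*j] for j in range(1, N//2 + 1)]
    V = [Ap[2*j] - Bp[2*j-1] @ inv(Ap[2*j-1]) @ H(Bp[2*j-1])
         - H(Bp[2*j]) @ inv(Ap[2*j+1]) @ Bp[2*j] for j in range(1, N//2 + 1)]
    F = [-Bp[2*j+1] @ inv(Ap[2*j+1]) @ Bp[2*j] for j in range(1, N//2)]
    v = [yp[2*j] - Bp[2*j-1] @ inv(Ap[2*j-1]) @ yp[2*j-1]
         - H(Bp[2*j]) @ inv(Ap[2*j+1]) @ yp[2*j+1] for j in range(1, N//2 + 1)]
    # Phase 2: recurse on the half-sized systems (5) and (6)
    xo = cyclic_reduction_solve(U, E, u)
    xe = cyclic_reduction_solve(V, F, v)
    # Phase 3: reassemble x from x_o and x_e
    x = [None] * N
    x[0::2], x[1::2] = xo, xe
    return x
```

In a real implementation, the lists `U, E, u` and `V, F, v` would be produced by disjoint groups of nodes, as in Algorithm 1.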
If the matrix $\mathbf A$ is indeed positive definite, then all recursive calls of cyclic reduction will succeed, until at the bottom there arise positive definite linear systems of size $m\times m$. These shall be solved with Cholesky's method.

If, in contrast to the assumption, the matrix $\mathbf A$ is not positive definite, then eventually at least one of the matrices $\mathbf U, \mathbf V$ will not be well-defined, and thus the method will break down. This breakdown appears in the form that Cholesky's method determines indefiniteness for one of the matrices $A_j$ in (9).

Computing Model
For our description of the parallel method, we consider a parallel computing system that consists of $N$ computing nodes; each has an index, counted as $1, 2, 3, \dots, N$. The nodes are connected in a communication network that consists of cables. The length of each cable equals the distance of the respective nodes in space. Information travels along these cables at a limited speed.

Idea of the Parallel Method
The idea for a parallel cyclic reduction consists of two ingredients:

1. Parallel instantiation of odd and even systems: The equations (9) are evaluated in parallel over $j = 1, \dots, (N/2)$. If $N$ parallel computing units are given, then the nodes $1, 2, \dots, N/2$ can compute the matrices for the odd system, while the nodes $N/2+1, \dots, N$ can compute the matrices for the even system.

2. Parallel recursive solution of the split systems: The linear systems (5) and (6) are solved recursively for $\mathbf x_o$ and $\mathbf x_e$. To this end, the same method as for (1) can be used.

A pseudo-code for this procedure is given in Algorithm 1. The algorithm assumes $N \in 2\cdot\mathbb N$.

Time-Complexity and Communication
The pure time-complexity of parallel computations is in $\mathcal O(\log(N)\cdot m^3)$: the recursion has $\log_2(N)$ levels, and on each level every node performs a bounded number of operations with $m\times m$ matrices.

But there is an additional cost for the communication of data between the computing nodes. This cost is discussed below. The method has significant non-local communication. We describe this in the following, where $j$ ranges from $1$ to $N/2$: for the computation of the odd system, the node $j$ receives data from the nodes $2j-2$, $2j-1$, and $2j$. Analogously, for the computation of the even system, the node $N/2+j$ receives data from the nodes $2j-1$, $2j$, and $2j+1$.

Depending on the communication network of the parallel computing system, this non-local communication can be very expensive. In fact, it may result in a time complexity that exceeds the time spent in actual parallel computations. Non-local communication is, however, avoidable in parallel methods for solving block-tridiagonal linear systems. In [1] we provide an accurate description of a method (with directly implementable code for an MPI cluster) that works only via local communications. The communication network used for that method is a binary tree, the cheapest possible way of connecting all nodes.

Algorithm 1 Cyclic Reduction Method

procedure Solver($\mathbf A$, $\mathbf y$)
  if $\dim(\mathbf A)$ is sufficiently small then
    Solve $\mathbf A\cdot\mathbf x = \mathbf y$ for $\mathbf x$, using Cholesky's method.
    return $\mathbf x$
  end if
  // parallel for-loop for nodes $j = 1, 2, \dots, (N/2)$
  for $j = 1, 2, \dots, N/2$ do
    Compute $U_j, E_j, u_j$ from formulas (9).
  end for
  // concurrent parallel for-loop for nodes $j = (N/2)+1, (N/2)+2, \dots, N$
  for $j = 1, 2, \dots, N/2$ do
    Compute $V_j, F_j, v_j$ from formulas (9).
  end for
  // The odd and even systems $\mathbf U, \mathbf V$ are given by $U_j, E_j, V_j, F_j, u_j, v_j$ according to (7)
  $\mathbf x_o$ := Solver($\mathbf U$, $\mathbf u$)   // solve on nodes $j = 1, 2, \dots, (N/2)$
  $\mathbf x_e$ := Solver($\mathbf V$, $\mathbf v$)   // solve on nodes $j = (N/2)+1, (N/2)+2, \dots, N$
  Reassemble $\mathbf x$ from $\mathbf x_o, \mathbf x_e$.
  return $\mathbf x$
end procedure

If $\mathbf A$ is positive definite, then the matrices $\mathbf D_1, \mathbf D_2$ are regular, and the matrices $\mathbf U$ and $\mathbf V$ are positive definite. If $\mathbf A$ is positive definite, and Cholesky decomposition with forward and backward substitution is used to compute the formulas (9), then the computed matrices $\mathbf U$ and $\mathbf V$ are bit-wise Hermitian, and the algorithm is numerically stable. In particular, the substitution must be applied as follows: given a formula like
\[
Z := G^{\mathsf H}\cdot M^{-1}\cdot G,
\]
we must use the Cholesky decomposition $M = L\cdot L^{\mathsf H}$ in the following way:
\[
L := \operatorname{Chol}(M),\qquad \tilde G := L\backslash G,\qquad Z := \tilde G^{\mathsf H}\cdot\tilde G.
\]
Using the cyclic reduction in a recursive way, or solving the split systems with an arbitrary forward-stable numerical algorithm, results in an overall method that is numerically stable.

The described algorithm can also be used to determine whether a matrix $\mathbf A$ at hand is positive definite. Namely, this is the case if and only if all split subsystems are positive definite. Hence, we can stop the recursive splitting when the submatrices are small enough, and then check, e.g. with Cholesky, for the positive definiteness of $\mathbf U, \mathbf V$.

References

[1] M. P. Neuenhofen. A time-optimal algorithm for solving (block-)tridiagonal linear systems of dimension n on a distributed computer of n nodes.