A Generalization of the 2D-DSPM for Solving Linear Systems of Equations
Davod Khojasteh Salkuyeh
Department of Mathematics, University of Mohaghegh Ardabili, P. O. Box 56199-11367, Ardabil, Iran. E-mail: [email protected]
Abstract
In [N. Ujević, A new iterative method for solving linear systems, Appl. Math. Comput. 179 (2006) 725-730], a new iterative method for solving linear systems of equations was presented, which can be considered a modification of the Gauss-Seidel method. Then, in [Y.-F. Jing and T.-Z. Huang, On a new iterative method for solving linear systems and comparison results, J. Comput. Appl. Math., in press], a different and more effective approach, called 2D-DSPM, was introduced. In this paper, we improve this method and give a generalization of it. Convergence properties of this generalization are also discussed. We finally give some numerical experiments to show the efficiency of the method and to compare it with 2D-DSPM.
AMS Subject Classification:
Keywords: linear system, projection method, Gauss-Seidel method, Petrov-Galerkin condition, convergence.
1. Introduction
Consider the linear system of equations

$$Ax = b, \qquad (1)$$

where $A \in \mathbb{R}^{n \times n}$ is a symmetric positive definite (SPD) matrix and $x, b \in \mathbb{R}^n$. The Gauss-Seidel method is a stationary iterative method for solving linear systems of equations and is convergent for SPD matrices. It is frequently used in science and engineering, both for solving linear systems of equations and for preconditioning [2, 6]. It can easily be seen that the Gauss-Seidel method is a special case of a projection method [1, 6]. Let $\mathcal{K}$ and $\mathcal{L}$ be two $m$-dimensional subspaces of $\mathbb{R}^n$, and let $\tilde{x}$ be an initial guess for the solution of (1). A projection method onto $\mathcal{K}$ and orthogonal to $\mathcal{L}$ is a process that finds an approximate solution $x \in \mathbb{R}^n$ of (1) by imposing the conditions that $x \in \tilde{x} + \mathcal{K}$ and that the new residual vector be orthogonal to $\mathcal{L}$ (the Petrov-Galerkin condition), i.e.,

Find $x \in \tilde{x} + \mathcal{K}$ such that $b - Ax \perp \mathcal{L}$. \qquad (2)

It is well known that an iteration of the elementary Gauss-Seidel method can be viewed as a sequence of projection steps with $\mathcal{L} = \mathcal{K} = \mathrm{span}\{e_i\}$, $i = 1, 2, \ldots, n$, where $e_i$ is the $i$th column of the identity matrix. In fact, a single correction is made at each of these projection steps, cycled for $i = 1, \ldots, n$.

In [7], Ujević proposed a modification of the Gauss-Seidel method which may be named a "one-dimensional double successive projection method" and is referred to as 1D-DSPM. In an iteration of 1D-DSPM, a double successive projection with two pairs of one-dimensional subspaces is used: two pairs of subspaces $(\mathcal{K}_1, \mathcal{L}_1)$ and $(\mathcal{K}_2, \mathcal{L}_2)$ of dimension one are chosen, and a double correction is made at each step of the process, cycled for $i = 1, 2, \ldots, n$. In [4], Jing and Huang proposed the "two-dimensional double successive projection method", referred to as 2D-DSPM. In an iteration of 2D-DSPM, a projection with one pair of two-dimensional subspaces $\mathcal{K}$ and $\mathcal{L}$ is used, and a double correction is made at each projection step.

In this paper, a generalization of 2D-DSPM, called $m$D-SPM and read "$m$-dimensional successive projection method", is proposed and its convergence properties are studied. For $m = 2$, the $m$D-SPM reduces to 2D-DSPM.

Throughout this paper we use the following notation. By $\langle \cdot, \cdot \rangle$ we denote the standard inner product in $\mathbb{R}^n$; that is, for two vectors $x$ and $y$ in $\mathbb{R}^n$, $\langle x, y \rangle = y^T x$. For any SPD matrix $M \in \mathbb{R}^{n \times n}$, the $M$-inner product is defined as $\langle x, y \rangle_M = \langle Mx, y \rangle$, and its corresponding norm is $\|x\|_M = \langle x, x \rangle_M^{1/2}$.

This paper is organized as follows. In Section 2, a brief description of 1D-DSPM and 2D-DSPM is given. In Section 3, the $m$D-SPM is presented and its convergence properties are studied. In Section 4, the algorithm and its practical implementation are given. Section 5 is devoted to numerical experiments showing the efficiency of the method in comparison with 1D-DSPM and 2D-DSPM. Some concluding remarks are given in Section 6.
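To make the projection viewpoint concrete, the following MATLAB sketch (our own illustration; the function name is ours) carries out one Gauss-Seidel sweep as $n$ successive one-dimensional projection steps with $\mathcal{K} = \mathcal{L} = \mathrm{span}\{e_i\}$:

    function x = gs_sweep(A, b, x)
    % One Gauss-Seidel sweep written as n one-dimensional projection steps:
    % at step i we set x := x + alpha*e_i, where alpha is chosen so that
    % the new residual satisfies e_i'*(b - A*x) = 0 (Petrov-Galerkin).
    n = length(b);
    for i = 1:n
        ri   = b(i) - A(i,:)*x;    % i-th component of the current residual
        x(i) = x(i) + ri/A(i,i);   % alpha = <r, e_i> / <A e_i, e_i>
    end
    end

For an SPD matrix, cycling this sweep converges to the solution of (1); the methods discussed below accelerate the process by making several coupled corrections at once.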
2. A brief description of 1D-DSPM and 2D-DSPM
We review 1D-DSPM and 2D-DSPM in the language of projection methods. As mentioned in the previous section, each iteration of 1D-DSPM consists of a set of double successive projection steps. Let $x_k$ be the current approximate solution. The double successive projection step is applied as follows. First, two pairs of subspaces $\mathcal{K}_1 = \mathcal{L}_1 = \mathrm{span}\{v_1\}$ and $\mathcal{K}_2 = \mathcal{L}_2 = \mathrm{span}\{v_2\}$ are chosen, and then the next approximate solution $x_{k+1}$ is computed as

Find $\tilde{x}_{k+1} \in x_k + \mathcal{K}_1$ such that $b - A\tilde{x}_{k+1} \perp \mathcal{L}_1$, \qquad (3)
Find $x_{k+1} \in \tilde{x}_{k+1} + \mathcal{K}_2$ such that $b - Ax_{k+1} \perp \mathcal{L}_2$. \qquad (4)

This framework results in [4, 7]

$$x_{k+1} = x_k + \alpha v_1 + \beta v_2,$$

where

$$\alpha = -\frac{p_1}{a}, \qquad \beta = \frac{c\,p_1 - a\,p_2}{a\,d}, \qquad (5)$$

in which $p_i = \langle Ax_k - b, v_i \rangle$, $i = 1, 2$, and

$$a = \langle v_1, v_1 \rangle_A, \qquad c = \langle v_1, v_2 \rangle_A, \qquad d = \langle v_2, v_2 \rangle_A. \qquad (6)$$

In [7] it was proven that this method converges to the exact solution $x^*$ of (1) for any initial guess.

In 2D-DSPM, two two-dimensional subspaces $\mathcal{K} = \mathcal{L} = \mathrm{span}\{v_1, v_2\}$ are chosen, and a projection process onto $\mathcal{K}$ and orthogonal to $\mathcal{L}$ is used instead of the double successive projection of 1D-DSPM. In other words, the projection method is defined as follows:

Find $x_{k+1} \in x_k + \mathcal{K}$ such that $b - Ax_{k+1} \perp \mathcal{L}$. \qquad (7)

In [4], it has been shown that this projection process gives $x_{k+1} = x_k + \alpha v_1 + \beta v_2$ with

$$\alpha = \frac{c\,p_2 - d\,p_1}{ad - c^2}, \qquad \beta = \frac{c\,p_1 - a\,p_2}{ad - c^2},$$

where $p_1$, $p_2$, $a$, $c$, and $d$ are as defined in Eqs. (5) and (6). It has also been proven in [4] that 2D-DSPM is convergent, and the theoretical analysis and numerical experiments presented there show that 2D-DSPM is more effective than 1D-DSPM.

A main problem with 1D-DSPM and 2D-DSPM is the choice of the vectors $v_1$ and $v_2$. In this paper, we first propose a generalization of 2D-DSPM and then give a strategy to choose the projection vectors in a special case.

3. The $m$-dimensional successive projection method

Let $\{v_1, v_2, \ldots, v_m\}$ be a set of $m$ linearly independent vectors in $\mathbb{R}^n$, and for later use let $V_m = [v_1, v_2, \ldots, v_m]$. We define the $m$D-SPM as follows. In an iteration of the $m$D-SPM we use a projection process onto $\mathcal{K} = \mathrm{span}\{v_1, v_2, \ldots, v_m\}$ and orthogonal to $\mathcal{L} = \mathcal{K}$. In other words, two $m$-dimensional subspaces $\mathcal{K}$ and $\mathcal{L}$ are used in the projection step instead of the two-dimensional subspaces of 2D-DSPM. In this case Eq. (2) takes the form

Find $y_m \in \mathbb{R}^m$ such that $x_{k+1} = x_k + V_m y_m$ and $V_m^T (b - Ax_{k+1}) = 0$. \qquad (8)

We have

$$r_{k+1} = b - Ax_{k+1} = b - A(x_k + V_m y_m) = r_k - AV_m y_m,$$

where $r_k = b - Ax_k$. Hence from Eq. (8) we deduce

$$0 = V_m^T r_{k+1} = V_m^T (r_k - AV_m y_m) = V_m^T r_k - V_m^T A V_m y_m \;\Longrightarrow\; V_m^T A V_m y_m = V_m^T r_k.$$

The matrix $V_m^T A V_m$ is SPD, since $A$ is SPD and $V_m$ has full column rank. Therefore

$$y_m = (V_m^T A V_m)^{-1} V_m^T r_k, \qquad (9)$$

and from (8) we conclude that

$$x_{k+1} = x_k + V_m (V_m^T A V_m)^{-1} V_m^T r_k. \qquad (10)$$
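As a minimal MATLAB sketch (our own, with a hypothetical function name), a single $m$D-SPM projection step implementing (9) and (10) for a given full-column-rank matrix $V_m$ reads:

    function [x, r] = mdspm_step(A, b, x, V)
    % One projection step of mD-SPM: solve the m-by-m SPD system (9)
    % and update the iterate as in (10).
    r = b - A*x;               % current residual r_k
    y = (V'*A*V) \ (V'*r);     % y_m = (V_m' A V_m)^{-1} V_m' r_k
    x = x + V*y;               % x_{k+1} = x_k + V_m y_m
    r = r - A*(V*y);           % r_{k+1} = r_k - A V_m y_m
    end

For $m \ll n$ the dominant costs are the products with $V_m$ and the solution of the small SPD system, which can be done by a Cholesky factorization.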
Theorem 1. Let $A$ be an SPD matrix and let $x_k$ be an approximate solution of (1). Then

$$\|d_k\|_A^2 \geq \|d_{k+1}\|_A^2, \qquad (11)$$

where $d_k = x^* - x_k$ and $d_{k+1} = x^* - x_{k+1}$, in which $x_{k+1}$ is the approximate solution computed by Eq. (10).

Proof. It can easily be verified that $d_{k+1} = d_k - V_m y_m$ and $Ad_k = r_k$, where $y_m$ is defined by (9). Then

$$\langle Ad_{k+1}, d_{k+1} \rangle = \langle Ad_k - AV_m y_m,\; d_k - V_m y_m \rangle = \langle Ad_k, d_k \rangle - 2\langle V_m y_m, r_k \rangle + \langle AV_m y_m, V_m y_m \rangle,$$

using $Ad_k = r_k$ and the symmetry of $A$. By (9), $\langle AV_m y_m, V_m y_m \rangle = \langle y_m, V_m^T A V_m y_m \rangle = \langle y_m, V_m^T r_k \rangle$, and also $\langle V_m y_m, r_k \rangle = \langle y_m, V_m^T r_k \rangle$. Therefore

$$\|d_k\|_A^2 - \|d_{k+1}\|_A^2 = \langle Ad_k, d_k \rangle - \langle Ad_{k+1}, d_{k+1} \rangle = \langle y_m, V_m^T r_k \rangle = (V_m^T r_k)^T (V_m^T A V_m)^{-1} V_m^T r_k =: S(r_k).$$

Since $(V_m^T A V_m)^{-1}$ is SPD, we have $\|d_k\|_A^2 - \|d_{k+1}\|_A^2 \geq 0$, and the desired result is obtained. □

This theorem shows that if $V_m^T r_k = 0$, then $S(r_k) = 0$ and there is no reduction in the square of the $A$-norm of the error. But if $V_m^T r_k \neq 0$, then the square of the $A$-norm of the error is reduced by $S(r_k) > 0$.
Theorem 2. Assume that $A$ is an SPD matrix and $\mathcal{L} = \mathcal{K}$. Then a vector $x_{k+1}$ is the result of a projection method onto $\mathcal{K}$ orthogonal to $\mathcal{L}$ with starting vector $x_k$ if and only if it minimizes the $A$-norm of the error over $x_k + \mathcal{K}$.

Proof.
See [6], page 126. □
This theorem shows that if $\mathcal{K}_1 = \mathcal{L}_1 \subset \mathcal{K}_2 = \mathcal{L}_2$, then the reduction of the $A$-norm of the error obtained with the subspaces $\mathcal{K}_2 = \mathcal{L}_2$ is greater than or equal to that obtained with the subspaces $\mathcal{K}_1 = \mathcal{L}_1$. Hence, by increasing the value of $m$, the convergence rate may increase.

In what follows we consider the special case in which the vectors $v_i$ are columns of the identity matrix. The next theorem not only proves the convergence of the method in this special case but also suggests how to choose the vectors $v_i$.
Theorem 3. Let $\{i_1, i_2, \ldots, i_m\}$ be the set of indices of the $m$ components of largest absolute value in $r_k$, with $i_1 < i_2 < \cdots < i_m$. If $v_j = e_{i_j}$, $j = 1, \ldots, m$, then

$$\|d_k\|_A^2 - \|d_{k+1}\|_A^2 \geq \frac{m}{n\,\lambda_{\max}(A)}\, \|r_k\|^2. \qquad (12)$$

Proof.
Let $E_m = [e_{i_1}, e_{i_2}, \ldots, e_{i_m}]$. By Theorem 1, we have

$$\|d_k\|_A^2 - \|d_{k+1}\|_A^2 = S(r_k) = (E_m^T r_k)^T (E_m^T A E_m)^{-1} E_m^T r_k.$$

Then, by using Theorem 1.19 in [6], we conclude

$$S(r_k) \geq \lambda_{\min}\big((E_m^T A E_m)^{-1}\big)\, \|E_m^T r_k\|^2 = \frac{\|E_m^T r_k\|^2}{\lambda_{\max}(E_m^T A E_m)}, \qquad (13)$$

where for a symmetric matrix $Z$, $\lambda_{\min}(Z)$ and $\lambda_{\max}(Z)$ stand for the smallest and largest eigenvalues of $Z$. It can easily be verified that [3]

$$\lambda_{\max}(E_m^T A E_m) \leq \lambda_{\max}(A), \qquad \frac{\langle E_m^T r_k, E_m^T r_k \rangle}{\langle r_k, r_k \rangle} \geq \frac{m}{n}. \qquad (14)$$

Hence

$$S(r_k) \geq \frac{m}{n\,\lambda_{\max}(A)}\, \|r_k\|^2, \qquad (15)$$

and the desired result is obtained. □

Eq. (12) shows the convergence of the method. Eq. (13), together with the first relation in (14), gives

$$S(r_k) \geq \frac{\|E_m^T r_k\|^2}{\lambda_{\max}(A)},$$

which holds for any choice of the indices $i_j$, $j = 1, \ldots, m$. It shows that if these indices correspond to the $m$ components of largest absolute value in $r_k$, then this lower bound on $S(r_k)$, which depends on $\|E_m^T r_k\|^2$, is as large as possible.

In [3], another convergence theorem for the method obtained with $v_j = e_{i_j}$ was presented, and an algorithm based on that theorem was constructed for computing a sparse approximate inverse factor of an SPD matrix, which was used as a preconditioner for SPD linear systems.
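The bound (12) is easy to verify numerically. The following MATLAB sketch (our own illustration on a random SPD matrix, not part of the original analysis) selects the $m$ residual components of largest absolute value as in Theorem 3 and compares the actual decrease $S(r_k)$ with the lower bound:

    n = 100; m = 5;
    B = randn(n); A = B'*B + n*eye(n);     % a random SPD test matrix
    b = A*ones(n,1); x = zeros(n,1);
    r = b - A*x;
    [~, idx] = sort(abs(r), 'descend');
    I = sort(idx(1:m));                    % indices i_1 < ... < i_m of Theorem 3
    y = A(I,I) \ r(I);                     % E_m'*A*E_m is the principal submatrix A(I,I)
    S = r(I)'*y;                           % S(r_k) = (E_m' r)' (E_m' A E_m)^{-1} E_m' r
    lb = m*(r'*r)/(n*max(eig(A)));         % right-hand side of (12)
    fprintf('S(r_k) = %g >= bound = %g\n', S, lb);

In every run $S(r_k)$ is at least as large as the printed bound, as Theorem 3 guarantees.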
4. The algorithm and its practical implementation
According to the results obtained in the previous section, we can summarize the $m$D-SPM in the special case $v_j = e_{i_j}$ as follows.

Algorithm 1: $m$D-SPM
1. Choose an initial guess $x_0 \in \mathbb{R}^n$ for (1) and set $r = b - Ax_0$.
2. Until convergence, Do
3.     $x := x_0$
4.     For $i = 1, \ldots, n$, Do
5.         Select the indices $i_1, i_2, \ldots, i_m$ of $r$ as defined in Theorem 3
6.         $E_m := [e_{i_1}, e_{i_2}, \ldots, e_{i_m}]$
7.         Solve $(E_m^T A E_m)\, y_m = E_m^T r$ for $y_m$
8.         $x := x + E_m y_m$
9.         $r := r - A E_m y_m$
10.     EndDo
11.     $x_0 := x$; if $x$ has converged, then Stop
12. EndDo

In practice, the matrix $E_m^T A E_m$ is the principal submatrix of $A$ with row and column indices in $\{i_1, i_2, \ldots, i_m\}$. Hence, no computation is needed to form the matrix $E_m^T A E_m$ in step 7. For solving the linear system in step 7, one can use the Cholesky factorization of the coefficient matrix. Step 8 of the algorithm may be written as

For $j = 1, 2, \ldots, m$
    $x_{i_j} := x_{i_j} + (y_m)_j$
EndDo

Hence only $m$ components of the vector $x$ are modified. Step 9 of the algorithm can be written as

$$r := r - \sum_{j=1}^{m} (y_m)_j\, a_{i_j},$$

where $a_{i_j}$ is the $i_j$th column of the matrix $A$.

It can be seen that Algorithm 2 in [4] is a special case of this algorithm. In fact, if $m = 2$ and the indices $i_1$ and $i_2$ are chosen as $i_1 = i$ and $i_2 = i - ij_{\mathrm{gap}}$ (with $i_2 = i - ij_{\mathrm{gap}} + n$ if $i \leq ij_{\mathrm{gap}}$), where $ij_{\mathrm{gap}}$ is a positive integer parameter less than $n$, then Algorithm 2 in [4] is obtained.

The first advantage of our algorithm over Algorithm 2 in [4] is that our algorithm chooses the indices $\{i_1, i_2, \ldots, i_m\}$ automatically. Another advantage is that our algorithm chooses the indices so that the reduction in the square of the $A$-norm of the error is larger than that of Algorithm 2 in [4]. Numerical experiments in the next section confirm this fact. The main advantage of Algorithm 2 in [4] over our algorithm is that only $m$ components of the current residual are needed, whereas in our algorithm the whole residual vector must be examined to find the $m$ components of largest absolute value. A MATLAB sketch of Algorithm 1 is given below.
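A straightforward MATLAB realization of Algorithm 1 may look as follows (a sketch: the function name, stopping test, and interface are ours, and the small system in step 7 is solved with backslash rather than an explicit Cholesky factorization):

    function x = mdspm(A, b, m, tol, maxit)
    % mD-SPM (Algorithm 1) with v_j = e_{i_j}: in every inner step the m
    % residual components of largest absolute value are corrected at once.
    n = length(b);
    x = zeros(n,1);                    % step 1: initial guess (any x_0 can be used)
    r = b - A*x;
    for k = 1:maxit                    % step 2: outer loop
        x_old = x;
        for i = 1:n                    % steps 4-10
            [~, idx] = sort(abs(r), 'descend');
            I = sort(idx(1:m));        % step 5: indices of Theorem 3
            y = A(I,I) \ r(I);         % step 7: E_m'*A*E_m = A(I,I)
            x(I) = x(I) + y;           % step 8: only m components change
            r = r - A(:,I)*y;          % step 9: only m columns of A are used
        end
        if norm(x - x_old, inf) < tol  % step 11: convergence test
            return
        end
    end
    end

A call such as x = mdspm(A, b, 4, 1e-10, 100) then corresponds to the 4D-SPM variant used in the next section, up to the exact stopping tolerance.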
5. Numerical experiments
In this section we give some numerical experiments to compare our method with Algorithm 2 in [4]. The numerical results were obtained with MATLAB codes. We use the same assumptions (initial guess, exact solution, stopping criterion) and the same examples as in [4]. Let $b = Ae$, where $e$ is the $n$-vector whose entries are all equal to unity, i.e., $e = (1, 1, \ldots, 1)^T$. We use $\|x_{k+1} - x_k\|_\infty < \varepsilon$ as the stopping criterion, with the same tolerance $\varepsilon$ as in [4]. The initial guess is $x_0 = (x_1, \ldots, x_n)^T$ with entries $x_i$ proportional to $i$, $i = 1, 2, \ldots, n$, again as in [4]. For each system we report the iteration counts of Algorithm 2 in [4] with $ij_{\mathrm{gap}} = 2$ and $ij_{\mathrm{gap}} = 500$, and of our algorithm with $m = 2, 3, 4, 5$.

Table 1: Results for Example 1
Algorithm 2 in [4]        2D-SPM   3D-SPM   4D-SPM   5D-SPM
   (ij_gap = 2)              5        4        3        2
 7 (ij_gap = 500)

Table 2: Results for Example 2
Algorithm 2 in [4]        2D-SPM   3D-SPM   4D-SPM   5D-SPM
 8 (ij_gap = 2)              7        6        4        4
 9 (ij_gap = 500)

Example 1.
Let $A = (a_{ij})$, where

$$a_{ii} = 4n, \qquad a_{i,i+1} = a_{i+1,i} = n, \qquad i = 1, 2, \ldots, n,$$

and $a_{ij} = 0$ otherwise. We take $n = 1000$. Iteration counts are shown in Table 1; the matrix and right-hand side can be generated as in the sketch below. The numerical experiments presented in Table 1 show that 2D-SPM gives better results than Algorithm 2 in [4]. This table also shows the effect of increasing $m$ on the number of iterations needed for convergence.
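For reference, this test problem can be generated in MATLAB as follows (a sketch using spdiags; for Example 2 below, only the diagonal value changes):

    n = 1000;
    e = ones(n,1);
    A = spdiags([n*e, 4*n*e, n*e], -1:1, n, n);  % a_ii = 4n, a_{i,i+1} = a_{i+1,i} = n
    b = A*e;                                     % right-hand side b = A*e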
Example 2. Let $A = (a_{ij})$ be the same matrix as in the previous example, except that the diagonal entries are changed to $a_{ii} = 3n$, $i = 1, 2, \ldots, n$. The numerical results are given in Table 2. This table also shows the advantage of our method over Algorithm 2 in [4].
Example 3.
Our third set of test matrices arises from the standard five-point finite difference discretization of

$$-\Delta u + a(x, y)\, u_x + b(x, y)\, u_y + c(x, y)\, u = f(x, y)$$

on $\Omega = [0, 1] \times [0, 1]$, where $a(x, y)$, $b(x, y)$, $c(x, y)$, and $f(x, y)$ are given real-valued functions. We consider the following three cases:

Case 1: $a(x, y) = 0$, $b(x, y) = 10(x + y)$, $c(x, y) = 10(x - y)$, $f(x, y) = 0$;
Case 2: $a(x, y) = -(x + y)$, $b(x, y) = -(x - y)$, $c(x, y) = 1$, $f(x, y) = 0$;
Case 3: $a(x, y) = 10 e^{xy}$, $b(x, y) = 10 e^{-xy}$, $c(x, y) = 0$, $f(x, y) = 0$.

With 32 grid points in each direction we obtain three SPD matrices of order $n = 32 \times 32 = 1024$. The iteration counts are reported in Table 3.

Table 3: Results for Example 3
Cases      Algorithm 2 in [4]      2D-SPM   3D-SPM   4D-SPM   5D-SPM
Case 1     391 (ij_gap = 2)          226      153      116       94
           323 (ij_gap = 500)
Case 2     312 (ij_gap = 2)          192      131      100       80
           256 (ij_gap = 500)
Case 3     302 (ij_gap = 2)          218      151      115       93
           250 (ij_gap = 500)
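For illustration, such discretizations can be assembled with Kronecker products; the MATLAB sketch below (our own construction, assuming lexicographic ordering, homogeneous Dirichlet boundary conditions, and centered differences for the first-order terms) builds the Case 3 operator. Note that with centered differences the convection terms make the assembled matrix nonsymmetric, so this sketch only illustrates the underlying discretization; the SPD test matrices themselves were provided by Yan-Fei Jing (see the Acknowledgments).

    N = 32; h = 1/(N+1);
    [X, Y] = ndgrid(h*(1:N), h*(1:N));     % interior grid; X varies fastest in X(:)
    x = X(:); y = Y(:);
    I = speye(N);
    T = spdiags(ones(N,1)*[-1 2 -1], -1:1, N, N);      % 1D second-difference stencil
    D = spdiags(ones(N,1)*[-1 0 1], -1:1, N, N)/(2*h); % 1D centered first difference
    L = (kron(I,T) + kron(T,I))/h^2;       % five-point discretization of -Laplacian
    Da = spdiags(10*exp(x.*y), 0, N^2, N^2);    % a(x,y) = 10 e^{xy}
    Db = spdiags(10*exp(-x.*y), 0, N^2, N^2);   % b(x,y) = 10 e^{-xy}
    A = L + Da*kron(I,D) + Db*kron(D,I);   % c(x,y) = 0 and f(x,y) = 0 in Case 3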
6. Conclusion
In this paper, a generalization of the 2D-DSPM [4], which is itself a generalization of 1D-DSPM [7], was presented. 1D-DSPM and 2D-DSPM need some prescribed subspaces of $\mathbb{R}^n$ for the projection steps, whereas our method, in the special case studied here, chooses these subspaces automatically. Theoretical analysis and numerical experiments presented in this paper show that our method is more effective than 2D-DSPM.
7. Acknowledgments
The author would like to thank Yan-Fei Jing for providing the matrices of Example 3.

References

[1] R. Barrett et al., Templates for the solution of linear systems: building blocks for iterative methods, SIAM, Philadelphia, 1994.
[2] M. Benzi, Preconditioning techniques for large linear systems: a survey, J. Comput. Phys. 182 (2002) 418-477.
[3] D. Khojasteh Salkuyeh and F. Toutounian, A sparse-sparse iteration for computing a sparse incomplete factorization of an SPD matrix, submitted.
[4] Y.-F. Jing and T.-Z. Huang, On a new iterative method for solving linear systems and comparison results, J. Comput. Appl. Math., doi:10.1016/j.cam.2007.07.035, 2007.
[5] C. D. Meyer, Matrix analysis and applied linear algebra, SIAM, 2004.
[6] Y. Saad, Iterative methods for sparse linear systems, PWS Press, New York, 1995.
[7] N. Ujević, A new iterative method for solving linear systems, Appl. Math. Comput. 179 (2006) 725-730.