Randomized extended block Kaczmarz for solving least squares
Kui Du∗, Wu-Tao Si†, Xiao-Hui Sun‡

∗ School of Mathematical Sciences, Xiamen University, Xiamen 361005, China ([email protected]).
† School of Mathematical Sciences, Xiamen University, Xiamen 361005, China.
‡ School of Mathematical Sciences, Xiamen University, Xiamen 361005, China.

Abstract
Randomized iterative algorithms have recently been proposed to solve large-scale linear systems. In this paper, we present a simple randomized extended block Kaczmarz algorithm that exponentially converges in the mean square to the unique minimum $\ell_2$-norm least squares solution of a given linear system of equations. The proposed algorithm is pseudoinverse-free and therefore different from the projection-based randomized double block Kaczmarz algorithm of Needell, Zhao, and Zouzias. We emphasize that our method works for all types of linear systems (consistent or inconsistent, overdetermined or underdetermined, full-rank or rank-deficient). Moreover, our approach can utilize efficient implementations on distributed computing units, yielding remarkable improvements in computational time. Numerical examples are given to show the efficiency of the new algorithm.

Keywords: general linear systems, minimum $\ell_2$-norm least squares solution, randomized extended (block) Kaczmarz, exponential convergence

AMS subject classifications: 65F10, 65F20
1 Introduction

The Kaczmarz method [27] is a simple iterative method for solving a linear system of equations
$$Ax = b, \qquad A\in\mathbb{R}^{m\times n}, \qquad b\in\mathbb{R}^m.$$
Due to its simplicity and numerical performance, the Kaczmarz method has found applications in many fields, such as computed tomography [34, 28, 24], image reconstruction [45, 25], digital signal processing [9, 32], etc. At each step, the method projects the current iterate onto one hyperplane defined by a row of the system. More precisely, assuming that the $i$th row $A_{i,:}$ has been selected at the $k$th iteration, the $k$th estimate vector $x^k$ is obtained by
$$x^k = x^{k-1} - \alpha_k\,\frac{A_{i,:}x^{k-1}-b_i}{A_{i,:}(A_{i,:})^{\mathsf T}}\,(A_{i,:})^{\mathsf T},$$
where $(A_{i,:})^{\mathsf T}$ denotes the transpose of $A_{i,:}$, $b_i$ is the $i$th component of $b$, and $\alpha_k$ is a stepsize.

Numerical experiments show that using the rows of the coefficient matrix in the Kaczmarz method in random order, rather than in their given order, can often greatly improve the convergence [26, 34]. In a seminal paper [48], Strohmer and Vershynin proposed a randomized Kaczmarz (RK) algorithm which exponentially converges in expectation to the solution of consistent, overdetermined, full-rank linear systems. The convergence result was extended and refined in various directions, including inconsistent [29, 37, 51, 17, 41, 38, 21], underdetermined or rank-deficient [33, 20, 46, 15] linear systems, ridge regression problems [23, 31], linear feasibility problems [11], convex feasibility problems [36], block variants [39, 40, 35], acceleration strategies [30, 47, 3, 4, 5, 6, 50], and many others [2, 44, 13, 14, 22].

Let $A^\dagger$ denote the Moore-Penrose pseudoinverse [7] of $A$. In this paper, we are interested in the vector $A^\dagger b$. Here we would like to make clear what $A^\dagger b$ stands for in different types of linear systems (see [7, 19]):

(1) If $Ax=b$ is consistent with full column rank $A$, i.e., $\mathrm{rank}(A)=n$, then $A^\dagger b$ is the unique solution. In this case, we have $m\ge n$ and the linear system is overdetermined when $m>n$.

(2) If $Ax=b$ is consistent with $\mathrm{rank}(A)<n$, then $A^\dagger b$ is the unique minimum $\ell_2$-norm solution. In this case, we have $m\ge n$ or $m<n$, and the linear system is overdetermined (resp. underdetermined) when $m>n$ (resp. $m<n$). The matrix $A$ can be of full row rank, i.e., $\mathrm{rank}(A)=m$, or rank-deficient, i.e., $\mathrm{rank}(A)<m$.

(3) If $Ax=b$ is inconsistent with $\mathrm{rank}(A)=n$, then $A^\dagger b$ is the unique least squares solution. In this case, we have $m\ge n$ and the linear system is overdetermined when $m>n$.

(4) If $Ax=b$ is inconsistent with $\mathrm{rank}(A)<n$, then $A^\dagger b$ is the unique minimum $\ell_2$-norm least squares solution. In this case, we have $m\ge n$ or $m<n$, and the linear system is overdetermined (resp. underdetermined) when $m>n$ (resp. $m<n$). The matrix $A$ can be of full row rank, i.e., $\mathrm{rank}(A)=m$, or rank-deficient, i.e., $\mathrm{rank}(A)<m$.

If $Ax=b$ is inconsistent, Needell [37] showed that RK does not converge to $A^\dagger b$. To resolve this problem, Zouzias and Freris [51] proposed a randomized extended Kaczmarz (REK) algorithm, which uses RK twice [30, 13] at each iteration and exponentially converges in the mean square to $A^\dagger b$.
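As a concrete illustration of the RK update, the following MATLAB sketch runs RK with unit stepsize ($\alpha_k=1$) and the row-sampling probabilities $\|A_{i,:}\|_2^2/\|A\|_{\mathrm F}^2$ of [48]; the function name and interface are our own illustration, not code from the cited papers.

    function x = rk(A, b, K)
    % Randomized Kaczmarz sketch: K steps with unit stepsize; row i is
    % sampled with probability ||A(i,:)||_2^2 / ||A||_F^2 as in [48].
    [~, n] = size(A);
    x = zeros(n, 1);
    rn2 = sum(A.^2, 2);            % squared row norms
    P = cumsum(rn2) / sum(rn2);    % cumulative sampling distribution
    for k = 1:K
        i = find(P >= rand, 1);    % sample a row index
        x = x + ((b(i) - A(i,:)*x) / rn2(i)) * A(i,:)';
    end
    end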
More precisely for REK: assuming that the $j$th column $A_{:,j}$ and the $i$th row $A_{i,:}$ have been selected at the $k$th iteration, REK generates two vectors $z^k$ and $x^k$ via two RK updates (one for $A^{\mathsf T}z=0$ from $z^{k-1}$ and the other for $Ax=b-z^k$ from $x^{k-1}$):
$$z^k = z^{k-1} - \frac{(A_{:,j})^{\mathsf T}z^{k-1}}{(A_{:,j})^{\mathsf T}A_{:,j}}\,A_{:,j}, \qquad x^k = x^{k-1} - \frac{A_{i,:}x^{k-1}-b_i+z_i^k}{A_{i,:}(A_{i,:})^{\mathsf T}}\,(A_{i,:})^{\mathsf T}.$$
For general linear systems (consistent or inconsistent, full-rank or rank-deficient), the vector $x^k$ generated by REK exponentially converges to $A^\dagger b$ if $z^0\in b+\mathrm{range}(A)$ and $x^0\in\mathrm{range}(A^{\mathsf T})$ [30, 13].

To accelerate the convergence, the following projection-based block variants [39, 40] of RK and REK were developed. For a subset $\mathcal I\subseteq\{1,2,\ldots,m\}$ and a subset $\mathcal J\subseteq\{1,2,\ldots,n\}$, denote by $A_{\mathcal I,:}$ and $A_{:,\mathcal J}$ the row submatrix of $A$ indexed by $\mathcal I$ and the column submatrix of $A$ indexed by $\mathcal J$, respectively. Assuming that the subset $\mathcal I_i$ has been selected at the $k$th iteration, the randomized block Kaczmarz (RBK) algorithm [39] generates the $k$th estimate $x^k$ via
$$x^k = x^{k-1} - (A_{\mathcal I_i,:})^\dagger(A_{\mathcal I_i,:}x^{k-1} - b_{\mathcal I_i}).$$
Assuming that the subsets $\mathcal J_j$ and $\mathcal I_i$ have been selected at the $k$th iteration, the randomized double block Kaczmarz (RDBK) algorithm [40] generates the $k$th estimate $x^k$ via
$$z^k = z^{k-1} - A_{:,\mathcal J_j}(A_{:,\mathcal J_j})^\dagger z^{k-1}, \qquad x^k = x^{k-1} - (A_{\mathcal I_i,:})^\dagger(A_{\mathcal I_i,:}x^{k-1} - b_{\mathcal I_i} + z^k_{\mathcal I_i}).$$
(Every $m\times n$ matrix $A$ has a unique Moore-Penrose pseudoinverse. In particular, in this paper we will use the following property of the pseudoinverse: $A^{\mathsf T} = A^{\mathsf T}AA^\dagger$.)
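To make the projection-based updates concrete, here is a minimal MATLAB fragment for one RDBK iteration, assuming index vectors Jj and Ii and iterates z and x are already in scope; following the implementation note in Section 3, the minimum-norm least squares subproblems are solved with lsqminnorm rather than by forming pseudoinverses explicitly. The variable names are ours.

    % One RDBK step for selected column block Jj and row block Ii.
    % lsqminnorm(M, v) returns the minimum-norm least squares solution,
    % i.e., pinv(M) * v, without forming the pseudoinverse.
    z = z - A(:,Jj) * lsqminnorm(A(:,Jj), z);
    r = A(Ii,:) * x - b(Ii) + z(Ii);        % block residual
    x = x - lsqminnorm(A(Ii,:), r);         % x = x - pinv(A(Ii,:)) * r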
The randomized average block Kaczmarz (RABK) algorithm of Necoara [35] is a pseudoinverse-free block variant of RK. Assuming that the subset $\mathcal I$ has been selected at the $k$th iteration, RABK generates the $k$th estimate $x^k$ via
$$x^k = x^{k-1} - \alpha_k\left(\sum_{i\in\mathcal I}\omega_i^k\,\frac{A_{i,:}x^{k-1}-b_i}{A_{i,:}(A_{i,:})^{\mathsf T}}\,(A_{i,:})^{\mathsf T}\right), \qquad (1)$$
where the weights $\omega_i^k\in[0,1]$ are such that $\sum_{i\in\mathcal I}\omega_i^k=1$, and $\alpha_k$ is a positive stepsize. The doubly stochastic block Gauss-Seidel (DSBGS) algorithm of Du and Sun [15] is another pseudoinverse-free method. Assuming that the index sets $\mathcal I$ and $\mathcal J$ have been selected at the $k$th iteration, DSBGS generates the $k$th estimate $x^k$ via
$$x^k = x^{k-1} - \alpha_k\,\frac{I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^{\mathsf T}(I_{:,\mathcal I})^{\mathsf T}}{\|A_{\mathcal I,\mathcal J}\|_{\mathrm F}^2}\,(Ax^{k-1}-b),$$
where $I$ denotes the identity matrix, $A_{\mathcal I,\mathcal J}$ denotes the submatrix that lies in the rows indexed by $\mathcal I$ and the columns indexed by $\mathcal J$, and $\|\cdot\|_{\mathrm F}$ is the Frobenius norm. Exponential convergence of DSBGS for consistent linear systems was proved. By setting $\mathcal I\subseteq\{1,2,\ldots,m\}$ and $\mathcal J=\{1,2,\ldots,n\}$, DSBGS recovers a special case of RABK, i.e., RABK with weights
$$\omega_i^k = \frac{A_{i,:}(A_{i,:})^{\mathsf T}}{\|A_{\mathcal I,:}\|_{\mathrm F}^2}, \qquad i\in\mathcal I.$$
Note that both RABK and DSBGS are very easy to implement on distributed computing units, yielding remarkable improvements in computational time. We emphasize that mean square convergence results for RABK and DSBGS have been obtained only for consistent linear systems.

In this paper, based on the REK algorithm and the RABK algorithm, we present a simple randomized extended block Kaczmarz (REBK) algorithm that exponentially converges in the mean square to the unique minimum $\ell_2$-norm (least squares) solution of a given general linear system (full-rank or rank-deficient, overdetermined or underdetermined, consistent or inconsistent). Our method is different from the projection-based block methods, for example, those in [18, 1, 8, 43, 39, 40, 16]. At each step, REBK, as a direct extension of REK, uses two special RABK updates (which can also be viewed as special DSBGS updates): one for $A^{\mathsf T}z=0$ from $z^{k-1}$ and the other for $Ax=b-z^k$ from $x^{k-1}$; see Section 2 for details. Compared with REK, REBK usually has a better convergence rate and can exploit high-level basic linear algebra subroutines (BLAS2), and even fast matrix-vector multiplies (for example, if submatrices of $A$ have circulant or Toeplitz structure, then the fast Fourier transform can be used), and therefore could be more efficient. Compared with RDBK, REBK can be implemented on distributed computing units. We refer the reader to [39, 35] for more advantages of block methods. Numerical examples are given to illustrate the efficiency of REBK.

Organization of the paper. In the rest of this section, we give some notation. In Section 2 we describe the randomized extended block Kaczmarz algorithm and prove its convergence theory. Both the exponential convergence of the norm of the expected error and the exponential convergence of the expected norm of the error are discussed. In Section 3 we report the numerical results. Finally, we present brief concluding remarks in Section 4.
Notation. For any random variable $\xi$, let $\mathbb E[\xi]$ denote its expectation. For an integer $m\ge 1$, let $[m] := \{1,2,\ldots,m\}$. Lowercase (uppercase) boldface letters are reserved for column vectors (matrices). For any vector $u\in\mathbb R^m$, we use $u_i$, $u^{\mathsf T}$, and $\|u\|_2$ to denote the $i$th element, the transpose, and the $\ell_2$-norm of $u$, respectively. We use $I$ to denote the identity matrix whose order is clear from the context. For any matrix $A\in\mathbb R^{m\times n}$, we use $A^{\mathsf T}$, $A^\dagger$, $\|A\|_2$, $\|A\|_{\mathrm F}$, $\mathrm{range}(A)$, and $\sigma_1(A)\ge\sigma_2(A)\ge\cdots\ge\sigma_r(A)>0$ to denote the transpose, the Moore-Penrose pseudoinverse, the spectral norm, the Frobenius norm, the range, and the nonzero singular values of $A$, respectively. Obviously, $r$ is the rank of $A$. For index sets $\mathcal I\subseteq[m]$ and $\mathcal J\subseteq[n]$, let $A_{\mathcal I,:}$, $A_{:,\mathcal J}$, and $A_{\mathcal I,\mathcal J}$ denote the row submatrix indexed by $\mathcal I$, the column submatrix indexed by $\mathcal J$, and the submatrix that lies in the rows indexed by $\mathcal I$ and the columns indexed by $\mathcal J$, respectively. We call $\{\mathcal I_1,\mathcal I_2,\ldots,\mathcal I_s\}$ a partition of $[m]$ if $\mathcal I_i\cap\mathcal I_j=\emptyset$ for $i\ne j$ and $\cup_{i=1}^s\mathcal I_i=[m]$. Similarly, $\{\mathcal J_1,\mathcal J_2,\ldots,\mathcal J_t\}$ denotes a partition of $[n]$ if $\mathcal J_i\cap\mathcal J_j=\emptyset$ for $i\ne j$ and $\cup_{j=1}^t\mathcal J_j=[n]$. We use $|\mathcal I|$ to denote the cardinality of a set $\mathcal I\subseteq[m]$.

2 The randomized extended block Kaczmarz algorithm

In this section, based on given partitions of $[m]$ and $[n]$, we propose the following randomized extended block Kaczmarz algorithm (see Algorithm 1) for solving consistent or inconsistent linear systems. We emphasize that this algorithm can be implemented on distributed computing units.
Algorithm 1: Randomized extended block Kaczmarz (REBK)
  Let $\{\mathcal I_1,\mathcal I_2,\ldots,\mathcal I_s\}$ and $\{\mathcal J_1,\mathcal J_2,\ldots,\mathcal J_t\}$ be partitions of $[m]$ and $[n]$, respectively. Let $\alpha>0$. Initialize $z^0\in\mathbb R^m$ and $x^0\in\mathbb R^n$.
  for $k=1,2,\ldots$, do
    Pick $j\in[t]$ with probability $\|A_{:,\mathcal J_j}\|_{\mathrm F}^2/\|A\|_{\mathrm F}^2$
    Set $z^k = z^{k-1} - \frac{\alpha}{\|A_{:,\mathcal J_j}\|_{\mathrm F}^2}A_{:,\mathcal J_j}(A_{:,\mathcal J_j})^{\mathsf T}z^{k-1}$
    Pick $i\in[s]$ with probability $\|A_{\mathcal I_i,:}\|_{\mathrm F}^2/\|A\|_{\mathrm F}^2$
    Set $x^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal I_i,:}\|_{\mathrm F}^2}(A_{\mathcal I_i,:})^{\mathsf T}(A_{\mathcal I_i,:}x^{k-1} - b_{\mathcal I_i} + z^k_{\mathcal I_i})$

Here we only consider a constant stepsize for simplicity. By choosing the row partition parameter $s=m$, the column partition parameter $t=n$, and the stepsize $\alpha=1$, we recover the well-known randomized extended Kaczmarz algorithm of Zouzias and Freris [51]. REBK uses two RABK updates (see (1)) at each step:

• RABK update for $A^{\mathsf T}z=0$ from $z^{k-1}$:
$$z^k = z^{k-1} - \alpha\sum_{l\in\mathcal J_j}\omega_l^k\,\frac{(A_{:,l})^{\mathsf T}z^{k-1}}{(A_{:,l})^{\mathsf T}A_{:,l}}\,A_{:,l}, \qquad \omega_l^k = \frac{(A_{:,l})^{\mathsf T}A_{:,l}}{\|A_{:,\mathcal J_j}\|_{\mathrm F}^2};$$

• RABK update for $Ax=b-z^k$ from $x^{k-1}$:
$$x^k = x^{k-1} - \alpha\sum_{l\in\mathcal I_i}\omega_l^k\,\frac{A_{l,:}x^{k-1}-b_l+z_l^k}{A_{l,:}(A_{l,:})^{\mathsf T}}\,(A_{l,:})^{\mathsf T}, \qquad \omega_l^k = \frac{A_{l,:}(A_{l,:})^{\mathsf T}}{\|A_{\mathcal I_i,:}\|_{\mathrm F}^2}.$$
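A compact MATLAB sketch of Algorithm 1 follows; the function name, the fixed iteration budget, and the contiguous blocks of size tau (the partition later used in Section 3) are illustrative choices on our part, not part of the algorithm's specification.

    function [x, z] = rebk(A, b, alpha, tau, K)
    % Sketch of Algorithm 1 (REBK) with contiguous blocks of size tau,
    % constant stepsize alpha, and initial vectors z^0 = b, x^0 = 0.
    [m, n] = size(A);
    z = b; x = zeros(n, 1);
    Iblk = blocks(m, tau); Jblk = blocks(n, tau);
    fI = cellfun(@(I) norm(A(I,:), 'fro')^2, Iblk);   % ||A_{I_i,:}||_F^2
    fJ = cellfun(@(J) norm(A(:,J), 'fro')^2, Jblk);   % ||A_{:,J_j}||_F^2
    pI = cumsum(fI) / sum(fI); pJ = cumsum(fJ) / sum(fJ);
    for k = 1:K
        j = find(pJ >= rand, 1); J = Jblk{j};          % column block J_j
        z = z - (alpha / fJ(j)) * (A(:,J) * (A(:,J)' * z));
        i = find(pI >= rand, 1); I = Iblk{i};          % row block I_i
        x = x - (alpha / fI(i)) * (A(I,:)' * (A(I,:)*x - b(I) + z(I)));
    end
    end

    function C = blocks(m, tau)
    % Contiguous partition {1..tau, tau+1..2*tau, ...} of [m].
    s = ceil(m / tau);
    C = arrayfun(@(i) (i-1)*tau+1 : min(i*tau, m), 1:s, 'UniformOutput', false);
    end

With tau = 1 (so that $s=m$ and $t=n$) and alpha = 1, this sketch reduces to REK.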
We note that if $z^0=0$ in REBK, then $z^k\equiv 0$ for all $k$, which yields that the update of $x^k$ is exactly the same as that of RABK.

Before proving the convergence theory of REBK for general linear systems, we give the following notation. Let $\mathbb E_{k-1}[\,\cdot\,]$ denote the conditional expectation conditioned on the first $k-1$ iterations, i.e.,
$$\mathbb E_{k-1}[\,\cdot\,] = \mathbb E[\,\cdot\mid j_1,i_1,j_2,i_2,\ldots,j_{k-1},i_{k-1}],$$
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen. We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as
$$\mathbb E^i_{k-1}[\,\cdot\,] = \mathbb E[\,\cdot\mid j_1,i_1,j_2,i_2,\ldots,j_{k-1},i_{k-1},j_k].$$
Then by the law of total expectation we have $\mathbb E_{k-1}[\,\cdot\,] = \mathbb E_{k-1}\big[\mathbb E^i_{k-1}[\,\cdot\,]\big]$.

In this subsection we show the exponential convergence of the norm of the expected error, i.e., $\|\mathbb E[x^k - A^\dagger b]\|_2$. The convergence of the norm of the expected error depends on the positive number $\delta$ defined as
$$\delta := \max_{1\le i\le r}\left|1-\frac{\alpha\sigma_i^2(A)}{\|A\|_{\mathrm F}^2}\right|.$$
The following lemma will be used and its proof is straightforward (e.g., via the singular value decomposition).
Lemma 1.
Let $\alpha>0$ and let $A\in\mathbb R^{m\times n}$ be any nonzero real matrix with $\mathrm{rank}(A)=r$. For every $u\in\mathrm{range}(A^{\mathsf T})$, it holds that
$$\left\|\left(I-\alpha\frac{A^{\mathsf T}A}{\|A\|_{\mathrm F}^2}\right)^k u\right\|_2 \le \delta^k\|u\|_2.$$
We give the convergence of the norm of the expected error of REBK in the following theorem.
Theorem 2.
For any given consistent or inconsistent linear system $Ax=b$, let $x^k$ be the $k$th iterate of REBK with $z^0\in\mathbb R^m$ and $x^0\in\mathrm{range}(A^{\mathsf T})$. It holds that
$$\|\mathbb E[x^k - A^\dagger b]\|_2 \le \delta^k\left(\|x^0-A^\dagger b\|_2 + \frac{\alpha k\,\|A^{\mathsf T}z^0\|_2}{\|A\|_{\mathrm F}^2}\right).$$

Proof.
Note that
$$\mathbb E_{k-1}[z^k] = z^{k-1} - \mathbb E_{k-1}\left[\frac{\alpha}{\|A_{:,\mathcal J_j}\|_{\mathrm F}^2}A_{:,\mathcal J_j}(A_{:,\mathcal J_j})^{\mathsf T}\right]z^{k-1} = \left(I-\alpha\frac{AA^{\mathsf T}}{\|A\|_{\mathrm F}^2}\right)z^{k-1},$$
and therefore
$$\mathbb E[z^k] = \mathbb E\big[\mathbb E_{k-1}[z^k]\big] = \left(I-\alpha\frac{AA^{\mathsf T}}{\|A\|_{\mathrm F}^2}\right)\mathbb E[z^{k-1}] = \left(I-\alpha\frac{AA^{\mathsf T}}{\|A\|_{\mathrm F}^2}\right)^k z^0.$$
By $A^{\mathsf T}b = A^{\mathsf T}AA^\dagger b$, we have
$$\mathbb E_{k-1}[x^k - A^\dagger b] = \mathbb E_{k-1}\left[\mathbb E^i_{k-1}\left[x^{k-1}-A^\dagger b - \frac{\alpha}{\|A_{\mathcal I_i,:}\|_{\mathrm F}^2}(A_{\mathcal I_i,:})^{\mathsf T}(A_{\mathcal I_i,:}x^{k-1}-b_{\mathcal I_i}+z^k_{\mathcal I_i})\right]\right]$$
$$= \mathbb E_{k-1}\left[x^{k-1}-A^\dagger b - \alpha\frac{A^{\mathsf T}(Ax^{k-1}-b+z^k)}{\|A\|_{\mathrm F}^2}\right] = x^{k-1}-A^\dagger b - \alpha\frac{A^{\mathsf T}Ax^{k-1}-A^{\mathsf T}AA^\dagger b}{\|A\|_{\mathrm F}^2} - \alpha\frac{A^{\mathsf T}}{\|A\|_{\mathrm F}^2}\mathbb E_{k-1}[z^k]$$
$$= \left(I-\alpha\frac{A^{\mathsf T}A}{\|A\|_{\mathrm F}^2}\right)(x^{k-1}-A^\dagger b) - \alpha\frac{A^{\mathsf T}}{\|A\|_{\mathrm F}^2}\left(I-\alpha\frac{AA^{\mathsf T}}{\|A\|_{\mathrm F}^2}\right)z^{k-1}.$$
Taking expectation and using $A^{\mathsf T}\big(I-\alpha AA^{\mathsf T}/\|A\|_{\mathrm F}^2\big)^k = \big(I-\alpha A^{\mathsf T}A/\|A\|_{\mathrm F}^2\big)^k A^{\mathsf T}$ gives
$$\mathbb E[x^k-A^\dagger b] = \left(I-\alpha\frac{A^{\mathsf T}A}{\|A\|_{\mathrm F}^2}\right)\mathbb E[x^{k-1}-A^\dagger b] - \alpha\left(I-\alpha\frac{A^{\mathsf T}A}{\|A\|_{\mathrm F}^2}\right)^k\frac{A^{\mathsf T}z^0}{\|A\|_{\mathrm F}^2}$$
$$= \cdots = \left(I-\alpha\frac{A^{\mathsf T}A}{\|A\|_{\mathrm F}^2}\right)^k(x^0-A^\dagger b) - \alpha k\left(I-\alpha\frac{A^{\mathsf T}A}{\|A\|_{\mathrm F}^2}\right)^k\frac{A^{\mathsf T}z^0}{\|A\|_{\mathrm F}^2}.$$
Applying norms to both sides and the triangle inequality, we obtain
$$\|\mathbb E[x^k-A^\dagger b]\|_2 \le \left\|\left(I-\alpha\frac{A^{\mathsf T}A}{\|A\|_{\mathrm F}^2}\right)^k(x^0-A^\dagger b)\right\|_2 + \left\|\alpha k\left(I-\alpha\frac{A^{\mathsf T}A}{\|A\|_{\mathrm F}^2}\right)^k\frac{A^{\mathsf T}z^0}{\|A\|_{\mathrm F}^2}\right\|_2 \le \delta^k\left(\|x^0-A^\dagger b\|_2 + \frac{\alpha k\,\|A^{\mathsf T}z^0\|_2}{\|A\|_{\mathrm F}^2}\right).$$
Here the last inequality follows from the facts that $x^0-A^\dagger b\in\mathrm{range}(A^{\mathsf T})$ (since $x^0\in\mathrm{range}(A^{\mathsf T})$ and $A^\dagger b\in\mathrm{range}(A^{\mathsf T})$) and $A^{\mathsf T}z^0\in\mathrm{range}(A^{\mathsf T})$, and Lemma 1. $\square$

Remark 3.
To ensure convergence of the expected error, it suffices to have
$$\delta = \max_{1\le i\le r}\left|1-\frac{\alpha\sigma_i^2(A)}{\|A\|_{\mathrm F}^2}\right| < 1,$$
which implies
$$0 < \alpha < \frac{2\|A\|_{\mathrm F}^2}{\sigma_1^2(A)}.$$
The optimal $\alpha$ in Theorem 2 is (see [42])
$$\frac{2\|A\|_{\mathrm F}^2}{\sigma_1^2(A)+\sigma_r^2(A)} = \mathop{\mathrm{argmin}}_{0<\alpha<2\|A\|_{\mathrm F}^2/\sigma_1^2(A)}\ \max_{1\le i\le r}\left|1-\frac{\alpha\sigma_i^2(A)}{\|A\|_{\mathrm F}^2}\right|,$$
and the corresponding convergence rate $\delta$ is
$$\frac{\sigma_1^2(A)-\sigma_r^2(A)}{\sigma_1^2(A)+\sigma_r^2(A)}.$$

In this subsection we show the exponential convergence of the expected norm of the error, i.e., $\mathbb E[\|x^k-A^\dagger b\|_2^2]$. The convergence of the expected norm of the error depends on the positive numbers $\eta$ and $\rho$ defined as
$$\eta := 1 - (2\alpha-\alpha^2\beta^{\mathcal I}_{\max})\frac{\sigma_r^2(A)}{\|A\|_{\mathrm F}^2}, \qquad \rho := 1 - (2\alpha-\alpha^2\beta^{\mathcal J}_{\max})\frac{\sigma_r^2(A)}{\|A\|_{\mathrm F}^2},$$
where
$$\beta^{\mathcal I}_{\max} := \max_{i\in[s]}\frac{\|A_{\mathcal I_i,:}\|_2^2}{\|A_{\mathcal I_i,:}\|_{\mathrm F}^2}, \qquad \beta^{\mathcal J}_{\max} := \max_{j\in[t]}\frac{\|A_{:,\mathcal J_j}\|_2^2}{\|A_{:,\mathcal J_j}\|_{\mathrm F}^2}.$$
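These quantities are inexpensive to evaluate for a concrete matrix and partition. The following MATLAB sketch, with illustrative data of our own choosing, computes $\beta^{\mathcal I}_{\max}$, $\beta^{\mathcal J}_{\max}$, $\eta$, and $\rho$ for contiguous blocks of size tau, together with the optimal stepsize of Remark 3.

    A = randn(500, 250); tau = 10; alpha = 1;      % illustrative data
    sv = svd(A); r = rank(A);
    nF2 = norm(A, 'fro')^2;
    alpha_opt = 2*nF2 / (sv(1)^2 + sv(r)^2);       % optimal alpha (Remark 3)
    betaI = 0; betaJ = 0;
    for i = 1:tau:size(A, 1)                       % row blocks I_i
        B = A(i:min(i+tau-1, end), :);
        betaI = max(betaI, norm(B)^2 / norm(B, 'fro')^2);
    end
    for j = 1:tau:size(A, 2)                       % column blocks J_j
        B = A(:, j:min(j+tau-1, end));
        betaJ = max(betaJ, norm(B)^2 / norm(B, 'fro')^2);
    end
    eta = 1 - (2*alpha - alpha^2*betaI) * sv(r)^2 / nF2;
    rho = 1 - (2*alpha - alpha^2*betaJ) * sv(r)^2 / nF2;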
The following lemmas will be used extensively in this paper.

Lemma 4.
Let $A\in\mathbb R^{m\times n}$ be any nonzero real matrix with $\mathrm{rank}(A)=r$. For every $u\in\mathrm{range}(A)$, it holds that $\|A^{\mathsf T}u\|_2 \ge \sigma_r(A)\|u\|_2$.

Lemma 5.
Let $A\in\mathbb R^{m\times n}$ be any nonzero real matrix. For every $u\in\mathbb R^m$, it holds that
$$u^{\mathsf T}(AA^{\mathsf T})^2u \le \|A\|_2^2\,\|A^{\mathsf T}u\|_2^2.$$
The proof of Lemma 4 is straightforward (e.g., via the singular value decomposition), and Lemma 5 follows from
$$u^{\mathsf T}(AA^{\mathsf T})^2u = u^{\mathsf T}A(A^{\mathsf T}A)A^{\mathsf T}u \le \|A^{\mathsf T}A\|_2\,\|A^{\mathsf T}u\|_2^2 = \|A\|_2^2\,\|A^{\mathsf T}u\|_2^2.$$
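As a quick numerical sanity check (not part of the analysis), the following MATLAB fragment verifies both inequalities on a random rank-deficient matrix; the sizes are hypothetical.

    r0 = 5;
    A = randn(30, r0) * randn(r0, 20);     % rank-5 test matrix
    sv = svd(A);                           % sv(r0) = smallest nonzero singular value
    u = A * randn(20, 1);                  % u in range(A), for Lemma 4
    assert(norm(A'*u) >= sv(r0)*norm(u) - 1e-8);
    v = randn(30, 1);                      % arbitrary vector in R^m, for Lemma 5
    assert(v'*(A*A')^2*v <= norm(A)^2 * norm(A'*v)^2 + 1e-8);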
In the following lemma we show that the vector $z^k$ generated in REBK with $z^0\in b+\mathrm{range}(A)$ converges to $b_\perp := (I-AA^\dagger)b$, which is the orthogonal projection of $b$ onto the set $\{z\mid A^{\mathsf T}z=0\}$.

Lemma 6. For any given consistent or inconsistent linear system $Ax=b$, let $z^k$ be the vector generated in REBK with $z^0\in b+\mathrm{range}(A)$. Assume $0<\alpha<2/\beta^{\mathcal J}_{\max}$. It holds that
$$\mathbb E[\|z^k-b_\perp\|_2^2] \le \rho^k\|z^0-b_\perp\|_2^2.$$

Proof.
By $(A_{:,\mathcal J_j})^{\mathsf T}b_\perp = 0$, we have
$$z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal J_j}\|_{\mathrm F}^2}A_{:,\mathcal J_j}(A_{:,\mathcal J_j})^{\mathsf T}(z^{k-1}-b_\perp). \qquad (2)$$
By $z^0-b_\perp = AA^\dagger z^0\in\mathrm{range}(A)$ and $A_{:,\mathcal J_j}(A_{:,\mathcal J_j})^{\mathsf T}(z^{k-1}-b_\perp)\in\mathrm{range}(A)$, we can show that $z^k-b_\perp\in\mathrm{range}(A)$ by induction. It follows from (2) that
$$\|z^k-b_\perp\|_2^2 = \|z^{k-1}-b_\perp\|_2^2 - \frac{2\alpha\|(A_{:,\mathcal J_j})^{\mathsf T}(z^{k-1}-b_\perp)\|_2^2}{\|A_{:,\mathcal J_j}\|_{\mathrm F}^2} + \alpha^2(z^{k-1}-b_\perp)^{\mathsf T}\left(\frac{A_{:,\mathcal J_j}(A_{:,\mathcal J_j})^{\mathsf T}}{\|A_{:,\mathcal J_j}\|_{\mathrm F}^2}\right)^2(z^{k-1}-b_\perp)$$
$$\le \|z^{k-1}-b_\perp\|_2^2 - \left(2\alpha-\alpha^2\frac{\|A_{:,\mathcal J_j}\|_2^2}{\|A_{:,\mathcal J_j}\|_{\mathrm F}^2}\right)\frac{\|(A_{:,\mathcal J_j})^{\mathsf T}(z^{k-1}-b_\perp)\|_2^2}{\|A_{:,\mathcal J_j}\|_{\mathrm F}^2} \quad\text{(by Lemma 5)}$$
$$\le \|z^{k-1}-b_\perp\|_2^2 - (2\alpha-\alpha^2\beta^{\mathcal J}_{\max})\frac{\|(A_{:,\mathcal J_j})^{\mathsf T}(z^{k-1}-b_\perp)\|_2^2}{\|A_{:,\mathcal J_j}\|_{\mathrm F}^2}.$$
Taking the conditional expectation conditioned on the first $k-1$ iterations gives
$$\mathbb E_{k-1}[\|z^k-b_\perp\|_2^2] \le \|z^{k-1}-b_\perp\|_2^2 - (2\alpha-\alpha^2\beta^{\mathcal J}_{\max})\frac{\|A^{\mathsf T}(z^{k-1}-b_\perp)\|_2^2}{\|A\|_{\mathrm F}^2}$$
$$\le \|z^{k-1}-b_\perp\|_2^2 - (2\alpha-\alpha^2\beta^{\mathcal J}_{\max})\frac{\sigma_r^2(A)}{\|A\|_{\mathrm F}^2}\|z^{k-1}-b_\perp\|_2^2 \quad\text{(by Lemma 4 and $0<\alpha<2/\beta^{\mathcal J}_{\max}$)}$$
$$= \rho\,\|z^{k-1}-b_\perp\|_2^2.$$
Taking expectation again gives
$$\mathbb E[\|z^k-b_\perp\|_2^2] = \mathbb E\big[\mathbb E_{k-1}[\|z^k-b_\perp\|_2^2]\big] \le \rho\,\mathbb E[\|z^{k-1}-b_\perp\|_2^2] \le \rho^k\|z^0-b_\perp\|_2^2.$$
This completes the proof. $\square$

We give the main convergence result of REBK in the following theorem.
Theorem 7.
For any given consistent or inconsistent linear system $Ax=b$, let $x^k$ be the $k$th iterate of REBK with $z^0\in b+\mathrm{range}(A)$ and $x^0\in\mathrm{range}(A^{\mathsf T})$. Assume that $0<\alpha<2/\max(\beta^{\mathcal I}_{\max},\beta^{\mathcal J}_{\max})$. For any $\varepsilon>0$, it holds that
$$\mathbb E[\|x^k-A^\dagger b\|_2^2] \le (1+\varepsilon)^k\eta^k\|x^0-A^\dagger b\|_2^2 + \left(1+\frac{1}{\varepsilon}\right)\frac{\alpha^2\beta^{\mathcal I}_{\max}\|z^0-b_\perp\|_2^2}{\|A\|_{\mathrm F}^2}\sum_{l=0}^{k-1}\rho^{k-l}(1+\varepsilon)^l\eta^l.$$

Proof.
Let
$$\widehat x^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal I_i,:}\|_{\mathrm F}^2}(A_{\mathcal I_i,:})^{\mathsf T}A_{\mathcal I_i,:}(x^{k-1}-A^\dagger b),$$
which is actually one RABK update for the linear system $Ax=AA^\dagger b$ from $x^{k-1}$. It follows from
$$x^k - \widehat x^k = \frac{\alpha}{\|A_{\mathcal I_i,:}\|_{\mathrm F}^2}(A_{\mathcal I_i,:})^{\mathsf T}(b_{\mathcal I_i}-A_{\mathcal I_i,:}A^\dagger b - z^k_{\mathcal I_i})$$
that
$$\|x^k-\widehat x^k\|_2^2 = \frac{\alpha^2}{\|A_{\mathcal I_i,:}\|_{\mathrm F}^4}\|(A_{\mathcal I_i,:})^{\mathsf T}(b_{\mathcal I_i}-A_{\mathcal I_i,:}A^\dagger b-z^k_{\mathcal I_i})\|_2^2 \le \frac{\alpha^2\|A_{\mathcal I_i,:}\|_2^2}{\|A_{\mathcal I_i,:}\|_{\mathrm F}^4}\|b_{\mathcal I_i}-A_{\mathcal I_i,:}A^\dagger b-z^k_{\mathcal I_i}\|_2^2 \le \frac{\alpha^2\beta^{\mathcal I}_{\max}}{\|A_{\mathcal I_i,:}\|_{\mathrm F}^2}\|b_{\mathcal I_i}-A_{\mathcal I_i,:}A^\dagger b-z^k_{\mathcal I_i}\|_2^2. \qquad (3)$$
It follows from
$$\mathbb E_{k-1}[\|x^k-\widehat x^k\|_2^2] = \mathbb E_{k-1}\big[\mathbb E^i_{k-1}[\|x^k-\widehat x^k\|_2^2]\big] \le \mathbb E_{k-1}\left[\mathbb E^i_{k-1}\left[\frac{\alpha^2\beta^{\mathcal I}_{\max}}{\|A_{\mathcal I_i,:}\|_{\mathrm F}^2}\|b_{\mathcal I_i}-A_{\mathcal I_i,:}A^\dagger b-z^k_{\mathcal I_i}\|_2^2\right]\right] \quad\text{(by (3))}$$
$$= \mathbb E_{k-1}\left[\frac{\alpha^2\beta^{\mathcal I}_{\max}\|b-AA^\dagger b-z^k\|_2^2}{\|A\|_{\mathrm F}^2}\right]$$
that
$$\mathbb E[\|x^k-\widehat x^k\|_2^2] = \mathbb E\big[\mathbb E_{k-1}[\|x^k-\widehat x^k\|_2^2]\big] \le \frac{\alpha^2\beta^{\mathcal I}_{\max}}{\|A\|_{\mathrm F}^2}\mathbb E[\|b-AA^\dagger b-z^k\|_2^2] \le \frac{\alpha^2\beta^{\mathcal I}_{\max}\rho^k}{\|A\|_{\mathrm F}^2}\|z^0-b_\perp\|_2^2. \quad\text{(by Lemma 6)} \qquad (4)$$
By $x^0\in\mathrm{range}(A^{\mathsf T})$, $A^\dagger b\in\mathrm{range}(A^{\mathsf T})$, $(A_{\mathcal I_i,:})^{\mathsf T}(A_{\mathcal I_i,:}x^{k-1}-b_{\mathcal I_i}+z^k_{\mathcal I_i})\in\mathrm{range}(A^{\mathsf T})$, and
$$x^k - A^\dagger b = x^{k-1}-A^\dagger b - \frac{\alpha}{\|A_{\mathcal I_i,:}\|_{\mathrm F}^2}(A_{\mathcal I_i,:})^{\mathsf T}(A_{\mathcal I_i,:}x^{k-1}-b_{\mathcal I_i}+z^k_{\mathcal I_i}),$$
we can show that $x^k-A^\dagger b\in\mathrm{range}(A^{\mathsf T})$ by induction. By
$$\|\widehat x^k-A^\dagger b\|_2^2 = \|x^{k-1}-A^\dagger b\|_2^2 - \frac{2\alpha\|A_{\mathcal I_i,:}(x^{k-1}-A^\dagger b)\|_2^2}{\|A_{\mathcal I_i,:}\|_{\mathrm F}^2} + \alpha^2(x^{k-1}-A^\dagger b)^{\mathsf T}\left(\frac{(A_{\mathcal I_i,:})^{\mathsf T}A_{\mathcal I_i,:}}{\|A_{\mathcal I_i,:}\|_{\mathrm F}^2}\right)^2(x^{k-1}-A^\dagger b)$$
$$\le \|x^{k-1}-A^\dagger b\|_2^2 - (2\alpha-\alpha^2\beta^{\mathcal I}_{\max})\frac{\|A_{\mathcal I_i,:}(x^{k-1}-A^\dagger b)\|_2^2}{\|A_{\mathcal I_i,:}\|_{\mathrm F}^2}, \quad\text{(by Lemma 5)}$$
we have
$$\mathbb E_{k-1}[\|\widehat x^k-A^\dagger b\|_2^2] \le \|x^{k-1}-A^\dagger b\|_2^2 - (2\alpha-\alpha^2\beta^{\mathcal I}_{\max})\frac{\|A(x^{k-1}-A^\dagger b)\|_2^2}{\|A\|_{\mathrm F}^2}$$
$$\le \|x^{k-1}-A^\dagger b\|_2^2 - (2\alpha-\alpha^2\beta^{\mathcal I}_{\max})\frac{\sigma_r^2(A)}{\|A\|_{\mathrm F}^2}\|x^{k-1}-A^\dagger b\|_2^2 \quad\text{(by Lemma 4 and $0<\alpha<2/\beta^{\mathcal I}_{\max}$)}$$
$$= \eta\,\|x^{k-1}-A^\dagger b\|_2^2,$$
which yields
$$\mathbb E[\|\widehat x^k-A^\dagger b\|_2^2] \le \eta\,\mathbb E[\|x^{k-1}-A^\dagger b\|_2^2]. \qquad (5)$$
Note that for any $\varepsilon>0$, we have
$$\|x^k-A^\dagger b\|_2^2 = \|x^k-\widehat x^k+\widehat x^k-A^\dagger b\|_2^2 \le \big(\|x^k-\widehat x^k\|_2+\|\widehat x^k-A^\dagger b\|_2\big)^2 = \|x^k-\widehat x^k\|_2^2+\|\widehat x^k-A^\dagger b\|_2^2+2\|x^k-\widehat x^k\|_2\|\widehat x^k-A^\dagger b\|_2$$
$$\le \left(1+\frac{1}{\varepsilon}\right)\|x^k-\widehat x^k\|_2^2 + (1+\varepsilon)\|\widehat x^k-A^\dagger b\|_2^2. \qquad (6)$$
Combining (4), (5), and (6) yields
$$\mathbb E[\|x^k-A^\dagger b\|_2^2] \le \left(1+\frac{1}{\varepsilon}\right)\mathbb E[\|x^k-\widehat x^k\|_2^2] + (1+\varepsilon)\,\mathbb E[\|\widehat x^k-A^\dagger b\|_2^2]$$
$$\le \left(1+\frac{1}{\varepsilon}\right)\frac{\alpha^2\beta^{\mathcal I}_{\max}\rho^k}{\|A\|_{\mathrm F}^2}\|z^0-b_\perp\|_2^2 + (1+\varepsilon)\eta\,\mathbb E[\|x^{k-1}-A^\dagger b\|_2^2]$$
$$\le \left(1+\frac{1}{\varepsilon}\right)\frac{\alpha^2\beta^{\mathcal I}_{\max}\|z^0-b_\perp\|_2^2}{\|A\|_{\mathrm F}^2}\big(\rho^k+\rho^{k-1}(1+\varepsilon)\eta\big) + (1+\varepsilon)^2\eta^2\,\mathbb E[\|x^{k-2}-A^\dagger b\|_2^2]$$
$$\le \cdots \le \left(1+\frac{1}{\varepsilon}\right)\frac{\alpha^2\beta^{\mathcal I}_{\max}\|z^0-b_\perp\|_2^2}{\|A\|_{\mathrm F}^2}\sum_{l=0}^{k-1}\rho^{k-l}(1+\varepsilon)^l\eta^l + (1+\varepsilon)^k\eta^k\|x^0-A^\dagger b\|_2^2.$$
This completes the proof. $\square$
Remark 8.
For the case of REBK with $s=m$, $t=n$, and $\alpha=1$ (i.e., REK), we have
$$\beta^{\mathcal I}_{\max} = \max_{i\in[m]}\frac{\|A_{i,:}\|_2^2}{\|A_{i,:}\|_2^2} = 1, \qquad \beta^{\mathcal J}_{\max} = \max_{j\in[n]}\frac{\|A_{:,j}\|_2^2}{\|A_{:,j}\|_2^2} = 1.$$
Therefore,
$$\eta = 1-(2\alpha-\alpha^2\beta^{\mathcal I}_{\max})\frac{\sigma_r^2(A)}{\|A\|_{\mathrm F}^2} = 1-\frac{\sigma_r^2(A)}{\|A\|_{\mathrm F}^2} \quad\text{and}\quad \rho = 1-(2\alpha-\alpha^2\beta^{\mathcal J}_{\max})\frac{\sigma_r^2(A)}{\|A\|_{\mathrm F}^2} = 1-\frac{\sigma_r^2(A)}{\|A\|_{\mathrm F}^2}.$$
It follows from
$$\widehat x^k - A^\dagger b = \left(I-\frac{(A_{i,:})^{\mathsf T}A_{i,:}}{\|A_{i,:}\|_2^2}\right)(x^{k-1}-A^\dagger b) \quad\text{and}\quad x^k-\widehat x^k = \frac{b_i-A_{i,:}A^\dagger b-z_i^k}{\|A_{i,:}\|_2^2}(A_{i,:})^{\mathsf T}$$
that $(\widehat x^k-A^\dagger b)^{\mathsf T}(x^k-\widehat x^k)=0$. Then we have
$$\|x^k-A^\dagger b\|_2^2 = \|x^k-\widehat x^k\|_2^2 + \|\widehat x^k-A^\dagger b\|_2^2,$$
which yields the following convergence estimate for REK (see [13]):
$$\mathbb E[\|x^k-A^\dagger b\|_2^2] = \mathbb E[\|x^k-\widehat x^k\|_2^2] + \mathbb E[\|\widehat x^k-A^\dagger b\|_2^2] \le \frac{\rho^k\|z^0-b_\perp\|_2^2}{\|A\|_{\mathrm F}^2} + \rho\,\mathbb E[\|x^{k-1}-A^\dagger b\|_2^2]$$
$$\le \cdots \le \rho^k\left(\frac{k\,\|z^0-b_\perp\|_2^2}{\|A\|_{\mathrm F}^2} + \|x^0-A^\dagger b\|_2^2\right).$$
Actually our proof is a modification of that of Zouzias and Freris [51]. We reorganize the arguments used by Zouzias and Freris and refine the analysis to get a better convergence estimate.
Remark 9.
Let $\widehat\rho := \max(\eta,\rho)$ and $\beta_{\max} := \max(\beta^{\mathcal I}_{\max},\beta^{\mathcal J}_{\max})$. Then we have
$$\widehat\rho = 1-(2\alpha-\alpha^2\beta_{\max})\frac{\sigma_r^2(A)}{\|A\|_{\mathrm F}^2}.$$
By Theorem 7, we have
$$\mathbb E[\|x^k-A^\dagger b\|_2^2] \le (1+\varepsilon)^k\eta^k\|x^0-A^\dagger b\|_2^2 + \left(1+\frac{1}{\varepsilon}\right)\frac{\alpha^2\beta^{\mathcal I}_{\max}\|z^0-b_\perp\|_2^2}{\|A\|_{\mathrm F}^2}\sum_{l=0}^{k-1}\rho^{k-l}(1+\varepsilon)^l\eta^l$$
$$\le (1+\varepsilon)^k\widehat\rho^k\|x^0-A^\dagger b\|_2^2 + \left(1+\frac{1}{\varepsilon}\right)\frac{\alpha^2\beta^{\mathcal I}_{\max}\|z^0-b_\perp\|_2^2}{\|A\|_{\mathrm F}^2}\,\widehat\rho^k\sum_{l=0}^{k-1}(1+\varepsilon)^l$$
$$\le (1+\varepsilon)^k\widehat\rho^k\|x^0-A^\dagger b\|_2^2 + \left(1+\frac{1}{\varepsilon}\right)\frac{\alpha^2\beta^{\mathcal I}_{\max}\|z^0-b_\perp\|_2^2}{\|A\|_{\mathrm F}^2}\,\widehat\rho^k\,\frac{(1+\varepsilon)^k-1}{\varepsilon}$$
$$\le (1+\varepsilon)^k\widehat\rho^k\left(\|x^0-A^\dagger b\|_2^2 + \frac{(1+\varepsilon)\,\alpha^2\beta^{\mathcal I}_{\max}\|z^0-b_\perp\|_2^2}{\varepsilon^2\|A\|_{\mathrm F}^2}\right),$$
which shows that REBK exponentially converges in the mean square to the minimum $\ell_2$-norm least squares solution of a given linear system of equations with the rate $(1+\varepsilon)\widehat\rho$ if $0<\alpha<2/\beta_{\max}$. Setting $\alpha=1/\beta_{\max}$ yields
$$\widehat\rho = 1-\frac{\sigma_r^2(A)}{\beta_{\max}\|A\|_{\mathrm F}^2},$$
which is better than the rate of REK (see Remark 8)
$$\rho = 1-\frac{\sigma_r^2(A)}{\|A\|_{\mathrm F}^2}$$
if $\beta_{\max}<1$. A smaller $\beta_{\max}$ means a faster convergence in terms of iterations. Recalling that
$$\beta^{\mathcal I}_{\max} := \max_{i\in[s]}\frac{\|A_{\mathcal I_i,:}\|_2^2}{\|A_{\mathcal I_i,:}\|_{\mathrm F}^2} \quad\text{and}\quad \beta^{\mathcal J}_{\max} := \max_{j\in[t]}\frac{\|A_{:,\mathcal J_j}\|_2^2}{\|A_{:,\mathcal J_j}\|_{\mathrm F}^2},$$
we have
$$\frac{1}{\max_{i\in[s]}|\mathcal I_i|} \le \frac{1}{\max_{i\in[s]}\mathrm{rank}(A_{\mathcal I_i,:})} \le \beta^{\mathcal I}_{\max} \le 1 \quad\text{and}\quad \frac{1}{\max_{j\in[t]}|\mathcal J_j|} \le \frac{1}{\max_{j\in[t]}\mathrm{rank}(A_{:,\mathcal J_j})} \le \beta^{\mathcal J}_{\max} \le 1.$$
Therefore,
$$\frac{1}{\max\big(\max_{i\in[s]}|\mathcal I_i|,\ \max_{j\in[t]}|\mathcal J_j|\big)} \le \beta_{\max} \le 1,$$
which means that REBK (with $\alpha=1/\beta_{\max}$) is at least as fast as REK in terms of iterations. The numerical results in Section 3 show that the convergence of REBK with appropriate block sizes and stepsizes is much faster than that of REK both in the number of iterations and in the computing time.
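As a concrete comparison, the following MATLAB fragment (reusing the variables sv, r, nF2, betaI, and betaJ from the sketch following Remark 3) evaluates the REK rate of Remark 8 against the REBK rate at the stepsize $\alpha=1/\beta_{\max}$.

    betamax = max(betaI, betaJ);
    rate_rek  = 1 - sv(r)^2 / nF2;                % REK: rho (Remark 8)
    rate_rebk = 1 - sv(r)^2 / (betamax * nF2);    % REBK with alpha = 1/betamax
    fprintf('REK rate %.6f, REBK rate %.6f\n', rate_rek, rate_rebk);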
Remark 10. It was shown in [20] that the convergence of $x^k$ to $A^\dagger b$ under the expected norm of the error (Theorem 7) is a stronger form of convergence than the convergence of the norm of the expected error (Theorem 2), as the former also guarantees that the variance of $x_i^k$ (the $i$th element of $x^k$) converges to zero for $i=1,2,\ldots,n$. By Remark 3, we know that $0<\alpha<2\|A\|_{\mathrm F}^2/\sigma_1^2(A)$ guarantees the convergence of the norm of the expected error. By Remark 9, we know that $0<\alpha<2/\beta_{\max}$ guarantees the convergence of the expected norm of the error. However, since the convergence estimate in Remark 9 is usually not sharp, a stepsize $\alpha$ satisfying $2/\beta_{\max}\le\alpha<2\|A\|_{\mathrm F}^2/\sigma_1^2(A)$ may also result in convergence (see Figure 1, Tables 2 and 3 in Section 3).

3 Numerical experiments

In this section, we compare the performance of the randomized extended block Kaczmarz (REBK) algorithm proposed in this paper against the randomized extended Kaczmarz (REK) algorithm [51] and the projection-based randomized double block Kaczmarz (RDBK) algorithm [40] on a variety of test problems. We do not claim optimized implementations of the algorithms, and we only run small or medium-scale problems. The purpose is only to demonstrate that even in these simple examples, REBK offers significant advantages over REK. All experiments are performed using MATLAB (version R2019a) on a laptop with a 2.7-GHz Intel Core i7 processor, 16 GB memory, and the Mac operating system.

To construct an inconsistent linear system, we set $b = Ax + r$, where $x$ is a vector with entries generated from a standard normal distribution and the residual $r\in\mathrm{null}(A^{\mathsf T})$. Note that one can obtain such a vector $r$ by the MATLAB function null. For all algorithms, we set $z^0=b$ and $x^0=0$ and stop once the error $\|x^k-A^\dagger b\|_2^2$ falls below a fixed tolerance. We report the average number of iterations (denoted as ITER) and the average computing time in seconds (denoted as CPU) of REK, RDBK, and REBK. Note that A\b will usually not be the same as pinv(A)*b when $A$ is rank-deficient or underdetermined. We use MATLAB's lsqminnorm (which is typically more efficient than pinv) to solve the small least squares problems at each step of RDBK. We refer the reader to [39, 40] for more numerical aspects of RDBK. We also report the speed-up of REBK against REK, which is defined as
$$\text{speed-up} = \frac{\text{CPU of REK}}{\text{CPU of REBK}}.$$
For the block methods, we assume that the subsets $\{\mathcal I_i\}_{i=1}^{s-1}$ and $\{\mathcal J_j\}_{j=1}^{t-1}$ have the same size $\tau$ (i.e., $|\mathcal I_i|=|\mathcal J_j|=\tau$). We consider the row partition $\{\mathcal I_i\}_{i=1}^s$:
$$\mathcal I_i = \{(i-1)\tau+1,\,(i-1)\tau+2,\,\ldots,\,i\tau\},\quad i=1,2,\ldots,s-1, \qquad \mathcal I_s = \{(s-1)\tau+1,\,(s-1)\tau+2,\,\ldots,\,m\},\quad |\mathcal I_s|\le\tau,$$
and the column partition $\{\mathcal J_j\}_{j=1}^t$:
$$\mathcal J_j = \{(j-1)\tau+1,\,(j-1)\tau+2,\,\ldots,\,j\tau\},\quad j=1,2,\ldots,t-1, \qquad \mathcal J_t = \{(t-1)\tau+1,\,(t-1)\tau+2,\,\ldots,\,n\},\quad |\mathcal J_t|\le\tau.$$
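For concreteness, a minimal MATLAB driver for one trial is sketched below; the matrix size, the stepsize, the iteration budget, and the use of the rebk function sketched in Section 2 are illustrative assumptions of ours, not the exact experimental setup.

    m = 500; n = 250; tau = 10;
    A = randn(m, n);                          % a Type II matrix (defined next)
    xs = randn(n, 1);
    Nb = null(A');                            % orthonormal basis of null(A^T)
    b = A*xs + Nb*randn(size(Nb, 2), 1);      % inconsistent right-hand side
    xdag = pinv(A) * b;                       % reference solution A^+ b
    tic;
    x = rebk(A, b, 1, tau, 20000);            % illustrative stepsize and budget
    cpu = toc;
    err = norm(x - xdag)^2;                   % squared error to A^+ b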
Two types of coefficient matrices are generated as follows.

• Type I: For given $m$, $n$, $r=\mathrm{rank}(A)$, and $\kappa>1$, we construct a matrix $A$ by $A=UDV^{\mathsf T}$, where $U\in\mathbb R^{m\times r}$ and $V\in\mathbb R^{n\times r}$. Entries of $U$ and $V$ are generated from a standard normal distribution, and then their columns are orthonormalized:

    [U, ~] = qr(randn(m, r), 0);  [V, ~] = qr(randn(n, r), 0);

The matrix $D$ is an $r\times r$ diagonal matrix whose diagonal entries are uniformly distributed numbers in $(1,\kappa)$:

    D = diag(1 + (kappa - 1) .* rand(r, 1));

So the condition number of $A$ is upper bounded by $\kappa$.

• Type II: For given $m$ and $n$, entries of $A$ are generated from a standard normal distribution:

    A = randn(m, n);

So $A$ is a full-rank matrix almost surely.

In Figure 1, we plot the error $\|x^k-A^\dagger b\|_2^2$ of REBK with a fixed block size ($\tau=10$) and different stepsizes $\alpha$ (ranging from below $1/\beta_{\max}$ to beyond $2/\beta_{\max}$) for two inconsistent linear systems with coefficient matrices of Type I ($A=UDV^{\mathsf T}$ with $m=500$, $n=250$, $r=150$, $\kappa=2$) and Type II (A = randn(500,250)). It is observed that the convergence of REBK becomes faster as the stepsize increases, and then slows down after the fastest rate is reached.

Figure 1: The average (10 trials of each case) error $\|x^k-A^\dagger b\|_2^2$ of REBK with block size $\tau=10$ and different stepsizes $\alpha$ for two inconsistent linear systems. Left: Type I matrix $A=UDV^{\mathsf T}$ with $m=500$, $n=250$, $r=150$, $\kappa=2$. Right: Type II matrix A = randn(500,250).

In Tables 1 and 2, we report the numbers of iterations and the computing times of the REK, RDBK, and REBK algorithms for solving inconsistent linear systems. For the block algorithms (RDBK and REBK), a fixed block size $\tau=10$ is used. For the REBK algorithm, empirical stepsizes $\alpha=1.75/\beta_{\max}$ and $\alpha=2.25/\beta_{\max}$ are used for Type I and Type II matrices, respectively (the resulting numerical values of $\alpha$ are listed in the tables). From these two tables, we observe: (i) in all cases, the RDBK and REBK algorithms vastly outperform the REK algorithm in terms of both the number of iterations and the computing time; (ii) for Type I matrices, the convergence rates of the RDBK and REBK algorithms are almost the same in terms of the number of iterations; (iii) for Type II matrices, REBK performs better than RDBK in terms of the number of iterations.
Table 1: The average (10 trials of each algorithm) ITER and CPU of REK, RDBK($\tau=10$), and REBK($\tau=10$, $\alpha=1.75/\beta_{\max}$) for inconsistent linear systems with random coefficient matrices $A$ of Type I: $A=UDV^{\mathsf T}$.

    m x n        rank  kappa |  REK ITER   CPU   |  RDBK ITER  CPU   |  alpha  REBK ITER  CPU   |  speed-up
    250 x 500    150   2     |  5826       0.26  |  572        0.21  |  10.87  586        0.05  |  4.90
    250 x 500    150   10    |  65520      2.87  |  6166       2.19  |  9.36   7365       0.63  |  4.59
    500 x 250    150   2     |  5755       0.25  |  562        0.19  |  10.70  578        0.03  |  7.32
    500 x 250    150   10    |  63741      2.76  |  5784       1.90  |  10.13  6424       0.36  |  7.60
    500 x 250    250   2     |  9971       0.43  |  940        0.31  |  12.47  961        0.06  |  7.81
    500 x 250    250   10    |  119182     5.14  |  11328      3.73  |  10.99  10783      0.61  |  8.43
    1000 x 500   250   2     |  9959       0.55  |  974        0.39  |  12.10  987        0.10  |  5.53
    1000 x 500   250   10    |  118134     6.54  |  11236      4.44  |  11.20  10349      1.03  |  6.36
    1000 x 500   500   2     |  20188      1.11  |  2007       0.80  |  13.84  2115       0.21  |  5.20
    1000 x 500   500   10    |  254117     14.01 |  25361      10.00 |  12.67  20432      2.03  |  6.92
Table 2: The average (10 trials of each algorithm) ITER and CPU of REK, RDBK($\tau=10$), and REBK($\tau=10$, $\alpha=2.25/\beta_{\max}$) for inconsistent linear systems with random coefficient matrices $A$ of Type II: A = randn(m,n).

    m x n        rank  sigma_1(A)/sigma_r(A) |  REK ITER  CPU   |  RDBK ITER  CPU   |  alpha  REBK ITER  CPU   |  speed-up
    250 x 120    120   5.25                  |  18060     0.66  |  1646       0.50  |  13.48  1337       0.06  |  10.54
    500 x 250    250   5.73                  |  41016     1.79  |  3811       1.26  |  14.50  2885       0.17  |  10.81
    750 x 370    370   5.80                  |  59660     2.91  |  5929       2.20  |  16.23  4115       0.36  |  8.07
    1000 x 500   500   5.74                  |  83093     4.61  |  8183       3.25  |  16.42  5422       0.55  |  8.41
In Figure 2, we plot the error $\|x^k-A^\dagger b\|_2^2$ and the computing times of REBK with block sizes $\tau$ ranging from 5 to 200 and stepsize $\alpha=1.75/\beta_{\max}$ for two inconsistent linear systems with coefficient matrices of Type I ($A=UDV^{\mathsf T}$ with $m=20000$, $n=5000$, $r=4500$, $\kappa=2$) and Type II (A = randn(20000,5000)). The average numbers of required iterations are also reported. We observe: (i) increasing the block size and using the empirical stepsize $\alpha=1.75/\beta_{\max}$ lead to better convergence in terms of the number of iterations; (ii) with increasing block size, the computing time first decreases, then increases after reaching its minimum value, and finally tends to be stable. This means that for a sufficiently large block size the decrease in iteration complexity cannot compensate for the increase in cost per iteration. On the other hand, if a distributed version of REBK is implemented, a larger $\tau$ will be better.

Figure 2: The average (10 trials of each case) error $\|x^k-A^\dagger b\|_2^2$ and CPU of REBK with different block sizes and stepsize $\alpha=1.75/\beta_{\max}$ for inconsistent linear systems. The average numbers of required iterations are also reported. Upper: Type I matrix $A=UDV^{\mathsf T}$ with $m=20000$, $n=5000$, $r=4500$, and $\kappa=2$. Lower: Type II matrix A = randn(20000,5000).

In Figure 3, we plot the computing times of the REK, RDBK, and REBK algorithms for inconsistent linear systems with coefficient matrices of Type I ($A=UDV^{\mathsf T}$ with $m=2000, 4000, \ldots, 20000$, $n=500$, $r=250$, $\kappa=2$) and Type II (A = randn(m,n) with $m=2000, 4000, \ldots, 20000$, $n=500$). For all cases, the block size $\tau=10$ and the stepsize $\alpha=1.75/\beta_{\max}$ are used. We observe that both RDBK and REBK are better than REK, and REBK is the best.

Figure 3: The average (10 trials of each algorithm) CPU of REK, RDBK($\tau=10$), and REBK($\tau=10$, $\alpha=1.75/\beta_{\max}$) for inconsistent linear systems. Left: Type I matrices $A=UDV^{\mathsf T}$ with $m=2000, 4000, \ldots, 20000$, $n=500$, $r=250$, and $\kappa=2$. Right: Type II matrices A = randn(m,n) with $m=2000, 4000, \ldots, 20000$ and $n=500$.

Finally, we test REK and REBK using eight inconsistent linear systems with coefficient matrices from the University of Florida sparse matrix collection [10]. The eight matrices are abtaha1, flower_5_1, football, lp_nug15, relat6, relat7, Sandi_authors, and
WorldCities. In Table 3, we report the numbers of iterations and the computing times for the REK and REBK algorithms. For each matrix, we tested two stepsizes of REBK: the first is $1/\beta_{\max}$ and the second is empirical. We observe that REBK based on good choices of block size and stepsize significantly outperforms REK. Moreover, good stepsizes and block sizes are problem dependent.

Table 3: The average (10 trials of each algorithm) ITER and CPU of REK and REBK($\tau$, $\alpha$) for inconsistent linear systems with coefficient matrices from [10]. For each matrix, two stepsizes of REBK are tested: the first is $1/\beta_{\max}$, and the second is empirical.

    Matrix          m x n         rank  sigma_1(A)/sigma_r(A) |  REK ITER  CPU    |  tau  alpha |  REBK ITER  CPU    |  speed-up
    abtaha1         14596 x 209   209   12.23                 |  276946    89.38  |  10   1.82  |  151395     68.30  |  1.31
                                                              |                   |       5     |  56064      25.34  |  3.53
    flower_5_1      211 x 201     179   13.70                 |  135117    5.16   |  5    1     |  136037     6.15   |  0.84
                                                              |                   |       4     |  34381      1.55   |  3.34
    football        35 x 35       19    166.47                |  810792    21.99  |  5    1     |  858215     30.64  |  0.72
                                                              |                   |       2     |  409995     14.63  |  1.50
    relat6          2340 x 157    137   7.74                  |  34536     2.43   |  10   1     |  34273      3.81   |  0.64
                                                              |                   |       2.5   |  13971      1.56   |  1.56
    Sandi_authors   86 x 86       72    189.58                |  2525141   73.28  |  5    1     |  2533343    99.36  |  0.74
                                                              |                   |       2.5   |  999294     39.15  |  1.87
    WorldCities     315 x 100     100   66.00                 |  120699    4.32   |  5    1.13  |  105647     4.52   |  0.96
                                                              |                   |       2.5   |  47372      2.02   |  2.14

4 Concluding remarks

We have proposed a randomized extended block Kaczmarz (REBK) algorithm for solving general linear systems and proved its convergence theory. At each step, REBK uses two RABK updates (with a special choice of weights). The new algorithm can utilize efficient implementations on distributed computing units. Numerical experiments show that the crucial point for guaranteeing fast convergence is to choose a good block size and stepsize. Finding appropriate variable stepsizes by the adaptive extrapolation of [35] and proposing more effective partitions based on the techniques of [39, 12, 49, 35] should be valuable topics. We also note that RABK allows the flexibility that the distributions from which blocks are selected do not require the blocks to form a partition of the columns or rows. Designing variants of REBK based on RABK with random samplings that do not depend on partitions is straightforward, and we believe the technique used in the proof of Theorem 7 still works for these variants, although the analysis will be more complicated. Besides, developing parallel and accelerated variants of REBK based on the approach used by Richtárik and Takáč [47] is also worth exploring. We will work on these topics in the future.
Acknowledgments
The authors are thankful to the referees for their detailed comments and valuable suggestions that have led to remarkable improvements. The research of the first author was supported by the National Natural Science Foundation of China (No. 11771364) and the Fundamental Research Funds for the Central Universities (No. 20720180008).
References

[1] M. Arioli, I. Duff, J. Noailles, and D. Ruiz. A block projection method for sparse matrices. SIAM J. Sci. Statist. Comput., 13(1):47–70, 1992.
[2] Z.-Z. Bai and W.-T. Wu. On convergence rate of the randomized Kaczmarz method. Linear Algebra Appl., 553:252–269, 2018.
[3] Z.-Z. Bai and W.-T. Wu. On greedy randomized Kaczmarz method for solving large sparse linear systems. SIAM J. Sci. Comput., 40(1):A592–A606, 2018.
[4] Z.-Z. Bai and W.-T. Wu. On relaxed greedy randomized Kaczmarz methods for solving large sparse linear systems. Appl. Math. Lett., 83:21–26, 2018.
[5] Z.-Z. Bai and W.-T. Wu. On greedy randomized coordinate descent methods for solving large linear least-squares problems. Numer. Linear Algebra Appl., 26(4):e2237, 15, 2019.
[6] Z.-Z. Bai and W.-T. Wu. On partially randomized extended Kaczmarz method for solving large sparse overdetermined inconsistent linear systems. Linear Algebra Appl., 578:225–250, 2019.
[7] A. Ben-Israel and T. N. E. Greville. Generalized Inverses: Theory and Applications, volume 15 of CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC. Springer-Verlag, New York, second edition, 2003.
[8] R. Bramley and A. Sameh. Row projection methods for large nonsymmetric linear systems. SIAM J. Sci. Statist. Comput., 13(1):168–193, 1992.
[9] C. Byrne. A unified treatment of some iterative algorithms in signal processing and image reconstruction. Inverse Problems, 20(1):103–120, 2004.
[10] T. A. Davis and Y. Hu. The University of Florida sparse matrix collection. ACM Trans. Math. Software, 38(1):Art. 1, 25, 2011.
[11] J. A. De Loera, J. Haddock, and D. Needell. A sampling Kaczmarz-Motzkin algorithm for linear feasibility. SIAM J. Sci. Comput., 39(5):S66–S87, 2017.
[12] L. A. Drummond, I. S. Duff, R. Guivarch, D. Ruiz, and M. Zenadi. Partitioning strategies for the block Cimmino algorithm. J. Engrg. Math., 93:21–39, 2015.
[13] K. Du. Tight upper bounds for the convergence of the randomized extended Kaczmarz and Gauss-Seidel algorithms. Numer. Linear Algebra Appl., 26(3):e2233, 14, 2019.
[14] K. Du and H. Gao. A new theoretical estimate for the convergence rate of the maximal weighted residual Kaczmarz algorithm. Numer. Math. Theory Methods Appl., 12(2):627–639, 2019.
[15] K. Du and X. Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.
[16] I. S. Duff, R. Guivarch, D. Ruiz, and M. Zenadi. The augmented block Cimmino distributed method. SIAM J. Sci. Comput., 37(3):A1248–A1269, 2015.
[17] B. Dumitrescu. On the relation between the randomized extended Kaczmarz algorithm and coordinate descent. BIT, 55(4):1005–1015, 2015.
[18] T. Elfving. Block-iterative methods for consistent and inconsistent linear equations. Numer. Math., 35(1):1–12, 1980.
[19] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins Studies in the Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD, fourth edition, 2013.
[20] R. M. Gower and P. Richtárik. Randomized iterative methods for linear systems. SIAM J. Matrix Anal. Appl., 36(4):1660–1690, 2015.
[21] J. Haddock and D. Needell. On Motzkin's method for inconsistent linear systems. BIT, 59(2):387–401, 2019.
[22] J. Haddock and D. Needell. Randomized projection methods for linear systems with arbitrarily large sparse corruptions. SIAM J. Sci. Comput., 41(5):S19–S36, 2019.
[23] A. Hefny, D. Needell, and A. Ramdas. Rows versus columns: randomized Kaczmarz or Gauss-Seidel for ridge regression. SIAM J. Sci. Comput., 39(5):S528–S542, 2017.
[24] G. T. Herman. Fundamentals of Computerized Tomography. Advances in Pattern Recognition. Springer, Dordrecht, second edition, 2009. Image reconstruction from projections.
[25] G. T. Herman and R. Davidi. Image reconstruction from a small number of projections. Inverse Problems, 24(4):045011, 17, 2008.
[26] G. T. Herman and L. B. Meyer. Algebraic reconstruction techniques can be made computationally efficient. IEEE Trans. Medical Imaging, 12(3):600–609, 1993.
[27] S. Kaczmarz. Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Intern. Acad. Polonaise Sci. Lett., Cl. Sci. Math. Nat. A, 35:355–357, 1937.
[28] A. C. Kak and M. Slaney. Principles of Computerized Tomographic Imaging, volume 33 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2001. Reprint of the 1988 original.
[29] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.
[30] J. Liu and S. J. Wright. An accelerated randomized Kaczmarz algorithm. Math. Comp., 85(297):153–178, 2016.
[31] Y. Liu and C.-Q. Gu. Variant of greedy randomized Kaczmarz for ridge regression. Appl. Numer. Math., 143:223–246, 2019.
[32] D. A. Lorenz, S. Wenger, F. Schöpfer, and M. Magnor. A sparse Kaczmarz solver and a linearized Bregman method for online compressed sensing. In IEEE International Conference on Image Processing, pages 1347–1351, 2015.
[33] A. Ma, D. Needell, and A. Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
[34] F. Natterer. The Mathematics of Computerized Tomography, volume 32 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2001. Reprint of the 1986 original.
[35] I. Necoara. Faster randomized block Kaczmarz algorithms. SIAM J. Matrix Anal. Appl., 40(4):1425–1452, 2019.
[36] I. Necoara, P. Richtárik, and A. Patrascu. Randomized projection methods for convex feasibility: conditioning and convergence rates. SIAM J. Optim., 29(4):2814–2852, 2019.
[37] D. Needell. Randomized Kaczmarz solver for noisy linear systems. BIT, 50(2):395–403, 2010.
[38] D. Needell, N. Srebro, and R. Ward. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. Math. Program., 155(1-2, Ser. A):549–573, 2016.
[39] D. Needell and J. A. Tropp. Paved with good intentions: analysis of a randomized block Kaczmarz method. Linear Algebra Appl., 441:199–221, 2014.
[40] D. Needell, R. Zhao, and A. Zouzias. Randomized block Kaczmarz method with projection for solving least squares. Linear Algebra Appl., 484:322–343, 2015.
[41] S. Petra and C. Popa. Single projection Kaczmarz extended algorithms. Numer. Algorithms, 73(3):791–806, 2016.
[42] B. T. Poljak. Gradient methods for minimizing functionals. Ž. Vyčisl. Mat. i Mat. Fiz., 3:643–653, 1963.
[43] C. Popa. Extensions of block-projections methods with relaxation parameters to inconsistent and rank-deficient least-squares problems. BIT, 38(1):151–176, 1998.
[44] C. Popa. Convergence rates for Kaczmarz-type algorithms. Numer. Algorithms, 79(1):1–17, 2018.
[45] C. Popa and R. Zdunek. Kaczmarz extended algorithm for tomographic image reconstruction from limited data. Math. Comput. Simulation, 65(6):579–598, 2004.
[46] M. Razaviyayn, M. Hong, N. Reyhanian, and Z.-Q. Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.
[47] P. Richtárik and M. Takáč. Stochastic reformulations of linear systems: algorithms and convergence theory. arXiv preprint arXiv:1706.01108, 2017.
[48] T. Strohmer and R. Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[49] F. S. Torun, M. Manguoglu, and C. Aykanat. A novel partitioning method for accelerating the block Cimmino algorithm. SIAM J. Sci. Comput., 40(6):C827–C850, 2018.
[50] J.-J. Zhang. A new greedy Kaczmarz algorithm for the solution of very large linear systems.
Appl. Math. Lett., 91:207–212, 2019.
[51] A. Zouzias and N. M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.