CComputing with Quasiseparable Matrices
Clément Pernet
Univ. Grenoble Alpes Laboratoire LIP Inria, Université de Lyon46, Allée d’Italie, F69364 Lyon Cedex 07, [email protected]
ABSTRACT
The class of quasiseparable matrices is defined by a pairof bounds, called the quasiseparable orders, on the ranksof the sub-matrices entirely located in their strictly lowerand upper triangular parts. These arise naturally in ap-plications, as e.g. the inverse of band matrices, and arewidely used for they admit structured representations allow-ing to compute with them in time linear in the dimension.We show, in this paper, the connection between the notionof quasiseparability and the rank profile matrix invariant,presented in [Dumas & al. ISSAC’15]. This allows us topropose an algorithm computing the quasiseparable orders( r L , r U ) in time O ( n s ω − ) where s = max( r L , r U ) and ω the exponent of matrix multiplication. We then present twonew structured representations, a binary tree of PLUQ de-compositions, and the Bruhat generator, using respectively O ( ns log ns ) and O ( ns ) field elements instead of O ( ns ) forthe classical generator and O ( ns log n ) for the hierarchicallysemiseparable representations. We present algorithms com-puting these representations in time O ( n s ω − ). These rep-resentations allow a matrix-vector product in time linear inthe size of their representation. Lastly we show how to mul-tiply two such structured matrices in time O ( n s ω − ).
1. INTRODUCTION
The inverse of a tridiagonal matrix, when it exists, is adense matrix with the property that all sub-matrices en-tirely below or above its diagonal have rank at most one.This property and many generalizations of it, defining thesemiseparable and quasiseparable matrices, have been ex-tensively studied over the past 80 years. We refer to [16]and [17] for a broad bibliographic overview on the topic.In this paper, we will focus on the class of quasiseparablematrices, introduced in [8]:
Definition An n × n matrix M is ( r L , r U ) -quasisepa-rable if its strictly lower and upper triangular parts satisfy Publication rights licensed to ACM. ACM acknowledges that this contribution wasauthored or co-authored by an employee, contractor or affiliate of a national govern-ment. As such, the Government retains a nonexclusive, royalty-free right to publish orreproduce this article, or to allow others to do so, for Government purposes only.
ISSAC ’16, July 19 - 22, 2016, Waterloo, ON, Canada
ACM ISBN 978-1-4503-4380-0/16/07. . . $15.00Copyright is held by the owner/author(s). Publication rights licensed to ACM.DOI: http://dx.doi.org/10.1145/2930889.2930915 the following low rank structure: for all ≤ k ≤ n − ,rank ( M k +1 ..n, ..k ) ≤ r L , (1) rank ( M ..k,k +1 ..n ) ≤ r U . (2) The values r L and r U define the quasiseparable orders of M . Quasiseparable matrices can be represented with fewerthan n coefficients, using a structured representation, calleda generator. The most commonly used generator [8, 16, 17,9, 1] for a matrix M , consists of ( n −
1) pairs of vectors p ( i ) , q ( i ) of size r L , ( n −
1) pairs of vectors g ( i ) , h ( i ) of size r U , n − a ( i ) of dimension r L × r L , and n − b ( i ) of dimension r U × r U such that M i,j = p ( i ) T a >ij q ( j ) , ≤ j < i ≤ nd ( i ) , ≤ i = j ≤ ng ( i ) T b 1) for i > j + 1, b i,i +1 = 1. Thisrepresentation, of size O ( n ( r L + r U )) makes it possible toapply a vector in O ( n ( r L + r U )) field operations, multiplytwo quasiseparable matrices in time O ( n max( r L , r U ) ) andalso compute the inverse in time O ( n max( r L , r U ) ) [8].The contribution of this paper, is to make the connec-tion between the notion of quasiseparability and a matrixinvariant, the rank profile matrix, that we introduced in [6].More precisely, we show that the PLUQ decompositions ofthe lower and upper triangular parts of a quasiseparable ma-trix, using a certain class of pivoting strategies, also have astructure ensuring that their memory footprint and the timecomplexity to compute them does not depend on the rankof the matrix but on the quasiseparable order (which can bearbitrarily lower). Note that we will assume throughout thepaper that the PLUQ decomposition algorithms mentionedhave the ability to reveal ranks. This is the case when com-puting with exact arithmetic (e.g. finite fields or multipreci-sion rationals), but not always with finite precision floatingpoint arithmetic. In the latter context, a special care needto be taken for the pivoting of LU decompositions [10, 14],and QR or SVD decompositions are often more commonlyused [2, 3]. This study is motivated by the design of new al-gorithms on polynomial matrices where quasiseparable ma-trices naturally occur, and more generally by the frameworkof the LinBox library [15] for black-box exact linear algebra.After defining and recalling the properties of the rank pro-file matrix in Section 2, we propose in Section 3 an algo-rithm computing the quasiseparable orders ( r L , r U ) in time O ( n s ω − ) where s = max( r L , r U ) and ω the exponent of a r X i v : . [ c s . S C ] S e p atrix multiplication. We then present in Section 4 twonew structured representations, a binary tree of PLUQ de-compositions, and the Bruhat generator, using respectively O ( ns log ns ) and O ( ns ) field elements instead of O ( ns ) forthe previously known generators. We present in Section 5algorithms computing them in time O ( n s ω − ). These rep-resentations support a matrix-vector product in time linearin the size of their representation. Lastly we show how tomultiply two such structured matrices in time O ( n s ω − ).Throughout the paper, A i..j,k..l will denote the sub-matrixof A of row indices between i and j and column indicesbetween k and l . The matrix of the canonical basis, with aone at position ( i, j ) will be denoted by ∆ ( i,j ) . 2. PRELIMINARIES2.1 Left triangular matrices We will make intensive use of matrices with non-zero el-ements only located above the main anti-diagonal. We willrefer to these matrices as left triangular, to avoid any con-fusion with upper triangular matrices. Definition A left triangular matrix is any m × n ma-trix A such that A i,j = 0 for all i > n − j . The left triangular part of a matrix A , denoted by Left( A )will refer to the left triangular matrix extracted from it. Wewill need the following property on the left triangular partof the product of a matrix by a triangular matrix. Lemma Let A = BU be an m × n matrix where U is n × n upper triangular. Then Left ( A ) = Left ( Left ( B ) U ) . Proof. Let ¯A = Left( A ) , ¯B = Left( B ). For j ≤ n − i, wehave ¯A i,j = (cid:80) nk =1 B i,k · U k,j = (cid:80) jk =1 B i,k · U k,j as U is uppertriangular. Now for k ≤ j ≤ n − i , B i,k = ¯B i,k , which provesthat the left triangular part of A is that of Left( B ) U .Applying Lemma 1 on A T yields Lemma 2 Lemma Let A = LB be an m × n matrix where L is m × m lower triangular. Then Left ( A ) = Left ( L Left ( B )) . Lastly, we will extend the notion of quasiseparable or-der to left triangular matrices, in the natural way: the leftquasiseparable order is the maximal rank of any leading k × ( n − k ) sub-matrix. When no confusion may occur, wewill abuse the definition and simply call it the quasiseparableorder. We will use a matrix invariant, introduced in [6, Theo-rem 1], that summarizes the information on the ranks ofany leading sub-matrices of a given input matrix. Definition [6, Theorem 1] The rank profile matrix ofan m × n matrix A of rank r is the unique m × n matrix R A ,with only r non-zero coefficients, all equal to one, located ondistinct rows and columns such that any leading sub-matricesof R A has the same rank as the corresponding leading sub-matrix in A . This invariant can be computed in just one Gaussian elim-ination of the matrix A , at the cost of O ( mnr ω − ) field oper-ations [6], provided some conditions on the pivoting strategy being used. It is obtained from the corresponding PLUQ de-composition as the product R A = P (cid:20) I r ( m − r ) × ( n − r ) (cid:21) Q . We also recall in Theorem 1 an important property of suchPLUQ decompositions revealing the rank profile matrix. Theorem 1 ([7, Th. 24], [5, Th. 1]). Let A = PLUQ be a PLUQ decomposition revealing the rank profile matrixof A . Then, P (cid:2) L 0 m × ( m − r ) (cid:3) P T is lower triangular and Q T (cid:20) U0 ( n − r ) × n (cid:21) Q is upper triangular. Lemma The rank profile matrix invariant is preservedby multiplication1. to the left with an invertible lower triangular matrix,2. to the right with an invertible upper triangular matrix. Proof. Let B = LA for an invertible lower triangularmatrix L . Then rank( B ..i, ..j ) = rank( L ..i, ..i A ..i, ..j ) =rank( A ..i, ..j ) for any i ≤ m, j ≤ n . Hence R B = R A . 3. COMPUTING THE QUASISEPARABLEORDERS Let M be an n × n matrix of which one want to determinethe quasiseparable orders ( r L , r U ). Let L and U be respec-tively the lower triangular part and the upper triangularpart of M .Let J n be the unit anti-diagonal matrix. Multiplying onthe left by J n reverts the row order while multiplying onthe right by J n reverts the column order. Hence both J n L and UJ n are left triangular matrices. Remark that the con-ditions (1) and (2) state that all leading k × ( n − k ) sub-matrices of J n L and UJ n have rank no greater than r L and r U respectively. We will then use the rank profile matrix ofthese two left triangular matrices to find these parameters. First, note that the rank profile matrix of a left triangularmatrix is not necessarily left triangular. For example, therank profile matrix of (cid:104) (cid:105) is (cid:104) (cid:105) . However, only theleft triangular part of the rank profile matrix is sufficient tocompute the left quasiseparable orders.Suppose for the moment that the left-triangular part ofthe rank profile matrix of a left triangular matrix is given(returned by a function LT-RPM). It remains to enumer-ate all leading k × ( n − k ) sub-matrices and find the onewith the largest number of non-zero elements. Algorithm 1shows how to compute the largest rank of all leading sub-matrices of such a matrix. Run on J n L and UJ n , it returnssuccessively the quasiseparable orders r L and r U .This algorithm runs in O ( n ) provided that the rank profilematrix R is stored in a compact way, e.g. using a vector of r pairs of pivot indices ([( i , j ) , . . . , ( i r , j r )]. We now deal with the missing component: computing theleft triangular part of the rank profile matrix of a left trian-gular matrix. lgorithm 1 QS-order Input: A , an n × n matrix Output: max { rank( A ..k, ..n − k ) : 1 ≤ k ≤ n − } R ← LT-RPM( A ) (cid:46) The left triangular part of the rank profilematrix of A rows ← (False,. . . ,False)cols ← (False,. . . ,False) for all ( i, j ) such that R i,j = 1 do rows[i] ← Truecols[j] ← True end for s, r ← for i = 1 . . . n doif rows[ i ] then r ← r + 1 if cols[ n − i + 1] then r ← r − s ← max( s, r ) end forreturn s A first approach is to run any Gaussian elimination algo-rithm that can reveal the rank profile matrix, as describedin [6]. In particular, the PLUQ decomposition algorithmof [5] computes the rank profile matrix of A in O ( n r ω − )where r = rank( A ). However this estimate is pessimisticas it does not take into account the left triangular shape ofthe matrix. Moreover, this estimate does not depend on theleft quasiseparable order s but on the rank r , which may bemuch higher. Remark The discrepancy between the rank r of a lefttriangular matrix and its quasiseparable order arises fromthe location of the pivots in its rank profile matrix. Piv-ots located near the top left corner of the matrix are sharedby many leading sub-matrices, and are therefore likely con-tribute to the quasiseparable order. On the other hand, pivotsnear the anti-diagonal can be numerous, but do not add upto a large quasiseparable order. As an illustration, considerthe two following extreme cases:1. a matrix A with generic rank profile. Then the leading r × r sub-matrix of A has rank r and the quasiseparableorder is s = r .2. the matrix with n − ones right above the anti-diagonal.It has rank r = n − but quasiseparable order . Remark 1 indicates that in the unlucky cases when r (cid:29) s ,the computation should reduce to instances of smaller sizes,hence a trade-off should exist between, on one hand, thediscrepency between r and s , and on the other hand, thedimension n of the problems. All contributions presented inthe remaining of the paper are based on such trade-offs. In order to reach a complexity depending on s and not r ,we adapt in Algorithm 2 the tile recursive algorithm of [5],so that the left triangular structure of the input matrix ispreserved and can be used to reduce the amount of compu-tation.Algorithm 2 does not assume that the input matrix isleft triangular, as it will be called recursively with arbitrarymatrices, but guarantees to return the left triangular partof the rank profile matrix. While the top left quadrant A Algorithm 2 LT-RPM: Left Triangular part of the RankProfile Matrix Input: A : an n × n matrix Output: R : the left triangular part of the RPM of A if n = 1 then return [0] Split A = (cid:20) A A A (cid:21) where A is (cid:98) n (cid:99) × (cid:98) n (cid:99) Decompose A = P (cid:20) L M (cid:21) (cid:2) U V (cid:3) Q R ← P (cid:20) I r (cid:21) Q where r = rank( A ). (cid:20) B B (cid:21) ← P T A (cid:2) C C (cid:3) ← A Q T Here A = L \ U V B M C C . D ← L − B E ← C U − F ← B − M D G ← C − EV Here A = L \ U V DM . H ← P (cid:20) r × n F (cid:21) I ← (cid:2) r × n G (cid:3) Q R ← LT-RPM ( H ) R ← LT-RPM ( I ) return R ← (cid:20) R R R (cid:21) is eliminated using any PLUQ decomposition algorithm re-vealing the rank profile matrix, the top right and bottomleft quadrants are handled recursively. Theorem Given an n × n input matrix A with left qua-siseparable order s , Algorithm 2 computes the left triangularpart of the rank profile matrix of A in O ( n s ω − ) . Proof. First remark that P (cid:20) DF (cid:21) = P (cid:20) L − − M L − I n − r (cid:21) P T (cid:124) (cid:123)(cid:122) (cid:125) L P (cid:20) B B (cid:21) = LA . Hence L (cid:2) A A (cid:3) = P (cid:20) (cid:2) U V (cid:3) Q D0 F (cid:21) . From Theorem 1, the matrix L is lower triangular and byLemma 3 the rank profile matrix of (cid:2) A A (cid:3) equals thatof P (cid:20) (cid:2) U V (cid:3) Q D0 F (cid:21) . Now as U is upper triangu-lar and non-singular, this rank profile matrix is in turnthat of P (cid:20) (cid:2) U V (cid:3) Q 00 F (cid:21) and its left triangular partis (cid:2) R R (cid:3) .By a similar reasoning, (cid:2) R R (cid:3) T is the left triangularpart of the rank profile matrix of (cid:2) A A (cid:3) T , which showsthat the algorithm is correct.Let s be the left quasiseparable order of H and s thatof I . The number of field operations to run Algorithm 2 is T ( n, s ) = αr ω − n + T LT-RPM ( n/ , s ) + T LT-RPM ( n/ , s )or a positive constant α . We will prove by induction that T ( n, s ) ≤ αs ω − n .Again, since L is lower triangular, the rank profile matrixof LA is that of A and the quasiseparable orders of thetwo matrices are the same. Now H is the matrix LA withsome rows zeroed out, hence s , the quasiseparable orderof H is no greater than that of A which is less or equalto s . Hence max( r , s , s ) ≤ s and we obtain T ( n, s ) ≤ αs ω − n + 4 αs ω − ( n/ = 2 αs ω − n . 4. MORE COMPACT GENERATORS Taking advantage of their low rank property, quasisep-arable matrices can be represented by a structured repre-sentation allowing to compute efficiently with them, as forexample in the context of QR or QZ elimination [9, 1].The most commonly used generator, as described in [8, 1]and in the introduction, represents an ( r L , r U )-quasiseparablematrix of order n by O ( n ( r L + r U )) field coefficients .Alternatively, hierarchically semiseparable representations(HSS) [18, 11] use numerical rank revealing factorizations ofthe off-diagonal blocks in a divide and conquer approach,reducing the size to O (max( r L , r U ) n log n ) [11].A third approach, based on Givens or unitary weights [4],performs another kind of elimination so as to compact thelow rank off-diagonal blocks of the input matrix.We propose, in this section, two alternative generators,based on an exact PLUQ decomposition revealing the rankprofile matrix. The first one matches the best space com-plexity of the HSS representation, and improves the timecomplexity to compute it by a reduction to fast matrix mul-tiplication. The second one also improves on the space com-plexity of HSS representation by removing the extra log n factor and shares some similarities with the unitary weightrepresentations of [4].First, remark that storing a PLUQ decomposition of rank r and dimension n × n uses 2 rn − r coefficients: each of the L and U factor has dimension n × r or r × n ; the negative r term comes from the lower and upper triangular shapesof L and U . Here again, the rank r can be larger than thequasiseparable order s thus storing directly a PLUQ decom-position is too expensive. But as in Remark 1, the settingwhere r (cid:29) s is precisely when the pivots are near the anti-diagonal, and therefore the L and U factors have an addi-tional structure, with numerous zeros. The two proposedgenerators, rely on this fact. Following the divide and conquer scheme of Algorithm 2,we propose a first generator requiring O ( n ( r L log nr L + r U log nr U )) (3)coefficients.For a left triangular matrix A = (cid:20) A A A (cid:21) , the sub-matrix A is represented by its PLUQ decomposition ( P , L , U , Q ),which requires 2 r n ≤ sn field coefficients for L and U and2 n indices for P and Q . This scheme is then recursively ap-plied for the representation of A and A . These matriceshave quasiseparable order at most s , therefore the following Note that the statement of O ( n ( r L + r U )) for the samegenerator in [9] is erroneous. recurrence relation for the size of the representation holds: (cid:26) S ( n, s ) = sn + 2 S ( n/ , s ) for s < n/ S ( n, s ) = n + 2 S ( n/ , n/ 4) for s ≥ n/ s ≥ n/ 2, it solves in S ( n, s ) = n . Then for s < n/ S ( n, s ) = sn + 2 sn/ · · · + 2 k S ( n/ k , s ), for k such that n k ≤ s < n k − , which is k = (cid:100) log ns (cid:101) . Hence S ( n, s ) = sn log ns + sn = O ( sn log ns ). The estimate (3) is obtainedby applying this generator to the upper and lower triangularparts of the ( r L , r U )-quasiseparable matrix.This first generator does not take fully advantage of therank structure of the matrix: the representation of each anti-diagonal block is independent from the pivots found in theblock A . The second generator, that will be presented inthe next section adresses this issue, in order to remove thelogarithmic factors in the estimate (3). We propose an alternative generator inspired by the gen-eralized Bruhat decomposition [13, 12, 7]. Contrarily to theformer one, it is not depending on a specific recursive cuttingof the matrix.Given a left triangular matrix A of quasiseparable order s and a PLUQ decomposition of it, revealing its rank profilematrix E , the generator consists in the three matrices L = Left( P (cid:2) L 0 (cid:3) Q ) , (4) E = Left( E ) , (5) U = Left( P (cid:20) U0 (cid:21) Q ) . (6)Lemma 4 shows that these three matrices suffice to recoverthe initial left triangular matrix. Lemma A = Left ( LE T U ) Proof. A = P (cid:2) L 0 m × ( n − r ) (cid:3) QQ T (cid:20) U0 ( n − r ) × n (cid:21) Q . FromTheorem 1, the matrix Q T (cid:20) U0 (cid:21) Q is upper triangular and thematrix P (cid:2) L 0 (cid:3) P T is lower triangular. Applying Lemma 1yields A = Left( A ) = Left( L Q T (cid:20) U0 (cid:21) Q ) = Left( L E T P (cid:20) U0 (cid:21) Q ) , where E = P (cid:2) I r (cid:3) Q . Then, as L E T is the matrix P (cid:2) L 0 (cid:3) P T with some coefficients zeroed out, it is lower triangular,hence applying again Lemma 2 yields A = Left( L E T U ) . (7)Consider any non-zero coefficient e j,i of E T that is not inits the left triangular part, i.e. j > n − i . Its contributionto the product L E T , is only of the form L k,j e j,i . Howeverthe leading coefficient in column j of P (cid:2) L (cid:3) Q is preciselyat position ( i, j ). Since i > n − j , this means that the j -thcolumn of L is all zero, and therefore e i,j has no contributionto the product. Hence we finally have A = Left( LE T U ).We now analyze the space required by this generator. Lemma Consider an n × n left triangular rank profilematrix R with quasiseparable order s . Then a left triangularmatrix L all zero except at the positions of the pivots of R and below these pivots, does not contain more than s ( n − s ) non-zero coefficients. roof. Let p ( k ) = rank( R ..k, ..n − k ). The value p ( k )indicates the number of non zero columns located in the k × n − k leading sub-matrix of L . Consequently the sum (cid:80) n − k =1 p ( k ) is an upper bound on the number of non-zero co-efficients in L . Since p ( k ) ≤ s , it is bounded by sn . More pre-cisely, there is no more than k pivots in the first k columnsand the first k rows, hence p ( k ) ≤ k and p ( n − k ) ≤ k for k ≤ s . The bound becomes s ( s +1)+( n − s − s = s ( n − s ). Corollary The generator ( L , E , U ) uses s ( n − s ) fieldcoefficients and O ( n ) additional indices. Proof. The leading column elements of L are located atthe pivot positions of the left triangular rank profile matrix E . Lemma 5 can therefore be applied to show that thismatrix occupies no more than s ( n − s ) non-zero coefficients.The same argument applies to the matrix U .Figure 1 illustrates this generator on a left triangular ma-trix of quasiseparable order 5. As the supports of L and U Figure 1: Support of the L (yellow), E (black) and U (red) matrices of the Bruhat generator for a × left triangular matrix of quasiseparable order . are disjoint, the two matrices can be shown on the same lefttriangular matrix. The pivots of E (black) are the leadingcoefficients of every non-zero row of U and non-zero columnof L . Corollary Any ( r L , r U ) -quasiseparable matrix of di-mension n × n can be represented by a generator using nomore than n ( r L + r U ) + n − r L − r U ) field elements. The sparse structure of the Bruhat generator makes it notamenable to the use of fast matrix arithmetic. We thereforepropose here a slight variation of it, that we will use insection 5 for fast complexity estimates. We will first describethis compact representation for the L factor of the Bruhatgenerator.First, remark that there exists a permutation matrix Q moving the non-zero columns of L to the first r positions,sorted by increasing leading row index, i.e. such that LQ isin column echelon form. The matrix LQ is now compacted,but still has r = rank( A ) columns, which may exceed s and thus preventing to reach complexities in terms of n and s only. We will again use the argument of Lemma 5 to producea more compact representation with only O ( ns ) non-zeroelements, stored in dense blocks. Algorithm 3 shows howto build such a representation composed of a block diagonalmatrix and a block sub-diagonal matrix, where all blockshave column dimension s : D S D S D ... ... S t D t . Algorithm 3 Compressing the Bruhat generator Input: L : the first matrix of the Bruhat generator Output: D , S , T , Q : the compression of L Q ← a permutation s.t. LQ is in column echelon form C ← LQ (cid:2) I r (cid:3) where r = rank( L ) Split C in column slices of width s . (cid:46) C = C C C ... ... ... C t C t ... C tt where C ii is k i × s . D ← Diag( C , . . . , C tt ) C ← C − D = ... ... ... C t ... C t,t − T ← I n for i = 3 . . . t do for each non zero column j of (cid:20) C i,i − ... C t,i − (cid:21) do Let k be a zero column of (cid:20) C i,i − ... C t,i − (cid:21) Move col. j in C i,i − ... C t,i − to col. k in C i,i − ... C t,i − . T ← ( I n + ∆ ( k,j ) − ∆ ( k,k ) ) × T end for end for S ← C = ... ... C t,t − Return ( D , S , T , Q ) Lemma Algorithm 3 computes a tuple ( D , S , T , Q ) where Q is a permutation matrix putting L in column echelon form, T ∈ { , } r × r , D = Diag ( D , . . . , D t ) , S = ... ... S t where each D i and S i is k i × s for k i ≥ s and (cid:80) ti =1 k i = n .This tuple is the compact Bruhat generator for L and satis-fies L = (cid:2) D + ST 0 n × ( n − r ) (cid:3) Q T . Proof. First, note that for every i , the dimensions ofthe blocks S i and D i are that of the block C ii . This blockcontains s pivots, hence k i ≥ s . We then prove that therealways exists a zero column to pick at step 10. The loci of thepossible non-zero elements in L are column segments belowa pivot and above the anti-diagonal. From Lemma 5, thesesegments have the property that each row of L is intersectedby no more than s of them. This property is preserved bycolumn permutation, and still holds on the matrix C . Inthe first row of (cid:2) C i . . . C ii (cid:3) , there is a pivot located inthe block C ii . Hence there is at most s − (cid:2) C i . . . C i,i − (cid:3) . These s − C i,i − of column dimension s . igure 2: Support of the matrices LQ (left), PU (cen-ter) and of the corresponding compact Bruhat gen-erator (right and bottom) for the matrix of Figure 1.In the compact Bruhat generator: D is in black and S in magenta and yellow; those rows and columnsmoved at step 11 of Algorithm 3 are in yellow. There only remains to show that ST is the matrix C ofstep 6. For every pair of indices ( j, k ) selected in loop 8,right multiplication by ( I n + ∆ ( k,j ) − ∆ ( k,k ) ) adds up column k to column j and zeroes out column k . On matrix S , thishas the effect of reverting each operation done at step 11 inthe reverse order of the loop 8.A compact representation of U is obtained in Lemma 7 byrunning Algorithm 3 on U T and transposing its output. Lemma There exist a tuple ( D , S , T , P ) called the com-pact Bruhat generator for U such that P is a permutationmatrix putting U in row echelon form, T ∈ { , } r × r , D = Diag ( D , . . . , D t ) , S = (cid:34) ... ... t (cid:35) where each D i and S i is s × k i for k i ≥ s and (cid:80) ti =1 k i = n and U = P T (cid:20) D + TS0 ( n − r ) × n (cid:21) . According to (7), the reconstruction of the initial ma-trix A , from the compact Bruhat generators, writes A = ( D L + S L T L ) R ( D U + T U S U ) (8)where R is the leading r × r sub-matrix of Q T E T P T . As ithas full rank, it is a permutation matrix.This factorization is a compact version of the generalizedBruhat decomposition [13, 7]: the left factor is a columnechelon form, the right factor a row echelon form. 5. COST OF COMPUTING WITH THE NEWGENERATORS5.1 Computation of the generators Let T ( n, s ) denote the cost of the computation of thebinary tree generator for an n × n matrix of order of qua-siseparability s . It satisfies the recurrence relation T ( n, s ) = K ω s ω − (cid:0) n (cid:1) + 2 T ( n/ , s ), which solves in T ( n, s ) = K ω s ω − n with K ω = 2 ω − (2 ω − ω − − C ω where C ω is the leading constant of the complexity of matrixmultiplication [5]. We propose in Algorithm 4 an evolution of Algorithm 2to compute the factors of the Bruhat generator. Algorithm 4 LT-Bruhat Input: A : an n × n matrix Output: ( L , E , U ): a Bruhat generator for the left triangularpart of A if n = 1 then return ([0] , [0] , [0]) Split A = (cid:20) A A A (cid:21) where A is (cid:98) n (cid:99) × (cid:98) n (cid:99) Decompose A = P (cid:20) L M (cid:21) (cid:2) U V (cid:3) Q (cid:46) PLUQ ( A ) R ← P (cid:20) I r (cid:21) Q where r = rank( A ). (cid:20) B B (cid:21) ← P T A (cid:46) PermR ( A , P T ) (cid:2) C C (cid:3) ← A Q T (cid:46) PermC ( A , Q T ) Here A = L \ U V B M C C . D ← L − B (cid:46) TRSM ( L , B ) E ← C U − (cid:46) TRSM ( C , U ) F ← B − M D (cid:46) MM ( B , M , D ) G ← C − EV (cid:46) MM ( C , E , V ) Here A = L \ U V DM . H ← P (cid:20) r × n F (cid:21) I ← (cid:2) r × n G (cid:3) Q ( L , E , U ) ← LT-Bruhat ( H ) ( L , E , U ) ← LT-Bruhat ( I ) L ← Left (cid:20) P I n (cid:21) L M 0E 0 (cid:20) Q I n (cid:21) + (cid:20) L L (cid:21) U ← P (cid:20) U V (cid:21) Q Left( P (cid:20) D0 (cid:21) ) + (cid:20) U U (cid:21) E ← (cid:20) E E E (cid:21) return ( L , E , U ) Theorem For any n × n matrix A with a left tri-angular part of quasiseparable order s , Algorithm 4 com-putes the Bruhat generator of the left triangular part of A in O ( s ω − n ) field operations. roof. The correctness of E is proven in Theorem 2. Wewill prove by induction the correctness of U , noting that thecorrectness of L works similarly.Let H = P L U Q and I = P L U Q be PLUQ decom-positions of H and I revealing their rank profile matrices.Assume that Algorithm LT-Bruhat is correct in the two re-cursive calls 15 and 16, that is U = Left( P (cid:20) U (cid:21) Q ) , U = Left( P (cid:20) U (cid:21) Q ) , L = Left( P (cid:2) L (cid:3) Q ) , L = Left( P (cid:2) L (cid:3) Q ) . At step 7, we have (cid:20) A A A ∗ (cid:21) = (cid:20) P I n (cid:21) L M I n − r E 0 I n × U V D0 FG (cid:20) Q I n (cid:21) As the first r rows of P T H are zeros, there exists ¯P apermutation matrix and ¯L , a lower triangular matrix, suchthat P T P L = (cid:20) r × n ¯P ¯L (cid:21) . Similarly, there exsist ¯Q , a per-mutation matrix and ¯U , an upper triangular matrix, suchthat U Q Q T = (cid:2) n × r ¯U ¯Q (cid:3) . Hence (cid:20) A A A ∗ (cid:21) = (cid:20) P P (cid:21) L M ¯P ¯L P T E 0 L × U V DQ T ¯U ¯Q (cid:20) Q Q (cid:21) Setting N = ¯P T M and W = V ¯Q T , we have (cid:20) A A A ∗ (cid:21) = P (cid:20) I r ¯P (cid:21) P L N ¯L E 0 L × U W DQ T ¯U (cid:20) I r ¯Q (cid:21) Q Q . A PLUQ of (cid:104) A A A (cid:105) revealing its rank profile matrix is thenobtained from this decomposition by a row block cylic-shifton the second factor and a column block cyclic shift on thethird factor as in [5, Algorithm 1].Finally, P (cid:20) U (cid:21) Q = (cid:20) P I n (cid:21) U V D0 ¯P U Q P ¯U ¯Q (cid:20) Q I n (cid:21) = P (cid:20) U V (cid:21) Q P (cid:20) D0 (cid:21) + P (cid:20) U (cid:21) Q P (cid:20) U (cid:21) Q . Hence Left( PUQ ) = P (cid:20) U V (cid:21) Q Left( P (cid:20) D0 (cid:21) ) + (cid:20) U U (cid:21) . The complexity analysis is exactly that of Theorem 2.The computation of a compact Bruhat generator is ob-tained by combining Algorithm 4 with Algorithm 3. For the three generators proposed earlier, the applicationof a vector to the corresponding left triangular matrix takesthe same amount of field operations as the number of co-efficients used for its representation. This yields a cost of O ( n ( r L log nr L + r U log nr U )) field operations for multiplyinga vector to an ( r L , r U )-quasiseparable matrix using the bi-nary tree PLUQ generator and O ( n ( r L + r U )) using eitherone of the Bruhat generator or its compact variant. Let T RL ( n, s ) denote the cost of multiplying a dense s × n matrix by a left triangular quasiseparable matrix of order s .The natural divide and conquer algorithm yields the recur-rence formula: T RL ( n, s ) = 2 T RL ( n/ , s ) + O ( ns ω − ) = O ( ns ω − log ns ) . Let T PL ( n, s ) denote the cost of multiplying a PLUQ de-composition of dimension n and rank s ≤ n/ s . The productcan be done in T PL ( n, s ) = T RL ( n, s ) + O ( n s ω − ) = O ( n s ω − ) . Lastly, let T LL ( n, s ) denote the cost of multiplying two left-triangular matrices of quasiseparability order s . Again thenatural recursive algorithm yields: T LL ( n, s ) = 2 T LL ( n/ , s ) + 2 T PL ( n/ , s ) + O ( n s ω − )= O ( n s ω − ) Using the decomposition (8), the product of two left tri-angular matrices writes A × B = C A R A E A × C B R B E B where C X = D L X + S L X T L X and E X = D U X + T U X S U X for X ∈ { A , B } .We will compute it using the following parenthesizing: A × B = C A ( R A ( E A × C B ) R B ) E B . (9)The product E A × C B = ( D U A + T U A S U A )( D L B + S L B T L B )only consists in multiplying together block diagonal or sub-diagonal matrices n × r B or r A × n . We will describe theproduct of two block diagonal matrices (flat times tall); theother cases with sub-diagonal matrices work similarly.Each term to be multiplied is decomposed in a grid of s × s tiles (except at the last row and column positions).In this grid, the non-zero blocks are non longer in a block-diagonal layout: in a flat matrix, the leading block of ablock row may lie at the same block column position asthe trailing block of its preceding block row, as shown inFigure 3. However, since k i ≥ s for all i , no more than two Figure 3: Aligning a block diagonal matrix (blue) onan s × s grid. Each block row of the aligned structure(red) may overlap with the previous and next blockrows on at most one s × s tile on each side. onsecutive block rows of a flat matrix lie in the same blockcolumn. Consequently these terms can be decomposed asa sum of two block diagonal matrices aligned on an s × s grid. Multiplying two such matrices costs O ( s ω − n ) whichis consequently also the cost of computing the product E A C B .After left and right multiplication by the permutations R A and R B , this r A × r B dense matrix is multiplied to the left by C A . This costs O ( nr B s ω − ). Lastly, the right multiplicationby E B of the resulting n × r A matrix costs O ( n s ω − ) whichdominates the overall cost. Decomposing each multiplicand into its upper, lower anddiagonal terms, a product of two quasiseparable matriceswrites A × B = ( L A + D A + U A )( L B + D B + U B ) . Beside thescaling by diagonal matrices, all other operations involve aproduct between any combination of lower an upper trian-gular matrices, which in turn translates into products of lefttriangular matrices and J n as shows in Table 1. The com- × Lower UpperLower J n × Left × J n × Left J n × Left × Left × J n Upper Left × J n × J n × Left Left × J n × Left × J n Table 1: Reducing products of lower and upper toproducts of left triangular matrices. plexity of section 5.3 directly applies for the computation ofUpper × Lower and Lower × Upper products. For the otherproducts, a J n factor has to be added between the E A and C B factors in the innermost product of (9). As reverting therow order of C B does not impact the cost of computing thisproduct, the same complexity applies here too. Theorem Mutliplying two quasiseparable matrices oforder respectively ( l A , u A ) and ( l B , u B ) costs O ( n s ω − ) fieldoperations where s = max( l A , u A , l B , u B ) , using either one ofthe binary tree or the compact Bruhat generator. 6. PERSPECTIVES The algorithms proposed for multiplying two quasisepara-ble matrices output a dense n × n matrix in time O ( n s ω − )for s = max( l A , u A , l B , u B ). However, the product is also aquasiseparable matrix, of order ( l A + l B , u A + u B ) [8, Theo-rem 4.1], which can be represented by a Bruhat generatorwith only O ( n ( l A + l B + u A + u B )) coefficients. A first naturalquestion is thus to find an algorithm computing this repre-sentation from the generators of A and B in time O ( ns ω − ).Second, a probabilistic algorithm [7, § 7] reduces the com-plexity of computing the rank profile matrix to O ˜( n + r ω ).It is not clear whether it can be applied to compute a com-pact Bruhat generator in time O ˜( n + max( l A , u A ) ω ). Note (added Sept. 16, 2016.) Equation (9) for the multiplication of two Bruhat genera-tors is missing the Left operators, and is therefore incorrect.The target complexities can still be obtained by slight modi-fication of the algorithm: computing the inner-most product E A × C B as an unevaluated sum of blocks products. This willbe detailed in a follow-up paper. Acknowledgment We thank Paola Boito for introducing us to the field of qua-siseparable matrices and two anonymous referees for point-ing us to the HSS and the Givens weight representations. Weacknowledge the financial support from the HPAC project(ANR 11 BS02 013) and from the OpenDreamKit Horizon2020 European Research Infrastructures project ( 7. REFERENCES [1] P. Boito, Y. Eidelman, and L. Gemignani. Implicit QRfor companion-like pencils. Math. of Computation ,85(300):1753–1774, 2016.[2] Tony F. Chan. Rank revealing QR factorizations. Linear Algebra and its Applications , 88:67–82, April1987.[3] S. Chandrasekaran and I. Ipsen. On Rank-RevealingFactorisations. SIAM Journal on Matrix Analysis andApplications , 15(2):592–622, April 1994.[4] S. Delvaux and M. Van Barel. A Givens-WeightRepresentation for Rank Structured Matrices. SIAMJ. on Matrix Analysis and Applications ,29(4):1147–1170, November 2007.[5] Jean-Guillaume Dumas, Cl´ement Pernet, and ZiadSultan. Simultaneous computation of the row andcolumn rank profiles. In Manuel Kauers, editor, Proc.ISSAC’13 , pages 181–188. ACM Press, 2013.[6] Jean-Guillaume Dumas, Cl´ement Pernet, and ZiadSultan. Computing the rank profile matrix. In Proc.ISSAC’15 , pages 149–156, New York, NY, USA, 2015.ACM.[7] Jean-Guillaume Dumas, Cl´ement Pernet, and ZiadSultan. Fast computation of the rank profile matrixand the generalized bruhat decomposition. Technicalreport, 2015. arXiv:1601.01798 .[8] Y. Eidelman and I. Gohberg. On a new class ofstructured matrices. Integral Equations and OperatorTheory , 34(3):293–324, September 1999.[9] Yuli Eidelman, Israel Gohberg, and Vadim Olshevsky.The QR iteration method for hermitian quasiseparablematrices of an arbitrary order. Linear Algebra and itsApplications , 404:305 – 324, 2005.[10] Tsung-Min Hwang, Wen-Wei Lin, and Eugene K.Yang. Rank revealing LU factorizations. LinearAlgebra and its Applications , 175:115–141, October1992.[11] K Lessel, M. Hartman, and ShivkumarChandrasekaran. A fast memory efficient constructionalgorithm for hierarchically semi-separablerepresentations. Technical report, 2015. http://scg.ece.ucsb.edu/publications/MemoryEfficientHSS.pdf.[12] Gennadi Ivanovich Malaschonok. Fast generalizedBruhat decomposition. In CASC’10 , volume 6244 of LNCS , pages 194–202. Springer-Verlag, Berlin,Heidelberg, 2010.[13] Wilfried Manthey and Uwe Helmke. Bruhat canonicalform for linear systems. Linear Algebra and itsApplications , 425(2–3):261 – 282, 2007. Special Issuein honor of Paul Fuhrmann.[14] C.-T. Pan. On the existence and computation ofrank-revealing LU factorizations. Linear Algebra andits Applications , 316(1–3):199–222, September 2000.15] The LinBox Group. LinBox: Linear algebra overblack-box matrices , v1.4.1 edition, 2016.http://linalg.org/.[16] R. Vandebril, M. Van Barel, G. Golub, andN. Mastronardi. A bibliography on semiseparablematrices. CALCOLO , 42(3):249–270, 2005.[17] Raf Vandebril, Marc Van Barel, and NicolaMastronardi. Matrix computations and semiseparablematrices: linear systems , volume 1. The JohnsHopkins University Press, 2007.[18] Jianlin Xia, Shivkumar Chandrasekaran, Ming Gu,and Xiaoye S. Li. Fast algorithms for hierarchicallysemiseparable matrices.