[PDF] Computing with quasiseparable matrices

Abstract

The class of quasiseparable matrices is defined by a pair of bounds, called the quasiseparable orders, on the ranks of the maximal sub-matrices entirely located in their strictly lower and upper triangular parts. These arise naturally in applications, as e.g. the inverse of band matrices, and are widely used for they admit structured representations allowing to compute with them in time linear in the dimension and quadratic with the quasiseparable order. We show, in this paper, the connection between the notion of quasisepa-rability and the rank profile matrix invariant, presented in [Dumas \& al. ISSAC'15]. This allows us to propose an algorithm computing the quasiseparable orders (rL, rU) in time O(n^2 s^( ω --2)) where s = max(rL, rU) and ω the exponent of matrix multiplication. We then present two new structured representations, a binary tree of PLUQ decompositions, and the Bruhat generator, using respectively O(ns log n/s) and O(ns) field elements instead of O(ns^2) for the previously known generators. We present algorithms computing these representations in time O(n^2 s^( ω --2)). These representations allow a matrix-vector product in time linear in the size of their representation. Lastly we show how to multiply two such structured matrices in time O(n^2 s^( ω --2)).

Full PDF

CComputing with Quasiseparable Matrices

Clément Pernet

Univ. Grenoble Alpes Laboratoire LIP Inria, Université de Lyon46, Allée d’Italie, F69364 Lyon Cedex 07, [email protected]

ABSTRACT

The class of quasiseparable matrices is deﬁned by a pairof bounds, called the quasiseparable orders, on the ranksof the sub-matrices entirely located in their strictly lowerand upper triangular parts. These arise naturally in ap-plications, as e.g. the inverse of band matrices, and arewidely used for they admit structured representations allow-ing to compute with them in time linear in the dimension.We show, in this paper, the connection between the notionof quasiseparability and the rank proﬁle matrix invariant,presented in [Dumas & al. ISSAC’15]. This allows us topropose an algorithm computing the quasiseparable orders( r L , r U ) in time O ( n s ω − ) where s = max( r L , r U ) and ω the exponent of matrix multiplication. We then present twonew structured representations, a binary tree of PLUQ de-compositions, and the Bruhat generator, using respectively O ( ns log ns ) and O ( ns ) ﬁeld elements instead of O ( ns ) forthe classical generator and O ( ns log n ) for the hierarchicallysemiseparable representations. We present algorithms com-puting these representations in time O ( n s ω − ). These rep-resentations allow a matrix-vector product in time linear inthe size of their representation. Lastly we show how to mul-tiply two such structured matrices in time O ( n s ω − ).

1. INTRODUCTION

The inverse of a tridiagonal matrix, when it exists, is adense matrix with the property that all sub-matrices en-tirely below or above its diagonal have rank at most one.This property and many generalizations of it, deﬁning thesemiseparable and quasiseparable matrices, have been ex-tensively studied over the past 80 years. We refer to [16]and [17] for a broad bibliographic overview on the topic.In this paper, we will focus on the class of quasiseparablematrices, introduced in [8]:

Definition An n × n matrix M is ( r L , r U ) -quasisepa-rable if its strictly lower and upper triangular parts satisfy Publication rights licensed to ACM. ACM acknowledges that this contribution wasauthored or co-authored by an employee, contractor or afﬁliate of a national govern-ment. As such, the Government retains a nonexclusive, royalty-free right to publish orreproduce this article, or to allow others to do so, for Government purposes only.

ISSAC ’16, July 19 - 22, 2016, Waterloo, ON, Canada

ACM ISBN 978-1-4503-4380-0/16/07. . . $15.00Copyright is held by the owner/author(s). Publication rights licensed to ACM.DOI: http://dx.doi.org/10.1145/2930889.2930915 the following low rank structure: for all ≤ k ≤ n − ,rank ( M k +1 ..n, ..k ) ≤ r L , (1) rank ( M ..k,k +1 ..n ) ≤ r U . (2) The values r L and r U deﬁne the quasiseparable orders of M . Quasiseparable matrices can be represented with fewerthan n coeﬃcients, using a structured representation, calleda generator. The most commonly used generator [8, 16, 17,9, 1] for a matrix M , consists of ( n −

1) pairs of vectors p ( i ) , q ( i ) of size r L , ( n −

1) pairs of vectors g ( i ) , h ( i ) of size r U , n − a ( i ) of dimension r L × r L , and n − b ( i ) of dimension r U × r U such that M i,j =  p ( i ) T a >ij q ( j ) , ≤ j < i ≤ nd ( i ) , ≤ i = j ≤ ng ( i ) T b ij = a ( i − . . . a ( j + 1) for j > i + 1, a j +1 ,j = 1,and b

1) for i > j + 1, b i,i +1 = 1. Thisrepresentation, of size O ( n ( r L + r U )) makes it possible toapply a vector in O ( n ( r L + r U )) ﬁeld operations, multiplytwo quasiseparable matrices in time O ( n max( r L , r U ) ) andalso compute the inverse in time O ( n max( r L , r U ) ) [8].The contribution of this paper, is to make the connec-tion between the notion of quasiseparability and a matrixinvariant, the rank proﬁle matrix, that we introduced in [6].More precisely, we show that the PLUQ decompositions ofthe lower and upper triangular parts of a quasiseparable ma-trix, using a certain class of pivoting strategies, also have astructure ensuring that their memory footprint and the timecomplexity to compute them does not depend on the rankof the matrix but on the quasiseparable order (which can bearbitrarily lower). Note that we will assume throughout thepaper that the PLUQ decomposition algorithms mentionedhave the ability to reveal ranks. This is the case when com-puting with exact arithmetic (e.g. ﬁnite ﬁelds or multipreci-sion rationals), but not always with ﬁnite precision ﬂoatingpoint arithmetic. In the latter context, a special care needto be taken for the pivoting of LU decompositions [10, 14],and QR or SVD decompositions are often more commonlyused [2, 3]. This study is motivated by the design of new al-gorithms on polynomial matrices where quasiseparable ma-trices naturally occur, and more generally by the frameworkof the LinBox library [15] for black-box exact linear algebra.After deﬁning and recalling the properties of the rank pro-ﬁle matrix in Section 2, we propose in Section 3 an algo-rithm computing the quasiseparable orders ( r L , r U ) in time O ( n s ω − ) where s = max( r L , r U ) and ω the exponent of a r X i v : . [ c s . S C ] S e p atrix multiplication. We then present in Section 4 twonew structured representations, a binary tree of PLUQ de-compositions, and the Bruhat generator, using respectively O ( ns log ns ) and O ( ns ) ﬁeld elements instead of O ( ns ) forthe previously known generators. We present in Section 5algorithms computing them in time O ( n s ω − ). These rep-resentations support a matrix-vector product in time linearin the size of their representation. Lastly we show how tomultiply two such structured matrices in time O ( n s ω − ).Throughout the paper, A i..j,k..l will denote the sub-matrixof A of row indices between i and j and column indicesbetween k and l . The matrix of the canonical basis, with aone at position ( i, j ) will be denoted by ∆ ( i,j ) .

2. PRELIMINARIES2.1 Left triangular matrices

We will make intensive use of matrices with non-zero el-ements only located above the main anti-diagonal. We willrefer to these matrices as left triangular, to avoid any con-fusion with upper triangular matrices.

Definition A left triangular matrix is any m × n ma-trix A such that A i,j = 0 for all i > n − j . The left triangular part of a matrix A , denoted by Left( A )will refer to the left triangular matrix extracted from it. Wewill need the following property on the left triangular partof the product of a matrix by a triangular matrix. Lemma Let A = BU be an m × n matrix where U is n × n upper triangular. Then Left ( A ) = Left ( Left ( B ) U ) . Proof.

Let ¯A = Left( A ) , ¯B = Left( B ). For j ≤ n − i, wehave ¯A i,j = (cid:80) nk =1 B i,k · U k,j = (cid:80) jk =1 B i,k · U k,j as U is uppertriangular. Now for k ≤ j ≤ n − i , B i,k = ¯B i,k , which provesthat the left triangular part of A is that of Left( B ) U .Applying Lemma 1 on A T yields Lemma 2 Lemma Let A = LB be an m × n matrix where L is m × m lower triangular. Then Left ( A ) = Left ( L Left ( B )) . Lastly, we will extend the notion of quasiseparable or-der to left triangular matrices, in the natural way: the leftquasiseparable order is the maximal rank of any leading k × ( n − k ) sub-matrix. When no confusion may occur, wewill abuse the deﬁnition and simply call it the quasiseparableorder. We will use a matrix invariant, introduced in [6, Theo-rem 1], that summarizes the information on the ranks ofany leading sub-matrices of a given input matrix.

Definition [6, Theorem 1] The rank proﬁle matrix ofan m × n matrix A of rank r is the unique m × n matrix R A ,with only r non-zero coeﬃcients, all equal to one, located ondistinct rows and columns such that any leading sub-matricesof R A has the same rank as the corresponding leading sub-matrix in A . This invariant can be computed in just one Gaussian elim-ination of the matrix A , at the cost of O ( mnr ω − ) ﬁeld oper-ations [6], provided some conditions on the pivoting strategy being used. It is obtained from the corresponding PLUQ de-composition as the product R A = P (cid:20) I r ( m − r ) × ( n − r ) (cid:21) Q . We also recall in Theorem 1 an important property of suchPLUQ decompositions revealing the rank proﬁle matrix.

Theorem 1 ([7, Th. 24], [5, Th. 1]).

Let A = PLUQ be a PLUQ decomposition revealing the rank proﬁle matrixof A . Then, P (cid:2) L 0 m × ( m − r ) (cid:3) P T is lower triangular and Q T (cid:20) U0 ( n − r ) × n (cid:21) Q is upper triangular. Lemma The rank proﬁle matrix invariant is preservedby multiplication1. to the left with an invertible lower triangular matrix,2. to the right with an invertible upper triangular matrix.

Proof.

Let B = LA for an invertible lower triangularmatrix L . Then rank( B ..i, ..j ) = rank( L ..i, ..i A ..i, ..j ) =rank( A ..i, ..j ) for any i ≤ m, j ≤ n . Hence R B = R A .

3. COMPUTING THE QUASISEPARABLEORDERS

Let M be an n × n matrix of which one want to determinethe quasiseparable orders ( r L , r U ). Let L and U be respec-tively the lower triangular part and the upper triangularpart of M .Let J n be the unit anti-diagonal matrix. Multiplying onthe left by J n reverts the row order while multiplying onthe right by J n reverts the column order. Hence both J n L and UJ n are left triangular matrices. Remark that the con-ditions (1) and (2) state that all leading k × ( n − k ) sub-matrices of J n L and UJ n have rank no greater than r L and r U respectively. We will then use the rank proﬁle matrix ofthese two left triangular matrices to ﬁnd these parameters. First, note that the rank proﬁle matrix of a left triangularmatrix is not necessarily left triangular. For example, therank proﬁle matrix of (cid:104) (cid:105) is (cid:104) (cid:105) . However, only theleft triangular part of the rank proﬁle matrix is suﬃcient tocompute the left quasiseparable orders.Suppose for the moment that the left-triangular part ofthe rank proﬁle matrix of a left triangular matrix is given(returned by a function LT-RPM). It remains to enumer-ate all leading k × ( n − k ) sub-matrices and ﬁnd the onewith the largest number of non-zero elements. Algorithm 1shows how to compute the largest rank of all leading sub-matrices of such a matrix. Run on J n L and UJ n , it returnssuccessively the quasiseparable orders r L and r U .This algorithm runs in O ( n ) provided that the rank proﬁlematrix R is stored in a compact way, e.g. using a vector of r pairs of pivot indices ([( i , j ) , . . . , ( i r , j r )]. We now deal with the missing component: computing theleft triangular part of the rank proﬁle matrix of a left trian-gular matrix. lgorithm 1

QS-order

Input: A , an n × n matrix Output: max { rank( A ..k, ..n − k ) : 1 ≤ k ≤ n − } R ← LT-RPM( A ) (cid:46) The left triangular part of the rank proﬁlematrix of A rows ← (False,. . . ,False)cols ← (False,. . . ,False) for all ( i, j ) such that R i,j = 1 do rows[i] ← Truecols[j] ← True end for s, r ← for i = 1 . . . n doif rows[ i ] then r ← r + 1 if cols[ n − i + 1] then r ← r − s ← max( s, r ) end forreturn s A ﬁrst approach is to run any Gaussian elimination algo-rithm that can reveal the rank proﬁle matrix, as describedin [6]. In particular, the PLUQ decomposition algorithmof [5] computes the rank proﬁle matrix of A in O ( n r ω − )where r = rank( A ). However this estimate is pessimisticas it does not take into account the left triangular shape ofthe matrix. Moreover, this estimate does not depend on theleft quasiseparable order s but on the rank r , which may bemuch higher. Remark The discrepancy between the rank r of a lefttriangular matrix and its quasiseparable order arises fromthe location of the pivots in its rank proﬁle matrix. Piv-ots located near the top left corner of the matrix are sharedby many leading sub-matrices, and are therefore likely con-tribute to the quasiseparable order. On the other hand, pivotsnear the anti-diagonal can be numerous, but do not add upto a large quasiseparable order. As an illustration, considerthe two following extreme cases:1. a matrix A with generic rank proﬁle. Then the leading r × r sub-matrix of A has rank r and the quasiseparableorder is s = r .2. the matrix with n − ones right above the anti-diagonal.It has rank r = n − but quasiseparable order . Remark 1 indicates that in the unlucky cases when r (cid:29) s ,the computation should reduce to instances of smaller sizes,hence a trade-oﬀ should exist between, on one hand, thediscrepency between r and s , and on the other hand, thedimension n of the problems. All contributions presented inthe remaining of the paper are based on such trade-oﬀs. In order to reach a complexity depending on s and not r ,we adapt in Algorithm 2 the tile recursive algorithm of [5],so that the left triangular structure of the input matrix ispreserved and can be used to reduce the amount of compu-tation.Algorithm 2 does not assume that the input matrix isleft triangular, as it will be called recursively with arbitrarymatrices, but guarantees to return the left triangular partof the rank proﬁle matrix. While the top left quadrant A Algorithm 2

LT-RPM: Left Triangular part of the RankProﬁle Matrix

Input: A : an n × n matrix Output: R : the left triangular part of the RPM of A if n = 1 then return [0] Split A = (cid:20) A A A (cid:21) where A is (cid:98) n (cid:99) × (cid:98) n (cid:99) Decompose A = P (cid:20) L M (cid:21) (cid:2) U V (cid:3) Q R ← P (cid:20) I r (cid:21) Q where r = rank( A ). (cid:20) B B (cid:21) ← P T A (cid:2) C C (cid:3) ← A Q T Here A =  L \ U V B M C C  . D ← L − B E ← C U − F ← B − M D G ← C − EV Here A =  L \ U V DM  . H ← P (cid:20) r × n F (cid:21) I ← (cid:2) r × n G (cid:3) Q R ← LT-RPM ( H ) R ← LT-RPM ( I ) return R ← (cid:20) R R R (cid:21) is eliminated using any PLUQ decomposition algorithm re-vealing the rank proﬁle matrix, the top right and bottomleft quadrants are handled recursively. Theorem Given an n × n input matrix A with left qua-siseparable order s , Algorithm 2 computes the left triangularpart of the rank proﬁle matrix of A in O ( n s ω − ) . Proof.

First remark that P (cid:20) DF (cid:21) = P (cid:20) L − − M L − I n − r (cid:21) P T (cid:124) (cid:123)(cid:122) (cid:125) L P (cid:20) B B (cid:21) = LA . Hence L (cid:2) A A (cid:3) = P (cid:20) (cid:2) U V (cid:3) Q D0 F (cid:21) . From Theorem 1, the matrix L is lower triangular and byLemma 3 the rank proﬁle matrix of (cid:2) A A (cid:3) equals thatof P (cid:20) (cid:2) U V (cid:3) Q D0 F (cid:21) . Now as U is upper triangu-lar and non-singular, this rank proﬁle matrix is in turnthat of P (cid:20) (cid:2) U V (cid:3) Q

00 F (cid:21) and its left triangular partis (cid:2) R R (cid:3) .By a similar reasoning, (cid:2) R R (cid:3) T is the left triangularpart of the rank proﬁle matrix of (cid:2) A A (cid:3) T , which showsthat the algorithm is correct.Let s be the left quasiseparable order of H and s thatof I . The number of ﬁeld operations to run Algorithm 2 is T ( n, s ) = αr ω − n + T LT-RPM ( n/ , s ) + T LT-RPM ( n/ , s )or a positive constant α . We will prove by induction that T ( n, s ) ≤ αs ω − n .Again, since L is lower triangular, the rank proﬁle matrixof LA is that of A and the quasiseparable orders of thetwo matrices are the same. Now H is the matrix LA withsome rows zeroed out, hence s , the quasiseparable orderof H is no greater than that of A which is less or equalto s . Hence max( r , s , s ) ≤ s and we obtain T ( n, s ) ≤ αs ω − n + 4 αs ω − ( n/ = 2 αs ω − n .

4. MORE COMPACT GENERATORS

Taking advantage of their low rank property, quasisep-arable matrices can be represented by a structured repre-sentation allowing to compute eﬃciently with them, as forexample in the context of QR or QZ elimination [9, 1].The most commonly used generator, as described in [8, 1]and in the introduction, represents an ( r L , r U )-quasiseparablematrix of order n by O ( n ( r L + r U )) ﬁeld coeﬃcients .Alternatively, hierarchically semiseparable representations(HSS) [18, 11] use numerical rank revealing factorizations ofthe oﬀ-diagonal blocks in a divide and conquer approach,reducing the size to O (max( r L , r U ) n log n ) [11].A third approach, based on Givens or unitary weights [4],performs another kind of elimination so as to compact thelow rank oﬀ-diagonal blocks of the input matrix.We propose, in this section, two alternative generators,based on an exact PLUQ decomposition revealing the rankproﬁle matrix. The ﬁrst one matches the best space com-plexity of the HSS representation, and improves the timecomplexity to compute it by a reduction to fast matrix mul-tiplication. The second one also improves on the space com-plexity of HSS representation by removing the extra log n factor and shares some similarities with the unitary weightrepresentations of [4].First, remark that storing a PLUQ decomposition of rank r and dimension n × n uses 2 rn − r coeﬃcients: each of the L and U factor has dimension n × r or r × n ; the negative r term comes from the lower and upper triangular shapesof L and U . Here again, the rank r can be larger than thequasiseparable order s thus storing directly a PLUQ decom-position is too expensive. But as in Remark 1, the settingwhere r (cid:29) s is precisely when the pivots are near the anti-diagonal, and therefore the L and U factors have an addi-tional structure, with numerous zeros. The two proposedgenerators, rely on this fact. Following the divide and conquer scheme of Algorithm 2,we propose a ﬁrst generator requiring O ( n ( r L log nr L + r U log nr U )) (3)coeﬃcients.For a left triangular matrix A = (cid:20) A A A (cid:21) , the sub-matrix A is represented by its PLUQ decomposition ( P , L , U , Q ),which requires 2 r n ≤ sn ﬁeld coeﬃcients for L and U and2 n indices for P and Q . This scheme is then recursively ap-plied for the representation of A and A . These matriceshave quasiseparable order at most s , therefore the following Note that the statement of O ( n ( r L + r U )) for the samegenerator in [9] is erroneous. recurrence relation for the size of the representation holds: (cid:26) S ( n, s ) = sn + 2 S ( n/ , s ) for s < n/ S ( n, s ) = n + 2 S ( n/ , n/

4) for s ≥ n/ s ≥ n/

2, it solves in S ( n, s ) = n . Then for s < n/ S ( n, s ) = sn + 2 sn/ · · · + 2 k S ( n/ k , s ), for k such that n k ≤ s < n k − , which is k = (cid:100) log ns (cid:101) . Hence S ( n, s ) = sn log ns + sn = O ( sn log ns ). The estimate (3) is obtainedby applying this generator to the upper and lower triangularparts of the ( r L , r U )-quasiseparable matrix.This ﬁrst generator does not take fully advantage of therank structure of the matrix: the representation of each anti-diagonal block is independent from the pivots found in theblock A . The second generator, that will be presented inthe next section adresses this issue, in order to remove thelogarithmic factors in the estimate (3). We propose an alternative generator inspired by the gen-eralized Bruhat decomposition [13, 12, 7]. Contrarily to theformer one, it is not depending on a speciﬁc recursive cuttingof the matrix.Given a left triangular matrix A of quasiseparable order s and a PLUQ decomposition of it, revealing its rank proﬁlematrix E , the generator consists in the three matrices L = Left( P (cid:2) L 0 (cid:3) Q ) , (4) E = Left( E ) , (5) U = Left( P (cid:20) U0 (cid:21) Q ) . (6)Lemma 4 shows that these three matrices suﬃce to recoverthe initial left triangular matrix. Lemma A = Left ( LE T U ) Proof. A = P (cid:2) L 0 m × ( n − r ) (cid:3) QQ T (cid:20) U0 ( n − r ) × n (cid:21) Q . FromTheorem 1, the matrix Q T (cid:20) U0 (cid:21) Q is upper triangular and thematrix P (cid:2) L 0 (cid:3) P T is lower triangular. Applying Lemma 1yields A = Left( A ) = Left( L Q T (cid:20) U0 (cid:21) Q ) = Left( L E T P (cid:20) U0 (cid:21) Q ) , where E = P (cid:2) I r (cid:3) Q . Then, as L E T is the matrix P (cid:2) L 0 (cid:3) P T with some coeﬃcients zeroed out, it is lower triangular,hence applying again Lemma 2 yields A = Left( L E T U ) . (7)Consider any non-zero coeﬃcient e j,i of E T that is not inits the left triangular part, i.e. j > n − i . Its contributionto the product L E T , is only of the form L k,j e j,i . Howeverthe leading coeﬃcient in column j of P (cid:2) L (cid:3) Q is preciselyat position ( i, j ). Since i > n − j , this means that the j -thcolumn of L is all zero, and therefore e i,j has no contributionto the product. Hence we ﬁnally have A = Left( LE T U ).We now analyze the space required by this generator. Lemma Consider an n × n left triangular rank proﬁlematrix R with quasiseparable order s . Then a left triangularmatrix L all zero except at the positions of the pivots of R and below these pivots, does not contain more than s ( n − s ) non-zero coeﬃcients. roof. Let p ( k ) = rank( R ..k, ..n − k ). The value p ( k )indicates the number of non zero columns located in the k × n − k leading sub-matrix of L . Consequently the sum (cid:80) n − k =1 p ( k ) is an upper bound on the number of non-zero co-eﬃcients in L . Since p ( k ) ≤ s , it is bounded by sn . More pre-cisely, there is no more than k pivots in the ﬁrst k columnsand the ﬁrst k rows, hence p ( k ) ≤ k and p ( n − k ) ≤ k for k ≤ s . The bound becomes s ( s +1)+( n − s − s = s ( n − s ). Corollary The generator ( L , E , U ) uses s ( n − s ) ﬁeldcoeﬃcients and O ( n ) additional indices. Proof.

The leading column elements of L are located atthe pivot positions of the left triangular rank proﬁle matrix E . Lemma 5 can therefore be applied to show that thismatrix occupies no more than s ( n − s ) non-zero coeﬃcients.The same argument applies to the matrix U .Figure 1 illustrates this generator on a left triangular ma-trix of quasiseparable order 5. As the supports of L and U Figure 1: Support of the L (yellow), E (black) and U (red) matrices of the Bruhat generator for a × left triangular matrix of quasiseparable order . are disjoint, the two matrices can be shown on the same lefttriangular matrix. The pivots of E (black) are the leadingcoeﬃcients of every non-zero row of U and non-zero columnof L . Corollary Any ( r L , r U ) -quasiseparable matrix of di-mension n × n can be represented by a generator using nomore than n ( r L + r U ) + n − r L − r U ) ﬁeld elements. The sparse structure of the Bruhat generator makes it notamenable to the use of fast matrix arithmetic. We thereforepropose here a slight variation of it, that we will use insection 5 for fast complexity estimates. We will ﬁrst describethis compact representation for the L factor of the Bruhatgenerator.First, remark that there exists a permutation matrix Q moving the non-zero columns of L to the ﬁrst r positions,sorted by increasing leading row index, i.e. such that LQ isin column echelon form. The matrix LQ is now compacted,but still has r = rank( A ) columns, which may exceed s and thus preventing to reach complexities in terms of n and s only. We will again use the argument of Lemma 5 to producea more compact representation with only O ( ns ) non-zeroelements, stored in dense blocks. Algorithm 3 shows howto build such a representation composed of a block diagonalmatrix and a block sub-diagonal matrix, where all blockshave column dimension s :  D S D S D ... ... S t D t  . Algorithm 3

Compressing the Bruhat generator

Input: L : the ﬁrst matrix of the Bruhat generator Output: D , S , T , Q : the compression of L Q ← a permutation s.t. LQ is in column echelon form C ← LQ (cid:2) I r (cid:3) where r = rank( L ) Split C in column slices of width s . (cid:46) C =  C C C ... ... ... C t C t ... C tt  where C ii is k i × s . D ← Diag( C , . . . , C tt ) C ← C − D =  ... ... ... C t ... C t,t −  T ← I n for i = 3 . . . t do for each non zero column j of (cid:20) C i,i − ... C t,i − (cid:21) do Let k be a zero column of (cid:20) C i,i − ... C t,i − (cid:21) Move col. j in  C i,i − ... C t,i −  to col. k in  C i,i − ... C t,i −  . T ← ( I n + ∆ ( k,j ) − ∆ ( k,k ) ) × T end for end for S ← C =  ... ... C t,t −  Return ( D , S , T , Q ) Lemma Algorithm 3 computes a tuple ( D , S , T , Q ) where Q is a permutation matrix putting L in column echelon form, T ∈ { , } r × r , D = Diag ( D , . . . , D t ) , S =  ... ... S t  where each D i and S i is k i × s for k i ≥ s and (cid:80) ti =1 k i = n .This tuple is the compact Bruhat generator for L and satis-ﬁes L = (cid:2) D + ST 0 n × ( n − r ) (cid:3) Q T . Proof.

First, note that for every i , the dimensions ofthe blocks S i and D i are that of the block C ii . This blockcontains s pivots, hence k i ≥ s . We then prove that therealways exists a zero column to pick at step 10. The loci of thepossible non-zero elements in L are column segments belowa pivot and above the anti-diagonal. From Lemma 5, thesesegments have the property that each row of L is intersectedby no more than s of them. This property is preserved bycolumn permutation, and still holds on the matrix C . Inthe ﬁrst row of (cid:2) C i . . . C ii (cid:3) , there is a pivot located inthe block C ii . Hence there is at most s − (cid:2) C i . . . C i,i − (cid:3) . These s − C i,i − of column dimension s . igure 2: Support of the matrices LQ (left), PU (cen-ter) and of the corresponding compact Bruhat gen-erator (right and bottom) for the matrix of Figure 1.In the compact Bruhat generator: D is in black and S in magenta and yellow; those rows and columnsmoved at step 11 of Algorithm 3 are in yellow. There only remains to show that ST is the matrix C ofstep 6. For every pair of indices ( j, k ) selected in loop 8,right multiplication by ( I n + ∆ ( k,j ) − ∆ ( k,k ) ) adds up column k to column j and zeroes out column k . On matrix S , thishas the eﬀect of reverting each operation done at step 11 inthe reverse order of the loop 8.A compact representation of U is obtained in Lemma 7 byrunning Algorithm 3 on U T and transposing its output. Lemma There exist a tuple ( D , S , T , P ) called the com-pact Bruhat generator for U such that P is a permutationmatrix putting U in row echelon form, T ∈ { , } r × r , D = Diag ( D , . . . , D t ) , S = (cid:34) ... ... t (cid:35) where each D i and S i is s × k i for k i ≥ s and (cid:80) ti =1 k i = n and U = P T (cid:20) D + TS0 ( n − r ) × n (cid:21) . According to (7), the reconstruction of the initial ma-trix A , from the compact Bruhat generators, writes A = ( D L + S L T L ) R ( D U + T U S U ) (8)where R is the leading r × r sub-matrix of Q T E T P T . As ithas full rank, it is a permutation matrix.This factorization is a compact version of the generalizedBruhat decomposition [13, 7]: the left factor is a columnechelon form, the right factor a row echelon form.

5. COST OF COMPUTING WITH THE NEWGENERATORS5.1 Computation of the generators

Let T ( n, s ) denote the cost of the computation of thebinary tree generator for an n × n matrix of order of qua-siseparability s . It satisﬁes the recurrence relation T ( n, s ) = K ω s ω − (cid:0) n (cid:1) + 2 T ( n/ , s ), which solves in T ( n, s ) = K ω s ω − n with K ω = 2 ω − (2 ω − ω − − C ω where C ω is the leading constant of the complexity of matrixmultiplication [5]. We propose in Algorithm 4 an evolution of Algorithm 2to compute the factors of the Bruhat generator.

Algorithm 4

LT-Bruhat

Input: A : an n × n matrix Output: ( L , E , U ): a Bruhat generator for the left triangularpart of A if n = 1 then return ([0] , [0] , [0]) Split A = (cid:20) A A A (cid:21) where A is (cid:98) n (cid:99) × (cid:98) n (cid:99) Decompose A = P (cid:20) L M (cid:21) (cid:2) U V (cid:3) Q (cid:46) PLUQ ( A ) R ← P (cid:20) I r (cid:21) Q where r = rank( A ). (cid:20) B B (cid:21) ← P T A (cid:46) PermR ( A , P T ) (cid:2) C C (cid:3) ← A Q T (cid:46) PermC ( A , Q T ) Here A =  L \ U V B M C C  . D ← L − B (cid:46) TRSM ( L , B ) E ← C U − (cid:46) TRSM ( C , U ) F ← B − M D (cid:46) MM ( B , M , D ) G ← C − EV (cid:46) MM ( C , E , V ) Here A =  L \ U V DM  . H ← P (cid:20) r × n F (cid:21) I ← (cid:2) r × n G (cid:3) Q ( L , E , U ) ← LT-Bruhat ( H ) ( L , E , U ) ← LT-Bruhat ( I ) L ←

Left (cid:20) P I n (cid:21)  L M

0E 0  (cid:20) Q I n (cid:21) + (cid:20) L L (cid:21) U ←  P (cid:20) U V (cid:21) Q Left( P (cid:20) D0 (cid:21) )  + (cid:20) U U (cid:21) E ← (cid:20) E E E (cid:21) return ( L , E , U ) Theorem For any n × n matrix A with a left tri-angular part of quasiseparable order s , Algorithm 4 com-putes the Bruhat generator of the left triangular part of A in O ( s ω − n ) ﬁeld operations. roof. The correctness of E is proven in Theorem 2. Wewill prove by induction the correctness of U , noting that thecorrectness of L works similarly.Let H = P L U Q and I = P L U Q be PLUQ decom-positions of H and I revealing their rank proﬁle matrices.Assume that Algorithm LT-Bruhat is correct in the two re-cursive calls 15 and 16, that is U = Left( P (cid:20) U (cid:21) Q ) , U = Left( P (cid:20) U (cid:21) Q ) , L = Left( P (cid:2) L (cid:3) Q ) , L = Left( P (cid:2) L (cid:3) Q ) . At step 7, we have (cid:20) A A A ∗ (cid:21) = (cid:20) P I n (cid:21)  L M I n − r E 0 I n  ×  U V D0 FG  (cid:20) Q I n (cid:21) As the ﬁrst r rows of P T H are zeros, there exists ¯P apermutation matrix and ¯L , a lower triangular matrix, suchthat P T P L = (cid:20) r × n ¯P ¯L (cid:21) . Similarly, there exsist ¯Q , a per-mutation matrix and ¯U , an upper triangular matrix, suchthat U Q Q T = (cid:2) n × r ¯U ¯Q (cid:3) . Hence (cid:20) A A A ∗ (cid:21) = (cid:20) P P (cid:21)  L M ¯P ¯L P T E 0 L  ×  U V DQ T ¯U ¯Q  (cid:20) Q Q (cid:21) Setting N = ¯P T M and W = V ¯Q T , we have (cid:20) A A A ∗ (cid:21) =  P (cid:20) I r ¯P (cid:21) P   L N ¯L E 0 L  ×  U W DQ T ¯U  (cid:20) I r ¯Q (cid:21) Q Q  . A PLUQ of (cid:104) A A A (cid:105) revealing its rank proﬁle matrix is thenobtained from this decomposition by a row block cylic-shifton the second factor and a column block cyclic shift on thethird factor as in [5, Algorithm 1].Finally, P (cid:20) U (cid:21) Q = (cid:20) P I n (cid:21)  U V D0 ¯P U Q P ¯U ¯Q  (cid:20) Q I n (cid:21) =  P (cid:20) U V (cid:21) Q P (cid:20) D0 (cid:21)  +  P (cid:20) U (cid:21) Q P (cid:20) U (cid:21) Q  . Hence

Left(

PUQ ) =  P (cid:20) U V (cid:21) Q Left( P (cid:20) D0 (cid:21) )  + (cid:20) U U (cid:21) . The complexity analysis is exactly that of Theorem 2.The computation of a compact Bruhat generator is ob-tained by combining Algorithm 4 with Algorithm 3.

For the three generators proposed earlier, the applicationof a vector to the corresponding left triangular matrix takesthe same amount of ﬁeld operations as the number of co-eﬃcients used for its representation. This yields a cost of O ( n ( r L log nr L + r U log nr U )) ﬁeld operations for multiplyinga vector to an ( r L , r U )-quasiseparable matrix using the bi-nary tree PLUQ generator and O ( n ( r L + r U )) using eitherone of the Bruhat generator or its compact variant. Let T RL ( n, s ) denote the cost of multiplying a dense s × n matrix by a left triangular quasiseparable matrix of order s .The natural divide and conquer algorithm yields the recur-rence formula: T RL ( n, s ) = 2 T RL ( n/ , s ) + O ( ns ω − ) = O ( ns ω − log ns ) . Let T PL ( n, s ) denote the cost of multiplying a PLUQ de-composition of dimension n and rank s ≤ n/ s . The productcan be done in T PL ( n, s ) = T RL ( n, s ) + O ( n s ω − ) = O ( n s ω − ) . Lastly, let T LL ( n, s ) denote the cost of multiplying two left-triangular matrices of quasiseparability order s . Again thenatural recursive algorithm yields: T LL ( n, s ) = 2 T LL ( n/ , s ) + 2 T PL ( n/ , s ) + O ( n s ω − )= O ( n s ω − ) Using the decomposition (8), the product of two left tri-angular matrices writes A × B = C A R A E A × C B R B E B where C X = D L X + S L X T L X and E X = D U X + T U X S U X for X ∈ { A , B } .We will compute it using the following parenthesizing: A × B = C A ( R A ( E A × C B ) R B ) E B . (9)The product E A × C B = ( D U A + T U A S U A )( D L B + S L B T L B )only consists in multiplying together block diagonal or sub-diagonal matrices n × r B or r A × n . We will describe theproduct of two block diagonal matrices (ﬂat times tall); theother cases with sub-diagonal matrices work similarly.Each term to be multiplied is decomposed in a grid of s × s tiles (except at the last row and column positions).In this grid, the non-zero blocks are non longer in a block-diagonal layout: in a ﬂat matrix, the leading block of ablock row may lie at the same block column position asthe trailing block of its preceding block row, as shown inFigure 3. However, since k i ≥ s for all i , no more than two Figure 3: Aligning a block diagonal matrix (blue) onan s × s grid. Each block row of the aligned structure(red) may overlap with the previous and next blockrows on at most one s × s tile on each side. onsecutive block rows of a ﬂat matrix lie in the same blockcolumn. Consequently these terms can be decomposed asa sum of two block diagonal matrices aligned on an s × s grid. Multiplying two such matrices costs O ( s ω − n ) whichis consequently also the cost of computing the product E A C B .After left and right multiplication by the permutations R A and R B , this r A × r B dense matrix is multiplied to the left by C A . This costs O ( nr B s ω − ). Lastly, the right multiplicationby E B of the resulting n × r A matrix costs O ( n s ω − ) whichdominates the overall cost. Decomposing each multiplicand into its upper, lower anddiagonal terms, a product of two quasiseparable matriceswrites A × B = ( L A + D A + U A )( L B + D B + U B ) . Beside thescaling by diagonal matrices, all other operations involve aproduct between any combination of lower an upper trian-gular matrices, which in turn translates into products of lefttriangular matrices and J n as shows in Table 1. The com- × Lower UpperLower J n × Left × J n × Left J n × Left × Left × J n Upper Left × J n × J n × Left Left × J n × Left × J n Table 1: Reducing products of lower and upper toproducts of left triangular matrices. plexity of section 5.3 directly applies for the computation ofUpper × Lower and Lower × Upper products. For the otherproducts, a J n factor has to be added between the E A and C B factors in the innermost product of (9). As reverting therow order of C B does not impact the cost of computing thisproduct, the same complexity applies here too. Theorem Mutliplying two quasiseparable matrices oforder respectively ( l A , u A ) and ( l B , u B ) costs O ( n s ω − ) ﬁeldoperations where s = max( l A , u A , l B , u B ) , using either one ofthe binary tree or the compact Bruhat generator.

6. PERSPECTIVES

The algorithms proposed for multiplying two quasisepara-ble matrices output a dense n × n matrix in time O ( n s ω − )for s = max( l A , u A , l B , u B ). However, the product is also aquasiseparable matrix, of order ( l A + l B , u A + u B ) [8, Theo-rem 4.1], which can be represented by a Bruhat generatorwith only O ( n ( l A + l B + u A + u B )) coeﬃcients. A ﬁrst naturalquestion is thus to ﬁnd an algorithm computing this repre-sentation from the generators of A and B in time O ( ns ω − ).Second, a probabilistic algorithm [7, §

7] reduces the com-plexity of computing the rank proﬁle matrix to O ˜( n + r ω ).It is not clear whether it can be applied to compute a com-pact Bruhat generator in time O ˜( n + max( l A , u A ) ω ). Note (added Sept. 16, 2016.)

Equation (9) for the multiplication of two Bruhat genera-tors is missing the Left operators, and is therefore incorrect.The target complexities can still be obtained by slight modi-ﬁcation of the algorithm: computing the inner-most product E A × C B as an unevaluated sum of blocks products. This willbe detailed in a follow-up paper. Acknowledgment

We thank Paola Boito for introducing us to the ﬁeld of qua-siseparable matrices and two anonymous referees for point-ing us to the HSS and the Givens weight representations. Weacknowledge the ﬁnancial support from the HPAC project(ANR 11 BS02 013) and from the OpenDreamKit Horizon2020 European Research Infrastructures project (

7. REFERENCES [1] P. Boito, Y. Eidelman, and L. Gemignani. Implicit QRfor companion-like pencils.

Math. of Computation ,85(300):1753–1774, 2016.[2] Tony F. Chan. Rank revealing QR factorizations.

Linear Algebra and its Applications , 88:67–82, April1987.[3] S. Chandrasekaran and I. Ipsen. On Rank-RevealingFactorisations.

SIAM Journal on Matrix Analysis andApplications , 15(2):592–622, April 1994.[4] S. Delvaux and M. Van Barel. A Givens-WeightRepresentation for Rank Structured Matrices.

SIAMJ. on Matrix Analysis and Applications ,29(4):1147–1170, November 2007.[5] Jean-Guillaume Dumas, Cl´ement Pernet, and ZiadSultan. Simultaneous computation of the row andcolumn rank proﬁles. In Manuel Kauers, editor,

Proc.ISSAC’13 , pages 181–188. ACM Press, 2013.[6] Jean-Guillaume Dumas, Cl´ement Pernet, and ZiadSultan. Computing the rank proﬁle matrix. In

Proc.ISSAC’15 , pages 149–156, New York, NY, USA, 2015.ACM.[7] Jean-Guillaume Dumas, Cl´ement Pernet, and ZiadSultan. Fast computation of the rank proﬁle matrixand the generalized bruhat decomposition. Technicalreport, 2015. arXiv:1601.01798 .[8] Y. Eidelman and I. Gohberg. On a new class ofstructured matrices.

Integral Equations and OperatorTheory , 34(3):293–324, September 1999.[9] Yuli Eidelman, Israel Gohberg, and Vadim Olshevsky.The QR iteration method for hermitian quasiseparablematrices of an arbitrary order.

Linear Algebra and itsApplications , 404:305 – 324, 2005.[10] Tsung-Min Hwang, Wen-Wei Lin, and Eugene K.Yang. Rank revealing LU factorizations.

LinearAlgebra and its Applications , 175:115–141, October1992.[11] K Lessel, M. Hartman, and ShivkumarChandrasekaran. A fast memory eﬃcient constructionalgorithm for hierarchically semi-separablerepresentations. Technical report, 2015. http://scg.ece.ucsb.edu/publications/MemoryEﬃcientHSS.pdf.[12] Gennadi Ivanovich Malaschonok. Fast generalizedBruhat decomposition. In

CASC’10 , volume 6244 of

LNCS , pages 194–202. Springer-Verlag, Berlin,Heidelberg, 2010.[13] Wilfried Manthey and Uwe Helmke. Bruhat canonicalform for linear systems.

Linear Algebra and itsApplications , 425(2–3):261 – 282, 2007. Special Issuein honor of Paul Fuhrmann.[14] C.-T. Pan. On the existence and computation ofrank-revealing LU factorizations.

Linear Algebra andits Applications , 316(1–3):199–222, September 2000.15] The LinBox Group.

LinBox: Linear algebra overblack-box matrices , v1.4.1 edition, 2016.http://linalg.org/.[16] R. Vandebril, M. Van Barel, G. Golub, andN. Mastronardi. A bibliography on semiseparablematrices.

CALCOLO , 42(3):249–270, 2005.[17] Raf Vandebril, Marc Van Barel, and NicolaMastronardi.

Matrix computations and semiseparablematrices: linear systems , volume 1. The JohnsHopkins University Press, 2007.[18] Jianlin Xia, Shivkumar Chandrasekaran, Ming Gu,and Xiaoye S. Li. Fast algorithms for hierarchicallysemiseparable matrices.