Computing the Rank Profile Matrix
Jean-Guillaume Dumas†, Clément Pernet‡, Ziad Sultan§

May 11, 2018
Abstract
The row (resp. column) rank profile of a matrix describes the staircase shape of its row (resp. column) echelon form. In an ISSAC'13 paper, we proposed a recursive Gaussian elimination that can compute simultaneously the row and column rank profiles of a matrix, as well as those of all of its leading sub-matrices, in the same time as state of the art Gaussian elimination algorithms. Here we first study the conditions making a Gaussian elimination algorithm reveal this information. Therefore, we propose the definition of a new matrix invariant, the rank profile matrix, summarizing all information on the row and column rank profiles of all the leading sub-matrices. We also explore the conditions for a Gaussian elimination algorithm to compute all or part of this invariant, through the corresponding PLUQ decomposition. As a consequence, we show that the classical iterative CUP decomposition algorithm can actually be adapted to compute the rank profile matrix. Used, in a Crout variant, as a base case to our ISSAC'13 implementation, it delivers a significant improvement in efficiency. Second, the row (resp. column) echelon form of a matrix is usually computed via different dedicated triangular decompositions. We show here that, from some PLUQ decompositions, it is possible to recover the row and column echelon forms of a matrix and of any of its leading sub-matrices thanks to an elementary post-processing algorithm.
Introduction

Triangular matrix decompositions are widely used in computational linear algebra. Besides solving linear systems of equations, they are also used to compute other objects more specific to exact arithmetic: computing the rank, sampling a vector from the null-space, computing echelon forms and rank profiles.

∗ This work is partly funded by the HPAC project of the French Agence Nationale de la Recherche (ANR 11 BS02 013).
† Université de Grenoble Alpes. Laboratoire LJK, umr CNRS. 51, av. des Mathématiques, F38041 Grenoble, France. [email protected].
‡ Université de Grenoble Alpes. Laboratoire LIP, Inria, CNRS, UCBL, ENS de Lyon, 46, Allée d'Italie, F69364 Lyon Cedex 07, France. [email protected].
§ Université de Grenoble Alpes. Laboratoires LJK and LIG, Inria, CNRS. Inovallée, 655, av. de l'Europe, F38334 St Ismier Cedex, France. [email protected].

The row rank profile (resp. column rank profile) of an m × n matrix A with rank r, denoted by RowRP(A) (resp. ColRP(A)), is the lexicographically smallest sequence of r indices of linearly independent rows (resp. columns) of A. An m × n matrix has generic row (resp. column) rank profile if its row (resp. column) rank profile is (1, .., r). Lastly, an m × n matrix has generic rank profile if its r first leading principal minors are non-zero. Note that if a matrix has generic rank profile, then its row and column rank profiles are generic, but the converse is false: the matrix [ [0, 1], [1, 0] ] does not have generic rank profile even though its row and column rank profiles are generic. The row support (resp. column support) of a matrix A, denoted by RowSupp(A) (resp. ColSupp(A)), is the subset of indices of its non-zero rows (resp. columns).

We recall that the row echelon form of an m × n matrix A is an upper triangular matrix E = TA, for a non-singular matrix T, with the zero rows of E at the bottom and the non-zero rows in staircase shape: min{ j : e_{i,j} ≠ 0 } < min{ j : e_{i+1,j} ≠ 0 }.
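The rank profile definitions above can be made concrete with a short sketch in Python (our own function names, exact arithmetic over Q via the standard `fractions` module). It relies on the fact that the lexicographically smallest sequence of independent row indices is obtained greedily, keeping each row that increases the rank:

```python
from fractions import Fraction

def rank(M):
    """Rank over Q, by fraction-exact Gaussian elimination."""
    A = [[Fraction(x) for x in row] for row in M]
    m, n = len(A), len(A[0]) if A else 0
    r = 0
    for j in range(n):
        piv = next((i for i in range(r, m) if A[i][j] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, m):
            f = A[i][j] / A[r][j]
            for k in range(j, n):
                A[i][k] -= f * A[r][k]
        r += 1
    return r

def row_rank_profile(M):
    """Keep row i iff it increases the rank: this yields the lexicographically
    smallest set of linearly independent row indices (1-based, as in the text)."""
    kept, profile = [], []
    for i, row in enumerate(M):
        if rank(kept + [row]) > len(kept):
            kept.append(row)
            profile.append(i + 1)
    return profile

def col_rank_profile(M):
    return row_rank_profile([list(c) for c in zip(*M)])

A = [[0, 1], [1, 0]]
print(row_rank_profile(A), col_rank_profile(A))  # [1, 2] [1, 2]
```

On the matrix [ [0, 1], [1, 0] ] both rank profiles are generic, (1, 2), even though the matrix does not have generic rank profile (its first leading minor is zero).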
As T is non-singular, the column rank profile of A is that of E, and therefore corresponds to the column indices of the leading elements in the staircase. Similarly, the row rank profile of A is composed of the row indices of the leading elements in the staircase of the column echelon form of A.

Rank profile and triangular matrix decompositions
The rank profiles of a matrix and the triangular matrix decomposition obtained by Gaussian elimination are strongly related. The elimination of matrices with arbitrary rank profiles gives rise to several matrix factorizations and many algorithmic variants. In numerical linear algebra one often uses the PLUQ decomposition, with P and Q permutation matrices, L a lower unit triangular matrix and U an upper triangular matrix. The LSP and LQUP variants of [8] are used to reduce the complexity of rank-deficient Gaussian elimination to that of matrix multiplication. Many other algorithmic decompositions exist, allowing fraction-free computations [10], in-place computations [4, 9] or sub-cubic rank-sensitive time complexity [13, 9]. In [5] we proposed a Gaussian elimination algorithm with a recursive splitting of both row and column dimensions, replacing row and column transpositions by rotations. This elimination can compute simultaneously the row and column rank profiles while preserving the sub-cubic rank-sensitive time complexity and keeping the computation in-place.

In this paper we first study the conditions a PLUQ decomposition algorithm must satisfy in order to reveal the rank profile structure of a matrix. We introduce in section 2 the rank profile matrix R_A, a normal form summarizing all rank profile information of a matrix and of all its leading sub-matrices. We then decompose, in section 3, the pivoting strategy of any PLUQ algorithm into two types of operations: the search of the pivot and the permutation used to move it to the main diagonal. We propose a new search and a new permutation strategy and show what rank profiles are computed using any possible combination of these operations and the previously used searches and permutations. In particular we show three new pivoting strategy combinations that compute the rank profile matrix and use one of them, an iterative Crout CUP with rotations, to improve the base case and thus the overall performance of exact Gaussian elimination.
Second, we show that preserving both the row and column rank profiles, together with ensuring a monotonicity of the associated permutations, allows us to compute faster several other matrix decompositions, such as the LEU and Bruhat decompositions, and echelon forms.

In the following, 0_{m×n} denotes the m × n zero matrix and A_{i..j,k..l} denotes the sub-matrix of A of rows between i and j and columns between k and l. To a permutation σ : {1, . . . , n} → {1, . . . , n} we associate the permutation matrix P_σ, permuting rows by left multiplication: the rows of P_σ A are those of A permuted by σ. Reciprocally, for a permutation matrix P, we denote by σ_P the associated permutation.

The rank profile matrix

We start by introducing in Theorem 1 the rank profile matrix, which we will use throughout this document to summarize all information on the rank profiles of a matrix. From now on, matrices are over a field K and a valid pivot is a non-zero element.
Definition 1. An r-sub-permutation matrix is a matrix of rank r with only r non-zero entries, equal to one.

Lemma 1. An m × n r-sub-permutation matrix has at most one non-zero entry per row and per column, and can be written P [ [I_r, 0], [0, 0_{(m−r)×(n−r)}] ] Q where P and Q are permutation matrices.

Theorem 1.
Let A ∈ K^{m×n}. There exists a unique m × n rank(A)-sub-permutation matrix R_A of which every leading sub-matrix has the same rank as the corresponding leading sub-matrix of A. This sub-permutation matrix is called the rank profile matrix of A.

Proof.
We prove existence by induction on the row dimension of the leading submatrices. If A_{1,1..n} = 0_{1×n}, setting R^{(1)} = 0_{1×n} satisfies the defining condition. Otherwise, let j be the index of the leftmost invertible element in A_{1,1..n} and set R^{(1)} = e_j^T, the j-th n-dimensional canonical row vector, which satisfies the defining condition. Now for a given i ∈ {1, . . . , m − 1}, suppose that there is a unique i × n rank profile matrix R^{(i)} such that rank(A_{1..i,1..j}) = rank(R^{(i)}_{1..i,1..j}) for every j ∈ {1..n}. If rank(A_{1..i+1,1..n}) = rank(A_{1..i,1..n}), then R^{(i+1)} = [ [R^{(i)}], [0_{1×n}] ]. Otherwise, consider k, the smallest column index such that rank(A_{1..i+1,1..k}) = rank(A_{1..i,1..k}) + 1, and set R^{(i+1)} = [ [R^{(i)}], [e_k^T] ].

Any leading sub-matrix of R^{(i+1)} has the same rank as the corresponding leading sub-matrix of A: first, for any leading subset of rows and columns with less than i + 1 rows, the case is covered by the induction hypothesis; second, define [ [B, u], [v^T, x] ] = A_{1..i+1,1..k}, where u, v are vectors and x is a scalar. From the definition of k, v^T is linearly dependent with the rows of B, and thus any leading sub-matrix of [ [B], [v^T] ] has the same rank as the corresponding sub-matrix of R^{(i+1)}. Similarly, from the definition of k, the same reasoning works when considering more than k columns, with a rank increment by 1.

Lastly we show that R^{(i+1)} is an r_{i+1}-sub-permutation matrix. Indeed, u is linearly dependent with the columns of B: otherwise, rank([B u]) = rank(B) + 1, and from the definition of k we would then have rank([ [B, u], [v^T, x] ]) = rank([B u]) + 1 = rank(B) + 2 = rank([ [B], [v^T] ]) + 2, which is a contradiction.
Consequently, the k-th column of R^{(i)} is all zero, and R^{(i+1)} is an r_{i+1}-sub-permutation matrix.

To prove uniqueness, suppose there exist two distinct rank profile matrices R^{(1)} and R^{(2)} for a given matrix A, and let (i, j) be the lexicographically smallest coordinates where R^{(1)}_{i,j} ≠ R^{(2)}_{i,j}. The leading (i, j) sub-matrices of R^{(1)} and R^{(2)} then coincide everywhere except at position (i, j), so that rank(R^{(1)}_{1..i,1..j}) ≠ rank(R^{(2)}_{1..i,1..j}), whereas both should equal rank(A_{1..i,1..j}), which is a contradiction.

Example 1. For instance, A = [ [0, 2], [3, 0] ] has R_A = [ [0, 1], [1, 0] ] as rank profile matrix over Q.

Remark 1.
The matrix E introduced in Malaschonok's LEU decomposition [12, Theorem 1] is in fact the rank profile matrix. There, the existence of this decomposition was only shown for m = n = 2^k, and no connection was made to the relation with ranks and rank profiles. This connection was made in [5, Corollary 1], and the existence of E was generalized to arbitrary dimensions m and n. Finally, after proving its uniqueness here, we propose this definition as a new matrix normal form. The rank profile matrix has the following properties:
Lemma 2.
Let A be a matrix.
1. R_A is diagonal if A has generic rank profile.
2. R_A is a permutation matrix if A is invertible.
3. RowRP(A) = RowSupp(R_A); ColRP(A) = ColSupp(R_A).
Moreover, for all 1 ≤ i ≤ m and 1 ≤ j ≤ n, we have:
4. RowRP(A_{1..i,1..j}) = RowSupp((R_A)_{1..i,1..j});
5. ColRP(A_{1..i,1..j}) = ColSupp((R_A)_{1..i,1..j}).

These properties show how to recover the row and column rank profiles of A and of any of its leading sub-matrices.

Ingredients of a PLUQ decomposition algorithm
Over a field, the LU decomposition generalizes to matrices with arbitrary rank profiles, using row and column permutations (in some cases such as the CUP or LSP decompositions, the row permutation is embedded in the structure of the C or S matrices). However such PLUQ decompositions are not unique and not all of them will necessarily reveal rank profiles and echelon forms. We will characterize the conditions for a PLUQ decomposition algorithm to reveal the row or column rank profile or the rank profile matrix.

We consider the four types of operations of a Gaussian elimination algorithm in the processing of the k-th pivot:

Pivot search: finding an element to be used as a pivot,

Pivot permutation: moving the pivot to the diagonal position (k, k) by column and/or row permutations,

Update: applying the elimination at position (i, j): a_{i,j} ← a_{i,j} − a_{i,k} a_{k,k}^{−1} a_{k,j},

Normalization: dividing the k-th row (resp. column) by the pivot.

Choosing how each of these operations is done, and when they are scheduled, results in an elimination algorithm. Conversely, any Gaussian elimination algorithm computing a PLUQ decomposition can be viewed as a set of specializations of each of these operations together with a scheduling. The choice of doing the normalization on rows or columns only determines which of U or L will be unit triangular. The scheduling of the updates varies depending on the type of algorithm used: iterative, recursive, slab or tiled block splitting, with right-looking, left-looking or Crout variants [2]. Neither the normalization nor the update impact the capacity to reveal rank profiles, and we will thus now focus on the pivot search and the permutations.

Choosing a search and a permutation strategy fixes the matrices P and Q of the PLUQ decomposition obtained and, as we will see, determines the ability to recover information on the rank profiles. Once these matrices are fixed, the L and the U factors are unique. We introduce the pivoting matrix.

Definition 2.
The pivoting matrix of a PLUQ decomposition A = PLUQ of rank r is the r-sub-permutation matrix

Π_{P,Q} = P [ [I_r, 0], [0, 0_{(m−r)×(n−r)}] ] Q.

The r non-zero elements of Π_{P,Q} are located at the initial positions of the pivots in the matrix A. Thus Π_{P,Q} summarizes the choices made in the search and permutation operations.
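A pivoting matrix can be compared against the rank profile matrix of Theorem 1, which admits a brute-force construction directly from its defining property: rank(A_{1..i,1..j}) counts the pivots of R_A inside the (i, j) leading sub-matrix, so entry (i, j) of R_A is a mixed difference of four leading ranks. A sketch in Python (our own names, exact rationals, quadratically many rank computations, for illustration only):

```python
from fractions import Fraction

def rank(M):
    """Rank over Q, by fraction-exact Gaussian elimination."""
    A = [[Fraction(x) for x in row] for row in M]
    m, n = len(A), len(A[0])
    r = 0
    for j in range(n):
        piv = next((i for i in range(r, m) if A[i][j] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, m):
            f = A[i][j] / A[r][j]
            for k in range(j, n):
                A[i][k] -= f * A[r][k]
        r += 1
    return r

def rank_profile_matrix(A):
    """(R_A)_{ij} = rho(i,j) - rho(i-1,j) - rho(i,j-1) + rho(i-1,j-1),
    where rho(i,j) is the rank of the (i,j) leading sub-matrix of A."""
    m, n = len(A), len(A[0])
    rho = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            rho[i][j] = rank([row[:j] for row in A[:i]])
    return [[rho[i + 1][j + 1] - rho[i][j + 1] - rho[i + 1][j] + rho[i][j]
             for j in range(n)] for i in range(m)]

print(rank_profile_matrix([[0, 2], [3, 0]]))  # [[0, 1], [1, 0]]
```

A PLUQ algorithm reveals the rank profile matrix exactly when its pivoting matrix coincides with this brute-force reference.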
Pivot search
The search operation vastly differs depending on the field of application. In numerical dense linear algebra, numerical stability is the main criterion for the selection of the pivot. In sparse linear algebra, the pivot is chosen so as to reduce the fill-in produced by the update operation. In order to reveal some information on the rank profiles, a notion of precedence has to be used: a usual way to compute the row rank profile is to search in a given row for a pivot and only move to the next row if the current row was found to be all zeros. This guarantees that each pivot will be on the first linearly independent row, and therefore the row support of Π_{P,Q} will be the row rank profile. The precedence here is that the pivot's coordinates must minimize the order for the first coordinate (the row index). As a generalization, we consider the most common preorders of the cartesian product {1, . . . , m} × {1, . . . , n} inherited from the natural orders of each of its components, and describe the corresponding search strategies, minimizing these preorders:

Row order: (i_1, j_1) ⪯_row (i_2, j_2) iff i_1 ≤ i_2: search for any invertible element in the first non-zero row.

Column order: (i_1, j_1) ⪯_col (i_2, j_2) iff j_1 ≤ j_2: search for any invertible element in the first non-zero column.

Lexicographic order: (i_1, j_1) ⪯_lex (i_2, j_2) iff i_1 < i_2, or i_1 = i_2 and j_1 ≤ j_2: search for the leftmost non-zero element of the first non-zero row.

Reverse lexicographic order: (i_1, j_1) ⪯_revlex (i_2, j_2) iff j_1 < j_2, or j_1 = j_2 and i_1 ≤ i_2: search for the topmost non-zero element of the first non-zero column.

Product order: (i_1, j_1) ⪯_prod (i_2, j_2) iff i_1 ≤ i_2 and j_1 ≤ j_2: search for any non-zero element at a position (i, j) being the only non-zero of the leading (i, j) sub-matrix.

Example 2.
Consider the matrix

[ [0, 0, 0, a, 0, b],
  [0, c, 0, d, e, f],
  [g, h, i, j, k, 0],
  [l, m, n, o, p, 0] ],

where each literal is a non-zero element. The minimal non-zero elements for each preorder are the following:

Row order: a, b
Column order: g, l
Lexicographic order: a
Reverse lexicographic order: g
Product order: a, c, g
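These search strategies can be prototyped generically: given a preorder on positions, the candidate pivots are exactly the non-zero entries that are minimal for it. A sketch (our own names; positions are 0-based here, unlike in the text):

```python
def minimal_pivots(A, leq):
    """Non-zero positions p such that no other non-zero position q
    strictly precedes p for the preorder `leq`."""
    nz = [(i, j) for i, row in enumerate(A) for j, x in enumerate(row) if x != 0]
    return [p for p in nz
            if not any(leq(q, p) and not leq(p, q) for q in nz)]

row_order = lambda p, q: p[0] <= q[0]
col_order = lambda p, q: p[1] <= q[1]
lex       = lambda p, q: p <= q                      # tuple comparison
revlex    = lambda p, q: (p[1], p[0]) <= (q[1], q[0])
product   = lambda p, q: p[0] <= q[0] and p[1] <= q[1]

A = [[0, 1, 0],
     [1, 0, 1]]
print(minimal_pivots(A, product))  # [(0, 1), (1, 0)]
```

Note that for the product order several incomparable minima may coexist, a degree of freedom that tile recursive algorithms exploit.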
Pivot permutation
The pivot permutation moves a pivot from its initial position to the leading diagonal. Besides this constraint, all possible choices are left for the remaining values of the permutation. Most often, it is done by row or column transpositions, as it clearly involves a small amount of data movement. However, these transpositions can break the precedence relations in the set of rows or columns, and can therefore prevent the recovery of the rank profile information. A pivot permutation that leaves the precedence relations unchanged will be called k-monotonically increasing.

Definition 3.
A permutation σ ∈ S_n is called k-monotonically increasing if its last n − k values form a monotonically increasing sequence.

In particular, the last n − k rows of the associated row-permutation matrix P_σ are in row echelon form. For example, the cyclic shift between indices k and i, with k < i, defined as R_{k,i} = (1, . . . , k − 1, i, k, k + 1, . . . , i − 1, i + 1, . . . , n), which we will call a (k, i)-rotation, is an elementary k-monotonically increasing permutation.

Example 3. The (1, 4)-rotation R_{1,4} = (4, 1, 2, 3) is a 1-monotonically increasing permutation. Its row permutation matrix is

[ [0, 0, 0, 1],
  [1, 0, 0, 0],
  [0, 1, 0, 0],
  [0, 0, 1, 0] ].

In fact, any (k, i)-rotation is a k-monotonically increasing permutation. Monotonically increasing permutations can be composed, as stated in Lemma 3.
Lemma 3. If σ_1 ∈ S_n is a k_1-monotonically increasing permutation and σ_2 ∈ S_{k_1} × S_{n−k_1} is a k_2-monotonically increasing permutation with k_1 < k_2, then the permutation σ_2 ∘ σ_1 is a k_2-monotonically increasing permutation.

Proof. The last n − k_2 values of σ_2 ∘ σ_1 are the image of a sub-sequence of n − k_2 values from the last n − k_1 values of σ_1 through the monotonically increasing function σ_2.

Therefore an iterative algorithm, using rotations as elementary pivot permutations, maintains the property that the permutation matrices P and Q at any step k are k-monotonically increasing. A similar property also applies with recursive algorithms.

A PLUQ decomposition reveals the row (resp. column) rank profile if it can be read from the first r values of the permutation matrix P (resp. Q). Equivalently, by Lemma 2, this means that the row (resp. column) support of the pivoting matrix Π_{P,Q} equals that of the rank profile matrix.
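Rotations and the monotonicity test are straightforward to sketch (our own function names; a permutation is given by its sequence of values, 1-based, and P_σ follows the row-permutation convention used in this text):

```python
def rotation(k, i, n):
    """The (k,i)-rotation (1-based): (1,...,k-1, i, k, k+1,..., i-1, i+1,..., n)."""
    return list(range(1, k)) + [i] + list(range(k, i)) + list(range(i + 1, n + 1))

def is_k_monotone(sigma, k):
    """k-monotonically increasing: the last n-k values increase."""
    tail = sigma[k:]
    return all(a < b for a, b in zip(tail, tail[1:]))

def perm_matrix(sigma):
    """P_sigma, such that row i of (P_sigma A) is row sigma(i) of A."""
    n = len(sigma)
    return [[1 if j + 1 == sigma[i] else 0 for j in range(n)] for i in range(n)]

sigma = rotation(1, 4, 4)
print(sigma)               # [4, 1, 2, 3]
print(perm_matrix(sigma))  # [[0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
```

The printed matrix is the one of Example 3: its last three rows are in row echelon form, as expected for a 1-monotonically increasing permutation.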
Definition 4.
The decomposition A = PLUQ reveals:
1. the row rank profile if RowSupp(Π_{P,Q}) = RowSupp(R_A),
2. the column rank profile if ColSupp(Π_{P,Q}) = ColSupp(R_A),
3. the rank profile matrix if Π_{P,Q} = R_A.

Example 4. For instance, A = [ [1, 2, 0], [3, 4, 0], [0, 0, 0] ] has rank profile matrix R_A = [ [1, 0, 0], [0, 1, 0], [0, 0, 0] ] over Q. Now the pivoting matrix obtained from a PLUQ decomposition with a pivot search operation following the row order (any column, first non-zero row) could be the matrix Π_{P,Q} = [ [0, 1, 0], [1, 0, 0], [0, 0, 0] ]. As these matrices share the same row support, the matrix Π_{P,Q} reveals the row rank profile of A.

Remark 2.
Example 4 suggests that a pivot search strategy minimizing row and column indices could be a sufficient condition to recover both row and column rank profiles at the same time, regardless of the pivot permutation. However, this is unfortunately not the case. Consider for example a search based on the lexicographic order (first non-zero column of the first non-zero row) with transposition permutations, run on the matrix A = [ [0, 0, 1], [1, 1, 0] ]. Its rank profile matrix is R_A = [ [0, 0, 1], [1, 0, 0] ], whereas the pivoting matrix could be Π_{P,Q} = [ [0, 0, 1], [0, 1, 0] ], which does not reveal the column rank profile. This is due to the fact that the column transposition performed for the first pivot changes the order in which the columns will be inspected in the search for the second pivot.

We will show that if the pivot permutations preserve the order in which the still unprocessed columns or rows appear, then the pivoting matrix will equal the rank profile matrix. This is achieved by the monotonically increasing permutations. Theorem 2 shows how the ability of a PLUQ decomposition algorithm to recover the rank profile information relates to the use of monotonically increasing permutations. More precisely, it considers an arbitrary step in a PLUQ decomposition where k pivots have been found in the elimination of an ℓ × p leading sub-matrix A_1 of the input matrix A.

Theorem 2.
Consider a partial PLUQ decomposition of an m × n matrix A:

A = P_1 [ [L, 0], [M, I_{m−k}] ] [ [U, V], [0, H] ] Q_1,

where [ [L], [M] ] is m × k lower triangular and [U V] is k × n upper triangular, and let A_1 be some ℓ × p leading sub-matrix of A, for ℓ, p ≥ k. Let H = P_2 L_2 U_2 Q_2 be a PLUQ decomposition of H. Consider the PLUQ decomposition

A = (P_1 [ [I_k, 0], [0, P_2] ]) · [ [L, 0], [P_2^T M, L_2] ] · [ [U, V Q_2^T], [0, U_2] ] · ([ [I_k, 0], [0, Q_2] ] Q_1),

whose four factors are respectively the P, L, U and Q of a PLUQ decomposition of A. Consider the following clauses:

(i) RowRP(A_1) = RowSupp(Π_{P_1,Q_1});
(ii) ColRP(A_1) = ColSupp(Π_{P_1,Q_1});
(iii) R_{A_1} = Π_{P_1,Q_1};
(iv) RowRP(H) = RowSupp(Π_{P_2,Q_2});
(v) ColRP(H) = ColSupp(Π_{P_2,Q_2});
(vi) R_H = Π_{P_2,Q_2};
(vii) P_1^T is k-monotonically increasing, or (P_1^T is ℓ-monotonically increasing and p = n);
(viii) Q_1^T is k-monotonically increasing, or (Q_1^T is p-monotonically increasing and ℓ = m).

Then:
(a) if (i) or (ii) or (iii), then H = [ [0_{(ℓ−k)×(p−k)}, ∗], [∗, ∗] ];
(b) if (vii), then ((i) and (iv)) ⇒ RowRP(A) = RowSupp(Π_{P,Q});
(c) if (viii), then ((ii) and (v)) ⇒ ColRP(A) = ColSupp(Π_{P,Q});
(d) if (vii) and (viii), then ((iii) and (vi)) ⇒ R_A = Π_{P,Q}.
Proof. Let P_1 = [P_{11} E_1] and Q_1 = [ [Q_{11}], [F_1] ], where E_1 is m × (m − k) and F_1 is (n − k) × n. On one hand we have

A = P_1 [ [L], [M] ] [U V] Q_1 + E_1 H F_1 = B + E_1 H F_1,   (1)

where B = P_1 [ [L], [M] ] [U V] Q_1. On the other hand,

Π_{P,Q} = P_1 [ [I_k, 0], [0, P_2] ] [ [I_r, 0], [0, 0_{(m−r)×(n−r)}] ] [ [I_k, 0], [0, Q_2] ] Q_1 = P_1 [ [I_k, 0], [0, Π_{P_2,Q_2}] ] Q_1 = Π_{P_1,Q_1} + E_1 Π_{P_2,Q_2} F_1.

Denote by B_1 the ℓ × p leading sub-matrix of B.

(a) The clause (i) or (ii) or (iii) implies that all k pivots of the partial elimination were found within the ℓ × p sub-matrix A_1. Hence rank(A_1) = k, we can write P_1 = [ [ [P_{11}], [0_{(m−ℓ)×k}] ]  E_1 ] and Q_1 = [ [Q_{11} 0_{k×(n−p)}], [F_1] ], and the matrix A_1 writes

A_1 = [I_ℓ 0] · A · [ [I_p], [0] ] = B_1 + [I_ℓ 0] E_1 H F_1 [ [I_p], [0] ].   (2)

Now rank(B_1) = k: as a sub-matrix of B, which has rank k, it has rank at most k, and since

B_1 = [P_{11} ([I_ℓ 0] E_1)] [ [L], [M] ] ([U V] Q_1 [ [I_p], [0] ]) = P_{11} L ([U V] Q_1 [ [I_p], [0] ]) + ([I_ℓ 0] E_1) M ([U V] Q_1 [ [I_p], [0] ]),

where the first term has rank k and the second term has a disjoint row support, it has rank exactly k. Finally, consider the term [I_ℓ 0] E_1 H F_1 [ [I_p], [0] ] of equation (2). As its row support is disjoint with that of the pivot rows of B_1, it has to be composed of rows linearly dependent with the pivot rows of B_1 to ensure that rank(A_1) = k. As its column support is disjoint with that of the pivot columns of B_1, we conclude that it must be the zero matrix. Therefore the leading (ℓ − k) × (p − k) sub-matrix of E_1 H F_1 is zero.

(b) From (a) we know that A_1 = B_1, thus RowRP(B_1) = RowRP(A_1). Recall that A = B + E_1 H F_1. No pivot row of B can be made linearly dependent by adding rows of E_1 H F_1, as the column position of the pivot is always zero in the latter matrix. For the same reason, no pivot row of E_1 H F_1 can be made linearly dependent by adding rows of B. From (i), the set of pivot rows of B is RowRP(A_1), which shows that

RowRP(A) = RowRP(A_1) ∪ RowRP(E_1 H F_1).   (3)

Let σ_{E_1} : {1..m − k} → {1..m} be the map representing the sub-permutation E_1 (i.e. such that E_1[σ_{E_1}(i), i] = 1 for all i). If P_1^T is k-monotonically increasing, the matrix E_1 has full column rank and is in column echelon form, which implies that

RowRP(E_1 H F_1) = σ_{E_1}(RowRP(H F_1)) = σ_{E_1}(RowRP(H)),   (4)

since F_1 has full row rank. If P_1^T is ℓ-monotonically increasing, we can write E_1 = [E_{11} E_{12}], where the m × (m − ℓ) matrix E_{12} is in column echelon form. If p = n, the matrix H writes H = [ [0_{(ℓ−k)×(n−k)}], [H_2] ]. Hence we have E_1 H F_1 = E_{12} H_2 F_1, which also implies RowRP(E_1 H F_1) = σ_{E_1}(RowRP(H)).

From the expression of Π_{P,Q} above, the row support of Π_{P,Q} is that of Π_{P_1,Q_1} + E_1 Π_{P_2,Q_2} F_1, which is the union of the row supports of these two terms, as they are disjoint. Under the conditions of point (b), this row support is the union of RowRP(A_1) and σ_{E_1}(RowRP(H)), which is, from (4) and (3), RowRP(A).

(c) Similarly as for point (b).

(d) From (a) we still have A_1 = B_1. Now since rank(B_1) = rank(B) = rank(A_1) = k, there is no other non-zero element in R_B than those in R_{B_1}, and R_{B_1} = R_{A_1}. The row and column supports of R_B and that of E_1 H F_1 are disjoint. Hence

R_A = R_{A_1} + R_{E_1 H F_1}.   (5)

If both P_1^T and Q_1^T are k-monotonically increasing, the matrix E_1 is in column echelon form and the matrix F_1 in row echelon form. Consequently, the matrix E_1 H F_1 is a copy of the matrix H with k zero rows and k zero columns interleaved, which does not impact the linear dependency relations between the non-zero rows and columns. As a consequence,

R_{E_1 H F_1} = E_1 R_H F_1.   (6)

Now if Q_1^T is k-monotonically increasing, P_1^T is ℓ-monotonically increasing and p = n, then, using the notations of point (b), E_1 H F_1 = E_{12} H_2 F_1 where E_{12} is in column echelon form. Thus R_{E_1 H F_1} = E_1 R_H F_1 for the same reason. The symmetric case, where Q_1^T is p-monotonically increasing and ℓ = m, works similarly. Combining equations (5) and (6) with the expression of Π_{P,Q} above gives R_A = Π_{P,Q}.

Using Theorem 2, we deduce what rank profile information is revealed by a PLUQ algorithm from the way the search and the permutation operations are done. Table 1 summarizes these results, and points to instances known in the literature implementing the corresponding type of elimination. More precisely, we first distinguish in this table the ability to compute the row or column rank profile or the rank profile matrix, but we also indicate whether the resulting PLUQ decomposition preserves the monotonicity of the rows or columns.
Indeed, some algorithm may compute the rank profile matrix but break the precedence relation between the linearly dependent rows or columns, making it unusable as a base case for a block algorithm of higher level.
Search        | Row Perm.     | Col. Perm.    | Reveals            | Monotonicity | Instance
--------------|---------------|---------------|--------------------|--------------|---------
Row order     | Transposition | Transposition | RowRP              |              | [8, 9]
Col. order    | Transposition | Transposition | ColRP              |              | [11, 9]
Lexicographic | Transposition | Transposition | RowRP              |              | [13]
Lexicographic | Transposition | Rotation      | RowRP, ColRP, R_A  | Col.         | here
Lexicographic | Rotation      | Rotation      | RowRP, ColRP, R_A  | Row, Col.    | here
Rev. lexico.  | Transposition | Transposition | ColRP              |              | [13]
Rev. lexico.  | Rotation      | Transposition | RowRP, ColRP, R_A  | Row          | here
Rev. lexico.  | Rotation      | Rotation      | RowRP, ColRP, R_A  | Row, Col.    | here
Product       | Rotation      | Transposition | RowRP              | Row          | here
Product       | Transposition | Rotation      | ColRP              | Col.         | here
Product       | Rotation      | Rotation      | RowRP, ColRP, R_A  | Row, Col.    | [5]

Table 1: Pivoting strategies revealing rank profiles
We start with iterative algorithms, where each iteration handles one pivot at a time. Here Theorem 2 is applied with k = 1, and the partial elimination represents how one pivot is being treated. The elimination of H is done by induction.

Row and column order search

The row order pivot search operation is of the form: any non-zero element in the first non-zero row. Each row is inspected in order, and a new row is considered only when the previous row is all zeros. With the notations of Theorem 2, this means that A_1 is the leading ℓ × n sub-matrix of A, where ℓ is the index of the first non-zero row of A. When the permutations P_1 and Q_1, moving the pivot from position (ℓ, j) to (k, k), are transpositions, the matrix Π_{P_1,Q_1} is the element E_{ℓ,j} of the canonical basis. Its row rank profile is (ℓ), which is that of the ℓ × n leading sub-matrix A_1. Finally, the permutation P_1^T is ℓ-monotonically increasing, and Theorem 2 case (b) can be applied to prove by induction that any such algorithm will reveal the row rank profile: RowRP(A) = RowSupp(Π_{P,Q}). The case of the column order search is similar.
Lexicographic order based pivot search
In this case the pivot search operation is of the form: first non-zero element in the first non-zero row. The lexicographic order being compatible with the row order, the above results hold when transpositions are used, and the row rank profile is revealed. If in addition column rotations are used, Q_1 = R_{1,j}, which is 1-monotonically increasing. Now Π_{P_1,Q_1} = E_{ℓ,j}, which is the rank profile matrix of the ℓ × n leading sub-matrix A_1 of A. Theorem 2 case (d) can be applied to prove by induction that any such algorithm will reveal the rank profile matrix: R_A = Π_{P,Q}. Lastly, the use of row rotations ensures that the order of the linearly dependent rows will be preserved as well. Algorithm 1 is an instance of Gaussian elimination with a lexicographic order search and rotations for row and column permutations.

The case of the reverse lexicographic order search is similar. As an example, the algorithm in [13, Algorithm 2.14] is based on a reverse lexicographic order search but with transpositions for the row permutations. Hence it only reveals the column rank profile.
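An iterative elimination combining the lexicographic search with row and column rotations can be sketched as follows, over Q with exact fractions. This is a right-looking toy version written by us (not the Crout variant of the actual implementation); it returns the pivoting matrix Π_{P,Q}, which by Theorem 2 equals R_A:

```python
from fractions import Fraction

def pluq_lex_rotations(M):
    """Eliminate with the lexicographic pivot search (leftmost non-zero of the
    first non-zero row of the trailing matrix), moving pivots to the diagonal
    by row and column rotations (cyclic shifts), so that the relative order of
    the remaining rows and columns is preserved.  Returns Pi_{P,Q}."""
    A = [[Fraction(x) for x in row] for row in M]
    m, n = len(A), len(A[0])
    rowidx, colidx = list(range(m)), list(range(n))  # original row/col indices
    pivots, r = [], 0
    while True:
        pos = next(((i, j) for i in range(r, m) for j in range(r, n)
                    if A[i][j] != 0), None)
        if pos is None:
            break
        i, j = pos
        # (r+1, i+1)- and (r+1, j+1)-rotations bring the pivot to (r, r)
        A.insert(r, A.pop(i)); rowidx.insert(r, rowidx.pop(i))
        for row in A:
            row.insert(r, row.pop(j))
        colidx.insert(r, colidx.pop(j))
        pivots.append((rowidx[r], colidx[r]))
        for k in range(r + 1, m):                    # right-looking update
            f = A[k][r] / A[r][r]
            for l in range(r, n):
                A[k][l] -= f * A[r][l]
        r += 1
    Pi = [[0] * n for _ in range(m)]
    for i, j in pivots:
        Pi[i][j] = 1
    return Pi

print(pluq_lex_rotations([[0, 0, 1], [1, 1, 0]]))  # [[0, 0, 1], [1, 0, 0]]
```

On the matrix [ [0, 0, 1], [1, 1, 0] ], where transpositions lose the column rank profile, the rotation-based version recovers the full rank profile matrix.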
Product order based pivot search
The search here consists in finding any non-zero element A_{ℓ,p} such that the ℓ × p leading sub-matrix A_1 of A is all zeros except this coefficient. If the row and column permutations are the rotations R_{1,ℓ} and R_{1,p}, we have Π_{P_1,Q_1} = E_{ℓ,p} = R_{A_1}. Theorem 2 case (d) can be applied to prove by induction that any such algorithm will reveal the rank profile matrix: R_A = Π_{P,Q}. An instance of such an algorithm is given in [5, Algorithm 2]. If P_1 (resp. Q_1) is a transposition, then Theorem 2 case (c) (resp. case (b)) applies to show by induction that the column (resp. row) rank profile is revealed.

A recursive Gaussian elimination algorithm can either split one of the row or column dimensions, cutting the matrix in wide or tall rectangular slabs, or split both dimensions, leading to a decomposition into tiles.
Slab recursive algorithms
Most algorithms computing rank profiles are slab recursive [8, 11, 13, 9]. When the row dimension is split, the search space for pivots is the whole set of columns, and Theorem 2 applies with p = n. This corresponds to either a row or a lexicographic order. From case (b), one shows that, with transpositions, the algorithm recovers the row rank profile, provided that the base case does. If, in addition, the elementary column permutations are rotations, then case (d) applies and the rank profile matrix is recovered. Finally, if rows are also permuted by monotonically increasing permutations, then the PLUQ decomposition also respects the monotonicity of the linearly dependent rows and columns. The same reasoning holds when splitting the column dimension.

Tile recursive algorithms
Tile recursive Gaussian elimination algorithms [5, 12, 6] are more involved, especially when dealing with rank deficiencies, and we refer to [5] for a detailed description of such an algorithm. Here, the search area A_1 has arbitrary dimensions ℓ × p, often specialized as m/2 × n/2. As a consequence, the pivot search cannot satisfy any of the row, column, lexicographic or reverse lexicographic orders. Now, if the pivots selected in the elimination of A_1 minimize the product order, then they necessarily also respect this order as pivots of the whole matrix A. Now, from (a), the remaining matrix H writes

H = [ [0_{(ℓ−k)×(p−k)}, H_{12}], [H_{21}, H_{22}] ],

and its elimination is done by two independent eliminations on the blocks H_{12} and H_{21}, followed by some update of H_{22} and a last elimination on it. Here again, pivots minimizing the product order on H_{12} and H_{21} are also pivots minimizing this order for H, and so are those of the fourth elimination. Now the block row and column permutations used in [5, Algorithm 1] to form the PLUQ decomposition are r-monotonically increasing. Hence, from case (d), the algorithm computes the rank profile matrix and preserves the monotonicity. If only one of the row or column permutations are rotations, then case (b) or (c) applies to show that either the row or the column rank profile is computed.

The LEU decomposition introduced in [12] involves a lower triangular matrix L, an upper triangular matrix U and an r-sub-permutation matrix E.

Theorem 3.
Let A = PLUQ be a PLUQ decomposition revealing the rank profile matrix (Π_{P,Q} = R_A). Then an LEU decomposition of A with E = R_A is obtained as follows (only using row and column permutations):

A = (P [L 0_{m×(m−r)}] P^T) · (P [ [I_r, 0], [0, 0_{(m−r)×(n−r)}] ] Q) · (Q^T [ [U], [0_{(n−r)×n}] ] Q),   (7)

where the three parenthesized factors are, respectively, the L, E and U of the LEU decomposition (the L and U inside them being the PLUQ factors).

Proof.
First, E = P [ [I_r, 0], [0, 0] ] Q = Π_{P,Q} = R_A. Then there only remains to show that the first factor, L̄ = P [L 0_{m×(m−r)}] P^T, is lower triangular, and that the last factor, Ū = Q^T [ [U], [0_{(n−r)×n}] ] Q, is upper triangular. Suppose that L̄ is not lower triangular, and let i be the first row index such that L̄_{i,j} ≠ 0 for some i < j. First, j ∈ RowRP(A), since the non-zero columns in L̄ are placed according to the first r values of P. Remarking that A = P [L 0_{m×(n−r)}] [ [U], [0_{(n−r)×r} I_{n−r}] ] Q, and since right multiplication by a non-singular matrix does not change row rank profiles, we deduce that RowRP(A) = RowRP(L̄). If i ∉ RowRP(A), then the i-th row of L̄ is linearly dependent with the previous rows, but none of them has a non-zero element in column j > i. Hence i ∈ RowRP(A).

Let (a, b) be the position of the coefficient L̄_{i,j} in L, that is a = σ_P^{−1}(i), b = σ_P^{−1}(j). Let also s = σ_Q(a) and t = σ_Q(b), so that the pivots at diagonal positions a and b in L respectively correspond to ones in R_A at positions (i, s) and (j, t). Consider the ℓ × p leading sub-matrix A_1 of A, where ℓ = max_{x=1..a−1} σ_P(x) and p = max_{x=1..a−1} σ_Q(x). On one hand, (j, t) is an index position in A_1 but not (i, s), since otherwise rank(A_1) = b. Therefore (i, s) ⊀_prod (j, t), and s > t as i < j. As the coefficients (j, t) and (i, s) are pivots in R_A, with i < j and t < s, there cannot be a non-zero element above (j, t) at row i when it is chosen as a pivot. Hence L̄_{i,j} = 0, a contradiction; thus L̄ is lower triangular. The same reasoning applies to show that Ū is upper triangular.

Remark 3.
Note that the LEU decomposition with $E = R_A$ is not unique, even for invertible matrices. As a counter-example, the following decomposition holds for any $a \in K$:
$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ a & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & -a \\ 0 & 1 \end{bmatrix} \qquad (8)$$
The Bruhat decomposition, which inspired Malaschonok's LEU decomposition [12], is another decomposition with a central permutation matrix [1, 7].
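Both the construction of equation (7) and the non-uniqueness example (8) can be checked mechanically. The NumPy sketch below is our own illustration, not one from the paper: it applies the conjugations of (7) to a hand-picked 3×3 rank-2 PLUQ decomposition satisfying $\Pi_{P,Q} = R_A$, then verifies (8) for several integer values of $a$:

```python
import numpy as np

# --- Equation (7): LEU factors from a PLUQ decomposition revealing R_A. ---
# Hand-picked rank-2 example (ours): A = P L U Q with Pi_{P,Q} = R_A.
A = np.array([[0, 0, 1],
              [2, 0, 0],
              [0, 0, 3]])
P = np.eye(3, dtype=int)                    # row permutation (identity here)
L = np.array([[1, 0], [0, 1], [3, 0]])      # m x r lower part
U = np.array([[1, 0, 0], [0, 2, 0]])        # r x n upper part
Q = np.array([[0, 0, 1],                    # column permutation: pivot
              [1, 0, 0],                    # columns are 3, then 1
              [0, 1, 0]])
m, n, r = 3, 3, 2
assert np.array_equal(P @ L @ U @ Q, A)     # a valid PLUQ decomposition

Lbar = P @ np.hstack([L, np.zeros((m, m - r), dtype=int)]) @ P.T
E = P @ np.pad(np.eye(r, dtype=int), ((0, m - r), (0, n - r))) @ Q
Ubar = Q.T @ np.vstack([U, np.zeros((n - r, n), dtype=int)]) @ Q

assert np.array_equal(Lbar @ E @ Ubar, A)   # an LEU decomposition of A ...
assert np.array_equal(Lbar, np.tril(Lbar))  # ... with Lbar lower triangular,
assert np.array_equal(Ubar, np.triu(Ubar))  # Ubar upper triangular, and
assert np.array_equal(E, np.array([[0, 0, 1], [1, 0, 0], [0, 0, 0]]))  # E = R_A

# --- Equation (8): the LEU decomposition with E = R_A is not unique. ---
E2 = np.array([[0, 1], [1, 0]])
for a in range(-3, 4):
    L2 = np.array([[1, 0], [a, 1]])         # lower triangular
    U2 = np.array([[1, -a], [0, 1]])        # upper triangular
    assert np.array_equal(L2 @ E2 @ U2, E2)
print("equations (7) and (8) verified")
```
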
Theorem 4 ([1]). Any invertible matrix $A$ can be written as $A = VPU$ for $V$ and $U$ upper triangular invertible matrices and $P$ a permutation matrix. The latter decomposition is called the Bruhat decomposition of $A$.

It was then naturally extended to singular square matrices by [7]. Corollary 1 generalizes it to matrices with arbitrary dimensions, and relates it to the PLUQ decomposition.

Corollary 1.
Any $m \times n$ matrix of rank $r$ has a $VPU$ decomposition, where $V$ and $U$ are upper triangular matrices and $P$ is an $r$-sub-permutation matrix.

Proof. Let $J_m$ be the unit anti-diagonal matrix of order $m$. From the LEU decomposition of $J_m A$, we have $A = \underbrace{J_m L J_m}_{V} \underbrace{J_m E}_{P} U$, where $V = J_m L J_m$ is upper triangular.

The LUP decomposition $A = LUP$ only exists for matrices with generic row rank profile (including matrices with full row rank). Corollary 2 shows under which condition the permutation matrix $P$ equals the rank profile matrix $R_A$. Note that although the rank profile of $A$ is trivial in such cases, the matrix $R_A$ still carries important information on the row and column rank profiles of all leading sub-matrices of $A$.

Corollary 2.
Let $A$ be an $m \times n$ matrix.

If $A$ has generic column rank profile, then any PLU decomposition $A = PLU$ computed using reverse lexicographic order search and row rotations is such that $R_A = P \begin{bmatrix} I_r & \\ & 0 \end{bmatrix}$. In particular, $P = R_A$ if $r = m$.

If $A$ has generic row rank profile, then any LUP decomposition $A = LUP$ computed using lexicographic order search and column rotations is such that $R_A = \begin{bmatrix} I_r & \\ & 0 \end{bmatrix} P$. In particular, $P = R_A$ if $r = n$.

Proof. Consider $A$ with generic column rank profile. From Table 1, any PLUQ decomposition algorithm with a reverse lexicographic order based search and a rotation based row permutation is such that $\Pi_{P,Q} = P \begin{bmatrix} I_r & \\ & 0 \end{bmatrix} Q = R_A$. Since the search follows the reverse lexicographic order and the matrix has generic column rank profile, no column will be permuted in this elimination, and therefore $Q = I_n$. The same reasoning holds when $A$ has generic row rank profile.

Note that the $L$ and $U$ factors in a PLU decomposition are uniquely determined by the permutation $P$. Hence, when the matrix has full row rank, $P = R_A$ and the decomposition $A = R_A LU$ is unique. Similarly, the decomposition $A = LU R_A$ is unique when the matrix has full column rank. Now, when the matrix is rank deficient with generic row rank profile, there is no longer a unique PLU decomposition revealing the rank profile matrix: any permutation applied to the last $m - r$ columns of $P$ and the last $m - r$ rows of $L$ yields a PLU decomposition where $R_A = P \begin{bmatrix} I_r & \\ & 0 \end{bmatrix}$.

Lastly, we remark that the only situation where the rank profile matrix $R_A$ can be read directly as a sub-matrix of $P$ or $Q$ is as in Corollary 2, when the matrix $A$ has generic row or column rank profile. Consider a PLUQ decomposition $A = PLUQ$ revealing the rank profile matrix ($R_A = P \begin{bmatrix} I_r & \\ & 0 \end{bmatrix} Q$) such that $R_A$ is a sub-matrix of $P$. This means that $P = R_A + S$ where $S$ has row and column support disjoint from that of $R_A$.
We have $R_A = (R_A + S) \begin{bmatrix} I_r & \\ & 0 \end{bmatrix} Q = (R_A + S) \begin{bmatrix} Q_1 \\ 0_{(n-r)\times n} \end{bmatrix}$, where $Q_1$ is formed by the first $r$ rows of $Q$. Hence $R_A \left( I_n - \begin{bmatrix} Q_1 \\ 0_{(n-r)\times n} \end{bmatrix} \right) = S \begin{bmatrix} Q_1 \\ 0_{(n-r)\times n} \end{bmatrix}$, but the row supports of these two matrices are disjoint; hence both sides are zero, and in particular $R_A \begin{bmatrix} 0 \\ I_{n-r} \end{bmatrix} = 0$, which implies that $A$ has generic column rank profile. Similarly, one shows that $R_A$ can be a sub-matrix of $Q$ only if $A$ has generic row rank profile.

In our previous contribution [5], we identified the ability to recover the rank profile matrix via the use of the product order search and of rotations. We therefore proposed an implementation combining a tile recursive algorithm and an iterative base case, using these search and permutation strategies. The analysis of sections 4 and 5 shows that other pivoting strategies can be used to compute the rank profile matrix and preserve the monotonicity. We present here a new base case algorithm and its implementation over a finite field that we wrote in the
FFLAS-FFPACK library. It is based on a lexicographic order search and on row and column rotations. Moreover, the schedule of the update operations is that of a Crout elimination, for it reduces the number of modular reductions, as shown in [3].

Algorithm 1
Crout variant of PLUQ with lexicographic search and column rotations

k ← 1
for i = 1 . . . m do
    A_{i,k..n} ← A_{i,k..n} − A_{i,1..k−1} × A_{1..k−1,k..n}
    if A_{i,k..n} = 0 then
        Loop to next iteration
    end if
    Let A_{i,s} be the left-most non-zero element of row i
    A_{i+1..m,s} ← A_{i+1..m,s} − A_{i+1..m,1..k−1} × A_{1..k−1,s}
    A_{i+1..m,s} ← A_{i+1..m,s} / A_{i,s}
    Bring A_{∗,s} to A_{∗,k} by column rotation
    Bring A_{i,∗} to A_{k,∗} by row rotation
    k ← k + 1
end for

(Experiments: FFLAS-FFPACK revision 1193, http://linalg.org/projects/fflas-ffpack, linked against OpenBLAS-v0.2.8.)

[Plot of Figure 1: effective Gfops of the PLUQ base cases mod 131071, rank = n/2, on an i5-3320 at 2.6GHz, for the variants (1) Recursive→Crout Lexicographic, (2) Recursive→Left-looking Product, (3) Crout Lexicographic, (4) Left-looking Product, (5) Right-looking Product, (6) Pure Recursive.]
Figure 1: Computation speed of PLUQ decomposition base cases.

In the following experiments, we measured the real time of the computation, averaged over 10 instances (100 instances for the smaller dimensions), of $n \times n$ matrices with rank $r = n/2$, for $n$ between 20 and 700. In order to ensure that the row and column rank profiles of these matrices are uniformly random, we construct them as the product $A = LRU$, where $L$ and $U$ are random non-singular lower and upper triangular matrices and $R$ is an $m \times n$ $r$-sub-permutation matrix whose non-zero element positions are chosen uniformly at random. The effective speed is obtained by dividing an estimate of the arithmetic cost ($2mnr + \frac{2}{3}r^3 - r^2(m + n)$) by the computation time.

Figure 1 shows the computation speed of this base case (3), compared to that of the pure recursive algorithm (6) and to our previous base case algorithm [5], using a product order search and either a left-looking (4) or a right-looking (5) schedule. At $n = 200$, the left-looking variant (4) improves over the right-looking variant (5) by a factor of about 2.14, as it performs fewer modular reductions. The Crout variant (3) then improves over variant (4) by a factor of about 3.15. Lastly, we also show the speed of the final implementation, formed by the tile recursive algorithm cascading to either the Crout base case (1) or the left-looking one (2). The threshold where the cascading to the base case occurs is experimentally set to its optimal value, i.e. 200 for variant (1) and 70 for variant (2). This illustrates that the gain in base case efficiency leads to a higher threshold, and improves the efficiency of the cascade implementation (by an additive gain of about 2.2 effective Gfops in the range of dimensions considered).
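The two ingredients of these experiments, generating $A = LRU$ and running a rank-profile-revealing base-case elimination, can be sketched in Python. This is our own simplified model, not the FFLAS-FFPACK code: a plain left-looking row update with lexicographic pivot search stands in for the Crout schedule, the pivot bookkeeping replaces explicit rotations, and $p$ is assumed prime so that inverses come from Fermat's little theorem. Since the generated $L$ and $U$ are unit triangular, every leading sub-matrix of $A = LRU$ has the same rank as the corresponding leading sub-matrix of $R$, so the elimination must recover exactly $R$:

```python
import random

def rank_profile_matrix(A, p):
    """Rank profile matrix of A over GF(p), p prime (simplified sketch:
    lexicographic pivot search with left-looking, Crout-like row updates)."""
    m, n = len(A), len(A[0])
    pivots, pivot_rows = [], []               # pivot positions, reduced rows
    for i in range(m):
        row = [x % p for x in A[i]]
        for (_, s), prow in zip(pivots, pivot_rows):
            f = row[s]                        # delayed update by earlier pivots
            if f:
                row = [(x - f * y) % p for x, y in zip(row, prow)]
        if any(row):
            s = next(j for j, x in enumerate(row) if x)   # left-most non-zero
            inv = pow(row[s], p - 2, p)       # modular inverse (p prime)
            pivots.append((i, s))
            pivot_rows.append([x * inv % p for x in row])
    R = [[0] * n for _ in range(m)]
    for i, s in pivots:
        R[i][s] = 1
    return R

def random_instance(m, n, r, p, rng):
    """A = L R U over GF(p): random unit triangular L, U and a random
    m x n r-sub-permutation matrix R, as in the experiments; then R_A = R."""
    L = [[rng.randrange(p) if j < i else int(i == j) for j in range(m)]
         for i in range(m)]
    U = [[rng.randrange(p) if j > i else int(i == j) for j in range(n)]
         for i in range(n)]
    R = [[0] * n for _ in range(m)]
    for i, j in zip(rng.sample(range(m), r), rng.sample(range(n), r)):
        R[i][j] = 1
    mul = lambda X, Y: [[sum(a * b for a, b in zip(rx, cy)) % p
                         for cy in zip(*Y)] for rx in X]
    return mul(mul(L, R), U), R

A, R = random_instance(7, 6, 4, 101, random.Random(42))
assert rank_profile_matrix(A, 101) == R
print("rank profile matrix of A = L R U equals R")
```
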
Usual algorithms computing an echelon form [13, 9] use a slab block decomposition (with row or lexicographic order search), which implies that pivots appear in the order of the echelon form. The column echelon form is then simply obtained as $C = PL$ from the PLUQ decomposition. Using a product order search, this is no longer true, and the order of the columns in $L$ may not be that of the echelon form. Algorithm 2 shows how to recover the echelon form in such cases. Note that both the row and the column echelon forms can thus be computed from the same PLUQ decomposition.

Algorithm 2 Echelon form from a PLUQ decomposition
Input: $P, L, U, Q$, a PLUQ decomposition of $A$ with $R_A = \Pi_{P,Q}$
Output: $C$: the column echelon form of $A$
1: $C \leftarrow PL$
2: $(p_1, .., p_r) \leftarrow \mathrm{Sort}(\sigma_P(1), .., \sigma_P(r))$
3: $\tau \leftarrow (\sigma_P^{-1}(p_1), .., \sigma_P^{-1}(p_r), r + 1, .., m)$
4: $C \leftarrow C P_\tau$

Lastly, the column echelon form of the $i \times j$ leading sub-matrix is computed by removing the rows of $PL$ below index $i$ and filtering out the pivots of column index greater than $j$. The latter is achieved by replacing line 2 with $(p_1, .., p_s) \leftarrow \mathrm{Sort}(\{\sigma_P(k) : \sigma_Q(k) \leq j\})$.

References

[1] N. Bourbaki.
Groupes et Algèbres de Lie, Chapters 4–6. Elements of Mathematics. Springer, 2008.

[2] J. J. Dongarra, I. S. Duff, D. C. Sorensen, and H. A. van der Vorst. Numerical Linear Algebra for High Performance Computers. SIAM, 1998.

[3] J.-G. Dumas, T. Gautier, C. Pernet, and Z. Sultan. Parallel computation of echelon forms. In Euro-Par 2014 Parallel Processing, volume 8632 of LNCS, pages 499–510. Springer, 2014. doi:10.1007/978-3-319-09873-9_42.

[4] J.-G. Dumas, P. Giorgi, and C. Pernet. Dense linear algebra over prime fields. ACM TOMS, 35(3):1–42, Nov. 2008. doi:10.1145/1391989.1391992.

[5] J.-G. Dumas, C. Pernet, and Z. Sultan. Simultaneous computation of the row and column rank profiles. In M. Kauers, editor, Proc. ISSAC'13. ACM Press, 2013. URL: http://hal.archives-ouvertes.fr/hal-00778136, doi:10.1145/2465506.2465517.

[6] J.-G. Dumas and J.-L. Roch. On parallel block algorithms for exact triangularizations. Parallel Computing, 28(11):1531–1548, Nov. 2002. doi:10.1016/S0167-8191(02)00161-8.

[7] D. Y. Grigor'ev. Analogy of Bruhat decomposition for the closure of a cone of Chevalley group of a classical series. Soviet Mathematics Doklady, 23(2):393–397, 1981.

[8] O. H. Ibarra, S. Moran, and R. Hui. A generalization of the fast LUP matrix decomposition algorithm and applications. J. of Algorithms, 3(1):45–56, 1982. doi:10.1016/0196-6774(82)90007-4.

[9] C.-P. Jeannerod, C. Pernet, and A. Storjohann. Rank-profile revealing Gaussian elimination and the CUP matrix decomposition. J. Symbolic Comput., 56:46–68, 2013. doi:10.1016/j.jsc.2013.04.004.

[10] D. J. Jeffrey. LU factoring of non-invertible matrices. ACM Comm. Comp. Algebra, 44(1/2):1–8, July 2010. doi:10.1145/1838599.1838602.

[11] W. Keller-Gehrig. Fast algorithms for the characteristic polynomial. Th. Comp. Science, 36:309–317, 1985. doi:10.1016/0304-3975(85)90049-0.

[12] G. I. Malaschonok. Fast generalized Bruhat decomposition. In CASC'10, volume 6244 of LNCS, pages 194–202. Springer-Verlag, Berlin, Heidelberg, 2010.

[13] A. Storjohann. Algorithms for Matrix Canonical Forms. PhD thesis, ETH-Zentrum, Zürich, Switzerland, Nov. 2000. doi:10.3929/ethz-a-004141007.