Computing the Rank Profile Matrix
Jean-Guillaume Dumas†, Clément Pernet‡, Ziad Sultan§

May 11, 2018
Abstract
The row (resp. column) rank profile of a matrix describes the staircase shape of its row (resp. column) echelon form. In an ISSAC'13 paper, we proposed a recursive Gaussian elimination that can compute simultaneously the row and column rank profiles of a matrix, as well as those of all of its leading sub-matrices, in the same time as state of the art Gaussian elimination algorithms. Here we first study the conditions making a Gaussian elimination algorithm reveal this information. Therefore, we propose the definition of a new matrix invariant, the rank profile matrix, summarizing all information on the row and column rank profiles of all the leading sub-matrices. We also explore the conditions for a Gaussian elimination algorithm to compute all or part of this invariant, through the corresponding PLUQ decomposition. As a consequence, we show that the classical iterative CUP decomposition algorithm can actually be adapted to compute the rank profile matrix. Used, in a Crout variant, as a base case to our ISSAC'13 implementation, it delivers a significant improvement in efficiency. Second, the row (resp. column) echelon form of a matrix is usually computed via different dedicated triangular decompositions. We show here that, from some PLUQ decompositions, it is possible to recover the row and column echelon forms of a matrix and of any of its leading sub-matrices thanks to an elementary post-processing algorithm.
Introduction

Triangular matrix decompositions are widely used in computational linear algebra. Besides solving linear systems of equations, they are also used to compute other objects more specific to exact arithmetic: computing the rank, sampling a vector from the null-space, computing echelon forms and rank profiles.

∗ This work is partly funded by the HPAC project of the French Agence Nationale de la Recherche (ANR 11 BS02 013).
† Université de Grenoble Alpes. Laboratoire LJK, umr CNRS. 51, av. des Mathématiques, F38041 Grenoble, France. [email protected].
‡ Université de Grenoble Alpes. Laboratoire LIP, Inria, CNRS, UCBL, ENS de Lyon, 46, Allée d'Italie, F69364 Lyon Cedex 07, France. [email protected].
§ Université de Grenoble Alpes. Laboratoires LJK and LIG, Inria, CNRS. Inovallée, 655, av. de l'Europe, F38334 St Ismier Cedex, France. [email protected].

The row rank profile (resp. column rank profile) of an m × n matrix A with rank r, denoted by RowRP(A) (resp. ColRP(A)), is the lexicographically smallest sequence of r indices of linearly independent rows (resp. columns) of A. An m × n matrix has generic row (resp. column) rank profile if its row (resp. column) rank profile is (1, .., r). Lastly, an m × n matrix has generic rank profile if its r first leading principal minors are non-zero. Note that if a matrix has generic rank profile, then its row and column rank profiles are generic, but the converse is false: the matrix [ [0, 1], [1, 0] ] does not have generic rank profile even though its row and column rank profiles are generic. The row support (resp. column support) of a matrix A, denoted by RowSupp(A) (resp. ColSupp(A)), is the subset of indices of its non-zero rows (resp. columns).

We recall that the row echelon form of an m × n matrix A is an upper triangular matrix E = TA, for a non-singular matrix T, with the zero rows of E at the bottom and the non-zero rows in staircase shape: min{ j : e_{i,j} ≠ 0 } < min{ j : e_{i+1,j} ≠ 0 }.
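The rank profile definitions above can be made concrete with a short sketch in Python (our own function names, exact arithmetic over Q via the standard `fractions` module). It relies on the fact that the lexicographically smallest sequence of independent row indices is obtained greedily, keeping each row that increases the rank:

```python
from fractions import Fraction

def rank(M):
    """Rank over Q, by fraction-exact Gaussian elimination."""
    A = [[Fraction(x) for x in row] for row in M]
    m, n = len(A), len(A[0]) if A else 0
    r = 0
    for j in range(n):
        piv = next((i for i in range(r, m) if A[i][j] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, m):
            f = A[i][j] / A[r][j]
            for k in range(j, n):
                A[i][k] -= f * A[r][k]
        r += 1
    return r

def row_rank_profile(M):
    """Keep row i iff it increases the rank: this yields the lexicographically
    smallest set of linearly independent row indices (1-based, as in the text)."""
    kept, profile = [], []
    for i, row in enumerate(M):
        if rank(kept + [row]) > len(kept):
            kept.append(row)
            profile.append(i + 1)
    return profile

def col_rank_profile(M):
    return row_rank_profile([list(c) for c in zip(*M)])

A = [[0, 1], [1, 0]]
print(row_rank_profile(A), col_rank_profile(A))  # [1, 2] [1, 2]
```

On the matrix [ [0, 1], [1, 0] ] both rank profiles are generic, (1, 2), even though the matrix does not have generic rank profile (its first leading minor is zero).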
As T is non-singular, the column rank profile of A is that of E, and therefore corresponds to the column indices of the leading elements in the staircase. Similarly, the row rank profile of A is composed of the row indices of the leading elements in the staircase of the column echelon form of A.

Rank profile and triangular matrix decompositions
The rank profiles of a matrix and the triangular matrix decomposition obtained by Gaussian elimination are strongly related. The elimination of matrices with arbitrary rank profiles gives rise to several matrix factorizations and many algorithmic variants. In numerical linear algebra one often uses the PLUQ decomposition, with P and Q permutation matrices, L a lower unit triangular matrix and U an upper triangular matrix. The LSP and LQUP variants of [8] are used to reduce the complexity of rank-deficient Gaussian elimination to that of matrix multiplication. Many other algorithmic decompositions exist, allowing fraction-free computations [10], in-place computations [4, 9] or sub-cubic rank-sensitive time complexity [13, 9]. In [5] we proposed a Gaussian elimination algorithm with a recursive splitting of both row and column dimensions, replacing row and column transpositions by rotations. This elimination can compute simultaneously the row and column rank profiles while preserving the sub-cubic rank-sensitive time complexity and keeping the computation in-place.

In this paper we first study the conditions a PLUQ decomposition algorithm must satisfy in order to reveal the rank profile structure of a matrix. We introduce in section 2 the rank profile matrix R_A, a normal form summarizing all rank profile information of a matrix and of all its leading sub-matrices. We then decompose, in section 3, the pivoting strategy of any PLUQ algorithm into two types of operations: the search of the pivot and the permutation used to move it to the main diagonal. We propose a new search and a new permutation strategy and show what rank profiles are computed using any possible combination of these operations and the previously used searches and permutations. In particular we show three new pivoting strategy combinations that compute the rank profile matrix and use one of them, an iterative Crout CUP with rotations, to improve the base case and thus the overall performance of exact Gaussian elimination.
Second, we show that preserving both the row and column rank profiles, together with ensuring a monotonicity of the associated permutations, allows us to compute faster several other matrix decompositions, such as the LEU and Bruhat decompositions, and echelon forms.

In the following, 0_{m×n} denotes the m × n zero matrix and A_{i..j,k..l} denotes the sub-matrix of A of rows between i and j and columns between k and l. To a permutation σ : {1, . . . , n} → {1, . . . , n} we associate the permutation matrix P_σ, permuting rows by left multiplication: the rows of P_σ A are those of A permuted by σ. Reciprocally, for a permutation matrix P, we denote by σ_P the associated permutation.

The rank profile matrix

We start by introducing in Theorem 1 the rank profile matrix, which we will use throughout this document to summarize all information on the rank profiles of a matrix. From now on, matrices are over a field K and a valid pivot is a non-zero element.
Definition 1. An r-sub-permutation matrix is a matrix of rank r with only r non-zero entries, equal to one.

Lemma 1. An m × n r-sub-permutation matrix has at most one non-zero entry per row and per column, and can be written P [ [I_r, 0], [0, 0_{(m−r)×(n−r)}] ] Q where P and Q are permutation matrices.

Theorem 1.
Let A ∈ K^{m×n}. There exists a unique m × n rank(A)-sub-permutation matrix R_A of which every leading sub-matrix has the same rank as the corresponding leading sub-matrix of A. This sub-permutation matrix is called the rank profile matrix of A.

Proof.
We prove existence by induction on the row dimension of the leading submatrices. If A_{1,1..n} = 0_{1×n}, setting R^{(1)} = 0_{1×n} satisfies the defining condition. Otherwise, let j be the index of the leftmost invertible element in A_{1,1..n} and set R^{(1)} = e_j^T, the j-th n-dimensional canonical row vector, which satisfies the defining condition. Now for a given i ∈ {1, . . . , m − 1}, suppose that there is a unique i × n rank profile matrix R^{(i)} such that rank(A_{1..i,1..j}) = rank(R^{(i)}_{1..i,1..j}) for every j ∈ {1..n}. If rank(A_{1..i+1,1..n}) = rank(A_{1..i,1..n}), then R^{(i+1)} = [ [R^{(i)}], [0_{1×n}] ]. Otherwise, consider k, the smallest column index such that rank(A_{1..i+1,1..k}) = rank(A_{1..i,1..k}) + 1, and set R^{(i+1)} = [ [R^{(i)}], [e_k^T] ].

Any leading sub-matrix of R^{(i+1)} has the same rank as the corresponding leading sub-matrix of A: first, for any leading subset of rows and columns with less than i + 1 rows, the case is covered by the induction hypothesis; second, define [ [B, u], [v^T, x] ] = A_{1..i+1,1..k}, where u, v are vectors and x is a scalar. From the definition of k, v^T is linearly dependent with the rows of B, and thus any leading sub-matrix of [ [B], [v^T] ] has the same rank as the corresponding sub-matrix of R^{(i+1)}. Similarly, from the definition of k, the same reasoning works when considering more than k columns, with a rank increment by 1.

Lastly we show that R^{(i+1)} is an r_{i+1}-sub-permutation matrix. Indeed, u is linearly dependent with the columns of B: otherwise, rank([B u]) = rank(B) + 1, and from the definition of k we would then have rank([ [B, u], [v^T, x] ]) = rank([B u]) + 1 = rank(B) + 2 = rank([ [B], [v^T] ]) + 2, which is a contradiction.
Consequently, the k-th column of R^{(i)} is all zero, and R^{(i+1)} is an r_{i+1}-sub-permutation matrix.

To prove uniqueness, suppose there exist two distinct rank profile matrices R^{(1)} and R^{(2)} for a given matrix A, and let (i, j) be the lexicographically smallest coordinates where R^{(1)}_{i,j} ≠ R^{(2)}_{i,j}. The leading (i, j) sub-matrices of R^{(1)} and R^{(2)} then coincide everywhere except at position (i, j), so that rank(R^{(1)}_{1..i,1..j}) ≠ rank(R^{(2)}_{1..i,1..j}), whereas both should equal rank(A_{1..i,1..j}), which is a contradiction.

Example 1. For instance, A = [ [0, 2], [3, 0] ] has R_A = [ [0, 1], [1, 0] ] as rank profile matrix over Q.

Remark 1.
The matrix E introduced in Malaschonok's LEU decomposition [12, Theorem 1] is in fact the rank profile matrix. There, the existence of this decomposition was only shown for m = n = 2^k, and no connection was made to the relation with ranks and rank profiles. This connection was made in [5, Corollary 1], and the existence of E was generalized to arbitrary dimensions m and n. Finally, after proving its uniqueness here, we propose this definition as a new matrix normal form. The rank profile matrix has the following properties:
Lemma 2.
Let A be a matrix.
1. R_A is diagonal if A has generic rank profile.
2. R_A is a permutation matrix if A is invertible.
3. RowRP(A) = RowSupp(R_A); ColRP(A) = ColSupp(R_A).
Moreover, for all 1 ≤ i ≤ m and 1 ≤ j ≤ n, we have:
4. RowRP(A_{1..i,1..j}) = RowSupp((R_A)_{1..i,1..j});
5. ColRP(A_{1..i,1..j}) = ColSupp((R_A)_{1..i,1..j}).

These properties show how to recover the row and column rank profiles of A and of any of its leading sub-matrices.

Ingredients of a PLUQ decomposition algorithm
Over a field, the LU decomposition generalizes to matrices with arbitrary rank profiles, using row and column permutations (in some cases such as the CUP or LSP decompositions, the row permutation is embedded in the structure of the C or S matrices). However such PLUQ decompositions are not unique and not all of them will necessarily reveal rank profiles and echelon forms. We will characterize the conditions for a PLUQ decomposition algorithm to reveal the row or column rank profile or the rank profile matrix.

We consider the four types of operations of a Gaussian elimination algorithm in the processing of the k-th pivot:

Pivot search: finding an element to be used as a pivot,

Pivot permutation: moving the pivot to the diagonal position (k, k) by column and/or row permutations,

Update: applying the elimination at position (i, j): a_{i,j} ← a_{i,j} − a_{i,k} a_{k,k}^{−1} a_{k,j},

Normalization: dividing the k-th row (resp. column) by the pivot.

Choosing how each of these operations is done, and when they are scheduled, results in an elimination algorithm. Conversely, any Gaussian elimination algorithm computing a PLUQ decomposition can be viewed as a set of specializations of each of these operations together with a scheduling. The choice of doing the normalization on rows or columns only determines which of U or L will be unit triangular. The scheduling of the updates varies depending on the type of algorithm used: iterative, recursive, slab or tiled block splitting, with right-looking, left-looking or Crout variants [2]. Neither the normalization nor the update impact the capacity to reveal rank profiles, and we will thus now focus on the pivot search and the permutations.

Choosing a search and a permutation strategy fixes the matrices P and Q of the PLUQ decomposition obtained and, as we will see, determines the ability to recover information on the rank profiles. Once these matrices are fixed, the L and the U factors are unique. We introduce the pivoting matrix.

Definition 2.
The pivoting matrix of a PLUQ decomposition A = PLUQ of rank r is the r-sub-permutation matrix

Π_{P,Q} = P [ [I_r, 0], [0, 0_{(m−r)×(n−r)}] ] Q.

The r non-zero elements of Π_{P,Q} are located at the initial positions of the pivots in the matrix A. Thus Π_{P,Q} summarizes the choices made in the search and permutation operations.
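A pivoting matrix can be compared against the rank profile matrix of Theorem 1, which admits a brute-force construction directly from its defining property: rank(A_{1..i,1..j}) counts the pivots of R_A inside the (i, j) leading sub-matrix, so entry (i, j) of R_A is a mixed difference of four leading ranks. A sketch in Python (our own names, exact rationals, quadratically many rank computations, for illustration only):

```python
from fractions import Fraction

def rank(M):
    """Rank over Q, by fraction-exact Gaussian elimination."""
    A = [[Fraction(x) for x in row] for row in M]
    m, n = len(A), len(A[0])
    r = 0
    for j in range(n):
        piv = next((i for i in range(r, m) if A[i][j] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, m):
            f = A[i][j] / A[r][j]
            for k in range(j, n):
                A[i][k] -= f * A[r][k]
        r += 1
    return r

def rank_profile_matrix(A):
    """(R_A)_{ij} = rho(i,j) - rho(i-1,j) - rho(i,j-1) + rho(i-1,j-1),
    where rho(i,j) is the rank of the (i,j) leading sub-matrix of A."""
    m, n = len(A), len(A[0])
    rho = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            rho[i][j] = rank([row[:j] for row in A[:i]])
    return [[rho[i + 1][j + 1] - rho[i][j + 1] - rho[i + 1][j] + rho[i][j]
             for j in range(n)] for i in range(m)]

print(rank_profile_matrix([[0, 2], [3, 0]]))  # [[0, 1], [1, 0]]
```

A PLUQ algorithm reveals the rank profile matrix exactly when its pivoting matrix coincides with this brute-force reference.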
Pivot search
The search operation vastly differs depending on the field of application. In numerical dense linear algebra, numerical stability is the main criterion for the selection of the pivot. In sparse linear algebra, the pivot is chosen so as to reduce the fill-in produced by the update operation. In order to reveal some information on the rank profiles, a notion of precedence has to be used: a usual way to compute the row rank profile is to search in a given row for a pivot and only move to the next row if the current row was found to be all zeros. This guarantees that each pivot will be on the first linearly independent row, and therefore the row support of Π_{P,Q} will be the row rank profile. The precedence here is that the pivot's coordinates must minimize the order for the first coordinate (the row index). As a generalization, we consider the most common preorders of the cartesian product {1, . . . , m} × {1, . . . , n} inherited from the natural orders of each of its components, and describe the corresponding search strategies, minimizing these preorders:

Row order: (i_1, j_1) ⪯_row (i_2, j_2) iff i_1 ≤ i_2: search for any invertible element in the first non-zero row.

Column order: (i_1, j_1) ⪯_col (i_2, j_2) iff j_1 ≤ j_2: search for any invertible element in the first non-zero column.

Lexicographic order: (i_1, j_1) ⪯_lex (i_2, j_2) iff i_1 < i_2, or i_1 = i_2 and j_1 ≤ j_2: search for the leftmost non-zero element of the first non-zero row.

Reverse lexicographic order: (i_1, j_1) ⪯_revlex (i_2, j_2) iff j_1 < j_2, or j_1 = j_2 and i_1 ≤ i_2: search for the topmost non-zero element of the first non-zero column.

Product order: (i_1, j_1) ⪯_prod (i_2, j_2) iff i_1 ≤ i_2 and j_1 ≤ j_2: search for any non-zero element at a position (i, j) being the only non-zero of the leading (i, j) sub-matrix.

Example 2.
Consider the matrix

[ [0, 0, 0, a, 0, b],
  [0, c, 0, d, e, f],
  [g, h, i, j, k, 0],
  [l, m, n, o, p, 0] ],

where each literal is a non-zero element. The minimal non-zero elements for each preorder are the following:

Row order: a, b
Column order: g, l
Lexicographic order: a
Reverse lexicographic order: g
Product order: a, c, g
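These search strategies can be prototyped generically: given a preorder on positions, the candidate pivots are exactly the non-zero entries that are minimal for it. A sketch (our own names; positions are 0-based here, unlike in the text):

```python
def minimal_pivots(A, leq):
    """Non-zero positions p such that no other non-zero position q
    strictly precedes p for the preorder `leq`."""
    nz = [(i, j) for i, row in enumerate(A) for j, x in enumerate(row) if x != 0]
    return [p for p in nz
            if not any(leq(q, p) and not leq(p, q) for q in nz)]

row_order = lambda p, q: p[0] <= q[0]
col_order = lambda p, q: p[1] <= q[1]
lex       = lambda p, q: p <= q                      # tuple comparison
revlex    = lambda p, q: (p[1], p[0]) <= (q[1], q[0])
product   = lambda p, q: p[0] <= q[0] and p[1] <= q[1]

A = [[0, 1, 0],
     [1, 0, 1]]
print(minimal_pivots(A, product))  # [(0, 1), (1, 0)]
```

Note that for the product order several incomparable minima may coexist, a degree of freedom that tile recursive algorithms exploit.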
Pivot permutation
The pivot permutation moves a pivot from its initial position to the leading diagonal. Besides this constraint, all possible choices are left for the remaining values of the permutation. Most often, it is done by row or column transpositions, as it clearly involves a small amount of data movement. However, these transpositions can break the precedence relations in the set of rows or columns, and can therefore prevent the recovery of the rank profile information. A pivot permutation that leaves the precedence relations unchanged will be called k-monotonically increasing.

Definition 3.
A permutation σ ∈ S_n is called k-monotonically increasing if its last n − k values form a monotonically increasing sequence.

In particular, the last n − k rows of the associated row-permutation matrix P_σ are in row echelon form. For example, the cyclic shift between indices k and i, with k < i, defined as R_{k,i} = (1, . . . , k − 1, i, k, k + 1, . . . , i − 1, i + 1, . . . , n), which we will call a (k, i)-rotation, is an elementary k-monotonically increasing permutation.

Example 3. The (1, 4)-rotation R_{1,4} = (4, 1, 2, 3) is a 1-monotonically increasing permutation. Its row permutation matrix is

[ [0, 0, 0, 1],
  [1, 0, 0, 0],
  [0, 1, 0, 0],
  [0, 0, 1, 0] ].

In fact, any (k, i)-rotation is a k-monotonically increasing permutation. Monotonically increasing permutations can be composed, as stated in Lemma 3.
Lemma 3. If σ_1 ∈ S_n is a k_1-monotonically increasing permutation and σ_2 ∈ S_{k_1} × S_{n−k_1} is a k_2-monotonically increasing permutation with k_1 < k_2, then the permutation σ_2 ∘ σ_1 is a k_2-monotonically increasing permutation.

Proof. The last n − k_2 values of σ_2 ∘ σ_1 are the image of a sub-sequence of n − k_2 values from the last n − k_1 values of σ_1 through the monotonically increasing function σ_2.

Therefore an iterative algorithm, using rotations as elementary pivot permutations, maintains the property that the permutation matrices P and Q at any step k are k-monotonically increasing. A similar property also applies with recursive algorithms.

A PLUQ decomposition reveals the row (resp. column) rank profile if it can be read from the first r values of the permutation matrix P (resp. Q). Equivalently, by Lemma 2, this means that the row (resp. column) support of the pivoting matrix Π_{P,Q} equals that of the rank profile matrix.
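Rotations and the monotonicity test are straightforward to sketch (our own function names; a permutation is given by its sequence of values, 1-based, and P_σ follows the row-permutation convention used in this text):

```python
def rotation(k, i, n):
    """The (k,i)-rotation (1-based): (1,...,k-1, i, k, k+1,..., i-1, i+1,..., n)."""
    return list(range(1, k)) + [i] + list(range(k, i)) + list(range(i + 1, n + 1))

def is_k_monotone(sigma, k):
    """k-monotonically increasing: the last n-k values increase."""
    tail = sigma[k:]
    return all(a < b for a, b in zip(tail, tail[1:]))

def perm_matrix(sigma):
    """P_sigma, such that row i of (P_sigma A) is row sigma(i) of A."""
    n = len(sigma)
    return [[1 if j + 1 == sigma[i] else 0 for j in range(n)] for i in range(n)]

sigma = rotation(1, 4, 4)
print(sigma)               # [4, 1, 2, 3]
print(perm_matrix(sigma))  # [[0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
```

The printed matrix is the one of Example 3: its last three rows are in row echelon form, as expected for a 1-monotonically increasing permutation.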
Definition 4.
The decomposition A = PLUQ reveals:
1. the row rank profile if RowSupp(Π_{P,Q}) = RowSupp(R_A),
2. the column rank profile if ColSupp(Π_{P,Q}) = ColSupp(R_A),
3. the rank profile matrix if Π_{P,Q} = R_A.

Example 4. For instance, A = [ [1, 2, 0], [3, 4, 0], [0, 0, 0] ] has rank profile matrix R_A = [ [1, 0, 0], [0, 1, 0], [0, 0, 0] ] over Q. Now the pivoting matrix obtained from a PLUQ decomposition with a pivot search operation following the row order (any column, first non-zero row) could be the matrix Π_{P,Q} = [ [0, 1, 0], [1, 0, 0], [0, 0, 0] ]. As these matrices share the same row support, the matrix Π_{P,Q} reveals the row rank profile of A.

Remark 2.
Example 4 suggests that a pivot search strategy minimizing row and column indices could be a sufficient condition to recover both row and column rank profiles at the same time, regardless of the pivot permutation. However, this is unfortunately not the case. Consider for example a search based on the lexicographic order (first non-zero column of the first non-zero row) with transposition permutations, run on the matrix A = [ [0, 0, 1], [1, 1, 0] ]. Its rank profile matrix is R_A = [ [0, 0, 1], [1, 0, 0] ], whereas the pivoting matrix could be Π_{P,Q} = [ [0, 0, 1], [0, 1, 0] ], which does not reveal the column rank profile. This is due to the fact that the column transposition performed for the first pivot changes the order in which the columns will be inspected in the search for the second pivot.

We will show that if the pivot permutations preserve the order in which the still unprocessed columns or rows appear, then the pivoting matrix will equal the rank profile matrix. This is achieved by the monotonically increasing permutations. Theorem 2 shows how the ability of a PLUQ decomposition algorithm to recover the rank profile information relates to the use of monotonically increasing permutations. More precisely, it considers an arbitrary step in a PLUQ decomposition where k pivots have been found in the elimination of an ℓ × p leading sub-matrix A_1 of the input matrix A.

Theorem 2.
Consider a partial PLUQ decomposition of an m × n matrix A:

A = P_1 [ [L, 0], [M, I_{m−k}] ] [ [U, V], [0, H] ] Q_1,

where [ [L], [M] ] is m × k lower triangular and [U V] is k × n upper triangular, and let A_1 be some ℓ × p leading sub-matrix of A, for ℓ, p ≥ k. Let H = P_2 L_2 U_2 Q_2 be a PLUQ decomposition of H. Consider the PLUQ decomposition

A = (P_1 [ [I_k, 0], [0, P_2] ]) · [ [L, 0], [P_2^T M, L_2] ] · [ [U, V Q_2^T], [0, U_2] ] · ([ [I_k, 0], [0, Q_2] ] Q_1),

whose four factors are respectively the P, L, U and Q of a PLUQ decomposition of A. Consider the following clauses:

(i) RowRP(A_1) = RowSupp(Π_{P_1,Q_1});
(ii) ColRP(A_1) = ColSupp(Π_{P_1,Q_1});
(iii) R_{A_1} = Π_{P_1,Q_1};
(iv) RowRP(H) = RowSupp(Π_{P_2,Q_2});
(v) ColRP(H) = ColSupp(Π_{P_2,Q_2});
(vi) R_H = Π_{P_2,Q_2};
(vii) P_1^T is k-monotonically increasing, or (P_1^T is ℓ-monotonically increasing and p = n);
(viii) Q_1^T is k-monotonically increasing, or (Q_1^T is p-monotonically increasing and ℓ = m).

Then:
(a) if (i) or (ii) or (iii), then H = [ [0_{(ℓ−k)×(p−k)}, ∗], [∗, ∗] ];
(b) if (vii), then ((i) and (iv)) ⇒ RowRP(A) = RowSupp(Π_{P,Q});
(c) if (viii), then ((ii) and (v)) ⇒ ColRP(A) = ColSupp(Π_{P,Q});
(d) if (vii) and (viii), then ((iii) and (vi)) ⇒ R_A = Π_{P,Q}.
Proof. Let P_1 = [P_{11} E_1] and Q_1 = [ [Q_{11}], [F_1] ], where E_1 is m × (m − k) and F_1 is (n − k) × n. On one hand we have

A = P_1 [ [L], [M] ] [U V] Q_1 + E_1 H F_1 = B + E_1 H F_1,   (1)

where B = P_1 [ [L], [M] ] [U V] Q_1. On the other hand,

Π_{P,Q} = P_1 [ [I_k, 0], [0, P_2] ] [ [I_r, 0], [0, 0_{(m−r)×(n−r)}] ] [ [I_k, 0], [0, Q_2] ] Q_1 = P_1 [ [I_k, 0], [0, Π_{P_2,Q_2}] ] Q_1 = Π_{P_1,Q_1} + E_1 Π_{P_2,Q_2} F_1.

Denote by B_1 the ℓ × p leading sub-matrix of B.

(a) The clause (i) or (ii) or (iii) implies that all k pivots of the partial elimination were found within the ℓ × p sub-matrix A_1. Hence rank(A_1) = k, we can write P_1 = [ [ [P_{11}], [0_{(m−ℓ)×k}] ]  E_1 ] and Q_1 = [ [Q_{11} 0_{k×(n−p)}], [F_1] ], and the matrix A_1 writes

A_1 = [I_ℓ 0] · A · [ [I_p], [0] ] = B_1 + [I_ℓ 0] E_1 H F_1 [ [I_p], [0] ].   (2)

Now rank(B_1) = k: as a sub-matrix of B, which has rank k, it has rank at most k, and since

B_1 = [P_{11} ([I_ℓ 0] E_1)] [ [L], [M] ] ([U V] Q_1 [ [I_p], [0] ]) = P_{11} L ([U V] Q_1 [ [I_p], [0] ]) + ([I_ℓ 0] E_1) M ([U V] Q_1 [ [I_p], [0] ]),

where the first term has rank k and the second term has a disjoint row support, it has rank exactly k. Finally, consider the term [I_ℓ 0] E_1 H F_1 [ [I_p], [0] ] of equation (2). As its row support is disjoint with that of the pivot rows of B_1, it has to be composed of rows linearly dependent with the pivot rows of B_1 to ensure that rank(A_1) = k. As its column support is disjoint with that of the pivot columns of B_1, we conclude that it must be the zero matrix. Therefore the leading (ℓ − k) × (p − k) sub-matrix of E_1 H F_1 is zero.

(b) From (a) we know that A_1 = B_1, thus RowRP(B_1) = RowRP(A_1). Recall that A = B + E_1 H F_1. No pivot row of B can be made linearly dependent by adding rows of E_1 H F_1, as the column position of the pivot is always zero in the latter matrix. For the same reason, no pivot row of E_1 H F_1 can be made linearly dependent by adding rows of B. From (i), the set of pivot rows of B is RowRP(A_1), which shows that

RowRP(A) = RowRP(A_1) ∪ RowRP(E_1 H F_1).   (3)

Let σ_{E_1} : {1..m − k} → {1..m} be the map representing the sub-permutation E_1 (i.e. such that E_1[σ_{E_1}(i), i] = 1 for all i). If P_1^T is k-monotonically increasing, the matrix E_1 has full column rank and is in column echelon form, which implies that

RowRP(E_1 H F_1) = σ_{E_1}(RowRP(H F_1)) = σ_{E_1}(RowRP(H)),   (4)

since F_1 has full row rank. If P_1^T is ℓ-monotonically increasing, we can write E_1 = [E_{11} E_{12}], where the m × (m − ℓ) matrix E_{12} is in column echelon form. If p = n, the matrix H writes H = [ [0_{(ℓ−k)×(n−k)}], [H_2] ]. Hence we have E_1 H F_1 = E_{12} H_2 F_1, which also implies RowRP(E_1 H F_1) = σ_{E_1}(RowRP(H)).

From the expression of Π_{P,Q} above, the row support of Π_{P,Q} is that of Π_{P_1,Q_1} + E_1 Π_{P_2,Q_2} F_1, which is the union of the row supports of these two terms, as they are disjoint. Under the conditions of point (b), this row support is the union of RowRP(A_1) and σ_{E_1}(RowRP(H)), which is, from (4) and (3), RowRP(A).

(c) Similarly as for point (b).

(d) From (a) we still have A_1 = B_1. Now since rank(B_1) = rank(B) = rank(A_1) = k, there is no other non-zero element in R_B than those in R_{B_1}, and R_{B_1} = R_{A_1}. The row and column supports of R_B and that of E_1 H F_1 are disjoint. Hence

R_A = R_{A_1} + R_{E_1 H F_1}.   (5)

If both P_1^T and Q_1^T are k-monotonically increasing, the matrix E_1 is in column echelon form and the matrix F_1 in row echelon form. Consequently, the matrix E_1 H F_1 is a copy of the matrix H with k zero rows and k zero columns interleaved, which does not impact the linear dependency relations between the non-zero rows and columns. As a consequence,

R_{E_1 H F_1} = E_1 R_H F_1.   (6)

Now if Q_1^T is k-monotonically increasing, P_1^T is ℓ-monotonically increasing and p = n, then, using the notations of point (b), E_1 H F_1 = E_{12} H_2 F_1 where E_{12} is in column echelon form. Thus R_{E_1 H F_1} = E_1 R_H F_1 for the same reason. The symmetric case, where Q_1^T is p-monotonically increasing and ℓ = m, works similarly. Combining equations (5) and (6) with the expression of Π_{P,Q} above gives R_A = Π_{P,Q}.

Using Theorem 2, we deduce what rank profile information is revealed by a PLUQ algorithm from the way the search and the permutation operations are done. Table 1 summarizes these results, and points to instances known in the literature implementing the corresponding type of elimination. More precisely, we first distinguish in this table the ability to compute the row or column rank profile or the rank profile matrix, but we also indicate whether the resulting PLUQ decomposition preserves the monotonicity of the rows or columns.
Indeed, some algorithm may compute the rank profile matrix but break the precedence relation between the linearly dependent rows or columns, making it unusable as a base case for a block algorithm of higher level.
Search        | Row Perm.     | Col. Perm.    | Reveals            | Monotonicity | Instance
--------------|---------------|---------------|--------------------|--------------|---------
Row order     | Transposition | Transposition | RowRP              |              | [8, 9]
Col. order    | Transposition | Transposition | ColRP              |              | [11, 9]
Lexicographic | Transposition | Transposition | RowRP              |              | [13]
Lexicographic | Transposition | Rotation      | RowRP, ColRP, R_A  | Col.         | here
Lexicographic | Rotation      | Rotation      | RowRP, ColRP, R_A  | Row, Col.    | here
Rev. lexico.  | Transposition | Transposition | ColRP              |              | [13]
Rev. lexico.  | Rotation      | Transposition | RowRP, ColRP, R_A  | Row          | here
Rev. lexico.  | Rotation      | Rotation      | RowRP, ColRP, R_A  | Row, Col.    | here
Product       | Rotation      | Transposition | RowRP              | Row          | here
Product       | Transposition | Rotation      | ColRP              | Col.         | here
Product       | Rotation      | Rotation      | RowRP, ColRP, R_A  | Row, Col.    | [5]

Table 1: Pivoting strategies revealing rank profiles
We start with iterative algorithms, where each iteration handles one pivot at a time. Here Theorem 2 is applied with k = 1, and the partial elimination represents how one pivot is being treated. The elimination of H is done by induction.

Row and column order search

The row order pivot search operation is of the form: any non-zero element in the first non-zero row. Each row is inspected in order, and a new row is considered only when the previous row is all zeros. With the notations of Theorem 2, this means that A_1 is the leading ℓ × n sub-matrix of A, where ℓ is the index of the first non-zero row of A. When the permutations P_1 and Q_1, moving the pivot from position (ℓ, j) to (k, k), are transpositions, the matrix Π_{P_1,Q_1} is the element E_{ℓ,j} of the canonical basis. Its row rank profile is (ℓ), which is that of the ℓ × n leading sub-matrix A_1. Finally, the permutation P_1^T is ℓ-monotonically increasing, and Theorem 2 case (b) can be applied to prove by induction that any such algorithm will reveal the row rank profile: RowRP(A) = RowSupp(Π_{P,Q}). The case of the column order search is similar.
Lexicographic order based pivot search
In this case the pivot search operation is of the form: first non-zero element in the first non-zero row. The lexicographic order being compatible with the row order, the above results hold when transpositions are used, and the row rank profile is revealed. If in addition column rotations are used, Q_1 = R_{1,j}, which is 1-monotonically increasing. Now Π_{P_1,Q_1} = E_{ℓ,j}, which is the rank profile matrix of the ℓ × n leading sub-matrix A_1 of A. Theorem 2 case (d) can be applied to prove by induction that any such algorithm will reveal the rank profile matrix: R_A = Π_{P,Q}. Lastly, the use of row rotations ensures that the order of the linearly dependent rows will be preserved as well. Algorithm 1 is an instance of Gaussian elimination with a lexicographic order search and rotations for row and column permutations.

The case of the reverse lexicographic order search is similar. As an example, the algorithm in [13, Algorithm 2.14] is based on a reverse lexicographic order search but with transpositions for the row permutations. Hence it only reveals the column rank profile.
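An iterative elimination combining the lexicographic search with row and column rotations can be sketched as follows, over Q with exact fractions. This is a right-looking toy version written by us (not the Crout variant of the actual implementation); it returns the pivoting matrix Π_{P,Q}, which by Theorem 2 equals R_A:

```python
from fractions import Fraction

def pluq_lex_rotations(M):
    """Eliminate with the lexicographic pivot search (leftmost non-zero of the
    first non-zero row of the trailing matrix), moving pivots to the diagonal
    by row and column rotations (cyclic shifts), so that the relative order of
    the remaining rows and columns is preserved.  Returns Pi_{P,Q}."""
    A = [[Fraction(x) for x in row] for row in M]
    m, n = len(A), len(A[0])
    rowidx, colidx = list(range(m)), list(range(n))  # original row/col indices
    pivots, r = [], 0
    while True:
        pos = next(((i, j) for i in range(r, m) for j in range(r, n)
                    if A[i][j] != 0), None)
        if pos is None:
            break
        i, j = pos
        # (r+1, i+1)- and (r+1, j+1)-rotations bring the pivot to (r, r)
        A.insert(r, A.pop(i)); rowidx.insert(r, rowidx.pop(i))
        for row in A:
            row.insert(r, row.pop(j))
        colidx.insert(r, colidx.pop(j))
        pivots.append((rowidx[r], colidx[r]))
        for k in range(r + 1, m):                    # right-looking update
            f = A[k][r] / A[r][r]
            for l in range(r, n):
                A[k][l] -= f * A[r][l]
        r += 1
    Pi = [[0] * n for _ in range(m)]
    for i, j in pivots:
        Pi[i][j] = 1
    return Pi

print(pluq_lex_rotations([[0, 0, 1], [1, 1, 0]]))  # [[0, 0, 1], [1, 0, 0]]
```

On the matrix [ [0, 0, 1], [1, 1, 0] ], where transpositions lose the column rank profile, the rotation-based version recovers the full rank profile matrix.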
Product order based pivot search
The search here consists in finding any non-zero element A_{ℓ,p} such that the ℓ × p leading sub-matrix A_1 of A is all zeros except this coefficient. If the row and column permutations are the rotations R_{1,ℓ} and R_{1,p}, we have Π_{P_1,Q_1} = E_{ℓ,p} = R_{A_1}. Theorem 2 case (d) can be applied to prove by induction that any such algorithm will reveal the rank profile matrix: R_A = Π_{P,Q}. An instance of such an algorithm is given in [5, Algorithm 2]. If P_1 (resp. Q_1) is a transposition, then Theorem 2 case (c) (resp. case (b)) applies to show by induction that the column (resp. row) rank profile is revealed.

A recursive Gaussian elimination algorithm can either split one of the row or column dimensions, cutting the matrix in wide or tall rectangular slabs, or split both dimensions, leading to a decomposition into tiles.
Slab recursive algorithms
Most algorithms computing rank profiles are slab recursive [8, 11, 13, 9]. When the row dimension is split, the search space for pivots is the whole set of columns, and Theorem 2 applies with p = n. This corresponds to either a row or a lexicographic order. From case (b), one shows that, with transpositions, the algorithm recovers the row rank profile, provided that the base case does. If, in addition, the elementary column permutations are rotations, then case (d) applies and the rank profile matrix is recovered. Finally, if rows are also permuted by monotonically increasing permutations, then the PLUQ decomposition also respects the monotonicity of the linearly dependent rows and columns. The same reasoning holds when splitting the column dimension.

Tile recursive algorithms
Tile recursive Gaussian elimination algorithms [5, 12, 6] are more involved, especially when dealing with rank deficiencies, and we refer to [5] for a detailed description of such an algorithm. Here, the search area A_1 has arbitrary dimensions ℓ × p, often specialized as m/2 × n/2. As a consequence, the pivot search cannot satisfy any of the row, column, lexicographic or reverse lexicographic orders. Now, if the pivots selected in the elimination of A_1 minimize the product order, then they necessarily also respect this order as pivots of the whole matrix A. Now, from (a), the remaining matrix H writes

H = [ [0_{(ℓ−k)×(p−k)}, H_{12}], [H_{21}, H_{22}] ],

and its elimination is done by two independent eliminations on the blocks H_{12} and H_{21}, followed by some update of H_{22} and a last elimination on it. Here again, pivots minimizing the product order on H_{12} and H_{21} are also pivots minimizing this order for H, and so are those of the fourth elimination. Now the block row and column permutations used in [5, Algorithm 1] to form the PLUQ decomposition are r-monotonically increasing. Hence, from case (d), the algorithm computes the rank profile matrix and preserves the monotonicity. If only one of the row or column permutations are rotations, then case (b) or (c) applies to show that either the row or the column rank profile is computed.

The LEU decomposition introduced in [12] involves a lower triangular matrix L, an upper triangular matrix U and an r-sub-permutation matrix E.

Theorem 3.
Let A = PLUQ be a PLUQ decomposition revealing the rank profile matrix (Π_{P,Q} = R_A). Then an LEU decomposition of A with E = R_A is obtained as follows (only using row and column permutations):

A = (P [L 0_{m×(m−r)}] P^T) · (P [ [I_r, 0], [0, 0_{(m−r)×(n−r)}] ] Q) · (Q^T [ [U], [0_{(n−r)×n}] ] Q),   (7)

where the three parenthesized factors are, respectively, the L, E and U of the LEU decomposition (the L and U inside them being the PLUQ factors).

Proof.
First, E = P [ [I_r, 0], [0, 0] ] Q = Π_{P,Q} = R_A. Then there only remains to show that the first factor, L̄ = P [L 0_{m×(m−r)}] P^T, is lower triangular, and that the last factor, Ū = Q^T [ [U], [0_{(n−r)×n}] ] Q, is upper triangular. Suppose that L̄ is not lower triangular, and let i be the first row index such that L̄_{i,j} ≠ 0 for some i < j. First, j ∈ RowRP(A), since the non-zero columns in L̄ are placed according to the first r values of P. Remarking that A = P [L 0_{m×(n−r)}] [ [U], [0_{(n−r)×r} I_{n−r}] ] Q, and since right multiplication by a non-singular matrix does not change row rank profiles, we deduce that RowRP(A) = RowRP(L̄). If i ∉ RowRP(A), then the i-th row of L̄ is linearly dependent with the previous rows, but none of them has a non-zero element in column j > i. Hence i ∈ RowRP(A).

Let (a, b) be the position of the coefficient L̄_{i,j} in L, that is a = σ_P^{−1}(i), b = σ_P^{−1}(j). Let also s = σ_Q(a) and t = σ_Q(b), so that the pivots at diagonal positions a and b in L respectively correspond to ones in R_A at positions (i, s) and (j, t). Consider the ℓ × p leading sub-matrix A_1 of A, where ℓ = max_{x=1..a−1} σ_P(x) and p = max_{x=1..a−1} σ_Q(x). On one hand, (j, t) is an index position in A_1 but not (i, s), since otherwise rank(A_1) = b. Therefore (i, s) ⊀_prod (j, t), and s > t as i < j. As the coefficients (j, t) and (i, s) are pivots in R_A, with i < j and t < s, there cannot be a non-zero element above (j, t) at row i when it is chosen as a pivot. Hence L̄_{i,j} = 0, a contradiction; thus L̄ is lower triangular. The same reasoning applies to show that Ū is upper triangular.

Remark 3.
Note that the LEU decomposition with $E = R_A$ is not unique, even for invertible matrices. As a counter-example, the following decomposition holds for any $a \in K$:
$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ a & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & -a \\ 0 & 1 \end{bmatrix} \qquad (8)$$
The Bruhat decomposition, which inspired Malaschonok's LEU decomposition [12], is another decomposition with a central permutation matrix [1, 7].
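Both the construction of equation (7) and the non-uniqueness example (8) can be checked mechanically. The NumPy sketch below is our own illustration, not one from the paper: it applies the conjugations of (7) to a hand-picked 3×3 rank-2 PLUQ decomposition satisfying $\Pi_{P,Q} = R_A$, then verifies (8) for several integer values of $a$:

```python
import numpy as np

# --- Equation (7): LEU factors from a PLUQ decomposition revealing R_A. ---
# Hand-picked rank-2 example (ours): A = P L U Q with Pi_{P,Q} = R_A.
A = np.array([[0, 0, 1],
              [2, 0, 0],
              [0, 0, 3]])
P = np.eye(3, dtype=int)                    # row permutation (identity here)
L = np.array([[1, 0], [0, 1], [3, 0]])      # m x r lower part
U = np.array([[1, 0, 0], [0, 2, 0]])        # r x n upper part
Q = np.array([[0, 0, 1],                    # column permutation: pivot
              [1, 0, 0],                    # columns are 3, then 1
              [0, 1, 0]])
m, n, r = 3, 3, 2
assert np.array_equal(P @ L @ U @ Q, A)     # a valid PLUQ decomposition

Lbar = P @ np.hstack([L, np.zeros((m, m - r), dtype=int)]) @ P.T
E = P @ np.pad(np.eye(r, dtype=int), ((0, m - r), (0, n - r))) @ Q
Ubar = Q.T @ np.vstack([U, np.zeros((n - r, n), dtype=int)]) @ Q

assert np.array_equal(Lbar @ E @ Ubar, A)   # an LEU decomposition of A ...
assert np.array_equal(Lbar, np.tril(Lbar))  # ... with Lbar lower triangular,
assert np.array_equal(Ubar, np.triu(Ubar))  # Ubar upper triangular, and
assert np.array_equal(E, np.array([[0, 0, 1], [1, 0, 0], [0, 0, 0]]))  # E = R_A

# --- Equation (8): the LEU decomposition with E = R_A is not unique. ---
E2 = np.array([[0, 1], [1, 0]])
for a in range(-3, 4):
    L2 = np.array([[1, 0], [a, 1]])         # lower triangular
    U2 = np.array([[1, -a], [0, 1]])        # upper triangular
    assert np.array_equal(L2 @ E2 @ U2, E2)
print("equations (7) and (8) verified")
```
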
Theorem 4 ([1]). Any invertible matrix $A$ can be written as $A = VPU$ for $V$ and $U$ upper triangular invertible matrices and $P$ a permutation matrix. The latter decomposition is called the Bruhat decomposition of $A$.

It was then naturally extended to singular square matrices by [7]. Corollary 1 generalizes it to matrices with arbitrary dimensions, and relates it to the PLUQ decomposition.

Corollary 1.
Any $m \times n$ matrix of rank $r$ has a $VPU$ decomposition, where $V$ and $U$ are upper triangular matrices and $P$ is an $r$-sub-permutation matrix.

Proof. Let $J_m$ be the unit anti-diagonal matrix of order $m$. From the LEU decomposition of $J_m A$, we have $A = \underbrace{J_m L J_m}_{V} \underbrace{J_m E}_{P} U$, where $V = J_m L J_m$ is upper triangular.

The LUP decomposition $A = LUP$ only exists for matrices with generic row rank profile (including matrices with full row rank). Corollary 2 shows under which condition the permutation matrix $P$ equals the rank profile matrix $R_A$. Note that although the rank profile of $A$ is trivial in such cases, the matrix $R_A$ still carries important information on the row and column rank profiles of all leading sub-matrices of $A$.

Corollary 2.
Let $A$ be an $m \times n$ matrix.

If $A$ has generic column rank profile, then any PLU decomposition $A = PLU$ computed using reverse lexicographic order search and row rotations is such that $R_A = P \begin{bmatrix} I_r & \\ & 0 \end{bmatrix}$. In particular, $P = R_A$ if $r = m$.

If $A$ has generic row rank profile, then any LUP decomposition $A = LUP$ computed using lexicographic order search and column rotations is such that $R_A = \begin{bmatrix} I_r & \\ & 0 \end{bmatrix} P$. In particular, $P = R_A$ if $r = n$.

Proof. Consider $A$ with generic column rank profile. From Table 1, any PLUQ decomposition algorithm with a reverse lexicographic order based search and a rotation based row permutation is such that $\Pi_{P,Q} = P \begin{bmatrix} I_r & \\ & 0 \end{bmatrix} Q = R_A$. Since the search follows the reverse lexicographic order and the matrix has generic column rank profile, no column will be permuted in this elimination, and therefore $Q = I_n$. The same reasoning holds when $A$ has generic row rank profile.

Note that the $L$ and $U$ factors in a PLU decomposition are uniquely determined by the permutation $P$. Hence, when the matrix has full row rank, $P = R_A$ and the decomposition $A = R_A LU$ is unique. Similarly, the decomposition $A = LU R_A$ is unique when the matrix has full column rank. Now, when the matrix is rank deficient with generic row rank profile, there is no longer a unique PLU decomposition revealing the rank profile matrix: any permutation applied to the last $m - r$ columns of $P$ and the last $m - r$ rows of $L$ yields a PLU decomposition where $R_A = P \begin{bmatrix} I_r & \\ & 0 \end{bmatrix}$.

Lastly, we remark that the only situation where the rank profile matrix $R_A$ can be read directly as a sub-matrix of $P$ or $Q$ is as in Corollary 2, when the matrix $A$ has generic row or column rank profile. Consider a PLUQ decomposition $A = PLUQ$ revealing the rank profile matrix ($R_A = P \begin{bmatrix} I_r & \\ & 0 \end{bmatrix} Q$) such that $R_A$ is a sub-matrix of $P$. This means that $P = R_A + S$ where $S$ has row and column support disjoint from that of $R_A$.
We have $R_A = (R_A + S) \begin{bmatrix} I_r & \\ & 0 \end{bmatrix} Q = (R_A + S) \begin{bmatrix} Q_1 \\ 0_{(n-r)\times n} \end{bmatrix}$, where $Q_1$ is formed by the first $r$ rows of $Q$. Hence $R_A \left( I_n - \begin{bmatrix} Q_1 \\ 0_{(n-r)\times n} \end{bmatrix} \right) = S \begin{bmatrix} Q_1 \\ 0_{(n-r)\times n} \end{bmatrix}$, but the row supports of these two matrices are disjoint; hence both sides are zero, and in particular $R_A \begin{bmatrix} 0 \\ I_{n-r} \end{bmatrix} = 0$, which implies that $A$ has generic column rank profile. Similarly, one shows that $R_A$ can be a sub-matrix of $Q$ only if $A$ has generic row rank profile.

In our previous contribution [5], we identified the ability to recover the rank profile matrix via the use of the product order search and of rotations. We therefore proposed an implementation combining a tile recursive algorithm and an iterative base case, using these search and permutation strategies. The analysis of sections 4 and 5 shows that other pivoting strategies can be used to compute the rank profile matrix and preserve the monotonicity. We present here a new base case algorithm and its implementation over a finite field that we wrote in the
FFLAS-FFPACK library. It is based on a lexicographic order search and on row and column rotations. Moreover, the schedule of the update operations is that of a Crout elimination, for it reduces the number of modular reductions, as shown in [3].

Algorithm 1
Crout variant of PLUQ with lexicographic search and column rotations

k ← 1
for i = 1 . . . m do
    A_{i,k..n} ← A_{i,k..n} − A_{i,1..k−1} × A_{1..k−1,k..n}
    if A_{i,k..n} = 0 then
        Loop to next iteration
    end if
    Let A_{i,s} be the left-most non-zero element of row i
    A_{i+1..m,s} ← A_{i+1..m,s} − A_{i+1..m,1..k−1} × A_{1..k−1,s}
    A_{i+1..m,s} ← A_{i+1..m,s} / A_{i,s}
    Bring A_{∗,s} to A_{∗,k} by column rotation
    Bring A_{i,∗} to A_{k,∗} by row rotation
    k ← k + 1
end for

(Experiments: FFLAS-FFPACK revision 1193, http://linalg.org/projects/fflas-ffpack, linked against OpenBLAS-v0.2.8.)

[Plot of Figure 1: effective Gfops of the PLUQ base cases mod 131071, rank = n/2, on an i5-3320 at 2.6GHz, for the variants (1) Recursive→Crout Lexicographic, (2) Recursive→Left-looking Product, (3) Crout Lexicographic, (4) Left-looking Product, (5) Right-looking Product, (6) Pure Recursive.]
Figure 1: Computation speed of PLUQ decomposition base cases.

In the following experiments, we measured the real time of the computation, averaged over 10 instances (100 instances for the smaller dimensions), of $n \times n$ matrices with rank $r = n/2$, for $n$ between 20 and 700. In order to ensure that the row and column rank profiles of these matrices are uniformly random, we construct them as the product $A = LRU$, where $L$ and $U$ are random non-singular lower and upper triangular matrices and $R$ is an $m \times n$ $r$-sub-permutation matrix whose non-zero element positions are chosen uniformly at random. The effective speed is obtained by dividing an estimate of the arithmetic cost ($2mnr + \frac{2}{3}r^3 - r^2(m + n)$) by the computation time.

Figure 1 shows the computation speed of this base case (3), compared to that of the pure recursive algorithm (6) and to our previous base case algorithm [5], using a product order search and either a left-looking (4) or a right-looking (5) schedule. At $n = 200$, the left-looking variant (4) improves over the right-looking variant (5) by a factor of about 2.14, as it performs fewer modular reductions. The Crout variant (3) then improves over variant (4) by a factor of about 3.15. Lastly, we also show the speed of the final implementation, formed by the tile recursive algorithm cascading to either the Crout base case (1) or the left-looking one (2). The threshold where the cascading to the base case occurs is experimentally set to its optimal value, i.e. 200 for variant (1) and 70 for variant (2). This illustrates that the gain in base case efficiency leads to a higher threshold, and improves the efficiency of the cascade implementation (by an additive gain of about 2.2 effective Gfops in the range of dimensions considered).
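The two ingredients of these experiments, generating $A = LRU$ and running a rank-profile-revealing base-case elimination, can be sketched in Python. This is our own simplified model, not the FFLAS-FFPACK code: a plain left-looking row update with lexicographic pivot search stands in for the Crout schedule, the pivot bookkeeping replaces explicit rotations, and $p$ is assumed prime so that inverses come from Fermat's little theorem. Since the generated $L$ and $U$ are unit triangular, every leading sub-matrix of $A = LRU$ has the same rank as the corresponding leading sub-matrix of $R$, so the elimination must recover exactly $R$:

```python
import random

def rank_profile_matrix(A, p):
    """Rank profile matrix of A over GF(p), p prime (simplified sketch:
    lexicographic pivot search with left-looking, Crout-like row updates)."""
    m, n = len(A), len(A[0])
    pivots, pivot_rows = [], []               # pivot positions, reduced rows
    for i in range(m):
        row = [x % p for x in A[i]]
        for (_, s), prow in zip(pivots, pivot_rows):
            f = row[s]                        # delayed update by earlier pivots
            if f:
                row = [(x - f * y) % p for x, y in zip(row, prow)]
        if any(row):
            s = next(j for j, x in enumerate(row) if x)   # left-most non-zero
            inv = pow(row[s], p - 2, p)       # modular inverse (p prime)
            pivots.append((i, s))
            pivot_rows.append([x * inv % p for x in row])
    R = [[0] * n for _ in range(m)]
    for i, s in pivots:
        R[i][s] = 1
    return R

def random_instance(m, n, r, p, rng):
    """A = L R U over GF(p): random unit triangular L, U and a random
    m x n r-sub-permutation matrix R, as in the experiments; then R_A = R."""
    L = [[rng.randrange(p) if j < i else int(i == j) for j in range(m)]
         for i in range(m)]
    U = [[rng.randrange(p) if j > i else int(i == j) for j in range(n)]
         for i in range(n)]
    R = [[0] * n for _ in range(m)]
    for i, j in zip(rng.sample(range(m), r), rng.sample(range(n), r)):
        R[i][j] = 1
    mul = lambda X, Y: [[sum(a * b for a, b in zip(rx, cy)) % p
                         for cy in zip(*Y)] for rx in X]
    return mul(mul(L, R), U), R

A, R = random_instance(7, 6, 4, 101, random.Random(42))
assert rank_profile_matrix(A, 101) == R
print("rank profile matrix of A = L R U equals R")
```
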
Usual algorithms computing an echelon form [13, 9] use a slab block decomposition (with row or lexicographic order search), which implies that pivots appear in the order of the echelon form. The column echelon form is then simply obtained as $C = PL$ from the PLUQ decomposition. Using a product order search, this is no longer true, and the order of the columns in $L$ may not be that of the echelon form. Algorithm 2 shows how to recover the echelon form in such cases. Note that both the row and the column echelon forms can thus be computed from the same PLUQ decomposition.

Algorithm 2 Echelon form from a PLUQ decomposition
Input: $P, L, U, Q$, a PLUQ decomposition of $A$ with $R_A = \Pi_{P,Q}$
Output: $C$: the column echelon form of $A$
1: $C \leftarrow PL$
2: $(p_1, .., p_r) \leftarrow \mathrm{Sort}(\sigma_P(1), .., \sigma_P(r))$
3: $\tau \leftarrow (\sigma_P^{-1}(p_1), .., \sigma_P^{-1}(p_r), r + 1, .., m)$
4: $C \leftarrow C P_\tau$

Lastly, the column echelon form of the $i \times j$ leading sub-matrix is computed by removing the rows of $PL$ below index $i$ and filtering out the pivots of column index greater than $j$. The latter is achieved by replacing line 2 with $(p_1, .., p_s) \leftarrow \mathrm{Sort}(\{\sigma_P(k) : \sigma_Q(k) \leq j\})$.

References

[1] N. Bourbaki.
Groupes et Algèbres de Lie, Chapters 4–6. Elements of Mathematics. Springer, 2008.

[2] J. J. Dongarra, I. S. Duff, D. C. Sorensen, and H. A. van der Vorst. Numerical Linear Algebra for High Performance Computers. SIAM, 1998.

[3] J.-G. Dumas, T. Gautier, C. Pernet, and Z. Sultan. Parallel computation of echelon forms. In Euro-Par 2014 Parallel Processing, volume 8632 of LNCS, pages 499–510. Springer, 2014. doi:10.1007/978-3-319-09873-9_42.

[4] J.-G. Dumas, P. Giorgi, and C. Pernet. Dense linear algebra over prime fields. ACM TOMS, 35(3):1–42, Nov. 2008. doi:10.1145/1391989.1391992.

[5] J.-G. Dumas, C. Pernet, and Z. Sultan. Simultaneous computation of the row and column rank profiles. In M. Kauers, editor, Proc. ISSAC'13. ACM Press, 2013. URL: http://hal.archives-ouvertes.fr/hal-00778136, doi:10.1145/2465506.2465517.

[6] J.-G. Dumas and J.-L. Roch. On parallel block algorithms for exact triangularizations. Parallel Computing, 28(11):1531–1548, Nov. 2002. doi:10.1016/S0167-8191(02)00161-8.

[7] D. Y. Grigor'ev. Analogy of Bruhat decomposition for the closure of a cone of Chevalley group of a classical series. Soviet Mathematics Doklady, 23(2):393–397, 1981.

[8] O. H. Ibarra, S. Moran, and R. Hui. A generalization of the fast LUP matrix decomposition algorithm and applications. J. of Algorithms, 3(1):45–56, 1982. doi:10.1016/0196-6774(82)90007-4.

[9] C.-P. Jeannerod, C. Pernet, and A. Storjohann. Rank-profile revealing Gaussian elimination and the CUP matrix decomposition. J. Symbolic Comput., 56:46–68, 2013. doi:10.1016/j.jsc.2013.04.004.

[10] D. J. Jeffrey. LU factoring of non-invertible matrices. ACM Comm. Comp. Algebra, 44(1/2):1–8, July 2010. doi:10.1145/1838599.1838602.

[11] W. Keller-Gehrig. Fast algorithms for the characteristic polynomial. Th. Comp. Science, 36:309–317, 1985. doi:10.1016/0304-3975(85)90049-0.

[12] G. I. Malaschonok. Fast generalized Bruhat decomposition. In CASC'10, volume 6244 of LNCS, pages 194–202. Springer-Verlag, Berlin, Heidelberg, 2010.

[13] A. Storjohann. Algorithms for Matrix Canonical Forms. PhD thesis, ETH-Zentrum, Zürich, Switzerland, Nov. 2000. doi:10.3929/ethz-a-004141007.