A Generalization of QR Factorization To Non-Euclidean Norms
REID ATCHESON

Abstract.
I propose a way to use non-Euclidean norms to formulate a QR-like factorization which can unlock interesting and potentially useful properties of non-Euclidean norms - for example the ability of the l_1 norm to suppress outliers or promote sparsity. A classic QR factorization of a matrix A computes an upper triangular matrix R and an orthogonal matrix Q such that A = QR. To generalize this factorization to a non-Euclidean norm ‖·‖, I relax the orthogonality requirement for Q and instead require that it have condition number κ(Q) = ‖Q^{-1}‖ ‖Q‖ bounded independently of A. I present the algorithm for computing Q and R and prove that this algorithm results in Q with the desired properties. I also prove that this algorithm generalizes classic QR factorization in the sense that when the norm is chosen to be Euclidean, ‖·‖ = ‖·‖_2, then Q is orthogonal. Finally I present numerical results confirming the mathematical results with the l_1 and l_∞ norms. I supply Python code for experimentation.

Key words.
QR factorization
1. Introduction.
The QR factorization allows for highly stable, robust, and efficient matrix factorization with beneficial properties for many areas of numerical linear algebra. The factorization may be implemented using only stable operations such as Householder reflectors or Givens rotations [10], and highly efficient blocked implementations can use level-3 BLAS [8], [1]. The key property of the factorization is orthogonality. Orthogonality means that if we QR factorize a matrix A = QR then Q^T = Q^{-1}. This property has enormous utility in numerical linear algebra: it plays a significant role in almost every eigenvalue algorithm [3] as well as in least-squares solvers [8]. Despite the huge success of the QR factorization it has resisted generalization, in large part because the orthogonality property is tightly bound to the underlying Euclidean norm ‖·‖_2 and is almost meaningless without it. Indeed a well-known fact is that the constraint of orthogonality on Q almost completely determines the result of the algorithm, leaving little room for alterations that could benefit a new domain - for example by using a norm with domain-specific advantages over the Euclidean norm.

Non-Euclidean norms can sometimes provide domain-specific benefits. It is now well known, for example, that the l_1 norm, when used for regression (where it is called "least absolute deviations"), is far less sensitive to outliers than the l_2 norm [6], and the l_1 norm can also promote sparsity compared to the l_2 norm [7]. The l_∞ norm also has utility for minimax problems [5]. These properties of some non-Euclidean norms have recently resulted in new matrix factorizations using these norms in place of the l_2 norm, for example l_1-norm-based SVD factorizations [11]. I determined to investigate whether we can similarly modify a QR factorization to inherit benefits from non-Euclidean norms.

As already mentioned, the orthogonality property of the QR factorization nearly completely determines it and simultaneously locks us in to the l_2 norm; thus we need to relax this condition and find an analogous condition which provides similar utility. For this work I focus on the conditioning of Q rather than orthogonality. An orthogonal matrix has condition number (in the l_2 norm) equal to 1. Thus I present algorithm 2.1, which for any prescribed norm produces a QR-like factorization where the resulting matrix Q is well-conditioned in the supplied norm. One key contribution of this work is the statement and proof of the conditioning theorem 2.5, which bounds the norm of the inverse of Q. I also show in theorem 2.10 that this algorithm generalizes the classic QR factorization in the sense that if the prescribed norm is the Euclidean norm then Q is orthogonal. I follow the mathematical proofs with numerical experiments using the l_1 and l_∞ norms.
2. Main results.
The main results of this work are mathematical, with some light numerical experiments for illustration purposes. In this section I first present algorithm 2.1 below. I then state and prove key bounds on the resulting matrix Q. I first state and prove the forward bound (theorem 2.3), which shows that while Q does not have orthogonality, it still does not increase the norm of vectors significantly when applied to them (an orthogonal matrix, by comparison, does not increase the norm of a vector at all). The next theorem, 2.5, shows a similar kind of bound but in the other direction: how much Q can shrink an input vector. An orthogonal matrix will not shrink an input vector at all, but since algorithm 2.1 does not guarantee orthogonality I instead provide bounds that constrain its conditioning.

I freely use the following vector norms throughout:

Definition 2.1 (vector norms).
Suppose that m is a positive integer and that x ∈ R^m. Define the following:

    ‖x‖_1 = Σ_{j=1}^m |x_j|              (l_1 norm)     (2.1)
    ‖x‖_2 = ( Σ_{j=1}^m x_j^2 )^{1/2}    (l_2 norm)     (2.2)
    ‖x‖_∞ = max_j |x_j|                  (l_∞ norm)     (2.3)

The core algorithm under investigation follows. For simplicity of presentation I focus on the case where the input matrix A is full-rank and square, but I also show that the algorithm and subsequent theorems may be trivially extended to low-rank or rectangular cases in section 2.1.
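For concreteness, these three norms correspond to NumPy's ord=1, ord=2, and ord=np.inf vector norms; a quick check of the definitions:

import numpy as np

x = np.array([3.0, -4.0, 1.0])
print(np.linalg.norm(x, 1))       # l_1 norm: |3| + |-4| + |1| = 8
print(np.linalg.norm(x, 2))       # l_2 norm: sqrt(9 + 16 + 1) ~= 5.0990
print(np.linalg.norm(x, np.inf))  # l_inf norm: max(3, 4, 1) = 4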
Algorithm 2.1 (Generalized QR Factorization). Start with an input A ∈ R^{m×m} and any norm ‖·‖ on R^m. I use

    A = (A_1, A_2, A_3, ..., A_m)    (2.4)
    Q = (Q_1, Q_2, Q_3, ..., Q_m)    (2.5)

to represent A and Q by their respective columns. Furthermore I define A^{(i)}, Q^{(i)} ∈ R^{m×i} as the first i columns of A and Q respectively:

    Q^{(i)} = (Q_1, Q_2, ..., Q_i),
    A^{(i)} = (A_1, A_2, ..., A_i).

I now define the Q and R factors inductively as follows:

    Q_1 = A_1 / ‖A_1‖    (2.6)
    R(1, 1) = ‖A_1‖      (2.7)

and for any 1 ≤ j ≤ m − 1:

    c^j = argmin_{c ∈ R^j} ‖A_{j+1} − Q^{(j)} c‖               (2.8)
    γ_j = ‖A_{j+1} − Q^{(j)} c^j‖                              (2.9)
    Q^{(j+1)} = ( Q^{(j)}, γ_j^{−1} (A_{j+1} − Q^{(j)} c^j) )  (2.10)
    R(1:j, j+1) = c^j                                          (2.11)
    R(j+1, j+1) = γ_j                                          (2.12)
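To make the inductive definitions concrete, here is a minimal NumPy sketch of algorithm 2.1. This is my own illustration: the callback names norm and min_norm_solve are placeholders, and a full implementation with concrete l_1 and l_∞ solvers appears in appendix A.

import numpy as np

def generalized_qr(A, norm, min_norm_solve):
    # Sketch of algorithm 2.1. `norm` maps a vector to a scalar;
    # `min_norm_solve(Q, a)` returns argmin_c norm(a - Q @ c).
    # A is assumed square and full-rank (no breakdown handling).
    m = A.shape[0]
    Q = np.zeros((m, m))
    R = np.zeros((m, m))
    R[0, 0] = norm(A[:, 0])                   # (2.7)
    Q[:, 0] = A[:, 0] / R[0, 0]               # (2.6)
    for j in range(1, m):
        c = min_norm_solve(Q[:, :j], A[:, j])  # (2.8)
        r = A[:, j] - Q[:, :j] @ c
        gamma = norm(r)                        # (2.9)
        Q[:, j] = r / gamma                    # (2.10)
        R[:j, j] = c                           # (2.11)
        R[j, j] = gamma                        # (2.12)
    return Q, R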
Theorem 2.2 (A = QR). Suppose that m is a positive integer, that A ∈ R^{m×m} is full-rank, that ‖·‖ is a norm, and that Q, R are output from algorithm 2.1. Then A = QR.

Proof.
I proceed by mathematical induction. Note that A_1 = Q_1 R(1, 1) follows directly from the base case definitions of these quantities, (2.6) and (2.7). Now assume A^{(j)} = Q^{(j)} R(1:j, 1:j) for some j ≥ 1. Then from (2.8)-(2.12) we have

    Q^{(j+1)} R(1:j+1, 1:j+1) = ( Q^{(j)}, Q_{j+1} ) [ R(1:j, 1:j)  c^j ]
                                                     [ 0            γ_j ]
        = ( Q^{(j)} R(1:j, 1:j),  Q^{(j)} c^j + γ_j Q_{j+1} )
        = ( Q^{(j)} R(1:j, 1:j),  Q^{(j)} c^j + γ_j γ_j^{−1} (A_{j+1} − Q^{(j)} c^j) )
        = ( A^{(j)}, A_{j+1} )
        = A^{(j+1)},

establishing the equation A^{(j)} = Q^{(j)} R(1:j, 1:j) for all positive integers j ≤ m. Taking j = m proves theorem 2.2.
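Assuming the generalized_qr sketch above, theorem 2.2 can be checked numerically. With the Euclidean norm the minimization (2.8) is an ordinary least-squares problem, so np.linalg.lstsq serves as the solver:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
lstsq = lambda Q, a: np.linalg.lstsq(Q, a, rcond=None)[0]

Q, R = generalized_qr(A, np.linalg.norm, lstsq)
print(np.allclose(A, Q @ R))       # True: A = QR as in theorem 2.2
print(np.allclose(R, np.triu(R)))  # True: R is upper triangular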
The first theorem related to the conditioning of Q establishes a simple forward bound on its norm.

Theorem 2.3 (forward bound). Suppose that m is a positive integer and that ‖·‖ is a norm on R^m. Then there exists C_1 > 0 such that for every full-rank A ∈ R^{m×m} and every x ∈ R^m we have

    ‖Qx‖ ≤ C_1 ‖x‖

where Q is output from algorithm 2.1.

Proof. Suppose that x ∈ R^m. Then

    ‖Qx‖ = ‖ Σ_{i=1}^m Q_i x_i ‖ ≤ Σ_{i=1}^m ‖Q_i x_i‖ ≤ Σ_{i=1}^m ‖Q_i‖ |x_i| = ‖x‖_1,

where the final equality holds because every column of Q is normalized, ‖Q_i‖ = 1. Finally we may apply the equivalence of all norms on finite-dimensional spaces to choose C_1 > 0 such that ‖x‖_1 ≤ C_1 ‖x‖ holds for all x.

Remark 2.4. The constant C_1 produced above is independent of A but still (likely) depends on the dimension m of the space because of the use of norm equivalence.
Theorem 2.5 (inverse bound). Suppose that m is a positive integer and that ‖·‖ is a norm on R^m. Then there exists C_2 > 0 such that for every full-rank A ∈ R^{m×m} and every x ∈ R^m we have

    C_2 ‖x‖ ≤ ‖Qx‖

where Q is output from algorithm 2.1.

Theorem 2.5 is a little more involved than those that preceded it, so I organize its proof into a sequence of lemmas followed by the main proof. The lemmas establish partial bounds which, when combined carefully, result in the complete inverse bound. The first lemma establishes a partial inverse bound on Q using the optimality properties of its columns.

Lemma 2.6 (partial inverse bound on Q). Suppose that m is a positive integer, that ‖·‖ is a norm on R^m, and that A ∈ R^{m×m} is full-rank. Suppose further that k is such that 1 ≤ k ≤ m and that x_1, ..., x_k are real numbers such that x_k ≠ 0. Then

    ‖ Σ_{j=1}^k Q_j x_j ‖ ≥ |x_k|

where Q is produced by algorithm 2.1.

Proof of lemma 2.6.
From the inductive definition (2.10) of Q we have Q_k = γ_k^{−1} ( A_k − Σ_{j=1}^{k−1} Q_j c^k_j ), where c^k are the coefficients which solve the minimization problem (2.8) for column k and γ_k is the corresponding minimum value (2.9). Hence

    ‖ Σ_{j=1}^k Q_j x_j ‖ = ‖ Σ_{j=1}^{k−1} Q_j x_j + Q_k x_k ‖
        = ‖ Σ_{j=1}^{k−1} Q_j x_j + γ_k^{−1} ( A_k − Σ_{j=1}^{k−1} Q_j c^k_j ) x_k ‖.

We may rearrange terms and factor out γ_k^{−1} x_k ≠ 0 as follows:

    = ‖ Σ_{j=1}^{k−1} Q_j ( x_j − γ_k^{−1} c^k_j x_k ) + γ_k^{−1} A_k x_k ‖
    = | γ_k^{−1} x_k | ‖ Σ_{j=1}^{k−1} Q_j ( γ_k x_k^{−1} x_j − c^k_j ) + A_k ‖
    ≥ | γ_k^{−1} x_k | γ_k
    = |x_k|.

The inequality follows from the optimality property of c^k as specified in (2.8): the vector inside the norm has the form A_k − Σ_{j=1}^{k−1} Q_j b_j for some coefficients b, and by the definition of γ_k in (2.9) the norm of any such vector is at least γ_k.

The next lemma establishes a partial bound for Q when the vector x satisfies a decay property.
Lemma 2.7 (decay bound). Suppose that m is a positive integer, that ‖·‖ is a norm on R^m, and that A ∈ R^{m×m} is full-rank. Suppose further that k is such that 1 ≤ k ≤ m and that x ∈ R^m satisfies the following decay property:

    |x_j| ≤ 2^{−j} ‖x‖_∞    (j = k, ..., m).

Then we have

    ‖ Σ_{j=k}^m Q_j x_j ‖ ≤ Σ_{j=k}^m 2^{−j} ‖x‖_∞

where Q is produced by algorithm 2.1.

Proof of lemma 2.7. This follows by applying the triangle inequality, recognizing that ‖Q_j‖ = 1 for all j, and then applying the decay property of x:

    ‖ Σ_{j=k}^m Q_j x_j ‖ ≤ Σ_{j=k}^m ‖Q_j‖ |x_j| ≤ Σ_{j=k}^m 2^{−j} ‖x‖_∞.

Now I prove the main fact below.
Proof of theorem 2.5.
For this proof I use the l_∞ norm defined in (2.3): ‖y‖_∞ = max_j |y_j|.
Suppose that A ∈ R^{m×m} and that x ∈ R^m, x ≠ 0. Define the integer k to be the largest integer that satisfies 1 ≤ k ≤ m and the following inequality:

    |x_k| ≥ 2^{−k} ‖x‖_∞.

(Such a k exists because the index attaining the maximum in ‖x‖_∞ satisfies the inequality.) By definition of k we see that x satisfies the decay property stated in lemma 2.7 for x_{k+1}, ..., x_m. Before proceeding with the key inequality I first establish nonnegativity of a key term so that we may later remove absolute values from it:

    ‖ Σ_{j=k+1}^m Q_j x_j ‖ ≤ Σ_{j=k+1}^m 2^{−j} ‖x‖_∞ ≤ 2^{−k} ‖x‖_∞ ≤ |x_k| ≤ ‖ Σ_{j=1}^k Q_j x_j ‖,

where the first inequality is lemma 2.7 and the last is lemma 2.6. Thus we have

    | ‖ Σ_{j=1}^k Q_j x_j ‖ − ‖ Σ_{j=k+1}^m Q_j x_j ‖ | = ‖ Σ_{j=1}^k Q_j x_j ‖ − ‖ Σ_{j=k+1}^m Q_j x_j ‖ ≥ 0.

Combining this with the reverse triangle inequality:

    ‖Qx‖ = ‖ Σ_{j=1}^m Q_j x_j ‖
         = ‖ Σ_{j=1}^k Q_j x_j + Σ_{j=k+1}^m Q_j x_j ‖
         ≥ | ‖ Σ_{j=1}^k Q_j x_j ‖ − ‖ Σ_{j=k+1}^m Q_j x_j ‖ |
         = ‖ Σ_{j=1}^k Q_j x_j ‖ − ‖ Σ_{j=k+1}^m Q_j x_j ‖
         ≥ ‖ Σ_{j=1}^k Q_j x_j ‖ − Σ_{j=k+1}^m 2^{−j} ‖x‖_∞
         ≥ |x_k| − Σ_{j=k+1}^m 2^{−j} ‖x‖_∞
         ≥ 2^{−k} ‖x‖_∞ − Σ_{j=k+1}^m 2^{−j} ‖x‖_∞
         = ( 2^{−k} − Σ_{j=k+1}^m 2^{−j} ) ‖x‖_∞
         ≥ 2^{−m} ‖x‖_∞.

Next we may apply norm equivalence in finite-dimensional spaces to choose a constant
K > 0 such that ‖y‖ ≤ K ‖y‖_∞ holds for all y ∈ R^m. Finally, by defining C_2 = K^{−1} 2^{−m} we see that ‖Qx‖ ≥ 2^{−m} ‖x‖_∞ ≥ C_2 ‖x‖, completing the proof.
Corollary 2.8 (bounded condition number). Suppose that m is a positive integer and that ‖·‖ is a norm on R^m. Then there exists C > 0 such that for every full-rank A ∈ R^{m×m} we have

    κ(Q) = ‖Q^{−1}‖ ‖Q‖ ≤ C

where Q is output from algorithm 2.1.

Proof. Applying theorems 2.3 and 2.5 together we may take C = C_1 / C_2 and see immediately that it establishes the desired bound.

Remark 2.9. While C_1 likely depends on the dimension m of the space because of the use of norm equivalence, C_2 almost certainly depends on m. The best bound achieved here has C_2 decaying exponentially with m, leading to an exponentially growing bound on the condition number of Q if the bound is sharp. We will see however in the numerical experiments presented in section 3 that this bound appears to be a much more manageable O(m) for the l_1 and l_∞ norms.

The key point of this theorem wasn't necessarily a provably small bound, but rather that, regardless of the norm ‖·‖, the bound is independent of A; in particular A may be nearly numerically singular and Q still mathematically has the same conditioning. It may be possible, if we restrict ourselves to specific norms, to prove much more lenient bounds. Of course we already know that for the l_2 norm we have C_1 = C_2 = 1 (see theorem 2.10 below).

Finally I show that when we take the input norm to be the classic Euclidean norm the factorization becomes a classic QR factorization.
Theorem 2.10 (Euclidean case). Suppose that m is a positive integer, that A ∈ R^{m×m} is full-rank, that ‖·‖ = ‖·‖_2 is the classic Euclidean norm, and that Q, R are output from algorithm 2.1. Then Q^{−1} = Q^T.

Proof.
By the inductive definition of Q in (2.10) we have

    Q_{j+1} = γ_j^{−1} ( A_{j+1} − Q^{(j)} c^j ).    (2.13)

Recall that c^j solves the minimization problem

    c^j = argmin_{c ∈ R^j} ‖ A_{j+1} − Q^{(j)} c ‖_2,    (2.14)

which means it forms the l_2 projection of A_{j+1} onto the space V = span(Q_1, ..., Q_j). Since Q_{j+1} is proportional to the residual of this projection, it is orthogonal to the whole space V. In other words the above shows that the columns of Q are mutually orthogonal, and the columns are also obviously normalized, so in fact the columns are mutually orthonormal - thus we have Q^T Q = I as desired.

2.1. Extensions to rank-deficient and rectangular matrices. The theorems above establish that the algorithm produces a factorization of A and that the resulting matrix Q has good conditioning properties. Everything so far assumed that A was square and full-rank, but I demonstrate below that these restrictions may easily be removed without changing the theorems. One can use algorithm 2.1 without significant modification on rank-deficient matrices and rectangular matrices. For this we need a "breakdown condition" on the normalization value γ_j in equation (2.10). When γ_j ≈ 0 the current column of A already lies (numerically) in the span of the previous columns, and A is rank deficient. To handle this case the algorithm fills the corresponding values of R (resulting in a 0 on the diagonal) but does not include the new column of Q corresponding to the breakdown, and then proceeds to the next column of A until all columns have been processed. However many columns of A get "skipped" in this fashion reduces the number of columns of Q and rows of R. In other words, if the input matrix A ∈ R^{m×m} has rank k ≤ m then the above modifications output Q ∈ R^{m×k}, R ∈ R^{k×m} (similar to a "thin QR"). A factorization obtained in this way may readily be shown to satisfy all of the theorems that assume full-rank A, as sketched in the code below.

For rectangular matrices we may use the above observation and simply embed the rectangular matrix into a square matrix that is zero-padded. The resulting matrix will be rank deficient, and the earlier modifications to the algorithm will correctly produce a factorization. In practice one should simply use the rectangular matrices directly - but I make this observation for the purpose of extending the theorems proven for the square matrix case.

Following these observations we could further extend this algorithm into a "rank revealing" algorithm which also outputs a column permutation for A which guarantees that the diagonal of R is decreasing. I have done this in a pre-print [2] and there have proven that the resulting factorization has the expected rank-revealing properties and can even be used as a way to compute low-rank approximations to an input matrix, similar to classical rank-revealing QR. I found the resulting conditioning theorems, specifically theorem 2.5, very difficult to prove, however, and in this manuscript sought to remove any extraneous details not relevant to this bound.
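A minimal sketch of this breakdown handling, built on the generalized_qr sketch from earlier (my own reconstruction; the relative tolerance tol is an assumption of mine, not something the text prescribes):

import numpy as np

def generalized_qr_thin(A, norm, min_norm_solve, tol=1e-12):
    # Breakdown-tolerant variant: when gamma ~ 0 the current column of A
    # already lies in span(Q), so we fill R but do not grow Q.
    m, n = A.shape
    Q = np.zeros((m, 0))    # grows to m x k, k = numerical rank
    R = np.zeros((n, n))    # rows trimmed to k at the end
    k = 0                   # current number of columns in Q
    for j in range(n):
        if k == 0:
            c, r = np.zeros(0), A[:, j].copy()
        else:
            c = min_norm_solve(Q, A[:, j])
            r = A[:, j] - Q @ c
        gamma = norm(r)
        R[:k, j] = c
        if gamma > tol * norm(A[:, j]):
            # no breakdown: append a new normalized column
            Q = np.column_stack([Q, r / gamma])
            R[k, j] = gamma
            k += 1
        # on breakdown the diagonal entry stays 0 and Q is not extended
    return Q, R[:k, :]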
3. Numerical Experiments.
Below I provide numerical experiments to confirm the theorems concerning Q, and to provide some intuition I also suggest a way to interpret Q as a basis, similarly to how it is interpreted for classic QR. I do these studies for both the l_1 and l_∞ norms. I describe in appendix A how the factorization was implemented and provide example code for this purpose.

For these studies I seek to confirm that the forward bound (theorem 2.3) and the inverse bound (theorem 2.5) are indeed independent of any input A. I then attempt to quantify the dependence of these bounds on m, as the proof of theorem 2.5 resulted in an exponentially decaying bound as m → ∞, corresponding to an exponentially growing inverse matrix norm. Since this theorem was proved for an arbitrary norm, it stands to reason that specific concrete norms could improve on this growth significantly. To show this I randomly sample matrices with different condition numbers and sizes m, apply algorithm 2.1, and then compute the forward and inverse bounds as matrix norms. I do this first for the l_1 case and then follow with the l_∞ case.
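A sketch of this sampling procedure (my own reconstruction, not the paper's exact script; qr is the l_1 variant from appendix A): matrices with a prescribed condition number can be synthesized from an SVD with chosen singular values, and the two bounds read off as induced matrix norms.

import numpy as np

def matrix_with_cond(m, cond, rng):
    # A = U diag(s) V^T with singular values from 1 down to 1/cond
    U, _ = np.linalg.qr(rng.standard_normal((m, m)))
    V, _ = np.linalg.qr(rng.standard_normal((m, m)))
    s = np.logspace(0.0, -np.log10(cond), m)
    return U @ np.diag(s) @ V.T

rng = np.random.default_rng(0)
for cond in [1e2, 1e6, 1e10]:
    A = matrix_with_cond(40, cond, rng)
    Q, _ = qr(A)                               # appendix A, l_1 norm
    fwd = np.linalg.norm(Q, 1)                 # forward bound ||Q||
    inv = np.linalg.norm(np.linalg.inv(Q), 1)  # inverse bound ||Q^-1||
    print(f"cond(A)={cond:.0e}: ||Q||={fwd:.2f}, ||Q^-1||={inv:.2f}")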
Fig. 1: Using the l_1 norm: dependence of the forward and inverse bounds of Q on m and on κ(A). (Panels: ‖Q‖ and ‖Q^{−1}‖ as functions of the l_1-condition number of A, and as functions of m for 20 ≤ m ≤ 100.)

Fig. 2: Using the l_∞ norm: dependence of the forward and inverse bounds of Q on m and on κ(A). (Panels: ‖Q‖ and ‖Q^{−1}‖ as functions of the condition number of A, and as functions of m for 20 ≤ m ≤ 100.)

What we see here is confirmation that the bounds are independent of A and that the growth in m is far milder than the exponential worst case - there is what appears to be linear growth.

To help provide extra intuition for the matrix Q, I show here that we may interpret it in much the same way we interpret this matrix when it arises from a classic QR factorization: as an optimized basis. To illustrate this I take the Vandermonde matrix V defined as follows:

    m = 400,  n = 5,  h = 2/(m − 1),
    x_i = 1 − h(i − 1)     (i = 1, ..., m),
    V_{i,j} = x_i^{j−1}    (i = 1, ..., m), (j = 1, ..., n),

or, in other words, the j-th column of the Vandermonde matrix is the j-th monomial applied to a sampling of its domain, in this case 400 equally spaced points from the interval [−1, 1]. I plot the ordinary monomials alongside the columns of the l_1-optimized Q and of the l_∞-optimized Q.

Fig. 3: Applying algorithm 2.1 to the Vandermonde matrix to generate different polynomial bases. (Panels: ordinary monomials, l_1-optimized polynomials, l_∞-optimized polynomials.)

Note that the l_∞ plot effectively is an approximation to the Chebyshev polynomials, which would get better for larger m.
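The basis experiment can be reproduced with the appendix code along these lines (a sketch of mine; qr and linfsolve are the appendix A routines, and plotting is omitted):

import numpy as np

m, n = 400, 5
x = np.linspace(1.0, -1.0, m)          # 400 equally spaced points in [-1, 1]
V = np.vander(x, n, increasing=True)   # V[i, j] = x_i**j: the monomial basis

# l_1-optimized basis (the defaults of qr) and l_inf-optimized basis
Q1, _ = qr(V)
norm_inf = lambda v: np.linalg.norm(v, ord=np.inf)
Qinf, _ = qr(V, norm=norm_inf, solver=linfsolve)
# columns of Qinf approximate scaled Chebyshev polynomials as m grows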
4. Conclusions.
I demonstrated that algorithm 2.1 produces a factorization of an input matrix such that its factors satisfy properties analogous to those of the classical QR factorization, but depending on potentially non-Euclidean norms (which are user-specified). I showed specifically in theorems 2.3 and 2.5 that the Q factor is "well behaved" with respect to the input matrix A. I illustrated the mathematical facts with numerical experiments that validated the principle and showed we can likely improve the constants - at least in the case of the l_1 and l_∞ norms.

Given the ongoing research into exploiting novel properties of non-Euclidean norms for matrix factorizations, I hope to use this as a step towards QR-like factorizations that can build in these properties from other norms.

Appendix A. Python implementation.
The key to implementing algorithm 2.1 is the minimization problem (2.8). Fortunately, for the l_1 and l_∞ norms we can formulate the minimization problems as linear programs and solve them with the simplex algorithm - see e.g. [12] for the l_1 case and [4] for the l_∞ case.

I implement these minimum-norm solvers in two different ways. One way is highly optimized and enables factorizing much larger matrices, but depends on the closed-source NAG library [13] by way of its Python interface [14]. For the l_1 norm I used the NAG routine "e02gac" and for the l_∞ norm I used the NAG routine "e02gcc". This way of computing the factorizations is preferred because it is much more efficient. Since some may not have access to the NAG library, however, I also provide a way to solve the minimization problems directly as linear programs using NumPy [9] and SciPy [15].

The plain SciPy implementation of the l_1 solver is in source listing 4, and the plain SciPy implementation of the l_∞ solver is in source listing 5. Finally, the actual QR algorithm is in listing 6. Note that the norm and the solver are input callbacks so that one may substitute faster solvers, as I have done for the NAG library variants.

Fig. 4: Solving the l_1 minimization problem with a linear program in SciPy

import numpy as np
import scipy.optimize as opt

def lst1norm(A, b):
    # Minimize ||Ax - b||_1 as a linear program in the variables (t, x):
    # minimize sum(t) subject to -t + Ax <= b and -t - Ax <= -b,
    # i.e. |Ax - b| <= t componentwise.
    (m, n) = A.shape
    nvars = m + n
    ncons = 2 * m
    cons = np.zeros((ncons, nvars))
    cons[0:m, 0:m] = -np.identity(m)
    cons[0:m, m:m + n] = A
    cons[m:2 * m, 0:m] = -np.identity(m)
    cons[m:2 * m, m:m + n] = -A
    c = np.zeros(nvars)
    c[0:m] = 1.0
    ub = np.zeros(ncons)
    ub[0:m] = b
    ub[m:2 * m] = -b
    # t >= 0, x unbounded
    bounds = [(0, None)] * m + [(None, None)] * n
    out = opt.linprog(c, cons, ub, None, None, bounds,
                      options={'tol': 1e-10, 'lstsq': True})
    return (out.x[m:m + n], out.fun)

Fig. 5: Solving the l_∞ minimization problem with a linear program in SciPy

import numpy as np
import scipy.optimize as opt

def lstinfnorm(A, b):
    # Minimize ||Ax - b||_inf as a linear program in the variables (x, t):
    # minimize t subject to Ax - t <= b and -Ax - t <= -b,
    # i.e. max_i |(Ax - b)_i| <= t.
    (m, n) = A.shape
    nvars = n + 1
    ncons = 2 * m
    cons = np.zeros((ncons, nvars))
    ub = np.zeros(ncons)
    cons[0:m, 0:n] = A
    cons[0:m, n] = -1.0
    ub[0:m] = b
    cons[m:2 * m, 0:n] = -A
    cons[m:2 * m, n] = -1.0
    ub[m:2 * m] = -b
    coeffs = np.zeros(nvars)
    coeffs[n] = 1.0
    # x unbounded, t >= 0
    bounds = [(None, None)] * n + [(0, None)]
    out = opt.linprog(coeffs, cons, ub, None, None, bounds,
                      options={'tol': 1e-10, 'lstsq': True})
    return (out.x[0:n], out.fun)
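As a quick sanity check of the two listings (my own toy example, not from the experiments above), fit a line through four points where the last sample is an outlier; the l_1 fit largely ignores it while the l_∞ fit spreads the worst-case error:

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([0.0, 1.0, 2.0, 30.0])   # last observation is an outlier

x1, r1 = lst1norm(A, b)
xinf, rinf = lstinfnorm(A, b)
print("l1 fit:", x1, "||r||_1 =", r1)          # close to the line y = t
print("linf fit:", xinf, "||r||_inf =", rinf)  # pulled toward the outlier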
Fig. 6: Python QR algorithm

import numpy as np

USE_NAG = False
try:
    from naginterfaces.library.fit import glin_l1sol
    from naginterfaces.library.fit import glin_linf
    USE_NAG = True   # only set once the NAG imports actually succeed
except ImportError:
    from plain_scipy_solvers import lst1norm, lstinfnorm

def l1solve(A, b):
    # argmin_x ||Ax - b||_1 via NAG e02gac (glin_l1sol) when available,
    # otherwise via the SciPy linear program of listing 4.
    m, n = A.shape
    if USE_NAG:
        B = np.zeros((m + 2, n + 2))   # workspace layout expected by the routine
        B[0:m, 0:n] = A
        _, _, x, _, _, _ = glin_l1sol(B, b)
        return x[0:n]
    else:
        return lst1norm(A, b)[0]

def linfsolve(A, b):
    # argmin_x ||Ax - b||_inf via NAG e02gcc (glin_linf) when available,
    # otherwise via the SciPy linear program of listing 5.
    m, n = A.shape
    if USE_NAG:
        B = np.zeros((n + 3, m + 1))   # workspace layout expected by the routine
        B[0:n, 0:m] = A.T
        relerr = 0.0
        _, _, _, x, _, _, _ = glin_linf(n, B, b, relerr)
        return x
    else:
        return lstinfnorm(A, b)[0]

def qr(A, norm=lambda x: np.linalg.norm(x, ord=1), solver=l1solve):
    # Algorithm 2.1: the norm and the minimum-norm solver are callbacks,
    # so faster solvers (e.g. the NAG variants) can be swapped in.
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    gamma = norm(A[:, 0])
    Q[:, 0] = A[:, 0] / gamma              # (2.6)
    R[0, 0] = gamma                        # (2.7)
    for i in range(1, n):
        c = solver(Q[:, 0:i], A[:, i])     # (2.8)
        r = A[:, i] - Q[:, 0:i] @ c
        gamma = norm(r)                    # (2.9)
        Q[:, i] = r / gamma                # (2.10)
        R[0:i, i] = c                      # (2.11)
        R[i, i] = gamma                    # (2.12)
    return Q, R
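Assuming the listings above are importable (e.g. the solvers saved as plain_scipy_solvers.py), a typical invocation looks like this (my example):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 30))

Q1, R1 = qr(A)                  # defaults: l_1 norm with the l_1 solver
print(np.allclose(A, Q1 @ R1))  # True - theorem 2.2

norm_inf = lambda v: np.linalg.norm(v, ord=np.inf)
Qi, Ri = qr(A, norm=norm_inf, solver=linfsolve)
print(np.allclose(A, Qi @ Ri))  # True for the l_inf variant as well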
REFERENCES

[1] E. Anderson, Z. Bai, C. Bischof, L. S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK Users' Guide, Society for Industrial and Applied Mathematics, third ed., 1999, https://doi.org/10.1137/1.9780898719604.
[2] R. Atcheson, A rank revealing factorization using arbitrary norms, 2019, https://arxiv.org/abs/1905.02355.
[3] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, Templates for the Solution of Algebraic Eigenvalue Problems, Society for Industrial and Applied Mathematics, 2000, https://doi.org/10.1137/1.9780898719581.
[4] I. Barrodale and C. Phillips, An improved algorithm for discrete Chebyshev linear approximation, in Proc. 4th Manitoba Conf. on Numer. Math., U. of Manitoba, Winnipeg, Canada, 1974, pp. 177-190.
[5] I. Barrodale and C. Phillips, Algorithm 495: Solution of an overdetermined system of linear equations in the Chebyshev norm [F4], ACM Trans. Math. Softw., 1 (1975), pp. 264-270, https://doi.org/10.1145/355644.355651.
[6] D. Birkes and Y. Dodge, Alternative Methods of Regression, vol. 190, John Wiley & Sons, 2011.
[7] D. Donoho, For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution, Communications on Pure and Applied Mathematics, 59 (2006), pp. 797-829.
[8] G. H. Golub and C. F. Van Loan, Matrix Computations, JHU Press, 2013.
[9] C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk, M. Brett, A. Haldane, J. F. del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke, and T. E. Oliphant, Array programming with NumPy, Nature, 585 (2020), pp. 357-362, https://doi.org/10.1038/s41586-020-2649-2.
[10] N. J. Higham, Accuracy and Stability of Numerical Algorithms, Society for Industrial and Applied Mathematics, Philadelphia, PA, second ed., 2002.
[11] Q. Ke and T. Kanade, Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
[12] J. W. Liu, The role of elimination trees in sparse factorization, SIAM Journal on Matrix Analysis and Applications, 11 (1990), pp. 134-172, https://doi.org/10.1137/0611010.
[13] NAG Inc. and NAG Ltd., NAG C Library.
[14] NAG Inc. and NAG Ltd., NAG Library for Python.
[15] P. Virtanen, R. Gommers, T. E. Oliphant, et al., SciPy 1.0: fundamental algorithms for scientific computing in Python, Nature Methods, 17 (2020), pp. 261-272, https://doi.org/10.1038/s41592-019-0686-2.