Least singular value and condition number of a square random matrix with i.i.d. rows
M. Gregoratti, D. Maran
Politecnico di Milano, Dipartimento di Matematica, Piazza Leonardo da Vinci 32, I-20133 Milano, Italy
April 16, 2020
Abstract
We consider a square random matrix made of i.i.d. rows with any distribution and prove that, for any given dimension, the probability for the least singular value to be in $[0, \epsilon)$ is at least of order $\epsilon$. This allows us to generalize a result about the expectation of the condition number that was proved in the case of centered gaussian i.i.d. entries: such an expectation is always infinite. Moreover, we get some additional results for some well-known random matrix ensembles, in particular for the isotropic log-concave case, which turns out to be among the best behaved in terms of conditioning.

Keywords: least singular value, condition number, random matrix
1 Introduction

The first important results about the least singular value $\sigma_{\min}(\widetilde X)$ and the condition number $\kappa(\widetilde X)$ of a square $n \times n$ random matrix $\widetilde X$ were obtained in 1988. Edelman in [3] computed the exact distribution of $\sigma_{\min}(\widetilde X)$ for a matrix of i.i.d. complex standard gaussian entries and the limiting distribution in the i.i.d. real standard gaussian case. Kostlan in [5] proved that $E[\kappa(\widetilde X)] = +\infty$ whenever the entries are i.i.d. real centered gaussian, regardless of the matrix dimension. Two years later Szarek in [13] found lower and upper bounds both for $E[\log \kappa(\widetilde X)]$ and for $E[\kappa(\widetilde X)^{\alpha}]$, $0 < \alpha < 1$,
which hold whenever the entries are i.i.d. standard gaussian, and which depend only on the matrix dimension $n$, the choice of the $p$-norm on $\mathbb{R}^n$, and the choice of $\alpha$.

After twenty years Tao and Vu discovered that, again in the case of i.i.d. random entries, the limiting distribution of $\sigma_{\min}(\widetilde X)$ is universal [18]: if the moments of the entries are bounded, then the cumulative distribution function of the least singular value converges (uniformly) to the one of the gaussian case when the dimension $n$ of the matrix grows.

In recent years more general classes of random matrices have been studied: on one side, some works focused on removing the assumption of gaussian entries, substituting it with a bound on their tail distribution [10] or even just with the existence of a finite variance [9]; on the other side, some works relaxed the assumption that the entries of the same row are independent and focused on matrices with i.i.d. rows of specific distributions [1], [20].

These works were focused on finding upper bounds for the cumulative distribution function of the least singular value $\sigma_{\min}(\widetilde X)$ (actually, of $n^{1/2}\sigma_{\min}(\widetilde X)$, since $\sigma_{\min}(\widetilde X) \to 0$ as $n \to \infty$), as well as lower bounds for the one of the condition number $\kappa(\widetilde X)$, since the least singular value is smaller the closer the matrix is to singularity, while the condition number is larger. Moreover, two of them, [1] and [20], managed to prove bounds of the type $P\bigl(\sigma_{\min}(\widetilde X) \leq \epsilon\bigr) < f(n, \epsilon)$ which hold for fixed values of $n$ and $\epsilon > 0$, in particular bounds of the form $P\bigl(\sigma_{\min}(\widetilde X) \leq \epsilon\bigr) < f(n)\, \epsilon^{1-\delta}$ for any $\delta > 0$.
As we are going to show, this behaviour is not a characteristic of the gaussian case. Indeed, we can prove a lower bound for the cumulative distribution function of the least singular value $\sigma_{\min}(\widetilde X)$ of a square random matrix, of every fixed dimension $n$, in the general setting of i.i.d. rows. We do not make any additional assumption on the distribution of the rows, which may have unbounded moments or even admit neither a continuous density function nor a discrete one.

Under this assumption alone we can prove our main results:
• $\displaystyle \liminf_{\epsilon \to 0^+} \frac{P\bigl(\sigma_{\min}(\widetilde X) < \epsilon\bigr)}{\epsilon} > 0$;
• $\displaystyle E\Bigl[\frac{1}{\sigma_{\min}(\widetilde X)}\Bigr] = E\bigl[\|\widetilde X^{-1}\|\bigr] = +\infty$;
• $E\bigl[\kappa(\widetilde X)\bigr] = E\bigl[\|\widetilde X\|\,\|\widetilde X^{-1}\|\bigr] = +\infty$.

The first item generalizes the behaviour of the least singular value of i.i.d. real gaussian entries. The last item generalizes the result by Kostlan on the average condition number. Of course, $\|\cdot\|$ can be any matrix norm and the results are still valid for matrices with i.i.d. columns instead of rows.

Moreover, in the cases of a random matrix described by [1, 20], we get additional results by combining our lower bound with their upper bounds. We prove that the probability of $\sigma_{\min}(\widetilde X) \in [0, \epsilon)$ grows linearly with $\epsilon$ in a neighbourhood of 0, and we prove an interesting property of the moments of the condition number, showing that the isotropic log-concave distribution is among the best behaved in terms of conditioning.

Of course, our results are trivial if $P\bigl(\sigma_{\min}(\widetilde X) = 0\bigr) > 0$.
In particular, our results are trivial in the discrete case, which was widely studied in [2], [14], [15], [16]. Indeed, if $\widetilde X$ is a square random matrix with i.i.d. rows $X_1, \ldots, X_n$ assuming some value $x$ with positive probability, then
\[ P\bigl(\sigma_{\min}(\widetilde X) = 0\bigr) \geq P\bigl(X_1 = X_2\bigr) \geq P\bigl(X_1 = x\bigr)^2 > 0. \]
It is also easy to see that, relaxing our only hypothesis, for example taking shifted random matrices (matrices given by the sum of a random matrix with independent entries and a deterministic one), our results may not hold true. Indeed, if
\[ \widetilde X = 3I + \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix}, \]
where $M_{11}, M_{12}, M_{21}, M_{22}$ are i.i.d. and such that $M_{ij} \in (-1, 1)$ a.s., then
$\sigma_{\min}(\widetilde X) > 1$ a.s., so that the probability of $\sigma_{\min}(\widetilde X)$ being smaller than $\epsilon$ vanishes for every $\epsilon \leq 1$. For such shifted random matrices there exist estimates of the least singular value which generalize the ones of [10]; see, for instance, [12] and [17].

Therefore, it remains as a new open question to find the minimal hypothesis on the random matrix $\widetilde X$ such that our results hold. As it will be clear in section 4, our techniques are ineffective in the case of rectangular random matrices, where some estimates for the distribution of the least singular values have been found in [6], [7] and [11].

2 Notation

Given a vector $x \in \mathbb{R}^n$ and a square matrix $A \in \mathbb{R}^{n \times n}$, we introduce the usual vector and operator $p$-norms, $p \in \mathbb{N} \cup \{+\infty\}$,
\[ \|x\|_p = \sqrt[p]{\sum_{i=1}^n |x(i)|^p}, \quad \forall\, p \in \mathbb{N}, \qquad \|x\|_\infty = \max_{i=1,\ldots,n} |x(i)|, \qquad \|A\|_p = \max_{\|x\|_p = 1} \|Ax\|_p. \]
In particular, if we denote the rows of the matrix $A$ by $A_1, \ldots, A_n$, we also have
\[ \|A\|_\infty = \max_{i=1,\ldots,n} \|A_i\|_1. \]
Moreover, if we denote by $\sigma_{\min}(A)$ and $\sigma_{\max}(A)$ the smallest and the largest singular value of $A$ respectively, that is, the square roots of the smallest and the largest eigenvalue of $A^T A$, then we have
\[ \sigma_{\max}(A) = \|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2, \qquad \sigma_{\min}(A) = \min_{\|x\|_2 = 1} \|Ax\|_2 \]
and, if $A$ is invertible,
\[ \sigma_{\min}(A) = \frac{1}{\|A^{-1}\|_2}. \]
Finally, the condition number of $A$ in the matrix norm $\|\cdot\|$ on $\mathbb{R}^{n \times n}$ is
\[ \kappa(A) = \begin{cases} \|A\|\,\|A^{-1}\|, & \text{if } A \text{ is invertible}, \\ +\infty, & \text{otherwise.} \end{cases} \]
The condition number depends on the choice of the matrix norm, but different condition numbers are always pairwise equivalent thanks to the pairwise equivalence of the norms.
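For example (an elementary check of the definitions, added only as an illustration): if $A = \operatorname{diag}(a_1, \ldots, a_n)$ with all $a_i \neq 0$, then $A^T A = \operatorname{diag}(a_1^2, \ldots, a_n^2)$, so
\[ \sigma_{\max}(A) = \max_i |a_i|, \qquad \sigma_{\min}(A) = \min_i |a_i|, \qquad \kappa_2(A) = \|A\|_2 \, \|A^{-1}\|_2 = \frac{\max_i |a_i|}{\min_i |a_i|}, \]
which shows how $\kappa$ blows up as soon as one singular value approaches zero while another stays bounded away from it.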
3 Moulds of a random vector

Our results are based on the introduction of moulds, whose definition is motivated by the following lemma about the expectation of a positive random variable.

Lemma 3.1.
Let $W$ be a positive random variable such that
\[ \liminf_{t \to +\infty} \bigl(1 - P(W \leq t)\bigr)\, t = q > 0. \]
Then $E[W] = +\infty$.

Proof. By assumption, there exists $T > 0$ such that
\[ 1 - P(W \leq t) \geq \frac{q}{2t}, \quad \forall\, t > T, \]
otherwise we could find a sequence $t_k \to \infty$ such that $\lim_{k \to +\infty} \bigl(1 - P(W \leq t_k)\bigr)\, t_k \leq q/2 < q$. Then
\[ E[W] = \int_0^\infty P(W > t)\, dt \geq \int_T^\infty P(W > t)\, dt \geq \int_T^\infty \frac{q}{2t}\, dt = \infty. \]
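As a quick illustration of the lemma (an elementary example, not needed for what follows): take $W = 1/U$ with $U$ uniform on $(0,1)$. Then, for $t \geq 1$,
\[ P(W \leq t) = P(U \geq 1/t) = 1 - \frac{1}{t}, \qquad \liminf_{t \to +\infty} \bigl(1 - P(W \leq t)\bigr)\, t = 1 > 0, \]
and indeed $E[W] = \int_0^1 u^{-1}\, du = +\infty$.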
Motivated by this lemma, we introduce our main definition.

Definition 3.2.
Let $X$ be a random vector in $\mathbb{R}^n$. For every integer number $m \geq 1$, the $m$-dimensional mould of $X$, denoted by $C_m(X)$, is the set of all $x \in \mathbb{R}^n$ such that
\[ \liminf_{\epsilon \to 0^+} \frac{P\bigl(\|X - x\| < \epsilon\bigr)}{\epsilon^m} > 0. \]

Of course, every mould $C_m(X)$ depends only on the distribution of the random vector and, moreover, it does not change if we replace the euclidean norm in the definition with any other one.
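Two simple examples may help fix ideas (both are immediate consequences of the definition, stated here only for illustration). If $P(X = x_0) > 0$ for some atom $x_0$, then $x_0 \in C_m(X)$ for every $m$, since the numerator stays bounded away from zero. If instead $X$ has a density $f$ on $\mathbb{R}^n$ which is continuous and strictly positive at a point $x$, then
\[ P\bigl(\|X - x\| < \epsilon\bigr) = \int_{B(x,\epsilon)} f(y)\, dy \sim f(x)\, \omega_n\, \epsilon^n \quad (\epsilon \to 0^+), \]
where $\omega_n$ is the volume of the unit ball, so $x \in C_n(X)$; if moreover $f$ is bounded, the same estimate shows that $C_m(X) = \emptyset$ for every $m < n$.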
Then we can immediately prove the following important feature of moulds.

Theorem 3.3. Let $X$ be a random vector in $\mathbb{R}^n$ and let $x$ be a point in $C_m(X)$, $m \geq 1$. Then
\[ E\Bigl[\frac{1}{\|X - x\|^m}\Bigr] = +\infty. \]

Proof.
By definition of mould, we know that
\[ \liminf_{\epsilon \to 0^+} \frac{P\bigl(\|X - x\| < \epsilon\bigr)}{\epsilon^m} > 0. \]
Then, after the change of variable $\epsilon = \sqrt[m]{1/t}$, we have
\[ \liminf_{t \to +\infty} P\Bigl(\frac{1}{\|X - x\|^m} > t\Bigr)\, t > 0 \;\Longrightarrow\; \liminf_{t \to +\infty} \Bigl(1 - P\Bigl(\frac{1}{\|X - x\|^m} \leq t\Bigr)\Bigr)\, t > 0. \]
Finally, thanks to the previous lemma 3.1, this is enough to get $E\bigl[\|X - x\|^{-m}\bigr] = +\infty$.

In order to usefully apply such a theorem, we need to explore some other features of moulds. First of all, moulds form a sequence of sets that obviously grows with the index:
\[ C_\ell(X) \subseteq C_m(X), \quad \forall\, \ell \leq m. \tag{1} \]
Moreover, in order to compute the liminf in the definition of moulds, it is enough to compute the liminf along the sequence $\epsilon_k = 1/k$.

Proposition 3.4.
Let $X$ be a random vector in $\mathbb{R}^n$, $x$ be a point in $\mathbb{R}^n$, $m \geq 1$. Then
\[ \liminf_{\epsilon \to 0^+} \frac{P\bigl(\|X - x\| < \epsilon\bigr)}{\epsilon^m} = \liminf_{k \to \infty} \frac{P\bigl(\|X - x\| < 1/k\bigr)}{(1/k)^m}. \]

Proof. For a given $m$, if we set
\[ f_m(x) = \liminf_{k \to \infty} \frac{P\bigl(\|X - x\| < 1/k\bigr)}{(1/k)^m}, \]
then it is enough to show that the liminf computed along any other sequence $\epsilon_j \downarrow 0$ is at least $f_m(x)$. Thus, given $\epsilon_j \downarrow 0$,
let us consider the integer part of $1/\epsilon_j$,
\[ k_j = \Bigl[\frac{1}{\epsilon_j}\Bigr], \]
so that $k_j \uparrow \infty$ and, eventually, $k_j \in \mathbb{N}$ and
\[ \frac{1}{k_j + 1} < \epsilon_j \leq \frac{1}{k_j}. \]
Then
\[ \liminf_{j \to \infty} \frac{P\bigl(\|X - x\| < \epsilon_j\bigr)}{\epsilon_j^m} \geq \liminf_{j \to \infty} \frac{P\bigl(\|X - x\| < \frac{1}{k_j + 1}\bigr)}{\bigl(\frac{1}{k_j + 1}\bigr)^m} \cdot \frac{k_j^m}{(k_j + 1)^m} \geq f_m(x). \]

Thus every $m$-dimensional mould $C_m(X)$ is a Borel subset of $\mathbb{R}^n$, but in general it could be empty.
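For an example of an empty mould (a direct check from the definition, added only as an illustration): if $X$ is uniform on the closed unit square of $\mathbb{R}^2$, then $P\bigl(\|X - x\| < \epsilon\bigr) \leq \pi \epsilon^2$ for every $x$, so that $P\bigl(\|X - x\| < \epsilon\bigr)/\epsilon \to 0$ and $C_1(X) = \emptyset$, while $C_2(X)$ contains the whole square.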
Anyway, an important result holds for $m = n$.

Theorem 3.5. Let $X$ be a random vector in $\mathbb{R}^n$. Then $P\bigl(X \in C_n(X)\bigr) = 1$.

Proof. In order to prove the theorem, it is enough to prove that, if a compact set $K$ occurs with positive probability, i.e. $P(X \in K) > 0$, then
$K$ contains at least one point $x$ from the mould $C_n(X)$. Indeed, this would immediately imply that $P(X \in K) = 0$ for every compact set $K \subseteq C_n(X)^c$ and, by the properties of a probability measure on the Borel sets of a metric space,
\[ P(X \in B) = \sup_{\substack{K \subseteq B \\ K \text{ compact}}} P(X \in K) = 0 \]
for every Borel set $B \subseteq C_n(X)^c$, and hence the thesis of the theorem for $B = C_n(X)^c$.

So, let $K$ be a compact set such that $P(X \in K) = p > 0$ and let $C_0$ be a closed ball with radius $R$ in infinity norm (namely, a hypercube of $\mathbb{R}^n$) containing $K$. Let $\{c_1, c_2, \ldots, c_i, \ldots, c_{2^n}\}$ be the cover of $C_0$ obtained by splitting $C_0$ into $2^n$ identical closed hypercubes (each one of them with radius $R/2$); then there exists $i$ such that
\[ P\bigl(X \in K \cap c_i\bigr) \geq \frac{p}{2^n}. \]
Let us call $C_1$ the hypercube $c_i$ with this property, which obviously implies $K \cap C_1 \neq \emptyset$.

Since $C_1$ is a compact hypercube too, we can iterate this process in order to find a sequence of compact sets $C_j$ such that
• $C_j \supset C_\ell$ for every $j < \ell$,
• $\operatorname{radius}_\infty(C_j) = R/2^j$,
• $K \cap C_j \neq \emptyset$,
• $P\bigl(X \in K \cap C_j\bigr) \geq p/2^{jn}$.
Now, the Axiom of Choice allows us to find a sequence $\{a_j\}_j \subset \mathbb{R}^n$ with $a_j \in C_j \cap K$. Furthermore,
(i) $a_j$ is a Cauchy sequence: $\|a_j - a_\ell\|_\infty \leq R/2^{j-1}$ for all $\ell \geq j$,
(ii) $a_\ell$ belongs to $K \cap C_j$ for all $\ell \geq j$,
(iii) $a_j \to a$, where $a$ belongs to $K \cap C_j$ for all $j$,
(iv) $C_{j+1} \subseteq \bigl\{ x : \|x - a\|_\infty \leq R/2^j \bigr\}$ for all $j$.
Thus we have found a point $a$ which belongs to $K$ and such that, for every $j$,
\[ P\Bigl(\|X - a\|_\infty \leq \frac{R}{2^j}\Bigr) \geq P\bigl(X \in C_{j+1}\bigr) \geq P\bigl(X \in C_{j+1} \cap K\bigr) \geq \frac{p}{2^{(j+1)n}}. \]
Finally, thanks to this inequality, we can conclude the proof by showing that $a$ also belongs to the mould $C_n(X)$. Indeed, given $0 < \epsilon < R/2$,
if we consider the integer part of $\log_2(R/\epsilon)$,
\[ j(\epsilon) = \Bigl[\log_2 \frac{R}{\epsilon}\Bigr] \in \mathbb{N}, \]
then we have
\[ \frac{R}{2^{j(\epsilon)+1}} < \epsilon \leq \frac{R}{2^{j(\epsilon)}}, \]
and therefore
\[ P\bigl(\|X - a\|_\infty < \epsilon\bigr) \geq P\Bigl(\|X - a\|_\infty \leq \frac{R}{2^{j(\epsilon)+1}}\Bigr) \geq \frac{p}{2^{(j(\epsilon)+2)n}} \geq \frac{p\, \epsilon^n}{2^{2n} R^n}. \]
This implies
\[ \liminf_{\epsilon \to 0^+} \frac{P\bigl(\|X - a\|_\infty < \epsilon\bigr)}{\epsilon^n} \geq \frac{p}{2^{2n} R^n} > 0. \]

Thus, every random vector $X$ in $\mathbb{R}^n$ takes values almost surely in its $n$-dimensional mould $C_n(X)$. In particular, $C_n(X)$ cannot be empty. Depending on the distribution of $X$, such a property can be extended also to lower $m$-dimensional moulds $C_m(X)$.

Proposition 3.6.
Let $X$ be a random vector in $\mathbb{R}^n$ such that $X \in B$ a.s., $B$ being a Borel subset of $\mathbb{R}^n$. Suppose that there exist a measurable function $d : B \to \mathbb{R}^m$ and a number $c > 0$ such that
\[ \|d(x) - d(y)\| \geq c\, \|x - y\|, \quad \forall\, x, y \in B. \]
Then $P\bigl(X \in C_m(X)\bigr) = 1$.

Of course, the choice of the norms in the proposition does not matter.
Proof.
By applying theorem 3.5 to the random vector $d(X)$, we immediately get
\[ 1 = P\bigl(d(X) \in C_m(d(X))\bigr) = P\bigl(X \in d^{-1}\bigl(C_m(d(X))\bigr)\bigr). \]
So we only need to prove that $d^{-1}\bigl(C_m(d(X))\bigr) \subseteq C_m(X)$ in order to get the desired result. By hypothesis, for every $\epsilon > 0$ and every $x \in B$ we have
\[ \bigl(\|X - x\| < \epsilon\bigr) \supseteq \bigl(\|d(X) - d(x)\| < c\epsilon\bigr). \]
Then, for every $x \in d^{-1}\bigl(C_m(d(X))\bigr)$, we have
\[ \liminf_{\epsilon \to 0^+} \frac{P\bigl(\|X - x\| < \epsilon\bigr)}{\epsilon^m} \geq \liminf_{\epsilon \to 0^+} \frac{P\bigl(\|d(X) - d(x)\| < c\epsilon\bigr)}{\epsilon^m} > 0, \]
so that $x \in C_m(X)$. This shows that $d^{-1}\bigl(C_m(d(X))\bigr) \subseteq C_m(X)$ and completes the proof.

For example, proposition 3.6 immediately implies that $P\bigl(X \in C_m(X)\bigr) = 1$ if $X$ takes values almost surely in some $m$-dimensional linear subspace of $\mathbb{R}^n$.

4 $n$ i.i.d. $n$-dimensional random vectors

In order to prove our results about the least singular value and the condition number of a square random matrix, we first have to introduce a peculiar property of an $n$-uple of i.i.d. $n$-dimensional random vectors satisfying the following assumption. It is crucial that the number of vectors coincides with the dimension of the space, which is the reason why our results do not extend to rectangular matrices.

Assumption 4.1.
We say that $X_1, \ldots, X_n$ satisfy assumption 4.1 if they are i.i.d. random vectors in $\mathbb{R}^n$ such that $X_1, \ldots, X_{n-1}$ are linearly independent a.s. ($n \geq 2$).

For example, assumption 4.1 is satisfied by $n$ i.i.d. random vectors with an absolutely continuous distribution in $\mathbb{R}^n$.

In order to state the peculiar property holding under this assumption, we need, for $n \geq 2$,
2, thegeneralized cross product of n − R n , that is ∧ : R ( n − × n → R n , ∧ ( x , . . . , x n − ) = det e · · · e n x (1) · · · x ( n )... · · · ... x n − (1) · · · x n − ( n ) where e i is the i -th element of the canonical basis of R n . Its properties generalize the features of the R cross product:(i) ∧ ( x , . . . , x n − ) is orthogonal to the vector space spanned by x , . . . , x n − ,(ii) k ∧ ( x , . . . , x n − ) k = 0 ⇐⇒ x , . . . , x n − are linearly dependent.Finally we can state the above mentioned property, the main result of this section. Theorem 4.2.
Finally we can state the above mentioned property, the main result of this section.

Theorem 4.2. Let $X_1, \ldots, X_n$ be random vectors satisfying assumption 4.1. Let
\[ Y = \frac{\wedge(X_1, \ldots, X_{n-1})}{\|\wedge(X_1, \ldots, X_{n-1})\|_\infty}. \]
Then
\[ 0 \in C_1(X_n \cdot Y), \qquad E\Bigl[\frac{1}{|X_n \cdot Y|}\Bigr] = +\infty. \]

The proof of theorem 4.2 takes the whole section and, of course, it relies on the introduction of moulds and their basic properties. First of all, let us remark that $\|\wedge(X_1, \ldots, X_{n-1})\| \neq 0$ a.s. because of assumption 4.1, and so the random vector $Y$ is well defined. Furthermore, the vector $\wedge(X_1, \ldots, X_{n-1})$ is a.s. orthogonal to $X_j$ for all $j = 1, \ldots, n-1$,
and it is stochastically independent of $X_n$.

Remark 4.3. The vector $Y$ introduced in theorem 4.2 is similar to the ones introduced in [8] (thm 1.2), [19] (pages 6-7) and [1] (proof of proposition 2.10). In these three cases it is defined as the vector orthogonal to the hyperplane spanned by a set of $n-1$ rows, and it is normalized with respect to the euclidean norm instead of the infinity norm. In particular, [1] manages to arrive at estimates complementary to ours: namely, in that article it is proved (for the isotropic log-concave ensemble, see section 5.2.1) that $P\bigl(|X_n \cdot Y| < \epsilon\bigr) < C\epsilon$, while we prove that $P\bigl(|X_n \cdot Y| < \epsilon\bigr) > c\epsilon$ for any distribution of $X_n$ and positive $\epsilon$ sufficiently small. (In [1] the constant depends only on the dimension of the matrix and is universal for every isotropic log-concave distribution, while in our case the constant is different for every random matrix considered.)

We begin with the following property of the $(n-1)$-dimensional mould of $Y$.

Proposition 4.4.
Let $X_1, \ldots, X_n$ be random vectors satisfying assumption 4.1. Let
\[ Y = \frac{\wedge(X_1, \ldots, X_{n-1})}{\|\wedge(X_1, \ldots, X_{n-1})\|_\infty}. \]
Then $Y \in C_{n-1}(Y)$ a.s.

Proof. By construction, the random vector $Y$ belongs to
\[ S^{n-1}_\infty = \bigl\{ v \in \mathbb{R}^n : \|v\|_\infty = 1 \bigr\} \quad \text{a.s.} \]
Since there exists a measurable dilation $d : S^{n-1}_\infty \to \mathbb{R}^{n-1}$, the thesis follows immediately from proposition 3.6.

The next step is to study the special case of bounded random vectors $X_1, \ldots, X_n$, where we can prove the desired results by showing a link between the $(n-1)$-dimensional mould of $Y$ and the properties of $X_n$.

Proposition 4.5.
Let $X_1, \ldots, X_n$ be random vectors satisfying assumption 4.1 and, moreover, let them be bounded. Let
\[ Y = \frac{\wedge(X_1, \ldots, X_{n-1})}{\|\wedge(X_1, \ldots, X_{n-1})\|_\infty}. \]
Then
1. $y \in C_{n-1}(Y) \Longrightarrow 0 \in C_1(X_n \cdot y)$,
2. $0 \in C_1(X_n \cdot Y)$,
3. $E\bigl[\frac{1}{|X_n \cdot Y|}\bigr] = +\infty$.

Proof. We prove the proposition thesis by thesis.

1. Since $X_1, \ldots, X_n$ are i.i.d., for every $y \in \mathbb{R}^n$ and for every $\epsilon > 0$,
\[ P\bigl(|X_n \cdot y| < \epsilon\bigr) = \sqrt[n-1]{P\Bigl(\bigcap_{j=1}^{n-1} \bigl(|X_j \cdot y| < \epsilon\bigr)\Bigr)}. \]
Now, let us take $r > 0$ such that $\|X_j\|_1 < r$ a.s., and let us denote by $\widehat X$ the $\mathbb{R}^{(n-1) \times n}$ random matrix with rows $X_j$, $1 \leq j \leq n-1$. Since $\widehat X Y = 0$ and $\|\widehat X\|_\infty = \max_j \|X_j\|_1 < r$ a.s.,
\[ \bigcap_{j=1}^{n-1} \bigl(|X_j \cdot y| < \epsilon\bigr) = \bigl(\|\widehat X y\|_\infty < \epsilon\bigr) = \bigl(\|\widehat X (y - Y)\|_\infty < \epsilon\bigr) \supseteq \bigl(\|\widehat X\|_\infty \|y - Y\|_\infty < \epsilon\bigr) \supseteq \bigl(r \|y - Y\|_\infty < \epsilon\bigr), \]
so that
\[ \liminf_{\epsilon \to 0^+} \frac{P\bigl(|X_n \cdot y| < \epsilon\bigr)}{\epsilon} \geq \liminf_{\epsilon \to 0^+} \sqrt[n-1]{\frac{P\bigl(\|Y - y\|_\infty < \epsilon/r\bigr)}{\epsilon^{n-1}}}. \]
Therefore $0 \in C_1(X_n \cdot y)$ for every $y \in C_{n-1}(Y)$.

2. Let us consider the following measurable functions of $y \in \mathbb{R}^n$:
\[ \varphi_k(y) = P\bigl(|X_n \cdot y| < 1/k\bigr), \quad k \in \mathbb{N}, \qquad f(0\,|\,y) = \liminf_{k \to \infty} k\, \varphi_k(y). \]
Then, by the previous point and by proposition 3.4, for every $y \in C_{n-1}(Y)$ we have $f(0\,|\,y) > 0$ and there exists $k_0(y) \in \mathbb{N}$ such that $k\,\varphi_k(y) \geq f(0\,|\,y)/2 > 0$ for every $k \geq k_0(y)$. Thus, if we consider
\[ B_m = \Bigl\{ y \in C_{n-1}(Y) : k\,\varphi_k(y) \geq \frac{1}{m}\ \forall\, k \geq m \Bigr\}, \quad m \in \mathbb{N}, \]
we get a sequence of Borel sets in $\mathbb{R}^n$ growing to $C_{n-1}(Y)$. Indeed, for every $m \geq 1$, $B_m \subseteq B_{m+1} \subseteq \cup_\ell B_\ell \subseteq C_{n-1}(Y)$, obviously, but we also have the opposite inclusion $C_{n-1}(Y) \subseteq \cup_\ell B_\ell$ because, taken any $y \in C_{n-1}(Y)$, every $m$ with $m \geq k_0(y)$ and $1/m \leq f(0\,|\,y)/2$ gives $k\,\varphi_k(y) \geq 1/m$ for every $k \geq m$, that is, $y \in B_m$.

By monotonicity, this implies that $P(Y \in B_m) \to P\bigl(Y \in C_{n-1}(Y)\bigr)$, which equals 1 by proposition 4.4, so that there exists $m^\star$ such that $P(Y \in B_{m^\star}) \geq 1/2$. Moreover, for every $k$,
\[ P\bigl(|X_n \cdot Y| < 1/k\bigr) = E\Bigl[\, E\bigl[\, I_{[0,1/k)}\bigl(|X_n \cdot Y|\bigr) \,\big|\, Y \,\bigr]\Bigr] \]
and, thanks to the freezing lemma, which we can apply due to the independence of $X_n$ and $Y$,
\[ E\Bigl[\, E\bigl[\, I_{[0,1/k)}\bigl(|X_n \cdot Y|\bigr) \,\big|\, Y \,\bigr]\Bigr] = E\bigl[\varphi_k(Y)\bigr]. \]
Then proposition 3.4 allows us to conclude: for every $k \geq m^\star$,
\[ k\, P\bigl(|X_n \cdot Y| < 1/k\bigr) = k\, E\Bigl[\, E\bigl[\, I_{[0,1/k)}\bigl(|X_n \cdot Y|\bigr) \,\big|\, Y \,\bigr]\Bigr] = k\, E\bigl[\varphi_k(Y)\bigr] \geq k\, E\bigl[\varphi_k(Y)\, I_{B_{m^\star}}(Y)\bigr] \geq \frac{1}{m^\star}\, P\bigl(Y \in B_{m^\star}\bigr) > 0. \]
3. Thesis 3 follows immediately from thesis 2 thanks to theorem 3.3.

Finally we can prove theorem 4.2.

Proof of theorem 4.2.
The result is already proved for bounded random vectors thanks to proposition 4.5. Then, taken $\rho > 0$ such that the event
\[ E_\rho = \bigcap_{i=1}^n \bigl(\|X_i\| < \rho\bigr) \]
has positive probability, it is enough to consider the conditional probability $P_\rho(\cdot) = P(\cdot \,|\, E_\rho)$. Indeed, for every $\epsilon > 0$,
\[ \frac{P\bigl(|X_n \cdot Y| < \epsilon\bigr)}{\epsilon} \geq \frac{P_\rho\bigl(|X_n \cdot Y| < \epsilon\bigr)}{\epsilon}\, P(E_\rho), \]
where the right hand side has a strictly positive liminf as $\epsilon \to 0^+$ by proposition 4.5, as the random vectors $X_1, \ldots, X_n$ are bounded under $P_\rho$ and it is a straightforward verification that they are also $P_\rho$-i.i.d. and still satisfy assumption 4.1.

Therefore $0 \in C_1(X_n \cdot Y)$ and the full thesis immediately follows thanks to theorem 3.3.

5 Results for $\sigma_{\min}(\widetilde X)$

Thanks to the introduction of the definition of moulds for a random vector (section 3) and thanks to the properties deduced for an $n$-uple of i.i.d. random vectors in $\mathbb{R}^n$ (section 4), we can finally come to our main results. Let us start with the least singular value.

5.1 Main result for $\sigma_{\min}(\widetilde X)$

Theorem 5.1.
Let $\widetilde X$ be a square random matrix with i.i.d. rows. Then $0 \in C_1\bigl(\sigma_{\min}(\widetilde X)\bigr)$, i.e.
\[ \liminf_{\epsilon \to 0^+} \frac{P\bigl(\sigma_{\min}(\widetilde X) < \epsilon\bigr)}{\epsilon} > 0, \]
and, if $\widetilde X$ is invertible almost surely,
\[ E\Bigl[\frac{1}{\sigma_{\min}(\widetilde X)}\Bigr] = E\bigl[\|\widetilde X^{-1}\|\bigr] = +\infty. \]

Proof.
If the random matrix $\widetilde X$ is singular with positive probability, the thesis is trivial. Otherwise its rows $X_1, \ldots, X_n$ satisfy assumption 4.1 and we can consider the random vector $Y$ of theorem 4.2. Then it is enough to observe that, since $\|Y\|_\infty = 1$ and so $\|Y\|_2 \geq 1$, and since $\widetilde X Y$ has all components equal to zero except the last one, which equals $X_n \cdot Y$,
\[ \bigl(\sigma_{\min}(\widetilde X) < \epsilon\bigr) = \Bigl(\min_{\|y\|_2 = 1} \|\widetilde X y\|_2 < \epsilon\Bigr) \supseteq \Bigl(\frac{\|\widetilde X Y\|_2}{\|Y\|_2} < \epsilon\Bigr) \supseteq \bigl(\|\widetilde X Y\|_2 < \epsilon\bigr) = \bigl(|X_n \cdot Y| < \epsilon\bigr), \]
to deduce
\[ \liminf_{\epsilon \to 0^+} \frac{P\bigl(\sigma_{\min}(\widetilde X) < \epsilon\bigr)}{\epsilon} \geq \liminf_{\epsilon \to 0^+} \frac{P\bigl(|X_n \cdot Y| < \epsilon\bigr)}{\epsilon} > 0. \]
The full thesis then follows thanks to theorem 3.3.

Since the least singular value is invariant under transposition, the theorem holds for matrices with i.i.d. columns, too.

5.2 Additional results for $\sigma_{\min}(\widetilde X)$ for some well known ensembles

After finding a lower bound of the form $k\epsilon$ for the probability that the least singular value $\sigma_{\min}$ of a square random matrix with generic i.i.d. rows is smaller than $\epsilon$, it is natural to ask if this estimate can be improved for particular random matrix ensembles. Of course, if $\widetilde X$ is a random matrix with i.i.d. discrete rows, then $P\bigl(\sigma_{\min}(\widetilde X) = 0\bigr) > 0$, so that the cumulative distribution function of $\sigma_{\min}$ stays bounded away from zero in the neighbourhood of 0.

5.2.1 Isotropic log-concave rows

A random vector has a log-concave distribution if, for every $\lambda \in (0,1)$, denoting by $f$ its density function, we have
\[ f\bigl(\lambda x + (1-\lambda) y\bigr) \geq f(x)^{\lambda} f(y)^{1-\lambda}. \]
A random vector is said to be isotropic if it has mean value zero and identity covariance matrix.

In [1], Adamczak et al. show (corollary 2.14) that if $\widetilde X$ is a square random matrix of dimension $n$ with i.i.d. rows drawn from an isotropic log-concave distribution, then
\[ \forall\, \epsilon \in (0,1),\ \forall\, \delta > 0,\ \exists\, C_\delta : \quad P\bigl(\sigma_{\min}(\widetilde X) < n^{-1/2}\epsilon\bigr) < \epsilon^{1-\delta}\, C_\delta. \]
If the matrix is larger than a fixed dimension $n_0$, we can even choose $\delta = 0$ in the previous estimate, as proved by Tikhomirov in [20] (corollary 1.4), obtaining
\[ P\bigl(\sigma_{\min}(\widetilde X) < n^{-1/2}\, t\bigr) < Ct, \quad \forall\, t > 0. \]
The dimension $n_0$ is universal, in the sense that it is independent of the isotropic log-concave distribution, and $C$ is a universal constant independent both of the isotropic log-concave distribution and of the dimension $n > n_0$.
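Standard examples of isotropic log-concave distributions (well-known facts, recalled here only to illustrate the scope of the assumption) are the standard gaussian distribution on $\mathbb{R}^n$ and the uniform distribution on any convex body, once the body is translated and linearly transformed so that the resulting vector has mean zero and identity covariance. Corollary 5.2 below therefore applies, for instance, to matrices whose rows are drawn uniformly from such a normalized convex body.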
Summing up our result and the ones of [1] and [20], we get the following corollary.

Corollary 5.2. Let $\widetilde X$ be a random matrix with i.i.d. rows drawn from an isotropic log-concave distribution. Then, for every $\delta > 0$ there exist $0 < k_1 < k_2$ such that
\[ k_1 \epsilon < P\bigl(\sigma_{\min}(\widetilde X) < \epsilon\bigr) < k_2\, \epsilon^{1-\delta} \]
(where $k_2 = C_\delta \sqrt{n}$ and $C_\delta$ only depends on $\delta$) holds for positive $\epsilon$ sufficiently small. Moreover, there exists a universal constant $n_0$ such that, if the size of $\widetilde X$ is greater than $n_0$, then
\[ k_1 \epsilon < P\bigl(\sigma_{\min}(\widetilde X) < \epsilon\bigr) < k_2\, \epsilon \]
(where $k_2 = C\sqrt{n}$ and $C$ is a universal constant) holds for positive $\epsilon$ sufficiently small.

5.2.2 Continuous entries with density bounded by $L$ (large $n$)

Tikhomirov in [20] proved (corollary 1.3) that for any
$L > 0$ there exist $v(L) > 0$ and $n_0 \in \mathbb{N}$ such that, for all matrices $\widetilde X$ of dimension $n > n_0$ with i.i.d. continuous entries $X_{ij}$ with density $f$ such that
\[ E[X_{ij}] = 0, \qquad E[X_{ij}^2] = 1, \qquad \sup_{x \in \mathbb{R}} f(x) < L, \]
we have
\[ P\bigl(\sigma_{\min}(\widetilde X) < n^{-1/2}\, t\bigr) < v(L)\, t, \quad \forall\, t > 0. \]
Summing up with theorem 5.1, we have that even in this case the probability of the least singular value being small is a first order infinitesimal when the matrix is big enough.

Corollary 5.3.
Let $\widetilde X$ be an $n \times n$ ($n > n_0$, a universal constant) random matrix with i.i.d. continuous entries of mean zero and unit variance whose density function is bounded. Then there exist $0 < v_1 < v_2$ such that
\[ v_1 \epsilon < P\bigl(\sigma_{\min}(\widetilde X) < \epsilon\bigr) < v_2\, \epsilon \]
holds for positive $\epsilon$ sufficiently small.
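For instance (an immediate special case, added only as an illustration): i.i.d. standard gaussian entries have mean zero, unit variance and density bounded by $1/\sqrt{2\pi}$, so for $n > n_0$ corollary 5.3 applies to the gaussian case as well, consistently with the linear behaviour of $P\bigl(\sigma_{\min}(\widetilde X) < \epsilon\bigr)$ near 0 that follows from Edelman's exact computations [3].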
6 Results for $\kappa(\widetilde X)$

Last but not least, the condition number.

6.1 Main result for $\kappa(\widetilde X)$

Theorem 6.1. Let $\widetilde X$ be a square random matrix with i.i.d. rows. Then, for every choice of the matrix norm,
\[ E\bigl[\kappa(\widetilde X)\bigr] = +\infty. \]

Proof.
If the random matrix $\widetilde X$ is singular with positive probability, the thesis is trivial. Otherwise, when $\widetilde X$ is invertible a.s., it is enough to prove the theorem for the operator norm induced by the infinity norm of $\mathbb{R}^n$, as condition numbers are pairwise equivalent under a change of the matrix norm. We prove the theorem in two steps, first for rows $X_1, \ldots, X_n$ bounded from below, then for the general case of $\widetilde X$ invertible a.s.

1. If $\|X_1\|_1 > \rho$ a.s. for some $\rho > 0$,
then the thesis immediately follows. Indeed, such a condition gives
\[ \|\widetilde X\|_\infty = \max_i \|X_i\|_1 > \rho \quad \text{a.s.} \]
and so, by theorem 5.1,
\[ E\bigl[\kappa_\infty(\widetilde X)\bigr] = E\bigl[\|\widetilde X\|_\infty \|\widetilde X^{-1}\|_\infty\bigr] > \rho\, E\bigl[\|\widetilde X^{-1}\|_\infty\bigr] = +\infty. \]
2. If $\widetilde X$ is invertible a.s., then $P\bigl(\|X_i\|_1 > 0\bigr) = 1$ and, by monotonicity, there exists $\rho > 0$ such that $P\bigl(\|X_1\|_1 > \rho\bigr) > 0$.
Thus, the event
\[ E_\rho = \bigcap_{i=1}^n \bigl(\|X_i\|_1 > \rho\bigr) \]
has positive probability and we can consider the conditional probability $P_\rho(\cdot) = P(\cdot \,|\, E_\rho)$. As $P(A) \geq P_\rho(A)\, P(E_\rho)$ for every event $A$, we also have $E[W] \geq E_\rho[W]\, P(E_\rho)$ for every random variable $W \geq 0$.
Thus
\[ E\bigl[\kappa(\widetilde X)\bigr] \geq E_\rho\bigl[\kappa(\widetilde X)\bigr]\, P(E_\rho) = +\infty \]
by step 1, as the random vectors $X_1, \ldots, X_n$ are bounded from below under $P_\rho$ and it is a straightforward verification that they are also $P_\rho$-i.i.d. and satisfy assumption 4.1.

This theorem is a generalization of [5], theorem 5.2, in which it was shown that the average condition number of a random matrix with i.i.d. gaussian entries is infinite.

Since the condition number in euclidean norm is invariant under transposition, the previous theorem holds for matrices with i.i.d. columns, too.

6.2 Additional result for $\kappa(\widetilde X)$ in the isotropic log-concave case

Again, in [1] Adamczak et al. proved an upper bound for the condition number (corollary 2.15) in the isotropic log-concave case: for every square random matrix $\widetilde X$ with $n$ columns (or rows) i.i.d. with isotropic log-concave distribution and for every $\delta > 0$,
there exists $C_\delta$ such that
\[ P\bigl(\kappa(\widetilde X) > nt\bigr) \leq C_\delta\, t^{-(1-\delta)}, \quad \forall\, t > 0. \]
This result, which bounds the probability that the condition number is large, can be merged with theorem 6.1 to prove the following corollary. The corollary shows that, under the isotropic log-concave hypothesis, $\alpha = 1$ is the least number such that $E\bigl[\kappa(\widetilde X)^\alpha\bigr] = +\infty$.

Corollary 6.2.
Let $\widetilde X$ be a square random matrix with i.i.d. rows (or columns) with isotropic log-concave distribution. Then
\[ E\bigl[\kappa(\widetilde X)^\alpha\bigr] < +\infty \iff \alpha < 1. \]

Proof.
Our theorem 6.1 proves that $\alpha \geq 1 \Rightarrow E\bigl[\kappa(\widetilde X)^\alpha\bigr] = +\infty$. So it is enough to show that $\alpha < 1 \Rightarrow E\bigl[\kappa(\widetilde X)^\alpha\bigr] < +\infty$. By the above mentioned result we have
\[ \forall\, \delta > 0,\ \exists\, C_\delta > 0 : \quad P\bigl(\kappa(\widetilde X) > t\bigr) \leq C_\delta\, t^{-(1-\delta)}, \quad \forall\, t > 0, \]
and so it follows that, for all $t > 0$ and $\alpha \in (0,1)$,
\[ P\bigl(\kappa(\widetilde X)^\alpha > t\bigr) = P\bigl(\kappa(\widetilde X) > t^{1/\alpha}\bigr) \leq C_\delta\, t^{-(1-\delta)/\alpha}. \]
Now, since $\alpha < 1$,
we can choose $\delta$ positive such that
\[ \eta = \frac{1-\delta}{\alpha} > 1. \]
This means that there exist $\eta > 1$ and $C_\delta > 0$ such that
\[ P\bigl(\kappa(\widetilde X)^\alpha > t\bigr) \leq C_\delta\, t^{-\eta}, \quad \forall\, t > 0. \]
Then
\[ E\bigl[\kappa(\widetilde X)^\alpha\bigr] = \int_0^\infty P\bigl(\kappa(\widetilde X)^\alpha > t\bigr)\, dt \leq \int_0^1 dt + \int_1^\infty C_\delta\, t^{-\eta}\, dt = 1 + \frac{C_\delta}{\eta - 1}, \]
which is finite since $\eta > 1$.

This last result shows again that, for random matrices with i.i.d. isotropic log-concave rows, our lower bound estimates of the least singular value and of the condition number are complementary to the upper bounds known from the literature: [1] and [20] give, for some $k_\delta, k > 0$,
\[ P\bigl(\sigma_{\min}(\widetilde X) < \epsilon\bigr) < \begin{cases} k_\delta\, \epsilon^{1-\delta}, & \forall\, \delta > 0,\ \forall\, 0 < \epsilon < 1, \\ k\,\epsilon, & \forall\, \epsilon > 0, \text{ if the size of } \widetilde X \text{ is large enough}, \end{cases} \]
\[ E\bigl[\kappa(\widetilde X)^\alpha\bigr] < \infty, \quad \forall\, \alpha < 1, \]
while we proved (corollaries 5.2 and 6.2) that there exist $k_1, \epsilon_0 > 0$ such that
\[ k_1 \epsilon < P\bigl(\sigma_{\min}(\widetilde X) < \epsilon\bigr) \quad \forall\, 0 < \epsilon < \epsilon_0, \qquad \alpha < 1 \iff E\bigl[\kappa(\widetilde X)^\alpha\bigr] < \infty. \]
This means that, for every random matrix with i.i.d. rows, even if they do not admit a density function or their moments are unbounded, the probability of the least singular value lying in the interval $[0, \epsilon)$ is at least of the order of $\epsilon$, and in some special cases, such as the log-concave ensembles, it is exactly of that order. Fortunately, for these distributions we can even bound the previous probability with constants that depend only on the dimension of the matrix and on some universal constants. Similarly, taking a random matrix whose rows are i.i.d., then inevitably $\kappa(\widetilde X) \notin L^1$. However, for the particular ensembles above we have a slightly weaker integrability, $\kappa(\widetilde X)^\alpha \in L^1$ for all $\alpha < 1$. As this is the best achievable integrability, this shows that the isotropic log-concave distributions are among the "nicest" ones in terms of the conditioning of a matrix with such rows.
References

[1] R. Adamczak, O. Guédon, A. E. Litvak, A. Pajor, N. Tomczak-Jaegermann,
Condition number of a square matrix with i.i.d. columns drawn from a convex body, Proceedings of the American Mathematical Society, Volume 140, Number 3 (2012), Pages 987–998.

[2] J. Bourgain, V. H. Vu, P. M. Wood,
On the singularity probability of discrete random matrices, J. Funct. Anal. 258 (2010), no. 2, 559–603. MR2557947

[3] A. Edelman,
Eigenvalues and condition numbers of random matrices, SIAM J. Matrix Anal. Appl. 9 (1988), 543–560.

[4] H. Huang, K. Tikhomirov,
Remark on the smallest singular value of powers of Gaussian matrices, arXiv:1910.03702 [math.PR] (2020).

[5] E. Kostlan,
Complexity Theory of Numerical Linear Algebra, Journal of Computational and Applied Mathematics 22 (1988).

[6] A. Litvak, A. Pajor, M. Rudelson, N. Tomczak-Jaegermann,
Smallest singular value of random matrices and geometry of random polytopes, Adv. Math. 195 (2005), no. 2, 491–523.

[7] A. E. Litvak, O. Rivasplata, Smallest singular value of sparse random matrices, Studia Math. 212 (2012), no. 3, 195–218.

[8] G. V. Livshyts, K. Tikhomirov, R. Vershynin,
The smallest singular value of inhomogeneous square random matrices,
Georgia Institute of Technology (2019).

[9] E. Rebrova, K. Tikhomirov,
Coverings of random ellipsoids, and invertibility of matrices with i.i.d. heavy-tailed entries,
Israel Journal of Mathematics, volume 227 (2018), pages 507–544.

[10] M. Rudelson, R. Vershynin,
Invertibility of random matrices: norm of the inverse, Annals of Mathematics 168 (2008), 575–600.

[11] M. Rudelson, R. Vershynin,
Smallest singular value of a random rectangular matrix, Communications on Pure and Applied Mathematics 62 (2009), 1707–1739.

[12] A. Sankar, D. A. Spielman, S.-H. Teng,
Smoothed analysis of the condition numbers and growth factors of matrices, SIAM J. Matrix Anal. Appl. 28 (2006), no. 2, 446–476 (electronic). MR2255338

[13] S. J. Szarek,
Condition numbers of random matrices, J. Complexity 7 (1991), no. 2, 131–149. MR1108773

[14] T. Tao, V. Vu,
On random ±1 matrices: singularity and determinant, Random Structures and Algorithms
28 (2006), no. 1, 1–23.

[15] T. Tao, V. Vu,
On the singularity probability of random Bernoulli matrices, J. Amer. Math. Soc. 20 (2007), 603–628.

[16] T. Tao, V. Vu,
Inverse Littlewood-Offord theorems and the condition number of random discrete matrices, Annals of Mathematics 169 (2009), 595–632.

[17] T. Tao, V. Vu,
Smooth analysis of the condition number and the least singular value, Math. Comp. 79 (2010), no. 272, 2333–2352. MR2684367

[18] T. Tao, V. Vu,
Random matrices: the distribution of the smallest singular values, Geometric and Functional Analysis, volume 20 (2010), 260–297.

[19] K. Tatarko,
An upper bound on the smallest singular value of a square random matrix,
Journal of Complexity, Volume 48 (2018), Pages 119–128.

[20] K. Tikhomirov,