A Short Proof for Gap Independence of Simultaneous Iteration
arXiv [cs.NA], May
Edo Liberty
Yahoo Research
This note provides a very short proof of a spectral-gap-independent property of the simultaneous iteration algorithm for finding the top singular space of a matrix [1, 2, 3, 4]. The proof is terse but completely self-contained and should be accessible to the linear-algebra-savvy reader.
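As a concrete illustration of the algorithm analyzed in this note, the following is a minimal NumPy sketch of simultaneous iteration. It is not the implementation from [1, 2, 3, 4]. It builds an orthonormal basis Z of span((AA^T)^t AG) for a Gaussian G and compares ||A - ZZ^T A|| with the singular value sigma_{k+1}. The function name `simultaneous_iteration`, the QR re-orthonormalization after every multiplication, the test matrix, and the choice c = 1 in t = c · log(n/ε)/ε are all assumptions made for this example.

```python
import numpy as np


def simultaneous_iteration(A, k, t, rng):
    """Return an orthonormal basis Z of span((A A^T)^t A G) for Gaussian G."""
    n, m = A.shape
    G = rng.standard_normal((m, k))
    # Re-orthonormalizing after each product keeps the same span in exact
    # arithmetic and avoids numerical collapse onto the top direction.
    Z, _ = np.linalg.qr(A @ G)
    for _ in range(t):
        W, _ = np.linalg.qr(A.T @ Z)
        Z, _ = np.linalg.qr(A @ W)
    return Z


rng = np.random.default_rng(0)
n, m, k, eps = 60, 40, 5, 0.1

# Assumed test matrix: singular values 1/sqrt(i), so there is no large
# gap between sigma_k and sigma_{k+1}.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((m, m)))
sigma = 1.0 / np.sqrt(np.arange(1, m + 1))
A = U[:, :m] @ np.diag(sigma) @ V.T

# Number of iterations from the lemma, with the constant c assumed to be 1.
t = int(np.ceil(np.log(n / eps) / eps))
Z = simultaneous_iteration(A, k, t, rng)

# Spectral-norm error of projecting A onto span(Z), relative to sigma_{k+1}.
err = np.linalg.norm(A - Z @ (Z.T @ A), 2)
ratio = err / sigma[k]  # sigma[k] is sigma_{k+1} with 0-based indexing
print(f"t = {t}, error / sigma_(k+1) = {ratio:.6f}")
```

No projection onto a k-dimensional subspace can achieve spectral error below sigma_{k+1}, so the printed ratio is at least 1. The gap-independent guarantee concerns how close to 1 it gets.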
Lemma.
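Let $A \in \mathbb{R}^{n \times m}$ be an arbitrary matrix and let $G \in \mathbb{R}^{m \times k}$ be a matrix of i.i.d. random Gaussian entries. Let $t = c \cdot \log(n/\varepsilon)/\varepsilon$ and $Z = \operatorname{span}((AA^T)^t AG)$. Then, with high probability depending only on a universal constant $c$,

$$\|A - ZZ^T A\| \le (1+\varepsilon)\,\sigma_{k+1}.$$

Here $\sigma_i$ is the $i$-th largest singular value of $A$, and $Z$ is identified with a matrix whose columns form an orthonormal basis of that span.

Proof. $\|A - ZZ^T A\| = \max_{x : \|x\| = 1} \|x^T A\|$ such that $\|x^T Z\| = 0$. Using the SVD of $A$ we change variables: $A = USV^T$, $x = Uy$ and $G' = V^T G$. Note that $G'$ is also a matrix of i.i.d. Gaussian entries because $V$ is orthogonal. Since $(AA^T)^t AG = U S^{2t+1} G'$, we get $\|A - ZZ^T A\| = \max_{y : \|y\| = 1} \|y^T S\|$ s.t. $y^T S^{2t+1} G' = 0$. We now break $y$, $S$, and $G'$ into two blocks each such that

$$y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \qquad S = \begin{pmatrix} S_1 & 0 \\ 0 & S_2 \end{pmatrix}, \qquad G' = \begin{pmatrix} G_1' \\ G_2' \end{pmatrix}$$

and $y_1 \in \mathbb{R}^{k}$, $y_2 \in \mathbb{R}^{n-k}$, $S_1 \in \mathbb{R}^{k \times k}$, $S_2 \in \mathbb{R}^{(n-k) \times (n-k)}$, $G_1' \in \mathbb{R}^{k \times k}$, and $G_2' \in \mathbb{R}^{(n-k) \times k}$. For any $1 \le i \le k$,

$$\begin{aligned}
0 = \|y^T S^{2t+1} G'\| &= \|y_1^T S_1^{2t+1} G_1' + y_2^T S_2^{2t+1} G_2'\| \\
&\ge \|y_1^T S_1^{2t+1} G_1'\| - \|y_2^T S_2^{2t+1} G_2'\| \\
&\ge \|y_1^T S_1^{2t+1}\| / \|(G_1')^{-1}\| - \|y_2^T\| \cdot \|S_2^{2t+1}\| \cdot \|G_2'\| \\
&\ge |y(i)|\, \sigma_i^{2t+1} / \|(G_1')^{-1}\| - \sigma_{k+1}^{2t+1} \cdot \|G_2'\| .
\end{aligned}$$

This gives that $|y(i)| \le (\sigma_{k+1}/\sigma_i)^{2t+1}\, \|G_2'\|\, \|(G_1')^{-1}\|$ for every $i \le k$. Equipped with this inequality we bound the expression $\|y^T S\|$. Let $1 \le k' \le k$ be such that $\sigma_{k'}^2 \ge (1+\varepsilon)\,\sigma_{k+1}^2$ and $\sigma_{k'+1}^2 < (1+\varepsilon)\,\sigma_{k+1}^2$. If no such $k'$ exists the claim is trivial.

$$\begin{aligned}
\|A - ZZ^T A\|^2 = \|y^T S\|^2 &= \sum_{i=1}^{k'} y(i)^2 \sigma_i^2 + \sum_{i=k'+1}^{n} y(i)^2 \sigma_i^2 && (1) \\
&\le \Big( \|G_2'\|^2 \|(G_1')^{-1}\|^2 \sum_{i=1}^{k'} (\sigma_{k+1}/\sigma_i)^{4t}\, \sigma_{k+1}^2 \Big) + (1+\varepsilon)\,\sigma_{k+1}^2 && (2) \\
&\le \big[ \|G_2'\|^2 \|(G_1')^{-1}\|^2\, k\, (1/(1+\varepsilon))^{2t} + (1+\varepsilon) \big]\, \sigma_{k+1}^2 \le (1+2\varepsilon)\,\sigma_{k+1}^2 && (3)
\end{aligned}$$

The last step is correct as long as $\|G_2'\|^2 \|(G_1')^{-1}\|^2\, k\, (1/(1+\varepsilon))^{2t} \le \varepsilon$, which holds for $t \ge \log(\|G_2'\|^2 \|(G_1')^{-1}\|^2\, k/\varepsilon) / (2\log(1+\varepsilon)) = O(\log(n/\varepsilon)/\varepsilon)$. The last bound uses the fact that $G_1'$ and $G_2'$ are random Gaussian due to rotational invariance of the Gaussian distribution. This means that $\|G_2'\|\, \|(G_1')^{-1}\| = O(\mathrm{poly}(n))$ with high probability [5].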
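The number of iterations in the proof is driven by the product of the norm of the lower Gaussian block and the norm of the inverse of the upper k × k block. The sketch below samples that product empirically. It is only an illustration under assumed sizes (n = 100, k = 5, 200 trials) with a hypothetical helper name `block_norm_product`, and it is not a substitute for the bound in [5].

```python
import numpy as np


def block_norm_product(n, k, rng):
    """Sample ||G2|| * ||G1^{-1}|| for an n x k Gaussian matrix split as [G1; G2]."""
    G = rng.standard_normal((n, k))
    G1, G2 = G[:k, :], G[k:, :]
    # Spectral norm of the lower (n - k) x k block.
    norm_G2 = np.linalg.norm(G2, 2)
    # ||G1^{-1}|| equals one over the smallest singular value of G1.
    norm_G1_inv = 1.0 / np.linalg.svd(G1, compute_uv=False)[-1]
    return norm_G2 * norm_G1_inv


rng = np.random.default_rng(0)
n, k, trials = 100, 5, 200
samples = np.array([block_norm_product(n, k, rng) for _ in range(trials)])

median_product = float(np.median(samples))
# Fraction of trials in which the product stays below the polynomial n^2
# (n^2 is an arbitrary reference polynomial chosen for this example).
frac_below_n2 = float(np.mean(samples <= n**2))
print(f"median = {median_product:.2f}, fraction <= n^2: {frac_below_n2:.3f}")
```

The occasional large sample comes from a nearly singular upper block, which is why the statement holds with high probability and not always.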
Finally, $\|A - ZZ^T A\| \le \sqrt{1+2\varepsilon} \cdot \sigma_{k+1} \le (1+\varepsilon)\,\sigma_{k+1}$.

References

[1] Vladimir Rokhlin, Arthur Szlam, and Mark Tygert. A randomized algorithm for principal component analysis. SIAM J. Matrix Analysis Applications, 31(3):1100–1124, 2009.
[2] N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., 53(2):217–288, May 2011.
[3] Cameron Musco and Christopher Musco. Randomized block Krylov methods for stronger and faster approximate singular value decomposition. In Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett, editors, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 1396–1404, 2015.
[4] Rafi Witten and Emmanuel Candès. Randomized algorithms for low-rank matrix factorizations: Sharp performance bounds. Algorithmica, 72(1):264–281, May 2015.
[5] Mark Rudelson. Invertibility of random matrices: norm of the inverse.