Improved mutual coherence of some random overcomplete dictionaries for sparse representation
Yingtong Chen [email protected]
School of Mathematics and Statistics, Xi'an Jiaotong University, No. 28, Xianning West Road, Xi'an, Shaanxi, 710049, P.R. China; Beijing Center for Mathematics and Information Interdisciplinary Sciences (BCMIIS)
Jigen Peng [email protected]
School of Mathematics and Statistics, Xi'an Jiaotong University, No. 28, Xianning West Road, Xi'an, Shaanxi, 710049, P.R. China; Beijing Center for Mathematics and Information Interdisciplinary Sciences (BCMIIS)
Abstract
This letter presents a method for reducing the mutual coherence of an overcomplete Gaussian or Bernoulli random matrix, which is already fairly small by the lower bound given here on the probability that this mutual coherence is less than any given number in (0, 1). For matrices in a set that contains these two types with high probability, the mutual coherence can be reduced by a similar method, except on a subset of Lebesgue measure zero. Numerical results illustrate the reduction in the mutual coherence of overcomplete Gaussian, Bernoulli and uniform random dictionaries. For the third type, the effect is better than an earlier result.
Keywords:
Sparse representation, Mutual coherence, Gaussian random matrices, Bernoulli random matrices

∗ This work was supported by the National Natural Science Foundation of China under Grant 11131006, and in part by the National Basic Research Program of China under Grant 2013CB329404.

Introduction
Recently, sparse representation has attracted many scientists from different fields. If $b \in \mathbb{R}^m$ is a given signal known to be representable as a linear combination of only a few atoms of a dictionary $A \in \mathbb{R}^{m \times n}$ ($m < n$), the representation vector $x$ can be computed correctly and efficiently by several procedures. The desire to find the maximally sparse solution of an underdetermined linear system $Ax = b$ can be cast as the optimization problem
$$(P_0)\quad \min_x \|x\|_0 \quad \text{s.t.}\quad b = Ax. \eqno(1)$$
It is well known that for a noiseless signal $b = Ax$, $x$ is the unique solution to problem (1) whenever $\|x\|_0 < (1 + 1/\mu(A))/2$ [5], and both OMP [8] and BP [4] can find it [5, 11]. Here $\mu(A)$ is the largest absolute normalized inner product between different columns of $A$. Clearly, the smaller the mutual coherence of $A$ [5] is, the stronger the above result becomes.

An optimized algorithm was proposed in [6] to reduce $\mu_t$, the $t$-averaged mutual coherence, of the product of a projection and a given dictionary in the context of compressed sensing, a descendant of sparse approximation. [9] presented improved versions of thresholding and OMP for sparse representation, with an iterative algorithm that produces a new dictionary at the sensing step using the cross cumulative coherence. In contrast, we work with the mutual coherence of an overcomplete Gaussian or Bernoulli random dictionary itself, to improve the classical results of [5, 8, 4, 11].

Why are we interested in the mutual coherence of such dictionaries? Firstly, an overcomplete Gaussian or Bernoulli random matrix satisfies the RIP condition, another widely used measure of a dictionary different from the mutual coherence, with high probability [2]. So these matrices are commonly used in numerical experiments to verify new methods. Secondly, the mutual coherence of such a matrix has been studied from the limiting-laws point of view in statistics [10]. But how can one directly reduce it in the finite-dimensional case?
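As a small illustration of these definitions, the mutual coherence $\mu(A)$ and the uniqueness threshold $(1 + 1/\mu(A))/2$ can be computed directly. The sketch below (dictionary size and random seed are arbitrary choices, not taken from this letter) does this for a Gaussian random dictionary with $N(0, 1/m)$ entries:

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute normalized inner product between distinct columns of A."""
    An = A / np.linalg.norm(A, axis=0, keepdims=True)  # unit-norm columns
    G = np.abs(An.T @ An)                              # Gram matrix of normalized columns
    np.fill_diagonal(G, 0.0)                           # ignore self-products
    return G.max()

rng = np.random.default_rng(0)
m, n = 64, 128                                         # illustrative overcomplete size
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))     # Gaussian dictionary, N(0, 1/m) entries
mu = mutual_coherence(A)
# Uniqueness threshold from [5]: any x with ||x||_0 < (1 + 1/mu)/2 is the sparsest solution.
print(mu, 0.5 * (1.0 + 1.0 / mu))
```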
When $A$ is an overcomplete Gaussian or Bernoulli random matrix with full row rank, the mutual coherence $\mu(A)$ is fairly small, but it can be reduced further by multiplying on the left by an invertible matrix $P$ generated from $A$ itself. In this section, it is shown that the lower bound of the probability $P\{\mu(PA) \le \varepsilon\}$, $\varepsilon \in (0, 1)$, is larger than that of $P\{\mu(A) \le \varepsilon\}$; the two bounds are obtained in the same way.

A Bernoulli random matrix $A$ means that each $a_{ij}$ is independently $\pm 1/\sqrt{m}$ with equal probability, while each $a_{ij}$ of a Gaussian random matrix independently follows the normal distribution $N(0, m^{-1})$. $\alpha_j$ denotes the $j$th column of $A$. From the strong concentration of $\|Ax\|_2^2$ in [2], the extreme singular values of $A$, $\sigma_1$ and $\sigma_m$, satisfy $0 < \sqrt{1-\eta} \le \sigma_m(A) \le \sigma_1(A) \le \sqrt{1+\eta}$ with probability larger than $1 - 2\exp(-c(\eta)m)$, where $c(\eta) = \eta^2/4 - \eta^3/6$, $\eta \in (0, 1)$. So $A$ has full rank with exponentially high probability.

$P = (AA^T)^{-1/2}$ makes $(PA)(PA)^T = I_m$. The intuition is that an orthogonal matrix has the smallest possible mutual coherence, and $PA$, consisting of $m$ orthonormal rows, may have a smaller mutual coherence than $\mu(A)$.

Consider $PA$ and $A$ in the singular value decomposition domain. The SVD of $A$ is $A = U[S, O]V^T$ with orthonormal matrices $U$ and $V$ and a diagonal matrix $S = \mathrm{diag}\{\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_m > 0\}$. $V$ can be divided into $V_1 \in \mathbb{R}^{n \times m}$ and $V_2 \in \mathbb{R}^{n \times (n-m)}$; $v_j = [v_j^{(1)}, v_j^{(2)}]$ is the $j$th row of $V$, and $v_j^{(1)}$, $v_j^{(2)}$ are the $j$th rows of $V_1$ and $V_2$ respectively. The $k$th column of $A = USV_1^T$ is $\alpha_k = USv_k^{(1)T}$, and for $j \ne k$ the normalized inner product can be bounded by the Wielandt inequality of [13, 7]:
$$ I(\alpha_j, \alpha_k) = \frac{|\langle \alpha_j, \alpha_k \rangle|}{\|\alpha_j\|_2 \|\alpha_k\|_2} = \frac{|v_j^{(1)} S^2 v_k^{(1)T}|}{\|Sv_j^{(1)T}\|_2 \|Sv_k^{(1)T}\|_2} \le \left( \frac{\sigma_1^2 - \sigma_m^2}{\sigma_1^2 + \sigma_m^2} + \frac{|\langle v_j^{(1)}, v_k^{(1)} \rangle|}{\|v_j^{(1)}\|_2 \|v_k^{(1)}\|_2} \right) \Bigg/ \left( 1 + \frac{\sigma_1^2 - \sigma_m^2}{\sigma_1^2 + \sigma_m^2} \cdot \frac{|\langle v_j^{(1)}, v_k^{(1)} \rangle|}{\|v_j^{(1)}\|_2 \|v_k^{(1)}\|_2} \right)
$$
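Both facts above can be checked numerically: $PA = UV_1^T$ has orthonormal rows, and the pairwise coherences of $A$ obey the Wielandt-type bound with $t$ taken as the corresponding normalized inner product of the columns of $PA$. The sketch below (sizes and seed are arbitrary choices) is one way to verify this:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 32, 64
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))

# P = (A A^T)^{-1/2} via the SVD A = U S V1^T; then P A = U V1^T has orthonormal rows.
U, s, V1t = np.linalg.svd(A, full_matrices=False)   # s holds sigma_1 >= ... >= sigma_m
P = U @ np.diag(1.0 / s) @ U.T
PA = P @ A
assert np.allclose(PA @ PA.T, np.eye(m), atol=1e-8)

# Pairwise Wielandt-type bound: I(a_j, a_k) <= (mu_s + t) / (1 + mu_s * t), where
# mu_s = (sigma_1^2 - sigma_m^2) / (sigma_1^2 + sigma_m^2) and t = I(Pa_j, Pa_k).
mu_s = (s[0]**2 - s[-1]**2) / (s[0]**2 + s[-1]**2)
An = A / np.linalg.norm(A, axis=0)
Pn = PA / np.linalg.norm(PA, axis=0)
GA = np.abs(An.T @ An)           # normalized inner products of columns of A
GP = np.abs(Pn.T @ Pn)           # normalized inner products of columns of PA
iu = np.triu_indices(n, k=1)     # all pairs j < k
assert np.all(GA[iu] <= (mu_s + GP[iu]) / (1.0 + mu_s * GP[iu]) + 1e-10)
print(GA[iu].max(), GP[iu].max())   # mu(A) and mu(PA)
```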
The right-hand side is denoted $u_1$. The $k$th column of $PA$ is $P\alpha_k = Uv_k^{(1)T}$, and $|\langle P\alpha_j, P\alpha_k \rangle| / (\|P\alpha_j\|_2 \|P\alpha_k\|_2)$ equals $|\langle v_j^{(1)}, v_k^{(1)} \rangle| / (\|v_j^{(1)}\|_2 \|v_k^{(1)}\|_2) \triangleq u_2$. Clearly, $u_1$ is larger than $u_2$.

For the subset of overcomplete Gaussian and Bernoulli random matrices with the aforementioned positive bounded singular values, consider the events $E_1 = \{(1-\eta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1+\eta)\|x\|_2^2,\ \exists\, \eta \in (0,1)\}$, $E_2 = \{\mu(A) \le \varepsilon,\ \varepsilon \in (0,1)\}$ and $E_3 = \{\mu(PA) \le \varepsilon,\ \varepsilon \in (0,1)\}$. It is shown that
$$P\{E_1 \cap E_2\} = P\{E_1\} - P\{E_1 \cap E_2^c\}, \qquad P\{E_1 \cap E_3\} = P\{E_1\} - P\{E_1 \cap E_3^c\},$$
$$P\{E_1 \cap E_2^c\} \le \frac{n(n-1)}{2}\, P\{\varepsilon \le u_1\} \triangleq p_1, \qquad P\{E_1 \cap E_3^c\} \le \frac{n(n-1)}{2}\, P\{\varepsilon \le u_2\} \triangleq p_2.$$
So the result that $P\{\mu(PA) \le \varepsilon\}$ has a better lower bound than $P\{\mu(A) \le \varepsilon\}$, $\varepsilon \in (0,1)$, although both bounds are obtained in the same way, indirectly reflects the phenomenon that the mutual coherence of the product of the inverse matrix $P$ and an overcomplete Gaussian or Bernoulli random matrix is smaller than the original mutual coherence.

In this section, we use the above approach to reduce the mutual coherence of an overcomplete Gaussian or Bernoulli dictionary on a set $X$ which almost includes the two types, and prove that the newly defined essential mutual coherence on $X$ is strictly smaller than the original one, except on a subset of Lebesgue measure zero. Let $X \triangleq \{A \in \mathbb{R}^{m \times n};\ m < n,\ \mu(A) < 1,\ 0 \notin \mathrm{diag}(A^TA)\}$. The condition $m < n$ is natural, and $A$ must have no zero columns, since otherwise some atoms would be meaningless. It is also senseless to multiply $A$ on the left by an invertible matrix $P$ to reduce $\mu(A)$ if $\mu(A) = 1$. For the Gaussian type, $P\{0 \in \mathrm{diag}(A^TA)\} = 0 = P\{\mu(A) = 1\}$, noticing the independence of the $a_{ij}$ and the equality condition of the Cauchy-Schwarz inequality. For the Bernoulli type, $P\{\mu(A) = 1\} \le n(n-1)\exp(-m/2)$ due to the Hoeffding inequality. That is to say, a Gaussian or Bernoulli random matrix belongs to $X$ with high probability.

Consider an equivalent problem of $(P_0)$: $(P_0')\ \min_x \|x\|_0\ \text{s.t.}\ Pb = PAx$ with an invertible matrix $P$. A new quantity, the essential mutual coherence, is defined as $\mu_e(A) \triangleq \inf_P \mu(PA)$, $P \in GL_m(\mathbb{R}) = \{\text{all } m\text{-order real invertible matrices}\}$; it is invariant under elementary row operations of matrices. In this section we show that $\mu_e(A) < \mu(A)$ holds almost everywhere on $X$ by constructing a matrix $P \triangleq E_m + \varepsilon E_{m,1} \in GL_m(\mathbb{R})$ with a proper $\varepsilon$ such that $\mu(PA) < \mu(A)$. Here $E_m$ is the $m$-order identity matrix and $E_{m,1} \in \mathbb{R}^{m \times m}$ consists of 0's except for a 1 at position $(m, 1)$.

How to choose the parameter $\varepsilon$? For two different columns of $A$, $\alpha_i$ and $\alpha_j$, $i < j$, a polynomial $f_{ij}(\varepsilon) = A_{ij} + B_{ij}\varepsilon + C_{ij}\varepsilon^2 + D_{ij}\varepsilon^3 + E_{ij}\varepsilon^4$ can be obtained from the requirement $I(P\alpha_i, P\alpha_j) \triangleq |\langle P\alpha_i, P\alpha_j \rangle| / (\|P\alpha_i\|_2 \|P\alpha_j\|_2) < \mu(A)$ (write $\mu$ for $\mu(A)$). $f_{ij}(0) = A_{ij} = \|\alpha_i\|_2^2 \|\alpha_j\|_2^2 (I(\alpha_i, \alpha_j)^2 - \mu^2) < 0$ for all pairs $(\alpha_i, \alpha_j)$ such that $I(\alpha_i, \alpha_j) < \mu$. There exists $\varepsilon_1^{(ij)} > 0$ such that $f_{ij}(\varepsilon) < 0$ for all $\varepsilon \in (-\varepsilon_1^{(ij)}, \varepsilon_1^{(ij)})$, owing to the continuity of the polynomial $f_{ij}$. A minimum, $\varepsilon_1$, can be found among all the $\varepsilon_1^{(ij)} > 0$ over all such pairs.

For all pairs $(\alpha_i, \alpha_j)$ such that $I(\alpha_i, \alpha_j) = \mu$, if $f'_{ij}(0) = B_{ij}$ is positive or negative, then there exists $\varepsilon_2^{(ij)} > 0$ such that $f_{ij}(\varepsilon) < 0$ for all $\varepsilon \in (-\varepsilon_2^{(ij)}, 0)$ or $(0, \varepsilon_2^{(ij)})$ respectively. Among all of them there exists a minimum, $\varepsilon_2$. So the sign of the final $\varepsilon$ is determined by the pairs with $I(\alpha_i, \alpha_j) = \mu$.

Two exceptions catch our attention. Firstly, it is impossible to choose $\varepsilon_1^{(ij)}$ or $\varepsilon_2^{(ij)}$ when both $A_{ij}$ and $B_{ij}$ are zero. The set $X_0 \triangleq \{A \in X;\ B_{ij} = 0,\ A_{ij} = 0\}$ has measure zero due to $X_0 \subset \cup_i$ …
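Instead of the polynomial root analysis above, a suitable $\varepsilon$ can also be located by direct search. The sketch below (sizes, seed and scan grid are arbitrary choices, not from this letter) scans a grid of $\varepsilon$ for $P = E_m + \varepsilon E_{m,1}$ and reports the best coherence found; since the grid contains $\varepsilon = 0$, the result is never worse than $\mu(A)$:

```python
import numpy as np

def mutual_coherence(M):
    """Largest absolute normalized inner product between distinct columns of M."""
    Mn = M / np.linalg.norm(M, axis=0)
    G = np.abs(Mn.T @ Mn)
    np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(3)
m, n = 16, 32
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
mu0 = mutual_coherence(A)

# P = E_m + eps * E_{m,1}: identity plus eps at position (m, 1) -- an elementary
# row operation, hence P is invertible for every eps.
best_eps, best_mu = 0.0, mu0
for eps in np.linspace(-0.5, 0.5, 2001):
    P = np.eye(m)
    P[m - 1, 0] = eps
    mu = mutual_coherence(P @ A)
    if mu < best_mu:
        best_eps, best_mu = eps, mu
print(mu0, best_mu, best_eps)
```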
Fig. 1.
A comparison of $0.5(1+1/\mu(A))$ and $0.5(1+1/\mu(PA))$ for the Bernoulli random dictionary of size $m \times 2m$, where $m \in [100, 500]$.

From about $m = 280$, the bound $0.5(1+1/\text{coherence})$ exceeds 3 and keeps increasing up to 500 when the coherence is taken as $\mu(PA)$, but the one with $\mu(A)$ remains below 3 up to 500, although it also grows larger and larger. That is, $\mu(PA)$ can guarantee the recovery of a sparse solution with 3 nonzero elements for both OMP and BP when $m \in [280, 500]$, which is one more than the result obtained by using $\mu(A)$. Both $0.5(1+1/\mu(A))$ and $0.5(1+1/\mu(PA))$ grow as $m$ increases, so only six values of $m$ are listed in Table 1 in order to avoid the similarity between Fig. 1 and Table 1, which considers values in [1500, 2000]. As shown in Table 1, $0.5(1+1/\mu(A))$ is below 5 but $0.5(1+1/\mu(PA))$ is above 6 when $m$ is beyond 1800.

The second experiment compares the effect of the above method and the one proposed in [3] (called BEZ here) for decreasing the mutual coherence of an overcomplete uniform random dictionary $D$, obtained by choosing the entries independently from a uniform distribution on [0, 1] and then normalizing the columns to unit $\ell_2$-norm. The authors in [3] used $P = (1-\epsilon)I_m + (\epsilon/m)\mathbf{1}\mathbf{1}^T$, where $\mathbf{1} \in \mathbb{R}^m$ is the vector of 1's and $\epsilon$ satisfies $0 < \epsilon \ll 1$. Three values, $0.5(1+1/\mu(D))$, $0.5(1+1/\mu(PD))$ with $P = (DD^T)^{-1/2}$, and $0.5(1+1/\mu(PD))$ with $P = (1-\epsilon)I_m + (\epsilon/m)\mathbf{1}\mathbf{1}^T$, are obtained for the dictionary $D$ generated as per the instructions in [3] with a fixed size. This is done
Fig. 2.
A comparison of $0.5(1+1/\mu(A))$ and $0.5(1+1/\mu(PA))$ for the Gaussian random dictionary of size $m \times 2m$, where $m \in [100, 500]$.

Table 1
A comparison of $0.5(1+1/\mu(A))$ and $0.5(1+1/\mu(PA))$ for the Bernoulli random matrix of size $m \times 2m$ when $m \in$ linspace(1500, 2000, 6) (a MATLAB notation).

m                1500    1600    1700    1800    1900    2000
0.5(1+1/µ(A))    4.1531  4.2708  4.3896  4.4893  4.5769  4.6738
0.5(1+1/µ(PA))   5.6711  5.8311  5.9798  6.0945  6.2611  6.4046

Table 2
A comparison of $0.5(1+1/\mu(A))$ and $0.5(1+1/\mu(PA))$ for the Gaussian random matrix of size $m \times 2m$ when $m \in$ linspace(1500, 2000, 6).

m                1500    1600    1700    1800    1900    2000
0.5(1+1/µ(A))    4.1686  4.3185  4.4032  4.4860  4.5952  4.6469
0.5(1+1/µ(PA))   5.7126  5.8191  5.9664  6.0822  6.2473  6.4228

100 times, and $\epsilon$ is selected each time to maximize $0.5(1+1/\mu(PD))$ with the $P$ of [3] over a linspace grid of small values. In Fig. 3, the three lines placed from bottom to top represent the aforementioned three values respectively. $0.5(1+1/\mu(D))$ is below 1.5 all the time. In all 100 trials, $0.5(1+1/\mu(PD))$ obtained by using BEZ in [3] is strictly smaller than 2, while the one generated by applying our method exceeds 2 in 86 of them.
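A single trial of the uniform-dictionary comparison can be sketched as follows for $P = (DD^T)^{-1/2}$; the size and seed here are arbitrary illustrative choices (not the fixed size used above), and the BEZ preconditioner is omitted:

```python
import numpy as np

def mutual_coherence(M):
    """Largest absolute normalized inner product between distinct columns of M."""
    Mn = M / np.linalg.norm(M, axis=0)
    G = np.abs(Mn.T @ Mn)
    np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(4)
m, n = 50, 100                           # illustrative size only
D = rng.uniform(0.0, 1.0, size=(m, n))   # entries uniform on [0, 1], as in [3]
D /= np.linalg.norm(D, axis=0)           # normalize columns to unit l2-norm

U, s, V1t = np.linalg.svd(D, full_matrices=False)
PD = U @ V1t                             # equals (D D^T)^{-1/2} D
mu_D, mu_PD = mutual_coherence(D), mutual_coherence(PD)
print(0.5 * (1 + 1 / mu_D), 0.5 * (1 + 1 / mu_PD))
```

Because all columns of $D$ lie in the positive orthant, $\mu(D)$ is close to 1 and the bound $0.5(1+1/\mu(D))$ stays small, which is why preconditioning pays off so visibly for this dictionary type.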
Fig. 3.
A comparison of $0.5(1+1/\mu(D))$, $0.5(1+1/\mu(PD))$ with $P = (DD^T)^{-1/2}$, and $0.5(1+1/\mu(PD))$ with $P = (1-\epsilon)I_m + (\epsilon/m)\mathbf{1}\mathbf{1}^T$, for $D$ of a fixed size.

The mutual coherence of an overcomplete Gaussian or Bernoulli random matrix, which is proven to be small, can be reduced directly by multiplying by an inverse matrix on the left side, improving the traditional guarantees of unique sparse representation and successful reconstruction. Furthermore, the newly proposed essential mutual coherence, inspired by this construction, is proven to be strictly smaller than the original mutual coherence on a set except for a subset of Lebesgue measure zero. Numerical results exhibit the decrease in the mutual coherence of overcomplete Gaussian and Bernoulli random dictionaries, and also of an overcomplete uniform random matrix, where the effect is better than the earlier result obtained by other authors.
A Tail bound
For Gaussian random matrices, $m\|\alpha_j\|_2^2 \sim \chi^2(m)$.

• $P\{Z - m \le -2\sqrt{mx}\} \le \exp(-x)$, $\forall x > 0$, for a centralized $\chi^2$-variable $Z$ with $m$ degrees of freedom [12].

• $P\{|\langle \alpha_j, \alpha_k \rangle| \ge$ …

References

[1] … J. Commun. Netw., 12(4):289–307, August 2010.

[2] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin. A simple proof of the restricted isometry property for random matrices. Constr. Approx., 28(3):253–263, December 2008.

[3] A. M. Bruckstein and M. Elad. On the uniqueness of nonnegative sparse solutions to underdetermined systems of equations. IEEE Trans. Inf. Theory, 54(11):4813–4820, November 2008.

[4] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput., 20(1):33–61, 1998.

[5] D. L. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via $\ell_1$ minimization. Proc. Natl. Acad. Sci., 100(5):2197–2202, March 2003.

[6] M. Elad. Optimized projections for compressed sensing. IEEE Trans. Signal Process., 55(12):5695–5702, December 2007.

[7] M. Lin and G. Sinnamon. The generalized Wielandt inequality in inner product spaces. Eurasian Math. J., 3(1):72–85, 2012.

[8] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad. Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. In Proc. 27th Asilomar Conf. on Signals, Systems and Computers, pages 40–44. IEEE, November 1993.

[9] K. Schnass and P. Vandergheynst. Dictionary preconditioning for greedy algorithms. IEEE Trans. Signal Process., 56(5):1994–2002, May 2008.

[10] T. Tony Cai and T. Jiang. Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices. Ann. Stat., 39(3):1496–1525, 2011.

[11] J. A. Tropp. Greed is good: algorithmic results for sparse approximation. IEEE Trans. Inf. Theory, 50(10):2231–2242, October 2004.

[12] M. J. Wainwright. Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting. IEEE Trans. Inf. Theory, 55(12):5728–5741, December 2009.

[13] Z. Yan. A unified version of Cauchy-Schwarz and Wielandt inequalities.