Compressed Sensing for Block-Sparse Smooth Signals
Shahzad Gishkori,
Student Member, IEEE, and Geert Leus,
Fellow, IEEE
Abstract—We present reconstruction algorithms for smooth signals with block sparsity from their compressed measurements. We tackle the issue of varying group size via group-sparse least absolute shrinkage and selection operator (LASSO) as well as via latent group LASSO regularizations. We achieve smoothness in the signal via fusion. We develop low-complexity solvers for our proposed formulations through the alternating direction method of multipliers.
Index Terms—Compressed sensing, block sparsity, smoothness, signal reconstruction.
I. INTRODUCTION
Compressed sensing [1], [2] is one of the most exciting topics of present-day signal processing. Signal reconstruction from a low-dimensional representation becomes a possibility, given the sparse nature of the signal and the incoherence and/or restricted isometry property (RIP) [2] of the sensing/measurement process. In this regard, a number of approaches can be used, e.g., basis pursuit (BP) [3], the least absolute shrinkage and selection operator (LASSO) [4] and greedy algorithms [5]. In order to exploit the structure of the signal being sensed, a number of variants of the LASSO have become popular, e.g., the group LASSO (G-LASSO) [6], the sparse group LASSO (SG-LASSO) [7] and the fused LASSO (F-LASSO) [8]. In this letter, we propose new LASSO formulations to handle block-sparse smooth signals.

Smooth signals are often encountered in a wide range of engineering and biological fields. In engineering, signals observed in image processing, control systems and environment monitoring are often smooth or piece-wise smooth. In biology, a similar structure is observed, e.g., in protein mass spectroscopy [8]. The goal is to recover such structured signals from noisy and/or under-sampled measurements. A related topic is signal smoothing, which deals with removing random outliers. Apart from being smooth, such signals can often be represented as sparse in some basis. This sparsity pattern normally varies in terms of the location and block size of the sparse coefficients. The challenge for signal reconstruction is to exploit the block sparsity with varying block sizes, while keeping smoothness intact and using fewer measurements, but all at low complexity. In the CS domain, signal smoothness has been handled by using a fusion constraint in [8]. Fusion is also known as total variation (TV) in the image processing literature. Apart from fusion, [8] also proposed an $\ell_1$-norm penalty to cater for signal sparsity. However, since most such signals are block sparse, [8] cannot give efficient results. To cater for the block sparsity, one can replace the $\ell_1$-norm penalty with a group penalty. Although this approach can handle the block sparsity very well, it only offers fixed group sizes and causes complete groups to be zero or nonzero. To avoid elimination of small sets of nonzero elements, a very small group size can be opted for, but that can make the algorithm inefficient in eliminating large blocks of zero elements. In this regard, we propose to use a moderate group size along with an $\ell_1$-norm penalty over the signal, to create sparsity within the groups. Thus, by using fusion in combination with an $\ell_1$-norm penalty and a moderate group size, a smooth signal can be reconstructed with high accuracy. The problem of varying group sizes can also be handled by using the concept of latent groups, see [9] and references therein. These are basically overlapping groups, with recurring signal elements in possibly multiple groups. Thus, an element lost in one group may resurface through another group after reconstruction. So we also propose to use such latent groups in combination with a fusion constraint to recover block-sparse smooth signals with varying block sizes.

This work is supported in part by NWO-STW under the VICI program (project 10382). The authors are with the Faculty of Electrical Engineering, Mathematics and Computer Science of Delft University of Technology, 2628 CD, Delft, The Netherlands (emails: {s.s.gishkori, g.j.t.leus}@tudelft.nl).
Note that a work using overlapping groups over the fusion function, instead of over the signal structure, has appeared in [10], which however requires complete signal samples. Instead, we propose overlapping group and fusion penalties over the actual signal for under-determined systems. Thus, we exploit the actual structure of the signal rather than the differences of its elements. Further, in order to solve the proposed formulations, we derive low-complexity algorithms based on the alternating direction method of multipliers (ADMM) [11]. The reason for using this version of the augmented Lagrangian methods is primarily the non-separability of the fusion penalty in terms of the elements of the signal. Thus, the general convergence properties of the ADMM can be used to guarantee optimal results for our proposed algorithms.

Notations. Matrices are in upper case bold while column vectors are in lower case bold, $[X]_{i,j}$ is the $(i,j)$th entry of the matrix $X$, $[x]_i$ is the $i$th entry of the vector $x$, $I_N$ is the identity matrix of size $N \times N$, $(\cdot)^T$ is the transpose, $\hat{x}$ is the estimate of $x$, $\triangleq$ defines an entity, $\|x\|_p = (\sum_{i=0}^{N-1} |[x]_i|^p)^{1/p}$ is the $\ell_p$ norm of $x$, $\mathrm{sign}(x)$ is the sign function which takes the values $-1$ and $1$ depending on the polarity of the element $x$, whereas the function $(x)_+ = x$ if $x > 0$ and $(x)_+ = 0$ otherwise.
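For later reference, these notational operators translate directly into code. The following is a minimal NumPy sketch; the function names are ours, not the paper's:

```python
import numpy as np

def lp_norm(x, p):
    """l_p norm: ||x||_p = (sum_i |[x]_i|^p)^(1/p)."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def pos_part(x):
    """(x)_+ = x if x > 0, and 0 otherwise (applied element-wise)."""
    return np.maximum(x, 0.0)
```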
II. SIGNAL RECONSTRUCTION

Let $x$ be the $N \times 1$ recoverable signal. Given $M$ measurements, the sensed signal can be written as
$$y = \Phi x + v \quad (1)$$
where $y$ is an $M \times 1$ measurement vector, $\Phi$ is an $M \times N$ measurement matrix ($M < N$) with compression ratio $\mu \triangleq M/N$, and $v$ is an $M \times 1$ zero-mean additive white Gaussian noise vector with variance $\sigma_v^2$. To recover the signal from the compressed measurements while keeping the signal structure intact, we propose two LASSO formulations below; a small sketch of the measurement model follows.
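As a concrete illustration of the model (1), the NumPy sketch below generates a compressed measurement vector. The dimensions follow Section III, while the sparsity pattern of $x$, the noise level and the random seed are placeholder assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)           # assumed seed, for reproducibility

N = 140                                  # signal length (as in Section III)
M = 70                                   # number of measurements, M < N
mu = M / N                               # compression ratio mu = M/N

Phi = rng.normal(0.0, np.sqrt(1.0 / M), size=(M, N))  # Gaussian, variance 1/M

x = np.zeros(N)                          # a block-sparse smooth signal goes here
x[40:60] = 1.0                           # e.g., one nonzero (step) block

sigma_v = 0.1                            # assumed noise standard deviation
v = sigma_v * rng.normal(size=M)         # zero-mean AWGN
y = Phi @ x + v                          # the sensed signal, model (1)
```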
A. Sparse Group LASSO with Fusion

Through the sparse group fused LASSO (SGF-LASSO), we can resolve the issue of signal smoothness as well as that of fixed group sizes. The optimization problem can be formulated as
$$\hat{x} = \arg\min_x \frac{1}{2}\|y - \Phi x\|_2^2 + \lambda_e \|x\|_1 + \lambda_g \sum_{i=0}^{G-1} \|x_i\|_2 + \lambda_f \sum_{j=1}^{N-1} \big|[x]_j - [x]_{j-1}\big| \quad (2)$$
where $x_i$ is an $N/G \times 1$ sub-vector of $x$, representing one of $G$ groups over the elements of $x$, i.e., $x = [x_0^T, x_1^T, \cdots, x_{G-1}^T]^T$. We can see from (2) that $\lambda_g \sum_{i=0}^{G-1} \|x_i\|_2$ accounts for group sparsity, $\lambda_e \|x\|_1$ for element-wise sparsity and $\lambda_f \sum_{j=1}^{N-1} |[x]_j - [x]_{j-1}|$ accounts for fusion within the elements of $x$, such that the effect of each penalty becomes more severe with an increasing penalty parameter, i.e., $\lambda_g$, $\lambda_e$ and $\lambda_f$, respectively. For a moderate value of $G$, the proposed formulation can tackle the varying group size problem by creating sparsity within the groups along with fusing consecutive elements. Note that, for $\lambda_g = \lambda_f = 0$, (2) reduces to the standard LASSO problem; for $\lambda_f = 0$, (2) reduces to the SG-LASSO; for $\lambda_e = \lambda_f = 0$, (2) takes the shape of the G-LASSO; and for $\lambda_g = 0$, (2) becomes the F-LASSO. The role of each penalty is spelled out in the sketch below.
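To make the structure of the cost in (2) explicit, it can be evaluated as follows; a sketch under the assumption that $G$ divides $N$, with a function name of our choosing:

```python
import numpy as np

def sgf_lasso_objective(x, y, Phi, G, lam_e, lam_g, lam_f):
    """Evaluate the SGF-LASSO cost of (2) for a candidate x.
    The groups are the G contiguous, equal-size blocks of x."""
    fit = 0.5 * np.sum((y - Phi @ x) ** 2)              # data fidelity
    l1 = lam_e * np.sum(np.abs(x))                      # element-wise sparsity
    groups = x.reshape(G, -1)                           # rows are x_0, ..., x_{G-1}
    grp = lam_g * np.sum(np.linalg.norm(groups, axis=1))  # group sparsity
    fuse = lam_f * np.sum(np.abs(np.diff(x)))           # fusion (TV) penalty
    return fit + l1 + grp + fuse
```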
Solver for SGF-LASSO: In order to solve the SGF-LASSO problem via the ADMM, we introduce two auxiliary variables $u$ and $z$ of size $N \times 1$. Thus, (2) can be written as
$$[\hat{x}, \hat{u}, \hat{z}] = \arg\min_{x,u,z} \frac{1}{2}\|y - \Phi x\|_2^2 + \lambda_e \|u\|_1 + \lambda_g \sum_{i=0}^{G-1} \|u_i\|_2 + \lambda_f \|z\|_1 \quad \text{s.t.} \quad u_i = x_i,\ 0 \le i \le G-1,\quad z = Dx \quad (3)$$
where $u_i$ is an $N/G \times 1$ sub-vector of $u$, i.e., $u = [u_0^T, u_1^T, \cdots, u_{G-1}^T]^T$, and $D$ is the difference matrix with $[D]_{j,j} = -1$, $[D]_{j,j+1} = 1$, for $0 \le j \le N-2$, and $[D]_{N-1,N-1} = 1$, such that $\|Dx\|_1$ equals the element-wise fusion. From (3), the Lagrangian problem can be written as
$$L(x, u, z, \rho_u, \rho_z) = \frac{1}{2}\|y - \Phi x\|_2^2 + \lambda_e \|u\|_1 + \lambda_g \sum_{i=0}^{G-1} \|u_i\|_2 + \lambda_f \|z\|_1 + \sum_{i=0}^{G-1} \rho_{u_i}^T (u_i - x_i) + \frac{c_u}{2} \sum_{i=0}^{G-1} \|u_i - x_i\|_2^2 + \rho_z^T (z - Dx) + \frac{c_z}{2} \|z - Dx\|_2^2 \quad (4)$$
where $\rho_u$ (with sub-vectors $\rho_{u_i}$, for $0 \le i \le G-1$) and $\rho_z$ are Lagrange multipliers, and $c_u$ and $c_z$ are positive constants. The solution of (3) is generated by the following successive approximations
$$x^{[n]} = \arg\min_x L\big(x, u^{[n-1]}, z^{[n-1]}, \rho_u^{[n-1]}, \rho_z^{[n-1]}\big) \quad (5)$$
$$u^{[n]} = \arg\min_u L\big(x^{[n-1]}, u, \rho_u^{[n-1]}\big) \quad (6)$$
$$z^{[n]} = \arg\min_z L\big(x^{[n-1]}, z, \rho_z^{[n-1]}\big) \quad (7)$$
and the multipliers are updated as
$$\rho_u^{[n]} = \rho_u^{[n-1]} + c_u \big(x^{[n]} - u^{[n]}\big) \quad (8)$$
$$\rho_z^{[n]} = \rho_z^{[n-1]} + c_z \big(Dx^{[n]} - z^{[n]}\big). \quad (9)$$
The closed-form solution for (5) at the $n$th iteration can be derived to be
$$x^{[n]} = \big(\Phi^T \Phi + c_z D^T D + c_u I_N\big)^{-1} \big(\Phi^T y - D^T \rho_z^{[n-1]} + c_z D^T z^{[n-1]} - \rho_u^{[n-1]} + c_u u^{[n-1]}\big). \quad (10)$$
We can see from (10) that the matrix inversion part does not change during the iterations, so that it can be performed off-line, resulting in reduced complexity. Note that the matrix inversion lemma can be used to further ease the computations involved in the inversion operation.

For $u$, note that the optimization involves two penalties, i.e., apart from penalizing each element of $u$ for sparsity, we need to optimize over each of its sub-groups as well. Since both penalties are non-differentiable, we utilize the fact that soft thresholding generates a minimizer for the cost function involving $\lambda_e \|u_i\|_1$ [4], and for the cost function involving $\lambda_g \|u_i\|_2$, the minimizer is $s_u = u_i / \|u_i\|_2$ in case $\|u_i\|_2 \neq 0$, and a vector $s_u$ such that $\|s_u\|_2 < 1$ in case $\|u_i\|_2 = 0$ [7]. Thus, the closed-form solution of (6) for the $i$th sub-group at the $n$th iteration can be written as
$$u_i^{[n]} = \Bigg(\Big\|\mathcal{S}\Big(x_i^{[n-1]} + \frac{\rho_{u_i}^{[n-1]}}{c_u}, \frac{\lambda_e}{c_u}\Big)\Big\|_2 - \frac{\lambda_g}{c_u}\Bigg)_+ \frac{\mathcal{S}\big(x_i^{[n-1]} + \rho_{u_i}^{[n-1]}/c_u,\ \lambda_e/c_u\big)}{\big\|\mathcal{S}\big(x_i^{[n-1]} + \rho_{u_i}^{[n-1]}/c_u,\ \lambda_e/c_u\big)\big\|_2} \quad (11)$$
for $0 \le i \le G-1$, where $\mathcal{S}(s, \lambda) \triangleq \mathrm{sign}(s)(|s| - \lambda)_+$ is the soft thresholding operator, applied element-wise. Thus, the estimate of $u$ can be obtained as
$$u^{[n]} = \big[u_0^{[n]T}, u_1^{[n]T}, \cdots, u_{G-1}^{[n]T}\big]^T \quad (12)$$
which, along with $x^{[n]}$, is used to update $\rho_u^{[n]}$ in (8). Now from (7), the closed-form expression for the estimate of $z$ at the $n$th iteration can be derived as
$$z^{[n]} = \mathcal{S}\Big(Dx^{[n-1]} + \frac{\rho_z^{[n-1]}}{c_z},\ \frac{\lambda_f}{c_z}\Big) \quad (13)$$
which subsequently updates $\rho_z^{[n]}$ in (9). The complete iteration is summarized in the sketch below.
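Putting (5)-(13) together gives the following NumPy sketch of the SGF-LASSO solver. It is not the authors' implementation: the function name and defaults are ours, we use the freshly computed $x^{[n]}$ in the $u$- and $z$-updates (a common Gauss-Seidel-style ADMM choice), and the iteration-independent inverse of (10) is formed once, up front:

```python
import numpy as np

def soft(s, lam):
    """Soft thresholding S(s, lam) = sign(s)(|s| - lam)_+, element-wise."""
    return np.sign(s) * np.maximum(np.abs(s) - lam, 0.0)

def sgf_lasso_admm(y, Phi, G, lam_e, lam_g, lam_f,
                   c_u=2.0, c_z=2.0, n_iter=200, tol=1e-4):
    M, N = Phi.shape
    g = N // G                                   # group size N/G (G must divide N)

    # Difference matrix D as defined after (3)
    D = -np.eye(N)
    D[np.arange(N - 1), np.arange(1, N)] = 1.0
    D[N - 1, N - 1] = 1.0

    # The inverse in (10) does not change over iterations: compute it off-line
    A_inv = np.linalg.inv(Phi.T @ Phi + c_z * D.T @ D + c_u * np.eye(N))
    Phi_t_y = Phi.T @ y

    x = np.zeros(N)
    u = np.zeros(N)
    z = np.zeros(N)
    rho_u = np.zeros(N)
    rho_z = np.zeros(N)
    for _ in range(n_iter):
        # x-update, (10)
        x_new = A_inv @ (Phi_t_y - D.T @ rho_z + c_z * (D.T @ z)
                         - rho_u + c_u * u)
        # u-update, (11): element-wise soft threshold, then group-norm shrinkage
        v = x_new + rho_u / c_u
        u = np.zeros(N)
        for i in range(G):
            s = soft(v[i * g:(i + 1) * g], lam_e / c_u)
            ns = np.linalg.norm(s)
            if ns > 0.0:
                u[i * g:(i + 1) * g] = max(ns - lam_g / c_u, 0.0) / ns * s
        # z-update, (13)
        z = soft(D @ x_new + rho_z / c_z, lam_f / c_z)
        # multiplier updates, (8) and (9)
        rho_u += c_u * (x_new - u)
        rho_z += c_z * (D @ x_new - z)
        if np.linalg.norm(x_new - x) < tol:      # stop on a small update
            x = x_new
            break
        x = x_new
    return x
```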
Fig. 1. Above: Disjoint groups. Below: Overlapping groups.
B. Latent Group LASSO with Fusion
For the latent group fused LASSO (LGF-LASSO), the signal is segmented into many overlapping groups of certain sizes (in this letter, we consider overlapping groups of fixed sizes, but the concept can easily be extended to varying sizes as well). In contrast to disjoint groups, overlapping groups can reselect elements from other groups. We create $\tilde{G}$ overlapping groups through $N/G \times N$ sub-selection matrices $W_i$, each of which selects $N/G$ rows from the identity matrix $I_N$. An overlapping group can then be obtained by the relation $W_i x$, for $0 \le i \le \tilde{G}-1$, where $W_i$ is such that $W \triangleq [W_0^T, W_1^T, \cdots, W_{\tilde{G}-1}^T]^T$. Each sub-selection matrix $W_i$ repeats $K$ rows of $W_{i-1}$, where $K$ is the overlapping factor and $0 \le K \le N/G - 1$. Figure 1 schematically shows the difference between disjoint ($K = 0$) and overlapping groups (for $K = N/(2G)$); the construction of the matrices $W_i$ is illustrated in the sketch below. We can see that the overlapping groups can solve the problem of the fixed group size, but the price to be paid is in terms of computational complexity, which increases excessively with the factor $K$ due to the related increase in $\tilde{G}$. Now, the optimization problem for the LGF-LASSO can be formulated as
$$\hat{x} = \arg\min_x \frac{1}{2}\|y - \Phi x\|_2^2 + \lambda_g \sum_{i=0}^{\tilde{G}-1} \|W_i x\|_2 + \lambda_f \|Dx\|_1 \quad (14)$$
which does not contain an element-wise sparsity term as required in (2).
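The sub-selection matrices $W_i$ can be built, for instance, as follows; a sketch assuming a fixed group size $N/G$ and stride $N/G - K$ (the group size 10, $K = 5$ and $\tilde{G} = 27$ match the numbers used in Section III):

```python
import numpy as np

def overlap_selectors(N, group_size, K):
    """Return the list [W_0, ..., W_{Gt-1}] of (N/G x N) sub-selection
    matrices; consecutive groups share K rows (K = 0 gives disjoint groups)."""
    stride = group_size - K
    I_N = np.eye(N)
    return [I_N[start:start + group_size, :]
            for start in range(0, N - group_size + 1, stride)]

Ws = overlap_selectors(N=140, group_size=10, K=5)
print(len(Ws))   # 27 overlapping groups, i.e., Gt = 27
```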
Solver for LGF-LASSO: To solve the LGF-LASSO problem, we again turn to the ADMM. By introducing a new auxiliary variable $\tilde{u}$ of size $\tilde{G}N/G \times 1$, (14) can be written as
$$[\hat{x}, \hat{\tilde{u}}, \hat{z}] = \arg\min_{x, \tilde{u}, z} \frac{1}{2}\|y - \Phi x\|_2^2 + \lambda_g \sum_{i=0}^{\tilde{G}-1} \|\tilde{u}_i\|_2 + \lambda_f \|z\|_1 \quad \text{s.t.} \quad \tilde{u}_i = W_i x,\ 0 \le i \le \tilde{G}-1,\quad z = Dx \quad (15)$$
where $\tilde{u}_i$ is an $N/G \times 1$ sub-vector of $\tilde{u}$, i.e., $\tilde{u} = [\tilde{u}_0^T, \tilde{u}_1^T, \cdots, \tilde{u}_{\tilde{G}-1}^T]^T$. Now, the Lagrangian for (15) can be written as
$$L(x, \tilde{u}, z, \rho_{\tilde{u}}, \rho_z) = \frac{1}{2}\|y - \Phi x\|_2^2 + \lambda_g \sum_{i=0}^{\tilde{G}-1} \|\tilde{u}_i\|_2 + \lambda_f \|z\|_1 + \sum_{i=0}^{\tilde{G}-1} \rho_{\tilde{u}_i}^T (\tilde{u}_i - W_i x) + \frac{c_u}{2} \sum_{i=0}^{\tilde{G}-1} \|\tilde{u}_i - W_i x\|_2^2 + \rho_z^T (z - Dx) + \frac{c_z}{2} \|z - Dx\|_2^2 \quad (16)$$
where $\rho_{\tilde{u}}$ collects the Lagrange multipliers, with sub-vectors $\rho_{\tilde{u}_i}$ for $0 \le i \le \tilde{G}-1$. Now, the successive approximations for the solution of (16) w.r.t. $x$, $\tilde{u}$ and $\rho_{\tilde{u}}$ can be written as
$$x^{[n]} = \arg\min_x L\big(x, \tilde{u}^{[n-1]}, z^{[n-1]}, \rho_{\tilde{u}}^{[n-1]}, \rho_z^{[n-1]}\big) \quad (17)$$
$$\tilde{u}^{[n]} = \arg\min_{\tilde{u}} L\big(x^{[n-1]}, \tilde{u}, \rho_{\tilde{u}}^{[n-1]}\big) \quad (18)$$
$$\rho_{\tilde{u}}^{[n]} = \rho_{\tilde{u}}^{[n-1]} + c_u \big(W x^{[n]} - \tilde{u}^{[n]}\big) \quad (19)$$
whereas the expressions for the estimates of $z$ and $\rho_z$ are the same as in (7) and (9), respectively. From (17), the closed-form expression for the estimate of $x$ at the $n$th iteration can be derived as
$$x^{[n]} = \big(\Phi^T \Phi + c_z D^T D + c_u W^T W\big)^{-1} \big(\Phi^T y - D^T \rho_z^{[n-1]} + c_z D^T z^{[n-1]} - W^T (\rho_{\tilde{u}}^{[n-1]} - c_u \tilde{u}^{[n-1]})\big) \quad (20)$$
where we can see that, as $W$ is already known, the matrix inversion part of the estimate can again be obtained off-line and does not need to be computed at each iteration. From (18), the closed-form expression for the estimate of $\tilde{u}_i$, for $0 \le i \le \tilde{G}-1$, at the $n$th iteration can be derived as
$$\tilde{u}_i^{[n]} = \Bigg(\Big\|W_i x^{[n-1]} + \frac{\rho_{\tilde{u}_i}^{[n-1]}}{c_u}\Big\|_2 - \frac{\lambda_g}{c_u}\Bigg)_+ \frac{W_i x^{[n-1]} + \rho_{\tilde{u}_i}^{[n-1]}/c_u}{\big\|W_i x^{[n-1]} + \rho_{\tilde{u}_i}^{[n-1]}/c_u\big\|_2} \quad (21)$$
and the estimate of $\tilde{u}$ can be obtained as
$$\tilde{u}^{[n]} = \big[\tilde{u}_0^{[n]T}, \tilde{u}_1^{[n]T}, \cdots, \tilde{u}_{\tilde{G}-1}^{[n]T}\big]^T \quad (22)$$
which is then used to update the multipliers $\rho_{\tilde{u}}$ in (19). A sketch of the resulting iteration is given below.
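The following NumPy sketch implements (20), (21) and the updates of $z$, $\rho_{\tilde{u}}$ and $\rho_z$; as before, the function name and defaults are ours, and the freshly computed $x^{[n]}$ is used in each sub-step:

```python
import numpy as np

def lgf_lasso_admm(y, Phi, Ws, lam_g, lam_f, c_u=2.0, c_z=2.0, n_iter=200):
    """Ws is the list of sub-selection matrices W_i (see the sketch above)."""
    M, N = Phi.shape
    W = np.vstack(Ws)                            # stacked W, size (Gt*N/G) x N
    offs = np.cumsum([0] + [Wi.shape[0] for Wi in Ws])

    D = -np.eye(N)                               # difference matrix as after (3)
    D[np.arange(N - 1), np.arange(1, N)] = 1.0
    D[N - 1, N - 1] = 1.0

    # The inverse in (20) is again iteration-independent
    A_inv = np.linalg.inv(Phi.T @ Phi + c_z * D.T @ D + c_u * W.T @ W)
    Phi_t_y = Phi.T @ y

    x = np.zeros(N)
    u = np.zeros(W.shape[0])
    rho_u = np.zeros(W.shape[0])
    z = np.zeros(N)
    rho_z = np.zeros(N)
    for _ in range(n_iter):
        # x-update, (20)
        x = A_inv @ (Phi_t_y - D.T @ rho_z + c_z * (D.T @ z)
                     - W.T @ (rho_u - c_u * u))
        # u-update, (21): norm shrinkage of each overlapping group
        v = W @ x + rho_u / c_u
        for i in range(len(Ws)):
            s = v[offs[i]:offs[i + 1]]
            ns = np.linalg.norm(s)
            u[offs[i]:offs[i + 1]] = (max(ns - lam_g / c_u, 0.0) / ns * s
                                      if ns > 0.0 else 0.0)
        # z-update as in (13); multiplier updates as in (19) and (9)
        w = D @ x + rho_z / c_z
        z = np.sign(w) * np.maximum(np.abs(w) - lam_f / c_z, 0.0)
        rho_u += c_u * (W @ x - u)
        rho_z += c_z * (D @ x - z)
    return x
```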
III. SIMULATIONS

In this section, we present some simulation results to compare the performance of our proposed algorithms. We compare the performance of the SGF-LASSO, the LGF-LASSO and the G-LASSO. We consider a random test signal of length $N = 140$, which is composed of a couple of blocks of exponentially decaying elements, a step signal block and a lone small group of nonzero elements, along with multiple zero blocks. Such a mixture is typical of smooth signals. The noise variance has been set to $\sigma_v^2 = 0.\ldots$ The signal is sensed through the measurement matrix $\Phi$, which has been drawn from a zero-mean Gaussian distribution with variance $1/M$. We have further orthogonalized the rows of the matrix $\Phi$.

The penalty parameters for the simulations have been taken as $\lambda_e = 0.\ldots$, $\lambda_g = 5.\ldots$ and $\lambda_f = 3.\ldots$ In general, these parameters can be selected from a given range in a cross-validation manner, by varying one of the parameters and keeping the others fixed [7]. Further, since all of these parameters are sparsity promoting, and can possibly affect each other, it is expected that the search for the optimal set of parameters can be restricted to a smaller range. The parameters $c_u$ and $c_z$ are positive numbers and may affect the convergence rate. We take them as $c_u = c_z = 2$. As initial conditions, the vectors $x^{[0]}$, $u^{[0]}$, $z^{[0]}$, $\rho_u^{[0]}$, $\rho_z^{[0]}$, $\tilde{u}^{[0]}$ and $\rho_{\tilde{u}}^{[0]}$ have all been initialized as zero vectors. Note that a least-squares solution for $x$ can also be used as a warm start to speed up convergence.

The group size for the SGF-LASSO, the LGF-LASSO and the G-LASSO has been taken as $10$. Therefore, the number of groups in the SGF-LASSO and the G-LASSO is the same, i.e., $G = 14$. For the LGF-LASSO, an overlapping factor of $K = 5$ has been used, and therefore the number of overlapping groups of size $10$ is $\tilde{G} = 27$. We use a maximum of $\ldots$ iterations for each algorithm. We have observed that a tolerance level of $10^{-\ldots}$ between consecutive updates is reached much earlier than this limit, and we therefore stop the algorithm at that stage.

Figure 2 shows the reconstruction performance of the SGF-LASSO, the LGF-LASSO and the G-LASSO when the signal was sensed with a compression ratio $\mu = 0.\ldots$ We can see that the performance of the SGF-LASSO and the LGF-LASSO is very close, and both are able to recover the smooth transitions of the original signal. The SGF-LASSO has an edge over the LGF-LASSO, as it better reconstructs even a very small group of nonzero elements. On the other hand, the performance of the G-LASSO deteriorates both in terms of smoothness and of block size. Note that, in contrast to the SGF-LASSO and the LGF-LASSO, $\lambda_g$ is the only sparsity-creating parameter for the G-LASSO. Therefore, we increase its value to $\ldots$, which is the minimum needed to recreate the actual zero blocks.

Fig. 2. Comparison of SGF-LASSO, LGF-LASSO and G-LASSO (each panel shows the original signal together with its reconstruction).

Figure 3 shows the performance comparison of the proposed algorithms through the mean squared error (MSE) metric against varying compression ratios, where $\mathrm{MSE} \triangleq \|x - \hat{x}\|_2^2 / N$. We can see that the performance improves in general with an increasing value of $\mu$, for $0.\ldots \le \mu \le 0.\ldots$ Nonetheless, the difference in performance follows the previously observed pattern. The SGF-LASSO keeps an edge over the LGF-LASSO, whereas the G-LASSO remains quite far behind. Here, we would like to mention that the SGF-LASSO has this edge over the LGF-LASSO with a lower number of groups. The LGF-LASSO can achieve improved performance by increasing the overlapping factor, but that would cause a subsequent increase in the computational complexity. A sketch of the MSE metric and of a test signal of the kind described above is given below.

Fig. 3. MSE comparison of SGF-LASSO, LGF-LASSO and G-LASSO against the compression ratio $\mu$.
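The exact blocks, amplitudes and decay rates of the paper's test signal are not specified, so the values below are purely illustrative; the MSE function matches the metric of Fig. 3:

```python
import numpy as np

def make_test_signal(N=140):
    """A hypothetical block-sparse smooth signal: two exponentially decaying
    blocks, a step block and a lone small group, amid zero blocks."""
    x = np.zeros(N)
    t = np.arange(20)
    x[10:30] = 2.0 * np.exp(-0.15 * t)     # decaying block 1 (illustrative)
    x[50:70] = 1.5 * np.exp(-0.10 * t)     # decaying block 2 (illustrative)
    x[85:105] = 1.0                        # step block (illustrative)
    x[120:123] = 0.8                       # lone small group (illustrative)
    return x

def mse(x, x_hat):
    """MSE = ||x - x_hat||_2^2 / N, the metric of Fig. 3."""
    return np.sum((x - x_hat) ** 2) / x.size
```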
IV. CONCLUSIONS
In this letter, we have proposed two new LASSO formulations, namely, the sparse group fused LASSO and the latent group fused LASSO. The former uses element-wise sparsity, group sparsity (over disjoint groups) and fusion penalties, whereas the latter combines the fusion penalty with a latent group penalty. Both formulations can be used to reconstruct smooth signals from their compressed measurements. We also provide low-complexity solvers for the proposed formulations, based on the alternating direction method of multipliers. We compared the performance of our proposed algorithms with the standard group LASSO on a smooth test signal. The simulation results confirm the better reconstruction performance of the proposed algorithms compared to the group LASSO. Similar results were obtained for the mean squared error metric, for varying compression ratios.

REFERENCES
[1] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, April 2006.
[2] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489-509, February 2006.
[3] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM Review, vol. 43, no. 1, 2001.
[4] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B, vol. 58, no. 1, pp. 267-288, 1996.
[5] J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Transactions on Information Theory, vol. 53, no. 12, December 2007.
[6] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society, Series B, vol. 68, no. 1, pp. 49-67, 2006.
[7] J. Friedman, T. Hastie, and R. Tibshirani, "A note on the group lasso and a sparse group lasso," Technical report, Department of Statistics, Stanford University, 2010.
[8] R. Tibshirani, S. Rosset, J. Zhu, and K. Knight, "Sparsity and smoothness via the fused lasso," Journal of the Royal Statistical Society, Series B, vol. 67, pp. 91-108, 2005.
[9] G. Obozinski, L. Jacob, and J.-P. Vert, "Group lasso with overlaps: the latent group lasso approach," Technical report, 2011.
[10] I. W. Selesnick and P.-Y. Chen, "Total variation denoising with overlapping group sparsity," in Proc. IEEE ICASSP, 2013.
[11] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, 1989.