Sparsity Based Methods for Overparameterized Variational Problems
Raja Giryes*, Michael Elad**, and Alfred M. Bruckstein**

*Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina, 27708, USA. [email protected]
**Department of Computer Science, The Technion - Israel Institute of Technology, Haifa, 32000, Israel. {elad, freddy}@cs.technion.ac.il

Abstract—Two complementary approaches have been extensively used in signal and image processing, leading to novel results: the sparse representation methodology and the variational strategy. Recently, a new sparsity based model has been proposed, the cosparse analysis framework, which may potentially help in bridging sparse approximation based methods to the traditional total-variation minimization. Based on this, we introduce a sparsity based framework for solving overparameterized variational problems. The latter has been used to improve the estimation of optical flow and also for general denoising of signals and images. However, the recovery of the space varying parameters involved was not adequately addressed by traditional variational methods. We first demonstrate the efficiency of the new framework for one dimensional signals in recovering a piecewise linear and polynomial function. Then, we illustrate how the new technique can be used for denoising and segmentation of images.
I. INTRODUCTION
Many successful signal and image processing techniques rely on the fact that the given signals or images of interest belong to a class described by a certain a priori known model. Given the model, the signal is processed by estimating the "correct" parameters of the model. For example, in the sparsity framework the assumption is that the signals belong to a union of low dimensional subspaces [1], [2], [3], [4]. In the variational strategy, a model is imposed on the variations of the signal, e.g., its derivatives are required to be smooth [5], [6], [7], [8].

Though both sparsity-based and variational-based approaches are widely used for signal processing and computer vision, they are often viewed as two different methods with little in common between them. One of the well known variational tools is the total-variation (TV) regularization, used mainly for denoising and inverse problems. It can be formulated as [6]

$$\min_{\tilde{f}} \left\| g - M\tilde{f} \right\|_2^2 + \lambda \left\| \nabla \tilde{f} \right\|_1, \qquad (1)$$

where $g = Mf + e \in \mathbb{R}^m$ are the given noisy measurements, $M \in \mathbb{R}^{m \times d}$ is a measurement matrix, $e \in \mathbb{R}^m$ is an additive (typically white Gaussian) noise, $\lambda$ is a regularization parameter, $f \in \mathbb{R}^d$ is the original unknown signal to be recovered, and $\nabla f$ is its gradients vector.

The anisotropic version of (1), which we will use in this work, is

$$\min_{\tilde{f}} \left\| g - M\tilde{f} \right\|_2^2 + \lambda \left\| \Omega_{\mathrm{DIF}} \tilde{f} \right\|_1, \qquad (2)$$

where $\Omega_{\mathrm{DIF}}$ is the finite-difference operator that returns the derivatives of the signal. In the 1D case it applies the filter $[1, -1]$, i.e.,

$$\Omega_{\mathrm{DIF}} = \Omega_{\mathrm{1D}} = \begin{bmatrix} 1 & -1 & & & \\ & 1 & -1 & & \\ & & \ddots & \ddots & \\ & & & 1 & -1 \end{bmatrix}. \qquad (3)$$

For images it returns the horizontal and vertical derivatives using the filters $[1, -1]$ and $[1, -1]^T$ respectively. Note that for one dimensional signals there is no difference between (1) and (2), as the gradient equals the derivative. However, in the 2D case the first (Eq. (1)) considers the sum of gradients (square root of the squared sum of the directional derivatives), while the second (Eq. (2)) considers the absolute sum of the directional derivatives, approximated by finite differences.

Recently, a very interesting connection has been drawn between the total-variation minimization problem and the sparsity model. It has been shown that (2) can be viewed as an $\ell_1$-relaxation technique for approximating signals that are sparse in their derivatives domain, i.e., after applying the operator $\Omega_{\mathrm{DIF}}$ on them [9], [10], [11]. Such signals are said to be cosparse under the operator $\Omega_{\mathrm{DIF}}$ in the analysis (co)sparsity model [10].

Notice that the TV regularization is only one example from the variational framework. Another recent technique, which is the focus of this paper, is the overparameterization idea, which represents the signal as a combination of known functions weighted by space-variant parameters of the model [12], [13].

Let us introduce this overparameterized model via an example. If a 1D signal $f$ is known to be piecewise linear, its $i$-th element can be written as $f(i) = a(i) + b(i) \cdot i$, where $a(i)$ and $b(i)$ are the local coefficients describing the local line-curve. As such, the vectors $a$ and $b$ should be piecewise constant, with discontinuities in the same locations. Each constant interval in $a$ and $b$ corresponds to one linear segment in $f$. When put in matrix-vector notation, $f$ can be written alternatively as

$$f = a + Zb, \qquad (4)$$

where $Z \in \mathbb{R}^{d \times d}$ is a diagonal matrix with the values $1, 2, \ldots, d$ on its main diagonal.
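To make the notation concrete, here is a small, hypothetical numerical sketch (ours, not taken from the paper) of the operator $\Omega_{\mathrm{DIF}}$ of Eq. (3) and the linear overparameterization $f = a + Zb$ of Eq. (4); it shows that the coefficient vectors of a piecewise linear signal are jointly sparse under $\Omega_{\mathrm{DIF}}$:

```python
import numpy as np

# 1D finite-difference operator of Eq. (3): row i applies the filter [1, -1].
d = 8
Omega = np.zeros((d - 1, d))
for i in range(d - 1):
    Omega[i, i], Omega[i, i + 1] = 1.0, -1.0

# A piecewise linear signal with two segments: a (local intercepts) and
# b (local slopes) are piecewise constant with a shared discontinuity.
a = np.array([0., 0., 0., 0., 6., 6., 6., 6.])
b = np.array([1., 1., 1., 1., -1., -1., -1., -1.])
Z = np.diag(np.arange(1, d + 1).astype(float))   # diag(1, ..., d)
f = a + Z @ b                                    # Eq. (4)

# Omega @ a and Omega @ b each have a single non-zero, at the same
# location: the shared discontinuity (joint sparsity).
print(np.nonzero(Omega @ a)[0], np.nonzero(Omega @ b)[0])
```

Both difference vectors are non-zero only at the index where the two linear segments meet, which is exactly the structure exploited later in the paper.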
For images this parameterization would similarly be $f(i,j) = a(i,j) + b_1(i,j) \cdot i + b_2(i,j) \cdot j$.

This strategy is referred to as "overparameterization" because the number of representation parameters is larger than the signal size. In the above 1D example, while the original signal contains $d$ unknown values, the recovery problem that seeks $a$ and $b$ has twice as many variables. Clearly, there are many other parameterization options for signals, beyond the linear one. Such parameterizations have been shown to improve the denoising performance of the solution of the problem posed in (1) in some cases [12], and to provide very high quality results for optical flow estimation [13], [14].

A. Our Contribution
The true force behind overparameterization is that while it uses more variables than needed for representing the signal, these are often more naturally suited to describe its structure. For example, if a signal is piecewise linear then we may impose a constraint on the overparameterization coefficients $a$ and $b$ to be piecewise constant.

Note that piecewise constant signals are sparse under the $\Omega_{\mathrm{DIF}}$ operator. Therefore, for each of the coefficients we can use the tools developed in the analysis sparsity model [11], [15], [16], [17], [18]. However, in our case $a$ and $b$ are jointly sparse, i.e., their change points are collocated, and therefore an extension is necessary.

Constraints on the structure of the sparsity pattern of a representation have already been analyzed in the literature. They are commonly referred to as joint sparsity models, and are found in the literature both in the context of handling groups of signals [19], [20], [21], [22], [23], [24], and when considering blocks of non-zeros in a single representation vector [25], [26], [27], [28], [29]. We use these tools to extend the existing analysis techniques to handle the block sparsity in our overparameterized scheme.

In this paper we introduce a general sparsity based framework for solving overparameterized variational problems. As the structure of these problems enables segmentation while recovering the signal, we provide an elegant way for recovering a signal from its deteriorated measurements by using an $\ell_0$ approach, which is accompanied by theoretical guarantees. We demonstrate the efficiency of the new framework for one dimensional functions in recovering piecewise polynomial signals. Then we shift our view to images and demonstrate how the new approach can be used for denoising and segmentation.

B. Organization
This paper is organized as follows: In Section II we present the overparameterized variational model in more detail. In Section III we briefly describe the synthesis and analysis sparsity models. In Sections IV and V we introduce a new framework for solving overparameterized variational problems using sparsity. Section IV proposes a recovery strategy for the 1D polynomial case based on the SSCoSaMP technique with optimal projections [30], [31], [32]. We provide stable recovery guarantees for this algorithm for the case of an additive adversarial noise, and denoising guarantees for the case of a zero-mean white Gaussian noise. In Section V we extend our scheme beyond the 1D case to higher dimensional polynomial functions such as images. We employ an extension of the GAPN algorithm [33] for block sparsity for this task. In Section VI we present experiments for linear overparameterization of images and one dimensional signals. We demonstrate how the proposed method can be used for image denoising and segmentation. Section VII concludes our work and proposes future directions of research.

II. THE OVERPARAMETERIZED VARIATIONAL FRAMEWORK
Considering again the linear relationship between the measurements and the unknown signal,

$$g = Mf + e, \qquad (5)$$

note that without prior knowledge on $f$ we cannot recover it from $g$ if $m < d$ or $e \neq 0$.

In the variational framework, a regularization is imposed on the variations of the signal $f$. One popular strategy for recovering the signal in this framework is by solving the following minimization problem:

$$\min_{\tilde{f}} \left\| g - M\tilde{f} \right\|_2^2 + \lambda \left\| A(\tilde{f}) \right\|_p, \qquad (6)$$

where $\lambda$ is the regularization weight and $p \geq 1$ is the type of norm used with the regularization operator $A$, which is typically a local operator. For example, for $p = 1$ and $A = \nabla$ we get the TV minimization (Eq. (2)). Another example of a regularization operator is the Laplace operator $A = \nabla^2$. Other types of regularization operators and variational formulations can be found in [34], [35], [36], [37].

Recently, the overparameterized variational framework has been introduced as an extension to the traditional variational methodology [12], [13], [14], [38]. Instead of applying a regularization on the signal itself, it is applied on the coefficients of the signal under a global parameterization of the space. Each element of the signal can be modeled as $f(i) = \sum_{j=1}^n b_j(i) x_j(i)$, where $\{b_j\}_{j=1}^n$ are the coefficients vectors and $\{x_j\}_{j=1}^n$ contain the parameterization basis functions for the space.

Denoting by $X_i \triangleq \mathrm{diag}(x_i)$ the diagonal matrix that has the vector $x_i$ on its diagonal, we can rewrite the above as $f = \sum_{j=1}^n X_j b_j$.
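As a hypothetical illustration of this notation (the sizes and coefficient values below are ours, not the paper's), the following sketch builds $X_j = \mathrm{diag}(x_j)$ for a degree-2 polynomial parameterization of a 1D signal and assembles $f = \sum_j X_j b_j$:

```python
import numpy as np

# Basis vectors x_1 = (1, ..., 1), x_2 = (1, ..., d), x_3 = (1, ..., d)^2,
# and the corresponding diagonal matrices X_j = diag(x_j).
d = 6
t = np.arange(1, d + 1, dtype=float)
X = [np.diag(np.ones(d)), np.diag(t), np.diag(t ** 2)]

# Constant coefficient vectors reproduce an ordinary polynomial;
# piecewise constant ones would give a piecewise polynomial.
coeffs = [2.0, -1.0, 0.5]                    # f(i) = 2 - i + 0.5 i^2
b = [c * np.ones(d) for c in coeffs]
f = sum(Xj @ bj for Xj, bj in zip(X, b))     # f = sum_j X_j b_j
```

Making the vectors $b_j$ piecewise constant instead of constant is exactly what turns this global model into the overparameterized one.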
With these notations, the overparameterized minimization problem becomes

$$\min_{\tilde{b}_i,\, 1 \leq i \leq n} \left\| g - M \sum_{i=1}^n X_i \tilde{b}_i \right\|_2^2 + \sum_{i=1}^n \lambda_i \left\| A_i(\tilde{b}_i) \right\|_{p_i}, \qquad (7)$$

where each coefficient $b_i$ is regularized separately by the operator $A_i$ (which can be the same one for all the coefficients). We note that it is possible to have more than one regularization for each coefficient, as practiced in [38].
Returning to the example of a linear overparameterization, we have that $f = a + Zb$, where in this case $X_1 = I$ (the identity matrix) and $X_2 = Z = \mathrm{diag}(1, \ldots, d)$, a diagonal matrix with $1, \ldots, d$ on its diagonal. If $f$ is a piecewise linear function then the coefficients vectors $a$ and $b$ should be piecewise constant, and therefore it would be natural to regularize these coefficients with the gradient operator. This leads to the following minimization problem:

$$\min_{\tilde{a}, \tilde{b}} \left\| g - M\left( \tilde{a} + Z\tilde{b} \right) \right\|_2^2 + \lambda_1 \left\| \nabla \tilde{a} \right\|_1 + \lambda_2 \left\| \nabla \tilde{b} \right\|_1, \qquad (8)$$

which is a special case of (7). The two main advantages of using the overparameterized formulation are these: (i) the new unknowns have a simpler form (e.g. a piecewise linear signal is treated by piecewise constant unknowns), and thus are easier to recover; and (ii) this formulation leads to recovering the parameters of the signal along with the signal itself.

The overparametrization idea, as introduced in [12], [13], [14], [38], builds upon the vast work in signal processing that refers to variational methods. As such, there are no known guarantees for the quality of the recovery of the signal when using the formulation posed in (8) or its variants. Moreover, it has been shown in [38] that even for the case of $M = I$ (and, obviously, $e \neq 0$), a poor recovery is achieved in recovering $f$ and its parameterization coefficients. Note that the same happens even if more sophisticated regularizations are combined and applied on $a$, $b$, and eventually on $f$ [38].

This leads us to look for another strategy to approach the problem of recovering a piecewise linear function from its deteriorated measurement $g$. Before describing our new scheme, we introduce in the next section the sparsity model that will aid us in developing this alternative strategy.

III. THE SYNTHESIS AND ANALYSIS SPARSITY MODELS
A popular prior for recovering a signal $f$ from its distorted measurements (as posed in (5)) is the sparsity model [1], [2]. The idea behind it is that if we know a priori that $f$ resides in a union of low dimensional subspaces, which do not intersect trivially with the null space of $M$, then we can estimate $f$ stably by selecting the signal that belongs to this union of subspaces and is the closest to $g$ [3], [4].

In the classical sparsity model, the signal $f$ is assumed to have a sparse representation $\alpha$ under a given dictionary $D$, i.e., $f = D\alpha$, $\|\alpha\|_0 \leq k$, where $\|\cdot\|_0$ is the $\ell_0$ pseudo-norm that counts the number of non-zero entries in a vector, and $k$ is the sparsity of the signal. Note that each low dimensional subspace in the standard sparsity model, known also as the synthesis model, is spanned by a collection of $k$ columns from $D$. With this model we can recover $f$ by solving

$$\min_{\alpha} \left\| g - MD\alpha \right\|_2^2 \quad \text{s.t.} \quad \|\alpha\|_0 \leq k, \qquad (9)$$

if $k$ is known, or

$$\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad \left\| g - MD\alpha \right\|_2 \leq \|e\|_2, \qquad (10)$$

if we have information about the energy of the noise $e$. Obviously, once we get $\alpha$, the desired recovered signal is simply $D\alpha$. As both of these minimization problems are NP-hard [39], many approximation techniques have been proposed to approximate their solution, accompanied with recovery guarantees that depend on the properties of the matrices $M$ and $D$. These include $\ell_1$-relaxation [40], [41], [42], known also as LASSO [43], matching pursuit (MP) [44], orthogonal matching pursuit (OMP) [45], [46], compressive sampling matching pursuit (CoSaMP) [47], subspace pursuit (SP) [48], iterative hard thresholding (IHT) [49] and hard thresholding pursuit (HTP) [50].

Another framework for modeling a union of low dimensional subspaces is the analysis one [10], [40]. This model considers the behavior of $\Omega f$, the signal after applying a given operator $\Omega$ on it, and assumes that this vector is sparse.
Note that here the zeros are those that characterize the subspace in which $f$ resides, as each zero in $\Omega f$ corresponds to a row in $\Omega$ to which $f$ is orthogonal. Therefore, $f$ resides in a subspace orthogonal to the one spanned by these rows. We say that $f$ is cosparse under $\Omega$ with a cosupport $\Lambda$ if $\Omega_\Lambda f = 0$, where $\Omega_\Lambda$ is a sub-matrix of $\Omega$ with the rows corresponding to the set $\Lambda$.

The analysis variants of (9) and (10) for estimating $f$ are

$$\min_{\tilde{f}} \left\| g - M\tilde{f} \right\|_2^2 \quad \text{s.t.} \quad \left\| \Omega\tilde{f} \right\|_0 \leq k, \qquad (11)$$

where $k$ is the number of non-zeros in $\Omega f$, and

$$\min_{\tilde{f}} \left\| \Omega\tilde{f} \right\|_0 \quad \text{s.t.} \quad \left\| g - M\tilde{f} \right\|_2 \leq \|e\|_2. \qquad (12)$$

As in the synthesis case, these minimization problems are also NP-hard [10], and approximation techniques have been proposed, including Greedy Analysis Pursuit (GAP) [10], GAP noise (GAPN) [33], analysis CoSaMP (ACoSaMP), analysis SP (ASP), analysis IHT (AIHT) and analysis HTP (AHTP) [11].

IV. OVERPARAMETERIZATION VIA THE ANALYSIS SPARSITY MODEL
With the sparsity models now defined, we revisit the overparameterization variational problem. If we know that our signal $f$ is piecewise linear, then it is clear that the coefficients parameters should be piecewise constant with the same discontinuity locations when linear overparameterization is used. We denote by $k$ the number of these discontinuity locations.

As a reminder, we rewrite $f = [I, Z] \left[ a^T, b^T \right]^T$. Note that $a$ and $b$ are jointly sparse under $\Omega_{\mathrm{DIF}}$, i.e., $\Omega_{\mathrm{DIF}} a$ and $\Omega_{\mathrm{DIF}} b$ have the same non-zero locations. With this observation we can extend the analysis minimization problem (11) to support the structured sparsity in the vector $\left[ a^T, b^T \right]^T$, leading to the following minimization problem:

$$\min_{a, b} \left\| g - M[I, Z] \begin{bmatrix} a \\ b \end{bmatrix} \right\|_2^2 \quad \text{s.t.} \quad \big\| \left| \Omega_{\mathrm{DIF}} a \right| + \left| \Omega_{\mathrm{DIF}} b \right| \big\|_0 \leq k, \qquad (13)$$

where $|\Omega_{\mathrm{DIF}} a|$ denotes applying an element-wise absolute value on the entries of $\Omega_{\mathrm{DIF}} a$.

Note that we can have a similar formulation for this problem also in the synthesis framework, using the Heaviside dictionary

$$D_{\mathrm{HS}} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ & 1 & \cdots & 1 \\ & & \ddots & \vdots \\ & & & 1 \end{bmatrix}, \qquad (14)$$

whose atoms are step functions of different lengths. We use the known observation that every one dimensional signal with $k$ change points can be sparsely represented using $k + 1$ atoms from $D_{\mathrm{HS}}$ ($k$ columns for representing the change points plus one for the DC). One way to observe this is by the fact that $\Omega_{\mathrm{DIF}} \tilde{D}_{\mathrm{HS}} = I$, where $\tilde{D}_{\mathrm{HS}}$ is a submatrix of $D_{\mathrm{HS}}$ obtained by removing the last column of $D_{\mathrm{HS}}$ (the DC component). Therefore, one may recover the coefficient parameters $a$ and $b$, via their sparse representations $\alpha$ and $\beta$, by solving

$$\min_{\alpha, \beta} \left\| g - M[I, Z] \begin{bmatrix} D_{\mathrm{HS}} & \\ & D_{\mathrm{HS}} \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \right\|_2^2 \quad \text{s.t.} \quad \big\| |\alpha| + |\beta| \big\|_0 \leq k, \qquad (15)$$

where $a = D_{\mathrm{HS}} \alpha$ and $b = D_{\mathrm{HS}} \beta$.
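The identity $\Omega_{\mathrm{DIF}} \tilde{D}_{\mathrm{HS}} = I$ and the $(k+1)$-atom representation can be checked numerically. The following is a hypothetical sketch (our own construction, with the step atoms oriented to match the $[1, -1]$ difference convention above):

```python
import numpy as np

d = 6
# 1D finite differences: row i applies the filter [1, -1].
Omega = np.eye(d - 1, d) - np.eye(d - 1, d, 1)

# Heaviside dictionary: column j is a step with ones on entries 0..j;
# the last (all-ones) column is the DC atom.
D_HS = np.triu(np.ones((d, d)))
D_tilde = D_HS[:, :-1]                 # drop the DC atom
print(np.allclose(Omega @ D_tilde, np.eye(d - 1)))  # Omega_DIF D~_HS = I

# A signal with one change point (k = 1) uses k + 1 = 2 atoms:
alpha = np.zeros(d)
alpha[2] = 3.0                         # step atom over entries 0..2
alpha[-1] = 1.0                        # DC atom
f = D_HS @ alpha                       # piecewise constant: [4,4,4,1,1,1]
```

The high mutual coherence of these step atoms, visible already in this tiny example, is what undermines the classical synthesis guarantees discussed next.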
This minimization problem can be approximated using block-sparsity techniques such as the group-LASSO estimator [25], the mixed $\ell_2/\ell_1$ relaxation (an extension of the $\ell_1$ relaxation) [26], [27], the Block OMP (BOMP) algorithm [28], or the extensions of CoSaMP and IHT for structured sparsity [29]. The joint sparsity framework can also be used with (15) [19], [20], [21], [22], [23], [24].

The problem with the above synthesis techniques is twofold: (i) no recovery guarantees exist for this formulation with the dictionary $D_{\mathrm{HS}}$; (ii) it is hard to generalize the model in (15) to higher order signals, e.g., images.

The reason that no theoretical guarantees are provided for the $D_{\mathrm{HS}}$ dictionary is the high correlation between its columns. These create high ambiguity, causing the classical synthesis techniques to fail in recovering the representations $\alpha$ and $\beta$. This problem has been addressed in several contributions that have treated the signal directly and not its representation [30], [31], [32], [51], [52], [53].

We introduce an algorithm that approximates the solutions of both (9) and (11) and has theoretical reconstruction performance guarantees for one dimensional functions $f$ with matrices $M$ that are near isometric for piecewise polynomial functions. In the next section we shall present another algorithm that does not have such guarantees but is generalizable to higher order functions.

Though till now we have restricted our discussion only to piecewise linear functions, we turn now to look at the more general case of piecewise 1D polynomial functions of degree $n$. Note that this method approximates the following minimization problem, which is a generalization of (13) to any polynomial of degree $n$:

$$\min_{b_0, b_1, \ldots, b_n} \left\| g - M\left[ I, Z, Z^2, \ldots, Z^n \right] \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_n \end{bmatrix} \right\|_2^2 \quad \text{s.t.} \quad \left\| \sum_{i=0}^n \left| \Omega_{\mathrm{DIF}} b_i \right| \right\|_0 \leq k, \qquad (16)$$

where $|\Omega_{\mathrm{DIF}} b_i|$ is an element-wise operation that calculates the absolute value of each entry in $\Omega_{\mathrm{DIF}} b_i$.

We employ the signal space CoSaMP (SSCoSaMP) strategy [30], [31] to approximate the solution of (16). This algorithm assumes the existence of a projection that, for a given signal, finds its closest signal (in the $\ell_2$-norm sense) that belongs to the model, where in our case the model is piecewise polynomial functions with $k$ jump points. This algorithm, along with the required projection, is presented in Appendix A.

A. Recovery Guarantees for Piecewise Polynomial Functions
To provide theoretical guarantees for the recovery by SSCoSaMP we employ two theorems from [31] and [32]. These lead to reconstruction error bounds for SSCoSaMP that guarantee stable recovery if the noise is adversarial, and an effective denoising effect if it is zero-mean white Gaussian. Both theorems rely on the following property of the measurement matrix $M$, which is a special case of the $D$-RIP [18] and the $\Omega$-RIP [11].

Definition 4.1:
A matrix $M$ has a polynomial restricted isometry property of order $n$ ($P_n$-RIP) with a constant $\delta_k$ if for any piecewise polynomial function $f$ of order $n$ with $k$ jumps we have

$$(1 - \delta_k) \|f\|_2^2 \leq \|Mf\|_2^2 \leq (1 + \delta_k) \|f\|_2^2. \qquad (17)$$

Having the $P_n$-RIP definition, we turn to present the first theorem, which treats the adversarial noise case.

Theorem 4.2 (Based on Corollary 3.2 in [31]): Let $f$ be a piecewise polynomial function of order $n$, let $e$ be an adversarial bounded noise, and let $M$ satisfy the $P_n$-RIP (17) with a small enough constant $\delta_k$. Then after a finite number of iterations, SSCoSaMP yields

$$\left\| \hat{f} - f \right\|_2 \leq C \|e\|_2, \qquad (18)$$

where $C > 1$ is a constant depending on $\delta_k$.

Note that the above theorem implies that we may compressively sense piecewise polynomial functions and achieve a perfect recovery in the noiseless case $e = 0$. Note also that if $M$ is a subgaussian random matrix then it is sufficient to use only $m = O(k(n + \log(d)))$ measurements [3], [11].

Though the above theorem is important for compressed sensing, it does not guarantee noise reduction, even for the case $M = I$, as $C > 1$. The reason for this is that the noise here is adversarial, leading to a worst-case bound. By introducing a random distribution for the noise, one may get better reconstruction guarantees.

(Footnote: In a very similar way we could have used the analysis CoSaMP (ACoSaMP) [11], [16].)
(Footnote: Note that in [30], [31] the projection might be allowed to be near-optimal, in the sense that the projection error is close to the optimal error up to a multiplicative constant.)

Fig. 1. Recovery of a piecewise linear function using the BGAPN algorithm with and without a constraint on the continuity. (Panels: noisy function, function recovery, and coefficients parameters recovery, at two noise levels.)
The following theorem assumes that the noise is randomly Gaussian distributed, this way enabling us to provide effective denoising guarantees.

Theorem 4.3 (Based on Theorem 1.7 in [32]): Assume the conditions of Theorem 4.2, where now $e$ is a random zero-mean white Gaussian noise with variance $\sigma^2$. Then after a finite number of iterations, SSCoSaMP yields

$$\left\| \hat{f} - f \right\|_2 \leq C \sqrt{(1 + \delta_k)\, 3k\, (1 + \beta) \log(nd)}\; \sigma, \qquad (19)$$

with probability exceeding $1 - (nd)^{-\beta}$.

The bound in the theorem can be given on the expected error, instead of being given only with high probability, using the proof technique in [54]. We remark that if we were given an oracle that foreknows the locations of the jumps in the parameterization, the error we would get would be $O(\sqrt{k}\sigma)$. As the $\log(nd)$ factor in our bound is inevitable [55], we may conclude that our guarantee is optimal up to a constant factor.

V. SPARSITY BASED OVERPARAMETERIZED VARIATIONAL ALGORITHM FOR HIGH DIMENSIONAL FUNCTIONS
We now turn to generalize the model in (13) to support other overparameterization forms, including higher dimensional functions such as images. We consider the case where an upper bound for the noise energy is given rather than the sparsity $k$, as is common in many applications. Notice that for the synthesis model such a generalization is not trivial, because while it is easy to extend the $\Omega_{\mathrm{DIF}}$ operator to high dimensions, it is not clear how to do this for the Heaviside dictionary. Therefore we consider an overparameterized version of (12), where the noise energy is known and the analysis model is used. Let $X_1, \ldots, X_n$ be matrices of the space variables and $b_1, \ldots, b_n$ their coefficients parameters. For example, in a 2D (image) case of a piecewise linear function, $X_1$ will be the identity matrix, $X_2$ will be a diagonal matrix with the values $[1, 2, \ldots, d, 1, 2, \ldots, d, \ldots, 1, 2, \ldots, d]$ on its main diagonal, and $X_3$ will similarly be a diagonal matrix with $[1, 1, \ldots, 1, 2, 2, \ldots, 2, \ldots, d, d, \ldots, d]$ on its main diagonal. Assuming that all the coefficient parameters are jointly sparse under a general operator $\Omega$, we may recover these coefficients by solving

$$\left[ \hat{b}_1^T, \ldots, \hat{b}_n^T \right]^T = \operatorname*{argmin}_{\tilde{b}_1, \ldots, \tilde{b}_n} \left\| \sum_{i=1}^n \left| \Omega \tilde{b}_i \right| \right\|_0 \quad \text{s.t.} \quad \left\| g - M[X_1, \ldots, X_n] \begin{bmatrix} \tilde{b}_1 \\ \vdots \\ \tilde{b}_n \end{bmatrix} \right\|_2 \leq \|e\|_2. \qquad (20)$$

Having an estimate for all these coefficients, our approximation of the original signal $f$ is $\hat{f} = [X_1, \ldots, X_n] \left[ \hat{b}_1^T, \ldots, \hat{b}_n^T \right]^T$.

Fig. 2. Recovery of a piecewise second-order polynomial function using the BGAPN algorithm with and without a constraint on the continuity. (Panels: noisy function, function recovery, and coefficients parameters recovery, at two noise levels.)

As the minimization problem in (20) is NP-hard, we suggest to solve it by a generalization of the GAPN algorithm [33] -- the block GAPN (BGAPN). We introduce this extension in Appendix B. This algorithm aims at finding, in a greedy way, the rows of $\Omega$ that are orthogonal to the space variables $b_1, \ldots, b_n$. Notice that once we find the indices of these rows, i.e., the set $\Lambda$ that satisfies $\Omega_\Lambda b_i = 0$ for $i = 1, \ldots, n$ ($\Omega_\Lambda$ is the submatrix of $\Omega$ with the rows corresponding to the set $\Lambda$), we may approximate $b_1, \ldots, b_n$ by solving

$$\left[ \hat{b}_1^T, \ldots, \hat{b}_n^T \right]^T = \operatorname*{argmin}_{\tilde{b}_1, \ldots, \tilde{b}_n} \sum_{i=1}^n \left\| \Omega_\Lambda \tilde{b}_i \right\|_2^2 \quad \text{s.t.} \quad \left\| g - M[X_1, \ldots, X_n] \begin{bmatrix} \tilde{b}_1 \\ \vdots \\ \tilde{b}_n \end{bmatrix} \right\|_2 \leq \|e\|_2. \qquad (21)$$

Therefore, BGAPN approximates $b_1, \ldots, b_n$ by finding $\Lambda$ first. It starts with a $\Lambda$ that includes all the rows of $\Omega$, and then gradually removes elements from it by solving the problem posed in (21) at each iteration and then finding the row in $\Omega$ that has the largest correlation with the current temporary solution $\left[ \hat{b}_1^T, \ldots, \hat{b}_n^T \right]^T$.

Note that there are no known recovery guarantees for BGAPN of the form we have had for SSCoSaMP before. Therefore, we demonstrate its efficiency in several experiments in the next section. As explained in Appendix B, the advantages of BGAPN over SSCoSaMP, despite the lack of theoretical guarantees, are that (i) it does not need $k$ to be foreknown, and (ii) it is easier to use with higher dimensional functions.

Before we move to the next section, we note that one of the advantages of the above formulation and the BGAPN algorithm is the relative ease of adding new constraints to it. For example, we may encounter piecewise polynomial functions that are also continuous. However, we do not have such a continuity constraint in the current formulation.
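To convey the flavor of this greedy cosupport pruning, here is a much-simplified, hypothetical sketch (ours, not the algorithm of Appendix B): it handles a single coefficient vector with $M = I$, replaces the constrained step (21) with a quadratically penalized least-squares surrogate, and prunes one row of $\Omega$ per iteration:

```python
import numpy as np

def bgapn_sketch(g, Omega, n_remove, lam=100.0):
    """Greedy cosupport pruning: solve a penalized least-squares
    surrogate of (21), then drop the row of Omega with the largest
    response to the current estimate (a detected jump)."""
    Lam = list(range(Omega.shape[0]))      # start with all rows of Omega
    for _ in range(n_remove):
        O = Omega[Lam]
        # argmin_b ||g - b||^2 + lam * ||Omega_Lam b||^2  (closed form)
        b = np.linalg.solve(np.eye(len(g)) + lam * O.T @ O, g)
        worst = Lam[int(np.argmax(np.abs(Omega[Lam] @ b)))]
        Lam.remove(worst)                  # prune the detected jump row
    O = Omega[Lam]                         # final solve on the cosupport
    b = np.linalg.solve(np.eye(len(g)) + lam * O.T @ O, g)
    return b, Lam

d = 8
Omega = np.eye(d - 1, d) - np.eye(d - 1, d, 1)   # 1D finite differences
g = np.array([0., 0., 0., 0., 5., 5., 5., 5.])   # piecewise constant
b, Lam = bgapn_sketch(g, Omega, n_remove=1)
print(3 in Lam)   # the jump row has been removed from the cosupport
```

In this toy run the single pruned row is exactly the one straddling the jump, after which the smoothed solve reproduces the piecewise constant signal; the actual BGAPN additionally handles several jointly sparse coefficient vectors, a general $M$, and a noise-based stopping rule.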
As we shall see in the next section, the absence of such a constraint allows jumps at the discontinuity points between the polynomial segments, and therefore it is important to add it to the algorithm to get a better reconstruction.

One possibility to solve this problem is to add a continuity constraint on the jump points of the signal. In Appendix B we also present a modified version of the BGAPN algorithm that imposes such a continuity constraint, and in the next section we shall see how this handles the problem. Note that this is only one example of a constraint that one may add to the BGAPN technique. For example, in images one may add a smoothness constraint on the edges' directions.

VI. EXPERIMENTS
For demonstrating the efficiency of the proposed method we perform several tests. We start with the one dimensional case, testing our polynomial fitting approach, with and without the continuity constraint, for continuous piecewise polynomials of first and second degrees. We compare these results with the optimal polynomial approximation scheme presented in Section IV and with the variational approach in [38]. We continue with a compressed sensing experiment for discontinuous piecewise polynomials and compare BGAPN with SSCoSaMP. Then we perform some tests on images using BGAPN. We start by denoising cartoon images using the piecewise linear model. We compare our outcome with that of TV denoising [6] and show that our result does not suffer from a staircasing effect [56]. We also compare to a TV denoising version with overparameterization [12]. Then we show how our framework may be used for image segmentation, drawing a connection to the Mumford-Shah functional [57], [58]. We compare our results with the ones obtained by the popular graph-cuts based segmentation [59].

Fig. 3. MSE of the recovered piecewise linear functions (left) and piecewise second-order polynomial functions (right) as a function of the noise variance σ, for BGAPN with and without the continuity constraint and the optimal approximation with and without continuity post-processing. As a reference we compare to the non-local overparameterized TV (TVOPNL) approach introduced in [38].

Fig. 4. Recovery rate of piecewise second-order polynomial functions as a function of the sampling rate m/d for the methods BGAPN and SSCoSaMP.
A. Continuous Piecewise Polynomial Functions Denoising
In order to check the performance of the polynomial fitting, we generate random continuous piecewise-linear and second-order polynomial functions with a fixed number of samples and jumps and a bounded dynamic range. Then we contaminate the signal with white Gaussian noise with a standard deviation from the set {0.05, 0.1, 0.15, ..., 0.5}.

We compare the recovery result of BGAPN, with and without the continuity constraint, with that of the optimal approximation. Figs. 1 and 2 present BGAPN reconstruction results for the linear and second order polynomial cases respectively, for two different noise levels. It can be observed that the addition of the continuity constraint is essential for the correctness of the recovery. Indeed, without it we get jumps between the segments. Note also that the number of jumps in our recovery may differ from that of the original signal, as BGAPN does not have preliminary information about it. However, it still manages to recover the parameterization well, especially in the lower noise case.

The possibility to provide a parametric representation is one of the advantages of our method. Indeed, one may achieve good denoising results, in terms of mean squared error (MSE), without using the linear model, using methods such as free-knot splines [60]. However, the approximated function is not guaranteed to be piecewise linear, and therefore learning the change points from it is sub-optimal. See [38] and the references therein for more details.

To evaluate our method with respect to its MSE, we compare it with the optimal approximation for piecewise polynomial functions presented in Appendix A-A. Note that the target signals are continuous while this algorithm does not use this assumption.

(Footnote: We have done the same experiment with the BOMP algorithm [28], adopting the synthesis framework, with and without the continuity constraint, and observed that it performs very similarly to BGAPN.)
Therefore, we add the continuity constraint to this method as a post-processing step (unlike BGAPN, which merges it into its iterations). We take the change points it has recovered and project the noisy measurement g onto its closest continuous piecewise polynomial function with the same discontinuities.

Figure 3 presents the recovery performance of BGAPN and the projection algorithm, with and without the continuity constraint. Without the constraint, it can be observed that BGAPN achieves better recovery performance. This is due to the fact that it is not restricted to the number of change points in the initial signal; it can therefore use more points and thus adapt itself better to the signal, achieving a lower MSE. However, after adding the constraint, in the piecewise linear case the optimal projection achieves a better recovery error. The reason is that, as the optimal projection uses the exact number of points, it finds the change locations more accurately. Note though that in the case of second-order polynomial functions, BGAPN gets a better recovery. This happens because this program uses the continuity constraint also within its iterations and not only at the final step, as is the case with the projection algorithm. As the second-order polynomial case is more complex than the piecewise linear one, the impact of the continuity prior is higher and more significant than the information on the number of change points.

We also compare to the non-local overparameterized TV algorithm (TVOPNL) of [38], which was shown to be better for the task of line segmentation than several alternatives, including the ones reported in [12] and [13]. Clearly, our proposed scheme achieves better recovery performance than TVOPNL, demonstrating the advantage of our line segmentation strategy.

B. Compressed Sensing of Piecewise Polynomial Functions
We also perform a compressed sensing experiment, in which we compare the performance of SSCoSaMP, with the optimal projection, and BGAPN for recovering a second-order polynomial function with jumps from a small set of linear measurements. Each entry in the measurement matrix M is selected from an i.i.d. normal distribution, and then all columns are normalized to have a unit norm. The polynomial functions are selected as in the previous experiment but with two differences: (i) we omit the continuity constraint; and (ii) we normalize the signals to have a unit norm.

Fig. 4 presents the recovery rate (noiseless case, σ = 0) of each program (code provided by the authors) as a function of the number of measurements m. Note that for a very small or large number of samples BGAPN behaves better. However, in the middle range SSCoSaMP achieves a better reconstruction rate. Nonetheless, we may say that their performance is more or less the same.

Fig. 5. Denoising of swoosh using the BGAPN algorithm with and without diagonal derivatives: (a) Original Image; (b) Noisy Image, σ = 20; (c) BGAPN with Ω_DIF, PSNR = 40.09dB; (d) TV recovery, PSNR = 38.95dB; (e) TV OP recovery, PSNR = 37.41dB. Notice that we do not have the staircasing effect that appears in the TV reconstruction.
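The sampling setup described above can be reproduced in a few lines; the dimensions below are our illustrative choices, since the paper's d and m are not stated in this excerpt:

```python
import numpy as np

# Illustrative reproduction of the compressed sensing setup
# (our dimensions; not the values used in the paper).
rng = np.random.default_rng(0)
d, m = 200, 80

M = rng.standard_normal((m, d))      # i.i.d. normal entries
M /= np.linalg.norm(M, axis=0)       # normalize every column to unit norm

f = rng.standard_normal(d)
f /= np.linalg.norm(f)               # signals normalized to unit norm
g = M @ f                            # noiseless measurements (sigma = 0)
```

A recovery algorithm such as BGAPN or SSCoSaMP would then be run on (M, g) and its output compared against f.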
C. Cartoon Image Denoising
We turn to evaluate the performance of our approach on images. A piecewise smooth model is considered a good model for images, especially for ones with no texture, i.e., cartoon images [61], [62]. Therefore, we use a linear overparameterization of the two-dimensional plane and employ the two-dimensional difference operator Ω_DIF, which calculates the horizontal and vertical discrete derivatives of an image by applying the filters [1, −1] and [1, −1]^T to it. In this case, the problem in (20) becomes (notice that M = I)

[b̂^T, b̂_h^T, b̂_v^T]^T = argmin_{b̃, b̃_h, b̃_v} ‖ |Ω_DIF b̃| + |Ω_DIF b̃_h| + |Ω_DIF b̃_v| ‖   (22)
s.t. ‖ g − [X, X_h, X_v] [b̃^T, b̃_h^T, b̃_v^T]^T ‖_2 ≤ ‖e‖_2,

where X, X_h, X_v are the matrices that contain the DC, horizontal and vertical parameterizations respectively, and b̂, b̂_h, b̂_v are their corresponding space variables.

Fig. 6. Denoising of sign using the BGAPN algorithm: (a) Original Image; (b) Noisy Image, σ = 20; (c) BGAPN with Ω_DIF, PSNR = 34.02dB; (d) TV recovery, PSNR = 33.69dB; (e) TV OP recovery, PSNR = 31.83dB. The results of TV and OP-TV are presented as a reference.

Fig. 7. Denoising of house using the BGAPN algorithm: (a) Original Image; (b) Noisy Image, σ = 20; (c) BGAPN with Ω_DIF, PSNR = 30.77dB; (d) TV recovery, PSNR = 31.44dB; (e) TV OP recovery, PSNR = 30.6dB. Notice that we do not have the staircasing effect that appears in the TV reconstruction. Because our model is linear we do not recover the texture and thus we get slightly inferior results compared to TV with respect to PSNR.

We apply this scheme for denoising two cartoon images,
swoosh and sign. We compare our results with those of TV denoising [6]. Figs. 5 and 6 present the recovery of swoosh and sign from their noisy versions, contaminated with additive white Gaussian noise with σ = 20. Note that we achieve better recovery results than TV and do not suffer from its staircasing effect. (Note also that if we use a cubic overparameterization with BGAPN instead of a linear one, we get a PSNR (=31.81dB) better than that of TV.) We have tuned the parameters of TV separately for each image to optimize its output quality, while we have used the same setup for our method in all the denoising experiments. To get good quality with BGAPN, we run the algorithm several times with different sets of parameters (which are the same for all images) and then provide as an output the average image of all the runs. Notice that using this technique with TV degrades its results.

Fig. 8. Gradient map of the clean house image and our recovered image from Fig. 7: (a) Original Image Gradients; (b) Recovered Image Gradients.

To test whether our better denoising is just a result of using overparameterization or an outcome of our new framework, we also compare to TV with linear overparameterization [12]. Notice that while plugging overparameterization directly into TV improves the results in some cases [12], this is not the case with the images here. Therefore, we see that our new framework, which links sparsity with overparameterization, has an advantage over the previous approach that still acts within the variational scheme.

We could use other forms of overparameterization, such as cubic instead of planar, or add other directions of derivatives in addition to the horizontal and vertical ones. For example, one may apply our scheme using an operator that calculates the diagonal derivatives as well. Such choices may lead to an improvement in different scenarios.
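For concreteness, the operator Ω_DIF and the optional diagonal differences can be sketched with plain forward differences. The exact diagonal filters are typeset ambiguously in our source, so the two diagonal kernels below are a common choice and should be read as our assumption:

```python
import numpy as np

def omega_dif(img, diagonals=False):
    """Apply the finite-difference operator to an image: horizontal and
    vertical [1, -1] differences (forward differences, zero at the
    trailing border), optionally two diagonal differences as well.
    Returns a stack of derivative maps."""
    h = np.zeros_like(img, dtype=float)
    v = np.zeros_like(img, dtype=float)
    h[:, :-1] = img[:, :-1] - img[:, 1:]          # filter [1, -1] along rows
    v[:-1, :] = img[:-1, :] - img[1:, :]          # filter [1, -1]^T along columns
    maps = [h, v]
    if diagonals:
        d1 = np.zeros_like(img, dtype=float)
        d2 = np.zeros_like(img, dtype=float)
        d1[:-1, :-1] = img[:-1, :-1] - img[1:, 1:]   # main-diagonal difference (assumed kernel)
        d2[:-1, 1:] = img[:-1, 1:] - img[1:, :-1]    # anti-diagonal difference (assumed kernel)
        maps += [d1, d2]
    return np.stack(maps)
```

On a constant image all four maps vanish, and on a horizontal ramp only the horizontal map is nonzero, matching the intended behavior of the difference operator.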
Future work should focus on learning the overparameterizations and the types of derivatives that should be used for denoising and for other tasks. We believe that such learning has the potential to lead to state-of-the-art results.

D. Image Segmentation
As a motivation for the task of segmentation, we present the denoising of an image with texture. We continue using the model in (22) and consider the house image as an example. Fig. 7 demonstrates the denoising result we get for this image. Note that here as well we do not suffer from the staircasing effect that appears in the TV recovery. However, due to the nature of our model we lose the texture and therefore achieve an inferior PSNR compared to TV denoising (code provided by the authors). The lower PSNR of our method stems from our model being linear and therefore less capable of adapting itself to the texture. By using a cubic overparameterization we get a PSNR equal to that of TV. Note also that for larger noise magnitudes the recovery performance of our algorithm in terms of PSNR becomes better than TV even with the linear model, as in these conditions we tend to lose the texture anyway.

Though the removal of texture is not favorable for the task of denoising, it makes the recovery of salient edges in the original image easier. In Fig. 8 we present the gradient map of our recovered image and that of the original image. It can be seen that while the gradients of the original image also capture the texture changes, our method finds only the main edges. This motivates us to use our scheme for segmentation.

Since our scheme divides the image into piecewise linear regions, we can view our strategy as an approach that minimizes the Mumford-Shah functional [57], [58]. On the other hand, if the image has only two regions, our segmentation result can be viewed as a solution of the Chan-Vese functional, with the difference that we model each region by a polynomial function instead of approximating it by a constant [63]. We present our segmentation results for three images; for each we display the piecewise constant version of the image together with its boundary map. Our segmentation results appear in Figs. 9, 10 and 11.
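As an illustration of how a boundary map can be read off the gradients of a reconstruction, as in Fig. 8, one could threshold the gradient magnitude. This simplified sketch (with a free threshold tau) is ours and not the segmentation procedure of the paper:

```python
import numpy as np

def boundary_map(img, tau=0.1):
    """Binary boundary map from forward differences of an image.
    For a piecewise-linear reconstruction, the jumps at region
    boundaries dominate the small in-region slopes, so thresholding
    the gradient magnitude keeps mainly the salient edges."""
    gx = np.diff(img, axis=1, prepend=img[:, :1])  # horizontal differences
    gy = np.diff(img, axis=0, prepend=img[:1, :])  # vertical differences
    return np.hypot(gx, gy) > tau
```

On a two-region image (e.g., a dark left half and a bright right half), the map marks exactly the column where the two regions meet.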
We compare our results to the popular graph-cuts based segmentation [59]. Notice that we achieve comparable performance, where in some places our method behaves better and in others the strategy in [59] provides a better result.

Though we get a good segmentation, it is clear that there is still much room for improvement compared to the current state-of-the-art. One direction for improvement is to use more filters within Ω. Another is to calculate the gradients of the coefficient parameters, rather than of the recovered image, as they are supposed to be truly piecewise constant. We leave these ideas to future work.

VII. CONCLUSION AND FUTURE WORK
This work has presented a novel framework for solving overparameterized variational problems using sparse representations. We have demonstrated how this framework can be used both for one-dimensional and two-dimensional functions, while a generalization to higher dimensions (such as 3D) is straightforward. We have solved the problem of line fitting for piecewise polynomial 1D signals and then shown how the new technique can be used for compressed sensing, denoising and segmentation.

Though this work has focused mainly on linear overparameterizations, the extension to other forms is straightforward. However, to keep the discussion as simple as possible, we have chosen to use simple forms of overparameterization in the experiments section. As future research, we believe that a learning process should be added to our scheme. It should adapt the functions of the space variables X_1, ..., X_n and the filters in Ω to the signal at hand. We believe that this has the potential to lead to state-of-the-art results in segmentation, denoising and other signal processing tasks. Combining our scheme with the standard sparse representation approach may make it possible to support images with texture. This would lead to a scheme that works globally on the image for the cartoon part and locally for the texture part. Another route for future work is to integrate our scheme into the state-of-the-art overparameterization-based algorithm for optical flow in [13].

ACKNOWLEDGMENT
The authors would like to thank Tal Nir and Guy Rosman for providing their code for the experiments. The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Program, ERC Grant agreement no. 320649. This research is partially supported by AFOSR, ARO, NSF, ONR, and NGA. The authors would like to thank the anonymous reviewers for their helpful and constructive comments that greatly contributed to improving this paper. (We have also tested our approach in the presence of a blur operator; the edges are preserved in this case as well.)

Fig. 9. Piecewise linear version of coins image together with the segmentation result: (a) Original Image; (b) Piecewise linear version of the image; (c) Image Segmentation; (d) Image Segmentation using Graph-Cuts [59]. We compare to the popular graph-cuts based segmentation [59].

APPENDIX A
THE SSCoSaMP ALGORITHM
For approximating (16), we use a block-sparsity variant of SSCoSaMP [32] and adapt it to our model. It is presented in Algorithm 1. Due to the equivalence between D_HS and Ω_DIF, we use the latter in the algorithm.

This method uses a projection S_n(·, k) that, given a signal, finds its closest piecewise polynomial function with k jump points. We calculate this projection using dynamic programming. Our strategy is a generalization of the one that appears in [11], [64] and is presented in the next subsection. The halting criterion we use in Algorithm 1 is ‖g_r^t‖_2 ≤ ε for a given small constant ε. Other options for stopping criteria are discussed in [47].

A. Optimal Approximation using Piecewise Polynomial Functions
Our projection technique uses the fact that once the jump points are set, the optimal parameters of the polynomial in a segment [t, l] can be calculated optimally by solving the least-squares minimization problem

min_{b_0[t,l], ..., b_n[t,l]} ‖ [I[t,l], Z[t,l], ..., Z^n[t,l]] [b_0[t,l]; b_1[t,l]; ...; b_n[t,l]] − g[t,l] ‖_2^2,   (23)

where g[t,l] is the sub-vector of g supported by the indices t to l (t ≤ l) and Z^i[t,l] is the (square) sub-matrix of Z^i corresponding to the indices t to l. We denote by P_n(g[t,l]) the polynomial function we get by solving (23). Indeed, in the case that the size of the segment [t,l] is smaller than the number of parameters, e.g., a segment of size one for a linear function, the above minimization problem has infinitely many options for setting the parameters. However, all of them lead to the same result, which is keeping the values of the points in the segment, i.e., having P_n(g[t,l]) = g[t,l].

Fig. 10. Piecewise linear version of airplane image together with the segmentation result: (a) Original Image; (b) Piecewise linear version of the image; (c) Image Segmentation; (d) Image Segmentation using Graph-Cuts [59]. We compare to the popular graph-cuts based segmentation [59].

Denote by S_n(g[1, d̃], k) the optimal approximation of the signal g[1, d̃] by a piecewise polynomial function with k jumps. It can be calculated by solving the recursive minimization problem

t̂ = argmin_{1 ≤ t < d̃} ‖ S_n(g[1,t], k−1) − g[1,t] ‖_2^2 + ‖ P_n(g[t+1, d̃]) − g[t+1, d̃] ‖_2^2,   (24)

and setting

S_n(g[1, d̃], k) = [ S_n(g[1, t̂], k−1) ; P_n(g[t̂+1, d̃]) ].   (25)

The vectors S_n(g[1,t], k−1) can be calculated recursively using (24). The recursion ends with the base case S_n(g[1,t], 0) = P_n(g[1,t]).

This leads us to the following algorithm for calculating an optimal approximation for a signal g. Notice that this algorithm also provides us with the parameterization of the piecewise polynomial.

1) Calculate S_n(g[1,t], 0) = P_n(g[1,t]) for 1 ≤ t ≤ d.
2) For k̃ = 1 : k − 1 do
   • Calculate S_n(g[1, d̃], k̃) for 1 ≤ d̃ ≤ d using (24) and (25).
3) Calculate S_n(g[1, d], k) using (24) and (25).

Denoting by T the worst-case complexity of calculating P_n(g[t,l]) for any pair t, l, the complexity of step 1) is O(dT); that of step 2) is O(kd(T + d)), as the computation of the projection error is of complexity O(d); and that of step 3) is O(d(T + d)). Summing it all together, we get a total complexity of O(kd(T + d)) for the algorithm, which is polynomial since T is polynomial.

APPENDIX B
THE BLOCK GAPN ALGORITHM
For approximating (20), we extend the GAPN technique [33] to block sparsity and adapt it to our model. It is presented in Algorithm 2. Notice that this program, unlike SSCoSaMP, does not assume knowledge of k or the existence of an optimal projection onto the signals' low-dimensional union of subspaces. Note also that it suits a general form of overparameterization and not only 1D piecewise polynomial functions. It is possible to accelerate BGAPN for large-scale problems by removing several elements from the cosupport at a time, instead of one, in the update-cosupport stage.

Fig. 11. Piecewise linear version of man image together with the segmentation result: (a) Original Image; (b) Piecewise linear version of the image; (c) Image Segmentation; (d) Image Segmentation using Graph-Cuts [59]. We compare to the popular graph-cuts based segmentation [59].

Ideally, we would expect that after several iterations of updating the cosupport in BGAPN we would have Ω_Λ b̂_i = 0. However, many signals are only nearly cosparse, i.e., they have k significantly large values in Ωb_i while the rest are smaller than a small constant ε. Therefore, a natural stopping criterion in this case is to stop when the maximal value in |Ω b̂_i| is smaller than ε. This is the stopping criterion we use throughout this paper for BGAPN. Of course, this is not the only option for a stopping criterion; e.g., one may look at the relative change of the solution in each iteration, or use a constant number of iterations if k is known in advance.

We also present a modified version of BGAPN, in Algorithm 3, that imposes a continuity constraint on the change points. This is done by creating a binary diagonal matrix W = diag(w_1, ..., w_p) such that in each iteration of the program the i-th element w_i is 1 if it corresponds to a change point and zero otherwise. This matrix serves as a weight matrix to penalize discontinuity at the change points.
This is done by adding the regularizing term

γ ‖ W Ω [X_1, ..., X_n] [b̃_1^T, ..., b̃_n^T]^T ‖_2

to the minimization problem in (26) (Eq. (27) in Algorithm 3), which leads to the additional step (Eq. (28)) in the modified program.

REFERENCES

[1] A. M. Bruckstein, D. L. Donoho, and M. Elad, “From sparse solutions of systems of equations to sparse modeling of signals and images,”
SIAM Review, vol. 51, no. 1, pp. 34–81, 2009.
[2] R. Gribonval and M. Nielsen, “Sparse representations in unions of bases,” IEEE Trans. Inf. Theory, vol. 49, no. 12, pp. 3320–3325, Dec. 2003.
[3] T. Blumensath and M. Davies, “Sampling theorems for signals from the union of finite-dimensional linear subspaces,” IEEE Trans. Inf. Theory, vol. 55, no. 4, pp. 1872–1882, Apr. 2009.
[4] Y. M. Lu and M. N. Do, “A theory for sampling signals from a union of subspaces,” IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2334–2345, Jun. 2008.
[5] P. Perona and J. Malik, “Scale-space and edge detection using anisotropic diffusion,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 12, no. 7, pp. 629–639, Jul. 1990.
[6] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60, no. 1-4, pp. 259–268, 1992.
[7] V. Caselles, R. Kimmel, and G. Sapiro, “Geodesic active contours,” International Journal of Computer Vision, vol. 22, no. 1, pp. 61–79, 1997.
[8] J. Weickert, Anisotropic Diffusion in Image Processing. Teubner, Stuttgart, Germany, 1998.
[9] D. Needell and R. Ward, “Stable image reconstruction using total variation minimization,” SIAM Journal on Imaging Sciences, vol. 6, no. 2, pp. 1035–1058, 2013.
[10] S. Nam, M. E. Davies, M. Elad, and R. Gribonval, “The cosparse analysis model and algorithms,”
Appl. Comput. Harmon. Anal., vol. 34, no. 1, pp. 30–56, 2013.

Algorithm 1 Signal Space CoSaMP (SSCoSaMP) for Piecewise Polynomial Functions

Input: k, M, g, γ, where g = Mf + e, f = [I, Z, Z², ..., Zⁿ][b_0^T, b_1^T, ..., b_n^T]^T is a piecewise polynomial function of order n, k = ‖Σ_{i=0}^{n} |Ω_DIF b_i|‖_0 is the number of jumps in the representation coefficients of f, e is an additive noise and γ is a parameter of the algorithm. S_n(·, k) is a procedure that approximates a given signal by a piecewise polynomial function of order n with k jumps.
Output: f̂: a piecewise polynomial with k + 1 segments that approximates f.
• Initialize the jump locations T^0 = ∅, the residual g_r^0 = g, and set t = 0.
while halting criterion is not satisfied do
• t = t + 1.
• Find the parameterization b_{r,0}, b_{r,1}, ..., b_{r,n} of the residual's polynomial approximation by calculating S_n(M^T g_r^{t−1}, γk).
• Find new temporal jump locations: T_Δ = the support of Σ_{i=0}^{n} |Ω_DIF b_{r,i}|.
• Update the jump location indices: T̃^t = T^{t−1} ∪ T_Δ.
• Compute temporal parameters: [b_{p,0}, ..., b_{p,n}] = argmin_{b̃_0,...,b̃_n} ‖g − M[I, Z, Z², ..., Zⁿ][b̃_0^T, ..., b̃_n^T]^T‖_2 s.t. (Ω_DIF b̃_0)_{(T̃^t)^C} = 0, ..., (Ω_DIF b̃_n)_{(T̃^t)^C} = 0.
• Calculate a polynomial approximation of order n: f^t = S_n([I, Z, Z², ..., Zⁿ][b_{p,0}^T, ..., b_{p,n}^T]^T, k).
• Find new jump locations: T^t = the locations of the jumps in the parameterization of f^t.
• Update the residual: g_r^t = g − Mf^t.
end while
• Form the final solution f̂ = f^t.

[11] R. Giryes, S. Nam, M. Elad, R. Gribonval, and M. E. Davies, “Greedy-like algorithms for the cosparse analysis model,” Linear Algebra and its Applications, vol. 441, pp. 22–60, Jan. 2014, special issue on Sparse Approximate Solution of Linear Systems.
[12] T. Nir and A. Bruckstein, “On over-parameterized model based TV-denoising,” in
Int. Symposium on Signals, Circuits and Systems (ISSCS 2007), vol. 1, Jul. 2007, pp. 1–4.
[13] T. Nir, A. M. Bruckstein, and R. Kimmel, “Over-parameterized variational optical flow,” International Journal of Computer Vision, vol. 76, no. 2, pp. 205–216, 2008.
[14] G. Rosman, S. Shem-Tov, D. Bitton, T. Nir, G. Adiv, R. Kimmel, A. Feuer, and A. M. Bruckstein, “Over-parameterized optical flow using a stereoscopic constraint,” in Scale Space and Variational Methods in Computer Vision, ser. Lecture Notes in Computer Science, A. M. Bruckstein, B. M. Haar Romeny, A. M. Bronstein, and M. M. Bronstein, Eds. Springer Berlin Heidelberg, 2012, vol. 6667, pp. 761–772.
[15] R. Giryes, S. Nam, R. Gribonval, and M. E. Davies, “Iterative cosparse projection algorithms for the recovery of cosparse vectors,” in The 19th European Signal Processing Conference (EUSIPCO-2011), Barcelona, Spain, 2011.
[16] R. Giryes and M. Elad, “CoSaMP and SP for the cosparse analysis model,” in
The 20th European Signal Processing Conference (EUSIPCO-2012), Bucharest, Romania, 2012.
[17] T. Peleg and M. Elad, “Performance guarantees of the thresholding algorithm for the co-sparse analysis model,” submitted to IEEE Trans. on Information Theory.

Algorithm 2 The Block GAPN Algorithm

Input: M, g, Ω, where g = Mf + e, f = [X_1, ..., X_n][b_1^T, ..., b_n^T]^T such that Σ_i |Ωb_i| is sparse, and e is an additive noise.
Output: f̂ = [X_1, ..., X_n][b̂_1^T, ..., b̂_n^T]^T: an estimate for f such that Σ_i |Ω b̂_i| is sparse.
Initialize the cosupport Λ = {1, ..., p} and set t = 0.
while halting criterion is not satisfied do
t = t + 1.
Calculate a new estimate:
[b̂_1^T, ..., b̂_n^T]^T = argmin_{b̃_1,...,b̃_n} Σ_{i=1}^{n} ‖Ω_Λ b̃_i‖_2^2   (26)
s.t. ‖g − M[X_1, ..., X_n][b̃_1^T, ..., b̃_n^T]^T‖_2 ≤ ‖e‖_2.
Update the cosupport: Λ = Λ \ { argmax_j Σ_{i=1}^{n} ‖Ω_j b̂_i‖_2 }.
end while
Form an estimate for the original signal: f̂ = [X_1, ..., X_n][b̂_1^T, ..., b̂_n^T]^T.

[18] E. J. Candès, Y. C. Eldar, D. Needell, and P. Randall, “Compressed sensing with coherent and redundant dictionaries,” Appl. Comput. Harmon. Anal., vol. 31, no. 1, pp. 59–73, 2011.
[19] S. F. Cotter, B. D. Rao, K. Engan, and K. Kreutz-Delgado, “Sparse solutions to linear inverse problems with multiple measurement vectors,”
IEEE Trans. Signal Process., vol. 53, no. 7, pp. 2477–2488, Jul. 2005.
[20] J. A. Tropp, A. C. Gilbert, and M. J. Strauss, “Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit,” Signal Processing, vol. 86, no. 3, pp. 572–588, 2006, Sparse Approximations in Signal and Image Processing.
[21] J. A. Tropp, “Algorithms for simultaneous sparse approximation. Part II: Convex relaxation,” Signal Processing, vol. 86, no. 3, pp. 589–602, 2006, Sparse Approximations in Signal and Image Processing.
[22] D. Wipf and B. Rao, “An empirical Bayesian strategy for solving the simultaneous sparse approximation problem,” IEEE Trans. Signal Process., vol. 55, no. 7, pp. 3704–3716, Jul. 2007.
[23] M. Mishali and Y. C. Eldar, “Reduce and boost: Recovering arbitrary sets of jointly sparse vectors,” IEEE Trans. Signal Process., vol. 56, no. 10, pp. 4692–4702, Oct. 2008.
[24] M. Fornasier and H. Rauhut, “Recovery algorithms for vector-valued data with joint sparsity constraints,” SIAM Journal on Numerical Analysis, vol. 46, no. 2, pp. 577–613, 2008.
[25] M. Yuan and Y. Lin, “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 1, pp. 49–67, 2006.
[26] Y. C. Eldar and M. Mishali, “Robust recovery of signals from a structured union of subspaces,” IEEE Trans. Inf. Theory, vol. 55, no. 11, pp. 5302–5316, Nov. 2009.
[27] M. Stojnic, F. Parvaresh, and B. Hassibi, “On the reconstruction of block-sparse signals with an optimal number of measurements,” IEEE Trans. Signal Process., vol. 57, no. 8, pp. 3075–3085, Aug. 2009.
[28] Y. C. Eldar, P. Kuppinger, and H. Bolcskei, “Block-sparse signals: Uncertainty relations and efficient recovery,” IEEE Trans. Signal Process., vol. 58, no. 6, pp. 3042–3054, Jun. 2010.
[29] R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde, “Model-based compressive sensing,” IEEE Trans. Inf. Theory, vol. 56, no. 4, pp. 1982–2001, Apr. 2010.
[30] M. A. Davenport, D. Needell, and M. B. Wakin, “Signal space CoSaMP for sparse recovery with redundant dictionaries,”
IEEE Trans. Inf. Theory, vol. 59, no. 10, pp. 6820–6829, Oct. 2013.

Algorithm 3 The Block GAPN Algorithm with Continuity Constraint

Input: M, g, Ω, γ, where g = Mf + e, f = [X_1, ..., X_n][b_1^T, ..., b_n^T]^T such that Σ_i |Ωb_i| is sparse, e is an additive noise, and γ is a weight for the continuity constraint.
Output: f̂ = [X_1, ..., X_n][b̂_1^T, ..., b̂_n^T]^T: an estimate for f such that Σ_i |Ω b̂_i| is sparse.
Initialize the cosupport Λ = {1, ..., p} and set t = 0.
while halting criterion is not satisfied do
t = t + 1.
Calculate a new estimate:
[b̂_1^T, ..., b̂_n^T]^T = argmin_{b̃_1,...,b̃_n} Σ_{i=1}^{n} ‖Ω_Λ b̃_i‖_2^2   (27)
s.t. ‖g − M[X_1, ..., X_n][b̃_1^T, ..., b̃_n^T]^T‖_2 ≤ ‖e‖_2.
Update the cosupport: Λ = Λ \ { argmax_j Σ_{i=1}^{n} ‖Ω_j b̂_i‖_2 }.
Create the weight matrix: W = diag(w_1, ..., w_p), where w_i = 0 if i ∈ Λ and w_i = 1 otherwise.
Recalculate the estimate:
[b̂_1^T, ..., b̂_n^T]^T = argmin_{b̃_1,...,b̃_n} Σ_{i=1}^{n} ‖Ω_Λ b̃_i‖_2^2 + γ ‖WΩ[X_1, ..., X_n][b̃_1^T, ..., b̃_n^T]^T‖_2^2   (28)
s.t. ‖g − M[X_1, ..., X_n][b̃_1^T, ..., b̃_n^T]^T‖_2 ≤ ‖e‖_2.
end while
Form an estimate for the original signal: f̂ = [X_1, ..., X_n][b̂_1^T, ..., b̂_n^T]^T.

[31] R. Giryes and D. Needell, “Greedy signal space methods for incoherence and beyond,” Appl. Comput. Harmon. Anal., vol. 39, no. 1, pp. 1–20, Jul. 2015.
[32] ——, “Near oracle performance and block analysis of signal space greedy methods,” Journal of Approximation Theory, vol. 194, pp. 157–174, Jun. 2015.
[33] S. Nam, M. E. Davies, M. Elad, and R. Gribonval, “Recovery of cosparse signals with greedy analysis pursuit in the presence of noise,” in , Dec. 2011, pp. 361–364.
[34] J. M. Morel and S. Solimini,
Variational Methods in Image Segmentation. Cambridge, MA, USA: Birkhauser Boston Inc., 1995.
[35] T. Chan and J. Shen, Image Processing and Analysis. Society for Industrial and Applied Mathematics, 2005.
[36] J. Weickert, A. Bruhn, T. Brox, and N. Papenberg, “A survey on variational optic flow methods for small displacements,” in Mathematical Models for Registration and Applications to Medical Imaging, ser. Mathematics in Industry. Springer Berlin Heidelberg, 2006, vol. 10, pp. 103–136.
[37] G. Aubert and P. Kornprobst, Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations. Springer, 2006, vol. 147.
[38] S. Shem-Tov, G. Rosman, G. Adiv, R. Kimmel, and A. M. Bruckstein, “On globally optimal local modeling: From moving least squares to over-parametrization,” in Innovations for Shape Analysis. Springer, 2013, pp. 379–405.
[39] G. Davis, S. Mallat, and M. Avellaneda, “Adaptive greedy approximations,” Constructive Approximation, vol. 50, pp. 57–98, 1997.
[40] M. Elad, P. Milanfar, and R. Rubinstein, “Analysis versus synthesis in signal priors,” Inverse Problems, vol. 23, no. 3, pp. 947–968, Jun. 2007.
[41] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1998.
[42] D. L. Donoho and M. Elad, “On the stability of the basis pursuit in the presence of noise,” Signal Process., vol. 86, no. 3, pp. 511–532, 2006.
[43] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. Roy. Statist. Soc. B, vol. 58, no. 1, pp. 267–288, 1996.
[44] S. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Trans. Signal Process., vol. 41, pp. 3397–3415, 1993.
[45] S. Chen, S. A. Billings, and W. Luo, “Orthogonal least squares methods and their application to non-linear system identification,” International Journal of Control, vol. 50, no. 5, pp. 1873–1896, 1989.
[46] G. Davis, S. Mallat, and M. Avellaneda, “Adaptive time-frequency decompositions,” Optical Engineering, vol. 33, no. 7, pp. 2183–2191, Jul. 1994.
[47] D. Needell and J. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Appl. Comput. Harmon. Anal., vol. 26, no. 3, pp. 301–321, May 2009.
[48] W. Dai and O. Milenkovic, “Subspace pursuit for compressive sensing signal reconstruction,”
IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2230–2249, May 2009.
[49] T. Blumensath and M. E. Davies, “Iterative hard thresholding for compressed sensing,” Appl. Comput. Harmon. Anal., vol. 27, no. 3, pp. 265–274, 2009.
[50] S. Foucart, “Hard thresholding pursuit: an algorithm for compressive sensing,” SIAM J. Numer. Anal., vol. 49, no. 6, pp. 2543–2563, 2011.
[51] R. Giryes and M. Elad, “Can we allow linear dependencies in the dictionary in the synthesis framework?” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013.
[52] ——, “Iterative hard thresholding for signal recovery using near optimal projections,” in , 2013.
[53] ——, “OMP with highly coherent dictionaries,” in , 2013.
[54] ——, “RIP-based near-oracle performance guarantees for SP, CoSaMP, and IHT,” IEEE Trans. Signal Process., vol. 60, no. 3, pp. 1465–1468, Mar. 2012.
[55] E. J. Candès, “Modern statistical estimation via oracle inequalities,” Acta Numerica, vol. 15, pp. 257–325, 2006.
[56] J. Savage and K. Chen, “On multigrids for solving a class of improved total variation based staircasing reduction models,” in Image Processing Based on Partial Differential Equations, ser. Mathematics and Visualization, X.-C. Tai, K.-A. Lie, T. Chan, and S. Osher, Eds. Springer Berlin Heidelberg, 2007, pp. 69–94.
[57] D. Mumford and J. Shah, “Optimal approximations by piecewise smooth functions and associated variational problems,” Communications on Pure and Applied Mathematics, vol. 42, no. 5, pp. 577–685, 1989.
[58] L. Ambrosio and V. M. Tortorelli, “Approximation of functionals depending on jumps by elliptic functionals via Γ-convergence,” Communications on Pure and Applied Mathematics, vol. 43, no. 8, pp. 999–1036, 1990.
[59] P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient graph-based image segmentation,” International Journal of Computer Vision, vol. 59, no. 2, pp. 167–181, 2004.
[60] D. Jupp, “Approximation to data by splines with free knots,” SIAM Journal on Numerical Analysis, vol. 15, no. 2, pp. 328–343, 1978.
[61] E. J. Candès and D. L. Donoho, “Curvelets - a surprisingly effective nonadaptive representation for objects with edges,” in Curves and Surface Fitting: Saint-Malo 99, A. Cohen, C. Rabut, and L. L. Schumaker, Eds. Vanderbilt University Press, Nashville, 2000, pp. 105–120.
[62] D. L. Donoho, “Sparse components of images and optimal atomic decompositions,” Constructive Approximation, vol. 17, no. 3, pp. 353–382, 2001.
[63] T. F. Chan and L. Vese, “Active contours without edges,”