Generalized Forward-Backward Splitting
GGeneralized Forward-Backward Splitting
Hugo Raguet Jalal Fadili Gabriel Peyré CeremadeCNRS-Université Paris-DauphinePl. De Lattre De Tassigny,75775 Paris Cedex 16France {raguet,peyre}@ceremade.dauphine.fr GREYCCNRS-ENSICAEN-Université de Caen6, Bd du Maréchal Juin,14050 Caen CedexFrance
January 13, 2012
Abstract
This paper introduces the generalized forward-backward splitting algorithm forminimizing convex functions of the form F + ∑ ni = G i , where F has a Lipschitz-continuous gradient and the G i ’s are simple in the sense that their Moreau proxim-ity operators are easy to compute. While the forward-backward algorithm cannotdeal with more than n = non-smooth function, our method generalizes it to thecase of arbitrary n . Our method makes an explicit use of the regularity of F in theforward step, and the proximity operators of the G i ’s are applied in parallel in thebackward step. This allows the generalized forward-backward to efficiently addressan important class of convex problems. We prove its convergence in infinite dimen-sion, and its robustness to errors on the computation of the proximity operatorsand of the gradient of F . Examples on inverse problems in imaging demonstratethe advantage of the proposed methods in comparison to other splitting algorithms. Throughout this paper, H denotes a real Hilbert space endowed with scalar product ⟨⋅ ∣ ⋅⟩ and associated norm ∣∣ ⋅ ∣∣ , and n is a positive integer. We consider the followingminimization problem min x ∈H { Ψ ( x ) def = F ( x ) + n ∑ i = G i ( x )} , (1)where all considered functions belong to the class Γ (H) of lower semicontinuous, proper(its domain is non-empty) and convex functions from H to ]−∞ , +∞] . The decomposition (1) is fairly general, and a wide range of iterative algorithmstakes advantage of the specific properties of the functions in the summand. One crucialproperty is the possibility to compute the associated proximity operators [54], defined as prox G ( x ) def = argmin y ∈H ∣∣ x − y ∣∣ + G ( y ) . (2)1 a r X i v : . [ m a t h . O C ] J a n his is in itself a convex optimization problem, which can be solved efficiently for manyfunctions, e.g. when the solution, unique by strong convexity, can be written in closedform. Such functions are referred to as “simple”.Another important feature is the differentiability of the functional to be minimized.However, gradient-descent approaches do not apply as soon as one of the functions G i isnon-smooth. For n ≡ and G simple, the forward-backward algorithm circumvents thisdifficulty if F is differentiable with a Lipschitz-continuous gradient. This scheme consistsin performing alternatively a gradient-descent (corresponding to an explicit step on thefunction F ) followed by a proximal step (corresponding to an implicit step on the function G ). Such a scheme can be understood as a generalization of the projected gradientmethod. This algorithm has been well studied [50, 40, 67, 16, 69, 24, 7]. Acceleratedmultistep versions have been proposed [55, 70, 6], that enjoy a faster convergence rateof O ( / t ) on the objective Ψ .Other splitting methods do not require any smoothness on some part of the compositefunctional Ψ . The Douglas-Rachford [27] and Peaceman-Rachford [57] schemes weredeveloped to minimize G ( x )+ G ( x ) , provided that G and G are simple [47, 45, 33, 17]and rely only on the use of proximity operators. 
The backward-backward algorithm [46,56, 1, 5, 17] can be used to minimize Ψ ( x ) = G ( x ) + G ( x ) when the functions involvedare the indicator functions of non-empty closed convex sets, or involve Moreau envelopes.Interestingly, if one of the functions G or G is a Moreau envelope and the other issimple, the forward-backward algorithm amounts to a backward-backward scheme.If L is a bounded injective linear operator, it is possible to minimize Ψ ( x ) = G ○ L ( x ) + G ( x ) by applying these splitting schemes on the Fenchel-Rockafellar dual prob-lem. It was shown that applying the Douglas-Rachford scheme leads to the alternat-ing direction method of multipliers (ADMM) [39, 40, 41, 42, 33]. For non-necessarilyinjective L and G strongly convex with a Lipschitz-continuous gradient, the forward-backward algorithm can be applied to the Fenchel-Rockafellar dual [36, 19]. Dealingwith an arbitrary bounded linear operator L can be achieved using primal-dual methodsmotivated by the classical Kuhn-Tucker theory. Starting from methods to solve saddlefunction problems such as the Arrow-Hurwicz method [2] and its modification [60], theextragradient method [44], this problem has received a lot of attention more recently[15, 68, 64, 53, 12, 9].It is also possible to extend the Douglas-Rachford algorithm to an arbitrary number n > of simple functions. Inspired by the method of partial inverses [65, Section 5], mostmethods rely either explicitly or implicitly on introducing auxiliary variables and bringingback the original problem to the case n = in the product space H n . Doing so yieldsiterative schemes in which one performs independent parallel proximal steps on each ofthe simple functions and then computes the next iterate by essentially averaging theresults. Variants have been proposed in [21] and [34], who describe a general projectiveframework that does not reduce the problem to the case n = . Note however that theseextensions do not apply to the forward-backward scheme that can only handle n ≡ . Itis at the heart of this paper to present such an extension.Recently proposed methods extend existing splitting schemes to handle the sum ofany number of n ≥ composite functions of the form G i = H i ○ L i , where the H i ’s aresimple and the L i ’s are bounded linear operators. Let us denote L i ∗ the adjoint operatorof L i . If L i satisfies L i L i ∗ = ν Id for any ν > (it is a so-called tight frame ), H i ○ L i issimple as soon as H i is simple and L i ∗ is easy to compute [20]. This case thus reduces tothe previously reviewed ones. If L i is not a tight frame but ( Id + L i ∗ L i ) or ( Id + L i L i ∗ ) is easily invertible, it is again possible to reduce the problem to the previous cases byintroducing as many auxiliary variables as the number of L i ’s each belonging to the range2f L i . Note however that, if solved with the Douglas-Rachford algorithm on the productspace, the auxiliary variables are also duplicated, which would increase significantly thedimensionality of the problem. Some dedicated parallel implementations were specificallydesigned for the case where (∑ i L i ∗ L i ) or (∑ i L i L i ∗ ) is (easily) invertible, see for instance[32, 58]. 
If the L i ’s satisfy none of the above properties, it is still possible to call on primal-dual methods, either by writing Ψ ( x ) = ∑ ni = H i ( L i x ) = G ( Lx ) with L ( x ) = ( L i ( x )) i and G ( ( x i ) i ) = ∑ i H i ( x i ) , see for instance [29]; or Ψ (( x i ) i ) = ι S (( x i ) i ) + ∑ i H i ( L i x i ) [9],where S is the closed convex set defined in Section 3.2.In spite of the wide range of already existing proximal splitting methods, none seemssatisfying to address explicitly the case where n > and F is smooth but not necessarilysimple. A workaround that has been proposed previously used nested algorithms to com-pute the proximity operator of ∑ i G i within sub-iterations, see for instance [30, 14]; thisleads to practical as well as theoretical difficulties to select the number of sub-iterations.More recently, [53] proposed an algorithm for minimizing Ψ ( x ) = F ( x ) + G ( x ) underlinear constraints. We show in Section 5 how this can be adapted to adress the generalproblem (1) while achieving full splitting of the proximity operators of the G i ’s and usingthe gradient of F . It suffers however from limitations, in particular the introduction ofmany auxiliary variables and the fact that the gradient descent can’t be directly appliedto the minimizer; see Section 5 and 6 for details. The generalized forward-backwardalgorithm introduced in this paper is intended to avoid all those shortcomings.As this paper was being finalized, the authors in [23] independently developed aprimal-dual algorithm to solve a class of problems that cover those we consider here.Their approach and algorithm are however very different from ours in many importantways. We will provide a detailed comparison with this work in Section 5 and will alsoshow on numerical experiments in Section 6 that our algorithm seems more adapted forproblems of the form (1). Many imaging applications require solving ill-posed inverse problems to recover highquality images from low-dimensional and noisy observations. These challenging problemsnecessitate the use of regularization through prior models to capture the geometry ofnatural signals, images or videos. The resolution of the inverse problem can be achievedby minimizing objective functionals, with respect to a high-dimensional variable, thattakes into account both a fidelity term to the observations and regularization termsreflecting the priors. Clearly, such functionals are composite by construction. Section 6details several examples of such inverse problems.In many situations, this leads to the optimization of a convex functional that canbe split into the sum of convex smooth and non-smooth terms. The smooth part of theobjective is in some cases a data fidelity term and reflects some specific knowledge aboutthe forward model, i.e. the noise and the measurement/degradation operator. This is forinstance the case if the operator is linear and the noise is additive Gaussian, in which casethe data fidelity is a quadratic function. The most successful regularizations that havebeen advocated are non-smooth, which typically allow to preserve sharp and intricatestructures in the recovered data. Among such priors, sparsity-promoting ones havebecome popular, e.g. the (cid:96) -norm of coefficients in a wisely chosen dictionary [49], or totalvariation (TV) prior [63]. To better model the data, composite priors can be constructedby summing several suitable regularizations, see for instance the morphological diversityframework [66]. 
The proximity operator of the (cid:96) -norm penalization is a simple soft-thresholding [26], whereas the use of complex or mixed regularization priors justifies the3plitting of non-smooth terms in several simpler functions (see Section 6 for concreteexamples).The composite structure of convex optimization problems raising when solving inverseproblems in the form of a sum of simple and/or smooth functions involving linear opera-tors explains the popularity of proximal splitting schemes in imaging science. Dependingon the structure of the objective functional as detailed in the previous section, one canresort to the appropriate splitting algorithm. For instance, the forward-backward algo-rithm and its modifications has become popular for sparse regularization with a smoothdata fidelity, see for instance [38, 25, 24, 35, 13, 6, 8]. The Douglas-Rachford and itsparallelized extensions were also used in a variety of inverse problems implying only non-smooth functions, see for instance [20, 21, 30, 14, 10, 28, 31, 61]. The ADMM (whichis nothing but Douglas-Rachford on the dual) was also applied to some linear inverseproblems in [48, 37]. Primal-dual schemes [12, 29] are among the most flexible schemesto handle more complicated priors. The interested reader may refer to [66, Chapter 7]and [22] for extensive reviews. This paper introduces a novel generalized forward-backward algorithm to solve (1)when F is convex with a Lipschitz continuous gradient, and the G i ’s are convex andsimple. The algorithm achieves full splitting where all operators are used separately: anexplicit step for ∇ F (single-valued) and a parallelized implicit step through the proximityoperators of the G i ’s. We prove convergence of the algorithm as well as its robustness toerrors that may contaminate the iterations. To the best of our knowledge, it is amongthe first algorithms to tackle the case where n > and F is smooth (see Section 5 forrelation to a recent work developed in parallel to ours). Although our numerical resultsare reported only on imaging applications, the algorithm may prove useful for manyother applications such as machine learning or statistical estimation.Section 2 presents the algorithm and state our main theoretical result. Section 3, thatcan be skipped by experienced readers, sets some necessary material from the frameworkof monotone operator theory. Section 4 reformulates the generalized forward-backwardalgorithm for finding the zeros of the sum of maximal monotone operators, and provesits convergence and its robustness. Special instances of the algorithm, its potentialextensions and discussion of its relation to two alternatives in the literature are given inSection 5. Numerical examples are reported in Section 6 to show the usefulness of thisapproach for applications to imaging problems. We consider problem (1) where all functions are in Γ (H) , F is differentiable on H with / β -Lipschitz gradient where β ∈] , +∞[ , and for all i , G i is simple. We also assumethe following:(H1) The set of minimizers of (1) argmin ( Ψ ) is non-empty;(H2) The domain qualification condition holds, i.e. ( , . . . , ) ∈ sri {( x − y , . . . , x − y n ) ∣ x ∈ H and ∀ i, y i ∈ dom ( G i )} , dom ( G i ) def = { x ∈ H∣ G i ( x ) < +∞} and sri ( C ) is the strong relative interior of a non-empty convex subset C of H [4]. 
Under (H1)-(H2), it follows from [62, 4, Theorem 16.2and Theorem 16.37(i)] that ∅ ≠ argmin ( Ψ ) = zer ( ∂ Ψ ) = zer (∇ F + ∑ i ∂G i ) , where ∂G i denotes the subdifferential of G i and zer ( A ) is the set of zeros of a set-valuedmap A (see Definition 3.1 in Section 3.1). Therefore, solving (1) is equivalent toFind x ∈ H such that ∈ ∇ F ( x ) + ∑ i ∂G i ( x ) . (3)The generalized forward-backward we propose to minimize (1) (or equivalently tosolve (3)) is detailed in Algorithm 1. Algorithm 1
Generalized Forward-Backward Algorithm for solving (1). β - ∈] , +∞[ is the Lipschitz constant of ∇ F ; I λ is defined in Theorem 2.1. Require ( z i ) i ∈ (cid:74) ,n (cid:75) ∈ H n , ( ω i ) i ∈ (cid:74) ,n (cid:75) ∈ ] , [ n s.t. ∑ ni = ω i = ,γ t ∈ ] , β [ ∀ t ∈ N , λ t ∈ I λ ∀ t ∈ N . Initialization x ← ∑ i ω i z i ; t ← . Main iterationrepeatfor i ∈ (cid:74) , n (cid:75) do z i ← z i + λ t ( prox γtωi G i ( x − z i − γ t ∇ F ( x )) − x ) ; x ← ∑ i ω i z i ; t ← t + . until convergence ; Return x .To state our main theorem that ensures the convergence of the algorithm and itsrobustness, for each i let ε ,t,i be the error at iteration t when computing prox γtωi G i atits argument, and let ε ,t be the error at iteration t when applying ∇ F to its argument.Algorithm 1 generates sequences ( z i,t ) t ∈ N , i ∈ (cid:74) , n (cid:75) and ( x t ) t ∈ N , such that for all i and t , z i,t + = z i,t + λ t ( prox γtωi G i ( x t − z i,t − γ t (∇ F ( x t ) + ε ,t ) ) + ε ,t,i − x t ) . (4)The following theorem introduces two different sets of assumptions to guaranteeconvergence. Assumption (A1) allows one to use a greater range for the relaxationparameters λ t , while assumptions (A2) enables varying gradient-descent step-size γ t andensures strong convergence in the uniformly convex case. Recall that a function F ∈ Γ (H) is uniformly convex if there exists a non-decreasing function ϕ ∶ [ , +∞[→ [ , +∞] that vanishes only at 0, such that for all x and y in dom ( F ) , the following holds ∀ ρ ∈] , [ , F ( ρx + ( − ρ ) y ) + ρ ( − ρ ) ϕ (∣∣ x − y ∣∣) ≤ ρF ( x ) + ( − ρ ) F ( y ) . (5) Theorem 2.1.
Set lim γ t = ¯ γ and define the following assumptions: (A0) (i) < lim λ t ≤ lim λ t < min ( , + β / ¯ γ ) ; (ii) ∑ +∞ t = ∣∣ ε ,t ∣∣ < +∞ , and for all i , ∑ +∞ t = ∣∣ ε ,t,i ∣∣ < +∞ . ∀ t, γ t = ¯ γ ∈] , β [ ; (ii) I λ = ] , min ( , + β / ¯ γ )[ . (A2) (i) < lim γ t ≤ ¯ γ < β ; (ii) I λ =] , ] .Suppose that (H1) , (H2) and (A0) are satisfied. Then, if either (A1) or (A2) is satisfied, ( x t ) t ∈ N defined in (4) converges weakly towards a minimizer of (1) . Moreover, if (A2) is satisfied and F is uniformly convex, the convergence is strong to the unique globalminimizer of (1) . This theorem will be proved after casting it in the more general framework of mono-tone operator splitting in Section 4.
Remark . The sufficient condition of strong convergence in Theorem 2.1 can be weak-ened, and other ones can be stated as well. Indeed, the generalized forward-backwardalgorithm has a structure that bears similarities with the classical forward-backward,since it consists of an explicit forward step, followed by an implicit step where the prox-imity operators are computed in parallel. In fact, it turns out that the backward stepinvolves a firmly non-expansive operator (see next section), and therefore statementsof [24, Theorem 3.4(iv) and Proposition 3.6] can be transposed with some care to ouralgorithm.The formulation of Algorithm 1 is general, but it can be simplified for practicalpurposes. In particular, the auxiliary variables z i can all be initialized to , the weights ω i set equally to / n , and for simplicity the relaxation parameters λ t and the gradient-descent step-size γ t can be set constant along iterations. This is typically what has beendone in the numerical experiments. The subdifferential of a function in Γ (H) is the best-known example of maximalmonotone operator. Therefore, it is natural to extend the generalized forward-backward,Algorithm 1, to find the zeros of the sum of maximal monotone operators, i.e. solvethe monotone inclusion (3) when the subdifferential is replaced by any maximal mono-tone operator. This is the goal pursued in Section 4 where we provide the proof of ageneral convergence and robustness theorem whose byproduct is a convergence proof ofTheorem 2.1.We first begin by recalling some essential definitions and properties of monotoneoperators that are necessary to our exposition. The interested reader may refer to [59, 4]for a comprehensive treatment. In the following, A ∶ H → H is a set-valued operator, and Id is the identity operatoron H . A is single-valued if the cardinality of Ax is at most 1. Definition 3.1 (Graph, inverse, domain, range and zeros) . The graph of A is theset gra ( A ) def = {( x, y ) ∈ H ∣ y ∈ Ax } . The inverse of A is the operator whose graph is gra ( A - ) def = {( x, y ) ∈ H ∣ ( y, x ) ∈ gra ( A )} . The domain of A is dom ( A ) def = { x ∈ H ∣ Ax ≠ ∅} .The range of A is ran ( A ) def = { y ∈ H ∣ ∃ x ∈ H ∶ y ∈ Ax } , and its zeros set is zer ( A ) def ={ x ∈ H ∣ ∈ Ax } = A - ( ) . 6 efinition 3.2 (Resolvant and reflection operators) . The resolvant of A is the operator J A def = ( Id + A ) - . The reflection operator associated to J A is the operator R A def = J A − Id . Definition 3.3 (Maximal monotone operator) . A is monotone if ∀ x, y ∈ H , u ∈ Ax and v ∈ Ay ⇒ ⟨ u − v ∣ x − y ⟩ ≥ . It is moreover maximal if its graph is not strictly contained in the graph of any othermonotone operator.
Definition 3.4 (Non-expansive and α -averaged operators) . A is non-expansive if ∀ x, y ∈ H , u ∈ Ax and v ∈ Ay ⇒ ∣∣ u − v ∣∣ ≤ ∣∣ x − y ∣∣ . For α ∈] , [ , A is α -averaged if there exists R non-expansive such that A = ( − α ) Id + αR .We denote A( α ) the class of α -averaged operators on H . In particular, A ( ) is the classof firmly non-expansive operators.Note that non-expansive operators are necessarily single-valued and 1-Lipschitz con-tinuous, and so are α -averaged operators since they are also non-expansive. The followinglemma gives some useful characterizations of firmly non-expansive operators. Lemma 3.1.
Let A ∶ dom ( A ) = H → H . The following statements are equivalent: (i) A is firmly non-expansive; (ii) A − Id is non-expansive; (iii) ∀ x, y ∈ H , ∣∣ Ax − Ay ∣∣ ≤ ⟨ Ax − Ay ∣ x − y ⟩ ; (iv) A is the resolvent of a maximal monotone operator A ′ , i.e. A = J A ′ .Proof. (i) ⇔ (ii), A ∈ A ( ) ⇔ A = Id + R for some R non-expansive. (i) ⇔ (iii), see [72].(i) ⇔ (iv), see [51].We now summarize some properties of the subdifferential that will be useful in thesequel. Lemma 3.2.
Let F ∶ H → R be a convex differentiable function, with / β -Lipschitzcontinuous gradient, β ∈] , +∞[ , and let G ∶ H →]−∞ , +∞] be a function in Γ (H) .Then, (i) β ∇ F ∈ A ( ) , i.e. is firmly non-expansive; (ii) ∂G is maximal monotone; (iii) The resolvent of ∂G is the proximity operator of G , i.e. prox G = J ∂G .Proof. (i) This is Baillon-Haddad theorem [3]. (ii) See [62]. (iii) See [54].We thus consider in the following n maximal monotone operators A i ∶ H → H indexed by i ∈ (cid:74) , n (cid:75) , and a (single-valued) operator B ∶ H → H and β ∈] , +∞[ such that βB ∈ A ( ) . Therefore, solving (3) can be translated in the moregeneral language of maximal monotone operators as solving the monotone inclusionFind x ∈ H such that ∈ Bx + ∑ i A i x, (6)where it is assumed that zer ( B + ∑ i A i ) ≠ ∅ .7 .2 Product Space The previous definitions being valid for any real Hilbert space, they also apply tothe product space H n endowed with scalar product and norm derived from the onesassociated to H .Let ( ω i ) i ∈ (cid:74) ,n (cid:75) ∈ ] , [ n such that ∑ ni = ω i = . We consider H def = H n endowed with thescalar product ⟨⟨⋅ ∣∣ ⋅⟩⟩ , defined as ∀ x = ( x i ) i , y = ( y i ) i ∈ H , ⟨⟨ x ∣∣ y ⟩⟩ = n ∑ i = ω i ⟨ x i ∣ y i ⟩ and with the corresponding norm ∣∣ ⋅ ∣∣ . S ⊂ H denotes the closed convex set definedby S def = { x = ( x i ) i ∈ H ∣ x = x = ⋯ = x n } , whose orthogonal complement is the closedlinear subspace S (cid:150) = { x = ( x i ) i ∈ H ∣ ∑ i ω i x i = } . We denote by Id the identity operatoron H , and we define the canonical isometry C ∶ H → S , x ↦ ( x, . . . , x ) .ι S ∶ H →]−∞ , +∞] and N S ∶ H → H are respectively the indicator function and thenormal cone of S , that is ι S ( x ) def = ⎧⎪⎪⎨⎪⎪⎩ if x ∈ S , +∞ otherwise , and N S ( x ) def = ⎧⎪⎪⎨⎪⎪⎩ S (cid:150) if x ∈ S , ∅ otherwise . Since S is non-empty closed and convex, it is straightforward to see that N S is maximalmonotone. To lighten the notation in the sequel, we introduce the following concatenatedoperators. For every i ∈ (cid:74) , n (cid:75) , let A i and B as defined in (6). For γ = ( γ i ) i ∈ (cid:74) ,n (cid:75) ∈] , +∞[ n , we define γA ∶ H → H , x = ( x i ) i ↦ ⨉ ni = γ i A i ( x i ) , i.e. its graph is gra ( γA ) def = n ⨉ i = gra ( γ i A i )= {( x , y ) ∈ H ∣ x = ( x i ) i , y = ( y i ) i , and ∀ i, y i ∈ γ i A i x i } , and B ∶ H → H , x = ( x i ) i ↦ ( Bx i ) i .Using the maximal monotonicity of A , . . . , A n and B it is an easy exercise to establishthat γA and B are maximal monotone on H . Now that we have set all necessary material, we are ready to solve the monotoneinclusion (6). First, we derive an equivalent fixed point equation satisfied by any solutionof (6). From this, we draw an algorithmic scheme and prove its convergence towards asolution, as well as its robustness to errors. Finally, we derive the proof of Theorem 2.1.
From now on, we denote the set of fixed points of an operator T ∶ H → H by Fix T def = { z ∈ H ∣ T z = z } . 8 roposition 4.1. Let ( ω i ) i ∈ (cid:74) ,n (cid:75) ∈ ] , [ n . For any γ > , x ∈ H is a solution of (6) ifand only if there exists ( z i ) i ∈ (cid:74) ,n (cid:75) ∈ H n such that ⎧⎪⎪⎪⎨⎪⎪⎪⎩ ∀ i, z i = R γωi A i ( x − z i − γBx ) − γBx ,x = ∑ i ω i z i . (7) Proof. set γ > , we have the equivalence ∈ Bx + ∑ i A i x ⇔ ∃ ( z i ) i ∈ H n ∶ { ∀ i, ω i ( x − z i − γBx ) ∈ γA i x ,x = ∑ i ω i z i . Now, ω i ( x − z i − γBx ) ∈ γA i x ⇔ ( x − z i − γBx ) − x ∈ γω i A i x (by Lemma 3.1 (iv)) ⇔ x = J γωi A i ( x − z i − γBx )⇔ x − ( x − z i ) = J γωi A i ( x − z i − γBx )− ( x − z i − γBx ) − γBx ⇔ z i = R γωi A i ( x − z i − γBx ) − γBx . Before formulating a fixed point equation, consider the following preparatory lemma.
Lemma 4.1.
For all z = ( z i ) i ∈ H , b = ( b ) i ∈ S , and γ = ( γ i ) i ∈ ] , +∞[ n , (i) J N S is the orthogonal projector on S , and J N S z = C (∑ i ω i z i ) ; (ii) R N S ( z − b ) = R N S z − b ; (iii) R γA z = ( R γ i A i ( z i )) i .Proof. (i). From Lemma 3.2 (iii), we have for z ∈ H , J N S ( z ) = argmin y ∈ S ∣∣ z − y ∣∣ def = proj S ( z ) . Now, argmin y ∈ S ∣∣ z − y ∣∣ = C ( argmin y ∈H ∑ i ω i ∣∣ z i − y ∣∣ ) , where the unique minimizer of ∑ i ω i ∣∣ z i − y ∣∣ is the barycenter of ( z i ) i , i.e. ∑ i ω i z i .(ii). J N S is obviously linear, and so is R N S . Since b ∈ S , R N S b = b and the resultfollows.(iii). This is a consequence of the separability of γA in terms of the components of z implying that J γA z = ( J γ i A i z i ) i . The result follows from the definition of R γA . Proposition 4.2. ( z i ) i ∈ (cid:74) ,n (cid:75) ∈ H n satisfies (7) if and only if z = ( z i ) i is a fixed point ofthe following operator H —→ H z z→ [ R γA R N S + Id ][ Id − γ B J N S ]( z ) , (8) with γ = ( γω i ) i . roof. Using Lemma 4.1 in (7), we have C ( x ) = J N S z , C ( Bx ) = B J N S ( z ) and R N S − γ B J N S = R N S [ Id − γ B J N S ] . Altogether, this yields, z satisfies (7) ⇔ z = R γA R N S [ Id − γ B J N S ] z − γ B J N S z ⇔ z = R γA R N S [ Id − γ B J N S ] z + [ Id − γ B J N S ] z ⇔ z = [ R γA R N S + Id ][ Id − γ B J N S ] z The expression (8) gives us the operator on which is based the generalized forward-backward. We first study the properties of this operator before establishing convergenceand robustness results of our algorithm derived from the Krasnoselskij-Mann schemeassociated to it.
Proposition 4.3.
For all γ ∈ ] , +∞[ n , define T , γ ∶ H —→ H z z→ [ R γA R N S + Id ] z . (9) Then, T , γ is firmly non-expansive, i.e. T , γ ∈ A ( ) .Proof. From Lemma 3.1, R γ i A i and R N S are non-expansive. In view of Lemma 4.1 (iii), R γA is non-expansive as well. Finally, as a composition of non-expansive operators, R γA R N S is also non-expansive, and the proof is complete by the definition of A ( ) . Proposition 4.4.
For all γ ∈] , β [ , define T ,γ ∶ H —→ H z z→ [ Id − γ B J N S ] z . (10) Then, T ,γ ∈ A ( γ β ) .Proof. By hypothesis, βB ∈ A ( ) and so is β B . Then, we have for any x , y ∈ H∣∣ β B J N S x − β B J N S y ∣∣ ≤ ⟨⟨ β B J N S x − β B J N S y ∣∣ J N S x − J N S y ⟩⟩= ⟨⟨ βJ N S B J N S x − βJ N S B J N S y ∣∣ x − y ⟩⟩= ⟨⟨ β B J N S x − β B J N S y ∣∣ x − y ⟩⟩ , (11)where we derive the first equality from the fact that J N S is self-adjoint (Lemma 4.1(i)), and the second equality using that for all x ∈ S , Bx ⊂ S and J N S x = x . FromLemma 3.1 (iii) ⇔ (i), we establish that β B J N S ∈ A ( ) . We conclude using [17, Lemma2.3]. Proposition 4.5.
For all γ ∈ ] , +∞[ n and γ ∈ ] , β [ , T , γ T ,γ ∈ A( α ) , with α = max ( , + β / γ ) .Proof. As T , γ and T ,γ are α -averaged operators by Proposition 4.3 and Proposition 4.4,it follows from [17, Lemma 2.2 (iii)] that their composition is also α -averaged with thegiven value of α . 10he following proposition defines a maximal monotone operator A ′ γ which will be usefulfor caracterizing fixed points of T , γ T ,γ as monotone inclusions. Proposition 4.6.
For all γ ∈ ] , +∞[ n there exists a maximal monotone operator A ′ γ such that T , γ = J A ′ γ . Moreover for all γ > , Fix T , γ T ,γ = zer ( A ′ γ + γ B J N S ) . (12) Proof.
The existence of A ′ γ is ensured by Proposition 4.3 and Lemma 3.1 (iv). Then for z ∈ H , z = T , γ T ,γ z ⇔ z = ( Id + A ′ γ ) - ( Id − γ B J N S ) z ⇔ z − γ B J N S z ∈ z + A ′ γ z ⇔ ∈ A ′ γ z + γ B J N S z Now, let us examine the properties of A ′ γ . Proposition 4.7.
For all γ ∈ ] , +∞[ n and u , y ∈ H u ∈ A ′ γ y ⇔ u S − y (cid:150) ∈ γA ( y S − u (cid:150) ) , (13) where we denote for y ∈ H , y S def = proj S ( y ) and y (cid:150) def = proj S (cid:150) ( y ) .Proof. First of all, by definition of T , γ we have T , γ = [( J γA − Id ) ( J N S − Id ) + Id ]= [ J γA ( proj S − proj S (cid:150) ) − ( proj S − proj S (cid:150) ) + proj S + proj S (cid:150) ]= J γA ( proj S − proj S (cid:150) ) + proj S (cid:150) . (14)By definition we have A ′ γ = T , γ - − Id so that u ∈ A ′ γ y ⇔ u + y ∈ T , γ - y ⇔ T , γ ( u + y ) = y (by (14)) ⇔ J γA (( u + y ) S − ( u + y ) (cid:150) ) = y − ( u + y ) (cid:150) ⇔ ( u + y ) S − ( u + y ) (cid:150) ∈ y S − u (cid:150) + γA ( y S − u (cid:150) )⇔ u S − y (cid:150) ∈ γA ( y S − u (cid:150) ) . We are now ready to state our main result, establishing convergence and robustnessof the generalized forward-backward algorithm to solve (6).
Theorem 4.1.
Let ( γ t ) t ∈ N be a sequence in ] , β [ , ( γ t ) t ∈ N be a sequence in ] , +∞[ n such that ∀ t, γ t = ( γ t ω i ) i , ( λ t ) t ∈ N be a sequence such that ∀ t, λ t ∈ I λ (made explicit below),set z ∈ H , and for every t ∈ N , set z t + = z t + λ t ( T , γ t ( T ,γ t z t + ε ,t ) + ε ,t − z t ) (15) where T , γ t (resp. T ,γ t ) is defined in (9) (resp. in (10) ), and ε ,t , ε ,t ∈ H . Set lim γ t = ¯ γ and define the following conditions: zer ( B + ∑ i A i ) ≠ ∅ ; (ii) < lim λ t ≤ lim λ t < min ( , + β / ¯ γ ) ; (iii) ∑ +∞ t = ∣∣ ε ,t ∣∣ < +∞ and ∑ +∞ t = ∣∣ ε ,t ∣∣ < +∞ . (A1) (i) ∀ t, γ t = ¯ γ ∈] , β [ ; (ii) I λ = ] , min ( , + β / ¯ γ )[ . (A2) (i) < lim γ t ≤ ¯ γ < β ; (ii) I λ =] , ] .Suppose that (A0) is satisfied. Then, If either (A1) or (A2) is satisfied, (i) ( T , γ t T ,γ t z t − z t ) t ∈ N converges strongly to . (ii) ( z t ) t ∈ N converges weakly to a point z ∈ F def = ⋂ t ∈ N Fix T , γ t T ,γ t . (iii) ( x t def = ∑ i ω i z i,t ) t ∈ N converges weakly to x def = ∑ i ω i z i ∈ zer ( B + ∑ i A i ) .Moreover, if (A2) is satisfied and B is uniformly monotone, then (iv) ( x t ) t ∈ N converges strongly.Proof. For sequences in a Hilbert space, strong convergence is denoted by —→ and weakconvergence is denoted by ——⇀ .(i)-(ii). Suppose first that (A0) and (A1) are satisfied.Under (A1)-(i), T def = T , ¯ γ T , ¯ γ does not depend on t (stationary operator). For all t ∈ N ,we have z t + = z t + λ t ( T z t + ε t − z t ) , (16)with ε t def = T , ¯ γ ( T , ¯ γ z t + ε ,t ) − T , ¯ γ ( T , ¯ γ z t ) + ε ,t . Proposition 4.3 shows that T , ¯ γ ∈ A ( ) is in particular non-expansive, so that ∣∣ ε t ∣∣ ≤ ∣∣ ε ,t ∣∣ + ∣∣ ε ,t ∣∣ , and we deduce from (A0)-(iii) that ∑ +∞ t = ∣∣ ε t ∣∣ < +∞ . Moreover, by Proposition 4.5 and (A1)-(i), T ∈ A( α ) with α = max ( , + β / ¯ γ ) . In particular, T is non-expansive and thus F = Fix T is closed andconvex. Now, for t ∈ N , set T t def = Id + λ t ( T − Id H ) , the iterations (16) can be rewritten z t + = T t z t + λ t ε t . (17)Since for all t , α t def = λ t α < by (A1)-(ii), [17, Lemma 2.2 (i)] shows that T t ∈ A( α t ) ,and (17) is thus a particular instance of [17, Algorithm 4.1]. Also, it is clear thatfor all t , Fix T t = Fix T . By Proposition 4.1 and Proposition 4.2, (A0)-(i) provides F = ⋂ t ∈ N Fix T t ≠ ∅ . According to (A0)-(ii), lim λ t > and lim α t < , so we deduce from[17, Theorem 3.1 and Remark 3.4] that ∑ t ∈ N ∣∣ T t z t − z t ∣∣ < +∞ . (18)and that ( z t ) t is quasi-Fejér monotone with respect to F . By definition of T t , (18) gives ∑ t ∈ N λ t ∣∣ T z t − z t ∣∣ < +∞ , which in turn implies T z t − z t —→ since lim λ t > . Then T being non-expansive, it follows from the demiclosed principle [11][4, Corollary 4.18] thatany weak cluster point of ( z t ) t belongs to Fix T , so that [4, Theorem 5.5] provides weak12onvergence towards z ∈ F .Suppose now that (A0) and (A2) are satisfied.Again with Proposition 4.1, Proposition 4.2 and (A0)-(i), F ≠ ∅ . From Proposition 4.3,Proposition 4.4 and (A2)-(i), T , γ t ∈ A ( ) and T ,γ t ∈ A ( γ t β ) for all t . So, under assump-tions (A0)-(iii) and (A2), [17, Theorem 3.1 and Remark 3.4] provides that T , γ t T ,γ t z t − z t —→ (establishing (i)), that for any z ∈ F ( Id − T ,γ t ) z t − ( Id − T ,γ t ) z ———→ t →+∞ , (19)and that ( z t ) t is quasi-Fejér monotone with respect to F . 
Again, by non-expansivity F is closed and convex, and with [4, Theorem 5.5], ( z t ) t converges weakly to some pointin F if, and only if, all of its weak cluster points lie in F .Let thus y be a weak cluster point of ( z t ) t . ( γ t ) t being bounded, we can extract asubsequence ( z t τ ) τ converging weakly towards y such that ( γ t τ ) τ converges stronglyto some γ ∞ ( < γ ∞ < β by (A2)-(i)). Fix then z ∈ F and observe that (19) implies B J N S z t τ —→ B J N S z .Since β B J N S ∈ A ( ) , B J N S is continuous and monotone, hence maximal monotone[4, Corollary 20.25]. Consequently, its graph is sequentially weakly-strongly closed [4,Corollary 20.33(ii)]. Because B J N S is single-valued and z t τ ——⇀ y , we deduce B J N S y = B J N S z .Now denote for all t , y t def = T , γ t T ,γ t z t and u t def = ( Id − T , γ t ) T ,γ t z t . It follows from (i)that y t − z t —→ , implying y t τ ——⇀ y . Then, u t = T ,γ t z t − T , γ t T ,γ t z t = z t − γ t B J N S z t − y t , so that u t τ —→ − γ ∞ B J N S y .Moreover, u t ∈ (( Id + A ′ γ t ) − Id ) T , γ t T ,γ t z t , hence u t ∈ A ′ γ t y t . Thus for all t , u t S − y t (cid:150) ∈ γ t A ( y t S − u t (cid:150) ) by Proposition 4.7. If ( v , u ) ∈ gra ( γ ∞ A ) with γ ∞ def = ( γ ∞ ω i ) i , then γ t γ ∞ u ∈ γ t Av , and by monotonicity ⟨⟨ u t S − y t (cid:150) − γ t γ ∞ u ∣∣ y t S − u t (cid:150) − v ⟩⟩ ≥ , by bilinearity and taking into account orthogonality ⟨⟨ u t S − γ t γ ∞ u ∣∣ y t S − u t (cid:150) − v ⟩⟩ + ⟨⟨ y t (cid:150) ∣∣ u t (cid:150) + v ⟩⟩ ≥ . By weak convergence, ( y t τ ) τ is bounded. Together with strong convergence of ( u t τ ) τ and ( γ t τ ) τ , [4, Lemma 2.36] allows to take the limit as τ tends to infinity in the aboveinequality. Using B J N S y ∈ S , ⟨⟨− γ ∞ B J N S y − u ∣∣ y S − v ⟩⟩ + ⟨⟨ y (cid:150) ∣∣ v ⟩⟩ ≥ ⟨⟨− γ ∞ B J N S y − y (cid:150) − u ∣∣ y S − v ⟩⟩ ≥ . Hence maximality of γ ∞ A forces ( y S , − γ ∞ B J N S y − y (cid:150) ) ∈ gra ( γ ∞ A ) , i.e. − γ ∞ B J N S y − y (cid:150) ∈ γ ∞ A ( y S ) . Thus Proposition 4.7 provides − γ ∞ B J N S y ∈ A ′ γ ∞ y ,and by Proposition 4.6, y ∈ Fix T , γ ∞ T ,γ ∞ = F .(iii). In both cases, for any y ∈ H , ⟨ y ∣ x t − x ⟩ = ⟨ y ∣ ∑ i ω i ( z i,t − z i )⟩ = ∑ i ω i ⟨ y ∣ z i,t − z i ⟩ =⟨⟨ C ( y ) ∣∣ z t − z ⟩⟩ —→ since z t ——⇀ z , hence weak convergence of ( x t ) t ∈ N towards x , whichis a zero of B + ∑ i A i by Proposition 4.1. 13iv). If B is uniformly monotone, then there exists a non-decreasing function ϕ ∶ [ , +∞[→[ , +∞] that vanishes only at 0, such that for all x, y ∈ H⟨ Bx − By ∣ x − y ⟩ ≥ ϕ (∣∣ x − y ∣∣) . For all t ∈ N , ⟨⟨ B J N S z t − B J N S z ∣∣ z t − z ⟩⟩ = ∑ i ω i ⟨ B (∑ i ω i z i,t ) − B (∑ i ω i z i ) ∣ z i,t − z i ⟩= ⟨ B (∑ i ω i z i,t ) − B (∑ i ω i z i ) ∣ ∑ i ω i ( z i,t − z i )⟩≥ ϕ (∣∣∑ i ω i ( z i,t − z i )∣∣) = ϕ (∣∣ x t − x ∣∣) . Recall that under (A0) and (A2), B J N S z t —→ B J N S z and z t ——⇀ z , so that ⟨⟨ B J N S z t − B J N S z ∣∣ z t − z ⟩⟩ —→ . In view of the properties of ϕ , we obtain strong con-vergence of ( x t ) t towards x . Remark . In statements (i)-(iii) of Theorem 4.1 under (A0)-(A1) (stationary case), as-sumptions (A0) can be weakened. More precisely, (A0)-(ii) can be replaced by ∑ t ∈ N λ t ( − αλ t ) = +∞ where α = max ( / , /( + β / ¯ γ )) , and (A0)-(iii) by ∑ t ∈ N λ t ( ∣∣ ε ,t ∣∣ + ∣∣ ε ,t ∣∣ ) <+∞ . The proof would follow the same lines as [17, Lemma 5.1]. Let’s note also that apart of assumption (A0)-(ii) on lim λ t is not needed under (A2). Remark . 
Assumption of uniform monotonicity in the proofof statement (iv) can be relaxed. For instance, the sequence ( z t ) t ∈ N is indeed quasi-Fejér monotone with respect to F . Thus, if int F ≠ ∅ , strong convergence occurs by [17,Lemma 2.8(iv)]. Corollary 4.1.
Theorem 2.1 holds.Proof.
Let ( z i,t ) t ∈ N and ( x t ) t ∈ N be the sequences defined in (4). Identifying B with ∇ F and A i with ∂G i and skipping some calculations, (( z i,t ) i ) t ∈ N follows iterations (15)with ε ,t = ( ε ,t,i ) i and ε ,t = C (− γ t ε ,t ) , providing (A0)-(ii)-(iii) in Theorem 4.1. Now,under (H1)-(H2), argmin ( F + ∑ i G i ) = zer (∇ F + ∑ i ∂G i ) ≠ ∅ , providing (A0)-(i) inTheorem 4.1. The proof of weak convergence of ( x t ) t ∈ N follows from Theorem 4.1-(iii).The proof of strong convergence is a consequence of Theorem 4.1-(iv) together with thefact that uniform convexity of a function in Γ (H) implies uniform monotonicity of itssubdifferential [4]. The generalized forward-backward algorithm can be viewed as a hybrid splittingalgorithm whose special instances turn out to be classical splitting methods; namely theforward-backward and Douglas-Rachford algorithms.
Relaxed Forward-Backward
For n ≡ , we have J N S = Id , A def = A = A , B = B andthe operator (8) specializes to [ R γA + Id ][ Id − γB ] = J A ( Id − γB ) , (20)so that x t = z t = z ,t given by (15) (resp. (4) in the optimization case) follows exactly theiterations of the relaxed forward-backward algorithm [17, Section 6], and its convergenceproperties under assumptions (A0) and (A2).14his comparison is of particular interest in the convex optimization case since it maybe inspiring to study the convergence rate of the generalized forward-backward on theobjective. Indeed, it is now known that the exact forward-backward algorithm enjoys aconvergence rate in O ( / t ) on the objective [55, 7]. Furthermore, there has been severalaccelerated multistep versions of the exact forward-backward in the literature [55, 6, 70]with a convergence rate of O ( / t ) on the objective (although no convergence guaranteeon the iterate itself is given). Therefore, two possible perspectives of this work wouldbe to investigate the convergence rate (on the objective of course) of the generalizedforward-backward and to design a potential multistep acceleration. Relaxed Douglas-Rachford
If we set B ≡ , the operator (8) becomes [ R γA R N S + Id ] . (21)Taking γ t = ¯ γ ∈] , +∞[ , ∀ t , z t provided by (15) (resp. (4) in the optimization case) wouldbe equivalent to applying the relaxed Douglas-Rachford algorithm on the product space H for solving ∈ ∑ i A i x [65, 21]. The convergence statements of Theorem 4.1-(i)-(iii)holds in this case under (A0)-(i), λ t ∈] , [ with ∑ t ∈ N λ t ( − λ t ) = +∞ and ∑ t ∈ N λ t ( ∣∣ ε ,t ∣∣ + ∣∣ ε ,t ∣∣ ) < +∞ ; see Remark 4.1 where α = by Proposition 4.3. Resolvents of the sum of monotone operators
The generalized forward-backwardalgorithm provides yet another way for computing the resolvent of the sum of maximalmonotone operators at a point y ∈ ran ( Id + ∑ i A i ) . It is sufficient to take in (6) Bx = x − y and β = . It would be interesting to compare this algorithm with the Douglas-Rachfordand Dykstra-based variants [18]. This will be left to a future work. Relation to [53]
In a finite-dimensional setting, these authors propose an algorithmfor the monotone inclusion problem consisting of the sum of a continuous monotonemap and a set-valued maximal monotone operator, introducing a “block-decomposition”hybrid proximal extragradient (HPE).They also derive the corresponding convergencerates.More precisely, our optimization problem can be rewritten in the form consideredin [53, Section 5.3, (51)]. Indeed, (1) is equivalent to the linearly constrained convexproblem min z =( z i ) i ∈ H F ( ∑ i ω i z i ) + ∑ i G i ( z i ) such that proj S (cid:150) ( z ) = , (22)As proj S (cid:150) is self-adjoint, z is an optimal solution if and only if there exists v = ( v i ) i ∈ H such that ∈ (∇ F ( ∑ i ω i z i )) i + ( ∂G i ( z i )/ w i ) i + proj S (cid:150) ( v ) and proj S (cid:150) ( z ) = , and the minimizer is given by x = ∑ i ω i z i .Let ς ∈] , ] and γ = ς ςβ +√ + ς β . Transposed to our setting, their iterations read:15 lgorithm 2 Iterations Block-Decomposition HPE [53]. for i ∈ (cid:74) , n (cid:75) do z i ← prox γωi G i ( γ x + ( − γ ) z i − γ ∇ F ( x ) + γ ( v i − u ) ) ; for i ∈ (cid:74) , n (cid:75) do v i ← v i − γz i + γx ; x ← ∑ i ω i z i ; u ← ∑ i ω i v i .The update of the z i ’s in this iteration shares similarities with the one in Algorithm 1,where γ is identified with γ t . Nonetheless, the two algorithms are different in someimportant ways. Our algorithm is robust to errors while there is no proof of such robust-ness for HPE. Furthermore, HPE carries additional (dual) variables hence increasing thecomputational load of the algorithm. Finally, unlike our algorithm, the step-size in HPE γ cannot be iteration-varying, and γ < ς whatever the Lipschitz constant of ∇ F , whichis a stronger condition than ours. The latter can have important practical impact. Relation to [23]
While this paper was being released, these authors independentlydeveloped another algorithm to solve a class of problems that covers (6). They relyon the classical Kuhn-Tucker theory and propose a primal-dual splitting algorithm forsolving monotone inclusions involving a mixture of sums, linear compositions, and paral-lel sums (inf-convolution in convex optimization) of set-valued and Lipschitz operators.More precisely, the authors exploit the fact that the primal and dual problems havea similar structure, cast the problem as finding a zero of the sum of a Lipschitz con-tinuous monotone map with a maximal monotone operator whose resolvent is easilycomputable. They solve the corresponding monotone inclusion using an inexact versionof Tseng’s forward-backward-forward splitting algorithm [69].Removing the parallel sum and taking the linear operators as the identity in [23,(1.1)], one recovers problem (6). For the sake of simplicity and space saving we do notreproduce here in full their algorithm. However, adapted to the optimization problem min x F ( x ) + ∑ i G i ( L i x ) , where each L i is a bounded linear operator, their algorithmreads ( G i ∗ is the Legendre-Fenchel conjugate of G i ): Algorithm 3
Iterations Primal-Dual Combettes-Pesquet [23]. y ← x − γ t (∇ F ( x ) + ∑ ni = L i ∗ v i ) for i ∈ (cid:74) , n (cid:75) do z i ← v i + γ t L i x ; v i ← v i − z i + prox γ t G i ∗ ( z i ) + γ t L i y ; x ← x − γ t (∇ F ( y ) + ∑ ni = L i ∗ ( prox γ t G i ∗ ( z i ))) ;Recall that the proximity operator of G i ∗ can be easily deduced from that of G i usingMoreau’s identity. Taking L i = Id in Algorithm 3 solves (1). Similarly to the thegeneralized forward-backward, this algorithm allows for inexact computations of theinvolved operators and for varying step-size γ t . However, if (cid:96) def = / β denotes the Lipschitzconstant of F , the bound on our step-size sequence is / (cid:96) while theirs is /( (cid:96) + √ n ) , atleast twice lower and degrading as n increases. While we solve the primal problem, theiralgorithm solves both the primal and dual ones, which at least doubles the number ofauxiliary variables required. Moreover, it also requires two calls to the gradient of F periteration. Nonetheless, their algorithm is able to solve a more general class of problems.16inally, let us notice that if one want to use the composition with linear operators,each iteration requires two calls to each one of them and two calls to their adjoints, whatcan be computationally more expensive than computing directly the proximity operatorsof the G i ○ L i ’s (see Section 6).It is also noteworthy to point out that Tseng’s forward-backward-forward algorithmthey used is a special case of the HPE method whose iteration complexity results werederived in [52]. This section applies the generalized forward-backward to image processing problems.The problems are selected so that other splitting algorithms can be applied as well andcompared fairly. In the following, Id denotes the identity operator on the appropriatespace to be understood from the context, N is a positive integer and I ≡ R N × N is theset of images of size N × N pixels. We consider a class of inverse problem regularizations, where one wants to recoveran (unknown) high resolution image y ∈ I from noisy low resolution observations y = Φ y + w ∈ I . We report results using several ill-posed linear operators Φ ∶ I → I , andfocus our attention to convolution and masking operator, and a combination of theseoperators. In the numerical experiments, the noise vector w ∈ I is a realization of anadditive white Gaussian noise of variance σ w .The restored image ˆ y = W ˆ x is obtained by optimizing the coefficients ˆ x ∈ H ina redundant wavelet frame [49], where W ∶ H → I is the wavelet synthesis operator.The wavelet atoms are normalized so that W is a Parseval tight frame, i.e. it satisfies W W ∗ = Id . In this setting, the coefficients are vectors x ∈ H ≡ I J where the redundancy J = J + depends on the number of scales J of the wavelet transform.The general variational problem for the recovery reads min x ∈H { Ψ ( x ) def = ∣∣ y − Φ W x ∣∣ + µ ∣∣ x ∣∣ B , + ν ∣∣ W x ∣∣ TV } . (23)The first term in the summand is the data-fidelity term, which is taken to be a squared (cid:96) -norm to reflect the additive white Gaussianity of the noise. The second and thirdterms are regularizations , enforcing priors assumed to be satisfied by the original image.The first regularization is a (cid:96) / (cid:96) - norm by blocks , inducing structured sparsity on thesolution. The second regularization is a discrete total variation semi-norm , inducingsparsity on the gradient of the restored image. 
The scalars µ and ν are weights – so-called regularization parameters – to balance between each terms of the energy Ψ . Wenow detail the properties of each of these three terms. ∣∣ y − Φ W x ∣∣ For the inpainting inverse problem, one considers a masking operator ( M y ) p def = ⎧⎪⎪⎨⎪⎪⎩ if p ∈ Ω ,y p otherwise.Where Ω is a set of pixels, taking into account missing or defective sensors that deterioratethe observations; we will denote ρ = ∣ Ω ∣/ N the ratio of missing pixels. For the deblurring17nverse problem, we consider a convolution with a discrete Gaussian filter of width σ , K ∶ y ↦ g σ ∗ y , normalized to a unit mass. This simulates a defocus effect or low-resolution sensors. In the following, we thus consider Φ being equal either to M , K orthe composition M K .Denoting L def = Φ W , the fidelity term thus reads F ( x ) = ∣∣ y − Lx ∣∣ . The function F corresponds to the smooth term in (1). Its gradient ∇ F ∶ x ↦ L ∗ ( Lx − y ) is Lipschitz-continuous with constant β − ≤ ∣∣ Φ W ∣∣ = .For any γ > , the proximity operator of F reads prox γF ( x ) = ( Id + γL ∗ L ) - ( x + γL ∗ y ) . (24)The vector L ∗ y can be precomputed, but inverting Id + γL ∗ L may be problematic. For L ≡ Id , this is trivial. For inpainting or deblurring alone, as Φ is associated to a Parsevaltight frame, L ≡ M W or L ≡ KW , the Sherman-Morrison-Woodbury formula gives ( Id + γL ∗ L ) - = Id − L ∗ ( Id + γLL ∗ ) - L = Id − W ∗ Φ ∗ ( Id + γ ΦΦ ∗ ) - Φ W . (25)Since M (resp. K ) is a diagonal operator in the pixel domain (resp. Fourier domain), (25)can be computed in O ( N ) (resp. O ( N log N ) ) operations. However, the compositecase L ≡ M KW is more involved. An auxiliary variable is required, replacing F ∶ H → R by ˜ F ∶ H × I →]−∞ , +∞] defined by ˜ F ( x, u ) = ∣∣ y − M u ∣∣ + ι C KW ( x, u ) = G ( x, u ) + G ( x, u ) , (26)where C KW def = {( x, u ) ∈ H × I ∣ u = KW x } . Only then, prox γG can be computed from(24), and prox γG is the orthogonal projection on ker ([ Id , − KW ]) [29, 10], which involves a similar inversion as in (25). µ ∣∣ x ∣∣ B , Sparsity-promoting regularizations over wavelet (and beyond) coefficients are popularto solve a wide range of inverse problems [49]. Figure 1(a), left, shows an example oforthogonal wavelet coefficients of a natural image, where most of the coefficients havesmall amplitude, they are thus quite sparse. A way to enforce this sparsity is to use the (cid:96) -norm of the coefficients ∣∣ x ∣∣ = ∑ p ∣ x p ∣ .The presence of edges or textures creates structured local dependencies in the waveletcoefficients of natural images. A way to take into account those dependencies is to replacethe absolute value of the coefficients in the (cid:96) -norm by the (cid:96) -norm of groups (or blocks )of coefficients [71]. This is known as the mixed (cid:96) / (cid:96) -norm by ∣∣ x ∣∣ B , = ∑ b ∈B µ b ∣∣ x b ∣∣ = ∑ b ∈B µ b √∑ p ∈ b x p , (27)where p indexes the coefficients, the blocks b are sets of indexes, the block-structure B is a collection of blocks and x b def = ( x p ) p ∈ b is a subvector of x . The positive scalars µ b are weights tuning the influence of each block. It is a norm on H as soon as B coversthe whole space, i.e. ∀ p ∈ (cid:74) , N (cid:75) × (cid:74) , J (cid:75) , ∃ b ∈ B s.t. p ∈ B . Note that for B ≡ ⋃ p { p } and µ { p } ≡ for all p , it reduces to the (cid:96) -norm.We mentionned in the introduction that the proximal operator of a (cid:96) -norm is asoft-thresholding on the coefficients. Similarly, it is easy to show that whenever B is18 on-overlapping , i.e. 
∀ b , b ′ ∈ B , b ∩ b ′ = ∅ , the proximity operator of ∣∣ ⋅ ∣∣ B , is a soft-thresholding by block prox µ ∣∣⋅∣∣ B , ( ( x b ) b ) = ( Θ µ b ⋅ µ ( x b )) b , with Θ µ ( x b ) = ⎧⎪⎪⎨⎪⎪⎩ if ∣∣ x b ∣∣ < µ , ( − µ ∣∣ x b ∣∣ ) x b otherwise ,and the coefficients x p not covered by B remaining unaltered.Non-overlapping block structures break the translation invariance that is underlyingmost traditional image models. To restore this invariance, one can consider overlappingblocks, as illustrated in Figure 1(c). Computing prox ∣∣⋅∣∣ B , in this case is not as simple asfor the non-overlapping case, because the blocks cannot be treated separately. For tree-structured blocks ( i.e. b ∩ b ′ ≠ ∅ ⇒ b ⊂ b ′ or b ′ ⊂ b ), [43] proposes a method involvingthe computation of a min-cost flow. This could be computationally expensive and do notaddress the general case anyway. Instead, it is always possible to decompose the blockstructure as a finite union of non-overlapping sub-structures B = ⋃ i B i . The resultingterm can finally be split into ∣∣ x ∣∣ B , = ∑ b ∈B ∣∣ x b ∣∣ = ∑ i ∑ b ∈B i ∣∣ x b ∣∣ = ∑ i ∣∣ x ∣∣ B i , , where each ∣∣ ⋅ ∣∣ B i , is simple. (a) ∣∣ x ∣∣ = ∑ p ∣ x p ∣ (b) ∣∣ x ∣∣ B , = ∑ b ∈B ∣∣ x b ∣∣ (c) ∣∣ x ∣∣ B , = ∣∣ x ∣∣ B , + ∣∣ x ∣∣ B , Figure 1: Illustration of the block (cid:96) / (cid:96) -norm. (a) sparsity of the image in an orthogonalwavelet decomposition (gray pixels corresponds to low coefficients); (b) a non-overlappingblock structure; (c) splitting of a more complex structure into two non-overlapping layers.In our numerical experiments where H ≡ I J , coefficients within each resolution level(from to J ) and each subband are grouped according to all possible square spatialblocks of size S × S ; which can be decomposed into S non-overlapping block structures. ν ∣∣ W x ∣∣ TV The second regularization favors piecewise-smooth images, by inducing sparsity onits gradient [63]. The total variation semi-norm can be viewed as a specific instance of (cid:96) / (cid:96) -norm, ∣∣ y ∣∣ TV = ∣∣∇ I y ∣∣ B TV , , with ∇ I ∶ { I —→ I y z→ ( V ∗ y, H ∗ y ) and ∣∣ ( v, h ) ∣∣ B TV , = ∑ p ∈ (cid:74) ,N (cid:75) √ v p + h p , where the image gradient is computed by finite differences through convolution with avertical filter V and a horizontal filter H , and B TV is clearly non-overlapping. For some19pecial gradient filters, the modified TV semi-norm can be splitted into simple functions,see for instance [21, 61]. However, we consider more conventional filters V = ( − ) and H = ( − ) centered in the upper-left corner. Introducing an auxiliary variable as advocated in(26), the main difficulty remains to invert the operator ( Id + γ ∇ I ∇ I ∗ ) . Under appro-priate boundary conditions, this can be done in the Fourier domain in O ( N log ( N )) operations. We now give the details of the different splitting strategies required to apply thethree tested algorithms to (23).
Generalized Forward-Backward (
GFB ) The problem is rewritten under the form(1) as min x ∈H u ∈I ∣∣ y − M KW x ∣∣ + µ S ∑ i = ∣∣ x ∣∣ B i , + ν ∣∣ u ∣∣ B TV , + ι C ∇I W ( x, u ) , (28)with F ( x ) ≡ ∣∣ y − M KW x ∣∣ and n ≡ S + . The indicator function ι C ∇I W is definedsimilarly as in (26). In Algorithm 1, we set equal weights ω i ≡ / n , a constant gradientstep-size γ ≡ . β and a constant relaxation parameter to λ ≡ . Relaxed Douglas-Rachford ( DR ) Here the problem is split as min x ∈H u ∈I u ∈I ∣∣ y − M u ∣∣ + ι C KW ( x, u ) + µ S ∑ i = ∣∣ x ∣∣ B i , + ν ∣∣ u ∣∣ B TV , + ι C ∇I W ( x, u ) , and solved with Algorithm 1, where F ≡ and n ≡ S + . As mentioned in Section 5, thiscorresponds to a relaxed version of the Douglas-Rachford algorithm, with best resultswhen γ ≡ / n . Primal-Dual Chambolle-Pock (
ChPo ) A way to avoid operator inversions is torewrite the original problem as min x ∈H G ( Λ x ) where Λ ∶ { H —→ I × (H) S × I x z→ ( M KW x, x, . . . , x, ∇ I W x ) , and G ∶ ⎧⎪⎪⎨⎪⎪⎩ I × (H) S × I —→ R ( u , x , . . . , x S , g ) z→ ∣∣ y − u ∣∣ + µ ∑ S i = ∣∣ x i ∣∣ B i , + ν ∣∣ g ∣∣ B TV , . The operator Λ is a concatenation of linear operators and its adjoint is easy to compute,and G is simple, being a separable mixture of simple functions. Note that this is not theonly splitting possible. For instance, one can write the problem on a product space as20 in ( x i ) i ∈ H ι S (( x i ) i ) + ∑ i G i ( Λ i x i ) , where G i is each of the functions in G above, and Λ i iseach of the linear operators in Λ .To solve this, we here use the primal-dual relaxed Arrow-Hurwicz algorithm describedin [12]. According to the notations in that paper, we set the parameters σ ≡ , τ ≡ . σ ( + S + ) and ≡ . Block-Decomposition Hybrid Proximal Extragradient (
HPE ) We split the prob-lem written in (28) according to (22), and set equals weights w i ≡ / n . According toSection 5.2, we set the parameter ς ≡ . . Primal-Dual Combettes-Pesquet (
CoPe ) Finally, the problem takes its simplestform min x ∈H ∣∣ y − M KW x ∣∣ + µ S ∑ i = ∣∣ x ∣∣ B i , + ν ∣∣∇ I W x ∣∣ B TV , . (29)As long as ν ≡ (no TV-regularization), this is exactly (28); we apply Algorithm 3where L i ≡ Id for all i and γ ≡ . /( + S ) . However with TV-regularization, we avoid theintroduction of the auxiliary variable u with L S + ≡ ∇ I W and γ ≡ . /( + √ S + ) . All experiments were performed on a discrete image of width N ≡ , with values inthe range [ , ] . The additive white Gaussian noise has standard-deviation σ w ≡ . ⋅ − .The reconstruction operator W uses non-separable, bi-dimensional Daubechies waveletswith 2 vanishing moments. It is implemented such that each atom has norm − j , with j ∈ (cid:74) , J (cid:75) and where J is the coarsest resolution level. Accordingly, we set the weights µ b in the (cid:96) / (cid:96) -norm to − j at the resolution level j of the coefficients in block b . Weuse J ≡ , resulting in a dictionary with redundancy J = J + = . All algorithmsare implemented in Matlab .Results are presented in Figures 2, 3, 4 and 5. For each problem, the five algorithmswere run iterations (initialized at zero), while monitoring their objective functionalvalues Ψ along iterations. Ψ min is fixed as the minimum value reached over the fivealgorithms (in our experiments, this was always the generalized forward-backward), andevolution of the objectives compared to Ψ min is displayed for the first iterations.Because the computational complexity of an iteration may vary between algorithms,computation times for iterations (no parallel implementation) are given beside thecurves. Below the energy decay graph, one can find from left to right the original image,the degraded image and the restored image after iterations of generalized forward-backward. Degraded and restored images quality are given in term of the signal-to-noiseratio ( SNR ). Comparison to algorithms that do not use the (gradient) explicit step (
ChPo , DR ) For the first three experiments, there is no total variation regularization. In thedeblurring task (Figure 2), blocks of size × are used. GFB is slightly better thanthe others and iteration cost of
ChPo is too high for this problem. When increasingthe number of block structures (inpainting, Figure 3, size × ) computation timestends to be similar but GFB clearly outperforms the others for the task. However, oneadvantage of using the gradient becomes obvious in the composite case ( i.e. Φ ≡ M K ): An implementation of the generalized forward-backward, as well as the codes and materials for theexperiments, are available at
21n Figure 4, DR performs hardly better than ChPo . Indeed, in contrast to previous cases(see Section 6.1.1), F is not simple anymore and the introduction of the auxiliary variabledecreases the efficiency of each iteration of DR . This phenomenon is further illustrated inthe last case, where the total variation is added, introducing another auxiliary variable. Comparison to algorithms that use the (gradient) explicit step (
Comparison to algorithms that use the (gradient) explicit step (HPE, CoPe) In the first experiment, where n is small, the iterations of HPE and CoPe are almost as efficient as those of GFB but take more time to compute, especially for CoPe, which needs twice as many calls to ∇F. In the second setting, HPE and CoPe are hardly better than DR, perhaps suffering from small gradient step-sizes. They perform better in the composite setting, but require more computing time than GFB. In the last setting, the iterations of CoPe are still not as efficient as those of GFB, in spite of their higher computational load due to the composition with the linear operator ∇W (see (29)).
Finally, let us note that in the composite case (i.e. Φ ≡ MK), the SNR of the restored image is higher when using both regularizations together rather than either one separately. Moreover, we observed that this yields restorations that are more robust to variations of the parameters µ and ν. These observations argue in favor of mixed regularizations.

Conclusion

We have introduced in this paper a novel proximal splitting method able to handle convex functionals that are the sum of a smooth term and several simple functions. It generalizes existing schemes by enlarging the class of problems that can be solved efficiently with proximal methods to the case where one of the functions is smooth but not simple. We provided theoretical guarantees on the convergence and robustness of the algorithm, even for the more general problem of finding a zero of a sum of maximal monotone operators, one of which is also cocoercive. Numerical experiments on convex optimization problems encountered in inverse problems give evidence of the advantages of our approach for large-scale imaging problems.

In analogy with first-order methods such as the forward-backward algorithm, establishing convergence rates (on the objective) and designing multistep accelerations are natural perspectives that we leave for future work.
Figure 2: Deblurring, ν = 0 (no TV regularization). (a) log(Ψ − Ψ_min) vs. iteration for ChPo, DR, HPE, CoPe and GFB. (b) computing times: t_ChPo = 153 s, t_DR = 95 s, t_HPE = 148 s, t_CoPe = 235 s, t_GFB = 73 s. (c) original image y₀ (LaBoute). (d) degraded image y = Ky₀ + w. (e) restored image ŷ = Wx̂.
Figure 3: Inpainting, ν = 0 (no TV regularization). (a) log(Ψ − Ψ_min) vs. iteration for ChPo, DR, HPE, CoPe and GFB. (b) computing times: t_ChPo = 229 s, t_DR = 219 s, t_HPE = 352 s, t_CoPe = 340 s, t_GFB = 203 s. (c) original image y₀ (LaBoute). (d) degraded image y = My₀ + w. (e) restored image ŷ = Wx̂.
Figure 4: Composite (Φ ≡ MK), ν = 0 (no TV regularization). (a) log(Ψ − Ψ_min) vs. iteration for ChPo, DR, HPE, CoPe and GFB. (b) computing times: t_ChPo = 313 s, t_DR = 256 s, t_HPE = 342 s, t_CoPe = 268 s, t_GFB = 233 s. (c) original image y₀ (LaBoute). (d) degraded image y = MKy₀ + w. (e) restored image ŷ = Wx̂.
Figure 5: Composite (Φ ≡ MK), with TV regularization (ν > 0). (a) log(Ψ − Ψ_min) vs. iteration for ChPo, DR, HPE, CoPe and GFB. (b) computing times: t_ChPo = 358 s, t_DR = 294 s, t_HPE = 409 s, t_CoPe = 441 s, t_GFB = 286 s. (c) original image y₀ (LaBoute). (d) degraded image y = MKy₀ + w. (e) restored image ŷ = Wx̂.

References

[1] F. Acker and M. A. Prestel. Convergence d'un schéma de minimisation alternée. Annales de la faculté des sciences de Toulouse, 5e série, 2(1):1–9, 1980.
[2] K. J. Arrow, L. Hurwicz, and H. Uzawa. Studies in Linear and Non-Linear Programming. Stanford University Press, 1958.
[3] J.-B. Baillon and G. Haddad. Quelques propriétés des opérateurs angle-bornés et n-cycliquement monotones. Israel J. Math., 26:137–150, 1977.
[4] H. H. Bauschke and P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer-Verlag, New York, 2011.
[5] H. H. Bauschke, P. L. Combettes, and S. Reich. The asymptotic behavior of the composition of two resolvents. Nonlinear Analysis: Theory, Methods & Applications, 60:283–301, 2005.
[6] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Img. Sci., 2(1):183–202, March 2009.
[7] K. Bredies and D. A. Lorenz. Linear convergence of iterative soft-thresholding. J. Fourier Anal. Appl., 14:813–837, Dec 2008.
[8] L. M. Briceño-Arias and P. L. Combettes. Convex variational formulation with smooth coupling for multicomponent signal decomposition and recovery. Numer. Math. Theory Methods Appl., 2:485–508, 2009.
[9] L. M. Briceño-Arias and P. L. Combettes. A monotone+skew splitting model for composite monotone inclusions in duality. SIAM J. Opt., to appear, 2011.
[10] L. M. Briceño-Arias, P. L. Combettes, J.-C. Pesquet, and N. Pustelnik. Proximal algorithms for multicomponent image recovery problems. Journal of Mathematical Imaging and Vision, pages 1–20, 2010.
[11] F. E. Browder. Convergence theorems for sequences of nonlinear operators in Banach spaces. Mathematische Zeitschrift, 100(3):201–225, 1967.
[12] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis., 40(1):120–145, May 2011.
[13] C. Chaux, P. L. Combettes, J.-C. Pesquet, and V. R. Wajs. A variational formulation for frame based inverse problems. Inverse Problems, 23:1495–1518, June 2007.
[14] C. Chaux, J.-C. Pesquet, and N. Pustelnik. Nested iterative algorithms for convex constrained image recovery problems. SIAM Journal on Imaging Sciences, 2(2):730–762, 2009.
[15] G. Chen and M. Teboulle. A proximal-based decomposition method for convex minimization problems. Math. Program., 64(1-3):81–101, 1994.
[16] G. H.-G. Chen and R. T. Rockafellar. Convergence rates in forward-backward splitting. SIAM Journal on Optimization, 7(2):421–444, 1997.
[17] P. L. Combettes. Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization, 53(5-6):475–504, 2004.
[18] P. L. Combettes. Iterative construction of the resolvent of a sum of maximal monotone operators. J. Convex Anal., 16:727–748, 2009.
[19] P. L. Combettes, D. Dũng, and B. C. Vũ. Dualization of signal recovery problems. Set-Valued and Variational Analysis, 18:373–404, 2010.
[20] P. L. Combettes and J.-C. Pesquet. A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE J. Selected Topics in Signal Processing, 1(4):564–574, 2007.
[21] P. L. Combettes and J.-C. Pesquet. A proximal decomposition method for solving convex variational inverse problems. Inverse Problems, 24(6):065014, 2008.
[22] P. L. Combettes and J.-C. Pesquet. Fixed-Point Algorithms for Inverse Problems in Science and Engineering, chapter Proximal Splitting Methods in Signal Processing, pages 185–212. Springer-Verlag, 2011.
[23] P. L. Combettes and J.-C. Pesquet. Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum monotone operators. arXiv:1107.0081v1, 30 June 2011.
[24] P. L. Combettes and V. R. Wajs. Signal recovery by proximal forward-backward splitting. SIAM Multiscale Modeling and Simulation, 4(4):1168–1200, 2005.
[25] I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. on Pure and Appl. Math., 57(11):1413–1457, 2004.
[26] D. L. Donoho. De-noising by soft-thresholding. IEEE Transactions on Information Theory, 41(3):613–627, 1995.
[27] J. Douglas and H. H. Rachford. On the numerical solution of heat conduction problems in two and three space variables. Transactions of the American Mathematical Society, 82(2):421–439, 1956.
[28] F.-X. Dupé, J. M. Fadili, and J.-L. Starck. Inverse problems with Poisson noise: Primal and primal-dual splitting. In International Conference on Image Processing (ICIP), Brussels, 2011.
[29] F.-X. Dupé, J. M. Fadili, and J.-L. Starck. Linear inverse problems with various noise models and mixed regularizations. Paris, 2011.
[30] F.-X. Dupé, M. J. Fadili, and J.-L. Starck. A proximal iteration for deconvolving Poisson noisy images using sparse representations. IEEE Transactions on Image Processing, 18(2):310–321, 2009.
[31] F.-X. Dupé, M. J. Fadili, and J.-L. Starck. Deconvolution under Poisson noise using exact data fidelity and synthesis or analysis sparsity priors. Statistical Methodology, 2011. In press.
[32] J. Eckstein. Parallel alternating direction multiplier decomposition of convex programs. Journal of Optimization Theory and Applications, 80(1):39–62, 1994.
[33] J. Eckstein and D. P. Bertsekas. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators.
Math. Program., 55(3):293–318, 1992.
[34] J. Eckstein and B. F. Svaiter. General projective splitting methods for sums of maximal monotone operators. SIAM J. Control Optim., 48(2):787–811, 2009.
[35] M. J. Fadili, J.-L. Starck, and F. Murtagh. Inpainting and zooming using sparse representations. The Computer Journal, 52:64–79, 2007.
[36] M. J. Fadili and G. Peyré. Total variation projection with first order schemes. IEEE Transactions on Image Processing, 2010. In press.
[37] M. Figueiredo and J. Bioucas-Dias. Restoration of Poissonian images using alternating direction optimization. IEEE Transactions on Image Processing, 2010.
[38] M. A. Figueiredo and R. Nowak. An EM algorithm for wavelet-based image restoration. IEEE Transactions on Image Processing, 12(8):906–916, 2003.
[39] M. Fortin and R. Glowinski. Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems. Elsevier Science Publishers, Amsterdam, 1983.
[40] D. Gabay. Applications of the method of multipliers to variational inequalities. In M. Fortin and R. Glowinski, editors, Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems, Amsterdam, 1983. North-Holland Publishing Company.
[41] D. Gabay and B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers & Mathematics with Applications, 2(1):17–40, 1976.
[42] R. Glowinski and P. Le Tallec. Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics. SIAM, 1989.
[43] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for hierarchical sparse coding. ArXiv e-prints, September 2010.
[44] G. M. Korpelevich. An extragradient method for finding saddle points and for other problems. Ekonom. Mat. Metody, 12(4):747–756, 1976.
[45] J. Lieutaud. Approximation d'Opérateurs par des Méthodes de Décomposition. PhD thesis, Université de Paris, 1969.
[46] P. L. Lions. Une méthode itérative de résolution d'une inéquation variationnelle. Israel Journal of Mathematics, 31(2):204–208, 1978.
[47] P. L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis, 16(6):964–979, 1979.
[48] M. V. Afonso, J. M. Bioucas-Dias, and M. A. T. Figueiredo. Fast image recovery using variable splitting and constrained optimization. IEEE Transactions on Image Processing, 2010.
[49] S. Mallat. A Wavelet Tour of Signal Processing, Third Edition. Academic Press, 2008.
[50] B. Mercier. Topics in finite element solution of elliptic problems. Lectures on Mathematics, 63, 1979.
[51] G. J. Minty. Monotone (nonlinear) operators in Hilbert space. Duke Math. J., 29(3):341–346, 1962.
[52] R. D. C. Monteiro and B. F. Svaiter. Complexity of variants of Tseng's modified forward-backward splitting and Korpelevich's methods for generalized variational inequalities with applications to saddle point and convex optimization problems. Technical report, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0205, 2010. Submitted to SIAM Journal on Optimization.
[53] R. D. C. Monteiro and B. F. Svaiter. Iteration-complexity of block-decomposition algorithms and the alternating minimization augmented Lagrangian method. Submitted, 2010.
[54] J.-J. Moreau. Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France, 93:273–299, 1965.
[55] Y. Nesterov. Gradient methods for minimizing composite objective function. CORE Discussion Papers 2007076, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), Sep 2007.
[56] G. B. Passty. Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. Journal of Mathematical Analysis and Applications, 72(2):383–390, 1979.
[57] D. W. Peaceman and H. H. Rachford. The numerical solution of parabolic and elliptic differential equations. Journal of the Society for Industrial and Applied Mathematics, 3(1):28–41, 1955.
[58] J.-C. Pesquet and N. Pustelnik. A parallel proximal optimization method. Preprint, 2011.
[59] R. R. Phelps. Convex Functions, Monotone Operators and Differentiability. Lecture Notes Math. Springer-Verlag, second edition, 1993.
[60] L. D. Popov. A modification of the Arrow-Hurwicz method of search for saddle points. Mat. Zametki, 28(5):777–784, 1980.
[61] N. Pustelnik, C. Chaux, and J.-C. Pesquet. Parallel proximal algorithm for image restoration using hybrid regularization. To appear in IEEE Transactions on Image Processing, November 2011.
[62] R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970.
[63] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Phys. D, 60(1-4):259–268, November 1992.
[64] M. V. Solodov. A class of decomposition methods for convex optimization and monotone variational inclusions via the hybrid inexact proximal point framework. Optim. Methods Softw., 19:557–575, 2004.
[65] J. E. Spingarn. Partial inverse of a monotone operator. Applied Mathematics & Optimization, 10(1):247–265, 1983.
[66] J.-L. Starck, F. Murtagh, and M. J. Fadili. Sparse Signal and Image Processing: Wavelets, Curvelets and Morphological Diversity. Cambridge University Press, Cambridge, UK, 2010. In press.
[67] P. Tseng. Applications of a splitting algorithm to decomposition in convex programming and variational inequalities. SIAM J. Control Optim., 29(1):119–138, January 1991.
[68] P. Tseng. Alternating projection-proximal methods for convex programming and variational inequalities. SIAM Journal on Optimization, 7(4):951–965, 1997.
[69] P. Tseng. A modified forward-backward splitting method for maximal monotone mappings. SIAM J. Control Optim., 38(2):431–446, 2000.
[70] P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Submitted to SIAM Journal on Optimization, 2008.
[71] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables.
J. of the Roy. Stat. Soc. B, 68(1):49–67, 2006.
[72] E. H. Zarantonello. Projections on convex sets in Hilbert space and spectral theory. In Contributions to Nonlinear Functional Analysis. Academic Press, 1971.