An Empirical Study of ADMM for Nonconvex Problems
Zheng Xu¹, Soham De¹, Mário A. T. Figueiredo², Christoph Studer³, Tom Goldstein¹

¹ Department of Computer Science, University of Maryland, College Park, MD
² Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa, Portugal
³ Department of Electrical and Computer Engineering, Cornell University, Ithaca, NY

ZX, SD, and TG were supported by US NSF grant CCF-1535902 and by US ONR grant N00014-15-1-2676. CS was supported in part by Xilinx Inc., and by the US NSF under grants ECCS-1408006 and CCF-1535897.
Abstract
The alternating direction method of multipliers (ADMM) is a common optimization tool for solving constrained and non-differentiable problems. We provide an empirical study of the practical performance of ADMM on several nonconvex applications, including ℓ0 regularized linear regression, ℓ0 regularized image denoising, phase retrieval, and eigenvector computation. Our experiments suggest that ADMM performs well on a broad class of nonconvex problems. Moreover, recently proposed adaptive ADMM methods, which automatically tune the penalty parameter as the method runs, can improve algorithm efficiency and solution quality compared to ADMM with a non-tuned penalty.

The alternating direction method of multipliers (ADMM) has been applied to solve a wide range of constrained convex and nonconvex optimization problems. ADMM decomposes complex optimization problems into sequences of simpler subproblems that are often solvable in closed form; furthermore, these subproblems are often amenable to large-scale distributed computing environments [14, 23]. ADMM solves the problem

    min_{u ∈ R^n, v ∈ R^m} H(u) + G(v), subject to Au + Bv = b,    (1)

where H : R^n → R̄, G : R^m → R̄, A ∈ R^{p×n}, B ∈ R^{p×m}, and b ∈ R^p, by the following steps:

    u^{k+1} = argmin_u H(u) + ⟨λ^k, −Au⟩ + (τ_k/2) ‖b − Au − Bv^k‖²    (2)
    v^{k+1} = argmin_v G(v) + ⟨λ^k, −Bv⟩ + (τ_k/2) ‖b − Au^{k+1} − Bv‖²    (3)
    λ^{k+1} = λ^k + τ_k (b − Au^{k+1} − Bv^{k+1}),    (4)

where λ ∈ R^p is a vector of dual variables (Lagrange multipliers) and τ_k is a scalar penalty parameter. The convergence of the algorithm can be monitored using primal and dual "residuals," both of which approach zero as the iterates become more accurate. These are defined as

    r^k = b − Au^k − Bv^k  and  d^k = τ_k AᵀB (v^k − v^{k−1}),    (5)

respectively [2]. The iteration is generally stopped when

    ‖r^k‖ ≤ ε_tol max{‖Au^k‖, ‖Bv^k‖, ‖b‖}  and  ‖d^k‖ ≤ ε_tol ‖Aᵀλ^k‖,    (6)

where ε_tol > 0 is the stopping tolerance.

ADMM was introduced by Glowinski and Marroco [11] and Gabay and Mercier [10], and its convergence has been proved under mild conditions for convex problems [9, 7, 15]. The practical performance of ADMM on convex problems has been studied extensively; see [2, 13, 28] and the references therein. For nonconvex problems, the convergence of ADMM under certain assumptions is studied in [24, 20, 17, 25]. The weakest assumptions currently known are given in [25], which still requires a number of strict conditions on the objective, including a Lipschitz differentiable objective term. In practice, ADMM has been applied to various nonconvex problems, including nonnegative matrix factorization [27], ℓp-norm regularization (0 < p < 1) [1, 5], tensor factorization [21, 29], phase retrieval [26], manifold optimization [19, 18], random fields [22], and deep neural networks [23].

The penalty parameter τ_k is the only free choice in ADMM, and it plays an important role in the practical performance of the method.
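For concreteness, the iteration (2)–(4) with a constant penalty and the stopping rule (6) can be sketched as below. This is a minimal illustration, not the implementation used in the experiments; solve_u and solve_v are hypothetical placeholders for the application-specific subproblem solvers (2) and (3).

```python
import numpy as np

def vanilla_admm(solve_u, solve_v, A, B, b, tau=1.0, tol=1e-6, max_iter=2000):
    """Sketch of ADMM steps (2)-(4) with the stopping rule (6).

    solve_u(v, lam, tau) and solve_v(u, lam, tau) stand in for the
    application-specific minimizations in (2) and (3).
    """
    u = np.zeros(A.shape[1])
    v = np.zeros(B.shape[1])
    lam = np.zeros(b.shape[0])
    for k in range(max_iter):
        u = solve_u(v, lam, tau)                 # step (2)
        v_old, v = v, solve_v(u, lam, tau)       # step (3)
        r = b - A @ u - B @ v                    # primal residual (5)
        lam = lam + tau * r                      # step (4)
        d = tau * (A.T @ (B @ (v - v_old)))      # dual residual (5)
        if (np.linalg.norm(r) <= tol * max(np.linalg.norm(A @ u),
                                           np.linalg.norm(B @ v),
                                           np.linalg.norm(b))
                and np.linalg.norm(d) <= tol * np.linalg.norm(A.T @ lam)):
            break                                # stopping rule (6)
    return u, v, lam
```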
Adaptive methods have been proposed to automatically tune this parameter as the algorithm runs. The residual balancing method [16] automatically increases or decreases the penalty so that the primal and dual residuals have approximately similar magnitudes. The more recent AADMM method [28] uses a spectral (Barzilai-Borwein) rule for tuning the penalty parameter. These methods achieve impressive practical performance for convex problems and are guaranteed to converge under moderate conditions (such as when adaptivity is stopped after a finite number of iterations).

In this manuscript, we study the practical performance of ADMM on several nonconvex applications: ℓ0 regularized linear regression, ℓ0 regularized image denoising, phase retrieval, and eigenvector computation. While convergence on these applications may not be guaranteed by current theory, ADMM is a popular choice for solving such nonconvex problems. The following questions are addressed using these model problems: (i) does ADMM converge in practice, (ii) does the update order of H(u) and G(v) matter, (iii) is the local optimal solution good, (iv) does the penalty parameter τ_k matter, and (v) is an adaptive penalty choice effective?

ℓ0 regularized linear regression. Sparse linear regression can be achieved using the nonconvex, ℓ0 regularized problem

    min_x (1/2) ‖Dx − c‖² + ρ ‖x‖₀,    (7)

where D ∈ R^{n×m} is the data matrix, c is a measurement vector, and x is the vector of regression coefficients. ADMM is applied to problem (7) using the equivalent formulation

    min_{u,v} (1/2) ‖Du − c‖² + ρ ‖v‖₀ subject to u − v = 0.    (8)

ℓ0 regularized image denoising. The ℓ0 regularizer [6] can be substituted for the ℓ1 regularizer when computing total variation for image denoising. This results in the formulation [4]

    min_x (1/2) ‖x − c‖² + ρ ‖∇x‖₀,    (9)

where c represents a given noisy image, ∇ is the linear discrete gradient operator, and ‖·‖₀/‖·‖₂ is the ℓ0/ℓ2 norm. We solve the equivalent problem

    min_{u,v} (1/2) ‖u − c‖² + ρ ‖v‖₀ subject to ∇u − v = 0.    (10)

The resulting ADMM subproblems can be solved in closed form using fast Fourier transforms [12]. Both ℓ0 formulations lead to a v-subproblem solved by hard-thresholding, sketched below and derived in the appendix.
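A minimal sketch of that operator, corresponding to eq. (17) in the appendix; in the ADMM updates it is applied with threshold parameter t = ρ/τ.

```python
import numpy as np

def hard_threshold(z, t):
    """Proximal operator of the l0 penalty:
    argmin_x ||x||_0 + (1/(2t)) ||x - z||^2.
    Entries with |z_i| <= sqrt(2t) are zeroed; the rest are kept unchanged."""
    return np.where(np.abs(z) > np.sqrt(2.0 * t), z, 0.0)
```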
Phase retrieval. Ptychographic phase retrieval [30, 26] solves the problem

    min_x (1/2) ‖abs(Dx) − c‖²,    (11)

where x ∈ C^n, D ∈ C^{m×n}, and abs(·) denotes the elementwise magnitude of a complex vector. ADMM is applied to the equivalent problem

    min_{u,v} (1/2) ‖abs(u) − c‖² subject to u − Dv = 0.    (12)

Eigenvector problem. The eigenvector problem is a fundamental problem in numerical linear algebra. The leading eigenvalue of a matrix D is found by computing

    max_x ‖Dx‖² subject to ‖x‖ = 1.    (13)

ADMM is applied to the equivalent problem

    min_{u,v} −(1/2) ‖Du‖² + ι_{z : ‖z‖=1}(v) subject to u − v = 0,    (14)

where ι_S is the characteristic function defined by ι_S(v) = 0 if v ∈ S, and ι_S(v) = ∞ otherwise. Closed-form updates for (14) are derived in the appendix and sketched below.
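As an illustration of the splitting (14), a minimal sketch using the closed-form updates (35)–(37) from the appendix. The requirement τ > ‖D‖², which makes the u-subproblem a well-posed minimization, is our assumption rather than a prescription from the text.

```python
import numpy as np

def leading_eigvec_admm(D, tau, iters=500, seed=0):
    """ADMM sketch for (14): min -0.5||Du||^2 + indicator{||v|| = 1}, u = v.

    Requires tau > ||D||^2 so that tau*I - D^T D is positive definite."""
    n = D.shape[1]
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    lam = np.zeros(n)
    M = tau * np.eye(n) - D.T @ D
    for _ in range(iters):
        u = np.linalg.solve(M, tau * v + lam)    # u-update (35)
        w = u - lam / tau
        v = w / np.linalg.norm(w)                # projection onto the sphere (36)
        lam = lam + tau * (v - u)                # dual update (37)
    return v
```

At any fixed point with u = v, these updates force DᵀDu ∝ u, so the iteration can only stall at eigenvectors of DᵀD.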
Experimental setting. We implemented "vanilla ADMM" (ADMM with a constant penalty) and fast ADMM with Nesterov acceleration and restart [13]. We also implemented two methods for automatically selecting penalty parameters: residual balancing [16] and the spectral adaptive method [28]. For ℓ0 regularized linear regression, the synthetic problem from [31, 13, 28] and realistic problems from [8, 31, 28] are investigated with ρ = 1. For ℓ0 regularized image denoising, a one-dimensional synthetic problem was created by the process described in [31] and is shown in Fig. 3; for the total-variation experiments, the "Barbara", "Cameraman", and "Lena" images are investigated, with zero-mean Gaussian noise of standard deviation 20 added to each image (Fig. 4). ρ = 1 and ρ = 500 are used for the synthetic problem and the image problems, respectively. For phase retrieval, a synthetic problem is constructed with a random complex matrix D, a signal x, and noise e, with measurements c = abs(Dx + e); the three images in Fig. 4 are also used, each measured with 21 octanary pattern filters as described in [3]. For the eigenvector problem, a random matrix D is used.
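Image quality below is reported as PSNR; we assume the standard definition with peak value 255 for 8-bit images:

```python
import numpy as np

def psnr(x, ref, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit images."""
    mse = np.mean((np.asarray(x, dtype=float) - np.asarray(ref, dtype=float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

Under this convention, additive Gaussian noise with standard deviation 20 gives roughly 10 log10(255²/20²) ≈ 22 dB, consistent with the noisy-image PSNRs reported in Fig. 4.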
Figure 1:
Sensitivity to the (initial) penalty parameter τ for ℓ0 regularized linear regression, eigenvector computation, "cameraman" denoising, and phase retrieval. (top) Number of iterations needed as a function of the initial penalty parameter. (bottom) The objective/PSNR of the minima found for each nonconvex problem.

Does ADMM converge in practice?
The convergence of vanilla ADMM is quite sensitive to the choice of penalty parameter: the iterates may oscillate, and when convergence occurs it can be very slow if the penalty parameter is not properly tuned. The residual balancing method converges more often than vanilla ADMM, and the spectral adaptive ADMM converges most often. However, none of these methods uniformly beats the others, and vanilla ADMM with a highly tuned stepsize can sometimes outperform the adaptive variants.
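For reference, residual balancing updates the penalty between iterations with a multiplicative rule of the following form; the constants (ratio 10, factor 2) follow the common convention of [2] and are an assumption here, not values reported in this study.

```python
def residual_balance(tau, r_norm, d_norm, mu=10.0, factor=2.0):
    """One residual-balancing update of the penalty parameter [16].

    Increases tau when the primal residual dominates, decreases it when the
    dual residual dominates, and leaves it unchanged otherwise."""
    if r_norm > mu * d_norm:
        return tau * factor
    if d_norm > mu * r_norm:
        return tau / factor
    return tau
```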
Does the update order of H(u) and G(v) matter? In Fig. 1, ADMM is performed by first minimizing with respect to the smooth objective term and then the nonsmooth term. We repeat the experiments with the update order swapped and report the results in Fig. 2 of the appendix. When updating the nonsmooth term first, the convergence of ADMM for the phase retrieval problem becomes less reliable; however, for some problems (such as image denoising), convergence was slightly faster than with the original update order. Although the behavior of ADMM changes, there is no predictable difference between the two update orderings.

Is the local optimal solution good?
The bottom row of Fig. 1 presents the objective/PSNR achieved by the ADMM variants as the (initial) penalty parameter is varied. In general, the quality of the solution depends strongly on the penalty parameter chosen, and there does not appear to be a predictable relationship between the best penalty for convergence speed and the best penalty for solution quality.
Does the adaptive penalty work?
In Table 1, we see that adaptivity not only speeds up convergence, but for most problem instances it also yields better minimizers. This behavior is not uniform across all experiments, though, and for some problems a slightly lower objective value can be achieved using a finely tuned constant stepsize.
Table 1: Iterations (with runtime in seconds) and objective (or PSNR) for the various algorithms and applications described in the text. Absence of convergence after n iterations is indicated as n+.

Application                        Dataset       Size*            Vanilla ADMM   Residual balance [16]   Adaptive ADMM [28]
ℓ0 regularized linear regression   Synthetic     50 × 40          …              2000+(.621)             2000+(.604)
                                   Boston        506 × 13         …              2000+(.598)             2000+(.570)
                                   Diabetes      768 × 384        …              648                     …
                                   Leukemia      38 × …           …              …                       …
                                   Prostate      97 × …           …              …                       …
                                   Servo         130 × …          …              267                     267
ℓ0 regularized image restoration   Synthetic1D   100 × …          …              …                       …
                                   Barbara       512 × 512        …              200+(35.5)              200+(35.1)
                                   Cameraman     256 × 256        …              200+(5.75)              200+(5.60)
                                   Lena          512 × 512        …              200+(35.5)              200+(35.8)
Phase retrieval                    Synthetic     …                …              …                       …
                                   Barbara       512 × 512 × 21   …              59(91.1)                59(89.6)
                                   Cameraman     256 × 256 × 21   …              59(29.6)                55(19.4)
                                   Lena          512 × 512 × 21   …              59(90.1)                57(87.4)

*Size is width × height for image restoration, and width × height × filters for phase retrieval.

We have provided a detailed discussion of the performance of ADMM on several nonconvex applications: ℓ0 regularized linear regression, ℓ0 regularized image denoising, phase retrieval, and eigenvector computation. In practice, ADMM usually converges for these applications, and the penalty parameter choice has a significant effect on both convergence speed and solution quality. Adaptive penalty methods such as AADMM [28] automatically select the penalty parameter and perform the optimization with little user oversight. For most problems, adaptive stepsize methods result in faster convergence or better minimizers than vanilla ADMM with a constant, non-tuned penalty parameter. However, for some difficult nonconvex problems, the best results are still obtained by fine-tuning the penalty parameter.

References

[1] S. Bouaziz, A. Tagliasacchi, and M. Pauly. Sparse iterative closest point. In Computer Graphics Forum, volume 32, pages 113–123. Wiley Online Library, 2013.
[2] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. and Trends in Mach. Learning, 3:1–122, 2011.
[3] E. J. Candes, X. Li, and M. Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985–2007, 2015.
[4] R. Chartrand. Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Processing Letters, 14(10):707–710, 2007.
[5] R. Chartrand and B. Wohlberg. A nonconvex ADMM algorithm for group sparsity with sparse groups. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6009–6013. IEEE, 2013.
[6] B. Dong and Y. Zhang. An efficient algorithm for ℓ0 minimization in wavelet frame based image restoration. Journal of Scientific Computing, 54(2-3):350–368, 2013.
[7] J. Eckstein and D. Bertsekas. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1-3):293–318, 1992.
[8] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407–499, 2004.
[9] D. Gabay. Applications of the method of multipliers to variational inequalities. Studies in Mathematics and its Applications, 15:299–331, 1983.
[10] D. Gabay and B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers & Mathematics with Applications, 2(1):17–40, 1976.
[11] R. Glowinski and A. Marroco. Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires. ESAIM: Modélisation Mathématique et Analyse Numérique, 9:41–76, 1975.
[12] T. Goldstein and S. Osher. The split Bregman method for L1-regularized problems. SIAM Journal on Imaging Sciences, 2(2):323–343, 2009.
[13] T. Goldstein, B. O'Donoghue, S. Setzer, and R. Baraniuk. Fast alternating direction optimization methods. SIAM Journal on Imaging Sciences, 7(3):1588–1623, 2014.
[14] T. Goldstein, G. Taylor, K. Barabin, and K. Sayre. Unwrapping ADMM: Efficient distributed computing via transpose reduction. In AISTATS, 2016.
[15] B. He and X. Yuan. On non-ergodic convergence rate of Douglas-Rachford alternating direction method of multipliers. Numerische Mathematik, 130:567–577, 2015.
[16] B. He, H. Yang, and S. Wang. Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities. Jour. Optim. Theory and Appl., 106(2):337–356, 2000.
[17] M. Hong, Z.-Q. Luo, and M. Razaviyayn. Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM Journal on Optimization, 26(1):337–364, 2016.
[18] A. Kovnatsky, K. Glashoff, and M. M. Bronstein. MADMM: a generic algorithm for non-smooth optimization on manifolds. arXiv preprint arXiv:1505.07676, 2015.
[19] R. Lai and S. Osher. A splitting method for orthogonality constrained problems. Journal of Scientific Computing, 58(2):431–449, 2014.
[20] G. Li and T. K. Pong. Global convergence of splitting methods for nonconvex composite optimization. SIAM Journal on Optimization, 25(4):2434–2460, 2015.
[21] A. P. Liavas and N. D. Sidiropoulos. Parallel algorithms for constrained tensor factorization via alternating direction method of multipliers. IEEE Transactions on Signal Processing, 63(20):5450–5463, 2015.
[22] O. Miksik, V. Vineet, P. Pérez, and P. H. Torr. Distributed non-convex ADMM-inference in large-scale random fields. In British Machine Vision Conference, BMVC, 2014.
[23] G. Taylor, R. Burmeister, Z. Xu, B. Singh, A. Patel, and T. Goldstein. Training neural networks without gradients: A scalable ADMM approach. arXiv preprint arXiv:1605.02026, 2016.
[24] F. Wang, Z. Xu, and H.-K. Xu. Convergence of Bregman alternating direction method with multipliers for nonconvex composite problems. arXiv preprint arXiv:1410.8625, 2014.
[25] Y. Wang, W. Yin, and J. Zeng. Global convergence of ADMM in nonconvex nonsmooth optimization. arXiv preprint arXiv:1511.06324, 2015.
[26] Z. Wen, C. Yang, X. Liu, and S. Marchesini. Alternating direction methods for classical and ptychographic phase retrieval. Inverse Problems, 28(11):115010, 2012.
[27] Y. Xu, W. Yin, Z. Wen, and Y. Zhang. An alternating direction algorithm for matrix completion with nonnegative factors. Frontiers of Mathematics in China, 7(2):365–384, 2012.
[28] Z. Xu, M. A. T. Figueiredo, and T. Goldstein. Adaptive ADMM with spectral penalty parameter selection. arXiv preprint arXiv:1605.07246, 2016.
[29] Z. Xu, F. Huang, L. Raschid, and T. Goldstein. Non-negative factorization of the occurrence tensor from financial contracts. NIPS Tensor Workshop, 2016.
[30] C. Yang, J. Qian, A. Schirotzek, F. Maia, and S. Marchesini. Iterative algorithms for ptychographic phase retrieval. arXiv preprint arXiv:1105.5628, 2011.
[31] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320, 2005.

Appendix: more experimental results
Figure 2:
Convergence results when the nonsmooth objective term is updated first and the smooth term is updated second. Sensitivity to the (initial) penalty parameter τ is shown for the synthetic ℓ0 regularized linear regression problem, eigenvector computation, the "cameraman" denoising problem, and phase retrieval. The top row shows the convergence speed in iterations; the bottom row shows the objective/PSNR achieved by the final iterates.

ℓ0 regularized linear regression

ℓ0 regularized linear regression is the nonconvex problem

    min_x (1/2) ‖Dx − c‖² + ρ ‖x‖₀,    (15)

where D ∈ R^{n×m} is the data matrix, c is the measurement vector, and x is the vector of regression coefficients. ADMM is applied to problem (15) by solving the equivalent problem

    min_{u,v} (1/2) ‖Du − c‖² + ρ ‖v‖₀ subject to u − v = 0.    (16)

The proximal operator of the ℓ0 norm is hard-thresholding,

    hard(z, t) = argmin_x ‖x‖₀ + (1/(2t)) ‖x − z‖² = z ⊙ I_{z : |z| > √(2t)}(z),    (17)

where ⊙ represents element-wise multiplication and I_S is the indicator function of the set S: I_S(v) = 1 if v ∈ S, and I_S(v) = 0 otherwise. The steps of ADMM can then be written

    u^{k+1} = argmin_u (1/2) ‖Du − c‖² + (τ/2) ‖−u + v^k + λ^k/τ‖²    (18)
            = (DᵀD + τ I_m)⁻¹ (τ v^k + λ^k + Dᵀc)  if n ≥ m;
              (I_m − Dᵀ(τ I_n + DDᵀ)⁻¹ D)(v^k + λ^k/τ + Dᵀc/τ)  if n < m    (19)
    v^{k+1} = argmin_v ρ ‖v‖₀ + (τ/2) ‖−u^{k+1} + v + λ^k/τ‖² = hard(u^{k+1} − λ^k/τ, ρ/τ)    (20)
    λ^{k+1} = λ^k + τ (0 − u^{k+1} + v^{k+1}).    (21)

ℓ0 regularized image denoising

The ℓ0 regularizer [6] is an alternative to the ℓ1 regularizer when computing total variation [12, 13]. ℓ0 regularized image denoising solves the nonconvex problem

    min_x (1/2) ‖x − c‖² + ρ ‖∇x‖₀,    (22)

where c represents a given noisy image, ∇ is the linear gradient operator, and ‖·‖₀/‖·‖₂ denotes the ℓ0/ℓ2 norm of vectors. The steps of ADMM for this problem are

    u^{k+1} = argmin_u (1/2) ‖u − c‖² + (τ/2) ‖v^k + λ^k/τ − ∇u‖²    (23)
            = (I + τ ∇ᵀ∇)⁻¹ (c + τ ∇ᵀ(v^k + λ^k/τ))    (24)
    v^{k+1} = argmin_v ρ ‖v‖₀ + (τ/2) ‖−∇u^{k+1} + v + λ^k/τ‖² = hard(∇u^{k+1} − λ^k/τ, ρ/τ)    (25)
    λ^{k+1} = λ^k + τ (0 − ∇u^{k+1} + v^{k+1}),    (26)

where the linear systems can be solved using fast Fourier transforms.
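Both ℓ0 pipelines share the same pattern; as a concrete instance, a compact sketch of the regression solver, combining the dense-branch u-update (19) (for n ≥ m) with the hard-thresholding v-update (20). The iteration cap is a placeholder.

```python
import numpy as np

def l0_linreg_admm(D, c, rho, tau, iters=2000):
    """ADMM sketch for (16): min 0.5||Du - c||^2 + rho*||v||_0, s.t. u = v."""
    n, m = D.shape
    v = np.zeros(m)
    lam = np.zeros(m)
    M = D.T @ D + tau * np.eye(m)     # normal-equations matrix from (19)
    Dtc = D.T @ c
    for _ in range(iters):
        u = np.linalg.solve(M, tau * v + lam + Dtc)                  # (19)
        z = u - lam / tau
        v = np.where(np.abs(z) > np.sqrt(2.0 * rho / tau), z, 0.0)   # (20)
        lam = lam + tau * (v - u)                                    # (21)
    return v
```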
Phase retrieval

Ptychographic phase retrieval [30, 26] solves the problem

    min_x (1/2) ‖abs(Dx) − c‖²,    (27)

where x ∈ C^n, D ∈ C^{m×n}, and abs(·) denotes the elementwise magnitude of a complex-valued vector. ADMM is applied to the equivalent problem

    min_{u,v} (1/2) ‖abs(u) − c‖² subject to u − Dv = 0.    (28)

Define the projection operator of a complex-valued vector as

    absProj(z, c, t) = argmin_x (1/2) ‖abs(x) − c‖² + (t/2) ‖x − z‖² = ((t/(1+t)) abs(z) + (1/(1+t)) c) ⊙ sign(z),    (29)

where sign(·) denotes the elementwise phase of a complex-valued vector. In the following ADMM steps, notice that the dual variable λ ∈ C^m is complex, while the penalty parameter τ ∈ R is a real non-negative scalar:

    u^{k+1} = argmin_u (1/2) ‖abs(u) − c‖² + (τ/2) ‖Dv^k + λ^k/τ − u‖² = absProj(Dv^k + λ^k/τ, c, τ)    (30)
    v^{k+1} = argmin_v (τ/2) ‖−u^{k+1} + Dv + λ^k/τ‖² = D⁻¹(u^{k+1} − λ^k/τ)    (31)
    λ^{k+1} = λ^k + τ (0 − u^{k+1} + Dv^{k+1}).    (32)

Eigenvector problem

The eigenvector problem is a fundamental problem in numerical linear algebra. The leading eigenvector of a matrix can be recovered by solving the Rayleigh quotient maximization problem

    max_x ‖Dx‖² subject to ‖x‖ = 1.    (33)

ADMM is applied to the equivalent problem

    min_{u,v} −(1/2) ‖Du‖² + ι_{z : ‖z‖=1}(v) subject to u − v = 0,    (34)

where ι_S is the characteristic function of the set S: ι_S(v) = 0 if v ∈ S, and ι_S(v) = ∞ otherwise. The ADMM steps are

    u^{k+1} = argmin_u −(1/2) ‖Du‖² + (τ/2) ‖−u + v^k + λ^k/τ‖² = (τ I − DᵀD)⁻¹ (τ v^k + λ^k)    (35)
    v^{k+1} = argmin_v ι_{z : ‖z‖=1}(v) + (τ/2) ‖−u^{k+1} + v + λ^k/τ‖² = (u^{k+1} − λ^k/τ) / ‖u^{k+1} − λ^k/τ‖    (36)
    λ^{k+1} = λ^k + τ (0 − u^{k+1} + v^{k+1}).    (37)

Synthetic regression dataset

We now provide the detailed construction of the synthetic dataset for our linear regression experiments; the same synthetic dataset has been used in [31, 13, 28]. Based on three random normal vectors ν_a, ν_b, ν_c ∈ R^50, the data matrix D = [d_1 … d_40] ∈ R^{50×40} is defined by

    d_i = ν_a + e_i, i = 1, …, 5;   d_i = ν_b + e_i, i = 6, …, 10;   d_i = ν_c + e_i, i = 11, …, 15;   d_i = ν_i ∼ N(0, 1), i = 16, …, 40,    (38)

where the e_i are random normal vectors. The problem is to recover a sparse vector

    x*_i ≠ 0 for i = 1, …, 15, and x*_i = 0 otherwise,    (39)

from noisy measurements of the form c = Dx* + ê, where ê is a random normal vector.
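A sketch of this construction; since the nonzero amplitude of x* and the noise level are not specified above, signal and noise_std below are placeholder assumptions.

```python
import numpy as np

def synthetic_linreg(signal=1.0, noise_std=0.1, seed=0):
    """Generate the synthetic problem of (38)-(39): D in R^{50x40}, with a
    target x* supported on the first 15 coordinates."""
    rng = np.random.default_rng(seed)
    nu_a, nu_b, nu_c = rng.standard_normal((3, 50))
    cols = []
    for i in range(40):
        if i < 5:
            cols.append(nu_a + noise_std * rng.standard_normal(50))
        elif i < 10:
            cols.append(nu_b + noise_std * rng.standard_normal(50))
        elif i < 15:
            cols.append(nu_c + noise_std * rng.standard_normal(50))
        else:
            cols.append(rng.standard_normal(50))   # nu_i ~ N(0, 1)
    D = np.stack(cols, axis=1)
    x_star = np.zeros(40)
    x_star[:15] = signal                           # placeholder amplitude
    c = D @ x_star + noise_std * rng.standard_normal(50)
    return D, c, x_star
```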
Figure 3:
The synthetic one-dimensional signal for ℓ0 regularized image denoising. The ground-truth signal, the noisy signal (PSNR = 37.8), and the signal recovered by AADMM (PSNR = 45.4) are shown.
Figure 4:
The ground-truth image (left), noisy image (middle), and image recovered by AADMM (right) for ℓ0 regularized image denoising. The PSNRs of the noisy/recovered images are 21.9/24.7 for "Barbara", 22.4/27.8 for "Cameraman", and 21.9/27.9 for "Lena".