Accelerating ADMM for Efficient Simulation and Optimization
JUYONG ZHANG*†, University of Science and Technology of China
YUE PENG*, University of Science and Technology of China
WENQING OUYANG*, University of Science and Technology of China
BAILIN DENG, Cardiff University
[Figure 1: target mesh, initial mesh, and optimized mesh, with a plot of the normalized combined residual and planarity error for ADMM and our method.]

Fig. 1. We apply our accelerated ADMM solver to optimize a quad mesh, subject to hard constraints of face planarity and soft constraints of closeness to a reference surface. Our solver leads to a faster decrease of the combined residual than the original ADMM, achieving better satisfaction of the hard constraints within the same computational time (highlighted in the plot on the bottom right).
The alternating direction method of multipliers (ADMM) is a popular approach for solving optimization problems that are potentially non-smooth and with hard constraints. It has been applied to various computer graphics applications, including physical simulation, geometry processing, and image processing. However, ADMM can take a long time to converge to a solution of high accuracy. Moreover, many computer graphics tasks involve non-convex optimization, and there is often no convergence guarantee for ADMM on such problems since it was originally designed for convex optimization. In this paper, we propose a method to speed up ADMM using Anderson acceleration, an established technique for accelerating fixed-point iterations. We show that in the general case, ADMM is a fixed-point iteration of the second primal variable and the dual variable, and Anderson acceleration can be directly applied. Additionally, when the problem has a separable target function and satisfies certain conditions, ADMM becomes a fixed-point iteration of only one variable, which further reduces the computational overhead of Anderson acceleration. Moreover, we analyze a particular non-convex problem structure that is common in computer graphics, and prove the convergence of ADMM on such problems under mild assumptions. We apply our acceleration technique on a variety of optimization problems in computer graphics, with notable improvement on their convergence speed.

* Equal contributions. † Corresponding author ([email protected]).
Authors' addresses: {Juyong Zhang, Yue Peng, Wenqing Ouyang}, University of Science and Technology of China, 96 Jinzhai Road, Hefei 230026, Anhui, China, {[email protected], [email protected], [email protected]}; Bailin Deng, Cardiff University, 5 The Parade, Cardiff CF24 3AA, Wales, United Kingdom, [email protected].
© 2019 Association for Computing Machinery. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Graphics, https://doi.org/10.1145/3355089.3356491.

CCS Concepts: • Computing methodologies → Computer graphics; Animation; • Theory of computation → Nonconvex optimization.

Additional Key Words and Phrases: Physics Simulation, Geometry Optimization, ADMM, Anderson Acceleration
ACM Reference Format:
Juyong Zhang, Yue Peng, Wenqing Ouyang, and Bailin Deng. 2019. Accelerating ADMM for Efficient Simulation and Optimization. ACM Trans. Graph. 38, 6, Article 163 (November 2019), 21 pages. https://doi.org/10.1145/3355089.3356491
Many tasks in computer graphics involve solving optimization problems. For example, a geometry processing task may compute the vertex positions of a deformed mesh by minimizing its deformation energy [Sorkine and Alexa 2007], whereas a physical simulation task may optimize the node positions of a system to enforce the physical laws that govern its behavior [Martin et al. 2011; Schumacher et al. 2012]. Such tasks are often formulated as unconstrained optimization, where the target function penalizes the violation of certain conditions so that they are satisfied as much as possible by the solution. It has been an active research topic to develop fast numerical solvers for such problems, with various methods proposed in the past [Sorkine and Alexa 2007; Liu et al. 2008; Bouaziz et al. 2012; Liu et al. 2013; Bouaziz et al. 2014; Wang 2015; Kovalsky et al. 2016; Liu et al. 2017; Shtengel et al. 2017; Rabinovich et al. 2017].
On the other hand, some applications involve optimization with hard constraints, i.e., conditions that need to be enforced strictly. Such constrained optimization problems are often more difficult to solve [Nocedal and Wright 2006]. One possible solution strategy is to introduce a quadratic penalty term for the hard constraints with a large weight, thereby converting the problem into an unconstrained one that is easier to handle. However, to strictly enforce the hard constraints, their penalty weight needs to approach infinity [Nocedal and Wright 2006], which can cause instability for numerical solvers. More sophisticated techniques, such as sequential quadratic programming or the interior-point method, can enforce constraints without stability issues. However, these solvers often incur high computational costs and may not meet the performance requirements of graphics applications. The situation becomes even more challenging for non-smooth problems, where the target function is not everywhere differentiable: many constrained optimization solvers require gradient information and may not be applicable in such cases.

In recent years, the alternating direction method of multipliers (ADMM) [Boyd et al. 2011] has become a popular approach for solving optimization problems that are potentially non-smooth and subject to hard constraints. The key idea is to introduce auxiliary variables and derive an equivalent problem with a separable target function, subject to a linear compatibility constraint between the original variables and the auxiliary variables [Combettes and Pesquet 2011]. ADMM searches for a solution to this converted problem by alternately updating the original variables, the auxiliary variables, and the dual variables. With properly chosen auxiliary variables, each update step can reduce to simple sub-problems that can be solved efficiently, often in parallel with closed-form solutions. In addition, ADMM does not rely on the smoothness of the problem, and converges quickly to a solution of moderate accuracy [Boyd et al. 2011]. Such properties make ADMM an attractive choice for solving large-scale optimization problems in various applications such as signal processing [Chartrand and Wohlberg 2013; Simonetto and Leus 2014], image processing [Figueiredo and Bioucas-Dias 2010; Almeida and Figueiredo 2013], and computer vision [Liu et al. 2013]. Recently, ADMM has also been applied to computer graphics problems such as geometry processing [Bouaziz et al. 2013; Neumann et al. 2013; Zhang et al. 2014; Xiong et al. 2014; Neumann et al. 2014], physics simulation [Gregson et al. 2014; Pan and Manocha 2017; Overby et al. 2017], and computational photography [Heide et al. 2016; Xiong et al. 2017; Wang et al. 2018].

Despite the effectiveness and versatility of ADMM, there are still two major limitations for its use in computer graphics. First, although ADMM converges quickly in initial iterations, its final convergence might be slow [Boyd et al. 2011]. This makes it impractical for problems with a strong demand for solution accuracy, such as those with strict requirements on the satisfaction of hard constraints. Recent attempts to accelerate ADMM, such as [Goldstein et al. 2014; Kadkhodaie et al. 2015; Zhang and White 2018], are only designed for convex problems, which limits their applications in computer graphics.
Second, ADMM was originally designed for convex problems, whereas many computer graphics tasks involve non-convex optimization. Although ADMM turns out to be effective for many non-convex problems in practice, its convergence for general non-convex optimization remains an open research question. Recent convergence results such as [Li and Pong 2015; Hong et al. 2016; Magnússon et al. 2016; Wang et al. 2019] rely on strong assumptions that are not satisfied by many computer graphics problems.

This paper addresses these two issues of ADMM. First, we propose a method to accelerate ADMM for non-convex optimization problems. Our approach is based on Anderson acceleration [Anderson 1965; Walker and Ni 2011], a well-established technique for accelerating fixed-point iterations. Previously, Anderson acceleration has been applied to local-global solvers for unconstrained optimization problems in computer graphics [Peng et al. 2018]. Our approach expands its applicability to many constrained optimization problems as well as other unconstrained problems where local-global solvers are not feasible. To this end, we need to solve two problems: (i) we must find a way to interpret ADMM as a fixed-point iteration; (ii) as Anderson acceleration can become unstable, we should define criteria to accept the accelerated iterate and a fallback strategy when it is not accepted, similar to [Peng et al. 2018]. We show that in the general case ADMM is a fixed-point iteration of the second primal variable and the dual variable, and we can evaluate the effectiveness of an accelerated iterate via its combined residual, which is known to vanish when the solver converges. Moreover, when the problem structure satisfies some mild conditions, one of these two variables can be determined from the other; in this case ADMM becomes a fixed-point iteration of only one variable with less computational overhead, and we can accept an accelerated iterate based on a simpler condition. We apply this method to a variety of ADMM solvers for computer graphics problems, and observe a notable improvement in their convergence rates.

Additionally, we provide a new convergence proof of ADMM on non-convex problems, under weaker assumptions than the convergence results in [Li and Pong 2015; Hong et al. 2016; Magnússon et al. 2016; Wang et al. 2019]. For a particular problem structure that is common in computer graphics, we also provide sufficient conditions for the global linear convergence of ADMM. Our proofs shed new light on the convergence properties of non-convex ADMM solvers.
Optimization solvers in computer graphics.
The development of efficient optimization solvers has been an active research topic in computer graphics. One particular type of method, called local-global solvers, has been widely used for unconstrained optimization in geometry processing and physical simulation. For geometry processing, Sorkine and Alexa [2007] proposed a local-global approach to minimize deformation energy for as-rigid-as-possible mesh surface modeling. Liu et al. [2008] developed a similar method to perform conformal and isometric parameterization of triangle meshes. Bouaziz et al. [2012] extended the approach to a unified framework for optimizing discrete shapes. For physical simulation, Liu et al. [2013] proposed a local-global solver for optimization-based simulation of mass-spring systems. Bouaziz et al. [2014] extended this approach to the projective dynamics framework for implicit time integration of physical systems via energy minimization.

Local-global solvers often converge quickly to an approximate solution, but may be slow for final convergence. Other methods have been proposed to achieve improved convergence rates. For
geometry processing, Kovalsky et al. [2016] achieved fast convergence of geometric optimization by iteratively minimizing a local quadratic proxy function. Rabinovich et al. [2017] proposed a scalable approach to compute locally injective mappings, via local-global minimization of a reweighted proxy function. Claici et al. [2017] proposed a preconditioner for fast minimization of distortion energies. Shtengel et al. [2017] applied the idea of majorization-minimization [Lange 2004] to iteratively update and minimize a convex majorizer of the target energy in geometric optimization. Zhu et al. [2018] proposed a fast solver for distortion energy minimization, using a blended quadratic energy proxy together with an improved line-search strategy and termination criteria. For physical simulation, Wang [2015] proposed a Chebyshev semi-iterative acceleration technique for projective dynamics. Later, Wang and Yang [2016] developed a GPU-friendly gradient descent method for elastic body simulation, using Jacobi preconditioning and Chebyshev acceleration. Liu et al. [2017] proposed an L-BFGS solver for physical simulation, with faster convergence than the projective dynamics solver from [Bouaziz et al. 2014]. Brandt et al. [2018] performed projective dynamics simulation in a reduced subspace to compute fast approximate solutions for high-resolution meshes.
ADMM.
ADMM is a popular solver for optimization problems with separable target functions and linear side constraints [Boyd et al. 2011]. Using auxiliary variables and indicator functions, such a formulation allows for non-smooth optimization with hard constraints, with wide applications in signal processing [Erseghe et al. 2011; Simonetto and Leus 2014; Shi et al. 2014], image processing [Figueiredo and Bioucas-Dias 2010; Almeida and Figueiredo 2013], computer vision [Hu et al. 2013; Liu et al. 2013; Yang et al. 2017], computational imaging [Chan et al. 2017], automatic control [Lin et al. 2013], and machine learning [Zhang and Kwok 2014; Hajinezhad et al. 2016]. ADMM has also been used in computer graphics to handle non-smooth optimization problems [Bouaziz et al. 2013; Neumann et al. 2013; Zhang et al. 2014; Xiong et al. 2014; Neumann et al. 2014] or to benefit from its fast initial convergence [Gregson et al. 2014; Heide et al. 2016; Xiong et al. 2017; Pan and Manocha 2017; Overby et al. 2017; Wang et al. 2018].

ADMM was originally designed for convex optimization [Gabay and Mercier 1976; Fortin and Glowinski 1983; Eckstein and Bertsekas 1992]. For such problems, its global linear convergence has been established in [Lin et al. 2015; Deng and Yin 2016; Giselsson and Boyd 2017], but these proofs require both terms in the target function to be convex. In comparison, our proof of global linear convergence allows for non-convex terms in the target function, which is better aligned with computer graphics problems. In practice, ADMM works well for many non-convex problems as well [Wen et al. 2012; Chartrand 2012; Chartrand and Wohlberg 2013; Miksik et al. 2014; Lai and Osher 2014; Liavas and Sidiropoulos 2015], but it is more challenging to establish its convergence for general non-convex problems. Only very recently have such convergence proofs been given, under strong assumptions [Li and Pong 2015; Hong et al. 2016; Magnússon et al. 2016; Wang et al. 2019]. We provide in this paper a general proof of convergence for non-convex problems under weaker assumptions.

It is well known that ADMM converges quickly to an approximate solution, but may take a long time to converge to a solution of high accuracy [Boyd et al. 2011]. This has motivated researchers to explore acceleration techniques for ADMM. Goldstein et al. [2014] and Kadkhodaie et al. [2015] applied Nesterov's acceleration [Nesterov 1983], whereas Zhang and White [2018] applied GMRES acceleration to a special class of problems where the ADMM iterates become linear. All these methods are designed for convex problems only, which limits their applicability in computer graphics.
Anderson acceleration.
Anderson acceleration [Walker and Ni 2011] is an established technique to speed up the convergence of a fixed-point iteration. It was first proposed in [Anderson 1965] for solving nonlinear integral equations, and independently rediscovered later by Pulay [1980; 1982] for accelerating the self-consistent field method in quantum chemistry. Its key idea is to utilize the m previous iterates to compute a new iterate that converges faster to the fixed point. It is in fact a quasi-Newton method for finding a root of the residual function, approximating its inverse Jacobian using previous iterates [Eyert 1996; Fang and Saad 2009; Rohwedder and Schneider 2011]. Recently, a renewed interest in this method has led to analyses of its convergence [Toth and Kelley 2015; Toth et al. 2017], as well as its application to various numerical problems [Sterck 2012; Lipnikov et al. 2013; Pratapa et al. 2016; Suryanarayana et al. 2019; Ho et al. 2017]. Peng et al. [2018] noted that local-global solvers in computer graphics can be treated as fixed-point iterations, and applied Anderson acceleration to improve their convergence. Additionally, to address the stability issue of classical Anderson acceleration [Walker and Ni 2011; Potra and Engler 2013], they utilize the monotonic energy decrease of local-global solvers and only accept an accelerated iterate when it decreases the target energy. Fang and Saad [2009] called classical Anderson acceleration the type-II method within an Anderson family of multi-secant methods. Another member of the family, called the type-I method, instead uses a quasi-Newton approximation of the Jacobian of the fixed-point residual function [Walker and Ni 2011], and has been analyzed recently in [Zhang et al. 2018]. In this paper, we focus our discussion on the type-II method.

ADMM.
Let us consider an optimization problem

    min_x Φ(x, Dx + h).    (1)

Here x can be the vertex positions of a discrete geometric shape, or the node positions of a physical system at a particular time instance. The quantity Dx + h encodes a transformation of the positions x relevant for the optimization problem, such as the deformation gradient of each tetrahedron element in an elastic object. The notation Φ(x, Dx + h) signifies that the target function contains a term that directly depends on Dx + h, such as an elastic energy dependent on the deformation gradient. In some applications, the optimization enforces hard constraints on x or Dx + h, i.e., conditions that need to be strictly satisfied by the solution. Such hard constraints can be encoded using an indicator function term within the target function. Specifically, suppose we want to enforce a condition y ∈ C, where y is a subset of the components of x or Dx + h, and C is the feasible set. Then we include the following term into Φ:

    σ_C(y) = { 0 if y ∈ C;  +∞ otherwise }.

By definition, if x* is a solution, then the corresponding components y* must satisfy y* ∈ C; otherwise they would result in a target function value of +∞ instead of the minimum. Examples of such an approach to modeling hard constraints can be found in [Deng et al. 2015].

In many applications, the optimization problem (1) can be non-linear, non-convex, and potentially non-smooth. It is challenging to solve such a problem numerically, especially when hard constraints are involved. One common technique is to introduce an auxiliary variable z = Dx + h to derive an equivalent problem

    min_{x,z} Φ(x, z)    s.t.  W(z − Dx − h) = 0,    (2)

where W is a diagonal matrix with positive diagonal elements. W can be the identity matrix in the trivial case, or a diagonal scaling matrix that improves conditioning [Giselsson and Boyd 2017; Overby et al. 2017]. ADMM [Boyd et al. 2011] is widely used to solve such problems. For ease of discussion, let us consider the problem

    min_{x,z} Φ(x, z)    s.t.  Ax − Bz = c.    (3)

Its solution corresponds to a stationary point of the augmented Lagrangian function

    L(x, z, u) = Φ(x, z) + µ⟨u, Ax − Bz − c⟩ + (µ/2)∥Ax − Bz − c∥²
               = Φ(x, z) + (µ/2)∥Ax − Bz + u − c∥² − (µ/2)∥u∥².    (4)

Here u is the dual variable, µ > 0 is the penalty parameter, and x and z are the primal variables. ADMM searches for a stationary point by alternately updating x, z, and u, resulting in the following iteration scheme [Boyd et al. 2011]:

    x^{k+1} = argmin_x L(x, z^k, u^k),
    z^{k+1} = argmin_z L(x^{k+1}, z, u^k),    (5)
    u^{k+1} = u^k + Ax^{k+1} − Bz^{k+1} − c.

We can also update z before x, resulting in an alternative scheme:

    z^{k+1} = argmin_z L(x^k, z, u^k),
    x^{k+1} = argmin_x L(x, z^{k+1}, u^k),    (6)
    u^{k+1} = u^k + Ax^{k+1} − Bz^{k+1} − c.

In this paper, we refer to the scheme (5) as the x-z-u iteration, and to the scheme (6) as the z-x-u iteration. In both cases, the updates of z and x often reduce to simple subproblems that can potentially be solved in parallel. According to [Boyd et al. 2011], the optimality condition of ADMM is that both its primal residual and its dual residual vanish. For both iteration schemes above, the primal residual is defined as

    r_p^{k+1} = Ax^{k+1} − Bz^{k+1} − c.

As for the dual residual: for the x-z-u iteration it is defined as

    r_d^{k+1} = µ A^T B (z^{k+1} − z^k),    (7)

whereas for the z-x-u iteration it is defined as

    r_d^{k+1} = µ B^T A (x^{k+1} − x^k).    (8)

Intuitively, the primal residual measures the violation of the linear side constraint, whereas the dual residual measures the violation of the dual feasibility condition [Boyd et al. 2011]. Accordingly, ADMM is terminated when both ∥r_p^{k+1}∥ and ∥r_d^{k+1}∥ are small enough.
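To make the schemes concrete, the following minimal sketch (ours, not from the paper's released code) runs the x-z-u iteration (5) on a toy instance of problem (3) with B = I: f(x) = ½∥x − x̃∥² plus a box-constraint indicator on z, so both subproblems have closed-form solutions. The function name and the toy problem are our own choices.

```python
import numpy as np

def admm_xzu(A, c, x_tilde, lo, hi, mu=10.0, max_iters=500, eps=1e-10):
    """x-z-u ADMM (scheme (5)) on a toy instance of problem (3) with B = I:
         min_{x,z} 0.5*||x - x_tilde||^2 + sigma_C(z)   s.t.   A x - z = c,
       where C = {z : lo <= z <= hi}, so the z-update is a box projection."""
    n, m = A.shape[1], A.shape[0]
    x, z, u = np.zeros(n), np.zeros(m), np.zeros(m)
    H = np.eye(n) + mu * A.T @ A              # constant x-subproblem matrix
    for _ in range(max_iters):
        # x-update: argmin_x 0.5||x - x_tilde||^2 + (mu/2)||Ax - z - c + u||^2
        x = np.linalg.solve(H, x_tilde + mu * A.T @ (z + c - u))
        z_old = z
        # z-update: project (Ax - c + u) onto the box C
        z = np.clip(A @ x - c + u, lo, hi)
        # dual update as in (5)
        r_primal = A @ x - z - c
        u = u + r_primal
        r_dual = mu * A.T @ (z - z_old)       # dual residual (7) with B = I
        if np.linalg.norm(r_primal) < eps and np.linalg.norm(r_dual) < eps:
            break                             # both residuals vanish: terminate
    return x, z, u
```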
Anderson acceleration.

ADMM is easy to parallelize and converges quickly to an approximate solution. However, it can take a long time to converge to a solution of high accuracy [Boyd et al. 2011]. In the following subsections, we will discuss how to apply Anderson acceleration [Walker and Ni 2011] to improve its convergence. Anderson acceleration is a technique to speed up the convergence of a fixed-point iteration G: ℝⁿ → ℝⁿ, by utilizing the current iterate as well as the m previous iterates. Let q^{k−m}, q^{k−m+1}, ..., q^k be the latest m + 1 iterates, and denote their residuals under G as F^{k−m}, F^{k−m+1}, ..., F^k, where F^j = G(q^j) − q^j (j = k − m, ..., k). Then the accelerated iterate is computed as

    q^{k+1} = (1 − β) ( q^k − Σ_{j=1}^m θ_j* (q^{k−j+1} − q^{k−j}) )
            + β ( G(q^k) − Σ_{j=1}^m θ_j* (G(q^{k−j+1}) − G(q^{k−j})) ),    (9)

where (θ_1*, ..., θ_m*) is the solution to a linear least-squares problem:

    min_{(θ_1, ..., θ_m)} ∥ F^k − Σ_{j=1}^m θ_j (F^{k−j+1} − F^{k−j}) ∥².    (10)

In Eq. (9), β ∈ (0, 1] is a mixing parameter, and is typically set to 1 [Walker and Ni 2011]. We follow this convention throughout this paper. Previously, Anderson acceleration has been applied to speed up local-global solvers in computer graphics [Peng et al. 2018].

To speed up ADMM with Anderson acceleration, we must first define its iteration scheme as a fixed-point iteration. For the x-z-u iteration, we note that x^{k+1} depends only on z^k and u^k. Therefore, by treating x^{k+1} as a function of (z^k, u^k), we can rewrite z^{k+1}, and subsequently u^{k+1}, as a function of (z^k, u^k) as well. In this way, the x-z-u iteration can be treated as a fixed-point iteration of (z, u): (z^{k+1}, u^{k+1}) = G(z^k, u^k). Similarly, we can treat the z-x-u scheme as a fixed-point iteration of (x, u).

In addition, to ensure stability for Anderson acceleration, we should define criteria to evaluate the effectiveness of an accelerated iterate, as well as a fallback strategy when the criteria are not met. Goldstein et al. [2014] pointed out that if the problem is convex, then its combined residual is monotonically decreased by ADMM. For the x-z-u iteration, the combined residual is defined as

    r^{k+1}_{x-z-u} = µ∥Ax^{k+1} − Bz^{k+1} − c∥² + µ∥B(z^{k+1} − z^k)∥².    (11)

Here the first term is a measure of the primal residual, whereas the second term is related to the dual residual (7) but without the matrix A^T.
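As a concrete reference, a minimal implementation of one accelerated update per Eqs. (9)–(10), with β = 1 as adopted above, might look as follows; the function name and interface are ours.

```python
import numpy as np

def anderson_step(G_hist, F_hist):
    """One Anderson acceleration step, Eqs. (9)-(10) with beta = 1.
    G_hist: [G(q_{k-m}), ..., G(q_k)]  (fixed-point images, oldest first)
    F_hist: [F_{k-m}, ..., F_k] with F_j = G(q_j) - q_j  (residuals)
    Returns the accelerated iterate q_{k+1}."""
    m = len(F_hist) - 1
    if m == 0:
        return G_hist[-1]                  # no history yet: plain iteration
    # Columns are the differences F_{k-j+1} - F_{k-j}, j = 1..m (Eq. (10))
    dF = np.stack([F_hist[-j] - F_hist[-j-1] for j in range(1, m + 1)], axis=1)
    theta, *_ = np.linalg.lstsq(dF, F_hist[-1], rcond=None)
    # Eq. (9) with beta = 1: combine the images G(q_j) with the same theta
    dG = np.stack([G_hist[-j] - G_hist[-j-1] for j in range(1, m + 1)], axis=1)
    return G_hist[-1] - dG @ theta
```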
Algorithm 1: Anderson acceleration for ADMM with x-z-u iteration.

Data: x⁰, z⁰, u⁰: initial values of the variables; L: the augmented Lagrangian function; m: the number of previous iterates used for acceleration; AA(G, F): Anderson acceleration from a sequence G of fixed-point mapping results of previous iterates, and a sequence F of their corresponding fixed-point residuals; I_max: the maximum number of iterations; ε: convergence threshold for the combined residual.

    x_default = x⁰; z_default = z⁰; u_default = u⁰; r_prev = +∞; j = 0; reset = TRUE; k = 0;
    while TRUE do
        // Run one iteration of ADMM
        x⋆ = argmin_x L(x, z^k, u^k);
        z⋆ = argmin_z L(x⋆, z, u^k);
        u⋆ = u^k + Ax⋆ − Bz⋆ − c;
        // Compute the combined residual
        r = ∥Ax⋆ − Bz⋆ − c∥² + ∥B(z⋆ − z^k)∥²;
        if reset == TRUE OR r < r_prev then
            // Record the latest accepted iterate
            x_default = x⋆; z_default = z⋆; u_default = u⋆;
            r_prev = r; reset = FALSE;
            // Compute the accelerated iterate
            g^j = (z⋆, u⋆); f^j = (z⋆ − z^k, u⋆ − u^k);
            j = j + 1; m' = min(m, j);
            (z^{k+1}, u^{k+1}) = AA([g^j, ..., g^{j−m'}], [f^j, ..., f^{j−m'}]);
            k = k + 1;
        else
            // Revert to the last accepted iterate
            z^k = z_default; u^k = u_default;
            reset = TRUE;
        end if
        if k ≥ I_max OR r < ε then    // Check termination
            return x_default;         // Return the last accepted x
        end if
    end while

The combined residual for the z-x-u iteration is defined as

    r^{k+1}_{z-x-u} = µ∥Ax^{k+1} − Bz^{k+1} − c∥² + µ∥A(x^{k+1} − x^k)∥².    (12)

Although [Goldstein et al. 2014] only proved the monotonic decrease of the combined residual for convex problems, our experiments show that the combined residual is decreased by the majority of iterates from the non-convex ADMM solvers considered in this paper. Indeed, if ADMM converges to a solution, then both the primal residual Ax^{k+1} − Bz^{k+1} − c and the variable changes z^{k+1} − z^k and x^{k+1} − x^k must converge to zero, so the combined residual must converge to zero as well. Therefore, we evaluate the effectiveness of an accelerated iterate by checking whether it decreases the combined residual compared with the previous iteration, and revert to the un-accelerated ADMM iterate if this is not the case.

Algorithm 1 summarizes our Anderson acceleration approach for the x-z-u iteration. Note that the evaluation of the combined residual requires computing the change of z in one un-accelerated ADMM iteration. However, given an accelerated iterate (z_AA, u_AA), it is often difficult to find a pair (z†, u†) that leads to (z_AA, u_AA) after one ADMM iteration (i.e., (z_AA, u_AA) = G(z†, u†)). Therefore, we run one ADMM iteration on (z_AA, u_AA) instead, and use the resulting values (z⋆, u⋆) = G(z_AA, u_AA) to evaluate the combined residual. If the accelerated iterate is accepted, then the computation of (z⋆, u⋆) can be reused in the next step of the algorithm and incurs no overhead. We can derive an acceleration method for the z-x-u iteration in a similar way, by swapping x and z and adopting Eq. (8) for the computation of the combined residual, as summarized in Algorithm 2.
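Wiring the two sketches above together gives the control flow of Algorithm 1 in compact form. This is our simplified illustration: unlike Algorithm 1, it restarts the acceleration history after a rejection, and it assumes a caller-supplied G that performs one un-accelerated ADMM pass on the stacked iterate and reports the combined residual (11).

```python
import numpy as np

def aa_admm_xzu(G, z0, u0, m=6, max_iters=500, eps=1e-12):
    """Control flow of Algorithm 1 (simplified): accept an accelerated
    iterate only if it decreases the combined residual, else fall back.
    G(q) runs one x-z-u ADMM iteration on the stacked iterate q = (z, u)
    and returns (q_next, r) with r the combined residual (11)."""
    q = np.concatenate([z0, u0])
    q_default, r_prev = q, np.inf
    G_hist, F_hist = [], []
    for _ in range(max_iters):
        q_star, r = G(q)                      # one un-accelerated ADMM pass
        if r < r_prev:                        # accept the current iterate
            q_default, r_prev = q_star, r
            G_hist.append(q_star)
            F_hist.append(q_star - q)         # fixed-point residual f_j
            G_hist, F_hist = G_hist[-(m + 1):], F_hist[-(m + 1):]
            q = anderson_step(G_hist, F_hist) # accelerated candidate
        else:                                 # reject: revert, restart history
            q, r_prev = q_default, np.inf
            G_hist, F_hist = [], []
        if r < eps:
            break
    return q_default
```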
Algorithm 2: Anderson acceleration for ADMM with z-x-u iteration.

    x_default = x⁰; u_default = u⁰; r_prev = +∞; j = 0; reset = TRUE; k = 0;
    while TRUE do
        z⋆ = argmin_z L(x^k, z, u^k);
        x⋆ = argmin_x L(x, z⋆, u^k);
        u⋆ = u^k + Ax⋆ − Bz⋆ − c;
        r = ∥Ax⋆ − Bz⋆ − c∥² + ∥A(x⋆ − x^k)∥²;
        if reset == TRUE OR r < r_prev then
            x_default = x⋆; u_default = u⋆; r_prev = r; reset = FALSE;
            j = j + 1; m' = min(m, j);
            g^j = (x⋆, u⋆); f^j = (x⋆ − x^k, u⋆ − u^k);
            (x^{k+1}, u^{k+1}) = AA([g^j, ..., g^{j−m'}], [f^j, ..., f^{j−m'}]);
            k = k + 1;
        else
            x^k = x_default; u^k = u_default; reset = TRUE;
        end if
        if k ≥ I_max OR r < ε then return x_default; end if
    end while

Remark 3.1. If the target function Φ contains an indicator function for a hard constraint on the primal variable updated in the second step of an ADMM iteration (i.e., z in the x-z-u iteration, or x in the z-x-u iteration), then after each iteration this variable must satisfy the hard constraint. However, as Anderson acceleration computes the accelerated iterate via an affine combination of previous iterates, the accelerated z_AA or x_AA may violate the constraint unless its feasible set is an affine space. In other words, the accelerated iterate may not correspond to a valid ADMM iteration, and may cause issues if it is used as a solution. Therefore, to apply Anderson acceleration, we should ensure that Φ contains no indicator function associated with the primal variable updated in the second step of the original ADMM iteration. This does not limit the applicability of our method, because it can always be achieved by introducing auxiliary variables and choosing an appropriate iteration scheme. The simulation in Fig. 4 is an example of changing the iteration scheme to allow acceleration.

The general approach in Section 3.2 does not assume any special structure of the target function. When the target function terms for x and z are separable, it is possible to further improve the efficiency of acceleration. To this end, we consider the following problem

    min_{x,z} f(x) + g(z)    s.t.  Ax − Bz = c.    (13)

Moreover, we assume this problem satisfies the following properties:
Assumption 3.1.
Matrix B is invertible.

Assumption 3.2. f(x) is a strongly convex quadratic function

    f(x) = ½ (x − x̃)^T G (x − x̃),    (14)

where x̃ is a constant and G is a symmetric positive definite matrix.

One example of such optimization is the implicit time integration of elastic bodies in [Overby et al. 2017], where x̃ is the predicted values of the node positions x without internal forces, G = M/∆t² with M the mass matrix and ∆t the integration time step, the auxiliary variable z stacks the deformation gradients of all elements, and g(z) sums the elastic potential energy over all elements. For the problem (13), the x-z-u iteration of ADMM becomes

    x^{k+1} = (G + µA^T A)^{-1} (G x̃ + µA^T (Bz^k + c − u^k)),    (15)
    z^{k+1} = argmin_z ( g(z) + (µ/2)∥Ax^{k+1} − Bz − c + u^k∥² ),    (16)
    u^{k+1} = u^k + Ax^{k+1} − Bz^{k+1} − c.    (17)

And the z-x-u iteration becomes

    z^{k+1} = argmin_z ( g(z) + (µ/2)∥Ax^k − Bz − c + u^k∥² ),    (18)
    x^{k+1} = (G + µA^T A)^{-1} (G x̃ + µA^T (Bz^{k+1} + c − u^k)),    (19)
    u^{k+1} = u^k + Ax^{k+1} − Bz^{k+1} − c.    (20)

Similar to Remark 3.1, we assume that the target function contains no indicator function for the primal variable updated in the second step. The general acceleration algorithms in Section 3.2 treat ADMM as a fixed-point iteration of (z, u) or (x, u). Next, we will show that if the problem satisfies certain conditions, then ADMM becomes a fixed-point iteration of only one variable, allowing us to reduce the overhead of Anderson acceleration and improve its effectiveness.

Remark 3.2. Without assuming the convexity of the function g(·), there may be multiple solutions to the minimization problems in (16) and (18). Throughout this paper, we assume the solver adopts a deterministic algorithm for (16) and (18), so that given the same values of x and u it always returns the same value of z.

x-z-u iteration. For the x-z-u iteration (15)-(17), under certain conditions u^{k+1} can be represented as a function of z^{k+1}:

Proposition 3.1. If the optimization problem (13) satisfies Assumptions 3.1 and 3.2, and the function g(z) is differentiable, then the x-z-u iteration (15)-(17) satisfies

    u^{k+1} = (1/µ) B^{-T} ∇g(z^{k+1}).    (21)

A proof is given in Appendix A. Proposition 3.1 shows that u^{k+1} can be recovered from z^{k+1}. Therefore, we can treat the x-z-u iteration (15)-(17) as a fixed-point iteration of z instead of (z, u), and apply Anderson acceleration to z alone. From the accelerated z_AA, we recover its corresponding dual variable u_AA via Eq. (21). This approach brings two major benefits. First, the main computational overhead of Anderson acceleration in each iteration is updating the normal-equation system for the problem (10), which involves inner products of time complexity O(mn), where n is the dimension of the variables that undergo fixed-point iteration [Peng et al. 2018].
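In an implementation, the system matrix G + µA^T A of the x-updates (15) and (19) is constant, so it is typically factorized once; the dual recovery (21) similarly needs only a solve with B^T. Below is a sketch under our own naming, assuming scipy sparse matrices G, A, B, vectors c, x_tilde, scalar mu, and a gradient routine grad_g are given.

```python
import numpy as np
from scipy.sparse.linalg import factorized

# Assumed given: sparse G, A, B; vectors c, x_tilde; scalar mu; grad_g(z).
solve_x  = factorized((G + mu * (A.T @ A)).tocsc())  # prefactorize once
solve_BT = factorized(B.T.tocsc())                   # applies B^{-T}

def update_x(z, u):
    """x-update of (15)/(19): (G + mu A^T A) x = G x_tilde + mu A^T (B z + c - u)."""
    return solve_x(G @ x_tilde + mu * (A.T @ (B @ z + c - u)))

def recover_u(z):
    """Dual recovery of Eq. (21): u = (1/mu) B^{-T} grad_g(z)."""
    return solve_BT(grad_g(z)) / mu
```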
Algorithm 3: Anderson acceleration for ADMM with x-z-u iteration, on a problem (13) that satisfies Assumptions 3.1 and 3.2, with a differentiable g.

    r_prev = +∞; j = 0; reset = TRUE; k = 0;
    while TRUE do
        // Update x with (15) and compute the residual with (22)
        x^{k+1} = (G + µA^T A)^{-1} (G x̃ + µA^T (Bz^k + c − u^k));
        r = ∥Ax^{k+1} − Bz^k − c∥²;
        if reset == FALSE AND r ≥ r_prev then    // Check the residual
            z^k = z_default;                     // Revert to the un-accelerated z
            // Re-compute u and x with (17) and (15)
            u^k = u^{k−1} + Ax^k − Bz^k − c;
            x^{k+1} = (G + µA^T A)^{-1} (G x̃ + µA^T (Bz^k + c − u^k));
            r = ∥Ax^{k+1} − Bz^k − c∥²;          // Re-compute the residual
            reset = TRUE;
        end if
        if k + 1 ≥ I_max OR r < ε then return x^{k+1}; end if   // Check termination
        // Compute the un-accelerated z value with (16)
        z_default = argmin_z ( g(z) + (µ/2)∥Ax^{k+1} − Bz − c + u^k∥² );
        // Compute the accelerated z value
        j = j + 1; m' = min(m, j);
        g^j = z_default; f^j = z_default − z^k;
        z^{k+1} = AA([g^j, ..., g^{j−m'}], [f^j, ..., f^{j−m'}]);
        // Recover a compatible u value with (21)
        u^{k+1} = (1/µ) B^{-T} ∇g(z^{k+1});
        k = k + 1; r_prev = r;
    end while

Since B is invertible, u and z have the same dimension; thus this new approach halves the computational cost of the inner products. Another benefit is a simpler criterion for the effectiveness of an accelerated iterate, based on the following property:

Proposition 3.2. Suppose the problem (13) satisfies Assumptions 3.1 and 3.2, and the function g(z) is differentiable. Let z^{k+1} = G_xzu(z^k) denote the fixed-point iteration of z induced by the x-z-u iteration (15)-(17). Then z^{k+1} is a fixed point of the mapping G_xzu(·) if and only if

    Ax^{k+2} − Bz^{k+1} − c = 0.    (22)

A proof is given in Appendix B. Note that the left-hand side of (22) has a similar form to the primal residual, but involves the value of x in the next iteration. Accordingly, we evaluate the effectiveness of an accelerated iterate z_AA and its corresponding dual variable u_AA by first computing a new value x⋆ according to the x-update step (15), then evaluating the residual

    r̂_{x-z-u} = Ax⋆ − Bz_AA − c.

We only accept z_AA if it leads to a smaller norm of this residual compared to the previous iteration; otherwise, we revert to the last un-accelerated iterate. If z_AA is accepted, then x⋆ can be reused in the next step. The main benefit here is that we do not need to run an additional ADMM iteration to verify the effectiveness of z_AA, which incurs less computational overhead when the accelerated iterate is rejected. This acceleration strategy is summarized in Algorithm 3. Fig. 2 shows an example where accelerating z alone leads to a faster decrease of the combined residual than accelerating z and u together.

z-x-u iteration. Similar to the previous discussion, when the problem satisfies certain conditions, the z-x-u scheme is a fixed-point iteration of only one variable. In particular, we have:

Proposition 3.3. If the optimization problem (13) satisfies Assumptions 3.1 and 3.2, then the z-x-u iteration (18)-(20) satisfies

    x^{k+1} = x̃ − µ G^{-1} A^T u^{k+1}.    (23)

A proof is given in Appendix C. This property implies that x^{k+1} can be recovered from u^{k+1}; thus we can treat the z-x-u scheme (18)-(20) as a fixed-point iteration of u instead of (x, u). In theory, we can apply Anderson acceleration to the history of u to obtain an accelerated iterate u_AA, and recover the corresponding x_AA from Eq. (23).
However, this would require solving a linear system with matrix G, which can be computationally expensive. Instead, we note that x^{k+1} and u^{k+1} are related by an affine map, and this relation is satisfied by every previous pair of x and u values. Then, since u_AA is an affine combination of previous u values, we can apply the same affine combination coefficients to the corresponding previous x values to obtain x_AA, which is guaranteed to satisfy Eq. (23) together with u_AA. As the affine combination coefficients are computed from u only, this still reduces the computational cost compared to applying Anderson acceleration to (x, u). Similar to the x-z-u case, we can verify the convergence of the z-x-u iteration by comparing x in the current iteration with the value of z in the next iteration:

Proposition 3.4. Suppose the problem (13) satisfies Assumptions 3.1 and 3.2. Let u^{k+1} = G_zxu(u^k) denote the fixed-point iteration of u induced by the z-x-u iteration (18)-(20). Then u^{k+1} is a fixed point of the mapping G_zxu(·) if and only if

    Ax^{k+1} − Bz^{k+2} − c = 0.    (24)

Accordingly, we evaluate the effectiveness of u_AA and x_AA by computing from them a z⋆ using Eq. (18), and evaluating the residual r̂_{z-x-u} = Ax_AA − Bz⋆ − c. We accept u_AA if the norm of this residual is smaller than in the previous iteration, and revert to the last un-accelerated iterate otherwise. If u_AA is accepted, then z⋆ is reused in the next step. Algorithm 4 summarizes our approach.

Remark 3.3. We have shown that ADMM can be reduced to a fixed-point iteration of the second primal variable or the dual variable based on Assumptions 3.1 and 3.2, and (for the x-z-u iteration) the smoothness of g. In fact, these assumptions can be further relaxed. We refer the reader to Appendix E for more details. Fig. 11 is an example of using such relaxed conditions to reduce the fixed-point iteration to one variable.
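The coefficient reuse described above (and used in Algorithm 4 below) can be implemented as in the following sketch of ours: θ* is fitted from the u-history only, and the same affine combination is then applied to the stored x values, so the returned pair satisfies Eq. (23) whenever all history pairs do.

```python
import numpy as np

def accelerate_xu(gx_hist, gu_hist, fu_hist):
    """gx_hist/gu_hist: un-accelerated x and u values of the last m+1
    iterations (oldest first); fu_hist: fixed-point residuals of u.
    Returns (x_AA, u_AA)."""
    m = len(fu_hist) - 1
    if m == 0:
        return gx_hist[-1], gu_hist[-1]
    dF = np.stack([fu_hist[-j] - fu_hist[-j-1] for j in range(1, m + 1)], axis=1)
    theta, *_ = np.linalg.lstsq(dF, fu_hist[-1], rcond=None)  # from u only
    dGu = np.stack([gu_hist[-j] - gu_hist[-j-1] for j in range(1, m + 1)], axis=1)
    dGx = np.stack([gx_hist[-j] - gx_hist[-j-1] for j in range(1, m + 1)], axis=1)
    # Apply the same affine combination to both histories
    return gx_hist[-1] - dGx @ theta, gu_hist[-1] - dGu @ theta
```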
Algorithm 4: Anderson acceleration for ADMM with z-x-u iteration, on a problem (13) that satisfies Assumptions 3.1 and 3.2.

    r_prev = +∞; j = 0; reset = TRUE; k = 0;
    while TRUE do
        // Update z with (18) and compute the residual with (24)
        z^{k+1} = argmin_z ( g(z) + (µ/2)∥Ax^k − Bz − c + u^k∥² );
        r = ∥Ax^k − Bz^{k+1} − c∥²;
        // Check whether the residual increases
        if reset == FALSE AND r ≥ r_prev then
            // Revert to the un-accelerated x, u
            x^k = x_default; u^k = u_default;
            z^{k+1} = argmin_z ( g(z) + (µ/2)∥Ax^k − Bz − c + u^k∥² );
            r = ∥Ax^k − Bz^{k+1} − c∥²;
            reset = TRUE;
        end if
        if k + 1 ≥ I_max OR r < ε then return x^k; end if
        // Compute the un-accelerated x and u
        x_default = (G + µA^T A)^{-1} (G x̃ + µA^T (Bz^{k+1} + c − u^k));
        u_default = u^k + Ax_default − Bz^{k+1} − c;
        // Use the history of u to compute the affine coefficients
        j = j + 1; m' = min(m, j);
        g_x^j = x_default; g_u^j = u_default; f_u^j = u_default − u^k;
        (θ_1*, ..., θ_{m'}*) = argmin_{(θ_1,...,θ_{m'})} ∥ f_u^j − Σ_{i=1}^{m'} θ_i (f_u^{j−i+1} − f_u^{j−i}) ∥²;
        // Compute the accelerated x and u with the coefficients
        x^{k+1} = g_x^j − Σ_{i=1}^{m'} θ_i* (g_x^{j−i+1} − g_x^{j−i});
        u^{k+1} = g_u^j − Σ_{i=1}^{m'} θ_i* (g_u^{j−i+1} − g_u^{j−i});
        k = k + 1; r_prev = r;
    end while

For Anderson acceleration to be applicable, an ADMM solver must already be convergent. However, many ADMM solvers used in computer graphics lack a convergence guarantee due to the non-convexity of the problems they solve. Although ADMM works well for many non-convex problems in practice, convergence proofs on such problems rely on strong assumptions that are often not satisfied by graphics problems [Li and Pong 2015; Hong et al. 2016; Magnússon et al. 2016; Wang et al. 2019]. In this subsection, we discuss the convergence of ADMM on the problem (13), where the term g in the target function can be non-convex. We first provide a set of conditions for linear convergence of ADMM on such problems, and then give more general convergence proofs using weaker assumptions than existing results in the literature. As the problem structure (13) is common in computer graphics, our new results can potentially expand the applicability of ADMM for graphics problems.

To ease the presentation, we first introduce some notation. To account for the fact that the target function may be unbounded from above due to an indicator function, we suppose all the functions are mappings to ℝ ∪ {+∞}. Following [Rockafellar 1997], for a function F we define its effective domain and level set as

    dom(F) := {x | F(x) < +∞},    L_α^F := {x | F(x) ≤ α} for a given α ∈ ℝ.

A function F is level-bounded if L_α^F is a bounded set for any α ∈ ℝ. Given a set S, let I_S and B_S denote the interior and the boundary of S, respectively. A function F is continuous on ℝⁿ if (i) it is continuous within I_dom(F) in the conventional sense, and (ii) for all x^k → x ∈ B_dom(F), we have F(x^k) → F(x) = +∞. We say a function is Lipschitz differentiable if it is differentiable and its gradient is Lipschitz continuous. Unless specified otherwise, I denotes the identity matrix and the identity map. The symbol conv(S) denotes the convex hull of a set S, and ∂F denotes the set of all sub-differentials of a function F (see [Rockafellar and Wets 2009, Definition 8.3(b)]). For a matrix Q, we use ρ(Q) to represent its spectral radius.
We will discuss conditions for the ADMM iterates {(x^k, z^k, u^k)} to converge to a stationary point (x*, z*, u*) of the augmented Lagrangian for problem (13), which is defined by the conditions [Boyd et al. 2011]:

    Ax* − Bz* = c,    0 ∈ ∂f(x*) + µA^T u*,    0 ∈ ∂g(z*) − µB^T u*.    (25)

Linear convergence.
Our discussion involves the following definitions related to the problem (13) and Assumptions 3.1 and 3.2:

    ĝ(z) := g(B^{-1}z),    K := A G^{-1} A^T.    (26)

We denote by ρ(K) the spectral radius of the matrix K. To prove linear convergence of ADMM for the problem (13) regardless of its initial value, we need the following assumption:

Assumption 3.3. ĝ is Lipschitz differentiable on ℝⁿ with a Lipschitz constant L, i.e.,

    ∥∇ĝ(z₁) − ∇ĝ(z₂)∥ ≤ L∥z₁ − z₂∥    ∀z₁, z₂ ∈ ℝⁿ.

Then we have:

Theorem 3.1.
If Assumptions 3.1–3.3 are satisfied and ρ(K) < L, then for a sufficiently large µ the x-z-u iteration (15)-(17) converges to a stationary point defined in Eq. (25). Moreover, ∥Bz^{n+1} − Bz^n∥ ≤ γ∥Bz^n − Bz^{n−1}∥, where γ < 1 is an explicit constant depending on µ, ρ(K), and L.

Theorem 3.2.
If Assumptions 3.1–3.3 are satisfied, ρ(K) < L, and I − µK is invertible, then for a sufficiently large µ the z-x-u iteration (18)-(20) converges to a stationary point defined in Eq. (25). Moreover, ∥v^{k+1} − v^k∥ ≤ γ∥v^k − v^{k−1}∥, where v^k = (I − µK)u^k and γ < 1 is an explicit constant depending on µ, ρ(K), and L.

Proofs are provided in Appendix F. The theorems above rely on Assumption 3.3, which requires the function g to be globally Lipschitz differentiable. This may not be the case for some graphics problems. For example, the StVK energy used for the simulation of hyperelastic materials is a quartic function of the deformation gradient, and is locally Lipschitz differentiable but not globally so. For such problems, we can still prove linear convergence with additional conditions on the initial value and the penalty parameter. In the following, we use T(x, z) to denote the target function of (13). We make the following relaxed assumption about ĝ:

Assumption 3.4. (1) ĝ is level-bounded, and ĝ(z) ≥ 0 for all z ∈ ℝⁿ. (2) ĝ is continuous on ℝⁿ and differentiable in I_dom(ĝ). (3) ĝ is Lipschitz differentiable on any compact convex set in dom(ĝ).

For linear convergence of the x-z-u iteration, we assume the following for the initial value (x⁰, z⁰, u⁰) and the penalty parameter µ:

Assumption 3.5. (1) z⁰ = B^{-1}(Ax⁰ − c), u⁰ = (1/µ)B^{-T}∇g(z⁰), and z⁰ ∈ dom(g). (2) µ is large enough that the constant c₁ — a weighted supremum of ∥B^{-T}∇g(z)∥-terms over the level set of the target function at T₀ = T(x⁰, z⁰), with weights depending on µ — is sufficiently small. Moreover, suppose conv(L^ĝ_{T₀+c₁}) ⊂ dom(ĝ), and let L_c be a Lipschitz constant of ∇ĝ over this set.

Theorem 3.3.
Suppose Assumptions 3.1, 3.2, 3.4, and 3.5 are satisfied, µ is sufficiently large relative to L_c, and ρ(K) < L_c. Then the x-z-u iteration (15)-(17) converges to a stationary point defined in Eq. (25), and ∥Bz^{n+1} − Bz^n∥ ≤ γ∥Bz^n − Bz^{n−1}∥ with a constant γ < 1 depending on µ, ρ(K), and L_c.

For the z-x-u iteration, we need a different assumption that relies on the following proposition, which is proved in Appendix F.3:

Proposition 3.5. Let R(A) be the range of the matrix A. Then for any x ∈ R(A), ∥Kx∥ ≥ η∥x∥, where η > 0 is a constant depending on K.

Assumption 3.6.
The initial value (x⁰, z⁰, u⁰) satisfies: (1) z⁰ = B^{-1}(Ax⁰ − c), x⁰ = x̃, u⁰ = 0, and z⁰ ∈ dom(g). (2) µ is large enough that the constants c₁ and c₂ — weighted suprema of ∥Ax − Ax̃∥- and ∥B^{-T}∇g(z)∥-terms over the level set L^T_{T₀}, with weights depending on µ, ρ(K), and the constant η from Proposition 3.5 — are sufficiently small. Moreover, let L_d be a Lipschitz constant of ∇ĝ over conv(L^ĝ_{T₀+c₁+c₂}), and suppose conv(L^ĝ_{T₀+c₁+c₂}) ⊂ dom(ĝ).

Theorem 3.4.
Suppose Assumptions 3.1, 3.2, 3.4, and 3.6 are satisfied, ρ(K) < L_d, and I − µK is invertible. Then for a sufficiently large µ the z-x-u iteration (18)-(20) converges to a stationary point defined in Eq. (25), and ∥v^{k+1} − v^k∥ ≤ γ∥v^k − v^{k−1}∥, with v^k = (I − µK)u^k and a constant γ < 1 depending on µ, ρ(K), and L_d.

The proofs for these two theorems are given in Appendix F.
Remark 3.4. Unlike existing linear convergence proofs such as [Lin et al. 2015; Deng and Yin 2016; Giselsson and Boyd 2017], we do not require both f and g to be convex. This makes our proofs applicable to some graphics problems with a non-convex g, such as the elastic body simulation problem in [Overby et al. 2017], where g is an elastic potential energy. In the supplementary material we provide numerical verification of linear convergence on such a problem.
[Figure 2: plots of normalized combined and forward residuals for the x-z-u iteration with soft rubber and rubber materials, comparing ADMM, (z, u)-acceleration, and z-acceleration.]

Fig. 2. Comparison between the ADMM solver in [Overby et al. 2017] and our method according to Algorithm 3, for computing the same frame of a simulation sequence with three elastic bars. Two material stiffness settings ("soft rubber" and "rubber") are used for testing. In both cases, our method leads to a faster decrease of residuals and accelerates the convergence. For the case with rubber, we also test Algorithm 1, which applies Anderson acceleration to (z, u); it also speeds up the convergence, but is less effective than accelerating z alone.

General convergence under weak assumptions.
If a linear convergence rate is not needed, the assumptions above can be further relaxed to prove the convergence of ADMM on problem (13): instead of the relation between the matrix K and the Lipschitz constant L, we require the following weak assumption on the function g.

Assumption 3.7. g is a semi-algebraic function.

A function F : ℝⁿ → ℝ is called semi-algebraic if its graph {(y, F(y)) | y ∈ ℝⁿ} ⊂ ℝ^{n+1} is a union of finitely many sets, each defined by a finite number of polynomial equalities and strict inequalities [Li and Pong 2015]. This assumption covers a large range of functions used in computer graphics. For example, polynomials (such as the StVK energy) and rational functions (such as NURBS) are both semi-algebraic. Then we have:

Theorem 3.5. Suppose Assumptions 3.1, 3.2, 3.4, 3.5, and 3.7 are satisfied, and µ is sufficiently large relative to L_c. Then the x-z-u iteration (15)-(17) converges to a stationary point defined in Eq. (25), and Σ_{k=0}^{+∞} ∥z^{k+1} − z^k∥ < ∞.

Theorem 3.6.
If Assumptions 3.1, 3.2, 3.4, 3.6, and 3.7 are satisfied, then for a sufficiently large µ the z-x-u iteration (18)-(20) converges to a stationary point defined in Eq. (25), and Σ_{k=0}^{+∞} ∥Ax^{k+1} − Ax^k∥ < ∞.

Proofs are given in Appendix F.1 and F.3.
Remark 3.5. Compared with existing convergence results for non-convex ADMM such as [Li and Pong 2015; Wang et al. 2019], for the x-z-u iteration we do not require the function g to be globally Lipschitz differentiable, and for the z-x-u iteration we do not require the matrix A to have full row rank. This makes our results applicable to a wider range of problems in computer graphics. In particular, for geometry optimization, the reduction matrix A that relates vertex positions to auxiliary variables may not have full row rank, potentially due to the presence of auxiliary variables that are derived in the same way from vertex positions but involved in different constraints. Although for the z-x-u iteration our assumptions on g are more restrictive than those in [Li and Pong 2015; Wang et al. 2019], such assumptions are still general enough to be satisfied by many graphics problems.
[Figure 3: normalized combined residual for the soft rubber x-z-u problem, comparing ADMM, our method, over-relaxation with α = 1.5, 1.6, 1.7, 1.8, and the methods of [Kadkhodaie et al. 2015] and [Goldstein et al. 2014].]

Fig. 3. Comparison with other ADMM acceleration schemes on the same non-convex problem for rubber simulation as in Fig. 2. The methods from [Goldstein et al. 2014] and [Kadkhodaie et al. 2015], which are designed for convex problems, are ineffective for this problem instance. Over-relaxation is effective in accelerating the convergence, but not as much as our approach.

We apply our methods to a variety of ADMM solvers in graphics. We implement Anderson acceleration following the source code released by the authors of [Peng et al. 2018] (https://github.com/bldeng/AASolver). The source code of our implementation is available at https://github.com/bldeng/AA-ADMM. All examples are run on a desktop PC with 32GB of RAM and a quad-core CPU at 3.6GHz. To account for the dimension and the numerical range of the variables, we use the following normalized combined residual R_c and normalized forward residual R_f to measure convergence:

    R_c = r_c / (N_z · a²),    R_f = r_f / (N_z · a²),    (27)

where r_c is the combined residual computed from Eq. (11) or (12), r_f is the squared norm of the residual of Eq. (22) or (24), N_z is the dimension of z, and a > 0 is a scalar that accounts for the variable range. In the following, for all physical simulation and geometry optimization problems, we set a to the average edge length of the initial discretized model. For image processing problems, we simply set a = 1. For the choice of the parameter m, similar to [Peng et al. 2018] we observe that a large m tends to improve the reduction of the iteration count but increases the computational overhead per iteration (see Fig. 2). We choose m = 6.
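For reference, a small helper of ours that evaluates Eq. (27); it assumes, as in the reconstruction above, that both residuals are divided by N_z and by the square of the scale factor a.

```python
def normalized_residuals(r_c, r_f, N_z, a):
    """Eq. (27): normalize the combined and forward residuals by the
    dimension of z and the squared scale factor a (an assumption of ours)."""
    return r_c / (N_z * a * a), r_f / (N_z * a * a)
```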
[Figure 4: color-coded flag meshes ("Singular Value Error") and normalized forward residual plots for frames 1, 5, and 20, comparing ADMM and our method.]

Fig. 4. For the simulation of a discretized flag with hard constraints that limit its strain, our accelerated solver converges faster than an ADMM solver. Here the color-coding shows the deviation of the deformation gradient singular values from their prescribed range. Using the same computational budget to compute a frame, the results with our solver satisfy the strain limiting constraints better.
Overby et al. [2017] performed physical simulation via the following optimization problem:

    min_{x,z} f(x) + g(z)    s.t.  W(z − Dx) = 0.    (28)

Here x contains the node positions of the discretized object, f(x) is a momentum energy of the form (14) with G being a scaled mass matrix, Dx collects the deformation gradients of the elements, g(z) is the elastic potential energy, and W is a diagonal scaling matrix that improves conditioning. The problem (28) is solved in [Overby et al. 2017] using ADMM with the x-z-u iteration. As it satisfies the assumptions in Proposition 3.1, we apply Anderson acceleration to the variable z according to Algorithm 3. Our method is implemented based on the source code released by the authors of [Overby et al. 2017] (https://github.com/mattoverby/admm-elastic). Fig. 2 compares the simulation performance on three elastic bars subject to horizontal external forces on their two ends. We use the same material stiffness for all bars, and a different elastic potential energy model for each bar (corotational, StVK, and neo-Hookean, respectively). We apply the original solver and our solver with different m values to the same problem for a particular frame, and plot their normalized combined residuals and normalized forward residuals through the iterations. The methods are compared on two types of material stiffness ("soft rubber" and "rubber" as defined in the code from [Overby et al. 2017], with the latter being stiffer). Our method decreases both residuals much faster than the original ADMM solver for each stiffness setting. Moreover, these two residuals are highly correlated, which demonstrates the effectiveness of using the forward residual to verify accelerated iterates according to Proposition 3.2. On the rubber models, we also evaluate the performance of the general approach in Algorithm 1 that accelerates z and u together. We can see that accelerating z alone leads to a faster decrease of the combined residual. One possible reason is that Algorithm 3 explicitly enforces the compatibility condition (21), so that the accelerated z and the recovered u always correspond to a valid intermediate value for a certain ADMM iterate sequence. This property does not hold for the general approach, since it only performs an affine combination to obtain the accelerated z and u, which is more akin to finding a new initial value for an ADMM sequence.

In Fig. 3, we use the same soft rubber simulation problem to compare our method with existing ADMM acceleration techniques, including [Goldstein et al. 2014] and [Kadkhodaie et al. 2015], which combine Nesterov's acceleration scheme [Nesterov 1983] with a restarting rule based on the combined residual, as well as over-relaxation [Eckstein and Bertsekas 1992] with a relaxation parameter α ∈ [1.5, 1.8] as explained in [Boyd et al. 2011, §3.4.3]. As [Goldstein et al. 2014; Kadkhodaie et al. 2015] rely on the convexity of the problem, they are ineffective for this non-convex problem and in fact increase the computational time. Although over-relaxation speeds up the decrease of the residual, it achieves less acceleration than our method.
The solver in [Overby et al. 2017] also allows enforcing hard constraints on node positions. Our method can be applied in such cases as well. In Fig. 4, we simulate the movement of a triangulated flag under a wind force. Within g(z), the elastic potential energy for each triangle is defined as the squared Euclidean distance from its deformation gradient to the closest rotation matrix. In addition, g(z) contains an indicator function term for the strain limit of each triangle, which requires all singular values of the deformation gradient to lie within a prescribed range. Due to such hard constraints on z, we cannot apply our method to the x-z-u iteration (see Remark 3.1). Instead, we adopt the z-x-u iteration and apply Algorithm 4 to accelerate u alone, because the iteration satisfies the assumptions in Proposition 3.3. We compare the original ADMM solver with our accelerated solver with m = 6.
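The projection onto such a strain-limit set is commonly implemented by clamping the singular values of the deformation gradient; a sketch of ours, where s_min and s_max stand for the prescribed bounds:

```python
import numpy as np

def project_strain_limit(F, s_min, s_max):
    """Project a deformation gradient F onto the strain-limit set by
    clamping its singular values to [s_min, s_max] (problem parameters)."""
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    return (U * np.clip(s, s_min, s_max)) @ Vt
```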
To this end, we first apply our solver to compute a simulation sequence, and then re-solve the optimization problem using the original ADMM solver. Fig. 4 plots the normalized forward residual from each solver on three frames, where we see a faster decrease of the residual using our solver.
[Figure 5: normalized forward residual plots for frames 70, 200, and 300.]

Fig. 5. Simulation of a falling horse, with hard constraints on node positions that prevent them from penetrating the static objects. Our method achieves faster convergence than ADMM, as shown by the plots of the normalized forward residual for three frames.

In addition, for these three frames we take the results from both solvers within the same computational time, and use color-coding to illustrate the maximum deviation of the deformation gradient singular values from the prescribed range on each triangle. We can see that our solver leads to better satisfaction of the strain limiting constraints.

Hard constraints are also used in [Overby et al. 2017] to handle collisions between objects. In Figs. 5 and 6, we apply our method in such scenarios. Here an elastic solid horse model falls under gravity and collides with static objects in the scene. In [Overby et al. 2017], this is handled by enforcing hard constraints on x that prevent the nodes from penetrating the static objects. As this would reduce the x-update step to a time-consuming quadratic programming problem, [Overby et al. 2017] linearizes the constraints and solves the resulting linear system. However, with such a modification it is no longer an ADMM algorithm. Therefore, we apply the constraints on z instead and solve the problem using the z-x-u iteration, with acceleration according to Algorithm 4. Figs. 5 and 6 plot the normalized forward residual for computing certain frames in the simulation sequence, showing a faster decrease of the residual with our method.

[Figure 6: normalized forward residual plots for frames 20 and 275.]

Fig. 6. The same simulation of a falling horse as in Fig. 5, with a more complex arrangement of static objects. Our acceleration approach remains effective.

We also apply our method to an ADMM solver for mesh optimization subject to both soft and hard constraints, based on [Deng et al. 2015]. The input is a mesh with vertex positions x, soft constraints A_i x ∈ C_i (i ∈ S), and hard constraints A_j x ∈ C_j (j ∈ H). Here each reduction matrix A_i or A_j selects the vertex positions relevant to the constraint and (where appropriate) computes their differential coordinates with respect to either their mean position or one of the vertices. [Deng et al. 2015] introduce auxiliary variables z_i ∈ C_i (i ∈ S) and z_j ∈ C_j (j ∈ H) to derive an optimization problem

    min_{x,z} ∥L(x − x̃)∥² + Σ_{i∈S} ( w_i ∥A_i x − z_i∥² + σ_{C_i}(z_i) ) + Σ_{j∈H} σ_{C_j}(z_j)
    s.t.  A_j x − z_j = 0,  ∀j ∈ H.    (29)

Here ∥L(x − x̃)∥² is an optional Laplacian fairing energy for the vertex positions and/or for their displacement from the initial positions, whereas ∥A_i x − z_i∥² penalizes the violation of a soft constraint with a user-specified weight w_i. This problem is solved in [Deng et al. 2015] using the augmented Lagrangian method (ALM), where each iteration performs multiple alternating updates of z and x followed by a single update of u, using the same formulas as (6). Wu et al. [2011] pointed out that it is more efficient to perform only one alternating update of the primal variables per iteration, in which case ALM reduces to ADMM. Therefore, we solve the problem using ADMM with the z-x-u iteration, and apply the general approach in Algorithm 2 for acceleration, because the target function is not separable.
7, we apply our method with m = • Hard constraints: all edges have the same length l ; within a face,each angle formed by two incident edges is in the range [ π , π ] . • Soft constraint: each vertex lies on a given reference surface.The mesh is optimized without the Laplacian fairing term, i.e., L = .Our method leads to a faster decrease of the combined residual with ACM Trans. Graph., Vol. 38, No. 6, Article 163. Publication date: November 2019.
Edge length error
Target Mesh N o r m a li ze d C o m b i n e d R e s i du a l ADMM Our methodInitial Mesh
Fig. 7. Our method accelerates an ADMM solver for wire mesh optimization, as shown by the normalized combined residual plots. We also show two resultscomputed using ADMM and our accelerated solver within the same computational time (indicated in the bottom-right plot), and evaluate their violation ofthe angle constraints and edge length constraints using the error metrics in Eq. (30) . Our result satisfies these constraints better.
Reference surface distance angle & edge penalty = 100 angle & edge penalty = 10000 angle & edge penalty = 1000000 ADMM penalty = 1000
ShapeUp + Anderson Acceleration Our method
Target Mesh Initial Mesh
Fig. 8. Comparison of wire mesh optimzation results using our accelerated ADMM solver and an accelerated quadratic penalty method as described in [Penget al. 2018]. The error metric E is the sum of squared distances from the mesh vertices to the reference shape, and the color-coding illustrates the distance foreach vertex. Although the quadratic penalty method can improve satisfaction of the angle and edge length constraints with a larger penalty weight, this leadsto greater deviation from the reference shape. respect to both the iteration count and the computational time. Wealso evaluate the violation of hard constraints using the followingerror metrics for angle α and edge length e : ξ ( e ) = | e − l | l , γ ( α ) = π − α if α < π , α − π if α > π , . (30)The data and color-coding in Fig. 7 show that within the samecomputational time, the result from our method satisfies the hardconstraints better than the original ADMM. Besides ADMM, another popular approach for enforcing hardconstraints is the quadratic penalty method, which replaces theoriginal constrained problem by an unconstrained problem withquadratic terms in the target function to penalize the violation ofhard constraints [Nocedal and Wright 2006]. Fig. 8 compares theeffectiveness of these two approaches in enforcing hard constraintswhile decreasing the original target function. For the quadraticpenalty method, we use ShapeUp [Bouaziz et al. 2012] with An-derson acceleration as described in [Peng et al. 2018], and solvethree problem instances with different penalty weights for hard con-straints and fixed weights for the other terms. Each solver is run to ACM Trans. Graph., Vol. 38, No. 6, Article 163. Publication date: November 2019. ccelerating ADMM for Efficient Simulation and Optimization • 163:13
Target Mesh Optimized MeshInitial Mesh
Our methodADMM
Planarity error N o r m a li ze d C o m b i n e d R e s i du a l Fig. 9. PQ mesh optimization using our accelerated solver convergences faster than ADMM, and achieves better satisfaction of the planarity constraintswithin the same computational time (highlighted in the plot in bottom right). full convergence for comparison. We can see that although a largerpenalty weight for hard constraints improves their satisfaction, italso leads to relatively less penalty and greater violation of the softconstraints. In particular, with a large penalty weight to satisfy thehard constraints to a similar level as ADMM, the result from thequadratic penalty method deviates much more from the referencesurface than ADMM. It shows that ADMM is more effective in sat-isfying hard constraints without compromising the minimization ofthe target function, and our method further improves its efficiency.In Figs. 1 and 9, we apply our method to planar quad mesh opti-mization, a classical problem in architectural geometry [Liu et al.2006]. The input is a quad mesh subject to the following constraints: • Hard constraint: vertices within each face lie on a common plane. • Soft constraint: each vertex lies on a given reference surface.Following [Bouaziz et al. 2012], the reduction matrix for each hardconstraint represents the mean-centering operator for the verticeson a common face. The target function includes a Laplacian fairnessenergy and a relative fairness energy for the vertex positions, asdescribed in [Liu et al. 2011]. We measure the planarity error foreach face F of a given mesh using the metric d max ( F )/ e , where d max ( F ) is the maximum distance from a vertex of F to the bestfitting plane of its vertices, and e is the average edge length of themesh. In both Fig. 1 and Fig. 9, our method accelerates the decreaseof the combined residual, producing a result with lower planarityerror than the original ADMM within the same computational time. In Fig. 10, we apply our method to the ADMM solver from theProxImaL image optimization framework [Heide et al. 2016]. Wecompare our method with the original solver on the following prob-lem that computes a deconvoluted image x from an observationimage f with Gaussian noise and a convolution operator K :min x , z λ ∥ z − f ∥ + λ ∥ z i , j ∥ s.t. Kx = z , (∇ x ) i , j = z i , j ∀ i , j , (31) where (∇ x ) i , j is the image gradient of x at pixel ( i , j ) . This is solvedin [Heide et al. 2016] using ADMM with the x - z - u iteration, and weaccelerate it using Algorithm 1 with m =
6. We modify the sourcecode of the ProxImaL library to implement our accelerated solver,and use conjugate gradient to solve the linear systems in the updatesteps. Fig. 10 shows that our method requires less computationaltime and lower iteration count to achieve the same residual value.Finally, in Fig. 11, we accelerate the ADMM solver used by theCoded Wavefront Sensor in [Wang et al. 2018] for computing theobserved wavefront from a captured image. The wavefront x iscomputed by solving an optimization problemmin x , z λ ∥∇ x ∥ + д ( z ) s.t. ∇ x = z , (32)where z is an auxiliary variable for image gradient, and д ( z ) is a qua-dratic term that measures the consistency between the wavefrontand the captured image. From the general condition presented inAppendix E.1, we know that in each iteration the dual variable u k can be represented as a function of z k via Eq. (49). Therefore, weapply Anderson acceleration to z alone. Moreover, as д ( z ) is qua-dratic, Eq. (49) implies that z k and u k are related by a linear map.Thus we use the history z to compute the affine combination coeffi-cients for Anderson acceleration, and apply them to both z and u toderive the accelerated z and its compatible u , similar to Algorithm 4.We modify the source code released by the authors of [Wang et al.2018] to implement our accelerated solver. Fig. 11 compares thenormalized combined residual plots between the two solvers, usinga test example provided in the released source code. Compared tothe original ADMM, our method leads to a significant reduction ofcomputational time and iteration count for the same accuracy. Alsoincluded in the comparison is the GMRES acceleration for ADMM https://github.com/comp-imaging/ProxImaL https://github.com/vccimaging/MegapixelAOACM Trans. Graph., Vol. 38, No. 6, Article 163. Publication date: November 2019. Observation ResultGround truth N o r m a li ze d C o m b i n e d R e s i du a l Fig. 10. Our method accelerates the ADMM solver in [Heide et al. 2016] forthe deconvolution of a × image with Gaussian noise using Eq. (31) .The given convolution operator K is visualized in the bottom right of theobservation image. proposed in [Zhang and White 2018], which is designed specifi-cally for strongly convex quadratic problems. Following [Zhang andWhite 2018], we restart GMRES every 10 iterations to reduce com-putational cost. As a general method, our approach is outperformedby GMRES acceleration, but only by a small margin. In this paper, we apply Anderson acceleration to improve the con-vergence of ADMM on computer graphics problems. We show thatADMM can be interpreted as a fixed-point iteration of the secondprimal variable and the dual variable in the general case, and of onlyone of them when the problem has a separable target function andsatisfies certain conditions. Such interpretation allows us to directlyapply Anderson acceleration in the former case, and further reduceits computational overhead in the latter case. Moreover, for eachcase we propose a simple residual for measuring the convergence,and use it to determine whether to accept an accelerated iterate. Weapply this method to a variety of ADMM solvers in graphics, withapplications ranging from physics simulation, geometry process-ing, to image processing. Our method shows its effectiveness on allthese problems, with a notable reduction of iteration account andcomputational time required to reach the same accuracy. 
On thetheoretical front, we also prove the convergence of ADMM for acommon non-convex problem structure in computer graphics un-der weak assumptions. Our work addresses two main limitationsof ADMM especially on non-convex problems, which will help toexpand its applicability in computer graphics as a versatile solver foroptimization problems that are potentially non-smooth, non-convex,and with hard constraints.One limitation of our method is that it can be less effective forADMM solvers with very low computational cost per iteration. Inthis case, the overhead of Anderson acceleration can cause a largerelative increase of computational time, which partly cancels outthe speedup gained from the reduction of iteration count. One such N o r m a li ze d C o m b i n e d R e s i du a l Ground Truth Result
ADMMOurs m=1Ours m=4Ours m=3Ours m=2Ours m=5Ours m=6GMRES
Fig. 11. Our method accelerates the ADMM solver in [Wang et al. 2018] forcomputing the observed wavefront from a captured image, and achievessimilar performance as the specialized GMRES acceleration [Zhang andWhite 2018] despite being a general acceleration technique. example is Fig. 12, where we apply our method to the ADMM solverin [Tao et al. 2019] for correcting a vector field into an integrablegradient field of geodesic distance. Although our method reduces thenumber of iterations, its large relative overhead actually increasesthe computational time for achieving the same residual.Our experiments show that Anderson acceleration is effective inreducing the number of iterations, but we do not have a theoreticalguarantee for such property. This is still an open research problem,and the only existing result we are aware of is [Evans et al. 2018],which proves that Anderson acceleration improves the convergencerate for linearly converging fixed-point methods if a set of strongassumptions is satisfied. Further theoretical analysis of our methodis needed to understand and guarantee its performance.Currently we follow the convention and set the mixing parameter β = β = ACM Trans. Graph., Vol. 38, No. 6, Article 163. Publication date: November 2019. ccelerating ADMM for Efficient Simulation and Optimization • 163:15 N o r m a li ze d C o m b i n e d R e s i du a l Fig. 12. We apply our method to the ADMM solver in [Tao et al. 2019] forcorrecting a vector field into an integrable gradient field. Due to the very lowcomputational cost per iteration in the original solver, Anderson accelerationincurs a large relative overhead. As a result, although our method reducesthe number of iterations, it actually increases the computational time.
ACKNOWLEDGMENTS
The target model in Figure 7, “Male Torso, Diadumenus Type” byCosmo Wenman, is licensed under CC BY 3.0. This work was sup-ported by National Natural Science Foundation of China (No. 61672481),and Youth Innovation Promotion Association CAS (No. 2018495).
REFERENCES
M. S. C. Almeida and M. Figueiredo. 2013. Deconvolving images With unknownboundaries using the alternating direction method of multipliers.
IEEE Transactionson Image Processing
22, 8 (2013), 3074–3086.Donald G. Anderson. 1965. Iterative procedures for nonlinear integral equations.
J.ACM
12, 4 (1965), 547–560.Sofien Bouaziz, Mario Deuss, Yuliy Schwartzburg, Thibaut Weise, and Mark Pauly. 2012.Shape-Up: Shaping discrete geometry with projections.
Comput. Graph. Forum
31, 5(2012), 1657–1667.Sofien Bouaziz, Sebastian Martin, Tiantian Liu, Ladislav Kavan, and Mark Pauly. 2014.Projective dynamics: fusing constraint projections for fast simulation.
ACM Trans.Graph.
33, 4 (2014), 154:1–154:11.Sofien Bouaziz, Andrea Tagliasacchi, and Mark Pauly. 2013. Sparse iterative closestpoint.
Computer Graphics Forum
32, 5 (2013), 113–123.Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein, et al. 2011.Distributed optimization and statistical learning via the alternating direction methodof multipliers.
Foundations and Trends in Machine learning
3, 1 (2011), 1–122.Christopher Brandt, Elmar Eisemann, and Klaus Hildebrandt. 2018. Hyper-reducedprojective dynamics.
ACM Trans. Graph.
37, 4 (2018), 80:1–80:13.S. H. Chan, X. Wang, and O. A. Elgendy. 2017. Plug-and-play ADMM for image restora-tion: Fixed-point convergence and applications.
IEEE Transactions on ComputationalImaging
3, 1 (2017), 84–98.R. Chartrand. 2012. Nonconvex splitting for regularized low-rank + sparse decomposi-tion.
IEEE Transactions on Signal Processing
60, 11 (2012), 5810–5819.R. Chartrand and B. Wohlberg. 2013. A nonconvex ADMM algorithm for group sparsitywith sparse groups (ICASSP 2013) . 6009–6013.S. Claici, M. Bessmeltsev, S. Schaefer, and J. Solomon. 2017. Isometry-Aware precondi-tioning for mesh parameterization.
Comput. Graph. Forum
36, 5 (2017), 37–47.Patrick L. Combettes and Jean-Christophe Pesquet. 2011. Proximal splitting methodsin signal processing. In
Fixed-Point Algorithms for Inverse Problems in Science andEngineering , Heinz H. Bauschke, Regina S. Burachik, Patrick L. Combettes, VeitElser, D. Russell Luke, and Henry Wolkowicz (Eds.). 185–212.Bailin Deng, Sofien Bouaziz, Mario Deuss, Alexandre Kaspar, Yuliy Schwartzburg, andMark Pauly. 2015. Interactive design exploration for constrained meshes.
Computer-Aided Design
61, Supplement C (2015), 13–23.Wei Deng and Wotao Yin. 2016. On the global and linear convergence of the generalizedalternating direction method of multipliers.
Journal of Scientific Computing
66, 3(2016), 889–916.Jonathan Eckstein and Dimitri P Bertsekas. 1992. On the DouglasâĂŤRachford split-ting method and the proximal point algorithm for maximal monotone operators.
Mathematical Programming
55, 1-3 (1992), 293–318.T. Erseghe, D. Zennaro, E. Dall’Anese, and L. Vangelista. 2011. Fast consensus by thealternating direction multipliers method.
IEEE Transactions on Signal Processing arXiv preprint arXiv:1810.08455 (2018).V. Eyert. 1996. A comparative study on methods for convergence acceleration ofiterative vector sequences.
J. Comput. Phys.
Numerical Linear Algebra with Applications
16, 3 (2009), 197–221.M. A. T. Figueiredo and J. M. Bioucas-Dias. 2010. Restoration of Poissonian imagesusing alternating direction optimization.
IEEE Transactions on Image Processing
Studies in Mathematics and Its Appli-cations . Vol. 15. Elsevier, 97–146.Daniel Gabay and Bertrand Mercier. 1976. A dual algorithm for the solution of nonlinearvariational problems via finite element approximation.
Computers & Mathematicswith Applications
2, 1 (1976), 17–40.Akash Garg, Andrew O. Sageman-Furnas, Bailin Deng, Yonghao Yue, Eitan Grinspun,Mark Pauly, and Max Wardetzky. 2014. Wire mesh design.
ACM Trans. Graph.
33, 4(2014), 66:1–66:12.E. Ghadimi, A. Teixeira, I. Shames, and M. Johansson. 2015. Optimal parameter selectionfor the alternating direction method of multipliers (ADMM): quadratic problems.
IEEE Trans. Automat. Control
60, 3 (2015), 644–658.P. Giselsson and S. Boyd. 2017. Linear convergence and metric selection for Douglas-Rachford splitting and ADMM.
IEEE Trans. Automat. Control
62, 2 (2017), 532–544.T. Goldstein, B. O’Donoghue, S. Setzer, and R. Baraniuk. 2014. Fast alternating directionoptimization methods.
SIAM Journal on Imaging Sciences
7, 3 (2014), 1588–1623.James Gregson, Ivo Ihrke, Nils Thuerey, and Wolfgang Heidrich. 2014. From capture tosimulation: Connecting forward and inverse problems in fluids.
ACM Trans. Graph.
33, 4 (2014), 139:1–139:11.D. Hajinezhad, T. Chang, X. Wang, Q. Shi, and M. Hong. 2016. Nonnegative matrixfactorization using ADMM: Algorithm and convergence analysis (ICASSP 2016) .4742–4746.Felix Heide, Steven Diamond, Matthias Nießner, Jonathan Ragan-Kelley, WolfgangHeidrich, and Gordon Wetzstein. 2016. ProxImaL: Efficient image optimizationusing proximal algorithms.
ACM Trans. Graph.
35, 4 (2016), 84:1–84:15.Nguyenho Ho, Sarah D. Olson, and Homer F. Walker. 2017. Accelerating the Uzawaalgorithm.
SIAM Journal on Scientific Computing
39, 5 (2017), S461–S476.Mingyi Hong, Zhi-Quan Luo, and Meisam Razaviyayn. 2016. Convergence analysisof alternating direction method of multipliers for a family of nonconvex problems.
SIAM Journal on Optimization
26, 1 (2016), 337–364.Y. Hu, D. Zhang, J. Ye, X. Li, and X. He. 2013. Fast and accurate matrix completion viatruncated nuclear norm regularization.
IEEE Transactions on Pattern Analysis andMachine Intelligence
35, 9 (2013), 2117–2130.Mojtaba Kadkhodaie, Konstantina Christakopoulou, Maziar Sanjabi, and ArindamBanerjee. 2015. Accelerated alternating direction method of multipliers (KDD ’15) .497–506.Shahar Z. Kovalsky, Meirav Galun, and Yaron Lipman. 2016. Accelerated quadraticproxy for geometric optimization.
ACM Trans. Graph.
35, 4 (2016), 134:1–134:11.Rongjie Lai and Stanley Osher. 2014. A splitting method for orthogonality constrainedproblems.
Journal of Scientific Computing
58, 2 (2014), 431–449.Kenneth Lange. 2004. The MM Algorithm. In
Optimization . Springer, 119–136.Guoyin Li and Ting Kei Pong. 2015. Global convergence of splitting methods fornonconvex composite optimization.
SIAM Journal on Optimization
25, 4 (2015),2434–2460.Athanasios P Liavas and Nicholas D Sidiropoulos. 2015. Parallel algorithms for con-strained tensor factorization via alternating direction method of multipliers.
IEEETransactions on Signal Processing
63, 20 (2015), 5450–5463.F. Lin, M. Fardad, and M. R. JovanoviÄĞ. 2013. Design of optimal sparse feedback gainsvia the alternating direction method of multipliers.
IEEE Trans. Automat. Control
SIAM Journal on Optimization
25, 3 (2015), 1478–1497.K. Lipnikov, D. Svyatskiy, and Y. Vassilevski. 2013. Anderson acceleration for nonlinearfinite volume scheme for advection-diffusion problems.
SIAM Journal on ScientificComputing
35, 2 (2013), A1120–A1136.J. Liu, P. Musialski, P. Wonka, and J. Ye. 2013. Tensor completion for estimating missingvalues in visual data.
IEEE Transactions on Pattern Analysis and Machine Intelligence
35, 1 (2013), 208–220.Ligang Liu, Lei Zhang, Yin Xu, Craig Gotsman, and Steven J. Gortler. 2008. A local/globalapproach to mesh parameterization.
Computer Graphics Forum
27, 5 (2008), 1495–1504.Tiantian Liu, Adam W. Bargteil, James F. O’Brien, and Ladislav Kavan. 2013. Fastsimulation of mass-spring systems.
ACM Trans. Graph.
32, 6 (2013), 214:1–214:7.Tiantian Liu, Sofien Bouaziz, and Ladislav Kavan. 2017. Quasi-Newton methods forreal-time simulation of hyperelastic materials.
ACM Trans. Graph.
36, 3 (2017),23:1–23:16.ACM Trans. Graph., Vol. 38, No. 6, Article 163. Publication date: November 2019.
Yang Liu, Helmut Pottmann, Johannes Wallner, Yong-Liang Yang, and Wenping Wang.2006. Geometric modeling with conical meshes and developable surfaces.
ACMTrans. Graph.
25, 3 (2006), 681–689.Yang Liu, Weiwei Xu, Jun Wang, Lifeng Zhu, Baining Guo, Falai Chen, and GuopingWang. 2011. General planar quadrilateral mesh design using conjugate directionfield.
ACM Trans. Graph.
30, 6 (2011), 140:1–140:10.Sindri Magnússon, Pradeep Chathuranga Weeraddana, Michael G Rabbat, and CarloFischione. 2016. On the convergence of alternating direction Lagrangian methodsfor nonconvex structured optimization problems.
IEEE Transactions on Control ofNetwork Systems
3, 3 (2016), 296–309.Sebastian Martin, Bernhard Thomaszewski, Eitan Grinspun, and Markus Gross. 2011.Example-based elastic materials.
ACM Trans. Graph.
30, 4 (2011), 72:1–72:8.Ondrej Miksik, Vibhav Vineet, Patrick Pérez, and Phillip Torr. 2014. Distributed non-convex ADMM-based inference in large-scale random fields (BMVC 2014) .Yurii Nesterov. 1983. A method of solving a convex programming problem withconvergence rate O ( / k ) . Soviet Mathematics Doklady
27 (1983), 372–376.Yurii Nesterov. 2013.
Introductory Lectures on Convex Optimization: A Basic Course .Springer Science & Business Media.T. Neumann, K. Varanasi, C. Theobalt, M. Magnor, and M. Wacker. 2014. Compressedmanifold modes for mesh processing.
Computer Graphics Forum
33, 5 (2014), 35–44.Thomas Neumann, Kiran Varanasi, Stephan Wenger, Markus Wacker, Marcus Magnor,and Christian Theobalt. 2013. Sparse localized deformation components.
ACMTrans. Graph.
32, 6 (2013), 179:1–179:10.Jorge Nocedal and Stephen J. Wright. 2006.
Numerical Optimization (2nd ed.). Springer-Verlag New York.M. Overby, G. E. Brown, J. Li, and R. Narain. 2017. ADMM ⊇ projective dynamics: Fastsimulation of hyperelastic models with dynamic constraints. IEEE Transactions onVisualization and Computer Graphics
23, 10 (2017), 2222–2234.Zherong Pan and Dinesh Manocha. 2017. Efficient solver for spacetime control ofsmoke.
ACM Trans. Graph.
36, 5 (2017).Yue Peng, Bailin Deng, Juyong Zhang, Fanyu Geng, Wenjie Qin, and Ligang Liu. 2018.Anderson acceleration for geometry optimization and physics simulation.
ACMTrans. Graph.
37, 4 (2018), 42:1–42:14.Florian A. Potra and Hans Engler. 2013. A characterization of the behavior of theAnderson acceleration on linear problems.
Linear Algebra Appl.
J. Comput. Phys.
306 (2016), 43–54.Péter Pulay. 1980. Convergence acceleration of iterative sequences. the case of SCFiteration.
Chemical Physics Letters
73, 2 (1980), 393–398.P. Pulay. 1982. Improved SCF convergence acceleration.
Journal of ComputationalChemistry
3, 4 (1982), 556–560.Michael Rabinovich, Roi Poranne, Daniele Panozzo, and Olga Sorkine-Hornung. 2017.Scalable locally injective mappings.
ACM Trans. Graph.
36, 2 (2017), 16:1–16:16.Ralph Tyrell Rockafellar. 1997.
Convex Analysis . Princeton University Press.R Tyrrell Rockafellar and Roger J-B Wets. 2009.
Variational Analysis . Vol. 317. SpringerScience & Business Media.Thorsten Rohwedder and Reinhold Schneider. 2011. An analysis for the DIIS accelerationmethod used in quantum chemistry calculations.
Journal of Mathematical Chemistry
49, 9 (2011), 1889–1914.Christian Schumacher, Bernhard Thomaszewski, Stelian Coros, Sebastian Martin,Robert Sumner, and Markus Gross. 2012. Efficient simulation of example-basedmaterials (SCA ’12) . 1–8.W. Shi, Q. Ling, K. Yuan, G. Wu, and W. Yin. 2014. On the linear convergence ofthe ADMM in decentralized consensus optimization.
IEEE Transactions on SignalProcessing
62, 7 (2014), 1750–1761.Anna Shtengel, Roi Poranne, Olga Sorkine-Hornung, Shahar Z. Kovalsky, and YaronLipman. 2017. Geometric optimization via composite majorization.
ACM Trans.Graph.
36, 4 (2017), 38:1–38:11.A. Simonetto and G. Leus. 2014. Distributed maximum likelihood sensor networklocalization.
IEEE Transactions on Signal Processing
62, 6 (2014), 1424–1437.Olga Sorkine and Marc Alexa. 2007. As-rigid-as-possible surface modeling (SGP ’07) .109–116.H. De Sterck. 2012. A nonlinear GMRES optimization algorithm for canonical tensordecomposition.
SIAM Journal on Scientific Computing
34, 3 (2012), A1351–A1379.Phanish Suryanarayana, Phanisri P. Pratapa, and John E. Pask. 2019. AlternatingAnderson-Richardson method: An efficient alternative to preconditioned Krylovmethods for large, sparse linear systems.
Computer Physics Communications
IEEE Transactions on Pattern Analysisand Machine Intelligence (2019).Alex Toth, J. Austin Ellis, Tom Evans, Steven Hamilton, C. T. Kelley, Roger Pawlowski,and Stuart Slattery. 2017. Local improvement results for Anderson acceleration withinaccurate function evaluations.
SIAM Journal on Scientific Computing
39, 5 (2017), S47–S65.Alex Toth and C. T. Kelley. 2015. Convergence analysis for Anderson acceleration.
SIAM J. Numer. Anal.
53, 2 (2015), 805–819.Homer F. Walker and Peng Ni. 2011. Anderson acceleration for fixed-point iterations.
SIAM J. Numer. Anal.
49, 4 (2011), 1715–1735.Congli Wang, Qiang Fu, Xiong Dun, and Wolfgang Heidrich. 2018. Megapixel adaptiveoptics: Towards correcting large-scale distortions in computational cameras.
ACMTrans. Graph.
37, 4 (2018), 115:1–115:12.Huamin Wang. 2015. A Chebyshev semi-iterative approach for accelerating projectiveand position-based dynamics.
ACM Trans. Graph.
34, 6 (2015), 246:1–246:9.Huamin Wang and Yin Yang. 2016. Descent methods for elastic body simulation on theGPU.
ACM Trans. Graph.
35, 6 (2016), 212:1–212:10.Yu Wang, Wotao Yin, and Jinshan Zeng. 2019. Global convergence of ADMM innonconvex nonsmooth optimization.
Journal of Scientific Computing
78, 1 (2019),29–63.Zaiwen Wen, Chao Yang, Xin Liu, and Stefano Marchesini. 2012. Alternating directionmethods for classical and ptychographic phase retrieval.
Inverse Problems
28, 11(2012).Chunlin Wu, Juyong Zhang, and Xue-Cheng Tai. 2011. Augmented Lagrangian methodfor total variation restoration with non-quadratic fidelity.
Inverse Problems andImaging
5, 1 (2011), 237–261.Jinhui Xiong, Ramzi Idoughi, Andres A. Aguirre-Pablo, Abdulrahman B. Aljedaani,Xiong Dun, Qiang Fu, Sigurdur T. Thoroddsen, and Wolfgang Heidrich. 2017. Rain-bow particle imaging velocimetry for dense 3D fluid velocity imaging.
ACM Trans.Graph.
36, 4 (2017), 36:1–36:14.Shiyao Xiong, Juyong Zhang, Jianmin Zheng, Jianfei Cai, and Ligang Liu. 2014. Robustsurface reconstruction via dictionary learning.
ACM Trans. Graph.
33, 6 (2014),201:1–201:12.J. Yang, L. Luo, J. Qian, Y. Tai, F. Zhang, and Y. Xu. 2017. Nuclear norm based matrixregression with applications to face recognition with occlusion and illuminationchanges.
IEEE Transactions on Pattern Analysis and Machine Intelligence
39, 1 (2017),156–171.Juyong Zhang, Bailin Deng, Zishun Liu, Giuseppe Patanè, Sofien Bouaziz, Kai Hormann,and Ligang Liu. 2014. Local barycentric coordinates.
ACM Trans. Graph.
33, 6 (2014),188:1–188:12.Junzi Zhang, Brendan O’Donoghue, and Stephen Boyd. 2018. Globally convergenttype-I Anderson acceleration for non-smooth fixed-point iterations. arXiv preprintarXiv:1808.03971 (2018).Ruiliang Zhang and James T. Kwok. 2014. Asynchronous distributed ADMM forconsensus optimization (ICML ’14) . II–1701–II–1709.Richard Y. Zhang and Jacob K. White. 2018. GMRES-accelerated ADMM for quadraticobjectives.
SIAM Journal on Optimization
28, 4 (2018), 3025–3056.Yufeng Zhu, Robert Bridson, and Danny M. Kaufman. 2018. Blended cured quasi-Newton for distortion optimization.
ACM Trans. Graph.
37, 4 (2018), 40:1–40:14.
A PROOF FOR PROPOSITION 3.1
By the optimality condition of (16) we have: ∇ д ( z k + ) − µ B T ( Ax k + − Bz k + + u k − c ) = . (33)Put (17) into (33): ∇ д ( z k + ) = µ B T u k + , (34)which completes the proof. □ B PROOF FOR PROPOSITION 3.2
For the first part, suppose z k + is the fixed-point of the x - z - u itera-tion, which means that z k + = z k + . (35)Then we have u k + = u k + by (34) = ⇒ Ax k + − Bz k + − c = = ⇒ Ax k + − Bz k + − c = Ax k + − Bz k + − c = Ax k + − c + u k + = Ax k + − c + u k . (36) ACM Trans. Graph., Vol. 38, No. 6, Article 163. Publication date: November 2019. ccelerating ADMM for Efficient Simulation and Optimization • 163:17
And from (16) and Remark 3.2 z k + = argmin z (cid:16) д ( z ) + µ ∥( Ax k + − c + u k ) − Bz ∥ (cid:17) = argmin z (cid:16) д ( z ) + µ ∥( Ax k + − c + u k + ) − Bz ∥ (cid:17) = z k + , which completes the proof. □ C PROOF FOR PROPOSITION 3.3
By (20) we have: Bz k + − u k + c = Ax k + − u k + (37)Put (37) into (19): ( G + µ A T A ) x k + = G ˜ x + µ A T ( Ax k + − u k + ) = ⇒ G x k + = G ˜ x − µ A T u k + = ⇒ x k + = ˜ x − µ G − A T u k + , (38)which completes the proof. □ D PROOF FOR PROPOSITION 3.4
For the first part, suppose u k + is the fixed-point of the z - x - u itera-tion, so that u k + = u k + . (39)Then by (38) and (39): x k + = x k + . (40)Therefore Ax k + − Bz k + − c = by (20) and (39) = ⇒ Ax k + − Bz k + − c = Ax k + − Bz k + − c = . (41)Then we have u k + − Bz k + = u k − Bz k + by (20) and (41) = ⇒ x k + = x k + by (19) = ⇒ Ax k + − Bz k + − c = by (41) = ⇒ u k + = u k + by (20) , which completes the proof. □ E FURTHER DISCUSSION FOR PROPOSITIONS 3.1-3.4
We now consider the general condition such that between the secondupdated primal variable and the dual variable, one of them is afunction of the other. We consider the most general case:min x , z f ( x ) + д ( z ) s.t. Ax − Bz = c . (42)Unlike Section 3.3, we do not assume any specific form of f and д .We then only need to discuss the following x - z - u iteration because the conclusion for z - x - u iteration is similar: x k + ∈ argmin x L ( x , z k , u k ) , (43) z k + ∈ argmin z L ( x k + , z , u k ) , (44) u k + = u k + Ax k + − Bz k + − c . (45)We first need the subproblem (43) and (44) to be well-defined, forwhich the next condition is sufficient :(C1) f and д are bounded from below and lower-semi continuous.Then we rewrite the ADMM iteration as: − A T ( Ax k + − Bz k − c + u k ) ∈ ∂ f ( x k + ) , (46) B T ( Ax k + − Bz k + − c + u k ) ∈ ∂ д ( z k + ) , (47) u k + = u k + Ax k + − Bz k + − c . (48) E.1 u as a function of z By (47) and (48): B T u k + ∈ ∂ д ( z k + ) . (49)Now we can see that u k + is a function z k + if and only if:(C2) B is invertible.(C3) ∂ д ( z ) contains exactly one element ∀ z ∈ dom ( ∂ д ) .From [Rockafellar and Wets 2009, Theorem 9.18] we know that thenext condition is sufficient:(C3 ′ ) д ( z ) is strictly differentiable ∀ z ∈ dom ( ∂ д ) .Moreover, we need additional conditions in order to use Andersonacceleration on z . Note that Anderson acceleration generates z AA by affine combination. So if we want to use (49) to compute u AA from z AA , the following condition is needed:(C4) The domain of ∂ д , defined as { z | ∂ д ( z ) , ∅} , is affine. E.2 z as a function of u From (49) we know that z is a function of u if and only if:(C5) The inverse mapping of set-valued mapping ∂ д ( z ) is a single-valued mapping.The next condition is sufficient to ensure (C5) but not necessary:(C5 ′ ) д ( z ) is strictly convex.Also, similar to the argument in Appendix E.1, in order to applyAnderson acceleration on u we need the following condition:(C6) The range of ∂ д , defined as — z ∈ R n ∂ д ( z ) , is affine. F PROOFS FOR CONVERGENCE THEOREMS
This section proves the linear convergence theorems when д is lo-cally Lipschitz differentiable (Theorems 3.3 and 3.4) and the generalconvergence theorems (Theorems 3.5 and 3.6). The proofs for Theo-rems 3.1 and 3.2 are similar to those for Theorems 3.3 and 3.4, so wewill not give their complete proofs but only summarize the mainsteps. Because of the order in which some lemmas are used in theproofs, we will prove Theorem 3.5 and 3.3 first. Without loss ofgenerality, we assume c = in Eq. (13) to simplify notation. ACM Trans. Graph., Vol. 38, No. 6, Article 163. Publication date: November 2019.
F.1 Proof for Theorem 3.5
Recall that Theorem 3.5 is about general convergence of the x - z - u iteration. We first note that: ∇ ˆ д ( z ) = B − T ∇ д ( B − z ) (50) = ⇒ ∇ ˆ д ( Bz ) = B − T ∇ д ( z ) . (51)These two equations will be used frequently in the following. Notethat from Assumption 3.4(2) we can derive (33) from (16). Moreover,based on the definition of L c in Assumption 3.5 we have:Proposition F.1. Suppose the Lipschitz constant of ∇ ˆ д ( z ) over conv ( L ˆ дα ) is L , then ∀ Bz , Bz ∈ L ˆ дα , we have | ˆ д ( Bz ) − ˆ д ( Bz ) − ⟨∇ ˆ д ( Bz ) , Bz − Bz ⟩| ≤ L ∥ Bz − Bz ∥ . (52) Moreover, if µ > L , and z ∈ argmin z ( д ( z ) + µ ∥ Bz − q ∥ ) , then: д ( z ) + µ ∥ Bz − q ∥ ≤ д ( z ) + µ ∥ Bz − q ∥ − µ − L ∥ Bz − Bz ∥ . The proof is standard so we omit it. Also see [Nesterov 2013,Lemma 1.2.3 & Theorem 2.1.8]. The next lemma is important.Lemma F.1.
If Assumption 3.4 and 3.5 hold, µ − L c µ > L c , and the x - z - u iteration satisfies д ( z k ) ≤ T ( x , z ) + c and L ( x k , z k , u k ) ≤ L ( x , z , u ) = T ( x , z ) . Then д ( z k + ) ≤ T ( x , z ) + c , L ( x k + , z k + , u k + ) ≤ T ( x , z ) . (53)Proof. By the definition of z k + in (16): д ( z k + ) + µ ∥ Ax k + − Bz k + + u k ∥ ≤ д ( z k ) + µ ∥ Ax k + − Bz k + u k ∥ . And notice the definition of x k + in (15): f ( x k + ) + µ ∥ Ax k + − Bz k + u k ∥ ≤ f ( x k ) + µ ∥ Ax k − Bz k + u k ∥ . (54)Combine the two equations above: T ( x k + , z k + ) + µ ∥ Ax k + − Bz k + + u k ∥ ≤ L ( x k , z k , u k ) + µ ∥ u k ∥ . By (17) and (34): T ( x k + , z k + ) + µ ∥ u k + ∥ ≤ L ( x k , z k , u k ) + µ ∥ B − T ∇ д ( z k )∥ . (55)Notice that L ( x k , z k , u k ) ≤ T ( x , z ) and by the definition of c : д ( z k + ) ≤ T ( x , z ) + c . Thus we have proved the first part. For the second part, we have: L ( x k + , z k , u k ) ≤ L ( x k , z k , u k ) , (56) L ( x k + , z k + , u k ) ≤ L ( x k + , z k , u k ) − µ − L c ∥ Bz k + − Bz k ∥ , (57) L ( x k + , z k + , u k + ) = L ( x k + , z k + , u k ) + µ ∥ u k + − u k ∥ . (58) Here (56) is derived from (54), (57) is derived from Assumption 3.5(2)and Proposition F.1, and (58) is trivial. Add them together, and thenuse (34) and the fact that µ − L c µ > L c : L ( x k + , z k + , u k + ) ≤ L ( x k , z k , u k ) − ( µ − L c µ − L c )∥ Bz k + − Bz k ∥ ≤ T ( x , z ) , (59)Which completes the proof. □ From Assumption 3.5(1) and Lemma F.1, we have:Proposition F.2.
Suppose Assumptions 3.4 and 3.5 hold, and µ − L c µ > L c . Then the x - z - u iteration satisfies д ( z k ) ≤ T ( x , z ) + c , L ( x k , z k , u k ) ≤ T ( x , z ) . (60)By Proposition F.2, Assumption 3.4(3) has the same effect as theLipschitz differentiability assumption. The next step is similar tothe convergence proof in [Wang et al. 2019], which requires thefollowing properties for the sequence ( x k , z k , u k ) :(P1) Boundedness : the generated sequence ( x k , z k , u k ) is bounded,and L ( x k , z k , u k ) is lower bounded.(P2) Sufficient descent : there is a constant C ( µ ) > k , we have: L ( x k , z k , u k ) − L ( x k + , z k + , u k + )≥ C ( µ )(∥ B ( z k + − z k )∥ + ∥ A ( x k + − x k )∥ ) . (P3) Subgradient bound : there is a constant C ( µ ) > d k + ∈ ∂ L ( x k + , y k + , u k + ) such that ∥ d k + ∥ ≤ C ( µ )(∥ B ( z k + − z k )∥ + ∥ A ( x k + − x k )∥) . (P4) Limiting continuity : if ( x ∗ , z ∗ , u ∗ ) is the limit point of thesub-sequence ( x k s , z k s , u k s ) for s ∈ N , then we have:lim s →∞ L ( x k s , z k s , u k s ) = L ( x ∗ , z ∗ , u ∗ ) . Note that although the x - z - u iteration is not same as the one definedin [Li and Pong 2015], the proof for [Li and Pong 2015, Theorem3] is not affected by the difference. Combining it with [Wang et al.2019, Proposition 2], we can prove Theorem 3.5:Proof for Theorem 3.5. From [Wang et al. 2019, Proposition 2],[Li and Pong 2015, Theorem 3], and Proposition F.2 in our paper,we only need to show (P1)-(P4) hold for ( x k , z k , u k ) .For (P1), from (55) we have: T ( x k , z k ) ≤ T ( x , z ) + c . (61)From Assumption 3.4(1) д ( z ) is level-bounded and G is invertibleso f ( x ) is also level-bounded, thus ( x k , z k ) is bounded. The bound-edness of u k can be derived from (33). The lower boundednessof L ( x k , z k , u k ) comes from Assumption 3.5(2) and the fact that T ( x , z ) ≥
0. In fact we have: L ( x k , z k , u k ) ≥ − c .In the derivation of (56), we did not use the fact that f ( x ) isquadratic. If we take this into consideration, then (56) becomes: L ( x k + , z k , u k ) ≤ L ( x k , z k , u k ) − l ∥ x k + − x k ∥ . (62) ACM Trans. Graph., Vol. 38, No. 6, Article 163. Publication date: November 2019. ccelerating ADMM for Efficient Simulation and Optimization • 163:19
Here l > ∇ x L ( x k + , z k + , u k + ) = G ( x k + − ˜ x ) + µ A T ( Ax k + − Bz k + + u k + ) = µ A T ( Bz k − Bz k + + u k + − u k ) , (63) ∇ z L ( x k + , z k + , u k + ) = ∇ д ( z k + ) + µ B T ( Bz k + − Ax k + − u k + ) = µ B T ( u k − u k + ) , (64) ∇ u L ( x k + , z k + , u k + ) = µ ( Ax k + − Bz k + ) = µ ( u k + − u k ) . (65)Here we use (15) and (17) for (63); (33) for (64); (17) for (65). By (33),Assumption 3.4(3), and Assumption 3.5: ∥∇ x L ( x k + , z k + , u k + )∥ ≤ q ρ ( A T A )( µ + L c )∥ Bz k + − Bz k ∥ , ∥∇ z L ( x k + , z k + , u k + )∥ ≤ q ρ ( B T B ) L c ∥ Bz k + − Bz k ∥ . And notice that ∇ u L ( x k + , z k + , u k + ) = − B T ∇ z L ( x k + , z k + , u k + ) ,then we get the result. □ F.2 Proof for Theorem 3.3
Recall that Theorem 3.3 is about linear convergence of the x - z - u iteration. To simplify the notation, we define: N ( z ) B z + µ B − T ∇ д ( B − z ) . (66)Proposition F.3. The x - z - u iteration (15) - (17) satisfies N ( Bz k + ) = ( I + µ K ) − ( A ˜ x + µ KB z k + µ B − T ∇ д ( z k )) , (67) where matrix K is defined in (26) . Proof. By (17) we have: Bz k − u k = Ax k + + Bz k − Bz k + − u k + . by (15) = ⇒ ( G + µ A T A ) x k + = G ˜ x + µ A T ( Ax k + + Bz k − Bz k + − u k + ) = ⇒ x k + = ˜ x + µ G − A T ( Bz k − Bz k + − u k + ) = ⇒ Ax k + = A ˜ x + µ AG − A T ( Bz k − Bz k + − u k + ) . by (17) = ⇒ ( I + µ AG − A T )( u k + + Bz k + ) = A ˜ x + µ AG − A T Bz k + u k . by (34) = ⇒ ( I + µ AG − A T )( µ B − T ∇ д ( z k + ) + Bz k + ) = A ˜ x + µ AG − A T Bz k + µ B − T ∇ д ( z k ) . From the definitions of N and K , the last equation above becomes: ( I + µ K ) N ( Bz k + ) = A ˜ x + µ KBz k + µ B − T ∇ д ( z k ) = ⇒ N ( Bz k + ) = ( I + µ K ) − ( A ˜ x + µ KB z k + µ B − T ∇ д ( z k )) , which completes the proof. □ Next we show a sufficient condition for the convergence to astationary point:Proposition F.4.
If the sequence { z k } converges, then { x k , z k , u k } converges to a stationary point defined in (25) . Proof. Suppose z k → z ∗ . Then by (21), u k → u ∗ = B − T ∇ д ( z ∗ ) ,which proves ∇ д ( z ∗ ) − B T u ∗ =
0. By (15), x k → x ∗ where x ∗ = ( G + µ A T A ) − ( G ˜ x + µ A T ( Bz ∗ + c − u ∗ )) (68)In (17), let k → ∞ then we have Ax ∗ − Bz ∗ = c (69)The identity ∇ f ( x ∗ ) + A T u ∗ = □ We now show that { z k } converge linearly:Proof for Theorem 3.3. From (67): N ( Bz k + ) − N ( Bz k ) = ( I + µ K ) − ( µ KB ( z k − z k − ) + µ B − T (∇ д ( z k ) − ∇ д ( z k − ))) . (70)By Proposition F.2, д ( z k ) ≤ T ( x , z ) + c , ∀ k ∈ N . Then by thedefinition of c (see Assumption 3.5) and Assumption 3.4(3): ∥∇ ˆ д ( Bz k + ) − ∇ ˆ д ( Bz k )∥ ≤ L c ∥ Bz k + − Bz k ∥ , ∀ k ∈ N = ⇒ ∥ N ( Bz k + ) − N ( Bz k )∥ ≥ ( − L c µ )∥ Bz k + − Bz k ∥ . (71)For the right hand side of (70): ∥( I + µ K ) − ( µ KB ( z k − z k − ) + µ B − T (∇ д ( z k ) − ∇ д ( z k − )))∥≤ ∥( I + µ K ) − ( µ KB ( z k − z k − )∥ + ∥ µ ( I + µ K ) − B − T (∇ д ( z k ) − ∇ д ( z k − ))∥ . By the spectral mapping theorem: ∥( I + µ K ) − ( µ K )∥ = ρ (cid:16) ( I + µ K ) − ( µ K ) (cid:17) = µρ ( K ) + µρ ( K ) . (72)And notice that K is positive semi-definite: ∥ µ ( I + µ K ) − B − T (∇ д ( z k )−∇ д ( z k − )))∥ ≤ L c µ ∥ Bz k − Bz k − ∥ . (73)Combine (72) with (73): ∥( I + µ K ) − ( µ K ( Bz k − Bz k − ) + ( µ ( B − T ∇ д ( z k ) − B − T ∇ д ( z k − )))∥≤ ( µρ ( K ) + µρ ( K ) + L c µ )∥ Bz k − Bz k − ∥ . (74)By (71) and (74) we have: ∥ Bz k + − Bz k ∥ ≤ µρ ( K ) + µρ ( K ) + L c µ − L c µ ∥ Bz k − Bz k − ∥ . (75)If µ > max (cid:26) Lc − ρ ( K ) , L c (cid:27) then γ <
1, which completes the proof. □ ACM Trans. Graph., Vol. 38, No. 6, Article 163. Publication date: November 2019.
F.3 Proof for Theorem 3.6
Theorem 3.6 is about general convergence of the x - z - u iteration. Wefirst prove Proposition 3.5 that defines the value η .Proof for Proposition 3.5. By the definition of K in (26), weknow that K ( R ( A )) ⊂ R ( A ) . Since R ( A ) is a linear subspace and K is alinear operator, for the proof it suffices to show ker ( K ) ∩ R ( A ) = { } ,where ker ( K ) is the kernel of K . Now assume y ∈ ker ( K ) , then forany z ∈ R q where q is the number of rows in matrix A , we have: ⟨ AG − A T y , z ⟩ = = ⇒ ⟨ G − A T y , A T z ⟩ = = ⇒ ⟨ G − A T y , A T y ⟩ = z = y ) . Notice that G − is positive definite, so we have A T y =
0, which isequivalent to y ⊥ R ( A ) . Hence we get ker ( K ) ∩ R ( A ) = { } , whichcompletes the proof. □ The next proposition provides a characterization of u k + :Proposition F.5. The z - x - u iteration (18) - (20) satisfies: u k + = Ax k + − Ax k + µ B − T ∇ д ( z k + ) . (76)Proof. From (20): u k + − Ax k + = u k − Bz k + . (77)From (23): Ax k + u k = A ˜ x − µ Ku k + u k by (18) = ⇒ Bz k + + µ B − T ∇ д ( z k + ) = Ax k + u k = ⇒ µ B − T ∇ д ( z k + ) = Ax k + u k − Bz k + . (78)Combine (77) with (78) then we can get the result. □ Now we are able to bound both ∥ u k ∥ and ∥ u k + − u k ∥ :Proposition F.6. For z - x - u iteration (18) - (20) and k ≥ we have: ∥ u k ∥ ≤ η µ ∥ Ax k − A˜x ∥ + ( ρ ( K ) µ η + µ )∥ B − T ∇ д ( z k )∥ , (79) ∥ u k + − u k ∥ ≤ µ η ∥ Ax k + − Ax k ∥ + ( ρ ( K ) µ η + µ )∥ B − T (∇ д ( z k + ) − ∇ д ( z k ))∥ . (80)Proof. To prove (79), note that from (76): ∥ u k ∥ ≤ ∥ Ax k − Ax k − ∥ + µ ∥ B − T ∇ д ( z k )∥ . (81)And from (23): Ax k + = A˜x − µ Ku k + (82) by (76) = ⇒ Ax k + = A˜x − µ K ( Ax k + − Ax k ) − KB − T ∇ д ( z k + ) = ⇒ Ax k + − A˜x + KB − T ∇ д ( z k + ) = − µ K ( Ax k + − Ax k ) . (83)Hence by Proposition 3.5: µ η ∥ Ax k + − Ax k ∥ ≤ ∥ Ax k + − A˜x ∥ + ρ ( K ) ∥ B − T ∇ д ( z k + )∥ , and (79) follows from this equation and (81). For (80), from (76): u k + − u k = A ( x k + − x k + x k − ) + µ B − T (∇ д ( z k + ) − ∇ д ( z k )) = ⇒ ∥ u k + − u k ∥ ≤ ∥ A ( x k + − x k + x k − )∥ + µ ∥ B − T (∇ д ( z k + ) − ∇ д ( z k ))∥ . (84)And by (83): Ax k + − Ax k + KB − T (∇ д ( z k + )−∇ д ( z k )) = − µ KA ( x k + − x k + x k − ) . Hence: µ η ∥ A ( x k + − x k + x k − )∥ ≤ ∥ Ax k + − Ax k ∥ + ρ ( K ) ∥ B − T (∇ д ( z k + ) − ∇ д ( z k ))∥ . (85)Then (80) follows from (84) and (85). □ Similar to Proposition F.2, we can prove:Proposition F.7.
Suppose Assumptions 3.4 and 3.6 hold, and µ issufficiently large. The the z - x - u iteration satisfies: T ( x k , z k ) ≤ T ( x , z ) + c + c , L ( x k , z k , u k ) ≤ T ( x , z ) + c . (86)Proof. We will prove this by induction. For k = k ≤ l . Consider k = l +
1. By thedefinition of z l + in (18): д ( z l + ) + µ ∥ Ax l − Bz l + + u l ∥ ≤ д ( z l ) + µ ∥ Ax l − Bz l + u l ∥ . (87)By the definition of x l + in (18): f ( x l + ) + µ ∥ Ax l + − Bz l + + u l ∥ ≤ f ( x l ) + µ ∥ Ax l − Bz l + + u l ∥ . (88)add (88) to (87): T ( x l + , z l + ) ≤ L ( x l , z l , u l ) + µ ∥ u l ∥ . By induction: L ( x l , z l , u l ) ≤ T ( x , z ) + c . Since l + ≥
1, by Proposition F.6: µ ∥ u l ∥ ≤ η µ ∥ Ax l − A˜x ∥ + ( ρ ( K ) µη + µ )∥ B − T ∇ д ( z l )∥ . By induction: T ( x l , z l ) ≤ T ( x , z ) + c + c ≤ T ( x , z ) + . By the definition of c , µ ∥ u l ∥ ≤ c . Hence: T ( x l + , z l + ) ≤ T ( x , z ) + c + c , which proves the first part. For the second part, we first prove thatthe conclusion holds for l = k = T ( x , z ) ≤ T ( x , z ) + c + c . Notice that f ( x ) ≥ д ( z ) ≤ T ( x , z ) + c + c . Henceby Proposition F.1: L ( x , z , u ) ≤ L ( x , z , u ) − µ − L d ∥ Bz − Bz ∥ . And by Assumption 3.2: L ( x , z , u ) ≤ L ( x , z , u ) − µ ∥ Ax − Ax ∥ . ACM Trans. Graph., Vol. 38, No. 6, Article 163. Publication date: November 2019. ccelerating ADMM for Efficient Simulation and Optimization • 163:21
Moreover, we have: L ( x , z , u ) = L ( x , z , u ) + µ ∥ u − u ∥ By (79) and u = µ ∥ u − u ∥ = µ ∥ u ∥ = η µ ∥ Ax − A˜x ∥ + ( ρ ( K ) µη + µ )∥ B − T ∇ д ( z )∥ . Moreover, we have: ∥ B − T ∇ д ( z )∥ ≤ ∥ B − T ∇ д ( z ) − B − T ∇ д ( z )∥ + ∥ B − T ∇ д ( z )∥ ≤ L d ∥ Bz − Bz ∥ + ∥ B − T ∇ д ( z )∥ . So if µ ≥ η µ and µ − L d ≥ L d ( ρ ( K ) µη + µ ) , then we have: L ( x , z , u ) ≤ L ( x , z , u ) + ( ρ ( K ) µη + µ )∥ B − T ∇ д ( z )∥ = T ( x , z ) + ( ρ ( K ) µη + µ )∥ B − T ∇ д ( z )∥ . By the definition of c we have L ( x , z , u ) ≤ T ( x , z ) + c . Nowsuppose l ≥
1. Similar to the proof of the case l = L ( x l , z l + , u l ) ≤ L ( x l , z l , u l ) − µ − L d ∥ Bz l + − Bz l ∥ , L ( x l + , z l + , u l ) ≤ L ( x l , z l + , u l ) − µ ∥ Ax l + − Ax l ∥ , L ( x l + , z l + , u l + ) = L ( x l + , z l + , u l ) + µ ∥ u l + − u l ∥ . By (80) we have: µ ∥ u l + − u l ∥ ≤ µη ∥ Ax l + − Ax l ∥ + ( ρ ( K ) µη + µ )∥ B − T (∇ д ( z l + ) − ∇ д ( z l ))∥ . Since д ( z l ) , д ( z l + ) ≤ T ( x , z ) + c + c , by the definition of L d : ∥ B − T (∇ д ( z l + ) − ∇ д ( z l ))∥ ≤ L d ∥ Bz l + − Bz l ∥ . (89)Hence we have: µ ∥ u l + − u l ∥ ≤ µη ∥ Ax l + − Ax l ∥ + ( ρ ( K ) L d µη + L d µ )∥ Bz l + − Bz l ∥ . If µ ≥ η µ and µ − L d ≥ ( ρ ( K ) L d µη + L d µ ) , then we have: L ( x l + , z l + , u l + ) ≤ L ( x l , z l , u l ) ≤ T ( x , z ) + c (90)which completes the proof. □ Similar to the proof of Theorem 3.5, we need to show (P1)-(P4)hold for z - x - u iteration. Sufficient descent has already been shownin the proof of Proposition F.7. The remaining part is the same asthe proof of Theorem 3.5 so we omit it. F.4 Proof for Theorem 3.4
Theorem 3.4 is about linear convergence of the z - x - u iteration. Sim-ilar to Proposition F.4, for the convergence of the z - x - u iterationto a stationary point, it suffices to show that the sequence { u k } converges. Then for the main proof:Proof for Theorem 3.4. By (78): Bz k + + µ B − T ∇ д ( z k + ) = A ˜ x − v k = ⇒ B ( z k + − z k ) + µ B − T (∇ д ( z k + ) − ∇ д ( z k )) = −( v k − v k − ) . By (89): ( − L d µ )∥ Bz k + − Bz k ∥ ≤ ∥ v k − v k − ∥ . Hence we have:1 µ ∥ B − T (∇ д ( z k + ) − ∇ д ( z k ))∥ ≤ L d µ ∥ Bz k + − Bz k ∥≤ L d µ − L d ∥ v k − v k − ∥ . (91)By (76) and (82): ( I + µ K ) u k + = µ Ku k + µ B − T ∇ д ( z k + ) = ⇒ v k + = ( I + µ K ) − µ Kv k + ( I + µ K ) − ( I − µ K ) µ B − T ∇ д ( z k + ) . Hence we have: ∥ v k + − v k ∥ ≤ ∥( I + µ K ) − µ K ( v k − v k − )∥ + µ ∥( I + µ K ) − ( I − µ K ) B − T (∇ д ( z k + ) − ∇ д ( z k ))∥ . Similar to (72) we have: ∥( I + µ K ) − µ K ( v k − v k − )∥ ≤ µρ ( K ) + µρ ( K ) ∥ v k − v k − ∥ by (91) = ⇒ ∥ v k + − v k ∥ ≤ ( µρ ( K ) + µρ ( K ) + L d µ − L d )∥ v k − v k − ∥ . Then let µ > max (cid:26) Ld − ρ ( K ) , L d (cid:27) and we get the result. □ F.5 Sketch of proofs for Theorems 3.1 and 3.2
For the proof of Theorem 3.1, the derivation can start from (71)without assumptions on the initial values. The rest of the proofs isthe same as the proof of Theorem 3.3.For the proof of Theorem 3.2, the derivation of (89) does not relyon the initial value. The rest is the same.
ACM Trans. Graph., Vol. 38, No. 6, Article 163. Publication date: November 2019. upplementary Material for
Accelerating ADMM for Efficient Simulation and Optimization
In this supplementary material, we will verify the linear convergence theorems with a target function д that islocally Lipschitz differentiable (Theorems 3.3 and 3.4). We will use ADMM to solve the following optimizationproblem from [1] for physical simulation:min x , z f ( x ) + д ( z ) s.t. W ( z − Dx ) = , (1)Here x is the node positions of the discretized object. f ( x ) is a momentum energy of the form f ( x ) = ( x − ˜ x ) T G ( x − ˜ x ) , with ˜ x being a constant vector, and G being a scaled mass matrix. Dx collects the deformation gradient of eachelement. W is a diagonal scaling matrix that improves conditioning. д ( z ) is an elastic potential energy. Comparedto the following form of optimization problems discussed in our paper:min x , z f ( x ) + д ( z ) s.t. Ax − Bz = c , (2)we can see that problems 1 and 2 are equivalent if A = WD , B = W , c = . In this report, we assume thesimulation object to be a tetrahedral mesh and use the following potential energy: д ( z ) = n t (cid:213) i = v i ψ ( F i ) , where v i is the volume of each tetrahedron, and F i ∈ R × is its deformation gradient with respect ot the restshape, and ψ is the strain energy density function of StVK materials [2]: ψ ( F ) = λ E : E + λ tr ( E ) , (3)where λ , λ are given material parameters, and E = ( F T F − I ) ∈ R × and I is the identity matrix.The linear convergence theorems we will verify require a local Lipschitz constant for the gradient of д . Thuswe need to analyze the Lipschitz differentiability of ψ . The gradient of ψ is the first Piola-Kirchhoff stress tensor: P ( F ) = FS ( F ) . (4)where S is the second Piola-Kirchhoff stress tensor: S ( F ) = λ E + λ tr ( E ) I , Note that the value of E depends on F . In the following, we will denote E j = E ( F j ) for a subscript j . Throughoutthis report, for a matrix A , ∥ A ∥ denotes its Frobenius norm and ∥ A ∥ denotes its l norm. Our task is to estimatethe Lipschitz constant of P ( F ) with respect to F . Proposition 1. ∥ P ( F ) − P ( F )∥ ≤ (( λ + √ λ )∥ F ∥(∥ F ∥ + ∥ F ∥) + ( λ + λ )∥ E ∥)∥ F − F ∥ . a r X i v : . [ c s . G R ] S e p roof. We have: E − E = F T F − F T F = ( F T F − F T F + F T F − F T F ) = ( F T ( F − F ) + ( F T − F T ) F ) (5)Hence: ∥ E − E ∥ ≤ (∥ F ∥ + ∥ F ∥)∥ F − F ∥ (6)And: ∥ tr ( E ) I − tr ( E ) I ∥ = √ | tr ( E ) − tr ( E )| = √ | tr ( E − E )| = √ | tr ( F T ( F − F ) + ( F T − F T ) F )|≤ √ (∥ F ∥ + ∥ F ∥)∥ F − F ∥ (7)For P ( F ) we have: P ( F ) − P ( F ) = F S ( F ) − F S ( F ) = F S ( F ) − F S ( F ) + F S ( F ) − F S ( F ) = F ( S ( F ) − S ( F )) + ( F − F ) S ( F ) (8)Therefore: ∥ P ( F ) − P ( F )∥ ≤ ∥ F ∥∥ S ( F ) − S ( F )∥ + ∥ F − F ∥∥ S ( F )∥ (9)By (6) and (7) we have: ∥ S ( F ) − S ( F )∥ ≤ ( λ + √ λ )(∥ F ∥ + ∥ F ∥)∥ F − F ∥ (10)We next estimate ∥ S ( F )∥ : ∥ S ( F )∥ ≤ λ ∥ E ∥ + λ √ | tr ( E )|≤ ( λ + λ )∥ E ∥ (11)The result comes from (6), (7), (9) and (11). □ Proposition 2.
Assume ∥ F ∥ ≥ , then we have: ψ ( F ) ≥ ( λ + λ )∥ F ∥ (12) roof. Assume the singular values of F are σ ≥ σ ≥ σ . Then we have: ∥ F ∥ = (cid:213) i = σ i (13) ∥ E ∥ = (cid:213) i = ( σ i − ) (14) tr ( E ) = ( (cid:213) i = σ i − ) (15)Since ∥ F ∥ ≥
27 we have σ ≥
3, and we have: ( σ − ) = max i = , , ( σ i − ) (16)Hence: ∥ E ∥ ≥ ( σ − ) ≥ ( σ ) ≥ ( ∥ F ∥ ) (17) tr ( E ) ≥ ( ∥ F ∥ ) (18)The result comes from (16) and (17). □ Proposition 3. conv ( L aψ ) ⊂ { F | ∥ F ∥ ≤ b } , where b = max { √ , q a λ + λ } . Proof. Since { F | ∥ F ∥ ≤ b } is convex, it suffice to show L aψ ⊂ { F | ∥ F ∥ ≤ b } . Now assume ϕ ( F ) ≤ a . If ∥ F ∥ ≤ √ ( λ + λ )∥ F ∥ ≤ a (19)which completes the proof. □ Proposition 4. (1): The value (( λ + √ λ ) b + ( λ + λ )√ b + ) is a Lipschitz constant of P ( F ) over the set conv ( L aψ ) , where b is defined in Proposition 3.(2): sup F ∈ L aψ ∥ P ( F )∥ ≤ ( λ + λ ) b ( b + ) Proof. Assume the singular values of F are σ ≥ σ ≥ σ , by (14) we have: ∥ E ∥ ≤ (cid:213) i = σ i + ≤ ( (cid:213) i = σ i ) + = ∥ F ∥ +
34 (20)Now assume F , F ∈ conv ( L aψ ) , by Proposition 3 we have: ∥ F ∥ ≤ b , ∥ F ∥ ≤ b (21)By Proposition 1 and (21) we have: ∥ P ( F ) − P ( F )∥ ≤ (( λ + √ λ ) b + ( λ + λ )√ b + )∥ F − F ∥ (22)which proves (1).
80 160 240Iter10 -9 -6 -3 TheoreticalActual
Fig. 1. Comparison between theoretical and actual shrinkage of ∥ Bz k + − Bz k ∥ for verification of Theorem 3.3. The valuesare normalized by ∥ Bz − Bz ∥ and plotted in logarithmic scale. The actual shrinkage ratio stays below the theoretical upperbound, before it oscillates due to numerical error when close enough to the solution. For (2), by (11), Proposition 3 and (20): ∥ P ( F )∥ ≤ b ∥ S ( F )∥ ≤ b ( λ + λ ) ∥ E ∥ ≤ ( λ + λ ) b ( b + ) (23) □ X - Z - U ITERATION
In this subsection, we construct an example to verify the linear convergence theorem for the x - z - u iteration(Theorem 3.3). Assume the function J : R → R × assembles a vector into its matrix form. Assume z ∈ R n t , z i ∈ R is its component, where i ∈ [ , n t ] . Then д ( z ) becomes: д ( z ) = n t (cid:213) i = v i ψ ( J ( z i )) . (24)The next proposition is just a corollary from the proofs of previous propositions so we omit its proof. Proposition 5. (1): The value max i v i (( λ + √ λ ) b i + ( λ + λ ) q b i + ) is a Lipschitz constant of ∇ д ( z ) overthe set conv ( L aд ) , where b i = max { √ , r a / v i λ + λ } .(2): sup z ∈ L aд ∥∇ д ( z )∥ ≤ ˝ mi = v i ( λ + λ ) b i ( b i + ) . For the sake of simplicity, we suppose B = I in Eq. 2. The optimization problem for the verification example isconstructed via the following procedure:1. Choose the initial value as suggested by Assumption 3.5.2. Let a = T +
1. Compute L c and sup z ∈ L aд ∥∇ д ( z )∥ using Proposition 5.3. Choose a matrix G that is large enough such that ρ ( K ) ≤ L c . -5 -3 -1 TheoreticalActual
Fig. 2. Comparison between theoretical and actual shrinkage of ∥ v k + − v k ∥ for verification of Theorem 3.4. The values arenormalized by ∥ v − v ∥ and plotted in logarithmic scale. The actual shrinkage ratio stays below the theoretical upper bound.
4. Choose a µ that is large enough such that c ≤ µ − L c µ ≥ L c and µ > max { Lc − ρ ( K ) , L c } .Fig. 1 plots in logarithmic scale the value of ∥ Bz k + − Bz k ∥ throughout the iterations, as well as a straight linewhere the value changes according to the constant upper bound of shrinkage ratio given in Theorem 3.3. We cansee that the actual shrinkage ratio stays below the theoretical upper bound before convergence. Z - X - U ITERATION
We now construct an example to verify the linear convergence theorem for the z - x - u iteration (Theorem 3.4).The first problem is to compute η . We first compute the SVD for A . Suppose A = UDV and let r be the rank of A .Let U r be a sub-matrix of U consisting of its first r columns. Then we know R ( A ) = R ( U r ) . Moreover, ∀ y ∈ R ( A ) ,suppose y = U r z , then we have: ∥ y ∥ = ∥ z ∥ . Thus we choose η to be the minimal eigenvalue of U Tr KU r . Proposition 6. sup x ∈ L af ∥ Ax − A ˜x ∥ ≤ ∥ A ∥ a / q , where q is the minimal eigenvalue of G . Proof. By the definition of f ( x ) we have: L af = { x : 12 ∥ x − ˜x ∥ G ≤ a } ⊂ {∥ x − ˜x ∥ ≤ a / q } (25)Moreover, we have: ∥ Ax − A ˜x ∥ ≤ ∥ A ∥ ∥ x − ˜x ∥ (26)which completes the proof. □ To determine c defined in Assumption 3.6 we run one dimensional line-search for t and compute the upperbound for sup x ∈ L tf η µ ∥ Ax − A ˜x ∥ + sup z ∈ L a − tд ( ρ ( K ) µη + µ )∥ B − T ∇ д ( z )∥ by using Proposition 6 and Proposition 5.The optimization problem used of verification is constructed via the following procedure:1. Choose initial value as suggested by Assumption 3.6. Then compute η . . Let a = T +
1. Compute L d and sup z ∈ L aд ∥∇ д ( z )∥ using Proposition 5.3. Choose a matrix G that is large enough such that ρ ( K ) ≤ L d .4. Choose a µ that is large enough such that c + c ≤ µ ≥ η µ , µ − L d ≥( ρ ( K ) L d µη + L d µ ) and µ > max { Ld − ρ ( K ) , L d } .Fig. 2 plots in logarithmic scale the value ∥ v k + − v k ∥ throughout the iterations, as well as a straight line wherethe value shrinks according to the constant theoretical upper bound given in Theorem 3.4. We can see that theactual shrinkage ratio stays below the upper bound. REFERENCES [1] M. Overby, G. E. Brown, J. Li, and R. Narain. 2017. ADMM ⊇ projective dynamics: Fast simulation of hyperelastic models with dynamicconstraints. IEEE Transactions on Visualization and Computer Graphics
23, 10 (2017), 2222–2234.[2] Eftychios Sifakis and Jernej Barbič. 2012. FEM simulation of 3D deformable solids: A practitioner’s guide to theory, discretization andmodel reduction. In
ACM SIGGRAPH 2012 Courses . 20:1–20:50.. 20:1–20:50.