A Constraint-Reduced MPC Algorithm for Convex Quadratic Programming, with a Modified Active Set Identification Scheme
M. Paul Laiu · André L. Tits
October 23, 2018
Abstract
A constraint-reduced Mehrotra–Predictor–Corrector algorithm for convex quadratic programming is proposed. (At each iteration, such algorithms use only a subset of the inequality constraints in constructing the search direction, resulting in CPU savings.) The proposed algorithm makes use of a regularization scheme to cater to cases where the reduced constraint matrix is rank deficient. Global and local convergence properties are established under arbitrary working-set selection rules subject to satisfaction of a general condition. A modified active-set identification scheme that fulfills this condition is introduced. Numerical tests show great promise for the proposed algorithm, in particular for its active-set identification scheme. While the focus of the present paper is on dense systems, application of the main ideas to large sparse systems is briefly discussed.
This manuscript has been authored, in part, by UT-Battelle, LLC, under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

M. Paul Laiu
Computational and Applied Mathematics Group, Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
E-mail: [email protected]
This author's research was sponsored by the Office of Advanced Scientific Computing Research and performed at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725.

André L. Tits
Department of Electrical and Computer Engineering & Institute for Systems Research, University of Maryland, College Park, MD 20742 USA
E-mail: [email protected]
1 Introduction

Consider a strictly feasible convex quadratic program (CQP) in standard inequality form,

  minimize_{x ∈ Rⁿ} f(x) := ½ xᵀHx + cᵀx subject to Ax ≥ b, (P)

where x ∈ Rⁿ is the vector of optimization variables, f : Rⁿ → R is the objective function with c ∈ Rⁿ and H ∈ Rⁿˣⁿ a symmetric positive semi-definite matrix, A ∈ Rᵐˣⁿ and b ∈ Rᵐ, with m > 0, define the m linear inequality constraints, and (here and elsewhere) all inequalities (≥ or ≤) are meant component-wise. H, A, and c are not all zero. The dual problem associated to (P) is

  maximize_{x ∈ Rⁿ, λ ∈ Rᵐ} −½ xᵀHx + bᵀλ subject to Hx + c − Aᵀλ = 0, λ ≥ 0, (D)

where λ ∈ Rᵐ is the vector of multipliers. Since the objective function f is convex and the constraints are linear, solving (P)–(D) is equivalent to solving the Karush–Kuhn–Tucker (KKT) system

  Hx − Aᵀλ + c = 0,  Ax − b − s = 0,  Sλ = 0,  s, λ ≥ 0, (1)

where s ∈ Rᵐ is a vector of slack variables associated to the inequality constraints in (P), and S = diag(s).

Primal–dual interior-point methods (PDIPMs) solve (P)–(D) by iteratively applying a Newton-like iteration to the three equations in (1). Especially popular for its numerical behavior is S. Mehrotra's predictor–corrector (MPC) variant, which was introduced in [21] for the case of linear optimization (i.e., H = 0), with straightforward extension to CQP (e.g., [22, Section 16.6]). (Extension to linear complementarity problems was studied in [39].)

A number of authors have paid special attention to "imbalanced" problems, in which the number of active constraints at the solution is small compared to the total number of constraints (in particular, cases in which m ≫ n). In solving such problems, while most constraints are in a sense redundant, traditional IPMs devote much effort per iteration to solving large systems of linear equations that involve all m constraints. In the 1990s, work toward reducing the computational burden by using only a small portion ("working set") of the constraints in the search-direction computation focused mainly on linear optimization [6,15,31,38]. This was also the case for [28], which may have been the first to consider such "constraint-reduction" schemes in the context of PDIPMs (vs. purely dual interior-point methods), and for its extensions [14,34,35]. Exceptions include the works of Jung et al.
[17,18] and of Park et al. [23,24]; in the former, an extension to CQP was considered, with an affine-scaling variant used as the "base" algorithm; in the latter, a constraint-reduced PDIPM for semi-definite optimization (which includes CQP as a special case) was proposed, for which polynomial complexity was proved. Another exception is the "QP-free" algorithm for inequality-constrained (smooth) nonlinear programming of Chen et al. [5]. There, a constraint-reduction approach is used where working sets are selected by means of the Facchinei–Fischer–Kanzow active-set identification technique [8].

(Footnote: See the end of Sections 2.3 and 2.5 below for a brief discussion of how linear equality constraints can be incorporated.)

In the linear-optimization case, the size of the working set is usually kept no lower than the number n of variables in (P). It is indeed known that, in that case, if the set of solutions to (P) is nonempty and bounded, then a solution exists at which at least n constraints are active. Further, if fewer than n constraints are included in the linear system that defines the search direction, then the default (Newton-KKT) system matrix is structurally singular. When H ≠ 0 though, the number of active constraints at solutions may be much less than n and, at strictly feasible iterates, the Newton-KKT matrix is non-singular whenever the subspace spanned by the columns of H and the working-set columns of Aᵀ has (full) dimension n. (In particular, of course, if H is non-singular (i.e., positive definite) the Newton-KKT matrix is non-singular even when the working set is empty—in which case the unconstrained Newton direction is obtained.)
Hence, when solving a CQP, forcing the working set to have size at least n is usually wasteful.

The present paper proposes a constraint-reduced MPC algorithm for CQP. The work borrows from [18] (affine-scaling, convex quadratic programming) and is significantly inspired by [34] (MPC, linear optimization), but improves on both in a number of ways—even for the case of linear optimization (i.e., when H = 0). Specifically,

– in contrast with [18,34] (and [5]), it does not involve a (CPU-expensive) rank estimation combined with an increase of the size of the working set when a rank condition fails; rather, it makes use of a regularization scheme adapted from [35];
– a general condition (Condition CSR) to be satisfied by the constraint-selection rule is proposed which, when satisfied, guarantees global and local quadratic convergence of the overall algorithm (under appropriate assumptions);
– a specific constraint-selection rule is introduced which, like in [5] (but unlike in [18]), does not impose any a priori lower bound on the size of the working set; this rule involves a modified active-set identification scheme that builds on results from [8]; numerical comparison shows that the new rule outperforms previously used rules.

Other improvements over [18,34] include (i) a modified stopping criterion and a proof of termination of the algorithm (in [18,34], termination is only guaranteed under uniqueness of the primal–dual solution and strict complementarity), (ii) a potentially larger value of the "mixing parameter" in the definition of the primal search direction (see footnote 2(iii)), and (iii) an improved update formula (compared to that used in [35]) for the regularization parameter, which fosters a smooth evolution of the regularized Hessian W from an initial (matrix) value H + R, where R is specified by the user, at a rate no faster than that required for local q-quadratic convergence.

In Section 2 below, we introduce the proposed algorithm (Algorithm CR-MPC) and a general condition (Condition CSR) to be satisfied by the constraint-selection rule, and we state global and local quadratic convergence results for Algorithm CR-MPC, subject to Condition CSR. We conclude the section by proposing a specific rule (Rule R) and proving that it satisfies Condition CSR. Numerical results are reported in Section 3. While the focus of the present paper is on dense systems, application of the main ideas to large sparse systems is briefly discussed in the Conclusion (Section 4), which also includes other concluding remarks. Convergence proofs are given in two appendices.

The following notation is used throughout the paper. To the number m of inequality constraints, we associate the index set m := {1, 2, . . . , m}. The primal feasible and primal strictly feasible sets are F_P := {x ∈ Rⁿ : Ax ≥ b} and F°_P := {x ∈ Rⁿ : Ax > b}, and the primal and primal–dual solution sets are F*_P := {x ∈ F_P : f(x) ≤ f(x̃), ∀ x̃ ∈ F_P} and F* := {(x, λ) : (1) holds}. Of course, x* ∈ F*_P if and only if, for some λ* ∈ Rᵐ, (x*, λ*) ∈ F*. Also, we term stationary a point x̂ ∈ F_P for which there exists λ̂ ∈ Rᵐ such that (x̂, λ̂) satisfies (1) except possibly for non-negativity of the components of λ̂. Next, for x ∈ F_P, the (primal) active-constraint set at x is A(x) := {i ∈ m : a_iᵀx = b_i}, where a_i is the transpose of the i-th row of A. Given a subset Q ⊆ m, Q^c indicates its complement in m and |Q| its cardinality; for a vector v ∈ Rᵐ, v_Q is a sub-vector consisting of those entries with index in Q, and for an m × n matrix L, L_Q is a |Q| × n sub-matrix of L consisting of those rows with index in Q.
An exception to this rule, which should not create any confusion, is that for an m × m diagonal matrix V = diag(v), V_Q is diag(v_Q), a |Q| × |Q| diagonal sub-matrix of V. For symmetric matrices W and H, W ⪰ H (resp. W ≻ H) means that W − H is positive semi-definite (resp. positive definite). Finally, given a vector v, [v]⁺ and [v]⁻ denote the positive and negative parts of v, i.e., vectors with components respectively given by max{v_i, 0} and min{v_i, 0}; 1 is a vector of all ones; and given a Euclidean space Rᵖ, the ball centered at v* ∈ Rᵖ with radius ρ > 0 is B(v*, ρ) := {v ∈ Rᵖ : ‖v − v*‖ ≤ ρ}.

2 Algorithm CR-MPC

2.1 A Modified MPC Algorithm

The algorithm of [34] was introduced for linear optimization problems, as a constraint-reduced extension of a globally and locally superlinearly convergent variant of Mehrotra's original algorithm [21,36]. Transposed to the CQP context, that variant proceeds as follows.
In a first step (following the basic MPC paradigm), given (x, λ, s) with λ > 0, s > 0, it computes the primal–dual affine-scaling direction (∆xᵃ, ∆λᵃ, ∆sᵃ) at (x, λ, s), viz., the Newton direction for the solution of the equations portion of (1). Thus, it solves

  J(H, A, s, λ) (∆xᵃ, ∆λᵃ, ∆sᵃ) = (−∇f(x) + Aᵀλ, 0, −Sλ), (2)

where, given a symmetric matrix W ⪰ 0, we define

  J(W, A, s, λ) := [ W  −Aᵀ  0 ; A  0  −I ; 0  S  Λ ]  (rows separated by semicolons),

with Λ = diag(λ). Conditions for unique solvability of system (2) are given in the following standard result (invoked in its full form later in this paper); see, e.g., [16, Lemma B.1].

Lemma 1
Suppose s_i, λ_i ≥ 0 for all i and W ⪰ 0. Then J(W, A, s, λ) is invertible if and only if the following three conditions hold: (i) s_i + λ_i > 0 for all i; (ii) A_{i : s_i = 0} has full row rank; and (iii) [W (A_{i : λ_i ≠ 0})ᵀ] has full row rank.

In particular, with λ > 0 and s > 0, J(H, A, s, λ) is invertible if and only if [H Aᵀ] has full row rank. By means of two steps of block Gaussian elimination, system (2) reduces to the normal system

  M ∆xᵃ = −∇f(x),  ∆sᵃ = A∆xᵃ,  ∆λᵃ = −λ − S⁻¹Λ∆sᵃ, (3)

where M is given by

  M := H + AᵀS⁻¹ΛA = H + Σ_{i=1}^m (λ_i/s_i) a_i a_iᵀ. (4)

Given positive definite S and Λ, M is invertible whenever J(H, A, s, λ) is.

In a second step, MPC algorithms construct a centering/corrector direction, which in the CQP case (e.g., [22, Section 16.6]) is the solution (∆xᶜ, ∆λᶜ, ∆sᶜ) to (same coefficient matrix as in (2))

  J(H, A, s, λ) (∆xᶜ, ∆λᶜ, ∆sᶜ) = (0, 0, σµ1 − ∆Sᵃ∆λᵃ), (5)

where µ := sᵀλ/m is the "duality measure" and σ := (1 − αᵃ)³ is the centering parameter, with

  αᵃ := argmax{α ∈ [0, 1] | s + α∆sᵃ ≥ 0, λ + α∆λᵃ ≥ 0}.

While most MPC algorithms use as search direction the sum of the affine-scaling and centering/corrector directions, to force global convergence, we borrow from [34] and define

  (∆x, ∆λ, ∆s) = (∆xᵃ, ∆λᵃ, ∆sᵃ) + γ(∆xᶜ, ∆λᶜ, ∆sᶜ), (6)

where the "mixing" parameter γ ∈ [0, 1] is one when ∆xᶜ = 0 and otherwise

  γ := min{ γ̄, τ‖∆xᵃ‖/‖∆xᶜ‖, τ‖∆xᵃ‖/(σµ) }, (7)

where τ ∈ [0, 1) and

  γ̄ := argmax{ γ̃ ∈ [0, 1] | f(x) − f(x + ∆xᵃ + γ̃∆xᶜ) ≥ ω (f(x) − f(x + ∆xᵃ)) }, (8)

with ω ∈ (0, 1). Next, step sizes are computed:

  ᾱ_p := argmax{α : s + α∆s ≥ 0},  α_p := min{1, max{κᾱ_p, ᾱ_p − ‖∆x‖}},
  ᾱ_d := argmax{α : λ + α∆λ ≥ 0},  α_d := min{1, max{κᾱ_d, ᾱ_d − ‖∆x‖}},

with κ ∈ (0, 1), and the next iterate is obtained by setting

  (x⁺, s⁺) := (x, s) + α_p(∆x, ∆s),

and finally

  λ⁺_i := min{λ^max, max{λ_i + α_d∆λ_i, min{λ̲, χ}}}, i = 1, . . . , m, (9)

where λ^max > 0 and λ̲ ∈ (0, λ^max) are algorithm parameters, and χ := ‖∆xᵃ‖^ν + ‖[λ + ∆λᵃ]⁻‖^ν, with ν ≥ 2.

(Footnote 2: We however do not fully follow [34]: (i) Equation (8) generalizes (22) of [34] to CQP; (ii) in (9) we explicitly bound λ⁺ (x⁺ in [34]) by λ^max; in the linear case, such boundedness is guaranteed (Lemma 3.3 in [34]); as a side effect, in (7), we could drop the penultimate term in (24) of [34] (invoked in proving convergence of the x sequence in the proof of Lemma 3.4 of [34]); (iii) we do not restrict the primal step size as done in (25) of [34] (dual step size in the context of [34]), at the expense of a slightly more involved convergence proof: see our Proposition 3 below, to be compared to [34, Lemma 3.7].)

2.2 Constraint Reduction

The dominant cost per iteration of the algorithm just outlined lies in forming M (see (4)), which requires approximately mn²/2 multiplications when A is dense, regardless of how many of the m inequality constraints in (P) are active at the solution. This may be wasteful when few of these constraints are active at the solution, in particular (generically) when m ≫ n (imbalanced problems). The constraint-reduction mechanism introduced in [28] and used in [18] in the context of an affine-scaling algorithm for the solution of CQPs modifies M by limiting the sum in (4) to a wisely selected small subset of terms, indexed by an index set Q ⊆ m referred to as the working set.

Given a working set Q, the constraint-reduction technique produces an approximate affine-scaling direction by solving a "reduced" version of the Newton system (2), viz.
  J(H, A_Q, s_Q, λ_Q) (∆xᵃ, ∆λᵃ_Q, ∆sᵃ_Q) = (−∇f(x) + (A_Q)ᵀλ_Q, 0, −S_Qλ_Q). (10)

Just like the full system, when s_Q > 0, the reduced system (10) is equivalent to the reduced normal system

  M̃(Q) ∆xᵃ = −∇f(x),  ∆sᵃ_Q = A_Q∆xᵃ,  ∆λᵃ_Q = −λ_Q − S_Q⁻¹Λ_Q∆sᵃ_Q, (11)

where the "reduced" M̃(Q) (still of size n × n) is given by

  M̃(Q) := H + (A_Q)ᵀS_Q⁻¹Λ_QA_Q = H + Σ_{i∈Q} (λ_i/s_i) a_i a_iᵀ.

When A is dense, the cost of forming M̃(Q) is approximately qn²/2 multiplications, where q := |Q|, leading to significant savings when q ≪ m.

One difficulty that may arise, when substituting A_Q for A in the Newton-KKT matrix, is that the resulting linear system might no longer be uniquely solvable. Indeed, even when [H Aᵀ] has full row rank, [H (A_Q)ᵀ] may be rank-deficient, so the third condition in Lemma 1 would not hold. A possible remedy is to regularize the linear system. In the context of linear optimization, such regularization was implemented in [9] and explored in [25] by effectively adding a fixed scalar multiple of the identity matrix to the normal matrix to improve numerical stability of the Cholesky factorization. A more general regularization was proposed in [1], where diagonal matrices that are adjusted dynamically based on the pivot values in the Cholesky factorization were used for regularization. On the other hand, quadratic regularization was applied to obtain better preconditioners in [4], where a hybrid scheme of the Cholesky factorization and a preconditioned conjugate gradient method is used to solve linear systems arising in primal block-angular problems. In [4], the regularization dies out when optimality is approached.

Applying regularization to address rank-deficiency of the normal matrix due to constraint reduction was first considered in [35], in the context of linear optimization. There a regularization similar to that of [9,25] is applied, while the scheme lets the regularization die out as a solution to the optimization problem is approached, to preserve fast local convergence. Adapting this approach to the present context, we replace J(H, A_Q, s_Q, λ_Q) by J(W, A_Q, s_Q, λ_Q) and M̃(Q) by

  M(Q) := W + (A_Q)ᵀS_Q⁻¹Λ_QA_Q, (12)

with W := H + ̺R, where ̺ ∈ (0, 1] is a regularization parameter that is updated at each iteration and R ⪰ 0 a constant symmetric matrix such that H + R ≻ 0. Thus the inequality W ⪰ H is enforced, ensuring f(x + ∆xᵃ) < f(x) whenever ∇f(x) ≠ 0.
Each iteration of the proposed algorithm, Algorithm CR-MPC (stated below), takes as input x ∈ F°_P, s := Ax − b > 0, λ > 0, and λ̃, and produces the next iterates x⁺, s⁺ > 0, λ⁺ > 0, λ̃⁺, used as input to the next iteration. Here λ̃, with possibly λ̃ ≱ 0, is asymptotically slightly closer to optimality than λ, and is used in the stopping criterion. While dual feasibility of (x, λ) is not enforced along the sequence of iterates, a primal strictly feasible starting point x ∈ F°_P is required, and primal feasibility of subsequent iterates is enforced, as it allows for monotone descent of f, which in the present context is key to global convergence. (An extension of Algorithm CR-MPC that allows for infeasible starting points is discussed in Section 2.3 below.) Algorithm CR-MPC makes use (in its stopping criterion and ̺ update) of an "error" function E : Rⁿ × Rᵐ → R (also used in the constraint-selection Rule R in Section 2.6 below) given by

  E(x, λ) := ‖( ‖v(x, λ)‖, ‖w(x, λ)‖ )‖, (17)

where

  v(x, λ) := Hx + c − Aᵀλ,  w_i(x, λ) := min{|s_i|, |λ_i|}, i = 1, . . . , m, (18)

with s := Ax − b, and where the norms are arbitrary. Here E measures both dual feasibility (via v) and complementary slackness (via w). Note that, for x ∈ F_P and λ ≥ 0, E(x, λ) = 0 if and only if (x, λ) solves (P)–(D).
Algorithm CR-MPC: A Constraint-Reduced Variant of MPC for CQP
Parameters: ε ≥ 0, τ ∈ [0, 1), ω ∈ (0, 1), κ ∈ (0, 1), ν ≥ 2, λ^max > 0, λ̲ ∈ (0, λ^max), and Ē > 0; a symmetric n × n matrix R ⪰ 0 such that H + R ≻ 0.
Initialization: x ∈ F°_P, λ > 0, s := Ax − b > 0, λ̃ := λ.
Iteration CR-MPC:
Step 1.
Terminate if either (i) ∇f(x) = 0, in which case (x, 0) is optimal for (P)–(D), or (ii)

  min{ E(x, λ), E(x, [λ̃]⁺) } < ε, (19)

in which case (x, [λ̃]⁺) is declared ε-optimal for (P)–(D) if E(x, λ) ≥ E(x, [λ̃]⁺), and (x, λ) is declared ε-optimal otherwise.
Step 2.
Select a working set Q. Set q := |Q|. Set ̺ := min{1, E(x, λ)/Ē}. Set W := H + ̺R.
Step 3.
Compute the approximate normal matrix M(Q) := W + Σ_{i∈Q} (λ_i/s_i) a_i a_iᵀ.
Step 4.
Solve

  M(Q) ∆xᵃ = −∇f(x), (20)

and set

  ∆sᵃ := A∆xᵃ,  ∆λᵃ_Q := −λ_Q − S_Q⁻¹Λ_Q∆sᵃ_Q. (21)

Step 5.
Compute the affine-scaling step

  αᵃ := argmax{ α ∈ [0, 1] | s + α∆sᵃ ≥ 0, λ_Q + α∆λᵃ_Q ≥ 0 }. (22)

(Footnote: The "modified MPC algorithm" outlined in Section 2.1 is recovered as a special case by setting ̺⁺ = 0 and Q = {1, . . . , m} in Step 2 of Algorithm CR-MPC.)
(Footnote: For scaling reasons, it may be advisable to set the value of Ē to the initial value of E(x, λ) (so that, in Step 2 of the initial iteration, ̺ is set to 1, and W to H + R). This was done in the numerical tests reported in Section 3.)
(Footnote: Here it is implicitly assumed that F°_P is nonempty. This assumption is subsumed by Assumption 1 below.)
Step 6.
Set µ(Q) as in (14). Compute the centering parameter σ := (1 − αᵃ)³.
Step 7.
Solve (13) for the corrector direction (∆xᶜ, ∆λᶜ_Q, ∆sᶜ_Q), and set ∆sᶜ := A∆xᶜ.
Step 8. If q = 0, set γ := 0. Otherwise, compute γ as in (7)–(8), with µ(Q) replacing µ. Compute the search direction

  (∆x, ∆λ_Q, ∆s) := (∆xᵃ, ∆λᵃ_Q, ∆sᵃ) + γ(∆xᶜ, ∆λᶜ_Q, ∆sᶜ). (23)

Set λ̃⁺_i = λ_i + ∆λ_i for all i ∈ Q, and λ̃⁺_i = 0 for all i ∈ Q^c.
Step 9.
Compute the primal and dual step sizes α_p and α_d by

  ᾱ_p := argmax{α : s + α∆s ≥ 0},  α_p := min{1, max{κᾱ_p, ᾱ_p − ‖∆x‖}},
  ᾱ_d := argmax{α : λ_Q + α∆λ_Q ≥ 0},  α_d := min{1, max{κᾱ_d, ᾱ_d − ‖∆x‖}}. (24)

Step 10.
Updates:

  (x⁺, s⁺) := (x, s) + (α_p∆x, α_p∆s). (25)

Set χ := ‖∆xᵃ‖^ν + ‖[λ_Q + ∆λᵃ_Q]⁻‖^ν. Set

  λ⁺_i := max{ min{λ_i + α_d∆λ_i, λ^max}, min{χ, λ̲} }, ∀i ∈ Q. (26)

Set µ⁺(Q) := (s⁺_Q)ᵀ(λ⁺_Q)/q if q ≠ 0, otherwise set µ⁺(Q) := 0. Set

  λ⁺_i := max{ min{µ⁺(Q)/s⁺_i, λ^max}, min{χ, λ̲} }, ∀i ∈ Q^c. (27)

A few more comments are in order concerning Algorithm CR-MPC. First, the stopping criterion is a variation on that of [18,34], involving both λ and [λ̃]⁺ instead of only λ; in fact the latter will fail when the parameter λ^max (see (26)–(27)) is not large enough and may fail when second-order sufficient conditions are not satisfied, while we prove below (Theorem 1(iv)) that the new criterion is eventually satisfied indeed, in that the iterates (x, λ) converge to a solution (even if it is not unique), be it on a mere subsequence. Second, our update formula for the regularization parameter ̺ in Step 2 improves on that in [35] (̺⁺ = min{χ, χ^max} in the notation of this paper, where χ^max is a user-defined constant) as it fosters a "smooth" evolution of W from the initial value of H + R, with R specified by the user, at a rate no faster than that required for local q-quadratic convergence. And third, R may be chosen with R ⪰ H, so as to mitigate possible early ill-conditioning of M(Q). (Note that a nonzero R may be beneficial even when H is non-singular.)

It is readily established that, starting from a strictly feasible point, regardless of the choice made for Q in Step 2, Algorithm CR-MPC either stops at Step 1 after finitely many iterations, or generates infinite sequences {E_k}_{k=0}^∞, {x_k}_{k=0}^∞, {λ_k}_{k=0}^∞, {λ̃_k}_{k=0}^∞, {s_k}_{k=0}^∞, {χ_k}_{k=0}^∞, {Q_k}_{k=0}^∞, {̺_k}_{k=0}^∞, and {W_k}_{k=0}^∞, with s_k = Ax_k − b > 0 and λ_k > 0 for all k.
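For concreteness, the core linear algebra of Steps 3–5 above (forming the regularized reduced normal matrix M(Q), solving for the affine-scaling direction, and taking the ratio test (22)) can be sketched in a few lines. This is an illustrative sketch only, under our own naming and a dense NumPy representation; the corrector and mixing steps are omitted, and it is not the authors' implementation.

```python
import numpy as np

def cr_mpc_affine_step(H, A, b, c, x, s, lam, Q, rho, R):
    """Sketch of Steps 3-5: constraint-reduced affine-scaling direction.

    Only rows of A indexed by the working set Q enter the normal matrix
    M(Q) = W + A_Q^T S_Q^{-1} Lambda_Q A_Q, with W = H + rho*R the
    regularized Hessian (cf. (12)).
    """
    W = H + rho * R                                  # regularized Hessian, W >= H
    AQ, sQ, lamQ = A[Q], s[Q], lam[Q]
    M = W + AQ.T @ ((lamQ / sQ)[:, None] * AQ)       # M(Q), an n x n matrix
    grad = H @ x + c                                 # gradient of f at x
    dx = np.linalg.solve(M, -grad)                   # (20)
    ds = A @ dx                                      # (21), full slack direction
    dlamQ = -lamQ - (lamQ / sQ) * ds[Q]              # (21), multipliers on Q only
    # ratio test (22): largest alpha in [0, 1] keeping
    # s + alpha*ds >= 0 and lam_Q + alpha*dlam_Q >= 0
    ratios = np.concatenate([-s[ds < 0] / ds[ds < 0],
                             -lamQ[dlamQ < 0] / dlamQ[dlamQ < 0],
                             [1.0]])
    alpha = min(1.0, ratios.min())
    return dx, ds, dlamQ, alpha
```

Since M(Q) ⪰ W ⪰ H, the computed ∆xᵃ is a descent direction for f whenever ∇f(x) ≠ 0, as noted above.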
(E₀, χ₀, ̺₀, and W₀ correspond to the values of E_k, χ_k, ̺_k, and W_k computed in the initial iteration, while the other initial values are provided in the "Initialization" step.) Indeed, if the algorithm does not terminate at Step 1, then ∇f(x) ≠ 0, i.e., ∆xᵃ ≠ 0 (from (20), since M(Q) is invertible); it follows that ∆x ≠ 0 (if ∆xᶜ ≠ 0, since τ ∈ [0, 1) and γ < ‖∆xᵃ‖/‖∆xᶜ‖) and, since ∆xᵃ ≠ 0 implies χ > 0, that s⁺ = Ax⁺ − b > 0 and λ⁺ > 0. From now on, we assume that infinite sequences are generated.

2.3 Infeasible Starting Points

A strictly feasible starting point may not be readily available. To handle such situations, an ℓ₁ penalty term can be introduced, leading to the primal–dual pair

  minimize_{x∈Rⁿ, z∈Rᵐ} f(x) + ϕᵀz s.t. Ax + z ≥ b, z ≥ 0, (Pϕ)

  maximize_{x∈Rⁿ, λ∈Rᵐ, u∈Rᵐ} −½xᵀHx + bᵀλ s.t. Hx + c − Aᵀλ = 0, λ + u = ϕ, (λ, u) ≥ 0, (Dϕ)

with ϕ > 0, for which strictly feasible points (x, z) are readily available. Hence, given ϕ, this problem can be handled by Algorithm CR-MPC. Such an ℓ₁ penalty function is known to be exact, i.e., for some unknown, sufficiently large (but still moderate) value of ϕ, solutions (x*, z*) to (Pϕ) are such that x* solves (P); further ([13,14]), z* = 0. In [13,14], an adaptive scheme is proposed for increasing ϕ to such a value. Applying this scheme to (Pϕ)–(Dϕ) allows Algorithm CR-MPC to handle infeasible starting points for (P)–(D). We refer the reader to [13, Chapter 3] for details.

Linear equality constraints can of course be handled by first projecting the problem onto the associated affine space, and then running Algorithm CR-MPC on that affine space. A weakness of this approach though is that it does not adequately extend to the case of sparse problems (discussed in the Conclusion section (Section 4) of this paper), as projection may destroy sparsity. An alternative approach is, again, via augmentation: given the constraints Cx = d, with d ∈ Rᵖ, solve the problem

  minimize_{x∈Rⁿ, y∈Rᵖ} f(x) + ϕᵀy s.t. Cx + y ≥ d, Cx − y ≤ d. (28)

(Note that, taken together, the two constraints imply y ≥ 0.) Again, given ϕ > 0, this problem can be handled by Algorithm CR-MPC.

2.4 Selection of the Working Set

It remains to specify how to select the working set Q.
(Footnote: An ℓ∞ penalty function can be substituted for this ℓ₁ penalty function with minor adjustments: see [13,14] for details.)
(Footnote: It is readily checked that, given the simple form in which z enters the constraints, for dense problems, the cost of forming M(Q) still dominates and is still approximately |Q|n²/2 when |Q| ≪ m.)

Several constraint-selection rules have been proposed for constraint-reduced algorithms on various classes of optimization problems, such as linear optimization [28,34,35], convex quadratic optimization [17,18], semi-definite optimization [23,24], and nonlinear optimization [5]. In [28,34,35], the cardinality q of Q is constant and decided at the outset. Because in the non-degenerate case the number of active constraints at the solution of (P) with H = 0 is at least equal to the number n of primal variables, q ≥ n is usually enforced in that context. In [18], which like this paper deals with quadratic problems, q was allowed to vary from iteration to iteration, but q ≥ n was still enforced throughout (owing to the fact that, in the regular case, there are no more than n active constraints at the solution). Here we propose to again allow q to vary and, in addition, not to impose an a priori positive lower bound on q.

The convergence results stated in Section 2.5 below are valid with any constraint-selection rule that satisfies the following condition.

Condition CSR
Let {(x_k, λ_k)} be the sequence constructed by Algorithm CR-MPC with the constraint-selection rule under consideration, and let Q_k be the working set generated by the constraint-selection rule at iteration k. Then the following holds: (i) if {(x_k, λ_k)} is bounded away from F*, then, for all limit points x′ such that {x_k} → x′ on some infinite index set K, A(x′) ⊆ Q_k for all large enough k ∈ K; and (ii) if (P)–(D) has a unique solution (x*, λ*) and strict complementarity holds at x*, and if {x_k} → x*, then A(x*) ⊆ Q_k for all k large enough.

Condition CSR(i) aims at preventing convergence to a non-optimal primal point, and hence (given a bounded sequence of iterates) at forcing convergence to solution points. Condition CSR(ii) is important for fast local convergence to set in. A specific rule that satisfies Condition CSR, Rule R, used in our numerical experiments, is presented in Section 2.6 below.

2.5 Convergence Properties

The following standard assumptions are used in portions of the analysis.
Assumption 1 F°_P is nonempty and F*_P is nonempty and bounded.

Assumption 2 At every stationary point x, A_{A(x)} has full row rank.

Assumption 3
There exists a (unique) x* at which the second-order sufficient condition of optimality with strict complementarity holds, with (unique) multiplier λ*.

Assumption 3, mostly used in the local analysis, subsumes Assumption 1. Theorem 1, proved in Appendix A, addresses global convergence.

(Footnote: Nonemptiness and boundedness of F*_P are equivalent to dual strict feasibility (e.g., [7]).)
(Footnote: Equivalently (under the sole assumption that F*_P is nonempty), A_{A(x)} has full row rank at all x ∈ F_P.)
(Footnote: In fact, while we were not able to carry out the analysis without such a (strong) assumption (the difficulty being to rule out convergence to non-optimal stationary points), numerical experimentation suggests that the assumption is immaterial.)

Theorem 1
Suppose that the constraint-selection rule invoked in Step 2 of Algorithm CR-MPC satisfies Condition CSR. First suppose that ε = 0, that the iteration never stops, and that Assumptions 1 and 2 hold. Then (i) the infinite sequence {x_k} it constructs converges to the primal solution set F*_P; if, in addition, Assumption 3 holds, then (ii) {(x_k, λ̃_k)} converges to the unique primal–dual solution (x*, λ*) and {(x_k, λ_k)} converges to (x*, ξ*), with ξ*_i := min{λ*_i, λ^max} for all i ∈ m, and (iii) for sufficiently large k, the working set Q_k contains A(x*). Finally, suppose again that Assumptions 1 and 2 hold. Then (iv) if ε > 0, Algorithm CR-MPC stops (in Step 1) after finitely many iterations.

Fast local convergence is addressed next; Theorem 2 is proved in Appendix B.
Theorem 2
Suppose that Assumption 3 holds, that ε = 0, that the iteration never stops, and that λ*_i < λ^max for all i ∈ m. Then there exist ρ > 0 and C > 0 such that, if ‖(x − x*, λ − λ*)‖ < ρ and Q ⊇ A(x*), then

  ‖(x⁺ − x*, λ̃⁺ − λ*)‖ ≤ C‖(x − x*, λ − λ*)‖². (29)

When the constraint-selection rule satisfies Condition CSR(ii), local q-quadratic convergence is an immediate consequence of Theorems 1 and 2.

Corollary 1
Suppose that Assumptions 1–3 hold, that ε = 0, that the iteration never stops, and that λ*_i < λ^max for all i ∈ m. Further suppose that the constraint-selection rule invoked in Step 2 satisfies Condition CSR. Then Algorithm CR-MPC is locally q-quadratically convergent. Specifically, there exists C > 0 such that, given any initial point (x₀, λ₀), for some k′ > 0,

  A(x*) ⊆ Q_k and ‖(x_{k+1} − x*, λ_{k+1} − λ*)‖ ≤ C‖(x_k − x*, λ_k − λ*)‖², ∀k > k′.

The foregoing theorems and corollary (essentially) extend to the case of infeasible starting points discussed in Section 2.3. The proof follows the lines of that in [13, Theorem 3.2]. While Assumptions 1 and 3 remain unchanged, Assumption 2 must be tightened to: for every x ∈ Rⁿ, {a_i : a_iᵀx ≤ b_i} is a linearly independent set. (While this assumption appears to be rather restrictive—a milder condition is used in [13, Theorem 3.2] and [14], but we believe it is insufficient—footnote 10 applies here as well.)

Subject to such tightening of Assumption 2, Theorem 1 still holds. Further, Theorem 2 and Corollary 1 (local quadratic convergence) also hold, but for the augmented set of primal–dual variables (x, z, λ, u). While proving the results for (x, λ) might turn out to be possible, an immediate consequence of q-quadratic convergence for (x, z, λ, u) is r-quadratic convergence for (x, λ).

Under the same assumptions, Theorems 1 and 2 and Corollary 1 still hold in the presence of equality constraints Cx = d via transforming the problem to (28), provided {a_i : a_iᵀx ≤ b_i} ∪ {c_i : i = 1, . . . , p} is a linearly independent set for every x ∈ Rⁿ, with c_i the i-th row of C. Note that it may be beneficial to choose the initial x to lie on the affine space defined by Cx = d, in which case the components of y can be chosen quite small, and to include the constraint y ≤ c ȳ for some c > 1.

(Footnote: In fact, given any known upper bound z̄ on {z_k}, this assumption can be relaxed to merely requiring linear independence of the set {a_i : b_i − z̄ ≤ a_iᵀx ≤ b_i}, which tends to the set of active constraints when z̄ goes to zero. This can be done, e.g., with z̄ = c‖z‖∞, with any c > 1, if the constraint z ≤ z̄1 is added to the augmented problem.)

2.6 A Specific Constraint-Selection Rule

The proposed rule computes, at each iteration k, a threshold from the error measure E_k := E(x_k, λ_k), and then selects the working set by including all constraints with slack values less than the computed threshold.
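In code, the scheme just described (and the error measure E of (17)–(18) that it relies on) might look as follows. This is an illustrative sketch under our own assumptions: the function names, the Euclidean-norm choice in E, the way the state (δ_k, E_min) is threaded between iterations, and the particular parameter values in the usage below are ours, not the authors'.

```python
import numpy as np

def error_fn(H, A, b, c, x, lam):
    """Error measure E(x, lam) of (17)-(18): dual feasibility plus
    componentwise complementarity, using Euclidean norms throughout."""
    v = H @ x + c - A.T @ lam                  # dual residual
    s = A @ x - b                              # primal slacks
    w = np.minimum(np.abs(s), np.abs(lam))     # complementarity measure
    return np.hypot(np.linalg.norm(v), np.linalg.norm(w))

def rule_r(s_k, E_k, k, state, delta_bar=1.0, beta=0.25, theta=0.5):
    """One application of the threshold rule: shrink the slack threshold
    by theta whenever the error has dropped by the factor beta since the
    last shrink; select all constraints with slack below the threshold.

    `state` carries (delta, E_min) between iterations; it is ignored
    when k == 0.
    """
    if k == 0:
        delta, E_min = delta_bar, E_k
    else:
        delta, E_min = state
        if E_k <= beta * E_min:                # sufficient error decrease
            delta, E_min = theta * delta, E_k  # shrink threshold
    Q = np.flatnonzero(s_k <= delta)           # working set: small slacks
    return Q, (delta, E_min)
```

As the error decreases, the threshold shrinks and the working set tends toward the set of constraints active at the solution, which is what Condition CSR asks of the rule.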
Proposed Constraint-Selection Rule
Parameters: δ̄ > 0, 0 < β < θ < 1.
Input: iteration count k; slack variables s^k; error E^k := E(x^k, λ^k); smallest error so far E_min and threshold δ^{k−1} (when k > 0).
Output: working set Q^k; threshold δ^k; error E_min.

if k = 0 then
  δ^k := δ̄;  E_min := E^k
else if E^k ≤ β E_min then
  δ^k := θ δ^{k−1};  E_min := E^k
else
  δ^k := δ^{k−1}
end if
Select Q^k := {i ∈ m | s^k_i ≤ δ^k}.

A property of E that plays a key role in proving that Rule R satisfies Condition CSR is stated next; it does not require strict complementarity. It was established in [8], within the proof of Theorem 3.12 (equation (3.13)); also see [12, Theorem 1] and [37, Theorem A.1], as well as [3, Lemma 2, with the "vector of perturbations" set to zero] for an equivalent, yet global, inequality in the case of linear optimization (H = 0), under an additional dual (primal in the context of [3]) feasibility assumption (Hx − A^T λ + c = 0). A self-contained proof in the case of CQP is given here for the sake of completeness and ease of reference.

Lemma 2
Suppose (x*, λ*) solves (P)–(D), let I := {i ∈ m : λ*_i > 0}, and suppose that (i) A_{A(x*)} and (ii) [H (A_I)^T] have full row rank. Then there exist c > 0 and some neighborhood V of the origin such that E(x, λ) ≥ c ‖(x − x*, λ − λ*)‖ whenever (x − x*, λ − λ*) ∈ V.

Proof: Let z* := (x*, λ*) ∈ R^{n+m}, let s := Ax − b, s* := Ax* − b, and let Ψ : R^{n+m} → R be given by Ψ(ζ) := E(z* + ζ). We show that, restricted to an appropriate punctured convex neighborhood of the origin, Ψ is strictly positive and absolutely homogeneous, so that the convex hull Ψ̂ of such restriction generates a norm on R^{n+m}, proving the claim (with c = 1 for the norm generated by Ψ̂). To proceed, let ζ_x ∈ R^n and ζ_λ ∈ R^m denote respectively the first n and last m components of ζ, and let ζ_s := A ζ_x.

First, since v(z*) = 0, v(z* + ζ) = H ζ_x − A^T ζ_λ is linear in ζ, making its norm absolutely homogeneous in ζ; and since s_i = s*_i + ζ_{s,i} and λ_i = λ*_i + ζ_{λ,i}, complementary slackness (s*_i λ*_i = 0) implies that ‖w(x* + ζ_x, λ* + ζ_λ)‖ is absolutely homogeneous in ζ as well, in some neighborhood V_1 of the origin. Hence Ψ is indeed absolutely homogeneous in V_1.

Next, turning to strict positiveness and proceeding by contradiction, suppose that for every δ > 0 there exists some ζ ≠ 0, with ‖ζ‖ < δ, such that Ψ(ζ) = 0, i.e., v(z* + ζ) = 0 and w(z* + ζ) = 0. In view of (i), which implies uniqueness (over all of R^m) of the KKT multiplier vector associated to x*, and given that ζ ≠ 0, we must have ζ_x ≠ 0. In view of (ii), this implies that H ζ_x and ζ_{s,I} = A_I ζ_x cannot vanish concurrently. On the other hand, for i ∈ I and for small enough δ, w_i(z* + ζ) = 0 implies ζ_{s,i} = 0. Hence H ζ_x cannot vanish, and it must hold that (ζ_x)^T H ζ_x > 0. Since v(z* + ζ) = H ζ_x − A^T ζ_λ, we conclude from v(z* + ζ) = 0 that (ζ_x)^T A^T ζ_λ > 0, i.e., (ζ_s)^T ζ_λ > 0. Now, the argument that shows that ζ_{s,i} = 0 when λ*_i > 0 also shows that ζ_{λ,i} = 0 when s*_i > 0, so that Σ_{i : s*_i = λ*_i = 0} ζ_{s,i} ζ_{λ,i} > 0, in contradiction with w(z* + ζ) = 0. Hence, for some δ > 0, Ψ is strictly positive on the punctured ball {ζ ≠ 0 : ‖ζ‖ < δ}. Taking V to be a convex neighborhood of the origin contained in V_1 ∩ {ζ : ‖ζ‖ < δ} completes the proof. ✷

Proposition 1
Algorithm CR-MPC with Rule R satisfies Condition CSR.

Proof: To prove that Condition CSR(i) holds, let {(x^k, λ^k)} be bounded away from F*, let x′ be a limit point of {x^k}, and let K be an infinite index set such that {x^k} → x′ on K. By (17)–(18), {E^k} is bounded away from zero, so that, under Rule R, there exists δ′ > 0 such that δ^k > δ′ for all k. Now, with s′ := Ax′ − b and s^k := Ax^k − b for all k, since s′_{A(x′)} = 0, we have that, for all i ∈ A(x′), s^k_i < δ′ for all large enough k ∈ K. Hence, for all large enough k ∈ K, s^k_i < δ′ < δ^k for all i ∈ A(x′). Since Rule R chooses the working set Q^k := {i ∈ m | s^k_i ≤ δ^k} for all k, we conclude that A(x′) ⊆ Q^k for all large enough k ∈ K, which proves Claim (i).

Turning now to Condition CSR(ii), suppose that (P)–(D) has a unique solution (x*, λ*), that strict complementarity holds at x*, and that {x^k} → x*. If δ^k is reduced no more than finitely many times, then of course it is bounded away from zero, and the proof concludes as for Condition CSR(i); thus suppose {δ^k} → 0. Let K := {k ≥ 1 : δ^k = θ δ^{k−1}} (an infinite index set) and, for given k, let ℓ(k) be the cardinality of {k′ ≤ k : k′ ∈ K}. Then δ^k = δ̄ θ^{ℓ(k)} for all k and E^k ≤ β^{ℓ(k)} E^0 for all k ∈ K. Since β < θ (see Rule R), this implies that {E^k / δ^k}_{k∈K} → 0. And from the definition of E^k and uniqueness of the solution to (P)–(D), it follows that {λ^k} → λ* as k → ∞, k ∈ K. We use these two facts to prove that, for all i ∈ A(x*) and some k_0,

s^k_i ≤ δ^k, ∀ k ≥ k_0;  (30)

in view of Rule R, this will complete the proof of Claim (ii). From Lemma 2, there exist C > 0 and η > 0 such that ‖(x − x*, ψ − λ*)‖ ≤ C E(x, ψ) for all (x, ψ) satisfying ‖(x − x*, ψ − λ*)‖ < η. Since {E^k / δ^k}_{k∈K} → 0 and {λ^k − λ*}_{k∈K} → 0, and since ‖s^k − s*‖ ≤ ‖A‖ ‖x^k − x*‖ (because s^k − s* = A(x^k − x*)), there exists k_1 such that, for all i ∈ A(x*),

s^k_i ≤ ‖A‖ ‖(x^k − x*, λ^k − λ*)‖ ≤ ‖A‖ C E^k ≤ δ^k, ∀ k ∈ K, k ≥ k_1,  (31)

establishing (30) for k ∈ K. It remains to show that (30) holds for all k large enough. Let ρ and C be as in Theorem 2 and, without loss of generality, suppose Cρ ≤ θ. Since {(x^k − x*, λ^k − λ*)}_{k∈K} → 0, there exists k_2 ∈ K (w.l.o.g. k_2 ≥ k_1) such that ‖(x^{k_2} − x*, λ^{k_2} − λ*)‖ < ρ. Theorem 2 together with (31) then implies that

‖(x^{k_2+1} − x*, λ^{k_2+1} − λ*)‖ ≤ C ‖(x^{k_2} − x*, λ^{k_2} − λ*)‖^2 < θ ‖(x^{k_2} − x*, λ^{k_2} − λ*)‖ ≤ θ δ^{k_2} / ‖A‖.

(When A = 0, Proposition 1 holds trivially.) Hence ‖(x^{k_2+1} − x*, λ^{k_2+1} − λ*)‖ < ρ and, since (in view of Rule R) δ^{k_2+1} is equal to either δ^{k_2} or θ δ^{k_2} and θ ∈ (0, 1), we get, for all i ∈ A(x*), s^{k_2+1}_i ≤ ‖A‖ ‖(x^{k_2+1} − x*, λ^{k_2+1} − λ*)‖ ≤ δ^{k_2+1}, so that A(x*) ⊆ Q^{k_2+1}. Theorem 2 can then be applied recursively, yielding A(x*) ⊆ Q^k for all k large enough, concluding the proof of Claim (ii). ✷

Note that if a constraint-selection rule satisfies Condition CSR, rules derived from it by replacing Q^k with a superset of it also satisfy Condition CSR, so that our convergence results still hold. Such augmentation of Q^k is often helpful; e.g., see Section 5.3 in [34]. Note however that the following corollary to Theorems 1–2 and Proposition 1, proved in Appendix A, of course does not apply when Rule R is thus augmented.

Corollary 2
Suppose that Rule R is used in Step 2 of Algorithm CR-MPC, that ε = 0, and that Assumptions 1–3 hold. Let (x*, λ*) be the unique primal–dual solution. Further suppose that λ*_i < λ_max for all i ∈ m. Then, for sufficiently large k, Rule R gives Q^k = A(x*).

(In particular, if x* is an unconstrained minimizer, the working set Q^k is eventually empty, and Algorithm CR-MPC reverts to a simple regularized Newton method, which terminates in one additional iteration if H ≻ 0 and R = 0.)

We report computational results obtained with Algorithm CR-MPC on randomly generated problems and on data-fitting problems of various sizes. Comparisons are made across different constraint-selection rules, including the unreduced case, Rule All (Q = m, i.e., no reduction). The details of Rule JOT and Rule FFK–CWH are stated below.

Rule JOT
Parameters: κ > 0, integer q_U ∈ [n, m].
Input: iteration count k; slack variables s^k; duality measure μ := (λ^k)^T s^k / m.
Output: working set Q^k.

Set q := min{max{n, ⌈μ^κ m⌉}, q_U}, and let η be the q-th smallest slack value. Select Q^k := {i ∈ m | s^k_i ≤ η}.

Rule FFK–CWH
Parameter: 0 < r < 1.
Input: iteration count k; slack variables s^k; error E^k := E(x^k, λ^k) (see (17)).
Output: working set Q^k.

Select Q^k := {i ∈ m | s^k_i ≤ (E^k)^r}.

Note that the thresholds in Rule R and Rule FFK–CWH depend on both the duality measure μ and dual feasibility (see (17)), and these two rules impose no restriction on |Q|. On the other hand, the threshold in Rule JOT involves only μ, while it is required that |Q| ≥ n. In addition, it is readily verified that Rule FFK–CWH satisfies Condition CSR, and that so does Rule JOT under Assumption 2.

It is worth noting that Rule R, Rule FFK–CWH, and Rule JOT all select constraints by comparing the values of the primal slack variables s_i to threshold values that are independent of i, while the associated dual variables λ_i are not taken into account individually. Of course, variations on this choice are possible. In fact, it was shown in [8] (also see an implementation in [5]) that the strongly active constraint set {i ∈ m : λ*_i > 0} can be (asymptotically) identified at iteration k by the set {i ∈ m : λ^k_i ≥ δ^k} with a properly chosen threshold δ^k. Modifying the constraint-selection rules considered in the present paper to include such information might improve the efficiency of the rules, especially when the constraints are poorly scaled. (Such modification does not affect the convergence properties of Algorithm CR-MPC as long as the modified rules still satisfy Condition CSR.)

(In addition, a preliminary version of the proposed algorithm, with a modified version of Rule JOT (see [19] for details), was successfully tested in [20] on CQPs arising from a positivity-preserving numerical scheme for solving linear kinetic transport equations. We also ran comparison tests with the constraint-reduced algorithm of [23], for which polynomial complexity was established (as was superlinear convergence) for general semidefinite optimization problems. As expected, that algorithm could not compete (it was orders of magnitude slower) with algorithms specifically targeting CQP.)
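To make the selection mechanics concrete, the threshold update and working-set selection of Rule R can be sketched as follows in Python. This is a minimal sketch: the function and variable names, the default parameter values, and the dictionary used to carry state across iterations are illustrative assumptions, not the settings used in the paper's experiments.

```python
import numpy as np

def rule_r_select(k, s, E_k, state, delta_bar=1.0, beta=0.4, theta=0.5):
    """One application of Rule R (sketch).

    k      : iteration count
    s      : slack vector s^k = A x^k - b  (length m)
    E_k    : current error E(x^k, lambda^k)
    state  : dict carrying the threshold delta and E_min across iterations
    Returns the working set Q^k as an index array.
    """
    assert 0.0 < beta < theta < 1.0
    if k == 0:
        state["delta"] = delta_bar       # initial threshold
        state["E_min"] = E_k
    elif E_k <= beta * state["E_min"]:   # error dropped enough: tighten threshold
        state["delta"] *= theta
        state["E_min"] = E_k
    # otherwise: keep the previous threshold unchanged
    return np.flatnonzero(s <= state["delta"])
```

Since β < θ, the threshold δ^k shrinks more slowly than the error once the iterates converge, which is what keeps the active constraints inside the working set.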
Numerical tests were carried out with an "augmented" Rule R that also includes {i ∈ m : λ^k_i ≥ δ^k} (with the same δ^k as in the original Rule R). The results suggest that, on the class of imbalanced problems considered in this section, while introducing some overhead, such augmentation brings no benefit.

3.2 Implementation Details

All numerical tests were run with a Matlab implementation of Algorithm CR-MPC on a machine with an Intel(R) Core(TM) i5-4200 CPU (3.1 GHz), 4 GB RAM, Windows 7 Enterprise, and Matlab 7.12.0 (R2011a). In the implementation, E(x, λ) (see (17)–(18)) is normalized via division by the factor max{‖A‖_∞, ‖H‖_∞, ‖c‖_∞}, and 2-norms are used in (17) and in Steps 9 and 10. In addition, for scaling purposes (see, for example, [18]), we used the normalized constraints (DA)x ≥ Db, where D = diag(1/‖a_i‖).

To highlight the significance of constraint-selection rules, a dense direct Cholesky solver was used to solve the normal equations (20) and (15). Following [28] and [18], we set s_i := max{s_i, ǫ_s} for all i, for a small positive safeguard ǫ_s, when computing M(Q) in (12). Such a safeguard prevents M(Q) from becoming too ill-conditioned and mitigates numerical difficulties in solving (20) and (15). When the Cholesky factorization of the modified M(Q) failed, we doubled the regularization parameter ̺, recomputed M(Q) in (12), and repeated this process until M(Q) was successfully factored.

In the implementation, Algorithm CR-MPC is set to terminate as soon as either the stopping criterion (19) is satisfied or the iteration count reaches 200. The algorithm parameter values used in the tests were small fixed tolerances ε (a negative power of ten) and τ, ω, κ ∈ (0, 1), together with ν = 3, a large bound λ_max, a small initial λ, R = I_{n×n} (the n × n identity matrix), and Ē = E(x^0, λ^0), as suggested in footnote 5. The parameters in Rule R were given the values β < θ = 0.5 and δ̄ = the 2n-th smallest initial slack value. In Rule JOT, κ = 0.25 is used as in [18], and q_U = m was selected (although q_U = 3n is suggested as a "good heuristic" in [18]) to protect against a possibly very large number of active constraints at the solution; the numerical results in [18] suggest that there is no significant downside to using q_U = m. In Rule FFK–CWH, r was fixed in (0, 1).

(An alternative approach to handling an ill-conditioned M(Q) is to apply a variant of the Cholesky factorization that handles positive semi-definite matrices, such as the Cholesky-infinity factorization (i.e., cholinc(X,'inf') in Matlab) or the diagonal pivoting strategy discussed in [36, Chapter 11]. Neither implementation makes a notable difference in the numerical results reported in this paper, since the Cholesky factorization fails in fewer than 1% of the tested problems.)

3.3 Randomly Generated Problems

We first consider imbalanced (m ≫ n) randomly generated problems; we used m := 10 000 and n ranging from 10 to 500. Problems of the form (P) were generated in a similar way as those used in [18, 28, 34]. The entries of A and c were taken from the standard normal distribution N(0, 1), the entries of x^0 from the uniform distribution U(0, 1), and the entries of s^0 from a uniform distribution bounded away from zero; then b := Ax^0 − s^0, which guarantees that x^0 is strictly feasible. We considered two sub-classes of problems: (i) strongly convex, with H diagonal with random positive diagonal entries drawn from a uniform distribution, and (ii) linear optimization, with H = 0. We solved 50 randomly generated problems for each sub-class of H and for each problem size, and report the results averaged over the 50 problems. There was no instance of failure on these problems. Figure 1 shows the results. (We also ran tests with H rank-deficient but nonzero, with similar results.)
[Figure 1 panels: (a) strongly convex QP (H ≻ 0); (b) linear optimization (H = 0); each with three plots showing iteration count, average |Q|, and computation time (sec) versus n ∈ {10, 20, 50, 100, 200, 500}.]

Fig. 1:
Randomly generated problems with m = 10 000 constraints– Numerical resultson two types of randomly generated problems. In each figure, the x -axis is the number ofvariables ( n ) and the y -axis is iteration count, average size of working set, or computationtime, all averaged over the 50 problem instances and plotted in logarithmic scale. Theresults of Rule All , Rule FFK–CWH, Rule JOT, and Rule R are plotted as blue triangles,red circles, yellow squares, and purple crosses, respectively.
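The random test-problem generation described above can be sketched as follows. The helper name, the seeding interface, and the upper bound 2.0 on the slack distribution are assumptions made here for illustration (the source does not specify that bound); the structure (A, c Gaussian; b built from a strictly feasible x^0) follows the description in the text.

```python
import numpy as np

def generate_problem(m, n, strongly_convex=True, seed=None):
    """Generate a random imbalanced CQP of the form (P):
        minimize 0.5 x'Hx + c'x   subject to   A x >= b.
    Entries of A and c are N(0,1); x0 is uniform on (0,1); the slacks s0
    are uniform and bounded away from zero, so b := A x0 - s0 makes x0
    strictly feasible.  The bound 2.0 below is an assumption.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    c = rng.standard_normal(n)
    x0 = rng.uniform(0.0, 1.0, n)
    s0 = rng.uniform(1.0, 2.0, m)            # strictly positive slacks
    b = A @ x0 - s0
    if strongly_convex:
        H = np.diag(rng.uniform(0.0, 1.0, n))  # diagonal, positive (sub-class (i))
    else:
        H = np.zeros((n, n))                   # linear optimization (sub-class (ii))
    return H, c, A, b, x0
```

By construction A x^0 − b = s^0 > 0, so the generated starting point is strictly feasible, as required by the algorithm.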
It is clear from the plots that, in terms of computation time, Rule R outperforms the other constraint-selection rules on the randomly generated problems we tested. When the number of variables (n) is 1% to 5% of the number of constraints (m) (i.e., n = 100 to 500), Algorithm CR-MPC with Rule R is two to five times faster than with the second-best rule, and 20 to 50 times faster than the unreduced algorithm (Rule All). When n is lowered to less than 1% of m, as n decreases, Rule R is asymptotically more restrictive than Rule FFK–CWH and Rule JOT. We believe this may be the key reason that Rule R outperforms the other rules, especially on problems with small n.

3.4 Data-Fitting Problems

We also applied Algorithm CR-MPC to CQPs arising from two instances of a data-fitting problem: trigonometric curve fitting to noisy observed data points. This problem was formulated in [28] as a linear optimization problem, and then reformulated in [18] as a CQP by imposing a regularization term. The CQP formulation of this problem, taken from [18], is as follows. Let g : [0, 1] → R be a given function of time, and let b̄ := [b̄_1, …, b̄_m̄]^T ∈ R^m̄ be a vector that collects noisy observations of g at sample times t_1, …, t_m̄. The problem aims at finding a trigonometric expansion u(t) that best approximates g given the noisy data b̄. Here u(t) := Σ_{j=1}^{n̄} x̄_j ψ_j(t), with the trigonometric basis

ψ_j(t) := cos(2(j − 1)πt) for j = 1, …, ⌈n̄/2⌉, and ψ_j(t) := sin(2(j − ⌈n̄/2⌉)πt) for j = ⌈n̄/2⌉ + 1, …, n̄.

Equivalently, at the sample times, u = Ā x̄, where x̄ := [x̄_1, …, x̄_n̄]^T and Ā is an m̄ × n̄ matrix with entries ā_ij = ψ_j(t_i). Based on a regularized minimax approach, the problem is then formulated as

minimize_{x̄ ∈ R^n̄} ‖Ā x̄ − b̄‖_∞ + (ᾱ/2) x̄^T H̄ x̄,

where H̄ ⪰ 0 is a symmetric n̄ × n̄ matrix, ᾱ is a regularization parameter, and x̄^T H̄ x̄ is a regularization term that helps resist over-fitting.
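Assembling the basis matrix Ā with entries ā_ij = ψ_j(t_i) can be sketched as follows. The function name is ours, and the index split at ⌈n̄/2⌉ is reconstructed from the formulation above (the half-index was partially garbled in the source), so treat the exact frequency assignment as an assumption.

```python
import numpy as np

def trig_basis_matrix(t, n_bar):
    """Build the m_bar x n_bar matrix A_bar with entries psi_j(t_i):
    the first ceil(n_bar/2) columns are cosines, the rest sines."""
    t = np.asarray(t, dtype=float)
    half = int(np.ceil(n_bar / 2))
    cols = []
    for j in range(1, n_bar + 1):
        if j <= half:
            cols.append(np.cos(2 * (j - 1) * np.pi * t))    # psi_j = cos(2(j-1)*pi*t)
        else:
            cols.append(np.sin(2 * (j - half) * np.pi * t)) # psi_j = sin(2(j-half)*pi*t)
    return np.column_stack(cols)
```

Note that the first column (j = 1) is the constant function cos(0) = 1, so the expansion includes a constant offset term.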
This problem can be rewritten as

minimize_{x̄ ∈ R^n̄, v ∈ R}  v + (ᾱ/2) x̄^T H̄ x̄  subject to  Ā x̄ − b̄ ≥ −v 1,  −Ā x̄ + b̄ ≥ −v 1,

which is a CQP in the form of (P) with number of variables n = n̄ + 1 and number of constraints m = 2m̄. Following [18], we tested Algorithm CR-MPC on this problem with the two target functions g(t) = sin(10t) cos(25t) and g(t) = sin(5t) cos²(10t). (Interestingly, on strongly convex problems, most rules, and especially Rule R, need a smaller number of iterations than Rule All, except for n = 500!)

In each case, as in [18], we sampled the data uniformly in time and set b̄_i := g(t_i) + ǫ_i, where the ǫ_i are independent and identically distributed noise samples drawn from a zero-mean normal distribution, for i = 1, …, m̄; also as in [18], the regularization parameter ᾱ was fixed at a small value and H̄ = diag(h̄), with h̄_1 := 0, h̄_j = h̄_{j+⌈n̄/2⌉−1} := 2(j − 1)²π² for j = 2, …, ⌈n̄/2⌉, and h̄_n̄ := 2⌊n̄/2⌋²π².

Figure 2 reports our numerical results. Since these problems involve noise, we solved the problem 50 times for each target function and report the average results. (The average is not reported, and the corresponding symbol is not plotted, for problems on which one or more of the 50 runs failed, i.e., did not converge within 200 iterations.) The sizes of the tested problems are m := 10 000 and n ranging from 10 to 500.
[Figure 2 panels: (a) g(t) = sin(10t) cos(25t); (b) g(t) = sin(5t) cos²(10t); each with three plots showing iteration count, average |Q|, and computation time (sec) versus n.]

Fig. 2:
Data-fitting problems with m = 10 000 constraints – Numerical results on two data-fitting problems. In each figure, the x -axis is the number of variables ( n ) and the y -axisis iteration count, average size of working set, or computation time, all averaged over the50 problem instances and plotted in logarithmic scale. The results of Rule All , Rule FFK–CWH, Rule JOT, and Rule R are plotted as blue triangles, red circles, yellow squares, andpurple crosses, respectively.
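The rewriting of the regularized minimax problem as a CQP in the form (P), used in these data-fitting experiments, can be sketched as follows. The helper name and the chosen stacking order of variables (x̄ first, then v) and constraints are assumptions made here; the mathematics follows the reformulation stated above.

```python
import numpy as np

def minimax_to_cqp(A_bar, b_bar, H_bar, alpha_bar):
    """Rewrite  min_x ||A_bar x - b_bar||_inf + (alpha/2) x' H_bar x  as a CQP:
    variables (x, v), objective v + (alpha/2) x' H_bar x, constraints
        A_bar x - b_bar >= -v*1   and   -A_bar x + b_bar >= -v*1,
    i.e. n = n_bar + 1 variables and m = 2*m_bar inequality constraints.
    """
    m_bar, n_bar = A_bar.shape
    ones = np.ones((m_bar, 1))
    # [ A_bar  1 ; -A_bar  1 ] (x, v) >= (b_bar, -b_bar)
    A = np.block([[A_bar, ones], [-A_bar, ones]])
    b = np.concatenate([b_bar, -b_bar])
    H = np.zeros((n_bar + 1, n_bar + 1))
    H[:n_bar, :n_bar] = alpha_bar * H_bar     # (alpha/2) x' H_bar x term
    c = np.zeros(n_bar + 1)
    c[-1] = 1.0                                # linear objective picks out v
    return H, c, A, b
```

At a solution, v equals the worst-case residual ‖Ā x̄ − b̄‖_∞, since both one-sided bounds are enforced by the stacked constraints.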
The results show that Rule R again outperforms the other constraint-selection rules in terms of computation time, especially on problems with relatively small n. In general, Rule R is two to ten times faster than the second-best rule. (We also ran the tests without noise and with noise of variance between 0 and 1, and the results were very similar to the ones reported here.) We observe in Figure 2 that Rule JOT and Rule All failed to converge within 200 iterations in a few instances on problems with relatively large n. Numerical results suggest that these failures are due to ill-conditioning of M(Q) producing poor search directions. Thus, we conjecture that accurate identification of the active constraints not only reduces computation time, but also alleviates the ill-conditioning of M(Q) near optimal points.

3.5 Comparison with Broadly Used, Mature Codes

With a view toward calibrating the performance of Algorithm CR-MPC reported in Sections 3.3 and 3.4, we carried out a numerical comparison with two widely used solvers, SDPT3 [30, 32] and SeDuMi [26]. The tests were performed on the problems considered in Sections 3.3 and 3.4, with sizes m = 10 000 and n = 10, 20, 50, 100, 200, and 500, and we let the solvers decide the starting points.

Table 1 reports the iteration counts and computation times of SDPT3, SeDuMi, and Algorithm CR-MPC with Rule All and Rule R on each type of tested problem. The numbers in Table 1 are average values over 50 runs and over all six tested values of n. These results show that, for such significantly imbalanced problems, constraint reduction brings a clear edge. In particular, Algorithm CR-MPC with Rule R shows significantly better time-performance than the two mature solvers.

Algorithm   Random, H ≻ 0 (iter / time)   Random, H = 0 (iter / time)   sin(10t)cos(25t) (iter / time)   sin(5t)cos²(10t) (iter / time)
SDPT3       23.6 / 35.8                   21.2 / 22.3                   26.2 / 46.1                      26.7 / 48.7
SeDuMi      22.0 / 4.0                    16.3 / 4.4                    26.9 / 5.1                       26.1 / 5.5
Rule All
Table 1: Comparison of Algorithm CR-MPC with popular codes – This table reports the iteration count and computation time (sec) for each of the compared algorithms on each type of tested problem, averaged over 50 runs. Every reported number is also averaged over the tested problem sizes: m = 10 000 and n = 10, 20, 50, 100, 200, 500.

(It may also be worth pointing out that, a short decade ago, in [33], the performance of an early version of a constraint-reduced MPC algorithm, with a more elementary constraint-selection rule than Rule JOT, was compared, on imbalanced filter-design applications (linear optimization), to the "revised primal simplex with partial pricing" algorithm discussed in [2], with encouraging results: on the tested problems, the constraint-reduced code proved competitive with the simplex code on some such problems and superior on others. While SDPT3 and SeDuMi have a broader scope (second-order cone optimization, semidefinite optimization) than Algorithm CR-MPC, they allow a close comparison with our code, as Matlab implementations are freely available within the CVX Matlab package [10, 11].)
Convergence properties of the constraint-reduced algorithm proposed in this paper, which includes a number of novelties, were proved independently of the choice of the working-set selection rule, provided the rule satisfies Condition CSR. Under a specific such rule, based on a modified active-set identification scheme, the algorithm performs remarkably well in practice, both on randomly generated problems (CQPs as well as linear optimization problems) and on data-fitting problems.

Of course, while the focus of the present paper was on dense problems, the concept of constraint reduction also applies to imbalanced large, sparse problems. Indeed, whichever technique is used for solving the Newton-KKT system, solving instead a reduced Newton-KKT system, of like sparsity but of drastically reduced size, is bound to bring major computational savings when the total number of inequality constraints is much larger than the number of inequality constraints that are active at the solution, at least when the number of variables is reasonably small compared to the number of inequality constraints. In the case of sparse problems, the main computational cost in an IPM iteration would be that of a (sparse) Cholesky factorization or, in an iterative approach to solving the linear system, would be linked to the number of iterations necessary to reach the needed accuracy. In both cases, a major reduction in the dimension of the Newton-KKT system is bound to reduce the computation time, and savings similar to those in the dense case should be expected.
Appendix
The following results are used in the proofs in Appendices A and B. Here we assume that Q ⊆ m and that W is symmetric, with W ⪰ H ⪰ 0. First, from (10) and (13), the approximate MPC search direction (Δx, Δλ_Q, Δs_Q) defined in (16) solves

J(W, A_Q, s_Q, λ_Q) (Δx, Δλ_Q, Δs_Q) = (−∇f(x) + (A_Q)^T λ_Q, 0, −S_Q λ_Q + γσμ(Q) 1 − γ ΔS^a_Q Δλ^a_Q),  (32)

and equivalently, when s_Q > 0, from (20), (21), and (15),

M(Q) Δx = −∇f(x) + (A_Q)^T S_Q^{−1} (γσμ(Q) 1 − γ ΔS^a_Q Δλ^a_Q),
Δs_Q = A_Q Δx,
Δλ_Q = −λ_Q + S_Q^{−1} (−Λ_Q Δs_Q + γσμ(Q) 1 − γ ΔS^a_Q Δλ^a_Q).  (33)

Next, with λ̃^+ and λ̃^{a,+} given by

λ̃^+_i := λ_i + Δλ_i for i ∈ Q, λ̃^+_i := 0 for i ∈ Q^c, and λ̃^{a,+}_i := λ_i + Δλ^a_i for i ∈ Q, λ̃^{a,+}_i := 0 for i ∈ Q^c,  (34)

from the last equation of (33) and from (21), we have

λ̃^+_Q = S_Q^{−1} (−Λ_Q Δs_Q + γσμ(Q) 1 − γ ΔS^a_Q Δλ^a_Q),  (35)

λ̃^{a,+}_Q = −S_Q^{−1} Λ_Q Δs^a_Q = −S_Q^{−1} Λ_Q A_Q Δx^a,  (36)

and hence

(Δx^a)^T (A_Q)^T λ̃^{a,+}_Q = −(Δx^a)^T (A_Q)^T S_Q^{−1} Λ_Q A_Q Δx^a ≤ 0,  (37)

so that, when in addition λ > 0, (Δx^a)^T (A_Q)^T λ̃^{a,+}_Q = 0 if and only if A_Q Δx^a = 0. Also, (20) yields

∇f(x)^T Δx^a = −(Δx^a)^T M(Q) Δx^a = −(Δx^a)^T W Δx^a − (Δx^a)^T (A_Q)^T S_Q^{−1} Λ_Q A_Q Δx^a.  (38)

Since W ⪰ H, it follows from (37) that

∇f(x)^T Δx^a + (Δx^a)^T H Δx^a ≤ 0.  (39)

In addition, when S_Q ≻ 0 and Λ_Q ≻ 0, and since W ⪰ 0, the right-hand side of (38) is strictly negative as long as W Δx^a and A_Q Δx^a are not both zero. In particular, when [W (A_Q)^T] has full row rank,

∇f(x)^T Δx^a < 0 whenever Δx^a ≠ 0.  (40)

Finally, we state and prove two technical lemmas.

Lemma 3
Given an infinite index set K, {Δx^k} → 0 as k → ∞, k ∈ K, if and only if {Δx^{a,k}} → 0 as k → ∞, k ∈ K.

Proof: We show that ‖Δx^k‖ is sandwiched between constant multiples of ‖Δx^{a,k}‖. From the search direction given in (16) we have that, for all k, ‖Δx^k − Δx^{a,k}‖ = ‖γ^k Δx^{c,k}‖ ≤ τ ‖Δx^{a,k}‖, where τ ∈ (0, 1) and the inequality follows from (7). Applying the triangle inequality leads to (1 − τ) ‖Δx^{a,k}‖ ≤ ‖Δx^k‖ ≤ (1 + τ) ‖Δx^{a,k}‖ for all k, proving the claim. ✷

Lemma 4
Suppose Assumption 1 holds. Let Q ⊂ m, A ⊆ Q, x ∈ F^o_P, s := Ax − b (> 0), and λ > 0 enjoy the following property: with Δλ_Q, Δλ^a_Q, Δs, and Δs^a produced by Iteration CR-MPC, λ_i + Δλ_i > 0 for all i ∈ A and s_i + Δs_i > 0 for all i ∈ Q \ A. Then

ᾱ_d ≥ min{ 1, min_{i ∈ Q\A} { s_i / |s_i + Δs^a_i| }, min_{i ∈ Q\A} { (s_i − |Δs^a_i|) / |s_i + Δs_i| } }  (41)

and

ᾱ_p ≥ min{ 1, min_{i ∈ A} { λ_i / |λ_i + Δλ^a_i| }, min_{i ∈ A} { (λ_i − |Δλ^a_i|) / |λ_i + Δλ_i| } }.  (42)

Proof: If ᾱ_d ≥ 1, (41) holds trivially; hence suppose ᾱ_d < 1. Then, from the definition of ᾱ_d in (24), we know that there exists some index i ∈ Q such that Δλ_i < 0 and

ᾱ_d = λ_i / |Δλ_i|.  (43)

Since λ_i + Δλ_i > 0 for all i ∈ A, we have i ∈ Q \ A. Now we consider two cases: |Δλ^a_i| ≥ |Δλ_i| and |Δλ^a_i| < |Δλ_i|. If |Δλ^a_i| ≥ |Δλ_i|, then, since the second equation in (21) is equivalently written as λ_i s_i + s_i Δλ^a_i + λ_i Δs^a_i = 0 for all i ∈ Q, and since λ_i s_i > 0 for all i ∈ m, it follows from (43) that

ᾱ_d = λ_i / |Δλ_i| ≥ λ_i / |Δλ^a_i| = s_i / |s_i + Δs^a_i|,

proving (41). To conclude, suppose now that |Δλ^a_i| < |Δλ_i|. Since (i) s_i + Δs_i > 0 for all i ∈ Q \ A; (ii) γ, σ, and μ(Q) in (35) are non-negative; and (iii) Δλ_i < 0, we have λ_i (s_i + Δs_i) ≥ s_i |Δλ_i| − γ |Δs^a_i| |Δλ^a_i|. Applying this inequality to (43) leads to

ᾱ_d = λ_i / |Δλ_i| ≥ s_i / |s_i + Δs_i| − γ |Δλ^a_i| |Δs^a_i| / (|s_i + Δs_i| |Δλ_i|) ≥ (s_i − |Δs^a_i|) / |s_i + Δs_i|,

where the last inequality holds since γ ≤ 1 and |Δλ^a_i| < |Δλ_i|. Following a very similar argument that flips the roles of s and λ, one can prove that (42) also holds. ✷
A Proof of Theorem 1 and Corollary 2
Parts of this proof are inspired by [29], [18], [16], [34], and [35]. Throughout, we assume that the constraint-selection rule used by the algorithm is such that Condition CSR is satisfied, and (except in the proof of Lemma 13) we let ε = 0 and assume that the iteration never stops.

A central feature of Algorithm CR-MPC, which plays a key role in the convergence proofs, is that it forces descent with respect to the primal objective function. The next proposition establishes some related facts.

Proposition 2
Suppose λ > 0 and s > 0, and that W satisfies W ≻ 0 and W ⪰ H. If Δx^a ≠ 0, then the following inequalities hold:

f(x + α Δx^a) < f(x), ∀ α ∈ (0, 1],  (44)
(∂/∂α) f(x + α Δx^a) < 0, ∀ α ∈ [0, 1),  (45)
f(x) − f(x + α Δx) ≥ (ω/2) (f(x) − f(x + α Δx^a)), ∀ α ∈ [0, 1],  (46)
f(x + α Δx) < f(x), ∀ α ∈ (0, 1].  (47)

Proof: When f(x + α Δx^a) is linear in α, i.e., when (Δx^a)^T H Δx^a = 0, then in view of (40), (44)–(45) hold trivially. When, on the other hand, (Δx^a)^T H Δx^a > 0, f(x + α Δx^a) is quadratic and strictly convex in α and is minimized at

α̂ = −∇f(x)^T Δx^a / ((Δx^a)^T H Δx^a) = 1 + (Δx^a)^T (W − H + (A_Q)^T S_Q^{−1} Λ_Q A_Q) Δx^a / ((Δx^a)^T H Δx^a) ≥ 1,

where we have used (38), (37), and the fact that W ⪰ H, and (44)–(45) again follow. Next, note that, since ω > 0,

ψ(θ) := ω (f(x) − f(x + Δx^a)) − (f(x) − f(x + Δx^a + θ Δx^c))

is quadratic and convex in θ. Now, since γ̄ satisfies the constraints in its definition (8), we see that ψ(γ̄) ≤ 0, and since ω ≤ 1, it follows from (44) that ψ(0) = (ω − 1)(f(x) − f(x + Δx^a)) ≤ 0. Since γ ∈ [0, γ̄] (see (7)) and ψ is convex, it follows that ψ(γ) ≤ 0, i.e., since from (16) Δx = Δx^a + γ Δx^c,

f(x) − f(x + Δx) ≥ ω (f(x) − f(x + Δx^a)),

i.e.,

−∇f(x)^T Δx − (1/2) Δx^T H Δx ≥ ω (−∇f(x)^T Δx^a − (1/2) (Δx^a)^T H Δx^a).  (48)

Now, for all α ∈ [0, 1], since H ⪰ 0, we can write

f(x) − f(x + α Δx) = −α ∇f(x)^T Δx − (α²/2) Δx^T H Δx
≥ α (−∇f(x)^T Δx − (1/2) Δx^T H Δx)
≥ ωα (−∇f(x)^T Δx^a − (1/2) (Δx^a)^T H Δx^a)
= (ωα/2) (−∇f(x)^T Δx^a − (∇f(x)^T Δx^a + (Δx^a)^T H Δx^a))
≥ (ωα/2) (−∇f(x)^T Δx^a)
≥ (ωα/2) (−∇f(x)^T Δx^a − (α/2) (Δx^a)^T H Δx^a)
= (ω/2) (f(x) − f(x + α Δx^a)),

where the second-to-last inequality uses (39) and the last one uses H ⪰ 0, proving (46). Finally, since ω > 0, (47) is a direct consequence of (46) and (44). ✷

Given that the iterates are primal-feasible, an immediate consequence of Proposition 2 is that the primal sequence is bounded.
Lemma 5
Suppose Assumption 1 holds. Then {x^k} is bounded.

We are now ready to prove a key result, relating two successive iterates, that plays a central role in the remainder of the proof of Theorem 1.
Proposition 3
Suppose Assumptions 1 and 2 hold, and either { ( x k , λ k ) } is bounded awayfrom F ∗ , or Assumption 3 also holds and { x k } converges to the unique primal solution x ∗ .Let K be an infinite index set such that (cid:18) inf k ∈ K { χ k − } = (cid:19) inf n k ∆ x a ,k − k ν + k [˜ λ a ,kQ k − ] − k ν : k ∈ K o > . (49) Then { ∆ x k } → as k → ∞ , k ∈ K .Proof From Lemma 5, { x k } is bounded, and hence so is { s k } ; by construction, s k and λ k have positive components for all k , and { λ k } ((26)–(27)) and { W k } are bounded. Further,for any infinite index set K ′ such that (49) holds, (26) and (27) imply that all componentsof { λ k } are bounded away from zero on K ′ . Since, in addition, Q k can take no more thanfinitely many different (set) values, it follows that there exist ˆ x , ˆ λ > , ˆ W (cid:23) , an indexset ˆ Q ⊆ m , and some infinite index set ˆ K ⊆ K ′ such that { x k } → ˆ x as k → ∞ , k ∈ ˆ K , { s k } → ˆ s := { A ˆ x − b } ≥ as k → ∞ , k ∈ ˆ K . (50) { λ k } → ˆ λ > as k → ∞ , k ∈ ˆ K , (51) { W k } → ˆ W as k → ∞ , k ∈ ˆ K ,Q k = ˆ Q , ∀ k ∈ ˆ K . (52)Next, under the stated assumptions, J ( ˆ W , A ˆ Q , ˆ s ˆ Q , ˆ λ ˆ Q ) is non-singular. Indeed, if { ( x k , λ k ) } is bounded away from F ∗ , then E ( x k , λ k ) is bounded away from zero and since H + R ≻ , W k = H + ̺ k R = H + min (cid:26) , E ( x k , λ k )¯ E (cid:27) R is bounded away from singularity and theclaim follows from Assumption 2 and Lemma 1. On the other hand, if Assumption 3 alsoholds and { x k } → x ∗ , then the claim follows from Condition CSR(ii) and Lemma 1. 
As a consequence of this claim, and by continuity of J, it follows from Newton-KKT systems (10) and (32) that there exist Δx̂^a, Δx̂, λ̄^a_Q̂, λ̄_Q̂ such that

{Δx^{a,k}} → Δx̂^a as k → ∞, k ∈ K̂,   (53)
{Δx^k} → Δx̂ as k → ∞, k ∈ K̂,
{Δs^k} → Δŝ := AΔx̂ as k → ∞, k ∈ K̂,   (54)
{λ̃^{a,k+1}_Q̂} → λ̄^a_Q̂ as k → ∞, k ∈ K̂,   (55)
{λ̃^{k+1}_Q̂} → λ̄_Q̂ as k → ∞, k ∈ K̂.   (56)

The remainder of the proof proceeds by contradiction. Thus suppose that, for the infinite index set K in the statement of this lemma, {Δx^k} does not tend to 0 as k → ∞, k ∈ K, i.e., for some K″ ⊆ K, ‖Δx^k‖ is bounded away from zero on K″. Use K″ as our K′ above, so that (since K̂ ⊆ K′) ‖Δx^k‖ is bounded away from zero on K̂. Then, in view of Lemma 3 (w.l.o.g.),

inf_{k∈K̂} ‖Δx^{a,k}‖ > 0.   (57)

In addition, we have A(x̂) ⊆ Q̂, an implication of Condition CSR(i) when {(x^k, λ^k)} is bounded away from F* and of Condition CSR(ii) when Assumption 3 holds and {x^k} converges to x*. With these facts in hand, we next show that the sequence of primal step sizes {α^k_p} is bounded away from zero for k ∈ K̂. To this end, let us define

λ̃′^{k+1} := −(S^k)^{−1} Λ^k Δs^k, ∀ k,   (58)

so that, for all i ∈ m and all k, λ̃′^{k+1}_i > 0 if and only if Δs^k_i < 0, and the primal portion of (24) can be written as

ᾱ^k_p := ∞ if λ̃′^{k+1} ≤ 0,  ᾱ^k_p := min_i { λ^k_i / λ̃′^{k+1}_i : λ̃′^{k+1}_i > 0 } otherwise,
α^k_p := min { 1, max { κ ᾱ^k_p, ᾱ^k_p − ‖Δx^k‖ } }.

Clearly, it is now sufficient to show that, for all i, {λ̃′^{k+1}_i} is bounded above on K̂. On the one hand, this is clearly so for i ∉ Q̂ (whence i ∉ A(x̂)), in view of (58) and (54), since {λ^k} is bounded and {s^k_i} is bounded away from zero on K̂ for i ∉ A(x̂) (from (50)). On the other hand, in view of (52), subtracting (58) from (35) yields, for all k ∈ K̂,

λ̃′^{k+1}_Q̂ = λ̃^{k+1}_Q̂ − γ^k σ^k µ^k_{(Q̂)} (S^k_Q̂)^{−1} 1 + γ^k (S^k_Q̂)^{−1} ΔS^{a,k}_Q̂ Δλ^{a,k}_Q̂.

From (56), {λ̃^{k+1}_Q̂} is bounded on K̂, and clearly the second term on the right-hand side of the above equation is non-positive component-wise. As for the third term, the second equation in (21) gives (S^k_{Q_k})^{−1} ΔS^{a,k}_{Q_k} = −(Λ^k_{Q_k})^{−1} Λ̃^{a,k+1}_{Q_k}, so that we have

γ^k (S^k_Q̂)^{−1} ΔS^{a,k}_Q̂ Δλ^{a,k}_Q̂ = −γ^k (Λ^k_Q̂)^{−1} Λ̃^{a,k+1}_Q̂ Δλ^{a,k}_Q̂, ∀ k ∈ K̂,

which is bounded on K̂ since, from (51), (55), and the definition (34) of {λ̃^{a,+}}, both {Λ̃^{a,k+1}_Q̂} and {Δλ^{a,k}_Q̂} are bounded and, from (51), {λ^k_Q̂} is bounded away from zero on K̂. Therefore {λ̃′^{k+1}_i} is bounded above on K̂ for i ∈ Q̂ as well, proving that {α^k_p} is bounded away from zero on K̂, i.e., that there exists ᾱ > 0 such that α^k_p > ᾱ for all k ∈ K̂, as claimed. Without loss of generality, choose ᾱ in (0, 1).

To conclude, we show that {f(x^k)} → −∞ as k → ∞ on K̂, which contradicts boundedness of {x^k} (Lemma 5). For all k ∈ K̂, since Δx^{a,k} ≠ 0 (by (57)) and α^k_p ∈ (ᾱ, 1], {f(x^k)} is monotonically decreasing and, for all k ∈ K̂, f(x^k + α^k_p Δx^{a,k}) < f(x^k + ᾱ Δx^{a,k}). Expanding the right-hand side yields

f(x^k + ᾱΔx^{a,k}) = f(x^k) + ᾱ∇f(x^k)^T Δx^{a,k} + (ᾱ²/2)(Δx^{a,k})^T HΔx^{a,k}
= f(x^k) + ᾱ( ∇f(x^k)^T Δx^{a,k} + (1/2)(Δx^{a,k})^T HΔx^{a,k} ) − ((ᾱ − ᾱ²)/2)(Δx^{a,k})^T HΔx^{a,k},

where the sum of the last two terms tends to a strictly negative limit as k → ∞, k ∈ K̂. Indeed, in view of (39), the second term is non-positive and (i) if (Δx̂^a)^T HΔx̂^a > 0 then, since ᾱ ∈ (0, 1), it follows from (53) and (57) that the third term tends to a strictly negative limit, and (ii) if (Δx̂^a)^T HΔx̂^a = 0 then the sum of the last two terms tends to ᾱ∇f(x̂)^T Δx̂^a, which is also strictly negative in view of (40), since we either have Ŵ ≻ 0 (in the case that {(x^k, λ^k)} is bounded away from F*) or at least [Ŵ (A_Q̂)^T] has full row rank (in the case that Assumption 3 holds, using the fact that A(x̂) ⊆ Q̂). It follows that, for some δ > 0, f(x^k + α^k_p Δx^{a,k}) < f(x^k) − δ for all k ∈ K̂ large enough. Proposition 2 (eq. (46)) then gives f(x^{k+1}) := f(x^k + α^k_p Δx^k) < f(x^k) − ωδ for all k ∈ K̂ large enough, where ω > 0. Since {f(x^k)} is monotonically decreasing, the proof is now complete. ✷

The next lemma shows that, if {Δx^{a,k}} tends to zero, then {x^k} approaches stationary points. (Here both {λ̃^{a,k+1}} and {λ̃^{k+1}} are as defined in (34).)

Lemma 6
Suppose that Assumption 1 holds and that {x^k} converges to some limit point x̂ on an infinite index set K. If {Δx^{a,k}} converges to zero on K, then (i) x̂ is stationary and

∇f(x^k) − (A_{A(x̂)})^T λ̃^{a,k+1}_{A(x̂)} → 0, as k → ∞, k ∈ K.   (59)
If, in addition, Assumption 2 holds, then (ii) {λ̃^{a,k+1}} and {λ̃^{k+1}} converge on K to λ̂, the unique multiplier associated with x̂.

Proof Suppose {x^k} → x̂ on K and {Δx^{a,k}} → 0 on K. Let s^k := Ax^k − b (> 0) for all k ∈ K and ŝ := Ax̂ − b (≥ 0), so that {s^k} → ŝ on K. As a first step toward proving Claim (i), we show that, for any i ∉ A(x̂), {λ̃^{a,k+1}_i} → 0 on K. For i ∉ A(x̂), since ŝ_i > 0, {s^k_i} is bounded away from zero on K. Since it follows from (34) and (36) that, for all k,

λ̃^{a,k+1}_i = 0, ∀ i ∉ Q_k,  and  λ̃^{a,k+1}_i = −(s^k_i)^{−1} λ^k_i Δs^{a,k}_i, ∀ i ∈ Q_k,

and since {λ^k_i} is bounded (by construction) and Δs^{a,k} = AΔx^{a,k} (by (21)), we have {λ̃^{a,k+1}_i} → 0 on K. To complete the proof of Claim (i), note that the first equation of (10) (with H replaced by W) yields

∇f(x^k) − (A_{Q_k})^T λ̃^{a,k+1}_{Q_k} = −W^k Δx^{a,k}.

Since (i) {λ̃^{a,k+1}_i} → 0 on K for i ∉ A(x̂), (ii) {W^k} is bounded (since H ⪯ W^k ⪯ H + R), (iii) {Δx^{a,k}} → 0 on K, and (iv) by definition (34), λ̃^{a,+}_i = 0 for i ∈ Q^c, we conclude that (59) holds; hence {(A_{A(x̂)})^T λ̃^{a,k+1}_{A(x̂)}} converges (since ∇f(x^k) does) as k → ∞, k ∈ K, to a point in the range of (A_{A(x̂)})^T, say (A_{A(x̂)})^T λ̂_{A(x̂)}. We get ∇f(x̂) − (A_{A(x̂)})^T λ̂_{A(x̂)} = 0, proving Claim (i). Finally, Claim (ii) follows from (59), Assumption 2, and the fact that, for i ∉ A(x̂), {λ̃^{a,k+1}_i} → 0 as k → ∞, k ∈ K, noting that the same argument applies to {λ̃^{k+1}}, using a modified version of (59), with λ̃ replacing λ̃^a, obtained by starting from the first equation of (32) instead of that of (10) and using the fact, proved next, that {λ̃^{k+1}_i} → 0 on K for all i ∉ A(x̂). From its definition in (34) and the last equation in (33), we have that, for all k,

λ̃^{k+1}_i = 0, ∀ i ∉ Q_k,
λ̃^{k+1}_i = (s^k_i)^{−1} ( −λ^k_i Δs^k_i + γ^k σ^k µ^k_{(Q_k)} − γ^k Δs^{a,k}_i Δλ^{a,k}_i ), ∀ i ∈ Q_k.

Since {λ̃^{a,k+1}_i} converges (to zero) on K, {Δλ^{a,k}_i} is bounded on K. Furthermore, from its definition (7)–(8) (see also (16)), {γ^k} is bounded and |γ^k σ^k µ^k_{(Q_k)}| ≤ τ‖Δx^{a,k}‖ for all k. Since Δs^{a,k} = AΔx^{a,k} and Δs^k = AΔx^k, in view of Lemma 3, it follows that, for i ∉ A(x̂), {λ̃^{k+1}_i} → 0 on K. ✷

Lemma 6, combined with Proposition 3 via a contradiction argument, then implies that (on a subsequence), if {x^k} does not approach F*_P, then {Δx^k} approaches zero.

Lemma 7
Suppose that Assumptions 1 and 2 hold and that {x^k} is bounded away from F*_P on some infinite index set K. Then {Δx^k} → 0 as k → ∞, k ∈ K.

Proof Proceeding by contradiction, let K be an infinite index set such that {x^k} is bounded away from F*_P on K and {Δx^k} does not tend to 0 as k → ∞, k ∈ K. Then, in view of Proposition 3 and boundedness of {x^k} (Lemma 5), there exist Q̂ ⊆ m, x̂ ∉ F*_P, and an infinite index set K̂ ⊆ K such that Q_k = Q̂ for all k ∈ K̂ and

{x^k} → x̂, as k → ∞, k ∈ K̂,
{Δx^{a,k−1}} → 0, as k → ∞, k ∈ K̂,
{[λ̃^{a,k}_Q̂]^−} → 0, as k → ∞, k ∈ K̂.

On the other hand, from (25), (16) and (7)–(8),

‖x^k − x^{k−1}‖ = ‖α^{k−1}_p Δx^{k−1}‖ ≤ ‖Δx^{k−1}‖ ≤ (1 + τ)‖Δx^{a,k−1}‖,

which implies that {x^{k−1}} → x̂ as k → ∞, k ∈ K̂. It then follows from Lemma 6 that x̂ is stationary and that {λ̃^{a,k}_{A(x̂)}} converges to the associated multiplier vector. Hence the multipliers are non-negative, contradicting the fact that x̂ ∉ F*_P. ✷

A contradiction argument based on Lemmas 6 and 7 then shows that {x^k} approaches the set of stationary points of (P).

Lemma 8
Suppose Assumptions 1 and 2 hold. Then the sequence {x^k} approaches the set of stationary points of (P), i.e., there exists a sequence {x̂^k} of stationary points such that ‖x^k − x̂^k‖ goes to zero as k → ∞.

Proof Proceeding by contradiction, suppose the claim does not hold, i.e., (invoking Lemma 5) suppose {x^k} converges to some non-stationary point x̂ on some infinite index set K. Then {Δx^{a,k}} does not converge to zero on K (Lemma 6(i)), nor does {Δx^k} (Lemma 3). Since x̂ is non-stationary, this is in contradiction with Lemma 7. ✷

The next technical result, proved in [29, Lemma 3.6], invokes analogues of Lemmas 5, 7 and 8.
Lemma 9
Suppose Assumptions 1 and 2 hold. Suppose {x^k} is bounded away from F*_P. Let x̂ and x̂′ be limit points of {x^k} and let λ̂ and λ̂′ be the associated KKT multipliers. Then λ̂ = λ̂′.

Convergence of {x^k} to F*_P ensues, proving Claim (i) of Theorem 1.

Lemma 10
Suppose Assumptions 1 and 2 hold. Then {x^k} converges to F*_P.

Proof We proceed by contradiction. Thus suppose {x^k} does not converge to F*_P. Then, since {x^k} is bounded (Lemma 5) and (by Proposition 2) {f(x^k)} is a bounded, monotonically decreasing sequence, {x^k} has at least one limit point x̂ that is not in F*_P, and f(x̂) = inf_k f(x^k). Then, by Lemmas 7 and 3, {Δx^k} and {Δx^{a,k}} converge to zero as k → ∞. It follows from Lemmas 6 and 9 that all limit points of {x^k} are stationary, and that both {λ̃^{a,k}} and {λ̃^k} converge to λ̂, the common KKT multiplier vector associated to all limit points of {x^k}. Since x̂ ∉ F*_P, there exists i such that λ̂_i < 0, so that, for some k̂ > 0,

λ̃^{a,k+1}_i < 0 and λ̃^{k+1}_i < 0, ∀ k > k̂,   (60)

which, in view of Step 8 of the algorithm, implies that i ∈ Q_k for all k > k̂. Then (36) gives

Δs^{a,k}_i = −(λ^k_i)^{−1} s^k_i λ̃^{a,k+1}_i, ∀ k > k̂,

where s^k_i > 0 and λ^k_i > 0, so that Δs^{a,k}_i > 0 for all k > k̂. On the other hand, the last equation of (33) gives

Δs^k_i = (λ^k_i)^{−1} ( −s^k_i λ̃^{k+1}_i + γ^k σ^k µ^k_{(Q_k)} − γ^k Δs^{a,k}_i Δλ^{a,k}_i ), ∀ k > k̂,   (61)

where γ^k ≥ 0, σ^k ≥ 0, and µ^k_{(Q_k)} ≥ 0. For k > k̂, Δλ^{a,k}_i < 0 since λ^k_i > 0 and λ̃^{a,k+1}_i (= λ^k_i + Δλ^{a,k}_i) < 0. It follows that all terms in (61) are non-negative and the first term is positive, so that Δs^k_i > 0 for all k > k̂. Moreover, for all k > k̂, we have s^{k+1}_i = s^k_i + α^k_p Δs^k_i > s^k_i > 0, where α^k_p > 0 because s^k > 0. Since {s^k} is bounded (Lemma 5), we then conclude that {s^k_i} → ŝ_i > 0, so that ŝ_i λ̂_i < 0, in contradiction with the stationarity of limit points. ✷

Under strict complementarity, the next lemma then establishes appropriate convergence of the multipliers, setting the stage for the proof of part (ii) of Theorem 1 in the following lemma.
Lemma 11
Suppose Assumptions 1 to 3 hold and let (x*, λ*) be the unique primal-dual solution. Then, given any infinite index set K such that {Δx^{a,k}}_{k∈K} → 0, it holds that {λ^{k+1}}_{k∈K} → ξ*, where ξ*_i := min{λ*_i, λ_max} for all i ∈ m.

Proof Lemma 10 guarantees that {x^k} → x*. Let K be an infinite index set such that {Δx^{a,k}}_{k∈K} → 0. Then, in view of Lemma 6(ii), {λ̃^{a,k+1}}_{k∈K} → λ* ≥ 0. Accordingly, {χ^k} = {‖Δx^{a,k}‖^ν + ‖[λ̃^{a,k+1}_{Q_k}]^−‖^ν} → 0 as k → ∞, k ∈ K. Hence, in view of (26) and (27), the proof will be complete if we show that {λ̆^{k+1}}_{k∈K} → λ*, where λ̆^{k+1}_i := λ^k_i + α^k_d Δλ^k_i for i ∈ Q_k and λ̆^{k+1}_i := µ^{k+1}_{(Q_k)}/s^{k+1}_i for i ∈ Q^c_k, or equivalently, {λ̃^{k+1} − λ̆^{k+1}}_{k∈K} → 0, which we do now.

For every Q ⊆ m, define the index set K(Q) := {k ∈ K : Q_k = Q}, and let Q := {Q ⊆ m : |K(Q)| = ∞}. We first show that, for all Q ∈ Q, {λ̃^{k+1}_Q − λ̆^{k+1}_Q}_{k∈K(Q)} → 0. For Q ∈ Q, the definition (34) of λ̃^+ yields

‖λ̃^{k+1}_Q − λ̆^{k+1}_Q‖ = (1 − α^k_d)‖Δλ^k_Q‖, k ∈ K(Q).

Since boundedness of {λ^k_Q} (by construction) and of {λ̃^{k+1}_Q}_{k∈K} (= {λ^k_Q + Δλ^k_Q}_{k∈K}) (by Lemma 6(ii)) implies boundedness of {Δλ^k_Q}_{k∈K}, we only need {α^k_d}_{k∈K(Q)} → 1 to conclude that ‖λ̃^{k+1}_Q − λ̆^{k+1}_Q‖ → 0 on K(Q). Now, {Δx^{a,k}}_{k∈K} → 0 implies that {Δs^{a,k}}_{k∈K} → 0 and, from Lemma 3, that {Δx^k}_{k∈K} → 0, implying that {Δs^k}_{k∈K} → 0; and {x^k} → x* yields {s^k} → s* := Ax* − b, so s^k_i + Δs^k_i > 0 for i ∈ A(x*)^c and k ∈ K large enough. Moreover, Assumption 3 gives λ*_i > 0 for i ∈ A(x*) so that, for sufficiently large k ∈ K, λ̃^{k+1}_i > 0 for i ∈ A(x*), and Condition CSR(ii) implies that A(x*) ⊆ Q, so Lemma 4 applies, with A := A(x*). It follows that {ᾱ^k_d}_{k∈K} → 1, since all terms on the right-hand side of (41) converge to one on K. Thus, from the definition of α^k_d in (24) and the fact that {Δx^k}_{k∈K} → 0, we have {α^k_d}_{k∈K} → 1 and {λ̃^{k+1}_Q − λ̆^{k+1}_Q}_{k∈K(Q)} → 0.

It remains to show that, for all Q ∈ Q, {λ̃^{k+1}_{Q^c} − λ̆^{k+1}_{Q^c}}_{k∈K(Q)} → 0. To show this, we first note that, since {χ^k}_{k∈K} → 0, it follows from (26) and (27) and the fact established above that {λ̆^{k+1}_Q}_{K(Q)} → λ*_Q that, for all Q ∈ Q,

{λ^{k+1}_Q} → ξ*_Q, k → ∞, k ∈ K(Q).   (62)

Next, from (26), (27), and the definition (34) of λ̃^+, we have, for Q ∈ Q and sufficiently large k ∈ K(Q),

|λ̃^{k+1}_i − λ̆^{k+1}_i| = λ̆^{k+1}_i = µ^{k+1}_{(Q)}/s^{k+1}_i, i ∈ Q^c.   (63)

Clearly, since A(x*) ⊆ Q, we have s*_i > 0 for i ∈ Q^c. Hence, since {Δs^k}_{k∈K} → 0, {s^{k+1}_i} is bounded away from zero on K for i ∈ Q^c. When Q is empty, the right-hand side of (63) is set to zero (see definition (14) of µ_{(Q)}). When Q is not empty, since ξ*_i = 0 whenever λ*_i = 0, (62) and complementary slackness give

{µ^{k+1}_{(Q)}} = { (s^{k+1}_Q)^T λ^{k+1}_Q / |Q| } → (s*_Q)^T ξ*_Q / |Q| = 0, k ∈ K(Q),

and it follows from (63) that {λ̃^{k+1}_{Q^c} − λ̆^{k+1}_{Q^c}}_{k∈K(Q)} → 0, completing the proof. ✷

Claim (ii) of Theorem 1 can now be proved.
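The scalar χ^k = ‖Δx^{a,k}‖^ν + ‖[λ̃^{a,k+1}_{Q_k}]^−‖^ν that drives the multiplier-update logic in (26)–(27) is inexpensive to evaluate. The sketch below computes it using the entrywise negative part; only the formula comes from the text, and all names are hypothetical.

```python
import numpy as np

def chi(dx_affine, lam_affine_Q, nu=2):
    """chi = ||dx^a||^nu + ||[lam^a_Q]^-||^nu, where [v]^- is the
    entrywise negative part.  Illustrative sketch; names are assumptions."""
    dx_affine = np.asarray(dx_affine, dtype=float)
    neg_part = np.minimum(np.asarray(lam_affine_Q, dtype=float), 0.0)
    return np.linalg.norm(dx_affine) ** nu + np.linalg.norm(neg_part) ** nu
```

When χ^k vanishes, the updates (26)–(27) keep λ^{k+1} close to the Newton multiplier estimate, which is exactly what the proof of Lemma 11 exploits.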
Lemma 12
Suppose Assumptions 1 to 3 hold and let (x*, λ*) be the unique primal-dual solution. Then {λ̃^k} → λ* and {λ^k} → ξ*, with ξ*_i := min{λ*_i, λ_max} for all i ∈ m.

Proof Again, Lemma 10 guarantees that {x^k} → x* and {s^k} → s* := Ax* − b. Note that if {Δx^{a,k}} → 0, the claims are immediate consequences of Lemmas 6 and 11. We now prove by contradiction that {Δx^{a,k}} → 0. Thus, suppose that for some infinite index set K, inf_{k∈K} ‖Δx^{a,k}‖ > 0. Then Lemma 3 gives inf_{k∈K} ‖Δx^k‖ > 0. It follows from Proposition 3 that, on some infinite index set K′ ⊆ K, {Δx^{a,k−1}} → 0 and {[λ̃^{a,k}]^−} → 0. Since Q_k is selected from a finite set and {W^k} is bounded, we can assume without loss of generality that Q_k = Q on K′ for some Q ⊆ m, and that {W^k} → W* ⪰ H on K′. Further, from Lemma 11, {λ^k}_{k∈K′} → ξ*. Therefore, {J(W^k, A_{Q_k}, s^k_{Q_k}, λ^k_{Q_k})}_{k∈K′} → J(W*, A_Q, s*_Q, ξ*_Q), and in view of Assumptions 2 and 3 and Lemma 1, J(W*, A_Q, s*_Q, ξ*_Q) is non-singular (since (x*, λ*) is optimal). It follows from (10), with W substituted for H, that {Δx^{a,k}} → 0 on K′, a contradiction, proving that {Δx^{a,k}} → 0. ✷

Claim (iv) of Theorem 1 follows as well.
Lemma 13
Suppose Assumptions 1 and 2 hold and ε > 0. Then Algorithm CR-MPC terminates (in Step 1) after finitely many iterations.

Proof If {(x^k, λ^k)} has a limit point in F*, then inf_k {E^k} = 0, proving the claim. Thus, suppose that {(x^k, λ^k)} is bounded away from F*. In view of Lemmas 5 and 10, {x^k} has a limit point x* ∈ F*_P. Assumption 2 then implies that there exists a unique KKT multiplier vector λ* ≥ 0 associated to x*. If (x*, λ*) ∈ F* is a limit point of {(x^k, λ̃^k)}, which also implies that inf_k {E(x^k, λ̃^k)} = 0, then, in view of the stopping criterion, the claim again follows. Thus, further suppose that there is an infinite index set K such that {x^k}_{k∈K} → x*, but inf_{k∈K} ‖λ̃^k − λ*‖ > 0. It then follows from Lemma 6(ii) that {Δx^{a,k}} does not converge to zero on K, and from Lemma 3 that neither does {Δx^k}. Proposition 3 and Lemma 3 then imply that {Δx^{a,k−1}} → 0 and {Δx^{k−1}} → 0 on some infinite index set K′ ⊆ K. Next, from Lemmas 5 and 10, we have {x^{k−1}}_{k∈K″} → x** ∈ F*_P for some infinite index set K″ ⊆ K′, and in view of Lemma 6(ii), {λ̃^k}_{k∈K″} → λ**, where λ** is the KKT multiplier associated to x**. Since α^{k−1}_p ∈ [0, 1] for all k, we also have {x^k}_{k∈K″} = {x^{k−1} + α^{k−1}_p Δx^{k−1}}_{k∈K″} → x**, i.e., {(x^k, λ̃^k)}_{k∈K″} → (x**, λ**) ∈ F*, completing the proof. ✷

Proof of Theorem 1.
Claim (i) was proved in Lemma 10 and Claim (ii) in Lemma 12; Claim (iii) is a direct consequence of Condition CSR(ii), and Claim (iv) was proved in Lemma 13.
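The primal step-size safeguard (24), built on the ratio test (58), underlies the monotone-decrease arguments used throughout the proofs above. The following is a minimal sketch of that rule in the form recovered from the text; the function name, κ ∈ (0, 1), and input conventions are all assumptions, not the authors' implementation.

```python
import numpy as np

def primal_step_size(lam, lam_prime, dx, kappa=0.99):
    """Sketch of the primal step-size rule (24):
    a min-ratio test on lam' = -(S^-1) Lambda (Delta s), then damping.
    All names are hypothetical; only the formulas follow the text."""
    lam, lam_prime, dx = map(np.asarray, (lam, lam_prime, dx))
    mask = lam_prime > 0                      # blocking components only
    if not np.any(mask):
        alpha_bar = np.inf                    # no constraint blocks the step
    else:
        alpha_bar = np.min(lam[mask] / lam_prime[mask])
    # damped step, clipped to (0, 1] as in (24)
    return min(1.0, max(kappa * alpha_bar, alpha_bar - np.linalg.norm(dx)))
```

Note that the damping term ᾱ_p − ‖Δx‖ lets the step approach the boundary fraction 1 as the search direction vanishes, which is what drives the q-quadratic local rate analyzed in Appendix B.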
Proof of Corollary 2.
From Theorem 1, {(x^k, λ^k)} → (x*, λ*), i.e., {E^k} → 0. It follows that (i) in view of Proposition 1 and Condition CSR(ii), Q_k ⊇ A(x*) for all k large enough, and (ii) in view of Rule R, {δ^k} → 0, so that Q_k eventually excludes all indices that are not in A(x*). ✷

B Proof of Theorem 2
Parts of this proof are adapted from [16,29,34]. Throughout, we assume that Assumption 3 holds (so that Assumption 1 also holds), that ε = 0, that the iteration never stops, and that λ*_i < λ_max for all i.

Newton's method plays the central role in the local analysis. The following lemma is standard or readily proved; see, e.g., [29, Proposition 3.10].

Lemma 14
Let Φ : R^n → R^n be twice continuously differentiable and let t* ∈ R^n be such that Φ(t*) = 0. Suppose there exists ρ > 0 such that ∂Φ/∂t(t) is non-singular for all t ∈ B(t*, ρ). Define Δ_N t to be the Newton increment at t, i.e.,

Δ_N t = −( ∂Φ/∂t(t) )^{−1} Φ(t).

Then, given any c > 0, there exists c* > 0 such that, for all t ∈ B(t*, ρ), if t^+ ∈ R^n satisfies

min{ |t^+_i − t*_i|, |t^+_i − (t_i + (Δ_N t)_i)| } ≤ c max{ ‖Δ_N t‖², ‖t − t*‖² }, i = 1, …, n,   (64)

then ‖t^+ − t*‖ ≤ c* ‖t − t*‖².

For convenience, define z := (x, λ) (as well as z* := (x*, λ*), etc.). For z ∈ F^o := {z : x ∈ F^o_P, λ > 0}, define

̺(z) := min{ 1, E(x, λ)/Ē }  and  W(z) := H + ̺(z) R.
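Lemma 14 packages the standard local behavior of Newton's method: any iterate that stays within O(max{‖Δ_N t‖², ‖t − t*‖²}) of the Newton update inherits the q-quadratic rate. A small self-contained illustration of that rate (the toy map below is an assumption for demonstration only, not the Φ_Q of this paper):

```python
import numpy as np

def newton_step(t):
    """Newton increment Delta_N t for the toy map
    Phi(t) = (t0^2 - 1, t0*t1), which has the root t* = (1, 0)."""
    J = np.array([[2.0 * t[0], 0.0],
                  [t[1],       t[0]]])          # Jacobian of Phi at t
    Phi = np.array([t[0] ** 2 - 1.0, t[0] * t[1]])
    return -np.linalg.solve(J, Phi)

t = np.array([1.5, 0.5])
errors = []
for _ in range(5):
    t = t + newton_step(t)                      # pure Newton iteration
    errors.append(np.linalg.norm(t - np.array([1.0, 0.0])))
# each error is roughly the square of the previous one (q-quadratic)
```

The remainder of the appendix shows that the CR-MPC update satisfies the nearness condition (64) with respect to this Newton iteration, so the same quadratic contraction applies.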
The gist of the remainder of this appendix is to apply Lemma 14 to

Φ_Q(z) := [ Hx − (A_Q)^T λ_Q + c ; Λ_Q (A_Q x − b_Q) ], Q ⊆ m.

(Note that Φ_Q(z*) = 0.) Let z_Q := (x, λ_Q); then the step taken on the Q components along the search direction generated by Algorithm CR-MPC is analogously given by z̆^+_Q := (x^+, λ̆^+_Q) with λ̆^+_Q := λ_Q + α_d Δλ_Q. The first major step of the proof is achieved by Proposition 4 below, where the focus is on z̆^+_Q rather than on z^+. Thus we compare z̆^+_Q, with Q ∈ Q*, to the Q components of the (unregularized) Newton step, i.e., z_Q + (Δ_N z)_Q. Define

A := diag( α_p I_n, α_d I_{|Q|} ),  and  α := min{α_p, α_d}.

The difference between the CR-MPC iteration and the Newton iteration can be written as

‖z̆^+_Q − (z_Q + (Δ_N z)_Q)‖
≤ ‖z̆^+_Q − (z_Q + Δz_Q)‖ + ‖Δz_Q − Δz^a_Q‖ + ‖Δz^a_Q − Δz^0_Q‖ + ‖Δz^0_Q − (Δ_N z)_Q‖
= ‖(I − A)Δz_Q‖ + γ‖Δz^c_Q‖ + ‖Δz^a_Q − Δz^0_Q‖ + ‖Δz^0_Q − (Δ_N z)_Q‖
≤ (1 − α)‖Δz_Q‖ + ‖Δz^c_Q‖ + ‖Δz^a_Q − Δz^0_Q‖ + ‖Δz^0_Q − (Δ_N z)_Q‖,   (65)

where Δz_Q := (Δx, Δλ_Q), Δz^a_Q := (Δx^a, Δλ^a_Q), Δz^c_Q := (Δx^c, Δλ^c_Q), and Δz^0_Q is the (constraint-reduced) affine-scaling direction for the original (unregularized) system (so Δ_N z = Δz^0_m). Let

J^a(W, A, s, λ) := [ W, −A^T ; ΛA, S ].

The following readily proved lemma will be of help. (For details, see Lemmas B.15 and B.16 in [16]; also Lemmas 13 and 1 in [28].)
Lemma 15
Let s, λ ∈ R^m and Q ⊆ m be arbitrary and let W be symmetric, with W ⪰ H. Then (i) J^a(W, A_Q, s_Q, λ_Q) is non-singular if and only if J(W, A_Q, s_Q, λ_Q) is, and (ii) if A(x*) ⊆ Q, then J(W, A_Q, s*_Q, λ*_Q) is non-singular (and so is J^a(W, A_Q, s*_Q, λ*_Q)).

With s := Ax − b, J^a(H, A_Q, s_Q, λ_Q), the system matrix for the (constraint-reduced) original (unregularized) "augmented" system, is the Jacobian of Φ_Q(z), i.e.,

J^a(H, A_Q, s_Q, λ_Q) Δz^0_Q = −Φ_Q(z),

and its regularized version J^a(W(z), A_Q, s_Q, λ_Q) satisfies (among other systems solved by Algorithm CR-MPC)

J^a(W(z), A_Q, s_Q, λ_Q) Δz^a_Q = −Φ_Q(z).

Next, we verify that J^a(W(z), A_Q, s_Q, λ_Q) is non-singular near z* (so that Δz^0_Q and Δz^a_Q in (65) are well defined) and establish other useful local properties. For convenience, we define Q* := {Q ⊆ m : A(x*) ⊆ Q}, and s̃^+ := s + Δs, s̃^{a,+} := s + Δs^a.

Lemma 16
Let ǫ* := min{ 1, min_{i∈m}(λ*_i + s*_i) }. There exist ρ* > 0 and r > 0 such that, for all z ∈ F^o ∩ B(z*, ρ*) and all Q ∈ Q*, the following hold:

(i) ‖J^a(W(z), A_Q, s_Q, λ_Q)^{−1}‖ ≤ r;
(ii) max{ ‖Δz^a_Q‖, ‖Δz_Q‖, ‖Δs^a_Q‖, ‖Δs_Q‖ } < ǫ*/4;
(iii) min{ λ_i, λ̃^{a,+}_i, λ̃^+_i } > ǫ*/2, ∀ i ∈ A(x*),
 max{ λ_i, λ̃^{a,+}_i, λ̃^+_i } < ǫ*/2, ∀ i ∈ m \ A(x*),
 max{ s_i, s̃^{a,+}_i, s̃^+_i } < ǫ*/2, ∀ i ∈ A(x*),
 min{ s_i, s̃^{a,+}_i, s̃^+_i } > ǫ*/2, ∀ i ∈ m \ A(x*);
(iv) λ̃^+_i < λ_max, ∀ i ∈ m.

Proof Claim (i) follows from Lemma 15 and continuity of J^a(W(z), A_Q, s_Q, λ_Q) (and the fact that W(z*) = H). Claims (ii) and (iv) follow from Claim (i), Lemma 15, continuity of the right-hand sides of (10) and (32) (which are zero at the solution), definition (34) of λ̃^+, and our assumption that λ*_i < λ_max for all i ∈ m. Claim (iii) is true due to strict complementary slackness, the definition of ǫ*, and Claim (ii). ✷

In preparation for Proposition 4, Lemmas 17–20 provide bounds on the four terms in the last line of (65). The ρ* used in these lemmas comes from Lemma 16. The proofs of Lemmas 17, 18, and 20 are omitted, as they are very similar to those of Lemmas A.9 and A.10 in the supplementary materials of [34] (where an MPC algorithm for linear optimization problems is considered) and of Lemma B.19 in [16] (also Lemma 16 in [28]).

Lemma 17
There exists a constant c > 0 such that, for all z ∈ F^o ∩ B(z*, ρ*) and all Q ∈ Q*, ‖Δz^c_Q‖ ≤ c‖Δz^a_Q‖².

Note that an upper bound on the magnitude of the MPC search direction Δz_Q can be obtained by using Lemma 17 and Lemma 16(ii), viz.

‖Δz_Q‖ ≤ ‖Δz^a_Q‖ + ‖Δz^c_Q‖ ≤ ‖Δz^a_Q‖ + c‖Δz^a_Q‖² ≤ (1 + cǫ*/4)‖Δz^a_Q‖.   (66)

This bound is used in the proofs of Lemma 18 and Proposition 4.

Lemma 18
There exists a constant c > 0 such that, for all z ∈ F^o ∩ B(z*, ρ*) and all Q ∈ Q*, |1 − α| ≤ c‖Δz^a_Q‖.

Lemma 19
There exists a constant c > 0 such that, for all z ∈ F^o ∩ B(z*, ρ*) and all Q ∈ Q*, ‖Δz^a_Q − Δz^0_Q‖ ≤ c‖z − z*‖².

Proof We have

Δz^a_Q − Δz^0_Q = −( J^a(W(z), A_Q, s_Q, λ_Q)^{−1} − J^a(H, A_Q, s_Q, λ_Q)^{−1} ) Φ_Q(z),

so that there exist c₁ > 0, c₂ > 0 such that, for all z ∈ F^o ∩ B(z*, ρ*) and all Q ∈ Q*,

‖Δz^a_Q − Δz^0_Q‖ ≤ c₁‖W(z) − H‖ ‖Φ_Q(z)‖ ≤ c₂‖W(z) − H‖ ‖z − z*‖,

where the second inequality follows from Lemma 16(i). Since W(z) − H = ̺(z)R, |̺(z)| ≤ c₃|E(z)|, and |E(z)| ≤ c₄‖z − z*‖, for some c₃ > 0, c₄ > 0, the proof is complete. ✷

Lemma 20
There exists a constant c > 0 such that, for all z ∈ F^o ∩ B(z*, ρ*) and all Q ∈ Q*, ‖Δz^0_Q − (Δ_N z)_Q‖ ≤ c‖z − z*‖ ‖(Δ_N z)_Q‖.

With Lemmas 17–20 in hand, we return to inequality (65).
Proposition 4
There exists a constant c > 0 such that, for all z ∈ F^o ∩ B(z*, ρ*) and all Q ∈ Q*,

‖z̆^+_Q − (z_Q + (Δ_N z)_Q)‖ ≤ c max{ ‖Δ_N z‖², ‖z − z*‖² }.   (67)

Proof
Let z ∈ F^o ∩ B(z*, ρ*) and Q ∈ Q*. It follows from (65), Lemmas 17–20, and (66) that

‖z̆^+_Q − (z_Q + (Δ_N z)_Q)‖
≤ (1 − α)‖Δz_Q‖ + ‖Δz^c_Q‖ + ‖Δz^a_Q − Δz^0_Q‖ + ‖Δz^0_Q − (Δ_N z)_Q‖
≤ c₂‖Δz^a_Q‖ ‖Δz_Q‖ + c₁‖Δz^a_Q‖² + c₃‖z − z*‖² + c₄‖z − z*‖ ‖(Δ_N z)_Q‖
≤ ( c₂(1 + c₁ǫ*/4) + c₁ ) ‖Δz^a_Q‖² + c₃‖z − z*‖² + c₄‖z − z*‖ ‖(Δ_N z)_Q‖,

where c₁, c₂, c₃, c₄ denote the constants of Lemmas 17, 18, 19, and 20, respectively. Also, by Lemmas 19 and 20, we have

‖Δz^a_Q‖ ≤ ‖Δz^a_Q − Δz^0_Q‖ + ‖Δz^0_Q − (Δ_N z)_Q‖ + ‖(Δ_N z)_Q‖
≤ c₃‖z − z*‖² + c₄‖z − z*‖ ‖(Δ_N z)_Q‖ + ‖(Δ_N z)_Q‖.   (68)

The claim follows (in view of boundedness of F^o ∩ B(z*, ρ*)). ✷

With Proposition 4 established, we proceed to the second major step of the proof of Theorem 2: to show that (67) still holds when z^+ is substituted for z̆^+_Q.

Proof of Theorem 2.
Again, let ρ* be as given in Lemma 16. Let z ∈ F^o ∩ B(z*, ρ*) and Q ∈ Q*. Let ρ := ρ*, t := z, and t* := z*. Then the desired q-quadratic convergence is a direct consequence of Lemma 14, provided that condition (64) is satisfied. Hence, we now show that there exists some constant c > 0 such that, for all i ∈ m,

min{ |z^+_i − z*_i|, |z^+_i − (z_i + (Δ_N z)_i)| } ≤ c max{ ‖Δ_N z‖², ‖z − z*‖² }.   (69)

As per Proposition 4, (69) holds for i ∈ Q with z^+_i replaced with z̆^+_i. In particular, (69) holds for the x^+ components of z^+. It remains to show that (69) holds for the λ^+ components of z^+.

Firstly, for all i ∈ A(x*), we show that λ^+_i = λ̆^+_i, so that (69) holds for all λ^+_i such that i ∈ A(x*) by Proposition 4. From the fact that λ > 0 (z ∈ F^o) and Lemma 16(ii), and since ν ≥ 2, it follows that

χ := ‖Δx^a‖^ν + ‖[λ̃^{a,+}_Q]^−‖^ν ≤ ‖Δx^a‖^ν + ‖Δλ^a_Q‖^ν ≤ 2(ǫ*/4)^ν ≤ ǫ*/2,   (70)

so that min{χ, λ} ≤ ǫ*/2. Also, from Lemma 16(iii) and the fact that λ̆^+_Q is a convex combination of λ_Q and λ̃^+_Q, we have, for all i ∈ A(x*),

ǫ*/2 < min{λ_i, λ̃^+_i} ≤ λ̆^+_i.   (71)

Hence, from (70), (71), Lemma 16(iv), and (26), we conclude that λ^+_i = λ̆^+_i for all i ∈ A(x*).

Secondly, we prove that there exists d > 0 such that

‖λ^+_{Q\A(x*)}‖ = ‖λ^+_{Q\A(x*)} − λ*_{Q\A(x*)}‖ ≤ d max{ ‖Δ_N z‖², ‖z − z*‖² },   (72)

thus establishing (69) for λ^+_i with i ∈ Q \ A(x*). For i ∈ Q \ A(x*), we know from (26) that either λ^+_i = min{λ_max, λ̆^+_i}, or λ^+_i = min{λ, ‖Δx^a‖^ν + ‖[λ̃^{a,+}_Q]^−‖^ν}. In the former case, we have

|λ^+_i| ≤ |λ̆^+_i| = |λ̆^+_i − λ*_i| ≤ |λ̆^+_i − (λ_i + (Δ_N λ)_i)| + |(λ_i + (Δ_N λ)_i) − λ*_i|
≤ d₁ max{ ‖Δ_N z‖², ‖z − z*‖² } + d₂‖z − z*‖²,

for some d₁ > 0, d₂ > 0. Here the last inequality follows from Proposition 4 and the quadratic rate of the Newton step given in Lemma 14. In the latter case, since λ > 0, we obtain

|λ^+_i| ≤ ‖Δx^a‖^ν + ‖[λ̃^{a,+}_Q]^−‖^ν ≤ ‖Δx^a‖^ν + ‖Δλ^a_Q‖^ν ≤ 2‖Δz^a_Q‖^ν ≤ d₃ max{ ‖Δ_N z‖², ‖z − z*‖² },   (73)

for some d₃ > 0. Here the third inequality is from the definition of Δz^a_Q, and the last follows from ν ≥ 2, (68), and boundedness of F^o ∩ B(z*, ρ*). Hence, we have established (72).

Thirdly and finally, consider the case that i ∈ Q^c. Since A(x*) ⊆ Q, λ*_{Q^c} = 0, and it follows from (27) that either λ^+_i = min{λ_max, µ^+_{(Q)}/s^+_i}, or λ^+_i = min{λ, ‖Δx^a‖^ν + ‖[λ̃^a_Q]^−‖^ν}. In the latter case, the bound in (73) follows. In the former case, we have |λ^+_i − λ*_i| = |λ^+_i| ≤ µ^+_{(Q)}/s^+_i. By definition, s^+_i := s_i + α_p Δs_i is a convex combination of s_i and s̃^+_i. Thus, Lemma 16(iii) gives that s^+_i ≥ min{s_i, s̃^+_i} > ǫ*/2. Then using the definition of µ^+_{(Q)} (see Step 10 of Algorithm CR-MPC) leads to

|λ^+_i − λ*_i| ≤ (2/(ǫ*|Q|)) ( (s^+_{A(x*)})^T λ^+_{A(x*)} + (s^+_{Q\A(x*)})^T λ^+_{Q\A(x*)} ) if |Q| ≠ 0, and 0 otherwise.

Since z ∈ B(z*, ρ*), λ^+_{A(x*)} and s^+_{Q\A(x*)} are bounded by Lemma 16(ii). Also, by definition, s*_{A(x*)} = 0. Thus there exist d₄ > 0, d₅ > 0 such that

|λ^+_i − λ*_i| ≤ d₄‖s^+_{A(x*)} − s*_{A(x*)}‖ + d₅‖λ^+_{Q\A(x*)}‖.

Having already established that the second term is bounded by the right-hand side of (72), we are left to prove that the first term is as well. By definition,

‖s^+_{A(x*)} − s*_{A(x*)}‖ = ‖A_{A(x*)} x^+ − A_{A(x*)} x*‖ ≤ ‖Ax^+ − Ax*‖ ≤ ‖A‖ ‖z̆^+_Q − z*_Q‖.

Applying Proposition 4 and Lemma 14, we get

‖s^+_{A(x*)} − s*_{A(x*)}‖ ≤ ‖A‖ ‖z̆^+_Q − (z_Q + (Δ_N z)_Q)‖ + ‖A‖ ‖(z_Q + (Δ_N z)_Q) − z*_Q‖
≤ d₆ max{ ‖Δ_N z‖², ‖z − z*‖² } + d₇‖z − z*‖²,

for some d₆ > 0, d₇ > 0. Hence, we have established (69) for all i ∈ m, thus proving the q-quadratic convergence rate. ✷

References
1. Altman, A., Gondzio, J.: Regularized symmetric indefinite systems in interior point methods for linear and quadratic optimization. Optim. Methods Softw. (1-4), 275–302 (1999)
2. Bertsimas, D., Tsitsiklis, J.: Introduction to Linear Optimization. Athena (1997)
3. Cartis, C., Yan, Y.: Active-set prediction for interior point methods using controlled perturbations. Comput. Optim. Appl. (3), 639–684 (2016)
4. Castro, J., Cuesta, J.: Quadratic regularizations in an interior-point method for primal block-angular problems. Math. Prog. (2), 415–445 (2011)
5. Chen, L., Wang, Y., He, G.: A feasible active set QP-free method for nonlinear programming. SIAM J. Optimiz. (2), 401–429 (2006)
6. Dantzig, G.B., Ye, Y.: A build-up interior-point method for linear programming: Affine scaling form. Tech. rep., University of Iowa, Iowa City, IA 52242, USA (July 1991)
7. Drummond, L., Svaiter, B.: On well definedness of the central path. J. Optim. Theory and Appl. (2), 223–237 (1999)
8. Facchinei, F., Fischer, A., Kanzow, C.: On the accurate identification of active constraints. SIAM J. Optimiz. (1), 14–32 (1998)
9. Gill, P.E., Murray, W., Ponceleón, D.B., Saunders, M.A.: Solving reduced KKT systems in barrier methods for linear programming. In: G.A. Watson, D. Griffiths (eds.) Numerical Analysis 1993, pp. 89–104. Pitman Research Notes in Mathematics 303, Longmans Press (1994)
10. Grant, M., Boyd, S.: Graph implementations for nonsmooth convex programs. In: V. Blondel, S. Boyd, H. Kimura (eds.) Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pp. 95–110. Springer-Verlag Limited (2008). http://stanford.edu/~boyd/graph_dcp.html
11. Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming, Version 2.1. http://cvxr.com/cvx (2014)
12. Hager, W.W., Seetharama Gowda, M.: Stability in the presence of degeneracy and error estimation. Math. Prog. (1), 181–192 (1999)
13. He, M.: Infeasible constraint reduction for linear and convex quadratic optimization. Ph.D. thesis, University of Maryland (2011). URL: http://hdl.handle.net/1903/12772
14. He, M.Y., Tits, A.L.: Infeasible constraint-reduced interior-point methods for linear optimization. Optim. Methods Softw. (4-5), 801–825 (2012)
15. Hertog, D., Roos, C., Terlaky, T.: Adding and deleting constraints in the logarithmic barrier method for LP. In: D.Z. Du, J. Sun (eds.) Advances in Optimization and Approximation, pp. 166–185. Kluwer Academic Publishers, Dordrecht, The Netherlands (1994)
16. Jung, J.H.: Adaptive constraint reduction for convex quadratic programming and training support vector machines. Ph.D. thesis, University of Maryland (2008). URL: http://hdl.handle.net/1903/8020
17. Jung, J.H., O'Leary, D.P., Tits, A.L.: Adaptive constraint reduction for training support vector machines. Electron. T. Numer. Ana., 156–177 (2008)
18. Jung, J.H., O'Leary, D.P., Tits, A.L.: Adaptive constraint reduction for convex quadratic programming. Comput. Optim. Appl. (1), 125–157 (2012)
19. Laiu, M.P.: Positive filtered P_N method for linear transport equations and the associated optimization algorithm. Ph.D. thesis, University of Maryland (2016). URL: http://hdl.handle.net/1903/18732
20. Laiu, M.P., Hauck, C.D., McClarren, R.G., O'Leary, D.P., Tits, A.L.: Positive filtered P_N moment closures for linear kinetic equations. SIAM J. Numer. Anal. (6), 3214–3238 (2016)
21. Mehrotra, S.: On the implementation of a primal-dual interior point method. SIAM J. Optim. (4), 575–601 (1992)
22. Nocedal, J., Wright, S.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Springer New York (2006)
23. Park, S.: A constraint-reduced algorithm for semidefinite optimization problems with superlinear convergence. J. Optimiz. Theory App. (2), 512–527 (2016)
24. Park, S., O'Leary, D.P.: A polynomial time constraint-reduced algorithm for semidefinite optimization problems. J. Optimiz. Theory App. (2), 558–571 (2015)
25. Saunders, M.A., Tomlin, J.A.: Solving regularized linear programs using barrier methods and KKT systems. Tech. rep., SOL 96-4. Department of Operations Research, Stanford University (1996)
26. Sturm, J.: Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw., 625–653 (1999). Version 1.05 available from http://fewcal.kub.nl/sturm
27. Tits, A., Wächter, A., Bakhtiari, S., Urban, T., Lawrence, C.: A primal-dual interior-point method for nonlinear programming with strong global and local convergence properties. SIAM J. Optimiz. (1), 173–199 (2003)
28. Tits, A.L., Absil, P.A., Woessner, W.P.: Constraint reduction for linear programs with many inequality constraints. SIAM J. Optimiz. (1), 119–146 (2006)
29. Tits, A.L., Zhou, J.L.: A simple, quadratically convergent algorithm for linear and convex quadratic programming. In: W. Hager, D. Hearn, P. Pardalos (eds.) Large Scale Optimization: State of the Art, pp. 411–427. Kluwer Academic Publishers (1994)
30. Toh, K.C., Todd, M.J., Tütüncü, R.H.: SDPT3 – A Matlab software package for semidefinite programming, Version 1.3. Optim. Methods Softw. (1-4), 545–581 (1999)
31. Tone, K.: An active-set strategy in an interior point method for linear programming. Math. Prog. (1), 345–360 (1993)
32. Tütüncü, R.H., Toh, K.C., Todd, M.J.: Solving semidefinite-quadratic-linear programs using SDPT3. Math. Prog. (2), 189–217 (2003)
33. Winternitz, L.: Primal-dual interior-point algorithms for linear programming problems with many inequality constraints. Ph.D. thesis, University of Maryland (2010). URL: http://hdl.handle.net/1903/10400
34. Winternitz, L.B., Nicholls, S.O., Tits, A.L., O'Leary, D.P.: A constraint-reduced variant of Mehrotra's predictor-corrector algorithm. Comput. Optim. Appl. (1), 1001–1036 (2012)
35. Winternitz, L.B., Tits, A.L., Absil, P.A.: Addressing rank degeneracy in constraint-reduced interior-point methods for linear optimization. J. Optimiz. Theory App. (1), 127–157 (2014)
36. Wright, S.J.: Primal-Dual Interior-Point Methods. SIAM (1997)
37. Wright, S.J.: Modifying SQP for degenerate problems. SIAM J. Optimiz. (2), 470–497 (2002)
38. Ye, Y.: A "build-down" scheme for linear programming. Math. Prog. (1), 61–72 (1990)
39. Zhang, Y., Zhang, D.: On polynomiality of the Mehrotra-type predictor–corrector interior-point algorithms. Math. Prog. 68