Hybrid Methods in Solving Alternating-Current Optimal Power Flows
Alan C. Liddell, Jie Liu, Jakub Mareček, Martin Takáč∗

June 26, 2018
Abstract
Many steady-state problems in power systems, including rectangular power-voltage formulations of optimal power flows in the alternating-current model (ACOPF), can be cast as polynomial optimisation problems (POP). For a POP, one can derive strong convex relaxations, or rather hierarchies of ever stronger, but ever larger relaxations. We study means of switching from solving the convex relaxation to the Newton method working on a non-convex augmented Lagrangian of the POP.
The alternating-current optimal power flow problem (ACOPF) is one of the best known non-convex non-linear optimisation problems, studied extensively since the 1960s [19, 20, 24, 11]. Early work focused on applications of the Newton method to the non-convex problem, which produced exceptionally fast routines, albeit without any guarantees as to their global convergence. More recently, Lavaei and Low [18] have shown that a semidefinite programming (SDP) relaxation produces global optima in some cases. Ghaddar et al. [11] have shown that the SDP relaxation can be strengthened iteratively such that the hierarchy of relaxations converges to the global optimum of the non-convex problem, asymptotically, under mild conditions, albeit at a considerable computational cost. It has not been clear, however, how to combine the two approaches.

∗ A. C. Liddell is with the University of Notre Dame, South Bend, IN, USA. J. Liu and M. Takáč are with Lehigh University, Bethlehem, PA, USA. J. Mareček is with IBM Research, Dublin, Ireland; e-mail: [email protected].

On one hand, the Newton method's local convergence can lead to a variety of poor outcomes. When one starts from an initial point outside of a neighbourhood of a stationary point, the Newton method may diverge and produce no feasible solution. Even within the neighbourhood, where the Newton method converges, the stationary point may turn out to be very far from the global optimum. For an illustration, see Figure 1, which we discuss in more detail in Section 4. This behaviour is inherent in the non-convexity of the problem.

On the other hand, solving the strengthened SDP relaxations [11] is computationally challenging. Leading second-order methods for solving the SDP relaxations, such as SeDuMi [26], often converge within dozens of iterations on SDP relaxations of even the largest available instances, but the absolute run-time and memory requirements of a single iteration may be prohibitively large.
Alternatively, one may employ first-order methods [22, 21], whose memory requirements and per-iteration run-times are trivial, but whose rates of convergence are sub-linear. Either way, as one progresses in the hierarchy, the run-time to reach acceptable accuracy grows fast.

To address the challenge, we introduce novel means of combining solvers working on the convexification and solvers working on the non-convex problem. We employ a first-order method in solving the convexification [22] until we can guarantee local convergence of the Newton method on the non-convex Lagrangian of the problem, possibly considering some regularisation [22]. In particular, the guarantee considers points z and z*, such that when we start the Newton method or a similar algorithm at the point z, it will generate a sequence of points z_i converging to z* with a quadratic rate of convergence, i.e.,

\[ \| z_i - z^* \| \le (1/2)^{2^i - 1} \, \| z - z^* \| . \tag{1} \]

The guarantee requires only the knowledge of the particular Lagrangian and its partial derivatives at z, but does not require the computation of z_i, i > 0.

α-β Theory
Our approach is based on the α-β theory of Smale [25, 4, 6], which is also known as the point estimation theory. We present the basics of the theory and a simple illustration, prior to presenting the main result.

Consider the case of a general real-valued polynomial system f : R^m → R^n, i.e., a system of polynomial equations f := (f_1, …, f_n) in variables x := (x_1, …, x_m) ∈ R^m. Let us define the Newton operator at x ∈ R^m as

\[ N_f(x) := x - [\nabla f(x)]^\dagger f(x), \]

where [∇f(x)]† ∈ R^{m×n} is the Moore–Penrose inverse of the Jacobian matrix of f at x. A sequence with initial point x_0 and iterates of the Newton method subsequently, x_{i+1} := N_f(x_i) for i ≥ 0, is well-defined if [∇f(x_i)]† is well defined at all x_i, i ≥ 0. We say that x ∈ R^m is an approximate zero of f if and only if

1. the sequence {x_i} is well-defined; and
2. there exists x′ ∈ R^m such that f(x′) = 0 and ‖x_i − x′‖ ≤ (1/2)^{2^i − 1} ‖x − x′‖ for all i ≥ 0.

We call x′ ∈ R^m the associated zero of x ∈ R^m and say that x represents x′. The key result of α-β theory is:

Proposition 1 ([25, 6]). Let f : R^m → R^n be a system of polynomial equations and define functions α(f, x), β(f, x), γ(f, x) as:

\[ \alpha(f, x) := \beta(f, x)\, \gamma(f, x), \tag{2a} \]
\[ \beta(f, x) := \left\| [\nabla f(x)]^\dagger f(x) \right\| = \| x - N_f(x) \|, \tag{2b} \]
\[ \gamma(f, x) := \sup_{k > 1} \left\| \frac{[\nabla f(x)]^\dagger \, [\nabla^{(k)} f](x)}{k!} \right\|^{1/(k-1)}, \tag{2c} \]

where [∇f(x)]† ∈ R^{m×n} is the Moore–Penrose inverse of the Jacobian matrix of f at x and [∇^{(k)} f] is the symmetric tensor whose entries are the k-th partial derivatives of f at x. Then there is a universal constant α_0 ∈ R such that if α(f, x) ≤ α_0, then x is an approximate zero of f. Moreover, if x′ denotes its associated zero, then ‖x − x′‖ ≤ 2 β(f, x).

It can be shown that α_0 = (13 − 3√17)/4 ≈ 0.1577 satisfies this property. We refer to [25, 6] for the proof and a variety of extensions.
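To make the definitions concrete, here is a small numerical sketch of the Newton operator and the rate in the definition of an approximate zero. The system f(x, y) = (x² + y² − 1, x − y) is a toy example of our own (not from the ACOPF setting); it has a zero at (1/√2, 1/√2).

```python
import numpy as np

def newton_operator(f, jac, x):
    """One step of N_f(x) = x - [Df(x)]^+ f(x), using the Moore-Penrose pseudoinverse."""
    return x - np.linalg.pinv(jac(x)) @ f(x)

# Toy polynomial system: f1 = x^2 + y^2 - 1, f2 = x - y; zero at (1/sqrt 2, 1/sqrt 2).
f = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[0] - v[1]])
jac = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]], [1.0, -1.0]])

x = np.array([1.0, 0.8])                               # initial point
beta = np.linalg.norm(x - newton_operator(f, jac, x))  # beta(f, x) of (2b)

zero = np.full(2, 1.0 / np.sqrt(2.0))
errors = []
for _ in range(6):
    x = newton_operator(f, jac, x)
    errors.append(np.linalg.norm(x - zero))            # shrinks roughly quadratically
```

The sequence of errors collapses at the quadratic rate of (1), which is what the α-test is designed to certify in advance, without computing the iterates.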
Considering that [25] is somewhat difficult to read and a part of a five-paper series, we refer to the survey of Cucker and Smale [6] or the very recent survey of Beltrán and Pardo [2] for an overview.

Let us illustrate the approach on alternating-current power flows (ACPF), where the instance is defined by:

• the graph G = (V, E), where V, |V| = n, is partitioned into pv, pq, and slack buses {S}, and adjacent buses (i, j) ∈ E are denoted i ∼ j,
• the admittance matrix Y ∈ C^{n×n}, with G := Re(Y), B := Im(Y), and
• active and reactive injections P_i and Q_i at the bus i ∈ pq ∪ {S}.

Following [9], we define the power-flow operator F : R^{2n} → R^{2n} in terms of complex voltages V_i = V^x_i + ı V^y_i, i ∈ V, stacked into V^c as V^c_i = V^x_i, V^c_{n+i} = V^y_i:

\[ [F\{V^c\}]_i := G_{ii}\left\{ (V^x_i)^2 + (V^y_i)^2 \right\} - \sum_{j \sim i} B_{ij}\left\{ V^y_i V^x_j - V^x_i V^y_j \right\} - \sum_{j \sim i} G_{ij}\left\{ V^x_i V^x_j + V^y_i V^y_j \right\} - P_i, \quad i \in V \tag{3a} \]
\[ [F\{V^c\}]_{n+i} := B_{ii}\left\{ (V^x_i)^2 + (V^y_i)^2 \right\} + \sum_{j \sim i} B_{ij}\left\{ V^x_i V^x_j + V^y_i V^y_j \right\} + \sum_{j \sim i} G_{ij}\left\{ V^y_i V^x_j - V^x_i V^y_j \right\} - Q_i, \quad i \in pq \tag{3b} \]
\[ [F\{V^c\}]_{n+i} := (V^x_i)^2 + (V^y_i)^2 - v_i^2, \quad i \in pv \tag{3c} \]

Whether a point is in a domain of monotonicity can be tested by the simple comparison of α and α_0:

Proposition 2.
For every instance of ACPF, there exists a universal constant α_0 ∈ R and a function α of the instance of ACPF and a vector x ∈ R^m, such that if α(F, x) ≤ α_0, then x is an approximate zero of F.

Proof. One can either apply Proposition 1 to a problem in V^c, which stacks the real and imaginary parts of the complex-valued vector to obtain a real-valued problem, or one may apply an extension of the proposition to complex-valued polynomials, such as Theorem 4.3 in [8].

Obviously, one needs to compute β (2b) and γ (2c) to compute α (2a). Because γ(f, x) is difficult to compute in practice, we wish to establish a bound, e.g., when m = n. Let us first define some auxiliary quantities, which will be used in the following proposition. Define a pseudo-norm ‖·‖₁ on R^n by ‖x‖₁ := (1 + Σ_{i=1}^n |x_i|²)^{1/2}, along with the auxiliary diagonal matrix ∆^{(d)} with entries ∆^{(d)}(x)_{i,i} := d_i^{1/2} ‖x‖₁^{d_i − 1}. Let us consider the degree-d polynomial g(x) := Σ_{|ν|_p ≤ d} g_ν x^ν, where the coefficients g_ν ∈ R and x^ν := x_1^{ν_1} ⋯ x_n^{ν_n} with |ν|_p := Σ_{i=1}^n ν_i. We can define the following norm:

\[ \|g\|_p^2 := \sum_{|\nu|_p \le d} |g_\nu|^2 \, \frac{\nu!\,(d - |\nu|_p)!}{d!}, \]

where ν! := ∏_{i=1}^n ν_i!. Next, we define a norm on the polynomial system f by simply writing

\[ \|f\|_p^2 := \sum_{i=1}^n \|f_i\|_p^2 . \]

Finally, define

\[ \mu(f, x) := \max\left\{ 1, \; \|f\|_p \cdot \left\| [\nabla f(x)]^\dagger \, \Delta^{(d)}(x) \right\| \right\} . \]

With these quantities, we arrive at the following proposition bounding γ(f, x):

Proposition 3 ([25, 6]). Let f : R^n → R^n be a polynomial system with d_i := deg(f_i) and D := max_i {d_i}. If x ∈ R^n is such that [∇f(x)] is invertible, then

\[ \gamma(f, x) \le \frac{\mu(f, x)\, D^{3/2}}{2\, \|x\|_1} . \tag{4} \]

Notice that the proposition assumes a polynomial system, rather than a polynomial optimisation problem.
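Proposition 3 turns the α-test into something computable. The sketch below is our own illustration on the toy system f = (x² + y² − 1, x − y): the Bombieri-type norms ‖f_i‖_p are evaluated by hand from the coefficient formula above, and we assume the spectral norm for the matrix norm in μ.

```python
import numpy as np

ALPHA0 = (13.0 - 3.0 * np.sqrt(17.0)) / 4.0   # ~0.1577, the universal threshold

f = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[0] - v[1]])
jac = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]], [1.0, -1.0]])
degrees = np.array([2, 1])
D = degrees.max()

# Norms ||f_i||_p computed by hand from the coefficient formula:
# ||x^2 + y^2 - 1||_p^2 = 3 and ||x - y||_p^2 = 2, so ||f||_p = sqrt(5).
norm_f = np.sqrt(3.0 + 2.0)

def alpha_upper_bound(x):
    """alpha(f,x) <= beta(f,x) * mu(f,x) * D^{3/2} / (2 ||x||_1), by (2a) and (4)."""
    J_inv = np.linalg.inv(jac(x))              # m = n and the Jacobian is invertible here
    beta = np.linalg.norm(J_inv @ f(x))        # beta(f, x) of (2b)
    norm1 = np.sqrt(1.0 + np.dot(x, x))        # the pseudo-norm ||x||_1
    Delta = np.diag(np.sqrt(degrees) * norm1 ** (degrees - 1))
    mu = max(1.0, norm_f * np.linalg.norm(J_inv @ Delta, 2))
    gamma_bar = mu * D ** 1.5 / (2.0 * norm1)  # the bound (4) on gamma(f, x)
    return beta * gamma_bar

x_close = np.array([0.72, 0.70])   # near the zero (1/sqrt 2, 1/sqrt 2)
x_far = np.array([5.0, -3.0])      # far from any zero
```

At `x_close` the bound falls below α₀, certifying an approximate zero; at `x_far` it does not, so no guarantee is issued.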
We extend the approach to polynomial optimisation problems (POP). Considering the developed insights [17] into the availability and strength of certain Lagrangian relaxations of a POP, we derive a test where, knowing only the relaxation and its derivatives at a particular point, we can decide whether one can switch to the Newton method on the polynomial relaxation. Although there are many options for picking the relaxation, we suggest to track the active set and wait until it stabilises. Then, one may consider a polynomial in whose construction the inequalities in the active set are treated as equalities, while the remaining inequalities are disregarded. Notice that unless one runs the Newton method on that very polynomial, one may need to back-track whenever the active set changes while running the Newton method.

The Preliminaries
In order to describe the approaches formally, we introduce some notation. Let us denote the polynomial ring over the reals by R[x] and consider the compact basic semi-algebraic set K defined by:

\[ K := \{ x \in \mathbb{R}^m : g_j(x) \ge 0, \; j = 1, \ldots, p, \; h_k(x) = 0, \; k = 1, \ldots, q \} \tag{5} \]

for some g_j ∈ R[x], j = 1, …, p, in x ∈ R^m, and h_k ∈ R[x], k = 1, …, q. The corresponding polynomial optimization problem (POP) is:

\[ P: \quad f^* := \min_{x \in \mathbb{R}^m} \{ f(x) : x \in K \}, \tag{6} \]

where f ∈ R[x]. We use f* to denote the value of the objective function f at the optimum of the POP (6); notice that there need not be a unique point at which f* is attained. We use P_m to denote the space of all possible descriptions of a POP (6) in dimension m. For additional background material on polynomial optimisation, we refer to [1].

In a departure from the tradition, we use the term Lagrangian loosely, to mean a function L̃ : R^m̃ → R, m̃ > m, associated with a particular instance of a POP (6) in R^m. In the best known example, one has m̃ = m + p + q and x̃ ∈ R^m̃ is the concatenation of the variable x ∈ R^m and the so-called Lagrangian coefficients λ associated with the constraints:

\[ L(x, \lambda) := f(x) + \sum_{j=1}^{p} \lambda_j \, g_j(x) + \sum_{k=1}^{q} \lambda_{p+k} \max\{0, h_k(x)\} \tag{7} \]

The textbook version [3] of a Lagrangian relaxation is:

\[ \rho := \max_{\lambda \in \mathbb{R}^{p+q}} \; \min_{x \in \mathbb{R}^m} L(x, \lambda) \tag{8} \]

and it is known that ρ ≤ f*. One often adds additional regularisation terms [22], which may improve the rate of convergence, but do not remove the fact that one may have ρ ≪ f*. One may replace the max in (7) by constraints on λ_k to be non-negative in (8), but the non-negativity constraints also make it impossible to apply α-β theory directly.

Using this looser definition of the Lagrangian, we define the domain of monotonicity of a POP (6), with respect to a particular Lagrangian:

Definition 1 (Monotonicity domain with respect to L̃).
For any x̃ ∈ R^m̃ and L̃ : R^m̃ → R, consider the sequence x̃_0 := x̃, x̃_{i+1} := N_{L̃}(x̃_i) for i ≥ 0. The point x̃ is within the monotonicity domain with respect to L̃ if this sequence is well defined and there exists a point x̃′ ∈ R^m̃ such that L̃(x̃′) = 0 and

\[ \| \tilde x_i - \tilde x' \| \le (1/2)^{2^i - 1} \, \| \tilde x - \tilde x' \| . \tag{9} \]

Then, we call x̃′ the associated stationary point of x̃ and say that x̃ represents x̃′.

Notice that we use the tilde to stress the variable parts, such as the Lagrangian L̃ and its dimension m̃. Notice also that domains of monotonicity are also known as the region of attraction, the basin of attraction, etc.

Recently, it has been realised that one can approximate the global optimum f* as closely as desired, in case one applies the relaxation to a problem P̃ equivalent to P, which has sufficiently many redundant constraints. To state the result, we need some additional technical assumptions:

Assumption 1. K is compact and 0 ≤ g_j(x) ≤ 1 on x ∈ K for all j = 1, …, p, possibly after re-scaling. Moreover, the family of polynomials {g_j, 1 − g_j} generates the algebra R[x].

Notice that if K is compact, one may always rescale the variables x_i and add redundant constraints 0 ≤ x_i ≤ 1, i = 1, …, m, such that the family {g_j, 1 − g_j} generates the algebra R[x] and Assumption 1 holds. Further, we assume:

Assumption 2.
There exists a unique point x* ∈ K at which f* is attained.

Notice that one can easily construct an example with two generators and a single other bus where this assumption is violated. At the same time, it is easy to see that an arbitrarily small perturbation makes it possible to satisfy the assumption. Alternatively, one could replace Assumption 2 with an assumption on the separation of stationary points, as discussed in [5].
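The construction alluded to above can be made concrete with a schematic dispatch problem of our own (network equations and losses are deliberately ignored, so this illustrates only the non-uniqueness, not a full ACOPF instance):

```latex
% Two identical generators with linear cost c, serving one load bus with demand P_d:
\min_{P_{g1},\,P_{g2}\,\ge\,0}\; c\,P_{g1} + c\,P_{g2}
\qquad \text{subject to} \qquad P_{g1} + P_{g2} = P_d .
% Every feasible split (P_{g1},\, P_d - P_{g1}) attains the same cost c\,P_d,
% so the minimiser is not unique and Assumption 2 fails.
% Perturbing one cost to c + \varepsilon restores a unique optimum,
% namely (P_{g1}, P_{g2}) = (P_d, 0).
```

The ε-perturbation is exactly the kind of arbitrarily small change that restores Assumption 2.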
It is well-known that one can construct:
Lemma 1 (Lasserre Hierarchy). Let Assumption 1 hold for K (5) underlying a POP P with optimum f*. For every ε > 0, there exists d_ε ∈ N such that for every d ≥ d_ε, there exists a convex Lagrangian relaxation of P, which yields a lower bound f* − ε ≤ ρ_d ≤ f*.

Proof. The proof follows from Theorem 3.6 of Lasserre [16], when one considers a Lagrangian relaxation of the semidefinite programs. There, the strong duality can be assured by a reformulation of the POP, cf. [14].

Notice, however, that these Lagrangians, albeit convex, are not polynomial, due to the presence of the semidefinite constraint. Moreover, for d ≥ 2, a single iteration on the convex Lagrangian, even using a first-order method, can be computationally much more demanding than a single iteration of second-order methods for the basic Lagrangian ρ. We would hence like to study the domains of monotonicity with respect to the various other Lagrangians.

Specifically, notice that one can obtain Lagrangians by distinguishing which inequalities are satisfied with equality at a particular point. At any given point x, we can evaluate which inequalities of the POP (6) are active, i.e., satisfied with equality. Let us denote the index set of active inequalities A(·) ⊆ I,

\[ A(x, \epsilon) := \{ j \mid j = 1, \ldots, p, \; g_j(x) \le \epsilon \} . \tag{10} \]

At an arbitrary point x ∈ R^m, we can evaluate A(·) and construct a locally valid, but polynomial, Lagrangian:

\[ L'(x, \lambda) := f(x) + \sum_{k=1}^{q} \lambda_k \, h_k(x) + \sum_{j \in A} \lambda_{q+j} \, g_j(x), \tag{11} \]

and it is clear that:

Lemma 2.
Let Assumptions 1 and 2 hold for K (5). For every ε > 0, there exists d_ε ∈ N such that for every d ≥ d_ε, the Lagrangian relaxation of P̃_d yields a lower bound f* − ε ≤ ρ_d ≤ f*, achieved at x*_d, and the active set A(x*_d, ε) induces L′(x, λ) with optimum ρ_d.

Proof. The proof follows from the reasoning of Propositions 7 and 8 of [5], as explained by Henrion and Lasserre [13]: under Assumptions 1 and 2, the moment matrix for d makes it possible to extract the solution by Cholesky decomposition, which in turn allows one to estimate the active set.

This allows for the direct application of α-β theory:

Theorem 1.
There exists a universal constant α_0 ∈ R, such that for all m ∈ N and P ∈ P_m for which Assumptions 1 and 2 hold, and for every ε > 0, there exists d_ε ∈ N such that for every d ≥ d_ε, there is a Lagrangian relaxation L̃_d in dimension m̃ and a function α : P_m̃ × R^m̃ → R such that if α(∇L̃_d, x̃) ≤ α_0, then x̃ is in the domain of monotonicity of a solution with objective function value ρ_d such that f* − ε ≤ ρ_d ≤ f*.

Proof. The proof follows from the observation that each convex Lagrangian of Lemma 1 is associated with a non-convex, but polynomial, Lagrangian of Lemma 2, and that both Lagrangians will have a function value at their optima bounded from below by f* − ε. Formally, for all m ∈ N, P ∈ P_m, and x ∈ R^m, there exists P̃_d, d ∈ N, such that for every ε > 0, there exists d_ε ∈ N such that for every d ≥ d_ε, both the Lagrangian relaxation L̃_d of P̃_d and the new Lagrangian relaxation L′_d of the same problem yield a lower bound f* − ε ≤ ρ_d ≤ f*. While minimising the convex Lagrangian of the polynomial optimisation problem (6), we can apply Proposition 1 to the first-order conditions of the corresponding Lagrangian L′_d of Lemma 2.

In terms of the alternating-current optimal power flow problem (ACOPF), the theory can be summarised thus:

Corollary 1.
There exists a universal constant α_0 ∈ R, such that for every instance of ACOPF, there exist δ ∈ R, δ ≥ 0, and a function α : R^m → R specific to the instance of ACOPF, such that for any ε > δ and vector x ∈ R^m, if α(x) ≤ α_0, then x is in the domain of monotonicity of an optimum of the instance of ACOPF, which is no more than ε away from the value of the global optimum with respect to its objective function.

Proof. By Theorem 1. The δ accounts for the perturbation.

In the hybridisation we propose, one starts by solving a convexification and constructing the active set in the outer loop. Then, one may test the stability of the active set. Whenever the active set seems stable and the test of Proposition 1 applied to L′ allows, we switch to the Newton method on the non-convex Lagrangian L′. Some back-tracking line search may be employed within the Newton method, until a sufficient decrease in L is observed. Although this algorithm may seem somewhat crude, it seems to perform well.

Alternatively, one may employ a variant, whose schematic overview is in Algorithm 1. There, we consider first-order optimality conditions of L′ in the test on Line 6, but switch to the Newton method on the first-order optimality conditions of (7), while memorising the current value. While minimising (7), we check the active set; when it does change, we revert to solving the convexification with the memorised value. Although this algorithm may seem even cruder than the above, it performs better still, in practice.

Figure 1: The motivation: the evolution of infeasibility (top row) and objective function (bottom row) when one switches from solving the convexification to the Newton method after a given number of steps on the IEEE 30-bus test system (left), the 118-bus test system (middle), and a snapshot of the Polish system (case2383wp; right).
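The outer loop of the simpler hybrid described above can be sketched as follows. This is a deliberately minimal caricature of our own making: plain gradient descent stands in for the coordinate descent on the convexification, the constant `gamma_bar` stands in for the Proposition 3 bound on γ, and a small unconstrained non-convex objective plays the role of the Lagrangian; none of the names correspond to the actual implementation.

```python
import numpy as np

def hybrid_minimise(f_grad, f_hess, x, alpha0=0.1577, gamma_bar=1.0,
                    lr=0.05, max_first_order=500, newton_iters=10):
    """First-order phase until the alpha-test of Proposition 1 passes,
    then a Newton phase with a quadratic rate of convergence.

    alpha is estimated as beta * gamma_bar, with beta the Newton-step
    norm (2b) and gamma_bar a stand-in for the Proposition 3 bound."""
    switched_at = None
    for k in range(max_first_order):
        g = f_grad(x)
        beta = np.linalg.norm(np.linalg.solve(f_hess(x), g))
        if beta * gamma_bar <= alpha0:     # safe to switch to the Newton method
            switched_at = k
            break
        x = x - lr * g                     # one first-order epoch
    for _ in range(newton_iters):          # Newton phase
        x = x - np.linalg.solve(f_hess(x), f_grad(x))
    return x, switched_at

# Non-convex toy objective (x^2 - 1)^2 + (y - 1/2)^2, with minima at (+-1, 1/2).
grad = lambda v: np.array([4.0 * v[0] * (v[0]**2 - 1.0), 2.0 * (v[1] - 0.5)])
hess = lambda v: np.array([[12.0 * v[0]**2 - 4.0, 0.0], [0.0, 2.0]])

x_opt, switched_at = hybrid_minimise(grad, hess, np.array([1.5, 2.0]))
```

Started well away from the minimiser, the first-order phase runs for a few dozen epochs before the α-test permits the switch; the Newton phase then finishes the job in a handful of steps.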
In implementing a hybrid method for ACOPF, such as Algorithm 1, one encounters a number of challenges. One requires a solver for the convexification of ACOPF, a well-performing implementation of the Newton method for the non-convex Lagrangian L′, and an implementation of Proposition 3. We comment upon these in turn.

The convexification we use is based on the Lagrangian of the relaxation of Lavaei and Low [18]. (As we have shown in [11], the relaxation of Lavaei and Low is the first level of the hierarchy of Lasserre [16], considered in Lemma 1.) In particular, we have used a variant introduced in [22]. To solve it, we have used a problem-specific first-order method [22], which is based on coordinate descent with a closed-form step.

The variant of the Newton method we use has been implemented specifically for this paper. Conceptually, it follows the outline of the first-order method [22], but uses the symbolic Hessian in the computation of the step. Note that on the non-convex Lagrangian, the Newton direction may turn out not to be a direction of descent. In dealing with the negative curvature, we multiply the direction by −1, although we could also use a damped variant of the Newton method. Multiple Newton steps, each satisfying sufficient decrease, are performed in each iteration of the loop, before a sufficient decrease in the convex Lagrangian is tested.

A key contribution of ours is an implementation of Proposition 3 specific to ACOPF. There, one should observe that β is easy to obtain as β(L, x) := ‖d‖, where d is the Newton direction and f = ∂L/∂x is the system to which the theory is applied. By observing ∂L/∂x, we can use d_i = 3 for all i, thus D = 3 and ∆^{(d)}(x) = 3^{1/2} ‖x‖₁² I_{2n×2n}, where I_{2n×2n} is a 2n × 2n identity matrix, so

\[ \mu(L, x) = \max\left\{ 1, \; \sqrt{3}\, \|x\|_1^2 \cdot \|\nabla L\|_p \cdot \left\| [\nabla^2 L(x)]^{-1} \right\| \right\}, \]

where the spectral norm ‖[∇²L(x)]^{−1}‖ can be computed as the inverse of the smallest eigenvalue of ∇²L(x). A trivial implementation may run for days even on modest instances. In our implementation, we used about 2000 lines of algebraic manipulations in Python to generate considerable amounts of instance-specific, optimised C code employing Intel MKL libraries. For example, the test for case2383wp involves about 30 MB of C code. This makes it possible to run the test within seconds even on case2383wp.

Figure 2: The performance of the hybrid method on the IEEE 300-bus test system (case300).
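The negative-curvature handling and the sufficient-decrease safeguard described above can be sketched as follows. This is our own minimal illustration on a one-dimensional double well, not the instance-specific MKL-backed implementation; all names are ours.

```python
import numpy as np

def safeguarded_newton_step(f, grad, hess, x, c1=1e-4, shrink=0.5, max_backtracks=30):
    """One Newton step on a possibly non-convex function: flip the direction
    if it is not a descent direction (negative curvature), then backtrack
    until the Armijo sufficient-decrease condition holds."""
    g = grad(x)
    d = -np.linalg.solve(hess(x), g)   # Newton direction
    if g @ d > 0:                      # not a descent direction:
        d = -d                         # multiply the direction by -1, as in the text
    fx, slope, t = f(x), g @ d, 1.0
    for _ in range(max_backtracks):
        if f(x + t * d) <= fx + c1 * t * slope:
            return x + t * d           # sufficient decrease achieved
        t *= shrink                    # backtracking line search
    return x                           # no acceptable step; leave x unchanged

# Double-well f(x) = x^4/4 - x^2/2, with negative curvature for |x| < 1/sqrt(3).
f = lambda v: np.sum(v**4) / 4.0 - np.sum(v**2) / 2.0
grad = lambda v: v**3 - v
hess = lambda v: np.diag(3.0 * v**2 - 1.0)

x = np.array([0.2])                    # starts inside the negative-curvature region
for _ in range(25):
    x = safeguarded_newton_step(f, grad, hess, x)
```

Starting at 0.2, the raw Newton direction points towards the local maximum at 0; the flip and the line search steer the iterates into the convex basin around the minimiser at 1, where the usual quadratic rate takes over.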
Figure 3: The performance of the hybrid method on three IEEE test systems: 30-bus, 57-bus, and 118-bus.
Figure 4: The performance of the hybrid method on a snapshot of the Polish transmission system (case2383wp).

Some evidence of the need for such a test is provided in Figure 1. There, we plot the evolution of infeasibility (top row) and objective function (bottom row) over the number of epochs, while varying the number of epochs after which we switch from solving the convexification to the Newton method. Even after a number of iterations, each of which decreases the value of the Lagrangian, the Newton method can diverge. See, for example, the series denoted CD 2 in the middle plot in the top row, where one switches over after 2 epochs of the coordinate descent on the convexification for the IEEE 118-bus test system, but where the infeasibility does not seem to fall below 1, ever. In the middle plot in the bottom row, one may also observe the variety of local optima reached by switching after 32 epochs (CD 32), 128 epochs (CD 128), and 500 epochs (CD 500) on the same instance.

To validate the impact of our approach, we performed numerical experiments on a collection of well-known instances [30]. The experiments have been performed on a computer with an Intel Xeon CPU E5-2620 clocked at 2.40 GHz and 128 GB of RAM. We have used randomisation in generating the initial point, as well as in the sampling of coordinates, but we have used a fixed random seed for all runs of all methods. Throughout, we compare the performance of the coordinate descent of [22] on the Lavaei–Low SDP relaxation [18] (plotted in blue), against the Newton method on the non-convex Lagrangian (plotted in yellow), and against the performance of a variant of the hybrid method (plotted in red), which switches from the coordinate descent on the convexification to the Newton method on the non-convex Lagrangian when the α-β test is satisfied.
In Figures 2–4, we present a sample of the results.

For the first illustration, we chose the IEEE 300-bus test system, and plotted both a measure of infeasibility (left; see [22] for a definition) and the objective function value (middle) against both wall-clock time (top row) and epochs (bottom row) in Figure 2. We use the term epoch to mean m iterations of coordinate descent, or m coordinate-wise Newton steps, for an instance in dimension m. As can be seen by comparing the top and bottom rows, the wall-clock time corresponding to one epoch is similar across the three methods. On the other hand, the convergence rates are visibly different, with the infeasibility decreasing at a quadratic rate for the Newton and hybrid methods.

Further, we present the results on three more IEEE test systems in Figure 3, in a more concise form, with only the evolution of infeasibility (top row) and objective function (bottom row) over the number of epochs. The 30-bus (on the left) and 118-bus (on the right) test systems illustrate the typical performance: the evolution of infeasibility of the hybrid method overlaps with that of the first-order method until the switch-over. Henceforth, the quadratic rate of convergence resembles that of the Newton method, except with a better starting point. The 57-bus test system (in the middle) demonstrates the importance of the starting point: our implementation of the Newton method does not converge from the Matpower starting point.

Next, to illustrate the scalability of the approach, we present the results on a snapshot of the Polish system in Figure 4. There are 2383 buses in the snapshot, and more importantly, tap-changing and phase-shifting transformers, double-circuit transmission lines, and multiple generators at each bus, which complicate the formulation of the thermal limits, as explained in Section 5.2 of [22].
Despite the preliminary nature of our implementation, compared to the established codes developed over a decade or more [30], the convergence seems very robust.

Finally, in the right-most plots of Figures 2 and 4, we plot the ratio of the cardinality of the active set to the number of inequalities over the epochs or time. This provides an empirical justification for the choice of Algorithm 1: the active set clearly stabilises much earlier than the objective function value, and is only a small fraction of the count of the inequalities, which allows for the short run-time of the test implementing Proposition 3.

Let us present a brief overview of the rich history of the study of the convergence of the Newton method. The best known result in the field is the theorem of Kantorovich [15], which formalises the assumptions under which, whenever for a closed ball of radius t* centred at x_0 it holds that ∇F(x) + ∇F(x)^T ≻ 0 in the ball, the ball is a domain of monotonicity for the function F. (See also Appendix A.) Traditionally, it has been assumed that testing the property across the closed ball is difficult.

Recently, Henrion and Korda [12] have shown that the domain of monotonicity of a polynomial system can be computed by solving an infinite-dimensional linear program over the space of measures, whose value can be approximated by a certain hierarchy of convex semidefinite optimisation problems. See also the work of Valmórbida et al. [29, 28, 27] in the context of partial differential equations, and elsewhere [7]. Dvijotham et al. [9, 10] showed that it can also be cast as a certain non-convex semidefinite optimisation problem.
Notice, however, that this line of work [9, 10, 12] doesnot consider inequalities and may be rather computationally challenging.Similarly, the α - β theory [25, 4, 6], does not consider inequalities.To summarise: traditionally, the convergence of Newton method couldbe guaranteed only by the non-constructive arguments of the theorem ofKantorovich. Alternatively, one could the recently developed approaches[12, 9, 10], albeit at a computational cost possibly higher than that of solvingACOPF. Our approach seems to improve upon these considerably. Without the use of (hierarchies of) convex relaxations, Newton-type meth-ods can converge to particularly bad local optima of non-convex problems.Even the fastest first-order methods for computing strong convex relaxations14re, however, rather slow on their own. Hybrid methods combining first-order methods for the strong convex relaxations and Newton-type methodsfor the non-convex problems combine the guarantees of convergence asso-ciated with (hierarchies of) convex relaxations and the quadratic rates ofconvergence of Newton method. Crucially, such hybrid methods can be im-plemented in a distributed fashion, as discussed in [22]. This improves upon[9, 10] and opens up many novel directions for future research.15 eferences [1] M. F. Anjos and J.-B. Lasserre, editors.
Handbook on Semidefinite, Conic and Polynomial Optimization, volume 166 of International Series in Operations Research & Management Science. Springer, New York, 2012.

[2] C. Beltrán and L. M. Pardo. Efficient polynomial system-solving by numerical methods. Journal of Fixed Point Theory and Applications, 6(1):63–85, 2009.

[3] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 2nd edition, 1999.

[4] P. Chen. Approximate zeros of quadratically convergent algorithms. Mathematics of Computation, 63(207):247–270, 1994.

[5] R. M. Corless, P. M. Gianni, and B. M. Trager. A reordered Schur factorization method for zero-dimensional polynomial systems with multiple roots. In Proceedings of the 1997 International Symposium on Symbolic and Algebraic Computation, ISSAC '97, pages 133–140, New York, NY, USA, 1997. ACM.

[6] F. Cucker and S. Smale. Complexity estimates depending on condition and round-off error. Journal of the ACM, 46(1):113–184, Jan. 1999.

[7] M. Demenkov. A MATLAB tool for regions of attraction estimation via numerical algebraic geometry. In Mechanics – Seventh Polyakhov's Reading, 2015 International Conference on, pages 1–5, Feb. 2015.

[8] W. Deren and Z. Fengguang. The theory of Smale's point estimation and its applications. Journal of Computational and Applied Mathematics, 60(1):253–269, 1995.

[9] K. Dvijotham, S. Low, and M. Chertkov. Solving the power flow equations: A monotone operator approach. In Decision and Control (CDC), 2015 IEEE 54th Annual Conference on, 2015.

[10] K. Dvijotham, S. Low, and M. Chertkov. Solving the power flow equations: A monotone operator approach. arXiv:1506.08472, 2015.

[11] B. Ghaddar, J. Mareček, and M. Mevissen. Optimal power flow as a polynomial optimization problem. IEEE Transactions on Power Systems, 31(1):539–546, 2016.

[12] D. Henrion and M. Korda. Convex computation of the region of attraction of polynomial control systems. IEEE Transactions on Automatic Control, 59(2):297–312, Feb. 2014.

[13] D. Henrion and J.-B. Lasserre. Detecting Global Optimality and Extracting Solutions in GloptiPoly, pages 293–310. Springer, Berlin, Heidelberg, 2005.

[14] C. Josz and D. Henrion. Strong duality in Lasserre's hierarchy for polynomial optimization. Optimization Letters, 10(1):3–10, 2016.

[15] L. Kantorovich. On Newton's method for functional equations. Dokl. Akad. Nauk SSSR, 59(7):1237–1240, 1948.

[16] J. Lasserre. Convergent SDP-relaxations in polynomial optimization with sparsity. SIAM Journal on Optimization, 17(3):822–843, 2006.

[17] J. B. Lasserre. A Lagrangian relaxation view of linear and semidefinite hierarchies. SIAM Journal on Optimization, 23(3):1742–1756, 2013.

[18] J. Lavaei and S. Low. Zero duality gap in optimal power flow problem. IEEE Transactions on Power Systems, 27(1):92–107, 2012.

[19] S. Low. Convex relaxation of optimal power flow – Part I: Formulations and equivalence. IEEE Transactions on Control of Network Systems, 1(1):15–27, Mar. 2014.

[20] S. Low. Convex relaxation of optimal power flow – Part II: Exactness. IEEE Transactions on Control of Network Systems, 1(2):177–189, June 2014.

[21] R. Madani, A. Kalbat, and J. Lavaei. ADMM for sparse semidefinite programming with applications to optimal power flow problem. In Decision and Control (CDC), 2015 IEEE 54th Annual Conference on, pages 5932–5939, Dec. 2015.

[22] J. Mareček and M. Takáč. A low-rank coordinate-descent algorithm for semidefinite programming relaxations of optimal power flow. arXiv:1506.08568, June 2015.

[23] D. K. Molzahn, B. C. Lesieutre, and C. L. DeMarco. A sufficient condition for global optimality of solutions to the optimal power flow problem. IEEE Transactions on Power Systems, 29(2):978–979, Mar. 2014.

[24] P. Panciatici, M. Campi, S. Garatti, S. Low, D. Molzahn, A. Sun, and L. Wehenkel. Advanced optimization methods for power systems. In Power Systems Computation Conference (PSCC), 2014, pages 1–18, Aug. 2014.

[25] M. Shub and S. Smale. On the complexity of Bézout's theorem I – Geometric aspects. Journal of the AMS, 6(2), 1993.

[26] J. F. Sturm. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software, 11(1–4):625–653, 1999.

[27] G. Valmórbida, M. Ahmadi, and A. Papachristodoulou. Stability analysis for a class of partial differential equations via semidefinite programming. IEEE Transactions on Automatic Control, 61(6):1649–1654, June 2016.

[28] G. Valmórbida and J. Anderson. Region of attraction analysis via invariant sets. In American Control Conference (ACC), 2014, pages 3591–3596, June 2014.

[29] G. Valmórbida, S. Tarbouriech, and G. Garcia. Region of attraction estimates for polynomial systems. In Decision and Control, 2009 held jointly with the 2009 28th Chinese Control Conference (CDC/CCC 2009), Proceedings of the 48th IEEE Conference on, pages 5947–5952, Dec. 2009.

[30] R. Zimmerman, C. Murillo-Sánchez, and R. Thomas. MATPOWER: Steady-state operations, planning, and analysis tools for power systems research and education. IEEE Transactions on Power Systems, 26(1):12–19, Feb. 2011.
A Kantorovich’s theorem
Let $X$ be a Banach space, $X^*$ its dual, $F$ a functional on $X$, and $\nabla F$ the Fréchet derivative of $F$. Further, define the open and closed balls at $x \in X$ as
\[
B(x, r) := \{\, y \in X : \|x - y\| < r \,\} \quad \text{and} \tag{12a}
\]
\[
B[x, r] := \{\, y \in X : \|x - y\| \le r \,\}, \tag{12b}
\]
respectively. Kantorovich's theorem can be stated as follows:

Theorem 2 ([15]). Let $X$, $Y$ be Banach spaces, $C \subseteq X$, and $F : C \to Y$ a continuous function, continuously differentiable on $\mathrm{int}(C)$. Take $x_0 \in \mathrm{int}(C)$ and $L, b > 0$, and suppose that $\nabla F(x_0)$ is non-singular,
\[
\|\nabla F(x_0)^{-1} [\nabla F(y) - \nabla F(x)]\| \le L \|x - y\| \quad \text{for any } x, y \in C,
\]
\[
\|\nabla F(x_0)^{-1} F(x_0)\| \le b, \qquad bL \le \tfrac{1}{2}.
\]
If $B[x_0, t_*] \subset C$, then the sequence $\{x_k\}$ generated by Newton method for solving $F(x) = 0$ with starting point $x_0$,
\[
x_{k+1} := x_k - [\nabla F(x_k)]^{-1} F(x_k), \qquad k = 0, 1, \dots, \tag{13}
\]
is well defined, is contained in $B(x_0, t_*)$, and converges to a point $x_* \in B[x_0, t_*]$, which is the unique zero of $F$ in $B[x_0, t_*]$, and
\[
\|x_* - x_{k+1}\| \le \tfrac{1}{2} \|x_* - x_k\|, \qquad k = 0, 1, \dots. \tag{14}
\]
Moreover, if $bL < \tfrac{1}{2}$, then
\[
\|x_* - x_{k+1}\| \le \frac{1 - \theta^{2^k}}{1 + \theta^{2^k}} \, \frac{L}{2\sqrt{1 - 2bL}} \, \|x_* - x_k\|^2 \le \frac{L}{2\sqrt{1 - 2bL}} \, \|x_* - x_k\|^2, \qquad k = 0, 1, \dots, \tag{15}
\]
where $\theta := t_*/t_{**} < 1$, and $x_*$ is the unique zero of $F$ in $B[x_0, \rho]$ for any $\rho$ such that
\[
t_* \le \rho < t_{**}, \qquad B[x_0, \rho] \subset C, \tag{16}
\]
where
\[
t_* := \frac{1 - \sqrt{1 - 2bL}}{L}, \qquad t_{**} := \frac{1 + \sqrt{1 - 2bL}}{L}. \tag{17}
\]

Algorithm 1 A schema of the hybrid method

    Initialise $x \in \mathbb{R}^m$, $\lambda \in \mathbb{R}^m$, e.g., randomly
    for $k \leftarrow 0, 1, 2, \dots$ do
        Update $(x, \lambda)$, e.g., using [22]
        Compute the set $A_k$ of inequalities satisfied up to $\epsilon$-accuracy
        Construct the polynomial Lagrangian function $L'$ corresponding to $A_k$
        if $k > K$ and $A_k = A_{k-1} = \dots = A_{k-K}$ and $\alpha(\nabla L', x) \le \alpha_0$ then
            $S \leftarrow (x, \lambda)$
            for $l \leftarrow 0, 1, 2, \dots$ do
                Update $(x, \lambda)$ using a Newton step on $\nabla L' = 0$, i.e., minimising (7)
                Compute the set $A'_l$ of inequalities satisfied up to $\epsilon$-accuracy
                if $A'_l \neq A_k$ then
                    $(x, \lambda) \leftarrow S$
                    break
                end if
                if the infeasibility of $x$ is below $\epsilon$ then
                    Optionally, test sufficient conditions for global optimality, e.g., [23]
                    break
                end if
            end for
            if the infeasibility of $x$ is below $\epsilon$ then
                break
            end if
        end if
    end for
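As a sanity check on the quantities appearing in Theorem 2, the sketch below verifies the Kantorovich condition $bL \le 1/2$ for the one-dimensional map $F(x) = x^2 - 2$ started at $x_0 = 1.5$, and confirms that Newton method finds the unique zero $\sqrt{2}$ inside the ball of radius $t_*$. The constants $b$ and $L$ are computed in the affine-invariant form used in the theorem. This is purely an illustrative example, not part of the ACOPF pipeline.

```python
import math

def newton(f, df, x0, iters=8):
    """Plain Newton iteration x_{k+1} = x_k - f(x_k)/df(x_k), as in (13)."""
    x = x0
    for _ in range(iters):
        x = x - f(x) / df(x)
    return x

# Example: F(x) = x^2 - 2, with zero sqrt(2); start at x0 = 1.5.
f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x

x0 = 1.5
# Affine-invariant constants of Theorem 2:
#   b bounds |dF(x0)^{-1} F(x0)|, and L is a Lipschitz constant of
#   dF(x0)^{-1} dF(.): here dF(x0)^{-1}(dF(y) - dF(x)) = 2(y - x)/3.
b = abs(f(x0) / df(x0))   # = 1/12
L = 2.0 / abs(df(x0))     # = 2/3
assert b * L <= 0.5       # Kantorovich condition bL <= 1/2 holds

# Radius t* = (1 - sqrt(1 - 2bL)) / L from (17) encloses the unique zero.
t_star = (1.0 - math.sqrt(1.0 - 2.0 * b * L)) / L
root = newton(f, df, x0)
assert abs(root - x0) <= t_star + 1e-12   # the zero lies in B[x0, t*]
assert abs(root - math.sqrt(2.0)) < 1e-12
```

For this particular quadratic, the bound is tight: $|x_* - x_0| = 1.5 - \sqrt{2}$ equals $t_*$ up to floating-point rounding.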
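The two-phase structure of Algorithm 1 — a cheap first-order phase on $(x, \lambda)$ followed by Newton refinement of $\nabla L' = 0$ — can be sketched on a toy problem. The example below minimises $x_1^2 + x_2^2$ subject to $x_1 + x_2 = 1$, whose KKT point is $x = (0.5, 0.5)$, $\lambda = -1$. The gradient descent-ascent update is only a stand-in for the coordinate-descent SDP solver of [22], and the active-set bookkeeping of Algorithm 1 is omitted; all names are illustrative.

```python
import numpy as np

def grad_L(z):
    """Gradient of the Lagrangian L(x, lam) = x1^2 + x2^2 + lam*(x1 + x2 - 1)
    with respect to (x1, x2, lam); a zero of this map is a KKT point."""
    x1, x2, lam = z
    return np.array([2.0 * x1 + lam, 2.0 * x2 + lam, x1 + x2 - 1.0])

def jac_grad_L(z):
    """Jacobian of grad_L (the KKT matrix); constant for this quadratic toy."""
    return np.array([[2.0, 0.0, 1.0],
                     [0.0, 2.0, 1.0],
                     [1.0, 1.0, 0.0]])

def hybrid(z0, switch_tol=1e-1, step=0.1, max_outer=500, max_newton=20):
    """First-order phase until the KKT residual is small, then Newton."""
    z = np.array(z0, dtype=float)
    # Phase 1: cheap first-order updates (descend in x, ascend in lam),
    # standing in for the distributed first-order method of [22].
    for _ in range(max_outer):
        g = grad_L(z)
        if np.linalg.norm(g) <= switch_tol:
            break
        z[:2] -= step * g[:2]
        z[2] += step * g[2]
    # Phase 2: Newton on grad_L(z) = 0, with local quadratic convergence.
    for _ in range(max_newton):
        g = grad_L(z)
        if np.linalg.norm(g) <= 1e-12:
            break
        z -= np.linalg.solve(jac_grad_L(z), g)
    return z

z = hybrid([0.0, 0.0, 0.0])   # ≈ [0.5, 0.5, -1.0]
```

Because the toy Lagrangian is quadratic, a single Newton step solves the KKT system exactly; on a genuine POP, the switch test (the stable active set and the $\alpha$-test of Algorithm 1) is what makes handing over to Newton safe.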