Conditions for Exact Convex Relaxation and No Spurious Local Optima
Fengyu Zhou, Student Member, IEEE, and Steven H. Low, Fellow, IEEE
Abstract—Non-convex optimization problems can be approximately solved via relaxation or local algorithms. For many practical problems such as optimal power flow (OPF) problems, both approaches tend to succeed in the sense that relaxation is usually exact and local algorithms usually converge to a global optimum. In this paper, we study conditions which are sufficient or necessary for such non-convex problems to simultaneously have exact relaxation and no spurious local optima. Those conditions help us explain the widespread empirical experience that local algorithms for OPF problems often work extremely well.
Index Terms—Convex relaxation, local optimum, optimal power flow, semidefinite program.
I. INTRODUCTION

NON-CONVEX optimization problems in general are computationally challenging. However, many heuristics tend to work well for real-world problems. Those approaches include convex relaxations and local algorithms. It is usually hoped that relaxations yield exact solutions and that local optima are also globally optimal. In this paper, we derive conditions, sufficient or necessary, for these two properties to hold simultaneously. Our focus is specifically on optimization formulations with a convex cost and non-convex constraints.
A. Related Work
(Partial and preliminary results have appeared in [1]. Fengyu Zhou and Steven H. Low are with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125 USA; e-mail: {f.zhou, slow}@caltech.edu.)

Many problems have been proved to have exact relaxation and no spurious local optima (such as matrix completion [2], [3], [4] and low rank semidefinite programs [5], [6], [7]); the proofs of these two properties are usually based on different types of certificates. In this subsection, we review some widely used certificates for each property.

One type of certificate exhibits relaxation exactness by showing that any relaxed (and infeasible) point maps to a feasible solution with lower cost. This asserts that relaxed points cannot be optimal. For instance, [8], [9] prove that optimal power flow problems can be solved via second-order cone relaxation under certain conditions, using the argument that any solution in the interior of the second-order cone can always be moved towards the boundary to further reduce the cost. In [5], [6], it is proved that if a semidefinite program has a solution with sufficiently large rank, then one can always reduce the rank without increasing the cost or violating the constraints. Another type of certificate involves studying the dual variables and KKT conditions. The underlying idea is that a pair of primal and dual solutions satisfying the KKT conditions certifies optimality for both the primal and dual problems. Thus constructing dual variables with certain structures can also certify the optimality of primal solutions. In [2] for instance, the dual variable is related to the subgradient of the cost function at a desired matrix and therefore helps certify the optimality of that desired matrix. Another example is [10], which bounds the rank of the primal matrix through the argument that the null space of its dual matrix has bounded dimension. Similar techniques are also used in [11], [12], [13].

There is also considerable literature establishing the global optimality of local optima. We refer to [14], [15] and references therein. In [14], the authors focus on a class of problems with a twice continuously differentiable cost function and a Riemannian manifold as the feasible set. The values of the Riemannian gradient and Hessian at a given point then help certify properties such as strong gradient, negative curvature or local convexity in its neighborhood. This eliminates spurious local optima and saddle points, where local algorithms can be trapped. This technique was also used in [16] for the dictionary recovery problem and in [17] for phase synchronization. In both problems, the Riemannian manifold is some n-sphere or a Cartesian product of n-spheres. The framework summarized in [15] also leverages the landscape of the cost function, and the problem is usually reformulated into an unconstrained form. Instead of explicitly computing the gradient and Hessian matrix, the paper shows it suffices to find a single direction of improvement.
For certain symmetric positive definite problems, the paper shows that the decision variable always gets closer to the global optimizer as the cost is reduced. A similar idea was also applied in [18], where the main result is built upon a correlation condition stating that the gradient (or any update rule) is correlated with the direction from the current location towards the global optimizer. Therefore the underlying algorithm, such as gradient descent, always produces solutions closer to the global optimizer as it progresses.

B. Contribution
The brief review above shows most works study exact relaxation and local optimality separately. It is unclear what might be the common feature of non-convex problems that possess both properties. Many real-world non-convex problems, however, seem to possess both properties, either provably or empirically, and it is hard to explain why these nice properties, though seemingly different, often occur simultaneously. Besides, most literature on local optimality focuses on problems without constraints or with tractable constraints. This is usually the case for problems in the learning area. However, for problems arising in cyber-physical systems, the constraints could include non-convex functions enforced by physical laws, as we will see in power systems. In these cases, either the feasible set is not a Riemannian manifold, or the Riemannian gradient and Hessian are very hard to derive. These questions motivate us to study conditions, sufficient or necessary, for problems to simultaneously have exact relaxation and no spurious local optima. These conditions also help us study local optimality using properties of a problem's relaxation, instead of its landscape.

Our conditions have two parts. The first part, which also appeared in [1], is the sufficient condition. Roughly, if for any relaxed point there exists a path connecting it to the non-convex feasible set such that
• along the path the cost is non-increasing, and
• along the path the 'distance' to the non-convex feasible set is non-increasing,
then the problem must have exact relaxation and no spurious local optima simultaneously. Here the 'distance' can be any properly constructed function, as we will define later as a Lyapunov-like function (Definition 10).
The second part is the necessary condition, which says that if a problem does have exact relaxation and no spurious local optima simultaneously, then there must exist such a Lyapunov-like function and paths satisfying the requirements above.

Though Lyapunov-like functions and paths are guaranteed to exist, for specific problems it could still be difficult to construct them. We then derive certain rules to construct a Lyapunov-like function and paths of a new problem from primitive problems with known Lyapunov-like functions and paths. This process allows us to reuse and extend known results as the problem changes and grows. Finally, we apply the proposed approach to two specific problems, optimal power flow (OPF) and low rank SDP. Our work proves the first known condition (that can be checked a priori) for OPF to have no spurious local optima, and it helps explain the widespread empirical experience that local algorithms for OPF problems often work extremely well.
C. Background for Power Systems
As one of the applications and the main motivation of this work, OPF is a core problem in power systems. First proposed in [19], OPF is a class of optimization problems that minimize a certain cost subject to nonlinear physical laws and operational constraints. It is known to be non-convex and NP-hard in its AC formulation [20], [10], [21]. Therefore, there is no known efficient algorithm that can solve all problem instances in polynomial time. Traditional approaches to solving OPF are usually based on local algorithms such as Newton-Raphson; see [22], [23], [24] for examples. Over the past decade, techniques based on convex relaxation have also been introduced to solve OPF [25], [26]. A surprising empirical finding in the literature shows that despite the non-convexity, both local algorithms and convex relaxations very often yield a global optimum of the original non-convex problem [25], [26], [10], [27]. In recent years, there have been considerable analytical works on provable conditions for relaxation exactness, which are summarized in the reviews [28], [29] and references therein. However, few analytical results are known on the performance guarantee of local algorithms. In this paper, we show that a known sufficient condition for exact relaxation is also sufficient for local optima to be globally optimal. (The necessary condition is based upon stronger assumptions, so the second part is not the exact converse of the first part.) To the best of our knowledge, this is the first analytical result of its kind, and we hope that the approaches developed in this paper can help derive more sufficient conditions along this direction.

II. PRELIMINARIES
In this paper, we will use K to denote the set ℝ of real numbers or the set ℂ of complex numbers. For any finite positive integer n, Kⁿ is a Banach space.

Consider a (potentially non-convex) optimization problem

  minimize_x  f(x)    (1a)
  subject to  x ∈ X    (1b)

and its convex relaxation

  minimize_x  f(x)    (2a)
  subject to  x ∈ X̂.    (2b)

Here X is a nonempty compact subset of Kⁿ, not necessarily convex, while X̂ ⊆ Kⁿ is an arbitrary compact and convex superset of X. The cost function f : X̂ → ℝ is convex and continuous over X̂. We do not require the relaxation X̂ to be efficiently represented.

Definition 1. A point x_lo ∈ X is called a local optimum of (1) if there exists a δ > 0 such that f(x_lo) ≤ f(x) for all x ∈ X with ‖x − x_lo‖ < δ.

Definition 2 (Strong Exactness). We say the relaxation (2) is exact with respect to (1) if every optimal point of (2) is feasible, and hence globally optimal, for (1).

Unless otherwise specified, we will always use the term exact to refer to such strong exactness. Definition 2 implies in particular that, if (2) is exact, then ∀ x̂ ∈ X̂ \ X, f(x̂) > min_{x ∈ X̂} f(x).

Definition 3. A path in S ⊆ Kⁿ connecting point a to point b is a continuous function h : [0, 1] → S such that h(0) = a and h(1) = b.

We may refer to a path by the corresponding function h in the remainder of the paper.

Lemma 1.
The following are equivalent:
(A) Problem (2) is exact with respect to (1).
(B) For any x ∈ X̂ \ X, there is a path h in X̂ such that h(0) = x, h(1) ∈ X, f(h(t)) is non-increasing for t ∈ [0, 1], and f(h(0)) > f(h(1)).

Proof. (A) ⟹ (B): Let x* be any optimal point of (2). By (A), x* ∈ X; thus for x ∈ X̂ \ X, we can choose the path as the line segment from x to x*, since X̂ is convex. (B) ⟹ (A): Condition (B) implies that no point x ∈ X̂ \ X can be optimal for (2).

Lemma 1 is not surprising, and in fact many works in the literature proving exact relaxations of optimal power flow problems can be interpreted as using (B) to prove (A) by implicitly finding such a path h for each x ∈ X̂ \ X [28]. Condition (B) does not say anything about the local optima in X for (1). In the next section we will strengthen (B) by equipping the path with a Lyapunov-like function and show that the stronger condition implies that all local optima of (1) are globally optimal. We start by classifying local optima.
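As a concrete numerical sketch of condition (B), consider a toy instance of our own (it is illustrative only and not an example from the paper): X is the unit circle in ℝ² (non-convex), X̂ is the closed unit disk, and f is a linear cost. The global minimizer over the disk lies on the circle, so the relaxation is exact, and the line segment used in the proof of (A) ⟹ (B) is easily checked to have non-increasing cost.

```python
import numpy as np

# Toy instance (our own illustration, not from the paper):
# X = unit circle (non-convex), X_hat = closed unit disk (convex superset),
# f(x) = c^T x with c = (1, 0).  The minimizer of f over the disk is
# x* = (-1, 0), which already lies on the circle, so relaxation (2) is
# exact, and the segment from a relaxed point to x* is a path as in (B).
c = np.array([1.0, 0.0])
x_star = np.array([-1.0, 0.0])

def f(x):
    return float(c @ x)

rng = np.random.default_rng(0)
for _ in range(100):
    # sample a point strictly inside the disk, i.e. in X_hat \ X
    x = rng.uniform(-1.0, 1.0, size=2)
    x *= 0.9 / max(np.linalg.norm(x), 1.0)
    # h(t) = (1 - t) x + t x*: stays in the convex disk and ends in X;
    # f(h(t)) = (1 - t) f(x) + t f(x*) is affine and decreasing in t
    ts = np.linspace(0.0, 1.0, 201)
    costs = [f((1 - t) * x + t * x_star) for t in ts]
    assert all(a >= b for a, b in zip(costs, costs[1:]))  # non-increasing
    assert costs[0] > costs[-1]                           # f(h(0)) > f(h(1))
print("condition (B) holds along the sampled segments")
```

For a linear cost the monotonicity along the segment is immediate; the same check applies verbatim to any convex f, since f(h(t)) ≤ (1 − t) f(x) + t f(x*).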
Definition 4. We classify each local optimum x_lo of (1) into three disjoint classes: x_lo is a
• Global optimum (g.o.) if f(x_lo) ≤ f(x) for all feasible x ∈ X.
• Pseudo local optimum (p.l.o.) if there is a path h : [0, 1] → X such that h(0) = x_lo, f(h(t)) ≡ f(x_lo) for all t ∈ [0, 1], and h(1) is not a local optimum.
• Genuine local optimum (g.l.o.) if it is neither a global optimum nor a pseudo local optimum.
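The plateau mechanism behind a pseudo local optimum can be made concrete with a small one-dimensional sketch of our own (the feasible set and cost below are illustrative assumptions, not an example from the paper): the cost is flat on part of the feasible set and strictly decreasing beyond it, so every point on the plateau is a local optimum connected by a constant-cost path to a point that is not.

```python
import numpy as np

# Our own 1-D illustration of Definition 4 (not from the paper).
# Feasible set X = [0, 2]; the cost is flat on [0, 1] and strictly
# decreasing on (1, 2], with the global optimum at x = 2.
def f(x):
    return 0.0 if x <= 1.0 else -(x - 1.0) ** 2

def is_local_opt(x, delta=1e-3, grid=50):
    # crude numerical check of Definition 1 on a delta-neighborhood
    nbhd = np.linspace(max(0.0, x - delta), min(2.0, x + delta), grid)
    return all(f(x) <= f(y) + 1e-15 for y in nbhd)

x0 = 0.5
assert is_local_opt(x0)                 # x0 is a local optimum...
assert f(x0) > f(2.0)                   # ...but not a global one
# the constant-cost path h(t) = (1 - t) x0 + t reaches x = 1,
ts = np.linspace(0, 1, 101)
assert all(abs(f((1 - t) * x0 + t * 1.0) - f(x0)) < 1e-15 for t in ts)
assert not is_local_opt(1.0)            # ...which is not a local optimum,
print("x0 = 0.5 is a pseudo local optimum")  # so x0 is pseudo (Definition 4)
```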
Examples of all three classes are shown in Fig. 1.

Fig. 1. Examples for the three classes of local optima. The arrow indicates the direction along which the cost function decreases. Point b is a global optimum, point c is a pseudo local optimum, while points a and d are genuine local optima.

Definition 5.
A point x is improvable in X if there is a path h : [0, 1] → X such that
• h(0) = x;
• f(h(t)) is non-increasing for t ∈ [0, 1];
• h(1) is not a local optimum or f(h(1)) < f(x).

Remark 1.
A local optimum is a pseudo local optimum if and only if it is improvable in X.

Definition 6.
A set {h_i : i ∈ I} of paths indexed by i is said to be uniformly bounded if there is a finite number M such that ‖h_i(t)‖_∞ ≤ M for every i ∈ I and t ∈ [0, 1].

Definition 7.
A set {h_i : i ∈ I} of paths indexed by i is said to be uniformly equicontinuous if for any ε > 0, there exists a δ > 0 such that ‖h_i(t_1) − h_i(t_2)‖_∞ < ε for every i ∈ I whenever |t_1 − t_2| < δ.

Remark 2.
The index set I could be empty or uncountably infinite. An empty path set (i.e., when I = ∅) is considered to be both uniformly bounded and uniformly equicontinuous.

Let Π|_a^b be the family of all finite ordered subsets of [a, b]. We use Π as a shorthand for Π|_0^1. For π = (t_0, t_1, …, t_N) ∈ Π and a path h, define

  L_π(h) := Σ_{i=1}^{N} ‖h(t_{i−1}) − h(t_i)‖_{ℓ2}.

Clearly, L_π(h) is always finite for given π and h.

Definition 8 ([30]). For a path h, define the function L(h) := sup_{π ∈ Π} L_π(h). We say h is rectifiable iff L(h) is finite. When h is rectifiable, L(h) is also referred to as its length.

Definition 9 ([30]). For a rectifiable path h : [0, 1] → Kⁿ, let its arc-length reparameterization be h̄ : [0, 1] → Kⁿ with

  h̄( sup_{π ∈ Π|_0^t} L_π(h) / L(h) ) := h(t),  if L(h) > 0;
  h̄ := h,  if L(h) = 0.

One can see that L(h̄) = L(h) < ∞ and that the two paths have the same image, i.e., {h̄(t) | t ∈ [0, 1]} = {h(t) | t ∈ [0, 1]}. For 0 ≤ t_1 ≤ t_2 ≤ 1, h̄ has the property that sup_{π ∈ Π|_{t_1}^{t_2}} L_π(h̄) = (t_2 − t_1) L(h̄).
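Definition 8 can be sketched numerically: for a smooth path, the polygonal sums L_π(h) increase under refinement of the partition π and converge to the length L(h). The quarter-circle path below is our own illustration (not from the paper); its length is π/2.

```python
import numpy as np

# Numerical sketch of Definition 8 (our own illustration): approximate
# L(h) = sup_pi L_pi(h) for the quarter-circle path
# h(t) = (cos(pi/2 * t), sin(pi/2 * t)), whose true length is pi/2.
def h(t):
    return np.array([np.cos(np.pi / 2 * t), np.sin(np.pi / 2 * t)])

def L_pi(h, partition):
    # inscribed polygonal length for one finite partition of [0, 1]
    pts = [h(t) for t in partition]
    return sum(np.linalg.norm(a - b) for a, b in zip(pts, pts[1:]))

lengths = [L_pi(h, np.linspace(0, 1, N + 1)) for N in (1, 2, 8, 64, 1024)]
# each uniform partition refines the previous one, so L_pi is monotone,
# and the supremum is approached as the mesh shrinks
assert all(l1 <= l2 + 1e-12 for l1, l2 in zip(lengths, lengths[1:]))
assert abs(lengths[-1] - np.pi / 2) < 1e-5
print("polygonal lengths:", [round(l, 6) for l in lengths])
```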
Lemma 2. For a set of rectifiable paths h_i, i ∈ I, if the values of L(h_i) are uniformly bounded, then the set of h̄_i, i ∈ I, is uniformly equicontinuous.

Proof. Assume L(h_i) ≤ M for all i ∈ I. Then for any 0 ≤ t_1 ≤ t_2 ≤ 1 we have, for any i,

  ‖h̄_i(t_1) − h̄_i(t_2)‖_∞ ≤ ‖h̄_i(t_1) − h̄_i(t_2)‖_{ℓ2} ≤ sup_{π ∈ Π|_{t_1}^{t_2}} L_π(h̄_i) = (t_2 − t_1) L(h_i) ≤ M |t_1 − t_2|.

Setting δ = ε/M, the equicontinuity is proved.

Corollary 1. If S is compact in Kⁿ and all paths in a set H = {h_i : i ∈ I} map [0, 1] → S and consist of at most N linear segments, then {h_i : i ∈ I} must be both uniformly bounded and uniformly equicontinuous. Here N is a finite constant for all paths in H.

III. SUFFICIENT CONDITIONS
In this section, we first study sufficient conditions under which (2) is exact w.r.t. (1) and all local optima of (1) are also globally optimal. These sufficient conditions will be proposed by strengthening Condition (B). Note that (B) already implies that (2) is exact w.r.t. (1), so our strategy is to strengthen (B) in order to rule out the possibility of genuine local optima and pseudo local optima.
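To see why the strengthening is needed, here is a minimal one-dimensional instance of our own (an illustrative assumption, not an example from the paper) in which the relaxation is exact, i.e., (B) holds, and yet a genuine local optimum survives.

```python
import numpy as np

# Our own 1-D illustration (not from the paper): exactness alone does not
# rule out spurious local optima.  X = {-1, 1} (non-convex),
# X_hat = [-1, 1], f(x) = (x + 1)^2 (convex).
f = lambda x: (x + 1.0) ** 2

# The relaxation is exact: the minimizer of f over [-1, 1] is x = -1,
# which belongs to X, so every optimal point of (2) is feasible for (1).
xs = np.linspace(-1, 1, 100001)
assert abs(xs[np.argmin(f(xs))] - (-1.0)) < 1e-4

# Yet x = 1 is a genuine local optimum of (1): its only feasible
# neighbor is itself, while f(1) = 4 > f(-1) = 0.
assert f(1.0) > f(-1.0)
print("exact relaxation, but x = 1 is a genuine local optimum")
```

Condition (C) below excludes this instance: any Lyapunov-like V would have to be non-increasing along a path in X̂ from points near x = 1 down to X, which fails because such paths must first move away from the isolated feasible point.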
A. Ruling Out Genuine Local Optima
Definition 10. A Lyapunov-like function associated with (1) and (2) is a continuous function V : X̂ → ℝ₊ such that V(x) = 0 for x ∈ X and V(x) > 0 for x ∈ X̂ \ X. (In contrast to a standard Lyapunov function, we do not require V to be differentiable here.)

A strengthened version of (B) is as follows.

Fig. 2. Sketch of the notations for the proof of Theorem 1. Point x and ℓ(t†) will later be proved to be identical.

(C) There exists a Lyapunov-like function V associated with (1) and (2) such that:
(C1) For any x ∈ X̂ \ X, there is a path h_x in X̂ such that h_x(0) = x, h_x(1) ∈ X, both f(h_x(t)) and V(h_x(t)) are non-increasing for t ∈ [0, 1], and f(h_x(0)) > f(h_x(1)).
(C2) The set {h_x}_{x ∈ X̂\X} is uniformly bounded and uniformly equicontinuous.
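Condition (C) can be sketched on the circle/disk toy instance (our own illustrative assumptions; the instance, the function V and the two-phase paths below are not from the paper). Take V(x) = 1 − ‖x‖, which is continuous, zero exactly on the circle, and positive inside the disk; a path that first rotates at constant radius toward the optimizer's angle and then moves radially out to the circle keeps both f and V non-increasing.

```python
import numpy as np

# Our own illustration of Definition 10 and (C1) on the toy instance
# X = unit circle, X_hat = unit disk, f(x) = x[0].
# Candidate Lyapunov-like function: V(x) = 1 - ||x|| (zero exactly on X).
f = lambda x: x[0]
V = lambda x: 1.0 - np.linalg.norm(x)

def path(x, t):
    # two-phase path toward the global optimizer x* = (-1, 0):
    r, theta = np.linalg.norm(x), np.arctan2(x[1], x[0])
    tgt = np.pi if theta >= 0 else -np.pi   # angle of x*, reached monotonically
    if t <= 0.5:
        # phase 1, rotate: V constant, f = r*cos(angle) non-increasing
        ang = theta + 2 * t * (tgt - theta)
        return r * np.array([np.cos(ang), np.sin(ang)])
    # phase 2, radial: V decreases to 0, f = -radius decreases
    rad = r + (2 * t - 1) * (1.0 - r)
    return np.array([-rad, 0.0])

rng = np.random.default_rng(1)
for _ in range(50):
    x = rng.uniform(-1, 1, 2)
    x *= 0.9 / max(np.linalg.norm(x), 1.0)   # a point in X_hat \ X
    ts = np.linspace(0, 1, 401)
    fs = [f(path(x, t)) for t in ts]
    Vs = [V(path(x, t)) for t in ts]
    assert all(a >= b - 1e-9 for a, b in zip(fs, fs[1:]))   # f non-increasing
    assert all(a >= b - 1e-9 for a, b in zip(Vs, Vs[1:]))   # V non-increasing
    assert fs[0] > fs[-1] and abs(Vs[-1]) < 1e-12           # ends on X, lower cost
print("condition (C1) verified along the constructed paths")
```

Note that the straight segment from x to x* would not do here: its norm, hence V, can first increase; the rotate-then-radial construction is what makes V monotone.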
Theorem 1.
If (C) holds, then (A) also holds and any local optimum in X for (1) is either a global optimum or a pseudo local optimum.

Proof. (C) ⟹ (A) because (C) is stronger than (B). As for the second part of the argument, we include an illustrative sketch of the notations in Fig. 2. Suppose x ∈ X is a local but not global optimum for (1). We will prove that x must be improvable in X (and thus a pseudo local optimum).

Let x* ≠ x be a global optimum of (1), so f(x*) < f(x). Let ℓ : [0, 1] → X̂ be the linear function characterizing the line segment from x to x*, i.e., ℓ(t) = (1 − t)x + t x*, with f(ℓ(1)) = f(x*) < f(x). Note that f(ℓ(t)) is non-increasing in t. To see this, consider any t ≥ 0 and ε > 0 with t + ε ≤ 1, and let x_1 = ℓ(t), x_2 = ℓ(t + ε). Setting s := ε/(1 − t), we have x_2 = (1 − s) x_1 + s x*. Since f is convex and x* is also a global optimum of (2) over X̂, we have

  f(x_2) ≤ (1 − s) f(x_1) + s f(x*) ≤ f(x_1).

Define

  t† := sup { t ∈ [0, 1] s.t. ℓ(τ) ∈ X ∀ τ ≤ t }.

As X is closed, ℓ(t†) is also in X. We first prove that ℓ(t†) must be x (i.e., t† = 0). Otherwise, as x is a local optimum, we could find δ ∈ (0, t†) such that f(ℓ(t)) ≥ f(ℓ(0)) = f(x) for all t ∈ [0, δ). Since f(ℓ(t)) is non-increasing in t, we must have f(ℓ(t)) ≡ f(ℓ(0)) = f(x) for all t ∈ [0, δ). This contradicts the fact that f(ℓ(t)) is convex and f(ℓ(1)) = f(x*) < f(x) = f(ℓ(0)), for the same reason that f(ℓ(t)) is non-increasing in t.

Therefore ℓ(t†) = x and f(ℓ(t†)) = f(x). It is sufficient to show ℓ(t†) is improvable in X.
That is to say, it is sufficient to find some function h : [0, 1] → X such that h(0) = ℓ(t†), f(h(t)) is non-increasing in t ∈ [0, 1], and either f(h(1)) < f(h(0)) or h(1) is not a local optimum. (The strict inequality is due to the convexity of f(ℓ(t)) and the fact that f(ℓ(1)) < f(ℓ(t†)).)

B. Ruling Out Pseudo Local Optima

So far, Condition (C) has eliminated the possibility of having genuine local optima, and in this subsection we further strengthen the condition to also rule out pseudo local optima. Consider the following lemma and its corollaries.

Lemma 3. If (2) is exact with respect to (1) and (1) has no genuine local optima, then the feasible set of (1) is connected.

Proof. If X is not connected, then by definition X can be partitioned into two disjoint non-empty closed sets X_1 and X_2 with X = X_1 ∪ X_2, which are hence both compact. Further, we let x_i be any global optimum of min_{x ∈ X_i} f(x) for i = 1, 2. Clearly x_1 ≠ x_2 and they are both local optima of (1).

If f(x_1) = f(x_2), then any convex combination of x_1 and x_2 must be a global optimum of (2). Since there is no path in X that connects x_1 and x_2, there must be some convex combination that is outside X. This contradicts the exactness of the relaxation.

If f(x_1) ≠ f(x_2), without loss of generality we assume f(x_1) < f(x_2), i.e., x_2 is not a global optimum of (1). But x_2 is not a pseudo local optimum of (1) either, so it is a genuine local optimum, contradicting the assumption. To see this, note that any point x′ ∈ X which is connected to x_2 via a path in X must also be a point in X_2, and if f(x′) = f(x_2) then x′ must be a local optimum of (1) as well.

Corollary 2. Condition (C) implies that the feasible set of (1) is connected.

Now we are in a good position to discuss some conditions that rule out pseudo local optima and therefore guarantee that any local optimum must be a global optimum.

Corollary 3. If all local optima of (1) are isolated, then Condition (C) implies that any local optimum of (1) is a global optimum.
Here, local optima being isolated means any local optimum of (1) has an open neighborhood which contains no other local optimum. The proof is straightforward, as by definition an isolated local optimum cannot be a pseudo local optimum. In fact, in this case the optimum can be proved to be unique as well.

Another way to eliminate pseudo local optima is by strengthening the monotonicity of f(h_x(t)) in Condition (C). Consider the following condition, which is slightly stronger than (C).

(C′) Condition (C) holds, and there exists k > 0 such that ∀ x ∈ X̂ \ X and ∀ 0 ≤ t < s ≤ 1 we have

  f(h_x(t)) − f(h_x(s)) ≥ k ‖h_x(t) − h_x(s)‖.    (3)

In Condition (C′), ‖·‖ could be any norm on Kⁿ. As a caveat, the ℓ₀-"norm" is not allowed here, as it is not a norm: it does not satisfy ‖αx‖ = |α| ‖x‖. Note that Condition (C) already implies f(h_x(t)) − f(h_x(s)) ≥ 0, while (C′) strengthens this by enforcing a positive lower bound depending on h_x.

Theorem 2. If (C′) holds, then any local optimum of (1) must be a global optimum.

Proof. Following the proof of Theorem 1, suppose x ∈ X is a local but not global optimum for (1). Then we have x = ℓ(t†) and can obtain a limit point of the sequence h_m, denoted as h. Since both sides of (3) are continuous in h_m(t) and h_m(s), and the limits of h_m(t) and h_m(s) are h(t) and h(s), we must have, whenever h(t) ≠ h(s),

  f(h(t)) − f(h(s)) ≥ k ‖h(t) − h(s)‖ > 0.

Taking t = 0, we can conclude that h(0) (which is the same point as x) is not a local optimum of (1).

IV. NECESSARY CONDITIONS

In this section we will study necessary conditions for a non-convex problem to have exact relaxation and no spurious local optima simultaneously. It turns out the results are not exactly the converses of Theorem 1 or Theorem 2, but hold in a slightly weaker sense.
Specifically, we show that if a non-convex problem is known to have exact relaxation and no spurious local optima simultaneously, then a Lyapunov-like function and paths satisfying Condition (C) are guaranteed to exist. However, it still may or may not be easy to find those functions or paths in practice for a specific problem.

A. Results

Assumption 1. The feasible set X is semianalytic and the cost function f is analytic.

We refer to [31] for more detailed definitions and properties of semianalytic sets. This assumption is not restrictive for most engineering problems. If K is chosen as ℂ, then we suggest viewing all the complex functions as functions of real variables by separating the real and imaginary parts, and the space ℂⁿ can be viewed as a shorthand for ℝ²ⁿ in this section.

Theorem 3 (necessary condition). If (2) is exact with respect to (1) and any local optimum of (1) is globally optimal, then there exists a Lyapunov-like function V and a corresponding family of paths {h_x}_{x ∈ X̂\X} satisfying (C1) and (C2).

Remark 3. Note that Theorem 3 is NOT the converse of Theorem 1 in a strict sense. There are a few differences in their settings.
• Theorem 1 allows pseudo local optima in its conclusion, while Theorem 3 disallows them in its premise.
• Theorem 3 relies on Assumption 1 while Theorem 1 does not.

B. Proof Setup

In the rest of the section, we will prove Theorem 3. From now on, we assume (2) is exact with respect to (1) and any local optimum of (1) is also globally optimal. We first have the following definition and lemmas, which are the main reasons we introduced Assumption 1.

Definition 11 (Whitney regularity [31], [32], [33]). For a compact set U ⊂ Kⁿ and a positive integer p, we say U is p-regular if there exists C > 0 such that ∀ x, y ∈ U, x and y can be joined by a rectifiable curve h in U satisfying L(h) ≤ C ‖x − y‖^{1/p}.

Lemma 4 (Theorem 6.10 in [31]).
If U is a compact connected subanalytic subset of Kⁿ, then there is a positive integer p such that U is p-regular, and the curves can always be chosen semianalytic.

The proof of Lemma 4 can be found in [31]. Note that any semianalytic set is also subanalytic.

Lemma 5. For any x_0 ∈ X that is not a local optimum of (1) and for any ε > 0, there exists a path h in X such that h(0) = x_0, f(h(t)) is non-increasing in t, f(h(1)) < f(h(0)) and L(h) < ε.

Proof. Consider the set U := {x ∈ X : f(x) ≤ f(x_0)}, which by definition is also semianalytic. Since x_0 ∈ X is not an optimum of (1), the problem min_{x ∈ U} f(x) must also have an exact relaxation, and it does not introduce new local optima compared to (1). By Lemma 3, U must be connected. According to Lemma 4, there is a rectifiable and semianalytic curve h in U such that h(0) = x_0, L(h) < ε, f(h(1)) < f(x_0) and f(h(t)) ≤ f(x_0) for all t ∈ [0, 1]. Here h(1) can be chosen as any point in U which has a strictly smaller cost value than x_0 and is sufficiently close to x_0 in Euclidean distance. It is known that a semianalytic curve is analytic except at a finite number of points [34]. Assume h(t) is not analytic at a_1 < a_2 < · · · < a_k = 1 where k ≥ 1. By the theorem on the parametrization of a semianalytic arc in [35] and the assumption that f is analytic, the value of f(h(t)) within any interval [a_{ℓ−1}, a_ℓ] should be equal to some analytic function defined over an open superset of [a_{ℓ−1}, a_ℓ]. Since f(h(1)) < f(h(0)), the function f(h(t)) cannot be a constant function over [0, 1]. Let [a_{ℓ−1}, a_ℓ] be the first interval within which f(h(t)) is not constant; then f(h(a_{ℓ−1})) = f(h(0)).
As f(h(t)) within [a_{ℓ−1}, a_ℓ] equals an analytic function defined over an open superset of [a_{ℓ−1}, a_ℓ], there must be a small subinterval [a_{ℓ−1}, a_{ℓ−1} + δ) for some δ > 0 within which we always have

  f(h(t)) = f(h(a_{ℓ−1})) + Σ_{i=0}^{∞} c_i (t − a_{ℓ−1})^i,

where the right hand side is the Taylor expansion of f(h(t)) at a_{ℓ−1}. Since f(h(t)) is not constant over [a_{ℓ−1}, a_ℓ], the coefficients c_i cannot all be zero, by the identity theorem. Suppose c_i is the first nonzero coefficient in the sequence {c_i}; then we have two cases. If c_i > 0, then f(h(t)) is strictly increasing within [a_{ℓ−1}, a_{ℓ−1} + δ′) for some small positive δ′ < δ. This contradicts the facts that f(h(a_{ℓ−1})) = f(x_0) and f(h(t)) ≤ f(x_0) for all t ∈ [0, 1]. If c_i < 0, then f(h(t)) is strictly decreasing within [a_{ℓ−1}, a_{ℓ−1} + δ′) for some small positive δ′ < δ. Then we can construct a new path h̃ such that h̃(t) = h(t (a_{ℓ−1} + δ′)) for all t ∈ [0, 1]. It is easy to check that such h̃ satisfies all the requirements in Lemma 5. (We can choose h(1) as described because x_0 is not a local optimum of (1); the inequality L(h) < ε is satisfied because of the p-regularity of U.)

Now we consider weaker versions of (C1) and (C2).

(C3) For any x ∈ X̂ \ X, there is a path h_x in X̂ such that h_x(0) = x, h_x(1) ∈ X, and both f(h_x(t)) and V(h_x(t)) are non-increasing for t ∈ [0, 1].
(C4) All the {L(h_x)}_{x ∈ X̂\X} are finite and uniformly bounded.

Compared to (C1), (C3) does not require f(h_x(0)) > f(h_x(1)) to hold strictly. Then we have a weaker version of Theorem 3 as follows.

Lemma 6 (weaker necessary condition). If (2) is exact with respect to (1) and any local optimum of (1) is also globally optimal, then there always exists a Lyapunov-like function V and a
corresponding family of paths {h_x}_{x ∈ X̂\X} satisfying (C3) and (C4).

We now show that Lemma 6, though weaker in its statement, actually implies Theorem 3, so later on we will only focus on the proof of Lemma 6. To see this, suppose V† and {h†_x}_{x ∈ X̂\X} are the Lyapunov-like function and paths guaranteed by Lemma 6. For each x ∈ X̂ \ X, if h†_x(1) is a local optimum (so it is also a global optimum) of (1), then we must have f(h†_x(1))

We construct V as

  V(x) = inf { L(h) : h ∈ Ĥ, h(0) = x, h(1) ∈ X }.    (4)

Lemma 7. For a sequence (h_i)_{i=1}^∞ where h_i ∈ Ĥ, if both (h_i)_{i=1}^∞ and (L(h_i))_{i=1}^∞ are uniformly bounded, then there must be a subsequence which uniformly converges to some h* such that its arc-length reparameterization, denoted h̄*, is in Ĥ. Furthermore, L(h̄*) = L(h*) ≤ lim sup_i L(h_i).

Proof. By Lemma 2, (h_i)_{i=1}^∞ is both uniformly bounded and uniformly equicontinuous. By the Arzelà-Ascoli theorem, a subsequence of (h_i)_{i=1}^∞ uniformly converges to a limit h*. Without loss of generality, we denote this subsequence as (h_i)_{i=1}^∞ as well. By the uniform limit theorem and the compactness of X̂, h* is a continuous function mapping [0, 1] to X̂. To show h̄* ∈ Ĥ, it is sufficient to show f(h*(t)) ≥ f(h*(1)) for all t ∈ [0, 1] and L(h*) < ∞. If f(h*(t_0)) = f(h*(1)) − ε for some t_0 ∈ [0, 1] and ε > 0, then for sufficiently large i we would have |f(h_i(t_0)) − f(h*(t_0))| < ε/4 and |f(h_i(1)) − f(h*(1))| < ε/4. Thus f(h_i(t_0)) ≤ f(h_i(1)) − ε/2, which contradicts h_i ∈ Ĥ.

Instead of showing L(h*) < ∞, we directly prove L(h*) ≤ lim sup_i L(h_i). Otherwise, there exists π = (t_0, …, t_N) ∈ Π such that L_π(h*) = lim sup_i L(h_i) + ε for some ε > 0.
For sufficiently large i, we have

  |L_π(h_i) − L_π(h*)|
    = | Σ_{j=1}^{N} ‖h_i(t_{j−1}) − h_i(t_j)‖_{ℓ2} − Σ_{j=1}^{N} ‖h*(t_{j−1}) − h*(t_j)‖_{ℓ2} |
    ≤ Σ_{j=1}^{N} ( ‖h_i(t_{j−1}) − h*(t_{j−1})‖_{ℓ2} + ‖h_i(t_j) − h*(t_j)‖_{ℓ2} )
    ≤ ε/2.

Thus, L(h_i) ≥ L_π(h_i) ≥ lim sup_i L(h_i) + ε/2 holds for sufficiently large i. This contradicts the definition of lim sup. As a result, we must have L(h*) ≤ lim sup_i L(h_i).

Lemma 8. The optimization in (4) is feasible and the optimal cost can be achieved.

Proof. We fix some x ∈ X̂. To show feasibility, consider h_fea(t) := (1 − t)x + t x*, which is feasible for (4). Let L_fea = L(h_fea). Since L_fea is finite and L(h) is non-negative, V(x) must be finite. To show the achievability of the optimal cost, suppose it is not achieved; then there must be a sequence of feasible (h_i)_{i=1}^∞ such that L_fea > L(h_i) ≥ L(h_{i+1}) > V(x) for all i ≥ 1, and lim_{i→∞} L(h_i) = V(x). The compactness of X̂ implies (h_i)_{i=1}^∞ is uniformly bounded as well. By Lemma 7, a subsequence of (h_i)_{i=1}^∞, denoted as (h_i)_{i=1}^∞ as well, uniformly converges to a limit h* with L(h̄*) = L(h*) ≤ V(x). Moreover,

  h̄*(0) = h*(0) = lim_{i→∞} h_i(0) = x,
  h̄*(1) = h*(1) = lim_{i→∞} h_i(1) ∈ X.

Altogether, we have proved that h̄* is feasible for (4), and the cost L(h̄*) is not worse than V(x). This contradicts the non-achievability assumption.

For each x ∈ X̂ \ X, we construct h_x as

  h_x = arg min { L(h) : h ∈ Ĥ, h(0) = x, h(1) ∈ X }.    (5)

If there are multiple minimizers then h_x can be chosen as any one of them.

Lemma 9. For x ∈ X̂ \ X, the function h_x is injective.

Proof. Suppose not; then for some x, there exist t_1 < t_2 such that h_x(t_1) = h_x(t_2).
Since h_x ∈ Ĥ ⊆ H̄, we have

  sup_{π ∈ Π|_{t_1}^{t_2}} L_π(h_x) = (t_2 − t_1) L(h_x) = (t_2 − t_1) V(x) > 0.

Consider a new path defined as

  h*(t) := h_x(t), if t ∈ [0, 1] \ [t_1, t_2];   h*(t) := h_x(t_1), if t ∈ [t_1, t_2].

It is easy to check that h* is continuous and lies entirely within X̂. For any t ∈ [0, 1], h_x ∈ Ĥ implies f(h*(t)) ≥ f(h_x(1)) = f(h*(1)). Further, we have

  L(h*) = sup_{π ∈ Π} L_π(h*)
    = sup_{π ∈ Π|_0^{t_1}} L_π(h*) + sup_{π ∈ Π|_{t_1}^{t_2}} L_π(h*) + sup_{π ∈ Π|_{t_2}^1} L_π(h*)
    = sup_{π ∈ Π|_0^{t_1}} L_π(h_x) + 0 + sup_{π ∈ Π|_{t_2}^1} L_π(h_x)
    < sup_{π ∈ Π|_0^{t_1}} L_π(h_x) + sup_{π ∈ Π|_{t_1}^{t_2}} L_π(h_x) + sup_{π ∈ Π|_{t_2}^1} L_π(h_x)
    = sup_{π ∈ Π} L_π(h_x) = L(h_x).

Altogether, the arc-length reparameterization of h*, denoted h̄*, is feasible for (5) but achieves a strictly lower cost than h_x. This contradicts the optimality of h_x.

Corollary 4. For distinct t_1, t_2, t_3 ∈ [0, 1], if f(h_x(t_2)) ≥ f(h_x(t_1)) and f(h_x(t_2)) > f(h_x(t_3)), then

  ‖h_x(t_1) − h_x(t_2)‖_{ℓ2} + ‖h_x(t_2) − h_x(t_3)‖_{ℓ2} > ‖h_x(t_1) − h_x(t_3)‖_{ℓ2}.

Proof. It is sufficient to show that h_x(t_2) is not a convex combination of h_x(t_1) and h_x(t_3). Otherwise, we assume h_x(t_2) = λ h_x(t_1) + (1 − λ) h_x(t_3) for some λ ∈ [0, 1]. First, Lemma 9 implies λ ≠ 0, 1. For λ ∈ (0, 1), the convexity of f implies

  f(h_x(t_2)) = f(λ h_x(t_1) + (1 − λ) h_x(t_3)) ≤ λ f(h_x(t_1)) + (1 − λ) f(h_x(t_3)) < λ f(h_x(t_2)) + (1 − λ) f(h_x(t_2)) = f(h_x(t_2)).

This contradiction shows h_x(t_2) is not a convex combination of h_x(t_1) and h_x(t_3). Then the triangle inequality implies the corollary.

Lemma 10. For each h_x defined in (5), f(h_x(t)) is non-increasing in t for t ∈ [0, 1].

Proof. We fix an x ∈ X̂ \ X and prove the result for the h_x defined above. Suppose not; then there exist 0 ≤ t_1 < t_2 ≤ 1 such that f(h_x(t_1)) < f(h_x(t_2)).
Now define

  t† = argmax_{t ∈ [t₁,1]} f(h_x(t)),   t‡ = max{ t ∈ [t†,1] : f(h_x(t)) = f(h_x(t†)) }.

In other words, t† is an arbitrary maximizer of f(h_x(t)) over [t₁,1], while t‡ is the largest such maximizer. Both t† and t‡ are well defined (due to continuity and closedness) and are strictly between t₁ and 1. We also have that f(h_x(t‡)) > f(h_x(1)) holds strictly. By the continuity of h_x(·) and f(h_x(·)), there exist r, δ > 0 such that [t‡−δ, t‡+δ] ⊆ (t₁, 1) and
• for x′ ∈ B(h_x(t‡), r) ∩ X̂, f(h_x(1)) ≤ f(x′);
• for t ∈ [t‡−δ, t‡), h_x(t) ∈ B(h_x(t‡), r); therefore f(h_x(1)) ≤ f(h_x(t)) ≤ f(h_x(t‡));
• for t ∈ (t‡, t‡+δ], h_x(t) ∈ B(h_x(t‡), r); therefore f(h_x(1)) ≤ f(h_x(t)) < f(h_x(t‡)).
Now we construct another path h* as

  h*(t) = h_x(t),   if t ∈ [0,1] \ [t‡−δ, t‡+δ],
  h*(t) = ((t‡+δ−t)/(2δ)) h_x(t‡−δ) + ((t−t‡+δ)/(2δ)) h_x(t‡+δ),   if t ∈ [t‡−δ, t‡+δ].

It is easy to verify that h* is continuous. For t ∈ [t‡−δ, t‡+δ], h*(t) is a convex combination of h_x(t‡−δ) and h_x(t‡+δ), and must lie within B(h_x(t‡), r) ∩ X̂, which is convex. Therefore, h* lies entirely within X̂ and f(h*(t)) ≥ f(h_x(1)) = f(h*(1)) holds for all t.

Next, we show L(h*) < L(h_x) by (6); the strict inequality in (6) follows from Corollary 4. In summary, the arc-length reparameterization of h*, denoted h̄*, is feasible for (5) but achieves a strictly lower cost than h_x. This contradicts the optimality of h_x.

D. Verification

1) To show V satisfies Definition 10: It is sufficient to show that V is continuous in x. The proof is twofold.
Abusing notation slightly, we let h_x(t) ≡ x for x ∈ X, so this h_x is the unique minimizer of (4) and L(h_x) = V(x) = 0 for x ∈ X.

First we show that for x₀ ∈ X̂ and ε > 0, there exists δ⁺ > 0 such that ∀x ∈ B(x₀, δ⁺) ∩ X̂, V(x) ≤ V(x₀) + ε. There are two scenarios. If h_{x₀}(1) is a global optimum of (1), then we can set δ⁺ = ε. For any x ∈ B(x₀, δ⁺) ∩ X̂, construct

  h*(t) = (1−2t) x + 2t x₀,  t ∈ [0, 1/2],
  h*(t) = h_{x₀}(2t − 1),    t ∈ (1/2, 1].

Its arc-length reparameterization h̄* is feasible for (4) (w.r.t. x) and V(x) ≤ L(h̄*) = ‖x − x₀‖_ℓ + L(h_{x₀}) ≤ V(x₀) + ε.

Next we focus on the scenario in which h_{x₀}(1) is not a global optimum of (1), so it is not a local optimum either. By Lemma 5, there is a path h° in X such that h°(0) = h_{x₀}(1), f(h°(t)) is non-increasing in t, f(h°(1)) < f(h°(0)) and L(h°) < ε/2. Suppose f(h°(0)) − f(h°(1)) = τ > 0. Since f is continuous, there must be some γ > 0 such that for any x ∈ B(x₀, γ) ∩ X̂, we have |f(x) − f(x₀)| < τ. Now we choose δ⁺ = min(γ, ε/2). For any x ∈ B(x₀, δ⁺) ∩ X̂, construct

  h*(t) = (1−3t) x + 3t x₀,  t ∈ [0, 1/3],
  h*(t) = h_{x₀}(3t − 1),    t ∈ (1/3, 2/3],
  h*(t) = h°(3t − 2),        t ∈ (2/3, 1].

Its arc-length reparameterization h̄* is feasible for (4) (w.r.t. x) and V(x) ≤ L(h̄*) = ‖x − x₀‖_ℓ + L(h_{x₀}) + L(h°) ≤ δ⁺ + V(x₀) + ε/2 ≤ V(x₀) + ε.

Second we show that for x₀ ∈ X̂ and ε > 0, there exists δ⁻ > 0 such that ∀x ∈ B(x₀, δ⁻) ∩ X̂, V(x) ≥ V(x₀) − ε. If not, then there must be a sequence (x_i)_{i=1}^∞ such that lim_{i→∞} x_i = x₀ but V(x_i) < V(x₀) − ε for all i ≥ 1. Let h_i := h_{x_i} for i ≥ 1; then both (h_i)_{i=1}^∞ and (L(h_i))_{i=1}^∞ are uniformly bounded.
By Lemma 7, a subsequence of (h_i)_{i=1}^∞ uniformly converges to a limit h* and

  L(h̄*) = L(h*) ≤ lim sup_i L(h_i) = lim sup_i V(x_i) ≤ V(x₀) − ε.

Lemma 7 also indicates that h̄* ∈ Ĥ and h*(0) = lim_{i→∞} h_i(0) = lim_{i→∞} x_i = x₀, h*(1) = lim_{i→∞} h_i(1) ∈ X (as X is closed). Therefore, h̄* is feasible for (4) but its cost is strictly lower than V(x₀), a contradiction.

2) To show (C3) holds: By our construction (5), h_x lies entirely within X̂, and h_x(0) = x, h_x(1) ∈ X. Lemma 10 shows that f(h_x(t)) is non-increasing for t ∈ [0,1]. It remains to show that V(h_x(t)) is also non-increasing for t ∈ [0,1]. Consider the following lemma.

Lemma 11. For fixed x ∈ X̂ and t₀ ∈ [0,1],

  V(h_x(t₀)) = sup_{π ∈ Π|[t₀,1]} L_π(h_x).

Proof. Let x₀ = h_x(t₀). We have L(h_{x₀}) = V(h_x(t₀)). If the lemma does not hold, then there are two cases.

First, if L(h_{x₀}) < sup_{π ∈ Π|[t₀,1]} L_π(h_x), then let

  h*(t) = h_x(2t t₀),      t ∈ [0, 1/2],
  h*(t) = h_{x₀}(2t − 1),  t ∈ (1/2, 1].

It is easy to check that h* is continuous and lies entirely within X̂, and h*(0) = x, h*(1) = h_{x₀}(1) ∈ X. For t ∈ [0, 1/2],

  f(h*(t)) = f(h_x(2t t₀)) ≥ f(h_x(t₀)) = f(x₀) = f(h_{x₀}(0)) ≥ f(h_{x₀}(1)) = f(h*(1)).

For t ∈ [1/2, 1], f(h*(t)) = f(h_{x₀}(2t−1)) ≥ f(h_{x₀}(1)) = f(h*(1)). Further, we have

  L(h*) = sup_{π∈Π} L_π(h*)
        = sup_{π∈Π|[0,1/2]} L_π(h*) + sup_{π∈Π|[1/2,1]} L_π(h*)
        = sup_{π∈Π|[0,t₀]} L_π(h_x) + sup_{π∈Π} L_π(h_{x₀})
        = sup_{π∈Π|[0,t₀]} L_π(h_x) + L(h_{x₀})
        < sup_{π∈Π|[0,t₀]} L_π(h_x) + sup_{π∈Π|[t₀,1]} L_π(h_x) = L(h_x).

In summary, the arc-length reparameterization of h*, denoted h̄*, is feasible for (5) (w.r.t. x) but achieves a strictly lower cost than h_x. This contradicts the optimality of h_x.

Second, if L(h_{x₀}) > sup_{π ∈ Π|[t₀,1]} L_π(h_x), then let

  h*(t) = h_x(t₀),  t ∈ [0, t₀],
  h*(t) = h_x(t),   t ∈ (t₀, 1].
It is easy to check that h* is continuous and lies entirely within X̂, and h*(0) = h_x(t₀) = x₀, h*(1) = h_x(1) ∈ X. For t ∈ [0,1], f(h_x(t)) ≥ f(h_x(1)) implies f(h*(t)) ≥ f(h_x(1)) = f(h*(1)). Further, we have

  L(h*) = sup_{π∈Π} L_π(h*) = sup_{π∈Π|[0,t₀]} L_π(h*) + sup_{π∈Π|[t₀,1]} L_π(h*)
        = 0 + sup_{π∈Π|[t₀,1]} L_π(h_x) < L(h_{x₀}).

In summary, the arc-length reparameterization of h*, denoted h̄*, is feasible for (5) (w.r.t. x₀) but achieves a strictly lower cost than h_{x₀}. This contradicts the optimality of h_{x₀}.

For completeness, the estimate (6) invoked in the proof of Lemma 10 is:

  L(h*) = sup_{π∈Π} L_π(h*)
        = sup_{π∈Π|[0,t‡−δ]} L_π(h*) + sup_{π∈Π|[t‡−δ,t‡+δ]} L_π(h*) + sup_{π∈Π|[t‡+δ,1]} L_π(h*)
        = sup_{π∈Π|[0,t‡−δ]} L_π(h_x) + ‖h_x(t‡−δ) − h_x(t‡+δ)‖_ℓ + sup_{π∈Π|[t‡+δ,1]} L_π(h_x)
        < sup_{π∈Π|[0,t‡−δ]} L_π(h_x) + ‖h_x(t‡−δ) − h_x(t‡)‖_ℓ + ‖h_x(t‡) − h_x(t‡+δ)‖_ℓ + sup_{π∈Π|[t‡+δ,1]} L_π(h_x)
        ≤ sup_{π∈Π|[0,t‡−δ]} L_π(h_x) + sup_{π∈Π|[t‡−δ,t‡+δ]} L_π(h_x) + sup_{π∈Π|[t‡+δ,1]} L_π(h_x)
        = sup_{π∈Π} L_π(h_x) = L(h_x).   (6)

Using this lemma, we are in a good position to show that V(h_x(t)) is non-increasing for t ∈ [0,1]. For any t₁ < t₂, we have

  V(h_x(t₁)) = sup_{π∈Π|[t₁,1]} L_π(h_x)
             = sup_{π∈Π|[t₁,t₂]} L_π(h_x) + sup_{π∈Π|[t₂,1]} L_π(h_x)
             ≥ sup_{π∈Π|[t₂,1]} L_π(h_x) = V(h_x(t₂)).

3) To show (C4) holds: The set {L(h_x)}_{x ∈ X̂\X} is uniformly bounded by max_{x ∈ X̂} ‖x − x*‖_ℓ, which is finite.

To summarize, we have verified that the construction is well defined and satisfies both (C3) and (C4), so Lemma 6 is proved. Since we have shown that Lemma 6 implies Theorem 3, the latter is also proved.

V. OTHER PROPERTIES

A. Constructing from Primitives

Though the previous section guarantees the existence of the Lyapunov-like function and paths under certain conditions, it is not clear how to systematically find or construct them.
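The path-length functional L(h) = sup_{π∈Π} L_π(h) used throughout the verification above can be approximated numerically by evaluating the polygonal length L_π on increasingly fine partitions. The following is a small illustrative sketch; the semicircular toy path and the helper names are ours, not from the paper:

```python
import numpy as np

def polygonal_length(h, ts):
    """L_pi(h): total chord length of the path h over the partition ts."""
    pts = np.array([h(t) for t in ts])
    return float(np.linalg.norm(np.diff(pts, axis=0), axis=1).sum())

def path_length(h, n=1000):
    """Approximate L(h) = sup over partitions via a fine uniform partition.
    Refining a partition can only increase the polygonal length."""
    return polygonal_length(h, np.linspace(0.0, 1.0, n + 1))

# Toy path: a unit semicircle; its true arc length is pi.
h = lambda t: np.array([np.cos(np.pi * t), np.sin(np.pi * t)])

coarse = polygonal_length(h, [0.0, 0.5, 1.0])  # coarse partitions underestimate
fine = path_length(h)
print(coarse, fine)  # coarse = 2*sqrt(2) < fine, and fine is close to pi
```

Flattening a sub-arc onto a chord, as in the path surgeries above, can only decrease this quantity; this is exactly how the proofs obtain their strict inequalities.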
In this subsection, we show that if one can find the Lyapunov-like function and paths for some primitive problems, then there are natural ways to construct the Lyapunov-like function and paths for new problems built up from those primitives in certain ways. To streamline notation, we will use the tuple (f, X) to refer to (1) and the tuple (f, X, X̂) to refer to the problem pair (1), (2). Assume (V, {h_x}_{x∈X̂\X}) is a valid construction of the Lyapunov-like function and paths for (f, X, X̂). In this subsection, when we say V and h_x are valid, we mean that they not only are valid by definition, but also satisfy (C1) and (C2).

1) Function Composition: Suppose g : R → R is non-decreasing and convex. Then (V, {h_x}_{x∈X̂\X}) is also a valid construction of the Lyapunov-like function and paths for (g∘f, X, X̂). This result is immediate, as g∘f preserves the convexity over X̂ and the monotonicity over any path.

2) Union of Feasible Sets: Suppose we have two problem pairs (f₁, X₁, X̂₁), for which (V₁, {h¹_x}_{x∈X̂₁\X₁}) is valid, and (f₂, X₂, X̂₂), for which (V₂, {h²_x}_{x∈X̂₂\X₂}) is valid. We consider a new problem (f, X, X̂) where X := (X₁ ∪ X₂) ∩ X̂₁ ∩ X̂₂ and X̂ := X̂₁ ∩ X̂₂. The formulation of f will be provided later. If for every x ∈ X̂ \ X we have h¹_x ≡ h²_x, then construct Ṽ : X̂ → R such that Ṽ(x) := V₁(x) · V₂(x), and let h̃_x = h¹_x for all x ∈ X̂ \ X. We have the following two results.

Corollary 5. For any λ ∈ (0,1), define f : X̂ → R as f(x) := λ f₁(x) + (1−λ) f₂(x). Then (Ṽ, {h̃_x}_{x∈X̂\X}) is valid for (f, X, X̂).

Corollary 6. Define f : X̂ → R as f(x) := max(f₁(x), f₂(x)). Then (Ṽ, {h̃_x}_{x∈X̂\X}) is valid for (f, X, X̂).

Proof for Corollary 5 and Corollary 6. The function Ṽ is still continuous and vanishes if and only if x ∈ X (since Ṽ(x) = 0 ⇔ V₁(x) = 0 or V₂(x) = 0).
By construction, {h̃_x}_{x∈X̂\X} is a subset of {h¹_x}_{x∈X̂₁\X₁}, so (C2) is naturally satisfied. To see that (C1) holds, we fix any x ∈ X̂ \ X. Then h̃_x(0) = h¹_x(0) = x and h̃_x(1) = h¹_x(1) ∈ X₁ ∩ X̂ ⊆ X. Further,

  Ṽ(h̃_x(t)) = V₁(h¹_x(t)) V₂(h¹_x(t)) = V₁(h¹_x(t)) V₂(h²_x(t)),

as h¹_x and h²_x coincide when x ∈ X̂ \ X. Because both V₁(h¹_x(t)) and V₂(h²_x(t)) are non-negative and non-increasing, so is Ṽ(h̃_x(t)). Finally, as f₁(h¹_x(t)) and f₂(h²_x(t)) are both non-increasing over [0,1], their convex combination or maximum (i.e., f(h̃_x(t))) must be non-increasing as well. A similar argument also shows f(h̃_x(1)) < f(h̃_x(0)). Thus (C1) holds, which completes the proof.

3) Intersection of Feasible Sets: We still consider two problem pairs (f₁, X₁, X̂), for which (V₁, {h¹_x}_{x∈X̂\X₁}) is valid, and (f₂, X₂, X̂), for which (V₂, {h²_x}_{x∈X̂\X₂}) is valid. Different from the previous setting, the two pairs are required to share the same relaxed set X̂. Further, we view each x ∈ X̂ as a tuple with two parts, x := (u, v). Define P₁ and P₂ as two projection operators such that P₁x = u and P₂x = v. We consider a new problem (f, X, X̂) where X := X₁ ∩ X₂. The formulation of f will be provided later.

If f_i, V_i and h^i_x are completely separated with respect to u and v, in the sense that for i = 1, 2, both f_i(x) and V_i(x) depend on P_i x only and P_{3−i}(h^i_x(t)) is constant in t, then we can construct Ṽ as Ṽ(x) := V₁(x) + V₂(x).
For x ∈ X̂ \ X, the path h̃_x is constructed in three ways, depending on the values of V₁(x) and V₂(x):

  If V₁(x) = 0, then h̃_x := h²_x;   (7a)
  If V₂(x) = 0, then h̃_x := h¹_x;   (7b)
  If V₁(x), V₂(x) > 0, then h̃_x(t) := h¹_x(2t) for t ∈ [0, 1/2), and h̃_x(t) := h²_{h¹_x(1)}(2t − 1) for t ∈ [1/2, 1].   (7c)

Corollary 7. For any λ ∈ (0,1), define f : X̂ → R as f(x) := λ f₁(x) + (1−λ) f₂(x). Then (Ṽ, {h̃_x}_{x∈X̂\X}) is valid for (f, X, X̂).

Corollary 8. Define f : X̂ → R as f(x) := max(f₁(x), f₂(x)). Then (Ṽ, {h̃_x}_{x∈X̂\X}) is valid for (f, X, X̂).

TABLE I: A SUMMARY ON CONSTRUCTING V AND h_x FROM PRIMITIVES

Operation | Primitive problem | New problem | V, h_x for new problem | Additional requirements
Function composition | (f, X, X̂): V, h_x | (g∘f, X, X̂) | Ṽ := V, h̃_x := h_x | g is non-decreasing and convex
Union of feasible sets (cost as the sum) | (f₁, X₁, X̂₁): V₁, h¹_x; (f₂, X₂, X̂₂): V₂, h²_x | (f, X, X̂) where f := λf₁ + (1−λ)f₂, X := (X₁ ∪ X₂) ∩ X̂₁ ∩ X̂₂, X̂ := X̂₁ ∩ X̂₂ | Ṽ := V₁ × V₂, h̃_x := h¹_x | h¹_x and h²_x coincide for x ∈ X̂\X
Union of feasible sets (cost as the maximum) | same primitives as above | same as above, with f := max(f₁, f₂) | Ṽ := V₁ × V₂, h̃_x := h¹_x | h¹_x and h²_x coincide for x ∈ X̂\X
Intersection of feasible sets (cost as the sum) | (f₁, X₁, X̂): V₁, h¹_x; (f₂, X₂, X̂): V₂, h²_x; (u,v) =: x ∈ X̂ | (f, X, X̂) where f := λf₁ + (1−λ)f₂, X := X₁ ∩ X₂ | Ṽ := V₁ + V₂, h̃_x as in (7) | f_i(x), V_i(x) depend on P_i x only and P_{3−i} h^i_x is constant
Intersection of feasible sets (cost as the maximum) | same primitives as above | same as above, with f := max(f₁, f₂) | Ṽ := V₁ + V₂, h̃_x as in (7) | f_i(x), V_i(x) depend on P_i x only and P_{3−i} h^i_x is constant

Proof for Corollary 7 and Corollary 8.
The function Ṽ is still continuous and vanishes if and only if x ∈ X (since Ṽ(x) = 0 ⇔ V₁(x) = 0 and V₂(x) = 0). The set {h̃_x}_{x∈X̂\X} satisfies (C2), as each path is constructed either as some h^i_x or as the concatenation of h¹_x and h²_{h¹_x(1)}. Next we show that h̃_x(1) ∈ X. If h̃_x is constructed by (7a), then Ṽ(h̃_x(1)) = V₁(h²_x(1)) + V₂(h²_x(1)) = V₁(h²_x(1)). Since V₁(h²_x(1)) depends only on P₁h²_x(1), and P₁h²_x(1) = P₁h²_x(0) = P₁x, we must have V₁(h²_x(1)) = V₁(x) = 0. Thus Ṽ(h̃_x(1)) = 0 and h̃_x(1) ∈ X. The case in which h̃_x is constructed by (7b) is similar. When h̃_x is constructed by (7c),

  Ṽ(h̃_x(1)) = V₁(h²_{h¹_x(1)}(1)) + V₂(h²_{h¹_x(1)}(1))
             = V₁(h²_{h¹_x(1)}(1)) = V₁(h²_{h¹_x(1)}(0))
             = V₁(h¹_x(1)) = 0,

so h̃_x(1) ∈ X as well. The monotonicity properties of Ṽ(h̃_x(t)) and f(h̃_x(t)) are also direct consequences of the fact that f_i, V_i and h^i_x are completely separated.

A summary of this subsection is provided in Table I.

B. Weak Exactness

One observation from the proof of Theorem 1 is that we do not actually need f(h_x(0)) > f(h_x(1)) to eliminate genuine local optima. However, this strict inequality is required to show exactness. We can therefore consider a weaker version of exactness, defined as follows.

Definition 12 (Weak Exactness). We say the relaxation (2) is weakly exact with respect to (1) if at least one optimum of (2) is feasible, and hence globally optimal, for (1).

Theorem 4. If there exists a Lyapunov-like function V associated with (1) and (2) such that (C3) and (C2) hold, then (2) is weakly exact with respect to (1), and any local optimum in X for (1) is either a global optimum or a pseudo local optimum.

The argument on weak exactness follows from the fact that the path connecting any global optimum of (2) must determine an endpoint in X with the same cost, which by definition must be a global optimum as well.
The argument on local optimality follows directly from the proof of Theorem 1.

VI. APPLICATIONS

In this section we use two examples to show what V and {h_x} might look like for specific problems. The first example is the Optimal Power Flow (OPF) problem in power systems with tree structures, which is also the motivating problem for which we developed this theory. By finding the Lyapunov-like function and paths, we show in [1] the first known condition (that can be checked a priori) for OPF to have no spurious local optima. The same condition was previously only known to guarantee exact relaxation.

In the second example, we study the Low Rank Semidefinite Program (LRSDP) problem, which was known in the existing literature to have a weakly exact relaxation [5], [6] and no spurious local optima [7]. Specifically, we show that part of the results proved in [7] can also be proved by finding appropriate V and {h_x}. Together, the two examples exemplify the use of Theorem 1, Theorem 2 and Theorem 4 in practice.

A. Optimal Power Flow

Consider a radial power network with an underlying connected directed graph G(V, E). Let V := {0, 1, ···, N−1} be the set of buses (i.e., nodes), and E ⊆ V × V be the set of power lines (i.e., edges). We will refer to a power line from bus j to bus k by j → k or (j,k) interchangeably. For each power line (j,k), its series admittance is denoted by y_jk ∈ C, and its series impedance is hence z_jk := y_jk⁻¹. Both the real and imaginary parts of z_jk are assumed to be positive.

As we assume G is a tree, we can adopt the DistFlow model [36], [37] to formulate the power flow equations. For each bus j, let V_j ∈ C and s_j = p_j + i q_j ∈ C denote its voltage and bus injection, respectively. For line (j,k), let S_jk ∈ C and I_jk ∈ C denote the branch power flow and current from bus j to k, both at the sending end.
Let v_j := |V_j|² ∈ R and ℓ_jk := |I_jk|² ∈ R. We will denote the conjugate of a complex number a by a^H. The power flow equations are:

  v_j = v_k + 2 Re(z_jk S_jk^H) − |z_jk|² ℓ_jk,  ∀(j,k) ∈ E   (8a)
  v_j ℓ_jk = |S_jk|²,  ∀(j,k) ∈ E   (8b)
  s_j = Σ_{k: j→k} S_jk − Σ_{i: i→j} (S_ij − z_ij ℓ_ij),  ∀j ∈ V.   (8c)

Given a cost function f(s) : C^N → R, we are interested in the following OPF problem:

  minimize_{x=(s,v,ℓ,S)}  f(s)   (9a)
  subject to  (8)   (9b)
              v̲_j ≤ v_j ≤ v̄_j   (9c)
              s̲_j ≤ s_j ≤ s̄_j   (9d)
              ℓ_jk ≤ ℓ̄_jk   (9e)

where underlined and barred quantities denote lower and upper bounds, respectively. All inequalities between complex numbers in this section are enforced for both the real and imaginary parts.

Definition 13. A function g : R → R is strongly increasing if there exists a real c > 0 such that for any a > b, we have g(a) − g(b) ≥ c(a − b).

We now make the following assumptions on OPF:
(i) The underlying graph G is a tree.
(ii) The cost function f is convex, and is strongly increasing in Re(s_j) (or Im(s_j)) for each j ∈ V and non-decreasing in Im(s_j) (or Re(s_j), respectively).
(iii) Problem (9) is feasible.
(iv) The line current limit satisfies ℓ̄_jk ≤ v̲_j |y_jk|².

Assumption (i) is generally true for distribution networks, and assumption (iii) is typically mild. As for (ii), f is commonly assumed to be convex and increasing in Re(s_j) and Im(s_j) in the literature (e.g., [38], [9]). Assumption (ii) is only slightly stronger, since one can always perturb any increasing function by an arbitrarily small linear term to achieve strong monotonicity. Assumption (iv) is not common in the literature but is also mild, for the following reason. Typically V_j = (1 + ε_j) e^{iθ_j} in per unit, where ε_j ∈ [−0.05, 0.05], and the angle difference θ_jk := θ_j − θ_k between two neighboring buses j, k typically has a small magnitude.
Thus the maximum value of |V_j − V_k|² = |(1+ε_j) e^{iθ_jk} − (1+ε_k)|², which equals ℓ_jk / |y_jk|², should be much smaller than v_j, which is ≈ 1 per unit.

Problem (9) is non-convex, as constraint (8b) is not convex. Denote by X the set of (s, v, ℓ, S) that satisfy (9b)-(9e), so that (9) is in the form of (1). We can relax (9) by convexifying (8b) into a second-order cone [8]:

  minimize_{x=(s,v,ℓ,S)}  f(s)   (10a)
  subject to  (8a), (8c), (9c)-(9e)   (10b)
              |S_jk|² ≤ v_j ℓ_jk.   (10c)

One can similarly regard X̂ as the set of (s, v, ℓ, S) that satisfy (10b), (10c). It is proved in [8] that if s̲_j = −∞ − i∞ for all j ∈ V, then (10) is exact, meaning that any optimal solution of (10) is also feasible, and hence globally optimal, for (9). Now we show that the same condition also guarantees that any local optimum of (9) is globally optimal. This implies that a local search algorithm such as the primal-dual interior point method produces a global optimum as long as it converges.

Theorem 5. If s̲_j = −∞ − i∞ for all j ∈ V, then any local optimum of (9) is a global optimum.

Proof. Our strategy is to construct appropriate V and {h_x} and then prove that this construction satisfies both Condition (C) and Condition (C'). Let

  V(x) := Σ_{(j,k)∈E} (v_j ℓ_jk − |S_jk|²).   (11)

Clearly, V is a valid Lyapunov-like function satisfying Definition 10.

For each x = (s, v, ℓ, S) ∈ X̂ \ X, let M be the set of (j,k) ∈ E such that |S_jk|² < v_j ℓ_jk. For (j,k) ∈ M, the quadratic function

  φ_jk(a) := |z_jk|² a² + 2 (v_j − Re(z_jk S_jk^H)) a + |S_jk|² − v_j ℓ_jk

must have a unique positive root, as φ_jk(0) < 0. We define Δ_jk to be this positive root if (j,k) ∈ M, and Δ_jk := 0 otherwise. Assumption (iv) implies ℓ_jk ≤ v_j |y_jk|², and therefore

  v_j − Re(z_jk S_jk^H) ≥ v_j − |z_jk| |S_jk|
                        ≥ v_j − |z_jk| √(v_j ℓ_jk)
                        ≥ v_j − |z_jk| √(v_j² |y_jk|²) = 0.
This further implies that φ_jk(a) is strictly increasing for a ∈ [0, Δ_jk]. Now consider the path h_x(t) := (s̃(t), ṽ(t), ℓ̃(t), S̃(t)) for t ∈ [0,1], where

  s̃_j(t) = s_j − t Σ_{i: i→j} z_ij Δ_ij − t Σ_{k: j→k} z_jk Δ_jk,   (12a)
  ṽ_j(t) = v_j,   (12b)
  ℓ̃_jk(t) = ℓ_jk − 2 t Δ_jk,   (12c)
  S̃_jk(t) = S_jk − t z_jk Δ_jk.   (12d)

Clearly we have h_x(0) = x. It can easily be checked that h_x(t) is feasible for (10) for t ∈ [0,1] and that h_x(1) is feasible for (9) (see [1]). Therefore, h_x indeed maps [0,1] into X̂ and h_x(1) ∈ X.

Since z_jk > 0, both the real and imaginary parts of s̃_j(t) are strictly decreasing in t for every bus j incident to a line in M, and stay unchanged otherwise. By assumption (ii), f(s̃(t)) is also strictly decreasing. To show that V(h_x(t)) is also decreasing, notice that V(h_x(t)) equals

  Σ_{(j,k)∈E} ( ṽ_j(t) ℓ̃_jk(t) − |S̃_jk(t)|² )
    = Σ_{(j,k)∈M^c} ( v_j ℓ_jk − |S_jk|² ) + Σ_{(j,k)∈M} ( ṽ_j(t) ℓ̃_jk(t) − |S̃_jk(t)|² )
    = Σ_{(j,k)∈M^c} ( v_j ℓ_jk − |S_jk|² ) − Σ_{(j,k)∈M} φ_jk(t Δ_jk).

As φ_jk(a) is strictly increasing for a ∈ [0, Δ_jk], we conclude that V(h_x(t)) is strictly decreasing for t ∈ [0,1].

By Corollary 1, the set {h_x}_{x∈X̂\X} is uniformly bounded and uniformly equicontinuous, as all h_x(t) are linear functions of t. In summary, Condition (C) is satisfied.

Finally, we show that Condition (C') also holds. By assumption (ii), there exists some real c > 0 independent of x such that for any 0 ≤ a < b ≤ 1,

  f(s̃(a)) − f(s̃(b)) ≥ c Σ_{j∈V} ( Re(s̃_j(a) − s̃_j(b)) + Im(s̃_j(a) − s̃_j(b)) ) = c ‖s̃(a) − s̃(b)‖_m,

where ‖·‖_m is defined as ‖a‖_m := Σ_i ( |Re(a_i)| + |Im(a_i)| ) over the complex vector space.
It is easy to check that ‖·‖_m is a valid norm. On the other hand, by (12) we have ‖ṽ(a) − ṽ(b)‖_m ≡ 0 and

  ‖ℓ̃(a) − ℓ̃(b)‖_m ≤ ( 1 / min_{(j,k)∈E} ‖z_jk‖_m ) ‖s̃(a) − s̃(b)‖_m,
  ‖S̃(a) − S̃(b)‖_m ≤ (1/2) ‖s̃(a) − s̃(b)‖_m.

Therefore,

  ‖h_x(a) − h_x(b)‖_m ≤ ( 3/2 + 1 / min_{(j,k)∈E} ‖z_jk‖_m ) ‖s̃(a) − s̃(b)‖_m,

and there exists ĉ > 0, independent of x, a, b, such that f(s̃(a)) − f(s̃(b)) ≥ ĉ ‖h_x(a) − h_x(b)‖_m. Therefore Condition (C') is also satisfied, and by Theorem 2 any local optimum of (9) is a global optimum.

The results in this subsection apply only to radial networks, which serve as the underlying networks of balanced distribution power systems. For transmission systems and unbalanced distribution systems, networks are usually highly meshed. It has been found that for most meshed networks, both convex relaxation and local search algorithms also yield the global optimum in most test cases [39], [40]. Thus Theorem 3 suggests that there may also exist similar Lyapunov-like functions and paths for meshed networks. Finding such Lyapunov-like functions and paths would be an interesting direction of future work to extend the results of this paper.

B. Low Rank Semidefinite Program

This subsection proves a known result of [7] using a different approach. Adopting the same notation as [7], we have the following problem:

  minimize_{X ≥ 0}  tr(CX)   (13a)
  subject to  tr(A_i X) = b_i,  i = 1, ···, m   (13b)
              rank(X) ≤ r   (13c)

Here, C, A_i, X are all n-by-n matrices. We assume the problem is feasible and that {X ≥ 0 | (13b)} is compact.

Theorem 6. If (r+1)(r+2)/2 > m + 1, then any local optimum of (13) is either a global optimum or a pseudo local optimum.
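The Lyapunov-like function used in the proof below, V(X) = Σ_{i=r+1}^n λ_i(X), measures how badly the rank constraint (13c) is violated: for X ≥ 0 it is non-negative and vanishes exactly when rank(X) ≤ r. A small numerical sketch; the random instance and helper name here are ours, for illustration only:

```python
import numpy as np

def lyapunov_V(X, r):
    """V(X) = sum of the (n - r) smallest eigenvalues of a Hermitian X.
    For X >= 0 this is nonnegative, and it is (numerically) zero
    exactly when rank(X) <= r."""
    eig = np.linalg.eigvalsh(X)  # eigenvalues in ascending order
    return float(eig[: X.shape[0] - r].sum())

rng = np.random.default_rng(0)
n, r = 5, 2
B = rng.standard_normal((n, r))
X_low = B @ B.T              # PSD with rank <= r: V should vanish
X_full = X_low + np.eye(n)   # full rank: V should be strictly positive

print(lyapunov_V(X_low, r))   # ~0 up to floating-point error
print(lyapunov_V(X_full, r))  # strictly positive
```

Along each path h^i constructed in the proof, tr(CX) stays constant while this V is non-increasing, which is exactly what condition (C3) requires.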
Before proving Theorem 6, we consider the convex relaxation of (13):

  minimize_{X ≥ 0}  tr(CX)   (14a)
  subject to  tr(A_i X) = b_i,  i = 1, ···, m   (14b)

As a side note, the results in [5], [6] show that if (r+1)(r+2)/2 > m, then (14) is weakly exact with respect to (13). While our theorem is the same as that of [7], some of the insight used to find V and {h_X} also comes from the structures first identified in [5], [6].

Proof. Clearly, (13) can be reformulated in the form of (1) by setting f(X) = tr(CX), X = {X ≥ 0 | (13b), (13c)} and X̂ = {X ≥ 0 | (13b)}. Define V as

  V(X) := Σ_{i=r+1}^n λ_i(X),

where λ_i(X) is the i-th eigenvalue of X (in decreasing order). This function V satisfies Definition 10 and is concave.

For a fixed X ∈ X̂ \ X, denote rank(X) by r₀ > r. We first construct r₀ − r paths, labeled h¹, h², ···, h^{r₀−r}. When constructing h^i with i > 1, we assume path h^{i−1} has already been constructed, and let X_{i−1} := h^{i−1}(1); we let X₀ = X. For i ≥ 1, if rank(X_{i−1}) ≤ r₀ − i, then we let h^i(t) ≡ X_{i−1} for t ∈ [0,1]. Otherwise, we decompose X_{i−1} as U Σ U^H, where Σ is a k-by-k positive definite diagonal matrix with k = rank(X_{i−1}) > r₀ − i. The linear system

  tr(C U Y U^H) = 0,
  tr(A_j U Y U^H) = 0,  j = 1, ···, m   (15)

must have a non-zero solution among Hermitian matrices Y ∈ C^{k×k}. To see this, note that k ≥ r₀ − i + 1 ≥ r + 1, and thus k(k+1)/2 ≥ (r+1)(r+2)/2 > m + 1. As a result, (15) has more unknown variables than equations. We simply denote this non-zero solution by Y; for any α ∈ R, αY is also a solution to (15). The concavity of V implies that V(U(Σ + αY)U^H) is concave in α when U and Σ are fixed. Since Σ > 0, one of the following two scenarios must be true.
• ∃a < 0 such that V(U(Σ + αY)U^H) is non-decreasing and rank(U(Σ + αY)U^H) ≤ k for α ∈ [a, 0], and rank(U(Σ + aY)U^H) ≤ k − 1.
• ∃ b > such that V ( U (Σ + αY ) U H ) is non-increasing, rank( U (Σ+ αY ) U H ) ≤ k for α ∈ [0 , b ] and rank( U (Σ+ bY ) U H ) ≤ k − .Without loss of generality, we suppose V ( U (Σ + αY ) U H ) isnon-increasing for α ∈ [0 , b ] (otherwise we take − Y instead).We then construct h i as h i ( t ) = U (Σ + tbY ) U H for t ∈ [0 , .By construction, V ( h i ( t )) is non-increasing and f ( h i ( t )) staysa constant.Finally, we construct h X as the concatenation of paths h , · · · , h r − r . That is to say, h X ( t ) := h i (( r − r ) t − i + 1) for t ∈ (cid:104) i − r − r , ir − r (cid:105) . It is easy to see h X is continuous and h X (0) = h (0) = X . To see h X (1) ∈ X , we prove that rank( X i ) ≤ r − i . We first have rank( X ) = rank( X ) = r . For i ≥ ,we have rank( X i ) = rank( X i − ) if rank( X i − ) ≤ r − i and rank( X i ) ≤ rank( X i − ) − otherwise. By induction,we can prove rank( X i ) ≤ r − i always holds. As a result, rank( h X (1)) = rank( h r − r (1)) ≤ r and thus h X (1) ∈ X . Byconstruction, h i ( t ) never violates (13b) and thus is in ˆ X , so is h X ( t ) for all t . Functions V ( h i ( t )) and f ( h i ( t )) being non-increasing implies that V ( h X ( t )) and f ( h X ( t )) are also non-increasing. Therefore, (C3) is satisfied. By Corollary 1 (C2)also holds for { h X } . It completes the proof (by Theorem 4). Remark 4. In [7], Theorem 3.4 claims that any local optimumof (13) should also be globally optimal, unless it is harboredin some positive-dimensional face of SDP. The result in ourpaper further asserts that if it is indeed harbored in such aface, then there must be some point on the edge of this facewhose cost can be further reduced in its neighborhood (i.e.,the local optimum is in the same situation as point c ratherthan d as in Fig. 1). VII. C ONCLUSION AND D ISCUSSIONTABLE IIS UFFICIENT AND NECESSARY CONDITIONS Condition Relaxation exactness Local optimalitySufficient conditions: ⇒ (C1), (C2) Strong exactness l.o. is p.l.o. 
or g.o.(C3), (C2) Weak exactness l.o. is p.l.o. or g.o.(C’) Strong exactness l.o. is g.o.Necessary condition: ⇐ (C1), (C2) Strong exactness l.o. is g.o. Table II summaries both sufficient and necessary conditionsfor non-convex problem (1) to simultaneously have exact(weak or strong) relaxation and no spurious local optima(allowing or not allowing pseudo local optima). The necessarycondition relies on Assumption 1, which is usually true for real-world problems. Those results provide a new perspectiveto certify a non-convex problem is computationally easy tosolve. Furthermore, whenever the problem is indeed compu-tationally easy, the certificates (Lyapunov-like functions andpaths) are guaranteed to exist. We also provide a hierarchicalframework which shows how such certificates for a compli-cated problem can be constructed from primitive problems.Our results have been applied to OPF and LRSDP problems.Based on the examples shown in Section VI, a natural wayto apply this approach is to first look at existing results onexact relaxation, and then construct V and { h x } accordingto the hidden structure underlying the exactness. Once V and { h x } are appropriately constructed, our result can help extendexisting results on relaxation exactness to new results on localoptimality.Compared to some existing techniques to study local opti-mality, our results do not require differentiating or analyzingthe curvature of feasible sets. It allows the feasible setsto incorporate more complicated and possibly non-convexconstraints. Those non-convex constraints are common forproblems arising in cyber physical systems which are generallygoverned by physical laws.R EFERENCES[1] F. Zhou and S. H. Low, “A sufficient condition for local optima to beglobally optimal,” in To appear in Proc. of the 2020 Conference onDecision and Control . IEEE, 2020.[2] E. J. Cand`es and B. Recht, “Exact matrix completion via convexoptimization,” Foundations of Computational mathematics , vol. 9, no. 6,p. 
717, 2009.[3] E. J. Cand`es and T. Tao, “The power of convex relaxation: Near-optimalmatrix completion,” IEEE Transactions on Information Theory , vol. 56,no. 5, pp. 2053–2080, 2010.[4] R. Ge, J. D. Lee, and T. Ma, “Matrix completion has no spuriouslocal minimum,” in Advances in Neural Information Processing Systems ,2016, pp. 2973–2981.[5] A. I. Barvinok, “Problems of distance geometry and convex propertiesof quadratic maps,” Discrete & Computational Geometry , vol. 13, no. 2,pp. 189–202, 1995.[6] G. Pataki, “On the rank of extreme matrices in semidefinite programsand the multiplicity of optimal eigenvalues,” Mathematics of operationsresearch , vol. 23, no. 2, pp. 339–358, 1998.[7] S. Burer and R. D. Monteiro, “Local minima and convergence in low-rank semidefinite programming,” Mathematical Programming , vol. 103,no. 3, pp. 427–444, 2005.[8] M. Farivar and S. H. Low, “Branch flow model: Relaxations andconvexification–part I,” IEEE Transactions on Power Systems , vol. 28,no. 3, pp. 2554–2564, 2013.[9] L. Gan, N. Li, U. Topcu, and S. H. Low, “Exact convex relaxation ofoptimal power flow in radial networks,” IEEE Transactions on AutomaticControl , vol. 60, no. 1, pp. 72–87, 2015.[10] J. Lavaei and S. H. Low, “Zero duality gap in optimal power flowproblem,” IEEE Transactions on Power Systems , vol. 27, no. 1, pp.92–107, 2012.[11] J. Jald´en, C. Martin, and B. Ottersten, “Semidefinite programmingfor detection in linear systems-optimality conditions and space-timedecoding,” in , vol. 4. IEEE,2003, pp. IV–9.[12] C. Lu, Y.-F. Liu, W.-Q. Zhang, and S. Zhang, “Tightness of a new andenhanced semidefinite relaxation for mimo detection,” SIAM Journal onOptimization , vol. 29, no. 1, pp. 719–742, 2019.[13] Z. Li, Q. Guo, H. Sun, and J. Wang, “Sufficient conditions for exact re-laxation of complementarity constraints for storage-concerned economicdispatch,” IEEE Transactions on Power Systems , vol. 31, no. 2, pp.1653–1654, 2015.[14] J. Sun, Q. Qu, and J. 
Wright, "When are nonconvex problems not scary?" arXiv preprint arXiv:1510.06096, 2015.
[15] R. Ge, C. Jin, and Y. Zheng, "No spurious local minima in nonconvex low rank problems: A unified geometric analysis," in Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017, pp. 1233–1242.
[16] J. Sun, Q. Qu, and J. Wright, "Complete dictionary recovery over the sphere I: Overview and the geometric picture," IEEE Transactions on Information Theory, vol. 63, no. 2, pp. 853–884, 2016.
[17] N. Boumal, "Nonconvex phase synchronization," SIAM Journal on Optimization, vol. 26, no. 4, pp. 2355–2377, 2016.
[18] S. Arora, R. Ge, T. Ma, and A. Moitra, "Simple, efficient, and neural algorithms for sparse coding," Proceedings of Machine Learning Research, vol. 40, January 2015.
[19] J. Carpentier, "Contribution to the economic dispatch problem," Bulletin de la Société Française des Électriciens, vol. 3, no. 8, pp. 431–447, 1962.
[20] A. Verma, "Power grid security analysis: An optimization approach," Ph.D. dissertation, Columbia University, 2009.
[21] K. Lehmann, A. Grastien, and P. Van Hentenryck, "AC-feasibility on tree networks is NP-hard," IEEE Transactions on Power Systems, vol. 31, no. 1, pp. 798–801, 2016.
[22] J. A. Momoh, R. Adapa, and M. El-Hawary, "A review of selected optimal power flow literature to 1993. I. Nonlinear and quadratic programming approaches," IEEE Transactions on Power Systems, vol. 14, no. 1, pp. 96–104, 1999.
[23] J. A. Momoh, M. El-Hawary, and R. Adapa, "A review of selected optimal power flow literature to 1993. II. Newton, linear programming and interior point methods," IEEE Transactions on Power Systems, vol. 14, no. 1, pp. 105–111, 1999.
[24] R. A. Jabr, A. H. Coonick, and B. J. Cory, "A primal-dual interior point method for optimal power flow dispatching," IEEE Transactions on Power Systems, vol. 17, no. 3, pp. 654–662, 2002.
[25] R. A.
Jabr, "Radial distribution load flow using conic programming," IEEE Transactions on Power Systems, vol. 21, no. 3, pp. 1458–1459, 2006.
[26] X. Bai, H. Wei, K. Fujisawa, and Y. Wang, "Semidefinite programming for optimal power flow problems," International Journal of Electrical Power & Energy Systems, vol. 30, no. 6-7, pp. 383–392, 2008.
[27] S. Bose, D. F. Gayme, K. M. Chandy, and S. H. Low, "Quadratically constrained quadratic programs on acyclic graphs with application to power flow," IEEE Transactions on Control of Network Systems, vol. 2, no. 3, pp. 278–287, 2015.
[28] S. H. Low, "Convex relaxation of optimal power flow–Part II: Exactness," IEEE Transactions on Control of Network Systems, vol. 1, no. 2, pp. 177–189, 2014.
[29] D. K. Molzahn and I. A. Hiskens, "A survey of relaxations and approximations of the power flow equations," Foundations and Trends® in Electric Energy Systems, vol. 4, no. 1-2, pp. 1–221, 2019.
[30] V. A. Toponogov, Differential Geometry of Curves and Surfaces. Springer, 2006.
[31] E. Bierstone and P. D. Milman, "Semianalytic and subanalytic sets," Publications Mathématiques de l'Institut des Hautes Études Scientifiques, vol. 67, no. 1, pp. 5–42, 1988.
[32] E. Bierstone, "Differentiable functions," Boletim da Sociedade Brasileira de Matemática - Bulletin/Brazilian Mathematical Society, vol. 11, no. 2, pp. 139–189, 1980.
[33] R. Hardt, "Some analytic bounds for subanalytic sets," Differential Geometric Control Theory, Progress in Math, vol. 27, pp. 259–267, 1983.
[34] A. M. Gabrièlov, "Projections of semi-analytic sets," Functional Analysis and its Applications, vol. 2, no. 4, pp. 282–291, 1968.
[35] S. Łojasiewicz, "On semi-analytic and subanalytic geometry," Banach Center Publications, vol. 34, no. 1, pp. 89–104, 1995.
[36] M. E. Baran and F. F. Wu, "Optimal capacitor placement on radial distribution systems," IEEE Transactions on Power Delivery, vol. 4, no. 1, pp. 725–734, 1989.
[37] ——, "Optimal sizing of capacitors placed on a radial distribution system," IEEE Transactions on
Power Delivery, vol. 4, no. 1, pp. 735–743, 1989.
[38] B. Zhang and D. Tse, "Geometry of injection regions of power networks," IEEE Transactions on Power Systems, vol. 28, no. 2, pp. 788–797, 2013.
[39] R. Jabr, A. Coonick, and B. Cory, "A primal-dual interior point method for optimal power flow dispatching," IEEE Transactions on Power Systems, vol. 17, no. 3, pp. 654–662, 2002.
[40] S. Gopinath, H. Hijazi, T. Weißer, H. Nagarajan, M. Yetkin, K. Sundar, and R. Bent, "Proving global optimality of ACOPF solutions,"