Conditions for Exact Convex Relaxation and No Spurious Local Optima
Fengyu Zhou, Student Member, IEEE, and Steven H. Low, Fellow, IEEE
Abstract—Non-convex optimization problems can be approximately solved via relaxation or local algorithms. For many practical problems such as optimal power flow (OPF) problems, both approaches tend to succeed in the sense that relaxation is usually exact and local algorithms usually converge to a global optimum. In this paper, we study conditions which are sufficient or necessary for such non-convex problems to simultaneously have exact relaxation and no spurious local optima. Those conditions help us explain the widespread empirical experience that local algorithms for OPF problems often work extremely well.
Index Terms—Convex relaxation, local optimum, optimal power flow, semidefinite program.
I. INTRODUCTION

NON-CONVEX optimization problems in general are computationally challenging. However, many heuristics tend to work well for real-world problems. Those approaches include convex relaxations and local algorithms. It is usually hoped that relaxations yield exact solutions and that local optima are also globally optimal. In this paper, we derive conditions, sufficient or necessary, for these two properties to hold simultaneously. Our focus is specifically on optimization formulations with a convex cost and non-convex constraints.
A. Related Work
(Partial and preliminary results have appeared in [1]. Fengyu Zhou and Steven H. Low are with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125 USA; e-mail: {f.zhou, slow}@caltech.edu.)

Many problems have been proved to have exact relaxation and no spurious local optima (such as matrix completion [2], [3], [4] and low rank semidefinite programs [5], [6], [7]); the proofs of these two properties are usually based on different types of certificates. In this subsection, we review some widely used certificates for each property.

One type of certificate exhibits relaxation exactness by showing that any relaxed (and infeasible) point maps to a feasible solution with lower cost. This asserts that relaxed points cannot be optimal. For instance, [8], [9] prove that optimal power flow problems can be solved via second-order cone relaxation under certain conditions, using the argument that any solution in the interior of the second-order cone can always be moved towards the boundary to further reduce the cost. In [5], [6], it is proved that if a semidefinite program has a solution with sufficiently large rank, then one can always reduce the rank without increasing the cost or violating the constraints. Another type of certificate involves studying the dual variables and KKT conditions. The underlying idea is that a pair of primal and dual solutions satisfying the KKT conditions certifies optimality for both the primal and dual problems. Thus constructing dual variables with certain structures can also certify the optimality of primal solutions. In [2] for instance, the dual variable is related to the subgradient of the cost function at a desired matrix and therefore helps certify the optimality of that desired matrix. Another example is [10], which bounds the rank of the primal matrix through the argument that the null space of its dual matrix has bounded dimension. Similar techniques are also used in [11], [12], [13].

There is also considerable literature establishing the global optimality of local optima. We refer to [14], [15] and references therein. In [14], the authors focus on a class of problems with a twice continuously differentiable cost function and a Riemannian manifold as the feasible set. The values of the Riemannian gradient and Hessian at a given point then help certify properties such as strong gradient, negative curvature or local convexity in its neighborhood. This eliminates spurious local optima and saddle points, where local algorithms can be trapped. This technique was also used in [16] for the dictionary recovery problem and in [17] for phase synchronization. In both problems, the Riemannian manifold is some n-sphere or a Cartesian product of n-spheres. The framework summarized in [15] also leverages the landscape of the cost function, and the problem is usually reformulated into an unconstrained form. Instead of explicitly computing the gradient and Hessian matrix, the paper shows it suffices to find a single direction of improvement.
For certain symmetric positive definite problems, the paper shows that the decision variable always gets closer to the global optimizer as the cost is reduced. A similar idea was also applied in [18], where the main result is built upon a correlation condition stating that the gradient (or any update rule) is correlated with the direction from the current location towards the global optimizer. Therefore the underlying algorithm, such as gradient descent, always produces solutions closer to the global optimizer as it progresses.

B. Contribution
The brief review above shows most works study exact relaxation and local optimality separately. It is unclear what might be the common feature of non-convex problems that possess both properties. Many real-world non-convex problems, however, seem to possess both properties, either provably or empirically, and it is hard to explain why these nice properties, though seemingly different, often occur simultaneously. Besides, most literature on local optimality focuses on problems without constraints or with tractable constraints. This is usually the case for problems in the learning area. However, for problems arising in cyber-physical systems, the constraints could include non-convex functions enforced by physical laws, as we will see in power systems. In these cases, either the feasible set is not a Riemannian manifold, or the Riemannian gradient and Hessian are very hard to derive. These questions motivate us to study conditions, sufficient or necessary, for problems to simultaneously have exact relaxation and no spurious local optima. These conditions also help us study local optimality using properties of a problem's relaxation, instead of its landscape.

Our conditions have two parts. The first part, which also appeared in [1], is the sufficient condition. Roughly, if for any relaxed point there exists a path connecting it to the non-convex feasible set such that
• along the path the cost is non-increasing, and
• along the path the 'distance' to the non-convex feasible set is non-increasing,
then the problem must have exact relaxation and no spurious local optima simultaneously. Here the 'distance' can be any properly constructed function, as we will define later as a Lyapunov-like function (Definition 10).
The second part is the necessary condition, which says that if a problem does have exact relaxation and no spurious local optima simultaneously, then there must exist such a Lyapunov-like function and paths satisfying the requirements above.

Though Lyapunov-like functions and paths are guaranteed to exist, for specific problems it could still be difficult to construct them. We then derive certain rules to construct a Lyapunov-like function and paths of a new problem from primitive problems with known Lyapunov-like functions and paths. This process allows us to reuse and extend known results as the problem changes and grows. Finally, we apply the proposed approach to two specific problems, optimal power flow (OPF) and low rank SDP. Our work proves the first known condition (that can be checked a priori) for OPF to have no spurious local optima, and it helps explain the widespread empirical experience that local algorithms for OPF problems often work extremely well.
C. Background for Power Systems
As one of the applications and the main motivation of this work, OPF is a core problem in power systems. First proposed in [19], OPF is a class of optimization problems that minimize a certain cost subject to nonlinear physical laws and operational constraints. It is known to be non-convex and NP-hard in its AC formulation [20], [10], [21]. Therefore, there is no known efficient algorithm that can solve all problem instances in polynomial time. Traditional approaches to solving OPF are usually based on local algorithms such as Newton-Raphson; see [22], [23], [24] for examples. Over the past decade, techniques based on convex relaxation have also been introduced to solve OPF [25], [26]. A surprising empirical finding in the literature shows that despite the non-convexity, both local algorithms and convex relaxations very often yield a global optimum of the original non-convex problem [25], [26], [10], [27]. In recent years, there have been considerable analytical works on provable conditions for relaxation exactness, which are summarized in the reviews [28], [29] and references therein. However, few analytical results are known on the performance guarantee of local algorithms. In this paper, we show that a known sufficient condition for exact relaxation is also sufficient for local optima to be globally optimal. (The necessary condition is based upon stronger assumptions, so the second part is not the exact converse of the first part.) To the best of our knowledge, this is the first analytical result of its kind, and we hope that the approaches developed in this paper can help derive more sufficient conditions along this direction.

II. PRELIMINARIES
In this paper, we will use K to denote the set ℝ of real numbers or the set ℂ of complex numbers. For any finite positive integer n, Kⁿ is a Banach space.

Consider a (potentially non-convex) optimization problem

  minimize_x  f(x)    (1a)
  subject to  x ∈ X    (1b)

and its convex relaxation

  minimize_x  f(x)    (2a)
  subject to  x ∈ X̂.    (2b)

Here X is a nonempty compact subset of Kⁿ, not necessarily convex, while X̂ ⊆ Kⁿ is an arbitrary compact and convex superset of X. The cost function f : X̂ → ℝ is convex and continuous over X̂. We do not require the relaxation X̂ to be efficiently represented.

Definition 1. A point x_lo ∈ X is called a local optimum of (1) if there exists a δ > 0 such that f(x_lo) ≤ f(x) for all x ∈ X with ‖x − x_lo‖ < δ.

Definition 2 (Strong Exactness). We say the relaxation (2) is exact with respect to (1) if every optimal point of (2) is feasible, and hence globally optimal, for (1).

Unless otherwise specified, we will always use the term exact to refer to such strong exactness. Definition 2 implies in particular that, if (2) is exact, then ∀ x̂ ∈ X̂ \ X, f(x̂) > min_{x ∈ X̂} f(x).

Definition 3. A path in S ⊆ Kⁿ connecting point a to point b is a continuous function h : [0, 1] → S such that h(0) = a and h(1) = b.

We may refer to a path by the corresponding function h in the remainder of the paper.

Lemma 1.
The following are equivalent:
(A) Problem (2) is exact with respect to (1).
(B) For any x ∈ X̂ \ X, there is a path h in X̂ such that h(0) = x, h(1) ∈ X, f(h(t)) is non-increasing for t ∈ [0, 1], and f(h(0)) > f(h(1)).

Proof. (A) ⟹ (B): Let x* be any optimal point of (2). By (A), x* ∈ X; thus for x ∈ X̂ \ X, we can choose the path as the line segment from x to x*, since X̂ is convex. (B) ⟹ (A): Condition (B) implies that no point x ∈ X̂ \ X can be optimal for (2).

Lemma 1 is not surprising, and in fact many works in the literature proving exact relaxations of optimal power flow problems can be interpreted as using (B) to prove (A) by implicitly finding such a path h for each x ∈ X̂ \ X [28]. Condition (B) does not say anything about the local optima in X for (1). In the next section we will strengthen (B) by equipping the path with a Lyapunov-like function and show that the stronger condition implies that all local optima of (1) are globally optimal. We start by classifying local optima.
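As a concrete numerical sketch of condition (B), consider a toy instance of our own (it is illustrative only and not an example from the paper): X is the unit circle in ℝ² (non-convex), X̂ is the closed unit disk, and f is a linear cost. The global minimizer over the disk lies on the circle, so the relaxation is exact, and the line segment used in the proof of (A) ⟹ (B) is easily checked to have non-increasing cost.

```python
import numpy as np

# Toy instance (our own illustration, not from the paper):
# X = unit circle (non-convex), X_hat = closed unit disk (convex superset),
# f(x) = c^T x with c = (1, 0).  The minimizer of f over the disk is
# x* = (-1, 0), which already lies on the circle, so relaxation (2) is
# exact, and the segment from a relaxed point to x* is a path as in (B).
c = np.array([1.0, 0.0])
x_star = np.array([-1.0, 0.0])

def f(x):
    return float(c @ x)

rng = np.random.default_rng(0)
for _ in range(100):
    # sample a point strictly inside the disk, i.e. in X_hat \ X
    x = rng.uniform(-1.0, 1.0, size=2)
    x *= 0.9 / max(np.linalg.norm(x), 1.0)
    # h(t) = (1 - t) x + t x*: stays in the convex disk and ends in X;
    # f(h(t)) = (1 - t) f(x) + t f(x*) is affine and decreasing in t
    ts = np.linspace(0.0, 1.0, 201)
    costs = [f((1 - t) * x + t * x_star) for t in ts]
    assert all(a >= b for a, b in zip(costs, costs[1:]))  # non-increasing
    assert costs[0] > costs[-1]                           # f(h(0)) > f(h(1))
print("condition (B) holds along the sampled segments")
```

For a linear cost the monotonicity along the segment is immediate; the same check applies verbatim to any convex f, since f(h(t)) ≤ (1 − t) f(x) + t f(x*).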
Definition 4. We classify each local optimum x_lo of (1) into three disjoint classes: x_lo is a
• Global optimum (g.o.) if f(x_lo) ≤ f(x) for all feasible x ∈ X.
• Pseudo local optimum (p.l.o.) if there is a path h : [0, 1] → X such that h(0) = x_lo, f(h(t)) ≡ f(x_lo) for all t ∈ [0, 1], and h(1) is not a local optimum.
• Genuine local optimum (g.l.o.) if it is neither a global optimum nor a pseudo local optimum.
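The plateau mechanism behind a pseudo local optimum can be made concrete with a small one-dimensional sketch of our own (the feasible set and cost below are illustrative assumptions, not an example from the paper): the cost is flat on part of the feasible set and strictly decreasing beyond it, so every point on the plateau is a local optimum connected by a constant-cost path to a point that is not.

```python
import numpy as np

# Our own 1-D illustration of Definition 4 (not from the paper).
# Feasible set X = [0, 2]; the cost is flat on [0, 1] and strictly
# decreasing on (1, 2], with the global optimum at x = 2.
def f(x):
    return 0.0 if x <= 1.0 else -(x - 1.0) ** 2

def is_local_opt(x, delta=1e-3, grid=50):
    # crude numerical check of Definition 1 on a delta-neighborhood
    nbhd = np.linspace(max(0.0, x - delta), min(2.0, x + delta), grid)
    return all(f(x) <= f(y) + 1e-15 for y in nbhd)

x0 = 0.5
assert is_local_opt(x0)                 # x0 is a local optimum...
assert f(x0) > f(2.0)                   # ...but not a global one
# the constant-cost path h(t) = (1 - t) x0 + t reaches x = 1,
ts = np.linspace(0, 1, 101)
assert all(abs(f((1 - t) * x0 + t * 1.0) - f(x0)) < 1e-15 for t in ts)
assert not is_local_opt(1.0)            # ...which is not a local optimum,
print("x0 = 0.5 is a pseudo local optimum")  # so x0 is pseudo (Definition 4)
```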
Examples of all three classes are shown in Fig. 1.

Fig. 1. Examples for the three classes of local optima. The arrow indicates the direction along which the cost function decreases. Point b is a global optimum, point c is a pseudo local optimum, while points a and d are genuine local optima.

Definition 5.
A point x is improvable in X if there is a path h : [0, 1] → X such that
• h(0) = x;
• f(h(t)) is non-increasing for t ∈ [0, 1];
• h(1) is not a local optimum or f(h(1)) < f(x).

Remark 1.
A local optimum is a pseudo local optimum if and only if it is improvable in X.

Definition 6.
A set {h_i : i ∈ I} of paths indexed by i is said to be uniformly bounded if there is a finite number M such that ‖h_i(t)‖_∞ ≤ M for every i ∈ I and t ∈ [0, 1].

Definition 7.
A set {h_i : i ∈ I} of paths indexed by i is said to be uniformly equicontinuous if for any ε > 0, there exists a δ > 0 such that ‖h_i(t_1) − h_i(t_2)‖_∞ < ε for every i ∈ I whenever |t_1 − t_2| < δ.

Remark 2.
The index set I could be empty or uncountably infinite. An empty path set (i.e., when I = ∅) is considered to be both uniformly bounded and uniformly equicontinuous.

Let Π|_a^b be the family of all finite ordered subsets of [a, b]. We use Π as a shorthand for Π|_0^1. For π = (t_0, t_1, …, t_N) ∈ Π and a path h, define

  L_π(h) := Σ_{i=1}^{N} ‖h(t_{i−1}) − h(t_i)‖_{ℓ2}.

Clearly, L_π(h) is always finite for given π and h.

Definition 8 ([30]). For a path h, define the function L(h) := sup_{π ∈ Π} L_π(h). We say h is rectifiable iff L(h) is finite. When h is rectifiable, L(h) is also referred to as its length.

Definition 9 ([30]). For a rectifiable path h : [0, 1] → Kⁿ, let its arc-length reparameterization be h̄ : [0, 1] → Kⁿ with

  h̄( sup_{π ∈ Π|_0^t} L_π(h) / L(h) ) := h(t),  if L(h) > 0;
  h̄ := h,  if L(h) = 0.

One can see that L(h̄) = L(h) < ∞ and that the two paths have the same image, i.e., {h̄(t) | t ∈ [0, 1]} = {h(t) | t ∈ [0, 1]}. For 0 ≤ t_1 ≤ t_2 ≤ 1, h̄ has the property that sup_{π ∈ Π|_{t_1}^{t_2}} L_π(h̄) = (t_2 − t_1) L(h̄).
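Definition 8 can be sketched numerically: for a smooth path, the polygonal sums L_π(h) increase under refinement of the partition π and converge to the length L(h). The quarter-circle path below is our own illustration (not from the paper); its length is π/2.

```python
import numpy as np

# Numerical sketch of Definition 8 (our own illustration): approximate
# L(h) = sup_pi L_pi(h) for the quarter-circle path
# h(t) = (cos(pi/2 * t), sin(pi/2 * t)), whose true length is pi/2.
def h(t):
    return np.array([np.cos(np.pi / 2 * t), np.sin(np.pi / 2 * t)])

def L_pi(h, partition):
    # inscribed polygonal length for one finite partition of [0, 1]
    pts = [h(t) for t in partition]
    return sum(np.linalg.norm(a - b) for a, b in zip(pts, pts[1:]))

lengths = [L_pi(h, np.linspace(0, 1, N + 1)) for N in (1, 2, 8, 64, 1024)]
# each uniform partition refines the previous one, so L_pi is monotone,
# and the supremum is approached as the mesh shrinks
assert all(l1 <= l2 + 1e-12 for l1, l2 in zip(lengths, lengths[1:]))
assert abs(lengths[-1] - np.pi / 2) < 1e-5
print("polygonal lengths:", [round(l, 6) for l in lengths])
```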
Lemma 2. For a set of rectifiable paths h_i, i ∈ I, if the values of L(h_i) are uniformly bounded, then the set of h̄_i, i ∈ I, is uniformly equicontinuous.

Proof. Assume L(h_i) ≤ M for all i ∈ I. Then for any 0 ≤ t_1 ≤ t_2 ≤ 1 we have, for any i,

  ‖h̄_i(t_1) − h̄_i(t_2)‖_∞ ≤ ‖h̄_i(t_1) − h̄_i(t_2)‖_{ℓ2} ≤ sup_{π ∈ Π|_{t_1}^{t_2}} L_π(h̄_i) = (t_2 − t_1) L(h_i) ≤ M |t_1 − t_2|.

Setting δ = ε/M, the equicontinuity is proved.

Corollary 1. If S is compact in Kⁿ and all paths in a set H = {h_i : i ∈ I} map [0, 1] → S and consist of at most N linear segments, then {h_i : i ∈ I} must be both uniformly bounded and uniformly equicontinuous. Here N is a finite constant for all paths in H.

III. SUFFICIENT CONDITIONS
In this section, we first study sufficient conditions under which (2) is exact w.r.t. (1) and all local optima of (1) are also globally optimal. These sufficient conditions will be proposed by strengthening Condition (B). Note that (B) already implies that (2) is exact w.r.t. (1), so our strategy is to strengthen (B) in order to rule out the possibility of genuine local optima and pseudo local optima.
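To see why the strengthening is needed, here is a minimal one-dimensional instance of our own (an illustrative assumption, not an example from the paper) in which the relaxation is exact, i.e., (B) holds, and yet a genuine local optimum survives.

```python
import numpy as np

# Our own 1-D illustration (not from the paper): exactness alone does not
# rule out spurious local optima.  X = {-1, 1} (non-convex),
# X_hat = [-1, 1], f(x) = (x + 1)^2 (convex).
f = lambda x: (x + 1.0) ** 2

# The relaxation is exact: the minimizer of f over [-1, 1] is x = -1,
# which belongs to X, so every optimal point of (2) is feasible for (1).
xs = np.linspace(-1, 1, 100001)
assert abs(xs[np.argmin(f(xs))] - (-1.0)) < 1e-4

# Yet x = 1 is a genuine local optimum of (1): its only feasible
# neighbor is itself, while f(1) = 4 > f(-1) = 0.
assert f(1.0) > f(-1.0)
print("exact relaxation, but x = 1 is a genuine local optimum")
```

Condition (C) below excludes this instance: any Lyapunov-like V would have to be non-increasing along a path in X̂ from points near x = 1 down to X, which fails because such paths must first move away from the isolated feasible point.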
A. Ruling Out Genuine Local Optima
Definition 10. A Lyapunov-like function associated with (1) and (2) is a continuous function V : X̂ → ℝ₊ such that V(x) = 0 for x ∈ X and V(x) > 0 for x ∈ X̂ \ X. (In contrast to a standard Lyapunov function, we do not require V to be differentiable here.)

A strengthened version of (B) is as follows.

Fig. 2. Sketch of the notations for the proof of Theorem 1. Point x and ℓ(t†) will later be proved to be identical.

(C) There exists a Lyapunov-like function V associated with (1) and (2) such that:
(C1) For any x ∈ X̂ \ X, there is a path h_x in X̂ such that h_x(0) = x, h_x(1) ∈ X, both f(h_x(t)) and V(h_x(t)) are non-increasing for t ∈ [0, 1], and f(h_x(0)) > f(h_x(1)).
(C2) The set {h_x}_{x ∈ X̂\X} is uniformly bounded and uniformly equicontinuous.
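Condition (C) can be sketched on the circle/disk toy instance (our own illustrative assumptions; the instance, the function V and the two-phase paths below are not from the paper). Take V(x) = 1 − ‖x‖, which is continuous, zero exactly on the circle, and positive inside the disk; a path that first rotates at constant radius toward the optimizer's angle and then moves radially out to the circle keeps both f and V non-increasing.

```python
import numpy as np

# Our own illustration of Definition 10 and (C1) on the toy instance
# X = unit circle, X_hat = unit disk, f(x) = x[0].
# Candidate Lyapunov-like function: V(x) = 1 - ||x|| (zero exactly on X).
f = lambda x: x[0]
V = lambda x: 1.0 - np.linalg.norm(x)

def path(x, t):
    # two-phase path toward the global optimizer x* = (-1, 0):
    r, theta = np.linalg.norm(x), np.arctan2(x[1], x[0])
    tgt = np.pi if theta >= 0 else -np.pi   # angle of x*, reached monotonically
    if t <= 0.5:
        # phase 1, rotate: V constant, f = r*cos(angle) non-increasing
        ang = theta + 2 * t * (tgt - theta)
        return r * np.array([np.cos(ang), np.sin(ang)])
    # phase 2, radial: V decreases to 0, f = -radius decreases
    rad = r + (2 * t - 1) * (1.0 - r)
    return np.array([-rad, 0.0])

rng = np.random.default_rng(1)
for _ in range(50):
    x = rng.uniform(-1, 1, 2)
    x *= 0.9 / max(np.linalg.norm(x), 1.0)   # a point in X_hat \ X
    ts = np.linspace(0, 1, 401)
    fs = [f(path(x, t)) for t in ts]
    Vs = [V(path(x, t)) for t in ts]
    assert all(a >= b - 1e-9 for a, b in zip(fs, fs[1:]))   # f non-increasing
    assert all(a >= b - 1e-9 for a, b in zip(Vs, Vs[1:]))   # V non-increasing
    assert fs[0] > fs[-1] and abs(Vs[-1]) < 1e-12           # ends on X, lower cost
print("condition (C1) verified along the constructed paths")
```

Note that the straight segment from x to x* would not do here: its norm, hence V, can first increase; the rotate-then-radial construction is what makes V monotone.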
Theorem 1.
If (C) holds, then (A) also holds and any local optimum in X for (1) is either a global optimum or a pseudo local optimum.

Proof. (C) ⟹ (A) because (C) is stronger than (B). As for the second part of the argument, we include an illustrative sketch of the notations in Fig. 2. Suppose x ∈ X is a local but not global optimum for (1). We will prove that x must be improvable in X (and thus a pseudo local optimum).

Let x* ≠ x be a global optimum of (1), so f(x*) < f(x). Let ℓ : [0, 1] → X̂ be the linear function characterizing the line segment from x to x*, i.e., ℓ(t) = (1 − t)x + t x*, with f(ℓ(1)) = f(x*) < f(x). Note that f(ℓ(t)) is non-increasing in t. To see this, consider any t ≥ 0 and ε > 0 with t + ε ≤ 1, and let x_1 = ℓ(t), x_2 = ℓ(t + ε). Setting s := ε/(1 − t), we have x_2 = (1 − s) x_1 + s x*. Since f is convex and x* is also a global optimum of (2) over X̂, we have

  f(x_2) ≤ (1 − s) f(x_1) + s f(x*) ≤ f(x_1).

Define

  t† := sup { t ∈ [0, 1] s.t. ℓ(τ) ∈ X ∀ τ ≤ t }.

As X is closed, ℓ(t†) is also in X. We first prove that ℓ(t†) must be x (i.e., t† = 0). Otherwise, as x is a local optimum, we could find δ ∈ (0, t†) such that f(ℓ(t)) ≥ f(ℓ(0)) = f(x) for all t ∈ [0, δ). Since f(ℓ(t)) is non-increasing in t, we must have f(ℓ(t)) ≡ f(ℓ(0)) = f(x) for all t ∈ [0, δ). This contradicts the fact that f(ℓ(t)) is convex and f(ℓ(1)) = f(x*) < f(x) = f(ℓ(0)), for the same reason that f(ℓ(t)) is non-increasing in t.

Therefore ℓ(t†) = x and f(ℓ(t†)) = f(x). It is sufficient to show ℓ(t†) is improvable in X.
That is to say, it is sufficient to find some function h : [0, 1] → X such that h(0) = ℓ(t†), f(h(t)) is non-increasing in t ∈ [0, 1], and either f(h(1)) < f(h(0)) or h(1) is not a local optimum. (The strict inequality is due to the convexity of f(ℓ(t)) and the fact that f(ℓ(1)) < f(ℓ(t†)).)

B. Ruling Out Pseudo Local Optima

So far, Condition (C) has eliminated the possibility of having genuine local optima, and in this subsection we further strengthen the condition to also rule out pseudo local optima. Consider the following lemma and its corollaries.

Lemma 3. If (2) is exact with respect to (1) and (1) has no genuine local optima, then the feasible set of (1) is connected.

Proof. If X is not connected, then by definition X can be partitioned into two disjoint non-empty closed sets X_1 and X_2 with X = X_1 ∪ X_2, which are hence both compact. Further, we let x_i be any global optimum of min_{x ∈ X_i} f(x) for i = 1, 2. Clearly x_1 ≠ x_2 and they are both local optima of (1).

If f(x_1) = f(x_2), then any convex combination of x_1 and x_2 must be a global optimum of (2). Since there is no path in X that connects x_1 and x_2, there must be some convex combination that is outside X. This contradicts the exactness of the relaxation.

If f(x_1) ≠ f(x_2), without loss of generality we assume f(x_1) < f(x_2), i.e., x_2 is not a global optimum of (1). But x_2 is not a pseudo local optimum of (1) either, so it is a genuine local optimum, contradicting the assumption. To see this, note that any point x′ ∈ X which is connected to x_2 via a path in X must also be a point in X_2, and if f(x′) = f(x_2) then x′ must be a local optimum of (1) as well.

Corollary 2. Condition (C) implies that the feasible set of (1) is connected.

Now we are in a good position to discuss some conditions that rule out pseudo local optima and therefore guarantee that any local optimum must be a global optimum.

Corollary 3. If all local optima of (1) are isolated, then Condition (C) implies that any local optimum of (1) is a global optimum.
Here, local optima being isolated means any local optimum of (1) has an open neighborhood which contains no other local optimum. The proof is straightforward, as by definition an isolated local optimum cannot be a pseudo local optimum. In fact, in this case the optimum can be proved to be unique as well.

Another way to eliminate pseudo local optima is by strengthening the monotonicity of f(h_x(t)) in Condition (C). Consider the following condition, which is slightly stronger than (C).

(C′) Condition (C) holds, and there exists k > 0 such that ∀ x ∈ X̂ \ X and ∀ 0 ≤ t < s ≤ 1 we have

  f(h_x(t)) − f(h_x(s)) ≥ k ‖h_x(t) − h_x(s)‖.    (3)

In Condition (C′), ‖·‖ could be any norm on Kⁿ. As a caveat, the ℓ₀-"norm" is not allowed here, as it is not a norm: it does not satisfy ‖αx‖ = |α| ‖x‖. Note that Condition (C) already implies f(h_x(t)) − f(h_x(s)) ≥ 0, while (C′) strengthens this by enforcing a positive lower bound depending on h_x.

Theorem 2. If (C′) holds, then any local optimum of (1) must be a global optimum.

Proof. Following the proof of Theorem 1, suppose x ∈ X is a local but not global optimum for (1). Then we have x = ℓ(t†) and can obtain a limit point of the sequence h_m, denoted as h. Since both sides of (3) are continuous in h_m(t) and h_m(s), and the limits of h_m(t) and h_m(s) are h(t) and h(s), we must have, whenever h(t) ≠ h(s),

  f(h(t)) − f(h(s)) ≥ k ‖h(t) − h(s)‖ > 0.

Taking t = 0, we can conclude that h(0) (which is the same point as x) is not a local optimum of (1).

IV. NECESSARY CONDITIONS

In this section we will study necessary conditions for a non-convex problem to have exact relaxation and no spurious local optima simultaneously. It turns out the results are not exactly the converses of Theorem 1 or Theorem 2, but hold in a slightly weaker sense.
Specifically, we show that if a non-convex problem is known to have exact relaxation and no spurious local optima simultaneously, then a Lyapunov-like function and paths satisfying Condition (C) are guaranteed to exist. However, it still may or may not be easy to find those functions or paths in practice for a specific problem.

A. Results

Assumption 1. The feasible set X is semianalytic and the cost function f is analytic.

We refer to [31] for more detailed definitions and properties of semianalytic sets. This assumption is not restrictive for most engineering problems. If K is chosen as ℂ, then we suggest viewing all the complex functions as functions of real variables by separating the real and imaginary parts, and the space ℂⁿ can be viewed as a shorthand for ℝ²ⁿ in this section.

Theorem 3 (necessary condition). If (2) is exact with respect to (1) and any local optimum of (1) is globally optimal, then there exists a Lyapunov-like function V and a corresponding family of paths {h_x}_{x ∈ X̂\X} satisfying (C1) and (C2).

Remark 3. Note that Theorem 3 is NOT the converse of Theorem 1 in a strict sense. There are a few differences in their settings.
• Theorem 1 allows pseudo local optima in its conclusion, while Theorem 3 disallows them in its premise.
• Theorem 3 relies on Assumption 1 while Theorem 1 does not.

B. Proof Setup

In the rest of the section, we will prove Theorem 3. From now on, we assume (2) is exact with respect to (1) and any local optimum of (1) is also globally optimal. We first have the following definition and lemmas, which are the main reasons we introduced Assumption 1.

Definition 11 (Whitney regularity [31], [32], [33]). For a compact set U ⊂ Kⁿ and a positive integer p, we say U is p-regular if there exists C > 0 such that ∀ x, y ∈ U, x and y can be joined by a rectifiable curve h in U satisfying L(h) ≤ C ‖x − y‖^{1/p}.

Lemma 4 (Theorem 6.10 in [31]).
If U is a compact connected subanalytic subset of Kⁿ, then there is a positive integer p such that U is p-regular, and the curves can always be chosen semianalytic.

The proof of Lemma 4 can be found in [31]. Note that any semianalytic set is also subanalytic.

Lemma 5. For any x_0 ∈ X that is not a local optimum of (1) and for any ε > 0, there exists a path h in X such that h(0) = x_0, f(h(t)) is non-increasing in t, f(h(1)) < f(h(0)) and L(h) < ε.

Proof. Consider the set U := {x ∈ X : f(x) ≤ f(x_0)}, which by definition is also semianalytic. Since x_0 ∈ X is not an optimum of (1), the problem min_{x ∈ U} f(x) must also have an exact relaxation, and it does not introduce new local optima compared to (1). By Lemma 3, U must be connected. According to Lemma 4, there is a rectifiable and semianalytic curve h in U such that h(0) = x_0, L(h) < ε, f(h(1)) < f(x_0) and f(h(t)) ≤ f(x_0) for all t ∈ [0, 1]. Here h(1) can be chosen as any point in U which has a strictly smaller cost value than x_0 and is sufficiently close to x_0 in Euclidean distance. It is known that a semianalytic curve is analytic except at a finite number of points [34]. Assume h(t) is not analytic at a_1 < a_2 < · · · < a_k = 1 where k ≥ 1. By the theorem on the parametrization of a semianalytic arc in [35] and the assumption that f is analytic, the value of f(h(t)) within any interval [a_{ℓ−1}, a_ℓ] should be equal to some analytic function defined over an open superset of [a_{ℓ−1}, a_ℓ]. Since f(h(1)) < f(h(0)), the function f(h(t)) cannot be a constant function over [0, 1]. Let [a_{ℓ−1}, a_ℓ] be the first interval within which f(h(t)) is not constant; then f(h(a_{ℓ−1})) = f(h(0)).
As f(h(t)) within [a_{ℓ−1}, a_ℓ] equals an analytic function defined over an open superset of [a_{ℓ−1}, a_ℓ], there must be a small subinterval [a_{ℓ−1}, a_{ℓ−1} + δ) for some δ > 0 within which we always have

  f(h(t)) = f(h(a_{ℓ−1})) + Σ_{i=0}^{∞} c_i (t − a_{ℓ−1})^i,

where the right hand side is the Taylor expansion of f(h(t)) at a_{ℓ−1}. Since f(h(t)) is not constant over [a_{ℓ−1}, a_ℓ], the coefficients c_i cannot all be zero, by the identity theorem. Suppose c_i is the first nonzero coefficient in the sequence {c_i}; then we have two cases. If c_i > 0, then f(h(t)) is strictly increasing within [a_{ℓ−1}, a_{ℓ−1} + δ′) for some small positive δ′ < δ. This contradicts the facts that f(h(a_{ℓ−1})) = f(x_0) and f(h(t)) ≤ f(x_0) for all t ∈ [0, 1]. If c_i < 0, then f(h(t)) is strictly decreasing within [a_{ℓ−1}, a_{ℓ−1} + δ′) for some small positive δ′ < δ. Then we can construct a new path h̃ such that h̃(t) = h(t (a_{ℓ−1} + δ′)) for all t ∈ [0, 1]. It is easy to check that such h̃ satisfies all the requirements in Lemma 5. (We can choose h(1) as described because x_0 is not a local optimum of (1); the inequality L(h) < ε is satisfied because of the p-regularity of U.)

Now we consider weaker versions of (C1) and (C2).

(C3) For any x ∈ X̂ \ X, there is a path h_x in X̂ such that h_x(0) = x, h_x(1) ∈ X, and both f(h_x(t)) and V(h_x(t)) are non-increasing for t ∈ [0, 1].
(C4) All the {L(h_x)}_{x ∈ X̂\X} are finite and uniformly bounded.

Compared to (C1), (C3) does not require f(h_x(0)) > f(h_x(1)) to hold strictly. Then we have a weaker version of Theorem 3 as follows.

Lemma 6 (weaker necessary condition). If (2) is exact with respect to (1) and any local optimum of (1) is also globally optimal, then there always exists a Lyapunov-like function V and a
corresponding family of paths {h_x}_{x ∈ X̂\X} satisfying (C3) and (C4).

We now show that Lemma 6, though weaker in its statement, actually implies Theorem 3, so later on we will only focus on the proof of Lemma 6. To see this, suppose V† and {h†_x}_{x ∈ X̂\X} are the Lyapunov-like function and paths guaranteed by Lemma 6. For each x ∈ X̂ \ X, if h†_x(1) is a local optimum (so it is also a global optimum) of (1), then we must have f(h†_x(1))

We construct V as

  V(x) = inf { L(h) : h ∈ Ĥ, h(0) = x, h(1) ∈ X }.    (4)

Lemma 7. For a sequence (h_i)_{i=1}^∞ where h_i ∈ Ĥ, if both (h_i)_{i=1}^∞ and (L(h_i))_{i=1}^∞ are uniformly bounded, then there must be a subsequence which uniformly converges to some h* such that its arc-length reparameterization, denoted h̄*, is in Ĥ. Furthermore, L(h̄*) = L(h*) ≤ lim sup_i L(h_i).

Proof. By Lemma 2, (h_i)_{i=1}^∞ is both uniformly bounded and uniformly equicontinuous. By the Arzelà-Ascoli theorem, a subsequence of (h_i)_{i=1}^∞ uniformly converges to a limit h*. Without loss of generality, we denote this subsequence as (h_i)_{i=1}^∞ as well. By the uniform limit theorem and the compactness of X̂, h* is a continuous function mapping [0, 1] to X̂. To show h̄* ∈ Ĥ, it is sufficient to show f(h*(t)) ≥ f(h*(1)) for all t ∈ [0, 1] and L(h*) < ∞. If f(h*(t_0)) = f(h*(1)) − ε for some t_0 ∈ [0, 1] and ε > 0, then for sufficiently large i we would have |f(h_i(t_0)) − f(h*(t_0))| < ε/4 and |f(h_i(1)) − f(h*(1))| < ε/4. Thus f(h_i(t_0)) ≤ f(h_i(1)) − ε/2, which contradicts h_i ∈ Ĥ.

Instead of showing L(h*) < ∞, we directly prove L(h*) ≤ lim sup_i L(h_i). Otherwise, there exists π = (t_0, …, t_N) ∈ Π such that L_π(h*) = lim sup_i L(h_i) + ε for some ε > 0.
For sufficiently large i, we have

  |L_π(h_i) − L_π(h*)|
    = | Σ_{j=1}^{N} ‖h_i(t_{j−1}) − h_i(t_j)‖_{ℓ2} − Σ_{j=1}^{N} ‖h*(t_{j−1}) − h*(t_j)‖_{ℓ2} |
    ≤ Σ_{j=1}^{N} ( ‖h_i(t_{j−1}) − h*(t_{j−1})‖_{ℓ2} + ‖h_i(t_j) − h*(t_j)‖_{ℓ2} )
    ≤ ε/2.

Thus, L(h_i) ≥ L_π(h_i) ≥ lim sup_i L(h_i) + ε/2 holds for sufficiently large i. This contradicts the definition of lim sup. As a result, we must have L(h*) ≤ lim sup_i L(h_i).

Lemma 8. The optimization in (4) is feasible and the optimal cost can be achieved.

Proof. We fix some x ∈ X̂. To show feasibility, consider h_fea(t) := (1 − t)x + t x*, which is feasible for (4). Let L_fea = L(h_fea). Since L_fea is finite and L(h) is non-negative, V(x) must be finite. To show the achievability of the optimal cost, suppose it is not achieved; then there must be a sequence of feasible (h_i)_{i=1}^∞ such that L_fea > L(h_i) ≥ L(h_{i+1}) > V(x) for all i ≥ 1, and lim_{i→∞} L(h_i) = V(x). The compactness of X̂ implies (h_i)_{i=1}^∞ is uniformly bounded as well. By Lemma 7, a subsequence of (h_i)_{i=1}^∞, denoted as (h_i)_{i=1}^∞ as well, uniformly converges to a limit h* with L(h̄*) = L(h*) ≤ V(x). Moreover,

  h̄*(0) = h*(0) = lim_{i→∞} h_i(0) = x,
  h̄*(1) = h*(1) = lim_{i→∞} h_i(1) ∈ X.

Altogether, we have proved that h̄* is feasible for (4), and the cost L(h̄*) is not worse than V(x). This contradicts the non-achievability assumption.

For each x ∈ X̂ \ X, we construct h_x as

  h_x = arg min { L(h) : h ∈ Ĥ, h(0) = x, h(1) ∈ X }.    (5)

If there are multiple minimizers then h_x can be chosen as any one of them.

Lemma 9. For x ∈ X̂ \ X, the function h_x is injective.

Proof. Suppose not; then for some x, there exist t_1 < t_2 such that h_x(t_1) = h_x(t_2).
Since h_x ∈ Ĥ ⊆ H̄, we have

  sup_{π ∈ Π|_{t_1}^{t_2}} L_π(h_x) = (t_2 − t_1) L(h_x) = (t_2 − t_1) V(x) > 0.

Consider a new path defined as

  h*(t) := h_x(t), if t ∈ [0, 1] \ [t_1, t_2];   h*(t) := h_x(t_1), if t ∈ [t_1, t_2].

It is easy to check that h* is continuous and lies entirely within X̂. For any t ∈ [0, 1], h_x ∈ Ĥ implies f(h*(t)) ≥ f(h_x(1)) = f(h*(1)). Further, we have

  L(h*) = sup_{π ∈ Π} L_π(h*)
    = sup_{π ∈ Π|_0^{t_1}} L_π(h*) + sup_{π ∈ Π|_{t_1}^{t_2}} L_π(h*) + sup_{π ∈ Π|_{t_2}^1} L_π(h*)
    = sup_{π ∈ Π|_0^{t_1}} L_π(h_x) + 0 + sup_{π ∈ Π|_{t_2}^1} L_π(h_x)
    < sup_{π ∈ Π|_0^{t_1}} L_π(h_x) + sup_{π ∈ Π|_{t_1}^{t_2}} L_π(h_x) + sup_{π ∈ Π|_{t_2}^1} L_π(h_x)
    = sup_{π ∈ Π} L_π(h_x) = L(h_x).

Altogether, the arc-length reparameterization of h*, denoted h̄*, is feasible for (5) but achieves a strictly lower cost than h_x. This contradicts the optimality of h_x.

Corollary 4. For distinct t_1, t_2, t_3 ∈ [0, 1], if f(h_x(t_2)) ≥ f(h_x(t_1)) and f(h_x(t_2)) > f(h_x(t_3)), then

  ‖h_x(t_1) − h_x(t_2)‖_{ℓ2} + ‖h_x(t_2) − h_x(t_3)‖_{ℓ2} > ‖h_x(t_1) − h_x(t_3)‖_{ℓ2}.

Proof. It is sufficient to show that h_x(t_2) is not a convex combination of h_x(t_1) and h_x(t_3). Otherwise, we assume h_x(t_2) = λ h_x(t_1) + (1 − λ) h_x(t_3) for some λ ∈ [0, 1]. First, Lemma 9 implies λ ≠ 0, 1. For λ ∈ (0, 1), the convexity of f implies

  f(h_x(t_2)) = f(λ h_x(t_1) + (1 − λ) h_x(t_3)) ≤ λ f(h_x(t_1)) + (1 − λ) f(h_x(t_3)) < λ f(h_x(t_2)) + (1 − λ) f(h_x(t_2)) = f(h_x(t_2)).

This contradiction shows h_x(t_2) is not a convex combination of h_x(t_1) and h_x(t_3). Then the triangle inequality implies the corollary.

Lemma 10. For each h_x defined in (5), f(h_x(t)) is non-increasing in t for t ∈ [0, 1].

Proof. We fix an x ∈ X̂ \ X and prove the result for the h_x defined above. Suppose not; then there exist 0 ≤ t_1 < t_2 ≤ 1 such that f(h_x(t_1)) < f(h_x(t_2)).
Now define

  t† = argmax_{t ∈ [t₁,1]} f(h_x(t)),   t‡ = max{ t ∈ [t†,1] : f(h_x(t)) = f(h_x(t†)) }.

In other words, t† is an arbitrary maximizer of f(h_x(t)) over [t₁,1], while t‡ is the largest such maximizer. Both t† and t‡ are well defined (due to continuity and closedness) and are strictly between t₁ and 1. We also have that f(h_x(t‡)) > f(h_x(1)) holds strictly. By the continuity of h_x(·) and f(h_x(·)), there exist r, δ > 0 such that [t‡−δ, t‡+δ] ⊆ (t₁, 1) and
• for x′ ∈ B(h_x(t‡), r) ∩ X̂, f(h_x(1)) ≤ f(x′);
• for t ∈ [t‡−δ, t‡), h_x(t) ∈ B(h_x(t‡), r); therefore f(h_x(1)) ≤ f(h_x(t)) ≤ f(h_x(t‡));
• for t ∈ (t‡, t‡+δ], h_x(t) ∈ B(h_x(t‡), r); therefore f(h_x(1)) ≤ f(h_x(t)) < f(h_x(t‡)).
Now we construct another path h* as

  h*(t) = h_x(t),   if t ∈ [0,1] \ [t‡−δ, t‡+δ],
  h*(t) = ((t‡+δ−t)/(2δ)) h_x(t‡−δ) + ((t−t‡+δ)/(2δ)) h_x(t‡+δ),   if t ∈ [t‡−δ, t‡+δ].

It is easy to verify that h* is continuous. For t ∈ [t‡−δ, t‡+δ], h*(t) is a convex combination of h_x(t‡−δ) and h_x(t‡+δ), and must lie within B(h_x(t‡), r) ∩ X̂, which is convex. Therefore, h* lies entirely within X̂ and f(h*(t)) ≥ f(h_x(1)) = f(h*(1)) holds for all t.

Next, we show L(h*) < L(h_x) by (6); the strict inequality in (6) follows from Corollary 4. In summary, the arc-length reparameterization of h*, denoted h̄*, is feasible for (5) but achieves a strictly lower cost than h_x. This contradicts the optimality of h_x.

D. Verification

1) To show V satisfies Definition 10: It is sufficient to show that V is continuous in x. The proof is twofold.
Abusing notation slightly, we let h_x(t) ≡ x for x ∈ X, so this h_x is the unique minimizer of (4) and L(h_x) = V(x) = 0 for x ∈ X.

First we show that for x₀ ∈ X̂ and ε > 0, there exists δ⁺ > 0 such that ∀x ∈ B(x₀, δ⁺) ∩ X̂, V(x) ≤ V(x₀) + ε. There are two scenarios. If h_{x₀}(1) is a global optimum of (1), then we can set δ⁺ = ε. For any x ∈ B(x₀, δ⁺) ∩ X̂, construct

  h*(t) = (1−2t) x + 2t x₀,  t ∈ [0, 1/2],
  h*(t) = h_{x₀}(2t − 1),    t ∈ (1/2, 1].

Its arc-length reparameterization h̄* is feasible for (4) (w.r.t. x) and V(x) ≤ L(h̄*) = ‖x − x₀‖_ℓ + L(h_{x₀}) ≤ V(x₀) + ε.

Next we focus on the scenario in which h_{x₀}(1) is not a global optimum of (1), so it is not a local optimum either. By Lemma 5, there is a path h° in X such that h°(0) = h_{x₀}(1), f(h°(t)) is non-increasing in t, f(h°(1)) < f(h°(0)) and L(h°) < ε/2. Suppose f(h°(0)) − f(h°(1)) = τ > 0. Since f is continuous, there must be some γ > 0 such that for any x ∈ B(x₀, γ) ∩ X̂, we have |f(x) − f(x₀)| < τ. Now we choose δ⁺ = min(γ, ε/2). For any x ∈ B(x₀, δ⁺) ∩ X̂, construct

  h*(t) = (1−3t) x + 3t x₀,  t ∈ [0, 1/3],
  h*(t) = h_{x₀}(3t − 1),    t ∈ (1/3, 2/3],
  h*(t) = h°(3t − 2),        t ∈ (2/3, 1].

Its arc-length reparameterization h̄* is feasible for (4) (w.r.t. x) and V(x) ≤ L(h̄*) = ‖x − x₀‖_ℓ + L(h_{x₀}) + L(h°) ≤ δ⁺ + V(x₀) + ε/2 ≤ V(x₀) + ε.

Second we show that for x₀ ∈ X̂ and ε > 0, there exists δ⁻ > 0 such that ∀x ∈ B(x₀, δ⁻) ∩ X̂, V(x) ≥ V(x₀) − ε. If not, then there must be a sequence (x_i)_{i=1}^∞ such that lim_{i→∞} x_i = x₀ but V(x_i) < V(x₀) − ε for all i ≥ 1. Let h_i := h_{x_i} for i ≥ 1; then both (h_i)_{i=1}^∞ and (L(h_i))_{i=1}^∞ are uniformly bounded.
By Lemma 7, a subsequence of (h_i)_{i=1}^∞ uniformly converges to a limit h* and

  L(h̄*) = L(h*) ≤ lim sup_i L(h_i) = lim sup_i V(x_i) ≤ V(x₀) − ε.

Lemma 7 also indicates that h̄* ∈ Ĥ and h*(0) = lim_{i→∞} h_i(0) = lim_{i→∞} x_i = x₀, h*(1) = lim_{i→∞} h_i(1) ∈ X (as X is closed). Therefore, h̄* is feasible for (4) but its cost is strictly lower than V(x₀), a contradiction.

2) To show (C3) holds: By our construction (5), h_x lies entirely within X̂, and h_x(0) = x, h_x(1) ∈ X. Lemma 10 shows that f(h_x(t)) is non-increasing for t ∈ [0,1]. It remains to show that V(h_x(t)) is also non-increasing for t ∈ [0,1]. Consider the following lemma.

Lemma 11. For fixed x ∈ X̂ and t₀ ∈ [0,1],

  V(h_x(t₀)) = sup_{π ∈ Π|[t₀,1]} L_π(h_x).

Proof. Let x₀ = h_x(t₀). We have L(h_{x₀}) = V(h_x(t₀)). If the lemma does not hold, then there are two cases.

First, if L(h_{x₀}) < sup_{π ∈ Π|[t₀,1]} L_π(h_x), then let

  h*(t) = h_x(2t t₀),      t ∈ [0, 1/2],
  h*(t) = h_{x₀}(2t − 1),  t ∈ (1/2, 1].

It is easy to check that h* is continuous and lies entirely within X̂, and h*(0) = x, h*(1) = h_{x₀}(1) ∈ X. For t ∈ [0, 1/2],

  f(h*(t)) = f(h_x(2t t₀)) ≥ f(h_x(t₀)) = f(x₀) = f(h_{x₀}(0)) ≥ f(h_{x₀}(1)) = f(h*(1)).

For t ∈ [1/2, 1], f(h*(t)) = f(h_{x₀}(2t−1)) ≥ f(h_{x₀}(1)) = f(h*(1)). Further, we have

  L(h*) = sup_{π∈Π} L_π(h*)
        = sup_{π∈Π|[0,1/2]} L_π(h*) + sup_{π∈Π|[1/2,1]} L_π(h*)
        = sup_{π∈Π|[0,t₀]} L_π(h_x) + sup_{π∈Π} L_π(h_{x₀})
        = sup_{π∈Π|[0,t₀]} L_π(h_x) + L(h_{x₀})
        < sup_{π∈Π|[0,t₀]} L_π(h_x) + sup_{π∈Π|[t₀,1]} L_π(h_x) = L(h_x).

In summary, the arc-length reparameterization of h*, denoted h̄*, is feasible for (5) (w.r.t. x) but achieves a strictly lower cost than h_x. This contradicts the optimality of h_x.

Second, if L(h_{x₀}) > sup_{π ∈ Π|[t₀,1]} L_π(h_x), then let

  h*(t) = h_x(t₀),  t ∈ [0, t₀],
  h*(t) = h_x(t),   t ∈ (t₀, 1].
It is easy to check that h* is continuous and lies entirely within X̂, and h*(0) = h_x(t₀) = x₀, h*(1) = h_x(1) ∈ X. For t ∈ [0,1], f(h_x(t)) ≥ f(h_x(1)) implies f(h*(t)) ≥ f(h_x(1)) = f(h*(1)). Further, we have

  L(h*) = sup_{π∈Π} L_π(h*) = sup_{π∈Π|[0,t₀]} L_π(h*) + sup_{π∈Π|[t₀,1]} L_π(h*)
        = 0 + sup_{π∈Π|[t₀,1]} L_π(h_x) < L(h_{x₀}).

In summary, the arc-length reparameterization of h*, denoted h̄*, is feasible for (5) (w.r.t. x₀) but achieves a strictly lower cost than h_{x₀}. This contradicts the optimality of h_{x₀}.

For completeness, the estimate (6) invoked in the proof of Lemma 10 is:

  L(h*) = sup_{π∈Π} L_π(h*)
        = sup_{π∈Π|[0,t‡−δ]} L_π(h*) + sup_{π∈Π|[t‡−δ,t‡+δ]} L_π(h*) + sup_{π∈Π|[t‡+δ,1]} L_π(h*)
        = sup_{π∈Π|[0,t‡−δ]} L_π(h_x) + ‖h_x(t‡−δ) − h_x(t‡+δ)‖_ℓ + sup_{π∈Π|[t‡+δ,1]} L_π(h_x)
        < sup_{π∈Π|[0,t‡−δ]} L_π(h_x) + ‖h_x(t‡−δ) − h_x(t‡)‖_ℓ + ‖h_x(t‡) − h_x(t‡+δ)‖_ℓ + sup_{π∈Π|[t‡+δ,1]} L_π(h_x)
        ≤ sup_{π∈Π|[0,t‡−δ]} L_π(h_x) + sup_{π∈Π|[t‡−δ,t‡+δ]} L_π(h_x) + sup_{π∈Π|[t‡+δ,1]} L_π(h_x)
        = sup_{π∈Π} L_π(h_x) = L(h_x).   (6)

Using this lemma, we are in a good position to show that V(h_x(t)) is non-increasing for t ∈ [0,1]. For any t₁ < t₂, we have

  V(h_x(t₁)) = sup_{π∈Π|[t₁,1]} L_π(h_x)
             = sup_{π∈Π|[t₁,t₂]} L_π(h_x) + sup_{π∈Π|[t₂,1]} L_π(h_x)
             ≥ sup_{π∈Π|[t₂,1]} L_π(h_x) = V(h_x(t₂)).

3) To show (C4) holds: The set {L(h_x)}_{x ∈ X̂\X} is uniformly bounded by max_{x ∈ X̂} ‖x − x*‖_ℓ, which is finite.

To summarize, we have verified that the construction is well defined and satisfies both (C3) and (C4), so Lemma 6 is proved. Since we have shown that Lemma 6 implies Theorem 3, the latter is also proved.

V. OTHER PROPERTIES

A. Constructing from Primitives

Though the previous section guarantees the existence of the Lyapunov-like function and paths under certain conditions, it is not clear how to systematically find or construct them.
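The path-length functional L(h) = sup_{π∈Π} L_π(h) used throughout the verification above can be approximated numerically by evaluating the polygonal length L_π on increasingly fine partitions. The following is a small illustrative sketch; the semicircular toy path and the helper names are ours, not from the paper:

```python
import numpy as np

def polygonal_length(h, ts):
    """L_pi(h): total chord length of the path h over the partition ts."""
    pts = np.array([h(t) for t in ts])
    return float(np.linalg.norm(np.diff(pts, axis=0), axis=1).sum())

def path_length(h, n=1000):
    """Approximate L(h) = sup over partitions via a fine uniform partition.
    Refining a partition can only increase the polygonal length."""
    return polygonal_length(h, np.linspace(0.0, 1.0, n + 1))

# Toy path: a unit semicircle; its true arc length is pi.
h = lambda t: np.array([np.cos(np.pi * t), np.sin(np.pi * t)])

coarse = polygonal_length(h, [0.0, 0.5, 1.0])  # coarse partitions underestimate
fine = path_length(h)
print(coarse, fine)  # coarse = 2*sqrt(2) < fine, and fine is close to pi
```

Flattening a sub-arc onto a chord, as in the path surgeries above, can only decrease this quantity; this is exactly how the proofs obtain their strict inequalities.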
In this subsection, we show that if one can find the Lyapunov-like function and paths for some primitive problems, then there are natural ways to construct the Lyapunov-like function and paths for new problems built up from those primitives in certain ways. To streamline notation, we will use the tuple (f, X) to refer to (1) and the tuple (f, X, X̂) to refer to the problem pair (1), (2). Assume (V, {h_x}_{x∈X̂\X}) is a valid construction of the Lyapunov-like function and paths for (f, X, X̂). In this subsection, when we say V and h_x are valid, we mean that they not only are valid by definition, but also satisfy (C1) and (C2).

1) Function Composition: Suppose g : R → R is non-decreasing and convex. Then (V, {h_x}_{x∈X̂\X}) is also a valid construction of the Lyapunov-like function and paths for (g∘f, X, X̂). This result is immediate, as g∘f preserves the convexity over X̂ and the monotonicity over any path.

2) Union of Feasible Sets: Suppose we have two problem pairs (f₁, X₁, X̂₁), for which (V₁, {h¹_x}_{x∈X̂₁\X₁}) is valid, and (f₂, X₂, X̂₂), for which (V₂, {h²_x}_{x∈X̂₂\X₂}) is valid. We consider a new problem (f, X, X̂) where X := (X₁ ∪ X₂) ∩ X̂₁ ∩ X̂₂ and X̂ := X̂₁ ∩ X̂₂. The formulation of f will be provided later. If for every x ∈ X̂ \ X we have h¹_x ≡ h²_x, then construct Ṽ : X̂ → R such that Ṽ(x) := V₁(x) · V₂(x), and let h̃_x = h¹_x for all x ∈ X̂ \ X. We have the following two results.

Corollary 5. For any λ ∈ (0,1), define f : X̂ → R as f(x) := λ f₁(x) + (1−λ) f₂(x). Then (Ṽ, {h̃_x}_{x∈X̂\X}) is valid for (f, X, X̂).

Corollary 6. Define f : X̂ → R as f(x) := max(f₁(x), f₂(x)). Then (Ṽ, {h̃_x}_{x∈X̂\X}) is valid for (f, X, X̂).

Proof for Corollary 5 and Corollary 6. The function Ṽ is still continuous and vanishes if and only if x ∈ X (since Ṽ(x) = 0 ⇔ V₁(x) = 0 or V₂(x) = 0).
By construction, {h̃_x}_{x∈X̂\X} is a subset of {h¹_x}_{x∈X̂₁\X₁}, so (C2) is naturally satisfied. To see that (C1) holds, we fix any x ∈ X̂ \ X. Then h̃_x(0) = h¹_x(0) = x and h̃_x(1) = h¹_x(1) ∈ X₁ ∩ X̂ ⊆ X. Further,

  Ṽ(h̃_x(t)) = V₁(h¹_x(t)) V₂(h¹_x(t)) = V₁(h¹_x(t)) V₂(h²_x(t)),

as h¹_x and h²_x coincide when x ∈ X̂ \ X. Because both V₁(h¹_x(t)) and V₂(h²_x(t)) are non-negative and non-increasing, so is Ṽ(h̃_x(t)). Finally, as f₁(h¹_x(t)) and f₂(h²_x(t)) are both non-increasing over [0,1], their convex combination or maximum (i.e., f(h̃_x(t))) must be non-increasing as well. A similar argument also shows f(h̃_x(1)) < f(h̃_x(0)). Thus (C1) holds, which completes the proof.

3) Intersection of Feasible Sets: We still consider two problem pairs (f₁, X₁, X̂), for which (V₁, {h¹_x}_{x∈X̂\X₁}) is valid, and (f₂, X₂, X̂), for which (V₂, {h²_x}_{x∈X̂\X₂}) is valid. Different from the previous setting, the two pairs are required to share the same relaxed set X̂. Further, we view each x ∈ X̂ as a tuple with two parts, x := (u, v). Define P₁ and P₂ as two projection operators such that P₁x = u and P₂x = v. We consider a new problem (f, X, X̂) where X := X₁ ∩ X₂. The formulation of f will be provided later.

If f_i, V_i and h^i_x are completely separated with respect to u and v, in the sense that for i = 1, 2, both f_i(x) and V_i(x) depend on P_i x only and P_{3−i}(h^i_x(t)) is constant in t, then we can construct Ṽ as Ṽ(x) := V₁(x) + V₂(x).
For x ∈ X̂ \ X, the path h̃_x is constructed in three ways, depending on the values of V₁(x) and V₂(x):

  If V₁(x) = 0, then h̃_x := h²_x;   (7a)
  If V₂(x) = 0, then h̃_x := h¹_x;   (7b)
  If V₁(x), V₂(x) > 0, then h̃_x(t) := h¹_x(2t) for t ∈ [0, 1/2), and h̃_x(t) := h²_{h¹_x(1)}(2t − 1) for t ∈ [1/2, 1].   (7c)

Corollary 7. For any λ ∈ (0,1), define f : X̂ → R as f(x) := λ f₁(x) + (1−λ) f₂(x). Then (Ṽ, {h̃_x}_{x∈X̂\X}) is valid for (f, X, X̂).

Corollary 8. Define f : X̂ → R as f(x) := max(f₁(x), f₂(x)). Then (Ṽ, {h̃_x}_{x∈X̂\X}) is valid for (f, X, X̂).

TABLE I: A SUMMARY ON CONSTRUCTING V AND h_x FROM PRIMITIVES

Operation | Primitive problem | New problem | V, h_x for new problem | Additional requirements
Function composition | (f, X, X̂): V, h_x | (g∘f, X, X̂) | Ṽ := V, h̃_x := h_x | g is non-decreasing and convex
Union of feasible sets (cost as the sum) | (f₁, X₁, X̂₁): V₁, h¹_x; (f₂, X₂, X̂₂): V₂, h²_x | (f, X, X̂) where f := λf₁ + (1−λ)f₂, X := (X₁ ∪ X₂) ∩ X̂₁ ∩ X̂₂, X̂ := X̂₁ ∩ X̂₂ | Ṽ := V₁ × V₂, h̃_x := h¹_x | h¹_x and h²_x coincide for x ∈ X̂\X
Union of feasible sets (cost as the maximum) | same primitives as above | same as above, with f := max(f₁, f₂) | Ṽ := V₁ × V₂, h̃_x := h¹_x | h¹_x and h²_x coincide for x ∈ X̂\X
Intersection of feasible sets (cost as the sum) | (f₁, X₁, X̂): V₁, h¹_x; (f₂, X₂, X̂): V₂, h²_x; (u,v) =: x ∈ X̂ | (f, X, X̂) where f := λf₁ + (1−λ)f₂, X := X₁ ∩ X₂ | Ṽ := V₁ + V₂, h̃_x as in (7) | f_i(x), V_i(x) depend on P_i x only and P_{3−i} h^i_x is constant
Intersection of feasible sets (cost as the maximum) | same primitives as above | same as above, with f := max(f₁, f₂) | Ṽ := V₁ + V₂, h̃_x as in (7) | f_i(x), V_i(x) depend on P_i x only and P_{3−i} h^i_x is constant

Proof for Corollary 7 and Corollary 8.
The function Ṽ is still continuous and vanishes if and only if x ∈ X (since Ṽ(x) = 0 ⇔ V₁(x) = 0 and V₂(x) = 0). The set {h̃_x}_{x∈X̂\X} satisfies (C2), as each path is constructed either as some h^i_x or as the concatenation of h¹_x and h²_{h¹_x(1)}. Next we show that h̃_x(1) ∈ X. If h̃_x is constructed by (7a), then Ṽ(h̃_x(1)) = V₁(h²_x(1)) + V₂(h²_x(1)) = V₁(h²_x(1)). Since V₁(h²_x(1)) depends only on P₁h²_x(1), and P₁h²_x(1) = P₁h²_x(0) = P₁x, we must have V₁(h²_x(1)) = V₁(x) = 0. Thus Ṽ(h̃_x(1)) = 0 and h̃_x(1) ∈ X. The case in which h̃_x is constructed by (7b) is similar. When h̃_x is constructed by (7c),

  Ṽ(h̃_x(1)) = V₁(h²_{h¹_x(1)}(1)) + V₂(h²_{h¹_x(1)}(1))
             = V₁(h²_{h¹_x(1)}(1)) = V₁(h²_{h¹_x(1)}(0))
             = V₁(h¹_x(1)) = 0,

so h̃_x(1) ∈ X as well. The monotonicity properties of Ṽ(h̃_x(t)) and f(h̃_x(t)) are also direct consequences of the fact that f_i, V_i and h^i_x are completely separated.

A summary of this subsection is provided in Table I.

B. Weak Exactness

One observation from the proof of Theorem 1 is that we do not actually need f(h_x(0)) > f(h_x(1)) to eliminate genuine local optima. However, this strict inequality is required to show exactness. We can therefore consider a weaker version of exactness, defined as follows.

Definition 12 (Weak Exactness). We say the relaxation (2) is weakly exact with respect to (1) if at least one optimum of (2) is feasible, and hence globally optimal, for (1).

Theorem 4. If there exists a Lyapunov-like function V associated with (1) and (2) such that (C3) and (C2) hold, then (2) is weakly exact with respect to (1), and any local optimum in X for (1) is either a global optimum or a pseudo local optimum.

The argument on weak exactness follows from the fact that the path connecting any global optimum of (2) must determine an endpoint in X with the same cost, which by definition must be a global optimum as well.
The argument on local optimality follows directly from the proof of Theorem 1.

VI. APPLICATIONS

In this section we use two examples to show what V and {h_x} might look like for specific problems. The first example is the Optimal Power Flow (OPF) problem in power systems with tree structures, which is also the motivating problem for which we developed this theory. By finding the Lyapunov-like function and paths, we show in [1] the first known condition (that can be checked a priori) for OPF to have no spurious local optima. The same condition was previously only known to guarantee exact relaxation.

In the second example, we study the Low Rank Semidefinite Program (LRSDP) problem, which was known in the existing literature to have a weakly exact relaxation [5], [6] and no spurious local optima [7]. Specifically, we show that part of the results proved in [7] can also be proved by finding appropriate V and {h_x}. Together, the two examples exemplify the use of Theorem 1, Theorem 2 and Theorem 4 in practice.

A. Optimal Power Flow

Consider a radial power network with an underlying connected directed graph G(V, E). Let V := {0, 1, ···, N−1} be the set of buses (i.e., nodes), and E ⊆ V × V be the set of power lines (i.e., edges). We will refer to a power line from bus j to bus k by j → k or (j,k) interchangeably. For each power line (j,k), its series admittance is denoted by y_jk ∈ C, and its series impedance is hence z_jk := y_jk⁻¹. Both the real and imaginary parts of z_jk are assumed to be positive.

As we assume G is a tree, we can adopt the DistFlow model [36], [37] to formulate the power flow equations. For each bus j, let V_j ∈ C and s_j = p_j + i q_j ∈ C denote its voltage and bus injection, respectively. For line (j,k), let S_jk ∈ C and I_jk ∈ C denote the branch power flow and current from bus j to k, both at the sending end.
Let v_j := |V_j|² ∈ R and ℓ_jk := |I_jk|² ∈ R. We will denote the conjugate of a complex number a by a^H. The power flow equations are:

  v_j = v_k + 2 Re(z_jk S_jk^H) − |z_jk|² ℓ_jk,  ∀(j,k) ∈ E   (8a)
  v_j ℓ_jk = |S_jk|²,  ∀(j,k) ∈ E   (8b)
  s_j = Σ_{k: j→k} S_jk − Σ_{i: i→j} (S_ij − z_ij ℓ_ij),  ∀j ∈ V.   (8c)

Given a cost function f(s) : C^N → R, we are interested in the following OPF problem:

  minimize_{x=(s,v,ℓ,S)}  f(s)   (9a)
  subject to  (8)   (9b)
              v̲_j ≤ v_j ≤ v̄_j   (9c)
              s̲_j ≤ s_j ≤ s̄_j   (9d)
              ℓ_jk ≤ ℓ̄_jk   (9e)

where underlined and barred quantities denote lower and upper bounds, respectively. All inequalities between complex numbers in this section are enforced for both the real and imaginary parts.

Definition 13. A function g : R → R is strongly increasing if there exists a real c > 0 such that for any a > b, we have g(a) − g(b) ≥ c(a − b).

We now make the following assumptions on OPF:
(i) The underlying graph G is a tree.
(ii) The cost function f is convex, and is strongly increasing in Re(s_j) (or Im(s_j)) for each j ∈ V and non-decreasing in Im(s_j) (or Re(s_j), respectively).
(iii) Problem (9) is feasible.
(iv) The line current limit satisfies ℓ̄_jk ≤ v̲_j |y_jk|².

Assumption (i) is generally true for distribution networks, and assumption (iii) is typically mild. As for (ii), f is commonly assumed to be convex and increasing in Re(s_j) and Im(s_j) in the literature (e.g., [38], [9]). Assumption (ii) is only slightly stronger, since one can always perturb any increasing function by an arbitrarily small linear term to achieve strong monotonicity. Assumption (iv) is not common in the literature but is also mild, for the following reason. Typically V_j = (1 + ε_j) e^{iθ_j} in per unit, where ε_j ∈ [−0.05, 0.05], and the angle difference θ_jk := θ_j − θ_k between two neighboring buses j, k typically has a small magnitude.
Thus the maximum value of |V_j − V_k|² = |(1+ε_j) e^{iθ_jk} − (1+ε_k)|², which equals ℓ_jk / |y_jk|², should be much smaller than v_j, which is ≈ 1 per unit.

Problem (9) is non-convex, as constraint (8b) is not convex. Denote by X the set of (s, v, ℓ, S) that satisfy (9b)-(9e), so that (9) is in the form of (1). We can relax (9) by convexifying (8b) into a second-order cone [8]:

  minimize_{x=(s,v,ℓ,S)}  f(s)   (10a)
  subject to  (8a), (8c), (9c)-(9e)   (10b)
              |S_jk|² ≤ v_j ℓ_jk.   (10c)

One can similarly regard X̂ as the set of (s, v, ℓ, S) that satisfy (10b), (10c). It is proved in [8] that if s̲_j = −∞ − i∞ for all j ∈ V, then (10) is exact, meaning that any optimal solution of (10) is also feasible, and hence globally optimal, for (9). Now we show that the same condition also guarantees that any local optimum of (9) is globally optimal. This implies that a local search algorithm such as the primal-dual interior point method produces a global optimum as long as it converges.

Theorem 5. If s̲_j = −∞ − i∞ for all j ∈ V, then any local optimum of (9) is a global optimum.

Proof. Our strategy is to construct appropriate V and {h_x} and then prove that this construction satisfies both Condition (C) and Condition (C'). Let

  V(x) := Σ_{(j,k)∈E} (v_j ℓ_jk − |S_jk|²).   (11)

Clearly, V is a valid Lyapunov-like function satisfying Definition 10.

For each x = (s, v, ℓ, S) ∈ X̂ \ X, let M be the set of (j,k) ∈ E such that |S_jk|² < v_j ℓ_jk. For (j,k) ∈ M, the quadratic function

  φ_jk(a) := |z_jk|² a² + 2 (v_j − Re(z_jk S_jk^H)) a + |S_jk|² − v_j ℓ_jk

must have a unique positive root, as φ_jk(0) < 0. We define Δ_jk to be this positive root if (j,k) ∈ M, and Δ_jk := 0 otherwise. Assumption (iv) implies ℓ_jk ≤ v_j |y_jk|², and therefore

  v_j − Re(z_jk S_jk^H) ≥ v_j − |z_jk| |S_jk|
                        ≥ v_j − |z_jk| √(v_j ℓ_jk)
                        ≥ v_j − |z_jk| √(v_j² |y_jk|²) = 0.
This further implies that φ_jk(a) is strictly increasing for a ∈ [0, Δ_jk]. Now consider the path h_x(t) := (s̃(t), ṽ(t), ℓ̃(t), S̃(t)) for t ∈ [0,1], where

  s̃_j(t) = s_j − t Σ_{i: i→j} z_ij Δ_ij − t Σ_{k: j→k} z_jk Δ_jk,   (12a)
  ṽ_j(t) = v_j,   (12b)
  ℓ̃_jk(t) = ℓ_jk − 2 t Δ_jk,   (12c)
  S̃_jk(t) = S_jk − t z_jk Δ_jk.   (12d)

Clearly we have h_x(0) = x. It can easily be checked that h_x(t) is feasible for (10) for t ∈ [0,1] and that h_x(1) is feasible for (9) (see [1]). Therefore, h_x indeed maps [0,1] into X̂ and h_x(1) ∈ X.

Since z_jk > 0, both the real and imaginary parts of s̃_j(t) are strictly decreasing in t for every bus j incident to a line in M, and stay unchanged otherwise. By assumption (ii), f(s̃(t)) is also strictly decreasing. To show that V(h_x(t)) is also decreasing, notice that V(h_x(t)) equals

  Σ_{(j,k)∈E} ( ṽ_j(t) ℓ̃_jk(t) − |S̃_jk(t)|² )
    = Σ_{(j,k)∈M^c} ( v_j ℓ_jk − |S_jk|² ) + Σ_{(j,k)∈M} ( ṽ_j(t) ℓ̃_jk(t) − |S̃_jk(t)|² )
    = Σ_{(j,k)∈M^c} ( v_j ℓ_jk − |S_jk|² ) − Σ_{(j,k)∈M} φ_jk(t Δ_jk).

As φ_jk(a) is strictly increasing for a ∈ [0, Δ_jk], we conclude that V(h_x(t)) is strictly decreasing for t ∈ [0,1].

By Corollary 1, the set {h_x}_{x∈X̂\X} is uniformly bounded and uniformly equicontinuous, as all h_x(t) are linear functions of t. In summary, Condition (C) is satisfied.

Finally, we show that Condition (C') also holds. By assumption (ii), there exists some real c > 0 independent of x such that for any 0 ≤ a < b ≤ 1,

  f(s̃(a)) − f(s̃(b)) ≥ c Σ_{j∈V} ( Re(s̃_j(a) − s̃_j(b)) + Im(s̃_j(a) − s̃_j(b)) ) = c ‖s̃(a) − s̃(b)‖_m,

where ‖·‖_m is defined as ‖a‖_m := Σ_i ( |Re(a_i)| + |Im(a_i)| ) over the complex vector space.
It is easy to check that ‖·‖_m is a valid norm. On the other hand, by (12) we have ‖ṽ(a) − ṽ(b)‖_m ≡ 0 and

  ‖ℓ̃(a) − ℓ̃(b)‖_m ≤ ( 1 / min_{(j,k)∈E} ‖z_jk‖_m ) ‖s̃(a) − s̃(b)‖_m,
  ‖S̃(a) − S̃(b)‖_m ≤ (1/2) ‖s̃(a) − s̃(b)‖_m.

Therefore,

  ‖h_x(a) − h_x(b)‖_m ≤ ( 3/2 + 1 / min_{(j,k)∈E} ‖z_jk‖_m ) ‖s̃(a) − s̃(b)‖_m,

and there exists ĉ > 0, independent of x, a, b, such that f(s̃(a)) − f(s̃(b)) ≥ ĉ ‖h_x(a) − h_x(b)‖_m. Therefore Condition (C') is also satisfied, and by Theorem 2 any local optimum of (9) is a global optimum.

The results in this subsection apply only to radial networks, which serve as the underlying networks of balanced distribution power systems. For transmission systems and unbalanced distribution systems, networks are usually highly meshed. It has been found that for most meshed networks, both convex relaxation and local search algorithms also yield the global optimum in most test cases [39], [40]. Thus Theorem 3 suggests that there may also exist similar Lyapunov-like functions and paths for meshed networks. Finding such Lyapunov-like functions and paths would be an interesting direction of future work to extend the results of this paper.

B. Low Rank Semidefinite Program

This subsection proves a known result of [7] using a different approach. Adopting the same notation as [7], we have the following problem:

  minimize_{X ≥ 0}  tr(CX)   (13a)
  subject to  tr(A_i X) = b_i,  i = 1, ···, m   (13b)
              rank(X) ≤ r   (13c)

Here, C, A_i, X are all n-by-n matrices. We assume the problem is feasible and that {X ≥ 0 | (13b)} is compact.

Theorem 6. If (r+1)(r+2)/2 > m + 1, then any local optimum of (13) is either a global optimum or a pseudo local optimum.
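The Lyapunov-like function used in the proof below, V(X) = Σ_{i=r+1}^n λ_i(X), measures how badly the rank constraint (13c) is violated: for X ≥ 0 it is non-negative and vanishes exactly when rank(X) ≤ r. A small numerical sketch; the random instance and helper name here are ours, for illustration only:

```python
import numpy as np

def lyapunov_V(X, r):
    """V(X) = sum of the (n - r) smallest eigenvalues of a Hermitian X.
    For X >= 0 this is nonnegative, and it is (numerically) zero
    exactly when rank(X) <= r."""
    eig = np.linalg.eigvalsh(X)  # eigenvalues in ascending order
    return float(eig[: X.shape[0] - r].sum())

rng = np.random.default_rng(0)
n, r = 5, 2
B = rng.standard_normal((n, r))
X_low = B @ B.T              # PSD with rank <= r: V should vanish
X_full = X_low + np.eye(n)   # full rank: V should be strictly positive

print(lyapunov_V(X_low, r))   # ~0 up to floating-point error
print(lyapunov_V(X_full, r))  # strictly positive
```

Along each path h^i constructed in the proof, tr(CX) stays constant while this V is non-increasing, which is exactly what condition (C3) requires.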
Before proving Theorem 6, we consider the convex relaxation of (13):

  minimize_{X ≥ 0}  tr(CX)   (14a)
  subject to  tr(A_i X) = b_i,  i = 1, ···, m   (14b)

As a side note, the results in [5], [6] show that if (r+1)(r+2)/2 > m, then (14) is weakly exact with respect to (13). While our theorem is the same as that of [7], some of the insight used to find V and {h_X} also comes from the structures first identified in [5], [6].

Proof. Clearly, (13) can be reformulated in the form of (1) by setting f(X) = tr(CX), X = {X ≥ 0 | (13b), (13c)} and X̂ = {X ≥ 0 | (13b)}. Define V as

  V(X) := Σ_{i=r+1}^n λ_i(X),

where λ_i(X) is the i-th eigenvalue of X (in decreasing order). This function V satisfies Definition 10 and is concave.

For a fixed X ∈ X̂ \ X, denote rank(X) by r₀ > r. We first construct r₀ − r paths, labeled h¹, h², ···, h^{r₀−r}. When constructing h^i with i > 1, we assume path h^{i−1} has already been constructed, and let X_{i−1} := h^{i−1}(1); we let X₀ = X. For i ≥ 1, if rank(X_{i−1}) ≤ r₀ − i, then we let h^i(t) ≡ X_{i−1} for t ∈ [0,1]. Otherwise, we decompose X_{i−1} as U Σ U^H, where Σ is a k-by-k positive definite diagonal matrix with k = rank(X_{i−1}) > r₀ − i. The linear system

  tr(C U Y U^H) = 0,
  tr(A_j U Y U^H) = 0,  j = 1, ···, m   (15)

must have a non-zero solution among Hermitian matrices Y ∈ C^{k×k}. To see this, note that k ≥ r₀ − i + 1 ≥ r + 1, and thus k(k+1)/2 ≥ (r+1)(r+2)/2 > m + 1. As a result, (15) has more unknown variables than equations. We simply denote this non-zero solution by Y; for any α ∈ R, αY is also a solution to (15). The concavity of V implies that V(U(Σ + αY)U^H) is concave in α when U and Σ are fixed. Since Σ > 0, one of the following two scenarios must be true.
• ∃a < 0 such that V(U(Σ + αY)U^H) is non-decreasing and rank(U(Σ + αY)U^H) ≤ k for α ∈ [a, 0], and rank(U(Σ + aY)U^H) ≤ k − 1.
• ∃ b > such that V ( U (Σ + αY ) U H ) is non-increasing, rank( U (Σ+ αY ) U H ) ≤ k for α ∈ [0 , b ] and rank( U (Σ+ bY ) U H ) ≤ k − .Without loss of generality, we suppose V ( U (Σ + αY ) U H ) isnon-increasing for α ∈ [0 , b ] (otherwise we take − Y instead).We then construct h i as h i ( t ) = U (Σ + tbY ) U H for t ∈ [0 , .By construction, V ( h i ( t )) is non-increasing and f ( h i ( t )) staysa constant.Finally, we construct h X as the concatenation of paths h , · · · , h r − r . That is to say, h X ( t ) := h i (( r − r ) t − i + 1) for t ∈ (cid:104) i − r − r , ir − r (cid:105) . It is easy to see h X is continuous and h X (0) = h (0) = X . To see h X (1) ∈ X , we prove that rank( X i ) ≤ r − i . We first have rank( X ) = rank( X ) = r . For i ≥ ,we have rank( X i ) = rank( X i − ) if rank( X i − ) ≤ r − i and rank( X i ) ≤ rank( X i − ) − otherwise. By induction,we can prove rank( X i ) ≤ r − i always holds. As a result, rank( h X (1)) = rank( h r − r (1)) ≤ r and thus h X (1) ∈ X . Byconstruction, h i ( t ) never violates (13b) and thus is in ˆ X , so is h X ( t ) for all t . Functions V ( h i ( t )) and f ( h i ( t )) being non-increasing implies that V ( h X ( t )) and f ( h X ( t )) are also non-increasing. Therefore, (C3) is satisfied. By Corollary 1 (C2)also holds for { h X } . It completes the proof (by Theorem 4). Remark 4. In [7], Theorem 3.4 claims that any local optimumof (13) should also be globally optimal, unless it is harboredin some positive-dimensional face of SDP. The result in ourpaper further asserts that if it is indeed harbored in such aface, then there must be some point on the edge of this facewhose cost can be further reduced in its neighborhood (i.e.,the local optimum is in the same situation as point c ratherthan d as in Fig. 1). VII. C ONCLUSION AND D ISCUSSIONTABLE IIS UFFICIENT AND NECESSARY CONDITIONS Condition Relaxation exactness Local optimalitySufficient conditions: ⇒ (C1), (C2) Strong exactness l.o. is p.l.o. 
or g.o.(C3), (C2) Weak exactness l.o. is p.l.o. or g.o.(C’) Strong exactness l.o. is g.o.Necessary condition: ⇐ (C1), (C2) Strong exactness l.o. is g.o. Table II summaries both sufficient and necessary conditionsfor non-convex problem (1) to simultaneously have exact(weak or strong) relaxation and no spurious local optima(allowing or not allowing pseudo local optima). The necessarycondition relies on Assumption 1, which is usually true for real-world problems. Those results provide a new perspectiveto certify a non-convex problem is computationally easy tosolve. Furthermore, whenever the problem is indeed compu-tationally easy, the certificates (Lyapunov-like functions andpaths) are guaranteed to exist. We also provide a hierarchicalframework which shows how such certificates for a compli-cated problem can be constructed from primitive problems.Our results have been applied to OPF and LRSDP problems.Based on the examples shown in Section VI, a natural wayto apply this approach is to first look at existing results onexact relaxation, and then construct V and { h x } accordingto the hidden structure underlying the exactness. Once V and { h x } are appropriately constructed, our result can help extendexisting results on relaxation exactness to new results on localoptimality.Compared to some existing techniques to study local opti-mality, our results do not require differentiating or analyzingthe curvature of feasible sets. It allows the feasible setsto incorporate more complicated and possibly non-convexconstraints. Those non-convex constraints are common forproblems arising in cyber physical systems which are generallygoverned by physical laws.R EFERENCES[1] F. Zhou and S. H. Low, “A sufficient condition for local optima to beglobally optimal,” in To appear in Proc. of the 2020 Conference onDecision and Control . IEEE, 2020.[2] E. J. Cand`es and B. Recht, “Exact matrix completion via convexoptimization,” Foundations of Computational mathematics , vol. 9, no. 6,p. 
717, 2009.[3] E. J. Cand`es and T. Tao, “The power of convex relaxation: Near-optimalmatrix completion,” IEEE Transactions on Information Theory , vol. 56,no. 5, pp. 2053–2080, 2010.[4] R. Ge, J. D. Lee, and T. Ma, “Matrix completion has no spuriouslocal minimum,” in Advances in Neural Information Processing Systems ,2016, pp. 2973–2981.[5] A. I. Barvinok, “Problems of distance geometry and convex propertiesof quadratic maps,” Discrete & Computational Geometry , vol. 13, no. 2,pp. 189–202, 1995.[6] G. Pataki, “On the rank of extreme matrices in semidefinite programsand the multiplicity of optimal eigenvalues,” Mathematics of operationsresearch , vol. 23, no. 2, pp. 339–358, 1998.[7] S. Burer and R. D. Monteiro, “Local minima and convergence in low-rank semidefinite programming,” Mathematical Programming , vol. 103,no. 3, pp. 427–444, 2005.[8] M. Farivar and S. H. Low, “Branch flow model: Relaxations andconvexification–part I,” IEEE Transactions on Power Systems , vol. 28,no. 3, pp. 2554–2564, 2013.[9] L. Gan, N. Li, U. Topcu, and S. H. Low, “Exact convex relaxation ofoptimal power flow in radial networks,” IEEE Transactions on AutomaticControl , vol. 60, no. 1, pp. 72–87, 2015.[10] J. Lavaei and S. H. Low, “Zero duality gap in optimal power flowproblem,” IEEE Transactions on Power Systems , vol. 27, no. 1, pp.92–107, 2012.[11] J. Jald´en, C. Martin, and B. Ottersten, “Semidefinite programmingfor detection in linear systems-optimality conditions and space-timedecoding,” in , vol. 4. IEEE,2003, pp. IV–9.[12] C. Lu, Y.-F. Liu, W.-Q. Zhang, and S. Zhang, “Tightness of a new andenhanced semidefinite relaxation for mimo detection,” SIAM Journal onOptimization , vol. 29, no. 1, pp. 719–742, 2019.[13] Z. Li, Q. Guo, H. Sun, and J. Wang, “Sufficient conditions for exact re-laxation of complementarity constraints for storage-concerned economicdispatch,” IEEE Transactions on Power Systems , vol. 31, no. 2, pp.1653–1654, 2015.[14] J. Sun, Q. Qu, and J. 
Wright, "When are nonconvex problems not scary?" arXiv preprint arXiv:1510.06096, 2015.
[15] R. Ge, C. Jin, and Y. Zheng, "No spurious local minima in nonconvex low rank problems: A unified geometric analysis," in Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017, pp. 1233–1242.
[16] J. Sun, Q. Qu, and J. Wright, "Complete dictionary recovery over the sphere I: Overview and the geometric picture," IEEE Transactions on Information Theory, vol. 63, no. 2, pp. 853–884, 2016.
[17] N. Boumal, "Nonconvex phase synchronization," SIAM Journal on Optimization, vol. 26, no. 4, pp. 2355–2377, 2016.
[18] S. Arora, R. Ge, T. Ma, and A. Moitra, "Simple, efficient, and neural algorithms for sparse coding," Proceedings of Machine Learning Research, vol. 40, January 2015.
[19] J. Carpentier, "Contribution to the economic dispatch problem," Bulletin de la Société Française des Électriciens, vol. 3, no. 8, pp. 431–447, 1962.
[20] A. Verma, "Power grid security analysis: An optimization approach," Ph.D. dissertation, Columbia University, 2009.
[21] K. Lehmann, A. Grastien, and P. Van Hentenryck, "AC-feasibility on tree networks is NP-hard," IEEE Transactions on Power Systems, vol. 31, no. 1, pp. 798–801, 2016.
[22] J. A. Momoh, R. Adapa, and M. El-Hawary, "A review of selected optimal power flow literature to 1993. I. Nonlinear and quadratic programming approaches," IEEE Transactions on Power Systems, vol. 14, no. 1, pp. 96–104, 1999.
[23] J. A. Momoh, M. El-Hawary, and R. Adapa, "A review of selected optimal power flow literature to 1993. II. Newton, linear programming and interior point methods," IEEE Transactions on Power Systems, vol. 14, no. 1, pp. 105–111, 1999.
[24] R. A. Jabr, A. H. Coonick, and B. J. Cory, "A primal-dual interior point method for optimal power flow dispatching," IEEE Transactions on Power Systems, vol. 17, no. 3, pp. 654–662, 2002.
[25] R. A.
Jabr, "Radial distribution load flow using conic programming," IEEE Transactions on Power Systems, vol. 21, no. 3, pp. 1458–1459, 2006.
[26] X. Bai, H. Wei, K. Fujisawa, and Y. Wang, "Semidefinite programming for optimal power flow problems," International Journal of Electrical Power & Energy Systems, vol. 30, no. 6-7, pp. 383–392, 2008.
[27] S. Bose, D. F. Gayme, K. M. Chandy, and S. H. Low, "Quadratically constrained quadratic programs on acyclic graphs with application to power flow," IEEE Transactions on Control of Network Systems, vol. 2, no. 3, pp. 278–287, 2015.
[28] S. H. Low, "Convex relaxation of optimal power flow–Part II: Exactness," IEEE Transactions on Control of Network Systems, vol. 1, no. 2, pp. 177–189, 2014.
[29] D. K. Molzahn and I. A. Hiskens, "A survey of relaxations and approximations of the power flow equations," Foundations and Trends® in Electric Energy Systems, vol. 4, no. 1-2, pp. 1–221, 2019.
[30] V. A. Toponogov, Differential Geometry of Curves and Surfaces. Springer, 2006.
[31] E. Bierstone and P. D. Milman, "Semianalytic and subanalytic sets," Publications Mathématiques de l'Institut des Hautes Études Scientifiques, vol. 67, no. 1, pp. 5–42, 1988.
[32] E. Bierstone, "Differentiable functions," Boletim da Sociedade Brasileira de Matemática - Bulletin/Brazilian Mathematical Society, vol. 11, no. 2, pp. 139–189, 1980.
[33] R. Hardt, "Some analytic bounds for subanalytic sets," Differential Geometric Control Theory, Progress in Math, vol. 27, pp. 259–267, 1983.
[34] A. M. Gabrièlov, "Projections of semi-analytic sets," Functional Analysis and its Applications, vol. 2, no. 4, pp. 282–291, 1968.
[35] S. Łojasiewicz, "On semi-analytic and subanalytic geometry," Banach Center Publications, vol. 34, no. 1, pp. 89–104, 1995.
[36] M. E. Baran and F. F. Wu, "Optimal capacitor placement on radial distribution systems," IEEE Transactions on Power Delivery, vol. 4, no. 1, pp. 725–734, 1989.
[37] ——, "Optimal sizing of capacitors placed on a radial distribution system," IEEE Transactions on
Power Delivery, vol. 4, no. 1, pp. 735–743, 1989.
[38] B. Zhang and D. Tse, "Geometry of injection regions of power networks," IEEE Transactions on Power Systems, vol. 28, no. 2, pp. 788–797, 2013.
[39] R. Jabr, A. Coonick, and B. Cory, "A primal-dual interior point method for optimal power flow dispatching," IEEE Transactions on Power Systems, vol. 17, no. 3, pp. 654–662, 2002.
[40] S. Gopinath, H. Hijazi, T. Weißer, H. Nagarajan, M. Yetkin, K. Sundar, and R. Bent, "Proving global optimality of ACOPF solutions,"