Learning to Solve AC Optimal Power Flow by Differentiating through Holomorphic Embeddings
Henning Lange∗, Bingqing Chen†, Mario Bergés†, Soummya Kar†
∗University of Washington, Seattle, WA 98195, USA
†Carnegie Mellon University, Pittsburgh, PA 15213, USA
Abstract—Alternating current optimal power flow (AC-OPF) is one of the fundamental problems in power systems operation. AC-OPF is traditionally cast as a constrained optimization problem that seeks optimal generation set points whilst fulfilling a set of non-linear equality constraints, the power flow equations. With increasing penetration of renewable generation, grid operators need to solve larger problems at shorter intervals. This motivates the research interest in learning OPF solutions with neural networks, which have fast inference times and are potentially scalable to large networks. The main difficulty in solving the AC-OPF problem lies in dealing with this equality constraint that has spurious roots, i.e. there are assignments of voltages that fulfill the power flow equations but that are not physically realizable. This property renders any method relying on projected gradients brittle because these non-physical roots can act as attractors. In this paper, we show efficient strategies that circumvent this problem by differentiating through the operations of a power flow solver that embeds the power flow equations into a holomorphic function. The resulting learning-based approach is validated experimentally on a 200-bus system and we show that, after training, the learned agent produces optimized power flow solutions reliably and quickly. Specifically, we report a 12x increase in speed and a 40% increase in robustness compared to a traditional solver. To the best of our knowledge, this approach constitutes the first learning-based approach that successfully respects the full non-linear AC-OPF equations.
Index Terms—alternating current optimal power flow; reinforcement learning; control; holomorphic embeddings
I. INTRODUCTION
The Optimal Power Flow (OPF) problem is fundamental to power systems operation [1]. In general, OPF finds the optimal generation set points that minimize operation costs given a set of loads while at the same time satisfying physical and security constraints. The physicality of solutions is ensured by enforcing the power flow equations, a set of non-linear equality constraints.

The increasing penetration of distributed energy resources (DER) is posing new challenges for power system operation. Traditionally, generation schedules were updated at 5-min intervals [1]. Due to the variability and uncertainty of renewable generation, system operators are required to update generation schedules more frequently. At the same time, system operators need to manage a growing number of smaller resources, resulting in a significantly larger problem than the conventional OPF problem [2]. Thus, the integration of DERs requires
Fig. 1: Framework: We propose a learning-based approach to the AC-OPF problem by directly differentiating through a HELM solver. Thus, we learn a neural policy, g_Θ(S_d), that respects the power flow equations. At the same time, we impose additional constraints, e.g., voltage magnitude and unit commitment, by learning the dual variables corresponding to these constraints with another NN, u_Ψ(S_d). Finally, we learn the policy end-to-end by optimizing the Lagrangian function, L.

solving larger OPF problems at shorter time intervals. This motivates research on using function approximators, such as neural networks (NN), to learn solutions to the OPF problem, due to their fast inference and their potential to scale well to larger power networks [3]. In order to deal with the non-linearity of the OPF problem, a common practice in power system operation is to linearize the grid dynamics, leading to the DC-OPF approximation. The DC-OPF problem is generally formulated as either a linear program (LP) or a quadratic program (QP), depending on the cost function. Such a linearization can work reasonably well for small networks, but the resulting approximation error scales poorly with network size [4]. Furthermore, the assumptions required for linearizing the power flow equations are not valid for heavily-loaded or distribution networks [5], and the author of [6] shows that even under mild assumptions DC solutions are usually never AC feasible. Ideally, the full non-linear AC power flow equations should be used. However, using the non-linear power flow equations comes with another challenge.
Because the non-linear AC-OPF equality constraints have a fractal nature riddled with spurious roots, which act as attractors [7], it is difficult to disambiguate physical from spurious non-physical solutions. As a result, convergence to the true solution cannot be guaranteed, rendering existing solvers brittle. In this paper, we propose a learning-based approach to the AC-OPF problem that alleviates the aforementioned issues by differentiating through a robust power flow solver, as shown in Figure 1. Our contributions can be summarized as follows. Firstly, we show that a power flow solver based on a holomorphic embedding of the power flow equations [8] is differentiable in its arguments. The primary benefit of using the Holomorphic Embedded Load Flow Method (HELM) is that it is much more likely to find the correct solution if it exists. Secondly, we can use this differentiable HELM solver as a computational layer in a NN. This allows us to learn a neural policy that solves the OPF problem while respecting the power flow constraints. Furthermore, we show how this method can be extended to enforce convex security constraints, i.e., voltage magnitudes, as well as non-convex integer constraints, i.e., unit commitment. To validate our approach, we empirically show that it is faster and more robust (i.e., results in physically-viable solutions more often) on a 200-bus system in comparison to traditional solvers based on mixed-integer interior point methods (MATPOWER [9]). Specifically, our proposed method achieves a 12x increase in speed and a 40% increase in robustness compared to MATPOWER.

II. RELATED WORK
A. Challenges of Solving AC-OPF
As described earlier, AC-OPF is traditionally posed as a constrained optimization problem, i.e. a cost function is minimized under network constraints, oftentimes relying on some form of projected gradient descent [10]. Note that this problem formulation faces numerous computational difficulties. First, the non-linear equality constraint (2) poses a challenge. In reality, (2) is a necessary but not sufficient condition for the system to be in a physical state. There are assignments of nodal voltages v that fulfill the power flow equations that, however, do not constitute a physical state [11], [7]. Optimization algorithms based on some form of projected gradient descent might be attracted to such a non-physical solution, rendering them not robust. Advanced techniques like the Homotopy [12] or Continuation [13] methods alleviate but do not fully remedy this problem, and can incur substantial computational costs. Furthermore, non-convex constraints create additional computational issues. Algorithms such as branch-and-bound that are typically employed to deal with integer constraints require solving multiple, and in the worst case exponentially many, relaxed linear programs [14].

B. Learning OPF
Because larger OPF problems need to be solved more frequently, there is growing interest in learning to solve OPF problems with NNs. These approaches formulate OPF as the problem of learning a policy, which maps demand configurations to generation set points. This policy could be learned from historical data, i.e., imitation learning, and/or by interacting with the system, i.e. reinforcement learning. The most common paradigm of existing solutions is to learn a mapping from demand assignments to optimal generation set points based on a data set provided by an external solver, such as MATPOWER [9]. This is analogous to behaviour cloning of an expert policy. In this case, the external OPF solver constitutes the expert policy [15]. Following [16], we refer to such a paradigm as OPF-then-learn. Specifically, the authors of [17], [3] pose AC-OPF as an end-to-end regression task. In [3], graph neural networks (GNN) were adopted, which make predictions based on local information from neighboring nodes. Such an approach allows for a decomposition of the grid-level problem based on power system connectivity and, in principle, scales well to large networks. However, a significant limitation of such an approach is that the solutions may not adhere to physical and security constraints inherent to the physical system. Concretely, 49% and 30% of the solutions were infeasible when applied to the IEEE 30-bus and 118-bus test systems respectively [17]. This has motivated research on learning-based methods that respect constraints. The authors of [15], [18], [19] studied the problem in the context of DC-OPF. Since DC-OPF can either be posed as an LP or QP, the problem has amenable mathematical properties and permits relatively simple solution strategies. More relevant to our work, [20], [1] studied the problem in the context of AC-OPF. To ensure feasibility, in [20] generation set points produced by a NN were passed into a power flow solver. This speeds up the computation since the power flow equations are easier to solve than AC-OPF. In [1], the authors applied primal-dual updates to the Lagrangian dual function of the underlying constrained optimization problem, where a NN predicts the generation set points and the power system states. Notably, the approach proposed by [1] produces more accurate and cost-effective solutions than those by the DC-OPF approximation, which is commonly used in industry.

However, there are key limitations to the OPF-then-learn paradigm. Firstly, a large labeled data set needs to be generated for training. Given the convergence issues of solving AC-OPF, some load configurations may not have a solution from commercial solvers. Secondly, the learned solution may not generalize to unseen scenarios, and re-training the system on incoming demand assignments requires creating a new training set. Because of this, the authors of [16] proposed OPF-and-learn as an alternative paradigm where generation cost is optimized directly by computing gradients through the OPF problem. While OPF-then-learn is analogous to imitation learning, OPF-and-learn can be seen as a one-step Markov Decision Process (MDP). However, the OPF-and-learn approach proposed in [16] relies on the DC-OPF linearization, which limits the accuracy of solutions and is not applicable to heavily-loaded or distribution networks. Furthermore, when assuming the DC-OPF linearization, creating a training set is trivial because optimal solutions can be obtained reliably and fast. Thus, by making the DC-OPF assumption, the circular dependence on training data is broken, but a significantly easier problem is solved, which limits the potential advantages of the OPF-and-learn paradigm. To the best of our knowledge, this paper introduces the first OPF-and-learn approach that respects the full non-linear power flow constraints as well as non-convex unit commitment and non-linear physical and security constraints.
III. PROPOSED LEARNING FRAMEWORK
Let c(v) be the cost associated with an assignment of nodal voltages v ∈ C^N, with N being the number of buses of the system. In the following, without loss of generality, we define the cost in terms of v for presentational simplicity. However, numerous constraints need to be enforced, some of which are non-linear and non-convex, i.e. the load flow problem is traditionally posed as:

minimize w.r.t. v: c(v)  (1)
subject to: S = S_g − S_d = diag(v)(Yv)^*  (2)
k_i(v) ≤ 0  (3)
h_i(v) = 0  (4)

with S_d ∈ C^N and S_g ∈ C^N being a demand and generation assignment, respectively, and Y ∈ C^{N×N} being the bus admittance matrix. For different demand assignments S_d, this optimization problem would be solved over and over again. Our proposed learning-based formulation following the OPF-and-learn paradigm is as follows: First, we note that the nodal voltages v are a function of S_d and S_g, i.e. v = v(S_d, S_g) with v solving the power flow equations (2). Second, we introduce a function that is tasked with producing optimal generator configurations S'_g as a function of the system state, in this case the demand S_d, in an expected risk minimization setting. Thus, S'_g = g_Θ(S_d) with Θ parameterizing the function g. Third, we assume knowledge of a data set D containing historic demand assignments, i.e. a collection of possible states S_d ∈ D. Note that unlike OPF-then-learn approaches [cite,cite,cite] we do not assume knowledge of the corresponding optimal generator assignments. The goal is to estimate the parameters of the function g, in this case Θ, that produce optimal generation assignments as a function of the demand. As an optimization objective and learning signal for the function g, we propose the following:

minimize w.r.t. Θ: Σ_{S_d ∈ D} c(v(S_d, g_Θ(S_d))) = L  (5)
subject to: k_i(v) ≤ 0  (6)
h_i(v) = 0  (7)

When compared to traditional optimization-based approaches, this problem formulation has a number of advantages:
1) The non-linear power flow constraint (2) vanishes. This increases robustness and avoids convergence to non-physical solutions, given a differentiable and robust power flow solver.
2) As we will show later, optimization under non-convex constraints can be amortized, i.e. the time training the system is spent once and, after training, inference is extremely fast and solely requires a forward-pass through a NN, even when respecting binary unit-commitment constraints.
3) Because training g entails learning an optimal mapping between Euclidean spaces that represent demand and generation assignments respectively, the proposed approach exploits covariances between load flow problems and allows for generalization to unseen problems.

However, optimizing Θ w.r.t. (5) poses challenges: If gradient descent is used for optimization, gradients need to be defined. Applying the chain rule to L yields:

∂L/∂Θ = (∂c/∂v)(∂v/∂g)(∂g/∂Θ)

Thus, the loss is differentiable if the cost function c, the voltage function v and the actor function g are differentiable. g and c can usually be assumed to be differentiable; however, the fact that computing the gradient through a power flow solver w.r.t. the generation assignment S_g, i.e. ∂v/∂S_g, is possible might not be obvious. In section IV, we will show that computing the gradient through a robust power flow solver that represents the full non-linear AC-OPF equations, namely the Holomorphic Embedded Load Flow Method, is indeed possible. Doing this allows us to obtain the learning signal ∂L/∂Θ. Furthermore, the constraints (6) and (7) need to be enforced.
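To make the chain-rule reasoning concrete, the following toy sketch differentiates through an unrolled scalar "solver" and uses the resulting gradient to train a one-parameter policy. Everything here is an illustrative stand-in, not the paper's model: the fixed-point map, the policy g_θ(s_d) = θ·s_d, and the demand values are all made up, and central differences stand in for automatic differentiation.

```python
import numpy as np

def solver(s, iters=40):
    """Toy stand-in for a differentiable power flow solver: unrolled
    fixed-point iteration for v solving v*(v - 1) = -s, a scalar
    caricature of the equality constraint (2)."""
    v = 1.0
    for _ in range(iters):
        v = 1.0 - s / v  # contraction for small |s|
    return v

def loss(theta, demands):
    """Miniature Eq. (5): cost of the solver output summed over a demand
    data set, with the hypothetical policy g_theta(s_d) = theta * s_d."""
    return sum((solver(s_d - theta * s_d) - 1.0) ** 2 for s_d in demands)

def train(demands, steps=200, lr=10.0, h=1e-6):
    """Gradient descent on theta; the learning signal flows *through* the
    unrolled solver (finite differences stand in for backpropagation)."""
    theta = 0.0
    for _ in range(steps):
        grad = (loss(theta + h, demands) - loss(theta - h, demands)) / (2 * h)
        theta -= lr * grad
    return theta
```

At the optimum θ = 1 the net injection vanishes, the solver returns v = 1 and the cost is zero, mirroring how the power-flow constraint is absorbed into the forward pass rather than imposed via projection.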
In section V, we will show that an auxiliary function u can be used to enforce the Karush-Kuhn-Tucker conditions, ultimately allowing us to enforce arbitrary constraints similar, but not identical, to the approaches introduced in [1], [16]. In section VI, we show how the proposed approach can be extended to handle non-convex constraints, e.g. binary unit commitment constraints, by optimizing a variational lower bound whilst introducing little computational overhead during inference. The approach is validated empirically on a 200-bus system. The experimental setup and results are described in section VII. In section VIII, our findings are concluded and pathways for future work are laid out.

IV. HOLOMORPHIC EMBEDDED LOAD FLOW METHOD
HELM was first proposed by Trias [8], [21] and was later extended in e.g. [22], [23]. HELM addresses the convergence issues of power flow solvers based on Householder's methods for root-finding, such as the Newton-Raphson algorithm. Because the power flow equations have multiple roots but only a single physically realizable solution, these types of solvers are at risk of converging to a spurious, unstable or low-voltage solution [24]. Whether or not the 'correct' root is found is usually dependent on the initial condition [7]. Because HELM does not require an initial guess or initial condition, it overcomes the ambiguity problem that Householder's solvers face by performing analytical continuation from a known physically-realizable solution. Analytical continuation also allows HELM to find the root that is on the same branch-cut as the previously known physically-realizable solution. This solution is unique because analytical continuation is unique when the function at hand is holomorphic. HELM treats the complex nodal voltages at each bus as holomorphic functions of a complex scalar z. These functions are evaluated at a point for which obtaining a physically-realizable solution is trivial (usually z = 0) and are then continued to the solution at a desired point where the original power flow equations are recovered (usually z = 1).
Let V(z) be a function of the complex scalar z. Equation (8) then describes such a holomorphic embedding, i.e. obtaining a solution at z = 0 is trivial because no power is flowing, and the original power flow equations (2) are recovered at z = 1. See [23] for a proof that V(z) is indeed holomorphic.

Y V(z) = z S^* / V^*(z^*)  (8)

In order to obtain the power series coefficients required for analytical continuation, V(z) and its reciprocal are approximated by power series expansions, i.e.:

V(z) = Σ_{n=0}^∞ c[n] z^n  (9)
1/V(z) = W(z) = Σ_{n=0}^∞ d^*[n] z^n  (10)

Similar to traditional power flow solvers such as Newton-Raphson, in order to avoid overspecification of the problem, a slack bus is introduced: Let Y^r ∈ C^{(N−1)×(N−1)} be the reduced Y matrix obtained by removing the row and column of the slack bus and y^s ∈ C^{N−1} be the slack-row of Y sans self-admittance. We assume that the voltage at the slack generator is v_s + 0j with v_s ∈ R. For the i-th row of (8) the following then holds:

Σ_k Y^r_{ik} Σ_n c_k[n] z^n + (v_s + 0j) y^s_i = z S^*_i Σ_n d^*_i[n] z^n  (11)

Setting z = 0:

Σ_k Y^r_{ik} c_k[0] = −v_s y^s_i  (12)

Thus, solving the linear system in (12) yields a solution at z = 0. Note that this solution is physically realizable because no power is flowing. Higher-order power series coefficients can be obtained by equating coefficients of the same order and by making use of (Σ_{n=0}^∞ c[n] z^n)(Σ_{n=0}^∞ d^*[n] z^n) = 1, which yields:

d_k[0] = 1 / c_k[0]  (13)
Σ_k Y^r_{ik} c_k[n] = S^*_i d^*_i[n−1]  (14)
d_i[n] = −(Σ_{m=0}^{n−1} c_i[n−m] d_i[m]) / c_i[0]  (15)

After obtaining the power series coefficients, analytic continuation is performed to obtain a solution at z = 1. However, since the radius of convergence is usually smaller than 1, analytical continuation is performed using Padé approximants instead of directly evaluating (9).
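The recursion (12)-(15) amounts to a handful of linear solves and convolutions. A minimal numpy sketch follows; the conjugate placement is our reading of (8)/(11), and the usage sums the series directly at z = 1 instead of using Padé approximants, which is valid only under light loading. The 2-bus admittance and injection values are made up for illustration.

```python
import numpy as np

def helm_coefficients(Yr, ys, vs, S, n_terms):
    """Power-series coefficients of the holomorphic embedding, Eqs. (12)-(15).

    Yr: reduced admittance matrix (slack row/column removed), complex (N-1, N-1)
    ys: slack column of Y without self-admittance, complex (N-1,)
    vs: real slack-bus voltage
    S:  net complex injections S_g - S_d at non-slack buses, complex (N-1,)
    """
    n_bus = len(ys)
    c = np.zeros((n_terms, n_bus), dtype=complex)
    d = np.zeros((n_terms, n_bus), dtype=complex)
    c[0] = np.linalg.solve(Yr, -vs * ys)   # Eq. (12): germ at z = 0, no power flowing
    d[0] = 1.0 / c[0]                      # Eq. (13)
    for n in range(1, n_terms):
        # Eq. (14): next voltage coefficient from the previous reciprocal one
        c[n] = np.linalg.solve(Yr, np.conj(S) * np.conj(d[n - 1]))
        # Eq. (15): convolution enforcing V(z) * W(z) = 1
        d[n] = -sum(c[n - m] * d[m] for m in range(n)) / c[0]
    return c, d

# Hypothetical 2-bus system: one slack bus, one load bus, line admittance y.
y = 1.0 - 10.0j
Yr = np.array([[y]])
ys = np.array([-y])
S = np.array([-0.10 - 0.05j])              # light load, so the series converges at z = 1
c, d = helm_coefficients(Yr, ys, 1.0, S, 30)
V = c.sum(axis=0)                          # direct evaluation of (9) at z = 1
```

The recursion enforces Σ_m c[n−m] d[m] = 0 for n ≥ 1, so the summed series satisfies the embedded equation Y^r V + v_s y^s = S^*/V^* at z = 1, which the test below checks.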
Padé is a rational approximation of power series known to have the widest radius of convergence [25]. Analytical continuation by Padé approximation is performed as follows:

V_i(z) ≈ R(z) = (Σ_{j=0}^{m} a_{i,j} z^j) / (1 + Σ_{k=1}^{m} b_{i,k} z^k)  (16)

Approximants of order m, i.e. a_i and b_i, can be obtained from the power series coefficients by solving a linear system of equations, specifically:

[ I  M(c_i) ] [ a_i ; b_i ] = c_i

with:

M(c_i) = [ 0
           −c_i[1]   0
           −c_i[2]   −c_i[1]   0
           ⋮          ⋮          ⋱
           −c_i[n]   −c_i[n−1]  ⋯  −c_i[n−m] ]

and I being the identity matrix. Because we perform analytical continuation to z = 1, plugging the obtained coefficients into (16) yields: V_i ≈ (Σ_{j=0}^{m} a_{i,j}) / (1 + Σ_{j=1}^{m} b_{i,j}).

A. Differentiating through HELM
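Before differentiating, it helps to pin down the Padé step itself. The following is the textbook diagonal [m/m] construction, equivalent in spirit to, though not syntactically identical with, the [I M(c_i)] system above: the denominator coefficients come from one linear solve that cancels series orders m+1..2m, and the numerator follows by convolution.

```python
import numpy as np

def pade_mm(c, m):
    """Diagonal [m/m] Padé approximant from series coefficients c[0..2m].

    Returns (a, b) such that R(z) = (sum_j a[j] z^j) / (sum_k b[k] z^k)
    with b[0] = 1, matching the '1 + ...' denominator of Eq. (16)."""
    c = np.asarray(c, dtype=complex)
    # Linear system cancelling orders m+1..2m of the series of R(z) - V(z)
    C = np.array([[c[m + i - 1 - j] for j in range(m)]
                  for i in range(1, m + 1)])
    rhs = -c[m + 1:2 * m + 1]
    b_tail = np.linalg.solve(C, rhs)
    b = np.concatenate([[1.0], b_tail])
    # Numerator by convolution of the series with the denominator
    a = np.array([sum(b[j] * c[k - j] for j in range(k + 1))
                  for k in range(m + 1)])
    return a, b

def eval_rational(a, b, z):
    """Evaluate R(z); at z = 1 this is the analytic continuation step."""
    num = sum(aj * z ** j for j, aj in enumerate(a))
    den = sum(bk * z ** k for k, bk in enumerate(b))
    return num / den
```

For a geometric series (the expansion of 1/(1−z)) the [1/1] approximant reproduces the function exactly well outside the series' radius of convergence, which is precisely why HELM can continue to z = 1.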
In the following we will view HELM as a function that maps complex nodal power to complex nodal voltages, i.e. v = v(S_d, g_Θ(S_d)). We will show that v is not holomorphic but R-differentiable in Θ, ultimately allowing us to compute gradients w.r.t. the parameters of the actor function g. By making use of the chain rule, the strategy is to decompose HELM into a succession of functions and show that each function is R-differentiable. Specifically, we decompose HELM into its algorithmic steps, i.e. v(S_d, g_Θ(S_d)) = f_v ∘ f_ab ∘ f_{c,n}(S_d − g_Θ(S_d)) with f_{c,n} computing power series coefficients, f_ab computing Padé approximants and f_v computing voltage phasors given Padé approximants. We then show that f_ab, f_{c,n} and f_v are R-differentiable. Note that f_{c,n} is a recursive function and that writing its gradient out would be tedious. But its gradients can be computed efficiently using the backpropagation algorithm, and the implementation is trivial in any deep learning framework with automatic differentiation capability, e.g. PyTorch and TensorFlow.

As stated earlier, HELM first computes the power series coefficients followed by Padé approximation. The power series coefficients c[n] and d[n] are obtained in alternating fashion: Let f_{c,n} and f_{d,n} be the functions that produce c[n] and d[n] respectively. Note that f_{c,n} requires knowledge of the previous d-coefficient and g_Θ, whereas f_{d,n} is a function of all previous c- and d-coefficients:

corresponds to (14): f_{c,n}(x) = (Y^r)^{−1} f^*_{d,n−1}(x) x^*  (17)
corresponds to (15): f_{d,n}(x) = −(Σ_{m=0}^{n−1} f_{c,n−m}(x) f_{d,m}(x)) / f_{c,0}(x)  (18)

Because of the complex conjugation in (17), v is not holomorphic in x, and hence not in Θ, if x = S_d − g_Θ(S_d). However, it is easy to see that, by induction, (17) and (18) are R-differentiable when f_{c,0} and f_{d,0} are R-differentiable, which is easy to see from (12) and (13). After obtaining the power series coefficients, the Padé approximants a and b are calculated. Note that this also only involves solving a linear system of equations, i.e.

f_ab(x) = [ a ; b ] = [ I  M(x) ]^{−1} x

which is differentiable. Then f_v involves only a summation and a fraction, i.e.:

f_v([ a ; b ]) = (Σ_{i=0}^{m} a_i) / (1 + Σ_{i=1}^{m} b_i)

Although v(x) = f_v(f_ab(f_{c,n}(x))) is not holomorphic, it is R-differentiable in its argument x and, when applied to x = S_d − g_Θ(S_d), it is R-differentiable in Θ.

V. ENFORCING CONSTRAINTS
A. A priori constraints
As stated earlier, we treat the generation assignment S_g as the output of a parameterized function g_Θ. Because of the reasoning laid out earlier, we require g to be differentiable and, because of recent successes of NNs in non-linear optimization, we choose g to be a NN with a penultimate sigmoidal layer [20]. We incorporate the generation limits of the generators into the output layer of the NN and therefore enforce generation limits by construction. Let σ ∈ (0,1)^{2N_g} be the penultimate layer with N_g being the number of generator buses. Thus, every generator is associated with two neurons, i.e.:

g_Θ(S_d)_i = (S_g)_i = (P^max_i − P^min_i) σ_i + P^min_i + j((Q^max_i − Q^min_i) σ_{i+N_g} + Q^min_i)

with P^max_i, P^min_i, Q^max_i and Q^min_i being the active and reactive generation limits respectively. Because σ is bounded by (0,1), non-slack generation limits cannot be violated. However, other constraints such as voltage magnitude or thermal line limits are not enforced by construction. That is why, in the next section, we show how to enforce what we call a posteriori constraints, i.e. constraints whose violation is only known after evaluating v.

B. A posteriori constraints
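The a priori output layer of Sec. V-A above can be sketched in a few lines; the generation-limit values used in the usage below are hypothetical, and a real implementation would sit at the end of a trained network rather than take raw logits.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def output_layer(logits, p_min, p_max, q_min, q_max):
    """Map unconstrained NN outputs to complex generation set points that
    satisfy box limits by construction, as described in Sec. V-A.

    logits: 2*N_g raw outputs (first N_g for active, last N_g for reactive power)
    """
    n_g = len(p_min)
    s = sigmoid(np.asarray(logits, dtype=float))
    p = (p_max - p_min) * s[:n_g] + p_min          # active power in [P_min, P_max]
    q = (q_max - q_min) * s[n_g:] + q_min          # reactive power in [Q_min, Q_max]
    return p + 1j * q

# Hypothetical limits for two generators
p_min, p_max = np.array([0.1, 0.0]), np.array([1.0, 0.8])
q_min, q_max = np.array([-0.3, -0.2]), np.array([0.3, 0.2])
```

Because the sigmoid output is strictly inside (0,1), no gradient projection is ever needed for these constraints: any parameter setting of the network yields a feasible set point.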
We adapt ideas from mathematical optimization to enforce arbitrary constraints on v. In mathematical optimization, the Karush-Kuhn-Tucker conditions (KKT conditions) are necessary conditions for a solution to be optimal [26]. Given the optimization problem (1) expressed in terms of v, the KKT conditions state that a solution v' is locally optimal, under some regularity conditions, when there exist μ_i such that:

• ∀i: μ_i ≥ 0 (dual feasibility)
• ∀i: μ_i k_i(v') = 0 (complementary slackness)
• ∀i: k_i(v') ≤ 0 (primal feasibility)
• ∇c(v') + Σ_i μ_i ∇k_i(v') = 0 (stationarity)

Note that, without loss of generality (because any equality constraint can be expressed as two inequality constraints) and for notational convenience, we restrict the optimization problem to only have inequality constraints. However, as stated earlier, we are not interested in the solution of a single constrained optimization problem but instead in solutions to all instances of a class of optimization problems. In this case, S_d, i.e. the demand assignment, specifies the instance of the optimization problem whereas the network topology, i.e. the admittance matrix Y, specifies the class. First, we note that the KKT multipliers are dependent on the instance of the optimization problem, thus instead of introducing a scalar μ_i, we introduce a scalar-valued function u_ψ(S_d). In order to enforce dual feasibility by construction, we choose u to be a NN with soft-plus output parameterized by ψ. Furthermore, let g^{S_d}_Θ = v(S_d, g_Θ(S_d)), (u^{S_d}_ψ)_i = u_ψ(S_d)_i the i-th output of u, and k^+_i(v) = max(k_i(v), 0). We will now introduce a learning criterion and show that local optima of this criterion fulfill the KKT conditions for instances of the class contained in the training set.
As a learning criterion we propose:

L(S_d) = c(g^{S_d}_Θ) + Σ_i (u^{S_d}_ψ)_i k^+_i(g^{S_d}_Θ)  (19)
∀ S_d ∈ D: max_ψ { min_Θ { L(S_d) } }  (20)

We will now show that, after convergence, for all S_d ∈ D, v' = g^{S_d}_Θ is locally optimal under some regularity conditions, i.e. it fulfills the KKT conditions, and furthermore, that the KKT multipliers for which the KKT conditions hold are:

μ_i = (u^{S_d}_ψ)_i if k_i(g^{S_d}_Θ) = 0, and 0 otherwise  (21)

• Dual feasibility: μ_i is dual feasible by construction: it is either 0 or greater than 0 because it is the output of a soft-plus NN.
• Complementary slackness: Follows directly from (21).
• Primal feasibility: Since (20) converged, we know that ∂L/∂(u^{S_d}_ψ)_i = 0 and, since ∂L/∂(u^{S_d}_ψ)_i = k^+_i(v') = 0, v' must be primal feasible. Or in other words: if v' were not primal feasible, then k^+_i(v') > 0, but then the maximization step of (20) could have increased L by increasing μ_i, which is a contradiction to the assumption that (20) has converged.
• Stationarity: Follows directly from the assumption that (20) has converged. Note that substituting k^+_i for k_i does not have an influence because if k_i(v) ≠ 0 then the corresponding μ_i = 0 (complementary slackness), and when k_i(v) = 0 then ∇k_i(v) = ∇k^+_i(v).

Note that the detour of substituting k^+_i(v) for k_i(v) improves performance substantially. Without the substitution, because more constraints are complied with initially, the NN drives the outputs before the soft-plus non-linearity to −∞ in order to make the corresponding μ_i equal to 0. The output units are then 'dead' and, because the gradient of the output non-linearity is close to 0, will stay 0.

C. Enforcing Physicality
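A scalar caricature of the max-min criterion (19)-(20) from Sec. V-B above illustrates the mechanics: the dual variable is a softplus output, the hinge k^+ is used exactly as in the text, and the iterates settle near the KKT point. The problem (minimize x² subject to 1 − x ≤ 0, with optimum x = 1, μ = 2) and all step sizes are illustrative, not the paper's setup.

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def primal_dual(steps=20000, lr=2e-3):
    """Toy version of Eqs. (19)-(20): minimize c(x) = x^2 subject to
    k(x) = 1 - x <= 0. x plays the role of g_Theta; u = softplus(psi)
    plays the role of the dual network u_psi."""
    x, psi = 0.0, 0.0
    for _ in range(steps):
        u = softplus(psi)
        kplus = max(1.0 - x, 0.0)              # hinge k+ = max(k, 0)
        # min step in x: dL/dx = c'(x) + u * d k+ / dx
        grad_x = 2.0 * x + (-u if kplus > 0.0 else 0.0)
        x -= lr * grad_x
        # max step in psi: dL/dpsi = k+ * softplus'(psi), always >= 0
        psi += lr * kplus * sigmoid(psi)
    return x, softplus(psi)
```

Note how dual feasibility holds by construction (softplus is nonnegative) and the ascent direction on ψ vanishes exactly when the constraint is satisfied, mirroring the complementary-slackness argument above.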
So far, we have shown how to enforce 'a priori' constraints, i.e. constraints whose violation is known before inferring nodal voltages, by construction, as well as 'a posteriori' constraints, i.e. constraints whose violation is known after inferring nodal voltages, by introducing a learning objective that, after convergence, will enforce the KKT conditions. However, we have not yet shown how to keep the function g in the physical regime, i.e. prevent g from producing a generation assignment S_g for some S_d such that there is no v that fulfills the power flow equations (2).

Fig. 2: The log-mismatch ε as a function of α, i.e. ε(v(S_d, αS_g)). α scales a physical solution, i.e. when α = 1 the corresponding ε is small. The number of HELM iterations n is color coded.

An extreme example of a non-physical tuple (S_d, S_g), for any demand assignment S_d for which Σ_i real(S_d)_i > 0, is S_g = 0 (the zero vector). First, we note that HELM will always produce complex nodal voltages, even for non-physical tuples. However, for non-physical tuples the power flow equations (2) will not hold, i.e. there is a mismatch between the RHS and LHS of (2). We quantify this mismatch by defining:

ε(v) = || S_g − S_d − diag(v)(Yv)^* ||_∞

The goal now is to enforce that ε(v) < ξ, with ξ being some parameter which specifies when a power flow solution is deemed physical. Note that because ε is a function of v, in principle, an additional inequality constraint could be introduced, i.e. k_i(v) = ε(v) − ξ ≤ 0, and one could try to enforce this constraint as an a posteriori constraint as described earlier. However, in our experience this approach struggles, i.e. the learning objective usually does not converge. Figure 2 gives an intuition as to why this is the case. Figure 2 shows log(ε) as a function of α on a 200 bus system.
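The mismatch ε defined above is cheap to evaluate; a minimal numpy version follows, with the net injection S = S_g − S_d passed as a single vector. The 2-bus admittance and voltage values in the usage are hypothetical.

```python
import numpy as np

def mismatch(v, S, Y):
    """Infinity-norm power-flow mismatch eps(v) = || S - diag(v) (Y v)^* ||_inf,
    where S = S_g - S_d is the net complex injection vector."""
    return np.max(np.abs(S - v * np.conj(Y @ v)))

# Hypothetical 2-bus system and voltage profile
y = 1.0 - 5.0j
Y = np.array([[y, -y], [-y, y]])
v = np.array([1.0 + 0.0j, 0.98 - 0.02j])
S_exact = v * np.conj(Y @ v)   # the injections this voltage profile implies
```

For the exact injections the mismatch is zero to machine precision; perturbing one injection by δ raises ε to exactly |δ|, which is the quantity compared against the physicality threshold ξ.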
α scales the generation S_g of a physical tuple (S_d, S_g), i.e. the y-axis shows log(ε(v(S_d, αS_g))). Note that when α is either very small or very large, ε is close to flat and therefore the gradient of ε is close to 0. After randomly initializing the parameters of the function g, its guesses about optimal generation assignments will naturally be bad, which corresponds to scaling the optimal generation assignment with a small or big α. However, the function cannot improve its guesses by gradient descent because the gradient will be close to 0. In order to overcome this problem, we propose to optimize a proxy of the actual mismatch function ε. Note that an indicator of whether or not a solution is physical is whether or not the power series coefficients c_i[n] have converged to 0. Let c̄[n] be the mean n-th power series coefficient of all voltages, i.e. c̄[n] = Σ_i c_i[n]/N. Figure 3 shows a scatter-plot of log c̄[n] and log ε. Empirically, one can see that small c̄[n] is a sufficient condition for small ε, however not a necessary condition.

Fig. 3: Small log c̄[n] is a sufficient condition for small log(ε). However, the opposite is not true, i.e. small log(ε) does not bound log c̄[n].

That is, a small c̄[n] implies small ε but not vice versa. Thus, in order to enforce physicality, c̄[n] can be minimized as a proxy for ε. However, one might think that optimizing log c̄[n] is unnecessarily restrictive, i.e. it excludes solutions where the power series coefficients did not converge to 0 but the corresponding v nevertheless fulfills the power flow equations. But, as we will show later, imposing voltage magnitude constraints naturally enforces physicality, and additionally minimizing log c̄[n] is only required after the actor function g is first initialized, in order to 'nudge' it into the physical regime.

VI. BINARY CONSTRAINTS
Binary constraints naturally occur in optimal power flow when incorporating the possibility of completely shutting down generators. Introducing this constraint, also known as unit commitment, makes the generation limit constraints non-convex, i.e. 0 becomes a possible generation assignment even though points between 0 and P^min are not valid. Typically, optimal power flow solvers employ mixed integer programming techniques such as branch-and-bound or branch-and-cut [14] algorithms to tackle this problem. However, these algorithms can incur substantial computational cost, i.e. every branch requires solving an LP-relaxed optimal power flow problem and there are exponentially many branches in a worst-case scenario. However, using the problem formulation introduced here, because the constraint is a priori and can be enforced by construction, we can reduce the non-convex constraint into the problem of inferring the mode of a probability distribution P over binary configurations. As we will show later, because inference in P is intractable, we optimize a variational bound, i.e. we introduce a variational distribution Q_φ for which posterior inference is tractable and choose the variational parameters φ in such a way that Q_φ best approximates P. Specifically, we build on recent advances in Bayesian inference, specifically Variational Inference [27], and train a variational distribution Q_φ parameterized by a NN. See [28], [29] for recent reviews of Variational Inference. We begin by showing that computing the optimal binary configuration is equivalent to computing the mode of a distribution P. Let b ∈ {0,1}^{N_g} be the vector describing which generators are turned on or off, p(b, S_d) be an exponential distribution, p(b|S_d) its posterior (Boltzmann distribution) and L be the loss as defined in (19), i.e.
p(b, S_d) = λ exp(−λ L(b · S_d))   (22)

p(b | S_d) = exp(−λ L(b · S_d)) / Σ_{b' ∈ B} exp(−λ L(b' · S_d))   (23)

It is easy to see that computing the mode of (23), i.e. arg max_b p(b | S_d), is equivalent to choosing the binary configuration that results in the smallest loss. However, naïve evaluation of the mode is usually intractable because of the intractable denominator: naïvely computing the mode of (23) is equivalent to brute-force search, i.e. enumerating all possible latent configurations and picking the one with the smallest error. In the following, we show how ideas from Variational Inference [27] can reduce the computational burden of inference.

We introduce a variational distribution Q_φ whose posterior is tractable. Specifically, we choose q(b | S_d) to be a multivariate Bernoulli distribution and ensure tractability with ideas introduced in [30]. Note that Q_φ is parameterized by a NN, ensuring that inference at test time is fast. As a learning signal for the parameters φ of the auxiliary posterior distribution, we choose the Evidence Lower Bound defined by:

L_BO(φ) = E_{q_φ(b | S_d)} [ log p(b, S_d) / q_φ(b | S_d) ]   (24)
        = log p(S_d) − D_KL( q_φ(b | S_d) || p(b | S_d) )   (25)

Note that optimizing (24) does not require knowledge of the intractable posterior of P but nevertheless allows for minimizing a divergence measure between the true posterior (P) and the auxiliary posterior (Q). Thus, after training, because P and Q will be maximally similar, posterior inference is performed on Q instead in order to obtain an approximation of the mode of P. However, the price for this 'trick' is increased variance: it can be shown that the stochastic gradient estimator of (24) w.r.t. φ is an unbiased but higher-variance estimator of the KL-divergence [31]. In order to combat variance, a decades-old variance reduction technique is employed, namely sampling without replacement. Sampling without replacement from Q is not trivial.
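To illustrate what sampling without replacement from a distribution over binary configurations involves, the sketch below uses the well-known Gumbel-top-k trick on a toy factorized Bernoulli with a small number of generators. This is an illustrative alternative construction, not the Pareto or elimination samplers the paper actually employs, and the on-probabilities are made up for the example:

```python
import itertools
import math
import random

def gumbel_top_k(logits, k, rng=random):
    """Sample k distinct indices (without replacement) from the
    categorical distribution softmax(logits) via the Gumbel-top-k
    trick: perturb every logit with Gumbel(0, 1) noise and keep
    the indices of the k largest perturbed values."""
    keys = [l - math.log(-math.log(rng.random())) for l in logits]
    return sorted(range(len(logits)), key=lambda i: -keys[i])[:k]

# Toy factorized Bernoulli q(b) over N_g = 4 generators
# (hypothetical on-probabilities, for illustration only).
probs = [0.9, 0.2, 0.7, 0.5]
configs = list(itertools.product([0, 1], repeat=len(probs)))
logits = [sum(math.log(p if bit else 1.0 - p)
              for bit, p in zip(b, probs)) for b in configs]

# Five distinct candidate binary configurations, mirroring the
# strategy of drawing several samples and keeping the cheapest.
samples = [configs[i] for i in gumbel_top_k(logits, k=5)]
```

Note that this enumerates all 2^{N_g} configurations to build the logits, which is only tractable for small N_g; avoiding that enumeration is precisely what makes sampling without replacement from Q non-trivial.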
There is, however, a considerable body of preexisting work that we make use of: the sampling scheme introduced in [32] is employed with slight modifications. Specifically, instead of using the Pareto sampler as the underlying sampling mechanism, the slightly slower but more accurate elimination sampler introduced in [33] is used.

Fig. 4: A graphical depiction of the LOPF pipeline. NNs b and g are fed the complex demand S_d and tasked with producing the optimal binary activation vector and generator configuration, respectively. Because the voltages are a function of demand and generation, both are fed into the HELM-based power flow solver v. The loss L is computed based on the resulting voltages. In order to ensure that control and network inequality constraints are satisfied, a third NN is tasked with predicting Lagrange multipliers. Because HELM is differentiable, the whole pipeline can be optimized jointly.

In order to obtain an approximation of the mode of the true posterior, because Q allows for drawing samples efficiently, S-many samples are drawn from Q. Then, in order to approximate the mode of P, out of the S-many binary configurations sampled from Q, the one which results in the smallest generation cost is chosen. Note that the optimal configuration of generators depends on the binary configuration, thus b should additionally be fed into the actor function g. Figure 4 shows a graphical depiction of the proposed data pipeline.

VII. THE LOPF-ALGORITHM
In this section we summarize the resulting algorithm, which we call Learning Optimal Power Flow, or LOPF for short. The algorithm iterates over batches of the data set D, making updates to the three constituent NNs g, b and u. It is described in pseudo-code in Algorithm 1. For notational convenience, we define a function solve:

S_g = b · g_Θ(S_d, b)

solve(S_d, b) = [ ε(v(S_d, S_g)),  c(v(S_d, S_g)) + Σ_i (u_ψ(S_d))_i k_i(v(S_d, S_g)),  N^{−1} Σ_i (f_{c,n}(S_d − S_g))_i ]^T

Figure 5 shows a graphical depiction of the input/output relationships of the individual networks. Note that the network g not only produces active and reactive generation assignments for non-slack generators but also the voltage at the slack bus. Furthermore, the magnitudes by which constraints are violated, denoted by k, are fed into the network u that produces a proxy of the Lagrange multipliers. Additionally feeding k into the u-network eases and speeds up learning considerably.

VIII. EXPERIMENTS
Since this work introduces a learning-based approach to the problem of ACOPF, the performance of the algorithm is evaluated similarly to how the performance of reinforcement learning agents is evaluated, i.e. an empirical evaluation strategy is employed. Specifically, given a held-out test set of load flow problems that the system was not presented with during training, the generation cost and whether or not the system was able to find a feasible solution are recorded.
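The empirical evaluation protocol above can be sketched as follows. The agent, solver, and feasibility threshold below are stand-ins invented for illustration, not the paper's actual components:

```python
def evaluate(agent, solver, test_set, eps_threshold=1e-8):
    """For every held-out demand, record whether the agent's
    proposed dispatch is feasible and, if so, what it costs."""
    feasible, costs = [], []
    for demand in test_set:
        dispatch = agent(demand)  # a single NN forward pass: fast
        eps, cost, within_limits = solver(demand, dispatch)
        ok = eps < eps_threshold and within_limits
        feasible.append(ok)
        if ok:
            costs.append(cost)
    rate = 100.0 * sum(feasible) / len(feasible)
    mean_cost = sum(costs) / len(costs) if costs else float("nan")
    return rate, mean_cost

# Toy stand-in agent and solver, for illustration only.
def toy_agent(demand):
    return 0.5 * demand

def toy_solver(demand, dispatch):
    # Mismatch is tiny when dispatch covers the demand exactly.
    eps = abs(demand - 2.0 * dispatch) * 1e-12
    return eps, dispatch, True

rate, mean_cost = evaluate(toy_agent, toy_solver, [1.0, 2.0, 3.0])
```

This is the shape of the numbers reported below: a feasibility rate over the test set, and a mean cost computed over the feasible instances.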
Algorithm 1: LOPF algorithm in pseudo-code

input : data set D
output: trained model parameters Θ, φ and ψ

Initialize Θ, φ and ψ randomly;
while not converged do
    for number of subsets of D do
        select d ⊂ D;
        for S_d ∈ d do
            G_L = {}; G_c = {};
            B ∼ q(b | S_d) (without replacement);
            for b' ∈ B do
                ε, L, c ← solve(S_d, b');
                if ε < ξ then G_L ← G_L ∪ {L};
                else G_c ← G_c ∪ {c};
            Compute L_BO based on (24);
            Maximize L_BO w.r.t. φ;
            Maximize Σ_{L ∈ G_L} L w.r.t. ψ;
            Minimize Σ_{L ∈ G_L} L w.r.t. Θ;
            Minimize Σ_{c ∈ G_c} c w.r.t. Θ;

Fig. 5: The input and output relationships of the three constituent NNs. a) The g-network produces active and reactive power generation for non-slack generators as well as the voltage at the slack bus, given the demand S_d and the binary configuration produced by the b-network. b) The binary network parameterizes an auxiliary distribution; note that it produces multiple binary configurations by sampling from this distribution. c) The Lagrange network produces a proxy of the Lagrange multipliers; the constraint-violation magnitude k is additionally fed into the network to ease learning.

The requirements for feasibility, excluding those that are met by construction, are the following:
• The log-mismatch between the RHS and LHS of the power flow equations (2), i.e. ε, must be smaller than −
• Slack active and reactive generation are within limits
• Non-slack voltage magnitude constraints are met

The experiments were conducted on the 200-bus Illinois IEEE test case [9]. However, since the IEEE test cases only
                              LOPF        MIPS
Feasible [%]                  99.86       60.85
Mean cost [USD]               33325.98    25817.55
Mean time per instance [s]    1.2         14.4
TABLE I: Comparison of LOPF and MIPS in terms of robustness, optimality and speed on a held-out test set.

contain a single demand assignment, the demand base case was superimposed with temporal patterns extracted from the RE Europe data set [34], which contains historical demand for three years at an hourly interval. Let S_d' be the base demand taken from the test case and x_t the temporal demand patterns taken from the RE Europe data set. The temporal patterns were imposed such that the mean demand of every node equals the demand in the test case and such that the ratio between mean and standard deviation seen in the RE Europe data set is preserved. The data set was separated into a training set (20,280 data points) and a test set (6,000 data points) when conducting the experiments.

The NNs used in these experiments are standard fully connected three-layer networks with intermediate tanh activations; all intermediate layers have the same number of hidden units.

IX. RESULTS
As stated earlier, learning-based approaches to control problems are usually not guaranteed to be optimal but can offer advantages in terms of computational time and robustness. The performance of LOPF reinforces these expectations. Figure 7 (left) shows the percentage of load flow solutions produced by the system that violate any requirement for feasibility, as a function of learning steps. One learning step encompasses 32 load flow problems, and for each of the 32 load flow problems 50 candidate binary configurations are drawn from the auxiliary distribution. One can see that the system quickly learns to produce feasible solutions: initially the system produces feasible solutions for none of the load flow problems, but after just 30 steps close to all solutions proposed by the system are feasible.

Figure 7 (right) shows the log-mismatch between the RHS and LHS of the power flow equations (2), i.e. log ε. Note that initially the proposed approach produces generation assignments for which the HELM solver is unable to produce voltage phasors that fulfill the power flow equations, but by minimizing the power series coefficients as described in Section V-C, the system is quickly nudged into a regime where the proposed solutions fulfill the power flow equations. However, after approximately 75 learning steps, for a short period of time, the system again produces generation assignments that do not fulfill the power flow equations. This can most likely be explained by the fact that the system also tries to minimize cost: by trying to find cheaper generation assignments, the system left the regime in which solutions can be found by HELM but was then steered back into this regime.

Table I showcases the performance of our proposed algorithm in comparison to the MIPS solver proposed in [9]. In order
Fig. 6: Comparison of generation cost on the first 300 load flow problems of the test set.

Fig. 7: Left: the percentage of feasible solutions as a function of learning steps. Right: the average log-mismatch of the power flow equations (2) as a function of time.

to deal with the non-convex generation limit constraints, the MIPS solver was run with a unit-decommitment heuristic (runuopf). When obtaining the results for the MIPS solver, all initializations were left unchanged and only the demand was varied in the way described earlier. Slightly varying the demand reveals the weakness of traditional solvers: convergence to a feasible solution cannot be guaranteed. In our experiments, the MIPS solver produced solutions which comply with all constraints and fulfill the power flow equations in only about 61% of all problem instances (failure in 2349 out of 6000 instances). Our proposed solution produces feasible solutions for 99.86% of the problem instances (failure in 8 out of 6000 cases); when LOPF fails, it slightly violates voltage magnitude constraints. On top of that, our proposed learning-based approach is considerably faster than optimization-based approaches: because solutions are obtained by feeding a demand assignment through the NN, and the forward pass through NNs is fast, obtaining the generation assignment proposed by the system is fast. Note that the time per instance we report for the MIPS solver is the mean time over all load flow problems; when the solver fails, it usually fails quickly. If only the time per successful instance were reported, the mean time per instance of the MIPS solver would be close to 30 s.

However, Table I and Figure 6 more clearly reveal the main weakness of our proposed learning-based approach: even though solutions can be obtained robustly and fast, the proposed learning system does not find solutions that are optimal in terms of generation cost. On average, the solutions that
the approach produces are approximately 29% more expensive than the solutions found by the MIPS solver. Note that the average cost is reported only for those load flow problems for which both approaches yielded feasible solutions.

X. CONCLUSION AND FUTURE WORK
The main contribution of this paper is the introduction of a learning-based framework for the problem of ACOPF that respects the full non-linear ACOPF equations. Specifically, we introduce a learning-based approach in which a function is tasked with producing feasible, minimal-cost generation assignments as a function of the demand. A learning signal for this function is obtained by differentiating through the operations of a load flow solver. Furthermore, we show how convex security constraints and non-convex generation limit constraints can be enforced. The resulting system produces feasible solutions fast; however, these solutions are not necessarily optimal in terms of generation cost.

An obvious future research path is to close the optimality gap. At this moment, because of the complexity of the resulting system, it is hard to understand why the solutions are not optimal. Note that the proposed algorithm cannot be optimal by design, because the slack generator cannot be decommitted and there is a natural interpolation between load flow solutions. However, in the opinion of the authors, performance gains in terms of optimality should be possible.

Another potentially interesting research question is whether or not the trained auxiliary distribution Q, which learns the cost surface as a function of the binary generator configuration, allows for conditional sampling. Imagine a scenario where generators have failed. In such a scenario, it is paramount to reconfigure the network into a feasible state fast. If it is possible to sample from Q conditioned on the failed generators being off, then the proposed learning-based approach could potentially find application in emergency and security-sensitive situations. Note that this seemingly easy problem is not trivial because of the FactorNet [30] structure of the auxiliary distribution.
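The conditional-sampling question raised above can be made concrete with a naive rejection-sampling baseline: draw from the unconditional distribution and keep only samples in which the failed generators are off. This is tractable only when the conditioning event is not too rare, which is part of why conditioning the actual FactorNet-structured Q efficiently is an open question. The factorized Bernoulli below is a stand-in assumption, not the paper's distribution:

```python
import random

def sample_bernoulli(probs, rng):
    """One draw from a factorized Bernoulli with on-probabilities probs."""
    return tuple(1 if rng.random() < p else 0 for p in probs)

def rejection_conditional(probs, failed, n, rng=None):
    """Draw n samples conditioned on every generator index in
    `failed` being off, by simple rejection sampling."""
    rng = rng or random.Random(0)
    out = []
    while len(out) < n:
        b = sample_bernoulli(probs, rng)
        if all(b[i] == 0 for i in failed):
            out.append(b)
    return out

# Hypothetical 4-generator example with generator 1 failed.
samples = rejection_conditional([0.9, 0.3, 0.6, 0.5], failed={1}, n=100)
```

The expected number of draws grows like the inverse probability of the conditioning event, so this baseline degrades quickly when many generators fail simultaneously.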