Balancing Wind and Batteries: Towards Predictive Verification of Smart Grids
BBalancing Wind and Batteries: TowardsPredictive Verification of Smart Grids (cid:63)
Thom S. Badings , Arnd Hartmanns , Nils Jansen , and Marnix Suilen Department of Software Science, Radboud University, Nijmegen, The Netherlands University of Twente, Enschede, The Netherlands
Abstract.
We study a smart grid with wind power and battery storage.Traditionally, day-ahead planning aims to balance demand and windpower, yet actual wind conditions often deviate from forecasts. Short-term flexibility in storage and generation fills potential gaps, planned ona minutes time scale for 30-60 minute horizons. Finding the optimal flexi-bility deployment requires solving a semi-infinite non-convex stochasticprogram, which is generally intractable to do exactly. Previous approachesrely on sampling, yet such critical problems call for rigorous approacheswith stronger guarantees. Our method employs probabilistic model check-ing techniques. First, we cast the problem as a continuous-space Markovdecision process with discretized control, for which an optimal deploymentstrategy minimizes the expected grid frequency deviation. To mitigatestate space explosion, we exploit specific structural properties of the modelto implement an iterative exploration method that reuses pre-computedvalues as wind data is updated. Our experiments show the method’sfeasibility and versatility across grid configurations and time scales.
Electricity grids need to constantly maintain a balance between power supplyand demand; imbalances result in frequency deviations, which ultimately lead tocritical events like blackouts [22]. The increasing deployment of renewable energysources such as solar and wind power—which react sharply to hard-to-predictweather conditions—makes maintaining the balance increasingly difficult. Inday-to-day operation, the balancing is managed at two time scales. One dayahead, the transmission system operator (TSO) schedules conventional generatorsto match the predicted demand minus the expected renewable generation basedon weather forecasts. During the day, the TSO fills potential gaps introducedby any mismatch between forecast and actual weather conditions by ancillaryservices , which run on a short-term schedule that is updated every few minutes.On the supply side, the most prominent ancillary service are spinning re-serves from generators. They traditionally compensate for contingencies (such asgenerator failures) and deviations from the predicted demand. To free capacityfor spinning reserves, some generators operate below their rated capacity, makingthem a costly service. The TSO’s day-ahead plan thus makes a tradeoff between (cid:63)
This research has been partially funded by NWO grants NWO OCENW.KLEIN.187and NWA.1160.18.238, and NWO VENI grant no. 639.021.754. a r X i v : . [ ee ss . S Y ] F e b T.S. Badings, A. Hartmanns, N. Jansen, and M. Suilen allocating sufficient reserves to mitigate any potential imbalance and minimizingunused generator capacity [35].
Demand-side flexibility [3], on the other hand,is provided by various assets connected to the grid, including batteries [18] andHVAC systems [11,39]. They can reduce or increase the overall power consump-tion at some time point by injecting or withdrawing electricity into or fromthe grid. Such flexibility-based services shift power consumption in time ratherthan changing the total [21]. Examples today include the ODFM service in theUnited Kingdom [26] and various applications of demand response throughoutthe world [1]. In this paper, we approach the fundamental challenge of short-termscheduling for flexibility-based ancillary services:Given a power grid with significant uncertain wind power generation,optimally schedule the deployment of the available ancillary services overa finite horizon to minimize the expected total grid frequency deviationwithout violating any hard constraints on grid stability and operation.Hard constraints include a maximum frequency deviation (we use ± . scenario optimization approach [23,34], which linearizesthe nonconvex constraints and solves the resulting linear (but still semi-infinite)stochastic optimization problem up to some statistical confidence and error.The drawbacks of this approach are the approximation error introduced by thelinearization step and the statistical error due to the use of sampling [9]. Our contribution.
To overcome the need for both sampling and linearization, wemodel the problem as a Markov decision process (MDP) [31]. MDPs combineprobabilistic choices, which we use to follow a white-box DTMC for the winderrors, and nondeterminism, which we use to capture the service deploymentdecisions to optimize over. A direct cast of the problem into an MDP would yieldcontinuous state and action spaces: state variables would represent continuousquantities (e.g. grid frequency), and control decisions would range over real-valuedintervals (e.g. charge current applied to a battery). We thus (i) discretize thecontrollable values into finitely many control actions . Since (ii) control decisionsare only made every few minutes, the model is discrete-time. Further, (iii) windconditions at the current time are known, so there is a single initial state for every(iv) finite horizon. The combination of (i)-(iv) entails that we can only reacha finite subset of the continuous state space. Thus we can build an MDP withfinitely many states and actions. We use the original (non-linearized) continuousdynamics of the power grid to compute the successor state following an action. Acost function penalizes frequency deviations; violations of hard constraints lead owards Predictive Verification of Smart Grids 3 to absorbing non-goal states. An action selection strategy that minimizes theexpected accumulated cost to reach the time horizon then defines an optimaldeployment of ancillary services. We track the time horizon, making the MDPa directed acyclic graph in theory and a tree in practice. In the MPC loop,we only need the action selected in the initial (current) state; when time hasadvanced to the next control decision, we have a new initial state (based onmeasured wind conditions in reality and sampled from the wind error DTMC inour experiments) from which to repeat the procedure. In particular, a large partof the new iteration’s MDP had already been computed in the previous iteration;we only need to add one more layer for the advanced time horizon. We presentthe continuous-state dynamics of the power system in Sect. 3, explain the formaland technical details of our MDP-based approach in Sect. 4, and report on anexperimental evaluation in Sect. 5.Our approach has three key advantages: We (1) obtain a strategy that is sound w.r.t. the physical constraints, i.e. it is guaranteed to satisfy the battery, generator,and ramping capacities (but not necessarily the frequency and transmission limits).The same cannot be guaranteed with scenario optimization due to the linearizationand statistical error. We (2) exploit the tree structure of the MDP to speed upcomputations in the MPC loop; and (3) by relying on existing probabilistic modelchecking technology, the approach is easy to extend , e.g. with multiple objectives,unreliable communication, or demand uncertainty. Its main drawbacks are that,while sound and optimal for the discrete MDP, the computed strategy is soundbut may not be optimal for the continuous model : an optimal strategy in thecontinuous model may require a control input that lies between the discreteoptions of the MDP. Moreover, the MDP’s state space grows exponentially withthe time horizon and precision of the discretization. We investigate the effects ofvarying degrees of discretization and time horizons on the quality of the scheduleand the tractability of the problem in our experimental evaluation.
Related work.
Previous studies of demand-side flexibility consider e.g. vehicle-to-grid [18,41] and buildings-to-grid integration [20,32,33,39]. As renewable genera-tion is primarily decentralized, regional congestion is an issue [6,30]; congestionmanagement under uncertain generation was studied in [14,17,27]. The majorityof the previously cited works use continuous-state models. Several also apply theMPC pattern [19,34,39] and generally state the optimization as an optimal powerflow problem [22]. As mentioned, scenario optimization [8,9,23] is sampling-based;in a different approach to sampling, [2] uses Monte Carlo tree search for optimalpower flow in the presence of many distributed energy resources.When it comes to Markov models, [15] uses MDPs for optimal storage schedul-ing, while [12] computes MDP-based optimal charging strategies for electricvehicles. Probabilistic safety guarantees have been formally verified on DTMCmodels [29,37], i.e. MDPs where a fixed control strategy is embedded in the model.[16] studies decentralized protocols in solar panels to stabilize grid frequency.Finally, we mention that our way of deriving the MDP is similar to the approachof the StocHy tool [10] and its predecessor FAUST [38], which however lacksupport for costs/rewards and do not implement our efficient MPC loop. T.S. Badings, A. Hartmanns, N. Jansen, and M. Suilen
31 2 GeneratorWind farm Battery b , b , b , P load1 P load2 P load3 P stor R stor P wind P gen R gen Fig. 1: 3-node example electricity grid. A discrete probability distribution over a finite set X is a function µ : X → [0 , (cid:80) x ∈ X µ ( x ) = 1. The set of all distributions over X is Dist ( X ). We write | X | for the number of elements in X . Notation x n introduces a vector [ x , . . . , x n ]. Definition 1. A Markov decision process (MDP) is a tuple M = ( S, A, s I , T, c ) where S is a finite set of states, A is a finite set of actions, s I ∈ S is the initialstate, T : S × A (cid:42)
Dist ( S ) is the (partial) probabilistic transition function, and c : S → R is the state-based cost function. We assume deadlock-free MDP. A discrete-time Markov chain (DTMC) is an MDP with only one action at everystate. For DTMCs, we omit the set of actions A by simply typing the transitionfunction T : S → Dist ( S ). To define an expected cost measure on MDPs, thenondeterministic choices of actions are resolved by strategies . A memorylessdeterministic strategy for an MDP is a function σ : S → A . For other typesof strategies, we refer to [5]. Applying strategy σ to an MDP M resolves allnondeterministic choices and yields an induced DTMC M σ . The expected costof reaching a set of goal states G ⊆ S in this induced DTMC is denoted byEC M σ ( ♦ G ). The goal is to compute a strategy that minimizes the expected cost. The continuous power system model is a system of nonlinear differential equations.We explain its setup and components in this section. We then discretize the modelw.r.t. time, obtaining continuous-state dynamics as a set of nonlinear equations.
We model a power grid as an undirected graph of interconnected nodes, to whichgenerators and loads are connected. The example grid in Fig. 1 has one generator,three nodes with connected loads, plus one wind farm and one battery. We adoptthe grid model dynamics proposed in [4,34]. The dynamics in every node aregiven by the active power swing equation , which describes the balance between owards Predictive Verification of Smart Grids 5 electrical and kinetic energy at that node in the grid [40]. The state at time t fornode n is determined by the voltage angle and frequency, yielding the dynamics: m n ¨ δ n ( t ) + d n ˙ δ n ( t ) = ¯ P n ( t ) − (cid:80) p ∈N b n,p sin( δ n ( t ) − δ p ( t )) (1)where δ n ( t ), ˙ δ n ( t ), ¨ δ n ( t ) ∈ R are the voltage angle, angular velocity (frequency),and angular acceleration of node n , and m n and d n are inertia and dampingcoefficients, respectively. ¯ P n ( t ) ∈ R is the power balance at node n , and is givenby the sum of the generation and loads at that node. The power flow betweennode n and all connected nodes p ∈ N is assumed to be purely reactive andcharacterized by the line susceptance, b n,p , and the difference in voltage anglebetween the connected nodes. A detailed description is available in [3,34]. The grid frequency deviation for node n , denoted ω n ( t ), is the difference betweenthe absolute frequency, ˙ δ n ( t ) in Eq. 1, and the desired frequency (e.g. 50 Hz inEurope). The value of ω n ( t ) can be controlled by the injection or consumption ofelectrical energy; any mismatch between power supply and demand results in adeviation. As shown in Fig. 1, we distinguish five generating or consuming assets: – P gen ( t ) is the conventional power dispatch . Conventional generators are subjectto ramping limits, so the derivative ˙ P gen ( t ) is restricted to certain bounds. – P load ( t ), the consumer load , represents the known and uncontrollable demand.We can readily extend the model to controllable or uncertain demand. – R gen ( t ) is the deployment of spinning reserves . It is a control variable in theoptimal power flow problem. – P wind ( t ), the wind power generation , is a random variable [28] due to itslimited predictability. We define P wind,fc ( t ) as the forecast wind power attime t ; then the forecast error is ∆P wind ( t ) = P wind ( t ) − P wind,fc ( t ). – A battery is an energy storage buffer whose state of charge (SoC) q ( t ) followsthe injection or consumption of energy. P stor ( t ) is the uncontrollable powerinput variable, known from the day-ahead plan, and R stor ( t ) is the demand-sideflexibility power rate, which is a control variable in the power flow problem. Ancillary service deployment is subject to two restrictions: reserves and storageflexibility must be scheduled a day ahead, and the deployment can never exceedthe scheduled amount. The following constraints ensure these restrictions: − R gends ( t ) ≤ R gen ( t ) ≤ R genus ( t ) (2)ensures that the deployment of spinning reserves is within the scheduled boundswhere R gends ( t ) ≥ R genus ( t ) ≥
0) is the scheduled amount of down-spinning(up-spinning) reserves. Similarly we have for flexibility that − R stordd ( t ) ≤ R stor ( t ) ≤ R storid ( t ) , (3)where R stordd ( k ) ≥ R storid ( k ) ≥ T.S. Badings, A. Hartmanns, N. Jansen, and M. Suilen
We discretize the continuous dynamics with respect to time to render the problemof optimal frequency control tractable. The resulting model is a set of nonlinearequations, which albeit discretized w.r.t. time are still defined on continuousstate and control spaces. They describe the transition from one continuous stateand control input to the resulting continuous state one discrete time step later.Consider a power grid with n t nodes, n g generators, n f wind farms, and n s batteries for storage. Its continuous dynamics are given as a system of 2 n t + n g + n s first-order differential equations. The features of the continuous state space aregiven by the voltage angles δ n t and frequencies ω n t for all n t nodes, thepower generation P gen1: n g for all n g generators, and the state of charge q n s of all n s batteries. The vector of control variables contains the change in generatordispatch ˙ P gen1: n g and the reserve deployment R gen1: n g for all n g generators, plus theflexibility deployment R stor1: n s for all n s batteries. We discretize w.r.t. time via thefirst-order backward Euler implicit method [36] to obtain the nonlinear function x ( k + 1) = f (cid:0) x ( k ) , u ( k ) , v ( k ) , w ( k ) (cid:1) , (4) – with f ( · ) reflecting the dynamics of the considered power system, which arenonlinear due to the sinusoid, sin( δ n ( t ) − δ p ( t )), in Eq. 1, – x ( k ) = [ δ n t , ω n t , P gen1: n g , q n s ] ∈ R n t + n g + n s the state vector, – u ( k ) = [ ˙ P gen1: n g , R gen1: n g , R stor1: n s ] ∈ R n g + n s the vector of control variables, – v ( k ) = [ P load1: n t , P wind,fc1: n f , P stor1: n s ] ∈ R n t + n f + n s the uncontrollable known inputs, and – w ( k ) = [ ∆P wind1: n f ] ∈ R n f the vector of uncontrollable random variables.We omit further details for the sake of brevity and refer the interested readerto [4] and [39] for the full derivation and discretization of similar grid models. Power balance and control variables.
The day-ahead generator power dis-patch is scheduled such that generation plus wind power forecast matches theconsumer load pattern. We impose the following constraint at every time point k : (cid:80) n g i =1 P gen i ( k ) + (cid:80) n f m =1 P wind,fc m ( k ) = (cid:80) n t n =1 P load n ( k ) . (5)Since P wind,fc m and P load n are known, imposing this equality constraint yields n g − (cid:80) n g i =1 R gen i ( k ) + (cid:80) n f m =1 ∆P wind m ( k ) = (cid:80) n s n =1 R stor n ( k ) , (6)where ∆P wind m is a random variable, and both R gen i and R stor n are control variables.Hence, imposing Eq. 6 yields n g + n s − Power system constraints.
The discrete-time power system model in Eq. 4is subject to a number of constraints. First of all, the equality constraints in owards Predictive Verification of Smart Grids 7
Eqs. 5 and 6 are imposed to enforce the balance between power supply anddemand. Second, power lines have limited transmission capacity, and generatorshave limited generation capacity and ramping capability. Third, the deploymentof reserve power and storage flexibility is limited by their scheduled values, asdescribed in Eqs. 2 and 3. Finally, the electrical storage units have a limitedcapacity, and can only be charged or discharged at a given maximum rate. Forthe explicit formulation of the constraints we refer the interested reader to [34].
Our goal is to overcome the need for sampling and linearization to optimizeancillary service deployment. Directly using the model presented in Eq. 4 wouldrequire dealing with continuous state and action spaces. Our approach is todiscretize the actions, then explore the resulting finite number of (continuously-valued) successor states up to a given exploration depth. In this way, we obtain afinite MDP that can be solved iteratively using a receding horizon principle. Thisapproach is a discrete-state model predictive control technique. We now presentthe wind error DTMC, followed by the details of our exploration procedure andthe formal definition of the finite-horizon MDP.
Recall that the wind power forecast error, ∆P wind m ( k ) ∈ R , of every individualwind farm m ∈ { , . . . , n f } is a continuous random variable in Eq. 4. Exploitingthe time discretization of Sect. 3.4, we construct a DTMC for the forecast errorof every wind farm with time resolution equal to the discretization level of thedynamics. For this section, we assume the presence of only one wind farm tosimplify notation. Let S be the finite state space of the DTMC, and let function M : R → S map the wind power forecast error ∆P wind ∈ R to a state s w ∈ S :every value of the continuous random variable ∆P wind is approximated by thevalue of a discrete state s w ∈ S . The DTMC’s transition function T : S → Dist ( S )describes the probability that a transition occurs from one state s w at any time k to another state s (cid:48) w at time k + 1 with probability T ( s w )( s (cid:48) w ). Historical wind power data.
We follow the method proposed in [24,28] toconstruct the DTMC based on the historical wind power forecast error. Thestates S are based on a uniform discretization of the historical forecast error.The transition probabilities in T are determined using a maximum likelihoodestimation, by counting the number of transitions from one state s w at time k toa successor state s (cid:48) w at time k + 1. We use five years (2015-2019) of on-shore windpower data measured every 15 minutes from the TenneT region of the Germantransmission grid, obtained from the ENTSO-E Transparency Platform [13]. Weinterpolate the data to a 5-minute basis, to match the time resolution we employin the numerical demonstration in Sect. 5. The data set contains the wind powerforecast, P wind,fc , and the actual power, P wind , thus providing a five-year time T.S. Badings, A. Hartmanns, N. Jansen, and M. Suilen(a) Using 5-minute based data. (b) Using 15-minute based data.
Fig. 2: Transition probability function of the wind error DTMC.series of the observed wind power error, ∆P wind . M then maps every continuousvalue of ∆P wind to one of 41 discrete states, i.e. S = { s w , . . . , s w } , as in [24]. Weshow the resulting transition matrix in Fig. 2 for the original 15-minute and theinterpolated 5-minute data. The matrix is diagonally dominant, reflecting thestrong auto-correlation of the forecast error. Next, we present our method to explore the continuous state space of the modelup to a given exploration horizon. We discretize the continuous control variablessuch that their potential values define a set of actions for an MDP, then a priorieliminate those actions that lead to states violating the constraints of Sect. 3.4.Let X k = ( x ( k ) , s w ) denote the continuous state vector x ( k ) at time k , andthe DTMC state s w associated with the current wind power forecast error. Todefine an initial state, we use a concrete measurement at the first time step. Then,we use the dynamics in Eq. 4 in combination with transitions of the wind errorDTMC to compute the possible successor states for each time step. At time step k we select a control input u ( k ) and, based on the dynamics in Eq. 4, receivea set of successor states with different x ( k + 1) , . . . , x ( k + 1) n , one associatedwith every possible wind error successor state, s w , . . . , s nw with T ( s w )( s iw ) > ≤ i ≤ n . The features in x ( k + 1) i related to power generation ( P gen1: n g )and battery SoC ( q n s ) are equal for all 1 ≤ i ≤ n , while the grid features( δ n t , ω n t ) depend on the wind successor state, s iw . As the probabilities to reachthese successor states depend exclusively on the probabilities defined by theDTMC, we reach state X k +1 = ( x ( k + 1) i , s iw )) with probability T ( s w )( s iw ). Feasible control space.
Under the two balance constraints in Eqs. 5 and 6, thevector of control variables u ( k ) in Eq. 4 contains 2 n g + n s − u ∈ U X k ⊂ R n g + n s − that do not lead to a violation of any of the constraints at time k + 1. Note thatthis set depends on the state X k at time k . Given the subset of feasible continuous owards Predictive Verification of Smart Grids 9 ω min ω max q max X k a a a a a s w , . . . , s nw freq. deviation, ω (Hz)SoC, q (%) Fig. 3: Discrete actions a , . . . , a and their mapping to two continuous state-space features: a battery SoC and the frequency in one node. Curved arrowsshow the effect of wind successor states, and dashed lines are system constraints.control inputs, we apply a grid-based discretization in all 2 n g + n s − A X k of feasible and discrete actions at time k , where the subscriptdenotes the dependency on X k . From this discretization of the control actions, animportant property follows. Given the current state X k at time k , every action a ∈ A X k has a different set of discrete successor states X k +1 at time k + 1. Byonly exploring the continuous state-space for the feasible actions in A X k , weminimize the size of the resulting finite-state model.A schematic example of this procedure for two state features (one nodefrequency and one battery SoC) is shown in Fig. 3. The blue dot corresponds tocurrent state X k at time k , and the straight arrows show the discrete actions a ∈ A X k . The curved arrows show the effect of the forecast error to the grid frequency,which depends directly on the actual successor state in the DTMC. Becauseall successor states of a violate the maximum SoC constraint ( q ( k ) ≤ q max ),this action can be eliminated a priori . Similarly, all successors for a violate themaximum frequency deviation limit ( ω ( k ) ≤ ω max ). Actions a and a will notlead to any violation, and are, therefore, included as feasible discrete actions.Action a may or may not violate the constraints depending on the wind powerforecast error. Therefore, this action cannot be eliminated a priori. Exploration.
The finite exploration horizon with starting time k and look-ahead of K h steps is the set { k, . . . , k + K h } . The exploration procedure can beperformed recursively for all discrete actions a ∈ A X k for every k , until reachingthe desired horizon. We obtain a tree-structured model as in Fig. 4, where the branching factor depends on the number of actions and possible wind successorstates, and the depth is given by the horizon length. For brevity, the dependencyof the set of discrete actions on the current state is omitted in this figure.Recall that we consider only those states X k = ( x ( k ) , s w ) of the continuousstate space that are visited during the exploration. We provide the full definitionof the MDP ( S, A, s I , T, c ) with exploration horizon { k, . . . k + K h } , where X k . . . ...... . . . ...... . . . ... . . . ... a a | A | a a | A | a | A | a a a | A | a | A | a ( x ( k + 1) , s w )( x ( k + 1) n , s nw )( x ( k + 1) n , s nw )( x ( k + 1) , s w )Time step: k + 1 k + 2 k + K h − k + K h Fig. 4: Visualization of the state-space exploration procedure, with initial mea-surement X k at time k , and finite horizon with end time k + K h . Solid edgesrepresent discrete actions, whereas dashed lines reflect different successor states. – every state s ∈ S is associated with an X k from the continuous state space; – the set of actions A is the union of all feasible action sets A X k for all X k ; – the initial state s I is given by a concrete measurement ( x , s w ); – the partial probabilistic transition function T : S × A → Dist ( S ) maps ev-ery state and action to the corresponding distribution over successor statesaccording to the wind error DTMC; – the cost function c : S → R assigns the immediate cost given by the sum ofthe absolute value of the frequency deviation in every grid node.Finally, we define the set G ⊆ S of goal states as the states that are reached atthe end of the horizon k + K h and satisfy the constraints. Receding horizon and tree structure.
We compute the strategy σ thatinduces the minimal expected cost EC M σ ( ♦ G ) for the MDP with horizon { k, . . . k + K h } . Using this strategy, we implement the (first) optimal action a = σ ( s ) where s is associated with X k . Having executed the optimal actionat time k , we then apply the so-called receding horizon principle , meaning thatwe shift the exploration horizon one step forward in time. We update the MDPfor the shifted horizon, which is now given as { k + 1 , . . . k + K h + 1 } , and againcompute an optimal strategy. Hence, with every shift of the exploration horizonin time, we reveal small bits of new information near the end of the horizon.As an example, consider an initial exploration horizon defined as the timeframe between 9:00-9:15 AM. Computing an optimal strategy for the correspond-ing MDP means that we take all the information within that time frame intoaccount, i.e. we have perfect knowledge within the horizon. However, a possible owards Predictive Verification of Smart Grids 11 change in the power demand (or any other uncontrollable input) that is forecastat 9:30 AM is not revealed to the model, until the exploration horizon also spansthat time step. Defining an adequate length of the exploration horizon reflects atrade-off between model size and the optimality of the solution.When we shift the exploration horizon from { k, . . . , k + K h } to { k + 1 , . . . , k + K h + 1 } , another layer is added to the MDP. As the starting time of the horizonalso shifts, we gain new information via a new measurement at k + 1, and thenobtain a single new initial state for the MDP with the new horizon. By exploitingthe tree structure of the MDP, we extend the current MDP with a new layer ofsuccessor states and take the subtree starting at the newly found initial state attime k + 1. Simultaneously, we shrink the layer at the top, by pruning all statesthat cannot be reached anymore from the new initial state.The approach of iteratively solving and updating the MDP for shifting horizonsis similar to a discrete-state model predictive control technique. This method isapplied iteratively, until a desired simulation horizon is reached (e.g. 24 hours). We demonstrate the performance of our approach on multiple variants of the3-node example grid already introduced in Fig. 1. To this end, we simulate thewind error DTMC , and apply our method to build the MDP using the sampledwind power forecast error at every time step. As such, every wind power trajectorysampled from the DTMC results in a (potentially different) run of our approach.We solve the MDP to obtain the first action in the sequence of optimal decisionsover the exploration horizon, which is then executed to obtain the initial stateat the next time step. By applying the receding horizon principle, we iterativelyfollow this procedure, until a final simulation horizon of 24 hours is reached.
We consider a simulation time resolution of 5 minutes, and a full simulationhorizon of 24 hours. Using the receding horizon exploration procedure, this meansthat ·
24 = 288 MDPs are solved to obtain the results over one 24-hour run. Thesame demand and wind power forecast are used for every simulation, resultingin the day-ahead power dispatch shown in Fig. 5. This figure shows that underthe forecast conditions, the power supply and demand are perfectly balanced,leading to a stable grid frequency in the absence of forecast errors. Both thepower demand and wind power forecast are based on historical data obtainedfrom the
ENTSO-E Transparency Platform [13], and are scaled appropriately forthe simulation study. The scheduled flexibility deployment limits per battery areset to ± ± .
25 MW.
Simulation cases.
Simulations are performed on three variants of the examplegrid in Fig. 1. In the simplest case, we aggregate the three nodes into one,resulting in a grid where all assets are connected to the same node. The second
Fig. 5: The day-ahead power balance between generator dispatch, wind powerforecast, and the power demand, which is equivalent for all performed simulations.variant is the exact network shown in Fig. 1. The third variant is an extension ofthe second, where a second storage unit is connected to node 1.Then, we perform simulation studies for different values of: a) the explorationhorizon (300, 600, and 900 seconds), and b) the levels of discretization for theactions denoted by λ (with values between 3 and 25 steps). To evaluate thequality of the solution of our technique, we perform a statistical analysis on everycase, by repeating every experiment 100 times. Note that this Monte Carlo typesimulation is merely used to evaluate the quality of the obtained solutions, andnot required to apply our technique in practice. Source code.
All of the code and input data needed to reproduce the re-sults presented in this paper are available at https://gitlab.science.ru.nl/tbadings/power-system-mdp . Our prototype implementation is written inPython version 3.8.3, and allows the user to run the cases defined above or simu-late with any other parameter setting. The simulations are run on a Windowslaptop with a 1.30GHz Intel(R) i7-1065G7 CPU and 16.0 GB of RAM.
The observed run times are approximately proportional to thenumber of MDP states, and grow exponentially with the exploration horizon. Forthe 3-node system with 1 battery, an action discretization level of λ = 5 steps,and exploration horizon of 300 seconds, the average run time per iteration of thereceding horizon (i.e. for solving one MDP) is 0 .
01 seconds, resulting in around3 .
51 seconds per 24h run. For the same case with longer exploration horizons of600 and 900 seconds, average run times are 0 .
14 and 2 .
09 seconds, respectively,per receding horizon iteration.The strongest increase in run time is observed for the 3-node, 2 batterycase, where the average observed times were 0 .
03, 1 .
03, and 223 .
59 seconds forexploration horizons of 300, 600, and 900 seconds, respectively. This steeperincrease is explained by the exponential growth of the model size with respect tothe number of actions, which is higher for the case with 2 batteries. owards Predictive Verification of Smart Grids 13300 600 90000 . . . . J [ H z · h ] λ = 3 λ = 5 λ = 7 λ = 13 λ = 2595% CI Fig. 6: Average total frequency deviations for the 3-node case for different λ . Failed runs.
As visualized before in Fig. 3, a discrete action can lead to asuccessor state that violates one of the system constraints. In total, 4 .
2% of theperformed iterations for all cases combined resulted in a violation of either thefrequency limits or the power line transmission capacity. No significant differencesin the percentage of failed runs is observed across the different cases. Since theseruns are incomplete and not representative for the simulated cases, they are notreported for further analysis. Nevertheless, the number of failed runs provides anempirical indication of the adequacy of the scheduling limits for the reserve andflexibility deployment. A high percentage of failed runs means that the ancillaryservice scheduling limits might be insufficient, and should be enlarged.
Solution quality.
The cost function of the MDP penalizes the sum of theabsolute value of the frequency deviations in all grid nodes. Therefore, the qualityof the obtained solution can be evaluated by taking the integral of the totalobserved frequency deviation over the simulation horizon: J = (cid:90) k end k (cid:12)(cid:12) n t (cid:88) n =1 ω n ( k ) (cid:12)(cid:12) dk, where k and k end cover the full 24-hour simulation time. The lower the value of J , the better the quality of the solution in terms of total frequency deviations.A comparison of the value for J between multiple cases on the 3-node networkfrom Fig. 1, with different levels of λ ∈ { , . . . , } , is shown in Fig. 6. Everybar shows the average results and the 95% confidence interval (CI) of the 100iterations performed for that case. Due to infeasible run times, simulations forthe cases with exploration horizon of 900 seconds and λ ≥
13 were omitted.We observe that (1) a longer exploration horizon does not improve the qualityof the solution in terms of frequency deviations significantly. On the other hand, (2) increasing the number of discrete actions in every dimension yields a significantlybetter solution quality. This observation suggests that it is more beneficial tohave a more fine-grained discretization of the continuous control space, than toinvest in a longer optimization horizon. Exploration horizon [s] S t a t e s p e r M D P Exploration horizon [s] A c t i o n s p e r M D P Fig. 7: Average MDP states (left) and actions (right) for cases with λ = 5, andvarying numbers of nodes and batteries (plotted in log scale). MDP model size.
In Fig. 7, the average number of states and actions perMDP are compared for different network configurations, all with λ = 5. All caseswere simulated for 100 runs, except for the case with 2 batteries, which wasonly simulated for 1 run, due to a too long run time. We see that the numberof states and actions is independent of the number of grid nodes . The intuitionbehind this is that an increased number of state features does not yield a largerMDP. In fact, the branching factor of the applied exploration procedure onlydepends on the number of actions and successor states in the wind error DTMC,and is independent of the number of state features. As expected, increasing thenumber of batteries to 2 yields a significant increase in the number of MDP statesand actions, especially for a longer exploration horizon, due to the additionaldimension of the continuous control space. Frequency control and ancillary service deployment.
Finally, in Fig. 8,two example runs for the 3-node network with a single battery and explorationhorizon of 600 seconds are shown. Fig. 8a presents the frequency deviations for λ = 3, while Fig. 8b shows the same for λ = 25. Similarly, Figs. 8c and 8d showthe deployment of reserves and flexibility for both cases. Due to the uncertaintyin the wind power forecast error, the results in Fig. 8 present only two possibletrajectories, and repeating the experiment can lead to different results.The observed frequency deviations are significantly lower for the case withmore fine-grained discretization of the actions (i.e. λ = 25), thus confirming theresults also shown in Fig. 6. This difference in the control precision is clearlydepicted between Figs. 8c and 8d. The intuition behind this is that a fine-graineddiscretization allows for more precise control of the ancillary service deployment,which results in lower frequency deviations. owards Predictive Verification of Smart Grids 15(a) Frequency deviations for λ = 3. (b) Frequency deviations for λ = 25.(c) Ancillary services for λ = 3. (d) Ancillary services for λ = 25. Fig. 8: Results for two runs for the 3-node case with 1 battery, for different λ . We presented a novel method to solve the problem of short-term scheduling forflexibility-based ancillary services in power systems with uncertain wind powergeneration. By modelling the problem as an MDP, we overcome the need forboth sampling and linearization, as opposed to the continuous-state approachesused by most traditional power system analysis methods. Our experiments showthat our approach is feasible for power grids with different levels of complexityand under realistic operating conditions. Furthermore, our results show it is morebeneficial to have a more fine-grained discretization of the continuous controlspace, than to invest in a longer optimization horizon. Since the size of the MDPgrows exponentially with both the number of actions and the exploration horizon,making a trade-off between the two is necessary.In the future, we will exploit the flexibility of our model to incorporatealternative grid configurations and multiple sources of uncertainty, such asimperfect communication between assets in the grid, or demand uncertainty.Moreover, instead of the batteries, other flexible assets can also be modeled, suchas flexibility provided by the thermal inertia of large-scale buildings.
References
1. Aghaei, J., Alizadeh, M.I.: Demand response in smart electricity grids equipped withrenewable energy sources: A review. Renewable and Sustainable Energy Reviews , 64–72 (2013)2. Al-Saffar, M., Mus´ılek, P.: Distributed optimal power flow for electric power systemswith high penetration of distributed energy resources. In: CCECE 2019. pp. 1–5.IEEE (2019)3. Badings, T.S.: MSc Thesis. Buildings-to-Grid Integration for Demand-Side Flexibil-ity in Power Systems with Uncertain Generation. University of Groningen (2019)4. Badings, T.S., Rostampour, V., Scherpen, J.M.: Distributed Building EnergyStorage Units for Frequency Control Service in Power Systems. IFAC-PapersOnLine (4), 228–233 (2019)5. Baier, C., Katoen, J.P.: Principles of Model Checking. MIT Press (2008)6. Bertsch, J., Hagspiel, S., Just, L.: Congestion management in power systems:Long-term modeling framework and large-scale application. Journal of RegulatoryEconomics (3), 290–327 (2016)7. Boyd, S.P., Vandenberghe, L.: Convex Optimization. Cambridge University Press(2014)8. Calafiore, G.C., Campi, M.C.: The scenario approach to robust control design.IEEE Trans. Autom. Control. (5), 742–753 (2006)9. Campi, M.C., Garatti, S.: The exact feasibility of randomized solutions of uncertainconvex programs. SIAM J. Optim. (3), 1211–1230 (2008)10. Cauchi, N., Abate, A.: \ mathsf stochy : Automated verification and synthesis ofstochastic processes. In: TACAS (2). LNCS, vol. 11428, pp. 247–264. Springer(2019)11. Chertkov, M., Chernyak, V.: Ensemble of thermostatically controlled loads: Statis-tical physics approach. Scientific reports (1), 1–9 (2017)12. Ding, T., Zeng, Z., Bai, J., Qin, B., Yang, Y., Shahidehpour, M.: Optimal electricvehicle charging strategy with Markov decision process and reinforcement learningtechnique. IEEE Transactions on Industry Applications (5), 5811–5823 (2020)13. ENTSO-e: Transparency Platform - Generation Forecasts for Wind and Solar,Control area Germany (2020)14. Gerard, H., Rivero Puente, E.I., Six, D.: Coordination between transmission anddistribution system operators in the electricity sector: A conceptual framework.Utilities Policy , 40–48 (2018)15. Grillo, S., Pievatolo, A., Tironi, E.: Optimal storage scheduling using Markovdecision processes. IEEE Transactions on Sustainable Energy (2), 755–764 (2016)16. Hartmanns, A., Hermanns, H., Berrang, P.: A comparative analysis of decentralizedpower grid stabilization strategies. In: Winter Simulation Conference. pp. 158:1–158:13. WSC (2012)17. Hemmati, R., Saboori, H., Jirdehi, M.A.: Stochastic planning and schedulingof energy storage systems for congestion management in electric power systemsincluding renewable energy resources. Energy , 380–387 (2017)18. Kempton, W., Udo, V., Huber, K., Komara, K., Letendre, S., Baker, S., Brunner,D., Pearre, N.: A test of Vehicle-to-Grid (V2G) for Energy Storage and FrequencyRegulation in the PJM System. Results from an Industry-University ResearchPartnership (2008)19. Liu, Y., Yu, N., Wang, W., Guan, X., Xu, Z., Dong, B., Liu, T.: Coordinating theoperations of smart buildings in smart grids. Applied Energy (July), 2510–2525(2018)owards Predictive Verification of Smart Grids 1720. Lymperopoulos, I., Qureshi, F.A., Nghiem, T., Khatir, A.A., Jones, C.N.: Pro-viding ancillary service with commercial buildings: The Swiss perspective. IFAC-PapersOnLine (8), 6–13 (2015)21. MacDougall, P., Roossien, B., Warmer, C., Kok, K.: Quantifying flexibility forsmart grid services. In: 2013 IEEE Power Energy Society General Meeting. pp. 1–5.IEEE (2013)22. Machowski, J., Dong, Z.Y., Member, S., Zhang, P.: Power System Dynamics:Stability and Control. Wiley (2006)23. Margellos, K., Goulart, P., Lygeros, J.: On the road between robust optimizationand the scenario approach for chance constrained optimization problems. IEEETrans. Autom. Control. (8), 2258–2263 (2014)24. Margellos, K., Haring, T., Hokayem, P., Schubiger, M., Lygeros, J., Andersson,G.: A Robust Reserve Scheduling Technique for Power Systems with High WindPenetration. Proceedings of PMAPS pp. 870–875 (2012)25. Murty, K.G., Kabadi, S.N.: Some np-complete problems in quadratic and nonlinearprogramming. Math. Program. (2), 117–129 (1987)26. NG ESO: Optional downward flexibility management (odfm) service documents.National Grid ESO (2020)27. Nguyen, D.B., Scherpen, J.M.A., Bliek, F.: Distributed optimal control of smartelectricity grids with congestion management. IEEE Trans Autom. Sci. Eng. (2),494–504 (2017)28. Papaefthymiou, G., Kl¨ockl, B.: MCMC for wind power simulation. IEEE Transac-tions on Energy Conversion (1), 234–240 (2008)29. Peruffo, A., Guiu, E., Panciatici, P., Abate, A.: Safety guarantees for the electricitygrid with significant renewables generation. In: QEST. LNCS, vol. 11785, pp.332–349. Springer (2019)30. Pillay, A., Prabhakar Karthikeyan, S., Kothari, D.P.: Congestion management inpower systems - A review. International Journal of Electrical Power and EnergySystems , 83–90 (2015)31. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Pro-gramming. Wiley Series in Probability and Statistics, Wiley (1994)32. Razmara, M., Bharati, G.R., Shahbakhti, M., Paudyal, S., Robinett, R.D.: Bileveloptimization framework for smart building-to-grid systems. IEEE Trans. SmartGrid (2), 582–593 (2018)33. Rostampour, V., Badings, T.S., Scherpen, J.M.A.: Buildings-to-grid integrationwith high wind power penetration. In: CDC. pp. 2976–2981. IEEE (2019)34. Rostampour, V., Badings, T.S., Scherpen, J.M.A.: Demand flexibility managementfor buildings-to-grid integration with uncertain generation. Energies (24) (2020)35. Rostampour, V., Ter Haar, O., Keviczky, T.: Distributed stochastic reserve schedul-ing in ac power systems with uncertain generation. IEEE Transactions on PowerSystems (2), 1005–1020 (2018)36. Sincovec, R.F., Erisman, A.M., Yip, E.L., Epton, M.A.: Analysis of DescriptorSystems Using Numerical Algorithms. IEEE Trans. Autom. Control. (1), 139–147(1981)37. Soudjani, S.E.Z., Abate, A.: Aggregation and control of populations of thermostati-cally controlled loads by formal abstractions. IEEE Trans. Control. Syst. Technol. (3), 975–990 (2015)38. Soudjani, S.E.Z., Gevaerts, C., Abate, A.: FAUST 2 : Formal abstractions ofuncountable-state stochastic processes. In: TACAS. LNCS, vol. 9035, pp. 272–286.Springer (2015)8 T.S. Badings, A. Hartmanns, N. Jansen, and M. Suilen39. Taha, A.F., Gatsis, N., Dong, B., Pipri, A., Li, Z.: Buildings-to-grid integrationframework. IEEE Trans. Smart Grid (2), 1237–1249 (2019)40. Trip, S., B¨urger, M., Persis, C.D.: An internal model approach to frequency regula-tion in inverter-based microgrids with time-varying voltages. In: CDC. pp. 223–228.IEEE (2014)41. Wang, J., Liu, C., Ton, D., Zhou, Y., Kim, J., Vyas, A.: Impact of plug-in hybridelectric vehicles on power systems with demand response and wind power. EnergyPolicy39