[PDF] Balancing Wind and Batteries: Towards Predictive Verification of Smart Grids

Abstract

We study a smart grid with wind power and battery storage. Traditionally, day-ahead planning aims to balance demand and wind power, yet actual wind conditions often deviate from forecasts. Short-term flexibility in storage and generation fills potential gaps, planned on a minutes time scale for 30-60 minute horizons. Finding the optimal flexibility deployment requires solving a semi-infinite non-convex stochastic program, which is generally intractable to do exactly. Previous approaches rely on sampling, yet such critical problems call for rigorous approaches with stronger guarantees. Our method employs probabilistic model checking techniques. First, we cast the problem as a continuous-space Markov decision process with discretized control, for which an optimal deployment strategy minimizes the expected grid frequency deviation. To mitigate state space explosion, we exploit specific structural properties of the model to implement an iterative exploration method that reuses pre-computed values as wind data is updated. Our experiments show the method's feasibility and versatility across grid configurations and time scales.

Full PDF

BBalancing Wind and Batteries: TowardsPredictive Veriﬁcation of Smart Grids (cid:63)

Thom S. Badings , Arnd Hartmanns , Nils Jansen , and Marnix Suilen Department of Software Science, Radboud University, Nijmegen, The Netherlands University of Twente, Enschede, The Netherlands

Abstract.

We study a smart grid with wind power and battery storage.Traditionally, day-ahead planning aims to balance demand and windpower, yet actual wind conditions often deviate from forecasts. Short-term ﬂexibility in storage and generation ﬁlls potential gaps, planned ona minutes time scale for 30-60 minute horizons. Finding the optimal ﬂexi-bility deployment requires solving a semi-inﬁnite non-convex stochasticprogram, which is generally intractable to do exactly. Previous approachesrely on sampling, yet such critical problems call for rigorous approacheswith stronger guarantees. Our method employs probabilistic model check-ing techniques. First, we cast the problem as a continuous-space Markovdecision process with discretized control, for which an optimal deploymentstrategy minimizes the expected grid frequency deviation. To mitigatestate space explosion, we exploit speciﬁc structural properties of the modelto implement an iterative exploration method that reuses pre-computedvalues as wind data is updated. Our experiments show the method’sfeasibility and versatility across grid conﬁgurations and time scales.

Electricity grids need to constantly maintain a balance between power supplyand demand; imbalances result in frequency deviations, which ultimately lead tocritical events like blackouts [22]. The increasing deployment of renewable energysources such as solar and wind power—which react sharply to hard-to-predictweather conditions—makes maintaining the balance increasingly diﬃcult. Inday-to-day operation, the balancing is managed at two time scales. One dayahead, the transmission system operator (TSO) schedules conventional generatorsto match the predicted demand minus the expected renewable generation basedon weather forecasts. During the day, the TSO ﬁlls potential gaps introducedby any mismatch between forecast and actual weather conditions by ancillaryservices , which run on a short-term schedule that is updated every few minutes.On the supply side, the most prominent ancillary service are spinning re-serves from generators. They traditionally compensate for contingencies (such asgenerator failures) and deviations from the predicted demand. To free capacityfor spinning reserves, some generators operate below their rated capacity, makingthem a costly service. The TSO’s day-ahead plan thus makes a tradeoﬀ between (cid:63)

This research has been partially funded by NWO grants NWO OCENW.KLEIN.187and NWA.1160.18.238, and NWO VENI grant no. 639.021.754. a r X i v : . [ ee ss . S Y ] F e b T.S. Badings, A. Hartmanns, N. Jansen, and M. Suilen allocating suﬃcient reserves to mitigate any potential imbalance and minimizingunused generator capacity [35].

Demand-side ﬂexibility [3], on the other hand,is provided by various assets connected to the grid, including batteries [18] andHVAC systems [11,39]. They can reduce or increase the overall power consump-tion at some time point by injecting or withdrawing electricity into or fromthe grid. Such ﬂexibility-based services shift power consumption in time ratherthan changing the total [21]. Examples today include the ODFM service in theUnited Kingdom [26] and various applications of demand response throughoutthe world [1]. In this paper, we approach the fundamental challenge of short-termscheduling for ﬂexibility-based ancillary services:Given a power grid with signiﬁcant uncertain wind power generation,optimally schedule the deployment of the available ancillary services overa ﬁnite horizon to minimize the expected total grid frequency deviationwithout violating any hard constraints on grid stability and operation.Hard constraints include a maximum frequency deviation (we use ± . scenario optimization approach [23,34], which linearizesthe nonconvex constraints and solves the resulting linear (but still semi-inﬁnite)stochastic optimization problem up to some statistical conﬁdence and error.The drawbacks of this approach are the approximation error introduced by thelinearization step and the statistical error due to the use of sampling [9]. Our contribution.

To overcome the need for both sampling and linearization, wemodel the problem as a Markov decision process (MDP) [31]. MDPs combineprobabilistic choices, which we use to follow a white-box DTMC for the winderrors, and nondeterminism, which we use to capture the service deploymentdecisions to optimize over. A direct cast of the problem into an MDP would yieldcontinuous state and action spaces: state variables would represent continuousquantities (e.g. grid frequency), and control decisions would range over real-valuedintervals (e.g. charge current applied to a battery). We thus (i) discretize thecontrollable values into ﬁnitely many control actions . Since (ii) control decisionsare only made every few minutes, the model is discrete-time. Further, (iii) windconditions at the current time are known, so there is a single initial state for every(iv) ﬁnite horizon. The combination of (i)-(iv) entails that we can only reacha ﬁnite subset of the continuous state space. Thus we can build an MDP withﬁnitely many states and actions. We use the original (non-linearized) continuousdynamics of the power grid to compute the successor state following an action. Acost function penalizes frequency deviations; violations of hard constraints lead owards Predictive Veriﬁcation of Smart Grids 3 to absorbing non-goal states. An action selection strategy that minimizes theexpected accumulated cost to reach the time horizon then deﬁnes an optimaldeployment of ancillary services. We track the time horizon, making the MDPa directed acyclic graph in theory and a tree in practice. In the MPC loop,we only need the action selected in the initial (current) state; when time hasadvanced to the next control decision, we have a new initial state (based onmeasured wind conditions in reality and sampled from the wind error DTMC inour experiments) from which to repeat the procedure. In particular, a large partof the new iteration’s MDP had already been computed in the previous iteration;we only need to add one more layer for the advanced time horizon. We presentthe continuous-state dynamics of the power system in Sect. 3, explain the formaland technical details of our MDP-based approach in Sect. 4, and report on anexperimental evaluation in Sect. 5.Our approach has three key advantages: We (1) obtain a strategy that is sound w.r.t. the physical constraints, i.e. it is guaranteed to satisfy the battery, generator,and ramping capacities (but not necessarily the frequency and transmission limits).The same cannot be guaranteed with scenario optimization due to the linearizationand statistical error. We (2) exploit the tree structure of the MDP to speed upcomputations in the MPC loop; and (3) by relying on existing probabilistic modelchecking technology, the approach is easy to extend , e.g. with multiple objectives,unreliable communication, or demand uncertainty. Its main drawbacks are that,while sound and optimal for the discrete MDP, the computed strategy is soundbut may not be optimal for the continuous model : an optimal strategy in thecontinuous model may require a control input that lies between the discreteoptions of the MDP. Moreover, the MDP’s state space grows exponentially withthe time horizon and precision of the discretization. We investigate the eﬀects ofvarying degrees of discretization and time horizons on the quality of the scheduleand the tractability of the problem in our experimental evaluation.

Related work.

Previous studies of demand-side ﬂexibility consider e.g. vehicle-to-grid [18,41] and buildings-to-grid integration [20,32,33,39]. As renewable genera-tion is primarily decentralized, regional congestion is an issue [6,30]; congestionmanagement under uncertain generation was studied in [14,17,27]. The majorityof the previously cited works use continuous-state models. Several also apply theMPC pattern [19,34,39] and generally state the optimization as an optimal powerﬂow problem [22]. As mentioned, scenario optimization [8,9,23] is sampling-based;in a diﬀerent approach to sampling, [2] uses Monte Carlo tree search for optimalpower ﬂow in the presence of many distributed energy resources.When it comes to Markov models, [15] uses MDPs for optimal storage schedul-ing, while [12] computes MDP-based optimal charging strategies for electricvehicles. Probabilistic safety guarantees have been formally veriﬁed on DTMCmodels [29,37], i.e. MDPs where a ﬁxed control strategy is embedded in the model.[16] studies decentralized protocols in solar panels to stabilize grid frequency.Finally, we mention that our way of deriving the MDP is similar to the approachof the StocHy tool [10] and its predecessor FAUST [38], which however lacksupport for costs/rewards and do not implement our eﬃcient MPC loop. T.S. Badings, A. Hartmanns, N. Jansen, and M. Suilen

31 2 GeneratorWind farm Battery b , b , b , P load1 P load2 P load3 P stor R stor P wind P gen R gen Fig. 1: 3-node example electricity grid. A discrete probability distribution over a ﬁnite set X is a function µ : X → [0 , (cid:80) x ∈ X µ ( x ) = 1. The set of all distributions over X is Dist ( X ). We write | X | for the number of elements in X . Notation x n introduces a vector [ x , . . . , x n ]. Deﬁnition 1. A Markov decision process (MDP) is a tuple M = ( S, A, s I , T, c ) where S is a ﬁnite set of states, A is a ﬁnite set of actions, s I ∈ S is the initialstate, T : S × A (cid:42)

Dist ( S ) is the (partial) probabilistic transition function, and c : S → R is the state-based cost function. We assume deadlock-free MDP. A discrete-time Markov chain (DTMC) is an MDP with only one action at everystate. For DTMCs, we omit the set of actions A by simply typing the transitionfunction T : S → Dist ( S ). To deﬁne an expected cost measure on MDPs, thenondeterministic choices of actions are resolved by strategies . A memorylessdeterministic strategy for an MDP is a function σ : S → A . For other typesof strategies, we refer to [5]. Applying strategy σ to an MDP M resolves allnondeterministic choices and yields an induced DTMC M σ . The expected costof reaching a set of goal states G ⊆ S in this induced DTMC is denoted byEC M σ ( ♦ G ). The goal is to compute a strategy that minimizes the expected cost. The continuous power system model is a system of nonlinear diﬀerential equations.We explain its setup and components in this section. We then discretize the modelw.r.t. time, obtaining continuous-state dynamics as a set of nonlinear equations.

We model a power grid as an undirected graph of interconnected nodes, to whichgenerators and loads are connected. The example grid in Fig. 1 has one generator,three nodes with connected loads, plus one wind farm and one battery. We adoptthe grid model dynamics proposed in [4,34]. The dynamics in every node aregiven by the active power swing equation , which describes the balance between owards Predictive Veriﬁcation of Smart Grids 5 electrical and kinetic energy at that node in the grid [40]. The state at time t fornode n is determined by the voltage angle and frequency, yielding the dynamics: m n ¨ δ n ( t ) + d n ˙ δ n ( t ) = ¯ P n ( t ) − (cid:80) p ∈N b n,p sin( δ n ( t ) − δ p ( t )) (1)where δ n ( t ), ˙ δ n ( t ), ¨ δ n ( t ) ∈ R are the voltage angle, angular velocity (frequency),and angular acceleration of node n , and m n and d n are inertia and dampingcoeﬃcients, respectively. ¯ P n ( t ) ∈ R is the power balance at node n , and is givenby the sum of the generation and loads at that node. The power ﬂow betweennode n and all connected nodes p ∈ N is assumed to be purely reactive andcharacterized by the line susceptance, b n,p , and the diﬀerence in voltage anglebetween the connected nodes. A detailed description is available in [3,34]. The grid frequency deviation for node n , denoted ω n ( t ), is the diﬀerence betweenthe absolute frequency, ˙ δ n ( t ) in Eq. 1, and the desired frequency (e.g. 50 Hz inEurope). The value of ω n ( t ) can be controlled by the injection or consumption ofelectrical energy; any mismatch between power supply and demand results in adeviation. As shown in Fig. 1, we distinguish ﬁve generating or consuming assets: – P gen ( t ) is the conventional power dispatch . Conventional generators are subjectto ramping limits, so the derivative ˙ P gen ( t ) is restricted to certain bounds. – P load ( t ), the consumer load , represents the known and uncontrollable demand.We can readily extend the model to controllable or uncertain demand. – R gen ( t ) is the deployment of spinning reserves . It is a control variable in theoptimal power ﬂow problem. – P wind ( t ), the wind power generation , is a random variable [28] due to itslimited predictability. We deﬁne P wind,fc ( t ) as the forecast wind power attime t ; then the forecast error is ∆P wind ( t ) = P wind ( t ) − P wind,fc ( t ). – A battery is an energy storage buﬀer whose state of charge (SoC) q ( t ) followsthe injection or consumption of energy. P stor ( t ) is the uncontrollable powerinput variable, known from the day-ahead plan, and R stor ( t ) is the demand-sideﬂexibility power rate, which is a control variable in the power ﬂow problem. Ancillary service deployment is subject to two restrictions: reserves and storageﬂexibility must be scheduled a day ahead, and the deployment can never exceedthe scheduled amount. The following constraints ensure these restrictions: − R gends ( t ) ≤ R gen ( t ) ≤ R genus ( t ) (2)ensures that the deployment of spinning reserves is within the scheduled boundswhere R gends ( t ) ≥ R genus ( t ) ≥

0) is the scheduled amount of down-spinning(up-spinning) reserves. Similarly we have for ﬂexibility that − R stordd ( t ) ≤ R stor ( t ) ≤ R storid ( t ) , (3)where R stordd ( k ) ≥ R storid ( k ) ≥ T.S. Badings, A. Hartmanns, N. Jansen, and M. Suilen

We discretize the continuous dynamics with respect to time to render the problemof optimal frequency control tractable. The resulting model is a set of nonlinearequations, which albeit discretized w.r.t. time are still deﬁned on continuousstate and control spaces. They describe the transition from one continuous stateand control input to the resulting continuous state one discrete time step later.Consider a power grid with n t nodes, n g generators, n f wind farms, and n s batteries for storage. Its continuous dynamics are given as a system of 2 n t + n g + n s ﬁrst-order diﬀerential equations. The features of the continuous state space aregiven by the voltage angles δ n t and frequencies ω n t for all n t nodes, thepower generation P gen1: n g for all n g generators, and the state of charge q n s of all n s batteries. The vector of control variables contains the change in generatordispatch ˙ P gen1: n g and the reserve deployment R gen1: n g for all n g generators, plus theﬂexibility deployment R stor1: n s for all n s batteries. We discretize w.r.t. time via theﬁrst-order backward Euler implicit method [36] to obtain the nonlinear function x ( k + 1) = f (cid:0) x ( k ) , u ( k ) , v ( k ) , w ( k ) (cid:1) , (4) – with f ( · ) reﬂecting the dynamics of the considered power system, which arenonlinear due to the sinusoid, sin( δ n ( t ) − δ p ( t )), in Eq. 1, – x ( k ) = [ δ n t , ω n t , P gen1: n g , q n s ] ∈ R n t + n g + n s the state vector, – u ( k ) = [ ˙ P gen1: n g , R gen1: n g , R stor1: n s ] ∈ R n g + n s the vector of control variables, – v ( k ) = [ P load1: n t , P wind,fc1: n f , P stor1: n s ] ∈ R n t + n f + n s the uncontrollable known inputs, and – w ( k ) = [ ∆P wind1: n f ] ∈ R n f the vector of uncontrollable random variables.We omit further details for the sake of brevity and refer the interested readerto [4] and [39] for the full derivation and discretization of similar grid models. Power balance and control variables.

The day-ahead generator power dis-patch is scheduled such that generation plus wind power forecast matches theconsumer load pattern. We impose the following constraint at every time point k : (cid:80) n g i =1 P gen i ( k ) + (cid:80) n f m =1 P wind,fc m ( k ) = (cid:80) n t n =1 P load n ( k ) . (5)Since P wind,fc m and P load n are known, imposing this equality constraint yields n g − (cid:80) n g i =1 R gen i ( k ) + (cid:80) n f m =1 ∆P wind m ( k ) = (cid:80) n s n =1 R stor n ( k ) , (6)where ∆P wind m is a random variable, and both R gen i and R stor n are control variables.Hence, imposing Eq. 6 yields n g + n s − Power system constraints.

The discrete-time power system model in Eq. 4is subject to a number of constraints. First of all, the equality constraints in owards Predictive Veriﬁcation of Smart Grids 7

Eqs. 5 and 6 are imposed to enforce the balance between power supply anddemand. Second, power lines have limited transmission capacity, and generatorshave limited generation capacity and ramping capability. Third, the deploymentof reserve power and storage ﬂexibility is limited by their scheduled values, asdescribed in Eqs. 2 and 3. Finally, the electrical storage units have a limitedcapacity, and can only be charged or discharged at a given maximum rate. Forthe explicit formulation of the constraints we refer the interested reader to [34].

Our goal is to overcome the need for sampling and linearization to optimizeancillary service deployment. Directly using the model presented in Eq. 4 wouldrequire dealing with continuous state and action spaces. Our approach is todiscretize the actions, then explore the resulting ﬁnite number of (continuously-valued) successor states up to a given exploration depth. In this way, we obtain aﬁnite MDP that can be solved iteratively using a receding horizon principle. Thisapproach is a discrete-state model predictive control technique. We now presentthe wind error DTMC, followed by the details of our exploration procedure andthe formal deﬁnition of the ﬁnite-horizon MDP.

Recall that the wind power forecast error, ∆P wind m ( k ) ∈ R , of every individualwind farm m ∈ { , . . . , n f } is a continuous random variable in Eq. 4. Exploitingthe time discretization of Sect. 3.4, we construct a DTMC for the forecast errorof every wind farm with time resolution equal to the discretization level of thedynamics. For this section, we assume the presence of only one wind farm tosimplify notation. Let S be the ﬁnite state space of the DTMC, and let function M : R → S map the wind power forecast error ∆P wind ∈ R to a state s w ∈ S :every value of the continuous random variable ∆P wind is approximated by thevalue of a discrete state s w ∈ S . The DTMC’s transition function T : S → Dist ( S )describes the probability that a transition occurs from one state s w at any time k to another state s (cid:48) w at time k + 1 with probability T ( s w )( s (cid:48) w ). Historical wind power data.

We follow the method proposed in [24,28] toconstruct the DTMC based on the historical wind power forecast error. Thestates S are based on a uniform discretization of the historical forecast error.The transition probabilities in T are determined using a maximum likelihoodestimation, by counting the number of transitions from one state s w at time k toa successor state s (cid:48) w at time k + 1. We use ﬁve years (2015-2019) of on-shore windpower data measured every 15 minutes from the TenneT region of the Germantransmission grid, obtained from the ENTSO-E Transparency Platform [13]. Weinterpolate the data to a 5-minute basis, to match the time resolution we employin the numerical demonstration in Sect. 5. The data set contains the wind powerforecast, P wind,fc , and the actual power, P wind , thus providing a ﬁve-year time T.S. Badings, A. Hartmanns, N. Jansen, and M. Suilen(a) Using 5-minute based data. (b) Using 15-minute based data.

Fig. 2: Transition probability function of the wind error DTMC.series of the observed wind power error, ∆P wind . M then maps every continuousvalue of ∆P wind to one of 41 discrete states, i.e. S = { s w , . . . , s w } , as in [24]. Weshow the resulting transition matrix in Fig. 2 for the original 15-minute and theinterpolated 5-minute data. The matrix is diagonally dominant, reﬂecting thestrong auto-correlation of the forecast error. Next, we present our method to explore the continuous state space of the modelup to a given exploration horizon. We discretize the continuous control variablessuch that their potential values deﬁne a set of actions for an MDP, then a priorieliminate those actions that lead to states violating the constraints of Sect. 3.4.Let X k = ( x ( k ) , s w ) denote the continuous state vector x ( k ) at time k , andthe DTMC state s w associated with the current wind power forecast error. Todeﬁne an initial state, we use a concrete measurement at the ﬁrst time step. Then,we use the dynamics in Eq. 4 in combination with transitions of the wind errorDTMC to compute the possible successor states for each time step. At time step k we select a control input u ( k ) and, based on the dynamics in Eq. 4, receivea set of successor states with diﬀerent x ( k + 1) , . . . , x ( k + 1) n , one associatedwith every possible wind error successor state, s w , . . . , s nw with T ( s w )( s iw ) > ≤ i ≤ n . The features in x ( k + 1) i related to power generation ( P gen1: n g )and battery SoC ( q n s ) are equal for all 1 ≤ i ≤ n , while the grid features( δ n t , ω n t ) depend on the wind successor state, s iw . As the probabilities to reachthese successor states depend exclusively on the probabilities deﬁned by theDTMC, we reach state X k +1 = ( x ( k + 1) i , s iw )) with probability T ( s w )( s iw ). Feasible control space.

Under the two balance constraints in Eqs. 5 and 6, thevector of control variables u ( k ) in Eq. 4 contains 2 n g + n s − u ∈ U X k ⊂ R n g + n s − that do not lead to a violation of any of the constraints at time k + 1. Note thatthis set depends on the state X k at time k . Given the subset of feasible continuous owards Predictive Veriﬁcation of Smart Grids 9 ω min ω max q max X k a a a a a s w , . . . , s nw freq. deviation, ω (Hz)SoC, q (%) Fig. 3: Discrete actions a , . . . , a and their mapping to two continuous state-space features: a battery SoC and the frequency in one node. Curved arrowsshow the eﬀect of wind successor states, and dashed lines are system constraints.control inputs, we apply a grid-based discretization in all 2 n g + n s − A X k of feasible and discrete actions at time k , where the subscriptdenotes the dependency on X k . From this discretization of the control actions, animportant property follows. Given the current state X k at time k , every action a ∈ A X k has a diﬀerent set of discrete successor states X k +1 at time k + 1. Byonly exploring the continuous state-space for the feasible actions in A X k , weminimize the size of the resulting ﬁnite-state model.A schematic example of this procedure for two state features (one nodefrequency and one battery SoC) is shown in Fig. 3. The blue dot corresponds tocurrent state X k at time k , and the straight arrows show the discrete actions a ∈ A X k . The curved arrows show the eﬀect of the forecast error to the grid frequency,which depends directly on the actual successor state in the DTMC. Becauseall successor states of a violate the maximum SoC constraint ( q ( k ) ≤ q max ),this action can be eliminated a priori . Similarly, all successors for a violate themaximum frequency deviation limit ( ω ( k ) ≤ ω max ). Actions a and a will notlead to any violation, and are, therefore, included as feasible discrete actions.Action a may or may not violate the constraints depending on the wind powerforecast error. Therefore, this action cannot be eliminated a priori. Exploration.

The ﬁnite exploration horizon with starting time k and look-ahead of K h steps is the set { k, . . . , k + K h } . The exploration procedure can beperformed recursively for all discrete actions a ∈ A X k for every k , until reachingthe desired horizon. We obtain a tree-structured model as in Fig. 4, where the branching factor depends on the number of actions and possible wind successorstates, and the depth is given by the horizon length. For brevity, the dependencyof the set of discrete actions on the current state is omitted in this ﬁgure.Recall that we consider only those states X k = ( x ( k ) , s w ) of the continuousstate space that are visited during the exploration. We provide the full deﬁnitionof the MDP ( S, A, s I , T, c ) with exploration horizon { k, . . . k + K h } , where X k . . . ...... . . . ...... . . . ... . . . ... a a | A | a a | A | a | A | a a a | A | a | A | a ( x ( k + 1) , s w )( x ( k + 1) n , s nw )( x ( k + 1) n , s nw )( x ( k + 1) , s w )Time step: k + 1 k + 2 k + K h − k + K h Fig. 4: Visualization of the state-space exploration procedure, with initial mea-surement X k at time k , and ﬁnite horizon with end time k + K h . Solid edgesrepresent discrete actions, whereas dashed lines reﬂect diﬀerent successor states. – every state s ∈ S is associated with an X k from the continuous state space; – the set of actions A is the union of all feasible action sets A X k for all X k ; – the initial state s I is given by a concrete measurement ( x , s w ); – the partial probabilistic transition function T : S × A → Dist ( S ) maps ev-ery state and action to the corresponding distribution over successor statesaccording to the wind error DTMC; – the cost function c : S → R assigns the immediate cost given by the sum ofthe absolute value of the frequency deviation in every grid node.Finally, we deﬁne the set G ⊆ S of goal states as the states that are reached atthe end of the horizon k + K h and satisfy the constraints. Receding horizon and tree structure.

We compute the strategy σ thatinduces the minimal expected cost EC M σ ( ♦ G ) for the MDP with horizon { k, . . . k + K h } . Using this strategy, we implement the (ﬁrst) optimal action a = σ ( s ) where s is associated with X k . Having executed the optimal actionat time k , we then apply the so-called receding horizon principle , meaning thatwe shift the exploration horizon one step forward in time. We update the MDPfor the shifted horizon, which is now given as { k + 1 , . . . k + K h + 1 } , and againcompute an optimal strategy. Hence, with every shift of the exploration horizonin time, we reveal small bits of new information near the end of the horizon.As an example, consider an initial exploration horizon deﬁned as the timeframe between 9:00-9:15 AM. Computing an optimal strategy for the correspond-ing MDP means that we take all the information within that time frame intoaccount, i.e. we have perfect knowledge within the horizon. However, a possible owards Predictive Veriﬁcation of Smart Grids 11 change in the power demand (or any other uncontrollable input) that is forecastat 9:30 AM is not revealed to the model, until the exploration horizon also spansthat time step. Deﬁning an adequate length of the exploration horizon reﬂects atrade-oﬀ between model size and the optimality of the solution.When we shift the exploration horizon from { k, . . . , k + K h } to { k + 1 , . . . , k + K h + 1 } , another layer is added to the MDP. As the starting time of the horizonalso shifts, we gain new information via a new measurement at k + 1, and thenobtain a single new initial state for the MDP with the new horizon. By exploitingthe tree structure of the MDP, we extend the current MDP with a new layer ofsuccessor states and take the subtree starting at the newly found initial state attime k + 1. Simultaneously, we shrink the layer at the top, by pruning all statesthat cannot be reached anymore from the new initial state.The approach of iteratively solving and updating the MDP for shifting horizonsis similar to a discrete-state model predictive control technique. This method isapplied iteratively, until a desired simulation horizon is reached (e.g. 24 hours). We demonstrate the performance of our approach on multiple variants of the3-node example grid already introduced in Fig. 1. To this end, we simulate thewind error DTMC , and apply our method to build the MDP using the sampledwind power forecast error at every time step. As such, every wind power trajectorysampled from the DTMC results in a (potentially diﬀerent) run of our approach.We solve the MDP to obtain the ﬁrst action in the sequence of optimal decisionsover the exploration horizon, which is then executed to obtain the initial stateat the next time step. By applying the receding horizon principle, we iterativelyfollow this procedure, until a ﬁnal simulation horizon of 24 hours is reached.

We consider a simulation time resolution of 5 minutes, and a full simulationhorizon of 24 hours. Using the receding horizon exploration procedure, this meansthat ·

24 = 288 MDPs are solved to obtain the results over one 24-hour run. Thesame demand and wind power forecast are used for every simulation, resultingin the day-ahead power dispatch shown in Fig. 5. This ﬁgure shows that underthe forecast conditions, the power supply and demand are perfectly balanced,leading to a stable grid frequency in the absence of forecast errors. Both thepower demand and wind power forecast are based on historical data obtainedfrom the

ENTSO-E Transparency Platform [13], and are scaled appropriately forthe simulation study. The scheduled ﬂexibility deployment limits per battery areset to ± ± .

25 MW.

Simulation cases.

Simulations are performed on three variants of the examplegrid in Fig. 1. In the simplest case, we aggregate the three nodes into one,resulting in a grid where all assets are connected to the same node. The second

Fig. 5: The day-ahead power balance between generator dispatch, wind powerforecast, and the power demand, which is equivalent for all performed simulations.variant is the exact network shown in Fig. 1. The third variant is an extension ofthe second, where a second storage unit is connected to node 1.Then, we perform simulation studies for diﬀerent values of: a) the explorationhorizon (300, 600, and 900 seconds), and b) the levels of discretization for theactions denoted by λ (with values between 3 and 25 steps). To evaluate thequality of the solution of our technique, we perform a statistical analysis on everycase, by repeating every experiment 100 times. Note that this Monte Carlo typesimulation is merely used to evaluate the quality of the obtained solutions, andnot required to apply our technique in practice. Source code.

All of the code and input data needed to reproduce the re-sults presented in this paper are available at https://gitlab.science.ru.nl/tbadings/power-system-mdp . Our prototype implementation is written inPython version 3.8.3, and allows the user to run the cases deﬁned above or simu-late with any other parameter setting. The simulations are run on a Windowslaptop with a 1.30GHz Intel(R) i7-1065G7 CPU and 16.0 GB of RAM.

The observed run times are approximately proportional to thenumber of MDP states, and grow exponentially with the exploration horizon. Forthe 3-node system with 1 battery, an action discretization level of λ = 5 steps,and exploration horizon of 300 seconds, the average run time per iteration of thereceding horizon (i.e. for solving one MDP) is 0 .

01 seconds, resulting in around3 .

51 seconds per 24h run. For the same case with longer exploration horizons of600 and 900 seconds, average run times are 0 .

14 and 2 .

09 seconds, respectively,per receding horizon iteration.The strongest increase in run time is observed for the 3-node, 2 batterycase, where the average observed times were 0 .

03, 1 .

03, and 223 .

59 seconds forexploration horizons of 300, 600, and 900 seconds, respectively. This steeperincrease is explained by the exponential growth of the model size with respect tothe number of actions, which is higher for the case with 2 batteries. owards Predictive Veriﬁcation of Smart Grids 13300 600 90000 . . . . J [ H z · h ] λ = 3 λ = 5 λ = 7 λ = 13 λ = 2595% CI Fig. 6: Average total frequency deviations for the 3-node case for diﬀerent λ . Failed runs.

As visualized before in Fig. 3, a discrete action can lead to asuccessor state that violates one of the system constraints. In total, 4 .

2% of theperformed iterations for all cases combined resulted in a violation of either thefrequency limits or the power line transmission capacity. No signiﬁcant diﬀerencesin the percentage of failed runs is observed across the diﬀerent cases. Since theseruns are incomplete and not representative for the simulated cases, they are notreported for further analysis. Nevertheless, the number of failed runs provides anempirical indication of the adequacy of the scheduling limits for the reserve andﬂexibility deployment. A high percentage of failed runs means that the ancillaryservice scheduling limits might be insuﬃcient, and should be enlarged.

Solution quality.

The cost function of the MDP penalizes the sum of theabsolute value of the frequency deviations in all grid nodes. Therefore, the qualityof the obtained solution can be evaluated by taking the integral of the totalobserved frequency deviation over the simulation horizon: J = (cid:90) k end k (cid:12)(cid:12) n t (cid:88) n =1 ω n ( k ) (cid:12)(cid:12) dk, where k and k end cover the full 24-hour simulation time. The lower the value of J , the better the quality of the solution in terms of total frequency deviations.A comparison of the value for J between multiple cases on the 3-node networkfrom Fig. 1, with diﬀerent levels of λ ∈ { , . . . , } , is shown in Fig. 6. Everybar shows the average results and the 95% conﬁdence interval (CI) of the 100iterations performed for that case. Due to infeasible run times, simulations forthe cases with exploration horizon of 900 seconds and λ ≥

13 were omitted.We observe that (1) a longer exploration horizon does not improve the qualityof the solution in terms of frequency deviations signiﬁcantly. On the other hand, (2) increasing the number of discrete actions in every dimension yields a signiﬁcantlybetter solution quality. This observation suggests that it is more beneﬁcial tohave a more ﬁne-grained discretization of the continuous control space, than toinvest in a longer optimization horizon. Exploration horizon [s] S t a t e s p e r M D P Exploration horizon [s] A c t i o n s p e r M D P Fig. 7: Average MDP states (left) and actions (right) for cases with λ = 5, andvarying numbers of nodes and batteries (plotted in log scale). MDP model size.

In Fig. 7, the average number of states and actions perMDP are compared for diﬀerent network conﬁgurations, all with λ = 5. All caseswere simulated for 100 runs, except for the case with 2 batteries, which wasonly simulated for 1 run, due to a too long run time. We see that the numberof states and actions is independent of the number of grid nodes . The intuitionbehind this is that an increased number of state features does not yield a largerMDP. In fact, the branching factor of the applied exploration procedure onlydepends on the number of actions and successor states in the wind error DTMC,and is independent of the number of state features. As expected, increasing thenumber of batteries to 2 yields a signiﬁcant increase in the number of MDP statesand actions, especially for a longer exploration horizon, due to the additionaldimension of the continuous control space. Frequency control and ancillary service deployment.

Finally, in Fig. 8,two example runs for the 3-node network with a single battery and explorationhorizon of 600 seconds are shown. Fig. 8a presents the frequency deviations for λ = 3, while Fig. 8b shows the same for λ = 25. Similarly, Figs. 8c and 8d showthe deployment of reserves and ﬂexibility for both cases. Due to the uncertaintyin the wind power forecast error, the results in Fig. 8 present only two possibletrajectories, and repeating the experiment can lead to diﬀerent results.The observed frequency deviations are signiﬁcantly lower for the case withmore ﬁne-grained discretization of the actions (i.e. λ = 25), thus conﬁrming theresults also shown in Fig. 6. This diﬀerence in the control precision is clearlydepicted between Figs. 8c and 8d. The intuition behind this is that a ﬁne-graineddiscretization allows for more precise control of the ancillary service deployment,which results in lower frequency deviations. owards Predictive Veriﬁcation of Smart Grids 15(a) Frequency deviations for λ = 3. (b) Frequency deviations for λ = 25.(c) Ancillary services for λ = 3. (d) Ancillary services for λ = 25. Fig. 8: Results for two runs for the 3-node case with 1 battery, for diﬀerent λ . We presented a novel method to solve the problem of short-term scheduling forﬂexibility-based ancillary services in power systems with uncertain wind powergeneration. By modelling the problem as an MDP, we overcome the need forboth sampling and linearization, as opposed to the continuous-state approachesused by most traditional power system analysis methods. Our experiments showthat our approach is feasible for power grids with diﬀerent levels of complexityand under realistic operating conditions. Furthermore, our results show it is morebeneﬁcial to have a more ﬁne-grained discretization of the continuous controlspace, than to invest in a longer optimization horizon. Since the size of the MDPgrows exponentially with both the number of actions and the exploration horizon,making a trade-oﬀ between the two is necessary.In the future, we will exploit the ﬂexibility of our model to incorporatealternative grid conﬁgurations and multiple sources of uncertainty, such asimperfect communication between assets in the grid, or demand uncertainty.Moreover, instead of the batteries, other ﬂexible assets can also be modeled, suchas ﬂexibility provided by the thermal inertia of large-scale buildings.

References

1. Aghaei, J., Alizadeh, M.I.: Demand response in smart electricity grids equipped withrenewable energy sources: A review. Renewable and Sustainable Energy Reviews , 64–72 (2013)2. Al-Saﬀar, M., Mus´ılek, P.: Distributed optimal power ﬂow for electric power systemswith high penetration of distributed energy resources. In: CCECE 2019. pp. 1–5.IEEE (2019)3. Badings, T.S.: MSc Thesis. Buildings-to-Grid Integration for Demand-Side Flexibil-ity in Power Systems with Uncertain Generation. University of Groningen (2019)4. Badings, T.S., Rostampour, V., Scherpen, J.M.: Distributed Building EnergyStorage Units for Frequency Control Service in Power Systems. IFAC-PapersOnLine (4), 228–233 (2019)5. Baier, C., Katoen, J.P.: Principles of Model Checking. MIT Press (2008)6. Bertsch, J., Hagspiel, S., Just, L.: Congestion management in power systems:Long-term modeling framework and large-scale application. Journal of RegulatoryEconomics (3), 290–327 (2016)7. Boyd, S.P., Vandenberghe, L.: Convex Optimization. Cambridge University Press(2014)8. Calaﬁore, G.C., Campi, M.C.: The scenario approach to robust control design.IEEE Trans. Autom. Control. (5), 742–753 (2006)9. Campi, M.C., Garatti, S.: The exact feasibility of randomized solutions of uncertainconvex programs. SIAM J. Optim. (3), 1211–1230 (2008)10. Cauchi, N., Abate, A.: \ mathsf stochy : Automated veriﬁcation and synthesis ofstochastic processes. In: TACAS (2). LNCS, vol. 11428, pp. 247–264. Springer(2019)11. Chertkov, M., Chernyak, V.: Ensemble of thermostatically controlled loads: Statis-tical physics approach. Scientiﬁc reports (1), 1–9 (2017)12. Ding, T., Zeng, Z., Bai, J., Qin, B., Yang, Y., Shahidehpour, M.: Optimal electricvehicle charging strategy with Markov decision process and reinforcement learningtechnique. IEEE Transactions on Industry Applications (5), 5811–5823 (2020)13. ENTSO-e: Transparency Platform - Generation Forecasts for Wind and Solar,Control area Germany (2020)14. Gerard, H., Rivero Puente, E.I., Six, D.: Coordination between transmission anddistribution system operators in the electricity sector: A conceptual framework.Utilities Policy , 40–48 (2018)15. Grillo, S., Pievatolo, A., Tironi, E.: Optimal storage scheduling using Markovdecision processes. IEEE Transactions on Sustainable Energy (2), 755–764 (2016)16. Hartmanns, A., Hermanns, H., Berrang, P.: A comparative analysis of decentralizedpower grid stabilization strategies. In: Winter Simulation Conference. pp. 158:1–158:13. WSC (2012)17. Hemmati, R., Saboori, H., Jirdehi, M.A.: Stochastic planning and schedulingof energy storage systems for congestion management in electric power systemsincluding renewable energy resources. Energy , 380–387 (2017)18. Kempton, W., Udo, V., Huber, K., Komara, K., Letendre, S., Baker, S., Brunner,D., Pearre, N.: A test of Vehicle-to-Grid (V2G) for Energy Storage and FrequencyRegulation in the PJM System. Results from an Industry-University ResearchPartnership (2008)19. Liu, Y., Yu, N., Wang, W., Guan, X., Xu, Z., Dong, B., Liu, T.: Coordinating theoperations of smart buildings in smart grids. Applied Energy (July), 2510–2525(2018)owards Predictive Veriﬁcation of Smart Grids 1720. Lymperopoulos, I., Qureshi, F.A., Nghiem, T., Khatir, A.A., Jones, C.N.: Pro-viding ancillary service with commercial buildings: The Swiss perspective. IFAC-PapersOnLine (8), 6–13 (2015)21. MacDougall, P., Roossien, B., Warmer, C., Kok, K.: Quantifying ﬂexibility forsmart grid services. In: 2013 IEEE Power Energy Society General Meeting. pp. 1–5.IEEE (2013)22. Machowski, J., Dong, Z.Y., Member, S., Zhang, P.: Power System Dynamics:Stability and Control. Wiley (2006)23. Margellos, K., Goulart, P., Lygeros, J.: On the road between robust optimizationand the scenario approach for chance constrained optimization problems. IEEETrans. Autom. Control. (8), 2258–2263 (2014)24. Margellos, K., Haring, T., Hokayem, P., Schubiger, M., Lygeros, J., Andersson,G.: A Robust Reserve Scheduling Technique for Power Systems with High WindPenetration. Proceedings of PMAPS pp. 870–875 (2012)25. Murty, K.G., Kabadi, S.N.: Some np-complete problems in quadratic and nonlinearprogramming. Math. Program. (2), 117–129 (1987)26. NG ESO: Optional downward ﬂexibility management (odfm) service documents.National Grid ESO (2020)27. Nguyen, D.B., Scherpen, J.M.A., Bliek, F.: Distributed optimal control of smartelectricity grids with congestion management. IEEE Trans Autom. Sci. Eng. (2),494–504 (2017)28. Papaefthymiou, G., Kl¨ockl, B.: MCMC for wind power simulation. IEEE Transac-tions on Energy Conversion (1), 234–240 (2008)29. Peruﬀo, A., Guiu, E., Panciatici, P., Abate, A.: Safety guarantees for the electricitygrid with signiﬁcant renewables generation. In: QEST. LNCS, vol. 11785, pp.332–349. Springer (2019)30. Pillay, A., Prabhakar Karthikeyan, S., Kothari, D.P.: Congestion management inpower systems - A review. International Journal of Electrical Power and EnergySystems , 83–90 (2015)31. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Pro-gramming. Wiley Series in Probability and Statistics, Wiley (1994)32. Razmara, M., Bharati, G.R., Shahbakhti, M., Paudyal, S., Robinett, R.D.: Bileveloptimization framework for smart building-to-grid systems. IEEE Trans. SmartGrid (2), 582–593 (2018)33. Rostampour, V., Badings, T.S., Scherpen, J.M.A.: Buildings-to-grid integrationwith high wind power penetration. In: CDC. pp. 2976–2981. IEEE (2019)34. Rostampour, V., Badings, T.S., Scherpen, J.M.A.: Demand ﬂexibility managementfor buildings-to-grid integration with uncertain generation. Energies (24) (2020)35. Rostampour, V., Ter Haar, O., Keviczky, T.: Distributed stochastic reserve schedul-ing in ac power systems with uncertain generation. IEEE Transactions on PowerSystems (2), 1005–1020 (2018)36. Sincovec, R.F., Erisman, A.M., Yip, E.L., Epton, M.A.: Analysis of DescriptorSystems Using Numerical Algorithms. IEEE Trans. Autom. Control. (1), 139–147(1981)37. Soudjani, S.E.Z., Abate, A.: Aggregation and control of populations of thermostati-cally controlled loads by formal abstractions. IEEE Trans. Control. Syst. Technol. (3), 975–990 (2015)38. Soudjani, S.E.Z., Gevaerts, C., Abate, A.: FAUST 2 : Formal abstractions ofuncountable-state stochastic processes. In: TACAS. LNCS, vol. 9035, pp. 272–286.Springer (2015)8 T.S. Badings, A. Hartmanns, N. Jansen, and M. Suilen39. Taha, A.F., Gatsis, N., Dong, B., Pipri, A., Li, Z.: Buildings-to-grid integrationframework. IEEE Trans. Smart Grid (2), 1237–1249 (2019)40. Trip, S., B¨urger, M., Persis, C.D.: An internal model approach to frequency regula-tion in inverter-based microgrids with time-varying voltages. In: CDC. pp. 223–228.IEEE (2014)41. Wang, J., Liu, C., Ton, D., Zhou, Y., Kim, J., Vyas, A.: Impact of plug-in hybridelectric vehicles on power systems with demand response and wind power. EnergyPolicy39