Application-Driven Learning via Joint Prediction and Optimization of Demand and Reserves Requirement
Joaquim Dias Garcia, Alexandre Street, Tito Homem-de-Mello, Francisco D. Muñoz
Application-Driven Learning via Joint Estimation and Optimization for Demand and Reserves Requirement Forecast
Joaquim Dias Garcia
LAMPS, DEE, PUC-Rio & PSR-Inc, Rio de Janeiro, Brazil
Alexandre Street
LAMPS, DEE, PUC-Rio, Rio de Janeiro, Brazil
Tito Homem-de-Mello
School of Business, UAI, Santiago, Chile
Francisco D. Muñoz
Facultad de Ingeniería y Ciencias, UAI, Santiago, Chile
Forecasting and decision-making are generally modeled as two sequential steps with no feedback, following an open-loop approach. In power systems, operators first forecast loads trying to minimize errors with respect to historical data. They also size reserve requirements based on error estimates. Next, they make unit commitment decisions and operate the system following a dispatch schedule, deploying reserves as needed to accommodate forecast errors. However, co-optimizing these processes may lead to better decisions and result in lower operating costs than when they are considered sequentially. In this paper we present a new closed-loop learning framework in which the processes of forecasting and decision-making are merged and co-optimized through a bilevel optimization problem. We prove asymptotic convergence of the method and propose two solution approaches: an exact method based on the KKT conditions of the second level problem and a scalable heuristic approach suitable for decomposition methods. We benchmark our methodology with the standard sequential least squares forecast and dispatch planning process. We apply the proposed methodology to an illustrative single-bus system and to the IEEE 24-, 118-, and 300-bus test systems. Our results show that the proposed approach yields consistently better performance than the standard open-loop approach.
Key words: Application-driven learning, joint estimation and optimization, bilevel optimization, reserves scheduling, forecast, power systems planning
arXiv [math.OC], Feb
Dias Garcia, Street, Homem-de-Mello, Muñoz: Application-driven learning applied to demand and reserve forecast
1. Introduction
The most common approach to making decisions under uncertainty involves three steps. In the first step, one develops a forecast for all uncertainties that affect the decision-making problem based on all information available. In the second step, an action based on the forecast is selected. Finally, in the third step, one implements corrective actions after uncertainties are realized. This three-step procedure constitutes an open-loop forecast-decision process in which the outcomes of the decisions are not considered in the forecasting framework.
In the electricity sector, it is common for system operators to use an open-loop forecast-decision approach. First, loads are forecast based on standard statistical techniques, such as least squares (LS), and reserve requirements are defined by simple rules, based on quantiles, extreme values and standard deviation of forecast errors according to specified reliability standards (Ela et al. 2011a). Then, a decision is made to allocate generation resources following an energy and reserve scheduling program (Chen et al. 2014, De Vos et al. 2019). In real-time, reserves are deployed to ensure that power is balanced at every node, compensating for forecast errors.
However, it has been demonstrated that stochastic programming models yield better results than deterministic ones when making decisions under uncertainty because the former take distributions into consideration. These models provide better results in terms of cost, reliability, and market efficiency compared to deterministic approaches (Wang and Hobbs 2014). Nevertheless, in practical applications, model tractability imposes small sample sizes for sample average approximations (SAA). A consequence of this tractability issue is that SAA solutions become sample dependent (Papavasiliou et al. 2014, Papavasiliou and Oren 2013), thereby compromising market transparency and preventing stakeholder acceptance (Wang and Hobbs 2015). Therefore, most system operators worldwide still rely on deterministic short-term scheduling (economic dispatch or unit commitment) models with exogenous forecasts for loads and reserve requirements (Chen et al. 2014, PJM 2018). Within this context, one alternative to improve the performance of deterministic scheduling tools is to forecast load and reserve requirements with the goal of minimizing energy and reserve scheduling costs.
There is empirical evidence that system operators rely on ad hoc or out-of-market actions, and not just on reserves, to deal with uncertainty in operations. According to the 2019 Annual Report on Market Issues and Performance of the California ISO (CAISO 2020), “...operators regularly take significant out-of-market actions to address the net load uncertainty over a longer multi-hour time horizon (e.g., 2 or 3 hours). These actions include routine upward biasing of the hour-ahead and 15-minute load forecast, and exceptional dispatches to commit and begin to ramp up additional gas-fired units in advance of the evening ramping hours.” Additionally, reserve requirements are, in practice, empirically defined according to further ad hoc off-line rules based on off-line analysis (Ela et al. 2011b, PJM 2018). These ad hoc procedures lack technical formalism and transparency to minimize operating and reliability costs. Consequently, this challenging real-world application requires further research on the topic.
For years, decision-making and forecasting have been treated as two completely separate processes (Bertsimas and Kallus 2019). Many communities, such as Statistics and Operations Research, have studied these problems and developed multiple tools combining probability and optimization. The machine learning community, which combines many ideas from optimization and probability, has also been tackling such tasks and has proposed methods to treat them jointly (Bengio 1997). Classical forecasting methods do not take the underlying application of the forecast into account. Consequently, hypotheses such as prediction error symmetry in least squares (LS) might not be the best fit for problems with asymmetric outcomes. Acknowledging the asymmetry in particular problems has led researchers to attempt to capture it empirically; however, this approach is indirect and does not take the application into account directly. Some existing methods do capture asymmetry, such as Quantile Regression (QR) (Rockafellar et al. 2008) applied to portfolio allocation. The interest in exploring asymmetric loss functions is not new. For instance, Zellner (1986a) and Zellner (1986b) acknowledge that biased estimators can perform even better than those that make accurate predictions of statistical properties of the stochastic variables. The author illustrates that an overestimation is not as bad as an underestimation in the case of dam construction and attributes a second example, about the asymmetry in real estate assessment, to Varian (1975).
Within this context, two possible avenues of research are opened to achieve better results: i) a focus on improving the decision-making model (prescriptive framework), which assumes we can change it to consider embedded co-optimized forecasts (Bertsimas and Kallus 2019); or ii) a focus on improving the forecasting model (predictive framework), which assumes we cannot change the decision-making process (in our application, defined by system operators’ dispatch models), but we can change the forecasts to incorporate, in a closed-loop manner, a given application cost function (Bengio 1997, Elmachtoub and Grigas 2017, Muñoz et al. 2020). In this paper, we focus on the latter avenue. In Section 2, we provide a literature review on this subject.
The objective of this paper is to present a new closed-loop application-driven learning framework for point forecast in which the ex-ante (or planning) cost-minimization structure of the decision-maker, i.e., the application schema, is considered in the estimation process. Therefore, our framework replaces the traditional statistical error minimization objective with a cost-minimization structure of a specific application. To do that, we derive a new and flexible learning framework based on a bilevel optimization model. Furthermore, we provide an asymptotic convergence proof for both the objective function value and estimated parameters under mild conditions. This paper focuses on applying the general method to the demand and reserve requirement forecasting problem of power system operators.
The first level of our bilevel problem seeks the parameters of a forecasting model that performs best in terms of the application objective, i.e., ex-ante reserves allocation cost plus ex-post, or real-time, energy dispatch costs incurred when operating the system under the observed demand data. Thus, the first level accounts for both the predictive model specification (parameters selection) and the cost evaluation metric based on the actual operation of the system for many data points. It is relevant to mention that we can also consider reserve requirement constraints imposed by regulatory rules, reliability standards (The European Commission 2017, Ela et al. 2011a), and risk-aversion metrics (Shapiro et al. 2014). In the second level, the ex-ante energy and reserve scheduling process of the system operator is accounted for based on 1) the conditional demand forecast and 2) the reserve requirements, both defined in the first level for each point of the dataset. Thus, multiple instances of the lower-level problem are considered in our bilevel model, each of which represents the one-step-ahead deterministic scheduling process performed by operators for each period. In this context, the second level ensures closed-loop feedback characterizing joint scheduling decisions of energy and reserve allocations without perfect information of the target period data.
Two solution approaches are presented. The first approach is an exact method based on the KKT conditions of the second-level problem. The second is a scalable heuristic approach suitable for decomposition methods and parallel computing. Although not limited to linear bilevel programs, we show how to design efficient methods tailored for off-the-shelf linear optimization solvers. Additionally, our scalable heuristic method ensures optimal second-level solutions. This is a salient feature of our method. In this context, the proposed framework is general and suitable for a wide range of applications relying on the standard structure of the forecast-decision process.
We benchmark our methodology with the traditional sequential least squares forecast and energy and reserve scheduling approach. We apply the proposed methodology to multiple case studies, namely, an illustrative single-bus system and the IEEE 24-, 118-, and 300-bus test systems. Results show that the proposed approach yields consistently better performance in out-of-sample tests than the benchmark where forecasts and decisions are sequentially carried out. More specifically, by comparing the newly proposed exact and heuristic methods, we show that the heuristic approach is capable of consistently achieving high-quality solutions. For larger systems, where the exact method fails to find solutions within reasonable computational times, the heuristic method still exhibits high-quality performance compared to the benchmark for all test systems.
2. Literature review
We provide a literature review on 1) forecast models jointly optimized for a given application, hereinafter referred to as application-driven forecast, and 2) uncertainty forecasting and reserve sizing.
The ingenious idea of integrating the process of forecasting and optimizing a downstream problem was first proposed in the seminal paper by Bengio (1997). More than twenty years ago, the author emphasized the importance of estimating parameters with the correct goals in mind. In that work, a Neural Network (NN) is trained to forecast stocks with an objective function that describes the portfolio revenue given an allocation based on stocks forecast. An attempt to lower the burden of the method was proposed by Garcia and Gençay (2000): the idea is to train multiple prediction models with standard regressions, but choose the best one in the out-of-sample analysis considering the proper application-driven objective function. The work by Kao et al. (2009) presents another intermediary methodology. The model for estimating forecasts includes both the application objective function and the fitness measure similar to maximum likelihood estimation (MLE). A bi-objective problem is solved with scalarization: the authors look for a good balance between MLE and application value.
Following the key idea of Bengio (1997) closely, the work by Donti et al. (2017) presents a generic algorithm to deal with parameter optimization of forecasting models embedded in stochastic programming problems, that is, parameters estimated considering the loss function of the actual problem. The algorithm is based on the stochastic gradient descent (SGD) method and employs tools for automatically differentiating strongly convex quadratic optimization problems. The method is applied to small prototypical quadratic programming problems and the local solutions obtained are shown to be promising.
More recently, the work Smart “Predict and Optimize” (SPO) (Elmachtoub and Grigas 2017) recognizes the importance of closed-loop estimation. The authors develop an algorithm for the linear programming case that is based on relaxation and convexification of the nonlinear loss function before applying a tailored SGD approach, instead of looking for local solutions with nonlinear methods. To develop the algorithm, the authors limit themselves to linear dependency on features and restrict uncertainty to the objective function. SPO is similar to the method proposed by Ryzhov and Powell (2012) to estimate uncertain objective coefficients without considering features in a different context.
While working on this paper for a few months, the authors became aware of a recently posted preprint that bears many conceptual similarities to the general version of our proposed model. The work of Muñoz et al. (2020) also focuses on the idea of finding the best forecast for a given application (or context) through a bilevel framework. The framework proposed in Muñoz et al. (2020) is applied to estimate a parameter of the inverse demand curve of a Cournot strategic producer bidding in forward markets. The scalability of their model relies on a nonlinear relaxation of the right-hand side of the complementarity constraints. In our method, we adopt a different approach to overcome the issue of suboptimal lower-level solutions and to tackle very large scale problems, as will be explained in Section 7.4. Moreover, we prove convergence of our method, whereas Muñoz et al. (2020) do not discuss convergence at all.
Sen and Deng (2018) describe the so-called Learning Enabled Optimization (LEO). LEO is a framework to combine Statistical Learning (SL), Machine Learning (ML) and (stochastic) Optimization. The idea is to compare a set of predefined SL/ML models with the cost value of the actual application in mind. The main difference here is that the ML/SL models are still estimated based on classical methods.
Not surprisingly, some of the above works either explicitly or implicitly formulate the problems as Bilevel Optimization problems. For more information on bilevel optimization the reader is directed to Bard (2013). The work by Dempe (2018) lists hundreds of references related to bilevel optimization, including papers related to parameter optimization. Parameter optimization is frequently modeled as bilevel optimization and has been drawing the attention of many fields such as ML, control, energy systems and game theory. This can be thought of as a version of the closed-loop paradigm since those works target good parameters for algorithms and applications, just like our closed-loop is focused on obtaining good parameters for the forecasting, which will lead to the best solutions for the entire problem. In the ML community, we call attention to hyper-parameter tuning (Franceschi et al. 2018).
The operation of power systems has been profoundly related to uncertainty handling. The electric load has been among the main challenges for forecasters in power systems for many years. Researchers around the world have proposed the most varied methodologies, ranging from standard linear regressions to Neural Networks (NN); these techniques, along with many others, are reviewed in Hong and Fan (2016) and Van der Meer et al. (2018). Variable renewable energy (VRE) sources are probably the current big challenge in forecasting for power systems. Although many techniques are already available, forecasting sources like wind have proven to be significantly harder than load (Van der Meer et al. 2018, Orwig et al. 2014). As shown in reviews (Sweeney et al. 2020, Van der Meer et al. 2018), wind and solar forecasting are divided into two main trends: i) physical-based methods that rely on topographic models and Numerical Weather Predictions; ii) statistical methods including Kalman filters, ARMA models, NN.
Although forecasts have consistently improved in the last years, the systems must be ready to withstand deviations from forecasts. The widely used method is reserve allocation: besides the power scheduled to meet demand forecasts, additional power is scheduled to give the system operator flexibility to handle real-time operation. Since the early work on reserve sizing for PJM by Anstine et al. (1963), many other methods have been proposed to account for the variations in loads, contingencies and VRE (Holttinen et al. 2012). Many reserve sizing rules are applied by different ISOs all over the world (Ela et al. 2011a); these methods vary from deterministic rules to historical and statistical guidelines. Most of the rules in Ela et al. (2011a) are static. Although time-varying reserves have been studied in the past, they have re-emerged as dynamic probabilistic reserve (DPR) (Matos and Bessa 2010). Probabilistic forecasts are frequently used in the DPR method to account for the forecasting errors that will be ultimately handled by reserves in power systems. These probabilistic reserves can be sized following a variety of methods with different complexity based on: forecast error standard deviations (Strbac et al. 2007, Holttinen et al. 2012), non-parametric estimation of forecast error distribution (Bucksteeg et al. 2016) or even machine learning (De Vos et al. 2019). These are all considered stochastic methods and are simple alternatives to capture and incorporate fairly complex dynamics that are challenging for bottom-up approaches (De Vos et al. 2019).
A prominent alternative to the use of reserves in power systems is the Stochastic Unit Commitment (UC), in which many types of reserves can be defined endogenously, targeting cheaper operations on average. However, as described in the review on Unit Commitment by Zheng et al. (2014), there are at least three main barriers toward the wide acceptance of stochastic unit commitment: i) uncertainty modeling, ii) computational performance and iii) market design. Uncertainty modeling is jointly tackled by statistical modeling of the uncertainty concerning scenario generation and forecasting and by a decision-making framework like risk-averse stochastic optimization, robust optimization and so on. Computational performance is the focus of many works like the Lagrangian decomposition (Aravena and Papavasiliou 2020), improved formulations (Knueven et al. 2020), progressive hedging (Gade et al. 2016). However, even with many recent techniques, the computational cost is still a barrier for the usage of stochastic UC in ISOs. Finally, the least studied challenge is the market design since it requires experimenting and developing rules that are both feasible to be implemented and accepted by stakeholders (Kazempour et al. 2018, Wang and Hobbs 2015). Since ISOs currently follow the alternate route and tackle the uncertainty of UC with reserves (Wang and Hobbs 2015), we will also follow this approach to propose a readily applicable method.
3. Application-driven learning and forecasting
In this section, we contrast the standard sequential framework, referred to as open-loop, and the joint estimation and optimization model, closed-loop. The presentation is in general form to facilitate the description of the solution algorithm, to set notation for the convergence results and to highlight that the method has applications beyond load forecasting and reserve sizing in power systems. We will specialize the bilevel optimization problem for closed-loop load forecasting and reserve sizing in Section 6.
We consider a dataset of historical data $\{y_t, x_t\}_{t \in \mathbb{T}}$, where $\mathbb{T} = \{1, \dots, T\}$. Here $y_t$ are observations of a variable of interest that we want to forecast, while $x_t$ are observations of external variables (covariates or features) that can be used to explain the former; the latter might include lags of $y_t$ as in time series auto-regressive modeling. The classic forecast-decision approach works as follows. The practitioner trains a parametric forecast model seeking the best vector of parameters, $\theta$, such that a loss function, $l(\cdot, \cdot)$, between the conditional forecast for sample $t$, $\hat{y}_t(\theta, x_t)$, and the actual data, $y_t$, is minimized: $\min_\theta \frac{1}{T} \sum_{t \in \mathbb{T}} l(\hat{y}_t(\theta, x_t), y_t)$. This is frequently done by LS optimization methods: $\min_\theta \frac{1}{T} \sum_{t \in \mathbb{T}} \|\hat{y}_t(\theta, x_t) - y_t\|_2^2$. In the planning step, a decision is made by an optimized policy based on the previously obtained forecast. This results in a vector $z^*(\hat{y}_t)$, which in our application comprises the schedule of energy and reserves through generating units. Finally, the actual data $y_t$ is observed, and the decision-maker must adapt to it; for instance, the system operator responds with a balancing re-dispatch, and a cost, $G_a(z^*(\hat{y}_t), y_t)$, is measured. There is no feedback of the final cost into the forecasting and decision policy, hence the name Open-Loop.
The core of the proposed predictive framework is to explore a feedback structure between the estimated predictive model and the application cost assessment. The general idea is depicted in Figure 1, which stresses the difference from the Open-Loop model.
Figure 1. Learning models: considering the blue dashed line we have the Closed-Loop model; otherwise it represents the Open-Loop model.
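As a rough illustration of the open-loop sequence (forecast, plan, assess), the sketch below runs the three steps on synthetic data. Everything in it is an assumption made for illustration only: the scalar linear forecast, the toy planning rule `plan` (schedule exactly the forecast, clipped to a hypothetical capacity), the asymmetric cost coefficients in `assess`, and the data-generating process. It is not the system operator's model or one of the paper's case studies.

```python
import random

def fit_least_squares(xs, ys):
    """Closed-form simple linear regression: y ~ theta0 + theta1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    theta1 = sxy / sxx
    return my - theta1 * mx, theta1

def plan(y_hat, z_max=10.0):
    """Toy planning step z*(y_hat): schedule generation equal to the
    forecast, clipped to an assumed capacity range [0, z_max]."""
    return min(max(y_hat, 0.0), z_max)

def assess(z, y, c=1.0, q_up=4.0, q_dn=0.5):
    """Toy assessment cost G_a(z, y): scheduled energy plus asymmetric
    real-time corrections for shortage (q_up) and surplus (q_dn)."""
    return c * z + q_up * max(y - z, 0.0) + q_dn * max(z - y, 0.0)

def open_loop_cost(xs, ys):
    theta = fit_least_squares(xs, ys)      # step 1: forecast by LS
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = theta[0] + theta[1] * x
        z = plan(y_hat)                    # step 2: decide using the forecast
        total += assess(z, y)              # step 3: observe y and pay the cost
    return theta, total / len(xs)

# Synthetic data (assumed): linear load with Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 5.0) for _ in range(200)]
ys = [1.0 + 0.8 * x + rng.gauss(0.0, 0.5) for x in xs]

theta_ls, avg_cost = open_loop_cost(xs, ys)
print("LS parameters:", theta_ls)
print("average assessed cost:", avg_cost)
```

Note that the cost computed in step 3 never influences the fit in step 1, which is precisely the missing feedback that the closed-loop framework introduces.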
The estimation method can be mathematically described through the following bilevel optimization problem (BOP):
$$\theta_T = \arg\min_{\theta \in \Theta,\, \hat{y}_t,\, z_t^*(\cdot)} \; \frac{1}{T} \sum_{t \in \mathbb{T}} G_a(z_t^*(\hat{y}_t), y_t) \tag{1}$$
$$\text{s.t.} \quad \hat{y}_t = \Psi(\theta, x_t) \quad \forall t \in \mathbb{T} \tag{2}$$
$$z_t^*(\hat{y}_t) \in \arg\min_{z \in Z} G_p(z, \hat{y}_t) \quad \forall t \in \mathbb{T}, \tag{3}$$
where, for $i \in \{a, p\}$,
$$G_i(z, y) = c_i^\top z + Q_i(z, y) \tag{4}$$
$$Q_i(z, y) = \min_u \{ q_i^\top u \mid W_i u \ge b_i - H_i z + F_i y \}. \tag{5}$$
Note that the functions in (4) and (5) resemble the formulation of two-stage stochastic programs, in that given a decision $z$ and an observation $y$, one determines the best corrective action $u$. In that context, $c_i$, $q_i$, $W_i$, $b_i$, $H_i$ and $F_i$ ($i \in \{a, p\}$) are parameters defined according to the problem of interest. Note also that the uncertainty $y$ appears only on the right-hand side of the problems defining $Q_a$ and $Q_p$; this will be important for our convergence analysis.
At first sight, model (1)-(5) looks like a regular BOP, but it contains the main steps of Figure 1. $\Psi(\theta, x_t)$ represents a forecasting model that depends on both the vector of parameters $\theta$ and the features vector $x_t$ (possibly including lags of $y_t$). The quantity $\hat{y}_t$ is the forecast generated by the model for sample (or period) $t$ (comprising load and reserve requirements), conditioned on the features vector $x_t$ as defined in (2). The forecast is used to obtain a decision policy, $z_t^*(\hat{y}_t)$, by optimizing the decision-maker's planning cost function, $G_p(z, \hat{y}_t)$, in (3), hence the subscript $p$. Then, the optimized policy for each forecast is evaluated against the actual realization in the decision-maker's assessment (or adaptation) cost function, $G_a(z_t^*(\hat{y}_t), y_t)$, hence the subscript $a$, which is the ultimate (application-driven) objective of model (1)-(5).
The proposed formulation can be interpreted as an optimization over $\theta$ in a back-test because a fixed $\theta$ completely determines $G_a(z_t^*(\hat{y}_t), y_t)$. However, instead of comparing finitely many options, we explore a continuum of possibilities. Note that there is no coupling between two samples indexed by $t$; this will be used in the solution methods. Some additional structure is hidden for now in the constraints $z \in Z$ and $\theta \in \Theta$.
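To make the structure of (4)-(5) concrete, the block below spells out one hypothetical instance. The single-bus setting, the scalar decision $z$ (scheduled generation), the two corrective actions $u^{\mathrm{up}}$ and $u^{\mathrm{dn}}$ (shortage and surplus handling), and the nonnegative unit costs $q^{\mathrm{up}}$ and $q^{\mathrm{dn}}$ are all assumptions chosen for illustration; they are not taken from the paper's case studies.

```latex
% Hypothetical single-bus instance of the assessment cost (4)-(5), i = a.
\begin{align*}
G_a(z, y) &= c_a z + Q_a(z, y), \\
Q_a(z, y) &= \min_{u^{\mathrm{up}},\, u^{\mathrm{dn}}}
  \left\{ q^{\mathrm{up}} u^{\mathrm{up}} + q^{\mathrm{dn}} u^{\mathrm{dn}}
  \;\middle|\;
  u^{\mathrm{up}} \ge y - z,\;
  u^{\mathrm{dn}} \ge z - y,\;
  u^{\mathrm{up}} \ge 0,\; u^{\mathrm{dn}} \ge 0 \right\} \\
  &= q^{\mathrm{up}} \max(y - z, 0) + q^{\mathrm{dn}} \max(z - y, 0).
\end{align*}
% The same problem in the form W_a u >= b_a - H_a z + F_a y:
\[
q_a = \begin{pmatrix} q^{\mathrm{up}} \\ q^{\mathrm{dn}} \end{pmatrix},\quad
W_a = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix},\quad
b_a = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix},\quad
H_a = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix},\quad
F_a = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix}.
\]
```

In this instance the observation $y$ enters only through the right-hand side, as required, and an asymmetric choice such as $q^{\mathrm{up}} > q^{\mathrm{dn}}$ makes under-forecasting more expensive than over-forecasting, which is the kind of asymmetry a symmetric LS loss cannot see.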
One key difference from previous works (Donti et al. 2017, Elmachtoub and Grigas 2017, Muñoz et al. 2020) is that $G_a$ and $G_p$ can be different functions. This is extremely useful in the context of power systems operations where planning models might differ from real-time ones. Although model (1)-(5) is fairly general, we specialize to the case of linear programs and right-hand-side uncertainty, (4)-(5), because we will assume polyhedral structure for the set $Z$. This can be contrasted with previous works that considered strongly convex quadratic programs (Donti et al. 2017) and objective uncertainty (Elmachtoub and Grigas 2017). As mentioned earlier, this specialization will be important for developing our asymptotic convergence results and our solution methods.
4. Convergence Results
In this section, we discuss some conditions for convergence of estimators obtained with application-driven joint estimation and optimization. Again, our goal is to obtain the best possible forecast $\hat{y}_t$, but this is completely defined by the parameters $\theta$ since $x_t$ is known. Let $\theta_T$ be the optimal solution of (1)-(5) considering $T$ data points. We will show that $\theta_T$ converges to one element of the solution set of the actual stochastic formulation of the problem (as opposed to the previously presented sampled version). We will start by describing some assumptions, then we will state and prove the main theorem.
Assumption 1. The solution set of the optimization problem in (3) is a singleton for all possible values of $\hat{y}_t$.
In other words, the problem is always feasible and the solution is always unique. This is not as restrictive as it seems. The feasibility requirement is similar to the classical assumption of complete recourse in stochastic programming. The uniqueness requirement is equivalent to the absence of dual degeneracy in a linear program (Appa 2002). In case the problem in question is dual-degenerate, it is possible to eliminate this degeneracy by perturbing the objective function (in our case, the vectors $c_p$ and $q_p$) with small numbers that do not depend on the right-hand side (RHS) of the problem. Thus, the same perturbation is valid for all possible $\hat{y}_t$ (Megiddo and Chandrasekaran 1989). Another possibility would be resorting to some lexicographic simplex method (Terlaky and Zhang 1993).
In this setting, we can define the set-valued function:
$$\zeta(y) := \arg\min_{z \in Z} G_p(z, y). \tag{6}$$
From Böhm (1975) we know that if $\zeta(y)$ is a compact set for all $y$ then it is a continuous set-valued function. Moreover, since $\zeta(y)$ is a singleton for all possible values of $y$, we treat it as a function, which is continuous and piecewise affine (Borrelli et al. 2003).
Assumption 2. The feasibility set $Z$ that appears in (3) is a non-empty and bounded polyhedron.
Assumption 2 is reasonable since this is the set of implementable solutions of the decision-maker, typically representing physical quantities.
Assumption 3. The feasibility set of the dual of the problem that defines $Q_a(z, y)$ in (5) is non-empty and bounded.
Note that this set does not depend on $z$ and $y$, since they appear in the RHS of the primal problem. Again, this assumption is akin to a relatively complete recourse assumption applied to the problem defining the outer-level function.
We state now our main convergence result.
Theorem 1. Consider the process given by (1)-(5) and its output $\theta_T$. Suppose that (i) Assumptions 1, 2 and 3 hold, (ii) the forecasting function $\Psi(\cdot, \cdot)$ is continuous in both arguments, (iii) the data process $(X_1, Y_1), \dots, (X_T, Y_T)$ is independent and identically distributed (with $(X, Y)$ denoting a generic element), (iv) the random variable $Y$ is integrable, and (v) the set $\Theta$ is compact. Then,
$$\lim_{T \to \infty} d(\theta_T, S^*) = 0, \tag{7}$$
where $d$ is the Euclidean distance from a point to a set and $S^*$ is defined as
$$S^* = \arg\min_{\theta \in \Theta} \mathbb{E}\big[ G_a\big( \zeta(\Psi(\theta, X)), Y \big) \big], \tag{8}$$
with $\zeta(\cdot)$ defined in (6).
Proof: First, notice that $G_i(z, y)$, $i \in \{a, p\}$, is continuous with respect to its arguments as it is a sum of a linear function and the optimal value of a parametric program (Gal 2010). Next, we argue that $G_a(\zeta(\Psi(\theta, X)), Y)$ is integrable. Indeed, since $Z$ is bounded (Assumption 2), it follows that $\zeta(y)$ is bounded for all $y$ by a constant, say $K_1$, so that $\|\zeta(y)\| \le K_1$. By duality, $Q_i(z, y) = \max_\pi \{ (b_i - H_i z + F_i y)^\top \pi \mid W_i^\top \pi = q_i,\ \pi \ge 0 \}$, but by Assumption 3 the dual variables of $Q_a(z, y)$ are bounded by a constant, say $K_2$, so $\|\pi\| \le K_2$. Thus, by a sequence of applications of Cauchy-Schwarz and triangle inequalities, we have that
$$\begin{aligned}
\big| G_a(\zeta(\Psi(\theta, X)), Y) \big|
&\le \big| c_a^\top \zeta(\Psi(\theta, X)) + Q_a(\zeta(\Psi(\theta, X)), Y) \big| \\
&\le \|c_a\| \, \|\zeta(\Psi(\theta, X))\| + \big\| b_a - H_a \zeta(\Psi(\theta, X)) + F_a Y \big\| \max_{W_a^\top \pi = q_a,\ \pi \ge 0} \|\pi\| \\
&\le K_1 \|c_a\| + K_2 \big( \|b_a\| + \|H_a \zeta(\Psi(\theta, X))\| + \|F_a Y\| \big) \\
&\le K_1 \|c_a\| + K_2 \big( \|b_a\| + \|H_a\| K_1 + \|F_a\| \, \|Y\| \big).
\end{aligned}$$
Hence, since $Y$ is integrable (condition (iv) of the Theorem), we have that $G_a(\zeta(\Psi(\theta, X)), Y)$ is also integrable.
It follows that the conditions of Theorem 7.53 in Shapiro et al. (2014) are satisfied and we conclude that: (i) the function $\varphi(\theta) := \mathbb{E}[G_a(\zeta(\Psi(\theta, X)), Y)]$ is finite valued and continuous in $\theta$, (ii) by the Strong Law of Large Numbers, for any $\theta \in \Theta$ we have
$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} G_a\big( \zeta(\Psi(\theta, X_t)), Y_t \big) = \mathbb{E}\big[ G_a\big( \zeta(\Psi(\theta, X)), Y \big) \big] \quad \text{w.p.1}, \tag{9}$$
and (iii) the convergence in (9) is uniform in $\theta$. Thus, by Theorem 5.3 in Shapiro et al. (2014), since the set $\Theta$ is compact we have that the minimizers (over $\Theta$) of the expression on the left-hand side of (9), i.e., $\theta_T$, converge to the minimizers of the expression on the right-hand side in the sense of (7)-(8). Q.E.D.
Remark 1
Assumption 3 can be replaced by assuming a compact support of $Y$; in this case, $G_a(z, y)$ is a continuous function of both arguments, both of which are defined on compact sets; hence it attains a maximum and is trivially integrable.
5. Solution Methodology
In this section, we describe solution methods to estimate the forecasting model within the proposed application-driven closed-loop framework described in (1)-(5). First, we present an exact method based on an equivalent single-level mixed-integer linear programming (MILP) reformulation of the bilevel optimization problem (1)-(5). This method uses MILP-based linearization techniques to handle the Karush-Kuhn-Tucker (KKT) optimality conditions of the second level and thereby guarantees global optimality of the solution in exchange for limited scalability. Next, we describe how to use zero-order methods (Conn et al. 2009), which do not require gradients, to develop an efficient and scalable heuristic method that achieves high-quality solutions for larger instances. These methods leverage existing optimization solvers, their current implementations, and their features.
Our first approach consists of solving the bilevel problem (1)–(3) with standard techniques based on the KKT conditions of the second-level problem (Fortuny-Amat and McCarl 1981). The resulting single-level nonlinear equivalent formulation can be reformulated as a MILP and solved by standard commercial solvers. The conversion from KKT form to MIP form can be done by numerous techniques (Siddiqui and Gabriel 2013, Pereira et al. 2005, Fortuny-Amat and McCarl 1981), all of which have pros and cons. These techniques are implemented and automatically selected by the open-source package BilevelJuMP.jl (Garcia et al. 2021). This new package was conceived to allow users to formulate bilevel problems in JuMP (Dunning et al. 2017) and solve them with multiple off-the-shelf optimization solvers.

For the sake of completeness, we write the single-level nonlinear reformulation of the bilevel problem (1)–(5) in (10)–(14). For simplicity, in this model we assume that $Z = \{ z \mid Az \ge h \}$ and that $\Theta$ is polyhedral.
\begin{align}
\min_{\theta \in \Theta,\ \hat{y}_t,\ z^*_t,\ u_t,\ \pi_t,\ \mu_t} \quad & \frac{1}{T} \sum_{t \in T} \left[ c_a^\top z^*_t + Q_a(z^*_t, y_t) \right] \tag{10} \\
\text{s.t.} \quad & \forall t \in T: \nonumber \\
& \hat{y}_t = \Psi(\theta, x_t) \tag{11} \\
& W_p u_t + H_p z^*_t \ge b_p + F_p \hat{y}_t; \quad A z^*_t \ge h \tag{12} \\
& W_p^\top \pi_t = q_p; \quad H_p^\top \pi_t + A^\top \mu_t = c_p; \quad \pi_t, \mu_t \ge 0 \tag{13} \\
& \pi_t \perp W_p u_t + H_p z^*_t - b_p - F_p \hat{y}_t; \quad \mu_t \perp A z^*_t - h \tag{14}
\end{align}
Equations (10) and (11) are the same as (1) and (2), while (3) is replaced by (12)-(14): (12) are the primal feasibility constraints, (13) are the dual feasibility constraints, and (14) are the complementarity constraints.

The proposed class of methods makes extensive use of the reasoning described in Figure 1. In other words, the core algorithm decomposes the problem as follows:
Algorithm 1: Meta algorithm
Result: Optimized $\theta$
Initialize $\theta$;
while not converged do
    Update $\theta$;
    for $t \in T$ do
        Forecast: $\hat{y}_t \leftarrow \Psi(\theta, x_t)$;
        Plan Policy: $z^*_t \leftarrow \arg\min_{z \in Z} G_p(z, \hat{y}_t)$;
        Cost Assessment: $cost_t \leftarrow G_a(z^*_t, y_t)$
    end
    Compute cost: $cost(\theta) \leftarrow \sum_{t \in T} cost_t$
end

We call this method a meta-algorithm because a few steps are not fully specified, namely Initialization, Update, and Convergence check, allowing for a wide range of possible specifications.

Initialization can be as simple as setting $\theta$ to a vector of zeros, which might not be a good choice if the actual algorithm is a local search. One alternative, applied in the case study section, is to use traditional models, such as ordinary least squares (LS), as starting points. In the case study, we initialize the algorithm with the LS estimate; this guarantees that, in the training sample, the algorithm returns a cost no higher than that of the open-loop framework. There are many possibilities for the convergence test, for instance an iteration limit, a time limit, the variation of the objective function value, and other algorithm-specific tests. Finally, the update step depends on the concrete algorithm selected to minimize the non-trivial $cost(\theta)$ function. We focus on a derivative-free local search algorithm named Nelder-Mead (Nelder and Mead 1965). Nevertheless, the proposed meta-algorithm is general. For instance, gradient-based algorithms could also be developed based on numerical differentiation and automatic differentiation (Baydin et al. 2017). In this context, gradient calculation would enable the use of Gradient Descent and BFGS-like algorithms (Liu and Nocedal 1989).

The main features of the proposed meta-algorithm are: 1) it is suitable for parallel computing (the loop over the sample $T$ is intrinsically decoupled); 2) each step is based on a deterministic LP defining the second-level variables in (3), suitable for off-the-shelf commercial solvers that find globally optimal solutions in polynomial time; 3) each inner step can significantly benefit from warm-start processes developed in linear programming solvers (e.g., the dual simplex warm-start is extremely powerful, and often only a handful of iterations are needed, compared to the possibly thousands of iterations needed without a warm-start, cf.
Bixby and Martin (2000)).

It is worth emphasizing that feature 2) allows for an exact (always optimal) description of the second-level problem. Thus, in contrast to the approach adopted in Muñoz et al. (2020), where complementarity constraints are handled by an NLP-based heuristic to improve efficiency in the larger cases where the MILP approach fails, we keep the second level exact and face the challenge of optimizing a nonlinear problem on the upper level. As will be illustrated in our case study, this choice is supported by empirical evidence about the shape of the nonlinear objective function. Additionally, it is usual in more complex estimation processes (such as maximum likelihood-based methods) to rely on nonlinear optimization methods (Henningsen and Toomet 2011). Although not convex, as will be explored in our case study, the objective function exhibits properties that facilitate the search within the parameter domain.

One caveat is that variations in $\theta$ can lead to infeasible Policy Planning and Cost Assessment optimization problems. Consequently, we require complete recourse for such problems. In cases where this property does not hold, it is always possible to add artificial (slack) variables with high penalty costs in the objective function to keep the problem feasible. In the energy and reserve dispatch problem, this requirement is addressed by imbalance variables (load and renewable curtailment decisions).
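The structure of the meta-algorithm can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the planning and assessment LPs are replaced by closed-form toy functions (`plan`, `assess`), the data are synthetic, and a simple compass (pattern) search stands in for the Nelder-Mead update used in the paper. Only the loop structure (forecast, plan, assess, update) mirrors Algorithm 1.

```python
import random

def forecast(theta, x):
    """Affine forecast model Psi(theta, x) = theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * x

def plan(y_hat, capacity=10.0):
    """Toy planning policy: schedule the forecast, clipped to [0, capacity].
    Stands in for the LP z* = argmin_z G_p(z, y_hat)."""
    return min(max(y_hat, 0.0), capacity)

def assess(z, y, c=1.0, shed=8.0, spill=3.0):
    """Toy ex-post cost G_a(z, y): energy cost plus asymmetric imbalance penalties."""
    return c * z + shed * max(y - z, 0.0) + spill * max(z - y, 0.0)

def cost(theta, xs, ys):
    """Average assessed cost over the sample (the decoupled loop over t)."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += assess(plan(forecast(theta, x)), y)
    return total / len(xs)

def least_squares(xs, ys):
    """Closed-form simple linear regression, used as the starting point."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return [my - slope * mx, slope]

def compass_search(f, theta0, step=0.5, tol=1e-6, max_iter=10_000):
    """Derivative-free local search (NOT Nelder-Mead): try +/- step on each
    coordinate, accept improvements, halve the step when none is found."""
    theta, best = list(theta0), f(theta0)
    for _ in range(max_iter):
        if step < tol:  # convergence check
            break
        improved = False
        for i in range(len(theta)):
            for delta in (step, -step):
                trial = list(theta)
                trial[i] += delta
                value = f(trial)
                if value < best:
                    theta, best, improved = trial, value, True
        if not improved:
            step /= 2.0
    return theta, best

# Synthetic training sample (hypothetical data, for illustration only).
rng = random.Random(0)
xs = [rng.uniform(4.0, 8.0) for _ in range(200)]
ys = [max(0.0, 0.6 + 0.9 * x + rng.gauss(0.0, 0.5)) for x in xs]

theta_ls = least_squares(xs, ys)          # open-loop starting point
cost_ls = cost(theta_ls, xs, ys)
theta_cl, cost_cl = compass_search(lambda th: cost(th, xs, ys), theta_ls)
```

Because the search starts at the LS estimate and only accepts improvements, `cost_cl` cannot exceed `cost_ls` on the training sample, which is the guarantee discussed for the LS initialization.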
6. Closed-Loop Load Forecasting and Reserve Sizing
In this work, we focus on the energy and reserve scheduling problem of power systems (Kirschen and Strbac 2018, De Vos et al. 2019). In this problem, we aim to obtain the best joint conditional point-forecast for the vector of nodal demands, $\hat{D}_t$, and the vectors of up and down zonal or nodal reserve requirements, $\hat{R}^{(up)}_t$ and $\hat{R}^{(dn)}_t$. While the forecast vector of nodal loads represents, e.g., the next-hour operating point target that system operators and agents should comply with, the up- and down-reserve requirements represent a forecast of the system's resource availability (or security margins), defined per zone or node, allowing the system to withstand load deviations. Note that we can think of loads as a general net load that corresponds to load minus non-dispatchable generation.

The inputs of the problem are: vectors of historical data of dependent and explanatory variables, $\{y_t, x_t\}_{t \in T}$, including lags of demand, $D_{t-1}, \ldots, D_{t-k}$, and possibly other covariates such as climate and dummy variables; vectors of data associated with generating units: maximum generation capacity, $G$, dispatch costs or offers, $c$; maximum up- and down-reserve capacity, $\bar{r}^{(up)}$ and $\bar{r}^{(dn)}$; up- and down-reserve costs, $p^{(up)}$ and $p^{(dn)}$; load shedding and spillage penalty costs, $\lambda^{LS}$ and $\lambda^{SP}$; and network data comprising the vector of transmission line capacities, $F$, and the network sensitivity matrix, $B$, describing the network topology and physical laws of electric circuits.
Additionally, besides changes in the models $G_a$ and $G_p$, the input data describing the system characteristics can be provided under two perspectives: 1) the perspective of the actual ex-post (or assessed/implemented) operation, i.e., based on the observed demand data and the most accurate description of the system, for evaluating the function $G_a$ defined in (1); and 2) the ex-ante planning perspective, $G_p$, which is accounted for in (3) based on the known features, such as previous information and the system operator's description of the system considered in the dispatch tool. While the former has already been listed above, the latter uses the same symbols with a tilde above, i.e., $\tilde{c}, \tilde{p}^{(up)}, \tilde{p}^{(dn)}, \tilde{G}, \tilde{B}$, etc. For a simple matrix representation of the problem, we define $e$ to be a vector of ones with appropriate dimension. $M$ is an incidence matrix with buses in rows and generators in columns, whose entries are one when the generator lies in that bus and zero otherwise. Similarly, $N$ is an incidence matrix with generators in columns and reserve zones in rows, whose entries are one if the generator lies in that zone. Thus, we study the following particularization of the closed-loop framework proposed in (1)–(5):
\begin{align}
\min_{\substack{\theta_D,\ \theta_{R^{(up)}},\ \theta_{R^{(dn)}},\ \hat{D}_t,\ \hat{R}^{(up)}_t,\ \hat{R}^{(dn)}_t, \\ g_t,\ \delta^{LS}_t,\ \delta^{SP}_t,\ g^*_t,\ r^{(up)*}_t,\ r^{(dn)*}_t}} \quad & \frac{1}{T} \sum_{t \in T} \left[ c^\top g_t + p^{(up)\top} r^{(up)*}_t + p^{(dn)\top} r^{(dn)*}_t + \lambda^{LS} \delta^{LS}_t + \lambda^{SP} \delta^{SP}_t \right] \tag{15} \\
\text{s.t.} \quad & \forall t \in T: \nonumber \\
& \hat{D}_t = \Psi_D(\theta_D, x_t) \tag{16} \\
& \hat{R}^{(up)}_t = \Psi_{R^{(up)}}(\theta_{R^{(up)}}, x_t) \tag{17} \\
& \hat{R}^{(dn)}_t = \Psi_{R^{(dn)}}(\theta_{R^{(dn)}}, x_t) \tag{18} \\
& e^\top (M g_t - \delta^{SP}_t) = e^\top (D_t - \delta^{LS}_t) \tag{19} \\
& -F \le B (M g_t + \delta^{LS}_t - D_t - \delta^{SP}_t) \le F \tag{20} \\
& g^*_t - r^{(dn)*}_t \le g_t \le g^*_t + r^{(up)*}_t \tag{21} \\
& \delta^{LS}_t,\ \delta^{SP}_t,\ \hat{R}^{(up)}_t,\ \hat{R}^{(dn)}_t,\ g_t \ge 0 \tag{22} \\
& \left( g^*_t, r^{(up)*}_t, r^{(dn)*}_t \right) \in \arg\min_{\hat{g}_t,\ \hat{\delta}^{LS}_t,\ \hat{\delta}^{SP}_t,\ \hat{r}^{(up)}_t,\ \hat{r}^{(dn)}_t} \left[ \tilde{c}^\top \hat{g}_t + \tilde{p}^{(up)\top} \hat{r}^{(up)}_t + \tilde{p}^{(dn)\top} \hat{r}^{(dn)}_t + \tilde{\lambda}^{LS} \hat{\delta}^{LS}_t + \tilde{\lambda}^{SP} \hat{\delta}^{SP}_t \right] \tag{23} \\
& \qquad \text{s.t.} \quad e^\top (M \hat{g}_t - \hat{\delta}^{SP}_t) = e^\top (\hat{D}_t - \hat{\delta}^{LS}_t) \tag{24} \\
& \qquad\qquad -\tilde{F} \le \tilde{B} (M \hat{g}_t + \hat{\delta}^{LS}_t - \hat{D}_t - \hat{\delta}^{SP}_t) \le \tilde{F} \tag{25} \\
& \qquad\qquad N \hat{r}^{(up)}_t = \hat{R}^{(up)}_t \tag{26} \\
& \qquad\qquad N \hat{r}^{(dn)}_t = \hat{R}^{(dn)}_t \tag{27} \\
& \qquad\qquad \hat{g}_t + \hat{r}^{(up)}_t \le \tilde{G} \tag{28} \\
& \qquad\qquad \hat{g}_t - \hat{r}^{(dn)}_t \ge 0 \tag{29} \\
& \qquad\qquad \hat{r}^{(up)}_t \le \bar{r}^{(up)} \tag{30} \\
& \qquad\qquad \hat{r}^{(dn)}_t \le \bar{r}^{(dn)} \tag{31} \\
& \qquad\qquad \hat{g}_t,\ \hat{r}^{(up)}_t,\ \hat{r}^{(dn)}_t,\ \hat{\delta}^{LS}_t,\ \hat{\delta}^{SP}_t \ge 0. \tag{32}
\end{align}
In (15)–(32), the objective function of the upper-level problem (15) comprises the sum of the actual operating cost, the cost of scheduled reserves, and the implemented load shed and renewable spillage costs for all periods within the dataset, i.e., for $t \in T$. In the upper level, constraints (16)–(18) define the forecast model. Note that all periods are coupled by the vector of parameters $\theta = [\theta_D^\top, \theta_{R^{(up)}}^\top, \theta_{R^{(dn)}}^\top]^\top$, which does not depend on $t$. These parameters define the forecast model that will be applied to each $t$ for demand, as per $\hat{D}_t$ in (16), for up reserve requirements, as per $\hat{R}^{(up)}_t$ in (17), and for down reserve requirements, as per $\hat{R}^{(dn)}_t$ in (18). The forecast models are defined by functions $\Psi_D$, $\Psi_{R^{(up)}}$, and $\Psi_{R^{(dn)}}$ that transform parameters and historical data into load and reserve requirement forecasts. For the sake of simplicity and didactic purposes, in this work we assume affine regression models. The reserves are part of the forecast vector $\hat{y}_t$ because the method optimizes a model for them. However, historical reserve data does not need to be in $y_t$ because it does not appear in the model.
It could appear in regularization constraints, in any case.

Constraints (19)–(22), together with the objective function (15), particularize $G_a$ from (1). They assess the ex-post operating cost (first term, $c^\top g_t$, and the last two terms, $\lambda^{LS} \delta^{LS}_t + \lambda^{SP} \delta^{SP}_t$) of the actual dispatch given the ex-ante planned generation, $g^*_t$, and allocated up and down reserves, $r^{(up)*}_t$ and $r^{(dn)*}_t$, defined by the second level (23)–(32). Constraint (19) is the ex-post energy balance constraint, where total generation meets total observed load. The left-hand side of the constraint is the sum of generated energy in all buses, with $M g_t$ being the nodal generation injection vector (total generation per bus); $\delta^{SP}_t$ represents the nodal generation spilled per bus (positive load imbalance decision). The right-hand side of the constraint accounts for the net nodal load vector (observed net-demand vector $D_t$); $\delta^{LS}_t$ represents the decision vector of nodal load shed (negative load imbalance decision). Finally, constraint (20) limits the flow of energy through each transmission line to pre-defined bounds, and (21) limits the ex-post generation to the operating range defined by the ex-ante planned generation, $g^*_t$, and allocated up and down reserves, $r^{(up)*}_t$ and $r^{(dn)*}_t$. Constraint (22) ensures non-negativity of the slack, generation, and reserve requirement variables.

The planning policy defines variables $g^*_t$, $r^{(up)*}_t$, and $r^{(dn)*}_t$ under the conditional information available in vector $x_t$. These variables should respect the optimality of the market's or system operator's ex-ante scheduling (or planning policy), as per (23), based on the load forecast, $\hat{D}_t$, and reserve requirements, $\hat{R}^{(up)}_t$ and $\hat{R}^{(dn)}_t$, for the next period (e.g., hour). Again, it is relevant to note that this planning policy may differ from the actual ex-post implemented policy (based on the observed load data $D_t$) that results in the actual operating cost considered in the objective function of the first level (15).
Within this context, constraints (23)–(32) detail the second-level problem, which represents the optimization that takes place in the planning phase at each period to define a generation and reserve schedule for the next period. Thus, these constraints particularize the general model, $G_p$ in (3). In the proposed closed-loop framework, (23) is key. It allows us to define, within a problem that seeks the forecast model minimizing the ex-post operation cost, the objective of an ex-ante scheduling problem that minimizes energy and reserve costs for a conditional load forecast ($\hat{D}_t$) and reserve requirements ($\hat{R}^{(up)}_t$ and $\hat{R}^{(dn)}_t$). Constraints (24) and (25) are similar to (19) and (20). Expressions (26) and (27) ensure that the total reserve requirements ($\hat{R}^{(up)}_t$ and $\hat{R}^{(dn)}_t$, which are parameters for the lower-level problem) are allocated among generators in the form of up and down reserves ($\hat{r}^{(up)}_t$ and $\hat{r}^{(dn)}_t$, second-level decision vectors). Constraints (28) and (29) limit the scheduled generation and reserve range (up and down) to the generators' physical generation limits. Constraints (30) and (31) limit the maximum amount of reserves that can be allocated to each generating unit, and (32) ensures non-negativity of the generation and reserve variables of the second level.

This model is powerful because it tailors the forecasts to the energy and reserve dispatch application. While the lower level, (23)–(32), ensures an ex-ante generation schedule and reserve allocation compatible with the system operator's information level (best schedule given the previous hour conditional forecast for $t$), the upper level selects the parameters of the forecast model aiming to minimize the ex-post operating cost. In the following sections, we will compare a few variants of this problem by optimizing all or some of the parameters $\theta_D$, $\theta_{R^{(up)}}$, and $\theta_{R^{(dn)}}$.
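To make the ex-post assessment (15), (19), (21), (22) concrete, the following Python sketch evaluates the cost for a hypothetical system with a single bus and a single generator, so that the network constraint (20) plays no role. It is a simplification under stated assumptions, not the paper's implementation: the function name, argument names, and default cost values are illustrative, and the closed-form dispatch is valid only when $0 \le c < \lambda^{LS}$, $\lambda^{SP} \ge 0$, and $g^* - r^{(dn)*} \ge 0$.

```python
def ex_post_cost(d, g_plan, r_up, r_dn,
                 c=1.0, p_up=0.3, p_dn=0.3, lam_ls=8.0, lam_sp=3.0):
    """Ex-post cost for one period on a single-bus, single-generator toy system.

    d       observed demand D_t
    g_plan  ex-ante planned generation g*_t
    r_up    allocated up reserve r^(up)*_t
    r_dn    allocated down reserve r^(dn)*_t
    All default cost coefficients are hypothetical.
    """
    lo, hi = g_plan - r_dn, g_plan + r_up   # operating range from (21)
    # With 0 <= c < lam_ls and lam_sp >= 0, the cheapest feasible dispatch
    # tracks demand as closely as the reserve band allows.
    g = min(max(d, lo), hi)
    shed = max(d - g, 0.0)                  # load shed, delta^LS
    spill = max(g - d, 0.0)                 # spillage, delta^SP
    return c * g + p_up * r_up + p_dn * r_dn + lam_ls * shed + lam_sp * spill

# Demand inside the reserve band: no imbalance penalty is paid.
inside = ex_post_cost(d=6.0, g_plan=5.0, r_up=2.0, r_dn=1.0)
# Demand above the band: the up reserve is exhausted and load is shed.
above = ex_post_cost(d=9.0, g_plan=5.0, r_up=2.0, r_dn=1.0)
# Demand below the band: the generator cannot back down enough and spills.
below = ex_post_cost(d=3.0, g_plan=5.0, r_up=2.0, r_dn=1.0)
```

The three calls show why the reserve requirements matter for the upper-level cost: a wider band costs more to schedule but avoids the much larger load-shed penalty.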
7. Case studies
This section presents case studies to demonstrate the methodology's applicability and how the closed-loop framework can outperform the classic open-loop scheme in multiple variants of the load forecasting and reserve sizing problem defined in Section 6. First, we show that the Heuristic method of Section 5.2 can achieve close-to-optimal solutions in a fraction of the time required by the Exact method of Section 5.1. Second, we study the empirical properties of the estimated parameters and forecasts and contrast them with the classical least squares (LS) estimators. After that, we briefly explore how the method can estimate dynamic reserves. Moreover, we apply the method to highlight the relation between the load shed cost and the estimated parameters. Finally, we show that the heuristic algorithm finds good-quality locally optimal parameters that systematically outperform the LS open-loop benchmark for instances far larger than those solved in previously reported works tackling closed-loop bilevel frameworks. We used the same Dell notebook (Intel i7 8th Gen with 8 logical cores, 16 GB RAM) for all studies.

We will consider four power system datasets in the next sections. The first is a single bus system defined by us, with 1 zone, 1 load (with long-term average of 6) and 4 generators (with capacities 5 , , . , . , , , .
9, 1 . .
9, respectively. Deficit and generation curtailment costs were defined, respectively, as 8 and 3 times the most expensive generator cost. Note that the test systems contain at most a single load value on each bus. We used those values as the long-term averages of AR(1) processes in each bus. In these stochastic processes, the AR(1) coefficients were set to 0 . . 4. Negative demands were truncated to zero, although they could represent excess renewable generation. All generators were allowed to have up to 30% of their capacity allocated to reserves, and their reserve allocation costs were set to 30% of their nominal costs. We only considered the linear component of the generators' costs in all instances.
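The demand generation described above (an AR(1) process around a long-term average, truncated at zero) can be sketched in a few lines of Python. The function name and the parameter values used in the example call are hypothetical and chosen only so that the long-term average $\theta_0 / (1 - \theta_1)$ equals 6, as in the single bus system; they are not taken from the paper.

```python
import random

def simulate_demand(n, theta0, theta1, sigma, seed=0):
    """Simulate n steps of D_t = theta0 + theta1 * D_{t-1} + noise, truncated at 0.

    Requires |theta1| < 1 so that the long-run mean theta0 / (1 - theta1) exists.
    The noise is Gaussian with standard deviation sigma (an assumption).
    """
    rng = random.Random(seed)
    d = theta0 / (1.0 - theta1)          # start at the long-run average
    series = []
    for _ in range(n):
        d = theta0 + theta1 * d + rng.gauss(0.0, sigma)
        d = max(0.0, d)                  # negative demands are truncated to zero
        series.append(d)
    return series

# Hypothetical parameters with long-run average 0.6 / (1 - 0.9) = 6.
demand = simulate_demand(n=20_000, theta0=0.6, theta1=0.9, sigma=0.5)
sample_mean = sum(demand) / len(demand)
```

With these values truncation is rarely active, so the sample mean stays close to the long-term average; with a larger noise level the truncation would bias the mean upward.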
In most of the following sections, we consider simple forecasting models so that we can detail the experimental results clearly. Therefore, unless otherwise noted, the model for forecasting loads is
$$\hat{D}_t = \Psi_D(\theta_D, x_t) = \theta_D(0) + \theta_D(1) D_{t-1}, \tag{33}$$
that is, an autoregressive model of order one, AR(1), for demand, with no exogenous covariates. For the single bus case we set the "real", or population, values as $\theta_D(0)$ = 0 . and $\theta_D(1)$ = 0 . , so that $\theta_D(0) / (1 - \theta_D(1)) = 6$ is the long-term average defined above. The reserve models were set to an even simpler AR(0) model:
\begin{align}
\hat{R}^{(up)}_t &= \Psi_{R^{(up)}}(\theta_{R^{(up)}}, x_t) = \theta_{R^{(up)}}(0), \tag{34} \\
\hat{R}^{(dn)}_t &= \Psi_{R^{(dn)}}(\theta_{R^{(dn)}}, x_t) = \theta_{R^{(dn)}}(0). \tag{35}
\end{align}

In this first experiment, we compare the exact and the heuristic methods to check the quality of the latter on an instance for which the exact method is capable of reaching globally optimal solutions. To that end, we consider the single bus test system.

We started by solving instances with T ∈ { , , , }. All instances solved with the exact method converged within a gap lower than 0.1% using the Gurobi solver (Gurobi Optimization 2020) or stopped after two hours. The heuristic method was terminated when the objective function decreased by less than 10 − between consecutive iterations. We used the Nelder-Mead implementation found in Mogensen and Riseth (2018). To compare the results, we plotted the ratio of optimal costs in Figure 2 (a) and the time ratio in Figure 2 (b). We can observe that the heuristic method achieves high-quality solutions for almost all instances. Although the exact method is competitive for T ∈ { , }, the heuristic method is much faster, with average solve times of 4 . s for T = 50 and 5 . s for T = 75, compared to 1200 s and 6670 s for the Exact method.

Figure 2: Number of observations per sample described in the legend. (a) Objective of the Heuristic method divided by the objective of the Exact method for the same sample. The Heuristic solution is slightly above, except for the samples for which the Exact method returned very poor solutions due to the time limit. (b) Time to solve the same problem by the Heuristic method divided by the time spent by the Exact method.

Next, we further analyze the shape of the objective cost as a function of the estimated parameters to better understand how good the heuristic solutions can be. Given one sample of size 250, we fixed the demand autoregressive parameters to the LS estimates and plotted, in $\mathbb{R}^3$, the cost as a function of the reserve requirements. Two views of this function are presented in Figure 3 (a) and (b). Then, we plotted the cost as a function of the coefficients $\theta_D(0)$ and $\theta_D(1)$ of the demand forecasting model; in this case, the reserve requirements were fixed to the exogenous values of +/-1.96 standard deviations of the LS estimation of the load forecast. This is presented in two views in Figure 3 (c) and (d).

Both functions are reasonably well behaved: although possibly non-convex because of (3), they have a shape that is favorable to local search algorithms. This is a relevant feature supporting the choice of our heuristic approach, as previously described at the end of Section 5.2. We also note that taking averages is a smoothing operation (Shapiro et al. 2014); therefore, we can expect a better-behaved function as the sample size grows.
Now we focus only on the heuristic method to analyze how the estimates behave as the sample size varies. We will see that they are actually converging. Moreover, we show empirically that the closed-loop model is strictly better than the open-loop one provided we have a reasonable sample size. We will also see that a method with too many parameters might overfit for small sample sizes and not generalize well enough.

Figure 3: Training cost as a function of the optimization parameters. (a) Front view and (b) top view: fixed load AR coefficients, optimizing reserves $\theta_{R^{(up)}}(0)$, $\theta_{R^{(dn)}}(0)$. (c) Front view and (d) top view: fixed reserves, optimizing load AR coefficients $\theta_D(0)$, $\theta_D(1)$. One sample of size 250 was used to evaluate the function values on a grid with resolution 0.05 units.

From now on, we will use the following nomenclature and color code to refer to the different models:
• LS-Ex (red): This is the benchmark model representing the classical open-loop approach. It uses LS to estimate demand and an exogenous reserve requirement.
• LS-Opt (blue): This is a partially optimized model, where least squares is used to estimate demand and only the reserve requirements are optimized.
• Opt-Ex (yellow): This is also a partially optimized model, where demand is optimized whereas the reserve requirements are still exogenously defined. This model is not particularly meaningful in practice. We show it in some studies, mostly for completeness.
• Opt-Opt (green): This is the fully optimized model, where both the demand forecast and the reserve requirements are jointly optimized.

For didactic purposes, in all cases tested in this section, the exogenous up and down reserve requirements were defined as +/- 1.96 standard deviations, respectively, of the estimated residuals from the LS demand forecast. Note, however, that other exogenous ad hoc rules could be used (Ela et al. 2011b).

We empirically compare and analyze the convergence of the four demand and reserve requirement forecast models mentioned above. We varied the dataset size used in the estimation process from 50 to 1000 observations. For each sample size, we performed 100 trial estimations, with different samples generated from the same process, to study the convergence. To evaluate the out-of-sample performance of each one of the 100 estimates for each sample size, we used a single fixed dataset of 10,000 new observations to compute the objective function (generated with the same underlying process but different from all other data used in the estimation processes). In the following plots, lines represent mean values among the 100 estimated costs with the in-sample or out-of-sample data, and shaded areas represent the respective 10% and 90% quantiles.

The average operation cost within the estimation samples is presented in Figure 4. The vertical axis shows in-sample costs, while the horizontal axis shows the sample size used for the estimation procedure. The method that co-optimizes reserve requirements and demand forecasts finds a lower cost than the others. This is expected because this method has more degrees of freedom in the parameter estimation (it is a relaxed version of the others) and this is the objective function being minimized. Also as expected, the LS plus exogenous reserve requirement model finds higher costs than the others: it can be seen as a constrained version of the others and therefore does not allow for improvements by the local optimization method. The other two methods are always in between, and Opt-Ex is always below LS-Opt, which shows that demand forecasting might have a larger effect than reserve allocation in this test system.
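The open-loop LS-Ex benchmark described above (LS fit of the AR(1) demand model, followed by reserve requirements of 1.96 residual standard deviations) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, the demand series is a scalar (single bus), and the residual standard deviation uses the usual $n - 2$ degrees-of-freedom correction, which the paper does not specify.

```python
import math

def fit_ar1_least_squares(series):
    """LS fit of D_t = theta0 + theta1 * D_{t-1}; returns (theta0, theta1)."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    theta1 = sxy / sxx
    return my - theta1 * mx, theta1

def exogenous_reserves(series, z=1.96):
    """Up and down reserve requirements as z residual standard deviations."""
    theta0, theta1 = fit_ar1_least_squares(series)
    residuals = [y - (theta0 + theta1 * x)
                 for x, y in zip(series[:-1], series[1:])]
    # n - 2 degrees of freedom: two coefficients were estimated (assumption).
    sd = math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 2))
    return z * sd, z * sd   # (up requirement, down requirement)

# Small hypothetical demand series, for illustration only.
demand = [6.0, 6.4, 5.9, 6.8, 6.1, 5.5, 6.2, 6.6, 5.8, 6.3]
theta0, theta1 = fit_ar1_least_squares(demand)
r_up, r_dn = exogenous_reserves(demand)
```

In the closed-loop variants, these LS values serve only as the starting point: LS-Opt keeps `theta0` and `theta1` and re-optimizes the reserves, while Opt-Opt re-optimizes all of them.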
Figure 4: Average operation cost in the training sample (in-sample) versus sample size. [Horizontal axis: sample size; vertical axis: average operation cost (in-sample); series: LS-Ex, LS-Opt, Opt-Ex, Opt-Opt.]
Figures 5 (a) and (b) depict the same costs on the out-of-sample data. Hence, they measure how well the models generalize to data they have never seen before. We can see that the models allowing more parameters to be endogenously optimized perform much better than models with more exogenously defined forecast models. Thus, the application-driven learning framework works successfully on out-of-sample data when estimated with sample sizes larger than 150. However, we note that these steady improvements require more data than the classic exogenous models, as shown in Figure 5 (b). Between 50 and 120 samples, the model with more optimization flexibility, Opt-Opt, exhibits a larger cost variance. This is due to excessive optimization on a small dataset, which led to overfitting and poor generalization. Note that, in this work, we did not consider any regularization procedure to avoid this issue. However, our optimization-based framework is suitable for well-known shrinkage operators (Tibshirani 2011) that can be readily added to the objective function (1).
Figure 5: Out-of-sample average operation cost versus (in-sample) sample size. (a) Starting from sample size 200. (b) Sample size from 50 to 200. Lines represent the average of the 100 estimation trials. Shaded areas represent the 10% and 90% quantiles. All trials are evaluated on a single out-of-sample dataset of 10,000 observations. [Horizontal axis: sample size; vertical axis: operation cost (out-of-sample); series: LS-Ex, LS-Opt, Opt-Ex, Opt-Opt.]
Figures 6 and 7 show how the optimized parameters behave as functions of the estimation sample size. In Figure 6 we can see that the load model parameters are indeed converging to long-run values. The bias in those parameters is also clear: the constant term is greatly increased, while the autoregressive coefficient is slightly reduced. Ultimately this leads to a larger forecast value, which can be interpreted as the application's risk adjustment due to the asymmetric imbalance penalization costs (the load shed cost is much higher than the spillage cost). Thus, the Opt-Opt model balances these costs as well as possible, prioritizing the avoidance of load shed by increasing the forecast level. The fixed reserves model (Opt-Ex) is less biased because the fixed reserves constrain how much the load model can be biased, due to the risk of not having enough reserve to address lower demand realizations. Note that the red (LS-Ex) is on top of the purple (LS-Opt) since both use the same LS estimates for demand, which exhibit the lowest variance.
Figure 6: Demand coefficients versus sample size. (a) $\theta_D(0)$. (b) $\theta_D(1)$. Lines represent the average of the 100 estimation trials. Shaded areas represent the 10% and 90% quantiles. The models LS–Ex and LS–Opt coincide and are therefore presented as LS–(Ex/Opt). [Horizontal axis: sample size; series: LS-(Ex/Opt), Opt-Ex, Opt-Opt.]
In Figure 7, we see that the fully endogenous model greatly increases the downward reserve anddecreases the upward reserve, both consistent with the change in the demand forecast parameters.Closed-loop estimation of only reserves led to increased up reserves that are the most expensive toviolate, while downward reserves are mostly unaffected, this might be an artifact of the estimationmodel that uses the open-loop estimation as a starting point. The Opt-Opt model is limited in 3because that is the maximum reserve that can be allocated (30% of the generators capacity).To highlight the bias on load forecast we present, in Figure 8 (a), a histogram of deviations: error := realization − f orecast . Negative values mean that the forecast value was above the real-ization. The LS estimation leads to an unbiased estimator, seen in the red histogram centered on ias Garcia, Street, Homem-de-Mello, Mu˜noz Application-driven learning applied to demand and reserve forecast
200 400 600 800 10002.002.252.502.753.00
Sample size R e s e r v e D o w n (LS/Opt)-ExLS-OptOpt-Opt (a) Reserve down, θ R ( dn ) (0)
200 400 600 800 10001.251.501.752.002.252.502.75
Sample size R e s e r v e U p (LS/Opt)-ExLS-OptOpt-Opt (b) Reserve up, θ R ( up ) (0) Figure 7
Reserve coefficients versus sample size. Lines represent the average of the 100 estimation trials. Shadedareas represent the 10% and 90% quantiles. The models Ls–Ex and Opt–Ex coincide, thereby arepresented as LS/Opt–Ex. zero. On the other hand, the forecast from the fully endogenous model is clearly biased, as it con-sistently forecasts higher values than the realizations. This fact is corroborated by the cumulativedistribution functions displayed in Figure 8 (b). -4 -2 0 2 40.000.050.100.15
[Figure 8, panels: (a) histogram of the demand forecast error (probability) for LS–(Ex/Opt) and Opt–Opt; two histograms are shown, and the third color is their intersection (LS–Ex and LS–Opt coincide). (b) Accumulated probability of the demand forecast error for LS–(Ex/Opt), Opt–Ex, and Opt–Opt; out of the four models, three are shown (LS–Ex and LS–Opt coincide).]
Figure 8. Forecast error (observation − forecast) in a histogram, comparing the fully optimized model with least squares estimation. Negative values mean that the forecast was larger than the actual realization.
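The sign convention of the forecast error can be illustrated with a small sketch. It is not the paper's experiment: the AR(1) coefficients, the noise level, and the 0.5 upward shift of the intercept are hypothetical values chosen only to show that a least squares fit yields errors centered on zero, while a forecast shifted upward yields mostly negative errors.

```python
import numpy as np

def simulate_ar1(n, theta0, theta1, sigma, rng):
    """Simulate D_t = theta0 + theta1 * D_{t-1} + noise (illustrative values only)."""
    d = np.empty(n)
    d[0] = theta0 / (1.0 - theta1)  # start at the steady-state level
    for t in range(1, n):
        d[t] = theta0 + theta1 * d[t - 1] + rng.normal(0.0, sigma)
    return d

def fit_ar1_least_squares(d):
    """Least squares estimate of (theta0, theta1) from one demand series."""
    X = np.column_stack([np.ones(len(d) - 1), d[:-1]])
    coef, *_ = np.linalg.lstsq(X, d[1:], rcond=None)
    return coef

def forecast_errors(d, theta0, theta1):
    """error := realization - forecast, so negative means over-forecasting."""
    forecast = theta0 + theta1 * d[:-1]
    return d[1:] - forecast

def empirical_cdf(errors, x):
    """Fraction of errors less than or equal to x."""
    return float(np.mean(errors <= x))

rng = np.random.default_rng(0)
demand = simulate_ar1(5000, theta0=2.0, theta1=0.8, sigma=1.0, rng=rng)
th0, th1 = fit_ar1_least_squares(demand)

ls_err = forecast_errors(demand, th0, th1)            # centered on zero
biased_err = forecast_errors(demand, th0 + 0.5, th1)  # hypothetical upward shift

print("mean LS error:     ", ls_err.mean())
print("mean biased error: ", biased_err.mean())
print("P(error <= 0), LS / biased:",
      empirical_cdf(ls_err, 0.0), empirical_cdf(biased_err, 0.0))
```

In this sketch the shifted forecast moves the whole error distribution to the left, which is the qualitative pattern described for the fully endogenous model.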
This experiment aims to show that it is also possible to consider dynamic reserves within the proposed scheme, i.e., reserves conditioned on external information that is dynamically revealed to the system operator. Some works in the literature consider the load to be heteroscedastic (Van der Meer et al. 2018). Hence, we study here a simple formulation of a demand time series with time-varying variance: we consider an exogenous variable that follows an autoregressive process of order one (36), and we assume that the variance of the real demand depends linearly on this new variable (37).

$$E_t = \phi(0) + \phi(1) E_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma_E), \tag{36}$$
$$D_t = \theta_D(0) + \theta_D(1) D_{t-1} + \epsilon_t, \quad \epsilon_t \sim N(0, E_t), \tag{37}$$

We did not modify the demand forecast model (33), but we allowed for dynamic reserve sizing; that is, the reserve varies with time, with a linear dependence on the exogenous variable:

$$\hat{R}^{(up)}_t = \Psi_{R^{(up)}}(\theta_{R^{(up)}}, x_t) = \theta_{R^{(up)}}(0) + \theta_{R^{(up)}}(1) E_t, \tag{38}$$
$$\hat{R}^{(dn)}_t = \Psi_{R^{(dn)}}(\theta_{R^{(dn)}}, x_t) = \theta_{R^{(dn)}}(0) + \theta_{R^{(dn)}}(1) E_t. \tag{39}$$

The results of this experiment are depicted in Figure 9. We refer to the model from the previous sections as the static reserve model, Figure 9 (a), and to the model defined here as the dynamic reserve model, Figure 9 (b). The system cost was clearly reduced by considering dynamic reserves for both the Opt-Opt and LS-Opt models.

[Figure 9, panels: (a) static reserve and (b) dynamic reserve; each shows the out-of-sample operation cost versus sample size for the LS-Ex, LS-Opt, and Opt-Opt models.]
Figure 9. Variance driven by exogenous variable. Lines represent the average of the 100 estimation trials. Shaded areas represent the 10% and 90% quantiles.

This section aims to highlight the dependency of the estimated parameters on the deficit cost, which is the largest violation penalty in this problem. We varied the deficit cost between 15 and 100 and estimated the parameters with a single sample of 1,000 observations. Figure 10 depicts the up and down reserves and the steady-state demand, $\theta_D(0) / (1 - \theta_D(1))$. We added the solid red line, Demand LS, to represent the steady-state demand, and the dashed red line, Reserve Ex, to show the reserve requirement obtained from the residuals of the LS estimation model. As these two values are calculated exogenously, they do not vary with the load-shed cost. The solid green line, Demand Opt–Opt, shows an increasing bias as the load-shed cost grows, corroborating the expected behavior. The dashed green line also shows that the best response is to increase reserve levels as the load-shed cost grows. The dashed blue lines are the reserve margins around the solid LS-based demand (red line), and they are also clearly affected by the load-shed cost.
[Figure 10: energy versus deficit cost for the series Demand Opt-Opt, Reserve Opt-Opt, Demand LS, Reserve Ex, and Reserve LS-Opt.]
Figure 10. Estimated parameters as a function of the deficit cost.
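The direction of this dependency can be illustrated with a much simpler problem than the bilevel model used in the paper. The sketch below assumes a single-period, newsvendor-style setting: a forecast error e ~ N(0, sigma^2), a per-unit deficit cost when demand exceeds the forecast, and a hypothetical per-unit surplus cost otherwise. The surplus cost, sigma, and the function name are assumptions made for illustration only. Under these assumptions the cost-minimizing bias is the quantile of the error distribution at the critical ratio deficit / (deficit + surplus), so it grows with the deficit cost.

```python
from statistics import NormalDist

def optimal_forecast_bias(deficit_cost, surplus_cost, sigma):
    """Bias b minimizing E[deficit_cost * max(e - b, 0) + surplus_cost * max(b - e, 0)]
    for e ~ N(0, sigma^2). Newsvendor-style simplification, not the paper's model."""
    critical_ratio = deficit_cost / (deficit_cost + surplus_cost)
    return NormalDist(mu=0.0, sigma=sigma).inv_cdf(critical_ratio)

SURPLUS_COST = 5.0  # hypothetical cost of over-forecasting
SIGMA = 1.0         # hypothetical standard deviation of the forecast error

deficit_costs = [15, 25, 50, 75, 100]  # same range as the experiment above
biases = [optimal_forecast_bias(c, SURPLUS_COST, SIGMA) for c in deficit_costs]

for cost, bias in zip(deficit_costs, biases):
    print(f"deficit cost {cost:>3}: optimal upward bias {bias:.3f}")
```

The bias is positive and increasing over the whole range, which agrees qualitatively with the Demand Opt–Opt curve; the magnitudes are not comparable because reserves, network constraints, and the other penalties are ignored here.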
In this final experiment, we apply the closed-loop estimation methodology to larger power system networks. As in the previous section, we focus our attention on three models: LS-Ex, LS-Opt, and Opt-Opt. Our primary goal in this case study is to demonstrate that the methodology can be applied to realistic power systems and that it is possible to obtain high-quality solutions that significantly improve on the standard procedure. All results are reported in Table 1.

First, both closed-loop models consistently outperform the benchmark (LS-Ex), with roughly 4% improvement in the out-of-sample evaluation of the 24-bus test system. There is not much value in estimating the demand forecast endogenously in this system, although the improvement from endogenous reserve sizing is evident. Second, in the 118-bus system, we see the same pattern again: a consistent reduction of the cost when relying on the closed-loop models. This time the improvement is between 4% and 5%, with both closed-loop models performing similarly. Finally, in the 300-bus system, we see consistent improvements once more. Here, however, we obtain an improvement of about 5% by endogenously sizing reserves and of about 11% by jointly estimating the closed-loop demand forecast and reserve size. These results come at the expense of increased computational cost, but the computation times remain reasonable, showing that the method can be used in real-world power systems.
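The relative improvements can be recomputed from the mean out-of-sample (test) costs reported in Table 1. The short sketch below hard-codes those means and assumes that the improvement is measured relative to the LS-Ex benchmark; the dictionary and function names are illustrative.

```python
# Mean test costs ($) copied from Table 1.
TEST_COST = {
    24: {"LS-Ex": 414.70, "LS-Opt": 398.48, "Opt-Opt": 398.20},
    118: {"LS-Ex": 2956.01, "LS-Opt": 2829.86, "Opt-Opt": 2815.85},
    300: {"LS-Ex": 7697.25, "LS-Opt": 7329.44, "Opt-Opt": 6820.47},
}

def relative_improvement(benchmark, cost):
    """Percentage cost reduction with respect to the benchmark."""
    return 100.0 * (benchmark - cost) / benchmark

improvements = {
    system: {
        model: relative_improvement(costs["LS-Ex"], cost)
        for model, cost in costs.items()
        if model != "LS-Ex"
    }
    for system, costs in TEST_COST.items()
}

for system, by_model in improvements.items():
    for model, pct in by_model.items():
        print(f"{system:>3}-bus, {model:<7}: {pct:.1f}% below LS-Ex")
```

For the 300-bus system this calculation gives the 4.8% and 11.4% figures quoted in the conclusions.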
Table 1. 24-, 118-, and 300-bus systems.

System  Model     Test Cost ($)        Train Cost ($)       Train Time (s)
                  Mean       Std       Mean       Std       Mean       Std
24      LS-Ex      414.70     1.14      397.01     4.39       0.00      0.00
24      LS-Opt     398.48     0.69      378.49    93.00     450.81     90.35
24      Opt-Opt    398.20     1.04      376.35    14.04     643.89     11.16
118     LS-Ex     2956.01    11.33     3163.74     3.78       0.00      0.00
118     LS-Opt    2829.86     4.20     3041.28     4.86     639.87      4.97
118     Opt-Opt   2815.85     4.57     3029.93    13.20     911.43     12.37
300     LS-Ex     7697.25    40.78     7646.84    26.09       0.00      0.00
300     LS-Opt    7329.44    36.62     7278.88    18.75     803.15     34.08
300     Opt-Opt   6820.47    37.67     6787.65   294.53    2748.17    303.56

8. Conclusions

We presented a general application-driven framework to jointly estimate the parameters of a load and reserve requirements forecast model in a closed-loop fashion. A mathematical framework is proposed following the ideas of bilevel optimization. Asymptotic convergence is demonstrated and two solution techniques are presented. The proposed method is contrasted with the classical sequential open-loop procedure, where the forecast models are estimated based on least squares and then used in the decision-making process.

The reported numerical experience highlights the following main empirical results and insights:
1. There exists an optimal bias in the load forecast that maximizes the long-run performance of a system or market operator. Moreover, the optimal bias in the load forecast is not disconnected from the optimal reserve requirements (reserve sizing problem).
2. Reserve sizing is intrinsically dependent on the load-shed cost and on the system's characteristics, and it can be optimally determined by our framework even in the case where traditional methods exogenously estimate the demand forecast.
3. Our model can endogenously define the optimal reserve sizing across the network by defining zonal reserve requirements that will best perform given the system operator's description of the network.
4. We show for realistic test systems, e.g., the IEEE 300-bus system, that a model estimating only reserves can reduce the long-run operation cost by 4.8%, and that the reduction reaches 11.4% when load and reserve requirement forecasts are co-optimized.
5. We show that the proposed heuristic solution method can provide high-quality solutions in reasonable computational time. This is mostly due to the selected approach, which allows for decomposing the estimation problem per observation and solving the second-level problem to global optimality in polynomial time. Additionally, it leverages mature linear-programming-based warm-start technologies and algorithms to scale up the performance in larger instances.

This pattern is consistently observed in all test systems, corroborating the proposed framework's effectiveness in finding improved estimates for both load and reserve requirements.
References
Anstine L, Burke R, Casey J, Holgate R, John R, Stewart H (1963) Application of probability methods to the determination of spinning reserve requirements for the pennsylvania-new jersey-maryland interconnection. IEEE Transactions on Power Apparatus and Systems
Journal of the Operational Research Society
Mathematical Programming Computation
arXiv preprint arXiv:1908.02788.
Bard JF (2013) Practical bilevel optimization: algorithms and applications, volume 30 (Springer Science & Business Media).
Baydin AG, Pearlmutter BA, Radul AA, Siskind JM (2017) Automatic differentiation in machine learning: a survey. The Journal of Machine Learning Research
International Journal of Neural Systems
Management Science.
Bixby RE, Martin A (2000) Parallelizing the dual simplex method. INFORMS Journal on Computing
SIAM Journal on Applied Mathematics
Journal of optimization theory and applications
Bucksteeg M, Niesen L, Weber C (2016) Impacts of dynamic probabilistic reserve sizing techniques on reserve requirements and system costs. IEEE Transactions on Sustainable Energy.
Chen Y, Gribik P, Gardner J (2014) Incorporating post zonal reserve deployment transmission constraints into energy and ancillary service co-optimization. IEEE Transactions on Power Systems http://dx.doi.org/10.1109/TPWRS.2013.2284791.
Conn AR, Scheinberg K, Vicente LN (2009) Introduction to derivative-free optimization (SIAM).
De Vos K, Stevens N, Devolder O, Papavasiliou A, Hebb B, Matthys-Donnadieu J (2019) Dynamic dimensioning approach for operating reserves: Proof of concept in Belgium. Energy Policy http://dx.doi.org/10.1016/j.enpol.2018.09.031.
Dempe S (2018) Bilevel optimization: theory, algorithms and applications (TU Bergakademie Freiberg, Fakultät für Mathematik und Informatik).
Donti P, Amos B, Kolter JZ (2017) Task-based end-to-end model learning in stochastic optimization. Advances in Neural Information Processing Systems, 5484–5494.
Dunning I, Huchette J, Lubin M (2017) Jump: A modeling language for mathematical optimization. SIAM Review
Contract (August):1–103, URL .
Elmachtoub AN, Grigas P (2017) Smart "predict, then optimize". arXiv preprint arXiv:1710.08005.
Journal of the operational Research Society
International Conference on Machine Learning, 1568–1577 (PMLR).
Gade D, Hackebeil G, Ryan SM, Watson JP, Wets RJB, Woodruff DL (2016) Obtaining lower bounds from the progressive hedging algorithm for stochastic mixed-integer programs. Mathematical Programming
Postoptimal Analyses, Parametric Programming, and Related Topics: degeneracy, multicriteria decision making, redundancy (Walter de Gruyter).
Garcia JD, Bodin G, and contributors (2021) joaquimg/bileveljump.jl: v0.4.1. URL http://dx.doi.org/10.5281/zenodo.4556393.
Garcia R, Gençay R (2000) Pricing and hedging derivative securities with neural networks and a homogeneity hint. Journal of Econometrics.
Henningsen A, Toomet O (2011) maxlik: A package for maximum likelihood estimation in r. Computational Statistics
IEEE Transactions on Sustainable Energy
International Journal of Forecasting
Advances in Neural Information Processing Systems, 889–897.
Kazempour J, Pinson P, Hobbs BF (2018) A stochastic market design with revenue adequacy and cost recovery by scenario: Benefits and costs. IEEE Transactions on Power Systems
Fundamentals of power system economics (John Wiley & Sons).
Knueven B, Ostrowski J, Watson JP (2020) On mixed-integer programming formulations for the unit commitment problem. INFORMS Journal on Computing
IEEE Transactions on Smart Grid
Mathematical programming
IEEE transactions on power systems
ε-perturbation method for avoiding degeneracy. Operations Research Letters
Journal of Open Source Software
arXiv preprint arXiv:2008.01500.
Nelder JA, Mead R (1965) A simplex method for function minimization. The computer journal
IEEE Transactions on Sustainable Energy
Operations Research
IEEE Transactions on Power Systems
IEEE Transactions on Power Systems
https://pjm.com/-/media/documents/manuals/archive/m11/m11v97-energy-and-ancillary-services-market-operations-07-26-2018.ashx.
Rockafellar RT, Uryasev S, Zabarankin M (2008) Risk tuning with generalized linear regression. Mathematics of Operations Research
SIAM Journal on Optimization
INFORMS Journal on Optimization (submitted).
Shapiro A, Dentcheva D, Ruszczyński A (2014) Lectures on stochastic programming: modeling and theory (SIAM), 2nd edition.
Siddiqui S, Gabriel SA (2013) An sos1-based approach for solving mpecs with a natural gas market application. Networks and Spatial Economics
Electric Power Systems Research
Wiley Interdisciplinary Reviews: Energy and Environment
Annals of Operations Research
https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32017R1485&from=EN.
Tibshirani R (2011) Regression shrinkage and selection via the lasso: a retrospective. Journal of the Royal Statistical Society: Series B (Statistical Methodology)
Renewable and Sustainable Energy Reviews
Varian H (1975) A bayesian approach to real estate assessment. Fienberg SE, Zellner A, eds., Studies in Bayesian Econometrics and Statistics in Honor of Leonard J. Savage (Amsterdam: North-Holland).
Wang B, Hobbs BF (2014) A flexible ramping product: Can it help real-time dispatch markets approach the stochastic dispatch ideal? Electric Power Systems Research
IEEE Transactions on Power Systems
Journal of the American Statistical Association
Economics Letters