A Heuristic for Dynamic Output Predictive Control Design for Uncertain Nonlinear Systems
Mazen Alamir
Abstract—In this paper, a simple heuristic is proposed for the design of uncertainty-aware predictive controllers for nonlinear models involving uncertain parameters. The method relies on Machine Learning-based approximation of ideal deterministic MPC solutions with perfectly known parameters. An efficient construction of the learning data set from these off-line solutions is proposed, in which each solution provides many samples in the learning data. This enables a drastic reduction of the required number of Non Linear Programming problems to be solved off-line while explicitly exploiting the statistics of the parameters' dispersion. The learning data is then used to design a fast on-line output dynamic feedback that explicitly incorporates information about the statistics of the parameters' dispersion. An example is provided to illustrate the efficiency and the relevance of the proposed framework. It is in particular shown that the proposed solution recovers up to 78% of the expected advantage of having a perfect knowledge of the parameters compared to a nominal design.
Index Terms—Machine Learning, Nonlinear Systems, Output Feedback, Stochastic NMPC.
I. INTRODUCTION
Over the last two decades, Nonlinear Model Predictive Control (NMPC) [13] has spread across all domains and become a first choice when it comes to designing control feedbacks for systems that admit faithful models. This is due to the maturity of its stability assessment as well as to the availability of user-friendly solvers [4]. As a direct consequence of this success, investigations started regarding the best way to extend the use of NMPC to systems that are represented through uncertain models.

After some early attempts involving overly stringent robust NMPC frameworks [11], robust adaptive MPC frameworks have been proposed [2], [7] to provide on-line model adaptation for uncertain systems under the assumptions of affine parameters, strict identifiability, persistent excitation and full state measurement. More recently, the concept of Stochastic NMPC (SNMPC) [12], [14] emerged. In a nutshell, SNMPC frameworks implement the basic NMPC scheme with the standard deterministic cost and constraints being replaced by their expected values, considered as functions of the model's uncertainties. While the conceptual simplicity of this shift in paradigm is appealing, its consequence on the computational burden appears as a deal breaker despite some recent attempts [3], [9], [16]. Indeed, SNMPC requires the on-line computation of the expectation of nonlinear functions
* This work was supported by the MIAI @ Grenoble Alpes under Grant ANR-19-P3IA-0003. The author is with the University of Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France (e-mail: [email protected]).

of several variables. This computation is to be performed for any value of the decision variable that is encountered during the on-line iterations of the optimization algorithm. This complexity justified an increasing interest in solutions based on off-line computation.

One of these is related to Stochastic Dynamic Programming (SDP) / Reinforcement Learning (RL) alternatives [5], [14], which offer elegant solutions for small-size problems. Another research track is related to the approximation of the appropriate optimal feedback/strategy via Deep/Machine Learning-based fitting schemes (see [8], [10] and the references therein). More precisely, a cloud of initial states is created and, for each element in the cloud, a multi-stage SNMPC problem is solved (this requires a finite and generally small number of possible uncertainty values to be allowed). Then a neural network is trained via deep learning in order to fit a function that maps the initial state to the stochastically optimal feedback control and/or strategy. This approach leads to an extremely heavy off-line computational burden without providing theoretical guarantees, not only because of the inherent and unavoidable data incompleteness, which is an issue shared by all data-driven solutions (including the one proposed in this paper), but also because all these schemes mainly use the multi-stage SNMPC framework, which needs the uncertainty realizations to be limited to a small number of possible values, which obviously scarcely fits real-life situations. Consequently, these approaches should be viewed as smart and reasonably tractable heuristics to address a problem that admits no rigorously computable solution.

It is not the intention of this paper to propose a better framework than the ones cited above and the references therein.
This is because the comparison between heuristics is a complex multi-criteria process that would go beyond the scope of this short contribution. In particular, while the scheme proposed hereafter is appealing from a computational point of view, its scope of application is limited to nonlinear models with fixed (or slowly varying) uncertain parameters as the main source of uncertainty. In this particular but highly relevant case, the following intuition motivates this paper:

Under specific circumstances, the deterministic optimal solution that would have been computed, should the parameters be known, can be learned from the previous measurements over some observation window.

In the ideal case, this statement results from a perfect identifiability of the pair of state and parameter vectors, leading to the optimal MPC control being an exact function of the previous measurements $y_k^{(-)}$:

$$\hat{u}^\star\big(y_k^{(-)}\big) = \arg\min_{\mathbf{u}\in\mathbb{U}^N} J\big(\mathbf{u}\,|\,\hat{x}(y_k^{(-)}),\, \hat{w}(y_k^{(-)})\big) \qquad (1)$$

making $y_k^{(-)}$ and $\hat{u}_k^\star$ ideal candidates to be viewed respectively as a features vector and a label in a Machine Learning (ML) identification step. When extended observability does not hold, the above relationship can be used with the understanding that $\hat{x}(y_k^{(-)})$ and $\hat{w}(y_k^{(-)})$ are non-uniquely determined quantities, since for a given instance of the measurement profile $y_k^{(-)}$ there is a cloud of possible values of $(\hat{x}, \hat{w})$ and hence a cloud of possibly indistinguishable optimal control inputs $\hat{u}^\star$. The data generation using the statistics of the dispersion of $w$ and the following ML identification step help making a statistically rational choice among the elements of this cloud.

Following the above lines, this paper proposes a framework for the design of an uncertainty-aware output feedback law which is fitted from learning data that is constructed in a computationally efficient manner.
The latter involves two main differences with standard approaches: 1) the off-line computation involves only deterministic optimal control problems and can therefore directly benefit from the state-of-the-art available NMPC solvers; 2) each single solution is used to extract multiple instances (that can be as high as hundreds) to be included in the learning data set. This is done by exploring the open-loop optimal trajectories and concatenating the sub-optimal instances of $(y_{k+i}^{(-)}, u_{k+i}^\star(y_k^{(-)}))$ rather than collecting a single pair.

This paper is organized as follows: Section II presents some definitions and notation and states precisely the problem under study. The proposed framework is detailed in Section III together with the working assumptions that might induce its success. An illustrative example is proposed in Section IV in order to show the steps of the framework and evaluate the impact of parameter choices on the quality of the results. Finally, Section V concludes the paper and gives some hints for further investigations.

II. DEFINITIONS, NOTATION AND PROBLEM STATEMENT
Let us consider nonlinear systems governed by the following discrete-time dynamics:

$$x_{k+1} = f(x_k, u_k, w) \qquad (2)$$
$$y_k = h(x_k, u_k, w) \qquad (3)$$

where (2) and (3) describe the dynamics and the measurement equation respectively. The notation $x_k\in\mathbb{R}^{n_x}$, $u_k\in\mathbb{U}\subset\mathbb{R}^{n_u}$ and $y_k\in\mathbb{R}^{n_y}$ stands for the state, input and measured output vectors at instant $k$. The vector of parameters $w\in\mathbb{R}^{n_w}$ is supposed to be constant although uncertain. Moreover, it is assumed that $w$ belongs to some bounded subset $\mathbb{W}$ over which a probability density function (pdf) $\mathcal{W}$ is known, so that a statistically relevant sample of $w$ can be drawn if required. It is also assumed that a set of states $\mathbb{X}$ of interest is considered with some random sampling rule which reflects some form of relative importance or relevance. In what follows, boldfaced notation refers to signal profiles over some past or future time window; in particular the notation $\mathbf{u} := (u_0^T, u_1^T, \ldots, u_{N-1}^T)^T \in \mathbb{U}^N$ denotes a sequence of future control actions over some prediction horizon of length $N$, while $y_k^{(-)}$ refers to the sequence of $M+1$ previous measurement vectors, namely:

$$y_k^{(-)} := \begin{bmatrix} y_k \\ y_{k-1} \\ \vdots \\ y_{k-M} \end{bmatrix} \in \big[\mathbb{R}^{n_y}\big]^{M+1} \qquad (4)$$

The notation

$$\mathbf{x}^{(\mathbf{u})}(x_0, w) = \begin{bmatrix} x_0^{(\mathbf{u})}(x_0, w) \\ x_1^{(\mathbf{u})}(x_0, w) \\ \vdots \\ x_N^{(\mathbf{u})}(x_0, w) \end{bmatrix} \in \big[\mathbb{R}^{n_x}\big]^{N+1}$$

denotes the state trajectory under (2) when the control profile $\mathbf{u}$ and the parameter vector $w$ are used, namely:

$$x_0^{(\mathbf{u})}(x_0, w) = x_0 \ ; \qquad x_{k+1}^{(\mathbf{u})}(x_0, w) = f\big(x_k^{(\mathbf{u})}(x_0, w), u_k, w\big)$$

It is assumed that the control objective would be defined through a cost function of the form

$$J(\mathbf{u}\,|\,x_k, w) := \sum_{i=1}^{N} \ell_i\big(x_i^{(\mathbf{u})}(x_k, w), u_{i-1}\big) \qquad (5)$$

should the current state $x_k$ and the parameter vector $w$ be known.
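A minimal numerical sketch of the open-loop simulation of (2) and the evaluation of the cost (5) is given below; the dynamics `f`, stage cost `ell` and the toy numbers are illustrative assumptions, not part of the paper.

```python
import numpy as np

def rollout_cost(f, ell, u_seq, x0, w):
    """Simulate x_{k+1} = f(x_k, u_k, w) and accumulate the stage
    costs of (5): J = sum_i ell(x_i, u_{i-1})."""
    x = np.asarray(x0, dtype=float)
    J = 0.0
    for u in u_seq:          # u_seq = (u_0, ..., u_{N-1})
        x = f(x, u, w)       # successor state x_i
        J += ell(x, u)       # stage cost ell(x_i, u_{i-1})
    return J

# Toy instance (assumed): scalar linear dynamics, quadratic stage cost.
f = lambda x, u, w: w * x + u
ell = lambda x, u: float(x**2 + 0.1 * u**2)
J = rollout_cost(f, ell, [0.0, 0.0, 0.0], 1.0, 0.9)
```

For this toy instance the cost simply sums the squared states along the open-loop trajectory.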
The design problem that is addressed in the present paper can be shortly stated as follows:

Design a dynamic output feedback of the form $u_k := K(y_k^{(-)})$ that associates to each past measurement profile $y_k^{(-)}$ an associated input, in such a way that incorporates the presumably known statistics $\mathcal{W}$ and that is oriented towards the optimization of the cost function $J$.

(Uncertainty-aware dynamic output feedback)

REMARK: For the sake of simplicity of exposition, it is assumed that all the constraints, except the control-saturation ones (which are expressed through $\mathbb{U}$), are included in the very definition of the stage cost map $\ell$. Moreover, it should be underlined that the specific structure of the cost function, being the sum of decoupled stage-cost terms, is not explicitly needed, although it is kept because it fits all the available optimal control problem solvers that we rely on in the design step.

In the absence of any knowledge, a uniform distribution over some hyper-box can simply be used.
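The interface of such a dynamic output feedback can be sketched as follows: a ring buffer holds the $M+1$ most recent outputs and a policy $K$ is evaluated on the stacked window. The class name, the warm-up default input and the placeholder policy are assumptions of this illustration.

```python
from collections import deque
import numpy as np

class OutputFeedback:
    """Maintains the window y_k^(-) of the M+1 most recent outputs
    (newest first) and evaluates u_k = K(y_k^(-)). K is any callable,
    e.g. a fitted ML model; here a placeholder is used."""

    def __init__(self, K, M, u_default=0.0):
        self.K = K
        self.M = M
        self.u_default = u_default           # used while the window fills
        self.window = deque(maxlen=M + 1)

    def __call__(self, y_k):
        self.window.appendleft(np.atleast_1d(y_k))   # y_k, ..., y_{k-M}
        if len(self.window) < self.M + 1:
            return self.u_default            # warm-up phase (assumed)
        return self.K(np.concatenate(self.window))

# Placeholder policy: mean of the stacked window (illustrative only).
fb = OutputFeedback(K=lambda y: float(np.mean(y)), M=2)
u = [fb(t) for t in [1.0, 2.0, 3.0, 4.0]]
```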
III. THE PROPOSED FRAMEWORK
A. The design viewed as a map identification problem
The starting point lies in the fact that the problem reduces to a standard deterministic problem if the pair $(x_k, w)$ of state and parameter vectors were reconstructible via extended observation [6], namely, if one could reconstruct estimations $\hat{x}_k$ and $\hat{w}$ of these two quantities as functions of the previous measurements:

$$\hat{x}(y_k^{(-)}) \ ; \qquad \hat{w}(y_k^{(-)}) \qquad (6)$$

Indeed, in this case, the answer to the problem stated in the previous section would obviously be given by:

$$K(y_k^{(-)}) := \hat{u}^\star\big(y_k^{(-)}\big) \qquad (7)$$

where $\hat{u}^\star(y_k^{(-)})$ is the first control in the optimal sequence that solves the following optimization problem:

$$\hat{\mathbf{u}}^\star\big(y_k^{(-)}\big) = \arg\min_{\mathbf{u}\in\mathbb{U}^N} J\big(\mathbf{u}\,|\,\hat{x}(y_k^{(-)}),\, \hat{w}(y_k^{(-)})\big) \qquad (8)$$

When extended observability does not hold, the pairs $\hat{x}(y_k^{(-)}), \hat{w}(y_k^{(-)})$ are non-uniquely determined quantities, since for a given instance of the measurement profile $y_k^{(-)}$ there is a cloud of possible values of $(\hat{x}, \hat{w})$ and hence a cloud of possibly indistinguishable optimal control inputs $\hat{u}^\star$. The data generation using the statistics of the dispersion of $w$ and the following ML identification step help making a statistically rational choice among the elements of this cloud. To summarize:

Given the statistical dispersion of the parameters, fit the best map $F(y_k^{(-)}) \approx \hat{u}^\star(\cdot)$ that links the sequence of previous measurements $y_k^{(-)}$ to the optimal control $u_k = F(y_k^{(-)})$.

(The map to be identified)
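Fitting such a map $F$ from examples can be sketched as below. The synthetic data, the dimensions and the toy labeling rule are illustrative assumptions; the scikit-learn classifier mirrors the tool actually used later in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical learning data: each row of Y_past stacks the M+1 most
# recent outputs (the features vector y_k^(-)); the labels stand in
# for quantized deterministic-MPC controls u_k^*.
rng = np.random.default_rng(0)
M, n_samples = 10, 3000
Y_past = rng.normal(size=(n_samples, M + 1))
u_star = (Y_past[:, 0] > 0.0).astype(int)    # toy labeling rule

F = RandomForestClassifier(n_estimators=50, random_state=0)
F.fit(Y_past, u_star)                        # fit u_k ~ F(y_k^(-))
u_pred = F.predict(Y_past[:5])               # fast on-line evaluation
```

Once fitted, evaluating `F.predict` on a single measurement window is cheap, which is what makes the resulting feedback fast on-line.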
Now the question is:
How can one build relevant and rich learning data that can be used to fit the map $F$ while reducing as far as possible the amount of off-line computation?

B. Building the learning data

Consider the following procedure, which is the elementary block in the data generation step:
1) Choose a long prediction horizon $N$.
2) Choose an observation horizon $M < N$.
3) Randomly choose a pair $q = (x_0, w) \in \mathbb{R}^{n_x}\times\mathbb{R}^{n_w}$.
4) Use an efficient NLP solver (for instance IPOPT-CasADi [4]) to compute a minimizer $\mathbf{u}^\star(q)$ of $J(\cdot\,|\,q)$. One gets the situation depicted in Figure 1.
5) This item explains how $m$ samples in the learning data are created using the optimal open-loop trajectories computed in the previous item for the specific value of $q = (x_0, w)$. Indeed, consider the pair (see Figure 1):

$$\Big(y^{(-)},\ \hat{u}^\star(y^{(-)})\Big) := \Big([\mathbf{y}^\star(q)]_M^{(-)},\ u_M^\star(q)\Big) \qquad (9)$$

The key idea is that the pair defined in (9) approximates a pair of the form (7) provided that the prediction horizon $N$ is sufficiently long. Indeed, $y^{(-)} := [\mathbf{y}^\star(q)]_M^{(-)}$, considered at instant $M$, is the vector of past measurements. It therefore contains the information regarding the pair $(x_M^\star(x_0), w)$ of current state and parameter vector.

Fig. 1. Generation of a sample pair $(y^{(-)}, \hat{u}^\star(y^{(-)}))$ by solving an optimal control problem for a given sampled pair $q = (x_0, w)$.

Now, in order for (9) to be of the form (7), the following approximation:

$$u_M^\star(q) \approx u^\star(x_M^\star(q)) \qquad (10)$$

should be satisfied. But this approximation would have been a strict equality should $N$ be infinite, thanks to the Bellman principle! Indeed, $u_M^\star(q)$ is the beginning of the remaining part of an optimal solution, while $u^\star(x_M^\star(q))$ is the optimal solution of the updated problem, and these two solutions would be identical should $N$ be infinite.

The same argument holds if we iterate the process (still for the same $q$) using moving windows that end at instants $M + j$, for $j = 1, \ldots, m-1$.
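The moving-window collection just described can be sketched as follows; the array layout and names are assumptions of this illustration.

```python
import numpy as np

def extract_samples(y_traj, u_traj, M, m):
    """Given the optimal output trajectory y*_0..N and input sequence
    u*_0..N-1 computed for one sampled pair q = (x0, w), return the m
    pairs (y^(-)_k, u*_k) for k = M, ..., M+m-1. Each features vector
    stacks the M+1 outputs y_k, ..., y_{k-M} (newest first)."""
    data = []
    for k in range(M, M + m):
        y_past = np.concatenate([y_traj[k - i] for i in range(M + 1)])
        data.append((y_past, u_traj[k]))
    return data

# Toy trajectory standing in for one open-loop NLP solution.
y_traj = [np.array([float(t)]) for t in range(10)]
u_traj = list(range(9))
data = extract_samples(y_traj, u_traj, M=2, m=3)
```

A single open-loop solution thus yields `m` training pairs instead of one, which is the source of the computational saving claimed by the heuristic.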
This means that by performing a single deterministic NLP solution, we can generate $m$ samples in the learning data, namely:

$$\mathcal{D}(q) := \Big\{\big([\mathbf{y}^\star(q)]_k^{(-)},\ u_k^\star(q)\big)\Big\}_{k=M}^{M+m-1} \qquad (11)$$

This procedure is shown in Figure 2. Repeating this procedure for a cloud of $n_q$ sampled values of the pair $q$, one gets a learning data set:

$$\mathcal{D} := \Big\{\mathcal{D}(q^{(\ell)})\Big\}_{\ell=1}^{n_q} \qquad (12)$$

containing $n_q \cdot m$ samples at the price of solving $n_q$ deterministic optimal control problems.

C. Performance evaluation
The learning data $\mathcal{D}$ can now be used to fit a model of the form:

$$u_k = \mathrm{ML}(y_k^{(-)}) \qquad (13)$$

where ML stands for a (Machine Learning)-based model.

Fig. 2. Construction of the learning data $\mathcal{D}(q)$: by going forward in a moving window along the optimal trajectory computed for a single pair $q = (x_0, w)$, it is possible to generate a high number of different sample pairs $(y^{(-)}, u)$ that can be used in the construction of the learning data.

This feedback can be applied in closed loop to a new cloud of values $\{q^{(j)} = (x_0^{(j)}, w^{(j)})\}_{j=1}^{n_v}$. Namely, for each of these values, the feedback is implemented without knowledge of the associated $w^{(j)}$ (only the vector of past measurements is used through (13)). If the closed loop is simulated during $N$ samples (equal to the prediction horizon), then the corresponding closed-loop cost, say $J_j^{cl}$, can be compared to the exact optimal cost $J_j^\star$, which can be viewed as the ideal solution (since the deterministic problems are solved assuming that the parameter value $w^{(j)}$ is known). Obviously, if the optimization is perfectly done, one should systematically have $J_j^\star \le J_j^{cl}$, since any closed-loop sequence over the prediction horizon $N$ is naturally a candidate sequence of the open-loop optimization problem.

IV. NUMERICAL INVESTIGATION
Consider the parallel reactor system [1] commonly used in the study of deterministic Economic NMPC:

$$\dot{x}_1 = 1 - w_1 x_1^2 e^{-1/x_3} - w_2 x_1 e^{-w_3/x_3} - x_1 \qquad (14)$$
$$\dot{x}_2 = w_1 x_1^2 e^{-1/x_3} - x_2 \qquad (15)$$
$$\dot{x}_3 = u - x_3 \qquad (16)$$

where $x_1$ and $x_2$ stand for the concentrations of reactant and product respectively, while $x_3$ represents the temperature of the mixture in the reactor. The control variable is given by the heat flow $u \in \mathbb{U} := [0.049, 0.449]$. The objective of the control is to maximize the amount of product $x_2$, which is precisely the only measured variable. This leads to the following cost function:

$$J(\mathbf{u}\,|\,x_0, w) = -\sum_{i=1}^{N} C\, x_i^{(\mathbf{u})}(x_0, w) \quad \text{with} \quad C = (0, 1, 0)$$

Fig. 3. Dispersion of the parameters in the learning data (ratios to the nominal values are shown).

In the forthcoming computations, a sampling period of 0.1 time units is used. Based on this value and observing typical optimal trajectories, the prediction horizon length $N = 250$ has been considered, as it leads to a prediction horizon that is slightly greater than the settling time. The past measurement horizon length used to define $y_k^{(-)}$ is taken equal to $M = 10$.

A. Generating the learning and validation data
The initial states $x_0$ are uniformly sampled inside the set

$$\mathbb{X} := [10^{-3},\, \cdot\,] \times [10^{-3},\, \cdot\,] \times [10^{-3},\, \cdot\,] \qquad (17)$$

which is known to include almost all relevant evolutions of the system. As for the values of the unknown parameter vector $w$, a normal distribution around the nominal value given by

$$\bar{w} = (10^4,\ 400,\ 0.55)$$

is used, so that the random sampling is based on the rule:

$$w_i = (1 + \nu_i)\,\bar{w}_i \quad \text{with} \quad \nu_i \sim \mathcal{N}(0, \sigma_i) \qquad (18)$$

meaning that the true values are normally distributed around the nominal value with known standard deviations. The same value $\sigma_i$ is used for $i \in \{1, 2, 3\}$. Figure 3 shows the dispersion of the parameter values that results from the random sampling law (18). Note that the variations of the parameters span the interval going from 20% to 180% of the nominal values.

The learning data set is generated according to (11)-(12), in which $n_q = 500$ samples of pairs $(x_0, w)$ are drawn inside the above-described sets. This leads to a learning set of cardinality $500 \cdot m$, where $m$ is the number of times the moving window is translated in order to generate samples from a single solution of a deterministic optimal control problem.

Fig. 4. Cumulative histogram of the computation time (sec) needed by IPOPT (multiple shooting) to solve a single optimal control problem when creating the learning data for the illustrative example.

In what follows, several identification settings are tested for different values of $m$ (ranging from 10 to 200), leading to learning data sets of different cardinalities ($500\,m$ samples), all having in common almost the same computation time (the time needed to solve the $n_q = 500$ deterministic NLP problems).

The cumulative histogram of computation time (in sec) needed by a multiple-shooting implementation of the optimal control solver IPOPT implemented using
CasADi is shown in Figure 4. The computation times shown in Figure 4 might seem surprisingly long to those who commonly use
IPOPT to solve standard regulation problems. Indeed, the median time is close to 15 sec! This is probably due to the oscillating character of the optimal trajectories, which is very specific to the underlying economic NMPC formulation, and to the relatively long prediction horizon $N = 250$ being used.

The histogram of the control values over the prediction horizon of the $n_q = 500$ sampled scenarios (hence including 125000 values) is shown in Figure 5. This histogram suggests that, for this specific example, classification tools are better suited than regression tools for the learning of the control using ML. Indeed, the following three-valued set of labels can be used:

$$\text{label} = \begin{cases} 0 & \text{if } u \le u_1 \\ 1 & \text{if } u \in\, ]u_1, u_2] \\ 2 & \text{otherwise} \end{cases} \qquad (19)$$

for appropriately chosen thresholds $u_1 < u_2$. With this definition of the label, it is possible to identify a classifier that associates to any time series of measurements $y^{(-)}$ an element of the set $\{0, 1, 2\}$ and hence its corresponding control value. Note that if the distribution of the control values were more continuously spread between $u_{\min} = 0.049$ and $u_{\max} = 0.449$, a regressor would have been more appropriate to address the identification problem.

Fig. 5. Histogram of the values of the control $u$ present in the scenarios (of length $N = 250$ each) contained in the learning data.

B. Identifying the output feedback model
The features vector is built using the previous profile of the measured output $x_2$. The identification of the map $\mathrm{ML}(y^{(-)})$ has been obtained using a RandomForestClassifier from the freely available python library scikit-learn [15]. In order to avoid overfitting and enhance the quality of the extrapolation on unseen data, the max_leaf_nodes parameter has been set to a small value.

Fig. 6. Fitting results on the training and test data (split ratio 33%) for the different values of receding horizon moves $m$ used in the construction of the learning data. Recall that the label the classifier tries to guess lies in $\{0, 1, 2\}$, as explained in (19).

The confusion matrices corresponding to the training and the test data (using a test-size ratio of 33% of the learning data) are shown in Figure 6 for different values of $m$. These matrices clearly show that when $m = 10$ is used (cardinality of the learning set = 5000), the learning data is still not representative enough (as the precision on the training data is significantly better than on the test data). This by itself clearly highlights the relevance of the proposed heuristic, since the computation times reported in Figure 4 suggest that the time needed to generate the same amount of rigorously optimal, although still insufficient, data would be $m$ times larger, while the time needed here to generate the samples is around 2 h.

C. Performance evaluation
A new set of pairs $(x_0, w)$ is generated, and the initial states are used to generate learning data using the nominal value $\bar{w}$ of the parameter vector. This is done in order to evaluate the benefit of explicitly learning the feedback from sampled parameter vectors, compared to a nominal design that only uses the nominal expected value. Then, for each value of $m$, the corresponding identified feedback law is used to generate closed-loop simulations in which the newly drawn samples of the parameter vector (which are unknown to the previously identified feedbacks) are used in the simulated model.

The results are shown in Figure 7, where one can find: 1) the ideal optimal closed-loop cost given by the NLP solver IPOPT in the unrealistic case where the disturbance is perfectly known; 2) the closed-loop performance obtained using the nominal feedback described above; and 3) the closed-loop performance of the identified feedback using the different values of $m$.

Fig. 7. Comparison of the closed-loop performances on the validation data set.
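The validation loop of this subsection can be sketched as follows on a toy scalar system; the warm-up input, the function names and the mapping of the reported averages 0.245/0.312 onto cost gaps are assumptions of this illustration.

```python
import numpy as np

def closed_loop_cost(step, output, feedback, ell, x0, w, M, N):
    """Simulate the identified output feedback u_k = F(y_k^(-)) on a
    system whose parameter w is unknown to the controller, and return
    the accumulated stage cost over N steps. A default input of 0 is
    applied while the measurement window fills (assumed warm-up)."""
    x = np.asarray(x0, dtype=float)
    y_hist, J = [], 0.0
    for k in range(N):
        y_hist.append(output(x, w))
        if len(y_hist) > M + 1:
            y_hist.pop(0)
        if len(y_hist) < M + 1:
            u = 0.0                              # warm-up phase
        else:
            u = feedback(np.concatenate(y_hist[::-1]))  # newest first
        x = step(x, u, w)
        J += ell(x, u)
    return J

def recovered_advantage(J_ideal, J_nominal, J_ml):
    """Fraction of the ideal-knowledge advantage recovered by the
    identified feedback: (J_nom - J_ml) / (J_nom - J_ideal)."""
    return (J_nominal - J_ml) / (J_nominal - J_ideal)

# Toy scalar system: x+ = w x + u, output y = x, zero feedback.
J = closed_loop_cost(step=lambda x, u, w: w * x + u,
                     output=lambda x, w: np.atleast_1d(x),
                     feedback=lambda y_past: 0.0,
                     ell=lambda x, u: float(x**2),
                     x0=1.0, w=0.5, M=2, N=5)

# Reproducing the paper's 78% figure from the assumed gaps 0.245/0.312:
ratio = recovered_advantage(0.0, 0.312, 0.312 - 0.245)
```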
From Figure 7, the following observations can be made:
• The optimal achievable value of the cost when the parameter vector is perfectly known is better on average than what the nominally identified controller enables to achieve.
• The proposed methodology enables the recovery of 78% of the advantage of perfectly knowing the value of the parameter vector when $m = 200$ is used in the construction of the learning data (this is the ratio 0.245/0.312 of the average cost gaps; the corresponding learning set has cardinality $500 \cdot 200 = 100000$). This corresponds to a better average than the nominally optimal identified controller. Almost similar results are obtained for the intermediate values of $m$.
• The average performance drops when $m = 50$ is used, while a drastic decrease in recovered advantage (down to 3.9%) is observed when the short value $m = 10$ is used in the building of the learning data. It is important to underline that if strict optimality were enforced in the data set building step (only one sample per NLP solution), the time needed to solve the NLP problems would have been approximately $m$ times longer, i.e., many days instead of around 2 h.

V. CONCLUSION AND FUTURE WORK
In this paper, a heuristic is proposed for the design of uncertainty-aware dynamic output feedbacks for uncertain nonlinear models. This heuristic is based on the off-line solution of deterministic NLP problems over a cloud of sampled contexts, followed by a receding-horizon samples collection that enriches the learning data for the same off-line computational load. The results show promising closed-loop performance when compared to the ideal performance obtained with rigorous knowledge of the parameters. These performances greatly outperform a nominal design based on the most expected value of the parameters.

Ongoing work concerns the application of the framework to more challenging examples and the investigation of heuristics to quickly guess the optimal triplet (
M, m, N) for a given uncertain nonlinear model.

REFERENCES

[1] M. A. Müller, D. Angeli, F. Allgöwer, R. Amrit, and J. B. Rawlings. Convergence in economic model predictive control with average constraints. Automatica, 50(12):3100-3111, 2014.
[2] V. Adetola and M. Guay. Robust adaptive MPC for constrained uncertain nonlinear systems. International Journal of Adaptive Control and Signal Processing, 25(2):155-167, 2011.
[3] M. Alamir. On the use of supervised clustering in stochastic NMPC design. IEEE Transactions on Automatic Control (early access via DOI: 10.1109/TAC.2020.2970424), 2020.
[4] J. A. E. Andersson, J. Gillis, G. Horn, J. B. Rawlings, and M. Diehl. CasADi - A software framework for nonlinear optimization and optimal control. Mathematical Programming Computation, 2018.
[5] D. Bertsekas. Reinforcement Learning and Optimal Control. Athena Scientific, 2019.
[6] G. Besançon. Nonlinear Observers and Applications. Lecture Notes in Control and Information Sciences. Springer Berlin Heidelberg, 2007.
[7] M. Guay, V. Adetola, and D. DeHaan. Robust and Adaptive Model Predictive Control of Nonlinear Systems. Institution of Engineering and Technology, 2015.
[8] B. Karg, T. Alamo, and S. Lucia. Probabilistic performance validation of deep learning-based robust NMPC controllers, 2019.
[9] S. Lucia, T. Finkler, and S. Engell. Multi-stage nonlinear model predictive control applied to a semi-batch polymerization reactor under uncertainty. Journal of Process Control, 23(9):1306-1319, 2013.
[10] S. Lucia and B. Karg. A deep learning-based approach to robust nonlinear model predictive control. IFAC-PapersOnLine, 51(20):511-516, 2018. 6th IFAC Conference on Nonlinear Model Predictive Control NMPC 2018.
[11] L. Magni and R. Scattolini. Robustness and Robust Design of MPC for Nonlinear Discrete-Time Systems, pages 239-254. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
[12] D. Mayne. Robust and stochastic model predictive control: Are we going in the right direction? Annual Reviews in Control, 41:184-192, 2016.
[13] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert. Constrained model predictive control: Stability and optimality. Automatica, 36:789-814, 2000.
[14] A. Mesbah. Stochastic model predictive control with active uncertainty learning: A survey on dual control. Annual Reviews in Control, 45:107-117, 2018.
[15] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[16] G. Schildbach, L. Fagiano, Ch. Frei, and M. Morari. The scenario approach for stochastic model predictive control with bounds on closed-loop constraint violations.