A Heuristic for Dynamic Output Predictive Control Design for Uncertain Nonlinear Systems
Mazen Alamir
Abstract—In this paper, a simple heuristic is proposed for the design of uncertainty-aware predictive controllers for nonlinear models involving uncertain parameters. The method relies on Machine Learning-based approximation of ideal deterministic MPC solutions with perfectly known parameters. An efficient construction of the learning data set from these off-line solutions is proposed, in which each solution provides many samples in the learning data. This enables a drastic reduction of the required number of Non Linear Programming problems to be solved off-line while explicitly exploiting the statistics of the parameters' dispersion. The learning data is then used to design a fast on-line output dynamic feedback that explicitly incorporates information about the statistics of the parameters' dispersion. An example is provided to illustrate the efficiency and the relevance of the proposed framework. It is in particular shown that the proposed solution recovers up to 78% of the expected advantage of having a perfect knowledge of the parameters compared to a nominal design.
Index Terms—Machine Learning, Nonlinear Systems, Output Feedback, Stochastic NMPC.
I. INTRODUCTION
Over the last two decades, Nonlinear Model Predictive Control (NMPC) [13] has spread across all domains and become a first choice when it comes to designing control feedbacks for systems that admit faithful models. This is due to the maturity of its stability assessment as well as to the availability of user-friendly solvers [4]. As a direct consequence of this success, investigations started regarding the best way to extend the use of NMPC to systems that are represented through uncertain models.

After some early attempts involving overly stringent robust NMPC frameworks [11], robust adaptive MPC frameworks have been proposed [2], [7] to provide on-line model adaptation for uncertain systems under the assumptions of affine parameters, strict identifiability, persistent excitation and full state measurement. More recently, the concept of Stochastic NMPC (SNMPC) [12], [14] emerged. In a nutshell, SNMPC frameworks implement the basic NMPC scheme with the standard deterministic cost and constraints being replaced by their expected values, considered as functions of the model's uncertainties. While the conceptual simplicity of this shift in paradigm is appealing, its consequence on the computational burden appears as a deal breaker despite some recent attempts [3], [9], [16]. Indeed, SNMPC requires the on-line computation of the expectation of nonlinear functions
* This work was supported by the MIAI @ Grenoble Alpes under Grant ANR-19-P3IA-0003. The author is with the University of Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France (e-mail: [email protected]).

of several variables. This computation is to be performed for any value of the decision variable that is encountered during the on-line iterations of the optimization algorithm. This complexity justified an increasing interest in solutions based on off-line computation.

One of these is related to Stochastic Dynamic Programming (SDP) / Reinforcement Learning (RL) alternatives [5], [14], which offer elegant solutions for small-size problems. Another research track is related to the approximation of the appropriate optimal feedback/strategy via Deep/Machine Learning-based fitting schemes (see [8], [10] and the references therein). More precisely, a cloud of initial states is created and, for each element in the cloud, a multi-stage SNMPC problem is solved (this requires a finite and generally small number of possible uncertainty values to be allowed). Then a neural network is trained via deep learning in order to fit a function that maps the initial state to the stochastically optimal feedback control and/or strategy. This approach leads to an extremely heavy off-line computational burden without providing theoretical guarantees, not only because of the inherent and unavoidable data incompleteness, which is an issue shared by all data-driven solutions (including the one proposed in this paper), but also because all these schemes mainly use the multi-stage SNMPC framework, which needs the uncertainty realizations to be limited to a small number of possible values, which obviously scarcely fits real-life situations. Consequently, these approaches should be viewed as smart and reasonably tractable heuristics to address a problem that admits no rigorously computable solution.

It is not the intention of this paper to propose a better framework than the ones cited above and the references therein.
This is because the comparison between heuristics is a complex multi-criteria process that would go beyond the scope of this short contribution. In particular, while the scheme proposed hereafter is appealing from a computational point of view, its scope of application is limited to nonlinear models with fixed (or slowly varying) uncertain parameters as the main source of uncertainty. In this particular but highly relevant case, the following intuition motivates this paper:

Under specific circumstances, the deterministic optimal solution that would have been computed, should the parameters be known, can be learned from the previous measurements over some observation window.

In the ideal case, this statement results from a perfect identifiability of the pair of state and parameter vectors, leading to the optimal MPC control being an exact function of the previous measurements $y_k^{(-)}$:

$$\hat{u}^\star\big(y_k^{(-)}\big) = \arg\min_{\mathbf{u}\in\mathbb{U}^N} J\big(\mathbf{u}\,|\,\hat{x}(y_k^{(-)}),\, \hat{w}(y_k^{(-)})\big) \qquad (1)$$

making $y_k^{(-)}$ and $\hat{u}_k^\star$ ideal candidates to be viewed respectively as a features vector and a label in a Machine Learning (ML) identification step. When extended observability does not hold, the above relationship can be used with the understanding that $\hat{x}(y_k^{(-)})$ and $\hat{w}(y_k^{(-)})$ are non-uniquely determined quantities, since for a given instance of the measurement profile $y_k^{(-)}$ there is a cloud of possible values of $(\hat{x}, \hat{w})$ and hence a cloud of possibly indistinguishable optimal control inputs $\hat{u}^\star$. The data generation using the statistics of the dispersion of $w$ and the following ML identification step help making a statistically rational choice among the elements of this cloud.

Following the above lines, this paper proposes a framework for the design of an uncertainty-aware output feedback law which is fitted from learning data that is constructed in a computationally efficient manner.
The latter involves two main differences with standard approaches: 1) the off-line computation involves only deterministic optimal control problems and can therefore directly benefit from the state-of-the-art available NMPC solvers; 2) each single solution is used to extract multiple instances (that can be as high as hundreds) to be included in the learning data set. This is done by exploring the open-loop optimal trajectories and concatenating the sub-optimal instances of $(y_{k+i}^{(-)}, u_{k+i}^\star(y_k^{(-)}))$ rather than collecting a single pair.

This paper is organized as follows: Section II presents some definitions and notation and states precisely the problem under study. The proposed framework is detailed in Section III together with the working assumptions that might induce its success. An illustrative example is proposed in Section IV in order to show the steps of the framework and evaluate the impact of parameter choices on the quality of the results. Finally, Section V concludes the paper and gives some hints for further investigations.

II. DEFINITIONS, NOTATION AND PROBLEM STATEMENT
Let us consider nonlinear systems governed by the following discrete-time dynamics:

$$x_{k+1} = f(x_k, u_k, w) \qquad (2)$$
$$y_k = h(x_k, u_k, w) \qquad (3)$$

where (2) and (3) describe the dynamics and the measurement equation respectively. The notation $x_k\in\mathbb{R}^{n_x}$, $u_k\in\mathbb{U}\subset\mathbb{R}^{n_u}$ and $y_k\in\mathbb{R}^{n_y}$ stands for the state, input and measured output vectors at instant $k$. The vector of parameters $w\in\mathbb{R}^{n_w}$ is supposed to be constant although uncertain. Moreover, it is assumed that $w$ belongs to some bounded subset $\mathbb{W}$ over which a probability density function (pdf) $\mathcal{W}$ is known, so that a statistically relevant sample of $w$ can be drawn if required. It is also assumed that a set of states $\mathbb{X}$ of interest is considered with some random sampling rule which reflects some form of relative importance or relevance. In what follows, boldfaced notation refers to signal profiles over some past or future time window; in particular the notation $\mathbf{u} := (u_0^T, u_1^T, \ldots, u_{N-1}^T)^T \in \mathbb{U}^N$ denotes a sequence of future control actions over some prediction horizon of length $N$, while $y_k^{(-)}$ refers to the sequence of $M+1$ previous measurement vectors, namely:

$$y_k^{(-)} := \begin{bmatrix} y_k \\ y_{k-1} \\ \vdots \\ y_{k-M} \end{bmatrix} \in \big[\mathbb{R}^{n_y}\big]^{M+1} \qquad (4)$$

The notation

$$\mathbf{x}^{(\mathbf{u})}(x_0, w) = \begin{bmatrix} x_0^{(\mathbf{u})}(x_0, w) \\ x_1^{(\mathbf{u})}(x_0, w) \\ \vdots \\ x_N^{(\mathbf{u})}(x_0, w) \end{bmatrix} \in \big[\mathbb{R}^{n_x}\big]^{N+1}$$

denotes the state trajectory under (2) when the control profile $\mathbf{u}$ and the parameter vector $w$ are used, namely:

$$x_0^{(\mathbf{u})}(x_0, w) = x_0 \ ; \qquad x_{k+1}^{(\mathbf{u})}(x_0, w) = f\big(x_k^{(\mathbf{u})}(x_0, w), u_k, w\big)$$

It is assumed that the control objective would be defined through a cost function of the form

$$J(\mathbf{u}\,|\,x_k, w) := \sum_{i=1}^{N} \ell_i\big(x_i^{(\mathbf{u})}(x_k, w), u_{i-1}\big) \qquad (5)$$

should the current state $x_k$ and the parameter vector $w$ be known.
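A minimal numerical sketch of the open-loop simulation of (2) and the evaluation of the cost (5) is given below; the dynamics `f`, stage cost `ell` and the toy numbers are illustrative assumptions, not part of the paper.

```python
import numpy as np

def rollout_cost(f, ell, u_seq, x0, w):
    """Simulate x_{k+1} = f(x_k, u_k, w) and accumulate the stage
    costs of (5): J = sum_i ell(x_i, u_{i-1})."""
    x = np.asarray(x0, dtype=float)
    J = 0.0
    for u in u_seq:          # u_seq = (u_0, ..., u_{N-1})
        x = f(x, u, w)       # successor state x_i
        J += ell(x, u)       # stage cost ell(x_i, u_{i-1})
    return J

# Toy instance (assumed): scalar linear dynamics, quadratic stage cost.
f = lambda x, u, w: w * x + u
ell = lambda x, u: float(x**2 + 0.1 * u**2)
J = rollout_cost(f, ell, [0.0, 0.0, 0.0], 1.0, 0.9)
```

For this toy instance the cost simply sums the squared states along the open-loop trajectory.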
The design problem that is addressed in the present paper can be shortly stated as follows:

Design a dynamic output feedback of the form $u_k := K(y_k^{(-)})$ that associates to each past measurement profile $y_k^{(-)}$ an associated input, in such a way that incorporates the presumably known statistics $\mathcal{W}$ and that is oriented towards the optimization of the cost function $J$.

(Uncertainty-aware dynamic output feedback)

REMARK: For the sake of simplicity of exposition, it is assumed that all the constraints, except the control-saturation ones (which are expressed through $\mathbb{U}$), are included in the very definition of the stage cost map $\ell$. Moreover, it should be underlined that the specific structure of the cost function, being the sum of decoupled stage-cost terms, is not explicitly needed, although it is kept because it fits all the available optimal control problem solvers that we rely on in the design step.

In the absence of any knowledge, a uniform distribution over some hyper-box can simply be used.
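The interface of such a dynamic output feedback can be sketched as follows: a ring buffer holds the $M+1$ most recent outputs and a policy $K$ is evaluated on the stacked window. The class name, the warm-up default input and the placeholder policy are assumptions of this illustration.

```python
from collections import deque
import numpy as np

class OutputFeedback:
    """Maintains the window y_k^(-) of the M+1 most recent outputs
    (newest first) and evaluates u_k = K(y_k^(-)). K is any callable,
    e.g. a fitted ML model; here a placeholder is used."""

    def __init__(self, K, M, u_default=0.0):
        self.K = K
        self.M = M
        self.u_default = u_default           # used while the window fills
        self.window = deque(maxlen=M + 1)

    def __call__(self, y_k):
        self.window.appendleft(np.atleast_1d(y_k))   # y_k, ..., y_{k-M}
        if len(self.window) < self.M + 1:
            return self.u_default            # warm-up phase (assumed)
        return self.K(np.concatenate(self.window))

# Placeholder policy: mean of the stacked window (illustrative only).
fb = OutputFeedback(K=lambda y: float(np.mean(y)), M=2)
u = [fb(t) for t in [1.0, 2.0, 3.0, 4.0]]
```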
III. THE PROPOSED FRAMEWORK
A. The design viewed as a map identification problem
The starting point lies in the fact that the problem reduces to a standard deterministic problem if the pair $(x_k, w)$ of state and parameter vectors were reconstructible via extended observation [6], namely, if one could reconstruct estimations $\hat{x}_k$ and $\hat{w}$ of these two quantities as functions of the previous measurements:

$$\hat{x}(y_k^{(-)}) \ ; \qquad \hat{w}(y_k^{(-)}) \qquad (6)$$

Indeed, in this case, the answer to the problem stated in the previous section would obviously be given by:

$$K(y_k^{(-)}) := \hat{u}^\star\big(y_k^{(-)}\big) \qquad (7)$$

where $\hat{u}^\star(y_k^{(-)})$ is the first control in the optimal sequence that solves the following optimization problem:

$$\hat{\mathbf{u}}^\star\big(y_k^{(-)}\big) = \arg\min_{\mathbf{u}\in\mathbb{U}^N} J\big(\mathbf{u}\,|\,\hat{x}(y_k^{(-)}),\, \hat{w}(y_k^{(-)})\big) \qquad (8)$$

When extended observability does not hold, the pairs $\hat{x}(y_k^{(-)}), \hat{w}(y_k^{(-)})$ are non-uniquely determined quantities, since for a given instance of the measurement profile $y_k^{(-)}$ there is a cloud of possible values of $(\hat{x}, \hat{w})$ and hence a cloud of possibly indistinguishable optimal control inputs $\hat{u}^\star$. The data generation using the statistics of the dispersion of $w$ and the following ML identification step help making a statistically rational choice among the elements of this cloud. To summarize:

Given the statistical dispersion of the parameters, fit the best map $F(y_k^{(-)}) \approx \hat{u}^\star(\cdot)$ that links the sequence of previous measurements $y_k^{(-)}$ to the optimal control $u_k = F(y_k^{(-)})$.

(The map to be identified)
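Fitting such a map $F$ from examples can be sketched as below. The synthetic data, the dimensions and the toy labeling rule are illustrative assumptions; the scikit-learn classifier mirrors the tool actually used later in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical learning data: each row of Y_past stacks the M+1 most
# recent outputs (the features vector y_k^(-)); the labels stand in
# for quantized deterministic-MPC controls u_k^*.
rng = np.random.default_rng(0)
M, n_samples = 10, 3000
Y_past = rng.normal(size=(n_samples, M + 1))
u_star = (Y_past[:, 0] > 0.0).astype(int)    # toy labeling rule

F = RandomForestClassifier(n_estimators=50, random_state=0)
F.fit(Y_past, u_star)                        # fit u_k ~ F(y_k^(-))
u_pred = F.predict(Y_past[:5])               # fast on-line evaluation
```

Once fitted, evaluating `F.predict` on a single measurement window is cheap, which is what makes the resulting feedback fast on-line.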
Now the question is:
How can one build relevant and rich learning data that can be used to fit the map $F$ while reducing as far as possible the amount of off-line computation?

B. Building the learning data

Consider the following procedure, which is the elementary block in the data generation step:
1) Choose a long prediction horizon $N$.
2) Choose an observation horizon $M < N$.
3) Randomly choose a pair $q = (x_0, w) \in \mathbb{R}^{n_x}\times\mathbb{R}^{n_w}$.
4) Use an efficient NLP solver (for instance IPOPT-CasADi [4]) to compute a minimizer $\mathbf{u}^\star(q)$ of $J(\cdot\,|\,q)$. One gets the situation depicted in Figure 1.
5) This item explains how $m$ samples in the learning data are created using the optimal open-loop trajectories computed in the previous item for the specific value of $q = (x_0, w)$. Indeed, consider the pair (see Figure 1):

$$\Big(y^{(-)},\ \hat{u}^\star(y^{(-)})\Big) := \Big([\mathbf{y}^\star(q)]_M^{(-)},\ u_M^\star(q)\Big) \qquad (9)$$

The key idea is that the pair defined in (9) approximates a pair of the form (7) provided that the prediction horizon $N$ is sufficiently long. Indeed, $y^{(-)} := [\mathbf{y}^\star(q)]_M^{(-)}$, considered at instant $M$, is the vector of past measurements. It therefore contains the information regarding the pair $(x_M^\star(x_0), w)$ of current state and parameter vector.

Fig. 1. Generation of a sample pair $(y^{(-)}, \hat{u}^\star(y^{(-)}))$ by solving an optimal control problem for a given sampled pair $q = (x_0, w)$.

Now, in order for (9) to be of the form (7), the following approximation:

$$u_M^\star(q) \approx u^\star(x_M^\star(q)) \qquad (10)$$

should be satisfied. But this approximation would have been a strict equality should $N$ be infinite, thanks to the Bellman principle! Indeed, $u_M^\star(q)$ is the beginning of the remaining part of an optimal solution, while $u^\star(x_M^\star(q))$ is the optimal solution of the updated problem, and these two solutions would be identical should $N$ be infinite.

The same argument holds if we iterate the process (still for the same $q$) using moving windows that end at instants $M + j$, for $j = 1, \ldots, m-1$.
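The moving-window collection just described can be sketched as follows; the array layout and names are assumptions of this illustration.

```python
import numpy as np

def extract_samples(y_traj, u_traj, M, m):
    """Given the optimal output trajectory y*_0..N and input sequence
    u*_0..N-1 computed for one sampled pair q = (x0, w), return the m
    pairs (y^(-)_k, u*_k) for k = M, ..., M+m-1. Each features vector
    stacks the M+1 outputs y_k, ..., y_{k-M} (newest first)."""
    data = []
    for k in range(M, M + m):
        y_past = np.concatenate([y_traj[k - i] for i in range(M + 1)])
        data.append((y_past, u_traj[k]))
    return data

# Toy trajectory standing in for one open-loop NLP solution.
y_traj = [np.array([float(t)]) for t in range(10)]
u_traj = list(range(9))
data = extract_samples(y_traj, u_traj, M=2, m=3)
```

A single open-loop solution thus yields `m` training pairs instead of one, which is the source of the computational saving claimed by the heuristic.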
This means that by performing a single deterministic NLP solution, we can generate $m$ samples in the learning data, namely:

$$\mathcal{D}(q) := \Big\{\big([\mathbf{y}^\star(q)]_k^{(-)},\ u_k^\star(q)\big)\Big\}_{k=M}^{M+m-1} \qquad (11)$$

This procedure is shown in Figure 2. Repeating this procedure for a cloud of $n_q$ sampled values of the pair $q$, one gets a learning data set:

$$\mathcal{D} := \Big\{\mathcal{D}(q^{(\ell)})\Big\}_{\ell=1}^{n_q} \qquad (12)$$

containing $n_q \cdot m$ samples at the price of solving $n_q$ deterministic optimal control problems.

C. Performance evaluation
The learning data $\mathcal{D}$ can now be used to fit a model of the form:

$$u_k = \mathrm{ML}(y_k^{(-)}) \qquad (13)$$

where ML stands for a (Machine Learning)-based model.

Fig. 2. Construction of the learning data $\mathcal{D}(q)$: by going forward in a moving window along the optimal trajectory computed for a single pair $q = (x_0, w)$, it is possible to generate a high number of different sample pairs $(y^{(-)}, u)$ that can be used in the construction of the learning data.

This feedback can be applied in closed loop to a new cloud of values $\{q^{(j)} = (x_0^{(j)}, w^{(j)})\}_{j=1}^{n_v}$. Namely, for each of these values, the feedback is implemented without knowledge of the associated $w^{(j)}$ (only the vector of past measurements is used through (13)). If the closed loop is simulated during $N$ samples (equal to the prediction horizon), then the corresponding closed-loop cost, say $J_j^{cl}$, can be compared to the exact optimal cost $J_j^\star$, which can be viewed as the ideal solution (since the deterministic problems are solved assuming that the parameter value $w^{(j)}$ is known). Obviously, if the optimization is perfectly done, one should systematically have $J_j^\star \le J_j^{cl}$, since any closed-loop sequence over the prediction horizon $N$ is naturally a candidate sequence of the open-loop optimization problem.

IV. NUMERICAL INVESTIGATION
Consider the parallel reactor system [1] commonly used in the study of deterministic Economic NMPC:

$$\dot{x}_1 = 1 - w_1 x_1^2 e^{-1/x_3} - w_2 x_1 e^{-w_3/x_3} - x_1 \qquad (14)$$
$$\dot{x}_2 = w_1 x_1^2 e^{-1/x_3} - x_2 \qquad (15)$$
$$\dot{x}_3 = u - x_3 \qquad (16)$$

where $x_1$ and $x_2$ stand for the concentrations of reactant and product respectively, while $x_3$ represents the temperature of the mixture in the reactor. The control variable is given by the heat flow $u \in \mathbb{U} := [0.049, 0.449]$. The objective of the control is to maximize the amount of product $x_2$, which is precisely the only measured variable. This leads to the following cost function:

$$J(\mathbf{u}\,|\,x_0, w) = -\sum_{i=1}^{N} C\, x_i^{(\mathbf{u})}(x_0, w) \quad \text{with} \quad C = (0, 1, 0)$$

Fig. 3. Dispersion of the parameters in the learning data (ratios to the nominal values are shown).

In the forthcoming computations, a sampling period of 0.1 time units is used. Based on this value and observing typical optimal trajectories, the prediction horizon length $N = 250$ has been considered, as it leads to a prediction horizon that is slightly greater than the settling time. The past measurement horizon length used to define $y_k^{(-)}$ is taken equal to $M = 10$.

A. Generating the learning and validation data
The initial states $x_0$ are uniformly sampled inside the set

$$\mathbb{X} := [10^{-3},\, \cdot\,] \times [10^{-3},\, \cdot\,] \times [10^{-3},\, \cdot\,] \qquad (17)$$

which is known to include almost all relevant evolutions of the system. As for the values of the unknown parameter vector $w$, a normal distribution around the nominal value given by

$$\bar{w} = (10^4,\ 400,\ 0.55)$$

is used, so that the random sampling is based on the rule:

$$w_i = (1 + \nu_i)\,\bar{w}_i \quad \text{with} \quad \nu_i \sim \mathcal{N}(0, \sigma_i) \qquad (18)$$

meaning that the true values are normally distributed around the nominal value with known standard deviations. The same value $\sigma_i$ is used for $i \in \{1, 2, 3\}$. Figure 3 shows the dispersion of the parameter values that results from the random sampling law (18). Note that the variations of the parameters span the interval going from 20% to 180% of the nominal values.

The learning data set is generated according to (11)-(12), in which $n_q = 500$ samples of pairs $(x_0, w)$ are drawn inside the above-described sets. This leads to a learning set of cardinality $500 \cdot m$, where $m$ is the number of times the moving window is translated in order to generate samples from a single solution of a deterministic optimal control problem.

Fig. 4. Cumulative histogram of the computation time (sec) needed by IPOPT (multiple shooting) to solve a single optimal control problem when creating the learning data for the illustrative example.

In what follows, several identification settings are tested for different values of $m$ (ranging from 10 to 200), leading to learning data sets of different cardinalities ($500\,m$ samples), all having in common almost the same computation time (the time needed to solve the $n_q = 500$ deterministic NLP problems).

The cumulative histogram of computation time (in sec) needed by a multiple-shooting implementation of the optimal control solver IPOPT implemented using
CasADi is shown in Figure 4. The computation times shown in Figure 4 might seem surprisingly long to those who commonly use
IPOPT to solve standard regulation problems. Indeed, the median time is close to 15 sec! This is probably due to the oscillating character of the optimal trajectories, which is very specific to the underlying economic NMPC formulation, and to the relatively long prediction horizon $N = 250$ being used.

The histogram of the control values over the prediction horizon of the $n_q = 500$ sampled scenarios (hence including 125000 values) is shown in Figure 5. This histogram suggests that, for this specific example, classification tools are better suited than regression tools for the learning of the control using ML. Indeed, the following three-valued set of labels can be used:

$$\text{label} = \begin{cases} 0 & \text{if } u \le u_1 \\ 1 & \text{if } u \in\, ]u_1, u_2] \\ 2 & \text{otherwise} \end{cases} \qquad (19)$$

for appropriately chosen thresholds $u_1 < u_2$. With this definition of the label, it is possible to identify a classifier that associates to any time series of measurements $y^{(-)}$ an element of the set $\{0, 1, 2\}$ and hence its corresponding control value. Note that if the distribution of the control values were more continuously spread between $u_{\min} = 0.049$ and $u_{\max} = 0.449$, a regressor would have been more appropriate to address the identification problem.

Fig. 5. Histogram of the values of the control $u$ present in the scenarios (of length $N = 250$ each) contained in the learning data.

B. Identifying the output feedback model
The features vector is built using the previous profile of the measured output $x_2$. The identification of the map $\mathrm{ML}(y^{(-)})$ has been obtained using a RandomForestClassifier from the freely available python library scikit-learn [15]. In order to avoid overfitting and enhance the quality of the extrapolation on unseen data, the max_leaf_nodes parameter has been set to a small value.

Fig. 6. Fitting results on the training and test data (split ratio 33%) for the different values of receding horizon moves $m$ used in the construction of the learning data. Recall that the label the classifier tries to guess lies in $\{0, 1, 2\}$, as explained in (19).

The confusion matrices corresponding to the training and the test data (using a test-size ratio of 33% of the learning data) are shown in Figure 6 for different values of $m$. These matrices clearly show that when $m = 10$ is used (cardinality of the learning set = 5000), the learning data is still not representative enough (as the precision on the training data is significantly better than on the test data). This by itself clearly highlights the relevance of the proposed heuristic, since the computation times reported in Figure 4 suggest that the time needed to generate the same amount of rigorously optimal, although still insufficient, data would be $m$ times larger, while the time needed here to generate the samples is around 2 h.

C. Performance evaluation
A new set of pairs $(x_0, w)$ is generated, and the initial states are used to generate learning data using the nominal value $\bar{w}$ of the parameter vector. This is done in order to evaluate the benefit of explicitly learning the feedback from sampled parameter vectors, compared to a nominal design that only uses the nominal expected value. Then, for each value of $m$, the corresponding identified feedback law is used to generate closed-loop simulations in which the newly drawn samples of the parameter vector (which are unknown to the previously identified feedbacks) are used in the simulated model.

The results are shown in Figure 7, where one can find: 1) the ideal optimal closed-loop cost given by the NLP solver IPOPT in the unrealistic case where the disturbance is perfectly known; 2) the closed-loop performance obtained using the nominal feedback described above; and 3) the closed-loop performance of the identified feedback using the different values of $m$.

Fig. 7. Comparison of the closed-loop performances on the validation data set.
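The validation loop of this subsection can be sketched as follows on a toy scalar system; the warm-up input, the function names and the mapping of the reported averages 0.245/0.312 onto cost gaps are assumptions of this illustration.

```python
import numpy as np

def closed_loop_cost(step, output, feedback, ell, x0, w, M, N):
    """Simulate the identified output feedback u_k = F(y_k^(-)) on a
    system whose parameter w is unknown to the controller, and return
    the accumulated stage cost over N steps. A default input of 0 is
    applied while the measurement window fills (assumed warm-up)."""
    x = np.asarray(x0, dtype=float)
    y_hist, J = [], 0.0
    for k in range(N):
        y_hist.append(output(x, w))
        if len(y_hist) > M + 1:
            y_hist.pop(0)
        if len(y_hist) < M + 1:
            u = 0.0                              # warm-up phase
        else:
            u = feedback(np.concatenate(y_hist[::-1]))  # newest first
        x = step(x, u, w)
        J += ell(x, u)
    return J

def recovered_advantage(J_ideal, J_nominal, J_ml):
    """Fraction of the ideal-knowledge advantage recovered by the
    identified feedback: (J_nom - J_ml) / (J_nom - J_ideal)."""
    return (J_nominal - J_ml) / (J_nominal - J_ideal)

# Toy scalar system: x+ = w x + u, output y = x, zero feedback.
J = closed_loop_cost(step=lambda x, u, w: w * x + u,
                     output=lambda x, w: np.atleast_1d(x),
                     feedback=lambda y_past: 0.0,
                     ell=lambda x, u: float(x**2),
                     x0=1.0, w=0.5, M=2, N=5)

# Reproducing the paper's 78% figure from the assumed gaps 0.245/0.312:
ratio = recovered_advantage(0.0, 0.312, 0.312 - 0.245)
```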
From Figure 7, the following observations can be made:
• The optimal achievable value of the cost when the parameter vector is perfectly known is better on average than what the nominally identified controller enables to achieve.
• The proposed methodology enables the recovery of 78% of the advantage of perfectly knowing the value of the parameter vector when $m = 200$ is used in the construction of the learning data (this is the ratio 0.245/0.312 of the average cost gaps; the corresponding learning set has cardinality $500 \cdot 200 = 100000$). This corresponds to a better average than the nominally optimal identified controller. Almost similar results are obtained for the intermediate values of $m$.
• The average performance drops when $m = 50$ is used, while a drastic decrease in recovered advantage (down to 3.9%) is observed when the short value $m = 10$ is used in the building of the learning data. It is important to underline that if strict optimality were enforced in the data set building step (only one sample per NLP solution), the time needed to solve the NLP problems would have been approximately $m$ times longer, i.e., many days instead of around 2 h.

V. CONCLUSION AND FUTURE WORK
In this paper, a heuristic is proposed for the design of uncertainty-aware dynamic output feedbacks for uncertain nonlinear models. This heuristic is based on the off-line solution of deterministic NLP problems over a cloud of sampled contexts, followed by a receding-horizon samples collection that enriches the learning data for the same off-line computational load. The results show promising closed-loop performance when compared to the ideal performance obtained with rigorous knowledge of the parameters. These performances greatly outperform a nominal design based on the most expected value of the parameters.

Ongoing work concerns the application of the framework to more challenging examples and the investigation of heuristics to quickly guess the optimal triplet (
M, m, N) for a given uncertain nonlinear model.

REFERENCES

[1] M. A. Müller, D. Angeli, F. Allgöwer, R. Amrit, and J. B. Rawlings. Convergence in economic model predictive control with average constraints. Automatica, 50(12):3100-3111, 2014.
[2] V. Adetola and M. Guay. Robust adaptive MPC for constrained uncertain nonlinear systems. International Journal of Adaptive Control and Signal Processing, 25(2):155-167, 2011.
[3] M. Alamir. On the use of supervised clustering in stochastic NMPC design. IEEE Transactions on Automatic Control (early access via DOI: 10.1109/TAC.2020.2970424), 2020.
[4] J. A. E. Andersson, J. Gillis, G. Horn, J. B. Rawlings, and M. Diehl. CasADi - A software framework for nonlinear optimization and optimal control. Mathematical Programming Computation, 2018.
[5] D. Bertsekas. Reinforcement Learning and Optimal Control. Athena Scientific, 2019.
[6] G. Besançon. Nonlinear Observers and Applications. Lecture Notes in Control and Information Sciences. Springer Berlin Heidelberg, 2007.
[7] M. Guay, V. Adetola, and D. DeHaan. Robust and Adaptive Model Predictive Control of Nonlinear Systems. Institution of Engineering and Technology, 2015.
[8] B. Karg, T. Alamo, and S. Lucia. Probabilistic performance validation of deep learning-based robust NMPC controllers, 2019.
[9] S. Lucia, T. Finkler, and S. Engell. Multi-stage nonlinear model predictive control applied to a semi-batch polymerization reactor under uncertainty. Journal of Process Control, 23(9):1306-1319, 2013.
[10] S. Lucia and B. Karg. A deep learning-based approach to robust nonlinear model predictive control. IFAC-PapersOnLine, 51(20):511-516, 2018. 6th IFAC Conference on Nonlinear Model Predictive Control NMPC 2018.
[11] L. Magni and R. Scattolini. Robustness and Robust Design of MPC for Nonlinear Discrete-Time Systems, pages 239-254. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
[12] D. Mayne. Robust and stochastic model predictive control: Are we going in the right direction? Annual Reviews in Control, 41:184-192, 2016.
[13] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert. Constrained model predictive control: Stability and optimality. Automatica, 36:789-814, 2000.
[14] A. Mesbah. Stochastic model predictive control with active uncertainty learning: A survey on dual control. Annual Reviews in Control, 45:107-117, 2018.
[15] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[16] G. Schildbach, L. Fagiano, Ch. Frei, and M. Morari. The scenario approach for stochastic model predictive control with bounds on closed-loop constraint violations.