Sensitivity-based Data Augmentation for Learning an Approximate Model Predictive Controller
Dinesh Krishnamoorthy
Member, IEEE
Abstract—Recently, there has been a surge of interest in approximating the model predictive control (MPC) law using expert supervised learning techniques, such as deep neural networks (DNN). Approximating the MPC control policy requires labeled training data sets, which are typically obtained by sampling the state space and evaluating the control law by solving the numerical optimization problem offline for each sample. The accuracy of the MPC policy approximation depends on the availability of a large training data set sampled across the entire state space. Although the resulting approximate MPC law can be cheaply evaluated online, generating a large number of training samples to learn the MPC control law can be time consuming and prohibitively expensive. This paper aims to address this issue, and proposes the use of NLP sensitivities to cheaply generate additional training samples in the neighborhood of the existing samples.
*This work was supported by the Research Council of Norway under the IKTPLUSS program (project number 299585).
Dinesh Krishnamoorthy is with the Department of Chemical Engineering, Norwegian University of Science and Technology, 7491 Trondheim, Norway (e-mail: [email protected]).

I. INTRODUCTION

Model predictive control (MPC) is a popular control strategy for constrained multivariable systems that is based on repeatedly solving a receding-horizon optimal control problem at each sampling time of the controller. As the range of MPC applications extends beyond the traditional process industries, additional challenges such as computational effort and memory footprint need to be addressed. One approach to eliminate the need for solving optimization problems online is to predetermine the optimal control policy u* = π(x) as a function of the states x.

This idea was first proposed in the context of explicit MPC for constrained linear quadratic systems, where the MPC feedback law is expressed as a piecewise-affine function defined on polytopes [1], [2]. However, this can quickly become computationally intractable for large systems, since the number of polytopic regions grows exponentially with the number of decision variables and constraints. The extension to nonlinear systems is also not straightforward.

An alternative approach is to use a parametric function approximator, such as an artificial neural network (ANN), to approximate the MPC control law. Although this idea dates back to the mid 90s [3], the use of neural networks to approximate the MPC control law remained more or less dormant until very recently. Motivated by the recent developments and promises of deep learning techniques, there has been an unprecedented surge of interest in the past couple of years in approximating the MPC policy using deep neural networks. This interest has resulted in a number of research works from several research groups published just in the past couple of years; see for example [4]–[11], to name a few.

The underlying framework adopted in these works is as follows. The feasible state space is sampled offline to generate a finite number of N_s discrete states {x_i}_{i=1}^{N_s}. The NMPC problem is solved offline for each discrete state as the initial condition to obtain the corresponding optimal control input u*_i = π_mpc(x_i) for all i = 1, ..., N_s. The resulting MPC control law π_mpc(·) is then approximated using any suitable regression technique with {(x_i, u*_i)}_{i=1}^{N_s} as the training data set, such that the trained model π_approx(·) can be used online to cheaply evaluate the optimal control input. This approach is also studied more generally in the context of policy approximation, where such a framework is known as "expert supervised learning" [12].

However, one of the main bottlenecks of this approach is that generating the training data set can be time consuming and prohibitively expensive. The availability of a large training data set covering the entire feasible state space is a key stipulation in using deep learning techniques and has a major impact on the accuracy of the approximate policy. This implies that the sample size N_s must be sufficiently large, covering the entire feasible state space. One then typically has to solve a large number of nonlinear programming (NLP) problems offline in order to generate adequate training samples.
This challenge is only amplified for higher-dimensional systems, since the number of samples N_s required to adequately cover the feasible state space increases exponentially. For example, the authors in [9] reported a computation time of roughly 500 hours on a Quad-Core PC (without parallelization of the sampling and validation) to learn the approximate MPC control law for the case study considered in their work. Other works also report the need for a large training data set to adequately approximate the MPC control law.

In the field of machine learning and deep neural networks, the problem of insufficient training data samples is typically addressed using a process known as "data augmentation", which is a strategy to artificially increase the number of training samples using computationally inexpensive transformations [13], [14]. This has been extensively studied in the context of deep learning for image classification problems, where geometric transformations (such as rotation, cropping, etc.) and photometric transformations (such as color, contrast, brightness, etc.) are often used to augment the existing data set with artificially generated training samples. Unfortunately, such data augmentation techniques are not applicable in the context of MPC policy approximation.

This paper aims to address the key issue of generating the training data samples by exploiting the NLP sensitivities to generate multiple training samples from the solution of a single optimization problem solved offline. That is, the MPC problem solved offline can be considered as a parametric optimization problem parameterized with respect to the initial state x_i. The NLP sensitivity then tells us how the optimal solution u*_i changes for perturbations Δx in the neighborhood of x_i. Therefore, using the solution to one parametric optimization problem solved for x_i, we can cheaply generate multiple training data samples for other state realizations in the neighborhood of x_i using the NLP sensitivity (also known as a tangential predictor). This only requires computing the solution to a system of linear equations, which is much cheaper to evaluate than solving a nonlinear programming problem.

To this end, the aim of this paper is not to present a new MPC approximation algorithm, but rather to address the pivotal issue of generating training data samples, which would facilitate practical implementation of the approximate explicit MPC framework. Thus, the main contribution of this paper is a sensitivity-based data augmentation technique to efficiently and cheaply generate training data samples that can be used to approximate the MPC control law.

The remainder of the paper is organized as follows. Section II formulates the problem and recalls the approximate explicit MPC framework. The sensitivity-based data augmentation technique to efficiently generate the training samples is presented in Section III. The proposed approach is illustrated using two different examples in Section IV before concluding the paper in Section V.

II. PRELIMINARIES

A. Problem Formulation
Consider a discrete-time nonlinear system

x(t+1) = f(x(t), u(t))     (1)

where x(t) ∈ R^{n_x} and u(t) ∈ R^{n_u} are the states and control input at time t, respectively. The mapping f: R^{n_x} × R^{n_u} → R^{n_x} denotes the plant model. The MPC problem P(x(t)) is formulated as

V_N(x(t)) = min_{x(·|t), u(·|t)}  Σ_{k=0}^{N−1} ℓ(x(k|t), u(k|t))     (2a)
    s.t.  x(k+1|t) = f(x(k|t), u(k|t))     (2b)
          x(k|t) ∈ X,  u(k|t) ∈ U     (2c)
          x(N|t) ∈ X_f     (2d)
          x(0|t) = x(t)     (2e)

where ℓ: R^{n_x} × R^{n_u} → R denotes the stage cost, which may be either a tracking or an economic objective, N is the length of the prediction horizon, (2c) denotes the path constraints, (2d) denotes the terminal constraint, and (2e) denotes the initial condition constraint. In the traditional MPC paradigm, the optimization problem (2) is solved at each sample time t using x(t) as the state feedback, and the optimal input u*(0|t) is injected into the plant in a receding horizon fashion. This implicitly leads to the control law

u*(t) = π_mpc(x(t))     (3)
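The following is a minimal sketch of how the parametric MPC problem P(x(t)) in (2) can be set up and solved offline. CasADi is assumed as the modeling layer and IPOPT as the solver (the paper only states that IPOPT was used in the examples); the model, stage cost, horizon, and bounds below are illustrative placeholders, and the terminal constraint (2d) is omitted for brevity.

# Sketch of the parametric NLP P(x(t)) in (2), with p = x(t) as the parameter.
# Assumes CasADi + IPOPT; model f, cost l, horizon N and bounds are placeholders.
import casadi as ca
import numpy as np

nx, nu, N = 2, 1, 20                                   # dimensions and horizon (placeholders)
x = ca.SX.sym('x', nx)
u = ca.SX.sym('u', nu)
f = ca.Function('f', [x, u], [x + 0.1 * ca.vertcat(x[1], -x[0] + u)])   # placeholder model
l = ca.Function('l', [x, u], [ca.sumsqr(x) + 1e-2 * ca.sumsqr(u)])      # placeholder stage cost

# Decision variables w = [u(0),...,u(N-1), x(1),...,x(N)], parameter p = x(0|t) = x(t)
U = ca.SX.sym('U', nu, N)
X = ca.SX.sym('X', nx, N)
p = ca.SX.sym('p', nx)

J, g, xk = 0, [], p
for k in range(N):
    J = J + l(xk, U[:, k])                             # stage cost (2a)
    g.append(X[:, k] - f(xk, U[:, k]))                 # shooting constraints (2b)
    xk = X[:, k]

w = ca.vertcat(ca.reshape(U, nu * N, 1), ca.reshape(X, nx * N, 1))
nlp = {'x': w, 'p': p, 'f': J, 'g': ca.vertcat(*g)}
solver = ca.nlpsol('solver', 'ipopt', nlp, {'ipopt.print_level': 0, 'print_time': 0})

def solve_P(x0):
    """Solve P(x0) offline and return the first optimal input u*(0|t), or None if infeasible."""
    lbx = np.concatenate([-2.0 * np.ones(nu * N), -np.inf * np.ones(nx * N)])  # input/state bounds (2c)
    ubx = np.concatenate([+2.0 * np.ones(nu * N), +np.inf * np.ones(nx * N)])
    sol = solver(x0=np.zeros(nu * N + nx * N), p=x0, lbx=lbx, ubx=ubx, lbg=0, ubg=0)
    if not solver.stats()['success']:
        return None
    return sol['x'].full().ravel()[:nu]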
B. Approximate Explicit MPC

This subsection recalls the underlying idea of the approximate explicit MPC framework common to works such as [3], [5], [8] and [9]. To approximate the MPC control law (3), the feasible state space X is sampled to generate N_s randomly chosen initial states {x_i}_{i=1}^{N_s}. For each initial state x_i, the MPC problem P(x_i) is solved to obtain the corresponding optimal input u*_i = π_mpc(x_i). Using the data samples D := {(x_i, u*_i)}_{i=1}^{N_s}, any desirable functional form π_approx(x; θ), parameterized by the parameters θ, is trained in order to minimize the mean squared error

θ̂ = arg min_θ (1/N_s) Σ_{i=1}^{N_s} ‖π_approx(x_i; θ) − u*_i‖²     (4)

This is summarized in Algorithm 1.

Algorithm 1 Learning an approximate MPC control law
Input: P(x), X, D = ∅
for i = 1, ..., N_s do
    Sample x_i ∈ X
    u*_i ← Solve P(x_i)
    D ← D ∪ {(x_i, u*_i)}
end for
θ̂ ← arg min_θ (1/N_s) Σ_{i=1}^{N_s} ‖π_approx(x_i; θ) − u*_i‖²
Output: π_approx(x; θ̂)

Deep neural networks have become a popular choice of functional form for approximating the MPC control law. For a deep neural network with L hidden layers and M neurons in each hidden layer,

π_approx(x; θ) = h_{L+1} ∘ α_L ∘ h_L ∘ ··· ∘ α_1 ∘ h_1     (5)

Each hidden layer consists of an affine function h_l(ξ_{l−1}) = W_l^T ξ_{l−1} + b_l for all l = 1, ..., L, where ξ_{l−1} ∈ R^M is the output of the previous layer and ξ_0 = x. α_l(h_l) denotes a nonlinear activation function, such as the sigmoid or the rectified linear unit (ReLU), applied element-wise. The parameter vector θ contains all the weights W_l and biases b_l.

Once the network is trained, the approximate control law π_approx(x; θ̂) can be used online to cheaply evaluate the optimal control input.
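A minimal sketch of Algorithm 1 is given below. It reuses the hypothetical solve_P helper from the previous sketch and assumes scikit-learn's MLPRegressor as the function approximator (the paper does not specify a training framework); the feasible box X, the sample size N_s, and the architecture of three hidden layers with 10 ReLU neurons follow the examples only loosely and are illustrative.

# Sketch of Algorithm 1 (expert supervised learning), assuming solve_P from the
# previous sketch and scikit-learn's MLPRegressor as the DNN (4)-(5).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x_lb, x_ub = np.array([-1.0, -1.0]), np.array([1.0, 1.0])   # feasible box X (placeholder)
Ns = 500

X_data, U_data = [], []
for _ in range(Ns):
    xi = rng.uniform(x_lb, x_ub)          # Sample x_i in X
    ui = solve_P(xi)                      # u*_i <- Solve P(x_i) offline
    if ui is not None:                    # keep only feasible samples
        X_data.append(xi)
        U_data.append(ui)

# Fit pi_approx(x; theta) by minimizing the mean squared error (4)
policy = MLPRegressor(hidden_layer_sizes=(10, 10, 10), activation='relu', max_iter=5000)
policy.fit(np.array(X_data), np.array(U_data).ravel())

u_online = policy.predict(np.array([[0.2, -0.3]]))   # cheap online evaluation of pi_approx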
III. PROPOSED METHOD

As mentioned in the previous section, generating the training data requires solving N_s numerical optimization problems, which can be time consuming and computationally expensive. This section leverages the NLP sensitivity to cheaply generate training data samples that can be used to learn the MPC control law. To keep the notation light, we rewrite the MPC problem (2) as a standard parametric NLP of the form

V_N(p) = min_w  J(w, p)     (6a)
         s.t.  c(w, p) = 0     (6b)
               g(w, p) ≤ 0     (6c)

where p = x(0|t) = x(t) is the initial state, the decision variables are w := [u(0|t), ..., u(N−1|t), x(1|t), ..., x(N|t)]^T, the cost (2a) is denoted by (6a), the system equations (2b) are denoted by (6b), and the path constraints (2c) and terminal constraint (2d) are collectively denoted by (6c). Since the focus is on solving the MPC problem offline, we drop the time dependency of the initial state and simply denote the initial condition as x instead of x(t).

The Lagrangian of (6) is given by

L(w, p, λ, μ) := J(w, p) + λ^T c(w, p) + μ^T g(w, p)     (7)

where λ and μ are the Lagrange multipliers of (6b) and (6c), respectively, and the KKT conditions for this problem are given by

∇_w L(w, p, λ, μ) = 0     (8a)
c(w, p) = 0     (8b)
g(w, p) ≤ 0     (8c)
μ_i g_i(w, p) = 0,  μ_i ≥ 0  ∀i     (8d)

Any point s*(p) := [w*^T, λ*^T, μ*^T]^T that satisfies the KKT conditions (8) for a given initial condition p is known as a KKT point for p. We define the set of active inequality constraints g_A(w, p) ⊆ g(w, p) such that g_A(w, p) = 0, and strict complementarity is said to hold if the corresponding Lagrange multipliers satisfy μ_A > 0. The set of KKT conditions can be represented compactly as ϕ(s(p), p) = 0.

Theorem 1 ([15]). Let J(·,·), c(·,·), and g(·,·) of the parametric NLP problem P(p) be twice continuously differentiable in a neighborhood of the KKT point s*(p). Further, let the linear independence constraint qualification (LICQ), the second-order sufficient conditions (SOSC), and strict complementarity hold at the solution vector s*(p). Then
• s*(p) is a unique local minimizer of P(p);
• for parametric perturbations Δp in the neighborhood of p, there exists a unique, continuous, and differentiable vector function s*(p + Δp), which is a KKT point satisfying LICQ and SOSC for P(p + Δp);
• there exist positive Lipschitz constants L_s and L_V such that the solution vector and the optimal cost satisfy ‖s*(p + Δp) − s*(p)‖ ≤ L_s‖Δp‖ and ‖V_N(p + Δp) − V_N(p)‖ ≤ L_V‖Δp‖, respectively.

Proof. See [15].

Linearizing the KKT conditions ϕ(s(p), p) = 0 around s*(p) gives

ϕ(s*(p + Δp), p + Δp) = ϕ(s*(p), p) + (∂/∂p)[ϕ(s*(p), p)] Δp + O(‖Δp‖²)
Consequently,

(∂/∂p)[ϕ(s*(p), p)] Δp = ( M (∂s*/∂p) + N ) Δp ≈ 0     (9)

where

M := [ ∇_{ww} L(s*(p))    ∇_w c(w*(p))    ∇_w g_A(w*(p))
       ∇_w c(w*(p))^T     0               0
       ∇_w g_A(w*(p))^T   0               0 ]

is the KKT matrix, and

N := [ ∇_{wp} L(s*(p))
       ∇_p c(w*(p))^T
       ∇_p g_A(w*(p))^T ]

Therefore, the solution of the neighboring problem P(p + Δp) can be obtained from

ŝ*(p + Δp) = s*(p) + (∂s*/∂p) Δp     (10)

where ŝ*(p + Δp) is the approximate primal-dual solution of the optimization problem P(p + Δp). Hence, Δs* := ŝ*(p + Δp) − s*(p) can be computed as the solution to the system of linear equations

M Δs* = −N Δp     (11)

From this we can see that, once the solution to the NLP problem P(x_i) is available for a given initial state x_i ∈ X, we can exploit the parametric property of the NLP to compute a fast approximate solution for an additional finite set of j = 1, ..., N_p optimization problems P(x_i + Δx_j) with initial states x_i + Δx_j ∈ X in the neighborhood of x_i. The corresponding approximate optimal input, denoted by û*_j, can then be cheaply evaluated by solving the system of linear equations (11). By exploiting the sensitivities, one can thus generate N_s N_p training samples by solving only N_s optimization problems offline. The pseudo-code for the proposed sensitivity-based data augmentation technique to learn an approximate MPC control law is summarized in Algorithm 2.

Algorithm 2 Sensitivity-based data augmentation to learn an approximate MPC control law
Input: P(x), X, D = ∅
for i = 1, ..., N_s do
    Sample x_i ∈ X
    s*(x_i) ← Solve P(x_i)
    Extract u*_i from the solution vector s*(x_i)
    D ← D ∪ {(x_i, u*_i)}
    for j = 1, ..., N_p do
        Sample Δx_j in the neighborhood of x_i
        ŝ*(x_i + Δx_j) ← s*(x_i) − M^{-1} N Δx_j
        Extract û*_j from the solution vector ŝ*(x_i + Δx_j)
        D ← D ∪ {(x_i + Δx_j, û*_j)}
    end for
end for
θ̂ ← arg min_θ (1/(N_s N_p)) Σ_{i=1}^{N_s N_p} ‖π_approx(x_i; θ) − u*_i‖²
Output: π_approx(x; θ̂)
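The following is a minimal numerical sketch of the sensitivity update (9)–(11), assuming CasADi for the derivatives and IPOPT for the reference solution. For simplicity only equality constraints are shown; active inequality constraints g_A would be appended to ϕ in the same way, and updates that change the active set would be discarded or corrected with a predictor QP (see Remark 2 below). The toy problem, p0, and Δp are illustrative.

# Sketch of the tangential predictor (10)-(11): build phi, M = d(phi)/ds, N = d(phi)/dp,
# then solve M ds = -N dp. Assumes CasADi; the parametric NLP below is a toy example.
import casadi as ca
import numpy as np

w = ca.SX.sym('w', 2)
p = ca.SX.sym('p', 1)
J = (w[0] - p)**2 + (w[1] - 1)**2
c = ca.vertcat(w[0] + w[1] - 2 * p)

lam = ca.SX.sym('lam', c.numel())
L = J + ca.dot(lam, c)                       # Lagrangian (7), equality constraints only
s = ca.vertcat(w, lam)                       # primal-dual vector s = [w; lam]

phi = ca.vertcat(ca.gradient(L, w), c)       # KKT residual phi(s, p)
M = ca.Function('M', [s, p], [ca.jacobian(phi, s)])   # KKT matrix
N = ca.Function('N', [s, p], [ca.jacobian(phi, p)])

# Solve the NLP once at p = p0 and keep the primal-dual solution s*(p0)
solver = ca.nlpsol('solver', 'ipopt', {'x': w, 'p': p, 'f': J, 'g': c},
                   {'ipopt.print_level': 0, 'print_time': 0})
p0 = 0.5
sol = solver(x0=[0, 0], p=p0, lbg=0, ubg=0)
s_star = np.concatenate([sol['x'].full().ravel(), sol['lam_g'].full().ravel()])

# Sensitivity update (11): for this convex quadratic toy problem the predictor is exact
dp = 0.1
ds = np.linalg.solve(M(s_star, p0).full(), -N(s_star, p0).full() @ np.array([dp]))
s_hat = s_star + ds                          # approximate solution of P(p0 + dp), cf. (10)
print(s_hat[:2], solver(x0=[0, 0], p=p0 + dp, lbg=0, ubg=0)['x'].full().ravel())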
Remark 1. In the case of linear MPC, where ℓ(·,·) is convex quadratic and f(·) is linear, the sensitivity update is exact, i.e., ŝ*(x + Δx) = s*(x + Δx).
Remark 2. If the perturbation Δx_j induces a change in the set of active constraints, then one would have to solve a quadratic programming problem, often known as a predictor QP, in order to obtain the approximate solution ŝ*(x + Δx) [16], which may still be computationally cheaper than solving a full NLP problem. Alternatively, one can simply discard the sensitivity updates that induce a change in the active constraint set.

Note that the idea of exploiting the parametric property of the MPC problem with respect to the initial states is also used in other parts of the MPC literature, such as the advanced-step MPC [17], [18] and the adaptive horizon MPC [19], [20].

The proposed sensitivity-based data augmentation scheme can also be utilized by parameterizing the optimization problem with respect to other time-varying parameters, such as exogenous disturbances or time-varying setpoints, in addition to the initial states. The proposed approach is also not restricted to the MPC formulation (2), but can also be used with other variants of the MPC formulation, such as the multistage scenario-based MPC used in [21].

IV. ILLUSTRATIVE EXAMPLES
A. Benchmark CSTR
We now apply the proposed approach to a benchmark CSTR problem from [22], which was also used in the context of approximate MPC in [9]. This problem has two states, namely the concentration and the reactor temperature (denoted by x_1 and x_2, respectively). The process is controlled using the coolant flow rate u. The model is given by

ẋ_1 = (1/τ)(1 − x_1) − k x_1 e^{−M/x_2}
ẋ_2 = (1/τ)(x_f − x_2) + k x_1 e^{−M/x_2} − α u (x_2 − x_c)

and the model parameters are τ = 20, k = 300, M = 5, x_f = 0.3947, x_c = 0.3816, and α = 0.117. The feasible state space X and the input set U are given by box constraints on x and u, and the setpoint is x_sp = [0.2632, 0.6519]^T. The stage cost is given by ℓ(x, u) = ‖x − x_sp‖² plus a small quadratic penalty on ‖u‖². The MPC problem is solved with a sampling time of 3 s and a prediction horizon of N = 140.

One approach to generating the learning samples is to use a grid-based sampling approach as done in [9], where the state space X is divided into a finite number of uniform grids. The optimal input u* = π_mpc(x) is then evaluated at each grid point. In general, a small grid size is preferred, since this improves the MPC approximation. However, this leads to a large sample size N_s.

The proposed approach enables us to choose a relatively larger grid size, where the corresponding optimal input π_mpc(x) is evaluated by solving the optimization problem. Additional grid points can then be generated with a smaller grid size around each grid point, and the corresponding optimal input can be computed using the sensitivity update (11). This is shown in Fig. 1, where the state space is sampled into grids with a larger grid size (shown as red circles) with an interval of 0.0211 for x_1 and x_2, and around each grid point the state space is further sampled with a smaller grid size (shown as black dots) with an interval of 0.0052 for x_1 and x_2.

Fig. 1: Example A: Grid-based sampling of the state space X. Red circles denote the samples where the corresponding optimal input is generated by solving the full optimization problem, and black dots denote the samples where the corresponding optimal input is generated using the sensitivity update.

Fig. 2: Example A: Closed-loop simulation results comparing the performance of the traditional online MPC π_mpc(x) (blue) and the approximate explicit MPC π_approx(x) (red).

Using this approach, we were able to generate a total of 6968 training samples, out of which 268 training samples (shown as red circles) were generated by solving a numerical optimization problem, and 6700 training samples (shown as black dots) were generated using the sensitivity update. Note that not all grid points have a feasible solution; hence only the feasible points are included in the training data set D and shown in Fig. 1.

The optimization problem was solved offline using IPOPT [23]. The samples were generated on a 2.6 GHz processor with 16 GB memory. The minimum, maximum, and average CPU times for generating the training samples using the full optimization problem and the sensitivity update are summarized in Table I, from which it can be seen that the CPU times differ roughly by a factor of 100.

TABLE I: Example A: CPU time in [s] for generating the training samples (min / avg / max).
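A minimal sketch of the two-level grid in Fig. 1 is shown below: a coarse grid over X at which the full NLP would be solved, and a 5×5 fine neighborhood around each coarse point to be filled in with the sensitivity update (11). The coarse and fine spacings (0.0211 and 0.0052) are taken from the text, whereas the box bounds of X and the 5×5 neighborhood layout are illustrative assumptions; the counts printed here are before discarding infeasible points.

# Sketch of the coarse/fine grid sampling of Example A (bounds and layout assumed).
import numpy as np
from itertools import product

x1_rng, x2_rng = (0.0632, 0.4632), (0.4519, 0.8519)    # assumed bounds of the box X
coarse, fine = 0.0211, 0.0052                          # grid spacings from the text

x1_coarse = np.arange(x1_rng[0], x1_rng[1] + 1e-9, coarse)
x2_coarse = np.arange(x2_rng[0], x2_rng[1] + 1e-9, coarse)
offsets = [k * fine for k in (-2, -1, 0, 1, 2)]        # 25 augmented points per coarse point

coarse_points, fine_points = [], []
for x1, x2 in product(x1_coarse, x2_coarse):
    coarse_points.append((x1, x2))                     # solve the full NLP P(x_i) here
    for d1, d2 in product(offsets, offsets):
        fine_points.append((x1 + d1, x2 + d2))         # sensitivity update (11) here

print(len(coarse_points), len(fine_points))            # grid sizes before feasibility check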
Using the generated training samples, we approximate the MPC control law using a deep neural network with L = 3 hidden layers and M = 10 neurons with rectified linear units (ReLU) as the activation function in each layer.¹

¹Note that the hyperparameter tuning of the network architecture is not the focus of this paper, and one may find an alternative/better network architecture than the one used here. Source codes for the simulation results presented in this paper can be found in this link.

From the generated training samples, 4878 samples were used for training, 1045 samples were used for validation, and 1045 samples were used for testing.

Fig. 2 shows the closed-loop simulation results using the approximate MPC control law π_approx(x) (shown in red) compared with the traditional MPC control law π_mpc(x) (shown in blue). From this it can be seen that, by using the proposed approach, we can choose a relatively larger grid size between the samples (sparse sampling), which reduces the number of optimization problems that need to be solved offline. Consequently, the overall time and computational cost required to generate the training samples is significantly smaller.
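A sketch of the kind of closed-loop comparison shown in Fig. 2 is given below: the CSTR is rolled out once under the online MPC and once under the learned policy. The helpers solve_P and policy are the hypothetical objects from the earlier sketches, here assumed to have been rebuilt with the CSTR dynamics, horizon, and cost of this example; the model parameters follow the benchmark values stated above, and the initial state and rollout length are illustrative.

# Sketch of the closed-loop rollout behind Fig. 2 (assumes solve_P and policy from above).
import numpy as np

tau, k, M, xf, xc, alpha = 20.0, 300.0, 5.0, 0.3947, 0.3816, 0.117
dt = 3.0                                               # sampling time of 3 s

def cstr_rhs(x, u):
    r = k * x[0] * np.exp(-M / x[1])
    return np.array([(1.0 - x[0]) / tau - r,
                     (xf - x[1]) / tau + r - alpha * u * (x[1] - xc)])

def step(x, u):                                        # one RK4 step over the sampling time
    k1 = cstr_rhs(x, u); k2 = cstr_rhs(x + dt/2*k1, u)
    k3 = cstr_rhs(x + dt/2*k2, u); k4 = cstr_rhs(x + dt*k3, u)
    return x + dt/6*(k1 + 2*k2 + 2*k3 + k4)

x_mpc = x_nn = np.array([0.4, 0.5])                    # illustrative initial state
for t in range(100):
    u_mpc = solve_P(x_mpc)                             # online NMPC: solve P(x(t)) each step
    u_nn = policy.predict(x_nn.reshape(1, -1))[0]      # approximate MPC: one forward pass
    x_mpc, x_nn = step(x_mpc, u_mpc[0]), step(x_nn, u_nn)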
B. Building Climate Control

We now illustrate the proposed approach on a building climate control problem, for which several works have considered MPC as the control strategy [10], [24]. In our simulations, we model the heat dynamics of a building based on the modeling framework from [25], as shown below:

dT_s/dt = (1/(R_is C_s))(T_i − T_s)
dT_i/dt = (1/(R_is C_i))(T_s − T_i) + (1/(R_ih C_i))(T_h − T_i) + (A_w Φ)/C_i + (1/(R_ie C_i))(T_e − T_i) + (1/(R_ia C_i))(T_a − T_i)
dT_h/dt = (1/(R_ih C_h))(T_i − T_h) + u/C_h
dT_e/dt = (1/(R_ie C_e))(T_i − T_e) + (1/(R_ea C_e))(T_a − T_e) + (A_e Φ)/C_e

where the subscripts (·)_s, (·)_i, (·)_h, (·)_e, and (·)_a denote the sensor, building interior, heater, building envelope, and ambient, respectively. T denotes the temperature, R denotes the thermal resistance, C denotes the heat capacity, and u denotes the heat flux. The solar irradiation Φ enters the building interior through the effective window area A_w, in addition to heating the building envelope with effective area A_e. The states are given by x = [T_s, T_i, T_h, T_e]^T, with each temperature bounded below by 12 °C. The ambient temperature T_a (in °C) and the solar irradiation Φ (in kW/m²) are considered as external disturbances. The parameter values used in the model are shown in Table III.
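As a concrete rendering of the four-state RC model above, the following sketch implements its right-hand side using the parameter values from Table III; the function name and the sample Euler step are illustrative.

# Sketch of the RC building model: x = [T_s, T_i, T_h, T_e], heat input u [kW],
# ambient temperature Ta [degC], solar irradiation Phi [kW/m^2]. With R in degC/kW and
# C in kWh/degC, the time unit is hours.
import numpy as np

R_is, R_ih, R_ie, R_ia, R_ea = 1.89, 0.146, 0.897, 2.5, 0.146     # thermal resistances [degC/kW]
C_s, C_i, C_e, C_h = 0.0549, 0.0928, 3.32, 0.889                  # heat capacities [kWh/degC]
A_e, A_w = 3.87, 5.75                                             # effective areas [m^2]

def building_rhs(x, u, Ta, Phi):
    Ts, Ti, Th, Te = x
    dTs = (Ti - Ts) / (R_is * C_s)
    dTi = ((Ts - Ti) / R_is + (Th - Ti) / R_ih + (Te - Ti) / R_ie
           + (Ta - Ti) / R_ia) / C_i + A_w * Phi / C_i
    dTh = (Ti - Th) / (R_ih * C_h) + u / C_h
    dTe = (Ti - Te) / (R_ie * C_e) + (Ta - Te) / (R_ea * C_e) + A_e * Phi / C_e
    return np.array([dTs, dTi, dTh, dTe])

# one explicit Euler step over the 1-minute sampling time (1/60 h) used by the MPC
x0 = np.array([20.0, 20.0, 20.0, 15.0])
x_next = x0 + (1.0 / 60.0) * building_rhs(x0, u=1.0, Ta=5.0, Phi=0.1)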
TABLE II: Example B: CPU time in [s] for generating the training samples (min / avg / max).

The objective is to drive the interior temperature T_i to a desired setpoint T_i^sp ≥ 18 °C, while penalizing the rate of change of the heat input u (in kW). The stage cost is then given by ℓ(x, u) = (T_i − T_i^sp)² plus a small quadratic penalty on the input rate of change Δu. The MPC problem is formulated with a sampling time of 1 min and a prediction horizon of N = 3 hours. The goal is to approximate the MPC control law π_approx(x̃). In this example, x̃ = [T_s, T_i, T_h, T_e, T_i^sp, T_a, Φ, u]^T, which requires sampling an 8-dimensional space in order to generate the training samples.

We randomly generate a total of 6930 samples, out of which 330 samples were generated by solving the optimization problem and 6600 samples were generated using the sensitivity update. The minimum, average, and maximum CPU times for solving the optimization problem and for computing the sensitivity update are shown in Table II.

Using the generated training samples, we approximate the MPC control law using a deep neural network with L = 3 hidden layers and M = 10 neurons with rectified linear units (ReLU) as the activation function in each layer. From the generated training samples, 4850 samples were used for training, 1040 samples were used for validation, and 1040 samples were used for testing.

We test the performance of the approximate control law for a total simulation time of 12 hours, with changes in the setpoint (at t = 3 h), the solar irradiation (at t = 6 h), and the ambient temperature (at t = 9 h). Fig. 3 shows the closed-loop simulation results using the traditional MPC control law π_mpc(x̃), obtained by solving the MPC problem online, and the performance of the approximate explicit MPC control law π_approx(x̃), approximated using the training samples generated by Algorithm 2. From this it can be seen that the proposed sensitivity-based data augmentation framework can be used to parameterize the MPC problem with respect to the measured disturbances, setpoints, and the control input, in addition to the states, in order to handle time-varying disturbances and setpoints.

Fig. 3: Example B: Simulation results comparing the closed-loop performance of the traditional online MPC π_mpc(x̃) and the approximate explicit MPC π_approx(x̃).

TABLE III: Example B: Parameters used in the building climate control problem.
R_is  Thermal resistance between interior & sensor    1.89    °C/kW
R_ih  Thermal resistance between interior & heater    0.146   °C/kW
R_ie  Thermal resistance between interior & envelope  0.897   °C/kW
R_ia  Thermal resistance between interior & ambient   2.5     °C/kW
R_ea  Thermal resistance between envelope & ambient   0.146   °C/kW
C_s   Heat capacity of the sensor                     0.0549  kWh/°C
C_i   Heat capacity of the interior                   0.0928  kWh/°C
C_e   Heat capacity of the envelope                   3.32    kWh/°C
C_h   Heat capacity of the heater                     0.889   kWh/°C
A_e   Effective area of solar irradiation             3.87    m²
A_w   Effective window area                           5.75    m²

V. CONCLUSIONS

To conclude, this brief paper addresses an important implementation aspect of approximate explicit MPC design, namely the cost of generating the training samples. Algorithm 2 exploits the parametric sensitivities to cheaply generate several training samples using the solution of a single optimization problem. It was shown that, by using the proposed approach, one can
• sample the state space relatively sparsely, hence reducing the number of optimization problems that need to be solved offline, and
• augment the data set with additional samples using the NLP sensitivity.
The proposed scheme can be used with any MPC formulation that can be cast as a nonlinear programming problem.

REFERENCES
[1] A. Bemporad, M. Morari, V. Dua, and E. N. Pistikopoulos, "The explicit linear quadratic regulator for constrained systems," Automatica, vol. 38, no. 1, pp. 3–20, 2002.
[2] P. Tøndel, T. A. Johansen, and A. Bemporad, "An algorithm for multi-parametric quadratic programming and explicit MPC solutions," Automatica, vol. 39, no. 3, pp. 489–497, 2003.
[3] T. Parisini and R. Zoppoli, "A receding-horizon regulator for nonlinear systems and a neural approximation," Automatica, vol. 31, no. 10, pp. 1443–1451, 1995.
[4] A. Chakrabarty, V. Dinh, M. J. Corless, A. E. Rundell, S. H. Żak, and G. T. Buzzard, "Support vector machine informed explicit nonlinear model predictive control using low-discrepancy sequences," IEEE Transactions on Automatic Control, vol. 62, no. 1, pp. 135–148, 2017.
[5] B. Karg and S. Lucia, "Efficient representation and approximation of model predictive control laws via deep learning," IEEE Transactions on Cybernetics, 2020.
[6] S. Chen, K. Saulnier, N. Atanasov, D. D. Lee, V. Kumar, G. J. Pappas, and M. Morari, "Approximating explicit model predictive control using constrained neural networks," in Proc. American Control Conference (ACC). IEEE, 2018, pp. 1520–1527.
[7] L. H. Csekő, M. Kvasnica, and B. Lantos, "Explicit MPC-based RBF neural network controller design with discrete-time actual Kalman filter for semiactive suspension," IEEE Transactions on Control Systems Technology, vol. 23, no. 5, pp. 1736–1753, 2015.
[8] J. A. Paulson and A. Mesbah, "Approximate closed-loop robust model predictive control with guaranteed stability and constraint satisfaction," IEEE Control Systems Letters, 2020.
[9] M. Hertneck, J. Köhler, S. Trimpe, and F. Allgöwer, "Learning an approximate model predictive controller with guarantees," IEEE Control Systems Letters, vol. 2, no. 3, pp. 543–548, 2018.
[10] J. Drgoňa, D. Picard, M. Kvasnica, and L. Helsen, "Approximate model predictive building control via machine learning," Applied Energy, vol. 218, pp. 199–216, 2018.
[11] X. Zhang, M. Bujarbaruah, and F. Borrelli, "Safe and near-optimal policy learning for model predictive control using primal-dual neural networks," in Proc. American Control Conference (ACC). IEEE, 2019, pp. 354–359.
[12] D. P. Bertsekas, Reinforcement Learning and Optimal Control. Belmont, MA: Athena Scientific, 2019.
[13] T. Tran, T. Pham, G. Carneiro, L. Palmer, and I. Reid, "A Bayesian data augmentation approach for learning deep models," in Advances in Neural Information Processing Systems, 2017, pp. 2797–2806.
[14] L. Taylor and G. Nitschke, "Improving deep learning using generic data augmentation," arXiv preprint arXiv:1708.06020, 2017.
[15] A. V. Fiacco, "Sensitivity analysis for nonlinear programming using penalty methods," Mathematical Programming, vol. 10, no. 1, pp. 287–311, 1976.
[16] J. F. Bonnans and A. Shapiro, "Optimization problems with perturbations: A guided tour," SIAM Review, vol. 40, no. 2, pp. 228–264, 1998.
[17] V. M. Zavala and L. T. Biegler, "The advanced-step NMPC controller: Optimality, stability and robustness," Automatica, vol. 45, no. 1, pp. 86–93, 2009.
[18] J. Jäschke, X. Yang, and L. T. Biegler, "Fast economic model predictive control based on NLP-sensitivities," Journal of Process Control, vol. 24, no. 8, pp. 1260–1272, 2014.
[19] D. W. Griffith, L. T. Biegler, and S. C. Patwardhan, "Robustly stable adaptive horizon nonlinear model predictive control," Journal of Process Control, vol. 70, pp. 109–122, 2018.
[20] D. Krishnamoorthy, L. T. Biegler, and J. Jäschke, "Adaptive horizon economic nonlinear model predictive control," Journal of Process Control, vol. 92, pp. 108–118, 2020.
[21] S. Lucia and B. Karg, "A deep learning-based approach to robust nonlinear model predictive control," IFAC-PapersOnLine, vol. 51, no. 20, pp. 511–516, 2018.
[22] D. Q. Mayne, E. C. Kerrigan, E. Van Wyk, and P. Falugi, "Tube-based robust nonlinear model predictive control," International Journal of Robust and Nonlinear Control, vol. 21, no. 11, pp. 1341–1353, 2011.
[23] A. Wächter and L. T. Biegler, "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming," Mathematical Programming, vol. 106, no. 1, pp. 25–57, 2006.
[24] S. Prívara, Z. Váňa, J. Cigler, F. Oldewurtel, and J. Komárek, "Role of MPC in building climate control," in Computer Aided Chemical Engineering, vol. 29. Elsevier, 2011, pp. 728–732.
[25] P. Bacher and H. Madsen, "Identifying suitable models for the heat dynamics of buildings," Energy and Buildings, vol. 43, no. 7, pp. 1511–1522, 2011.