Learning Models of Model Predictive Controllers using Gradient Data
Rebecka Winqvist, Arun Venkitaraman, Bo Wahlberg

Division of Decision and Control Systems, EECS, KTH Royal Institute of Technology, Stockholm, Sweden (e-mails: {rebwin, arunv, bo}@kth.se).

Abstract:
This paper investigates controller identification given data from a Model Predictive Controller (MPC) with constraints. We propose an approach for learning MPC that explicitly uses the gradient information in the training process. This is motivated by the observation that recent differentiable convex optimization MPC solvers can provide both the optimal feedback law from the state to the control input as well as the corresponding gradient. As a proof of concept, we apply this approach to explicit MPC (eMPC), for which the feedback law is a piece-wise affine function of the state, but the number of pieces grows rapidly with the state dimension. Controller identification can here be used to find an approximate, lower-complexity functional representation of the controller. The eMPC is modelled with a Neural Network (NN) with Rectified Linear Units (ReLUs), since such NNs can represent any piece-wise affine function. A motivation is to replace on-line solvers with neural networks to implement MPC and to simplify the evaluation of the function in larger input dimensions. We also study experimental design and model evaluation in this framework, and propose a hit-and-run sampling algorithm for input design. The proposed algorithms are illustrated and numerically evaluated on a second order MPC problem.
Keywords:
Identification for control; data-driven control; neural networks relevant to control and identification; input and excitation design; model predictive control; modeling for control optimization.

⋆ This work was partially supported by the Swedish Research Council and by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation and the Swedish Research Council Research Environment NewLEADS under contract 2016-06079.

1. INTRODUCTION

Controller identification concerns estimating a model of a feedback controller from observed input and output data. This is typically done in a feedback mode, where the controller interacts with a dynamical system. It is a well studied topic, see e.g. (Ljung, 1999), in particular when the system and the controller can be described by linear dynamical models. It is only recently that learning models based on Neural Networks (NNs) have been pursued for Model Predictive Controllers (MPCs) with constraints (Chen et al., 2018; Blanchini, 1999; Chen et al., 2019; Maddalena et al., 2019). A motivation behind these approaches is the recent result that NNs with rectified linear units (ReLUs) can represent piece-wise affine functions, or functions with linear regions (Montufar et al., 2014). This naturally makes such approaches suited to the MPC learning problem (Chen et al., 2018), particularly in the case of explicit MPC, where the optimal control law is an affine function of the state vector (Bemporad, 2013).

Consider an MPC problem with the feedback law u = u*(x) corresponding to the full state information x. The goal of any learning based approach to MPC is then to learn a mapping µ(x) that describes u*(x) as well as possible. In the case of NN based approaches, µ corresponds to the function learnt by a NN. Most existing NN based learning approaches proceed by learning µ using only the input and output observations of x and u, respectively. Like any learning approach, an MPC mapping learnt in such a manner could perform poorly when the training data is limited. In such a scenario, additional structure can often be of merit in aiding the training of the NN. For example, in the case of explicit MPC (eMPC), u is a piece-wise affine function in x and hence the gradient of the optimal control law with respect to x is piecewise constant and contains important structural information useful in learning µ.

Motivated by this observation, we propose a NN-based learning approach for MPC where we explicitly use structural information in the form of gradient data ∂u*/∂x for training the NN. While training data in the form of gradient information is typically unavailable or difficult to obtain, the special structure of the MPC problem and the use of recently proposed tools in differentiable convex optimization (Diamond and Boyd, 2016) help us achieve our goal. The main contributions of the paper are:

• Learning Models: We design and evaluate algorithms to train ReLU based NNs that implement MPC using input and output data u_i = u*(x_i) and corresponding gradient data u'_i = ∂u*(x)/∂x |_{x = x_i}. We show that taking the gradient information into account can significantly reduce the amount of training data needed to achieve a high accuracy. We use eMPC as a proof of concept, while the proposed algorithm can handle more general MPC problems.

• Data generation: It is not obvious how to efficiently generate training data when learning the control law as a mapping.
Grid-based approaches for sampling the input space would work for small state dimensions, but become cumbersome for high-dimensional systems. Keeping this in mind, we study the use of an efficient and statistically motivated hit-and-run sampler that extends well to higher state dimensions.

• Evaluation of performance: The performance of a trained NN based MPC controller should be evaluated in closed loop feedback, taking into account tracking, disturbance rejection and stability. Here, we evaluate the performance on test data in terms of different metrics.
In order to illustrate the identification algorithms to be proposed, consider the following scalar linear feedback example with measurements

$$u_k = l_1 x_k + l_2 + e_k, \quad k = 1, \ldots, N,$$

with control signal u_k, scalar state signal x_k, and additive white zero mean Gaussian noise e_k with variance λ_e. Assume that it is possible to measure the derivative of u with respect to x,

$$u'_k = l_1 + v_k, \quad k = 1, \ldots, N,$$

where v_k is white zero mean Gaussian distributed noise with variance λ_v. The maximum likelihood estimate of l_1 and l_2 given the data {x_k, u_k, u'_k}, k = 1, . . . , N, is found by solving the least squares problem

$$V(l_1, l_2) = \sum_{k=1}^{N} \frac{[u_k - l_1 x_k - l_2]^2}{\lambda_e} + \frac{[u'_k - l_1]^2}{\lambda_v}.$$

The error covariance matrix of the least squares estimate (l̂_1, l̂_2)^T equals, see Ljung (1999),

$$\frac{\lambda_e}{N} \left[ \frac{1}{N} \sum_{k=1}^{N} \begin{pmatrix} x_k^2 + \lambda_e/\lambda_v & x_k \\ x_k & 1 \end{pmatrix} \right]^{-1}.$$

For the choice x_k = 1 it is not possible to estimate l_1 and l_2 individually without the extra gradient data. For this case the estimation error covariance matrix equals

$$\frac{1}{N} \begin{pmatrix} \lambda_v & -\lambda_v \\ -\lambda_v & \lambda_e + \lambda_v \end{pmatrix}.$$

This result can also be found by analyzing u_k − u'_k = l_2 + e_k − v_k: the least squares estimate of l_2 is just the average of this difference signal. Notice that the variance of l̂_1 is lower than the variance of l̂_2, which is expected given the extra information on l_1 and the added noise when estimating l_2. Hence gradient information can be crucial in terms of estimation quality for low input excitation.
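To make the variance comparison concrete, the following NumPy sketch (ours, not from the paper) simulates the scalar example with x_k = 1 and recovers (l_1, l_2) by the whitened least squares problem above; the true parameters and noise variances are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
l1, l2 = 2.0, -1.0                 # true parameters (illustrative)
lam_e, lam_v = 0.1, 0.05           # noise variances (illustrative)

x = np.ones(N)                     # low excitation: x_k = 1 for all k
u = l1 * x + l2 + rng.normal(0.0, np.sqrt(lam_e), N)
du = l1 + rng.normal(0.0, np.sqrt(lam_v), N)   # gradient measurements u'_k

# Stack both measurement channels, whitened by the noise standard
# deviations, so that ordinary least squares minimizes V(l1, l2) above.
A = np.vstack([np.column_stack([x, np.ones(N)]) / np.sqrt(lam_e),
               np.column_stack([np.ones(N), np.zeros(N)]) / np.sqrt(lam_v)])
b = np.concatenate([u / np.sqrt(lam_e), du / np.sqrt(lam_v)])
l1_hat, l2_hat = np.linalg.lstsq(A, b, rcond=None)[0]

# Without the gradient rows the regressor [x_k, 1] = [1, 1] is rank
# deficient: only the sum l1 + l2 is identifiable from u alone.
print(l1_hat, l2_hat)
```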
The structure of this paper is as follows: the MPC problem and neural networks are introduced in Section 2, while Section 3 describes the proposed training and evaluation framework. Data generation and testing are studied in Section 4 and numerical examples are presented in Section 5. Finally, conclusions and ideas for future work are given in Section 6.

2. PRELIMINARIES

In this section, we briefly review the basic Model Predictive Control (MPC) problem and explicit MPC, followed by a review of neural networks.

2.1 Model Predictive Control

Consider a discrete-time linear time-invariant system which evolves in time as

$$x(k+1) = Ax(k) + Bu(k), \qquad (1)$$

where x(k) denotes the state vector and u(k) the input or control action at the kth time-step; A and B denote the system matrices. Model predictive control (MPC) refers to the problem of steering the state of the system (1) from an initial value to the origin by minimizing a control objective, subject to the state and input constraints

$$x(k) \in X, \quad u(k) \in U, \qquad (2)$$

where X ⊆ R^n and U ⊆ R^m are polyhedra representing the constraint sets for the state and the input, respectively. We consider the following fixed horizon MPC problem, see Borrelli et al. (2017):

$$\begin{aligned} \min_{u(0), \ldots, u(N-1)} \quad & x(N)^T Q_N x(N) + \sum_{k=0}^{N-1} \left[ x(k)^T Q x(k) + u(k)^T R u(k) \right] \\ \text{s.t.} \quad & x(k+1) = Ax(k) + Bu(k), \\ & x(k) \in X, \; u(k) \in U, \\ & x(0) = x, \end{aligned} \qquad (3)$$

where Q and R are positive semi-definite weight matrices, Q_N is the terminal cost matrix, and N denotes the finite time horizon length. The optimal solution to such a constrained quadratic program can be found by solving a set of linear equations once the active constraints are identified; see Wang and Boyd (2010) for results on fast online MPC implementations. The MPC control law is the mapping from the current state x(0) = x to the first optimal control action u(0) = u. We will use the notation u = u*(x) for this mapping.

2.2 Control Invariant Sets

We now discuss how the feasibility constraints can be characterized in terms of set-invariance, which will form the basis of incorporating structure into neural network solutions for MPC. As defined by Borrelli et al. (2017), a set C ⊆ X is a control invariant set for the system (1) subject to the constraints (2) if

$$x(k) \in C \implies \exists\, u(k) \text{ s.t. } x(k+1) \in C, \quad k = 0, 1, \ldots \qquad (4)$$

In other words, for any initial state in C there exists a controller that ensures all future states reside in C. The maximal control invariant set, C∞, is then defined as the control invariant set containing all control invariant sets contained in X. C∞, being a polytope, is expressible as an intersection of halfspaces, C∞ = {x ∈ R^n | C_x x ≤ d_x} (Bemporad, 2013). To compute C∞ we use Algorithm 10.2, 'Computation of C∞', provided in Borrelli et al. (2017) and the accompanying software.
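Condition (4) can be checked pointwise with a small feasibility program. Below is a minimal cvxpy sketch (our illustration, not code from the paper): given the halfspace data (C_x, d_x) of a candidate set and box bounds on the input, it tests whether some admissible u keeps Ax + Bu inside the set; all arguments are assumed to be given NumPy arrays.

```python
import cvxpy as cp

def one_step_invariance(x, A, B, Cx, dx, u_lo, u_hi):
    """Condition (4) at a single state x: is there an admissible u
    such that C_x (A x + B u) <= d_x?"""
    u = cp.Variable(B.shape[1])
    prob = cp.Problem(cp.Minimize(0),
                      [Cx @ (A @ x + B @ u) <= dx, u >= u_lo, u <= u_hi])
    prob.solve()
    return prob.status == cp.OPTIMAL
```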
2.3 Explicit MPC

The main challenge in solving the MPC problem lies in determining which of the constraints are active. In general, this entails solving an optimization problem at each time instant, which can quickly become a bottleneck when the controller is applied repeatedly. One way of circumventing the online determination of active sets is through offline pre-computation of the control laws, such that the problem is transformed into that of specifying a mapping, or lookup table, in the form of a piece-wise affine function. This mapping then acts on the input to produce the optimal control law. This approach is known as explicit MPC (Alessio and Bemporad, 2009; Goodwin et al., 2010; Bemporad, 2013).

Given a polytopic set X, for each x ∈ X, explicit MPC computes a piecewise affine (PWA) mapping u = u*(x) from x to u defined over M regions of X. This in turn means that the gradient ∂u*(x)/∂x is piecewise constant and contains significant information describing the optimal control law. This structural information can be used to improve the training of the NNs: providing the gradient u', in addition to the state vector x and the associated control law u, will help learn a better µ(x), particularly when the training data is limited in size.

The explicit MPC has further inspired viewing the MPC problem as a general learning problem. As a result, a number of works involving learning approaches, primarily in the form of artificial neural networks, have been proposed recently (Chen et al., 2018; Blanchini, 1999; Chen et al., 2019; Maddalena et al., 2019). The idea of using neural network based MPC is by no means novel, and there exist many publications on this topic (Parisini and Zoppoli, 1995; Åkesson and Toivonen, 2006; Winqvist, 2020).

2.4 Neural Networks

Neural networks and deep learning approaches are now ubiquitous and form the crux of most learning-based approaches today (LeCun et al., 2015). Neural networks learn a mapping from the input to the output from known training examples, when the problem at hand either has no clear closed-form input-output mapping or, even if there is one, it is intractable to work with (Bishop, 2006). Neural networks comprise concatenated processing units, known as neurons, that combine linear and non-linear transformations. Mathematically expressed, a neural network µ(x) learns a mapping from the input x to the output u in the form

$$u = \mu(x, \theta) = \sigma(W_L\, \sigma(W_{L-1}\, \sigma( \cdots \sigma(W_1 x + b_1) \cdots ) + b_{L-1}) + b_L),$$

where σ(·) denotes the point-wise nonlinearity or activation function, the matrices {W_i, b_i}_{i=1}^{L} are the parameters θ (weights and biases) learnt by the network from the training data, and L denotes the number of neuron layers. The learning is typically performed by the use of backpropagation, which uses the gradients of an error or loss function with respect to the network parameters. For reasons of computational complexity and stability, the rectified linear unit (ReLU) is the most commonly employed activation function (Bishop, 2006; LeCun et al., 2015). As discussed earlier, the use of ReLU as the activation function has been shown to be well motivated in learning functions with linear regions (Montufar et al., 2014), and particularly in the MPC setting due to its piece-wise linear nature (Chen et al., 2018). A schematic of a two-layer neural network is shown in Figure 1.

Fig. 1. Schematic of a two-layer neural network.

3. PROPOSED APPROACH

Let x ∈ R^{n_x} denote the initial state vector x(0), and u ∈ R^{n_u} the corresponding optimal control action u(0). Further, let u' ∈ R^{n_u × n_x} denote the true gradient of the optimal control with respect to x. Consider that we are given a set of N_tr samples of triplets of initial states, the corresponding optimal control laws, and their gradients, given by {x_i, u_i, u'_i}_{i=1}^{N_tr}. We note that the subscript i here denotes the ith sample and is not the time-index. Let us define the sets X = {x_1, . . . , x_{N_tr}}, U = {u_1, . . . , u_{N_tr}}, and U' = {u'_1, . . . , u'_{N_tr}}. Our goal is to train a ReLU NN to predict the optimal control law using gradient information: we learn the mapping

$$\mu(x, \theta): \mathbb{R}^{n_x} \times \mathbb{R}^{D} \mapsto \mathbb{R}^{n_u},$$

where θ ∈ R^D denotes all the learnable weights of the NN, by using a training loss function L_tr(θ): X × U × U' ↦ R that explicitly uses gradient information. Specifically, we propose to learn µ(x, θ) by minimizing the following loss function with respect to the parameters θ:

$$L_{tr}(\theta) = \sum_{i=1}^{N_{tr}} \left( \| u_i - \mu(x_i, \theta) \|^2 + \gamma \left\| u'_i - \frac{\partial \mu(x, \theta)}{\partial x} \Big|_{x = x_i} \right\|^2 \right), \qquad (5)$$

where the first term denotes the model error, the second term denotes the fit of the gradients, and γ > 0 is a regularization constant. Setting γ = 0 reduces (5) to the conventional training loss Σ_{i=1}^{N_tr} ‖u_i − µ(x_i, θ)‖².
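A minimal PyTorch sketch of how the loss (5) can be implemented follows, with the Jacobian term obtained by automatic differentiation; the network architecture and sizes are illustrative assumptions, not the exact ones used in the paper.

```python
import torch
import torch.nn as nn

n_x, n_u, width = 2, 1, 32         # illustrative dimensions
mu = nn.Sequential(nn.Linear(n_x, width), nn.ReLU(),
                   nn.Linear(width, width), nn.ReLU(),
                   nn.Linear(width, n_u))

def loss_tr(x, u, du, gamma):
    """Loss (5). x: (B, n_x) states, u: (B, n_u) optimal controls,
    du: (B, n_u, n_x) gradient targets from the MPC solver."""
    x = x.requires_grad_(True)
    pred = mu(x)
    # Rows of the Jacobian d mu / d x; create_graph=True keeps the
    # graph so the gradient penalty itself can be backpropagated.
    jac = torch.stack(
        [torch.autograd.grad(pred[:, j].sum(), x, create_graph=True)[0]
         for j in range(pred.shape[1])], dim=1)
    return ((pred - u) ** 2).sum() + gamma * ((jac - du) ** 2).sum()
```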
It is well known that training the NN with only the conventional loss makes it data-hungry, and the performance of the NN often suffers when the number of training samples is limited. In many control problems, and particularly with MPC, one often has access only to a limited number of training observations given the very large state space. As seen from our discussion in Section 2, an explicit incorporation of structural information often aids the learning process when the number of training samples is limited. This forms the motivation behind our approach: the regularization explicitly enforces structural information on the learnt NN mapping µ. In the case of eMPC, the set of feasible inputs is a polytope, and therefore the gradient of the optimal control law u' is piecewise constant. If the training samples are drawn uniformly at random over the feasible input space (ideally one sample per region of the feasible space), the network is given information about a large subset of the feasible set. In such a case, we would expect our approach to learn well even with limited training data, due to the active incorporation of this structural information. In contrast, a regular NN based MPC solver that uses only the model error in the training is agnostic to this information and must abstract such structure purely from the training samples.

Thus, our approach is a trade-off between a completely data-driven and a structurally aware MPC solver. A schematic of the proposed approach is given in Figure 2. We note that our approach requires the value of the true gradients of the MPC problem evaluated at the given x. As described in Section 5, we evaluate the true gradients using the recently proposed cvxpylayers (Agrawal et al., 2019), a framework for differentiable convex optimization layers in PyTorch. Since the cost function L_tr(θ) is non-convex, the training proceeds by backpropagation; the details of the dataset generation and training are described in the next section. Let θ̂ denote the NN parameters obtained after training. Once the network parameters are learnt, they are used to predict the optimal control law for test data x as

$$\hat{u} = \mu(x, \hat{\theta}).$$

We note that the gradient information is not required during the test phase, and is used only in the training of the NN.

Fig. 2. The structure of the proposed neural network.

4. TRAINING AND DATA GENERATION

We first discuss the systematic strategy that we propose for the generation of data sets for the MPC problems. Our approach for generating the training and testing data sets involves sampling (using the hit-and-run sampler described next) a set of states X = {x_1, . . . , x_{N_tr}} from the feasible region C∞ described in Section 2.2. The corresponding optimal control input set U and gradient set U' are then computed using cvxpylayers. We follow the same strategy to generate the test data set, with the difference that it does not contain gradient data.

To sample from the maximal control invariant set C∞ defined in Section 2.2 we use the hit-and-run sampler, which is a Markov chain Monte Carlo method for sampling uniformly from convex shapes (Mete and Zabinsky, 2012). Essentially, starting from any point in the convex set, the sampler generates a set of points, S, by walking steps of length λ in randomly generated (unit) directions. The steps involved in the hit-and-run sampler are detailed in Algorithm 1. We employ this approach since it ensures that the generated data points cover the feasibility region in a reasonably uniform manner (Mete and Zabinsky, 2012; Zabinsky and Smith, 2013). This in turn ensures that the network has observed training samples that span the entire feasible set on average, thereby aiding its ability to generalize.

Algorithm 1 Hit-and-Run Sampler
procedure HIT-AND-RUN(C∞, N_S)
    Pick a random point x ∈ C∞ = {x ∈ R^n | C_x x ≤ d_x}
    S ← {x}
    for i = 1, . . . , N_S − 1 do
        λ_i ← ∞
        Generate a random unit direction d_i
        for each row pair (c, d) in (C_x, d_x) do
            λ ← (d − c · x) / (c · d_i)
            if λ > 0 then    ▷ to ensure the right direction
                λ_i ← min(λ_i, λ)
        λ_i ← drawn from U[0, λ_i)
        x ← x + λ_i d_i
        S ← S ∪ {x}
    return S
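A compact NumPy version of Algorithm 1 follows (our sketch); the feasible starting point (here the origin) and the silent handling of halfspaces parallel to the direction are simplifying assumptions.

```python
import numpy as np

def hit_and_run(Cx, dx, n_samples, seed=0):
    """Sample from the polytope {x : Cx @ x <= dx} as in Algorithm 1."""
    rng = np.random.default_rng(seed)
    x = np.zeros(Cx.shape[1])              # assumed feasible starting point
    samples = [x.copy()]
    for _ in range(n_samples - 1):
        d = rng.standard_normal(x.size)
        d /= np.linalg.norm(d)             # random unit direction
        with np.errstate(divide="ignore"):
            lam = (dx - Cx @ x) / (Cx @ d) # step length to each halfspace
        lam_max = lam[lam > 0].min()       # keep only the forward direction
        x = x + rng.uniform(0.0, lam_max) * d
        samples.append(x.copy())
    return np.asarray(samples)
```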
Once the training and test datasets are generated, the corresponding training outputs and gradients are obtained using CVXPY to solve the MPC problem. We then use a supervised learning method to train the networks on a data set D = {(x_i, u_i, u'_i)} of input-output samples.
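The following sketch shows how such triplets can be produced by wrapping the MPC problem (3) in a differentiable cvxpylayers layer; the system and cost matrices below are placeholders (the paper's numerical values are those in (6)-(8)), and the code is our illustration rather than the authors' implementation.

```python
import cvxpy as cp
import numpy as np
import torch
from cvxpylayers.torch import CvxpyLayer

# Placeholder problem data (for illustration only).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = QN = np.eye(2)
R = np.eye(1)
N, x_lim, u_lim = 3, 5.0, 1.0

n, m = B.shape
xs = cp.Variable((n, N + 1))
us = cp.Variable((m, N))
x0 = cp.Parameter(n)
cost = cp.quad_form(xs[:, N], QN)
constr = [xs[:, 0] == x0]
for k in range(N):
    cost += cp.quad_form(xs[:, k], Q) + cp.quad_form(us[:, k], R)
    constr += [xs[:, k + 1] == A @ xs[:, k] + B @ us[:, k],
               cp.abs(xs[:, k + 1]) <= x_lim, cp.abs(us[:, k]) <= u_lim]
layer = CvxpyLayer(cp.Problem(cp.Minimize(cost), constr),
                   parameters=[x0], variables=[us])

# One training triplet (x_i, u_i, u'_i) for a sampled state x_i.
xi = torch.tensor([1.0, -0.5], requires_grad=True)
(u_seq,) = layer(xi)
ui = u_seq[:, 0]                              # first action u*(x_i)
dui = torch.autograd.grad(ui.sum(), xi)[0]    # du*(x)/dx (scalar-input case)
```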
During the training, we learn the network parameters by minimizing the loss function L_tr(θ) defined in (5) with respect to θ. In order to increase the training speed, we split the data into smaller subsets (mini-batches) and compute the training loss (5) for each mini-batch; each mini-batch consists of five training samples. We then use the gradient descent-based Adam optimizer (Kingma and Ba, 2014) to backpropagate the loss and update the parameters θ after each mini-batch. Once all the mini-batches have been iterated over, one training epoch is completed. We train the networks until L_tr(θ) is reduced to 0.01 or for a maximum of 50000 training epochs.

5. EXAMPLES

We now consider the application of the proposed concepts on a set of networks with different regularization constants γ, trained on MPC problems for a two-dimensional system. We evaluate the networks in terms of two performance metrics:

(1) The normalized mean square error (NMSE), defined as

$$\mathrm{NMSE} = \frac{\mathbb{E}\|\hat{u} - u\|^2}{\mathbb{E}\|u\|^2},$$

where E denotes the average over all samples in the training or test dataset.

(2) The normalized control cost J, defined as the objective in Equation (3) normalized by x(0)^T x(0):

$$J = \frac{x(N)^T Q_N x(N) + \sum_{k=0}^{N-1} \left[ x(k)^T Q x(k) + u(k)^T R u(k) \right]}{x(0)^T x(0)},$$

where x(0) is the initial state of the trajectory.

Both metrics are evaluated on test data, previously unseen by the networks during training. The NMSE evaluates the control law predicted by the network with respect to the ground truth, whereas the control cost measures how good the control law is in terms of minimizing the control objective: the smaller the J, the better the control achieved.

We consider the example of a two-dimensional state vector with a scalar input under constraints. Despite being relatively low-dimensional, such a scenario occurs regularly in many real-life control applications. Let us then consider a two-dimensional system specified as follows (Borrelli et al., 2017):

$$A = \begin{bmatrix} \cdot & \cdot \\ \cdot & \cdot \end{bmatrix}, \quad B = \begin{bmatrix} \cdot \\ \cdot \end{bmatrix}, \qquad (6)$$

subject to the constraints

$$\begin{bmatrix} -\cdot \\ -\cdot \end{bmatrix} \le x(k) \le \begin{bmatrix} \cdot \\ \cdot \end{bmatrix}, \quad -\cdot \le u(k) \le \cdot, \quad k = 0, \ldots, N, \qquad (7)$$

and with cost parameters

$$Q_N = Q = \begin{bmatrix} \cdot & \cdot \\ \cdot & \cdot \end{bmatrix}, \quad R = \cdot, \quad N = \cdot. \qquad (8)$$

We generate training and testing data by first sampling states from C∞ for the system (6) subject to (7) using the hit-and-run Algorithm 1; see Figure 3 for an example of 1000 sampled points. We then solve for the optimal controls and corresponding gradients using CVXPY with cost parameters given by (8).

Fig. 3. Hit-and-run sampling of 1000 points from the set C∞ for system (1) with system and control matrices (6) subject to the constraints (7).

In our experimental setup we consider three sets of ten neural networks for four different settings of the regularization constant γ ∈ {0, 0.1, 1, 100}. For each set of networks we (pseudo)randomly generate 10 sets of training data, one for each network in the set. The first set of networks is trained using 25 samples, the second set using 50 samples, and the third set using 100 samples. We use S(25), S(50) and S(100) to denote the three network sets.

For the NMSE evaluation, we use three separate testing datasets, one for each set of networks, all consisting of 100 samples. For the control cost evaluation, we sample sets of 100 initial states from C∞ using Algorithm 1. Starting from these, we then simulate the networks in closed loop for three time steps to generate trajectories.
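The closed-loop evaluation can be sketched as follows (our illustration): roll the trained network forward for three steps and accumulate the normalized cost J defined above, where mu denotes the trained network from the earlier sketch and the matrices are assumed given.

```python
import numpy as np
import torch

def normalized_cost(mu, x0, A, B, Q, R, QN, T=3):
    """Simulate the network mu in closed loop for T steps and return
    J, i.e. the objective of (3) normalized by x(0)^T x(0)."""
    x0 = np.asarray(x0, dtype=float)
    x, J = x0.copy(), 0.0
    for _ in range(T):
        with torch.no_grad():
            u = mu(torch.as_tensor(x, dtype=torch.float32)).numpy()
        J += x @ Q @ x + u @ R @ u     # stage cost x'Qx + u'Ru
        x = A @ x + B @ u              # closed-loop state update
    return (J + x @ QN @ x) / (x0 @ x0)
```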
In Figure 4 we plot the NMSE, averaged over the ten trained networks in each network set, in the decibel [dB] scale. Table 1 compares the lowest overall NMSE for the cases γ = 0 and γ ≠ 0. We see that the choice of γ should not be so important since we do not have measurement noise. However, from a numerical point of view it makes a difference; for example, in Figure 4 we note an increase in the NMSE for γ > 10.
Notice that we only evaluate the NMSE in the control mapping and not in its gradient. Figure 5 shows a comparison of the control costs for the two cases {S(25), γ = 1} and {S(100), γ = 0}, averaged over the ten trained networks in each respective set. Table 2 shows the control cost averaged over all trajectories. An interesting observation is that the performance of the setting {S(25), γ = 1} is comparable to that of the setting {S(100), γ = 0}. In fact, a larger training dataset will in general lower the generalization error; the results thus point to the richness of the gradient information.

Considering the proportion of training data to test trajectories, it is likely that there are clusters of samples (initial states) in regions that the network has not been exposed to during training, which might explain the peaks we observe in Figure 5. A possible explanation is that the network in those cases produces very large control signals and/or steers the state away from the reference state (possibly outside the feasible region), which would result in large control costs. Chen et al. (2019) suggest a projection strategy for ensuring feasibility, i.e. constraint satisfaction, of the generated control inputs by projecting them onto a safe region. The approach in (Winqvist, 2020) is also based on this idea. We do not employ any such safety measures in our network structure. Note also that we do not explicitly train the networks to optimize the control cost, which is another possible approach for improving the results.

Fig. 4. The test NMSE averaged over the ten networks in each network set as a function of the regularization constant γ.

Fig. 5. Control cost evaluation over 100 trajectories. The control costs are averaged over the ten networks in the respective sets S(25) and S(100).

Set      NMSE [dB] (no regularization)   NMSE [dB] (regularization)   Best γ
S(25)    ·                               ·                            ·
S(50)    -0.8165                         -19.5597                     1
S(100)   ·                               ·                            ·

Table 1. NMSE evaluation for the 2D example. The table presents the lowest NMSE when no regularization (γ = 0) was used during training, as well as the lowest overall NMSE and the corresponding regularization constant γ.
For all sets the lowest NMSE was found for γ = 1.

{Set, γ}      J (avg. over 100 test traj.)
{S(25), 1}    ·
{S(100), 0}   ·

Table 2. Control cost evaluation for the 2D example.

6. CONCLUSION AND FUTURE WORK

We have presented a framework for off-line training and evaluation of a neural network approach for implementing MPC using gradient data. The underlying question is whether it is possible to replace on-line MPC optimization solvers with trained NNs; this would allow for very efficient and robust real time implementations. At the same time, there is great progress in the area of embedded convex optimization for control. The idea is to approximate the MPC mapping from state to control input with a constrained ReLU based neural network. The main novel result is on how to include the gradient of the MPC controller with respect to the state input in the training of NNs. We have used CVXPY (Agrawal et al., 2019) and PyTorch (Paszke et al., 2019) to implement this framework. We also use CVXPY and cvxpylayers to generate training data and to evaluate the resulting controller.

A key factor is the generation of samples for the off-line training, which relates to input design in system identification (Annergren et al., 2017). This is a challenge when the dimension of the state space increases. Here we proposed to use a hit-and-run sampler, and evaluated the resulting controller based on trajectories and normalized cost functions. The numerical tests showed the trade-off between the amount of training data and the approximation properties of the resulting controller.

This paper is a first step towards controller identification of MPC using ReLU networks. It should be noted that we do not assume any model information other than x_i, u_i and the gradient in training the network, so our approach is not restricted to eMPC or even time-invariant MPC problems. One can also use model parameters as inputs to the NN and gradients with respect to them as training data. It is also possible to train the NN to predict u(k), k = 0, . . . , N − 1, for the entire MPC horizon by giving just x(0) as input. In this way we feed in the information that we need the control to take N steps.
The next step is to evaluate this approach on more advanced MPC control problems, including nonlinear MPC. The numerical example studied here is a proof of concept, and we have used fairly basic methods for evaluation. The approach would also benefit from training on trajectories instead of just on the control mapping. We will pursue these aspects in future work.

REFERENCES

Agrawal, A., Amos, B., Barratt, S., Boyd, S., Diamond, S., and Kolter, J.Z. (2019). Differentiable convex optimization layers. In Advances in Neural Information Processing Systems, volume 32, 9562–9574.

Alessio, A. and Bemporad, A. (2009). A Survey on Explicit Model Predictive Control, 345–369. Springer Berlin Heidelberg, Berlin, Heidelberg.

Annergren, M., Larsson, C.A., Hjalmarsson, H., Bombois, X., and Wahlberg, B. (2017). Application-oriented input design in system identification: Optimal input design for control [applications of control]. IEEE Control Systems Magazine, 37(2), 31–56. doi:10.1109/MCS.2016.2643243.

Bemporad, A. (2013). Explicit Model Predictive Control, 1–9. Springer London, London.

Bishop, C.M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg.

Blanchini, F. (1999). Set invariance in control. Automatica, 35(11), 1747–1767.

Borrelli, F., Bemporad, A., and Morari, M. (2017). Predictive Control for Linear and Hybrid Systems. Cambridge University Press, USA, 1st edition.

Chen, S., Saulnier, K., Atanasov, N., Lee, D.D., Kumar, V., Pappas, G.J., and Morari, M. (2018). Approximating explicit model predictive control using constrained neural networks. In 2018 Annual American Control Conference (ACC), 1520–1527.

Chen, S.W., Wang, T., Atanasov, N., Kumar, V., and Morari, M. (2019). Large scale model predictive control with neural networks and primal active sets.

Diamond, S. and Boyd, S. (2016). CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83), 1–5.

Goodwin, G., Seron, M.M., and de Doná, J.A. (2010). Constrained Control and Estimation: An Optimisation Approach. Springer Publishing Company, Incorporated, 1st edition.

Kingma, D.P. and Ba, J. (2014). Adam: A method for stochastic optimization. CoRR, abs/1412.6980.

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521, 436–444. doi:10.1038/nature14539.

Ljung, L. (1999). System Identification: Theory for the User. Prentice Hall information and system sciences series. Prentice Hall PTR.

Maddalena, E.T., da S. Moraes, C.G., Waltrich, G., and Jones, C.N. (2019). A neural network architecture to learn explicit MPC controllers from data.

Mete, H.O. and Zabinsky, Z.B. (2012). Pattern hit-and-run for sampling efficiently on polytopes. Operations Research Letters, 40(1), 6–11.

Montufar, G.F., Pascanu, R., Cho, K., and Bengio, Y. (2014). On the number of linear regions of deep neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger (eds.), Advances in Neural Information Processing Systems, volume 27, 2924–2932. Curran Associates, Inc.

Parisini, T. and Zoppoli, R. (1995). A receding-horizon regulator for nonlinear systems and a neural approximation. Automatica, 31, 1443–1451.

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, 8024–8035. Curran Associates, Inc.

Wang, Y. and Boyd, S. (2010). Fast model predictive control using online optimization. IEEE Transactions on Control Systems Technology, 18(2), 267–278.

Winqvist, R. (2020). Neural Network Approaches for Model Predictive Control. Master's thesis, KTH, School of Electrical Engineering and Computer Science (EECS).

Zabinsky, Z.B. and Smith, R.L. (2013). Hit-and-Run Methods, 721–729. Springer US, Boston, MA.

Åkesson, B. and Toivonen, H. (2006). A neural network model predictive controller. Journal of Process Control, 16(9), 937–946.