In Proximity of ReLU DNN, PWA Function, and Explicit MPC
Saman Fahandezh-Saadi, Masayoshi Tomizuka
Abstract — Rectifier (ReLU) deep neural networks (DNNs) and their connection with piecewise affine (PWA) functions are analyzed. The paper is an effort to find and study the possibility of representing the explicit state feedback policy of model predictive control (MPC) as a ReLU DNN, and vice versa. The complexity and architecture of the DNN are examined through several theorems and discussions. An approximate method is developed for identification of the input-space partition of a ReLU net, which yields a PWA function over polyhedral regions. Inverse multiparametric linear and quadratic programs (mp-LP and mp-QP), which deal with reconstructing the constraints and cost function from a given PWA function, are also studied.
I. INTRODUCTION
In recent years, deep neural networks (DNNs) have had tremendous success in computer vision, speech recognition, and other areas of machine learning [0]. Despite these unprecedented performances in learning tasks, a theoretical understanding of DNN architectures, features, and properties is still largely missing. Moreover, most of these successes belong to supervised learning and are concerned mostly with function fitting (e.g., classification, function approximation, and regression). In contrast, in reinforcement learning (RL) the concept of feedback makes the theory hard to study, since the statistical properties change over time, and such systems are also hard to train in practice. Another shortcoming of DNNs in RL is the absence of theoretical guarantees regarding stability, robustness, and convergence. All these issues need a great deal of consideration.

On the other hand, model predictive control is a powerful tool for control and decision making in robotics and other safety-critical applications due to its adaptability, robustness, and stability-safety guarantees. Specifically, explicit
MPC allows us to pre-compute the optimal control policy u*_t = f(x(t)) as a function of the current state x(t), and deploy it on-line in real time. This avoids solving an optimization problem in real time on embedded systems, which are typically limited in memory capacity and computation power. But deployment of an explicit MPC suffers from the number of regions, which grows exponentially (in the worst case) with the number of constraints [0]. This demands a significant amount of storage and computational effort.

Several attempts have been made to address these shortcomings in explicit MPC [0]. In contrast, for deep reinforcement learning, most attempts have focused on empirical results, and analyses of its architecture have only begun to appear in the literature in very recent years. Here we focus
The authors are with the Department of Mechanical Engineering at the University of California, Berkeley. {samanfahandej, tomizuka}@berkeley.edu

Fig. 1. This example shows the level of complexity that a NN can represent. The plot shows how a small multilayer NN with ReLU activation units in each layer maps the input-space x ∈ R to the output-space y ∈ R, using only a small total number of parameters. The network creates a complex continuous PWA function which can be a close approximation of a highly nonlinear function.

more on some of these new findings regarding DNNs. The authors in [0] investigate the complexity of DNNs by studying the number of polytopic regions that they can attain; the paper also provides a tighter upper bound (compared to previous bounds [0]) on the maximal number of regions that can be partitioned by a ReLU DNN. The paper [0] discusses the geometric properties of DNNs for classification and how to improve the robustness of such DNNs to perturbations by analyzing those properties. In [0] the authors present a method that adds a stability guarantee to the deep gradient descent algorithm.

Although these two areas of research (deep RL and MPC) have a strong connection in (adaptive) optimal control theory [0], from a mathematical point of view there is another link between the two: both a ReLU DNN and the solution of an mp-LP or mp-QP in explicit MPC represent a PWA function on polyhedra. This is strong motivation to investigate the possibility of reconstructing one from the other, in order to benefit from the advantages of both approaches.

While the presumption that DNNs have tens of thousands of parameters (weights and biases) seems reasonable for vision or language applications, a DNN can in fact represent a very complex function with far fewer parameters. This is a compelling property when we are representing a control policy as a DNN. As an example, Fig. 1 shows a small ReLU network whose parameters are chosen randomly.
The plot shows how a very small network can subdivide the input-space into many polytopes, with a different affine policy piece over each region.

In the following, we first provide a mathematical definition of ReLU DNNs and their structural properties in Section II. In Section III, we present a brief overview of existing theorems that connect ReLU nets and PWA functions, and discuss the challenges that prevent an explicit association. We also present a sample-based method to identify the underlying PWA function that a ReLU DNN represents. Finally, in Section IV, we provide a numerical example that examines a simple network and its equivalent PWA function.

II. PRELIMINARIES AND PROBLEM FORMULATION
In this section we define the feedforward ReLU DNN and discuss some properties of these models and their ability to map the input-space to the complex family of PWA functions.
A. Notation and Definitions
Definition 1
A rectifier (ReLU) feedforward network is a layered neural network with L ∈ N hidden layers (the depth of the net) and input and output dimensions n_0, n_{L+1} ∈ N, respectively. Each hidden layer l is composed of an affine transformation f_l : R^{n_{l-1}} → R^{n_l} followed by the rectifier activation function rect(x) : x ↦ max(x, 0),

    f_l = W_l h_{l-1} + b_l
    h_l = rect(f_l) = max{ f_l, 0 },

where the max is applied element-wise, W_l ∈ R^{n_l × n_{l-1}}, b_l ∈ R^{n_l}, f_l ∈ R^{n_l}, h_l ∈ R^{n_l}, and h_0 ∈ R^{n_0} is the input to the network. We call f_l the pre-activation and h_l the post-activation function at hidden layer l. The output layer is just a linear transformation W_{L+1} and does not count as part of the hidden layers. Finally, any ReLU net with L hidden layers is called an L-layer DNN and can be represented as a function f : R^{n_0} → R^{n_{L+1}},

    f = W_{L+1} ∘ h_L ∘ f_L ∘ ... ∘ h_1 ∘ f_1,

where ∘ denotes function composition.

Definition 2
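As a concrete illustration of Definition 1, the forward pass can be sketched in a few lines of NumPy; the shapes and random weights below are only an assumed toy instance, not an architecture from the paper.

```python
import numpy as np

def relu_net(Ws, bs, x):
    """L-layer ReLU net of Definition 1: h_l = max(W_l h_{l-1} + b_l, 0)
    for the hidden layers, followed by a purely linear output layer W_{L+1}."""
    h = x
    for W, b in zip(Ws[:-1], bs):        # hidden layers l = 1, ..., L
        h = np.maximum(W @ h + b, 0.0)   # rect() applied element-wise
    return Ws[-1] @ h                    # output layer: linear map, no rectifier

# assumed toy instance: L = 2 hidden layers, f : R^2 -> R^1
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((3, 2)), rng.standard_normal((3, 3)),
      rng.standard_normal((1, 3))]
bs = [rng.standard_normal(3), rng.standard_normal(3)]
y = relu_net(Ws, bs, np.array([0.5, -1.0]))
```

Because every operation is either affine or an element-wise max, the map x ↦ relu_net(Ws, bs, x) is exactly the continuous PWA object studied in the rest of the paper.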
Every layer l ∈ {1, 2, ..., L} of a ReLU DNN has n_l activation units; n_l is called the width of the layer. Each activation unit receives the rectified weighted sum of the previous post-activation values h_{l-1} plus a bias. The j-th activation unit in layer l is denoted by h_{l,j} ∈ R,

    f_{l,j} = W_{l,j}^T h_{l-1} + b_{l,j}
    h_{l,j} = max{ f_{l,j}, 0 },

where W_{l,j}^T and b_{l,j} are the j-th row and the j-th element of the matrix W_l and the vector b_l, respectively.

An illustration of an L-layer ReLU DNN is shown in Fig. 2. Each blue circle in the figure represents an activation unit. Depending on the structure of the DNN, each hidden layer can have any width. The total number of parameters of a DNN, θ = {W_1, ..., W_{L+1}, b_1, ..., b_L}, can be a basis for comparing different architectures with varying depths and widths.

Fig. 2. Illustration of an L-layer ReLU DNN f : R^{n_0} → R^{n_{L+1}}. Depending on the application and the complexity of the function to be approximated, the DNN can have an arbitrary number of layers (i.e., depth) and activation units in each layer (i.e., width). Note that DNNs are recognized just by the number of hidden layers. As seen, the output layer is just a linear transformation of the last hidden layer, without an activation mapping.

B. ReLU DNN Expressiveness
Despite DNNs' empirical successes, some fundamental questions about how and why these results are achieved are absent from the literature. Neural net expressivity is a subject that tries to answer some of these questions, such as how the depth, width, and type of layers impact the function that the network represents, and how these properties affect its performance. Here we review some of these findings, starting with a set of theorems that address such questions.

First, since the post-activation h(s) := max{s, 0} is itself a PWA function, and the structure of a ReLU network is a series of compositions of affine and post-activation functions, the result is a PWA function defined over regions of the input-space. This is stated in the following theorem.

Theorem 1
Given a neural network with ReLU activation, the input-space is partitioned into convex polytopes.

Proof: The complete proof can be found in [0]. As a sketch, consider the first layer l = 1; each pre-activation function establishes a hyperplane in the input-space, since f_{1,j} = W_{1,j} x + b_{1,j} = 0. All such hyperplanes, one per unit, form a hyperplane arrangement that partitions the input-space into polytopes. By induction, the same holds for all other layers of the DNN. Fig. 3 illustrates the theorem for a 2-layer DNN.

Another important property of ReLU networks is the number of polytopic regions that they can realize on their input-spaces. This helps on two fronts: 1) to understand the complexity of a specific architecture based on lower and upper bounds on the number of regions, and 2) to design an architecture based on the number of regions necessary for a specific application. A lower bound on the number of regions is given in the following theorem.

Theorem 2
The maximal number of regions computed by a ReLU neural network with n inputs, L hidden layers, and widths n_l ≥ n for all l ∈ {1, 2, ..., L} is lower-bounded by

    ( \prod_{l=1}^{L-1} \lfloor n_l / n \rfloor^{n} ) \sum_{j=0}^{n} \binom{n_L}{j},    (1)

where ⌊·⌋ is the floor function.

Proof: The proof can be found in [0] or [0].

From the hyperplane arrangement it can be shown that the maximal number of regions of any ReLU network with a total of N activation units is bounded from above by 2^N [0]. This bound is very loose and not very useful, but there is also a tighter upper bound on the number of regions.

Theorem 3
The maximal number of regions of a ReLU neural network with n inputs, L hidden layers, and widths n_l ≥ n for all l ∈ {1, 2, ..., L} is upper-bounded by

    \sum_{(j_1, ..., j_L) ∈ J} \prod_{l=1}^{L} \binom{n_l}{j_l}    (2)

where

    J = { (j_1, ..., j_L) ∈ Z^L : 0 ≤ j_l ≤ min{ n, n_1 - j_1, ..., n_{l-1} - j_{l-1}, n_l } for all l ∈ [L] }.

Proof:
See Theorem 1 in [0].

This theoretical background gives us a better understanding of how the structure of a neural network impacts its performance, and also lets us use some of these properties to construct the link with explicit MPC in the following sections.
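Both bounds (1) and (2) are easy to evaluate numerically. The helper functions below are our own sketch of the two formulas (the names are not from any library); for a single hidden layer the upper bound reduces to the classical hyperplane-arrangement count.

```python
from itertools import product
from math import comb, floor

def lower_bound(n, widths):
    """Lower bound (1): (prod_{l=1}^{L-1} floor(n_l/n)^n) * sum_{j<=n} C(n_L, j)."""
    p = 1
    for nl in widths[:-1]:
        p *= floor(nl / n) ** n
    return p * sum(comb(widths[-1], j) for j in range(n + 1))

def upper_bound(n, widths):
    """Upper bound (2): sum over index tuples (j_1..j_L) in J of prod_l C(n_l, j_l),
    with j_l <= min(n, n_1 - j_1, ..., n_{l-1} - j_{l-1}, n_l)."""
    total = 0
    for js in product(*[range(min(n, nl) + 1) for nl in widths]):
        if all(js[l] <= min([n, widths[l]] + [widths[k] - js[k] for k in range(l)])
               for l in range(len(widths))):
            t = 1
            for nl, jl in zip(widths, js):
                t *= comb(nl, jl)
            total += t
    return total

# one hidden layer of 4 units in R^2: arrangement count 1 + 4 + 6 = 11 regions
assert upper_bound(2, [4]) == 11
# a deep-vs-shallow comparison for scalar input (assumed example widths)
lb, ub = lower_bound(1, [2, 2]), upper_bound(1, [2, 2])
assert lb <= ub
```

The comparison at the bottom illustrates the theme of Theorems 2 and 3: for the same parameter budget, stacking layers multiplies the region count through the product term in (1).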
C. Explicit MPC and PWA Functions
Given a dynamical system, the purpose of constrained optimal control is to solve an optimization problem with a set of constraints on the states x_t and actions u_t in order to find a sequence of actions u*_{0:∞} that drives the system to a desired or reference state. We can formulate such a problem as the infinite-horizon optimization problem

    J*_∞(x(0)) = min_{u_0, u_1, ...} \sum_{t=0}^{∞} q(x_t, u_t)
    s.t.  x_{t+1} = A x_t + B u_t,
          x_t ∈ X,  u_t ∈ U,
          x_0 = x(0),  ∀ t = 0, 1, ... .    (3)

Problem (3) cannot be solved easily due to its infinite-horizon nature with constraints on states and actions [0]; instead, model predictive control (i.e., receding horizon control) is a suitable approach, which mimics (3)

Fig. 3. A ReLU DNN subdivides the input-space into polytopes. In fact, each hidden layer divides the input-space received from the previous layer h_{l-1}, and this recursively subdivides the input-space of the whole network. Here we have a 2-layer ReLU net with input x ∈ R^2 and four activation units in each layer. The left plot shows the pre-activation functions f_1 = W_1 x + b_1, which are equivalent to four hyperplanes in R^2. Hidden units are activated on one side of their corresponding hyperplanes. The right plot shows the hyperplanes from the first and second layers in blue and red, respectively. The hyperplanes of the second layer, as seen in the plot, are not straight lines: when they pass through different regions partitioned by the first layer, they are bent at the first-layer boundaries (blue lines). Therefore we have four activation boundaries for the four units in layer 2, but they are not straight lines. The right plot shows all the regions that the network partitions in the input-space, with a different affine function over each polytope.
by an appropriate choice of p(x_N), q(x_k, u_k), and X_f as follows:

    J*(x(t)) = min_{u_0, ..., u_{N-1}} p(x_N) + \sum_{k=0}^{N-1} q(x_k, u_k)
    s.t.  x_{k+1} = A x_k + B u_k,
          x_k ∈ X,  u_k ∈ U,
          x_N ∈ X_f,
          x_0 = x(t),  ∀ k = 0, ..., N-1.    (4)

Problem (4) can be seen as a multiparametric program (mp) in which x(t) is the vector of parameters. In particular, for linear or quadratic cost functions with polyhedral constraints, it turns out that the solution of problem (4) is in fact a PWA function of the parameters, u*(t) = f(x(t)): an explicit solution to the MPC controller.

In a number of instances we may be interested in constructing, for the PWA function corresponding to a ReLU net, an mp-LP/mp-QP problem whose solution it is. This may arise when, for example, we want to measure the suboptimality of a trained network against the solution of an explicit MPC. Inverse mp-LP/QP studies this idea, constructing such optimization problems from PWA functions. The following theorem expresses this in detail,
Theorem 4
Every continuous piecewise affine function f : R^m → R^n can be obtained as a linear map of the unique explicit solution f̂(x) of a multiparametric linear program of the form

    f̂(x) ∈ arg min_z J(z, x)  s.t. (z, x) ∈ Ω,    (5)

with dimension n̂, when n̂ ≤ n.

The proof presented in [0] is constructive: it establishes a procedure that results in the formulation of an mp-LP from a PWA function. The proof follows from the fact that every PWA function can be decomposed into the difference of two convex functions, and from there it is straightforward to construct an mp-LP for a convex PWA function. Note that, although the proof is constructive, it is still very hard (or even impossible) to implement as an algorithm.

Fig. 4. This illustration gives the entire perspective that this paper tries to depict. ReLU DNNs represent PWA functions on polyhedra which subdivide the input-space. Assuming that the input to the neural network is the parameter x(t), the network can act exactly as an explicit state feedback policy. The dashed arrows indicate the need for further study of methods, analytical or approximate, which can reconstruct the mathematical structure of each block from the other in a constructive manner.

Now, referring to Fig. 4, we can get a better view of the whole picture. Although it is possible to use learning to find an approximation of an explicit MPC policy, constructing a deep network from a PWA function has yet to be studied.

III. EXPLICIT MPC AND RELU DNN

In this section we connect ReLU DNNs and explicit MPC through their common underlying object, the PWA function. As mentioned in previous sections, every ReLU DNN has a continuous PWA function representation on its input-space, and vice versa (but not necessarily in an explicit closed form, since constructing such a connection is not easy in general).
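The convex decomposition underlying Theorem 4 (and Fig. 5 later) can be sketched directly for the scalar case: writing a PWA function as an affine part plus ReLU kinks, the kinks with positive slope jump collect into a convex part γ and the rest into a convex part η, with f = γ − η. The specific kink locations below are an assumed toy example, not taken from the paper.

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def decompose(a, c, kinks):
    """Split f(x) = a*x + c + sum_i d_i * relu(x - t_i) into convex gamma, eta
    with f = gamma - eta, by sorting kinks on the sign of the slope jump d_i."""
    gamma = lambda x: a * x + c + sum(d * relu(x - t) for t, d in kinks if d > 0)
    eta = lambda x: sum(-d * relu(x - t) for t, d in kinks if d < 0)
    return gamma, eta

# assumed example: base slope 1, slope jumps by -2 at x = 0 and by +3 at x = 1
kinks = [(0.0, -2.0), (1.0, 3.0)]
f = lambda x: 1.0 * x + 0.5 + sum(d * relu(x - t) for t, d in kinks)
gamma, eta = decompose(1.0, 0.5, kinks)

xs = np.linspace(-3.0, 3.0, 101)
assert np.allclose(f(xs), gamma(xs) - eta(xs))   # f = gamma - eta pointwise
```

Each of γ and η is a pointwise maximum of affine pieces, which is exactly the form from which the mp-LP of Theorem 4 is assembled via epigraph constraints.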
A. Identification of Input-Space in ReLU DNN
In order to identify the different regions that a ReLU NN partitions the input-space into, we present an approximate method that extends the method introduced in [0]. We show that it is possible to construct each piece of the PWA function by extending the PWA representation of a shallow network (i.e., L = 1).

Since every dimension of the output-space can be treated independently, we assume the construction of a scalar-valued function f : Ω ⊂ R^n → R from a DNN model (i.e., the output-space is scalar, n_{L+1} = 1); as mentioned, the proposed method can be applied separately to each dimension in the case of vector-valued DNN models. Any scalar-valued affine function defined over a convex region Ω_i can be written as

    f_i(x) = u^T x + c,  x ∈ Ω_i,    (6)

where u ∈ R^n and c ∈ R. In order to construct u and c in (6), we first consider a NN with one layer and then extend to deep nets.
1) Shallow Network:
Note that we can reformulate a scalar rectifier function as

    rect(h) = I(h) · h,    (7)

where I(h) is the indicator function

    I(h) = 1 if h > 0,  0 otherwise.    (8)

Now, considering a single-layer NN f : R^n → R, we rewrite it with the help of the indicator function as

    f(x) = W_2 diag( I(W_{1,1} x + b_{1,1}), I(W_{1,2} x + b_{1,2}), ..., I(W_{1,n_1} x + b_{1,n_1}) ) (W_1 x + b_1).    (9)

Simplifying (9), f(x) can be written more compactly as

    f(x) = W_2 diag(I_{f_1}(x)) W_1 x + W_2 diag(I_{f_1}(x)) b_1,    (10)

where diag(I_{f_l}(x)) is the compact form of the indicator function for the pre-activation f_l in layer l. From (10) we see that, given an input x, the weight u and bias c can be computed.
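A quick numerical check of (10): freezing the indicator at a sample x makes the one-layer net affine there. The random weights below are an assumed instance.

```python
import numpy as np

rng = np.random.default_rng(2)
W1, b1 = rng.standard_normal((4, 2)), rng.standard_normal(4)
W2 = rng.standard_normal((1, 4))

x = np.array([0.3, -0.7])
D = np.diag((W1 @ x + b1 > 0).astype(float))   # diag(I_{f_1}(x)) of (8)-(9)
u = W2 @ D @ W1                                # local weight: u^T = W2 D W1
c = W2 @ D @ b1                                # local bias:   c   = W2 D b1

f_x = W2 @ np.maximum(W1 @ x + b1, 0.0)        # the actual network output
assert np.allclose(f_x, u @ x + c)             # (10) holds at x
```

The pair (u, c) stays constant as long as x moves without flipping any indicator, i.e., while x remains in the same polytope Ω_i.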
2) Deep Network:
Now we extend the derivation in (10) to deep networks. Given an input x from a region Ω_i, we can construct the corresponding weight u and bias c of the affine map f_i. The weight is computed by

    u^T = W_{L+1} diag(I_{f_L}(x)) W_L ... diag(I_{f_2}(x)) W_2 diag(I_{f_1}(x)) W_1.    (11)

The bias c of the affine map can be computed similarly,

    c = W_{L+1} diag(I_{f_L}) W_L ... diag(I_{f_2}) W_2 diag(I_{f_1}) b_1
      + W_{L+1} diag(I_{f_L}) W_L ... diag(I_{f_2}) b_2
      + ...
      + W_{L+1} diag(I_{f_L}) b_L.    (12)

Both (11) and (12) depend on the input x, so we need a (large enough) set of samples from the input-space to identify the different affine responses of the output. It is worth mentioning that from (11) and (12) it is also possible to derive the corresponding affine function for any activation unit up to a specific hidden layer, instead of the whole network. This means that any activation unit at any stage of a deep neural network can be written as a PWA function over the input-space of the network. This needs further study, but as a preliminary question we can ask: is there any connection between the layers of a neural network and, for example, the horizon in model predictive control?
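The recursion (11)-(12) can be sketched by propagating the local affine map layer by layer; sampling many inputs and collecting the distinct (u, c) pairs then enumerates the affine pieces reached by the samples, exactly the sample-based identification described above. The architecture and weights are an assumed random instance.

```python
import numpy as np

def local_affine(Ws, bs, x):
    """Return (u, c) with f(y) = u @ y + c on the polytope containing x,
    following (11)-(12): prepend diag(I_{f_l}(x)) W_l layer by layer."""
    U = np.eye(len(x))            # running product of diag(I) W terms
    C = np.zeros(len(x))          # running bias, pushed through the same maps
    h = x
    for W, b in zip(Ws[:-1], bs):
        D = np.diag((W @ h + b > 0).astype(float))
        U, C = D @ W @ U, D @ (W @ C + b)
        h = np.maximum(W @ h + b, 0.0)
    return Ws[-1] @ U, Ws[-1] @ C

rng = np.random.default_rng(3)
Ws = [rng.standard_normal((4, 2)), rng.standard_normal((3, 4)),
      rng.standard_normal((1, 3))]
bs = [rng.standard_normal(4), rng.standard_normal(3)]

x = np.array([0.2, 0.9])
u, c = local_affine(Ws, bs, x)
h = np.maximum(Ws[1] @ np.maximum(Ws[0] @ x + bs[0], 0.0) + bs[1], 0.0)
assert np.allclose(Ws[2] @ h, u @ x + c)     # (11)-(12) reproduce f(x)

# sample-based identification: distinct local weights ~ affine pieces reached
X = rng.uniform(-2, 2, size=(5000, 2))
pieces = {tuple(np.round(local_affine(Ws, bs, xi)[0], 8).ravel()) for xi in X}
```

As the text notes, this only identifies the pieces hit by the samples; very small polytopes can be missed, which is why the method is approximate.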
Several literature study the concept of learning approx-imate MPC through supervised or reinforcement learningprocess [0]. But we can utilize the structure of PWA controlpolicy to further improve the process of learning [0]. Thefollowing theorem is the key concept in the process.
Theorem 5
Any convex PWA function f : R^n → R, which can also be formulated as the pointwise maximum of N affine functions, f(x) := max_{i=1,...,N} f_i(x), can be exactly represented by a ReLU DNN with width n_l = n + 1 for all l ∈ [L] and depth N.

Proof: See Theorem 2 in [0].

Depending on the dimension of the control input u ∈ R^m, it may be necessary to train up to m networks; in fact, every element of the control input vector can be treated separately. Each element u*(x(t)) : R^n → R is a PWA function which needs to be decomposed into the difference of two convex PWA functions. Theorem 5 then gives an exact design structure for each network. Presumably, this should result in better learning (smaller loss value, faster convergence, better accuracy), which ultimately impacts the performance of the controller that the trained network substitutes for.

IV. EXPERIMENTAL VALIDATION
In this section we present a simple example (as a proof of concept) to illustrate the way an mp-LP is constructed from a ReLU DNN. We examine a 2-layer feedforward net with n_1 = n_2 = 2 (i.e., a total of four activation units) and one-dimensional input and output, x, y ∈ R. The two hidden layers are constructed as

    h_1 = [h_{1,1}, h_{1,2}]^T = max{ 0, W_1 x + b_1 }
    h_2 = [h_{2,1}, h_{2,2}]^T = max{ 0, W_2 h_1 + b_2 },

and the linear map of the output layer is y = W_3 h_2.

The PWA function y = f(x) equivalent to the above feedforward network is continuous with five affine pieces over consecutive intervals of x.    (13)

Since the PWA function (13) is neither convex nor concave, we can decompose it into the difference of two convex functions γ(x) and η(x), each the pointwise maximum of three affine pieces, so that f(x) = γ(x) − η(x).

Then we can construct the mp-LP counterpart whose solution coincides with the PWA function (13). Introducing the decision variable z ∈ R^2, we can write

    J*(x) = min_{z ∈ R^2}  z_1 − z_2
    s.t.  γ_i(x) ≤ z_1  for each affine piece γ_i of γ,
          z_2 ≤ −η_j(x)  for each affine piece η_j of η,
          x constrained to a bounded interval,    (14)

and then construct f(x) with the linear map T = [1 1] as f(x) = T f̂(x). In fact, the explicit solutions of the mp-LP are z_1* = γ(x) and z_2* = −η(x), so f̂(x) = [γ(x), −η(x)]^T, which exactly follows the constructive proof in [0].

V. CONCLUSION
We presented an overview of ReLU deep neural networks and discussed several structural properties of such models, which are key concepts for using a ReLU network as an explicit state feedback policy for a model predictive controller. Specifically, we argued that since any ReLU network models a PWA function on polyhedra, it is a natural choice to use a ReLU network instead of the state feedback policy computed by an explicit MPC procedure, considering the storage and execution complexity of such controllers in real time. We

Fig. 6. The plots show the explicit solution to the mp-LP (14) using the MPT3 toolbox [0].
Top and center: plots of the optimizers z_1* = γ(x) and z_2* = −η(x), both functions of the parameter x; the solution exactly recovers the PWA function (13). Bottom: the optimal value J*(x), which is a function of the parameter x and is convex and PWA, as we know from the theory of mp-LP [0].

also presented a sample-based method that identifies the different affine pieces of a ReLU network. For future work, alongside further development of some initial findings in this paper, other very recent ideas, such as representing a ReLU DNN as a mixed-integer linear problem [0], can be the subject of further investigation.

VI. ACKNOWLEDGEMENT
This work was partially supported by the NSF Graduate Research Fellowship under Grant No. DGE 1106400, awarded to the first author, and the Cheryl and John Neerhout, Jr., Distinguished Chair fund.

REFERENCES

[0] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds., Curran Associates, Inc., 2012, pp. 1097–1105.
[0] I. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Maxout networks," in Proceedings of the 30th International Conference on Machine Learning, S. Dasgupta and D. McAllester, Eds., ser. Proceedings of Machine Learning Research, vol. 28, Atlanta, Georgia, USA: PMLR, Jun. 2013, pp. 1319–1327.
[0] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, pp. 484–503, 2016.
[0] C. Piech, J. Bassen, J. Huang, S. Ganguli, M. Sahami, L. J. Guibas, and J. Sohl-Dickstein, "Deep knowledge tracing," in Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, Eds., Curran Associates, Inc., 2015, pp. 505–513.
[0] A. Alessio and A. Bemporad, "A survey on explicit model predictive control," in Nonlinear Model Predictive Control: Towards New Challenging Applications, L. Magni, D. M. Raimondo, and F. Allgöwer, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 345–369.
[0] A. Bemporad, "Model predictive control design: New trends and tools," Jan. 2007, pp. 6678–6683.
[0] A. Bemporad and C. Filippi, "Suboptimal explicit receding horizon control via approximate multiparametric quadratic programming," Journal of Optimization Theory and Applications, vol. 117, no. 1, pp. 9–38, Apr. 2003.
[0] T. Geyer, F. D. Torrisi, and M. Morari, "Optimal complexity reduction of polyhedral piecewise affine systems," Automatica, vol. 44, no. 7, pp. 1728–1740, Jul. 2008.
[0] T. A. Johansen, "Approximate explicit receding horizon control of constrained nonlinear systems," Automatica, vol. 40, pp. 293–300, 2004.
[0] T. A. Johansen and A. Grancharova, "Approximate explicit constrained linear model predictive control via orthogonal search tree," IEEE Transactions on Automatic Control, vol. 48, no. 5, pp. 810–815, May 2003.
[0] P. Grieder and M. Morari, "Complexity reduction of receding horizon control," vol. 3, Dec. 2003, pp. 3179–3190.
[0] J. Spjøtvold, P. Tøndel, and T. A. Johansen, "Continuous selection and unique polyhedral representation of solutions to convex parametric quadratic programs," Journal of Optimization Theory and Applications, vol. 134, no. 2, pp. 177–189, Aug. 2007.
[0] T. Serra, C. Tjandraatmadja, and S. Ramalingam, "Bounding and counting linear regions of deep neural networks," CoRR, vol. abs/1711.02114, 2017.
[0] G. Montufar, "Notes on the number of linear regions of deep neural networks," Mar. 2017.
[0] A. Fawzi, S. Moosavi-Dezfooli, P. Frossard, and S. Soatto, "Classification regions of deep neural networks," CoRR, vol. abs/1705.09552, 2017.
[0] F. Berkenkamp, M. Turchetta, A. Schoellig, and A. Krause, "Safe model-based reinforcement learning with stability guarantees," in Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., Curran Associates, Inc., 2017, pp. 908–918.
[0] R. S. Sutton, A. G. Barto, and R. J. Williams, "Reinforcement learning is direct adaptive optimal control," IEEE Control Systems Magazine, vol. 12, no. 2, pp. 19–22, Apr. 1992.
[0] M. Raghu, B. Poole, J. Kleinberg, S. Ganguli, and J. Sohl-Dickstein, "On the expressive power of deep neural networks," in Proceedings of the 34th International Conference on Machine Learning, D. Precup and Y. W. Teh, Eds., ser. Proceedings of Machine Learning Research, vol. 70, International Convention Centre, Sydney, Australia: PMLR, Aug. 2017, pp. 2847–2854.
[0] G. F. Montufar, R. Pascanu, K. Cho, and Y. Bengio, "On the number of linear regions of deep neural networks," in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds., Curran Associates, Inc., 2014, pp. 2924–2932.
[0] R. Pascanu, G. Montúfar, and Y. Bengio, "On the number of inference regions of deep feed forward networks with piece-wise linear activations," CoRR, vol. abs/1312.6098, 2013.
[0] F. Borrelli, A. Bemporad, and M. Morari, Predictive control for linear and hybrid systems. Cambridge University Press, 2017.
[0] A. Hempel, P. J. Goulart, and J. Lygeros, "Every continuous piecewise affine function can be obtained by solving a parametric linear program," in European Control Conference (ECC 2013), Zürich, Switzerland, Jul. 2013, IEEE, pp. 2657–2662.
[0] S. Chen, K. Saulnier, N. Atanasov, D. D. Lee, V. Kumar, G. J. Pappas, and M. Morari, "Approximating explicit model predictive control using constrained neural networks," in American Control Conference (ACC), 2018.
[0] M. Hertneck, J. Köhler, S. Trimpe, and F. Allgöwer, "Learning an approximate model predictive controller with guarantees," CoRR, vol. abs/1806.04167, 2018.
[0] B. Karg and S. Lucia, "Efficient representation and approximation of model predictive control laws via deep learning," arXiv preprint arXiv:1806.10644, 2018.
[0] B. Hanin, "Universal function approximation by deep neural nets with bounded width and ReLU activations," ArXiv e-prints, Aug. 2017.
[0] M. Herceg, M. Kvasnica, C. Jones, and M. Morari, "Multi-Parametric Toolbox 3.0," in Proc. of the European Control Conference, http://control.ee.ethz.ch/~mpt.