Some manifold learning considerations towards explicit model predictive control
Robert J. Lovelett, Felix Dietrich, Seungjoon Lee, Ioannis G. Kevrekidis
ORIGINAL ARTICLE | Process Systems Engineering
Robert J. Lovelett* | Felix Dietrich | Seungjoon Lee | Ioannis G. Kevrekidis
Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
Correspondence
Ioannis G. Kevrekidis, Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA. Email: [email protected]

Present address
* Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA

Funding information
This work was partially supported by DARPA (Lagrange Program, Contract No. N6601-18-C-4031, as well as the Physics of Artificial Intelligence Program, Agreement HR00111890032).
Model predictive control (MPC) is a de facto standard control algorithm across the process industries. There remain, however, applications where MPC is impractical because an optimization problem is solved at each time step. We present a link between explicit MPC formulations and manifold learning to enable facilitated prediction of the MPC policy. Our method uses a similarity measure, informed by control policies and system state variables, to "learn" an intrinsic parametrization of the MPC controller using a diffusion maps algorithm, which will also discover a low-dimensional control law when it exists as a smooth, nonlinear combination of the state variables. We use function approximation algorithms to project points from state space to the intrinsic space, and from the intrinsic space to policy space. The approach is illustrated first by "learning" the intrinsic variables for MPC control of constrained linear systems, and then by designing controllers for an unstable nonlinear reactor.
KEYWORDS
Model Predictive Control, Data Mining, Diffusion Maps, Machine Learning

1 | INTRODUCTION
Model predictive control (MPC) is a de facto standard method for control across the process industries [31]. In MPC, which is also known as receding horizon control, the control action is calculated on-line by solving a receding horizon optimal control problem at each time step to determine the subsequent control action to take [12, 24, 23]. MPC for constrained linear systems has been well-established for some time now [24], and that success has led to a significant effort to extend MPC to systems that are harder to control due to stochasticity [23], nonlinearity [25], decentralization [5], or other challenges. Many of the theoretical challenges associated with these more complex MPC problems have been overcome [25], enabling practitioners to use MPC in new applications that are even more demanding. Because, however, MPC entails solving an optimization problem at every sampled time step, it will always be limited by the computational time it takes to solve that problem. Due to limited computational resources, deploying MPC remains challenging for strongly nonlinear, high dimensional, and stiff systems [25]. Furthermore, given recent interest in MPC for distributed or mobile systems [36, 5], avoiding high computational overhead is even more important for applications where computational resources are limited at the point of control action. These concerns motivate methods for reducing on-line computational cost in MPC.

To address the demand for excessive computational resources, there has been significant research into "fast" MPC over the past two decades. Broadly speaking, two approaches have been investigated for fast MPC: (1) suboptimal MPC, where a simpler optimization problem is solved whose solution is equivalent (or approximately equivalent) to the solution of the full problem [42, 41], and (2) "explicit" MPC, where an explicit control law is found that approximates the implicit MPC controller but does not require on-line optimization [3].
In this work, we focus on the latter, explicit MPC. For constrained linear systems, an explicit solution can be found by solving a single multiparametric quadratic programming problem off-line [3, 39, 28, 9]. The "parameters" are the system states, and the solution to this optimization problem is piecewise affine as long as the constraints are linear; with this solution in hand, the controller only needs to first follow a look-up table to determine the relevant polytopic region of state space in which the system currently lies and then perform an affine computation. Unfortunately, for high dimensional systems, the number of polytopes increases exponentially, which can make even the look-up operation too slow for some applications [4]. Multiparametric programming has also been applied to nonlinear MPC, but finding exact solutions to multiparametric nonlinear programming problems is not always feasible, and even when it is feasible, the solution is not necessarily piecewise affine. Therefore, approximation methods are typically used instead [10].

Although the multiparametric programming approach is dominant in the literature, another approach to explicit MPC is interpolation or function approximation. In this framework, a large number of control policies are computed off-line, and the on-line control law is constructed by interpolation (or regression) from states to the optimal policies that were computed off-line. This method has been most successful using artificial neural networks (ANNs) as the interpolation functions (e.g., [1, 18, 4, 34]). Recently, this method was refined for constrained linear MPC problems using deep neural nets combined with Dykstra's projection to ensure constraint satisfaction [4].

In this work, we present an alternative framework for explicit MPC based on finding an appropriate function between the system state and the control policy that approximates the MPC controller.
We build upon recent advances in nonlinear data mining, especially manifold learning, to show how we can find an intrinsic parametrization of the MPC problem that respects similarities in control policy space. By working in this intrinsic space, we can design effective interpolating or approximating functions as our explicit controllers. Specifically, we use state feedback to project the system's position in state space onto a latent manifold, with a parametrization informed by the control policy space and by the state space; and then, using our position in the latent space, we can estimate the entire optimal control policy. We also demonstrate how we can, in some cases, approximate the "inverse problem" of finding the system state given the optimal control policy. Solving the inverse problem could be valuable in contexts outside traditional MPC, where system feedback can be easily observed but measuring the system state is challenging, as in some biological/living systems. In order to effectively interpolate between the state space, latent manifold, and control policy, we apply three tools for learning the transformations: polynomial regression (PR), artificial neural networks (ANNs), and Gaussian process (GP) regression.

The remainder of this paper is organized as follows: in Section 2, we discuss our approach to MPC using manifold learning and function approximation in more detail; in Section 3, we present illustrative examples of our approach to several problems, including constrained linear systems and a mechanistic model of a nonisothermal continuous stirred tank reactor (CSTR); and in Section 4 we mention possible applications of this framework and provide some directions for future research.

2 | THEORY

2.1 | Model Predictive Control Background
First, consider the discrete time nonlinear system written in state space form as a difference equation:

x_{k+1} = f(x_k, u_k)
y_k = h(x_k, u_k)    (1)

where x ∈ X is the system state vector, u ∈ U is the input vector, y ∈ Y is the output vector, and k ∈ ℕ is the sample index. For system 1, the MPC controller is defined as the controller that minimizes the cost [24]:

V(y, k, u) = F(y_{k+N}) + Σ_{i=k}^{k+N−1} ℓ(y_i, u_i)    (2)

where u = {u_k, u_{k+1}, ..., u_{k+N−1}}, ℓ(y_i, u_i) is the stage cost (such as a normed difference between a reference trajectory, r_k, and the predicted trajectory) and F(y_{k+N}) is the terminal cost. y_i for i > k is found by evolving the model (Equation 1) in time using the policy u and initial condition x_k. There are often constraints on the range of inputs, range of states, or rate of change of inputs; refer to [24] for a full discussion of constrained MPC. Here, we will assume that we can deploy an optimization algorithm to find the optimal control policy, u*, that minimizes Equation 2. Because we are solving the optimization problem off-line, this optimization algorithm need not be highly efficient. In this paper, we will also assume that the reference trajectory is constant over the entire prediction horizon, i.e., r_k = r, and could be piecewise constant during process operation; we allow ourselves this strong assumption since the focus is on showcasing the data-driven features of the approach.

(Notation: we use upper case, boldface notation to refer to matrices; lower case, boldface notation to refer to a vector of points in time, e.g., an optimal control policy; and nonboldface notation to refer to vector variables at a single time step, as well as to scalars.)

The solution to the optimization problem defines a feedback control law. If many optimization problems are solved off-line, then it is possible, subject to some weak smoothness conditions, to interpolate between the current state, x_k, and the optimal control policy, u*. The interpolation function is a surrogate for the implicitly defined MPC controller that allows for near instantaneous calculation of the optimal policy. For first order (i.e., single state), single-input, single-output systems, the problem is trivial; but for more challenging problems, interpolation may be hindered by the "curse of dimensionality," where function training and evaluation time can increase exponentially with dimensionality. In this work, we discuss recent advances in machine learning and nonlinear data mining that allow us to use tools for function approximation in high-dimensional (but effectively low-dimensional) spaces, thus enabling explicit computation of the MPC control policy.
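The receding-horizon problem of Equations 1-2 can be prototyped directly with a generic nonlinear optimizer; as noted above, off-line efficiency is not critical. The sketch below is a minimal illustration only: the quadratic stage cost, the omitted terminal cost, the single-input warm start, and the box bounds on u are our own assumptions, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import minimize

def mpc_policy(f, h, x_k, r, N, stage_cost=None, u_bounds=(-1.0, 1.0)):
    """Solve one receding-horizon instance of Eqs. 1-2.

    f(x, u) -> next state, h(x, u) -> output; x_k is the current state,
    r the (constant) reference, N the horizon length.
    Assumes a single manipulated input; terminal cost F is omitted here.
    """
    if stage_cost is None:
        # illustrative quadratic stage cost: tracking error + control effort
        stage_cost = lambda y, u: np.sum((y - r) ** 2) + 0.1 * np.sum(u ** 2)

    def cost(u_flat):
        u = u_flat.reshape(N, -1)
        x, V = x_k, 0.0
        for i in range(N):
            y = h(x, u[i])
            V += stage_cost(y, u[i])
            x = f(x, u[i])      # evolve the model (Eq. 1) under the policy
        return V

    res = minimize(cost, np.zeros(N), method="L-BFGS-B",
                   bounds=[u_bounds] * N)
    return res.x.reshape(N, -1)  # the optimal policy u*
```

Collecting many such solves over a grid of states (and set points) yields the off-line training data used throughout the rest of the paper.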
Manifold learning algorithms comprise a class of unsupervised machine learning techniques that attempt to find a low-dimensional manifold on which high-dimensional data points are embedded. The earliest manifold learning algorithm to be developed was principal component analysis (PCA), first presented by Pearson in 1901 [30]. PCA and related techniques, however, are limited in that they can only find a linear embedding of the data. Since 2000, a number of related nonlinear manifold learning techniques have been presented [38, 35, 2, 6, 7, 40]. These methods all share the attribute that they can find nonlinear embeddings; therefore, for example, if 2D data points were all to lie on a smooth curve, PCA would find variation in 2 axes, whereas these manifold learning techniques would (correctly) discover that the data can be represented using only a single variable.

Our key observation is that the optimal control policy, u*, which has extrinsic dimensionality of (N × dim(u)) (where N is the number of steps in the control horizon and dim(u) is the number of manipulated inputs), must lie on a manifold with intrinsic dimensionality limited by:

dim_i(u*) ≤ dim(x) + dim(r)    (3)

where dim(·) is the extrinsic dimensionality and dim_i(·) is the intrinsic dimensionality of an underlying manifold/vector. This limit results from observing that u* is a function of the current system state, x, and the current reference trajectory, r (the latter, in our case, is just the set point, having dimensionality equal to the number of controlled output variables). Because of this property, we will refer to the augmented state vector, x* ∈ X*, as the concatenation of the state variables and the variables parametrizing the reference trajectory.
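The smooth-curve example above is easy to reproduce numerically: points on a one-dimensional curve embedded in the plane leave PCA with two directions of non-negligible variance, even though a single arc-length coordinate parametrizes the data exactly. A small sketch with our own illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(0.0, 3.0 * np.pi, 500)
X = np.column_stack([t, np.sin(t)])        # 2-D points on a 1-D curve

# PCA via SVD of the centered data
Xc = X - X.mean(axis=0)
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)
# Both principal components carry variance here, even though one
# (nonlinear) coordinate, the arc length, describes the data exactly.
```

A nonlinear method such as diffusion maps, by contrast, can recover the single intrinsic coordinate along the curve.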
The intrinsic dimension may be lower than this maximum if, for example, the system quickly relaxes to a slow manifold (i.e., it is singularly perturbed), if the state space realization is nonminimal (i.e., containing redundant information), or if the control policy is a function of the difference between a state variable and a reference variable (as could be the case for linear control of linear systems).

For many systems, however, even though u* lies on a manifold of equivalent or lower dimension than X*, the augmented state space may be very poorly parametrized for predicting u*. In other words, a function c(x*): X* → U is likely to be quite complicated, requiring many basis functions to represent, and therefore challenging to learn and expensive to evaluate. Furthermore, it is possible that some (low dimensional) set of nonlinear combinations of x* are as effective, or nearly as effective, as x* itself for predicting u*. We therefore seek an effective reparametrization of X* that prioritizes similarities in policy space, enabling us to find simpler, potentially lower dimensional, representations of the relationship between (augmented) system states and optimal control policies.

FIGURE 1 (a) Points sampled from ℝ² colored by a "complicated" function. (b) Intrinsic reparametrization of ℝ² learned by DMAPS using the informed metric in Equation 4 (see text for discussion).

In this work, we use the diffusion maps (DMAPS) algorithm for manifold learning [6, 7], which is reviewed in Appendix A. For the MPC problem, we seek to reparametrize the augmented state space in such a way that we can predict the control policy based on knowledge of the system state, as well as the state variables based on knowledge of the control policy.
We will therefore use an "informed" metric (cf. [14, 21]) that considers points (z) in an "input space" (Z) as well as points (f(z)) in a "function space" (F) to build the kernel matrix (Equation 13):

d_{i,j} = ‖z_i − z_j‖/ε + ‖f(z_i) − f(z_j)‖/ξ    (4)

where ‖·‖ is the Euclidean norm, and ε and ξ are tuning parameters. We choose the tuning parameters to prioritize distances in the function space, i.e., median(‖z_i − z_j‖)/ε ≪ median(‖f(z_i) − f(z_j)‖)/ξ. Therefore, it is only at positions where the distance in function space tends to zero that the distance in input space becomes significant.

For the MPC problem, we construct the weight matrix using Equation 4 with the augmented state variables used for z and the control policies used for f(z). We comment that it is not necessary to insert the entire control policy (u*) as f(z) in the distance metric, and instead recommend using (dim(x*) + 1) steps of the control policy (a heuristic inspired by the Takens embedding theorem [37]). By appropriate choice of ε and ξ, the informed metric finds a parametrization of the augmented state space that is organized primarily by similarities in policy space: when ‖u*_i − u*_j‖ is large, it will spread these points far apart in the intrinsic space. Thus, the intrinsic parametrization will organize the control policy space and the state variable space using as few variables as possible (subject to the limit in Equation 3) and hopefully make prediction of the high-dimensional control policies "simpler". The leading nonredundant eigenvectors can be used to parametrize the manifold of the outputs (the control policies) and therefore provide the coordinates important for predicting them [6, 7].
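A minimal numerical sketch of this construction follows. The paper's exact kernel is given in its Appendix A (not reproduced in this excerpt), so the squared-distance Gaussian kernel and the default ε, ξ choices below are our own assumptions; the structure, two distance matrices combined with policy distances dominating, then row normalization and an eigendecomposition, is what matters.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def informed_dmaps(Z, F, eps=None, xi=None, n_evecs=3):
    """Diffusion maps with a policy-informed metric in the spirit of Eq. 4.

    Z : (n, d_z) augmented states; F : (n, d_f) control-policy snippets.
    Returns leading nontrivial eigenvalues and eigenvectors (the phi_k).
    """
    Dz = squareform(pdist(Z))
    Df = squareform(pdist(F))
    if eps is None:
        eps = 10.0 * np.median(Dz) ** 2    # de-emphasize state distances
    if xi is None:
        xi = 0.1 * np.median(Df) ** 2      # emphasize policy distances
    W = np.exp(-(Dz ** 2 / eps + Df ** 2 / xi))
    P = W / W.sum(axis=1, keepdims=True)   # row-stochastic Markov matrix
    evals, evecs = np.linalg.eig(P)
    order = np.argsort(-evals.real)        # drop the trivial constant mode
    keep = order[1:n_evecs + 1]
    return evals.real[keep], evecs.real[:, keep]
```

The returned eigenvectors play the role of the intrinsic coordinates φ used throughout the rest of the paper.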
The remaining (redundant [11] or trailing) eigenvectors, often identifiable after a large gap in the spectrum, correspond to coordinates in the augmented state space that are unimportant for predicting the control policy and can be eliminated to provide a reduced order approximation of the control law.
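Deciding which eigenvectors are redundant can be automated with the local-linear-regression test of reference [11]: regress each candidate eigenvector on the ones already retained, and a small leave-one-out residual flags it as a harmonic of earlier coordinates rather than a new direction. A rough sketch (the bandwidth choice and weighting scheme are our simplifications):

```python
import numpy as np

def llr_residual(prev, phi, eps_scale=1.0):
    """Leave-one-out local-linear-regression residual of eigenvector `phi`
    on previously retained eigenvectors `prev` (n x k). A residual near 0
    marks `phi` as redundant; near 1, as a genuinely new coordinate."""
    n = prev.shape[0]
    d2 = ((prev[:, None, :] - prev[None, :, :]) ** 2).sum(-1)
    eps = eps_scale * np.median(d2)        # kernel bandwidth (heuristic)
    A = np.column_stack([np.ones(n), prev])
    pred = np.empty(n)
    for i in range(n):
        w = np.exp(-d2[i] / eps)
        w[i] = 0.0                         # leave the point itself out
        WA = A * w[:, None]
        coef, *_ = np.linalg.lstsq(WA.T @ A, WA.T @ phi, rcond=None)
        pred[i] = A[i] @ coef
    return np.linalg.norm(phi - pred) / np.linalg.norm(phi)
```

Retaining only eigenvectors with a large residual yields the nonredundant parametrization used in Figure 6(d).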
We will call the manifold discovered by this procedure "the control manifold", M_c; the nonredundant DMAPS eigenvectors, {φ_k}, provide its parametrization (where the eigenvectors are optionally scaled by their corresponding eigenvalues, or powers of their eigenvalues; see Appendix A). We note that M_c has dimensionality less than or equal to that of X*, and typically much less than the extrinsic dimensionality of the control policies.

Because the parametrization of M_c respects similarities in policy space, we expect that predicting u* given φ should be a "simple" task. If ‖u*_i − u*_j‖ is small, then ‖φ_i − φ_j‖ should be small as well (unless ‖x*_i − x*_j‖ is very large). To demonstrate, consider the function q = f(p): ℝ² → ℝ, with q = f(p) = 10 sin(√(p₁² + p₂²)) + p₁ + p₂. Data (p) sampled from a regular grid in ℝ² and colored by function value (q) are shown in Figure 1a. If we did not a priori know the function, we might resort to complicated and/or computationally expensive methods to approximate it in (p₁, p₂) space. However, in the reorganized space discovered via DMAPS and shown in Figure 1b, the values q are seen to be a "simple" function of the single DMAPS coordinate φ₁.

2.3 | Function Approximation
The above discussion implies that it is possible to link x* to u* through M_c, but does not discuss how to find our position on M_c in φ coordinates, nor how, given φ, to predict the policy, u*. Additionally, we would like to solve the inverse problem, which requires predicting φ from u*, and then predicting x* from φ, with the important caveat that the problem is not in general invertible, primarily when the control policy pushes against constraints or when the system dynamics are overspecified. The DMAPS metric from Equation 4 was designed to make the task of predicting u* from φ (or vice versa) simple in comparison with predicting u* from x*, since the former function is defined over the lower-dimensional, intrinsic space. In any case, we use three methods for function approximation, each of which has advantages and disadvantages: polynomial regression (PR), artificial neural networks (ANNs), and Gaussian process (GP) regression. Other tools for function approximation, such as geometric harmonics [8] or Laplacian pyramids [32], may be used as well. In principle, any of these methods may be applicable to learn any of the functions of interest (x* → φ, φ → u*, and their inverses), though in practice PR is unlikely to effectively capture the mapping between x* and φ, which is typically quite complicated.

First, we examine polynomial regression, a classical linear technique. In some cases, the mapping between φ and u* is sufficiently simple that this classical approach works quite well (see Figure 1, for example). We use the ordinary least squares estimator, Θ̂ = (XᵀX)⁻¹XᵀY, to find the coefficients of the polynomial regression estimator, where X is a feature matrix and Y is an output matrix. For this problem, rows of X = [φ φ² ... φⁿ] and rows of Y = u*.
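The normal-equations estimator above is a few lines of NumPy. In the sketch below, φ is taken as a scalar, and a constant (intercept) column and a cubic degree are our illustrative choices:

```python
import numpy as np

def fit_poly(phi, U, degree=3):
    """Ordinary least squares for the map phi -> u*.
    Rows of X are [1, phi, phi**2, ..., phi**degree]; the intercept
    column is our addition for numerical convenience."""
    X = np.vander(phi, degree + 1, increasing=True)
    # theta = (X^T X)^{-1} X^T Y, solved without forming the inverse
    return np.linalg.solve(X.T @ X, X.T @ U)

def predict_poly(theta, phi):
    X = np.vander(phi, theta.shape[0], increasing=True)
    return X @ theta
```

For a vector-valued u*, U simply becomes a matrix with one column per policy step.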
Then, the output at new values of φ can be predicted using Ŷ = XΘ̂ [13].

Artificial neural networks have become popular in many applications due to their versatility and ability to represent arbitrary continuous functions, and are widely considered the workhorse method for "deep learning" [27, 22]. Artificial neural networks have also been widely used in control for decades, mostly for empirical approximation of state equations [15], but also for explicit MPC problems [4, 1, 18, 34]. The main disadvantages of ANNs are that they require significant computational time for training, as well as manual design of the network topology and choice of activation function. More complex networks, with more nodes and hidden layers, require more training time and are susceptible to overfitting.

Finally, GP regression, as a non-parametric Bayesian modeling technique, provides the conditional distribution of an output as a function of its observed inputs. In order to deal with higher dimensional input spaces, we introduce automatic relevance determination (ARD) weights in the covariance kernel, which employ an individual lengthscale hyperparameter for each input dimension [33]. In this paper, we employ a Matérn kernel for the covariance:

κ(x_i, x_j) = (1 + √3 θ d(x_i, x_j)) exp(−√3 θ d(x_i, x_j))    (5)

GP regression typically requires some training time, in that hyperparameters should be optimized; but there are usually far fewer adjustable parameters than in ANNs.

3 | RESULTS AND DISCUSSION
To illustrate our approach, we present (1) for validation purposes, an instructive demonstration of the parametrization learned by diffusion maps for constrained linear systems, and (2) a tutorial example of our DMAPS-enabled explicit MPC formulation for controlling a jacketed nonisothermal continuous stirred tank reactor (CSTR).

3.1 | Constrained MPC for Linear Systems
Nearly all practical MPC problems have constraints on the state variables, input variables, or rates of change of input variables. Therefore, even in the case of a linear system model, the control law is nonlinear. As long as the constraints are linear, the control law will be piecewise affine with a set of so-called "critical regions" in the state space that correspond to which constraints are active [3]. We will use our manifold learning techniques to learn effective parametrizations for these kinds of nonlinear control laws.

First, to illustrate how the DMAPS algorithm will automatically detect an overspecified problem and identify a lower-dimensional parametrization, we introduce the regulatory control problem for the following singularly perturbed system:

x_{k+1} = A x_k + B u_k    (6)

where we assume exact state information is available (y_k = x_k). Note that the two eigenvalues of the state transition matrix A are widely separated in magnitude, indicating significant time scale separation that results in an essentially one-dimensional system; furthermore, the second Hankel singular value (the Hankel singular values characterize input to output energy transfer for each state when the system is in balanced coordinates [26]) is orders of magnitude smaller than the first, again suggesting the system is effectively one-dimensional. For this problem, the MPC control policy can be found via the solution to the following constrained quadratic programming problem:

min_u V(x, k, u) = x_{k+5}ᵀ Q_t x_{k+5} + Σ_{i=k}^{k+4} (x_iᵀ Q x_i + u_iᵀ R u_i)
s.t.  u_i ≤ u_max,  i = k, k+1, ..., k+4
     −u_i ≤ u_max,  i = k, k+1, ..., k+4    (7)

where u_max denotes the input bound.
FIGURE 2 MPC control law from the solution to the optimization problem in Equation 7, which is a singularly perturbed linear problem with linear constraints, plotted in several coordinate systems: (a) the grid of points sampled from the state space, (b) the first two principal components, and (c) the DMAPS embedding coordinates. (d) The control action plotted against the first PC or DMAPS eigenvector, demonstrating how a low dimensional control law is discovered (rescaled for easier visualization).

Here Q_t is the terminal cost (found by solving the discrete time Lyapunov equation Q_t = AᵀQ_tA + Q; A is the state transition matrix for System 6), Q = I is the state cost, and R is the scalar control cost.

We sample a 20 × 20 regular grid from the state space and calculate the control policy at each of the grid points. As recommended in Section 2, we insert the state variables for z and the first 5 steps of the control policy as f(z) into Equation 4, and then construct the DMAPS embedding. Additionally, we use PCA (where each row in the data matrix is the concatenation of the state vector and the control policy) to compare the nonlinear manifold learning technique with a classical linear method.

The MPC control law (i.e., the first step of the control policy) is shown as a function of the states in Figure 2a and, as anticipated, is effectively one dimensional (along the diagonal). Notice that there are apparently three critical regions for predicting the control law: the minimum and maximum input constraints, and a plane that connects them. For this simple problem, DMAPS and PCA can both find effective 1D embedding spaces (the first coordinate in Figures 2b-c), and an exact 1D control law (Figure 2d).

Next, we investigate the system first examined by Bemporad et al. [3] and demonstrate how we can learn the relevant control laws for the full-dimensional system, as well as find an effective reduced order approximation to the control law.
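The two pieces of this construction, a terminal cost from the discrete Lyapunov equation and a box-constrained finite-horizon minimization, can be sketched as follows. The generic system matrices and the use of L-BFGS-B (standing in for whichever QP solver actually generated the policies) are our own placeholder choices:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov
from scipy.optimize import minimize

def linear_mpc_policy(A, B, x0, N=5, u_max=1.0, Q=None, R=0.1):
    """Regulator policy for x+ = A x + B u with |u_i| <= u_max (cf. Eq. 7).
    The terminal cost Q_t solves Q_t = A^T Q_t A + Q."""
    n = A.shape[0]
    Q = np.eye(n) if Q is None else Q
    # solve_discrete_lyapunov(M, Q) solves M X M^T - X + Q = 0; with
    # M = A^T this gives A^T Q_t A - Q_t + Q = 0, as required.
    Qt = solve_discrete_lyapunov(A.T, Q)

    def cost(u):
        x, V = x0.copy(), 0.0
        for ui in u:
            V += x @ Q @ x + R * ui ** 2
            x = A @ x + B.ravel() * ui
        return V + x @ Qt @ x

    res = minimize(cost, np.zeros(N), method="L-BFGS-B",
                   bounds=[(-u_max, u_max)] * N)
    return res.x
```

Evaluating this policy over a regular grid of initial states reproduces the kind of data set embedded in Figure 2.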
Bemporad et al. introduced the system:

x_{k+1} = [0.7326  −0.0861; 0.1722  0.9909] x_k + [0.0609; 0.0064] u_k    (8)

where again we assume that we have perfect state knowledge. The second Hankel singular value of the system in Equation 8 is markedly smaller than the first, suggesting that approximate model order reduction may be possible.

First, consider the regulatory control problem of driving System 8 to the origin, subject to the constraint
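The Hankel singular values invoked above (and for the singularly perturbed system) follow from the controllability and observability Gramians of a stable discrete-time realization; a short sketch (the full-state output matrix C is our assumption, since the examples measure the entire state):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def hankel_singular_values(A, B, C):
    """HSVs of a stable discrete-time system: sqrt(eig(Wc Wo)), with
    the Gramians obtained from discrete Lyapunov equations."""
    Wc = solve_discrete_lyapunov(A, B @ B.T)       # A Wc A^T - Wc + B B^T = 0
    Wo = solve_discrete_lyapunov(A.T, C.T @ C)     # A^T Wo A - Wo + C^T C = 0
    ev = np.linalg.eigvals(Wc @ Wo)
    return np.sort(np.sqrt(np.abs(ev.real)))[::-1]
```

A trailing HSV much smaller than the leading ones is the signal, used twice in this section, that an approximate reduced order model (and control law) may exist.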
FIGURE 3 MPC control law from the solution to the optimization problem in Equation 9, which is a linear problem with linear constraints, plotted in several coordinate systems: (a) the grid of points sampled from the state space, (b) the first two principal components, and (c) the DMAPS embedding. (d) The control action plotted against the first PC or DMAPS eigenvector, demonstrating the existence of an approximate one-dimensional control law (rescaled for easier visualization).

that −2 ≤ u_i ≤ 2. The following constrained quadratic programming problem can be solved to provide the MPC control policy:

min_u V(x, k, u) = x_{k+10}ᵀ Q_t x_{k+10} + Σ_{i=k}^{k+9} (x_iᵀ Q x_i + u_iᵀ R u_i)
s.t.  u_i ≤ 2,  i = k, k+1, ..., k+9
     −u_i ≤ 2,  i = k, k+1, ..., k+9    (9)

We find the terminal cost using the same procedure as in the previous problem and, as before, Q = I with R unchanged. Note that this is the same control problem studied by Bemporad [3], except that we expand the time horizon from 5 to 10 time steps.

We follow the same procedure as for the singularly perturbed problem to generate the control law in the original space, a PCA embedding space, and a DMAPS embedding space, as shown in Figure 3. Here, we note that while once again both PCA and DMAPS find effective 2D parametrizations of the control law (Figure 3b-c), and approximate reduced order control laws (Figure 3d), the reduced order DMAPS control law is noticeably improved in comparison to PCA. The improvement is both because DMAPS can uncover nonlinear relationships among the variables and because we can design the kernel in a way that favors distances in control policy space.

The previous example demonstrated how DMAPS can reorganize, and reduce, the state dimension for predicting the control action. Now, we return to the same linear system (Equation 8), but following Diangelakis et al. [9] (who found the explicit control law using multiparametric quadratic programming), we introduce quadratic constraints. The optimization problem defining the control policy is now:

min_u V(x, k, u) = x_{k+6}ᵀ Q_t x_{k+6} + Σ_{i=k}^{k+5} (x_iᵀ Q x_i + u_iᵀ R u_i)
s.t.  u_i ≤ u_max,  i = k, k+1, ..., k+5
     −u_i ≤ u_max,  i = k, k+1, ..., k+5
      u_k ≤ x_kᵀ x_k    (10)

where we have added the constraint u_k ≤ x_kᵀx_k. Q_t, Q, and R are found using the same approach as before; we reduced the control horizon to 6 steps (for computational efficiency).

Again, we generate the control policy on a 20 × 20 regular grid, and find DMAPS and PCA embeddings of the results using the same procedure as before. Now, however, we observe in Figure 4 that nearly exact order reduction is no longer possible. There are still three apparent critical regions seen from plotting u*_k as a function of x (Figure 4a), but now two of them are paraboloids rather than planes. In any case, the first two principal components and the first two DMAPS eigenvectors provide an appropriate parametrization (Figure 4b-c). Because the control law is intrinsically 2D (even though the dynamics are approximately 1D), DMAPS cannot find an effective reduced order representation; and, of course, neither can PCA (Figure 4d).

The previous examples illustrate how DMAPS finds an intrinsic parametrization of the system state that is organized by the control policy. We demonstrated how, when possible, DMAPS will suggest a lower dimensional manifold from which we can write the control law. In the next section, we will use these same tools for a more complicated, nonlinear system, and further demonstrate how to use function approximation techniques to write the control law in an explicit form.

3.2 | Nonlinear MPC for a Nonisothermal CSTR
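Note that for a measured state x_k, the added constraint u_k ≤ x_kᵀx_k acts as a state-dependent upper bound on the first input, which is precisely what makes the explicit law quadratic in the state near that boundary. A sketch exploiting this observation (the system matrices, costs, and crude terminal term are our placeholders, not the values used in the paper):

```python
import numpy as np
from scipy.optimize import minimize

def qc_mpc_policy(A, B, x0, N=6, u_max=2.0, R=0.1):
    """Policy for an Eq. 10-style problem: box bounds on every input,
    plus u_k <= x_k^T x_k folded into a tighter bound on the first input."""
    Q = np.eye(A.shape[0])

    def cost(u):
        x, V = x0.copy(), 0.0
        for ui in u:
            V += x @ Q @ x + R * ui ** 2
            x = A @ x + B.ravel() * ui
        return V + x @ x              # simple terminal-cost placeholder

    bounds = [(-u_max, u_max)] * N
    # the quadratic constraint binds only the first input for a given x_k
    bounds[0] = (-u_max, min(u_max, float(x0 @ x0)))
    res = minimize(cost, np.zeros(N), method="L-BFGS-B", bounds=bounds)
    return res.x
```

Near the origin the state-dependent bound forces u_k toward zero, producing the paraboloid-shaped critical regions visible in Figure 4a.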
We demonstrate the use of our manifold learning approach to approximate the MPC control law for a nonlinear, unstable chemical reactor system where both reference tracking and disturbance rejection are considered. The CSTR contains a single reaction, A → B, which is exothermic with reaction rate temperature dependence having the Arrhenius form. We will control the concentration of species B using the temperature of the cooling water as the manipulated variable. The cooling water temperature is constrained both with minimum and maximum absolute values and with a maximum stepwise rate of change. All of the data presented are expressed in dimensionless variables. For detailed descriptions of the model equations and the MPC parameters, refer to Appendix B and Appendix C, respectively.

The nonisothermal CSTR is challenging to control partly because a large range of state space is unstable at steady state. Figure 5 shows the bifurcation diagram of the open loop system with cooling water temperature as the bifurcation parameter. The system contains two saddle-node bifurcations and requires feedback stabilization to operate at conversions between approximately 0.2 and 0.8. If using MPC, this means that short sampling intervals are required to ensure that feedback is frequent enough to stabilize the system. Because the sampling interval is short, that likewise means that the control horizon must be long, so that significant dynamics are included in the cost function. These challenges make this system an appropriate candidate to test our formulation of explicit MPC control.

One of the key advantages of the manifold learning approach is that it will automatically detect equivalent

FIGURE 4 MPC control law from the solution to the optimization problem in Equation 10, which is a linear problem with quadratic constraints, plotted in several coordinate systems: (a) the grid of points sampled from the state space, (b) the first two principal components, and (c) the DMAPS embedding. (d) The control action plotted against the first PC or DMAPS eigenvector, indicating that the control action cannot be written as a function of the first PC or DMAPS eigenvector, and therefore a reduced order form would be a poor approximation (rescaled for visualization).
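Bifurcation diagrams like Figure 5 can be traced numerically by scanning the model's steady-state residual for sign changes at each value of the bifurcation parameter. Since the Appendix B model equations are not reproduced in this excerpt, the routine below takes the residual as a user-supplied function and is illustrated on a synthetic cubic with three known roots:

```python
import numpy as np
from scipy.optimize import brentq

def steady_states(residual, grid):
    """Find all roots of `residual` bracketed by `grid`: scan for sign
    changes and polish each bracket with Brent's method. Called once per
    coolant temperature, this traces a diagram like Figure 5."""
    r = np.array([residual(c) for c in grid])
    roots = []
    for i in range(len(grid) - 1):
        if r[i] == 0.0:
            roots.append(grid[i])           # exact hit on a grid point
        elif r[i] * r[i + 1] < 0.0:
            roots.append(brentq(residual, grid[i], grid[i + 1]))
    return roots
```

Between the two saddle-node bifurcations this scan returns three steady states per parameter value (two stable branches and the unstable middle branch), matching the S-shaped curve in Figure 5.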
FIGURE 5 Bifurcation diagram showing steady-state concentration of species B (the controlled variable) as a function of the cooling water temperature (the manipulated variable).

FIGURE 6 DMAPS embedding of the MPC systems showing the training data using three alternative state parametrizations (a) x*_α, (b) x*_β, and (c) x*_γ, and the first 10 steps of the control policy (u*) in the distance metric (see text). (d) Residuals from local linear regression for each parametrization in (a)-(c), suggesting that the first, second, and seventh eigenvectors are the best parametrization of the underlying manifold (see Equation 17 and Reference [11]). Arguably, only the first two eigenvectors are necessary, though we observe improved prediction accuracy when including the next nonredundant eigenvector.

FIGURE 7 Training data using x*_α: (a-c) the first three DMAPS intrinsic coordinates learned using the informed metric and colored by the first, fifth, and tenth steps of the control policy; (d) the original augmented states, x*_α, colored by u*_k (the first step of the control policy).
FIGURE 8 Model performance using test data: (a-c) mean squared prediction error for different explicit approximations of the first, fifth, and tenth steps of the control policy using various function approximation techniques. (d-f) Mean squared prediction error for the inverse problem (finding the augmented states from the control policy), using various function approximation techniques.

or redundant descriptions of the state space in a purely data-driven manner. For example, we know from theory that our process model is intrinsically second order and can be described using the physical variables C_A and T_r (see Appendix B). For more complicated processes, however, we may not recognize the intrinsic state space and instead model our process using whichever measured variables are at hand to represent it. For a purely ANN-driven approach, using such alternative (and possibly redundant) parametrizations may require redesigning the network topology for the new problem, and certainly requires retraining the network. The manifold learning approach, however, will always detect an intrinsic 2D space, regardless of how the state space was parametrized.

To illustrate, we consider three alternative parametrizations of state space: (1) x_α = [C_A  T_r], the original variables used for modeling the system; (2) x_β = [k₁C_A  (q/V)(T₀ − T_r) − (ΔH/(ρC_p))k₁C_A], the reaction rate and heating rate (see Appendix B); and (3) x_γ = [x_α  x_β], the concatenation of the other two parametrizations.
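The three parametrizations are simple to construct from (C_A, T_r). The sketch below follows the Appendix B expressions for the reaction rate and heating rate, but every numerical value in it (k₀, E/R, q/V, ΔH/(ρC_p), T₀, and the sampling ranges) is an illustrative placeholder, not a value from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
C_A = rng.uniform(0.1, 0.9, n)        # concentration of species A
T_r = rng.uniform(300.0, 400.0, n)    # reactor temperature [K]

# Placeholder parameters (NOT the paper's values): Arrhenius pair k0 and E/R,
# flow rate q/V, dH/(rho*Cp), and inlet temperature T0.
K0, E_R = 7.2e10, 8750.0
Q_V, DH_RCP, T0 = 0.5, -16.0, 350.0

k1 = K0 * np.exp(-E_R / T_r)          # Arrhenius rate constant (Eq. 19)

x_alpha = np.column_stack([C_A, T_r])            # (1) physical variables
rate = k1 * C_A                                  # reaction rate
heat = Q_V * (T0 - T_r) - DH_RCP * rate          # heating rate
x_beta = np.column_stack([rate, heat])           # (2) transformed variables
x_gamma = np.column_stack([x_alpha, x_beta])     # (3) concatenation
```

Any of the three (augmented with the reference) carries the same information about the system state, which is what the manifold learning step exploits.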
By concatenating the reference variable to each of these, we define three augmented state vectors, x*_α, x*_β, and x*_γ, any one of which is a complete description of the system state and sufficient (in principle) to predict the control policy.

To discover M_c, the control manifold, we need to sample the augmented state space and compute the control policies off-line. We sampled 200 points randomly in augmented state space and evolved the system in time for 20 time steps using a nonlinear model predictive controller with a time horizon of 20 time steps. Taken together, we have 4000 samples; we collected all of the control policies in the matrix U* ∈ ℝ^{4000×20} and the augmented state variables in the matrices X*_α ∈ ℝ^{4000×3}, X*_β ∈ ℝ^{4000×3}, and X*_γ ∈ ℝ^{4000×5}. To test our tools for function approximation, we randomly partitioned the data into 3000 points for training and 1000 points for testing.

We apply DMAPS three times, using the informed metric from Equation 4 to combine the downsampled policy matrix U* with each of the augmented state matrices X*_α, X*_β, and X*_γ. The results for each case are shown in Figure 6, which shows the augmented state variables projected into the intrinsic space and colored by u*_k. These results demonstrate that DMAPS can find an effective, possibly lower dimensional, intrinsic manifold from which the control policy can be predicted, regardless of which state variables are made explicit. For each of the different parametrizations, we find that the first three nonredundant eigenvectors are the best intrinsic parametrization (arguably, based on the residuals in Figure 6d, only the first two DMAPS eigenvectors are needed; however, we observe in practice that including the third improves prediction accuracy).

Using x*_α (for example) as the augmented state variables, Figure 7 shows that, given the position in DMAPS coordinates, we can predict the control policy (not just the control law, u*_k) using our intrinsic parametrization.
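The DMAPS pipeline of Appendix A, driven by a policy-and-state informed distance, can be sketched as follows. The data here are synthetic stand-ins for U* and X*, and the particular weighting of the two distance terms is an assumption for illustration, not the exact form of Equation 4:

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
m = 300
X = rng.normal(size=(m, 3))                 # stand-in augmented states X*
U = np.tanh(X @ rng.normal(size=(3, 20)))   # stand-in 20-step policies U*

# Informed squared distance: policy-space distance plus a weighted
# state-space distance (the weight lam is an illustrative assumption).
lam = 0.1
d2 = cdist(U, U, "sqeuclidean") + lam * cdist(X, X, "sqeuclidean")

eps = np.median(d2)                         # common kernel-scale heuristic
W = np.exp(-d2 / eps)                       # kernel matrix (Eq. 13)

P = W.sum(axis=1)
W_tilde = W / np.outer(P, P)                # density normalization, alpha = 1 (Eq. 14)
D = W_tilde.sum(axis=1)                     # row sums (Eq. 15)
A = W_tilde / D[:, None]                    # Markov transition matrix (Eq. 16)

# Eigendecompose the symmetric conjugate of A (same eigenvalues, stable solver).
S = W_tilde / np.sqrt(np.outer(D, D))
vals, vecs = np.linalg.eigh(S)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
phi = vecs / np.sqrt(D)[:, None]            # right eigenvectors of A
embedding = phi[:, 1:4] * vals[1:4]         # leading nontrivial DMAPS coordinates
```

Working with the symmetric conjugate S = D^{−1/2} W̃ D^{−1/2} is a standard numerical device: it shares its spectrum with A while allowing a symmetric eigensolver.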
The policy space can be effectively parametrized using only three intrinsic dimensions, φ = [φ₁ φ₂ φ₇] (the nonredundant leading eigenvectors identified in Figure 6d). Facilitated prediction using this intrinsic parametrization is possible because the mapping from φ to u* is much simpler than that from x*_α to u* (as is visually evident by comparing Figures 7a and 7d). Therefore, simpler functions can be used to determine the control policy in the latent space than in the original space.

Now, using x*_α as inputs, we design explicit feedback control laws, i.e., functions c(x*_α): X*_α → U. This task is divided into two stages: first, estimate the intrinsic variables φ given the augmented state variables x*_α; second, predict u*_k given φ. We emphasize that, because we transformed to the intrinsic variables, we can also easily predict several steps of the control policy, rather than just the first step.

For estimating φ from x*_α, we use two methods presented in Section 2: ANNs and GPs. We write the neural network model as:

φ̂_ANN = p(x*_α; W)    (11)

where p is an artificial neural network with eight hidden layers of 20 nodes each, three input nodes corresponding to the elements of x*_α, three output nodes corresponding to the elements of φ, and W is the weight matrix for the neural network. Rectified linear activation functions (which have become a de facto standard in deep learning [22]) were used for each of the nodes in the hidden layers:

f(x) = max(0, x)    (12)

To facilitate the use of the activation function in Equation 12 (which will never predict a negative output), all of the data variables were linearly rescaled to the interval from 0 to 1, and scaled back to their original ranges for presentation. Equation 11 was built using pyTorch and trained using the Adam optimizer (a variant of stochastic gradient descent) [19, 29]. Alternatively, we predict φ from x*_α using GP regression.
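A minimal sketch of the two regression stages follows: a forward pass through an MLP with the architecture stated above (random weights standing in for the trained pyTorch network), and a hand-rolled Matérn-kernel GP posterior mean standing in for the fitted GP of Equation 5. The smoothness ν = 5/2 and all hyperparameter values here are illustrative assumptions:

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)

# --- ANN stage (Eqs. 11-12): 3 inputs -> 8 hidden layers of 20 ReLU units -> 3 outputs.
sizes = [3] + [20] * 8 + [3]
Ws = [rng.normal(scale=0.3, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(b) for b in sizes[1:]]

def mlp(x):
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = np.maximum(0.0, h @ W + b)   # ReLU hidden layers, Eq. 12
    return h @ Ws[-1] + bs[-1]           # linear output layer

x_aug = rng.uniform(0.0, 1.0, size=(5, 3))   # states rescaled to [0, 1]
phi_ann = mlp(x_aug)

# --- GP stage: Matérn(5/2) posterior mean on noiseless 1D toy data.
def matern52(Xa, Xb, ell=1.0, sig2=1.0):
    s = np.sqrt(5.0) * cdist(Xa, Xb) / ell
    return sig2 * (1.0 + s + s**2 / 3.0) * np.exp(-s)

Xtr = np.linspace(0.0, 3.0, 20)[:, None]
ytr = np.sin(Xtr).ravel()
K = matern52(Xtr, Xtr) + 1e-6 * np.eye(len(Xtr))   # jitter for stability
alpha = np.linalg.solve(K, ytr)

Xte = np.linspace(0.1, 2.9, 15)[:, None]
gp_mean = matern52(Xte, Xtr) @ alpha               # GP posterior mean
```

In the paper the GP hyperparameters are instead chosen by minimizing the negative log marginal likelihood over the training set.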
Using the Matérn covariance function in Equation 5, we optimize the hyperparameters over our training data set by minimizing the negative log marginal likelihood [33] to obtain φ̂_GP.

With the intrinsic variables φ̂ in hand from one of the methods above, we now predict u*. We use cubic PR (higher-order polynomial regression was tested, with no improvement) and an ANN model. As before, the ANN model uses the rectified linear activation function from Equation 12 (with inputs and outputs appropriately rescaled), with eight hidden layers of 20 nodes each, three input nodes for φ, and 20 output neurons for u*.

Using two of the techniques for predicting φ and two of the techniques for predicting u*, we have designed four explicit MPC controllers.

FIGURE 9 Comparison of the implicit MPC controller with the DMAPS-enabled approximate explicit nonlinear MPC controller, using GP regression to predict φ from x*_α and an ANN model to predict u*_k from φ.

Their prediction accuracy is shown for three representative steps of the control policy (u*_k, u*_{k+5}, u*_{k+10}) in Figure 8a-c. All of the methods provide comparable prediction accuracy, though we note that we did not spend effort optimizing neural network performance and may see some improvement with different architectures, activation functions, or optimization procedures. Additionally, we did not prioritize learning any particular step of the control policy; if we had weighted the loss function to give more emphasis to (for example) u*_k than to the subsequent steps, we likely would have obtained lower error for u*_k (which defines the control law).

We also solve (where feasible) the inverse problem: deducing the augmented state space from the control policy, by following the same procedure as above in reverse. We find PR and ANN models to predict φ from u*, and then ANN and GP models to predict x*_α from φ.
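The cubic PR stage used above amounts to an ordinary least-squares fit over all monomials of degree at most three in the intrinsic coordinates. In the sketch below, the intrinsic coordinates and policy values are synthetic stand-ins chosen so that the target is exactly cubic-representable:

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(0)

def cubic_features(Phi):
    """All monomials of degree <= 3 in the columns of Phi (plus a constant)."""
    n, d = Phi.shape
    cols = [np.ones(n)]
    for deg in (1, 2, 3):
        for idx in combinations_with_replacement(range(d), deg):
            cols.append(np.prod(Phi[:, idx], axis=1))
    return np.column_stack(cols)

# Synthetic stand-ins: 3 intrinsic coordinates -> one step of the policy.
Phi = rng.normal(size=(400, 3))
u = Phi[:, 0] - 0.5 * Phi[:, 1] ** 2 + 0.1 * Phi[:, 0] * Phi[:, 2]

F = cubic_features(Phi)                       # 400 x 20 design matrix
coef, *_ = np.linalg.lstsq(F, u, rcond=None)  # cubic PR fit
u_hat = F @ coef
```

With three intrinsic coordinates this gives 20 regression coefficients per output, which is part of why prediction from the latent space is so much cheaper than from the original state space.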
For this second task, we note that when the control policy pushes against a constraint (the upper or lower bound on u*_k), predicting the augmented state accurately becomes infeasible because the transformation is not invertible. Figure 8d-f shows the mean squared error of our state predictions based on observations of control policies, for test points that are not against constraints.

Finally, we test the control laws we developed on-line using the full simulated CSTR model. Figure 9 shows the performance of our DMAPS-enabled, explicit nonlinear model predictive controller in comparison to the implicit (and therefore exact) MPC controller. At each time step, we use the state observation, x*_{α,k}, to predict φ̂_k using GP regression and then an ANN model to predict û*_k. The reference trajectory is open-loop unstable (see Figure 5), and white Gaussian noise is added to each state as a disturbance. Still, the controller can maintain the process near the reference value, as well as effectively track set-point changes, while closely matching the implicit controller.

| CONCLUSIONS AND FUTURE DIRECTIONS
In this paper, we have developed and demonstrated a data-driven approach to designing explicit model predictive controllers. We uncover intrinsic, low-dimensional structure in the high-dimensional control policies and state vectors that provides a link between state space and policy space. Our manifold learning method is agnostic to the particular parametrization of state space, and equally valid regardless of which state variables are available (as long as we have sufficiently many). Furthermore, because the similarity measure used by DMAPS is designed to favor similarities in policy space, predicting the entire control policy becomes about as easy as predicting just its first step. Like other approaches to explicit MPC, by developing functions between the state space and the control action, we avoid the need for on-line optimization. Although our tutorial demonstrations were for single-input, single-output control problems, our framework naturally generalizes to multiple-input, multiple-output systems; in fact, we showed that we can easily generate multiple outputs in that we can predict the full time series of control actions.

We showed how the DMAPS algorithm will identify redundant information in the control policy space, and thus can identify a reduced-order control law even when the augmented state space is densely sampled. The reduced-order model can arise from unbalanced coordinates, singular perturbation, or other redundant specifications of the system. As illustrated by the quadratically constrained problem, by directly learning the manifold on which the control policy lies, rather than the manifold on which the dynamics lie, we automatically detect cases where, even though the dynamics are low order, the control policy (and therefore the relevant MPC problem) is not low dimensional due to constraints.

Ensuring constraint satisfaction is a challenge for any approach that relies on function approximation.
For cases where it is feasible, we could use Dykstra's projection to project the control law into the feasible region, as recommended by Chen et al. [4].

Here, we assumed that we have access to the full system state, x, either via direct measurement or from a state estimator, at each sampling point. Inspired by delay embedding theorems, such as that of Takens [37], we expect that we do not need full state feedback to apply our methodology. Much as a state estimator synthesizes information from histories of measurements to "observe" the system state, we could directly "observe" the control policy using this information. In future work, we will investigate how to build a purely data-driven "policy observer," in which the MPC policy is predicted from measured data alone. We anticipate that, as long as the usual state observability conditions are satisfied, the control policy will be equally observable.

Alternatively, there may be situations where feedback is infrequent but actuation is comparatively fast. Here, we could use our framework to design an explicit controller that takes control action in between state observations. Our procedure is especially well-suited for such systems because we can easily estimate the full, N-step control policy, and not only the first step as in other explicit controllers that use function approximation. Finally, even if not accurate enough to implement, our data-inferred control policy can conceivably be used for "smarter" initialization of the optimization algorithm in implicit MPC.

We also demonstrated how, given observations of the control policy, we can predict the system state. From the perspective of process control, where the objective is developing a feedback control law, the inverse problem may not appear to be of interest. Yet, we can imagine a system in which the "feedback" is easy to observe while the system state is not. For example, we may consider an "expert human" controller, or a neural network controller.
Using this approach, we could design a state observer that uses control observations to determine the state, instead of the output measurements used in conventional state observers.

Acknowledgements
The assistance of Dr. Mahdi Kooshkbaghi with ANN training is gratefully acknowledged.

References

[1] Bernt M. Åkesson and Hannu T. Toivonen. A neural network model predictive controller. Journal of Process Control, 16(9):937–946, 2006.
[2] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.
[3] Alberto Bemporad, Manfred Morari, Vivek Dua, and Efstratios N. Pistikopoulos. The explicit linear quadratic regulator for constrained systems. Automatica, 38(1):3–20, 2002.
[4] Steven Chen, Kelsey Saulnier, Nikolay Atanasov, Daniel D. Lee, Vijay Kumar, George J. Pappas, and Manfred Morari. Approximating explicit model predictive control using constrained neural networks. In American Control Conference, pages 1520–1527, 2018.
[5] Panagiotis D. Christofides, Riccardo Scattolini, David Muñoz de la Peña, and Jinfeng Liu. Distributed model predictive control: A tutorial review and future research directions. Computers and Chemical Engineering, 51:21–41, 2013.
[6] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proceedings of the National Academy of Sciences of the United States of America, 102(21):7426–7431, 2005.
[7] Ronald R. Coifman and Stéphane Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006.
[8] Ronald R. Coifman and Stéphane Lafon. Geometric harmonics: A novel tool for multiscale out-of-sample extension of empirical functions. Applied and Computational Harmonic Analysis, 21(1):31–52, 2006.
[9] Nikolaos A. Diangelakis, Iosif S. Pappas, and Efstratios N. Pistikopoulos. On multiparametric/explicit NMPC for quadratically constrained problems. IFAC-PapersOnLine, 51(20):400–405, 2018.
[10] Luis F. Domínguez, Diogo A. Narciso, and Efstratios N. Pistikopoulos. Recent advances in multiparametric nonlinear programming. Computers and Chemical Engineering, 34(5):707–716, 2010.
[11] Carmeline J. Dsilva, Ronen Talmon, Ronald R. Coifman, and Ioannis G. Kevrekidis. Parsimonious representation of nonlinear dynamical systems through manifold learning: A chemotaxis case study. Applied and Computational Harmonic Analysis, pages 1–15, 2015.
[12] Carlos E. García, David M. Prett, and Manfred Morari. Model predictive control: Theory and practice–A survey. Automatica, 25(3):335–348, 1989.
[13] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer, New York, NY, USA, 2001.
[14] Alexander Holiday, Mahdi Kooshkbaghi, Juan M. Bello-Rivas, C. William Gear, Antonios Zagaris, and Ioannis G. Kevrekidis. Manifold learning for parameter reduction. Journal of Computational Physics, 392:419–431, 2019.
[15] K. J. Hunt, D. Sbarbaro, R. Żbikowski, and P. J. Gawthrop. Neural networks for control systems: A survey. Automatica, 28(6):1083–1112, 1992.
[16] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001–.
[17] Peter W. Jones, Mauro Maggioni, and Raanan Schul. Manifold parametrizations by eigenfunctions of the Laplacian and heat kernels. Proceedings of the National Academy of Sciences of the United States of America, 105(6):1803–1808, 2008.
[18] Benjamin Karg and Sergio Lucia. Efficient representation and approximation of model predictive control laws via deep learning. arXiv e-prints, arXiv:1806.10644, 2018.
[19] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv e-prints, arXiv:1412.6980, 2014.
[20] Dieter Kraft. A software package for sequential quadratic programming. Technical report, German Aerospace Center, Institute for Flight Mechanics, Cologne, Germany, 1988.
[21] Stéphane Lafon. Diffusion Maps and Geometric Harmonics. PhD thesis, Yale University, 2004.
[22] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
[23] David Q. Mayne. Model predictive control: Recent developments and future promise. Automatica, 50(12):2967–2986, 2014.
[24] David Q. Mayne, James B. Rawlings, Christopher V. Rao, and Pierre O. M. Scokaert. Constrained model predictive control: Stability and optimality. Automatica, 36(6):789–814, 2000.
[25] Edward S. Meadows and James B. Rawlings. Model predictive control. In Michael A. Henson and Dale E. Seborg, editors, Nonlinear Process Control, chapter 5, pages 233–310. Prentice Hall, Englewood Cliffs, NJ, 1996.
[26] Bruce C. Moore. Principal component analysis in linear systems: Controllability, observability, and model reduction. IEEE Transactions on Automatic Control, 26(1):17–32, 1981.
[27] Michael A. Nielsen. Neural Networks and Deep Learning. Determination Press, 2015.
[28] Richard Oberdieck, Nikolaos A. Diangelakis, and Efstratios N. Pistikopoulos. Explicit model predictive control: A connected-graph approach. Automatica, 76:103–112, 2017.
[29] Adam Paszke, Gregory Chanan, Zeming Lin, Sam Gross, Edward Yang, Luca Antiga, and Zachary DeVito. Automatic differentiation in PyTorch. Advances in Neural Information Processing Systems 30, pages 1–4, 2017.
[30] Karl Pearson. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572, 1901.
[31] S. Joe Qin and Thomas A. Badgwell. A survey of industrial model predictive control technology. Control Engineering Practice, 11:733–764, 2003.
[32] Neta Rabin and Ronald R. Coifman. Heterogeneous datasets representation and learning using diffusion maps and Laplacian pyramids. In Proceedings of the 2012 SIAM International Conference on Data Mining, pages 189–199, 2012.
[33] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2005.
[34] James B. Rawlings and Christos T. Maravelias. Bringing new technologies and approaches to the operation and control of chemical process systems. AIChE Journal, 65(6):e16615, 2019.
[35] Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, 2000.
[36] Riccardo Scattolini. Architectures for distributed and hierarchical model predictive control: A review. Journal of Process Control, 19(5):723–731, 2009.
[37] Floris Takens. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Warwick 1980, pages 366–381. Springer, 1981.
[38] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, 2000.
[39] Petter Tøndel, Tor Arne Johansen, and Alberto Bemporad. An algorithm for multi-parametric quadratic programming and explicit MPC solutions. Automatica, 39(3):489–497, 2003.
[40] L. J. P. van der Maaten and G. E. Hinton. Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.
[41] Inga J. Wolf and Wolfgang Marquardt. Fast NMPC schemes for regulatory and economic NMPC: A review. Journal of Process Control, 44:162–183, 2016.
[42] Melanie N. Zeilinger, Davide M. Raimondo, Alexander Domahidi, Manfred Morari, and Colin N. Jones. On real-time robust model predictive control. Automatica, 50(3):683–694, 2014.

A | DIFFUSION MAPS
DMAPS is an algorithm for manifold learning [6, 7]. We assume that the data points X ∈ ℝ^N all lie on a smooth low-dimensional manifold, M, such that dim(M) ≪ N. The algorithm works by finding data-driven approximations for eigenfunctions of the Laplace-Beltrami (i.e., diffusion) operator on M. One can show that these eigenfunctions provide an effective minimal parameterization of M [17].

To approximate the eigenfunctions, we find eigenvectors of the appropriately normalized graph Laplacian, which is constructed as follows. Given m observations of the data, define a kernel matrix W ∈ ℝ^{m×m}:

W_ij = exp(−d²_{i,j}/ε),  i, j = 1, ..., m    (13)

where d_{i,j} is a metric representing the distance between data points i and j (often the Euclidean distance, though in this work we use the input-output informed metric defined in Equation 4) and ε is the kernel scale. To account for nonuniform sampling, the kernel matrix is normalized with P_ii = Σ_{k=1}^m W_ik using:

W̃ = P^{−α} W P^{−α}    (14)

where for isotropic DMAPS, α = 1. Next, we define a diagonal matrix from the row sums of the normalized kernel matrix:

D_ii = Σ_j W̃_ij    (15)

Finally, we construct the Markov transition matrix:

A = D^{−1} W̃    (16)

which is the desired graph Laplacian. It has been shown that in the limit as ε → 0 and m → ∞, A converges to the Laplace-Beltrami operator on M [7]. Because A is a Markov matrix, its eigenvalues λ_i are real-valued and lie between 0 and 1, and its eigenvectors φ_i (stacked column-wise into Φ by convention) are real-valued; the trivial (constant) eigenvector φ₀ has eigenvalue λ₀ = 1. We discard φ₀ and sort the remaining eigenvectors φ_i, i = 1, ..., m − 1, in order of descending eigenvalue. The first several eigenvectors provide an effective parameterization of the manifold, which we call the DMAPS embedding of the data. Sometimes, to emphasize a spectral gap, we will use φ_i λ_i^k (where k = 1, 2, ...
) as the DMAPS embedding.

Some of the DMAPS eigenvectors may simply be harmonics that provide no new information about M. These can be safely discarded either by visual inspection or, more systematically, by using local linear regression to test whether a new eigenvector can be predicted from the previous eigenvectors. Using local linear regression, Dsilva et al. [11] define a relative leave-one-out cross-validation residual for each eigenvector:

R_k = [ Σ_{i=1}^n ( φ_k(i) − ( α̂_k(i) + β̂_k(i)^T Φ_{k−1}(i) ) )² / Σ_{i=1}^n φ_k(i)² ]^{1/2}    (17)

where α̂ and β̂ are the coefficients from the regression and Φ_{k−1} is a matrix containing the first k − 1 eigenvectors. If R_k ≪ 1, then φ_k can be predicted from the previous eigenvectors and therefore provides only redundant information.

B | NONISOTHERMAL CSTR MODEL
The MPC controller for the CSTR is based on a mechanistic model of a constant-density reactor (cf. [25]). The model equations are:

Ċ_A = (q/V)(C_A0 − C_A) − k₁C_A
Ṫ_r = (q/V)(T₀ − T_r) − (ΔH/(ρC_p))k₁C_A + (UA/(ρC_pV))(T_c − T_r)
C_B = C_A0 − C_A    (18)

C_i for i ∈ {A, B} are the species concentrations; C_A0 is the concentration of species A at the inlet (the concentration of species B at the inlet is zero). T_r is the reactor temperature, T₀ is the inlet temperature, and T_c is the cooling water temperature; T_c is used here as the manipulated variable. q is the flow rate, V is the reactor volume, C_p is the heat capacity, ρ is the density, U is a heat transfer coefficient, A is the heat transfer area, and ΔH is the reaction enthalpy (note that ΔH is negative, as the reaction is exothermic). The reaction rate constant, k₁, is given by an Arrhenius expression:

k₁ = k₀ e^{−E/(RT_r)}    (19)

where k₀ is the pre-exponential factor, E is the activation energy, and R is the gas constant. Parameter values (E/R, k₀, ΔH/(ρC_p), UA/(ρC_pV), q/V, C_A0, and T₀) are given in Table 1.

TABLE 1 Parameter values for the CSTR model.

To improve numerical performance, we nondimensionalized the concentration variables, Ĉ_i for i ∈ {A, B}, by a reference concentration (in mol L⁻¹), and the temperature variables, T̂_i for i ∈ {r, c}, by shifting and rescaling with reference temperatures (in K). All of the results shown in Section 3 use the nondimensionalized variables.

C | MPC CONTROLLER
To build the datasets U*, X*_α, X*_β, and X*_γ, we designed a fully nonlinear MPC controller. The system was discretized with a sampling time of 0.05 s. The control policy was obtained from the current system state by solving the following optimization problem:

min_u V(x_k, u) = Σ_{i=k}^{k+19} (C_{B,i} − r)²
s.t.  u_i ≤ u_max,              i = k, k+1, ..., k+19
      u_i ≥ u_min,              i = k, k+1, ..., k+19
      |u_i − u_{i+1}| ≤ Δu_max,  i = k, k+1, ..., k+19    (20)

which corresponds to a control horizon of 20 time steps (1.0 s of continuous time), constraints on the maximum and minimum values of u, and a constraint on the rate of change of u. In Equation 20, x = [C_A  T_r] and u_k = T_{c,k}. The optimization problem was solved using a sequential quadratic programming algorithm as implemented in the SciPy numerical computing package [16, 20].

The system was initialized at random states sampled uniformly over ranges of C_A and T_r, using random constant references r.
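This off-line policy generation can be sketched with SciPy's SLSQP implementation [16, 20], as below; the plant parameters, constraint bounds, and (shortened) horizon here are illustrative placeholders, not the values used in the paper:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative plant parameters and bounds (placeholders, NOT the paper's values).
QV, DH_RCP, UA_RCPV = 1.0, -16.0, 1.0
K0, E_R, CA0, T0, DT = 7.2e10, 8750.0, 1.0, 350.0, 0.05

def rhs(x, u):
    """CSTR right-hand side (Equation 18); u is the coolant temperature T_c."""
    CA, Tr = x
    k1 = K0 * np.exp(-E_R / Tr)                      # Arrhenius, Equation 19
    dCA = QV * (CA0 - CA) - k1 * CA
    dTr = QV * (T0 - Tr) - DH_RCP * k1 * CA + UA_RCPV * (u - Tr)
    return np.array([dCA, dTr])

def simulate_CB(x0, u_seq):
    """Forward-Euler rollout returning the C_B trajectory (C_B = C_A0 - C_A)."""
    x, cb = np.array(x0, dtype=float), []
    for u in u_seq:
        x = x + DT * rhs(x, u)
        cb.append(CA0 - x[0])
    return np.array(cb)

def mpc_policy(x0, r, horizon=5, u_lo=300.0, u_hi=400.0, du_max=5.0):
    """Solve a small instance of Equation 20 with SLSQP (squared tracking cost,
    box constraints via bounds, rate-of-change constraints via inequalities)."""
    cost = lambda u: float(np.sum((simulate_CB(x0, u) - r) ** 2))
    cons = []
    for i in range(horizon - 1):
        cons.append({"type": "ineq", "fun": lambda u, i=i: du_max - (u[i + 1] - u[i])})
        cons.append({"type": "ineq", "fun": lambda u, i=i: du_max + (u[i + 1] - u[i])})
    res = minimize(cost, np.full(horizon, 350.0), method="SLSQP",
                   bounds=[(u_lo, u_hi)] * horizon, constraints=cons)
    return res.x, res.fun

u_star, cost_star = mpc_policy(x0=[0.9, 350.0], r=0.3)
```

Solving this repeatedly from many random initial states and references, and stacking the resulting u* sequences row-wise, produces the policy matrix that the manifold learning step consumes.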