Feedback Linearization based on Gaussian Processes with event-triggered Online Learning
Jonas Umlauft, Student Member, IEEE, Sandra Hirche, Senior Member, IEEE
Abstract—Combining control engineering with nonparametric modeling techniques from machine learning allows controlling systems without an analytic description using data-driven models. Most existing approaches separate learning, i.e. the system identification based on a fixed dataset, and control, i.e. the execution of the model-based control law. This separation makes the performance highly sensitive to the initial selection of training data and possibly requires very large datasets. This article proposes a learning feedback linearizing control law using online closed-loop identification. The employed Gaussian process model updates its training data only if the model uncertainty becomes too large. This event-triggered online learning ensures high data efficiency and thereby reduces the computational complexity, which is a major barrier for using Gaussian processes under real-time constraints. We propose safe forgetting strategies for data points to adhere to a budget constraint and to further increase data efficiency. We show asymptotic stability of the tracking error under the proposed event-triggering law and illustrate the effective identification and control in simulation.
Index Terms—adaptive control, machine learning, switched systems, uncertain systems, closed-loop identification, data-driven control, online learning, Gaussian processes, event-based control
I. INTRODUCTION

DATA-DRIVEN control has gained high attention as the costs for measuring, processing and storing data are rapidly decreasing and control engineering is increasingly applied in areas where it is difficult to describe the plant using first principles. Nevertheless, a precise system description is essential for many modern model-based control algorithms, e.g. model predictive control and feedback linearization. Classical system identification using parametric models like autoregressive moving average (ARMA) or Hammerstein models [1] reaches its limits when the choice of a suitable model class is cumbersome or impossible, e.g. in systems where human behavior is part of the control loop. That is where data-driven nonparametric models have their advantages, as only minimal prior knowledge is required and they allow higher flexibility than parametric models. This article particularly considers Gaussian processes (GPs), which are well recognized in machine learning and control for modeling complex dynamics [2]. The Bayesian background
All authors are members of the Chair of Information-oriented Control, Department of Electrical and Computer Engineering, Technical University of Munich, D-80333 Munich, Germany, fax: +498928928340, telephone: +498928923403, [jonas.umlauft, hirche]@tum.de
The research leading to these results has received funding from the European Research Council under the European Union Seventh Framework Program (FP7/2007-2013) / ERC Starting Grant "Control based on Human Models (conhumo)", agreement no. 337654. The work has been submitted to the IEEE Transactions on Automatic Control on Dec 10th, 2018.

[Fig. 1 block diagram: unknown plant; model-based control (Sec. IV); data-driven model (Sec. III); event trigger (Sec. V); signals: data, desired trajectory, output.]
Fig. 1. Proposed concept of an online learning control law with event-triggered model updates.

allows an implicit bias-variance trade-off [3] and, as GPs are a kernel-based method, prior knowledge (if any exists) can properly be transferred into the model [4]. The major advantage is that GP models also encode their own ignorance and therefore provide information whether the model is reliable for particular inputs or not. Due to their nonparametric nature, the model complexity of a GP increases with the number of available data. This possibly unlimited expressive power is generally desired; however, it may cause difficulties from a computational point of view in the case of large training datasets. Particularly challenging are online learning schemes where data points are accumulated over time and real-time capability is critical. This raises the question of efficient online learning strategies for nonparametric models. Time-triggered model adaptation fails to distinguish whether a new measurement or training point is necessary at the current location of the state space or not. This calls for an event-triggered scheme, which decides upon a new measurement based on the current reliability of the model and is expected to result in higher data efficiency.
A. Related work
The fact that no model can initially capture all aspects of the true system motivated robust and adaptive control methods to overcome this discrepancy [5]. The online adaptation of the control strategy or its employed model is well understood for parametric models [6], [7]. In particular for linear systems, data-driven approaches are extensively researched, see [8] and [9]. For nonlinear systems, model reference adaptive control (MRAC) is designed to effectively deal with model uncertainties or only little prior knowledge using online parameter estimation [10]. Iterative learning control (ILC) improves control performance by iteratively modulating the control signal in a repetitive task, such that experience from earlier executions is used to improve performance, see [11] and [12]. However, most existing MRAC and ILC methods are mainly based on parametric models, which suffer from limited complexity and flexibility. Model-free adaptive control (MFAC) avoids an explicit model but instead employs e.g. a dynamic linearization [13], virtual reference feedback tuning [14] or a closed-loop control parameter optimization [15]. Alternatively, a spectral analysis for nonparametric frequency-domain tuning is considered in [16] or extremum seeking is employed for performance optimization [17]. Other than in classical control theory, the machine learning literature more frequently employs data-driven models with infinite expressive power for online adaptation [18]. The class of model-based reinforcement learning algorithms considers continuous model and controller updates to maximize a reward [19]. For example, [20] shows a high data efficiency with Gaussian process models.
These are also successfully applied in robotics [21], [22]; however, most approaches miss a formal stability analysis for the system's behavior. Very recently, several control approaches with formal guarantees for GP models have been developed which, however, keep a fixed dataset during execution of the control law [23], [24]. The work in [25] considers the control of Lagrangian systems and shows boundedness of the tracking error. The identification of a priori known stable systems with GPs is analyzed in [26], and [27] proposes an uncertainty-based control approach for which asymptotic stability is proven. However, none of these techniques updates the model while controlling the system. The work in [28] proposes a safe exploration by sequentially adding training points to the dataset, but it only stays within the region of attraction and cannot track an arbitrary trajectory in the state space. An online learning tracking control law with time-triggered adaptation is proposed in [29]. As a result, data points are added to the training dataset irrespective of their importance. This might compromise real-time capability, as the computational inefficiency for large datasets is a known challenge of GPs [3]. This difficulty is circumvented in [30], [31], where the unknown dynamics are estimated using high-gain filters. However, these approaches suffer from the known difficulties of high-gain control, i.e. a quick saturation of input signals and the amplification of noise. The latter is avoidable by combining feedback and model-based feedforward control. This idea is not just employed in this article, but was also used in [32], where a neural network identifies the dynamics without any parametric prior knowledge. Particularly [33], [34] and [35] focus on stability and performance guarantees. The work in [36] proposes a feedback linearizing control law, which adapts the weights of a neural network model online.
It shows boundedness of the adaptation law and the resulting controller, but cannot quantify the ultimate bound because neural networks, in comparison to GPs, do not inherently provide a measure for the fidelity of the model [37]. This becomes important if the controller is applied in safety-critical domains, where the tracking error must be quantified to avoid failure or damage to the system. In summary, to date there exists no approach which adapts a nonparametric model online to guarantee asymptotic stability of the tracking error. Thus, for universal models, which can represent arbitrarily complex dynamics, online learning control laws that guarantee safe behavior of the closed-loop system are missing. Also, a data-efficient update strategy is required to keep the model computationally efficient, which is important in many real-time critical applications.

B. Contribution and structure
The main contribution of this article is an online learning feedback linearizing control law based on Gaussian processes for an initially unknown system. This control algorithm includes a closed-loop identification scheme for control affine systems exploiting compound kernels for GPs. To ensure data efficiency of the approach, we propose an event-triggered online learning mechanism which decides upon a model update based on its current reliability. The derivation is based on a probabilistic upper bound for the model error of a GP and allows to provide safety guarantees in terms of convergence properties of the closed-loop system. For noiseless training data, we show global asymptotic stability and, for noisy output training data, global ultimate boundedness of the tracking error. For the case of a constrained budget for data points, we propose a forgetting strategy which maintains the convergence guarantees using a reduced number of training points. The article is based on the preliminary work in [38], which focuses on the identification of a control affine system with GPs given a fixed dataset. In contrast, this work considers the online collection of data and updates the model while the control law is active. This allows to show asymptotic stability with a data-efficient event-triggered update rule, while [38] only showed the existence of an ultimate bound. This article is structured as follows: After formulating the considered problem formally in Sec. II, Sec. III reviews the identification of control affine systems based on GPs. In Sec. IV, the feedback linearizing tracking control law is proposed, including a convergence analysis for training data measured online at arbitrary time instances. Section V introduces an event trigger to update the model based on its uncertainty. A numerical illustration is provided in Sec. VI, followed by a conclusion in Sec. VII.
C. Notation
Lower/upper case bold symbols denote vectors/matrices, $\mathbb{R}_{+,0}$/$\mathbb{R}_+$ all real positive numbers with/without zero, $\mathbb{N}_0$/$\mathbb{N}$ all natural numbers with/without zero, $\sigma_{\min}(\cdot)$, $\sigma_{\max}(\cdot)$ the minimal/maximal singular value of a matrix and $\mathrm{E}[\cdot]$/$\mathrm{V}[\cdot]$ the expected value/variance of a random variable, respectively. $\boldsymbol{I}_n$ denotes the $n \times n$ identity matrix, $\mathcal{N}(\mu, \sigma^2)$ a Gaussian distribution with mean $\mu$ and variance $\sigma^2$, $\boldsymbol{a}_{1:n}$ the first $n$ elements of the vector $\boldsymbol{a}$, $\succ 0$ the positive definiteness of a matrix or function and $\|\cdot\|$ the Euclidean norm if not stated otherwise.

II. PROBLEM FORMULATION
Consider a single-input system in the controllable canonical form
$$\dot{x}_1 = x_2, \quad \dot{x}_2 = x_3, \quad \ldots, \quad \dot{x}_n = f(\boldsymbol{x}) + g(\boldsymbol{x})u, \quad \boldsymbol{x}_0 = \boldsymbol{x}(0), \qquad (1)$$
with state $\boldsymbol{x} = [x_1\; x_2\; \cdots\; x_n]^\top \in \mathcal{X} \subseteq \mathbb{R}^n$ and input $u \in \mathcal{U} = \mathbb{R}$; the functions $f(\cdot)$ and $g(\cdot)$ are considered unknown. The following assumptions are made.

Assumption 1:
The unknown functions $f: \mathcal{X} \to \mathbb{R}$ and $g: \mathcal{X} \to \mathbb{R}$ are globally bounded and differentiable.

Differentiability is a very natural assumption, as it holds for many physical systems. The boundedness of the functions $f(\cdot)$, $g(\cdot)$ would automatically be implied (due to the differentiability) if the set $\mathcal{X}$ was bounded. However, we want $\mathcal{X}$ to be possibly unbounded. From Assumption 1, the first property is derived.

Lemma 1:
Consider the system (1) under Assumption 1 with bounded and continuous $u(\boldsymbol{x})$. Then the solution $\boldsymbol{x}(t)$ does not have a finite escape time, thus there exists no $t_\infty$, $0 < t_\infty < \infty$, for which
$$\lim_{t \to t_\infty} \|\boldsymbol{x}(t)\| = \infty. \qquad (2)$$

Proof:
According to [39, Theorem 3.2], the stated conditions ensure a unique solution $\boldsymbol{x}(t)$ for all $t > 0$, for which the absence of a finite escape time follows from the differentiability of $f(\cdot)$, $g(\cdot)$ and the bounded control input.

As a stabilizing controller is not known in advance (because $f(\cdot)$, $g(\cdot)$ are unknown), the absence of a finite escape time is important: It allows to collect observations of the system in any finite time interval with a "poor" controller (or also $u(\boldsymbol{x}) = 0$) without risking damage due to "infinite" states. Additionally, we assume the following.

Assumption 2:
For system (1), $g(\boldsymbol{x}) > 0$ holds $\forall \boldsymbol{x} \in \mathcal{X}$.

This ensures that the system's relative degree is equal to the system order $n$ for all $\boldsymbol{x} \in \mathcal{X}$ and that the sign of $g(\cdot)$ is known. Equivalently, $g(\cdot)$ can also be taken as strictly negative, resulting in a change of sign for the control input. Assumption 2 is necessary to ensure global controllability and excludes the existence of internal dynamics. It restricts the system class; however, the focus of this work is on the online learning control, and extending it to larger system classes is part of future work. We assume that observations are taken online while the proposed control law is active.

Assumption 3:
Noiseless measurements of the state vector $\boldsymbol{x}^{(\kappa)} = \boldsymbol{x}(t_\kappa)$ and noisy measurements of the highest derivative $y^{(\kappa)} = \dot{x}_n(t_\kappa) + \epsilon^{(\kappa)}$ can be taken at arbitrary time instances $t_\kappa$ with $\kappa \in \mathbb{N}_0$. The observation noise $\epsilon^{(\kappa)} \sim \mathcal{N}(0, \sigma_{\mathrm{on}}^2)$ is assumed Gaussian, independent and identically distributed. The time-varying dataset
$$\mathbb{D}_\kappa = \left\{ \boldsymbol{x}^{(i)}, y^{(i)} \right\}_{i=1}^{N_\kappa} \qquad (3)$$
is updated at time $t_\kappa$ and remains constant until $t_{\kappa+1}$, and $N_\kappa \in \mathbb{N}_0$ denotes the current number of data points.

The exact measurement of the state is a common assumption and necessary for feedback linearization. The time derivative of the state $x_n$ can, for practical applications, be approximated through finite differences. The approximation error is then considered as part of the measurement noise, as other additive sources of imprecision result in an overall sub-Gaussian noise distribution. Alternatively, a separate sensor for measurements of $\dot{x}_n$ is necessary. Throughout this article, we will refer to $\sigma_{\mathrm{on}}^2 = 0$ as the noiseless case and $\sigma_{\mathrm{on}}^2 > 0$ as the noisy case considering measurements of $\dot{x}_n$. The measurement of the state $\boldsymbol{x}$ will always be assumed noise free. Consider that $N_\kappa$ is not necessarily increasing with increasing $\kappa$, as data pairs can also be discarded from the dataset if not needed anymore. However, the set $\mathbb{D}_\kappa$ remains constant between two consecutive measurements, because elements are only added or removed at $t_\kappa$. The goal is to design an online learning feedback linearizing control law, based on the dataset $\mathbb{D}_\kappa$, of the form
$$u_\kappa(\boldsymbol{x}) = \frac{1}{\hat{g}_\kappa(\boldsymbol{x})}\left( -\hat{f}_\kappa(\boldsymbol{x}) + \nu \right), \quad \kappa \in \mathbb{N}_0, \qquad (4)$$
where $\nu \in \mathbb{R}$ is the input to the resulting approximately linearized system and the functions $\hat{f}_\kappa: \mathcal{X} \to \mathbb{R}$, $\hat{g}_\kappa: \mathcal{X} \to \mathbb{R}$ are the approximations for the unknown functions $f(\boldsymbol{x})$, $g(\boldsymbol{x})$.
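To make the structure of (4) concrete, a minimal sketch follows; the callables `f_hat` and `g_hat` are hypothetical stand-ins for the GP estimates derived in Sec. III, and the concrete functions below are illustrative only:

```python
import numpy as np

def feedback_linearizing_control(x, f_hat, g_hat, nu):
    """Control law (4): u = (-f_hat(x) + nu) / g_hat(x).

    g_hat must be kept strictly positive (Assumption 2 / Lemma 2),
    so the division is well defined for all x.
    """
    return (-f_hat(x) + nu) / g_hat(x)

# Hypothetical stand-ins for the GP mean estimates (illustration only):
f_hat = lambda x: np.sin(x[0])
g_hat = lambda x: 2.0 + 0.1 * x[0]**2   # strictly positive by construction

x = np.array([0.5, 0.0])
u = feedback_linearizing_control(x, f_hat, g_hat, nu=1.0)
# with a perfect model, applying u inverts it: f_hat(x) + g_hat(x)*u == nu
```

If the model were exact, the closed loop would reduce to the chain of integrators in (1) driven by $\nu$; the remainder of the article quantifies what happens when it is not.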
The control law (4) is switching, because the model $\hat{f}_\kappa(\boldsymbol{x})$, $\hat{g}_\kappa(\boldsymbol{x})$ is updated with every change of the dataset $\mathbb{D}_\kappa$ at time $t_\kappa$. We would like to emphasize that measurements are not taken at a constant time interval, and updates are therefore not performed periodically. Instead, the updates will be performed when needed, i.e. triggered by an event (introduced in Sec. V), and thus the $t_\kappa$ for $\kappa \in \mathbb{N}_0$ are not equidistant. By definition, the $\kappa$-th update occurs at $t_\kappa$ and the control law $u_\kappa$ is then applied until the next event at $t_{\kappa+1}$, more formally written as
$$u(\boldsymbol{x}) = u_\kappa(\boldsymbol{x}), \quad t \in [t_\kappa, t_{\kappa+1}). \qquad (5)$$

III. GAUSSIAN PROCESS LEARNING FOR CONTROL AFFINE SYSTEMS
For the closed-loop online identification of $f(\cdot)$ and $g(\cdot)$, we consider Gaussian process regression, which then provides the approximations $\hat{f}_\kappa(\cdot)$ and $\hat{g}_\kappa(\cdot)$. We will first introduce GP regression in general (Sec. III-A), before presenting our tailored solution for control affine closed-loop systems in Sec. III-B.

A. Gaussian process regression
Consider a function $f_{\mathrm{true}}: \mathcal{X} \to \mathbb{R}$ for which noisy measurements of the image at the locations $\boldsymbol{x}^{(i)} \in \mathcal{X}$ are available, thus
$$y_f^{(i)} = f_{\mathrm{true}}\left(\boldsymbol{x}^{(i)}\right) + \epsilon^{(i)}, \qquad (6)$$
where $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma_{\mathrm{on}}^2)$ and $i = 1, \ldots, N$ (where we simply write $N$ for $N_\kappa$ in this section). Modeling this function with a Gaussian process $f_{\mathrm{GP}}(\boldsymbol{x})$ results in a stochastic process which assigns a Gaussian distribution to any finite subset $\{\boldsymbol{x}_1, \ldots, \boldsymbol{x}_M\} \subset \mathcal{X}$ of a continuous domain. The GP is also often considered as a distribution over functions [3], denoted by
$$f_{\mathrm{GP}}(\boldsymbol{x}) \sim \mathcal{GP}(m(\boldsymbol{x}), k(\boldsymbol{x}, \boldsymbol{x}')), \qquad (7)$$
and is fully specified by a mean function $m(\boldsymbol{x}): \mathcal{X} \to \mathbb{R}$ and a covariance function $k(\boldsymbol{x}, \boldsymbol{x}'): \mathcal{X} \times \mathcal{X} \to \mathbb{R}$. The mean function includes prior knowledge of the function $f_{\mathrm{true}}$ if there is any. Otherwise, it is commonly set to zero. The covariance function, also called kernel function, determines properties of $f_{\mathrm{GP}}(\boldsymbol{x})$, like the smoothness and signal variance. Mean and kernel function are described by the hyperparameters $\boldsymbol{\psi}$. Using Bayesian techniques, the likelihood function
$$\boldsymbol{\psi}^* = \arg\max_{\boldsymbol{\psi}} \log p(\boldsymbol{y}_f \mid \boldsymbol{X}, \boldsymbol{\psi}), \qquad (8)$$
$$\log p(\boldsymbol{y}_f \mid \boldsymbol{X}, \boldsymbol{\psi}) = -\frac{1}{2}\left( \boldsymbol{y}_f^\top \boldsymbol{K}^{-1} \boldsymbol{y}_f + \log\det \boldsymbol{K} + N \log(2\pi) \right),$$
is maximized to obtain the optimal hyperparameters for a given set of observations. As notation we use
$$\boldsymbol{X} = \left[ \boldsymbol{x}^{(1)} \cdots \boldsymbol{x}^{(N)} \right] \in \mathbb{R}^{n \times N}, \qquad (9)$$
$$\boldsymbol{y}_f = \left[ y_f^{(1)} \cdots y_f^{(N)} \right]^\top \in \mathbb{R}^N, \qquad (10)$$
to denote the input/output data, respectively, and
$$\boldsymbol{K} = \begin{bmatrix} k\left(\boldsymbol{x}^{(1)}, \boldsymbol{x}^{(1)}\right) & \cdots & k\left(\boldsymbol{x}^{(1)}, \boldsymbol{x}^{(N)}\right) \\ \vdots & \ddots & \vdots \\ k\left(\boldsymbol{x}^{(N)}, \boldsymbol{x}^{(1)}\right) & \cdots & k\left(\boldsymbol{x}^{(N)}, \boldsymbol{x}^{(N)}\right) \end{bmatrix} \in \mathbb{R}^{N \times N} \qquad (11)$$
concatenates kernel evaluations of pairs of input data.
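As a concrete sketch of (8)-(11), the following computes the Gram matrix and the log likelihood for a zero-mean GP with an isotropic SE kernel; the function names are ours, not the paper's, and a Cholesky factorization is used for numerical stability:

```python
import numpy as np

def se_kernel(x1, x2, sf2=1.0, ell=1.0):
    # squared exponential kernel k(x, x') = sf2 * exp(-||x - x'||^2 / (2 ell^2))
    d = np.atleast_1d(x1) - np.atleast_1d(x2)
    return sf2 * np.exp(-0.5 * np.dot(d, d) / ell**2)

def log_marginal_likelihood(X, y, sf2, ell, sn2):
    """log p(y | X, psi) for a zero-mean GP, cf. (8):
    -0.5 * (y^T Ky^{-1} y + log det Ky + N log(2 pi)),  Ky = K + sn2 * I."""
    N = X.shape[0]
    K = np.array([[se_kernel(X[i], X[j], sf2, ell) for j in range(N)]
                  for i in range(N)])                  # Gram matrix, cf. (11)
    L = np.linalg.cholesky(K + sn2 * np.eye(N))        # numerically stable
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * (y @ alpha) - np.sum(np.log(np.diag(L)))
            - 0.5 * N * np.log(2.0 * np.pi))
```

Hyperparameter optimization as in (8) would maximize this quantity over `(sf2, ell, sn2)`, e.g. with a gradient-based optimizer.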
Although the optimization (8) is generally non-convex, it is usually performed with conjugate gradient-based methods [3]. Each local minimum can be considered as a different interpretation of the data, and we discuss the effect of suboptimal identification in Sec. III-C. In a regression task, GPs employ the joint Gaussian distribution of the training data $\boldsymbol{X}$, $\boldsymbol{y}_f$ and a test input $\boldsymbol{x}^*$
$$\begin{bmatrix} f_{\mathrm{GP}}(\boldsymbol{x}^*) \\ \boldsymbol{y}_f \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} m(\boldsymbol{x}^*) \\ \boldsymbol{m}_X \end{bmatrix}, \begin{bmatrix} k^* & \boldsymbol{k}^\top \\ \boldsymbol{k} & \boldsymbol{K} + \sigma_{\mathrm{on}}^2 \boldsymbol{I}_N \end{bmatrix} \right), \qquad (12)$$
where
$$\boldsymbol{m}_X = \left[ m\left(\boldsymbol{x}^{(1)}\right) \cdots m\left(\boldsymbol{x}^{(N)}\right) \right]^\top, \qquad (13)$$
to find the posterior mean and variance function
$$\mu(\boldsymbol{x}^*) := \mathrm{E}\left[ f_{\mathrm{GP}}(\boldsymbol{x}^*) \mid \boldsymbol{X}, \boldsymbol{y}_f \right] = m(\boldsymbol{x}^*) + \boldsymbol{k}^\top \left(\boldsymbol{K} + \sigma_{\mathrm{on}}^2 \boldsymbol{I}_N\right)^{-1} \left(\boldsymbol{y}_f - \boldsymbol{m}_X\right), \qquad (14)$$
$$\sigma^2(\boldsymbol{x}^*) := \mathrm{V}\left[ f_{\mathrm{GP}}(\boldsymbol{x}^*) \mid \boldsymbol{X}, \boldsymbol{y}_f \right] = k^* - \boldsymbol{k}^\top \left(\boldsymbol{K} + \sigma_{\mathrm{on}}^2 \boldsymbol{I}_N\right)^{-1} \boldsymbol{k}, \qquad (15)$$
through conditioning, where
$$k^* = k(\boldsymbol{x}^*, \boldsymbol{x}^*), \quad \boldsymbol{k} = \left[ k\left(\boldsymbol{x}^{(1)}, \boldsymbol{x}^*\right) \cdots k\left(\boldsymbol{x}^{(N)}, \boldsymbol{x}^*\right) \right]^\top \in \mathbb{R}^N. \qquad (16)$$
However, considering the problem defined in Sec. II, the classical GP regression framework cannot be directly applied, because closed-loop measurements do not provide data points for $f(\cdot)$ and $g(\cdot)$ separately. Therefore, the following section explains how it is augmented using the given prior knowledge.

B. Closed-loop identification with prior knowledge
First, we transfer the knowledge of the positivity of the function $g(\boldsymbol{x})$ from Assumption 2 into the model $\hat{g}(\boldsymbol{x})$. It is crucial to utilize this knowledge for the model to ensure that the feedback linearizing control (4) results in well-behaved control signals. Using a GP model for $\hat{g}(\boldsymbol{x})$, this can be ensured using a proper prior mean function.

Lemma 2:
Consider the posterior mean function (14) with a bounded and differentiable kernel $k(\cdot,\cdot)$ and a dataset $(\boldsymbol{X}, \boldsymbol{y}_f)$ for which $\boldsymbol{x}^{(i)} \neq \boldsymbol{x}^{(i')}$ and $y_f^{(i)} > 0$ hold $\forall i, i' = 1, \ldots, N$, $i \neq i'$. Then, there exists a differentiable prior mean function $m(\boldsymbol{x})$ such that
$$\mu(\boldsymbol{x}) > 0 \quad \forall \boldsymbol{x} \in \mathcal{X}. \qquad (17)$$

Proof:
Consider a prior mean function for which $0 < m\left(\boldsymbol{x}^{(i)}\right) < \infty$ holds $\forall i = 1, \ldots, N$. Then, a differentiable $m(\boldsymbol{x})$ can, $\forall \boldsymbol{x} \in \mathcal{X} \setminus \{\boldsymbol{x}^{(1)}, \ldots, \boldsymbol{x}^{(N)}\}$, always be chosen larger than the constant $\boldsymbol{k}^\top \left(\boldsymbol{K} + \sigma_{\mathrm{on}}^2 \boldsymbol{I}_N\right)^{-1} \left(\boldsymbol{y}_f - \boldsymbol{m}_X\right)$, because the latter is bounded. For $\boldsymbol{x} \in \{\boldsymbol{x}^{(1)}, \ldots, \boldsymbol{x}^{(N)}\}$, the choice $m\left(\boldsymbol{x}^{(i)}\right) = y_f^{(i)}$ (which complies with the first condition) ensures that $\mu(\boldsymbol{x})$ is strictly positive.

Remark 1:
Since $g(\cdot)$ is strictly positive by Assumption 2, the condition $y_f^{(i)} > 0$ follows naturally. In case the Gaussian noise results in negative measurements $y^{(i)}$, it can be corrected using $\max(y^{(i)}, \eta)$, with an arbitrarily small $\eta > 0$. Alternatively, strictly positive noise distributions, e.g. a Gamma distribution, can also be combined with Gaussian process regression [3]. In practice, it is often sufficient to set $m(\boldsymbol{x})$ to a positive constant. To verify that $\mu(\boldsymbol{x}) > 0$ holds, the techniques in [28] can be utilized. The suitable prior mean function according to Lemma 2 will be denoted by $m_g(\boldsymbol{x})$.

Second, the major difficulty of closed-loop identification is to differentiate the effect of the control input and the unforced dynamics. For the control affine structure, this means that individual measurements of the functions $f(\cdot)$ and $g(\cdot)$ from (1) are not provided. Thus, the functions $f(\cdot)$, $g(\cdot)$ must be identified from only observing their sum, exploiting the control affine structure. We propose to utilize compound kernels as reviewed in Appendix A based on [4]. More specifically, we use the composite kernel
$$k(\boldsymbol{x}, \boldsymbol{x}') = k_f(\boldsymbol{x}, \boldsymbol{x}') + u(\boldsymbol{x})\, k_g(\boldsymbol{x}, \boldsymbol{x}')\, u(\boldsymbol{x}'), \qquad (18)$$
which replicates the structure of a control affine system: the first summand $k_f(\cdot,\cdot)$ represents the unknown unforced dynamics $f(\cdot)$; the second summand $u(\cdot) k_g(\cdot,\cdot) u(\cdot)$ the product of the unknown scaling of the control $g(\cdot)$ and the known state feedback control term $u(\cdot)$. As no further knowledge regarding $f(\cdot)$, $g(\cdot)$ is given, we employ two squared exponential (SE) kernels with automatic relevance determination
$$k_f(\boldsymbol{x}, \boldsymbol{x}') = \sigma_f^2 \exp\left( -\sum_{j=1}^{n} \frac{(x_j - x_j')^2}{2 l_{j,f}^2} \right), \qquad (19)$$
$$k_g(\boldsymbol{x}, \boldsymbol{x}') = \sigma_g^2 \exp\left( -\sum_{j=1}^{n} \frac{(x_j - x_j')^2}{2 l_{j,g}^2} \right), \qquad (20)$$
where the hyperparameters are the length-scales $l_{j,f}, l_{j,g} \in \mathbb{R}_+$, $j = 1, \ldots, n$, and the signal variances $\sigma_f^2, \sigma_g^2 \in \mathbb{R}_+$.
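A sketch of the compound kernel (18) built from the two ARD-SE components (19), (20); the dictionary-based hyperparameter handling is our illustrative bookkeeping, not the paper's notation:

```python
import numpy as np

def ard_se(x, xp, sf2, ell):
    # ARD squared exponential, cf. (19)/(20):
    # k(x, x') = sf2 * exp(-sum_j (x_j - x'_j)^2 / (2 ell_j^2))
    x, xp, ell = np.asarray(x), np.asarray(xp), np.asarray(ell)
    return sf2 * np.exp(-0.5 * np.sum((x - xp) ** 2 / ell ** 2))

def compound_kernel(x, xp, u, psi):
    """Compound kernel (18): k(x, x') = k_f(x, x') + u(x) k_g(x, x') u(x').

    u is the (known) control law active at the respective input; the
    hyperparameter dictionary psi is our illustrative choice (cf. (21))."""
    kf = ard_se(x, xp, psi['sf2_f'], psi['ell_f'])
    kg = ard_se(x, xp, psi['sf2_g'], psi['ell_g'])
    return kf + u(x) * kg * u(xp)

psi = {'sf2_f': 1.0, 'ell_f': [1.0, 1.0], 'sf2_g': 0.5, 'ell_g': [2.0, 2.0]}
u = lambda x: float(np.asarray(x)[0])   # hypothetical state-feedback law
```

Like any valid kernel, (18) is symmetric, and on the diagonal it evaluates to $\sigma_f^2 + u(\boldsymbol{x})^2 \sigma_g^2$.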
For notational convenience, these hyperparameters are concatenated in the vector
$$\boldsymbol{\psi}_{gf} = \left[ l_{1,f}\; l_{1,g}\; \cdots\; l_{n,f}\; l_{n,g}\; \sigma_f\; \sigma_g \right]^\top. \qquad (21)$$
The SE kernel is universal and therefore allows to model any continuous function arbitrarily exactly according to [40].

Remark 2:
GP models with structured kernels, like (18), must not be confused with parametric models, which have a predetermined structure and use a fixed number of parameters. In contrast, a GP with a structured kernel has potentially infinitely many parameters for each part of its structure. So the kernel encodes the knowledge that the unknown function is e.g. built of a sum, but each summand has unlimited flexibility.

We denote $\boldsymbol{U} = \mathrm{diag}\left( u_1\left(\boldsymbol{x}^{(1)}\right), \ldots, u_N\left(\boldsymbol{x}^{(N)}\right) \right) \in \mathbb{R}^{N \times N}$, where $u_i$ denotes the control law which was active at the time at which the pair $\{\boldsymbol{x}^{(i)}, y^{(i)}\}$ was collected, for $i = 1, \ldots, N$. Furthermore, $\boldsymbol{m}_{X_g}$, $\boldsymbol{y}$ are defined analogously to (13), (10), respectively. Then
$$\boldsymbol{K}_{fg} = \boldsymbol{K}_f + \boldsymbol{U}^\top \boldsymbol{K}_g \boldsymbol{U} + \sigma_{\mathrm{on}}^2 \boldsymbol{I}_N, \qquad (22)$$
and $\boldsymbol{k}_f$, $\boldsymbol{k}_g$, $\boldsymbol{K}_f$, $\boldsymbol{K}_g$ are defined analogously to (16) and (11) using $k_f(\boldsymbol{x}, \boldsymbol{x}')$, $k_g(\boldsymbol{x}, \boldsymbol{x}')$. This notation allows to formulate the estimates $\hat{f}(\boldsymbol{x})$, $\hat{g}(\boldsymbol{x})$.

Lemma 3:
The GP posterior mean predictions for the functions $f(\boldsymbol{x})$, $g(\boldsymbol{x})$, based on the training data $\mathbb{D}_\kappa$ in (3) for the compound kernel (18), are given by
$$\hat{f}(\boldsymbol{x}) := \mu_f(\boldsymbol{x}) = \boldsymbol{k}_f^\top \boldsymbol{K}_{fg}^{-1} \left( \boldsymbol{y} - \boldsymbol{U} \boldsymbol{m}_{X_g} \right), \qquad (23)$$
$$\hat{g}(\boldsymbol{x}) := \mu_g(\boldsymbol{x}) = m_g(\boldsymbol{x}) + \boldsymbol{k}_g^\top \boldsymbol{U} \boldsymbol{K}_{fg}^{-1} \left( \boldsymbol{y} - \boldsymbol{U} \boldsymbol{m}_{X_g} \right), \qquad (24)$$
where the prior mean function for $\hat{f}(\boldsymbol{x})$ is set to zero, $m_f(\boldsymbol{x}) = 0$, and for $\hat{g}(\boldsymbol{x})$, $m_g(\boldsymbol{x})$ is chosen according to Lemma 2.

Proof:
For an input $\boldsymbol{x}$ and the compound kernel (18), the joint distribution is given by
$$\begin{bmatrix} f(\boldsymbol{x}) \\ g(\boldsymbol{x}) \\ \boldsymbol{y} \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} 0 \\ m_g(\boldsymbol{x}) \\ \boldsymbol{U} \boldsymbol{m}_{X_g} \end{bmatrix}, \begin{bmatrix} k_f^* & 0 & \boldsymbol{k}_f^\top \\ 0 & k_g^* & \boldsymbol{k}_g^\top \boldsymbol{U}^\top \\ \boldsymbol{k}_f & \boldsymbol{U} \boldsymbol{k}_g & \boldsymbol{K}_{fg} \end{bmatrix} \right), \qquad (25)$$
similarly to (12). According to [4], the posterior mean functions (23) and (24) follow equivalently to (14).

For these estimates, it can be shown that all prior knowledge is properly transferred into the model.

Proposition 1:
Consider a control affine system (1) under Assumptions 1-3 and the compound kernel (18). Then, the estimates $\hat{f}(\boldsymbol{x})$ and $\hat{g}(\boldsymbol{x})$ in Lemma 3 are bounded and infinitely differentiable, and there exist a prior mean function $m_g(\boldsymbol{x})$ and a hyperparameter vector $\boldsymbol{\psi}_{gf}$ such that $\hat{g}(\boldsymbol{x}) > 0$ holds $\forall \boldsymbol{x} \in \mathcal{X}$.

Proof:
The SE kernel passes its properties, differentiability and boundedness, on to all functions represented by the GP [3], thus also to the posterior mean functions, which are used as estimates. The strict positivity of $\hat{g}(\boldsymbol{x})$ follows from the fact that $\sigma_g^2$ can be made arbitrarily small, such that there always exists a positive function $m_g$ for which $m_g(\boldsymbol{x})$ dominates the term $\boldsymbol{k}_g^\top \boldsymbol{U} \boldsymbol{K}_{fg}^{-1} \left( \boldsymbol{y} - \boldsymbol{U} \boldsymbol{m}_{X_g} \right)$ in (24).

Remark 3:
The only properties of the SE kernel which are used for the derivations and proofs are its differentiability and its boundedness. Thus, the conclusions can directly be extended to other kernel functions fulfilling these properties. For the sake of focus, in this article we will consider the SE kernel only.
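The estimates (23) and (24) can be sketched directly from the formulas. On a small synthetic control affine example, the predicted sum $\hat{f}(\boldsymbol{x}) + \hat{g}(\boldsymbol{x})u$ interpolates the observations, even though $f$ and $g$ were never measured individually; the function names and the test system are ours, not the paper's:

```python
import numpy as np

def closed_loop_estimates(x_star, X, y, u_vals, m_g, k_f, k_g, sn2):
    """Posterior mean estimates (23)/(24) from sum-only observations
    y_i = f(x_i) + g(x_i) u_i + noise, via the compound kernel (18).

    X: (N, n) training inputs; u_vals: control value active at each x_i;
    m_g: strictly positive prior mean for g (Lemma 2); k_f, k_g: kernel
    functions. Names are illustrative, not the paper's notation."""
    N = X.shape[0]
    U = np.diag(u_vals)
    Kf = np.array([[k_f(X[i], X[j]) for j in range(N)] for i in range(N)])
    Kg = np.array([[k_g(X[i], X[j]) for j in range(N)] for i in range(N)])
    Kfg = Kf + U @ Kg @ U + sn2 * np.eye(N)              # cf. (22)
    kf = np.array([k_f(X[i], x_star) for i in range(N)])
    kg = np.array([k_g(X[i], x_star) for i in range(N)])
    mXg = np.array([m_g(X[i]) for i in range(N)])
    alpha = np.linalg.solve(Kfg, y - U @ mXg)
    f_hat = kf @ alpha                                    # (23), with m_f = 0
    g_hat = m_g(x_star) + (U @ kg) @ alpha                # (24)
    return f_hat, g_hat
```

Note that the individual $\hat{f}$, $\hat{g}$ are only one of many decompositions consistent with the data; the discussion of this under-determination follows in Sec. III-C.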
C. Discussion
The most obvious challenge of the closed-loop identification is that there exists not a unique but infinitely many solutions for two differentiable functions to add up to the same values. Thus, only observing the sum in (1) is not promising for learning the unique correct individual functions $f(\cdot)$, $g(\cdot)$, because it is an under-determined problem. The estimates in (23) and (24) are just one of many solutions, determined by the choice of hyperparameters, which suits the training data. Nevertheless, the optimization (8) interprets the observed data to match the kernel structure, which is shown to be successful in the simulation in Sec. VI-A. For the case that the results are not satisfactory, we provide an extension in Appendix B to address this challenge. It merges data points of the closed-loop system with measurements from the temporarily open-loop system. It thereby uses Lemma 1, which allows to safely turn off the control signal ($u = 0$) for a finite time period. Nevertheless, we want to highlight that the formal guarantees provided in the following section (Theorem 1) hold independently of whether this extension is utilized or not. Furthermore, additional knowledge, like periodicity or dependence of $f(\cdot)$ or $g(\cdot)$ on only a subset of the state variables, can also be transferred into the kernel to facilitate the identification, by using a periodic kernel or setting the length-scales $l_{j,f} = l_{j',g} = \infty$ for the states $j$, $j'$ of which they are independent, respectively. The latter simplifies the optimization of hyperparameters as the search space is reduced. A systematic way of constructing more evolved kernel functions (including more prior knowledge) is discussed in [4]. Considering the computational load, the inverse of $\boldsymbol{K}_{fg}$ is most critical, as the number of operations increases cubically with the number of data points, thus $\mathcal{O}(N^3)$.
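The cubic cost stems from the matrix inverse. The Sherman-Morrison identity (used in the paper via [41]) refreshes a precomputed inverse after a rank-one change in $\mathcal{O}(N^2)$ instead of $\mathcal{O}(N^3)$; the sketch below assumes a symmetric matrix, as kernel matrices are:

```python
import numpy as np

def sherman_morrison_update(Ainv, v):
    """O(N^2) refresh of a precomputed inverse of a symmetric A after a
    symmetric rank-one change:
    (A + v v^T)^{-1} = A^{-1} - (A^{-1} v)(A^{-1} v)^T / (1 + v^T A^{-1} v).

    Sketch only: extending the kernel matrix by a new data point is a
    bordering (grow-by-one) update rather than a pure rank-one change,
    but that is likewise possible in O(N^2)."""
    Av = Ainv @ v
    return Ainv - np.outer(Av, Av) / (1.0 + v @ Av)

# usage on an SPD matrix, like a regularized kernel matrix:
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = B @ B.T + 6.0 * np.eye(6)
v = rng.standard_normal(6)
updated = sherman_morrison_update(np.linalg.inv(A), v)
```

For symmetric positive definite `A`, the denominator satisfies $1 + \boldsymbol{v}^\top \boldsymbol{A}^{-1}\boldsymbol{v} > 1$, so the update never divides by zero.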
However, adding further data points is necessary to ensure the model is precise at the current position in the state space, where the most recent data points are taken from measurements. Compared to previous approaches, e.g. [38], where $\boldsymbol{K}_{fg}^{-1} \boldsymbol{y}$ is constant and can thereby be precomputed offline, here it must be recomputed with every update of the model, as data points are added one at a time. This difficulty can be addressed using a rank-1 update of the inverse with the Sherman-Morrison formula [41], resulting in only $\mathcal{O}(N^2)$ operations. However, this quadratic computational complexity might still be very time consuming, and therefore motivates the data-efficient event-triggered model updates introduced in Sec. V. Generally, Gaussian processes turn out to be very effective for the adaptive model control law: They properly transfer all prior assumptions consistently into the model (Proposition 1) and allow for the identification in closed loop. The nonparametric nature allows an unlimited model flexibility, and the complexity increases as more data becomes available in a data-driven fashion. This is a crucial advantage compared to classical system identification methods, particularly for highly nonlinear systems.

IV. FEEDBACK LINEARIZING CONTROL LAW

In this section, the feedback linearizing online learning control law is proposed and the resulting closed-loop behavior is analyzed. After showing ultimate boundedness for the most general case, we make further specific assumptions to provide stronger stability results.
Here, further properties of the Gaussian process modeling technique are exploited: As the model error of the GP can be bounded and quantified, the ultimate bound of the tracking error can also be quantified. Classical model reference adaptive control modifies the model parameters continuously over time, which is not possible here due to the nonparametric nature of the GP model. Thus, particular attention must be drawn to the resulting switching character of the control law, which stems from the time-varying dataset $\mathbb{D}_\kappa$ of the Gaussian process introduced in Assumption 3. We are interested in tracking desired trajectories for the state $\boldsymbol{x}$, given by $\boldsymbol{x}_d(t)$, with the following property.

Assumption 4:
The desired trajectory $x_d(t)$ is bounded and at least $n-1$ times differentiable, thus
$$\boldsymbol{x}_d(t) = \left[ x_d\; \dot{x}_d\; \cdots\; \frac{d^{n-1} x_d}{dt^{n-1}} \right]^\top \qquad (26)$$
is continuous and $\frac{d^n x_d}{dt^n}$ is bounded.

For notational convenience, we define the tracking error
$$\boldsymbol{e} = \boldsymbol{x} - \boldsymbol{x}_d. \qquad (27)$$

A. Control law
Consider the filtered scalar state $r \in \mathbb{R}$, defined as
$$r = \left[ \boldsymbol{\lambda}^\top\; 1 \right] \boldsymbol{e}, \qquad (28)$$
where $\boldsymbol{\lambda} = [\lambda_1\; \lambda_2\; \cdots\; \lambda_{n-1}]^\top \in \mathbb{R}^{n-1}$ is a coefficient vector such that, for $s \in \mathbb{C}$, the polynomial $s^{n-1} + \lambda_{n-1} s^{n-2} + \cdots + \lambda_1$ is Hurwitz. Under this condition, the error converges exponentially, $\boldsymbol{e} \to \boldsymbol{0}$, as $r \to 0$ [36]. The dynamics of the filtered state is
$$\dot{r} = f(\boldsymbol{x}) + g(\boldsymbol{x}) u(\boldsymbol{x}) + \rho, \qquad (29)$$
where
$$\rho = \boldsymbol{\lambda}^\top \boldsymbol{e}_{2:n} - \frac{d^n x_d}{dt^n}, \qquad (30)$$
with $\boldsymbol{e}_{2:n} = [e_2\; \cdots\; e_n]^\top \in \mathbb{R}^{n-1}$. For the control law $u(\boldsymbol{x})$, we propose
$$u_\kappa(\boldsymbol{x}) = \frac{1}{\hat{g}_\kappa(\boldsymbol{x})}\left( -\hat{f}_\kappa(\boldsymbol{x}) - k_c r - \rho \right), \qquad (31)$$
according to (4), where $\nu = -k_c r - \rho$ with $k_c \in \mathbb{R}_+$ is used.¹ The subscript $\kappa \in \mathbb{N}_0$ indicates the $\kappa$-th time interval $t \in [t_\kappa, t_{\kappa+1})$ for which $u_\kappa$ is applied according to (5). The estimates $\hat{g}_\kappa(\cdot)$, $\hat{f}_\kappa(\cdot)$ are based on the $N_\kappa$ training points in the time-varying dataset $\mathbb{D}_\kappa$ introduced in Assumption 3. The control scheme is visualized in Fig. 2 and the adaptation procedure is provided in Algorithm 1. Note that, even though GPs themselves are probabilistic models, the control law is deterministic, because it only employs the posterior mean functions as model estimates $\hat{g}_\kappa(\cdot)$, $\hat{f}_\kappa(\cdot)$.

¹The $t$ dependency of $\boldsymbol{x}_d$, $\boldsymbol{x}$ and the $\boldsymbol{x}$ dependencies of $u$, $f$, $\hat{f}$, $g$, $\hat{g}$ are partially omitted for notational convenience.

Algorithm 1
Online learning for feedback linearization control

  initialize $\kappa = 0$, $\mathbb{D}_0 = \{\}$, $\hat{f}_0 = 0$, $\hat{g}_0 = m_g(\cdot)$
  while simulation time not exceeded do
    while $t < t_{\kappa+1}$ do
      run controller $u_\kappa$ in (31)
    end while
    set $\kappa \leftarrow \kappa + 1$
    measure $\boldsymbol{x}^{(\kappa)} = \boldsymbol{x}(t_\kappa)$ and $y^{(\kappa)} = \dot{x}_n(t_\kappa) + \epsilon^{(\kappa)}$
    add training point $\mathbb{D}_\kappa = \mathbb{D}_{\kappa-1} \cup \left\{\left(\boldsymbol{x}^{(\kappa)}, y^{(\kappa)}\right)\right\}$
    update the estimates $\hat{f}_\kappa(\cdot)$, $\hat{g}_\kappa(\cdot)$ in (23), (24)
  end while

B. Convergence analysis
An offline version of the control law (31), with a constant dataset $D$ and estimates $\hat{f}(x)$, $\hat{g}(x)$, was introduced previously and shown to be globally uniformly ultimately bounded [38, Proposition 1]. With the time-varying dataset and model, however, Algorithm 1 describes a switching control law. This switching results in a hybrid system, in which some states change continuously in time while the system dynamics (or other states) change at discrete time instances. Here, the resulting closed-loop system is subject to (in general) arbitrary switching. Its convergence behavior is analyzed based on the principle of a common Lyapunov function: a Lyapunov function which is independent of the switching signal must decrease over time along the system's trajectories. This is shown in the following.

Theorem 1:
Consider the system (1) and a desired trajectory $x_d(t)$ under Assumptions 1-4. Further consider the control law (31), where $f(\cdot)$, $g(\cdot)$ are modeled by the GP mean functions $\hat{f}_\kappa(\cdot)$, $\hat{g}_\kappa(\cdot)$ in (23) and (24), respectively. The GP model is updated at arbitrary switching times $t_\kappa$ according to Algorithm 1. Then, there exists a $k_c^* > 0$ such that for every $k_c \geq k_c^*$ the tracking error $\|e\|$ is globally uniformly ultimately bounded.

Proof:
Consider the common Lyapunov function candidate
$$V_\kappa(x) = r^2/2 \quad \forall \kappa \in \mathbb{N}, \quad (32)$$
with time derivative
$$\dot{V}_\kappa(x) = r\dot{r} = r\left(f + g u_\kappa + \rho\right) \quad (33)$$
$$= r\left(f + \frac{g}{\hat{g}_\kappa}\left(-\hat{f}_\kappa - k_c r - \rho\right) + \rho\right) = r\left(f - \bar{g}_\kappa \hat{f}_\kappa\right) - k_c \bar{g}_\kappa r^2 + (1 - \bar{g}_\kappa) r\rho,$$
where $\bar{g}_\kappa := g(x)/\hat{g}_\kappa(x)$ is positive and bounded $\forall \kappa$ and $\forall x \in \mathbb{X}$ from Proposition 1 and Assumptions 1 and 2. As a consequence, $\left(f - \bar{g}_\kappa \hat{f}_\kappa\right)$ is bounded and there exists a constant $a \in \mathbb{R}^n$ such that
$$\left\| r\left(f - \bar{g}_\kappa \hat{f}_\kappa\right)\right\| \leq \|a^\top e\| \quad \forall e, \kappa \quad (34)$$
holds, because $r$ only grows linearly in $e$. For similar reasons, we can find constants $c \in \mathbb{R}^n$ and $B, C \in \mathbb{R}^{n\times n}$ with $B, C \succ 0$ for which
$$\bar{g}_\kappa r^2 \geq e^\top B e \quad \forall e, \kappa, \quad (35)$$
$$\|(1 - \bar{g}_\kappa) r\rho\| \leq e^\top C e + c^\top e \quad \forall e, \kappa, \quad (36)$$
hold, which exist since $f$, $\hat{f}_\kappa$, $\bar{g}_\kappa$ are bounded. Therefore,
$$\dot{V}_\kappa(x) \leq \|a\|\|e\| - k_c \sigma_{\min}(B)\|e\|^2 + \sigma_{\max}(C)\|e\|^2 + \|c\|\|e\| = \|e\|\left(\|a\| + \|c\|\right) + \|e\|^2\left(\sigma_{\max}(C) - k_c \sigma_{\min}(B)\right) \quad (37)$$
holds for all $\kappa$, and there exists a $k_c^* > 0$ such that
$$\sigma_{\max}(C) - k_c^* \sigma_{\min}(B) < 0. \quad (38)$$
As a result, for every $k_c \geq k_c^*$, the Lyapunov function decreases,
$$\dot{V}_\kappa(x) < 0 \quad \forall x \in \mathbb{X} \setminus \mathcal{B}_0,\ \forall \kappa, \quad (39)$$
outside of the set
$$\mathcal{B}_0 = \left\{ x \in \mathbb{X} \,\middle|\, \|e\| \leq \frac{\|a\| + \|c\|}{k_c \sigma_{\min}(B) - \sigma_{\max}(C)} \right\}, \quad (40)$$
which forms a tube in $x$ coordinates around the desired trajectory and a ball in $e$ coordinates. Thus, we have found a common radially unbounded Lyapunov function $V(x)$ which decreases $\forall \kappa$ outside of the ball $\mathcal{B}_0$. According to [42, Theorem 2.1], this allows the conclusion that for arbitrary switching sequences the tracking error converges to the ball $\mathcal{B}_0$.

Fig. 2. The online learning feedback linearizing control scheme including the event trigger proposed in Sec. V, which controls the switching time $t_{\kappa+1}$. The loop is linearized for $\hat{f}_\kappa = f$, $\hat{g}_\kappa = g$; $\lambda$ is Hurwitz.
Since $\mathcal{B}_0$ is independent of the initial state, global uniform ultimate boundedness (GUUB) holds. ∎
Thus, we have shown that the tracking error remains bounded under the proposed online learning control scheme for a sufficiently large gain $k_c$ and an arbitrary switching sequence. However, without further knowledge, a value for the critical gain $k_c^*$ cannot be computed. We therefore make the following simplifying assumption.

Assumption 5:
The function $g(x)$ is known, thus $\hat{g}(x) = g(x)$, and noisy training observations of $\dot{x}_n$,
$$y_f^{(i)} = f\!\left(x^{(i)}\right) + \epsilon^{(i)} = \dot{x}_n^{(i)} - g\!\left(x^{(i)}\right) u_\kappa\!\left(x^{(i)}\right) + \epsilon^{(i)}, \quad (41)$$
with $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma_{on}^2)$, $i = 1, \ldots, N_\kappa$, are available.
This assumption holds for many real-world systems, e.g. most Lagrangian systems, which is a considerably large class [43]. It is also a quite common assumption when working with control affine systems [44]. Analogously to (23), the unknown function is now estimated by
$$\hat{f}(x) := \mu_f(x) = k_f^\top \left(K_f + \sigma_{on}^2 I_{N_\kappa}\right)^{-1} y_f, \quad (42)$$
where $y_f = \left[y_f^{(1)} \cdots y_f^{(N_\kappa)}\right]^\top$ and $k_f$, $K_f$ are computed according to (16) and (11) for the SE kernel. Under this assumption, Theorem 1 can be relaxed with respect to the choice of the gain $k_c$.

Corollary 1:
Consider the system (1) and a desired trajectory $x_d(t)$ under Assumptions 1-5. Further consider the control law (31), where $f(\cdot)$ is modeled by the GP mean function $\hat{f}_\kappa(\cdot)$ in (42), which is adapted at arbitrary switching times $t_\kappa$ according to Algorithm 1. Then, the tracking error $\|e\|$ of the closed-loop switching system is globally uniformly ultimately bounded for any $k_c > 0$.

Proof:
Using the Lyapunov function (32), we obtain for $\bar{g}_\kappa = 1$
$$\dot{V}_\kappa(x) = r\dot{r} = r\left(f + g u_\kappa + \rho\right) \quad (43)$$
$$= r\left(f - \hat{f}_\kappa\right) - k_c r^2, \quad (44)$$
which leaves us with the condition $r\left(f - \hat{f}_\kappa\right) < k_c r^2$ for negative definiteness of $\dot{V}_\kappa(x)$. Thus, independent of the gain $k_c$, there exists a ball outside of which the Lyapunov function decreases $\forall \kappa$, which leads to GUUB for all $k_c > 0$. ∎
For completeness, we also formalize the result for a dataset $D_\kappa$ which remains constant after $t_\kappa$, i.e. no further measurements are taken and no data points are deleted from the set for $t > t_\kappa$.

Corollary 2:
Consider the system (1) and a desired trajectory $x_d(t)$ under Assumptions 1-5. Further consider the control law (31), where $f(\cdot)$ is modeled by the GP mean function $\hat{f}_\kappa(\cdot)$ in (42) with a fixed dataset $D_\kappa$. Then, the tracking error $\|e\|$ of the closed-loop system is globally uniformly ultimately bounded for any $k_c > 0$.

Proof:
The proof is straightforward, as it follows directly from the proof of Corollary 1. ∎
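The posterior mean estimate (42) requires only a kernel evaluation and one linear solve. The following is a minimal numpy sketch with illustrative function and parameter names (not the paper's code); the SE kernel hyperparameters are assumed given.

```python
import numpy as np

def se_kernel(A, B, sigma_f=1.0, ell=1.0):
    """Squared exponential kernel k(a, b) = sigma_f^2 exp(-||a - b||^2 / (2 ell^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / ell**2)

def gp_mean(x, X, y, sigma_on=0.1, sigma_f=1.0, ell=1.0):
    """Posterior mean (42): k_f(x)^T (K_f + sigma_on^2 I)^(-1) y_f."""
    K = se_kernel(X, X, sigma_f, ell) + sigma_on**2 * np.eye(len(X))
    k = se_kernel(X, x[None, :], sigma_f, ell)  # k_f(x), shape (N, 1)
    return float(k.T @ np.linalg.solve(K, y))
```

Solving the linear system instead of forming the inverse explicitly is the standard, numerically safer choice; in an online setting the solve would be replaced by an incrementally updated factorization.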
C. Quantifying the ultimate bound
Theorem 1 and Corollaries 1 and 2 show that there exists an ultimate bound for the tracking error $e$; however, its size is unknown. To quantify the ultimate bound $\mathcal{B}$, an upper bound for the model error, defined as
$$\Delta f_\kappa(x) = |f(x) - \hat{f}_\kappa(x)|, \quad \forall \kappa, \quad (45)$$
is derived in this section using the variance function $\sigma_\kappa^2(x)\colon \mathbb{X} \to \mathbb{R}$ of the GP as defined in (15). Since the GP is inherently a probabilistic model, we cannot expect any deterministic statements regarding the error of the estimate $\Delta f_\kappa$. However, according to [45], it is possible to make high-probability statements regarding the maximum distance of the true function $f(x)$ from the mean function $\mu(x)$ on a compact set. As known from the no-free-lunch theorems [46], such generalization cannot be expected without any prior knowledge about $f(\cdot)$. Since we do not want to make parametric assumptions which limit the complexity of $f(\cdot)$, we instead restrict its reproducing kernel Hilbert space (RKHS) norm as follows.

Assumption 6:
The function $f(x)$ has a bounded reproducing kernel Hilbert space (RKHS) norm with respect to a squared exponential kernel $k(\cdot,\cdot)$ with known hyperparameters, denoted by $\|f\|_k \leq B_f$.
With this additional assumption, a high-probability statement regarding the precision of the mean function estimate is possible according to [45].

Lemma 4:
Suppose Assumption 6 holds. Then
$$\Pr\left\{ |\mu_\kappa(x) - f(x)| \leq \beta_\kappa \sigma_\kappa(x),\ \forall x \in \tilde{\mathbb{X}},\ \forall N_\kappa \in \mathbb{N} \right\} \geq 1 - \delta \quad (46)$$
holds on a compact set $\tilde{\mathbb{X}} \subset \mathbb{R}^n$, where $\delta \in (0,1)$, $\beta_\kappa = \sqrt{2 B_f^2 + 300\, \gamma_\kappa \log^3((\kappa+1)/\delta)}$, and $\gamma_\kappa$ is the maximum mutual information that can be obtained about $f(\cdot)$ from $\kappa + 1$ noisy samples $x^{(1)}, \ldots, x^{(\kappa+1)}$; $\mu_\kappa(x)$ and $\sigma_\kappa^2(x)$ are the posterior mean and variance function of a GP with $N_\kappa$ data points as defined in (14) and (15), respectively.

Proof:
This is a direct consequence of [45, Theorem 6]. ∎
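Once $B_f$, $\gamma_\kappa$ and $\delta$ are fixed, the confidence scaling $\beta_\kappa$ of Lemma 4 can be evaluated directly. A minimal sketch follows; the mutual information $\gamma_\kappa$ must be supplied externally, e.g. from the kernel-specific upper bounds in [45], and the formula is a reconstruction of the garbled source following that reference.

```python
import math

def beta(kappa, B_f, gamma, delta):
    """Confidence scaling of Lemma 4 (following [45]):
    beta_kappa = sqrt(2 B_f^2 + 300 gamma_kappa log^3((kappa + 1) / delta))."""
    return math.sqrt(2 * B_f**2 + 300 * gamma * math.log((kappa + 1) / delta) ** 3)
```

Note that $\beta_\kappa$ grows only polylogarithmically in the number of updates, which is what keeps the high-probability bound useful over long horizons.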
Remark 4:
Note that (46) accounts for all $N_\kappa \in \mathbb{N}$ at once: the probability $1 - \delta$ holds not just for a single $N_\kappa \in \mathbb{N}$ but for all $N_\kappa \in \mathbb{N}$ jointly. This becomes clear when rewriting (46) as
$$\Pr\left\{ \bigcap_{N_\kappa = 0}^{\infty} |\mu_\kappa(x) - f(x)| \leq \beta_\kappa \sigma_\kappa(x),\ \forall x \in \tilde{\mathbb{X}} \right\} \geq 1 - \delta. \quad (47)$$
The model error bound in Lemma 4 only holds on a compact set $\tilde{\mathbb{X}}$. Nevertheless, we have already shown in Theorem 1 and Corollaries 1 and 2 that the tracking error converges to a compact set $\mathcal{B}$. Thus, we set $\tilde{\mathbb{X}} = \mathcal{B}$, which leads to the following result.

Theorem 2:
Consider the system (1) and a desired trajectory $x_d(t)$ under Assumptions 1-6. Further consider the control law (31), where $f(\cdot)$ is modeled by the GP mean function $\hat{f}_\kappa(\cdot)$ in (42), which is adapted at arbitrary switching times $t_\kappa$ according to Algorithm 1. Then, with probability $1 - \delta$, $\delta \in (0,1)$, the tracking error $\|e\|$ is uniformly ultimately bounded for any $k_c > 0$ with the ultimate bound
$$\mathcal{B}_\kappa = \left\{ x \in \mathbb{X} \,\middle|\, \|e\| \leq \frac{\beta_\kappa \bar{\sigma}_\kappa}{k_c \left\|[\lambda^\top\ 1]\right\|} \right\}, \quad (48)$$
where $\bar{\sigma}_\kappa := \max_{x \in \tilde{\mathbb{X}}} \sigma_\kappa(x)$ and $\beta_\kappa$ is defined in Lemma 4.

Proof:
Using the common Lyapunov candidate (32), its time derivative (44) is given in the $\kappa$-th time interval for the case $g - \hat{g} = 0$ (Assumption 5) by
$$\dot{V}_\kappa(x) \leq r \Delta f_\kappa(x) - k_c r^2. \quad (49)$$
As Theorem 1 guarantees convergence to $\mathcal{B} = \tilde{\mathbb{X}}$, the model error only needs to be considered on this compact set. From Lemma 4 it can be concluded that
$$\Pr\left\{ \Delta f_\kappa(x) \leq \beta_\kappa \bar{\sigma}_\kappa,\ \forall x \in \tilde{\mathbb{X}},\ \kappa \in \mathbb{N} \right\} \geq 1 - \delta \quad (50)$$
$$\Rightarrow\ \Pr\left\{ \dot{V}(x) < 0\ \forall x \in \tilde{\mathbb{X}} \setminus \mathcal{B}_\kappa,\ \kappa \in \mathbb{N} \right\} \geq 1 - \delta, \quad (51)$$
which shows convergence of $r$ to a ball with radius $\beta_\kappa \bar{\sigma}_\kappa / k_c$; hence the error $e$ is ultimately bounded by $\mathcal{B}_\kappa$ with probability larger than $1 - \delta$. The attributes hold uniformly and globally because $V_\kappa$ is a common, time-independent and radially unbounded Lyapunov function [42]. ∎

Remark 5:
Theorem 1 focuses on the existence of an ultimate bound and therefore provides with $\mathcal{B}_0$ in (40) the maximum bound across all time intervals, as can be seen in (35) and (36), where $B$, $C$, $c$ must be suitable for all $\kappa$. In contrast, Theorem 2 is more specific and provides with $\mathcal{B}_\kappa$ a quantitative bound for each time interval $\kappa$ individually. Note that the tracking error $e$ will not necessarily converge to the ball $\mathcal{B}_\kappa$ by the end of the $\kappa$-th time interval at $t_{\kappa+1}$, because convergence might take infinite time. $\mathcal{B}_\kappa$ should be understood as the ball reached by the tracking error if the control law stops adapting after the $\kappa$-th update (compare Corollary 2).

Remark 6:
In contrast to Theorem 1, Theorem 2 is a stability statement which only holds with a specified probability. The reason lies in the uncertainty about the plant itself; neither the plant nor any part of the controller is stochastic. Therefore, a stability analysis from deterministic control theory is applicable here, but the convergence result does not hold for all plants which fulfill the specified assumptions: there exists a small fraction of plants (specified by $\delta$) which do not converge to the specified ultimate bound. If the plant does not belong to this fraction, however, the result always holds, and no stochastic stability analysis is necessary. Note that the fraction $\delta$ for which the result does not hold can be made arbitrarily small.

V. EVENT-TRIGGERED MODEL UPDATE
The results in the previous section all hold for arbitrary switching sequences (any definition of $t_{\kappa+1}$ is possible), because we have so far not specified when new training data points are taken and the model is updated accordingly. Our goal is a data-efficient online learning scheme, and thereby we only want to add training data if necessary. Thus, switching should not occur synchronously (after a specified fixed time interval) but asynchronously (whenever needed), which is known as event-triggered control.
The general idea of event-based control is to utilize a scarce resource (sensor measurements, computational power, communication channel, etc.) only when required. In contrast to time-triggered control, where the resource is used periodically (synchronously), it is thereby typically more resource-conserving [47]. In our setting, we aim to reduce the number of model updates and measurements for training data to keep the computational complexity low. The key idea of our data-efficient online learning is therefore to take new training data into account only when necessary based on the current uncertainty of the model.
Previous work in [29] uses a time-triggered model, i.e. measurements are taken and training points are added after a fixed time, $t_{\kappa+1} = t_\kappa + \Delta t$ with fixed interval $\Delta t > 0$. However, this causes the following difficulties: First, it is unknown whether the current estimate $\hat{f}_\kappa(\cdot)$ of the function $f(\cdot)$ is precise enough to ensure a further decrease of the Lyapunov function. From (49) it is clear that the estimates must become more precise as $r$ gets smaller to guarantee negative definiteness of $\dot{V}_\kappa$ for all $x \in \mathbb{X}$. Considering that in some parts of the state space more data points are necessary to model the function $f(\cdot)$ precisely than in others shows that choosing a constant $\Delta t$ properly is impossible without knowing the function $f(\cdot)$.
Second, over an infinite time horizon, the time-triggered update causes infinitely many (possibly unnecessary) measurements. This is critical even for finite time, because the number of operations to update the GP model increases with $\mathcal{O}(N^3)$ (or $\mathcal{O}(N^2)$ at best, when using the Sherman-Morrison formula) [3].
In summary, for the time-triggered design there is a trade-off between the precision and the computational complexity of the model when choosing the update rate. If more points are added to the dataset, the variance of the GP model, and thereby the maximum model error, decreases according to Lemma 4. However, many training points increase the time to compute the model estimate and possibly result in a loss of real-time capability [48].
Therefore, in the interest of data efficiency and the associated computational complexity, we trigger measurements and their intake into the dataset in an event-based fashion. An intuitive idea is to add training points as soon as the error $\Delta f_\kappa$ becomes too large, which is specified based on the Lyapunov stability condition. Generally, to guarantee stability, an event must be triggered before the time derivative of the Lyapunov function turns non-negative, thus
$$t_{\kappa+1} := \inf\left\{ t > t_\kappa \,\middle|\, \dot{V}(x) \geq 0 \right\}. \quad (52)$$
However, since an exact computation of $\dot{V}(\cdot)$ is not possible, we have to evaluate an upper bound, as presented in the following. First, we consider noiseless measurements of the highest state derivative for the training data, before addressing the case where these measurements are corrupted by noise.

A. Asymptotic stability for noiseless measurements
We first define the noise-free case formally in an assumption.
Assumption 7:
Measurements of $\dot{x}_n$ are available noise-free, thus $\sigma_{on} = 0$ in Assumption 3.
A well-suited indicator for the necessity of a new training point is the variance function $\sigma_\kappa^2(\cdot)$ of the GP in (15), as it bounds the maximum error with high probability according to Lemma 4. Based on this intuition, we propose the event
$$t_{\kappa+1} := \inf\left\{ t > t_\kappa \,\middle|\, \beta_\kappa \sigma_\kappa(x) \geq k_c |r| \right\}, \quad (53)$$
where the triggering time $t_{\kappa+1}$ is defined as the first time after $t_\kappa$ at which $\beta_\kappa \sigma_\kappa(x)$ becomes larger than or equal to $k_c |r|$.

Remark 7:
Immediately after each update, at $t = t_\kappa$, it generally holds that $\sigma_\kappa(x(t_\kappa)) = 0$, which implies $\beta_\kappa \sigma_\kappa(x(t_\kappa)) \leq k_c |r(t_\kappa)|$. Since both $\sigma_\kappa$ and $r$ are continuous in time between two events, the event is always triggered at equality, thus $\beta_\kappa \sigma_\kappa(x(t_{\kappa+1})) = k_c |r(t_{\kappa+1})|$.
Using the proposed event (53) in Algorithm 1 as the trigger to update the model, the following is concluded.

Theorem 3:
Consider the system (1) and a desired trajectory $x_d(t)$ under Assumptions 1-7. Further consider the control law (31), where $f(\cdot)$ is modeled by the GP mean function $\hat{f}_\kappa(\cdot)$ in (42), which is updated according to the event-triggering law (53) and Algorithm 1. Then, the tracking error $e$ is globally asymptotically stable for any $k_c > 0$, and the inter-event time $\Delta t_\kappa := t_{\kappa+1} - t_\kappa$ is lower bounded by a positive constant $t_{lb} > 0$ for all $\kappa \in \mathbb{N}$, with probability $1 - \delta$.

Proof:
We consider again the common Lyapunov candidate (32) and its time derivative
$$\dot{V}_\kappa(x) \leq r \Delta f_\kappa(x) - k_c r^2, \quad (54)$$
where $\Delta f_\kappa$ is the model error defined in (45). With noiseless measurements, a GP mean function passes through each training point [49]. Thus, the estimate $\hat{f}_\kappa$ is exact at the time instance $t_\kappa$, and $\dot{V}_\kappa(x(t_\kappa)) = -k_c r^2$. For $t_\kappa < t < t_{\kappa+1}$, the estimation error $\Delta f_\kappa(x(t))$ changes continuously and is generally larger than zero. But the term $k_c r^2$ dominates $r \Delta f_\kappa(x)$ with probability $1 - \delta$ by design of the triggering condition (53) and Lemma 4, thus
$$\Pr\left\{ \dot{V}_\kappa(x) < 0\ \forall x \in \tilde{\mathbb{X}},\ \kappa \in \mathbb{N} \right\} \geq 1 - \delta. \quad (55)$$
From Theorem 1 it is known that the system reaches a compact set $\tilde{\mathbb{X}}$ for any initial condition $x_0 \in \mathbb{X}$. Therefore, Lemma 4 is applicable, and with the radial unboundedness of $V$, global asymptotic stability with probability $1 - \delta$ is shown.
To show that the inter-event time is lower bounded, we define the Lipschitz constant $L_\sigma > 0$ such that $\dot{\sigma}_\kappa \leq L_\sigma \dot{r}$, which exists due to the differentiability of $\sigma_\kappa$ with respect to $r$. Following the lines of [50],
$$\frac{d}{dt}\left|\frac{\sigma_\kappa}{r}\right| \leq \left|\frac{\dot{\sigma}_\kappa}{r}\right| + \left|\frac{\sigma_\kappa \dot{r}}{r^2}\right| \leq \left|\frac{L_\sigma (\Delta f_\kappa - k_c r)}{r}\right| + \left|\frac{\sigma_\kappa (\Delta f_\kappa - k_c r)}{r^2}\right| \leq L_\sigma \left|\frac{\Delta f_\kappa}{r}\right| + L_\sigma k_c + \left|\frac{\Delta f_\kappa \sigma_\kappa}{r^2}\right| + k_c \left|\frac{\sigma_\kappa}{r}\right|,$$
and using Lemma 4 yields
$$\Pr\left\{ \frac{d}{dt}\left|\frac{\sigma_\kappa}{r}\right| \leq L_\sigma \beta_\kappa \left|\frac{\sigma_\kappa}{r}\right| + L_\sigma k_c + \beta_\kappa \left|\frac{\sigma_\kappa}{r}\right|^2 + k_c \left|\frac{\sigma_\kappa}{r}\right|,\ \forall x \in \tilde{\mathbb{X}},\ \kappa \in \mathbb{N} \right\} \geq 1 - \delta,$$
for which we define $\phi = \left|\sigma_\kappa / r\right|$. The differential equation
$$\dot{\phi} = \beta_\kappa \phi^2 + \phi\left(L_\sigma \beta_\kappa + k_c\right) + L_\sigma k_c, \quad (56)$$
with initial condition $\phi(t_\kappa) = 0$ (from $\sigma_\kappa(x(t_\kappa)) = 0$), yields
$$\phi(t) = \frac{1}{2\beta_\kappa}\left( c_1 \tan\!\left( \tfrac{1}{2}\left( (t - t_\kappa) c_1 \pm c_2 \right) \right) - L_\sigma \beta_\kappa - k_c \right), \quad (57)$$
according to [51] for the time interval $t \in [t_\kappa, t_{\kappa+1}]$, where $c_1 = \sqrt{4 \beta_\kappa L_\sigma k_c - (L_\sigma \beta_\kappa + k_c)^2}$ and $c_2 = 2\arccos\!\left( -c_1 / \left(2\sqrt{\beta_\kappa L_\sigma k_c}\right) \right)$. By design, the event is triggered at $\phi = k_c / \beta_\kappa$, which leads to the lower bound on the inter-event time of
$$\Delta t_\kappa \geq \left( 2\left(\pi - \arctan\!\left( \frac{3 k_c + L_\sigma \beta_\kappa}{c_1} \right)\right) + c_2 \right)\Big/ c_1 \geq (\pi + c_2)/c_1 =: t_{lb},$$
where $\arctan(\xi) < \pi/2$, $\forall \xi > 0$, is used. ∎
Alternatively, we consider the scenario in which the model error can be monitored continuously.

Assumption 8:
Measurements of $x$, $\dot{x}_n$ are continuously available without effort.
To take advantage of this assumption, we propose the event trigger
$$t_{\kappa+1} := \inf\left\{ t > t_\kappa \,\middle|\, \Delta f_\kappa(x) \geq k_c |r| \right\}, \quad (58)$$
which allows us to drop Assumption 6 and the probabilistic nature of Theorem 3 ($\delta = 0$), as formalized in the following.

Corollary 3:
Consider the system (1) and a desired trajectory $x_d(t)$ under Assumptions 1-5, 7 and 8. Further consider the control law (31), where $f(\cdot)$ is modeled by the GP mean function $\hat{f}_\kappa(\cdot)$ in (42), which is updated according to the event-triggering law (58) and Algorithm 1. Then, the tracking error $e$ is globally asymptotically stable for any $k_c > 0$, and the inter-event time $\Delta t_\kappa := t_{\kappa+1} - t_\kappa$ is lower bounded by a positive constant $t_{lb} > 0$.

Proof:
This is a direct consequence of the proof of Theorem 3. ∎
We want to highlight that this requires measurements at every continuous time instance, which is generally not possible due to the nonzero update rates of digital sensors. Therefore, Corollary 3 is stated rather for completeness. Nevertheless, the algorithm remains data-efficient despite the infinitely many measurements in finite time, because data points are only stored if actually needed.
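In implementation, the trigger (53) reduces to one comparison per control step once the posterior standard deviation is available. The following is a minimal numpy sketch with illustrative names; the tiny jitter added to the Gram matrix is an implementation detail for the noiseless case $\sigma_{on} = 0$, not part of the theory.

```python
import math
import numpy as np

def se_kernel(A, B, sigma_f=1.0, ell=1.0):
    """Squared exponential kernel."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / ell**2)

def gp_std(x, X, sigma_on=0.0, sigma_f=1.0, ell=1.0):
    """Posterior standard deviation sigma_kappa(x), cf. (15); a small jitter
    keeps the linear solve well-posed when sigma_on = 0."""
    K = se_kernel(X, X, sigma_f, ell) + (sigma_on**2 + 1e-10) * np.eye(len(X))
    k = se_kernel(X, x[None, :], sigma_f, ell)
    var = sigma_f**2 - float(k.T @ np.linalg.solve(K, k))
    return math.sqrt(max(var, 0.0))

def event_triggered(x, r, X, beta_k, k_c, **gp_kw):
    """Trigger (53): request a new data point once beta_k*sigma_kappa(x) >= k_c*|r|."""
    return beta_k * gp_std(x, X, **gp_kw) >= k_c * abs(r)
```

Directly after an update the standard deviation at the current state is (numerically) zero, so the trigger is inactive, consistent with Remark 7.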
B. Ultimate boundedness for noisy measurements
In the case of noisy measurements of $\dot{x}_n$ (Assumption 7 does not hold), it is still possible to find an ultimate bound to which the system converges. The difference to Theorem 2 is that we now make use of the event-triggered model update (Theorem 2 allowed arbitrary updates), which shrinks the ultimate bound to a size proportional to the noise level. From Theorem 3 we derive the following result.

Corollary 4:
Consider the system (1) and a desired trajectory $x_d(t)$ under Assumptions 1-6. Further consider the control law (31), where $f(\cdot)$ is modeled by the GP mean function $\hat{f}_\kappa(\cdot)$ in (42), which is updated according to the event-triggering law
$$t_{\kappa+1} := \inf\left\{ t > t_\kappa \,\middle|\, \beta_\kappa \sigma_\kappa(x) \geq k_c |r| \ \wedge\ e \notin \mathcal{B}_{\sigma_{on}} \right\} \quad (59)$$
and Algorithm 1, where
$$\mathcal{B}_{\sigma_{on}} = \left\{ e \in \tilde{\mathbb{X}} \,\middle|\, \|e\| \leq \frac{\sigma_{on} \beta_\kappa}{k_c \left\|[\lambda^\top\ 1]\right\|} \right\}. \quad (60)$$
Then, the tracking error $e$ is GUUB with respect to the set $\mathcal{B}_{\sigma_{on}}$ for any $k_c > 0$, and the inter-event time $\Delta t_\kappa$ is lower bounded by a positive constant $t'_{lb} > 0$ for all $\kappa \in \mathbb{N}$, with probability $1 - \delta$.

Proof:
In contrast to Theorem 3, a measurement at time $t_\kappa$ does not lead to $\Delta f_\kappa(x(t_\kappa)) = 0$, but we make use of the fact that the variance function of a GP (15) at any training point can be upper bounded in terms of the measurement noise. Considering the variance for a single training data point at $x(t_\kappa)$ as an upper bound for the variance function (which holds according to [38]), the following is concluded:
$$\sigma_\kappa(x(t_\kappa)) \leq \sqrt{ \sigma_f^2 - \frac{\sigma_f^4}{\sigma_f^2 + \sigma_{on}^2} } = \sqrt{ \frac{\sigma_{on}^2}{1 + \sigma_{on}^2/\sigma_f^2} } < \sigma_{on} \quad (61)$$
for $\sigma_f < \infty$, using $k_f(x, x) = \sigma_f^2$ in (15). Considering again the Lyapunov function (32) and its time derivative
$$\dot{V}_\kappa(x(t_\kappa)) \leq |r| \left( \beta_\kappa \sigma_{on} - k_c |r| \right), \quad (62)$$
it is clear that inside of $\mathcal{B}_{\sigma_{on}}$ negative definiteness of $\dot{V}_\kappa$ cannot be ensured. But outside of this ball it is negative definite, and therefore GUUB can be shown similarly to Theorem 3. To exclude Zeno behavior, only $e \notin \mathcal{B}_{\sigma_{on}}$ must be analyzed, since inside $\mathcal{B}_{\sigma_{on}}$ no events are triggered. The lower bound on the inter-event time is derived along the lines of Theorem 3. Hence, the dynamics of $\phi(t)$ as derived in (56) are the same for the noisy case, but the initial condition $\phi(t_\kappa)$ is now nonzero (due to the noise). However, it can be upper bounded by
$$\phi(t_\kappa) < \sqrt{ \frac{\sigma_{on}^2}{1 + \sigma_{on}^2/\sigma_f^2} } \Big/ |r| =: \phi_0.$$
With the event in (53) and Algorithm 1, we have proposed astrategy, which adds data points to the dataset only if necessaryto ensure further convergence of the system. However, this stillleads to a growing computational burden for computing theGP model as the cardinality of the dataset D κ monotonicallyincreases with time. Particularly, if the desired trajectorycovers a large area in the state space or when high precisiontracking is required, keeping up the real-time capability of theadaptation algorithm is challenging. A common technique tocircumvent this problem is a forgetting mechanism (deletingold data points when new ones are added) if a particular budgetis reached. While most other works, e.g. [29], use a heuristicfor the forgetting strategy, we propose a safe forgetting rule,which requires to store only a single data point. Corollary 5:
Consider the system (1) and a desired trajectory $x_d(t)$ under Assumptions 1-7. Further consider the control law (31), where $f(\cdot)$ is modeled by the GP mean function $\hat{f}_\kappa(\cdot)$ in (42). This estimate is updated at the events $t_{\kappa+1}$ in (53), where at each event $\kappa$ all old data points are eliminated from the dataset, thus $D_\kappa = \left\{ \left(x(t_\kappa), \dot{x}_n(t_\kappa)\right) \right\}$. Then, with probability $1 - \delta$, the tracking error $e$ is globally asymptotically stable for any $k_c > 0$.

Proof:
This follows along the lines of the proof of Theorem 3. With the continuity of $\sigma_\kappa(x)$, which is zero at the single training point, $\sigma_\kappa(x(t_\kappa)) = 0$, it follows that there exists a neighborhood of $x(t_\kappa)$ in which $\beta_\kappa \sigma_\kappa(x) < k_c |r|$ holds. Thus, the results from Theorem 3 are applicable. ∎
Deleting all old data points is consistent in terms of data efficiency, but in general triggers events more frequently. Thus, by storing more than one data point, future measurements can be avoided, particularly for periodic desired trajectories. For a fixed budget $\bar{N} \in \mathbb{N}$, we can also forget unnecessary points and still guarantee stability.

Corollary 6:
Consider the system (1) and a desired trajectory $x_d(t)$ under Assumptions 1-7. Further consider the control law (31), where $f(\cdot)$ is modeled by the GP mean function $\hat{f}_\kappa(\cdot)$ in (42). This estimate is updated at the events $t_{\kappa+1}$ in (53), where at each event $\kappa$ the dataset $D_\kappa$ is limited to hold at most $\bar{N} \in \mathbb{N}$ data points, such that
$$\beta_\kappa^{elim} \sigma_\kappa^{elim}(x) < k_c |r| \quad (64)$$
remains true, where $\beta_\kappa^{elim}$ and $\sigma_\kappa^{elim}$ denote the values after the elimination. Then, with probability $1 - \delta$, the tracking error $e$ is globally asymptotically stable for any $k_c > 0$.

Proof:
This follows along the lines of the proof of Theorem 3. By Corollary 5, it is known that there always exists a reduced dataset which fulfills (64) for $\bar{N} \geq 1$. ∎
Finding the reduced dataset is a nontrivial combinatorial problem, but we refer to the existing literature for efficient algorithms [52]. Note that the reduced dataset must necessarily contain the most recent measurement at $t_\kappa$, as otherwise the event would not have been triggered.

D. Discussion
From a control perspective, the most important advantage of GPs is the quantification of the uncertainty, i.e. an upper bound on the model error as given by Lemma 4. We note that the prerequisite for this lemma, the bounded RKHS norm in Assumption 6, is difficult to verify; however, minimal assumptions are necessary, as otherwise a generalization beyond the training data is impossible [46].
Also, the maximum mutual information $\gamma_\kappa$ in Lemma 4 cannot be computed analytically for a general kernel, but we refer to the existing literature [45], which provides upper bounds on $\gamma_\kappa$ for different kernels (including the squared exponential kernel). Since $\beta_\kappa$ is not trivial to find, we point out that $\beta_\kappa$ always appears in a ratio with $k_c$; thus any conservatism or approximation in $\beta_\kappa$ can generally be compensated by the designer's choice of the control gain $k_c$.
Overcoming these challenges, the GP allows, based on event-triggered online learning, the design of a feedback linearizing control law which asymptotically stabilizes an initially unknown system (with high probability). This is made possible by the error bounds on the model, which is the most significant advantage of a GP over alternative modeling approaches like neural networks [53].
As the model update is event-triggered, only data points which are necessary to increase the precision of the model are collected. This reduces the frequency at which measurements are taken and increases data efficiency. With Corollary 5 we have shown that only a single data point must be stored to guarantee asymptotic convergence. This is a significant advantage of the locally linearizing control law in comparison to predictive control laws or reinforcement learning algorithms, where an accurate global model is required. Accurate global models require active exploration, e.g. through exploration noise, which sacrifices control performance (known as the exploration-exploitation trade-off).

VI. NUMERICAL ILLUSTRATION
To illustrate the proposed approach, we present simulations for the control affine system
$$\dot{x}_1 = x_2, \quad (65)$$
$$\dot{x}_2 = \underbrace{1 - \sin(x_1) + s(x_2)}_{=\, f(x)} + \underbrace{\left(1 + x_2^2/2\right)}_{=\, g(x)}\, u,$$
where $s(x) = 1/(1 + \exp(-x/2))$ is the sigmoid function. It is a modified pendulum system and fulfills Assumptions 1 and 2. The code is available at https://gitlab.lrz.de/ga68car/adaptFeLi4GPs. To ensure Assumption 6 holds, we do not simulate (65) directly but use a GP mean which was trained on it with a high density of training points. As we are working in simulation, Assumptions 3, 5 and 7 hold or do not hold by design in the following two scenarios, which we use to illustrate the proposed approach. An overview of the employed parameters is given in Table I.

Fig. 3. Scenario 1: The black solid line illustrates the actual and the green dashed line the desired value for the state $x_1$. The system converges to the desired state over time.

A. Scenario 1: Time-triggered updates
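A minimal closed-loop sketch of this scenario is given below. It uses the simplified setting of Corollary 1 ($g$ known), Euler integration, a sinusoidal reference for brevity, and placeholder gains; the plant constants are reconstructions from the garbled source and may differ from the original experiment.

```python
import numpy as np

# Plant (65); the constants below are reconstructions/placeholders.
s = lambda z: 1.0 / (1.0 + np.exp(-z / 2.0))          # assumed sigmoid form
f_true = lambda x: 1.0 - np.sin(x[0]) + s(x[1])       # unknown part f(x)
g_true = lambda x: 1.0 + x[1] ** 2 / 2.0              # known part g(x)

def make_gp_mean(X, y, sigma_on, sigma_f=2.0, ell=1.0):
    """Posterior mean estimate f_hat per (42) with an SE kernel."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = sigma_f**2 * np.exp(-0.5 * d2 / ell**2) + sigma_on**2 * np.eye(len(X))
    alpha = np.linalg.solve(K, y)
    return lambda q: float(
        sigma_f**2 * np.exp(-0.5 * ((X - q) ** 2).sum(-1) / ell**2) @ alpha)

def simulate(T=10.0, dt=1e-3, update_dt=0.5, k_c=2.0, lam=2.0, sigma_on=0.01):
    """Algorithm 1 with a time-triggered model update (n = 2)."""
    rng = np.random.default_rng(0)
    x = np.zeros(2)
    X, y = [], []                                     # dataset D_kappa
    fhat = lambda q: 0.0                              # prior estimate f_hat_0 = 0
    t_next, errs = update_dt, []
    for step in range(int(T / dt)):
        t = step * dt
        e = x - np.array([np.sin(t), np.cos(t)])      # e = x - x_d
        r = lam * e[0] + e[1]                         # filtered error (28)
        rho = lam * e[1] + np.sin(t)                  # rho (30), ddx_d = -sin(t)
        u = (-fhat(x) - k_c * r - rho) / g_true(x)    # control law (31)
        dx2 = f_true(x) + g_true(x) * u
        if t >= t_next:                               # time-triggered data intake
            X.append(x.copy())
            y.append(dx2 - g_true(x) * u + sigma_on * rng.standard_normal())  # (41)
            fhat = make_gp_mean(np.array(X), np.array(y), sigma_on)
            t_next += update_dt
        x = x + dt * np.array([x[1], dx2])            # Euler step
        errs.append(np.linalg.norm(e))
    return errs
```

As more training points accumulate along the trajectory, the residual $f - \hat{f}_\kappa$ shrinks near the reference and the tracking error settles into a correspondingly smaller ball.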
In Scenario 1 (S1), we illustrate the results from Sec. IV, which were shown to hold for an arbitrary switching sequence. Therefore, we utilize a periodic, time-triggered model update, i.e. $t_{\kappa+1} - t_\kappa = \Delta t$, $\forall \kappa$, with $\Delta t = 0.5$, and follow Algorithm 1. We consider $f(x)$ and $g(x)$ to be unknown, so Assumption 5 does not hold, but we know that $g(x)$ is positive, so Assumption 2 holds. As reference trajectory,
$$x_d(t) = 1 - \frac{1}{1 + \exp(-(t - 10))} \quad (66)$$
is used, which describes a "soft" jump from $x_1 = 1$ to $x_1 = 0$ at $t = 10$ and fulfills the required smoothness of Assumption 4. The scenario works on noisy measurements, thus Assumption 7 does not hold. As this scenario does not utilize Assumption 6, we consider the kernel's hyperparameters to be unknown. Therefore, a hyperparameter optimization according to (8) is performed at each model update step $\kappa$. The simulation is stopped manually after $T_{sim} = 20$, which leads to $N = 40$ data points.
Figure 3 shows the desired trajectory and the corresponding tracking performance of the controller over time. Figure 4 illustrates the resulting trajectory in the state space. In Figs. 5 and 6, the true system dynamics $f(x)$, $g(x)$ are compared with the approximations $\hat{f}(x)$, $\hat{g}(x)$ at the end of the simulation. It turns out that the hyperparameters for $k_{fg}$ are well identified: With $l_{f_1} \ll l_{f_2}$, the estimate $\hat{f}(x)$ shows that $f(x)$ mainly depends on $x_1$ (and vice versa for $g(x)$ with $l_{g_1} \gg l_{g_2}$). As can be seen in Figs. 5 and 6, the estimates are more precise near the training data.
It can be seen that the two stationary points of the desired trajectory, $(1, 0)$ and $(0, 0)$, are approached with high precision in the steady state. However, both require a few measurements to be collected in the corresponding area of the state

Fig. 4. Scenario 1: Black crosses indicate the collected training points; the black solid line illustrates the actual and the green dashed line the desired trajectory. The system approaches the desired states as more training points are collected.

Fig. 5. Scenario 1: The surface illustrates the relative error $(\hat{f}(x) - f(x))/f(x)$ between the true function $f(x)$ and the model estimate $\hat{f}(x)$ after training (black crosses). The error is lowest (in absolute value) near the training data.
space and the subsequent model updates to achieve this high precision. Once the steady state is reached, the time-triggered implementation keeps adding unnecessary data points even though the model already has high precision in this area. This is improved by the event-triggered model update illustrated in Scenario 2.

Fig. 6. Scenario 1: The surface illustrates the relative error (ĝ(x) − g(x))/g(x) between the true function g(x) and the model estimate ĝ(x) after taking the training points (black crosses). The error is lowest (in absolute value) near the training data.

TABLE I: Simulation parameters.
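The event-triggered alternative examined next replaces the fixed update period by an uncertainty test: a model update is requested only when the scaled posterior standard deviation β_κ σ_κ(x) reaches the stability threshold k_c|r| from (53). A minimal sketch of both trigger rules (the function names and default values are illustrative, not from the article):

```python
def update_triggered(sigma, r, beta, k_c, r_min=1e-3):
    """Event-trigger sketch following (53): request a model update when the
    scaled GP uncertainty beta*sigma(x) reaches the threshold k_c*|r|,
    where r is the filtered tracking error.  r_min enforces the lower
    bound on |r| used to avoid numerical difficulties (Sec. VI-B)."""
    r_eff = max(abs(r), r_min)          # enforce |r| >= r_min
    return beta * sigma >= k_c * r_eff

def time_triggered(t, t_last, dt=0.5):
    """Time-triggered baseline: update after every fixed interval dt."""
    return t - t_last >= dt
```

The time-triggered baseline with ∆t = 0.5 is included only for comparison with Scenario 1; in the event-triggered case, no updates occur once the uncertainty along the trajectory is small.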
B. Scenario 2: Event-triggered updates
In Scenario 2 (S2), we illustrate the results of Sec. V, which utilize the event-triggered model update described by (53). For this scenario, g(x) is assumed to be known (Assumption 5 holds). The reference trajectory

x_d(t) = sin(t)   (67)

is used, which describes a circle with radius 1 in the state space. The scenario works on noise-free measurements, so Assumption 7 does hold; however, for numerical stability a minimal noise level σ_on > 0 is assumed. The simulation is stopped manually after T_sim = 100.

As this scenario utilizes Assumption 6, we take the kernel hyperparameters to be known at σ_f = 5, l_f = 5 and do not update them at any of the triggered events. Additionally, we set β_κ constant ∀κ and refer to the discussion in Sec. V-D and [28]. We also enforce a lower bound r > r_min to avoid numerical difficulties.

Figure 8 shows the tracking error until t = 30, which initially decreases approximately exponentially until a numerical limit is reached. In the event-triggered setup, events are triggered only until sufficient training points are collected around the desired trajectory. This is also visualized in the state-space view in Fig. 7. In comparison, the time-triggered approach stores a data point every ∆t = 0.5 and would keep adding points for longer simulations, which the event-based approach would not.

The time trigger results in a higher computational burden for the data-driven approach: on a MATLAB 2019a implementation running on an i5-6200U CPU at 2.3 GHz, the event-triggered simulation is considerably faster and requires less memory than the time-triggered one.

Figure 8 also shows that the stability criterion of Theorem 3 is fulfilled in the event-triggered case, since β_κ σ_κ(x) ≤ k_c|r| holds at all times. In contrast, for the time-triggered case this condition is violated frequently, which means that negative definiteness of the common Lyapunov function, and thereby stability, cannot be shown.

VII. CONCLUSION

This article proposes an online learning feedback linearizing control law based on Gaussian process models. The closed-loop identification of the initially unknown system exploits the control-affine structure by utilizing a composite kernel.

Fig. 7. Scenario 2: Black crosses indicate the collected training points, the black solid line illustrates the actual, the green dashed line the desired trajectory. The colormap shows the variance function (15) of the GP, σ(x), after the 51st update, where yellow indicates low variance and blue high variance.

The model is updated event-triggered, taking advantage of the uncertainty measure of the GP. The control law results in global asymptotic stability of the tracking error in the noiseless case and in global uniform ultimate boundedness for noisy measurements (of the highest state derivative) with high probability. We therefore propose a safe and data-efficient online learning control approach because model updates occur only if required to ensure stability. Zeno behavior is excluded, as a lower bound on the inter-event time is derived. The proposed techniques are illustrated using simulations to support the theoretical results.

APPENDIX A
EXPRESSING STRUCTURE IN KERNELS
According to [4], the kernel of the GP does not onlydetermine the smoothness properties of the resulting functionsbut can also be utilized to express prior knowledge regardingthe structure of the unknown function.
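Two such constructions are used below: the sum of kernels (Appendix A-A) and the scaling of a kernel by a known function (Appendix A-B). As a minimal numerical check that both yield valid, i.e., positive semidefinite, kernel matrices (the data, hyperparameters, and scaling function h below are arbitrary illustrative choices, not taken from the article):

```python
import numpy as np

def k_se(A, B, ell=1.0, sf=1.0):
    """Squared-exponential kernel matrix between row-wise inputs A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf**2 * np.exp(-0.5 * d2 / ell**2)

X = np.random.default_rng(0).normal(size=(25, 2))

# The sum of two kernels is again a kernel ...
K_sum = k_se(X, X, ell=0.5) + k_se(X, X, ell=2.0)

# ... and so is the scaling by a known function h(x):
h = np.cos(X[:, 0])                           # arbitrary known function
K_prod = h[:, None] * k_se(X, X) * h[None, :]

# Both Gram matrices remain positive semidefinite (up to round-off):
for K in (K_sum, K_prod):
    assert np.linalg.eigvalsh(K).min() > -1e-9
```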
A. Sum of functions
Consider f_a, f_b : X → R, which originate from two independent GP priors

f_a(x) ∼ GP(m_a(x), k_a(x, x′)),   (68)
f_b(x) ∼ GP(m_b(x), k_b(x, x′)),   (69)

and add up to f_sum : X → R, i.e., f_sum(x) = f_a(x) + f_b(x). Then,

f_sum(x) ∼ GP(m_a(x) + m_b(x), k_a(x, x′) + k_b(x, x′))   (70)

is also a GP with kernel k_a(x, x′) + k_b(x, x′).

Fig. 8. Scenario 2: Comparison of the event-triggered (top) and time-triggered (bottom) online learning. For the former, events (magenta circles) are triggered when the threshold k_c/β_κ (black vertical line) is reached by σ_κ/r, as proposed in (53). For the latter, events are triggered after a fixed time interval (∆t = 0.5). The blue lines show the norm of the tracking error ‖e‖.

For regression, where noisy measurements

y_sum^(i) = f_sum(x^(i)) + ε^(i) = f_a(x^(i)) + f_b(x^(i)) + ε^(i),   (71)

with ε^(i) ∼ N(0, σ_on²), i = 1, . . . , N, of the sum of the two functions are available, the joint distribution of the individual functions and the observations is given by

\begin{bmatrix} f_a(x^*) \\ f_b(x^*) \\ y_{\mathrm{sum}} \end{bmatrix} \sim \mathcal{N}\left( 0, \begin{bmatrix} k^*_a & 0 & k_a^\top \\ 0 & k^*_b & k_b^\top \\ k_a & k_b & K_a + K_b + \sigma_{\mathrm{on}}^2 I_N \end{bmatrix} \right),   (72)

where the prior mean functions are set to zero, m_a(x) = m_b(x) = 0, for notational simplicity and k_a, k_b, k*_a, k*_b are defined according to (16).
By conditioning, the outputs of f_a and f_b are inferred at a test point x*:

f_a(x*) | X, y_sum ∼ N(k_a^⊤ K_sum^{−1} y_sum, k*_a − k_a^⊤ K_sum^{−1} k_a),   (73)
f_b(x*) | X, y_sum ∼ N(k_b^⊤ K_sum^{−1} y_sum, k*_b − k_b^⊤ K_sum^{−1} k_b),   (74)

where K_sum = K_a + K_b + σ_on² I_N with K_a, K_b according to (11). Similarly to (8), the extended hyperparameter vector ψ_sum = [ψ_a^⊤ ψ_b^⊤]^⊤ is obtained through optimization of the likelihood, where K = K_sum and y_f = y_sum. This allows predicting values of the individual functions f_a, f_b even though only their sum has been measured.

B. Product with known function
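The conditioning in (73)-(74) can be sketched numerically. Here the two component functions, their length scales, and the noise level are illustrative assumptions, not taken from the article; a smooth and a rapidly oscillating component are chosen so that the two kernels can separate them:

```python
import numpy as np

def k_se(a, b, ell, sf):
    """1-D squared-exponential kernel matrix."""
    return sf**2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

rng = np.random.default_rng(1)
X = np.linspace(-3.0, 3.0, 40)
f_a = np.sin(X)                  # slowly varying component (assumed)
f_b = 0.3 * np.sin(8.0 * X)      # quickly varying component (assumed)
sn2 = 1e-4                       # observation noise variance sigma_on^2
y_sum = f_a + f_b + np.sqrt(sn2) * rng.normal(size=X.size)

# Kernels whose length scales match the two components (assumed known).
K_a = k_se(X, X, ell=1.5, sf=1.0)
K_b = k_se(X, X, ell=0.15, sf=0.3)
K_s = K_a + K_b + sn2 * np.eye(X.size)

# Posterior means of f_a and f_b at the training inputs, cf. (73)-(74):
alpha = np.linalg.solve(K_s, y_sum)   # K_sum^{-1} y_sum
fa_hat = K_a @ alpha
fb_hat = K_b @ alpha
```

Although only the sum is observed, fa_hat closely follows the smooth component: the long-length-scale kernel has negligible spectral mass at the fast oscillation frequency, so the decomposition is recovered.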
Consider an unknown function f_h : X → R which is multiplied by a known function h : X → R. We can model f_h using a GP with a scaled kernel function and noisy measurements

y_prod^(i) = f_prod(x^(i)) + ε^(i) = f_h(x^(i)) h(x^(i)) + ε^(i)   (75)

of the product, with ε^(i) ∼ N(0, σ_on²), i = 1, . . . , N. Thus, if f_h ∼ GP(0, k_h(x, x′)) is a GP, then f_prod(x) is also a GP with kernel

k_prod(x, x′) = h(x) k_h(x, x′) h(x′),   (76)

where the prior mean is set to zero, m_h(x) = 0, for notational simplicity.

The joint distribution of the measurements and the output of f_h at a test input x* is given by

\begin{bmatrix} f_h(x^*) \\ y_{\mathrm{prod}} \end{bmatrix} \sim \mathcal{N}\left( 0, \begin{bmatrix} k^*_h & k_h^\top H^\top \\ H k_h & H^\top K_h H + \sigma_{\mathrm{on}}^2 I_N \end{bmatrix} \right),   (77)

where H = diag(h(x^(1)), . . . , h(x^(N))) ∈ R^{N×N} and k_h, k*_h, K_h are defined analogously to (16) and (11), respectively. By conditioning on the training data and the input, the function f_h is inferred by

f_h(x*) | X, y_prod ∼ N(k_h^⊤ H^⊤ K_prod^{−1} y_prod, k*_h − k_h^⊤ H^⊤ K_prod^{−1} H k_h),   (78)

where K_prod = H^⊤ K_h H + σ_on² I_N.

Remark 8:
Instead of scaling the kernel, it seems more straightforward to use y_prod^(i)/h(x^(i)) as training data for a GP with an unscaled kernel. However, this would scale the observation noise undesirably, is numerically unstable, and is not compatible with the summation of kernels in Appendix A-A, which we combine in our identification approach in Sec. III-B.

APPENDIX B
IMPROVING IDENTIFICATION
From Lemma 2 it is known that the state remains bounded for any finite 0 < T < ∞ without any control input. Thus, without risking damage to the system, one can set u = 0 for a time interval T and record an open-loop training point

y^(i_ol) = f(x^(i_ol)) + ε^(i_ol),   (79)

which is highly beneficial as it measures only f(x) (with the usual noise ε). The GP framework allows merging these i_ol = 1, . . . , N_ol observations with the closed-loop training points in D_κ to improve the prediction as follows: Consider the extension of the joint distribution (25) (where u = 1 is assumed in the closed-loop measurements and m_g(x) = 0 for notational convenience)

\begin{bmatrix} f(x^*) \\ g(x^*) \\ y \\ y_{\mathrm{ol}} \end{bmatrix} \sim \mathcal{N}\left( 0, \begin{bmatrix} k^*_f & 0 & k_f^\top & k_{f,\mathrm{ol}}^\top \\ 0 & k^*_g & k_g^\top & 0_{1 \times N_{\mathrm{ol}}} \\ k_f & k_g & K_{fg} & K_{\mathrm{ol,cl}}^\top \\ k_{f,\mathrm{ol}} & 0_{N_{\mathrm{ol}} \times 1} & K_{\mathrm{ol,cl}} & K_{\mathrm{ol}} \end{bmatrix} \right),

where K_ol,cl, K_ol are the pairwise evaluations of k(x^(i_ol), x^(i)) and k(x^(i_ol), x^(i'_ol)), respectively, and k_f,ol evaluates k(x*, x^(i_ol)), for all i = 1, . . . , N and i_ol, i'_ol = 1, . . . , N_ol. Then, the estimates are given by

f̂(x*) = [k_f^⊤  k_f,ol^⊤] K^{−1} ỹ,   ĝ(x*) = [k_g^⊤  0_{1×N_ol}] K^{−1} ỹ,

with

K = \begin{bmatrix} K_{fg} & K_{\mathrm{ol,cl}}^\top \\ K_{\mathrm{ol,cl}} & K_{\mathrm{ol}} \end{bmatrix}  and  ỹ = \begin{bmatrix} y \\ y_{\mathrm{ol}} \end{bmatrix}.   (80)

We do not further investigate this extension, since it does not provide additional formal guarantees regarding convergence; in practice, however, a significant improvement of the identification can be expected.

REFERENCES

[1] L. Ljung,
System Identification. Upper Saddle River, NJ, USA: Prentice Hall PTR, 1998.
[2] J. Kocijan, Modelling and Control of Dynamic Systems Using Gaussian Process Models. Springer, 2016.
[3] C. E. Rasmussen and C. K. Williams, Gaussian Processes for Machine Learning. Cambridge, MA, USA: MIT Press, Jan. 2006.
[4] D. Duvenaud, "Automatic model construction with Gaussian processes," Ph.D. dissertation, Computational and Biological Learning Laboratory, University of Cambridge, 2014.
[5] P. A. Ioannou and J. Sun, Robust Adaptive Control. Upper Saddle River, NJ: PTR Prentice-Hall, 1996, vol. 1.
[6] M. Krstic, I. Kanellakopoulos, P. V. Kokotovic et al., Nonlinear and Adaptive Control Design. New York: Wiley, 1995, vol. 222.
[7] K. J. Åström and B. Wittenmark, Adaptive Control. Courier Corporation, 2013.
[8] A. S. Bazanella, L. Campestrini, and D. Eckhard, Data-Driven Controller Design: The H2 Approach. Springer Science & Business Media, 2011.
[9] G. R. Gonçalves da Silva, A. S. Bazanella, C. Lorenzini, and L. Campestrini, "Data-driven LQR control design," IEEE Control Systems Letters, vol. 3, no. 1, pp. 180–185, Jan. 2019.
[10] L. Campestrini, D. Eckhard, A. S. Bazanella, and M. Gevers, "Data-driven model reference control design by prediction error identification," Journal of the Franklin Institute.
[11] IEEE Control Systems Magazine, vol. 26, no. 3, pp. 96–114, Jun. 2006.
[12] M.-B. Rădac, R.-E. Precup, E. M. Petriu, and S. Preitl, "Iterative data-driven tuning of controllers for nonlinear systems with constraints," IEEE Transactions on Industrial Electronics, vol. 61, no. 11, pp. 6360–6368, Nov. 2014.
[13] Z. Hou, R. Chi, and H. Gao, "An overview of dynamic-linearization-based data-driven control and applications," IEEE Transactions on Industrial Electronics, vol. 64, no. 5, pp. 4076–4090, May 2017.
[14] M. C. Campi and S. M. Savaresi, "Direct nonlinear control design: The virtual reference feedback tuning (VRFT) approach," IEEE Transactions on Automatic Control, vol. 51, no. 1, pp. 14–27, Jan. 2006.
[15] H. Hjalmarsson, M. Gevers, S. Gunnarsson, and O. Lequin, "Iterative feedback tuning: Theory and applications," IEEE Control Systems Magazine, vol. 18, no. 4, pp. 26–41, Aug. 1998.
[16] L. C. Kammer, R. R. Bitmead, and P. L. Bartlett, "Direct iterative tuning via spectral analysis," Automatica, vol. 36, pp. 1301–1307, 2000.
[17] N. J. Killingsworth and M. Krstic, "PID tuning using extremum seeking: Online, model-free performance optimization," IEEE Control Systems Magazine, vol. 26, no. 1, pp. 70–79, Feb. 2006.
[18] E. Theodorou, J. Buchli, and S. Schaal, "Reinforcement learning of motor skills in high dimensions: A path integral approach," in International Conference on Robotics and Automation (ICRA). IEEE, May 2010, pp. 2397–2403.
[19] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 1st ed. Cambridge, MA, USA: MIT Press, 1998.
[20] M. P. Deisenroth and C. E. Rasmussen, "PILCO: A model-based and data-efficient approach to policy search," in International Conference on Machine Learning (ICML), 2011, pp. 465–472.
[21] D. Nguyen-Tuong and J. Peters, "Model learning for robot control: A survey," Cognitive Processing, vol. 12, no. 4, pp. 319–340, 2011.
[22] D. Nguyen-Tuong, M. Seeger, and J. Peters, "Model learning with local Gaussian process regression," Advanced Robotics, vol. 23, no. 15, pp. 2015–2034, 2009.
[23] T. Beckers, J. Umlauft, and S. Hirche, "Stable model-based control with Gaussian process regression for robot manipulators," in World Congress of the International Federation of Automatic Control (IFAC), vol. 50, no. 1. Elsevier, 2017, pp. 3877–3884.
[24] Y. Fanger, J. Umlauft, and S. Hirche, "Gaussian processes for dynamic movement primitives with application in knowledge-based cooperation," in International Conference on Intelligent Robots and Systems (IROS). IEEE, Oct. 2016, pp. 3913–3919.
[25] T. Beckers, J. Umlauft, D. Kulic, and S. Hirche, "Stable Gaussian process based tracking control of Lagrangian systems," in Conference on Decision and Control (CDC). IEEE, Dec. 2017, pp. 5180–5185. [Online]. Available: https://ieeexplore.ieee.org/document/8264427
[26] J. Umlauft, A. Lederer, and S. Hirche, "Learning stable Gaussian process state space models," in American Control Conference (ACC). IEEE, May 2017, pp. 1499–1504. [Online]. Available: https://ieeexplore.ieee.org/document/7963165
[27] J. Umlauft, L. Pöhler, and S. Hirche, "An uncertainty-based control Lyapunov approach for control-affine systems modeled by Gaussian process," IEEE Control Systems Letters, vol. 2, no. 3, pp. 483–488, Jul. 2018; Outstanding Student Paper Award of the IEEE Conference on Decision and Control. [Online]. Available: https://ieeexplore.ieee.org/document/8368325
[28] F. Berkenkamp, R. Moriconi, A. Schoellig, and A. Krause, "Safe learning of regions of attraction for uncertain, nonlinear systems with Gaussian processes," arXiv preprint arXiv:1603.04915, 2016.
[29] G. Chowdhary, H. A. Kingravi, J. P. How, and P. A. Vela, "Bayesian nonparametric adaptive control using Gaussian processes," IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 3, pp. 537–550, Mar. 2015.
[30] A. Chakrabortty and M. Arcak, "Robust stabilization and performance recovery of nonlinear systems with unmodeled dynamics," IEEE Transactions on Automatic Control (TAC), vol. 54, no. 6, pp. 1351–1356, Jun. 2009.
[31] A. Chakrabortty and M. Arcak, "Time-scale separation redesigns for stabilization and performance recovery of uncertain nonlinear systems," Automatica.
Identification and Control of Nonlinear Systems Using Neural Network Models: Design and Stability Analysis. University of Southern California, 1991.
[33] F. L. Lewis, A. Yesildirek, and K. Liu, "Multilayer neural-net robot controller with guaranteed tracking performance," IEEE Transactions on Neural Networks, vol. 7, no. 2, pp. 388–399, 1996.
[34] F. L. Lewis, K. Liu, and A. Yesildirek, "Neural net robot controller with guaranteed tracking performance," IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 703–715, 1995.
[35] R. M. Sanner and J.-J. Slotine, "Stable adaptive control and recursive identification using radial Gaussian networks," in Conference on Decision and Control (CDC). IEEE, 1991, pp. 2116–2123.
[36] A. Yesildirak and F. L. Lewis, "Feedback linearization using neural networks," Automatica, vol. 31, no. 11, pp. 1659–1664, 1995.
[37] J. Umlauft, Y. Fanger, and S. Hirche, "Bayesian uncertainty modeling for programming by demonstration," in International Conference on Robotics and Automation (ICRA), May 2017, pp. 6428–6434.
[38] J. Umlauft, T. Beckers, M. Kimmel, and S. Hirche, "Feedback linearization using Gaussian processes," in Conference on Decision and Control (CDC). IEEE, Dec. 2017, pp. 5249–5255. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/8264435/
[39] H. K. Khalil and J. Grizzle, Nonlinear Systems. New Jersey: Prentice Hall, 1996, vol. 3.
[40] M. W. Seeger, S. M. Kakade, and D. P. Foster, "Information consistency of nonparametric Gaussian process methods," IEEE Transactions on Information Theory, vol. 54, no. 5, pp. 2376–2382, May 2008.
[41] J. Sherman and W. J. Morrison, "Adjustment of an inverse matrix corresponding to a change in one element of a given matrix," Ann. Math. Statist., vol. 21, no. 1, pp. 124–127, Mar. 1950. [Online]. Available: https://doi.org/10.1214/aoms/1177729893
[42] D. Liberzon, Switching in Systems and Control. Springer Science & Business Media, 2012.
[43] T. Beckers, D. Kulić, and S. Hirche, "Stable Gaussian process based tracking control of Euler-Lagrange systems," Automatica, vol. 103, pp. 390–397, 2019.
[44] J.-J. E. Slotine and J. Karl Hedrick, "Robust input-output feedback linearization," International Journal of Control, vol. 57, no. 5, pp. 1133–1139, 1993.
[45] N. Srinivas, A. Krause, S. M. Kakade, and M. W. Seeger, "Information-theoretic regret bounds for Gaussian process optimization in the bandit setting," IEEE Transactions on Information Theory, vol. 58, no. 5, pp. 3250–3265, May 2012.
[46] D. H. Wolpert, "The supervised learning no-free-lunch theorems," in Soft Computing and Industry. Springer, 2002, pp. 25–42.
[47] W. P. M. H. Heemels, K. H. Johansson, and P. Tabuada, "An introduction to event-triggered and self-triggered control," in Conference on Decision and Control (CDC), Dec. 2012, pp. 3270–3285.
[48] D. Nguyen-Tuong, J. R. Peters, and M. Seeger, "Local Gaussian process regression for real time online model learning," in Advances in Neural Information Processing Systems (NIPS), D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, Eds. Curran Associates, Inc., 2009, pp. 1193–1200.
[49] J. Umlauft, T. Beckers, and S. Hirche, "A scenario-based optimal control approach for Gaussian process state space models," in European Control Conference (ECC), Jun. 2018, pp. 1386–1392. [Online]. Available: https://ieeexplore.ieee.org/document/8550458
[50] P. Tabuada, "Event-triggered real-time scheduling of stabilizing control tasks," IEEE Transactions on Automatic Control (TAC).
Journal of Heuristics, vol. 4, no. 1, pp. 63–86, 1998.
[53] F. Lewis, S. Jagannathan, and A. Yesildirak, Neural Network Control of Robot Manipulators and Non-Linear Systems. CRC Press, 1998.
Jonas Umlauft (S'15) received the B.Sc. and M.Sc. degrees in electrical engineering and information technology from the Technical University of Munich, Germany, in 2013 and 2015, respectively. His Master's thesis was carried out at the Computational and Biological Learning Group at the University of Cambridge, UK. Since May 2015, he has been a Ph.D. student at the Chair of Information-oriented Control, Department of Electrical and Computer Engineering, Technical University of Munich, Germany. His current research interests include the stability of data-driven control systems and system identification based on Gaussian processes.