Safe learning-based trajectory tracking for underactuated vehicles with partially unknown dynamics

Abstract

Underactuated vehicles have gained much attention in recent years due to the increasing number of aerial and underwater vehicles as well as nanosatellites. The safe tracking control of these vehicles is a substantial aspect for an increasing range of application domains. However, external disturbances and parts of the internal dynamics are often unknown or very time-consuming to model. To overcome this issue, we present a safe tracking control law for underactuated vehicles using a learning-based oracle for the prediction of the unknown dynamics. The presented approach guarantees the boundedness of the tracking error with high probability, where the bound is explicitly given. With additional assumptions, asymptotic stability is achieved. A simulation with a quadrocopter visualizes the effectiveness of the proposed control law.


arXiv [eess.SY]

Thomas Beckers, Leonardo Colombo, and Sandra Hirche

I. INTRODUCTION

The demand for unmanned aerial and underwater vehicles is rapidly increasing in many areas such as monitoring, mapping, agriculture, and delivery. These vehicles are typically underactuated for constructional reasons, which poses several challenges from the control perspective [1]. The dynamics of these systems can often be expressed as rigid-body motion with full attitude control and one translational force input. This is a classical problem in underactuated mechanics, and many different types of control methods have been proposed to achieve accurate trajectory tracking. Most of the control approaches are based on feedback linearization [2], [3] and backstepping methods [4], [5], which have been shown to perform accurate tracking in simulations and experiments. Furthermore, theoretical results about the stability of the tracking error have been proposed, e.g. in [6].

However, these control approaches depend on exact models of the systems and of possible external disturbances to guarantee stability and precise tracking. An accurate model of typical uncertainties is hard to obtain with first-principles techniques. In particular, the impact of air or water flow on aerial and underwater vehicles and the interaction with unstructured and a priori unknown environments further compound the uncertainty. Increasing the feedback gains to suppress the unknown dynamics is unfavorable because of the large errors in the presence of noise and the saturation of actuators. A suitable approach to avoid the time-consuming or even infeasible modeling process is provided by learning-based oracles such as neural networks or Gaussian processes (GPs).

T. Beckers and S. Hirche are with the Chair of Information-oriented Control (ITR), Department of Electrical and Computer Engineering, Technical University of Munich, 80333 Munich, Germany, {t.beckers, hirche}@tum.de. L. Colombo is with Instituto de Ciencias Matematicas (CSIC-UAM-UCM-UC3M), Calle Nicolas Cabrera 13-15, Campus Cantoblanco, 28049 Madrid, Spain, [email protected]

These data-driven modeling tools have shown remarkable results in many different control applications, see [7]. In this case, data of the unknown system dynamics is collected and used by the oracle to predict the dynamics in areas without training data. The collection of data can be performed in a separate step before the oracle is used in the controller (offline learning) or during the control (online learning).

The purpose of this article is to employ the power of learning-based approaches for the tracking control of a class of underactuated systems. Additionally, stability and a desired level of performance of the closed-loop system should be guaranteed with low feedback gains when possible. The problem of tracking control of underactuated aerial and underwater vehicles with uncertainties has been addressed in [3], [8]–[10], but these approaches are restricted to structured uncertainties such as uncertain parameters, or use high feedback gains for compensation. Safe feedback linearization and backstepping controllers based on Gaussian processes are introduced in [11]–[13] for a specific class of systems, but they do not capture the general underactuated nature of the model class considered here and are limited to fixed feedback gains. In [14], [15], learning-based approaches for Euler-Lagrange systems with stability guarantees are presented. However, the systems are required to be fully actuated. For a specific type of aerial vehicle, a safe Gaussian process based controller is proposed in [16], but with additional assumptions such as an initial safe controller.
The contribution of this article is a safe learning-based tracking control law for a large class of underactuated vehicles with stability and performance guarantees. Instead of focusing on a particular type of oracle, the proposed approach allows the usage of various online and offline learning-based oracles. Additionally, our method allows the feedback gains to be adapted based on the quality of the oracle to avoid the unfavorable effects of high feedback gains. The remainder of the article is structured as follows: After the problem setting in Section II, the learning-based oracles and the tracking controller are introduced in Section III. Finally, a numerical example with a quadrocopter is presented in Section IV.

II. PROBLEM SETTING

Notation: Vectors $\mathbf{a}$ are denoted with bold characters. Matrices $A$ are described with capital letters. The term $A_{i,:}$ denotes the $i$-th row of the matrix $A$. The expression $\mathcal{N}(\mu, \Sigma)$ describes a normal distribution with mean $\mu$ and covariance $\Sigma$. The set $\mathbb{R}_{>0}$ denotes the set of positive real numbers.

We assume a single underactuated rigid body with position $\mathbf{p} \in \mathbb{R}^3$ and orientation matrix $R \in SO(3)$. The body-fixed angular velocity is denoted by $\boldsymbol{\omega} \in \mathbb{R}^3$. The vehicle has mass $m \in \mathbb{R}_{>0}$ and rotational inertia tensor $J \in \mathbb{R}^{3 \times 3}$. The state space of the vehicle is $S = SE(3) \times \mathbb{R}^6$ with $\mathbf{s} = ((R, \mathbf{p}), (\boldsymbol{\omega}, \dot{\mathbf{p}})) \in S$ denoting the whole state of the system. The vehicle is actuated with control torques $\boldsymbol{\tau} \in \mathbb{R}^3$ and a control force $u > 0$, $u \in \mathbb{R}$, which is applied in a body-fixed direction defined by a unit vector $\mathbf{e} \in \mathbb{R}^3$. Note that we focus on non-zero control forces only, since for $u = 0$ no position tracking is possible in general. Thus, we can model the system as

$$
\begin{aligned}
m \ddot{\mathbf{p}} &= R \mathbf{e} u + \mathbf{f}(\mathbf{p}, \dot{\mathbf{p}}) \\
\dot{R} &= R \check{\boldsymbol{\omega}} \\
\dot{\boldsymbol{\omega}} &= J^{-1} \big( J \boldsymbol{\omega} \times \boldsymbol{\omega} + \boldsymbol{\tau} + \mathbf{f}_\omega(\mathbf{s}) \big),
\end{aligned}
\tag{1}
$$

where the map $\check{(\cdot)} : \mathbb{R}^3 \to \mathfrak{so}(3)$ is given by

$$
\check{\boldsymbol{\omega}} =
\begin{bmatrix}
0 & -\omega_3 & \omega_2 \\
\omega_3 & 0 & -\omega_1 \\
-\omega_2 & \omega_1 & 0
\end{bmatrix}.
\tag{2}
$$

Fig. 1. Vehicle with full attitude control and a translational force input.

The functions $\mathbf{f} : \mathbb{R}^6 \to \mathbb{R}^3$ and $\mathbf{f}_\omega : S \to \mathbb{R}^3$ are disturbances and/or unmodeled dynamics. The general objective is to track a trajectory specified by the functions $(R_d, \mathbf{p}_d) : [0, T] \to SE(3)$. For simplicity, we focus here on position tracking only, where the dynamics $\mathbf{f}$ is assumed to be unknown. The extension to rotation tracking is straightforward and will be discussed later.

A. Equivalent system

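In preparation for the learning and control step, we transform the system dynamics (1) into an equivalent form, which allows us to separate the unknown system part from the estimates of the oracle. With the system matrix $A \in \mathbb{R}^{6 \times 6}$ and input matrix $B \in \mathbb{R}^{6 \times 3}$ given by

$$
A = \begin{bmatrix} 0 & I \\ 0 & 0 \end{bmatrix}, \qquad
B = \begin{bmatrix} 0 \\ \tfrac{1}{m} I \end{bmatrix},
\tag{3}
$$

and $I \in \mathbb{R}^{3 \times 3}$ as identity matrix, we can rewrite (1) as

$$
\begin{aligned}
\dot{\mathbf{x}} &= A \mathbf{x} + B \big( \mathbf{g}(R, u) + \hat{\mathbf{f}}(\mathbf{x}) + \boldsymbol{\rho}(\mathbf{x}) \big) \\
\dot{R} &= R \check{\boldsymbol{\omega}} \\
\dot{\boldsymbol{\omega}} &= J^{-1} \big( J \boldsymbol{\omega} \times \boldsymbol{\omega} + \boldsymbol{\tau} + \mathbf{f}_\omega(\mathbf{s}) \big),
\end{aligned}
\tag{4}
$$

where the state is $\mathbf{x} = [\mathbf{p}^\top, \dot{\mathbf{p}}^\top]^\top \in \mathbb{R}^6$. The term $\mathbf{g} : SE(3) \to \mathbb{R}^3$ in the dynamics (4) is treated as virtual control input with $\mathbf{g}(R, u) := R \mathbf{e} u$. For the unknown dynamics $\mathbf{f}$, we use the estimate $\hat{\mathbf{f}} : \mathbb{R}^6 \to \mathbb{R}^3$ of an oracle. The estimation error is moved to $\boldsymbol{\rho}(\mathbf{x}) = \mathbf{f}(\mathbf{x}) - \hat{\mathbf{f}}(\mathbf{x})$ such that (4) is equivalent to (1) without loss of generality.

III. LEARNING-BASED CONTROL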

A. Learning

For the learning of the unknown dynamics $\mathbf{f}$ of (1), we consider an oracle which predicts the value of $\mathbf{f}(\mathbf{x})$ for a given state $\mathbf{x}$. For this purpose, the oracle collects $N(n) \in \mathbb{N}$ training points of the system (1) such that a data set

$$
\mathcal{D}_{n(t)} = \{ \mathbf{x}^{\{i\}}, \mathbf{y}^{\{i\}} \}_{i=1}^{N(n)}
\tag{5}
$$

exists, where the output data $\mathbf{y} \in \mathbb{R}^3$ are given by $\mathbf{y} = m \ddot{\mathbf{p}} - R \mathbf{e} u$. The data set $\mathcal{D}_{n(t)}$ with $n : \mathbb{R}_{\geq 0} \to \mathbb{N}$ can change over time $t$, such that the oracle allows online learning. The time-dependent estimate of the oracle is denoted by $\hat{\mathbf{f}}_n(\mathbf{x})$ to highlight the dependence on the corresponding data set $\mathcal{D}_n$. Note that this construction also allows offline learning, i.e. the prediction of the oracle depends on previously collected data only, or any hybrid online/offline approach.

Remark 1

Simple oracles can be parametric models such as a linear model, where the parameters are learned with a least-squares approach based on the data set $\mathcal{D}_n$. More powerful oracles are given by neural networks, due to their universal function approximation property [17]. Furthermore, non-parametric oracles such as Gaussian processes and support vector machines have led to promising results as probabilistic function approximators [18], [19].

For the later stability analysis of the closed loop, we introduce Assumptions 1 and 2, which cover various types of oracles.
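Returning to the simplest oracle mentioned in Remark 1, the following is a minimal sketch of a linear model fitted by least squares to a data set of the form (5). The function names `fit_linear_oracle` and `predict`, the added bias feature, and the synthetic noise-free data are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def fit_linear_oracle(X, Y):
    """Least-squares fit of f_hat(x) = W^T [x; 1].

    X: (N, 6) training states x = [p, p_dot], stored row-wise.
    Y: (N, 3) outputs y = m * p_ddot - R e u as in (5).
    Returns the parameter matrix W with shape (7, 3).
    """
    Phi = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias feature
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return W

def predict(W, x):
    """Evaluate the fitted linear oracle at a single state x of shape (6,)."""
    return np.append(x, 1.0) @ W

# Hypothetical data: a linear "unknown" function sampled without noise.
rng = np.random.default_rng(0)
W_true = rng.normal(size=(7, 3))
X = rng.normal(size=(50, 6))
Y = np.hstack([X, np.ones((50, 1))]) @ W_true
W = fit_linear_oracle(X, Y)
```

Such a model only satisfies an error bound of the kind required below if the true dynamics are close to linear, which is one reason the paper turns to more expressive oracles.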

Assumption 1

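Consider an oracle for $\mathbf{f}(\mathbf{x})$ with the output $\hat{\mathbf{f}}_n \in \mathcal{C}$ with bounded derivatives on a compact set $\mathcal{X} \subseteq \mathbb{R}^6$ based on the data set $\mathcal{D}_n$ (5). There exists a bounded function $\bar{\rho}_n : \mathcal{X} \to \mathbb{R}_{\geq 0}$ such that the prediction error is given by

$$
P\{ \| \mathbf{f}(\mathbf{x}) - \hat{\mathbf{f}}_n(\mathbf{x}) \| \leq \bar{\rho}_n(\mathbf{x}) \} \geq \delta
\tag{6}
$$

with a $\delta \in (0, 1]$ for all $\mathbf{x} \in \mathcal{X}$ and $n(t)$.

Assumption 2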

The number of data sets $\mathcal{D}_n$ is finite and there are only finitely many switches of $n(t)$ over time, such that there exists a time $T \in \mathbb{R}_{\geq 0}$ where $n(t) = n_{\text{end}}$, $\forall t \geq T$.

Assumption 1 is fulfilled, for instance, by a Gaussian process model as oracle, as shown in the next section. The second assumption is only mildly restrictive: the number of data sets is often naturally bounded by finite computational power or memory limitations, and since the unknown function $\mathbf{f}$ in (1) is not time-dependent, life-long learning is typically not required. Furthermore, Assumption 2 ensures that the switching between the data sets is not infinitely fast, which is natural in real-world applications.

B. Gaussian process as oracle

Gaussian process models have proven to be very powerful oracles for nonlinear function regression. For the prediction, we concatenate the $N(n)$ training points of $\mathcal{D}_n$ in an input matrix $X = [\mathbf{x}^{\{1\}}, \mathbf{x}^{\{2\}}, \ldots, \mathbf{x}^{\{N(n)\}}]$ and a matrix of outputs $Y^\top = [\mathbf{y}^{\{1\}}, \mathbf{y}^{\{2\}}, \ldots, \mathbf{y}^{\{N(n)\}}]$, where $\mathbf{y}$ might be corrupted by additive Gaussian noise with $\mathcal{N}(0, \sigma^2)$. Then, a prediction for the output $\mathbf{y}^* \in \mathbb{R}^3$ at a new test point $\mathbf{x}^* \in \mathcal{X}$ is given by

$$
\begin{aligned}
\mu_i(\mathbf{y}^* \mid \mathbf{x}^*, \mathcal{D}_n) &= m_i(\mathbf{x}^*) + \mathbf{k}(\mathbf{x}^*, X)^\top K^{-1} \big( Y_{:,i} - [m_i(X_{:,1}), \ldots, m_i(X_{:,N})]^\top \big) \\
\operatorname{var}_i(\mathbf{y}^* \mid \mathbf{x}^*, \mathcal{D}_n) &= k(\mathbf{x}^*, \mathbf{x}^*) - \mathbf{k}(\mathbf{x}^*, X)^\top K^{-1} \mathbf{k}(\mathbf{x}^*, X)
\end{aligned}
\tag{7}
$$

for all $i \in \{1, 2, 3\}$, where $\mu_i$ is the posterior mean and $\operatorname{var}_i$ the posterior variance for the $i$-th output dimension. The kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a measure for the correlation of two states $(\mathbf{x}, \mathbf{x}')$. The selection of the kernel and the determination of the corresponding hyperparameters can be seen as degrees of freedom of the regression. A powerful kernel for GP models is the squared exponential kernel. An overview of the properties of different kernels can be found in [18].

Remark 2

The mean function $m_i : \mathcal{X} \to \mathbb{R}$ allows prior knowledge about the unknown dynamics to be included. For instance, the mean function can be obtained by common system identification techniques for the unknown dynamics $\mathbf{f}$ such as described in [20]. However, without any prior knowledge the mean function is set to zero, i.e. $m_i(\mathbf{x}) = 0$.

The function $K : \mathcal{X}^N \times \mathcal{X}^N \to \mathbb{R}^{N \times N}$ is called the Gram matrix, whose elements are $K_{j',j} = k(X_{:,j'}, X_{:,j}) + \delta(j, j') \sigma^2$ for all $j', j \in \{1, \ldots, N\}$ with the delta function $\delta(j, j') = 1$ for $j = j'$ and zero otherwise. The vector-valued function $\mathbf{k} : \mathcal{X} \times \mathcal{X}^N \to \mathbb{R}^N$, with the elements $k_j = k(\mathbf{x}^*, X_{:,j})$ for all $j \in \{1, \ldots, N\}$, expresses the covariance between $\mathbf{x}^*$ and the input training data $X$. Based on (7), the normally distributed components $y_i^* \mid \mathbf{x}^*, \mathcal{D}_n$ are combined into a multivariate distribution $\mathbf{y}^* \mid (\mathbf{x}^*, \mathcal{D}_n) \sim \mathcal{N}(\boldsymbol{\mu}(\cdot), \Sigma(\cdot))$, where

$$
\begin{aligned}
\boldsymbol{\mu}(\mathbf{y}^* \mid \mathbf{x}^*, \mathcal{D}_n) &= [\mu_1(\cdot), \ldots, \mu_3(\cdot)]^\top \\
\Sigma(\mathbf{y}^* \mid \mathbf{x}^*, \mathcal{D}_n) &= \operatorname{diag}[\operatorname{var}_1(\cdot), \ldots, \operatorname{var}_3(\cdot)].
\end{aligned}
\tag{8}
$$

Remark 3

For notational simplicity, we consider identical kernels for each output dimension. However, the GP model can be easily adapted to different kernels for each output dimension.
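The prediction equations (7) and (8) can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the training inputs are stored row-wise (one state per row, whereas the paper stacks them as columns), a squared exponential kernel with hypothetical default hyperparameters `lengthscale` and `signal_var` is shared by all three outputs as in Remark 3, and the names `se_kernel` and `gp_predict` are chosen here for illustration.

```python
import numpy as np

def se_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    """Squared exponential kernel matrix between the rows of A (n, d) and B (m, d)."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)

def gp_predict(x_star, X, Y, noise_var=1e-2,
               mean=lambda Z: np.zeros((Z.shape[0], 3))):
    """Posterior mean (3,) and covariance (3, 3) following (7) and (8).

    X: (N, 6) training inputs, Y: (N, 3) training outputs.
    mean: prior mean function m(.), zero by default as in Remark 2.
    """
    K = se_kernel(X, X) + noise_var * np.eye(X.shape[0])  # Gram matrix with noise term
    k = se_kernel(X, x_star[None, :])[:, 0]               # k(x*, X)
    L = np.linalg.cholesky(K)                             # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y - mean(X)))
    mu = mean(x_star[None, :])[0] + k @ alpha
    v = np.linalg.solve(L, k)
    var = se_kernel(x_star[None, :], x_star[None, :])[0, 0] - v @ v
    return mu, var * np.eye(3)                            # identical kernel per output

# Hypothetical training data for a quick check.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(20, 6))
Y_train = np.column_stack([np.sin(X_train[:, 0]), np.cos(X_train[:, 1]), X_train[:, 2]])
mu_near, Sigma_near = gp_predict(X_train[0], X_train, Y_train, noise_var=1e-6)
mu_far, Sigma_far = gp_predict(np.full(6, 100.0), X_train, Y_train, noise_var=1e-6)
```

Close to a training point the posterior variance is small, while far from the data the prediction falls back to the prior mean with the prior variance. This is the behavior that the error bound introduced next builds on.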

With the introduced GP model, we now address Assumption 1 using [14], [19], [21]. To provide model error bounds, additional assumptions on the unknown function $\mathbf{f}$ must be introduced, in line with the no-free-lunch theorem.

Assumption 3

Let $\mathcal{X}$ be a compact set. The kernel $k$ is selected such that the function $\mathbf{f}$ has a bounded reproducing kernel Hilbert space (RKHS) norm on $\mathcal{X}$, i.e. $\| f_i \|_k < \infty$ for all $i = 1, 2, 3$.

The norm of a function $\mathbf{f}$ in an RKHS is a smoothness measure relative to a kernel $k$ that is uniquely connected with this RKHS. In particular, it is a Lipschitz constant with respect to the metric of the used kernel. A more detailed discussion about RKHS norms is given in [22]. Assumption 3 requires that the kernel be selected in such a way that the function $\mathbf{f}$ is an element of the associated RKHS. This sounds paradoxical since this function is unknown. However, there exist some kernels, namely universal kernels, which can approximate any continuous function arbitrarily precisely on a compact set [19, Lemma 4.55], such that the bounded RKHS norm is a mild assumption. Finally, with Assumption 3, the model error can be bounded by the following lemma.

Lemma 1 (adapted from [14])

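Consider the unknown function $\mathbf{f}$ and a GP model satisfying Assumption 3. The model error is bounded by

$$
P\Big\{ \big\| \boldsymbol{\mu}(\hat{\mathbf{f}}_n(\mathbf{x}) \mid \mathbf{x}, \mathcal{D}_n) - \mathbf{f}(\mathbf{x}) \big\| \leq \big\| \boldsymbol{\beta}^\top \Sigma^{\frac{1}{2}}(\hat{\mathbf{f}}_n(\mathbf{x}) \mid \mathbf{x}, \mathcal{D}_n) \big\| \Big\} \geq \delta
$$

for $\mathbf{x} \in \mathcal{X}$ with $\delta \in (0, 1)$, $\boldsymbol{\beta} \in \mathbb{R}^3$, $\gamma \in \mathbb{R}$ and

$$
\begin{aligned}
\beta_j &= \sqrt{ 2 \| f_j \|_k^2 + 300\, \gamma \ln^3 \!\big[ (N + 1) / (1 - \delta^{1/3}) \big] } \\
\gamma &= \max_{\mathbf{x}^{\{1\}}, \ldots, \mathbf{x}^{\{N+1\}} \in \mathcal{X}} \tfrac{1}{2} \log \big| I_{N+1} + \sigma^{-2} K(\mathbf{x}, \mathbf{x}') \big|, \qquad \mathbf{x}, \mathbf{x}' \in \big\{ \mathbf{x}^{\{1\}}, \ldots, \mathbf{x}^{\{N+1\}} \big\}.
\end{aligned}
$$

Proof: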

It is a direct implication of [14, Lemma 1].

With Assumption 3 and the fact that universal kernels exist which generate bounded predictions with bounded derivatives, see [21], GP models can be used as oracle to fulfill Assumption 1. In this case, the prediction error bound is given by $\bar{\rho}_n(\mathbf{x}) := \| \boldsymbol{\beta}^\top \Sigma^{\frac{1}{2}}(\hat{\mathbf{f}}_n(\mathbf{x}) \mid \mathbf{x}, \mathcal{D}_n) \|$ as shown in Lemma 1.

Remark 4

An efficient greedy algorithm can be used to find the maximum $\gamma$ of the information gain [23]. A similar bound is also presented in [24], but with the assumption that the unknown function $\mathbf{f}$ is a sample of the GP instead of the bounded RKHS norm required here.

C. Tracking control

For the tracking control, we consider a given desired trajectory $\mathbf{x}_d(t) : \mathbb{R}_{\geq 0} \to \mathcal{X}$, $\mathbf{x}_d \in \mathcal{C}$. The tracking error is denoted by $\mathbf{z}_1(t) = \mathbf{x}(t) - \mathbf{x}_d(t)$. Before we state the main theorem about the safe learning-based tracking control law, the feedback gain matrix $G_n$ is introduced. The matrix $G_n$ of the controller is allowed to be adapted with any update of the oracle based on a new data set $\mathcal{D}_n$ to keep the feedback gains low.

Property 1

The matrix $G_n \in \mathbb{R}^{3 \times 6}$ is chosen such that there exist a symmetric positive definite matrix $P_n \in \mathbb{R}^{6 \times 6}$ and a positive definite matrix $Q_n \in \mathbb{R}^{6 \times 6}$ which satisfy the Lyapunov equation

$$
P_n (A - B G_n) + (A - B G_n)^\top P_n = -Q_n
\tag{9}
$$

for each switch of $n(t)$.

Property 1 is satisfied if the real parts of all eigenvalues of $(A - B G_n)$ are negative. For example, this can be achieved by any $G_n = [G_{n,1}, G_{n,2}]$, where $G_{n,1}, G_{n,2} \in \mathbb{R}^{3 \times 3}$ are positive definite diagonal matrices, see [25].

Theorem 1

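Consider the underactuated rigid-body system given by (1) with unknown dynamics $\mathbf{f}$ and the existence of an oracle satisfying Assumptions 1 and 2. Let $G_{z_2}, G_{z_3} \in \mathbb{R}^{3 \times 3}$ be positive definite symmetric matrices. Then, with Property 1, the control law

$$
\begin{aligned}
\boldsymbol{\tau} &= J \big( \mathbf{e} \times ( R^\top \mathbf{g}_{\ddot{d}} - \check{\boldsymbol{\omega}}^2 \mathbf{e} u - 2 \check{\boldsymbol{\omega}} \mathbf{e} \dot{u} )\, u^{-1} \big) - J \boldsymbol{\omega} \times \boldsymbol{\omega} - \mathbf{f}_\omega(\mathbf{s}), \\
\ddot{u} &= \mathbf{e}^\top ( R^\top \mathbf{g}_{\ddot{d}} - \check{\boldsymbol{\omega}}^2 \mathbf{e} u - 2 \check{\boldsymbol{\omega}} \mathbf{e} \dot{u} ),
\end{aligned}
\tag{10}
$$

guarantees that the tracking error is uniformly ultimately bounded in probability by

$$
P\{ \| \mathbf{z}_1(t) \| \leq \max_{\mathbf{x} \in \mathcal{X}} \bar{\rho}_{n_{\text{end}}}(\mathbf{x})\, b_{n_{\text{end}}},\ \forall t \geq T \} \geq \delta
\tag{11}
$$

with constants $b_{n_{\text{end}}}, T \in \mathbb{R}_{\geq 0}$ on $\mathcal{X}$.

Remark 5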

The control law does not depend on any state derivatives, which are typically noisy in measurements. The derivatives are only necessary for the training of the oracle, see (5), which can often inherently deal with noisy data. For instance, GP models can handle additive Gaussian noise on the output data [18].
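Property 1, which Theorem 1 relies on, can be checked numerically for a given gain matrix. The sketch below is only an illustration: it takes the initial gain matrix of the numerical example (30) with $m = 1$, assumes $Q_n = I$ (a choice made here, not stated in the paper), and solves the Lyapunov equation (9) by vectorization with a hypothetical helper `solve_lyapunov`.

```python
import numpy as np

m = 1.0                                    # mass used in the numerical example
I3, Z3 = np.eye(3), np.zeros((3, 3))
A = np.block([[Z3, I3], [Z3, Z3]])         # system matrix from (3)
B = np.vstack([Z3, I3 / m])                # input matrix from (3)
G = np.hstack([np.diag([10.0, 10.0, 20.0]), 10.0 * I3])  # initial gains from (30)
Q = np.eye(6)                              # assumed choice of Q_n

def solve_lyapunov(M, Q):
    """Solve P M + M^T P = -Q for P using Kronecker products (row-major vec)."""
    n = M.shape[0]
    K = np.kron(M.T, np.eye(n)) + np.kron(np.eye(n), M.T)
    P = np.linalg.solve(K, -Q.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)                 # remove round-off asymmetry

M = A - B @ G
P = solve_lyapunov(M, Q)
eigs_M = np.linalg.eigvals(M)              # all real parts should be negative
eigs_P = np.linalg.eigvalsh(P)             # all eigenvalues should be positive
```

Because both blocks of this gain matrix are positive definite and diagonal, $A - B G_n$ is Hurwitz and the computed $P_n$ is symmetric positive definite, in line with the remark after Property 1.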

We prove the stability of the closed loop under the proposed control law with multiple Lyapunov functions, where the $n$-th function is active when the oracle predicts based on the corresponding training set $\mathcal{D}_n$. Note that, due to the finite number of switching events, the switching between stable systems cannot lead to an unbounded trajectory, see [26].

Proof:

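The term $\mathbf{g}(R, u)$ in (4) is treated as virtual control input with the desired force

$$
\mathbf{g}_d(t, n, \mathbf{x}) = m \ddot{\mathbf{p}}_d - G_n \mathbf{z}_1 - \hat{\mathbf{f}}_n(\mathbf{x}),
\tag{12}
$$

where $G_n$ can change by the switching of $n(t)$. The tracking error dynamics are given by

$$
\dot{\mathbf{z}}_1 = A \mathbf{x} + B \big( \mathbf{g}(R, u) + \hat{\mathbf{f}}_n(\mathbf{x}) + \boldsymbol{\rho}_n(\mathbf{x}) \big) - \begin{bmatrix} \dot{\mathbf{p}}_d \\ \ddot{\mathbf{p}}_d \end{bmatrix}.
\tag{13}
$$

Using the desired acceleration $\ddot{\mathbf{p}}_d$ of (12) in (13) leads to

$$
\dot{\mathbf{z}}_1 = (A - B G_n) \mathbf{z}_1 + B \big( \mathbf{g}(R, u) - \mathbf{g}_d(t, n, \mathbf{x}) + \boldsymbol{\rho}_n(\mathbf{x}) \big).
$$

In the next step, the boundedness of the tracking error $\mathbf{z}_1$ is proved. For this purpose, we use the matrices $P_n, Q_n$ of Property 1 to construct the Lyapunov function $V_{1,n}(\mathbf{z}_1) = 0.5\, \mathbf{z}_1^\top P_n \mathbf{z}_1$ and compute its evolution

$$
\dot{V}_{1,n} = -\mathbf{z}_1^\top Q_n \mathbf{z}_1 + (B^\top P_n \mathbf{z}_1)^\top \big( \mathbf{g}(R, u) - \mathbf{g}_d + \boldsymbol{\rho}_n(\mathbf{x}) \big).
$$

The first summand is negative for all nonzero $\mathbf{z}_1 \in \mathbb{R}^6$. In the next step, we extend the previous Lyapunov function with the error term $\mathbf{z}_2 \in \mathbb{R}^3$ with $\mathbf{z}_2(t, n, \mathbf{x}, R, u) = \mathbf{g}(R, u) - \mathbf{g}_d(t, n, \mathbf{x})$, which describes the error between the virtual and the desired control input. This leads to a switching Lyapunov function $V_{2,n}(\mathbf{z}_1, \mathbf{z}_2) = V_{1,n} + 0.5\, \mathbf{z}_2^\top \mathbf{z}_2 \geq 0$. The derivative of $V_{2,n}$ is

$$
\dot{V}_{2,n} = \dot{V}_{1,n} + \mathbf{z}_2^\top \Big( \dot{\mathbf{g}} - m \mathbf{p}_d^{(3)} + G_n \dot{\mathbf{z}}_1 + \dot{\hat{\mathbf{f}}}_n(\mathbf{x}) \Big),
\tag{14}
$$

where $\mathbf{p}_d^{(3)}$ denotes the third time derivative of the desired position $\mathbf{p}_d$. Following again the idea of a desired virtual input as in (12), we construct a desired value of $\dot{\mathbf{g}}$ with

$$
\mathbf{g}_{\dot{d}} = m \mathbf{p}_d^{(3)} - G_n ( \dot{\hat{\mathbf{x}}} - \dot{\mathbf{x}}_d ) - B^\top P_n \mathbf{z}_1 - G_{z_2} \mathbf{z}_2 - \frac{\partial \hat{\mathbf{f}}_n}{\partial \mathbf{x}} \dot{\hat{\mathbf{x}}}.
\tag{15}
$$

Instead of having dependencies on the typically noisy state derivative $\dot{\mathbf{x}}$, we use the estimate $\dot{\hat{\mathbf{x}}} \in \mathbb{R}^6$ given by

$$
\dot{\hat{\mathbf{x}}} = A \mathbf{x} + B \big( \mathbf{g}(R, u) + \hat{\mathbf{f}}_n(\mathbf{x}) \big),
\tag{16}
$$

which only contains the known parts of the system dynamics (4). Then, the expression (15) is used to substitute $\dot{\mathbf{g}}$ in (14). This leads to the evolution

$$
\dot{V}_{2,n} = -\mathbf{z}_1^\top Q_n \mathbf{z}_1 - \mathbf{z}_2^\top G_{z_2} \mathbf{z}_2 + \mathbf{z}_1^\top P_n B \boldsymbol{\rho}_n(\mathbf{x}) + \mathbf{z}_2^\top \bigg( \Big[ \frac{\partial \hat{\mathbf{f}}_n}{\partial \mathbf{x}} + G_n \Big] B \boldsymbol{\rho}_n(\mathbf{x}) + \dot{\mathbf{g}} - \mathbf{g}_{\dot{d}} \bigg).
\tag{17}
$$

Next, we define the error $\mathbf{z}_3 \in \mathbb{R}^3$ with

$$
\mathbf{z}_3(t, n, \mathbf{x}, R, u) = \dot{\mathbf{g}}(R, u) - \mathbf{g}_{\dot{d}}(t, n, \mathbf{x}, R, u),
\tag{18}
$$

and an extended Lyapunov function

$$
V_n(\mathbf{z}_1, \mathbf{z}_2, \mathbf{z}_3) = V_{2,n} + \tfrac{1}{2} \mathbf{z}_3^\top \mathbf{z}_3 \geq 0.
\tag{19}
$$

The derivative of $V_n$ is

$$
\dot{V}_n = \dot{V}_{2,n} + \mathbf{z}_3^\top \bigg( \ddot{\mathbf{g}} - m \mathbf{p}_d^{(4)} + G_n \ddot{\mathbf{z}}_1 + B^\top P_n \dot{\mathbf{z}}_1 + \frac{d}{dt} \Big[ \frac{\partial \hat{\mathbf{f}}_n}{\partial \mathbf{x}} \dot{\hat{\mathbf{x}}} \Big] \bigg)
$$

and we construct a desired value of $\ddot{\mathbf{g}}$ with

$$
\begin{aligned}
\mathbf{g}_{\ddot{d}} ={}& m \mathbf{p}_d^{(4)} - G_n \Big( \frac{\partial \dot{\hat{\mathbf{x}}}}{\partial \mathbf{x}} \dot{\hat{\mathbf{x}}} - \ddot{\mathbf{x}}_d \Big) - B^\top P_n ( \dot{\hat{\mathbf{x}}} - \dot{\mathbf{x}}_d ) \\
& - G_{z_2} \Big( \dot{\mathbf{g}} - m \mathbf{p}_d^{(3)} + G_n ( \dot{\hat{\mathbf{x}}} - \dot{\mathbf{x}}_d ) + \frac{\partial \hat{\mathbf{f}}_n}{\partial \mathbf{x}} \dot{\hat{\mathbf{x}}} \Big) - \frac{\partial}{\partial \mathbf{x}} \Big[ \frac{\partial \hat{\mathbf{f}}_n}{\partial \mathbf{x}} \dot{\hat{\mathbf{x}}} \Big] \dot{\hat{\mathbf{x}}} - \mathbf{z}_2 - G_{z_3} \mathbf{z}_3.
\end{aligned}
\tag{20}
$$

Then, it is substituted into $\dot{V}_n$ to obtain

$$
\begin{aligned}
\dot{V}_n ={}& -\mathbf{z}_1^\top Q_n \mathbf{z}_1 - \mathbf{z}_2^\top G_{z_2} \mathbf{z}_2 - \mathbf{z}_3^\top G_{z_3} \mathbf{z}_3 \\
& + \big( \mathbf{z}_1^\top P_n + \mathbf{z}_2^\top D(\mathbf{x}) + \mathbf{z}_3^\top E(\mathbf{x}) \big) B \boldsymbol{\rho}_n(\mathbf{x}) + \mathbf{z}_3^\top ( \ddot{\mathbf{g}} - \mathbf{g}_{\ddot{d}} )
\end{aligned}
\tag{21}
$$

$$
D(\mathbf{x}) := \frac{\partial \hat{\mathbf{f}}_n}{\partial \mathbf{x}} + G_n
\tag{22}
$$

$$
E(\mathbf{x}) := B^\top P_n + G_{z_2} D + G_n \frac{\partial \dot{\hat{\mathbf{x}}}}{\partial \mathbf{x}} + \frac{\partial}{\partial \mathbf{x}} \Big( \frac{\partial \hat{\mathbf{f}}_n}{\partial \mathbf{x}} \dot{\hat{\mathbf{x}}} \Big).
\tag{23}
$$

To eliminate the last summand in (21), we note that

$$
\ddot{\mathbf{g}}(R, u) = R \big( \check{\boldsymbol{\omega}}^2 \mathbf{e} u + 2 \check{\boldsymbol{\omega}} \mathbf{e} \dot{u} + \dot{\boldsymbol{\omega}} \times \mathbf{e} u + \mathbf{e} \ddot{u} \big),
\tag{24}
$$

such that $\ddot{\mathbf{g}} - \mathbf{g}_{\ddot{d}} = 0$ for

$$
\dot{\boldsymbol{\omega}} \times \mathbf{e} u + \mathbf{e} \ddot{u} = R^\top \mathbf{g}_{\ddot{d}} - \check{\boldsymbol{\omega}}^2 \mathbf{e} u - 2 \check{\boldsymbol{\omega}} \mathbf{e} \dot{u}.
\tag{25}
$$

Using (25) and Assumption 1, the evolution of the Lyapunov function $V_n$ can be upper bounded by

$$
P\Big\{ \dot{V}_n \leq -\mathbf{z}_1^\top Q_n \mathbf{z}_1 - \mathbf{z}_2^\top G_{z_2} \mathbf{z}_2 - \mathbf{z}_3^\top G_{z_3} \mathbf{z}_3 + \big\| ( \mathbf{z}_1^\top P_n + \mathbf{z}_2^\top \bar{D} + \mathbf{z}_3^\top \bar{E} ) B \big\| \, \bar{\rho}_n(\mathbf{x}) \Big\} \geq \delta,
\tag{26}
$$

with the upper bounds $\bar{D} \in \mathbb{R}^{3 \times 6}$ and $\bar{E} \in \mathbb{R}^{3 \times 6}$, which exist due to Assumption 1. Thus, the evolution is negative with probability $\delta$ for all $\mathbf{z} = [\mathbf{z}_1^\top, \mathbf{z}_2^\top, \mathbf{z}_3^\top]^\top$ with

$$
\| \mathbf{z} \| > \max_{\mathbf{x} \in \mathcal{X}} \bar{\rho}_n(\mathbf{x}) \underbrace{ \frac{ \| P_n B \| + \| \bar{D} B \| + \| \bar{E} B \| }{ \min \{ \operatorname{eig}(Q_n), \operatorname{eig}(G_{z_2}), \operatorname{eig}(G_{z_3}) \} } }_{=: \lambda_n},
\tag{27}
$$

where a maximum of $\bar{\rho}_n$ exists according to Assumption 1. Finally, the Lyapunov function (19) is lower and upper bounded by $\alpha_1(\| \mathbf{z} \|) \leq V_n(\mathbf{z}) \leq \alpha_2(\| \mathbf{z} \|)$, where $\alpha_1(r) = 0.5 \min \{ \operatorname{eig}(P_n), 1 \} r^2$ and $\alpha_2(r) = 0.5 \max \{ \operatorname{eig}(P_n), 1 \} r^2$. Thus, we can compute the radius $b_n \in \mathbb{R}_{\geq 0}$ of the bound by

$$
b_n = \max_{\mathbf{x} \in \mathcal{X}} \bar{\rho}_n(\mathbf{x})\, \lambda_n \sqrt{ \frac{ \max \{ \operatorname{eig}(P_n), 1 \} }{ \min \{ \operatorname{eig}(P_n), 1 \} } }.
\tag{28}
$$

Since Assumption 2 only allows a finite number of switches, there exists a time $T \in \mathbb{R}_{\geq 0}$ such that $n(t) = n_{\text{end}} \in \mathbb{N}$ for all $t \geq T$. Thus, $P\{ \| \mathbf{z}(t) \| \leq b_{n_{\text{end}}},\ \forall t \geq T \} \geq \delta$.

Remark 6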

An extension to rotation tracking can be carried out analogously, with additional terms in the Lyapunov function as given in [6], [8].
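For reference, the expression (24) used in the proof of Theorem 1 can be recovered by differentiating the virtual input $\mathbf{g}(R, u) = R \mathbf{e} u$ twice along $\dot{R} = R \check{\boldsymbol{\omega}}$. The short derivation below is a sketch that only uses (1) and the identity $\dot{\check{\boldsymbol{\omega}}} \mathbf{e} = \dot{\boldsymbol{\omega}} \times \mathbf{e}$, with $\mathbf{e}$ constant in the body frame.

```latex
\begin{align*}
\dot{\mathbf{g}} &= \dot{R}\,\mathbf{e}\,u + R\,\mathbf{e}\,\dot{u}
   = R\big(\check{\boldsymbol{\omega}}\,\mathbf{e}\,u + \mathbf{e}\,\dot{u}\big), \\
\ddot{\mathbf{g}} &= \dot{R}\big(\check{\boldsymbol{\omega}}\,\mathbf{e}\,u + \mathbf{e}\,\dot{u}\big)
   + R\big(\dot{\check{\boldsymbol{\omega}}}\,\mathbf{e}\,u
   + \check{\boldsymbol{\omega}}\,\mathbf{e}\,\dot{u} + \mathbf{e}\,\ddot{u}\big) \\
 &= R\big(\check{\boldsymbol{\omega}}^{2}\,\mathbf{e}\,u
   + 2\,\check{\boldsymbol{\omega}}\,\mathbf{e}\,\dot{u}
   + \dot{\boldsymbol{\omega}} \times \mathbf{e}\,u + \mathbf{e}\,\ddot{u}\big).
\end{align*}
```

Since $\mathbf{e}^\top (\dot{\boldsymbol{\omega}} \times \mathbf{e}) = 0$ and $\mathbf{e}^\top \mathbf{e} = 1$, left-multiplying (25) by $\mathbf{e}^\top$ isolates $\ddot{u}$, which gives the second line of the control law (10). Taking the cross product of (25) with $\mathbf{e}$ from the left instead removes the $\mathbf{e} \ddot{u}$ term and leaves the component of $\dot{\boldsymbol{\omega}}$ orthogonal to $\mathbf{e}$, which appears in the torque in (10).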

Remark 7

The proposed approach allows multiple ways of data collection and adaptation of the feedback matrix $G_n$. A possible strategy is time-triggered, where new data points are recurrently added to the data set $\mathcal{D}_n$ to improve the prediction accuracy of the oracle and the magnitude of $G_n$ is decreased over time. More advanced strategies are collection and adaptation based on model uncertainty or tracking error, as shown in [11].

The proof shows that the bound of the tracking error (28) depends on the prediction error $\bar{\rho}_n$ of the oracle. Depending on prior knowledge about the unknown function $\mathbf{f}$ and the oracle used, the prediction error can vanish, which leads to asymptotic stability of the tracking error. In order to achieve this, we introduce the following assumption.

Assumption 4

Let there exist a data set $\mathcal{D}_{n_{\text{end}}}$ such that the model error satisfies $P\{ \| \mathbf{f}(\mathbf{x}) - \hat{\mathbf{f}}_{n_{\text{end}}}(\mathbf{x}) \| = 0 \} \geq \delta$.

Simply speaking, the oracle must be able to reproduce the unknown function $\mathbf{f}$ with a certain probability without any prediction error. Even though this seems to be a strong assumption, such oracles exist if additional prior knowledge about the unknown function $\mathbf{f}$ is available.

Remark 8

With a GP model as oracle, Assumption 4 is satisfied if the posterior variance $\Sigma(\hat{\mathbf{f}}_n(\mathbf{x}) \mid \mathbf{x}, \mathcal{D}_{n_{\text{end}}})$ is zero on $\mathcal{X}$, as shown in Lemma 1, for the data set $\mathcal{D}_{n_{\text{end}}}$ of the switching sequence. If the kernel function of the GP has a finite-dimensional feature space, the posterior variance vanishes for a finite number of distinct, noise-free data points, see [27]. A finite-dimensional feature space is given, for instance, by the linear or the polynomial kernel.

With the additional Assumption 4, asymptotic stability of the tracking error is guaranteed, which is stated formally in Corollary 1.
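The statement in Remark 8 about finite-dimensional feature spaces can be illustrated with the linear kernel $k(\mathbf{x}, \mathbf{x}') = \mathbf{x}^\top \mathbf{x}'$. The sketch below assumes noise-free data ($\sigma = 0$) and row-wise storage of the training inputs, and uses a hypothetical helper `posterior_variance`. Because the Gram matrix is rank deficient once more than six points are collected, a least-squares solve replaces the inverse in (7).

```python
import numpy as np

def posterior_variance(x_star, X):
    """Posterior variance of (7) for the linear kernel with noise-free data.

    X: (N, 6) training inputs, x_star: (6,) test input.
    """
    K = X @ X.T                # Gram matrix of the linear kernel (sigma = 0)
    k = X @ x_star             # k(x*, X)
    # Minimum-norm solution of K a = k; K has rank at most 6.
    a, *_ = np.linalg.lstsq(K, k, rcond=1e-10)
    return float(x_star @ x_star - k @ a)

rng = np.random.default_rng(2)
X_train = rng.normal(size=(8, 6))    # hypothetical training inputs
x_test = rng.normal(size=6)
var_full = posterior_variance(x_test, X_train)      # 8 points span R^6
var_few = posterior_variance(x_test, X_train[:3])   # 3 points do not
```

With at least six linearly independent inputs the variance is zero up to round-off at every test point, whereas with only three inputs it stays positive in directions the data does not cover.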

Corollary 1

Consider the underactuated rigid-body system given by (1) with unknown dynamics $\mathbf{f}$ and the existence of an oracle satisfying Assumptions 1, 2 and 4 and Property 1. Then, the control law (10) renders the tracking error $\mathbf{z}$ asymptotically stable on $\mathcal{X}$ with probability $\delta$.

Proof: Using the result about the upper bound of the Lyapunov derivative given by (26) with the additional Assumption 4, i.e. $\bar{\rho}_n(\mathbf{x}) = 0$, leads to a negative $\dot{V}_n$ on $\mathcal{X}$ with probability $\delta$.

IV. NUMERICAL EXAMPLE

In this section, we present a numerical example of a quadrocopter within an a priori unknown wind field. The dynamics of the quadrocopter are described by (1) with mass $m = 1\,\text{kg}$, inertia $J = I\,\text{kg\,m}^2$ and the direction $\mathbf{e} = [0, 0, 1]^\top$ of the force input $u$. As unknown dynamics $\mathbf{f}$, we consider an arbitrarily chosen wind field with flow in z-direction and the gravity force given by

$$
\mathbf{f}(\mathbf{x}) = \big[\, 0,\ 0,\ \sin(x) + \exp(-x) - \ldots \,\big]^\top.
\tag{29}
$$

The rotational force $\mathbf{f}_\omega$ in (1) is assumed to be zero. A GP model is used as oracle to predict the z-component of $\mathbf{f}(\mathbf{x})$ with the squared exponential kernel, see [18]. The prior knowledge about the existing gravity is included as an estimate in the mean function of the GP with $m(\mathbf{x}) = -\ldots$. At starting time $t = 0$, the data set $\mathcal{D}_n$ is empty such that the prediction is solely based on the mean function. In this example, we employ an online learning approach which collects a new training point every 0.2 s such that the total number of training points is $N = 5n$. In Fig. 2, the first segment of the desired (dashed) and the actual trajectory (solid) is shown. The crosses denote the collected training data. Each training point consists of the actual state $\mathbf{x}$ and $m \ddot{\mathbf{p}} - R \mathbf{e} u$ as given by (5). Since the training point depends on the typically noisy measurement of the acceleration $\ddot{\mathbf{p}}$, Gaussian distributed noise $\mathcal{N}(0, \ldots)$ is added to the measurement. The GP model is updated every 1 s until $t = 12\,\text{s}$, where the last 5 collected training points are appended to the set $\mathcal{D}_n$ and the hyperparameters are optimized by means of the likelihood function, see [19]. Thus, the function $n$ is the integer part of $t$ up to $t = 12\,\text{s}$, given by $n(t) = \min(12, \lfloor t \rfloor)$ with $t$ in seconds. The initial feedback gain matrix is set to

$$
G_{n=0} =
\begin{bmatrix}
10 & 0 & 0 & 10 & 0 & 0 \\
0 & 10 & 0 & 0 & 10 & 0 \\
0 & 0 & 20 & 0 & 0 & 10
\end{bmatrix}
\tag{30}
$$

and $G_{z_2} = G_{z_3} = 2I$. In this example, we adapt the feedback gain matrix based on the number of training points. When the GP model is updated with new training data, the feedback gains are decreased by $G_n = (0.\ldots)^n\, G_{n=0}$. Thus, after the first update, the feedback gains are … of the initial gains, see Fig. 4. The simulation time is 15 s.

Fig. 2. A segment of the desired and actual trajectory (position [m] over time [s]; desired, actual, training data). Every 0.2 s a training point is recorded. Every 1 s (black line) the oracle is updated based on all collected training points $N$ up to this point. The additional training data allows the model to be refined such that the tracking error decreases.

Fig. 3. Actual trajectory converges to desired trajectory (x, y and z position [m]; desired, actual).

Fig. 4. Top: Lyapunov function $V_n$ converges to a tight set around zero. The jumps occur when the oracle is updated. Bottom: Norm of the feedback gain matrix is decreasing.

Figure 3 visualizes that the actual position (solid) of the quadrocopter converges to a very tight set around the desired position (dashed). The effects of the switching to the updated GP model are more noticeable in the evolution of the Lyapunov function in Fig. 4. The function might increase after an update of the GP model due to the change of $G_n$ and the new prediction accuracy of the GP model. However, the function converges to a bounded set as proposed in Theorem 1 after the finite number of switching events.

CONCLUSION

We present a safe online learning-based tracking control law for a class of underactuated systems with unknown dynamics typical for aerial and underwater vehicles. Using various types of oracles, the tracking error is proven to be bounded in probability and the size of the bound is explicitly given. Furthermore, additional assumptions lead to asymptotic stability. Even though no particular oracle is assumed, we show that Gaussian process models fulfill all requirements to be used as oracle in the proposed control setting. Finally, a numerical example visualizes the effectiveness of the control law.

REFERENCES

[1] M. Reyhanoglu, A. van der Schaft, N. H. McClamroch, and I. Kolmanovsky, “Dynamics and control of a class of underactuated mechanical systems,” IEEE Transactions on Automatic Control, vol. 44, no. 9, pp. 1663–1671, 1999.
[2] S. A. Al-Hiddabi, “Quadrotor control using feedback linearization with dynamic extension,” in , pp. 1–3, IEEE, 2009.
[3] D. Lee, H. J. Kim, and S. Sastry, “Feedback linearization vs. adaptive sliding mode control for a quadrotor helicopter,” International Journal of Control, Automation and Systems, vol. 7, no. 3, pp. 419–428, 2009.
[4] S. Bouabdallah and R. Siegwart, “Backstepping and sliding-mode techniques applied to an indoor micro quadrotor,” in Proc. of the International Conference on Robotics and Automation, pp. 2247–2252, IEEE, 2005.
[5] G. V. Raffo, M. G. Ortega, and F. R. Rubio, “Backstepping/nonlinear H∞ control for path tracking of a quadrotor unmanned aerial vehicle,” in Proc. of the American Control Conference, pp. 3356–3361, IEEE, 2008.
[6] E. Frazzoli, M. A. Dahleh, and E. Feron, “Trajectory tracking control design for autonomous helicopters using a backstepping algorithm,” in Proc. of the American Control Conference, pp. 4102–4107, IEEE, 2000.
[7] Z.-S. Hou and Z. Wang, “From model-based control to data-driven control: Survey, classification and perspective,” Information Sciences, vol. 235, pp. 3–35, 2013.
[8] R. Mahony and T. Hamel, “Robust trajectory tracking for a scale model autonomous helicopter,” International Journal of Robust and Nonlinear Control: IFAC-Affiliated Journal, vol. 14, no. 12, pp. 1035–1059, 2004.
[9] I.-H. Choi and H.-C. Bang, “Adaptive command filtered backstepping tracking controller design for quadrotor unmanned aerial vehicle,” Proc. of the Institution of Mechanical Engineers, Part G: Journal of Aerospace Engineering, vol. 226, no. 5, pp. 483–497, 2012.
[10] M. Kobilarov, “Trajectory tracking of a class of underactuated systems with external disturbances,” in , pp. 1044–1049, IEEE, 2013.
[11] J. Umlauft and S. Hirche, “Feedback linearization based on Gaussian processes with event-triggered online learning,” IEEE Transactions on Automatic Control, 2020. doi: 10.1109/tac.2019.2958840.
[12] M. Greeff and A. P. Schoellig, “Exploiting differential flatness for robust learning-based tracking control using Gaussian processes,” IEEE Control Systems Letters, vol. 5, no. 4, pp. 1121–1126, 2021.
[13] A. Capone and S. Hirche, “Backstepping for partially unknown nonlinear systems using Gaussian processes,” IEEE Control Systems Letters, vol. 3, no. 2, pp. 416–421, 2019.
[14] T. Beckers, D. Kulić, and S. Hirche, “Stable Gaussian process based tracking control of Euler-Lagrange systems,” Automatica, vol. 103, pp. 390–397, 2019.
[15] M. K. Helwa, A. Heins, and A. P. Schoellig, “Provably robust learning-based approach for high-accuracy tracking control of Lagrangian systems,” IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 1587–1594, 2019.
[16] F. Berkenkamp, A. P. Schoellig, and A. Krause, “Safe controller optimization for quadrotors with Gaussian processes,” in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), pp. 491–496, May 2016.
[17] F. Scarselli and A. C. Tsoi, “Universal approximation using feedforward neural networks: A survey of some existing methods, and some new results,” Neural Networks, vol. 11, no. 1, pp. 15–37, 1998.
[18] C. E. Rasmussen and C. K. Williams, Gaussian processes for machine learning, vol. 1. MIT Press, Cambridge, 2006.
[19] I. Steinwart and A. Christmann, Support vector machines. Springer Science & Business Media, 2008.
[20] K. J. Åström and P. Eykhoff, “System identification - a survey,” Automatica, vol. 7, no. 2, pp. 123–162, 1971.
[21] T. Beckers and S. Hirche, “Stability of Gaussian process state space models,” in Proc. of the European Control Conference, 2016.
[22] G. Wahba, Spline models for observational data. SIAM, 1990.
[23] N. Srinivas, A. Krause, S. M. Kakade, and M. W. Seeger, “Information-theoretic regret bounds for Gaussian process optimization in the bandit setting,” IEEE Transactions on Information Theory, vol. 58, no. 5, pp. 3250–3265, 2012.
[24] A. Lederer, J. Umlauft, and S. Hirche, “Uniform error bounds for Gaussian process regression with application to safe control,” in Conference on Neural Information Processing Systems, 2019.
[25] H. L. Trentelman, A. A. Stoorvogel, and M. Hautus, Control theory for linear systems. Springer Science & Business Media, 2012.
[26] D. Liberzon and A. S. Morse, “Basic problems in stability and design of switched systems,”