Adaptive Estimation for Nonlinear Systems using Reproducing Kernel Hilbert Spaces
Parag Bobade, Suprotim Majumdar, Savio Pereira, Andrew J. Kurdila, John B. Ferris
Abstract
This paper extends a conventional, general framework for online adaptive estimation problems for systems governed by unknown nonlinear ordinary differential equations. The central feature of the theory introduced in this paper represents the unknown function as a member of a reproducing kernel Hilbert space (RKHS) and defines a distributed parameter system (DPS) that governs state estimates and estimates of the unknown function. This paper 1) derives sufficient conditions for the existence and stability of the infinite dimensional online estimation problem, 2) derives sufficient conditions for the existence and stability of finite dimensional approximations of the infinite dimensional online estimates, and 3) determines sufficient conditions for the convergence of the finite dimensional approximations to the infinite dimensional online estimates. A new condition for persistency of excitation in a RKHS in terms of its evaluation functionals is introduced in the paper that enables proof of convergence of the finite dimensional approximations of the unknown function in the RKHS. This paper studies two particular choices of the RKHS: those that are generated by exponential functions and those that are generated by multiscale kernels defined from a multiresolution analysis.
Keywords:
Adaptive Estimation, Reproducing Kernel Hilbert Spaces, Distributed Parameter Systems.
1. Introduction
There has been a steep rise of interest in the last decade among researchers in academia and the commercial sector in autonomous vehicles and self-driving cars. Although adaptive estimation has been studied for some time, applications such as terrain or road mapping continue to challenge researchers to further develop the underlying theory and algorithms in this field. These vehicles are required to sense the environment and navigate the surrounding terrain without any human intervention. The environmental sensing capability of such vehicles must allow them to navigate off-road conditions or to respond to other agents in urban settings. As a key ingredient to achieve these goals, it can be critical to have good a priori knowledge of the surrounding environment as well as the position and orientation of the vehicle in the environment. To collect this data for the construction of terrain maps, mobile vehicles equipped with multiple high bandwidth, high resolution imaging sensors are deployed. The mapping sensors
retrieve the terrain data relative to the vehicle, and navigation sensors provide georeferencing relative to a fixed coordinate system. The geospatial data, which can include the digital terrain maps acquired from these mobile mapping systems, find applications in emergency response planning and road surface monitoring. Further, to improve the ride and handling characteristics of an autonomous vehicle, it might be necessary that these digital terrain maps have accuracy on a sub-centimeter scale.

One of the main areas of improvement in current state of the art terrain modeling technologies is localization. Since localization relies heavily on the quality of GPS/GNSS and IMU data, it is important to develop novel approaches that can fuse the data from multiple sensors to generate the best possible estimate of the environment. Contemporary data acquisition systems used to map the environment generate scattered data sets in time and space. These data sets must be either post-processed or processed online for construction of three dimensional terrain maps.

Fig. 1 and Fig. 2 depict a map building vehicle and trailer developed by some of the authors at Virginia Tech. The system generates experimental observations in the form of data that is scattered in time and space. These data sets have extremely high dimensionality. Roughly 180 million scattered data points are collected per minute of data acquisition, which corresponds to a data file of roughly O(1 GB) in size. Current algorithms and software developed in-house post-process the scattered data to generate road and terrain maps. This offline batch computing problem can take many days of computing time to complete. It remains a challenging task to derive a theory and associated algorithms that would enable adaptive or online estimation of terrain maps from such high dimensional, scattered measurements.

This paper introduces a novel theory and associated algorithms that are amenable to observations that take the form of scattered data. The key attribute of the approach is that the unknown function representing the terrain is viewed as an element of a RKHS. The RKHS is constructed in terms of a kernel function $k(\cdot,\cdot):\Omega\times\Omega\to\mathbb{R}$, where $\Omega\subseteq\mathbb{R}^d$ is the domain over which scattered measurements are made. The kernel $k$ can often be used to define a collection of radial basis functions (RBFs) $k_x(\cdot) := k(x,\cdot)$, each of which is said to be centered at some point $x\in\Omega$. For example, these RBFs might be exponentials, wavelets, or thin plate splines [1]. By embedding the unknown function that represents the terrain in a RKHS, the new formulation generates a system that constitutes a distributed parameter system. The unknown function, representing the terrain map, is the infinite dimensional distributed parameter. Although the study of infinite dimensional distributed parameter systems can be substantially more difficult than the study of ODEs, a key result is that stability and convergence of the approach can be established succinctly in many cases. Much of the complexity [2, 3] associated with construction of Gelfand triples or the analysis of infinitesimal generators and semigroups that define a DPS can be avoided for many examples of the systems in this paper. The kernel $k(\cdot,\cdot):\Omega\times\Omega\to\mathbb{R}$ that defines the RKHS provides a natural collection of bases for approximate estimates of the solution that are based directly on some subset of scattered measurements $\{x_i\}_{i=1}^n \subset \mathbb{R}^d$.
It is typical in applications to select the centers $\{x_i\}_{i=1}^n$ that locate the basis functions from some sub-sample of the locations at which the scattered data is measured. Thus, while we do not study the nuances of such methods, the formulation in this paper provides a natural framework to pose so-called "basis adaptive methods" such as in [4] and the references therein.

While our formulation is motivated by this particular application, it is a general construction for framing and generalizing some conventional approaches for online adaptive estimation. This framework introduces sufficient conditions that guarantee convergence of estimates in the spatial domain $\Omega$ to the unknown function $f$. In contrast, nearly all conventional strategies consider stability and convergence in time alone for some fixed finite dimensional space of $\mathbb{R}^d \times \mathbb{R}^n$, with $n$ the number of parameters used to represent the estimate. The remainder of this paper studies the existence and uniqueness of solutions, stability, and convergence of approximate solutions for the infinite dimensional adaptive estimation problem defined over an RKHS. The paper concludes with an example of an RKHS adaptive estimation problem for a simple model of map building from vehicles. The numerical example demonstrates the rate of convergence for finite dimensional models constructed from RBF bases that are centered at a subset of scattered observations.

Figure 1: Vehicle Terrain Measurement System, Virginia Tech
Figure 2: Experimental Setup with LMI 3D GO-Locator Lasers

The general theory derived in this paper has been motivated in part by the terrain mapping application discussed in Section 1, but also by recent research in a number of fields related to estimation of nonlinear functions. In this section we briefly review some of the recent research in probabilistic or Bayesian mapping methods, nonlinear approximation and learning theory, statistics, and nonlinear regression.
Many popular techniques adopt a probabilistic approach towards solving the localization and mapping problem in robotics. The algorithms used to solve this problem fundamentally rely on Bayesian estimation techniques like particle filters, Kalman filters, and other variants of these methods [5, 6, 7]. The computational effort required to implement these algorithms can be substantial since they involve constructing and updating maps while simultaneously tracking the relative locations of agents with respect to the environment. Over the last three decades significant progress has been made on various frontiers in terms of high-end sensing capabilities, faster data processing hardware, and robust and efficient computational algorithms [8, 9]. However, the usual Kalman filter based approaches implemented in these applications often are required to address the inconsistency problem in estimation that arises from uncertainties in state estimates [10, 11]. Furthermore, it is well acknowledged in the community that these methods suffer from the major drawback of 'closing the loop'. This refers to the ability to adaptively update the information when a location is revisited; such a capability for updating information demands huge memory to store the high resolution and high bandwidth data. Moreover, it is highly nontrivial to guarantee that the uncertainties in estimates converge to a lower bound at suboptimal rates, since matching these rates and bounds significantly constrains the evolution of states along infeasible trajectories. While probabilistic methods, and in particular Bayesian estimation techniques, for the construction of terrain maps have flourished over the past few decades, relatively few approaches for establishing deterministic theoretical error bounds in the spatial domain of the unknown function representing the terrain have appeared.
Approximation theory has a long history, but the subtopics of most relevance to this paper include recent studies in multiresolution analysis (MRA), radial basis function (RBF) approximation, and learning theory. The study of MRA techniques became popular in the late 1980s and early 1990s, and it has flourished since that time. We use only a small part of the general theory of MRAs in this paper, and we urge the interested reader to consult one of the excellent treatises on this topic for a full account. References [12, 13, 14, 15] are good examples of such detailed treatments. We briefly summarize the pertinent aspects of MRA here and in Section 2.1. A multiresolution analysis defines a family of nested approximation spaces $\{H_j\}_{j\in\mathbb{N}} \subseteq H$ of an abstract space $H$ in terms of a single function $\phi$, the scaling function. The approximation space $H_j$ is defined in terms of bases that are constructed from the dilates and translates $\{\phi_{j,k}\}_{k\in\mathbb{Z}^d}$, with $\phi_{j,k}(x) := 2^{jd/2}\phi(2^j x - k)$ for $x \in \mathbb{R}^d$, of this single function $\phi$. It is for this reason that these spaces are sometimes referred to as shift invariant spaces. While the MRA is ordinarily defined only in terms of the scaling functions, the theory provides a rich set of tools to derive bases $\{\psi_{j,k}\}_{k\in\mathbb{Z}}$, or wavelets, for the complement spaces $W_j$ satisfying $V_{j+1} = V_j \oplus W_j$. Our interest in multiresolution analysis arises since these methods can be used to develop multiscale kernels for a RKHS, as summarized in [16, 17]. We only consider approximation spaces defined in terms of the scaling functions in this paper. Specifically, with a parameter $s \in \mathbb{R}^+$ measuring smoothness, we use $s$-regular MRAs to define admissible kernels for the reproducing kernel Hilbert spaces that embody the online and adaptive estimation strategies in this paper. When the MRA bases are smooth enough, the RKHS kernels derived from a MRA can be shown to be equivalent to a scale of Sobolev spaces having well documented approximation properties. The B-spline bases in the numerical examples yield RKHS embeddings with good condition numbers. The details of the RKHS embedding strategy given in terms of wavelet bases associated with an MRA are treated in a forthcoming paper.

The methodology defined in this paper for online adaptive estimation can be viewed as similar in philosophy to recent efforts that synthesize learning theory and approximation theory [18, 19, 20, 21]. In these references, independent and identically distributed observations of some unknown function are collected, and they are used to define an estimator of that unknown function. Sharp estimates of error, guaranteed to hold in probability spaces, are possible using tools familiar from learning theory and thresholding in approximation spaces. The approximation spaces are usually defined in terms of subspaces of an MRA. However, there are a few key differences between these efforts in nonlinear regression and learning theory and this paper. The learning theory approaches to estimation of the unknown function depend on observations of the function itself. In contrast, the adaptive online estimation framework here assumes that observations are made of the estimator states, not directly of the unknown function itself. The learning theory methods also assume a discrete measurement process, instead of the continuous measurement process that characterizes online adaptive estimation. On the other hand, the methods based on learning theory derive sharp function space rates of convergence of the estimates of the unknown function.
Such estimates are not available in conventional online adaptive estimation methods. Typically, convergence in adaptive estimation strategies is guaranteed in time in a fixed finite dimensional space. One of the significant contributions of this paper is to construct sharp convergence rates in function spaces, similar to approaches in learning theory, of the unknown function using online adaptive estimation.
Since the approach in this paper generalizes a standard strategy in online adaptive estimation and control theory, we review this class of methods in some detail. This summary will be crucial in understanding the nuances of the proposed technique and in contrasting the sharp estimates of error available in the new strategy to those in the conventional approach. Many popular textbooks study online or adaptive estimation within the context of adaptive control theory for systems governed by ordinary differential equations [22, 23, 24]. The theory has been extended in several directions, each with its subtle assumptions and associated analyses. Adaptive estimation and control theory has been refined for decades, and significant progress has been made in deriving convergent estimation and stable control strategies that are robust with respect to some classes of uncertainty. The efforts in [2, 3] are relevant to this paper, where the authors generalize some adaptive estimation and model reference adaptive control (MRAC) strategies for ODEs so that they apply to deterministic infinite dimensional evolution systems. In addition, [25, 26, 27, 28] also investigate adaptive control and estimation problems under various assumptions for classes of stochastic and infinite dimensional systems. Recent developments in $\mathcal{L}_1$ control theory as presented in [29], for example, utilize adaptive estimation and control strategies in obtaining stability and convergence for systems generated by collections of nonlinear ODEs.

To motivate this paper, we consider a model problem in which the plant dynamics are generated by the nonlinear ordinary differential equation
$$\dot{x}(t) = Ax(t) + Bf(x(t)), \qquad x(0) = x_0, \qquad (1.1)$$
with state $x(t) \in \mathbb{R}^d$, the known Hurwitz system matrix $A \in \mathbb{R}^{d\times d}$, the known control influence matrix $B \in \mathbb{R}^d$, and the unknown function $f: \mathbb{R}^d \to \mathbb{R}$. Although this model problem is an exceedingly simple prototypical example studied in adaptive estimation and control of ODEs [22, 23, 24], it has proven to be an effective case study in motivating alternative formulations such as in [29] and will suffice to motivate the current approach. Of course, much more general plants are treated in standard methods [22, 23, 24, 30] and can be attacked using the strategy that follows. This structurally simple problem is chosen so as to clearly illustrate the essential constructions of the RKHS embedding method while omitting the nuances associated with general plants.

A typical adaptive estimation problem can often be formulated in terms of an estimator equation and a learning law. One of the simplest estimators for this model problem takes the form
$$\dot{\hat{x}}(t) = A\hat{x}(t) + B\hat{f}(t, x(t)), \qquad \hat{x}(0) = \hat{x}_0, \qquad (1.2)$$
where $\hat{x}(t)$ is an estimate of the state $x(t)$ and $\hat{f}(t, x(t))$ is a time varying estimate of the unknown function $f$ that depends on measurement of the state $x(t)$ of the plant at time $t$. When the state error $\tilde{x} := x - \hat{x}$ and function estimate error $\tilde{f} := f - \hat{f}$ are defined, the state error equation is simply
$$\dot{\tilde{x}}(t) = A\tilde{x}(t) + B\tilde{f}(t, x(t)), \qquad \tilde{x}(0) = \tilde{x}_0. \qquad (1.3)$$
The goal of adaptive or online estimation is to determine a learning law that governs the evolution of the function estimate $\hat{f}$ and guarantees that the state estimate $\hat{x}$ converges to the true state $x$, that is, $\tilde{x}(t) = x(t) - \hat{x}(t) \to 0$ as $t \to \infty$. Perhaps additionally, it is hoped that the function estimates $\hat{f}$ converge to the unknown function $f$, that is, $\tilde{f}(t) = f - \hat{f}(t) \to 0$ as $t \to \infty$.
The form of the estimate $\hat{f}$ depends intrinsically on what specific information is available about the unknown function $f$. It is most often the case for ODEs that the estimate $\hat{f}$ depends on a finite set of unknown parameters $\hat{\alpha}_1, \ldots, \hat{\alpha}_n$. The learning law is then expressed as an evolution law for the parameters $\hat{\alpha}_i$, $i = 1, \ldots, n$. The discussion that follows emphasizes that this is a very specific underlying assumption regarding the information available about the unknown function $f$. Much more general prior assumptions are possible.

The adaptive estimation task seeks to construct a learning law based on the knowledge that is available regarding the function $f$. Different methods for solving this problem have been developed depending on the type of information available about the unknown function $f$. The uncertainty about $f$ is often described as forming a continuum between structured and unstructured uncertainty. In the most general case, we might know that $f$ lies in some compact set $C$ of a particular Hilbert space of functions $H$ over a subset $\Omega \subseteq \mathbb{R}^d$. This case, which reflects in some sense the least information regarding the unknown function, can be expressed as the condition that $f \in \{g \in C \mid C \subset H\}$, for some compact set of functions $C$ in a Hilbert space of functions $H$. In approximation theory, learning theory, or non-parametric estimation problems this information is sometimes referred to as the prior, and choices of $H$ are commonly known as the hypothesis space. The selection of the hypothesis space $H$ and set $C$ often reflect the approximation, smoothness, or compactness properties of the unknown function [18]. This example may in some sense utilize only limited or minimal information regarding the unknown function $f$, and we may refer to the uncertainty as unstructured. Numerous variants of conventional adaptive estimation admit additional knowledge about the unknown function. In most conventional cases the unknown function $f$ is assumed to be given in terms of some fixed set of parameters. This situation is similar in philosophy to problems of parametric estimation, which restrict approximants to classes of functions that admit representation in terms of a specific set of parameters. Suppose the finite dimensional basis $\{\phi_k\}_{k=1,\ldots,n}$ is known for a particular finite dimensional subspace $H_n \subseteq H$ in which the function lies, and further that the uncertainty is expressed as the condition that there is a unique set of unknown coefficients $\{\alpha_i^*\}_{i=1,\ldots,n}$ such that $f := f^* = \sum_{i=1,\ldots,n} \alpha_i^* \phi_i \in H_n$. Consequently, conventional approaches may restrict the adaptive estimation technique to construct an estimate with knowledge that $f$ lies in the set
$$f \in \left\{ g \in H_n \subseteq H \;\middle|\; g = \sum_{i=1,\ldots,n} \alpha_i \phi_i \text{ with } \alpha_i \in [a_i, b_i] \subset \mathbb{R} \text{ for } i = 1, \ldots, n \right\}. \qquad (1.4)$$
This is an example where the uncertainty in the estimation problem may be said to be structured. The unknown function is parameterized by the collection of coefficients $\{\alpha_i^*\}_{i=1,\ldots,n}$. In this case the compact set $C$ is a subset of $H_n$. As we discuss in Sections 1.3, 2, and 3, the RKHS embedding approach can be characterized by the fact that the uncertainty is more general and even unstructured, in contrast to conventional methods.

Adaptive Estimation in $\mathbb{R}^d \times \mathbb{R}^n$

The development of adaptive estimation strategies when the uncertainty takes the form in 1.4 represents, in some sense, an iconic approach in the adaptive estimation and control community.
Entire volumes [22, 23, 24, 31] contain numerous variants of strategies that can be applied to solve adaptive estimation problems in which the uncertainty takes the form in 1.4. One canonical approach to such an adaptive estimation problem is governed by three coupled equations: the plant dynamics 1.5, the estimator equation 1.6, and the learning rule 1.7. We organize the basis functions as $\phi := [\phi_1, \ldots, \phi_n]^T$ and the parameters as $\alpha^{*T} = [\alpha_1^*, \ldots, \alpha_n^*]$, $\hat{\alpha}^T = [\hat{\alpha}_1, \ldots, \hat{\alpha}_n]$. A common gradient based learning law yields the governing equations that incorporate the plant dynamics, estimator equation, and the learning rule:
$$\dot{x}(t) = Ax(t) + B\alpha^{*T}\phi(x(t)), \qquad (1.5)$$
$$\dot{\hat{x}}(t) = A\hat{x}(t) + B\hat{\alpha}^T(t)\phi(x(t)), \qquad (1.6)$$
$$\dot{\hat{\alpha}}(t) = \Gamma^{-1}\phi(x(t))B^T P(x - \hat{x}), \qquad (1.7)$$
where $\Gamma \in \mathbb{R}^{n\times n}$ is symmetric and positive definite. The symmetric positive definite matrix $P \in \mathbb{R}^{d\times d}$ is the unique solution of Lyapunov's equation $A^T P + PA = -Q$, for some selected symmetric positive definite $Q \in \mathbb{R}^{d\times d}$. Usually the above equations are summarized in terms of the two error equations
$$\dot{\tilde{x}}(t) = A\tilde{x} + B\phi^T(x(t))\tilde{\alpha}(t), \qquad (1.8)$$
$$\dot{\tilde{\alpha}}(t) = -\Gamma^{-1}\phi(x(t))B^T P\tilde{x}, \qquad (1.9)$$
with $\tilde{\alpha} := \alpha^* - \hat{\alpha}$ and $\tilde{x} := x - \hat{x}$. Equations 1.8, 1.9 can also be written as
$$\begin{Bmatrix} \dot{\tilde{x}}(t) \\ \dot{\tilde{\alpha}}(t) \end{Bmatrix} = \begin{bmatrix} A & B\phi^T(x(t)) \\ -\Gamma^{-1}\phi(x(t))B^T P & 0 \end{bmatrix} \begin{Bmatrix} \tilde{x}(t) \\ \tilde{\alpha}(t) \end{Bmatrix}. \qquad (1.10)$$
This equation defines an evolution on $\mathbb{R}^d \times \mathbb{R}^n$ and has been studied in great detail in [30, 32, 33]. Standard texts such as [22, 23, 24, 31] outline numerous other variants for the online adaptive estimation problem using projection, least squares methods, and other popular approaches.

Adaptive Estimation in $\mathbb{R}^d \times H$

In this paper, we study the method of RKHS embedding that interprets the unknown function $f$ as an element of the RKHS $H$, without any a priori selection of the particular finite dimensional subspace used for estimation of the unknown function. The counterparts to Equations 1.5, 1.6, 1.7 are the plant, estimator, and learning laws
$$\dot{x}(t) = Ax(t) + BE_{x(t)}f, \qquad (1.11)$$
$$\dot{\hat{x}}(t) = A\hat{x}(t) + BE_{x(t)}\hat{f}(t), \qquad (1.12)$$
$$\dot{\hat{f}}(t) = \Gamma^{-1}\left( BE_{x(t)} \right)^* P(x(t) - \hat{x}(t)), \qquad (1.13)$$
where as before $x, \hat{x} \in \mathbb{R}^d$, but $f$ and $\hat{f}(t) \in H$; here $E_\xi: H \to \mathbb{R}$ is the evaluation functional given by $E_\xi: f \mapsto f(\xi)$ for all $\xi \in \mathbb{R}^d$ and $f \in H$, and $\Gamma \in \mathcal{L}(H, H)$ is a self adjoint, positive definite linear operator. The error equation analogous to Equation 1.10 is then given by
$$\begin{Bmatrix} \dot{\tilde{x}}(t) \\ \dot{\tilde{f}}(t) \end{Bmatrix} = \begin{bmatrix} A & BE_{x(t)} \\ -\Gamma^{-1}\left( BE_{x(t)} \right)^* P & 0 \end{bmatrix} \begin{Bmatrix} \tilde{x}(t) \\ \tilde{f}(t) \end{Bmatrix}, \qquad (1.14)$$
which defines an evolution on $\mathbb{R}^d \times H$, instead of on $\mathbb{R}^d \times \mathbb{R}^n$.

1.3.2. Existence, Stability, and Convergence Rates

We briefly summarize and compare the conclusions that can be reached for the conventional and RKHS embedding approaches. Let $(\hat{x}, \hat{f})$ be estimates of $(x, f)$ that evolve according to the state, estimator, and learning law of RKHS embedding. Define the state and distributed parameter error as $\tilde{x} := x - \hat{x}$ and $\tilde{f} := f - \hat{f}$, respectively. Under the assumptions outlined in Theorems 1, 2, and 3, for each $T > 0$ there is a unique solution $(\tilde{x}, \tilde{f}) \in C([0, T]; \mathbb{R}^d \times H)$ to the DPS described by Equations 1.14. Moreover, the error in state estimates $\tilde{x}(t)$ converges to zero, $\lim_{t\to\infty} \|\tilde{x}(t)\| = 0$. If all the evolutions with initial conditions in an open ball containing the origin exist in $C([0, \infty); \mathbb{R}^d \times H)$, the equilibrium at the origin $(\tilde{x}, \tilde{f}) = (0, 0)$ is stable in $\mathbb{R}^d \times H$.
See the standard texts [22, 23, 24, 31] for proofs of existence and convergence of the conventional methods. It must be emphasized again that the conventional results are stated for evolutions in $\mathbb{R}^d \times \mathbb{R}^n$, while the RKHS results hold for evolutions in $\mathbb{R}^d \times H$. Considerably more can be said about the convergence of finite dimensional approximations. For the RKHS embedding approach, the state and finite dimensional approximations $(\hat{x}_j, \hat{f}_j)$ of the infinite dimensional estimates $(\hat{x}, \hat{f})$ on a grid that has resolution level $j$ are governed by Equations 4.1 and 4.2. The finite dimensional estimates $(\hat{x}_j, \hat{f}_j)$ converge to the infinite dimensional estimates $(\hat{x}, \hat{f})$ at a rate that depends on $\|I - \Gamma\Pi_j^*\Gamma_j^{-1}\Pi_j\|$ and $\|I - \Pi_j\|$, where $\Pi_j: H \to H_j$ is the $H$-orthogonal projection.

The remainder of this paper studies the existence and uniqueness of solutions, stability, and convergence of approximate solutions for infinite dimensional, online or adaptive estimation problems. The analysis is based on a study of distributed parameter systems (DPS) whose state space contains the RKHS $H$. The paper concludes with an example of an RKHS adaptive estimation problem for a simple model of map building from vehicles. The numerical example demonstrates the rate of convergence for finite dimensional models constructed from radial basis function (RBF) bases that are centered at a subset of scattered observations. The discussion focuses on a comparison and contrast of the analysis for the ODE system and the distributed parameter system. Prior to these discussions, however, we present a brief review of fundamental properties of RKHS in the next section.
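Before developing the RKHS formalism, it may help to see the conventional finite dimensional scheme of Equations 1.5, 1.6, and 1.7 in computational form. The following is a minimal sketch, not code from the paper: it assumes a scalar state ($d = 1$), a stabilizing choice $A = -1$, $B = 1$, Gaussian RBF basis functions with hand-picked centers, and a forward Euler discretization of the coupled plant, estimator, and gradient learning law.

```python
import numpy as np

# Minimal sketch of the conventional adaptive estimator (Eqs. 1.5-1.7),
# specialized to d = 1 so that A, B, P, Q are scalars. All concrete
# values (A, B, centers, gains, f) are illustrative assumptions.
A, B = -1.0, 1.0                      # Hurwitz A, control influence B
Q = 1.0
P = Q / 2.0                           # solves A*P + P*A = -Q for scalar A
centers = np.linspace(-2.0, 2.0, 11)  # hand-picked RBF centers
sigma = 0.5
Gamma_inv = 10.0 * np.eye(len(centers))   # learning gain (Gamma^{-1})

f_true = lambda x: np.sin(2.0 * x)        # unknown f, used only by the plant
phi = lambda x: np.exp(-((x - centers) ** 2) / sigma ** 2)  # basis vector

dt, T = 1e-3, 60.0
x, x_hat = 1.0, 0.0
alpha_hat = np.zeros(len(centers))
for _ in range(int(T / dt)):
    p = phi(x)
    dx = A * x + B * f_true(x)                      # plant (1.1)/(1.5)
    dx_hat = A * x_hat + B * alpha_hat @ p          # estimator (1.6)
    dalpha = Gamma_inv @ (p * B * P * (x - x_hat))  # learning law (1.7)
    x, x_hat = x + dt * dx, x_hat + dt * dx_hat
    alpha_hat = alpha_hat + dt * dalpha

print("final state error:", abs(x - x_hat))
```

Whether $\hat{\alpha}$ itself converges to $\alpha^*$ depends on persistency of excitation of the trajectory, which is precisely the issue revisited in the RKHS setting in Section 3.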
2. Reproducing Kernel Hilbert Space
Estimation techniques for distributed parameter systems have been previously studied in [34], and further developed to incorporate adaptive estimation of parameters in certain infinite dimensional systems by [2] and the references therein. These works also presented the necessary conditions required to achieve parameter convergence during online estimation. But both approaches rely on delicate semigroup analysis and evolution equations, or Gelfand triples. The approach herein is much simpler and amenable to a wide class of applications. It appears to be a simpler, practical approach to generalize conventional methods. This paper considers estimation problems that are cast in terms of the unknown function $f: \Omega \subseteq \mathbb{R}^d \to \mathbb{R}$, and our approximations will assume that this function is an element of a reproducing kernel Hilbert space. One way to define a reproducing kernel Hilbert space relies on demonstrating the boundedness of evaluation functionals, but we briefly summarize a constructive approach that is helpful in applications and in understanding computations such as in our numerical examples.

In this paper $\mathbb{R}$ denotes the real numbers, $\mathbb{N}$ the positive integers, $\mathbb{N}_0$ the non-negative integers, and $\mathbb{Z}$ the integers. We follow the convention that $a \gtrsim b$ means that there is a constant $c$, independent of $a$ or $b$, such that $b \leq ca$. When $a \gtrsim b$ and $b \gtrsim a$, we write $a \approx b$. Several function spaces are used in this paper. The $p$-integrable Lebesgue spaces are denoted $L^p(\Omega)$ for $1 \leq p \leq \infty$, and $C^s(\Omega)$ is the space of continuous functions on $\Omega$ all of whose derivatives of order less than or equal to $s$ are continuous. The space $C^s_b(\Omega)$ is the normed vector subspace of $C^s(\Omega)$ and consists of all $f \in C^s(\Omega)$ whose derivatives of order less than or equal to $s$ are bounded. The space $C^{s,\lambda}(\Omega) \subseteq C^s_b(\Omega) \subseteq C^s(\Omega)$ is the collection of functions whose derivatives $\frac{\partial^{|\alpha|} f}{\partial x^\alpha}$ with $|\alpha| \leq s$ are $\lambda$-Hölder continuous,
$$\left\| \frac{\partial^{|\alpha|} f}{\partial x^\alpha}(x) - \frac{\partial^{|\alpha|} f}{\partial x^\alpha}(y) \right\| \leq C\|x - y\|^\lambda.$$
The Sobolev space of functions that have weak derivatives of order less than or equal to $r$ that lie in $L^p(\Omega)$ is denoted $H^r_p(\Omega)$.

A reproducing kernel Hilbert space is constructed in terms of a symmetric, continuous, and positive definite function $k: \Omega \times \Omega \to \mathbb{R}$, where positive definiteness requires that for any finite collection of points $\{x_i\}_{i=1}^n \subseteq \Omega$,
$$\sum_{i,j=1}^n k(x_i, x_j)\alpha_i\alpha_j \gtrsim \|\alpha\|^2_{\mathbb{R}^n}$$
for all $\alpha = \{\alpha_1, \ldots, \alpha_n\}^T$. For each $x \in \Omega$, we denote the function $k_x := k_x(\cdot) = k(x, \cdot)$ and refer to $k_x$ as the kernel function centered at $x$. In many typical examples [1], $k_x$ can be interpreted literally as a radial basis function centered at $x \in \Omega$. For any kernel functions $k_x$ and $k_y$ centered at $x, y \in \Omega$, we define the inner product $(k_x, k_y)_H := k(x, y)$. The RKHS $H$ is then defined as the completion of all finite sums extracted from the set $\{k_x \mid x \in \Omega\}$. It is well known that this construction guarantees the boundedness of the evaluation functionals $E_x: H \to \mathbb{R}$. In other words, for each $x \in \Omega$ we have a constant $c_x$ such that
$$|E_x f| = |f(x)| \leq c_x\|f\|_H$$
for all $f \in H$. The reproducing property of the RKHS $H$ plays a crucial role in the analysis here, and it states that
$$E_x f = f(x) = (k_x, f)_H$$
for $x \in \Omega$ and $f \in H$. We will also require the adjoint $E_x^*: \mathbb{R} \to H$ in this paper, which can be calculated directly by noting that
$$(E_x f, \alpha)_{\mathbb{R}} = (f, \alpha k_x)_H = (f, E_x^*\alpha)_H$$
for $\alpha \in \mathbb{R}$, $x \in \Omega$, and $f \in H$.
Hence, $E_x^*: \alpha \mapsto \alpha k_x \in H$.

Finally, we will be interested in the specific case in which it is possible to show that the RKHS $H$ is a subset of $C(\Omega)$, and furthermore, that the associated injection $i: H \to C(\Omega)$ is uniformly bounded. This uniform embedding is possible, for example, provided that the kernel is bounded in the sense that $\sup_{x\in\Omega} k(x, x) \leq \tilde{C}^2$ for a constant $\tilde{C}$. This fact follows by first noting that, by the reproducing property of the RKHS, we can write
$$|f(x)| = |E_x f| = |(k_x, f)_H| \leq \|k_x\|_H\|f\|_H. \qquad (2.1)$$
From the definition of the inner product on $H$, we have $\|k_x\|_H^2 = |(k_x, k_x)_H| = |k(x, x)| \leq \tilde{C}^2$. It follows that $\|if\|_{C(\Omega)} := \|f\|_{C(\Omega)} \leq \tilde{C}\|f\|_H$ and thereby that $\|i\| \leq \tilde{C}$. We next give two examples that will be studied in this paper.
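Before turning to these examples, the following minimal sketch (our own illustration, not from the paper) shows how the constructive definition above plays out numerically: a function built as a finite sum of kernel sections, $f = \sum_i \alpha_i k_{x_i}$, is evaluated both directly and through the reproducing property $f(x) = (k_x, f)_H$, using the Gram matrix to compute the $H$-inner product. The Gaussian kernel and all numerical values are illustrative assumptions.

```python
import numpy as np

# Sketch: verify f(x) = (k_x, f)_H for a finite kernel expansion.
def k(x, y, sigma=0.5):
    return np.exp(-np.abs(x - y) ** 2 / sigma ** 2)

centers = np.array([-1.0, 0.0, 0.5, 2.0])     # points x_i defining f
alpha = np.array([1.0, -2.0, 0.7, 0.3])       # coefficients of f

x = 0.8                                        # evaluation point
# Direct evaluation: f(x) = sum_i alpha_i k(x_i, x)
f_direct = alpha @ k(centers, x)
# Reproducing property: (k_x, f)_H = sum_i alpha_i (k_x, k_{x_i})_H
#                                  = sum_i alpha_i k(x, x_i)
f_repro = sum(a * k(x, xi) for a, xi in zip(alpha, centers))
# H-norm of f via the Gram matrix: ||f||_H^2 = alpha^T K alpha
K = k(centers[:, None], centers[None, :])
f_norm = np.sqrt(alpha @ K @ alpha)

print(f_direct, f_repro, f_norm)  # first two agree to machine precision
```

The bound (2.1) is also visible here: $|f(x)| \leq \|k_x\|_H\|f\|_H = \sqrt{k(x,x)}\,\|f\|_H$, so for a kernel with $k(x,x) = 1$ we have $\tilde{C} = 1$.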
Example: The Exponential Kernel

A popular example of an RKHS, one that will be used in the numerical examples, is constructed from the family of exponentials $\kappa(x, y) := e^{-\|x - y\|^2/\sigma^2}$ where $\sigma > 0$.
Suppose that $\tilde{C} = \sqrt{\sup_{x\in\Omega} \kappa(x, x)} < \infty$. Smale and Zhou in [35] argue that
$$|f(x)| = |E_x(f)| = |(\kappa_x, f)_H| \leq \|\kappa_x\|_H\|f\|_H$$
for all $x \in \Omega$ and $f \in H$, and since $\|\kappa_x\|_H^2 = |\kappa(x, x)| \leq \tilde{C}^2$, it follows that the embedding $i: H \to L^\infty(\Omega)$ is bounded,
$$\|f\|_{L^\infty(\Omega)} := \|i(f)\|_{L^\infty(\Omega)} \leq \tilde{C}\|f\|_H.$$
For the exponential kernel above, $\tilde{C} = 1$. Let $C^s(\Omega)$ denote the space of functions on $\Omega$ all of whose partial derivatives of order less than or equal to $s$ are continuous. The space $C^s_b(\Omega)$ is endowed with the norm
$$\|f\|_{C^s_b(\Omega)} := \max_{|\alpha| \leq s} \left\| \frac{\partial^{|\alpha|} f}{\partial x^\alpha} \right\|_{L^\infty(\Omega)},$$
where the multi-indices are $\alpha := \{\alpha_1, \ldots, \alpha_d\} \in \mathbb{N}_0^d$, $\partial x^\alpha := \partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}$, and $|\alpha| = \sum_{i=1,\ldots,d} \alpha_i$. Observe that the continuous functions in $C^s(\Omega)$ need not be bounded even if $\Omega$ is a bounded open domain. The space $C^s_b(\Omega)$ is the subspace consisting of functions $f \in C^s(\Omega)$ for which all derivatives of order less than or equal to $s$ are bounded. The space $C^{s,\lambda}(\Omega)$ is the subspace of functions $f$ in $C^s(\Omega)$ for which all of the partial derivatives $\frac{\partial^{|\alpha|} f}{\partial x^\alpha}$ with $|\alpha| \leq s$ are $\lambda$-Hölder continuous. The norm of $C^{s,\lambda}(\Omega)$ for $0 < \lambda \leq 1$ is
$$\|f\|_{C^{s,\lambda}(\Omega)} = \|f\|_{C^s(\Omega)} + \max_{|\alpha| \leq s} \sup_{\substack{x, y \in \Omega \\ x \neq y}} \frac{\left| \frac{\partial^{|\alpha|} f}{\partial x^\alpha}(x) - \frac{\partial^{|\alpha|} f}{\partial x^\alpha}(y) \right|}{|x - y|^\lambda}.$$
Also, reference [35] notes that if $\kappa(\cdot,\cdot) \in C^{s,\lambda}_b(\Omega\times\Omega)$ with $0 < \lambda < 2$, the embedding $i: H \to C^{s,\lambda/2}_b(\Omega)$ is well defined and continuous. That is, the mapping $i: H \to C^{s,\lambda/2}_b(\Omega)$ defined via $f \mapsto i(f) := f$ satisfies $\|f\|_{C^{s,\lambda/2}_b(\Omega)} \lesssim \|f\|_H$. In fact, reference [35] shows that
$$\|f\|_{C^s_b(\Omega)} \lesssim \|\kappa\|^{1/2}_{C^s_b(\Omega\times\Omega)}\|f\|_H.$$
The overall important conclusion to draw from the summary above is that there are many conditions that guarantee that the embedding
$H \hookrightarrow C_b(\Omega)$ is continuous. This condition will play a central role in devising simple conditions for existence of solutions of the RKHS embedding technique.

Example: $s$-Regular Scaling Functions

The characterization of the norm of the Sobolev space $H^r := H^r(\mathbb{R}^d)$ has appeared in many monographs that discuss multiresolution analysis [12, 13, 36]. It is also possible to define the Sobolev space $H^r(\mathbb{R}^d)$ as the Hilbert space constructed from a reproducing kernel $\kappa(\cdot,\cdot): \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ that is defined in terms of an $s$-regular scaling function $\phi$ of a multiresolution analysis (MRA) [12, 36]. With $\phi$ an $s$-regular scaling function and $d/2 < r < s$, we define the kernel
$$\kappa(u, v) := \sum_{j=0}^\infty 2^{j(d-2r)} \sum_{k\in\mathbb{Z}^d} \phi(2^j u - k)\phi(2^j v - k) = \sum_{j=0}^\infty 2^{-2rj} \sum_{k\in\mathbb{Z}^d} \phi_{j,k}(u)\phi_{j,k}(v).$$
It should be noted that the requirement $d/2 < r$ implies that the coefficient $2^{j(d-2r)}$ above is decreasing as $j \to \infty$, and ensures the summation converges. As discussed in Section 2 and in references [16, 17], the RKHS is constructed as the closure of the finite linear span of the set of functions $\{\kappa_u\}_{u\in\Omega}$ with $\kappa_u(\cdot) := \kappa(u, \cdot)$. Under the assumption that $d/2 < r < s$, the Sobolev space $H^r(\mathbb{R}^d)$ can also be related to the Hilbert space $H^r_\kappa(\mathbb{R}^d)$ defined as
$$H^r_\kappa(\mathbb{R}^d) := \left\{ f: \mathbb{R}^d \to \mathbb{R} \;\middle|\; (f, f)_{\kappa,r} = \|f\|^2_{\kappa,r} < \infty \right\}$$
with the inner product $(\cdot,\cdot)_{\kappa,r}$ on $H^r_\kappa(\mathbb{R}^d)$ defined via
$$(f, f)_{\kappa,r} := \|f\|^2_{\kappa,r} := \inf\left\{ \sum_{j=0}^\infty 2^{j(2r-d)}\|f_j\|^2_{V_j} \;\middle|\; f_j \in V_j, \; f = \sum_{j=0}^\infty f_j \right\}$$
with $\|f_j\|^2_{V_j} = \sum_{k\in\mathbb{Z}^d} c_{j,k}^2$ for $f_j(u) = \sum_{k\in\mathbb{Z}^d} c_{j,k}\phi(2^j u - k)$ and $j \in \mathbb{N}_0$. Note that the characterization above of $H^r_\kappa(\mathbb{R}^d)$ is expressed only in terms of the scaling functions $\phi_{j,k}$ for $j \in \mathbb{N}_0$ and $k \in \mathbb{Z}^d$. The functions $\phi$ and $\psi$ need not define an orthonormal multiresolution in this characterization, and the bases $\psi_{j,k}$ for the complement spaces $W_j$ are not used. We discuss the use of wavelet bases $\psi_{j,k}$ for the definition of the kernel in a forthcoming paper. References [16, 17] show that when $d/2 < r < s$, we have the norm equivalence
$$H^r_\kappa(\mathbb{R}^d) \approx H^r(\mathbb{R}^d). \qquad (2.2)$$
Finally, from Sobolev's Embedding Theorem [37], whenever $r > d/2$ we have
$$H^r \hookrightarrow C^{r-d/2}_b \subset C^{r-d/2},$$
where $C^r_b$ is the subspace of functions $f$ in $C^r$ all of whose derivatives up through order $r$ are bounded. In fact, by choosing the $s$-regular MRA with $s$ and $r$ large enough, we have the embedding $H^r(\Omega) \hookrightarrow C(\Omega)$ when $\Omega \subseteq \mathbb{R}^d$ [37].

One of the simplest examples that meet the conditions of this section includes the normalized B-splines of order $r > 0$.
We denote by $N^r$ the normalized B-spline of order $r$ with integer knots and define its translated dilates by $N^r_{j,k} := 2^{jd/2} N^r(2^j x - k)$ for $k \in \mathbb{Z}^d$ and $j \in \mathbb{N}_0$. In this case the kernel is written in the form
$$\kappa(u, v) := \sum_{j=0}^\infty 2^{-2rj} \sum_{k\in\mathbb{Z}^d} N^r_{j,k}(u) N^r_{j,k}(v).$$
Figure 3 depicts the translated dilates of the normalized B-splines of order 1 and 2, respectively.

Figure 3: Translated Dilates of Normalized B-Splines
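To make the multiscale construction concrete, the following sketch (an illustration we add here, not code from the paper) evaluates a truncated version of this kernel in dimension $d = 1$, using the piecewise linear "hat" B-spline, a truncation level $J$, and a smoothness parameter $r$ that are all assumed for illustration.

```python
import numpy as np

# Sketch: truncated multiscale kernel built from the piecewise linear
# B-spline (hat function) in dimension d = 1. J and r are assumptions.
def hat(x):
    """Piecewise linear B-spline on integer knots, supported on [0, 2]."""
    return np.maximum(0.0, 1.0 - np.abs(x - 1.0))

def kappa(u, v, r=1.0, J=6):
    """kappa(u, v) ~= sum_{j<=J} 2^{-2rj} sum_k N_{j,k}(u) N_{j,k}(v)."""
    total = 0.0
    for j in range(J + 1):
        # translates whose support can overlap [0, 1]; N_{j,k} = 2^{j/2} N(2^j x - k)
        for k in range(-2, 2 ** j + 2):
            total += (2.0 ** (-2.0 * r * j)
                      * (2.0 ** (j / 2.0)) * hat(2.0 ** j * u - k)
                      * (2.0 ** (j / 2.0)) * hat(2.0 ** j * v - k))
    return total

# Gram matrix of the multiscale kernel on a coarse grid in [0, 1]
grid = np.linspace(0.0, 1.0, 9)
K = np.array([[kappa(u, v) for v in grid] for u in grid])
print(np.linalg.cond(K))
```

By the norm equivalence (2.2), the RKHS induced by such a kernel is equivalent to a Sobolev space $H^r$ when $d/2 < r < s$, which is what makes these kernels attractive for deriving function space convergence rates.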
3. Existence, Uniqueness and Stability

In the adaptive estimation problem that is cast in terms of a RKHS $H$, we seek a solution $X = (\tilde{x}, \tilde{f}) \in \mathbb{R}^d \times H \equiv \mathbb{X}$ that satisfies Equation 1.14. In general $\mathbb{X}$ is an infinite dimensional state space for this estimation problem, which can in principle substantially complicate the analysis in comparison to conventional ODE methods. We first establish that the adaptive estimation problem in Equation 1.14 is well-posed. The result that is derived below is not the most general possible, but rather has been emphasized because its conditions are simple and easily verifiable in many applications.

Theorem 1. Suppose that $x \in C([0, T]; \mathbb{R}^d)$ and that the embedding $i: H \hookrightarrow C(\Omega)$ is uniform in the sense that there is a constant $C > 0$ such that for any $f \in H$,
$$\|f\|_{C(\Omega)} \equiv \|if\|_{C(\Omega)} \leq C\|f\|_H. \qquad (3.1)$$
For any $T > 0$ there is a unique mild solution $(\tilde{x}, \tilde{f}) \in C([0, T]; \mathbb{X})$ to Equation 1.14, and the map $X_0 \equiv (\tilde{x}_0, \tilde{f}_0) \mapsto (\tilde{x}, \tilde{f})$ is Lipschitz continuous from $\mathbb{X}$ to $C([0, T]; \mathbb{X})$.

Proof. We can split the governing Equation 1.14 into the form
$$\begin{Bmatrix} \dot{\tilde{x}}(t) \\ \dot{\tilde{f}}(t) \end{Bmatrix} = \begin{bmatrix} A & 0 \\ 0 & A_1 \end{bmatrix} \begin{Bmatrix} \tilde{x}(t) \\ \tilde{f}(t) \end{Bmatrix} + \begin{bmatrix} 0 & BE_{x(t)} \\ -\Gamma^{-1}\left( BE_{x(t)} \right)^* P & -A_1 \end{bmatrix} \begin{Bmatrix} \tilde{x}(t) \\ \tilde{f}(t) \end{Bmatrix}, \qquad (3.2)$$
and write it more concisely as
$$\dot{\tilde{X}} = \mathbb{A}\tilde{X}(t) + F(t, \tilde{X}(t)), \qquad (3.3)$$
where the operator $A_1 \in \mathcal{L}(H, H)$ is arbitrary. It is immediately clear that $\mathbb{A}$ is the infinitesimal generator of a $C_0$ semigroup on $\mathbb{X} \equiv \mathbb{R}^d \times H$ since $\mathbb{A}$ is bounded on $\mathbb{X}$. In addition, we see the following:
1. The function $F: \mathbb{R}^+ \times \mathbb{X} \to \mathbb{X}$ is uniformly globally Lipschitz continuous: there is a constant $L > 0$ such that $\|F(t, X) - F(t, Y)\| \leq L\|X - Y\|$ for all $X, Y \in \mathbb{X}$ and $t \in [0, T]$.
2. The map $t \mapsto F(t, X)$ is continuous on $[0, T]$ for each fixed $X \in \mathbb{X}$.
By Theorem 1.2, p. 184, in reference [38], there is a unique mild solution
$$\tilde{X} = \{\tilde{x}, \tilde{f}\}^T \in C([0, T]; \mathbb{X}) \equiv C([0, T]; \mathbb{R}^d \times H).$$
In fact the map $\tilde{X}_0 \mapsto \tilde{X}$ is Lipschitz continuous from $\mathbb{X} \to C([0, T]; \mathbb{X})$.

The proof of stability of the equilibrium at the origin of the RKHS Equation 1.14 closely resembles the Lyapunov analysis of Equation 1.10; only the extension to the infinite dimensional state space $\mathbb{X}$ is required. It is useful to carry out this analysis in some detail to see how the adjoint $E_x^*: \mathbb{R} \to H$ of the evaluation functional $E_x: H \to \mathbb{R}$ plays a central and indispensable role in the study of the stability of evolution equations on the RKHS.
Theorem 2. Suppose that the RKHS Equations 1.14 have a unique solution in $C([0, \infty); \mathbb{X})$ for every initial condition $X_0$ in some open ball $B_r(0) \subseteq \mathbb{X}$. Then the equilibrium at the origin is Lyapunov stable. Moreover, the state error $\tilde{x}(t) \to 0$ as $t \to \infty$.

Figure 4: Lyapunov function, $V(x)$
Figure 5: Stability of the equilibrium
Proof. Define the Lyapunov function $V: \mathbb{X} \to \mathbb{R}$ as
$$V\begin{Bmatrix} \tilde{x} \\ \tilde{f} \end{Bmatrix} = \frac{1}{2}\tilde{x}^T P\tilde{x} + \frac{1}{2}(\Gamma\tilde{f}, \tilde{f})_H.$$
This function is norm continuous and positive definite on any neighborhood of the origin since $V(X) \gtrsim \|X\|^2_{\mathbb{X}}$ for all $X \in \mathbb{X}$. For any $X$, and in particular over the open set $B_r(0)$, the derivative of the Lyapunov function $V$ along trajectories of the system is given as
$$\dot{V} = \frac{1}{2}\left( \dot{\tilde{x}}^T P\tilde{x} + \tilde{x}^T P\dot{\tilde{x}} \right) + (\Gamma\dot{\tilde{f}}, \tilde{f})_H = -\frac{1}{2}\tilde{x}^T Q\tilde{x} + \left( \tilde{f}, E^*_{x(t)} B^T P\tilde{x} + \Gamma\dot{\tilde{f}} \right)_H = -\frac{1}{2}\tilde{x}^T Q\tilde{x},$$
since $\left( \tilde{f}, E^*_{x(t)} B^T P\tilde{x} + \Gamma\dot{\tilde{f}} \right)_H = 0$ by the learning law 1.13. Let $\epsilon$ be some constant such that $0 < \epsilon < r$. Define $\gamma(\epsilon)$ and $\Omega_\gamma$ according to
$$\gamma(\epsilon) = \inf_{\|X\|_{\mathbb{X}} = \epsilon} V(X), \qquad \Omega_\gamma = \{X \in B_\epsilon(0) \mid V(X) < \gamma\}.$$
We can picture these quantities as shown in Fig. 4 and Fig. 5. But $\Omega_\gamma$ is an open set since it is the intersection of $B_\epsilon(0)$ with the inverse image of the open set $(-\infty, \gamma) \subset \mathbb{R}$ under the continuous mapping $V: \mathbb{X} \to \mathbb{R}$. The set $\Omega_\gamma$ therefore contains an open neighborhood of each of its elements. Let $\delta > 0$ be such that $B_\delta(0) \subset \Omega_\gamma$. Since $\overline{\Omega}_\gamma := \{X \in B_\epsilon(0) \mid V(X) \leq \gamma\}$ is a sublevel set of $V$ and $V$ is non-increasing along trajectories, it is a positive invariant set. Given any initial condition $X_0 \in B_\delta(0) \subseteq \Omega_\gamma$, we know that the trajectory $X(t)$ starting at $X_0$ satisfies $X(t) \in \Omega_\gamma \subseteq B_\epsilon(0) \subseteq B_r(0)$ for all $t \in [0, \infty)$. The equilibrium at the origin is therefore stable.

The convergence of the state estimation error $\tilde{x}(t) \to 0$ as $t \to \infty$ can be based on Barbalat's lemma by modifying the conventional arguments for ODE systems. Since $\frac{d}{dt}\left( V(X(t)) \right) = -\frac{1}{2}\tilde{x}^T(t) Q\tilde{x}(t) \leq 0$, $V(X(t))$ is non-increasing and bounded below by zero. There is a constant $V_\infty := \lim_{t\to\infty} V(X(t))$, and we have
$$V(X_0) - V_\infty = \frac{1}{2}\int_0^\infty \tilde{x}^T(\tau) Q\tilde{x}(\tau)\, d\tau \gtrsim \|\tilde{x}\|^2_{L^2((0,\infty); \mathbb{R}^d)}.$$
Since $V(X(t)) \leq V(X_0)$, we likewise have $\|\tilde{x}\|^2_{L^\infty((0,\infty); \mathbb{R}^d)} \lesssim V(X_0)$ and $\|\tilde{f}\|^2_{L^\infty((0,\infty); H)} \lesssim V(X_0)$. The equation of motion enables a uniform bound on $\dot{\tilde{x}}$ since
$$\|\dot{\tilde{x}}(t)\|_{\mathbb{R}^d} \leq \|A\|\|\tilde{x}(t)\|_{\mathbb{R}^d} + \|B\|\|E_{x(t)}\tilde{f}(t)\|_{\mathbb{R}^d} \leq \|A\|\|\tilde{x}(t)\|_{\mathbb{R}^d} + \tilde{C}\|B\|\|\tilde{f}(t)\|_H \leq \|A\|\|\tilde{x}\|_{L^\infty((0,\infty); \mathbb{R}^d)} + \tilde{C}\|B\|\|\tilde{f}\|_{L^\infty((0,\infty); H)}. \qquad (3.4)$$
Since $\tilde{x} \in L^\infty((0,\infty); \mathbb{R}^d) \cap L^2((0,\infty); \mathbb{R}^d)$ and $\dot{\tilde{x}} \in L^\infty((0,\infty); \mathbb{R}^d)$, we conclude by generalizations of Barbalat's lemma [39] that $\tilde{x}(t) \to 0$ as $t \to \infty$.

It is evident that Theorem 2 yields results about stability and convergence over the RKHS of the state estimate error to zero that are analogous to typical results for conventional ODE systems. As expected, conclusions for the convergence of the function estimates $\hat{f}$ to $f$ are more difficult to generate, and they rely on persistency of excitation conditions that are suitably extended to the RKHS framework.
Definition 1. We say that the plant in the RKHS Equation 1.12 is strongly persistently exciting if there exist constants $\Delta, \gamma > 0$ and $T$ such that for all $f \in H$ with $\|f\|_H = 1$ and $t > T$ sufficiently large,
$$\int_t^{t+\Delta} \left( E^*_{x(\tau)} E_{x(\tau)} f, f \right)_H d\tau \gtrsim \gamma.$$

As in the consideration of ODE systems, persistency of excitation is sufficient to guarantee convergence of the function parameter estimates to the true function.
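By the reproducing property, $(E^*_{x(\tau)} E_{x(\tau)} f, f)_H = |f(x(\tau))|^2$, so strong persistency of excitation asks that the trajectory "see" every unit-norm function in $H$. The sketch below, our own illustration with an assumed sinusoidal trajectory and Gaussian kernel sections, evaluates this excitation integral for a few normalized test functions.

```python
import numpy as np

# Sketch: evaluate the persistency-of-excitation integral
#   int_t^{t+Delta} (E*_{x(tau)} E_{x(tau)} f, f)_H dtau = int |f(x(tau))|^2 dtau
# along an assumed trajectory x(tau). Trajectory, kernel, centers: assumptions.
k = lambda x, y: np.exp(-np.abs(x - y) ** 2 / 0.5 ** 2)

def pe_integral(alpha, centers, t, delta, n_quad=2000):
    """Excitation of f = sum_i alpha_i k_{c_i} over the window [t, t + delta]."""
    taus = np.linspace(t, t + delta, n_quad)
    x_traj = np.cos(0.5 * taus)              # assumed periodic trajectory
    f_vals = np.array([alpha @ k(centers, x) for x in x_traj])
    return np.trapz(f_vals ** 2, taus)

centers = np.linspace(-1.0, 1.0, 5)
G = k(centers[:, None], centers[None, :])    # Gram matrix for normalization
for alpha in np.eye(5):
    alpha = alpha / np.sqrt(alpha @ G @ alpha)   # enforce ||f||_H = 1
    print(pe_integral(alpha, centers, t=0.0, delta=4.0 * np.pi))
```

The definition requires a uniform positive lower bound over all unit-norm $f$, not just finitely many test functions; a trajectory confined to a subregion of $\Omega$ will fail the condition for kernel sections centered away from that region.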
Theorem 3.
Suppose that the plant in Equation 1.12 is strongly persistently exciting and that either (i) the function $\kappa(x(\cdot), x(\cdot)) \in L^1((0,\infty); \mathbb{R})$, or (ii) the matrix $-A$ is coercive in the sense that $(-Av, v) \geq c\|v\|^2$ for all $v \in \mathbb{R}^d$, and $\Gamma = P = I_d$. Then the parameter function error $\tilde{f}$ converges strongly to zero,
$$\lim_{t\to\infty} \|f - \hat{f}(t)\|_H = 0.$$

Proof.
We begin by assuming that (i) holds. In the proof of Theorem 2 it is shown that $V$ is bounded below and non-increasing, and therefore approaches a limit
$$\lim_{t\to\infty} V(t) = V_\infty < \infty.$$
Since $\tilde{x}(t) \to 0$ as $t \to \infty$, we can conclude that
$$\lim_{t\to\infty} \|\tilde{f}(t)\|^2_H \lesssim V_\infty.$$
Suppose that $V_\infty \neq 0$. Then there exists a positive, increasing sequence of times $\{t_k\}_{k\in\mathbb{N}}$ with $\lim_{k\to\infty} t_k = \infty$ and some constant $\delta > 0$ such that $\|\tilde{f}(t_k)\|_H \geq \delta$ for all $k \in \mathbb{N}$. Since the RKHS is persistently exciting, we can write
$$\int_{t_k}^{t_k+\Delta} \left( E^*_{x(\tau)} E_{x(\tau)} \tilde{f}(t_k), \tilde{f}(t_k) \right)_H d\tau \gtrsim \gamma\|\tilde{f}(t_k)\|^2_H \geq \gamma\delta^2$$
for all $k \in \mathbb{N}$. By the reproducing property of the RKHS, we can then see that
$$\gamma\delta^2 \leq \gamma\|\tilde{f}(t_k)\|^2_H \lesssim \int_{t_k}^{t_k+\Delta} \left( \kappa_{x(\tau)}, \tilde{f}(t_k) \right)^2_H d\tau \leq \|\tilde{f}(t_k)\|^2_H \int_{t_k}^{t_k+\Delta} \|\kappa_{x(\tau)}\|^2_H\, d\tau = \|\tilde{f}(t_k)\|^2_H \int_{t_k}^{t_k+\Delta} \left( \kappa_{x(\tau)}, \kappa_{x(\tau)} \right)_H d\tau = \|\tilde{f}(t_k)\|^2_H \int_{t_k}^{t_k+\Delta} \kappa(x(\tau), x(\tau))\, d\tau.$$
Since $\kappa(x(\cdot), x(\cdot)) \in L^1((0,\infty); \mathbb{R})$ by assumption, when we take the limit as $k \to \infty$ we obtain the contradiction $0 < \gamma\delta^2 \leq 0$.
We conclude therefore that $V_\infty = 0$ and $\lim_{t\to\infty} \|\tilde{f}(t)\|_H = 0$.

We outline the proof when (ii) holds, which is based on slight modifications of arguments that appear in [40, 2, 41, 42, 3, 43] that treat a different class of infinite dimensional nonlinear systems whose state space is cast in terms of a Gelfand triple. Perhaps the simplest analysis follows from [2] for this case. Our hypothesis that $\Gamma = P = I_d$ reduces Equations 1.14 to the form of Equations 2.20 in [2]. The assumption that $-A$ is coercive in our theorem implies that the coercivity assumption (A4) in [2] holds. If we define $X = Y := \mathbb{R}^d \times H$, then it is clear that the embeddings $Y \hookrightarrow X \hookrightarrow Y^*$ are continuous and dense, so that they define a Gelfand triple. Because of the trivial form of the Gelfand triple in this case, it is immediate that the Gårding inequality in Equation 2.17 in [2] holds. We identify $BE_{x(t)}$ as the control influence operator $B^*(u(t))$ in [2]. Under these conditions, Theorem 3 follows from Theorem 3.4 in [2] as a special case.
4. Finite Dimensional Approximations
The governing system in Equations 1.14 constitutes a distributed parameter system since the functions $\tilde{f}(t)$ evolve in the infinite dimensional space $H$. In practice these equations must be approximated by some finite dimensional system. Let $\{H_j\}_{j\in\mathbb{N}} \subseteq H$ be a nested sequence of subspaces, and let $\{\Pi_j\}_{j\in\mathbb{N}}$ be a collection of approximation operators $\Pi_j: H \to H_j$ such that $\lim_{j\to\infty} \Pi_j f = f$ for all $f \in H$ and $\sup_{j\in\mathbb{N}} \|\Pi_j\| \leq C$ for a constant $C > 0$.
Perhaps the most evident example of such a collection chooses $\Pi_j$ as the $H$-orthogonal projection for a dense collection of subspaces $H_j$. It is also common to choose $\Pi_j$ as a uniformly bounded family of quasi-interpolants [36]. We next construct finite dimensional approximations $\hat{x}_j$ and $\hat{f}_j$ of the online estimation equations:
$$\dot{\hat{x}}_j(t) = A\hat{x}_j(t) + BE_{x(t)}\Pi_j^*\hat{f}_j(t), \qquad (4.1)$$
$$\dot{\hat{f}}_j(t) = \Gamma_j^{-1}\left( BE_{x(t)}\Pi_j^* \right)^* P\tilde{x}_j(t), \qquad (4.2)$$
with $\tilde{x}_j := x - \hat{x}_j$. It is important to note that in the above equations $\Pi_j: H \to H_j$ and $\Pi_j^*: H_j \to H$.
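When $H_j$ is spanned by kernel sections $\{k_{x_i}\}_{i=1}^n$ at chosen centers, $\hat{f}_j(t) = \sum_i \hat{\alpha}_i(t) k_{x_i}$ and Equations 4.1 and 4.2 reduce to ODEs for $(\hat{x}_j, \hat{\alpha})$ in which the Gram matrix of the kernel enters through $\Gamma_j$. The following is a minimal sketch of that reduction under illustrative assumptions (scalar state, Gaussian kernel, $\Gamma$ a multiple of the identity); it is our reading of the scheme, not code from the paper.

```python
import numpy as np

# Sketch: finite dimensional realization of Eqs. (4.1)-(4.2) with
# H_j = span{k_{x_1}, ..., k_{x_n}} and f_hat_j = sum_i alpha_i k_{x_i}.
# Scalar state (d = 1); A, B, P, gains, and centers are illustrative.
k = lambda x, y: np.exp(-np.abs(x - y) ** 2 / 0.3 ** 2)

A, B, P = -1.0, 1.0, 0.5
centers = np.linspace(-2.0, 2.0, 15)
K = k(centers[:, None], centers[None, :])   # Gram matrix of the kernel sections
gamma = 1.0                                  # scalar adaptation gain (Gamma = gamma*I)

f_true = lambda x: np.sin(2.0 * x)

dt, T = 1e-3, 60.0
x, x_hat = 1.0, 0.0
alpha = np.zeros(len(centers))
for _ in range(int(T / dt)):
    kx = k(centers, x)                       # E_x acting on the kernel sections
    dx = A * x + B * f_true(x)               # plant (1.11)
    dx_hat = A * x_hat + B * (alpha @ kx)    # estimator (4.1)
    # learning law (4.2) in coordinates: alpha' = (1/gamma) K^{-1} kx B P (x - x_hat)
    dalpha = np.linalg.solve(gamma * K, kx * (B * P * (x - x_hat)))
    x, x_hat = x + dt * dx, x_hat + dt * dx_hat
    alpha = alpha + dt * dalpha
```

Note that the Gram matrix $K$ must be solved against at each step; its conditioning, examined in Section 5, therefore directly affects the numerical accuracy of the scheme.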
Theorem 4. Suppose that $x \in C([0, T]; \mathbb{R}^d)$ and that the embedding $i: H \hookrightarrow C(\Omega)$ is uniform in the sense that
$$\|f\|_{C(\Omega)} \equiv \|if\|_{C(\Omega)} \leq C\|f\|_H. \qquad (4.3)$$
Then for any $T > 0$,
$$\|\hat{x} - \hat{x}_j\|_{C([0,T]; \mathbb{R}^d)} \to 0, \qquad \|\hat{f} - \hat{f}_j\|_{C([0,T]; H)} \to 0, \qquad \text{as } j \to \infty.$$

Proof. Define the operators $\Lambda(t) := BE_{x(t)}: H \to \mathbb{R}^d$ and, for each $t \geq 0$, introduce the state approximation error $x_j := \hat{x} - \hat{x}_j$ and the function approximation error $f_j := \hat{f} - \hat{f}_j$. Note that $\tilde{x}_j := x - \hat{x}_j = x - \hat{x} + \hat{x} - \hat{x}_j = \tilde{x} + x_j$. The time derivative of the error induced by approximation of the estimates can be expanded and bounded as follows:
$$\frac{1}{2}\frac{d}{dt}\left( (x_j, x_j)_{\mathbb{R}^d} + (f_j, f_j)_H \right) = (\dot{x}_j, x_j)_{\mathbb{R}^d} + (\dot{f}_j, f_j)_H$$
$$= (Ax_j + \Lambda f_j, x_j)_{\mathbb{R}^d} + \left( \left( \Gamma^{-1} - \Pi_j^*\Gamma_j^{-1}\Pi_j \right)\Lambda^* P\tilde{x}, f_j \right)_H - \left( \Pi_j^*\Gamma_j^{-1}\Pi_j\Lambda^* P x_j, f_j \right)_H$$
$$\leq C_A\|x_j\|^2_{\mathbb{R}^d} + \|\Lambda\|\|f_j\|_H\|x_j\|_{\mathbb{R}^d} + \|\Gamma^{-1}(I - \Gamma\Pi_j^*\Gamma_j^{-1}\Pi_j)\Lambda^* P\tilde{x}\|_H\|f_j\|_H + \left\| \Pi_j^*\Gamma_j^{-1}\Pi_j\Lambda^* P \right\|\|x_j\|_{\mathbb{R}^d}\|f_j\|_H$$
$$\leq C_A\|x_j\|^2_{\mathbb{R}^d} + \frac{1}{2}\|\Lambda\|\left( \|f_j\|^2_H + \|x_j\|^2_{\mathbb{R}^d} \right) + \frac{1}{2}\|\Pi_j^*\Gamma_j^{-1}\Pi_j\|\|\Lambda^*\|\|P\|\left( \|x_j\|^2_{\mathbb{R}^d} + \|f_j\|^2_H \right) + \frac{1}{2}\left( \|\Gamma^{-1}(I - \Gamma\Pi_j^*\Gamma_j^{-1}\Pi_j)\Lambda^* P\tilde{x}\|^2_H + \|f_j\|^2_H \right)$$
$$\leq \frac{1}{2}\|\Gamma^{-1}\|^2\|\Lambda^*\|^2\|P\|^2\|I - \Gamma\Pi_j^*\Gamma_j^{-1}\Pi_j\|^2\|\tilde{x}\|^2_{\mathbb{R}^d} + \left( C_A + \frac{1}{2}\|\Lambda\| + \frac{1}{2}C_B\|\Lambda^*\|\|P\| \right)\|x_j\|^2_{\mathbb{R}^d} + \frac{1}{2}\left( \|\Lambda\| + 1 + C_B\|\Lambda^*\|\|P\| \right)\|f_j\|^2_H,$$
where $C_A := \|A\|$ and $C_B$ is a uniform bound on $\|\Pi_j^*\Gamma_j^{-1}\Pi_j\|$. We know that $\|\Lambda(t)\| = \|\Lambda^*(t)\|$ is bounded uniformly in time from the assumption that $H$ is uniformly embedded in $C(\Omega)$. We next consider the operator error that manifests in the term $(\Gamma^{-1} - \Pi_j^*\Gamma_j^{-1}\Pi_j)$. For any $g \in H$ we have
$$\|(\Gamma^{-1} - \Pi_j^*\Gamma_j^{-1}\Pi_j)g\|_H = \|\Gamma^{-1}(I - \Gamma\Pi_j^*\Gamma_j^{-1}\Pi_j)g\|_H \leq \|\Gamma^{-1}\|\|(\Pi_j + (I - \Pi_j))(I - \Gamma\Pi_j^*\Gamma_j^{-1}\Pi_j)g\|_H \lesssim \|I - \Pi_j\|\|g\|_H.$$
This final inequality follows since $\Pi_j(I - \Gamma\Pi_j^*\Gamma_j^{-1}\Pi_j) = 0$ and $\Gamma\Pi_j^*\Gamma_j^{-1}\Pi_j \equiv \Gamma\Pi_j^*\left( \Pi_j\Gamma\Pi_j^* \right)^{-1}\Pi_j$ is uniformly bounded. We then can write
$$\frac{d}{dt}\left( \|x_j\|^2_{\mathbb{R}^d} + \|f_j\|^2_H \right) \leq C_1\|I - \Gamma\Pi_j^*\Gamma_j^{-1}\Pi_j\|^2 + C_2\left( \|x_j\|^2_{\mathbb{R}^d} + \|f_j\|^2_H \right),$$
where $C_1, C_2 > 0$. We integrate this inequality over the interval $[0, T]$ and obtain
$$\|x_j(t)\|^2_{\mathbb{R}^d} + \|f_j(t)\|^2_H \leq \|x_j(0)\|^2_{\mathbb{R}^d} + \|f_j(0)\|^2_H + C_1 T\|I - \Gamma\Pi_j^*\Gamma_j^{-1}\Pi_j\|^2 + C_2\int_0^T \left( \|x_j(\tau)\|^2_{\mathbb{R}^d} + \|f_j(\tau)\|^2_H \right) d\tau.$$
We can always choose $\hat{x}(0) = \hat{x}_j(0)$, so that $x_j(0) = 0$. If we choose $\hat{f}_j(0) := \Pi_j\hat{f}(0)$, then
$$\|f_j(0)\|_H = \|\hat{f}(0) - \Pi_j\hat{f}(0)\|_H \leq \|I - \Pi_j\|\|\hat{f}(0)\|_H.$$
The non-decreasing term can likewise be bounded as $C_1 T\|I - \Gamma\Pi_j^*\Gamma_j^{-1}\Pi_j\|^2 \lesssim \|I - \Pi_j\|^2$, so that
$$\|x_j(t)\|^2_{\mathbb{R}^d} + \|f_j(t)\|^2_H \leq C_3\|I - \Pi_j\|^2 + C_2\int_0^T \left( \|x_j(\tau)\|^2_{\mathbb{R}^d} + \|f_j(\tau)\|^2_H \right) d\tau. \qquad (4.4)$$
Let $\alpha_j := C_3\|I - \Pi_j\|^2$. Applying Gronwall's inequality to Equation 4.4, we get
$$\|x_j(t)\|^2_{\mathbb{R}^d} + \|f_j(t)\|^2_H \leq \alpha_j e^{C_2 T}. \qquad (4.5)$$
As $j \to \infty$ we have $\alpha_j \to 0$, which implies $x_j(t) \to 0$ and $f_j(t) \to 0$ uniformly on $[0, T]$. Therefore the finite dimensional approximations converge to the infinite dimensional estimates in $\mathbb{R}^d \times H$.

Figure 6: Experimental setup and definition of basis functions
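Theorem 4 ties the approximation error to $\|I - \Pi_j\|$, which for kernel subspaces is governed by how well the centers fill the domain. The following sketch, an added illustration with an assumed kernel and target function, estimates this error numerically: for $f \in H$, the $H$-orthogonal projection onto $\text{span}\{k_{s_i}\}$ coincides with the kernel interpolant at the centers, so the sup-norm interpolation error tracks the projection error as the number of uniformly spaced centers grows.

```python
import numpy as np

# Sketch: sup-norm error of the kernel interpolant (= H-orthogonal projection
# for f in H) onto span{k_{s_1}, ..., k_{s_n}}. Kernel and target are assumed.
k = lambda x, y: np.exp(-np.abs(x - y) ** 2 / 0.25 ** 2)
f = lambda x: np.sin(2.0 * np.pi * x)

x_fine = np.linspace(0.0, 1.0, 2001)
for n in (5, 10, 20, 40, 80):
    s = np.linspace(0.0, 1.0, n)
    K = k(s[:, None], s[None, :]) + 1e-10 * np.eye(n)   # regularized Gram matrix
    coef = np.linalg.solve(K, f(s))                     # interpolation conditions
    err = np.abs(k(x_fine[:, None], s[None, :]) @ coef - f(x_fine)).max()
    print(n, err)
```

The small ridge term is needed in practice because the Gaussian Gram matrix becomes severely ill-conditioned as the centers crowd together, the same phenomenon quantified in Section 5.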
5. Numerical Simulations
A schematic representation of a quarter car model consisting of a chassis, suspension, and road measuring device is shown in Fig. 6. In this simple model the displacements of the car suspension and chassis are $x_1$ and $x_2$, respectively. The arc length $s$ measures the distance along the track that the vehicle follows. The equation of motion for the two DOF model has the form
$$M\ddot{x}(t) + C\dot{x}(t) + Kx(t) = B_0 f(s(t)), \qquad (5.1)$$
with the mass matrix $M \in \mathbb{R}^{2\times 2}$, the stiffness matrix $K \in \mathbb{R}^{2\times 2}$, the damping matrix $C \in \mathbb{R}^{2\times 2}$, and the control influence vector $B_0 \in \mathbb{R}^{2\times 1}$ in this example. The road profile is denoted by the unknown function $f: \mathbb{R} \to \mathbb{R}$. For simulation purposes, the car is assumed to traverse a circular path of radius $R$, so that we restrict attention to periodic road profiles $f: [0, 2\pi R] \to \mathbb{R}$. To illustrate the methodology, we first assume that the unknown function $f$ is restricted to the class of uncertainty mentioned in Equation 1.4 and therefore can be approximated as
$$f(\cdot) = \sum_{i=1}^n \alpha_i^* k_{x_i}(\cdot), \qquad (5.2)$$
with $n$ the number of basis functions, $\alpha_i^*$ the true unknown coefficients to be estimated, and $k_{x_i}(\cdot)$ basis functions over the circular domain. Hence the state space equation can be written in the form
$$\dot{x}(t) = Ax(t) + B\sum_{i=1}^n \alpha_i^* k_{x_i}(s(t)), \qquad (5.3)$$
where the state vector is $x = [\dot{x}_1, x_1, \dot{x}_2, x_2]^T$, the system matrix $A \in \mathbb{R}^{4\times 4}$, and the control influence matrix $B \in \mathbb{R}^{4\times 1}$. For the quarter car model shown in Fig. 6 we derive the matrices
$$A = \begin{bmatrix} -\frac{c}{m_1} & -\frac{(k_1 + k_2)}{m_1} & \frac{c}{m_1} & \frac{k_2}{m_1} \\ 1 & 0 & 0 & 0 \\ \frac{c}{m_2} & \frac{k_2}{m_2} & -\frac{c}{m_2} & -\frac{k_2}{m_2} \\ 0 & 0 & 1 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} \frac{k_1}{m_1} \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$
When we collect the states $\{\dot{x}_1, x_1, \dot{x}_2, x_2, s\}$ and append an ODE that specifies $\dot{s}(t)$ for $t \in \mathbb{R}^+$, Equation 5.3 can be written in the form of Equation 1.1. Then the finite dimensional set of coupled ODEs for the adaptive estimation problem can be written in terms of the plant dynamics, estimator equation, and learning law, which are of the form shown in Equations 1.5, 1.6, and 1.7, respectively. The constants in the equation are initialized with masses $m_1$ and $m_2$, gain $\Gamma$, stiffnesses $k_1 = 50000$ N/m and $k_2 = 30000$ N/m, and damping $c = 200$ Ns/m. With the radius $R = 4$ m, the road profile to be estimated is assumed to have the shape $f(\cdot) = \kappa\sin(2\pi\nu(\cdot))$ where $\nu = 0.04$ Hz and $\kappa = 2$.
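The assembly of these matrices is mechanical; the sketch below builds $A$ and $B$ and checks that $A$ is Hurwitz, as the Lyapunov equation $A^T P + PA = -Q$ requires. The mass values are placeholder assumptions for illustration only; the stiffness and damping follow the text.

```python
import numpy as np

# Sketch: assemble the quarter-car state-space matrices for the state
# x = [x1_dot, x1, x2_dot, x2]. The masses below are assumed values.
m1, m2 = 50.0, 300.0          # assumed suspension / chassis masses (kg)
k1, k2 = 50000.0, 30000.0     # tire and suspension stiffness (N/m)
c = 200.0                     # suspension damping (Ns/m)

A = np.array([
    [-c / m1, -(k1 + k2) / m1,  c / m1,  k2 / m1],
    [ 1.0,     0.0,             0.0,     0.0    ],
    [ c / m2,  k2 / m2,        -c / m2, -k2 / m2],
    [ 0.0,     0.0,             1.0,     0.0    ],
])
B = np.array([k1 / m1, 0.0, 0.0, 0.0])

# A must be Hurwitz for A^T P + P A = -Q to admit a positive definite P
print(np.linalg.eigvals(A).real.max())  # negative => Hurwitz
```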
Thus our adaptive estimation problem is formulated for a synthetic road profile in the RKHS $H = \overline{\text{span}}\{k_x(\cdot) \mid x \in \Omega\}$ with $k_x(\cdot) = e^{-\|x - \cdot\|^2/\sigma^2}$. The radial basis functions, each with standard deviation $\sigma = 50$, span over the range of $25^\circ$ with their centers $s_i$ evenly separated along the arc length. It is important to note that we have chosen a scattered basis that can be located at any collection of centers $\{s_i\}_{i=1}^n \subseteq \Omega$, but the uniformly spaced centers are selected to illustrate the convergence rates. Fig. 7 shows the finite dimensional estimates $\hat{f}$ of the road and the true road surface $f$ for different numbers of basis kernels ranging over $n = \{10, 20, \cdots, 100\}$. The plots in Fig. 8 show the rate of convergence of the $L^2$ error and the $C(\Omega)$ error with respect to the number of basis functions. The log along the axes in the figures refers to the natural logarithm unless explicitly specified.

Figure 7: Road surface estimates for $n = \{10, 20, \cdots, 100\}$
Figure 8: Convergence rates using Gaussian kernel for synthetic data

5.2. Experimental Road Profile Data

The road profile to be estimated in this subsection is based on the experimental data obtained from the Vehicle Terrain Measurement System shown in Fig. 9. The constants in the estimation problem are initialized to the same numerical values as in the previous subsection.
Figure 9: Experimental Data From VTMS: longitudinal elevation profiles (left, center, and right profiles) and the circular path followed by the VTMS.

In the first study in this section the adaptive estimation problem is formulated in the RKHS $H = \overline{\text{span}}\{k_x(\cdot) \mid x \in \Omega\}$ with $k_x(\cdot) = e^{-\|x - \cdot\|^2/\sigma^2}$. The radial basis functions, each with standard deviation $\sigma = 50$, span over the same range, with a collection of centers located at $\{s_i\}_{i=1}^n \subseteq \Omega$ evenly separated along the arc length. This is repeated for kernels defined using B-splines of first order and second order, respectively.

Fig. 10 shows the finite dimensional estimates of the road and the true road surface $f$ for data representing a single lap around the circular track; the finite dimensional estimates $\hat{f}_n$ are plotted for different numbers of basis kernels ranging from $n = 35$ to $n = 140$ using the Gaussian kernel as well as the second order B-splines. The finite dimensional estimates $\hat{f}_n$ of the road profile and the true road profile $f$ for data collected over multiple laps around the circular track are plotted for the first order B-splines in Fig. 11. The plots in Fig. 12 show the rate of convergence of the $L^2$ error and the $C(\Omega)$ error with respect to the number of basis functions. It is seen that the rate of convergence for the 2nd order B-spline is better as compared to the other kernels used in these examples. This corroborates the fact that smoother kernels are expected to have better convergence rates.

Also, the condition number of the Grammian matrix varies with $n$, as illustrated in Table 1 and Fig. 13. This is an important factor to consider when choosing a specific kernel for the RKHS embedding technique, since it is well known that the error in numerical estimates of solutions to linear systems is bounded above by the condition number. The implementation of the RKHS embedding method requires such a solution, which depends on the Grammian matrix of the kernel bases, at each time step. We see that the condition number of Grammian matrices for exponentials is orders of magnitude greater than that of the corresponding matrices for splines. Since the sensitivity of the solutions of linear equations is bounded by the condition numbers, it is expected that the use of exponentials could suffer from a severe loss of accuracy as the dimensionality increases. The development of preconditioning techniques for Grammian matrices constructed from radial basis functions to address this problem is an area of active research.
[Figure 10 panels: road profile (m) versus arc length. Left: Gaussian kernel estimates for n = 35, 50, 75, 100, 110, 140, with the true road surface. Right: second-order B-spline estimates for n = 50, 60, 70, 80, 90, 100, 110, 120, with the true road surface.]
Figure 10: Road surface estimates for a single lap.
[Figure 11 panel: road profile (m) versus arc length; first-order B-spline estimates for n = 30, 35, 40, 50, 60, 70, 80, 100, with the true road surface.]
Figure 11: Road surface estimate using first-order B-splines.

[Figure 12 panels: $\log(L^2\ \text{error})$ and $\log(C(\Omega)\ \text{error})$ versus $\log(n)$ for Gaussian radial basis functions, first-order B-splines, and second-order B-splines.]
Figure 12: Convergence rates for different kernels.

n      Cond. No. (first-order B-splines)   Cond. No. (second-order B-splines)   Cond. No. (Gaussian kernels)
10     0.6646                              0.3882                               0.0001
20     1.0396                              0.9336                               0.0017
30     1.4077                              1.5045                               0.0029
40     1.7737                              2.0784                               0.0074
50     2.1388                              2.6535                               0.0167
60     2.5035                              3.2293                               0.0102
70     2.8678                              3.8054                               0.0542
80     3.2321                              4.3818                               0.0571
90     3.5962                              4.9583                               0.7624
100    3.9602                              5.5350                               1.3630

Table 1: Condition number of the Grammian matrix vs. number of basis functions (each column scaled by a fixed power of ten).
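A minimal sketch, under assumed kernel widths and an assumed track length, of how the trend in Table 1 and Fig. 13 can be examined in Python: assemble the Grammian on $n$ evenly spaced centers and compute its 2-norm condition number. The hat function below stands in for the first-order B-spline kernel; the widths are illustrative choices, not the paper's values.

```python
import numpy as np

def gram_condition(kernel, n, length=1000.0):
    """2-norm condition number of the Grammian K[i, j] = k(s_i, s_j)
    for n centers evenly spaced along the arclength."""
    s = np.linspace(0.0, length, n)
    K = kernel(np.subtract.outer(s, s))
    return np.linalg.cond(K)

# Translation-invariant kernels written as functions of r = s_i - s_j.
gaussian = lambda r: np.exp(-(r / 50.0) ** 2)            # sigma = 50 as above
hat = lambda r: np.maximum(1.0 - np.abs(r) / 40.0, 0.0)  # hat kernel, width 40 (assumed)

for n in range(10, 101, 10):
    print(f"n={n:3d}  gaussian: {gram_condition(gaussian, n):.3e}  "
          f"hat: {gram_condition(hat, n):.3e}")
```

The rapid growth of the Gaussian Grammian's condition number with $n$, relative to the compactly supported spline kernel, is exactly the effect that motivates the preconditioning research noted above.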
[Figure 13: condition number, log(ρ), versus number of basis functions n (10 to 100) for the Gaussian kernel, first-order B-spline, and second-order B-spline.]
Figure 13: Condition Number of Grammian Matrix vs Number of Basis Functions
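As a quantitative complement to the log-log plots in Figs. 8 and 12, the empirical convergence rate can be read off as the least-squares slope of $\log(\text{error})$ versus $\log(n)$. A small sketch with hypothetical error values (not the data behind the figures):

```python
import numpy as np

def empirical_rate(ns, errors):
    """Least-squares slope of log(error) versus log(n), i.e. the
    exponent r in the model error ~ C * n**r (natural logs, as in
    the figures)."""
    slope, _intercept = np.polyfit(np.log(ns), np.log(errors), 1)
    return slope

# Hypothetical error sequence, for illustration only.
ns = np.array([35, 50, 75, 100, 110, 140])
l2_err = np.array([0.52, 0.38, 0.26, 0.20, 0.18, 0.15])
print(f"empirical L2 rate: {empirical_rate(ns, l2_err):.2f}")
```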
6. Conclusions
In this paper, we introduced a novel framework based on RKHS embedding to study online adaptive estimation problems. The applicability of this framework to estimation problems that involve high dimensional scattered data approximation provides the motivation for the theory and algorithms described in this paper. A brief overview of the background theory on RKHS enables rigorous derivation of the results in Sections 3 and 4. In this paper we derive (1) sufficient conditions for the existence and uniqueness of solutions to the RKHS embedding problem, (2) the stability and convergence of the state estimation error, and (3) the convergence of the finite dimensional approximate solutions to the solution of the infinite dimensional state space problem. To illustrate the utility of this approach, a simplified numerical example of adaptive estimation of a road profile is studied and the results are critically analyzed. It would be of further interest to see the ramifications of using multiscale kernels to achieve semi-optimal convergence rates for functions in a scale of Sobolev spaces. It would likewise be important to extend this framework to adaptive control problems, examine the consequences of persistency of excitation conditions in the RKHS setting, and further extend the approach to adaptively generate bases over the state space.