Recursive Least Squares with Variable-Direction Forgetting -- Compensating for the loss of persistency
Ankit Goel, Adam L. Bruce, and Dennis S. Bernstein

POC: A. Goel ([email protected])
The ability to estimate parameters depends on two things, namely, identifiability [1], which is the ability to distinguish distinct parameters, and persistent excitation, which refers to the spectral content of the signals needed to ensure convergence of the parameter estimates to the true parameter values [2]–[4]. Roughly speaking, the level of persistency must be commensurate with the number of unknown parameters. For example, a harmonic input has two-dimensional persistency and thus can be used to identify two parameters, whereas white noise is sufficiently persistent for identifying an arbitrary number of parameters. Within the context of adaptive control, persistent excitation is needed to avoid bursting [5]; recent research has focused on relaxing these requirements [6]–[8].

Under persistent excitation, a key issue in practice is the rate of convergence, especially under changing conditions. For example, the parameters of a system may change abruptly, and the goal is to ensure fast convergence to the modified parameter values. In this case, it turns out that the rate of convergence depends on the ability to forget past parameters and incorporate new information. As discussed in "Summary," the ability to accommodate new information depends on the ability to forget; the ability to forget is thus crucial to the ability to learn. This paradox is widely recognized, and effective forgetting is of intense interest in machine learning [9]–[12].

In the first half of the present article, classical forgetting within the context of recursive least squares (RLS) is considered. In the classical RLS formulation [13]–[16], a constant forgetting factor $\lambda \in (0,1]$ can be set by the user. However, it often occurs in practice that the performance of RLS is extremely sensitive to the choice of $\lambda$, and suitable values, which are typically close to 1, are found by trial-and-error testing. This difficulty has motivated extensions of classical RLS in the form of variable-rate forgetting [17]–[23], constant-trace adjustment, covariance resetting, and covariance modification [24], [25].

In the second half of this article, variable-direction forgetting (VDF), a technique that compensates for the loss of persistency, is considered. During periods of lost excitation, new information is confined to a limited number of directions. The goal of VDF is thus to determine these directions and thereby constrain forgetting to the directions in which new information is available. VDF allows RLS to operate without divergence during periods of loss of persistency.

The goal of this tutorial article is to investigate the effect of forgetting within the context of RLS in order to motivate the need for VDF. With this motivation in mind, the article develops and illustrates RLS with VDF. The presentation is intended for graduate students who may wish to understand and apply this technique to system identification for modeling and adaptive control. Tables 1 and 2 summarize the results and examples in this article. Some of the content in this article appeared in preliminary form in [33].

Although, in practical applications, all sensor measurements are corrupted by noise, the effect of sensor noise is not considered in this article in order to focus on the loss of persistency. Alternative interpretations of RLS in the special case of zero-mean, white sensor noise are presented in "RLS as a One-Step Optimal Predictor" and "RLS as a Maximum Likelihood Estimator."
Recursive Least Squares

Consider the model

$$y_k = \phi_k \theta, \qquad (1)$$

where, for all $k \ge 0$, $y_k \in \mathbb{R}^p$ is the measurement, $\phi_k \in \mathbb{R}^{p \times n}$ is the regressor matrix, and $\theta \in \mathbb{R}^n$ is the vector of unknown parameters. The goal is to estimate $\theta$ as new data become available. One approach to this problem is to minimize the quadratic cost function

$$J_k(\hat\theta) \triangleq \sum_{i=0}^{k} \lambda^{k-i} (y_i - \phi_i\hat\theta)^T (y_i - \phi_i\hat\theta) + \lambda^{k+1} (\hat\theta - \theta_0)^T R (\hat\theta - \theta_0), \qquad (2)$$

where $\lambda \in (0,1]$ is the forgetting factor, $R \in \mathbb{R}^{n \times n}$ is positive definite, and $\theta_0 \in \mathbb{R}^n$ is the initial estimate of $\theta$. The forgetting factor applies higher weighting to more recent data, thereby enhancing the ability of RLS to use incoming data to estimate time-varying parameters. The following result is recursive least squares.
TABLE 1: Summary of definitions and results in this article.

Definition 1: Persistently exciting regressor
Definition 2: Lyapunov stable equilibrium
Definition 3: Uniformly Lyapunov stable equilibrium
Definition 4: Globally asymptotically stable equilibrium
Definition 5: Uniformly globally geometrically stable equilibrium
Theorems 1-2: Recursive least squares (RLS)
Theorems 3-5: Lyapunov stability theorems
Theorem 6: Lyapunov analysis of RLS for $\lambda \in (0,1]$
Theorem 7: Stability analysis of RLS for $\lambda \in (0,1]$ based on $\tilde\theta_k$
Theorem S1: A quadratic cost function for variable-direction RLS
Proposition 1: Recursive update of $P_k^{-1}$ with uniform-direction forgetting
Proposition 2: Data-dependent subspace constraint on $\theta_k$
Proposition 3: Bounds on $P_k$ for $\lambda = 1$
Proposition 4: Bounds on $P_k$ for $\lambda \in (0,1)$
Proposition 5: Converse of Proposition 4
Proposition 6: Convergence of $z_k$ with uniform-direction forgetting
Proposition 7: Persistent excitation and $A_k$
Proposition 8: Recursive update of $P_k^{-1}$ with variable-direction forgetting
Proposition 9: Convergence of $z_k$ with variable-direction forgetting
Proposition 10: Bounds on $P_k$ with variable-direction forgetting

Theorem 1: For all $k \ge 0$, let $\phi_k \in \mathbb{R}^{p \times n}$ and $y_k \in \mathbb{R}^p$, let $R \in \mathbb{R}^{n \times n}$ be positive definite, and define $P_0 \triangleq R^{-1}$, $\theta_0 \in \mathbb{R}^n$, and $\lambda \in (0,1]$. Furthermore, for all $k \ge 0$, denote the minimizer of (2) by

$$\theta_{k+1} = \operatorname*{argmin}_{\hat\theta \in \mathbb{R}^n} J_k(\hat\theta). \qquad (3)$$

Then, for all $k \ge 0$, $\theta_{k+1}$ is given by

$$P_{k+1} = \frac{1}{\lambda} P_k - \frac{1}{\lambda} P_k \phi_k^T (\lambda I_p + \phi_k P_k \phi_k^T)^{-1} \phi_k P_k, \qquad (4)$$

$$\theta_{k+1} = \theta_k + P_{k+1} \phi_k^T (y_k - \phi_k \theta_k). \qquad (5)$$

Proof:
See [13]. $\square$
The following result is a variation of Theorem 1, in which the order of the updates of $P_k$ and $\theta_k$ is reversed.
TABLE 2: Summary of examples in this article.

Example 1: $P_k$ converges to zero without persistent excitation
Example 2: Persistent excitation and bounds on $P_k^{-1}$
Example 3: Lack of persistent excitation and bounds on $P_k^{-1}$
Example 4: Convergence of $z_k$ and $\theta_k$
Example 5: Using $\kappa(P_k)$ to determine whether $(\phi_k)_{k=0}^\infty$ is persistently exciting
Example 6: Effect of $\lambda$ on the rate of convergence of $\theta_k$
Example 7: Lack of persistent excitation in scalar estimation
Example 8: Subspace-constrained regressor
Example 9: Effect of lack of persistent excitation on $\theta_k$
Example 10: Lack of persistent excitation and the information-rich subspace
Example 11: Variable-direction forgetting for a regressor lacking persistent excitation
Example 12: Effect of variable-direction forgetting on $\theta_k$

Theorem 2: For all $k \ge 0$, let $\phi_k \in \mathbb{R}^{p \times n}$ and $y_k \in \mathbb{R}^p$, let $R \in \mathbb{R}^{n \times n}$ be positive definite, and define $P_0 \triangleq R^{-1}$, $\theta_0 \in \mathbb{R}^n$, and $\lambda \in (0,1]$. Furthermore, for all $k \ge 0$, denote the minimizer of (2) by (3). Then, for all $k \ge 0$, $\theta_{k+1}$ is given by

$$\theta_{k+1} = \theta_k + P_k\phi_k^T(\lambda I_p + \phi_k P_k \phi_k^T)^{-1}(y_k - \phi_k\theta_k), \qquad (6)$$

$$P_{k+1} = \frac{1}{\lambda} P_k - \frac{1}{\lambda} P_k\phi_k^T(\lambda I_p + \phi_k P_k\phi_k^T)^{-1}\phi_k P_k. \qquad (7)$$

Proof:
See [13]. $\square$
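To make the recursions in Theorems 1 and 2 concrete, the following Python/NumPy sketch implements one RLS step with forgetting. This is illustrative code written for this tutorial rather than an implementation from [13]; the names are arbitrary, and the Theorem 2 ordering (6), (7) is used so that the gain is formed before the covariance update.

```python
import numpy as np

def rls_step(theta, P, phi, y, lam=1.0):
    """One RLS step with forgetting, following (6) and (7) of Theorem 2.

    theta : (n,) current estimate; P : (n, n) covariance;
    phi : (p, n) regressor matrix; y : (p,) measurement; lam : forgetting factor.
    """
    p = phi.shape[0]
    S = lam * np.eye(p) + phi @ P @ phi.T       # lambda*I_p + phi_k P_k phi_k^T
    K = P @ phi.T @ np.linalg.inv(S)            # gain appearing in (6)
    theta = theta + K @ (y - phi @ theta)       # (6)
    P = (P - K @ phi @ P) / lam                 # (7)
    return theta, P
```

Starting from $\theta_0$ and $P_0 = R^{-1}$ and calling this function on the data $(\phi_k, y_k)$ reproduces the minimizer (3) at every step.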
Proposition 1:
Let $\lambda \in (0,\infty)$, and let $(P_k)_{k=0}^\infty$ be a sequence of $n \times n$ positive-definite matrices. Then, for all $k \ge 0$, $(P_k)_{k=0}^\infty$ satisfies (4) if and only if, for all $k \ge 0$, $(P_k)_{k=0}^\infty$ satisfies

$$P_{k+1}^{-1} = \lambda P_k^{-1} + \phi_k^T \phi_k. \qquad (8)$$

Proof:
To prove necessity, it follows from (8) and the matrix-inversion lemma that

$$P_{k+1} = (\lambda P_k^{-1} + \phi_k^T\phi_k)^{-1} = (\lambda P_k^{-1})^{-1} - (\lambda P_k^{-1})^{-1}\phi_k^T\big(I_p + \phi_k(\lambda P_k^{-1})^{-1}\phi_k^T\big)^{-1}\phi_k(\lambda P_k^{-1})^{-1} = \frac{1}{\lambda}P_k - \frac{1}{\lambda}P_k\phi_k^T(\lambda I_p + \phi_k P_k\phi_k^T)^{-1}\phi_k P_k.$$

Reversing these steps proves sufficiency. $\square$
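Proposition 1 states that the covariance form (4) and the information form (8) generate the same sequence. A quick numerical check of this equivalence on random data (a sketch, with arbitrary dimensions and forgetting factor):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 4, 2, 0.95
P = np.eye(n)                       # P_0 = R^{-1} with R = I_n
Pinv = np.eye(n)                    # information matrix P_0^{-1}
for k in range(50):
    phi = rng.standard_normal((p, n))
    S = lam * np.eye(p) + phi @ P @ phi.T
    P = (P - P @ phi.T @ np.linalg.inv(S) @ phi @ P) / lam   # (4)
    Pinv = lam * Pinv + phi.T @ phi                          # (8)
    assert np.allclose(np.linalg.inv(P), Pinv)               # Proposition 1
```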
Let $k \ge 0$. By defining the parameter error

$$\tilde\theta_k \triangleq \theta_k - \theta, \qquad (9)$$

it follows that

$$\phi_i\theta_k - y_i = \phi_i\tilde\theta_k. \qquad (10)$$

Using (10) with $k$ replaced by $k+1$, it follows that the minimum value of $J_k$ is given by

$$J_k(\theta_{k+1}) = \sum_{i=0}^k \lambda^{k-i}\tilde\theta_{k+1}^T\phi_i^T\phi_i\tilde\theta_{k+1} + \lambda^{k+1}(\tilde\theta_{k+1} - \tilde\theta_0)^T R(\tilde\theta_{k+1} - \tilde\theta_0). \qquad (11)$$

Furthermore, (5) and (9) imply that $\tilde\theta_k$ satisfies

$$\tilde\theta_{k+1} = (I_n - P_{k+1}\phi_k^T\phi_k)\tilde\theta_k \qquad (12)$$
$$\phantom{\tilde\theta_{k+1}} = \lambda P_{k+1}P_k^{-1}\tilde\theta_k. \qquad (13)$$

Finally, it follows from (13) that, for all $k, l \ge 0$,

$$\tilde\theta_k = \lambda^{k-l} P_k P_l^{-1}\tilde\theta_l. \qquad (14)$$

The following result shows that the estimate $\theta_k$ of $\theta$ is constrained to a data-dependent subspace. Let $\mathcal{R}(A)$ denote the range of the matrix $A$.
For all $k \ge 0$, let $\phi_k \in \mathbb{R}^{p\times n}$ and $y_k \in \mathbb{R}^p$, let $R \in \mathbb{R}^{n\times n}$ be positive definite, let $\theta_0 \in \mathbb{R}^n$, let $\lambda \in (0,1]$, and define $\theta_{k+1}$ by (3). Then, $\theta_{k+1}$ satisfies

$$\Big(\sum_{i=0}^k \lambda^{k-i}\phi_i^T\phi_i + \lambda^{k+1}R\Big)\theta_{k+1} = \sum_{i=0}^k \lambda^{k-i}\phi_i^T y_i + \lambda^{k+1}R\theta_0. \qquad (15)$$

Furthermore,

$$\theta_{k+1} \in \mathcal{R}\big(\Phi_k^T\Phi_k + R^{-1}\Phi_k^T\Phi_k R^{-1} + \theta_0\theta_0^T\big), \qquad (16)$$

where

$$\Phi_k \triangleq [\phi_0^T \ \cdots \ \phi_k^T]^T \in \mathbb{R}^{(k+1)p\times n}. \qquad (17)$$

Proof:
Note that $J_k(\hat\theta) = \hat\theta^T A_k\hat\theta + \hat\theta^T b_k + c_k$, where

$$A_k \triangleq \sum_{i=0}^k \lambda^{k-i}\phi_i^T\phi_i + \lambda^{k+1}R, \qquad b_k \triangleq -2\sum_{i=0}^k \lambda^{k-i}\phi_i^T y_i - 2\lambda^{k+1}R\theta_0, \qquad c_k \triangleq \sum_{i=0}^k \lambda^{k-i}y_i^T y_i + \lambda^{k+1}\theta_0^T R\theta_0.$$

Since $A_k$ is positive definite, it follows from Lemma 1 in [13] that the minimizer $\theta_{k+1}$ of $J_k$ satisfies (15). Next, define $W_k \triangleq \operatorname{diag}(\lambda^{-1}I_p, \ldots, \lambda^{-1-k}I_p) \in \mathbb{R}^{(k+1)p\times(k+1)p}$. Using (15) and Lemma 1 from "Three Useful Lemmas," it follows that

$$\theta_{k+1} = \big(I_n + R^{-1}\Phi_k^T W_k\Phi_k\big)^{-1}\Big(\sum_{i=0}^k \lambda^{-i-1}R^{-1}\phi_i^T y_i + \theta_0\Big) \in \sum_{i=0}^k \mathcal{R}\big([\Phi_k^T \ \ R^{-1}\phi_i^T]\big) + \mathcal{R}\big([\Phi_k^T \ \ \theta_0]\big) = \mathcal{R}\big([\Phi_k^T \ \ R^{-1}\Phi_k^T \ \ \theta_0]\big) = \mathcal{R}\big(\Phi_k^T\Phi_k + R^{-1}\Phi_k^T\Phi_kR^{-1} + \theta_0\theta_0^T\big). \ \square$$

Table 3 summarizes various expressions for the RLS variables.

TABLE 3: Alternative expressions for the RLS variables.

$P_k$:
• $P_{k+1} = \frac{1}{\lambda}P_k - \frac{1}{\lambda}P_k\phi_k^T(\lambda I_p + \phi_kP_k\phi_k^T)^{-1}\phi_kP_k$ (4)
• $P_{k+1}^{-1} = \lambda P_k^{-1} + \phi_k^T\phi_k$ (8)
• $P_{k+1}^{-1} = \lambda^{k+1}P_0^{-1} + \sum_{i=0}^k \lambda^{k-i}\phi_i^T\phi_i$ (8)

$\theta_k$:
• $\theta_{k+1} = \theta_k + P_{k+1}\phi_k^T(y_k - \phi_k\theta_k)$ (5)
• $\theta_{k+1} = \theta_k + P_k\phi_k^T(\lambda I_p + \phi_kP_k\phi_k^T)^{-1}(y_k - \phi_k\theta_k)$ (6)
• $\theta_{k+1} = P_{k+1}\big(\sum_{i=0}^k \lambda^{k-i}\phi_i^Ty_i + \lambda^{k+1}P_0^{-1}\theta_0\big)$ (15)

$\tilde\theta_k$:
• $\tilde\theta_k = \theta_k - \theta$ (9)
• $\tilde\theta_{k+1} = (I_n - P_{k+1}\phi_k^T\phi_k)\tilde\theta_k$ (12)
• $\tilde\theta_{k+1} = \lambda P_{k+1}P_k^{-1}\tilde\theta_k$ (13)
• $\tilde\theta_k = \lambda^{k-l}P_kP_l^{-1}\tilde\theta_l$ (14)
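Equation (15) characterizes the RLS estimate in batch form: $\theta_{k+1}$ solves regularized normal equations whose coefficients can themselves be accumulated recursively. The sketch below (hypothetical code written for this tutorial) verifies at every step that the recursive estimate from (6), (7) agrees with the solution of (15).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 3, 1, 0.9
theta_true = np.array([1.0, -2.0, 0.5])
theta, P = np.zeros(n), np.eye(n)   # theta_0 = 0, P_0 = R^{-1} with R = I_n
A, b = np.eye(n), np.zeros(n)       # running A_k and right-hand side of (15)
for k in range(30):
    phi = rng.standard_normal((p, n))
    y = phi @ theta_true
    S = lam * np.eye(p) + phi @ P @ phi.T          # recursive form, (6) and (7)
    K = P @ phi.T @ np.linalg.inv(S)
    theta = theta + K @ (y - phi @ theta)
    P = (P - K @ phi @ P) / lam
    A = lam * A + phi.T @ phi                      # sum lam^{k-i} phi_i^T phi_i + lam^{k+1} R
    b = lam * b + phi.T @ y                        # sum lam^{k-i} phi_i^T y_i + lam^{k+1} R theta_0
    assert np.allclose(theta, np.linalg.solve(A, b))   # (15) holds at every step
```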
Persistent Excitation and Forgetting

This section defines persistent excitation of the regressor sequence and investigates the effect of persistent excitation and forgetting on $P_k$. For all $j \ge 0$ and $k \ge j$, define

$$F_{j,k} \triangleq \sum_{i=j}^k \phi_i^T\phi_i. \qquad (18)$$

Definition 1: The sequence $(\phi_k)_{k=0}^\infty \subset \mathbb{R}^{p\times n}$ is persistently exciting if there exist $N \ge n/p$ and $\alpha, \beta \in (0,\infty)$ such that, for all $j \ge 0$,

$$\alpha I_n \le F_{j,j+N} \le \beta I_n. \qquad (19)$$

Suppose that $(\phi_k)_{k=0}^\infty$ is persistently exciting and (19) is satisfied for given values of $N, \alpha, \beta$.
Then, with suitably modified values of $\alpha$ and $\beta$, (19) is satisfied for all larger values of $N$. For example, if $N$ is replaced by $2N+1$, then (19) is satisfied with $\alpha$ replaced by $2\alpha$ and $\beta$ replaced by $2\beta$.
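Over a finite data record, Definition 1 can be tested by scanning the extreme eigenvalues of the windows $F_{j,j+N}$. The following sketch is one way to do this; the function name and the single-harmonic test signal are choices made here, not constructions from the article.

```python
import numpy as np

def pe_bounds(phis, N):
    """Scan min/max eigenvalues of F_{j,j+N} over all windows in the record.

    phis : list of (p, n) regressor matrices.
    Returns the tightest constants (alpha, beta) in (19) for this record.
    """
    lo, hi = np.inf, 0.0
    for j in range(len(phis) - N):
        F = sum(phi.T @ phi for phi in phis[j:j + N + 1])   # F_{j,j+N} in (18)
        w = np.linalg.eigvalsh(F)
        lo, hi = min(lo, w[0]), max(hi, w[-1])
    return lo, hi

# two-parameter regressor driven by a single harmonic: persistently exciting
phis = [np.array([[np.sin(2*np.pi*k/17), np.sin(2*np.pi*(k-1)/17)]]) for k in range(200)]
alpha, beta = pe_bounds(phis, N=4)
print(alpha, beta)   # alpha > 0 indicates persistency over this record
```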
The following result expresses (8) in terms of $F_{0,k-1}$ in the case where $\lambda = 1$.

Lemma 1: Let $\lambda = 1$ and, for all $k \ge 0$, define $P_k$ as in Theorem 1. Then,

$$P_k^{-1} = F_{0,k-1} + P_0^{-1}. \qquad (20)$$

The following result shows that, if $(\phi_k)_{k=0}^\infty$ is persistently exciting and $\lambda = 1$, then $P_k$ converges to zero.

Proposition 3:
Assume that $(\phi_k)_{k=0}^\infty \subset \mathbb{R}^{p\times n}$ is persistently exciting, let $N, \alpha, \beta$ be given by Definition 1, let $R \in \mathbb{R}^{n\times n}$ be positive definite, define $P_0 \triangleq R^{-1}$, let $\lambda = 1$, and, for all $k \ge 0$, let $P_k$ be given by (4). Then, for all $k \ge N+1$,

$$\left\lfloor \tfrac{k}{N+1} \right\rfloor \alpha I_n + P_0^{-1} \le P_k^{-1} \le \left\lceil \tfrac{k}{N+1} \right\rceil \beta I_n + P_0^{-1}. \qquad (21)$$

Furthermore,

$$\lim_{k\to\infty} P_k = 0. \qquad (22)$$

Proof:
First, note that, for all $k \ge 0$,

$$F_{0,k} = \sum_{i=1}^{\lfloor k/(N+1)\rfloor} F_{(i-1)(N+1),\, i(N+1)-1} + F_{\lfloor k/(N+1)\rfloor (N+1),\, k} \le \sum_{i=1}^{\lceil k/(N+1)\rceil} F_{(i-1)(N+1),\, i(N+1)-1},$$

and thus

$$\left\lfloor \tfrac{k}{N+1}\right\rfloor \alpha I_n \le \sum_{i=1}^{\lfloor k/(N+1)\rfloor} F_{(i-1)(N+1),\, i(N+1)-1} \le \sum_{i=1}^{\lceil k/(N+1)\rceil} F_{(i-1)(N+1),\, i(N+1)-1} \le \left\lceil \tfrac{k}{N+1}\right\rceil \beta I_n. \qquad (23)$$

It follows from Lemma 1 and (23) that, for all $k \ge N+1$,

$$\left\lfloor\tfrac{k}{N+1}\right\rfloor\alpha I_n + P_0^{-1} \le F_{0,\lfloor k/(N+1)\rfloor(N+1)-1} + P_0^{-1} \le F_{0,k-1} + P_0^{-1} = P_k^{-1} \le F_{0,\lceil k/(N+1)\rceil(N+1)-1} + P_0^{-1} \le \left\lceil\tfrac{k}{N+1}\right\rceil\beta I_n + P_0^{-1}.$$

Finally, (22) follows from (21). $\square$

The following example shows that $\lim_{k\to\infty}P_k = 0$ does not imply that $(\phi_k)_{k=0}^\infty$ is persistently exciting.

Example 1: $P_k$ converges to zero without persistent excitation. For all $k \ge 0$, let $\phi_k = \frac{1}{\sqrt{k+1}}$, and let $\lambda = 1$. For all $N \ge 0$, note that $F_{j,j+N} \le \frac{N+1}{j+1}$, and thus there does not exist $\alpha$ satisfying (19). Hence, $(\phi_k)_{k=0}^\infty$ is not persistently exciting. However, it follows from (8) that, for all $k \ge 1$,

$$P_k^{-1} = \sum_{i=0}^{k-1}\frac{1}{i+1} + P_0^{-1}. \qquad (24)$$

Thus, $\lim_{k\to\infty}P_k = 0$. $\diamond$
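Example 1 is easy to reproduce numerically (a sketch written for this tutorial, using the scalar covariance update that appears later as (52)): the harmonic series in (24) diverges, so $P_k \to 0$ even though the excitation fades.

```python
import numpy as np

P, lam = 1.0, 1.0                 # scalar case: P_0 = 1, no forgetting
for k in range(100000):
    phi = 1.0 / np.sqrt(k + 1)    # regressor of Example 1
    P = P / (lam + P * phi**2)    # scalar form of (4); see also (52)
print(P)   # decays like 1/log(k): slowly, but to zero
```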
The following result, given in [34], shows that, if $(\phi_k)_{k=0}^\infty$ is persistently exciting and $\lambda \in (0,1)$, then $P_k$ is bounded.

Proposition 4: Assume that $(\phi_k)_{k=0}^\infty \subset \mathbb{R}^{p\times n}$ is persistently exciting, let $N, \alpha, \beta$ be given by Definition 1, let $R \in \mathbb{R}^{n\times n}$ be positive definite, define $P_0 \triangleq R^{-1}$, let $\lambda \in (0,1)$, and, for all $k \ge 0$, let $P_k$ be given by (4). Then, for all $k \ge N+1$,

$$\frac{\lambda^N(1-\lambda)\alpha}{1-\lambda^{N+1}}\, I_n \le P_k^{-1} \le \frac{\beta}{1-\lambda^{N+1}}\, I_n + P_N^{-1}. \qquad (25)$$

Proof: It follows from (8) that, for all $i \ge 0$, $\lambda P_i^{-1} \le P_{i+1}^{-1}$ and $\phi_i^T\phi_i \le P_{i+1}^{-1}$, and thus, for all $i, j \ge 0$, $\lambda^j P_i^{-1} \le P_{i+j}^{-1}$. Hence, for all $k \ge N+1$,

$$\alpha I_n \le \sum_{i=k-N-1}^{k-1}\phi_i^T\phi_i \le \sum_{i=k-N}^{k} P_i^{-1} \le (\lambda^{-N}+\cdots+1)P_k^{-1} = \frac{1-\lambda^{N+1}}{\lambda^N(1-\lambda)}P_k^{-1},$$

which proves the first inequality in (25). To prove the second inequality in (25), note that iterating (8) yields, for all $k \ge N+1$,

$$P_k^{-1} = \lambda^{k-N}P_N^{-1} + \sum_{i=N}^{k-1}\lambda^{k-1-i}\phi_i^T\phi_i \le P_N^{-1} + \big(1 + \lambda^{N+1} + \lambda^{2(N+1)} + \cdots\big)\beta I_n = P_N^{-1} + \frac{\beta}{1-\lambda^{N+1}}\, I_n,$$

where the sum is bounded by grouping the terms into windows of length $N+1$ and applying (19) to each window. $\square$

The next result, which is an immediate consequence of (8), is a converse of Proposition 4.
Proposition 5: Define $\phi_k$, $y_k$, $R$, and $P_0$ as in Theorem 1, let $\lambda \in (0,1)$, and let $P_k$ be given by (4). Furthermore, assume there exist $\alpha, \beta \in (0,\infty)$ such that, for all $k \ge 0$, $\alpha I_n \le P_k^{-1} \le \beta I_n$. Let $N \ge \frac{\lambda\beta - \alpha}{(1-\lambda)\alpha}$. Then, for all $j \ge 0$,

$$\big[(1+(1-\lambda)N)\alpha - \lambda\beta\big] I_n \le \sum_{i=j}^{j+N}\phi_i^T\phi_i \le \frac{1-\lambda^{N+1}}{\lambda^N(1-\lambda)}\beta I_n. \qquad (26)$$

Consequently, $(\phi_k)_{k=0}^\infty$ is persistently exciting.

Proof: Note that, for all $j \ge 0$,

$$\big[(1+(1-\lambda)N)\alpha - \lambda\beta\big]I_n = \alpha I_n + (1-\lambda)N\alpha I_n - \lambda\beta I_n \le P_{j+N+1}^{-1} + (1-\lambda)\sum_{i=j+1}^{j+N}P_i^{-1} - \lambda P_j^{-1} = \sum_{i=j}^{j+N}\big(P_{i+1}^{-1} - \lambda P_i^{-1}\big) = \sum_{i=j}^{j+N}\phi_i^T\phi_i,$$

which proves the first inequality in (26). To prove the second inequality in (26), note that (8) implies that, for all $i \ge 0$, $\lambda P_i^{-1} \le P_{i+1}^{-1}$ and $\phi_i^T\phi_i \le P_{i+1}^{-1}$, and thus, for all $i,j \ge 0$, $\lambda^jP_i^{-1} \le P_{i+j}^{-1}$. Hence, for all $j \ge 0$,

$$\sum_{i=j}^{j+N}\phi_i^T\phi_i \le \sum_{i=j}^{j+N}P_{i+1}^{-1} \le (\lambda^{-N}+\cdots+1)P_{j+N+1}^{-1} \le \frac{1-\lambda^{N+1}}{\lambda^N(1-\lambda)}\beta I_n.$$

Finally, it follows from Definition 1 with $N \ge \frac{\lambda\beta-\alpha}{(1-\lambda)\alpha}$, $\alpha$ replaced by $(1+(1-\lambda)N)\alpha - \lambda\beta$, and $\beta$ replaced by $\frac{1-\lambda^{N+1}}{\lambda^N(1-\lambda)}\beta$ that $(\phi_k)_{k=0}^\infty$ is persistently exciting. $\square$

The proof of Proposition 5 shows that the condition $N \ge \frac{\lambda\beta-\alpha}{(1-\lambda)\alpha}$ is needed to satisfy the lower bound in Definition 1; the upper bound in Definition 1 is satisfied for all $N \ge 0$.

Example 2: Persistent excitation and bounds on $P_k^{-1}$. Let $\phi_k = [u_k \ \ u_{k-1}]$, where $u_k$ is the periodic signal

$$u_k = \sin\frac{2\pi k}{17} + \sin\frac{2\pi k}{23} + \sin\frac{2\pi k}{\ast}. \qquad (27)$$

Figure 1 shows the singular values of $F_{j,j+N}$ for $N = 2$ and $N = 10$, as well as the singular values of $P_k^{-1}$ with the corresponding upper and lower bounds given by (25) for $N = 2$ and $N = 10$. $\diamond$

Example 3: Lack of persistent excitation and bounds on $P_k^{-1}$. Let $\phi_k = [u_k \ \ u_{k-1}]$, where, for some switching step $k_0$, $u_k$ is given by (27) for all $k < k_0$ and $u_k = 1$ for all $k \ge k_0$. Figure 2 shows the singular values of $F_{j,j+2}$ and the singular values of $P_k^{-1}$ for $\lambda = 1$ and $\lambda = 0.\ast$, respectively. Note that, for $\lambda = 1$, one of the singular values of $P_k^{-1}$ diverges, whereas, for $\lambda \in (0,1)$, one of the singular values of $P_k^{-1}$ converges to zero. $\diamond$
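The switch from persistency to non-persistency in Example 3 can be reproduced as follows. This sketch is written for this tutorial, and since the period of the third harmonic in (27) is not legible in the source, a stand-in value of 29 is used there.

```python
import numpy as np

def u(k, k0=300):
    if k >= k0:                        # input goes constant: excitation is lost
        return 1.0
    # harmonics of (27); the third period is a stand-in for the illegible value
    return np.sin(2*np.pi*k/17) + np.sin(2*np.pi*k/23) + np.sin(2*np.pi*k/29)

phis = [np.array([[u(k), u(k-1)]]) for k in range(1, 600)]
for j in (0, 500):                     # one window before and one after the switch
    F = sum(phi.T @ phi for phi in phis[j:j+3])   # F_{j,j+2}
    print(j, np.linalg.eigvalsh(F))    # after the switch the smallest eigenvalue is ~0
```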
The following result shows that the predicted error $z_k \triangleq \phi_k\theta_k - y_k$ converges to zero whether or not $(\phi_k)_{k=0}^\infty$ is persistent.

Proposition 6: For all $k \ge 0$, let $\phi_k \in \mathbb{R}^{p\times n}$ and $y_k \in \mathbb{R}^p$, let $R \in \mathbb{R}^{n\times n}$ be positive definite, and let $P_0 = R^{-1}$, $\theta_0 \in \mathbb{R}^n$, and $\lambda \in (0,1]$. Furthermore, for all $k \ge 0$, let $P_k$ and $\theta_k$ be given by (4) and (5), respectively, and define the predicted error $z_k \triangleq \phi_k\theta_k - y_k$. Then,

$$\lim_{k\to\infty} z_k = 0. \qquad (28)$$

Proof:
For all $k \ge 0$, note that $z_k = \phi_k\tilde\theta_k$, and define $V_k \triangleq \tilde\theta_k^T P_k^{-1}\tilde\theta_k$. Note that, for all $k \ge 0$ and $\tilde\theta_k \in \mathbb{R}^n$, $V_k \ge 0$. Furthermore, for all $k \ge 0$,

$$\begin{aligned}
V_{k+1} - V_k &= \tilde\theta_{k+1}^T P_{k+1}^{-1}\tilde\theta_{k+1} - \tilde\theta_k^T P_k^{-1}\tilde\theta_k = \lambda^2\tilde\theta_k^T P_k^{-1}P_{k+1}P_k^{-1}\tilde\theta_k - \tilde\theta_k^T P_k^{-1}\tilde\theta_k \\
&= (\lambda\tilde\theta_{k+1} - \tilde\theta_k)^T P_k^{-1}\tilde\theta_k = -\big[(1-\lambda)\tilde\theta_k^T + \lambda\tilde\theta_k^T\phi_k^T\phi_kP_{k+1}\big]P_k^{-1}\tilde\theta_k \\
&= -\big[(1-\lambda)\tilde\theta_k^TP_k^{-1}\tilde\theta_k + \lambda\tilde\theta_k^T\phi_k^T\phi_kP_{k+1}P_k^{-1}\tilde\theta_k\big] \\
&= -\big[(1-\lambda)\tilde\theta_k^TP_k^{-1}\tilde\theta_k + \tilde\theta_k^T\phi_k^T\big[I_p - \phi_kP_k\phi_k^T(\lambda I_p + \phi_kP_k\phi_k^T)^{-1}\big]\phi_k\tilde\theta_k\big] \\
&= -\big[(1-\lambda)V_k + z_k^T\big[I_p - \phi_kP_k\phi_k^T(\lambda I_p + \phi_kP_k\phi_k^T)^{-1}\big]z_k\big] \le 0.
\end{aligned}$$

Note that, since $(V_k)_{k=1}^\infty$ is a nonnegative, nonincreasing sequence, it converges to a nonnegative number. Hence, $\lim_{k\to\infty}(V_{k+1}-V_k) = 0$, which implies that $\lim_{k\to\infty}[(1-\lambda)V_k + z_k^TR_kz_k] = 0$, where $R_k \triangleq I_p - \phi_kP_k\phi_k^T(\lambda I_p + \phi_kP_k\phi_k^T)^{-1}$. Lemma 2 from "Three Useful Lemmas" implies that $R_k$ is positive definite. Since $V_k \ge 0$, it follows that $\lim_{k\to\infty}z_k = 0$. $\square$

The following example shows that $\theta_k$ may converge despite the fact that $(\phi_k)_{k=0}^\infty$ is not persistent.

Example 4: Convergence of $z_k$ and $\theta_k$. Consider the first-order system

$$y_k = \frac{0.\ast}{q - 0.\ast}\, u_k, \qquad (29)$$

where $q$ is the forward-shift operator. Define $\phi_k \triangleq [y_{k-1}\ \ u_{k-1}]$, so that $y_k = \phi_k\theta$, where $\theta$ consists of the coefficients in (29). To apply RLS, let $P_0 = I_2$, $\theta_0 = 0$, and $\lambda = 0.\ast$. Figure 3 shows the singular values of $F_{j,j+10}$, the predicted error $z_k$, and the parameter estimate $\theta_k$ for two choices of the input $u_k$. In the first case, for all $k \ge 0$, $u_k = 1$, whereas, in the second case, $u_k$ is a different constant. For both choices of $u_k$, the predicted error $z_k$ converges to zero, which confirms Proposition 6, and $\theta_k$ converges. Note that, in these two cases, $\theta_k$ converges to different parameter values, neither of which is the true value. $\diamond$

Table 4 summarizes the results in this section.

TABLE 4: Behavior of $P_k$ with and without persistent excitation.

Persistent excitation, $\lambda = 1$: $P_k$ converges to zero (Proposition 3; Example 2).
Persistent excitation, $\lambda \in (0,1)$: $P_k$ is bounded (Propositions 4, 5; Example 2).
No persistent excitation, $\lambda = 1$: all singular values of $P_k$ are bounded, and some of these converge to zero (Example 3).
No persistent excitation, $\lambda \in (0,1)$: some singular values of $P_k$ diverge, whereas the remaining singular values are bounded (Example 3).
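As a numerical illustration of Proposition 6 and Example 4 (a sketch with stand-in coefficients, since the coefficients of (29) are not legible in the source): with a constant, non-persistent input, the predicted error $z_k$ converges to zero while $\theta_k$ settles on a value other than the true parameters.

```python
import numpy as np

lam, a, b = 0.95, 0.6, 1.0          # stand-in coefficients for a system like (29)
theta_true = np.array([a, b])
theta, P = np.zeros(2), np.eye(2)   # theta_0 = 0, P_0 = I_2
y_prev, u_prev = 0.0, 0.0
for k in range(500):
    phi = np.array([[y_prev, u_prev]])
    y = phi @ theta_true             # y_k = a*y_{k-1} + b*u_{k-1}
    z = phi @ theta - y              # predicted error z_k
    S = lam + phi @ P @ phi.T        # 1x1 since p = 1
    K = (P @ phi.T) / S
    theta = theta + K @ (y - phi @ theta)
    P = (P - K @ phi @ P) / lam
    y_prev, u_prev = y.item(), 1.0   # constant input: not persistently exciting
print(z, theta)   # z_k -> 0 while theta has settled away from theta_true
```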
Persistent Excitation and the Condition Number

For nonsingular $A \in \mathbb{R}^{n\times n}$, the condition number of $A$ is defined by

$$\kappa(A) \triangleq \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}. \qquad (30)$$

For $B \in \mathbb{R}^{n\times m}$, let $\|B\|$ denote the maximum singular value of $B$. If $A$ is positive definite, then

$$\|A^{-1}\|^{-1} I_n = \sigma_{\min}(A) I_n \le A \le \sigma_{\max}(A) I_n = \|A\| I_n. \qquad (31)$$

Therefore, if $\alpha, \beta \in (0,\infty)$ satisfy $\alpha \le \sigma_{\min}(A)$ and $\sigma_{\max}(A) \le \beta$, then $\kappa(A) \le \beta/\alpha$. Thus, if $\lambda = 1$ and $(\phi_k)_{k=0}^\infty$ is persistently exciting with $N, \alpha, \beta$ given by Definition 1, then (21) implies that

$$\kappa(P_k) \le \frac{\beta}{\alpha}. \qquad (32)$$

Similarly, if $\lambda \in (0,1)$ and $(\phi_k)_{k=0}^\infty$ is persistently exciting with $N, \alpha, \beta$ given by Definition 1, then (25) implies that

$$\kappa(P_k) \le \frac{\beta + (1-\lambda^{N+1})\|P_N^{-1}\|}{\lambda^N(1-\lambda)\alpha}. \qquad (33)$$

However, as shown by Example 3, in the case where $(\phi_k)_{k=0}^\infty$ is not persistently exciting, there might not exist $\alpha > 0$ satisfying (19), and thus $\kappa(P_k)$ may be unbounded. Hence $\kappa(P_k)$ can be used to determine whether $(\phi_k)_{k=0}^\infty$ is persistently exciting, where a bounded condition number implies that $(\phi_k)_{k=0}^\infty$ is persistently exciting, and a diverging condition number implies that $(\phi_k)_{k=0}^\infty$ is not persistently exciting, as illustrated by the following example. [35] provides a recursive algorithm for computing $\kappa(P_k)$.

Example 5: Using the condition number of $P_k$ to determine whether $(\phi_k)_{k=0}^\infty$ is persistently exciting. Consider the 5th-order system

$$y_k = \frac{0.\ast\, q^4 - 0.\ast\, q^3 - 0.\ast\, q^2 - 0.\ast\, q + 0.\ast}{q^5 - q^4 + 0.\ast\, q^3 - 0.\ast\, q^2 - 0.\ast\, q + 0.\ast}\, u_k, \qquad (34)$$

where $u_k$ is given by (27). To apply RLS, let $\theta$ consist of the coefficients in (34) and let

$$\phi_k = [u_{k-1}\ \cdots\ u_{k-5}\ \ y_{k-1}\ \cdots\ y_{k-5}], \qquad (35)$$

so that $y_k = \phi_k\theta$. Letting $P_0 = I_{10}$, Figure 4 shows the singular values of $F_{j,j+20}$ and the singular values and condition number of $P_k$ for $\lambda = 1$ and $\lambda = 0.\ast$. In particular, the smallest singular value of $F_{j,j+20}$ is essentially zero, which indicates that $(\phi_k)_{k=0}^\infty$ is not persistently exciting. Consequently, in the case where $\lambda = 0.\ast$, $P_k$ becomes ill-conditioned. $\diamond$

In Example 5, the regressor $(\phi_k)_{k=0}^\infty$ is not persistently exciting. Consequently, in the case where $\lambda = 1$, it follows from (20) that $P_k$ is bounded by $P_0$, and thus all of the singular values of $P_k$ are bounded; this property is illustrated by Figure 4. However, Figure 4 also shows that not all of the singular values of $P_k$ converge to zero. On the other hand, in the case where $\lambda = 0.\ast$, Figure 4 shows that some of the singular values of $P_k$ are bounded, whereas the remaining singular values diverge. This example thus shows that singular values can diverge due to the lack of persistent excitation with $\lambda \in (0,1)$.
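Monitoring $\kappa(P_k)$ online is inexpensive. The sketch below is a direct computation written for this tutorial (simpler than, and not to be confused with, the recursive algorithm of [35]); the threshold is an arbitrary choice.

```python
import numpy as np

def update_and_check(P, phi, lam, kappa_max=1e8):
    """One covariance update (4) plus a conditioning check on P_k."""
    p = phi.shape[0]
    S = lam * np.eye(p) + phi @ P @ phi.T
    P = (P - P @ phi.T @ np.linalg.inv(S) @ phi @ P) / lam
    kappa = np.linalg.cond(P)
    return P, kappa, kappa > kappa_max   # True flags a likely loss of persistency
```

If the flag trips repeatedly, the examples that follow suggest either pausing forgetting or switching to the variable-direction forgetting developed later in this article.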
Lyapunov Analysis of the Parameter Error

Let $k_0 \ge 0$, and consider the system

$$x_{k+1} = f(k, x_k), \qquad (36)$$

where $x_k \in \mathbb{R}^n$, $f\colon \{0,1,2,\ldots\} \times \mathbb{R}^n \to \mathbb{R}^n$ is continuous, and, for all $k \ge 0$, $f(k,0) = 0$. Let $\mathcal{D} \subset \mathbb{R}^n$ be an open set such that $0 \in \mathcal{D}$.
Definition 2: The zero solution of (36) is Lyapunov stable if, for all $\varepsilon > 0$ and $k_0 \ge 0$, there exists $\delta(\varepsilon, k_0) > 0$ such that, for all $x_{k_0} \in \mathbb{R}^n$ satisfying $\|x_{k_0}\| < \delta(\varepsilon, k_0)$, it follows that, for all $k \ge k_0$, $\|x_k\| < \varepsilon$.

Definition 3: The zero solution of (36) is uniformly Lyapunov stable if, for all $\varepsilon > 0$, there exists $\delta(\varepsilon) > 0$ such that, for all $k_0 \ge 0$ and all $x_{k_0} \in \mathbb{R}^n$ satisfying $\|x_{k_0}\| < \delta(\varepsilon)$, it follows that, for all $k \ge k_0$, $\|x_k\| < \varepsilon$.

Definition 4: The zero solution of (36) is globally asymptotically stable if it is Lyapunov stable and, for all $k_0 \ge 0$ and all $x_{k_0} \in \mathbb{R}^n$, it follows that $\lim_{k\to\infty} x_k = 0$.

Definition 5: The zero solution of (36) is uniformly globally geometrically stable if there exist $\alpha > 0$ and $\beta > 1$ such that, for all $k_0 \ge 0$ and all $x_{k_0} \in \mathbb{R}^n$, it follows that, for all $k \ge k_0$, $\|x_k\| \le \alpha\|x_{k_0}\|\beta^{-(k-k_0)}$.

Note that, if the zero solution of (36) is uniformly globally geometrically stable, then it is uniformly globally asymptotically stable as well as uniformly Lyapunov stable. The following three results are specializations of Theorem 13.11 given in [36, pp. 784, 785].
Theorem 3: Consider (36), and assume there exist a continuous function $V\colon \{0,1,\ldots\}\times\mathcal{D} \to \mathbb{R}$ and $\alpha > 0$ such that, for all $k \ge 0$ and $x \in \mathcal{D}$,

$$V(k,0) = 0, \qquad (37)$$
$$\alpha\|x\|^2 \le V(k,x), \qquad (38)$$
$$V(k+1, f(k,x)) - V(k,x) \le 0. \qquad (39)$$

Then, the zero solution of (36) is Lyapunov stable.

Theorem 4: Consider (36), and assume there exist a continuous function $V\colon\{0,1,\ldots\}\times\mathcal{D}\to\mathbb{R}$ and $\alpha_1, \beta_1 > 0$ such that, for all $k \ge 0$ and $x \in \mathcal{D}$,

$$V(k,0) = 0, \qquad (40)$$
$$\alpha_1\|x\|^2 \le V(k,x) \le \beta_1\|x\|^2, \qquad (41)$$
$$V(k+1, f(k,x)) - V(k,x) \le 0. \qquad (42)$$

Then, the zero solution of (36) is uniformly Lyapunov stable.

Theorem 5:
Consider (36), and assume there exist a continuous function $V\colon\{0,1,\ldots\}\times\mathbb{R}^n\to\mathbb{R}$ and $\alpha_2, \beta_2, \gamma > 0$ such that, for all $k \ge 0$ and $x \in \mathbb{R}^n$,

$$\alpha_2\|x\|^2 \le V(k,x) \le \beta_2\|x\|^2, \qquad (43)$$
$$V(k+1, f(k,x)) - V(k,x) \le -\gamma\|x\|^2. \qquad (44)$$

Then, the zero solution of (36) is uniformly globally geometrically stable.

The following result uses Theorems 3-5 to prove that, if $(\phi_k)_{k=0}^\infty$ is persistently exciting, then the RLS estimate $\theta_k$ with $\lambda \in (0,1)$ converges to $\theta$ in the sense of Definition 5. A related result is given in [34].

Theorem 6: Assume that $(\phi_k)_{k=0}^\infty$ is persistently exciting, let $N, \alpha, \beta$ be given by Definition 1, let $R \in \mathbb{R}^{n\times n}$ be positive definite, define $P_0 \triangleq R^{-1}$, let $\lambda \in (0,1]$, and, for all $k \ge 0$, let $P_k$ be given by (4). Then the zero solution of (12) is Lyapunov stable. In addition, if $\lambda \in (0,1)$, then the zero solution of (12) is uniformly Lyapunov stable and uniformly globally geometrically stable.

Proof:
Define the Lyapunov candidate $V(k,x) \triangleq x^T P_k^{-1} x$, where $x \in \mathbb{R}^n$. Note that, for all $k \ge 0$, $V(k,0) = 0$, which confirms (37). Next, defining $f(k,x) \triangleq (I_n - P_{k+1}\phi_k^T\phi_k)x$, it follows that

$$\begin{aligned}
V(k+1, f(k,x)) - V(k,x) &= f(k,x)^T P_{k+1}^{-1} f(k,x) - x^T P_k^{-1} x \\
&= x^T\big[(I_n - \phi_k^T\phi_k P_{k+1})P_{k+1}^{-1}(I_n - P_{k+1}\phi_k^T\phi_k) - P_k^{-1}\big]x \\
&= x^T\big[(P_{k+1}^{-1} - \phi_k^T\phi_k)(I_n - P_{k+1}\phi_k^T\phi_k) - P_k^{-1}\big]x \\
&= x^T\big[P_{k+1}^{-1} - 2\phi_k^T\phi_k + \phi_k^T\phi_k P_{k+1}\phi_k^T\phi_k - P_k^{-1}\big]x \\
&= x^T\big[(\lambda - 1)P_k^{-1} - \phi_k^T(I_p - \phi_k P_{k+1}\phi_k^T)\phi_k\big]x. \qquad (45)
\end{aligned}$$

First, consider the case where $\lambda = 1$. It follows from (8) with $\lambda = 1$ that $P_0^{-1} \le P_k^{-1}$, and thus, for all $k \ge 0$, $\sigma_{\min}(P_0^{-1})\|x\|^2 \le V(k,x)$, which confirms (38) with $\alpha = \sigma_{\min}(P_0^{-1})$. Next, note that

$$I_p - \phi_k P_{k+1}\phi_k^T = I_p - \big[\phi_k P_k\phi_k^T - \phi_k P_k\phi_k^T(I_p + \phi_k P_k\phi_k^T)^{-1}\phi_k P_k\phi_k^T\big]. \qquad (46)$$

Using (45), (46), and Lemma 3 from "Three Useful Lemmas" yields (39). It thus follows from Theorem 3 that the zero solution of (12) is Lyapunov stable.

Next, consider the case where $\lambda \in (0,1)$. It follows from Proposition 4 that, for all $k \ge N+1$,

$$\frac{\lambda^N(1-\lambda)\alpha}{1-\lambda^{N+1}}\|x\|^2 \le V(k,x) \le \frac{\beta}{1-\lambda^{N+1}}\|x\|^2 + x^TP_N^{-1}x \le \Big(\frac{\beta}{1-\lambda^{N+1}} + \|P_N^{-1}\|\Big)\|x\|^2,$$

which confirms (41) with $\alpha_1 = \frac{\lambda^N(1-\lambda)\alpha}{1-\lambda^{N+1}}$ and $\beta_1 = \frac{\beta}{1-\lambda^{N+1}} + \|P_N^{-1}\|$. Using (45), (46), and Lemma 3 from "Three Useful Lemmas," (42) is confirmed. It thus follows from Theorem 4 that the zero solution of (12) is uniformly Lyapunov stable. Furthermore, (43) is confirmed with $\alpha_2 = \alpha_1$ and $\beta_2 = \beta_1$. Finally, if $\lambda \in (0,1)$, then

$$V(k+1, f(k,x)) - V(k,x) \le (\lambda - 1)x^TP_k^{-1}x \le -(1-\lambda)\frac{\lambda^N(1-\lambda)\alpha}{1-\lambda^{N+1}}\|x\|^2,$$

which confirms (44) with $\gamma = \frac{\lambda^N(1-\lambda)^2\alpha}{1-\lambda^{N+1}}$. It thus follows from Theorem 5 that the zero solution of (12) is uniformly globally geometrically stable. $\square$

The following result provides an alternative proof of Theorem 6 that does not depend on Theorems 3-5. In addition, this result considers the case $\lambda = 1$, where the RLS estimate $\theta_k$ converges to $\theta$ in the sense of Definition 4.

Theorem 7:
Assume that $(\phi_k)_{k=0}^\infty$ is persistently exciting, let $N, \alpha, \beta$ be given by Definition 1, let $R \in \mathbb{R}^{n\times n}$ be positive definite, define $P_0 \triangleq R^{-1}$, let $\lambda \in (0,1]$, and, for all $k \ge 0$, let $P_k$ be given by (4). Then the zero solution of (12) is globally asymptotically stable. Furthermore, if $\lambda \in (0,1)$, then the zero solution of (12) is uniformly globally geometrically stable.

Proof:
Let $k_0 \ge 0$ and $\tilde\theta_{k_0} \in \mathbb{R}^n$. Then, it follows from (14) that, for all $k \ge k_0$,

$$\|\tilde\theta_k\| = \lambda^{k-k_0}\|P_kP_{k_0}^{-1}\tilde\theta_{k_0}\| \le \|P_kP_{k_0}^{-1}\tilde\theta_{k_0}\| \le \|P_k\|\,\|P_{k_0}^{-1}\|\,\|\tilde\theta_{k_0}\|. \qquad (47)$$

First, consider the case where $\lambda = 1$. Let $\delta > 0$, and suppose that $\tilde\theta_{k_0} \in \mathbb{R}^n$ satisfies $\|\tilde\theta_{k_0}\| < \delta$. It follows from (8) with $\lambda = 1$ that $\|P_k\| \le \|P_0\|$, and thus from (47) that, for all $k \ge k_0$, $\|\tilde\theta_k\| < \|P_0\|\,\|P_{k_0}^{-1}\|\,\delta$. It thus follows from Definition 2 with $\varepsilon = \|P_0\|\,\|P_{k_0}^{-1}\|\,\delta$ that the zero solution of (12) is Lyapunov stable. Next, let $\tilde\theta_0 \in \mathbb{R}^n$. Then, Proposition 3 implies that $\lim_{k\to\infty}\tilde\theta_k = \lim_{k\to\infty}P_kP_0^{-1}\tilde\theta_0 = 0$. It thus follows from Definition 4 that the zero solution of (12) is globally asymptotically stable.

Next, consider the case where $\lambda \in (0,1)$. Let $k_0 \ge 0$ and $\delta > 0$, and let $\tilde\theta_{k_0} \in \mathbb{R}^n$ satisfy $\|\tilde\theta_{k_0}\| < \delta$. It follows from Proposition 4 and (47) that, for all $k \ge \max(N+1, k_0)$, $\|\tilde\theta_k\| < \varepsilon$, where

$$\varepsilon \triangleq \frac{\beta + (1-\lambda^{N+1})\|P_N^{-1}\|}{\lambda^N(1-\lambda)\alpha}\,\delta.$$

It thus follows from Definition 3 that the zero solution of (12) is uniformly Lyapunov stable. Next, let $\tilde\theta_{k_0} \in \mathbb{R}^n$. Then, it follows from (14) and Proposition 4 that, for all $\tilde\theta_{k_0} \in \mathbb{R}^n$ and $k \ge \max(N+1, k_0)$, $\|\tilde\theta_k\| \le \bar\alpha\|\tilde\theta_{k_0}\|\bar\beta^{-(k-k_0)}$, where $\bar\beta \triangleq 1/\lambda$ and

$$\bar\alpha \triangleq \frac{\beta + (1-\lambda^{N+1})\|P_N^{-1}\|}{\lambda^N(1-\lambda)\alpha}.$$

It thus follows from Definition 5 that the zero solution of (12) is uniformly globally geometrically stable, and thus globally asymptotically stable. $\square$

The following result shows that persistent excitation produces an infinite sequence of matrices whose product converges to zero.
Proposition 7: Let $P_0 \in \mathbb{R}^{n\times n}$ be positive definite, let $\lambda \in (0,1]$, and, for all $k \ge 0$, let $P_k$ be given by (4). Then, for all $k \ge 0$, all of the eigenvalues of $P_{k+1}\phi_k^T\phi_k$ are contained in $[0,1]$. If, in addition, $(\phi_k)_{k=0}^\infty$ is persistently exciting, then

$$\lim_{k\to\infty} A_k = 0, \qquad (48)$$

where

$$A_k \triangleq (I_n - P_{k+1}\phi_k^T\phi_k)\cdots(I_n - P_1\phi_0^T\phi_0). \qquad (49)$$

Proof:
It follows from (8) that, for all $k \ge 0$, $\phi_k^T\phi_k \le P_{k+1}^{-1}$, and thus, for all $k \ge 0$, $P_{k+1}^{1/2}\phi_k^T\phi_kP_{k+1}^{1/2} \le I_n$. Hence, for all $k \ge 0$,

$$0 \le \lambda_{\max}(P_{k+1}\phi_k^T\phi_k) = \lambda_{\max}(P_{k+1}^{1/2}\phi_k^T\phi_kP_{k+1}^{1/2}) \le 1.$$

To prove (48), suppose that $(\phi_k)_{k=0}^\infty$ is persistently exciting, let $i \in \{1,\ldots,n\}$, and define $\theta_0 \triangleq e_i + \theta$, where $e_i$ is the $i$th column of $I_n$. Note that $\tilde\theta_0 \triangleq \theta_0 - \theta = e_i$. Then, (14) implies that, for all $k \ge 0$,

$$\tilde\theta_{k+1} = A_ke_i = \lambda^{k+1}P_{k+1}P_0^{-1}e_i. \qquad (50)$$

It follows from Theorem 7 that $\tilde\theta_k$ converges to zero. Hence, (50) implies that the $i$th column of $A_k$ converges to zero as $k \to \infty$. It thus follows that every column of $A_k$ converges to zero as $k \to \infty$, which implies (48). $\square$

It follows from Theorem 7 that, if $(\phi_k)_{k=0}^\infty$ is persistently exciting, then, for all $\lambda \in (0,1]$, $\tilde\theta_k$ converges to zero. In addition, if $\lambda \in (0,1)$, then $\tilde\theta_k$ converges to zero geometrically, and thus the rate of convergence of $\|\tilde\theta_k\|$ is $O(\lambda^k)$. However, in the case $\lambda = 1$, as shown in [34] and the next example, $\tilde\theta_k$ converges to zero as $O(1/k)$, and thus the convergence is not geometric.

Example 6: Effect of $\lambda$ on the rate of convergence of $\theta_k$. Consider the 3rd-order FIR system

$$y_k = \frac{q^2 + 0.\ast\, q + 0.\ast}{q^3}\, u_k. \qquad (51)$$

To apply RLS, let $\theta = [1\ \ 0.\ast\ \ 0.\ast]^T$, $\theta_0 = 0$, and $\phi_k = [u_{k-1}\ \ u_{k-2}\ \ u_{k-3}]$, where the input $u_k$ is zero-mean Gaussian white noise with standard deviation 1. Note that $(\phi_k)_{k=0}^\infty$ is persistently exciting. It thus follows from Theorem 7 that $\tilde\theta_k$ converges to zero. Figure 5 shows the parameter-error norm $\|\tilde\theta_k\|$ for several values of $P_0$ and $\lambda$ as well as the condition number of the corresponding $P_k$. Note that the convergence rate of $\|\tilde\theta_k\|$ is $O(1/k)$ for $\lambda = 1$ and geometric for all $\lambda \in (0,1)$. Furthermore, as $\lambda$ is decreased, the convergence rate of $\theta_k$ increases; however, the condition number of $P_k$ degrades, and the effect of $P_0$ is reduced. $\diamond$

Lack of Persistent Excitation

This section presents numerical examples to investigate the effect of lack of persistent excitation. As shown in Example 3 and Example 5, if $(\phi_k)_{k=0}^\infty$ is not persistently exciting and $\lambda = 1$, then some of the singular values of $P_k$ converge to zero, whereas the remaining singular values remain bounded. On the other hand, if $(\phi_k)_{k=0}^\infty$ is not persistently exciting and $\lambda \in (0,1)$, then some of the singular values of $P_k$ remain bounded, whereas the remaining singular values diverge. Furthermore, Proposition 6 implies that the predicted error $z_k$ converges to zero whether or not $(\phi_k)_{k=0}^\infty$ is persistent.

Example 7: Lack of persistent excitation in scalar estimation. Let $n = 1$, so that (4), (5) are given by

$$P_{k+1} = \frac{P_k}{\lambda + P_k\phi_k^2}, \qquad (52)$$
$$\tilde\theta_{k+1} = \frac{\lambda\tilde\theta_k}{\lambda + P_k\phi_k^2}. \qquad (53)$$

Now, let $k_0 \ge 0$ and assume that, for all $k \ge k_0$, $\phi_k = 0$. Therefore, $F_{j,j+N}$ cannot be lower bounded as in (19) for all $j \ge 0$ and $N \ge 0$, and thus $(\phi_k)_{k=0}^\infty$ is not persistently exciting. Furthermore, in the case where $\lambda = 1$, it follows from the fact that $\phi_k = 0$ for all $k \ge k_0$ that $P_k$ and $\tilde\theta_k$ converge in $k_0$ steps to $P_{k_0}$ and $\tilde\theta_{k_0}$, respectively. Furthermore, if $\theta_{k_0} = \theta$, then $\tilde\theta_{k_0} = 0$. However, in the case where $\lambda \in (0,1)$, it follows that $P_k$ diverges geometrically, whereas, as in the case where $\lambda = 1$, $\tilde\theta_k$ converges in $k_0$ steps. Therefore, for all $\lambda \in (0,1]$, since $\phi_k = 0$ for all $k \ge k_0$, it follows from (52) and (53) that, for all $k \ge k_0$, the minimum value of (2) is achieved in a finite number of steps.
Consequently, RLS provides no further refinement of the estimate $\theta_k$ of $\theta$, and thus $\tilde\theta_{k_0} \ne 0$ implies that $\theta_k$ does not converge to $\theta$.

Alternatively, assume that, for all $k \ge 0$, $\phi_k = \phi$, where $\phi \ne 0$. Then it follows from Definition 1 with $N = 1$, $\alpha = \phi^2$, and $\beta = 3\phi^2$ that $(\phi_k)_{k=0}^\infty$ is persistently exciting. If $\lambda = 1$, then both $P_k$ and $\tilde\theta_k$ converge to zero. However, if $\lambda \in (0,1)$, then $P_k$ converges to $\frac{1-\lambda}{\phi^2}$ and $\tilde\theta_k$ converges geometrically to zero. Table 5 shows the asymptotic behavior of $\tilde\theta_k$ and $P_k$ for both of these cases. $\diamond$

TABLE 5: Asymptotic behavior of RLS in Example 7. In the case of persistent excitation with $\lambda < 1$, the convergence of $\tilde\theta_k$ is geometric.

Not persistently exciting, $\lambda = 1$: $\tilde\theta_k \to \tilde\theta_{k_0}$, $P_k \to P_{k_0}$.
Not persistently exciting, $\lambda \in (0,1)$: $\tilde\theta_k \to \tilde\theta_{k_0}$, $P_k$ diverges.
Persistently exciting, $\lambda = 1$: $\tilde\theta_k \to 0$, $P_k \to 0$.
Persistently exciting, $\lambda \in (0,1)$: $\tilde\theta_k \to 0$, $P_k \to \frac{1-\lambda}{\phi^2}$.
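The scalar recursions (52), (53) make the failure mode of Example 7 visible in a few lines (a sketch written for this tutorial): once the regressor vanishes, $\tilde\theta_k$ freezes while $P_k$ grows like $\lambda^{-k}$.

```python
import numpy as np

lam, k0 = 0.9, 50
P, theta_err = 1.0, 1.0            # P_0 = 1, initial parameter error
for k in range(200):
    phi = 1.0 if k < k0 else 0.0   # excitation is lost at step k0
    d = lam + P * phi**2
    P, theta_err = P / d, lam * theta_err / d    # (52), (53)
print(P, theta_err)   # P has grown by lam**-(200 - k0); theta_err is frozen at its k0 value
```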
Example 8: Subspace-constrained regressor. Consider (1), where $\phi_k = \big(\sin\frac{\pi k}{\ast}\big)[1\ \ 1]$ and $\theta = [0.\ast\ \ 0.\ast]^T$. To estimate $\theta$ using RLS, let $P_0 = I_2$ and $\theta_0 = 0$. Figure 6 shows the estimate $\theta_k$ of $\theta$ with $\lambda = 1$ and $\lambda = 0.\ast$. Note that all regressors $\phi_k$ lie along the same one-dimensional subspace, and thus $(\phi_k)_{k=0}^\infty$ is not persistently exciting. It follows from (16) that the estimate $\theta_k$ of $\theta$ lies in this subspace. For $\lambda = 1$, note that one singular value of $P_k$ decreases to zero, whereas the other singular value is bounded. Note that $\tilde\theta_k$ converges along the singular vector corresponding to the bounded singular value. For $\lambda = 0.\ast$, one singular value is bounded, whereas the other singular value diverges. Note that $\tilde\theta_k$ converges along the singular vector corresponding to the diverging singular value. $\diamond$
Example 9: Lack of persistent excitation and finite-precision arithmetic. Consider the problem of fitting a 5th-order model to measured input-output data from the system (34), where the input $u_k$ is given by (27). Note that $\phi_k$ is given by (35) and, as shown in Example 5, is not persistently exciting. Let $P_0 = I_{10}$, $\theta_0 = 0$, and $\lambda = 0.\ast$. Figure 7 shows the predicted error $z_k$, the norm of the parameter error $\tilde\theta_k$, and the singular values and the condition number of $P_k$. Note that $\tilde\theta_k$ does not converge to zero and that six singular values of $P_k$ remain bounded due to the presence of three harmonics in the regressor. Due to finite-precision arithmetic, the computation becomes erroneous as $P_k$ becomes numerically ill-conditioned, and thus the estimate $\theta_k$ diverges. $\diamond$

The numerical examples in this section show that, if $\lambda \in (0,1)$ and $(\phi_k)_{k=0}^\infty$ is not persistently exciting, then $\tilde\theta_k$ does not necessarily converge to zero. Furthermore, if $\lambda \in (0,1)$ and $(\phi_k)_{k=0}^\infty$ is not persistently exciting, then some of the singular values of $P_k$ diverge, and $\theta_k$ diverges due to finite-precision arithmetic when $P_k$ becomes numerically ill-conditioned.
Information Subspace

Using the singular value decomposition, (8) can be written as

$$P_{k+1}^{-1} = \lambda U_k\Sigma_kU_k^T + U_k\psi_k^T\psi_kU_k^T, \qquad (54)$$

where $U_k \in \mathbb{R}^{n\times n}$ is an orthonormal matrix whose columns are the singular vectors of $P_k^{-1}$, $\Sigma_k \in \mathbb{R}^{n\times n}$ is a diagonal matrix whose diagonal entries are the corresponding singular values, and

$$\psi_k \triangleq \phi_kU_k. \qquad (55)$$

The columns of $U_k$ are the information directions at step $k$, and each row of $\psi_k$ is the projection of the corresponding row of $\phi_k$ onto the information directions. The norm of each column of $\psi_k$ thus indicates the information content present in $\phi_k$ along the corresponding information direction. The smallest subspace that is spanned by a subset of the information directions and that contains all rows of $\phi_k$ is the information-rich subspace $\mathcal{I}_k$ at step $k$. Figure 8 illustrates the information-rich subspace.

Now, consider the case where

$$\psi_k = \big[\psi_{k,1}\ \ 0_{p\times(n-n_1)}\big], \qquad (56)$$

where $\psi_{k,1} \in \mathbb{R}^{p\times n_1}$. It follows from (56) that $\phi_k$ provides new information along the first $n_1$ columns of $U_k$; these directions constitute the information-rich subspace. It thus follows from (54) and (56) that $P_{k+1}^{-1}$ is given by

$$P_{k+1}^{-1} = U_k\begin{bmatrix}\lambda\Sigma_{k,1} + \psi_{k,1}^T\psi_{k,1} & 0\\ 0 & \lambda\Sigma_{k,2}\end{bmatrix}U_k^T, \qquad (57)$$

where $\Sigma_{k,1} \in \mathbb{R}^{n_1\times n_1}$ is the diagonal matrix whose diagonal entries are the first $n_1$ singular values of $P_k^{-1}$, and $\Sigma_{k,2}$ is the diagonal matrix whose diagonal entries are the remaining $n - n_1$ singular values of $P_k^{-1}$. In particular, writing

$$U_k = [U_{k,1}\ \ U_{k,2}], \qquad (58)$$

where $U_{k,1} \in \mathbb{R}^{n\times n_1}$ contains the first $n_1$ columns of $U_k$, and $U_{k,2} \in \mathbb{R}^{n\times(n-n_1)}$ contains the remaining $n - n_1$ columns of $U_k$, it follows that

$$P_{k+1}^{-1} = [U_{k+1,1}\ \ U_{k+1,2}]\begin{bmatrix}\Sigma_{k+1,1} & 0\\ 0 & \Sigma_{k+1,2}\end{bmatrix}\begin{bmatrix}U_{k+1,1}^T\\ U_{k+1,2}^T\end{bmatrix}, \qquad (59)$$

where

$$U_{k+1,1} = U_{k,1}V_k, \qquad (60)$$
$$\Sigma_{k+1,1} = D_k, \qquad (61)$$
$$U_{k+1,2} = U_{k,2}, \qquad (62)$$
$$\Sigma_{k+1,2} = \lambda\Sigma_{k,2}, \qquad (63)$$

and where $V_k \in \mathbb{R}^{n_1\times n_1}$ contains the singular vectors of $\lambda\Sigma_{k,1} + \psi_{k,1}^T\psi_{k,1}$ and $D_k \in \mathbb{R}^{n_1\times n_1}$ is the diagonal matrix containing the corresponding singular values. It follows from (62), (63) that if, for all $k \ge 0$, $\psi_k$ is given by (56) and $\lambda \in (0,1)$, then the last $n - n_1$ singular vectors of $P_k^{-1}$ do not change and the corresponding singular values of $P_k^{-1}$ decrease to zero geometrically. It thus follows from Proposition 4 that $(\phi_k)_{k=0}^\infty$ is not persistently exciting. Furthermore, since $P_k$ and $P_k^{-1}$ have the same singular vectors and the singular values of $P_k$ are the reciprocals of the singular values of $P_k^{-1}$, it follows that the last $n - n_1$ singular values of $P_k$ diverge.

The next example considers the case where there exists a proper subspace
$\mathcal{S} \subset \mathbb{R}^n$ such that, for all $k \ge 0$, $\mathcal{R}(\phi_k^T) \subseteq \mathcal{S}$. Hence, $(\phi_k)_{k=0}^\infty$ is not persistently exciting. In this case, for all $k \ge 0$, the information-rich subspace $\mathcal{I}_k$ is a proper subspace of $\mathbb{R}^n$, and the singular values of $P_k^{-1}$ corresponding to the singular vectors in the orthogonal complement of $\mathcal{I}_k$ converge to zero.
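The information directions and the column-norm test used in (67) below can be computed directly from an eigendecomposition of $P_k^{-1}$. The following sketch (hypothetical code consistent with (55)) returns $U_k$, $\psi_k$, and a mask marking the information-rich directions.

```python
import numpy as np

def information_directions(Pinv, phi, eps):
    """Return U_k, psi_k = phi_k U_k of (55), and a mask of information-rich directions.

    A direction is flagged information-rich when the norm of the corresponding
    column of psi_k exceeds eps, as in (67).
    """
    sigma, U = np.linalg.eigh(Pinv)   # P_k^{-1} = U diag(sigma) U^T
    psi = phi @ U                     # (55)
    rich = np.linalg.norm(psi, axis=0) > eps
    return U, psi, rich
```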
Example 10: Lack of persistent excitation and the information-rich subspace. Consider the regressor $\phi_k$ given by (35) and used in Example 5. Recall that $(\phi_k)_{k=0}^\infty$ is not persistently exciting. Let $P_0 = I_{10}$. Figure 9 shows the information content $\|\mathrm{col}_i(\psi_k)\|$ for several values of $\lambda$ along with the singular values of the corresponding $P_k^{-1}$. Note that the information-rich subspace is six-dimensional due to the presence of three harmonics in $u_k$, as shown by six relatively large components of $\psi_k$, and, in the case where $\lambda < 1$, the singular values that correspond to the singular vectors not in the information-rich subspace converge to zero in machine precision. $\diamond$

Variable-Direction Forgetting

Examples 3, 5, 7, 8, and 9 show that some of the singular values of $P_k^{-1}$ converge to zero in the case where $\phi_k$ is not persistently exciting. To address this situation, (8) is modified by replacing the scalar forgetting factor $\lambda$ by a data-dependent forgetting matrix $\Lambda_k$. Similar modifications are discussed in "Toward Matrix Forgetting." In particular, $P_{k+1}^{-1}$ is redefined as

$$P_{k+1}^{-1} = \Lambda_kP_k^{-1}\Lambda_k + \phi_k^T\phi_k, \qquad (64)$$

where $\Lambda_k$ is a positive-definite (and thus symmetric) matrix constructed below. Note that, for all $k \ge 0$, $P_{k+1}^{-1}$ given by (64) is positive definite. Using the singular value decomposition, (64) can be written as

$$P_{k+1}^{-1} = \Lambda_kU_k\Sigma_kU_k^T\Lambda_k + U_k\psi_k^T\psi_kU_k^T, \qquad (65)$$

where $U_k$, $\Sigma_k$, and $\psi_k$ are as defined in the previous section.

The objective is to apply forgetting to only those singular values of $P_k^{-1}$ that correspond to the singular vectors in the information-rich subspace; that is, forgetting is restricted to the subspace of $P_k^{-1}$ where sufficient new information is provided by $\phi_k$. Specifically, forgetting is applied to those information directions where the information content is greater than $\varepsilon > 0$, where $\varepsilon$ should be selected to be larger than the noise-to-signal ratio or, if no noise is present, larger than machine zero. To do so, (65) is written as

$$P_{k+1}^{-1} = U_k\bar\Lambda_k\Sigma_k\bar\Lambda_kU_k^T + U_k\psi_k^T\psi_kU_k^T, \qquad (66)$$

where $\bar\Lambda_k$ is a diagonal matrix whose diagonal entries are either $\sqrt\lambda$ or $1$. In particular,

$$\bar\Lambda_k(i,i) \triangleq \begin{cases}\sqrt\lambda, & \|\mathrm{col}_i(\psi_k)\| > \varepsilon,\\ 1, & \text{otherwise},\end{cases} \qquad (67)$$

where $\mathrm{col}_i(\psi_k)$ is the $i$th column of $\psi_k$ and $\lambda \in (0,1)$. Note that it follows from (66) and (67) that $P_{k+1}^{-1}$ is positive definite. Next, it follows from (65) and (66) that

$$\Lambda_k = U_k\bar\Lambda_kU_k^T, \qquad (68)$$

which is positive definite. Note that

$$\Lambda_k^{-1} = U_k\bar\Lambda_k^{-1}U_k^T. \qquad (69)$$
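Putting (64)-(69) together gives a complete variable-direction forgetting update of the information matrix. The sketch below is hypothetical code consistent with the construction above, not the authors' implementation.

```python
import numpy as np

def vdf_step(Pinv, phi, lam, eps):
    """Variable-direction forgetting update of the information matrix, per (64)-(69)."""
    sigma, U = np.linalg.eigh(Pinv)                  # P_k^{-1} = U_k Sigma_k U_k^T
    psi = phi @ U                                    # (55)
    d = np.where(np.linalg.norm(psi, axis=0) > eps,  # (67): forget only where
                 np.sqrt(lam), 1.0)                  # new information is present
    Lam = U @ np.diag(d) @ U.T                       # (68)
    return Lam @ Pinv @ Lam + phi.T @ phi            # (64)
```

When every column of $\psi_k$ exceeds $\varepsilon$, all diagonal entries of $\bar\Lambda_k$ equal $\sqrt\lambda$, so $\Lambda_kP_k^{-1}\Lambda_k = \lambda P_k^{-1}$ and (64) reduces to the uniform-direction update (8).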
The next result provides a recursive formula to update $P_{k+1}$ given by (64).

Proposition 8: Let $\lambda \in (0,1)$ and $\varepsilon > 0$, let $(P_k)_{k=0}^\infty$ be a sequence of $n\times n$ positive-definite matrices, and let $U_k \in \mathbb{R}^{n\times n}$ be an orthonormal matrix whose columns are the singular vectors of $P_k$. Furthermore, let $\psi_k \in \mathbb{R}^{p\times n}$ be given by (55), let $\bar\Lambda_k$ be given by (67), and let $\Lambda_k$ be given by (68). Then, for all $k \ge 0$, $(P_k)_{k=0}^\infty$ satisfies (64) if and only if, for all $k \ge 0$, $(P_k)_{k=0}^\infty$ satisfies

$$P_{k+1} = \bar P_k - \bar P_k\phi_k^T(I_p + \phi_k\bar P_k\phi_k^T)^{-1}\phi_k\bar P_k, \qquad (70)$$

where

$$\bar P_k = \Lambda_k^{-1}P_k\Lambda_k^{-1}. \qquad (71)$$

Proof:
To prove necessity, it follows from (64) and the matrix-inversion lemma that

$$P_{k+1} = (\Lambda_kP_k^{-1}\Lambda_k + \phi_k^T\phi_k)^{-1} = (\Lambda_kP_k^{-1}\Lambda_k)^{-1} - (\Lambda_kP_k^{-1}\Lambda_k)^{-1}\phi_k^T\big[I_p + \phi_k(\Lambda_kP_k^{-1}\Lambda_k)^{-1}\phi_k^T\big]^{-1}\phi_k(\Lambda_kP_k^{-1}\Lambda_k)^{-1} = \bar P_k - \bar P_k\phi_k^T(I_p + \phi_k\bar P_k\phi_k^T)^{-1}\phi_k\bar P_k,$$

where $\bar P_k$ is given by (71). Reversing these steps proves sufficiency. $\square$

The modified update (64) is shown to be optimal for a specific cost function in "A Modified Quadratic Cost Function Supporting Variable-Direction RLS."

Next, the matrix-forgetting scheme (64) is shown to prevent the singular values of $P_k$ from diverging. Consider the case where, for all $k \ge 0$,

$$\psi_k = [\psi_{k,1}\ \ 0], \qquad (72)$$

where $\psi_{k,1} \in \mathbb{R}^{p\times n_1}$, that is, the information-rich subspace is spanned by the first $n_1$ columns of $U_k$. It thus follows from (66) and (72) that $P_{k+1}^{-1}$ is given by

$$P_{k+1}^{-1} = U_k\begin{bmatrix}\lambda\Sigma_{k,1} + \psi_{k,1}^T\psi_{k,1} & 0\\ 0 & \Sigma_{k,2}\end{bmatrix}U_k^T. \qquad (73)$$

It follows from the $(2,2)$ block of (73) that the last $n - n_1$ information directions and the corresponding singular values are not affected by $\phi_k$. Furthermore, if $n_1 = n$, that is, new information is present in $\phi_k$ along every information direction, then forgetting is applied to all of the singular values of $P_k^{-1}$, and thus variable-direction forgetting specializes to uniform-direction forgetting, that is, RLS with the update for $P_k$ given by (8).

The next result shows that, as in the case of uniform-direction forgetting, $z_k$ converges to zero with variable-direction forgetting for every choice of $\varepsilon > 0$, whether or not $(\phi_k)_{k=0}^\infty$ is persistently exciting.

Proposition 9: For all $k \ge 0$, let $\phi_k \in \mathbb{R}^{p\times n}$ and $y_k \in \mathbb{R}^p$, let $R \in \mathbb{R}^{n\times n}$ be positive definite, and let $P_0 = R^{-1}$, $\theta_0 \in \mathbb{R}^n$, and $\lambda \in (0,1)$. Furthermore, for all $k \ge 0$, let $P_k$ and $\theta_k$ be given by (64) and (5), respectively. Then,

$$\lim_{k\to\infty} z_k = 0. \qquad (74)$$

Proof:
Using (67), (68), and $P_k^{-1} = U_k\Sigma_kU_k^T$, it follows that, for all $k \ge 0$,

$$\Lambda_kP_k^{-1}\Lambda_k = U_k\bar\Lambda_k\Sigma_k\bar\Lambda_kU_k^T \le U_k\Sigma_kU_k^T = P_k^{-1}. \qquad (75)$$

For all $k \ge 0$, note that $z_k = \phi_k\tilde\theta_k$, and define $V_k \triangleq \tilde\theta_k^TP_k^{-1}\tilde\theta_k$. Note that, for all $k \ge 0$ and $\tilde\theta_k \in \mathbb{R}^n$, $V_k \ge 0$. Furthermore, since $P_{k+1}^{-1}\tilde\theta_{k+1} = (P_{k+1}^{-1} - \phi_k^T\phi_k)\tilde\theta_k = \Lambda_kP_k^{-1}\Lambda_k\tilde\theta_k$, it follows that, for all $k \ge 0$,

$$\begin{aligned}
V_{k+1} - V_k &= \tilde\theta_{k+1}^TP_{k+1}^{-1}\tilde\theta_{k+1} - \tilde\theta_k^TP_k^{-1}\tilde\theta_k = \tilde\theta_k^T\big[\Lambda_kP_k^{-1}\Lambda_kP_{k+1}\Lambda_kP_k^{-1}\Lambda_k - P_k^{-1}\big]\tilde\theta_k \\
&= \tilde\theta_k^T\big[\Lambda_kP_k^{-1}\Lambda_k\big(\bar P_k - \bar P_k\phi_k^T(I_p + \phi_k\bar P_k\phi_k^T)^{-1}\phi_k\bar P_k\big)\Lambda_kP_k^{-1}\Lambda_k - P_k^{-1}\big]\tilde\theta_k \\
&= \tilde\theta_k^T\big[\Lambda_kP_k^{-1}\Lambda_k - \phi_k^T(I_p + \phi_k\bar P_k\phi_k^T)^{-1}\phi_k - P_k^{-1}\big]\tilde\theta_k \\
&= -\big[\tilde\theta_k^T(P_k^{-1} - \Lambda_kP_k^{-1}\Lambda_k)\tilde\theta_k + z_k^T(I_p + \phi_k\bar P_k\phi_k^T)^{-1}z_k\big] \le 0.
\end{aligned}$$

Note that, since $(V_k)_{k=1}^\infty$ is a nonnegative, nonincreasing sequence, it converges to a nonnegative number. Hence, $\lim_{k\to\infty}(V_{k+1}-V_k) = 0$, which implies that

$$\lim_{k\to\infty}\big[\tilde\theta_k^T(P_k^{-1} - \Lambda_kP_k^{-1}\Lambda_k)\tilde\theta_k + z_k^T(I_p + \phi_k\bar P_k\phi_k^T)^{-1}z_k\big] = 0.$$

Since, for all $k \ge 0$, $P_k^{-1} - \Lambda_kP_k^{-1}\Lambda_k \ge 0$ and $(I_p + \phi_k\bar P_k\phi_k^T)^{-1} > 0$, it follows that $\lim_{k\to\infty}z_k = 0$. $\square$
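The contrast between uniform-direction forgetting (8) and variable-direction forgetting (64) can be seen with a two-dimensional, rank-deficient regressor in the spirit of Examples 8 and 11 (a sketch written for this tutorial): under (8) the unexcited eigenvalue of $P_k^{-1}$ decays like $\lambda^k$, whereas under (64) it is held fixed.

```python
import numpy as np

lam, eps = 0.9, 1e-8
Pinv_u = np.eye(2)                     # uniform-direction forgetting, (8)
Pinv_v = np.eye(2)                     # variable-direction forgetting, (64)
phi = np.array([[1.0, 1.0]])           # all regressors confined to span{[1, 1]}
for k in range(200):
    Pinv_u = lam * Pinv_u + phi.T @ phi
    sigma, U = np.linalg.eigh(Pinv_v)
    d = np.where(np.linalg.norm(phi @ U, axis=0) > eps, np.sqrt(lam), 1.0)   # (67)
    Lam = U @ np.diag(d) @ U.T                                               # (68)
    Pinv_v = Lam @ Pinv_v @ Lam + phi.T @ phi                                # (64)
print(np.linalg.eigvalsh(Pinv_u))   # smallest eigenvalue ~ lam**200: P_k blows up
print(np.linalg.eigvalsh(Pinv_v))   # smallest eigenvalue stays at 1
```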
The next result shows that $P_k$ is bounded from above with variable-direction forgetting for every choice of $\varepsilon > 0$ in the case where $(\phi_k)_{k=0}^\infty$ is persistently exciting.

Proposition 10: Assume that $(\phi_k)_{k=0}^\infty$ is persistently exciting, let $N, \alpha, \beta$ be given by Definition 1, let $R \in \mathbb{R}^{n\times n}$ be positive definite, define $P_0 \triangleq R^{-1}$, let $\lambda \in (0,1)$, and, for all $k \ge 0$, let $P_k$ be given by (64). Then, for all $k \ge N+1$,

$$\frac{\lambda^N(1-\lambda)\alpha}{1-\lambda^{N+1}}\, I_n \le P_k^{-1}. \qquad (76)$$

Proof:
It follows from (64) that, for all $k \ge 0$, $\Lambda_kP_k^{-1}\Lambda_k \le P_{k+1}^{-1}$ and $\phi_k^T\phi_k \le P_{k+1}^{-1}$. Next, using (68) and $P_k^{-1} = U_k\Sigma_kU_k^T$, it follows that, for all $k \ge 0$,

$$\lambda P_k^{-1} = \lambda U_k\Sigma_kU_k^T \le U_k\bar\Lambda_k\Sigma_k\bar\Lambda_kU_k^T = \Lambda_kP_k^{-1}\Lambda_k \le P_{k+1}^{-1},$$

and thus, for all $i, j \ge 0$, $\lambda^jP_i^{-1} \le P_{i+j}^{-1}$. Hence, for all $k \ge N+1$,

$$\alpha I_n \le \sum_{i=k-N-1}^{k-1}\phi_i^T\phi_i \le \sum_{i=k-N}^{k}P_i^{-1} \le (\lambda^{-N}+\cdots+1)P_k^{-1} = \frac{1-\lambda^{N+1}}{\lambda^N(1-\lambda)}P_k^{-1},$$

which proves (76). $\square$

The next two examples consider variable-direction forgetting in the case where $(\phi_k)_{k=0}^\infty$ is not persistently exciting. In these examples, $P_k$ is bounded, $z_k$ converges to zero, and $\theta_k$ converges, although not to the true value $\theta$.

Example 11: Variable-direction forgetting for a regressor lacking persistent excitation. Reconsider Example 10. Let $P_0 = I_{10}$, and let $P_k$ be given by (70), where $\varepsilon = 10^{-\ast}$. Figure 10 shows the information content $\|\mathrm{col}_i(\psi_k)\|$ and the singular values of $P_k^{-1}$ for several values of $\lambda$. Note that the information-rich subspace is six-dimensional due to the presence of three harmonics in $u_k$, as shown by six relatively large components of $\psi_k$, and that the singular values that correspond to the singular vectors not in the information-rich subspace do not converge to zero. $\diamond$

Example 12: Effect of variable-direction forgetting on $\theta_k$. Reconsider Example 9. Let $P_0 = I_{10}$, and let $P_k$ be given by (70), where $\varepsilon = 10^{-\ast}$. Figure 11 shows the predicted error $z_k$, the norm of the parameter error $\tilde\theta_k$, and the singular values and the condition number of $P_k$. Note that $\tilde\theta_k$ does not converge to zero and that, unlike uniform-direction forgetting, all of the singular values of $P_k$ remain bounded and $\theta_k$ is bounded. $\diamond$

Concluding Remarks

This tutorial article presented a self-contained exposition of uniform-direction and variable-direction forgetting within the context of RLS. It was shown that, in the case of persistent excitation without forgetting, the parameter estimates converge asymptotically, whereas, with forgetting, the parameter estimates converge geometrically. Numerical examples were presented to illustrate this behavior. In the case where forgetting is used but the excitation is not persistent, it was shown that $P_k$ diverges, leading to numerical instability. This phenomenon was traced to the divergence of the singular values of $P_k$ corresponding to singular vectors that are orthogonal to the information-rich subspace. In order to address this problem, a data-dependent forgetting matrix was constructed to restrict forgetting to the information-rich subspace. The RLS cost function that corresponds to this extension of RLS was presented. Numerical examples showed that this variable-direction forgetting technique prevents $P_k$ from diverging under lack of persistent excitation.

Since RLS is fundamentally least squares optimization, its estimates are not consistent in the case of sensor noise [37]. An open problem is thus to develop extensions of RLS that provide consistent parameter estimates in the presence of errors-in-variables noise arising in system identification problems [38].

Acknowledgments
This research was partially supported by AFOSR under DDDAS grant FA9550-16-1-0071.

References

[1] M. Grewal and K. Glover, "Identifiability of linear and nonlinear dynamical systems," IEEE Trans. Autom. Contr., vol. 21, no. 6, pp. 833-837, 1976.
[2] I. Y. Mareels and M. Gevers, "Persistence of excitation criteria," in Proc. Conf. Dec. Contr., 1986, pp. 1933-1935.
[3] I. M. Mareels, R. R. Bitmead, M. Gevers, C. R. Johnson, and R. L. Kosut, "How exciting can a signal really be?" Sys. Contr. Lett., vol. 8, no. 3, pp. 197-204, 1987.
[4] I. M. Mareels and M. Gevers, "Persistency of excitation criteria for linear, multivariable, time-varying systems," Mathematics of Control, Signals, and Systems, vol. 1, no. 3, 1988.
[5] B. D. O. Anderson, "Adaptive systems, lack of persistency of excitation and bursting phenomena," Automatica, vol. 21, no. 3, pp. 247-258, 1985.
[6] G. Chowdhary and E. Johnson, "Concurrent learning for convergence in adaptive control without persistency of excitation," in Proc. Conf. Dec. Contr., 2010, pp. 3674-3679.
[7] G. Chowdhary, M. Mühlegg, and E. Johnson, "Exponential parameter and tracking error convergence guarantees for adaptive controllers without persistency of excitation," Int. J. Contr., vol. 87, no. 8, pp. 1583-1603, 2014.
[8] S. Aranovskiy, A. Bobtsov, R. Ortega, and A. Pyrkin, "Performance enhancement of parameter estimators via dynamic regressor extension and mixing," IEEE Trans. Autom. Contr., vol. 62, no. 7, pp. 3546-3550, 2017.
[9] P. Panda, J. M. Allred, S. Ramanathan, and K. Roy, "Learning to forget with adaptive synaptic plasticity in spiking neural networks," J. Emerg. Selec. Top. Circ. Syst., vol. 8, no. 1, pp. 51-64, 2018.
[10] J. M. Allred and K. Roy, "Unsupervised incremental STDP learning using forced firing of dormant or idle neurons," in Proc. Int. Joint Conf. Neural Networks, July 2016.
[11] N. Frémaux and W. Gerstner, "Neuromodulated spike-timing-dependent plasticity, and theory of three-factor learning rules," Frontiers in Neural Circuits, vol. 9, pp. 85-103, 2016.
[12] B. Han, A. Ankit, A. Sengupta, and K. Roy, "Cross-layer design exploration for energy-quality tradeoffs in spiking and non-spiking deep artificial neural networks," IEEE Transactions on Multi-Scale Computing Systems, 2017.
[13] S. A. U. Islam and D. S. Bernstein, "Recursive least squares for real-time implementation," IEEE Contr. Sys. Mag., vol. 39, pp. 82-85, June 2019.
[14] L. Ljung, System Identification: Theory for the User, 2nd ed. Prentice Hall, 1999.
[15] A. H. Sayed, Fundamentals of Adaptive Filtering. Wiley, 2003.
[16] K. J. Åström and B. Wittenmark, Computer-Controlled Systems: Theory and Design, 3rd ed. Prentice-Hall, 1996.
[17] T. R. Fortescue, L. S. Kershenbaum, and B. E. Ydstie, "Implementation of self-tuning regulators with variable forgetting factors," Automatica, vol. 17, no. 6, pp. 831-835, 1981.
[18] C. Paleologu, J. Benesty, and C. Silviu, "A robust variable forgetting factor recursive least-squares algorithm for system identification," IEEE Sig. Proc. Lett., vol. 15, 2008.
[19] S. Leung and C. F. So, "Gradient-based variable forgetting factor RLS algorithm in time-varying environments," IEEE Trans. Sig. Proc., vol. 53, no. 8, pp. 3141-3150, 2005.
[20] S. Song, J.-S. Lim, S. J. Baek, and K.-M. Sung, "Gauss Newton variable forgetting factor recursive least squares for time varying parameter tracking," Electron. Lett., vol. 36, no. 11, pp. 988-990, 2000.
[21] D. J. Park, B. E. Jun, and J. H. Kim, "Fast tracking RLS algorithm using novel variable forgetting factor with unity zone," Electron. Lett., vol. 27, no. 23, pp. 2150-2151, 1991.
[22] A. A. Ali, J. B. Hoagg, M. Mossberg, and D. S. Bernstein, "On the stability and convergence of a sliding-window variable-regularization recursive-least-squares algorithm," Int. J. Adapt. Contr. Sig. Proc., vol. 30, pp. 715-735, 2016.
[23] R. M. Canetti and M. D. España, "Convergence analysis of the least-squares identification algorithm with a variable forgetting factor for time-varying linear systems," Automatica, vol. 25, no. 4, pp. 609-612, 1989.
[24] M. E. Salgado, G. C. Goodwin, and R. H. Middleton, "Modified least squares algorithm incorporating exponential resetting and forgetting," Int. J. Contr., vol. 47, no. 2, pp. 477-491, 1988.
[25] "... with covariance resetting," in IEE Proceedings D - Control Theory and Applications, vol. 130, no. 1, 1983, pp. 6-8.
[26] R. Kulhavý, "Restricted exponential forgetting in real-time identification," Automatica, vol. 23, no. 5, pp. 589-600, 1987.
[27] R. Kulhavý and M. Kárný, "Tracking of slowly varying parameters by directional forgetting," IFAC Proc. Vol., vol. 17, no. 2, pp. 687-692, 1984.
[28] G. Kreisselmeier, "Stabilized least-squares type adaptive identifiers," IEEE Trans. Autom. Contr., vol. 35, no. 3, pp. 306-310, 1990.
[29] L. Cao and H. Schwartz, "Directional forgetting algorithm based on the decomposition of the information matrix," Automatica, vol. 36, no. 11, pp. 1725-1731, 2000.
[30] G. Kubin, "Stabilization of the RLS algorithm in the absence of persistent excitation," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, 1988, pp. 1369-1372.
[31] S. Bittanti, P. Bolzern, and M. Campi, "Convergence and exponential convergence of identification algorithms with directional forgetting factor," Automatica, vol. 26, no. 5, pp. 929-932, 1990.
[32] ——, "Exponential convergence of a modified directional forgetting identification algorithm," Sys. Contr. Lett., vol. 14, no. 2, pp. 131-137, 1990.
[33] A. Goel and D. S. Bernstein, "A targeted forgetting factor for recursive least squares," in Proc. Conf. Dec. Contr., 2018, pp. 3899-3903.
[34] R. M. Johnstone, C. R. Johnson, R. R. Bitmead, and B. D. O. Anderson, "Exponential convergence of recursive least squares with exponential forgetting factor," in Proc. Conf. Dec. Contr., 1982, pp. 994-997.
[35] J. Benesty and T. Gänsler, "New insights into the RLS algorithm," Eurasip Jour. App. Sig. Proc., no. 3, pp. 331-339, 2004.
[36] W. M. Haddad and V. Chellaboina, Nonlinear Dynamical Systems and Control: A Lyapunov-Based Approach. Princeton University Press, 2008.
[37] P. Eykhoff, System Identification: Parameter and State Estimation. Wiley-Interscience, 1974.
[38] T. Söderström, Errors-in-Variables Methods in System Identification. Springer, 2018.

Sidebar: Summary

Learning depends on the ability to acquire and assimilate new information. This ability depends, somewhat counterintuitively, on the ability to forget. In particular, effective forgetting requires the ability to recognize and utilize new information in order to update a system model. This article is a tutorial on forgetting within the context of recursive least squares (RLS).
To do this, RLS is first presented in its classical form, which employs uniform-direction forgetting. Next, examples are given to motivate the need for variable-direction forgetting, especially in cases where the excitation is not persistent. Some of these results are well known, whereas others complement the prior literature. The goal is to provide a self-contained tutorial of the main ideas and techniques for students and researchers whose research may benefit from variable-direction forgetting.

Sidebar: Three Useful Lemmas

Lemma 1:
Let $X \in \mathbb{R}^{n\times p}$ and $y \in \mathbb{R}^n$, and let $W \in \mathbb{R}^{p\times p}$ be positive definite. Then,

$$(I_n + XWX^T)^{-1}y \in \mathcal{R}([X\ \ y]). \qquad (S1)$$

Proof:
Note that

$$\mathcal{R}([X\ \ y]) = \mathcal{R}\left([X\ \ y]\begin{bmatrix}I_p + WX^TX & WX^Ty\\ 0 & 1\end{bmatrix}\right) = \mathcal{R}\big([X(I_p + WX^TX)\ \ (I_n + XWX^T)y]\big) = \mathcal{R}\big([(I_n + XWX^T)X\ \ (I_n + XWX^T)y]\big) = (I_n + XWX^T)\,\mathcal{R}([X\ \ y]).$$

Since $y \in \mathcal{R}([X\ \ y])$, it follows that $(I_n + XWX^T)^{-1}y \in \mathcal{R}([X\ \ y])$, which implies (S1). $\square$

Lemma 2:
Let $A \in \mathbb{R}^{n\times n}$ be positive semidefinite, and let $\lambda > 0$. Then,

$$I_n - A(\lambda I_n + A)^{-1} > 0. \qquad (S2)$$

Proof:
Write $A = SDS^T$, where $D = \mathrm{diag}(d_1,\ldots,d_n)$ is diagonal and $S$ is unitary. For all $i \in \{1,\ldots,n\}$, $d_i \ge 0$, and thus $\frac{d_i}{\lambda + d_i} < 1$. Hence,

$$D(\lambda I_n + D)^{-1} = \mathrm{diag}\Big(\frac{d_1}{\lambda+d_1},\ldots,\frac{d_n}{\lambda+d_n}\Big) < I_n. \qquad (S3)$$

Pre-multiplying and post-multiplying (S3) by $S$ and $S^T$, respectively, yields (S2). $\square$

Lemma 3:
Let $A \in \mathbb{R}^{n\times n}$ be positive semidefinite, and let $\lambda > 0$. Then,

$$I_n - \tfrac{1}{\lambda}\big(A - A(\lambda I_n + A)^{-1}A\big) > 0. \qquad (S4)$$

Proof:
Write $A = SDS^T$, where $D = \mathrm{diag}(d_1,\ldots,d_n)$ is diagonal and $S$ is unitary. For all $i \in \{1,\ldots,n\}$, $d_i \ge 0$, and thus $\frac{d_i}{\lambda+d_i} < 1$. Hence,

$$\tfrac{1}{\lambda}\big(D - D(\lambda I_n + D)^{-1}D\big) = \mathrm{diag}\Big(\frac{d_1}{\lambda+d_1},\ldots,\frac{d_n}{\lambda+d_n}\Big) < I_n. \qquad (S5)$$

Pre-multiplying and post-multiplying (S5) by $S$ and $S^T$, respectively, yields (S4). $\square$

Sidebar: RLS as a One-Step Optimal Predictor

Consider the linear system

$$x_{k+1} = A_kx_k + B_ku_k + w_{1,k}, \qquad (S1)$$
$$y_k = C_kx_k + w_{2,k}, \qquad (S2)$$

where, for all $k \ge 0$, $x_k \in \mathbb{R}^n$, $u_k \in \mathbb{R}^m$, $y_k \in \mathbb{R}^p$, and $A_k$, $B_k$, $C_k$ are real matrices of appropriate sizes. The input $u_k$ and output $y_k$ are assumed to be measured. The process noise $w_{1,k} \in \mathbb{R}^n$ and the sensor noise $w_{2,k} \in \mathbb{R}^p$ are zero-mean white noise processes with variances $E[w_{1,k}w_{1,k}^T] = Q_k$ and $E[w_{2,k}w_{2,k}^T] = R_k$, respectively. The expected value of the initial state is assumed to be $\bar x_0$, and the variance of the initial state is $P_0$, that is, $E[x_0] = \bar x_0$ and $E[(x_0 - \bar x_0)(x_0 - \bar x_0)^T] = P_0$. The objective is to estimate the state $x_k$ given the measurements of $u_k$ and $y_k$. To estimate $x_k$, consider the estimator

$$\hat x_{k+1} = A_k\hat x_k + B_ku_k + K_k(y_k - C_k\hat x_k), \qquad (S3)$$

where $\hat x_k$ is the estimate of $x_k$ at step $k$ and $\hat x_0 = \bar x_0$. The matrix $K_k$ is constructed as follows. Define the state-estimate error $e_k \triangleq x_k - \hat x_k$ and the state-error covariance $P_k \triangleq E[e_ke_k^T] \in \mathbb{R}^{n\times n}$. Then, $e_k$ and $P_k$ satisfy

$$e_{k+1} = (A_k - K_kC_k)e_k + w_{1,k} - K_kw_{2,k}, \qquad (S4)$$
$$P_{k+1} = A_kP_kA_k^T + Q_k + K_k(R_k + C_kP_kC_k^T)K_k^T - A_kP_kC_k^TK_k^T - K_kC_kP_kA_k^T. \qquad (S5)$$

Proposition S1:
Sidebar: RLS as a One-Step Optimal Predictor

Consider the linear system
$$x_{k+1} = A_k x_k + B_k u_k + w_{1,k}, \qquad {\rm (S1)}$$
$$y_k = C_k x_k + w_{2,k}, \qquad {\rm (S2)}$$
where, for all $k \ge 0$, $x_k \in \mathbb{R}^n$, $u_k \in \mathbb{R}^m$, $y_k \in \mathbb{R}^p$, and $A_k$, $B_k$, $C_k$ are real matrices of appropriate sizes. The input $u_k$ and output $y_k$ are assumed to be measured. The process noise $w_{1,k} \in \mathbb{R}^n$ and the sensor noise $w_{2,k} \in \mathbb{R}^p$ are zero-mean white noise processes with variances $E[w_{1,k}w_{1,k}^{\rm T}] = Q_k$ and $E[w_{2,k}w_{2,k}^{\rm T}] = R_k$, respectively. The expected value of the initial state is assumed to be $\bar{x}_0$, and the variance of the initial state is $P_0$, that is, $E[x_0] = \bar{x}_0$ and $E[(x_0 - \bar{x}_0)(x_0 - \bar{x}_0)^{\rm T}] = P_0$. The objective is to estimate the state $x_k$ given the measurements of $u_k$ and $y_k$.

To estimate $x_k$, consider the estimator
$$\hat{x}_{k+1} = A_k\hat{x}_k + B_k u_k + K_k(y_k - C_k\hat{x}_k), \qquad {\rm (S3)}$$
where $\hat{x}_k$ is the estimate of $x_k$ at step $k$ and $\hat{x}_0 = \bar{x}_0$. The matrix $K_k$ is constructed as follows. Define the state-estimate error $e_k \triangleq x_k - \hat{x}_k$ and the state-error covariance $P_k \triangleq E[e_k e_k^{\rm T}] \in \mathbb{R}^{n \times n}$. Then, $e_k$ and $P_k$ satisfy
$$e_{k+1} = (A_k - K_kC_k)e_k + w_{1,k} - K_kw_{2,k}, \qquad {\rm (S4)}$$
$$P_{k+1} = A_kP_kA_k^{\rm T} + Q_k + K_k\left(R_k + C_kP_kC_k^{\rm T}\right)K_k^{\rm T} - A_kP_kC_k^{\rm T}K_k^{\rm T} - K_kC_kP_kA_k^{\rm T}. \qquad {\rm (S5)}$$

Proposition S1: Let $P_{k+1}$ be given by (S5). The matrix $K_k$ that minimizes ${\rm tr}\,P_{k+1}$ is given by
$$K_k = A_kP_kC_k^{\rm T}\left(R_k + C_kP_kC_k^{\rm T}\right)^{-1}, \qquad {\rm (S6)}$$
and the minimized state-error covariance $P_k$ is updated as
$$P_{k+1} = A_kP_kA_k^{\rm T} + Q_k - A_kP_kC_k^{\rm T}\left(R_k + C_kP_kC_k^{\rm T}\right)^{-1}C_kP_kA_k^{\rm T}. \qquad {\rm (S7)}$$

Proof: See [S1]. $\square$
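Proposition S1 translates directly into code. The following sketch is illustrative (the function name and calling convention are not from the article); it performs one step of the estimator (S3) with the optimal gain (S6) and the covariance update (S7).

```python
import numpy as np

def predictor_step(xhat, P, u, y, A, B, C, Q, R):
    """One step of the optimal predictor: (S3) with gain (S6) and covariance (S7)."""
    S = R + C @ P @ C.T                                # innovation covariance
    K = A @ P @ C.T @ np.linalg.inv(S)                 # optimal gain (S6)
    xhat_next = A @ xhat + B @ u + K @ (y - C @ xhat)  # estimator update (S3)
    P_next = A @ P @ A.T + Q - K @ C @ P @ A.T         # covariance update (S7), using (S6)
    return xhat_next, P_next
```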
Let $A_k = I_n$, $B_k = 0$, $C_k = \phi_k$, $Q_k = 0$, and $R_k = I_p$. Then,
$$\hat{x}_{k+1} = \hat{x}_k + P_k\phi_k^{\rm T}\left(I_p + \phi_kP_k\phi_k^{\rm T}\right)^{-1}(y_k - \phi_k\hat{x}_k), \qquad {\rm (S8)}$$
$$P_{k+1} = P_k - P_k\phi_k^{\rm T}\left(I_p + \phi_kP_k\phi_k^{\rm T}\right)^{-1}\phi_kP_k. \qquad {\rm (S9)}$$
Note that (6), (7) with $\lambda = 1$ have the same form as (S8), (S9). In particular, RLS without forgetting is the state estimator for the linear time-varying system with $A_k = I_n$, $B_k = 0$, $C_k = \phi_k$, $Q_k = 0$, and $R_k = I_p$.

References
[S1] S. A. U. Islam, A. Goel, and D. S. Bernstein, “Real-Time Implementation of the Optimal Predictor and Optimal Filter: Accuracy versus Latency,” IEEE Contr. Sys. Mag., to appear.

Sidebar: RLS as a Maximum Likelihood Estimator

Let $k \ge 0$ and, for all $i \in \{0, 1, \ldots, k\}$, consider the process
$$y_i = \phi_i\theta_{\rm true} + v_i, \qquad {\rm (S1)}$$
where $\theta_{\rm true} \in \mathbb{R}^n$ is the unknown parameter, $\phi_i \in \mathbb{R}^{p \times n}$ is the regressor matrix, $v_i \in \mathbb{R}^p$ is the measurement noise, and $y_i \in \mathbb{R}^p$ is the measurement. The goal is to estimate $\theta_{\rm true}$ using the data $(\phi_i)_{i=0}^k$ and $(y_i)_{i=0}^k$.

Let $\theta_{\rm true}$ be modeled by the $n$-dimensional, real-valued normal random variable $\Theta$ with mean $\theta_0 \in \mathbb{R}^n$ and covariance $(\lambda^{k+1}R)^{-1}$, where $\lambda \in (0,1]$ and $R \in \mathbb{R}^{n \times n}$ is positive definite. For $\theta \in \mathbb{R}^n$, the density of $\Theta$ is thus given by
$$f_\Theta(\theta) = \frac{1}{\sqrt{(2\pi)^n \det(\lambda^{k+1}R)^{-1}}}\, \exp\!\left[-\tfrac{1}{2}(\theta - \theta_0)^{\rm T}\lambda^{k+1}R\,(\theta - \theta_0)\right]. \qquad {\rm (S2)}$$
For all $i \in \{0, 1, \ldots, k\}$, assume that $v_i$ is a sample of the zero-mean, $p$-dimensional, real-valued normal random variable $V_i$ with covariance $\lambda^{i-k}I_p$. For $v_i \in \mathbb{R}^p$, the density of $V_i$ is thus given by
$$f_{V_i}(v_i) = \frac{1}{\sqrt{(2\pi)^p \det(\lambda^{i-k}I_p)}}\, \exp\!\left(-\tfrac{1}{2}v_i^{\rm T}\lambda^{k-i}I_p\,v_i\right). \qquad {\rm (S3)}$$
Assume that $V_0, V_1, \ldots, V_k$ are independent.

Since $\theta_{\rm true}$ and $v_i$ are modeled as normal random variables, it follows from (S1) that $y_i$ is a sample of the $p$-dimensional, real-valued normal random variable $Y_i = \phi_i\theta_{\rm true} + V_i$. Note that, since $V_0, V_1, \ldots, V_k$ are independent, it follows that $Y_0, Y_1, \ldots, Y_k$ are independent. Using (S1) and (S3), it thus follows that
$$f_{Y_i|\theta}(y_i) = \frac{1}{\sqrt{(2\pi)^p \det(\lambda^{i-k}I_p)}}\, \exp\!\left[-\tfrac{1}{2}(y_i - \phi_i\theta)^{\rm T}\lambda^{k-i}I_p\,(y_i - \phi_i\theta)\right], \qquad {\rm (S4)}$$
where $f_{Y_i|\theta}(y_i)$ is the density of the random variable $Y_i$ conditioned on $\Theta$ taking the value $\theta$. It follows from Bayes’ rule [S1, p. 413] that
$$f_{\Theta|y_0,\ldots,y_k}(\theta) = \alpha^{-1} f_\Theta(\theta)\prod_{i=0}^k f_{Y_i|\theta}(y_i), \qquad {\rm (S5)}$$
where
$$\alpha \triangleq \int_{\mathbb{R}^n} f_\Theta(\theta)\prod_{i=0}^k f_{Y_i|\theta}(y_i)\, d\theta. \qquad {\rm (S6)}$$
Substituting (S2) and (S4) into (S5), it follows that
$$f_{\Theta|y_0,\ldots,y_k}(\theta) = \beta \exp\!\left[-\tfrac{1}{2}\sum_{i=0}^k \lambda^{k-i}(y_i - \phi_i\theta)^{\rm T}(y_i - \phi_i\theta) - \tfrac{1}{2}\lambda^{k+1}(\theta - \theta_0)^{\rm T}R(\theta - \theta_0)\right], \qquad {\rm (S7)}$$
where
$$\beta \triangleq \frac{1}{\alpha\,\sqrt{(2\pi)^n \det(\lambda^{k+1}R)^{-1}}\,\prod_{i=0}^k\sqrt{(2\pi)^p \det(\lambda^{i-k}I_p)}}. \qquad {\rm (S8)}$$
Finally, the maximum likelihood estimate of $\theta_{\rm true}$ is given by the maximizer of (S7), that is,
$$\theta_{\rm ML} = \arg\max_{\theta\in\mathbb{R}^n} f_{\Theta|y_0,\ldots,y_k}(\theta). \qquad {\rm (S9)}$$
In fact, $\theta_{\rm ML} = \arg\min_{\theta\in\mathbb{R}^n} J_k(\theta)$, where $J_k(\theta)$ is given by (2). Therefore, RLS with forgetting can be interpreted as the maximum likelihood estimator of the random variable $\Theta$.

References
[S1] D. P. Bertsekas and J. N. Tsitsiklis, Introduction to Probability, 2nd ed. Athena Scientific, 2008.
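The equivalence stated at the end of this sidebar can be checked numerically. The following sketch is illustrative (synthetic noise-free data, arbitrary dimensions and seed): it runs the standard constant-forgetting RLS recursion, which the article denotes (6), (7), while also accumulating the exponentially weighted normal equations of the cost that $\theta_{\rm ML}$ minimizes, and confirms that the two estimates coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 3, 1, 0.9
R = np.eye(n)
theta0 = np.zeros(n)
theta_true = rng.standard_normal(n)

theta, P = theta0.copy(), np.linalg.inv(R)
A, b = R.copy(), R @ theta0  # weighted normal equations: A_{-1} = R, b_{-1} = R theta0
for i in range(50):
    phi = rng.standard_normal((p, n))
    y = phi @ theta_true
    # standard RLS with constant forgetting factor lam
    G = P @ phi.T @ np.linalg.inv(lam * np.eye(p) + phi @ P @ phi.T)
    theta = theta + G @ (y - phi @ theta)
    P = (P - G @ phi @ P) / lam
    # exponentially weighted batch normal equations of the same cost
    A = lam * A + phi.T @ phi
    b = lam * b + phi.T @ y
print(np.linalg.norm(theta - np.linalg.solve(A, b)))  # ~1e-15: same minimizer
```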
Sidebar: Toward Matrix Forgetting

In [S1], $P_k^{-1}$ is updated by
$$P_{k+1}^{-1} = (I_n + M_kP_k)P_k^{-1} + \phi_k^{\rm T}\phi_k, \qquad {\rm (S1)}$$
where $M_k \in \mathbb{R}^{n \times n}$ is chosen to guarantee asymptotic stability and boundedness. Two choices of the matrix $M_k$ are considered. In the first case,
$$M_k \triangleq -(1-\lambda)(I_n - \alpha P_k)^NP_k^{-1}, \qquad {\rm (S2)}$$
where $\lambda \in (0,1]$, $\alpha > 0$, and $N$ is an odd, positive integer. In the second case,
$$M_k = -(1-\lambda)(P_k^{-1} - \alpha I_n)^N(P_k^{-1} + \beta I_n)^{-N}P_k^{-1}, \qquad {\rm (S3)}$$
where $\lambda \in (0,1]$, $\alpha > 0$, $\beta \ge 0$, and $N$ is an odd, positive integer. Note that RLS with constant forgetting is obtained by setting $M_k = (\lambda - 1)P_k^{-1}$ in (S1).
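To make the update concrete, the following sketch implements one step of (S1) with the first choice (S2) of $M_k$; the function name and the default values of lam, alpha, and N are illustrative assumptions, not part of [S1].

```python
import numpy as np

def kreisselmeier_step(P, phi, lam=0.95, alpha=1.0, N=1):
    """One step of (S1) with M_k given by (S2)."""
    n = P.shape[0]
    Pinv = np.linalg.inv(P)
    M = -(1.0 - lam) * np.linalg.matrix_power(np.eye(n) - alpha * P, N) @ Pinv  # (S2)
    Pinv_next = (np.eye(n) + M @ P) @ Pinv + phi.T @ phi                        # (S1)
    return np.linalg.inv(Pinv_next)
```

For $N = 1$, the update simplifies to $P_{k+1}^{-1} = \lambda P_k^{-1} + (1-\lambda)\alpha I_n + \phi_k^{\rm T}\phi_k$, which makes the pull of $P_k^{-1}$ toward $\alpha I_n$ explicit.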
Proposition S1: Consider (S1) with (S2) or (S3). Let $P_0$ be symmetric and nonsingular. Then, the following statements hold:
i) For all $k \ge 0$, $P_k$ is symmetric and nonsingular.
ii) If $P_0^{-1} \ge \alpha I_n$, then $P_k^{-1} = \alpha I_n$ is an asymptotically stable equilibrium of (S1).
iii) If $P_0^{-1} \ge \alpha I_n$, then, for all $k \ge 0$, $P_k^{-1} \ge \alpha I_n$.
iv) If $P_0^{-1} \ge \alpha I_n$ and, for all $k \ge 0$, $\phi_k$ is bounded, then $P_k^{-1}$ is bounded.
v) If $P_0^{-1} \ge \alpha I_n$ and $\phi_k$ is persistently exciting, then there exists $k_0 > 0$ such that, for all $k \ge k_0$, $P_k^{-1} > \alpha I_n$.
Proof: See [28]. $\square$
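The stabilization described in statements ii)-iv) can be seen in a few lines. The following sketch is illustrative (arbitrary dimensions and parameter values, $N = 1$, and $\phi_k = 0$ to model a total loss of excitation): under (S1)-(S2), $P_k^{-1}$ converges to $\alpha I_n$ instead of decaying to zero, so $P_k$ remains bounded.

```python
import numpy as np

n, lam, alpha = 2, 0.9, 1.0
Pinv = 5.0 * np.eye(n)  # P_0^{-1} >= alpha * I_n
for k in range(200):
    P = np.linalg.inv(Pinv)
    M = -(1.0 - lam) * (np.eye(n) - alpha * P) @ Pinv  # (S2) with N = 1
    Pinv = (np.eye(n) + M @ P) @ Pinv                  # (S1) with phi_k = 0
print(Pinv)  # approaches alpha * I_n; constant forgetting would instead decay to zero
```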
The main goal of (S1) is stabilization of $P_k$ in the case where $(\phi_k)_{k=0}^\infty$ is not persistently exciting. Proposition S1 implies that $P_k$ remains bounded whether or not $(\phi_k)_{k=0}^\infty$ is persistent. However, (S1) is not designed to implement forgetting. Furthermore, note that (S1) requires the computation of the inverse of an $n \times n$ matrix at each step.

An alternative directional forgetting scheme given in [S2] considers the update
$$P_{k+1}^{-1} = M_kP_k^{-1} + \phi_k^{\rm T}\phi_k, \qquad {\rm (S4)}$$
where $M_k \in \mathbb{R}^{n \times n}$ is designed to apply forgetting to a specific subspace. In the case of a scalar measurement, that is, $p = 1$, $P_k^{-1}$ is decomposed as
$$P_k^{-1} = P_{1,k}^{-1} + P_{2,k}^{-1}, \qquad {\rm (S5)}$$
where $P_{1,k}^{-1}$ is chosen such that $P_{1,k}^{-1}\phi_k^{\rm T} = 0$, that is, $\phi_k^{\rm T}$ is in the null space of $P_{1,k}^{-1}$. Next, forgetting is restricted to $P_{2,k}^{-1}$, that is,
$$P_{k+1}^{-1} = P_{1,k}^{-1} + \lambda P_{2,k}^{-1} + \phi_k^{\rm T}\phi_k. \qquad {\rm (S6)}$$
The matrix $P_{2,k}^{-1}$ is chosen to be positive semidefinite with rank 1 by using
$$P_{2,k}^{-1} \triangleq P_k^{-1}\phi_k^{\rm T}\left(\phi_kP_k^{-1}\phi_k^{\rm T}\right)^{-1}\phi_kP_k^{-1}, \qquad {\rm (S7)}$$
and thus $P_{1,k}^{-1} = P_k^{-1} - P_{2,k}^{-1}$. Finally, it follows from (S4), (S6), and (S7) that
$$M_k = I_n - (1-\lambda)\left(\phi_kP_k^{-1}\phi_k^{\rm T}\right)^{-1}P_k^{-1}\phi_k^{\rm T}\phi_k \qquad {\rm (S8)}$$
and $P_{k+1}$ is computed as
$$\bar{P}_k = \begin{cases} P_k + \dfrac{1-\lambda}{\lambda}\left(\phi_kP_k^{-1}\phi_k^{\rm T}\right)^{-1}\phi_k^{\rm T}\phi_k, & \phi_k \ne 0, \\ P_k, & \phi_k = 0, \end{cases} \qquad {\rm (S9)}$$
$$P_{k+1} = \bar{P}_k - \bar{P}_k\phi_k^{\rm T}\left(1 + \phi_k\bar{P}_k\phi_k^{\rm T}\right)^{-1}\phi_k\bar{P}_k. \qquad {\rm (S10)}$$
It is shown in [S2] that, if $P_k^{-1}$ is positive definite, then, for all $\lambda \in (0,1]$, $M_kP_k^{-1}$ is positive definite. Furthermore, if, for all $k \ge 0$, $\phi_k$ is bounded, then there exists $\beta > 0$ such that, for all $k \ge 0$, $P_k < \beta I_n$.
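A minimal implementation of (S5)-(S10) may clarify the structure of the scheme. The following sketch is illustrative: the function name is an assumption, and the parameter update in the last line is the usual RLS estimate update, which is not reproduced in this sidebar.

```python
import numpy as np

def directional_forgetting_step(P, phi, y, theta, lam=0.95):
    """One step of (S9)-(S10) for p = 1: forget only along the excited direction."""
    if np.allclose(phi, 0.0):
        Pbar = P  # phi_k = 0: no new information, so nothing is forgotten, (S9)
    else:
        Pinv = np.linalg.inv(P)
        s = (phi @ Pinv @ phi.T).item()  # phi_k P_k^{-1} phi_k^T, a scalar since p = 1
        Pbar = P + (1.0 - lam) / (lam * s) * (phi.T @ phi)  # (S9)
    denom = 1.0 + (phi @ Pbar @ phi.T).item()
    P_next = Pbar - (Pbar @ phi.T) @ (phi @ Pbar) / denom    # (S10)
    theta_next = theta + P_next @ phi.T @ (y - phi @ theta)  # usual RLS estimate update
    return P_next, theta_next
```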
References
[S1] G. Kreisselmeier, “Stabilized least-squares type adaptive identifiers,” IEEE Trans. Autom. Contr., vol. 35, no. 3, pp. 306–310, 1990.
[S2] L. Cao and H. Schwartz, “Directional forgetting algorithm based on the decomposition of the information matrix,” Automatica, vol. 36, no. 11, pp. 1725–1731, 2000.
Sidebar: A Cost Function for Variable-Direction RLS

Theorem S1: For all $k \ge 0$, let $\phi_k \in \mathbb{R}^{p \times n}$ and $y_k \in \mathbb{R}^p$. Furthermore, let $R \in \mathbb{R}^{n \times n}$ be positive definite, let $\lambda \in (0,1]$, and, for all $k \ge 0$, let $P_k$ be given by
$$P_{k+1}^{-1} = \Lambda_kP_k^{-1}\Lambda_k + \phi_k^{\rm T}\phi_k, \qquad {\rm (S1)}$$
where $P_0 \triangleq R^{-1}$ and $\Lambda_k$ is given by (68). In addition, let $\theta_0 \in \mathbb{R}^n$, and define
$$J_k(\hat\theta) \triangleq \sum_{i=0}^k (y_i - \phi_i\hat\theta)^{\rm T}(y_i - \phi_i\hat\theta) + (\hat\theta - \theta_0)^{\rm T}R_k(\hat\theta - \theta_0), \qquad {\rm (S2)}$$
where, for all $k \ge 0$,
$$R_k = R_{k-1} + \Lambda_kP_k^{-1}\Lambda_k - P_k^{-1}, \qquad {\rm (S3)}$$
where $R_{-1} \triangleq R$. Then, for all $k \ge 0$, (S2) has a unique global minimizer
$$\theta_{k+1} = \arg\min_{\hat\theta\in\mathbb{R}^n} J_k(\hat\theta), \qquad {\rm (S4)}$$
which is given by
$$\theta_{k+1} = \theta_k + P_{k+1}\phi_k^{\rm T}(y_k - \phi_k\theta_k) + P_{k+1}(R_k - R_{k-1})(\theta_0 - \theta_k). \qquad {\rm (S5)}$$
Proof: Note that, for all $k \ge 0$, $J_k(\hat\theta) = \hat\theta^{\rm T}A_k\hat\theta + 2\hat\theta^{\rm T}b_k + c_k$, where
$$A_k \triangleq \sum_{i=0}^k \phi_i^{\rm T}\phi_i + R_k, \qquad {\rm (S6)}$$
$$b_k \triangleq -\sum_{i=0}^k \phi_i^{\rm T}y_i - R_k\theta_0, \qquad {\rm (S7)}$$
$$c_k \triangleq \sum_{i=0}^k y_i^{\rm T}y_i + \theta_0^{\rm T}R_k\theta_0.$$
Using (S3), (S6), and (S7), it follows that, for all $k \ge 0$,
$$A_k = A_{k-1} + \Lambda_kP_k^{-1}\Lambda_k - P_k^{-1} + \phi_k^{\rm T}\phi_k, \qquad {\rm (S8)}$$
$$b_k = b_{k-1} - \phi_k^{\rm T}y_k - (R_k - R_{k-1})\theta_0, \qquad {\rm (S9)}$$
where $A_{-1} \triangleq R$ and $b_{-1} \triangleq -R\theta_0$. Using (S1) and (S8), it follows that, for all $k \ge 0$,
$$A_k - P_{k+1}^{-1} = A_{k-1} - P_k^{-1} = \cdots = A_{-1} - P_0^{-1} = 0.$$
It follows from (65) that, for all $k \ge 0$, $P_{k+1}^{-1}$ is positive definite, and thus $A_k$ is positive definite. Furthermore, for all $k \ge 0$, $A_k$ is given by $A_k = \Lambda_kA_{k-1}\Lambda_k + \phi_k^{\rm T}\phi_k$. Finally, since $A_k$ is positive definite, it follows from Lemma 1 in [S1] that
$$\begin{aligned}
\theta_{k+1} &= -A_k^{-1}b_k \\
&= -A_k^{-1}\left(b_{k-1} - \phi_k^{\rm T}y_k - (R_k - R_{k-1})\theta_0\right) \\
&= -A_k^{-1}\left(-A_{k-1}\theta_k - \phi_k^{\rm T}y_k - (R_k - R_{k-1})\theta_0\right) \\
&= A_k^{-1}\left((A_k - R_k + R_{k-1} - \phi_k^{\rm T}\phi_k)\theta_k + \phi_k^{\rm T}y_k + (R_k - R_{k-1})\theta_0\right) \\
&= A_k^{-1}\left(A_k\theta_k + \phi_k^{\rm T}(y_k - \phi_k\theta_k) + (R_k - R_{k-1})(\theta_0 - \theta_k)\right) \\
&= \theta_k + A_k^{-1}\phi_k^{\rm T}(y_k - \phi_k\theta_k) + A_k^{-1}(R_k - R_{k-1})(\theta_0 - \theta_k) \\
&= \theta_k + P_{k+1}\phi_k^{\rm T}(y_k - \phi_k\theta_k) + P_{k+1}(R_k - R_{k-1})(\theta_0 - \theta_k),
\end{aligned}$$
where the third equality uses $b_{k-1} = -A_{k-1}\theta_k$ and the fourth uses (S8). Hence, (S5) is satisfied. $\square$
Using $R_k - R_{k-1} = \Lambda_kA_{k-1}\Lambda_k - A_{k-1}$, it follows that (S5) can be implemented without computing $P_k^{-1}$.

References
[S1] S. A. U. Islam and D. S. Bernstein, “Recursive Least Squares for Real-Time Implementation,” IEEE Contr. Sys. Mag., vol. 39, pp. 82–85, June 2019.
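Following the remark above, the update can be implemented by propagating $A_k = P_{k+1}^{-1}$ directly. The following sketch is illustrative: $\Lambda_k$ is passed in as an argument because its construction (68) lies outside this sidebar, and the function name is an assumption.

```python
import numpy as np

def vdf_step(A_prev, theta, theta0, phi, y, Lam):
    """One step of Theorem S1 using A_k = Lambda_k A_{k-1} Lambda_k + phi_k^T phi_k."""
    dR = Lam @ A_prev @ Lam - A_prev              # R_k - R_{k-1}
    A = Lam @ A_prev @ Lam + phi.T @ phi          # A_k, which equals P_{k+1}^{-1}
    rhs = phi.T @ (y - phi @ theta) + dR @ (theta0 - theta)
    theta_next = theta + np.linalg.solve(A, rhs)  # estimate update (S5)
    return A, theta_next
```

Note that, for $\Lambda_k = I_n$, the term $R_k - R_{k-1}$ vanishes and the recursion reduces to RLS without forgetting.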
Figure 1: Example 2. Persistent excitation and bounds on $P_k^{-1}$. a) and b) show the singular values of $F_{j,j+N}$ for $N = 2$ and $N = 10$, where $\alpha$ and $\beta$ are chosen to satisfy (19). Since $u_k$ is periodic, it follows that, for all $j \ge 0$, the lower and upper bounds (19) for $F_{j,j+N}$ are satisfied. Hence, $(\phi_k)_{k=0}^\infty$ is persistently exciting. c) shows the singular values of $P_k^{-1}$, with corresponding bounds given by (25) for a fixed $\lambda \in (0,1)$. Note that $\alpha$ and $\beta$ are larger for $N = 10$ than for $N = 2$, as expected.

Figure 2: Example 3. Lack of persistent excitation and bounds on $P_k^{-1}$. a) shows the singular values of $F_{j,j+2}$. Note that the smaller singular value of $F_{j,j+2}$ reaches zero in machine precision, and thus $\alpha > 0$ satisfying (19) does not exist. Hence, $\phi_k$ is not persistently exciting. The upper bound $\beta$ shown by the dashed line is chosen to satisfy (19). b) and c) show the singular values of $P_k^{-1}$ for $\lambda = 1$ and for a value of $\lambda \in (0,1)$, respectively. Note that, if $\lambda = 1$, then one of the singular values of $P_k^{-1}$ diverges, whereas, if $\lambda \in (0,1)$, then one of the singular values of $P_k^{-1}$ converges to zero.

Figure 3: Example 4. Convergence of $z_k$ and $\theta_k$. a) and b) show the singular values of $F_{j,j+10}$ for two choices of $u_k$. Note that the singular value of $F_{j,j+10}$ that is close to machine precision ($\approx 10^{-16}$) is essentially zero. Definition 1 thus implies that $(\phi_k)_{k=0}^\infty$ is not persistently exciting. c) and d) show the predicted error $z_k$ for both cases. Note that $z_k$ converges to zero in both cases. Finally, e) and f) show the parameter estimate $\theta_k$ for both cases. Note that, for both choices of input $u_k$, $\theta_k$ converges, but to different parameter values.

Figure 4: Example 5. Using the condition number of $P_k$ to evaluate persistency. a) shows the singular values of $F_{j,j+20}$, where the singular values of $F_{j,j+20}$ close to machine precision ($\approx 10^{-16}$) are essentially zero, thus implying that $(\phi_k)_{k=0}^\infty$ is not persistently exciting. b) and c) show the singular values and the condition number of $P_k$ for $\lambda = 1$. Note that the six singular values of $P_k$ decrease due to the presence of three harmonics in $u_k$. d) and e) show the singular values and the condition number of $P_k$ for a value of $\lambda \in (0,1)$. Note that the six singular values of $P_k$ remain bounded due to the presence of three harmonics in $u_k$. However, $P_k$ becomes ill-conditioned due to the lack of persistent excitation.

Figure 5: Example 6. Effect of $\lambda$ on the rate of convergence of $\theta_k$. a)-f) show the parameter error norm $\|\tilde\theta_k\|$ for several values of $P_0$ and $\lambda$. Note that the slope of $-1$ between $\log\|\tilde\theta_k\|$ and $\log k$ in d) is consistent with the fact that the rate of convergence of $\|\tilde\theta_k\|$ is $O(1/k)$ for $\lambda = 1$. Similarly, the slope of $\log\lambda$ between $\log\|\tilde\theta_k\|$ and $k$ in b) and c) is consistent with the fact that the rate of convergence of $\|\tilde\theta_k\|$ is $O(\lambda^k)$ for $\lambda \in (0,1)$. g), h), and i) show the condition number of the corresponding $P_k$ for several values of $P_0$ and $\lambda$. Note that, as $\lambda$ is decreased, the convergence rate of $\theta_k$ increases; however, the condition number of $P_k$ degrades, and the effect of $P_0$ is reduced.

Figure 6: Example 8. Subspace-constrained regressor. The first component of each vector is plotted along the horizontal axis, and the second component is plotted along the vertical axis. The singular values $\sigma_i(P_k)$ are shown with the corresponding singular vectors $u_{P,i}$. All regressors $\phi_k$ lie along the same one-dimensional subspace, and thus $(\phi_k)_{k=0}^\infty$ is not persistently exciting. Consequently, each estimate $\theta_k$ of $\theta$ lies in this subspace. The color gradient from yellow to blue of $\theta_k$ and $\tilde\theta_k$ shows the evolution from $k = 1$ to $k = 1000$. In a), where $\lambda = 1$, the singular value corresponding to the cyan singular vector decreases to zero, whereas the singular value corresponding to the magenta singular vector is bounded; note that $\tilde\theta_k$ converges along the singular vector corresponding to the bounded singular value. In b), where $\lambda \in (0,1)$, the singular value corresponding to the cyan singular vector is bounded, whereas the singular value corresponding to the magenta singular vector diverges; note that $\tilde\theta_k$ converges along the singular vector corresponding to the diverging singular value.

Figure 7: Example 9. Effect of lack of persistent excitation on $\theta_k$. a) shows the predicted error $z_k$, b) shows the norm of the parameter error $\tilde\theta_k$, c) shows the singular values of $P_k$, and d) shows the condition number of $P_k$. Note that six singular values of $P_k$ remain bounded due to the presence of three harmonics in the regressor. Due to finite-precision arithmetic, the computation becomes erroneous as $P_k$ becomes numerically ill-conditioned, and thus the estimate $\theta_k$ diverges.

Figure 8: Illustrative example of the information-rich subspace. Let $u_1$, $u_2$, and $u_3$ be the information directions (shown in blue). The regressor $\phi_1$ (shown in red) has new information along all three information directions, as shown by the nonzero values $\psi_{1,1}$, $\psi_{2,1}$, and $\psi_{3,1}$; the information-rich subspace is thus $\mathcal{R}([u_1 \; u_2 \; u_3])$. On the other hand, the regressor $\phi_2$ (shown in green) has new information only along $u_1$ and $u_2$, as shown by the nonzero values $\psi_{1,2}$ and $\psi_{2,2}$; the information-rich subspace is thus $\mathcal{R}([u_1 \; u_2])$.

Figure 9: Example 10. Relation between $P_k$ and the information content $\psi_k$. a), b), and c) show the information content ${\rm col}_i(\psi_k)$ for several values of $\lambda$. Note that, in each case, the information-rich subspace is six-dimensional due to the presence of three harmonics in $u_k$. d), e), and f) show the singular values of $P_k^{-1}$ for several values of $\lambda$. The inverse of the condition number of $P_k$ is shown in black. Note that, for $\lambda < 1$, the singular values of $P_k^{-1}$ corresponding to the singular vectors in the orthogonal complement of the information-rich subspace converge to zero.

Figure 10: Example 11. Variable-direction forgetting for a regressor lacking persistent excitation. a) and b) show the information content $\|\psi_k\|$ for two values of $\lambda \in (0,1)$. c) and d) show the singular values of $P_k^{-1}$ for the same two values of $\lambda$. The inverse of the condition number of $P_k$ is shown in black. Note that, for $\lambda < 1$, the singular values that correspond to the singular vectors not in the information-rich subspace do not converge to zero.

Figure 11: Example 12. Effect of variable-direction forgetting on $\theta_k$. a) shows the predicted error $z_k$, b) shows the norm of the parameter error $\tilde\theta_k$, c) shows the singular values of $P_k$, and d) shows the condition number of $P_k$. Note that all of the singular values of $P_k$ remain bounded.