An optimal linear filter for estimation of random functions in Hilbert space
AAn optimal linear filter for estimation of random functions inHilbert space
Phil Howlett and Anatoli TorokhtiAugust 31, 2020
Abstract
Let f be a square-integrable, zero-mean, random vector with observable realizations in a Hilbertspace H , and let g be an associated square-integrable, zero-mean, random vector with realizations,which are not observable, in a Hilbert space K . We seek an optimal filter in the form of a closed linearoperator X acting on the observable realizations of a proximate vector f (cid:15) ≈ f that provides the bestestimate (cid:98) g (cid:15) = X f (cid:15) of the vector g . We assume the required covariance operators are known. Theresults are illustrated with a typical example. Primary: 49J55, Secondary: 49K45, 60K40
Keywords and phrases: random functions, optimal estimation, linear operators, generalized inverseoperators
A common problem in engineering, applied mathematics and statistics is the estimation of a randomfunction g , whose realizations g ( ω ) are not observable, by using the observable realizations f ( ω ) of anassociated random function f . We consider the following problem. Problem 1.1
Let (Ω , Σ , µ ) be a probability space, H and K Hilbert spaces, and f ∈ L (Ω , H ) and g ∈ L (Ω , K ) square-integrable, zero-mean, random functions with respective observable and unobserv-able realizations f ( ω ) and g ( ω ) for each outcome ω ∈ Ω . Find a closed, densely defined, linear operator X : D ( X ) ⊆ H → K , a proximate observable function f (cid:15) for each (cid:15) > with E [ (cid:107) f (cid:15) − f (cid:107) ] < (cid:15) and f (cid:15) ( ω ) ∈ D ( X ) for µ -almost all ω ∈ Ω , and a corresponding estimate (cid:98) g (cid:15) = X f (cid:15) of the unobservablefunction, such that E [ (cid:107) X f (cid:15) − g (cid:107) ] = (cid:82) Ω (cid:107) X f (cid:15) ( ω ) − g ( ω ) (cid:107) µ ( d ω ) (1) is minimized. (cid:50) For each outcome ω ∈ Ω the realization r (cid:15) ( ω ) = X f (cid:15) ( ω ) − g ( ω ) of the error function is an element ofthe Hilbert space K . The value (cid:107) r (cid:15) ( ω ) (cid:107) is the square of the magnitude of the pointwise error. Theestimated overall error E [ (cid:107) X f (cid:15) − g (cid:107) ] in (1) is the mean or expected value of the square of the magnitudeof the pointwise error. The proximate observable function f (cid:15) must be close to the observable function f in the sense that the mean square observation error E [ (cid:107) f (cid:15) − f (cid:107) ] must be small. The outcomes f (cid:15) ( ω ) a r X i v : . [ m a t h . S T ] A ug ust lie in the domain space D ( X ) of the operator X for µ -almost all ω ∈ Ω . The pointwise estimate (cid:98) g (cid:15) = X f (cid:15) ∈ L (Ω , K ) of the unobservable function is defined by (cid:98) g (cid:15) ( ω ) = X f (cid:15) ( ω ) = X · f (cid:15) ( ω ) for each ω ∈ Ω . The linear operator X does not depend on the approximation parameter (cid:15) . We use the terms random vector and random function interchangeably, but for the most part, prefer the latter.We assume that the key covariance operators, the auto-covariance E ff and the cross-covariance E gf , areknown bounded linear operators. We expect to find a solution in the form (cid:98) g = X f where X = E gf E ff † and E ff † is the generalized inverse auto-covariance operator and this perception enables us to identifysome critical issues. The auto-covariance operator E ff is positive semi-definite, self-adjoint and compact.Therefore the spectral set is reduced to a countable collection of real non-negative eigenvalues. When thereare an infinite number of positive eigenvalues the auto-covariance is not bounded below and the range spaceis not a closed subspace. Therefore the generalized inverse auto-covariance E ff † is an unbounded linearoperator. Consequently the proposed solution X = E gf E ff † is also unbounded. Now there are two specificissues that must be resolved.In the first instance the usual justification for the solution assumes that the auto-covariance of the trans-formed observable function X f is given by the formula E X f ,X f = XE ff X ∗ . The usual justification is nolonger valid if the operator X is unbounded. The matter is resolved by writing X = T E ff † where T is abounded linear operator and then using an alternative argument to find an optimal value for T .In the second instance a solution in the form (cid:98) g = E gf E ff † f would require the observable function f to lie in the domain of the unbounded operator E ff † . This cannot be guaranteed. 
The difficulty canbe resolved by introducing a proximate observable function f (cid:15) , for each (cid:15) > , which must lie in thedomain of E ff † but needs to be close to f in the sense that the mean square error in the observed valuessatisfies E [ (cid:107) f (cid:15) − f (cid:107) ] < (cid:15) . The proposed solution now takes the form (cid:98) g (cid:15) = E gf E ff † f (cid:15) . This raises afurther question. How can we ensure that the operator—which does not depend on the approximationparameter—is still optimal for the proximate function? The answer is found by taking the proximatefunction as a partial sum of the Fourier series for the observable function. Let (Ω , Σ , µ ) denote a probability space where Ω is the set of outcomes, Σ a complete σ -field of measurablesubsets E ⊆ Ω and µ : Σ → [0 , an associated probability measure on Σ , with µ (Ω) = 1 . Each element ω ∈ Ω represents the outcome of an observation or experiment and each E ∈ Σ is a set of outcomes, calledan event. We say that the event E has occurred if ω ∈ E . Let f = f ( ω ) ∈ C m and g = g ( ω ) ∈ C n becomplex-valued random vectors with zero mean. That is, we assume µ f = (cid:82) Ω f ( ω ) µ ( d ω ) = and µ g = (cid:82) Ω g ( ω ) µ ( d ω ) = . We would like to estimate f from a knowledge of g . One might postulate a linear relationship in the form X f = g + r where X ∈ C n × m is an unknown matrix and r = r ( ω ) ∈ C n is a random error vector which isindependent of f and has zero mean. If so then after k realizations one would obtain a system of equations X [ f , . . . , f k ] = [ g , . . . , g k ] + [ r , . . . , r k ] ⇐⇒ XF = G + R (2)where we have written f j = f ( ω j ) , g j = g ( ω j ) , r j = r ( ω j ) where ω j is the outcome of the j th experimentand where F = [ f , . . . , f k ] ∈ C m × k , G = [ g , . . . , g k ] ∈ C n × k and R = [ r , . . . , r k ] ∈ C n × k . In general,because we have merely postulated a linear relationship, one would not expect this equation to be satisfiedexactly. Thus we seek to minimize the mean-square error (cid:80) kj =1 (cid:107) X f j − g j (cid:107) = (cid:80) kj =1 (cid:107) r j (cid:107) . Hence we2olve the system XF F ∗ = GF ∗ ⇔ X (cid:80) ki =1 f i f i ∗ = (cid:80) ki =1 g i f i ∗ . (3)We make a probabilistic interpretation of this equation by noting that E ff = E [ f f ∗ ] ∼ k (cid:80) ki =1 f i f i ∗ = 1 k F F ∗ (4)and E gf = E [ gf ∗ ] ∼ k (cid:80) ki =1 g i f i ∗ = 1 k GF ∗ (5)where E is the expectation operator and E ff ∈ C m × m and E gf ∈ C n × m are the standard auto-covarianceand cross-covariance matrices for zero-mean vectors. Thus we rewrite the equation for the best estimateof X in the form XE ff = E gf . (6) To extend the above analysis to random vectors in Hilbert space we must be able to define appropriatecovariance operators. Notice that (cid:104) E ff x , u (cid:105) ∼ k (cid:104) F F ∗ x , u (cid:105) = 1 k (cid:80) kj =1 (cid:104) u , f j (cid:105)(cid:104) f j , x (cid:105) ∼ E [ (cid:104) u , f (cid:105)(cid:104) f , x (cid:105) ] for each x , u ∈ C m and (cid:104) E gf x , y (cid:105) ∼ k (cid:104) GF ∗ x , y (cid:105) = 1 k (cid:80) kj =1 (cid:104) y , g j (cid:105)(cid:104) f j , x (cid:105) ∼ E [ (cid:104) y , g (cid:105)(cid:104) f , x (cid:105) ] for each x ∈ C m and y ∈ C n and also thattr ( E ff ) ∼ k tr ( F F ∗ ) = 1 k tr ( F ∗ F ) = 1 k (cid:80) ki =1 (cid:107) f i (cid:107) ∼ E [ (cid:107) f (cid:107) ] where E is the expectation operator. 
By taking the limit as the number of independent realizations tendsto infinity we obtain the basic theoretical relationships (cid:104) E ff x , u (cid:105) = E [ (cid:104) u , f (cid:105)(cid:104) f , x (cid:105) ] , (7) (cid:104) E gf x , y (cid:105) = E [ (cid:104) y , g (cid:105)(cid:104) f , x (cid:105) ] , (8)tr ( E ff ) = E [ (cid:107) f (cid:107) ] (9)for all x , u ∈ C m and y ∈ C n . We will take these as the definitive properties of the covariance operatorsfor the random vectors f and g . We illustrate our theoretical results by considering the problem of input retrieval in an infinite-dimensionallinear system. Our formal task is to find an optimal estimate of the system input from observations of thesystem output. The input is a random function g which is represented as a Fourier series with randomcoefficients. The output f is a random function where each realization f ( ω ) of the output is uniquelydetermined by the corresponding realization g ( ω ) of the input for some ω ∈ Ω . We assume there is3o independently generated noise to disrupt our observations of the output. This makes no substantialdifference to the methodology. The introduction of noise simply decreases the accuracy of the estimation.In our hypothetical example we consider a known system so that the required covariance operators E ff and E gf are also known. In practice it may be necessary to estimate these operators a priori in a controlledexperiment. Each observed output f ( ω ) is approximated by a truncated Fourier series f (cid:15) ( ω ) = f m ( ω ) forsome fixed m ∈ N and the input is then estimated using the formula (cid:98) g (cid:15) = E gf E ff † f (cid:15) ⇔ (cid:98) g m = E gf E ff † f m .See also [12] for an application to input retrieval in finite-dimensional linear control systems and [4, Section8.4.1, pp 261–262] for the extension of these ideas to infinite-dimensional systems.Our hypothetical example is a special case of a more general collection of so-called inverse problems. SeeCotter et al. [6] for an extended discussion of the underlying statistical theory of optimal estimation and acollection of particular inverse problems arising from data assimilation in fluid mechanics. In each appli-cation one assumes that the system evolves in a predominantly deterministic manner from some unknowninitial configuration and that the evolution is monitored either directly or indirectly by observation ofvarious output signals that may or may not be disrupted by random noise. The objective is to makeinference about the underlying velocity field. For problems without model error the inference is on theinitial conditions. For problems with model error the inference is on the initial conditions and on the driv-ing noise process or, equivalently, on the entire time-dependent velocity field. Cotter et al. [6] illustratetheir theoretical results by considering the velocity field for fluid flow generated by the two-dimensionalNavier–Stokes equation on a torus. They claim that the case of Eulerian observations—direct observa-tions of the velocity field itself—is then a model for weather forecasting and that the case of Lagrangianobservations—observations of passive tracers advected by the flow—is then a model for data arising inoceanography. We shall assume throughout the paper—unless stated otherwise—that
H, K are Hilbert spaces over thefield C of complex numbers, that (Ω , Σ , µ ) is a probability space, and that L (Ω , H ) and L (Ω , K ) are thespaces of square-integrable random functions taking values in H and K respectively.Let f ∈ L (Ω , H ) and g ∈ L (Ω , K ) be zero-mean random functions. We show that the auto-covariance E ff ∈ B ( H ) is a nuclear operator. If the range space E ff ( H ) ⊆ K is not closed we prove that thegeneralized inverse auto-covariance operator E ff † : D ( E ff † ) ⊆ H → H is an unbounded, closed, denselydefined, self-adjoint, linear operator. We also show that the cross-covariance E gf ∈ B ( H, K ) is well definedand that the null space of E ff is a subspace of the null space of E gf .Finally we show that there exists an optimal, closed, densely defined, linear operator X = E gf E ff † : D ( E ff † ) ⊆ H → K , a proximate observable function p = f (cid:15) ∈ L (Ω , M ) for each (cid:15) > with E [ (cid:107) p − f (cid:107) ] <(cid:15) and p ( ω ) ∈ D ( E ff † ) for µ -almost all ω ∈ Ω , and a corresponding optimal estimate (cid:98) g (cid:15) = X p ∈ L (Ω , K ) of the unobservable function g with mean square error E [ (cid:107) E gf E ff † p − g (cid:107) ] = tr ( E gg − E gp E pp † E pg ) . (10)The operator X = E gf E ff † minimizes the mean square error E [ (cid:107) X p − g (cid:107) ] over all closed, densely defined,linear operators X = T E ff † : D ( E ff † ) ⊆ H → K where T ∈ B ( H, K ) . The operator X does not dependon the parameter (cid:15) . The notation p = f (cid:15) is simply a device to avoid the use of a double subscript in (10).4 Structure of the paper
In Section 4 we review the previous work on this problem. In Section 5 we survey the necessary pre-liminary material. We need to know that every Hilbert space has an orthonormal basis. We state therelevant background theory [19, pp 86–87] and provide an example of a Hilbert space with an uncountableorthonormal basis. In Section 5.1 we introduce an important elementary nuclear operator. This materialis taken from [13] but is central to later definitions and we need to repeat it here. The necessary theory ofthe Bochner integral is summarized in Section 5.2. Once again we cite the text by Yosida [19, pp 130–134].The Hilbert space covariance operators are introduced and also justified in Section 6. We follow [13] butno longer assume that the Hilbert spaces are separable. It is necessary to show that the auto-covariance ispositive semi-definite and self-adjoint in order to extract a countable orthonormal basis for the orthogonalcomplement of the null space and thereby obtain an effective coordinate representation of the key operators.The material in Section 7 is new. We show that the auto-covariance operator is nuclear and hencealso compact. We define the generalized inverse auto-covariance operator and show that in the generalcase it is an unbounded, closed, densely defined, self-adjoint, linear operator. We also establish thestandard properties of the generalized inverse auto-covariance operator and derive key formulæ for theauto-covariance and cross-covariance of a specific linearly transformed random function that is used toestablish the main result. In Section 8 we show that the null space of the auto-covariance is a subspace ofthe null space of the cross-covariance.In Section 9 we establish our main result—the solution to Problem 1.1. The solution is presented intwo parts. Firstly we prove that a direct solution is possible if the observable function takes almost allvalues in the domain of the generalized inverse auto-covariance operator. Secondly we argue that thedirect solution is essentially preserved when the observable function is replaced by a suitable proximateobservable function. In Section 10 we establish a key result, Lemma 10.1, that relates to practical aspectsof the solution procedure. To conclude, in Section 11, we present a detailed study of a particular example.The example highlights typical difficulties that arise when the results are applied.
Let f ∈ L (Ω , C m ) and g ∈ L (Ω , C n ) be square-integrable, zero-mean, random vectors with realizations f ( ω ) ∈ C m and g ( ω ) ∈ C n in finite-dimensional Euclidean space. We assume that the covariance matrices E ff = E [ f f ∗ ] = (cid:90) Ω f ( ω ) f ( ω ) ∗ µ ( d ω ) ∈ C m × m and E gf = E [ gf ∗ ] = (cid:90) Ω g ( ω ) f ( ω ) ∗ µ ( d ω ) ∈ C n × m are known, where E denotes the expectation operator. If the matrix E ff − exists, then it has long beenknown [17] that the best linear mean-square estimate (cid:98) g = X f of the random vector g from the observeddata vector f is (cid:98) g = E gf E ff − f (11)with expected mean-square error E [ (cid:107) (cid:98) g − g (cid:107) ] = tr ( E gg − E gf E ff − E fg ) (12)5here tr ( · ) denotes the trace operator. In this case the optimal solution X = E gf E ff − ∈ C n × m is a finite-dimensional matrix and the linear mapping (cid:98) g = X f is defined by the relationship (cid:98) g ( ω ) = X f ( ω ) = X · f ( ω ) for all ω ∈ Ω . Strictly speaking one should define an operator L X ∈ B ( L (Ω , C m ) , L (Ω , C n )) by setting [ L X f ]( ω ) = X · f ( ω ) for each ω ∈ Ω . We prefer to write X f rather than L X f so that [ X f ]( ω ) = X · f ( ω ) for all ω ∈ Ω . However we note that there are bounded linear transformations F ∈ B ( L (Ω , C m ) , L (Ω , C n )) that cannot be written in this way.Yamashita and Ogawa [18] considered the special case f = g + r where f and r are independent randomvectors with realizations in a finite-dimensional Euclidean space. When the auto-covariance matrix E ff is singular they showed that an optimal estimate can be found in the form (cid:98) g = E ff E ff † f where E ff † isthe Moore–Penrose inverse [4, Definition 2.2, p 10]. The expected mean-square error in this special caseis E [ (cid:107) (cid:98) g − g (cid:107) ] = E [ (cid:107) r (cid:107) ] = tr ( E rr ) . Hua and Liu [14] improved this result by showing that the randomvectors f and g can lie in different spaces and that no special relationship between the two vectors isnecessary. The optimal estimate is now given by (cid:98) g = E gf E ff † f (13)with expected mean-square error E [ (cid:107) (cid:98) g − g (cid:107) ] = tr ( E gg − E gf E ff † E fg ) . (14)This solution was extended to random vectors taking values in different Hilbert spaces by Fomin andRuzhansky [9, Theorem 4.1] and by Howlett, Pearce and Torokhti [13, Theorem 3], independently, andat about the same time. In each case the authors assumed that the generalized inverse auto-covarianceoperator E ff † was a bounded linear operator. We make no such assumption here and propose a moregeneral solution procedure that allows the generalized inverse operator E ff † to be unbounded. Thisrelaxation has profound implications. See our earlier remarks in Sections 1 and 2. A substantial portion of the preliminary material in Sections 5.1 and 5.2 is reprised from [13]. We beginwith some basic facts about Hilbert space. In particular we need to know that every Hilbert space hasan orthonormal basis which may or may not be countable. We follow the presentation in Yosida [19, pp86–87].
Definition 5.1
A set S of vectors in a Hilbert space H is called an orthogonal set if (cid:104) x , u (cid:105) = 0 for all x , u ∈ S with x (cid:54) = u . If, in addition, (cid:107) x (cid:107) = 1 for all x ∈ S then we say the S is an orthonormal set. Anorthonormal set S of a Hilbert space H is called a complete orthonormal system or an orthonormal basisof H , if no orthonormal set of H contains S as a proper subset. (cid:50) Some authors say that a complete orthonormal set is a maximal orthonormal set. See Naylor and Sell [16,Definition 5.17.4, p 306].
Theorem 5.1
A Hilbert space H containing a non-zero vector has at least one complete orthonormalsystem. Moreover, if S is any orthonormal set in H , there is a complete orthonormal set containing S . (cid:50) Theorem 5.2
Let S = { x α } α ∈ A be a complete orthonormal system of a Hilbert space H . For any h ∈ H we define the Fourier coefficients of h with respect to S by h α = (cid:104) h , x α (cid:105) for each α ∈ A . Then we haveParseval’s relation (cid:107) h (cid:107) = (cid:80) α ∈ A | h α | . (cid:50) . orollary 5.1 Let S = { x α } α ∈ A be a complete orthonormal system in H . For each h ∈ H there is a count-able subset S h , + ⊆ S such that h α = (cid:104) h , x α (cid:105) (cid:54) = 0 for α ∈ S h , + and h α = (cid:104) h , x α (cid:105) = 0 for α ∈ S h , = S \ S h , + .If we write S h , + in the form S h , + = { x h ,j } j ∈ N for convenience then we have (cid:107) (cid:80) ∞ j = n +1 (cid:104) h , x h ,j (cid:105) x h ,j (cid:107) → as n → ∞ and we can represent h by the Fourier series h = (cid:80) j ∈ N (cid:104) h , x h ,j (cid:105) x h ,j . (cid:50) The following example is taken from Naylor and Sell [16, Example 10, p 320].
Example 5.1
The set AP of all complex-valued almost periodic functions f : R → C with the property lim T →∞ (1 /T ) (cid:82) [ − T,T ] | f ( t ) | dt < ∞ becomes a Hilbert space if we define an inner product (cid:104) f , g (cid:105) = lim T →∞ (1 /T ) (cid:82) [ − T,T ] f ( t ) g ( t ) dt for each f , g ∈ AP and an associated norm (cid:107) f (cid:107) = (cid:104) f , f (cid:105) / for each f ∈ AP . The set { e α } α ∈ R definedby e α ( t ) = e iαt for each t ∈ R forms an uncountable orthonormal basis for AP . (cid:50) For each h ∈ H define a corresponding linear operator J h ∈ B ( C , H ) by the formula J h z = z h . The rangespace J h ( C ) ⊆ H is a one-dimensional subspace spanned by h . The adjoint operator J h ∗ ∈ B ( H, C ) isdefined by the relationship zJ h ∗ x = (cid:104) J h ∗ x , z (cid:105) = (cid:104) x , J h z (cid:105) = (cid:104) x , z h (cid:105) = z (cid:104) x , h (cid:105) for all x ∈ H and z ∈ C and hence J h ∗ x = (cid:104) x , h (cid:105) for each x ∈ H . If x ⊥ h then J h ∗ x = 0 . If T ∈ B ( H, K ) and we define k = T h then J k ∈ B ( C , K ) and we have J k z = z k = zT h = T ( z h ) = T J h z for all z ∈ C . Thus J k = T J h . We also have J k ∗ = J h ∗ T ∗ ∈ B ( K, C ) and J k J k ∗ = T J h J h ∗ T ∗ ∈ B ( K ) . If h ∈ H and k ∈ K the operator J k J h ∗ ∈ B ( H, J k ( C )) is given by J k J h ∗ x = (cid:104) x , h (cid:105) k for each x ∈ H and so (cid:104) J k J h ∗ x , y (cid:105) = (cid:104) x , h (cid:105)(cid:104) k , y , (cid:105) for each x ∈ H and y ∈ K .We are particularly interested in the operator J h J h ∗ ∈ B ( H, J h ( C )) . Since J h ( C ) ⊆ H is a one-dimensionalsubspace it follows that J h J h ∗ is a compact operator [16, pp 379–381]. If x ∈ J h ( C ) then x = w h for some w ∈ C and so J h J h ∗ x = (cid:104) w h , h (cid:105) h = (cid:107) h (cid:107) w h = (cid:107) h (cid:107) x . Thus x is an eigenvector with correspondingeigenvalue (cid:107) h (cid:107) . If u ∈ J h ( C ) ⊥ then J h J h ∗ u = (cid:104) u , h (cid:105) h = and so u is an eigenvector with correspondingeigenvalue . Write H = J h ( C ) ⊕ J h ( C ) ⊥ . Define x = h / (cid:107) h (cid:107) and let { u α } α ∈ A be a complete orthonormalset in J h ( C ) ⊥ . The trace of the positive semi-definite, self-adjoint operator J h J h ∗ ∈ B ( H ) is given bytr ( J h J h ∗ ) = (cid:104) J h J h ∗ x , x (cid:105) + (cid:80) α ∈ A (cid:104) J h J h ∗ u α , u α (cid:105) = (cid:104) h , x (cid:105)(cid:104) x , h (cid:105) = (cid:104) h , h (cid:105) / (cid:107) h (cid:107) = (cid:107) h (cid:107) < ∞ . Thus J h J h ∗ is a nuclear or equivalently trace-class operator [5, 7, 19]. Let X be a Banach space over the field C of complex numbers with norm (cid:107) · (cid:107) : X → [0 , ∞ ) . We say thata function f : Ω → X is a vector-valued random function or simply a random function. The followingdefinitions and results have been extracted from the text by Yosida [19, pp 130–134].7 efinition 5.2 The random function f : Ω → X is said to be finitely valued if there exists a finitecollection of disjoint sets { E j } mj =1 ∈ Σ such that f ( ω ) = c j for each ω ∈ E j and each j = 1 , , . . . , m and f ( ω ) = elsewhere. In such cases we define the µ -integral of f by the formula (cid:82) Ω f ( ω ) µ ( d ω ) = (cid:80) mj =1 c j µ ( E j ) . (cid:50) Definition 5.3
The function f : Ω → X is strongly Σ -measurable if there exists a sequence { f n } n ∈ N offinitely-valued functions f n : Ω → H with (cid:107) f ( ω ) − f n ( ω ) (cid:107) → for µ -almost all ω ∈ Ω . (cid:50) Definition 5.4
The function f : Ω → X is Bochner µ -integrable if there exists a sequence { f n } n ∈ N offinitely-valued functions f n : Ω → X with (cid:107) f n ( ω ) − f ( ω ) (cid:107) → for µ -almost all ω ∈ Ω in such a way that lim n →∞ (cid:82) Ω (cid:107) f n ( ω ) − f ( ω ) (cid:107) µ ( d ω ) = 0 . For each set E ∈ Σ the Bochner µ -integral of f ( ω ) over S is defined by (cid:82) E f ( ω ) µ ( d ω ) = lim n →∞ (cid:82) Ω χ E ( ω ) f n ( ω ) µ ( d ω ) where χ E : Ω → { , } is the characteristic function for E given by χ E ( ω ) = 1 for ω ∈ E and χ E ( ω ) = 0 otherwise. (cid:50) Theorem 5.3
A strongly Σ -measurable function f : Ω → X is Bochner µ -integrable if and only if thefunction (cid:107) f (cid:107) : Ω → [0 , ∞ ) defined by (cid:107) f (cid:107) ( ω ) = (cid:107) f ( ω ) (cid:107) for all ω ∈ Ω is µ -integrable in which case (cid:107) (cid:82) E f ( ω ) µ ( d ω ) (cid:107) ≤ (cid:82) E (cid:107) f ( ω ) (cid:107) µ ( d ω ) for each E ∈ Σ . (cid:50) Corollary 5.2
Let X and Y be Banach spaces and suppose that T ∈ B ( X, Y ) . If the function f : Ω → X is Bochner µ -integrable then the function g = T f : Ω → Y defined by g ( ω ) = T f ( ω ) for µ -almost all ω ∈ Ω is Bochner µ -integrable with (cid:82) E g ( ω ) µ ( d ω ) = T (cid:82) E f ( ω ) µ ( d ω ) for each E ∈ Σ . (cid:50) Let f : Ω → X be a Bochner µ -integrable random function taking values in the Banach space X . Theexpected value of f is defined by E [ f ] = (cid:82) Ω f ( ω ) µ ( d ω ) and we note from Theorem 5.3 that (cid:107) E [ f ] (cid:107) ≤ E [ (cid:107) f (cid:107) ] . When T ∈ B ( X, Y ) is a bounded linear map fromthe Banach space X to the Banach space Y , it follows from Corollary 5.2 that E [ T f ] = T E [ f ] .The theory of random functions in Hilbert space is an extension of the corresponding theory in Banachspace. Of particular interest are those properties relating to the scalar product which are used directly indefining the special operators for the optimal filter. Let H be a Hilbert space with scalar product (cid:104)· , ·(cid:105) andlet f : Ω → H be a finitely-valued random function defined by f ( ω ) = (cid:80) mj =1 χ j ( ω ) c j where { E j } mj =1 aredisjoint µ -measurable sets and χ j : Ω → { , } is the characteristic function for E j for each j = 1 , . . . , m .Since (cid:107) u ( ω ) (cid:107) = (cid:80) mj =1 χ j ( ω ) (cid:107) c j (cid:107) , it follows that if T ∈ B ( H ) is a bounded linear map, then we can usethe elementary inequalities |(cid:104) c j , T [ c k ] (cid:105)| ≤ (cid:107) T (cid:107) · (cid:107) c j (cid:107) · (cid:107) c k (cid:107) and (cid:107) c j (cid:107) · (cid:107) c k (cid:107) ≤ (cid:2) (cid:107) c j (cid:107) + (cid:107) c k (cid:107) (cid:3) /
8o deduce that (cid:104) (cid:82) Ω f ( ω ) µ ( d ω ) , (cid:82) Ω T [ f ( ω )] µ ( d ω ) (cid:105) = (cid:80) mj =1 (cid:80) mk =1 µ ( E j ) µ ( E k ) (cid:104) c j , T [ c k ] (cid:105) = (cid:107) T (cid:107) (cid:80) mj =1 (cid:80) mk =1 µ ( E j ) µ ( E k ) · (cid:107) c j (cid:107) · (cid:107) c k (cid:107)≤ (cid:107) T (cid:107) (cid:80) mj =1 (cid:80) mk =1 µ ( E j ) µ ( E k ) (cid:107) · ( (cid:107) c j (cid:107) + (cid:107) c k (cid:107) ) / (cid:107) T (cid:107) (cid:80) mj =1 µ ( E j ) (cid:107) c j (cid:107) = (cid:107) T (cid:107) (cid:82) Ω (cid:107) f ( ω ) (cid:107) µ ( d ω ) . By taking appropriate limits, we can extend the above argument to establish the following general results,which are used to justify construction of the optimal filter.
Theorem 5.4
Let H be a Hilbert space. If the random function f : Ω → H is strongly Σ –measurableand (cid:107) f (cid:107) : Ω → [0 , ∞ ) is µ -integrable, then f is Bochner µ -integrable and for each bounded linear map T ∈ B ( H ) we have (cid:104) (cid:82) Ω f ( ω ) µ ( d ω ) , (cid:82) Ω T f ( ω ) µ ( d ω ) (cid:105) ≤ (cid:107) T (cid:107) (cid:82) Ω (cid:107) f ( ω ) (cid:107) µ ( d ω ) . (cid:50) Corollary 5.3 If f : Ω → H is strongly Σ -measurable and (cid:107) f (cid:107) : Ω → [0 , ∞ ) is µ -integrable, then (cid:107) (cid:82) Ω f ( ω ) µ ( d ω ) (cid:107) ≤ (cid:82) Ω (cid:107) f ( ω ) (cid:107) µ ( d ω ) . (cid:50) Theorem 5.4 and Corollary 5.3 can be expressed in terms of expected values. Let T ∈ B ( H ) and let f : Ω → X be a random function. If (cid:107) f (cid:107) : Ω → [0 , ∞ ) is µ -integrable, then (cid:104) E [ f ] , E [ T f ] (cid:105) ≤ (cid:107) T (cid:107) · E [ (cid:107) f (cid:107) ] and (cid:107) E [ f ] (cid:107) ≤ E [ (cid:107) f (cid:107) ] . If f : Ω → H is strongly Σ -measurable and E [ (cid:107) f (cid:107) ] < ∞ then we say that f ( ω ) is µ -square-integrable on Ω and we write f ∈ L (Ω , H ) . If f , f ∈ L (Ω , H ) and we define the inner product (cid:104)(cid:104) f , f (cid:105)(cid:105) = E [ (cid:104) f , f (cid:105) ] then L (Ω , H ) becomes a Hilbert space. For each f ∈ L (Ω , H ) we write ||| f ||| = (cid:104)(cid:104) f , f (cid:105)(cid:105) / = E [ (cid:107) f (cid:107) ] / for the corresponding norm. If x ∈ H and we define an associated constant function x : Ω → H by setting x ( ω ) = x for all ω ∈ Ω then ||| x ||| = E [ (cid:107) x (cid:107) ] = (cid:82) Ω (cid:107) x (cid:107) µ ( d ω ) = (cid:107) x (cid:107) . Thus x ∈ L (Ω , H ) . Similarly if x , u ∈ H then (cid:104)(cid:104) x , u (cid:105)(cid:105) = (cid:82) Ω (cid:104) x , u (cid:105) µ ( d ω ) = (cid:104) x , u (cid:105) . Thus we could regard H as a subspace of L (Ω , H ) . Suppose that f ∈ L (Ω , H ) is a random function with zero mean. For each ω ∈ Ω we have J f ( ω ) ∈ B ( C , H ) defined by J f ( ω ) z = z f ( ω ) for all z ∈ C and J f ( ω ) ∗ ∈ B ( H, C ) defined by J f ( ω ) ∗ x = (cid:104) x , f ( ω ) (cid:105) for each x ∈ H . Therefore J f ( ω ) J f ( ω ) ∗ ∈ B ( H ) for all ω ∈ Ω with J f ( ω ) J f ( ω ) ∗ x = (cid:104) x , f ( ω ) (cid:105) f ( ω ) and (cid:104) J f ( ω ) J f ( ω ) ∗ x , u (cid:105) = (cid:104) x , f ( ω ) (cid:105)(cid:104) f ( ω ) , u (cid:105) for all ω ∈ Ω and each x , u ∈ H . We also have J f ( ω ) ∗ J f ( ω ) ∈ B ( C ) for all ω ∈ Ω with J f ( ω ) ∗ J f ( ω ) z = z (cid:107) f ( ω ) (cid:107) for all ω ∈ Ω and each z ∈ C . If T ∈ B ( H, K ) then J T f ( ω ) = T J f ( ω ) and J T f ( ω ) ∗ = J f ( ω ) ∗ T ∗ forall ω ∈ Ω . To continue we must show that certain key functions are measurable.9 emma 6.1 Let x ∈ H and f ∈ L (Ω , H ) . If we define an associated random function p : Ω → H bysetting p ( ω ) = J f ( ω ) J f ( ω ) ∗ x = (cid:104) x , f ( ω ) (cid:105) f ( ω ) for all ω ∈ Ω then p is strongly Σ -measurable. (cid:50) Proof
Let { f n } n ∈ N be a sequence of finitely valued functions such that (cid:107) f n ( ω ) − f ( ω ) (cid:107) → as n → ∞ for µ -almost all ω ∈ Ω . Define p n : Ω → H by setting p n ( ω ) = (cid:104) x , f n ( ω ) (cid:105) f n ( ω ) for all ω ∈ Ω . Then { p n } n ∈ N is a sequence of finitely valued functions with (cid:107) p n ( ω ) − p ( ω ) (cid:107) = (cid:107)(cid:104) x , f n ( ω ) (cid:105) f n ( ω ) − (cid:104) x , f ( ω ) (cid:105) f ( ω ) (cid:107) = (cid:107)(cid:104) x , f n ( ω ) − f ( ω ) (cid:105) f n ( ω ) + (cid:104) x , f ( ω ) (cid:105) [ f n ( ω ) − f ( ω )] (cid:107)≤ (cid:107) x (cid:107) · (cid:107) f n ( ω ) − f ( ω ) (cid:107) · (cid:107) f n (cid:107) + (cid:107) x (cid:107) · (cid:107) f ( ω ) (cid:107) · (cid:107) f n ( ω ) − f ( ω ) (cid:107) → as n → ∞ for µ -almost all ω ∈ Ω . Therefore p is strongly Σ -measurable. (cid:50) Suppose that f ∈ L (Ω , H ) is a µ -square-integrable random function with zero mean. The inequality (cid:107) (cid:82) Ω (cid:104) x , f ( ω ) (cid:105) f ( ω ) µ ( d ω ) (cid:107) ≤ (cid:107) x (cid:107) (cid:82) Ω (cid:107) f ( ω ) (cid:107) µ ( d ω ) = (cid:107) x (cid:107) · ||| f ||| < ∞ justifies the definition of an operator E ff ∈ B ( H ) by setting E ff x = (cid:82) Ω J f ( ω ) J f ( ω ) ∗ x µ ( d ω ) = (cid:82) Ω (cid:104) x , f ( ω ) (cid:105) f ( ω ) µ ( d ω ) = E [ (cid:104) x , f (cid:105) f ] for all x ∈ H . Let T ∈ B ( H, K ) . We have T E ff T ∗ y = T (cid:82) Ω (cid:104) T ∗ y , f ( ω ) (cid:105) f ( ω ) µ ( d ω )= (cid:82) Ω (cid:104) T ∗ y , f ( ω ) (cid:105) T · f ( ω ) µ ( d ω )= (cid:82) Ω (cid:104) y , T f ( ω ) (cid:105) T f ( ω ) µ ( d ω )= E T f ,T f y for all y ∈ K . Thus we have E T f ,T f = T E ff T ∗ ∈ B ( K ) . We also have (cid:104) E ff x , u (cid:105) = (cid:82) Ω (cid:104) J f ( ω ) J f ( ω ) ∗ x , u (cid:105) µ ( d ω )= (cid:82) Ω (cid:104) x , f ( ω ) (cid:105)(cid:104) f ( ω ) , u (cid:105) µ ( d ω ) = E [ (cid:104) x , f (cid:105)(cid:104) f , u (cid:105) ] for all x , u ∈ H . Therefore (cid:104) E ff x , x (cid:105) = (cid:82) Ω (cid:104) J f ( ω ) J f ( ω ) ∗ x , x (cid:105) µ ( d ω )= (cid:82) Ω |(cid:104) x , f ( ω ) (cid:105)| µ ( d ω ) = E [ |(cid:104) x , f (cid:105)| ] ≥ and hence E ff is positive semi-definite and self-adjoint. We have the following elementary, but important,results. Lemma 6.2
Let f ∈ L (Ω , H ) and let x ∈ H . Then x ∈ E ff − ( { } ) if and only if (cid:104) x , f ( ω ) (cid:105) = 0 for µ -almost all ω ∈ Ω . (cid:50) Proof If x ∈ E ff − ( { } ) then E ff x = and so (cid:104) E ff x , x (cid:105) = (cid:82) Ω |(cid:104) x , f ( ω ) (cid:105)| µ ( d ω ) = 0 . Therefore (cid:104) x , f ( ω ) (cid:105) = 0 for µ -almost all ω ∈ Ω . Conversely if (cid:104) x , f ( ω ) (cid:105) = 0 for µ -almost all ω ∈ Ω then (cid:104) E ff x , u (cid:105) = (cid:82) Ω (cid:104) x , f ( ω ) (cid:105)(cid:104) f ( ω ) , u (cid:105) µ ( d ω ) = 0 for all u ∈ H . Therefore E ff x = and hence x ∈ E ff − ( { } ) . (cid:50) emma 6.3 Let f ∈ L (Ω , H ) . Then tr ( E ff ) = (cid:82) Ω tr ( J f ( ω ) J f ( ω ) ∗ ) µ ( d ω ) . (cid:50) Proof
Let { e β } β ∈ B be a complete orthonormal set in H . Sincetr ( E ff ) = (cid:80) β ∈ B (cid:104) E ff e β , e β (cid:105) = E [ (cid:107) f (cid:107) ] = ||| f ||| < ∞ there is at most a countable subset B + ⊆ B with (cid:104) E ff e β , e β (cid:105) > for each β ∈ B + . Lemma 6.2 shows that (cid:104) E ff e β , e β (cid:105) = 0 if and only if (cid:104) J f ( ω ) J f ( ω ) ∗ e β , e β (cid:105) = |(cid:104) e β , f ( ω ) (cid:105)| = 0 for µ -almost all ω ∈ Ω in which case β ∈ B = B \ B + . Now we havetr ( E ff ) = (cid:80) β ∈ B + (cid:104) E ff e β , e β (cid:105) = (cid:80) β ∈ B + (cid:82) Ω (cid:104) J f ( ω ) J f ( ω ) ∗ e β , e β (cid:105) µ ( d ω )= (cid:82) Ω (cid:80) β ∈ B + (cid:104) J f ( ω ) J f ( ω ) ∗ e β , e β (cid:105) µ ( d ω )= (cid:82) Ω tr ( J f ( ω ) J f ( ω ) ∗ ) µ ( d ω ) as required. (cid:50) Suppose that f ∈ L (Ω , H ) and g ∈ L (Ω , K ) are µ -square-integrable random functions with zero mean.By essentially repeating previous arguments we deduce that J g ( ω ) J f ( ω ) ∗ ∈ B ( H, K ) with J g ( ω ) J f ( ω ) ∗ x = (cid:104) x , f ( ω ) (cid:105) g ( ω ) ∈ K for all ω ∈ Ω and each x ∈ H . It follows that for fixed x ∈ H the function q : Ω → K defined by q ( ω ) = J g ( ω ) J f ( ω ) ∗ x = (cid:104) x , f ( ω ) (cid:105) g ( ω ) for all ω ∈ Ω is strongly Σ -measurable. Now theinequality (cid:107) (cid:82) Ω (cid:104) x , f ( ω ) (cid:105) g ( ω ) µ ( d ω ) (cid:107) ≤ (cid:107) x (cid:107) · ||| f ||| · ||| g ||| < ∞ justifies the definition of an operator E gf ∈ B ( H, K ) by the formula E gf x = (cid:82) Ω J g ( ω ) J f ( ω ) ∗ x µ ( d ω ) = (cid:82) Ω (cid:104) x , f ( ω ) (cid:105) g ( ω ) µ ( d ω ) = E [ (cid:104) x , f (cid:105) g ] for each x ∈ H . We also have (cid:104) J g ( ω ) J f ( ω ) ∗ x , y (cid:105) = (cid:104) x , f ( ω ) (cid:105)(cid:104) g ( ω ) , y (cid:105) for all ω ∈ Ω and each x ∈ H and y ∈ K and so (cid:104) E gf x , y (cid:105) = (cid:82) Ω (cid:104) J g ( ω ) J f ( ω ) ∗ x , y (cid:105) µ ( d ω )= (cid:82) Ω (cid:104) x , f ( ω ) (cid:105)(cid:104) g ( ω ) , y (cid:105) µ ( d ω ) = E [ (cid:104) x , f (cid:105)(cid:104) g , y (cid:105) ] for each x ∈ H and y ∈ K . If g , k ∈ L (Ω , K ) we can use the definitions and basic algebra to show that E g + k , g + k = E gg + E kg + E gk + E kk . The operator E ff ∈ B ( H ) is self-adjoint and positive semi-definite. Thus we can find a countable or-thonormal basis of eigenvectors { x α } α ∈ A + in E ff − { } ⊥ such that E ff x α = λ α x α where λ α > for all α ∈ A + . There is also an orthonormal basis { x α } α ∈ A in E ff − { } with E ff x α = for all α ∈ A . This11asis, which may be uncountable, is automatically a basis of eigenvectors. If we define A = A ∪ A + then { x α } α ∈ A is a complete set of orthonormal eigenvectors in H = E ff − { } ⊕ E ff − { } ⊥ . It follows thattr ( E ff ) = (cid:80) α ∈ A (cid:104) E ff x α , x α (cid:105) = (cid:80) α ∈ A + (cid:104) E ff x α , x α (cid:105) = (cid:80) α ∈ A + (cid:82) Ω |(cid:104) x α , f ( ω ) (cid:105)| µ ( d ω )= (cid:82) Ω (cid:80) α ∈ A + |(cid:104) x α , f ( ω ) (cid:105)| µ ( d ω )= (cid:82) Ω (cid:107) f ( ω ) (cid:107) µ ( d ω ) = E [ (cid:107) f (cid:107) ] = ||| f ||| < ∞ . Therefore E ff is nuclear and hence also compact [19, p 279]. Note thattr ( E ff ) = (cid:80) α ∈ A (cid:104) E ff x α , x α (cid:105) = (cid:80) α ∈ A + (cid:104) λ α x α , x α (cid:105) = (cid:80) α ∈ A + λ α . 
Consequently the operators E ff ∈ B ( H ) and E gf ∈ B ( H, K ) satisfy the definitive properties (cid:104) E ff x , u (cid:105) = E [ (cid:104) x , f (cid:105)(cid:104) f , u (cid:105) ] , (15) (cid:104) E gf x , y (cid:105) = E [ (cid:104) x , f (cid:105)(cid:104) g , y (cid:105) ] , (16)tr ( E ff ) = E [ (cid:107) f (cid:107) ] (17)for all x , u ∈ H and y ∈ K . Thus we can regard these operators as covariance operators. In this section we describe the generalized inverse auto-covariance operator. We use an orthonormal basisof eigenvectors to construct a Fourier series representation of the auto-covariance E ff and hence definethe generalized inverse auto-covariance E ff † . We establish the important properties and pay particularattention to the general case where E ff † is unbounded, closed, densely defined and self-adjoint.Let { x α } α ∈ A be a complete set of orthonormal eigenvectors for E ff in H with corresponding eigenvalues { λ α } α ∈ A . The set A + = { α | λ α > } is at most a countable set but the set A = A \ A + may beuncountable. For each x ∈ H write x = x + x + = (cid:80) α ∈ A (cid:104) x , x α (cid:105) x α + (cid:80) α ∈ A + (cid:104) x , x α (cid:105) x α ∈ E ff − { } ⊕ E ff − { } ⊥ and define a corresponding element u = E ff x ∈ E ff ( H ) by the formula u = E ff x + E ff x + = E ff x + = (cid:80) α ∈ A + λ α (cid:104) x , x α (cid:105) x α . Therefore u = (cid:80) α ∈ A + (cid:104) u , x α (cid:105) x α with (cid:104) u , x α (cid:105) = λ α (cid:104) x , x α (cid:105) for each α ∈ A + and so (cid:80) α ∈ A + λ α − |(cid:104) u , x α (cid:105)| = (cid:80) α ∈ A + |(cid:104) x , x α (cid:105)| = (cid:107) x + (cid:107) < ∞ . Conversely, if we are given u = (cid:80) α ∈ A + (cid:104) u , x α (cid:105) x α with (cid:80) α ∈ A + λ α − |(cid:104) u , x α (cid:105)| < ∞ then we can define x = (cid:80) α ∈ A + λ α (cid:104) u , x α (cid:105) x α ∈ H so that E ff x = u . Therefore u ∈ E ff ( H ) . It followsthat E ff ( H ) = { u ∈ H | (cid:80) α ∈ A + λ α − |(cid:104) u , x α (cid:105)| < ∞} ⊆ E ff − { } ⊥ . There are two cases to consider. If the index set A + is finite then for some m ∈ N we can write A + = { j ∈ N | j ≤ m } . In this case E ff ( H ) = E ff − { } ⊥ is finite dimensional and closed, and the problem hasalready been solved [9, 13]. Henceforth we assume that A + is infinite and write A + = N with eigenvectors12 x j } j ∈ N and corresponding eigenvalues { λ j } j ∈ N ordered in such a way that λ j ≥ λ j +1 > . Now let D ( E ff † ) = E ff ( H ) ⊕ E ff − { } and define E ff † : D ( E ff † ) → E ff − { } ⊥ by setting E ff † u = (cid:80) j ∈ N λ j − (cid:104) u , x j (cid:105) x j for each u ∈ D ( E ff † ) . We will use the above notation for the eigenvectors and eigenvalues throughoutSection 7 without further comment. We will show that the domain D ( E ff † ) is not closed. Our definition of E ff † is a natural definition. If u ∈ E ff ( H ) then there is a unique point x ∈ E ff − ( { } ) ⊥ such that u = E ff x . Hence we can define E ff † u = x . If u ∈ E ff ( H ) ⊥ we define E ff † u = . We begin by showing that E ff ( H ) is not closed. Weneed to find { u n } n ∈ N ⊆ E ff ( H ) and u / ∈ E ff ( H ) such that (cid:107) u n − u (cid:107) → as n → ∞ .To do this we need to construct a series (cid:80) j ∈ N κ j that converges more slowly than (cid:80) j ∈ N λ j . The followingconstruction is taken from [3]. Define ρ j = (cid:80) ∞ k = j λ k for each j ∈ N and define κ j = λ j / √ ρ j . 
On the onehand κ j /λ j = 1 / √ ρ j → ∞ as j → ∞ and on the other hand (cid:80) j ∈ N κ j = (cid:80) j ∈ N ( ρ j − ρ j +1 ) / √ ρ j = (cid:80) j ∈ N ( √ ρ j − √ ρ j +1 )( √ ρ j + √ ρ j +1 ) / √ ρ j ≤ (cid:80) j ∈ N √ ρ j − √ ρ j +1 ) = 2 √ ρ < ∞ . Thus (cid:80) j ∈ N κ j is the desired series. Since (cid:80) j ∈ N ( κ j /κ ) < (cid:80) j ∈ N κ j /κ < ∞ we can define u = (cid:80) j ∈ N κ j x j ∈ H . If we also define x n = (cid:80) nj =1 ( κ j /λ j ) x j and u n = E ff x n = (cid:80) nj =1 κ j x j ∈ E ff ( H ) for each n ∈ N then (cid:107) u n − u (cid:107) → as n → ∞ . However { x n } n ∈ N does not converge. Therefore u / ∈ E ff ( H ) . Equivalently wemay say that u n ∈ D ( E ff † ) with u n → u ∈ H as n → ∞ but with x n = E ff † u n ∈ H for each n ∈ N suchthat { x n } n ∈ N diverges. Thus D ( E ff † ) is not closed. We will show that E ff † is unbounded, closed, densely defined and self-adjoint.The operator E ff † is unbounded because E ff † x j = λ j − x j for each j ∈ N with λ j → as j → ∞ .The following argument shows that E ff † is closed. Let { u n } n ∈ N ⊆ D ( E ff † ) . Write u n = (cid:80) j ∈ N (cid:104) u n , x j (cid:105) x j and E ff † u n = (cid:80) j ∈ N λ j − (cid:104) u n , x j (cid:105) x j for each n ∈ N . Now suppose that (cid:107) u n − u (cid:107) = (cid:80) j ∈ N |(cid:104) u n , x j (cid:105) − (cid:104) u , x j (cid:105)| → for some u ∈ H and that (cid:107) E ff † u n − x (cid:107) = (cid:80) j ∈ N | λ j − (cid:104) u n , x j (cid:105) − (cid:104) x , x j (cid:105)| → as n → ∞ for some x ∈ H . Therefore (cid:80) j ∈ N |(cid:104) u n , x j (cid:105) − λ j (cid:104) x , x j (cid:105)| = (cid:80) j ∈ N λ j | λ j − (cid:104) u n , x j (cid:105) − (cid:104) x , x j (cid:105)| ≤ λ (cid:80) j ∈ N | λ j − (cid:104) u n , x j (cid:105) − (cid:104) x , x j (cid:105)| → n → ∞ . Hence u n → u = (cid:80) j ∈ N λ j (cid:104) x , x j (cid:105) x j . Now we have E ff † u = x as required. Thus E ff † is aclosed operator.We show that D ( E ff † ) is dense in H . For each u = (cid:80) α ∈ A (cid:104) u , x α (cid:105) x α ∈ H we can define a sequence { u n } n ∈ N ⊆ D ( E ff † ) by setting u n = (cid:80) nj =1 (cid:104) u , x j (cid:105) x j + (cid:80) α ∈ A (cid:104) u , x α (cid:105) x α (18)such that (cid:107) u n − u (cid:107) = (cid:80) ∞ j = n +1 |(cid:104) u , x j (cid:105)| → as n → ∞ . Thus E ff † is densely defined.Finally we show that E ff † is self-adjoint. Suppose u , v ∈ D ( E ff † ) . If we write u = (cid:80) α ∈ A (cid:104) u , x α (cid:105) x α and v = (cid:80) α ∈ A (cid:104) v , x α (cid:105) x α then we have E ff † u = (cid:80) j ∈ N λ j − (cid:104) u , x j (cid:105) x j and E ff † v = (cid:80) j ∈ N λ j − (cid:104) v , x j (cid:105) x j . Consequently (cid:104) E ff † u , v (cid:105) = (cid:80) j ∈ N λ j − (cid:104) u , x j (cid:105)(cid:104) x j , v (cid:105) = (cid:80) j ∈ N λ j − (cid:104) v , x j (cid:105) (cid:104) x j , u (cid:105) = (cid:104) E ff † v , u (cid:105) = (cid:104) u , E ff † v (cid:105) . Thus E ff † is self-adjoint. We justify our definitions by showing that E ff † satisfies the standard properties associated with a general-ized inverse operator. Let u = (cid:80) α ∈ A (cid:104) u , x α (cid:105) x α ∈ H and let { u n } n ∈ N ⊆ H be the sequence defined abovein (18) with u n ∈ D ( E ff † ) for all n ∈ N and (cid:107) u n − u (cid:107) → as n → ∞ . 
Since E ff E ff † u n = (cid:80) j ∈ N (cid:104) u n , x j (cid:105) x j we can define E ff E ff † u = lim n →∞ (cid:80) j ∈ N (cid:104) u n , x j (cid:105) x j = (cid:80) j ∈ N (cid:104) u , x j (cid:105) x j ∈ E ff − { } ⊥ . Therefore (cid:104) E ff E ff † u , v (cid:105) = (cid:80) j ∈ N (cid:104) u , x j (cid:105)(cid:104) x j , v (cid:105) for each u , v ∈ H . For each v ∈ H we have E ff v = (cid:80) j ∈ N λ j (cid:104) v , x j (cid:105) x j ∈ E ff ( H ) ⊆ D ( E ff † ) . It follows that E ff † E ff v = (cid:80) j ∈ N (cid:104) v , x j (cid:105) x j ∈ E ff − { } ⊥ andhence that (cid:104) E ff † E ff v , u (cid:105) = (cid:80) j ∈ N (cid:104) v , x j (cid:105)(cid:104) x j , u (cid:105) for each u , v ∈ H . A similar argument to that used in the previous section now shows that (cid:104) [ E ff † E ff ] ∗ u , v (cid:105) = (cid:104) E ff † E ff u , v (cid:105) for all u , v ∈ H .We can now see that the operator E ff † : D ( E ff † ) → H has the following properties.1. E ff E ff † E ff = E ff ∈ B ( H ) .2. E ff † E ff E ff † = E ff † : D ( E ff † ) → H .3. [ E ff E ff † ] ∗ = E ff E ff † ∈ B ( H ) .4. [ E ff † E ff ] ∗ = E ff † E ff ∈ B ( H ) . 14 .4 Some specific identities. Let T ∈ B ( H, K ) and suppose that T ∗ y ∈ D ( E ff † ) and that f ( ω ) ∈ D ( E ff † ) for µ -almost all ω ∈ Ω . Wehave E ff ( E ff † T ∗ y ) = (cid:82) Ω (cid:104) E ff † T ∗ y , f ( ω ) (cid:105) f ( ω ) µ ( d ω )= (cid:82) Ω (cid:104) T ∗ y , E ff † f ( ω ) (cid:105) f ( ω ) µ ( d ω )= (cid:82) Ω (cid:104) y , T E ff † f ( ω ) (cid:105) f ( ω ) µ ( d ω ) because E ff † is self-adjoint. Therefore T E ff † E ff E ff † T ∗ y = (cid:82) Ω (cid:104) y , T E ff † f ( ω ) (cid:105) T E ff † f ( ω ) µ ( d ω ) = E kk y where we have written k = T E ff † f for convenience. Therefore we have E kk = T E ff † E ff E ff † T ∗ = T E ff † T ∗ . Similar arguments can be used to show that E gk = E gf E ff † T ∗ and E kg = T E ff † E fg . Theproof of the main result makes use of these specific identities. The next two results are important to the solution of Problem 1.1. We show that the null space of E ff isa subspace of the null space of E gf and hence deduce that E gf = E gf E ff † E ff . Lemma 8.1
Let P = E ff − { } and Q = E gf − { } denote the null spaces of E ff and E gf respectively.Then P ⊆ Q ⊆ H . (cid:50) Proof
Let u ∈ P . Then E [ |(cid:104) u , f (cid:105)| ] = E [ (cid:104) u , f (cid:105)(cid:104) f , u (cid:105) ] = (cid:104) E ff u , u (cid:105) = 0 . For each v ∈ K it follows that |(cid:104) E gf u , v (cid:105)| = | E [ (cid:104) u , f (cid:105)(cid:104) g , v (cid:105) ] | ≤ E [ |(cid:104) u , f (cid:105)| ] / E [ |(cid:104) g , v (cid:105)| ] / = 0 . Therefore E gf u = . Hence u ∈ Q . (cid:50) Corollary 8.1
Let
H, K be Hilbert spaces with f ∈ L (Ω , H ) and g ∈ L (Ω , K ) . We have E gf ( I − E ff † E ff ) = 0 ⇐⇒ E gf = E gf E ff † E ff . (cid:50) Proof
Let x ∈ H and write x = (cid:80) j ∈ N (cid:104) x , x j (cid:105) x j + (cid:80) α ∈ A \ A + (cid:104) x , x α (cid:105) x α . We know that E ff † E ff x = (cid:80) j ∈ N (cid:104) x , x j (cid:105) x j . Therefore ( I − E ff † E ff ) x = (cid:80) α ∈ A \ A + (cid:104) x , x α (cid:105) x α ∈ E ff − { } for all x ∈ H from which it follows that E gf ( I − E ff † E ff ) = 0 . (cid:50) Solution of the general estimation problem
Let us return to the original problem. Let f ∈ L (Ω , H ) and g ∈ L (Ω , K ) be random functions withzero means. We wish to find a closed, densely defined, linear operator X : D ( X ) ⊆ H → K , a proximateobservable function f (cid:15) for each (cid:15) > , with E [ (cid:107) f (cid:15) − f (cid:107) ] < (cid:15) and f (cid:15) ( ω ) ∈ D ( X ) for µ -almost all ω ∈ Ω ,and a corresponding estimate (cid:98) g (cid:15) = X f (cid:15) such that the mean square error E [ (cid:107) X f (cid:15) − g (cid:107) ] is minimized.Suppose f ( ω ) ∈ D ( E ff † ) for µ -almost all ω ∈ Ω and let X : D ( E ff † ) ⊆ H → K be defined by X = T E ff † for some T ∈ B ( H, K ) . Take f (cid:15) = f and let r = X f − g = T E ff † f − g = k − g . Now E [ (cid:107) r (cid:107) ] = tr ( E rr )= tr ( E kk − E gk − E kg + E gg )= tr ( T E ff † T ∗ − E gf E ff † T ∗ − T E ff † E fg + E gg )= tr (( T − E gf ) E ff † E ff E ff † ( T ∗ − E fg )) + tr ( E gg − E gf E ff † E fg )= tr ( E vv ) + tr ( E gg − E gf E ff † E fg ) where we have written v = ( T − E gf ) E ff † f ∈ L (Ω , K ) . Therefore E [ (cid:107) r (cid:107) ] = E [ (cid:107) v (cid:107) ] + tr ( E gg − E gf E ff † E fg )= E [ (cid:107) ( T − E gf ) E ff † f (cid:107) ] + tr ( E gg − E gf E ff † E fg ) . Thus the minimum occurs when ( T − E gf ) E ff † f ( ω ) = for µ -almost all ω ∈ Ω . Hence we choose T = E gf + B ( I − E ff E ff † ) where B ∈ B ( H, K ) is arbitrary. Therefore X = E gf E ff † . The minimumvalue of the expected mean-square error is E [ (cid:107) E gf E ff † f − g (cid:107) ] = tr ( E gg − E gf E ff † E fg ) . Since X = E gf E ff † we may assume D ( X ) = D ( E ff † ) . Therefore X is closed and densely defined.Now suppose there is a set S with µ ( S ) > and f ( ω ) / ∈ D ( E ff † ) for ω ∈ S . Let { x j } j ∈ N be a completeset of orthonormal eigenvectors for E ff in E ff − ( { } ) ⊥ . Let n ∈ N and define a proximate observablefunction p = f n by setting f n ( ω ) = (cid:80) nj =1 (cid:104) x j , f ( ω ) (cid:105) x j for each ω ∈ Ω . Thus (cid:104) x j , p ( ω ) (cid:105) = (cid:26) (cid:104) x j , f ( ω ) (cid:105) for j ≤ n otherwise . Since E ff x j = λ j x j it follows that E ff ( (cid:80) nj =1 λ j − (cid:104) x j , f ( ω ) (cid:105) x j ) = (cid:80) nj =1 (cid:104) x j , f ( ω (cid:105) x j = p ( ω ) and so p ( ω ) ∈ E ff ( H ) ⊆ D ( E ff † ) for all ω ∈ Ω . Therefore the corresponding optimal estimate using p = f n rather than f is given by (cid:98) g n = E gp E pp † p with error tr ( E gg − E gp E pp † E pg ) . Now, for j ≤ n , wehave E ff x j = (cid:82) Ω (cid:104) x j , f ( ω ) (cid:105) x j µ ( d ω ) = (cid:82) Ω (cid:104) x j , p ( ω ) (cid:105) x j µ ( d ω ) = E pp x j . Therefore E pp x j = λ j x j for each j ≤ n and so E pp † x j = λ j − x j = E ff † x j . Now E gp x j = (cid:82) Ω (cid:104) x j , p ( ω ) (cid:105) g ( ω ) µ ( d ω ) = (cid:82) Ω (cid:104) x j , f ( ω ) (cid:105) g ( ω ) µ ( d ω ) = E gf x j for j ≤ n . It follows, by linearity, that E gp E pp † p ( ω ) = E gp E ff † p ( ω )= (cid:80) nj =1 (cid:104) x j , f ( ω ) (cid:105) E gp E ff † x j = (cid:80) nj =1 λ j − (cid:104) x j , f ( ω ) (cid:105) E gp x j = (cid:80) nj =1 λ j − (cid:104) x j , f ( ω ) (cid:105) E gf x j = E gf ( (cid:80) nj =1 λ j − (cid:104) x j , f ( ω ) (cid:105) x j ) = E gf E ff † p ( ω ) ω ∈ Ω . Since p = f n the corresponding optimal estimate can now be written as (cid:98) g n ( ω ) = E gf E ff † f n ( ω ) for all ω ∈ Ω . Thus we may take X = E gf E ff † as before. 
The only difference is that we re-place f by f n for some suitably large value of n ∈ N . Note that E [ (cid:107) f n − f (cid:107) ] = (cid:80) ∞ j = n +1 (cid:82) Ω |(cid:104) x j , f ( ω ) (cid:105)| µ ( d ω ) → as n → ∞ .
10 A practical solution procedure
In practice we may be restricted to observing a projected component p ( ω ) = P · f ( ω ) of the outcome f ( ω ) where P ∈ B ( H ) is an orthogonal projection onto a closed subspace M = P ( H ) ⊆ H . We wouldlike to relate the restricted optimal estimate to the true optimal estimate. Lemma 10.1
Let
H, K be Hilbert spaces and let P ∈ B ( H ) be an orthogonal projection onto the closedsubspace M = P ( H ) . Let f ∈ L (Ω , H ) and g ∈ L (Ω , K ) be zero-mean random functions with p ( ω ) = P · f ( ω ) ∈ M and q ( ω ) = ( I − P ) · f ( ω ) ∈ M ⊥ the respective observable and unobservable componentsof f ( ω ) for each ω ∈ Ω . If we define r = q − E qp E pp † p we can rewrite the equation XE ff = E gf where X = E gf E ff † : D ( E ff † ) ⊆ H → K in the form (cid:2) Y Z (cid:3) (cid:20) E pp E qp E rr (cid:21) = (cid:2) E gp E gr (cid:3) (19) where Y : D ( E ff † ) ∩ M ⊆ M → K and Z : D ( E ff † ) ∩ M ⊥ ⊆ M ⊥ → K are given by Y = ( E gp − E gr E rr † E qp ) E pp † and Z = E gr E rr † . The optimal estimate for g is (cid:98) g = E gp E pp † p + E gr E rr † r = (cid:98) g M + r M (20) where (cid:98) g M = E gg E pp † p is the restricted optimal estimate. The components (cid:98) g M and r M are uncorrelatedand the error in the restricted estimate is E [ (cid:107) (cid:98) g M − g (cid:107) ] = E [ (cid:107) (cid:98) g − g (cid:107) ] + tr ( E gr E rr † E rg ) . (21) (cid:50) Proof
The equation XE ff = E gf is equivalent to the equation (cid:2) Y Z (cid:3) · (cid:20) E pp E pq E qp E qq (cid:21) (cid:20) I − E pp † E pq I (cid:21) = (cid:2) E gp E gq (cid:3) (cid:20) I − E pp † E pq I (cid:21) . If we evaluate the matrix products and use the identities E qp = E qp E pp † E pp and E rr = E qq − E qp E pp † E pq we obtain (19). Solving ZE rr = E gr gives Z = E gr E rr † and solving Y E pp E pq + ZE qp = E gp gives Y = E gp E pp † − ZE qp E pp † . Substituting for Z shows that Y = E gp E pp † − E gr E rr † E qp E pp † as required.Hence (cid:98) g = Y p + Z q = E gp E pp † p + E gr E rr † ( q − E qp E pp † p )= E gp E pp † p + E gr E rr † r which is (20). We note that E pr = E pq − E pp E pp † E pq = 0 which shows that the components (cid:98) g M and r M are uncorrelated. We know from the previous section that (cid:98) g = E gf E ff † f and so (20) gives E gf E ff † f = E gp E pp † p + E gr E rr † r . E gf E ff † E fg = E gp E pp † E pg + E gr E rr † E rg . Now we can use this relationship and the known error estimates E [ (cid:107) (cid:98) g − g (cid:107) ] = tr ( E gg − E gf E ff † E fg ) and E [ (cid:107) (cid:98) g M − g (cid:107) ] = tr ( E gg − E gp E pp † E pg ) to deduce (21). (cid:50)
11 A hypothetical example
The functions ϕ, ψ : ( − π, π ) → R defined by ϕ ( t ) = π sgn ( t ) / and ψ ( t ) = t/ can be represented by theFourier series ϕ ( t ) ∼ (cid:80) k ∈ N − k sin kt and ψ ( t ) ∼ (cid:80) j ∈ N ( − j +1 j sin jt. Equivalently we may represent these functions as elements of the Hilbert space (cid:96) by the vectors ϕ ∼ / / ... and ψ ∼ − / / − / / ... . Define a hypothetical experiment with outcomes ω = { ω j } j ∈ N ∈ (cid:96) ∞ where the coordinates ω j ∈ R foreach j ∈ N are independent identically distributed random variables with cumulative distribution function F : [ − , → [0 , defined by F ( t ) = t/ / . Let f , g ∈ L ( (cid:96) ∞ , (cid:96) ) be random functions with f = ω + ω ω + ω ) / ω + ω ) / ... and g = ω − ω / ω / − ω / ω / ... . The self-adjoint operator E ff can be represented by an infinite matrix E ff = [ ff ij ] where ff = 2 / , ff = 1 / , ff k,k − = 1 / [3 k ( k − , ff k,k = 2 / [3 k ] , and ff k,k +2 = 1 / [3 k ( k + 2)] for each k ∈ N + 1 , and ff ij = 0 otherwise. The operator E gf can be represented by an infinite matrix E gf = [ gf ij ] where gf = 1 / , gf j, j − = ( − j − / [3 j (2 j − and gf j, j − = ( − j − / [3 j (2 j − for all j ∈ N + 1 , and gf ij = 0 otherwise. Despite the structural simplicity of E ff it is a non-trivial task tocalculate E ff † . We can use elementary row operations to reduce the operator matrix to upper triangular18orm U ff = / / · · · · · · /
18 0 1 /
45 0 0 0 0 · · · · · · /
225 0 1 /
105 0 0 · · · · · · /
588 0 1 / · · · · · · / · · · ... ... ... ... ... ... ... ... ... . . . but a general formula for the elements on the leading diagonal is far from obvious. We can gain someinsight into the general calculation if we write E ff = c c ∗ + c c ∗ + c c ∗ + · · · where we define c j − = c j − , j − e j − + c j +1 , j − e j +1 for each j ∈ N and { e k } k ∈ N are the standardbasis vectors. If we now equate coefficients we can see that c = ff , c c = ff , c + c = ff , c c = ff , c + c = ff , c c = ff , c + c = ff , . . . and so on. Solving these equations gives c = 2 / , c = 1 / , c = 1 / , c = 2 / , c = 4 / , c = 1 / , c = 5 / , . . . and so on.This suggests that the process actually defines the diagonal elements of the reduced matrix. It turns outthat it also defines the elementary row operations. The coefficients c k +1 , k +1 and c k +1 , k − are defined bythe recursions c k +1 , k +12 = ff k +1 , k +1 − ff k +1 , k − c k − , k − and c k +1 , k − = ff k +1 , k − c k − , k − for each k ∈ N + 1 with c = 2 / and c = 1 / . If we define a sequence of lower triangular elementaryoperator matrices L k = [ l k,ij ] by setting l k,ii = 1 , l k, k +1 , k − = ( − c k +1 , k − for each k ∈ N and l k,ij = 0 otherwise, then we have L k − · · · L L · E ff · L ∗ L ∗ · · · L k − ∗ = (cid:20) D ff , [1 , k ] E ff , [2 k +1 , ∞ ) (cid:21) where D ff , [1 , k ] = [ d ij ] ∈ C k × is a diagonal matrix with d (cid:96) − , (cid:96) − = c (cid:96) − , (cid:96) − for (cid:96) ∈ N and d ij = 0 otherwise, and where E ff , [2 k +1 , ∞ ) denotes the operator matrix formed by deletingthe first k rows and columns from E ff . If we define M k = L k − then it can be seen that E ff † = M M · · · M k − (cid:20) D ff , [1 , k ] † E ff , [2 k +1 , ∞ ) † (cid:21) M k − ∗ · · · M ∗ M ∗ = (cid:20) E ff , [1 , k ] † E ff , [2 k +1 , ∞ ) † (cid:21) for each k ∈ N . We know from the operator matrix representation of E ff that the trace is given bytr ( E ff ) = 2 / (cid:2) + 1 / + 1 / + · · · (cid:3) = π / < ∞ . E ff is a nuclear operator and hence E ff † is closed and unbounded. Some elementary algebrausing M atlab now suggests that we can represent the generalized inverse operator E ff † in infinite matrixform as E ff † = − − · · · · · ·− −
Some elementary algebra using Matlab now suggests that we can represent the generalized inverse operator $E_{ff}^\dagger$ in infinite matrix form as
\[
E_{ff}^\dagger = \begin{bmatrix}
3 & 0 & -9 & 0 & 15 & 0 & -21 & \cdots \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
-9 & 0 & 54 & 0 & -90 & 0 & 126 & \cdots \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
15 & 0 & -90 & 0 & 225 & 0 & -315 & \cdots \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
-21 & 0 & 126 & 0 & -315 & 0 & 588 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\]
where $\mathit{ff}^{\,\dagger}_{2k-1,2\ell-1} = (-1)^{k+\ell} \cdot 3\min\{k,\ell\} \cdot (2k-1)(2\ell-1)$ for $k, \ell \in \mathbb{N}$ and $\mathit{ff}^{\,\dagger}_{ij} = 0$ otherwise. Now the matrix representation for $X = E_{gf} E_{ff}^\dagger$ is given by
\[
X = [x_{ij}] = \begin{bmatrix}
1 & 0 & -3 & 0 & 5 & 0 & -7 & \cdots \\
0 & 0 & -\tfrac{3}{2} & 0 & \tfrac{5}{2} & 0 & -\tfrac{7}{2} & \cdots \\
0 & 0 & 0 & 0 & \tfrac{5}{3} & 0 & -\tfrac{7}{3} & \cdots \\
0 & 0 & 0 & 0 & 0 & 0 & -\tfrac{7}{4} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\]
where $x_{11} = 1$ and, more generally, $x_{i,2\ell-1} = (-1)^{\ell+1}(2\ell-1)/i$ for $\ell \geq i$ with $i, \ell \in \mathbb{N}$, and $x_{ij} = 0$ otherwise. The matrix representation for $X : D(E_{ff}^\dagger) \to K$ shows that it is unbounded and so we must be careful when calculating images for elements that are not in $D(E_{ff}^\dagger)$.
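These closed-form entries can be verified numerically on a finite truncation. The sketch below is our own illustration in Python with numpy (the truncation size n = 40 and all variable names are ours, not from the paper). It builds the restriction of $E_{ff}$ and $E_{gf}$ to the first n odd coordinates, checks that the matrix with entries $3(-1)^{k+\ell}\min\{k,\ell\}(2k-1)(2\ell-1)$ inverts $E_{ff}$ away from the truncation boundary, and compares $E_{gf}E_{ff}^\dagger$ with the stated formula for $x_{i,2\ell-1}$.

```python
import numpy as np

n = 40                               # number of odd coordinates 1, 3, ..., 2n-1 retained
k = np.arange(1, n + 1)              # 1-based labels of the odd coordinates

# Restriction of E_ff to the odd coordinates: A[i-1, j-1] = E[f_{2i-1} f_{2j-1}].
A = np.zeros((n, n))
for i in range(1, n + 1):
    A[i - 1, i - 1] = 2 / (3 * (2 * i - 1) ** 2)
    if i < n:
        A[i - 1, i] = A[i, i - 1] = 1 / (3 * (2 * i - 1) * (2 * i + 1))

# Claimed generalized inverse on the same coordinates:
# F[i-1, j-1] = 3 (-1)^{i+j} min(i, j) (2i-1)(2j-1).
I, J = np.meshgrid(k, k, indexing="ij")
F = 3.0 * (-1.0) ** (I + J) * np.minimum(I, J) * (2 * I - 1) * (2 * J - 1)

# A @ F reproduces the identity except in the last row, which is polluted by truncation.
P = A @ F
assert np.allclose(P[:-1, :], np.eye(n)[:-1, :])

# Restriction of E_gf to the odd columns: G[i-1, l-1] = E[g_i f_{2l-1}].
G = np.zeros((n, n))
for i in range(1, n + 1):
    G[i - 1, i - 1] = (-1) ** (i + 1) / (3 * i * (2 * i - 1))
    if i >= 2:
        G[i - 1, i - 2] = (-1) ** (i + 1) / (3 * i * (2 * i - 3))

# X = E_gf E_ff^dagger should have entries (-1)^{l+1}(2l-1)/i for l >= i and 0 otherwise.
X = G @ F
X_formula = np.where(J >= I, (-1.0) ** (J + 1) * (2 * J - 1) / I, 0.0)
assert np.allclose(X, X_formula)
```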
Consider the calculation $\widehat{g} = X f$. Define
\[
f_{2n-1} = \sum_{k=1}^{n-1} (\omega_k + \omega_{k+1})\, \frac{e_{2k-1}}{2k-1} + \omega_n\, \frac{e_{2n-1}}{2n-1}
\]
for each $n \in \mathbb{N}$. Thus
\[
E\big[\| f_{2n-1} - f \|^2\big] = \frac{1}{3} \cdot \frac{1}{(2n-1)^2} + \frac{2}{3} \sum_{k=n+1}^{\infty} \frac{1}{(2k-1)^2} \to 0
\]
as $n \to \infty$. Now $\widehat{g}_{2n-1} = X f_{2n-1}$ is given by
\[
\widehat{g}_{2n-1} = \begin{bmatrix}
\sum_{k=1}^{n-1} (-1)^{k-1}(\omega_k + \omega_{k+1}) + (-1)^{n-1}\omega_n \\[1ex]
\big[ \sum_{k=2}^{n-1} (-1)^{k-1}(\omega_k + \omega_{k+1}) + (-1)^{n-1}\omega_n \big] \big/ 2 \\[1ex]
\big[ \sum_{k=3}^{n-1} (-1)^{k-1}(\omega_k + \omega_{k+1}) + (-1)^{n-1}\omega_n \big] \big/ 3 \\[1ex]
\vdots \\[1ex]
\big[ (-1)^{n-2}(\omega_{n-1} + \omega_n) + (-1)^{n-1}\omega_n \big] \big/ (n-1) \\[1ex]
(-1)^{n-1}\omega_n \big/ n \\[1ex]
0 \\ \vdots
\end{bmatrix}
= \begin{bmatrix}
\omega_1 \\[1ex] -\,\omega_2/2 \\[1ex] \omega_3/3 \\[1ex] \vdots \\[1ex] (-1)^{n}\,\omega_{n-1}/(n-1) \\[1ex] (-1)^{n-1}\,\omega_n/n \\[1ex] 0 \\ \vdots
\end{bmatrix}.
\]
This shows that
\[
E\big[\| \widehat{g}_{2n-1} - g \|^2\big] = \frac{1}{3}\left( \frac{1}{(n+1)^2} + \frac{1}{(n+2)^2} + \cdots \right) \to 0
\]
as $n \to \infty$ and so $\widehat{g} = g$.
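The telescoping that produces these coordinates can be confirmed directly. The following Python/numpy sketch is our own illustration (the outcome $\omega$, the seed and the size n = 8 are arbitrary choices, not from the paper). It builds $f_{2n-1}$ for a sampled outcome, applies the first n rows of $X$, and checks that the result agrees with the corresponding coordinates of $g$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
omega = rng.uniform(-1.0, 1.0, size=n)       # omega_1, ..., omega_n, uniform on [-1, 1]

# f_{2n-1}: (omega_k + omega_{k+1})/(2k-1) in position 2k-1 for k < n,
# and omega_n/(2n-1) in position 2n-1 (1-based positions).
f = np.zeros(2 * n - 1)
for k in range(1, n):
    f[2 * k - 2] = (omega[k - 1] + omega[k]) / (2 * k - 1)
f[2 * n - 2] = omega[n - 1] / (2 * n - 1)

# First n rows of X: x_{i,2l-1} = (-1)^{l+1} (2l-1)/i for l >= i, zero otherwise.
X = np.zeros((n, 2 * n - 1))
for i in range(1, n + 1):
    for l in range(i, n + 1):
        X[i - 1, 2 * l - 2] = (-1) ** (l + 1) * (2 * l - 1) / i

g = np.array([(-1) ** (i + 1) * omega[i - 1] / i for i in range(1, n + 1)])
assert np.allclose(X @ f, g)                 # the estimate recovers g exactly
```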
For $k = 8$ we have $E_{gf} \approx E_{gf,[1,8]\times[1,16]} \in \mathbb{C}^{8 \times 16}$ and $E_{ff}^\dagger \approx E_{ff,[1,16]}^\dagger \in \mathbb{C}^{16 \times 16}$, where $E_{ff,[1,16]}^\dagger$ denotes the leading $16 \times 16$ block of $E_{ff}^\dagger$, which gives
\[
X \approx E_{gf,[1,8]\times[1,16]}\, E_{ff,[1,16]}^\dagger = \begin{bmatrix}
1 & 0 & -3 & 0 & \cdots & 13 & 0 & -15 & 0 \\
0 & 0 & -\tfrac{3}{2} & 0 & \cdots & \tfrac{13}{2} & 0 & -\tfrac{15}{2} & 0 \\
0 & 0 & 0 & 0 & \cdots & \tfrac{13}{3} & 0 & -\tfrac{15}{3} & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & \tfrac{13}{7} & 0 & -\tfrac{15}{7} & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & -\tfrac{15}{8} & 0
\end{bmatrix}.
\]
We performed ten trials. The random function pairs $(f, \widehat{g}\,) = (f, Xf)$ for four of the trials are shown in Figure 1.

Figure 1: The random function pairs $(f, \widehat{g}\,) = (f, Xf)$ for four of the ten trials (top left, top right, bottom left and bottom right) showing a typical range of outcomes. There is no estimation error in this example and so $\widehat{g} = g$ in each of these trials.

The trials used uniformly distributed pseudo-random numbers on $[-1,1]$ generated in Matlab; each trial corresponds to a vector $\omega = (\omega_1, \ldots, \omega_8)$ of eight such draws. The results of the four displayed trials show a typical range of outcomes.
In this example it is easy to check that
\[
\mathrm{tr}\big( E_{gg,[1,8]} - E_{gf,[1,8]\times[1,16]}\, E_{ff,[1,16]}^\dagger\, E_{fg,[1,16]\times[1,8]} \big) = 0
\]
and hence there is no estimation error. We can explain this by noting that $f$ contains complete information about the outcome $\omega$ and that we have used known theoretical information to construct the key matrices $E_{ff}$ and $E_{gf}$. In addition there are no observation errors in our model. In practice $f$ may not contain complete information about the outcome, the observed values of $f(\omega)$ will normally contain measurement errors, and the key matrices will likely be estimated from experimental data obtained under laboratory conditions where both $f(\omega)$ and $g(\omega)$ can be observed.
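A numerical confirmation of this zero-trace identity, using the truncated blocks of the $k = 8$ approximation, is sketched below. This is our own Python/numpy illustration, not the paper's Matlab check; note that $E_{ff,[1,16]}^\dagger$ is built here directly from the closed-form entries of $E_{ff}^\dagger$, as in the approximation above, rather than as the pseudo-inverse of the truncated block of $E_{ff}$.

```python
import numpy as np

m, N = 8, 16                      # g truncated to 8 coordinates, f truncated to 16

# E_gg is diagonal with entries E[g_i^2] = 1/(3 i^2).
Egg = np.diag([1.0 / (3 * i * i) for i in range(1, m + 1)])

# Leading m x N block of E_gf.
Egf = np.zeros((m, N))
for j in range(1, m + 1):
    Egf[j - 1, 2 * j - 2] = (-1) ** (j - 1) / (3 * j * (2 * j - 1))
    if j >= 2:
        Egf[j - 1, 2 * j - 4] = (-1) ** (j - 1) / (3 * j * (2 * j - 3))

# Leading N x N block of E_ff^dagger built from the closed form
# ff^dagger_{2k-1,2l-1} = 3 (-1)^{k+l} min(k, l) (2k-1)(2l-1).
Effd = np.zeros((N, N))
for k in range(1, N // 2 + 1):
    for l in range(1, N // 2 + 1):
        Effd[2 * k - 2, 2 * l - 2] = 3 * (-1) ** (k + l) * min(k, l) * (2 * k - 1) * (2 * l - 1)

residual = Egg - Egf @ Effd @ Egf.T
print(np.trace(residual))         # zero up to rounding error
```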
12 Conclusions and future research
We have shown that the optimal least squares linear filter can be extended to estimation of random functions with values in infinite-dimensional Hilbert spaces. In particular we have shown that in those instances where the generalized inverse auto-covariance is an unbounded linear operator it is nevertheless closed and densely defined. Our future research will consider applications to signal processing and possible applications to the inversion of linear operator pencils where the resolvent operator has an isolated essential singularity at the origin [2]. These operators may arise in input retrieval problems for infinite-dimensional linear control systems [4, Section 8.4.1, pp 261–262] or in the solution of infinite systems of ordinary differential equations [1, Section 8].
References

[1] Amie Albrecht, Phil Howlett, Geetika Verma, "The fundamental equations for the generalized resolvent of an elementary pencil in a unital Banach algebra", Linear Algebra and its Applications, 2019, 216–251. https://doi.org/10.1016/j.laa.2019.03.032.

[2] Amie Albrecht, Phil Howlett, Geetika Verma, "Inversion of operator pencils on Banach space using Jordan chains when the generalized resolvent has an isolated essential singularity", Linear Algebra and its Applications, 2020, 33–62. https://doi.org/10.1016/j.laa.2020.02.030.

[3] M. Ash, "Neither a Worst Convergent Series nor a Best Divergent Series Exists", The College Mathematics Journal, no. 4, 1997, 296–297. https://doi.org/10.1080/07468342.1997.11973879.

[4] Konstantin E. Avrachenkov, Jerzy A. Filar, Phil G. Howlett, Analytic Perturbation Theory and Its Applications, SIAM, Philadelphia, 2013.

[5] A. V. Balakrishnan, Applied Functional Analysis, Applications of Mathematics, Springer, New York, 1976.

[6] S. L. Cotter, M. Dashti, J. C. Robinson and A. M. Stuart, "Bayesian inverse problems for functions and applications to fluid mechanics", Inverse Problems, 25, 2009, 115008. https://doi.org/10.1088/0266-5611/25/11/115008.

[7] N. Dunford and J. T. Schwartz, Linear Operators, Part 1: General Theory, Wiley, New York, 1988.

[8] Heinz W. Engl and M. Z. Nashed, "New Extremal Characterizations of Generalized Inverses of Linear Operators", Journal of Mathematical Analysis and Applications, 1981, 566–586. https://doi.org/10.1016/0022-247X(81)90217-1.

[9] Vladimir N. Fomin and Michael V. Ruzhansky, "Abstract optimal linear filtering", SIAM Journal on Control and Optimization, no. 5, 2000, 1334–1352. https://doi.org/10.1137/S036301299834778X.

[10] P. R. Halmos, Measure Theory, University Series in Higher Mathematics, 12th printing, Van Nostrand, Princeton, 1968.

[11] Simon Haykin, Adaptive Filter Theory, International Edition, 5th Edition, Pearson Higher Ed USA, 2013.

[12] P. G. Howlett, "Input retrieval in finite dimensional linear systems", ANZIAM J. (formerly J. Austral. Math. Soc. Ser. B), 1982, 357–382. https://doi.org/10.1017/S033427000000031X.

[13] P. G. Howlett, C. E. M. Pearce and A. P. Torokhti, "An optimal linear filter for random signals with realisations in a separable Hilbert space", ANZIAM J., 2003, 485–500. https://doi.org/10.1017/S1446181100012888.

[14] Y. Hua and W. Q. Liu, "Generalized Karhunen-Loeve transform", IEEE Signal Process. Lett., 1998, 141–142. https://doi.org/10.1109/97.681430.

[15] M. Z. Nashed, "Inner, outer and generalized inverses in Banach and Hilbert spaces", Numerical Functional Analysis and Optimization, no. 3-4, 1987, 261–325. https://doi.org/10.1080/01630568708816235.

[16] A. W. Naylor and G. R. Sell, Linear Operator Theory in Engineering and Science, Applied Mathematical Sciences, Springer-Verlag, New York, 1982.

[17] H. W. Sorenson, Parameter Estimation, Principles and Problems, Marcel Dekker, New York, 1980.

[18] Y. Yamashita and H. Ogawa, "Relative Karhunen-Loeve transform".