Sub-ideal causal smoothing filters for real sequences
Nikolai Dokuchaev
Department of Mathematics & Statistics, Curtin University, GPO Box U1987, Perth, 6845 Western Australia
Submitted November 27, 2014; revised July 4, 2015
Abstract
The paper considers causal smoothing of real sequences, i.e., discrete time processes, in a deterministic setting. A family of causal linear time-invariant filters is suggested. These filters approximate the gain decay of some non-causal ideal smoothing filters with transfer functions vanishing at a point of the unit circle and such that they transfer processes into predictable ones. In this sense, the suggested filters are near-ideal; a faster gain decay would lead to the loss of causality. Applications to predicting algorithms are discussed and illustrated by experiments with forecasting of autoregressions with coefficients that are deemed to be untraceable.
Key words: smoothing filters, causal filters, predicting, near-ideal filters, LTI filters.
AMS 2010 classification : 42A38, 93E11, 93E10, 42B30
Introduction

The paper studies causal smoothing of discrete time processes. For many applications, it is preferable to replace a process by a smoother process. In the continuous time setting, smoothness is associated with predictability: smooth analytic functions are predictable, i.e., their values on any interval uniquely define their values outside of this interval, and an ideal low-pass filter converts a function into an analytic one. For discrete time processes, it is not obvious how to define an analog of continuous time analyticity and smoothness. A classical approach is to consider predictability instead of analyticity. The predictability criterion for stochastic Gaussian stationary discrete time processes in the frequency domain setting is given by the classical Szegő–Kolmogorov Theorem. This theorem says that the optimal prediction error is zero if

$$\int_{-\pi}^{\pi} \log \phi\left(e^{i\omega}\right) d\omega = -\infty, \qquad (1)$$

where $\phi$ is the spectral density; see Kolmogorov [19], Szegő [27, 28], Verblunsky [29], and more recent literature reviews in [3, 26]. This means that a stationary Gaussian process is predictable if its spectral density vanishes on a part of the unit circle $\{z \in \mathbb{C}: |z| = 1\}$, i.e., if the process is "band-limited" in this sense. This result was extended to more general stable stochastic processes allowing spectral representations with spectral density via processes with independent increments; see, e.g., [7].

The stochastic setting is the most common one in causal smoothing and sampling; see, e.g., [1, 2, 4, 5, 3, 8, 9, 12, 11, 13, 14, 15, 16, 18, 20, 25, 24]. The transition to predictability of discrete time processes in a deterministic setting is non-trivial and is related to the concept of randomness for real sequences in the pathwise setting, without a probability measure. There are many classical works devoted to this important concept, starting from von Mises [24], Church [6], Kolmogorov [20], and Loveland [23]; see the references in [21].

It was found that real sequences are predictable if their Z-transform vanishes on an arc of the unit circle [11] in the complex plane or at the point $z = -1$ of the unit circle [12]. Therefore, smoothing can be interpreted as reduction of the energy on the higher frequencies. In particular, an ideal low-pass filter is a smoothing filter. This filter is non-causal, i.e., it requires the future values of the process. Similarly, a filter with a too high rate of decay of the frequency response at a certain point of the unit circle also cannot be causal, since causality is inconsistent with the predictability of the outputs described in [12].

The present paper readdresses the problem of causal smoothing of discrete time processes in the deterministic pathwise setting, without probabilistic assumptions. We suggest a family of causal smoothing filters that can be arbitrarily close to some ideal non-causal smoothing filters defined by equation (2) below. The suggested filters are near-ideal in the sense that they ensure an "almost" ideal rate of damping of the energy at the point $z = -1$; a faster decay of the frequency response is impossible for causal filters. This follows from the predictability criterion of [11]. In fact, the particular reference family of non-causal ideal filters (2) was selected because these filters transfer non-predictable processes into predictable ones satisfying the criterion from [12].
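As a small numerical aside (not part of the original argument; the cutoff $\Delta$ and all names below are our illustrative choices), the non-causality of an ideal low-pass filter can be seen directly: its impulse response, recovered by the inverse Z-transform on the unit circle, is a two-sided sinc sequence with nonzero values at negative times.

```python
import numpy as np

# Ideal low-pass filter with pass band [-Delta, Delta]; Delta is illustrative.
Delta = np.pi / 2
omega = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
H = (np.abs(omega) <= Delta).astype(float)  # frequency response on the unit circle

# h(t) = (1/2pi) * integral of H(e^{iw}) e^{iwt} dw, approximated by a Riemann sum
for t in range(-3, 4):
    h_t = (H * np.exp(1j * omega * t)).mean().real
    print(t, round(h_t, 4))  # ~ sin(Delta*t)/(pi*t); nonzero for t < 0, hence non-causal
```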
A similar approach was used in [10] for the continuous time setting.

The suggested near-ideal filters are discrete time causal linear time-invariant filters (LTI filters); they are represented as convolutions over the historical data, and they approximate the real unity uniformly on an arbitrarily large part of the unit circle.

It appears that these causal filters can be used to improve the performance of the predictors suggested in [12]. These predictors are robust with respect to noise contamination, and the error caused by a high-frequency noise depends on the intensity of this noise. A near-ideal smoothing filter cannot remove the high-frequency noise entirely, but it still can reduce it. This approach is discussed in the section on applications to predicting algorithms below, where we present some numerical experiments with the suggested near-ideal filters applied to forecasting of autoregressions in a setting where the autoregression coefficients are deemed to be untraceable.

Some definitions and notations
We denote by $\mathbb{Z}$ the set of all integers.

For $r \in [1, +\infty]$, we denote by $\ell_r$ the set of all sequences $x = \{x(t)\}_{t \in \mathbb{Z}} \subset \mathbb{R}$ such that $\|x\|_{\ell_r} = \left(\sum_{t=-\infty}^{\infty} |x(t)|^r\right)^{1/r} < +\infty$ for $r \in [1, \infty)$, or $\|x\|_{\ell_\infty} = \sup_t |x(t)| < +\infty$ for $r = +\infty$. We denote by $\ell_r^+$ the set of all sequences $x \in \ell_r$ such that $x(t) = 0$ for $t < 0$.

We denote by $L_r(-\pi, \pi)$ the usual Banach space of complex-valued $L_r$-integrable functions $x: [-\pi, \pi] \to \mathbb{C}$.

Let $D^c \stackrel{\Delta}{=} \{z \in \mathbb{C}: |z| > 1\}$, and let $\mathbb{T} = \{z \in \mathbb{C}: |z| = 1\}$.

For $x \in \ell_1$ or $x \in \ell_2$, we denote by $X = \mathcal{Z}x$ the Z-transform

$$X(z) = \sum_{t=-\infty}^{\infty} x(t) z^{-t}, \quad z \in \mathbb{C}.$$

Respectively, the inverse Z-transform $x = \mathcal{Z}^{-1}X$ is defined as

$$x(t) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X\left(e^{i\omega}\right) e^{i\omega t}\, d\omega, \quad t = 0, \pm 1, \pm 2, ....$$

If $x \in \ell_2$, then $X|_{\mathbb{T}}$ is defined as an element of $L_2(\mathbb{T})$, i.e., $X\left(e^{i\omega}\right) \in L_2(-\pi, \pi)$. If $X\left(e^{i\omega}\right) \in L_1(-\pi, \pi)$, then $x = \mathcal{Z}^{-1}X$ is defined as an element of $\ell_\infty$.

Let $H^2(D^c)$ be the Hardy space of functions that are holomorphic on $D^c$, including the point at infinity, with finite norm $\|h\|_{H^2(D^c)} = \sup_{\rho > 1} \|h(\rho e^{i\omega})\|_{L_2(-\pi,\pi)}$. Note that the Z-transform defines a bijection between the sequences from $\ell_2^+$ and the restrictions (i.e., traces) $X|_{\mathbb{T}}$ of the functions from $H^2(D^c)$ such that $X(e^{i\omega}) = \overline{X(e^{-i\omega})}$ for $\omega \in \mathbb{R}$; see, e.g., [22], Section 4.3. If $X\left(e^{i\omega}\right) \in L_1(-\pi, \pi)$ and $X(e^{i\omega}) = \overline{X(e^{-i\omega})}$, then $x = \mathcal{Z}^{-1}X$ is defined as a real-valued element of $\ell_\infty$.
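The Z-transform pair above is easy to check numerically. The following sketch (a minimal illustration with an arbitrary test sequence; the function names are ours, not the paper's) approximates $X = \mathcal{Z}x$ on $\mathbb{T}$ and recovers $x = \mathcal{Z}^{-1}X$ by a Riemann sum for the inversion integral:

```python
import numpy as np

def z_on_circle(x, t0, omega):
    """X(e^{iw}) = sum_t x(t) e^{-iwt}, where x[k] stores x(t0 + k)."""
    t = t0 + np.arange(len(x))
    return np.array([np.sum(x * np.exp(-1j * w * t)) for w in omega])

def inv_z(X, omega, t):
    """x(t) = (1/2pi) * integral of X(e^{iw}) e^{iwt} dw (Riemann sum, uniform grid)."""
    return (X * np.exp(1j * omega * t)).mean().real

omega = np.linspace(-np.pi, np.pi, 2048, endpoint=False)
x = np.array([1.0, -0.5, 0.25, -0.125])     # a test sequence supported on t = 0..3
X = z_on_circle(x, 0, omega)
print([round(inv_z(X, omega, t), 6) for t in range(-2, 5)])
# -> approximately [0, 0, 1.0, -0.5, 0.25, -0.125, 0]
```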
Problem setting

Let $x(t)$ be a discrete time process, $t \in \mathbb{Z}$. The output of a linear filter is the process

$$y(t) = \sum_{s=-\infty}^{\infty} h(t-s)\, x(s),$$

where $h: \mathbb{Z} \to \mathbb{R}$ is a given impulse response function. If $h(t) = 0$ for $t < 0$, then the output of the corresponding filter is

$$y(t) = \sum_{s=-\infty}^{t} h(t-s)\, x(s).$$

In this case, the filter and the impulse response function are said to be causal. The output of a causal filter at time $t$ can be calculated using only the past historical values $x(s)|_{s \le t}$ of the currently observable input process.

The goal is to approximate $x$ by a "smooth" filtered process $y$ via selection of an appropriate causal impulse response function $h$. We are looking for families of causal smoothing impulse response functions $h$ satisfying the following conditions.

(A) The outputs $y$ approximate the inputs $x$; an arbitrarily close approximation can be achieved via selection of a filter from this family.

(B) The spectrum of the output $y$ vanishes on the higher frequencies.

(C) The effectiveness of the damping of the energy on the higher frequencies approximates the effectiveness of some reference family of non-causal smoothing filters that transfer processes into predictable ones, i.e., such that the future values are uniquely defined by the past values.

Note that it is not a trivial task to satisfy Conditions (A)–(C) simultaneously. For example, there are sets of ideal low-pass filters such that the distance of these sets from the set of all causal filters is zero. In [2], this was shown for the set of low-pass filters with increasing pass interval $[-\Delta, \Delta]$, where $\Delta \in (0, \pi)$. However, Condition (B) is not satisfied for the corresponding causal approximations. Moreover, all known causal approximations of ideal filters do not feature zero values of the transfer functions. The present paper suggests transfer functions vanishing at $z = -1$ together with their derivatives.

The targeted properties of the near-ideal filters

Our purpose is to construct a family of causal filters such that Conditions (A)–(C) are satisfied. We will be using a reference family of "ideal" smoothing filters with the frequency response

$$M_{\theta,q}\left(e^{i\omega}\right) = \exp\left(-\frac{\theta}{|e^{i\omega}+1|^q}\right), \qquad (2)$$

where $q > 0$ and $\theta > 0$ are parameters. For these filters, Condition (A) is satisfied as $\theta \to 0$, and Condition (B) is satisfied for all $\theta > 0$. However, these filters are non-causal because, for any input $x \in \ell_2$, the values of the output processes of these filters at time $t+1$ are weakly predictable at time $t$ [12]. This is since $M_{\theta,q}\left(e^{i\omega}\right) \to 0$ fast enough as $\omega \to \pm\pi$.

For a given integer $m \ge 1$, we will construct a family of causal filters with impulse responses $h_a \in \ell_\infty^+$ and with the corresponding Z-transforms $H_a = \mathcal{Z}h_a$, where $a \in (0, 1)$ is a parameter. For this family, the following more special Conditions (a)–(c) will be satisfied (the number $m$ is used in Condition (b1) below).

(a) Approximation of the identity operator:

• (a1) $\sup_{\omega \in [0,\pi],\, a \in (0,1)} |H_a\left(e^{i\omega}\right)| < +\infty$.

• (a2) For any $\Omega \in (0, \pi)$, $H_a\left(e^{i\omega}\right) \to 1$ as $a \to 1-$ uniformly in $\omega \in [-\Omega, \Omega]$.

• (a3) For any $X\left(e^{i\omega}\right) \in L_1(-\pi, \pi)$ and any $x = \mathcal{Z}^{-1}X$, $\|y_a(\cdot) - x(\cdot)\|_{\ell_\infty} \to 0$ as $a \to 1-$, where $y_a$ is the output process

$$y_a(t) = \sum_{\tau=-\infty}^{t} h_a(t-\tau)\, x(\tau).$$

(b) The spectrum vanishes at a point of $\mathbb{T}$:

• (b1) For all $a$, $H_a\left(e^{i\omega}\right)$ is $m$ times differentiable at $\omega = \pi$, and

$$H_a(-1) = 0, \qquad \frac{d^k H_a}{d\omega^k}\left(e^{i\omega}\right)\Big|_{\omega=\pi} = 0, \quad k = 1, ..., m.$$

• (b2) For any $\varepsilon > 0$, there exist $\delta > 0$ and $a \in (0, 1)$ such that

$$\sup_{\omega \in [\pi-\delta,\, \pi+\delta]} |H_a\left(e^{i\omega}\right)| < \varepsilon.$$

(c) Approximation of the non-causal filters (2) with respect to the effectiveness of damping: for any $\varepsilon > 0$ and any $\Omega \in (0, \pi)$, $\Omega_1 \in (\Omega, \pi)$, $\Omega_2 \in (\Omega_1, \pi)$, there exist $\theta > 0$, $q > 0$, $a \in (0, 1)$ such that

$$|H_a\left(e^{i\omega}\right) - 1| \le \varepsilon, \quad \omega \in [-\Omega, \Omega], \qquad (3)$$

$$|H_a\left(e^{i\omega}\right)| \le |M_{\theta,q}\left(e^{i\omega}\right)|, \quad \omega \in [-\Omega_2, -\Omega_1] \cup [\Omega_1, \Omega_2]. \qquad (4)$$

Conditions (a)–(c) represent particular versions of the less specific Conditions (A)–(C). In particular, estimate (4) ensures that Condition (C) is satisfied.

Let a real number $p \in (1/2, 1)$ and integers $N \ge 1$ and $m \ge 1$ be given. For the real numbers $a \in (0, 1)$, we define the transfer functions

$$H_a(z) = \left(\exp\frac{(1-a)^p}{z + a} + G_a(z)\right)^m, \quad z \in \mathbb{C}, \qquad (5)$$

where

$$G_a(z) = -\xi(a, p) + \frac{\gamma(a, p)}{N}\left((-1)^N z^{-N} - 1\right),$$

and where

$$\xi(a, p) = \exp\left[-(1-a)^{p-1}\right], \qquad \gamma(a, p) = |1-a|^{p-2}\, \xi(a, p).$$

We consider the set $\{H_a\}_{a \in (0,1)}$ of transfer functions (5) with a fixed triplet $(m, N, p)$.
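A direct numerical rendering of (5) may help before turning to the main result. This is a sketch: the parameter values are illustrative stand-ins (several decimal digits of the experimental values are lost in the source), and the function names are ours. The two checks anticipate Conditions (a) and (b1).

```python
import numpy as np

def H_a(z, a, p=0.6, N=50, m=2):
    """Transfer function (5), with xi(a,p) and gamma(a,p) as defined above."""
    xi = np.exp(-(1.0 - a) ** (p - 1.0))        # xi(a,p) = exp[-(1-a)^{p-1}]
    gamma = (1.0 - a) ** (p - 2.0) * xi         # gamma(a,p) = |1-a|^{p-2} xi(a,p)
    G = -xi + (gamma / N) * ((-1.0) ** N * z ** (-N) - 1.0)
    return (np.exp((1.0 - a) ** p / (z + a)) + G) ** m

a = 0.999                                       # illustrative; a in (0, 1)
w = np.linspace(0.0, np.pi, 5)
print(np.abs(H_a(np.exp(1j * w), a)))           # gain: ~1 away from w = pi, ~0 at w = pi
print(abs(H_a(-1.0 + 0j, a)))                   # ~0: the zero at z = -1 (Condition (b1))
```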
Theorem 1. Conditions (a)–(c) are satisfied for the family of filters defined by the transfer functions $\{H_a\}_{a \in (0,1)}$. (Therefore, Conditions (A)–(C) are satisfied for this family.)

Proof of Theorem 1. Let us assume first that $m = 1$.

Clearly, the functions $H_a$ are holomorphic in $D^c$ and bounded in $D^c \cup \mathbb{T}$ for any $a \in (0, 1)$. Hence the inverse Z-transforms $h_a = \mathcal{Z}^{-1}H_a$ are causal impulse responses, i.e., $h_a(t) = 0$ for $t < 0$; see, e.g., [22], Theorem 4.3.2.

Let $f(a) = (1-a)^p$ and $\Psi_a(z) = f(a)(z+a)^{-1}$. By the definitions, $H_a(z) = \exp \Psi_a(z) + G_a(z)$, and

$$\Psi_a\left(e^{i\omega}\right) = f(a)\, \frac{\cos(\omega) + a - i\sin(\omega)}{(\cos(\omega) + a)^2 + \sin^2(\omega)}.$$

Let us prove that Condition (a) holds. Clearly, $|G_a\left(e^{i\omega}\right)| \to 0$ as $a \to 1-$ uniformly in $\omega \in (-\pi, \pi]$. Let $\omega_a \in (\pi/2, \pi)$ be such that $\cos(\omega_a) + a = 0$. We have that $\mathrm{Re}\, \Psi_a\left(e^{i\omega}\right) > 0$ for all $\omega \in [-\omega_a, \omega_a]$ and $\mathrm{Re}\, \Psi_a\left(e^{i\omega}\right) < 0$ for all $\omega \in [-\pi, -\omega_a) \cup (\omega_a, \pi]$. Further, we have that $\inf_{\omega \in [-\omega_a, \omega_a]} |e^{i\omega} + a| \ge \sqrt{1 - a^2}$. Hence

$$\sup_{\omega \in [-\omega_a, \omega_a]} |\Psi_a\left(e^{i\omega}\right)| \le \frac{f(a)}{\sqrt{1 - a^2}} = \frac{(1-a)^{p-1/2}}{(1+a)^{1/2}} \le 1.$$

Therefore, the value $|H_a\left(e^{i\omega}\right)|$ is uniformly bounded in $a, \omega$. Hence Condition (a1) holds.

Further, we have that $\omega_a \to \pi-$ as $a \to 1-$. Hence, for any $\Omega \in [0, \pi)$, we have that $\sup_{\omega \in [-\Omega, \Omega]} |\Psi_a\left(e^{i\omega}\right)| \to 0$ as $a \to 1-$ and $\sup_{\omega \in [-\Omega, \Omega]} |H_a\left(e^{i\omega}\right) - 1| \to 0$ as $a \to 1-$. Hence Condition (a2) holds.

Let us show that Condition (a3) holds. Let $Y_a = H_a X$. By Condition (a2), $Y_a\left(e^{i\omega}\right) \to X\left(e^{i\omega}\right)$ as $a \to 1-$ for almost all $\omega \in \mathbb{R}$. Clearly, there exist $a_0 \in (0, 1)$ and $c > 0$ such that $\sup_{\omega,\, a \ge a_0} |H_a\left(e^{i\omega}\right)| \le c$. Hence $|Y_a\left(e^{i\omega}\right) - X\left(e^{i\omega}\right)| \le (c+1)|X\left(e^{i\omega}\right)|$. By the assumptions, $X\left(e^{i\omega}\right) = \mathcal{Z}x \in L_1(-\pi, \pi)$. By the Lebesgue Dominated Convergence Theorem, it follows that $\left\|Y_a\left(e^{i\omega}\right) - X\left(e^{i\omega}\right)\right\|_{L_1(-\pi,\pi)} \to 0$ as $a \to 1-$. Since $\|y_a - x\|_{\ell_\infty} \le (2\pi)^{-1}\|Y_a - X\|_{L_1(-\pi,\pi)}$, Condition (a3) follows.

Let us show that Condition (b) holds. We have that

$$\exp\left(\Psi_a\left(e^{i\pi}\right)\right) = \exp\left(\frac{f(a)}{a-1}\right) = \exp\left(-(1-a)^{p-1}\right) = \xi(a, p) = -G_a(-1).$$

Hence $H_a(-1) = \exp(\Psi_a(-1)) + G_a(-1) = 0$.

Let us show that $\frac{dH_a}{d\omega}\left(e^{i\omega}\right)\big|_{\omega=\pi} = 0$. Let $r(\omega) = \mathrm{Re}\exp\left(\Psi_a\left(e^{i\omega}\right)\right)$, $s(\omega) = \mathrm{Im}\exp\left(\Psi_a\left(e^{i\omega}\right)\right)$, and $q(\omega) = \frac{1}{N}\,\mathrm{Im}\left(e^{-iN(\omega-\pi)} - 1\right)$.

Clearly, the function $\Psi_a\left(e^{i\omega}\right)$ is differentiable in $\omega \in \mathbb{R}$ for any $a$, as well as the functions $r(\omega)$, $s(\omega)$, and $q(\omega)$. In addition, we have that $r(\omega) = r(2\pi - \omega)$. Hence $r(\omega)$ is even about the point $\omega = \pi$ and differentiable. This implies that $\frac{dr}{d\omega}(\omega)\big|_{\omega=\pi} = 0$.

By the definitions, $s(\omega) = \exp(\mathrm{Re}\,\Psi_a)\sin(\mathrm{Im}\,\Psi_a)$ and $\exp\left(\mathrm{Re}\,\Psi_a\left(e^{i\omega}\right)\right) \to \xi(a, p)$ as $\omega \to \pi$. We have that $s(\pi) = q(\pi) = 0$. L'Hôpital's rule gives that

$$\lim_{\omega\to\pi} \frac{ds(\omega)/d\omega}{dq(\omega)/d\omega} = \lim_{\omega\to\pi} \frac{s(\omega)}{q(\omega)} = \lim_{\omega\to\pi} \frac{\xi(a,p)\sin\left(-\frac{(1-a)^p \sin(\omega)}{(a + \cos(\omega))^2 + \sin^2(\omega)}\right)}{-N^{-1}\sin(N(\omega-\pi))} = -\xi(a,p)(1-a)^p(a-1)^{-2} = -\gamma(a, p).$$

Clearly, $\frac{dq(\omega)}{d\omega}\big|_{\omega=\pi} = -1$. Hence

$$\frac{d}{d\omega}\exp\Psi_a\left(e^{i\omega}\right)\Big|_{\omega=\pi} = i\,\frac{ds(\omega)}{d\omega}\Big|_{\omega=\pi} = i\gamma(a, p).$$

On the other hand,

$$\frac{dG_a\left(e^{i\omega}\right)}{d\omega}\Big|_{\omega=\pi} = \frac{d}{d\omega}\left(-\xi(a, p) + \frac{\gamma(a, p)}{N}\left(e^{-iN(\omega-\pi)} - 1\right)\right)\Big|_{\omega=\pi} = -i\gamma(a, p).$$

Hence

$$\frac{d}{d\omega} H_a\left(e^{i\omega}\right)\Big|_{\omega=\pi} = \frac{d}{d\omega}\exp\left(\Psi_a\left(e^{i\omega}\right)\right)\Big|_{\omega=\pi} + \frac{dG_a\left(e^{i\omega}\right)}{d\omega}\Big|_{\omega=\pi} = 0.$$

Let us show that Condition (b2) holds. We have that

$$\mathrm{Re}\,\Psi_a\left(e^{i\omega}\right) = \frac{f(a)(\cos(\omega) + a)}{(\cos(\omega) + a)^2 + \sin^2(\omega)} = f(a)\, \frac{\mathrm{Re}\,(e^{i\omega} + a)}{|e^{i\omega} + a|^2}.$$

We have that $-\mathrm{Re}\,(e^{i\omega} + a)/|e^{i\omega} + a|$ is non-decreasing in $\omega \in [\omega_a, \pi]$ and converges to $1$ as $\omega \to \pi-$, and $1/|e^{i\omega} + a|$ is non-decreasing in $\omega \in [\omega_a, \pi]$ and converges to $(1-a)^{-1}$ as $\omega \to \pi-$. Hence the product of these functions, $-f(a)^{-1}\,\mathrm{Re}\,\Psi_a\left(e^{i\omega}\right)$, is non-decreasing in $\omega \in [\omega_a, \pi]$. Hence we can select $\widehat\omega_a \in [\omega_a, \pi]$ such that $-\mathrm{Re}\,\Psi_a\left(e^{i\omega}\right) \ge -\mathrm{Re}\,\Psi_a\left(e^{i\pi}\right)/2$ for all $\omega \in [\widehat\omega_a, \pi]$, i.e., $\mathrm{Re}\,\Psi_a\left(e^{i\omega}\right) \le \mathrm{Re}\,\Psi_a\left(e^{i\pi}\right)/2$ for all $\omega \in [\widehat\omega_a, \pi]$. In addition, $\mathrm{Re}\,\Psi_a\left(e^{i\omega}\right) = \mathrm{Re}\,\Psi_a\left(e^{-i\omega}\right)$. Hence $\mathrm{Re}\,\Psi_a\left(e^{i\omega}\right) \le \mathrm{Re}\,\Psi_a\left(e^{i\pi}\right)/2$ for all $\omega \in [\widehat\omega_a, 2\pi - \widehat\omega_a] = [\pi - \delta_a, \pi + \delta_a]$, where $\delta_a = \pi - \widehat\omega_a$.

Further, we have that

$$\mathrm{Re}\,\Psi_a\left(e^{i\pi}\right) = \frac{(1-a)^p}{a-1} = -(1-a)^{p-1} \to -\infty \quad \text{as } a \to 1-.$$

For a given $\varepsilon > 0$, let us select $\bar a$ such that $\mathrm{Re}\,\Psi_a\left(e^{i\pi}\right)/2 < \log(\varepsilon/2)$ for all $a \ge \bar a$. In addition, we can select $\tilde a \ge \bar a$ such that $|G_a\left(e^{i\omega}\right)| \le \varepsilon/2$ for all $a \ge \tilde a$ and all $\omega$. Then Condition (b2) holds with $a = \tilde a$ and $\delta = \delta_{\tilde a}$ selected for the given $\varepsilon$. Therefore, Condition (b) holds.

Let us show that Condition (c) holds. It follows from the proof of (a) above that, for given $\varepsilon > 0$ and $\Omega$, we can select $\bar a$ such that (3) holds for $a \ge \bar a$. Further, let $q > 0$ be arbitrary. For any $\Omega_1 > \Omega$ and $\Omega_2 > \Omega_1$,

$$\sup_{\omega \in [-\Omega_2, -\Omega_1] \cup [\Omega_1, \Omega_2]} |M_{\theta,q}\left(e^{i\omega}\right) - 1| \to 0 \quad \text{as } \theta \to 0.$$

Clearly, (4) holds for small enough $\theta$. Hence Condition (c) holds.

We have proved the theorem for the case where $m = 1$. The extension to the case where $m > 1$ is straightforward. This completes the proof of Theorem 1. □
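The causality established at the start of the proof can be checked numerically. The following sketch reuses the H_a function from the snippet after (5); the FFT grid size and the parameters are illustrative (roughly mirroring Figure 3 below). Sampling $H_a$ on the unit circle and inverting with the FFT approximates $h_a = \mathcal{Z}^{-1}H_a$, and the coefficients aliased to negative times should vanish:

```python
import numpy as np

# assumes H_a from the sketch after equation (5)
n = 4096
w = 2.0 * np.pi * np.arange(n) / n                 # uniform grid on [0, 2pi)
vals = H_a(np.exp(1j * w), a=0.9, p=0.6, N=10, m=1)

h = np.fft.ifft(vals)                              # h[t] ~ h_a(t) for t = 0..n-1
# For a two-sided kernel, indices n//2..n-1 would carry the t < 0 coefficients.
print(np.max(np.abs(h[n // 2:])))                  # ~0: the filter is causal
print(np.max(np.abs(h.imag)))                      # ~0: the kernel is real-valued
```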
Illustrative examples

Figures 1–3 show examples of the frequency responses and the impulse response functions for the filters described above.

Figure 1 shows the shapes of the gain curves $|M_{\theta,q}\left(e^{i\omega}\right)|$ for the reference non-causal filter (2) with $\theta = 0.$ and $q = 1.$ , and $|H_a\left(e^{i\omega}\right)|$ for the near-ideal causal filters (5) with $a = 0.$ , $p = 0.$ , $N = 50$, $m = 2$.

Figure 2 shows the shapes of the error curves for the approximation of the identity operator on the lower frequencies. More precisely, it shows $|M_{\theta,q}\left(e^{i\omega}\right) - 1|$ for the reference non-causal filter (2) and $|H_a\left(e^{i\omega}\right) - 1|$ for the near-ideal causal filters (5), with the same parameters as for Figure 1.

Figure 3 shows an example of the impulse response $h_a = \mathcal{Z}^{-1}H_a$ calculated as the inverse Z-transform for the causal filter (5) with $a = 0.$ , $p = 0.$ , $N = 10$, $m = 1$. Since the properties of $H_a$ guarantee that $\mathrm{Im}\, h_a(t) = 0$ for all $t$ and that $h_a(t) = 0$ for all $t < 0$, we show the values for $t \ge 0$ only.

In particular, these examples show that the impulse response functions $h_a$ can take negative values; i.e., these filters do not represent averaging with a positive kernel.

Applications to predicting algorithms

A possible application of the filters suggested above is preliminary smoothing of the input signals for predicting algorithms. For this task, causality is crucial. It is known that band-limited sequences, i.e., sequences with spectrum vanishing on an interval of $\mathbb{T}$, are predictable. In addition, there are predictable sequences whose spectrum vanishes at a single point of $\mathbb{T}$; see [11, 12], where some predicting algorithms were suggested.

It can be noted that the filters suggested above do not change the input sequences significantly if $a$ is close to 1; the energy of the input is not damped on a given arc of $\mathbb{T}$. In fact, the energy is damped on a small neighborhood of the point $e^{i\pi} = -1$, and the size of this neighborhood converges to zero as $a \to 1-$. Therefore, one cannot expect that the filters introduced above will help to improve the performance of the predicting algorithms [11] requiring that the spectrum vanishes on a fixed arc of the unit circle. However, it appears that these filters can help to improve the performance of the predictors [12] oriented on processes $x$ with spectrum vanishing at a single point. More precisely, the predictors [12] are applicable to discrete time processes $x$ such that, for some $\theta > 0$, $q > 0$, $c > 0$,

$$\sup_{\omega \in [-\pi,\pi]} \frac{|X\left(e^{i\omega}\right)|}{M_{\theta,q}\left(e^{i\omega}\right)} \le c, \qquad X = \mathcal{Z}x. \qquad (6)$$

In particular, it follows that the filters (2) transfer sequences of a general type into predictable sequences such that (6) holds; respectively, the filters (2) cannot be causal.

The predicting kernel of [12] was defined as $k = k(\cdot, \gamma) = \mathcal{Z}^{-1}K$, where

$$K(z) \stackrel{\Delta}{=} z\left(1 - \exp\left[-\frac{\gamma}{z + 1 - \gamma^{-r}}\right]\right), \qquad (7)$$

and where $r > 0$ and $\gamma > 0$ are parameters. This predictor produces the process

$$y(t) = \sum_{d=-\infty}^{t} k(t-d)\, x(d)$$

approximating $x(t+1)$ as $\gamma \to +\infty$ for all inputs $x$ satisfying (6) with $q > 1/r$. (In the notations of [12], $r = 2\mu/(q-1)$, where $\mu > 0$ and $q > 1$ are the parameters.) The function $K\left(e^{i\omega}\right)$ approximates the function $e^{i\omega}$ representing the forward one-step shift in the time domain; the value $|K\left(e^{i\omega}\right) - e^{i\omega}|$ is small everywhere but in a small neighborhood of $\omega = \pi$. Therefore, the process $y(t)$ represents a one-step prediction of $x(t+1)$ if $X\left(e^{i\omega}\right)$ vanishes with a certain rate at $\omega = \pi$.
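The compensation effect discussed below can be previewed numerically. This sketch relies on our reconstruction of (7) from the damaged source, reuses H_a from the earlier snippet, and uses illustrative values of $\gamma$, $r$, and the filter parameters (the exact experimental values are truncated in the source):

```python
import numpy as np

# assumes H_a from the sketch after equation (5)
def K(z, gamma=2.0, r=0.5):
    """Predicting transfer function (7), as reconstructed above."""
    return z * (1.0 - np.exp(-gamma / (z + 1.0 - gamma ** (-r))))

w = np.linspace(-np.pi, np.pi, 2001)
z = np.exp(1j * w)
err_plain = np.abs(K(z) - z)                             # predictor (7) alone
err_filtered = np.abs(K(z) * H_a(z, a=0.9, p=0.6) - z)   # with preliminary smoothing (5)
print(err_plain.max(), err_filtered.max())               # smoothing reduces the peak near w = pi
```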
It was shown in [12] that

$$\sup_t |x(t+1) - y(t)| \to 0 \quad \text{as } \gamma \to +\infty$$

for real sequences $x$ such that (6) holds, i.e., the prediction error vanishes as $\gamma \to +\infty$. Moreover, the error vanishes uniformly over classes of processes $x$ from bounded sets in $\ell_\infty$ such that (6) holds with a given $c$.

The predictors (7) are robust with respect to some small noise contamination, meaning that the prediction error depends continuously on the intensity of the contaminating noise. However, for large $\gamma$, the values of $K\left(e^{i\omega}\right)$ can be very large in a neighborhood of $\omega = \pi$; in this case, the error can be large even for a small noise.

We suggest applying filter (5) to compensate for the presence of the large values of $K\left(e^{i\omega}\right)$ in a small neighborhood of $\omega = \pi$ and therefore to reduce the impact of the high-frequency noise. This is illustrated by Figure 4, showing the shapes of the error curves for the approximation of the forward one-step shift operator. More precisely, it shows the shape of $|K\left(e^{i\omega}\right) - e^{i\omega}|$ for the predictor (7) and the shape of $|K\left(e^{i\omega}\right) H_a\left(e^{i\omega}\right) - e^{i\omega}|$ for the transfer functions (5) and (7), which corresponds to preliminary smoothing of the input process by filters (5). These shapes characterize the imperfection of the predictors, since the transfer function $e^{i\omega}$ corresponds to the one-step forward shift operator in the time domain, i.e., $e^{i\omega}$ represents an ideal non-causal error-free one-step-ahead predictor. It appears that the application of the filter improves the approximation of $e^{i\omega}$.

Our setting does not involve stochastic processes and probability measures; it is oriented on smoothing of real sequences. However, to provide an example of an application of our smoothing filters, we considered a toy example with prediction of a stochastic Gaussian stationary process $x(t)$ evolving as an autoregression of AR(2) type,

$$x(t) = \beta_1 x(t-1) + \beta_2 x(t-2) + \sigma\eta(t), \quad t \in \mathbb{Z}. \qquad (8)$$

Here $\eta(t)$ is a stochastic discrete time Gaussian white noise, $E\eta(t) = 0$, $E\eta(t)^2 = 1$. The coefficient $\sigma > 0$ describes the intensity of the noise.

For the estimation of the effectiveness of the predictors, we use the ratio

$$e(b_1, b_2) = \frac{\left(\sum_{t=1}^{n} |y(t-1) - x(t)|^2\right)^{1/2}}{\left(\sum_{t=1}^{n} |b_1 x(t-1) + b_2 x(t-2) - x(t)|^2\right)^{1/2}}, \qquad (9)$$

where $b_k \in \mathbb{R}$ are parameters. The values $y(t-1)$ are supposed to be the predictions, at time $t-1$, of the future values $x(t)$. Since the sequence $\{x(t)\}$ does not satisfy (6) due to the presence of noise, a forecasting error is inevitable. Ratio (9) allows us to compare the error of a predicting algorithm generating $y$ with the error of a linear predictor with the coefficients $b_1$ and $b_2$. More precisely, the value $e(b_1, b_2)$ represents the ratio of the error generated by the predictor producing $y$ to the error of the linear predictor based on the hypothesis that $\beta_1 = b_1$ and $\beta_2 = b_2$.

If the vector $(\beta_1, \beta_2)$ is known, then the optimal one-step predictor of $x(t)$ is

$$y(t-1) = \beta_1 x(t-1) + \beta_2 x(t-2). \qquad (10)$$

In this case, the value $n^{-1}\sum_{t=1}^{n} |\beta_1 x(t-1) + \beta_2 x(t-2) - x(t)|^2$ represents the sample mean of the squared error of this optimal predictor with known values of $(\beta_1, \beta_2)$. Therefore, the optimal predictor (10) ensures that $e(\beta_1, \beta_2) \approx 1$ for a large enough $n$. Similarly, for any given $n$, the average value of $e(\beta_1, \beta_2)$ is also close to one for a sufficiently large number of Monte Carlo trials, for the optimal predictor (10). Respectively, any other predictor besides (10), including predictor (7), cannot achieve a lesser average value of $e(\beta_1, \beta_2)$ for a sufficiently large $n$ or as an average over a sufficiently large number of Monte Carlo trials.

However, in many practical situations, the value of $(\beta_1, \beta_2)$ is unknown, and, respectively, predictor (10) cannot be used. On the other hand, predictor (7) does not require knowledge of $(\beta_1, \beta_2)$ and can be applied in models with unknown or random and time-variable $(\beta_1, \beta_2)$, where predictor (10) is not applicable. In other words, predictor (7) can be applied to processes with an unknown shape of the spectral representation. Therefore, it is reasonable to estimate the performance of a predictor using $e(b_1, b_2)$ with $b_k \ne \beta_k$, for instance, with $b_k$ selected as the expected value, or the median, or the upper or lower boundaries of the unknown $\beta_k$.

Since it is impossible to implement convolution with infinitely supported kernels and inputs, one has to use truncated kernels and inputs for the calculations. In the experiments described below, we replaced $k$ and $h_a = \mathcal{Z}^{-1}H_a$ by the truncated kernels

$$k_d(t) = \mathbb{I}_{\{t \le d\}}\, k(t), \qquad h_{a,d}(t) = \mathbb{I}_{\{t \le d\}}\, h_a(t), \qquad d > 0. \qquad (11)$$

In other words, the original $k = \mathcal{Z}^{-1}K$ and $h_a = \mathcal{Z}^{-1}H_a$ were used as benchmarks; only their truncated versions were actually implemented.

We use the value (9) to estimate the performance of the predicting algorithms in the following two cases:

• The algorithm is applied without filtering and produces $y = k_d \circ x$; we denote by $e_K(b_1, b_2)$ the corresponding values (9).

• The algorithm is applied with filtering and produces $y = (k_d \circ h_{a,d}) \circ x$; we denote by $e_{KH}(b_1, b_2)$ the corresponding values (9).

In both cases, $y(t)$ is calculated using the historical data $\{x(s)\}_{t-d \le s \le t}$.

In our experiments, we used equations (5) and (7) with

$$\gamma = 1.\,, \quad r = 1.\,, \quad a = 0.\,, \quad p = 0.\,, \quad N = 100, \quad m = 2. \qquad (12)$$

Note that the selection of a too large $\gamma$ makes the calculation of $k$ challenging, since it involves precise integration of the fast growing $K\left(e^{i\omega}\right)$. The choice of the parameters in (12) ensures that the values of $|K\left(e^{i\omega}\right)|$ are not large. Figure 5 shows the corresponding impulse response $\mathcal{Z}^{-1}H_a$. Figure 6 shows the corresponding impulse responses $\mathcal{Z}^{-1}K$ and $\mathcal{Z}^{-1}(KH_a)$.

In our experiment with the AR(2) process, we used 10,000 Monte Carlo trials with $n = d = 100$ and $\sigma = 0.$ . For each trial, we selected $(\beta_1, \beta_2)$ randomly and independently. The distribution of $(\beta_1, \beta_2)$ at each trial was the following: $\beta_1$ has the uniform distribution on the interval $(0, 1)$, and $\beta_2 = \xi\sqrt{1 - \beta_1^2}$, where $\xi$ is a random variable independent of $\beta_1$ and uniformly distributed on the interval $(-1, 1)$. This choice ensures that the eigenvalues of the autoregression stay inside the unit circle almost surely. We used MATLAB and standard personal computers; an experiment with 10,000 Monte Carlo trials would take about five minutes of calculation time.
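For concreteness, here is a condensed single-trial sketch of this experiment. It is our own minimal reimplementation, not the author's MATLAB code: the helper names are ours, the kernels are applied sequentially rather than pre-convolved, and $\sigma$, $\gamma$, $r$, and the filter parameters are illustrative stand-ins for the truncated values in (12). It reuses H_a and K from the earlier snippets.

```python
import numpy as np

# assumes H_a and K from the earlier sketches
rng = np.random.default_rng(0)
n = d = 100
sigma = 0.1                                        # illustrative noise intensity

def truncated_kernel(Hfun, d, n_fft=8192):
    """Truncation (11): keep Z^{-1}H on t = 0..d, computed via the inverse FFT."""
    w = 2.0 * np.pi * np.arange(n_fft) / n_fft
    return np.fft.ifft(Hfun(np.exp(1j * w))).real[: d + 1]

k_d = truncated_kernel(lambda z: K(z, gamma=2.0, r=0.5), d)    # predictor (7)
h_ad = truncated_kernel(lambda z: H_a(z, a=0.9, p=0.6), d)     # smoother (5)

# One trial: an AR(2) path with coefficients drawn as described above
b1 = rng.uniform(0.0, 1.0)
b2 = rng.uniform(-1.0, 1.0) * np.sqrt(1.0 - b1 ** 2)
T = 3 * d + n
x = np.zeros(T)
for t in range(2, T):
    x[t] = b1 * x[t - 1] + b2 * x[t - 2] + sigma * rng.standard_normal()

def apply_kernel(kernel, sig, t):
    """y(t) = sum_{s=0}^{d} kernel(s) * sig(t - s), using data up to time t."""
    return float(np.dot(kernel, sig[t - np.arange(len(kernel))]))

xs = np.array([apply_kernel(h_ad, x, t) if t >= d else 0.0 for t in range(T)])

num_K = num_KH = den = 0.0
for t in range(T - n, T):
    y_plain = apply_kernel(k_d, x, t - 1)          # prediction of x(t), no smoothing
    y_smooth = apply_kernel(k_d, xs, t - 1)        # prediction with pre-smoothing
    e_opt = b1 * x[t - 1] + b2 * x[t - 2] - x[t]   # error of the optimal predictor (10)
    num_K += (y_plain - x[t]) ** 2
    num_KH += (y_smooth - x[t]) ** 2
    den += e_opt ** 2

print("e_K  =", np.sqrt(num_K / den))              # one sample of ratio (9)
print("e_KH =", np.sqrt(num_KH / den))             # the paper averages such ratios over 10,000 trials
```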
First, we compared the relative performance of our predictors with respect to the performance of the optimal predictor that requires the values $\beta_1$ and $\beta_2$ to be known. We obtained that the mean value over all Monte Carlo trials of $e_K(\beta_1, \beta_2)$ is 1.5019 and the mean value of $e_{KH}(\beta_1, \beta_2)$ is 1.1177. This indicates that the application of filter (5) improves the performance of the predictor. As was mentioned above, we cannot expect, for sufficiently many Monte Carlo trials, the mean values of $e_K(\beta_1, \beta_2)$ and $e_{KH}(\beta_1, \beta_2)$ to be less than one, so the performance of the predictor (7) combined with filters (5) is reasonably good.

Second, we calculated the mean values $e_K(b_1, b_2)$ and $e_{KH}(b_1, b_2)$ with $b_k = E\beta_k$, where $E\beta_k$ is the population mean of $\beta_k$, $k = 1, 2$. For our parameters of the Monte Carlo trials, we have $E\beta_1 = 0.5$ and $E\beta_2 = 0$. The corresponding values $e_K(0.5, 0)$ and $e_{KH}(0.5, 0)$ represent a comparison of the performance of our predictor (without or with preliminary filtering) with the one-step predictor of $x(t)$ given by

$$y(t-1) = (E\beta_1)\, x(t-1) + (E\beta_2)\, x(t-2).$$

Note that this predictor requires the population means of $\beta_1$ and $\beta_2$ to be known. The mean values obtained for $e_K(0.5, 0)$ and $e_{KH}(0.5, 0)$ indicate a good performance of our filter/predictor system, especially if we take into account that our system does not require the values $E\beta_k$ to be known.

Figure 7 shows a sample path of the AR(2) process $x(t)$ and the filtered process obtained using filter (5) with the parameters defined by (12). Figure 8 shows sample paths of the AR(2) process $x(t)$ and the outputs $y(t)$ of the predictor [12] without preliminary filtering and with preliminary filtering using filter (5) with the parameters defined by (12). It shows the values $y(t-1)$, i.e., the predictions of $x(t)$, versus the values of $x(t)$.

In addition, we considered a modification of process (8) with $\beta_2 \equiv 0$, i.e., an AR(1) process. We used the same predictors and filters as for the experiments with the AR(2) process described above. We again ran 10,000 Monte Carlo trials with $n = d = 100$, with $\sigma = 0.$ , and with a randomly selected $\beta_1$ distributed uniformly on the interval $(0, 1)$. We compared the relative performance of our predictors with respect to the performance of the optimal predictor that requires the value $\beta_1$ to be known. We obtained that the mean value of $e_K(\beta_1, 0)$ is 1.2830 and the mean value of $e_{KH}(\beta_1, 0)$ is 1.1023. These numbers indicate again that the use of the filter improves the performance of the predictor. Again, the mean values of $e_K(\beta_1, 0)$ and $e_{KH}(\beta_1, 0)$ cannot be less than one, so the performance of the predictor (7) combined with filters (5) is reasonably good.

Finally, we calculated the values $e_K(b_1, 0)$ and $e_{KH}(b_1, 0)$ with $b_1 = E\beta_1 = 0.5$ for the AR(1) process defined by (8) with $\beta_2 \equiv 0$. In other words, we compared the relative performance of the one-step predictor (7), with or without filtering, with respect to the predictor $y(t) = (E\beta_1)\, x(t)$ that requires the population mean of $\beta_1$ to be known. The values obtained for $e_K(0.5, 0)$ and $e_{KH}(0.5, 0)$ show again that the application of the filter reduces the forecasting error. Given that our system does not require knowledge of $E\beta_1$, the performance is reasonably good.

The results of these experiments appear to be consistent and stable with respect to variations of the parameters.
On the impact of truncation

Since it is impossible to implement convolution with infinitely supported kernels and inputs, we have to run the numerical calculations with truncated processes. Let us show that the filtering and forecasting described above are robust with respect to the truncation, i.e., that the truncation in (11) has a vanishing impact for large $d$. Since $H_a\left(e^{i\omega}\right) \in L_\infty(-\pi, \pi)$, we have that $h_a \in \ell_2$. Hence $h_{a,d} \circ x \to h_a \circ x$ in $\ell_\infty$ as $d \to +\infty$ for $x \in \ell_2$; in practice, only truncated inputs $x \in \ell_2$ are available. It can also be noted that the predicting kernel $k = \mathcal{Z}^{-1}K$ defined by (7) belongs to $\ell_2$. Therefore, the kernel $k \circ h_{a,d}$ converges to $k \circ h_a$ in $\ell_\infty$ as $d \to +\infty$.

Our numerical experiments for autoregressions with coefficients deemed to be untraceable demonstrated that truncation with the relatively small $d = 100$ does not diminish the good forecasting performance.

Conclusion

The paper proposes a family of causal smoothing filters. These filters are near-ideal, meaning that a higher rate of damping of the energy on the high frequencies would lead to the loss of causality; this is because they approximate non-causal filters transferring non-predictable processes into predictable ones. A possible application is preliminary smoothing of the inputs for predicting algorithms. A certain mild but stable improvement of the forecasting accuracy is demonstrated in experiments with simple autoregressions in a setting where their coefficients are deemed to be untraceable.

It could be interesting to investigate the computational limits of the algorithms described above, for instance, for long sequences, for higher order autoregressions, or for other types of input processes. It could be interesting to find other filters with similar properties. It could be useful to represent the filtering algorithm in terms of the discrete Fourier transform. We leave this for future work.
Acknowledgment
This work was supported by ARC grant DP120100928 of Australia to the author. In addition, the author thanks the anonymous referees for their valuable remarks, which helped to improve the paper.
References

[1] A. Aldroubi and M. Unser. (1994). A general sampling theory for nonideal acquisition devices. IEEE Trans. Signal Process. 42, no. 11, 2915–2925.

[2] J.M. Almira and A.E. Romero. (2008). How distant is the ideal filter of being a causal one? Atlantic Electronic Journal of Mathematics 3(1), 46–55.

[3] N.H. Bingham. (2012). Szegő's theorem and its probabilistic descendants. Probability Surveys 9, 287–324.

[4] F.G. Beutler. (1966). Error-free recovery of signals from irregularly spaced samples. SIAM Review 8(3), 328–335.

[5] J.R. Brown Jr. (1969). Bounds for truncation error in sampling expansion of band-limited signals. IEEE Transactions on Information Theory 15(4), 440–444.

[6] A. Church. (1940). On the concept of a random sequence. Bull. Amer. Math. Soc. 46, 254–260.

[7] S. Cambanis and A.R. Soltani. (1984). Prediction of stable processes: spectral and moving average representations. Z. Wahrsch. Verw. Gebiete 66, no. 4, 593–612.

[8] N. Dokuchaev. (2008). The predictability of band-limited, high-frequency, and mixed processes in the presence of ideal low-pass filters. Journal of Physics A: Mathematical and Theoretical 41, 382002 (7pp).

[9] N. Dokuchaev. (2010). Predictability on finite horizon for processes with exponential decrease of energy on higher frequencies. Signal Processing 90(2), 696–701.

[10] N. Dokuchaev. (2012). On sub-ideal causal smoothing filters. Signal Processing 92, iss. 1, 219–223.

[11] N. Dokuchaev. (2012). On predictors for band-limited and high-frequency time series. Signal Processing 92, iss. 10, 2571–2575.

[12] N. Dokuchaev. (2012). Predictors for discrete time processes with energy decay on higher frequencies. IEEE Transactions on Signal Processing 60, no. 11, 6027–6030.

[13] P.G.S.G. Ferreira. (1994). Interpolation and the discrete Papoulis–Gerchberg algorithm. IEEE Transactions on Signal Processing 42(10), 2596–2606.

[14] P.G.S.G. Ferreira. (1995a). Nonuniform sampling of nonbandlimited signals. IEEE Signal Processing Letters 2, iss. 5, 89–91.

[15] P.G.S.G. Ferreira. (1995b). Approximating non-band-limited functions by nonuniform sampling series. In: SampTA'95, 1995 Workshop on Sampling Theory and Applications, 276–281.

[16] P.G.S.G. Ferreira, A. Kempf, and M.J.C.S. Reis. (2007). Construction of Aharonov–Berry's superoscillations. J. Phys. A: Math. Gen. 40, 5141–5147.

[17] M. Frank and L. Klotz. (2002). A duality method in prediction theory of multivariate stationary sequences. Math. Nachr.

[18] J.R. Higgins. (1996). Sampling Theory in Fourier and Signal Analysis. Oxford University Press, NY.

[19] A.N. Kolmogorov. (1941). Interpolation and extrapolation of stationary stochastic series. Izv. Akad. Nauk SSSR Ser. Mat. 5, 3–14.

[20] A.N. Kolmogorov. (1965). Three approaches to the quantitative definition of information. Problems of Information Transmission 1:1, 1–7.

[21] M. Li and P.M.B. Vitányi. (1993). An Introduction to Kolmogorov Complexity and Its Applications. Springer.

[22] A. Lindquist and G. Picci. (2015). Linear Stochastic Systems: A Geometric Approach to Modeling, Estimation and Identification. Springer-Verlag, Berlin Heidelberg.

[23] D.W. Loveland. (1966). A new interpretation of von Mises' concept of random sequence. Z. Math. Logik Grundlagen Math. 12, 279–294.

[24] R. von Mises. (1919). Grundlagen der Wahrscheinlichkeitsrechnung. Math. Z. 5, 52–99.

[25] A.G. Miamee and M. Pourahmadi. (1988). Best approximations in $L^p(d\mu)$ and prediction problems of Szegő, Kolmogorov, Yaglom, and Nakazi. J. London Math. Soc. 38, no. 1, 133–145.

[26] B. Simon. (2011). Szegő's Theorem and Its Descendants: Spectral Theory for $L^2$ Perturbations of Orthogonal Polynomials. M.B. Porter Lectures. Princeton University Press, Princeton.

[27] G. Szegő. (1920). Beiträge zur Theorie der Toeplitzschen Formen. Math. Z. 6, 167–202.

[28] G. Szegő. (1921). Beiträge zur Theorie der Toeplitzschen Formen, II. Math. Z. 9, 167–190.

[29] S. Verblunsky. (1936). On positive harmonic functions (second paper). Proc. London Math. Soc. 40, 290–320.
Figure 1: Gain decay: the values $|M_{\theta,q}\left(e^{i\omega}\right)|$ for the non-causal filter (2) with $\theta = 0.$ and $q = 1.$ , and $|H_a\left(e^{i\omega}\right)|$ for the causal filter (5) with $a = 0.$ , $p = 0.$ , $N = 50$, $m = 2$.
Figure 2: Approximation of the identity operator: the distances from 1, i.e., the values $|M_{\theta,q}\left(e^{i\omega}\right) - 1|$ for the non-causal filter (2) with $\theta = 0.$ and $q = 1.$ , and $|H_a\left(e^{i\omega}\right) - 1|$ for the causal filter (5) with $a = 0.$ , $p = 0.$ , $N = 50$, $m = 2$.
Figure 3: Impulse response $h_a(t) = (\mathcal{Z}^{-1}H_a)(t)$ for the causal filter (5) with $a = 0.$ , $p = 0.$ , $N = 10$, $m = 1$.
Figure 4: Approximation of the one-step forward shift operator: the values $|K\left(e^{i\omega}\right) - e^{i\omega}|$ for the transfer function of the predictor (7) and $|K\left(e^{i\omega}\right) H_a\left(e^{i\omega}\right) - e^{i\omega}|$, i.e., with smoothing of the input process by filters (5), with $\gamma = 2.$ , $r = 0.$ , $a = 0.$ , $p = 0.$ , $N = 30$, $m = 3$.
Figure 5: Impulse response $h_a(t) = (\mathcal{Z}^{-1}H_a)(t)$ for the causal filter (5) with the parameters given in (12).
Figure 6: The impulse response $\mathcal{Z}^{-1}K$ of the predictor (7) and the impulse response $\mathcal{Z}^{-1}(KH_a)$ of the predictor combined with filters (5), with the parameters given in (12).
Figure 7: A path of the AR(2) process $x(t)$ versus the output of filter (5) with the parameters given in (12).
Figure 8: A path of the AR(2) process $x(t)$ versus two predictions $y(t-1)$ of $x(t)$; one was calculated without filtering, and the other was calculated after application of filter (5) with the parameters given in (12).