Sub-ideal causal smoothing filters for real sequences
Nikolai Dokuchaev
Department of Mathematics & Statistics, Curtin University, GPO Box U1987, Perth, 6845 Western Australia
Submitted November 27, 2014; revised July 4, 2015
Abstract
The paper considers causal smoothing of real sequences, i.e., discrete time processes, in a deterministic setting. A family of causal linear time-invariant filters is suggested. These filters approximate the gain decay of some non-causal ideal smoothing filters with transfer functions vanishing at a point of the unit circle and such that they transfer processes into predictable ones. In this sense, the suggested filters are near-ideal; a faster gain decay would lead to the loss of causality. Applications to predicting algorithms are discussed and illustrated by experiments with forecasting of autoregressions with coefficients that are deemed to be untraceable.
Key words: smoothing filters, causal filters, predicting, near-ideal filters, LTI filters.
AMS 2010 classification : 42A38, 93E11, 93E10, 42B30
Introduction

The paper studies causal smoothing of discrete time processes. For many applications, it is preferable to replace a process by a smoother process. In the continuous time setting, smoothness is associated with predictability: smooth analytic functions are predictable, i.e., their values on any interval uniquely define their values outside of this interval, and an ideal low-pass filter converts a function into an analytic one. For discrete time processes, it is not obvious how to define an analog of continuous time analyticity and smoothness. A classical approach is to consider predictability instead of analyticity. The predictability criterion for stochastic Gaussian stationary discrete time processes in the frequency domain setting is given by the classical Szegő–Kolmogorov Theorem. This theorem says that the optimal prediction error is zero if

$$\int_{-\pi}^{\pi} \log \phi\left(e^{i\omega}\right) d\omega = -\infty, \qquad (1)$$

where $\phi$ is the spectral density; see Kolmogorov [19], Szegő [27, 28], Verblunsky [29], and more recent literature reviews in [3, 26]. This means that a stationary Gaussian process is predictable if its spectral density vanishes on a part of the unit circle $\{z \in \mathbb{C}: |z| = 1\}$, i.e., if the process is "band-limited" in this sense. This result was extended to more general stable stochastic processes allowing spectral representations with spectral density via processes with independent increments; see, e.g., [7].

The stochastic setting is the most common one in causal smoothing and sampling; see, e.g., [1, 2, 4, 5, 3, 8, 9, 12, 11, 13, 14, 15, 16, 18, 20, 25, 24]. The transition to predictability of discrete time processes in a deterministic setting is non-trivial and is related to the concept of randomness for real sequences in the pathwise setting, without a probability measure. There are many classical works devoted to this important concept, starting from von Mises [24], Church [6], Kolmogorov [20], and Loveland [23]; see the references in [21].

It was found that real sequences are predictable if their Z-transform vanishes on an arc of the unit circle [11] in the complex plane or at the point $z = -1$ of the unit circle [12]. Therefore, smoothing can be interpreted as reduction of the energy on the higher frequencies. In particular, an ideal low-pass filter is a smoothing filter. This filter is non-causal, i.e., it requires the future values of the process. Similarly, a filter with a too high rate of decay of the frequency response at a certain point of the unit circle also cannot be causal, since causality is inconsistent with the predictability of the outputs described in [12].

The present paper readdresses the problem of causal smoothing of discrete time processes in the deterministic pathwise setting, without probabilistic assumptions. We suggest a family of causal smoothing filters that can be arbitrarily close to some ideal non-causal smoothing filters defined by equation (2) below. The suggested filters are near-ideal in the sense that they ensure an "almost" ideal rate of damping of the energy at the point $z = -1$; a faster decay of the frequency response is impossible for causal filters. This follows from the predictability criterion of [11]. In fact, the particular reference family of non-causal ideal filters (2) was selected because these filters transfer non-predictable processes into predictable ones satisfying the criterion from [12].
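As a small numerical aside (not part of the original argument; the cutoff $\Delta$ and all names below are our illustrative choices), the non-causality of an ideal low-pass filter can be seen directly: its impulse response, recovered by the inverse Z-transform on the unit circle, is a two-sided sinc sequence with nonzero values at negative times.

```python
import numpy as np

# Ideal low-pass filter with pass band [-Delta, Delta]; Delta is illustrative.
Delta = np.pi / 2
omega = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
H = (np.abs(omega) <= Delta).astype(float)  # frequency response on the unit circle

# h(t) = (1/2pi) * integral of H(e^{iw}) e^{iwt} dw, approximated by a Riemann sum
for t in range(-3, 4):
    h_t = (H * np.exp(1j * omega * t)).mean().real
    print(t, round(h_t, 4))  # ~ sin(Delta*t)/(pi*t); nonzero for t < 0, hence non-causal
```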
A similar approach was used in [10] for the continuous time setting.

The suggested near-ideal filters are discrete time causal linear time-invariant filters (LTI filters); they are represented as convolutions over the historical data, and they approximate the real unity uniformly on an arbitrarily large part of the unit circle.

It appears that these causal filters can be used to improve the performance of the predictors suggested in [12]. These predictors are robust with respect to noise contamination, and the error caused by a high-frequency noise depends on the intensity of this noise. A near-ideal smoothing filter cannot remove the high-frequency noise entirely, but it still can reduce it. This approach is discussed in the section on applications to predicting algorithms below, where we present some numerical experiments with the suggested near-ideal filters applied to forecasting of autoregressions in a setting where the autoregression coefficients are deemed to be untraceable.

Some definitions and notations
We denote by $\mathbb{Z}$ the set of all integers.

For $r \in [1, +\infty]$, we denote by $\ell_r$ the set of all sequences $x = \{x(t)\}_{t \in \mathbb{Z}} \subset \mathbb{R}$ such that $\|x\|_{\ell_r} = \left(\sum_{t=-\infty}^{\infty} |x(t)|^r\right)^{1/r} < +\infty$ for $r \in [1, \infty)$, or $\|x\|_{\ell_\infty} = \sup_t |x(t)| < +\infty$ for $r = +\infty$. We denote by $\ell_r^+$ the set of all sequences $x \in \ell_r$ such that $x(t) = 0$ for $t < 0$.

We denote by $L_r(-\pi, \pi)$ the usual Banach space of complex-valued $L_r$-integrable functions $x: [-\pi, \pi] \to \mathbb{C}$.

Let $D^c \stackrel{\Delta}{=} \{z \in \mathbb{C}: |z| > 1\}$, and let $\mathbb{T} = \{z \in \mathbb{C}: |z| = 1\}$.

For $x \in \ell_1$ or $x \in \ell_2$, we denote by $X = \mathcal{Z}x$ the Z-transform

$$X(z) = \sum_{t=-\infty}^{\infty} x(t) z^{-t}, \quad z \in \mathbb{C}.$$

Respectively, the inverse Z-transform $x = \mathcal{Z}^{-1}X$ is defined as

$$x(t) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X\left(e^{i\omega}\right) e^{i\omega t}\, d\omega, \quad t = 0, \pm 1, \pm 2, ....$$

If $x \in \ell_2$, then $X|_{\mathbb{T}}$ is defined as an element of $L_2(\mathbb{T})$, i.e., $X\left(e^{i\omega}\right) \in L_2(-\pi, \pi)$. If $X\left(e^{i\omega}\right) \in L_1(-\pi, \pi)$, then $x = \mathcal{Z}^{-1}X$ is defined as an element of $\ell_\infty$.

Let $H^2(D^c)$ be the Hardy space of functions that are holomorphic on $D^c$, including the point at infinity, with finite norm $\|h\|_{H^2(D^c)} = \sup_{\rho > 1} \|h(\rho e^{i\omega})\|_{L_2(-\pi,\pi)}$. Note that the Z-transform defines a bijection between the sequences from $\ell_2^+$ and the restrictions (i.e., traces) $X|_{\mathbb{T}}$ of the functions from $H^2(D^c)$ such that $X(e^{i\omega}) = \overline{X(e^{-i\omega})}$ for $\omega \in \mathbb{R}$; see, e.g., [22], Section 4.3. If $X\left(e^{i\omega}\right) \in L_1(-\pi, \pi)$ and $X(e^{i\omega}) = \overline{X(e^{-i\omega})}$, then $x = \mathcal{Z}^{-1}X$ is defined as a real-valued element of $\ell_\infty$.
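The Z-transform pair above is easy to check numerically. The following sketch (a minimal illustration with an arbitrary test sequence; the function names are ours, not the paper's) approximates $X = \mathcal{Z}x$ on $\mathbb{T}$ and recovers $x = \mathcal{Z}^{-1}X$ by a Riemann sum for the inversion integral:

```python
import numpy as np

def z_on_circle(x, t0, omega):
    """X(e^{iw}) = sum_t x(t) e^{-iwt}, where x[k] stores x(t0 + k)."""
    t = t0 + np.arange(len(x))
    return np.array([np.sum(x * np.exp(-1j * w * t)) for w in omega])

def inv_z(X, omega, t):
    """x(t) = (1/2pi) * integral of X(e^{iw}) e^{iwt} dw (Riemann sum, uniform grid)."""
    return (X * np.exp(1j * omega * t)).mean().real

omega = np.linspace(-np.pi, np.pi, 2048, endpoint=False)
x = np.array([1.0, -0.5, 0.25, -0.125])     # a test sequence supported on t = 0..3
X = z_on_circle(x, 0, omega)
print([round(inv_z(X, omega, t), 6) for t in range(-2, 5)])
# -> approximately [0, 0, 1.0, -0.5, 0.25, -0.125, 0]
```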
Problem setting

Let $x(t)$ be a discrete time process, $t \in \mathbb{Z}$. The output of a linear filter is the process

$$y(t) = \sum_{s=-\infty}^{\infty} h(t-s)\, x(s),$$

where $h: \mathbb{Z} \to \mathbb{R}$ is a given impulse response function. If $h(t) = 0$ for $t < 0$, then the output of the corresponding filter is

$$y(t) = \sum_{s=-\infty}^{t} h(t-s)\, x(s).$$

In this case, the filter and the impulse response function are said to be causal. The output of a causal filter at time $t$ can be calculated using only the past historical values $x(s)|_{s \le t}$ of the currently observable input process.

The goal is to approximate $x$ by a "smooth" filtered process $y$ via selection of an appropriate causal impulse response function $h$. We are looking for families of causal smoothing impulse response functions $h$ satisfying the following conditions.

(A) The outputs $y$ approximate the inputs $x$; an arbitrarily close approximation can be achieved via selection of a filter from this family.

(B) The spectrum of the output $y$ vanishes on the higher frequencies.

(C) The effectiveness of the damping of the energy on the higher frequencies approximates the effectiveness of some reference family of non-causal smoothing filters that transfer processes into predictable ones, i.e., such that the future values are uniquely defined by the past values.

Note that it is not a trivial task to satisfy Conditions (A)–(C) simultaneously. For example, there are sets of ideal low-pass filters such that the distance of these sets from the set of all causal filters is zero. In [2], this was shown for the set of low-pass filters with increasing pass interval $[-\Delta, \Delta]$, where $\Delta \in (0, \pi)$. However, Condition (B) is not satisfied for the corresponding causal approximations. Moreover, all known causal approximations of ideal filters do not feature zero values of the transfer functions. The present paper suggests transfer functions vanishing at $z = -1$ together with their derivatives.

The targeted properties of the near-ideal filters

Our purpose is to construct a family of causal filters such that Conditions (A)–(C) are satisfied. We will be using a reference family of "ideal" smoothing filters with the frequency response

$$M_{\theta,q}\left(e^{i\omega}\right) = \exp\left(-\frac{\theta}{|e^{i\omega}+1|^q}\right), \qquad (2)$$

where $q > 0$ and $\theta > 0$ are parameters. For these filters, Condition (A) is satisfied as $\theta \to 0$, and Condition (B) is satisfied for all $\theta > 0$. However, these filters are non-causal because, for any input $x \in \ell_2$, the values of the output processes of these filters at time $t+1$ are weakly predictable at time $t$ [12]. This is since $M_{\theta,q}\left(e^{i\omega}\right) \to 0$ fast enough as $\omega \to \pm\pi$.

For a given integer $m \ge 1$, we will construct a family of causal filters with impulse responses $h_a \in \ell_\infty^+$ and with the corresponding Z-transforms $H_a = \mathcal{Z}h_a$, where $a \in (0, 1)$ is a parameter. For this family, the following more special Conditions (a)–(c) will be satisfied (the number $m$ is used in Condition (b1) below).

(a) Approximation of the identity operator:

• (a1) $\sup_{\omega \in [0,\pi],\, a \in (0,1)} |H_a\left(e^{i\omega}\right)| < +\infty$.

• (a2) For any $\Omega \in (0, \pi)$, $H_a\left(e^{i\omega}\right) \to 1$ as $a \to 1-$ uniformly in $\omega \in [-\Omega, \Omega]$.

• (a3) For any $X\left(e^{i\omega}\right) \in L_1(-\pi, \pi)$ and any $x = \mathcal{Z}^{-1}X$, $\|y_a(\cdot) - x(\cdot)\|_{\ell_\infty} \to 0$ as $a \to 1-$, where $y_a$ is the output process

$$y_a(t) = \sum_{\tau=-\infty}^{t} h_a(t-\tau)\, x(\tau).$$

(b) The spectrum vanishes at a point of $\mathbb{T}$:

• (b1) For all $a$, $H_a\left(e^{i\omega}\right)$ is $m$ times differentiable at $\omega = \pi$, and

$$H_a(-1) = 0, \qquad \frac{d^k H_a}{d\omega^k}\left(e^{i\omega}\right)\Big|_{\omega=\pi} = 0, \quad k = 1, ..., m.$$

• (b2) For any $\varepsilon > 0$, there exist $\delta > 0$ and $a \in (0, 1)$ such that

$$\sup_{\omega \in [\pi-\delta,\, \pi+\delta]} |H_a\left(e^{i\omega}\right)| < \varepsilon.$$

(c) Approximation of the non-causal filters (2) with respect to the effectiveness of damping: for any $\varepsilon > 0$ and any $\Omega \in (0, \pi)$, $\Omega_1 \in (\Omega, \pi)$, $\Omega_2 \in (\Omega_1, \pi)$, there exist $\theta > 0$, $q > 0$, $a \in (0, 1)$ such that

$$|H_a\left(e^{i\omega}\right) - 1| \le \varepsilon, \quad \omega \in [-\Omega, \Omega], \qquad (3)$$

$$|H_a\left(e^{i\omega}\right)| \le |M_{\theta,q}\left(e^{i\omega}\right)|, \quad \omega \in [-\Omega_2, -\Omega_1] \cup [\Omega_1, \Omega_2]. \qquad (4)$$

Conditions (a)–(c) represent particular versions of the less specific Conditions (A)–(C). In particular, estimate (4) ensures that Condition (C) is satisfied.

Let a real number $p \in (1/2, 1)$ and integers $N \ge 1$ and $m \ge 1$ be given. For the real numbers $a \in (0, 1)$, we define the transfer functions

$$H_a(z) = \left(\exp\frac{(1-a)^p}{z + a} + G_a(z)\right)^m, \quad z \in \mathbb{C}, \qquad (5)$$

where

$$G_a(z) = -\xi(a, p) + \frac{\gamma(a, p)}{N}\left((-1)^N z^{-N} - 1\right),$$

and where

$$\xi(a, p) = \exp\left[-(1-a)^{p-1}\right], \qquad \gamma(a, p) = |1-a|^{p-2}\, \xi(a, p).$$

We consider the set $\{H_a\}_{a \in (0,1)}$ of transfer functions (5) with a fixed triplet $(m, N, p)$.
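A direct numerical rendering of (5) may help before turning to the main result. This is a sketch: the parameter values are illustrative stand-ins (several decimal digits of the experimental values are lost in the source), and the function names are ours. The two checks anticipate Conditions (a) and (b1).

```python
import numpy as np

def H_a(z, a, p=0.6, N=50, m=2):
    """Transfer function (5), with xi(a,p) and gamma(a,p) as defined above."""
    xi = np.exp(-(1.0 - a) ** (p - 1.0))        # xi(a,p) = exp[-(1-a)^{p-1}]
    gamma = (1.0 - a) ** (p - 2.0) * xi         # gamma(a,p) = |1-a|^{p-2} xi(a,p)
    G = -xi + (gamma / N) * ((-1.0) ** N * z ** (-N) - 1.0)
    return (np.exp((1.0 - a) ** p / (z + a)) + G) ** m

a = 0.999                                       # illustrative; a in (0, 1)
w = np.linspace(0.0, np.pi, 5)
print(np.abs(H_a(np.exp(1j * w), a)))           # gain: ~1 away from w = pi, ~0 at w = pi
print(abs(H_a(-1.0 + 0j, a)))                   # ~0: the zero at z = -1 (Condition (b1))
```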
Theorem 1. Conditions (a)–(c) are satisfied for the family of filters defined by the transfer functions $\{H_a\}_{a \in (0,1)}$. (Therefore, Conditions (A)–(C) are satisfied for this family.)

Proof of Theorem 1. Let us assume first that $m = 1$.

Clearly, the functions $H_a$ are holomorphic in $D^c$ and bounded in $D^c \cup \mathbb{T}$ for any $a \in (0, 1)$. Hence the inverse Z-transforms $h_a = \mathcal{Z}^{-1}H_a$ are causal impulse responses, i.e., $h_a(t) = 0$ for $t < 0$; see, e.g., [22], Theorem 4.3.2.

Let $f(a) = (1-a)^p$ and $\Psi_a(z) = f(a)(z+a)^{-1}$. By the definitions, $H_a(z) = \exp \Psi_a(z) + G_a(z)$, and

$$\Psi_a\left(e^{i\omega}\right) = f(a)\, \frac{\cos(\omega) + a - i\sin(\omega)}{(\cos(\omega) + a)^2 + \sin^2(\omega)}.$$

Let us prove that Condition (a) holds. Clearly, $|G_a\left(e^{i\omega}\right)| \to 0$ as $a \to 1-$ uniformly in $\omega \in (-\pi, \pi]$. Let $\omega_a \in (\pi/2, \pi)$ be such that $\cos(\omega_a) + a = 0$. We have that $\mathrm{Re}\, \Psi_a\left(e^{i\omega}\right) > 0$ for all $\omega \in [-\omega_a, \omega_a]$ and $\mathrm{Re}\, \Psi_a\left(e^{i\omega}\right) < 0$ for all $\omega \in [-\pi, -\omega_a) \cup (\omega_a, \pi]$. Further, we have that $\inf_{\omega \in [-\omega_a, \omega_a]} |e^{i\omega} + a| \ge \sqrt{1 - a^2}$. Hence

$$\sup_{\omega \in [-\omega_a, \omega_a]} |\Psi_a\left(e^{i\omega}\right)| \le \frac{f(a)}{\sqrt{1 - a^2}} = \frac{(1-a)^{p-1/2}}{(1+a)^{1/2}} \le 1.$$

Therefore, the value $|H_a\left(e^{i\omega}\right)|$ is uniformly bounded in $a, \omega$. Hence Condition (a1) holds.

Further, we have that $\omega_a \to \pi-$ as $a \to 1-$. Hence, for any $\Omega \in [0, \pi)$, we have that $\sup_{\omega \in [-\Omega, \Omega]} |\Psi_a\left(e^{i\omega}\right)| \to 0$ as $a \to 1-$ and $\sup_{\omega \in [-\Omega, \Omega]} |H_a\left(e^{i\omega}\right) - 1| \to 0$ as $a \to 1-$. Hence Condition (a2) holds.

Let us show that Condition (a3) holds. Let $Y_a = H_a X$. By Condition (a2), $Y_a\left(e^{i\omega}\right) \to X\left(e^{i\omega}\right)$ as $a \to 1-$ for almost all $\omega \in \mathbb{R}$. Clearly, there exist $a_0 \in (0, 1)$ and $c > 0$ such that $\sup_{\omega,\, a \ge a_0} |H_a\left(e^{i\omega}\right)| \le c$. Hence $|Y_a\left(e^{i\omega}\right) - X\left(e^{i\omega}\right)| \le (c+1)|X\left(e^{i\omega}\right)|$. By the assumptions, $X\left(e^{i\omega}\right) = \mathcal{Z}x \in L_1(-\pi, \pi)$. By the Lebesgue Dominated Convergence Theorem, it follows that $\left\|Y_a\left(e^{i\omega}\right) - X\left(e^{i\omega}\right)\right\|_{L_1(-\pi,\pi)} \to 0$ as $a \to 1-$. Since $\|y_a - x\|_{\ell_\infty} \le (2\pi)^{-1}\|Y_a - X\|_{L_1(-\pi,\pi)}$, Condition (a3) follows.

Let us show that Condition (b) holds. We have that

$$\exp\left(\Psi_a\left(e^{i\pi}\right)\right) = \exp\left(\frac{f(a)}{a-1}\right) = \exp\left(-(1-a)^{p-1}\right) = \xi(a, p) = -G_a(-1).$$

Hence $H_a(-1) = \exp(\Psi_a(-1)) + G_a(-1) = 0$.

Let us show that $\frac{dH_a}{d\omega}\left(e^{i\omega}\right)\big|_{\omega=\pi} = 0$. Let $r(\omega) = \mathrm{Re}\exp\left(\Psi_a\left(e^{i\omega}\right)\right)$, $s(\omega) = \mathrm{Im}\exp\left(\Psi_a\left(e^{i\omega}\right)\right)$, and $q(\omega) = \frac{1}{N}\,\mathrm{Im}\left(e^{-iN(\omega-\pi)} - 1\right)$.

Clearly, the function $\Psi_a\left(e^{i\omega}\right)$ is differentiable in $\omega \in \mathbb{R}$ for any $a$, as well as the functions $r(\omega)$, $s(\omega)$, and $q(\omega)$. In addition, we have that $r(\omega) = r(2\pi - \omega)$. Hence $r(\omega)$ is even about the point $\omega = \pi$ and differentiable. This implies that $\frac{dr}{d\omega}(\omega)\big|_{\omega=\pi} = 0$.

By the definitions, $s(\omega) = \exp(\mathrm{Re}\,\Psi_a)\sin(\mathrm{Im}\,\Psi_a)$ and $\exp\left(\mathrm{Re}\,\Psi_a\left(e^{i\omega}\right)\right) \to \xi(a, p)$ as $\omega \to \pi$. We have that $s(\pi) = q(\pi) = 0$. L'Hôpital's rule gives that

$$\lim_{\omega\to\pi} \frac{ds(\omega)/d\omega}{dq(\omega)/d\omega} = \lim_{\omega\to\pi} \frac{s(\omega)}{q(\omega)} = \lim_{\omega\to\pi} \frac{\xi(a,p)\sin\left(-\frac{(1-a)^p \sin(\omega)}{(a + \cos(\omega))^2 + \sin^2(\omega)}\right)}{-N^{-1}\sin(N(\omega-\pi))} = -\xi(a,p)(1-a)^p(a-1)^{-2} = -\gamma(a, p).$$

Clearly, $\frac{dq(\omega)}{d\omega}\big|_{\omega=\pi} = -1$. Hence

$$\frac{d}{d\omega}\exp\Psi_a\left(e^{i\omega}\right)\Big|_{\omega=\pi} = i\,\frac{ds(\omega)}{d\omega}\Big|_{\omega=\pi} = i\gamma(a, p).$$

On the other hand,

$$\frac{dG_a\left(e^{i\omega}\right)}{d\omega}\Big|_{\omega=\pi} = \frac{d}{d\omega}\left(-\xi(a, p) + \frac{\gamma(a, p)}{N}\left(e^{-iN(\omega-\pi)} - 1\right)\right)\Big|_{\omega=\pi} = -i\gamma(a, p).$$

Hence

$$\frac{d}{d\omega} H_a\left(e^{i\omega}\right)\Big|_{\omega=\pi} = \frac{d}{d\omega}\exp\left(\Psi_a\left(e^{i\omega}\right)\right)\Big|_{\omega=\pi} + \frac{dG_a\left(e^{i\omega}\right)}{d\omega}\Big|_{\omega=\pi} = 0.$$

Let us show that Condition (b2) holds. We have that

$$\mathrm{Re}\,\Psi_a\left(e^{i\omega}\right) = \frac{f(a)(\cos(\omega) + a)}{(\cos(\omega) + a)^2 + \sin^2(\omega)} = f(a)\, \frac{\mathrm{Re}\,(e^{i\omega} + a)}{|e^{i\omega} + a|^2}.$$

We have that $-\mathrm{Re}\,(e^{i\omega} + a)/|e^{i\omega} + a|$ is non-decreasing in $\omega \in [\omega_a, \pi]$ and converges to $1$ as $\omega \to \pi-$, and $1/|e^{i\omega} + a|$ is non-decreasing in $\omega \in [\omega_a, \pi]$ and converges to $(1-a)^{-1}$ as $\omega \to \pi-$. Hence the product of these functions, $-f(a)^{-1}\,\mathrm{Re}\,\Psi_a\left(e^{i\omega}\right)$, is non-decreasing in $\omega \in [\omega_a, \pi]$. Hence we can select $\widehat\omega_a \in [\omega_a, \pi]$ such that $-\mathrm{Re}\,\Psi_a\left(e^{i\omega}\right) \ge -\mathrm{Re}\,\Psi_a\left(e^{i\pi}\right)/2$ for all $\omega \in [\widehat\omega_a, \pi]$, i.e., $\mathrm{Re}\,\Psi_a\left(e^{i\omega}\right) \le \mathrm{Re}\,\Psi_a\left(e^{i\pi}\right)/2$ for all $\omega \in [\widehat\omega_a, \pi]$. In addition, $\mathrm{Re}\,\Psi_a\left(e^{i\omega}\right) = \mathrm{Re}\,\Psi_a\left(e^{-i\omega}\right)$. Hence $\mathrm{Re}\,\Psi_a\left(e^{i\omega}\right) \le \mathrm{Re}\,\Psi_a\left(e^{i\pi}\right)/2$ for all $\omega \in [\widehat\omega_a, 2\pi - \widehat\omega_a] = [\pi - \delta_a, \pi + \delta_a]$, where $\delta_a = \pi - \widehat\omega_a$.

Further, we have that

$$\mathrm{Re}\,\Psi_a\left(e^{i\pi}\right) = \frac{(1-a)^p}{a-1} = -(1-a)^{p-1} \to -\infty \quad \text{as } a \to 1-.$$

For a given $\varepsilon > 0$, let us select $\bar a$ such that $\mathrm{Re}\,\Psi_a\left(e^{i\pi}\right)/2 < \log(\varepsilon/2)$ for all $a \ge \bar a$. In addition, we can select $\tilde a \ge \bar a$ such that $|G_a\left(e^{i\omega}\right)| \le \varepsilon/2$ for all $a \ge \tilde a$ and all $\omega$. Then Condition (b2) holds with $a = \tilde a$ and $\delta = \delta_{\tilde a}$ selected for the given $\varepsilon$. Therefore, Condition (b) holds.

Let us show that Condition (c) holds. It follows from the proof of (a) above that, for given $\varepsilon > 0$ and $\Omega$, we can select $\bar a$ such that (3) holds for $a \ge \bar a$. Further, let $q > 0$ be arbitrary. For any $\Omega_1 > \Omega$ and $\Omega_2 > \Omega_1$,

$$\sup_{\omega \in [-\Omega_2, -\Omega_1] \cup [\Omega_1, \Omega_2]} |M_{\theta,q}\left(e^{i\omega}\right) - 1| \to 0 \quad \text{as } \theta \to 0.$$

Clearly, (4) holds for small enough $\theta$. Hence Condition (c) holds.

We have proved the theorem for the case where $m = 1$. The extension to the case where $m > 1$ is straightforward. This completes the proof of Theorem 1. □
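The causality established at the start of the proof can be checked numerically. The following sketch reuses the H_a function from the snippet after (5); the FFT grid size and the parameters are illustrative (roughly mirroring Figure 3 below). Sampling $H_a$ on the unit circle and inverting with the FFT approximates $h_a = \mathcal{Z}^{-1}H_a$, and the coefficients aliased to negative times should vanish:

```python
import numpy as np

# assumes H_a from the sketch after equation (5)
n = 4096
w = 2.0 * np.pi * np.arange(n) / n                 # uniform grid on [0, 2pi)
vals = H_a(np.exp(1j * w), a=0.9, p=0.6, N=10, m=1)

h = np.fft.ifft(vals)                              # h[t] ~ h_a(t) for t = 0..n-1
# For a two-sided kernel, indices n//2..n-1 would carry the t < 0 coefficients.
print(np.max(np.abs(h[n // 2:])))                  # ~0: the filter is causal
print(np.max(np.abs(h.imag)))                      # ~0: the kernel is real-valued
```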
Illustrative examples

Figures 1–3 show examples of the frequency responses and the impulse response functions for the filters described above.

Figure 1 shows the shapes of the gain curves $|M_{\theta,q}\left(e^{i\omega}\right)|$ for the reference non-causal filter (2) with $\theta = 0.$ and $q = 1.$ , and $|H_a\left(e^{i\omega}\right)|$ for the near-ideal causal filters (5) with $a = 0.$ , $p = 0.$ , $N = 50$, $m = 2$.

Figure 2 shows the shapes of the error curves for the approximation of the identity operator on the lower frequencies. More precisely, it shows $|M_{\theta,q}\left(e^{i\omega}\right) - 1|$ for the reference non-causal filter (2) and $|H_a\left(e^{i\omega}\right) - 1|$ for the near-ideal causal filters (5), with the same parameters as for Figure 1.

Figure 3 shows an example of the impulse response $h_a = \mathcal{Z}^{-1}H_a$ calculated as the inverse Z-transform for the causal filter (5) with $a = 0.$ , $p = 0.$ , $N = 10$, $m = 1$. Since the properties of $H_a$ guarantee that $\mathrm{Im}\, h_a(t) = 0$ for all $t$ and that $h_a(t) = 0$ for all $t < 0$, we show the values for $t \ge 0$ only.

In particular, these examples show that the impulse response functions $h_a$ can take negative values; i.e., these filters do not represent averaging with a positive kernel.

Applications to predicting algorithms

A possible application of the filters suggested above is preliminary smoothing of the input signals for predicting algorithms. For this task, causality is crucial. It is known that band-limited sequences, i.e., sequences with spectrum vanishing on an interval of $\mathbb{T}$, are predictable. In addition, there are predictable sequences whose spectrum vanishes at a single point of $\mathbb{T}$; see [11, 12], where some predicting algorithms were suggested.

It can be noted that the filters suggested above do not change the input sequences significantly if $a$ is close to 1; the energy of the input is not damped on a given arc of $\mathbb{T}$. In fact, the energy is damped on a small neighborhood of the point $e^{i\pi} = -1$, and the size of this neighborhood converges to zero as $a \to 1-$. Therefore, one cannot expect that the filters introduced above will help to improve the performance of the predicting algorithms [11] requiring that the spectrum vanishes on a fixed arc of the unit circle. However, it appears that these filters can help to improve the performance of the predictors [12] oriented on processes $x$ with spectrum vanishing at a single point. More precisely, the predictors [12] are applicable to discrete time processes $x$ such that, for some $\theta > 0$, $q > 0$, $c > 0$,

$$\sup_{\omega \in [-\pi,\pi]} \frac{|X\left(e^{i\omega}\right)|}{M_{\theta,q}\left(e^{i\omega}\right)} \le c, \qquad X = \mathcal{Z}x. \qquad (6)$$

In particular, it follows that the filters (2) transfer sequences of a general type into predictable sequences such that (6) holds; respectively, the filters (2) cannot be causal.

The predicting kernel of [12] was defined as $k = k(\cdot, \gamma) = \mathcal{Z}^{-1}K$, where

$$K(z) \stackrel{\Delta}{=} z\left(1 - \exp\left[-\frac{\gamma}{z + 1 - \gamma^{-r}}\right]\right), \qquad (7)$$

and where $r > 0$ and $\gamma > 0$ are parameters. This predictor produces the process

$$y(t) = \sum_{d=-\infty}^{t} k(t-d)\, x(d)$$

approximating $x(t+1)$ as $\gamma \to +\infty$ for all inputs $x$ satisfying (6) with $q > 1/r$. (In the notations of [12], $r = 2\mu/(q-1)$, where $\mu > 0$ and $q > 1$ are the parameters.) The function $K\left(e^{i\omega}\right)$ approximates the function $e^{i\omega}$ representing the forward one-step shift in the time domain; the value $|K\left(e^{i\omega}\right) - e^{i\omega}|$ is small everywhere but in a small neighborhood of $\omega = \pi$. Therefore, the process $y(t)$ represents a one-step prediction of $x(t+1)$ if $X\left(e^{i\omega}\right)$ vanishes with a certain rate at $\omega = \pi$.
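The compensation effect discussed below can be previewed numerically. This sketch relies on our reconstruction of (7) from the damaged source, reuses H_a from the earlier snippet, and uses illustrative values of $\gamma$, $r$, and the filter parameters (the exact experimental values are truncated in the source):

```python
import numpy as np

# assumes H_a from the sketch after equation (5)
def K(z, gamma=2.0, r=0.5):
    """Predicting transfer function (7), as reconstructed above."""
    return z * (1.0 - np.exp(-gamma / (z + 1.0 - gamma ** (-r))))

w = np.linspace(-np.pi, np.pi, 2001)
z = np.exp(1j * w)
err_plain = np.abs(K(z) - z)                             # predictor (7) alone
err_filtered = np.abs(K(z) * H_a(z, a=0.9, p=0.6) - z)   # with preliminary smoothing (5)
print(err_plain.max(), err_filtered.max())               # smoothing reduces the peak near w = pi
```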
It was shown in [12] that

$$\sup_t |x(t+1) - y(t)| \to 0 \quad \text{as } \gamma \to +\infty$$

for real sequences $x$ such that (6) holds, i.e., the prediction error vanishes as $\gamma \to +\infty$. Moreover, the error vanishes uniformly over classes of processes $x$ from bounded sets in $\ell_\infty$ such that (6) holds with a given $c$.

The predictors (7) are robust with respect to some small noise contamination, meaning that the prediction error depends continuously on the intensity of the contaminating noise. However, for large $\gamma$, the values of $K\left(e^{i\omega}\right)$ can be very large in a neighborhood of $\omega = \pi$; in this case, the error can be large even for a small noise.

We suggest applying filter (5) to compensate for the presence of the large values of $K\left(e^{i\omega}\right)$ in a small neighborhood of $\omega = \pi$ and therefore to reduce the impact of the high-frequency noise. This is illustrated by Figure 4, showing the shapes of the error curves for the approximation of the forward one-step shift operator. More precisely, it shows the shape of $|K\left(e^{i\omega}\right) - e^{i\omega}|$ for the predictor (7) and the shape of $|K\left(e^{i\omega}\right) H_a\left(e^{i\omega}\right) - e^{i\omega}|$ for the transfer functions (5) and (7), which corresponds to preliminary smoothing of the input process by filters (5). These shapes characterize the imperfection of the predictors, since the transfer function $e^{i\omega}$ corresponds to the one-step forward shift operator in the time domain, i.e., $e^{i\omega}$ represents an ideal non-causal error-free one-step-ahead predictor. It appears that the application of the filter improves the approximation of $e^{i\omega}$.

Our setting does not involve stochastic processes and probability measures; it is oriented on smoothing of real sequences. However, to provide an example of an application of our smoothing filters, we considered a toy example with prediction of a stochastic Gaussian stationary process $x(t)$ evolving as an autoregression of AR(2) type,

$$x(t) = \beta_1 x(t-1) + \beta_2 x(t-2) + \sigma\eta(t), \quad t \in \mathbb{Z}. \qquad (8)$$

Here $\eta(t)$ is a stochastic discrete time Gaussian white noise, $E\eta(t) = 0$, $E\eta(t)^2 = 1$. The coefficient $\sigma > 0$ describes the intensity of the noise.

For the estimation of the effectiveness of the predictors, we use the ratio

$$e(b_1, b_2) = \frac{\left(\sum_{t=1}^{n} |y(t-1) - x(t)|^2\right)^{1/2}}{\left(\sum_{t=1}^{n} |b_1 x(t-1) + b_2 x(t-2) - x(t)|^2\right)^{1/2}}, \qquad (9)$$

where $b_k \in \mathbb{R}$ are parameters. The values $y(t-1)$ are supposed to be the predictions, at time $t-1$, of the future values $x(t)$. Since the sequence $\{x(t)\}$ does not satisfy (6) due to the presence of noise, a forecasting error is inevitable. Ratio (9) allows us to compare the error of a predicting algorithm generating $y$ with the error of a linear predictor with the coefficients $b_1$ and $b_2$. More precisely, the value $e(b_1, b_2)$ represents the ratio of the error generated by the predictor producing $y$ to the error of the linear predictor based on the hypothesis that $\beta_1 = b_1$ and $\beta_2 = b_2$.

If the vector $(\beta_1, \beta_2)$ is known, then the optimal one-step predictor of $x(t)$ is

$$y(t-1) = \beta_1 x(t-1) + \beta_2 x(t-2). \qquad (10)$$

In this case, the value $n^{-1}\sum_{t=1}^{n} |\beta_1 x(t-1) + \beta_2 x(t-2) - x(t)|^2$ represents the sample mean of the squared error of this optimal predictor with known values of $(\beta_1, \beta_2)$. Therefore, the optimal predictor (10) ensures that $e(\beta_1, \beta_2) \approx 1$ for a large enough $n$. Similarly, for any given $n$, the average value of $e(\beta_1, \beta_2)$ is also close to one for a sufficiently large number of Monte Carlo trials, for the optimal predictor (10). Respectively, any other predictor besides (10), including predictor (7), cannot achieve a lesser average value of $e(\beta_1, \beta_2)$ for a sufficiently large $n$ or as an average over a sufficiently large number of Monte Carlo trials.

However, in many practical situations, the value of $(\beta_1, \beta_2)$ is unknown, and, respectively, predictor (10) cannot be used. On the other hand, predictor (7) does not require knowledge of $(\beta_1, \beta_2)$ and can be applied in models with unknown or random and time-variable $(\beta_1, \beta_2)$, where predictor (10) is not applicable. In other words, predictor (7) can be applied to processes with an unknown shape of the spectral representation. Therefore, it is reasonable to estimate the performance of a predictor using $e(b_1, b_2)$ with $b_k \ne \beta_k$, for instance, with $b_k$ selected as the expected value, or the median, or the upper or lower boundaries of the unknown $\beta_k$.

Since it is impossible to implement convolution with infinitely supported kernels and inputs, one has to use truncated kernels and inputs for the calculations. In the experiments described below, we replaced $k$ and $h_a = \mathcal{Z}^{-1}H_a$ by the truncated kernels

$$k_d(t) = \mathbb{I}_{\{t \le d\}}\, k(t), \qquad h_{a,d}(t) = \mathbb{I}_{\{t \le d\}}\, h_a(t), \qquad d > 0. \qquad (11)$$

In other words, the original $k = \mathcal{Z}^{-1}K$ and $h_a = \mathcal{Z}^{-1}H_a$ were used as benchmarks; only their truncated versions were actually implemented.

We use the value (9) to estimate the performance of the predicting algorithms in the following two cases:

• The algorithm is applied without filtering and produces $y = k_d \circ x$; we denote by $e_K(b_1, b_2)$ the corresponding values (9).

• The algorithm is applied with filtering and produces $y = (k_d \circ h_{a,d}) \circ x$; we denote by $e_{KH}(b_1, b_2)$ the corresponding values (9).

In both cases, $y(t)$ is calculated using the historical data $\{x(s)\}_{t-d \le s \le t}$.

In our experiments, we used equations (5) and (7) with

$$\gamma = 1.\,, \quad r = 1.\,, \quad a = 0.\,, \quad p = 0.\,, \quad N = 100, \quad m = 2. \qquad (12)$$

Note that the selection of a too large $\gamma$ makes the calculation of $k$ challenging, since it involves precise integration of the fast growing $K\left(e^{i\omega}\right)$. The choice of the parameters in (12) ensures that the values of $|K\left(e^{i\omega}\right)|$ are not large. Figure 5 shows the corresponding impulse response $\mathcal{Z}^{-1}H_a$. Figure 6 shows the corresponding impulse responses $\mathcal{Z}^{-1}K$ and $\mathcal{Z}^{-1}(KH_a)$.

In our experiment with the AR(2) process, we used 10,000 Monte Carlo trials with $n = d = 100$ and $\sigma = 0.$ . For each trial, we selected $(\beta_1, \beta_2)$ randomly and independently. The distribution of $(\beta_1, \beta_2)$ at each trial was the following: $\beta_1$ has the uniform distribution on the interval $(0, 1)$, and $\beta_2 = \xi\sqrt{1 - \beta_1^2}$, where $\xi$ is a random variable independent of $\beta_1$ and uniformly distributed on the interval $(-1, 1)$. This choice ensures that the eigenvalues of the autoregression stay inside the unit circle almost surely. We used MATLAB and standard personal computers; an experiment with 10,000 Monte Carlo trials would take about five minutes of calculation time.
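For concreteness, here is a condensed single-trial sketch of this experiment. It is our own minimal reimplementation, not the author's MATLAB code: the helper names are ours, the kernels are applied sequentially rather than pre-convolved, and $\sigma$, $\gamma$, $r$, and the filter parameters are illustrative stand-ins for the truncated values in (12). It reuses H_a and K from the earlier snippets.

```python
import numpy as np

# assumes H_a and K from the earlier sketches
rng = np.random.default_rng(0)
n = d = 100
sigma = 0.1                                        # illustrative noise intensity

def truncated_kernel(Hfun, d, n_fft=8192):
    """Truncation (11): keep Z^{-1}H on t = 0..d, computed via the inverse FFT."""
    w = 2.0 * np.pi * np.arange(n_fft) / n_fft
    return np.fft.ifft(Hfun(np.exp(1j * w))).real[: d + 1]

k_d = truncated_kernel(lambda z: K(z, gamma=2.0, r=0.5), d)    # predictor (7)
h_ad = truncated_kernel(lambda z: H_a(z, a=0.9, p=0.6), d)     # smoother (5)

# One trial: an AR(2) path with coefficients drawn as described above
b1 = rng.uniform(0.0, 1.0)
b2 = rng.uniform(-1.0, 1.0) * np.sqrt(1.0 - b1 ** 2)
T = 3 * d + n
x = np.zeros(T)
for t in range(2, T):
    x[t] = b1 * x[t - 1] + b2 * x[t - 2] + sigma * rng.standard_normal()

def apply_kernel(kernel, sig, t):
    """y(t) = sum_{s=0}^{d} kernel(s) * sig(t - s), using data up to time t."""
    return float(np.dot(kernel, sig[t - np.arange(len(kernel))]))

xs = np.array([apply_kernel(h_ad, x, t) if t >= d else 0.0 for t in range(T)])

num_K = num_KH = den = 0.0
for t in range(T - n, T):
    y_plain = apply_kernel(k_d, x, t - 1)          # prediction of x(t), no smoothing
    y_smooth = apply_kernel(k_d, xs, t - 1)        # prediction with pre-smoothing
    e_opt = b1 * x[t - 1] + b2 * x[t - 2] - x[t]   # error of the optimal predictor (10)
    num_K += (y_plain - x[t]) ** 2
    num_KH += (y_smooth - x[t]) ** 2
    den += e_opt ** 2

print("e_K  =", np.sqrt(num_K / den))              # one sample of ratio (9)
print("e_KH =", np.sqrt(num_KH / den))             # the paper averages such ratios over 10,000 trials
```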
First, we compared the relative performance of our predictors with respect to the performance of the optimal predictor that requires the values $\beta_1$ and $\beta_2$ to be known. We obtained that the mean value over all Monte Carlo trials of $e_K(\beta_1, \beta_2)$ is 1.5019 and the mean value of $e_{KH}(\beta_1, \beta_2)$ is 1.1177. This indicates that the application of filter (5) improves the performance of the predictor. As was mentioned above, we cannot expect, for sufficiently many Monte Carlo trials, the mean values of $e_K(\beta_1, \beta_2)$ and $e_{KH}(\beta_1, \beta_2)$ to be less than one, so the performance of the predictor (7) combined with filters (5) is reasonably good.

Second, we calculated the mean values $e_K(b_1, b_2)$ and $e_{KH}(b_1, b_2)$ with $b_k = E\beta_k$, where $E\beta_k$ is the population mean of $\beta_k$, $k = 1, 2$. For our parameters of the Monte Carlo trials, we have $E\beta_1 = 0.5$ and $E\beta_2 = 0$. The corresponding values $e_K(0.5, 0)$ and $e_{KH}(0.5, 0)$ represent a comparison of the performance of our predictor (without or with preliminary filtering) with the one-step predictor of $x(t)$ given by

$$y(t-1) = (E\beta_1)\, x(t-1) + (E\beta_2)\, x(t-2).$$

Note that this predictor requires the population means of $\beta_1$ and $\beta_2$ to be known. The mean values obtained for $e_K(0.5, 0)$ and $e_{KH}(0.5, 0)$ indicate a good performance of our filter/predictor system, especially if we take into account that our system does not require the values $E\beta_k$ to be known.

Figure 7 shows a sample path of the AR(2) process $x(t)$ and the filtered process obtained using filter (5) with the parameters defined by (12). Figure 8 shows sample paths of the AR(2) process $x(t)$ and the outputs $y(t)$ of the predictor [12] without preliminary filtering and with preliminary filtering using filter (5) with the parameters defined by (12). It shows the values $y(t-1)$, i.e., the predictions of $x(t)$, versus the values of $x(t)$.

In addition, we considered a modification of process (8) with $\beta_2 \equiv 0$, i.e., an AR(1) process. We used the same predictors and filters as for the experiments with the AR(2) process described above. We again ran 10,000 Monte Carlo trials with $n = d = 100$, with $\sigma = 0.$ , and with a randomly selected $\beta_1$ distributed uniformly on the interval $(0, 1)$. We compared the relative performance of our predictors with respect to the performance of the optimal predictor that requires the value $\beta_1$ to be known. We obtained that the mean value of $e_K(\beta_1, 0)$ is 1.2830 and the mean value of $e_{KH}(\beta_1, 0)$ is 1.1023. These numbers indicate again that the use of the filter improves the performance of the predictor. Again, the mean values of $e_K(\beta_1, 0)$ and $e_{KH}(\beta_1, 0)$ cannot be less than one, so the performance of the predictor (7) combined with filters (5) is reasonably good.

Finally, we calculated the values $e_K(b_1, 0)$ and $e_{KH}(b_1, 0)$ with $b_1 = E\beta_1 = 0.5$ for the AR(1) process defined by (8) with $\beta_2 \equiv 0$. In other words, we compared the relative performance of the one-step predictor (7), with or without filtering, with respect to the predictor $y(t) = (E\beta_1)\, x(t)$ that requires the population mean of $\beta_1$ to be known. The values obtained for $e_K(0.5, 0)$ and $e_{KH}(0.5, 0)$ show again that the application of the filter reduces the forecasting error. Given that our system does not require knowledge of $E\beta_1$, the performance is reasonably good.

The results of these experiments appear to be consistent and stable with respect to variations of the parameters.
On the impact of truncation

Since it is impossible to implement convolution with infinitely supported kernels and inputs, we have to run the numerical calculations with truncated processes. Let us show that the filtering and forecasting described above are robust with respect to the truncation, i.e., that the truncation in (11) has a vanishing impact for large $d$. Since $H_a\left(e^{i\omega}\right) \in L_\infty(-\pi, \pi)$, we have that $h_a \in \ell_2$. Hence $h_{a,d} \circ x \to h_a \circ x$ in $\ell_\infty$ as $d \to +\infty$ for $x \in \ell_2$; in practice, only truncated inputs $x \in \ell_2$ are available. It can also be noted that the predicting kernel $k = \mathcal{Z}^{-1}K$ defined by (7) belongs to $\ell_2$. Therefore, the kernel $k \circ h_{a,d}$ converges to $k \circ h_a$ in $\ell_\infty$ as $d \to +\infty$.

Our numerical experiments for autoregressions with coefficients deemed to be untraceable demonstrated that truncation with the relatively small $d = 100$ does not diminish the good forecasting performance.

Conclusion

The paper proposes a family of causal smoothing filters. These filters are near-ideal, meaning that a higher rate of damping of the energy on the high frequencies would lead to the loss of causality; this is because they approximate non-causal filters transferring non-predictable processes into predictable ones. A possible application is preliminary smoothing of the inputs for predicting algorithms. A certain mild but stable improvement of the forecasting accuracy is demonstrated in experiments with simple autoregressions in a setting where their coefficients are deemed to be untraceable.

It could be interesting to investigate the computational limits of the algorithms described above, for instance, for long sequences, for higher order autoregressions, or for other types of input processes. It could be interesting to find other filters with similar properties. It could be useful to represent the filtering algorithm in terms of the discrete Fourier transform. We leave this for future work.
Acknowledgment
This work was supported by ARC grant DP120100928 of Australia to the author. In addition, the author thanks the anonymous referees for their valuable remarks, which helped to improve the paper.
References

[1] A. Aldroubi and M. Unser. (1994). A general sampling theory for nonideal acquisition devices. IEEE Trans. Signal Process. 42, no. 11, 2915–2925.

[2] J.M. Almira and A.E. Romero. (2008). How distant is the ideal filter of being a causal one? Atlantic Electronic Journal of Mathematics 3(1), 46–55.

[3] N.H. Bingham. (2012). Szegő's theorem and its probabilistic descendants. Probability Surveys 9, 287–324.

[4] F.G. Beutler. (1966). Error-free recovery of signals from irregularly spaced samples. SIAM Review 8(3), 328–335.

[5] J.R. Brown Jr. (1969). Bounds for truncation error in sampling expansion of band-limited signals. IEEE Transactions on Information Theory 15(4), 440–444.

[6] A. Church. (1940). On the concept of a random sequence. Bull. Amer. Math. Soc. 46, 254–260.

[7] S. Cambanis and A.R. Soltani. (1984). Prediction of stable processes: spectral and moving average representations. Z. Wahrsch. Verw. Gebiete 66, no. 4, 593–612.

[8] N. Dokuchaev. (2008). The predictability of band-limited, high-frequency, and mixed processes in the presence of ideal low-pass filters. Journal of Physics A: Mathematical and Theoretical 41, 382002 (7pp).

[9] N. Dokuchaev. (2010). Predictability on finite horizon for processes with exponential decrease of energy on higher frequencies. Signal Processing 90(2), 696–701.

[10] N. Dokuchaev. (2012). On sub-ideal causal smoothing filters. Signal Processing 92, iss. 1, 219–223.

[11] N. Dokuchaev. (2012). On predictors for band-limited and high-frequency time series. Signal Processing 92, iss. 10, 2571–2575.

[12] N. Dokuchaev. (2012). Predictors for discrete time processes with energy decay on higher frequencies. IEEE Transactions on Signal Processing 60, no. 11, 6027–6030.

[13] P.G.S.G. Ferreira. (1994). Interpolation and the discrete Papoulis–Gerchberg algorithm. IEEE Transactions on Signal Processing 42(10), 2596–2606.

[14] P.G.S.G. Ferreira. (1995a). Nonuniform sampling of nonbandlimited signals. IEEE Signal Processing Letters 2, iss. 5, 89–91.

[15] P.G.S.G. Ferreira. (1995b). Approximating non-band-limited functions by nonuniform sampling series. In: SampTA'95, 1995 Workshop on Sampling Theory and Applications, 276–281.

[16] P.G.S.G. Ferreira, A. Kempf, and M.J.C.S. Reis. (2007). Construction of Aharonov–Berry's superoscillations. J. Phys. A: Math. Gen. 40, 5141–5147.

[17] M. Frank and L. Klotz. (2002). A duality method in prediction theory of multivariate stationary sequences. Math. Nachr.

[18] J.R. Higgins. (1996). Sampling Theory in Fourier and Signal Analysis. Oxford University Press, NY.

[19] A.N. Kolmogorov. (1941). Interpolation and extrapolation of stationary stochastic series. Izv. Akad. Nauk SSSR Ser. Mat. 5, 3–14.

[20] A.N. Kolmogorov. (1965). Three approaches to the quantitative definition of information. Problems of Information Transmission 1:1, 1–7.

[21] M. Li and P.M.B. Vitányi. (1993). An Introduction to Kolmogorov Complexity and Its Applications. Springer.

[22] A. Lindquist and G. Picci. (2015). Linear Stochastic Systems: A Geometric Approach to Modeling, Estimation and Identification. Springer-Verlag, Berlin Heidelberg.

[23] D.W. Loveland. (1966). A new interpretation of von Mises' concept of random sequence. Z. Math. Logik Grundlagen Math. 12, 279–294.

[24] R. von Mises. (1919). Grundlagen der Wahrscheinlichkeitsrechnung. Math. Z. 5, 52–99.

[25] A.G. Miamee and M. Pourahmadi. (1988). Best approximations in $L^p(d\mu)$ and prediction problems of Szegő, Kolmogorov, Yaglom, and Nakazi. J. London Math. Soc. 38, no. 1, 133–145.

[26] B. Simon. (2011). Szegő's Theorem and Its Descendants: Spectral Theory for $L^2$ Perturbations of Orthogonal Polynomials. M.B. Porter Lectures. Princeton University Press, Princeton.

[27] G. Szegő. (1920). Beiträge zur Theorie der Toeplitzschen Formen. Math. Z. 6, 167–202.

[28] G. Szegő. (1921). Beiträge zur Theorie der Toeplitzschen Formen, II. Math. Z. 9, 167–190.

[29] S. Verblunsky. (1936). On positive harmonic functions (second paper). Proc. London Math. Soc. 40, 290–320.
Figure 1: Gain decay: the values $|M_{\theta,q}\left(e^{i\omega}\right)|$ for the non-causal filter (2) with $\theta = 0.$ and $q = 1.$ , and $|H_a\left(e^{i\omega}\right)|$ for the causal filter (5) with $a = 0.$ , $p = 0.$ , $N = 50$, $m = 2$.
Figure 2: Approximation of the identity operator: the distances from 1, i.e., the values $|M_{\theta,q}\left(e^{i\omega}\right) - 1|$ for the non-causal filter (2) with $\theta = 0.$ and $q = 1.$ , and $|H_a\left(e^{i\omega}\right) - 1|$ for the causal filter (5) with $a = 0.$ , $p = 0.$ , $N = 50$, $m = 2$.
Figure 3: Impulse response $h_a(t) = (\mathcal{Z}^{-1}H_a)(t)$ for the causal filter (5) with $a = 0.$ , $p = 0.$ , $N = 10$, $m = 1$.
Figure 4: Approximation of the one-step forward shift operator: the values $|K\left(e^{i\omega}\right) - e^{i\omega}|$ for the transfer function of the predictor (7) and $|K\left(e^{i\omega}\right) H_a\left(e^{i\omega}\right) - e^{i\omega}|$, i.e., with smoothing of the input process by filters (5), with $\gamma = 2.$ , $r = 0.$ , $a = 0.$ , $p = 0.$ , $N = 30$, $m = 3$.
Figure 5: Impulse response $h_a(t) = (\mathcal{Z}^{-1}H_a)(t)$ for the causal filter (5) with the parameters given in (12).
Figure 6: The impulse response $\mathcal{Z}^{-1}K$ of the predictor (7) and the impulse response $\mathcal{Z}^{-1}(KH_a)$ of the predictor combined with filters (5), with the parameters given in (12).
Figure 7: A path of the AR(2) process $x(t)$ versus the output of filter (5) with the parameters given in (12).
Figure 8: A path of the AR(2) process $x(t)$ versus two predictions $y(t-1)$ of $x(t)$; one was calculated without filtering, and the other was calculated after application of filter (5) with the parameters given in (12).