Estimators of Long-Memory: Fourier versus Wavelets
GILLES FAŸ, ERIC MOULINES, FRANÇOIS ROUEFF, AND MURAD S. TAQQU
Abstract.
There have been a number of papers written on semi-parametric estimation methods of the long-memory exponent of a time series, some applied, others theoretical; some use Fourier methods, others a wavelet-based technique. In this paper, we compare the Fourier and wavelet approaches to the local regression method and to the local Whittle method. We provide an overview of these methods, describe what has been done, indicate the available results and the conditions under which they hold. We discuss their relative strengths and weaknesses, both from a practical and a theoretical perspective. We also include a simulation-based comparison. The software written to support this work is available on demand and we illustrate its use at the end of the paper.
Date: January 28, 2008.
1991 Mathematics Subject Classification. Primary 62M10, 62M15, 62G05; Secondary 60G18.
Key words and phrases. Wavelet analysis, long range dependence, semi-parametric estimation.
Murad S. Taqqu would like to thank l'École Normale Supérieure des Télécommunications in Paris for their hospitality. This research was partially supported by the NSF Grants DMS-0505747 and DMS-0706786 at Boston University.
Contents
1. Introduction
2. Definition of an M(d) process
3. Examples
4. Wavelet semi-parametric estimators of the memory parameter
4.1. The wavelet setting
4.2. Choice of the wavelets
4.3. The local regression wavelet (LRW) estimator of d

1. Introduction
We study here finite variance stochastic processes {X_k}, whose spectral density f(λ), λ ∈ (−π, π), behaves like a power function at low frequencies, that is, as |λ|^{−2d} as the frequency λ → 0. The case d > 0 corresponds to long-memory, d = 0 to short-memory and d < 0 to negative dependence. For {X_k, k ∈ Z} to be stationary it is necessary that ∫_{−π}^{π} f(λ) dλ < ∞ and hence that d < 1/2. We relax these restrictions in a number of ways. We shall allow the process to be non-stationary, requiring only that it becomes stationary after it is differenced a number of times. We also suppose that the spectral density (of the differenced process) behaves not merely like |λ|^{−2d} but as |λ|^{−2d} f*(λ), where f* is regarded as a short-range density function. Our goal is to estimate d in the presence of f*. We shall not assume that the nuisance function f* is known, nor that it is characterized by a finite number of unknown parameters, but merely that f*(λ) is "smooth" in the neighborhood of λ = 0, so that if one focuses only on frequencies λ that are sufficiently low, then the spectral density f(λ) behaves essentially like |λ|^{−2d}. What frequency cut-off one should choose will clearly become an important issue.

The estimation framework is semi-parametric: we must estimate the unknown parameter d while viewing the presence of f* as a nuisance, albeit one which complicates matters. The estimation method will also be local, in that it is necessary to focus only on frequencies λ that are close enough to the origin, where the influence of f*(λ) can be neglected.

In this paper we provide an overview and comparison of four semi-parametric estimation methods of the parameter d which have all proven to be very effective. Two of them are Fourier-based, the other two are based on wavelets. The methods are:
• Geweke-Porter-Hudak (GPH): Regression / Fourier,
• Local Whittle Fourier (LWF): Whittle / Fourier,
• Local Regression Wavelets (LRW): Regression / Wavelets,
• Local Whittle Wavelets (LWW): Whittle / Wavelets.
The Fourier methods are older and better known. They have essentially been developed by Peter Robinson in a number of fundamental papers, Robinson (1995b) and Robinson (1995a). If we ignore for the moment the presence of the nuisance function f*, then one has f(λ) = |λ|^{−2d}, that is, log f(λ) ≈ −2d log|λ|. Therefore, d can be estimated by linear regression on the periodogram. This is the Fourier-based regression method considered in Geweke and Porter-Hudak (1983) in a parametric setting. The semi-parametric setting was suggested by Künsch (1987) and developed by Robinson (1995b). The Fourier-based Whittle method is a pseudo-maximum likelihood method developed by Fox and Taqqu (1986) in a parametric setting and extended to a semi-parametric setting by Robinson (1995a).
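To illustrate the regression idea with a minimal sketch of our own (the estimators studied later add differencing, tapering, pooling and trimming): simulate Gaussian white noise, for which d = 0, compute the periodogram at the Fourier frequencies, and regress the log-periodogram on the regressor −2 log λ_j over the m lowest frequencies.

```python
import numpy as np

def log_periodogram_d(x, m):
    """OLS slope of log I(lam_j) on -2 log(lam_j), j = 1, ..., m.

    Minimal illustration of estimating d by linear regression on the
    (log-)periodogram; m is the number of low frequencies retained.
    """
    n = len(x)
    lam = 2.0 * np.pi * np.arange(1, m + 1) / n      # Fourier frequencies
    I = np.abs(np.fft.fft(x)[1:m + 1]) ** 2 / (2.0 * np.pi * n)
    g = -2.0 * np.log(lam)                           # regressor
    gc = g - g.mean()
    return float(gc @ np.log(I) / (gc @ gc))         # OLS slope = d-hat

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)                        # white noise: d = 0
d_hat = log_periodogram_d(x, m=256)
print(round(d_hat, 3))    # small in magnitude, since the true d is 0
```

The bandwidth m governs the usual bias-variance trade-off: more frequencies reduce the variance but let the short-range part of the spectrum contaminate the regression.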
The papers of Moulines et al. (2007b), Moulines et al. (2007a), Moulines et al. (2007c) and Roueff and Taqqu (2007) recast the preceding Fourier-based methods in a wavelet setting. Wavelets have a number of advantages. They allow differencing implicitly and therefore they can be used without problems when d > 1/2. They also automatically discount polynomial trends. The local wavelet-based regression method was first developed by Abry and Veitch (1998) under the simplifying assumption that the wavelet coefficients are uncorrelated; see also Veitch and Abry (1999) and the review articles Abry et al. (2000) and Abry et al. (2003). In addition, see Veitch et al. (2003) for the automatic selection of the cut-off frequency point and Veitch et al. (2000) for the choice of the "scale function". Bardet (2000) and Bardet (2002) provide asymptotic results for the LRW estimator in a parametric context. Bardet et al. (2000) is a first attempt to analyze the behavior of LRW in a semi-parametric context by assuming continuous-time observations. The local Whittle wavelet method is developed in Moulines et al. (2007c).

The paper is structured as follows. In Section 2, we formalise our assumptions on {X_k} by defining an M(d) process, that is, a process with memory parameter d. The standard ARFIMA and fractional Gaussian noise examples are introduced in Section 3. The wavelet-based semi-parametric estimators are defined in Section 4 and the Fourier estimators in Section 5. The semi-parametric setting is discussed in Section 6. The asymptotic properties of the wavelet and Fourier semi-parametric estimators are described in Sections 7 and 8, respectively. Their properties are discussed further in Section 9. Section 10 contains the Monte-Carlo study which compares the effectiveness of the four methods. In Section 11, we illustrate the use of the software written in support of this work. This software may be obtained from the authors. Section 12 contains concluding remarks.

2. Definition of an M(d) process

Let X := {X_k}_{k∈Z} be a real-valued process, not necessarily stationary. Its first order difference is

[∆X]_n := X_n − X_{n−1}, n ∈ Z.

Its K-th order difference ∆^K X is defined recursively. We suppose that the process X has memory parameter d, d ∈ R, in short, is an M(d) process.
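Since ∆^K is defined by iterating ∆, it annihilates polynomial trends of degree K − 1. A small numpy check (an illustrative sketch of ours, not part of the paper's software):

```python
import numpy as np

def diff_K(x, K):
    """K-th order difference Delta^K x, applied recursively."""
    for _ in range(K):
        x = x[1:] - x[:-1]           # [Delta x]_n = x_n - x_{n-1}
    return x

n = np.arange(20.0)
trend = 3.0 + 2.0 * n + 0.5 * n**2   # polynomial trend of degree 2
out = diff_K(trend, 3)               # K = 3 > degree, so the trend vanishes
print(np.allclose(out, 0.0))         # -> True
```

Applying only K = 2 differences leaves the constant second difference (here identically 1), which is why the differencing order must exceed the trend degree.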
We shall first define this notion for a stationary process X, where d < 1/2, and then provide a general definition for d ∈ R. Let f* be a non-negative even function, continuous and positive at the origin. A stationary process X is said to have memory parameter d, −∞ < d < 1/2, and short-range density function f*, if its spectral density is given by

f_X(λ) := |1 − e^{−iλ}|^{−2d} f*(λ), λ ∈ (−π, π). (1)
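For intuition (an illustrative sketch of ours, with a toy short-range density of our choosing), one can evaluate (1) numerically and check the low-frequency behavior f_X(λ) ∼ |λ|^{−2d} f*(0):

```python
import numpy as np

def f_X(lam, d, f_star):
    """Spectral density (1): |1 - e^{-i lam}|^{-2d} * f_star(lam)."""
    return np.abs(1.0 - np.exp(-1j * lam)) ** (-2 * d) * f_star(lam)

# A toy short-range density, positive and continuous at the origin.
f_star = lambda lam: 1.0 + 0.5 * np.cos(lam)

d = 0.3
for lam in [1e-1, 1e-2, 1e-3]:
    ratio = f_X(lam, d, f_star) / (np.abs(lam) ** (-2 * d) * f_star(0.0))
    print(round(ratio, 6))           # tends to 1 as lam -> 0
```

The ratio approaches one because |1 − e^{−iλ}| = 2|sin(λ/2)| ∼ |λ| and f* is continuous at zero.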
To allow d > 1/2, we consider non-stationary processes X and extend the preceding definition, valid for stationary processes, in the following way.

Definition 1. We say that X has memory parameter d, d ∈ R (in short, X is an M(d) process), and short-range density function f*, if f* is continuous and positive at the origin and, for any integer K > d − 1/2, its K-th order difference ∆^K X is stationary with spectral density function

f_{∆^K X}(λ) = |1 − e^{−iλ}|^{2(K−d)} f*(λ), λ ∈ (−π, π). (2)

Observe that f_{∆^K X}(λ) in (2) is integrable since −(K − d) < 1/2. Observe also that if the process X is as in Definition 1, then while ∆^K X is stationary, the process X itself is stationary only when d < 1/2. Nevertheless, one can associate to X the generalized spectral density function

f_X(λ) = |1 − e^{−iλ}|^{−2d} f*(λ). (3)

Remark. This definition of M(d) processes was proposed by Hurvich and Ray (1995). It has the advantage that ∆^K X is stationary, but it introduces a discontinuity at the points d = 1/2, 3/2, ..., since f_{∆^K X} is quite different at these values of d. In empirical work, there are typically no inherent restrictions on the value of the memory parameter d, and this may cause problems if the degree of integer differencing required to achieve stationarity must be guessed in advance. An alternative definition of M(d) processes has been introduced by Robinson (1994) and later used by some authors (see Tanaka (1999), Shimotsu and Phillips (2005), Shimotsu and Phillips (2006)).

The memory parameter d plays a central role in the definition of M(d) processes because it characterizes the behavior of the generalized spectral density f_X(λ) at low frequencies. Indeed, assuming that f* is continuous at zero, (3) implies f_X(λ) ∼ |λ|^{−2d} f*(0) as λ → 0. Allowing d to take non-integer values produces a fundamental change in the correlation structure of a fractional process, as compared to the correlation structure of a standard time-series model, such as an ARMA(p, q) process. The study of M(d) processes has recently attracted attention amongst theorists and empirical researchers. In applied econometric work, M(d) models are attractive because they encompass both stationary and nonstationary processes depending on the value of the memory parameter, and include both short-memory M(0) and unit-root M(1) processes as special cases when the memory parameter takes on the values zero and unity.

3. Examples
Stationarity of the increments is commonly assumed in time-series analysis. In ARIMA models, for example, (2) holds with d = K integer and with f* equal to the spectral density of an autoregressive moving average short-memory process. If d ∈ R and f* ≡ σ² in (3), one gets the so-called fractionally integrated white noise process, ARFIMA(0, d, 0). The choice d ∈ R and

f*_ARMA(λ) = σ² |1 − Σ_{k=1}^{q} θ_k e^{−iλk}|² / |1 − Σ_{k=1}^{p} φ_k e^{−iλk}|², λ ∈ (−π, π), (4)

with 1 − Σ_{k=1}^{p} φ_k z^k ≠ 0 for |z| = 1 and 1 − Σ_{k=1}^{q} θ_k ≠ 0 (so that f*_ARMA(0) ≠ 0) leads to the class of ARFIMA(p, d, q) processes.

Another example is {B_H(k)}_{k∈Z}, a discrete-time version of fractional Brownian motion (FBM) {B_H(t), t ∈ R} with Hurst index H ∈ (0, 1) and covariance

R_H(t, s) := E[B_H(t) B_H(s)] = (1/2) { |t|^{2H} + |s|^{2H} − |t − s|^{2H} }.

The process {B_H(k)}_{k∈Z} is increment stationary (K = 1) and its generalized spectral density is given, up to a multiplicative constant (see Samorodnitsky and Taqqu (1994)), by

f_FBM(λ) := Σ_{k=−∞}^{∞} |λ + 2kπ|^{−2H−1}, λ ∈ (−π, π).

We can express it in the form (3),

f_FBM(λ) = |1 − e^{−iλ}|^{−2d} f*_FBM(λ), (5)

by setting d = H + 1/2 ∈ (1/2, 3/2) and

f*_FBM(λ) = |2 sin(λ/2)/λ|^{2H+1} + |2 sin(λ/2)|^{2H+1} Σ_{k≠0} |λ + 2kπ|^{−2H−1}. (6)

Observe that f*_FBM(0) = 1 and that it is bounded on (−π, π). The process G_H = ∆B_H is fractional Gaussian noise (FGN). It is a stationary Gaussian process with spectral density proportional to (5), but with d = H − 1/2 ∈ (−1/2, 1/2).

4. Wavelet semi-parametric estimators of the memory parameter
In this section, we introduce the wavelet setting and, based on heuristic arguments, propose possible semi-parametric wavelet estimators. We start with a brief summary of the basic ideas. A wavelet ψ(t), t ∈ R, is a function with at least one vanishing moment, that is, ∫_R ψ(t) dt = 0, and which is low-pass, in the sense that its Fourier transform ψ̂(ξ) decreases as ξ → ∞. We then define the scaled and translated versions of ψ, namely ψ_{j,k}(t) = 2^{−j/2} ψ(2^{−j} t − k), j, k ∈ Z. The scale index j dilates ψ so that large values of j correspond to coarse scales (low frequencies), while the position index k translates the function ψ(2^{−j} t) to ψ(2^{−j} t − k). The corresponding wavelet coefficients are then defined as W_{j,k} = ∫_R X(t) ψ_{j,k}(t) dt and are used to estimate d. Because ψ̂ is low-pass, ψ̂_{j,k} concentrates in the low frequency region as j → ∞, and f_X(λ) "scales" at low frequencies since |1 − e^{iλ}|^{−2d} ∼ |λ|^{−2d} as |λ| → 0. In the above definition of wavelet coefficients, we supposed, for simplicity, that the process {X(t)}_{t∈R} is defined in continuous time and that the integral above is well-defined. This definition can be adapted to discrete time series {X_k, k ∈ Z} by using a scale function, and also to finite samples X_1, ..., X_n by merely restricting the set of scale and translation indices of available wavelet coefficients. This is in sharp contrast to Fourier analysis, where the definition of discrete Fourier coefficients at a given frequency changes as the sample length increases. We now turn to a more formal presentation.

4.1. The wavelet setting.
The wavelet setting involves a scale function φ ∈ L²(R) and a wavelet ψ ∈ L²(R), with associated Fourier transforms

φ̂(ξ) := ∫_{−∞}^{∞} φ(t) e^{−iξt} dt and ψ̂(ξ) := ∫_{−∞}^{∞} ψ(t) e^{−iξt} dt.

We assume the following:
(W-1) φ and ψ are compactly-supported, integrable, and φ̂(0) = ∫_{−∞}^{∞} φ(t) dt = 1 and ∫_{−∞}^{∞} ψ²(t) dt = 1.
(W-2) There exists α > 1 such that sup_{ξ∈R} |ψ̂(ξ)| (1 + |ξ|)^α < ∞.
(W-3) The function ψ has M vanishing moments, i.e. ∫_{−∞}^{∞} t^m ψ(t) dt = 0 for all m = 0, ..., M − 1.
(W-4) The function Σ_{k∈Z} k^m φ(· − k) is a polynomial of degree m for all m = 0, ..., M − 1.
Condition (W-2) ensures that ψ̂ decreases quickly to zero. Daubechies wavelets with M ≥ 2 vanishing moments have α > 1, while the Haar wavelet has α = 1. Condition (W-3) ensures that ψ oscillates and that its scalar product with continuous-time polynomials up to degree M − 1 vanishes. It is equivalent to asserting that the first M − 1 derivatives of ψ̂ vanish at the origin, and hence

|ψ̂(λ)| = O(|λ|^M) as λ → 0. (7)

And, by (Cohen, 2003, Theorem 2.8.1, Page 90), (W-4) is equivalent to

sup_{k≠0} |φ̂(λ + 2kπ)| = O(|λ|^M) as λ → 0. (8)

As shown below, conditions (W-3) and (W-4) imply that the wavelet transform of discrete-time polynomials of degree at most M − 1 vanishes.
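As a numerical sketch of ours for (7): the Haar wavelet has M = 1 vanishing moment, and since its first moment is ∫ t ψ(t) dt = −1/4, a first-order expansion of ψ̂ gives |ψ̂(λ)| ≈ |λ|/4 for small λ.

```python
import numpy as np

# Haar wavelet on [0, 1]: +1 on [0, 1/2), -1 on [1/2, 1).
h = 1e-5
t = np.arange(0.0, 1.0, h)
psi = np.where(t < 0.5, 1.0, -1.0)

def psi_hat(xi):
    """Fourier transform of the Haar wavelet, by a Riemann sum."""
    return np.sum(psi * np.exp(-1j * xi * t)) * h

print(abs(psi_hat(0.0)) < 1e-3)      # one vanishing moment: psi-hat(0) = 0
ratio = abs(psi_hat(1e-3)) / 1e-3
print(abs(ratio - 0.25) < 0.01)      # |psi-hat(lam)| ~ |lam| / 4 near 0
```

Both checks print True, illustrating the order-|λ|^M behavior of ψ̂ at the origin for M = 1.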
We now describe the computation of the wavelet coefficients. Define the family {ψ_{j,k}, j ∈ Z, k ∈ Z} of translated and dilated functions

ψ_{j,k}(t) = 2^{−j/2} ψ(2^{−j} t − k), j ∈ Z, k ∈ Z. (9)

Consider a real-valued sequence x = {x_k, k ∈ Z}. We need to construct a continuous-time process from a discrete-time one. Using the scaling function φ, we first associate to this sequence the continuous-time functions

x_n(t) := Σ_{k=1}^{n} x_k φ(t − k) and x(t) := Σ_{k∈Z} x_k φ(t − k), t ∈ R. (10)

The function x_n only requires the values x_1, ..., x_n while the function x requires the whole sequence {x_k, k ∈ Z}. Without loss of generality we may suppose that the supports of the scaling function φ and of the wavelet function ψ are included in [−T, 0] and [0, T], respectively, for some integer T ≥ 1. This implies that x_n(t) = x(t) for all t ∈ [0, n − T + 1] and that the support of ψ_{j,k} is included in the interval [2^j k, 2^j (k + T)]. The wavelet coefficient W^x_{j,k} at scale j ≥ 0 and position k ∈ Z is defined as

W^x_{j,k} := ∫_{−∞}^{∞} x(t) ψ_{j,k}(t) dt = ∫_{−∞}^{∞} x_n(t) ψ_{j,k}(t) dt, j ≥ 0, k ∈ Z. (11)

The second equality holds when [2^j k, 2^j (k + T)] ⊆ [0, n − T + 1], that is, for all (j, k) ∈ I_n, where

I_n := {(j, k) : j ≥ 0, 0 ≤ k < n_j} with n_j = ⌊2^{−j} (n − T + 1) − T + 1⌋. (12)

In other words, I_n denotes the set of indices (j, k) for which the wavelet coefficients W_{j,k} depend only on x_1, ..., x_n. If the sample size n increases, these wavelet coefficients remain unchanged and new ones can be computed. Thus the definition of wavelet coefficients does not depend on the sample length, in contrast to Fourier coefficients. The wavelet coefficient W^x_{j,k} can be computed explicitly by using discrete convolution and downsampling, namely,

W^x_{j,k} = Σ_{l∈Z} x_l h_{j, 2^j k − l} = (h_{j,·} ⋆ x)_{2^j k} = (↓_j [h_{j,·} ⋆ x])_k, j ≥ 0, k ∈ Z, (13)

where, for all j ≥ 0, the impulse response h_{j,·} is defined by

h_{j,l} := 2^{−j/2} ∫_{−∞}^{∞} φ(t + l) ψ(2^{−j} t) dt, l ∈ Z, (14)

'⋆' denotes the convolution of discrete sequences and ↓_j is the 2^j-power downsampling operator defined, for any sequence {c_k}_{k∈Z}, by (↓_j c)_k = c_{2^j k}. Since φ and ψ have compact support, the associated transfer function H_j is a trigonometric polynomial,

H_j(λ) = Σ_{l∈Z} h_{j,l} e^{−iλl} = Σ_{l=−T(2^j+1)+1}^{0} h_{j,l} e^{−iλl}. (15)
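The convolve-and-downsample recipe (13)-(14) can be sketched for the Haar pair (our own illustration, using the standard Haar φ, ψ supported on [0, 1] rather than the paper's [−T, 0]/[0, T] convention; the recipe itself is unchanged). For Haar, the integral (14) evaluates in closed form and the resulting coefficients are normalized half-block differences:

```python
import numpy as np

def haar_filter(j):
    """Impulse response h_{j,l} of (14) for the Haar pair, in closed form:
    h_{j,-m} = 2**(-j/2) * psi(2**(-j) m), m = 0, ..., 2**j - 1."""
    m = np.arange(2 ** j)
    return 2.0 ** (-j / 2) * np.where(m < 2 ** (j - 1), 1.0, -1.0)

def wavelet_coeffs(x, j):
    """W_{j,k} = sum_l x_l h_{j, 2^j k - l}: convolve with h_{j,.} and keep
    every 2**j-th output, cf. (13)."""
    h = haar_filter(j)                    # h[m] stores h_{j,-m}
    L = 2 ** j
    nj = len(x) // L
    return np.array([h @ x[L * k: L * k + L] for k in range(nj)])

rng = np.random.default_rng(1)
x = rng.standard_normal(64)
W = wavelet_coeffs(x, 2)                  # scale j = 2: blocks of length 4
# Haar coefficients are normalized half-block differences:
ref = (x[0] + x[1] - x[2] - x[3]) / 2.0
print(np.isclose(W[0], ref))              # -> True
```

For a general compactly supported pair one would evaluate (14) by quadrature; only the filter changes, not the convolution/downsampling structure.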
Under assumption (W-4), t ↦ Σ_{l∈Z} φ(t + l) l^m is a polynomial of degree m, and (W-3) therefore implies that, for all j ≥ 0 and m = 0, ..., M − 1,

Σ_{l∈Z} h_{j,l} l^m = 2^{−j/2} ∫_{−∞}^{∞} ψ(2^{−j} t) Σ_{l∈Z} φ(t + l) l^m dt = 0. (16)

Now consider P_j(x) = Σ_{l∈Z} h_{j,l} x^l and observe that (16) implies P_j(1) = 0, P'_j(1) = 0, ..., P_j^{(M−1)}(1) = 0, and hence H_j(λ) = P_j(e^{−iλ}) factors as

H_j(λ) = (1 − e^{−iλ})^M H̃_j(λ), (17)

where H̃_j(λ) is also a trigonometric polynomial. The wavelet coefficient (13) may therefore be computed as

W^x_{j,k} = (↓_j [h̃_{j,·} ⋆ ∆^M x])_k, (18)

where {h̃_{j,l}}_{l∈Z} are the coefficients of the trigonometric polynomial H̃_j and ∆^M x is the M-th order difference of the sequence x. In other words, the use of a wavelet and a scaling function satisfying (W-3) and (W-4) implicitly performs an M-th order differentiation of the time series. Therefore, we may work with an M(d) process X beyond the stationary regime (d > 1/2), whose K-th order difference is stationary, without specific preprocessing, provided that d − M < 1/2, e.g. provided M ≥ K + 1. It is perhaps less known that wavelets can be used with non-invertible processes (d ≤ −1/2) thanks to the decay property of ψ̂ at infinity assumed in (W-2).

4.2. Choice of the wavelets.
In this paper, we do not assume that the ψ_{j,k} are orthonormal in L²(R) nor that they are associated with a multiresolution analysis (MRA). We may therefore use other convenient choices for φ and ψ as long as (W-1)-(W-4) are satisfied. A simple choice is, for instance, for some integer M ≥ 1,

φ(t) := 1_{[0,1]}^{⋆M}(t) and ψ(t) := c_M (d^M/dt^M) 1_{[0,1]}^{⋆M}(2t), (19)

where 1_A is the indicator function of the set A and, for an integrable function f, f^{⋆M} denotes the M-th self-convolution of f,

f^{⋆M} = f ⋆ ··· ⋆ f (M times), with (f ⋆ g)(t) = ∫_{−∞}^{∞} f(t − u) g(u) du,

and c_M is a normalizing positive constant such that ∫_{−∞}^{∞} ψ²(t) dt = 1.

Scaling and wavelet functions associated to an MRA present two important features: 1) they give rise to orthonormal L²(R) bases {ψ_{j,k}}; 2) a recursive algorithm, the so-called pyramidal algorithm, is available for performing the convolution/downsampling operations at all scales j. The complexity of this algorithm is O(n) for a sample of length n, see Mallat (1998). In other words, in an MRA, the computation (13) can be made recursively as j grows and it is not necessary to explicitly compute the filters h_{j,·} defined in (14).

Assumptions (W-1)-(W-4) are standard in the context of an MRA, see for instance Cohen (2003). Common wavelets are Daubechies wavelets and Coiflets (for which the scale function also has vanishing moments). What matters in the asymptotic theory of wavelet estimators presented below is the number of vanishing moments M and the decay exponent α, which both determine the frequency resolution of ψ. For standard wavelets, M is always known and (Cohen, 2003, Remark 2.7.1, Page 86) provides a sequence of lower bounds (α_k) tending to α as k → ∞. Daubechies wavelets are defined by their number M of vanishing moments, for any M ≥ 1 (M = 1 corresponds to the so-called Haar wavelet).
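The scale function in (19) is the M-fold self-convolution of the indicator of [0, 1], i.e. a B-spline. A small numeric sketch of ours, on a discrete grid, checks that ∫φ = 1 (the first part of (W-1)) and that the integer translates of φ sum to one (the m = 0 case of (W-4)):

```python
import numpy as np

M = 3                                    # number of self-convolutions in (19)
h = 1e-3                                 # grid step
box = np.ones(int(1 / h))                # samples of the indicator of [0, 1]

# M-th self-convolution (Riemann-sum scaling by h); phi is supported on [0, M].
phi = box.copy()
for _ in range(M - 1):
    phi = np.convolve(phi, box) * h

print(abs(phi.sum() * h - 1.0) < 0.01)   # integral of phi is 1, cf. (W-1)

# m = 0 case of (W-4): the integer translates of phi sum to one.
t = 1.25
s = sum(phi[int(round((t - k) / h))]
        for k in range(-M, M + 1) if 0 <= t - k < M)
print(abs(s - 1.0) < 0.01)               # partition of unity
```

Both checks print True; the higher-m cases of (W-4) (translates weighted by k^m reproducing degree-m polynomials) can be verified the same way.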
An analytic formula for the decay exponent α of Daubechies wavelets is available, see (Daubechies, 1992, Eq (7.1.23), Page 225 and the table on Page 226), and note that our α equals the α of Daubechies (1992) plus 1. A simpler lower bound, α ≥ (1 − log₂(3)/2) M, holds for Daubechies wavelets, see Daubechies (1992). Although it is not sharp, it shows that the number of vanishing moments M and the decay exponent α of Daubechies's wavelets can be made arbitrarily large at the same time. Table 1 provides some values of α for Daubechies wavelets and the lower bound α_k with k = 10 for Coiflets, for a number of vanishing moments M ranging from 2 to 10.

M            2       3       4       5       6       7       8       9       10
α (DB)       1.3390  1.6360  1.9125  2.1766  2.4322  2.6817  2.9265  3.1676  3.4057
α₁₀ (Coif.)  1.6196  N.A.    1.9814  N.A.    2.5374  N.A.    3.0648  N.A.    3.5744

Table 1. The decay exponent α, or its lower bound α₁₀, of |φ̂(ξ)| (and hence of |ψ̂(ξ)|) for wavelets with M vanishing moments. First line: M; second line: α for the Daubechies wavelet; third line: the lower bound α₁₀ for the Coiflet. N.A. stands for not available (Coiflets are defined for M even).

In view of Table 1, one can observe that the decays of Coiflets are slightly faster than those of Daubechies wavelets for given M. On the other hand, the Daubechies wavelets have shorter support, since it is of length T = 2M while it is of length T = 3M for Coiflets. The support length impacts the number of available wavelet coefficients: given a sample size n, the greater the support length T, the smaller the cardinality of the set I_n defined in (12).

We should also mention the so-called Shannon wavelet ψ_S, whose Fourier transform ψ̂_S satisfies

|ψ̂_S(ξ)| = 1 if |ξ| ∈ [π, 2π], and 0 otherwise. (20)

This wavelet satisfies (W-2)-(W-4) for arbitrarily large M and α but does not have compact support, hence it does not satisfy (W-1). We may therefore not choose this wavelet in our analysis. It is of interest, however, because it gives a rough idea of what happens when α and M are large, since one can always construct a wavelet ψ satisfying (W-1)-(W-4) which is arbitrarily close to the Shannon wavelet.

4.3. The local regression wavelet (
LRW) estimator of d. For any integers n, j₀ and j₁ with j₀ ≤ j₁, the set of all available wavelet coefficients from n observations X_1, ..., X_n having scale indices between j₀ and j₁ is

I_n(j₀, j₁) := {(j, k) : j₀ ≤ j ≤ j₁, 0 ≤ k < n_j}, (21)

where n_j is given in (12). Consider two integers L < U satisfying

0 ≤ L < U ≤ J_n := max{j : n_j ≥ 1}. (22)

The index J_n is the maximal available scale index for the sample size n; L and U will denote, respectively, the lower and upper scale indices used in the estimation. For an M(d) process, under regularity conditions on the short-memory part f* and for appropriately chosen scale function and wavelet φ and ψ, it may be shown that, as j → ∞, σ²_j(d, f*) := Var[W^X_{j,0}] ≍ σ² 2^{2dj}, and that the empirical variance

σ̂²_j := n_j^{−1} Σ_{k=0}^{n_j−1} (W^X_{j,k})², (23)

is a consistent sequence of estimators of σ²_j(d, f*) (see Proposition 2). A popular semi-parametric estimator of the memory parameter d is the local regression wavelet (LRW) estimator of Abry and Veitch (1998), defined as the least squares estimator in the "linear regression model"

log[σ̂²_j] = log σ² + 2 log(2) dj + u_j,

where u_j = log[σ̂²_j / (σ² 2^{2dj})]. This regression problem can be solved in closed form:

d̂^{LRW}_n(L, U, w) := Σ_{j=L}^{U} w_{j−L} log(σ̂²_j), (24)

(in short d̂^{LRW}), where the vector w := [w₀, ..., w_{U−L}]^T of weights satisfies

Σ_{j=0}^{U−L} w_j = 0 and 2 log(2) Σ_{j=0}^{U−L} j w_j = 1. (25)

For U − L = ℓ ≥ 1, one may choose, for example, w corresponding to the weighted least-squares regression vector, defined by

w := DB (B^T D B)^{−1} b, (26)

where b := [0, (2 log 2)^{−1}]^T,

B := [1 1 ··· 1 ; 0 1 ··· ℓ]^T (27)

is the design matrix and D is an arbitrary positive definite matrix. We will discuss the choice of the regression weights w after stating Theorem 3, which provides the asymptotic variance of d̂^{LRW}(L, U, w).

4.4. The local Whittle wavelet (LWW) estimator of d. Let {c_{j,k}, (j, k) ∈ I} be an array of centered independent Gaussian random variables with variance Var(c_{j,k}) = σ²_{j,k}, where I is a finite set. The negative of its log-likelihood is, up to a constant additive term, (1/2) Σ_{(j,k)∈I} { c²_{j,k}/σ²_{j,k} + log(σ²_{j,k}) }. The local Whittle wavelet (LWW) estimator uses such a contrast process to estimate the memory parameter d by choosing c_{j,k} = W^X_{j,k} and I = I_n(L, U) as defined in (21), for appropriately chosen lower and upper scale indices L and U, so that the corresponding wavelet coefficients W^X_{j,k} are computed from X_1, ..., X_n. The scaling property σ²_j(d, f*) ≍ σ² 2^{2dj} and weak dependence conditions on the wavelet coefficients then suggest the following pseudo negative log-likelihood

L̂_I(σ², d) = (1/(2σ²)) Σ_{(j,k)∈I} 2^{−2dj} (W^X_{j,k})² + (|I|/2) log(σ² 2^{2d⟨I⟩}),

where |I| denotes the cardinality of I and ⟨I⟩ is the average scale, ⟨I⟩ := |I|^{−1} Σ_{(j,k)∈I} j. Define

σ̂²_I(d) := Argmin_{σ²>0} L̂_I(σ², d) = |I|^{−1} Σ_{(j,k)∈I} 2^{−2dj} (W^X_{j,k})².

The pseudo maximum likelihood estimator of the memory parameter is then the minimizer of the negative profile log-likelihood,

d̂^{LWW}(L, U) := Argmin_{d∈[∆₁,∆₂]} L̂_{I_n(L,U)}(σ̂²_I(d), d), (28)

where [∆₁, ∆₂] is an interval of admissible values for d; equivalently, up to additive constants, d̂^{LWW}(L, U) minimizes over [∆₁, ∆₂] the contrast

L̃_I(d) := log Σ_{(j,k)∈I} 2^{2d(⟨I⟩−j)} (W^X_{j,k})². (29)

This estimator has been proposed for analyzing noisy data in Wornell and Oppenheim (1992), and was then considered by several authors, mostly in a parametric context, see e.g. Kaplan and Kuo (1993) and McCoy and Walden (1996). If I contains at least two different scales then L̃_I(d) → ∞ as d → ±∞, and thus d̂ is finite. This contrast is strictly convex, and the minimum is unique: it can be found using any one-dimensional convex optimization procedure.
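As an idealized numerical check (a sketch of ours, not the paper's software), replace the squared wavelet coefficients by their exact variances σ² 2^{2dj}: the LRW estimator (24) with the weights (26) (reading b = [0, (2 log 2)^{−1}]^T, consistent with the constraints (25)) then recovers d exactly, and the contrast (29) is minimized exactly at d.

```python
import numpy as np

log2 = np.log(2.0)
d0, sigma2 = 0.3, 1.0                 # true memory parameter and scale
L, U = 3, 7                           # lower/upper scale indices
ell = U - L
j = np.arange(L, U + 1.0)

# LRW weights (26) with D = I: w = B (B^T B)^{-1} b, b = [0, 1/(2 log 2)]^T.
B = np.column_stack([np.ones(ell + 1), np.arange(ell + 1.0)])
b = np.array([0.0, 1.0 / (2.0 * log2)])
w = B @ np.linalg.solve(B.T @ B, b)

# The constraints (25) hold by construction:
print(np.isclose(w.sum(), 0.0))
print(np.isclose(2 * log2 * (np.arange(ell + 1.0) @ w), 1.0))

# Plug the exact variances sigma^2 2^{2 d0 j} in place of the empirical
# ones: the estimator (24) returns d0 exactly.
var_j = sigma2 * 2.0 ** (2 * d0 * j)
d_lrw = w @ np.log(var_j)
print(round(d_lrw, 6))                # -> 0.3

# LWW contrast (29), with (W_{j,k})^2 likewise replaced by its variance and
# n_j coefficients at scale j (n_j = 2**(10 - j) is a hypothetical count).
n_j = 2.0 ** (10 - j)
avg = (n_j @ j) / n_j.sum()           # the average scale <I>
def contrast(d):
    return np.log(np.sum(n_j * 2.0 ** (2 * d * (avg - j)) * var_j))
grid = np.linspace(-1.0, 2.0, 3001)
d_lww = grid[np.argmin([contrast(d) for d in grid])]
print(round(d_lww, 3))                # -> 0.3
```

The contrast is a log-sum-exp in d, hence convex, which is why a grid or any one-dimensional convex minimizer suffices.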
In contrast to the local regression wavelet estimator, the definition of the LWW estimator does not rely on particular weights. An important issue for both estimators is the choice of the scale indices L and U. The asymptotic theory developed for these estimators in a semi-parametric context sheds some light on the role played by these quantities, as will be explained in Section 7.

5. Fourier semi-parametric estimators of the memory parameter
5.1. The periodogram. Given n observations X_1, ..., X_n, the discrete Fourier transform (DFT) and the periodogram are respectively defined as

D^X(λ) := (2πn)^{−1/2} Σ_{t=1}^{n} X_t e^{itλ}, I^X(λ) := |D^X(λ)|². (30)

These quantities are computed at the Fourier frequencies λ_j := 2πj/n, for j ∈ {1, ..., ñ}, where

ñ = ⌊(n − 1)/2⌋. (31)

For stationary and invertible M(d) processes (see for example Lahiri (2003)), the DFT coefficients at Fourier frequencies are known to be approximately asymptotically independent outside a shrinking neighborhood of zero. Thus the Fourier transform performs a whitening of the data and, as a consequence, Fourier methods have neat asymptotic statistical properties.

5.1.1. Differencing and Tapering.
To overcome the presence of polynomial or smooth trends (see Hurvich et al. (2005a)), or to estimate the memory parameter of an M(d) process beyond the stationary regime (d > 1/2), it is customary to apply a data taper to X or to its δ-th order difference ∆^δ X. A taper is a non-random weight function (with certain desired properties) which multiplies the time series (or its difference) prior to Fourier transformation. Tapering was originally used in the nonparametric spectral analysis of short memory (d = 0) time series in order to reduce the bias due to frequency-domain leakage, where part of the spectrum "leaks" into adjacent frequencies. The leakage is due to the discontinuity caused by the finiteness of the sample and is reduced by using tapers which smooth this discontinuity. But such a bias reduction inflates the variance, as will be seen later in a special case in the context of long-memory semi-parametric estimation (see Section 8).

The idea of applying the taper directly to the observations X was proposed by Velasco (1999a,b) and Velasco and Robinson (2000), who considered several tapering schemes such as the cosine bell and the Zurbenko-Kolmogorov tapers (Žurbenko, 1979). These tapers h_t, t = 1, ..., n, have the property of being orthogonal to polynomials up to a given order, for a subset of Fourier frequencies,

Σ_{t=1}^{n} (1 + t + ··· + t^{δ−1}) h_t e^{itλ_j} = 0, j ∈ J_{δ,n} ⊂ {1, ..., ñ}, (32)

where λ_j and ñ are defined in (31). A problem with this approach is that the efficiency loss due to these tapers may be quite substantial, because the set J_{δ,n} can be fairly small when δ is large. In this contribution, we rather focus on the construction suggested in Hurvich and Ray (1995) and later developed in Hurvich and Chen (2000), which consists in differencing before tapering. Differencing is a very widely used technique for detrending and inducing stationarity.
The δ-th order difference will convert the memory parameter of an M(d) process to d − δ, and will completely remove a polynomial trend of degree δ − 1. To apply this technique, an upper bound on the memory parameter (or on the degree of the polynomial trend) should be known in advance. But if only an upper bound is known, δ may be chosen too large and consequently the δ-th order difference may be non-invertible, that is, one may have d − δ ≤ −1/2. This situation, referred to as over-differencing, may cause difficulties in spectral inference (see Hurvich and Ray (1995)). As was suggested by these authors, the use of a data taper can alleviate the detrimental effect of over-differencing. A main drawback of this approach is that tapering inflates the variance of the estimator. To minimize this effect, the tapers should be chosen carefully.

Hurvich and Chen (2000) have defined a family of data tapers depending on a single parameter τ, referred to as the taper order. Set h_t = 1 − e^{2πit/n} and, for any integer τ ≥
0, define the tapered DFT of order τ of the sequence x = {x_k, k ∈ Z} as follows:

D^x_τ(λ) := (2πn a_τ)^{−1/2} Σ_{t=1}^{n} h_t^τ x_t e^{itλ}, I^x_τ(λ) := |D^x_τ(λ)|², (33)

where the subscript τ denotes the taper order and a_τ := n^{−1} Σ_{t=1}^{n} |h_t|^{2τ} is a normalization factor. As shown in (Hurvich and Chen, 2000, Lemma 0), the decay of the discrete Fourier transform of the taper of order τ is given by

|(2πn a_τ)^{−1/2} Σ_{t=1}^{n} h_t^τ e^{itλ}| ≤ C n^{1/2} (1 + n|λ|)^{−(τ+1)}, λ ∈ (−π, π).

This property means that higher-order tapers control the leakage more effectively. Note that the Fourier transform of the taper may be expressed as a finite sum of shifted Dirichlet kernels,

Σ_{t=1}^{n} h_t^τ e^{itλ} = Σ_{k=0}^{τ} (−1)^k (τ choose k) Σ_{t=1}^{n} e^{it(λ + λ_k)}. (34)

Since Σ_{t=1}^{n} e^{itλ_k} = 0 whenever k is not a multiple of n, this relation implies that, for j ∈ {1, ..., ñ − τ}, Σ_{t=1}^{n} h_t^τ e^{itλ_j} = 0, so that the tapered Fourier transform (evaluated at Fourier frequencies) is invariant to shifts in the mean. This shift-invariance is achieved without restricting attention to a coarse grid of Fourier frequencies, as is necessary for the Zurbenko-Kolmogorov taper (Velasco (1999a)).

However, the construction of theoretically-justified memory estimators using the tapered periodogram may require dropping some Fourier frequencies, as will be seen in the definition of the pooled periodogram in (35). This is related to the following observation. For τ = 0 (no taper), the DFT coefficients D^Z_τ(λ_j) of a white noise {Z_t} at distinct Fourier frequencies λ_j, λ_k, j ≠ k ∈ {1, ..., ñ}, are uncorrelated. This property is lost by tapering: for τ ≥ 1, the correlation E[D^Z_τ(λ_j) D̄^Z_τ(λ_{j+k})] is equal to 0 if |k| > τ and to (2πa_τ)^{−1} (−1)^k (2τ choose τ+k) if |k| ≤ τ.

5.1.2. Pooling.
Let I^x_τ(λ) be the tapered periodogram introduced in (33). When considering non-linear transformations of the periodogram, such as taking logarithms, variance reduction can be obtained by pooling groups of finitely many, say p, consecutive I^x_τ(λ_j), which results in a pooled periodogram (see Hannan and Nicholls (1977) and Robinson (1995b)). To understand why pooling may be helpful, recall that if Z is a white Gaussian noise, then the variance of the normalized running mean of the log-periodogram, p^{−1/2} Σ_{k=1}^{p} log I^Z(λ_{j+k}), is ψ'(1), whereas the variance of the normalized logarithm of the running mean, p^{1/2} log(p^{−1} Σ_{k=1}^{p} I^Z(λ_{j+k})), is pψ'(p), where ψ(z) = Γ'(z)/Γ(z) is the digamma function (see for instance Johnson and Kotz (1970)). The quantity pψ'(p) decreases from π²/6 to 1 as p goes from 1 to ∞. For p = 3, pψ'(p) is 1.1848 and its value changes slowly thereafter, as pψ'(p) = 1 + 1/(2p) + O(p^{−2}). This shows that the variance of the local average of the log-periodogram is larger than the variance of the logarithm of the local average, and explains why typical values of p are small, such as p = 3.

When pooling tapered periodogram ordinates, if p successive values of the periodogram are pooled then, at the end of each block, τ DFT coefficients are dropped, where τ is the taper order. More precisely, set K(p, τ) = ⌊(n − 1)/(2(p + τ))⌋ and, for k ∈ {1, ..., K(p, τ)}, define the pooled periodogram as

Ī^x_{p,τ}(λ̃_k) := Σ_{j=(p+τ)(k−1)+1}^{(p+τ)(k−1)+p} I^x_τ(λ_j), (35)

where λ̃_k := (p + τ)^{−1} Σ_{j=(p+τ)(k−1)+1}^{(p+τ)k} λ_j = (2(p + τ)(k − 1) + p + τ + 1) π/n. The definition of the central frequency λ̃_k seems somewhat arbitrary. Our choice is motivated by the fact that the Hurvich and Chen taper actually mixes together τ adjacent periodogram ordinates, so that (p + τ) Fourier frequencies are mixed in each pooled and tapered periodogram ordinate. Note that the bias of the GPH estimator defined below is very sensitive to the definition of this central frequency.

5.2. The Geweke-Porter-Hudak (
GPH) estimator of d. Assume that the differencing order δ induces stationarity, i.e. d < δ + 1/2, and that the taper order τ is at least δ. Then, for certain sequences {ℓ_n} and {m_n} which increase slowly with n, and under smoothness conditions on the short-memory component f*, the ratios of the pooled periodogram of the δ-th order difference of Y def= Δ^δ X divided by its spectral density, Ī_{p,τ}^Y(λ̃_k)/f_Y(λ̃_k), ℓ_n ≤ k ≤ m_n, can be regarded as approximately independent and identically distributed (i.i.d.), in a sense that can be rigorously characterized; see Robinson (1995b) and Lahiri (2003).

Based on these heuristics, a popular semi-parametric estimate of d is the log-periodogram estimate of Geweke and Porter-Hudak (1983), defined here [in the manner of Robinson (1995b)] as the least squares estimate in the linear regression model

log[ Ī_{p,τ}^Y(λ̃_k) ] = log f*(0) + (d − δ) g(λ̃_k) + u_k,  1 ≤ k ≤ m,  (36)

where g(λ) def= −2 log|1 − e^{iλ}| and u_k is "approximately" equal to log[ Ī_{p,τ}^Y(λ̃_k)/f_Y(λ̃_k) ]. This regression equation can be solved in closed form:

d̂_GPH(m) = Σ_{k=1}^m { g(λ̃_k) − ḡ_m } log[ Ī_{p,τ}^Y(λ̃_k) ] / Σ_{k=1}^m { g(λ̃_k) − ḡ_m }² + δ,  where ḡ_m def= m^{−1} Σ_{k=1}^m g(λ̃_k).  (37)

To simplify the notation, we have left the dependence on the differencing, tapering and pooling orders implicit. The choice of these orders δ, τ and p will be discussed in Section 8. This estimator was introduced by Geweke and Porter-Hudak (1983) and has since been used in many empirical works.

Remark. In the definition of the GPH estimator in Robinson (1995b), the first ℓ_n DFT coefficients are eliminated; ℓ_n is referred to as the trimming number. Trimming is sometimes required to eliminate deterministic trends (Hurvich et al., 2005a), or to deal with non-Gaussian processes (Velasco, 2000).

5.3. The local Whittle Fourier (
LWF) estimator of d. Since, as mentioned above, I_Y^τ(λ_k)/f_Y(λ_k) can be regarded as approximately i.i.d. and f_Y(λ) ≈ C |1 − e^{iλ}|^{−2(d−δ)} in a neighborhood of zero, using the same arguments as in Section 4.4, we may approximate the negated likelihood as follows:

L̂_{τ,m}(C, d) = m^{−1} Σ_{k=1}^m { log( C |1 − e^{iλ_k}|^{−2(d−δ)} ) + I_Y^τ(λ_k) / ( C |1 − e^{iλ_k}|^{−2(d−δ)} ) }.  (38)

Note that pooling is irrelevant here because no non-linear transformation is involved. This estimator was originally proposed by Künsch (1987) and later studied in Robinson (1995a). After eliminating C by minimizing the contrast (38), we get the profile likelihood L̃_{τ,m}^{LWF}, defined as

L̃_{τ,m}^{LWF}(d) def= log( m^{−1} Σ_{k=1}^m I_Y^τ(λ_k) |1 − e^{iλ_k}|^{2(d−δ)} ) − 2(d − δ) m^{−1} Σ_{k=1}^m log|1 − e^{iλ_k}|,  (39)

and we define the local Whittle Fourier (LWF) estimator as the minimizer

d̂_LWF(m) def= Argmin_{d̄ ∈ R} L̃_{τ,m}^{LWF}(d̄) + δ.  (40)

The function d ↦ L̃_{τ,m}^{LWF}(d) is convex and thus admits a single global minimum, which can be obtained numerically by a standard one-dimensional convex optimization algorithm. In Robinson (1995a), the minimization in (40) is performed over a closed interval assumed to include the true value of the parameter; but, because the contrast is strictly convex, there is in fact no need to impose such a restriction.

5.4.
The exact and the non-stationary extended local Whittle estimators.
To conclude this section, let us mention two recent works on the estimation of the memory parameter for non-stationary M(d) processes, d ≥ 1/2, which are not covered in detail in this contribution because they are derived under slightly different conditions and hence do not compare directly with the wavelet estimators.

Shimotsu and Phillips (2005) introduced an exact local Whittle estimator. It is applicable when the M(d) series is generated by a linear process and when the domain of d is not wider than 9/2. Their estimator is based on fractional differencing of the data, and the complexity of their algorithm is of order n², where n is the number of observations. In contrast, the complexity of the estimators we consider is of order n log(n); see the discussion in Moulines et al. (2007c). Note also that the model considered by Shimotsu and Phillips (2005) is not an M(d) process in the sense given above and is not time-shift invariant, see their Eq. (1). In addition, their estimator is not invariant upon addition of a constant to the data, a drawback which is not easily dealt with, see their Remark 2.

Abadir et al. (2007) propose to extend the local Whittle estimator to d ∈ (−3/2, ∞), calling it the fully extended local Whittle estimator. This estimator is based on an extended definition of the DFT, D̃_X(λ_j, d), which includes correction terms, i.e. D̃_X(λ_j, d) def= D_X(λ_j) + k_X(λ_j, d). The correction term k_X(λ_j, d), which takes constant values on the intervals d ∈ [p − 1/2, p + 1/2), p = 0, 1, . . ., is defined as k_X(λ_j, d) = 0 if d ∈ (−3/2, 1/2) and

k_X(λ_j, d) = e^{−iλ_j} Σ_{r=1}^p (1 − e^{−iλ_j})^{−r} Z_{n,r},  d ∈ [p − 1/2, p + 1/2),  p = 1, 2, . . .,  (41)

where Z_{n,r} def= (2πn)^{−1/2} ( Δ^{r−1}X_n − Δ^{r−1}X_0 ), r = 1, . . . , p. The corresponding corrected periodogram is Ĩ_X(λ_j, d) = |D̃_X(λ_j, d)|². The fully extended local Whittle estimator is then defined as d̂_FELW = Argmin_{d̄ ∈ [d_min, d_max]} L_m^{FELW}(d̄), where

L_m^{FELW}(d) def= log( m^{−1} Σ_{k=1}^m Ĩ_X(λ_k, d) |1 − e^{iλ_k}|^{2d} ) − 2d m^{−1} Σ_{k=1}^m log|1 − e^{iλ_k}|.

Note that, to compute k_X(λ_j; d), we have to involve additional observations X_{−p+1}, . . . , X_0, where p def= 0 ∨ ⌊d_max − 1/2⌋. Compared to the Shimotsu and Phillips (2005) estimator, this estimator is easy to evaluate numerically, but the approximation of the extended local Whittle function is not continuous at d = 1/2, 3/2, . . ., which does not allow one to obtain limit theorems at these points (and, in the finite sample case, causes disturbances in the neighborhood of these values). In addition, this estimator is not robust to the presence of polynomial trends in the data (if d is the memory parameter, the method tolerates a polynomial trend of degree at most ⌊d + 1/2⌋).

6. The semi-parametric estimation setting
The theory of semi-parametric Fourier estimators was developed in two fundamental papers by Robinson, namely Robinson (1995b) and Robinson (1995a), which establish, under suitable conditions, the asymptotic normality of the GPH and LWF estimators in the stationary case. These results were later extended to non-stationary M(d) processes for different versions of the memory estimator and under various sets of assumptions. The theory of semi-parametric wavelet estimators was developed much more recently in Moulines et al. (2007b) and Moulines et al. (2007c) (some preliminary results are in Bardet et al. (2000) and Bardet (2002)). To allow comparison of the wavelet and Fourier approaches, the asymptotic properties of the estimators are presented under a common set of assumptions. Because the theory of wavelet estimators is much less developed than that of Fourier estimators, these assumptions can often be relaxed in the Fourier case.

There are two types of additional assumptions that come into play in an asymptotic theory. First, the semi-parametric rates of convergence depend on the smoothness of the short-memory component in a neighborhood of the zero frequency. The most common assumption, introduced in (Robinson, 1995b), is a Hölder condition on the short-memory component f* of the spectral density in (1).

Definition 2.
For any 0 < β ≤ 2, γ > 0 and ε ∈ (0, π], H(β, γ, ε) is the set of all non-negative even functions g that satisfy g(0) > 0 and, for all λ ∈ (−ε, ε),

|g(λ) − g(0)| ≤ γ g(0) |λ|^β.  (42)

The larger the value of β, the smoother the function at the origin. Observe that if f* is even, as assumed, and if it is in addition infinitely differentiable, then f*′(0) = 0 and hence, by a Taylor expansion, (42) holds with β = 2; that is, in this case, one has f* ∈ H(2, γ, ε). Andrews and Guggenberger (2003) extend this definition to the case β > 2 by requiring

| g(λ) − g(0) − Σ_{k=1}^{⌊β/2⌋} φ_k λ^{2k} / (2k)! | ≤ γ g(0) |λ|^β,  (43)

with |φ_k| ≤ γ g(0) for any k ∈ {1, . . . , ⌊β/2⌋}. To take advantage of this more refined smoothness assumption when β > 2, bias reduction techniques must be applied (see for example Andrews and Guggenberger (2003); Robinson and Henry (2003); Andrews and Sun (2004)). We will only consider β ≤ 2.
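As a concrete instance (our illustration, not from the paper), the short-memory factor of an AR(1)-type model, g(λ) = |1 − φ e^{iλ}|^{−2}, is even and infinitely differentiable, so it lies in H(2, γ, π) for a suitable γ; the smallest admissible γ in (42) can be bounded numerically:

```python
import cmath
import math

phi = 0.5

def g(lam):
    # short-memory spectral factor of an AR(1): |1 - phi * exp(i*lam)|^{-2}
    return 1.0 / abs(1 - phi * cmath.exp(1j * lam)) ** 2

eps = math.pi
# ratio |g(lam) - g(0)| / (g(0) * lam^2) over a grid of (0, eps): beta = 2
ratios = [abs(g(lam) - g(0)) / (g(0) * lam ** 2)
          for lam in (k * eps / 1000 for k in range(1, 1001))]
gamma = max(ratios)
print(gamma)   # finite, so this g belongs to H(2, gamma, pi)
```

A Taylor expansion shows the ratio tends to |g″(0)|/(2g(0)) at the origin, so the maximum over the grid is a valid (approximate) Hölder constant.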
Second, the definition of an M(d) process accounts only for the spectral (or, equivalently, covariance) structure, which specifies the distribution of the process only if X is Gaussian. To extend the results to the non-Gaussian context, it is necessary to specify the distribution of the process beyond its second-order properties. The most complete asymptotic theory has been developed so far for linear processes.

Definition 3.
We say that X is a strong linear M(d) process if there exist an ℓ²(Z) sequence {a_s}_{s∈Z} and an i.i.d. sequence {Z_s}_{s∈Z} satisfying E[Z_0] = 0, E[Z_0²] < ∞, and, for any t ∈ Z,

(Δ^K X)_t = Σ_{s∈Z} a_s Z_{t−s},  K def= ⌊d + 1/2⌋.  (44)

Remark. According to the standard terminology, strong here refers to the fact that {Z_t} is i.i.d. (a strong white noise). This assumption can often be relaxed by supposing that {Z_t} is a martingale difference sequence (E[Z_t | F_{t−1}] = 0, where F_t = σ(Z_s, s ≤ t) is the natural filtration of the process) satisfying various additional conditional moment assumptions. For example, Robinson (1995a), and many authors after him, assume that {Z_t² − E[Z_t²]} is a square integrable martingale difference sequence.

The following theorem, established in Giraitis et al. (1997), provides a lower bound for the estimation error.

Theorem 1.
Let d_min < d_max in R, ε ∈ (0, π], β ∈ (0, 2] and γ > 0. There exists a constant c > 0 such that

lim inf_{n→∞} inf_{d̂_n} sup_{d_min ≤ d ≤ d_max} sup_{f* ∈ H(β,γ,ε)} P_{d,f*}( n^{β/(2β+1)} |d̂_n − d| ≥ c ) > 0,  (45)

where the infimum inf_{d̂_n} is taken over all possible estimators based on {X_1, · · · , X_n} and P_{d,f*} denotes the distribution of a Gaussian M(d) process {X_t}_{t∈Z} with generalized spectral density of the form (3).

We shall see in the sequel that the best possible rate n^{β/(2β+1)} is achieved by both wavelet and Fourier estimators when X is a strong linear M(d) process.

The theory of the semi-parametric estimation of d for several non-linear M(d) processes, used in particular in financial econometrics and in teletraffic modeling, has also been investigated in the literature (see the recent surveys Deo et al. (2006a) and Teyssière and Abry (2007) and the references therein). The stochastic volatility model (a special instance of the signal plus noise model) has been considered in Hurvich et al. (2005b); Deo et al. (2006b), which establish consistency, rates of convergence and asymptotic normality of an appropriately modified LWF estimator. Dalla et al. (2006) provide general conditions under which the LWF estimator of the memory parameter of a stationary process is consistent and examine its rate of convergence. This class of processes includes, among others, signal plus noise processes, non-linear transforms of a Gaussian process, and exponential generalized autoregressive conditionally heteroscedastic (EGARCH) models. Faÿ et al. (2007) provide the consistency and rate of convergence of the LWW estimator for the infinite source Poisson process. The results are in general not as complete as in the linear case, and the required assumptions are specific to each considered model (abstract assumptions as in (Dalla et al., 2006, Eq. (8)) or (Moulines et al., 2007c) can be used, but checking these still requires model-dependent conditions). In addition, the asymptotic behavior of the estimators is model-dependent and might depart significantly from the results obtained in the linear case.

7.
Asymptotic properties of the wavelet estimators LRW and LWW

7.1. The between-scale process.
Before stating the main known results on the asymptotic behavior of the LWW and LRW estimators, some additional definitions related to the spectral density of the wavelet coefficients are required.

If the process X is an M(d) process, as defined in Section 2, and M > d − 1/2, then Δ^M X is weakly stationary. It follows that the process {W_{j,k}^X}_{k∈Z} of wavelet coefficients at a given scale j ≥ 0 is weakly stationary in k. However the two-dimensional process {[W_{j,k}^X, W_{j′,k}^X]^T}_{k∈Z} of wavelet coefficients at two different scales j and j′, with j ≠ j′, is not weakly stationary. This is why we consider the between-scale process {[W_{j,k}^X, W_{j,k}^X(j − j′)^T]^T}_{k∈Z}, where

W_{j,k}^X(u) def= [ W_{j−u, 2^u k}^X, W_{j−u, 2^u k+1}^X, . . . , W_{j−u, 2^u k + 2^u − 1}^X ]^T.  (46)

The index u in (46) denotes the scale difference j − j′ ≥ 0 between the finest scale j′ and the coarsest scale j. If u = 0, that is j = j′, then W_{j,k}^X(0) is the scalar W_{j,k}^X. For u > 0, the second component of the between-scale process is a vector whose entries are the wavelet coefficients with scale index j − u = j′ and translation indices 2^{j−j′}k, 2^{j−j′}k + 1, . . . , 2^{j−j′}(k + 1) − 1. Using (13), it follows that

W_{j,k}^X(u) = [ (h_{j−u,·} ⋆ x)_{2^j k + l 2^{j−u}}, l = 0, . . . , 2^u − 1 ]^T = [ (↓^j ∘ B^{−l 2^{j−u}} [h_{j−u,·} ⋆ x])_k, l = 0, . . . , 2^u − 1 ]^T,

where B is the shift operator defined, for any sequence {c_k}_{k∈Z}, by (B c)_k = c_{k−1}. The between-scale process is stationary in k because all its entries can be expressed with the same downsampling operator ↓^j applied to jointly stationary time series, namely h_{j,·} ⋆ x and B^{−l 2^{j−u}}[h_{j−u,·} ⋆ x] = [B^{−l 2^{j−u}} h_{j−u,·}] ⋆ x, l = 0, . . . , 2^u − 1. One can therefore write, for all 0 ≤ u ≤ j,

Cov_f( W_{j,k}^X, W_{j,k′}^X(u) ) = ∫_{−π}^{π} e^{iλ(k−k′)} D_{j,u}(λ; f) dλ,

where D_{j,u}(λ; f) is the cross-spectral density function of the between-scale process. The case u = 0 corresponds to the spectral density of the within-scale process {W_{j,k}^X}_{k∈Z}. When X is an M(d) process with spectral density function (3), we often denote D_{j,u}(λ; f) by D_{j,u}(λ; d, f*).

7.2. Generalized fractional Brownian motion.
We shall approximate the within- and between-scale spectral densities D_{j,u}(λ; d, f*) of the process X with memory parameter d ∈ R by the corresponding spectral densities of the generalized fractional Brownian motion B^{(d)}. This process is parametrized by a family Θ^{(d)} of smooth test functions θ(t), t ∈ R, and is defined as follows: {B^{(d)}(θ), θ ∈ Θ^{(d)}} is a mean zero Gaussian process with covariance

Cov( B^{(d)}(θ₁), B^{(d)}(θ₂) ) = ∫_R |λ|^{−2d} θ̂₁(λ) \overline{θ̂₂(λ)} dλ.  (47)

The finiteness of the integral ∫_R |λ|^{−2d} |θ̂(λ)|² dλ provides a constraint on the family Θ^{(d)}. For instance, when d > 1/2, this condition requires that θ̂(λ) decay sufficiently quickly at the origin and, when d < 0, it requires that θ̂(λ) decrease sufficiently rapidly at infinity. Hence, under (W-1)-(W-4), θ can be a wavelet ψ if d ∈ (1/2 − α, M + 1/2). The discrete wavelet transform of B^{(d)} is defined as

W_{j,k}^{(d)} def= B^{(d)}(ψ_{j,k}),  (j, k) ∈ Z × Z.  (48)

As shown in (Moulines et al., 2007b, Relation (35)), for all j, k and k′ in Z, and u ≥ 0, one has

Cov( W_{j,k}^{(d)}, W_{j,k′}^{(d)}(u) ) = 2^{2jd} ∫_{−π}^{π} D_{∞,u}(λ; d) e^{iλ(k−k′)} dλ,  (49)

where D_{∞,u}(λ; d) does not involve j and is given by

D_{∞,u}(λ; d) = Σ_{l∈Z} |λ + 2lπ|^{−2d} e_u(λ + 2lπ) ψ̂(λ + 2lπ) \overline{ψ̂(2^{−u}(λ + 2lπ))},  (50)

where, for all u ≥ 0,

e_u(ξ) def= 2^{−u/2} [ 1, e^{−i 2^{−u} ξ}, . . . , e^{−i (2^u − 1) 2^{−u} ξ} ]^T,  ξ ∈ R.

As mentioned above, W_{j,k}^{(d)} is well defined under (W-1)-(W-4) when d ∈ (1/2 − α, M + 1/2). Using the expression of ψ̂ in (W-2) and (7), one easily gets that, for any u ≥ 0, D_{∞,u}(λ; d) is continuous on (−π, π) \ {0} and |D_{∞,u}(λ; d)| = O(|λ|^{2(M−d)}) as λ → 0. If moreover d ≤ M, one has (see Relation (72) in Moulines et al. (2007c))

sup_{u ≥ 0} 2^{u(2d − 1/2)} ∫_{−π}^{π} |D_{∞,u}(λ; d)|² dλ < ∞.  (51)

The variance σ²_j(d, f*) def= Var( W_{j,0}^X ) can be interpreted as a scale spectrum, in words, the power at scale j. It is approximated by f*(0) K(d) 2^{2jd}, where the constant K(d) is given by

K(d) def= ∫_{−∞}^{∞} |ξ|^{−2d} |ψ̂(ξ)|² dξ.  (52)

7.3. Uniform bounds.
Using (Moulines et al., 2007b, Theorem 1) and (Moulines et al., 2007c,Theorem 1), the following result holds.
Proposition 2.
Let X be an M(d) process with d ∈ R and f* ∈ H(β, γ, ε) for some β, γ > 0 and ε ∈ (0, π]. Assume (W-1)-(W-4) with (1 + β)/2 − α < d < M + 1/2. Then there exists C > 0 such that, for all j ≥ 0,

| σ²_j(d, f*) − f*(0) K(d) 2^{2jd} | ≤ C γ f*(0) 2^{(2d−β)j}.  (53)

If moreover ε = π and d ≤ M, then, for all λ ∈ (−π, π) and j ≥ u ≥ 0,

| D_{j,u}(λ; d, f*) − f*(0) D_{∞,u}(λ; d) 2^{2jd} | ≤ C γ f*(0) 2^{(2d−β)j},  (54)

where |y| denotes the Euclidean norm of any vector y.

In (53) and (54), the constant C only depends on d, β and on the wavelets ψ and φ. It can be made independent of d on any compact set included in ((1 + β)/2 − α, M + 1/2) for (53) and in ((1 + β)/2 − α, M] for (54). This proposition shows that the covariance properties of the wavelet coefficients of an M(d) process resemble those of the generalized FBM B^{(d)} at large scales. The latter are not, in general, decorrelated, as is sometimes heuristically assumed (see Abry and Veitch (1998) and Veitch and Abry (1999) for example). Exact decorrelation occurs only in very specific cases: if {ψ_{j,k}} is an orthonormal basis of L²(R) and d = 0, see (Moulines et al., 2007b, Remark 7). Due to its very specific spectral properties, the ideal Shannon wavelet coefficients (defined in (20)) of B^{(d)} are partially decorrelated for all d. Indeed, applying (20) in (50), we get, for all λ ∈ (−π, π),

D_{∞,u}(λ; d) = 0 for u ≥ 1, and D_{∞,0}(λ; d) = (2π − |λ|)^{−2d},  (55)

implying that W_{j,k}^{(d)} and W_{j′,k′}^{(d)} are uncorrelated for j ≠ j′, and that W_{j,k}^{(d)} and W_{j,k′}^{(d)}, k ≠ k′, are uncorrelated only if d = 0.

The asymptotic behavior of the wavelet estimators d̂_LRW(L_n, U_n, w_n) and d̂_LWW(L_n, U_n), defined in (24) and (28), will be derived for specific lower and upper scale sequences (L_n) and (U_n). In the semiparametric framework, the lower scale sequence (L_n) governs the rate of convergence of the memory estimator. There are two possible settings as far as the upper scale sequence (U_n) is concerned:

(S-1) U_n − L_n is fixed, equal to ℓ ≥ 0;
(S-2) U_n ≤ J_n for all n and U_n − L_n → ∞ as n → ∞, where J_n is the largest available scale defined in (22).
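Before the asymptotic results, the log-regression idea behind the LRW estimator can be sketched numerically. The sketch below is ours, not from the paper: it uses a plain Haar DWT and unweighted least squares instead of the weighted regression of (24); for white noise (d = 0) the empirical scale spectrum is flat, so the slope estimate of 2d should be near zero:

```python
import math
import random

def haar_dwt_details(x, levels):
    """Return Haar detail coefficients per scale j = 1..levels."""
    approx, details = list(x), []
    for _ in range(levels):
        d = [(approx[2*i] - approx[2*i+1]) / math.sqrt(2) for i in range(len(approx)//2)]
        approx = [(approx[2*i] + approx[2*i+1]) / math.sqrt(2) for i in range(len(approx)//2)]
        details.append(d)
    return details

random.seed(0)
n = 2 ** 14
x = [random.gauss(0, 1) for _ in range(n)]     # white noise: true d = 0

scales = range(1, 7)
details = haar_dwt_details(x, max(scales))
# empirical scale spectrum: variance of detail coefficients at each scale j
logvar = [math.log2(sum(c * c for c in details[j-1]) / len(details[j-1])) for j in scales]

# ordinary least squares of log2(sigma_j^2) on j: the slope estimates 2d
jbar = sum(scales) / len(scales)
vbar = sum(logvar) / len(logvar)
slope = (sum((j - jbar) * (v - vbar) for j, v in zip(scales, logvar))
         / sum((j - jbar) ** 2 for j in scales))
d_hat = slope / 2
print(round(d_hat, 3))   # close to 0
```

The Haar wavelet fits the framework with M = 1 vanishing moment; in practice the weighted versions discussed below improve on this naive unweighted fit.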
7.4. Asymptotic properties of the LRW estimator.
We will use the following definition: for all i, j ≥ 0,

V_{i,j}(d) def= 4π 2^{2d|i−j|} 2^{i∧j} K^{−2}(d) ∫_{−π}^{π} |D_{∞,|i−j|}(λ; d)|² dλ,  (56)

with the 2^{|i−j|}-dimensional vector D_{∞,u}(λ; d) defined by (50). The following result, adapted from Moulines et al. (2007a), applies to Gaussian M(d) processes; it has recently been extended to strong linear M(d) processes in Roueff and Taqqu (2007).

Theorem 3.
Let X be a Gaussian M(d) process with generalized spectral density given by (3), with d ∈ R and f* ∈ H(β, γ, ε) for some γ > 0, β ∈ (0, 2] and ε ∈ (0, π]. Assume (W-1)-(W-4) with

(1 + β)/2 − α < d ≤ M.  (57)

Let (L_n) be a sequence satisfying

lim_{n→∞} { n 2^{−(1+2β)L_n} + (n 2^{−L_n})^{−1} } = 0  (58)

and let w be a weight vector of length ℓ + 1 satisfying (25). Then, as n → ∞,

(n 2^{−L_n})^{1/2} ( d̂_LRW(L_n, L_n + ℓ, w) − d ) →_D N( 0, Σ_{i,j=0}^ℓ w_i V_{i,j}(d) w_j ).  (59)

Remark. This result is stated in Moulines et al. (2007a) with ε = π. The case ε < π can be obtained by using (Moulines et al., 2007c, Corollary 2).

Remark. The larger the value of β, the smaller the allowed range for d in (57), for a given decay exponent α and number M of vanishing moments. Indeed, the range in (57) has been chosen so as to obtain a bound on the bias which corresponds to the best possible rate under the condition f* ∈ H(β, γ, ε). If (57) is replaced by the weaker condition d ∈ (1/2 − α, M], which does not depend on β, the same CLT (59) holds, but β in Condition (58) must be replaced by some β′ ∈ (0, β]. This β′ must satisfy (1 + β′)/2 − α < d, that is, 0 < β′ < 2(d + α) − 1, and for β′ < β one gets a slower achievable rate in (59).

Remark. The condition n 2^{−(1+2β)L_n} → 0 guarantees that the bias is asymptotically negligible. The best possible rate n^{β/(1+2β)} given by Theorem 1 is obtained with n 2^{−(1+2β)L_n} ≍ 1, in which case the squared bias and the variance are of the same order of magnitude; see Theorem 3 in Moulines et al. (2007b), where uniform bounds on the mean square error and an exact equivalent of the variance are given. The asymptotic equivalent of the variance is (n 2^{−L_n})^{−1} Σ_{i,j=0}^ℓ w_i V_{i,j}(d) w_j, as can be expected from (59).

Remark. Theorem 3 applies only to the setting (S-1), since U_n = L_n + ℓ. Under the setting (S-2), where U_n − L_n → ∞ as n → ∞, one has to replace the weights w by a sequence of weights (w_n) of lengths U_n − L_n + 1. Using (Moulines et al., 2007b, Proposition 4), if ε = π, it is possible to extend Theorem 3 to this setting, provided that w_{n,i} → w_{∞,i} as n → ∞ for all i and

lim_{ℓ→∞} Σ_{i>ℓ} sup_n |w_{n,i}| 2^{i/2} = 0,  (60)

in which case one has

(n 2^{−L_n})^{1/2} ( d̂_LRW(L_n, U_n, w_n) − d ) →_D N( 0, Σ_{i,j=0}^∞ w_{∞,i} V_{i,j}(d) w_{∞,j} ).

The variance in the right-hand side of the last display is finite as a consequence of (51) and (60).

The standard theory of linear regression shows that, for any fixed ℓ ≥
1, the optimal design matrix is D = V^{−1}(d, ℓ), where V(d, ℓ) = [V_{i,j}(d)]_{0≤i,j≤ℓ} is an (ℓ + 1) × (ℓ + 1) matrix. By (26), the corresponding weights read

w_opt(d, ℓ) def= V^{−1}(d, ℓ) B ( B^T V^{−1}(d, ℓ) B )^{−1} b,  (61)

where B and b are defined by (27), and the associated limiting variance is

ρ_opt(d, ℓ) def= w_opt(d, ℓ)^T V(d, ℓ) w_opt(d, ℓ) = b^T ( B^T V^{−1}(d, ℓ) B )^{−1} b.  (62)

Since a regression vector of weights of length ℓ can be viewed as a regression vector of weights of length ℓ + 1 with zero as last coordinate, ρ_opt(d, ℓ) decreases as ℓ increases, and we will denote its limit by ρ_opt(d, ∞). Figure 3 shows that the limit is approximately attained already for moderate values of ℓ. In practice, w_opt(d, ℓ) cannot be used directly since it depends on the unknown memory parameter d, but a plug-in method can be used, as suggested by Bardet (2002) in a similar context: a preliminary consistent estimator, say d̂^{(1)}, is used to estimate the optimal weights ŵ = w_opt(d̂^{(1)}, ℓ), and then the estimator d̂_LRW(L, U, ŵ) is applied.

A different choice of weights is suggested by Abry and Veitch (1998) (in a parametric context). This choice relies on the approximation that D_{∞,u}(λ; d)/K(d) is nearly zero for u ≥ 1 and nearly constant for u = 0. This provides a diagonal approximation of V^{−1}(d, ℓ), yielding a diagonal design matrix D with diagonal entries D_{i,i} = 2^{−i}, i = 0, . . . , ℓ, up to a multiplicative constant. By straightforward computations, this design matrix defines the following Abry–Veitch weights:

w_i^AV(ℓ) def= (i − η_ℓ) 2^{−i} / ( 2 log(2) κ_ℓ (2 − 2^{−ℓ}) ),  i = 0, . . . , ℓ,  (63)

where

η_ℓ def= Σ_{j=0}^ℓ j 2^{−j} / (2 − 2^{−ℓ})  and  κ_ℓ def= Σ_{j=0}^ℓ (j − η_ℓ)² 2^{−j} / (2 − 2^{−ℓ}).  (64)

This choice, while not optimal, is close to it in practice, at least for not too large values of d and with a standard choice of wavelets, see Figure 2.
One advantage of this choice of regression weights stems from the fact that it does not require the use of a pilot estimator, since the weights do not depend on the unknown parameter d.

Let us denote, for all u ≥ 0,

I_u(d) def= ∫_{−π}^{π} |D_{∞,u}(λ; d)|² dλ = (2π)^{−1} Σ_{τ∈Z} Cov²( W_{0,0}^{(d)}, W_{−u,τ}^{(d)} ),  (65)

with D_{∞,u}(λ; d) defined by (50). In the case where the weights w are chosen as proposed by Abry and Veitch (1998) and given by (63), the asymptotic variance in the right-hand side of (59) reads

ρ(d, ℓ) def= Σ_{i,j=0}^ℓ w_i^AV(ℓ) V_{i,j}(d) w_j^AV(ℓ),

where ℓ + 1 is the number of scales used in the regression. Inserting (63) and (65), we get, for any ℓ ≥ 1,

ρ(d, ℓ) = π / ( (2 − 2^{−ℓ}) κ_ℓ log²(2) K²(d) ) × { I_0(d) + 2 κ_ℓ^{−1} (2 − 2^{−ℓ})^{−1} Σ_{u=1}^ℓ I_u(d) 2^{(2d−1)u} Σ_{i=0}^{ℓ−u} 2^{−i} (i − η_ℓ)(i + u − η_ℓ) },  (66)

where K(d) is defined in (52), and η_ℓ and κ_ℓ in (64). When ℓ is large, the last display can be approximated by its limit as ℓ → ∞, namely,

ρ(d, ∞) def= π / ( 2 log(2) K(d) )² × ( I_0(d) + 2 Σ_{u=1}^∞ I_u(d) 2^{(2d−1)u} ).  (67)

7.5. Asymptotic properties of the LWW estimator.
Let us now consider the LWW estimator d̂_LWW(L_n, U_n) defined in (28). The following result was first proved for a Gaussian M(d) process in Moulines et al. (2007c) and then extended to linear processes in Roueff and Taqqu (2007).

Theorem 4.
Let X be a strong linear M(d) process with generalized spectral density given by (3), with d ∈ R and f* ∈ H(β, γ, ε) for some γ > 0, β ∈ (0, 2] and ε ∈ (0, π]. Assume (W-1)-(W-4) with Condition (57). Let (L_n) be a sequence satisfying

lim_{n→∞} { n 2^{−(1+2β)L_n} + L_n (n 2^{−L_n})^{−1/2} } = 0  (68)

and let (U_n) be a sequence such that either (S-1) or (S-2) holds. Then, as n → ∞,

(n 2^{−L_n})^{1/2} ( d̂_LWW(L_n, U_n) − d ) →_D N[ 0, ρ(d, ℓ) ],  (69)

where ℓ = lim_{n→∞} (U_n − L_n) ∈ {0, 1, . . . , ∞} and where ρ(d, ℓ) is given by (66) for ℓ < ∞ and by (67) for ℓ = ∞.

Remark. As in Theorem 3, the condition n 2^{−(1+2β)L_n} → 0 controls the bias through the growth of L_n. The condition L_n (n 2^{−L_n})^{−1/2} → 0 means that n 2^{−L_n} has to grow faster than L_n², which is at most of order log²(n) and hence always holds in the typical regime where n 2^{−L_n} ≍ n^γ with γ ∈ (0, 1). If we only require consistency of d̂_LWW, then we can relax condition (68). It follows from (Moulines et al., 2007c, Theorems 1 and 3) that if, under the assumptions of Theorem 4, (68) is replaced by

lim_{n→∞} { L_n^{−1} + L_n (n 2^{−L_n})^{−1/2} } = 0,

then

d̂_LWW(L_n, U_n) = d + O_P( (n 2^{−L_n})^{−1/2} + 2^{−βL_n} ).

Hence, for 2^{L_n} ≍ n^{1/(1+2β)}, we get the optimal rate n^{β/(1+2β)} stated in Theorem 1. As shown in Moulines et al. (2007c), this result holds for a class of "weak" linear M(d) processes (see Remark 3).

The following observations, which follow directly from Theorem 4, seem to have been unknown so far:

Corollary 5.
The LWW estimator has the same asymptotic variance as the LRW estimator with Abry–Veitch weights (63).

Corollary 6. Among all wavelet estimators of the memory parameter d presented in this paper, for a given choice of wavelet and scales involved in the estimates, the estimator with optimal asymptotic variance is the LRW estimator using the optimal weights defined in (61).

As explained above, the optimal LRW estimator of Corollary 6 requires plugging in a preliminary consistent estimator of d.

7.6. Asymptotic variances.
The asymptotic variances of both the LRW and the LWW estimators depend on the true value of the memory parameter d and on the wavelet ψ, as they are all expressed in terms of D_{∞,u}(λ; d), defined in (50). In practice, one estimates the limiting variance ρ(d, ℓ) by ρ(d̂, ℓ) in order to construct asymptotic confidence intervals. The continuity of ρ(·, ℓ) and the consistency of d̂ justify this procedure.
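Concretely (our sketch), the CLT (69) leads to plug-in confidence intervals of the form d̂ ± z_{1−α/2} ( ρ(d̂, ℓ) / (n 2^{−L}) )^{1/2}; here `rho` is a hypothetical stand-in for an actual implementation of (66):

```python
import math

def lww_confidence_interval(d_hat, rho, n, L, ell, z=1.96):
    """Plug-in 95% CI from the CLT (69): the effective sample size is n * 2^{-L}."""
    half = z * math.sqrt(rho(d_hat, ell) / (n * 2.0 ** (-L)))
    return (d_hat - half, d_hat + half)

# toy stand-in for (66): a constant variance level (illustration only)
toy_rho = lambda d, ell: 0.5
lo, hi = lww_confidence_interval(0.3, toy_rho, n=4096, L=4, ell=5)
print(lo, hi)
```

With a real ρ(·, ℓ), the interval width adapts to the estimated d, unlike in the Fourier case discussed next, where the limiting variance is free of d.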
A comparison of the asymptotic variances ρ(d, ℓ) for several wavelets can be found in Moulines et al. (2007c) (see also Figure 1). In particular, as Figure 1 in Moulines et al. (2007c) indicates, the choice of wavelet does not matter much (provided that (1 + β)/2 − α < d ≤ M holds) and a sensible approximation can be obtained by using the Shannon wavelet, for which a simple expression of the asymptotic variance can be obtained thanks to (55). Using the Shannon wavelet in (50), we get, for all λ ∈ (−π, π), D_{∞,u}(λ; d) = 0 for u ≥ 1 and D_{∞,0}(λ; d) = (2π − |λ|)^{−2d}, so that, for all d ∈ R, (66) becomes

ρ(d, ℓ) = π g(−4d) / ( 2 (2 − 2^{−ℓ}) κ_ℓ log²(2) g²(−2d) ),  where g(x) = ∫_π^{2π} λ^x dλ.  (70)

8. Asymptotic properties of the Fourier estimators GPH and LWF

8.1. Asymptotic properties of the GPH estimator.
The consistency and asymptotic normality of the GPH estimator have been established by Robinson (1995b) for stationary invertible Gaussian M(d) processes, −1/2 < d < 1/2, without tapering (τ = 0) and without differencing (δ = 0). As shown by Velasco (1999b), the GPH estimator with τ = 0 and δ = 0 exhibits non-standard behavior when d > 1/2. Although it is consistent for d ∈ (1/2, 1] and asymptotically normally distributed for d ∈ (1/2, 3/4), it has a non-normal limiting distribution for d ∈ [3/4, 1]; for d > 1, it converges to 1 in probability and is inconsistent. Hence the interest in applying the GPH estimator with differencing δ > 0 and tapering τ > 0.

When {Z_t} is a Gaussian white noise, Ī_{p,τ}^Z(λ̃_k) is distributed as ‖G_{p,τ}‖²/2, where G_{p,τ} = [G_{p,τ}^{(1)}, . . . , G_{p,τ}^{(2p)}] is a 2p-dimensional zero-mean Gaussian vector with covariance matrix Σ_{p,τ}, whose expression is given in Hurvich et al. (2002). Define

γ_{p,τ} = E[ log( ‖G_{p,τ}‖²/2 ) ],  σ²_{p,τ} = Var[ log( ‖G_{p,τ}‖²/2 ) ].  (71)

Numerical expressions for these quantities are given in Hurvich et al. (2002).

Theorem 7.
Assume that X is a Gaussian M(d) process and f∗ ∈ H∗(β, γ, ε) for some β ∈ (0, 2], γ > 0, ε ∈ (0, π]. Let δ ≥ 0 be the differencing order, τ ≥ 0 be the tapering order, and p ≥ 1 be the pooling order. Let m_n be a non-decreasing sequence of integers such that

lim_{n→∞} ( m_n^{−1} + m_n^{1+2β} n^{−2β} ) = 0.   (72)

Then, for any d satisfying

δ − τ − 1/2 < d < δ + 1/2,   (73)

the GPH estimator defined in (37) satisfies

√m_n ( d̂_GPH(m_n) − d ) →D N(0, σ²_{p,τ}/4),   (74)

where σ²_{p,τ} is defined in (71).

Compared to the wavelet estimators, the size of the confidence intervals does not depend on d, which may be seen as a significant advantage. On the other hand, there is an inflation of the variance over the whole interval (δ − τ − 1/2, δ + 1/2), and it can be greater than the limiting variance obtained using the wavelet estimator, at least for certain values of the memory parameter. The definition of the pooled periodogram (35) implies that the number of Fourier frequencies used to evaluate d̂_GPH(m_n) is equal to (p + τ)m_n. Hence the efficiency ratio between two GPH estimators using the same number of Fourier frequencies but two different pooling numbers, say p and p′, may be expressed as (p + τ)σ²_{p,τ} / ((p′ + τ)σ²_{p′,τ}). As shown in (Hurvich et al., 2002, Theorem 1), the function p ↦ (p + τ)σ²_{p,τ} is decreasing, showing that pooling increases asymptotic efficiency. In addition, for any fixed τ, lim_{p→∞} (p + τ)σ²_{p,τ} = Φ(τ), where

Φ(τ) = Γ(4τ + 1) Γ⁴(τ + 1) / Γ⁴(2τ + 1).   (75)

As seen below, this efficiency bound is achieved by the local Whittle estimator. Therefore, the order of pooling can be made arbitrarily large and, at least asymptotically, an increase in the pooling order will result in a decrease of the asymptotic variance; see Hannan and Nicholls (1977) and Theorem 7. In practice, of course, this is not possible, and since the improvements in asymptotic efficiency happen quickly, there is no need to use a very large p; small values such as p = 3 or 4 suffice.

Remark. As observed in (73), the differencing order δ and the taper order τ control the range of values of the memory parameter d which can be inferred. The number of differentiations δ controls the upper bound for d, while the taper order τ controls the length of the range. These two parameters are independent; by choosing the number of differentiations δ and the taper order τ, we can therefore cover any interval of admissible values of d (the same comment applies to the LWF estimator). If τ = 0, the interval over which the GPH estimator is consistent and asymptotically normal is (δ − 1/2, δ + 1/2); if τ > 0, the range is [δ − τ − 1/2, δ + 1/2). When τ = 0, using the sharpened results of Velasco (1999b), the range over which the GPH estimator is asymptotically normal can be shown to be (δ − 1/2, δ + 3/4).

Remark. It is possible to replace the Gaussian assumption by the weaker assumption that the process X is a strong linear M(d) process, by adding moment and regularity conditions on the distribution of the driving noise in the definition (44). In this case, however, the estimator d̂_GPH(m_n) should be slightly modified to avoid a number of Fourier frequencies near 0. In addition, tapering and pooling are then required, even if the process X is stationary and invertible; see (Velasco, 2000, Theorem 3) and Faÿ et al. (2004).
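The bandwidth condition (72) balances variance (m_n must grow) against bias (m_n must remain small relative to n). A small numerical sketch, in Python rather than the Matlab/Octave of the software section, and assuming the condition reads lim (m_n^{−1} + m_n^{1+2β} n^{−2β}) = 0, illustrates that a bandwidth m_n ≍ n^κ satisfies it precisely when 0 < κ < 2β/(1 + 2β):

```python
import math

def condition_term(n, kappa, beta):
    """Value of m^{-1} + m^{1+2*beta} * n^{-2*beta} for m = floor(n**kappa)."""
    m = max(1, math.floor(n ** kappa))
    return 1.0 / m + m ** (1 + 2 * beta) * n ** (-2 * beta)

beta = 1.0  # smoothness exponent of f*; critical rate is 2*beta/(1+2*beta) = 2/3
for n in (10**4, 10**6, 10**8):
    ok = condition_term(n, 0.60, beta)   # 0.60 < 2/3: term shrinks as n grows
    bad = condition_term(n, 0.75, beta)  # 0.75 > 2/3: term blows up
    print(n, ok, bad)
```

The exponent κ = 2β/(1 + 2β) itself is excluded, which is why the RMSE-optimal choice m_n ≍ n^{2β/(1+2β)} of Table 2 sits exactly at the boundary of the CLT condition.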
Asymptotic properties of the LWF estimator.
The consistency and asymptotic normality of the LWF estimator have been established by Robinson (1995a) for stationary invertible linear M(d) processes (−1/2 < d < 1/2), without tapering (τ = 0) and without differencing (δ = 0), under the weaker assumption that the {Z_t} in (44) are martingale differences whose squares, centered at their expectation, are also weakly stationary martingale differences. Velasco (1999a) has shown that the LWF estimator with τ = 0 and δ = 0 is consistent for d ∈ (−1/2, 1] and asymptotically N(0, 1/4) for d ∈ (−1/2, 3/4) under the same assumptions as Robinson (1995a). (Hurvich and Chen, 2000, Theorem 2) established Theorem 8 for τ = 1 and δ = 1. This result was later extended in Moulines and Soulier (2003) to general τ and δ. The consistency of the LWF estimator was also established (with δ = 0 and τ = 0) for −1/2 < d < 1/2.

Theorem 8.
Assume that X is a strong linear M(d) process for some d ∈ ℝ and f∗ ∈ H∗(β, γ, ε) for some β ∈ (0, 2], γ > 0, ε ∈ (0, π]. Let δ be the differencing order and τ be the taper order. Let m_n be a non-decreasing sequence of integers such that

lim_{n→∞} ( m_n^{−1} + m_n^{1+2β} n^{−2β} ) = 0.   (76)

Then the LWF estimator defined in (40) satisfies, for any d satisfying

δ − τ − 1/2 < d < δ + 1/2,   (77)

√m_n ( d̂_LWF(m_n) − d ) →D N(0, Φ(τ)/4),   (78)

where Φ(τ) is defined in (75).

The quantity Φ(τ) quantifies the loss of efficiency due to tapering. As the taper order increases, the limiting variance inflates: Φ(0) = 1 (no tapering), Φ(1) = 1.5, Φ(2) = 35/18, etc. Since the LWF estimator is based on linear functionals of the periodogram, pooling is irrelevant. Recall that, in the definition (40), the standard periodogram is used and not the pooled one. This is why the pooling order does not appear in the conditions of Theorem 8.
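The inflation factors quoted above can be checked numerically. The short Python sketch below assumes the reading Φ(τ) = Γ(4τ + 1) Γ⁴(τ + 1) / Γ⁴(2τ + 1), which reproduces the stated values Φ(0) = 1, Φ(1) = 1.5 and Φ(2) = 35/18:

```python
from math import gamma

def phi(tau):
    """Variance inflation factor due to tapering at taper order tau."""
    return gamma(4 * tau + 1) * gamma(tau + 1) ** 4 / gamma(2 * tau + 1) ** 4

# Phi(0) = 1 (no tapering), Phi(1) = 1.5, Phi(2) = 35/18 ~ 1.944
print([phi(t) for t in range(3)])
```

The monotone growth of Φ makes the trade-off of Theorem 8 explicit: each extra taper order widens the admissible interval (77) by one unit at the cost of a strictly larger limiting variance in (78).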
Remark. The condition on the bandwidth (76) is slightly weaker than Assumption A4′ in Robinson (1995a) and Hurvich and Chen (2000). It seems that the log²(m_n) term in these assumptions is superfluous (see (Andrews and Sun, 2004, Comments on Assumption 4) for the required adaptation of the proof).

9. Discussion
The Fourier and wavelet estimators of the memory parameter present similar characteristics and distinctive advantages. In Table 2, we summarize the main features of the estimators considered in this paper.
                          Fourier                                   Wavelets
Non-stationarity          pre-apply (I − B)^δ                       M ≥ d (a sufficient number
(d large)                 with δ > d − 1/2                          of vanishing moments)

Polynomial trends         same as above,                            same as above,
of degree K               with δ ≥ K + 1                            with M ≥ K + 1

Leakage                   use taper (1 − e^{2iπk/n})^τ              take α > (1 + β)/2 − d
(d small)                 with taper order τ > 1/2 − d              (sufficiently smooth wavelet)

Rate of convergence       n^{β/(1+2β)} (β ≤ 2),                     n^{β/(1+2β)} (β ≤ 2),
of d̂ to d                 with m_n ≍ n^{2β/(1+2β)}                  with 2^{L_n} ≍ n^{1/(1+2β)}

Asymptotic variance       depends on taper order τ only;            depends on d and ψ;
                          GPH's ↓ LWF's                             LRW's with w^AV = LWW's
                          as pooling order → ∞                      ≥ LRW's with w^opt(d̂^(1), ℓ)

Table 2. Fourier vs wavelets: trends, non-stationarity, non-invertibility. In the
Wavelets column, M and α are defined in (W-1)–(W-4).

To allow comparison between wavelet and Fourier estimators, we must first link the normalization factors, (n 2^{−L_n})^{1/2} for wavelet estimators and m_n^{1/2} for Fourier estimators. A Fourier estimator with bandwidth m_n projects the observations [X_1, …, X_n]^T on the space generated by the vectors {cos(2πk·/n), sin(2πk·/n)}, k = 1, …, m_n, whose dimension is 2m_n; on the other hand, the wavelet coefficients {W_{j,k}, j ≥ L_n, k = 0, …, n_j − 1} used in the wavelet estimator correspond to a projection on a space whose dimension is at most Σ_{j=L_n}^{J_n} n_j ∼ Σ_{j=L_n}^{∞} n 2^{−j} ∼ n 2^{1−L_n}. Hence, for m_n or n 2^{−L_n} large, it makes sense to consider n 2^{−L_n} as an analog of the bandwidth parameter m_n.

We shall now compare the asymptotic variances in the CLTs of Theorems 3, 4, 7 and 8. While the asymptotic variance of the Fourier estimators is a constant, the variance of the wavelet estimators is a function of the memory parameter, which can be numerically computed. For the Fourier estimators, the allowed range of the memory parameter d is given by (see Theorems 7 and 8)

δ − τ − 1/2 < d < δ + 1/2.   (79)

The length of this range equals τ + 1, while the differentiation order δ allows one to shift it towards larger values of d. For instance, if one wishes to shift the upper boundary of the range towards larger values of d while keeping the lower boundary unchanged, one has to increase both τ and δ by the same amount. As shown in Theorem 8, increasing τ inflates the asymptotic variance of the estimator. For wavelet estimators, the allowed range of the memory parameter d is given by

1/2 − α < d ≤ M,   (80)

[see Theorems 3 and 4; here we took β arbitrarily small, since we focus on the asymptotic variance in this discussion]. Of course one may still shift this range by δ to the right by differentiating the series X at the order δ before processing the wavelet transform.
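The two admissible ranges (79) and (80) are easy to encode. The helper below is an illustrative Python sketch (not part of the authors' toolbox); with the tuning values used later in the Monte-Carlo study (δ = 4, τ = 5 for the Fourier estimators; M = 4 and α ≈ 1.91 for the wavelet estimators) it recovers the intervals (−1.5, 4.5) and (−1.41, 4] quoted there:

```python
def fourier_range(delta, tau):
    """Open interval (delta - tau - 1/2, delta + 1/2) of eq. (79)."""
    return (delta - tau - 0.5, delta + 0.5)

def wavelet_range(M, alpha):
    """Interval (1/2 - alpha, M] of eq. (80); left end open, right end closed."""
    return (0.5 - alpha, M)

print(fourier_range(4, 5))                                   # (-1.5, 4.5)
print(tuple(round(x, 2) for x in wavelet_range(4, 1.91)))    # (-1.41, 4)
```

Note the structural difference: the Fourier range always has length τ + 1 regardless of δ, while the wavelet range grows with both the smoothness exponent α and the number of vanishing moments M.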
This will also eliminate polynomial trends up to degree M + δ − 1. The higher the α in (80), the more negative d is allowed to be. This is because the higher the α, the smoother the wavelet ψ_{j,k} and hence the better the spectral resolution of the wavelet. This matters particularly when d < 0, since f(λ) is then very small around the origin, making it harder to estimate d. In the Fourier case (see (79)), it is τ that plays a role similar to α.

It is important to note that, for a given wavelet family such as the Daubechies wavelets or Coiflets, increasing M yields a larger α, so that the allowed range is effectively increased by increasing M. In contrast to Fourier methods, by increasing the number of vanishing moments M, say of a Daubechies wavelet, the asymptotic variance converges to the asymptotic variance obtained with the Shannon wavelet, presented in (70). Thus, for a given d, there is asymptotically no price to pay for increasing the number of vanishing moments M and the number of available scales. This is an argument in favor of wavelet estimators as compared to tapered Fourier estimators. This should, however, be interpreted with care. For a given sample size n, an increase of M causes an increase of the support of the wavelet and a decrease in the number of available scales. While this does not influence the asymptotic variances, it affects the performance on finite samples.

The plots in Figure 1 indicate that the asymptotic variance of the LWW estimator is lower than the one obtained using the tapered version of the LWF estimator for most values of the memory parameter, and this difference increases as τ increases in order to adapt to larger ranges of d. The asymptotic variance ρ(d, ℓ) in (66) of the LWW estimator is displayed for ℓ = 7 using the Daubechies wavelets with M = 2 (left) and M = 4 (right). For these choices of wavelets, the corresponding α's are 1.34 and 1.91, and the allowed intervals for d are [−0.84, 2] and [−1.41, 4]. The limiting variance ρ(d, ∞) in (67) for M = 2 and 4 can also be compared to the one of the Shannon wavelet on this plot. The asymptotic variance Φ(τ) in (75) of the LWF estimator is constant in d but increases when the taper order τ increases from τ = 2 to τ = 4, these values corresponding to interval lengths for d close to those of the M = 2 and M = 4 cases. The same remark applies to the pooling order of the GPH estimator: the asymptotic variance decreases as p → ∞, but in practice one takes small values, e.g. p = 3 or 4.

Using wavelets to estimate the memory parameter has several additional benefits compared to using Fourier estimators. The wavelets present a rich time/frequency representation of the process, which can be more informative than that of the classical Fourier analysis, as discussed in Serroukh et al. (2000), Stoev et al. (2006) and Percival and Walden (2006). Wavelets can be used to detect the presence of outliers or jumps in the mean. The short-range dependence of the wavelet coefficients suggests the construction of bootstrap confidence intervals for functionals of the wavelet coefficients, a procedure referred to as wavestrapping. This technique, which is still not rigorously justified, may be used to construct bootstrapped confidence intervals for the memory parameter; see for example Percival et al. (2000).

10. A Monte-Carlo study
In this section, we present some Monte-Carlo simulation results that compare the root-mean-square error performance of our four estimators for finite samples. The four estimators are denoted GPH (Geweke–Porter-Hudak), LWF (local Whittle Fourier), LWW (local Whittle wavelet) and LRW (local regression wavelet). We consider three models and several parameter combinations for each model:

(1) The ARFIMA models, introduced by Granger and Joyeux (1980), and generalized here to any value of the memory parameter d. We considered the ARFIMA(0,d,0) and ARFIMA(1,d,0) with d ∈ {−1.2, …} and sizable lag-1 AR coefficient equal to 0.8. The innovation is assumed to be Gaussian. The short-memory component f∗ of the spectral density satisfies f∗ ∈ H(2, γ, π), where H is defined in Definition 2.

(2) The DARFIMA models, defined in Andrews and Sun (2004): an ARFIMA-like process that has a discontinuity in its spectral density at frequency λ = λ_0. The DARFIMA(1,d,0) has the spectral density of an ARFIMA(1,d,0) on the interval [−λ_0, λ_0], and the spectral density is zero for |λ| ∈ [λ_0, π]. It is obtained by low-pass filtering of an ARFIMA(1,d,0) trajectory by a truncated sinc function in the time domain. We chose λ_0 = π/2.

(3) Processes subordinated to a Gaussian process, X_t = G(Y_t), where {Y_t} is a stationary Gaussian sequence with zero mean and variance 1 and G : ℝ → ℝ is a measurable function such that E[G²(Y_0)] < ∞. Then X_t may be expressed as the sum X_t = c_0 + Σ_{k=k_0}^{∞} (c_k/k!) H_k(Y_t), where H_k(·) is the k-th Hermite polynomial and c_k = E[G(Y_t)H_k(Y_t)]. The minimal integer k ≥ 1 such that c_k ≠ 0 is called the Hermite rank of G. If Y is an M(d_Y) process with memory parameter d_Y ≤ 1/2, then X is also an M(d_X) process with memory parameter d_X = (1 − k_0(1 − 2d_Y))/2 (see (Dalla et al., 2006, p. 229, Eq. (55)) for details). In simulations, we use G(x) = exp(x) (for which k_0 = 1) and G(x) = H_2(x) = x² − 1 (for which k_0 = 2), and denote these models SUBORD1 and SUBORD2, respectively.
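The memory-parameter transformation under subordination can be made concrete with a small sketch, assuming the mapping reads d_X = (1 − k_0(1 − 2d_Y))/2: a Hermite rank k_0 = 1 (as for G(x) = exp(x)) leaves the memory parameter unchanged, while k_0 = 2 (as for G(x) = x² − 1) shrinks it.

```python
def memory_after_subordination(d_y, hermite_rank):
    """Memory parameter of X_t = G(Y_t) from d_Y and the Hermite rank k0 of G
    (assumed mapping d_X = (1 - k0*(1 - 2*d_Y)) / 2)."""
    return (1 - hermite_rank * (1 - 2 * d_y)) / 2

print(memory_after_subordination(0.4, 1))  # rank 1 preserves d: 0.4
print(memory_after_subordination(0.4, 2))  # rank 2 shrinks it: 0.3
```

This is why, at equal d_Y, the SUBORD2 model exhibits weaker long memory than SUBORD1 and is the harder case for all four estimators.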
For the estimators LWW and LWF, we have used a convex minimization procedure for the contrast functions (29) and (39). In all cases, 1000 simulation runs for each value of d are used. This produces simulation standard errors that are roughly 3%.

The tuning parameters of each estimation procedure have been chosen to allow a fair comparison of these methods in a realistic setting, where the order of magnitude of the memory parameter d is only loosely known and where one may be in the presence of high-order polynomial trends. In order to cover all the values of d considered above, we used M = 4 vanishing moments for the wavelet estimators (hence α ≈ 1.91, see Table 1), and we have differenced the series δ = 4 times and used a taper order τ = 5 for the Fourier estimators. The corresponding admissible ranges are (see (80) and (79)) (−1.41, 4] and (−1.5, 4.5). We took p = 4 in Relation (35) defining the pooled periodogram. This reduces the number of frequencies by a factor τ + p = 9 (see Relation (35)). In the case of the LRW estimator, the computation of the optimal weights in the least-squares criterion is numerically quite involved, so we ran the simulations using the weights suggested by Abry and Veitch (1998). The difference in the results becomes significant only when d gets close to the boundaries of the admissible range; see Figure 2 for the asymptotic variance. The remaining free parameters are the number of frequencies (LWF) or blocks of frequencies (GPH with pooling), denoted m_n, and the minimal (i.e. finest) dyadic wavelet scale L_n (for the LRW and LWW estimators); the highest (i.e. coarsest) scale is chosen to be the highest available (U_n = J_n).

The box-and-whisker plots of the estimators are displayed in Figures 4 and 5 for different values of the bandwidth (Fourier methods) and the finest scale (wavelet methods). In Figure 4, the model is an ARFIMA(1,d,0) with d = 1.5. In Figure 5, the model is a DARFIMA(1,d,0) with d = 0.3. The AR coefficient is 0.8 in both cases. These figures illustrate the bias-variance trade-off inherent to semi-parametric methods (the variance decreases as the bandwidth or the number of scales increases, but then the bias increases). In general, the standard deviation of the 1000 runs of the wavelet methods is comparable to that of the Fourier methods.

Tables 3 and 4 give the bias, standard deviation, and RMSE (root mean square error) for models 1, 2 and 3, for sample sizes 512 and 4096, respectively. These quantities are computed for the optimal bandwidth m_n (Fourier methods) or the optimal finest scale L_n (wavelet methods) in the RMSE sense, whose values are displayed in the fourth column. For each model, the lowest RMSE among the four methods appears in boldface. Note that all the possible values of the finest scale L_n have been considered, but only a subset of the many possible values of the bandwidth m_n. The standard deviations of the LWW estimator are lower than those of the LRW estimator, which is consistent with our theoretical findings. Also, the standard deviations of the Fourier methods remain approximately constant for the different values of the memory parameter, whereas the variance of the wavelet methods increases with |d|. Moreover, as predicted by the expressions of the limiting variance, the variances of the wavelet methods are lower than those of the Fourier methods, especially when d is small. The reported values of the standard deviations agree with our theoretical findings for the sample size n = 4096.

For the non-linear processes, the results suggest that the wavelet estimator remains consistent. However, the presence of non-linearity worsens the behavior of the estimator at a given finite sample size, and a larger sample size is required to achieve a prescribed accuracy.

The root mean square error is shown in some particular cases in Figure 6. The MSE of the Fourier criteria is plotted against the value of the bandwidth, that is, m_n for the LWF estimator and m_n × (p + τ) for the GPH estimator. For the wavelet methods, the somewhat arbitrary "equivalent bandwidth" abscissa is half the number of wavelet coefficients used by the estimators: m_n^equiv = (1/2) Σ_{j=L_n}^{U_n} n_j (see the discussion on the comparison of the asymptotic variances in the previous section).

11. Software
The software used to perform the estimation of the long-memory parameter was written in
Matlab/Octave and may be obtained from the authors. It includes the four estimators (LWF, GPH, LWW and LRW), basic random process generators, and some other utilities such as the pyramidal algorithm for computing wavelet coefficients.
Basic installation. After downloading the tar archive (toolboxLRD.tar) and expanding it in e.g. /home/user/octave, one has to add the directory to the search path:

addpath(genpath('/home/user/octave/ToolboxLRD'));

This line can be added to the .octave or .matlab file. Some demos are available in ToolboxLRD/Examples.

Loading the data. Use the load command to load a data set (time series) into some vector, say x. One can also synthesize trajectories using one's personal generator or the one present in the Utils subdirectory. For instance, a 4096-long trajectory of the ARFIMA model (1 − B)^d (X_t − αX_{t−1}) = Z_t with d = 1.4, α = 0.8 and Gaussian white noise Z can be obtained by setting:

n = 4096;
x = randARFIMA(1.4,[0.8],[],n);

The argument [] above means that the MA part of the generated ARFIMA is a weak white noise (MA(0)). This example is used in the following to describe the package.

Estimating the long-memory parameter. We shall now obtain the LRW, LWW, GPH and LWF estimates of the memory parameter d of the series, as well as an estimate of their standard deviation using the asymptotic variance given in Theorems 3, 4, 7 and 8. The standard deviations can be used to build asymptotic confidence intervals.

1. The Geweke–Porter-Hudak (GPH) estimator is obtained as follows:
param.taper=5; param.pooling=4;
param.bandwidth=[6 12 24 50]; param.difforder=4;
[d, stds]=GPH(x,param)

where param.bandwidth is a vector giving the different values of the upper Fourier frequency m on which the regression is to be performed (Theorem 7). The taper order τ, pooling order p and differentiation order δ are specified by param.taper, param.pooling and param.difforder, respectively. One obtains

d = 1.2744 1.2736 1.3670 1.3930
stds = 0.1803 0.1215 0.0840 0.0576

Note that d and stds are vectors. In the above example, they have four components, corresponding respectively to the bandwidths m = 6, 12, 24, 50.

2. The Local Whittle Fourier (LWF) estimator is invoked in the following way:

param.taper=5; param.difforder=4;
param.bandwidth=[50 100 200 500];
[d, stds]=LWF(x,param,[])
One gets:

d = 1.3114 1.2974 1.2905 1.3422
stds = 0.1206 0.0853 0.0603 0.0381

Here the minimization of (39) is done over the whole real line. As for the LWW function, one may specify the range over which to optimize the LWF contrast function (39) by replacing the third argument [] by an interval [∆_1, ∆_2]. For instance, if one wants to restrict this minimization to the set (79) of admissible values of d for the CLT of Theorem 8 to hold:

range = [param.difforder-param.taper-0.5, param.difforder+0.5];
[d, stds]=LWF(x,param,range)

In this specific case, the output is unchanged since the minimizing values of d are within the corresponding interval [−1.5, 4.5].

3. The Local Whittle Wavelet (LWW) estimator is obtained as follows:
LU = [6 9; 5 9; 4 9; 3 9];
[phi, M, alpha] = scalingfilter('Daubechies',4);
[d, stds] = LWW(x,LU,phi,[])

where phi indicates the scaling function, M the corresponding number of vanishing moments, alpha the Fourier decay exponent (see (W-1)–(W-4)), x contains a finite set of observations, and LU is a two-column matrix containing the scale limits L and U in the LWW objective function (29). If LU is a one-column vector, then it contains different values of the lower scale L, and U is taken equal to the maximal available scale index J defined in (22). The argument [] above means that the interval, denoted [∆_1, ∆_2] in Definition (28), over which the contrast function (29) is minimized is (−∞, ∞). It can be replaced by an interval [∆_1, ∆_2] if one wants to restrict the minimization to a particular range, for instance to the one given by (80), which corresponds to admissible values of d for the CLT of Theorem 4 to hold. One gets:

d = 1.3183 1.3366 1.4209 1.3788
stds = 0.1138 0.0719 0.0479 0.0324
4. The Local Regression Wavelet (LRW) estimator and its standard deviation are invoked in the following way:

L = [6;5;4;3];
[d,stds] = LRW(x,L,phi)

The first three arguments of LRW are the same as those of LWW, but LU has been replaced by L, a column vector containing different choices for the lower scale L used in the regression; see Eq. (24). In this case, the upper scale is the largest scale available (U = J_n). If one wants to take different values for U, a two-column matrix must replace L, as for LU in the LWW case. Here the LRW estimator is computed using the Abry-Veitch weights (see (63)) and one gets the output:

d = 1.4161 1.3815 1.4305 1.3882
stds = 0.1020 0.0675 0.0461 0.0317
But the LRW estimator can also be obtained using the optimal weights. In fact, the following additional output variables are available:

(a) the value of a log-regression multiplicative constant c such that σ̂²_j ≈ c 2^{2dj}; equivalently, log₂ σ̂²_j ≈ log₂ c + 2dj, where log₂ c is the intercept of the regression line;
(b) new estimates of d using the two-step procedure based on the optimal weights (61); these weights are computed using the preliminary value of d estimated with the Abry-Veitch weights;
(c) the standard deviations of the new estimates of d;
(d) the corresponding values of the log-regression multiplicative constant.

Thus if the LRW call of the last example is replaced by

[d,stds, c, dopt,stdopt, copt] = LRW(x,L,phi)

the following additional output is added to the previous one:

c = 0.0074 0.0105 0.0067 0.0094
dopt = 1.4199 1.3852 1.4451 1.3834
stdopt = 0.1011 0.0666 0.0453 0.0311
copt = 0.0073 0.0103 0.0060 0.0097
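The log-regression underlying LRW can be mimicked in a few lines. The Python sketch below (illustrative only, not the toolbox code) generates exact scale variances σ²_j = c·2^{2dj} and recovers d as half the slope of an ordinary least-squares fit of log₂ σ²_j on j; the toolbox performs a weighted version of this fit on variances estimated from the wavelet coefficients.

```python
import math

def fit_log2_regression(js, sigma2s):
    """OLS fit of log2(sigma2_j) = log2(c) + 2*d*j; returns (d, c)."""
    ys = [math.log2(s) for s in sigma2s]
    n = len(js)
    jbar = sum(js) / n
    ybar = sum(ys) / n
    slope = sum((j - jbar) * (y - ybar) for j, y in zip(js, ys)) / \
            sum((j - jbar) ** 2 for j in js)
    intercept = ybar - slope * jbar
    return slope / 2, 2 ** intercept

# noise-free variances following the scaling law exactly
d_true, c_true = 0.3, 0.01
js = list(range(3, 9))
sigma2s = [c_true * 2 ** (2 * d_true * j) for j in js]
d_hat, c_hat = fit_log2_regression(js, sigma2s)
print(d_hat, c_hat)  # recovers d = 0.3 and c = 0.01 up to rounding
```

On estimated variances, the choice of regression weights (Abry-Veitch versus optimal) changes the variance of d̂ but not this basic structure.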
Alternatively, one could also use a different preliminary estimate of d, say d = 1.3823, as obtained from the GPH routine above:

[d,stds, c, dopt,stdopt, copt] = LRW(x,L,phi,1.3823)
The last three output values are then replaced by

dopt = 1.4199 1.3852 1.4441 1.3835
stdopt = 0.1011 0.0666 0.0451 0.0310
copt = 0.0073 0.0103 0.0061 0.0097
Obtaining confidence intervals.
A routine for obtaining the asymptotically normal confidence intervals for d at a given level has also been included. It works as follows:

p=0.95; d=dopt; stds=stdopt;
[I]=ConfidenceInterval(d,stds,p)

where d and stds are any outputs of the above procedures and p is the confidence level. Here we used the last displayed estimates with p = 0.95 and get

I = 1.2217 1.6180
    1.2546 1.5158
    1.3557 1.5325
    1.3227 1.4443

In this particular example we can see that the true value of d, namely 1.4, belongs to the four intervals.
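Since the estimators are asymptotically normal, each interval is simply d̂ ± z_{(1+p)/2}·std. The exact internals of ConfidenceInterval are not shown, so the Python version below is an assumed reimplementation; it reproduces the first row of I above from dopt = 1.4199 and stdopt = 0.1011 up to rounding:

```python
from statistics import NormalDist

def confidence_interval(d, std, p=0.95):
    """Two-sided asymptotic interval d +/- z_{(1+p)/2} * std."""
    z = NormalDist().inv_cdf((1 + p) / 2)
    return (d - z * std, d + z * std)

lo, hi = confidence_interval(1.4199, 0.1011, 0.95)
print(round(lo, 4), round(hi, 4))  # close to (1.2217, 1.6180) reported above
```

Note that std here is the plug-in asymptotic standard deviation returned by the estimation routines, so the interval inherits the bias of the bandwidth/scale choice: an overly large bandwidth can center the interval away from the true d even though its width is correct.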
Obtaining the theoretical asymptotic variances.
It is also possible to obtain directly the asymptotic variances ρ_opt(d, ℓ) and ρ(d, ℓ) defined in (62) and (66). These are the asymptotic variances of the LRW estimator (Theorem 3) when, respectively, the optimal weights (61) or the Abry-Veitch weights (63) are chosen. The asymptotic variance of the LWW estimator is also ρ(d, ℓ) (Theorem 4). The approximation of ρ(d, ℓ) given in (70), obtained by replacing ψ by the Shannon wavelet, is also available. This is how to get these asymptotic variances:

d=1.4; l=5;
[v, vs, vopt, wopt]= AsymptoticVarianceLRW(phi,d,l)

Here ρ(d, ℓ) and ρ_opt(d, ℓ) are computed for d = 1.4 and ℓ = 5 and stacked in the outputs v and vopt, respectively. The output vs corresponds to the Shannon approximation (70) and wopt to the optimal weights (61), of length ℓ + 1 = 6. For these values one gets

v = 0.5848
vs = 0.4949
vopt = 0.5698
wopt = -0.2693 0.0546 0.0827 0.0587 0.0410 0.0322

We observe that for this value of d the optimal variance (0.5698) is still well above the Shannon approximation (0.4949). Since the asymptotic variance converges to that of the Shannon wavelet as the number M of vanishing moments of the Daubechies wavelet increases, this indicates that one could get a better variance by increasing M, here M = 2. Notice, however, that the length of the wavelet filters would also increase, and thus the number of available wavelet coefficients would decrease for a finite n, an effect which is not considered in the asymptotic variance; see Section 9.

12. Conclusion
We have compared four semi-parametric methods for the estimation of the long-memory parameter d in time series, two Fourier-based and two wavelet-based. These are the Geweke–Porter-Hudak (GPH) [regression/Fourier], local Whittle Fourier (LWF) [Whittle/Fourier], local regression wavelet (LRW) [regression/wavelets] and local Whittle wavelet (LWW) [Whittle/wavelets] estimators. We have discussed issues related to differencing, tapering and pooling in the case of Fourier-based estimators, and choices of wavelets in the case of wavelet-based estimators. Conditions for the asymptotic normality of the estimators are specified in Theorems 3, 4, 7 and 8. We have undertaken a Monte-Carlo comparison. In the Monte-Carlo study, we have focused on ARFIMA(0,d,0) and ARFIMA(1,d,0) models with an AR(1) parameter equal to 0.8, a relatively high value, as well as on the DARFIMA and subordinated models defined in Section 10. All four methods appear to work well, with similar performances at the optimal bandwidth or lower scale index. We have also developed a software package for the benefit of the practitioner which computes the corresponding estimates of the long-memory parameter d and provides confidence intervals based on the asymptotically normal distribution of the estimators. We noted that the LRW estimator with Abry-Veitch weights (63) has the same asymptotic variance as the LWW estimator. This means that the LRW estimator, when used with the optimal weights (61), has a smaller asymptotic variance than the LWW estimator.

References

Abadir, K., Distaso, W. and
Giraitis, L. (2007). Nonstationarity-extended local Whittle estimation.
J. of Econometrics
Abry, P. , Flandrin, P. , Taqqu, M. S. and
Veitch, D. (2000). Wavelets for the analysis, estimationand synthesis of scaling data. In
Self-Similar Network Traffic and Performance Evaluation (K. Parkand W. Willinger, eds.). Wiley (Interscience Division), New York.
Abry, P. , Flandrin, P. , Taqqu, M. S. and
Veitch, D. (2003). Self-similarity and long-range depen-dence through the wavelet lens. In
Theory and Applications of Long-range Dependence (P. Doukhan,G. Oppenheim and M. S. Taqqu, eds.). Birkh¨auser, 527–556.
Abry, P. and
Veitch, D. (1998). Wavelet analysis of long-range-dependent traffic.
IEEE Trans. Inform.Theory Andrews, D. W. K. and
Guggenberger, P. (2003). A bias-reduced log-periodogram regression esti-mator for the long-memory parameter.
Econometrica
Andrews, D. W. K. and
Sun, Y. (2004). Adaptive local polynomial Whittle estimation of long-rangedependence.
Econometrica Bardet, J.-M. (2000). Testing for the presence of self-similarity of Gaussian time series having stationaryincrements.
Journal of Time Series Analysis Bardet, J.-M. (2002). Statistical study of the wavelet analysis of fractional Brownian motion.
IEEETrans. Inform. Theory Bardet, J.-M. , Lang, G. , Moulines, E. and
Soulier, P. (2000). Wavelet estimator of long-rangedependent processes.
Stat. Inference Stoch. Process. Cohen, A. (2003).
Numerical analysis of wavelet methods , vol. 32 of
Studies in Mathematics and itsApplications . North-Holland Publishing Co., Amsterdam.
Dalla, V. , Giraitis, L. and
Hidalgo, J. (2006). Consistent estimation of the memory parameter fornonlinear time series.
J. Time Ser. Anal. Daubechies, I. (1992).
Ten lectures on wavelets , vol. 61 of
CBMS-NSF Regional Conference Series inApplied Mathematics . Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.
Deo, R. , Hsieh, M. , Hurvich, C. M. and
Soulier, P. (2006a). Long memory in nonlinear processes.In
Dependence in probability and statistics , vol. 187 of
Lecture Notes in Statist.
Springer, New York,221–244.
Deo, R. , Hurvich, C. M. and
Lu, Y. (2006b). Forecasting realized volatility using a long-memorystochastic volatility model: estimation, prediction and seasonal adjustment.
J. Econometrics
Fa¨y, G. , Moulines, E. and
Soulier, P. (2004). Edgeworth expansions for linear statistics of possiblylong-range-dependent linear processes.
Statist. Probab. Lett. Fa¨y, G. , Roueff, F. and
Soulier, P. (2007). Estimation of the memory parameter of the infinite-sourcePoisson process.
Bernoulli Fox, R. and
Taqqu, M. S. (1986). Large-sample properties of parameter estimates for strongly dependentstationary Gaussian time series.
Ann. Statist. Geweke, J. and
Porter-Hudak, S. (1983). The estimation and application of long memory time seriesmodels.
J. Time Ser. Anal. Giraitis, L. , Robinson, P. M. and
Samarov, A. (1997). Rate optimal semiparametric estimation ofthe memory parameter of the Gaussian time series with long range dependence.
J. Time Ser. Anal. Granger, C. W. J. and
Joyeux, R. (1980). An introduction to long-memory time series models andfractional differencing.
J. Time Ser. Anal. Hannan, E. J. and
Nicholls, D. F. (1977). The estimation of the prediction error variance.
J. Amer.Statist. Assoc. Hurvich, C. M. and
Chen, W. W. (2000). An efficient taper for potentially overdifferenced long-memorytime series.
J. Time Ser. Anal. Hurvich, C. M. , Lang, G. and
Soulier, P. (2005a). Estimation of long memory in the presence of asmooth nonparametric trend.
J. Amer. Statist. Assoc.
Hurvich, C. M. , Moulines, E. and
Soulier, P. (2002). The FEXP estimator for potentially non-stationary linear time series.
Stoch. Proc. App. Hurvich, C. M. , Moulines, E. and
Soulier, P. (2005b). Estimating long memory in volatility.
Econo-metrica Hurvich, C. M. and
Ray, B. K. (1995). Estimation of the memory parameter for nonstationary ornoninvertible fractionally integrated processes.
J. Time Ser. Anal. Johnson, N. L. and
Kotz, S. (1970).
Distributions in statistics. Continuous univariate distributions. 2.
Houghton Mifflin Co., Boston, Mass.
Kaplan, L. M. and
Kuo, C.-C. J. (1993). Fractal estimation from noisy data via discrete fractionalGaussian noise (DFGN) and the Haar basis.
IEEE Trans. Signal Process. Künsch, H. (1987). Statistical aspects of self-similar processes. In
Proceedings of the 1st World Congressof the Bernoulli Society, Vol. 1 (Tashkent, 1986) . VNU Sci. Press, Utrecht.
Lahiri, S. N. (2003). A necessary and sufficient condition for asymptotic independence of discrete Fouriertransforms under short- and long-range dependence.
Ann. Statist. Mallat, S. (1998).
A wavelet tour of signal processing . Academic Press Inc., San Diego, CA.
McCoy, E. J. and
Walden, A. T. (1996). Wavelet analysis and synthesis of stationary long-memoryprocesses.
J. Comput. Graph. Statist. Moulines, E. , Roueff, F. and
Taqqu, M. S. (2007a). Central Limit Theorem for the log-regressionwavelet estimation of the memory parameter in the Gaussian semi-parametric context. To appear.
Moulines, E. , Roueff, F. and
Taqqu, M. S. (2007b). On the spectral density of the wavelet coefficientsof long memory time series with application to the log-regression estimation of the memory parameter.
J. Time Ser. Anal. .URL http://arxiv.org/abs/math.ST/0512635 Moulines, E. , Roueff, F. and
Taqqu, M. S. (2007c). A wavelet Whittle estimator of the mem-ory parameter of a non-stationary Gaussian time series. Tech. rep., Ecole Nationale Sup´erieure desT´el´ecommunications et Boston University. To appear in the Annals of Statistics.URL http://arxiv.org/abs/math.ST/0601070
Moulines, E. and
Soulier, P. (2003). Semiparametric spectral estimation for fractional processes. In
Theory and applications of long-range dependence . Birkh¨auser Boston, Boston, MA, 251–301.
Percival, D. B. , Sardy, S. and
Davison, A. C. (2000). Wavestrapping time series: adaptive wavelet-based bootstrapping. In
Nonlinear and nonstationary signal processing (Cambridge, 1998) . CambridgeUniv. Press, Cambridge, 442–471.
Percival, D. B. and
Walden, A. T. (2006).
Wavelet methods for time series analysis , vol. 4 of
Cambridge Series in Statistical and Probabilistic Mathematics . Cambridge University Press, Cambridge.Reprint of the 2000 original [MR1770693].
Robinson, P. M. (1994). Efficient tests of nonstationary hypotheses.
J. Amer. Statist. Assoc. obinson, P. M. (1995a). Gaussian semiparametric estimation of long range dependence. Ann. Statist. Robinson, P. M. (1995b). Log-periodogram regression of time series with long range dependence.
TheAnnals of Statistics Robinson, P. M. and
Henry, M. (2003). Higher-order kernel semiparametric M -estimation of longmemory. J. Econometrics
Roueff, F. and
Taqqu, M. S. (2007). Asymptotic normality of wavelet estimators of the memoryparameter: the linear case. Tech. rep.
Samorodnitsky, G. and
Taqqu, M. S. (1994).
Stable non-Gaussian processes: stochastic models withinfinite variance . Chapman and Hall.
Serroukh, A. , Walden, A. T. and
Percival, D. B. (2000). Statistical properties and uses of thewavelet variance estimator for the scale analysis of time series.
J. Amer. Statist. Assoc. Shimotsu, K. and
Phillips, P. C. B. (2005). Exact local Whittle estimation of fractional integration.
Ann. Statist. Shimotsu, K. and
Phillips, P. C. B. (2006). Local Whittle estimation of fractional integration andsome of its variants.
J. Econometrics
Stoev, S. , Taqqu, M. S. , Park, C. , Michailidis, G. and
Marron, J. S. (2006). LASS: a tool for thelocal analysis of self-similarity.
Comput. Statist. Data Anal. Tanaka, K. (1999). The nonstationary fractional unit root.
Econometric Theory Teyssi`ere, G. and
Abry, P. (2007). Wavelet analysis of nonlinear long-range dependent processes.Applications to financial time series. In
Long memory in economics . Springer, Berlin, 173–238.
Veitch, D. and
Abry, P. (1999). A wavelet-based joint estimator of the parameters of long-rangedependence.
IEEE Trans. Inform. Theory Veitch, D. , Abry, P. and
Taqqu, M. S. (2003). On the automatic selection of the onset of scaling.
Fractals Veitch, D. , Taqqu, M. S. and
Abry, P. (2000). Meaningful MRA initialisation for discrete time series.
Signal Processing Velasco, C. (1999a). Gaussian semiparametric estimation of non-stationary time series.
J. Time Ser.Anal. Velasco, C. (1999b). Non-stationary log-periodogram regression.
J. Econometrics Velasco, C. (2000). Non-Gaussian log-periodogram regression.
Econometric Theory Velasco, C. and
Robinson, P. M. (2000). Whittle pseudo-maximum likelihood estimation for nonsta-tionary time series.
J. Am. Statist. Assoc. Wornell, G. W. and
Oppenheim, A. V. (1992). Estimation of fractal signals from noisy measurementsusing wavelets.
IEEE Trans. Signal Process.
611 – 623. ˇZurbenko, I. (1979). On the efficiency of estimates of a spectral density.
Scand. J. Statist.

Table 3. Length of the time series = 512. Bias, standard deviation, root mean-square error and optimal bandwidth/minimal scale for the four estimators (GPH, LWF, LRW, LWW) applied to the three models of Section 10. The lowest RMSE among the four methods appears in boldface.

Table 4. Same as Table 3 with length of the time series = 4096.
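The bias, standard deviation and RMSE columns of Tables 3 and 4 are linked by the usual decomposition MSE = bias² + variance, so RMSE = sqrt(bias² + std²). A minimal sketch checking this on one reported (bias, std, RMSE) triple from Table 3; the helper `rmse` is ours, not part of the paper's software:

```python
import math

def rmse(bias: float, std: float) -> float:
    """Root mean-square error from bias and standard deviation.

    MSE decomposes as squared bias plus variance, hence
    RMSE = sqrt(bias^2 + std^2).
    """
    return math.sqrt(bias ** 2 + std ** 2)

# One reported LWF triple in Table 3: bias = -0.108, std = 0.071, RMSE = 0.129
print(round(rmse(-0.108, 0.071), 3))  # 0.129, matching the tabulated value
```

The same identity explains why the estimator with the smallest bias need not have the smallest RMSE: a larger standard deviation can dominate.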
Laboratoire Paul-Painlevé, Université Lille-1, 59655 Villeneuve-d'Ascq Cedex, France.
Current address: Laboratoire APC, Université Paris-7, Bâtiment Condorcet, 10, rue Alice Domon et Léonie Duquet, 75205 Paris Cedex 13, France.
E-mail address: [email protected]

LTCI (CNRS, TELECOM ParisTech), 46, rue Barrault, 75634 Paris Cedex 13, France.
E-mail address: [email protected]
E-mail address: [email protected]

Department of Mathematics and Statistics, Boston University, Boston, MA 02215, USA.
E-mail address: [email protected]
Figure 1.
Comparison of the asymptotic variances of the LWF and LWW estimators as functions of d. The dot/dash line displays the variance (75) of the LWF estimator with taper order τ; the plain curve displays the variance ρ(d, ℓ) in (66), with ℓ = 7, of the LWW estimator using Daubechies wavelets of order M; the dotted curve displays the variance (67) of the LWW estimator using the ideal Shannon wavelet. Left panel: τ = 2 for the LWF, M = 2 for the LWW. Right panel: τ = 4 for the LWF, M = 4 for the LWW.

Figure 2.
Comparison of the asymptotic variance of the LRW estimator using Abry-Veitch weights given by (63) with that of the LRW estimator using optimal weights given by (61). We plot the difference of the two variances ρ(d, ℓ) as a function of d, with ℓ = 7, using Daubechies wavelets for two different values of M. Left panel: M = 2. Right panel: M = 4.

Figure 3.
Comparison of the asymptotic variance (66) of the LRW estimator using Abry-Veitch weights given by (63) with that of the LRW estimator using optimal weights given by (61). We plot ρ(d, ℓ) as a function of d for successive values of ℓ starting at ℓ = 1. Left panel: M = 2. Right panel: M = 4.
Figure 4. Distribution of the estimators for an ARFIMA model. GPH (taper order = 5, pooling order = 4) and LWF (taper order = 5) are plotted against the number of Fourier frequencies used (27 to 2016); LRW and LWW (Daubechies 8, largest scale = 9) are plotted against the finest scale used, with the equivalent number of frequencies in parentheses.
Figure 5. Distribution of the estimators for a DARFIMA model. GPH (taper order = 5, pooling order = 4) and LWF (taper order = 5) are plotted against the number of Fourier frequencies used (27 to 2016); LRW and LWW (Daubechies 8, largest scale = 9) are plotted against the finest scale used, with the equivalent number of frequencies in parentheses.

Figure 6.
MSE comparison: mean square error as a function of the equivalent bandwidth for the four estimators (GPH, LWF, LRW, LWW), plotted against the memory parameter d. Top panel: ARFIMA(1, d, 0) model.