Finite sample inference for generic autoregressive models
Hien Duy Nguyen∗

September 24, 2020
Department of Mathematics and Statistics, La Trobe University, Bundoora 3086, Australia
Abstract
Autoregressive stationary processes are fundamental modeling tools in time series analysis. Conducting inference for such models usually requires asymptotic limit theorems. We establish finite sample valid tools for hypothesis testing and confidence set construction in such settings. Further results are established in the always-valid and sequential inference framework.
Let $X = (X_t)_{t \in \mathbb{Z}}$ be a stationary stochastic process, where $X_t \in \mathbb{X} \subseteq \mathbb{R}^d$ for each $t \in \mathbb{Z}$ ($d \in \mathbb{N}$), and let $\mathcal{F}_t = \sigma(X_t, X_{t-1}, \dots)$ be the $\sigma$-algebra generated by $X_t, X_{t-1}, \dots$. We assume that $X$ arises from a generic $p$th order ($p \in \mathbb{N}$) parametric autoregressive process, in the sense that for each $A \subseteq \mathbb{X}$,
$$\Pr(X_t \in A \mid \mathcal{F}_{t-1}) = \int_A f_\theta(x_t \mid X_{t-p}^{t-1}) \, \mathrm{d}x_t.$$
Here, $X_r^s = (X_r, X_{r+1}, \dots, X_s)$ ($r \le s$) and $f_\theta(x_t \mid x_{t-p}^{t-1})$ is a probability density function of $x_t$, conditional on the event $X_{t-p}^{t-1} = x_{t-p}^{t-1}$, characterized by the parameter $\theta \in \Theta$.

Suppose that we observe a subsequence $X_T = (X_t)_{t \in [T]}$ ($[T] = \{1, \dots, T\}$; $T \in \mathbb{N}$), arising from a data generating process (DGP) characterized by an unknown parameter $\theta^*$, but with known conditional PDF $f_\theta$. Typically, we then wish to use $X_T$ to estimate $\theta^*$, to construct confidence sets with specified probabilities of containing $\theta^*$, and to test hypotheses regarding the value of $\theta^*$.

Although difficult, the problem of parametric estimation is largely resolvable via optimization of well-constructed functions of the data; see, for example, the comprehensive treatment of De Gooijer (2017, Ch. 6). The problems of confidence set construction and hypothesis testing are generally addressed via limit theorems or resampling methods for dependent processes, as described in Davidson (1994) and Potscher & Prucha (1997), and Politis et al. (1999) and Lahiri (2003), respectively.

In some simple cases, concentration inequalities are applicable for the derivation of finite sample inference tools. For example, in the simple case of the univariate first order autoregressive model with normal noise, finite sample results have been derived in Vovk (2007), Bercu et al. (2015, Sec. 4.1), and Bercu & Touati (2019).

In Wasserman et al.
(2020), the authors consider the construction of estimator agnostic and finite sample valid confidence sets and hypothesis tests for independent data, using the martingale property of the generalized likelihood ratio statistic (cf. de la Pena et al., 2009, Sec. 17.1). Following a note regarding conditional likelihoods, we extend the results of Wasserman et al. (2020) to address the problem of estimator agnostic and finite sample valid inference for data arising from generic autoregressive DGPs. This therefore contributes to the developing literature regarding finite sample inference for time series models.

We present two sets of results. The first set uses Markov's inequality in order to construct confidence sets and tests via a split data setup. The second set of results uses the maximal inequality of Ville (cf. Howard et al., 2020a, Lem. 1), in order to construct always-valid sequential confidence sets and tests. These constructions are closely related to the test martingales of Shafer et al. (2011) and the recently popular notion of e-values (see, e.g., Shafer & Vovk, 2019, Ch. 10 and Vovk, 2020).

The paper proceeds as follows. In Sections 2 and 3, we present split data inference tools and always-valid inference tools, respectively. Proofs of main results are provided in Section 4. Further technical results are provided in the Appendix.

∗Email: [email protected].

2 Split data inference
Let $T = T_1 + T_2$, where $T_1, T_2 \in \mathbb{N}$ and $T_1 \ge p$. Suppose that $X$ arises from a DGP, characterized by an unknown parameter $\theta^* \in \Theta$, and let $\tilde{\theta}_{T_1}$ be an arbitrary estimator of $\theta^*$, obtained from $X_{T_1}$. For arbitrary $\theta$, define the conditional likelihood based on $X_{T_1+1}^T$ by
$$L_{T_1+1}^T(\theta) = \prod_{t=T_1+1}^{T} f_\theta(X_t \mid X_{t-p}^{t-1}). \quad (1)$$
Using (1), we define the split data conditional likelihood ratio statistic (CLRS) by
$$U_{T_1+1}^T(\theta) = L_{T_1+1}^T(\tilde{\theta}_{T_1}) / L_{T_1+1}^T(\theta).$$
We shall firstly consider confidence sets of the form
$$C_{T_1+1}^T(\alpha) = \{\theta \in \Theta : U_{T_1+1}^T(\theta) \le 1/\alpha\},$$
for $\alpha \in [0,1]$. Let $E_{\theta^*}$ and $\Pr_{\theta^*}$ denote the expectation and probability operators, evaluated under the assumption that the DGP of $X$ is characterized by the parameter of value $\theta^*$. The following result establishes the finite sample validity of such confidence constructions.

Proposition 1.
For each $\alpha \in [0,1]$, $T_1 \ge p$, and $T_2 \in \mathbb{N}$, $C_{T_1+1}^T(\alpha)$ is a finite sample valid
$100(1-\alpha)\%$ confidence set, in the sense that $\Pr_{\theta^*}(\theta^* \in C_{T_1+1}^T(\alpha)) \ge 1 - \alpha$.

Next, we shall consider the problem of testing hypotheses of the form:
$$\mathrm{H}_0 : \theta^* \in \Theta_0 \text{ and } \mathrm{H}_1 : \theta^* \notin \Theta_0, \quad (2)$$
where $\mathrm{H}_0$ and $\mathrm{H}_1$ denote the null and alternative hypotheses, respectively, and $\Theta_0 \subset \Theta$ is a composite null set. Define the maximum conditional likelihood estimator based on $X_{T_1+1}^T$, under $\mathrm{H}_0$, by
$$\hat{\theta}_{T_1+1}^T \in \arg\max_{\theta \in \Theta_0} L_{T_1+1}^T(\theta), \quad (3)$$
and test (2) via the split data conditional likelihood ratio test (CLRT) rule:
$$\text{Reject } \mathrm{H}_0 \text{ if } V_{T_1+1}^T > 1/\alpha, \quad (4)$$
where
$$V_{T_1+1}^T = L_{T_1+1}^T(\tilde{\theta}_{T_1}) / L_{T_1+1}^T(\hat{\theta}_{T_1+1}^T).$$
We have the following result regarding the finite sample validity of the split data CLRT.

Proposition 2.
For each $\alpha \in [0,1]$, $T_1 \ge p$ and $T_2 \in \mathbb{N}$, the split data CLRT, defined by (4), controls the Type I error at the significance level $\alpha$, in the sense that
$$\sup_{\theta^* \in \Theta_0} \Pr_{\theta^*}(V_{T_1+1}^T > 1/\alpha) \le \alpha.$$

Remark 1. Instead of (4), we can also use the duality between confidence sets and tests (see, e.g., Thm 2.3 of Hochberg & Tamhane, 1987, Appendix 1) to test the hypotheses (2). That is, we can test hypotheses (2) using the rule:
$$\text{Reject } \mathrm{H}_0 \text{ if } \Theta_0 \cap C_{T_1+1}^T(\alpha) = \emptyset. \quad (5)$$
Rule (5) replaces the optimization problem of computing (3) by potential complications regarding the derivation of the set intersection $\Theta_0 \cap C_{T_1+1}^T(\alpha)$. Since both tests correctly control the Type I error, the choice between the alternatives is a matter of practicality.

3 Always-valid inference
We now consider the sampling of the elements of the subsequence $X_T$, characterized by the DGP parameterized by $\theta^* \in \Theta$, one at a time. For $T > p$, let $\tilde{\theta}_T$ be a non-anticipatory estimator of $\theta^*$ (i.e., $\tilde{\theta}_T$ is only dependent on $X_T$).

We wish to use the sequence of estimators $(\tilde{\theta}_T)_{T > p}$ to sequentially test the hypotheses (2) and construct confidence sets for $\theta^*$. At any time $T > p$, define the running CLRT by the rule:
$$\text{Reject } \mathrm{H}_0 \text{ if } M_T > 1/\alpha, \quad (6)$$
where
$$M_T = \frac{\prod_{t=p+1}^T f_{\tilde{\theta}_{t-1}}(X_t \mid X_{t-p}^{t-1})}{\prod_{t=p+1}^T f_{\hat{\theta}_T}(X_t \mid X_{t-p}^{t-1})},$$
and
$$\hat{\theta}_T \in \arg\max_{\theta \in \Theta_0} \prod_{t=p+1}^T f_\theta(X_t \mid X_{t-p}^{t-1}).$$
We shall also define $M_T = 1$, for all $0 \le T \le p$.

Let $\tau_{\theta^*}$ denote the time at which the sequence of tests stops, under rejection rule (6), when the DGP of $X$ is characterized by the parameter $\theta^*$. The following result establishes that $\tau_{\theta^*}$ is finite with probability no greater than $\alpha$.

Proposition 3.
The running CLRT, defined by (6), has Type I error at most $\alpha$. That is,
$$\sup_{\theta^* \in \Theta_0} \Pr_{\theta^*}(\tau_{\theta^*} < \infty) \le \alpha,$$
for each $\alpha \in [0,1]$.

Let $P_T = 1/M_T$ and $\tilde{P}_T = \min_{t \le T} \{1/M_t\}$ be $p$-values for the test (2), and let $T \in \mathbb{N}$ be a random variable. Then, the randomly indexed $p$-values $P_T$ and $\tilde{P}_T$ are both valid.

Proposition 4.
For any random $T \in \mathbb{N}$, not necessarily a stopping time, $P_T$ and $\tilde{P}_T$ are valid, in the sense that
$$\sup_{\theta^* \in \Theta_0} \Pr_{\theta^*}(P_T \le \alpha) \le \alpha, \text{ and } \sup_{\theta^* \in \Theta_0} \Pr_{\theta^*}(\tilde{P}_T \le \alpha) \le \alpha,$$
for all $\alpha \in [0,1]$.

Let
$$D_T(\alpha) = \{\theta \in \Theta : R_T(\theta) \le 1/\alpha\},$$
where
$$R_T(\theta) = \frac{\prod_{t=p+1}^T f_{\tilde{\theta}_{t-1}}(X_t \mid X_{t-p}^{t-1})}{\prod_{t=p+1}^T f_\theta(X_t \mid X_{t-p}^{t-1})},$$
for $T > p$, and $R_T(\theta) = 1$, for $T \le p$. Further, let $\tilde{D}_T(\alpha) = \bigcap_{t=1}^T D_t(\alpha)$. We have the fact that $(D_T(\alpha))_{T \in \mathbb{N}}$ and $(\tilde{D}_T(\alpha))_{T \in \mathbb{N}}$ are sequences of confidence sets that are all simultaneously valid.

Proposition 5.
For any $\alpha \in [0,1]$, the confidence sequences $(D_T(\alpha))_{T \in \mathbb{N}}$ and $(\tilde{D}_T(\alpha))_{T \in \mathbb{N}}$ are valid, in the sense that
$$\Pr_{\theta^*}(\forall T \in \mathbb{N} : \theta^* \in D_T(\alpha)) \ge 1 - \alpha \text{ and } \Pr_{\theta^*}(\forall T \in \mathbb{N} : \theta^* \in \tilde{D}_T(\alpha)) \ge 1 - \alpha.$$

4 Proofs

The following lemma provides the basis for the proofs of Propositions 1 and 2.
Lemma 1.
For each $T_1 \ge p$ and $T_2 \in \mathbb{N}$, $E_{\theta^*}[U_{T_1+1}^T(\theta^*)] = 1$.

Proof. Let $X_{t-p}^{t-1} = x_{t-p}^{t-1}$, for $t > T_1 + p$, and $X_{t-p}^{t-1} = (X_{t-p}, \dots, X_{T_1}, x_{T_1+1}, \dots, x_{t-1})$, for $t \le T_1 + p$. Write
$$E_{\theta^*}[U_{T_1+1}^T(\theta^*) \mid \mathcal{F}_{T_1}] = \int_{\mathbb{X}^{T_2}} \frac{\prod_{t=T_1+1}^T f_{\tilde{\theta}_{T_1}}(x_t \mid X_{t-p}^{t-1})}{\prod_{t=T_1+1}^T f_{\theta^*}(x_t \mid X_{t-p}^{t-1})} \prod_{t=T_1+1}^T f_{\theta^*}(x_t \mid X_{t-p}^{t-1}) \, \mathrm{d}x_{T_1+1}^T = \int_{\mathbb{X}^{T_2}} \prod_{t=T_1+1}^T f_{\tilde{\theta}_{T_1}}(x_t \mid X_{t-p}^{t-1}) \, \mathrm{d}x_{T_1+1}^T \overset{\text{(i)}}{=} \int_{\mathbb{X}} \cdots \int_{\mathbb{X}} f_{\tilde{\theta}_{T_1}}(x_{T_1+1} \mid X_{T_1+1-p}^{T_1}) \, \mathrm{d}x_{T_1+1} \cdots f_{\tilde{\theta}_{T_1}}(x_T \mid X_{T-p}^{T-1}) \, \mathrm{d}x_T \overset{\text{(ii)}}{=} 1,$$
where (i) is due to Tonelli's Theorem and (ii) is due to the definition of a conditional PDF. Then, we apply the law of iterated expectations to obtain the desired result:
$$E_{\theta^*}[U_{T_1+1}^T(\theta^*)] = E_{\theta^*}[E_{\theta^*}[U_{T_1+1}^T(\theta^*) \mid \mathcal{F}_{T_1}]] = 1.$$

Proof of Proposition 1. For any $\theta^* \in \Theta$, we have
$$\Pr_{\theta^*}(\theta^* \notin C_{T_1+1}^T(\alpha)) = \Pr_{\theta^*}(U_{T_1+1}^T(\theta^*) > 1/\alpha) \overset{\text{(i)}}{\le} \alpha E_{\theta^*}[U_{T_1+1}^T(\theta^*)] \overset{\text{(ii)}}{=} \alpha,$$
where (i) is due to Markov's inequality and (ii) is due to Lemma 1.

Proof of Proposition 2. For any $\theta^* \in \Theta_0$, we have
$$\Pr_{\theta^*}(V_{T_1+1}^T > 1/\alpha) \overset{\text{(i)}}{\le} \alpha E_{\theta^*}[V_{T_1+1}^T] \overset{\text{(ii)}}{\le} \alpha E_{\theta^*}[U_{T_1+1}^T(\theta^*)] \overset{\text{(iii)}}{=} \alpha,$$
where (i) is due to Markov's inequality, and (ii) is due to the fact that
$$L_{T_1+1}^T(\hat{\theta}_{T_1+1}^T) \ge L_{T_1+1}^T(\theta^*),$$
by definition of (3), for all $\theta^* \in \Theta_0$. Finally, (iii) is obtained by Lemma 1.

Let
$$M_T^* = \frac{\prod_{t=p+1}^T f_{\tilde{\theta}_{t-1}}(X_t \mid X_{t-p}^{t-1})}{\prod_{t=p+1}^T f_{\theta^*}(X_t \mid X_{t-p}^{t-1})},$$
where we define $M_T^* = 1$, for each $0 \le T \le p$. Firstly, we wish to establish that $(M_T^*)_{T \in \mathbb{N} \cup \{0\}}$ is a martingale, adapted to the natural filtration $(\mathcal{F}_T)_{T \in \mathbb{N} \cup \{0\}}$, where $\mathcal{F}_T = \sigma(X_T, \dots, X_1)$.

Lemma 2.
For each $T \in \mathbb{N}$, $E_{\theta^*}[M_T^* \mid \mathcal{F}_{T-1}] = M_{T-1}^*$.

Proof. We firstly prove the result for
$T > p + 1$. Write
$$E_{\theta^*}[M_T^* \mid \mathcal{F}_{T-1}] = \int_{\mathbb{X}} \frac{\prod_{t=p+1}^{T-1} f_{\tilde{\theta}_{t-1}}(X_t \mid X_{t-p}^{t-1})}{\prod_{t=p+1}^{T-1} f_{\theta^*}(X_t \mid X_{t-p}^{t-1})} \frac{f_{\tilde{\theta}_{T-1}}(x_T \mid X_{T-p}^{T-1})}{f_{\theta^*}(x_T \mid X_{T-p}^{T-1})} f_{\theta^*}(x_T \mid X_{T-p}^{T-1}) \, \mathrm{d}x_T = \frac{\prod_{t=p+1}^{T-1} f_{\tilde{\theta}_{t-1}}(X_t \mid X_{t-p}^{t-1})}{\prod_{t=p+1}^{T-1} f_{\theta^*}(X_t \mid X_{t-p}^{t-1})} \int_{\mathbb{X}} f_{\tilde{\theta}_{T-1}}(x_T \mid X_{T-p}^{T-1}) \, \mathrm{d}x_T \overset{\text{(i)}}{=} M_{T-1}^*,$$
where (i) is due to the definition of a conditional PDF. By definition of $M_T^*$, the result also holds for $T \le p + 1$, as required.

Proof of Proposition 3. By Lemmas 2 and 3, for any $\alpha > 0$, we have
$$\Pr_{\theta^*}(\exists T \in \mathbb{N} : M_T^* \ge 1/\alpha) \le \alpha M_0^*.$$
Since
$$\{\tau_{\theta^*} = \infty\} = \{\forall T \in \mathbb{N} : M_T < 1/\alpha\},$$
we have
$$\Pr_{\theta^*}(\tau_{\theta^*} < \infty) = \Pr_{\theta^*}(\exists T \in \mathbb{N} : M_T \ge 1/\alpha) \overset{\text{(i)}}{\le} \Pr_{\theta^*}(\exists T \in \mathbb{N} : M_T^* \ge 1/\alpha) \le \alpha M_0^* \overset{\text{(ii)}}{=} \alpha,$$
where (i) is due to the fact that
$$\prod_{t=p+1}^T f_{\hat{\theta}_T}(X_t \mid X_{t-p}^{t-1}) \ge \prod_{t=p+1}^T f_{\theta^*}(X_t \mid X_{t-p}^{t-1}),$$
for every $\theta^* \in \Theta_0$, and (ii) is by definition of $M_0^*$.

Proof of Proposition 4. For the case of $P_T$, we apply Lemma 4, together with the fact that
$$\{\exists T \in \mathbb{N} : M_T \ge 1/\alpha\} = \{\exists T \in \mathbb{N} : P_T \le \alpha\} = \bigcup_{T=1}^{\infty} \{P_T \le \alpha\}.$$
Then, we obtain the result for $\tilde{P}_T$ using the fact that
$$\{\tilde{P}_T \le \alpha\} = \bigcup_{t=1}^T \{P_t \le \alpha\},$$
which implies
$$\bigcup_{T=1}^{\infty} \{\tilde{P}_T \le \alpha\} = \bigcup_{T=1}^{\infty} \bigcup_{t=1}^T \{P_t \le \alpha\} = \bigcup_{T=1}^{\infty} \{P_T \le \alpha\}.$$

Proof of Proposition 5. First, note that $R_T(\theta^*) = M_T^*$. Then,
$$\Pr_{\theta^*}(\exists T \in \mathbb{N} : \theta^* \notin D_T(\alpha)) = \Pr_{\theta^*}(\exists T \in \mathbb{N} : R_T(\theta^*) > 1/\alpha) \le \Pr_{\theta^*}(\exists T \in \mathbb{N} : M_T^* \ge 1/\alpha) \overset{\text{(i)}}{\le} \alpha,$$
where (i) is due to Lemmas 2 and 3. Thus, $(D_T(\alpha))_{T \in \mathbb{N}}$ is valid.

Next, note that
$$\{\theta^* \notin \tilde{D}_T(\alpha)\} = \Big\{\theta^* \notin \bigcap_{t=1}^T D_t(\alpha)\Big\} = \bigcup_{t=1}^T \{\theta^* \notin D_t(\alpha)\}.$$
Then, we obtain the validity of $(\tilde{D}_T(\alpha))_{T \in \mathbb{N}}$, due to
$$\{\exists T \in \mathbb{N} : \theta^* \notin \tilde{D}_T(\alpha)\} = \bigcup_{T=1}^{\infty} \{\theta^* \notin \tilde{D}_T(\alpha)\} = \bigcup_{T=1}^{\infty} \bigcup_{t=1}^T \{\theta^* \notin D_t(\alpha)\} = \bigcup_{T=1}^{\infty} \{\theta^* \notin D_T(\alpha)\} = \{\exists T \in \mathbb{N} : \theta^* \notin D_T(\alpha)\}.$$
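To make the split data construction of Section 2 concrete, the following Python sketch applies the CLRS to a univariate Gaussian AR(1) model with known unit noise variance, so that $f_\theta(x_t \mid x_{t-1})$ is the $N(\theta x_{t-1}, 1)$ density and $p = 1$. The model, the half-and-half split, and the least squares plug-in for $\tilde{\theta}_{T_1}$ are assumptions of this sketch (the theory permits any estimator and any split); it checks empirically that $C_{T_1+1}^T(\alpha)$ covers $\theta^*$ with frequency at least $1 - \alpha$, as Proposition 1 guarantees.

```python
# Split-data universal inference for a Gaussian AR(1) model (illustrative).
# Assumed model: f_theta(x_t | x_{t-1}) is the N(theta * x_{t-1}, 1) density,
# so theta is the autoregressive coefficient and p = 1.
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(theta, T):
    """Draw T observations from a stationary Gaussian AR(1) process."""
    x = np.empty(T)
    x[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - theta**2))  # stationary start
    for t in range(1, T):
        x[t] = theta * x[t - 1] + rng.normal()
    return x

def log_cond_lik(theta, x):
    """Conditional log-likelihood: sum over t of log f_theta(x_t | x_{t-1})."""
    resid = x[1:] - theta * x[:-1]
    return -0.5 * np.sum(resid**2)  # additive constants cancel in the CLRS

def covers(theta_star, T=200, alpha=0.1):
    """Does the split-data confidence set C(alpha) contain theta_star?"""
    x = simulate_ar1(theta_star, T)
    t1 = T // 2
    x1, x2 = x[:t1], x[t1 - 1:]  # x2 keeps one overlap point for conditioning
    # Arbitrary plug-in estimator from the first split: least squares.
    theta_tilde = np.dot(x1[1:], x1[:-1]) / np.dot(x1[:-1], x1[:-1])
    log_u = log_cond_lik(theta_tilde, x2) - log_cond_lik(theta_star, x2)
    return log_u <= np.log(1.0 / alpha)  # theta_star in C(alpha) iff U <= 1/alpha

hits = np.mean([covers(0.5) for _ in range(2000)])
print(f"empirical coverage: {hits:.3f} (Proposition 1 guarantees >= 0.9)")
```

The observed coverage typically sits well above the nominal level, reflecting the conservativeness of the Markov inequality bound underlying Proposition 1.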
References for unproved results are providedat the end of the section.
Lemma 3 (Ville's Inequality). If $(Y_T)_{T \in \mathbb{N} \cup \{0\}}$ is a non-negative supermartingale, adapted to the filtration $(\mathcal{F}_T)_{T \in \mathbb{N} \cup \{0\}}$, then, for any $\alpha > 0$, we have
$$\Pr(\exists T \in \mathbb{N} : Y_T \ge 1/\alpha) \le \alpha Y_0.$$

Lemma 4.
Let $(A_T)_{T \in \mathbb{N}}$ be a sequence of events in some filtered probability space, and let $A_\infty = \limsup_{T \to \infty} A_T$. If $\alpha \in [0,1]$, then the following statements are equivalent: (a) $\Pr(\bigcup_{T=1}^\infty A_T) \le \alpha$; (b) $\Pr(A_T) \le \alpha$ for all random times $T$ (potentially not stopping times); (c) $\Pr(A_\tau) \le \alpha$ for all stopping times $\tau$ (possibly infinite).

Lemma 3 appears as Lemma 1 in Howard et al. (2020a) (see also Stout, 1973, Lem. 1.1). Lemma 4 appears as Lemma 3 in Howard et al. (2020b).
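As an illustrative check of Ville's inequality applied to the likelihood ratio martingale $M_T^*$ of Lemma 2, the following Python sketch simulates a Gaussian AR(1) model (an assumption of this sketch: $f_\theta(x_t \mid x_{t-1}) = N(\theta x_{t-1}, 1)$, $\theta^* = 0.5$, running least squares as the non-anticipatory plug-in $\tilde{\theta}_t$) and estimates the probability that $M_T^*$ ever reaches $1/\alpha$ over a finite horizon, which Lemma 3 bounds by $\alpha M_0^* = \alpha$.

```python
# Monte Carlo check of Ville's inequality (Lemma 3) for the martingale M*_T
# of Lemma 2, in an assumed Gaussian AR(1) model with theta* = 0.5 and p = 1.
import numpy as np

rng = np.random.default_rng(1)

def ever_crosses(theta_star=0.5, T=100, alpha=0.05):
    """Track M*_t for t <= T; report whether it ever reaches 1/alpha."""
    x_prev = rng.normal(scale=1.0 / np.sqrt(1.0 - theta_star**2))
    sxx = sxy = 0.0      # running sums for the least squares plug-in
    theta_tilde = 0.0    # non-anticipatory estimate (arbitrary before data)
    log_m = 0.0          # log M*_t, with M*_0 = 1
    for _ in range(T):
        x = theta_star * x_prev + rng.normal()
        # One-step log density ratio f_{theta_tilde} / f_{theta*}.
        log_m += 0.5 * (x - theta_star * x_prev) ** 2 \
                 - 0.5 * (x - theta_tilde * x_prev) ** 2
        if log_m >= np.log(1.0 / alpha):
            return True
        sxx += x_prev**2
        sxy += x_prev * x
        theta_tilde = sxy / sxx if sxx > 0.0 else 0.0
        x_prev = x
    return False

crossing_rate = np.mean([ever_crosses() for _ in range(2000)])
print(f"crossing frequency: {crossing_rate:.3f} (Ville's bound: 0.05)")
```

Because the running estimate is updated only with past data, the product remains a bona fide martingale under $\theta^*$, and the observed crossing frequency respects the time-uniform bound at every horizon.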
References
Bercu, B., Delyon, B., & Rio, E. (2015). Concentration Inequalities for Sums and Martingales. Cham: Springer.

Bercu, B. & Touati, T. (2019). New insights on concentration inequalities for self-normalized martingales. Electronic Communications in Probability, (63), 1–12.

Davidson, J. (1994). Stochastic Limit Theory: An Introduction for Econometricians. Oxford: Oxford University Press.

De Gooijer, J. G. (2017). Elements of Nonlinear Time Series Analysis and Forecasting. Berlin: Springer.

de la Pena, V. H., Lai, T. L., & Shao, Q.-M. (2009). Self-Normalized Processes: Limit Theory and Statistical Applications. Berlin: Springer.

Hochberg, Y. & Tamhane, A. C. (1987). Multiple Comparison Procedures. New York: Wiley.

Howard, S. R., Ramdas, A., McAuliffe, J., & Sekhon, J. (2020a). Time-uniform Chernoff bounds via nonnegative supermartingales. Probability Surveys, 17, 257–317.

Howard, S. R., Ramdas, A., McAuliffe, J., & Sekhon, J. (2020b). Time-uniform, nonparametric, nonasymptotic confidence sequences. ArXiv.

Lahiri, S. N. (2003). Resampling Methods for Dependent Data. New York: Springer.

Politis, D. N., Romano, J. P., & Wolf, M. (1999). Subsampling. New York: Springer.

Potscher, B. M. & Prucha, I. R. (1997). Dynamic Nonlinear Econometric Models: Asymptotic Theory. Berlin: Springer.

Shafer, G., Shen, A., Vereshchagin, N., & Vovk, V. (2011). Test martingales, Bayes factors and p-values. Statistical Science, 26, 84–101.

Shafer, G. & Vovk, V. (2019). Game-Theoretic Foundations for Probability and Finance. Hoboken: Wiley.

Stout, W. F. (1973). Maximal inequalities and the law of the iterated logarithm. Annals of Probability, (pp. 322–328).

Vovk, V. (2007). Strong confidence intervals for autoregression. ArXiv, (arXiv:0707.0660v1).

Vovk, V. (2020). Non-algorithmic theory of randomness. In A. Blass, P. Cegielski, N. Dershowitz, M. Droste, & B. Finkbeiner (Eds.), Fields of Logic and Computation III (pp. 323–340). Cham: Springer.

Wasserman, L., Ramdas, A., & Balakrishnan, S. (2020). Universal inference. Proceedings of the National Academy of Sciences, 117, 16880–16890.