Finite sample inference for generic autoregressive models
Hien Duy Nguyen∗

September 24, 2020
Department of Mathematics and Statistics, La Trobe University, Bundoora 3086, Australia
Abstract
Autoregressive stationary processes are fundamental modeling tools in time series analysis. Conducting inference for such models usually requires asymptotic limit theorems. We establish finite sample valid tools for hypothesis testing and confidence set construction in such settings. Further results are established in the always-valid and sequential inference framework.
Let $X = (X_t)_{t \in \mathbb{Z}}$ be a stationary stochastic process, where $X_t \in \mathbb{X} \subseteq \mathbb{R}^d$ for each $t \in \mathbb{Z}$ ($d \in \mathbb{N}$), and let $\mathcal{F}_t = \sigma(X_t, X_{t-1}, \dots)$ be the $\sigma$-algebra generated by $X_t, X_{t-1}, \dots$. We assume that $X$ arises from a generic $p$th order ($p \in \mathbb{N}$) parametric autoregressive process, in the sense that for each $A \subseteq \mathbb{X}$,
$$\Pr(X_t \in A \mid \mathcal{F}_{t-1}) = \int_A f_\theta(x_t \mid X_{t-p}^{t-1}) \, \mathrm{d}x_t.$$
Here, $X_r^s = (X_r, X_{r+1}, \dots, X_s)$ ($r \le s$) and $f_\theta(x_t \mid x_{t-p}^{t-1})$ is a probability density function of $x_t$, conditional on the event $X_{t-p}^{t-1} = x_{t-p}^{t-1}$, characterized by the parameter $\theta \in \Theta$.

Suppose that we observe a subsequence $X_T = (X_t)_{t \in [T]}$ ($[T] = \{1, \dots, T\}$; $T \in \mathbb{N}$), arising from a data generating process (DGP) characterized by an unknown parameter $\theta^*$, but with known conditional PDF $f_\theta$. Typically, we then wish to use $X_T$ to estimate $\theta^*$, to construct confidence sets with specified probabilities of containing $\theta^*$, and to test hypotheses regarding the value of $\theta^*$.

Although difficult, the problem of parametric estimation is largely resolvable via optimization of well-constructed functions of the data; see, for example, the comprehensive treatment of De Gooijer (2017, Ch. 6). The problems of confidence set construction and hypothesis testing are generally addressed via limit theorems or resampling methods for dependent processes, as described in Davidson (1994) and Potscher & Prucha (1997), and Politis et al. (1999) and Lahiri (2003), respectively.

In some simple cases, concentration inequalities are applicable for the derivation of finite sample inference tools. For example, in the simple case of the univariate first order autoregressive model with normal noise, finite sample results have been derived in Vovk (2007), Bercu et al. (2015, Sec. 4.1), and Bercu & Touati (2019).

In Wasserman et al.
(2020), the authors consider the construction of estimator agnostic and finite sample valid confidence sets and hypothesis tests for independent data, using the martingale property of the generalized likelihood ratio statistic (cf. de la Pena et al., 2009, Sec. 17.1). Following a note regarding conditional likelihoods, we extend the results of Wasserman et al. (2020) to address the problem of estimator agnostic and finite sample valid inference for data arising from generic autoregressive DGPs. This therefore contributes to the developing literature regarding finite sample inference for time series models.

We present two sets of results. The first set uses Markov's inequality in order to construct confidence sets and tests via a split data setup. The second set of results uses the maximal inequality of Ville (cf. Howard et al., 2020a, Lem. 1), in order to construct always-valid sequential confidence sets and tests. These constructions are closely related to the test martingales of Shafer et al. (2011) and the recently popular notion of e-values (see, e.g., Shafer & Vovk, 2019, Ch. 10 and Vovk, 2020).

The paper proceeds as follows. In Sections 2 and 3, we present split data inference tools and always-valid inference tools, respectively. Proofs of main results are provided in Section 4. Further technical results are provided in the Appendix.

∗Email: [email protected].

2 Split data inference
Let $T = T_1 + T_2$, where $T_1, T_2 \in \mathbb{N}$ and $T_1 \ge p$. Suppose that $X$ arises from a DGP, characterized by an unknown parameter $\theta^* \in \Theta$, and let $\tilde{\theta}_{T_1}$ be an arbitrary estimator of $\theta^*$, obtained from $X_{T_1}$. For arbitrary $\theta$, define the conditional likelihood based on $X_{T_1+1}^T$ by
$$L_{T_1+1}^T(\theta) = \prod_{t=T_1+1}^{T} f_\theta(X_t \mid X_{t-p}^{t-1}). \quad (1)$$
Using (1), we define the split data conditional likelihood ratio statistic (CLRS) by
$$U_{T_1+1}^T(\theta) = L_{T_1+1}^T(\tilde{\theta}_{T_1}) / L_{T_1+1}^T(\theta).$$
We shall firstly consider confidence sets of the form
$$C_{T_1+1}^T(\alpha) = \{\theta \in \Theta : U_{T_1+1}^T(\theta) \le 1/\alpha\},$$
for $\alpha \in [0,1]$. Let $E_{\theta^*}$ and $\Pr_{\theta^*}$ denote the expectation and probability operators, evaluated under the assumption that the DGP of $X$ is characterized by the parameter of value $\theta^*$. The following result establishes the finite sample validity of such confidence constructions.

Proposition 1.
For each $\alpha \in [0,1]$, $T_1 \ge p$, and $T_2 \in \mathbb{N}$, $C_{T_1+1}^T(\alpha)$ is a finite sample valid
$100(1-\alpha)\%$ confidence set, in the sense that $\Pr_{\theta^*}(\theta^* \in C_{T_1+1}^T(\alpha)) \ge 1 - \alpha$.

Next, we shall consider the problem of testing hypotheses of the form:
$$\mathrm{H}_0 : \theta^* \in \Theta_0 \text{ and } \mathrm{H}_1 : \theta^* \notin \Theta_0, \quad (2)$$
where $\mathrm{H}_0$ and $\mathrm{H}_1$ denote the null and alternative hypotheses, respectively, and $\Theta_0 \subset \Theta$ is a composite null set. Define the maximum conditional likelihood estimator based on $X_{T_1+1}^T$, under $\mathrm{H}_0$, by
$$\hat{\theta}_{T_1+1}^T \in \arg\max_{\theta \in \Theta_0} L_{T_1+1}^T(\theta), \quad (3)$$
and test (2) via the split data conditional likelihood ratio test (CLRT) rule:
$$\text{Reject } \mathrm{H}_0 \text{ if } V_{T_1+1}^T > 1/\alpha, \quad (4)$$
where
$$V_{T_1+1}^T = L_{T_1+1}^T(\tilde{\theta}_{T_1}) / L_{T_1+1}^T(\hat{\theta}_{T_1+1}^T).$$
We have the following result regarding the finite sample validity of the split data CLRT.

Proposition 2.
For each $\alpha \in [0,1]$, $T_1 \ge p$ and $T_2 \in \mathbb{N}$, the split data CLRT, defined by (4), controls the Type I error at the significance level $\alpha$, in the sense that
$$\sup_{\theta^* \in \Theta_0} \Pr_{\theta^*}(V_{T_1+1}^T > 1/\alpha) \le \alpha.$$

Remark 1. Instead of (4), we can also use the duality between confidence sets and tests (see, e.g., Thm 2.3 of Hochberg & Tamhane, 1987, Appendix 1) to test the hypotheses (2). That is, we can test hypotheses (2) using the rule:
$$\text{Reject } \mathrm{H}_0 \text{ if } \Theta_0 \cap C_{T_1+1}^T(\alpha) = \emptyset. \quad (5)$$
Rule (5) replaces the optimization problem of computing (3) by potential complications regarding the derivation of the set intersection $\Theta_0 \cap C_{T_1+1}^T(\alpha)$. Since both tests correctly control the Type I error, the choice between the alternatives is a matter of practicality.

3 Always-valid inference
We now consider the sampling of the elements of the subsequence $X_T$, characterized by the DGP parameterized by $\theta^* \in \Theta$, one at a time. For $T > p$, let $\tilde{\theta}_T$ be a non-anticipatory estimator of $\theta^*$ (i.e., $\tilde{\theta}_T$ is only dependent on $X_T$).

We wish to use the sequence of estimators $(\tilde{\theta}_T)_{T > p}$ to sequentially test the hypotheses (2) and construct confidence sets for $\theta^*$. At any time $T > p$, define the running CLRT by the rule:
$$\text{Reject } \mathrm{H}_0 \text{ if } M_T > 1/\alpha, \quad (6)$$
where
$$M_T = \frac{\prod_{t=p+1}^T f_{\tilde{\theta}_{t-1}}(X_t \mid X_{t-p}^{t-1})}{\prod_{t=p+1}^T f_{\hat{\theta}_T}(X_t \mid X_{t-p}^{t-1})},$$
and
$$\hat{\theta}_T \in \arg\max_{\theta \in \Theta_0} \prod_{t=p+1}^T f_\theta(X_t \mid X_{t-p}^{t-1}).$$
We shall also define $M_T = 1$, for all $0 \le T \le p$.

Let $\tau_{\theta^*}$ denote the time at which the sequence of tests stops, under rejection rule (6), when the DGP of $X$ is characterized by the parameter $\theta^*$. The following result establishes that $\tau_{\theta^*}$ is finite with probability no greater than $\alpha$.

Proposition 3.
The running CLRT, defined by (6), has Type I error at most $\alpha$. That is,
$$\sup_{\theta^* \in \Theta_0} \Pr_{\theta^*}(\tau_{\theta^*} < \infty) \le \alpha,$$
for each $\alpha \in [0,1]$.

Let $P_T = 1/M_T$ and $\tilde{P}_T = \min_{t \le T} \{1/M_t\}$ be $p$-values for the test (2), and let $T \in \mathbb{N}$ be a random variable. Then, the randomly indexed $p$-values $P_T$ and $\tilde{P}_T$ are both valid.

Proposition 4.
For any random $T \in \mathbb{N}$, not necessarily a stopping time, $P_T$ and $\tilde{P}_T$ are valid, in the sense that
$$\sup_{\theta^* \in \Theta_0} \Pr_{\theta^*}(P_T \le \alpha) \le \alpha, \text{ and } \sup_{\theta^* \in \Theta_0} \Pr_{\theta^*}(\tilde{P}_T \le \alpha) \le \alpha,$$
for all $\alpha \in [0,1]$.

Let
$$D_T(\alpha) = \{\theta \in \Theta : R_T(\theta) \le 1/\alpha\},$$
where
$$R_T(\theta) = \frac{\prod_{t=p+1}^T f_{\tilde{\theta}_{t-1}}(X_t \mid X_{t-p}^{t-1})}{\prod_{t=p+1}^T f_\theta(X_t \mid X_{t-p}^{t-1})},$$
for $T > p$, and $R_T(\theta) = 1$, for $T \le p$. Further, let $\tilde{D}_T(\alpha) = \bigcap_{t=1}^T D_t(\alpha)$. We have the fact that $(D_T(\alpha))_{T \in \mathbb{N}}$ and $(\tilde{D}_T(\alpha))_{T \in \mathbb{N}}$ are sequences of confidence sets that are all simultaneously valid.

Proposition 5.
For any $\alpha \in [0,1]$, the confidence sequences $(D_T(\alpha))_{T \in \mathbb{N}}$ and $(\tilde{D}_T(\alpha))_{T \in \mathbb{N}}$ are valid, in the sense that
$$\Pr_{\theta^*}(\forall T \in \mathbb{N} : \theta^* \in D_T(\alpha)) \ge 1 - \alpha \text{ and } \Pr_{\theta^*}(\forall T \in \mathbb{N} : \theta^* \in \tilde{D}_T(\alpha)) \ge 1 - \alpha.$$

4 Proofs

The following lemma provides the basis for the proofs of Propositions 1 and 2.
Lemma 1.
For each $T_1 \ge p$ and $T_2 \in \mathbb{N}$, $E_{\theta^*}[U_{T_1+1}^T(\theta^*)] = 1$.

Proof. Let $X_{t-p}^{t-1} = x_{t-p}^{t-1}$, for $t > T_1 + p$, and $X_{t-p}^{t-1} = (X_{t-p}, \dots, X_{T_1}, x_{T_1+1}, \dots, x_{t-1})$, for $t \le T_1 + p$. Write
$$E_{\theta^*}[U_{T_1+1}^T(\theta^*) \mid \mathcal{F}_{T_1}] = \int_{\mathbb{X}^{T_2}} \frac{\prod_{t=T_1+1}^T f_{\tilde{\theta}_{T_1}}(x_t \mid X_{t-p}^{t-1})}{\prod_{t=T_1+1}^T f_{\theta^*}(x_t \mid X_{t-p}^{t-1})} \prod_{t=T_1+1}^T f_{\theta^*}(x_t \mid X_{t-p}^{t-1}) \, \mathrm{d}x_{T_1+1}^T = \int_{\mathbb{X}^{T_2}} \prod_{t=T_1+1}^T f_{\tilde{\theta}_{T_1}}(x_t \mid X_{t-p}^{t-1}) \, \mathrm{d}x_{T_1+1}^T \overset{\text{(i)}}{=} \int_{\mathbb{X}} \cdots \int_{\mathbb{X}} f_{\tilde{\theta}_{T_1}}(x_{T_1+1} \mid X_{T_1+1-p}^{T_1}) \, \mathrm{d}x_{T_1+1} \cdots f_{\tilde{\theta}_{T_1}}(x_T \mid X_{T-p}^{T-1}) \, \mathrm{d}x_T \overset{\text{(ii)}}{=} 1,$$
where (i) is due to Tonelli's Theorem and (ii) is due to the definition of a conditional PDF. Then, we apply the law of iterated expectations to obtain the desired result:
$$E_{\theta^*}[U_{T_1+1}^T(\theta^*)] = E_{\theta^*}[E_{\theta^*}[U_{T_1+1}^T(\theta^*) \mid \mathcal{F}_{T_1}]] = 1.$$

Proof of Proposition 1. For any $\theta^* \in \Theta$, we have
$$\Pr_{\theta^*}(\theta^* \notin C_{T_1+1}^T(\alpha)) = \Pr_{\theta^*}(U_{T_1+1}^T(\theta^*) > 1/\alpha) \overset{\text{(i)}}{\le} \alpha E_{\theta^*}[U_{T_1+1}^T(\theta^*)] \overset{\text{(ii)}}{=} \alpha,$$
where (i) is due to Markov's inequality and (ii) is due to Lemma 1.

Proof of Proposition 2. For any $\theta^* \in \Theta_0$, we have
$$\Pr_{\theta^*}(V_{T_1+1}^T > 1/\alpha) \overset{\text{(i)}}{\le} \alpha E_{\theta^*}[V_{T_1+1}^T] \overset{\text{(ii)}}{\le} \alpha E_{\theta^*}[U_{T_1+1}^T(\theta^*)] \overset{\text{(iii)}}{=} \alpha,$$
where (i) is due to Markov's inequality, and (ii) is due to the fact that
$$L_{T_1+1}^T(\hat{\theta}_{T_1+1}^T) \ge L_{T_1+1}^T(\theta^*),$$
by definition of (3), for all $\theta^* \in \Theta_0$. Finally, (iii) is obtained by Lemma 1.

Let
$$M_T^* = \frac{\prod_{t=p+1}^T f_{\tilde{\theta}_{t-1}}(X_t \mid X_{t-p}^{t-1})}{\prod_{t=p+1}^T f_{\theta^*}(X_t \mid X_{t-p}^{t-1})},$$
where we define $M_T^* = 1$, for each $0 \le T \le p$. Firstly, we wish to establish that $(M_T^*)_{T \in \mathbb{N} \cup \{0\}}$ is a martingale, adapted to the natural filtration $(\mathcal{F}_T)_{T \in \mathbb{N} \cup \{0\}}$, where $\mathcal{F}_T = \sigma(X_T, \dots, X_1)$.

Lemma 2.
For each $T \in \mathbb{N}$, $E_{\theta^*}[M_T^* \mid \mathcal{F}_{T-1}] = M_{T-1}^*$.

Proof. We firstly prove the result for
$T > p + 1$. Write
$$E_{\theta^*}[M_T^* \mid \mathcal{F}_{T-1}] = \int_{\mathbb{X}} \frac{\prod_{t=p+1}^{T-1} f_{\tilde{\theta}_{t-1}}(X_t \mid X_{t-p}^{t-1})}{\prod_{t=p+1}^{T-1} f_{\theta^*}(X_t \mid X_{t-p}^{t-1})} \frac{f_{\tilde{\theta}_{T-1}}(x_T \mid X_{T-p}^{T-1})}{f_{\theta^*}(x_T \mid X_{T-p}^{T-1})} f_{\theta^*}(x_T \mid X_{T-p}^{T-1}) \, \mathrm{d}x_T = \frac{\prod_{t=p+1}^{T-1} f_{\tilde{\theta}_{t-1}}(X_t \mid X_{t-p}^{t-1})}{\prod_{t=p+1}^{T-1} f_{\theta^*}(X_t \mid X_{t-p}^{t-1})} \int_{\mathbb{X}} f_{\tilde{\theta}_{T-1}}(x_T \mid X_{T-p}^{T-1}) \, \mathrm{d}x_T \overset{\text{(i)}}{=} M_{T-1}^*,$$
where (i) is due to the definition of a conditional PDF. By definition of $M_T^*$, the result also holds for $T \le p + 1$, as required.

Proof of Proposition 3. By Lemmas 2 and 3, for any $\alpha > 0$, we have
$$\Pr_{\theta^*}(\exists T \in \mathbb{N} : M_T^* \ge 1/\alpha) \le \alpha M_0^*.$$
Since
$$\{\tau_{\theta^*} = \infty\} = \{\forall T \in \mathbb{N} : M_T < 1/\alpha\},$$
we have
$$\Pr_{\theta^*}(\tau_{\theta^*} < \infty) = \Pr_{\theta^*}(\exists T \in \mathbb{N} : M_T \ge 1/\alpha) \overset{\text{(i)}}{\le} \Pr_{\theta^*}(\exists T \in \mathbb{N} : M_T^* \ge 1/\alpha) \le \alpha M_0^* \overset{\text{(ii)}}{=} \alpha,$$
where (i) is due to the fact that
$$\prod_{t=p+1}^T f_{\hat{\theta}_T}(X_t \mid X_{t-p}^{t-1}) \ge \prod_{t=p+1}^T f_{\theta^*}(X_t \mid X_{t-p}^{t-1}),$$
for every $\theta^* \in \Theta_0$, and (ii) is by definition of $M_0^*$.

Proof of Proposition 4. For the case of $P_T$, we apply Lemma 4, together with the fact that
$$\{\exists T \in \mathbb{N} : M_T \ge 1/\alpha\} = \{\exists T \in \mathbb{N} : P_T \le \alpha\} = \bigcup_{T=1}^{\infty} \{P_T \le \alpha\}.$$
Then, we obtain the result for $\tilde{P}_T$ using the fact that
$$\{\tilde{P}_T \le \alpha\} = \bigcup_{t=1}^T \{P_t \le \alpha\},$$
which implies
$$\bigcup_{T=1}^{\infty} \{\tilde{P}_T \le \alpha\} = \bigcup_{T=1}^{\infty} \bigcup_{t=1}^T \{P_t \le \alpha\} = \bigcup_{T=1}^{\infty} \{P_T \le \alpha\}.$$

Proof of Proposition 5. First, note that $R_T(\theta^*) = M_T^*$. Then,
$$\Pr_{\theta^*}(\exists T \in \mathbb{N} : \theta^* \notin D_T(\alpha)) = \Pr_{\theta^*}(\exists T \in \mathbb{N} : R_T(\theta^*) > 1/\alpha) \le \Pr_{\theta^*}(\exists T \in \mathbb{N} : M_T^* \ge 1/\alpha) \overset{\text{(i)}}{\le} \alpha,$$
where (i) is due to Lemmas 2 and 3. Thus, $(D_T(\alpha))_{T \in \mathbb{N}}$ is valid.

Next, note that
$$\{\theta^* \notin \tilde{D}_T(\alpha)\} = \Big\{\theta^* \notin \bigcap_{t=1}^T D_t(\alpha)\Big\} = \bigcup_{t=1}^T \{\theta^* \notin D_t(\alpha)\}.$$
Then, we obtain the validity of $(\tilde{D}_T(\alpha))_{T \in \mathbb{N}}$, due to
$$\{\exists T \in \mathbb{N} : \theta^* \notin \tilde{D}_T(\alpha)\} = \bigcup_{T=1}^{\infty} \{\theta^* \notin \tilde{D}_T(\alpha)\} = \bigcup_{T=1}^{\infty} \bigcup_{t=1}^T \{\theta^* \notin D_t(\alpha)\} = \bigcup_{T=1}^{\infty} \{\theta^* \notin D_T(\alpha)\} = \{\exists T \in \mathbb{N} : \theta^* \notin D_T(\alpha)\}.$$
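To make the split data construction of Section 2 concrete, the following Python sketch applies the CLRS to a univariate Gaussian AR(1) model with known unit noise variance, so that $f_\theta(x_t \mid x_{t-1})$ is the $N(\theta x_{t-1}, 1)$ density and $p = 1$. The model, the half-and-half split, and the least squares plug-in for $\tilde{\theta}_{T_1}$ are assumptions of this sketch (the theory permits any estimator and any split); it checks empirically that $C_{T_1+1}^T(\alpha)$ covers $\theta^*$ with frequency at least $1 - \alpha$, as Proposition 1 guarantees.

```python
# Split-data universal inference for a Gaussian AR(1) model (illustrative).
# Assumed model: f_theta(x_t | x_{t-1}) is the N(theta * x_{t-1}, 1) density,
# so theta is the autoregressive coefficient and p = 1.
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(theta, T):
    """Draw T observations from a stationary Gaussian AR(1) process."""
    x = np.empty(T)
    x[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - theta**2))  # stationary start
    for t in range(1, T):
        x[t] = theta * x[t - 1] + rng.normal()
    return x

def log_cond_lik(theta, x):
    """Conditional log-likelihood: sum over t of log f_theta(x_t | x_{t-1})."""
    resid = x[1:] - theta * x[:-1]
    return -0.5 * np.sum(resid**2)  # additive constants cancel in the CLRS

def covers(theta_star, T=200, alpha=0.1):
    """Does the split-data confidence set C(alpha) contain theta_star?"""
    x = simulate_ar1(theta_star, T)
    t1 = T // 2
    x1, x2 = x[:t1], x[t1 - 1:]  # x2 keeps one overlap point for conditioning
    # Arbitrary plug-in estimator from the first split: least squares.
    theta_tilde = np.dot(x1[1:], x1[:-1]) / np.dot(x1[:-1], x1[:-1])
    log_u = log_cond_lik(theta_tilde, x2) - log_cond_lik(theta_star, x2)
    return log_u <= np.log(1.0 / alpha)  # theta_star in C(alpha) iff U <= 1/alpha

hits = np.mean([covers(0.5) for _ in range(2000)])
print(f"empirical coverage: {hits:.3f} (Proposition 1 guarantees >= 0.9)")
```

The observed coverage typically sits well above the nominal level, reflecting the conservativeness of the Markov inequality bound underlying Proposition 1.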
References for unproved results are providedat the end of the section.
Lemma 3 (Ville's Inequality). If $(Y_T)_{T \in \mathbb{N} \cup \{0\}}$ is a non-negative supermartingale, adapted to the filtration $(\mathcal{F}_T)_{T \in \mathbb{N} \cup \{0\}}$, then, for any $\alpha > 0$, we have
$$\Pr(\exists T \in \mathbb{N} : Y_T \ge 1/\alpha) \le \alpha Y_0.$$

Lemma 4.
Let $(A_T)_{T \in \mathbb{N}}$ be a sequence of events in some filtered probability space, and let $A_\infty = \limsup_{T \to \infty} A_T$. If $\alpha \in [0,1]$, then the following statements are equivalent: (a) $\Pr(\bigcup_{T=1}^\infty A_T) \le \alpha$; (b) $\Pr(A_T) \le \alpha$ for all random times $T$ (potentially not stopping times); (c) $\Pr(A_\tau) \le \alpha$ for all stopping times $\tau$ (possibly infinite).

Lemma 3 appears as Lemma 1 in Howard et al. (2020a) (see also Stout, 1973, Lem. 1.1). Lemma 4 appears as Lemma 3 in Howard et al. (2020b).
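As an illustrative check of Ville's inequality applied to the likelihood ratio martingale $M_T^*$ of Lemma 2, the following Python sketch simulates a Gaussian AR(1) model (an assumption of this sketch: $f_\theta(x_t \mid x_{t-1}) = N(\theta x_{t-1}, 1)$, $\theta^* = 0.5$, running least squares as the non-anticipatory plug-in $\tilde{\theta}_t$) and estimates the probability that $M_T^*$ ever reaches $1/\alpha$ over a finite horizon, which Lemma 3 bounds by $\alpha M_0^* = \alpha$.

```python
# Monte Carlo check of Ville's inequality (Lemma 3) for the martingale M*_T
# of Lemma 2, in an assumed Gaussian AR(1) model with theta* = 0.5 and p = 1.
import numpy as np

rng = np.random.default_rng(1)

def ever_crosses(theta_star=0.5, T=100, alpha=0.05):
    """Track M*_t for t <= T; report whether it ever reaches 1/alpha."""
    x_prev = rng.normal(scale=1.0 / np.sqrt(1.0 - theta_star**2))
    sxx = sxy = 0.0      # running sums for the least squares plug-in
    theta_tilde = 0.0    # non-anticipatory estimate (arbitrary before data)
    log_m = 0.0          # log M*_t, with M*_0 = 1
    for _ in range(T):
        x = theta_star * x_prev + rng.normal()
        # One-step log density ratio f_{theta_tilde} / f_{theta*}.
        log_m += 0.5 * (x - theta_star * x_prev) ** 2 \
                 - 0.5 * (x - theta_tilde * x_prev) ** 2
        if log_m >= np.log(1.0 / alpha):
            return True
        sxx += x_prev**2
        sxy += x_prev * x
        theta_tilde = sxy / sxx if sxx > 0.0 else 0.0
        x_prev = x
    return False

crossing_rate = np.mean([ever_crosses() for _ in range(2000)])
print(f"crossing frequency: {crossing_rate:.3f} (Ville's bound: 0.05)")
```

Because the running estimate is updated only with past data, the product remains a bona fide martingale under $\theta^*$, and the observed crossing frequency respects the time-uniform bound at every horizon.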
References
Bercu, B., Delyon, B., & Rio, E. (2015). Concentration Inequalities for Sums and Martingales. Cham: Springer.

Bercu, B. & Touati, T. (2019). New insights on concentration inequalities for self-normalized martingales. Electronic Communications in Probability, (63), 1–12.

Davidson, J. (1994). Stochastic Limit Theory: An Introduction for Econometricians. Oxford: Oxford University Press.

De Gooijer, J. G. (2017). Elements of Nonlinear Time Series Analysis and Forecasting. Berlin: Springer.

de la Pena, V. H., Lai, T. L., & Shao, Q.-M. (2009). Self-Normalized Processes: Limit Theory and Statistical Applications. Berlin: Springer.

Hochberg, Y. & Tamhane, A. C. (1987). Multiple Comparison Procedures. New York: Wiley.

Howard, S. R., Ramdas, A., McAuliffe, J., & Sekhon, J. (2020a). Time-uniform Chernoff bounds via nonnegative supermartingales. Probability Surveys, 17, 257–317.

Howard, S. R., Ramdas, A., McAuliffe, J., & Sekhon, J. (2020b). Time-uniform, nonparametric, nonasymptotic confidence sequences. ArXiv.

Lahiri, S. N. (2003). Resampling Methods for Dependent Data. New York: Springer.

Politis, D. N., Romano, J. P., & Wolf, M. (1999). Subsampling. New York: Springer.

Potscher, B. M. & Prucha, I. R. (1997). Dynamic Nonlinear Econometric Models: Asymptotic Theory. Berlin: Springer.

Shafer, G., Shen, A., Vereshchagin, N., & Vovk, V. (2011). Test martingales, Bayes factors and p-values. Statistical Science, 26, 84–101.

Shafer, G. & Vovk, V. (2019). Game-Theoretic Foundations for Probability and Finance. Hoboken: Wiley.

Stout, W. F. (1973). Maximal inequalities and the law of the iterated logarithm. Annals of Probability, (pp. 322–328).

Vovk, V. (2007). Strong confidence intervals for autoregression. ArXiv, (arXiv:0707.0660v1).

Vovk, V. (2020). Non-algorithmic theory of randomness. In A. Blass, P. Cegielski, N. Dershowitz, M. Droste, & B. Finkbeiner (Eds.), Fields of Logic and Computation III (pp. 323–340). Cham: Springer.

Wasserman, L., Ramdas, A., & Balakrishnan, S. (2020). Universal inference. Proceedings of the National Academy of Sciences, 117, 16880–16890.