Inference and model selection in general causal time series with exogenous covariates
February 8, 2021
Mamadou Lamine DIOP and William KENGNE
THEMA, CY Cergy Paris Université, 33 Boulevard du Port, 95011 Cergy-Pontoise Cedex, France.
E-mail: [email protected]; [email protected]
Abstract: In this paper, we study a general class of causal processes with exogenous covariates, including many classical processes such as the ARMA-GARCH, APARCH, ARMAX, GARCH-X and APARCH-X processes. Under some Lipschitz-type conditions, the existence of a $\tau$-weakly dependent strictly stationary and ergodic solution is established. We provide conditions for the strong consistency and derive the asymptotic distribution of the quasi-maximum likelihood estimator (QMLE), both when the true parameter is an interior point of the parameter space and when it belongs to the boundary. A Wald-type significance test of the parameter is developed. This test is quite extensive and includes the test of nullity of the parameter's components, which in particular allows us to assess the relevance of the exogenous covariates. Relying on the QMLE of the model, we also propose a penalized criterion to address the problem of model selection for this class. The weak and the strong consistency of the procedure are established. Finally, Monte Carlo simulations are conducted to numerically illustrate the main results.

Keywords: Causal processes, exogenous covariates, quasi-maximum likelihood estimator, consistency, boundary, significance test, model selection, penalized criterion.
Autoregressive time series with exogenous covariates provide effective ways to take into account some available extra information in the models. The well-known example that has been widely studied is the ARMAX model; see Hannan (1976), Hannan and Deistler (2012). GARCH-type models with exogenous covariates have recently attracted much attention in the literature; see, for instance, Han and Kristensen (2014) for GARCH-X and Francq and Thieu (2019) for APARCH-X. Guo et al. (2014) considered the factor double autoregressive model, of which the ARX and ARCH-X models are particular cases. We consider a large class of causal time series models, of which ARMAX and GARCH-X type models are specific examples.

Supported by the MME-DII center of excellence (ANR-11-LABEX-0023-01). Developed within the ANR BREAKRISK: ANR-17-CE26-0001-01 and the CY Initiative of Excellence (grant "Investissements d'Avenir" ANR-16-IDEX-0008), Project "EcoDep" PSI-AAP2020-0000000013.
Let $X_t = (X_{1,t}, X_{2,t}, \ldots, X_{d_x,t})' \in \mathbb{R}^{d_x}$ be a vector of covariates, with $d_x \in \mathbb{N}$. Consider the class of affine causal models with exogenous covariates.

Class AC-X$(M_\theta, f_\theta)$: A process $\{Y_t,\ t\in\mathbb{Z}\}$ belongs to AC-X$(M_\theta, f_\theta)$ if it satisfies
$$Y_t = M_\theta(Y_{t-1}, Y_{t-2}, \ldots; X_{t-1}, X_{t-2}, \ldots)\,\xi_t + f_\theta(Y_{t-1}, Y_{t-2}, \ldots; X_{t-1}, X_{t-2}, \ldots), \qquad (1.1)$$
where $M_\theta, f_\theta : \mathbb{R}^{\mathbb{N}} \times (\mathbb{R}^{d_x})^{\mathbb{N}} \to \mathbb{R}$ are two measurable functions, assumed to be known up to the parameter $\theta$, which belongs to a compact subset $\Theta \subset \mathbb{R}^d$ ($d \in \mathbb{N}$); and $(\xi_t)_{t\in\mathbb{Z}}$ is a sequence of zero-mean independent, identically distributed (i.i.d.) random variables satisfying $E(|\xi_0|^r) < \infty$ for some $r \ge 1$ and $E(\xi_0^2) = 1$. Remark that, if $X_t \equiv C$ for some constant $C$ (absence of covariates), then (1.1) reduces to the classical affine causal models that have already been considered in the literature (see, for instance, Bardet and Wintenberger (2009), Bardet et al. (2012), Bardet et al. (2020)). One can see that the ARMAX, GARCH-X and APARCH-X models belong to the class AC-X$(M_\theta, f_\theta)$.

There exist several important contributions devoted to autoregressive models with covariates; we refer to Hannan and Deistler (2012), Han and Kristensen (2014), Sucarrat et al. (2016), Francq and Sucarrat (2017), Pedersen and Rahbek (2018), Francq and Thieu (2019), Grønneberg and Holcblat (2019), Zambom and Gel (2020) and the references therein for some developments on ARMAX and conditional volatility type models with exogenous covariates. The class AC-X$(M_\theta, f_\theta)$ is more general than the models considered in the aforementioned works, as well as the factor double autoregressive model proposed by Guo et al. (2014), which is a particular case of the model (1.1).
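For intuition, a toy member of the class (1.1) can be simulated with one-lag choices of $f_\theta$ and $M_\theta$. The functional forms and parameter values below are illustrative assumptions, not taken from the paper, and the covariate follows a simple AR(1) recursion in the spirit of the autoregressive structure imposed on $X_t$ later in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-lag member of the class AC-X(M_theta, f_theta); the functional forms
# and parameter values are illustrative assumptions, not taken from the paper:
#   f_theta(y; x)   = a*y_1 + g*x_1            (conditional mean)
#   M_theta(y; x)^2 = w + b*y_1^2 + c*x_1^2    (conditional variance, >= w > 0)
# The covariate follows an AR(1) recursion, in the spirit of (2.1).
a, g, w, b, c, rho = 0.3, 0.5, 0.1, 0.2, 0.4, 0.5

n = 1000
X = np.zeros(n)
Y = np.zeros(n)
for t in range(1, n):
    X[t] = rho * X[t - 1] + rng.normal()               # X_t = g(X_{t-1}; eta_t)
    m2 = w + b * Y[t - 1] ** 2 + c * X[t - 1] ** 2     # M_theta^2
    Y[t] = np.sqrt(m2) * rng.normal() + a * Y[t - 1] + g * X[t - 1]  # model (1.1)
```

With these small coefficients the recursion is contractive, so the simulated path behaves like a stationary trajectory after a short burn-in.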
Note as well that the class AC-X$(M_\theta, f_\theta)$ provides a more general way to take covariates into account in the model; in particular, the linear covariate regressors considered by Francq and Thieu (2019) and by many other works are a specific case. Compared to Bardet and Wintenberger (2009), besides taking covariates into account in the model (1.1), we address the inference when the true parameter belongs to the boundary of the parameter set $\Theta$, as well as the model selection question. In this new contribution, we consider the class of models (1.1) and address the following issues.

(i) Existence of a stationary solution. We provide sufficient conditions that ensure the existence of a $\tau$-weakly dependent stationary and ergodic solution $Z_t = (Y_t, X_t')'$ of (1.1). At first glance, one might think that these conditions are the same as those obtained by Bardet and Wintenberger (2009), but in our case the existence of the covariates must be taken into account.

(ii) Inference for the class AC-X$(M_{\theta^*}, f_{\theta^*})$. An inference based on the quasi-likelihood of the model is carried out. The consistency of the quasi-maximum likelihood estimator (QMLE) is established, and we derive the asymptotic distribution of this estimator (even when $\theta^*$ belongs to the boundary of $\Theta$).

(iii) Significance test of the parameter. A Wald-type significance test of the parameter of the model (1.1) is conducted. The proposed test is quite extensive and includes the test of nullity of the parameter's components. An asymptotic study is carried out, which shows in particular that, when the true parameter belongs to the boundary of $\Theta$, the asymptotic distribution of the test statistic under the null hypothesis is quite different from the classical chi-square distribution.

(iv)
Model selection. A penalized criterion based on the quasi-likelihood of the model is proposed for model selection in the class AC-X$(M_{\theta^*}, f_{\theta^*})$. We provide conditions that ensure the weak and the strong consistency of the proposed procedure. These conditions show in particular that the Hannan-Quinn type penalty $\kappa_n = c \log\log n$ (see (3.2)) is strongly consistent for sufficiently large $c$.

The article is organized as follows. In Section 2, we first provide conditions for stability properties; we then give the definition of the QMLE and study its asymptotic properties, and a significance test of the parameter with an asymptotic study is also addressed. Section 3 focuses on model selection and the consistency of the proposed procedure. Some classical examples of processes belonging to the class AC-X$(M_{\theta^*}, f_{\theta^*})$ are detailed in Section 4. Section 5 gives some empirical results, whereas Section 6 is devoted to a summary and conclusion. Section 7 contains the proofs of the main results.

Throughout the sequel, the following norms will be used:
• $\|x\| := \sqrt{\sum_{i=1}^p x_i^2}$ for any $x \in \mathbb{R}^p$, $p \in \mathbb{N}$;
• $\|V\| := \sqrt{\sum_{i=1}^p \sum_{j=1}^q v_{i,j}^2}$ for any matrix $V \in M_{p,q}(\mathbb{R})$, where $M_{p,q}(\mathbb{R})$ denotes the set of matrices of dimension $p \times q$ with coefficients in $\mathbb{R}$, for $p, q \in \mathbb{N}$;
• $\|g\|_{\mathcal{K}} := \sup_{\theta \in \mathcal{K}} \|g(\theta)\|$ for any compact set $\mathcal{K} \subseteq \mathbb{R}^d$ and function $g : \mathcal{K} \to M_{p,q}(\mathbb{R})$;
• $\|Y\|_r := \big(E\|Y\|^r\big)^{1/r}$ if $Y$ is a random vector with finite $r$-order moments, for $r > 0$.

We will denote by 0 the null vector of any vector space. Let $\Psi_\theta$ be the generic symbol for any of the functions $f_\theta$ or $M_\theta$. We set the following classical Lipschitz-type conditions, for any compact set $\mathcal{K} \subseteq \Theta$.
Assumption A$_i(\Psi_\theta, \mathcal{K})$ ($i = 0, 1, 2$): for any $(y, x) \in \mathbb{R}^{\mathbb{N}} \times (\mathbb{R}^{d_x})^{\mathbb{N}}$, the function $\theta \mapsto \Psi_\theta(y, x)$ is $i$ times continuously differentiable on $\mathcal{K}$ with $\big\|\partial^i \Psi_\theta(0)/\partial\theta^i\big\|_{\mathcal{K}} < \infty$; and there exist two sequences of non-negative real numbers $\big(\alpha^{(i)}_{k,Y}(\Psi_\theta, \mathcal{K})\big)_{k\ge1}$ and $\big(\alpha^{(i)}_{k,X}(\Psi_\theta, \mathcal{K})\big)_{k\ge1}$ satisfying $\sum_{k=1}^{\infty} \alpha^{(i)}_{k,Y}(\Psi_\theta, \mathcal{K}) < \infty$ and $\sum_{k=1}^{\infty} \alpha^{(i)}_{k,X}(\Psi_\theta, \mathcal{K}) < \infty$, such that for any $(y, x), (y', x') \in \mathbb{R}^{\mathbb{N}} \times (\mathbb{R}^{d_x})^{\mathbb{N}}$,
$$\Big\|\frac{\partial^i \Psi_\theta(y, x)}{\partial\theta^i} - \frac{\partial^i \Psi_\theta(y', x')}{\partial\theta^i}\Big\|_{\mathcal{K}} \le \sum_{k=1}^{\infty} \alpha^{(i)}_{k,Y}(\Psi_\theta, \mathcal{K})\,|y_k - y'_k| + \sum_{k=1}^{\infty} \alpha^{(i)}_{k,X}(\Psi_\theta, \mathcal{K})\,\|x_k - x'_k\|,$$
where $\|\cdot\|$ denotes any vector or matrix norm.

The following assumption is considered on the function $H_\theta = M_\theta^2$ in the case of ARCH-X type processes.

Assumption A$_i(H_\theta, \mathcal{K})$ ($i = 0, 1, 2$): $f_\theta = 0$, and there exist two sequences of non-negative real numbers $\big(\alpha^{(i)}_{k,Y}(H_\theta, \mathcal{K})\big)_{k\ge1}$ and $\big(\alpha^{(i)}_{k,X}(H_\theta, \mathcal{K})\big)_{k\ge1}$ satisfying $\sum_{k=1}^{\infty} \alpha^{(i)}_{k,Y}(H_\theta, \mathcal{K}) < \infty$ and $\sum_{k=1}^{\infty} \alpha^{(i)}_{k,X}(H_\theta, \mathcal{K}) < \infty$, such that for any $(y, x), (y', x') \in \mathbb{R}^{\mathbb{N}} \times (\mathbb{R}^{d_x})^{\mathbb{N}}$,
$$\Big\|\frac{\partial^i H_\theta(y, x)}{\partial\theta^i} - \frac{\partial^i H_\theta(y', x')}{\partial\theta^i}\Big\|_{\mathcal{K}} \le \sum_{k=1}^{\infty} \alpha^{(i)}_{k,Y}(H_\theta, \mathcal{K})\,|y_k^2 - {y'_k}^2| + \sum_{k=1}^{\infty} \alpha^{(i)}_{k,X}(H_\theta, \mathcal{K})\,\|x_k - x'_k\|.$$
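The stability and asymptotic results below hinge on summable, and in fact polynomially decaying, Lipschitz coefficient sequences. As a quick numerical illustration with a hypothetical sequence $\alpha_k = C\,k^{-\gamma}$ (not derived from any specific model in the paper), the partial sums stabilize below 1, which is the kind of contraction bound required in the definition of $\Theta(r)$ below:

```python
import numpy as np

# Hypothetical Lipschitz-coefficient sequence alpha_k = C * k^(-gamma), gamma > 1,
# mimicking the polynomial decay imposed on alpha^{(i)}_{k,Y}, alpha^{(i)}_{k,X}.
def alpha_partial_sums(C, gamma, K=100000):
    k = np.arange(1, K + 1, dtype=float)
    return np.cumsum(C * k ** (-gamma))

# The sequence is summable (gamma > 1) and its total mass stays below 1,
# the contraction-type bound appearing in the definition of Theta(r).
s = alpha_partial_sums(C=0.3, gamma=1.6)
```

Increasing $\gamma$ makes the tail sums $\sum_{j\ge k} \alpha_j$ vanish faster, which is exactly what the decay conditions of the consistency theorems exploit.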
In the whole paper, we impose an autoregressive-type structure on the covariates:
$$X_t = g(X_{t-1}, X_{t-2}, \ldots; \eta_t), \qquad (2.1)$$
where $(\eta_t)_{t\in\mathbb{Z}}$ is a sequence of random variables such that $(\eta_t, \xi_t)_{t\in\mathbb{Z}}$ is i.i.d., and $g$ is a function with values in $\mathbb{R}^{d_x}$ satisfying $E[\|g(0; \eta_0)\|^r] < \infty$ and
$$\|g(x; \eta_0) - g(x'; \eta_0)\|_r \le \sum_{k=1}^{\infty} \alpha_k(g)\,\|x_k - x'_k\| \quad \text{for all } x, x' \in (\mathbb{R}^{d_x})^{\mathbb{N}}, \qquad (2.2)$$
for some $r \ge 1$ and a sequence of non-negative numbers $(\alpha_k(g))_{k\ge1}$ such that $\sum_{k=1}^{\infty} \alpha_k(g) < 1$. For $r \ge 1$, when (2.2) holds, we define the set
$$\Theta(r) = \Big\{\theta \in \mathbb{R}^d \,\Big/\, \text{A}_0(f_\theta, \{\theta\}) \text{ and } \text{A}_0(M_\theta, \{\theta\}) \text{ hold with } \sum_{k=1}^{\infty} \max\big\{\alpha_k(g),\ \alpha^{(0)}_{k,Y}(f_\theta, \{\theta\}) + \|\xi_0\|_r\,\alpha^{(0)}_{k,Y}(M_\theta, \{\theta\})\big\} < 1\Big\}$$
$$\bigcup \Big\{\theta \in \mathbb{R}^d \,\Big/\, f_\theta = 0 \text{ and } \text{A}_0(H_\theta, \{\theta\}) \text{ holds with } \|\xi_0\|_r^2 \sum_{k=1}^{\infty} \max\big\{\alpha_k(g),\ \alpha^{(0)}_{k,Y}(H_\theta, \{\theta\})\big\} < 1\Big\}.$$
In the sequel, we make the convention that if A$_i(M_\theta, \Theta)$ holds, then $\alpha^{(i)}_{k,Y}(H_\theta, \Theta) = \alpha^{(i)}_{k,X}(H_\theta, \Theta) = 0$ for all $k \in \mathbb{N}$; and if A$_i(H_\theta, \Theta)$ holds, then $\alpha^{(i)}_{k,Y}(M_\theta, \Theta) = \alpha^{(i)}_{k,X}(M_\theta, \Theta) = 0$ for all $k \in \mathbb{N}$.

The condition (2.2) ensures the stability of the process $X_t$. Together, the aforementioned assumptions ensure the existence of a stationary and weakly dependent solution of order $r$ to the model (1.1), as shown in the following proposition.

Proposition 2.1
Assume that A$_0(f_\theta, \Theta)$, A$_0(M_\theta, \Theta)$ (or A$_0(H_\theta, \Theta)$) and (2.2) hold. If $\theta^* \in \Theta \cap \Theta(r)$ with $r \ge 1$, then there exists a $\tau$-weakly dependent, stationary, ergodic and non-anticipative solution $(Z_t)_{t\in\mathbb{Z}}$, $Z_t = (Y_t, X_t')'$, to (1.1), satisfying $E[\|Z_0\|^r] < \infty$.

In this paragraph, we describe the use of the Gaussian quasi-maximum likelihood to obtain an estimator of the parameters of the model (1.1); the main asymptotic properties of this estimator are also established. Assume that the observations $(Y_1, X_1), \ldots, (Y_n, X_n)$ are generated from (1.1) and (2.1) according to the true parameter $\theta^* \in \Theta$, which is unknown. For all $t \in \mathbb{Z}$, denote by $\mathcal{F}_t = \sigma\big((Y_s, X_s),\ s \le t\big)$ the $\sigma$-field generated by the whole past at time $t$. The conditional mean and variance of $Y_t$ given $\mathcal{F}_{t-1}$ are $f_{\theta^*}(Y_{t-1}, \ldots; X_{t-1}, \ldots)$ and $M^2_{\theta^*}(Y_{t-1}, \ldots; X_{t-1}, \ldots)$, respectively. For any $\theta \in \Theta$, the conditional Gaussian quasi-log-likelihood is given by (up to an additive constant)
$$L_n(\theta) := -\frac{1}{2}\sum_{t=1}^{n} q_t(\theta) \quad \text{with} \quad q_t(\theta) = \frac{(Y_t - f_\theta^t)^2}{H_\theta^t} + \log H_\theta^t,$$
where $f_\theta^t := f_\theta(Y_{t-1}, Y_{t-2}, \ldots; X_{t-1}, X_{t-2}, \ldots)$, $M_\theta^t := M_\theta(Y_{t-1}, Y_{t-2}, \ldots; X_{t-1}, X_{t-2}, \ldots)$ and $H_\theta^t := (M_\theta^t)^2$. Since $(Y_0, X_0), (Y_{-1}, X_{-1}), \ldots$ are not observed, $L_n(\theta)$ is approximated by
$$\widehat{L}_n(\theta) = -\frac{1}{2}\sum_{t=1}^{n} \widehat{q}_t(\theta) \quad \text{with} \quad \widehat{q}_t(\theta) = \frac{(Y_t - \widehat{f}_\theta^t)^2}{\widehat{H}_\theta^t} + \log \widehat{H}_\theta^t,$$
where $\widehat{f}_\theta^t := f_\theta(Y_{t-1}, \ldots, Y_1, 0, \ldots; X_{t-1}, \ldots, X_1, 0, \ldots)$, $\widehat{M}_\theta^t := M_\theta(Y_{t-1}, \ldots, Y_1, 0, \ldots; X_{t-1}, \ldots, X_1, 0, \ldots)$ and $\widehat{H}_\theta^t := (\widehat{M}_\theta^t)^2$. Thus, the QMLE of $\theta^*$ is defined by
$$\widehat{\theta}_n = \underset{\theta \in \Theta}{\operatorname{argmax}}\ \widehat{L}_n(\theta).$$
We set the following regularity conditions to ensure the identifiability of the model and to derive the asymptotic behavior of the QMLE.

(A0): for all $\theta \in \Theta$ and some $t \in \mathbb{Z}$, $\big(f_{\theta^*}^t = f_\theta^t$ and $H_{\theta^*}^t = H_\theta^t$ a.s.$\big) \Rightarrow \theta = \theta^*$;

(A1): there exists $\underline{h} > 0$ such that $\inf_{\theta \in \Theta} H_\theta(y, x) \ge \underline{h}$ for all $(y, x) \in \mathbb{R}^{\mathbb{N}} \times (\mathbb{R}^{d_x})^{\mathbb{N}}$;

(A2): for all $\theta \in \Theta$ and $c \in \mathbb{R}^d$, $\big(c' \frac{\partial}{\partial\theta} f_{\theta^*}^0 = 0$ or $c' \frac{\partial}{\partial\theta} H_{\theta^*}^0 = 0$ a.s.$\big) \Rightarrow c = 0$, where $'$ denotes the transpose.

Assumption (A0) is an identifiability condition; it will be discussed in detail for each of the examples of processes studied in the paper. From (A1), the quasi-likelihood is well defined, whereas (A2), which is classical (see for instance Bardet and Wintenberger (2009)), allows us to derive the asymptotic distribution of the QMLE. The following theorem addresses the strong consistency of the QMLE.

Theorem 2.2
Assume that (A0), (A1), A$_0(f_\theta, \Theta)$, A$_0(M_\theta, \Theta)$ (or A$_0(H_\theta, \Theta)$) and (2.2) (with $r \ge 2$) hold with
$$\alpha^{(0)}_{k,Y}(f_\theta, \Theta) + \alpha^{(0)}_{k,X}(f_\theta, \Theta) + \alpha^{(0)}_{k,Y}(M_\theta, \Theta) + \alpha^{(0)}_{k,X}(M_\theta, \Theta) + \alpha^{(0)}_{k,Y}(H_\theta, \Theta) + \alpha^{(0)}_{k,X}(H_\theta, \Theta) = O(k^{-\gamma}), \qquad (2.3)$$
for some $\gamma > 3/2$. If $\theta^* \in \Theta \cap \Theta(r)$ with $r \ge 2$, then $\widehat{\theta}_n \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} \theta^*$.

To derive the asymptotic distribution of the QMLE, it is necessary to take into account the constraints on the parameter space $\Theta$ corresponding to the model. For example, in some processes belonging to (1.1), such as the ARCH-X models (see below), the components of $\theta^*$ are constrained to be positive or equal to zero. In order to propose a parsimonious representation, it is often required to test whether or not the exogenous covariates are relevant. For example, in an ARCH(1)-X model defined by $Y_t = \xi_t \sigma_t$ with $\sigma_t^2 = \alpha_0^* + \alpha_1^* Y_{t-1}^2 + \gamma^{*\prime} X_{t-1}$, the true parameter vector is $\theta^* = (\alpha_0^*, \alpha_1^*, \gamma^*) \in \Theta \subset\ ]0, \infty[\ \times [0, \infty[^{d_x+1}$. The significance test of the covariate $X_t$ consists in verifying the nullity of the parameter $\gamma^*$; that is, whether the true parameter vector can be of the form $\theta = (\alpha_0, \alpha_1, 0)$, which is not an interior point of $\Theta$. In this situation, it is impossible to apply the asymptotic normality results based on the classical "interior point" assumption to derive the asymptotic behavior of the test statistic used. To take such a scenario into account in the general class (1.1), we will consider that the component $i$ of $\theta^*$ is constrained if the $i$-th section of $\Theta$ is of the form $[\underline{\theta}_i, \overline{\theta}_i]$ with $\underline{\theta}_i < \overline{\theta}_i$. Assume that the $d_2$ (with $d_2 \in \{0, \ldots, d\}$) last components of $\theta^*$ are constrained, and let $d_1 = d - d_2$. Therefore, if $d_2 \ge 1$ and $\theta_i^* \in \{\underline{\theta}_i, \overline{\theta}_i\}$ for some $i > d_1$, then $\theta^*$ is not an interior point of $\Theta$. For instance, in a scenario where $\theta_i^* = \underline{\theta}_i$, with the QMLE $\widehat{\theta}_n = (\widehat{\theta}_{1,n}, \ldots, \widehat{\theta}_{d,n})$, it holds that $\sqrt{n}(\widehat{\theta}_{i,n} - \theta_i^*) \in [0, \infty)$, which cannot tend to a Gaussian distribution with mean 0. By convention, it is assumed that $\theta^* \in \overset{\circ}{\Theta}$ if $d_2 = 0$. When $d_2 \ge 1$, we have
$$\bigcup_{n \ge 1} \big\{\sqrt{n}(\theta - \theta^*),\ \theta \in \Theta\big\} = \mathcal{C} \quad \text{with} \quad \mathcal{C} = \prod_{i=1}^{d} \mathcal{C}_i, \qquad (2.4)$$
where $\mathcal{C}_i = [0, \infty[$ when $i > d_1$ and $\theta_i^* = \underline{\theta}_i$, $\mathcal{C}_i = ]-\infty, 0]$ when $i > d_1$ and $\theta_i^* = \overline{\theta}_i$, and $\mathcal{C}_i = \mathbb{R}$ otherwise. The set $\mathcal{C}$ is a convex cone, which is equal to $\mathbb{R}^d$ if $\theta^* \in \overset{\circ}{\Theta}$.

Let us define the matrices
$$F = E\Big[\frac{\partial^2 q_0(\theta^*)}{\partial\theta\,\partial\theta'}\Big] \quad \text{and} \quad G = E\Big[\frac{\partial q_0(\theta^*)}{\partial\theta}\,\frac{\partial q_0(\theta^*)}{\partial\theta'}\Big]. \qquad (2.5)$$
Under the assumptions A$_i(f_\theta, \Theta)$, A$_i(M_\theta, \Theta)$ (with $i = 0, 1, 2$), the matrices $F$ and $G$ exist. In addition, in view of (A2), the same arguments as in Bardet and Wintenberger (2009) allow us to establish that the matrix $F$ is positive definite. Consider then the $F$-scalar product $\langle x, y \rangle_F = x' F y$ and the norm $\|x\|_F^2 = x' F x$ for $x, y \in \mathbb{R}^d$. Let us define the $F$-orthogonal projection of a vector $Z \in \mathbb{R}^d$ on the cone $\mathcal{C}$ as
$$Z^{\mathcal{C}} = \underset{C \in \mathcal{C}}{\operatorname{arginf}}\ \|C - Z\|_F.$$
This definition is equivalent to $Z^{\mathcal{C}} \in \mathcal{C}$ with
$$\langle Z - Z^{\mathcal{C}},\ C - Z^{\mathcal{C}} \rangle_F \le 0, \quad \forall\, C \in \mathcal{C}. \qquad (2.6)$$
Note that, when $\theta^* \in \overset{\circ}{\Theta}$, we have $Z^{\mathcal{C}} = Z$. Combining all the regularity conditions and definitions given above, we obtain the following main result.

Theorem 2.3
Assume that (A0)-(A2), A$_i(f_\theta, \Theta)$, A$_i(M_\theta, \Theta)$ (for $i = 0, 1, 2$) and (2.2) (with $r \ge 4$) hold with
$$\alpha^{(i)}_{k,Y}(f_\theta, \Theta) + \alpha^{(i)}_{k,X}(f_\theta, \Theta) + \alpha^{(i)}_{k,Y}(M_\theta, \Theta) + \alpha^{(i)}_{k,X}(M_\theta, \Theta) + \alpha^{(i)}_{k,Y}(H_\theta, \Theta) + \alpha^{(i)}_{k,X}(H_\theta, \Theta) = O(k^{-\gamma}), \qquad (2.7)$$
for $i = 0, 1, 2$ and some $\gamma > 3/2$.
• If $\theta^* \in \Theta \cap \Theta(r)$ with $r \ge 4$, then $\sqrt{n}\big(\widehat{\theta}_n - \theta^*\big) \underset{n\to\infty}{\overset{\mathcal{D}}{\longrightarrow}} Z^{\mathcal{C}}$ with $Z \sim \mathcal{N}_d(0, \Sigma)$, where $\Sigma := F^{-1} G F^{-1}$.
• If $\theta^* \in \overset{\circ}{\Theta} \cap \Theta(r)$ with $r \ge 4$, then $\widehat{\theta}_n - \theta^* = O\Big(\sqrt{\tfrac{\log\log n}{n}}\Big)$ a.s.

The matrix $\Sigma$ can be consistently estimated by $\widehat{\Sigma}_n = F_n(\widehat{\theta}_n)^{-1} G_n(\widehat{\theta}_n) F_n(\widehat{\theta}_n)^{-1}$, where
$$F_n(\widehat{\theta}_n) = \frac{1}{n}\sum_{t=1}^{n} \frac{\partial^2 q_t(\widehat{\theta}_n)}{\partial\theta\,\partial\theta'} \quad \text{and} \quad G_n(\widehat{\theta}_n) = \frac{1}{n}\sum_{t=1}^{n} \frac{\partial q_t(\widehat{\theta}_n)}{\partial\theta}\,\frac{\partial q_t(\widehat{\theta}_n)}{\partial\theta'}.$$

Now we investigate whether or not a given subset of components of $\theta^*$ is equal to some fixed vector. To do so, consider the following hypothesis test:
$$H_0:\ \Gamma\theta^* = \vartheta \quad \text{against} \quad H_1:\ \Gamma\theta^* \ne \vartheta, \qquad (2.8)$$
where $\Gamma$ is a $d_0 \times d$ full-rank matrix and $\vartheta$ is a vector of dimension $d_0$. Define the Wald-type test statistic
$$W_n = n\,(\Gamma\widehat{\theta}_n - \vartheta)'(\Gamma\widehat{\Sigma}_n\Gamma')^{-1}(\Gamma\widehat{\theta}_n - \vartheta). \qquad (2.9)$$
Under $H_0$, the asymptotic behavior of $W_n$ is given by the following theorem.

Theorem 2.4
Under $H_0$, assume that the assumptions of Theorem 2.3 hold. Then
$$W_n \underset{n\to\infty}{\overset{\mathcal{D}}{\longrightarrow}} (\Gamma Z^{\mathcal{C}})'(\Gamma\Sigma\Gamma')^{-1}\Gamma Z^{\mathcal{C}} \quad \text{with} \quad Z \sim \mathcal{N}_d(0, \Sigma).$$

By the above theorem, at a nominal level $\alpha \in (0, 1)$, $H_0$ is rejected when $W_n > q_\alpha$, where $q_\alpha$ is the $(1-\alpha)$-quantile of the distribution of $(\Gamma Z^{\mathcal{C}})'(\Gamma\Sigma\Gamma')^{-1}\Gamma Z^{\mathcal{C}}$. The critical value $q_\alpha$ can be computed through Monte Carlo simulations. The following corollary follows immediately when $\theta^*$ belongs to the interior of the parameter space.

Corollary 2.5 Assume that the conditions of Theorem 2.4 hold. If $\theta^* \in \overset{\circ}{\Theta}$, then $W_n$ converges to a chi-square distribution with $d_0$ degrees of freedom.

Under $H_1$, one can easily see that $W_n \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} +\infty$, which shows that the test is consistent in power. In the empirical studies, we will restrict our attention to testing the relevance of the exogenous covariates, using the hypotheses (2.8) with $\vartheta = 0$ and an appropriate matrix $\Gamma$.

Assume that $(Y_1, \ldots, Y_n)$ is a trajectory of the process $Y = \{Y_t,\ t \in \mathbb{Z}\}$ satisfying AC-X$(M_{\theta^*}, f_{\theta^*})$ (defined as in (1.1)), where the true parameter $\theta^*$ is unknown. Let $\mathcal{M}$ be a finite collection of models belonging to AC-X$(M_\theta, f_\theta)$ with $\theta \in \Theta$; assume that $\mathcal{M}$ contains at least the true model $m^*$ corresponding to the parameter $\theta^*$. Our objective is to develop a procedure that selects a "best model" (denoted $\widehat{m}_n$) among the collection $\mathcal{M}$, which is "close" to $m^*$ for $n$ large enough. To this end, we consider the following definitions and notations in the sequel:
• a model $m \in \mathcal{M}$ is considered as a subset of $\{1, \ldots, d\}$, and we denote by $|m|$ the dimension of $m$ (i.e., $|m| = \#(m)$);
• for $m \in \mathcal{M}$, $\Theta_m = \{(\theta_i)_{1 \le i \le d} \in \Theta$ with $\theta_i = 0$ if $i \notin m\}$ is a compact set containing $\theta(m)$, where $\theta(m)$ denotes the parameter vector associated with the model $m$;
• $\mathcal{M}$ is considered as a subset of the power set of $\{1, \ldots, d\}$; that is, $\mathcal{M} \subseteq \mathcal{P}(\{1, \ldots, d\})$.
For instance, when the observations $Y_1, \ldots, Y_n$ are generated from an ARMAX$(p^*, q^*, s^*)$ model (defined below), the collection $\mathcal{M}$ of competing models could be a family of ARMAX$(p, q, s)$ models with $(p, q, s) \in \{0, 1, \ldots, p_{\max}\} \times \{0, 1, \ldots, q_{\max}\} \times \{0, 1, \ldots, s_{\max}\}$, where $p_{\max}, q_{\max}, s_{\max}$ are fixed upper bounds of the orders satisfying $p_{\max} \ge p^*$, $q_{\max} \ge q^*$, $s_{\max} \ge s^*$. The parameter space $\Theta$ is then a compact subset of $\mathbb{R}^{p_{\max}+q_{\max}+s_{\max}}$, and thus a model $m$ is a subset of $\{1, 2, \ldots, p_{\max}+q_{\max}+s_{\max}\}$.

Note that, under the identifiability assumption (A0), one can show that, for all $m \in \mathcal{M}$, the function $\theta \mapsto -E[q_0(\theta)]$ has a unique maximum on $\Theta_m$ (see the proof of Theorem 2.2). Let us thus define the "best" parameter associated with the model $m$ as
$$\theta^*(m) := \underset{\theta \in \Theta_m}{\operatorname{argmin}}\ E[q_0(\theta)].$$
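To make the quasi-likelihood machinery concrete: for an ARX(1) model with constant conditional variance (a special case of (1.1) with $M_\theta$ constant), maximizing the Gaussian quasi-log-likelihood over the mean parameters reduces to least squares. The sketch below uses hypothetical parameter values and recovers them from a simulated trajectory:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative ARX(1) model with constant conditional variance (hypothetical
# parameter values): Y_t = a*Y_{t-1} + g*X_{t-1} + xi_t, a special case of (1.1).
a_true, g_true, n = 0.5, 0.8, 5000
X = rng.normal(size=n)
Y = np.zeros(n)
for t in range(1, n):
    Y[t] = a_true * Y[t - 1] + g_true * X[t - 1] + rng.normal()

# With M_theta constant, maximizing the Gaussian quasi-log-likelihood over
# (a, g) amounts to minimizing the sum of squared residuals, i.e. ordinary
# least squares on the lagged regressors.
Z = np.column_stack([Y[:-1], X[:-1]])
theta_hat, *_ = np.linalg.lstsq(Z, Y[1:], rcond=None)
```

For models where $M_\theta$ depends on $\theta$ (GARCH-X type), the same contrast $\widehat{L}_n$ has to be maximized numerically instead; this closed-form case is only meant to illustrate the estimator's behavior.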
When $m \supseteq m^*$, we have $\theta^*(m) = \theta^*(m^*) = \theta^*$; that is, $\theta^*(m)$ plays the role of the true parameter $\theta^*$ in the cases of the "true" or an overfitted model. For $m \in \mathcal{M}$, we define the QMLE of $\theta^*(m)$ as
$$\widehat{\theta}(m) := \underset{\theta \in \Theta_m}{\operatorname{argmax}}\ \widehat{L}_n(\theta). \qquad (3.1)$$
Now, define the penalized criterion
$$\widehat{C}_n(m) := -2\widehat{L}_n(\widehat{\theta}(m)) + \kappa_n |m|, \quad \text{for all } m \in \mathcal{M}, \qquad (3.2)$$
where $(\kappa_n)_{n\in\mathbb{N}}$ is an increasing sequence of regularization parameters (possibly data-dependent) used to calibrate the penalty term, and $|m|$ is the number of non-zero components of $\theta^*(m) \in \Theta_m$, called the dimension of the model $m$. The selected "best" model $\widehat{m}_n$ is then obtained by minimizing the penalized contrast; that is,
$$\widehat{m}_n := \underset{m \in \mathcal{M}}{\operatorname{argmin}}\ \widehat{C}_n(m). \qquad (3.3)$$
Using the results of Theorems 2.2 and 2.3, we establish the asymptotic behavior of the model selection procedure, as shown in the following theorem.

Theorem 3.1
Let ( Y , . . . , Y n ) be a trajectory of a process belonging to AC - X ( M θ ∗ , f θ ∗ ) , where θ ∗ ∈ Θ ∩ Θ( r ) with r > . Assume that ( A0 )-( A2 ), ( A i ( f θ , Θ) ), ( A i ( M θ , Θ) ) (or ( A i ( H θ , Θ) )) (for i = 0 , , ) and (2.2)(with r > ) hold with κ n /n −→ n →∞ . Suppose that when θ ∗ ∈ ◦ Θ , (cid:88) k ≥ √ k log log k (cid:88) j ≥ k (cid:88) i =0 (cid:8) α ( i ) j,Y ( f θ , Θ) + α ( i ) j,X ( f θ , Θ) + α ( i ) j,Y ( M θ , Θ) + α ( i ) j,X ( M θ , Θ)+ α ( i ) j,Y ( H θ , Θ) + α ( i ) j,X ( H θ , Θ) (cid:9) < ∞ . (3.4)(i.) If κ n / √ log log n −→ n →∞ ∞ , then (cid:98) m n P −→ n →∞ m ∗ . (ii.) When θ ∗ ∈ ◦ Θ , there exists a constant c such that if lim inf n →∞ ( κ n / log log n ) > c , then (cid:98) m n a.s. −→ n →∞ m ∗ . (iii.) If θ ∗ ∈ ◦ Θ and (3.4) holds, then (cid:98) θ ( (cid:98) m n ) − θ ∗ = O Å… log log nn ã . Remark that, if (cid:80) i =0 (cid:8) α ( i ) j,Y ( f θ , Θ) + α ( i ) j,X ( f θ , Θ) + α ( i ) j,Y ( M θ , Θ) + α ( i ) j,X ( M θ , Θ) + α ( i ) j,Y ( H θ , Θ) + α ( i ) j,X ( H θ , Θ) (cid:9) = O ( j − γ ) for some γ > /
2, then (3.4) is satisfied. The first and second parts of Theorem 3.1 show the consistencyof the selection procedure; in particular, the second part provides sufficient conditions for the consistency ofthe HQC procedure. The last part establishes that the estimator of the parameter of the selected model (cid:98) θ ( (cid:98) m n )obeys the law of iterated logarithm. iop and Kengne In this section, we detail some particular processes satisfying the class (1.1). We show that the regularityconditions required for the main results are satisfied for these processes, with a particular emphasis on theidentifiability assumption. For each example discussed, we consider that X t = ( X ,t , X ,t , . . . , X d x ,t ) ∈ R d x ( d x ∈ N ) represents a vector of covariates; and ( ξ t ) t ∈ Z is a sequence of zero-mean i.i.d. random variablesatisfying E ( ξ r ) < ∞ for some r ≥ E ( ξ ) = 1. ( ∞ ) models As first example, consider a linear ARMAX( p ∗ , q ∗ , s ∗ ) model defined by Y t = p ∗ (cid:88) i =1 α ∗ i Y t − i + ξ t + q ∗ (cid:88) i =1 β ∗ i ξ t − i + s ∗ (cid:88) i =1 γ ∗(cid:48) i X t − i , ∀ t ∈ Z , (4.1)where α ∗ i , β ∗ i ∈ R (for 1 ≤ i ≤ p ∗ , for 1 ≤ i ≤ q ∗ ) and γ ∗ i ∈ R d x (for 1 ≤ i ≤ s ∗ ). This model has beeninvestigated by several authors and various issues have been addressed; see, among others, Hannan et al. (1980),Hannan and Deistle (1988), Hannan and Deistler (2012) (within the multivariate framework), and Bierens (1994)(within the linear and nonlinear frameworks). We also refer to Hoque and Peters (1986) who carried out theARMAX(1 , ,
1) process. The true parameter of the model is θ ∗ = ( α ∗ , . . . , α ∗ p ∗ , β ∗ , . . . , β ∗ q ∗ , γ ∗ , . . . , γ ∗ s ∗ ) ∈ Θ,where Θ is a compact set such as: for all θ = ( α , . . . , α p ∗ , β , . . . , β q ∗ , γ , . . . , γ s ∗ ) ∈ Θ, p ∗ (cid:80) i =1 | α i | + q ∗ (cid:80) i =1 | β i | < θ ∈ Θ: A θ ( L ) = 1 − p ∗ (cid:88) i =1 α i L i , B θ ( L ) = 1 + q ∗ (cid:88) j =1 β j L j and C θ ( L ) = s ∗ (cid:88) k =1 γ ∗(cid:48) k L k , where L denotes the lag operator. The polynomial B θ ∗ ( L ) is invertible and the model (4.1) can be rewritten as A θ ∗ ( L ) B θ ∗ ( L ) Y t = ξ t + C θ ∗ ( L ) B θ ∗ ( L ) X t , ∀ t ∈ Z . (4.2)For any θ ∈ Θ, again from the invertibility of B θ ( L ), one can find two sequences ( φ k ( θ )) k ∈ N and ( ϕ k ( θ )) k ∈ N such that A θ ( L ) B θ ( L ) = 1 − (cid:88) k ≥ φ k ( θ ) L k and C θ ( L ) B θ ( L ) = (cid:88) k ≥ ϕ (cid:48) k ( θ ) L k . Thus, (4.2) is equivalent to Y t = (cid:88) k ≥ φ k ( θ ∗ ) Y t − k + (cid:88) k ≥ ϕ (cid:48) k ( θ ∗ ) X t − k + ξ t , ∀ t ∈ Z , (4.3)which shows that this process belongs to the class AC - X ( M θ ∗ , f θ ∗ ), with M tθ ≡ f tθ = (cid:80) k ≥ φ i ( θ ) Y t − k + (cid:80) i ≥ k ϕ (cid:48) k ( θ ) X t − k , for any θ ∈ Θ. We deduce that the Lipschitz coefficients are α (0) k,Y ( M θ , { θ } ) = α (0) k,X ( M θ , { θ } ) = 0, α (0) k,Y ( f θ , { θ } ) = | φ k ( θ ) | and α (0) k,X ( f θ , { θ } ) = (cid:107) ϕ k ( θ ) (cid:107) for all k ≥
1. The stationarity set for any r ≥ r ) = (cid:8) θ ∈ R p ∗ + q ∗ + s ∗ × d x (cid:14) (cid:88) k ≥ max { α k ( g ) , | φ k ( θ ) |} < (cid:9) . The assumption ( A1 ) holds with h = 1. To assure the correctness of this ARMAX model specification, weimpose the following additional conditions:0 Inference and model selection in general causal time series with exogenous covariates ( B0 ): E [ ξ t X t (cid:48) ] = 0 for all ( t, t (cid:48) ) ∈ Z ;( B1 ): A θ ∗ ( z ) and B θ ∗ ( z ) have no common root;( B2 ): if ( c k ) k ∈ N is a sequence of vector of R d x such as ∃ c k (cid:54) = 0 (with k ∈ N ), then (cid:80) k ≥ c (cid:48) k X t − k is non-degenerate.Observe that, the assumption ( B0 ) excludes situations where the covariate c (cid:48) X t is replaced by Y ν ( t ) for someconstant c ∈ R d x and a function ν : Z → Z satisfying ν ( t ) ≤ t for all t ∈ Z ; in particular, the case c (cid:48) X t = Y t is excluded. Under the conditions ( B0 )-( B2 ), one can easily prove that this model is identifiable. Indeed, let( t, θ ) ∈ Z × Θ such as f tθ ∗ = f tθ a.s. Let us show that θ = θ ∗ .When f tθ ∗ = f tθ a.s., it follows that (cid:88) k ≥ (cid:0) φ k ( θ ∗ ) − φ k ( θ ) (cid:1) Y t − k = (cid:88) k ≥ (cid:0) ϕ (cid:48) k ( θ ) − ϕ (cid:48) k ( θ ∗ ) (cid:1) X t − k . (4.4)Let us first prove by contradiction that φ k ( θ ∗ ) = φ k ( θ ) , ∀ k ≥
1. Let m ≥ φ m ( θ ∗ ) (cid:54) = φ m ( θ ). Then, (4.4) gives Y t − m = 1 φ m ( θ ) − φ m ( θ ∗ ) ß (cid:88) k>m (cid:0) φ k ( θ ∗ ) − φ k ( θ ) (cid:1) Y t − k + (cid:88) k ≥ (cid:0) ϕ (cid:48) k ( θ ∗ ) − ϕ (cid:48) k ( θ ) (cid:1) X t − k ™ . This implies that from ( B0 ), E [ ξ t − m Y t − m ] = 1 φ m ( θ ) − φ m ( θ ∗ ) (cid:88) k>m (cid:0) φ k ( θ ∗ ) − φ k ( θ ) (cid:1) E [ ξ t − m Y t − k ] . (4.5)Note that A θ ∗ ( L ) is also invertible since θ ∗ ∈ Θ. Therefore, (4.1) is equivalent to Y t = B θ ∗ ( L ) A θ ∗ ( L ) ξ t + C θ ∗ ( L ) A θ ∗ ( L ) X t = ξ t + (cid:88) i ≥ a i ( θ ∗ ) ξ t − i + (cid:88) i ≥ b i (cid:48) ( θ ∗ ) X t − i , ∀ t ∈ Z , where ( a i ( θ ∗ )) i ∈ N and ( b i ( θ ∗ )) i ∈ N are two sequences of real numbers and vectors. Thus, from ( B0 ) and theproprieties of ( ξ t ) t ∈ Z , we deduce: E [ ξ t Y t ] = 1 for any t ∈ Z and E [ ξ t Y t (cid:48) ] = 0 for any t (cid:48) < t . It comes from (4.5)that E [ ξ t − m Y t − m ] = 0, which is impossible. This establishes that φ k ( θ ∗ ) = φ k ( θ ) , ∀ k ≥
1. Consequently, A θ ∗ ( L ) B θ ∗ ( L ) = A θ ( L ) B θ ( L ) and (cid:88) i ≥ (cid:0) ϕ (cid:48) i ( θ ∗ ) − ϕ (cid:48) i ( θ ) (cid:1) X t − i = 0 , where the last equality is obtained from the relation (4.4). In view of ( B1 ) and ( B2 ), this implies that A θ ∗ ( L ) = A θ ( L ), B θ ∗ ( L ) = B θ ( L ) and C θ ∗ ( L ) = C θ ( L ), which proves that θ = θ ∗ .In general, one can define the ARX( ∞ ) process as follows: Y t = ψ ( θ ∗ ) + (cid:88) i ≥ ψ i ( θ ∗ ) Y t − i + (cid:88) i ≥ γ (cid:48) i ( θ ∗ ) X t − i + ξ t , ∀ t ∈ Z , (4.6)where ( ψ i ( θ ∗ )) i ≥ and ( γ (cid:48) i ( θ ∗ )) i ≥ are two sequences, and θ ∗ is the true parameter. If (cid:80) k ≥ max { α k ( g ) , | ψ k ( θ ∗ ) |} <
1, there exists a stationary and ergodic solution with r -order moment. If the assumption ( B0 ) holds for (4.6),then one can go along similar lines as in the model (4.2) to show that a sufficient condition for the model(4.6) to be identifiable is that the function θ (cid:55)→ ψ i ( θ ) (for i ≥
0) is injective. The condition (A1) holds if $\inf_{\theta\in\Theta}\psi(\theta) > 0$.

ARCH-X($\infty$) models. Consider an ARCH-X($\infty$) model defined by
\[ Y_t = \xi_t\,\sigma_t(\theta^*) \quad\text{with}\quad \sigma_t^2(\theta^*) = \phi_0(\theta^*) + \sum_{k=1}^{\infty}\phi_k(\theta^*)\,Y_{t-k}^2 + \sum_{k=1}^{\infty}\gamma_k'(\theta^*)\,X_{t-k}, \quad \forall t\in\mathbb{Z}, \tag{4.7} \]
where $\theta^*$ is the true parameter, $(\phi_k(\theta^*))_{k\ge1}$ and $(\gamma_k(\theta^*))_{k\ge1}$ are two non-negative (componentwise for $\gamma_k(\theta^*)$) sequences with $\phi_0(\theta^*)\neq 0$, and $X_t$ is a vector of non-negative (componentwise) covariates. This model is an example of the class (1.1) with $f_\theta^t = 0$ and $M_\theta^t = \sigma_t(\theta)$. Assume that $\theta^*\in\Theta$, where $\Theta$ is a compact set defined by
\[ \Theta = \Big\{\theta\in\mathbb{R}^d \ \Big/\ \sum_{k\ge1}\max\{\alpha_k(g),\,|\phi_k(\theta)|\} < 1 \ \text{and}\ \sum_{k\ge1}\|\gamma_k(\theta)\| < \alpha_U \ \text{for some}\ \alpha_U > 0\Big\}. \]
Therefore, A($H_\theta$, $\Theta$) holds and $\Theta(2) = \Theta$. Assume that $\inf_\theta \phi_0(\theta) > 0$, which ensures that assumption (A1) is satisfied. Now, denote by $\mathcal F_{t,i}$ the $\sigma$-field generated by $\{\xi_{t-j},\, j > i;\ X_{t-k},\, k > 0\}$. The following assumptions are needed to ensure the identifiability.
(B3): the support of the distribution of $\xi_{t-i}$ given $\mathcal F_{t,i}$ contains at least three points;
(B4): $\theta\mapsto\phi_k(\theta)$ is an injective function for some $k\ge1$;
(B5): $\theta\mapsto\phi_0(\theta)$ (or $\theta\mapsto\gamma_k(\theta)$ for some $k\ge1$) is an injective function, and the condition (B2) holds for this model.
Note that (B3) is a required identifiability condition which entails that $\xi_{t-i}$ given $\mathcal F_{t,i}$ is a non-degenerate random variable. The assumptions (B4) and (B5) are not both necessary: the model is identifiable if (B3) and ((B4) or (B5)) hold. Indeed, let $\theta\in\Theta$ be such that $H_\theta^t = H_{\theta^*}^t$ a.s. Then
\[ \sum_{k=1}^{\infty}\big(\phi_k(\theta^*)-\phi_k(\theta)\big)Y_{t-k}^2 = \phi_0(\theta)-\phi_0(\theta^*) + \sum_{k=1}^{\infty}\big(\gamma_k'(\theta)-\gamma_k'(\theta^*)\big)X_{t-k}. \tag{4.8} \]
By contradiction, assume that $\phi_k(\theta^*)\neq\phi_k(\theta)$ for some $k\ge1$, and let $m\ge1$ be the smallest integer such that $\phi_m(\theta^*)\neq\phi_m(\theta)$. Therefore,
\[ \big(\phi_m(\theta^*)-\phi_m(\theta)\big)Y_{t-m}^2 = \phi_0(\theta)-\phi_0(\theta^*) + \sum_{k>m}\big(\phi_k(\theta)-\phi_k(\theta^*)\big)Y_{t-k}^2 + \sum_{k=1}^{\infty}\big(\gamma_k'(\theta)-\gamma_k'(\theta^*)\big)X_{t-k}, \]
and (4.7) yields
\[ \xi_{t-m}^2 = \frac{1}{\big(\phi_m(\theta^*)-\phi_m(\theta)\big)\,\sigma_{t-m}^2(\theta^*)}\Big[\phi_0(\theta)-\phi_0(\theta^*) + \sum_{k>m}\big(\phi_k(\theta)-\phi_k(\theta^*)\big)Y_{t-k}^2 + \sum_{k=1}^{\infty}\big(\gamma_k'(\theta)-\gamma_k'(\theta^*)\big)X_{t-k}\Big]. \tag{4.9} \]
By a recursive substitution, one can easily see that for all $k > m$, $Y_{t-k}$ is $\mathcal F_{t,m}$-measurable, and thus so is the right-hand side of (4.9). This implies that $\xi_{t-m}$ given $\mathcal F_{t,m}$ is degenerate, which contradicts the assumption (B3); hence $\phi_k(\theta^*) = \phi_k(\theta)$ for all $k\ge1$. We deduce that:
• if (B4) holds, then $\theta = \theta^*$;
• else, if (B5) holds, we have from (4.8), $\sum_{k=1}^{\infty}\big(\gamma_k'(\theta^*)-\gamma_k'(\theta)\big)X_{t-k} = \phi_0(\theta)-\phi_0(\theta^*)$, which implies $\gamma_k(\theta^*)=\gamma_k(\theta)$ for all $k\ge1$ (by the condition (B2) in (B5)); and consequently, $\phi_0(\theta)=\phi_0(\theta^*)$. Hence, $\theta = \theta^*$.
Note that the GARCH-X process already introduced in the literature and widely studied by several authors (see, for instance, Han and Kristensen (2014), Nana et al. (2013) and Han (2015)) is a particular case of the ARCH-X($\infty$) process considered here. Let us stress that the i.i.d. assumption for $(\xi_t)_{t\in\mathbb{Z}}$ is a bit strong for the model (4.7). This assumption, which is needed for the large class AC-X($M_{\theta^*}, f_{\theta^*}$), can be relaxed to $(\xi_t,\mathcal F_{t,0})_{t\in\mathbb{Z}}$ being a martingale difference sequence (see, for instance, Francq and Thieu (2019) in the case of the GARCH-X model) when checking the identifiability. In the absence of covariates (i.e., the case of the standard ARCH($\infty$) model), the assumption (B3) can be reduced to: "$\xi_t$ is a non-degenerate random variable"; see, for instance, Francq and Zakoïan (2004) and Berkes et al. (2003) for works where such a condition is assumed to address the identifiability of the standard GARCH model. With the assumption (B3), feedback from $Y_t$ to $X_t$ (for instance, $c'X_{t-1} = Y_{t-i}$ with $c\in\mathbb{R}^{d_x}$) is excluded, because this would contradict the non-degeneracy of the variable $\xi_{t-i}\,|\,\mathcal F_{t,i}$.

Now, we propose an extension of the AR-GARCH processes (see Ling and McAleer (2003)) to processes with exogenous covariates. Define the ARX(1)-GARCH(1,
1) process by
\[ \begin{cases} Y_t = a^* Y_{t-1} + \gamma^{*\prime} X_{t-1} + \varepsilon_t \\ \varepsilon_t = \xi_t\,\sigma_t \ \text{with}\ \sigma_t^2 = c_0^* + c_1^*\,\varepsilon_{t-1}^2 + d^*\,\sigma_{t-1}^2, \end{cases} \tag{4.10} \]
where $|a^*| < 1$, $c_0^* > 0$, $0\le c_1^* + d^* < 1$ and $\gamma^*$ is a vector of a compact set included in $\mathbb{R}^{d_x}$ ($d_x\in\mathbb{N}$). The true parameter of the model is $\theta^* = (a^*, c_0^*, c_1^*, d^*, \gamma^*)$. Since $d^*\in(0,1)$, we have $\sigma_t^2 = c_0^*/(1-d^*) + c_1^*\sum_{k=1}^{\infty} d^{*k-1}\varepsilon_{t-k}^2$. Then, one can see that the model (4.10) belongs to the class AC-X($M_{\theta^*}, f_{\theta^*}$) with $f_\theta^t = aY_{t-1} + \gamma' X_{t-1}$ and
\[ M_\theta^t = \sqrt{\frac{c_0}{1-d} + c_1\sum_{k\ge1} d^{\,k-1}\big(Y_{t-k} - aY_{t-k-1} - \gamma' X_{t-k-1}\big)^2}, \]
for all $\theta = (a, c_0, c_1, d, \gamma)\in\Theta\subset\mathbb{R}^{d_x+4}$. The assumption A($f_\theta$, $\{\theta\}$) holds with $\alpha^{(0)}_{1,Y}(f_\theta,\{\theta\}) = |a|$, $\alpha^{(0)}_{1,X}(f_\theta,\{\theta\}) = \|\gamma\|$ and $\alpha^{(0)}_{k,Y}(f_\theta,\{\theta\}) = \alpha^{(0)}_{k,X}(f_\theta,\{\theta\}) = 0$ for $k\ge2$. Simple computations show that the Lipschitz coefficients of $M_\theta^t$ are: $\alpha^{(0)}_{1,Y}(M_\theta,\{\theta\}) = c_1$ and $\alpha^{(0)}_{k,Y}(M_\theta,\{\theta\}) = c_1\big(d^{\,k-1} + |a|\,d^{\,k-2}\big)$ for $k\ge2$; $\alpha^{(0)}_{1,X}(M_\theta,\{\theta\}) = 0$ and $\alpha^{(0)}_{k,X}(M_\theta,\{\theta\}) = c_1\|\gamma\|\,d^{\,k-2}$ for $k\ge2$. For example, if the components of $X_t$ satisfy a Markov structure, $X_{i,t} = \varphi_i X_{i,t-1} + \eta_{i,t}$ for all $1\le i\le d_x$ and $t\in\mathbb{Z}$, where $|\varphi_i| < 1$ and $\eta_{i,t}$ is a white noise, then the stability condition (2.2) is satisfied with $\alpha_1(g) = \max_{1\le i\le d_x}|\varphi_i|$ and $\alpha_k(g) = 0$ for $k\ge2$. In this case, the stationarity set $\Theta(r)$ is defined by
\[ \Theta(r) = \Big\{\theta = (a, c_0, c_1, d, \gamma)\in\mathbb{R}^{d_x+4} \ \Big/\ \max\big\{\alpha_1(g),\ |a| + c_1\|\xi_0\|_r\big\} + c_1(1-d)^{-1}\big(d + |a|\big)\|\xi_0\|_r < 1 \Big\}. \]
In addition to the assumptions (B0) and (B3), we assume that
(B6): $c'X_1$ is not degenerate if $c$ is a non-zero vector of $\mathbb{R}^{d_x}$.
Then, the identifiability of this model can be established as for the model (4.7).

In this section, we consider a double autoregressive model with exogenous covariates, defined by
\[ Y_t = \phi_0 + \phi_1 Y_{t-1} + \sum_{i=1}^q \psi_i X_{t-i} + \xi_t\sqrt{\alpha_0 + \alpha_1 Y_{t-1}^2 + \sum_{i=1}^q \beta_i X_{t-i}^2}, \tag{5.1} \]
where $q\in\mathbb{N}$, $\alpha_0, \alpha_1, \beta_1, \ldots, \beta_q > 0$, $\phi_0, \phi_1, \psi_1, \ldots, \psi_q\in\mathbb{R}$, $\xi_t$ is a white noise and $X_t$ is a real-valued exogenous covariate. This model is a particular case of the factor double autoregressive (FDAR) process introduced by Guo et al. (2014) to extend the double AR($p$) model proposed by Ling (2017). We assume that $X_t$ is an AR(1) process: $X_t = \varphi_0 + \varphi_1 X_{t-1} + \eta_t$ for all $t\in\mathbb{Z}$, where $\eta_t$ is a Gaussian white noise. The AR parameter is set to $(\varphi_0, \varphi_1) = (0.\,,\ 0.$
5) and $X_t$ is initialized at $X_0\sim\mathcal N\big(\varphi_0/(1-\varphi_1),\ 1/(1-\varphi_1^2)\big)$. Set $\psi = (\psi_1,\ldots,\psi_q)$ and $\beta = (\beta_1,\ldots,\beta_q)$; the true parameter is $\theta^* = (\phi_0, \phi_1, \alpha_0, \alpha_1, \psi, \beta)$. Based on the examples discussed in Section 4, if the conditions (B0) and (B3) hold for (5.1), then to satisfy the identifiability condition, it suffices to impose the following assumption on the covariate:
(B7): if $(c_k)_{1\le k\le q}$ is a sequence of vectors of $\mathbb{R}^{d_x}$ such that $c_k\neq 0$ for some $1\le k\le q$, then $\sum_{k=1}^q c_k' X_{t-k}$ is not degenerate.
We first present some results from Monte Carlo simulations to assess the asymptotic properties of the QMLE. We also investigate the empirical size and power of the proposed procedure for testing the significance of the covariate $X_t$. With $q = 1$, consider the model (5.1) in the following situations:
• scenario S1: $\theta^* = (0.\,, -0.\,, 0.\,, 0.\,, 0, 0)$;
• scenario S2: $\theta^* = (0.\,, -0.\,, 0.\,, 0.\,, 0.\,, 0.\,)$;
• scenario S'1: $\theta^* = (1, 0.\,, 0.\,, 0.\,, 0, 0)$;
• scenario S'2: $\theta^* = (1, 0.\,, 0.\,, 0.\,, 0.\,, 0.\,)$.
The scenarios S1 and S'1 correspond to cases where the covariate is absent. We consider the following significance tests:
\[ H_0:\ \theta^* = (0.\,, -0.\,, 0.\,, 0.\,, 0, 0)\ (\text{S1}) \quad\text{against}\quad H_1:\ \theta^*\neq(0.\,, -0.\,, 0.\,, 0.\,, 0, 0); \tag{5.2} \]
\[ H_0:\ \theta^* = (1, 0.\,, 0.\,, 0.\,, 0, 0)\ (\text{S'1}) \quad\text{against}\quad H_1:\ \theta^*\neq(1, 0.\,, 0.\,, 0.\,, 0, 0). \tag{5.3} \]
In each scenario S1, S2, S'1 and S'2, we simulate 200 replications with the sample sizes $n = 500$ and $n = 1000$, and we carry out the test of nullity of $(\psi,\beta)$ after estimating the parameters of interest. Table 1 contains the empirical mean and root mean square error (RMSE) of each component of the estimator. The last two columns of Table 1 indicate the empirical levels and powers of the above tests at the nominal level $\alpha = 0.05$, where the empirical powers are computed under the alternative $H_1$, respectively in the scenarios S2 and S'2. For the scenario S'2, the histograms and estimated densities of the estimates are plotted in Figure 1.
From these findings, one can see that, in all scenarios, the performance of the QMLE is satisfactory in terms of the mean, and that the RMSE of the estimators decreases when $n$ increases. This is consistent with the results of Theorem 2.2. Remark also that computing the QMLE with $(\widehat\psi, \widehat\beta)$ for trajectories generated without covariates (see the scenarios S1 and S'1) does not affect the performance of the QMLE, which again confirms its good theoretical properties. As seen in Figure 1, for each component of $\widehat\theta_n$, the estimated density is very close to that of the normal distribution, which is in accordance with the asymptotic results obtained from Theorem 2.3 when $\theta^*$ is an interior point of $\Theta$. The results of the test (see Table 1) show that the statistic $W_n$ is slightly undersized for a sample size of 500 observations, but the empirical levels are reasonable when $n = 1000$, in the sense that they are very close to the nominal one. Further, the empirical powers increase with the sample size and are quite accurate.
Now, we carry out other simulation experiments aimed at evaluating the effectiveness of the proposed model selection procedure for choosing the order $q$ in the model (5.1). To this end, consider the case where $q = 2$ as the "true" model $m^*$. We generate trajectories in the following situations:
• scenario S*1: $\theta^* = (0.\,, 0.\,, 0.\,, 0.\,, 0, 0.\,, 0.\,, 0.\,)$;
• scenario S*2: $\theta^* = (0.\,, 0.\,, 0.\,, 0.\,, 0.\,, 0.\,, 0.\,, 0.\,)$.
For $n = 100, 200, \ldots, 1000$, the candidate orders are $q\in\{1, 2, \ldots, 10\}$, which leads us to a collection of 10 models. We compare the performances of the procedure with $\kappa_n = \log n$ (see (3.2)), linked to the Bayesian Information Criterion (BIC), and the procedure with $\kappa_n = c\log\log n$ ($c\in\{2, 3.5, 5\}$), linked to the Hannan–Quinn information Criterion (HQC).
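The Monte Carlo design above (model (5.1) with q = 1, an AR(1) covariate and the Gaussian quasi-likelihood contrast) can be sketched as follows. This is a minimal illustration: the parameter values, seed and function names are our own illustrative choices, not the paper's.

```python
import numpy as np

def simulate_model51(n, theta, phi0=0.0, phi1=0.5, seed=0):
    """Simulate model (5.1) with q = 1 and an AR(1) covariate X_t.
    theta = (phi_0, phi_1, alpha_0, alpha_1, psi_1, beta_1); values illustrative."""
    f0, f1, a0, a1, psi1, b1 = theta
    rng = np.random.default_rng(seed)
    x = np.zeros(n + 1)
    y = np.zeros(n + 1)
    for t in range(1, n + 1):
        x[t] = phi0 + phi1 * x[t - 1] + rng.normal()
        vol2 = a0 + a1 * y[t - 1] ** 2 + b1 * x[t - 1] ** 2   # conditional variance
        y[t] = f0 + f1 * y[t - 1] + psi1 * x[t - 1] + rng.normal() * np.sqrt(vol2)
    return y[1:], x[1:]

def gaussian_qll(theta, y, x):
    """Gaussian quasi-log-likelihood L_n(theta) = -(1/2) * sum_t q_t(theta),
    with q_t = (Y_t - f_t)^2 / H_t + log H_t and crude zero initial values."""
    f0, f1, a0, a1, psi1, b1 = theta
    yl = np.concatenate(([0.0], y[:-1]))   # Y_{t-1}
    xl = np.concatenate(([0.0], x[:-1]))   # X_{t-1}
    f = f0 + f1 * yl + psi1 * xl
    H = a0 + a1 * yl ** 2 + b1 * xl ** 2
    return -0.5 * np.sum((y - f) ** 2 / H + np.log(H))

theta_true = (0.2, -0.3, 0.5, 0.2, 0.4, 0.3)   # hypothetical scenario values
y, x = simulate_model51(1000, theta_true)
print(np.isfinite(gaussian_qll(theta_true, y, x)))
```

The QMLE $\widehat\theta_n$ would then be obtained by maximizing `gaussian_qll` over a compact parameter set (e.g., minimizing its negative with a numerical optimizer), and the Wald-type statistic $W_n$ tests the nullity of $(\psi, \beta)$.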
For each penalty, Figures 2 and 3 display the points $(n, \widehat q_n)$, where $\widehat q_n$ denotes the average of the orders selected from the trajectories of length $n$, as well as the curve of the proportions (frequencies) of replications where the associated criterion selects the true order.
From these figures, the first remark is that, for all the penalties, the performance of the procedure increases with $n$ in each scenario. Further, the probability of selecting the true order is very close to 1 when $n = 1000$. This shows that these procedures are in accordance with the results of Theorem 3.1. One can notice that, in the scenario S*1, the $\log n$ penalty is more interesting for selecting the true order than the other penalties for small sample sizes (see Figure 2), whereas in the scenario S*2, the HQC with $c = 2$ slightly outperforms the BIC penalization when $n\le 350$ (see Figure 3). However, as the sample size grows, the $c\log\log n$ penalty (except in the case $c = 2$) provides the same accuracy as the $\log n$ penalty and displays satisfactory results. The results also show that, as $c$ increases, the performance of the $c\log\log n$ penalty increases, which reveals that the common use of the classical HQC penalization (i.e., the $c\log\log n$ penalty with $c = 2$) is not always the optimal choice for selecting the best model with this information criterion.

Table 1: Sample mean and RMSE of the QMLE for the model (5.1) following the scenarios S1, S2, S'1 and S'2, for $n = 500$ and $n = 1000$. The last two columns show the empirical levels and powers, at the nominal level 0.05, for the test of the relevance of the exogenous covariates. Columns: $\widehat\phi_0$, $\widehat\phi_1$, $\widehat\alpha_0$, $\widehat\alpha_1$, $\widehat\psi$, $\widehat\beta$, levels, powers.
[Table 1 numerical entries: sample means and RMSEs of the components of the QMLE, together with the empirical levels and powers of $W_n$, for $n = 500$ and $n = 1000$ in each of the scenarios S1, S2, S'1 and S'2.]

Figure 1:
Histograms of the components of $\widehat\theta_n$ in the scenario S'2 with sample size $n = 1000$. The overlaid curves are the density estimates and the dotted vertical lines represent the true values of the parameters.
Figure 2: The averages of the selected orders (panel (a)) and the frequencies of selecting the true order (panel (b)), based on 100 independent replications, as a function of the sample length in the scenario S*1. Each panel displays the curves for BIC and for HQC with $c = 2$, $3.5$ and $5$.
Figure 3: The averages of the selected orders (panel (a)) and the frequencies of selecting the true order (panel (b)), based on 100 independent replications, as a function of the sample length in the scenario S*2. Each panel displays the curves for BIC and for HQC with $c = 2$, $3.5$ and $5$.

This paper considers a general class of causal processes with exogenous covariates in a semiparametric framework. This class is quite extensive, and many classical processes such as ARMA-GARCH, ARMAX-GARCH-X, APARCH-X, etc., are particular cases. Sufficient conditions for the existence of a stationary and ergodic solution are provided, and the asymptotic properties of the QMLE are studied. The model selection in the class AC-X($M_{\theta^*}, f_{\theta^*}$) is carried out by a penalized quasi-likelihood contrast. The weak and the strong consistency of the proposed procedure are established. These results provide sufficient conditions for the consistency of the BIC and the HQC procedures. The simulation study shows that the empirical and the theoretical results are overall in accordance.
An extension of this work is to address the inference, the significance test of the parameter, and the model selection problem for the class AC-X($M_{\theta^*}, f_{\theta^*}$) with a non-Gaussian quasi-likelihood. For instance, as pointed out by Kengne (2021), the use of the Laplacian quasi-likelihood allows one to reduce the order of the moments imposed on the process. Another topic for a research project is the change-point detection in this class of models.

To simplify the expressions, in the proofs of Theorems 2.2, 2.3 and 3.1, we will use the conditional Gaussian quasi-log-likelihood given by $L_n(\theta) = -\frac12\sum_{t=1}^n q_t(\theta)$ and $\widehat L_n(\theta) = -\frac12\sum_{t=1}^n \widehat q_t(\theta)$. Throughout the sequel, $C$ denotes a positive constant whose value may differ from one inequality to another.

We verify that the process $Z_t := (Y_t; X_t)$ satisfies the conditions required for Theorem 3.1 in Doukhan and Wintenberger [6]. According to (1.1), for all $t\in\mathbb{Z}$, $Z_t = \big(M_{\theta^*}(Y_{t-1},\ldots; X_{t-1},\ldots)\,\xi_t + f_{\theta^*}(Y_{t-1},\ldots; X_{t-1},\ldots);\ g(X_{t-1},\ldots;\eta_t)\big) = F(Z_{t-1}, Z_{t-2},\ldots
; $U_t$), with $U_t = (\xi_t, \eta_t)$ and $F(z; U_t) = \big(M_{\theta^*}(y_1,\ldots; x_1,\ldots)\,\xi_t + f_{\theta^*}(y_1,\ldots; x_1,\ldots);\ g(x_1,\ldots;\eta_t)\big)$ for all $z = \big((y_k, x_k)\big)_{k\in\mathbb{N}}\in(\mathbb{R}^{d_x+1})^{\mathbb{N}}$. Thus, the equation (1.1) of [6] holds for $(Z_t)_{t\in\mathbb{Z}}$. For a vector $z = (y, x)\in\mathbb{R}^{d_x+1}$, define the norm $\|z\|_w = |y| + w_x\|x\|$ for some $w_x > 0$. According to Doukhan and Wintenberger (2008), it suffices to show that:
(i) $\mathbb{E}\|F(z; U_0)\|_w^r < \infty$ for some $z\in(\mathbb{R}^{d_x+1})^{\mathbb{N}}$;
(ii) there exists a non-negative sequence $(\alpha_k(F))_{k\ge1}$ satisfying $\sum_{k\ge1}\alpha_k(F) < 1$ such that for all $z, \tilde z\in(\mathbb{R}^{d_x+1})^{\mathbb{N}}$,
\[ \big(\mathbb{E}\|F(z; U_0) - F(\tilde z; U_0)\|_w^r\big)^{1/r} \le \sum_{k\ge1}\alpha_k(F)\,\|z_k - \tilde z_k\|_w. \]
Using the condition (2.2), the part (i) is directly obtained from the assumptions A($f_\theta$, $\Theta$) and A($M_\theta$, $\Theta$).
To prove (ii), let $z = (z_1,\ldots)$, $\tilde z = (\tilde z_1,\ldots)\in(\mathbb{R}^{d_x+1})^{\mathbb{N}}$ be such that $z_k = (y_k, x_k)$ and $\tilde z_k = (\tilde y_k, \tilde x_k)$ for all $k\ge1$. From A($f_\theta$, $\Theta$), A($M_\theta$, $\Theta$) and (2.2), we get
\begin{align*}
\big\|\,\|F(z; U_0)-F(\tilde z; U_0)\|_w\big\|_r
&\le \big\|\,\|(M_\theta(y_1,\ldots;x_1,\ldots)-M_\theta(\tilde y_1,\ldots;\tilde x_1,\ldots))\,\xi_0\|_\Theta + \|f_\theta(y_1,\ldots;x_1,\ldots)-f_\theta(\tilde y_1,\ldots;\tilde x_1,\ldots)\|_\Theta\big\|_r \\
&\qquad + w_x\,\|g(x_1,\ldots;\eta_0)-g(\tilde x_1,\ldots;\eta_0)\|_r \\
&\le \sum_{k=1}^{\infty}\big(\alpha^{(0)}_{k,Y}(f_\theta,\Theta) + \|\xi_0\|_r\,\alpha^{(0)}_{k,Y}(M_\theta,\Theta)\big)|y_k-\tilde y_k| \\
&\qquad + \sum_{k=1}^{\infty}\big(\alpha^{(0)}_{k,X}(f_\theta,\Theta) + \|\xi_0\|_r\,\alpha^{(0)}_{k,X}(M_\theta,\Theta)\big)\|x_k-\tilde x_k\| + w_x\sum_{k=1}^{\infty}\alpha_k(g)\|x_k-\tilde x_k\| \\
&= \sum_{k=1}^{\infty}\big(\alpha^{(0)}_{k,Y}(f_\theta,\Theta) + \|\xi_0\|_r\,\alpha^{(0)}_{k,Y}(M_\theta,\Theta)\big)|y_k-\tilde y_k| \\
&\qquad + w_x\sum_{k=1}^{\infty}\Big(w_x^{-1}\big(\alpha^{(0)}_{k,X}(f_\theta,\Theta)+\|\xi_0\|_r\,\alpha^{(0)}_{k,X}(M_\theta,\Theta)\big)+\alpha_k(g)\Big)\|x_k-\tilde x_k\| \\
&\le \sum_{k=1}^{\infty}\alpha_k(F)\,\|z_k-\tilde z_k\|_w,
\end{align*}
with $\alpha_k(F) = \max\big\{\alpha^{(0)}_{k,Y}(f_\theta,\Theta)+\|\xi_0\|_r\,\alpha^{(0)}_{k,Y}(M_\theta,\Theta),\ w_x^{-1}\big(\alpha^{(0)}_{k,X}(f_\theta,\Theta)+\|\xi_0\|_r\,\alpha^{(0)}_{k,X}(M_\theta,\Theta)\big)+\alpha_k(g)\big\}$. Thus, to get $\sum_{k=1}^{\infty}\alpha_k(F) < 1$, it suffices to choose $w_x$ sufficiently large, such that
\[ w_x > \frac{\sum_{k\ge1}\big\{\alpha^{(0)}_{k,X}(f_\theta,\Theta) + \|\xi_0\|_r\,\alpha^{(0)}_{k,X}(M_\theta,\Theta)\big\}}{1-\sum_{k\ge1}\max\big\{\alpha_k(g),\ \alpha^{(0)}_{k,Y}(f_\theta,\Theta) + \|\xi_0\|_r\,\alpha^{(0)}_{k,Y}(M_\theta,\Theta)\big\}}. \]
This completes the proof of the proposition. $\square$
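The choice of $w_x$ at the end of the proof can be checked numerically. The sketch below uses purely illustrative geometric sequences ($a_k$ standing for $\alpha^{(0)}_{k,Y}(f_\theta,\Theta)+\|\xi_0\|_r\alpha^{(0)}_{k,Y}(M_\theta,\Theta)$, $b_k$ for the corresponding $X$-coefficients, $g_k$ for $\alpha_k(g)$) and verifies that the resulting $\sum_k\alpha_k(F)$ is below 1 once $w_x$ exceeds the threshold of the proof.

```python
import numpy as np

# Illustrative geometric Lipschitz sequences (not taken from the paper)
k = np.arange(1, 200)
a = 0.5 * 0.6 ** k          # combined Y-coefficients of f and M
b = 0.3 * 0.6 ** k          # combined X-coefficients of f and M
g = 0.2 * 0.5 ** k          # Lipschitz coefficients of the covariate dynamics g

s_y = np.sum(np.maximum(a, g))          # sum_k max{alpha_k(g), a_k}
assert s_y < 1                          # otherwise no admissible w_x exists
w_x = 1.01 * np.sum(b) / (1 - s_y)      # slightly above the proof's threshold

alpha_F = np.maximum(a, b / w_x + g)    # alpha_k(F) under the weighted norm
print(np.sum(alpha_F) < 1)              # prints True: the contraction holds
```

Since $\max\{a_k,\ b_k/w_x+g_k\}\le\max\{a_k,g_k\}+b_k/w_x$, any $w_x$ above the threshold makes the sum strictly smaller than 1, which is what the assertion confirms.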
We consider the following lemma.

Lemma 7.1 Assume that the assumptions of Theorem 2.2 hold. Then
\[ \frac{1}{n}\big\|\widehat L_n(\theta) - L_n(\theta)\big\|_\Theta \xrightarrow[n\to\infty]{a.s.} 0. \]

Proof of Lemma 7.1
Remark that
\[ \frac{1}{n}\big\|\widehat L_n(\theta)-L_n(\theta)\big\|_\Theta \le \frac{1}{2n}\sum_{t=1}^n \big\|\widehat q_t(\theta)-q_t(\theta)\big\|_\Theta. \]
Hence, by Corollary 1 of Kounias and Weng (1969), with $2\le\tilde r\le\min\{4, r\}$ (without loss of generality), it suffices to show that
\[ \sum_{\ell\ge1}\frac{1}{\ell^{\tilde r/2}}\,\mathbb{E}\big(\|\widehat q_\ell(\theta)-q_\ell(\theta)\|_\Theta^{\tilde r/2}\big) < \infty. \tag{7.1} \]
For all $\theta\in\Theta$, by applying the mean value theorem to the functions $x\mapsto 1/x$ and $x\mapsto\log x$, we have
\begin{align*}
|\widehat q_t(\theta)-q_t(\theta)|
&\le \Big|\frac{(Y_t-\widehat f_\theta^t)^2}{\widehat H_\theta^t} - \frac{(Y_t-f_\theta^t)^2}{H_\theta^t}\Big| + \big|\log\widehat H_\theta^t - \log H_\theta^t\big| \\
&\le \Big|(Y_t-\widehat f_\theta^t)^2\big((\widehat M_\theta^t)^{-2}-(M_\theta^t)^{-2}\big) + (M_\theta^t)^{-2}\big((Y_t-\widehat f_\theta^t)^2-(Y_t-f_\theta^t)^2\big)\Big| + 2\big|\log|\widehat M_\theta^t| - \log|M_\theta^t|\big| \\
&\le \frac{2}{\underline h^{3/2}}(Y_t-\widehat f_\theta^t)^2\,\big|\widehat M_\theta^t - M_\theta^t\big| + \frac{1}{\underline h}\big|\widehat f_\theta^t - f_\theta^t\big|\,\big|\widehat f_\theta^t + f_\theta^t - 2Y_t\big| + \frac{2}{\underline h^{1/2}}\big|\widehat M_\theta^t - M_\theta^t\big| \\
&\le C\Big(\big((Y_t-\widehat f_\theta^t)^2+1\big)\big|\widehat M_\theta^t - M_\theta^t\big| + \big|\widehat f_\theta^t - f_\theta^t\big|\,\big|\widehat f_\theta^t + f_\theta^t - 2Y_t\big|\Big).
\end{align*}
This implies
\[ \mathbb{E}\big[\|\widehat q_t(\theta)-q_t(\theta)\|_\Theta^{\tilde r/2}\big] \le C\Big(\mathbb{E}\big[\big(\|Y_t-\widehat f_\theta^t\|_\Theta^2+1\big)^{\tilde r/2}\|\widehat M_\theta^t-M_\theta^t\|_\Theta^{\tilde r/2}\big] + \mathbb{E}\big[\|\widehat f_\theta^t-f_\theta^t\|_\Theta^{\tilde r/2}\big(\|\widehat f_\theta^t\|_\Theta+\|f_\theta^t\|_\Theta+2|Y_t|\big)^{\tilde r/2}\big]\Big). \]
Moreover, since $\theta^*\in\Theta(r)$ for some $r\ge2$, by the assumption A($\Psi_\theta$, $\Theta$), one can easily show that
\[ \mathbb{E}\big[|Y_t|^r + \|f_\theta^t\|_\Theta^r + \|\widehat f_\theta^t\|_\Theta^r + \|M_\theta^t\|_\Theta^r + \|\widehat M_\theta^t\|_\Theta^r + \|H_\theta^t\|_\Theta^{r/2} + \|\widehat H_\theta^t\|_\Theta^{r/2}\big] < \infty, \tag{7.2} \]
and
\[ \mathbb{E}\big(\|\widehat f_\theta^t - f_\theta^t\|_\Theta^r\big) \le C\Big(\sum_{k\ge t}\big\{\alpha^{(0)}_{k,Y}(f_\theta,\Theta)+\alpha^{(0)}_{k,X}(f_\theta,\Theta)\big\}\Big)^r, \qquad \mathbb{E}\big(\|\widehat M_\theta^t - M_\theta^t\|_\Theta^r\big) \le C\Big(\sum_{k\ge t}\big\{\alpha^{(0)}_{k,Y}(M_\theta,\Theta)+\alpha^{(0)}_{k,X}(M_\theta,\Theta)\big\}\Big)^r. \tag{7.3} \]
Then, by Hölder's inequality,
\[ \mathbb{E}\big[\|\widehat f_\theta^\ell-f_\theta^\ell\|_\Theta^{\tilde r/2}\big(\|\widehat f_\theta^\ell\|_\Theta+\|f_\theta^\ell\|_\Theta+2|Y_\ell|\big)^{\tilde r/2}\big] \le \big(\mathbb{E}\big[\|\widehat f_\theta^\ell-f_\theta^\ell\|_\Theta^{\tilde r}\big]\big)^{1/2}\big(\mathbb{E}\big[\big(\|\widehat f_\theta^\ell\|_\Theta+\|f_\theta^\ell\|_\Theta+2|Y_\ell|\big)^{\tilde r}\big]\big)^{1/2} \le C\Big(\sum_{k\ge\ell}\big\{\alpha^{(0)}_{k,Y}(f_\theta,\Theta)+\alpha^{(0)}_{k,X}(f_\theta,\Theta)\big\}\Big)^{\tilde r/2}. \]
Again, by Hölder's inequality, from (7.2) and (7.3), we obtain
\[ \mathbb{E}\big[\big(\|Y_\ell-\widehat f_\theta^\ell\|_\Theta^2+1\big)^{\tilde r/2}\|\widehat M_\theta^\ell-M_\theta^\ell\|_\Theta^{\tilde r/2}\big] \le \big(\mathbb{E}\big[\big(\|Y_\ell-\widehat f_\theta^\ell\|_\Theta^2+1\big)^{\tilde r}\big]\big)^{1/2}\big(\mathbb{E}\big[\|\widehat M_\theta^\ell-M_\theta^\ell\|_\Theta^{\tilde r}\big]\big)^{1/2} \le C\Big(\sum_{k\ge\ell}\big\{\alpha^{(0)}_{k,Y}(M_\theta,\Theta)+\alpha^{(0)}_{k,X}(M_\theta,\Theta)\big\}\Big)^{\tilde r/2}. \]
Hence, from (2.3), we deduce
\[ \sum_{\ell\ge1}\frac{1}{\ell^{\tilde r/2}}\,\mathbb{E}\big(\|\widehat q_\ell(\theta)-q_\ell(\theta)\|_\Theta^{\tilde r/2}\big) \le C\sum_{\ell\ge1}\frac{1}{\ell^{\tilde r/2}}\Big(\sum_{k\ge\ell}\big\{\alpha^{(0)}_{k,Y}(f_\theta,\Theta)+\alpha^{(0)}_{k,X}(f_\theta,\Theta)+\alpha^{(0)}_{k,Y}(M_\theta,\Theta)+\alpha^{(0)}_{k,X}(M_\theta,\Theta)\big\}\Big)^{\tilde r/2} \le C\sum_{\ell\ge1}\frac{1}{\ell^{\tilde r/2}}\big(\ell^{1-\gamma}\big)^{\tilde r/2} \le C\sum_{\ell\ge1}\ell^{-\tilde r\gamma/2} < \infty, \]
where the last inequality holds since $\gamma > 3/2$. Thus, the condition (7.1) is satisfied. This completes the proof of Lemma 7.1. $\square$
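Lemma 7.1 states that replacing the unobserved infinite past by arbitrary initial values has an asymptotically negligible effect on the contrast. The sketch below illustrates the mechanism on a GARCH(1,1)-type volatility written in its ARCH($\infty$) form; all numerical choices (parameters, seed, sample sizes) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
c0, c1, d = 0.1, 0.2, 0.6
burn, n = 500, 200
eps = np.zeros(burn + n)
sig2 = np.full(burn + n, c0 / (1 - c1 - d))   # start at the unconditional variance
for t in range(1, burn + n):
    sig2[t] = c0 + c1 * eps[t - 1] ** 2 + d * sig2[t - 1]
    eps[t] = rng.normal() * np.sqrt(sig2[t])

def sigma2_arch_inf(t, start):
    """sigma_t^2 = c0/(1-d) + c1 * sum_{k>=1} d^{k-1} eps_{t-k}^2,
    with the sum truncated so that only eps_s for s >= start is used."""
    ks = np.arange(1, t - start + 1)
    return c0 / (1 - d) + c1 * np.sum(d ** (ks - 1) * eps[t - ks] ** 2)

# Truncating the past at time `burn` instead of time 0 changes sigma_t^2 at the
# end of the sample by a geometrically small amount, as the proof quantifies.
t = burn + n - 1
err = abs(sigma2_arch_inf(t, 0) - sigma2_arch_inf(t, burn))
print(err < 1e-10)   # prints True
```

The discrepancy decays like $d^{\,t}$, which is summable; this is the concrete counterpart of the summability condition (2.3) used in the proof.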
To complete the proof of Theorem 2.2, we will show that: (1.) $\mathbb{E}[\|q_0(\theta)\|_\Theta] < \infty$, and (2.) the function $\theta\mapsto-\mathbb{E}[q_0(\theta)]$ has a unique maximum at $\theta^*$.
(1.) For all $\theta\in\Theta$, using the inequality $|\log x|\le|x-1|$ for all $x\ge1$, we have
\begin{align*}
|q_t(\theta)| &\le \frac{1}{H_\theta^t}(Y_t-f_\theta^t)^2 + \big|\log(H_\theta^t/\underline h) + \log\underline h\big| \\
&\le \frac{1}{\underline h}\big(Y_t^2 + (f_\theta^t)^2 + 2|Y_t f_\theta^t|\big) + \Big|\frac{H_\theta^t}{\underline h}-1\Big| + |\log\underline h| \le C\big(Y_t^2 + (f_\theta^t)^2 + 2|Y_t f_\theta^t| + (M_\theta^t)^2\big) + C.
\end{align*}
Hence, from (7.2), we deduce
\[ \mathbb{E}[\|q_t\|_\Theta] \le C\Big(\mathbb{E}[Y_t^2] + \mathbb{E}\|f_\theta^t\|_\Theta^2 + 2\big(\mathbb{E}[Y_t^2]\big)^{1/2}\big(\mathbb{E}[\|f_\theta^t\|_\Theta^2]\big)^{1/2} + \mathbb{E}[\|M_\theta^t\|_\Theta^2]\Big) + C < \infty, \]
which shows that (1.) holds.
(2.) Let $\theta\in\Theta$ with $\theta\neq\theta^*$. We have
\[ \mathbb{E}[q_0(\theta)] - \mathbb{E}[q_0(\theta^*)] = \mathbb{E}\big[\mathbb{E}[(q_0(\theta)-q_0(\theta^*))\,|\,\mathcal F_{-1}]\big]. \tag{7.4} \]
Moreover, since $\mathbb{E}[(Y_0-f_{\theta^*}^0)^2\,|\,\mathcal F_{-1}] = H_{\theta^*}^0$,
\begin{align*}
\mathbb{E}[(q_0(\theta)-q_0(\theta^*))\,|\,\mathcal F_{-1}] &= \mathbb{E}\Big[\frac{(Y_0-f_\theta^0)^2}{H_\theta^0} + \log H_\theta^0 - \frac{(Y_0-f_{\theta^*}^0)^2}{H_{\theta^*}^0} - \log H_{\theta^*}^0\,\Big|\,\mathcal F_{-1}\Big] \\
&= -\log\Big(\frac{H_{\theta^*}^0}{H_\theta^0}\Big) + \frac{\mathbb{E}[(Y_0-f_\theta^0)^2\,|\,\mathcal F_{-1}]}{H_\theta^0} - \frac{\mathbb{E}[(Y_0-f_{\theta^*}^0)^2\,|\,\mathcal F_{-1}]}{H_{\theta^*}^0} \\
&= -\log\Big(\frac{H_{\theta^*}^0}{H_\theta^0}\Big) - 1 + \frac{\mathbb{E}[(Y_0-f_{\theta^*}^0 + f_{\theta^*}^0 - f_\theta^0)^2\,|\,\mathcal F_{-1}]}{H_\theta^0} = \frac{H_{\theta^*}^0}{H_\theta^0} - \log\Big(\frac{H_{\theta^*}^0}{H_\theta^0}\Big) - 1 + \frac{(f_{\theta^*}^0-f_\theta^0)^2}{H_\theta^0}.
\end{align*}
Therefore, using (7.4) and by applying Jensen's inequality, we get
\[ \mathbb{E}[q_0(\theta)] - \mathbb{E}[q_0(\theta^*)] = \mathbb{E}\Big[\frac{H_{\theta^*}^0}{H_\theta^0} - \log\Big(\frac{H_{\theta^*}^0}{H_\theta^0}\Big) - 1 + \frac{(f_{\theta^*}^0-f_\theta^0)^2}{H_\theta^0}\Big] \ge \mathbb{E}\Big[\frac{H_{\theta^*}^0}{H_\theta^0}\Big] - \log\Big(\mathbb{E}\Big[\frac{H_{\theta^*}^0}{H_\theta^0}\Big]\Big) - 1 + \mathbb{E}\Big[\frac{(f_{\theta^*}^0-f_\theta^0)^2}{H_\theta^0}\Big]. \]
Since $x-\log x-1 > 0$ for all $x > 0$ with $x\neq1$, and $x-\log x-1 = 0$ if and only if $x = 1$, we deduce:
• if $f_{\theta^*}^0\neq f_\theta^0$ a.s., then $\mathbb{E}\big[(f_{\theta^*}^0-f_\theta^0)^2/H_\theta^0\big] > 0$ and thus $\mathbb{E}[q_0(\theta)]-\mathbb{E}[q_0(\theta^*)] > 0$;
• else, if $f_{\theta^*}^0 = f_\theta^0$ a.s., then $\mathbb{E}[q_0(\theta)]-\mathbb{E}[q_0(\theta^*)] = \mathbb{E}\big[H_{\theta^*}^0/H_\theta^0 - \log(H_{\theta^*}^0/H_\theta^0) - 1\big]\ge0$. From the identifiability condition (A0), when $\theta^*\neq\theta$ and $f_{\theta^*}^0 = f_\theta^0$ a.s., we necessarily have $H_{\theta^*}^0\neq H_\theta^0$ a.s. This implies $H_{\theta^*}^0/H_\theta^0\neq1$ a.s., and thus $\mathbb{E}[q_0(\theta)]-\mathbb{E}[q_0(\theta^*)] > 0$.
Therefore, $\mathbb{E}[q_0(\theta)] = \mathbb{E}[q_0(\theta^*)]$ holds if and only if $\theta = \theta^*$. This achieves the proof of (2.).
Since $\{(Y_t, X_t),\,t\in\mathbb{Z}\}$ is stationary and ergodic, the process $\{q_t(\theta),\,t\in\mathbb{Z}\}$ is also a stationary and ergodic sequence. Then, according to (1.), by the uniform strong law of large numbers applied to the process $\{q_t(\theta),\,t\in\mathbb{Z}\}$, it holds that
\[ \Big\|\frac{2}{n}L_n(\theta) + \mathbb{E}(q_0(\theta))\Big\|_\Theta = \Big\|\frac{1}{n}\sum_{t=1}^n q_t(\theta) - \mathbb{E}(q_0(\theta))\Big\|_\Theta \xrightarrow[n\to\infty]{a.s.} 0. \]
Then, by Lemma 7.1, we obtain
\[ \Big\|\frac{2}{n}\widehat L_n(\theta) + \mathbb{E}(q_0(\theta))\Big\|_\Theta \le \frac{2}{n}\big\|\widehat L_n(\theta) - L_n(\theta)\big\|_\Theta + \Big\|\frac{2}{n}L_n(\theta) + \mathbb{E}(q_0(\theta))\Big\|_\Theta \xrightarrow[n\to\infty]{a.s.} 0. \tag{7.5} \]
The part (2.) and (7.5) lead to the conclusion of the proof of the theorem. $\square$

The following lemma is needed.
Lemma 7.2
Assume that the conditions of Theorem 2.3 hold. Then (i.) E (cid:2) √ n (cid:13)(cid:13) ∂ (cid:98) L n ( θ ) ∂θ − ∂L n ( θ ) ∂θ (cid:13)(cid:13) Θ (cid:3) −→ n →∞ n (cid:13)(cid:13) ∂ (cid:98) L n ( θ ) ∂θ∂θ (cid:48) − ∂ L n ( θ ) ∂θ∂θ (cid:48) (cid:13)(cid:13) Θ a.s. −→ n →∞ (cid:13)(cid:13) n n (cid:80) t =1 ∂ q t ( θ ) ∂θ∂θ (cid:48) − E (cid:0) ∂ q ( θ ) ∂θ∂θ (cid:48) (cid:1)(cid:13)(cid:13) Θ a.s. −→ n →∞ . Proof of Lemma 7.2 (i.) Remark that (cid:13)(cid:13)(cid:13) ∂ (cid:98) L n ( θ ) ∂θ − ∂L n ( θ ) ∂θ (cid:13)(cid:13)(cid:13) Θ ≤ n (cid:88) t =1 (cid:13)(cid:13)(cid:13) ∂ (cid:98) q t ( θ ) ∂θ − ∂q t ( θ ) ∂θ (cid:13)(cid:13)(cid:13) Θ . (7.6)Moreover, for all θ ∈ Θ, ∂q t ( θ ) ∂θ = − ( H tθ ) − (cid:0) H tθ ( Y t − f tθ ) ∂f tθ ∂θ + ( Y t − f tθ ) ∂H tθ ∂θ (cid:1) + ( H tθ ) − H tθ ∂θ = − H tθ ) − ( Y t − f tθ ) ∂f tθ ∂θ + ( Y t − f tθ ) ∂ ( H tθ ) − ∂θ + ( H tθ ) − ∂H tθ ∂θ , (7.7)which implies (cid:12)(cid:12)(cid:12) ∂ (cid:98) q t ( θ ) ∂θ − ∂q t ( θ ) ∂θ (cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12) ( (cid:98) H tθ ) − ( Y t − (cid:98) f tθ ) ∂ (cid:98) f tθ ∂θ − ( H tθ ) − ( Y t − f tθ ) ∂f tθ ∂θ (cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) ( Y t − (cid:98) f tθ ) ∂ ( (cid:98) H tθ ) − ∂θ − ( Y t − f tθ ) ∂ ( H tθ ) − ∂θ (cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) ( (cid:98) H tθ ) − ∂ (cid:98) H tθ ∂θ − ( H tθ ) − ∂H tθ ∂θ (cid:12)(cid:12)(cid:12) . Inference and model selection in general causal time series with exogenous covariates
Using the relation | a b c − a b c | ≤ | a − a || b || c | + | a || b − b || c | + | a || b || c − c | , ∀ a , a , b , b , c , c , ∈ R , we get (cid:13)(cid:13)(cid:13) ∂ (cid:98) q t ( θ ) ∂θ − ∂q t ( θ ) ∂θ (cid:13)(cid:13)(cid:13) Θ ≤ (cid:16)(cid:13)(cid:13) ( (cid:98) H tθ ) − − ( H tθ ) − (cid:13)(cid:13) Θ (cid:107) Y t − (cid:98) f tθ (cid:107) Θ (cid:13)(cid:13)(cid:13) ∂f tθ ∂θ (cid:13)(cid:13)(cid:13) Θ + (cid:13)(cid:13) ( (cid:98) H tθ ) − (cid:13)(cid:13) Θ (cid:107) (cid:98) f tθ − f tθ (cid:107) Θ (cid:13)(cid:13)(cid:13) ∂ (cid:98) f tθ ∂θ (cid:13)(cid:13)(cid:13) Θ + (cid:13)(cid:13) ( (cid:98) H tθ ) − (cid:13)(cid:13) Θ (cid:107) Y t − f tθ (cid:107) Θ (cid:13)(cid:13)(cid:13) ∂ (cid:98) f tθ ∂θ − ∂f tθ ∂θ (cid:13)(cid:13)(cid:13) Θ (cid:17) + (cid:107) ( Y t − (cid:98) f tθ ) (cid:107) (cid:13)(cid:13)(cid:13) ∂ ( (cid:98) H tθ ) − ∂θ − ∂ ( H tθ ) − ∂θ (cid:13)(cid:13)(cid:13) Θ + 2 | Y t |(cid:107) (cid:98) f tθ − f tθ (cid:107) Θ (cid:13)(cid:13)(cid:13) ∂ ( H tθ ) − ∂θ (cid:13)(cid:13)(cid:13) Θ + (cid:13)(cid:13) ( (cid:98) H tθ ) − (cid:13)(cid:13) Θ (cid:13)(cid:13)(cid:13) ∂ (cid:98) H tθ ∂θ − ∂H tθ ∂θ (cid:13)(cid:13)(cid:13) Θ + (cid:13)(cid:13) ( (cid:98) H tθ ) − − ( H tθ ) − (cid:13)(cid:13) Θ (cid:13)(cid:13)(cid:13) ∂H tθ ∂θ (cid:13)(cid:13)(cid:13) Θ ≤ h ) − (cid:16) (cid:107) (cid:98) f tθ − f tθ (cid:107) Θ (cid:13)(cid:13)(cid:13) ∂ (cid:98) f tθ ∂θ (cid:13)(cid:13)(cid:13) Θ + (cid:107) Y t − f tθ (cid:107) Θ (cid:13)(cid:13)(cid:13) ∂ (cid:98) f tθ ∂θ − ∂f tθ ∂θ (cid:13)(cid:13)(cid:13) Θ + 12 (cid:13)(cid:13)(cid:13) ∂ (cid:98) H tθ ∂θ − ∂H tθ ∂θ (cid:13)(cid:13)(cid:13) Θ (cid:17) + 2 (cid:13)(cid:13) ( (cid:98) H tθ ) − − ( H tθ ) − (cid:13)(cid:13) Θ (cid:107) Y t − (cid:98) f tθ (cid:107) Θ (cid:13)(cid:13)(cid:13) ∂f tθ ∂θ (cid:13)(cid:13)(cid:13) Θ + (cid:107) ( Y t − (cid:98) f tθ ) (cid:107) (cid:13)(cid:13)(cid:13) ∂ ( (cid:98) H tθ ) − ∂θ − ∂ ( H tθ ) − ∂θ (cid:13)(cid:13)(cid:13) Θ + 2 | Y t |(cid:107) (cid:98) f 
tθ − f tθ (cid:107) Θ (cid:13)(cid:13)(cid:13) ∂ ( H tθ ) − ∂θ (cid:13)(cid:13)(cid:13) Θ + (cid:13)(cid:13) ( (cid:98) H tθ ) − − ( H tθ ) − (cid:13)(cid:13) Θ (cid:13)(cid:13)(cid:13) ∂H tθ ∂θ (cid:13)(cid:13)(cid:13) Θ . (7.8)By applying the H¨older’s inequality to the terms of the right hand side of (7.8), we have E (cid:104)(cid:13)(cid:13)(cid:13) ∂ (cid:98) q t ( θ ) ∂θ − ∂q t ( θ ) ∂θ (cid:13)(cid:13)(cid:13) Θ (cid:105) ≤ C (cid:34) Ä E [ (cid:107) (cid:98) f tθ − f tθ (cid:107) ] ä / (cid:16) E (cid:104)(cid:13)(cid:13)(cid:13) ∂ (cid:98) f tθ ∂θ (cid:13)(cid:13)(cid:13) / (cid:105)(cid:17) / + (cid:0) E [ (cid:107) Y t − f tθ (cid:107) / ] (cid:1) / (cid:16) E (cid:104)(cid:13)(cid:13)(cid:13) ∂ (cid:98) f tθ ∂θ − ∂f tθ ∂θ (cid:13)(cid:13)(cid:13) (cid:105)(cid:17) / + (cid:16) E (cid:104)(cid:13)(cid:13)(cid:13) ∂ (cid:98) H tθ ∂θ − ∂H tθ ∂θ (cid:13)(cid:13)(cid:13) (cid:105)(cid:17) / + (cid:16) E (cid:104)(cid:13)(cid:13) ( (cid:98) H tθ ) − − ( H tθ ) − (cid:13)(cid:13) (cid:105)(cid:17) / Ä E [ (cid:107) Y t − (cid:98) f tθ (cid:107) ] ä / (cid:16) E (cid:104)(cid:13)(cid:13)(cid:13) ∂f tθ ∂θ (cid:13)(cid:13)(cid:13) (cid:105)(cid:17) / + Ä E [ (cid:107) ( Y t − (cid:98) f tθ ) (cid:107) ] ä / (cid:16) E (cid:104)(cid:13)(cid:13)(cid:13) ∂ ( (cid:98) H tθ ) − ∂θ − ∂ ( H tθ ) − ∂θ (cid:13)(cid:13)(cid:13) (cid:105)(cid:17) / + (cid:0) E [ | Y t | ] (cid:1) / Ä E [ (cid:107) (cid:98) f tθ − f tθ (cid:107) ] ä / × (cid:16) E (cid:104)(cid:13)(cid:13)(cid:13) ∂ ( H tθ ) − ∂θ (cid:13)(cid:13)(cid:13) (cid:105)(cid:17) / + Ä E (cid:2)(cid:13)(cid:13) ( (cid:98) H tθ ) − − ( H tθ ) − (cid:13)(cid:13) (cid:3) ä / (cid:16) E (cid:104)(cid:13)(cid:13)(cid:13) ∂H tθ ∂θ (cid:13)(cid:13)(cid:13) / (cid:105)(cid:17) / (cid:35) . 
Moreover, since θ ∗ ∈ Θ( r ), using A i ( f θ , Θ) and A i ( M θ , Θ) (with i = 0 , • E (cid:104)(cid:13)(cid:13)(cid:13) ∂f tθ ∂θ (cid:13)(cid:13)(cid:13) r Θ + (cid:13)(cid:13)(cid:13) ∂ (cid:98) f tθ ∂θ (cid:13)(cid:13)(cid:13) r Θ + (cid:13)(cid:13)(cid:13) ∂M tθ ∂θ (cid:13)(cid:13)(cid:13) r Θ + (cid:13)(cid:13)(cid:13) ∂ (cid:99) M tθ ∂θ (cid:13)(cid:13)(cid:13) r Θ + (cid:13)(cid:13)(cid:13) ∂H tθ ∂θ (cid:13)(cid:13)(cid:13) r/ + (cid:13)(cid:13)(cid:13) ∂ ( H tθ ) − ∂θ (cid:13)(cid:13)(cid:13) r/ (cid:105) < ∞ , (7.9) • E (cid:104)(cid:13)(cid:13)(cid:13) ∂ (cid:98) f tθ ∂θ − ∂f tθ ∂θ (cid:13)(cid:13)(cid:13) r Θ (cid:105) ≤ C (cid:16) (cid:80) k ≥ t (cid:8) α (1) k,Y ( f θ , Θ) + α (1) k,X ( f θ , Θ) (cid:9)(cid:17) r , E (cid:104)(cid:13)(cid:13) ( (cid:98) H tθ ) − − ( H tθ ) − (cid:13)(cid:13) r Θ (cid:105) ≤ C (cid:16) (cid:80) k ≥ t (cid:8) α (0) k,Y ( M θ , Θ) + α (0) k,X ( M θ , Θ) (cid:9)(cid:17) r , E (cid:104)(cid:13)(cid:13)(cid:13) ∂ (cid:98) H tθ ∂θ − ∂H tθ ∂θ (cid:13)(cid:13)(cid:13) r/ (cid:105) ≤ C (cid:16) (cid:80) k ≥ t (cid:8) α (0) k,Y ( M θ , Θ) + α (1) k,X ( M θ , Θ) (cid:9)(cid:17) r/ , E (cid:104)(cid:13)(cid:13)(cid:13) ∂ ( (cid:98) H tθ ) − ∂θ − ∂ ( H tθ ) − ∂θ (cid:13)(cid:13)(cid:13) r/ (cid:105) ≤ C (cid:16) (cid:80) k ≥ t (cid:8) α (0) k,Y ( M θ , Θ) + α (1) k,X ( M θ , Θ) (cid:9)(cid:17) r/ . 
With $r = 4$, we obtain
\[
E\Big[\Big\|\frac{\partial \widehat{q}_t(\theta)}{\partial\theta} - \frac{\partial q_t(\theta)}{\partial\theta}\Big\|_\Theta\Big]
\le C\Big[\big(E\|\widehat{f}^t_\theta - f^t_\theta\|^2_\Theta\big)^{1/2}
+ \Big(E\Big\|\frac{\partial \widehat{f}^t_\theta}{\partial\theta} - \frac{\partial f^t_\theta}{\partial\theta}\Big\|^2_\Theta\Big)^{1/2}
+ \Big(E\Big\|\frac{\partial \widehat{H}^t_\theta}{\partial\theta} - \frac{\partial H^t_\theta}{\partial\theta}\Big\|^2_\Theta\Big)^{1/2}
+ \big(E\big\|(\widehat{H}^t_\theta)^{-1} - (H^t_\theta)^{-1}\big\|^2_\Theta\big)^{1/2}
+ \Big(E\Big\|\frac{\partial (\widehat{H}^t_\theta)^{-1}}{\partial\theta} - \frac{\partial (H^t_\theta)^{-1}}{\partial\theta}\Big\|^2_\Theta\Big)^{1/2}\Big]
\]
\[
\le C\sum_{k\ge t}\big\{\alpha^{(0)}_{k,Y}(f_\theta,\Theta)+\alpha^{(0)}_{k,X}(f_\theta,\Theta)+\alpha^{(1)}_{k,Y}(f_\theta,\Theta)+\alpha^{(1)}_{k,X}(f_\theta,\Theta)
+\alpha^{(0)}_{k,Y}(M_\theta,\Theta)+\alpha^{(0)}_{k,X}(M_\theta,\Theta)+\alpha^{(1)}_{k,Y}(M_\theta,\Theta)+\alpha^{(1)}_{k,X}(M_\theta,\Theta)\big\}.
\]
Therefore, in view of the condition (2.7), it holds that
\[
E\Big[\Big\|\frac{\partial \widehat{q}_t(\theta)}{\partial\theta} - \frac{\partial q_t(\theta)}{\partial\theta}\Big\|_\Theta\Big] \le C\sum_{k\ge t} k^{-\gamma} \le \frac{C}{t^{\gamma-1}}.
\]
By the inequality (7.6), we deduce
\[
E\Big[\frac{1}{\sqrt{n}}\Big\|\frac{\partial \widehat{L}_n(\theta)}{\partial\theta} - \frac{\partial L_n(\theta)}{\partial\theta}\Big\|_\Theta\Big]
\le \frac{C}{\sqrt{n}}\sum_{t=1}^{n}\frac{1}{t^{\gamma-1}} \le \frac{C}{\sqrt{n}}\big(1 + n^{2-\gamma}\big) \underset{n\to\infty}{\longrightarrow} 0.
\]
This proves the part (i.) of Lemma 7.2.

(ii.) This part can be established by using the same arguments as in the proof of Lemma 7.1.

(iii.) Let us show that $E\big[\big\|\frac{\partial^2 q_t(\theta)}{\partial\theta_i\partial\theta_j}\big\|_\Theta\big] < \infty$ for all $i, j \in \{1, \dots, d\}$. From (7.7), for any $i, j \in \{1, \dots, d\}$, we have
\[
\frac{\partial^2 q_t(\theta)}{\partial\theta_i\partial\theta_j}
= -2(H^t_\theta)^{-1}(Y_t - f^t_\theta)\frac{\partial^2 f^t_\theta}{\partial\theta_i\partial\theta_j}
+ (Y_t - f^t_\theta)^2\frac{\partial^2 (H^t_\theta)^{-1}}{\partial\theta_i\partial\theta_j}
- 2(Y_t - f^t_\theta)\Big(\frac{\partial f^t_\theta}{\partial\theta_i}\frac{\partial (H^t_\theta)^{-1}}{\partial\theta_j} + \frac{\partial f^t_\theta}{\partial\theta_j}\frac{\partial (H^t_\theta)^{-1}}{\partial\theta_i}\Big)
+ 2(H^t_\theta)^{-1}\frac{\partial f^t_\theta}{\partial\theta_i}\frac{\partial f^t_\theta}{\partial\theta_j}
+ \frac{\partial (H^t_\theta)^{-1}}{\partial\theta_j}\frac{\partial H^t_\theta}{\partial\theta_i}
+ (H^t_\theta)^{-1}\frac{\partial^2 H^t_\theta}{\partial\theta_i\partial\theta_j}.
\]
Therefore, according to (A1), we get
\[
\Big\|\frac{\partial^2 q_t(\theta)}{\partial\theta_i\partial\theta_j}\Big\|_\Theta
\le C\|Y_t - f^t_\theta\|_\Theta\Big(\Big\|\frac{\partial^2 f^t_\theta}{\partial\theta_i\partial\theta_j}\Big\|_\Theta
+ \Big\|\frac{\partial f^t_\theta}{\partial\theta_i}\Big\|_\Theta\Big\|\frac{\partial (H^t_\theta)^{-1}}{\partial\theta_j}\Big\|_\Theta
+ \Big\|\frac{\partial f^t_\theta}{\partial\theta_j}\Big\|_\Theta\Big\|\frac{\partial (H^t_\theta)^{-1}}{\partial\theta_i}\Big\|_\Theta\Big)
+ C\Big(\Big\|\frac{\partial f^t_\theta}{\partial\theta_i}\Big\|_\Theta\Big\|\frac{\partial f^t_\theta}{\partial\theta_j}\Big\|_\Theta
+ \Big\|\frac{\partial^2 H^t_\theta}{\partial\theta_i\partial\theta_j}\Big\|_\Theta\Big)
+ \|Y_t - f^t_\theta\|^2_\Theta\Big\|\frac{\partial^2 (H^t_\theta)^{-1}}{\partial\theta_i\partial\theta_j}\Big\|_\Theta
+ \Big\|\frac{\partial (H^t_\theta)^{-1}}{\partial\theta_j}\Big\|_\Theta\Big\|\frac{\partial H^t_\theta}{\partial\theta_i}\Big\|_\Theta. \quad (7.11)
\]
Moreover, by $A_2(f_\theta, \Theta)$ and $A_2(M_\theta, \Theta)$, one can show that
\[
E\Big[\Big\|\frac{\partial^2 f^t_\theta}{\partial\theta_i\partial\theta_j}\Big\|^2_\Theta + \Big\|\frac{\partial^2 H^t_\theta}{\partial\theta_i\partial\theta_j}\Big\|^2_\Theta + \Big\|\frac{\partial^2 (H^t_\theta)^{-1}}{\partial\theta_i\partial\theta_j}\Big\|^2_\Theta\Big] < \infty.
\]
Thus, by applying Hölder's inequality to the terms of the right-hand side of (7.11), it suffices to use (7.2) and (7.9) to obtain $E\big[\big\|\frac{\partial^2 q_t(\theta)}{\partial\theta_i\partial\theta_j}\big\|_\Theta\big] < \infty$.

Since $E\big[\big\|\frac{\partial^2 q_t(\theta)}{\partial\theta_i\partial\theta_j}\big\|_\Theta\big] < \infty$ for all $i, j \in \{1, \dots, d\}$, from the stationarity and ergodicity properties of $\big\{\frac{\partial^2 q_t(\theta)}{\partial\theta\partial\theta'},\, t \in \mathbb{Z}\big\}$ and the uniform strong law of large numbers, it holds that
\[
\Big\|\frac{1}{n}\sum_{t=1}^{n}\frac{\partial^2 q_t(\theta)}{\partial\theta\partial\theta'} - E\Big(\frac{\partial^2 q_0(\theta)}{\partial\theta\partial\theta'}\Big)\Big\|_\Theta \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} 0.
\]
This completes the proof of Lemma 7.2. $\square$
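The summability rate used in part (i.) of the proof can be checked numerically. The following sketch (an illustration only, not part of the proof; the value $\gamma = 1.6$ is an arbitrary choice satisfying $\gamma > 3/2$) evaluates the bound $n^{-1/2}\sum_{t=1}^{n} t^{1-\gamma}$ and shows that it decreases towards 0:

```python
import math

def qmle_gradient_bound(n, gamma):
    # Evaluates n^{-1/2} * sum_{t=1}^{n} t^{1-gamma}: up to a constant, the bound
    # obtained above on the expected gap between the gradients of the observed
    # and the theoretical quasi-log-likelihoods.
    return sum(t ** (1.0 - gamma) for t in range(1, n + 1)) / math.sqrt(n)

for n in (10**2, 10**4, 10**6):
    print(n, qmle_gradient_bound(n, gamma=1.6))
```

For $3/2 < \gamma < 2$ the bound decays like $n^{3/2-\gamma}$ (here $n^{-0.1}$), slowly but surely to 0, which is all that the proof requires.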
The following lemma is also needed.
Lemma 7.3
Assume that the conditions of Theorem 2.3 hold. Then

(i.) $\big\{\frac{\partial q_t(\theta^*)}{\partial\theta}, \mathcal{F}_{t-1},\, t \in \mathbb{Z}\big\}$ is a stationary ergodic martingale difference sequence with covariance matrix $G$;

(ii.) $-\frac{1}{n}\frac{\partial^2 \widehat{L}_n(\tilde\theta_n)}{\partial\theta\partial\theta'} \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} F$, for any sequence $(\tilde\theta_n)_{n\ge 1}$ with values in $\Theta$ and satisfying $\tilde\theta_n \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} \theta^*$,

where $G$ and $F$ are defined in (2.5).

Proof of Lemma 7.3

(i.) Recall that $G = E\big[\frac{\partial q_t(\theta^*)}{\partial\theta}\frac{\partial q_t(\theta^*)}{\partial\theta'}\big]$ and that, for all $\theta \in \Theta$,
\[
\frac{\partial q_t(\theta)}{\partial\theta} = -2(H^t_\theta)^{-1}(Y_t - f^t_\theta)\frac{\partial f^t_\theta}{\partial\theta} - \Big(\frac{Y_t - f^t_\theta}{H^t_\theta}\Big)^2\frac{\partial H^t_\theta}{\partial\theta} + (H^t_\theta)^{-1}\frac{\partial H^t_\theta}{\partial\theta}.
\]
Since the functions $f^t_\theta$, $H^t_\theta$, $\frac{\partial f^t_\theta}{\partial\theta}$ and $\frac{\partial H^t_\theta}{\partial\theta}$ are $\mathcal{F}_{t-1}$-measurable, we have
\[
E\Big[\frac{\partial q_t(\theta^*)}{\partial\theta}\,\Big|\,\mathcal{F}_{t-1}\Big]
= -(H^t_{\theta^*})^{-1}\frac{\partial H^t_{\theta^*}}{\partial\theta}\Big((H^t_{\theta^*})^{-1} E\big[(Y_t - f^t_{\theta^*})^2\,\big|\,\mathcal{F}_{t-1}\big] - 1\Big) = 0,
\]
which shows that (i.) holds.

(ii.) Let $(\tilde\theta_n)_{n\in\mathbb{N}}$ be a sequence satisfying $\tilde\theta_n \overset{a.s.}{\longrightarrow} \theta^*$. For any $i, j = 1, \dots, d$, we have
\[
\Big|\frac{1}{n}\sum_{t=1}^{n}\frac{\partial^2 q_t(\tilde\theta_n)}{\partial\theta_j\partial\theta_i} - E\Big(\frac{\partial^2 q_0(\theta^*)}{\partial\theta_j\partial\theta_i}\Big)\Big|
\le \Big|\frac{1}{n}\sum_{t=1}^{n}\frac{\partial^2 q_t(\tilde\theta_n)}{\partial\theta_j\partial\theta_i} - E\Big(\frac{\partial^2 q_0(\tilde\theta_n)}{\partial\theta_j\partial\theta_i}\Big)\Big|
+ \Big|E\Big(\frac{\partial^2 q_0(\tilde\theta_n)}{\partial\theta_j\partial\theta_i}\Big) - E\Big(\frac{\partial^2 q_0(\theta^*)}{\partial\theta_j\partial\theta_i}\Big)\Big|
\]
\[
\le \Big\|\frac{1}{n}\sum_{t=1}^{n}\frac{\partial^2 q_t(\theta)}{\partial\theta_j\partial\theta_i} - E\Big(\frac{\partial^2 q_0(\theta)}{\partial\theta_j\partial\theta_i}\Big)\Big\|_\Theta
+ \Big|E\Big(\frac{\partial^2 q_0(\tilde\theta_n)}{\partial\theta_j\partial\theta_i}\Big) - E\Big(\frac{\partial^2 q_0(\theta^*)}{\partial\theta_j\partial\theta_i}\Big)\Big|
\underset{n\to\infty}{\longrightarrow} 0.
\]
Thus,
\[
-\frac{1}{n}\frac{\partial^2 L_n(\tilde\theta_n)}{\partial\theta\partial\theta'} = \frac{1}{n}\sum_{t=1}^{n}\frac{\partial^2 q_t(\tilde\theta_n)}{\partial\theta\partial\theta'} \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} E\Big(\frac{\partial^2 q_0(\theta^*)}{\partial\theta\partial\theta'}\Big) = F.
\]
We conclude the proof of the part (ii.) by using Lemma 7.2 (ii.).
$\square$

Proof of Theorem 2.3. From a second-order Taylor expansion of $\theta \mapsto \widehat{L}_n(\theta)$, for all $\theta \in \Theta$, there exists $\tilde\theta$ between $\theta$ and $\theta^*$ such that
\[
\frac{1}{n}\big\{\widehat{L}_n(\theta) - \widehat{L}_n(\theta^*)\big\}
= \frac{1}{n}\frac{\partial L_n(\theta^*)}{\partial\theta'}(\theta - \theta^*) - \frac{1}{2}(\theta - \theta^*)' F (\theta - \theta^*) + R_n(\theta), \quad (7.12)
\]
where
\[
R_n(\theta) = \frac{1}{n}\Big\{\frac{\partial \widehat{L}_n(\theta^*)}{\partial\theta'} - \frac{\partial L_n(\theta^*)}{\partial\theta'}\Big\}(\theta - \theta^*)
+ \frac{1}{2}(\theta - \theta^*)'\Big(\frac{1}{n}\frac{\partial^2 \widehat{L}_n(\tilde\theta)}{\partial\theta\partial\theta'} + F\Big)(\theta - \theta^*).
\]
Let us define the vector $Z_n = F^{-1}\frac{1}{\sqrt{n}}\frac{\partial L_n(\theta^*)}{\partial\theta}$. Then, we can rewrite (7.12) as
\[
\frac{1}{n}\big\{\widehat{L}_n(\theta) - \widehat{L}_n(\theta^*)\big\}
= \frac{1}{2n}\|Z_n\|^2_F - \frac{1}{2n}\big\|Z_n - \sqrt{n}(\theta - \theta^*)\big\|^2_F + R_n(\theta). \quad (7.13)
\]
Define also $\theta^Z_n = \arg\inf_{\theta\in\Theta}\big\|Z_n - \sqrt{n}(\theta - \theta^*)\big\|_F$. Then, by (2.4), for $n$ large enough, we have $\sqrt{n}(\theta^Z_n - \theta^*) = Z^{\mathcal{C}}_n$, where $Z^{\mathcal{C}}_n$ is the $F$-projection of $Z_n$ on $\mathcal{C}$. Using this relation and the definition of $\theta^Z_n$, we have
\[
\big\|Z_n - \sqrt{n}(\widehat\theta_n - \theta^*)\big\|^2_F - \big\|Z_n - Z^{\mathcal{C}}_n\big\|^2_F
= \big\|Z_n - \sqrt{n}(\widehat\theta_n - \theta^*)\big\|^2_F - \big\|Z_n - \sqrt{n}(\theta^Z_n - \theta^*)\big\|^2_F \ge 0.
\]
Furthermore, from (7.13) and the definition of $\widehat\theta_n$, it holds that
\[
\big\|Z_n - \sqrt{n}(\widehat\theta_n - \theta^*)\big\|^2_F - \big\|Z_n - \sqrt{n}(\theta^Z_n - \theta^*)\big\|^2_F
= 2\big\{\widehat{L}_n(\theta^Z_n) - \widehat{L}_n(\widehat\theta_n)\big\} + 2n\big\{R_n(\widehat\theta_n) - R_n(\theta^Z_n)\big\}
\le 2n\big\{R_n(\widehat\theta_n) - R_n(\theta^Z_n)\big\}.
\]
Therefore,
\[
\Big|\big\|Z_n - \sqrt{n}(\widehat\theta_n - \theta^*)\big\|^2_F - \big\|Z_n - Z^{\mathcal{C}}_n\big\|^2_F\Big| \le 2n\big\{R_n(\widehat\theta_n) - R_n(\theta^Z_n)\big\}. \quad (7.14)
\]
Let us consider the following lemma.

Lemma 7.4
Assume that the conditions of Theorem 2.3 hold. Then $n\{R_n(\widehat\theta_n) - R_n(\theta^Z_n)\} = o_P(1)$.

By Lemma 7.4 and (7.14), it follows that
\[
\big\|Z_n - \sqrt{n}(\widehat\theta_n - \theta^*)\big\|^2_F - \big\|Z^{\mathcal{C}}_n - Z_n\big\|^2_F = o_P(1). \quad (7.15)
\]
Moreover, according to the equivalent definition of the $F$-orthogonal projection in (2.6), we get
\[
\big\|Z_n - \sqrt{n}(\widehat\theta_n - \theta^*)\big\|^2_F
= \big\|Z^{\mathcal{C}}_n - \sqrt{n}(\widehat\theta_n - \theta^*)\big\|^2_F + \big\|Z^{\mathcal{C}}_n - Z_n\big\|^2_F - 2\big\langle Z^{\mathcal{C}}_n - \sqrt{n}(\widehat\theta_n - \theta^*),\, Z^{\mathcal{C}}_n - Z_n\big\rangle_F
\ge \big\|Z^{\mathcal{C}}_n - \sqrt{n}(\widehat\theta_n - \theta^*)\big\|^2_F + \big\|Z^{\mathcal{C}}_n - Z_n\big\|^2_F.
\]
Therefore, from (7.15), we obtain
\[
\big\|Z^{\mathcal{C}}_n - \sqrt{n}(\widehat\theta_n - \theta^*)\big\|^2_F
\le \big\|Z_n - \sqrt{n}(\widehat\theta_n - \theta^*)\big\|^2_F - \big\|Z^{\mathcal{C}}_n - Z_n\big\|^2_F = o_P(1). \quad (7.16)
\]
Now, using Lemma 7.3 (i.), we apply the central limit theorem for the stationary ergodic martingale difference sequence $\big\{\frac{\partial q_t(\theta^*)}{\partial\theta}, \mathcal{F}_{t-1},\, t \in \mathbb{Z}\big\}$. It follows that
\[
-\frac{1}{\sqrt{n}}\frac{\partial L_n(\theta^*)}{\partial\theta} = \frac{1}{\sqrt{n}}\sum_{t=1}^{n}\frac{\partial q_t(\theta^*)}{\partial\theta} \underset{n\to\infty}{\overset{\mathcal{D}}{\longrightarrow}} \mathcal{N}_d(0, G), \quad (7.17)
\]
and thus
\[
Z_n = F^{-1}\frac{1}{\sqrt{n}}\frac{\partial L_n(\theta^*)}{\partial\theta} \underset{n\to\infty}{\overset{\mathcal{D}}{\longrightarrow}} Z \sim \mathcal{N}_d\big(0, F^{-1} G F^{-1}\big). \quad (7.18)
\]
Hence, $Z^{\mathcal{C}}_n \underset{n\to\infty}{\overset{\mathcal{D}}{\longrightarrow}} Z^{\mathcal{C}}$. From this, it suffices to use (7.16) to conclude the proof of Theorem 2.3. $\square$

Proof of Lemma 7.4.
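The limit $Z^{\mathcal{C}}$ can be visualised by simulation. The sketch below is an illustration only: the covariance `Sigma0`, the cone $\mathcal{C} = [0,\infty)\times\mathbb{R}$ and the choice $F = I_2$ (under which the $F$-projection is a coordinatewise clipping) are arbitrary assumptions, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setting: F = I_2, so the F-projection onto the convex cone
# C = [0, inf) x R reduces to clipping the first coordinate at 0.
Sigma0 = np.array([[1.0, 0.3],
                   [0.3, 1.0]])              # arbitrary covariance for Z
Z = rng.multivariate_normal(np.zeros(2), Sigma0, size=100_000)
ZC = Z.copy()
ZC[:, 0] = np.maximum(ZC[:, 0], 0.0)         # Z^C: projection of Z onto C

# When the first component of theta* lies on the boundary, the limit law of
# sqrt(n)(theta_hat_n - theta*) puts mass 1/2 at 0 on that component.
frac_on_boundary = float(np.mean(ZC[:, 0] == 0.0))
print(frac_on_boundary)
```

Half of the draws land on the face of the cone, reproducing the familiar "point mass one half at zero" behaviour of QML estimators of a parameter on the boundary.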
Recall that
\[
R_n(\theta) = \frac{1}{n}\Big\{\frac{\partial \widehat{L}_n(\theta^*)}{\partial\theta'} - \frac{\partial L_n(\theta^*)}{\partial\theta'}\Big\}(\theta - \theta^*)
+ \frac{1}{2}(\theta - \theta^*)'\Big(\frac{1}{n}\frac{\partial^2 \widehat{L}_n(\tilde\theta)}{\partial\theta\partial\theta'} + F\Big)(\theta - \theta^*).
\]
According to Lemmas 7.2 (i.) and 7.3 (ii.), when $\tilde\theta_n - \theta^* = o_P(1)$, we have
\[
n R_n(\tilde\theta_n) = o_P\big(\sqrt{n}\,\|\tilde\theta_n - \theta^*\|\big) + o_P\big(n\|\tilde\theta_n - \theta^*\|^2\big). \quad (7.19)
\]
This implies
\[
n R_n(\tilde\theta_n) = o_P(1) \quad \text{when} \quad \sqrt{n}(\tilde\theta_n - \theta^*) = O_P(1). \quad (7.20)
\]
It comes from the definition of $\theta^Z_n$ that
\[
\big\|\sqrt{n}(\theta^Z_n - \theta^*)\big\|_F \le \big\|\sqrt{n}(\theta^Z_n - \theta^*) - Z_n\big\|_F + \|Z_n\|_F \le 2\|Z_n\|_F.
\]
Moreover, the convergence in (7.18) implies $\|Z_n\|_F = O_P(1)$; and consequently, $\sqrt{n}(\theta^Z_n - \theta^*) = O_P(1)$. Thus, $n R_n(\theta^Z_n) = o_P(1)$ by virtue of (7.20).

We now show that $n R_n(\widehat\theta_n) = o_P(1)$ also holds. From (7.13), we have
\[
\|Z_n\|^2_F - \big\|Z_n - \sqrt{n}(\widehat\theta_n - \theta^*)\big\|^2_F + 2n R_n(\widehat\theta_n) = 2\big\{\widehat{L}_n(\widehat\theta_n) - \widehat{L}_n(\theta^*)\big\} \ge 0,
\]
where the inequality holds since $\widehat\theta_n = \text{argmax}_{\theta\in\Theta}\, \widehat{L}_n(\theta)$. Thus, it holds that
\[
\big\|\sqrt{n}(\widehat\theta_n - \theta^*)\big\|^2_F \le 2\Big(\big\|Z_n - \sqrt{n}(\widehat\theta_n - \theta^*)\big\|^2_F + \|Z_n\|^2_F\Big) \le 4\|Z_n\|^2_F + 4n R_n(\widehat\theta_n).
\]
Furthermore, since $\widehat\theta_n \overset{a.s.}{\longrightarrow} \theta^*$, by (7.19), it follows that $n R_n(\widehat\theta_n) = o_P\big(\big\|\sqrt{n}(\widehat\theta_n - \theta^*)\big\|_F\big) + o_P\big(\big\|\sqrt{n}(\widehat\theta_n - \theta^*)\big\|^2_F\big)$. Consequently, $\sqrt{n}(\widehat\theta_n - \theta^*) = O_P(1)$, and $n R_n(\widehat\theta_n) = o_P(1)$ holds according to (7.20). This completes the proof of the lemma. $\square$

Under $H_0$, we have $\Gamma\widehat\theta_n - \vartheta_0 = \Gamma(\widehat\theta_n - \theta^*)$. Then, we get
\[
W_n = n\big(\Gamma\widehat\theta_n - \vartheta_0\big)'\big(\Gamma\widehat\Sigma_n\Gamma'\big)^{-1}\big(\Gamma\widehat\theta_n - \vartheta_0\big)
= n\big(\widehat\theta_n - \theta^*\big)'\Gamma'\big(\Gamma\widehat\Sigma_n\Gamma'\big)^{-1}\Gamma\big(\widehat\theta_n - \theta^*\big)
\]
\[
= \sqrt{n}\big(\widehat\theta_n - \theta^*\big)'\Gamma'\big(\Gamma\Sigma\Gamma'\big)^{-1}\Gamma\sqrt{n}\big(\widehat\theta_n - \theta^*\big)
+ \sqrt{n}\big(\widehat\theta_n - \theta^*\big)'\Gamma'\Big(\big(\Gamma\widehat\Sigma_n\Gamma'\big)^{-1} - \big(\Gamma\Sigma\Gamma'\big)^{-1}\Big)\Gamma\sqrt{n}\big(\widehat\theta_n - \theta^*\big). \quad (7.21)
\]
Recall that, by Theorem 2.4, we have $\sqrt{n}(\widehat\theta_n - \theta^*) \overset{\mathcal{D}}{\longrightarrow} Z^{\mathcal{C}}$ with $Z \sim \mathcal{N}_d(0, \Sigma)$. Furthermore, $(\Gamma\widehat\Sigma_n\Gamma')^{-1} - (\Gamma\Sigma\Gamma')^{-1} = o_P(1)$. Thus, from (7.21), it holds that
\[
W_n = \sqrt{n}\big(\widehat\theta_n - \theta^*\big)'\Gamma'\big(\Gamma\Sigma\Gamma'\big)^{-1}\Gamma\sqrt{n}\big(\widehat\theta_n - \theta^*\big) + o_P(1)
\underset{n\to\infty}{\overset{\mathcal{D}}{\longrightarrow}} \big(\Gamma Z^{\mathcal{C}}\big)'\big(\Gamma\Sigma\Gamma'\big)^{-1}\Gamma Z^{\mathcal{C}},
\]
which establishes the theorem. $\square$

Proof of Corollary 2.5.
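In practice, the statistic $W_n$ is a direct matrix computation. The sketch below is an illustration of the formula only: all numerical values are hypothetical, and in a real application $\widehat\Sigma_n$ would be built from the QMLE. It tests the nullity of the last two components of a 4-dimensional parameter, e.g. the coefficients attached to the exogenous covariates.

```python
import numpy as np

def wald_stat(theta_hat, Sigma_hat, Gamma, vartheta0, n):
    # W_n = n (Gamma theta_hat - vartheta0)' (Gamma Sigma_hat Gamma')^{-1} (Gamma theta_hat - vartheta0)
    diff = Gamma @ theta_hat - vartheta0
    return float(n * diff @ np.linalg.inv(Gamma @ Sigma_hat @ Gamma.T) @ diff)

# Hypothetical inputs for H0 : theta_3 = theta_4 = 0.
Gamma = np.array([[0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
vartheta0 = np.zeros(2)
theta_hat = np.array([0.4, 0.2, 0.05, -0.03])   # fictitious QMLE
Sigma_hat = np.eye(4)                           # fictitious covariance estimate
W = wald_stat(theta_hat, Sigma_hat, Gamma, vartheta0, n=1000)
print(W)  # 1000 * (0.05**2 + 0.03**2) = 3.4 with these values
```

When $\theta^*$ is an interior point, $W_n$ is compared with a chi-square quantile (Corollary 2.5); on the boundary, quantiles of the non-standard limit $(\Gamma Z^{\mathcal{C}})'(\Gamma\Sigma\Gamma')^{-1}\Gamma Z^{\mathcal{C}}$ have to be used instead.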
When $\theta^* \in \overset{\circ}{\Theta}$, we have
\[
W_n \underset{n\to\infty}{\overset{\mathcal{D}}{\longrightarrow}} Z'\Gamma'\big(\Gamma\Sigma\Gamma'\big)^{-1}\Gamma Z = \|U\|^2 \quad \text{with} \quad U = \big(\Gamma\Sigma\Gamma'\big)^{-1/2}\Gamma Z \ \text{and} \ Z \sim \mathcal{N}_d(0, \Sigma).
\]
Since $\Sigma$ is symmetric, the vector $U$ follows a multivariate Gaussian distribution with mean 0 and a covariance matrix equal to the identity. Therefore, all the components of $U$ are independent, standard normally distributed random variables. This leads to the conclusion. $\square$

Consider the following lemma.
Lemma 7.5
Assume that the conditions of Theorem 3.1 hold. Then
\[
\frac{1}{\sqrt{n \log\log n}}\Big\|\frac{\partial \widehat{L}_n(\theta)}{\partial\theta} - \frac{\partial L_n(\theta)}{\partial\theta}\Big\|_\Theta \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} 0.
\]

Proof of Lemma 7.5.
Using the inequality (7.6) and Corollary 1 of Kounias and Weng (1969), it suffices to show that
\[
\sum_{k\ge 3}\frac{1}{\sqrt{k \log\log k}}\, E\Big[\Big\|\frac{\partial \widehat{q}_k(\theta)}{\partial\theta} - \frac{\partial q_k(\theta)}{\partial\theta}\Big\|_\Theta\Big] < \infty. \quad (7.22)
\]
In the proof of Lemma 7.2, we have established that
\[
E\Big[\Big\|\frac{\partial \widehat{q}_k(\theta)}{\partial\theta} - \frac{\partial q_k(\theta)}{\partial\theta}\Big\|_\Theta\Big]
\le C\sum_{j\ge k}\big\{\alpha^{(0)}_{j,Y}(f_\theta,\Theta) + \alpha^{(0)}_{j,X}(f_\theta,\Theta) + \alpha^{(1)}_{j,Y}(f_\theta,\Theta) + \alpha^{(1)}_{j,X}(f_\theta,\Theta)
+ \alpha^{(0)}_{j,Y}(M_\theta,\Theta) + \alpha^{(0)}_{j,X}(M_\theta,\Theta) + \alpha^{(1)}_{j,Y}(M_\theta,\Theta) + \alpha^{(1)}_{j,X}(M_\theta,\Theta)\big\}
= C\sum_{j\ge k}\sum_{i=0}^{1}\big\{\alpha^{(i)}_{j,Y}(f_\theta,\Theta) + \alpha^{(i)}_{j,X}(f_\theta,\Theta) + \alpha^{(i)}_{j,Y}(M_\theta,\Theta) + \alpha^{(i)}_{j,X}(M_\theta,\Theta)\big\}.
\]
Then, from the condition (3.4), we obtain
\[
\sum_{k\ge 3}\frac{1}{\sqrt{k \log\log k}}\, E\Big[\Big\|\frac{\partial \widehat{q}_k(\theta)}{\partial\theta} - \frac{\partial q_k(\theta)}{\partial\theta}\Big\|_\Theta\Big]
\le C\sum_{k\ge 3}\frac{1}{\sqrt{k \log\log k}}\sum_{j\ge k}\sum_{i=0}^{1}\big\{\alpha^{(i)}_{j,Y}(f_\theta,\Theta) + \alpha^{(i)}_{j,X}(f_\theta,\Theta) + \alpha^{(i)}_{j,Y}(M_\theta,\Theta) + \alpha^{(i)}_{j,X}(M_\theta,\Theta)\big\} < \infty.
\]
Hence, (7.22) is satisfied, and Lemma 7.5 holds. $\square$
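The $\sqrt{n\log\log n}$ normalisation in Lemma 7.5 is the scale of the law of the iterated logarithm invoked below for the score. As a side illustration (i.i.d. standard normal increments as a simple stand-in for a square integrable martingale difference sequence; not part of the proof), one can check by simulation that $|S_t|/\sqrt{2t\log\log t}$ remains bounded along a long trajectory:

```python
import numpy as np

rng = np.random.default_rng(2)

# Partial sums S_t of i.i.d. N(0,1) increments: by the LIL,
# limsup_t |S_t| / sqrt(2 t log log t) = 1 almost surely.
n = 10**6
S = np.cumsum(rng.standard_normal(n))
t = np.arange(16, n + 1)                   # start where log log t > 0
ratio = np.abs(S[t - 1]) / np.sqrt(2.0 * t * np.log(np.log(t)))
print(ratio.max())  # stays bounded (typically around 1 or below)
```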
Proof of Theorem 3.1. Let us prove the part (i.) of the theorem.

(i.) We have $P(\widehat{m}_n = m^*) = 1 - P(\widehat{m}_n \supsetneq m^*) - P(\widehat{m}_n \nsupseteq m^*)$. Therefore, it suffices to show that
\[
\lim_{n\to\infty} P(\widehat{m}_n \supsetneq m^*) = \lim_{n\to\infty} P(\widehat{m}_n \nsupseteq m^*) = 0. \quad (7.23)
\]
1. Let $m \in \mathcal{M}$ such that $m \supsetneq m^*$. We have
\[
\frac{1}{\sqrt{\log\log n}}\big(\widehat{C}(m^*) - \widehat{C}(m)\big)
= \frac{2}{\sqrt{\log\log n}}\big(\widehat{L}_n(\widehat\theta(m)) - \widehat{L}_n(\widehat\theta(m^*))\big) - \frac{\kappa_n}{\sqrt{\log\log n}}\big(|m| - |m^*|\big). \quad (7.24)
\]
Let us establish that
\[
\frac{1}{\sqrt{\log\log n}}\big(\widehat{L}_n(\widehat\theta(m)) - \widehat{L}_n(\widehat\theta(m^*))\big) = O_P(1). \quad (7.25)
\]
From the Taylor expansion of $\widehat{L}_n$, we can find $\bar\theta(m)$ between $\widehat\theta(m)$ and $\theta^*$ such that
\[
\widehat{L}_n(\widehat\theta(m)) - \widehat{L}_n(\theta^*)
= \frac{\partial L_n(\theta^*)}{\partial\theta'}\big(\widehat\theta(m) - \theta^*\big)
- \frac{1}{2}\sqrt{n}\big(\widehat\theta(m) - \theta^*\big)' F(\theta^*, m)\, \sqrt{n}\big(\widehat\theta(m) - \theta^*\big) + n R'_n(m), \quad (7.26)
\]
where
\[
R'_n(m) = \frac{1}{n}\Big\{\frac{\partial \widehat{L}_n(\theta^*)}{\partial\theta'} - \frac{\partial L_n(\theta^*)}{\partial\theta'}\Big\}\big(\widehat\theta(m) - \theta^*\big)
+ \frac{1}{2}\big(\widehat\theta(m) - \theta^*\big)'\Big(\frac{1}{n}\frac{\partial^2 \widehat{L}_n(\bar\theta(m))}{\partial\theta\partial\theta'} + F(\theta^*, m)\Big)\big(\widehat\theta(m) - \theta^*\big)
\]
and $F(\theta^*, m) = \Big(E\Big[\frac{\partial^2 q_0(\theta^*)}{\partial\theta_i\partial\theta_j}\Big]\Big)_{i,j\in m}$. Moreover, since $\widehat\theta(m), \bar\theta(m) \overset{a.s.}{\longrightarrow} \theta^*$, in this case of overfitting, the same arguments as in the proof of Lemma 7.3 (ii.) lead to
\[
-\frac{1}{n}\frac{\partial^2 \widehat{L}_n(\bar\theta(m))}{\partial\theta\partial\theta'} \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} F(\theta^*, m).
\]
Then, one can show as in the proof of Theorem 2.3 that $n R'_n(m) = o_P(1)$. Also, we have $\sqrt{n}\big(\widehat\theta(m) - \theta^*\big) = O_P(1)$.
In addition, $\big\{\frac{\partial q_t(\theta^*)}{\partial\theta}, \mathcal{F}_{t-1},\, t \in \mathbb{Z}\big\}$ is a stationary ergodic, square integrable martingale difference sequence; hence, from the law of the iterated logarithm for martingales (see Stout (1970)),
\[
\frac{1}{\sqrt{n \log\log n}}\frac{\partial L_n(\theta^*)}{\partial\theta} = O(1) \ a.s.
\]
Thus, we have from (7.26),
\[
\frac{1}{\sqrt{\log\log n}}\big(\widehat{L}_n(\widehat\theta(m)) - \widehat{L}_n(\theta^*)\big)
= \frac{1}{\sqrt{n \log\log n}}\frac{\partial L_n(\theta^*)}{\partial\theta'}\sqrt{n}\big(\widehat\theta(m) - \theta^*\big)
- \frac{1}{2\sqrt{\log\log n}}\sqrt{n}\big(\widehat\theta(m) - \theta^*\big)' F(\theta^*, m)\, \sqrt{n}\big(\widehat\theta(m) - \theta^*\big)
+ \frac{1}{\sqrt{\log\log n}}\, n R'_n(m)
\]
\[
= O(1)\, O_P(1) + o(1)\, O_P(1)\, O_P(1) + o_P(1) = O_P(1). \quad (7.27)
\]
By using the same arguments with $m = m^*$, we get
\[
\frac{1}{\sqrt{\log\log n}}\big(\widehat{L}_n(\widehat\theta(m^*)) - \widehat{L}_n(\theta^*)\big) = O_P(1). \quad (7.28)
\]
Hence, (7.25) holds from (7.27) and (7.28). Therefore, since $\kappa_n/\sqrt{\log\log n} \underset{n\to\infty}{\longrightarrow} \infty$ and $|m| > |m^*|$, (7.24) and (7.25) lead to
\[
\frac{1}{\sqrt{\log\log n}}\big(\widehat{C}(m^*) - \widehat{C}(m)\big) \underset{n\to\infty}{\overset{P}{\longrightarrow}} -\infty.
\]
This implies that, for large $n$, $\widehat{C}(m) - \widehat{C}(m^*) > 0$ with probability tending to one; hence $P(\widehat{m}_n \supsetneq m^*) \underset{n\to\infty}{\longrightarrow} 0$.

2. Let $m \in \mathcal{M}$ such that $m \nsupseteq m^*$. We have
\[
\frac{1}{n}\big(\widehat{C}(m^*) - \widehat{C}(m)\big) = \frac{2}{n}\big(\widehat{L}_n(\widehat\theta(m)) - \widehat{L}_n(\widehat\theta(m^*))\big) - \frac{\kappa_n}{n}\big(|m| - |m^*|\big). \quad (7.29)
\]
Using the same arguments as in the proof of Theorem 3.1 of Bardet et al. (2020), we get
\[
\frac{1}{n}\big(\widehat{L}_n(\widehat\theta(m)) - \widehat{L}_n(\widehat\theta(m^*))\big) = L(\theta^*(m)) - L(\theta^*) + o(1) \ a.s.,
\]
where $L(\theta) = -E[q_0(\theta)]$, for all $\theta \in \Theta$. Note that the function $L : \Theta \to \mathbb{R}$ has a unique maximum at $\theta^*$ (see the proof of Theorem 2.2). Since $m \nsupseteq m^*$, it holds that $\theta^* \notin \Theta(m)$; and consequently, $L(\theta^*(m)) - L(\theta^*) < 0$. Thus, according to (7.29) and since $\kappa_n/n \underset{n\to\infty}{\longrightarrow} 0$, we get
\[
\lim_{n\to\infty}\frac{1}{n}\big(\widehat{C}(m^*) - \widehat{C}(m)\big) < 0 \ a.s., \quad \text{and} \quad \widehat{C}(m) - \widehat{C}(m^*) > 0 \ a.s. \ \text{for large } n.
\]
This implies that $P(\widehat{m}_n \nsupseteq m^*) \underset{n\to\infty}{\longrightarrow} 0$. Hence, the condition (7.23) holds; and the part (i.) of the theorem is established.
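The mechanism of the proof — the penalty $\kappa_n|m|$ dominates for overfitted models, while the likelihood gap of order $n$ dominates for misspecified ones — can be illustrated on a toy AR($p$) order selection. This is a simplified stand-in for the general class of the paper; the data-generating AR(2) and the BIC-type penalty $\kappa_n = \log n$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate an AR(2): Y_t = 0.5 Y_{t-1} - 0.3 Y_{t-2} + xi_t.
n = 2000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.standard_normal()

maxp = 4
Y = y[maxp:]                     # common effective sample for all candidate orders

def minus2_gauss_loglik(p):
    # Gaussian QML for AR(p) via least squares; returns -2 x log-likelihood
    # up to an additive constant.
    if p == 0:
        resid = Y
    else:
        X = np.column_stack([y[maxp - k: n - k] for k in range(1, p + 1)])
        phi, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ phi
    return len(Y) * np.log(resid @ resid / len(Y))

kappa_n = np.log(len(Y))         # satisfies kappa_n/n -> 0 and kappa_n/log log n -> inf
C_hat = {p: minus2_gauss_loglik(p) + kappa_n * p for p in range(maxp + 1)}
p_hat = min(C_hat, key=C_hat.get)
print(p_hat)  # typically selects the true order p = 2
```

Underfitted orders ($p = 0, 1$) lose an amount of likelihood proportional to $n$, while $p = 3, 4$ only pay the extra penalty: this is exactly the dichotomy between parts 1. and 2. of the proof of (i.).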
(ii.) Let $m \in \mathcal{M}$ such that $m \supsetneq m^*$. We have
\[
\frac{1}{\log\log n}\big(\widehat{C}(m) - \widehat{C}(m^*)\big)
= \frac{2}{\log\log n}\big(\widehat{L}_n(\widehat\theta(m^*)) - \widehat{L}_n(\widehat\theta(m))\big) + \frac{\kappa_n}{\log\log n}\big(|m| - |m^*|\big).
\]
Moreover, from the same arguments as in the proof of Theorem 3.1 in Kengne (2021), one can show that
\[
\frac{1}{\log\log n}\big(\widehat{L}_n(\widehat\theta(m^*)) - \widehat{L}_n(\widehat\theta(m))\big) = O(1) \ a.s.
\]
Thus, we can find a constant $c > 0$ such that, if $\liminf_{n\to\infty} \kappa_n/\log\log n > c$, then
\[
\liminf_{n\to\infty}\frac{1}{\log\log n}\big(\widehat{C}(m) - \widehat{C}(m^*)\big) > 0 \ a.s.
\]
This implies that
\[
\widehat{C}(m) - \widehat{C}(m^*) > 0 \ a.s. \ \text{for large } n. \quad (7.30)
\]
Note that the inequality (7.30) also holds when $m \nsupseteq m^*$ (see the part 2. of the proof of (i.)). Hence, we deduce that $\widehat{m}_n = \text{argmin}_{m\in\mathcal{M}} \widehat{C}(m) = \text{argmin}_{m\in\mathcal{M}}\big(\widehat{C}(m) - \widehat{C}(m^*)\big) \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} m^*$, which establishes the strong consistency of $\widehat{m}_n$.

(iii.) Using Lemma 7.5, this part can be proved by going along similar lines as in Kengne (2021). $\square$

References

[1]
Bardet, J.-M., Kengne, W. and Wintenberger, O. Multiple breaks detection in general causal time series using penalized quasi-likelihood. Electronic Journal of Statistics 6, (2012), 435-477.

[2] Bardet, J.-M. and Wintenberger, O. Asymptotic normality of the quasi-maximum likelihood estimator for multidimensional causal processes. The Annals of Statistics 37(5B), (2009), 2730-2759.

[3] Bardet, J.-M., Kamila, K. and Kengne, W. Consistent model selection criteria and goodness-of-fit test for common time series models. Electronic Journal of Statistics 14, (2020), 2009-2052.

[4] Berkes, I., Horváth, L. and Kokoszka, P. GARCH processes: Structure and estimation. Bernoulli 9, (2003), 201-227.

[5] Bierens, H.J. Estimation, testing and specification of cross-section and time series models. Cambridge University Press, (1994).

[6] Doukhan, P. and Wintenberger, O. Weakly dependent chains with infinite memory. Stochastic Processes and their Applications 118, (2008), 1997-2013.

[7] Francq, C. and Sucarrat, G. An equation-by-equation estimator of a multivariate log-GARCH-X model of financial returns. Journal of Multivariate Analysis 153, (2017), 16-32.

[8] Francq, C. and Thieu, L.Q. QML inference for volatility models with covariates. Econometric Theory 35, (2019), 37-72.

[9] Francq, C. and Zakoïan, J.-M. Maximum likelihood estimation of pure GARCH and ARMA-GARCH processes. Bernoulli 10, (2004), 605-637.

[10] Grønneberg, S. and Holcblat, B.
On partial-sum processes of ARMAX residuals.
Annals of Statistics 47(6), (2019), 3216-3243.

[11] Guo, S., Ling, S. and Zhu, K. Factor double autoregressive models with application to simultaneous causality testing. Journal of Statistical Planning and Inference 148, (2014), 82-94.

[12] Han, H. and Kristensen, D. Asymptotic theory for the QMLE in GARCH-X models with stationary and nonstationary covariates. Journal of Business & Economic Statistics 32, (2014), 416-429.

[13] Han, H. Asymptotic properties of GARCH-X processes. Journal of Financial Econometrics 13, (2015), 188-221.

[14] Hannan, E.J. The identification and parameterization of ARMAX and state space forms. Econometrica: Journal of the Econometric Society, (1976), 713-723.

[15] Hannan, E.J., Dunsmuir, W.T.M. and Deistler, M. Estimation of vector ARMAX models. Journal of Multivariate Analysis 10, (1980), 275-295.

[16] Hannan, E.J. and Deistler, M. The statistical theory of linear systems. SIAM, (2012).

[17] Hoque, A. and Peters, T.A. Finite sample analysis of least squares in ARMAX models. The Indian Journal of Statistics, Series B 48, (1986), 266-283.

[18] Kengne, W. Strongly consistent model selection for general causal time series. Statistics and Probability Letters, (2021).

[19] Kounias, E.G. and Weng, T.-S. An inequality and almost sure convergence. Annals of Mathematical Statistics 40, (1969), 1091-1093.

[20] Ling, S. and McAleer, M. Asymptotic theory for a vector ARMA-GARCH model. Econometric Theory 19, (2003), 280-310.

[21] Ling, S. A double AR(p) model: structure and estimation. Statistica Sinica 17, (2007), 161-175.

[22] Nana, G.N., Korn, R. and Erlwein-Sayer, C. GARCH-extended models: theoretical properties and applications. arXiv:1307.6685v1, (2013).

[23] Pedersen, R.S. and Rahbek, A. Testing GARCH-X type models. Econometric Theory 35(5), (2018), 1-36.

[24] Stout, W.F. The Hartman-Wintner law of the iterated logarithm for martingales. The Annals of Mathematical Statistics 41, (1970), 2158-2160.

[25] Stout, W.F. Almost sure convergence. Academic Press, (1974).

[26]
Sucarrat, G., Grønneberg, S. and Escribano, A. Estimation and inference in univariate and multivariate log-GARCH-X models when the conditional density is unknown. Computational Statistics & Data Analysis 100, (2016), 582-594.
[27]
Zambom, A. Z. and Gel, Y. R.
Testing for local covariate trend effects in volatility models.