Wilks' theorem for semiparametric regressions with weakly dependent data
Marie Du Roy de Chaumaray, Matthieu Marbac, Valentin Patilea
June 12, 2020
Abstract
The empirical likelihood inference is extended to a class of semiparametric models for stationary, weakly dependent series. A partially linear single-index regression is used for the conditional mean of the series given its past and the present and past values of a vector of covariates. A parametric model for the conditional variance of the series is added to capture further nonlinear effects. We propose a fixed number of suitable moment equations which characterize the mean and variance model. We derive an empirical log-likelihood ratio which includes nonparametric estimators of several functions, and we show that this ratio has the same limit as in the case where these functions are known.

Keywords: $\alpha$-mixing; Dimension reduction; Kernel smoothing; Nonlinear time series; Nuisance function; Pivotal statistic.

Our aim is to model and to make inference on a one-dimensional time series $(Y_i)$ given a vector-valued time series $(V_i)$ and the past values of $Y_i$ and $V_i$, $i \in \mathbb{Z}$. For this purpose we propose flexible semiparametric models for the conditional mean and conditional variance of $Y_i$. Formally, let $(Z_i)$ be a strictly stationary and strongly mixing sequence of random vectors with $Z_i = (V_i^\top, \varepsilon_i)^\top \in \mathbb{R}^{d_X + d_W} \times \mathbb{R}$, where $V_i = (X_i^\top, W_i^\top)^\top \in \mathbb{R}^{d_X} \times \mathbb{R}^{d_W}$. Let $(\mathcal{F}_i)$ be its natural filtration. For any positive integer $r$, we denote the $r$ lagged values of $Z_i$ by $Z_i^{\{r\}} = (V_{i-1}^\top, Y_{i-1}, \ldots, V_{i-r}^\top, Y_{i-r})^\top$.

Let us consider the semiparametric model defined by
$$Y_i = \mu(V_i; \gamma, m) + \varepsilon_i \quad \text{with} \quad \mu(V_i; \gamma, m) = l(X_i; \gamma_1) + m(W_i^\top \gamma_2), \qquad (1)$$
where
$$E[\varepsilon_i \mid V_i, \mathcal{F}_{i-1}] = 0, \qquad (2)$$
and
$$E[\varepsilon_i^2 \mid V_i, \mathcal{F}_{i-1}] = \sigma^2(V_i, Z_i^{\{r\}}; \beta), \qquad (3)$$
with $\gamma = (\gamma_1^\top, \gamma_2^\top)^\top$, $\theta = (\gamma^\top, \beta^\top)^\top$ and $m(\cdot)$ an infinite-dimensional parameter. Thus $\theta$ gathers the finite-dimensional parameters, and our interest will focus on this vector, while $m(\cdot)$ is considered a nuisance parameter. The value of $r$, as well as the real-valued functions $l(\cdot)$ and $\sigma^2(\cdot)$, are given. Moreover, the functions we consider for $\sigma^2(\cdot)$ do not require knowing the infinite-dimensional parameter $m(\cdot)$. Let $\theta_0$ and $m_0(\cdot)$ denote the true values of the finite- and infinite-dimensional parameters of the model, respectively. The vector $V_i$ may include common random variables and/or lagged values of $Y_i$, as well as exogenous covariates. We call a model defined by (1)-(3) a CHPLSIM, which stands for Conditional Heteroscedastic Partially Linear Single-Index Model. The methodology we will propose in the following allows us to replace (3) by a higher-order moment equation, or to add higher-order moments to (3). For the sake of simplicity we keep (3) and we will only mention such possible extensions in the conclusion section.

The CHPLSIM is related to the model proposed by Lian et al. (2015) in the case of independent observations following the same distribution. Our model covers a wide class of models for weakly dependent and independent data. First, with $l(X_i; \gamma_1) = X_i^\top \gamma_1$, the CHPLSIM includes the partially linear single-index model (PLSIM) (Carroll et al., 1997) in which the errors $\varepsilon_i$ are independent and identically distributed (i.i.d.) variables and the $V_i$ are independent covariates. Such semiparametric models were originally used to overcome the curse of dimensionality inherent to nonparametric regression on $W_i$ by making use of a single index $W_i^\top \gamma_2$. The PLSIM includes the partially linear models with a single variable in the nonparametric part.
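To fix ideas, the following minimal Python sketch simulates a series of the CHPLSIM form (1)-(3): the linear part uses two lagged responses, the single-index function is a sinusoid, and the conditional variance is $\beta_1 + \beta_2 Y_{i-1}^2$, in the spirit of the numerical experiments of Section 5. The numerical values and the i.i.d. Gaussian covariates $W_i$ are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_W, burn = 1000, 3, 200
gamma1 = np.array([0.3, 0.1])        # coefficients of the two lagged responses (illustrative)
gamma2 = np.array([1.0, 0.5, -0.5])  # index direction, first component fixed to 1 (illustrative)
beta   = np.array([0.5, 0.2])        # conditional-variance parameters (illustrative)

total = n + burn + 2
W = rng.standard_normal((total, d_W))   # exogenous covariates; i.i.d. here for simplicity,
                                        # whereas the paper's design lets W_i be autocorrelated
Y = np.zeros(total)
for i in range(2, total):
    x_i   = np.array([Y[i - 1], Y[i - 2]])             # lagged responses enter the linear part
    m_val = 0.75 * np.sin(np.pi * (W[i] @ gamma2))     # nonparametric single-index part
    sig2  = beta[0] + beta[1] * Y[i - 1] ** 2          # conditional variance, as in (3)
    Y[i]  = x_i @ gamma1 + m_val + np.sqrt(sig2) * rng.standard_normal()
Y, W = Y[-n:], W[-n:]   # discard start-up values so the retained sample is close to stationarity
```

The generated series is weakly dependent (the conditional mean and variance depend on past values of $Y_i$), which is the setting the rest of the paper is concerned with.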
Our non-i.i.d. framework allows for heteroscedasticity in the errors of the PLSIM, with the conditional variance of the errors possibly depending on both the covariates and the lagged error values. For instance, it allows martingale difference errors, as considered by Chen and Cui (2008) and Fan and Liang (2010). Xia et al. (1999) considered a model defined by (1) for strongly mixing stationary time series, with the identity function $l(\cdot)$, $X_i = W_i$ and $W_i$ admitting a density. Their study focuses on the estimation of the parameters in the conditional mean function using kernel smoothing, without investigating the conditional variance, as condition (3) allows. In the same type of model, using local linear smoothing, Xia and Härdle (2006) allowed for $X_i$ not necessarily being equal to $W_i$ and, at the price of a trimming, relaxed the condition of a density for $W_i$ to a density for the index $W_i^\top \gamma_2$. More recently, using orthogonal series expansions, Dong et al. (2016) extended the model defined by (1) to the case where $X_i = W_i$ is a multi-dimensional integrated process.

Model (1)-(2) is also related to, and extends, a large class of location-scale type models called conditional heteroscedastic autoregressive nonlinear (CHARN) models (Härdle et al., 1998; Kanai et al., 2010). CHARN models include many well-known models widely used in application areas as different as foreign exchange rates (Bossaerts et al., 1996) or brain and muscular wave analysis (Kato et al., 2006). For general nonlinear autoregressive processes, we refer to the book of Tong (1990) for the basic definitions as well as numerous applications on real data sets. More generally, nonparametric techniques for nonlinear AR processes can be found in the review of Härdle et al. (1997). The CHPLSIM allows for a semiparametric specification of the conditional mean and for exogenous covariates.

We are interested in inference on the finite-dimensional parameter $\theta$, constituted of finite-dimensional parameters from both the conditional mean and the conditional variance functions. When interest focuses on the parameters of the conditional mean, it suffices to consider equations (1)-(2) with a fully nonparametric conditional variance $\sigma^2(\cdot)$. However, in the time series context, modeling the variance can be important, for instance for forecasting purposes. For our inference purpose, we propose a semiparametric empirical likelihood approach with infinite-dimensional nuisance parameters. Empirical likelihood (EL), introduced by Owen (1988, 2001), is a general inference approach for models specified by moment conditions. Under the assumption of independence between observations, empirical likelihood has been used for inference on finite-dimensional parameters in regression models and unconditional moment equations. See Qin and Lawless (1994); see also the review of Chen and Van Keilegom (2009).

Under the i.i.d. data assumption, Wang and Jing (1999, 2003) and Lu (2009) studied the conditions implying that the empirical likelihood log-ratio (ELR) still converges to a chi-square distribution for the partially linear model. Due to the curse of dimensionality, the performance of the nonparametric estimators decreases dramatically with the number of variables. Xue and Zhu (2006) and Zhu and Xue (2006) show that, if the density of the index is bounded away from zero, the ELR converges to a chi-square distribution and thus permits parameter testing, for the single-index model and the PLSIM respectively (see also Zhu et al. (2010)).
The aim of this paper is to propose a novel, general, semiparametric regression framework for EL inference with weakly dependent data. Some related cases have been considered in the literature. For instance, the ELR with longitudinal data has been considered by Xue and Zhu (2007), for the partially linear model, and by Li et al. (2010), for the PLSIM. In their framework, the convergence of the ELR is guaranteed by the independence between individuals, for each of which a bounded number of repeated observations is available. Empirical likelihood has also been used for specific models in time series (see the review of Nordman and Lahiri (2014)). Most of the methods developed in this context are based on a blockwise version of empirical likelihood, first introduced by Kitamura (1997), in which consecutive data points are grouped into blocks. A large number of generalizations have been proposed in the literature, depending on the type of dependency under consideration. We refer to Nordman and Lahiri (2014) for an overview of those blocking techniques. However, in such an approach, one has to tune additional parameters, such as the number, the length or the overlap of the blocks, which may be a complex task.

Our contribution is the extension of the EL inference approach to the case of the CHPLSIM defined by (1)-(3), for weakly dependent data. This extension is developed without imposing that the density of the index be bounded away from zero, as is usually assumed in the literature in the case of i.i.d. data. See, for instance, Zhu and Xue (2006), Zhu et al. (2010) and Lian et al. (2015). Such a very convenient, though quite stringent, condition implies a bounded support for the index, a restriction which makes practically no sense in a general time series framework. To obtain our results, a crucial preliminary step before using EL consists of building a fixed number of suitable unconditional moment equations equivalent to the conditional moment equations defining the regression model. We then follow the lines of Qin and Lawless (1994), with the difference of the presence of infinite-dimensional nuisance parameters. We show that the nonparametric estimation of the nuisance parameters does not affect the asymptotics and the ELR still converges to a chi-square distribution. The negligibility of the nonparametric estimation effect is obtained under mild conditions on the smoothing parameter. Chang et al. (2015) studied EL inference for unconditional moment equations under strongly mixing conditions, with the number of moment equations allowed to increase with the sample size. Since, in principle, conditional moment equation models could be approximated by models defined by a large number of unconditional moment equations, Chang et al. (2015) could also cover semiparametric models. However, the practical effectiveness of their approach remains an uninvestigated issue.

The article is organized as follows. In Section 2 we consider the profiling approach for the nuisance parameter $m(\cdot)$ and the identification issue for the finite-dimensional parameters. Next, we establish the equivalence between our model equations and suitable unconditional moment estimating equations in Section 3. The number of unconditional equations is given by the dimension of the vector of identifiable parameters in the (CH)PLSIM. Section 4 presents the ELR and the Wilks' Theorem in our context. Section 5 illustrates the methodology by numerical experiments and an application using daily pollution data inspired by the study of Lian et al. (2015).
Section 6 contains some additional discussion. The proofs and mathematical details are presented in the Appendix.

Let
$$g_\mu(Z_i; \gamma, m) = Y_i - \mu(V_i; \gamma, m),$$
with $\mu(\cdot)$ defined in (1). The partially linear single-index model (PLSIM) is defined by the conditional moment equation
$$E[g_\mu(Z_i; \gamma, m) \mid V_i, \mathcal{F}_{i-1}] = 0 \iff \gamma = \gamma_0 \text{ and } m = m_0. \qquad (4)$$
In such a case, the conditional variance of the residuals has to be finite but does not necessarily have a parametric form.

The conditionally heteroscedastic partially linear single-index model (CHPLSIM) is defined by two conditional moment equations. In this case, we assume that the second-order conditional moment of the residuals has a semiparametric form. More precisely, the model is defined by the conditional moment equations
$$\begin{cases} E[g_\mu(Z_i; \gamma, m) \mid V_i, \mathcal{F}_{i-1}] = 0 \\ E[g_\sigma(Z_i, Z_i^{\{r\}}; \theta, m) \mid V_i, \mathcal{F}_{i-1}] = 0 \end{cases} \iff \theta = \theta_0 \text{ and } m = m_0, \qquad (5)$$
where
$$g_\sigma(Z_i, Z_i^{\{r\}}; \theta, m) = g_\mu^2(Z_i; \gamma, m) - \sigma^2(V_i, Z_i^{\{r\}}; \beta), \qquad (6)$$
with $\sigma^2(\cdot)$ defined in (3).

The model defined by (1)-(2) requires a methodology for estimating $\theta$ and $m$, with $m$ lying in a function space. A common approach that avoids a simultaneous search involving an infinite-dimensional parameter is profiling (Severini and Wong, 1992; Liang et al., 2010), which defines
$$m_\gamma(t) = E[Y_i - l(X_i; \gamma_1) \mid W_i^\top \gamma_2 = t], \quad t \in \mathbb{R}.$$
As usual with such an approach, in the following it will be assumed that
$$m_{\gamma_0}(W_i^\top \gamma_{2,0}) = m_0(W_i^\top \gamma_{2,0}). \qquad (7)$$
Hence, one expects that, for each $x_i, w_i$, the value $\gamma_0$ gives the minimum of
$$\gamma \mapsto E\big[\{Y_i - l(x_i; \gamma_1) - m_\gamma(w_i^\top \gamma_2)\}^2 \mid X_i = x_i, W_i = w_i, \mathcal{F}_{i-1}\big].$$
However, even if $m_\gamma(\cdot)$ is well defined for any $\gamma = (\gamma_1^\top, \gamma_2^\top)^\top \in \Gamma \subset \mathbb{R}^{d_1} \times \mathbb{R}^{d_W}$, in general the value $\gamma_0$ need not be the unique parameter value with this minimum property. More precisely, in general the true value of the vector $\gamma_2$ is not identifiable and only its direction can be consistently estimated. The standard remedies to this identifiability issue are detailed in the following.

Concerning the identification of $\gamma_1 \in \mathbb{R}^{d_1}$, a minimal requirement is that as soon as $l(X_i; \gamma_1) = l(X_i; \gamma_{1,0})$ a.s., then necessarily $\gamma_1 = \gamma_{1,0}$. For instance, when $l(X_i; \gamma_1) = X_i^\top \gamma_1$, and thus $d_1 = d_X$, this requires $E(X_i X_i^\top)$ to be invertible. The nonparametric part $m_\gamma(\cdot)$ induces some more constraints. It could absorb any intercept in the model equation. Thus, in particular, when $l(X_i; \gamma_1) = X_i^\top \gamma_1$, the vectors $X_i$ and $W_i$ should not contain constant components.

There are two common approaches to restrict $\gamma_2$ for identification purposes: either fix one component equal to 1 (Ma and Zhu, 2013), or set the norm of $\gamma_2$ equal to 1 and fix the sign of one of its components (Zhu and Xue, 2006). Without loss of generality, we choose the first component of $\gamma_2$ to impose the constraint of value or sign. When the value of the first component is fixed, the parameter $\gamma_2$ can be redefined as $\gamma_2 = (1, \widetilde{\gamma}_2^\top)^\top$ where $\widetilde{\gamma}_2 \in \mathbb{R}^{d_W - 1}$. The Jacobian matrix of this reparametrization of $\gamma_2$ is the $d_W \times (d_W - 1)$ matrix
$$J_2(\gamma_2) = \frac{\partial \gamma_2}{\partial \widetilde{\gamma}_2} = \begin{pmatrix} 0_{1 \times (d_W - 1)} \\ I_{d_W - 1} \end{pmatrix}, \qquad (8)$$
where $0_{1 \times (d_W - 1)}$ denotes the null $1 \times (d_W - 1)$ matrix, while $I_{d_W - 1}$ is the $(d_W - 1) \times (d_W - 1)$ identity matrix. Under the unit-norm constraint, the parameter can be redefined as
$$\gamma_2 = \Big(\sqrt{1 - \|\widetilde{\gamma}_2\|^2}, \; \widetilde{\gamma}_2^\top\Big)^\top,$$
where now $\widetilde{\gamma}_2 \in \{z \in \mathbb{R}^{d_W - 1} : \|z\| \leq 1\}$. The Jacobian matrix of this reparametrization using the normalization of $\gamma_2$ is the $d_W \times (d_W - 1)$ matrix
$$J_2(\gamma_2) = \frac{\partial \gamma_2}{\partial \widetilde{\gamma}_2} = \begin{pmatrix} -\{1 - \|\widetilde{\gamma}_2\|^2\}^{-1/2} \, \widetilde{\gamma}_2^\top \\ I_{d_W - 1} \end{pmatrix}. \qquad (9)$$
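As a small illustration of the two reparametrizations, the sketch below builds $\gamma_2$ and the Jacobians (8) and (9) and checks (9) against finite differences; the variable names and numerical values are ours.

```python
import numpy as np

def gamma2_fixed(gt):
    """gamma_2 = (1, gt')' ; Jacobian as in (8)."""
    g = np.concatenate(([1.0], gt))
    J = np.vstack([np.zeros((1, gt.size)), np.eye(gt.size)])
    return g, J

def gamma2_norm(gt):
    """gamma_2 = (sqrt(1 - ||gt||^2), gt')' ; Jacobian as in (9)."""
    r = np.sqrt(1.0 - gt @ gt)
    g = np.concatenate(([r], gt))
    J = np.vstack([-(gt / r)[None, :], np.eye(gt.size)])
    return g, J

# finite-difference check of (9) on illustrative values of the free parameter
gt = np.array([0.3, -0.2])
_, J = gamma2_norm(gt)
eps = 1e-6
num = np.column_stack([(gamma2_norm(gt + eps * e)[0] - gamma2_norm(gt - eps * e)[0]) / (2 * eps)
                       for e in np.eye(gt.size)])
print(np.allclose(J, num, atol=1e-6))   # True: analytic and numerical Jacobians agree
```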
Hereafter, when we refer to the true value of the finite-dimensional parameter, we implicitly assume that one of these two approaches has been chosen for identifying $\gamma_2$.

This section presents unconditional moment equations which permit parameter inference to be carried out using empirical likelihood. For ease of explanation, we start by introducing an unconditional moment equation which is equivalent to the conditional moment equation of the PLSIM defined in (4). We then introduce unconditional moment equations which are equivalent to the conditional moment equations of the CHPLSIM defined in (5).
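The profiled nuisance function $m_\gamma$ of (7) enters all the equations below. As a point of reference, here is a minimal Nadaraya-Watson sketch of it, assuming a linear $l(x;\gamma_1)=x^\top\gamma_1$ and a Gaussian kernel (names are ours); note that, unlike this illustration, the moment equations constructed in this section deliberately avoid dividing by the estimated index density.

```python
import numpy as np

def nw_profile(Y, X, W, gamma1, gamma2, h, tgrid):
    """Nadaraya-Watson estimate of m_gamma(t) = E[Y - l(X; gamma1) | W'gamma2 = t],
    evaluated on tgrid (t should lie in the range of the observed index values)."""
    r = Y - X @ gamma1                  # profiled response Y_i - l(X_i; gamma1)
    idx = W @ gamma2                    # index values W_i' gamma2
    K = np.exp(-0.5 * ((tgrid[:, None] - idx[None, :]) / h) ** 2)  # unnormalized Gaussian weights
    return (K @ r) / K.sum(axis=1)      # ratio of kernel sums; the constant cancels
```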
For the PLSIM, it is quite standard (Zhu and Xue, 2006) to consider the unconditional equation
$$E[g_\mu(Z_i; \gamma, m_\gamma) \, \widetilde{\nabla}_\gamma g_\mu(Z_i; \gamma, m_\gamma)] = 0, \qquad (10)$$
where $\gamma = (\gamma_1^\top, \gamma_2^\top)^\top \in \mathbb{R}^{d_\gamma}$, $d_\gamma = d_1 + d_W$, and $\widetilde{\nabla}_\gamma g_\mu(Z_i; \gamma, m_\gamma) = J(\gamma)^\top \nabla_\gamma g_\mu(Z_i; \gamma, m_\gamma) \in \mathbb{R}^{d_\gamma - 1}$, with $J(\gamma)$ the $d_\gamma \times (d_\gamma - 1)$ Jacobian matrix of the reparametrization chosen to guarantee the identification of the finite-dimensional parameter, and $\nabla_\gamma$ ($\nabla_{\gamma_1}$, resp.) the column matrix-valued operator of the first-order partial derivatives with respect to the components of $\gamma \in \mathbb{R}^{d_\gamma}$ ($\gamma_1 \in \mathbb{R}^{d_1}$, resp.). In our context,
$$\nabla_\gamma g_\mu(Z_i; \gamma, m_\gamma) = - \begin{pmatrix} \nabla_{\gamma_1} l(X_i; \gamma_1) - E[\nabla_{\gamma_1} l(X_i; \gamma_1) \mid W_i^\top \gamma_2] \\ m_\gamma'(W_i^\top \gamma_2) \big( W_i - E[W_i \mid W_i^\top \gamma_2] \big) \end{pmatrix}
\quad \text{and} \quad
J(\gamma) = \begin{pmatrix} I_{d_1} & 0_{d_1 \times (d_W - 1)} \\ 0_{d_W \times d_1} & J_2(\gamma_2) \end{pmatrix},$$
with $m_\gamma'(\cdot)$ the derivative of $m_\gamma(\cdot)$ and $J_2(\gamma_2)$ the Jacobian matrix of the reparametrization of $\gamma_2$, that is either the matrix defined in (8) or that defined in (9).

The following lemma proposes a new unconditional moment equation obtained by introducing a positive weighting function $\omega(V_i)$ in (10). Showing the equivalence between the conditional moment equation (4) and our new unconditional moment equation, we deduce that the latter can be used for EL inference.

Lemma 1. Let $\omega(\cdot)$ be a positive function of $V_i = (X_i^\top, W_i^\top)^\top$ and let $H_\mu(\gamma)$ be the Hessian matrix of the map
$$\gamma \mapsto E\big[ E[g_\mu(Z_i; \gamma, m_\gamma) \mid V_i, \mathcal{F}_{i-1}]^2 \, \omega(V_i) \big].$$
Assume that conditions (4) and (7) hold true and that $H_\mu(\gamma_0)$ is positive definite. Then
$$E[g_\mu(Z_i; \gamma, m_\gamma) \, \widetilde{\nabla}_\gamma g_\mu(Z_i; \gamma, m_\gamma) \, \omega(V_i)] = 0 \iff \gamma = \gamma_0. \qquad (11)$$

For the PLSIM, we consider $\omega(V_i) = \eta_{\gamma,f}^4(W_i^\top \gamma_2)$, where $\eta_{\gamma,f}(W_i^\top \gamma_2)$ is the density of the index $W_i^\top \gamma_2$, which is assumed to exist. This choice of the weight $\omega(V_i)$ allows us to cancel all the terms $\eta_{\gamma,f}(W_i^\top \gamma_2)$ appearing in the denominators, and thus to keep them away from zero. Thus, for the control of the small values in the denominators, it is no longer needed to assume that the density of the index is bounded away from zero. This assumption, often imposed in the semiparametric literature, is quite unrealistic for bounded vectors $W_i$ and cannot even hold when the $W_i$'s are unbounded. Imposing bounded $W_i$ in a time series framework, where $W_i$ could include lagged values of $Y_i$, would be too restrictive.

Thus, we consider that the parameters are defined by the unconditional moment equations
$$E[\Psi(Z_i; \gamma, \eta_\gamma)] = 0, \qquad (12)$$
where $\Psi(Z_i; \gamma, \eta_\gamma) = g_\mu(Z_i; \gamma, m_\gamma) \, \widetilde{\nabla}_\gamma g_\mu(Z_i; \gamma, m_\gamma) \, \eta_{\gamma,f}^4(W_i^\top \gamma_2) \in \mathbb{R}^{d_\gamma - 1}$. Thus, we have
$$\Psi(Z_i; \gamma, \eta_\gamma) = \big( \{Y_i - l(X_i; \gamma_1)\} \eta_{\gamma,f}(W_i^\top \gamma_2) - \eta_{\gamma,m}(W_i^\top \gamma_2) \big) \times J(\gamma)^\top \begin{pmatrix} \eta_{\gamma,f}^2(W_i^\top \gamma_2) \big( \nabla_{\gamma_1} l(X_i; \gamma_1) \, \eta_{\gamma,f}(W_i^\top \gamma_2) - \eta_{\gamma,X}(W_i^\top \gamma_2) \big) \\ \eta_{\gamma,m'}(W_i^\top \gamma_2) \big( W_i \, \eta_{\gamma,f}(W_i^\top \gamma_2) - \eta_{\gamma,W}(W_i^\top \gamma_2) \big) \end{pmatrix}, \qquad (13)$$
where the vector $\eta_\gamma = (\eta_{\gamma,m}, \eta_{\gamma,m'}, \eta_{\gamma,X}, \eta_{\gamma,W}, \eta_{\gamma,f})^\top$ gathers all the nonparametric elements and, using the stationarity of the process, is given by
$$\begin{aligned}
\eta_{\gamma,m}(t) &= m_\gamma(t)\, \eta_{\gamma,f}(t) = E[Y_i - l(X_i; \gamma_1) \mid W_i^\top \gamma_2 = t] \, \eta_{\gamma,f}(t), \\
\eta_{\gamma,m'}(t) &= \eta_{\gamma,f}^2(t) \, \tfrac{\partial}{\partial t} m_\gamma(t) = \eta_{\gamma,f}^2(t) \, \tfrac{\partial}{\partial t} E[Y_i - l(X_i; \gamma_1) \mid W_i^\top \gamma_2 = t], \\
\eta_{\gamma,X}(t) &= E[\nabla_{\gamma_1} l(X_i; \gamma_1) \mid W_i^\top \gamma_2 = t] \, \eta_{\gamma,f}(t), \\
\eta_{\gamma,W}(t) &= E[W_i \mid W_i^\top \gamma_2 = t] \, \eta_{\gamma,f}(t), \quad t \in \mathbb{R}.
\end{aligned}$$
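The sketch below assembles the estimated moment vectors of (13) from kernel sums only, so that no ratio of estimators appears. It assumes a linear $l$, a Gaussian kernel and the "first component fixed to 1" parametrization; the powers of $\eta_{\gamma,f}$ follow the reconstruction of (13) given above and should be checked against the original article.

```python
import numpy as np

def gk(u):   # Gaussian kernel
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def gkp(u):  # derivative of the Gaussian kernel
    return -u * gk(u)

def psi_plsim(Y, X, W, gamma1, gamma2_tilde, h):
    """Moment vectors Psi_i of (13), PLSIM case, with l(x; gamma1) = x'gamma1 and
    gamma2 = (1, gamma2_tilde')'.  Only products of kernel sums are used, so the
    estimated index density never appears in a denominator (minimal sketch)."""
    n, dW = W.shape
    gamma2 = np.concatenate(([1.0], gamma2_tilde))
    t = W @ gamma2                         # index values W_i' gamma2
    r = Y - X @ gamma1                     # Y_i - l(X_i; gamma1)
    D = (t[None, :] - t[:, None]) / h      # D[i, j] = (t_j - t_i)/h
    Kw = gk(D) / (n * h)                   # kernel weights for the level estimators
    Kd = gkp(D) / (n * h ** 2)             # weights for the derivative-type estimator
    f  = Kw.sum(axis=1)                    # eta_f(t_i)
    m  = Kw @ r                            # eta_m(t_i)
    eX = Kw @ X                            # eta_X(t_i), one column per component of grad l = X
    eW = Kw @ W                            # eta_W(t_i)
    m1 = f * (Kd @ r) - m * Kd.sum(axis=1) # eta_m'(t_i), product form, no ratios
    g  = r * f - m                         # {Y_i - l}eta_f - eta_m
    top = (f ** 2)[:, None] * (X * f[:, None] - eX)       # gamma_1 block (power 2 assumed)
    bot = m1[:, None] * (W * f[:, None] - eW)             # gamma_2 block
    J2 = np.vstack([np.zeros((1, dW - 1)), np.eye(dW - 1)])  # Jacobian (8)
    return g[:, None] * np.hstack([top, bot @ J2])        # n x (d_1 + d_W - 1) array
```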
For the CHPLSIM we have to construct an unconditional moment equation to take into account the conditional variance condition in (3). In this case, the finite-dimensional parameter is $\theta = (\gamma^\top, \beta^\top)^\top \in \mathbb{R}^{d_\theta}$ with $d_\theta = d_\gamma + d_\beta$. Given the definition (6), we have
$$\nabla_\beta g_\sigma(Z_i, Z_i^{\{r\}}; \theta, m) = -\nabla_\beta \sigma^2(V_i, Z_i^{\{r\}}; \beta) \in \mathbb{R}^{d_\beta}.$$
The following lemma provides the unconditional moment equations for EL inference in the CHPLSIM. The proof is similar to the proof of Lemma 1 and is thus omitted.

Lemma 2. Let $\omega_1(\cdot)$ and $\omega_2(\cdot)$ be positive functions of $V_i$. Assume that conditions (5) and (7) hold true and that $H_\mu(\gamma_0)$ and $H_\sigma(\beta_0)$ are positive definite. Then
$$\begin{cases} E[g_\mu(Z_i; \gamma, m_\gamma) \, \widetilde{\nabla}_\gamma g_\mu(Z_i; \gamma, m_\gamma) \, \omega_1(V_i)] = 0 \\ E[g_\sigma(Z_i, Z_i^{\{r\}}; \theta, m_\gamma) \, \nabla_\beta \sigma^2(V_i, Z_i^{\{r\}}; \beta) \, \omega_2(V_i)] = 0 \end{cases} \iff \theta = \theta_0.$$

To cancel all the denominators induced by the nonparametric estimator, we take $\omega_1(V_i) = \eta_{\gamma,f}^4(W_i^\top \gamma_2)$ and $\omega_2(V_i) = \eta_{\gamma,f}^2(W_i^\top \gamma_2)$. Thus, we consider that the parameters are defined by the unconditional moment equations
$$E[\Psi(Z_i, Z_i^{\{r\}}; \theta, \eta_\gamma)] = 0, \qquad (14)$$
where $\eta_\gamma$ is defined as in Section 3 and $\Psi(Z_i, Z_i^{\{r\}}; \theta, \eta_\gamma) \in \mathbb{R}^{d_\theta - 1}$ with
$$\Psi(Z_i, Z_i^{\{r\}}; \theta, \eta_\gamma) = \begin{pmatrix} g_\mu(Z_i; \gamma, m_\gamma) \, \widetilde{\nabla}_\gamma g_\mu(Z_i; \gamma, m_\gamma) \, \eta_{\gamma,f}^4(W_i^\top \gamma_2) \\ g_\sigma(Z_i, Z_i^{\{r\}}; \theta, m_\gamma) \, \nabla_\beta \sigma^2(V_i, Z_i^{\{r\}}; \beta) \, \eta_{\gamma,f}^2(W_i^\top \gamma_2) \end{pmatrix}. \qquad (15)$$

In the sequel, for EL inference in the CHPLSIM we use condition (14), while for EL inference in the PLSIM we use condition (12). With a slight abuse of notation, in the sequel we use the notation $\Psi(Z_i, Z_i^{\{r\}}; \theta, \eta_\gamma)$, with some given integer $r \geq 0$, for both the PLSIM and the CHPLSIM conditions. By definition, the case $r = 0$ corresponds to the case where $\Psi(Z_i, Z_i^{\{r\}}; \theta, \eta_\gamma)$ does not depend on the lagged values of $Z_i$. This is the case for the PLSIM, but this situation could also occur with the CHPLSIM. Note that, by construction,
$$E\big[ \Psi(Z_i, Z_i^{\{r\}}; \theta_0, \eta_0) \, \Psi(Z_j, Z_j^{\{r\}}; \theta_0, \eta_0)^\top \big] = 0, \quad \forall i \neq j. \qquad (16)$$

If $\eta_\gamma$ is given, the empirical likelihood, obtained with the unconditional moment conditions we propose for the (CH)PLSIM, is defined by
$$L(\theta, \eta_\gamma) = \max_{\pi_1, \ldots, \pi_n} \prod_{i=1}^n \pi_i(\theta, \eta_\gamma),$$
where $\sum_{i=1}^n \pi_i(\theta, \eta_\gamma) \Psi(Z_i, Z_i^{\{r\}}; \theta, \eta_\gamma) = 0$, $\pi_i(\theta, \eta_\gamma) \geq 0$ and $\sum_{i=1}^n \pi_i(\theta, \eta_\gamma) = 1$. Thus, we have
$$\pi_i(\theta, \eta_\gamma) = \frac{1}{n} \cdot \frac{1}{1 + \lambda(\theta, \eta_\gamma)^\top \Psi(Z_i, Z_i^{\{r\}}; \theta, \eta_\gamma)},$$
where $\lambda(\theta, \eta_\gamma) \in \mathbb{R}^{d_1 + d_W - 1}$ is the vector of Lagrange multipliers which allows the empirical counterpart of the restriction (14) to be satisfied, that is,
$$\sum_{i=1}^n \pi_i(\theta, \eta_\gamma) \Psi(Z_i, Z_i^{\{r\}}; \theta, \eta_\gamma) = 0.$$
The empirical log-likelihood ratio is then defined by
$$\ell_n(\theta, \eta_\gamma) = \sum_{i=1}^n \ln\big( 1 + \lambda(\theta, \eta_\gamma)^\top \Psi(Z_i, Z_i^{\{r\}}; \theta, \eta_\gamma) \big).$$
As the infinite-dimensional parameter $\eta_\gamma$ is unknown, a nonparametric estimator obtained by kernel smoothing is used instead. Thus, we propose
$$\ell_n(\theta, \widehat{\eta}_\gamma) = \sum_{i=1}^n \ln\big( 1 + \lambda(\theta, \widehat{\eta}_\gamma)^\top \Psi(Z_i, Z_i^{\{r\}}; \theta, \widehat{\eta}_\gamma) \big), \qquad (17)$$
where
$$\widehat{\eta}_\gamma = (\widehat{\eta}_{\gamma,m}, \widehat{\eta}_{\gamma,m'}, \widehat{\eta}_{\gamma,X}, \widehat{\eta}_{\gamma,W}, \widehat{\eta}_{\gamma,f})^\top, \qquad (18)$$
with
$$\widehat{\eta}_{\gamma,f}(t) = \frac{1}{nh} \sum_{i=1}^n K\Big(\frac{W_i^\top \gamma_2 - t}{h}\Big), \qquad
\widehat{\eta}_{\gamma,m}(t) = \frac{1}{nh} \sum_{i=1}^n \{Y_i - l(X_i; \gamma_1)\} K\Big(\frac{W_i^\top \gamma_2 - t}{h}\Big),$$
$$\widehat{\eta}_{\gamma,X}(t) = \frac{1}{nh} \sum_{i=1}^n \nabla_{\gamma_1} l(X_i; \gamma_1) K\Big(\frac{W_i^\top \gamma_2 - t}{h}\Big), \qquad
\widehat{\eta}_{\gamma,W}(t) = \frac{1}{nh} \sum_{i=1}^n W_i K\Big(\frac{W_i^\top \gamma_2 - t}{h}\Big),$$
and
$$\widehat{\eta}_{\gamma,m'}(t) = \frac{1}{nh^2} \bigg[ \widehat{\eta}_{\gamma,f}(t) \sum_{i=1}^n \{Y_i - l(X_i; \gamma_1)\} K'\Big(\frac{W_i^\top \gamma_2 - t}{h}\Big) - \widehat{\eta}_{\gamma,m}(t) \sum_{i=1}^n K'\Big(\frac{W_i^\top \gamma_2 - t}{h}\Big) \bigg],$$
where $K'(\cdot)$ is the derivative of the univariate kernel $K(\cdot)$ and $h$ is the bandwidth.
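Given the moment vectors $\Psi(Z_i, Z_i^{\{r\}}; \theta, \widehat{\eta}_\gamma)$, the statistic (17) can be computed by maximizing the concave dual criterion $\lambda \mapsto \sum_i \ln(1 + \lambda^\top \Psi_i)$. The following Python sketch (not the authors' implementation) does this with a plain Newton iteration and backtracking, and illustrates the chi-square calibration of Section 4 on toy moment vectors.

```python
import numpy as np
from scipy.stats import chi2

def el_log_ratio(psi, max_iter=100, tol=1e-10):
    """Empirical log-likelihood ratio 2*sum_i log(1 + lambda' psi_i), psi an n x d array.
    lambda maximizes the concave dual; backtracking keeps every 1 + lambda'psi_i > 0."""
    n, d = psi.shape
    lam = np.zeros(d)
    for _ in range(max_iter):
        z = 1.0 + psi @ lam
        grad = (psi / z[:, None]).sum(axis=0)
        if np.linalg.norm(grad) < tol:
            break
        hess = -(psi[:, :, None] * psi[:, None, :] / (z ** 2)[:, None, None]).sum(axis=0)
        step = np.linalg.solve(hess, -grad)      # Newton ascent direction
        s = 1.0
        while s > 1e-12 and np.min(1.0 + psi @ (lam + s * step)) <= 1e-12:
            s *= 0.5                              # backtrack to stay inside the domain
        if s > 1e-12:
            lam = lam + s * step
    return 2.0 * np.sum(np.log(1.0 + psi @ lam))

# Toy illustration of the chi-square calibration: under the null the moment vectors
# have mean zero and the degrees of freedom equal their dimension (d_theta - 1 here).
rng = np.random.default_rng(0)
psi_toy = rng.standard_normal((500, 3))
W_stat = el_log_ratio(psi_toy)
p_value = 1.0 - chi2.cdf(W_stat, df=psi_toy.shape[1])
```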
We will consider weakly dependent data which satisfy strong mixing conditions. We refer the reader to the book of Rio (2000) and to the survey of Bradley (2005) for the basic properties as well as the asymptotic behavior of weakly dependent processes. We will focus our attention on $\alpha$-mixing sequences. We use the following measure of dependence between two $\sigma$-fields $\mathcal{A}$ and $\mathcal{B}$:
$$\alpha(\mathcal{A}, \mathcal{B}) = \sup_{A \in \mathcal{A},\, B \in \mathcal{B}} |P(A \cap B) - P(A)P(B)|.$$
We recall that a sequence $(Z_i)_{i \in \mathbb{Z}}$ is said to be $\alpha$-mixing (or strongly mixing) if $\alpha_m = \sup_{j \in \mathbb{Z}} \alpha(\mathcal{F}_{-\infty}^j, \mathcal{F}_{j+m}^\infty)$ goes to zero as $m$ tends to infinity, where, for any $-\infty \leq j \leq l \leq \infty$, $\mathcal{F}_j^l = \sigma(Z_i, j \leq i \leq l)$. Let $U_i = \big(l(X_i; \gamma_{1,0}), \nabla_{\gamma_1} l(X_i; \gamma_{1,0})^\top, W_i^\top, \varepsilon_i\big)^\top$.

Assumptions 1.
1. The process ( Z i ) i ∈ Z , Z i = ( X > i , W > i , ε i ) > ∈ R d X × R d W × R , is strictly stationaryand strongly mixing with mixing coefficients α m satisfying α m = O ( m − ξ ) with ξ > ss − for some s > such that sup k c k =1 E [ | U > i c | s ] < ∞ . (20)
2. The marginal density of the index η γ ,f ( · ) of the index W > i γ , is such that sup t ∈ R η γ ,f ( t ) < ∞ and sup k c k =1 sup t ∈ R E [ | U > i c |{| t | + | U > i c | s − } | W > i γ , = t ] η γ ,f ( t ) < ∞ . (21) Moreover, there is some j ? < ∞ such that, for all j ≥ j ? , sup ( t,t ) ∈ R E [ | U > U j | | W > γ , = t, W > j γ , = t ] f W > γ , ,W > j γ , ( t, t ) < ∞ , where f W > γ , ,W > j γ , ( · ) is the joint density of W > γ , and W > j γ , . . The second partial derivatives of E [ ∇ γ l ( X i ; γ ) | W > i γ , = · ] , E [ W i | W > i γ , = · ] η γ ,f ( · ) and η γ ,f ( · ) , as well as the third derivatives of m ( · ) , are uniformly continuous and bounded. Moreover,the first derivative of m ( · ) is bounded, and the vector ∇ β σ ( V i , Z { r } i ; β ) is also bounded. Assumptions 2.
The matrix
$$\Sigma = E\big[ \Psi(Z_i, Z_i^{\{r\}}; \theta_0, \eta_0) \, \Psi(Z_i, Z_i^{\{r\}}; \theta_0, \eta_0)^\top \big]$$
is positive definite.

Assumptions 3. The Hessian matrix $H_\mu(\gamma_0)$, defined with the weight $\omega(V_i) = \eta_{\gamma,f}^4(W_i^\top \gamma_2)$, is positive definite. Moreover, when the model is defined by (1)-(3), both the Hessian matrices $H_\mu(\gamma_0)$ and $H_\sigma(\beta_0)$, with their corresponding weights $\omega_1(V_i) = \eta_{\gamma,f}^4(W_i^\top \gamma_2)$ and $\omega_2(V_i) = \eta_{\gamma,f}^2(W_i^\top \gamma_2)$, are positive definite.

Assumptions 4.
The bandwidth $h$ used for the nonparametric part of the estimation is such that $nh^2/\ln n \to \infty$ and $nh^4 \to 0$. The univariate kernel $K$ is symmetric, bounded and integrable, such that $\int_{\mathbb{R}} t^2 \{|K(t)| + |tK'(t)|\}\, dt < \infty$ and $\int_{\mathbb{R}} t K(t)\, dt = 0$. The Fourier transform of $K$, denoted by $\mathcal{F}[K]$, satisfies the condition $\sup_{t \in \mathbb{R}} |t|^{c_K} |\mathcal{F}[K](t)| < \infty$ for some $c_K > 3$. Moreover, $t \mapsto |t|^{2}\{K(t) + K'(t)\}$ is bounded on $\mathbb{R}$.

Assumption 1 guarantees suitable rates of uniform convergence for the kernel estimators of the infinite-dimensional parameters gathered in the vector $\eta_\gamma$. More precisely, its conditions imply those used in Theorem 4 of Hansen (2008), with $q = d = 1$. We also use the condition on $\xi$ to apply Davydov's inequality and show that the effect of the nonparametric estimation is negligible and does not alter the pivotalness of the empirical log-likelihood ratio statistic. For this purpose, some conditions in Assumption 1 are therefore more restrictive than in Theorem 4 of Hansen (2008). Condition (19) reveals a link between the existence of some moments of order $s$ and the strength of the dependence given by the coefficient $\xi$. The more moments of $U_i$ exist, the stronger the time dependence can be. In particular, if $U_i$ has finite moments of any order, then $s = \infty$ and $\xi$ can be taken larger than, but arbitrarily close to, 10. Assumption 2 guarantees a non-degenerate limit distribution in the CLT for the sample mean of the $\Psi(Z_i, Z_i^{\{r\}}; \gamma_0, \eta_0)$'s. Assumption 3 is used to prove Lemma 1 and Lemma 2. Concerning the bandwidth conditions, one could of course use different bandwidths for the different nonparametric estimators involved. For readability and practical simplicity, we propose the same bandwidth $h$. Moreover, Assumption 4 allows one to use, for instance, the Gaussian kernel.

When the infinite-dimensional parameters $\eta_\gamma$ are given and the observations are independent, Theorem 2 of Qin and Lawless (1994) guarantees that the empirical log-likelihood ratio (ELR) statistic $2\ell_n(\theta_0, \eta_0)$ converges in distribution to a $\chi^2_{d_\theta - 1}$ as $n \to \infty$ (where $d_\theta$ is the dimension of the model parameters). The following theorem states that, under suitable conditions, the chi-square limit in law is preserved for the ELR defined with our moment conditions for the (CH)PLSIM, with dependent data and estimated $\eta_\gamma$. Let us define the ELR statistic
$$W(\theta_0) = 2\, \ell_n(\theta_0, \widehat{\eta}_{\gamma_0}),$$
where $\ell_n$ and $\widehat{\eta}_\gamma$ are given by (17) and (18) respectively. Let $d_\theta = d_\gamma$ for the PLSIM and $d_\theta = d_\gamma + d_\beta$ for the CHPLSIM. In the following, $\xrightarrow{d}$ denotes convergence in distribution.

Theorem 1.
Consider that Assumptions 1, 3 and 4 hold true. Moreover, condition (7) is satisfied, as wellas condition (5) in the case of PLSIM or condition (4) in the case of CHPLSIM. Then, W ( θ ) d −→ X d θ − as n tends to infinity. For the proof, we use a central limit theorem for mixing processes implying that n − / P ni =1 Ψ( Z i , Z { r } i ; θ , η )converges in distribution to a multivariate centered normal distribution, to deal with the dependencybetween observations. Moreover, the behavior of the Lagrange multipliers has to be carefully investi-gated. However, the major difficulty in the proof is to show ‘ n ( θ , b η γ ) − ‘ n ( θ , η ) = o P (1), that is toshow that the nonparametric estimation of the infinite-dimensional nuisance parameters does not break8he pivotalness of the ELR statistic. This type of negligibility, obtained under mild technical conditions,seems to be a new result in the context of semiparametric regression models with weakly dependentdata. It is obtained using arguments based on the Inverse Fourier Transform and Davydov’s inequalityin Theorem A.6 of (Hall and Heyde, 1980). We generated data from model (1)-(3) with ε i = σ ( V i , Z { r } i ; β ) ζ i and σ ( V i , Z { r } i ; β ) = β + β Y i − , where the ζ i are independently drawn from a distribution such that E ( ζ i ) = 0 and Var( ζ i ) = 1. Thatmeans, we allow for conditional heteroscedasticity in the mean regression error term. The covariates X i =( Y i − , Y i − ) > are two lagged values of the target variable Y i and the covariates W i = ( W i , W i , W i ) > are generated from a multivariate Gaussian distribution with mean W i − / W ik , W i‘ ) = 0 . | k − ‘ | . We set ‘ ( X i ; γ ) = γ Y i − + γ Y i − and m ( u ) = 34 sin ( uπ ) , (22)with γ = (0 . , > , γ = (1 , , > and β = (0 . , . > .Hypothesis testing is based on Wilks’ Theorem in Section 4.3 (results related to this method are named estim ), along with the unfeasible EL approach that previously learns the nonparametric estimators ona sample of size 10 (this case mimics the situation where m , m and the density of the index areknown; results related to this method are named ref ). The nonparametric elements are estimated by theNadaraya-Watson method with Gaussian kernel and bandwidth h = C − n − / where C is the standarddeviation of the index. In the experiments, we consider four sample sizes (100, 500, 2000 and 5000)and three distributions for ζ i : a standard Gaussian distribution ( Gaussian ), uniform distribution on[ −√ , √
3] ( uniform ) and a mixture of Gaussian distributions ( mixture ) pN ( m , v ) + (1 − p ) N ( m , v ),with p = 0 . m = − m = 1 / √ s = 1 / s = 3 /
2. For each scenario, we generated 5000 data sets.First, we want to test the order for the lagged values of Y i in the parametric function ‘ . For thispurpose, we use the PLSIM and we consider the following tests: • Test Lag(1) which corresponds to the true order equal to 1, and which is defined by H : γ =(0 . , > and γ = (1 , , > ; • Test Lag(0) which neglects the lagged values of Y i in the linear part and which is defined by H : γ = (0 , > and γ = (1 , , > ; • Test Lag(2) which overestimates the order for the lagged values of Y i and which is defined by H : γ = (0 . , . > and γ = (1 , , > .The empirical probabilities of rejection are presented in Table 1 for a nominal level of 0 .
05. A first,not surprising, conclusion: EL inference in such flexible nonlinear models, with dependent data, requiressufficiently large sample sizes. The results with n = 100 are quite poor even when m ( · ) is given, that is ina purely parametric setup. Next, we notice that for the three distributions of the noise, our EL inferenceapproach allows us to identify the correct order for the lagged values when the sample size is sufficientlylarge. Indeed, only Test Lag(1) has an asymptotic empirical probability of rejection converging to thenominal level 0.05 while the other tests have a probability of rejection converging to one. Moreover,the differences between the unfeasible EL approach ( ref. columns) and our approach ( estim. columns)quickly become negligible. This result was expected because the statistics of both methods converge tothe same chi-square distribution.We now investigate the order for the lagged values of Y i in the conditional mean and variance of thenoise. Thus, we use the CHPLSIM and we consider the following tests: • Test Lag(1)-CH(1) which corresponds to the true values of the conditional mean and variance andwhich is defined by H : γ = (0 . , > , γ = (1 , , > and β = (0 . , . > ;9able 1: Empirical probabilities of rejection obtained from 5000 replications using the PLSIM for testingthe order for the lagged values of Y i in the parametric part ‘ ( · ; γ ) in (22).Test ζ i n = 100 n = 500 n = 1000 n = 2000ref. estim. ref. estim. ref. estim. ref. estim.Lag(1) Gaussian 0.165 0.214 0.066 0.075 0.056 0.054 0.056 0.055uniform 0.123 0.185 0.055 0.074 0.059 0.056 0.052 0.050mixture 0.190 0.229 0.080 0.094 0.063 0.060 0.054 0.051Lag(0) Gaussian 0.204 0.243 0.253 0.231 0.707 0.665 0.983 0.980uniform 0.159 0.204 0.217 0.207 0.743 0.718 0.992 0.990mixture 0.233 0.263 0.236 0.228 0.643 0.619 0.971 0.965Lag(2) Gaussian 0.215 0.270 0.269 0.268 0.787 0.760 0.996 0.995uniform 0.163 0.227 0.242 0.243 0.779 0.769 0.996 0.997mixture 0.262 0.301 0.305 0.299 0.776 0.725 0.995 0.990Table 2: Empirical probabilities of rejection obtained from 5000 replications using the CHPLSIM fortesting the order of the lagged values of Y i in the conditional mean and variance.Test ζ i n = 100 n = 500 n = 1000 n = 2000ref. estim. ref. estim. ref. estim. ref. estim.Lag(1) Gaussian 0.294 0.388 0.105 0.111 0.069 0.074 0.071 0.069CH(1) uniform 0.164 0.277 0.070 0.077 0.063 0.072 0.081 0.072mixture 0.395 0.461 0.152 0.170 0.091 0.098 0.078 0.078Lag(0) Gaussian 0.332 0.406 0.261 0.249 0.672 0.641 0.978 0.972CH(1) uniform 0.195 0.291 0.196 0.190 0.691 0.675 0.985 0.983mixture 0.445 0.493 0.330 0.327 0.653 0.637 0.963 0.958Lag(2) Gaussian 0.340 0.426 0.283 0.287 0.745 0.727 0.993 0.992CH(1) uniform 0.202 0.304 0.220 0.227 0.731 0.728 0.993 0.993mixture 0.443 0.511 0.360 0.352 0.744 0.704 0.990 0.985Lag(1) Gaussian 0.286 0.332 0.528 0.523 0.985 0.986 1.000 1.000CH(0) uniform 0.281 0.294 0.770 0.748 1.000 1.000 1.000 1.000mixture 0.344 0.392 0.484 0.499 0.969 0.970 1.000 1.000 • Test Lag(0)-CH(1) which neglects the lagged values of Y i in the conditional mean and which isdefined by H : γ = (0 , > , γ = (1 , , > and β = (0 . , . > ; • Test Lag(2)-CH(1) which overestimates the order of the lagged values of Y i in the conditional meanand which is defined by H : γ = (0 . , . > , γ = (1 , , > and β = (0 . , . > ; • Test Lag(1)-CH(0) which corresponds to the true value of the conditional mean but neglects thelagged value of Y i in the conditional variance and which is defined by H : γ = (0 . , > , γ =(1 , , > and β = (0 . 
, > .The empirical probabilities of rejection are presented in Table 2 for a nominal level of 0 .
05. Again, thetrue order of the lagged values is detected by the procedure and the differences between the unfeasibleEL approach and our approach quickly become negligible. As expected, given that the model is morecomplex, the rate of convergence to the nominal level is slower than for the tests on the PLSIM. However,our procedure allows the conditional heteroscedasticity of the noise to be detected, and meanwhile itidentifies the correct order for the lags of Y i in the mean equation. We analyze the data set containing weather (temperature, dew point temperature, relative humidity)and pollution data (PM10 and ozone) for the city of Chicago in the period 1987-2000 from the NationalMorbidity, Mortality and Air Pollution Study. The analyzed data is freely available in the R package10 lnm (Gasparrini, 2011). Lian et al. (2015) considered the same data set under the assumption of i.i.d.observations.We use the (CH)PLSIM with a linear function in the parametric part to predict daily mean ozonelevel ( f o i ). For this purpose we use previous daily values of mean ozone level and four other predictors,that are the daily relative humidity ( (cid:94) rhum i ), the daily mean temperature (in Celsius degrees) (cid:93) temp i , thedaily dew point temperature g dptp i and the daily PM10-level (cid:94) pm10 i . The first step of our analysis was toremove seasonality for each variable we consider in the models. To remove seasonality, we consider theestimators (ˆ c , ˆ c , ˆ a ) > = arg min ( c ,c ,a ) n X i =1 h f o i − c − c sin (2 π ( i/ a ) i . The series obtained by removing seasonality is defined by o i = f o i − ˆ c − ˆ c sin (2 π ( i/ a ) . A similar approach is used to obtain the series rhum i , temp i , dptp i and pm10 i by removing the season-nality of the series (cid:93) rhum i , (cid:93) temp i , g dptp i and (cid:94) pm10 i . Note that the series temp i , dptp i and pm10 i havebeen scaled to facilitate the interpretation γ . Figures 1-5 provided in Section B.1 of the SupplementaryMaterial, present the original series and the series obtained by removing the seasonality. Thus, all thevariables we refer to hereafter in this section are deseasonalized. In this application, the observationsclearly have a time dependency. After removing the seasonality, the autocorrelations of o p − value 0.000) and 0.408 ( p − value 0.000), respectively; Note that all thecovariates have significant autocorrelations (all the p − values are 0.000, see Table 4 in the Section B.1 ofthe Supplementary Material).The covariates included in the linear part are the mean relative humidity ( rhum i ) and mean ozone levelcomputed on the three previous days ( o i − , o i − , o i − ). The covariates included in the nonparametricpart of the conditional mean are temp i , dptp i and pm10 i . The eigenvalues of the covariance matrixcomputed on the three variables used in the nonparametric part are 2.271, 0.684 and 0.150 for the dataof year 1996, and 1.953, 0.777 and 0.162 for the data of year 1997.Thus, the equation of the PLSIM is o i = γ rhum i + γ o i − + γ o i − + γ o i − + m ( γ temp i + γ dptp i + γ pm i ) + ε i . (23)We estimate the parameters of the models, on the data of the year 1996, by minimizing the least squaresusing kernel smoothing (with Gaussian kernel and bandwidth n − / ). Hypothesis testing is conductedon the data of the year 1997. We begin by investigating the order H for the lagged values of the ozonemeasures to be included in the linear part of the conditional mean. 
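A remark on the preprocessing described above: the seasonal adjustment is a plain least-squares fit of a sinusoid. The exact parametrization is partly garbled in this copy, so the form $c_1 + c_2 \sin(2\pi i/365 + a)$ in the following self-contained sketch is an assumption; the data are synthetic.

```python
import numpy as np
from scipy.optimize import least_squares

def deseasonalize(y, period=365.0):
    """Remove an annual sinusoid c1 + c2*sin(2*pi*i/period + a) fitted by least squares."""
    i = np.arange(len(y))
    def resid(p):
        c1, c2, a = p
        return y - (c1 + c2 * np.sin(2.0 * np.pi * i / period + a))
    fit = least_squares(resid, x0=np.array([y.mean(), y.std(), 0.0]))
    c1, c2, a = fit.x
    return y - (c1 + c2 * np.sin(2.0 * np.pi * i / period + a))

# illustrative usage on a synthetic daily series with an annual cycle plus noise
rng = np.random.default_rng(0)
days = np.arange(2 * 365)
raw = 30 + 10 * np.sin(2 * np.pi * days / 365 + 0.5) + rng.standard_normal(days.size)
adjusted = deseasonalize(raw)   # the deseasonalized series used in the analysis
```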
Using PLSIM, we define differentmodels, called Lag ( H ) (with H = 0 , , H lagged values of the mean ozone levels areincluded in the linear part (meaning the coefficients related to the other previous days are zero). Theresults for different orders H presented in Table 3 show that the time dependency cannot be neglectedfor analyzing these data. It is relevant to include lagged values of the mean ozone level variable to buildits daily prediction.The autocorrelation of the residuals, obtained with the Lag (2) setup, on the data of years 1997, has avalue of − .
055 ( p − value 0.294). This suggests that H = 2 is a reasonable choice. Figure 6 and Figure 7,given in Section B.1 of the Supplementary Material, present the estimated density of the index and theestimated function ˆ m ( · ), obtained with the Lag (2) setup.We also calculated the autocorrelation of the square of the residuals, obtained with the
Lag (2) setup,and we obtain the value 0.153 ( p − value 0.003). This suggests investigating the conditional heteroscedas-ticity of the noise using the CHPLSIM with the Lag (2) setup. For the conditional variance equation weconsider E ( ε i | rhum i , temp i , dptp i , pm i , F i − ) = β + β ( o i − ) . (24)To estimate the parameters of the conditional variance, we again use the data of year 1996 (forthe estimation, the observations corresponding to the 1% highest values of the squared residuals ˆ ε i orof ( o i − ) are removed). The estimators for the CHPLSIM with conditional variance as in (24) areˆ β = 25 .
276 and ˆ β = 0 . β = 28 . p − valueobtained by testing the values β = ˆ β and β = ˆ β in (24) on the data of year 1997 is 0.146. Meanwhile,11able 3: Estimators of the parameters obtained by the PLSIM, on the data of year 1996, with differentorders of lagged values, and p − values obtained by testing these values on the data of year 1997 for the‘National morbidity and mortality air pollution study’ example.Lag(0) Lag(1) Lag(2) Lag(3)ˆ γ rhum i -1.773 -1.240 -1.258 -1.188 o ( i − o ( i − o ( i − γ temp i dptp i -0.064 0.628 0.666 0.628 pm10 i p − value 0.000 0.050 0.225 0.058the p − value obtained by testing the values β = ˜ β and β = 0 is 0.033. Thus, we conclude with a nonconstant conditional variance for the error term in (23). This effect should be considered when buildingforecast confidence intervals. We propose EL inference in a semiparametric mean regression model with strongly mixing data. Ourmodel could include an additional condition on the second order conditional moment of the error term.The regression function has a partially linear single-index form, while for the conditional variance, weconsider a parametric function. This function could depend on the past values of the observed variables,but it cannot depend directly on the regression error term. A parametric function of the past error termswould break the asymptotic pivotal distribution of the empirical log-likelihood ratio. See Hjort et al.(2009) for a description of this phenomenon in semiparametric models.We prove Wilks’ Theorem under mild technical conditions, in particular without using any trimmingand allowing for unbounded series. To obtain this result, we first rewrite the regression model in theform of a fixed number of suitable unconditional moment conditions. These moment conditions includeinfinite dimensional nuisance parameters estimated by kernel smoothing. We then show that estimatingthe nuisance parameters does not break the asymptotic pivotality of the empirical log-likelihood ratiowhich behaves asymptotically as the nuisance parameters were given. Our theoretical result opens thedoor for the EL inference approach to new applications in nonlinear time series models. We illustrateour result by several simulation experiments and an application to air pollution where assuming timedependency seems reasonable, a fact confirmed by the data.The models proposed in this paper have several straightforward extensions. First, the variable Y i could be allowed to be measured with some error. For instance, Y i could be the square of the error termin a parametric model for some time series ( R i ), such an AR (1) model R i = ρR i − + u i . Then (1) couldbe used for inference on the conditional variance of ( u i ). This example is detailed in the SupplementaryMaterial.Another easy extension is to consider more general conditions than (3). Our theory applies with prac-tically no change if (3) is replaced by one or several conditions such as E [ T ( ε i ) | V i , F i − ] = ν ( V i , Z { r } i ; β ) , where the T ( · )’s are some given twice continuously differentiable functions such that E [ T ( ε i ) | V i , F i − ] =0 a.s., and ν ( · , · ; · ) is a given parametric function. For instance, taking T ( y ) = y , we could include afourth order conditional moment equation in the model, provided E [ ε i | V i , F i − ] = 0 a.s. 
Such a higher-order moment condition could replace, or could be added to, (3). Finally, one might want to consider some partially linear function, with possibly a different index, on the right-hand side of (3). Lian et al. (2015) followed a similar idea in the i.i.d. case. While considering several series $(Y_i)$ and equations such as (1) is a straightforward matter, a semiparametric model for the square of the error term requires some additional effort. We argue that our methodology could be extended to such cases; however, the investigation of this extension is left for future work.

Appendix: proof of Theorem 1
The proof contains three parts.

• In Section A.1 we show that, when the nonparametric elements $\eta_0$ are known, twice the empirical likelihood ratio converges to a chi-square distribution, that is, $2\ell_n(\theta_0, \eta_0) \xrightarrow{d} \chi^2_{d_\theta - 1}$, provided $\theta_0$ and $\eta_0$ are the true values of the parameters in the model. This part follows the lines of Chapter 11 of Owen (2001) but, owing to the dependence between observations, the central limit theorem for $n^{-1/2} \sum_{i=1}^n \Psi(Z_i, Z_i^{\{r\}}; \theta_0, \eta_0)$ and the Lagrange multipliers have to be investigated (see Lemmas 3 and 5 in Section A.1).

• Section A.2 is the key part of the proof, where we investigate the impact of the estimation of $\eta_0$. We show that
$$\frac{1}{n} \sum_{i=1}^n \big[ \Psi(Z_i, Z_i^{\{r\}}; \theta_0, \widehat{\eta}_{\gamma_0}) - \Psi(Z_i, Z_i^{\{r\}}; \theta_0, \eta_0) \big] = o_P(n^{-1/2}),$$
and
$$\frac{1}{n} \sum_{i=1}^n \big[ \Psi(Z_i, Z_i^{\{r\}}; \theta_0, \widehat{\eta}_{\gamma_0}) \Psi(Z_i, Z_i^{\{r\}}; \theta_0, \widehat{\eta}_{\gamma_0})^\top - \Psi(Z_i, Z_i^{\{r\}}; \theta_0, \eta_0) \Psi(Z_i, Z_i^{\{r\}}; \theta_0, \eta_0)^\top \big] = o_P(1).$$
These differences will be decomposed into several terms. For some of them we simply take the norm, use the triangle inequality and the uniform convergence rates for nonparametric estimators with dependent data as presented in Hansen (2008). Some other terms require a more refined treatment. To show that they are negligible, we use more elaborate arguments based on an Inverse Fourier Transform and Davydov's inequality.

• In Section A.3, we conclude the proof by showing that the asymptotic distribution of the ELR is not impacted by the estimation of $\eta_0$ and thus, if $\theta_0$ is the true value of the finite-dimensional parameter, $2\ell_n(\theta_0, \widehat{\eta}_{\gamma_0}) \xrightarrow{d} \chi^2_{d_\theta - 1}$.

A.1 Empirical likelihood ratio with $\eta_0$ known

Before showing the convergence in distribution of the empirical likelihood ratio we need three technical lemmas. They are mainly used to show that, when $\eta_0$ is known, the empirical likelihood ratio converges, under $H_0$, to a chi-squared distribution. Proofs are given in the Supplementary Material.

Lemma 3. Suppose that Assumptions 1 and 2 hold true. We then have the following central limit theorem:
$$\frac{1}{\sqrt{n}} \sum_{i=1}^n \Psi(Z_i, Z_i^{\{r\}}; \theta_0, \eta_0) \xrightarrow{d} \mathcal{N}(0, \Sigma).$$

Lemma 4. Under the assumptions of Lemma 3,
$$\max_{1 \leq i \leq n} \big\| \Psi(Z_i, Z_i^{\{r\}}; \theta_0, \eta_0) \big\| = o_P(n^{1/2}) \quad \text{and} \quad \sum_{i=1}^n \big\| \Psi(Z_i, Z_i^{\{r\}}; \theta_0, \eta_0) \big\|^3 = o_P(n^{3/2}).$$

Lemma 5. Under the assumptions of Lemma 3,
$$\lambda(\theta_0, \eta_0) = S(\theta_0, \eta_0)^{-1} \frac{1}{n} \sum_{i=1}^n \Psi(Z_i, Z_i^{\{r\}}; \theta_0, \eta_0) + o_P(n^{-1/2}),$$
and
$$S(\theta_0, \eta_0)^{-1} \frac{1}{n} \sum_{i=1}^n \Psi(Z_i, Z_i^{\{r\}}; \theta_0, \eta_0) = O_P(n^{-1/2}),$$
where $S(\theta, \eta_\gamma) = n^{-1} \sum_{i=1}^n \Psi(Z_i, Z_i^{\{r\}}; \theta, \eta_\gamma) \Psi(Z_i, Z_i^{\{r\}}; \theta, \eta_\gamma)^\top$.
13y the third order Taylor expansion,2 ‘ n ( θ , η ) = 2 λ ( θ , η ) > n X i =1 Ψ( Z i , Z { r } i ; θ , η ) − λ ( θ , η ) > " n X i =1 Ψ( Z i , Z { r } i ; θ , η )Ψ( Z i , Z { r } i ; θ , η ) > λ ( θ , η ) + R n , where R n is the reminder. By Lemma 4 and Lemma 5, we have | R n | ≤ O P (cid:16) k λ ( θ , η ) k (cid:17) max (cid:18) , max ≤ i ≤ n | λ ( θ , η ) > Ψ( Z i , Z { r } i ; θ , η ) | − (cid:19) × n X i =1 (cid:13)(cid:13)(cid:13) Ψ( Z i , Z { r } i ; θ , η ) (cid:13)(cid:13)(cid:13) = O P ( n − / ) O P (1) o P ( n / ) = o P (1) . Thus, replacing λ ( θ , η ) by its definition given in Lemma 5, we obtain2 ‘ n ( θ , η ) = √ n n X i =1 Ψ( Z i , Z { r } i ; θ , η ) ! S ( θ , η ) − × √ n n X i =1 Ψ( Z i , Z { r } i ; θ , η ) + o P (1) . Note that, by the ergodicity of the process ( Z i ) i ∈ Z , we have S ( θ , η ) → Σ almost surely when n → ∞ .From this and Lemma 3, we have that S ( θ , η ) − / n − / × P ni =1 Ψ( Z i , Z { r } i ; θ , η ) converges to astandard multivariate normal distribution. Hence 2 ‘ n ( θ , η ) converges in distribution to X d θ − . A.2 Controlling the effect of the nonparametric estimation
In the following, C , C , C , c . . . denote constants that may change from line to line. A.2.1 The rate of k b η γ − η k We aim to apply the uniform convergence result from Theorem 4 of Hansen (2008), for which ourassumptions allow us to verify the required conditions with d = q = 1. In particular, we guaranteeHansen’s condition (10), with our ξ playing the role of β defined in Hansen’s equation (2), and we have θ > / θ defined in Hansen’s equation (11). Then, by Theorem 4 of Hansen (2008) and the secondorder Taylor expansion, we havesup t ∈ R | b η γ ,m ( t ) − η ,m ( t ) | = O P (cid:18) ln nnh (cid:19) / ! + O (cid:0) h (cid:1) = o P ( n − / ) , and similar uniform rates hold for b η γ ,f ( t ), b η γ ,X ( t ) and b η γ ,W ( t ). Thus,sup x ∈ R dX sup w ∈ R dW (cid:13)(cid:13)(cid:13)b η ( − γ ( w > γ , ) − η ( − ( w > γ , ) (cid:13)(cid:13)(cid:13) ω ( x, w ) = o P ( n − / ) , (25)with b η ( − γ ( · ) the sub-vector of b η γ ( · ) defined in (18) obtained after removing the second component b η γ ,m ( · ), and η ( − ( · ) the limit of the sub-vector. Meanwhile,sup t ∈ R | b η γ ,m ( t ) − η ,m ( t ) | = O P (cid:18) ln nnh (cid:19) / ! + O (cid:0) h (cid:1) (26)Moreover, since η γ ,f ( · ) is bounded, by the identity a k − b k = ( a − b )( a k − + a k − b + · · · + b k − ), we alsohave b η kγ ,f ( t ) = η k ,f ( t ) + O P (cid:18) ln nnh (cid:19) / ! + O (cid:0) h (cid:1) = η k ,f ( t ) + o P ( n − / ) , t ∈ R , (27)14ith the o P ( n − / ) rate uniformly maintained with respect to t . We will use this result with k = 2 and k = 3. A.2.2 PLSIM: the rate of n − P ni =1 [Ψ( Z i ; γ , b η γ ) − Ψ( Z i ; γ , η )]Let Φ( · ; ˙ , · ) be defined as in (13). We want to show that (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) n n X i =1 Ψ( Z i ; γ , b η γ ) − Ψ( Z i ; γ , η ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) = o P ( n − / ) . (28)Let Υ i = W > i γ , . We have 1 n n X i =1 Ψ( Z i ; γ , b η γ ) = (cid:20) A X A W (cid:21) , (29)with A X = 1 n n X i =1 [ { ε i + m (Υ i ) } b η γ ,f (Υ i ) − b η γ ,m (Υ i )] × [ b η γ ,f (Υ i ) ∇ γ l ( X i ; γ , ) − b η γ ,X (Υ i )] b η γ ,f (Υ i ) , and A W = 1 n n X i =1 [ { ε i + m (Υ i ) } b η γ ,f (Υ i ) − b η γ ,m (Υ i )] × b η γ ,m (Υ i ) [ b η γ ,f (Υ i ) W i − b η γ ,W (Υ i )] . Lemma 6. A X = 1 n n X i =1 ε i (cid:20) ∇ γ l ( X i ; γ , ) − η γ ,X (Υ i ) η γ ,f (Υ i ) (cid:21) η γ ,f (Υ i ) + o P ( n − / ) . (30)The proof of Lemma 6 is provided in the Supplement. Lemma 7. A W = 1 n n X i =1 ε i m (Υ i ) (cid:20) W i − η γ ,W (Υ i ) η γ ,f (Υ i ) (cid:21) η γ ,f (Υ i ) + o P ( n − / ) . (31) Lemma 7.
We rewrite A W = 1 n n X i =1 [ { ε i + m (Υ i ) } b η γ ,f (Υ i ) − b η γ ,m (Υ i )] × b η γ ,m (Υ i ) [ b η γ ,f (Υ i ) W i − b η γ ,W (Υ i )]= A W, + X l ∈{ a,b,...,g } A W,l + X k ∈{ , ,..., } A W,k , where A W, = 1 n n X i =1 ε i m (Υ i ) (cid:20) W i − η γ ,W (Υ i ) η γ ,f (Υ i ) (cid:21) η γ ,f (Υ i ) , is the dominating term. The negligible terms could be separated into two groups A W,a = 1 n n X i =1 ε i η γ ,m (Υ i ) [ b η γ ,f (Υ i ) W i − η γ ,W (Υ i )] b η γ ,f (Υ i ) ,A W,b = 1 n n X i =1 ε i η γ ,m (Υ i ) [ η γ ,W (Υ i ) − b η γ ,W (Υ i )] b η γ ,f (Υ i ) , W,c = 1 n n X i =1 [ m (Υ i ) η γ ,f (Υ i ) − b η γ ,m (Υ i )] η γ ,m (Υ i ) [ η γ ,f (Υ i ) W i − b η γ ,W (Υ i )] A W,d = 1 n n X i =1 m (Υ i ) [ b η γ ,f (Υ i ) − η γ ,f (Υ i )] η γ ,m (Υ i ) [ η γ ,f (Υ i ) W i − b η γ ,W (Υ i )] A W,e = 1 n n X i =1 ε i [ b η γ ,m (Υ i ) − η γ ,m (Υ i )] [ η γ ,W (Υ i ) − b η γ ,W (Υ i )] b η γ ,f (Υ i ) ,A W,f = 1 n n X i =1 ε i [ b η γ ,m (Υ i ) − η γ ,m (Υ i )] [ b η γ ,f (Υ i ) − η γ ,f (Υ i )] W i b η γ ,f (Υ i ) ,A W,g = 1 n n X i =1 ε i [ b η γ ,m (Υ i ) − η γ ,m (Υ i )] [ η γ ,f (Υ i ) W i − η γ ,W (Υ i )] b η γ ,f (Υ i ) , and A W, = 1 n n X i =1 [ m (Υ i ) η γ ,f (Υ i ) − b η γ ,m (Υ i )] × η γ ,m (Υ i ) [ b η γ ,f (Υ i ) − η γ ,f (Υ i )] W i b η γ ,f (Υ i ) ,A W, = 1 n n X i =1 m (Υ i ) [ b η γ ,f (Υ i ) − η γ ,f (Υ i )] × η γ ,m (Υ i ) [ b η γ ,f (Υ i ) − η γ ,f (Υ i )] W i b η γ ,f (Υ i ) ,A W, = 1 n n X i =1 [ m (Υ i ) b η γ ,f (Υ i ) − b η γ ,m (Υ i )] × [ b η γ ,m (Υ i ) − η γ ,m (Υ i )] [ b η γ ,f (Υ i ) W i − η γ ,W (Υ i )] ,A W, = 1 n n X i =1 [ m (Υ i ) b η γ ,f (Υ i ) − b η γ ,m (Υ i )] × [ b η γ ,m (Υ i ) − η γ ,m (Υ i )] [ η γ ,W (Υ i ) − b η γ ,W (Υ i )] ,A W, = 1 n n X i =1 [ b η γ ,f (Υ i ) − η γ ,f (Υ i )] × [ b η γ ,m (Υ i ) − η γ ,m (Υ i )] [ b η γ ,f (Υ i ) W i − η γ ,W (Υ i )] ,A W, = 1 n n X i =1 [ b η γ ,f (Υ i ) − η γ ,f (Υ i )] × [ b η γ ,m (Υ i ) − η γ ,m (Υ i )] [ η γ ,W (Υ i ) − b η γ ,W (Υ i )] . Taking the norm, by the triangle inequality and (25), k A W, k + · · · + k A W, k = o P ( n − / ) . Now we investigate A W,b , the same arguments will apply to A W,a . We have A W,b = 1 n n X i =1 ε i η γ ,m (Υ i ) [ b η γ ,W (Υ i ) − η γ ,W (Υ i )] b η γ ,f (Υ i )= 1 n n X i =1 ε i η γ ,m (Υ i ) hb η ( − i ) γ ,W (Υ i ) − η γ ,W (Υ i ) i η γ ,f (Υ i ) + r W,b =: A W,b + r W,b , b η ( − i ) γ ,W (Υ i ) = 1 n X ≤ j = i ≤ n c ( W j ) 1 h K (cid:18) Υ j − Υ i h (cid:19) , and k r W,b k = o P ( n − / ), which is obtained after taking the norm of the sums and using (25). Thus itsuffices to show k A W,b k = o P ( n − / ) . The rate of the norm could be deduced from the same rates of thecomponents. After replacing the expression of b η γ ,W (Υ i ) a component of A W,b could be written underthe form R ,n − R ,n where R ,n = 1 n X ≤ i = j ≤ n λ ( Z i ) τ ( Z j ) 1 h K (cid:18) Υ j − Υ i h (cid:19) and R ,n = n − n X ≤ i ≤ n λ ( Z i ) E [ τ ( Z i ) | Υ i ] η γ ,f (Υ i ) , with λ ( · ) and τ ( · ) some real-valued functions with E [ | λ ( Z i ) | s + | τ ( Z j ) | s ] < ∞ , and E [ λ ( Z i ) | Υ i , F i − ] = 0 . (32)Thus our purpose will be to show : R ,n − R ,n = o P ( n − / ) . (33)For this purpose, we will show that E (cid:2) ( R ,n − R ,n ) (cid:3) = o P ( n − ) . We first want to control E [ R ,n ]. Let us note that, applying the Inverse Fourier Transform,1 h K (cid:18) Υ j − Υ i h (cid:19) = Z R e πιt (Υ j − Υ i ) F [ K ]( th ) dt, where ι = √− F [ K ]( · ) is the Fourier Transform of the kernel K ( · ). 
Note that, since F [ K ]( · ) issupposed to be integrable, there exists some constant C such that 0 < R R |F [ K ]( th ) | dt ≤ Ch − . We nextcan write n R ,n = Z R X ≤ i = j ≤ n (cid:8) λ ( Z i ) e − πιt Υ i τ ( Z j ) e πιt Υ j (cid:9) F [ K ]( th ) dt × Z R X ≤ i = j ≤ n n λ ( Z i ) e − πιt Υ i τ ( Z j ) e πιt Υ j o F [ K ]( t h ) dt . Thus we have n E [ R ,n ] = Z R Z R X ≤ i = j ≤ n X ≤ i = j ≤ n E { Λ( i, j, i , j ; t, t ) } F [ K ]( th ) dt F [ K ]( t h ) dt , with Λ( i, j, i , j ; t, t ) = λ ( Z i ) e − πιt Υ i τ ( Z j ) e πιt Υ j λ ( Z i ) e − πιt Υ i τ ( Z j ) e πιt Υ j . For 1 ≤ m ≤ n , let I ( i ; m ) = { k : 1 ≤ k ≤ n, | k − i | < m } , the set of indices from 1 to n in the m − neighborhood of i , and I c ( i ; m ) = { k : 1 ≤ k ≤ n, k
6∈ I ( i ; m ) } . Let 0 < δ < I = [ ≤ i = i ≤ n (cid:8) ( i, j, i , j ) : { j, j } ⊂ I ( i ; n δ / ∪ I ( i ; n δ / (cid:9) is a set of cardinality of order n δ n . First, consider the case | i − i | ≥ n δ .
17n this case, by (19) and Davydov’s inequality from Theorem A.6 of (Hall and Heyde, 1980) with p, q > | E { Λ( i, j, i , j ; t, t ) }| ≤ Cn − ξδ/p , ∀ t, t ∈ R , ∀| i − i | ≥ n δ and { j, j } 6⊂ I ( i ; n δ / ∪ I ( i ; n δ / , (34)for some constant C > i, i , j, j , t and t . Indeed, if at least one of the indices j and j isnot in the n δ / − neighborhood of i or i , we then have max { min( | i − j | , | i − j | ) , min( | i − j | , | i − j | ) } ≥ n δ / . This means we could isolate one of the indices i and i by a n δ / − neighborhood, and, possiblyafter repeated applications, we could apply Davydov’s inequality with, say, Y = λ ( Z i ) e − πιt Υ i . In thiscase E ( Y ) = 0 and we obtain (34). For the multi-indices satisfying | i − i | ≥ n δ but belonging to I , wecould simply bound | E { Λ( i, j, i , j ; t, t ) }| using the Cauchy-Schwarz inequality and recall the negligiblecardinality of order n δ n of I . From the investigation of all types of situations we note that we couldtake p = s/ ( s −
3) and q = s/ < | i − i | < n δ , we distinguish two sub-cases. The absolute value of another pair of indices is smaller than n δ or theabsolute values of all the other five pairs we could make with i, j, i and j are larger than n δ . In theformer case, the cardinality of the set of multi-indices is of order at most n δ n . In the latter case, wecould apply Davydov’s inequality with a split of Λ( i, j, i , j ; t, t ) in X and Y such that, say, in Y thelargest index is either i or i . In such a case, E ( X ) = 0 and we have a bound as in (34).The case i = i requires a special attention, that is we have to studyΛ n = Z R Z R X ≤ i = j = j ≤ n E { Λ( i, i, i , j ; t, t ) } F [ K ]( th ) dt F [ K ]( t h ) dt = Z R Z R X ≤ i = j = j ≤ n E n λ ( Z i ) e − πι ( t + t )Υ i τ ( Z j ) e πιt Υ j τ ( Z j ) e πιt Υ j o × F [ K ]( th ) dt F [ K ]( t h ) dt . In the case where in addition min( | i − j | , | i − j | , | j − j | ) < n ν , for some 0 < ν < n ν . Then a bound for the sumover the set of these multi-indices is easily obtained using the low cardinality of the set (low comparedto n , the order of the cardinality of the full set of multi-indices ( i, j, i , j ) with i = j and i = j ) andthe Cauchy-Schwarz inequality. For multi-indices such that i = i and min( | i − j | , | i − j | , | j − j | ) ≥ n ν ,applying Davydov’s inequality twice with p = s/ ( s −
2) and q = s/ E n λ ( Z i ) e − πι ( t + t )Υ i τ ( Z j ) e πιt Υ j τ ( Z j ) e πιt Υ j o = E n λ ( Z i ) e − πι ( t + t )Υ i o E (cid:8) τ ( Z j ) e πιt Υ j (cid:9) E n τ ( Z j ) e πιt Υ j o + O ( n − ξν ( s − /s ) )= E n λ (Υ i ) e − πι ( t + t )Υ i o E (cid:8) τ (Υ j ) e πιt Υ j (cid:9) E n τ (Υ j ) e πιt Υ j o + O ( n − ξν ( s − /s ) ) , (35)where λ (Υ i ) = E [ λ ( Z i ) | Υ i ], τ (Υ j ) = E [ τ ( Z j ) | Υ j ], γ (Υ j ) = E [ τ ( Z j ) | Υ j ]. Moreover, the rate O ( n − ξν ( s − /s ) ) of the reminder is uniform with respect to t and t . We deduce that Z R Z R E { Λ( i, i, i , j ; t, t ) } F [ K ]( th ) dt F [ K ]( t h ) dt = Z R Z R F [ λ η γ ,f ]( t + t ) F [ τ η γ ,f ]( − t ) F [ τ η γ ,f ]( − t ) F [ K ]( th ) dt F [ K ]( t h ) dt + O ( n − ξν ( s − /s ) ) Z R Z R F [ K ]( th ) dt F [ K ]( t h ) dt . < c < F [ K ](0) = 1, Z R n F [ λ η γ ,f ]( t + t ) F [ τ η γ ,f ]( − t ) o F [ K ]( th ) dt = Z | th |≤ h c {· · · } dt + Z | th |≤ h c {· · · } {F [ K ]( th ) − F [ K ](0) } dt + Z | th | >h c {· · · }F [ K ]( th ) dt =: I ( t ; h ) + I ( t ; h ) + I ( t ; h ) . Since F [ λ η γ ,f ]( · ) and F [ τ η γ ,f ]( · ) are squared integrable, | I ( t ; h ) | ≤ Z R (cid:12)(cid:12)(cid:12) F [ λ η γ ,f ]( t + t ) F [ τ η γ ,f ]( − t ) (cid:12)(cid:12)(cid:12) dt < ∞ . Moreover, since F [ K ]( · ) is Lipschitz continuous and F [ λ η γ ,f ]( · ) and F [ τ η γ ,f ]( · ) are bounded, for someconstant C , | I ( t ; h ) | ≤ Ch Z | t |≤ h c − | t | dt = h c − → , the convergence to zero being guaranteed as soon as c > /
2. Finally, by Assumption 4, | I ( t ; h ) | ≤ C h − Z | u | >h c − |F [ K ]( u ) | du ≤ C h − Z | u | >h c − u − c K du = C h − h ( c K − − c ) → , with some constants C , C . The convergence to zero holds as soon as c is smaller than ( c K − / ( c K − c K >
3. Next, we integrate with respect to t and we decompose theintegral in a similar way, that is we write Z R { I ( t ; h ) + I ( t ; h ) + I ( t ; h ) }F [ K ]( t h ) dt = Z | t h |≤ h c {· · · } dt + Z | t h |≤ h c {· · · } {F [ K ]( t h ) − F [ K ](0) } dt + Z | t h | >h c {· · · }F [ K ]( t h ) dt =: J ( h ) + J ( h ) + · · · + J ( h ) + J ( h ) . By the Dominated Convergence Theorem and the Convolution Theorem for the Fourier Transform, J ( h ) → Z R Z R F [ λ η γ ,f ]( t + t ) F [ τ η γ ,f ]( − t ) F [ τ η γ ,f ]( − t ) dtdt = F [ λ η γ ,f τ η γ ,f τ η γ ,f ](0) = E [ λ (Υ i ) { τ η γ ,f } (Υ i )]Meanwhile, by the same arguments as above, the other eight terms J kl ( h ), with 1 ≤ k, l ≤ k, l ) = (1 , E [ R ,n ] − n − E (cid:8) λ ( Z i ) E [ τ ( Z i ) | Υ i ] η f,γ (Υ i ) (cid:9) = n − × h h − O (cid:16) n −{ ξδs − ( s − − } + n − (1 − δ ) (cid:17) + h − O (cid:16) n − ξνs − ( s − + n − (1 − ν ) (cid:17)i . Taking δ = 2 sξ ( s −
3) + 2 s and ν = sξ ( s −
2) + s , to guarantee E [ R ,n ] − E n λ ( Z i ) E [ τ ( Z i ) | Υ i ] η f,γ (Υ i ) o = o ( n − ) , we need the conditions 0 < δ, ν < nh ρ → ∞ with ρ = 2[ ξ ( s −
3) + 2 s ] ξ ( s − − s and ρ = 2[ ξ ( s −
2) + s ] ξ ( s − . ρ in the last display is always larger than the second one, provided s >
3, weonly have to ensure that nh ρ → ∞ for the first expression of ρ . Note that in both cases ρ < ξ > s/ ( s − E [ R ,n ] = n − E (cid:8) λ ( Z i ) E [ τ ( Z i ) | Υ i ] η γ ,f (Υ i ) (cid:9) { o ( n − ) } . It remains to study n n − E [ R ,n R ,n ] = Z R X ≤ i = j ≤ n X ≤ i ≤ n E { Γ( i, j, i ; t ) } F [ K ]( th ) dt, with Γ( i, j, i ; t ) = λ ( Z i ) e − πιt Υ i τ ( Z j ) e πιt Υ j λ ( Z i ) E [ τ ( Z i ) | Υ i ] η γ ,f (Υ i ) . Repeating the same arguments as above, the leading term of n ( n − − E [ R ,n R ,n ] is obtained summingthe terms R R E [Γ( i, j, i ; t )] F [ K ]( th ) dt over all the pairs ( i, j ). Moreover, only the pairs for which | i − j | issufficiently large will matter. As a consequence, after applying Davydov’s inequality, the leading termswill be Z R E (cid:8) λ ( Z i ) E [ τ ( Z i ) | Υ i ] η γ ,f (Υ i ) e − πιt Υ i (cid:9) E (cid:8) τ ( Z j ) e πιt Υ j (cid:9) F [ K ]( th ) dt = Z R E (cid:8) λ ( Z i ) E [ τ ( Z i ) | Υ i ] η γ ,f (Υ i ) e − πιt Υ i (cid:9) E (cid:8) E [ τ ( Z j ) | Υ j ] e πιt Υ j (cid:9) F [ K ]( th ) dt = F [ λ η γ ,f τ η γ ,f τ η γ ,f ](0) { o (1) } , where for the last equality we used the same arguments as above. We deduce that E [ R ,n R ,n ] = n − E (cid:8) λ ( Z i ) E [ τ ( Z i ) | Υ i ] η γ ,f (Υ i ) (cid:9) { o ( n − ) } , and thus (33) holds true, and E ( k A W,b k ) = o ( n − / ) . Next, we have to investigate A W,e , A W,f and A W,g . This could not be bounded by simply takingthe norm of the sum. Indeed, since the nonparametric estimator of the derivative has a slower rate ofconvergence given in (26), this would not yield a sufficiently fast rate for these terms. To improve therate we have to exploit (2). For this, we again use the steps we followed for A W,b : replace b η γ ,f (Υ i ) by η γ ,f (Υ i ), replace the expressions of the nonparametric estimator, and compute the second order momentof the resulting average over three indices. We next partition the set of six components multi-indices,obtained when considering the second order moment, in three subsets that could be handled either usingCauchy-Schwarz inequality and the negligible cardinality of the subset, or using Davydov’s inequalityand a condition like (32), or using the Inverse Fourier Transform for K and K . The latter category ofmulti-indices corresponds to the expectation of the terms containing the factor ε i . The adaptation ofthe previous arguments for A W,e , A W,f and A W,g is quite straightforward and thus we omit the details.We deduce that E ( k A W,e k + k A W,f k + k A W,g k ) = o ( n − / ) . Finally, we have to investigate A W,c and A W,d . We can write each of these terms under the form1 + o P (1) n n X i =1 δ (Υ i ) { b γ n (Υ i ) − γ (Υ i ) } λ ( W i ) − o P (1) n n X i,j =1 δ (Υ i ) { b γ n (Υ i ) − γ (Υ i ) } n n X j =1 λ (Υ i , Υ j ; h ) =: { A − A }{ o P (1) } , where A and A are the sums corresponding to λ ( W i ) = [ W i − E ( W i | Υ i )] η γ ,f (Υ i ) ,λ ( t, Υ j ; h ) = W j h K (cid:18) Υ j − th (cid:19) − E ( W i | Υ i = t ) η γ ,f ( t ) , b γ n ( · ) is either the kernel estimator of γ ( · ) = η γ ,m ( · ) (and then δ ( · ) = η γ ,m ( · )), orof γ ( · ) = η γ ,f ( · ) (and then δ ( · ) = m ( · ) η γ ,m ( · )). By (25), b γ n ( · ) is uniformly convergent with the rate o P ( n − / ). We first investigate the variance of A for which we could again apply the uniform rate (25)and deduce that E ( k A k ) = o ( n − / ) . 
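For the reader's convenience, we recall a standard form of Davydov's covariance inequality used repeatedly in the computations above and below (see, e.g., Hall and Heyde, 1980, or Rio, 2000); the constant C below is universal. If X and Y are real random variables measurable with respect to σ-fields separated by a lag m, with E|X|^q < ∞, E|Y|^r < ∞ and p, q, r > 1 such that 1/p + 1/q + 1/r = 1, then
\[
\bigl|\operatorname{Cov}(X,Y)\bigr| \;\le\; C\,\alpha(m)^{1/p}\,\|X\|_{q}\,\|Y\|_{r},
\]
where α(m) denotes the strong mixing coefficient at lag m. Combined with a polynomial mixing rate of the form α(m) ≤ C m^{−ξ} and lags of order n^δ, as used above, this yields bounds of the type in (34).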
After replacing the expression for the kernel estimator, we could rewrite
A = 1 n X ≤ i = j ≤ n λ ( W i ) τ ( W j ) 1 h K ( (Υ j − Υ i ) / h ) ,
an expression close to R ,n from the decomposition of A W,b . Here, instead of (32) we have E [ λ ( W i ) | Υ i ] = E [ τ ( W i ) | Υ i ] = 0 . Like for E ( R ,n ), to bound the expectation of A , which is a sum over a multi-index ( i, i , j, j ), we could first consider the expectation of the partial sum over the multi-indices such that | i − i | ≥ n δ . This situation could be handled in the same way as above, and we could deduce the rate o ( n − ) for the expectation of this partial sum, under the same conditions on δ . The difference with respect to the study of E ( R ,n ) arises in the case of the partial sum over the multi-indices with | i − i | < n δ . When i and i are closer than n δ , in particular when i = i , we distinguish three sub-cases: (a) j and/or j belonging to I ( i ; n δ / 2 ) ∪ I ( i ; n δ / 2 ) ; (b) { j, j } ⊂ I c ( i ; n δ / 2 ) ∩ I c ( i ; n δ /
2) and | j − j | < n δ ; and (c) { j, j } ⊂ I c ( i ; n δ / ∩ I c ( i ; n δ /
2) and | j − j | ≥ n δ . In sub-cases (a) and (b), the cardinality of the subset of indices is at most n δ n . In sub-case (c), we could isolate j or j in a n δ / 2 -neighborhood and apply again Davydov’s inequality with, say, Y = τ ( W j ) e − πιt Υ j , in which case E ( Y ) = 0. Gathering facts, we deduce that E ( k A W,c k + k A W,d k ) = o ( n − / ). Now the proof of Lemma 7 is complete.

A.2.3 CHPLSIM: the rate of n − P ni =1 [Ψ( Z i , Z { r } i ; θ , b η γ ) − Ψ( Z i , Z { r } i ; θ , η )]

Let Φ( · , · ; · , · ) be defined as in (15). We want to show a rate like (28). The only difference compared to the PLSIM comes from the second set of equations. Let
Ψ σ ( Z i , Z { r } i ; θ, η γ ) = g σ ( Z i , Z { r } i ; θ, m γ ) ∇ β σ ( V i , Z { r } i ; β ) η γ,f ( W > i γ ) ∈ R d β ,
where we recall that
g σ ( Z i , Z { r } i ; θ, m γ ) = { Y i − l ( X i ; γ ) − m γ (Υ i ) } − σ ( V i , Z { r } i ; β ) .
With the notations σ i = σ ( V i , Z { r } i ; β ) and ∇ β σ i = ∇ β σ ( V i , Z { r } i ; β ), we could decompose
1 n n X i =1 [ Ψ( Z i , Z { r } i ; θ , b η γ ) − Ψ( Z i , Z { r } i ; θ , η ) ] = 1 n n X i =1 { ε i − σ i } ∇ β σ i { b η γ,f (Υ i ) − η γ,f (Υ i ) } { η γ,f (Υ i ) + o P ( n − / ) } + 1 n n X i =1 { b η γ ,m (Υ i ) − m (Υ i ) b η γ ,f (Υ i ) } ∇ β σ i + 2 n n X i =1 ε i m (Υ i )[ b η γ ,f (Υ i ) − b η γ ,f (Υ i )] ∇ β σ i { η γ,f (Υ i ) + o P ( n − / ) } − n n X i =1 ε i [ b η γ ,m (Υ i ) − m (Υ i ) η γ ,f (Υ i )] ∇ β σ i { η γ,f (Υ i ) + o P ( n − / ) } = B + B + 2 B − B + o P ( n − / ) ,
where the remainder o P ( n − / ) is obtained by taking the norms of the sums where we obtain a product of two quantities with uniform rates o P ( n − / ). Taking the norm of the sum, using the triangle inequality and (25), k B k = o P ( n − / ). Next, by the definition of the model, E [ ε i − σ i | V i , F i − ] = 0 a.s. A careful inspection of the arguments for deducing the rate of A W,a and A W,b in Lemma 7 reveals that the arguments remain valid if the function λ ( · ) appearing in the definition of R ,n and R n, also depends on Z { r } i , that is, λ ( Z i ) becomes λ ( Z i , Z { r } i ). Here we consider
λ ( Z i , Z { r } i ) = { ε i − σ i } ∇ β σ i η γ,f (Υ i ) , λ ( Z i , Z { r } i ) = ε i m (Υ i ) ∇ β σ i η γ,f (Υ i ) , and λ ( Z i , Z { r } i ) = ε i ∇ β σ i η γ,f (Υ i )
to handle B , B and B , respectively. We have E [ λ ( Z i , Z { r } i ) | V i , F i − ] = 0 a.s., for these three definitions, and thus condition (32) holds true. We deduce that E ( k B k + k B k + k B k ) = o P ( n − / ).

A.2.4 Controlling the variance estimation error
By the previous arguments, it is now easy to deduce that
‖ 1 n n X i =1 [ Ψ( Z i , Z { r } i ; θ , b η γ )Ψ( Z i , Z { r } i ; θ , b η γ ) > − Ψ( Z i , Z { r } i ; θ , η )Ψ( Z i , Z { r } i ; θ , η ) > ] ‖ = o P (1) . (36)

A.3 Empirical likelihood ratio with b η γ

We have
λ ( θ , b η γ ) = S ( θ , b η γ ) − n n X i =1 Ψ( Z i , Z { r } i ; θ , η ) + o P ( n − / ) = S ( θ , η ) − n n X i =1 Ψ( Z i , Z { r } i ; θ , η ) + o P (1) 1 n n X i =1 Ψ( Z i , Z { r } i ; θ , η ) + o P ( n − / ) .
Thus, the CLT for n − P ni =1 Ψ( Z i ; θ , η ) implies that k λ ( θ , b η γ ) − λ ( θ , η ) k = o P ( n − / ). Moreover, since k λ ( θ , η ) k = O P ( n − / ),
2 ℓ n ( θ , b η γ ) = 2 λ ( θ , b η γ ) > n X i =1 Ψ( Z i , Z { r } i ; θ , b η γ ) − λ ( θ , b η γ ) > [ n X i =1 Ψ( Z i , Z { r } i ; θ , b η γ )Ψ( Z i , Z { r } i ; θ , b η γ ) > ] λ ( θ , b η γ ) + o P (1) = 2 λ ( θ , η ) > n X i =1 Ψ( Z i , Z { r } i ; θ , η ) − λ ( θ , η ) > n X i =1 Ψ( Z i , Z { r } i ; θ , η )Ψ( Z i , Z { r } i ; θ , η ) > λ ( θ , η ) + o P (1) = 2 ℓ n ( θ , η ) + o P (1) .
Thus 2 ℓ n ( θ , b η γ ) and 2 ℓ n ( θ , η ) have the same χ² d θ asymptotic distribution.

References
Bossaerts, P., C. Hafner, and W. Härdle (1996). A New Method for Volatility Estimation with Applications in Foreign Exchange Rate Series, pp. 71–83. Heidelberg: Physica-Verlag HD.
Bradley, R. C. (2005). Basic properties of strong mixing conditions. A survey and some open questions. Probab. Surv. 2, 107–144. Update of, and a supplement to, the 1986 original.
Carroll, R. J., J. Fan, I. Gijbels, and M. P. Wand (1997). Generalized partially linear single-index models. J. Amer. Statist. Assoc. 92(438), 477–489.
Chang, J., S. X. Chen, and X. Chen (2015). High dimensional generalized empirical likelihood for moment restrictions with dependent data. Journal of Econometrics 185(1), 283–304.
Chen, S. X. and I. Van Keilegom (2009). A review on empirical likelihood methods for regression. TEST 18(3), 415–447.
Chen, X. and H. Cui (2008). Empirical likelihood inference for partial linear models under martingale difference sequence. Statistics & Probability Letters 78(17), 2895–2901.
Dong, C., J. Gao, and D. Tjøstheim (2016). Estimation for single-index and partially linear single-index integrated models. Ann. Statist. 44(1), 425–453.
Fan, G.-L. and H.-Y. Liang (2010). Empirical likelihood inference for semiparametric model with linear process errors. Journal of the Korean Statistical Society 39(1), 55–65.
Gasparrini, A. (2011). Distributed lag linear and non-linear models in R: the package dlnm. Journal of Statistical Software 43(8), 1–20.
Hall, P. and C. C. Heyde (1980). Martingale limit theory and its application. Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New York-London. Probability and Mathematical Statistics.
Hansen, B. E. (2008). Uniform convergence rates for kernel estimation with dependent data. Econometric Theory 24(3), 726–748.
Härdle, W., H. Lütkepohl, and R. Chen (1997). A review of nonparametric time series analysis. International Statistical Review / Revue Internationale de Statistique 65(1), 49–72.
Härdle, W., A. Tsybakov, and L. Yang (1998). Nonparametric vector autoregression. Journal of Statistical Planning and Inference 68(2), 221–245.
Hjort, N. L., I. W. McKeague, and I. Van Keilegom (2009). Extending the scope of empirical likelihood. Ann. Statist. 37(3), 1079–1111.
Kanai, H., H. Ogata, and M. Taniguchi (2010). Estimating function approach for CHARN models. Metron 68(1), 1–21.
Kato, H., M. Taniguchi, and M. Honda (2006). Statistical analysis for multiplicatively modulated nonlinear autoregressive model and its applications to electrophysiological signal analysis in humans. IEEE Transactions on Signal Processing 54(9), 3414–3425.
Kitamura, Y. (1997). Empirical likelihood methods with weakly dependent processes. Ann. Statist. 25(5), 2084–2102.
Li, G., L. Zhu, L. Xue, and S. Feng (2010). Empirical likelihood inference in partially linear single-index models for longitudinal data. Journal of Multivariate Analysis 101(3), 718–732.
Lian, H., H. Liang, and R. J. Carroll (2015). Variance function partially linear single-index models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 77(1), 171–194.
Liang, H., X. Liu, R. Li, and C.-L. Tsai (2010). Estimation and testing for partially linear single-index models. Ann. Statist. 38(6), 3811–3836.
Lu, X. (2009). Empirical likelihood for heteroscedastic partially linear models. Journal of Multivariate Analysis 100(3), 387–396.
Ma, Y. and L. Zhu (2013). Doubly robust and efficient estimators for heteroscedastic partially linear single-index model allowing high-dimensional covariates. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75, 305–322.
Merlevède, F. and M. Peligrad (2000). The functional central limit theorem under the strong mixing condition. Ann. Probab. 28(3), 1336–1352.
Nordman, D. J. and S. N. Lahiri (2014). A review of empirical likelihood methods for time series. Journal of Statistical Planning and Inference 155, 1–18.
Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75(2), 237–249.
Owen, A. B. (2001). Empirical likelihood. Chapman and Hall/CRC.
Qin, J. and J. Lawless (1994). Empirical likelihood and general estimating equations. Ann. Statist. 22(1), 300–325.
Rio, E. (2000). Théorie asymptotique des processus aléatoires faiblement dépendants, Volume 31 of Mathématiques & Applications (Berlin) [Mathematics & Applications]. Springer-Verlag, Berlin.
Severini, T. A. and W. H. Wong (1992). Profile likelihood and conditionally parametric models. Ann. Statist. 20(4), 1768–1802.
Tong, H. (1990). Nonlinear time series, Volume 6 of Oxford Statistical Science Series. The Clarendon Press, Oxford University Press, New York. A dynamical system approach, with an appendix by K. S. Chan, Oxford Science Publications.
Wang, Q.-H. and B.-Y. Jing (1999). Empirical likelihood for partial linear models with fixed designs. Statistics & Probability Letters 41(4), 425–433.
Wang, Q.-H. and B.-Y. Jing (2003). Empirical likelihood for partial linear models. Annals of the Institute of Statistical Mathematics 55(3), 585–595.
Xia, Y. and W. Härdle (2006). Semi-parametric estimation of partially linear single-index models. Journal of Multivariate Analysis 97(5), 1162–1184.
Xia, Y., H. Tong, and W. K. Li (1999). On extended partially linear single-index models. Biometrika 86(4), 831–842.
Xue, L. and L. Zhu (2007). Empirical likelihood semiparametric regression analysis for longitudinal data. Biometrika 94(4), 921–937.
Xue, L.-G. and L. Zhu (2006). Empirical likelihood for single-index models. Journal of Multivariate Analysis 97(6), 1295–1312.
Zhu, L., L. Lin, X. Cui, and G. Li (2010). Bias-corrected empirical likelihood in a multi-link semiparametric model. Journal of Multivariate Analysis 101(4), 850–868.
Zhu, L. and L. Xue (2006). Empirical likelihood confidence regions in a partially linear single-index model. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68(3), 549–570.

Wilks’ theorem for semiparametric regressions with weakly dependent data
Supplementary Material
Marie Du Roy de Chaumaray ∗ , Matthieu Marbac ∗ and Valentin Patilea ∗ ∗ Univ. Rennes, Ensai, CNRS, CREST - UMR 9194, F-35000 Rennes, France
This supplement is organized as follows. Appendix A contains additional proofs. Appendix B collects additional simulation and real data analysis results that were omitted from the main paper due to the page limit.
A Additional proofs
Lemma 1: equivalence of the moment conditions.
Firstly, without imposing any identification condition,note that for any positive function ω ( V i ), we have E [ g µ ( Z i ; γ, m ) | V i , F i − ] = 0 ⇔ E [ g µ ( Z i ; γ, m ) E [ g µ ( Z i ; γ, m ) | V i , F i − ] ω ( V i )] = 0 . Indeed, E [ g µ ( Z i ; γ, m ) | V i , F i − ] = 0 directly implies that E [ g µ ( Z i ; γ, m ) E [ g µ ( Z i ; γ, m ) | V i , F i − ] ω ( V i )] = 0 . Conversely, by elementary properties of the conditional expectation, E [ g µ ( Z i ; γ, m ) E [ g µ ( Z i ; γ, m ) | V i , F i − ] ω ( V i )] = E [ E [ g µ ( Z i ; γ, m ) | V i , F i − ] ω ( V i )] , and thus E [ g µ ( Z i ; γ, m ) E [ g µ ( Z i ; γ, m ) | V i , F i − ] ω ( V i )] = 0 implies that E [ g µ ( Z i ; γ, m ) | V i , F i − ] = 0 . For an identifiable model, ( γ , m ) is the unique solution for E [ g µ ( Z i ; γ, m ) | V i , F i − ] = 0. Therefore,by (4), for any ( γ, m ) = ( γ , m ), we have E [ E [ g µ ( Z i ; γ, m ) | V i , F i − ] ω ( V i )] > . By (7), we have that γ is the minimum of the map γ E [ E [ g µ ( Z i ; γ, m γ ) | V i , F i − ] ω ( V i )]. Thus, wehave ∇ γ E [ E [ g µ ( Z i ; γ , m ) | V i , F i − ] ω ( V i )] = 0 . By construction, ∇ γ g µ ( Z i ; γ, m γ ) only depends on V i , and thus interchanging derivative and expectationoperators we have ∇ γ E [ E [ g µ ( Z i ; γ, m γ ) | V i , F i − ] ω ( V i )] = 2 E [ g µ ( Z i ; γ, m γ ) ∇ γ g µ ( Z i ; γ, m γ ) ω ( V i )] , which leads to E [ g µ ( Z i ; γ , m γ ) ∇ γ g µ ( Z i ; γ , m γ ) ω ( V i )] = 0 . The proof is completed after left-multiplying both sides in the last display by the non-random matrix J ( γ ) and noting that the assumption made on H µ ( γ ) ensures that γ is the only critical point for themap γ E [ E [ g µ ( Z i ; γ, m γ ) | V i , F i − ] ω ( V i )]. Lemma 3.
By the Cramér-Wold device, it suffices to show that for any c ∈ R d ,
1 √ n n X i =1 c > Ψ( Z i , Z { r } i ; θ , η ) d −→ N (0 , c > Σ c ) . (37)
As ( c > Ψ( Z i , Z { r } i ; θ , η ) ) is a strictly stationary, α -mixing, centered process, we notice that the Central Limit Theorem follows by a direct application of Corollary 1.1 in (Merlevède and Peligrad, 2000), under Assumption 1.1. Indeed, let δ = s − >
0, where s is given in Assumption 1.1. By the Cauchy-Schwarz inequality and (20), as 2(2 + δ ) = 2 s , we obtain that E h k Ψ( Z i , Z { r } i ; θ , η ) k δ i < ∞ . If e α m denotes the mixing coefficients of the process (cid:16) c > Ψ( Z i , Z { r } i ; θ , η ) (cid:17) , we have e α m ≤ α m where, by (19), m α δ/ (2+ δ ) m → δ/ (2 + δ ) = s/ ( s − < ξ . To obtain (37), it remains to check that1 n E n X i =1 c > Ψ( Z i , Z { r } i ; θ , η ) ! → c > Σ c. (38)Using the stationarity of ( Z i ), and since by construction we have E h Ψ( Z j , Z { r } j ; θ , η ) | V j , F j − i = 0 a.s., E n X i =1 Ψ( Z i , Z { r } i ; θ , η ) ! = n X i =1 E h Ψ( Z i , Z { r } i ; θ , η )Ψ( Z i , Z { r } i ; θ , η ) > i + 2 X ≤ i
The property E (cid:20)(cid:13)(cid:13)(cid:13) Ψ( Z i , Z { r } i ; θ , η ) (cid:13)(cid:13)(cid:13) (cid:21) < ∞ follows by applying the Cauchy–Schwarz in-equality component-wise and using our moment conditions. Let M n = max ≤ i ≤ n k Ψ( Z i , Z { r } i ; θ , η ) k and C >
0. By Boole’s inequality, the stationarity of the process ( Z i ) and the Markov inequality, wehave n / P ( M n > Cn / ) ≤ n / E [ k Ψ( Z i , Z { r } i ; θ , η ) k ] / ( Cn / ) = C − E [ k Ψ( Z i , Z { r } i ; θ , η ) k ] < ∞ . Therefore, we have M n = o P ( n / ) . Moreover, we have1 n n X i =1 (cid:13)(cid:13)(cid:13) Ψ( Z i , Z { r } i ; θ , η ) (cid:13)(cid:13)(cid:13) ≤ M n n n X i =1 (cid:13)(cid:13)(cid:13) Ψ( Z i , Z { r } i ; θ , η ) (cid:13)(cid:13)(cid:13) Using the fact that M n = o P ( n / ), E (cid:20)(cid:13)(cid:13)(cid:13) Ψ( Z i , Z { r } i ; θ , η ) (cid:13)(cid:13)(cid:13) (cid:21) < ∞ and by Lemma 3, we have1 n n X i =1 (cid:13)(cid:13)(cid:13) Ψ( Z i , Z { r } i ; θ , η ) (cid:13)(cid:13)(cid:13) = o P ( n / ) . Lemma 5.
For any θ , we have that λ ( γ, η ) satisfies1 n n X i =1
11 + λ ( θ, η ) > Ψ( Z i , Z { r } i ; θ, η ) Ψ( Z i , Z { r } i ; θ, η ) = 0 , (39)26et λ ( θ , η ) = k λ ( θ , η ) k u , we want to show that k λ ( θ , η ) k = O P ( n − / ). Noting that { λ ( θ, η ) > Ψ( Z i , Z { r } i ; θ, η ) } − = 1 − λ ( θ, η ) > Ψ( Z i , Z { r } i ; θ, η ) / { λ ( θ, η ) > Ψ( Z i , Z { r } i ; θ, η ) } , we have from (39) k λ ( θ , η ) k u > e S ( θ , η ) u = 1 n u > n X i =1 Ψ( Z i , Z { r } i ; θ , η ) , (40)where e S ( θ , η ) = 1 n n X i =1 { λ ( θ , η ) > Ψ( Z i , Z { r } i ; θ , η ) } − × Ψ( Z i , Z { r } i ; θ , η )Ψ( Z i , Z { r } i ; θ , η ) > . By construction, λ ( θ , η ) > Ψ( Z i , Z { r } i ; θ , η ) + 1 > ∀ ≤ i ≤ n . Thus, we obtain that k λ ( θ , η ) k u > S ( θ , η ) u ≤ k λ ( θ , η ) k u > e S ( θ , η ) u (1 + k λ ( θ , η ) k M n ) , where S ( θ , η ) = n − P ni =1 Ψ( Z i , Z { r } i ; θ , η )Ψ( Z i , Z { r } i ; θ , η ) > and where M n is the largest valueamong the k Ψ( Z i , Z { r } i ; θ , η ) k ’s. Using (40) we deduce k λ ( θ , η ) k " u > S ( θ , η ) u − M n u > n n X i =1 Ψ( Z i , Z { r } i ; θ , η ) ≤ u > n n X i =1 Ψ( Z i , Z { r } i ; θ , η ) . Lemma 4 implies that M n = o P ( n / ) and Lemma 3 allows us to upper-bound the right side of theprevious inequality by O P ( n − / ). Moreover, we have ν + o P (1) ≤ u > S ( θ , η ) u, where ν > k λ ( θ , η ) k (cid:16) ν + o P (1) − o P ( n / ) O P ( n − / ) (cid:17) ≤ O P ( n − / ) , which implies that k λ ( θ , η ) k = O P ( n − / ) . Noting that nπ i ( θ , η ) = 1 − λ ( θ , η ) > Ψ( Z i , Z { r } i ; θ , η ) + n λ ( θ , η ) > Ψ( Z i , Z { r } i ; θ , η ) o λ ( θ , η ) > Ψ( Z i , Z { r } i ; θ , η ) , we have from (39) that1 n n X i =1 Ψ( Z i , Z { r } i ; θ , η ) − S ( θ , η ) λ ( θ , η )+ 1 n n X i =1 n λ ( θ , η ) > Ψ( Z i , Z { r } i ; θ , η ) o λ ( θ , η ) > Ψ( Z i , Z { r } i ; θ , η ) Ψ( Z i , Z { r } i ; θ , η ) = 0 . Using Lemma 4, we deduce that max ≤ i ≤ n { λ ( θ , η ) > Ψ( Z i , Z { r } i ; θ , η ) } − = 1 + o P (1), and thenorm of the second sum on the left-hand side of the last display, can be bounded by1 n n X i =1 k Ψ( Z i , Z { r } i ; θ , η ) k k λ ( θ , η ) k λ ( θ , η ) > Ψ( Z i , Z { r } i ; θ , η ) = o P ( n / ) O P ( n − ) O P (1) = o P ( n − / ) . Thus λ ( θ , η ) = S ( θ , η ) − n n X i =1 Ψ( Z i , Z { r } i ; θ , η ) + o P ( n − / ) . emma 6. We decompose A X = A X, + A X,a + A X,b + A X,c + A X,d + A X, + A X, where A X, = 1 n n X i =1 ε i (cid:20) ∇ γ l ( X i ; γ , ) − η γ ,X (Υ i ) η γ ,f (Υ i ) (cid:21) η γ ,f (Υ i ) ,A X,a = 1 n n X i =1 ε i [ b η γ ,f (Υ i ) ∇ γ l ( X i ; γ , ) − η γ ,X (Υ i )] b η γ ,f (Υ i ) ,A X,b = 1 n n X i =1 ε i [ η γ ,X (Υ i ) − b η γ ,X (Υ i )] b η γ ,f (Υ i ) ,A X,c = 1 n n X i =1 [ m (Υ i ) η γ ,f (Υ i ) − b η γ ,m (Υ i )] × [ η γ ,f (Υ i ) ∇ γ l ( X i ; γ , ) − b η γ ,X (Υ i )] b η γ ,f (Υ i ) ,A X,d = 1 n n X i =1 m (Υ i ) [ b η γ ,f (Υ i ) − η γ ,f (Υ i )] × [ η γ ,f (Υ i ) ∇ γ l ( X i ; γ , ) − b η γ ,X (Υ i )] b η γ ,f (Υ i ) , and A X, = 1 n n X i =1 [ m (Υ i ) η γ ,f (Υ i ) − b η γ ,m (Υ i )] × [ b η γ ,f (Υ i ) − η γ ,f (Υ i )] ∇ γ l ( X i ; γ , ) b η γ ,f (Υ i ) ,A X, = 1 n n X i =1 m (Υ i ) [ b η γ ,f (Υ i ) − η γ ,f (Υ i )] × [ b η γ ,f (Υ i ) − η γ ,f (Υ i )] ∇ γ l ( X i ; γ , ) b η γ ,f (Υ i ) . After taking the norm of the sums, by the triangle inequality and (25), k A X, k + k A X, k = " O P (cid:18) ln nnh (cid:19) / ! + O P (cid:0) h (cid:1) × O P (1) = o P ( n − / ) . The terms A X,a to A X,d require a more refined treatment. 
We can write
A X,a = 1 n n X i =1 ε i [ b η γ ,f (Υ i ) ∇ γ l ( X i ; γ , ) − η γ ,X (Υ i )] b η γ ,f (Υ i ) = 1 n n X i =1 ε i [ b η γ ,f (Υ i ) ∇ γ l ( X i ; γ , ) − η γ ,X (Υ i )] η γ ,f (Υ i ) + r X,a , (41)
with k r X,a k = o P ( n − / ). The rate of the negligible remainder r X,a is again obtained after taking the norm of the sums, by (25). Finally, using the same arguments as used for bounding A W,b in Lemma 7, we also obtain k A X,a k + k A X,b k = o P ( n − / ). To bound A X,c and A X,d , we first replace b η γ ,f (Υ i ) by η γ ,f (Υ i ), as in (41). We next use the same arguments as for bounding A W,c and A W,d in Lemma 7, and we obtain k A X,c k + k A X,d k = o P ( n − / ).

B Additional empirical evidence
B.1 Additional results on the real data application
This section presents additional results on the real data application. Table 4 presents the autocorrelations of the different variables. Figures 1-5 present the original series and the series obtained by removing the seasonality. Figure 6 and Figure 7 present the estimated density of the index and the estimated function m̂(·), obtained with the Lag(2) setup.

Figure 1: Series of the daily mean ozone level collected in Chicago in 1996-1997 ((a) original data; (b) deseasonalized data).
Figure 2: Series of the daily relative humidity level collected in Chicago in 1996-1997 ((a) original data; (b) deseasonalized data).
Figure 3: Series of the daily mean temperature collected in Chicago in 1996-1997 ((a) original data; (b) deseasonalized data).
Figure 4: Series of the daily dew point temperature collected in Chicago in 1996-1997 ((a) original data; (b) deseasonalized data).
Figure 5: Series of the daily PM10-level collected in Chicago in 1996-1997 ((a) original data; (b) deseasonalized data).
Figure 6: Density of the index obtained on data of year 1997.
Figure 7: Drawing of the estimator m̂(·).
B.2 A conditional variance semiparametric model
In this section we extend the scope of our models. As mentioned in Section 6, Y i could be observed with some error. For illustration, consider a time series ( R i ), solution of the AR(1) equation
R i = ρ R i − 1 + u i , i ∈ Z . (42)
Consider the PLSIM
Y i = u i ² = µ ( V i ; γ, m ) + ε i with µ ( V i ; γ, m ) = l ( X i ; γ ) + m ( W > i γ ) , (43)
with E [ ε i | V i , F i − ] = 0 a.s. and ( V > i , ε i ) > ∈ R d X + d W × R a strictly stationary and strongly mixing sequence. Moreover, E [ u i | V i , F i − ] = 0 a.s. A more common way to write model (43) is
u i = √ µ i ν i , (44)
with ( ν i ) a strong white noise process with unit variance, and µ i a positive function of the past values of u i . The ARCH model is a typical example. When covariates are also allowed to enter the expression of µ i , one obtains a particular example of the so-called GARCH-X models. See Han and Kristensen (2014). Here we allow for a flexible semiparametric form µ i = µ ( V i ; γ, m ) and our additive error term is ε i = µ ( V i ; γ, m )( ν i ² − 1) . Although the conditional variance of Y i is not constant and the Y i are not directly observed, the PLSIM is still applicable, as we will briefly justify in the following. Instead of Y i , one has
e Y i = ( R i − e ρ R i − 1 ) ² = Y i + R i − 1 ² ( e ρ − ρ ) ² − 2 u i R i − 1 ( e ρ − ρ ) .
Here, e ρ is the least-squares estimator of ρ . Let be η γ be the vector of nonparametric estimators defined in (18), obtained with e Y i instead of Y i . Only the components be η γ,m and be η γ,m are affected by the fact that the Y i ’s are not available. Given the expression of e Y i − Y i , we deduce that
be η γ,m ( t ) = b η γ,m ( t ) − 2 ( e ρ − ρ ) nh n X i =1 u i R i − 1 K ( ( W > i γ − t ) / h ) + ( e ρ − ρ ) ² O P (1) ,
uniformly with respect to t , and a similar representation holds true for be η γ,m . Using the fact that e ρ − ρ = O P ( n − / ) and E [ u i | V i , F i − ] = 0 a.s., the arguments in the proof of Theorem 1 remain valid and the limit of the ELR is still a chi-square distribution. The technical details are quite straightforward and thus are omitted.
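To fix ideas, here is a minimal R sketch of this prewhitening step; the series r, its length and the AR coefficient used to generate it are illustrative placeholders, not part of the paper.

```r
# Minimal sketch: build the feasible responses tilde_Y from an observed
# AR(1) series, as described above. All names and values are illustrative.
set.seed(1)
n_obs <- 500
innov <- rnorm(n_obs)                                   # placeholder innovations u_i
r     <- as.numeric(stats::filter(innov, 0.5, method = "recursive"))  # AR(1) series R_i
r_lag <- r[-n_obs]                                      # R_{i-1}
r_cur <- r[-1]                                          # R_i
rho_hat <- sum(r_lag * r_cur) / sum(r_lag^2)            # least-squares estimator of rho
tilde_Y <- (r_cur - rho_hat * r_lag)^2                  # tilde_Y_i = (R_i - rho_hat R_{i-1})^2
# tilde_Y is then used in place of Y_i in the kernel estimators (18).
```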
Instead, we propose an illustration using simulation data.
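Before giving the details, here is a minimal R sketch of such a simulation; it follows the structure of model (42)-(45) described in the next paragraph, but all numerical values (the coefficients, the AR parameter, the covariance of the covariates) and the truncation that keeps the conditional variance positive are hypothetical placeholders rather than the settings used in the paper.

```r
# Minimal sketch of a simulation from model (42)-(45); rho, gamma1, gamma2,
# the covariance of W and the truncation level are hypothetical placeholders.
set.seed(123)
n      <- 2000
rho    <- 0.5                                      # placeholder AR(1) coefficient in (42)
gamma1 <- c(0.5, 0.25)                             # placeholder coefficients of l(X_i; gamma)
gamma2 <- c(1, 1, 1) / sqrt(3)                     # placeholder single-index direction
m_fun  <- function(u) 1 / 4 + 3 / 4 * sin(u * pi)  # single-index function, cf. (45)
Sig_W  <- 0.3^abs(outer(1:3, 1:3, "-"))            # placeholder Cov(W_ik, W_il)
L_W    <- chol(Sig_W)

W <- matrix(0, n, 3); U <- numeric(n); R <- numeric(n)
for (i in 3:n) {
  W[i, ] <- W[i - 1, ] / 2 + drop(rnorm(3) %*% L_W)   # Gaussian covariates
  mu_i   <- gamma1[1] * U[i - 1] + gamma1[2] * U[i - 2] + m_fun(sum(W[i, ] * gamma2))
  U[i]   <- sqrt(max(mu_i, 0.01)) * rnorm(1)          # u_i = sqrt(mu_i) * nu_i, cf. (44);
                                                      # truncation is a placeholder safeguard
  R[i]   <- rho * R[i - 1] + U[i]                     # observed series, cf. (42)
}
```

The generated series R then plays the role of the observed data, from which the feasible responses tilde_Y can be computed as in the sketch given earlier in this section.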
Table 5: Empirical probabilities of rejection obtained from 5000 replications using the PLSIM for testing the order of the lagged values of Y i in the parametric part l ( · ; γ ).

Test     n = 1000        n = 2000        n = 4000        n = 8000
         ref.   estim.   ref.   estim.   ref.   estim.   ref.   estim.
Lag(1)   0.117  0.120    0.089  0.097    0.081  0.078    0.064  0.063
Lag(0)   0.422  0.402    0.678  0.654    0.942  0.932    1.000  0.999
Lag(2)   0.538  0.531    0.695  0.687    0.887  0.874    1.000  1.000

We generated data from model (42)-(44). First, we generate the covariates W i = ( W i 1 , W i 2 , W i 3 ) > from a multivariate Gaussian distribution with mean W i − 1 / and Cov( W ik , W iℓ ) = 0 . | k − ℓ | . Then, we set X i = ( U i − 1 , U i − 2 ) > and we generate the U i from (43)-(44) with
l ( X i ; γ ) = γ U i − 1 + γ U i − 2 and m ( u ) = 1 / 4 + 3 / 4 sin ( uπ ) , (45)
with γ = (0 . , ) > , γ = (1 , , ) > . Finally, the variables R i can be computed from (42) with ρ = 0 . From the R i ’s, we can compute ˜ Y i and use the proposed EL procedure for testing the order of the lagged values to consider in the parametric function. Hypothesis testing is based on Wilks’ Theorem in Section 4.3 (results related to this method are named estim), along with the unfeasible EL approach that previously learns the nonparametric estimators on a sample of size 10 (this case mimics the situation where m , m and the density of the index are known; results related to this method are named ref). The nonparametric elements are estimated by the Nadaraya-Watson method with Gaussian kernel and bandwidth h = C − n − / , where C is the standard deviation of the index. Thus, we consider the tests introduced in Section 5. The empirical probabilities of rejection are presented in Table 5 for a nominal level of 0 .

References

[1] Han, H. and D. Kristensen (2014). Asymptotic for the QMLE in GARCH-X Models With Stationary and Nonstationary Covariates.
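For completeness, the following R sketch (not the authors' implementation) illustrates how an empirical log-likelihood ratio statistic of the kind used in these tests can be computed once the estimated moment functions are stacked row-wise in a matrix Psi; the statistic is then compared with a chi-square critical value, in line with Wilks' theorem. The optimizer and the artificial Psi below are illustrative choices.

```r
# Minimal sketch of the empirical log-likelihood ratio at a candidate parameter,
# given an n x d matrix Psi whose i-th row is the estimated moment function.
el_ratio <- function(Psi) {
  d <- ncol(Psi)
  # Dual problem: lambda maximizes sum(log(1 + Psi %*% lambda)) over the region
  # where all implied weights stay positive (Owen, 2001; Qin and Lawless, 1994).
  neg_dual <- function(lambda) {
    w <- 1 + drop(Psi %*% lambda)
    if (any(w <= 1e-10)) return(1e10)        # outside the admissible region
    -sum(log(w))
  }
  opt <- optim(rep(0, d), neg_dual)          # Nelder-Mead is enough for small d
  -2 * opt$value                             # 2 * sum(log(1 + lambda' Psi_i))
}

# Illustration with artificial, centered moment functions: under the null the
# statistic is compared with a chi-square quantile with d degrees of freedom.
set.seed(1)
Psi    <- matrix(rnorm(500 * 2), ncol = 2)
stat   <- el_ratio(Psi)
reject <- stat > qchisq(0.95, df = 2)
```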