On estimation and prediction in spatial functional linear regression model
arXiv preprint [math.ST]
Stéphane BOUKA, Sophie DABO-NIANG, and Guy Martial NKIET
URMI, Université des Sciences et Techniques de Masuku, Franceville, Gabon. Laboratoire LEM, CNRS 9221, Université de Lille, France. INRIA-MODAL, Lille, France.
E-mail: [email protected]; [email protected]; [email protected].
Abstract.
We consider a spatial functional linear regression model in which a scalar response is related to a square integrable spatial functional process. We use a smoothing spline estimator for the functional slope parameter and establish a finite sample bound for the variance of this estimator under mixing spatial dependence. We then give a bound for the prediction error. Finally, we illustrate our results by simulations.
AMS 1991 subject classifications:
Key words:
Functional linear regression; spatial functional process; mixing spatial dependence
1 Introduction

Consider the following spatial functional linear regression model, where the spatial scalar response $(Y_{\mathbf i}\in\mathbb R,\ \mathbf i\in D\subset\mathbb Z^d)$ is related to a square integrable spatial functional process $(X_{\mathbf i}\in\mathcal F,\ \mathbf i\in D\subset\mathbb Z^d)$ through
$$Y_{\mathbf i}=\beta_0+\int_I\beta(t)\,X_{\mathbf i}(t)\,dt+\epsilon_{\mathbf i},\quad \mathbf i\in\mathbb Z^d,\qquad(1)$$
where $\beta_0$ is a constant, $I$ is the domain of $X_{\mathbf i}$, $\mathcal F$ is a space of functions endowed with a semi-norm, $\beta$ is an unknown function representing the slope function, and $(\epsilon_{\mathbf i})_{\mathbf i\in\mathbb Z^d}$ is a centered random spatial noise with variance $\sigma_\epsilon^2>$
0. The functional linear regression model with functional or scalar response has been the focus of various investigations. There exist many contributions in this field for non-spatial data; recent references are [1], [6], [7], [8], [15], [16], [17], [18], [23]. This work is motivated by the large number of applications in which the data are of a spatial nature. For example, non-parametric prediction by kriging methods for geostatistical functional data was tackled in [3], [4], [11], [12], [13], [14] and [19], whereas spatial autoregressive functional models were considered in [20, 21]. In this paper, we are interested in the estimation of the slope function $\beta$ in model (1). To the best of our knowledge, this problem has not yet been considered for the basic spatial functional linear regression model, but only for non-spatial data (e.g. [7]) or for a spatial linear regression model with derivatives (see [5]). The paper is organized as follows. Section 2 is devoted to the construction of the estimator that will be used. Assumptions and main results are stated in Section 3, and a simulation study is given in Section 4. The proofs are postponed to Section 5.

2 Estimation

In this section, we give an estimator of $\beta$ in (1) by using an approach similar to that of [7]. Since this estimation procedure does not take into account the nature of the dependence of the data, we obtain an estimator that has the same form as that of [7]. The process $(X_{\mathbf i},Y_{\mathbf i})_{\mathbf i\in\mathbb Z^d}$ is defined on a probability space $(\Omega,\mathcal A,P)$ with the same distribution as a couple of variables $(X,Y)$. For $\mathbf n=(n,\ldots,n)$ with $n\in\mathbb N^*$, let $\mathcal I_{\mathbf n}:=\{1,\ldots,n\}^d$ be a grid of points in $\mathbb Z^d$ and consider observations $(X_{\mathbf i},Y_{\mathbf i})_{\mathbf i\in\mathcal I_{\mathbf n}}$. We assume that the random functions $X_{\mathbf i}$ are observed at $p$ equidistant points $t_1,\ldots,t_p\in I:=[0,1]$, with $t_j=j/p$ for all $j=1,\ldots,p$.
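The data-generating mechanism of model (1) can be sketched numerically; the slope function, the law of the curves $X_{\mathbf i}$ and the noise scale below are illustrative assumptions (the curves are even generated i.i.d. here, unlike the spatially dependent setting of the paper), and the integral is approximated by the Riemann sum $(1/p)\sum_j\beta(t_j)X_{\mathbf i}(t_j)$ on the design points $t_j=j/p$:

```python
import numpy as np

# Illustrative simulation of model (1) on a d = 2 grid; beta, the curve law
# and the noise scale are assumptions made only for this sketch.
rng = np.random.default_rng(0)

n, p = 10, 100                        # grid side and number of design points
t = np.arange(1, p + 1) / p           # t_j = j/p, j = 1, ..., p
beta0 = 1.0                           # intercept beta_0
beta = np.sin(2 * np.pi * t) ** 2     # hypothetical slope function

sites = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]  # lexicographic order
N = len(sites)                        # sample size n^d

# A crude stand-in for a square integrable functional process (iid curves).
xi = rng.normal(size=(N, 3))
X = xi[:, [0]] + xi[:, [1]] * np.cos(2 * np.pi * t) + xi[:, [2]] * np.sin(2 * np.pi * t)

eps = rng.normal(scale=0.1, size=N)   # centered noise epsilon_i
Y = beta0 + X @ beta / p + eps        # Y_i = beta_0 + (1/p) sum_j beta(t_j) X_i(t_j) + eps_i
```

In this discretized form, each row of `X` plays the role of one curve observed on the grid of design points.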
By using the lexicographic order, the previous sample is rewritten as $\{(X_{\mathbf i_i},Y_{\mathbf i_i})\}_{1\le i\le n^d}$; then we put $\mathbf Y=(Y_{\mathbf i_1}-\overline Y,\ldots,Y_{\mathbf i_{n^d}}-\overline Y)^T$ (where $u^T$ denotes the transpose of $u$) and we consider the $n^d\times p$ matrix $\mathbf X$ with general term $X_{\mathbf i_i}(t_j)-\overline X(t_j)$ for $i=1,\ldots,n^d$, $j=1,\ldots,p$. Then, we consider the estimator $\hat\beta$ of $\beta$ given by
$$\hat\beta(t)=D(t)^T(D^TD)^{-1}D^T\hat b\qquad(2)$$
with
$$\hat b=\frac{1}{n^dp}\Big(\frac{1}{n^dp}\mathbf X^T\mathbf X+\rho A_m\Big)^{-1}\mathbf X^T\mathbf Y,\qquad(3)$$
where $\rho>0$ is a smoothing parameter, $A_m$ is a $p\times p$ symmetric matrix defined from B-splines (see [7] for details), $D(t)=(D_1(t),\ldots,D_p(t))^T$ is a functional basis of the $p$-dimensional linear space $NS^m(t_1,\ldots,t_p)$ of functions $v$ having an $m$-th order derivative $v^{(m)}$ that belongs to $L^2([0,1])$, and $D$ is the $p\times p$ matrix with general term $D_i(t_j)$ for $i,j=1,\ldots,p$. For estimating the intercept $\beta_0$ we take $\hat\beta_0=\overline Y-\langle\hat\beta,\overline X\rangle$, where $\langle\cdot,\cdot\rangle$ denotes the usual inner product of $L^2([0,1])$.

3 Assumptions and main results

In this section, we first introduce the assumptions that are needed to obtain the main results of the paper; then, theorems giving the rate of convergence of the estimator $\hat\beta$, as well as that of the prediction at a non-visited site, are established.

Assumption 1
$\beta$ is $m$-times differentiable and $\beta^{(m)}$ belongs to $L^2([0,1])$.

Assumption 2
There exist $\kappa\in\,]0,1]$, $\delta>0$ and $C_1>0$ such that, for any $(t,s)\in I^2$,
$$P\big(|X(t)-X(s)|\le C_1|t-s|^\kappa\big)\ge 1-\delta.$$

Assumption 3
For $C_2\in\mathbb R_+^*$ and all $r\in\mathbb N^*$, there exist an $r$-dimensional linear subspace $\mathcal L_r$ of $L^2([0,1])$ and a real $q\in\,]0,+\infty[$ such that
$$\mathbb E\Big(\inf_{f\in\mathcal L_r}\sup_t|X(t)-f(t)|^2\Big)\le C_2\,r^{-2q}.$$

Assumption 4
For any $(j,\ell)\in(\mathbb N^*)^2$,
$$\mathrm{Var}\Big(\frac{1}{n^d}\sum_{i=1}^{n^d}\langle X_{\mathbf i_i}-\mathbb E(X),\zeta_j\rangle\,\langle X_{\mathbf i_i}-\mathbb E(X),\zeta_\ell\rangle\Big)\le\frac{C_3}{n^d}\,\mathbb E\big(\langle X-\mathbb E(X),\zeta_j\rangle^2\big)\,\mathbb E\big(\langle X-\mathbb E(X),\zeta_\ell\rangle^2\big),$$
where $0<C_3<\infty$ and $\{\zeta_j\}_{j\in\mathbb N^*}$ is a complete orthonormal system of eigenfunctions of the operator $\Gamma$ from $L^2([0,1])$ to itself defined by
$$\Gamma u:=\mathbb E\big(\langle u,X-\mathbb E(X)\rangle\,(X-\mathbb E(X))\big),$$
each $\zeta_j$ being associated with the $j$-th largest eigenvalue $\lambda_j$.

Assumptions 1–4 are technical conditions that are similar to the ones considered in [7]. In order to state the remaining assumptions, let us first recall the notion of polynomial mixing dependence. Letting $\alpha$ be the $\alpha$-mixing coefficient given, for two sub-$\sigma$-algebras $\mathcal U$ and $\mathcal V$ of $\mathcal A$, by $\alpha(\mathcal U,\mathcal V)=\sup\{|P(A\cap B)-P(A)P(B)|,\ A\in\mathcal U,\ B\in\mathcal V\}$, we consider the strong mixing coefficient (see [10]) related to a random field $(Z_{\mathbf i})_{\mathbf i\in\mathbb Z^d}$, defined as
$$\alpha_{1,\infty}(u)=\sup\big\{\alpha(\sigma(Z_{\mathbf i}),\mathcal F_\Lambda),\ \mathbf i\in\mathbb Z^d,\ \Lambda\subset\mathbb Z^d,\ \delta(\Lambda,\{\mathbf i\})\ge u\big\},\qquad(4)$$
where $\mathcal F_\Lambda=\sigma(Z_{\mathbf i};\,\mathbf i\in\Lambda)$ and the distance $\delta$ is defined for any subsets $\Gamma_1$ and $\Gamma_2$ of $\mathbb Z^d$ by $\delta(\Gamma_1,\Gamma_2)=\min\{\|\mathbf i-\mathbf j\|,\ \mathbf i\in\Gamma_1,\ \mathbf j\in\Gamma_2\}$, where $\|\cdot\|$ is the usual Euclidean norm of $\mathbb R^d$. Then, $(Z_{\mathbf i})_{\mathbf i\in\mathbb Z^d}$ is polynomially mixing if the related strong mixing coefficients satisfy $\alpha_{1,\infty}(u)=O(u^{-\theta})$, $\theta>0$.

Assumption 5
$\{\epsilon_{\mathbf i}\}_{\mathbf i\in\mathbb Z^d}$ is a strictly stationary, polynomially mixing random field, independent of $\{X_{\mathbf i}\}_{\mathbf i\in\mathbb Z^d}$ and such that $\sup_{\mathbf i\in\mathbb Z^d}|\epsilon_{\mathbf i}|<M_1$ almost surely, where $M_1$ is a strictly positive constant.

Assumption 6
$\{(X_{\mathbf i},Y_{\mathbf i})\}_{\mathbf i\in\mathbb Z^d}$ is a strictly stationary and polynomially mixing random field.

Assumption 7
There exists $M_2>0$ such that for all $\mathbf i\in\mathbb Z^d$, $\|X_{\mathbf i}\|<M_2$ almost surely.

Assumptions 5 and 6 are classical assumptions (see [2]). Assumption 7 has already been made in some works (see, e.g., [18]).
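To fix ideas before stating the results, the ridge-type solve behind equations (2)–(3) can be sketched numerically. The exact penalty matrix $A_m$ is built from B-splines as in [7]; the second-difference penalty below, as well as the synthetic centered data, are stand-in assumptions for illustration only:

```python
import numpy as np

# Sketch of the penalized solve in (3), with an assumed m = 2 difference
# penalty in place of the exact B-spline matrix A_m of [7].
rng = np.random.default_rng(1)
N, p, rho = 400, 100, 1e-3            # N plays the role of n^d

Xc = rng.normal(size=(N, p))          # centered design: X_{i_i}(t_j) - Xbar(t_j)
Yc = rng.normal(size=N)               # centered responses: Y_{i_i} - Ybar

D2 = np.diff(np.eye(p), n=2, axis=0)  # second-order difference operator
A_m = D2.T @ D2                       # symmetric nonnegative penalty matrix

# b_hat = (1/(N p)) ((1/(N p)) Xc'Xc + rho A_m)^{-1} Xc'Yc, cf. equation (3)
S = Xc.T @ Xc / (N * p)
b_hat = np.linalg.solve(S + rho * A_m, Xc.T @ Yc) / (N * p)
```

The vector `b_hat` collects approximate values of the slope at the design points; the functional estimator (2) then interpolates it in the spline space.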
Assumption 8
$X$ is an isotropic process such that for all $t,u$ in $[0,1]$,
$$\mathrm{Cov}\big(X_{\mathbf i_i}(t),X_{\mathbf i_j}(u)\big)=g(|t-u|)\,\Psi\big(\delta(\{\mathbf i_i\},\{\mathbf i_j\})\big)\quad\text{and}\quad\Psi(0)=1,$$
where $g$ is a positive function and $\Psi$ is a known $\mathbb R_+$-valued decreasing function satisfying $\sum_{t=1}^{\infty}t^{d-1}\Psi(t)<\infty$.

The separable covariance structure stated in Assumption 8 has also been used in [17]. Examples of isotropic spatial models can be found in [9]; we may mention, for instance, the exponential spatial model.

3.2 The results
We consider the semi-norm $\|\cdot\|_\Gamma$ defined by
$$\|u\|_\Gamma^2:=\langle\Gamma u,u\rangle,\quad u\in L^2([0,1]),\qquad(5)$$
and the discretized empirical semi-norm defined for any $u\in\mathbb R^p$ as
$$\|u\|_{n,p}^2:=\frac1p\,u^T\Big(\frac{1}{n^dp}\mathbf X^T\mathbf X\Big)u.$$
The following theorem gives a bound on the estimator's variance. In this theorem, $\mathbb E_\epsilon$ refers to the conditional expectation given $X_{\mathbf i_1},\ldots,X_{\mathbf i_{n^d}}$.

Theorem 1
Under Assumptions 1, 5, 6 and 7 with $\alpha_{1,\infty}(u)=O(u^{-\theta})$, $\theta>d$, for all $\rho>n^{-md}$, if the eigenvalues $\lambda_{x,1}\ge\lambda_{x,2}\ge\cdots\ge\lambda_{x,p}\ge0$ of $\frac{1}{n^dp}\mathbf X^T\mathbf X$ satisfy $\sum_{j=r+1}^p\lambda_{x,j}\le C\,r^{-2q}$ with $C>0$, $q>0$ and $r:=\lfloor\rho^{-1/(2m+2q+1)}\rfloor$, then
$$\mathbb E_\epsilon\big(\|\hat b-\mathbb E_\epsilon(\hat b)\|_{n,p}^2\big)\le\Big(\frac{\sigma_\epsilon^2}{n^d}+\frac{c\ln n}{n^d}\Big)\Big(m+\lfloor\rho^{-1/(2m+2q+1)}\rfloor\,(2+C\,C_4)\Big),\qquad(6)$$
where $C_4>0$, $c>0$ and $\lfloor x\rfloor$ stands for the integer part of $x$.

Using Theorem 1 and arguing as in [7], we obtain the corollary below.
Corollary 1
Under the assumptions of Theorem 1 together with Assumptions 2–4, as well as $n^dp^{-2\kappa}=O(1)$, $\rho\to0$ and $1/(n^d\rho^{1/(2m+2q+1)})\to0$ as $n,p\to\infty$, we have
$$\|\hat\beta-\beta\|_\Gamma^2=O_p\Big(\rho+\big(n^d\rho^{1/(2m+2q+1)}\big)^{-1}\ln n+n^{-d(2q+1)/(2m+2q+1)}\Big).\qquad(7)$$
Next, we give a bound for the prediction error. For that, we assume the following.

Assumption 9
The non-visited site $\mathbf i_0$ is such that $\delta\big(\{\mathbf i_0\},\{\mathbf i_1,\ldots,\mathbf i_{n^d}\}\big)\ge\lfloor n^{d/\theta}\rfloor$.
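As an illustration, the separation condition of Assumption 9 can be checked numerically; the values of $n$, $d$, $\theta$ and the candidate site are illustrative assumptions, and $\delta$ is the minimum Euclidean distance defined in Section 3:

```python
import numpy as np

# Numerical check of delta({i0}, {i_1, ..., i_{n^d}}) >= floor(n^{d/theta})
# for a d = 2 observation grid; n, d, theta and the site are assumptions.
n, d, theta = 10, 2, 8.0
grid = np.array([(i, j) for i in range(1, n + 1) for j in range(1, n + 1)], dtype=float)

def delta(site, sites):
    """delta({i0}, Lambda): minimum Euclidean distance from i0 to the set Lambda."""
    return float(np.min(np.linalg.norm(sites - np.asarray(site, dtype=float), axis=1)))

threshold = int(np.floor(n ** (d / theta)))   # floor(n^{d/theta})
site = (13.5, 13.5)                           # the non-visited site used in Section 4
ok = delta(site, grid) >= threshold
```

A large $\theta$ makes $\lfloor n^{d/\theta}\rfloor$ small, which is consistent with the remark that choosing $\theta$ large enough allows prediction at essentially any non-visited site.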
Given Assumption 9, it is sufficient to choose $\theta$ large in order to carry out the prediction at any non-visited site. We consider the prediction $\hat Y_{\mathbf i_0}$ and the "theoretical" prediction $Y^*_{\mathbf i_0}$ at a non-visited site $\mathbf i_0\in\mathbb Z^d$ such that $(X_{\mathbf i_0},Y_{\mathbf i_0})$ has the same distribution as $(X,Y)$. In fact,
$$\hat Y_{\mathbf i_0}=\hat\beta_0+\langle\hat\beta,X_{\mathbf i_0}\rangle\quad\text{and}\quad Y^*_{\mathbf i_0}=\beta_0+\langle\beta,X_{\mathbf i_0}\rangle.\qquad(8)$$
We are interested in a bound on the prediction error between $\hat Y_{\mathbf i_0}$ and $Y^*_{\mathbf i_0}$.

Theorem 2
Suppose that the assumptions of Corollary 1 together with Assumptions 8–9 hold. If $\sum_{j\ge1}\lambda_j^{1/2}<\infty$, $q>1/2$, $\rho\sim n^{-d(2m+2q+1)/(2m+2q+2)}$ and $p$ is chosen sufficiently large compared to $n^d$, then
$$\mathbb E\big((\hat Y_{\mathbf i_0}-Y^*_{\mathbf i_0})^2\,\big|\,\hat\beta_0,\hat\beta\big)=O_p\big(n^{-d/(2m+2q+2)}\big).$$

4 Simulation study

This section presents the results of simulations made in order to evaluate the performance of the proposed methods for slope estimation and prediction in model (1). We computed estimation and prediction errors from simulated spatial data in $\mathbb Z^2$. Using the lexicographic order, we generated a sample $\{(X_{\mathbf i_\ell},Y_{\mathbf i_\ell})\}_{1\le\ell\le n^2}$ as follows: we consider the first 15 elements $B_1,\ldots,B_{15}$ of the B-spline basis. For $k=1,\ldots,$
15, we generate a vector $(\xi_{\mathbf i_1,k},\ldots,\xi_{\mathbf i_{n^2},k})^T$ from a normal distribution $\mathcal N(0,\Sigma_1)$ in $\mathbb R^{n^2}$, where $\Sigma_1$ is the $n^2\times n^2$ covariance matrix with general term $\Sigma_{1,ij}=\exp(-\|\mathbf i_i-\mathbf i_j\|)$. Further, we generate a vector $(\Lambda_{\mathbf i_1}(t),\ldots,\Lambda_{\mathbf i_{n^2}}(t))^T$ from a normal distribution $\mathcal N(0,\Sigma_2)$ in $\mathbb R^{n^2}$, where $\Sigma_2$ is the $n^2\times n^2$ covariance matrix with general term $\Sigma_{2,ij}=0.09$, and for $\ell=1,\ldots,n^2$ we take
$$X_{\mathbf i_\ell}(t)=\sum_{k=1}^{15}\xi_{\mathbf i_\ell,k}B_k(t)+\Lambda_{\mathbf i_\ell}(t).$$
Considering 1001 equispaced points in $[0,1]$, we computed $Y_{\mathbf i_\ell}$ by approximating the integral in equation (1) using the rectangular method. That gives
$$Y_{\mathbf i_\ell}=\frac{1}{1001}\sum_{j=1}^{1001}\beta(t_j)\,X_{\mathbf i_\ell}(t_j)+\epsilon_{\mathbf i_\ell},\quad t_j=\frac{j-1}{1000},\ \ j=1,\ldots,1001,$$
where $(\epsilon_{\mathbf i_1},\ldots,\epsilon_{\mathbf i_{n^2}})^T$ is generated from a normal distribution $\mathcal N(0,\sigma_\epsilon^2\Sigma)$, with $\sigma_\epsilon^2$ controlled by the signal-to-noise ratio (snr) defined by
$$\mathrm{snr}=\frac{\mathbb E[\langle\beta,X\rangle^2]}{\mathbb E[\langle\beta,X\rangle^2]+\sigma_\epsilon^2},$$
and $\beta$ is a given function. We considered two cases for the function $\beta$, given by: Case A: $\beta(t)=[\sin(2\pi t)]^2$; Case B: $\beta(t)=(0.5-t)^2$. The estimator $\hat\beta$ of $\beta$ in model (1) is computed by using the function "fregre.basis" of the R package fda.usc. We assess the performance of our methods through the semi-norm $\|\cdot\|_\Gamma$ defined in (5) for evaluating the estimation error between $\hat\beta$ and $\beta$, and through the mean squared error (MSE) for evaluating the prediction error between the prediction $\hat Y_{\mathbf i_0}$ and the "theoretical" prediction $Y^*_{\mathbf i_0}$ at the non-visited site $\mathbf i_0=(13.5,\,13.5)$, where $X_{\mathbf i_0}$ is obtained by the ordinary kriging method, and $\hat Y_{\mathbf i_0}$ and $Y^*_{\mathbf i_0}$ are obtained as defined in (8). We take $\mathrm{snr}=5\%,\,10\%$ and $n=10,\,15,\,20,\,25$ over 100 replications and we obtain the following tables.

snr(%)  Case  n = 10  n = 15  n = 20  n = 25

Table 1:
Estimation errors

snr(%)  Case  n = 10  n = 15  n = 20  n = 25

Table 2:
Prediction errors at the non-visited site $\mathbf i_0=(13.5,\,13.5)$

The site $\mathbf i_0=(13.5,\,13.5)$ is beyond the grid of size $n=10$, whereas it is inside the grid of size $n=15$. We remark that, when this point is inside the grid, the prediction errors decrease as the sample size increases. Also, we see that the estimation and prediction errors are small even when the sample size and the snr increase.

Conclusion
In this paper, we have studied asymptotic properties of a smoothing spline estimator of the slope function in a spatial functional linear regression model, where a scalar response is related to a square integrable spatial functional process. The originality of the proposed method is that it handles spatially dependent data. The main difficulty is technical, especially in the proof of the prediction error bound, because of the spatial dependence of the data. The prediction proposed in this work is valid both for points inside the grid and for points beyond it, in contrast with [5], where the prediction is only available for points beyond the grid. One can thus see the proposed methodology as a good alternative to [7] when the available data are spatially dependent.
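For reference, the two error criteria used in the simulations (the discretized semi-norm for the estimation error and the MSE for the prediction error) can be sketched as follows; all inputs below are purely illustrative assumptions:

```python
import numpy as np

# Sketch of the error criteria of Section 4: the estimation error uses the
# discretized empirical semi-norm built from (1/(N p)) X'X, and the
# prediction error is a mean squared error over replications.
rng = np.random.default_rng(2)
N, p = 400, 100

X = rng.normal(size=(N, p))            # centered discretized curves
G = X.T @ X / (N * p)                  # empirical counterpart of Gamma

u = rng.normal(size=p)                 # plays the role of beta_hat(t_j) - beta(t_j)
est_err = float(u @ G @ u / p)         # ||u||_{n,p}^2 = (1/p) u' G u

Y_hat = rng.normal(size=50)            # predictions at replicated non-visited sites
Y_star = Y_hat + rng.normal(scale=0.1, size=50)
pred_mse = float(np.mean((Y_hat - Y_star) ** 2))
```

Since `G` is positive semi-definite by construction, the estimation criterion is nonnegative, as a semi-norm should be.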
5 Proofs

Let
$$M=\Big(\frac{1}{n^dp}\mathbf X^T\mathbf X+\rho A_m\Big)^{-1}\Big(\frac{1}{n^dp}\mathbf X^T\mathbf X\Big);\qquad(9)$$
then we have:

Lemma 1
$tr(M^2)\le tr(M)$.

Proof. Since $A_m$ is a symmetric nonnegative matrix, it has a square root, denoted by $A_m^{1/2}$, that is also a symmetric nonnegative matrix. Denoting by $A_m^{-1/2}$ the inverse of $A_m^{1/2}$ and by $I_p$ the $p\times p$ identity matrix, we have:
$$M=A_m^{-1/2}\Big(\frac{1}{n^dp}A_m^{-1/2}\mathbf X^T\mathbf X A_m^{-1/2}+\rho I_p\Big)^{-1}\Big(\frac{1}{n^dp}A_m^{-1/2}\mathbf X^T\mathbf X\Big).$$
Writing $\frac{1}{n^dp}A_m^{-1/2}\mathbf X^T\mathbf X A_m^{-1/2}=\sum_{\ell=1}^p\mu_\ell u_\ell u_\ell^T$, where the $\mu_\ell$'s are the nonnegative eigenvalues and $\{u_\ell\}_{1\le\ell\le p}$ is an orthonormal basis of $\mathbb R^p$ consisting of eigenvectors, it follows:
$$M=\sum_{\ell=1}^p\sum_{k=1}^p\frac{\mu_k}{\mu_\ell+\rho}\,A_m^{-1/2}u_\ell u_\ell^Tu_ku_k^TA_m^{1/2}=\sum_{\ell=1}^p\frac{\mu_\ell}{\mu_\ell+\rho}\,A_m^{-1/2}u_\ell u_\ell^TA_m^{1/2}.$$
Therefore, since $tr(A_m^{-1/2}u_\ell u_\ell^TA_m^{1/2})=tr(u_\ell^TA_m^{1/2}A_m^{-1/2}u_\ell)=tr(u_\ell^Tu_\ell)=1$, we deduce that $tr(M)=\sum_{\ell=1}^p\frac{\mu_\ell}{\mu_\ell+\rho}$. Finally,
$$tr(M^2)=tr\Big(\sum_{\ell=1}^p\sum_{k=1}^p\Big(\frac{\mu_\ell}{\mu_\ell+\rho}\Big)\Big(\frac{\mu_k}{\mu_k+\rho}\Big)A_m^{-1/2}u_\ell u_\ell^Tu_ku_k^TA_m^{1/2}\Big)=\sum_{\ell=1}^p\Big(\frac{\mu_\ell}{\mu_\ell+\rho}\Big)^2\le\sum_{\ell=1}^p\frac{\mu_\ell}{\mu_\ell+\rho}=tr(M).\qquad\square$$

Proof of Theorem 1. Putting
$$\Theta=\mathbf X\Big(\frac{1}{n^dp}\mathbf X^T\mathbf X+\rho A_m\Big)^{-1}\Big(\frac{1}{n^dp}\mathbf X^T\mathbf X\Big)\Big(\frac{1}{n^dp}\mathbf X^T\mathbf X+\rho A_m\Big)^{-1}\mathbf X^T,$$
we have
$$\mathbb E_\epsilon\big(\|\hat b-\mathbb E_\epsilon(\hat b)\|_{n,p}^2\big)=\frac1p\,\mathbb E_\epsilon\Big(\frac{1}{n^{2d}}\,\tau^T\mathbf X\Big(\frac{1}{n^dp}\mathbf X^T\mathbf X+\rho A_m\Big)^{-1}\Big(\frac{1}{n^dp}\mathbf X^T\mathbf X\Big)\Big(\frac{1}{n^dp}\mathbf X^T\mathbf X+\rho A_m\Big)^{-1}\mathbf X^T\tau\Big)=\frac{1}{n^{2d}p}\Big(\sum_{i=1}^{n^d}\Theta_{ii}\,\mathbb E(\tau_i^2)+\sum_{i=1}^{n^d}\sum_{\substack{j=1\\ j\ne i}}^{n^d}\Theta_{ij}\,\mathbb E(\tau_i\tau_j)\Big),\qquad(10)$$
where $\tau=(\tau_1,\ldots,\tau_{n^d})^T$, $\tau_i=\epsilon_{\mathbf i_i}-\bar\epsilon$, with $\bar\epsilon=n^{-d}\sum_{j=1}^{n^d}\epsilon_{\mathbf i_j}$.
Putting $\sigma_\epsilon^2=\mathbb E(\epsilon_{\mathbf i_i}^2)$, we deduce from $\tau_i^2=\epsilon_{\mathbf i_i}^2-2\epsilon_{\mathbf i_i}\bar\epsilon+\bar\epsilon^2$ and the strict stationarity that
$$\mathbb E(\tau_i^2)=\sigma_\epsilon^2-\frac{2}{n^d}\,\mathbb E\Big(\epsilon_{\mathbf i_i}\sum_{j=1}^{n^d}\epsilon_{\mathbf i_j}\Big)+\mathbb E(\bar\epsilon^2)=\Big(1-\frac{1}{n^d}\Big)\sigma_\epsilon^2-\frac{2}{n^d}\sum_{\substack{j=1\\ j\ne i}}^{n^d}\mathbb E(\epsilon_{\mathbf i_i}\epsilon_{\mathbf i_j})+\frac{1}{n^{2d}}\sum_{k=1}^{n^d}\sum_{\substack{j=1\\ j\ne k}}^{n^d}\mathbb E(\epsilon_{\mathbf i_k}\epsilon_{\mathbf i_j})\le\sigma_\epsilon^2+\frac{2}{n^d}\sum_{\substack{j=1\\ j\ne i}}^{n^d}\big|\mathbb E(\epsilon_{\mathbf i_i}\epsilon_{\mathbf i_j})\big|+\frac{1}{n^{2d}}\sum_{k=1}^{n^d}\sum_{\substack{j=1\\ j\ne k}}^{n^d}\big|\mathbb E(\epsilon_{\mathbf i_k}\epsilon_{\mathbf i_j})\big|.\qquad(11)$$
Notice that, putting $Q_n=\lfloor(\ln n)^{1/d}\rfloor$, we have
$$\sum_{\substack{j=1\\ j\ne i}}^{n^d}\big|\mathbb E(\epsilon_{\mathbf i_i}\epsilon_{\mathbf i_j})\big|=\sum_{\substack{j=1\\ 0<\delta(\{\mathbf i_j\},\{\mathbf i_i\})\le Q_n}}^{n^d}\big|\mathbb E(\epsilon_{\mathbf i_i}\epsilon_{\mathbf i_j})\big|+\sum_{\substack{j=1\\ \delta(\{\mathbf i_j\},\{\mathbf i_i\})>Q_n}}^{n^d}\big|\mathbb E(\epsilon_{\mathbf i_i}\epsilon_{\mathbf i_j})\big|.$$
Then, using the Cauchy–Schwarz inequality as well as Lemma 2.1 ii) in [22], we obtain, under Assumption 5:
$$\sum_{\substack{j=1\\ j\ne i}}^{n^d}\big|\mathbb E(\epsilon_{\mathbf i_i}\epsilon_{\mathbf i_j})\big|\le\sigma_\epsilon^2\sum_{\substack{j=1\\ 0<\delta(\{\mathbf i_j\},\{\mathbf i_i\})\le Q_n}}^{n^d}1+b\sum_{\substack{j=1\\ \delta(\{\mathbf i_j\},\{\mathbf i_i\})>Q_n}}^{n^d}\alpha_{1,\infty}\big(\delta(\{\mathbf i_i\},\{\mathbf i_j\})\big)\le\sigma_\epsilon^2\sum_{t=1}^{Q_n}t^{d-1}+b\sum_{k=Q_n+1}^{\infty}\sum_{k\le t}\cdots$$
On the one hand, from Assumption 1, we have $\|\beta\|^2<C<\infty$. On the other hand, for $p$ large enough, we have
$$\|\hat\beta\|^2=\Big(\int_0^1\hat\beta^2(t)\,dt-\frac1p\sum_{j=1}^p\hat\beta^2(t_j)\Big)+\frac1p\,\hat b^T\hat b\le M+\frac1p\,\hat b^T\hat b.$$
Set $V=(V_1-\overline V,\ldots,V_{n^d}-\overline V)^T$, where $V_\ell=\int_0^1\beta(t)X_{\mathbf i_\ell}(t)\,dt-\frac1p\sum_{j=1}^p\beta(t_j)X_{\mathbf i_\ell}(t_j)$, $\ell=1,\ldots,n^d$. Then, by definition of $\hat b$, we have
$$\frac1p\,\hat b^T\hat b\le\frac3p\,\beta^T\frac{1}{n^dp}\mathbf X^T\mathbf X\Big(\frac{1}{n^dp}\mathbf X^T\mathbf X+\rho A_m\Big)^{-1}\frac{1}{n^dp}\mathbf X^T\mathbf X\,\beta+\frac{3}{n^dp}\,V^T\mathbf X\Big(\frac{1}{n^dp}\mathbf X^T\mathbf X+\rho A_m\Big)^{-1}\mathbf X^TV+\frac{3}{n^dp}\,\epsilon^T\mathbf X\Big(\frac{1}{n^dp}\mathbf X^T\mathbf X+\rho A_m\Big)^{-1}\mathbf X^T\epsilon.\qquad(17)$$
The first and second terms on the right-hand side of (17) are bounded as in [7] (see p. 57), that is to say,
$$\frac3p\,\beta^T\frac{1}{n^dp}\mathbf X^T\mathbf X\Big(\frac{1}{n^dp}\mathbf X^T\mathbf X+\rho A_m\Big)^{-1}\frac{1}{n^dp}\mathbf X^T\mathbf X\,\beta=O(1),\qquad\frac{3}{n^dp}\,V^T\mathbf X\Big(\frac{1}{n^dp}\mathbf X^T\mathbf X+\rho A_m\Big)^{-1}\mathbf X^TV=O\Big(\frac{p^{-2\kappa}}{\rho}\Big).$$
Set
$$W=\frac{1}{n^dp}\,\mathbf X\Big(\frac{1}{n^dp}\mathbf X^T\mathbf X+\rho A_m\Big)^{-1}\mathbf X^T=BB^T,\quad\text{where}\quad B=\frac{1}{\sqrt{n^dp}}\,\mathbf X\Big(\frac{1}{n^dp}\mathbf X^T\mathbf X+\rho A_m\Big)^{-1/2}.$$
We have
$$N:=\frac{3}{n^dp}\,\epsilon^T\mathbf X\Big(\frac{1}{n^dp}\mathbf X^T\mathbf X+\rho A_m\Big)^{-1}\mathbf X^T\epsilon=\frac{3}{n^d}\sum_{i,j=1}^{n^d}W_{ij}(\epsilon_{\mathbf i_i}-\bar\epsilon)(\epsilon_{\mathbf i_j}-\bar\epsilon)=\frac{3}{n^d}\sum_{i=1}^{n^d}W_{ii}(\epsilon_{\mathbf i_i}-\bar\epsilon)^2+\frac{3}{n^d}\sum_{\substack{i,j=1\\ i\ne j}}^{n^d}W_{ij}(\epsilon_{\mathbf i_i}-\bar\epsilon)(\epsilon_{\mathbf i_j}-\bar\epsilon):=N_1+N_2.$$
Then
$$|N_1|\le\frac{12M_1^2}{n^d}\sum_{i=1}^{n^d}W_{ii}=\frac{12M_1^2}{n^d}\,tr(W)\le\frac{12M_1^2}{n^d}\,tr\Big[(\rho A_m)^{-1}\frac{1}{n^dp}\mathbf X^T\mathbf X\Big]=O\Big(\frac{1}{n^d\rho}\Big)\quad a.s.,$$
and since
$$|W_{ij}|=\Big|\sum_{k=1}^pB_{ik}(B^T)_{kj}\Big|\le\frac12\sum_{k=1}^p\Big\{B_{ik}^2+\big[(B^T)_{kj}\big]^2\Big\}=\frac12\,(W_{ii}+W_{jj}),$$
it follows that
$$|N_2|\le\frac{12M_1^2}{n^d}\sum_{\substack{i,j=1\\ i\ne j}}^{n^d}|W_{ij}|\le12M_1^2\,tr(W)=O\Big(\frac1\rho\Big)\quad a.s.$$
We then obtain the result of Lemma 2.
$\square$

Proof of Theorem 2. We have
$$B:=\mathbb E\Big\{\mathbb E\Big[\big(\hat\beta_0+\langle\hat\beta,X_{\mathbf i_0}\rangle-\beta_0-\langle\beta,X_{\mathbf i_0}\rangle\big)^2\,\Big|\,\hat\beta_0,\hat\beta\Big]\Big\}=\mathbb E\Big\{\mathbb E\Big[\big(\hat\beta_0-\beta_0+\langle\hat\beta-\beta,X_{\mathbf i_0}\rangle\big)^2\,\Big|\,\hat\beta_0,\hat\beta\Big]\Big\}=\mathbb E\Big\{\mathbb E\Big[\big(\langle\beta-\hat\beta,\overline X\rangle+\langle\hat\beta-\beta,X_{\mathbf i_0}\rangle\big)^2\,\Big|\,\hat\beta_0,\hat\beta\Big]\Big\}$$
$$=\mathbb E\Big(\langle\hat\beta-\beta,X_{\mathbf i_0}-\overline X\rangle^2\Big)=\mathbb E\Big(\big(\langle\hat\beta-\beta,X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0})\rangle+\langle\hat\beta-\beta,\mathbb E(X)-\overline X\rangle\big)^2\Big)\le2\Big[\mathbb E\Big(\langle\hat\beta-\beta,X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0})\rangle^2\Big)+\mathbb E\Big(\langle\hat\beta-\beta,\overline X-\mathbb E(X)\rangle^2\Big)\Big]:=B_1+B_2.$$
Since, from Assumption 7 and Lemma 2, we have
$$\big|\langle\hat\beta-\beta,X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0})\rangle\big|\le\|\hat\beta-\beta\|\,\|X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0})\|=O(1/\sqrt\rho)\quad a.s.$$
and $\mathbb E\big(\langle X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0}),\zeta_j\rangle^2\big)\le M_2\,\langle\Gamma\zeta_j,\zeta_j\rangle$, it follows from Lemma 2.1 i) in [22], Assumption 7 and Lemma 2 that
$$B_1:=2\,\mathbb E\Big(\big\langle\langle\hat\beta-\beta,X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0})\rangle\,(X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0})),\,\hat\beta-\beta\big\rangle\Big)=2\sum_{j\ge1}\mathbb E\Big(\langle\hat\beta-\beta,X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0})\rangle\,\langle X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0}),\zeta_j\rangle\,\langle\hat\beta-\beta,\zeta_j\rangle\Big)$$
$$\le2\sum_{j\ge1}\big\|\langle\hat\beta-\beta,X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0})\rangle\,\langle X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0}),\zeta_j\rangle\big\|\,\big\|\langle\hat\beta-\beta,\zeta_j\rangle\big\|\,\big[\alpha_{1,\infty}\big(\delta(\{\mathbf i_0\},\{\mathbf i_1,\ldots,\mathbf i_{n^d}\})\big)\big]^{1/2}+2\sum_{j\ge1}\Big|\mathbb E\Big[\langle\hat\beta-\beta,X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0})\rangle\,\langle X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0}),\zeta_j\rangle\Big]\Big|\,\Big|\mathbb E\Big[\langle\hat\beta-\beta,\zeta_j\rangle\Big]\Big|$$
$$\le\frac{C}{\rho}\sum_{j\ge1}\big(\langle\Gamma\zeta_j,\zeta_j\rangle\big)^{1/2}\big[\alpha_{1,\infty}\big(\delta(\{\mathbf i_0\},\{\mathbf i_1,\ldots,\mathbf i_{n^d}\})\big)\big]^{1/2}+2\sum_{j\ge1}\Big|\mathbb E\Big[\langle\hat\beta-\beta,X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0})\rangle\,\langle X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0}),\zeta_j\rangle\Big]\Big|\,\Big|\mathbb E\Big[\langle\hat\beta-\beta,\zeta_j\rangle\Big]\Big|,$$
where $C$ is a positive constant. However, we have from Lemma 2.1
i) in [22] that
$$\Lambda:=\sum_{j\ge1}\Big|\mathbb E\Big[\langle\hat\beta-\beta,X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0})\rangle\,\langle X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0}),\zeta_j\rangle\Big]\Big|\,\Big|\mathbb E\Big[\langle\hat\beta-\beta,\zeta_j\rangle\Big]\Big|$$
$$\le\sum_{j\ge1}\sum_{\ell\ge1}\Big\{\Big|\mathbb E\Big[\langle\hat\beta-\beta,\zeta_\ell\rangle\,\langle X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0}),\zeta_\ell\rangle\,\langle X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0}),\zeta_j\rangle\Big]\Big|\,\Big|\mathbb E\Big[\langle\hat\beta-\beta,\zeta_j\rangle\Big]\Big|\Big\}$$
$$\le\sum_{j\ge1}\sum_{\ell\ge1}\Big\{\big\|\langle\hat\beta-\beta,\zeta_\ell\rangle\big\|\,\big\|\langle X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0}),\zeta_\ell\rangle\,\langle X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0}),\zeta_j\rangle\big\|\,\Big|\mathbb E\Big[\langle\hat\beta-\beta,\zeta_j\rangle\Big]\Big|\Big\}\,\big[\alpha_{1,\infty}\big(\delta(\{\mathbf i_0\},\{\mathbf i_1,\ldots,\mathbf i_{n^d}\})\big)\big]^{1/2}$$
$$+\sum_{j\ge1}\sum_{\ell\ge1}\Big\{\Big|\mathbb E\Big[\langle\hat\beta-\beta,\zeta_\ell\rangle\Big]\Big|\,\Big|\mathbb E\Big[\langle X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0}),\zeta_\ell\rangle\,\langle X_{\mathbf i_0}-\mathbb E(X_{\mathbf i_0}),\zeta_j\rangle\Big]\Big|\,\Big|\mathbb E\Big[\langle\hat\beta-\beta,\zeta_j\rangle\Big]\Big|\Big\}$$