A Small-Uniform Statistic for the Inference of Functional Linear Regressions
AA Small-Uniform Statistic for the Inference of FunctionalLinear Regressions
Raymond C. W. Leung and Yu-Man TamFebruary 21, 2021
Abstract
We propose a “small-uniform” statistic for the inference of the functional PCAestimator in a functional linear regression model. The literature has shown two extremebehaviors: on the one hand, the FPCA estimator does not converge in distribution inits norm topology; but on the other hand, the FPCA estimator does have a pointwiseasymptotic normal distribution. Our statistic takes a middle ground between these twoextremes: after a suitable rate normalization, our small-uniform statistic is constructedas the maximizer of a fractional programming problem of the FPCA estimator overa finite-dimensional subspace, and whose dimensions will grow with sample size. Weshow the rate for which our scalar statistic converges in probability to the supremum ofa Gaussian process. The small-uniform statistic has applications in hypothesis testing.Simulations show our statistic has comparable to slightly better power properties forhypothesis testing than the two statistics of Cardot, Ferraty, Mas and Sarda (2003).
Keywords and phrases:
Empirical process, functional data analysis, functional linearmodel, functional principal components estimator, Gaussian processes, hypothesis testing,supremum. 1 a r X i v : . [ m a t h . S T ] F e b he functional linear model (FLM) and its associated functional principal componentsestimator (FPCA estimator) are now staples in the statistics literature. However, whilemuch is known about the FPCA’s mean squared error convergence and consistency prop-erties, much less is known about its asymptotic distributional properties. In particular,although there are hypothesis testing procedures on the FLM, the literature has few hypoth-esis testing procedures of the FLM that are explicitly based on the FPCA slope estimate.This dearth of hypothesis testing procedures based on the estimator of the model is instark contrast to its finite-dimensional counterpart; for instance, ordinary least squares isboth an estimator of the slope and also the input of the t -tests, F -tests and many othersin tests of the finite-dimensional linear model.This paper has two main objectives. Firstly, we introduce a small-uniform statistic that is constructed out of a normalized fractional programming problem of the FPCAestimator. Theorem 2.1 is the main result of this paper and shows our small-uniformstatistic converges in probability to a supremum of a Gaussian process. This result is thebasis for a hypothesis testing procedure that explicitly depends on the FPCA estimator.Secondly, we show in numerical simulations the hypothesis testing procedure based off ofour small-uniform statistic has comparable to slightly better power properties than the twostatistics proposed in Cardot et al. (2003).The key references of our paper are Cardot et al. (2007, 2003) and Chernozhukov et al.(2014). In particular, Cardot et al. (2003) and Hilgert et al. (2013) are one of the first fewstudies for conducting hypothesis testing on the FLM. However, as far as we understand,none of these studies base their hypothesis testing procedure on the FPCA estimator. Re-cently, Cuesta-Albertos et al. (2019) has proposed an interesting goodness-of-fit test of theFLM based on random projections, and a step in its testing procedure does indeed dependon the FPCA estimator. Roughly speaking, the testing procedure of Cuesta-Albertos et al.(2019) is dependent on a single randomly drawn vector (i.e. a “direction”) of the functionalregressors’ underlying Hilbert space. To smooth out the uncertainty in just drawing a sin-gle direction, the authors recommend drawing multiple directions to thus conduct severalhypothesis tests, and the final inference step is concluded by a multiple hypothesis testingcorrection (see their Algorithms 4.1 and 4.2). In contrast and intuitively, our small-uniformstatistic considers finitely many (but that number increases with the sample size) of thesedirections, and then look for the “largest” direction. Thus our small-uniform statistic isa single scalar and does not require multiple hypothesis testing corrections. Ramsay andSilverman (2005) is the well-known seminal survey of the functional data analysis (FDA)literature. Cardot and Sarda (2011), Horv´ath and Kokoszka (2012), Hsing and Eubank(2015), Goia and Vieu (2016) and Wang et al. (2016) are some recent surveys on theadvancements of the FDA literature.Section 1 fixes notations for the FLM and reviews the two extreme asymptotic behaviorof the FPCA estimator as documented by Cardot et al. (2007). Section 2 introduces oursmall-uniform statistic. Section 3 outlines the hypothesis testing procedure based off ofour small-uniform statistic, and Section 4 shows some simulated numerical results. We2onclude in Section 5. The proofs are technical in nature and thus we gather them in theSupplementary Materials Leung and Tam (2021). Let’s begin with the standard functional linear model . Throughout this paper, we will fixa sufficiently rich probability space (Ω , F , P ) that accommodates all the random quantitiesin this paper. Let H be an arbitrary real separable infinite dimensional Hilbert spaceequipped with an inner product (cid:104)· , ·(cid:105) and denote its norm as || · || . Let, Y = (cid:104) ρ, X (cid:105) + ε, (1)where Y is a real valued scalar dependent variable, X is H -valued random element, and ρ is an H -valued coefficient vector. Moreover, ε is a scalar error term such that E [ ε | X ] = 0and E [ ε | X ] = σ ε . We are interested in the estimation and subsequent inference of thecoefficient vector ρ .Let’s define the usual covariance and cross-covariance operators. For any x , x ∈ H ,we denote their tensor product as x ⊗ x ( h ) := (cid:104) x , h (cid:105) x for all h ∈ H . We denote the covariance operator of X as Γ : H → H ,Γ h := E [ X ⊗ X ( h )] , h ∈ H (2)and define the cross-covariance operator of X and Y as ∆ : H → R ,∆ h := E [ X ⊗ Y ( h )] , h ∈ H (3)We denote { λ j } j ∈ Z + as the sequence of sorted non-null distinct eigenvalues of Γ, λ >λ > . . . >
0, and { e j } j ∈ Z + a sequence of orthonormal eigenvectors associated with thoseeigenvalues. We assume the multiplicity of each λ j is one. From (1) we have normalequation, ∆ = Γ ρ. (4)For the H -valued random element X , there is the well-known Karhunen-Lo`eve expan-sion of X and is given by, X = ∞ (cid:88) l =1 (cid:112) λ l ξ l e l , (5)where ξ l ’s are centered real random variables such that E [ ξ l ξ l (cid:48) ] = 1 if l = l (cid:48) and 0 otherwise. This section will revisit some of the key definitions and setup from Cardot et al. (2007).Suppose we have have n independent and identically distributed observations { ( Y i , X i ) } ni =1
3f (1). We construct the empirical counterparts of Γ and ∆ as,Γ n := 1 n n (cid:88) i =1 X i ⊗ X i , (6a)∆ n := 1 n n (cid:88) i =1 X i ⊗ Y i , (6b) U n := 1 n n (cid:88) i =1 X i ⊗ ε i . (6c)Then from (1), we get the empirical normal equation∆ n = Γ n ρ + U n . (7)We denote the j th empirical eigenelement of Γ n as (ˆ λ j , ˆ e j ).As is well known in the FLM literature, we will need some sort of regularization methodto define an “approximate inverse” to Γ n . We will again follow the setup of Cardot et al.(2007) and Bosq (2012) and define the sequence δ j ’s, j = 1 , , . . . of the smallest differencebetween distinct eigenvalues of Γ as, δ := λ − λ , (8a) δ j := min { λ j − λ j +1 , λ j − − λ j } . (8b)Now take { c n } n ∈ N a sequence of strictly positive numbers tending to zero such that c n < λ and set, k n := sup { p : λ p + δ p / ≥ c n } . (9)This k n will be our truncation parameter ; note when n → ∞ we have c n →
0, which thenimplies k n ↑ ∞ .Let’s gather the assumptions of our paper here. Unless noted otherwise, we will enforcethese assumptions throughout the paper’s results and proofs. Assumption 1 (Identifiability) . (i) (cid:80) ∞ j =1 (cid:104) E [ XY ] ,e j (cid:105) λ j < ∞ ; and(ii) ker Γ = { } . Assumption 2 (Tail behavior) . (i) (cid:80) ∞ l =1 | (cid:104) ρ, e l (cid:105) | < ∞ ;(ii) There exists some finite M such that sup l E [ ξ l ] ≤ M < ∞ ; and iii) There exists a convex positive function λ such that for j sufficiently large, λ j = λ ( j ) . Assumption 3 (Approximate reciprocal) . (i) f n is decreasing on [ c n , λ + δ ] ;(ii) lim n →∞ sup x ≥ c n | xf n ( x ) − | = 0 ;(iii) f (cid:48) n ( x ) exists for x ∈ [ c n , ∞ ) ; and(iv) sup s ≥ c n | sf n ( s ) − | = o (cid:16) √ n (cid:17) . Assumption 4 (Roughening the standard deviation) . There exists a sequence of positivenumbers { a n } such that a n → and a n √ k n log k n → , as n → ∞ ; and Assumption 5 (Empirical eigenvector approximations) . Assume k n is such that λ kn − λ kn +1 = O ( n / ) . Assumption 1 is a basic identifiability condition in a functional linear model and theseconditions are discussed in detail in Cardot et al. (1999) and Cardot et al. (2003). Assump-tion 2 corresponds to Assumption A of Cardot et al. (2003) which are basic conditions thatensure the statistical problem is correctly posed. For our purposes, however, we replaceCardot et al. (2007)’s finite fourth moment assumption on the ξ l ’s with a stronger finitesixth moment assumption. Assumption 3 corresponds to Assumption F of Cardot et al.(2003) which effectively says the sequence of functions { f n } should behave like f n ( x ) ≈ /x when n is sufficiently large. Assumption 4 is new: it says { a n } is a regularization that tendsto zero, and more importantly, tends to zero faster than the reciprocal of the eigenvaluestending to infinity. Assumption 5 will be used to ensure the empirical eigenvectors of theempirical covariance operator uniformly converges in probability to the population eigen-vectors of population covariance operator.At this point, we will need to use the resolvent formalism to define an object Γ † n whichwill serve as our “approximate empirical inverse” to Γ n . For the purpose of exposition,we delegate the definition and details of this object to the supplementary materials. Toconstruct Γ † n , we will need a sequence of positive functions { f n } n ∈ N with support on [ c n , ∞ )that satisfy Assumption 3. Intuitively, the functions f n have the behavior of f n ( x ) ≈ /x when n is sufficiently large. By Riesz functional calculus , we can define the followingquantity (see supplementary materials (Leung and Tam, 2021, (29)) for details),Γ † n := f n (Γ n ) . (10)In particular, Γ † n will serve as the approximate inverse of Γ n . We will also let ˆΠ k n denote theprojection operator from H onto span { ˆ e , . . . , ˆ e k n } , which is subspace of all possible linearcombinations of the first k n empirical eigenvectors (equation (29) in the supplementarymaterials Leung and Tam (2021) will define ˆΠ k n precisely via Riesz functional calculus).5inally, a natural estimator of ρ from n iid observations based on (4) and (7) is the functional principal components (FPCA) estimator ,ˆ ρ := Γ † n ∆ n . (11)Cardot et al. (1999) shows this estimator is consistent for the choice of f n ( x ) ≡ /x . The motivation of our paper starts from two key insights from Cardot et al. (2007). Theirfirst key result (also more recently (Crambes and Mas, 2013, Theorem 8)) is that the FPCAestimator (11) cannot converge in distribution to a non-degenerate random element in thenorm topology of H . Theorem (Cardot et al. (2007), Theorem 1) . It is impossible for ˆ ρ − ρ to converge indistribution to a non-degenerate random element in the norm topology of H . This impossibility result suggests that we may not directly use the FPCA estimatorfor the purpose of inference in the norm topology of H . In contrast, uniform predictionintervals can still be constructed (see concluding remarks of Cardot et al. (2007) and(Crambes and Mas, 2013, Corollaries 10 and 11)).Their second result (see also more recently (Crambes and Mas, 2013, Theorem 9))shows the following pointwise weak convergence result. Theorem (Cardot et al. (2007), Theorem 3) . Fix any x ∈ H . Then under the sameAssumptions 2 to 3 of our paper, and under additional regularity conditions (see theirpaper for details), √ n || Γ / Γ † x || σ ε ( (cid:104) ˆ ρ, x (cid:105) − (cid:104) ˆΠ k n ρ, x (cid:105) ) (cid:32) N (0 , . For the sake of exposition, we will defer the precise definition of Γ † to the supplementarymaterials (see (Leung and Tam, 2021, (23))), but we can intuitively think of this quantity asan “approximate inverse” of the population covariance operator Γ. This result is extremelyuseful for constructing prediction intervals when we evaluate at x = X n +1 . However, therather arbitrary choice of x ∈ H renders this result impractical when the researcher isconcerned with the statistical inference.The main contribution of this paper can be thought of as “something in between”Theorem 1 and Theorem 3 of Cardot et al. (2007). This paper focuses on the study on ascalar “partial” supremum statistic W n to be defined in (14). For the sake of heuristics inthis section, we will slightly blur the distinction between the empirical eigenelements andthe population eigenelements (see Remark 2.3 for the validity of this justification). Let’smake three observations. 6he first observation is that there is no need to consider points x in all of H in(Cardot et al., 2007, Theorem 3). Provided x (cid:54) = 0, we can multiply and divide by (cid:15) || x || ,where (cid:15) ∈ (0 , √ nσ ε (cid:104) ˆ ρ − ˆΠ k n ρ, x (cid:105)|| Γ / Γ † x || = √ nσ ε (cid:104) ˆ ρ − ˆΠ k n ρ, (cid:15)x || x || (cid:105) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Γ / Γ † (cid:15)x || x || (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (12)Of course, || (cid:15)x || x || || = (cid:15) ∈ (0 , H , we can immedi-ately confine to those points in ball H := { h ∈ H : || h || ≤ } .Secondly, we can say a lot more about (Cardot et al., 2007, Theorem 3) by restrictingball H further. The main idea is to consider not all points in ball H , but consider a“small but growing” linear subspace of it. In the numerator of (12), since ˆ ρ − ˆΠ k n ρ ∈ span { ˆ e , . . . , ˆ e k n } , by the idempotent property of the projection operator, it follows for any x ∈ ball H (or in H ) we have (cid:104) ˆ ρ − ˆΠ k n ρ, x (cid:105) = (cid:104) ˆΠ k n ( ˆ ρ − ˆΠ k n ρ ) , x (cid:105) = (cid:104) ˆ ρ − ˆΠ k n ρ, ˆΠ k n x (cid:105) .Thus only points in span { ˆ e , . . . , ˆ e k n } ≈ span { e , . . . , e k n } determine the numerator of (12).Next let’s consider the denominator of (12). By the spectral decompositions of Γ / andΓ † , Γ / Γ † = ∞ (cid:88) j =1 k n (cid:88) l =1 (cid:112) λ j f n ( λ l ) P j P l = k n (cid:88) j =1 (cid:112) λ j f n ( λ j ) P j where P j is the projection of H onto the j th eigenspace ker (Γ − λ j ). More explicitly, sincethese orthogonal projections partition H , we can write any x ∈ ball H as x = (cid:80) ∞ j =1 P j x .And since ker (Γ − λ j ) ⊥ ker (Γ − λ j (cid:48) ) for j (cid:54) = j (cid:48) , this implies,Γ / Γ † x = k n (cid:88) j =1 (cid:112) λ j f n ( λ j ) ∞ (cid:88) l =1 P j P l x = k n (cid:88) j =1 (cid:112) λ j f n ( λ j ) P j x In other words, picking any x ∈ ball H with x = (cid:80) ∞ j =1 P j x versus picking h ∈ ball H with h = (cid:80) k n j =1 P j h results in the same value, Γ / Γ † x = Γ / Γ † h . And since H (and henceball H ) is assumed to be separable, we can simply assume that such h takes the form h = (cid:80) k n j =1 b j e j with (cid:80) k n j =1 b j ≤
1. In all, we argue it suffices to evaluate || Γ / Γ † · || on thefinite-dimensional domain ball H ∩ span { e , . . . , e k n } instead of on the infinite-dimensionaldomain ball H .Thirdly, instead of considering σ ε || Γ / Γ † h || as the asymptotic standard deviation of (cid:104) ˆ ρ − ˆΠ k n ρ, h (cid:105) , let’s use a slightly roughened version and define t n ( h ) := || Γ / Γ † h || + a n = (cid:118)(cid:117)(cid:117)(cid:116) k n (cid:88) j =1 λ j [ f n ( λ j )] (cid:104) h, e j (cid:105) + a n (13)where we let { a n } be a sequence of nonnegative numbers tending to zero. Note and recallthat t n depends on n not just through a n but also through Γ † which depends on k n .7ssumption 4 ensures this roughening sequence tends to zero at a rate slower than the ratefor which the sequence of eigenvalues tend to zero. Finally, let’s put our above observations together. In search for a single scalar statistic, itseems reasonable to look for the largest value of (12) over the finite-dimensional domainball H ∩ span { e , . . . , e k n } . We thus have the following definition. Definition 2.1 (Small-uniform statistic) . Let ˆ ρ be the FPCA estimator (11) of the func-tional linear model (1) and let { β n } be a sequence of positive numbers with β n → ∞ as n → ∞ . Define W n := √ nσ ε β n sup h ∈J n (cid:104) ˆ ρ − ˆΠ k n ρ, h (cid:105) t n ( h ) J n := ball H ∩ span { e , . . . , e k n } (14)where t n is defined in (13). We call W n the small-uniform statistic of the functional linearmodel.The real-valued scalar statistic W n is “small” because we only consider a low andfinite-dimensional linear subspace J n of H , even though as n becomes large this subspaceapproaches ball H . It is “uniform” because we look for the largest value over this linearsubspace J n .Recall again (Cardot et al., 2007, Theorem 3) already shows the pointwise asymptoticnormality result of the FPCA estimator. Thus under some regularity conditions and aproper rate normalization, one can expect W n to distribute like the supremum of a Gaussianprocess indexed by J n . Indeed our main result Theorem 2.1 shows precisely the rate ofconvergence under which W n and a certain Gaussian process converge to each other inprobability, and hence also in distribution. Note by linearity in h in the numerator of (14)and as the denominator t n is strictly positive, the statistic W n is almost surely nonnegativevalued. Remark . The normalization 1 /β n in (14) might seem curious.The normalization by √ n is standard, and is well expected by the pointwise asymptoticnormality result of (Cardot et al., 2007, Theorem 3). The normalization by 1 /β n is neces-sary because we need this rate to ensure some “nuisance terms” in W n converge fast enoughto zero. See the proof outline of our main result Theorem 2.1 for further explanations.In addition, our statistic is a fractional programming problem (see Stancu-Minasian(2012) for a survey). So while σ ε || Γ / Γ † h || is indeed the pointwise asymptotic standarddeviation for √ n (cid:104) ˆ ρ − ˆΠ k n ρ, h (cid:105) , we clearly see this standard deviation evaluates to zero at h = 0. Using a roughened version t n ( h ) of the standard deviation ensures the denominatorof our statistic is strictly positive. 8 emark . The optimization problem in W n is well-defined. The objectivefunction is clearly continuous in H , especially since by construction t n >
0. Moreoverwe’re optimizing over J n , which is a compact set , and so the extreme value theoremapplies. Remark W n ) . As (14) is written, it is an empiricallyinfeasible quantity for several reasons. Let’s argue why putting in empirically feasibleplug-in estimates will asymptotically do no harm to our results.(a) (
Replacing the truncation parameter ) The truncation parameter k n as defined in (9)depends on the unobservable population eigenvalues λ j ’s. The natural substitute isthe empirical truncationˆ k n := max { p = 1 , . . . , n : ˆ λ p + ˆ δ p / ≥ c n } , (15)where ˆ δ j is as analogously defined to its population counterpart in (8) but with theempirical eigenvalues. Thanks to Assumption 2(ii) and (Bosq, 2012, § j ≥ | ˆ λ j − λ j | → k n or population truncation k n are equivalent inprobability.(b) ( Replacing the optimization domain ) The optimization domain as defined in (14) isover the unobservable population eigenvectors e j ’s. The natural empirically feasibleapproach is to optimize instead over the empirical eigenvectors ˆ e j ’s. By Assumption 5,(Bosq, 2012, § E (cid:104) sup ≤ j ≤ k n || ˆ e j − e (cid:48) j || (cid:105) → n → ∞ , andwhere we have denoted e (cid:48) j := sign( (cid:104) ˆ e j , e j (cid:105) ) e j and where sign( t ) = 1 if t >
0, = 0 if t = 0,and = − t <
0. This implies optimizing over ˆ J n := ball H ∩ span { ˆ e , . . . , ˆ e k n } andball H ∩ span { e (cid:48) , . . . , e (cid:48) k n } are asymptotically equivalent in probability. By fixing the“orientations” sign( (cid:104) ˆ e j , e j (cid:105) )’s, we can identify optimizing ball H ∩ span { e (cid:48) , . . . , e (cid:48) k n } with optimizing over J n .(c) ( Replacing the asymptotic standard deviation ) The asymptotic standard deviation t n as defined in (13) depends on the unobservable population eigenvalues λ j ’s and eigen-vectors e j ’s. An empirically feasible version of t n is its natural plug-in estimator,ˆ t n ( h ) = (cid:118)(cid:117)(cid:117)(cid:117)(cid:116) ˆ k n (cid:88) j =1 ˆ λ j [ f n (ˆ λ j )] (cid:104) h, ˆ e j (cid:105) + a n , h ∈ ˆ J n (16)By using the arguments in (a) and (b) above, it is not difficult to see that t n and ˆ t n are asymptotically the same in probability. See also (Cardot et al., 2007, Corollary 2). Clearly ball H is bounded, and a finite-dimensional subspace in an infinite-dimensional Hilbert spaceis closed (in the relative topology). Thus the Heine-Borel theorem applies and so J n is compact. Consistent estimate of noise error ) It is clear the standard deviation of the error term σ ε can be replaced by any consistent estimator ˆ σ ε P → σ ε .Except for Sections 3 and 4 where we discuss numerical simulations, the rest of thissection and the proofs will use W n as defined by (14). This is the paper’s main result. The proof outline sketches the two key steps to provingour result. We delegate all the proof details to the supplementary materials Leung andTam (2021). For an arbitrary set T , we denote (cid:96) ∞ ( T ) as the space of all bounded functionsfrom T to R with the uniform norm || f || T := sup t ∈ T | f ( t ) | . Theorem 2.1 (Gaussian suprema approximation of the small-uniform statistic) . AssumeAssumptions 1 to 4 hold and assume k n /n → as n → ∞ . Then for sufficiently large n , there exists a mean-zero Gaussian process { G P,n ( h ) } h ∈J n in (cid:96) ∞ ( J n ) with covariancefunction, E [ G P,n ( h ) G P,n ( h )] = (cid:10) Γ / Γ † h , Γ / Γ † h (cid:11) ( || Γ / Γ † h || + a n ) ( || Γ / Γ † h || + a n ) , (17) for all h , h ∈ J n .Moreover, if we define the random variables (cid:101) Z n := sup h ∈J n G P,n h and (cid:102) W n := (cid:101) Z n β n , then the small-uniform statistic W n of (14) and the random variable (cid:102) W n are close togetherin probability at the rate | W n − (cid:102) W n | = O P (cid:32) k / n (log k n ) / β n + k / n (log k n ) / (log n ) n / (cid:33) . (18) In particular if k / n (log k n ) / (log n )min { β n ,n / } → , then | W n − (cid:102) W n | P → . Proof outline.
For each h ∈ J n we have the important decomposition, √ nσ ε t n ( h ) (cid:104) ˆ ρ − ˆΠ k n ρ, h (cid:105) = √ nσ ε t n ( h ) (cid:104)T n + S n + Y n + R n , h (cid:105) . (19)10here T n := (Γ † n Γ n − Π k n ) ρ, (20a) S n := (Γ † n − Γ † ) U n , (20b) Y n := (Π k n − ˆΠ k n ) ρ, (20c) R n := Γ † U n . (20d)For the sake of exposition, we defer the precise functional calculus definitions of thebounded operators Γ † n , Γ n , Γ † and Π k n to the supplementary materials.Then by triangle inequality, we have (cid:12)(cid:12)(cid:12)(cid:12) sup h ∈J n √ nσ ε t n ( h ) (cid:104) ˆ ρ − ˆΠ k n ρ, h (cid:105) − (cid:101) Z n (cid:12)(cid:12)(cid:12)(cid:12) ≤ sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12) √ nσ ε t n ( h ) (cid:104)T n + S n + Y n , h (cid:105) (cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12) sup h ∈J n √ nσ ε t n ( h ) (cid:104)R n , h (cid:105) − (cid:101) Z n (cid:12)(cid:12)(cid:12)(cid:12) . The two major steps in the proof are showing the following results for sufficiently large n : Step I:
Asymptotic bias termssup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12) √ nσ ε t n ( h ) (cid:104)T n + S n + Y n , h (cid:105) (cid:12)(cid:12)(cid:12)(cid:12) = O P (cid:16) k / n (log k n ) / (cid:17) . (I) Step II:
Asymptotic distribution term (cid:12)(cid:12)(cid:12)(cid:12) √ nσ ε t n ( h ) sup h ∈J n (cid:104)R n , h (cid:105) − (cid:101) Z n (cid:12)(cid:12)(cid:12)(cid:12) = O P (cid:32) k / n (log k n ) / (log n ) n / (cid:33) . (II)Step (I) uses many proof arguments from Cardot et al. (2007) but we take extra carein keeping track of the rates of various bounds. Proposition A.7 of the supplementarymaterials concludes the discussions of Step (I). By our underlying real valued Hilbertspace structure, we can apply Riesz’s representation theorem to uniquely identify H withits dual H ∗ . Thus we can view the indexing of the supremum of R n by J n in Step (II) asequivalent to indexing by its dual J ∗ n , which allows us to apply the tools from empiricalprocess theory. Our desired result for Step (II) is the contents of Proposition A.10 in thesupplementary materials, which is an application of Chernozhukov et al. (2014).Once Steps (I) and (II) hold, the statistic W n of (14) and the displayed random vari-able W n are, respectively, exactly the quantities sup h ∈J n √ nσ ε t n ( h ) (cid:104) ˆ ρ − ˆΠ k n ρ, h (cid:105) and (cid:101) Z n bothnormalized by 1 /β n to achieve the rate (18).11s discussed earlier, our result is a middle ground between the non-convergence (innorm topology) and the pointwise asymptotic normality of the FPCA estimator. The keycontribution of our result is to further understand the asymptotic distributional propertiesof the FPCA estimator. Up to our knowledge, only a few select studies (most notablyCardot et al. (2007)) have studied this problem from the perspective of inference. Thereare more, but still few, studies of the asymptotic distributional properties of the FPCAestimator for the purpose of prediction; for instance, see Yao et al. (2005) and Crambesand Mas (2013), among others. Remark . Note the right hand side of Step (I)is not normalized by √ n . In other words, with just a scaling of √ n on sup h ∈J n (cid:104) ˆ ρ − ˆΠ kn ρ,h (cid:105) σ ε t n ( h ) ,its asymptotic bias terms do not converge to zero. In contrast to Cardot et al. (2007),here we do not benefit from the extra smoothing in a prediction problem (cid:104) ˆ ρ − ˆΠ k n ρ, X n +1 (cid:105) ,where they show normalizing by just √ n is sufficient to ensure the asymptotic bias termswill vanish; see also Cai et al. (2006).The sufficient condition k / n (log k n ) / (log n )min { β n ,n / } → W n to converge in probability to (cid:102) W n effectively depends on the speed for which the truncation parameter k n of (9) tends toinfinity. The speed of k n in turn depends on both the speed the eigenvalues λ j ’s tend tozero, and the speed the regularization c n tends to zero. An important application of the small-uniform statistic is hypothesis testing. Cardot et al.(2003) introduces two statistics (their D n and T n ; see Section 4.2 later) based on the normof the cross-covariance operator ∆ n to test the hypothesis H : ρ = ρ versus H : ρ (cid:54) = ρ (e.g. we can say take ρ as the zero functional). However, while their statistics test therelationship ρ = ρ versus ρ (cid:54) = ρ , it does not use an estimate of ρ to form this test. Theprocedure in Cardot et al. (2003) is, in some sense, an analysis of variance approach totesting significance. In contrast, our hypothesis testing approach here directly uses theFPCA estimator of ρ via the small-uniform statistic W n .Let’s summarize and outline a practical recipe to applying our main result for thepurpose of hypothesis testing.1. Fix a statistical significance level α ∈ (0 ,
1) and form the hypothesis H : ρ = 0, H : ρ (cid:54) = 0. Loosely speaking, the procedure of Cardot et al. (2003) has the following counterpart in the finite-dimensional linear model. Let y i = x (cid:62) i β + (cid:15) i be the usual linear model in finite dimensions. Suppose wehave the hypothesis β = 0. Under this null, it necessarily implies E [ y i x i ] = E [( x (cid:62) i (cid:15) i ) x i ] = E [ (cid:15) i x i ] = 0.Thus, a test of the hypothesis β = 0 is to test for zero correlation between y i and x i — such “correlationtest” does not require an estimate of β . If the hypothesis were instead H : ρ = ρ , H : ρ (cid:54) = ρ for ρ is non-zero, we consider Y (cid:48) :=
12. Perform functional PCA on the empirical covariance operator Γ n and collect theempirical eigenelements (ˆ λ j , ˆ e j )’s.3. Fix regularization parameters:(a) Based on ˆ λ and ˆ δ , pick a sequence { c n } of positive numbers and a sequence offunctions { f n } that satisfy Assumption 3.(b) Compute the empirical truncation parameter ˆ k n of (15).(c) Pick a sequence of positive numbers { a n } that satisfy Assumption 4.(d) Pick a sequence of positive numbers { β n } that satisfy ˆ k / n (log ˆ k n ) / (log n )min { β n ,n / } → ρ of (11).5. Pick a consistent estimator of the error standard deviation ˆ σ ε .6. Construct the small-uniform statistic: Numerically solve the fractional programmingproblem, W n = √ n ˆ σ ε β n sup h ∈ ˆ J n (cid:104) ˆ ρ, h (cid:105) ˆ t n ( h ) = √ n ˆ σ ε β n sup b ∈ R ˆ kn || b ||≤ (cid:80) ˆ k n j =1 b j (cid:104) ˆ ρ, ˆ e j (cid:105) (cid:18)(cid:113)(cid:80) ˆ k n j =1 ˆ λ j [ f n (ˆ λ j )] b j + a n (cid:19) . (21)7. Simulate the asymptotic distribution:(a) Simulate a mean zero Gaussian process G P,n with covariance function (17),replacing all population quantities their empirical or estimated counterparts.(b) Take the maximum value of this Gaussian process’s sample path.(c) Repeat (a) and (b) many times to get a simulated distribution of the scalarrandom variable (cid:103) W n .(d) Compute the quantile q − α ; that is, P ( (cid:102) W n ≤ q − α ) = 1 − α .8. Inference: Reject H if W n > q − α ; otherwise, accept it. Remark . It is evident there is no closed form analytical solutionto the optimization problem (21). However, some numerical optimizers can greatly benefitfrom inputting a known gradient and the Hessian of the objective function. In particular,for h = (cid:80) k n j =1 b j ˆ e j with b = ( b , . . . , b k n ), let L ( b ) := (cid:104) ˆ ρ − ˆΠ k n ρ, h (cid:105) t n ( h ) = (cid:80) k n j =1 b j θ j (cid:113)(cid:80) k n j =1 b j ψ j + a n =: f ( b ) g ( b ) =: f ( b ) (cid:112) p ( b ) + a n Y − (cid:104) X, ˆΠ k n ρ (cid:105) . Then the procedure is exactly as follows but we replace the cross-covariance operator ∆of ( Y, X ) with the cross-covariance operator ∆ (cid:48) of ( Y (cid:48) , X ). θ j := (cid:104) ˆ ρ − ˆΠ k n ρ, ˆ e j (cid:105) and ψ j := (cid:113) ˆ λ j f n (ˆ λ j ). Direct calculations show the l -thelement of the gradient vector is, ∂L∂b l = 1 g (cid:18) θ l − b l ψ l fg √ p (cid:19) and the ( l (cid:48) , l )-th component of the Hessian is, ∂L∂b l (cid:48) ∂b l = 1 g (cid:20) − b l ψ l g ( θ l (cid:48) p − b l (cid:48) ψ l (cid:48) f ) − b l (cid:48) ψ l (cid:48) f √ pgp / − b l (cid:48) ψ l (cid:48) √ p (cid:18) θ l − b l ψ l fg √ p (cid:19)(cid:21) Remark . As stated, (21) is an optimization problem with anonlinear objective function with a norm inequality constraint. The norm inequality con-straint is a nonlinear constraint. However, many local and global numerical optimizersare designed to accommodate only box constraints. By using spherical coordinates, wecan replace the single norm inequality constraint with just k n number of box constraints.Specifically, pick r ∈ [0 , φ , . . . , φ k n − ∈ [0 , π ] and φ k n − ∈ [0 , π ). We can change fromspherical to Euclidean coordinates via the well-known equations: b = r cos( φ ) ,b = r sin( φ ) cos( φ ) , ... b k n − = r sin( φ ) · · · sin( φ k n − ) cos( φ k n − ) ,b k n = r sin( φ ) · · · sin( φ k n − ) sin( φ k n − ) . Let’s illustrate the small sample properties of our hypothesis testing procedure from Sec-tion 3 with numerical simulations. We focus on the Hilbert space H = L ([0 , , B , λ ) =: L ([0 , B are the usual Borel sets in [0 ,
1] and λ is the Lebesgue measure in [0 , X is a standard Brownian mo-tion on [0 , f n ( x ) = 1 /x when x ≥ c n and 0 otherwise; and (ii) “ridge” regularization where f n ( x ) = 1 / ( x + α n ) if x ≥ c n and 0 otherwise. As shown in Example 2 of Cardot et al.(2007), we require α n √ n/c n → It is well known (for instance, see Example 4.6.3 of Hsing and Eubank (2015)) the eigenele-ments of the covariance operator of Brownian motion are, λ j = 4((2 j − π ) and e j ( t ) = √ (cid:18) (2 j − π t (cid:19) . λ j = O ( j ) ≤ O ( j log j ). In particular we have δ j = λ j − λ j +1 for j ≥
2, and so λ j + δ j = j +8 j +1) π (2 j +1) (2 j − (cid:46) O ( j ). Recalling (9) and { c n } is a sequence tending to zero, the above implies the upper bound k n (cid:46) O (cid:16) √ c n (cid:17) .With respect to the required rate of our Theorem 2.1 and Assumption 4, we choose β n =(log n ) . Consequently, we have the bounds k / n (log k n ) / (log n )min { β n ,n / } (cid:46) O (cid:18) n c / n (cid:16) log √ c n (cid:17) / (cid:19) and a n √ k n log k n (cid:46) O (cid:16) a n √ c n log √ c n (cid:17) . For the ridge regularization, we also need a choiceof { α n } such that α n √ n/c n →
0. So in all, the c n , a n and α n , all tending to zero, mustalso satisfy the three requirements: (i) n c / n (cid:16) log √ c n (cid:17) / →
0; (ii) a n √ c n log √ c n → α n √ n/c n → c n = C log log n , a n = n , and α n = √ n log n . It is easy toshow these choices of c n ’s, a n ’s and α n ’s will satisfy the aforementioned requirements (i)to (iii). However in finite samples the choice of the constant C in c n has a material impacton the numerical results. We consider the choices C = λ c (deterministic case) and C = ˆ λ c (data based case) for c = 2 , , , λ j ’s as per the above displayed equation, and this correspondinglyimplies deterministic quantities k n of (9) and f n (through the defining condition x ≥ c n ).In the data based case, we use the random truncation ˆ k n and the corresponding datadependent f n . The exponents c are chosen as such because they generate a good rangeof truncation parameter k n and ˆ k n values for our numerical illustrations; higher values of c imply larger values of k n and ˆ k n .We will consider three different coefficient vectors in L ([0 , • ρ ( t ) ≡ • ρ ( t ) = sin (cid:0) π t (cid:1) + sin (cid:0) π t (cid:1) + sin (cid:0) π t (cid:1) ; and • ρ ( t ) = sin(2 πt ) .The first choice ρ is used to evaluate the size of our small-uniform statistic W n , while ρ and ρ are used to evaluate power. The second choice is a case where the coefficientvector is exactly spanned by the first three eigenvectors of the Brownian motion covarianceoperator. The third choice is an example where the coefficient vector cannot be linearlyspanned by those eigenvectors. We note Cardot et al. (2003) also numerically illustratescases 1 and 3, while Cardot et al. (1999) illustrates case 2. We fix the noise ε distributionas a Gaussian N (0 , σ ε ) distribution where we pick variance σ ε = − snrsnr Var( (cid:104)
X, ρ (cid:105) ), andwhere snr is the “signal-to-noise ratio” and we let it to be snr = 5% and 10%. To focus We emphasize in the deterministic case, it is only in the calculations of C = λ c , k n and f n do weassume we have perfect knowledge of the eigenvalues λ j ’s. The eigendecomposition of ∆ n are still based onthe random observations X i ’s in our simulations. σ ε , we assume throughout all these numerical simulationsthat the noise parameter σ ε is known with certainty.For each of the three example coefficient vectors and each of the two noise distributions,we run n s = 2500 simulations of { ( Y i , X i ) } ni =1 for each of the sample size choices n =50 , , X i and the function ρ are discretized by 100 equispacedpoints in [0 , L ([0 , fdapace package of the R language.Once the FPCA estimator ˆ ρ is constructed as per (11), we can evaluate it pointwiseon [0 ,
1] as ˆ ρ ( t ) = (cid:80) k n j =1 (cid:104) ˆ ρ, ˆ e j (cid:105) ˆ e j ( t ). At the end of each simulation round, we will alsocompute and record the quadratic error measure,error( ρ ) = (cid:90) ( ρ ( t ) − ˆ ρ ( t )) d t, if ρ ≡
0; and (cid:82) ( ρ ( t ) − ˆ ρ ( t )) d t (cid:82) ρ ( t ) d t , otherwise. (22) W n For succinctness in discussing both the deterministic truncation case and the data driventruncation case, let’s denote K n ∈ { k n , (cid:100) ˆ k n avg (cid:101)} . Here ˆ k n avg denotes the averaged randomtruncations over the n s number of simulations for a given sample size choice n , and (cid:100)·(cid:101) isthe ceiling function. That is to say, if we use deterministic truncation we simply set K n tothe known value k n , and if we were to use a data driven truncation, we set K n to be theaveraged truncation parameter.The computation of W n is as written in (21). As mentioned above, we evaluate W n under a known standard deviation σ ε of the error distribution. The optimization step in(21) is computed using a combination of a global and local search. A uniformly randompoint in drawn in R K n and this serves as the initial point in the constrained nonlinearglobal optimizer ISRES of Runarsson and Yao (2005). To further refine the solution,we take the resulting solution point and set that as the initial point in the constrainednonlinear local optimizer COBYLA of Powell (1994). The end of this procedure results inour small-uniform statistic W n . Although not thoroughly experimented in this paper, butwe sense the specific choices of these numerical optimization algorithms are not particularlyimportant.As a matter of comparison, we will also compute the D n and T n statistics of Cardot https://cran.r-project.org/web/packages/fdapace/ , version 0.5.5. We set a relative error tolerance of 10 − and a maximum of 100 runs. Again, we set a relative error tolerance of 10 − and a maximum of 100 runs.
16t al. (2003). These two statistics are defined as, D n := 1 σ ε ||√ n ∆ n ˆ A n || , T n := D n − k n √ k n where ˆ A n := (cid:80) k n j =1 1ˆ λ j ˆ e j ⊗ ˆ e j . Cardot et al. (2003) show that under the null hypothe-sis, D n (cid:32) χ ( k n ) and T n (cid:32) N (0 , q χ ( k n ) , − α denote the quantile P ( χ ( k n ) ≤ q χ ( k n ) , − α ) = 1 − α and q N (0 , , − α/ denote the quantile P ( N (0 , ≤ q N (0 , , − α/ ) = 1 − α/
2. Then we reject the null hypothesis H : ρ = 0 using the D n statistic if D n > q χ ( k n ) , − α and reject the null hypothesis using the T n statistic if | T n | > √ q N (0 , , − α/ . Otherwise,we accept the null hypothesis. Notice we can only make the comparison of these two statis-tics against our small-uniform statistic W n in the deterministic truncation parameter k n case. (cid:102) W n The distribution of the supremum of our Gaussian process (cid:102) W n must be numerically simu-lated. Note in the data driven case, it necessarily implies a mismatch between the trunca-tion ˆ k n that was used to compute each small-uniform statistic W n for a given simulationepoch, and the asymptotic distribution approximation (cid:102) W n that depends on ˆ k n avg.Let’s describe our simulation procedure. We first uniformly draw 25 points on theboundary of a K n -sphere, and then uniformly draw another 25 points in the interior ofthat K n -sphere; that is, a total of 25 = 625 of K n -vectors are drawn. We evaluate thecovariance function (17) on the Cartesian product of these 625 points, and this results in a625 ×
625 dimensional covariance matrix. We draw an observation from a 625-dimensionalmean zero multivariate normal distribution with this covariance matrix. This observationrepresents one sample path of the Gaussian process G P,n . We record the maximum valueof this sample path.We repeat the above procedure 1 . (cid:101) Z n of Theorem 2.1. Finally, we normalize (cid:101) Z n by β n = n ) to arriveat the simulated distribution of (cid:102) W n . Figure 1 plots the results of the simulations. Thequantile numbers q − α are accordingly numerically computed. For h l ∈ J n , l = 1 , h l = (cid:80) K n j =1 b j e j where b = ( b , . . . , b K n ) is a real Euclidean vector inthe K n unit sphere. Consequently, the covariance function (17) can be more explicitly written as a functionon the Cartesian product of two K n unit spheres, c n ( x, y ) = (cid:80) K n j =1 λ j f n ( λ j ) x j y j (cid:16)(cid:113)(cid:80) K n j =1 λ j f n ( λ j ) x j + a n (cid:17) (cid:16)(cid:113)(cid:80) K n j =1 λ j f n ( λ j ) y j + a n (cid:17) where || x || R Kn ≤ || y || R Kn ≤ W n C oun t n (a) c = 3 W n C oun t n (b) c = 4 W n C oun t n (c) c = 5 W n C oun t n (d) c = 7 W n C oun t n (e) c = 8 Figure 1: Histograms of the distribution of (cid:102) W n for various sample sizes n and various exponents c when the FPCA usesridge regularization. These plots are best seen in color. Details of the parameterization are described in Section 4.1.The procedure for simulating (cid:102) W n is described in Section 4.3. The histogram plots for when the FPCA uses simpleregularization are similar; they are not shown for brevity. emark . In this numerical simulation exercise, it was far more time consuming tosimulate and compute the statistic W n than simulating its asymptotic approximation (cid:102) W n .Simply put, both the spectral decomposition of Γ n and the numerical optimization stepsin computing W n are computationally expensive, and made even more so when we have todo this n s many times across various sample size choices n . Of course, in actual practicewhere W n is only computed once based on the given data, the computation time of a single W n is negligible. Choosing the coefficient vector as ρ ( t ) ≡ k n ; and Table 3 (snr = 5%) and Table 1 (snr = 5%) show the case when we usea deterministic truncation k n . Thus these tables illustrate the size properties of our small-uniform statistic. Firstly, we see there is little qualitative difference of the levels betweenthe deterministic and data driven truncation cases, which suggests random variations inthe eigenelements, and hence in the determination of ˆ k n , do not substantially affect theestimated levels. Thus for the remainder of this section, we will focus on the deterministictruncation k n case, as this focus allows us to further compare our W n statistic againstCardot et al. (2003)’s D n and T n statistics.Let’s focus on Table 1 with snr = 5%. We see the estimated size of our small-uniformstatistic W n (for both the reciprocal and ridge regularization cases) matches the simulatedlevels of its asymptotic distribution (cid:102) W n when the truncation k n is small. However, thismatching deteriorates as the truncation increases, and perhaps paradoxically, also deterio-rates with larger sample sizes. This can be explained from the log errors: as the truncationand sample size increases, the quality of the estimator ˆ ρ of the true coefficient ρ ≡ ρ is zero, the “optimal” truncation shouldsimple be k n = 0. In all, our numerical results suggest the FPCA estimator (and especiallythe case of simple regularization) has significant difficulty in estimating a null coefficient.And since our small-uniform statistic W n is based on the FPCA estimator, it is thus nosurprise the size performance of W n is also necessarily hampered. In contrast, the D n and T n statistics of Cardot et al. (2003) do not depend on the FPCA estimator, and their nom-inal levels appear to be stable across truncations and sample sizes. Table 3 is the resultswith a higher snr = 10% and exhibit the same qualitative behavior of W n , D n and T n asdiscussed above.Let’s now discuss the empirical power of our statistic W n . Tables 5 (snr = 5%) and 6(snr = 10%) show the results for the power against ρ . By design, ρ is a linear combinationof the first three eigenvectors of Γ, and so the “optimal” truncation k n for ρ is exactly 3.Hence, we should expect the best performance for all the statistics W n , D n and T n at k n = 3(i.e. c = 4). Even with a modest sample size of n = 200, it appears the empirical powerof W n (for both the simple and ridge regularizations) yield qualitatively almost identical19ower to that of D n and T n . For the other truncation cases (i.e. corresponding to c = 3 , , W n , again for both the simple and ridge regularizations, yield higherpower than D n and T n . However this observation is not without reservations. On the onehand, higher truncations lead to higher log quadratic errors of ˆ ρ . But on the other hand,it could very well be possible that the estimated coefficient ˆ ρ doesn’t resemble the truecoefficient ρ , but ˆ ρ nonetheless is still significantly different from the null vector, and thatthe optimizing nature of W n can advantage of this. Thus this suggests our small-uniformstatistic W n is robust at rejecting the null hypothesis H : ρ = 0 even if the underlingFPCA estimator ˆ ρ has high estimation error, as is most evident when using the simpleregularization, in finite samples.Finally, Tables 7 (snr = 5%) and 8 (snr = 10%) show the results for the power against ρ . This coefficient vector ρ is designed such that it is not a linear combination of theeigenvectors of Γ and so higher truncations k n should yield better results. This coefficientvector ρ example is particularly important because real world coefficient vectors of theFLM are highly unlikely to be just simple linear combinations of the eigenvectors of Γ. Here,the power of our small-uniform statistic W n outperforms that of D n and T n , especially athigh truncations. Although it is not the purpose of this paper to empirically evaluate theperformance of various regularization regimes, it does appear that the log quadratic errorof the FPCA estimator under ridge regularization is substantially lower than when theFPCA estimator uses simple regularization.20able 1: The empirical power (in percentages) of our small-uniform W n statistic along with Cardot et al. (2003)’s D n and T n statistics when ρ ( t ) = ρ ( t ) ≡ ε i has a N (0 , σ ε ) distribution with σ ε = − snrsnr Var( (cid:104)
X, ρ (cid:105) ) with snr = 5%.Here we assume the truncation parameter k n is known. The n here refers to sample size, and c here refers to theexponent associated with the definition of c n . The “log error” here refers to the average over all the simulations of thelog of the error measure as given in (22). Section 4.3 describes our procedure to obtain the simulated levels of (cid:102) W n .The nominal levels of D n and T n are based on their respective asymptotic distributions as described in Section 4.2. (cid:102) W n (simple regularization) (cid:102) W n (ridge regularization)Simulated level Simulated level Nominal level of D n Nominal level of T n n k n log error 1 5 10 20 log error 1 5 10 20 1 5 10 20 1 5 10 20 c = 350 2 -327.44 2.24 9.60 16.80 31.04 -376.72 2.60 10.56 18.00 31.60 0.92 5.36 10.64 20.04 2.76 5.68 8.04 10.76200 2 -321.85 0.76 3.40 7.32 15.04 -349.58 0.44 3.44 7.16 14.36 0.84 4.64 9.96 20.00 2.48 4.84 7.12 10.401000 2 -450.00 1.20 5.24 9.52 18.64 -469.85 1.04 5.04 9.80 19.92 1.32 5.64 10.16 19.92 3.36 5.84 7.36 10.48 c = 450 3 -133.88 2.08 9.44 15.48 26.28 -229.18 2.12 8.52 15.36 25.96 1.24 5.44 10.84 19.96 2.72 5.52 7.36 11.28200 3 -206.36 0.44 3.60 7.04 13.92 -264.24 0.92 3.20 6.84 15.48 0.56 5.12 9.92 18.60 2.08 5.16 6.80 10.481000 3 -303.70 0.68 4.40 9.16 18.04 -339.78 0.80 4.16 8.60 16.76 0.92 4.92 10.24 19.96 2.60 4.96 7.32 10.72 c = 550 4 58.28 3.32 9.24 15.56 25.72 -139.09 1.48 6.52 11.92 21.16 1.12 5.08 9.56 19.60 2.24 5.08 6.44 12.12200 4 -42.88 1.84 7.20 12.76 24.64 -164.06 2.24 7.68 13.48 24.72 1.00 4.84 9.36 19.40 2.48 4.80 6.60 11.161000 5 -202.19 2.04 7.48 14.20 25.56 -259.44 1.80 7.08 13.64 24.80 0.68 4.76 9.32 19.40 2.12 4.52 6.52 13.48 c = 750 9 356.97 13.52 25.32 34.68 45.00 -75.02 2.64 8.04 14.48 24.76 1.16 5.12 10.32 19.60 2.32 4.52 8.12 17.60200 10 258.27 12.68 28.32 40.00 54.48 -74.13 4.28 11.88 19.04 30.60 0.88 5.64 11.40 21.40 1.60 5.04 9.08 17.601000 11 121.34 12.60 27.12 38.76 53.80 -99.45 6.28 18.80 28.72 42.92 0.96 5.40 10.92 21.00 1.64 4.44 8.84 18.68 c = 850 14 502.37 18.72 32.68 41.52 51.80 -60.25 2.40 8.00 13.48 22.72 1.60 5.96 11.52 21.16 2.40 5.48 9.76 19.76200 16 400.44 19.52 35.36 44.52 56.88 -51.52 4.28 13.16 21.04 33.24 1.40 6.08 11.72 21.72 2.00 5.56 10.60 20.761000 17 267.09 21.60 38.04 48.84 62.04 -65.88 6.32 16.24 24.52 36.32 2.16 7.56 13.44 25.36 2.96 6.40 11.04 21.40 able 2: The empirical power (in percentages) of our small-uniform W n statistic when ρ ( t ) = ρ ( t ) ≡ ε i has a N (0 , σ ε ) distribution with σ ε = − snrsnr Var( (cid:104)
X, ρ (cid:105) ) with snr = 5%. Here we use a data driven truncation parameter ˆ k n .In particular, “ˆ k n avg” is the average truncation value over n s number of simulations, and “ˆ k n std” is the associatedstandard error. The n here refers to sample size, and c here refers to the exponent associated with the definition of c n . The “log error” here refers to the average over all the simulations of the log of the error measure as given in (22).Section 4.3 describes our procedure to obtain the simulated levels of (cid:102) W n . Simulated level of (cid:102) W n (simple) Simulated level of (cid:102) W n (ridge) n ˆ k n avg ˆ k n std log error 1 5 10 20 log error 1 5 10 20 c = 350 1.47 0.60 -313.18 2.24 9.72 16.80 31.04 -363.41 2.64 10.56 18.00 31.56200 1.57 0.50 -416.98 0.76 3.36 7.28 14.80 -436.73 0.44 3.44 7.20 14.361000 1.93 0.26 -473.03 1.20 5.20 9.52 18.68 -488.62 1.04 5.08 9.88 19.92 c = 450 2.43 1.18 -137.21 2.12 9.56 15.52 26.32 -244.48 2.12 8.52 15.32 25.96200 2.45 0.59 -234.49 0.64 4.08 7.80 15.60 -289.46 1.00 3.36 7.32 15.761000 2.64 0.48 -355.21 0.72 4.60 9.48 18.40 -390.93 0.80 4.16 8.36 16.60 c = 550 3.91 2.27 29.62 3.24 9.20 15.56 25.68 -168.98 1.52 6.72 12.16 21.60200 3.85 1.07 -70.54 1.84 7.16 12.72 24.64 -184.19 2.12 7.56 13.24 24.441000 4.02 0.53 -207.77 1.92 7.28 13.96 24.84 -262.03 1.80 7.36 14.40 26.08 c = 750 10.39 7.65 332.29 13.52 25.36 34.76 45.12 -91.57 2.44 7.72 13.64 23.92200 9.59 3.53 227.29 12.68 28.12 39.56 54.00 -85.21 4.40 12.16 19.80 31.521000 9.71 1.59 86.75 13.00 28.24 39.72 55.20 -112.72 6.56 19.60 29.96 44.36 c = 850 16.41 11.60 473.32 18.96 33.04 42.32 52.56 -75.49 2.44 8.32 14.04 23.24200 15.09 6.62 360.73 19.28 34.92 44.32 56.44 -59.58 4.32 13.16 21.04 33.241000 15.12 2.83 228.74 22.56 39.12 50.28 63.60 -74.90 6.40 16.36 24.76 36.76 able 3: The empirical power (in percentages) of our small-uniform W n statistic along with Cardot et al. (2003)’s D n and T n statistics when ρ ( t ) = ρ ( t ) ≡ ε i has a N (0 , σ ε ) distribution with σ ε = − snrsnr Var( (cid:104)
X, ρ (cid:105) ) with snr = 10%.Here we assume the truncation parameter k n is known. The n here refers to sample size, and c here refers to theexponent associated with the definition of c n . The “log error” here refers to the average over all the simulations of thelog of the error measure as given in (22). Section 4.3 describes our procedure to obtain the simulated levels of (cid:102) W n .The nominal levels of D n and T n are based on their respective asymptotic distributions as described in Section 4.2. (cid:102) W n (simple regularization) (cid:102) W n (ridge regularization)Simulated level Simulated level Nominal level of D n Nominal level of T n n k n log error 1 5 10 20 log error 1 5 10 20 1 5 10 20 1 5 10 20 c = 350 2 -330.34 2.40 9.88 17.20 31.16 -366.85 2.60 9.40 17.20 31.12 0.92 4.92 9.72 20.36 2.60 5.12 7.12 10.04200 2 -313.85 0.80 3.56 6.56 13.92 -348.63 0.36 2.88 6.60 14.48 1.12 4.88 9.16 19.52 3.04 4.96 6.96 9.521000 2 -452.11 0.88 5.08 9.08 19.56 -472.10 0.48 3.92 8.88 18.16 1.04 5.32 9.16 20.08 3.12 5.44 6.92 9.44 c = 450 3 -124.35 2.88 9.36 15.96 27.44 -224.59 2.64 8.88 15.64 27.88 1.08 5.24 10.36 19.56 2.76 5.40 7.88 10.96200 3 -201.98 0.60 3.32 7.28 15.00 -266.06 0.68 3.72 7.04 13.68 0.88 4.72 9.24 19.16 1.88 4.76 6.40 9.561000 3 -302.72 0.60 3.80 7.80 15.04 -341.54 0.80 4.44 9.36 17.20 0.72 4.32 9.48 18.24 2.20 4.36 6.28 9.92 c = 550 4 57.02 2.88 9.12 15.88 24.76 -141.90 1.80 7.00 12.24 22.32 0.76 5.12 10.44 20.16 2.64 5.04 7.24 12.80200 4 -41.28 1.76 7.80 14.32 25.24 -172.30 1.32 6.96 13.24 23.04 0.80 5.04 10.24 19.76 2.36 5.00 7.44 12.201000 5 -199.48 2.04 6.84 13.52 24.56 -261.46 2.00 7.60 14.20 25.00 1.24 5.12 9.80 19.12 2.36 5.00 6.84 13.64 c = 750 9 358.46 13.96 26.52 34.44 45.64 -74.80 3.00 8.84 14.80 23.92 1.28 5.40 10.64 20.24 2.04 4.52 8.20 17.68200 10 256.43 11.08 24.84 36.08 51.00 -73.54 4.32 13.88 21.80 33.72 1.40 5.60 11.16 21.24 2.28 5.04 8.32 19.041000 11 122.28 11.64 29.28 40.64 56.68 -100.18 6.12 17.40 27.08 41.04 1.32 5.36 11.24 22.20 2.36 4.76 8.80 19.16 c = 850 14 501.98 19.24 32.88 42.36 54.24 -56.42 2.32 8.04 14.48 24.16 1.60 6.60 12.04 22.36 2.32 5.72 10.44 20.84200 16 401.06 20.04 36.96 46.76 59.60 -50.26 5.16 14.00 21.36 33.16 1.88 6.36 11.92 23.44 2.52 5.56 10.00 20.281000 17 264.41 22.44 40.72 51.48 64.32 -65.30 6.88 16.72 25.92 39.08 1.88 7.00 13.20 23.24 2.48 6.08 11.68 21.12 able 4: The empirical power (in percentages) of our small-uniform W n statistic when ρ ( t ) = ρ ( t ) ≡ ε i has a N (0 , σ ε ) distribution with σ ε = − snrsnr Var( (cid:104)
X, ρ (cid:105) ) with snr = 10%. Here we use a data driven truncation parameter ˆ k n .In particular, “ˆ k n avg” is the average truncation value over n s number of simulations, and “ˆ k n std” is the associatedstandard error. The n here refers to sample size, and c here refers to the exponent associated with the definition of c n . The “log error” here refers to the average over all the simulations of the log of the error measure as given in (22).Section 4.3 describes our procedure to obtain the simulated levels of (cid:102) W n . Simulated level of (cid:102) W n (simple) Simulated level of (cid:102) W n (ridge) n ˆ k n avg ˆ k n std log error 1 5 10 20 log error 1 5 10 20 c = 350 1.48 0.61 -311.28 2.28 9.88 17.20 31.16 -362.82 2.64 9.40 17.20 31.08200 1.57 0.50 -417.98 0.88 3.60 6.72 13.92 -439.45 0.36 2.88 6.60 14.481000 1.93 0.26 -469.98 0.88 5.08 9.00 19.56 -488.47 0.48 3.92 8.88 18.16 c = 450 2.40 1.11 -130.99 2.88 9.36 15.88 27.32 -246.59 2.56 8.80 15.64 27.76200 2.45 0.60 -229.70 0.60 3.32 7.36 15.36 -289.38 0.68 3.60 6.96 13.561000 2.63 0.48 -357.13 0.60 3.84 7.84 15.04 -386.91 0.76 4.28 9.08 16.88 c = 550 3.89 2.25 29.26 2.76 8.92 15.48 24.32 -167.34 1.72 6.64 11.84 21.60200 3.83 1.04 -70.54 1.72 7.80 14.40 25.44 -189.46 1.28 6.80 12.68 22.041000 4.03 0.52 -203.37 2.12 7.00 13.80 24.84 -264.43 2.00 7.72 14.32 25.12 c = 750 10.68 8.11 336.74 13.80 26.40 34.12 45.28 -92.96 3.00 8.88 14.92 24.40200 9.66 3.64 226.90 11.08 25.12 36.68 52.12 -85.96 4.32 14.20 22.48 35.041000 9.76 1.60 87.98 11.68 29.28 40.52 56.60 -112.98 6.36 17.72 27.32 41.40 c = 850 16.05 11.21 472.39 19.12 32.80 42.00 53.84 -69.30 2.20 7.84 14.28 23.76200 15.39 6.86 367.03 20.48 37.32 47.24 59.76 -59.13 5.16 13.88 21.20 32.721000 15.15 2.85 225.53 22.56 40.92 51.60 64.92 -74.06 6.80 16.64 25.44 38.60 able 5: The empirical power (in percentages) of our small-uniform W n statistic along with Cardot et al. (2003)’s D n and T n statistics when ρ ( t ) = ρ ( t ) = sin( πt/
2) + sin(3 πt/
2) + sin(5 πt/
2) and ε i has a N (0 , σ ε ) distribution with σ ε = − snrsnr Var( (cid:104)
X, ρ (cid:105) ) with snr = 5%. Here we assume the truncation parameter k n is known. The n here refers tosample size, and c here refers to the exponent associated with the definition of c n . The “log error” here refers to theaverage over all the simulations of the log of the error measure as given in (22). Section 4.3 describes our procedureto obtain the simulated levels of (cid:102) W n . The nominal levels of D n and T n are based on their respective asymptoticdistributions as described in Section 4.2. (cid:102) W n (simple regularization) (cid:102) W n (ridge regularization)Simulated level Simulated level Nominal level of D n Nominal level of T n n k n log error 1 5 10 20 log error 1 5 10 20 1 5 10 20 1 5 10 20 c = 350 2 -43.90 20.16 41.92 55.28 69.20 -70.48 20.36 40.76 54.28 69.00 11.60 28.44 40.72 55.84 21.00 28.80 33.92 41.24200 2 -88.55 62.76 83.32 89.88 94.92 -117.76 63.96 83.04 89.88 94.84 65.68 85.64 91.44 95.72 79.12 85.84 88.68 91.561000 2 -196.76 100.00 100.00 100.00 100.00 -206.30 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 450 3 56.50 13.88 33.04 45.28 59.52 -31.49 17.00 35.76 48.36 64.28 9.48 24.16 35.88 50.96 16.80 24.32 29.44 36.96200 3 -10.33 57.80 78.36 85.92 92.84 -68.13 58.56 77.72 86.80 92.76 59.20 79.28 86.44 93.56 71.76 79.36 83.04 86.841000 3 -117.00 100.00 100.00 100.00 100.00 -148.56 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 550 4 238.38 11.04 28.80 39.52 53.56 43.23 9.92 25.80 37.20 51.00 7.72 22.72 33.68 48.92 14.16 22.36 27.84 35.36200 4 137.54 58.84 78.12 85.60 91.80 17.25 59.20 79.60 86.84 92.96 52.40 73.40 82.08 89.56 63.76 73.04 77.64 82.801000 5 -8.25 100.00 100.00 100.00 100.00 -64.56 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 750 9 539.84 27.32 43.40 53.80 65.88 109.07 11.56 25.44 34.96 47.92 6.24 17.24 27.64 40.68 9.36 16.00 20.92 31.28200 10 450.40 70.16 84.84 90.32 94.84 114.77 57.84 76.20 83.88 91.28 40.28 61.92 72.64 83.20 47.28 59.96 66.32 73.881000 11 376.49 100.00 100.00 100.00 100.00 133.72 100.00 100.00 100.00 100.00 99.96 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 850 14 692.65 38.80 53.52 61.60 72.08 125.39 10.32 22.64 32.56 44.20 8.36 20.72 29.04 43.20 11.08 18.68 23.92 33.48200 16 617.61 81.88 91.04 94.24 96.76 141.20 60.28 77.24 85.64 91.20 45.84 66.40 75.80 84.52 51.08 63.80 69.84 76.881000 17 567.22 100.00 100.00 100.00 100.00 183.44 100.00 100.00 100.00 100.00 99.96 100.00 100.00 100.00 100.00 100.00 100.00 100.00 able 6: The empirical power (in percentages) of our small-uniform W n statistic along with Cardot et al. (2003)’s D n and T n statistics when ρ ( t ) = ρ ( t ) = sin( πt/
2) + sin(3 πt/
2) + sin(5 πt/
2) and ε i has a N (0 , σ ε ) distributionwith σ ε = − snrsnr Var( (cid:104)
X, ρ (cid:105) ) with snr = 10%. Here we assume the truncation parameter k n is known. The n hererefers to sample size, and c here refers to the exponent associated with the definition of c n . The “log error” hererefers to the average over all the simulations of the log of the error measure as given in (22). Section 4.3 describesour procedure to obtain the simulated levels of (cid:102) W n . The nominal levels of D n and T n are based on their respectiveasymptotic distributions as described in Section 4.2. (cid:102) W n (simple regularization) (cid:102) W n (ridge regularization)Simulated level Simulated level Nominal level of D n Nominal level of T n n k n log error 1 5 10 20 log error 1 5 10 20 1 5 10 20 1 5 10 20 c = 350 2 -84.04 45.64 69.24 79.76 88.36 -111.81 45.04 69.32 79.80 88.48 31.16 55.08 66.00 78.96 46.00 55.60 60.44 66.40200 2 -138.32 96.48 99.12 99.48 99.72 -169.19 96.60 99.36 99.56 99.88 96.84 99.20 99.48 99.72 98.80 99.24 99.36 99.481000 2 -234.36 100.00 100.00 100.00 100.00 -245.04 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 450 3 -8.32 35.12 59.00 70.60 82.40 -89.08 34.04 58.68 71.08 82.24 26.80 49.28 61.80 75.32 38.76 49.40 55.52 62.56200 3 -70.53 94.84 98.84 99.40 99.84 -134.69 94.68 98.76 99.32 99.80 94.92 98.56 99.52 99.84 97.44 98.56 99.08 99.521000 3 -175.07 100.00 100.00 100.00 100.00 -209.87 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 550 4 162.68 29.00 51.52 64.64 77.20 -26.20 24.76 45.36 57.20 71.80 22.28 44.08 56.40 71.04 32.28 43.56 50.52 57.48200 4 70.80 95.08 98.68 99.44 99.80 -53.39 93.72 98.28 99.16 99.80 92.96 97.88 99.12 99.60 96.16 97.88 98.36 99.201000 5 -65.50 100.00 100.00 100.00 100.00 -121.91 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 750 9 469.73 43.40 62.52 72.40 81.20 40.43 23.88 44.60 55.88 69.36 17.16 35.60 46.48 59.88 22.16 33.76 40.24 48.32200 10 394.15 96.32 99.12 99.60 99.76 53.64 93.68 97.68 98.72 99.56 84.20 94.48 96.68 98.80 88.20 93.44 95.64 96.921000 11 353.78 100.00 100.00 100.00 100.00 95.27 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 850 14 624.19 54.16 68.96 76.20 84.00 55.43 23.52 42.60 53.24 64.68 19.44 35.40 46.00 59.76 23.16 33.00 39.32 48.00200 16 568.11 98.12 99.20 99.84 99.96 79.12 93.20 97.40 98.64 99.60 85.36 92.88 95.84 97.88 87.48 91.84 93.72 96.121000 17 553.00 100.00 100.00 100.00 100.00 154.03 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 able 7: The empirical power (in percentages) of our small-uniform W n statistic along with Cardot et al. (2003)’s D n and T n statistics when ρ ( t ) = ρ ( t ) = sin(2 πt ) and ε i has a N (0 , σ ε ) distribution with σ ε = − snrsnr Var( (cid:104)
X, ρ (cid:105) )with snr = 5%. Here we assume the truncation parameter k n is known. The n here refers to sample size, and c here refers to the exponent associated with the definition of c n . The “log error” here refers to the average over allthe simulations of the log of the error measure as given in (22). Section 4.3 describes our procedure to obtain thesimulated levels of (cid:102) W n . The nominal levels of D n and T n are based on their respective asymptotic distributions asdescribed in Section 4.2. (cid:102) W n (simple regularization) (cid:102) W n (ridge regularization)Simulated level Simulated level Nominal level of D n Nominal level of T n n k n log error 1 5 10 20 log error 1 5 10 20 1 5 10 20 1 5 10 20 c = 350 2 0.08 12.84 31.64 44.00 59.76 -4.85 13.36 31.40 43.64 58.56 9.08 22.64 32.12 47.84 15.92 23.20 27.56 32.24200 2 -13.51 41.60 62.60 73.68 82.80 -16.30 39.52 62.20 73.20 82.64 50.84 71.72 80.92 88.72 64.40 72.32 76.72 81.081000 2 -21.50 99.88 100.00 100.00 100.00 -21.66 99.92 100.00 100.00 100.00 99.92 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 450 3 4.12 12.68 29.00 41.00 56.80 -12.96 12.48 29.84 40.64 55.00 10.28 24.76 35.56 50.60 17.20 24.84 30.00 36.40200 3 -43.32 48.80 69.60 79.12 87.24 -50.12 48.56 68.84 79.36 87.56 57.84 77.96 85.88 92.64 69.32 78.08 81.92 86.761000 3 -100.84 100.00 100.00 100.00 100.00 -102.20 99.92 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 550 4 53.53 11.64 27.48 37.24 51.76 -32.05 9.92 23.56 33.96 47.52 9.00 23.52 34.08 49.08 15.24 23.16 28.32 35.48200 4 -27.28 63.00 81.44 88.04 93.20 -72.30 58.76 76.92 85.16 91.80 55.52 76.36 84.48 91.16 67.28 76.04 80.72 85.001000 5 -107.67 99.96 100.00 100.00 100.00 -118.97 100.00 100.00 100.00 100.00 99.96 100.00 100.00 100.00 99.96 100.00 100.00 100.00 c = 750 9 319.01 23.00 40.76 51.08 62.52 -18.76 10.40 25.16 33.72 46.68 5.20 17.08 26.80 40.68 7.64 14.96 21.00 30.08200 10 223.07 68.04 84.44 90.40 95.64 -44.75 60.36 77.80 84.76 91.04 38.60 60.72 71.80 82.40 46.12 59.24 65.84 73.161000 11 111.44 100.00 100.00 100.00 100.00 -66.55 100.00 100.00 100.00 100.00 99.92 100.00 100.00 100.00 99.96 100.00 100.00 100.00 c = 850 14 466.78 31.60 48.80 58.12 68.44 -11.74 10.20 24.76 34.32 46.60 5.12 16.04 26.00 39.32 7.24 14.52 20.44 30.12200 16 374.11 73.68 87.40 92.04 96.24 -28.19 56.96 76.12 84.08 89.92 34.28 56.00 66.80 78.60 39.40 52.28 59.56 68.281000 17 276.11 100.00 100.00 100.00 100.00 -35.84 100.00 100.00 100.00 100.00 99.72 100.00 100.00 100.00 99.84 100.00 100.00 100.00 able 8: The empirical power (in percentages) of our small-uniform W n statistic along with Cardot et al. (2003)’s D n and T n statistics when ρ ( t ) = ρ ( t ) = sin(2 πt ) and ε i has a N (0 , σ ε ) distribution with σ ε = − snrsnr Var( (cid:104)
X, ρ (cid:105) )with snr = 10%. Here we assume the truncation parameter k n is known. The n here refers to sample size, and c here refers to the exponent associated with the definition of c n . The “log error” here refers to the average over allthe simulations of the log of the error measure as given in (22). Section 4.3 describes our procedure to obtain thesimulated levels of (cid:102) W n . The nominal levels of D n and T n are based on their respective asymptotic distributions asdescribed in Section 4.2. (cid:102) W n (simple regularization) (cid:102) W n (ridge regularization)Simulated level Simulated level Nominal level of D n Nominal level of T n n k n log error 1 5 10 20 log error 1 5 10 20 1 5 10 20 1 5 10 20 c = 350 2 -6.12 28.44 49.00 61.36 74.48 -8.02 29.52 53.04 64.12 76.36 22.36 42.60 53.56 67.60 34.28 42.88 47.08 54.08200 2 -18.25 80.48 92.08 95.72 98.40 -18.98 80.20 92.28 95.44 97.96 87.72 96.40 97.68 99.32 93.96 96.44 97.24 97.801000 2 -22.50 100.00 100.00 100.00 100.00 -22.64 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 450 3 -12.89 28.44 49.48 61.04 74.44 -20.69 29.00 49.08 61.24 73.96 25.64 46.32 59.08 71.52 36.76 46.44 52.40 60.04200 3 -59.21 89.80 96.36 98.08 99.16 -58.12 88.16 95.92 97.72 99.04 94.16 98.36 99.16 99.72 97.08 98.36 98.80 99.201000 3 -108.69 100.00 100.00 100.00 100.00 -106.68 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 550 4 1.66 29.60 49.80 61.44 72.20 -45.34 25.04 45.36 56.84 69.44 24.64 45.80 58.64 71.68 33.64 45.32 52.32 60.08200 4 -70.58 94.92 98.36 99.28 99.80 -89.75 93.36 97.76 98.96 99.64 93.20 97.56 98.76 99.64 95.80 97.44 98.32 98.881000 5 -128.01 100.00 100.00 100.00 100.00 -130.83 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 750 9 249.77 40.64 61.48 71.20 81.08 -40.74 26.52 46.76 57.16 70.00 16.52 34.60 46.44 61.48 21.84 32.84 39.40 48.48200 10 157.24 94.96 98.36 99.32 99.68 -74.06 93.76 98.08 99.00 99.56 83.00 94.04 96.64 98.44 87.64 93.56 95.12 96.881000 11 61.37 100.00 100.00 100.00 100.00 -95.50 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 850 14 395.72 47.16 64.40 72.40 81.64 -36.90 25.20 44.28 54.96 67.56 13.24 31.32 41.84 56.20 17.48 28.72 35.40 44.64200 16 307.97 96.08 98.68 99.24 99.76 -62.10 92.12 97.24 98.48 99.36 76.64 90.12 94.12 97.16 80.64 88.60 91.68 94.521000 17 236.78 100.00 100.00 100.00 100.00 -65.24 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 Concluding remarks
This paper introduces a small-uniform statistic W n that is constructed as a fractional pro-gramming problem out of the FPCA estimator ˆ ρ of the slope ρ of the functional linearmodel. Our main result Theorem 2.1 shows W n converges in probability to the supre-mum of a Gaussian process (cid:102) W n . The key arguments to showing our main result are bytaking advantage of identifying the regressors’ underlying Hilbert space with its dual, andalso recent advances by Chernozhukov et al. (2014) in studying the suprema of empiricalprocesses indexed by functionals.We see two interesting directions in extending the small-uniform statistic. Firstly,while this paper focuses on the most commonly studied scalar-on-functional FLM, it seemsfeasible to extend our statistic to a functional-on-functional FLM. Secondly, the recentand growing literature on functional time series regressions represent a more excitingchallenge of extending our small-uniform statistic. In particular, it is clear one needs amodification of Step I in our proof of Theorem 2.1 to a functional time series context.More importantly, we conjecture the required extension of our Step II will call for newresults in studying the empirical processes constructed out of dependent random variables. For example, see Panaretos et al. (2013) and H¨ormann et al. (2015). ppendices A Proofs
Throughout the proofs, we will use the following asymptotic approximation notations. Wewill always use
C, c > x, y >
0, we denote x (cid:16) y to mean cy ≤ x ≤ Cy . For a real sequence { x n } , wewill write x n (cid:46) O ( a n ) to mean there exists some sequence { y n } such that | x n | ≤ C | y n | ,and | y n | ≤ c | a n | . Likewise, if { X n } is a sequence of random variables and { a n } is adeterministic sequence, we will write X n (cid:46) O P ( a n ) to mean there exists some sequence ofrandom variables { Y n } such that | X n | ≤ C | Y n | with Y n = O P ( a n ); i.e. for all (cid:15) > C > P ( | Y n /a n | ≤ C ) ≥ − (cid:15) for all n . We will also use || · || to denotethe operator norm; that is, for a bounded operator A ∈ B ( H , H ) =: B ( H ), we denote || A || := sup || h ||≤ || Ah || . We will denote the space of compact operators on H as B ( H ). Remark
A.1 (Our proof arguments vis-´a-vis that of Cardot et al. (2007)) . Our proof argu-ments of Step I are heavily inspired by Cardot et al. (2007). Indeed, our proofs of Proposi-tions A.5 and A.6 are heavily based on the arguments of (Cardot et al., 2007, Propositions2 and 3). But we also have two significant deviations. Firstly, a critical difference is thatwe neither define their set E j ( z ) nor use their Lemma 4. In particular, we can’t understandone their key arguments (last displayed equation on their pg 351) which seemingly requiresthe expression “sup z ∈B j E j ( z )”, of which measureability concerns arise. Instead of pursingthis argument direction, we simply recognize that the norm || ( z − Γ n ) − || is bounded aboveby the reciprocal of the distance from z ∈ B j ⊆ ρ (Γ n ) to its spectrum σ (Γ n ). And thanksto the choice of the contours C n and the event A n from Lemma A.1, this reciprocal canbe approximated by the reciprocal of the radius of B j ; see (53). This argument allows usto estimate || ( z − Γ n ) − || without the need to deal with potential measureability issuesassociated with the event E j ( z ).Secondly, we do not work with “square-roots” of the resolvent; that is, we do not writeexpressions like “( z − Γ) − / ”. It is unclear to us whether this is necessarily a well-definedobject. A lot of the work in our proofs goes to re-deriving the results of Cardot et al.(2007) using only the resolvent but without invoking a square-root of the resolvent. Thispartly explains why our convergence rates differ from theirs.Let’s first setup some preliminary definitions and results. Let ι := √−
1. Denote theorientated circle of the complex plane with center λ i and radius δ i / B i := { λ i + δ i e πιt : In general, if A is self-adjoint, then it is clear that its resolvent ( z − A ) − for z ∈ ρ ( A ) is also self-adjoint. In particular, being self-adjoint implies it is normal. But conventional definitions of the square-rootof an operator require the underlying operator to be normal and compact. Thus to define a square-root “( z − A ) − / ”, it necessarily requires ( z − A ) − to be compact (i.e. so is in B ( H )). But clearly( z − A ) ∈ B ( H ). That implies ( z − A )( z − A ) − = id H is compact — which is only possible if H isfinite-dimensional (a case which we explicitly do not consider throughout this paper). ∈ [0 , } . We also denote the orientated circle ˆ B i analogously with the center at ˆ λ i andradius ˆ δ i /
2. Define, C n := k n (cid:91) i =1 B i . With some abuse of notations, for the approximate reciprocal f n that satisfies Assump-tion 3, we will denote also f n as its analytic extension to the interior of C n . By Rieszfunctional calculus (see Conway (1994) and Kato (1995)), we can defineΓ † := f n (Γ) = 12 πι (cid:90) C n ( z − Γ) − f n ( z ) d z. (23)Moreover, the projection of H onto span { e , . . . , e k n } can be written as,Π k n = 12 πι (cid:90) C n ( z − Γ) − d z. (24)Define the event, A n := k n (cid:92) j =1 (cid:26) | ˆ λ j − λ j | < δ j (cid:27) (25)The following lemma show that, asymptotically, integrating over a collection of random cir-cle traces centered at the empirical eigenvalues is equivalent to integrating over a collectionof deterministic circle traces centered at the population eigenvalues. Lemma A.1.
Let f : C → C be an analytic function. Define ˆ C n := k n (cid:91) j =1 ˆ B j (26) Then we have πι (cid:90) ˆ C n f ( z )( z − Γ n ) − d z = A n πι (cid:90) C n f ( z )( z − Γ n ) − d z + r n (27) where r n is a random operator with || r n || = O P (cid:18) k n log k n √ n (cid:19) (28) Proof.
On the event A n , and by definition of the operator valued contour integral, it isimmediate that A n πι (cid:90) ˆ C n f ( z )( z − Γ n ) − d z = A n πι (cid:90) C n f ( z )( z − Γ n ) − d z C n tothe deterministic domain C n . Thus we can write,12 πι (cid:90) ˆ C n f ( z )( z − Γ n ) − d z = A n πι (cid:90) ˆ C n f ( z )( z − Γ n ) − d z + A cn πι (cid:90) ˆ C n f ( z )( z − Γ n ) − d z ≡ A n πι (cid:90) C n f ( z )( z − Γ n ) − d z + r n It remains to show the operator r n converges to zero in probability at some appropriaterate. Fix any (cid:15) ∈ (0 , P ( || r n || > (cid:15) ) ≤ P ( A cn > (cid:15) ) = P ( A cn )At this point, the rest of the proof follows exactly as in (Cardot et al., 2007, Lemma 5),who show P ( A cn ) ≤ C √ n k n log k n . This completes the proof.
Remark
A.2 . For completeness, we should check that even on the event A n and any j =1 , . . . , k n , the integral (cid:82) C n f ( z )( z − Γ n ) − d z is well defined for all z ∈ B j . This is particularlysince the resolvent ( · − Γ n ) − has singularities exactly at the eigenvalues of Γ n . Of course,if we are integrating over the random empirical contours ˆ B j the resolvent ( · − Γ n ) − is welldefined by definition. The finite rank operator Γ n has spectrum σ (Γ n ) = { , ˆ λ , . . . , ˆ λ n } for which we had assumed ˆ λ > ˆ λ > · · · > ˆ λ n >
0. So immediately by definition of theevent A n , the point z is not equal to any one of ˆ λ , . . . , ˆ λ k n . However, we still need tocheck such z is not equal to any one of ˆ λ k n +1 , . . . , ˆ λ n . Because of the strictly decreasingordering of the ˆ λ j ’s, it suffices to check that z does not equal to ˆ λ k n +1 .For contradiction, suppose there exists some z ∈ B k n with z = ˆ λ k n +1 . Then z =ˆ λ k n +1 = λ k n + δ k n / λ k n + ( λ k n − λ k n +1 ) / λ k n / − λ k n +1 /
2. But on the event A n ,we have | ˆ λ k n − λ k n | < δ k n /
4. This implies δ k n / > | λ k n / − λ k n +1 / − λ k n | = δ k n /
2, whichis a contradiction.In all, this implies on the event A n the resolvent ( · − Γ n ) − is well defined on B j for all j = 1 , . . . , k n .The primary uses of Lemma A.1 are with the case f ≡ f as f n . With onlya little more work via the Borel-Cantelli lemma, (Crambes and Mas, 2013, Proposition 13)shows P (lim sup A cn ) = 0 if ( k n log k n ) /n →
0. But for our purposes we want to keep track Actually from (Conway, 1994, Proposition VII.4.6), we clearly don’t need the strong condition that f is analytic over all of C . But this strong condition is easier to state and suffices for our paper.
32f the various rates of convergences and thus we do not invoke this result. Note that a lookinto the proof shows the result holds regardless of whether f is dependent on n . Indeed,Lemma A.1 motivates the definitionsˆΠ k n := A n πι (cid:90) ˆ C n ( z − Γ n ) − d z ≡ A n πι (cid:90) C n ( z − Γ n ) − d z Γ † n := A n πι (cid:90) ˆ C n f n ( z )( z − Γ n ) − d z ≡ A n πι (cid:90) C n f n ( z )( z − Γ n ) − d z (29)Let’s observe a simple bound on the roughened standard deviation t n ( h ) that we willrepeatedly use. Lemma A.2. (i) For any h ∈ J n , t n ( h ) ≥ f n ( λ ) λ / k n || h || + a n (ii) Provided Assumption 4 holds, then for n sufficiently large, sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:46) O ( (cid:112) k n log k n ) Proof. (i): For any h ∈ J n , we can write h = (cid:80) k n j =1 b j e j for some b j ∈ R such that || h || = (cid:80) k n j =1 b j ≤ t n ( h ) = (cid:113) || Γ / Γ † h || + a n ≥ (cid:118)(cid:117)(cid:117)(cid:116) k n (cid:88) j =1 b j f n ( λ j ) λ j + a n ≥ (cid:118)(cid:117)(cid:117)(cid:116) f n ( λ ) λ k n k n (cid:88) j =1 b j + a n = f n ( λ ) λ / k n || h || + a n (ii): It is clear the supremum is not achieved at h = 0 for any n . Thus applying thecalculations of part (i) for any h ∈ J n \ { } , (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ || h || f n ( λ ) λ / k n || h || + a n ≤ f n ( λ ) λ / k n + a n Since f n ( λ ) λ / k n = (cid:16) λ O (cid:16) √ n (cid:17) + 1 (cid:17) O (cid:16)(cid:113) k n log k n (cid:17) = O (cid:16) √ k n log k n (cid:17) , we have f n ( λ ) λ / k n + a n = O (cid:16) max (cid:110) √ k n log k n , a n (cid:111)(cid:17) = O (cid:16) √ k n log k n (cid:17) where the last equality follows from As-sumption 4.As outlined in the proof outline of this paper’s main result Theorem 2.1, there are twodistinct steps to proving the result. 33 .1 Step I The T n term is directly handled by (Cardot et al., 2007, Lemma 6); we record the resulthere for completeness. Proposition A.3. If (1) holds, ||T n || = o P (cid:18) √ n (cid:19) The next result is critical in the proofs of Propositions A.6 and A.5.
Lemma A.4.
For any sufficiently large j and n . E (cid:2) || ( z − Γ) − (Γ n − Γ) || (cid:3) (cid:46) j log jn , for all z ∈ B j Proof.
Since Γ , Γ n ∈ B ( H ) ⊆ B ( H ) then it follows that the resolvent R ( z ; Γ) := ( z − Γ) − is also in B ( H ), and thus (Γ − Γ n )( z − Γ) − ∈ B ( H ). Hence we can bound theoperator norm || · || by the Hilbert-Schmidt norm || · || HS 12 || (Γ − Γ n )( z − Γ) − || ≤ || (Γ − Γ n )( z − Γ) − || ≡ ∞ (cid:88) l =1 || (Γ − Γ n )( z − Γ) − ( e l ) || = ∞ (cid:88) l,k =1 z − λ l ) |(cid:104) (Γ − Γ n ) e l , e k (cid:105)| (30)Observe that for z ∈ B j and l (cid:54) = j , by the triangle inequality we have that | z − λ l | ≥ | λ l − λ j | . (31)In addition, by the KL expansion, we have for all l, k = 1 , , . . . E [ |(cid:104) (Γ n − Γ) e l , e k (cid:105)| ] ≤ n E (cid:104) (cid:104) X , e l (cid:105) (cid:104) X , e k (cid:105) (cid:105) ≤ Mn λ l λ k (32)Thus, applying (32) and (31) into (30) E (cid:2) || (Γ − Γ n )( z − Γ) − || (cid:3) = 4 M λ j δ j n ∞ (cid:88) k =1 λ k + 4 M n ∞ (cid:88) l (cid:54) = j λ l ( λ l − λ j ) ∞ (cid:88) k =1 λ k , (33)for all z ∈ B j . See (Conway, 1994, Exercise IX.2.19)
34t this point we need to investigate the behavior of (cid:80) l (cid:54) = j λ l ( λ l − λ j ) . Let’s decompose, (cid:88) l (cid:54) = j λ l ( λ l − λ j ) = j − (cid:88) l =1 + j (cid:88) l = j +1 + ∞ (cid:88) l =2 j +1 λ l ( λ l − λ j ) =: T + T + T (34)By (Cardot et al., 2007, Lemma 1) where we have λ l − λ j ≥ (1 − l/j ) λ l , and recallingthat the eigenvalues are strictly decreasing, T = j − (cid:88) l =1 λ l ( λ l − λ j ) ≤ λ j − j − (cid:88) l =1 − l/j ) = j λ j −
16 ( π − ψ (1) ( j )) (35)where ψ ( m ) is the polygamma function of order m . Similarly, we have T ≤ λ j +1 λ j j π − ψ (1) ( j + 1)) (36)And since for l ≥ j + 1 we have λ j − λ l ≥ λ j − λ j +1 > λ j − λ j ≥ (1 − j/ (2 j )) λ j = 2 λ j ,this implies, T < λ j ∞ (cid:88) l =2 j +1 λ l ≤ λ j ((2 j + 1) + 1) λ j +1 < λ j j + 1) λ j = j + 12 λ j (37)where the second inequality follows from (Cardot et al., 2007, Lemma 1).Now we use the following well-known bounds of the polygamma function: for m ≥ x >
0, ( m − x m + m !2 x m +1 ≤ ( − m +1 ψ ( m ) ( x ) ≤ ( m − x m + m ! x m +1 (38)and applying (38) to (35)-(37) we obtain the bounds, T ≤ C j λ j , T ≤ C j λ j , T ≤ C j + 1 λ j (39)Putting (39) back into (34), we arrive at, (cid:88) l (cid:54) = j λ l ( λ l − λ j ) ≤ C j λ j (40) Observe that this is a different term compared to that of (Cardot et al., 2007, Lemma 2) The polygamma function of order m is defined to be the ( m + 1)th derivative of the logarithm of thegamma function; or equivalently, it is the m th derivative of the digamma function. E (cid:2) || (Γ − Γ n )( z − Γ) − || (cid:3) ≤ C n max (cid:40) λ j δ j , j λ j (cid:41) , (41)for all z ∈ B j .Applying Condition 2 shows that for sufficiently large j ,max (cid:40) λ j δ j , j λ j (cid:41) ≤ C max (cid:8) j log j, j log j (cid:9) = Cj log j This completes the proof.
Proposition A.5.
For sufficiently large n , sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) Y n , ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) (cid:46) O P (cid:32) k / n (log k n ) / √ n (cid:33) Proof.
Firstly using Lemma A.1, we haveˆΠ k n − Π k n ≡ πι k n (cid:88) j =1 (cid:34)(cid:90) ˆ B j ( z − Γ n ) − d z − (cid:90) B j ( z − Γ) − d z (cid:35) = 12 πι k n (cid:88) j =1 (cid:34) A n (cid:90) B j ( z − Γ n ) − d z − ( A n + A cn ) (cid:90) B j ( z − Γ) − d z (cid:35) + r n = A n πι k n (cid:88) j =1 (cid:90) B j (cid:2) ( z − Γ n ) − − ( z − Γ) − (cid:3) d z − A cn πι k n (cid:88) j =1 (cid:90) B j ( z − Γ) − d z + r n By the resolvent identity, and this is feasible only because we are on the event A n , A n πι k n (cid:88) j =1 (cid:90) B j (cid:2) ( z − Γ n ) − − ( z − Γ) − (cid:3) d z = A n πι k n (cid:88) j =1 (cid:90) B j ( z − Γ n ) − (Γ n − Γ)( z − Γ) − d z A n πι k n (cid:88) j =1 (cid:90) B j ( z − Γ n ) − (Γ n − Γ)( z − Γ) − d z =: S n + R n (42)where we define, S n := A n πι k n (cid:88) j =1 (cid:90) B j ( z − Γ) − (Γ n − Γ)( z − Γ) − d z (43a) R n := A n πι k n (cid:88) j =1 (cid:90) B j ( z − Γ) − (Γ n − Γ)( z − Γ) − (Γ n − Γ)( z − Γ n ) − d z (43b)Thus, the above equation can be rewritten as,ˆΠ k n − Π k n = S n + R n − A cn πι k n (cid:88) j =1 (cid:90) B j ( z − Γ) − d z + r n (44)By the triangle inequality and Cauchy-Schwartz inequality,sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) Y n , ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) ≤ sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) S n ρ, ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) + sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) R n ρ, ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) + A cn π k n (cid:88) j =1 (cid:90) B j sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) ( z − Γ) − ρ, ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) d z + sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) r n ρ, ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) (45)We will individually bound the four terms on the right hand side of (45). Let’s first dis-cuss those last two remaining terms. By again Cauchy-Schwartz inequality and Lemma A.1,sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) r n ρ, ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) ≤ || ρ || sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) || r n || The || r n || term is bounded by Lemma A.1.For the third integral expression, by Cauchy-Schwartz inequality again A cn π k n (cid:88) j =1 (cid:90) B j sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) ( z − Γ) − ρ, ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) d z ≤ || ρ || sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) A cn π k n max j =1 ,...,k n sup z ∈B j || ( z − Γ) − || diam( B j ) < || ρ || sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) π A cn k n (cid:46) sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) O P (cid:18) k n log k n √ n (cid:19) k n (46)37n particular, we used that for any z ∈ B j , || ( z − Γ) − || ≤ z, σ (Γ)) = 1 δ j / σ (Γ) denotes the spectrum of Γ. By the choice of radii δ j / B j ’s, any point z ∈ B j is not an eigenvalue of Γ, which by definition implies z is in theresolvent set of Γ. Hence the first inequality of (47) follows from standard results on thenorm of a resolvent (e.g. (Conway, 1994, Proposition VII.3.9)). The equality dist( z, σ (Γ)) = δ j / B j .In addition diam( B j ) = δ j , and that P ( A cn ) (cid:46) k n log k n √ n from the proof of Lemma A.1.Thus by Lemma A.2, the two remainder terms of (45) are of order,sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) O P (cid:18) k n log k n √ n + k n log k n √ n (cid:19) = O ( (cid:112) k n log k n ) O P (cid:18) k n log k n √ n + k n log k n √ n (cid:19) = O P (cid:32) k / n (log k n ) / √ n (cid:33) (48)Now we turn to bounding the S n term in (45).The S n term: By triangle inequality, || S n || ≤ π k n (cid:88) j =1 (cid:90) B j || ( z − Γ) − (Γ n − Γ) || || ( z − Γ) − || d z (49)Firstly, we have the bound sup z ∈B j || ( z − Γ) − || < /δ j again by (47). Thus by Lemma A.4,we have in all E [ ||S n || ] (cid:46) k n (cid:88) j =1 sup z ∈B j E [ || ( z − Γ) − (Γ n − Γ) || ] sup z ∈B j || ( z − Γ) − || diam( B j ) (cid:46) k n (cid:88) j =1 (cid:114) j log jn · δ j · δ j (cid:46) k / n (log k n ) / √ n It is worth noting our discussions of this S n term is substantially different than that of (Cardot et al.,2007, Proposition 2). In particular, while the integral of the j th summand of S n has an explicit form dueto Dauxois et al. (1982), but for our purposes of obtaining moment bounds, knowing this explicit form isunnecessary. In contrast to our purposes, the desired computation of the predicted value E [ (cid:104) S n ρ, X n +1 (cid:105) ]in (Cardot et al., 2007, Proposition 2) gives them an extra smoothing property which can take advantageof the explicit form of S n . Indeed, an earlier draft of this paper uses analogous proofs methods of this S n term as the authors and we arrive at the same rate in (50).
38y Markov’s and Jensen’s inequality and Lemma A.2sup h ∈J n |(cid:104) S n ρ, x (cid:105)| (cid:46) sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) O P (cid:32) k / n (log k n ) / √ n (cid:33) = O P (cid:18) k n log k n √ n (cid:19) (50)This completes the discussion of the S n term.The R n term: We first define for j = 1 , . . . , k n , T j,n := A n (cid:90) B j ( z − Γ) − (Γ n − Γ)( z − Γ) − (Γ n − Γ)( z − Γ n ) − d z (51)so that we can write R n = 12 πι k n (cid:88) j =1 T j,n By again the Cauchy-Schwartz inequality, we havesup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) T j,n ρ, ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) ≤ || ρ || sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:90) B j || ( z − Γ) − (Γ n − Γ)( z − Γ) − (Γ n − Γ)( z − Γ n ) − || A n d z ≤ || ρ || sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:90) B j || ( z − Γ) − (Γ n − Γ) || || ( z − Γ n ) − || A n d z (52)Recall from Remark A.2 that z ∈ B j is also in the resolvent set of Γ n . So we havedist( z, σ (Γ n )) = | z − ˆ λ j | ≥ | z − λ j | − | ˆ λ j − λ j | = δ j / − δ j / δ j /
2. By the same argumentsfor (47), we have || ( z − Γ n ) − || A n ≤ z, σ (Γ n )) A n ≤ δ j / A n ≤ δ j (cid:46) j log j (53)Consider the expectation and use Lemma A.4, E (cid:34)(cid:90) B j || ( z − Γ) − (Γ n − Γ) || d z (cid:35) = (cid:90) B j E (cid:2) || ( z − Γ) − (Γ n − Γ) || (cid:3) d z (cid:46) j log jn diam( B j )= j log jn δ j (cid:46) j log jn j log j = j n
39y Markov’s inequality, we thus have that (cid:90) B j || ( z − Γ) − (Γ n − Γ) || d z = O P (cid:18) j n (cid:19) (54)Putting (54) and (53) together we have (cid:90) B j || ( z − Γ) − (Γ n − Γ) || || ( z − Γ n ) − || A n d z (cid:46) O P (cid:18) ( j log j ) j n (cid:19) = O P (cid:18) j log jn (cid:19) So by Lemma A.2,sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) R n , ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) (cid:46) sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) O P n k n (cid:88) j =1 j log j (cid:46) O ( (cid:112) k n log k n ) O P (cid:18) k n log k n n (cid:19) = O P (cid:32) k / n (log k n ) / n (cid:33) (55)This completes the proof of the R n term.Summary: Now we can finally put everything together. Putting (50), (55) and (48)back into (45),sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) Y n , ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) (cid:46) O P (cid:18) k n log k n √ n (cid:19) + O P (cid:32) k / n (log k n ) / n (cid:33) + O P (cid:32) k / n (log k n ) / √ n (cid:33) This completes the proof.
Proposition A.6.
For sufficiently large n , sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) S n , ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) (cid:46) O P (cid:32) k / n (log k n ) / √ n (cid:33) Proof.
The proof of this result closely follows the development of the proof of Proposi-tion A.5. By an entirely analogous arguments leading up to (42), we obtain the decompo-40ition sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) S n , ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) ≤ sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:32) A n πι k n (cid:88) j =1 (cid:90) B j | f n ( z ) | || ( z − Γ) − (Γ n − Γ) || || ( z − Γ) − U n || d z + A n πι k n (cid:88) j =1 (cid:90) B j | f n ( z ) | || ( z − Γ) − (Γ n − Γ) || || ( z − Γ n ) − || || U n || d z + A cn πι k n (cid:88) j =1 (cid:90) B j | f n ( z ) | || ( z − Γ) − || || U n || d z + || r n || || U n || (cid:33) (56)Firstly, let’s see that U n = O P (1). Since || U n || ≤ n (cid:80) ni =1 || X i || | ε i | , taking expectationsand applying Jensen’s inequality, we have E [ || U n || ] ≤ C . By Markov’s inequality, thisimplies || U n || = O P (1) (57)Combining (57) along with the analogous arguments of the last two expressions of (48)from Proposition A.5, we have thatLast two expressions inparentheses of (56) (cid:46) O P (cid:18) k n log k n √ n (cid:19) O P (1) + O P (cid:18) k n log k n √ n (cid:19) O P (1)= O P (cid:18) k n log k n √ n (cid:19) (58)It thus suffices to concentrate the discussion on the first two expressions of (56). Thanksto the arguments from Proposition A.5, we have already handled the terms || ( z − Γ n ) − || A n and (cid:82) B j || ( z − Γ) − (Γ n − Γ) || d z . Thus it remains to discuss the terms: (i) | f n ( z ) | and (ii) || ( z − Γ) − U n || . Term (i) : By Condition 1, we have thatsup z ∈B j | f n ( z ) | ≤ δ j (cid:18) C √ n (cid:19) (cid:46) ( j log j ) (cid:18) √ n (cid:19) (59) Term (ii) : Fix any z ∈ B j . By definition of the adjoint of a linear operator and usingthe Hilbert-Schmidt norm, || ( z − Γ) − U n || = || U n ( z − Γ) − || ≤ || U n ( z − Γ) − || = ∞ (cid:88) l =1 z − λ l ) n n (cid:88) i =1 (cid:104) X i , e l (cid:105) ε i + 1 n n (cid:88) i (cid:54) = j (cid:104) X i , e l (cid:105) (cid:104) X j , e l (cid:105) ε i ε j E [ || ( z − Γ) − U n || ] ≤ σ ε n ∞ (cid:88) l =1 λ l ( z − λ l ) = σ ε n λ j ( z − λ j ) + ∞ (cid:88) l (cid:54) = j λ l ( z − λ l ) ≤ σ ε n (cid:18) λ j ( δ j / + C j λ j (cid:19) (cid:46) n (cid:18) j log j ( j log j ) + j ( j log j ) (cid:19) = 1 n (cid:0) j log j + j log j (cid:1) (cid:46) j log jn where the third line follows from (40) in the proof of Proposition A.5. Thus by Chebyshev’sinequality, it follows we have for all z ∈ B j , || ( z − Γ) − U n || = O P (cid:32) j / (log j ) / √ n (cid:33) (60)Putting (59) and (60) together along with the already discussed terms from Proposi-tion A.5, it followsThe j th summand ofthe 1st expression inparentheses of (56) (cid:46) ( j log j ) (cid:18) √ n (cid:19) · O P (cid:32)(cid:114) j n (cid:33) · O P (cid:32) j / (log j ) / √ n (cid:33) (cid:46) O P (cid:32) j / (log j ) / n (cid:33) (61)Let’s now discuss the second expression of (56). Using (59), (54), (53) and (57), wehave The j th summand ofthe 2nd expression inparentheses of (56) (cid:46) ( j log j ) (cid:18) √ n (cid:19) · O P (cid:18) j n (cid:19) · O a . s . ( j log j ) · O P (1) (cid:46) O P (cid:18) j (log j ) n (cid:19) (62)42inally, summing (61) and (62), using (58) in (56) and using Lemma A.2,sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) S n , ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) (cid:46) O ( (cid:112) k n log k n ) (cid:32) O P (cid:32) k / n (log k n ) / n (cid:33) + O P (cid:18) k n (log k n ) n (cid:19) + O P (cid:18) k n log k n √ n (cid:19)(cid:33) = O P (cid:32) k / n (log k n ) / √ n (cid:33) This completes the proof.The following result summarizes the discussions of Step I.
Proposition A.7 (Nuisance terms converge rate) . For sufficiently large n , sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12) (cid:104) ( T n + Y n + S n ) ρ, h (cid:105) t n ( h ) (cid:12)(cid:12)(cid:12)(cid:12) (cid:46) O P (cid:32) k / n (log k n ) / √ n (cid:33) . Proof.
By Propositions A.3, A.5 and A.6, the displayed equation on the left hand side isbounded above by O P ( (cid:112) k n log k n ) o P (cid:18) √ n (cid:19) + O P (cid:32) k / n (log k n ) / √ n (cid:33) + O P (cid:32) k / n (log k n ) / √ n (cid:33) A.2 Step II
We now move onto Step II. The key to showing that Step II holds is to cast the R n term into an empirical process theory framework and apply the approximation results ofChernozhukov et al. (2014). Let’s setup some standard notations. Let N ( (cid:15), T, || · || ) denotethe covering number of radius (cid:15) > T, || · || ). Let’s also denote theuniform entropy integral (see (van der Vaart and Wellner, 1996, Chapter 2.14)) for the set T equipped with measurable cover F , J ( δ, T ) := sup Q (cid:90) δ (cid:113) N ( (cid:15) || F || Q, , T, L ( Q )) d (cid:15) where the supremum is taken over all discrete probability measures Q with || F || Q, > T , we will denote l ∞ ( T ) as the space of all bounded functions T → R with the uniform norm || f || T := sup t ∈ T | f ( t ) | .43y the Riesz representation theorem, J n and its dual space J ∗ n are isometrically iso-morphic (this is especially since we’re working with real valued Hilbert spaces). Thus foreach h ∈ J n , we can identify h ∈ J n with h ∗ ∈ J ∗ n such that h ∗ ( · ) = (cid:104) h, ·(cid:105) . With someabuse of notations, we will write R n ( h ) ≡ n n (cid:88) i =1 (cid:104) h, Γ † X i ε i (cid:105) = 1 n n (cid:88) i =1 h ∗ (Γ † X i ε i ) = 1 n n (cid:88) i =1 h ∗ ( V i ) =: R n ( h ∗ )for V i,n := Γ † X i ε i . Note that Γ † depends on n but is otherwise entirely deterministic, andhence for each n , { V ,n , . . . , V n,n } is an iid sequence. Note that (cid:112) P | h ∗ | = t n ( h ) − a n andnoting that P h ∗ = E [ h ∗ ( V )] = 0, we normalize to write √ n R n ( h ) σ ε t n ( h ) = 1 √ n n (cid:88) i =1 h ∗ ( V i,n ) σ ε ( (cid:112) P | h ∗ | + a n ) =: G n g, g ∈ G n (63)where we define the class, G n := (cid:40) h ∗ σ ε ( (cid:112) P | h ∗ | + a n ) : h ∗ ∈ J ∗ n (cid:41) (64)In other words, we have the equivalence between the expressions sup h ∈J n √ nσ ε t n ( h ) R n ( h ) andsup f ∈G n G n g . Most importantly this casts the handling of the R n term into an empiricalprocess framework.Let’s first record some basic entropic properties about G n . These entropic propertiesare particularly simple to derive precisely due to the structure of J n . Lemma A.8 (Entropic properties of G n ) . (i) The VC index of G n is V ( G n ) ≤ ( k n + 2) .(ii) A measurable cover for G n is the (constant) function F n ( g ) ≡ σ ε (cid:16) f n ( λ ) λ / k n + a n (cid:17) , for all g ∈ G n . (iii) The (cid:15) -covering number for G n satisfies for any discrete probability measure Q , N ( (cid:15) || F n || Q, , G n , L ( Q )) ≤ (cid:18) A n (cid:15) (cid:19) ν n where A n := (cid:0) KV ( G n )(16 e ) V ( G n ) (cid:1) V ( G n ) − and ν n := 2( V ( G n ) − , and (cid:15) ∈ (0 , .(iv) Assume ν n ≥ . Then the uniform entropy integral for G n satisfies, J ( δ, G n ) ≤ δ √ ν n (cid:16) (cid:112) A n /δ ) (cid:17) v) If δ ∈ (0 , is a constant that is independent of n , then for sufficiently large n , J ( δ, G n ) (cid:46) δ (cid:16)(cid:112) O ( V ( G n )) + (cid:112) O ( V ( G n )) − log δ (cid:17) (cid:46) δ O ( k n ) Proof. (i) Since J n is isomorphic to R k n , and thus J ∗ n is isomorphic to R k n . By van derVaart and Wellner (1996) Lemma 2.6.15 and Lemma 2.6.18(vii), the VC index of G n satisfies V ( G n ) ≤ ( k n + 2) .(ii) Recall Lemma A.2. Moreover, by Riesz representation theorem || h || = || h ∗ || forany h ∈ J n and where h ∗ is its unique dual. For any non-zero g ∈ G n there exists somenon-zero h ∗ ∈ J ∗ n such that, || g || = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) h ∗ σ ε ( (cid:112) P | h ∗ | + a n ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ || h ∗ || σ ε (cid:16) f n ( λ ) λ / k n || h || + a n (cid:17) ≤ σ ε (cid:16) f n ( λ ) λ / k n + a n (cid:17) (iii) By (van der Vaart and Wellner, 1996, Theorem 2.6.7), N ( (cid:15) || F n || Q, , G n , L ( Q )) ≤ KV ( G n )(16 e ) V ( G n ) (cid:18) (cid:15) (cid:19) V ( G n ) − for an universal constant K and (cid:15) ∈ (0 , A n and ν n .(iv) By part (iii) and change of variables, J ( δ, G n ) ≤ (cid:90) δ (cid:112) ν n log( A n /(cid:15) ) d (cid:15) ≤ A n √ ν n (cid:90) ∞ A n /δ √ (cid:15)(cid:15) d (cid:15) Observe we have the indefinite integral, (cid:90) √ xx d x = − √ xx − e Γ (cid:18) , x (cid:19) + constwhere here Γ( s, z ) := (cid:82) ∞ z t s − e − t d t is the upper incomplete gamma function. Since for afixed s , lim z →∞ Γ( s, z ) = 0, it follows that lim x →∞ Γ (cid:0) , x ) (cid:1) = 0. It is clear thatlim x →∞ √ xx = 0. Thus it follows, (cid:90) ∞ A n /δ √ (cid:15)(cid:15) d (cid:15) = (cid:112) A n /δ ) A n /δ + 12 e Γ (cid:18) , log( A n /δ ) (cid:19) = (cid:112) A n /δ ) A n /δ + e √ π efrc (cid:16)(cid:112) A n /δ ) (cid:17) where efrc is the complementary error function, efrc ( x ) := 1 − erf ( x ) = 1 − √ π (cid:82) x e − t d t = √ π (cid:82) ∞ x e − t d t . Using the bound efrc ( x ) ≤ e − x , we obtain the bound as displayed.45v) By (iii), ν n log A n = log K + log V ( G n ) + V ( G n ) log(16 e ) = O ( V ( G n ))and moreover, ν n = 2( V ( G n ) −
1) = O ( V ( G n )). And thus by (iv), J ( δ, G n ) ≤ δ (cid:16)(cid:112) O ( V ( G n )) + (cid:112) O ( V ( G n )) + O ( V ( G n )) − log δ (cid:17) = δ (cid:16)(cid:112) O ( V ( G n )) + (cid:112) O ( V ( G n )) − log δ (cid:17) (cid:46) δ O ( (cid:112) V ( G n ))Apply (i) which implies V ( G n ) = O ( k n ) and we have the displayed result.Next we state a slightly modified version of the key results of Chernozhukov et al.(2014) that’s applicable for our context. Theorem A.9 (Gaussian approximation to suprema of empirical processes index by VCtype classes; Chernozhukov et al. (2014)) . Fix n ≥ . Let ( G n , || · || ) be a subset of a normedseparable space of real functions f : X → R and is equipped with an envelope F n . Suppose:(i) G n is pre-Gaussian. That is, there exists a tight Gaussian random variable G P,n in l ∞ ( G n ) with mean zero and covariance function, E [ G P,n ( f ) G P,n ( g )] = P ( f g ) = E [ f ( Z ) g ( Z )] , for all f, g ∈ G n (ii) The (cid:15) -covering number of G n satisfies sup Q N ( (cid:15) || F n || Q, , G n , L ( Q )) ≤ (cid:0) A n (cid:15) (cid:1) ν n forsome ν n ≥ and A n > , and where the supremum is taken over all discrete proba-bility measures Q such that || F n || Q, > ;(iii) For some b n ≥ σ n > and q ∈ [4 , ∞ ] , we have sup f ∈G n P | f | k ≤ σ n b k − n for k = 2 , and || F n || P,q ≤ b n .Let Z n := G n f . Then for every γ ∈ (0 , , there exists a random variable ˜ Z n :=sup f ∈G n G P,n f such that P (cid:32) | Z n − ˜ Z n | > b n K n γ / n / − /q + ( b n σ n ) / K / n γ / n / + ( b n σ n K n ) / γ / n / (cid:33) ≤ C (cid:18) γ + log nn (cid:19) where K n := cν n max (cid:26) log n , (cid:16)(cid:16) (cid:113) A n b n σ n (cid:17)(cid:17) (cid:27) and c, C > are constants thatonly depend on q . emark A.3 . This result is nothing more than Corollary 2.2 of Chernozhukov et al. (2014),which is based on their key result Theorem 2.1. We refer to their paper for the proof. Butlet’s remark on what small proof modifications we need to adapt their result to our Theo-rem A.9. The major difference between our stated result and their Corollary 2.2 is the con-dition on the constant A in the covering number bound. Their Corollary 2.2 requires A ≥ e but we do not impose this requirement here. Indeed, from Lemma A.8(i), it is unnatural torequire that A n ≥ e for all n , especially since we only have an upper bound for V ( G n ) andnot a lower bound. For their proofs, the authors only require the condition A ≥ e to arriveat the uniform entropy integral condition J ( δ, F ) (cid:46) δ (cid:112) ν log( A/δ ). From Lemma A.8(iv),we have instead a slightly larger bound of J ( δ, G n ) ≤ δ √ ν n (1 + (cid:112) A n /δ )). Con-sequently and by inspecting the proofs of their Corollary 2.2, it suffices to replace theirdefinition of K n = cν (log n ∨ log Abσ ) with our slightly larger K n , then the remainder oftheir proof goes through to our case.This following result will conclude Step II. Proposition A.10.
Fix any γ ∈ (0 , and assume k n /n → . Then there exists a meanzero Gaussian process G P,n in (cid:96) ∞ ( J n ) with the displayed covariance function (17) suchthat the random variables Z n := sup h ∈J n √ nσ ε t n ( h ) R n ( h ) and (cid:101) Z n := sup h ∈J n G P,n h have, | Z n − (cid:101) Z n | (cid:46) O P (cid:32) k / n (log k n ) / (log n ) γ / n / + k / n (log k n ) / (log n ) / γ / n / + k / n (log k n ) / (log n ) / γ / n / (cid:33) Proof.
We apply Theorem A.9 with the choice of q = ∞ and recall the notations from thattheorem statement. Fix any n ≥ g ∈ G n . Firstly the second moment is, P | g | = P | h ∗ | σ ε ( (cid:112) P | h ∗ | + a n ) ≤ σ ε P | g | = 1( σ ε (cid:112) P | h ∗ | + a n ) P | h ∗ | ≤ σ ε (cid:112) P | h ∗ | + a n ) P | h ∗ | ≤ σ ε (cid:16) f n ( λ ) λ / k n || h ∗ || + a n (cid:17) (cid:112) P | h ∗ | ≤ σ ε (cid:16) f n ( λ ) λ / k n || h ∗ || + a n (cid:17) (cid:114) sup j E [ ξ j ] λ f n ( λ k n ) || h ∗ || = C f n ( λ k n ) || h ∗ || σ ε (cid:16) f n ( λ ) λ / k n || h ∗ || + a n (cid:17) ≤ C f n ( λ k n ) σ ε (cid:16) f n ( λ ) λ / k n + a n (cid:17) Thus it suffices to set, σ n := 1 σ ε ,b n := max σ ε (cid:16) f n ( λ ) λ / k n + a n (cid:17) , C f n ( λ k n ) σ ε (cid:16) f n ( λ ) λ / k n + a n (cid:17) = 1 σ ε (cid:16) f n ( λ ) λ / k n + a n (cid:17) max , C f n ( λ k n ) σ ε (cid:16) f n ( λ ) λ / k n + a n (cid:17) (cid:46) ( (cid:112) k n log k n ) max { , ( k n log k n ) } (cid:46) k / n (log k n ) / for which we obtain sup g ∈G n P | g | ≤ σ n and sup g ∈G n P | g | ≤ σ n b n and F n ≤ b n . Note that b n σ n (cid:46) k / n (log k n ) / .Let’s now obtain a bound for K n . Observe that using Lemma A.8, ν n log A n b n σ n = ν n log b n σ n + log K + log V ( G n ) + V ( G n ) log(16 e ) (cid:46) O ( k n ) O (log k n + log log k n ) + O (1) + O (log k n ) + O ( k n )= O ( k n log k n ) 48nd so ν n (cid:32) (cid:114) A n b n σ n (cid:33) = 2 ν n + 2 √ ν n (cid:114) ν n + ν n log A n b n σ n + ν n log A n b n σ n = O ( k n ) + (cid:112) O ( k n ) (cid:112) O ( k n ) + O ( k n log k n ) + O ( k n log k n )= O ( k n log k n )This implies, K n := cν n max log n , (cid:32) (cid:114) A n b n σ n (cid:33) (cid:46) max {O ( k n ) log n , O ( k n log k n ) } = O ( k n log n )where we used that k n /n → Z n there exists a random variable (cid:102) W n with which we have a mean-zero Gaussian process { G P,n ( g ) } g ∈G n with covariance function E [ G P,n ( g ) G P,n ( g )] = (cid:10) Γ / Γ † h , Γ / Γ † h (cid:11) ( || Γ / Γ † h || + a n )( || Γ / Γ † h || + a n ) , for all g , g ∈ G n such that g i = h ∗ i √ P | h ∗ i | with h ∗ i ∈ J ∗ n , i = 1 ,
2. Indeed, thanks to again to the Rieszrepresentation theorem, we can identify this Gaussian process G P,n indexed by G n withcovariance function on the left hand side with a Gaussian process indexed by J n havingthe covariance function on the right hand side. So with some abuse of notations, we canwrite (cid:101) Z n = sup g ∈G n G P,n g = sup h ∈J n G P,n h . Moreover, Z n and (cid:101) Z n satisfy, for all γ ∈ (0 , | Z n − (cid:101) Z n | = O P (cid:32) k / n (log k n ) / (log n ) γ / n / + k / n (log k n ) / (log n ) / γ / n / + k / n (log k n ) / (log n ) / γ / n / (cid:33) We can finally summarize everything and put Steps I and II together.
Theorem A.11.
Define (cid:101) Z n := sup h ∈J n G P,n h where { G P,n ( h ) } h ∈J n is a mean zero Gaus-sian process on (cid:96) ∞ ( J n ) with covariance function (17) . Then for sufficiently large n , (cid:12)(cid:12)(cid:12)(cid:12) sup h ∈J n (cid:28) √ nσ ε t n ( h ) ( ˆ ρ − ˆΠ k n ρ ) , h (cid:29) − (cid:101) Z n (cid:12)(cid:12)(cid:12)(cid:12) (cid:46) O P (cid:32) k / n (log k n ) / + k / n (log k n ) / (log n ) n / (cid:33) roof. By (19), Propositions A.7 and A.10 with an arbitrarily fixed γ ∈ (0 ,
1) and usingthe notations therein, (cid:12)(cid:12)(cid:12)(cid:12) sup h ∈J n (cid:28) √ nσ ε t n ( h ) ( ˆ ρ − ˆΠ k n ρ ) , h (cid:29) − (cid:101) Z n (cid:12)(cid:12)(cid:12)(cid:12) ≤ sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12) √ nσ ε t n ( h ) (cid:104)T n + S n + Y n , h (cid:105) (cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12) sup h ∈J n √ nσ ε t n ( h ) (cid:104)R n , x (cid:105) − (cid:101) Z n (cid:12)(cid:12)(cid:12)(cid:12) (cid:46) √ n O P (cid:32) k / n (log k n ) / n / (cid:33) + O P (cid:32) k / n (log k n ) / (log n ) γ / n / + k / n (log k n ) / (log n ) / γ / n / + k / n (log k n ) / (log n ) / γ / n / (cid:33) The result follows by taking the higher order terms.The main result of Theorem 2.1 displayed in the main text is thus simply Theorem A.11normalized by the appropriate rate.
References
Bosq, D. (2012):
Linear Processes in Function Spaces: Theory and Applications , vol. 149,Springer Science & Business Media.
Cai, T. T., P. Hall, et al. (2006): “Prediction in functional linear regression,”
TheAnnals of Statistics , 34, 2159–2179.
Cardot, H., F. Ferraty, A. Mas, and P. Sarda (2003): “Testing hypotheses in thefunctional linear model,”
Scandinavian Journal of Statistics , 30, 241–255.
Cardot, H., F. Ferraty, and P. Sarda (1999): “Functional linear model,”
Statistics& Probability Letters , 45, 11–22.
Cardot, H., A. Mas, and P. Sarda (2007): “CLT in functional linear regressionmodels,”
Probability Theory and Related Fields , 138, 325–361.
Cardot, H. and P. Sarda (2011): “Functional linear regression,” in
The Oxford Hand-book of Functional Data Analysis . Chernozhukov, V., D. Chetverikov, K. Kato, et al. (2014): “Gaussian approxi-mation of suprema of empirical processes,”
The Annals of Statistics , 42, 1564–1597.
Conway, J. B. (1994):
A Course in Functional Analysis , Springer, 2nd ed.50 rambes, C. and A. Mas (2013): “Asymptotics of prediction in functional linear regres-sion with functional outputs,”
Bernoulli , 19, 2627–2651.
Cuesta-Albertos, J. A., E. Garc´ıa-Portugu´es, M. Febrero-Bande, andW. Gonz´alez-Manteiga (2019): “Goodness-of-fit tests for the functional linear modelbased on randomly projected empirical processes,”
The Annals of Statistics , 47, 439–467.
Dauxois, J., A. Pousse, and Y. Romain (1982): “Asymptotic Theory for the PrincipalComponent Analysis of a Vector Random Function: Some Applications to StatisticalInference,”
Journal of Multivariate Analysis , 12, 136–154.
Goia, A. and P. Vieu (2016): “An introduction to recent advances in high/infinitedimensional statistics,” .
Hilgert, N., A. Mas, N. Verzelen, et al. (2013): “Minimax adaptive tests for thefunctional linear model,”
Annals of Statistics , 41, 838–869.
H¨ormann, S., (cid:32)L. Kidzi´nski, and M. Hallin (2015): “Dynamic functional principalcomponents,”
Journal of the Royal Statistical Society: Series B: Statistical Methodology ,319–348.
Horv´ath, L. and P. Kokoszka (2012):
Inference for functional data with applications ,vol. 200, Springer Science & Business Media.
Hsing, T. and R. Eubank (2015):
Theoretical Foundations of Functional Data Analysis,with an Introduction to Linear Operators , vol. 997, John Wiley & Sons.
Kato, T. (1995):
Perturbation Theory for Linear Operators , Springer Science & BusinessMedia, 2nd ed.
Leung, R. C. W. and Y.-M. Tam (2021): “Supplement to “A Small-Uniform Statisticfor the Inference of Functional Linear Regressions”,” .
Panaretos, V. M., S. Tavakoli, et al. (2013): “Fourier analysis of stationary timeseries in function space,”
The Annals of Statistics , 41, 568–603.
Powell, M. J. (1994): “A direct search optimization method that models the objec-tive and constraint functions by linear interpolation,” in
Advances in optimization andnumerical analysis , Springer, 51–67.
Ramsay, J. and B. W. Silverman (2005):
Functional Data Analysis , Springer-VerlagNew York, 2nd ed.
Runarsson, T. P. and X. Yao (2005): “Search biases in constrained evolutionary opti-mization,”
IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applicationsand Reviews) , 35, 233–243. 51 tancu-Minasian, I. M. (2012):
Fractional programming: theory, methods and applica-tions , vol. 409, Springer Science & Business Media. van der Vaart, A. W. and J. A. Wellner (1996):
Weak Convergence and EmpiricalProcesses: With Applications to Statistics , Springer.
Wang, J.-L., J.-M. Chiou, and H.-G. M¨uller (2016): “Functional Data Analysis,”
Annual Review of Statistics and Its Application , 3, 257–295.
Yao, F., H.-G. M¨uller, and J.-L. Wang (2005): “Functional linear regression analysisfor longitudinal data,”