[PDF] A Small-Uniform Statistic for the Inference of Functional Linear Regressions

Abstract

We propose a "small-uniform" statistic for the inference of the functional PCA estimator in a functional linear regression model. The literature has shown two extreme behaviors: on the one hand, the FPCA estimator does not converge in distribution in its norm topology; but on the other hand, the FPCA estimator does have a pointwise asymptotic normal distribution. Our statistic takes a middle ground between these two extremes: after a suitable rate normalization, our small-uniform statistic is constructed as the maximizer of a fractional programming problem of the FPCA estimator over a finite-dimensional subspace, and whose dimensions will grow with sample size. We show the rate for which our scalar statistic converges in probability to the supremum of a Gaussian process. The small-uniform statistic has applications in hypothesis testing. Simulations show our statistic has comparable to slightly better power properties for hypothesis testing than the two statistics of Cardot, Ferraty, Mas and Sarda (2003).

Full PDF

AA Small-Uniform Statistic for the Inference of FunctionalLinear Regressions

Raymond C. W. Leung and Yu-Man TamFebruary 21, 2021

Abstract

We propose a “small-uniform” statistic for the inference of the functional PCAestimator in a functional linear regression model. The literature has shown two extremebehaviors: on the one hand, the FPCA estimator does not converge in distribution inits norm topology; but on the other hand, the FPCA estimator does have a pointwiseasymptotic normal distribution. Our statistic takes a middle ground between these twoextremes: after a suitable rate normalization, our small-uniform statistic is constructedas the maximizer of a fractional programming problem of the FPCA estimator overa ﬁnite-dimensional subspace, and whose dimensions will grow with sample size. Weshow the rate for which our scalar statistic converges in probability to the supremum ofa Gaussian process. The small-uniform statistic has applications in hypothesis testing.Simulations show our statistic has comparable to slightly better power properties forhypothesis testing than the two statistics of Cardot, Ferraty, Mas and Sarda (2003).

Keywords and phrases:

Empirical process, functional data analysis, functional linearmodel, functional principal components estimator, Gaussian processes, hypothesis testing,supremum. 1 a r X i v : . [ m a t h . S T ] F e b he functional linear model (FLM) and its associated functional principal componentsestimator (FPCA estimator) are now staples in the statistics literature. However, whilemuch is known about the FPCA’s mean squared error convergence and consistency prop-erties, much less is known about its asymptotic distributional properties. In particular,although there are hypothesis testing procedures on the FLM, the literature has few hypoth-esis testing procedures of the FLM that are explicitly based on the FPCA slope estimate.This dearth of hypothesis testing procedures based on the estimator of the model is instark contrast to its ﬁnite-dimensional counterpart; for instance, ordinary least squares isboth an estimator of the slope and also the input of the t -tests, F -tests and many othersin tests of the ﬁnite-dimensional linear model.This paper has two main objectives. Firstly, we introduce a small-uniform statistic that is constructed out of a normalized fractional programming problem of the FPCAestimator. Theorem 2.1 is the main result of this paper and shows our small-uniformstatistic converges in probability to a supremum of a Gaussian process. This result is thebasis for a hypothesis testing procedure that explicitly depends on the FPCA estimator.Secondly, we show in numerical simulations the hypothesis testing procedure based oﬀ ofour small-uniform statistic has comparable to slightly better power properties than the twostatistics proposed in Cardot et al. (2003).The key references of our paper are Cardot et al. (2007, 2003) and Chernozhukov et al.(2014). In particular, Cardot et al. (2003) and Hilgert et al. (2013) are one of the ﬁrst fewstudies for conducting hypothesis testing on the FLM. However, as far as we understand,none of these studies base their hypothesis testing procedure on the FPCA estimator. Re-cently, Cuesta-Albertos et al. (2019) has proposed an interesting goodness-of-ﬁt test of theFLM based on random projections, and a step in its testing procedure does indeed dependon the FPCA estimator. Roughly speaking, the testing procedure of Cuesta-Albertos et al.(2019) is dependent on a single randomly drawn vector (i.e. a “direction”) of the functionalregressors’ underlying Hilbert space. To smooth out the uncertainty in just drawing a sin-gle direction, the authors recommend drawing multiple directions to thus conduct severalhypothesis tests, and the ﬁnal inference step is concluded by a multiple hypothesis testingcorrection (see their Algorithms 4.1 and 4.2). In contrast and intuitively, our small-uniformstatistic considers ﬁnitely many (but that number increases with the sample size) of thesedirections, and then look for the “largest” direction. Thus our small-uniform statistic isa single scalar and does not require multiple hypothesis testing corrections. Ramsay andSilverman (2005) is the well-known seminal survey of the functional data analysis (FDA)literature. Cardot and Sarda (2011), Horv´ath and Kokoszka (2012), Hsing and Eubank(2015), Goia and Vieu (2016) and Wang et al. (2016) are some recent surveys on theadvancements of the FDA literature.Section 1 ﬁxes notations for the FLM and reviews the two extreme asymptotic behaviorof the FPCA estimator as documented by Cardot et al. (2007). Section 2 introduces oursmall-uniform statistic. Section 3 outlines the hypothesis testing procedure based oﬀ ofour small-uniform statistic, and Section 4 shows some simulated numerical results. We2onclude in Section 5. The proofs are technical in nature and thus we gather them in theSupplementary Materials Leung and Tam (2021). Let’s begin with the standard functional linear model . Throughout this paper, we will ﬁxa suﬃciently rich probability space (Ω , F , P ) that accommodates all the random quantitiesin this paper. Let H be an arbitrary real separable inﬁnite dimensional Hilbert spaceequipped with an inner product (cid:104)· , ·(cid:105) and denote its norm as || · || . Let, Y = (cid:104) ρ, X (cid:105) + ε, (1)where Y is a real valued scalar dependent variable, X is H -valued random element, and ρ is an H -valued coeﬃcient vector. Moreover, ε is a scalar error term such that E [ ε | X ] = 0and E [ ε | X ] = σ ε . We are interested in the estimation and subsequent inference of thecoeﬃcient vector ρ .Let’s deﬁne the usual covariance and cross-covariance operators. For any x , x ∈ H ,we denote their tensor product as x ⊗ x ( h ) := (cid:104) x , h (cid:105) x for all h ∈ H . We denote the covariance operator of X as Γ : H → H ,Γ h := E [ X ⊗ X ( h )] , h ∈ H (2)and deﬁne the cross-covariance operator of X and Y as ∆ : H → R ,∆ h := E [ X ⊗ Y ( h )] , h ∈ H (3)We denote { λ j } j ∈ Z + as the sequence of sorted non-null distinct eigenvalues of Γ, λ >λ > . . . >

0, and { e j } j ∈ Z + a sequence of orthonormal eigenvectors associated with thoseeigenvalues. We assume the multiplicity of each λ j is one. From (1) we have normalequation, ∆ = Γ ρ. (4)For the H -valued random element X , there is the well-known Karhunen-Lo`eve expan-sion of X and is given by, X = ∞ (cid:88) l =1 (cid:112) λ l ξ l e l , (5)where ξ l ’s are centered real random variables such that E [ ξ l ξ l (cid:48) ] = 1 if l = l (cid:48) and 0 otherwise. This section will revisit some of the key deﬁnitions and setup from Cardot et al. (2007).Suppose we have have n independent and identically distributed observations { ( Y i , X i ) } ni =1

3f (1). We construct the empirical counterparts of Γ and ∆ as,Γ n := 1 n n (cid:88) i =1 X i ⊗ X i , (6a)∆ n := 1 n n (cid:88) i =1 X i ⊗ Y i , (6b) U n := 1 n n (cid:88) i =1 X i ⊗ ε i . (6c)Then from (1), we get the empirical normal equation∆ n = Γ n ρ + U n . (7)We denote the j th empirical eigenelement of Γ n as (ˆ λ j , ˆ e j ).As is well known in the FLM literature, we will need some sort of regularization methodto deﬁne an “approximate inverse” to Γ n . We will again follow the setup of Cardot et al.(2007) and Bosq (2012) and deﬁne the sequence δ j ’s, j = 1 , , . . . of the smallest diﬀerencebetween distinct eigenvalues of Γ as, δ := λ − λ , (8a) δ j := min { λ j − λ j +1 , λ j − − λ j } . (8b)Now take { c n } n ∈ N a sequence of strictly positive numbers tending to zero such that c n < λ and set, k n := sup { p : λ p + δ p / ≥ c n } . (9)This k n will be our truncation parameter ; note when n → ∞ we have c n →

0, which thenimplies k n ↑ ∞ .Let’s gather the assumptions of our paper here. Unless noted otherwise, we will enforcethese assumptions throughout the paper’s results and proofs. Assumption 1 (Identiﬁability) . (i) (cid:80) ∞ j =1 (cid:104) E [ XY ] ,e j (cid:105) λ j < ∞ ; and(ii) ker Γ = { } . Assumption 2 (Tail behavior) . (i) (cid:80) ∞ l =1 | (cid:104) ρ, e l (cid:105) | < ∞ ;(ii) There exists some ﬁnite M such that sup l E [ ξ l ] ≤ M < ∞ ; and iii) There exists a convex positive function λ such that for j suﬃciently large, λ j = λ ( j ) . Assumption 3 (Approximate reciprocal) . (i) f n is decreasing on [ c n , λ + δ ] ;(ii) lim n →∞ sup x ≥ c n | xf n ( x ) − | = 0 ;(iii) f (cid:48) n ( x ) exists for x ∈ [ c n , ∞ ) ; and(iv) sup s ≥ c n | sf n ( s ) − | = o (cid:16) √ n (cid:17) . Assumption 4 (Roughening the standard deviation) . There exists a sequence of positivenumbers { a n } such that a n → and a n √ k n log k n → , as n → ∞ ; and Assumption 5 (Empirical eigenvector approximations) . Assume k n is such that λ kn − λ kn +1 = O ( n / ) . Assumption 1 is a basic identiﬁability condition in a functional linear model and theseconditions are discussed in detail in Cardot et al. (1999) and Cardot et al. (2003). Assump-tion 2 corresponds to Assumption A of Cardot et al. (2003) which are basic conditions thatensure the statistical problem is correctly posed. For our purposes, however, we replaceCardot et al. (2007)’s ﬁnite fourth moment assumption on the ξ l ’s with a stronger ﬁnitesixth moment assumption. Assumption 3 corresponds to Assumption F of Cardot et al.(2003) which eﬀectively says the sequence of functions { f n } should behave like f n ( x ) ≈ /x when n is suﬃciently large. Assumption 4 is new: it says { a n } is a regularization that tendsto zero, and more importantly, tends to zero faster than the reciprocal of the eigenvaluestending to inﬁnity. Assumption 5 will be used to ensure the empirical eigenvectors of theempirical covariance operator uniformly converges in probability to the population eigen-vectors of population covariance operator.At this point, we will need to use the resolvent formalism to deﬁne an object Γ † n whichwill serve as our “approximate empirical inverse” to Γ n . For the purpose of exposition,we delegate the deﬁnition and details of this object to the supplementary materials. Toconstruct Γ † n , we will need a sequence of positive functions { f n } n ∈ N with support on [ c n , ∞ )that satisfy Assumption 3. Intuitively, the functions f n have the behavior of f n ( x ) ≈ /x when n is suﬃciently large. By Riesz functional calculus , we can deﬁne the followingquantity (see supplementary materials (Leung and Tam, 2021, (29)) for details),Γ † n := f n (Γ n ) . (10)In particular, Γ † n will serve as the approximate inverse of Γ n . We will also let ˆΠ k n denote theprojection operator from H onto span { ˆ e , . . . , ˆ e k n } , which is subspace of all possible linearcombinations of the ﬁrst k n empirical eigenvectors (equation (29) in the supplementarymaterials Leung and Tam (2021) will deﬁne ˆΠ k n precisely via Riesz functional calculus).5inally, a natural estimator of ρ from n iid observations based on (4) and (7) is the functional principal components (FPCA) estimator ,ˆ ρ := Γ † n ∆ n . (11)Cardot et al. (1999) shows this estimator is consistent for the choice of f n ( x ) ≡ /x . The motivation of our paper starts from two key insights from Cardot et al. (2007). Theirﬁrst key result (also more recently (Crambes and Mas, 2013, Theorem 8)) is that the FPCAestimator (11) cannot converge in distribution to a non-degenerate random element in thenorm topology of H . Theorem (Cardot et al. (2007), Theorem 1) . It is impossible for ˆ ρ − ρ to converge indistribution to a non-degenerate random element in the norm topology of H . This impossibility result suggests that we may not directly use the FPCA estimatorfor the purpose of inference in the norm topology of H . In contrast, uniform predictionintervals can still be constructed (see concluding remarks of Cardot et al. (2007) and(Crambes and Mas, 2013, Corollaries 10 and 11)).Their second result (see also more recently (Crambes and Mas, 2013, Theorem 9))shows the following pointwise weak convergence result. Theorem (Cardot et al. (2007), Theorem 3) . Fix any x ∈ H . Then under the sameAssumptions 2 to 3 of our paper, and under additional regularity conditions (see theirpaper for details), √ n || Γ / Γ † x || σ ε ( (cid:104) ˆ ρ, x (cid:105) − (cid:104) ˆΠ k n ρ, x (cid:105) ) (cid:32) N (0 , . For the sake of exposition, we will defer the precise deﬁnition of Γ † to the supplementarymaterials (see (Leung and Tam, 2021, (23))), but we can intuitively think of this quantity asan “approximate inverse” of the population covariance operator Γ. This result is extremelyuseful for constructing prediction intervals when we evaluate at x = X n +1 . However, therather arbitrary choice of x ∈ H renders this result impractical when the researcher isconcerned with the statistical inference.The main contribution of this paper can be thought of as “something in between”Theorem 1 and Theorem 3 of Cardot et al. (2007). This paper focuses on the study on ascalar “partial” supremum statistic W n to be deﬁned in (14). For the sake of heuristics inthis section, we will slightly blur the distinction between the empirical eigenelements andthe population eigenelements (see Remark 2.3 for the validity of this justiﬁcation). Let’smake three observations. 6he ﬁrst observation is that there is no need to consider points x in all of H in(Cardot et al., 2007, Theorem 3). Provided x (cid:54) = 0, we can multiply and divide by (cid:15) || x || ,where (cid:15) ∈ (0 , √ nσ ε (cid:104) ˆ ρ − ˆΠ k n ρ, x (cid:105)|| Γ / Γ † x || = √ nσ ε (cid:104) ˆ ρ − ˆΠ k n ρ, (cid:15)x || x || (cid:105) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Γ / Γ † (cid:15)x || x || (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (12)Of course, || (cid:15)x || x || || = (cid:15) ∈ (0 , H , we can immedi-ately conﬁne to those points in ball H := { h ∈ H : || h || ≤ } .Secondly, we can say a lot more about (Cardot et al., 2007, Theorem 3) by restrictingball H further. The main idea is to consider not all points in ball H , but consider a“small but growing” linear subspace of it. In the numerator of (12), since ˆ ρ − ˆΠ k n ρ ∈ span { ˆ e , . . . , ˆ e k n } , by the idempotent property of the projection operator, it follows for any x ∈ ball H (or in H ) we have (cid:104) ˆ ρ − ˆΠ k n ρ, x (cid:105) = (cid:104) ˆΠ k n ( ˆ ρ − ˆΠ k n ρ ) , x (cid:105) = (cid:104) ˆ ρ − ˆΠ k n ρ, ˆΠ k n x (cid:105) .Thus only points in span { ˆ e , . . . , ˆ e k n } ≈ span { e , . . . , e k n } determine the numerator of (12).Next let’s consider the denominator of (12). By the spectral decompositions of Γ / andΓ † , Γ / Γ † = ∞ (cid:88) j =1 k n (cid:88) l =1 (cid:112) λ j f n ( λ l ) P j P l = k n (cid:88) j =1 (cid:112) λ j f n ( λ j ) P j where P j is the projection of H onto the j th eigenspace ker (Γ − λ j ). More explicitly, sincethese orthogonal projections partition H , we can write any x ∈ ball H as x = (cid:80) ∞ j =1 P j x .And since ker (Γ − λ j ) ⊥ ker (Γ − λ j (cid:48) ) for j (cid:54) = j (cid:48) , this implies,Γ / Γ † x = k n (cid:88) j =1 (cid:112) λ j f n ( λ j ) ∞ (cid:88) l =1 P j P l x = k n (cid:88) j =1 (cid:112) λ j f n ( λ j ) P j x In other words, picking any x ∈ ball H with x = (cid:80) ∞ j =1 P j x versus picking h ∈ ball H with h = (cid:80) k n j =1 P j h results in the same value, Γ / Γ † x = Γ / Γ † h . And since H (and henceball H ) is assumed to be separable, we can simply assume that such h takes the form h = (cid:80) k n j =1 b j e j with (cid:80) k n j =1 b j ≤

1. In all, we argue it suﬃces to evaluate || Γ / Γ † · || on theﬁnite-dimensional domain ball H ∩ span { e , . . . , e k n } instead of on the inﬁnite-dimensionaldomain ball H .Thirdly, instead of considering σ ε || Γ / Γ † h || as the asymptotic standard deviation of (cid:104) ˆ ρ − ˆΠ k n ρ, h (cid:105) , let’s use a slightly roughened version and deﬁne t n ( h ) := || Γ / Γ † h || + a n = (cid:118)(cid:117)(cid:117)(cid:116) k n (cid:88) j =1 λ j [ f n ( λ j )] (cid:104) h, e j (cid:105) + a n (13)where we let { a n } be a sequence of nonnegative numbers tending to zero. Note and recallthat t n depends on n not just through a n but also through Γ † which depends on k n .7ssumption 4 ensures this roughening sequence tends to zero at a rate slower than the ratefor which the sequence of eigenvalues tend to zero. Finally, let’s put our above observations together. In search for a single scalar statistic, itseems reasonable to look for the largest value of (12) over the ﬁnite-dimensional domainball H ∩ span { e , . . . , e k n } . We thus have the following deﬁnition. Deﬁnition 2.1 (Small-uniform statistic) . Let ˆ ρ be the FPCA estimator (11) of the func-tional linear model (1) and let { β n } be a sequence of positive numbers with β n → ∞ as n → ∞ . Deﬁne W n := √ nσ ε β n sup h ∈J n (cid:104) ˆ ρ − ˆΠ k n ρ, h (cid:105) t n ( h ) J n := ball H ∩ span { e , . . . , e k n } (14)where t n is deﬁned in (13). We call W n the small-uniform statistic of the functional linearmodel.The real-valued scalar statistic W n is “small” because we only consider a low andﬁnite-dimensional linear subspace J n of H , even though as n becomes large this subspaceapproaches ball H . It is “uniform” because we look for the largest value over this linearsubspace J n .Recall again (Cardot et al., 2007, Theorem 3) already shows the pointwise asymptoticnormality result of the FPCA estimator. Thus under some regularity conditions and aproper rate normalization, one can expect W n to distribute like the supremum of a Gaussianprocess indexed by J n . Indeed our main result Theorem 2.1 shows precisely the rate ofconvergence under which W n and a certain Gaussian process converge to each other inprobability, and hence also in distribution. Note by linearity in h in the numerator of (14)and as the denominator t n is strictly positive, the statistic W n is almost surely nonnegativevalued. Remark . The normalization 1 /β n in (14) might seem curious.The normalization by √ n is standard, and is well expected by the pointwise asymptoticnormality result of (Cardot et al., 2007, Theorem 3). The normalization by 1 /β n is neces-sary because we need this rate to ensure some “nuisance terms” in W n converge fast enoughto zero. See the proof outline of our main result Theorem 2.1 for further explanations.In addition, our statistic is a fractional programming problem (see Stancu-Minasian(2012) for a survey). So while σ ε || Γ / Γ † h || is indeed the pointwise asymptotic standarddeviation for √ n (cid:104) ˆ ρ − ˆΠ k n ρ, h (cid:105) , we clearly see this standard deviation evaluates to zero at h = 0. Using a roughened version t n ( h ) of the standard deviation ensures the denominatorof our statistic is strictly positive. 8 emark . The optimization problem in W n is well-deﬁned. The objectivefunction is clearly continuous in H , especially since by construction t n >

0. Moreoverwe’re optimizing over J n , which is a compact set , and so the extreme value theoremapplies. Remark W n ) . As (14) is written, it is an empiricallyinfeasible quantity for several reasons. Let’s argue why putting in empirically feasibleplug-in estimates will asymptotically do no harm to our results.(a) (

Replacing the truncation parameter ) The truncation parameter k n as deﬁned in (9)depends on the unobservable population eigenvalues λ j ’s. The natural substitute isthe empirical truncationˆ k n := max { p = 1 , . . . , n : ˆ λ p + ˆ δ p / ≥ c n } , (15)where ˆ δ j is as analogously deﬁned to its population counterpart in (8) but with theempirical eigenvalues. Thanks to Assumption 2(ii) and (Bosq, 2012, § j ≥ | ˆ λ j − λ j | → k n or population truncation k n are equivalent inprobability.(b) ( Replacing the optimization domain ) The optimization domain as deﬁned in (14) isover the unobservable population eigenvectors e j ’s. The natural empirically feasibleapproach is to optimize instead over the empirical eigenvectors ˆ e j ’s. By Assumption 5,(Bosq, 2012, § E (cid:104) sup ≤ j ≤ k n || ˆ e j − e (cid:48) j || (cid:105) → n → ∞ , andwhere we have denoted e (cid:48) j := sign( (cid:104) ˆ e j , e j (cid:105) ) e j and where sign( t ) = 1 if t >

0, = 0 if t = 0,and = − t <

0. This implies optimizing over ˆ J n := ball H ∩ span { ˆ e , . . . , ˆ e k n } andball H ∩ span { e (cid:48) , . . . , e (cid:48) k n } are asymptotically equivalent in probability. By ﬁxing the“orientations” sign( (cid:104) ˆ e j , e j (cid:105) )’s, we can identify optimizing ball H ∩ span { e (cid:48) , . . . , e (cid:48) k n } with optimizing over J n .(c) ( Replacing the asymptotic standard deviation ) The asymptotic standard deviation t n as deﬁned in (13) depends on the unobservable population eigenvalues λ j ’s and eigen-vectors e j ’s. An empirically feasible version of t n is its natural plug-in estimator,ˆ t n ( h ) = (cid:118)(cid:117)(cid:117)(cid:117)(cid:116) ˆ k n (cid:88) j =1 ˆ λ j [ f n (ˆ λ j )] (cid:104) h, ˆ e j (cid:105) + a n , h ∈ ˆ J n (16)By using the arguments in (a) and (b) above, it is not diﬃcult to see that t n and ˆ t n are asymptotically the same in probability. See also (Cardot et al., 2007, Corollary 2). Clearly ball H is bounded, and a ﬁnite-dimensional subspace in an inﬁnite-dimensional Hilbert spaceis closed (in the relative topology). Thus the Heine-Borel theorem applies and so J n is compact. Consistent estimate of noise error ) It is clear the standard deviation of the error term σ ε can be replaced by any consistent estimator ˆ σ ε P → σ ε .Except for Sections 3 and 4 where we discuss numerical simulations, the rest of thissection and the proofs will use W n as deﬁned by (14). This is the paper’s main result. The proof outline sketches the two key steps to provingour result. We delegate all the proof details to the supplementary materials Leung andTam (2021). For an arbitrary set T , we denote (cid:96) ∞ ( T ) as the space of all bounded functionsfrom T to R with the uniform norm || f || T := sup t ∈ T | f ( t ) | . Theorem 2.1 (Gaussian suprema approximation of the small-uniform statistic) . AssumeAssumptions 1 to 4 hold and assume k n /n → as n → ∞ . Then for suﬃciently large n , there exists a mean-zero Gaussian process { G P,n ( h ) } h ∈J n in (cid:96) ∞ ( J n ) with covariancefunction, E [ G P,n ( h ) G P,n ( h )] = (cid:10) Γ / Γ † h , Γ / Γ † h (cid:11) ( || Γ / Γ † h || + a n ) ( || Γ / Γ † h || + a n ) , (17) for all h , h ∈ J n .Moreover, if we deﬁne the random variables (cid:101) Z n := sup h ∈J n G P,n h and (cid:102) W n := (cid:101) Z n β n , then the small-uniform statistic W n of (14) and the random variable (cid:102) W n are close togetherin probability at the rate | W n − (cid:102) W n | = O P (cid:32) k / n (log k n ) / β n + k / n (log k n ) / (log n ) n / (cid:33) . (18) In particular if k / n (log k n ) / (log n )min { β n ,n / } → , then | W n − (cid:102) W n | P → . Proof outline.

For each h ∈ J n we have the important decomposition, √ nσ ε t n ( h ) (cid:104) ˆ ρ − ˆΠ k n ρ, h (cid:105) = √ nσ ε t n ( h ) (cid:104)T n + S n + Y n + R n , h (cid:105) . (19)10here T n := (Γ † n Γ n − Π k n ) ρ, (20a) S n := (Γ † n − Γ † ) U n , (20b) Y n := (Π k n − ˆΠ k n ) ρ, (20c) R n := Γ † U n . (20d)For the sake of exposition, we defer the precise functional calculus deﬁnitions of thebounded operators Γ † n , Γ n , Γ † and Π k n to the supplementary materials.Then by triangle inequality, we have (cid:12)(cid:12)(cid:12)(cid:12) sup h ∈J n √ nσ ε t n ( h ) (cid:104) ˆ ρ − ˆΠ k n ρ, h (cid:105) − (cid:101) Z n (cid:12)(cid:12)(cid:12)(cid:12) ≤ sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12) √ nσ ε t n ( h ) (cid:104)T n + S n + Y n , h (cid:105) (cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12) sup h ∈J n √ nσ ε t n ( h ) (cid:104)R n , h (cid:105) − (cid:101) Z n (cid:12)(cid:12)(cid:12)(cid:12) . The two major steps in the proof are showing the following results for suﬃciently large n : Step I:

Asymptotic bias termssup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12) √ nσ ε t n ( h ) (cid:104)T n + S n + Y n , h (cid:105) (cid:12)(cid:12)(cid:12)(cid:12) = O P (cid:16) k / n (log k n ) / (cid:17) . (I) Step II:

Asymptotic distribution term (cid:12)(cid:12)(cid:12)(cid:12) √ nσ ε t n ( h ) sup h ∈J n (cid:104)R n , h (cid:105) − (cid:101) Z n (cid:12)(cid:12)(cid:12)(cid:12) = O P (cid:32) k / n (log k n ) / (log n ) n / (cid:33) . (II)Step (I) uses many proof arguments from Cardot et al. (2007) but we take extra carein keeping track of the rates of various bounds. Proposition A.7 of the supplementarymaterials concludes the discussions of Step (I). By our underlying real valued Hilbertspace structure, we can apply Riesz’s representation theorem to uniquely identify H withits dual H ∗ . Thus we can view the indexing of the supremum of R n by J n in Step (II) asequivalent to indexing by its dual J ∗ n , which allows us to apply the tools from empiricalprocess theory. Our desired result for Step (II) is the contents of Proposition A.10 in thesupplementary materials, which is an application of Chernozhukov et al. (2014).Once Steps (I) and (II) hold, the statistic W n of (14) and the displayed random vari-able W n are, respectively, exactly the quantities sup h ∈J n √ nσ ε t n ( h ) (cid:104) ˆ ρ − ˆΠ k n ρ, h (cid:105) and (cid:101) Z n bothnormalized by 1 /β n to achieve the rate (18).11s discussed earlier, our result is a middle ground between the non-convergence (innorm topology) and the pointwise asymptotic normality of the FPCA estimator. The keycontribution of our result is to further understand the asymptotic distributional propertiesof the FPCA estimator. Up to our knowledge, only a few select studies (most notablyCardot et al. (2007)) have studied this problem from the perspective of inference. Thereare more, but still few, studies of the asymptotic distributional properties of the FPCAestimator for the purpose of prediction; for instance, see Yao et al. (2005) and Crambesand Mas (2013), among others. Remark . Note the right hand side of Step (I)is not normalized by √ n . In other words, with just a scaling of √ n on sup h ∈J n (cid:104) ˆ ρ − ˆΠ kn ρ,h (cid:105) σ ε t n ( h ) ,its asymptotic bias terms do not converge to zero. In contrast to Cardot et al. (2007),here we do not beneﬁt from the extra smoothing in a prediction problem (cid:104) ˆ ρ − ˆΠ k n ρ, X n +1 (cid:105) ,where they show normalizing by just √ n is suﬃcient to ensure the asymptotic bias termswill vanish; see also Cai et al. (2006).The suﬃcient condition k / n (log k n ) / (log n )min { β n ,n / } → W n to converge in probability to (cid:102) W n eﬀectively depends on the speed for which the truncation parameter k n of (9) tends toinﬁnity. The speed of k n in turn depends on both the speed the eigenvalues λ j ’s tend tozero, and the speed the regularization c n tends to zero. An important application of the small-uniform statistic is hypothesis testing. Cardot et al.(2003) introduces two statistics (their D n and T n ; see Section 4.2 later) based on the normof the cross-covariance operator ∆ n to test the hypothesis H : ρ = ρ versus H : ρ (cid:54) = ρ (e.g. we can say take ρ as the zero functional). However, while their statistics test therelationship ρ = ρ versus ρ (cid:54) = ρ , it does not use an estimate of ρ to form this test. Theprocedure in Cardot et al. (2003) is, in some sense, an analysis of variance approach totesting signiﬁcance. In contrast, our hypothesis testing approach here directly uses theFPCA estimator of ρ via the small-uniform statistic W n .Let’s summarize and outline a practical recipe to applying our main result for thepurpose of hypothesis testing.1. Fix a statistical signiﬁcance level α ∈ (0 ,

1) and form the hypothesis H : ρ = 0, H : ρ (cid:54) = 0. Loosely speaking, the procedure of Cardot et al. (2003) has the following counterpart in the ﬁnite-dimensional linear model. Let y i = x (cid:62) i β + (cid:15) i be the usual linear model in ﬁnite dimensions. Suppose wehave the hypothesis β = 0. Under this null, it necessarily implies E [ y i x i ] = E [( x (cid:62) i (cid:15) i ) x i ] = E [ (cid:15) i x i ] = 0.Thus, a test of the hypothesis β = 0 is to test for zero correlation between y i and x i — such “correlationtest” does not require an estimate of β . If the hypothesis were instead H : ρ = ρ , H : ρ (cid:54) = ρ for ρ is non-zero, we consider Y (cid:48) :=

12. Perform functional PCA on the empirical covariance operator Γ n and collect theempirical eigenelements (ˆ λ j , ˆ e j )’s.3. Fix regularization parameters:(a) Based on ˆ λ and ˆ δ , pick a sequence { c n } of positive numbers and a sequence offunctions { f n } that satisfy Assumption 3.(b) Compute the empirical truncation parameter ˆ k n of (15).(c) Pick a sequence of positive numbers { a n } that satisfy Assumption 4.(d) Pick a sequence of positive numbers { β n } that satisfy ˆ k / n (log ˆ k n ) / (log n )min { β n ,n / } → ρ of (11).5. Pick a consistent estimator of the error standard deviation ˆ σ ε .6. Construct the small-uniform statistic: Numerically solve the fractional programmingproblem, W n = √ n ˆ σ ε β n sup h ∈ ˆ J n (cid:104) ˆ ρ, h (cid:105) ˆ t n ( h ) = √ n ˆ σ ε β n sup b ∈ R ˆ kn || b ||≤ (cid:80) ˆ k n j =1 b j (cid:104) ˆ ρ, ˆ e j (cid:105) (cid:18)(cid:113)(cid:80) ˆ k n j =1 ˆ λ j [ f n (ˆ λ j )] b j + a n (cid:19) . (21)7. Simulate the asymptotic distribution:(a) Simulate a mean zero Gaussian process G P,n with covariance function (17),replacing all population quantities their empirical or estimated counterparts.(b) Take the maximum value of this Gaussian process’s sample path.(c) Repeat (a) and (b) many times to get a simulated distribution of the scalarrandom variable (cid:103) W n .(d) Compute the quantile q − α ; that is, P ( (cid:102) W n ≤ q − α ) = 1 − α .8. Inference: Reject H if W n > q − α ; otherwise, accept it. Remark . It is evident there is no closed form analytical solutionto the optimization problem (21). However, some numerical optimizers can greatly beneﬁtfrom inputting a known gradient and the Hessian of the objective function. In particular,for h = (cid:80) k n j =1 b j ˆ e j with b = ( b , . . . , b k n ), let L ( b ) := (cid:104) ˆ ρ − ˆΠ k n ρ, h (cid:105) t n ( h ) = (cid:80) k n j =1 b j θ j (cid:113)(cid:80) k n j =1 b j ψ j + a n =: f ( b ) g ( b ) =: f ( b ) (cid:112) p ( b ) + a n Y − (cid:104) X, ˆΠ k n ρ (cid:105) . Then the procedure is exactly as follows but we replace the cross-covariance operator ∆of ( Y, X ) with the cross-covariance operator ∆ (cid:48) of ( Y (cid:48) , X ). θ j := (cid:104) ˆ ρ − ˆΠ k n ρ, ˆ e j (cid:105) and ψ j := (cid:113) ˆ λ j f n (ˆ λ j ). Direct calculations show the l -thelement of the gradient vector is, ∂L∂b l = 1 g (cid:18) θ l − b l ψ l fg √ p (cid:19) and the ( l (cid:48) , l )-th component of the Hessian is, ∂L∂b l (cid:48) ∂b l = 1 g (cid:20) − b l ψ l g ( θ l (cid:48) p − b l (cid:48) ψ l (cid:48) f ) − b l (cid:48) ψ l (cid:48) f √ pgp / − b l (cid:48) ψ l (cid:48) √ p (cid:18) θ l − b l ψ l fg √ p (cid:19)(cid:21) Remark . As stated, (21) is an optimization problem with anonlinear objective function with a norm inequality constraint. The norm inequality con-straint is a nonlinear constraint. However, many local and global numerical optimizersare designed to accommodate only box constraints. By using spherical coordinates, wecan replace the single norm inequality constraint with just k n number of box constraints.Speciﬁcally, pick r ∈ [0 , φ , . . . , φ k n − ∈ [0 , π ] and φ k n − ∈ [0 , π ). We can change fromspherical to Euclidean coordinates via the well-known equations: b = r cos( φ ) ,b = r sin( φ ) cos( φ ) , ... b k n − = r sin( φ ) · · · sin( φ k n − ) cos( φ k n − ) ,b k n = r sin( φ ) · · · sin( φ k n − ) sin( φ k n − ) . Let’s illustrate the small sample properties of our hypothesis testing procedure from Sec-tion 3 with numerical simulations. We focus on the Hilbert space H = L ([0 , , B , λ ) =: L ([0 , B are the usual Borel sets in [0 ,

1] and λ is the Lebesgue measure in [0 , X is a standard Brownian mo-tion on [0 , f n ( x ) = 1 /x when x ≥ c n and 0 otherwise; and (ii) “ridge” regularization where f n ( x ) = 1 / ( x + α n ) if x ≥ c n and 0 otherwise. As shown in Example 2 of Cardot et al.(2007), we require α n √ n/c n → It is well known (for instance, see Example 4.6.3 of Hsing and Eubank (2015)) the eigenele-ments of the covariance operator of Brownian motion are, λ j = 4((2 j − π ) and e j ( t ) = √ (cid:18) (2 j − π t (cid:19) . λ j = O ( j ) ≤ O ( j log j ). In particular we have δ j = λ j − λ j +1 for j ≥

2, and so λ j + δ j = j +8 j +1) π (2 j +1) (2 j − (cid:46) O ( j ). Recalling (9) and { c n } is a sequence tending to zero, the above implies the upper bound k n (cid:46) O (cid:16) √ c n (cid:17) .With respect to the required rate of our Theorem 2.1 and Assumption 4, we choose β n =(log n ) . Consequently, we have the bounds k / n (log k n ) / (log n )min { β n ,n / } (cid:46) O (cid:18) n c / n (cid:16) log √ c n (cid:17) / (cid:19) and a n √ k n log k n (cid:46) O (cid:16) a n √ c n log √ c n (cid:17) . For the ridge regularization, we also need a choiceof { α n } such that α n √ n/c n →

0. So in all, the c n , a n and α n , all tending to zero, mustalso satisfy the three requirements: (i) n c / n (cid:16) log √ c n (cid:17) / →

0; (ii) a n √ c n log √ c n → α n √ n/c n → c n = C log log n , a n = n , and α n = √ n log n . It is easy toshow these choices of c n ’s, a n ’s and α n ’s will satisfy the aforementioned requirements (i)to (iii). However in ﬁnite samples the choice of the constant C in c n has a material impacton the numerical results. We consider the choices C = λ c (deterministic case) and C = ˆ λ c (data based case) for c = 2 , , , λ j ’s as per the above displayed equation, and this correspondinglyimplies deterministic quantities k n of (9) and f n (through the deﬁning condition x ≥ c n ).In the data based case, we use the random truncation ˆ k n and the corresponding datadependent f n . The exponents c are chosen as such because they generate a good rangeof truncation parameter k n and ˆ k n values for our numerical illustrations; higher values of c imply larger values of k n and ˆ k n .We will consider three diﬀerent coeﬃcient vectors in L ([0 , • ρ ( t ) ≡ • ρ ( t ) = sin (cid:0) π t (cid:1) + sin (cid:0) π t (cid:1) + sin (cid:0) π t (cid:1) ; and • ρ ( t ) = sin(2 πt ) .The ﬁrst choice ρ is used to evaluate the size of our small-uniform statistic W n , while ρ and ρ are used to evaluate power. The second choice is a case where the coeﬃcientvector is exactly spanned by the ﬁrst three eigenvectors of the Brownian motion covarianceoperator. The third choice is an example where the coeﬃcient vector cannot be linearlyspanned by those eigenvectors. We note Cardot et al. (2003) also numerically illustratescases 1 and 3, while Cardot et al. (1999) illustrates case 2. We ﬁx the noise ε distributionas a Gaussian N (0 , σ ε ) distribution where we pick variance σ ε = − snrsnr Var( (cid:104)

X, ρ (cid:105) ), andwhere snr is the “signal-to-noise ratio” and we let it to be snr = 5% and 10%. To focus We emphasize in the deterministic case, it is only in the calculations of C = λ c , k n and f n do weassume we have perfect knowledge of the eigenvalues λ j ’s. The eigendecomposition of ∆ n are still based onthe random observations X i ’s in our simulations. σ ε , we assume throughout all these numerical simulationsthat the noise parameter σ ε is known with certainty.For each of the three example coeﬃcient vectors and each of the two noise distributions,we run n s = 2500 simulations of { ( Y i , X i ) } ni =1 for each of the sample size choices n =50 , , X i and the function ρ are discretized by 100 equispacedpoints in [0 , L ([0 , fdapace package of the R language.Once the FPCA estimator ˆ ρ is constructed as per (11), we can evaluate it pointwiseon [0 ,

1] as ˆ ρ ( t ) = (cid:80) k n j =1 (cid:104) ˆ ρ, ˆ e j (cid:105) ˆ e j ( t ). At the end of each simulation round, we will alsocompute and record the quadratic error measure,error( ρ ) = (cid:90) ( ρ ( t ) − ˆ ρ ( t )) d t, if ρ ≡

0; and (cid:82) ( ρ ( t ) − ˆ ρ ( t )) d t (cid:82) ρ ( t ) d t , otherwise. (22) W n For succinctness in discussing both the deterministic truncation case and the data driventruncation case, let’s denote K n ∈ { k n , (cid:100) ˆ k n avg (cid:101)} . Here ˆ k n avg denotes the averaged randomtruncations over the n s number of simulations for a given sample size choice n , and (cid:100)·(cid:101) isthe ceiling function. That is to say, if we use deterministic truncation we simply set K n tothe known value k n , and if we were to use a data driven truncation, we set K n to be theaveraged truncation parameter.The computation of W n is as written in (21). As mentioned above, we evaluate W n under a known standard deviation σ ε of the error distribution. The optimization step in(21) is computed using a combination of a global and local search. A uniformly randompoint in drawn in R K n and this serves as the initial point in the constrained nonlinearglobal optimizer ISRES of Runarsson and Yao (2005). To further reﬁne the solution,we take the resulting solution point and set that as the initial point in the constrainednonlinear local optimizer COBYLA of Powell (1994). The end of this procedure results inour small-uniform statistic W n . Although not thoroughly experimented in this paper, butwe sense the speciﬁc choices of these numerical optimization algorithms are not particularlyimportant.As a matter of comparison, we will also compute the D n and T n statistics of Cardot https://cran.r-project.org/web/packages/fdapace/ , version 0.5.5. We set a relative error tolerance of 10 − and a maximum of 100 runs. Again, we set a relative error tolerance of 10 − and a maximum of 100 runs.

16t al. (2003). These two statistics are deﬁned as, D n := 1 σ ε ||√ n ∆ n ˆ A n || , T n := D n − k n √ k n where ˆ A n := (cid:80) k n j =1 1ˆ λ j ˆ e j ⊗ ˆ e j . Cardot et al. (2003) show that under the null hypothe-sis, D n (cid:32) χ ( k n ) and T n (cid:32) N (0 , q χ ( k n ) , − α denote the quantile P ( χ ( k n ) ≤ q χ ( k n ) , − α ) = 1 − α and q N (0 , , − α/ denote the quantile P ( N (0 , ≤ q N (0 , , − α/ ) = 1 − α/

2. Then we reject the null hypothesis H : ρ = 0 using the D n statistic if D n > q χ ( k n ) , − α and reject the null hypothesis using the T n statistic if | T n | > √ q N (0 , , − α/ . Otherwise,we accept the null hypothesis. Notice we can only make the comparison of these two statis-tics against our small-uniform statistic W n in the deterministic truncation parameter k n case. (cid:102) W n The distribution of the supremum of our Gaussian process (cid:102) W n must be numerically simu-lated. Note in the data driven case, it necessarily implies a mismatch between the trunca-tion ˆ k n that was used to compute each small-uniform statistic W n for a given simulationepoch, and the asymptotic distribution approximation (cid:102) W n that depends on ˆ k n avg.Let’s describe our simulation procedure. We ﬁrst uniformly draw 25 points on theboundary of a K n -sphere, and then uniformly draw another 25 points in the interior ofthat K n -sphere; that is, a total of 25 = 625 of K n -vectors are drawn. We evaluate thecovariance function (17) on the Cartesian product of these 625 points, and this results in a625 ×

625 dimensional covariance matrix. We draw an observation from a 625-dimensionalmean zero multivariate normal distribution with this covariance matrix. This observationrepresents one sample path of the Gaussian process G P,n . We record the maximum valueof this sample path.We repeat the above procedure 1 . (cid:101) Z n of Theorem 2.1. Finally, we normalize (cid:101) Z n by β n = n ) to arriveat the simulated distribution of (cid:102) W n . Figure 1 plots the results of the simulations. Thequantile numbers q − α are accordingly numerically computed. For h l ∈ J n , l = 1 , h l = (cid:80) K n j =1 b j e j where b = ( b , . . . , b K n ) is a real Euclidean vector inthe K n unit sphere. Consequently, the covariance function (17) can be more explicitly written as a functionon the Cartesian product of two K n unit spheres, c n ( x, y ) = (cid:80) K n j =1 λ j f n ( λ j ) x j y j (cid:16)(cid:113)(cid:80) K n j =1 λ j f n ( λ j ) x j + a n (cid:17) (cid:16)(cid:113)(cid:80) K n j =1 λ j f n ( λ j ) y j + a n (cid:17) where || x || R Kn ≤ || y || R Kn ≤ W n C oun t n (a) c = 3 W n C oun t n (b) c = 4 W n C oun t n (c) c = 5 W n C oun t n (d) c = 7 W n C oun t n (e) c = 8 Figure 1: Histograms of the distribution of (cid:102) W n for various sample sizes n and various exponents c when the FPCA usesridge regularization. These plots are best seen in color. Details of the parameterization are described in Section 4.1.The procedure for simulating (cid:102) W n is described in Section 4.3. The histogram plots for when the FPCA uses simpleregularization are similar; they are not shown for brevity. emark . In this numerical simulation exercise, it was far more time consuming tosimulate and compute the statistic W n than simulating its asymptotic approximation (cid:102) W n .Simply put, both the spectral decomposition of Γ n and the numerical optimization stepsin computing W n are computationally expensive, and made even more so when we have todo this n s many times across various sample size choices n . Of course, in actual practicewhere W n is only computed once based on the given data, the computation time of a single W n is negligible. Choosing the coeﬃcient vector as ρ ( t ) ≡ k n ; and Table 3 (snr = 5%) and Table 1 (snr = 5%) show the case when we usea deterministic truncation k n . Thus these tables illustrate the size properties of our small-uniform statistic. Firstly, we see there is little qualitative diﬀerence of the levels betweenthe deterministic and data driven truncation cases, which suggests random variations inthe eigenelements, and hence in the determination of ˆ k n , do not substantially aﬀect theestimated levels. Thus for the remainder of this section, we will focus on the deterministictruncation k n case, as this focus allows us to further compare our W n statistic againstCardot et al. (2003)’s D n and T n statistics.Let’s focus on Table 1 with snr = 5%. We see the estimated size of our small-uniformstatistic W n (for both the reciprocal and ridge regularization cases) matches the simulatedlevels of its asymptotic distribution (cid:102) W n when the truncation k n is small. However, thismatching deteriorates as the truncation increases, and perhaps paradoxically, also deterio-rates with larger sample sizes. This can be explained from the log errors: as the truncationand sample size increases, the quality of the estimator ˆ ρ of the true coeﬃcient ρ ≡ ρ is zero, the “optimal” truncation shouldsimple be k n = 0. In all, our numerical results suggest the FPCA estimator (and especiallythe case of simple regularization) has signiﬁcant diﬃculty in estimating a null coeﬃcient.And since our small-uniform statistic W n is based on the FPCA estimator, it is thus nosurprise the size performance of W n is also necessarily hampered. In contrast, the D n and T n statistics of Cardot et al. (2003) do not depend on the FPCA estimator, and their nom-inal levels appear to be stable across truncations and sample sizes. Table 3 is the resultswith a higher snr = 10% and exhibit the same qualitative behavior of W n , D n and T n asdiscussed above.Let’s now discuss the empirical power of our statistic W n . Tables 5 (snr = 5%) and 6(snr = 10%) show the results for the power against ρ . By design, ρ is a linear combinationof the ﬁrst three eigenvectors of Γ, and so the “optimal” truncation k n for ρ is exactly 3.Hence, we should expect the best performance for all the statistics W n , D n and T n at k n = 3(i.e. c = 4). Even with a modest sample size of n = 200, it appears the empirical powerof W n (for both the simple and ridge regularizations) yield qualitatively almost identical19ower to that of D n and T n . For the other truncation cases (i.e. corresponding to c = 3 , , W n , again for both the simple and ridge regularizations, yield higherpower than D n and T n . However this observation is not without reservations. On the onehand, higher truncations lead to higher log quadratic errors of ˆ ρ . But on the other hand,it could very well be possible that the estimated coeﬃcient ˆ ρ doesn’t resemble the truecoeﬃcient ρ , but ˆ ρ nonetheless is still signiﬁcantly diﬀerent from the null vector, and thatthe optimizing nature of W n can advantage of this. Thus this suggests our small-uniformstatistic W n is robust at rejecting the null hypothesis H : ρ = 0 even if the underlingFPCA estimator ˆ ρ has high estimation error, as is most evident when using the simpleregularization, in ﬁnite samples.Finally, Tables 7 (snr = 5%) and 8 (snr = 10%) show the results for the power against ρ . This coeﬃcient vector ρ is designed such that it is not a linear combination of theeigenvectors of Γ and so higher truncations k n should yield better results. This coeﬃcientvector ρ example is particularly important because real world coeﬃcient vectors of theFLM are highly unlikely to be just simple linear combinations of the eigenvectors of Γ. Here,the power of our small-uniform statistic W n outperforms that of D n and T n , especially athigh truncations. Although it is not the purpose of this paper to empirically evaluate theperformance of various regularization regimes, it does appear that the log quadratic errorof the FPCA estimator under ridge regularization is substantially lower than when theFPCA estimator uses simple regularization.20able 1: The empirical power (in percentages) of our small-uniform W n statistic along with Cardot et al. (2003)’s D n and T n statistics when ρ ( t ) = ρ ( t ) ≡ ε i has a N (0 , σ ε ) distribution with σ ε = − snrsnr Var( (cid:104)

X, ρ (cid:105) ) with snr = 5%.Here we assume the truncation parameter k n is known. The n here refers to sample size, and c here refers to theexponent associated with the deﬁnition of c n . The “log error” here refers to the average over all the simulations of thelog of the error measure as given in (22). Section 4.3 describes our procedure to obtain the simulated levels of (cid:102) W n .The nominal levels of D n and T n are based on their respective asymptotic distributions as described in Section 4.2. (cid:102) W n (simple regularization) (cid:102) W n (ridge regularization)Simulated level Simulated level Nominal level of D n Nominal level of T n n k n log error 1 5 10 20 log error 1 5 10 20 1 5 10 20 1 5 10 20 c = 350 2 -327.44 2.24 9.60 16.80 31.04 -376.72 2.60 10.56 18.00 31.60 0.92 5.36 10.64 20.04 2.76 5.68 8.04 10.76200 2 -321.85 0.76 3.40 7.32 15.04 -349.58 0.44 3.44 7.16 14.36 0.84 4.64 9.96 20.00 2.48 4.84 7.12 10.401000 2 -450.00 1.20 5.24 9.52 18.64 -469.85 1.04 5.04 9.80 19.92 1.32 5.64 10.16 19.92 3.36 5.84 7.36 10.48 c = 450 3 -133.88 2.08 9.44 15.48 26.28 -229.18 2.12 8.52 15.36 25.96 1.24 5.44 10.84 19.96 2.72 5.52 7.36 11.28200 3 -206.36 0.44 3.60 7.04 13.92 -264.24 0.92 3.20 6.84 15.48 0.56 5.12 9.92 18.60 2.08 5.16 6.80 10.481000 3 -303.70 0.68 4.40 9.16 18.04 -339.78 0.80 4.16 8.60 16.76 0.92 4.92 10.24 19.96 2.60 4.96 7.32 10.72 c = 550 4 58.28 3.32 9.24 15.56 25.72 -139.09 1.48 6.52 11.92 21.16 1.12 5.08 9.56 19.60 2.24 5.08 6.44 12.12200 4 -42.88 1.84 7.20 12.76 24.64 -164.06 2.24 7.68 13.48 24.72 1.00 4.84 9.36 19.40 2.48 4.80 6.60 11.161000 5 -202.19 2.04 7.48 14.20 25.56 -259.44 1.80 7.08 13.64 24.80 0.68 4.76 9.32 19.40 2.12 4.52 6.52 13.48 c = 750 9 356.97 13.52 25.32 34.68 45.00 -75.02 2.64 8.04 14.48 24.76 1.16 5.12 10.32 19.60 2.32 4.52 8.12 17.60200 10 258.27 12.68 28.32 40.00 54.48 -74.13 4.28 11.88 19.04 30.60 0.88 5.64 11.40 21.40 1.60 5.04 9.08 17.601000 11 121.34 12.60 27.12 38.76 53.80 -99.45 6.28 18.80 28.72 42.92 0.96 5.40 10.92 21.00 1.64 4.44 8.84 18.68 c = 850 14 502.37 18.72 32.68 41.52 51.80 -60.25 2.40 8.00 13.48 22.72 1.60 5.96 11.52 21.16 2.40 5.48 9.76 19.76200 16 400.44 19.52 35.36 44.52 56.88 -51.52 4.28 13.16 21.04 33.24 1.40 6.08 11.72 21.72 2.00 5.56 10.60 20.761000 17 267.09 21.60 38.04 48.84 62.04 -65.88 6.32 16.24 24.52 36.32 2.16 7.56 13.44 25.36 2.96 6.40 11.04 21.40 able 2: The empirical power (in percentages) of our small-uniform W n statistic when ρ ( t ) = ρ ( t ) ≡ ε i has a N (0 , σ ε ) distribution with σ ε = − snrsnr Var( (cid:104)

X, ρ (cid:105) ) with snr = 5%. Here we use a data driven truncation parameter ˆ k n .In particular, “ˆ k n avg” is the average truncation value over n s number of simulations, and “ˆ k n std” is the associatedstandard error. The n here refers to sample size, and c here refers to the exponent associated with the deﬁnition of c n . The “log error” here refers to the average over all the simulations of the log of the error measure as given in (22).Section 4.3 describes our procedure to obtain the simulated levels of (cid:102) W n . Simulated level of (cid:102) W n (simple) Simulated level of (cid:102) W n (ridge) n ˆ k n avg ˆ k n std log error 1 5 10 20 log error 1 5 10 20 c = 350 1.47 0.60 -313.18 2.24 9.72 16.80 31.04 -363.41 2.64 10.56 18.00 31.56200 1.57 0.50 -416.98 0.76 3.36 7.28 14.80 -436.73 0.44 3.44 7.20 14.361000 1.93 0.26 -473.03 1.20 5.20 9.52 18.68 -488.62 1.04 5.08 9.88 19.92 c = 450 2.43 1.18 -137.21 2.12 9.56 15.52 26.32 -244.48 2.12 8.52 15.32 25.96200 2.45 0.59 -234.49 0.64 4.08 7.80 15.60 -289.46 1.00 3.36 7.32 15.761000 2.64 0.48 -355.21 0.72 4.60 9.48 18.40 -390.93 0.80 4.16 8.36 16.60 c = 550 3.91 2.27 29.62 3.24 9.20 15.56 25.68 -168.98 1.52 6.72 12.16 21.60200 3.85 1.07 -70.54 1.84 7.16 12.72 24.64 -184.19 2.12 7.56 13.24 24.441000 4.02 0.53 -207.77 1.92 7.28 13.96 24.84 -262.03 1.80 7.36 14.40 26.08 c = 750 10.39 7.65 332.29 13.52 25.36 34.76 45.12 -91.57 2.44 7.72 13.64 23.92200 9.59 3.53 227.29 12.68 28.12 39.56 54.00 -85.21 4.40 12.16 19.80 31.521000 9.71 1.59 86.75 13.00 28.24 39.72 55.20 -112.72 6.56 19.60 29.96 44.36 c = 850 16.41 11.60 473.32 18.96 33.04 42.32 52.56 -75.49 2.44 8.32 14.04 23.24200 15.09 6.62 360.73 19.28 34.92 44.32 56.44 -59.58 4.32 13.16 21.04 33.241000 15.12 2.83 228.74 22.56 39.12 50.28 63.60 -74.90 6.40 16.36 24.76 36.76 able 3: The empirical power (in percentages) of our small-uniform W n statistic along with Cardot et al. (2003)’s D n and T n statistics when ρ ( t ) = ρ ( t ) ≡ ε i has a N (0 , σ ε ) distribution with σ ε = − snrsnr Var( (cid:104)

X, ρ (cid:105) ) with snr = 10%.Here we assume the truncation parameter k n is known. The n here refers to sample size, and c here refers to theexponent associated with the deﬁnition of c n . The “log error” here refers to the average over all the simulations of thelog of the error measure as given in (22). Section 4.3 describes our procedure to obtain the simulated levels of (cid:102) W n .The nominal levels of D n and T n are based on their respective asymptotic distributions as described in Section 4.2. (cid:102) W n (simple regularization) (cid:102) W n (ridge regularization)Simulated level Simulated level Nominal level of D n Nominal level of T n n k n log error 1 5 10 20 log error 1 5 10 20 1 5 10 20 1 5 10 20 c = 350 2 -330.34 2.40 9.88 17.20 31.16 -366.85 2.60 9.40 17.20 31.12 0.92 4.92 9.72 20.36 2.60 5.12 7.12 10.04200 2 -313.85 0.80 3.56 6.56 13.92 -348.63 0.36 2.88 6.60 14.48 1.12 4.88 9.16 19.52 3.04 4.96 6.96 9.521000 2 -452.11 0.88 5.08 9.08 19.56 -472.10 0.48 3.92 8.88 18.16 1.04 5.32 9.16 20.08 3.12 5.44 6.92 9.44 c = 450 3 -124.35 2.88 9.36 15.96 27.44 -224.59 2.64 8.88 15.64 27.88 1.08 5.24 10.36 19.56 2.76 5.40 7.88 10.96200 3 -201.98 0.60 3.32 7.28 15.00 -266.06 0.68 3.72 7.04 13.68 0.88 4.72 9.24 19.16 1.88 4.76 6.40 9.561000 3 -302.72 0.60 3.80 7.80 15.04 -341.54 0.80 4.44 9.36 17.20 0.72 4.32 9.48 18.24 2.20 4.36 6.28 9.92 c = 550 4 57.02 2.88 9.12 15.88 24.76 -141.90 1.80 7.00 12.24 22.32 0.76 5.12 10.44 20.16 2.64 5.04 7.24 12.80200 4 -41.28 1.76 7.80 14.32 25.24 -172.30 1.32 6.96 13.24 23.04 0.80 5.04 10.24 19.76 2.36 5.00 7.44 12.201000 5 -199.48 2.04 6.84 13.52 24.56 -261.46 2.00 7.60 14.20 25.00 1.24 5.12 9.80 19.12 2.36 5.00 6.84 13.64 c = 750 9 358.46 13.96 26.52 34.44 45.64 -74.80 3.00 8.84 14.80 23.92 1.28 5.40 10.64 20.24 2.04 4.52 8.20 17.68200 10 256.43 11.08 24.84 36.08 51.00 -73.54 4.32 13.88 21.80 33.72 1.40 5.60 11.16 21.24 2.28 5.04 8.32 19.041000 11 122.28 11.64 29.28 40.64 56.68 -100.18 6.12 17.40 27.08 41.04 1.32 5.36 11.24 22.20 2.36 4.76 8.80 19.16 c = 850 14 501.98 19.24 32.88 42.36 54.24 -56.42 2.32 8.04 14.48 24.16 1.60 6.60 12.04 22.36 2.32 5.72 10.44 20.84200 16 401.06 20.04 36.96 46.76 59.60 -50.26 5.16 14.00 21.36 33.16 1.88 6.36 11.92 23.44 2.52 5.56 10.00 20.281000 17 264.41 22.44 40.72 51.48 64.32 -65.30 6.88 16.72 25.92 39.08 1.88 7.00 13.20 23.24 2.48 6.08 11.68 21.12 able 4: The empirical power (in percentages) of our small-uniform W n statistic when ρ ( t ) = ρ ( t ) ≡ ε i has a N (0 , σ ε ) distribution with σ ε = − snrsnr Var( (cid:104)

X, ρ (cid:105) ) with snr = 10%. Here we use a data driven truncation parameter ˆ k n .In particular, “ˆ k n avg” is the average truncation value over n s number of simulations, and “ˆ k n std” is the associatedstandard error. The n here refers to sample size, and c here refers to the exponent associated with the deﬁnition of c n . The “log error” here refers to the average over all the simulations of the log of the error measure as given in (22).Section 4.3 describes our procedure to obtain the simulated levels of (cid:102) W n . Simulated level of (cid:102) W n (simple) Simulated level of (cid:102) W n (ridge) n ˆ k n avg ˆ k n std log error 1 5 10 20 log error 1 5 10 20 c = 350 1.48 0.61 -311.28 2.28 9.88 17.20 31.16 -362.82 2.64 9.40 17.20 31.08200 1.57 0.50 -417.98 0.88 3.60 6.72 13.92 -439.45 0.36 2.88 6.60 14.481000 1.93 0.26 -469.98 0.88 5.08 9.00 19.56 -488.47 0.48 3.92 8.88 18.16 c = 450 2.40 1.11 -130.99 2.88 9.36 15.88 27.32 -246.59 2.56 8.80 15.64 27.76200 2.45 0.60 -229.70 0.60 3.32 7.36 15.36 -289.38 0.68 3.60 6.96 13.561000 2.63 0.48 -357.13 0.60 3.84 7.84 15.04 -386.91 0.76 4.28 9.08 16.88 c = 550 3.89 2.25 29.26 2.76 8.92 15.48 24.32 -167.34 1.72 6.64 11.84 21.60200 3.83 1.04 -70.54 1.72 7.80 14.40 25.44 -189.46 1.28 6.80 12.68 22.041000 4.03 0.52 -203.37 2.12 7.00 13.80 24.84 -264.43 2.00 7.72 14.32 25.12 c = 750 10.68 8.11 336.74 13.80 26.40 34.12 45.28 -92.96 3.00 8.88 14.92 24.40200 9.66 3.64 226.90 11.08 25.12 36.68 52.12 -85.96 4.32 14.20 22.48 35.041000 9.76 1.60 87.98 11.68 29.28 40.52 56.60 -112.98 6.36 17.72 27.32 41.40 c = 850 16.05 11.21 472.39 19.12 32.80 42.00 53.84 -69.30 2.20 7.84 14.28 23.76200 15.39 6.86 367.03 20.48 37.32 47.24 59.76 -59.13 5.16 13.88 21.20 32.721000 15.15 2.85 225.53 22.56 40.92 51.60 64.92 -74.06 6.80 16.64 25.44 38.60 able 5: The empirical power (in percentages) of our small-uniform W n statistic along with Cardot et al. (2003)’s D n and T n statistics when ρ ( t ) = ρ ( t ) = sin( πt/

2) + sin(3 πt/

2) + sin(5 πt/

2) and ε i has a N (0 , σ ε ) distribution with σ ε = − snrsnr Var( (cid:104)

X, ρ (cid:105) ) with snr = 5%. Here we assume the truncation parameter k n is known. The n here refers tosample size, and c here refers to the exponent associated with the deﬁnition of c n . The “log error” here refers to theaverage over all the simulations of the log of the error measure as given in (22). Section 4.3 describes our procedureto obtain the simulated levels of (cid:102) W n . The nominal levels of D n and T n are based on their respective asymptoticdistributions as described in Section 4.2. (cid:102) W n (simple regularization) (cid:102) W n (ridge regularization)Simulated level Simulated level Nominal level of D n Nominal level of T n n k n log error 1 5 10 20 log error 1 5 10 20 1 5 10 20 1 5 10 20 c = 350 2 -43.90 20.16 41.92 55.28 69.20 -70.48 20.36 40.76 54.28 69.00 11.60 28.44 40.72 55.84 21.00 28.80 33.92 41.24200 2 -88.55 62.76 83.32 89.88 94.92 -117.76 63.96 83.04 89.88 94.84 65.68 85.64 91.44 95.72 79.12 85.84 88.68 91.561000 2 -196.76 100.00 100.00 100.00 100.00 -206.30 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 450 3 56.50 13.88 33.04 45.28 59.52 -31.49 17.00 35.76 48.36 64.28 9.48 24.16 35.88 50.96 16.80 24.32 29.44 36.96200 3 -10.33 57.80 78.36 85.92 92.84 -68.13 58.56 77.72 86.80 92.76 59.20 79.28 86.44 93.56 71.76 79.36 83.04 86.841000 3 -117.00 100.00 100.00 100.00 100.00 -148.56 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 550 4 238.38 11.04 28.80 39.52 53.56 43.23 9.92 25.80 37.20 51.00 7.72 22.72 33.68 48.92 14.16 22.36 27.84 35.36200 4 137.54 58.84 78.12 85.60 91.80 17.25 59.20 79.60 86.84 92.96 52.40 73.40 82.08 89.56 63.76 73.04 77.64 82.801000 5 -8.25 100.00 100.00 100.00 100.00 -64.56 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 750 9 539.84 27.32 43.40 53.80 65.88 109.07 11.56 25.44 34.96 47.92 6.24 17.24 27.64 40.68 9.36 16.00 20.92 31.28200 10 450.40 70.16 84.84 90.32 94.84 114.77 57.84 76.20 83.88 91.28 40.28 61.92 72.64 83.20 47.28 59.96 66.32 73.881000 11 376.49 100.00 100.00 100.00 100.00 133.72 100.00 100.00 100.00 100.00 99.96 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 850 14 692.65 38.80 53.52 61.60 72.08 125.39 10.32 22.64 32.56 44.20 8.36 20.72 29.04 43.20 11.08 18.68 23.92 33.48200 16 617.61 81.88 91.04 94.24 96.76 141.20 60.28 77.24 85.64 91.20 45.84 66.40 75.80 84.52 51.08 63.80 69.84 76.881000 17 567.22 100.00 100.00 100.00 100.00 183.44 100.00 100.00 100.00 100.00 99.96 100.00 100.00 100.00 100.00 100.00 100.00 100.00 able 6: The empirical power (in percentages) of our small-uniform W n statistic along with Cardot et al. (2003)’s D n and T n statistics when ρ ( t ) = ρ ( t ) = sin( πt/

2) + sin(3 πt/

2) + sin(5 πt/

2) and ε i has a N (0 , σ ε ) distributionwith σ ε = − snrsnr Var( (cid:104)

X, ρ (cid:105) ) with snr = 10%. Here we assume the truncation parameter k n is known. The n hererefers to sample size, and c here refers to the exponent associated with the deﬁnition of c n . The “log error” hererefers to the average over all the simulations of the log of the error measure as given in (22). Section 4.3 describesour procedure to obtain the simulated levels of (cid:102) W n . The nominal levels of D n and T n are based on their respectiveasymptotic distributions as described in Section 4.2. (cid:102) W n (simple regularization) (cid:102) W n (ridge regularization)Simulated level Simulated level Nominal level of D n Nominal level of T n n k n log error 1 5 10 20 log error 1 5 10 20 1 5 10 20 1 5 10 20 c = 350 2 -84.04 45.64 69.24 79.76 88.36 -111.81 45.04 69.32 79.80 88.48 31.16 55.08 66.00 78.96 46.00 55.60 60.44 66.40200 2 -138.32 96.48 99.12 99.48 99.72 -169.19 96.60 99.36 99.56 99.88 96.84 99.20 99.48 99.72 98.80 99.24 99.36 99.481000 2 -234.36 100.00 100.00 100.00 100.00 -245.04 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 450 3 -8.32 35.12 59.00 70.60 82.40 -89.08 34.04 58.68 71.08 82.24 26.80 49.28 61.80 75.32 38.76 49.40 55.52 62.56200 3 -70.53 94.84 98.84 99.40 99.84 -134.69 94.68 98.76 99.32 99.80 94.92 98.56 99.52 99.84 97.44 98.56 99.08 99.521000 3 -175.07 100.00 100.00 100.00 100.00 -209.87 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 550 4 162.68 29.00 51.52 64.64 77.20 -26.20 24.76 45.36 57.20 71.80 22.28 44.08 56.40 71.04 32.28 43.56 50.52 57.48200 4 70.80 95.08 98.68 99.44 99.80 -53.39 93.72 98.28 99.16 99.80 92.96 97.88 99.12 99.60 96.16 97.88 98.36 99.201000 5 -65.50 100.00 100.00 100.00 100.00 -121.91 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 750 9 469.73 43.40 62.52 72.40 81.20 40.43 23.88 44.60 55.88 69.36 17.16 35.60 46.48 59.88 22.16 33.76 40.24 48.32200 10 394.15 96.32 99.12 99.60 99.76 53.64 93.68 97.68 98.72 99.56 84.20 94.48 96.68 98.80 88.20 93.44 95.64 96.921000 11 353.78 100.00 100.00 100.00 100.00 95.27 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 850 14 624.19 54.16 68.96 76.20 84.00 55.43 23.52 42.60 53.24 64.68 19.44 35.40 46.00 59.76 23.16 33.00 39.32 48.00200 16 568.11 98.12 99.20 99.84 99.96 79.12 93.20 97.40 98.64 99.60 85.36 92.88 95.84 97.88 87.48 91.84 93.72 96.121000 17 553.00 100.00 100.00 100.00 100.00 154.03 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 able 7: The empirical power (in percentages) of our small-uniform W n statistic along with Cardot et al. (2003)’s D n and T n statistics when ρ ( t ) = ρ ( t ) = sin(2 πt ) and ε i has a N (0 , σ ε ) distribution with σ ε = − snrsnr Var( (cid:104)

X, ρ (cid:105) )with snr = 5%. Here we assume the truncation parameter k n is known. The n here refers to sample size, and c here refers to the exponent associated with the deﬁnition of c n . The “log error” here refers to the average over allthe simulations of the log of the error measure as given in (22). Section 4.3 describes our procedure to obtain thesimulated levels of (cid:102) W n . The nominal levels of D n and T n are based on their respective asymptotic distributions asdescribed in Section 4.2. (cid:102) W n (simple regularization) (cid:102) W n (ridge regularization)Simulated level Simulated level Nominal level of D n Nominal level of T n n k n log error 1 5 10 20 log error 1 5 10 20 1 5 10 20 1 5 10 20 c = 350 2 0.08 12.84 31.64 44.00 59.76 -4.85 13.36 31.40 43.64 58.56 9.08 22.64 32.12 47.84 15.92 23.20 27.56 32.24200 2 -13.51 41.60 62.60 73.68 82.80 -16.30 39.52 62.20 73.20 82.64 50.84 71.72 80.92 88.72 64.40 72.32 76.72 81.081000 2 -21.50 99.88 100.00 100.00 100.00 -21.66 99.92 100.00 100.00 100.00 99.92 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 450 3 4.12 12.68 29.00 41.00 56.80 -12.96 12.48 29.84 40.64 55.00 10.28 24.76 35.56 50.60 17.20 24.84 30.00 36.40200 3 -43.32 48.80 69.60 79.12 87.24 -50.12 48.56 68.84 79.36 87.56 57.84 77.96 85.88 92.64 69.32 78.08 81.92 86.761000 3 -100.84 100.00 100.00 100.00 100.00 -102.20 99.92 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 550 4 53.53 11.64 27.48 37.24 51.76 -32.05 9.92 23.56 33.96 47.52 9.00 23.52 34.08 49.08 15.24 23.16 28.32 35.48200 4 -27.28 63.00 81.44 88.04 93.20 -72.30 58.76 76.92 85.16 91.80 55.52 76.36 84.48 91.16 67.28 76.04 80.72 85.001000 5 -107.67 99.96 100.00 100.00 100.00 -118.97 100.00 100.00 100.00 100.00 99.96 100.00 100.00 100.00 99.96 100.00 100.00 100.00 c = 750 9 319.01 23.00 40.76 51.08 62.52 -18.76 10.40 25.16 33.72 46.68 5.20 17.08 26.80 40.68 7.64 14.96 21.00 30.08200 10 223.07 68.04 84.44 90.40 95.64 -44.75 60.36 77.80 84.76 91.04 38.60 60.72 71.80 82.40 46.12 59.24 65.84 73.161000 11 111.44 100.00 100.00 100.00 100.00 -66.55 100.00 100.00 100.00 100.00 99.92 100.00 100.00 100.00 99.96 100.00 100.00 100.00 c = 850 14 466.78 31.60 48.80 58.12 68.44 -11.74 10.20 24.76 34.32 46.60 5.12 16.04 26.00 39.32 7.24 14.52 20.44 30.12200 16 374.11 73.68 87.40 92.04 96.24 -28.19 56.96 76.12 84.08 89.92 34.28 56.00 66.80 78.60 39.40 52.28 59.56 68.281000 17 276.11 100.00 100.00 100.00 100.00 -35.84 100.00 100.00 100.00 100.00 99.72 100.00 100.00 100.00 99.84 100.00 100.00 100.00 able 8: The empirical power (in percentages) of our small-uniform W n statistic along with Cardot et al. (2003)’s D n and T n statistics when ρ ( t ) = ρ ( t ) = sin(2 πt ) and ε i has a N (0 , σ ε ) distribution with σ ε = − snrsnr Var( (cid:104)

X, ρ (cid:105) )with snr = 10%. Here we assume the truncation parameter k n is known. The n here refers to sample size, and c here refers to the exponent associated with the deﬁnition of c n . The “log error” here refers to the average over allthe simulations of the log of the error measure as given in (22). Section 4.3 describes our procedure to obtain thesimulated levels of (cid:102) W n . The nominal levels of D n and T n are based on their respective asymptotic distributions asdescribed in Section 4.2. (cid:102) W n (simple regularization) (cid:102) W n (ridge regularization)Simulated level Simulated level Nominal level of D n Nominal level of T n n k n log error 1 5 10 20 log error 1 5 10 20 1 5 10 20 1 5 10 20 c = 350 2 -6.12 28.44 49.00 61.36 74.48 -8.02 29.52 53.04 64.12 76.36 22.36 42.60 53.56 67.60 34.28 42.88 47.08 54.08200 2 -18.25 80.48 92.08 95.72 98.40 -18.98 80.20 92.28 95.44 97.96 87.72 96.40 97.68 99.32 93.96 96.44 97.24 97.801000 2 -22.50 100.00 100.00 100.00 100.00 -22.64 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 450 3 -12.89 28.44 49.48 61.04 74.44 -20.69 29.00 49.08 61.24 73.96 25.64 46.32 59.08 71.52 36.76 46.44 52.40 60.04200 3 -59.21 89.80 96.36 98.08 99.16 -58.12 88.16 95.92 97.72 99.04 94.16 98.36 99.16 99.72 97.08 98.36 98.80 99.201000 3 -108.69 100.00 100.00 100.00 100.00 -106.68 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 550 4 1.66 29.60 49.80 61.44 72.20 -45.34 25.04 45.36 56.84 69.44 24.64 45.80 58.64 71.68 33.64 45.32 52.32 60.08200 4 -70.58 94.92 98.36 99.28 99.80 -89.75 93.36 97.76 98.96 99.64 93.20 97.56 98.76 99.64 95.80 97.44 98.32 98.881000 5 -128.01 100.00 100.00 100.00 100.00 -130.83 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 750 9 249.77 40.64 61.48 71.20 81.08 -40.74 26.52 46.76 57.16 70.00 16.52 34.60 46.44 61.48 21.84 32.84 39.40 48.48200 10 157.24 94.96 98.36 99.32 99.68 -74.06 93.76 98.08 99.00 99.56 83.00 94.04 96.64 98.44 87.64 93.56 95.12 96.881000 11 61.37 100.00 100.00 100.00 100.00 -95.50 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 c = 850 14 395.72 47.16 64.40 72.40 81.64 -36.90 25.20 44.28 54.96 67.56 13.24 31.32 41.84 56.20 17.48 28.72 35.40 44.64200 16 307.97 96.08 98.68 99.24 99.76 -62.10 92.12 97.24 98.48 99.36 76.64 90.12 94.12 97.16 80.64 88.60 91.68 94.521000 17 236.78 100.00 100.00 100.00 100.00 -65.24 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 Concluding remarks

This paper introduces a small-uniform statistic W n that is constructed as a fractional pro-gramming problem out of the FPCA estimator ˆ ρ of the slope ρ of the functional linearmodel. Our main result Theorem 2.1 shows W n converges in probability to the supre-mum of a Gaussian process (cid:102) W n . The key arguments to showing our main result are bytaking advantage of identifying the regressors’ underlying Hilbert space with its dual, andalso recent advances by Chernozhukov et al. (2014) in studying the suprema of empiricalprocesses indexed by functionals.We see two interesting directions in extending the small-uniform statistic. Firstly,while this paper focuses on the most commonly studied scalar-on-functional FLM, it seemsfeasible to extend our statistic to a functional-on-functional FLM. Secondly, the recentand growing literature on functional time series regressions represent a more excitingchallenge of extending our small-uniform statistic. In particular, it is clear one needs amodiﬁcation of Step I in our proof of Theorem 2.1 to a functional time series context.More importantly, we conjecture the required extension of our Step II will call for newresults in studying the empirical processes constructed out of dependent random variables. For example, see Panaretos et al. (2013) and H¨ormann et al. (2015). ppendices A Proofs

Throughout the proofs, we will use the following asymptotic approximation notations. Wewill always use

C, c > x, y >

0, we denote x (cid:16) y to mean cy ≤ x ≤ Cy . For a real sequence { x n } , wewill write x n (cid:46) O ( a n ) to mean there exists some sequence { y n } such that | x n | ≤ C | y n | ,and | y n | ≤ c | a n | . Likewise, if { X n } is a sequence of random variables and { a n } is adeterministic sequence, we will write X n (cid:46) O P ( a n ) to mean there exists some sequence ofrandom variables { Y n } such that | X n | ≤ C | Y n | with Y n = O P ( a n ); i.e. for all (cid:15) > C > P ( | Y n /a n | ≤ C ) ≥ − (cid:15) for all n . We will also use || · || to denotethe operator norm; that is, for a bounded operator A ∈ B ( H , H ) =: B ( H ), we denote || A || := sup || h ||≤ || Ah || . We will denote the space of compact operators on H as B ( H ). Remark

A.1 (Our proof arguments vis-´a-vis that of Cardot et al. (2007)) . Our proof argu-ments of Step I are heavily inspired by Cardot et al. (2007). Indeed, our proofs of Proposi-tions A.5 and A.6 are heavily based on the arguments of (Cardot et al., 2007, Propositions2 and 3). But we also have two signiﬁcant deviations. Firstly, a critical diﬀerence is thatwe neither deﬁne their set E j ( z ) nor use their Lemma 4. In particular, we can’t understandone their key arguments (last displayed equation on their pg 351) which seemingly requiresthe expression “sup z ∈B j E j ( z )”, of which measureability concerns arise. Instead of pursingthis argument direction, we simply recognize that the norm || ( z − Γ n ) − || is bounded aboveby the reciprocal of the distance from z ∈ B j ⊆ ρ (Γ n ) to its spectrum σ (Γ n ). And thanksto the choice of the contours C n and the event A n from Lemma A.1, this reciprocal canbe approximated by the reciprocal of the radius of B j ; see (53). This argument allows usto estimate || ( z − Γ n ) − || without the need to deal with potential measureability issuesassociated with the event E j ( z ).Secondly, we do not work with “square-roots” of the resolvent; that is, we do not writeexpressions like “( z − Γ) − / ”. It is unclear to us whether this is necessarily a well-deﬁnedobject. A lot of the work in our proofs goes to re-deriving the results of Cardot et al.(2007) using only the resolvent but without invoking a square-root of the resolvent. Thispartly explains why our convergence rates diﬀer from theirs.Let’s ﬁrst setup some preliminary deﬁnitions and results. Let ι := √−

1. Denote theorientated circle of the complex plane with center λ i and radius δ i / B i := { λ i + δ i e πιt : In general, if A is self-adjoint, then it is clear that its resolvent ( z − A ) − for z ∈ ρ ( A ) is also self-adjoint. In particular, being self-adjoint implies it is normal. But conventional deﬁnitions of the square-rootof an operator require the underlying operator to be normal and compact. Thus to deﬁne a square-root “( z − A ) − / ”, it necessarily requires ( z − A ) − to be compact (i.e. so is in B ( H )). But clearly( z − A ) ∈ B ( H ). That implies ( z − A )( z − A ) − = id H is compact — which is only possible if H isﬁnite-dimensional (a case which we explicitly do not consider throughout this paper). ∈ [0 , } . We also denote the orientated circle ˆ B i analogously with the center at ˆ λ i andradius ˆ δ i /

2. Deﬁne, C n := k n (cid:91) i =1 B i . With some abuse of notations, for the approximate reciprocal f n that satisﬁes Assump-tion 3, we will denote also f n as its analytic extension to the interior of C n . By Rieszfunctional calculus (see Conway (1994) and Kato (1995)), we can deﬁneΓ † := f n (Γ) = 12 πι (cid:90) C n ( z − Γ) − f n ( z ) d z. (23)Moreover, the projection of H onto span { e , . . . , e k n } can be written as,Π k n = 12 πι (cid:90) C n ( z − Γ) − d z. (24)Deﬁne the event, A n := k n (cid:92) j =1 (cid:26) | ˆ λ j − λ j | < δ j (cid:27) (25)The following lemma show that, asymptotically, integrating over a collection of random cir-cle traces centered at the empirical eigenvalues is equivalent to integrating over a collectionof deterministic circle traces centered at the population eigenvalues. Lemma A.1.

Let f : C → C be an analytic function. Deﬁne ˆ C n := k n (cid:91) j =1 ˆ B j (26) Then we have πι (cid:90) ˆ C n f ( z )( z − Γ n ) − d z = A n πι (cid:90) C n f ( z )( z − Γ n ) − d z + r n (27) where r n is a random operator with || r n || = O P (cid:18) k n log k n √ n (cid:19) (28) Proof.

On the event A n , and by deﬁnition of the operator valued contour integral, it isimmediate that A n πι (cid:90) ˆ C n f ( z )( z − Γ n ) − d z = A n πι (cid:90) C n f ( z )( z − Γ n ) − d z C n tothe deterministic domain C n . Thus we can write,12 πι (cid:90) ˆ C n f ( z )( z − Γ n ) − d z = A n πι (cid:90) ˆ C n f ( z )( z − Γ n ) − d z + A cn πι (cid:90) ˆ C n f ( z )( z − Γ n ) − d z ≡ A n πι (cid:90) C n f ( z )( z − Γ n ) − d z + r n It remains to show the operator r n converges to zero in probability at some appropriaterate. Fix any (cid:15) ∈ (0 , P ( || r n || > (cid:15) ) ≤ P ( A cn > (cid:15) ) = P ( A cn )At this point, the rest of the proof follows exactly as in (Cardot et al., 2007, Lemma 5),who show P ( A cn ) ≤ C √ n k n log k n . This completes the proof.

Remark

A.2 . For completeness, we should check that even on the event A n and any j =1 , . . . , k n , the integral (cid:82) C n f ( z )( z − Γ n ) − d z is well deﬁned for all z ∈ B j . This is particularlysince the resolvent ( · − Γ n ) − has singularities exactly at the eigenvalues of Γ n . Of course,if we are integrating over the random empirical contours ˆ B j the resolvent ( · − Γ n ) − is welldeﬁned by deﬁnition. The ﬁnite rank operator Γ n has spectrum σ (Γ n ) = { , ˆ λ , . . . , ˆ λ n } for which we had assumed ˆ λ > ˆ λ > · · · > ˆ λ n >

0. So immediately by deﬁnition of theevent A n , the point z is not equal to any one of ˆ λ , . . . , ˆ λ k n . However, we still need tocheck such z is not equal to any one of ˆ λ k n +1 , . . . , ˆ λ n . Because of the strictly decreasingordering of the ˆ λ j ’s, it suﬃces to check that z does not equal to ˆ λ k n +1 .For contradiction, suppose there exists some z ∈ B k n with z = ˆ λ k n +1 . Then z =ˆ λ k n +1 = λ k n + δ k n / λ k n + ( λ k n − λ k n +1 ) / λ k n / − λ k n +1 /

2. But on the event A n ,we have | ˆ λ k n − λ k n | < δ k n /

4. This implies δ k n / > | λ k n / − λ k n +1 / − λ k n | = δ k n /

2, whichis a contradiction.In all, this implies on the event A n the resolvent ( · − Γ n ) − is well deﬁned on B j for all j = 1 , . . . , k n .The primary uses of Lemma A.1 are with the case f ≡ f as f n . With onlya little more work via the Borel-Cantelli lemma, (Crambes and Mas, 2013, Proposition 13)shows P (lim sup A cn ) = 0 if ( k n log k n ) /n →

0. But for our purposes we want to keep track Actually from (Conway, 1994, Proposition VII.4.6), we clearly don’t need the strong condition that f is analytic over all of C . But this strong condition is easier to state and suﬃces for our paper.

32f the various rates of convergences and thus we do not invoke this result. Note that a lookinto the proof shows the result holds regardless of whether f is dependent on n . Indeed,Lemma A.1 motivates the deﬁnitionsˆΠ k n := A n πι (cid:90) ˆ C n ( z − Γ n ) − d z ≡ A n πι (cid:90) C n ( z − Γ n ) − d z Γ † n := A n πι (cid:90) ˆ C n f n ( z )( z − Γ n ) − d z ≡ A n πι (cid:90) C n f n ( z )( z − Γ n ) − d z (29)Let’s observe a simple bound on the roughened standard deviation t n ( h ) that we willrepeatedly use. Lemma A.2. (i) For any h ∈ J n , t n ( h ) ≥ f n ( λ ) λ / k n || h || + a n (ii) Provided Assumption 4 holds, then for n suﬃciently large, sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:46) O ( (cid:112) k n log k n ) Proof. (i): For any h ∈ J n , we can write h = (cid:80) k n j =1 b j e j for some b j ∈ R such that || h || = (cid:80) k n j =1 b j ≤ t n ( h ) = (cid:113) || Γ / Γ † h || + a n ≥ (cid:118)(cid:117)(cid:117)(cid:116) k n (cid:88) j =1 b j f n ( λ j ) λ j + a n ≥ (cid:118)(cid:117)(cid:117)(cid:116) f n ( λ ) λ k n k n (cid:88) j =1 b j + a n = f n ( λ ) λ / k n || h || + a n (ii): It is clear the supremum is not achieved at h = 0 for any n . Thus applying thecalculations of part (i) for any h ∈ J n \ { } , (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ || h || f n ( λ ) λ / k n || h || + a n ≤ f n ( λ ) λ / k n + a n Since f n ( λ ) λ / k n = (cid:16) λ O (cid:16) √ n (cid:17) + 1 (cid:17) O (cid:16)(cid:113) k n log k n (cid:17) = O (cid:16) √ k n log k n (cid:17) , we have f n ( λ ) λ / k n + a n = O (cid:16) max (cid:110) √ k n log k n , a n (cid:111)(cid:17) = O (cid:16) √ k n log k n (cid:17) where the last equality follows from As-sumption 4.As outlined in the proof outline of this paper’s main result Theorem 2.1, there are twodistinct steps to proving the result. 33 .1 Step I The T n term is directly handled by (Cardot et al., 2007, Lemma 6); we record the resulthere for completeness. Proposition A.3. If (1) holds, ||T n || = o P (cid:18) √ n (cid:19) The next result is critical in the proofs of Propositions A.6 and A.5.

Lemma A.4.

For any suﬃciently large j and n . E (cid:2) || ( z − Γ) − (Γ n − Γ) || (cid:3) (cid:46) j log jn , for all z ∈ B j Proof.

Since Γ , Γ n ∈ B ( H ) ⊆ B ( H ) then it follows that the resolvent R ( z ; Γ) := ( z − Γ) − is also in B ( H ), and thus (Γ − Γ n )( z − Γ) − ∈ B ( H ). Hence we can bound theoperator norm || · || by the Hilbert-Schmidt norm || · || HS 12 || (Γ − Γ n )( z − Γ) − || ≤ || (Γ − Γ n )( z − Γ) − || ≡ ∞ (cid:88) l =1 || (Γ − Γ n )( z − Γ) − ( e l ) || = ∞ (cid:88) l,k =1 z − λ l ) |(cid:104) (Γ − Γ n ) e l , e k (cid:105)| (30)Observe that for z ∈ B j and l (cid:54) = j , by the triangle inequality we have that | z − λ l | ≥ | λ l − λ j | . (31)In addition, by the KL expansion, we have for all l, k = 1 , , . . . E [ |(cid:104) (Γ n − Γ) e l , e k (cid:105)| ] ≤ n E (cid:104) (cid:104) X , e l (cid:105) (cid:104) X , e k (cid:105) (cid:105) ≤ Mn λ l λ k (32)Thus, applying (32) and (31) into (30) E (cid:2) || (Γ − Γ n )( z − Γ) − || (cid:3) = 4 M λ j δ j n ∞ (cid:88) k =1 λ k + 4 M n ∞ (cid:88) l (cid:54) = j λ l ( λ l − λ j ) ∞ (cid:88) k =1 λ k , (33)for all z ∈ B j . See (Conway, 1994, Exercise IX.2.19)

34t this point we need to investigate the behavior of (cid:80) l (cid:54) = j λ l ( λ l − λ j ) . Let’s decompose, (cid:88) l (cid:54) = j λ l ( λ l − λ j ) =  j − (cid:88) l =1 + j (cid:88) l = j +1 + ∞ (cid:88) l =2 j +1  λ l ( λ l − λ j ) =: T + T + T (34)By (Cardot et al., 2007, Lemma 1) where we have λ l − λ j ≥ (1 − l/j ) λ l , and recallingthat the eigenvalues are strictly decreasing, T = j − (cid:88) l =1 λ l ( λ l − λ j ) ≤ λ j − j − (cid:88) l =1 − l/j ) = j λ j −

16 ( π − ψ (1) ( j )) (35)where ψ ( m ) is the polygamma function of order m . Similarly, we have T ≤ λ j +1 λ j j π − ψ (1) ( j + 1)) (36)And since for l ≥ j + 1 we have λ j − λ l ≥ λ j − λ j +1 > λ j − λ j ≥ (1 − j/ (2 j )) λ j = 2 λ j ,this implies, T < λ j ∞ (cid:88) l =2 j +1 λ l ≤ λ j ((2 j + 1) + 1) λ j +1 < λ j j + 1) λ j = j + 12 λ j (37)where the second inequality follows from (Cardot et al., 2007, Lemma 1).Now we use the following well-known bounds of the polygamma function: for m ≥ x >

0, ( m − x m + m !2 x m +1 ≤ ( − m +1 ψ ( m ) ( x ) ≤ ( m − x m + m ! x m +1 (38)and applying (38) to (35)-(37) we obtain the bounds, T ≤ C j λ j , T ≤ C j λ j , T ≤ C j + 1 λ j (39)Putting (39) back into (34), we arrive at, (cid:88) l (cid:54) = j λ l ( λ l − λ j ) ≤ C j λ j (40) Observe that this is a diﬀerent term compared to that of (Cardot et al., 2007, Lemma 2) The polygamma function of order m is deﬁned to be the ( m + 1)th derivative of the logarithm of thegamma function; or equivalently, it is the m th derivative of the digamma function. E (cid:2) || (Γ − Γ n )( z − Γ) − || (cid:3) ≤ C n max (cid:40) λ j δ j , j λ j (cid:41) , (41)for all z ∈ B j .Applying Condition 2 shows that for suﬃciently large j ,max (cid:40) λ j δ j , j λ j (cid:41) ≤ C max (cid:8) j log j, j log j (cid:9) = Cj log j This completes the proof.

Proposition A.5.

For suﬃciently large n , sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) Y n , ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) (cid:46) O P (cid:32) k / n (log k n ) / √ n (cid:33) Proof.

Firstly using Lemma A.1, we haveˆΠ k n − Π k n ≡ πι k n (cid:88) j =1 (cid:34)(cid:90) ˆ B j ( z − Γ n ) − d z − (cid:90) B j ( z − Γ) − d z (cid:35) = 12 πι k n (cid:88) j =1 (cid:34) A n (cid:90) B j ( z − Γ n ) − d z − ( A n + A cn ) (cid:90) B j ( z − Γ) − d z (cid:35) + r n = A n πι k n (cid:88) j =1 (cid:90) B j (cid:2) ( z − Γ n ) − − ( z − Γ) − (cid:3) d z − A cn πι k n (cid:88) j =1 (cid:90) B j ( z − Γ) − d z + r n By the resolvent identity, and this is feasible only because we are on the event A n , A n πι k n (cid:88) j =1 (cid:90) B j (cid:2) ( z − Γ n ) − − ( z − Γ) − (cid:3) d z = A n πι k n (cid:88) j =1 (cid:90) B j ( z − Γ n ) − (Γ n − Γ)( z − Γ) − d z A n πι k n (cid:88) j =1 (cid:90) B j ( z − Γ n ) − (Γ n − Γ)( z − Γ) − d z =: S n + R n (42)where we deﬁne, S n := A n πι k n (cid:88) j =1 (cid:90) B j ( z − Γ) − (Γ n − Γ)( z − Γ) − d z (43a) R n := A n πι k n (cid:88) j =1 (cid:90) B j ( z − Γ) − (Γ n − Γ)( z − Γ) − (Γ n − Γ)( z − Γ n ) − d z (43b)Thus, the above equation can be rewritten as,ˆΠ k n − Π k n = S n + R n − A cn πι k n (cid:88) j =1 (cid:90) B j ( z − Γ) − d z + r n (44)By the triangle inequality and Cauchy-Schwartz inequality,sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) Y n , ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) ≤ sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) S n ρ, ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) + sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) R n ρ, ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) + A cn π k n (cid:88) j =1 (cid:90) B j sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) ( z − Γ) − ρ, ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) d z + sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) r n ρ, ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) (45)We will individually bound the four terms on the right hand side of (45). Let’s ﬁrst dis-cuss those last two remaining terms. By again Cauchy-Schwartz inequality and Lemma A.1,sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) r n ρ, ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) ≤ || ρ || sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) || r n || The || r n || term is bounded by Lemma A.1.For the third integral expression, by Cauchy-Schwartz inequality again A cn π k n (cid:88) j =1 (cid:90) B j sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) ( z − Γ) − ρ, ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) d z ≤ || ρ || sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) A cn π k n max j =1 ,...,k n sup z ∈B j || ( z − Γ) − || diam( B j ) < || ρ || sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) π A cn k n (cid:46) sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) O P (cid:18) k n log k n √ n (cid:19) k n (46)37n particular, we used that for any z ∈ B j , || ( z − Γ) − || ≤ z, σ (Γ)) = 1 δ j / σ (Γ) denotes the spectrum of Γ. By the choice of radii δ j / B j ’s, any point z ∈ B j is not an eigenvalue of Γ, which by deﬁnition implies z is in theresolvent set of Γ. Hence the ﬁrst inequality of (47) follows from standard results on thenorm of a resolvent (e.g. (Conway, 1994, Proposition VII.3.9)). The equality dist( z, σ (Γ)) = δ j / B j .In addition diam( B j ) = δ j , and that P ( A cn ) (cid:46) k n log k n √ n from the proof of Lemma A.1.Thus by Lemma A.2, the two remainder terms of (45) are of order,sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) O P (cid:18) k n log k n √ n + k n log k n √ n (cid:19) = O ( (cid:112) k n log k n ) O P (cid:18) k n log k n √ n + k n log k n √ n (cid:19) = O P (cid:32) k / n (log k n ) / √ n (cid:33) (48)Now we turn to bounding the S n term in (45).The S n term: By triangle inequality, || S n || ≤ π k n (cid:88) j =1 (cid:90) B j || ( z − Γ) − (Γ n − Γ) || || ( z − Γ) − || d z (49)Firstly, we have the bound sup z ∈B j || ( z − Γ) − || < /δ j again by (47). Thus by Lemma A.4,we have in all E [ ||S n || ] (cid:46) k n (cid:88) j =1 sup z ∈B j E [ || ( z − Γ) − (Γ n − Γ) || ] sup z ∈B j || ( z − Γ) − || diam( B j ) (cid:46) k n (cid:88) j =1 (cid:114) j log jn · δ j · δ j (cid:46) k / n (log k n ) / √ n It is worth noting our discussions of this S n term is substantially diﬀerent than that of (Cardot et al.,2007, Proposition 2). In particular, while the integral of the j th summand of S n has an explicit form dueto Dauxois et al. (1982), but for our purposes of obtaining moment bounds, knowing this explicit form isunnecessary. In contrast to our purposes, the desired computation of the predicted value E [ (cid:104) S n ρ, X n +1 (cid:105) ]in (Cardot et al., 2007, Proposition 2) gives them an extra smoothing property which can take advantageof the explicit form of S n . Indeed, an earlier draft of this paper uses analogous proofs methods of this S n term as the authors and we arrive at the same rate in (50).

38y Markov’s and Jensen’s inequality and Lemma A.2sup h ∈J n |(cid:104) S n ρ, x (cid:105)| (cid:46) sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) O P (cid:32) k / n (log k n ) / √ n (cid:33) = O P (cid:18) k n log k n √ n (cid:19) (50)This completes the discussion of the S n term.The R n term: We ﬁrst deﬁne for j = 1 , . . . , k n , T j,n := A n (cid:90) B j ( z − Γ) − (Γ n − Γ)( z − Γ) − (Γ n − Γ)( z − Γ n ) − d z (51)so that we can write R n = 12 πι k n (cid:88) j =1 T j,n By again the Cauchy-Schwartz inequality, we havesup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) T j,n ρ, ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) ≤ || ρ || sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:90) B j || ( z − Γ) − (Γ n − Γ)( z − Γ) − (Γ n − Γ)( z − Γ n ) − || A n d z ≤ || ρ || sup (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:90) B j || ( z − Γ) − (Γ n − Γ) || || ( z − Γ n ) − || A n d z (52)Recall from Remark A.2 that z ∈ B j is also in the resolvent set of Γ n . So we havedist( z, σ (Γ n )) = | z − ˆ λ j | ≥ | z − λ j | − | ˆ λ j − λ j | = δ j / − δ j / δ j /

2. By the same argumentsfor (47), we have || ( z − Γ n ) − || A n ≤ z, σ (Γ n )) A n ≤ δ j / A n ≤ δ j (cid:46) j log j (53)Consider the expectation and use Lemma A.4, E (cid:34)(cid:90) B j || ( z − Γ) − (Γ n − Γ) || d z (cid:35) = (cid:90) B j E (cid:2) || ( z − Γ) − (Γ n − Γ) || (cid:3) d z (cid:46) j log jn diam( B j )= j log jn δ j (cid:46) j log jn j log j = j n

39y Markov’s inequality, we thus have that (cid:90) B j || ( z − Γ) − (Γ n − Γ) || d z = O P (cid:18) j n (cid:19) (54)Putting (54) and (53) together we have (cid:90) B j || ( z − Γ) − (Γ n − Γ) || || ( z − Γ n ) − || A n d z (cid:46) O P (cid:18) ( j log j ) j n (cid:19) = O P (cid:18) j log jn (cid:19) So by Lemma A.2,sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) R n , ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) (cid:46) sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) O P  n k n (cid:88) j =1 j log j  (cid:46) O ( (cid:112) k n log k n ) O P (cid:18) k n log k n n (cid:19) = O P (cid:32) k / n (log k n ) / n (cid:33) (55)This completes the proof of the R n term.Summary: Now we can ﬁnally put everything together. Putting (50), (55) and (48)back into (45),sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) Y n , ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) (cid:46) O P (cid:18) k n log k n √ n (cid:19) + O P (cid:32) k / n (log k n ) / n (cid:33) + O P (cid:32) k / n (log k n ) / √ n (cid:33) This completes the proof.

Proposition A.6.

For suﬃciently large n , sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) S n , ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) (cid:46) O P (cid:32) k / n (log k n ) / √ n (cid:33) Proof.

The proof of this result closely follows the development of the proof of Proposi-tion A.5. By an entirely analogous arguments leading up to (42), we obtain the decompo-40ition sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) S n , ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) ≤ sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ht n ( h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:32) A n πι k n (cid:88) j =1 (cid:90) B j | f n ( z ) | || ( z − Γ) − (Γ n − Γ) || || ( z − Γ) − U n || d z + A n πι k n (cid:88) j =1 (cid:90) B j | f n ( z ) | || ( z − Γ) − (Γ n − Γ) || || ( z − Γ n ) − || || U n || d z + A cn πι k n (cid:88) j =1 (cid:90) B j | f n ( z ) | || ( z − Γ) − || || U n || d z + || r n || || U n || (cid:33) (56)Firstly, let’s see that U n = O P (1). Since || U n || ≤ n (cid:80) ni =1 || X i || | ε i | , taking expectationsand applying Jensen’s inequality, we have E [ || U n || ] ≤ C . By Markov’s inequality, thisimplies || U n || = O P (1) (57)Combining (57) along with the analogous arguments of the last two expressions of (48)from Proposition A.5, we have thatLast two expressions inparentheses of (56) (cid:46) O P (cid:18) k n log k n √ n (cid:19) O P (1) + O P (cid:18) k n log k n √ n (cid:19) O P (1)= O P (cid:18) k n log k n √ n (cid:19) (58)It thus suﬃces to concentrate the discussion on the ﬁrst two expressions of (56). Thanksto the arguments from Proposition A.5, we have already handled the terms || ( z − Γ n ) − || A n and (cid:82) B j || ( z − Γ) − (Γ n − Γ) || d z . Thus it remains to discuss the terms: (i) | f n ( z ) | and (ii) || ( z − Γ) − U n || . Term (i) : By Condition 1, we have thatsup z ∈B j | f n ( z ) | ≤ δ j (cid:18) C √ n (cid:19) (cid:46) ( j log j ) (cid:18) √ n (cid:19) (59) Term (ii) : Fix any z ∈ B j . By deﬁnition of the adjoint of a linear operator and usingthe Hilbert-Schmidt norm, || ( z − Γ) − U n || = || U n ( z − Γ) − || ≤ || U n ( z − Γ) − || = ∞ (cid:88) l =1 z − λ l )  n n (cid:88) i =1 (cid:104) X i , e l (cid:105) ε i + 1 n n (cid:88) i (cid:54) = j (cid:104) X i , e l (cid:105) (cid:104) X j , e l (cid:105) ε i ε j  E [ || ( z − Γ) − U n || ] ≤ σ ε n ∞ (cid:88) l =1 λ l ( z − λ l ) = σ ε n  λ j ( z − λ j ) + ∞ (cid:88) l (cid:54) = j λ l ( z − λ l )  ≤ σ ε n (cid:18) λ j ( δ j / + C j λ j (cid:19) (cid:46) n (cid:18) j log j ( j log j ) + j ( j log j ) (cid:19) = 1 n (cid:0) j log j + j log j (cid:1) (cid:46) j log jn where the third line follows from (40) in the proof of Proposition A.5. Thus by Chebyshev’sinequality, it follows we have for all z ∈ B j , || ( z − Γ) − U n || = O P (cid:32) j / (log j ) / √ n (cid:33) (60)Putting (59) and (60) together along with the already discussed terms from Proposi-tion A.5, it followsThe j th summand ofthe 1st expression inparentheses of (56) (cid:46) ( j log j ) (cid:18) √ n (cid:19) · O P (cid:32)(cid:114) j n (cid:33) · O P (cid:32) j / (log j ) / √ n (cid:33) (cid:46) O P (cid:32) j / (log j ) / n (cid:33) (61)Let’s now discuss the second expression of (56). Using (59), (54), (53) and (57), wehave The j th summand ofthe 2nd expression inparentheses of (56) (cid:46) ( j log j ) (cid:18) √ n (cid:19) · O P (cid:18) j n (cid:19) · O a . s . ( j log j ) · O P (1) (cid:46) O P (cid:18) j (log j ) n (cid:19) (62)42inally, summing (61) and (62), using (58) in (56) and using Lemma A.2,sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) S n , ht n ( h ) (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) (cid:46) O ( (cid:112) k n log k n ) (cid:32) O P (cid:32) k / n (log k n ) / n (cid:33) + O P (cid:18) k n (log k n ) n (cid:19) + O P (cid:18) k n log k n √ n (cid:19)(cid:33) = O P (cid:32) k / n (log k n ) / √ n (cid:33) This completes the proof.The following result summarizes the discussions of Step I.

Proposition A.7 (Nuisance terms converge rate) . For suﬃciently large n , sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12) (cid:104) ( T n + Y n + S n ) ρ, h (cid:105) t n ( h ) (cid:12)(cid:12)(cid:12)(cid:12) (cid:46) O P (cid:32) k / n (log k n ) / √ n (cid:33) . Proof.

By Propositions A.3, A.5 and A.6, the displayed equation on the left hand side isbounded above by O P ( (cid:112) k n log k n ) o P (cid:18) √ n (cid:19) + O P (cid:32) k / n (log k n ) / √ n (cid:33) + O P (cid:32) k / n (log k n ) / √ n (cid:33) A.2 Step II

We now move onto Step II. The key to showing that Step II holds is to cast the R n term into an empirical process theory framework and apply the approximation results ofChernozhukov et al. (2014). Let’s setup some standard notations. Let N ( (cid:15), T, || · || ) denotethe covering number of radius (cid:15) > T, || · || ). Let’s also denote theuniform entropy integral (see (van der Vaart and Wellner, 1996, Chapter 2.14)) for the set T equipped with measurable cover F , J ( δ, T ) := sup Q (cid:90) δ (cid:113) N ( (cid:15) || F || Q, , T, L ( Q )) d (cid:15) where the supremum is taken over all discrete probability measures Q with || F || Q, > T , we will denote l ∞ ( T ) as the space of all bounded functions T → R with the uniform norm || f || T := sup t ∈ T | f ( t ) | .43y the Riesz representation theorem, J n and its dual space J ∗ n are isometrically iso-morphic (this is especially since we’re working with real valued Hilbert spaces). Thus foreach h ∈ J n , we can identify h ∈ J n with h ∗ ∈ J ∗ n such that h ∗ ( · ) = (cid:104) h, ·(cid:105) . With someabuse of notations, we will write R n ( h ) ≡ n n (cid:88) i =1 (cid:104) h, Γ † X i ε i (cid:105) = 1 n n (cid:88) i =1 h ∗ (Γ † X i ε i ) = 1 n n (cid:88) i =1 h ∗ ( V i ) =: R n ( h ∗ )for V i,n := Γ † X i ε i . Note that Γ † depends on n but is otherwise entirely deterministic, andhence for each n , { V ,n , . . . , V n,n } is an iid sequence. Note that (cid:112) P | h ∗ | = t n ( h ) − a n andnoting that P h ∗ = E [ h ∗ ( V )] = 0, we normalize to write √ n R n ( h ) σ ε t n ( h ) = 1 √ n n (cid:88) i =1 h ∗ ( V i,n ) σ ε ( (cid:112) P | h ∗ | + a n ) =: G n g, g ∈ G n (63)where we deﬁne the class, G n := (cid:40) h ∗ σ ε ( (cid:112) P | h ∗ | + a n ) : h ∗ ∈ J ∗ n (cid:41) (64)In other words, we have the equivalence between the expressions sup h ∈J n √ nσ ε t n ( h ) R n ( h ) andsup f ∈G n G n g . Most importantly this casts the handling of the R n term into an empiricalprocess framework.Let’s ﬁrst record some basic entropic properties about G n . These entropic propertiesare particularly simple to derive precisely due to the structure of J n . Lemma A.8 (Entropic properties of G n ) . (i) The VC index of G n is V ( G n ) ≤ ( k n + 2) .(ii) A measurable cover for G n is the (constant) function F n ( g ) ≡ σ ε (cid:16) f n ( λ ) λ / k n + a n (cid:17) , for all g ∈ G n . (iii) The (cid:15) -covering number for G n satisﬁes for any discrete probability measure Q , N ( (cid:15) || F n || Q, , G n , L ( Q )) ≤ (cid:18) A n (cid:15) (cid:19) ν n where A n := (cid:0) KV ( G n )(16 e ) V ( G n ) (cid:1) V ( G n ) − and ν n := 2( V ( G n ) − , and (cid:15) ∈ (0 , .(iv) Assume ν n ≥ . Then the uniform entropy integral for G n satisﬁes, J ( δ, G n ) ≤ δ √ ν n (cid:16) (cid:112) A n /δ ) (cid:17) v) If δ ∈ (0 , is a constant that is independent of n , then for suﬃciently large n , J ( δ, G n ) (cid:46) δ (cid:16)(cid:112) O ( V ( G n )) + (cid:112) O ( V ( G n )) − log δ (cid:17) (cid:46) δ O ( k n ) Proof. (i) Since J n is isomorphic to R k n , and thus J ∗ n is isomorphic to R k n . By van derVaart and Wellner (1996) Lemma 2.6.15 and Lemma 2.6.18(vii), the VC index of G n satisﬁes V ( G n ) ≤ ( k n + 2) .(ii) Recall Lemma A.2. Moreover, by Riesz representation theorem || h || = || h ∗ || forany h ∈ J n and where h ∗ is its unique dual. For any non-zero g ∈ G n there exists somenon-zero h ∗ ∈ J ∗ n such that, || g || = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) h ∗ σ ε ( (cid:112) P | h ∗ | + a n ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ || h ∗ || σ ε (cid:16) f n ( λ ) λ / k n || h || + a n (cid:17) ≤ σ ε (cid:16) f n ( λ ) λ / k n + a n (cid:17) (iii) By (van der Vaart and Wellner, 1996, Theorem 2.6.7), N ( (cid:15) || F n || Q, , G n , L ( Q )) ≤ KV ( G n )(16 e ) V ( G n ) (cid:18) (cid:15) (cid:19) V ( G n ) − for an universal constant K and (cid:15) ∈ (0 , A n and ν n .(iv) By part (iii) and change of variables, J ( δ, G n ) ≤ (cid:90) δ (cid:112) ν n log( A n /(cid:15) ) d (cid:15) ≤ A n √ ν n (cid:90) ∞ A n /δ √ (cid:15)(cid:15) d (cid:15) Observe we have the indeﬁnite integral, (cid:90) √ xx d x = − √ xx − e Γ (cid:18) , x (cid:19) + constwhere here Γ( s, z ) := (cid:82) ∞ z t s − e − t d t is the upper incomplete gamma function. Since for aﬁxed s , lim z →∞ Γ( s, z ) = 0, it follows that lim x →∞ Γ (cid:0) , x ) (cid:1) = 0. It is clear thatlim x →∞ √ xx = 0. Thus it follows, (cid:90) ∞ A n /δ √ (cid:15)(cid:15) d (cid:15) = (cid:112) A n /δ ) A n /δ + 12 e Γ (cid:18) , log( A n /δ ) (cid:19) = (cid:112) A n /δ ) A n /δ + e √ π efrc (cid:16)(cid:112) A n /δ ) (cid:17) where efrc is the complementary error function, efrc ( x ) := 1 − erf ( x ) = 1 − √ π (cid:82) x e − t d t = √ π (cid:82) ∞ x e − t d t . Using the bound efrc ( x ) ≤ e − x , we obtain the bound as displayed.45v) By (iii), ν n log A n = log K + log V ( G n ) + V ( G n ) log(16 e ) = O ( V ( G n ))and moreover, ν n = 2( V ( G n ) −

1) = O ( V ( G n )). And thus by (iv), J ( δ, G n ) ≤ δ (cid:16)(cid:112) O ( V ( G n )) + (cid:112) O ( V ( G n )) + O ( V ( G n )) − log δ (cid:17) = δ (cid:16)(cid:112) O ( V ( G n )) + (cid:112) O ( V ( G n )) − log δ (cid:17) (cid:46) δ O ( (cid:112) V ( G n ))Apply (i) which implies V ( G n ) = O ( k n ) and we have the displayed result.Next we state a slightly modiﬁed version of the key results of Chernozhukov et al.(2014) that’s applicable for our context. Theorem A.9 (Gaussian approximation to suprema of empirical processes index by VCtype classes; Chernozhukov et al. (2014)) . Fix n ≥ . Let ( G n , || · || ) be a subset of a normedseparable space of real functions f : X → R and is equipped with an envelope F n . Suppose:(i) G n is pre-Gaussian. That is, there exists a tight Gaussian random variable G P,n in l ∞ ( G n ) with mean zero and covariance function, E [ G P,n ( f ) G P,n ( g )] = P ( f g ) = E [ f ( Z ) g ( Z )] , for all f, g ∈ G n (ii) The (cid:15) -covering number of G n satisﬁes sup Q N ( (cid:15) || F n || Q, , G n , L ( Q )) ≤ (cid:0) A n (cid:15) (cid:1) ν n forsome ν n ≥ and A n > , and where the supremum is taken over all discrete proba-bility measures Q such that || F n || Q, > ;(iii) For some b n ≥ σ n > and q ∈ [4 , ∞ ] , we have sup f ∈G n P | f | k ≤ σ n b k − n for k = 2 , and || F n || P,q ≤ b n .Let Z n := G n f . Then for every γ ∈ (0 , , there exists a random variable ˜ Z n :=sup f ∈G n G P,n f such that P (cid:32) | Z n − ˜ Z n | > b n K n γ / n / − /q + ( b n σ n ) / K / n γ / n / + ( b n σ n K n ) / γ / n / (cid:33) ≤ C (cid:18) γ + log nn (cid:19) where K n := cν n max (cid:26) log n , (cid:16)(cid:16) (cid:113) A n b n σ n (cid:17)(cid:17) (cid:27) and c, C > are constants thatonly depend on q . emark A.3 . This result is nothing more than Corollary 2.2 of Chernozhukov et al. (2014),which is based on their key result Theorem 2.1. We refer to their paper for the proof. Butlet’s remark on what small proof modiﬁcations we need to adapt their result to our Theo-rem A.9. The major diﬀerence between our stated result and their Corollary 2.2 is the con-dition on the constant A in the covering number bound. Their Corollary 2.2 requires A ≥ e but we do not impose this requirement here. Indeed, from Lemma A.8(i), it is unnatural torequire that A n ≥ e for all n , especially since we only have an upper bound for V ( G n ) andnot a lower bound. For their proofs, the authors only require the condition A ≥ e to arriveat the uniform entropy integral condition J ( δ, F ) (cid:46) δ (cid:112) ν log( A/δ ). From Lemma A.8(iv),we have instead a slightly larger bound of J ( δ, G n ) ≤ δ √ ν n (1 + (cid:112) A n /δ )). Con-sequently and by inspecting the proofs of their Corollary 2.2, it suﬃces to replace theirdeﬁnition of K n = cν (log n ∨ log Abσ ) with our slightly larger K n , then the remainder oftheir proof goes through to our case.This following result will conclude Step II. Proposition A.10.

Fix any γ ∈ (0 , and assume k n /n → . Then there exists a meanzero Gaussian process G P,n in (cid:96) ∞ ( J n ) with the displayed covariance function (17) suchthat the random variables Z n := sup h ∈J n √ nσ ε t n ( h ) R n ( h ) and (cid:101) Z n := sup h ∈J n G P,n h have, | Z n − (cid:101) Z n | (cid:46) O P (cid:32) k / n (log k n ) / (log n ) γ / n / + k / n (log k n ) / (log n ) / γ / n / + k / n (log k n ) / (log n ) / γ / n / (cid:33) Proof.

We apply Theorem A.9 with the choice of q = ∞ and recall the notations from thattheorem statement. Fix any n ≥ g ∈ G n . Firstly the second moment is, P | g | = P | h ∗ | σ ε ( (cid:112) P | h ∗ | + a n ) ≤ σ ε P | g | = 1( σ ε (cid:112) P | h ∗ | + a n ) P | h ∗ | ≤ σ ε (cid:112) P | h ∗ | + a n ) P | h ∗ | ≤ σ ε (cid:16) f n ( λ ) λ / k n || h ∗ || + a n (cid:17) (cid:112) P | h ∗ | ≤ σ ε (cid:16) f n ( λ ) λ / k n || h ∗ || + a n (cid:17) (cid:114) sup j E [ ξ j ] λ f n ( λ k n ) || h ∗ || = C f n ( λ k n ) || h ∗ || σ ε (cid:16) f n ( λ ) λ / k n || h ∗ || + a n (cid:17) ≤ C f n ( λ k n ) σ ε (cid:16) f n ( λ ) λ / k n + a n (cid:17) Thus it suﬃces to set, σ n := 1 σ ε ,b n := max  σ ε (cid:16) f n ( λ ) λ / k n + a n (cid:17) , C f n ( λ k n ) σ ε (cid:16) f n ( λ ) λ / k n + a n (cid:17)  = 1 σ ε (cid:16) f n ( λ ) λ / k n + a n (cid:17) max  , C f n ( λ k n ) σ ε (cid:16) f n ( λ ) λ / k n + a n (cid:17)  (cid:46) ( (cid:112) k n log k n ) max { , ( k n log k n ) } (cid:46) k / n (log k n ) / for which we obtain sup g ∈G n P | g | ≤ σ n and sup g ∈G n P | g | ≤ σ n b n and F n ≤ b n . Note that b n σ n (cid:46) k / n (log k n ) / .Let’s now obtain a bound for K n . Observe that using Lemma A.8, ν n log A n b n σ n = ν n log b n σ n + log K + log V ( G n ) + V ( G n ) log(16 e ) (cid:46) O ( k n ) O (log k n + log log k n ) + O (1) + O (log k n ) + O ( k n )= O ( k n log k n ) 48nd so ν n (cid:32) (cid:114) A n b n σ n (cid:33) = 2 ν n + 2 √ ν n (cid:114) ν n + ν n log A n b n σ n + ν n log A n b n σ n = O ( k n ) + (cid:112) O ( k n ) (cid:112) O ( k n ) + O ( k n log k n ) + O ( k n log k n )= O ( k n log k n )This implies, K n := cν n max  log n , (cid:32) (cid:114) A n b n σ n (cid:33)  (cid:46) max {O ( k n ) log n , O ( k n log k n ) } = O ( k n log n )where we used that k n /n → Z n there exists a random variable (cid:102) W n with which we have a mean-zero Gaussian process { G P,n ( g ) } g ∈G n with covariance function E [ G P,n ( g ) G P,n ( g )] = (cid:10) Γ / Γ † h , Γ / Γ † h (cid:11) ( || Γ / Γ † h || + a n )( || Γ / Γ † h || + a n ) , for all g , g ∈ G n such that g i = h ∗ i √ P | h ∗ i | with h ∗ i ∈ J ∗ n , i = 1 ,

2. Indeed, thanks to again to the Rieszrepresentation theorem, we can identify this Gaussian process G P,n indexed by G n withcovariance function on the left hand side with a Gaussian process indexed by J n havingthe covariance function on the right hand side. So with some abuse of notations, we canwrite (cid:101) Z n = sup g ∈G n G P,n g = sup h ∈J n G P,n h . Moreover, Z n and (cid:101) Z n satisfy, for all γ ∈ (0 , | Z n − (cid:101) Z n | = O P (cid:32) k / n (log k n ) / (log n ) γ / n / + k / n (log k n ) / (log n ) / γ / n / + k / n (log k n ) / (log n ) / γ / n / (cid:33) We can ﬁnally summarize everything and put Steps I and II together.

Theorem A.11.

Deﬁne (cid:101) Z n := sup h ∈J n G P,n h where { G P,n ( h ) } h ∈J n is a mean zero Gaus-sian process on (cid:96) ∞ ( J n ) with covariance function (17) . Then for suﬃciently large n , (cid:12)(cid:12)(cid:12)(cid:12) sup h ∈J n (cid:28) √ nσ ε t n ( h ) ( ˆ ρ − ˆΠ k n ρ ) , h (cid:29) − (cid:101) Z n (cid:12)(cid:12)(cid:12)(cid:12) (cid:46) O P (cid:32) k / n (log k n ) / + k / n (log k n ) / (log n ) n / (cid:33) roof. By (19), Propositions A.7 and A.10 with an arbitrarily ﬁxed γ ∈ (0 ,

1) and usingthe notations therein, (cid:12)(cid:12)(cid:12)(cid:12) sup h ∈J n (cid:28) √ nσ ε t n ( h ) ( ˆ ρ − ˆΠ k n ρ ) , h (cid:29) − (cid:101) Z n (cid:12)(cid:12)(cid:12)(cid:12) ≤ sup h ∈J n (cid:12)(cid:12)(cid:12)(cid:12) √ nσ ε t n ( h ) (cid:104)T n + S n + Y n , h (cid:105) (cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12) sup h ∈J n √ nσ ε t n ( h ) (cid:104)R n , x (cid:105) − (cid:101) Z n (cid:12)(cid:12)(cid:12)(cid:12) (cid:46) √ n O P (cid:32) k / n (log k n ) / n / (cid:33) + O P (cid:32) k / n (log k n ) / (log n ) γ / n / + k / n (log k n ) / (log n ) / γ / n / + k / n (log k n ) / (log n ) / γ / n / (cid:33) The result follows by taking the higher order terms.The main result of Theorem 2.1 displayed in the main text is thus simply Theorem A.11normalized by the appropriate rate.

References

Bosq, D. (2012):

Linear Processes in Function Spaces: Theory and Applications , vol. 149,Springer Science & Business Media.

Cai, T. T., P. Hall, et al. (2006): “Prediction in functional linear regression,”

TheAnnals of Statistics , 34, 2159–2179.

Cardot, H., F. Ferraty, A. Mas, and P. Sarda (2003): “Testing hypotheses in thefunctional linear model,”

Scandinavian Journal of Statistics , 30, 241–255.

Cardot, H., F. Ferraty, and P. Sarda (1999): “Functional linear model,”

Statistics& Probability Letters , 45, 11–22.

Cardot, H., A. Mas, and P. Sarda (2007): “CLT in functional linear regressionmodels,”

Probability Theory and Related Fields , 138, 325–361.

Cardot, H. and P. Sarda (2011): “Functional linear regression,” in

The Oxford Hand-book of Functional Data Analysis . Chernozhukov, V., D. Chetverikov, K. Kato, et al. (2014): “Gaussian approxi-mation of suprema of empirical processes,”

The Annals of Statistics , 42, 1564–1597.

Conway, J. B. (1994):

A Course in Functional Analysis , Springer, 2nd ed.50 rambes, C. and A. Mas (2013): “Asymptotics of prediction in functional linear regres-sion with functional outputs,”

Bernoulli , 19, 2627–2651.

Cuesta-Albertos, J. A., E. Garc´ıa-Portugu´es, M. Febrero-Bande, andW. Gonz´alez-Manteiga (2019): “Goodness-of-ﬁt tests for the functional linear modelbased on randomly projected empirical processes,”

The Annals of Statistics , 47, 439–467.

Dauxois, J., A. Pousse, and Y. Romain (1982): “Asymptotic Theory for the PrincipalComponent Analysis of a Vector Random Function: Some Applications to StatisticalInference,”

Journal of Multivariate Analysis , 12, 136–154.

Goia, A. and P. Vieu (2016): “An introduction to recent advances in high/inﬁnitedimensional statistics,” .

Hilgert, N., A. Mas, N. Verzelen, et al. (2013): “Minimax adaptive tests for thefunctional linear model,”

Annals of Statistics , 41, 838–869.

H¨ormann, S., (cid:32)L. Kidzi´nski, and M. Hallin (2015): “Dynamic functional principalcomponents,”

Journal of the Royal Statistical Society: Series B: Statistical Methodology ,319–348.

Horv´ath, L. and P. Kokoszka (2012):

Inference for functional data with applications ,vol. 200, Springer Science & Business Media.

Hsing, T. and R. Eubank (2015):

Theoretical Foundations of Functional Data Analysis,with an Introduction to Linear Operators , vol. 997, John Wiley & Sons.

Kato, T. (1995):

Perturbation Theory for Linear Operators , Springer Science & BusinessMedia, 2nd ed.

Leung, R. C. W. and Y.-M. Tam (2021): “Supplement to “A Small-Uniform Statisticfor the Inference of Functional Linear Regressions”,” .

Panaretos, V. M., S. Tavakoli, et al. (2013): “Fourier analysis of stationary timeseries in function space,”

The Annals of Statistics , 41, 568–603.

Powell, M. J. (1994): “A direct search optimization method that models the objec-tive and constraint functions by linear interpolation,” in

Advances in optimization andnumerical analysis , Springer, 51–67.

Ramsay, J. and B. W. Silverman (2005):

Functional Data Analysis , Springer-VerlagNew York, 2nd ed.

Runarsson, T. P. and X. Yao (2005): “Search biases in constrained evolutionary opti-mization,”

IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applicationsand Reviews) , 35, 233–243. 51 tancu-Minasian, I. M. (2012):

Fractional programming: theory, methods and applica-tions , vol. 409, Springer Science & Business Media. van der Vaart, A. W. and J. A. Wellner (1996):

Weak Convergence and EmpiricalProcesses: With Applications to Statistics , Springer.

Wang, J.-L., J.-M. Chiou, and H.-G. M¨uller (2016): “Functional Data Analysis,”

Annual Review of Statistics and Its Application , 3, 257–295.

Yao, F., H.-G. M¨uller, and J.-L. Wang (2005): “Functional linear regression analysisfor longitudinal data,”