Factor-augmented Smoothing Model for Functional Data
Yuan Gao*, Research School of Finance, Actuarial Studies and Statistics, Australian National University
Han Lin Shang, Department of Actuarial Studies and Business Analytics, Macquarie University
Yanrong Yang, Research School of Finance, Actuarial Studies and Statistics, Australian National University

February 5, 2021
Abstract
We propose modeling raw functional data as a mixture of a smooth function and a high-dimensional factor component. The conventional approach to retrieving the smooth function from the raw data is through various smoothing techniques. However, the smoothing model is not adequate to recover the smooth curve or capture the data variation in some situations. These include cases where there is a large amount of measurement error, the smoothing basis functions are incorrectly identified, or the step jumps in the functional mean levels are neglected. To address these challenges, a factor-augmented smoothing model is proposed, and an iterative numerical estimation approach is implemented in practice. Including the factor model component in the proposed method solves the aforementioned problems, since a few common factors often drive the variation that cannot be captured by the smoothing model. Asymptotic theorems are also established to demonstrate the effects of including factor structures on the smoothing results. Specifically, we show that the smoothing coefficients projected on the complement space of the factor loading matrix are asymptotically normal. As a byproduct of independent interest, an estimator for the population covariance matrix of the raw data is presented based on the proposed model. Extensive simulation studies illustrate that these factor adjustments are essential in improving estimation accuracy and avoiding the curse of dimensionality. The superiority of our model is also shown in modeling Canadian weather data and Australian temperature data.
Keywords:
Basis function misspecification; Functional data smoothing; High-dimensional factor model; Measurement error; Statistical inference on covariance estimation

*Postal address: Research School of Finance, Actuarial Studies and Statistics, Level 4, Building 26C, Kingsley St, Australian National University, Canberra, ACT 2601, Australia; Email: [email protected]

Introduction
With the increasing capability to store data, functional data analysis (FDA) has received growing attention over the last 20 years. Functional data are considered realizations of smooth random objects in graphical representations of curves, images, and shapes. The monographs of Ramsay & Silverman (2002, 2005) and Ramsay & Hooker (2017) provide a comprehensive account of the methodology and applications of FDA; other relevant monographs include Ferraty & Vieu (2006) and Horváth & Kokoszka (2012). More recent advances in this field can be found in many survey papers (see, e.g., Cuevas 2014, Febrero-Bande et al. 2017, Goia & Vieu 2016, Reiss et al. 2017, Wang et al. 2016). One main challenge in FDA lies in the fact that we cannot observe functional curves directly, but only discrete points, which are often contaminated by measurement errors. To model a mixture of functional data and high-dimensional measurement error, we introduce a factor-augmented smoothing model (FASM).

We denote a random sample of $n$ functional data as $X_i(u)$, $i = 1, \dots, n$, and $u \in \mathcal{I} \subset \mathbb{R}$, where $\mathcal{I}$ is a compact interval on the real line $\mathbb{R}$. In practice, the observed data are discrete points and are often contaminated by noise or measurement error. We use $Y_{ij}$ to represent the $j$th observation on the $i$th subject; the observed data can then be expressed as a "signal plus noise" model:
$$Y_{ij} = X_i(u_j) + \eta_{ij}, \quad j = 1, \dots, p, \quad i = 1, \dots, n.$$
We use $X_i(u_j)$ to denote the realization of the $j$th discrete point on the curve $X_i(\cdot)$, and $\eta_{ij}$ is the noise or measurement error. We assume that measurement error only occurs where the measurements are taken; thus, the error $\eta_i = (\eta_{i1}, \dots, \eta_{ip})$ is a multivariate term of dimension $p$. Though in practice the signal function component $X_i = (X_i(u_1), \dots, X_i(u_p))$ is of the same dimension $p$, it differs from $\eta_i$ in nature. Although functions are potentially infinite-dimensional, we may impose smoothness assumptions on the functions, which usually implies functions possess one or more derivatives. This smoothness feature is used to separate the functions from measurement errors -- a functional smoothing procedure.

When the variance of the noise is a tiny fraction of the variance of the function, we say the signal-to-noise ratio is high. In this case, classic smoothing tools apply to functional data, including kernel methods (e.g., Wand & Jones 1995), local polynomial smoothing (e.g., Fan & Gijbels 1996), and spline smoothing (e.g., Wahba 1990, Eubank 1999, Green & Silverman 1999). With pre-smoothed functions, estimates, such as mean and covariance functions, can be further obtained. More recent studies on functional smoothing approaches include Cai & Yuan (2011), Yao & Li (2013), and Zhang & Wang (2016). In this article, we apply basis smoothing to the functions $X_i(u)$; that is, we represent $X_i(u)$ as $X_i(u) = \sum_{k=1}^{K} c_{ik} \phi_k(u)$, where $\{\phi_k(u), k = 1, \dots, K\}$ are the basis functions and $\{c_{ik}, i = 1, \dots, n, k = 1, \dots, K\}$ are the smoothing coefficients. The smoothing model then becomes
$$Y_{ij} = \sum_{k=1}^{K} c_{ik} \phi_k(u_j) + \eta_{ij}, \quad j = 1, \dots, p, \quad i = 1, \dots, n.$$

When the signal-to-noise level is low, smoothing tools may not be adequate to remove the measurement error and may cause inefficient estimation of the smoothing coefficients. Let us take a further look at the measurement error $\eta_{ij}$. In FDA, the number of discrete points $p$ on each subject is often large compared with the sample size $n$. Hence the term $\eta_i$ is a high-dimensional component. In this case, the observed data are, in fact, a mixture of functional data and high-dimensional data. The existence of the large measurement error $\eta_{ij}$ raises the curse-of-dimensionality problem, which naturally calls for the application of dimension reduction models to $\eta_{ij}$. Many studies have been conducted on various dimension reduction techniques for high-dimensional data; among these, factor models are widely used (e.g., Fan et al. 2008, Lam et al. 2011).

We propose using a factor model for the measurement error term. Without further information on the measurement error, a factor model is appropriate since the estimation of latent factors does not require any observed variables. The high-dimensional measurement error is assumed to be driven by a small number of unobserved common factors:
$$\eta_{ij} = a_j^\top f_i + \epsilon_{ij}, \quad i = 1, \dots, n, \quad j = 1, \dots, p,$$
where $f_i \in \mathbb{R}^r$ are the unobserved factors, $a_j \in \mathbb{R}^r$ are the unobserved factor loadings, $r$ is the number of latent factors, and $\epsilon_{ij}$ are idiosyncratic errors with mean zero. Thus, the observed data $Y_{ij}$ can be written as the sum of two components:
$$Y_{ij} = \sum_{k=1}^{K} c_{ik} \phi_k(u_j) + a_j^\top f_i + \epsilon_{ij}, \quad i = 1, \dots, n, \quad j = 1, \dots, p.$$
This is a basis smoothing model in factor-augmented form. The proposed model can be easily modified to adopt nonparametric smoothing methods. In Section 5, we illustrate the use of spline smoothing approaches. In Section 7.5, the nonparametric smoothing model is applied to simulated data.

In this paper, we motivate the FASM with three considerations, listed below. In these three cases, using the proposed model remedies the defects of the traditional smoothing model. Examples of the following three motivations are provided in Section 2.

1. In traditional smoothing models, the measurement error $\eta_{ij}$ is assumed to be non-informative and independently and identically distributed (i.i.d.) in both directions. This is an unrealistic assumption when the measurement errors contain information. With the factor model applied, we assume that a small number of unobserved factors can capture the covariance in the measurement error. This is usually reasonable in practice because a few common factors often drive the occurrence of systematic measurement error.

2. When the smoothing basis functions are incorrectly identified, the smoothing model will lead to an erroneous coefficient estimate and large residuals. The proposed model deals with this problem since the unexplained variation resulting from the misidentification of the basis can be modeled with a small number of unobserved common factors.

3. When there are step jumps in the mean level of the functions, neglecting the mean shift in smoothing models will result in large residuals at the points where the jumps occur. The changes in the mean levels of the functions come from a universal source and can be modeled by common factors.

Since the latent factors are unobserved, we propose an iterative approach to estimate the smooth function and the factors simultaneously.
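As a concrete picture of this two-component structure, the following minimal numpy sketch simulates data from the model above. The sine basis, the dimensions, and the noise scale are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K, r = 50, 100, 5, 2          # curves, points per curve, basis size, factors
u = np.linspace(0.0, 1.0, p)        # common observation grid on [0, 1]

# illustrative Fourier-type basis for phi_k
Phi = np.column_stack(
    [np.ones(p)] + [np.sin(2 * np.pi * k * u) for k in range(1, K)]
)                                    # p x K

C = rng.normal(size=(K, n))          # smoothing coefficients c_ik
A = rng.normal(size=(p, r))          # factor loadings a_j
F = rng.normal(size=(n, r))          # common factors f_i
eps = 0.1 * rng.normal(size=(p, n))  # idiosyncratic errors

# Y = Phi C + A F^T + E: a p x n matrix of discrete, noisy observations
Y = Phi @ C + A @ F.T + eps
print(Y.shape)  # (100, 50)
```

Each column of `Y` is one observed curve: a smooth signal in the span of the basis plus a factor-driven, non-smooth disturbance.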
Principal component analysis (PCA) is used as a tool in estimating the factor model, and penalized least squares estimation is applied to construct the estimator for the smoothing coefficient $c_{ik}$. We establish the asymptotic theory of the smoothing coefficient estimator, where the consistency of the estimator is proved. We also provide the asymptotic distribution of the projected estimator in the orthogonal complement of the space spanned by the factors $f_i$. The interplay between the smooth component and the factor model component is manifested.

In the remainder of this article, we elaborate on the previously mentioned three motivations in detail, with examples given in Section 2. In Section 3, the model is formally stated, and the iterative estimation approach is provided. We discuss the asymptotic properties of the smoothing coefficients under various assumptions in Section 4. We extend the proposed model to a nonparametric smoothing approach in Section 5. In Section 6, we consider the statistical inference aspect of the model and propose a covariance matrix estimator for the raw data. In Section 7, we conduct Monte-Carlo simulations on the proposed model under different settings. A few real data examples are given in Section 8, and conclusions are drawn in Section 9. Last, we provide proofs of the relevant theorems and lemmas in the Appendix.

We introduce three examples to motivate the proposed model. In these cases, the smoothing model is not adequate to capture the raw data's signal information. In the first example, when large measurement error exists, the residuals after smoothing are large, with some extreme values. In the second example, when the basis functions are selected incorrectly, part of the functions' variation cannot be captured by the smoothing model. In the third example, when there are step jumps in the functional data, the residuals after smoothing contain gaps. These examples demonstrate that further modeling of the residuals is needed.
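The first motivation can be illustrated numerically: when the noise contains a common factor, smoothing alone leaves residuals whose covariance is dominated by a single large eigenvalue. The sketch below is a minimal illustration with assumed sizes, an assumed sine basis, and a plain ridge smoother standing in for a generic smoothing model.

```python
import numpy as np

rng = np.random.default_rng(10)
n, p, K = 40, 120, 4
u = np.linspace(0.0, 1.0, p)
Phi = np.column_stack([np.sin(2 * np.pi * k * u) for k in range(1, K + 1)])

C = rng.normal(size=(K, n))
a = rng.normal(size=p)                      # one common loading vector
f = rng.normal(size=n)                      # one common factor
Y = Phi @ C + np.outer(a, f) + 0.1 * rng.normal(size=(p, n))

# ridge smoothing only, ignoring the factor component
alpha = 1e-3
Chat = np.linalg.solve(Phi.T @ Phi + alpha * np.eye(K), Phi.T @ Y)
resid = Y - Phi @ Chat

# share of residual variance carried by the leading eigenvalue
evals = np.linalg.eigvalsh(resid @ resid.T / n)[::-1]
share = evals[0] / evals.sum()
print(round(share, 2))                      # close to 1: one factor dominates
```

The near-unit share signals exactly the structured residuals that the factor-augmented component of the FASM is designed to absorb.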
Figure 1 shows the rainbow plots of the average daily temperature and log precipitation at 35 locations in Canada. Due to the nature of the two kinds of data, it is reasonable to assume that temperature and log precipitation are functions over time. The two graphs, however, display distinct features. In the temperature plot, though there are some perturbations, it is relatively easy to discern each curve's shape. In the precipitation plot, there is a tremendous amount of variability in the raw data, such that it is almost impossible to observe the underlying shape of the curves.

Smooth temperature data can be retrieved without much difficulty using basic smoothing techniques. The residuals are small, with constant variation. On the other hand, for the precipitation data, the residuals after smoothing exhibit a high level of variation and even contain some extreme values. Our model endeavors to further explain the large residuals in cases similar to the precipitation data; we will show the fitting result in Section 8.

Figure 1: Average daily temperature and log precipitation at 35 Canadian weather stations, averaged over the years 1960 to 1994.
It is important to choose appropriate basis functions in the smoothing method. In this example, we show the inadequacy of the smoothing model when the basis functions are misidentified. We generate functional data using basis functions with changing frequencies. The raw data are shown in Figure 2a. Fourier basis functions are used. In the second half of the data, the frequency of the Fourier basis functions increases, so the data set exhibits more variation toward the right end. Suppose that we were not aware of the change in the frequencies of the basis functions, and still used the basis of the first half of the data for the whole curves. The consequence of misidentifying the basis functions when a smoothing model is applied can be observed in Figure 2b. The residuals are large in the second half. The smoothing model fails to reduce the residuals; a factor model can be used to further model the signal hidden in the large residuals. The data generation process and further analysis can be found in Section 7.6.

Figure 2: A simulated sample of functional data with changing basis functions. (a) Raw data; (b) Residuals.
We provide another example of functional data with step jumps to motivate our proposed model. Suppose we observed a sample of the raw functional data, as shown in Figure 3a. It can be seen that there is a jump at around $u =$

In this section, we formally state the proposed model in Section 3.1 and provide the estimation method in Section 3.2. We first show how the smoothing coefficient $c_i$ and the latent factors $f_i$ are estimated separately, and then introduce an iterative approach to simultaneously find these estimates.

Figure 3: A simulated sample of functional data with a step jump. (a) Raw data; (b) Residuals.
We consider a sample of functional data $X_i(u)$, which take values in the space $H := L^2(\mathcal{I})$ of real-valued square-integrable functions on $\mathcal{I}$. The space $H$ is a Hilbert space, equipped with the inner product $\langle x, y \rangle := \int_{\mathcal{I}} x(u)\, y(u)\, du$. The function norm is defined as $\|x\| := \langle x, x \rangle^{1/2}$. The functional nature of $X_i(u)$ allows us to represent it as a linear expansion of a set of $K$ smooth basis functions:
$$X_i(u) = \sum_{k=1}^{K} c_{ik} \phi_k(u), \quad u \in \mathcal{I},$$
where $\phi_k(u)$ is a set of common basis functions and $c_{ik}$ is the $k$th coefficient for the $i$th curve. Therefore, we can express the full model as
$$Y_{ij} = \sum_{k=1}^{K} c_{ik} \phi_k(u_j) + \eta_{ij}, \qquad \eta_{ij} = a_j^\top f_i + \epsilon_{ij}, \qquad i = 1, \dots, n, \quad j = 1, \dots, p,$$
where $f_i \in \mathbb{R}^r$ are the unobserved common factors, $a_j \in \mathbb{R}^r$ are the unobserved factor loadings, and $r$ is the number of factors. We call this model the FASM. For the model to be identifiable, we require the following condition.

Identification Condition 1. We require that
(i) $X_i(u_j)$ is independent of $\eta_{ij}$ for $i = 1, \dots, n$, $j = 1, \dots, p$; and
(ii) $\frac{1}{p} \sum_{j=1}^{p} a_j a_j^\top \xrightarrow{p} \Sigma_a > 0$ for some $r \times r$ matrix $\Sigma_a$, as $p \to \infty$; and $\frac{1}{n} \sum_{i=1}^{n} f_i f_i^\top \xrightarrow{p} \Sigma_f > 0$ for some $r \times r$ matrix $\Sigma_f$, as $n \to \infty$.

The first part of the identification condition ensures that the signal function component and the factor model component are independent. The second part ensures the existence of $r$ factors, each of which makes a non-trivial contribution to the variance of $\eta_{ij}$, which in turn guarantees the identifiability between the factors and the error term $\epsilon_{ij}$.

We treat the basis functions $\phi_k(u)$ as known, and the number $K$ as fixed. This is, of course, a simplification to accommodate the theoretical proofs. In real data analysis, there are various choices for the basis functions, and the decision can be quite subjective. For example, Fourier bases are preferred for periodic data, while spline basis systems are most commonly used for non-periodic data. Other bases include wavelet, polynomial, and some ad-hoc basis functions.

We can write the model for the $i$th object as
$$Y_i = \Phi c_i + A f_i + \epsilon_i, \quad i = 1, \dots, n, \tag{1}$$
where
$$Y_i = \begin{pmatrix} Y_{i1} \\ \vdots \\ Y_{ip} \end{pmatrix}, \quad c_i = \begin{pmatrix} c_{i1} \\ \vdots \\ c_{iK} \end{pmatrix}, \quad \Phi = \begin{pmatrix} \phi_1(u_1) & \dots & \phi_K(u_1) \\ \vdots & & \vdots \\ \phi_1(u_p) & \dots & \phi_K(u_p) \end{pmatrix}, \quad A = \begin{pmatrix} a_1^\top \\ \vdots \\ a_p^\top \end{pmatrix}, \quad \epsilon_i = \begin{pmatrix} \epsilon_{i1} \\ \vdots \\ \epsilon_{ip} \end{pmatrix}.$$
Combining all the objects, we have in matrix form
$$Y = \Phi C + A F^\top + E, \tag{2}$$
where $Y$ is $p \times n$ and $C = (c_1, \dots, c_n)$ is a $K \times n$ matrix containing all the smoothing coefficients. The matrix $F = (f_1, \dots, f_n)^\top$ is $n \times r$, and $E = (\epsilon_1, \dots, \epsilon_n)$ is $p \times n$. Since $\Phi$ is assumed to be known, we illustrate in the following how the parameters $C$, $A$, and $f_i$ are estimated.

For the latent factor estimation, there is an identification problem, in that $A F^\top = A U U^{-1} F^\top$ for any $r \times r$ invertible matrix $U$. Thus we impose the normalization restriction on the matrices $A$ and $F$:
$$A^\top A / p = I_r, \quad \text{and } F^\top F \text{ is a diagonal matrix.} \tag{3}$$
We propose to implement penalized least squares, where the objective function is defined as
$$\mathrm{SSR}(c_i, A, f) = \sum_{i=1}^{n} \Big[ (Y_i - \Phi c_i - A f_i)^\top (Y_i - \Phi c_i - A f_i) + \alpha \times \mathrm{PEN}(X_i) \Big],$$
where $\mathrm{PEN}(X_i)$ is a penalty term used for regularization, and $\alpha$ is the tuning parameter controlling the degree of regularization. The same $\alpha$ is used for all the functional observations $i$. This is a simplified case, where we assume a similar degree of smoothness for all curves. The tuning parameter can be chosen by cross-validation or information criteria. We intend to penalize the "roughness" of the function term. To quantify the notion of "roughness" in a function, we use the square of the second derivative. Define the measure of roughness as
$$\mathrm{PEN}(X_i) = \int_{\mathcal{I}} \big[ D^2 X_i(s) \big]^2 \, ds,$$
where $D^2 X_i$ denotes the second derivative of the function $X_i$; the larger the tuning parameter $\alpha$, the smoother the estimated functions we obtain. Further, we denote
$$\boldsymbol{\Phi}(u) = [\phi_1(u), \dots, \phi_K(u)]^\top. \tag{4}$$
Then $X_i(u) = c_i^\top \boldsymbol{\Phi}(u)$. We can re-express the roughness penalty $\mathrm{PEN}(X_i)$ in matrix form as follows:
$$\mathrm{PEN}(X_i) = \int_{\mathcal{I}} \big[ D^2 X_i(s) \big]^2 ds = \int_{\mathcal{I}} \big[ D^2 c_i^\top \boldsymbol{\Phi}(s) \big]^2 ds = \int_{\mathcal{I}} c_i^\top D^2 \boldsymbol{\Phi}(s)\, D^2 \boldsymbol{\Phi}^\top(s)\, c_i \, ds = c_i^\top \Big[ \int_{\mathcal{I}} D^2 \boldsymbol{\Phi}(s)\, D^2 \boldsymbol{\Phi}^\top(s)\, ds \Big] c_i = c_i^\top R c_i, \quad i = 1, \dots, n,$$
where
$$R \equiv \int_{\mathcal{I}} D^2 \boldsymbol{\Phi}(s)\, D^2 \boldsymbol{\Phi}^\top(s)\, ds. \tag{5}$$
The matrix $R$ is the same for all subjects, and the penalty term $\mathrm{PEN}(X_i)$ differs across subjects only through the coefficient $c_i$.

Remark 1.
The number of smoothing coefficients $c_i$ needed increases as the sample size increases. The inclusion of a penalty term not only penalizes the "roughness" of the smoothed function but also mitigates the effect of the increasing number of parameters, to control the model flexibility.

Thus, the objective function can be written as
$$\mathrm{SSR}(c_i, A, f) = \sum_{i=1}^{n} \Big[ (Y_i - \Phi c_i - A f_i)^\top (Y_i - \Phi c_i - A f_i) + \alpha\, c_i^\top R c_i \Big],$$
subject to the constraint $A^\top A / p = I_r$.

We aim to estimate the smoothing coefficient $c_i$. We left-multiply each term in (1) by a matrix that projects the factor model term to zero. Define the projection matrix
$$M_A \equiv I_p - A (A^\top A)^{-1} A^\top = I_p - A A^\top / p. \tag{6}$$
Then
$$M_A A f_i = \big( I_p - A A^\top / p \big) A f_i = \big( A - A A^\top A / p \big) f_i = 0.$$
So we estimate $c_i$ from the projected equation
$$M_A Y_i = M_A \Phi c_i + M_A \epsilon_i.$$
The projected objective function becomes
$$\mathrm{SSR}(c_i, A) = \sum_{i=1}^{n} \Big[ (M_A Y_i - M_A \Phi c_i)^\top (M_A Y_i - M_A \Phi c_i) + \alpha\, c_i^\top R c_i \Big]. \tag{7}$$
By taking the derivative of $\mathrm{SSR}(c_i, A)$ with respect to each $c_i$, we can solve for the estimator $\widehat{c}_i$:
$$\frac{\partial\, \mathrm{SSR}(c_i, A)}{\partial c_i} = -2\, (M_A Y_i - M_A \Phi c_i)^\top (M_A \Phi) + 2 \alpha\, c_i^\top R.$$
Setting the derivative to zero and rearranging the terms, we have
$$\big( \Phi^\top M_A^\top M_A \Phi + \alpha R \big) c_i = \Phi^\top M_A^\top M_A Y_i.$$
Using the fact that
$$M_A^\top M_A = \big( I_p - A A^\top / p \big)^\top \big( I_p - A A^\top / p \big) = M_A,$$
we obtain the least squares estimator for $c_i$ given $A$:
$$\widehat{c}_i = \big( \Phi^\top M_A \Phi + \alpha R \big)^{-1} \Phi^\top M_A Y_i.$$
Next, to estimate $A$ and $f_i$, we focus on the factor model
$$\eta_i = A f_i + \epsilon_i,$$
and in matrix form
$$Z = A F^\top + E,$$
where $Z = (\eta_1, \dots, \eta_n)$.
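The smoothing step can be made concrete with a short numpy sketch. It builds the roughness matrix $R$ of (5) by numerical integration for an illustrative sine basis (the basis, grid, and sizes are assumptions, not choices from the paper), and then computes $\widehat{c}_i = (\Phi^\top M_A \Phi + \alpha R)^{-1} \Phi^\top M_A Y_i$ for all curves at once, given a loading matrix $A$ satisfying $A^\top A / p = I_r$.

```python
import numpy as np

def roughness_matrix(K, grid_size=2001):
    """R = integral of D2Phi(s) D2Phi(s)^T over [0, 1], eq. (5), approximated
    on a fine grid for the illustrative basis phi_k(s) = sin(2 pi k s)."""
    s = np.linspace(0.0, 1.0, grid_size)
    ds = s[1] - s[0]
    # second derivative of sin(2 pi k s) is -(2 pi k)^2 sin(2 pi k s)
    D2Phi = np.column_stack(
        [-(2 * np.pi * k) ** 2 * np.sin(2 * np.pi * k * s)
         for k in range(1, K + 1)]
    )
    return D2Phi.T @ D2Phi * ds                    # K x K

def smooth_coef(Y, Phi, A, R, alpha):
    """Penalized LS coefficients given loadings A:
    c_i = (Phi' M_A Phi + alpha R)^{-1} Phi' M_A Y_i, stacked as K x n."""
    p = Y.shape[0]
    M_A = np.eye(p) - A @ A.T / p                  # projects off the factor space
    G = Phi.T @ M_A @ Phi + alpha * R
    return np.linalg.solve(G, Phi.T @ M_A @ Y)
```

With $\alpha = 0$ and $A$ spanning the true loading space, $M_A$ removes the term $A f_i$ exactly, so in the noiseless case the coefficients are recovered without bias.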
In high dimensions, the unknown factors and loadings are typically estimated by least squares (i.e., principal component analysis; see, e.g., Fan et al. 2008, Onatski 2012). The least squares objective function is
$$\mathrm{tr} \big[ (Z - A F^\top)(Z - A F^\top)^\top \big]. \tag{8}$$
Minimizing the objective function with respect to $F^\top$, we have $F^\top = (A^\top A)^{-1} A^\top Z = A^\top Z / p$, using (3). Substituting into (8), we obtain the objective function
$$\mathrm{tr} \big[ (Z - A A^\top Z / p)(Z - A A^\top Z / p)^\top \big] = \mathrm{tr} \big( Z Z^\top - 2 Z Z^\top A A^\top / p + A A^\top Z Z^\top A A^\top / p^2 \big) = \mathrm{tr}(Z Z^\top) - \mathrm{tr}(A^\top Z Z^\top A) / p,$$
where the last equality uses (3) and the fact that $\mathrm{tr}(Z Z^\top A A^\top) = \mathrm{tr}(A^\top Z Z^\top A)$. Thus, minimizing the objective function is equivalent to maximizing $\mathrm{tr}(A^\top Z Z^\top A) / p$. The estimator for $A$ is obtained by finding the first $r$ eigenvectors corresponding to the $r$ largest eigenvalues of the matrix $Z Z^\top$ in descending order, where
$$Z Z^\top = \sum_{i=1}^{n} \eta_i \eta_i^\top = \sum_{i=1}^{n} (Y_i - \Phi c_i)(Y_i - \Phi c_i)^\top.$$
Therefore, knowing $c_i$, we solve for $\widehat{A}$ using
$$\Big[ \frac{1}{np} \sum_{i=1}^{n} (Y_i - \Phi c_i)(Y_i - \Phi c_i)^\top \Big] \widehat{A} = \widehat{A} V_{np}, \tag{9}$$
where $V_{np}$ is an $r \times r$ diagonal matrix containing the $r$ largest eigenvalues of the matrix in the square brackets, in decreasing order. The factor $\frac{1}{np}$ is used for scaling.

Remark 2.
The number of factors $r$ is assumed to be known in this paper. In practice, $r$ is selected based on some criterion involving the eigenvalues. There have been many studies on this topic. Examples include Bai & Ng (2002), where two model selection criterion functions were proposed; Onatski (2010), where the number of factors was estimated using differenced eigenvalues; and Ahn & Horenstein (2013), where this number was selected based on the ratio of two adjacent eigenvalues.
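The PCA step of the derivation above can be sketched in a few lines of numpy; the scaling $\sqrt{p}$ enforces the normalization $A^\top A / p = I_r$ of (3), and the factor scores follow from $F^\top = A^\top Z / p$. Sizes in the usage are illustrative assumptions.

```python
import numpy as np

def factor_loadings(Z, r):
    """Loadings from the top-r eigenvectors of Z Z' (cf. eq. (9)).

    Z is the p x n matrix of residuals eta_i; columns of the result are
    eigenvectors of Z Z' / (n p), rescaled so that A'A / p = I_r as in (3).
    """
    p, n = Z.shape
    vals, vecs = np.linalg.eigh(Z @ Z.T / (n * p))   # eigenvalues ascending
    top = np.argsort(vals)[::-1][:r]                 # indices of r largest
    return np.sqrt(p) * vecs[:, top]                 # p x r

def factor_scores(Z, A):
    """Least-squares factors given A: F' = A'Z / p."""
    return A.T @ Z / Z.shape[0]
```

Applying `factor_loadings` followed by `factor_scores` reproduces the best rank-$r$ approximation $\widehat{A}\widehat{F}^\top$ of $Z$ under the normalization (3).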
It can be seen that $A$ is needed to find $\widehat{c}_i$, and in turn $c_i$ is needed to find $\widehat{A}$. The final estimator $(\widehat{c}_i, \widehat{A})$ is the solution of the set of equations
$$\widehat{c}_i = \big( \Phi^\top M_{\widehat{A}} \Phi + \alpha R \big)^{-1} \Phi^\top M_{\widehat{A}} Y_i, \quad i = 1, \dots, n, \qquad \Big[ \frac{1}{np} \sum_{i=1}^{n} (Y_i - \Phi \widehat{c}_i)(Y_i - \Phi \widehat{c}_i)^\top \Big] \widehat{A} = \widehat{A} V_{np}. \tag{10}$$
Since there is no closed-form expression for $\widehat{A}$ and $\widehat{c}_i$, we propose using numerical iterations to find the estimates. The details of these iterations are as follows:

Algorithm 1:
Iterations for estimating the FASM

1. Denote the initial value as $\widehat{A}^{(0)}$. Using (10), we obtain $\widehat{c}_i^{(0)} = \big( \Phi^\top M_{\widehat{A}^{(0)}} \Phi + \alpha R \big)^{-1} \Phi^\top M_{\widehat{A}^{(0)}} Y_i$.
2. With $\widehat{c}_i^{(t)}$, we substitute into the second equation of (10) to obtain $\widehat{A}^{(t+1)} = (\widehat{a}_1^{(t+1)}, \dots, \widehat{a}_r^{(t+1)})$, where $\widehat{a}_j^{(t+1)}$ is the eigenvector of the matrix $\frac{1}{np} \sum_{i=1}^{n} (Y_i - \Phi \widehat{c}_i^{(t)})(Y_i - \Phi \widehat{c}_i^{(t)})^\top$ corresponding to its $j$th largest eigenvalue.
3. With $\widehat{A}^{(t+1)}$, we obtain $\widehat{c}_i^{(t+1)} = \big( \Phi^\top M_{\widehat{A}^{(t+1)}} \Phi + \alpha R \big)^{-1} \Phi^\top M_{\widehat{A}^{(t+1)}} Y_i$ using (10).
4. We then repeat steps 2 and 3 until $\big\| \widehat{c}_i^{(t+1)} - \widehat{c}_i^{(t)} \big\| < \delta$, where $\delta$ is a small positive constant.

Remark 3.
In this paper, we use $\widehat{A}^{(0)} = 0$. This means we start by ignoring the factor model component, so the initial value for the smoothing coefficient is $\widehat{c}_i^{(0)} = \big( \Phi^\top \Phi + \alpha R \big)^{-1} \Phi^\top Y_i$, which is simply the ridge estimator. The convergence of the numerical iteration requires the convergence of this estimator, which in turn requires the factor model component $\eta_{ij}$ to have an expectation of zero. The stopping criterion only focuses on $\widehat{c}_i$ because we are interested in estimating $\eta_{ij}$ as a whole.

Remark 4.
Common methods for selecting the shrinkage parameter $\alpha$ include Akaike's information criterion (AIC; Akaike 1974), the Bayesian information criterion (BIC; Schwarz 1978), and cross-validation. In this paper, we use the mean generalized cross-validation (mGCV) method (Golub et al. 1979). We define, at step $t$,
$$\mathrm{mGCV}^{(t)} = \frac{1}{n} \sum_{i=1}^{n} \frac{p\, \mathrm{SSE}_i^{(t)}}{\big[ p - \mathrm{df}^{(t)}(\alpha) \big]^2}, \tag{11}$$
where $\mathrm{SSE}_i^{(t)}$ is the residual sum of squares for the $i$th object at step $t$, and $\mathrm{df}^{(t)}(\alpha)$ is the equivalent degrees of freedom measure, which can be calculated as
$$\mathrm{df}^{(t)}(\alpha) = \mathrm{trace} \Big[ \Phi \big( \Phi^\top M_{\widehat{A}^{(t)}} \Phi + \alpha R \big)^{-1} \Phi^\top M_{\widehat{A}^{(t)}} \Big]. \tag{12}$$
At each step of the iteration, the tuning parameter $\alpha$ is chosen by minimizing $\mathrm{mGCV}^{(t)}$.

Remark 5. Algorithm 1 is an iterative procedure in which ridge regression and PCA are alternated. The convergence of this iterative algorithm is studied in Jiang et al. (2020). For instance, Theorem 2 of Jiang et al. (2020) provides some sufficient conditions under which the recursive algorithm converges to the true value or some other values. In particular, when the regressors are independent of the common factors, or the factors involved in the regressors are weaker than the common factors, this algorithm will converge to the true parameter.
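Putting the pieces together, Algorithm 1 and the mGCV quantities (11)-(12) can be sketched as follows. This is a simplified illustration: $\alpha$ is held fixed inside the loop rather than re-tuned at each step, the $\mathrm{SSE}_i$ in the mGCV score is read here as the residual sum of squares of the projected fit (one plausible reading), and all sizes are assumptions.

```python
import numpy as np

def fit_fasm(Y, Phi, R, r, alpha, tol=1e-8, max_iter=500):
    """Sketch of Algorithm 1: alternate ridge smoothing and PCA.

    Starts from A_hat = 0, i.e. a plain ridge fit (as in Remark 3), and
    stops when the coefficient matrix changes by less than tol.
    """
    p, n = Y.shape
    C = np.linalg.solve(Phi.T @ Phi + alpha * R, Phi.T @ Y)   # ridge start
    A = np.zeros((p, r))
    for _ in range(max_iter):
        # PCA step: loadings from the residual second-moment matrix, eq. (9)
        Zres = Y - Phi @ C
        vals, vecs = np.linalg.eigh(Zres @ Zres.T / (n * p))
        A = np.sqrt(p) * vecs[:, np.argsort(vals)[::-1][:r]]
        # smoothing step: penalized LS on the projected equation, eq. (10)
        M_A = np.eye(p) - A @ A.T / p
        C_new = np.linalg.solve(Phi.T @ M_A @ Phi + alpha * R,
                                Phi.T @ M_A @ Y)
        if np.linalg.norm(C_new - C) < tol:
            C = C_new
            break
        C = C_new
    F = ((A.T @ (Y - Phi @ C)) / p).T            # n x r factor scores
    return C, A, F

def mgcv(Y, Phi, A, R, alpha):
    """mGCV score (11) with df(alpha) from (12) at the current loadings."""
    p, n = Y.shape
    M_A = np.eye(p) - A @ A.T / p
    H = Phi @ np.linalg.solve(Phi.T @ M_A @ Phi + alpha * R, Phi.T @ M_A)
    df = np.trace(H)                             # equivalent d.o.f., eq. (12)
    resid = M_A @ (Y - H @ Y)                    # projected residuals
    sse = np.sum(resid ** 2, axis=0)             # SSE_i for each curve
    return np.mean(p * sse / (p - df) ** 2)      # eq. (11)
```

In practice one would evaluate `mgcv` over a grid of `alpha` values at each step and keep the minimizer, as the remark above prescribes.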
After we obtain the estimates $\widehat{A}$ and $\widehat{c}_i$, the estimated coefficient matrix $\widehat{C}$ is constructed as $\widehat{C} = (\widehat{c}_1, \dots, \widehat{c}_n)$, and the estimated factors can be obtained by $\widehat{F}^\top = \widehat{A}^\top (Y - \Phi \widehat{C}) / p$. Finally, the functional component can be estimated by $\widehat{X}_i(u) = \widehat{c}_i^\top \boldsymbol{\Phi}(u)$, where $\boldsymbol{\Phi}(u)$ is defined in (4).

Remark 6.
Although we have imposed the constraint in (3) and Identification Condition 1, $A$ and $f_i$ are not uniquely determined, since the model (1) is unchanged if we replace $A$ and $f_i$ with $A U$ and $U^\top f_i$ for any orthogonal $r \times r$ matrix $U$. However, the linear space spanned by the columns of $A$ is uniquely defined. Although we are not able to estimate $A$ itself, we can still estimate a rotation of $A$, which spans the same space as $A$ does. The matrix $M_A$ defined in (6) is the projection matrix onto the orthogonal complement of the linear space spanned by the columns of $A$. It is shown in the next section that the estimator $M_{\widehat{A}}$ for $M_A$ is consistent.

In this section, we study the asymptotic properties of the coefficient estimator $\widehat{c}_i$ with growing sample size and dimension. We state the assumptions in Section 4.1 and provide the asymptotic results for $\widehat{c}_i$ in Section 4.2.

We use $(c_i, A)$ to denote the true parameters. In this paper, the norm of a vector or matrix $U$ is defined as the Frobenius norm; that is, $\|U\| = [\mathrm{tr}(U^\top U)]^{1/2}$. We introduce the matrix
$$D_i(A) \equiv \frac{1}{p}\, \Phi^\top M_A \Phi - \frac{1}{p}\, \Phi^\top M_A \Phi \; \frac{f_i^\top \big( F^\top F / n \big)^{-1} f_i}{n}. \tag{13}$$
This matrix plays an important role in this article. It is used in the proof of the consistency of $\widehat{c}_i$, as can be found in Appendix A. The identifying condition for $c_i$ is that $D_i(A)$ is positive definite for all $i$, which is stated in Assumption 3.

First, we state the assumptions.

Assumption 1. $\sup_u |\phi_k(u)| = O(1)$, $k = 1, \dots, K$.

The above assumption states that the basis functions are bounded in the supremum norm. This is quite natural, as some of the most commonly used basis functions are bounded; for instance, the Fourier, B-spline, and wavelet basis functions (Ramsay & Silverman 2005).

Assumption 2. $\|c_i\| = O(1)$, for all $i$.

Above, we assume the smoothing coefficients $c_i$ are bounded uniformly for all $i$. This assumption is introduced to ensure the uniform consistency of the estimated coefficients $\widehat{c}_i$.

Assumption 3. Let $\mathcal{A} = \{ A : A^\top A / p = I_r, \text{ and } A \text{ independent of } \Phi \}$. We assume $\inf_{A \in \mathcal{A}} D_i(A) > 0$ for all $i$.

The usual assumption for the least-squares estimator only contains the first term on the right-hand side of (13). The second term on the right-hand side of (13) arises because of the unobservable matrices $F$ and $A$.

Assumption 4.
For some constant $M > 0$, $\mathbb{E}\|a_j\|^4 \leq M$, $j = 1, \dots, p$, and $\mathbb{E}\|f_i\|^4 \leq M$.

Assumption 5. For some constant $M > 0$, the error terms $\epsilon_{ji}$, $j = 1, \dots, p$, $i = 1, \dots, n$, are i.i.d. in both directions, with $\mathbb{E}(\epsilon_{ji}) = 0$, $\mathrm{Var}(\epsilon_{ji}) = \sigma^2$, and $\mathbb{E}|\epsilon_{ji}|^4 \leq M$.

Assumption 6. $\epsilon_{ji}$ is independent of $\phi_s$, $f_t$, and $a_s$ for all $j$, $i$, $s$, $t$.

We require that the errors are independent among themselves and also of the functional term $\phi(u)$ and the factor model terms $f_i$ and $a_j$. So as not to mask the main contribution of our method, we use a simplified setting on the error terms to exclude endogeneity. Nevertheless, with simple but tedious modifications, Assumption 5 can be relaxed, and our model can be extended to more complicated settings where correlations between the error term and the factor model term are allowed.

Assumption 7. The tuning parameter $\alpha$ satisfies $\alpha = o(p)$.

This is conventionally assumed in ridge regression (see, e.g., Knight & Fu 2000) and ensures that the estimator's asymptotic bias is zero.

Before stating the next assumption, we introduce some notation. Let $\omega_j$, $j = 1, \dots, p$, denote the $j$th column of the $K \times p$ matrix $\Phi^\top M_A$, and let $\psi_{ik}$ denote the $(i, k)$th element of the matrix $M_F$, where
$$M_F \equiv I_n - F \big( F^\top F \big)^{-1} F^\top. \tag{14}$$
Then, for any vector $b = (b_1, \dots, b_n)^\top$, we can write
$$\frac{1}{\sqrt{np}}\, \Phi^\top M_A E M_F b = \frac{1}{\sqrt{np}} \sum_{i=1}^{n} \sum_{j=1}^{p} \omega_j\, \epsilon_{ji} \sum_{k=1}^{n} \psi_{ik} b_k \equiv \frac{1}{\sqrt{np}} \sum_{i=1}^{n} \sum_{j=1}^{p} x_{ij}. \tag{15}$$
In (15), for notational simplicity, we define $x_{ij} = \omega_j\, \epsilon_{ji} \sum_{k=1}^{n} \psi_{ik} b_k$. The matrix $\Phi^\top M_A E M_F$ is of interest because it is the main component that contributes to the asymptotic distribution of the estimators, as shall be seen in the next section. Let
$$L_{np} \equiv \frac{\sigma^2}{np} \sum_{i=1}^{n} \sum_{j=1}^{p} \omega_j \omega_j^\top \Big( \sum_{k=1}^{n} \psi_{ik} b_k \Big)^2. \tag{16}$$
We make the following assumption.

Assumption 8.
We assume there exists a $K \times K$ matrix $L$ such that
$$L \equiv \lim_{n, p \to \infty} L_{np}, \tag{17}$$
where $L_{np}$ is defined in (16). Let $\nu$ be the smallest eigenvalue of the matrix $L$ defined in (17); assume that $\nu > 0$ and that, for all $\varepsilon > 0$,
$$\lim_{n, p \to \infty} \frac{1}{np\, \nu} \sum_{i=1}^{n} \sum_{j=1}^{p} \mathbb{E} \Big[ \|x_{ij}\|^2\, \mathbb{1} \big( \|x_{ij}\|^2 \geq \varepsilon\, np\, \nu \big) \Big] = 0. \tag{18}$$

This assumption is the multivariate Lindeberg condition, which is needed in constructing the central limit theorem in the next section. This is by no means a strong condition; for instance, when the factor model component is ignored, $\omega_j$ is simply $\phi_j$, and $x_{ij} = \phi_j b_i \epsilon_{ji}$. Since we assume $\phi_j = O(1)$ in Assumption 1, the Lindeberg condition in (18) is met.

As we have mentioned previously, the identification problem of the latent factors implies that we actually use the estimator $\widehat{A}$ to estimate a rotation of $A$. Based on the objective function (7) in Section 3, we use a center-adjusted objective function, defined as
$$S_{np}(c_i, A) = \frac{1}{np} \sum_{i=1}^{n} \Big[ (Y_i - \Phi c_i)^\top M_A (Y_i - \Phi c_i) + \alpha\, c_i^\top R c_i \Big] - \frac{1}{np} \sum_{i=1}^{n} \epsilon_i^\top M_A \epsilon_i, \tag{19}$$
where $M_A$ is defined in (6), satisfying $A^\top A / p = I_r$. The second term on the right-hand side of (19) does not contain the unknown $A$ and $c_i$, so its inclusion does not affect the optimization result. This term is used only for center adjustment, so that the resulting objective function has expectation zero. We estimate $c_i$ and $A$ by
$$(\widehat{c}_i, \widehat{A}) = \operatorname*{argmin}_{c_i, A}\; S_{np}(c_i, A). \tag{20}$$
In the following, we establish the asymptotic properties of the estimated coefficient matrix $\widehat{C}$. In Theorem 1, the consistency of the matrix $\widehat{C}$ is proved. In Theorem 2, we show the rate of convergence of $\widehat{C}$. Theorem 3 provides the asymptotic distribution of $\widehat{C}$. Let $P_U = U (U^\top U)^{-1} U^\top$ for a matrix $U$.

Theorem 1.
Under Assumptions 1–6, as $n, p \to \infty$, we have the following statements:
(i) $\frac{1}{\sqrt{n}}\,\big\|C - \widehat{C}\big\| \stackrel{p}{\to} 0$;
(ii) $\big\|P_{\widehat{A}} - P_A\big\| \stackrel{p}{\to} 0$.

We start by proving consistency of the vector $\widehat{c}_i$, uniformly over i = 1, . . . , n. We can therefore combine the $c_i$ over all i = 1, . . . , n and obtain the result for the coefficient matrix $\widehat{C}$ in (i). The matrix $\widehat{C}$ is of dimension K × n, where K is fixed and the sample size n goes to infinity, hence the $\sqrt{n}$ scaling in (i). For the second part, note that $P_A = I_p - M_A$, where $M_A$ is the projection matrix onto the orthogonal complement of the linear space spanned by the columns of A. Thus $P_{\widehat{A}}$ and $P_A$ represent the spaces spanned by $\widehat{A}$ and A, and (ii) shows that they are asymptotically the same.

Next, we obtain the rate of convergence.

Theorem 2.
Under Assumptions 1–6, if $p/n \to \rho > 0$, then
$\Big\|\frac{1}{\sqrt{n}}\big(C - \widehat{C}\big) M_F\Big\| = O_p\Big(\frac{1}{\sqrt{p}}\Big),$
where $M_F$ is defined in (14).

We study the case where the dimension p and the sample size n are comparable. We achieve a $\sqrt{p}$ rate of convergence for $\|C - \widehat{C}\|/\sqrt{n}$ as a whole. It is expected that the rate of convergence for smoothing models depends on the number of discrete points p observed on each curve.

Remark 7.
The asymptotic result in Theorem 2 contains a projection matrix $M_F$, which projects $C - \widehat{C}$ onto the space orthogonal to the factor matrix F. The theorem thus shows the interplay between C and F. When C and F are orthogonal, $(C - \widehat{C})M_F = C - \widehat{C}$, and we obtain the rate of convergence for $C - \widehat{C}$ itself. When C and F are not orthogonal, inference on C is affected by the presence of the factor model component.

We now turn to the limiting distribution. It is shown in Appendix A that
$\Big\|\frac{\sqrt{p}}{\sqrt{n}}\big(C - \widehat{C}\big) M_F\Big\| = \Big\|\Big(\frac{1}{p}\Phi^\top M_A \Phi\Big)^{-1} \frac{1}{\sqrt{np}}\,\Phi^\top M_A E\, M_F\Big\| + o_p(1).$
The limiting distribution is constructed from the first term on the right-hand side. Let $\omega_j$ denote the j-th column of the K × p matrix $\Phi^\top M_A$. We then have the following lemma.

Lemma 1.
Under Assumptions 1–7, for any vector $b = (b_1, \ldots, b_n)^\top$,
$\frac{1}{\sqrt{np}}\,\Phi^\top M_A E\, M_F\, b \stackrel{d}{\to} N(0, L),$
where L is defined in (17). This lemma paves the way for the next theorem on asymptotic normality.
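The projected score in Lemma 1 can be checked numerically. The sketch below is ours, not the paper's: it draws i.i.d. errors E across replications, forms the annihilator matrices $M_A$ and $M_F$ from randomly generated stand-ins for A and F, and computes $\frac{1}{\sqrt{np}}\Phi^\top M_A E M_F b$; its components should be approximately centered with stable variance, consistent with the Gaussian limit.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K, r = 200, 100, 3, 2

def annihilator(U):
    # M_U = I - U (U^T U)^{-1} U^T: projection onto the orthogonal complement of col(U)
    return np.eye(U.shape[0]) - U @ np.linalg.solve(U.T @ U, U.T)

Phi = rng.standard_normal((p, K))   # stand-in basis matrix at the p grid points
A = rng.standard_normal((p, r))     # stand-in factor loadings
F = rng.standard_normal((n, r))     # stand-in factors
b = np.ones(n)                      # an arbitrary fixed vector, as in Lemma 1
M_A, M_F = annihilator(A), annihilator(F)

reps = 500
stats = np.empty((reps, K))
for s in range(reps):
    E = rng.standard_normal((p, n))                      # i.i.d. unit-variance errors
    stats[s] = (Phi.T @ M_A @ E @ M_F @ b) / np.sqrt(n * p)

# across replications, each component should be centered near zero
# with variance of order one
```

With these stand-ins the replication means are close to zero and the component variances stay bounded, matching the $N(0, L)$ limit qualitatively.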
Theorem 3.
Under Assumptions 1–7, if $p/n \to \rho > 0$, then for any vector $b \in \mathbb{R}^n$,
$\sqrt{p}\,\Big(\frac{C - \widehat{C}}{\sqrt{n}}\Big) M_F\, b \stackrel{d}{\to} N\Big(0,\; Q(A)^{-1} L\, Q(A)^{-1}\Big),$
where $M_F$ is defined in Theorem 2, L is defined in (17), and $Q(A) \equiv \frac{1}{p}\Phi^\top M_A \Phi$.

The vector b is the same vector as in Lemma 1. The asymptotic bias is zero since we assume no serial or cross-sectional correlation in the error terms. This is a simplified setting, which can be extended to allow for weak correlations in the errors in both directions; in that case, the asymptotic distribution will include a non-zero bias term.

Remark 8.
Theorem 3 shows that the asymptotic distribution of the coefficient matrix $\widehat{C}$ relies on the unobserved factor loading matrix A. Although we are unable to estimate A consistently with $\widehat{A}$, what we in fact need is the projection matrix $M_A$, which can be estimated by $M_{\widehat{A}}$. We can therefore construct estimators for Q and L based on $M_{\widehat{A}}$:
$\widehat{Q} = \frac{1}{p}\Phi^\top M_{\widehat{A}} \Phi, \qquad \widehat{L} = \frac{\widehat{\sigma}^2}{np}\sum_{i=1}^{n}\sum_{j=1}^{p} \widehat{\omega}_j \widehat{\omega}_j^\top \Big(\sum_{k=1}^{n} \widehat{\psi}_{ik} b_k\Big)^2,$
where $\widehat{\omega}_j$ is the j-th column of the K × p matrix $\Phi^\top M_{\widehat{A}}$ and $\widehat{\psi}_{ik}$ is the (i, k)-th element of the matrix $M_{\widehat{F}}$.

In the FASM, we use basis smoothing, where the basis functions $\phi_k(u)$ are assumed known, and the previous section establishes the corresponding asymptotic properties. In practice, however, the basis functions are usually unknown, and nonparametric smoothing techniques are frequently used. In this section, we extend the proposed model to smoothing splines.

In spline smoothing, a spline basis is used to model the functions. We consider smoothing splines, where a regularized regression is performed and knots are placed at all the observed discrete points. The most commonly used basis is the cubic smoothing spline, of order 4. The number of basis functions equals the number of interior knots plus the order; thus we use p + 2 basis functions $\psi_k(u)$, k = 1, . . . , p + 2. Let the p × (p + 2) matrix $\Psi$ denote the basis matrix, whose (j, k)-th element is $\psi_k(u_j)$. The objective function can be written as
$\mathrm{SSR}(w_i, A, f) = \sum_{i=1}^{n}\big[(Y_i - \Psi w_i - A f_i)^\top (Y_i - \Psi w_i - A f_i) + \alpha\, w_i^\top U w_i\big],$
where $w_i$ is the vector of smoothing coefficients and the matrix $U = \int_I D^2\Psi(s)\, D^2\Psi^\top(s)\, ds$ is defined analogously to the matrix R in (5). The estimators solve the equation system
$\widehat{w}_i = \big(\Psi^\top M_{\widehat{A}} \Psi + \alpha U\big)^{-1} \Psi^\top M_{\widehat{A}}\, Y_i, \quad i = 1, \ldots, n,$
$\Big[\frac{1}{np}\sum_{i=1}^{n} (Y_i - \Psi \widehat{w}_i)(Y_i - \Psi \widehat{w}_i)^\top\Big] \widehat{A} = \widehat{A}\, V_{np},$
which can be solved by the iterative approach in Section 3.2. The function estimator is $\widehat{X}_i(u) = \widehat{w}_i^\top \Psi(u)$, where $\Psi(u)$ is a vector containing all p + 2 basis functions $\psi_k(u)$.

The model is almost the same as the one proposed in Section 3; the difference lies in the dimensions of the matrices $\Psi$ and $\Phi$. In the parametric model, the basis functions are assumed known and their number is fixed, so $\Phi$ is a p × K matrix. In smoothing spline modeling, the matrix $\Psi$ is of dimension p × (p + 2), and the number of basis functions p + 2 grows with the dimension p.

Statistical inference on covariance matrix estimation
Having presented the model estimation approach and the estimators' asymptotic properties, we now consider statistical inference with the FASM. Our model serves as a dimension reduction technique and avoids the curse of dimensionality, making inference from the model convenient.

Covariance estimation is fundamental in both FDA and high-dimensional data analysis. In these areas, the data are high-dimensional, which brings many challenges. In FDA, the number of discrete points on each curve is often larger than the number of curves. Similarly, the dimension p of high-dimensional data is typically of the same order as, or larger than, the sample size n. In this case, the traditional sample covariance estimator no longer works. Dimension reduction by imposing some structure on the data is one of the main ways to solve this problem (see, e.g., Wong et al. 2003, Bickel & Levina 2008, Fan et al. 2008). By reducing the data dimension with a smoothing model and a factor model in the FASM, we propose an alternative covariance matrix estimator.

We consider the covariance matrix of the observed high-dimensional data $Y_i$. Let $\Sigma_Y \equiv \mathrm{cov}(Y)$. Based on the FASM, where $Y_i = \Phi c_i + A f_i + \epsilon_i$, we obtain
$\Sigma_Y = \Phi \Sigma_c \Phi^\top + A \Sigma_f A^\top + \Sigma_\epsilon,$ (21)
where $\Sigma_c$ and $\Sigma_f$ are the covariance matrices of the vectors $c_i$ and $f_i$ respectively, and $\Sigma_\epsilon$ denotes the error variance structure, a diagonal matrix under Assumption 4. Based on the above equation, we have the estimator
$\widehat{\Sigma}_Y = \Phi \widehat{\Sigma}_c \Phi^\top + \widehat{A} \widehat{\Sigma}_f \widehat{A}^\top + \widehat{\Sigma}_\epsilon,$ (22)
where $\widehat{\Sigma}_c$ and $\widehat{\Sigma}_f$ are calculated as
$\widehat{\Sigma}_c = \frac{1}{n-1} C C^\top - \frac{1}{n(n-1)}\, C \mathbf{1}\mathbf{1}^\top C^\top, \qquad \widehat{\Sigma}_f = \frac{1}{n-1} F^\top F - \frac{1}{n(n-1)}\, F^\top \mathbf{1}\mathbf{1}^\top F,$
where the $\mathbf{1}$s are vectors of ones whose dimensions depend on the matrices multiplied before and after them.
The diagonal error covariance matrix $\Sigma_\epsilon$ is estimated by
$\widehat{\Sigma}_\epsilon = \mathrm{diag}\Big(\frac{1}{n-1}\,\widehat{E}\widehat{E}^\top\Big),$
where $\widehat{E}$ is the residual matrix, calculated as $\widehat{E} = Y - \Phi\widehat{C} - \widehat{A}\widehat{F}^\top$.

Remark 9.
In functional data analysis, where the functional signal is the focus, the estimation of the covariance function $\Phi\Sigma_c\Phi^\top$ is of main interest. In this paper, we study the covariance structure of the mixture of functional data and high-dimensional data. Covariance estimators based on factor models have also been used in the previous literature. For example, Fan et al. (2008) employed a multi-factor model where the factors are assumed observable, while Fan et al. (2011) considered an extension to approximate factor models, in which cross-sectional correlation is allowed in the error terms.

We compare the finite-sample performance, measured by mean squared error (MSE), of the proposed covariance estimator against the ordinary sample covariance estimator. When the factor structure is ignored, the sample covariance estimator is expected to have a larger variance than our estimator. The advantage of the proposed estimator is shown in Section 7.4.
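Assembling the plug-in covariance estimator (22) from fitted components is a few lines of linear algebra. The sketch below uses random stand-ins for the already-estimated quantities (all names are ours); the centered second-moment formulas match those given above for $\widehat{\Sigma}_c$, $\widehat{\Sigma}_f$, and $\widehat{\Sigma}_\epsilon$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, K, r = 150, 60, 4, 2

# stand-ins for fitted quantities: basis matrix Phi (p x K), coefficients C (K x n),
# loadings A (p x r), factors F (n x r), residual matrix E (p x n)
Phi = rng.standard_normal((p, K))
C = rng.standard_normal((K, n))
A = rng.standard_normal((p, r))
F = rng.standard_normal((n, r))
E = rng.standard_normal((p, n))

ones = np.ones((n, 1))

# centered second-moment estimators, as in the text
Sigma_c = C @ C.T / (n - 1) - (C @ ones) @ (ones.T @ C.T) / (n * (n - 1))
Sigma_f = F.T @ F / (n - 1) - (F.T @ ones) @ (ones.T @ F) / (n * (n - 1))

# diagonal error covariance from the residuals
Sigma_eps = np.diag(np.diag(E @ E.T / (n - 1)))

# plug-in estimator (22)
Sigma_Y = Phi @ Sigma_c @ Phi.T + A @ Sigma_f @ A.T + Sigma_eps
```

Each summand is positive semi-definite by construction, so the resulting $\widehat{\Sigma}_Y$ is a valid covariance matrix even when p exceeds n.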
In this section, we use simulated data to illustrate the superiority of the proposed model. The FASM is compared with the smoothing model in Sections 7.1 to 7.3. In Section 7.4, we compare the finite-sample performance of the covariance matrix estimator introduced in Section 6 with the ordinary sample covariance estimator. In Section 7.6, we show how the FASM performs when applied to functional data with step jumps.
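The common shape of the experiments in this section can be sketched in a few lines: generate curves as a smooth basis expansion plus a low-rank factor component plus noise, fit a plain penalized basis smoother that ignores the factor part, and score it with the averaged MSE. Everything below (polynomial basis, dimensions, constants, the ridge stand-in for the roughness penalty R) is an illustrative assumption of ours, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, K, r = 100, 80, 6, 4
u = np.linspace(0, 1, p)

# illustrative smooth basis (monomials; the paper uses B-spline / Fourier bases)
Phi = np.vander(u, K, increasing=True)          # p x K
C = rng.normal(0, 1.5, size=(K, n))             # smoothing coefficients
X = Phi @ C                                     # smooth signal, p x n

Lam = rng.normal(0, 1.0, size=(n, r))           # factor loadings
F = rng.normal(0, 0.5, size=(r, p))             # factors
eta = (Lam @ F).T                               # factor component, p x n
eps = rng.normal(0, 0.5, size=(p, n))
Y = X + eta + eps

# plain penalized smoother without factor adjustment; identity ridge stands in for R
alpha = 1e-3
C_hat = np.linalg.solve(Phi.T @ Phi + alpha * np.eye(K), Phi.T @ Y)
X_hat = Phi @ C_hat

# averaged mean squared error over all (i, j), as used to score the fits below
aMSE = np.mean((X - X_hat) ** 2)
```

Because the smoother projects onto only K directions, the factor component leaks into the fit and inflates the aMSE; this is exactly the gap the factor-augmented model is designed to close.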
We generate simulated data $Y_{ij}$, for i = 1, . . . , n and j = 1, . . . , p, from the following model:
$Y_{ij} = X_i(u_j) + \eta_{ij} + \epsilon_{ji} = \sum_{k} c_{ik}\phi_k(u_j) + \sum_{k=1}^{4} \lambda_{ik} F_{kj} + \epsilon_{ji},$
where the $\phi_k(u)$ are chosen as B-spline basis functions of order 4 and the smoothing coefficients $c_{ik}$ are generated from N(0, 1.5). The factors $F_{kj}$ follow N(0, 0.5), and the factor loadings $(\lambda_{i1}, \lambda_{i2}, \lambda_{i3}, \lambda_{i4})^\top \sim N(\mu, \Sigma)$, where $\Sigma$ is a 4 × 4 covariance matrix. We set the mean vector $\mu = 0$ and the variance $\Sigma = \sigma^2 I$. We adjust the value of $\sigma^2$ to control the signal-to-noise ratio: when $\sigma^2$ is large, the signal-to-noise level is low, and when $\sigma^2$ is small, the signal-to-noise level is high. The random error terms $\epsilon_{ji}$ follow N(0, 0.5).

The numerical iteration procedure for finding $(\widehat{c}_i, \widehat{A}, \widehat{f})$ is introduced in Section 3. We compare the FASM with the smoothing model, in which the factor model component is ignored. The smoothing model can be expressed as $Y_i = \Phi c_i + \epsilon_i$, with coefficient estimator
$\widehat{c}_i = \big(\Phi^\top\Phi + \alpha R\big)^{-1}\Phi^\top Y_i, \quad i = 1, \ldots, n.$
The tuning parameter $\alpha$ is again chosen using the mGCV defined in (11). We repeat the simulation 100 times and obtain the estimated smooth functions $\widehat{X}_i(u) = \widehat{c}_i^\top \Phi(u)$. The averaged mean squared error (aMSE) of the function estimates is calculated as
$\mathrm{aMSE} = \frac{1}{np}\sum_{i=1}^{n}\sum_{j=1}^{p}\big[X_i(u_j) - \widehat{X}_i(u_j)\big]^2.$
The results are reported in Table 1. With the same sample size n, increasing the number of points p on each curve decreases the estimation error. With the same value of p, however, increasing the sample size does not decrease the estimation error. This is consistent with the convergence rate stated in Section 4, where the estimator converges at a rate governed by p. When $\sigma^2$ is large, so that the signal-to-noise ratio is low, the FASM performs better than the smoothing model.

Table 1
The aMSE of the function estimates with different sample sizes and dimensions. The value of $\sigma^2$ is adjusted to control the signal-to-noise ratio.

This section shows the finite-sample performance of the covariance estimator defined in (22). We also calculate the regular sample covariance estimator $\widehat{\Sigma}_Y^{*}$ using
$\widehat{\Sigma}_Y^{*} = \frac{1}{n-1}\big(Y - \bar{Y}\big)\big(Y - \bar{Y}\big)^\top,$
where the p × n matrix $\bar{Y}$ is the sample mean matrix whose j-th row elements are $\frac{1}{n}\sum_{i=1}^{n} Y_{ij}$. Both estimators are compared with the population covariance matrix, which is calculated using (21). We calculate the estimation errors under the Frobenius norm as
$\mathrm{MSE} = \frac{1}{p}\,\big\|\widehat{\Sigma}_Y - \Sigma_Y\big\|^2.$
We show the MSE results in Table 2. It can be seen that the FASM produces smaller MSE values in almost all cases.

Table 2: The MSE of the two covariance estimators with different sample sizes and dimensions. The value of $\sigma^2$ is adjusted to control the signal-to-noise ratio.

In this section, we apply the factor-augmented nonparametric smoothing model introduced in Section 5 to simulated data and compare the results with nonparametric smoothing models without the factor component.

We generate simulated data $Y_{ij}$, for i = 1, . . . , n and j = 1, . . . , p, from the following model:
$Y_{ij} = X_i(u_j) + \eta_{ij} + \epsilon_{ji} = \sum_{k} c_{ik}\phi_k(u_j) + \sum_{k=1}^{4}\lambda_{ik}F_{kj} + \epsilon_{ji},$
where the $\phi_k(u)$ are Fourier basis functions and the smoothing coefficients $c_{ik}$ are generated from N(0, 1.5). The factors $F_{kj}$ follow N(0, 0.5), and the factor loadings $(\lambda_{i1}, \lambda_{i2}, \lambda_{i3}, \lambda_{i4})^\top \sim N(\mu, \Sigma)$, where $\Sigma$ is a 4 × 4 covariance matrix. The random error terms $\epsilon_{ji}$ follow N(0, 0.5). We set the mean vector $\mu = 0$ and the variance $\Sigma = \sigma^2 I$, and adjust $\sigma^2$ to control the signal-to-noise ratio: when $\sigma^2$ is large, the signal-to-noise level is low, and when $\sigma^2$ is small, it is high.

Smoothing spline. We use an order 4 B-spline basis with knots at every data point; with data of dimension p, we use p + 2 basis functions. The tuning parameter is selected by the mean generalized cross-validation (11) at each iteration step. The covariance estimate is calculated using (22). Table 3 presents the results.

Table 3
Using smoothing splines: the aMSE of the estimated functions and the MSE of the two covariance estimators with different sample sizes and dimensions. The value of $\sigma^2$ is adjusted to control the signal-to-noise ratio.

We elaborate on the example presented in Section 2.2. We generate data from
$Y_{ij} = \sum_{k=1}^{7} c_{ik}\phi_k(u_j) + \epsilon_{ji}, \quad i = 1, \ldots, n, \; j = 1, \ldots, p,$ (23)
where $\{\phi_k(u), k = 1, \ldots, 7\}$ is a set of Fourier basis functions. The first Fourier basis function $\phi_1(u)$ is the constant function; the remainder are sine and cosine pairs with integer multiples of the base period. We generate the Fourier functions with doubled frequencies in the second half of the interval to simulate a change in the basis functions. In particular, when $u \in [0, 0.5]$, $\phi_k(u) = \sin(k\pi u)$ for k = 2, 4, 6 and $\phi_k(u) = \cos((k-1)\pi u)$ for k = 3, 5, 7, while when $u \in (0.5, 1]$, $\phi_k(u) = \sin(2k\pi u)$ for k = 2, 4, 6 and $\phi_k(u) = \cos(2(k-1)\pi u)$ for k = 3, 5, 7. The coefficients $c_{ik}$ are generated from the normal distribution with mean 0 and variance 0.5, and the error terms are also drawn from the normal distribution with mean 0 and variance 0.5. The generated $Y_{ij}$ are shown in Figure 2a; the data exhibit more variation in the second half of the interval.

Suppose we were unaware of the change in the frequencies of the basis functions and used the first-half bases to fit the data on the whole interval. The smoothing model residuals, shown in Figure 2b, are large in the second half. When the frequency of the basis functions is misidentified, a smoothing model with the wrong set of bases is inadequate. We conduct a principal component analysis on the smoothing residuals; the eigenvalues, in descending order, are shown in Figure 4a. The residuals preserve a spiked structure, in which six common factors can explain most of the variation.

We also apply the FASM to the same data with the wrong set of basis functions. Guided by the eigenvalue scree plot, we retain six factors in the model (r = 6).

(a) Eigenvalue spikes of the smoothing residuals. (b) Residuals of the FASM.
Figure 4: Applying the FASM to the data generated by (23).
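The frequency-doubling basis in (23) can be generated directly; the sketch below follows our reading of the garbled formulas (sine/cosine pairs on [0, 0.5], the same pairs with doubled frequency on (0.5, 1]), so treat the exact frequency rule as an assumption.

```python
import numpy as np

def fourier_basis(u, double_second_half=True):
    """Seven Fourier-type basis functions evaluated at points u in [0, 1].
    Frequencies double for u > 0.5 when double_second_half is True
    (our reconstruction of the design in (23))."""
    u = np.asarray(u, dtype=float)
    mult = np.where((u > 0.5) & double_second_half, 2.0, 1.0)
    cols = [np.ones_like(u)]                            # phi_1: constant
    for k in range(2, 8):
        if k % 2 == 0:
            cols.append(np.sin(mult * k * np.pi * u))         # phi_k = sin(k*pi*u), k even
        else:
            cols.append(np.cos(mult * (k - 1) * np.pi * u))   # phi_k = cos((k-1)*pi*u), k odd
    return np.column_stack(cols)

B = fourier_basis(np.linspace(0, 1, 101))
```

The cosine columns jump at u = 0.5 when the frequency doubles, which is exactly the kind of mid-interval structural change the simulated data are meant to exhibit.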
We study the case where the functional data exhibit a dramatic change in the mean level within a small window. We generate data from the following model:
$Y_{ij} = \mu(u_j) + \sum_{k} c_{ik}\phi_k(u_j) + \epsilon_{ji}, \quad i = 1, \ldots, n, \; j = 1, \ldots, p,$
where the basis functions $\phi_k(u)$ are order 4 B-spline bases, the coefficients $c_{ik}$ come from N(0, 1.5), and the error terms from N(0, 0.5). The mean function $\mu(u)$ is generated by a linear combination of 25 B-spline basis functions. Figure 5 shows an example of the mean function: there is a sharp increase in the mean function, and $\delta$ denotes the amount of change. Figure 3 is generated using $\delta = 2$. Figure 6 compares the residuals from the smoothing model and the FASM. With the smoothing model, the residuals around the jump are large; in contrast, our model explains the large residuals around the structural break very well.

In terms of model selection, we consider the trade-off between model fit and model flexibility. We first quantify the flexibility of a fitted model through its degrees of freedom, using the same concept as in most textbooks: the degrees of freedom measure the number of parameters estimated from the data required to define the model. The degrees of freedom for the smoothing model are calculated by (12) at the last step of convergence. The degrees of freedom for the FASM are
$\mathrm{df} = \mathrm{trace}\Big[\Phi\big(\Phi^\top M_{\widehat{A}^{(t)}}\Phi + \alpha R\big)^{-1}\Phi^\top M_{\widehat{A}^{(t)}}\Big] + r,$
where r is the number of factors retained in the fitted model. The larger the degrees of freedom, the more flexible the fitted model. To quantify the model fit, we use
$\mathrm{RMSE} = \sqrt{\frac{1}{np}\sum_{i=1}^{n}\sum_{j=1}^{p}\big(Y_{ij} - \widehat{Y}_{ij}\big)^2},$
where $\widehat{Y}_{ij} = \sum_{k=1}^{K}\widehat{c}_{ik}\phi_k(u_j) + \widehat{\eta}_{ij}$. Table 4 shows the simulation results as the size of the mean shift $\delta$ varies. The RMSE of the FASM is always smaller than that of the compared model, and as $\delta$ increases, the degrees of freedom are smaller for the proposed model. We therefore achieve a better fit with less flexibility with the FASM.

Figure 5: The mean function $\mu(u)$.
(a) Residuals from applying the smoothing model. (b) Residuals from the FASM.
Figure 6: Residual plots of the two models.
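The factor-adjusted coefficient estimator and the degrees-of-freedom trace used in this comparison can be sketched directly. All matrices below are our own random stand-ins ($\alpha$, R, and the fitted loadings are placeholders, not the paper's fitted values); the point is the two formulas, not the numbers.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, K, r = 50, 40, 5, 2

Phi = rng.standard_normal((p, K))      # basis matrix (stand-in)
A_hat = rng.standard_normal((p, r))    # fitted factor loadings (stand-in)
R = np.eye(K)                          # roughness penalty matrix (stand-in for R in (5))
alpha = 0.1
Y = rng.standard_normal((p, n))

# annihilator of the fitted loading space: M_A = I - A (A^T A)^{-1} A^T
M_A = np.eye(p) - A_hat @ np.linalg.solve(A_hat.T @ A_hat, A_hat.T)

# factor-adjusted penalized coefficients:
# c_i = (Phi^T M_A Phi + alpha R)^{-1} Phi^T M_A Y_i for every curve at once
G = Phi.T @ M_A @ Phi + alpha * R
C_hat = np.linalg.solve(G, Phi.T @ M_A @ Y)

# degrees of freedom: trace of the smoother matrix plus r retained factors
S = Phi @ np.linalg.solve(G, Phi.T @ M_A)
df = np.trace(S) + r
```

Since the smoother's eigenvalues lie in [0, 1), the trace term is bounded by K, so the degrees of freedom stay between r and K + r, matching the intuition that the penalty trades flexibility for fit.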
Table 4: The trade-off between model fit and flexibility (RMSE and degrees of freedom for the smoothing model and the proposed model at different values of $\delta$).

In this section, we apply the FASM to two real data sets. In Section 8.1, we compare Canadian yearly temperature and precipitation data and demonstrate the advantages of the FASM when the measurement error is large. In Section 8.2, we analyze Australian daily temperature data and show the necessity of including the factor model because of the spiked structure of the data.
In Section 2.1, we introduced the Canadian weather data. Raw observations of the daily temperature and precipitation data are presented in Figure 1. Since the true basis functions are unknown, we apply the FASM with the nonparametric smoothing splines introduced in Section 5 to these two datasets. We use order 4 B-spline basis functions with knots at every data point; thus, when the number of data points is 365, we use 367 basis functions. The number of factors r is chosen from the scree plot showing the fraction of variation explained. For the temperature data, we presume the measurement error is small. The resulting smoothed curves are shown in Figure 7. Compared with the smoothing model introduced in Section 7.2, the FASM generates similar results. This meets our expectation that our model should behave like a simple smoothing model when measurement error is absent.

In Section 2.1, we suspected that large measurement errors are contained in the raw log precipitation data. We apply the two models to the log precipitation data; the resulting smoothed curves are presented in Figure 8. The plot on the right shows smoother curves, especially at the drop in the blue curve (the 'Victoria' station) at around day 200. Looking at the residual plots in Figure 9, our model mainly explains some extreme residuals left over from applying the smoothing model alone. As in Section 6, we also compare the RMSE and degrees of freedom of the two fitted models; they are 0.1933 and 14.41 for the smoothing model, and 0.1659 and 12.71 respectively for the proposed model. Thus, in terms of model selection, our model performs better on both model fit and model simplicity.

(a) Smoothed temperature curves from basis smoothing with the penalty. (b) Smoothed temperature curves from the FASM.
Figure 7: Comparison between the smoothed curves.
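The spline design used here (order 4 B-splines with knots at every observation, hence p + 2 basis functions for p points) can be set up with scipy. This is our own sketch, not the authors' code:

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(u, order=4):
    """Order-4 (cubic) B-spline basis with knots at every point of u.
    With p points this yields p + 2 basis functions, matching Section 5."""
    p = len(u)
    # clamped knot vector: boundary knots repeated `order` times in total
    knots = np.concatenate([np.repeat(u[0], order - 1), u,
                            np.repeat(u[-1], order - 1)])
    n_basis = len(knots) - order          # = p + 2
    B = np.empty((p, n_basis))
    for k in range(n_basis):
        coef = np.zeros(n_basis)
        coef[k] = 1.0                     # isolate the k-th basis function
        B[:, k] = BSpline(knots, coef, order - 1)(u)
    return B

u = np.linspace(0, 1, 25)
Psi = bspline_basis(u)                    # 25 x 27 basis matrix
```

With 365 daily observations this recipe gives the 367 basis functions mentioned above; the clamped knot vector makes the basis a partition of unity on the observation interval.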
(a) Smoothed log precipitation curves using the smoothing model. (b) Smoothed log precipitation curves using the FASM.
Figure 8: Comparison between the smoothed curves.
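The number of factors r in these applications is read off a scree plot of the residual eigenvalues. A minimal automated stand-in for that visual rule (the 95% threshold and all names below are ours, not the paper's) is to retain the smallest r whose leading eigenvalues explain a given fraction of the residual variation:

```python
import numpy as np

def choose_r_scree(resid, frac=0.95):
    """Smallest r whose leading eigenvalues of the residual covariance
    explain at least `frac` of total variation (an illustrative rule;
    the paper reads r off a scree plot)."""
    p, n = resid.shape
    cov = resid @ resid.T / n
    eig = np.sort(np.linalg.eigvalsh(cov))[::-1]   # descending eigenvalues
    ratio = np.cumsum(eig) / eig.sum()
    return int(np.argmax(ratio >= frac)) + 1

# synthetic residuals with a 3-factor spike plus small noise
rng = np.random.default_rng(4)
p, n, r = 40, 300, 3
resid = rng.standard_normal((p, r)) @ (3.0 * rng.standard_normal((r, n))) \
        + 0.1 * rng.standard_normal((p, n))
r_hat = choose_r_scree(resid)
```

On strongly spiked residuals like these, the rule recovers the true number of factors; formal alternatives such as the eigenvalue-ratio test of Ahn & Horenstein (2013) could be substituted.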
(a) Residuals from basis smoothing with the penalty. (b) Residuals from the FASM.
Figure 9: Comparison between the residuals.

In this section, we consider Friday temperature data at Adelaide airport. We choose Adelaide because it tends to have the hottest temperatures among Australia's big cities; data from other weekdays exhibit similar features and are not shown here. The data are measured every half hour from 1997 to 2007. The sample size n is 508, and the number of discrete data points p on each curve is 48. A plot of the raw data is given in Figure 10a. The data are quite noisy, with extreme values in some of the curves due to large measurement errors.

We use B-spline basis functions of order 4 with knots at every data point. A penalized smoothing model is fitted to the data, with the tuning parameter selected to minimize the mGCV value. The residuals are shown in Figure 10b: the smoothing model fails to capture the extreme values in the data.

We check the spikiness of the residuals in Figure 10b by conducting a principal component analysis. The eigenvalues, in descending order, are shown in Figure 10c. The first few eigenvalues are significantly larger than the rest; the residuals therefore contain information captured by just a few factors, which calls for a further dimension reduction model on the residuals.

As a comparison, the FASM is also applied to the data. The tuning parameter for the smoothing part is selected by mGCV at each step of the iteration, and the number of factors retained in the factor model component is five. The residuals are shown in Figure 10d; the extreme values are almost all removed from the remaining residuals.

Conclusion

In this paper, we propose a factor-augmented smoothing model for functional data. We study raw functional data, which are a mixture of functional curves and high-dimensional errors. When the measurement error is informative, a smoothing model alone is inadequate to capture the data variation and recover the smooth functional component. The proposed model incorporates a factor structure into the smoothing model to explain the large residuals. We propose a numerical iteration approach to obtain the estimates in the smoothing model and the factor model simultaneously. The asymptotic distribution of the estimators is given with proof.
(a) Raw temperature data. (b) Residuals of the smoothing model. (c) Eigenvalues of the residuals in panel (b). (d) Residuals of the FASM.
Figure 10: Half-hourly temperature data on Friday at Adelaide airport.

Our model also serves as a dimension reduction method for functional and high-dimensional mixture data, easing the path to making inferences. We provide an example with the construction of a covariance estimator for the raw data. Further, we show that the model can be applied in situations where the data structure is misidentified, two examples of which are the misspecification of the smoothing basis functions and the neglect of step jumps in the mean level of the functions. The advantages of the proposed model are demonstrated in extensive simulation studies. We also show how our model performs via applications to Canadian weather data and Australian temperature data.

The proposed model is a good starting point for modeling complex data structures. The data we deal with are a mixture of smooth functional curves and high-dimensional measurement error. The factor model component can be regarded as a "boosting" component that improves model accuracy. Extending this idea, the model can be applied to other data structures. One example is data that contain change points. The change point problem is popular in many statistics and econometrics topics and has been studied extensively in the multivariate setting; previous literature on change points in functional data includes Berkes et al. (2009), Hörmann & Kokoszka (2010) and Hörmann et al. (2015). Our simulation examples show that the model can be used for functional data with a change point in the cross-sectional direction. The model can be modified to also account for change points in the sample direction; further research can be conducted along this line.

References
Ahn, S. C. & Horenstein, A. R. (2013), 'Eigenvalue ratio test for the number of factors', Econometrica (3), 1203–1227.
Akaike, H. (1974), 'A new look at the statistical model identification', IEEE Transactions on Automatic Control (6), 716–723.
Bai, J. & Ng, S. (2002), 'Determining the number of factors in approximate factor models', Econometrica (1), 191–221.
Berkes, I., Gabrys, R., Horváth, L. & Kokoszka, P. (2009), 'Detecting changes in the mean of functional observations', Journal of the Royal Statistical Society: Series B (Statistical Methodology) (5), 927–946.
Bickel, P. J. & Levina, E. (2008), 'Regularized estimation of large covariance matrices', The Annals of Statistics (1), 199–227.
Cai, T. T. & Yuan, M. (2011), 'Optimal estimation of the mean function based on discretely sampled functional data: Phase transition', The Annals of Statistics (5), 2330–2355.
Cuevas, A. (2014), 'A partial overview of the theory of statistics with functional data', Journal of Statistical Planning and Inference, 1–23.
Eubank, R. L. (1999), Nonparametric Regression and Spline Smoothing, 2nd edn, Marcel Dekker, New York.
Fan, J., Fan, Y. & Lv, J. (2008), 'High dimensional covariance matrix estimation using a factor model', Journal of Econometrics (1), 186–197.
Fan, J. & Gijbels, I. (1996), Local Polynomial Modelling and Its Applications, Chapman & Hall, London.
Fan, J., Liao, Y. & Mincheva, M. (2011), 'High dimensional covariance matrix estimation in approximate factor models', The Annals of Statistics (6), 3320.
Febrero-Bande, M., Galeano, P. & González-Manteiga, W. (2017), 'Functional principal component regression and functional partial least-squares regression: An overview and a comparative study', International Statistical Review (1), 61–83.
Ferraty, F. & Vieu, P. (2006), Nonparametric Functional Data Analysis: Theory and Practice, Springer Science & Business Media, New York.
Goia, A. & Vieu, P. (2016), 'An introduction to recent advances in high/infinite dimensional statistics', Journal of Multivariate Analysis, 1–6.
Golub, G. H., Heath, M. & Wahba, G. (1979), 'Generalized cross-validation as a method for choosing a good ridge parameter', Technometrics (2), 215–223.
Green, P. J. & Silverman, B. W. (1999), Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach, Chapman & Hall, London.
Hörmann, S., Kidziński, L. & Hallin, M. (2015), 'Dynamic functional principal components', Journal of the Royal Statistical Society: Series B (Statistical Methodology) (2), 319–348.
Hörmann, S. & Kokoszka, P. (2010), 'Weakly dependent functional data', The Annals of Statistics (3), 1845–1884.
Horváth, L. & Kokoszka, P. (2012), Inference for Functional Data with Applications, Vol. 200, Springer Science & Business Media, New York.
Jiang, B., Yang, Y., Gao, J. & Hsiao, C. (2020), 'Recursive estimation in large panel data models: Theory and practice', Journal of Econometrics.
Knight, K. & Fu, W. (2000), 'Asymptotics for lasso-type estimators', The Annals of Statistics (5), 1356–1378.
Lam, C., Yao, Q. & Bathia, N. (2011), 'Estimation of latent factors for high-dimensional time series', Biometrika (4), 901–918.
Onatski, A. (2010), 'Determining the number of factors from empirical distribution of eigenvalues', The Review of Economics and Statistics (4), 1004–1016.
Onatski, A. (2012), 'Asymptotics of the principal components estimator of large factor models with weakly influential factors', Journal of Econometrics (2), 244–258.
Ramsay, J. O. & Hooker, G. (2017), Dynamic Data Analysis: Modeling Data with Differential Equations, Springer, New York.
Ramsay, J. O. & Silverman, B. W. (2002), Applied Functional Data Analysis, Springer, New York.
Ramsay, J. O. & Silverman, B. W. (2005), Functional Data Analysis, Springer, New York.
Reiss, P. T., Goldsmith, J., Shang, H. L. & Ogden, R. T. (2017), 'Methods for scalar-on-function regression', International Statistical Review (2), 228–249.
Schwarz, G. (1978), 'Estimating the dimension of a model', The Annals of Statistics (2), 461–464.
Wahba, G. (1990), Spline Models for Observational Data, Vol. 59, SIAM.
Wand, M. P. & Jones, C. M. (1995), Kernel Smoothing, Chapman & Hall, Boca Raton, FL.
Wang, J.-L., Chiou, J.-M. & Müller, H.-G. (2016), 'Functional data analysis', Annual Review of Statistics and Its Application, 257–295.
Wong, F., Carter, C. K. & Kohn, R. (2003), 'Efficient estimation of covariance selection models', Biometrika (4), 809–830.
Yao, W. & Li, R. (2013), 'New local estimation procedure for a non-parametric regression function for longitudinal data', Journal of the Royal Statistical Society: Series B (Statistical Methodology) (1), 123–138.
Zhang, X. & Wang, J. L. (2016), 'From sparse to dense functional data and beyond', The Annals of Statistics (5), 2281–2321.

Supplement to "Factor-augmented Smoothing Model for Functional Data"
This section contains the proofs of the theorems in the main article. Appendix A provides the proofs of the theorems in Section 4. Appendix B contains a proposition and its proof. Appendix C states the lemmas used in the proofs of Appendices A and B, together with their proofs.
Appendix A
Theorem 2 is the main result of the asymptotic theory, and its proof is lengthy; we therefore give an outline of the proof before the details.
Outline of the proof of Theorem 2
In Theorem 2, we find the order of convergence of the estimated coefficient matrix $\hat{C}$. The difference between $\hat{C}$ and $C$ can be written as three terms:
\[
\frac{1}{p}\left(\Phi^{\top}M_{\hat{A}}\Phi+\alpha R\right)(\hat{C}-C)=-\frac{\alpha}{p}RC+\frac{1}{p}\Phi^{\top}M_{\hat{A}}AF^{\top}+\frac{1}{p}\Phi^{\top}M_{\hat{A}}E. \tag{24}
\]
Note that $\frac{1}{p}\left(\Phi^{\top}M_{\hat{A}}\Phi+\alpha R\right)$ is $O_{p}(1)$. The first term on the right-hand side of (24) comes from the penalty, and its order follows easily from Assumption 6. The third term contains the random error matrix $E$, and its order follows from Lemma 10. The second term is the most complicated one; we show in the proof below that it can be further broken down into eight terms. We find the order of each of the eight terms using the lemmas in Appendix C. Most of the terms can be shown to be $o_{p}(\|C-\hat{C}\|)$ and thus can be omitted. Combining the remaining terms, we arrive at the result
\[
(\hat{C}-C)M_{F}=-Q^{-1}(\hat{A})\frac{\alpha}{p}RC+Q^{-1}(\hat{A})\frac{1}{p}\Phi^{\top}M_{\hat{A}}EM_{F}+O_{p}\Big(\frac{1}{\min(n,p)}\Big)+O_{p}\Big(\frac{1}{\sqrt{np}\sqrt{p}}\Big)+O_{p}\Big(\frac{1}{\sqrt{np}}\Big), \tag{25}
\]
where the matrices $Q$ and $M_{F}$ are
\[
Q(\hat{A})=\frac{1}{p}\Phi^{\top}M_{\hat{A}}\Phi,\qquad M_{F}=I_{n}-F\left(F^{\top}F\right)^{-1}F^{\top}.
\]
The first term on the right-hand side of (25) is $O_{p}(1)$ using the assumption on the tuning parameter $\alpha$. We also show that the second term is $O_{p}(1)$ using results from the lemmas. When $n$ and $p$ are of the same order, we are then able to show that the norm of $(\hat{C}-C)$ projected on the matrix $M_{F}$, scaled by $\sqrt{p}/\sqrt{n}$, is $O_{p}(1)$.

Next begin the formal proofs.

Proof of Theorem 1
Proof.
The concentrated objective function defined in Section 4.2 is
\[
S_{np}(c_{i},A)=\frac{1}{np}\sum_{i=1}^{n}\left[(Y_{i}-\Phi c_{i})^{\top}M_{A}(Y_{i}-\Phi c_{i})+\alpha c_{i}^{\top}Rc_{i}\right]-\frac{1}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}M_{A^{0}}\varepsilon_{i}.
\]
Assume $c_{i}^{0}=0$ for simplicity, without loss of generality. From $Y_{i}=\Phi c_{i}^{0}+A^{0}f_{i}+\varepsilon_{i}=A^{0}f_{i}+\varepsilon_{i}$, we have
\begin{align*}
S_{np}(c_{i},A)&=\frac{1}{np}\sum_{i=1}^{n}\left[(A^{0}f_{i}+\varepsilon_{i}-\Phi c_{i})^{\top}M_{A}(A^{0}f_{i}+\varepsilon_{i}-\Phi c_{i})+\alpha c_{i}^{\top}Rc_{i}\right]-\frac{1}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}M_{A^{0}}\varepsilon_{i}\\
&=\frac{1}{np}\sum_{i=1}^{n}f_{i}^{\top}A^{0\top}M_{A}A^{0}f_{i}+\frac{1}{np}\sum_{i=1}^{n}c_{i}^{\top}\Phi^{\top}M_{A}\Phi c_{i}-\frac{2}{np}\sum_{i=1}^{n}f_{i}^{\top}A^{0\top}M_{A}\Phi c_{i}\\
&\quad+\frac{2}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}M_{A}A^{0}f_{i}-\frac{2}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}M_{A}\Phi c_{i}+\frac{1}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}(M_{A}-M_{A^{0}})\varepsilon_{i}+\frac{\alpha}{np}\sum_{i=1}^{n}c_{i}^{\top}Rc_{i}.
\end{align*}
Denote the first three terms in the above equation as
\[
\widetilde{S}_{np}(c_{i},A)=\frac{1}{np}\sum_{i=1}^{n}f_{i}^{\top}A^{0\top}M_{A}A^{0}f_{i}+\frac{1}{np}\sum_{i=1}^{n}c_{i}^{\top}\Phi^{\top}M_{A}\Phi c_{i}-\frac{2}{np}\sum_{i=1}^{n}f_{i}^{\top}A^{0\top}M_{A}\Phi c_{i}.
\]
Then by Lemma 4, $S_{np}(c_{i},A)=\widetilde{S}_{np}(c_{i},A)+o_{p}(1)$. It is easy to see that $\widetilde{S}_{np}(c_{i}=c_{i}^{0},A^{0}H)=0$ for any $r\times r$ invertible $H$, because $M_{A^{0}H}=M_{A^{0}}$ and $M_{A^{0}}A^{0}=0$.

Here we define two matrix operations before further transforming $\widetilde{S}_{np}(c_{i},A)$. For an $m\times n$ matrix $U$ and a $p\times q$ matrix $V$, the vectorization of $U$ is defined as
\[
\mathrm{vec}(U)\equiv(u_{1,1},\ldots,u_{m,1},u_{1,2},\ldots,u_{m,2},\ldots,u_{1,n},\ldots,u_{m,n})^{\top},
\]
and the Kronecker product $U\otimes V$ is the $pm\times qn$ block matrix defined as
\[
U\otimes V\equiv\begin{pmatrix}u_{1,1}V&\cdots&u_{1,n}V\\ \vdots&\ddots&\vdots\\ u_{m,1}V&\cdots&u_{m,n}V\end{pmatrix},
\]
where $u_{ij}$ represents the element in the $i$th row and $j$th column of the matrix $U$.

Next we can further write $\widetilde{S}_{np}(c_{i},A)$ as
\[
\widetilde{S}_{np}(c_{i},A)=\mathrm{vec}(M_{A}A^{0})^{\top}\Big(\frac{F^{\top}F}{np}\otimes I_{p}\Big)\mathrm{vec}(M_{A}A^{0})+\frac{1}{n}\sum_{i=1}^{n}c_{i}^{\top}\Big(\frac{1}{p}\Phi^{\top}M_{A}\Phi\Big)c_{i}-\frac{2}{n}\sum_{i=1}^{n}c_{i}^{\top}\Big(\frac{1}{p}f_{i}\otimes M_{A}\Phi\Big)^{\top}\mathrm{vec}(M_{A}A^{0}).
\]
If we denote
\[
P=\frac{1}{p}\Phi^{\top}M_{A}\Phi,\qquad W=\frac{F^{\top}F}{np}\otimes I_{p},\qquad V_{i}=\frac{1}{p}f_{i}\otimes M_{A}\Phi,
\]
and $\gamma=\mathrm{vec}(M_{A}A^{0})$, then we can write
\begin{align*}
\widetilde{S}_{np}(c_{i},A)&=\frac{1}{n}\sum_{i=1}^{n}\left[c_{i}^{\top}Pc_{i}+\gamma^{\top}W\gamma-2c_{i}^{\top}V_{i}^{\top}\gamma\right]\\
&=\frac{1}{n}\sum_{i=1}^{n}\left[c_{i}^{\top}\left(P-V_{i}^{\top}W^{-1}V_{i}\right)c_{i}+\left(\gamma-W^{-1}V_{i}c_{i}\right)^{\top}W\left(\gamma-W^{-1}V_{i}c_{i}\right)\right]\\
&\equiv\frac{1}{n}\sum_{i=1}^{n}\left[c_{i}^{\top}D_{i}(A)c_{i}+\theta_{i}^{\top}W\theta_{i}\right].
\end{align*}
In the last equation,
\[
D_{i}(A)\equiv P-V_{i}^{\top}W^{-1}V_{i}=\frac{1}{p}\Phi^{\top}M_{A}\Phi-\frac{1}{p}\Phi^{\top}M_{A}\Phi\, f_{i}^{\top}\Big(\frac{F^{\top}F}{n}\Big)^{-1}f_{i},\qquad \theta_{i}\equiv\gamma-W^{-1}V_{i}c_{i}.
\]
By Assumptions 2 and 3, the matrices $D_{i}$ and $W$ are positive definite for each $i$. Thus we have $\widetilde{S}_{np}(c_{i},A)\geq 0$. In addition, if either $c_{i}\neq c_{i}^{0}$ or $A\neq A^{0}H$, then $\widetilde{S}_{np}(c_{i},A)>0$. Thus $\widetilde{S}_{np}(c_{i},A)$ achieves its unique minimum at $(c_{i}^{0},A^{0}H)$, and we have $\hat{c}_{i}-c_{i}^{0}=o_{p}(1)$, $i=1,\ldots,n$.

Next, we show that $\hat{c}_{i}$ is consistent uniformly in $i$. We can write
\[
S_{np}(c_{i},A)-\widetilde{S}_{np}(c_{i},A)=\frac{2}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}M_{A}A^{0}f_{i}-\frac{2}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}M_{A}\Phi c_{i}+\frac{1}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}(M_{A}-M_{A^{0}})\varepsilon_{i}+\frac{\alpha}{np}\sum_{i=1}^{n}c_{i}^{\top}Rc_{i}.
\]
Using Taylor's expansion at $c_{i}^{0}$,
\begin{align*}
S_{np}(c_{i},A)-\widetilde{S}_{np}(c_{i},A)&=\frac{2}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}M_{A}A^{0}f_{i}-\frac{2}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}M_{A}\Phi c_{i}^{0}+\frac{1}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}(M_{A}-M_{A^{0}})\varepsilon_{i}+\frac{\alpha}{np}\sum_{i=1}^{n}c_{i}^{0\top}Rc_{i}^{0}\\
&\quad+\Big(-\frac{2}{np}\varepsilon_{i}^{\top}M_{A}\Phi+\frac{2\alpha}{np}c_{i}^{0\top}R\Big)(c_{i}-c_{i}^{0})+\Delta,
\end{align*}
where $\Delta$ denotes the smaller-order terms. Then we have
\begin{align}
\Big(-\frac{2}{np}\varepsilon_{i}^{\top}M_{A}\Phi+\frac{2\alpha}{np}c_{i}^{0\top}R\Big)(c_{i}-c_{i}^{0})&=S_{np}(c_{i},A)-\widetilde{S}_{np}(c_{i},A)-\frac{2}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}M_{A}A^{0}f_{i}+\frac{2}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}M_{A}\Phi c_{i}^{0}\notag\\
&\quad-\frac{1}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}(M_{A}-M_{A^{0}})\varepsilon_{i}-\frac{\alpha}{np}\sum_{i=1}^{n}c_{i}^{0\top}Rc_{i}^{0}+\Delta. \tag{26}
\end{align}
In the above equation, the right-hand side is $o_{p}(1)$ uniformly in $i$. This is because $S_{np}(c_{i},A)-\widetilde{S}_{np}(c_{i},A)=o_{p}(1)$, and both $S_{np}(c_{i},A)$ and $\widetilde{S}_{np}(c_{i},A)$ consist of summations over $i$. Furthermore, all other terms on the right-hand side are $o_{p}(1)$, as proved in Lemma 4, and all contain summations over $i$. On the left-hand side of (26), $\frac{2}{np}\varepsilon_{i}^{\top}M_{A}\Phi$ is $o_{p}(1)$ uniformly because $\mathbb{E}\big(\frac{1}{np}\|\varepsilon_{i}^{\top}M_{A}\Phi c_{i}\|\big)=o(1)$, as shown in Lemma 4$(ii)$. Moreover, the term $\frac{2\alpha}{np}c_{i}^{0\top}R$ is also $o_{p}(1)$ uniformly, using Lemma 4$(iv)$ and the assumption in Assumption 2 that the $c_{i}^{0}$ are uniformly bounded. This leads us to the result that $\hat{c}_{i}-c_{i}^{0}=o_{p}(1)$ uniformly for all $i=1,\ldots,n$. Combining over $i$, we have
\[
\frac{\|\hat{C}-C^{0}\|}{\sqrt{n}}=o_{p}(1).
\]
To prove part $(ii)$, note that the centred objective function satisfies $S_{np}(\hat{c}_{i},\hat{A})\leq S_{np}(c_{i}=c_{i}^{0},A^{0})=0$. Therefore,
\[
0\geq S_{np}(\hat{c}_{i},\hat{A})=\widetilde{S}_{np}(\hat{c}_{i},\hat{A})+o_{p}(1).
\]
Combined with $\widetilde{S}_{np}(\hat{c}_{i},\hat{A})\geq 0$, it must be true that $\widetilde{S}_{np}(\hat{c}_{i},\hat{A})=o_{p}(1)$. This implies that
\[
\frac{1}{np}\sum_{i=1}^{n}f_{i}^{\top}A^{0\top}M_{\hat{A}}A^{0}f_{i}=\mathrm{tr}\Bigg[\frac{A^{0\top}M_{\hat{A}}A^{0}}{p}\,\frac{F^{\top}F}{n}\Bigg]=o_{p}(1).
\]
Since $F^{\top}F/n=O_{p}(1)$ and positive definite, it must be true that
\[
\frac{A^{0\top}M_{\hat{A}}A^{0}}{p}=\frac{A^{0\top}A^{0}}{p}-\frac{A^{0\top}\hat{A}}{p}\,\frac{\hat{A}^{\top}A^{0}}{p}=o_{p}(1). \tag{27}
\]
By Assumption 4, $A^{0\top}A^{0}/p$ is invertible. Thus $A^{0\top}\hat{A}/p$ is also invertible. Next,
\[
\|P_{\hat{A}}-P_{A^{0}}\|^{2}=\mathrm{tr}\big[(P_{\hat{A}}-P_{A^{0}})^{2}\big]=2\,\mathrm{tr}\big(I_{r}-\hat{A}^{\top}P_{A^{0}}\hat{A}/p\big).
\]
But (27) implies $\hat{A}^{\top}P_{A^{0}}\hat{A}/p\to I_{r}$, which means $\|P_{\hat{A}}-P_{A^{0}}\|\to 0$.

Proof of Theorem 2
Proof.
Writing the first equation in (10) in matrix notation, we have
\[
\hat{C}=\left(\Phi^{\top}M_{\hat{A}}\Phi+\alpha R\right)^{-1}\Phi^{\top}M_{\hat{A}}Y. \tag{28}
\]
Substituting $Y=\Phi C+AF^{\top}+E$ into (28) and subtracting $C$ from both sides, we get
\[
\hat{C}-C=\left[\left(\Phi^{\top}M_{\hat{A}}\Phi+\alpha R\right)^{-1}\Phi^{\top}M_{\hat{A}}\Phi-I_{K}\right]C+\left(\Phi^{\top}M_{\hat{A}}\Phi+\alpha R\right)^{-1}\Phi^{\top}M_{\hat{A}}AF^{\top}+\left(\Phi^{\top}M_{\hat{A}}\Phi+\alpha R\right)^{-1}\Phi^{\top}M_{\hat{A}}E,
\]
or
\[
\frac{1}{p}\left(\Phi^{\top}M_{\hat{A}}\Phi+\alpha R\right)(\hat{C}-C)=-\frac{\alpha}{p}RC+\frac{1}{p}\Phi^{\top}M_{\hat{A}}AF^{\top}+\frac{1}{p}\Phi^{\top}M_{\hat{A}}E. \tag{29}
\]
We first look at the second term on the right-hand side of (29). Recall that $M_{\hat{A}}=I_{p}-\hat{A}\hat{A}^{\top}/p$, so $M_{\hat{A}}\hat{A}=0$. Thus
\[
M_{\hat{A}}A=M_{\hat{A}}\left(A-\hat{A}H^{-1}+\hat{A}H^{-1}\right)=M_{\hat{A}}\left(A-\hat{A}H^{-1}\right),
\]
where $H$ is defined in (43). Using (47), it follows that
\[
\frac{1}{p}\Phi^{\top}M_{\hat{A}}AF^{\top}=-\frac{1}{p}\Phi^{\top}M_{\hat{A}}(I_{1}+\cdots+I_{8})\Big(\frac{A^{\top}\hat{A}}{p}\Big)^{-1}\Big(\frac{F^{\top}F}{n}\Big)^{-1}F^{\top}\equiv J_{1}+\cdots+J_{8}. \tag{30}
\]
Note that
\[
G\equiv\Big(\frac{A^{\top}\hat{A}}{p}\Big)^{-1}\Big(\frac{F^{\top}F}{n}\Big)^{-1}. \tag{31}
\]
We prove in Lemma 5 that $G=O_{p}(1)$. We also use the fact that $\|M_{\hat{A}}\|=O_{p}(1)$. Now
\[
J_{1}=-\frac{1}{p}\Phi^{\top}M_{\hat{A}}I_{1}GF^{\top}. \tag{32}
\]
Since $I_{1}=O_{p}\big(\frac{\sqrt{p}}{n}\|C-\hat{C}\|^{2}\big)$, using the result from Lemma 2$(i)$, the term $J_{1}=O_{p}\big(\frac{1}{\sqrt{n}}\|C-\hat{C}\|^{2}\big)$; thus it is also $o_{p}(\|C-\hat{C}\|)$. Next,
\[
J_{2}=-\frac{1}{p}\Phi^{\top}M_{\hat{A}}I_{2}\Big(\frac{A^{\top}\hat{A}}{p}\Big)^{-1}\Big(\frac{F^{\top}F}{n}\Big)^{-1}F^{\top}=\frac{1}{p}\Phi^{\top}M_{\hat{A}}\Phi(\hat{C}-C)F\left(F^{\top}F\right)^{-1}F^{\top}. \tag{33}
\]
Since $J_{2}$ is not a small-order term, we keep it as it is. Now consider
\[
J_{3}=-\frac{1}{p}\Phi^{\top}M_{\hat{A}}I_{3}\Big(\frac{A^{\top}\hat{A}}{p}\Big)^{-1}\Big(\frac{F^{\top}F}{n}\Big)^{-1}F^{\top}=\frac{1}{np}\cdot\frac{1}{p}\Phi^{\top}M_{\hat{A}}\Phi(\hat{C}-C)E^{\top}\hat{A}GF^{\top}. \tag{34}
\]
We take $E^{\top}\hat{A}=E^{\top}(\hat{A}-AH)+E^{\top}AH$, where the order of each term can be found in Lemma 6$(i)$. Again using the results of Lemma 2$(i)$ and $(iii)$, it can be shown that $J_{3}=o_{p}(\|C-\hat{C}\|)$. Next,
\[
\|J_{4}\|=\Bigg\|-\frac{1}{p}\Phi^{\top}M_{\hat{A}}I_{4}\Big(\frac{A^{\top}\hat{A}}{p}\Big)^{-1}\Big(\frac{F^{\top}F}{n}\Big)^{-1}F^{\top}\Bigg\|=O_{p}\bigg(\frac{\|M_{\hat{A}}A\|}{\sqrt{p}}\|C-\hat{C}\|\bigg). \tag{35}
\]
Using Proposition 1, we have
\[
\frac{1}{\sqrt{p}}\|M_{\hat{A}}A\|=\frac{1}{\sqrt{p}}\big\|M_{\hat{A}}\big(A-\hat{A}H^{-1}\big)\big\|=o_{p}(1).
\]
Thus $\|J_{4}\|=o_{p}(\|C-\hat{C}\|)$. It can also be proven that $\|J_{5}\|=o_{p}(\|C-\hat{C}\|)$. Then we consider
\[
J_{6}=-\frac{1}{p}\Phi^{\top}M_{\hat{A}}I_{6}GF^{\top}=-\frac{1}{np}\cdot\frac{1}{p}\Phi^{\top}M_{\hat{A}}AF^{\top}E^{\top}\hat{A}GF^{\top}=-\frac{1}{np}\cdot\frac{1}{p}\Phi^{\top}M_{\hat{A}}\big(A-\hat{A}H^{-1}\big)F^{\top}E^{\top}\hat{A}GF^{\top},
\]
where the last equality comes from $M_{\hat{A}}\hat{A}H^{-1}=0$. Now
\begin{align*}
\|F^{\top}E^{\top}\hat{A}\|&\leq\|F^{\top}E^{\top}(\hat{A}-AH)\|+\|F^{\top}E^{\top}AH\|\\
&=O_{p}\bigg(\frac{p}{\min(\sqrt{n},\sqrt{p})}\|C-\hat{C}\|\bigg)+O_{p}(\sqrt{n})+O_{p}\Big(\frac{p}{\sqrt{n}}\Big)+O_{p}(\sqrt{np})\\
&=O_{p}\Big(\frac{p}{\sqrt{n}}\|C-\hat{C}\|\Big)+O_{p}\Big(\frac{p}{\sqrt{n}}\Big)+O_{p}(\sqrt{np}),
\end{align*}
using Lemma 6. Thus,
\begin{align}
\|J_{6}\|&\leq\frac{1}{np}\big\|\Phi^{\top}M_{\hat{A}}\big(A-\hat{A}H^{-1}\big)\big\|\,\|F^{\top}E^{\top}\hat{A}\|\,\|G\|\,\|F^{\top}\|\notag\\
&=\frac{1}{np}\times\bigg[O_{p}\Big(\frac{p}{\sqrt{n}}\|C-\hat{C}\|\Big)+O_{p}\bigg(\frac{p}{\min(\sqrt{n},\sqrt{p})}\bigg)\bigg]\times\bigg[O_{p}\Big(\frac{p}{\sqrt{n}}\|C-\hat{C}\|\Big)+O_{p}\Big(\frac{p}{\sqrt{n}}\Big)+O_{p}(\sqrt{np})\bigg]\times O_{p}(\sqrt{n})\notag\\
&=o_{p}(\|C-\hat{C}\|)+O_{p}\Big(\frac{1}{n}\Big)+O_{p}\Big(\frac{1}{n\sqrt{p}}\Big)+O_{p}\Big(\frac{1}{np}\Big)+O_{p}\Big(\frac{1}{p\sqrt{p}}\Big), \tag{36}
\end{align}
where Proposition 1 is used in the first bound, and the last equality follows from computing the orders. Next,
\[
J_{7}=-\frac{1}{p}\Phi^{\top}M_{\hat{A}}I_{7}GF^{\top}=-\frac{1}{np}\Phi^{\top}M_{\hat{A}}EF\Big(\frac{F^{\top}F}{n}\Big)^{-1}F^{\top}. \tag{37}
\]
This term is not a small-order term, so we keep it as it is. Lastly, for the term $J_{8}$ it can be shown that
\[
J_{8}=o_{p}(\|C-\hat{C}\|)+O_{p}\Big(\frac{1}{\min(n,p)}\Big)+O_{p}\bigg(\frac{\sqrt{n}}{\sqrt{p}\,\min(n,p)}\bigg). \tag{38}
\]
Collecting the terms $J_{1}$ to $J_{8}$, we can write (29) as
\[
\Big(\frac{1}{p}\Phi^{\top}M_{\hat{A}}\Phi+\frac{\alpha}{p}R\Big)(\hat{C}-C)=-\frac{\alpha}{p}RC+J_{1}+\cdots+J_{8}+\frac{1}{p}\Phi^{\top}M_{\hat{A}}E.
\]
Combining the results we have found for $J_{1}$, $J_{3}$, $J_{4}$, $J_{5}$, $J_{6}$ and $J_{8}$,
\[
\Big(\frac{1}{p}\Phi^{\top}M_{\hat{A}}\Phi+o_{p}(1)\Big)(\hat{C}-C)-J_{2}=-\frac{\alpha}{p}RC+\frac{1}{p}\Phi^{\top}M_{\hat{A}}E+J_{7}+O_{p}\Big(\frac{1}{\min(n,p)}\Big)+O_{p}\Big(\frac{1}{\sqrt{np}\sqrt{p}}\Big)+O_{p}\Big(\frac{1}{\sqrt{np}}\Big). \tag{39}
\]
Substituting $J_{2}$ and $J_{7}$,
\begin{align}
&\Big(\frac{1}{p}\Phi^{\top}M_{\hat{A}}\Phi+o_{p}(1)\Big)(\hat{C}-C)-\frac{1}{p}\Phi^{\top}M_{\hat{A}}\Phi(\hat{C}-C)F\left(F^{\top}F\right)^{-1}F^{\top}\notag\\
&\qquad=-\frac{\alpha}{p}RC+\frac{1}{p}\Phi^{\top}M_{\hat{A}}E-\frac{1}{p}\Phi^{\top}M_{\hat{A}}EF\left(F^{\top}F\right)^{-1}F^{\top}+O_{p}\Big(\frac{1}{\min(n,p)}\Big)+O_{p}\Big(\frac{1}{\sqrt{np}\sqrt{p}}\Big)+O_{p}\Big(\frac{1}{\sqrt{np}}\Big). \tag{40}
\end{align}
Combining the two terms on the left-hand side of (40), and the second and third terms on the right-hand side of (40), we get
\[
\frac{1}{p}\Phi^{\top}M_{\hat{A}}\Phi(\hat{C}-C)\Big(I_{n}-F\left(F^{\top}F\right)^{-1}F^{\top}\Big)=-\frac{\alpha}{p}RC+\frac{1}{p}\Phi^{\top}M_{\hat{A}}E\Big(I_{n}-F\left(F^{\top}F\right)^{-1}F^{\top}\Big)+O_{p}\Big(\frac{1}{\min(n,p)}\Big)+O_{p}\Big(\frac{1}{\sqrt{np}\sqrt{p}}\Big)+O_{p}\Big(\frac{1}{\sqrt{np}}\Big).
\]
Let $Q(\hat{A})\equiv\frac{1}{p}\Phi^{\top}M_{\hat{A}}\Phi$ and $M_{F}\equiv I_{n}-F\left(F^{\top}F\right)^{-1}F^{\top}$. Left-multiplying both sides of the equation above by $Q^{-1}(\hat{A})$, we have
\begin{align*}
(\hat{C}-C)M_{F}&=-Q^{-1}(\hat{A})\frac{\alpha}{p}RC+Q^{-1}(\hat{A})\frac{1}{p}\Phi^{\top}M_{\hat{A}}EM_{F}+O_{p}\Big(\frac{1}{\min(n,p)}\Big)+O_{p}\Big(\frac{1}{\sqrt{np}\sqrt{p}}\Big)+O_{p}\Big(\frac{1}{\sqrt{np}}\Big)\\
&=-Q^{-1}(A)\frac{\alpha}{p}RC+Q^{-1}(A)\frac{1}{p}\Phi^{\top}M_{A}EM_{F}+O_{p}\Big(\frac{1}{\min(n,p)}\Big)+O_{p}\Big(\frac{1}{\sqrt{np}\sqrt{p}}\Big)+O_{p}\Big(\frac{1}{\sqrt{np}}\Big),
\end{align*}
where in the last equality we substitute $Q(\hat{A})$ with $Q(A)$ using Lemma 8, and substitute $\frac{1}{\sqrt{np}}\Phi^{\top}M_{\hat{A}}E$ with $\frac{1}{\sqrt{np}}\Phi^{\top}M_{A}E$ using Lemma 10. Note that the term $\sqrt{p}\,O_{p}\big(\frac{\|C-\hat{C}\|}{n}\big)$ in the result of Lemma 10 is dominated by $\frac{\sqrt{p}\,\|C-\hat{C}\|}{\sqrt{n}}$. Next, multiplying by the scale $\frac{\sqrt{p}}{\sqrt{n}}$,
\begin{align}
\frac{\sqrt{p}}{\sqrt{n}}(\hat{C}-C)M_{F}&=-\frac{\sqrt{p}}{\sqrt{n}}Q^{-1}(A)\frac{\alpha}{p}RC+Q^{-1}(A)\frac{1}{\sqrt{np}}\Phi^{\top}M_{A}EM_{F}+\frac{\sqrt{p}}{\sqrt{n}}O_{p}\Big(\frac{1}{\min(n,p)}\Big)+\frac{\sqrt{p}}{\sqrt{n}}O_{p}\Big(\frac{1}{\sqrt{np}\sqrt{p}}\Big)+\frac{\sqrt{p}}{\sqrt{n}}O_{p}\Big(\frac{1}{\sqrt{np}}\Big)\notag\\
&=O_{p}(1), \tag{41}
\end{align}
when $n$ and $p$ are of the same order, that is, $p/n\to\rho>0$.

Proof of Theorem 3
From (41), we have, when $p/n\to\rho>0$,
\[
\frac{\sqrt{p}}{\sqrt{n}}(\hat{C}-C)M_{F}=Q^{-1}(A)\frac{1}{\sqrt{np}}\Phi^{\top}M_{A}EM_{F}+o_{p}(1).
\]
Using Lemma 1 we have, for any vector $b=(b_{1},\ldots,b_{n})^{\top}$,
\[
\frac{1}{\sqrt{np}}\Phi^{\top}M_{A}EM_{F}b\xrightarrow{d}N(0,L), \tag{42}
\]
where $L$ is defined in (17). Multiplying (42) by the constant matrix $Q^{-1}(A)$, we have the result
\[
Q^{-1}(A)\frac{1}{\sqrt{np}}\Phi^{\top}M_{A}EM_{F}b\xrightarrow{d}N\big(0,\,Q^{-1}(A)LQ^{-1}(A)\big).
\]
The theorem is thus proved.

Appendix B
In this section, we provide the proposition used in Appendix A, along with its proof.
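Proposition 1 concerns the matrix $V_{np}$ and the loadings $\hat{A}$ obtained from an eigen-decomposition under the normalisation $\hat{A}^{\top}\hat{A}/p=I_{r}$. A small numerical sketch (a pure factor-plus-noise residual matrix, an illustrative simplification of $Y-\Phi\hat{C}$) shows how such an $\hat{A}$ and diagonal $V_{np}$ arise from the top-$r$ eigenvectors of the sample second-moment matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, r = 200, 100, 2

A = rng.normal(size=(p, r))
F = rng.normal(size=(n, r))
E = rng.normal(scale=0.1, size=(p, n))
Z = A @ F.T + E                          # stand-in for the residual Y - Phi C_hat

S = Z @ Z.T / (n * p)                    # p x p sample second-moment matrix
vals, vecs = np.linalg.eigh(S)           # eigenvalues in ascending order
idx = np.argsort(vals)[::-1][:r]         # indices of the top-r eigenvalues
A_hat = np.sqrt(p) * vecs[:, idx]        # normalisation A_hat^T A_hat / p = I_r
V_np = np.diag(vals[idx])                # r x r diagonal eigenvalue matrix

# The eigen-equation (1/np) Z Z^T A_hat = A_hat V_np holds by construction
print(np.allclose(S @ A_hat, A_hat @ V_np))          # True
print(np.allclose(A_hat.T @ A_hat / p, np.eye(r)))   # True
```

This is exactly the structure exploited in the proof: $\hat{A}V_{np}$ equals the second-moment matrix applied to $\hat{A}$, which is then expanded term by term.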
Proposition 1.
Under Assumptions 1 to 4, we have the following statements:

(i) The matrix $V_{np}$ defined in (9) is invertible and $V_{np}\xrightarrow{p}V$, where the $r\times r$ matrix $V$ is a diagonal matrix consisting of the eigenvalues of $\Sigma_{F}\Sigma_{A}$;

(ii) Let
\[
H=\Big(\frac{F^{\top}F}{n}\Big)\Big(\frac{A^{\top}\hat{A}}{p}\Big)V_{np}^{-1}; \tag{43}
\]
then $H$ is an $r\times r$ invertible matrix and
\[
\frac{1}{p}\|\hat{A}-AH\|^{2}=O_{p}\Big(\frac{1}{n}\|C-\hat{C}\|^{2}\Big)+O_{p}\Big(\frac{1}{\min(n,p)}\Big).
\]

Proof.
Writing the second equation in (10) in matrix form, we have
\[
\frac{1}{np}(Y-\Phi\hat{C})(Y-\Phi\hat{C})^{\top}\hat{A}=\hat{A}V_{np}. \tag{44}
\]
By (2), we also have $Y-\Phi\hat{C}=\Phi(C-\hat{C})+AF^{\top}+E$. Plugging this into (44) and expanding the terms, we obtain
\begin{align}
\hat{A}V_{np}&=\frac{1}{np}\left[\Phi(C-\hat{C})+AF^{\top}+E\right]\left[\Phi(C-\hat{C})+AF^{\top}+E\right]^{\top}\hat{A}\notag\\
&=\frac{1}{np}\Phi(C-\hat{C})(C-\hat{C})^{\top}\Phi^{\top}\hat{A}+\frac{1}{np}\Phi(C-\hat{C})FA^{\top}\hat{A}+\frac{1}{np}\Phi(C-\hat{C})E^{\top}\hat{A}+\frac{1}{np}AF^{\top}(C-\hat{C})^{\top}\Phi^{\top}\hat{A}\notag\\
&\quad+\frac{1}{np}E(C-\hat{C})^{\top}\Phi^{\top}\hat{A}+\frac{1}{np}AF^{\top}E^{\top}\hat{A}+\frac{1}{np}EFA^{\top}\hat{A}+\frac{1}{np}EE^{\top}\hat{A}+\frac{1}{np}AF^{\top}FA^{\top}\hat{A}\notag\\
&\equiv I_{1}+\cdots+I_{9}. \tag{45}
\end{align}
The above can be rewritten as
\[
\hat{A}V_{np}-A\big(F^{\top}F/n\big)\big(A^{\top}\hat{A}/p\big)=I_{1}+\cdots+I_{8}. \tag{46}
\]
Right-multiplying each side by $(A^{\top}\hat{A}/p)^{-1}(F^{\top}F/n)^{-1}$, we obtain
\[
\hat{A}\left[V_{np}\big(A^{\top}\hat{A}/p\big)^{-1}\big(F^{\top}F/n\big)^{-1}\right]-A=(I_{1}+\cdots+I_{8})\big(A^{\top}\hat{A}/p\big)^{-1}\big(F^{\top}F/n\big)^{-1}. \tag{47}
\]
Note that the matrix in the square brackets is $H^{-1}$, but the invertibility of $V_{np}$ has not been proved yet. We can write
\[
\frac{1}{\sqrt{p}}\left\|\hat{A}\left[V_{np}\big(A^{\top}\hat{A}/p\big)^{-1}\big(F^{\top}F/n\big)^{-1}\right]-A\right\|\leq\frac{1}{\sqrt{p}}\left(\|I_{1}\|+\cdots+\|I_{8}\|\right)\|G\|, \tag{48}
\]
where $G$ is defined in (31) and $\|G\|$ is proved to be $O_{p}(1)$ in Lemma 5. In the following, we find the order of each term on the right-hand side of (48). We repeatedly use results from Lemma 2, where the orders of the matrices $\Phi$, $A$ and $F$ are given. For the first term,
\[
\frac{1}{\sqrt{p}}\|I_{1}\|\leq\frac{1}{\sqrt{p}\,np}\|\Phi\|\,\|(C-\hat{C})(C-\hat{C})^{\top}\|\,\|\Phi^{\top}\|\,\|\hat{A}\|=O_{p}\Big(\frac{1}{n}\|C-\hat{C}\|^{2}\Big)=o_{p}\Big(\frac{1}{\sqrt{n}}\|C-\hat{C}\|\Big).
\]
For the second term,
\[
\frac{1}{\sqrt{p}}\|I_{2}\|\leq\frac{1}{\sqrt{p}\,np}\|\Phi\|\,\|C-\hat{C}\|\,\|F\|\,\|A^{\top}\|\,\|\hat{A}\|=O_{p}\Big(\frac{1}{\sqrt{n}}\|C-\hat{C}\|\Big).
\]
The terms $I_{3}$, $I_{4}$ and $I_{5}$ are all $O_{p}\big(\frac{1}{\sqrt{n}}\|C-\hat{C}\|\big)$; the proofs are similar to that for $I_{2}$. For $I_{6}$,
\[
\frac{1}{\sqrt{p}}\|I_{6}\|\leq\frac{1}{\sqrt{p}\,np}\|A\|\,\|F^{\top}E^{\top}\|\,\|\hat{A}\|=O_{p}\Big(\frac{1}{\sqrt{n}}\Big),
\]
by Lemma 3$(i)$. Similarly, for the next term,
\[
\frac{1}{\sqrt{p}}\|I_{7}\|\leq\frac{1}{\sqrt{p}\,np}\|EF\|\,\|A^{\top}\|\,\|\hat{A}\|=O_{p}\Big(\frac{1}{\sqrt{n}}\Big).
\]
For the last term,
\[
\frac{1}{\sqrt{p}}\|I_{8}\|\leq\frac{1}{\sqrt{p}\,np}\|EE^{\top}\|\,\|\hat{A}\|=O_{p}\Big(\frac{1}{\sqrt{n}}\Big)+O_{p}\Big(\frac{1}{\sqrt{p}}\Big),
\]
where Lemma 3$(iv)$ is used. Putting all the above together, we have
\[
\frac{1}{\sqrt{p}}\left\|\hat{A}\left[V_{np}\big(A^{\top}\hat{A}/p\big)^{-1}\big(F^{\top}F/n\big)^{-1}\right]-A\right\|=O_{p}\Big(\frac{1}{\sqrt{n}}\|C-\hat{C}\|\Big)+O_{p}\bigg(\frac{1}{\min(\sqrt{n},\sqrt{p})}\bigg). \tag{49}
\]
To show $(i)$, left-multiply (46) by $\frac{1}{p}\hat{A}^{\top}$. Using $\hat{A}^{\top}\hat{A}/p=I_{r}$, we have
\[
V_{np}-\big(\hat{A}^{\top}A/p\big)\big(F^{\top}F/n\big)\big(A^{\top}\hat{A}/p\big)=\frac{1}{p}\hat{A}^{\top}(I_{1}+\cdots+I_{8})=o_{p}(1),
\]
where the last equality uses Lemma 2$(v)$ and that $p^{-1/2}(\|I_{1}\|+\cdots+\|I_{8}\|)=o_{p}(1)$ from (49). Thus,
\[
V_{np}=\big(\hat{A}^{\top}A/p\big)\big(F^{\top}F/n\big)\big(A^{\top}\hat{A}/p\big)+o_{p}(1).
\]
We have shown in (27) that $A^{\top}\hat{A}/p$ is invertible; thus $V_{np}$ is invertible. To obtain the limit of $V_{np}$, left-multiply (46) by $\frac{1}{p}A^{\top}$ to yield
\[
\big(A^{\top}\hat{A}/p\big)V_{np}-\big(A^{\top}A/p\big)\big(F^{\top}F/n\big)\big(A^{\top}\hat{A}/p\big)=o_{p}(1),
\]
or
\[
\big(A^{\top}A/p\big)\big(F^{\top}F/n\big)\big(A^{\top}\hat{A}/p\big)+o_{p}(1)=\big(A^{\top}\hat{A}/p\big)V_{np}, \tag{50}
\]
because $p^{-1}\|A^{\top}(I_{1}+\cdots+I_{8})\|=o_{p}(1)$. Equation (50) shows that, in the limit, the columns of $A^{\top}\hat{A}/p$ are the eigenvectors of the matrix $(A^{\top}A/p)(F^{\top}F/n)$, and that $V_{np}$ consists of the eigenvalues of the same matrix. Thus $V_{np}\xrightarrow{p}V$, where the $r\times r$ matrix $V$ is a diagonal matrix consisting of the eigenvalues of $\Sigma_{F}\Sigma_{A}$.

For $(ii)$, since $V_{np}$ is invertible, $H$ is also invertible, and we can write (49) as
\[
\frac{1}{\sqrt{p}}\|\hat{A}H^{-1}-A\|=O_{p}\Big(\frac{1}{\sqrt{n}}\|C-\hat{C}\|\Big)+O_{p}\bigg(\frac{1}{\min(\sqrt{n},\sqrt{p})}\bigg).
\]
Right-multiplying by the matrix $H$, we obtain $(ii)$.

Appendix C
In this section, we state all the lemmas used in the previous theorems and propositions, along with their proofs. Lemma 1 is stated in Section 4.2; we provide its proof here.
Proof.
For any vector $b=(b_{1},\ldots,b_{n})^{\top}$,
\[
\frac{1}{\sqrt{np}}\Phi^{\top}M_{A}Eb=\frac{1}{\sqrt{np}}\sum_{i=1}^{n}\sum_{j=1}^{p}\omega_{j}\widetilde{\varepsilon}_{ji}b_{i}\equiv\frac{1}{\sqrt{np}}\sum_{i=1}^{n}\sum_{j=1}^{p}x_{ij},
\]
where $\omega_{j}$ is the $j$th column of the matrix $\Phi^{\top}M_{A}$. Since we assume the $\widetilde{\varepsilon}_{ji}$ are i.i.d., the variance of the above quantity is given by
\[
\mathrm{var}\Big(\frac{1}{\sqrt{np}}\Phi^{\top}M_{A}Eb\Big)=\mathrm{var}\Big(\frac{1}{\sqrt{np}}\sum_{i=1}^{n}\sum_{j=1}^{p}x_{ij}\Big)=\frac{1}{np}\sum_{j=1}^{p}\sum_{i=1}^{n}b_{i}^{2}\,\sigma^{2}\,\omega_{j}\omega_{j}^{\top}.
\]
The Lindeberg condition is assumed to hold in Assumption 7. Thus we have a central limit theorem result:
\[
\frac{1}{\sqrt{np}}\Phi^{\top}M_{A}Eb=\frac{1}{\sqrt{np}}\sum_{i=1}^{n}\sum_{j=1}^{p}x_{ij}\xrightarrow{d}N(0,L),
\]
where $L$ is defined in (17).

Lemma 2.
Under Assumptions 1-3, we have

(i) $\frac{1}{\sqrt{p}}\|\Phi\|=O_{p}(1)$;
(ii) $\frac{1}{\sqrt{p}}\|A\|=O_{p}(1)$;
(iii) $\frac{1}{\sqrt{n}}\|F\|=O_{p}(1)$;
(iv) $\frac{1}{\sqrt{np}}\|E\|=O_{p}(1)$;
(v) $\frac{1}{\sqrt{p}}\|\hat{A}\|=O_{p}(1)$.

Proof. In Assumption 1, we assume the basis functions $\phi_{k}(u)$, $k=1,\ldots,K$, are bounded. The $p\times K$ basis matrix $\Phi$ contains discrete evaluations of the basis functions, so each element is $O(1)$ and $\|\Phi\|$ is of order $\sqrt{p}$. Similarly, using Assumption 3, we have results $(ii)$ and $(iii)$. Using Assumption 4, we have result $(iv)$. Lastly, $(v)$ follows directly from the restriction $\hat{A}^{\top}\hat{A}/p=I_{r}$.

Lemma 3. Under Assumptions 1 to 5, we have

(i) $\frac{1}{np}\|EF\|^{2}=O_{p}(1)$;
(ii) $\frac{1}{np}\|E^{\top}\Phi\|^{2}=O_{p}(1)$ and $\frac{1}{np}\|E^{\top}A\|^{2}=O_{p}(1)$;
(iii) $\frac{1}{np}\|F^{\top}E^{\top}A\|^{2}=O_{p}(1)$ and $\frac{1}{np}\|F^{\top}E^{\top}\Phi\|^{2}=O_{p}(1)$;
(iv) $\|E^{\top}E\|^{2}=O_{p}(n^{2}p)+O_{p}(np^{2})$; $\|EE^{\top}\|^{2}=O_{p}(n^{2}p)+O_{p}(np^{2})$; $\|F^{\top}E^{\top}E\|^{2}=O_{p}(n^{2}p)+O_{p}(np^{2})$; $\|\Phi^{\top}EE^{\top}\|^{2}=O_{p}(n^{2}p)+O_{p}(np^{2})$; $\|\Phi^{\top}EE^{\top}A\|^{2}=O_{p}(n^{2}p)+O_{p}(np^{2})$; $\|F^{\top}E^{\top}EF\|^{2}=O_{p}(n^{2}p)+O_{p}(np^{2})$.

Proof. For $(i)$,
\[
\mathbb{E}\Big(\frac{1}{np}\|EF\|^{2}\Big)=\mathbb{E}\Big(\frac{1}{np}\sum_{k=1}^{p}\sum_{i=1}^{n}\sum_{j=1}^{n}\widetilde{\varepsilon}_{ki}\widetilde{\varepsilon}_{kj}f_{i}^{\top}f_{j}\Big)=\frac{1}{np}\sum_{k=1}^{p}\sum_{i=1}^{n}\sum_{j=1}^{n}\mathbb{E}(\widetilde{\varepsilon}_{ki}\widetilde{\varepsilon}_{kj})\,\mathbb{E}(f_{i}^{\top}f_{j})=O(1),
\]
where the second equality uses the independence between $\widetilde{\varepsilon}_{ki}$ and $f_{j}$ assumed in Assumption 5. The proofs of $(ii)$ and $(iii)$ are similar to $(i)$. For $(iv)$,
\begin{align*}
\mathbb{E}\big(\|E^{\top}E\|^{2}\big)&=\mathbb{E}\Big(\sum_{i,j}^{n}\sum_{k,l}^{p}\widetilde{\varepsilon}_{kj}\widetilde{\varepsilon}_{lj}\widetilde{\varepsilon}_{ki}\widetilde{\varepsilon}_{li}\Big)\\
&=\sum_{i\neq j}^{n}\sum_{k=l}^{p}\mathbb{E}(\widetilde{\varepsilon}_{kj}^{2})\,\mathbb{E}(\widetilde{\varepsilon}_{ki}^{2})+\sum_{i=j}^{n}\sum_{k\neq l}^{p}\mathbb{E}(\widetilde{\varepsilon}_{kj}^{2})\,\mathbb{E}(\widetilde{\varepsilon}_{lj}^{2})+\sum_{i=j}^{n}\sum_{k=l}^{p}\mathbb{E}(\widetilde{\varepsilon}_{kj}^{4})\\
&=O(n^{2}p)+O(np^{2})+O(np)=O(n^{2}p)+O(np^{2}),
\end{align*}
where Assumption 4 is used. The proof for $\|EE^{\top}\|^{2}$ is the same.

The orders of $\|F^{\top}E^{\top}E\|^{2}$ and $\|F^{\top}E^{\top}EF\|^{2}$ are the same since
\[
\mathbb{E}\big(\|F^{\top}E^{\top}E\|^{2}\big)=\mathbb{E}\Big(\sum_{i,j}^{n}\sum_{k,l}^{p}\widetilde{\varepsilon}_{kj}\widetilde{\varepsilon}_{lj}\widetilde{\varepsilon}_{ki}\widetilde{\varepsilon}_{li}\,\|f_{i}\|^{2}\Big),
\]
and
\[
\mathbb{E}\big(\|F^{\top}E^{\top}EF\|^{2}\big)=\mathbb{E}\Big(\sum_{i,j}^{n}\sum_{k,l}^{p}\widetilde{\varepsilon}_{kj}\widetilde{\varepsilon}_{lj}\widetilde{\varepsilon}_{ki}\widetilde{\varepsilon}_{li}\,\|f_{i}\|^{2}\|f_{j}\|^{2}\Big),
\]
where the order of $\|f_{i}\|$ is assumed to be $O_{p}(1)$ in Assumption 3.

Lemma 4.
Under Assumptions 1-6,

(i) $\frac{1}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}M_{A}A^{0}f_{i}=o_{p}(1)$;
(ii) $\frac{1}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}M_{A}\Phi c_{i}=o_{p}(1)$;
(iii) $\frac{1}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}(M_{A}-M_{A^{0}})\varepsilon_{i}=o_{p}(1)$;
(iv) $\frac{\alpha}{np}\sum_{i=1}^{n}c_{i}^{\top}Rc_{i}=o_{p}(1)$.

Proof. We prove $(ii)$. First, we have
\[
\mathbb{E}\Big\|\sum_{i=1}^{n}\varepsilon_{i}\Big\|^{2}=\mathbb{E}\Big(\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{p}\widetilde{\varepsilon}_{ik}\widetilde{\varepsilon}_{jk}\Big)=\sum_{i=j}^{n}\sum_{k=1}^{p}\mathbb{E}\big(\widetilde{\varepsilon}_{ik}^{2}\big)=O(np).
\]
Since $M_{A}=I_{p}-AA^{\top}/p$, we have
\[
\frac{1}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}M_{A}\Phi c_{i}=\frac{1}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}\Phi c_{i}-\frac{1}{np}\sum_{i=1}^{n}\varepsilon_{i}^{\top}\frac{AA^{\top}}{p}\Phi c_{i}. \tag{51}
\]
The first term on the right of (51) is $o_{p}(1)$ since
\begin{align*}
\mathbb{E}\Big\|\sum_{i=1}^{n}\varepsilon_{i}^{\top}\Phi c_{i}\Big\|^{2}&=\mathbb{E}\Big\|\sum_{j=1}^{p}\sum_{i=1}^{n}\widetilde{\varepsilon}_{ji}\phi_{j}^{\top}c_{i}\Big\|^{2}=\mathbb{E}\Big(\sum_{t=1}^{p}\sum_{s=1}^{n}\sum_{j=1}^{p}\sum_{i=1}^{n}\widetilde{\varepsilon}_{ji}\widetilde{\varepsilon}_{ts}\,c_{i}^{\top}\phi_{j}\phi_{t}^{\top}c_{s}\Big)\\
&=\sum_{t=1}^{p}\sum_{s=1}^{n}\sum_{j=1}^{p}\sum_{i=1}^{n}\mathbb{E}(\widetilde{\varepsilon}_{ji}\widetilde{\varepsilon}_{ts})\,\mathbb{E}(c_{i}^{\top}\phi_{j}\phi_{t}^{\top}c_{s})=\sum_{j=1}^{p}\sum_{i=1}^{n}\sigma^{2}\,\mathbb{E}(c_{i}^{\top}\phi_{j}\phi_{j}^{\top}c_{i})=O(np),
\end{align*}
where the third equality uses Assumption 5, and the fourth equality uses the assumption that the $\widetilde{\varepsilon}_{ji}$ are independent in both directions. The second term on the right-hand side of (51) is also $o_{p}(1)$ since
\begin{align*}
\mathbb{E}\Big\|\sum_{i=1}^{n}\varepsilon_{i}^{\top}\frac{AA^{\top}}{p}\Phi c_{i}\Big\|^{2}&=\mathbb{E}\Big\|\frac{1}{p}\sum_{j=1}^{p}\sum_{i=1}^{n}\widetilde{\varepsilon}_{ji}a_{j}^{\top}A^{\top}\Phi c_{i}\Big\|^{2}=\frac{1}{p^{2}}\,\mathbb{E}\Big(\sum_{t=1}^{p}\sum_{s=1}^{n}\sum_{j=1}^{p}\sum_{i=1}^{n}\widetilde{\varepsilon}_{ji}\widetilde{\varepsilon}_{ts}\,c_{i}^{\top}\Phi^{\top}Aa_{j}a_{t}^{\top}A^{\top}\Phi c_{s}\Big)\\
&=\frac{1}{p^{2}}\sum_{t=1}^{p}\sum_{s=1}^{n}\sum_{j=1}^{p}\sum_{i=1}^{n}\mathbb{E}(\widetilde{\varepsilon}_{ji}\widetilde{\varepsilon}_{ts})\,\mathbb{E}(c_{i}^{\top}\Phi^{\top}Aa_{j}a_{t}^{\top}A^{\top}\Phi c_{s})=\frac{1}{p^{2}}\sum_{j=1}^{p}\sum_{i=1}^{n}\sigma^{2}\,c_{i}^{\top}\mathbb{E}\big(\Phi^{\top}Aa_{j}a_{j}^{\top}A^{\top}\Phi\big)c_{i}=O(np),
\end{align*}
where $a_{j}$ denotes the $j$th row of $A$, the third equality uses the independence in Assumption 5, and the last equality uses the results in Lemma 2, where $\|\Phi\|$ and $\|A\|$ are both $O_{p}(\sqrt{p})$. The proofs for $(i)$ and $(iii)$ are similar, and $(iv)$ is a direct result of Assumption 6.

Lemma 5.
Under Assumptions 1-5, we have
\[
G\equiv\big(A^{\top}\hat{A}/p\big)^{-1}\big(F^{\top}F/n\big)^{-1}=O_{p}(1).
\]

Proof. The matrix $F^{\top}F/n$ is positive definite by Assumption 3. We have shown in (27), in the proof of Theorem 1, that the matrix $A^{\top}\hat{A}/p$ is invertible, and thus also positive definite. Therefore, $\lambda_{\min}\big(A^{\top}\hat{A}/p\big)>0$ and $\lambda_{\min}\big(F^{\top}F/n\big)>0$, where $\lambda_{\min}(\cdot)$ denotes the smallest eigenvalue of a matrix. So we have
\[
\big(A^{\top}\hat{A}/p\big)^{-1}=O_{p}(1),\qquad\big(F^{\top}F/n\big)^{-1}=O_{p}(1).
\]

Lemma 6.
We have the following:

(i) $\|E^{\top}(\hat{A}-AH)\|=O_{p}\Big(\dfrac{p}{\min(\sqrt{n},\sqrt{p})}\|C-\hat{C}\|\Big)+O_{p}(\sqrt{n})+O_{p}\Big(\dfrac{p}{\sqrt{n}}\Big)$;

(ii) $\|F^{\top}E^{\top}(\hat{A}-AH)\|=O_{p}\Big(\dfrac{p}{\min(\sqrt{n},\sqrt{p})}\|C-\hat{C}\|\Big)+O_{p}(\sqrt{n})+O_{p}\Big(\dfrac{p}{\sqrt{n}}\Big)$.

Proof.
For ( i ) , from Proposition 1, we can write (cid:107) E (cid:62) ( (cid:98) A − A H ) (cid:107) = (cid:107) E (cid:62) ( I + . . . , I ) G (cid:107)≤ (cid:107) E (cid:62) I G (cid:107) + · · · + (cid:107) E (cid:62) I G (cid:107) = (cid:107) a (cid:107) + · · · + (cid:107) a (cid:107) .To find the order for each term, the results from Lemma 2 are repeatedly used where the orderof the matrices Φ , A , (cid:98) A and F are given. (cid:107) a (cid:107) = (cid:13)(cid:13)(cid:13)(cid:13) E (cid:62) np Φ ( C − (cid:98) C )( C − (cid:98) C ) (cid:62) Φ (cid:62) (cid:98) AG (cid:13)(cid:13)(cid:13)(cid:13) ≤ np (cid:107) E (cid:62) Φ (cid:107) (cid:13)(cid:13)(cid:13) C − (cid:98) C (cid:13)(cid:13)(cid:13) (cid:107) Φ (cid:107)(cid:107) (cid:98) A (cid:107)(cid:107) G (cid:107) = O p (cid:18) √ p √ n (cid:13)(cid:13)(cid:13) C − (cid:98) C (cid:13)(cid:13)(cid:13) (cid:19) = o p (cid:16) √ p (cid:13)(cid:13)(cid:13) C − (cid:98) C (cid:13)(cid:13)(cid:13)(cid:17) ,where the order of (cid:107) E (cid:62) Φ (cid:107) is from Lemma 3 ( ii ) . The orders of (cid:107) Φ (cid:107) , (cid:107) (cid:98) A (cid:107) and (cid:107) G (cid:107) can be foundfrom Lemmas 2 and 5. 
Similarly, (cid:107) a (cid:107) = (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) E (cid:62) n Φ ( C − (cid:98) C ) F (cid:32) F (cid:62) F n (cid:33) − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ≤ n (cid:107) E (cid:62) Φ (cid:107) (cid:13)(cid:13)(cid:13) C − (cid:98) C (cid:13)(cid:13)(cid:13) (cid:107) F (cid:107) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:32) F (cid:62) F n (cid:33) − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) = O p (cid:16) √ p (cid:13)(cid:13)(cid:13) C − (cid:98) C (cid:13)(cid:13)(cid:13)(cid:17) .17 a (cid:107) = (cid:13)(cid:13)(cid:13)(cid:13) E (cid:62) np Φ ( C − (cid:98) C ) E (cid:62) (cid:98) AG (cid:13)(cid:13)(cid:13)(cid:13) ≤ np (cid:107) E (cid:62) Φ (cid:107) (cid:13)(cid:13)(cid:13) C − (cid:98) C (cid:13)(cid:13)(cid:13) (cid:107) E (cid:62) (cid:107)(cid:107) (cid:98) A (cid:107)(cid:107) G (cid:107) = O p (cid:16) √ p (cid:13)(cid:13)(cid:13) C − (cid:98) C (cid:13)(cid:13)(cid:13)(cid:17) . (cid:107) a (cid:107) = (cid:13)(cid:13)(cid:13)(cid:13) E (cid:62) np A F (cid:62) ( C − (cid:98) C ) (cid:62) Φ (cid:62) (cid:98) AG (cid:13)(cid:13)(cid:13)(cid:13) ≤ np (cid:107) E (cid:62) A (cid:107)(cid:107) F (cid:62) (cid:107) (cid:13)(cid:13)(cid:13) C − (cid:98) C (cid:13)(cid:13)(cid:13) (cid:107) Φ (cid:107)(cid:107) (cid:98) A (cid:107)(cid:107) G (cid:107) = O p (cid:16) √ p (cid:13)(cid:13)(cid:13) C − (cid:98) C (cid:13)(cid:13)(cid:13)(cid:17) ,where Lemma 3 ( ii ) is used. (cid:107) a (cid:107) = (cid:107) E (cid:62) np E ( C − (cid:98) C ) (cid:62) Φ (cid:62) (cid:98) AG (cid:107)≤ np (cid:107) E (cid:62) E (cid:107) (cid:13)(cid:13)(cid:13) C − (cid:98) C (cid:13)(cid:13)(cid:13) (cid:107) Φ (cid:107)(cid:107) (cid:98) A (cid:107)(cid:107) G (cid:107) = O p (cid:16) √ p (cid:13)(cid:13)(cid:13) C − (cid:98) C (cid:13)(cid:13)(cid:13)(cid:17) + O p (cid:18) p √ n (cid:13)(cid:13)(cid:13) C − (cid:98) C (cid:13)(cid:13)(cid:13)(cid:19) ,where Lemma 3 ( iv ) is used. 
\[
\begin{aligned}
\|a_6\| &= \left\|\frac{E^\top}{np}\,AF^\top E^\top\hat{A}G\right\|
\le \frac{1}{np}\|E^\top AF^\top E^\top(\hat{A} - AH)G\| + \frac{1}{np}\|E^\top AF^\top E^\top AHG\| \\
&\le \frac{1}{np}\|E^\top A\|\,\|F^\top E^\top\|\,\|\hat{A} - AH\|\,\|G\| + \frac{1}{np}\|E^\top A\|\,\|F^\top E^\top A\|\,\|HG\| \\
&= O_p\!\left(\frac{\sqrt{p}\,\|C - \hat{C}\|}{\sqrt{n}}\right) + O_p\!\left(\frac{\sqrt{p}}{\min(\sqrt{n},\sqrt{p})}\right) + O_p(1)
= O_p\!\left(\frac{\sqrt{p}\,\|C - \hat{C}\|}{\sqrt{n}}\right) + O_p\!\left(\frac{\sqrt{p}}{\min(\sqrt{n},\sqrt{p})}\right),
\end{aligned}
\]
where the order of $\hat{A} - AH$ is proved in Proposition 1, and the orders of the other matrix norms can be found in Lemma 3(i), (ii) and (iii).
\[
\|a_7\| = \left\|\frac{E^\top}{np}\,EFA^\top\hat{A}G\right\|
= \left\|\frac{1}{n}E^\top EF\left(\frac{F^\top F}{n}\right)^{-1}\right\|
\le \frac{1}{n}\|E^\top EF\|\,\left\|\left(\frac{F^\top F}{n}\right)^{-1}\right\|
= O_p(\sqrt{p}) + O_p\!\left(\frac{p}{\sqrt{n}}\right),
\]
where Lemma 3(iv) is used.
\[
\begin{aligned}
\|a_8\| &= \frac{1}{np}\left\|E^\top EE^\top\hat{A}G\right\|
\le \frac{1}{np}\left\|E^\top EE^\top AHG\right\| + \frac{1}{np}\left\|E^\top EE^\top(\hat{A} - AH)G\right\| \\
&\le \frac{1}{np}\|E^\top E\|\,\|E^\top A\|\,\|H\|\,\|G\| + \frac{1}{np}\|E^\top E\|\,\|E^\top\|\,\|\hat{A} - AH\|\,\|G\| \\
&= \frac{1}{np}\big[O_p(n\sqrt{p}) + O_p(p\sqrt{n})\big]\,O_p(\sqrt{np})
+ \frac{1}{np}\big[O_p(n\sqrt{p}) + O_p(p\sqrt{n})\big]\,O_p(\sqrt{np})\left[O_p\!\left(\frac{\sqrt{p}\,\|C - \hat{C}\|}{\sqrt{n}}\right) + O_p\!\left(\frac{\sqrt{p}}{\min(\sqrt{n},\sqrt{p})}\right)\right] \\
&= O_p(\sqrt{n}) + O_p(\sqrt{p}) + O_p\!\left(\frac{p}{\sqrt{n}}\,\|C - \hat{C}\|\right) + O_p\!\left(\sqrt{p}\,\|C - \hat{C}\|\right) + O_p(\sqrt{n}) + O_p\!\left(\frac{p}{\sqrt{n}}\right) \\
&= O_p\!\left(\frac{p}{\sqrt{n}}\,\|C - \hat{C}\|\right) + O_p\!\left(\sqrt{p}\,\|C - \hat{C}\|\right) + O_p(\sqrt{n}) + O_p\!\left(\frac{p}{\sqrt{n}}\right),
\end{aligned}
\]
where the order of $\hat{A} - AH$ is proved in Proposition 1, and the orders of the other matrix norms can be found in Lemma 3(i), (ii) and (iii).

Combining all the terms, we have
\[
\|E^\top(\hat{A} - AH)\| = O_p\!\left(\frac{p}{\min(\sqrt{n},\sqrt{p})}\,\|C - \hat{C}\|\right) + O_p(\sqrt{n}) + O_p\!\left(\frac{p}{\sqrt{n}}\right).
\]
For (ii), multiplying by the matrix $F^\top$ in front does not change the order, using the fact that $\|F^\top E^\top\Phi\|$ is of the same order as $\|E^\top\Phi\|$, and that $\|F^\top E^\top E\|$ and $\|F^\top E^\top EF\|$ are of the same order as $\|E^\top E\|$, as proved in Lemma 3.

Lemma 7.
Under Assumptions 1-5, we have the following:
(i) $\frac{1}{p}\Phi^\top(\hat{A} - AH) = O_p\!\left(\frac{\|C - \hat{C}\|}{\sqrt{n}}\right) + O_p\!\left(\frac{1}{\min(n,p)}\right)$;
(ii) $\frac{1}{p}A^\top(\hat{A} - AH) = O_p\!\left(\frac{\|C - \hat{C}\|}{\sqrt{n}}\right) + O_p\!\left(\frac{1}{\min(n,p)}\right)$;
(iii) $\frac{1}{p}\hat{A}^\top(\hat{A} - AH) = O_p\!\left(\frac{\|C - \hat{C}\|}{\sqrt{n}}\right) + O_p\!\left(\frac{1}{\min(n,p)}\right)$;
(iv) $\frac{1}{p}\Phi^\top M_{\hat{A}}(\hat{A} - AH) = O_p\!\left(\frac{\|C - \hat{C}\|}{\sqrt{n}}\right) + O_p\!\left(\frac{1}{\min(n,p)}\right)$.

Proof.
For (i), using (47),
\[
\Phi^\top(\hat{A} - AH) = \Phi^\top(I_1 + I_2 + \cdots + I_8)G. \tag{52}
\]
It can easily be proved that the first five terms in (52) are $O_p\!\left(\frac{p}{\sqrt{n}}\,\|C - \hat{C}\|\right)$ using the results from Lemmas 2 and 3. Recall that $G = (A^\top\hat{A}/p)^{-1}(F^\top F/n)^{-1}$, and that $G = O_p(1)$ from Lemma 5. For the sixth term,
\[
\begin{aligned}
\Phi^\top I_6 G &= \frac{1}{np}\Phi^\top AF^\top E^\top\hat{A}G
= \frac{1}{np}\Phi^\top AF^\top E^\top(\hat{A} - AH)G + \frac{1}{np}\Phi^\top AF^\top E^\top AHG \\
&\le \frac{1}{np}\|\Phi^\top\|\,\|A\|\,\|F^\top E^\top(\hat{A} - AH)\|\,\|G\| + \frac{1}{np}\|\Phi^\top\|\,\|A\|\,\|F^\top E^\top A\|\,\|H\|\,\|G\| \\
&= O_p\!\left(\frac{\sqrt{p}}{\sqrt{n}}\,\|C - \hat{C}\|\right) + O_p\!\left(\frac{\sqrt{p}}{\min(\sqrt{n},\sqrt{p})}\right) + O_p\!\left(\frac{\sqrt{p}}{\sqrt{n}}\right),
\end{aligned}
\]
using the results from Lemma 6. Next,
\[
\Phi^\top I_7 G = \frac{1}{np}\Phi^\top EFA^\top\hat{A}G
= \frac{1}{n}\Phi^\top EF\left(\frac{F^\top F}{n}\right)^{-1}
= O_p\!\left(\frac{\sqrt{p}}{\sqrt{n}}\right),
\]
where the order of $\Phi^\top EF$ is found in Lemma 3(iii). Finally,
\[
\begin{aligned}
\Phi^\top I_8 G &= \frac{1}{np}\Phi^\top EE^\top\hat{A}G
= \frac{1}{np}\Phi^\top EE^\top(\hat{A} - AH)G + \frac{1}{np}\Phi^\top EE^\top AHG \\
&\le \frac{1}{np}\|\Phi^\top E\|\,\|E^\top(\hat{A} - AH)\|\,\|G\| + \frac{1}{np}\|\Phi^\top E\|\,\|E^\top A\|\,\|H\|\,\|G\| \\
&= O_p\!\left(\frac{\|C - \hat{C}\|}{\sqrt{n}}\right) + O_p\!\left(\frac{1}{\min(\sqrt{n},\sqrt{p})}\right) + O_p(1) = O_p(1),
\end{aligned}
\]
where Lemma 3(ii) and Lemma 6(i) are used. Combining the terms, we have proved (i).
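Lemma 7 quantifies how quickly the estimated loadings approach a rotation of the truth. The qualitative behavior can be observed in a toy simulation (a hedged sketch only: the data-generating process below is a plain factor model with i.i.d. noise, and the alignment matrix is computed by least squares as a surrogate for the specific $H$ of Proposition 1):

```python
import numpy as np

rng = np.random.default_rng(42)

def loading_error(n, p, r=2):
    """Per-sqrt(p) distance between PCA loadings and a rotation of the truth."""
    F = rng.standard_normal((n, r))   # factors (hypothetical simulation)
    A = rng.standard_normal((p, r))   # true loadings
    E = rng.standard_normal((n, p))   # idiosyncratic noise
    X = F @ A.T + E
    # Principal-component loadings under the normalization A_hat' A_hat / p = I_r
    _, vecs = np.linalg.eigh(X.T @ X)
    A_hat = np.sqrt(p) * vecs[:, -r:]
    # Least-squares alignment of A with A_hat (stand-in for the rotation H)
    H = np.linalg.lstsq(A, A_hat, rcond=None)[0]
    return np.linalg.norm(A_hat - A @ H) / np.sqrt(p)

err_small = loading_error(100, 100)
err_large = loading_error(1600, 1600)  # larger n and p: error should shrink
```

Consistent with a $1/\min(\sqrt{n},\sqrt{p})$-type rate, the aligned loading error shrinks as both dimensions grow.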
The proof for (ii) is the same. For (iii), we can write
\[
\hat{A}^\top(\hat{A} - AH) = (\hat{A} - AH)^\top(\hat{A} - AH) + (AH)^\top(\hat{A} - AH).
\]
The order of the first term on the right can be found in Proposition 1. The order of the second term on the right is proved in (ii). For (iv), we have
\[
\Phi^\top M_{\hat{A}}(\hat{A} - AH) = \Phi^\top(\hat{A} - AH) - \frac{1}{p}\Phi^\top\hat{A}\hat{A}^\top(\hat{A} - AH),
\]
where the orders of the two terms are proved in (i) and (iii).

Lemma 8.
Define the matrix $Q(A) = \frac{1}{p}\Phi^\top M_A\Phi$. Under Assumptions 1-4, it holds that
\[
Q(\hat{A})^{-1} - Q(A)^{-1} = o_p(1).
\]
Proof.
\[
Q(\hat{A}) - Q(A) = \frac{1}{p}\Phi^\top M_{\hat{A}}\Phi - \frac{1}{p}\Phi^\top M_A\Phi
= \frac{1}{p}\Phi^\top\left(M_{\hat{A}} - M_A\right)\Phi
= \frac{1}{p}\Phi^\top\left(P_A - P_{\hat{A}}\right)\Phi
= O_p\!\left(\|P_A - P_{\hat{A}}\|\right) = o_p(1),
\]
using Theorem 1(ii). In Assumption 2, we have assumed $\inf_A D(A) > 0$; since the second term in $D(A)$ is nonnegative, we have $\inf_A Q(A) > 0$, so the matrix $Q(A)$ is invertible. Therefore,
\[
Q(\hat{A})^{-1} = \left[Q(A) + o_p(1)\right]^{-1} = Q(A)^{-1} + o_p(1).
\]

Lemma 9.
Recall $H$ defined in Proposition 1; then
\[
HH^\top = \left(\frac{A^\top A}{p}\right)^{-1} + O_p\!\left(\frac{\|C - \hat{C}\|}{\sqrt{n}}\right) + O_p\!\left(\frac{1}{\min(n,p)}\right).
\]
Proof. We have from Lemma 7
\[
\frac{1}{p}A^\top(\hat{A} - AH) = O_p\!\left(\frac{\|C - \hat{C}\|}{\sqrt{n}}\right) + O_p\!\left(\frac{1}{\min(n,p)}\right), \tag{53}
\]
and
\[
\frac{1}{p}\hat{A}^\top(\hat{A} - AH) = I_r - \frac{1}{p}\hat{A}^\top AH
= O_p\!\left(\frac{\|C - \hat{C}\|}{\sqrt{n}}\right) + O_p\!\left(\frac{1}{\min(n,p)}\right). \tag{54}
\]
Left multiplying (53) by $H^\top$ and summing with the transpose of (54), we obtain
\[
I_r - \frac{1}{p}H^\top A^\top AH = O_p\!\left(\frac{\|C - \hat{C}\|}{\sqrt{n}}\right) + O_p\!\left(\frac{1}{\min(n,p)}\right).
\]
Right multiplying by $H^\top$ and left multiplying by $(H^\top)^{-1}$, we obtain
\[
I_r - \frac{1}{p}A^\top AHH^\top = O_p\!\left(\frac{\|C - \hat{C}\|}{\sqrt{n}}\right) + O_p\!\left(\frac{1}{\min(n,p)}\right).
\]
Then left multiplying by $\left(A^\top A/p\right)^{-1}$, we have
\[
HH^\top = \left(\frac{A^\top A}{p}\right)^{-1} + O_p\!\left(\frac{\|C - \hat{C}\|}{\sqrt{n}}\right) + O_p\!\left(\frac{1}{\min(n,p)}\right).
\]

Lemma 10.
Under Assumptions 1-5, when $p/n \to \rho > 0$,
\[
\left\|\frac{1}{\sqrt{np}}\Phi^\top M_{\hat{A}}E - \frac{1}{\sqrt{np}}\Phi^\top M_A E\right\|
= \sqrt{p} \times O_p\!\left(\frac{\|C - \hat{C}\|^2}{n}\right) + o_p(1).
\]
Proof.
Using
\[
M_A = I_p - A\left(A^\top A\right)^{-1}A^\top, \qquad M_{\hat{A}} = I_p - \hat{A}\hat{A}^\top/p,
\]
we calculate
\[
\begin{aligned}
\frac{1}{\sqrt{np}}\Phi^\top M_A E - \frac{1}{\sqrt{np}}\Phi^\top M_{\hat{A}}E
&= \frac{1}{p\sqrt{np}}\Phi^\top\hat{A}\hat{A}^\top E - \frac{1}{p\sqrt{np}}\Phi^\top A\left(\frac{A^\top A}{p}\right)^{-1}A^\top E \\
&= \frac{1}{p\sqrt{np}}\Bigg\{\Phi^\top(\hat{A} - AH)H^\top A^\top E + \Phi^\top(\hat{A} - AH)(\hat{A} - AH)^\top E \\
&\qquad + \Phi^\top AH(\hat{A} - AH)^\top E + \Phi^\top A\left[HH^\top - \left(\frac{A^\top A}{p}\right)^{-1}\right]A^\top E\Bigg\} \\
&\equiv a + b + c + d,
\end{aligned}
\]
where we substitute $\hat{A}$ with $\hat{A} - AH + AH$ in the second equality. So the first term on the right-hand side of the first equality is broken down into four terms, one of which is combined with the second term on the right-hand side of the first equality.

For notational simplicity, we denote
\[
q = \frac{\|C - \hat{C}\|}{\sqrt{n}} + \frac{1}{\min(\sqrt{n},\sqrt{p})}, \tag{55}
\]
which represents the order in the result of Proposition 1.

We calculate each term:
\[
\|a\| = \left\|\frac{1}{p\sqrt{np}}\Phi^\top(\hat{A} - AH)H^\top A^\top E\right\|
= \frac{1}{p\sqrt{np}} \times O_p(\sqrt{p}) \times O_p(\sqrt{p}\,q) \times O_p(\sqrt{np})
= O_p\!\left(\frac{\|C - \hat{C}\|}{\sqrt{n}}\right) + O_p\!\left(\frac{1}{\min(\sqrt{n},\sqrt{p})}\right) = o_p(1),
\]
where the order of $\hat{A} - AH$ is $\sqrt{p}\,q$ as proved in Proposition 1, and the orders of $\|\Phi\|$ and $\|A^\top E\|$ can be found in Lemmas 2 and 3(ii), respectively.
And when $p/n \to \rho > 0$,
\[
\begin{aligned}
\|b\| &= \left\|\frac{1}{p\sqrt{np}}\Phi^\top(\hat{A} - AH)(\hat{A} - AH)^\top E\right\|
\le \frac{1}{p\sqrt{np}} \times O_p(\sqrt{p}) \times O_p(p\,q^2) \times O_p(\sqrt{np}) \\
&= \sqrt{p} \times O_p\!\left(\frac{\|C - \hat{C}\|^2}{n}\right) + O_p\!\left(\frac{\sqrt{p}}{\min(n,p)}\right)
= \sqrt{p} \times O_p\!\left(\frac{\|C - \hat{C}\|^2}{n}\right) + o_p(1),
\end{aligned}
\]
where again Proposition 1 and Lemma 2 are used. And
\[
\begin{aligned}
c &= \frac{1}{p\sqrt{np}}\Phi^\top AH(\hat{A} - AH)^\top E
= \frac{1}{p\sqrt{np}}\Phi^\top AHH^\top(\hat{A}H^{-1} - A)^\top E \\
&= \frac{1}{p\sqrt{np}}\Phi^\top A\left[HH^\top - \left(\frac{A^\top A}{p}\right)^{-1}\right](\hat{A}H^{-1} - A)^\top E
+ \frac{1}{p\sqrt{np}}\Phi^\top A\left(\frac{A^\top A}{p}\right)^{-1}(\hat{A}H^{-1} - A)^\top E \\
&\equiv c_1 + c_2,
\end{aligned}
\]
using $\hat{A} - AH = (\hat{A}H^{-1} - A)H$. In the third equality, we subtract $(A^\top A/p)^{-1}$ from $HH^\top$ and then add it back.

For $c_1$, when $p/n \to \rho > 0$,
\[
\begin{aligned}
\|c_1\| &= \left\|\frac{1}{p\sqrt{np}}\Phi^\top A\left[HH^\top - \left(\frac{A^\top A}{p}\right)^{-1}\right](\hat{A}H^{-1} - A)^\top E\right\| \\
&\le \frac{1}{p\sqrt{np}} \times O_p(\sqrt{p}) \times O_p(\sqrt{p}) \times \left[O_p\!\left(\frac{\|C - \hat{C}\|}{\sqrt{n}}\right) + O_p\!\left(\frac{1}{\min(n,p)}\right)\right] \\
&\qquad \times \left[O_p\!\left(\frac{p}{\min(\sqrt{n},\sqrt{p})}\,\|C - \hat{C}\|\right) + O_p(\sqrt{n}) + O_p\!\left(\frac{p}{\sqrt{n}}\right)\right] \\
&= O_p\!\left(\frac{\sqrt{p}}{\min(\sqrt{n},\sqrt{p})}\,\frac{\|C - \hat{C}\|^2}{n}\right)
+ O_p\!\left(\frac{\|C - \hat{C}\|}{\sqrt{np}}\right)
+ O_p\!\left(\frac{\sqrt{p}}{n}\,\frac{\|C - \hat{C}\|}{\sqrt{n}}\right)
+ O_p\!\left(\frac{\sqrt{p}}{\sqrt{n}\,\min(n,p)}\right)
+ O_p\!\left(\frac{\sqrt{p}}{n\,\min(n,p)}\right) = o_p(1),
\end{aligned}
\]
where the orders of $\Phi$, $A$ and $E$ are found in Lemma 2; the order of $HH^\top - \left(A^\top A/p\right)^{-1}$ is found in Lemma 9; and the order of $\hat{A}H^{-1} - A$ is found in Proposition 1. Now for $c_2$, using the same lemmas and proposition,
\[
\begin{aligned}
\|c_2\| &= \left\|\frac{1}{p\sqrt{np}}\Phi^\top A\left(\frac{A^\top A}{p}\right)^{-1}(\hat{A}H^{-1} - A)^\top E\right\| \\
&\le \frac{1}{p\sqrt{np}} \times O_p(\sqrt{p}) \times O_p(\sqrt{p}) \times \left[O_p\!\left(\frac{p}{\min(\sqrt{n},\sqrt{p})}\,\|C - \hat{C}\|\right) + O_p(\sqrt{n}) + O_p\!\left(\frac{p}{\sqrt{n}}\right)\right] \\
&= O_p\!\left(\frac{\sqrt{p}}{\sqrt{n}}\,\frac{\|C - \hat{C}\|}{\sqrt{n}}\right) + O_p\!\left(\frac{1}{\sqrt{p}}\right) + O_p\!\left(\frac{\sqrt{p}}{n}\right),
\end{aligned}
\]
which is $o_p(1)$ when $p/n \to \rho > 0$. Finally,
\[
\begin{aligned}
\|d\| &= \left\|\frac{1}{p\sqrt{np}}\Phi^\top A\left[HH^\top - \left(\frac{A^\top A}{p}\right)^{-1}\right]A^\top E\right\| \\
&\le \frac{1}{p\sqrt{np}} \times O_p(\sqrt{p}) \times O_p(\sqrt{p}) \times O_p\!\left(\frac{\|C - \hat{C}\|}{\sqrt{n}} + \frac{1}{\min(n,p)}\right) \times O_p(\sqrt{np}) \\
&= O_p\!\left(\frac{\|C - \hat{C}\|}{\sqrt{n}}\right) + O_p\!\left(\frac{1}{\min(n,p)}\right) = o_p(1),
\end{aligned}
\]
where again Lemma 9 is used. Thus, combining the above terms, we have
\[
\left\|\frac{1}{\sqrt{np}}\Phi^\top M_{\hat{A}}E - \frac{1}{\sqrt{np}}\Phi^\top M_A E\right\|
= \sqrt{p} \times O_p\!\left(\frac{\|C - \hat{C}\|^2}{n}\right) + o_p(1)
\]
when $p/n \to \rho > 0$.

Lemma 11.
Recall $J$ defined in (30); we have
\[
\|J\| = o_p\!\left(\|C - \hat{C}\|\right) + O_p\!\left(\frac{1}{\min(n,p)}\right) + O_p\!\left(\frac{\sqrt{n}}{\sqrt{p}\,\min(n,p)}\right).
\]
Proof.
\[
J = -\frac{1}{p}\Phi^\top M_{\hat{A}}I_8GF^\top
= -\frac{1}{np^2}\Phi^\top M_{\hat{A}}EE^\top\hat{A}GF^\top
= -\frac{1}{np^2}\Phi^\top EE^\top\hat{A}GF^\top + \frac{1}{np^3}\Phi^\top\hat{A}\hat{A}^\top EE^\top\hat{A}GF^\top
\equiv I + II,
\]
where we use $M_{\hat{A}} = I_p - \hat{A}\hat{A}^\top/p$. For $I$,
\[
I = -\frac{1}{np^2}\Phi^\top EE^\top\left(\hat{A} - AH\right)GF^\top - \frac{1}{np^2}\Phi^\top EE^\top AHGF^\top,
\]
then
\[
\begin{aligned}
\|I\| &\le \frac{1}{np^2}\left\|\Phi^\top EE^\top\right\|\,\|\hat{A} - AH\|\,\|G\|\,\|F\| + \frac{1}{np^2}\left\|\Phi^\top EE^\top A\right\|\,\|H\|\,\|G\|\,\|F\| \\
&= \frac{1}{np^2} \times \left[O_p\!\left(p\sqrt{n}\right) + O_p\!\left(n\sqrt{p}\right)\right] \times O_p(\sqrt{p}\,q) \times O_p(\sqrt{n})
+ \frac{1}{np^2} \times \left[O_p\!\left(p\sqrt{n}\right) + O_p\!\left(n\sqrt{p}\right)\right] \times O_p(\sqrt{n}) \\
&= O_p\!\left(\frac{q}{\sqrt{p}}\right) + O_p\!\left(\frac{\sqrt{n}\,q}{p}\right),
\end{aligned}
\]
where the orders of $\|\Phi^\top EE^\top\|$ and $\|\Phi^\top EE^\top A\|$ are found in Lemma 3; the order of $\|\hat{A} - AH\|$ is from Proposition 1; and the orders of $\|F\|$ and $\|G\|$ are found in Lemma 2(iii) and Lemma 5, respectively. For $II$,
\[
\begin{aligned}
II &= \frac{1}{np^3}\Phi^\top\hat{A}\left(\hat{A} - AH + AH\right)^\top EE^\top\left(\hat{A} - AH + AH\right)GF^\top \\
&= \frac{1}{np^3}\Phi^\top\hat{A}\Big[\left(\hat{A} - AH\right)^\top EE^\top\left(\hat{A} - AH\right) + \left(AH\right)^\top EE^\top\left(\hat{A} - AH\right) \\
&\qquad + \left(\hat{A} - AH\right)^\top EE^\top AH + \left(AH\right)^\top EE^\top AH\Big]GF^\top,
\end{aligned}
\]
then
\[
\begin{aligned}
\|II\| &\le \frac{1}{np^3}\|\Phi^\top\|\,\|\hat{A}\|\left[\|\hat{A} - AH\|^2\,\|EE^\top\| + \|\hat{A} - AH\|\,\|EE^\top A\| + \|A^\top EE^\top A\|\right]\|G\|\,\|F\| \\
&= \frac{1}{np^3}\left\{\left[O_p\!\left(p\sqrt{n}\right) + O_p\!\left(n\sqrt{p}\right)\right] \times O_p\!\left(pq^2 + \sqrt{p}\,q + 1\right)\right\} \times O_p(p) \times O_p(\sqrt{n}) \\
&= O_p\!\left[\left(\frac{p + \sqrt{np}}{p^2}\right)\left(pq^2 + \sqrt{p}\,q\right)\right],
\end{aligned}
\]
where the orders of $\|EE^\top A\|$ and $\|A^\top EE^\top A\|$ are found in Lemma 3; the order of $\|\hat{A} - AH\|$ is from Proposition 1; and the orders of $\|\Phi\|$, $\|F\|$ and $\|G\|$ are found in Lemma 2(i), (iii) and Lemma 5, respectively.

Combining $I$ and $II$, we have
\[
\|J\| = O_p\!\left[\left(\frac{p + \sqrt{np}}{p^2}\right)\left(pq^2 + \sqrt{p}\,q\right)\right].
\]
Since $1 = O(\sqrt{p}\,q)$, the term $\sqrt{p}\,q$ is dominated by $pq^2$; thus
\[
\|J\| = O_p\!\left[\left(\frac{p + \sqrt{np}}{p^2}\right)pq^2\right]
= O_p\!\left[\left(1 + \frac{\sqrt{n}}{\sqrt{p}}\right)\left(\frac{\|C - \hat{C}\|^2}{n} + \frac{1}{\min(n,p)}\right)\right]
= o_p\!\left(\|C - \hat{C}\|\right) + O_p\!\left(\frac{1}{\min(n,p)}\right) + O_p\!\left(\frac{\sqrt{n}}{\sqrt{p}\,\min(n,p)}\right).
\]
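The projection algebra running through Lemmas 8, 10 and 11 rests on two facts: $M_A = I_p - A(A^\top A)^{-1}A^\top$ annihilates the column space of $A$ and is idempotent, and under the normalization $\hat{A}^\top\hat{A}/p = I_r$ the estimated annihilator simplifies to $M_{\hat{A}} = I_p - \hat{A}\hat{A}^\top/p$. A small numerical sketch with an arbitrary (hypothetical) loading matrix verifies both:

```python
import numpy as np

rng = np.random.default_rng(7)

p, r = 30, 3
A = rng.standard_normal((p, r))

# Annihilator of the column space of A
M_A = np.eye(p) - A @ np.linalg.inv(A.T @ A) @ A.T
residual_on_A = np.linalg.norm(M_A @ A)          # M_A A = 0
idempotency = np.linalg.norm(M_A @ M_A - M_A)    # M_A is a projection

# Under the normalization A_norm' A_norm / p = I_r, the general formula
# collapses to I_p - A_norm A_norm' / p
Q, _ = np.linalg.qr(A)
A_norm = np.sqrt(p) * Q
M1 = np.eye(p) - A_norm @ np.linalg.inv(A_norm.T @ A_norm) @ A_norm.T
M2 = np.eye(p) - A_norm @ A_norm.T / p
same = np.linalg.norm(M1 - M2)
```

All three residuals are zero up to floating-point error, matching the simplification used for $M_{\hat{A}}$ throughout the proofs.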