Max-sum tests for cross-sectional dependence of high-dimensional panel data
Submitted to the Annals of Statistics
MAX-SUM TESTS FOR CROSS-SECTIONAL DEPENDENCE OF HIGH-DIMENSIONAL PANEL DATA
By Long Feng*, Tiefeng Jiang†, Binghui Liu* and Wei Xiong‡
Northeast Normal University*, University of Minnesota† and University of International Business and Economics‡

We consider a testing problem for cross-sectional dependence of high-dimensional panel data, where the number of cross-sectional units is potentially much larger than the number of observations. The cross-sectional dependence is described through a linear regression model. We study three tests named the sum test, the max test and the max-sum test, where the latter two are new. The sum test was initially proposed by Breusch and Pagan (1980). We design the max and sum tests for sparse and non-sparse residuals in the linear regressions, respectively, and the max-sum test is devised to compromise between the two situations. Indeed, our simulations show that the max-sum test outperforms the previous two tests. This makes the max-sum test very useful in practice, where it is usually vague whether a data set is sparse or not. Towards the theoretical analysis of the three tests, we settle two conjectures regarding the sum of squares of sample correlation coefficients posed by Pesaran (2004 and 2008). In addition, we establish the asymptotic theory for maxima of sample correlation coefficients appearing in the linear regression model for panel data, which is, to our knowledge, the first successful attempt of this kind. To study the max-sum test, we create a novel method to show asymptotic independence between maxima and sums of dependent random variables. We expect the method itself to be useful for other problems of this nature. Finally, an extensive simulation study as well as a case study are carried out. They demonstrate the advantages of our proposed methods in terms of both empirical power and robustness, regardless of whether the residuals are sparse or not.
CONTENTS
1 Introduction
2 The proposed tests
  2.1 Problem description
  2.2 Test statistics
  2.3 Contributions
3 Theoretical results
Keywords and phrases: high-dimensional data, panel data models, hypothesis tests, cross-sectional dependence, asymptotic normality, extreme-value distribution, asymptotic independence, max-sum test.
1. Introduction.
In this paper we study the cross-sectional dependence for the following linear regression model for panel data:
\[
y_{it} = x_{it}'\beta_i + \varepsilon_{it} \tag{1}
\]
for $i = 1, \cdots, N$ and $t = 1, \cdots, T$, where $i$ represents households, individuals, firms, etc., and $t$ represents time. In the literature of panel data, the index $i$ stands for sections. For each section $i$, the corresponding model is a standard multiple linear regression model, where $y_{it} \in \mathbb{R}$ is the dependent variable and $x_{it} \in \mathbb{R}^p$ is the regressor with slope parameter $\beta_i \in \mathbb{R}^p$. The first coordinate of $x_{it}$ is one if there is an intercept in the linear regression model (1). The value of $\beta_i$ may vary across $i$. In (1), we assume $\{\varepsilon_{it};\, 1 \le t \le T\}$ are independent and identically distributed (i.i.d.) for each section $i$. However, across sections the random errors may be dependent; that is, $\{\varepsilon_{it};\, 1 \le i \le N\}$ may be dependent for some $t$. Such dependence is referred to as cross-sectional dependence. The objective of this paper is to test whether cross-sectional dependence exists, by using a few new methods. Before stating our results, we introduce some background.

In statistics and econometrics, panel data or longitudinal data are multi-dimensional data involving measurements over time, which contain observations of various phenomena over multiple time periods for the same unit, for instance, a household or a firm. In the study of panel data models, cross-sectional dependence is an important concept, described as the interaction between cross-sectional units, which could arise from the behavioral interaction between units. Stephan [38] argues that "in dealing with social data, we know that by virtue of their very social character, persons, groups and their characteristics are interrelated and not independent." However, to make theoretical study easier, experts assume cross-sectional independence in various model setups [19, 31].
If data across individuals are dependent, inferences under the assumption of cross-sectional independence would be inaccurate and misleading; see [19, 32] and the literature therein. To this end, testing for the existence of cross-sectional dependence is an important task, which has attracted much attention in recent years; see, for instance, [12, 29, 32, 33, 35]. Perhaps the most widely known test for cross-sectional independence is the Lagrange Multiplier (LM) statistic proposed by Breusch and Pagan [3] in 1980 (Google records 5353 citations currently). Their test statistic is the sum of squares of sample correlation coefficients between the residuals from the ordinary least squares (OLS) fits. Precisely, for each $i$, let $\hat\beta_i$ be the standard estimator of $\beta_i$ in the linear regression for observations $\{(y_{it}, x_{it});\, t = 1, \cdots, T\}$, and let $\hat\varepsilon_{it} = y_{it} - x_{it}'\hat\beta_i$ denote the residual. For each $i, j = 1, \cdots, N$, define the sample correlation $\hat\rho_{ij}$ by
\[
\hat\rho_{ij} = \frac{\sum_{t=1}^T \hat\varepsilon_{it}\hat\varepsilon_{jt}}{\sqrt{\sum_{t=1}^T \hat\varepsilon_{it}^2}\,\sqrt{\sum_{t=1}^T \hat\varepsilon_{jt}^2}}. \tag{2}
\]
Breusch and Pagan [3] propose the Lagrange multiplier test statistic
\[
S_N := \sum_{1\le i<j\le N} T\hat\rho_{ij}^2. \tag{3}
\]

2. The proposed tests.

2.1. Problem description. Recall model (1), $y_{it} = x_{it}'\beta_i + \varepsilon_{it}$ for $i = 1, \cdots, N$ and $t = 1, \cdots, T$, where $i$ indexes the cross-sectional units and $t$ indexes the observations. In this model, $y_{it} \in \mathbb{R}$ is the dependent variable, and $x_{it} \in \mathbb{R}^p$ is the non-random, exogenous regressor with slope parameter $\beta_i \in \mathbb{R}^p$, which is allowed to vary across $i$. We assume $\{\varepsilon_{it};\, 1 \le t \le T\}$ are i.i.d. real-valued random variables for each section $i$. However, across sections the random errors may be dependent; that is, $\{\varepsilon_{it};\, 1 \le i \le N\}$ may be dependent for some $t$. Such dependence is called cross-sectional dependence.
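For concreteness, the residual correlation in (2) can be computed directly from per-section OLS fits; below is a minimal sketch (the helper names and toy dimensions are ours, not the paper's):

```python
import numpy as np

def ols_residuals(y, x):
    """Residuals of the OLS fit y = x @ beta + eps for one section.
    y: (T,) response, x: (T, p) regressors."""
    beta_hat, *_ = np.linalg.lstsq(x, y, rcond=None)
    return y - x @ beta_hat

def sample_corr(e_i, e_j):
    """rho_hat_ij from (2): inner product of residuals over the
    product of their Euclidean norms (no mean-centering, matching (2))."""
    return e_i @ e_j / (np.linalg.norm(e_i) * np.linalg.norm(e_j))

rng = np.random.default_rng(0)
T, p = 30, 2
x_i, x_j = rng.normal(size=(T, p)), rng.normal(size=(T, p))
eps_i, eps_j = rng.normal(size=T), rng.normal(size=T)
e_i = ols_residuals(x_i @ np.ones(p) + eps_i, x_i)
e_j = ols_residuals(x_j @ np.ones(p) + eps_j, x_j)
rho = sample_corr(e_i, e_j)
assert -1.0 <= rho <= 1.0
```

The LM statistic (3) is then the sum of $T\hat\rho_{ij}^2$ over all pairs $i < j$.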
Set
\[
x_i = (x_{i1}, \cdots, x_{iT})', \quad y_i = (y_{i1}, \cdots, y_{iT})', \quad \varepsilon_i = (\varepsilon_{i1}, \cdots, \varepsilon_{iT})' \tag{6}
\]
for $i = 1, 2, \cdots, N$. Then $x_i$ is a $T \times p$ matrix; both $y_i$ and $\varepsilon_i$ are $T$-dimensional vectors. Throughout the paper we assume that the $T$ entries of $\varepsilon_i$ are i.i.d. with mean zero for each $i$. Recalling (1), cross-sectional independence is the same as saying that
\[
H_0: \varepsilon_1, \varepsilon_2, \cdots, \varepsilon_N \text{ are independent random vectors}. \tag{7}
\]
In general, although sometimes we assume $\varepsilon_{11}$ has the normal distribution, we do not need the exact distribution of $\varepsilon_{11}$ but rather its moments.

2.2. Test statistics. First, we list some notation used in the rest of the paper. Reviewing (6), for each $i = 1, \cdots, N$, let
\[
\hat\beta_i = (x_i'x_i)^{-1}x_i'y_i, \qquad P_i = I_T - x_i(x_i'x_i)^{-1}x_i', \tag{8}
\]
where $I_T$ is the $T \times T$ identity matrix and $P_i$ is a $T \times T$ projection matrix with $P_i^2 = P_i$ and rank $T - p$. For each $i, j = 1, \cdots, N$, let $\hat\rho_{ij}$ denote the sample correlation coefficient computed from the OLS residuals $(\hat\varepsilon_{i1}, \cdots, \hat\varepsilon_{iT})'$ and $(\hat\varepsilon_{j1}, \cdots, \hat\varepsilon_{jT})'$, where $\hat\varepsilon_{it} = y_{it} - x_{it}'\hat\beta_i$ for each $i$ and $t$. Under model (1), it is easy to see that $(\hat\varepsilon_{i1}, \cdots, \hat\varepsilon_{iT})' = P_i\varepsilon_i$ for each $i$. Thus, by (2),
\[
\hat\rho_{ij} = \frac{\sum_{t=1}^T \hat\varepsilon_{it}\hat\varepsilon_{jt}}{\sqrt{\sum_{t=1}^T \hat\varepsilon_{it}^2}\,\sqrt{\sum_{t=1}^T \hat\varepsilon_{jt}^2}} = \frac{\varepsilon_i' P_i P_j \varepsilon_j}{\|P_i\varepsilon_i\| \cdot \|P_j\varepsilon_j\|}. \tag{9}
\]
In this paper, to test the null hypothesis (7), we will study three types of tests, based on
\[
\text{sum:}\quad S_N = \sum_{1\le i<j\le N} T\hat\rho_{ij}^2; \tag{10}
\]
\[
\text{max:}\quad L_N = \max_{1\le i<j\le N} |\hat\rho_{ij}|; \tag{11}
\]
\[
\text{max-sum:}\quad C_N = \min\{P_{S_N}, P_{L_N}\}, \tag{12}
\]
where $P_{S_N}$ and $P_{L_N}$ are the p-values based on $S_N$ and $L_N$, respectively, which will be specified in Section 3. We also set
\[
\mu_N = \sum_{1\le i<j\le N} \frac{T\,\mathrm{tr}(P_iP_j)}{(T-p)^2}. \tag{13}
\]

2.3. Contributions. In this paper, for the panel data model (1), we study the cross-sectional dependence. The asymptotic distributions of three test statistics based on residuals are established.
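Before moving on, the algebra in (8)–(9) can be sanity-checked numerically: the OLS residual vector equals $P_i\varepsilon_i$, and $P_i$ is an idempotent projection of rank $T - p$. A small sketch (toy dimensions, names ours):

```python
import numpy as np

rng = np.random.default_rng(1)
T, p = 20, 3
x = rng.normal(size=(T, p))          # design matrix x_i of (6)
eps = rng.normal(size=T)             # error vector eps_i
beta = rng.normal(size=p)
y = x @ beta + eps

# P_i from (8): annihilator of the column space of x_i
P = np.eye(T) - x @ np.linalg.solve(x.T @ x, x.T)
beta_hat = np.linalg.solve(x.T @ x, x.T @ y)
resid = y - x @ beta_hat

# Residuals equal P_i eps_i; P_i is idempotent with trace (= rank) T - p
assert np.allclose(resid, P @ eps)
assert np.allclose(P @ P, P)
assert round(np.trace(P)) == T - p
```

In particular, the residuals do not depend on $\beta_i$, which is why $\hat\rho_{ij}$ in (9) is a function of $\varepsilon_i$, $\varepsilon_j$ and the projections alone.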
As applications, three hypothesis tests are developed, and a real data analysis using our results is provided. We elaborate below.

In the theoretical part, we have solved two open problems on the sum of squares of residual correlations conjectured by econometricians ([31, 34]; see also [33, 35]). We have developed an extreme-value theory for the maximum of residual correlations. Further, a new method is developed to show that the sum and the maximum are asymptotically independent. There are few results in the literature showing asymptotic independence between sums and maxima of random variables; close references are [21, 40]. Our method, being different from the earlier literature, provides a general and novel tool for establishing such asymptotic independence.

In the applications, we propose three tests of cross-sectional dependence for high-dimensional panel data: the sum test, the max test and the max-sum test. The max test is the first high-dimensional max test for cross-sectional dependence in panel data models; it is good for sparse residuals, where existing test statistics of sum type tend to fail. The sum test is useful for non-sparse residuals, as clearly demonstrated by simulations in, for example, [31, 33, 34, 35]. We derive the limiting distribution of the sum statistics in this paper.

Furthermore, the max-sum test is constructed based on the asymptotic independence between the max and sum statistics mentioned above. It is the first max-sum test for studying cross-sectional dependence for high-dimensional panel data. Its advantage is that it works well for both sparse and non-sparse residuals: comparing the pros and cons of the max test and the sum test, the max-sum test overcomes both disadvantages. Our simulations reveal this fact clearly; see Figure 1 and its interpretation at the end of Section 4.2.
The max-sum test is particularly useful considering that it is hard to quantify or determine in practice whether a data set is sparse or not.

3. Theoretical results. We now present the main theoretical results for the three types of tests, in the order of the sum test, the max test and the max-sum test. Their proofs are presented in Section 7.

3.1. The limiting distribution for the sum test. Recall that the sum test based on (10) is a classical one for testing cross-sectional dependence in panel data models. However, its asymptotic theory had not been established. Pesaran [31, 34] conjectures that $S_N$ satisfies a central limit theorem; some insights on this aspect are given, for example, in [35] and [33]. In the following we present our solution to this problem as well as to another one, with details given below. The following assumption will be needed throughout the paper. Recall that a random variable $V$ is said to be continuous if $P(V = v) = 0$ for every $v \in \mathbb{R}$. Assume
\[
\varepsilon_i = (\varepsilon_{i1}, \cdots, \varepsilon_{iT})',\ i = 1, 2, \cdots, N, \text{ are independent } T\text{-dimensional random vectors, and } \varepsilon_{i1}, \cdots, \varepsilon_{iT} \text{ are i.i.d. continuous random variables with } E\varepsilon_{i1} = 0 \text{ and } \mathrm{Var}(\varepsilon_{i1}) = \sigma_i^2 > 0 \text{ for each } i. \tag{14}
\]
If the $T$ entries of $\varepsilon_i$ are i.i.d. continuous random variables, then by a conditional argument we trivially have $P(a'\varepsilon_i = 0) = 0$ for any $a \in \mathbb{R}^T \setminus \{0\}$. This implies that $P(M\varepsilon_i = 0) = 0$ for any $l \times T$ matrix $M \neq 0$ and $l \ge 1$. So $\hat\rho_{ij}$ in (9) is well-defined if $T > p$ because $\mathrm{rank}(P_i) = T - p$; see the explanation below (8). Now we present our solutions to Pesaran's conjectures from [31, 34]. For mathematical rigor, we assume the parameter $T$ depends on $N$. The notation $N_k(\mu, \Sigma)$ stands for the $k$-dimensional multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$.
Although the linear regression model in (1) requires $p \ge 1$, that is, there is at least one regressor, the following theorem also applies to the case
\[
p = 0 \text{ and } P_i = I_T \text{ for each } i; \tag{15}
\]
see (8).

THEOREM 1. Assume $p \ge 0$ is fixed and $T/\sqrt{N} \to \infty$ as $N \to \infty$. Let assumption (14) hold with $\varepsilon_i \sim N_T(0, \sigma_i^2 I)$ for each $i$. Let $S_N$ and $\mu_N$ be as in (10) and (13), respectively. Then $(S_N - \mu_N)/N$ converges to $N(0, 1)$ in distribution as $N \to \infty$.

Under this assumption, the number of cross-sectional units $N$ is allowed to be much larger than the number of observations $T$; for example, $N$ can be of order $T^{3/2}$. We apply the framework of the Lindeberg-Feller martingale CLT to study $S_N$. Although the method is simple and easy to follow, the technical steps are very involved due to the complex nature of the sample correlation coefficients $\hat\rho_{ij}$ in (9). Many computations focus on conditional means, variances and higher moments.

Considering a possibly better convergence rate than the CLT given in Theorem 1, [34] revises the statistic $S_N$ and proposes a new one as follows:
\[
Q_N = \sqrt{\frac{2}{N(N-1)}} \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} \frac{(T-p)\hat\rho_{ij}^2 - \mu_{Nij}}{v_{Nij}}, \tag{16}
\]
where
\[
\mu_{Nij} = \frac{1}{T-p}\,\mathrm{tr}(P_iP_j), \tag{17}
\]
\[
v_{Nij}^2 = a_{1N}\,[\mathrm{tr}(P_iP_j)]^2 + 2\,a_{2N}\,\mathrm{tr}[(P_iP_j)^2], \tag{18}
\]
\[
a_{1N} = a_{2N} - \frac{1}{(T-p)^2}, \tag{19}
\]
\[
a_{2N} = 3\,\Big[\frac{(T-p-8)(T-p+2)+24}{(T-p+2)(T-p-2)(T-p-4)}\Big]^2. \tag{20}
\]
Pesaran et al. [34] conjecture that $Q_N$ also satisfies the CLT. We confirm this in the next theorem.

THEOREM 2. Assume the setting of Theorem 1. Then $Q_N$ converges to $N(0, 1)$ in distribution as $N \to \infty$.

Our simulations in Figure 1 show that the two approximations in Theorems 1 and 2 are too close to be distinguishable. Theorem 1 allows
us to perform a level-$\alpha$ test by rejecting the null hypothesis (7) when $(S_N - \mu_N)/N$ is larger than the $1-\alpha$ quantile $z_\alpha = \Phi^{-1}(1-\alpha)$ of $N(0,1)$. Likewise, Theorem 2 justifies rejecting when $Q_N > z_\alpha$ under the same null hypothesis. This provides a theoretical guarantee for $Q_N$, which has been used in econometrics; see, for example, [12, 29, 32]. Now we make some comments.

REMARK 1. Let us see heuristically how the mean $\mu_N$ and the standard deviation $N$ in Theorem 1 arise. In fact, $\mu_N$ is computed by using Lemma 12(i). The variance is calculated via Lemma 13(iv), by noticing that $\mathrm{tr}[(P_iP_j)^2]/T \to 1$ and $[\mathrm{tr}(P_iP_j)]^2/T^2 \to 1$ as $N \to \infty$, and by regarding $\{\hat\rho_{ij},\, 1 \le i < j \le N\}$ as independent random variables although they are weakly correlated.

REMARK 2. Take $p = 0$ in Theorem 1. From (15), $P_i = I_T$ for all $1 \le i \le N$, hence $\mathrm{tr}(P_iP_j) = T$ and $\mu_N = N(N-1)/2$. Theorem 1 then says that, assuming $T/\sqrt{N} \to \infty$,
\[
\frac{S_N}{N} - \frac{N}{2} \to N\Big(-\frac{1}{2},\, 1\Big) \tag{21}
\]
in distribution as $N \to \infty$. This is the trivial case in which no linear regression is involved, and $\hat\rho_{ij} = \varepsilon_i'\varepsilon_j/(\|\varepsilon_i\|\cdot\|\varepsilon_j\|)$ from (9), where $\varepsilon_i$ and $\varepsilon_j$ are independent Gaussian vectors with distributions $N_T(0, \sigma_i^2 I)$ and $N_T(0, \sigma_j^2 I)$, respectively.

REMARK 4. Consider an extreme example in which the regressors are highly correlated: take $x_1 = \cdots = x_N$ of full column rank $p$, so that $P_1 = \cdots = P_N$ and $\mathrm{tr}(P_iP_j) = T - p$ for all $i, j$. Then $\mu_N = N(N-1)T/(2(T-p))$, and Theorem 1 says that
\[
\frac{S_N}{N} - \frac{T(N-1)}{2(T-p)} \to N\Big(-\frac{1}{2},\, 1\Big) \tag{22}
\]
in distribution as $N \to \infty$. In particular, if $N/T \to c \in (0, \infty)$, then $T(N-1)/(2(T-p)) = N/2 + (cp-1)/2 + o(1)$. Hence
\[
\frac{S_N}{N} - \frac{N}{2} \to N\Big(\frac{cp-1}{2},\, 1\Big). \tag{23}
\]
A point of this extreme example is that, as these $x_i$ are highly correlated, the CLT in (23) is indeed different from the trivial CLT in (21). Interestingly, the next example is completely different from this one.

REMARK 5. Assume $T = Np$. In this case, $N/T = 1/p$. Construct
\[
x_1 = (I_p,\, 0_{p\times(T-p)})',\quad x_2 = (0_{p\times p},\, I_p,\, 0_{p\times(T-2p)})',\quad \cdots,\quad x_N = (0_{p\times(T-p)},\, I_p)'.
\]
They are $T \times p$ matrices. Then $x_i'x_i = I_p$ for each $1 \le i \le N$. Regarding $x_i(x_i'x_i)^{-1}x_i'$ as an $N \times N$ array of $p \times p$ blocks, its only non-zero entry is the $(i,i)$ block, equal to $I_p$; all other $p \times p$ blocks are zero. Then $x_i(x_i'x_i)^{-1}x_i' \cdot x_j(x_j'x_j)^{-1}x_j' = 0_{T\times T}$ for $i \neq j$. It follows from the definition of $P_i$ in (8) that $\mathrm{tr}(P_iP_j) = T - 2p$. Thus,
\[
\mu_N = \frac{N(N-1)}{2}\cdot\frac{T(T-2p)}{(T-p)^2} = N\cdot\Big[\frac{N-1}{2} + o(1)\Big]
\]
by the assumption $T = Np$. Then $S_N/N - N/2 \to N(-1/2,\, 1)$ in distribution as $N \to \infty$. The essence of this example is that the projection matrices $P_i$ are orthogonal to each other, contrary to the highly correlated case in Remark 4. The CLT here is more like the one in the trivial case from Remark 2, but different from that in Remark 4.

REMARK 6. Review (4), which says $S_N \to \chi^2(d)$ in distribution as $T \to \infty$ while $N$ is fixed, where $d = N(N-1)/2$. By using the approximation $(\chi^2(d) - d)/\sqrt{2d} \to N(0, 1)$ as $d \to \infty$, we see that, if taking limits this way were legitimate, we would have
\[
\frac{S_N - N(N-1)/2}{\sqrt{N(N-1)}} \to N(0, 1).
\]
By the Slutsky lemma, this entails $S_N/N - N/2 \to N(-1/2,\, 1)$ in distribution. It is interesting to see this weak convergence in Remarks 2 and 5, but not in Remark 4. In fact, for big data with the feature that two or more parameters are large, it is not always valid, when studying a statistic of interest, to send the parameters to infinity one by one; see such examples in, for instance, [25, 26, 41].

3.2. The limiting distribution for the max test. Recall model (1) and the notation in (8).
As in Section 3.1, we assume that $x_i'x_i$ is invertible for each $1 \le i \le N$ and that $T$ depends on $N$. From (9) and assumption (14), we know $\{\hat\rho_{ij} : 1 \le i, j \le N\}$ are invariant under $\sigma_1, \cdots, \sigma_N$, so we may assume, without loss of generality, $\sigma_1 = \cdots = \sigma_N \equiv 1$. Review $\varepsilon_{11}$ in assumption (14) and $L_N$ in (11). As explained in (15), we will also consider the case $p = 0$. The main results of this section are as follows.

THEOREM 3. Assume $p \ge 0$ is fixed and $\lim_{N\to\infty} T/N = c \in (0, \infty)$. Let $\varepsilon_1, \cdots, \varepsilon_N$ be i.i.d. and assumption (14) hold with $E|\varepsilon_{11}|^\tau < \infty$ for a sufficiently large $\tau$. Then, as $N \to \infty$, $T L_N^2 - 4\log N + \log\log N$ converges weakly to the distribution function $F(y) = \exp(-e^{-y/2}/\sqrt{8\pi})$, $y \in \mathbb{R}$.

THEOREM 4. Assume $p \ge 0$ is fixed and $\log N = o(T^{1/3})$ as $N \to \infty$. Let $\varepsilon_1, \cdots, \varepsilon_N$ be i.i.d. and assumption (14) hold with $Ee^{\omega|\varepsilon_{11}|} < \infty$ for some $\omega > 0$. Then, as $N \to \infty$, $T L_N^2 - 4\log N + \log\log N$ converges weakly to the distribution function $F(y) = \exp(-e^{-y/2}/\sqrt{8\pi})$, $y \in \mathbb{R}$.

We say $\xi$ is a subgaussian random variable if there exists $\sigma > 0$ such that $Ee^{t\xi} \le e^{\sigma^2 t^2/2}$ for all $t \in \mathbb{R}$. By the Markov inequality, it is easy to see $P(|\xi| \ge x) \le 2e^{-x^2/(2\sigma^2)}$ for all $x > 0$. As a consequence, $Ee^{\theta\xi^2} < \infty$ for all $\theta < 1/(2\sigma^2)$. Obviously, bounded random variables and Gaussian random variables are subgaussian.

THEOREM 5. Assume $p \ge 0$ is fixed and $\log N = o(T^{1/3})$ as $N \to \infty$. Let $\varepsilon_1, \cdots, \varepsilon_N$ be i.i.d. and assumption (14) hold with $\varepsilon_{11}$ being a subgaussian random variable. Then, as $N \to \infty$, $T L_N^2 - 4\log N + \log\log N$ converges weakly to the distribution function $F(y) = \exp(-e^{-y/2}/\sqrt{8\pi})$, $y \in \mathbb{R}$.

The strategy of the proofs of Theorems 3-5 is to approximate $L_N = \max_{1\le i<j\le N}|\hat\rho_{ij}|$ by a more tractable maximum; the details appear in Section 7.

3.3. Asymptotic independence and the max-sum test. Review the account before the statement of Theorem 1. We have the following conclusion on asymptotic independence.
THEOREM 6. Let $S_N$, $L_N$ and $\mu_N$ be as in (10), (11) and (13), respectively. Under the same assumptions as in Theorem 1, $(S_N - \mu_N)/N$ and $T L_N^2 - 4\log N + \log\log N$ are asymptotically independent as $N \to \infty$.

By employing a new trick, we prove the asymptotic independence between the maximum and the sum in Theorem 6. This method is expected to apply to many problems of this type. In fact, there is little literature proving asymptotic independence between sums and maxima of random variables; some close references are [21, 40]. The method here is new, and it gives a novel tool for establishing such asymptotic independence.

To understand the idea quickly, we start with the set-up (86). The first observation is that the maximum of many random variables, seemingly a global quantity, can be understood through a local property, namely, the maxima over subsets of random variables of fixed sizes. This step is carried out through the inclusion-exclusion formula. The second observation is that any such subset of random variables and the sum are independent with very high probability. Consequently, the probability of the intersection of the events related to local maxima and the sum can be written as the product of two individual probabilities. We then use the inclusion-exclusion formula once more to recover the product of the two individual probabilities up to negligible errors. Details are given at the beginning of Section 7.3.2.

An immediate application is as follows. By Theorems 1 and 5, we know that
\[
\frac{1}{N}(S_N - \mu_N) \to N(0, 1) \text{ weakly}; \tag{24}
\]
\[
T L_N^2 - 4\log N + \log\log N \to F(y) = \exp(-e^{-y/2}/\sqrt{8\pi}) \text{ weakly}. \tag{25}
\]
Let $\Phi(x)$ be the distribution function of $N(0, 1)$. Trivially, both $F(y)$ and $\Phi(x)$ are continuous. Set $P_{S_N} = 1 - \Phi\{(S_N - \mu_N)/N\}$ and $P_{L_N} = 1 - F(T L_N^2 - 4\log N + \log\log N)$. By Theorem 6, (24) and (25), we see that $P_{L_N}$ and $P_{S_N}$ are asymptotically independent and each converges weakly to $U[0, 1]$.
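Given the asymptotic independence in Theorem 6 and the limits (24)-(25), the two p-values can be combined in code. The following is a minimal sketch, assuming the normalization $T L_N^2 - 4\log N + \log\log N$ and the extreme-value constant $\sqrt{8\pi}$ stated above, with $\mu_N$ supplied by the user (the function names are ours):

```python
import math

def max_p_value(L_N, N, T):
    """P_{L_N} = 1 - F(T*L_N^2 - 4*log N + log log N), where
    F(y) = exp(-exp(-y/2)/sqrt(8*pi)) is the extreme-value cdf of (25)."""
    y = T * L_N**2 - 4.0 * math.log(N) + math.log(math.log(N))
    return 1.0 - math.exp(-math.exp(-y / 2.0) / math.sqrt(8.0 * math.pi))

def sum_p_value(S_N, mu_N, N):
    """P_{S_N} = 1 - Phi((S_N - mu_N)/N) from (24); 1 - Phi(z) = erfc(z/sqrt 2)/2."""
    z = (S_N - mu_N) / N
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def max_sum_reject(p_sum, p_max, alpha=0.05):
    """Reject H0 iff C_N = min(P_S, P_L) < 1 - sqrt(1 - alpha)."""
    return min(p_sum, p_max) < 1.0 - math.sqrt(1.0 - alpha)
```

The rejection threshold $1 - \sqrt{1-\alpha}$ comes from the limiting law $G(w) = 2w - w^2$ of the minimum of two independent $U[0,1]$ variables, stated in Corollary 1 below in the text.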
So the following holds easily.

COROLLARY 1. Set $C_N = \min\{P_{S_N}, P_{L_N}\}$. Assume the setting of Theorem 6. Then $C_N$ converges to $W := \min\{U, V\}$ in distribution as $N \to \infty$, where $U$ and $V$ are i.i.d. random variables with distribution $U[0, 1]$. The distribution function of $W$ is $G(w) = 2w - w^2$ for $w \in [0, 1]$.

According to Corollary 1, the proposed max-sum test in (12) allows us to perform a level-$\alpha$ test by rejecting the null hypothesis (7) if $C_N < 1 - \sqrt{1-\alpha}$.

4. Simulation studies. We now conduct simulations to compare the finite-sample performance of the tests studied in this paper and one other test from the literature. The tests studied in this paper are based on $S_N$, $L_N$, $C_N$, $Q_N$ in (10), (11), (12), (16), respectively. The other one is based on $CD$ from [31], defined by
\[
CD = \sqrt{\frac{2T}{N(N-1)}} \sum_{1\le i<j\le N} \hat\rho_{ij}. \tag{26}
\]

4.1. Simulation designs. We consider the data generating process used in [34], specified as
\[
y_{it} = \alpha_i + \sum_{l=2}^{p} x_{lit}\beta_{li} + \varepsilon_{it}
\]
for $i = 1, \cdots, N$ and $t = 1, \cdots, T$. Comparing with the notation of model (1), we have $x_{it} = (1, x_{2it}, \cdots, x_{pit})' \in \mathbb{R}^p$ and $\beta_i = (\alpha_i, \beta_{2i}, \cdots, \beta_{pi})' \in \mathbb{R}^p$. We independently generate $\alpha_i \sim N(0, 1)$ and $\beta_{li}$ from a normal distribution with mean one. The regressors follow $x_{lit} = 0.6\, x_{li,t-1} + v_{lit}$ for $i = 1, \cdots, N$, $l = 2, \cdots, p$ and $t$ running over a burn-in period and then $t = 1, \cdots, T$, started at zero, where $v_{lit} \sim N(0,\, \zeta_{li}^2/(1 - 0.36))$ and $\zeta_{li}^2 \sim \chi^2_6/6$. Here the $\zeta_{li}^2$ are independently sampled first; then the $v_{lit}$ are independently generated conditionally on the values of $\zeta_{li}^2$.

Now we generate the $\varepsilon_{it}$ under the null hypothesis (7). Let $\varepsilon_{it} = \sigma_i w_{it}$, where the $w_{it}$ are generated from three different distributions: (i) $N(0, 1)$; (ii) a standardized t-distribution; and (iii) $(\chi^2_5 - 5)/\sqrt{10}$. Here $t_d$ is the t-distribution with $d$ degrees of freedom and $\chi^2_d$ is the chi-square distribution with $d$ degrees of freedom. The normalization in (ii) and (iii) is such that each variable has mean zero and variance one.
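The null data-generating process above can be sketched as follows. The burn-in length, the slope variance and the law of $\sigma_i$ are assumptions of ours here; the AR(1) coefficient 0.6 and $\zeta_{li}^2 \sim \chi^2_6/6$ follow the text:

```python
import numpy as np

def null_panel(N, T, p, rng, burn=50):
    """Sketch of the null DGP of Section 4.1, case (i) (normal errors)."""
    L = p - 1                                   # non-intercept regressors
    zeta = rng.chisquare(6, size=(N, L)) / 6.0  # zeta_li^2 ~ chi2_6 / 6
    sd_v = np.sqrt(zeta / (1.0 - 0.36))         # Var(v_lit) = zeta^2/(1 - 0.6^2)
    x = np.zeros((N, T + burn, L))
    for t in range(1, T + burn):                # AR(1): x_t = 0.6 x_{t-1} + v_t
        x[:, t] = 0.6 * x[:, t - 1] + sd_v * rng.normal(size=(N, L))
    x = x[:, burn:]                             # drop the burn-in period
    a = rng.normal(size=N)                      # alpha_i ~ N(0, 1)
    b = rng.normal(1.0, 1.0, size=(N, L))       # mean-one slopes (scale assumed)
    sigma = np.sqrt(rng.chisquare(2, size=N) / 2.0)   # assumed law of sigma_i
    eps = sigma[:, None] * rng.normal(size=(N, T))    # eps_it = sigma_i * w_it
    y = a[:, None] + np.einsum('ntl,nl->nt', x, b) + eps
    return y, x

rng = np.random.default_rng(2)
y, x = null_panel(N=20, T=30, p=3, rng=rng)
assert y.shape == (20, 30) and x.shape == (20, 30, 2)
```

Under the alternative, only the error step changes: the rows of `eps` are replaced by $\Sigma^{1/2}\eta_t$ column by column, as described next in the text.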
The $\sigma_i$ are drawn from a scaled chi-square law, as in the setup of [34]. We now turn to producing data under the alternative hypothesis. Let $\eta_t := (\eta_{1t}, \cdots, \eta_{Nt})'$ be generated from the three distributions used under the null hypothesis. Set $\varepsilon_{\cdot t} = (\varepsilon_{1t}, \cdots, \varepsilon_{Nt})' = \Sigma^{1/2}\eta_t$. Please distinguish the notation $\varepsilon_{\cdot t}$ here from $\varepsilon_i$ in (6). We consider the following two cases for the covariance matrix $\Sigma = D^{1/2}RD^{1/2}$ with $D = \mathrm{diag}\{\sigma_1^2, \cdots, \sigma_N^2\}$.

(1) Non-sparse case. Randomly select a subset $A \subset \{1, \cdots, N\}$ whose cardinality is a (larger) power of $N$. Let $R = (\rho_{ij})_{N\times N}$ be a symmetric matrix with $\rho_{ij} = 1$ if $i = j$. For $i < j$, set $\rho_{ij} = 0$ if $i \notin A$ or $j \notin A$, while $\rho_{ij}$ is drawn uniformly from an interval with endpoints proportional to $\sqrt{N/T}$ if $i \in A$ and $j \in A$.

(2) Sparse case. Randomly select a subset $A \subset \{1, \cdots, N\}$ whose cardinality is a (smaller) power of $N$. Let $R = (\rho_{ij})_{N\times N}$ be a symmetric matrix with $\rho_{ij} = 1$ if $i = j$. For $i < j$, set $\rho_{ij} = 0$ if $i \notin A$ or $j \notin A$, while $\rho_{ij}$ is drawn uniformly from an interval with endpoints proportional to $\sqrt{\log N/T}$, up to $\sqrt{10\log N/T}$, if $i \in A$ and $j \in A$.

To ensure that the covariance matrix $\Sigma = D^{1/2}RD^{1/2}$ is positive definite, we replace the correlation matrix $R$ with $R + \lambda I_N$, where $\lambda := |\lambda_{\min}(R)|$ plus a small positive constant, and $\lambda_{\min}(R)$ is the minimum eigenvalue of $R$. We consider the sample sizes $T = 50, 100$ and $N = 50, 100, 200$.

4.2. Simulation results. We now present simulation results for the tests based on $S_N$, $L_N$, $C_N$, $Q_N$, $CD$ in (10), (11), (12), (16), (26), respectively. All conclusions are based on 1,000 replications. The empirical sizes and powers of these tests in the non-sparse and sparse cases are summarized in Tables 1 to 3. The power curves are plotted in Figure 1. We analyze them in detail next. Table 1 indicates that all methods have empirical sizes not much larger than 5%.
Here, the max test $L_N$ and the max-sum test $C_N$ tend to have smaller empirical sizes than the remaining ones, especially when $T$ is relatively small. This is not very surprising, because it is common for maximum-type methods designed for raw data models; see, for example, [28].

Tables 2 and 3 show the empirical powers in both the non-sparse and sparse cases. Recall that the sum test $S_N$ and the sum-based test $Q_N$ were originally proposed in [3] and [34], respectively; both are well studied in this paper. Tables 2 and 3 show that $S_N$ and $Q_N$ perform best in non-sparse cases in terms of empirical power, but very poorly in sparse cases. On the contrary, the proposed max test $L_N$ performs best in sparse cases, but very poorly in dense cases. Interestingly, Figure 1 shows that the empirical power of our proposed max-sum test $C_N$ is always very close to the optimal one among all the tests, regardless of whether the local alternative is sparse or not. This is a very appealing property of $C_N$, which compromises between the sparse and non-sparse residual regimes; in reality, it is hard to tell whether residuals are sparse or not.

Figure 1 shows how the powers of all the tests change as the degree of sparsity changes. We now explain the procedure generating the empirical power curves in Figure 1. The horizontal axis is $n$, the degree of sparsity defined below; the vertical axis represents power. Specifically, reviewing the general simulation design in Section 4.1, we take $T = 50$, $N = 200$, $p = 2$, and let $n$ vary upward from 2; the $w_{it}$ are generated from normal distributions; a subset $A \subset \{1, \cdots, N\}$ is randomly selected with cardinality $n$; $R = (\rho_{ij})_{1\le i,j\le N}$, where $\rho_{ij} = 1$ if $i = j$; for $i \neq j$, $\rho_{ij} = 0$ if $i \notin A$ or $j \notin A$, while $\rho_{ij}$ is drawn uniformly from an interval whose endpoints are proportional to $\sqrt{n^{-1}\log N/T}$ if $i \in A$ and $j \in A$.
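The correlation matrix $R$ used in this design can be generated as in the following sketch; the interval endpoints `lo`, `hi` and the ridge offset `delta` are placeholders of ours for the exact values in the paper's design:

```python
import numpy as np

def build_R(N, n, lo, hi, rng, delta=0.05):
    """Sketch of the design: a random subset A of size n receives
    off-diagonal correlations drawn uniformly from (lo, hi); the matrix is
    then shifted to R + lambda*I with lambda = |min eigenvalue| + delta
    to enforce positive definiteness, as described in Section 4.1."""
    A = rng.choice(N, size=n, replace=False)
    R = np.eye(N)
    for a in range(n):
        for b in range(a + 1, n):
            R[A[a], A[b]] = R[A[b], A[a]] = rng.uniform(lo, hi)
    lam = abs(np.linalg.eigvalsh(R)[0]) + delta   # eigvalsh is ascending
    return R + lam * np.eye(N)

rng = np.random.default_rng(3)
R = build_R(N=50, n=8, lo=0.2, hi=0.5, rng=rng)
assert np.all(np.linalg.eigvalsh(R) > 0)   # positive definite
assert np.allclose(R, R.T)
```

Varying `n` with interval endpoints shrinking like $\sqrt{n^{-1}\log N/T}$ traces out the sparsity axis of Figure 1.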
So a larger $n$ means a lower level of sparsity. Figure 1 indicates that the empirical power of the max-sum test $C_N$ is always very close to the maximum power over all tests for all $n$. By contrast, the empirical power curves of the remaining methods are monotone: the empirical powers of both $S_N$ and $Q_N$ generally increase as sparsity decreases, while the empirical power of the max test increases as sparsity increases. Every test except the max-sum test $C_N$ favors either the sparse case or the non-sparse case, not both simultaneously.

Table 1. Empirical sizes (%) of the tests ($Q_N$, $L_N$, $S_N$, $C_N$; three error distributions; $N = 50, 100, 200$).
Table 2. Empirical powers (%) of the tests in non-sparse cases.
Table 3. Empirical powers (%) of the tests in sparse cases.
Fig 1. Empirical power curves of the tests as $n$ varies. The number $n$ characterizes the degree of sparsity: the larger $n$ is, the lower the sparsity.

5. Application. In this section, we apply the five tests to the securities in the Standard & Poor's (S&P) 500 index of the large-cap U.S. equity market. As listed earlier, they are $S_N$, $L_N$, $C_N$, $Q_N$, $CD$ in (10), (11), (12), (16), (26), respectively. This demonstrates the practical usefulness of the proposed tests. The S&P 500 index is primarily intended as a leading indicator of U.S. equities.
The composition of this index is monitored by Standard and Poor's to ensure the widest possible overall market representation while keeping the index turnover to a minimum. In this section, we consider 374 securities that were included in the S&P 500 index throughout the whole period from January 2005 to November 2018.

In particular, the panel data on the risk-free rate of return and the market factors are obtained from Ken French's data library web page. The one-month US treasury bill rate is chosen as the risk-free rate ($r_{ft}$); the value-weighted return on all NYSE, AMEX, and NASDAQ stocks from CRSP is used as a proxy for the market return ($r_{mt}$); $SMB_t$ is the average return on the three small portfolios minus the average return on the three big portfolios; and $HML_t$ is the average return on two value portfolios minus the average return on two growth portfolios. SMB and HML are based on stocks listed on the NYSE, AMEX and NASDAQ. All data are measured in percent per month. From January 2005 to November 2018, a total of 163 consecutive monthly observations are obtained.

The Fama-French three-factor model [16] is given as follows:
\[
y_{it} = r_{it} - r_{ft} = \beta_{1i} + \beta_{2i}(r_{mt} - r_{ft}) + \beta_{3i}\,SMB_t + \beta_{4i}\,HML_t + \varepsilon_{it}
\]
for each $1 \le i \le N$ and $1 \le t \le T$ with $N = 374$. We are interested in the null hypothesis
\[
H_0: \varepsilon_1, \cdots, \varepsilon_N \text{ are independent}.
\]
That is, we test whether the error vectors of the 374 securities are independent. Now we evaluate the performance of the five tests of Section 4.2, that is, $S_N$, $L_N$, $C_N$, $Q_N$, $CD$ in (10), (11), (12), (16), (26), respectively. We randomly sample $T = 15, 25, 35$ observations from the 163 monthly returns. For each value of $T$, the experiment is repeated 1,000 times. The number of distinct subsamples is enormous: $\binom{163}{15}$ exceeds $10^{20}$, and $\binom{163}{25}$ and $\binom{163}{35}$ are larger still.
This says that, although there is dependence when sampling 15 observations from a total of 163 for 1,000 times, compared to the number of possible subsamples, 1,000 repeats remain reasonable. The same applies to the cases $T = 25$ and $T = 35$.

The results are summarized in Table 4. They suggest that all tests except the max test always reject the null hypothesis of cross-sectional independence, indicating definite cross-sectional dependence among stock returns under the Fama-French three-factor model. In particular, the max test rejects the null hypothesis when $T$ grows to 35, but never rejects it when $T$ reduces to 15. To understand this phenomenon, we point out the well-known fact that there may exist a large number of underlying dependencies between stocks in the same or related industries. This leads us to believe that this is indeed a non-sparse case, in which the sum and max-sum tests are more effective.

6. Concluding remarks. In this paper we study three tests: the sum test, the max test and the max-sum test, where the latter two are new. Two conjectures on the sum test have been settled. A new method is established to show asymptotic independence between the maximum and the sum of squares of a given set of random variables. We now make some comments.

1. Under the Gaussian assumption, we obtain the CLTs for $S_N$ in Theorems 1 and 2. However, the Gaussian assumption is not needed in the study of the maxima of sample correlations in Theorems 3, 4 and 5. One question is whether the Gaussian assumption can be removed from Theorems 1 and 2; our proofs rely on the framework of Lemma 3, where the normality assumption is essential. Another question concerns the restriction between $N$ and $T$ in Theorems 1 and 2: can the assumption "$T/\sqrt{N} \to \infty$" be relaxed? What is the behavior of $S_N$ in other regimes of the relationship between $N$ and $T$?

2.
2. The linear regression in (1) is one of many panel data models; see, for example, the book-length treatments in [1], [18], [33] and [39], among others. Some of the other models can be studied similarly for the properties we have pursued in this paper. We leave them as future work.

3. A new way is established to show the asymptotic independence between the sum and the maximum of a set of random variables. The details of the method are elaborated at the beginning of Section 7.3.2. We expect this method will also work for other sets of random variables with similar features.

4. For the sum $S_N = \sum_{1\le i<j\le N}\cdots$

Table 4
The rejection rates of testing cross-sectional independence for the S&P stock panel data, where $N = 374$ and $T = 15, 25, 35$. For each $T$, we sample 1000 data sets. (Columns: $Q_N$, $CD$, $L_N$, $S_N$, $C_N$.)

7. Proof. There are three subsections in this part. In each subsection we first accumulate some first-hand or second-hand understanding before the proofs of the main theorems are presented. Considering that many proofs are involved, we postpone some of them to the Appendix. They are interesting in their own right.

In this paper we use the following notation. For a sequence of random variables $\{U_N; N \ge 1\}$ and a sequence of constants $\{a_N; N \ge 1\}$, the notation $U_N = o_p(a_N)$ means that $U_N/a_N \to 0$ in probability as $N \to \infty$; we write $U_N = O_p(a_N)$ if $\{U_N/a_N; N \ge 1\}$ is stochastically bounded, that is, $\lim_{A\to\infty}\limsup_{N\to\infty}P(|U_N/a_N| \ge A) = 0$. In particular, if $U_N = O_p(a_N)$ then $U_N = o_p(a_Nb_N)$ for any sequence of numbers $\{b_N; N \ge 1\}$ with $\lim_{N\to\infty}b_N = \infty$. We write $a_N \sim b_N$ if $\lim_{N\to\infty}a_N/b_N = 1$ for any two sequences of numbers $\{a_N; N \ge 1\}$ and $\{b_N; N \ge 1\}$.

7.1. The proofs of Theorems 1 and 2. The proof of Theorem 1 is lengthy. The main tool is the Lindeberg-Feller central limit theorem for martingales. Naturally, many computations of conditional means, variances and higher moments of the sample correlation coefficients $\hat\rho_{ij}$ are needed. They are non-trivial.
To make the proof organized, we put the key steps in a few subsections. This may best facilitate the reader's understanding.

7.1.1. Prelude 1: technical lemmas towards the proofs of Theorems 1 and 2. The proofs of the results in this section will be presented in Section A.1.

LEMMA 1. Let $\xi$ be a random variable with $E\xi = a$. Let $\tau \ge 2$ be given. The following holds.
(i) If $a = 0$, then $E[|\xi^2 - E\xi^2|^\tau] \le 2^\tau\cdot E(|\xi|^{2\tau})$.
(ii) If $a \ne 0$, then
$$E[|\xi^2 - E\xi^2|^\tau] \le 2^\tau\cdot\Big[|a|^{-\tau}\mathrm{Var}(\xi)^\tau + \sqrt{E(|\xi - a|^{2\tau})}\Big]\cdot\Big[|a|^\tau + |a|^{-\tau}\mathrm{Var}(\xi)^\tau + \sqrt{E(|\xi - a|^{2\tau})}\Big].$$

The following is the Marcinkiewicz-Zygmund inequality; see, e.g., p. 386 and p. 387 of [11].

LEMMA 2. Let $m \ge 2$ and $\{\xi_i; 1 \le i \le m\}$ be independent random variables with $E\xi_i = 0$ for each $i$ and $\sup_{1\le i\le m}E(|\xi_i|^\tau) < \infty$ for some $\tau \ge 2$. Then there exists a constant $K_\tau > 0$ depending on $\tau$ only such that
$$E(|\xi_1 + \cdots + \xi_m|^\tau) \le K_\tau\cdot E\big[(\xi_1^2 + \cdots + \xi_m^2)^{\tau/2}\big] \quad (27)$$
$$\le K_\tau\cdot m^{(\tau/2)-1}\big(E|\xi_1|^\tau + \cdots + E|\xi_m|^\tau\big). \quad (28)$$

7.1.2. Prelude 2: mixing moments of random variables uniformly distributed on spheres. In this subsection we develop some identities and inequalities regarding moments of random vectors with the uniform distribution on high-dimensional unit spheres. The tools and methods are of independent interest. The proof of Lemma 3 is given in this section to show the main idea and starting point. The remaining proofs of the other lemmas will be presented in Section A.2.

Review the setting above (8) and the notation $P_i$ and $\epsilon_i = (\epsilon_{i1}, \cdots, \epsilon_{iT})' \in \mathbb{R}^T$ for each $i$. The notation $S^{m-1}$ represents the unit sphere in the $m$-dimensional Euclidean space.

LEMMA 3. Set $m = T - p$. Let $O_i$ be a $T \times T$ orthogonal matrix such that
$$P_i = O_i\begin{pmatrix}I_m & 0\\ 0 & 0\end{pmatrix}O_i', \quad 1 \le i \le N.$$
(29)
Write $O_i = (U_i, V_i)$ for each $i$, where $U_i$ is a $T \times m$ submatrix. Let $\{\epsilon_{ij}; 1 \le i \le N, 1 \le j \le T\}$ be independent random variables with $\epsilon_{ij} \sim N(0, \sigma_i^2)$, $\sigma_i > 0$, for all $i$ and $j$. Write $\epsilon_i = (\epsilon_{i1}, \cdots, \epsilon_{iT})' \in \mathbb{R}^T$ for each $i$. Let $s_1, \cdots, s_N$ be i.i.d. random vectors uniformly distributed on $S^{m-1}$. Then $\Big(\frac{P_1\epsilon_1}{\|P_1\epsilon_1\|}, \cdots, \frac{P_N\epsilon_N}{\|P_N\epsilon_N\|}\Big)$ and $(U_1s_1, \cdots, U_Ns_N)$ have the same distribution.

Proof of Lemma 3. By the scale-invariance of $\frac{P_i\epsilon_i}{\|P_i\epsilon_i\|}$, without loss of generality, assume $\sigma_1 = \cdots = \sigma_N = 1$. Evidently, $(U_i, 0)\epsilon_i = U_i\eta_i$ for each $i$, where $\eta_i = (\epsilon_{i1}, \cdots, \epsilon_{im})'$. By orthogonality and (29),
$$U_i'U_i = I_m \quad\text{and}\quad U_iU_i' = P_i. \quad (30)$$
By the orthogonal invariance of normal distributions and (29) again, $P_i\epsilon_i = (U_i, 0)O_i'\epsilon_i$ has the same distribution as $(U_i, 0)\epsilon_i = U_i\eta_i$. Then $\frac{P_i\epsilon_i}{\|P_i\epsilon_i\|}$, as a function of $P_i\epsilon_i$, has the same distribution as
$$\frac{U_i\eta_i}{\|U_i\eta_i\|} = \frac{U_i\eta_i}{(\eta_i'U_i'U_i\eta_i)^{1/2}} = \frac{U_i\eta_i}{\|\eta_i\|}$$
for each $i$, by the first identity of (30). The desired conclusion then follows from the independence among $\{\epsilon_1, \cdots, \epsilon_N\}$. □

LEMMA 4. Let the $U_i$'s be as in Lemma 3. The following holds.
(i) Set $M_{ij} = U_i'U_jU_j'U_i$ for any $1 \le i < j \le N$. Then both $M_{ij}$ and $I_m - M_{ij}$ are non-negative definite.
(ii) Let $M$ be an $m \times m$ non-negative definite matrix such that $I_m - M$ is non-negative definite. Then $I_m - M^2$ is also non-negative definite.

LEMMA 5. Let $M_i$, $i = 1, 2$, be non-negative definite matrices. Assume $M_2$ is idempotent, that is, $M_2^2 = M_2$. Then $\mathrm{tr}(M_1M_2) \le \mathrm{tr}(M_1)$.
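The objects in Lemma 3 and identity (30) are easy to verify numerically. The following sketch (an illustration in Python with `numpy`, not part of the paper) builds the projection $P_i$ from a random design matrix, extracts the $T \times m$ submatrix $U_i$ via an eigendecomposition, and checks $U_i'U_i = I_m$ and $U_iU_i' = P_i$:

```python
import numpy as np

rng = np.random.default_rng(1)
T, p = 20, 3
m = T - p

# Projection onto the orthogonal complement of the column space of x_i.
x = rng.normal(size=(T, p))
P = np.eye(T) - x @ np.linalg.solve(x.T @ x, x.T)

# An orthogonal O_i with P_i = O_i diag(I_m, 0) O_i', via eigendecomposition.
w, O = np.linalg.eigh(P)
O = O[:, np.argsort(w)[::-1]]   # put the m unit eigenvalues first
U = O[:, :m]                    # the T x m submatrix U_i of Lemma 3

assert np.allclose(U.T @ U, np.eye(m))   # U'U = I_m
assert np.allclose(U @ U.T, P)           # UU' = P_i, as in (30)
```
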
LEMMA 6. Let $M_1$ and $M_2$ be $n \times n$ non-negative definite matrices. Then $\mathrm{tr}(M_1M_2) \ge 0$ and $[\mathrm{tr}(M_1M_2)]^2 \le r\cdot\mathrm{tr}((M_1M_2)^2)$, where $r := \mathrm{rank}(M_1M_2) \le n$.

Recall the notation $(2m-1)!! = 1\cdot 3\cdots(2m-1)$ for any integer $m \ge 1$. By convention we set $(-1)!! = 1$.

LEMMA 7 [Lemma 2.4 from [23]]. Suppose $m \ge 2$ and $Z_1, \cdots, Z_m$ are i.i.d. $N(0,1)$-distributed random variables. Define $U_i = Z_i^2/(Z_1^2 + \cdots + Z_m^2)$ for $1 \le i \le m$. Let $a_1, \cdots, a_m$ be nonnegative integers. Set $a = a_1 + \cdots + a_m$. Then
$$E(U_1^{a_1}U_2^{a_2}\cdots U_m^{a_m}) = \frac{\prod_{i=1}^m(2a_i - 1)!!}{\prod_{i=1}^a(m + 2i - 2)}.$$

LEMMA 8. Let $m \ge 2$ and $\{Z_i; 1 \le i \le m\}$ be i.i.d. $N(0,1)$-distributed random variables. Set $d = (Z_1, \cdots, Z_m)'/(Z_1^2 + \cdots + Z_m^2)^{1/2}$. Let $M$ be a symmetric matrix. Then
(i) $E(d'Md) = \frac{1}{m}\,\mathrm{tr}(M)$;
(ii) $E[(d'Md)^2] = \frac{1}{m(m+2)}\cdot\{2\,\mathrm{tr}(M^2) + [\mathrm{tr}(M)]^2\}$;
(iii) $\mathrm{Var}(d'Md) = \frac{2}{m(m+2)}\cdot\mathrm{tr}(M^2) - \frac{2}{m^2(m+2)}\cdot[\mathrm{tr}(M)]^2$.

LEMMA 9. Let $\{Z_i; 1 \le i \le m\}$ be i.i.d. $N(0,1)$-distributed random variables. Set $d = (Z_1, \cdots, Z_m)'/(Z_1^2 + \cdots + Z_m^2)^{1/2}$. Let $M$ be an $m \times m$ symmetric matrix. Let $\tau \ge 2$ be given. Then
$$E\big[|d'Md - E(d'Md)|^\tau\big] \le \frac{C_\tau}{m^\tau}\cdot\Big\{\mathrm{tr}(M^2) - \frac{1}{m}[\mathrm{tr}(M)]^2\Big\}^{\tau/2}$$
for all $m \ge \tau + 1$, where $C_\tau > 0$ is a constant depending on $\tau$ only.

LEMMA 10. Let $\{Z_i; 1 \le i \le m\}$ be i.i.d. $N(0,1)$-distributed random variables. Set $d = (Z_1, \cdots, Z_m)'/(Z_1^2 + \cdots + Z_m^2)^{1/2}$. Let $a \in \mathbb{R}^m$ be a vector and $M$ be an $m \times m$ symmetric matrix. Let $\tau \ge 2$ be given. Then $E(|a'd|^\tau) \le C_\tau\|a\|^\tau/m^{\tau/2}$ and
$$E(d'Md)^\tau \le \frac{C_\tau}{m^\tau}\cdot\Big\{|\mathrm{tr}(M)|^\tau + \Big[\mathrm{tr}(M^2) - \frac{1}{m}[\mathrm{tr}(M)]^2\Big]^{\tau/2}\Big\}$$
for all $m \ge \tau + 1$, where $C_\tau > 0$ is a constant depending on $\tau$ only.

LEMMA 11. Let $\{h_1, h_2, h_3\}$ be i.i.d. $\mathbb{R}^m$-valued random vectors, where $h_1$ has the same distribution as $d$ in Lemma 9.
Let $A$, $B$ and $C$ be $m \times m$ matrices. Then
(i) $E[(h_1'Ah_1)(h_1'Bh_1)] = \frac{1}{m(m+2)}\big[2\,\mathrm{tr}(AB) + \mathrm{tr}(A)\cdot\mathrm{tr}(B)\big]$ if $A$ and $B$ are symmetric.
(ii) $\mathrm{Var}[(h_1'Ch_2)^2] \le \frac{K}{m^{7/2}}\cdot[\mathrm{tr}((CC')^2)]^{1/2}$, where $K > 0$ is a constant.
(iii) $\mathrm{Cov}[(h_1'Ah_2)^2, (h_1'Bh_2)^2] = \frac{2}{m^2(m+2)^2}\cdot\mathrm{tr}(AA'BB') - \frac{2}{m^3(m+2)^2}\cdot\mathrm{tr}(AA')\cdot\mathrm{tr}(BB')$.

A quick reminder is that, although we assume that $A$ and $B$ are symmetric in (i) above, we do not need $A$, $B$ or $C$ to be symmetric in (ii) and (iii).

LEMMA 12. Review $P_i$ in (8) and $\hat\rho_{ij}$ in (9). Recall $U_i$ and $s_i$ from Lemma 3 and $M_{ij} = U_i'U_jU_j'U_i$ from Lemma 4. The following statements hold for all $i \ne j$.
(i) $E\hat\rho_{ij} = 0$ and $E(\hat\rho_{ij}^2) = \frac{1}{m^2}\,\mathrm{tr}(P_iP_j)$.
(ii) $E[\hat\rho_{ij}|s_i] = 0$ and $E[\hat\rho_{ij}^2|s_i] = \frac{1}{m}\,s_i'M_{ij}s_i$.

In the following we will use the notation $\mathrm{Var}(\xi_1|\xi_2)$ for the conditional variance, which is defined by $E(\xi_1^2|\xi_2) - [E(\xi_1|\xi_2)]^2$ for any random variables $\xi_1$ and $\xi_2$.

LEMMA 13. Review $P_i$ in (8) and $\hat\rho_{ij}$ in (9). Recall $U_i$ and $s_i$ from Lemma 3 and $M_{ij} = U_i'U_jU_j'U_i$ from Lemma 4. The following statements are true for all $i \ne j$.
(i) $E\big[(\hat\rho_{ij})^4\,\big|\,s_i\big] = \frac{3}{m(m+2)}\cdot\big(s_i'M_{ij}s_i\big)^2$.
(ii) $E\big[(\hat\rho_{ij})^4\big] = \frac{3}{m^2(m+2)^2}\cdot\big\{2\,\mathrm{tr}[(P_iP_j)^2] + [\mathrm{tr}(P_iP_j)]^2\big\}$.
(iii) $\mathrm{Var}\big(\hat\rho_{ij}^2\,\big|\,s_i\big) = \frac{2(m-1)}{m^2(m+2)}\cdot\big(s_i'M_{ij}s_i\big)^2$.
(iv) $\mathrm{Var}\big(\hat\rho_{ij}^2\big) = \frac{6}{m^2(m+2)^2}\cdot\mathrm{tr}[(P_iP_j)^2] + \frac{2(m^2-2m-2)}{m^4(m+2)^2}\cdot[\mathrm{tr}(P_iP_j)]^2$.

Intermezzo 1: calculations of variances of sums related to sample correlation coefficients. In (1) and (6), we see the parameters $p, N, T$ and the variables $x_i$. In the rest of the paper, we will use or develop many inequalities in which a constant $C$ appears frequently.
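The closed-form moments in Lemma 8 — and hence the conditional moments of $\hat\rho_{ij}$ in Lemmas 12 and 13 built on them — can be spot-checked by simulation. The following sketch (Python/`numpy`, illustrative only) compares Monte Carlo moments of $d'Md$ with Lemma 8(i) and (ii):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 5
A = rng.normal(size=(m, m))
M = (A + A.T) / 2               # a symmetric test matrix

# d = Z/||Z|| with Z standard normal is uniform on the sphere S^{m-1}.
Z = rng.normal(size=(200_000, m))
d = Z / np.linalg.norm(Z, axis=1, keepdims=True)
q = np.einsum('ij,jk,ik->i', d, M, d)   # d'Md for each draw

exact_mean = np.trace(M) / m
exact_second = (2 * np.trace(M @ M) + np.trace(M) ** 2) / (m * (m + 2))
print(q.mean(), exact_mean)             # close to each other
print((q ** 2).mean(), exact_second)    # close to each other
```
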
The constant $C$ does not depend on $p, N, T$ or the $x_i$'s, and it can change from line to line. The proofs of the lemmas in this section will be given in Section A.3.

LEMMA 14. Review the notations $p, T, N$ and $P_i$ in (8). Let $p$ be fixed and $m = T - p \ge 1$. For any set $S \subset \{1, 2, \cdots, N\}$ with $q = |S| \in \{1, \cdots, N-1\}$, define $P_S = \sum_{k\in S}P_k$. Let $j \notin S$. Then there exists a constant $K > 0$ depending on $p$ but not on $N$, $T$ or the $P_i$'s such that the following statements hold uniformly for all $1 \le i < j \le N$ and $N \ge 2$.
(i) $\frac{1}{T}\cdot\big|[\mathrm{tr}(P_iP_j)]^2 - T^2\big| \le K$.
(ii) $\big|\mathrm{tr}((P_iP_j)^2) - T\big| \le K$.
(iii) $\frac{1}{Tq^2}\cdot\big|[\mathrm{tr}(P_SP_j)]^2 - T^2q^2\big| \le K$.
(iv) $\frac{1}{q^2}\cdot\big|\mathrm{tr}((P_SP_j)^2) - Tq^2\big| \le K$.
(v) Statements (i)-(iv) still hold if the symbol "$T$" is replaced by "$m$".

LEMMA 15. Recall $U_i$ from Lemma 3 and $M_{ij} = U_i'U_jU_j'U_i$ from Lemma 4. Let $e$ have the uniform distribution on $S^{m-1}$. Then there is a constant $C > 0$ free of $N, T$ and $p$ such that $\sup_{1\le i<j\le N}\cdots$

From Lemma 4, we know $I_m - M_{ij}$ is non-negative definite for each $i \ne j$. Since the sum of non-negative definite matrices is still non-negative definite, we see that $(N-i)I_m - M_{i\heartsuit}$ is also non-negative definite. By Lemma 4(ii), $(N-i)^2I_m - M_{i\heartsuit}^2$ is non-negative definite. In particular,
$$\mathrm{tr}(M_{i\heartsuit}^2) \le (N-i)^2m. \quad (33)$$
Moreover,
$$\mathrm{tr}(M_{i\heartsuit}) = \sum_{j=i+1}^N\mathrm{tr}(M_{ij}) = \sum_{j=i+1}^N\mathrm{tr}(P_iP_j) \quad (34)$$
by (31). Now we estimate $\mathrm{tr}(P_iP_j)$. Recall (8). Set $A_i = x_i(x_i'x_i)^{-1}x_i'$ for $1 \le i \le N$. Then $A_i$ is a $T \times T$ idempotent matrix with rank $p$ and $\mathrm{tr}(A_i) = p$ for each $i$. Since $P_i = I_T - A_i$, we see $P_iP_j = I_T + B_{ij}$, where $B_{ij} := A_iA_j - A_i - A_j$. By Lemma 6, $\mathrm{tr}(F_1F_2) \ge 0$ for non-negative definite $F_1$ and $F_2$. As a result, $\mathrm{tr}(A_iA_j) \ge 0$, and $\mathrm{tr}(A_iA_j) \le p$ by Lemma 5. Thus, $-2p \le \mathrm{tr}(B_{ij}) \le -p$. Therefore, we have $\mathrm{tr}(P_iP_j) \ge T - 2p$.
Hence, $\mathrm{tr}(M_{i\heartsuit}) \ge (N-i)(T-2p)$ by (34). This and (33) tell us that
$$\mathrm{tr}(M_{i\heartsuit}^2) - \frac{1}{m}\big[\mathrm{tr}(M_{i\heartsuit})\big]^2 \le (N-i)^2m - \frac{1}{m}(N-i)^2(T-2p)^2 = (N-i)^2\cdot\frac{m^2 - (m-p)^2}{m} \le 2(N-i)^2p$$
by recalling the notation $m = T - p$. Plugging this into (32) we get
$$\mathrm{Var}\Big(\sum_{j=2}^NB_j\Big) \le \frac{2T^2}{m^3(m+2)}\sum_{i=1}^{N-1}2(N-i)^2p \le \frac{(4p)T^2N^3}{m^4}.$$
By the Chebyshev inequality, for any $\tau > 0$,
$$P\Big(\frac{1}{N}\Big|\Big(\sum_{j=2}^NB_j\Big) - \mu_N\Big| \ge \tau\Big) \le \frac{1}{\tau^2N^2}\cdot\mathrm{Var}\Big(\sum_{j=2}^NB_j\Big) \le \frac{4p}{\tau^2}\cdot\frac{T^2N}{(T-p)^4},$$
which goes to zero provided $N = o(T^2)$. □

Let $\{s_1, \cdots, s_j\}$ for $1 \le j \le N$ be defined as in Lemma 3; they are i.i.d. random vectors uniformly distributed on $S^{m-1}$. Set $\mathcal{F}_0 = \{\emptyset, \Omega\}$ and
$$\mathcal{F}_j = \sigma(s_1, \cdots, s_j), \quad (35)$$
the $\sigma$-algebra generated by $\{s_1, \cdots, s_j\}$, for $1 \le j \le N$. Here $\Omega$ is the sample space on which the random variables $\{\epsilon_{ij}\}$ are defined.

LEMMA 21. Let $X_j$ be defined as in Lemma 16 and $\mathcal{F}_j$ be as in (35). Assume $N = o(T^2)$. Define
$$Z_N = \frac{1}{N^2}\sum_{j=2}^NE[X_j^2|\mathcal{F}_{j-1}].$$
Then $\mathrm{Var}(Z_N) \to 0$ as $N \to \infty$.

Proof of Lemma 21. Set $H_{ij} = U_i'U_js_js_j'U_j'U_i$ for $1 \le i, j \le N$, where the $U_i$'s and $s_j$'s are defined as in Lemma 3. Then $s_i'H_{ij}s_i = s_j'C_{ij}s_j$, where $C_{ij} := U_j'U_is_is_i'U_i'U_j$. Since $s_i'U_i'U_js_j = (s_i'U_i'U_js_j)' = s_j'U_j'U_is_i \in \mathbb{R}$, we have
$$\hat\rho_{ij}^2 = s_j'(U_j'U_is_is_i'U_i'U_j)s_j = s_j'C_{ij}s_j. \quad (36)$$
By Lemma 8(i) and the independence between $s_i$ and $s_j$, we have
$$E(\hat\rho_{ij}^2|s_i) = \frac{1}{m}\mathrm{tr}(C_{ij}) = \frac{1}{m}s_i'U_i'U_jU_j'U_is_i = \frac{1}{m}s_i'M_{ij}s_i$$
for $i < j$, where $M_{ij} = U_i'U_jU_j'U_i$.
Then
$$\frac{1}{T}X_j = \sum_{i=1}^{j-1}\big[\hat\rho_{ij}^2 - E(\hat\rho_{ij}^2|s_i)\big] = \sum_{i=1}^{j-1}\Big[s_j'C_{ij}s_j - \frac{1}{m}s_i'M_{ij}s_i\Big] = s_j'D_js_j - W_j \quad (37)$$
for $2 \le j \le N$, where
$$D_j := \sum_{i=1}^{j-1}C_{ij} \quad\text{and}\quad W_j := \frac{1}{m}\sum_{i=1}^{j-1}s_i'M_{ij}s_i. \quad (38)$$
In view of the independence among the $s_i$'s, it is easy to check from Lemma 8 that
$$E\big(s_j'D_js_j\,\big|\,\mathcal{F}_{j-1}\big) = \frac{1}{m}\mathrm{tr}(D_j) = \frac{1}{m}\sum_{i=1}^{j-1}\mathrm{tr}(C_{ij}).$$
Since $\mathrm{tr}(C_{ij}) = \mathrm{tr}(U_j'U_is_is_i'U_i'U_j) = s_i'M_{ij}s_i$, we have $W_j = E(s_j'D_js_j|\mathcal{F}_{j-1})$. This, (37) and Lemma 8 imply
$$\frac{1}{T^2}E[X_j^2|\mathcal{F}_{j-1}] = \mathrm{Var}\big(s_j'D_js_j\,\big|\,\mathcal{F}_{j-1}\big) = \frac{2}{m(m+2)}\cdot\mathrm{tr}(D_j^2) - \frac{2}{m^2(m+2)}\cdot\big[\mathrm{tr}(D_j)\big]^2. \quad (39)$$
From (38),
$$\mathrm{tr}(D_j) = \sum_{i=1}^{j-1}\mathrm{tr}(C_{ij}) = \sum_{i=1}^{j-1}s_i'M_{ij}s_i; \qquad \mathrm{tr}(D_j^2) = \mathrm{tr}\Big[\Big(\sum_{i=1}^{j-1}C_{ij}\Big)^2\Big] = \mathrm{tr}\Big[\Big(\sum_{i=1}^{j-1}U_j'U_is_is_i'U_i'U_j\Big)^2\Big]$$
by the definition of $C_{ij}$. Thus, we conclude from (39) that
$$\frac{1}{T^2}E[X_j^2|\mathcal{F}_{j-1}] = \frac{2}{m(m+2)}\cdot\mathrm{tr}\Big[\Big(\sum_{i=1}^{j-1}U_j'U_is_is_i'U_i'U_j\Big)^2\Big] - \frac{2}{m^2(m+2)}\cdot\Big(\sum_{i=1}^{j-1}s_i'M_{ij}s_i\Big)^2.$$
It follows that
$$\frac{N^2}{T^2}Z_N = \frac{1}{T^2}\sum_{j=2}^NE[X_j^2|\mathcal{F}_{j-1}] = \frac{2}{m(m+2)}\cdot\sum_{j=2}^N\mathrm{tr}\Big[\Big(\sum_{i=1}^{j-1}U_j'U_is_is_i'U_i'U_j\Big)^2\Big] - \frac{2}{m^2(m+2)}\cdot\sum_{j=2}^N\Big(\sum_{i=1}^{j-1}s_i'M_{ij}s_i\Big)^2.$$
Review $T = m + p$. Since $\mathrm{Var}(\xi_1 + \xi_2) \le 2\mathrm{Var}(\xi_1) + 2\mathrm{Var}(\xi_2)$ for any random variables $\xi_1$ and $\xi_2$, to show $\mathrm{Var}(Z_N) \to 0$ it is enough to prove the following two facts.
$$\mathrm{Var}\Big\{\sum_{j=2}^N\mathrm{tr}\Big[\Big(\sum_{i=1}^{j-1}U_j'U_is_is_i'U_i'U_j\Big)^2\Big]\Big\} = o(N^4); \quad (40)$$
$$\mathrm{Var}\Big[\sum_{j=2}^N\Big(\sum_{i=1}^{j-1}s_i'M_{ij}s_i\Big)^2\Big] = o(N^4T^2). \quad (41)$$
Under the restriction $N = o(T^2)$, assertion (40) is confirmed in Lemma 19 and (41) is proved in Lemma 18. The proof is completed. □

LEMMA 22. Let $X_j$ be defined as in Lemma 16 and $\mathcal{F}_j$ be as in (35). Assume $N = o(T^2)$. Then
$$\frac{1}{N^4}\sum_{j=2}^NE(X_j^4|\mathcal{F}_{j-1}) \to 0$$
in probability as $N \to \infty$.

Proof of Lemma 22. It suffices to show
$$\frac{1}{N^4}\sum_{j=2}^NE(X_j^4) \to 0 \quad\text{as } N \to \infty. \quad (42)$$
By (37) and (38),
$$\frac{1}{T}X_j = s_j'D_js_j - \frac{1}{m}\sum_{i=1}^{j-1}s_i'M_{ij}s_i \quad (43)$$
for $2 \le j \le N$, where $M_{ij} = U_i'U_jU_j'U_i$ and
$$D_j = \sum_{i=1}^{j-1}C_{ij} \quad\text{and}\quad C_{ij} = U_j'U_is_is_i'U_i'U_j.$$
Notice
$$s_j'D_js_j = \sum_{i=1}^{j-1}s_j'U_j'U_is_is_i'U_i'U_js_j = \sum_{i=1}^{j-1}s_i'H_{ij}s_i,$$
where $H_{ij} := U_i'U_js_js_j'U_j'U_i$ for $1 \le i < j \le N$. By Lemma 8,
$$\mu_{ij} := E(s_i'H_{ij}s_i|s_j) = \frac{1}{m}\mathrm{tr}(H_{ij}) = \frac{1}{m}s_j'M_{ji}s_j; \qquad \nu_{ij} := E(s_j'M_{ji}s_j) = \frac{1}{m}\mathrm{tr}(M_{ji}) = \frac{1}{m}\mathrm{tr}(P_iP_j) \quad (44)$$
for any $1 \le i < j \le N$. We rewrite (43) to have
$$\frac{1}{T}X_j = \sum_{i=1}^{j-1}(s_i'H_{ij}s_i - \mu_{ij}) - \frac{1}{m}\sum_{i=1}^{j-1}(s_i'M_{ij}s_i - \nu_{ij}) + \sum_{i=1}^{j-1}\Big(\mu_{ij} - \frac{1}{m}\nu_{ij}\Big) := A_j + B_j + C_j.$$
Therefore,
$$\frac{1}{T^4}E(X_j^4) \le 27\cdot\big[E(|A_j|^4) + E(|B_j|^4) + E(|C_j|^4)\big]. \quad (45)$$
Note that, conditionally on $s_j$, $A_j$ is the sum of independent random variables.
By (28) with $\tau = 4$,
$$E(|A_j|^4|s_j) \le C\cdot(j-1)\cdot\sum_{i=1}^{j-1}E\big[(s_i'H_{ij}s_i - \mu_{ij})^4\,\big|\,s_j\big] \le C\cdot\frac{j}{m^4}\cdot\sum_{i=1}^{j-1}\Big\{\mathrm{tr}(H_{ij}^2) - \frac{1}{m}[\mathrm{tr}(H_{ij})]^2\Big\}^2 \quad (46)$$
$$\le C\cdot\frac{j}{m^4}\cdot\sum_{i=1}^{j-1}\big[\mathrm{tr}(H_{ij}^2)\big]^2,$$
where the second inequality follows from Lemma 9, and where the fact that $\mathrm{tr}(H_{ij}^2) \ge \frac{1}{m}[\mathrm{tr}(H_{ij})]^2$ from Lemma 6 is used in the last step. Easily, $\mathrm{tr}(H_{ij}^2) = (s_j'M_{ji}s_j)^2$. Take another expectation to see
$$E(|A_j|^4) \le C\cdot\frac{j}{m^4}\cdot\sum_{i=1}^{j-1}E\big[(s_j'M_{ji}s_j)^4\big]. \quad (47)$$
By Lemma 10 with $\tau = 4$ and the fact that $\mathrm{tr}(M^2) \ge \frac{1}{m}[\mathrm{tr}(M)]^2$ for any $m \times m$ symmetric matrix $M$ from Lemma 6, we obtain
$$E\big[(s_j'M_{ji}s_j)^4\big] \le \frac{C}{m^4}\cdot\Big\{\mathrm{tr}(M_{ji}^2) - \frac{1}{m}\big[\mathrm{tr}(M_{ji})\big]^2\Big\}^2 + \frac{C}{m^4}\cdot\big[\mathrm{tr}(M_{ji})\big]^4 \le \frac{C}{m^4}\cdot\big[\mathrm{tr}(M_{ji}^2)\big]^2 + \frac{C}{m^4}\cdot\big[\mathrm{tr}(M_{ji})\big]^4.$$
It was used before that $\mathrm{tr}(M_{ji}) = \mathrm{tr}(P_iP_j)$ and $\mathrm{tr}(M_{ji}^2) = \mathrm{tr}[(P_iP_j)^2]$. By Lemma 5, both quantities are bounded by $m$. Hence, $E[(s_j'M_{ji}s_j)^4] \le C$ uniformly for all $1 \le i < j \le N$. We conclude from (47) that
$$E(|A_j|^4) \le C\cdot\frac{j^2}{m^4} \quad (48)$$
uniformly for all $2 \le j \le N$.

Now we estimate $B_j$. Replace "$H_{ij}$" in (46) with "$M_{ij}$" to see
$$E(|B_j|^4) \le \frac{C}{m^4}\cdot\frac{j}{m^4}\cdot\sum_{i=1}^{j-1}\Big\{\mathrm{tr}(M_{ij}^2) - \frac{1}{m}[\mathrm{tr}(M_{ij})]^2\Big\}^2 = C\cdot\frac{j}{m^8}\cdot\sum_{i=1}^{j-1}\Big[\mathrm{tr}((P_jP_i)^2) - \frac{1}{m}(\mathrm{tr}(P_jP_i))^2\Big]^2 \le C\cdot\frac{j^2}{m^8}, \quad (49)$$
where the last step holds by (i), (ii) and (v) of Lemma 14.

Finally, by (44),
$$C_j = \frac{1}{m}\sum_{i=1}^{j-1}\Big[s_j'M_{ji}s_j - \frac{1}{m}\mathrm{tr}(P_iP_j)\Big] = \frac{1}{m}\big[s_j'M_{j\triangledown}s_j - E(s_j'M_{j\triangledown}s_j)\big],$$
where $M_{j\triangledown} := \sum_{i=1}^{j-1}M_{ji}$. Since $\mathrm{tr}(M_{ji}) = \mathrm{tr}(P_iP_j)$, by defining $P_{j\triangledown} := \sum_{i=1}^{j-1}P_i$, we have $\mathrm{tr}(M_{j\triangledown}) = \mathrm{tr}(P_{j\triangledown}P_j)$. Recall from (30) that $U_iU_i' = P_i$.
Easily,
$$\mathrm{tr}(M_{ji}M_{jk}) = \mathrm{tr}(P_iP_jP_kP_j)$$
for any $1 \le i, j, k \le N$. It follows that
$$\mathrm{tr}(M_{j\triangledown}^2) = \mathrm{tr}\Big[\Big(\sum_{i=1}^{j-1}M_{ji}\Big)^2\Big] = \sum_{1\le i,k\le j-1}\mathrm{tr}(P_iP_jP_kP_j) = \mathrm{tr}\big((P_{j\triangledown}P_j)^2\big).$$
On the other hand, recall $m = T - p$. By Lemma 14(v), there exists a constant $K$ not depending on $T$ or $N$ such that
$$\frac{1}{mj^2}\cdot\big|[\mathrm{tr}(P_{j\triangledown}P_j)]^2 - m^2(j-1)^2\big| \le K, \qquad \frac{1}{j^2}\cdot\big|\mathrm{tr}((P_{j\triangledown}P_j)^2) - m(j-1)^2\big| \le K$$
for every $2 \le j \le N$. It follows from the triangle inequality that
$$\Big|\mathrm{tr}\big((P_{j\triangledown}P_j)^2\big) - \frac{1}{m}\big[\mathrm{tr}(P_{j\triangledown}P_j)\big]^2\Big| \le 2Kj^2$$
for $2 \le j \le N$. Consequently, by taking $\tau = 4$ in Lemma 9 we have that
$$E(|C_j|^4) \le \frac{1}{m^4}\cdot\frac{C}{m^4}\cdot\Big\{\mathrm{tr}[M_{j\triangledown}^2] - \frac{1}{m}(\mathrm{tr}(M_{j\triangledown}))^2\Big\}^2 = \frac{C}{m^8}\cdot\Big\{\mathrm{tr}\big[(P_{j\triangledown}P_j)^2\big] - \frac{1}{m}[\mathrm{tr}(P_{j\triangledown}P_j)]^2\Big\}^2 \le C\cdot\frac{j^4}{m^8}$$
uniformly for all $2 \le j \le N$. Combining this with (45), (48) and (49), we arrive at
$$E(X_j^4) \le CT^4\Big(\frac{j^2}{m^4} + \frac{j^4}{m^8}\Big) \le C\cdot\Big(j^2 + \frac{j^4}{m^4}\Big)$$
uniformly for all $2 \le j \le N$ as $N$ is large (reviewing $m = T - p$ and $T = T_N \to \infty$). As a result,
$$\frac{1}{N^4}\sum_{j=2}^NE(X_j^4) = O\Big(\frac{1}{N} + \frac{N}{m^4}\Big) \to 0, \quad N \to \infty,$$
as long as $N = o(T^4)$; in particular, this holds under $N = o(T^2)$. We obtain (42). □

LEMMA 23. Let $X_j$ be defined as in Lemma 16. Assume $N = o(T^2)$ as $N \to \infty$. Then $\frac{1}{N}\sum_{j=2}^NX_j \to N(0,1)$ in distribution as $N \to \infty$.

The proof of Lemma 23. Reviewing Lemma 16, we know
$$X_j = \sum_{i=1}^{j-1}\big[T\hat\rho_{ij}^2 - E(T\hat\rho_{ij}^2|s_i)\big] = \sum_{i=1}^{j-1}T\hat\rho_{ij}^2 - \frac{T}{m}\sum_{i=1}^{j-1}s_i'M_{ij}s_i$$
for $2 \le j \le N$. Let $\mathcal{F}_j$ be as in (35). Next we will verify that, for each $N \ge 2$, $\{X_j; 2 \le j \le N\}$ forms a sequence of martingale differences with respect to the $\sigma$-algebras $\{\mathcal{F}_j; 1 \le j \le N\}$.
Define $J_1 = 0$ and
$$J_j = \sum_{i=1}^{j-1}T\hat\rho_{ij}^2$$
for $2 \le j \le N$. By Lemma 3, $\hat\rho_{ij}$ depends on $s_i$ and $s_j$ only. From the independence of $\{s_1, \cdots, s_N\}$ and Lemma 12,
$$E(J_j|\mathcal{F}_{j-1}) = \sum_{i=1}^{j-1}E(T\hat\rho_{ij}^2|s_i) = \frac{T}{m}\sum_{i=1}^{j-1}s_i'M_{ij}s_i$$
for $2 \le j \le N$, where $M_{ij} = U_i'U_jU_j'U_i$. Therefore,
$$X_j = J_j - E(J_j|\mathcal{F}_{j-1}), \quad 2 \le j \le N, \quad (50)$$
forms a martingale difference sequence with respect to the $\sigma$-algebras $\{\mathcal{F}_j; 2 \le j \le N\}$. Now, in order to prove
$$\frac{1}{N}\sum_{j=2}^NX_j \to N(0,1)$$
in distribution as $N \to \infty$, we will employ the Lindeberg-Feller central limit theorem for martingales (see, for example, p. 476 of [2] or p. 344 of [15]). To achieve this, it is enough to verify that
$$Z_N := \frac{1}{N^2}\sum_{j=2}^NE[X_j^2|\mathcal{F}_{j-1}] \to 1 \quad (51)$$
and
$$\frac{1}{N^4}\sum_{j=2}^NE(X_j^4|\mathcal{F}_{j-1}) \to 0 \quad (52)$$
in probability as $N \to \infty$. Lemma 22 has shown (52). Now, to prove (51), it suffices to show
$$E(Z_N) \to 1 \quad (53) \qquad\text{and}\qquad \mathrm{Var}(Z_N) \to 0 \quad (54)$$
as $N \to \infty$. Lemma 17 proves (53) under the assumption $N = o(T^2)$. Assertion (54) is confirmed in Lemma 21 by assuming $N = o(T^2)$. Inspecting all the restrictions between $N$ and $T$ in the lemmas used earlier, the condition $N = o(T^2)$ meets all the requirements. The proof is then completed. □

Finale: proofs of Theorems 1 and 2. With the preparations in Sections 7.1.1-7.1.4, we are now ready to prove the central limit theorem stated in Theorem 1. The main idea is to write the sum of squares of sample correlation coefficients as sums of martingale differences. Then the Lindeberg-Feller martingale CLT is applied.

Proof of Theorem 1. Review $J_1 = 0$ and
$$J_j = \sum_{i=1}^{j-1}T\hat\rho_{ij}^2$$
for $2 \le j \le N$. Then $S_N = \sum_{j=2}^NJ_j$. Review $\mathcal{F}_0$ and $\mathcal{F}_j$ in (35). By Lemma 12, the conditional expectation satisfies
$$B_j := E(J_j|\mathcal{F}_{j-1}) = \frac{T}{m}\sum_{i=1}^{j-1}s_i'M_{ij}s_i$$
for $2 \le j \le N$, where $M_{ij} = U_i'U_jU_j'U_i$.
As in (50),
$$X_j = J_j - E(J_j|\mathcal{F}_{j-1}), \quad 2 \le j \le N,$$
forms a martingale difference sequence with respect to the $\sigma$-algebras $\{\mathcal{F}_j; 2 \le j \le N\}$. Therefore $\frac{1}{N}(S_N - \mu_N)$ can be further written as
$$\frac{1}{N}(S_N - \mu_N) = \frac{1}{N}\Big(\sum_{j=2}^NX_j\Big) + \frac{1}{N}\Big[\Big(\sum_{j=2}^NB_j\Big) - \mu_N\Big].$$
From Lemma 20,
$$\frac{1}{N}\Big[\Big(\sum_{j=2}^NB_j\Big) - \mu_N\Big] \to 0$$
in probability as $N \to \infty$. By Lemma 23,
$$\frac{1}{N}\sum_{j=2}^NX_j \to N(0,1)$$
in distribution as $N \to \infty$. The proof then follows from the Slutsky lemma. □

Proof of Theorem 2. Set $m = T - p$. First,
$$\sqrt{\frac{N-1}{N}}\cdot Q_N = \frac{\sqrt{2}}{N}\cdot\sum_{i=1}^{N-1}\sum_{j=i+1}^N\frac{m\hat\rho_{ij}^2 - \mu_{Nij}}{v_{Nij}}. \quad (55)$$
It is easy to see
$$a_N = \frac{3}{T}\Big[1 + O\Big(\frac{1}{T}\Big)\Big]$$
as $N \to \infty$. It follows that
$$a_N - \frac{1}{m} = \frac{3}{T}\Big[1 + O\Big(\frac{1}{T}\Big)\Big] - \frac{1}{T}\Big[1 + O\Big(\frac{1}{T}\Big)\Big] = \frac{2}{T}\Big[1 + O\Big(\frac{1}{T}\Big)\Big].$$
By Lemma 14(i) and (ii), there exists a constant $K > 0$ depending on $p$ but not on $N$, $T$ or the $P_i$'s such that
$$\frac{1}{T}\cdot\big|[\mathrm{tr}(P_iP_j)]^2 - T^2\big| \le K \quad\text{and}\quad \big|\mathrm{tr}[(P_iP_j)^2] - T\big| \le K$$
uniformly for all $1 \le i < j \le N$ and $N \ge 2$. Therefore, by the definition of $v_{Nij}$, we have
$$v_{Nij}^2 = \frac{2}{T^2}\Big[1 + O\Big(\frac{1}{T}\Big)\Big]\cdot\big[T^2 + O(T)\big] + \frac{6}{T^3}\Big[1 + O\Big(\frac{1}{T}\Big)\Big]\cdot\big[T + O(1)\big] = 2 + O\Big(\frac{1}{T}\Big)$$
uniformly for all $1 \le i < j \le N$ as $N \to \infty$. Immediately,
$$\frac{1}{v_{Nij}} = \frac{1}{\sqrt{2}} + O\Big(\frac{1}{T}\Big) \quad (56)$$
uniformly for all $1 \le i < j \le N$ as $N \to \infty$. Now write $v_{Nij} = \sqrt{2}(1 + \omega_{Nij})$. Then $\sup_{1\le i<j\le N}|\omega_{Nij}| = O(1/T)$. Since the leading term converges to $N(0,1)$ in distribution, by the Slutsky lemma again, it is enough to show
$$\Delta_N := \frac{m}{N}\sum_{i=1}^{N-1}\sum_{j=i+1}^N\omega_{Nij}\big(\hat\rho_{ij}^2 - E\hat\rho_{ij}^2\big) \to 0$$
in probability as $N \to \infty$.
Since $\hat\rho_{ij}$ and $\hat\rho_{kl}$ are independent if $\{i,j\}\cap\{k,l\} = \emptyset$,
$$\mathrm{Var}(\Delta_N) = \Big(\frac{m}{N}\Big)^2\sum_{1\le i<j\le N}\ \sum_{(k,l)}\omega_{Nij}\,\omega_{Nkl}\,\mathrm{Cov}\big(\hat\rho_{ij}^2, \hat\rho_{kl}^2\big), \quad (57)$$
where the last sum runs over all $(k,l)$ with $1 \le k < l \le N$ and $\{i,j\}\cap\{k,l\} \ne \emptyset$. The total number of such $(k,l)$'s is no more than $2N + 2N = 4N$. Since $|\mathrm{Cov}(U,V)| \le [\mathrm{Var}(U)]^{1/2}\cdot[\mathrm{Var}(V)]^{1/2}$ for any random variables $U$ and $V$, we have from (57) that
$$\mathrm{Var}(\Delta_N) \le \Big(\frac{m}{N}\Big)^2\cdot\frac{C}{T^2}\cdot\frac{N(N-1)}{2}\cdot(4N)\cdot\max_{1\le i<j\le N}\mathrm{Var}\big(\hat\rho_{ij}^2\big) \le \frac{CN}{T^2} \to 0$$
provided $N = o(T^2)$, where the bound $\max_{1\le i<j\le N}\mathrm{Var}(\hat\rho_{ij}^2) \le C/T^2$ follows from Lemma 13(iv). The proof is completed. □

7.2. The proofs of Theorems 3, 4 and 5. Theorems 3-5 will be proved via approximating $L_N = \max_{1\le i<j\le N}|\hat\rho_{ij}|$ by the maxima of simpler versions of sample correlation coefficients. The results stated in this section will be proved in Section A.4.

LEMMA 24. Let $m \ge 2$ and $\{\xi_i; 1 \le i \le m\}$ be i.i.d. random variables with $E\xi_1 = 0$, $E\xi_1^2 = 1$ and $E(|\xi_1|^\tau) < \infty$ for some $\tau \ge 2$. Let $\{a_i; 1 \le i \le m\}$ be constants such that $a_1^2 + \cdots + a_m^2 = 1$. Then, there exists a constant $K > 0$ satisfying
$$P(|a_1\xi_1 + \cdots + a_m\xi_m| \ge x) \le \frac{K}{x^\tau}$$
for all $x > 0$. It is easy to see that the bound in the lemma is tight by simply taking $a_1 = 1$ and $a_2 = \cdots = a_m = 0$.

LEMMA 25. Let $m \ge 2$ and $\{\xi_i; 1 \le i \le m\}$ be i.i.d. random variables with $E\xi_1 = 0$, $E\xi_1^2 = 1$ and $Ee^{\omega|\xi_1|} < \infty$ for some $\omega > 0$. Let $\{a_i; 1 \le i \le m\}$ be constants satisfying $a_1^2 + \cdots + a_m^2 = 1$. Then, there exists $K > 0$ such that
$$P(|a_1\xi_1 + \cdots + a_m\xi_m| \ge x) \le K\cdot e^{-x/K}$$
for all $x \ge 0$. The above inequality is tight, which can be seen by taking $a_1 = 1$ and $a_i = 0$ for $2 \le i \le m$.

Recall the definition of subgaussian random variables given before the statement of Theorem 5.

LEMMA 26. Let $m \ge 2$ and $\{\xi_i; 1 \le i \le m\}$ be i.i.d. subgaussian random variables. Let $\{a_i; 1 \le i \le m\}$ be constants such that $a_1^2 + \cdots + a_m^2 = 1$. Then, there exists a positive constant $K$ not depending on $m$ or $\{a_i; 1 \le i \le m\}$ such that
$$P(|a_1\xi_1 + \cdots + a_m\xi_m| \ge x) \le 2\cdot e^{-Kx^2}$$
for all $x > 0$. The upper bound in the lemma is optimal, which can be seen by choosing $a_1 = 1$ and $a_2 = \cdots = a_m = 0$.
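Lemma 26 can be illustrated numerically. In the sketch below (Python/`numpy`, illustrative only), the $\xi_i$ are standard normal — a subgaussian case in which the weighted sum with unit-norm weights is again exactly $N(0,1)$, so the tail bound holds with $K = 1/2$:

```python
import numpy as np

rng = np.random.default_rng(6)
m, reps = 50, 200_000
a = rng.normal(size=m)
a /= np.linalg.norm(a)           # weights with a_1^2 + ... + a_m^2 = 1

xi = rng.normal(size=(reps, m))  # standard normal xi_i are subgaussian
s = xi @ a                       # weighted sums a_1 xi_1 + ... + a_m xi_m

# Empirical tails vs the subgaussian bound 2 exp(-x^2 / 2).
for x in (1.0, 2.0, 3.0):
    emp = (np.abs(s) >= x).mean()
    print(x, emp, 2 * np.exp(-x ** 2 / 2))
```
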
7.2.2. Intermezzo: approximation of sample correlation coefficients by simple versions. Recall the setting in (6), (8) and (9). Let $p$ be fixed. Let
$$e_i = \frac{\epsilon_i}{\|\epsilon_i\|} \quad\text{and}\quad \tilde\rho_{ij} = e_i'e_j$$
for $1 \le i, j \le N$. In this section, we always assume that $\{\epsilon_{ij}; i \ge 1, j \ge 1\}$ are i.i.d. continuous random variables. The "continuous" requirement guarantees that $\hat\rho_{ij}$ in (9) is well-defined. See the comment below (14).

LEMMA 27. Assume $\{\epsilon_{ij}; i \ge 1, j \ge 1\}$ are i.i.d. continuous random variables. Let $\hat\rho_{ij}$ be defined as in (9). Set $A_i = x_i(x_i'x_i)^{-1}x_i'$ for $1 \le i \le N$. Then
$$\max_{1\le i<j\le N}|\hat\rho_{ij} - \tilde\rho_{ij}| \le \cdots$$

Proof. Each $A_i$ is an orthogonal projection matrix, so both $A_i$ and $I_T - A_i$ are non-negative definite; hence $0 \le e_i'A_ie_i \le 1$ for each $i$. Recalling $P_i = I_T - A_i$, we have
$$\hat\rho_{ij} = \frac{\epsilon_i'\epsilon_j - \epsilon_i'A_i\epsilon_j - \epsilon_i'A_j\epsilon_j + \epsilon_i'A_iA_j\epsilon_j}{\sqrt{\|\epsilon_i\|^2 - \epsilon_i'A_i\epsilon_i}\cdot\sqrt{\|\epsilon_j\|^2 - \epsilon_j'A_j\epsilon_j}}.$$
Dividing the numerator and denominator by $\|\epsilon_i\|\cdot\|\epsilon_j\|$, we have
$$\hat\rho_{ij} = \big[\tilde\rho_{ij} - e_i'A_ie_j - e_i'A_je_j + e_i'A_iA_je_j\big]\cdot\big(1 - e_i'A_ie_i\big)^{-1/2}\big(1 - e_j'A_je_j\big)^{-1/2}. \quad (61)$$
Write
$$1 - \sqrt{1-x} = \frac{(1 - \sqrt{1-x})(1 + \sqrt{1-x})}{1 + \sqrt{1-x}} = \frac{x}{1 + \sqrt{1-x}}.$$
It is easy to see $|1 - (1-x)^{-1/2}| \le 2|x|$ if $|x| \le \frac{1}{2}$. Then $(1-x)^{-1/2} \le 1 + 2|x|$ as $|x| \le \frac{1}{2}$. For brevity of notation, set
$$h_{ij} = \big(1 - e_i'A_ie_i\big)^{-1/2}\big(1 - e_j'A_je_j\big)^{-1/2}.$$
Then
$$0 \le h_{ij} - 1 \le (1 + 2e_i'A_ie_i)(1 + 2e_j'A_je_j) - 1 = 2e_i'A_ie_i + 2e_j'A_je_j + 4(e_i'A_ie_i)\cdot(e_j'A_je_j) \le 4\big(e_i'A_ie_i + e_j'A_je_j\big)$$
provided $\max_{1\le i\le N}e_i'A_ie_i \le \frac{1}{2}$, and at the same time $h_{ij} \le 4$. Rewrite (61) as
$$\hat\rho_{ij} = \tilde\rho_{ij} + \tilde\rho_{ij}(h_{ij} - 1) + \big(-e_i'A_ie_j - e_i'A_je_j + e_i'A_iA_je_j\big)\cdot h_{ij}. \quad (62)$$
By the Cauchy-Schwarz inequality and the fact $A_i^2 = A_i$,
$$|e_i'A_iA_je_j| \le \|A_ie_i\|\cdot\|A_je_j\|.$$
Similarly, $|e_i'A_ie_j| \le \|A_ie_i\|\cdot\|A_ie_j\|$ and $|e_i'A_je_j| \le \|A_je_i\|\cdot\|A_je_j\|$ since $A_i^2 = A_i$. Consequently,
$$\big|\tilde\rho_{ij}(h_{ij} - 1) + \big(-e_i'A_ie_j - e_i'A_je_j + e_i'A_iA_je_j\big)\cdot h_{ij}\big| \le 4\big(e_i'A_ie_i + e_j'A_je_j\big) + 4\big(\|A_ie_i\|\cdot\|A_ie_j\| + \|A_je_i\|\cdot\|A_je_j\| + \|A_ie_i\|\cdot\|A_je_j\|\big)$$
by the fact $|\tilde\rho_{ij}| \le 1$. Use the trivial fact $2xy \le x^2 + y^2$ (together with $\|A_ie_j\|^2 = e_j'A_ie_j$) to see
$$\max_{1\le i<j\le N}\big|\tilde\rho_{ij}(h_{ij} - 1) + \big(-e_i'A_ie_j - e_i'A_je_j + e_i'A_iA_je_j\big)\cdot h_{ij}\big| \le C\cdot\max_{1\le i\le j\le N}e_j'A_ie_j$$
for a universal constant $C$. Now let $Q$ be a $T \times T$ non-negative definite matrix of the form
$$Q = \Gamma'\begin{pmatrix}I_p & 0\\ 0 & 0\end{pmatrix}\Gamma,$$
where $\Gamma = (\gamma_{kj})$ is a $T \times T$ orthogonal matrix. Then $\epsilon'Q\epsilon = \sum_{k=1}^p\big(\sum_{j=1}^T\gamma_{kj}\epsilon_j\big)^2$. It follows that
$$P\big(\epsilon'Q\epsilon > v\sqrt{T/\log N}\big) \le p\cdot\max_{1\le k\le p}P\Big(\Big(\sum_{j=1}^T\gamma_{kj}\epsilon_j\Big)^2 > \frac{v}{p}\sqrt{T/\log N}\Big) = p\cdot\max_{1\le k\le p}P\Big(\Big|\sum_{j=1}^T\gamma_{kj}\epsilon_j\Big| > v'\big(T/\log N\big)^{1/4}\Big), \quad (68)$$
where $v' := (v/p)^{1/2}$. Note that $\sum_{j=1}^T\gamma_{kj}^2 = 1$ for each $1 \le k \le p$ by orthogonality.
Thus, from Lemma 24 we have that there exists some $K > 0$ such that
$$P\big(\epsilon'Q\epsilon > v\sqrt{T/\log N}\big) \le \frac{pK}{v'^\tau}\cdot\big(T/\log N\big)^{-\tau/4}.$$
Join this with (65), (66) and (67) to get
$$P\big(\max_{1\le i\le j\le N}e_j'Qe_j > \alpha_Nv\big) = O\Big(\frac{N^2}{(T/\log N)^{\tau/4}}\Big) + O\Big(\frac{N^2}{T^{\tau/2}}\Big).$$
By taking $v = h$, we have from the above and (64) that $P\big(\max_{1\le i<j\le N}\cdots\big) \to 0$. For any $v > 0$, by (65) and (66),
$$P\big(\max_{1\le i\le j\le N}e_j'A_ie_j > \alpha_Nv\big) \le N^2\cdot\max_{1\le i\le N}P\big(\epsilon'A_i\epsilon > v\sqrt{T/\log N}\big) + N^2\cdot P\Big(\|\epsilon\|^2 \le \frac{T}{2}\Big). \quad (70)$$
Since $E(\epsilon_1^2) = 1$, by large deviations, there exists a constant $\eta > 0$ such that
$$P\Big(\|\epsilon\|^2 \le \frac{T}{2}\Big) = P\Big(\frac{1}{T}\sum_{j=1}^T\epsilon_j^2 < \frac{1}{2}\Big) \le e^{-\eta T} \quad (71)$$
for large enough $T$; see, for example, [13]. By (68),
$$P\big(\epsilon'A_i\epsilon > v\sqrt{T/\log N}\big) \le p\cdot\max_{1\le k\le p}P\Big(\Big|\sum_{j=1}^T\gamma_{kj}\epsilon_j\Big| > v'\big(T/\log N\big)^{1/4}\Big),$$
where $v' := (v/p)^{1/2}$. Note that $\sum_{j=1}^T\gamma_{kj}^2 = 1$ for each $1 \le k \le p$ by orthogonality. From Lemma 25, there exists $K > 0$ such that
$$P\big(\max_{1\le i\le j\le N}e_j'A_ie_j > \alpha_Nv\big) \le N^2\cdot\Big[p\cdot\max_{1\le k\le p}P\Big(\Big|\sum_{j=1}^T\gamma_{kj}\epsilon_j\Big| > v'\big(T/\log N\big)^{1/4}\Big) + e^{-\eta T}\Big] \le N^2\cdot\Big[(pK)\,e^{-(v'/K)(T/\log N)^{1/4}} + e^{-\eta T}\Big]$$
as $T$ is large enough, where $K > 0$ does not depend on $p$, $N$, $T$ or the $\gamma_{kj}$'s. It is easy to see the above goes to zero if $T/(\log N)^5 \to \infty$. It follows that
$$P\big(\max_{1\le i\le j\le N}e_j'A_ie_j > \alpha_Nv\big) \to 0 \quad\text{if } T/(\log N)^5 \to \infty.$$
The proof is completed. □

PROPOSITION 3. Assume $\{\epsilon_{ij}; i \ge 1, j \ge 1\}$ are i.i.d. continuous and subgaussian random variables.
If $\log N = o(T^{1/3})$, then
$$\sqrt{T\log N}\cdot\max_{1\le i<j\le N}\big|\hat\rho_{ij}-\tilde\rho_{ij}\big| \to 0$$
in probability as $N\to\infty$.

Proof. Since $\epsilon_{11}$ is subgaussian, $Ee^{t\epsilon_{11}^2} < \infty$ for some $t > 0$. Hence $Ee^{t|\epsilon_{11}|} < \infty$ for all $t > 0$. We will use the same notation as in the proof of Proposition 2. Reviewing (70) and (71), to get our desired result, it suffices to show that
$$N^2\cdot\max_{1\le i\le N} P\big(e'A_ie > \alpha_N v\big) \to 0$$
for any $v > 0$. For each $i$,
$$P\big(\epsilon'A_i\epsilon > v\sqrt{T/\log N}\big) \le p\cdot\max_{1\le k\le p} P\Big(\Big|\sum_{j=1}^T\gamma_{kj}\epsilon_j\Big| > v'(T/\log N)^{1/4}\Big),$$
where $v' := (v/p)^{1/2}$. Note that $\sum_{j=1}^T\gamma_{kj}^2 = 1$ for each $1\le k\le p$ by orthogonality. From Lemma 26, there exists $K > 0$ such that
$$P\big(\epsilon'A_i\epsilon > v\sqrt{T/\log N}\big) \le (2p)\cdot\exp\big(-Kv'^2\sqrt{T/\log N}\big).$$
Therefore, by (70) and (71), there exists a constant $\beta > 0$ such that
$$N^2\cdot\max_{1\le i\le N} P\big(e'A_ie > \alpha_N v\big) \le (2pN^2)\cdot\exp\big(-Kv'^2\sqrt{T/\log N}\big) + Ne^{-\beta T}.$$
It is easy to see that the above goes to zero if $\log N = o(T^{1/3})$. $\Box$

Assume $\{\epsilon_{ij};\, i\ge1, j\ge1\}$ are i.i.d. continuous random variables. Set $\bar\epsilon_i = (1/T)\sum_{j=1}^T\epsilon_{ij}$ for all $i$. Define $\mathbf{1} = (1,\dots,1)' \in \mathbb{R}^T$. The Pearson correlation coefficient $\rho_{ij}$ is then defined by
$$\rho_{ij} = \frac{(\epsilon_i - \bar\epsilon_i\mathbf{1})'(\epsilon_j - \bar\epsilon_j\mathbf{1})}{\|\epsilon_i - \bar\epsilon_i\mathbf{1}\|\cdot\|\epsilon_j - \bar\epsilon_j\mathbf{1}\|} \tag{72}$$
for $1\le i,j\le N$. Similar to the clarification below (14), the "i.i.d. continuous" assumption justifies that $\rho_{ij}$ is well-defined.

PROPOSITION 4. Let $\tilde\rho_{ij}$ be as in Lemma 27. Assume $E\epsilon_{11} = 0$ and $E(|\epsilon_{11}|^\tau) < \infty$ for some $\tau \ge 4$. If $N^{4/\tau}\log N/T \to 0$, then
$$\sqrt{T\log N}\cdot\max_{1\le i<j\le N}\big|\rho_{ij}-\tilde\rho_{ij}\big| \to 0$$
in probability as $N\to\infty$.

Proof. Step 1. Since $\tau/2 > 1$, the above two estimates joined with (74) imply the required bound on $\max_{1\le i<j\le N}|\rho_{ij}-\tilde\rho_{ij}|$.

Step 2. Set $\alpha_N = 1/\sqrt{T\log N}$. Then $\lim_{N\to\infty}\alpha_N = 0$.
From Step 1, for any $t > 0$, it suffices to control $P\big(\max_{1\le i<j\le N}|\rho_{ij}-\tilde\rho_{ij}| > t\sqrt{\alpha_N}\big)$; hence, for any $s > 0$, it is enough to prove that
$$P\Big(\max_{1\le i\le N}\delta_i > s\sqrt{\alpha_N}\Big) \to 0.$$
In fact,
$$P\Big(\max_{1\le i\le N}\delta_i > s\sqrt{\alpha_N}\Big) \le N\cdot P\Big(\frac{|\xi_1+\cdots+\xi_T|}{\sqrt{\xi_1^2+\cdots+\xi_T^2}} > s\sqrt{T\alpha_N}\Big) \le N\cdot P\big(\xi_1^2+\cdots+\xi_T^2 \le T/2\big) + N\cdot P\Big(|\xi_1+\cdots+\xi_T| > \frac{sT\sqrt{\alpha_N}}{\sqrt2}\Big),$$
where $\{\xi_j;\, 1\le j\le T\}$ are i.i.d. random variables with the same distribution as $\epsilon_{11}$. The reason we switch the notation from the $\{\epsilon_{ij}\}$'s to $\{\xi_j;\, 1\le j\le T\}$ is brevity of symbols. By (67),
$$P\big(\xi_1^2+\cdots+\xi_T^2 \le T/2\big) = O\big(T^{-\tau/2}\big).$$
By the Markov inequality and (28) as used in (67),
$$P\Big(|\xi_1+\cdots+\xi_T| > \frac{sT\sqrt{\alpha_N}}{\sqrt2}\Big) = O\Big(\frac{T^{\alpha/2}}{(T\sqrt{\alpha_N})^\alpha}\Big) = O\Big(\frac{(\log N)^{\alpha/4}}{T^{\alpha/4}}\Big)$$
since $\alpha_N = 1/\sqrt{T\log N}$. Combining the above assertions, we arrive at
$$P\Big(\max_{1\le i\le N}\delta_i > s\sqrt{\alpha_N}\Big) = O\big(NT^{-\tau/2}\big) + O\Big(\frac{N(\log N)^{\alpha/4}}{T^{\alpha/4}}\Big),$$
which converges to zero provided $N(\log N)^{\alpha/4}/T^{\alpha/4} \to 0$, or equivalently, $N^{4/\alpha}\log N/T \to 0$. $\Box$

Finale: proofs of Theorems 3, 4 and 5. With the preparations above, we are now ready to prove the main theorems on the maximum statistics of sample correlation coefficients.

Proof of Theorem 3. Under the condition $E|\epsilon_{11}|^\tau < \infty$, [22] and [42] show that
$$T(L'_N)^2 - 4\log N + \log\log N \tag{75}$$
converges weakly to a distribution with distribution function $F(y)$, where $L'_N = \max_{1\le i<j\le N}|\tilde\rho_{ij}|$. For $\tau > 8$, by using the assumption $T/N \to c\in(0,\infty)$, we see that $\lim_{N\to\infty} T/(N^{4/\tau}\log N) = \infty$. Hence, by Proposition 1,
$$\sqrt{T\log N}\cdot\max_{1\le i<j\le N}|\hat\rho_{ij}-\tilde\rho_{ij}| \to 0$$
in probability. Then, by Proposition 4,
$$\sqrt{T\log N}\cdot\max_{1\le i<j\le N}|\rho_{ij}-\tilde\rho_{ij}| \to 0$$
in probability, and therefore $TL_N^2 = T(L''_N)^2 + o_p(1)$ as $N\to\infty$. The conclusion follows from (80). $\Box$

Proof of Theorem 5. By assumption, $\log N = o(T^{1/3})$.
Taking "$\mu = 0$" and "$\alpha = 2$" in Theorem 3 and Remark 2.1 from [5], we have that
$$T(L''_N)^2 - 4\log N + \log\log N \tag{81}$$
converges weakly to the distribution function $F(y)$ for $y\in\mathbb{R}$, where $L''_N = \max_{1\le i<j\le N}|\tilde\rho_{ij}|$. As in the proof of Theorem 3,
$$TL_N^2 = T(L''_N)^2 + o_p(1) \tag{82}$$
as $N\to\infty$. This and (81) yield the conclusion. $\Box$

The proof of Theorem 6. We create a new method to prove Theorem 6, which gives the asymptotic independence between the sum $S_N$ and the maximum $L_N$. The idea is to employ the inclusion-exclusion formula twice. We expect this method to work for other problems regarding asymptotic independence between sums and maxima of weakly dependent random variables.

7.3.1. Prelude: auxiliary results towards the proof of Theorem 6. The results stated in this section are estimates of probabilities of events related to Gaussian random variables. They are useful in their own right. Their proofs will be presented in Section A.5.

LEMMA 28. For each $N \ge 2$, let $T = T_N \ge 2$ be an integer. Suppose $s_1$ and $s_2$ are i.i.d. random vectors uniformly distributed on $S^{T-1}$. Given $y\in\mathbb{R}$, set $l_N = T^{-1/2}\cdot(4\log N - \log\log N + y)^{1/2}$, which makes sense for large $N$. Assume $\log N = o(\sqrt T)$ as $N\to\infty$. Then
$$\lim_{N\to\infty} N^2\cdot P(s_1's_2 \ge l_N) = \frac{1}{2\sqrt{2\pi}}\,e^{-y/2}.$$

LEMMA 29. Suppose $s_1$ and $s_2$ are two i.i.d. random vectors uniformly distributed on $S^{T-1}$ with $T\ge2$. Let $\{\xi_1,\dots,\xi_k\}$ be random variables (not necessarily independent), each of which has the same distribution as that of $s_1's_2$. Then
$$P\Big(\max_{1\le i\le k}|\xi_i| \ge t\Big) \le k\cdot e^{-Tt^2/4} + (2k)\cdot e^{-cT}$$
for all $t > 2/\sqrt T$, where $c > 0$ is a constant free of $k$, $t$ and $T$.

LEMMA 30. Let $\{Z, Z_1,\dots,Z_k\}$ be i.i.d. standard normals. Let $\delta\in(0,1)$ be given. Set $v_i = \sqrt\delta\, Z + \sqrt{1-\delta}\, Z_i$ for $1\le i\le k$. Then
$$P\Big(\min_{1\le i\le k} v_i > x\Big) \le \frac1y\exp\Big(-\frac{y^2}{2\delta}\Big) + \mathbf 1(x>y)\cdot k\exp\Big[-\frac{k(x-y)^2}{2(1-\delta)}\Big]$$
for all $x > y > 0$.
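The equicorrelated structure $v_i = \sqrt\delta Z + \sqrt{1-\delta}Z_i$ in the last lemma is tractable by conditioning on $Z$: given $Z = z$, the $v_i$ are i.i.d., so $P(\min_i v_i > x) = E\big[\bar\Phi\big((x-\sqrt\delta Z)/\sqrt{1-\delta}\big)^k\big]$, where $\bar\Phi$ is the standard normal upper tail. A minimal Monte Carlo cross-check of this conditioning identity follows; it is not part of the paper, and all parameter values are illustrative assumptions.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
k, delta, x = 10, 0.3, 0.0
n_mc = 200_000

# Equicorrelated Gaussians: v_i = sqrt(delta) Z + sqrt(1-delta) Z_i.
Z = rng.standard_normal(n_mc)
Zi = rng.standard_normal((n_mc, k))
V = math.sqrt(delta) * Z[:, None] + math.sqrt(1.0 - delta) * Zi
p_mc = float(np.mean(V.min(axis=1) > x))

def phi_bar(u):
    # Gaussian upper tail P(N(0,1) > u)
    return 0.5 * math.erfc(u / math.sqrt(2.0))

# Conditional on Z = z the v_i are i.i.d., so
# P(min v_i > x) = E[ phi_bar((x - sqrt(delta) Z)/sqrt(1-delta))^k ],
# evaluated by a Riemann sum against the standard normal density.
zs = np.linspace(-8.0, 8.0, 4001)
dz = zs[1] - zs[0]
dens = np.exp(-zs ** 2 / 2.0) / math.sqrt(2.0 * math.pi)
integ = np.array(
    [phi_bar((x - math.sqrt(delta) * z) / math.sqrt(1.0 - delta)) ** k for z in zs]
)
p_cond = float(np.sum(integ * dens) * dz)
```

The two estimates agree closely; note also that the positive correlation makes $P(\min_i v_i > 0)$ much larger than the independent-case value $2^{-k}$, which is why a bound of the Lemma 30 type (rather than a product bound) is needed.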
LEMMA 31 (Slepian's lemma from [37]). Suppose $(U_1,\dots,U_k)'$ and $(V_1,\dots,V_k)'$ are two $\mathbb{R}^k$-valued centered Gaussian random vectors such that $EU_i^2 = EV_i^2$ and $E(U_iU_j) \le E(V_iV_j)$ for all $1\le i,j\le k$. Then, for any real numbers $t_1,\dots,t_k$,
$$P(U_i \le t_i \text{ for all } 1\le i\le k) \le P(V_i \le t_i \text{ for all } 1\le i\le k).$$

LEMMA 32. Suppose $a_1,\dots,a_k$ are constant unit vectors on $S^{T-1}$ for some $T\ge2$. Let $s$ be a vector with the uniform distribution on $S^{T-1}$. Assume $\max_{1\le i<j\le k}|a_i'a_j| \le \delta$.

7.3.2. After collecting some useful facts in Section 7.3.1, we are now ready to prove Theorem 6. To make the discussion easier to follow, we give the outline first.

First, let $S_N$, $L_N$ and $\mu_N$ be as in (10), (11) and (13), respectively. Review the framework between (8) and (11). In particular,
$$\hat\rho_{ij} = \frac{\epsilon_i'P_iP_j\epsilon_j}{\|P_i\epsilon_i\|\cdot\|P_j\epsilon_j\|} \tag{83}$$
for $1\le i,j\le N$. Assume (14) holds with $\epsilon_i \sim N_T(0,\sigma_i^2I)$ for each $i$. Then
$$e_i := \frac{\epsilon_i}{\|\epsilon_i\|},\quad 1\le i\le N, \tag{84}$$
are i.i.d. uniformly distributed over the $T$-dimensional unit sphere $S^{T-1}$. For fixed $y\in\mathbb{R}$, set
$$l_N = T^{-1/2}\cdot(4\log N - \log\log N + y)^{1/2}. \tag{85}$$
Here is the structure of the proof of Theorem 6.

1. Let $e_i$ be as in (84). Define $\tilde L_N = \max_{1\le i<j\le N}|e_i'e_j|$. It suffices to prove the asymptotic independence for the pair $(S_N, \tilde L_N)$, where $\Phi(x)$ is the distribution function of $N(0,1)$ and is also the limiting distribution function of $N(S_N-\mu_N)$; $F(y)$ is the Gumbel distribution and is also the limiting distribution function of $T\tilde L_N^2 - 4\log N + \log\log N$.

2. Recalling the definition of $\tilde L_N$, we are able to write the event in (86) as the union of $\binom N2$ many events which are exchangeable. Then, by using the inclusion-exclusion formula, the probability in (86) is sandwiched between two bounds [(117) and (118)]. The advantage is that we reduce the probability on the global maximum "$\tilde L_N$" to sums of probabilities of "local maxima".

3.
In dealing with the "local maxima", each probability in the sum is of the form $P(N(S_N-\mu_N)\le x,\, |e_{i_1}'e_{j_1}|>l_N,\dots,|e_{i_n}'e_{j_n}|>l_N)$, where $n$ is a fixed number free of $N$ and $T$, and where the indices $\{(i_l,j_l);\, 1\le l\le n\}$ are different. Recall $S_N$ is the sum of $(e_i'e_j)^2$ over all $1\le i<j\le N$. Remove the terms related to $\{e_{i_l}'e_{j_l};\, 1\le l\le n\}$ from $S_N$; in other words, eliminate the terms $(e_i'e_j)^2$ for all $(i,j)$ with $\{i,j\}\cap\{i_l,j_l\}\ne\emptyset$ for some $1\le l\le n$. Then the resulting sum is independent of $\{e_{i_l}'e_{j_l};\, 1\le l\le n\}$, and hence $P(N(S_N-\mu_N)\le x,\, |e_{i_1}'e_{j_1}|>l_N,\dots,|e_{i_n}'e_{j_n}|>l_N)$ is asymptotically the product of $P(N(S_N-\mu_N)\le x)$ and $P(|e_{i_1}'e_{j_1}|>l_N,\dots,|e_{i_n}'e_{j_n}|>l_N)$. Of course we have to handle the "loss" after removing the terms. It turns out that the removed terms are very concentrated at their mean values by the second and third conclusions of Lemma 33. So the probability $P(N(S_N-\mu_N)\le x)$ and the modified version $P(N(\tilde S_N-\tilde\mu_N)\le x)$ are asymptotically equal. The total error in the above approximations is negligible (Lemma 36).

4. In step 3, we have shown that $P(N(S_N-\mu_N)\le x,\, |e_{i_1}'e_{j_1}|>l_N,\dots,|e_{i_n}'e_{j_n}|>l_N)$ is asymptotically the product of $P(N(S_N-\mu_N)\le x)$ and $P(|e_{i_1}'e_{j_1}|>l_N,\dots,|e_{i_n}'e_{j_n}|>l_N)$ in (117) and (118), where $A_N = \{N(S_N-\mu_N)\le x\}$ and $B_I = \{|e_i'e_j|>l_N\}$ for $I=(i,j)$. We will use the inclusion-exclusion formula one more time to regroup the sum of probabilities $P(|e_{i_1}'e_{j_1}|>l_N,\dots,|e_{i_n}'e_{j_n}|>l_N)$ and change it to $P(\max_{1\le i<j\le N}|e_i'e_j|>l_N)$.

Proof of Lemma 34. Let $m = T - p$. Under assumption (14) with $\epsilon_i\sim N_T(0,\sigma_i^2I)$ for each $i$, we know $\{e_i;\, 1\le i\le N\}$ are i.i.d.
uniformly distributed over $S^{T-1}$. Define $\tilde\rho_{ij} = e_i'e_j$ for $1\le i<j\le N$. To organize the proof clearly, we list the relevant quantities as follows: $L_N = \max_{1\le i<j\le N}|\hat\rho_{ij}|$ and $\tilde L_N = \max_{1\le i<j\le N}|\tilde\rho_{ij}|$. By the given conditions,
$$TL_N^2 - 4\log N + \log\log N \text{ converges weakly to } F(y), \tag{87}$$
$$N(S_N-\mu_N) \to N(0,1) \text{ weakly}. \tag{88}$$
By Theorem 6 from [4], the assertion (87) is also true if "$L_N$" is replaced by "$\tilde L_N$". To show asymptotic independence, it is enough to show
$$\lim_{N\to\infty} P\Big(N(S_N-\mu_N)\le x,\; TL_N^2 - 4\log N + \log\log N \le y\Big) = \Phi(x)\cdot F(y) \tag{89}$$
for any $x\in\mathbb{R}$ and $y\in\mathbb{R}$, where $\Phi(x) = (2\pi)^{-1/2}\int_{-\infty}^x e^{-t^2/2}\,dt$. Let $l_N$ be as in (85). Due to (87) and (88), we know (89) is equivalent to
$$\lim_{N\to\infty} P\Big(N(S_N-\mu_N)\le x,\; L_N > l_N\Big) = \Phi(x)\cdot[1-F(y)] \tag{90}$$
for any $x\in\mathbb{R}$ and $y\in\mathbb{R}$. By assumption, we know that
$$\lim_{N\to\infty} P\Big(N(S_N-\mu_N)\le x,\; \tilde L_N > l_N\Big) = \Phi(x)\cdot[1-F(y)] \tag{91}$$
for any $x\in\mathbb{R}$ and $y\in\mathbb{R}$. We show next that (91) implies (90).

By Proposition 3, $\sqrt{T\log N}\cdot\max_{1\le i<j\le N}|\hat\rho_{ij}-\tilde\rho_{ij}| \to 0$ in probability. Fix $\epsilon\in(0,1)$ and set
$$\Omega_N = \Big\{\max_{1\le i<j\le N}|\hat\rho_{ij}-\tilde\rho_{ij}| \le \frac{\epsilon}{\sqrt{T\log N}}\Big\},$$
so that $P(\Omega_N^c)\to0$ (92). Following the same lines as before, we obtain
$$\limsup_{N\to\infty} P\Big(N(S_N-\mu_N)\le x,\; L_N > l_N\Big) \le \Phi(x)\cdot[1-F(y)] \tag{96}$$
for any $x\in\mathbb{R}$ and $y\in\mathbb{R}$. In the following we will show the lower limit. Evidently,
$$P\Big(N(S_N-\mu_N)\le x,\; L_N>l_N\Big) \ge P\Big(N(S_N-\mu_N)\le x,\; L_N>l_N,\; \Omega_N\Big). \tag{97}$$
Set $\tilde l'_N = T^{-1/2}\cdot(4\log N-\log\log N+y+5\epsilon)^{1/2}$. Similar to (95), it is checked that $T^{1/2}(\tilde l'_N - l_N) \sim \frac{5\epsilon}{4}(\log N)^{-1/2}$ as $N\to\infty$. Therefore, $\tilde l'_N > l_N + \frac{\epsilon}{\sqrt{T\log N}}$ as $N$ is sufficiently large. It is straightforward to verify that
$$\big\{\tilde L_N > \tilde l'_N,\; \Omega_N\big\} \subset \Big\{\tilde L_N > l_N + \frac{\epsilon}{\sqrt{T\log N}},\; \Omega_N\Big\} \subset \big\{L_N > l_N,\; \Omega_N\big\}$$
as $N$ is sufficiently large, where the last inclusion follows from the definition of $\Omega_N$. By (97),
$$P\Big(N(S_N-\mu_N)\le x,\; L_N>l_N\Big) \ge P\Big(N(S_N-\mu_N)\le x,\; \tilde L_N > \tilde l'_N,\; \Omega_N\Big).$$
Thus, from (91) and (92) we get
$$\liminf_{N\to\infty} P\Big(N(S_N-\mu_N)\le x,\; L_N>l_N\Big) \ge \Phi(x)\cdot[1-F(y+5\epsilon)]$$
for any $\epsilon\in(0,1)$. Sending $\epsilon\downarrow0$,
$$\liminf_{N\to\infty} P\Big(N(S_N-\mu_N)\le x,\; L_N>l_N\Big) \ge \Phi(x)\cdot[1-F(y)]$$
for any $x\in\mathbb{R}$ and $y\in\mathbb{R}$. This together with (96) concludes (90). $\Box$

We need some notation now. Let $S_N$, $L_N$ and $\mu_N$ be as in (10), (11) and (13), respectively. Let $\{e_i;\, 1\le i\le N\}$ be as in (84). Define
$$\Lambda_N = \{(i,j);\, 1\le i<j\le N\};\quad A_N = \big\{N(S_N-\mu_N)\le x\big\}\ \text{and}\ B_I = \big\{|e_i'e_j|>l_N\big\} \tag{98}$$
for any $I=(i,j)\in\Lambda_N$. To make a clear presentation, we impose a trivial ordering on the elements of $\Lambda_N$. For any $I_1=(i_1,j_1)\in\Lambda_N$ and $I_2=(i_2,j_2)\in\Lambda_N$, we say $I_1<I_2$ if $i_1<i_2$, or $i_1=i_2$ but $j_1<j_2$.

LEMMA 35. Recall the notation from (85) and (98). Assume $\log N = o(\sqrt T)$ as $N\to\infty$. Assume $\{e_i;\, 1\le i\le N\}$ are i.i.d. uniformly distributed over $S^{T-1}$, which is particularly true if (14) holds with $\epsilon_i\sim N_T(0,\sigma_i^2I)$ for each $i$. Set
$$H(N,n) = \sum_{I_1<\cdots<I_n} P(B_{I_1}\cdots B_{I_n}).$$
Then $\limsup_{N\to\infty} H(N,n) \le C^n/n!$ for each $n\ge1$, where $C$ is a constant free of $N$ and $n$.

Proof. Write $H(N,n) = F_1 + F_2 + F_3$ according to a partition of the tuples $(I_1,\dots,I_n)$ into classes $\Gamma_{N,1}$, $\Gamma_{N,2}$ and $\Gamma_{N,3}$, where $\Gamma_{N,1}$ consists of tuples whose index pairs are pairwise disjoint. Recall $B_I = \{|e_i'e_j|>l_N\}$ if $I=(i,j)\in\Lambda_N$, where $l_N$ is defined in (85).

Step 1: the estimate of $F_1$. By the definition of $\Gamma_{N,1}$, we know that $B_{I_1}, B_{I_2},\dots,B_{I_n}$ are independent. By Lemma 28 and the symmetry of $e_1'e_2$,
$$\max_{I\in\Lambda_N} P(B_I) = P(|e_1'e_2|\ge l_N) = 2P(e_1'e_2\ge l_N) \le \frac{C}{N^2} \tag{99}$$
for all $N\ge3$. Then, by the elementary fact
$$\binom kn = \frac{1}{n!}\,k(k-1)\cdots(k-n+1) \le \frac{k^n}{n!}\quad\text{for all } k>n\ge1,$$
$$F_1 \le \Big(\frac{C}{N^2}\Big)^n\cdot\binom{N(N-1)/2}{n} \le \frac{C^n}{n!}. \tag{100}$$

Step 2: the estimate of $F_2$. Evidently, the size of $\Gamma_{N,2}$ is no more than $N^{n+1}$. We first claim that $\{e_1'e_2, e_1'e_3,\dots,e_1'e_{n+1}\}$ are independent. In fact, let $e$ be uniformly distributed on $S^{T-1}$.
Then $a'e$ has the same distribution as that of $(1,0,\dots,0)'e$ for any $a\in S^{T-1}$ (see, e.g., Theorem 1.5.7(i) and the argument for (5) on p. 147 from [30]). Since $e_1,\dots,e_{n+1}$ are i.i.d. random vectors, we know that, conditioning on $e_1$, the random variables $\{e_1'e_2, e_1'e_3,\dots,e_1'e_{n+1}\}$ are i.i.d. with the common distribution of $(1,0,\dots,0)'e_1$. In particular, their conditional distributions do not depend on $e_1$. This proves the claim. Consequently,
$$F_2 \le N^{n+1}\cdot P\big(|e_1'e_2|>l_N,\dots,|e_1'e_{n+1}|>l_N\big) = N^{n+1}\cdot\big[2P(e_1'e_2>l_N)\big]^n \le \frac{C^n}{N^{n-1}} \tag{101}$$
by (99).

Step 3: the estimate of $F_3$. Fix a tuple $(I_1,I_2,\dots,I_n)\in\Gamma_{N,3}$. By the ordering imposed on $\Lambda_N$, we see that $i_1\le i_2\le\cdots\le i_n$. There are two different cases: (1) $i_1<i_2$; (2) there exists $2\le k\le n-1$ such that $i_1=\cdots=i_k<i_{k+1}$. Under case (1), let $\mathcal F$ be the set of random vectors $\{e_{j_1}, e_{i_l}, e_{j_l};\, 2\le l\le n\}$ (the first index is "$j_1$", which is different from the third one, "$j_l$"). Then, by independence and the property "take out what is known" for conditional probability,
$$P(B_{I_1}B_{I_2}\cdots B_{I_n}) = E\big[P(B_{I_1}B_{I_2}\cdots B_{I_n}\mid\mathcal F)\big] = E\Big[P\big(|e_{i_1}'e_{j_1}|\ge l_N \,\big|\, e_{j_1}\big)\cdot\prod_{l=2}^n I(B_{I_l})\Big].$$
As a fact used earlier, the conditional distribution of $e_{i_1}'e_{j_1}$ given $e_{j_1}$ and the unconditional distribution of $e_{i_1}'e_{j_1}$ are identical. Therefore, by (99),
$$P(B_{I_1}B_{I_2}\cdots B_{I_n}) \le \frac{C}{N^2}\cdot P(B_{I_2}\cdots B_{I_n}). \tag{102}$$
Let us study case (2). Without loss of generality, for notational clarity, we assume $i_1=\cdots=i_k=1$ and $i_{k+1}=2$. Denote by $\mathcal F$ the set of random vectors $\{e_{i_l}, e_{j_l};\, 1\le l\le n\}$ excluding $e_1$.
Then use conditional probability and independence to see
$$P(B_{I_1}B_{I_2}\cdots B_{I_n}) = E\big[P(B_{I_1}B_{I_2}\cdots B_{I_n}\mid\mathcal F)\big] = E\Big[P_{\mathcal F}\Big(\min_{1\le l\le k}|e_1'e_{j_l}|\ge l_N\Big)\cdot\prod_{l:\,i_l\ne1} I(B_{I_l})\Big], \tag{103}$$
where $P_{\mathcal F}$ stands for the conditional probability given $\mathcal F$. By independence, the last probability in (103) is computed by treating $e_1$ as a random variable while fixing the values of $e_{j_1},\dots,e_{j_k}$. To study $P_{\mathcal F}$, we need to understand the relationship among $\{e_{j_1},\dots,e_{j_k}\}$. To do so, set
$$\Omega_N = \Big\{\max_{1\le l<l'\le k}|e_{j_l}'e_{j_{l'}}| \le \delta_N\Big\},$$
with $\delta_N$ as in Lemma 29. By Lemma 29 and the fact $k\le n$,
$$P(\Omega_N^c) \le k\cdot\big[\exp(-T\delta_N^2/4) + e^{-cT}\big] \le \frac{3n}{N^n} \tag{104}$$
as $N$ is sufficiently large, provided $\log N = o(T)$. Notice
$$P_{\mathcal F}\Big(\min_{1\le l\le k}|e_1'e_{j_l}|\ge l_N\Big) \le I(\Omega_N^c) + P_{\mathcal F}\Big(\min_{1\le l\le k}|e_1'e_{j_l}|\ge l_N\Big)\cdot I(\Omega_N). \tag{105}$$
We claim that, for any $\epsilon\in(0,1)$, there exists $N_\epsilon$ such that
$$P_{\mathcal F}\Big(\min_{1\le l\le k}|e_1'e_{j_l}|\ge l_N\Big)\cdot I(\Omega_N) \le \frac{1}{N^{k-\epsilon}} \tag{106}$$
as $N\ge N_\epsilon$. On $\Omega_N$, we know $\max_{1\le l<l'\le k}|e_{j_l}'e_{j_{l'}}|\le\delta_N$, so Lemma 30 applies with some $r > 0$. By (85), $l_N \sim 2\sqrt{(\log N)/T}$ as $N\to\infty$, and hence $y = o(z\sqrt{rT})$. Also, $\delta_N\to0$ provided $\log N = o(T)$, and
$$\frac{k\big(z\sqrt{rT}-y\big)^2}{1-\delta} \sim (2rk)\cdot\log N.$$
Thus, by the lemma, use the facts that $2rk > k-\epsilon$ and that $z\sqrt{rT}-y\to\infty$ to get
$$P_{\mathcal F}\Big(\min_{1\le l\le k}|e_1'e_{j_l}|\ge l_N\Big)\cdot I(\Omega_N) \le \exp\Big(-\frac{\delta_N(\log N)}{2}\Big) + \exp\Big[-\Big(k-\epsilon\Big)\cdot\log N\Big] + 2\exp(-cT) \le \frac{1}{N^{k-\epsilon}} + 2\exp(-cT)$$
as $N\ge N_\epsilon$, thanks to the assumption $\log N = o(\sqrt T)$, where $N_\epsilon$ depends on $\epsilon$ only. This leads to (106). Now, combining (105) and (106), we arrive at
$$P_{\mathcal F}\Big(\min_{1\le l\le k}|e_1'e_{j_l}|\ge l_N\Big) \le I(\Omega_N^c) + \frac{1}{N^{k-\epsilon}} + 2\exp(-cT)$$
as $N\ge N_\epsilon$.
This together with (103) and (104) implies
$$P(B_{I_1}B_{I_2}\cdots B_{I_n}) \le \frac{1}{N^{k-\epsilon}}\cdot P\Big(\bigcap_{l:\,i_l\ne1}B_{I_l}\Big) + \frac{3n}{N^n} + 2\exp(-cT)$$
as $N$ is sufficiently large. In summary, by using the above conclusion and (102), for any $\epsilon\in(0,1)$ and any $(I_1,\dots,I_n)\in\Gamma_{N,3}$,
$$P(B_{I_1}B_{I_2}\cdots B_{I_n}) \le \frac{1}{N^{k_1-\epsilon}}\cdot P\Big(\bigcap_{l:\,i_l>i_1}B_{I_l}\Big) + \frac{3n}{N^n} + 2\exp(-cT)$$
as $N\ge N_\epsilon$, where $k_1$ is the number of elements in the $i_1$-th row of $\{I_1,\dots,I_n\}$. In words, when we consider $P(B_{I_1}B_{I_2}\cdots B_{I_n})$ based on the positions of the $I_j$'s in the upper triangular array $\Lambda_N = \{(i,j);\, 1\le i<j\le N\}$, after reducing the first row we see the connection between the old and new probabilities. Similarly, let $k_j$ be the number of elements from $\{I_1,\dots,I_n\}$ in the $j$-th row for $j\ge2$. Then
$$P(B_{I_1}B_{I_2}\cdots B_{I_n}) \le \frac{1}{N^{k_1-\epsilon}}\cdot\Big[\frac{1}{N^{k_2-\epsilon}}\cdot P\Big(\bigcap B_l\Big) + \frac{3n}{N^n} + 2\exp(-cT)\Big] + \frac{3n}{N^n} + 2\exp(-cT) \le \frac{1}{N^{k_1+k_2-2\epsilon}}\cdot P\Big(\bigcap B_l\Big) + 2\cdot\frac{3n}{N^n} + 4\exp(-cT),$$
where the two intersections above run over all elements from $\{I_1,\dots,I_n\}$ excluding the first two rows. Continue the process recursively to see
$$P(B_{I_1}B_{I_2}\cdots B_{I_n}) \le \frac{1}{N^{k_1+\cdots+k_b-b\epsilon}} + b\cdot\frac{3n}{N^n} + 2b\exp(-cT),$$
where $b$ is the total number of rows of $\{I_1,\dots,I_n\}$ in the upper triangular array $\Lambda_N = \{(i,j);\, 1\le i<j\le N\}$. Obviously, $k_1+\cdots+k_b = n$ and $b\le n$. Therefore, for each $\epsilon\in(0,1)$,
$$P(B_{I_1}B_{I_2}\cdots B_{I_n}) \le \frac{1}{N^{n-n\epsilon}} + \frac{3n^2}{N^n} + 2n\exp(-cT)$$
for $N\ge N_\epsilon$. This gives that
$$F_3 = \sum_{(I_1,\dots,I_n)\in\Gamma_{N,3}} P(B_{I_1}\cdots B_{I_n}),$$
where, for each tuple in $\Gamma_{N,3}$, the number of distinct indices $\kappa := |\{i_l, j_l;\, 1\le l\le n\}|$ satisfies $n+1\le\kappa\le 2n-1$. To see how many such $(I_1,\dots,I_n)$ have $|\{i_l,j_l;\, 1\le l\le n\}| = \kappa$, first pick $\kappa$ many indices from $\{1,2,\dots,N\}$, which can be done in at most $\binom N\kappa \le N^\kappa$ ways, then use the $\kappa$ indices to make a tuple $(I_1,\dots,I_n)\in\Gamma_{N,3}$. The total number of ways to do so is no more than $\kappa^{2n}$. Therefore,
$$|\Gamma_{N,3}| \le \sum_{\kappa=n+1}^{2n-1}N^\kappa\cdot\kappa^{2n} \le (2n)^{2n}\cdot N^{2n-1}.$$
As a consequence, for each $\epsilon\in(0,1)$,
$$F_3 \le (2n)^{2n}\cdot\Big(\frac{1}{N^{n\epsilon}} + \frac{3n^2}{N^{n+1}} + 2n\exp(-cT)\Big)$$
as $N\ge N_\epsilon$. Take $\epsilon = \frac{1}{2n}$ to see $\lim_{N\to\infty}F_3 = 0$. Joining this with (100) and (101), we eventually arrive at
$$\limsup_{N\to\infty} H(N,n) \le \frac{C^n}{n!} \tag{108}$$
for each $n\ge1$. The desired conclusion then follows by sending $n\to\infty$. $\Box$

LEMMA 36. Recall the notation from (85) and (98). Assume (14) holds with $\epsilon_i\sim N_T(0,\sigma_i^2I)$ for each $i$. If $N = o(T)$ as $N\to\infty$, then
$$\lim_{N\to\infty}\sum_{I_1<\cdots<I_n}\big|P(A_NB_{I_1}\cdots B_{I_n}) - P(A_N)\cdot P(B_{I_1}\cdots B_{I_n})\big| = 0$$
for each $n\ge1$.
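Lemma 36 asserts an approximate factorization: the sum-event $A_N$ is nearly independent of joint exceedance events built from the $|e_i'e_j|$. This is easy to eyeball by simulation. The sketch below is not from the paper; the sizes $N$, $T$, the number of replications, and the use of empirical quantiles as event thresholds are all illustrative assumptions, with crude unnormalized stand-ins for the sum and max statistics.

```python
import numpy as np

rng = np.random.default_rng(11)
N, T, reps = 40, 150, 2000

sums = np.empty(reps)
maxes = np.empty(reps)
for r in range(reps):
    E = rng.standard_normal((N, T))
    E /= np.linalg.norm(E, axis=1, keepdims=True)      # rows e_i uniform on S^{T-1}
    rho = (E @ E.T)[np.triu_indices(N, k=1)]           # the N(N-1)/2 values e_i' e_j
    sums[r] = np.sum(rho ** 2)                          # stand-in for the sum statistic
    maxes[r] = np.max(np.abs(rho))                      # stand-in for the max statistic

# Events: A = {sum below its median}, B = {max above its 80% quantile}.
a = sums <= np.quantile(sums, 0.5)
b = maxes > np.quantile(maxes, 0.8)
p_a, p_b = float(a.mean()), float(b.mean())
p_ab = float((a & b).mean())
gap = abs(p_ab - p_a * p_b)                             # should be close to 0
```

Empirically `p_ab` is close to `p_a * p_b`, consistent with the asymptotic independence that Lemma 36 quantifies.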
Proof. Since $\tau\ge2$, from a convex inequality we have
$$E\big(|Q_{N,1}-EQ_{N,1}|^\tau\big) \le n^{\tau-1}\cdot\sum_{l=1}^n E\Big(\Big|\sum_{j=i_l+1}^N T\big(\hat\rho_{i_lj}^2 - E\hat\rho_{i_lj}^2\big)\Big|^\tau\Big) \le Cn^\tau T^\tau\cdot\Big(\frac{N^{\tau/2}}{m^{2\tau}} + \frac{N^\tau}{m^{3\tau}}\Big) \le C\cdot\frac{n^\tau}{N^{\tau/2}}$$
by Lemma 33, where the constant $C$ is free of $N$ and $T$, and where the last step follows from the assumption $N = o(T)$. Similarly,
$$E\big(|Q_{N,2}-EQ_{N,2}|^\tau\big) \le C\cdot\frac{n^\tau}{N^{\tau/2}}.$$
Lastly, by Lemma 33 again,
$$E\big(|Q_{N,3}-EQ_{N,3}|^\tau\big) \le T^\tau\cdot n^{2\tau-1}\cdot\sum_{s=1}^n\sum_{l=1}^n E\big[|\hat\rho_{i_lj_s}^2 - E\hat\rho_{i_lj_s}^2|^\tau\big] \le C\cdot n^{2\tau+1}.$$
Therefore, $E|S_{N,n}-ES_{N,n}|^\tau \le C\big(n^\tau N^{-\tau/2} + n^{2\tau+1}\big)$ up to the scaling of $S_{N,n}$. Fix $\epsilon\in(0,1)$. By the Markov inequality,
$$P\big(N|S_{N,n}-ES_{N,n}| \ge \epsilon\big) \le \frac{C}{\epsilon^\tau}\cdot\frac{n^{2\tau}}{N^{\tau/2}} = C'\cdot\frac{n^{2\tau}}{N^{\tau/2}} \tag{109}$$
for all $N\ge n$, where $C'$ is a constant depending on $\epsilon$ but free of $N$, $T$ and the indices $\{I_1,\dots,I_n\}$.

Fix $I_1<I_2<\cdots<I_n\in\Lambda_N$. By (109) and the definition of $A_N(x)$,
$$P\big(A_N(x)B_{I_1}B_{I_2}\cdots B_{I_n}\big) \le P\Big(A_N(x)B_{I_1}\cdots B_{I_n},\; N|S_{N,n}-ES_{N,n}|<\epsilon\Big) + C'\cdot\frac{n^{2\tau}}{N^{\tau/2}}$$
$$\le P\Big(N\big[(S_N-S_{N,n})-E(S_N-S_{N,n})\big]\le x+\epsilon,\; B_{I_1}\cdots B_{I_n}\Big) + C'\cdot\frac{n^{2\tau}}{N^{\tau/2}}$$
$$= P\Big(N\big[(S_N-S_{N,n})-E(S_N-S_{N,n})\big]\le x+\epsilon\Big)\cdot P\big(B_{I_1}\cdots B_{I_n}\big) + C'\cdot\frac{n^{2\tau}}{N^{\tau/2}}$$
by the independence between $S_N-S_{N,n}$ and $B_{I_1}B_{I_2}\cdots B_{I_n}$. Now,
$$P\Big(N\big[(S_N-S_{N,n})-E(S_N-S_{N,n})\big]\le x+\epsilon\Big) \le P\Big(N\big[(S_N-S_{N,n})-E(S_N-S_{N,n})\big]\le x+\epsilon,\; N|S_{N,n}-ES_{N,n}|<\epsilon\Big) + C'\cdot\frac{n^{2\tau}}{N^{\tau/2}}$$
$$\le P\Big(N(S_N-ES_N)\le x+2\epsilon\Big) + C'\cdot\frac{n^{2\tau}}{N^{\tau/2}} \le P\big(A_N(x+2\epsilon)\big) + C'\cdot\frac{n^{2\tau}}{N^{\tau/2}}.$$
Combining the two inequalities, we get
$$P\big(A_N(x)B_{I_1}B_{I_2}\cdots B_{I_n}\big) \le P\big(A_N(x+2\epsilon)\big)\cdot P\big(B_{I_1}\cdots B_{I_n}\big) + 2C'\cdot\frac{n^{2\tau}}{N^{\tau/2}}. \tag{110}$$
Similarly,
$$P\Big(N\big[(S_N-S_{N,n})-E(S_N-S_{N,n})\big]\le x-\epsilon,\; B_{I_1}\cdots B_{I_n}\Big)$$
$$\le P\Big(N\big[(S_N-S_{N,n})-E(S_N-S_{N,n})\big]\le x-\epsilon,\; B_{I_1}\cdots B_{I_n},\; N|S_{N,n}-ES_{N,n}|<\epsilon\Big) + C'\cdot\frac{n^{2\tau}}{N^{\tau/2}} \le P\Big(N(S_N-ES_N)\le x,\; B_{I_1}\cdots B_{I_n}\Big) + C'\cdot\frac{n^{2\tau}}{N^{\tau/2}}.$$
In other words, by independence,
$$P\big(A_N(x)B_{I_1}\cdots B_{I_n}\big) \ge P\Big(N\big[(S_N-S_{N,n})-E(S_N-S_{N,n})\big]\le x-\epsilon\Big)\cdot P\big(B_{I_1}\cdots B_{I_n}\big) - C'\cdot\frac{n^{2\tau}}{N^{\tau/2}}.$$
Furthermore,
$$P\Big(N(S_N-ES_N)\le x-2\epsilon\Big) \le P\Big(N(S_N-ES_N)\le x-2\epsilon,\; N|S_{N,n}-ES_{N,n}|<\epsilon\Big) + C'\cdot\frac{n^{2\tau}}{N^{\tau/2}} \le P\Big(N\big[(S_N-S_{N,n})-E(S_N-S_{N,n})\big]\le x-\epsilon\Big) + C'\cdot\frac{n^{2\tau}}{N^{\tau/2}}.$$
The above two strings of inequalities imply
$$P\big(A_N(x)B_{I_1}\cdots B_{I_n}\big) \ge P\Big(N(S_N-ES_N)\le x-2\epsilon\Big)\cdot P\big(B_{I_1}\cdots B_{I_n}\big) - 2C'\cdot\frac{n^{2\tau}}{N^{\tau/2}},$$
which joined with (110) yields
$$\big|P\big(A_N(x)B_{I_1}\cdots B_{I_n}\big) - P(A_N(x))\cdot P\big(B_{I_1}\cdots B_{I_n}\big)\big| \le \Delta_{N,\epsilon}\cdot P\big(B_{I_1}\cdots B_{I_n}\big) + 4C'\cdot\frac{n^{2\tau}}{N^{\tau/2}},$$
where
$$\Delta_{N,\epsilon} := \big|P(A_N(x)) - P(A_N(x+2\epsilon))\big| + \big|P(A_N(x)) - P(A_N(x-2\epsilon))\big|.$$
In particular,
$$\Delta_{N,\epsilon} \to \big|\Phi(x+2\epsilon)-\Phi(x)\big| + \big|\Phi(x-2\epsilon)-\Phi(x)\big| \tag{111}$$
as $N\to\infty$ by Theorem 1. As a consequence,
$$\zeta(N,n) := \sum_{I_1<\cdots<I_n}\big|P\big(A_N(x)B_{I_1}\cdots B_{I_n}\big) - P(A_N(x))\cdot P\big(B_{I_1}\cdots B_{I_n}\big)\big| \le \Delta_{N,\epsilon}\cdot H(N,n) + o(1)$$
as $N\to\infty$ for each fixed $\epsilon>0$. The desired result follows by sending $\epsilon\downarrow0$. $\Box$

Finale: proof of Theorem 6. We are now ready to assemble everything together.

Proof of Theorem 6. Recall $\{e_i;\, 1\le i\le N\}$ in (84).
By assumption (14), we see that $\{e_i;\, 1\le i\le N\}$ are i.i.d. uniformly distributed over $S^{T-1}$. As in Lemma 34, define $\tilde L_N = \max_{1\le i<j\le N}|e_i'e_j|$. Then
$$T\tilde L_N^2 - 4\log N + \log\log N \text{ converges weakly to } F(y) \tag{112}$$
and
$$N(S_N-\mu_N) \to N(0,1) \text{ weakly}. \tag{113}$$
To show asymptotic independence, by Lemma 34, it is enough to show
$$\lim_{N\to\infty} P\Big(N(S_N-\mu_N)\le x,\; T\tilde L_N^2 - 4\log N + \log\log N \le y\Big) = \Phi(x)\cdot F(y)$$
for any $x\in\mathbb{R}$ and $y\in\mathbb{R}$, where $\Phi(x) = (2\pi)^{-1/2}\int_{-\infty}^x e^{-t^2/2}\,dt$. Review (85) to see
$$l_N = T^{-1/2}\cdot(4\log N - \log\log N + y)^{1/2}, \tag{114}$$
which makes sense for large $N$. Because of (112) and (113), the above is equivalent to
$$\lim_{N\to\infty} P\Big(N(S_N-\mu_N)\le x,\; \tilde L_N > l_N\Big) = \Phi(x)\cdot[1-F(y)] \tag{115}$$
for any $x\in\mathbb{R}$ and $y\in\mathbb{R}$. Review the notations $\Lambda_N$, $A_N$ and $B_I$ for any $I=(i,j)\in\Lambda_N$ in (98). Write
$$P\Big(N(S_N-\mu_N)\le x,\; \tilde L_N > l_N\Big) = P\Big(\bigcup_{I\in\Lambda_N}A_NB_I\Big). \tag{116}$$
Here the notation $A_NB_I$ stands for $A_N\cap B_I$. From the inclusion-exclusion principle,
$$P\Big(\bigcup_{I\in\Lambda_N}A_NB_I\Big) \le \sum_{I\in\Lambda_N}P(A_NB_I) - \sum_{I_1<I_2}P(A_NB_{I_1}B_{I_2}) + \cdots + \sum_{I_1<\cdots<I_{2k+1}}P(A_NB_{I_1}\cdots B_{I_{2k+1}}) \tag{117}$$
and
$$P\Big(\bigcup_{I\in\Lambda_N}A_NB_I\Big) \ge \sum_{I\in\Lambda_N}P(A_NB_I) - \sum_{I_1<I_2}P(A_NB_{I_1}B_{I_2}) + \cdots - \sum_{I_1<\cdots<I_{2k}}P(A_NB_{I_1}\cdots B_{I_{2k}}) \tag{118}$$
for any $k\ge1$. Reviewing the definition
$$H(N,n) = \sum_{I_1<\cdots<I_n}P(B_{I_1}\cdots B_{I_n})$$
and applying Lemma 35, we have
$$\limsup_{N\to\infty}H(N,n) \le \frac{C^n}{n!} \tag{119}$$
for each $n\ge1$. The assertion (117) implies that
$$P\Big(\bigcup_{I\in\Lambda_N}A_NB_I\Big) \le P(A_N)\Big[\sum_{I\in\Lambda_N}P(B_I) - \sum_{I_1<I_2}P(B_{I_1}B_{I_2}) + \cdots - \sum_{I_1<\cdots<I_{2k}}P(B_{I_1}\cdots B_{I_{2k}})\Big] + H(N,2k+1) + \zeta_N, \tag{120}$$
where $\zeta_N\to0$ as $N\to\infty$ for each fixed $k$ by Lemma 36.
By the definition of $l_N$ and (112),
$$P\Big(\bigcup_{I\in\Lambda_N}B_I\Big) = P(\tilde L_N > l_N) = P\big(T\tilde L_N^2 - 4\log N + \log\log N > y\big) \to 1 - F(y)$$
as $N\to\infty$. By (113), $P(A_N)\to\Phi(x)$ as $N\to\infty$. From (116), by fixing $k$ first and sending $N\to\infty$, we get from (120) that
$$\limsup_{N\to\infty} P\Big(N(S_N-\mu_N)\le x,\; \tilde L_N>l_N\Big) \le \Phi(x)\cdot[1-F(y)] + \limsup_{N\to\infty}H(N,2k+1).$$
Now, let $k\to\infty$ and use (119) to see
$$\limsup_{N\to\infty} P\Big(N(S_N-\mu_N)\le x,\; \tilde L_N>l_N\Big) \le \Phi(x)\cdot[1-F(y)]. \tag{122}$$
By applying the same argument to (118), we see that the counterpart of
1. Review (116) and repeat the earlier procedure to seelim inf N →∞ P (cid:16) N ( S N − µ N ) ≤ x, ˜ L N > l N (cid:17) ≥ Φ( x ) · [1 − F ( y )]by sending N → ∞ and then sending k → ∞ . This and (122) yield (115).The proof is completed. (cid:3) Acknowledgment. Professors Feng and Liu thank NSFC grants 11501092and 11571068 for partially support. Professor Jiang thanks NSF GrantsDMS-1406279 and DMS-1916014 for partially support.APPENDIXIn this part we will prove the technical results stated in previous sections.We create same number of sections to accumulate the proofs of the claimsin the corresponding section. A.1. Proofs of auxiliary results in Section 7.1.1. We prove thelemmas in the order of their numerations. Proof of Lemma 1 . The conclusions are about even functions of ξ . So,without loss of generality, assume a ≥ . Set Eξ = b for some b > 0. Then L. FENG ET AL. b ≥ a . Note that E [ | ξ − Eξ | τ ] = E [ | ξ − b | τ · | ξ + b | τ ] ≤ (cid:2) E | ξ − b | τ (cid:3) / · (cid:2) E | ξ + b | τ (cid:3) / by the Cauchy-Schwartz inequality. Notice E | ξ − b | τ ≤ τ − · [ | a − b | τ + E | ξ − a | τ ]and E [ | ξ + b | τ ] ≤ τ − · (cid:0) | a + b | τ + E | ξ − a | τ (cid:1) ≤ τ − · (cid:2) | a − b | τ + (2 a ) τ + E | ξ − a | τ (cid:3) ≤ τ − · (cid:0) | a − b | τ + a τ + E | ξ − a | τ (cid:1) . Use the inequality √ x + y + z ≤ √ x + √ y + √ z for all x ≥ y ≥ z ≥ E [ | ξ − Eξ | τ ] ≤ τ − . · (cid:2) | b − a | τ + (cid:112) E | ξ − a | τ (cid:3) · (cid:2) | b − a | τ + a τ + (cid:112) E | ξ − a | τ (cid:3) . (123)If a = 0, then E [ | ξ − Eξ | τ ] ≤ τ − . · (cid:104) Var( ξ ) τ/ + (cid:112) E ( | ξ | τ ) (cid:105) ≤ τ · (cid:2) Var( ξ ) τ + E ( | ξ | τ ) (cid:3) This leads to (i) since Var( ξ ) τ = [ E ( ξ )] τ ≤ E ( | ξ | τ ) by the H¨older inequal-ity. Now, if a (cid:54) = 0, we continue from (123) to see | b − a | τ = ( b − a ) τ ( a + b ) τ = 1( a + b ) τ · Var( ξ ) τ ≤ a τ · Var( ξ ) τ . We get (ii). The proof is finished. (cid:3) A.2. 
Proofs of auxiliary results in Section 7.1.2. In this part we develop some identities and inequalities regarding moments of random vectors with the uniform distribution on high-dimensional spheres. We will focus on developing basic tools. They are of independent interest. Review the notations $P_i$ and $\epsilon_i = (\epsilon_{i1},\dots,\epsilon_{iT})'\in\mathbb{R}^T$ in (6) and (8).

Proof of Lemma 4. (i) Notice $M_{ij} = (U_j'U_i)'(U_j'U_i)$. Automatically $M_{ij}$ is non-negative definite. To show $I - M_{ij}$ is non-negative definite, it is enough to prove
$$x'M_{ij}x \le x'x \tag{124}$$
for any $x\in\mathbb{R}^m$. In fact, let $z = U_ix$. Then $x'M_{ij}x = \|U_j'z\|^2 = z'U_jU_j'z = z'P_jz$ by (30). By (8) and (30) again, $z'P_jz \le z'z = x'U_i'U_ix = x'x$. The above two assertions lead to (124).

(ii) Since $I - M$ is non-negative definite, all of the eigenvalues of $M$ are in the interval $[0,1]$. This gives the conclusion. $\Box$

Proof of Lemma 5. Note that
$$\mathrm{tr}(F_1F_2) = \mathrm{tr}(F_2F_1) \tag{125}$$
for any matrices $F_1$ and $F_2$. Write $M_1 = H'\,\mathrm{diag}(1,\dots,1,0,\dots,0)\,H$, where $H$ is an orthogonal matrix and the number of 1's is equal to $r := \mathrm{rank}(M_1)$. Recall (125). Then
$$\mathrm{tr}(M_1M_2) = \mathrm{tr}\big[\mathrm{diag}(1,\dots,1,0,\dots,0)\,HM_2H'\big] = \mathrm{tr}\big(\text{the upper-left } r\times r \text{ submatrix of } HM_2H'\big) \le \mathrm{tr}(HM_2H') = \mathrm{tr}(M_2)$$
by (125), where the inequality is obtained because $HM_2H'$ is non-negative definite and hence all of its diagonal entries are non-negative. The conclusion follows. $\Box$

Proof of Lemma 6. Pick a non-negative definite matrix $M_1^{1/2}$ such that $M_1^{1/2}\cdot M_1^{1/2} = M_1$. Recall the fact that $AB$ and $BA$ have the same eigenvalues for any square matrices $A$ and $B$. Then $M_1M_2$ and $M_1^{1/2}M_2M_1^{1/2}$ have the same eigenvalues.
Since the latter is readily seen to be a non-negative definite matrix, we know that all of the eigenvalues of $M_1M_2$ are non-negative. In particular, $\mathrm{tr}(M_1M_2)\ge0$, which settles the case $r = 0$. We next assume $r\ge1$. Let $M\ne0$ be an $n\times n$ real matrix. Assume all its eigenvalues are real and the non-zero eigenvalues are $\lambda_1,\dots,\lambda_v$ with $1\le v\le n$. Then
$$\mathrm{tr}(M^2) = \lambda_1^2+\cdots+\lambda_v^2 \ge \frac1v\,(\lambda_1+\cdots+\lambda_v)^2 = \frac1v\,[\mathrm{tr}(M)]^2. \tag{126}$$
From the singular value decomposition theorem (see, e.g., p. 150 from [17]), we see $v$ is at most the number of non-zero singular values of $M$. Let $s_1\ge\cdots\ge s_n$ be the singular values of $M$, that is, the eigenvalues of $(M'M)^{1/2}$. Assume $|\lambda_1|\ge\cdots\ge|\lambda_v|$ without loss of generality. We then have from the Weyl inequality that $|\lambda_1\cdots\lambda_k| \le s_1\cdots s_k$ for all $1\le k\le n$; see, for example, p. 454 from [17]. This implies that $v$ is no more than the number of non-zero eigenvalues of $(M'M)^{1/2}$, which is the same as the number of non-zero eigenvalues of $M'M$, which is again equal to $\mathrm{rank}(M'M) = \mathrm{rank}(M)$. That is, $v\le\mathrm{rank}(M)$. This and (126) yield the desired conclusion by taking $M = M_1M_2$. $\Box$

Proof of Lemma 8. First,
$$\mathrm{Var}(d'Md) = E\big[(d'Md)^2\big] - \big[E(d'Md)\big]^2.$$
Then (iii) follows if (i) and (ii) are valid. Let us prove (i) and (ii) next. Write $M = H'\,\mathrm{diag}(\lambda_1,\dots,\lambda_m)\,H$, where $H$ is an orthogonal matrix. Set $\eta = (Z_1,\dots,Z_m)'$. Observe $Hd = \frac{H\eta}{\|H\eta\|}$. By the orthogonal invariance of Gaussian distributions, $H\eta$ and $\eta$ have the same distribution, so do $Hd$ and $d$. As a consequence, $d'Md$ and $\frac{\lambda_1Z_1^2+\cdots+\lambda_mZ_m^2}{Z_1^2+\cdots+Z_m^2}$ have a common distribution. Easily,
$$E\,\frac{\lambda_1Z_1^2+\cdots+\lambda_mZ_m^2}{Z_1^2+\cdots+Z_m^2} = E\,\frac{Z_1^2}{Z_1^2+\cdots+Z_m^2}\cdot\sum_{i=1}^m\lambda_i = \frac1m\cdot\mathrm{tr}(M)$$
by Lemma 7 with $a_1 = 1$ and the other $a_i$'s equal to zero. We get (i).
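The matrix facts of Lemmas 4-5 and the mean identity $E(d'Md) = \mathrm{tr}(M)/m$ just proved are both directly checkable. The sketch below is illustrative only (dimensions, matrices and sample size are assumptions, not values from the paper): it verifies that $M_{ij} = (U_j'U_i)'(U_j'U_i)$ has eigenvalues in $[0,1]$, that $\mathrm{tr}(M_1M_2)\le\mathrm{tr}(M_2)$ for such an $M_1$ and non-negative definite $M_2$, and it confirms the sphere-average identity by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(5)
T, m = 30, 6

# U_i, U_j with orthonormal columns, as in Lemma 4
U_i = np.linalg.qr(rng.standard_normal((T, m)))[0]
U_j = np.linalg.qr(rng.standard_normal((T, m)))[0]
M_ij = (U_j.T @ U_i).T @ (U_j.T @ U_i)
eig = np.linalg.eigvalsh(M_ij)          # should lie in [0, 1]

# Lemma 5: tr(M1 M2) <= tr(M2) for M1 with eigenvalues in [0,1], M2 psd
B = rng.standard_normal((m, m))
M2 = B @ B.T
t12 = float(np.trace(M_ij @ M2))
t2 = float(np.trace(M2))

# Lemma 8(i): E(d' M d) = tr(M)/m for d uniform on S^{m-1}
A = rng.standard_normal((m, m))
M = (A + A.T) / 2.0                      # symmetric test matrix
G = rng.standard_normal((200_000, m))
D = G / np.linalg.norm(G, axis=1, keepdims=True)
mc_mean = float(np.mean(np.einsum("ni,ij,nj->n", D, M, D)))
exact = float(np.trace(M)) / m
```

The eigenvalue and trace checks hold exactly (up to floating point), while the Monte Carlo mean matches $\mathrm{tr}(M)/m$ to within sampling error.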
Now, use the formula $(a_1+\cdots+a_m)^2 = \sum_{i=1}^m a_i^2 + 2\sum_{1\le i<j\le m}a_ia_j$ together with
$$E\,\frac{Z_1^4}{(Z_1^2+\cdots+Z_m^2)^2} = \frac{3}{m(m+2)} \quad\text{and}\quad E\,\frac{Z_1^2Z_2^2}{(Z_1^2+\cdots+Z_m^2)^2} = \frac{1}{m(m+2)}. \tag{127}$$
Hence,
$$E\big[(d'Md)^2\big] = \frac{3}{m(m+2)}\cdot\sum_{i=1}^m\lambda_i^2 + \frac{2}{m(m+2)}\cdot\sum_{1\le i<j\le m}\lambda_i\lambda_j = \frac{1}{m(m+2)}\Big[2\,\mathrm{tr}(M^2) + \big(\mathrm{tr}(M)\big)^2\Big].$$
Next, write
$$d'Md - \frac1m\mathrm{tr}(M) = \frac{(\lambda_1-\bar\lambda)Z_1^2+\cdots+(\lambda_m-\bar\lambda)Z_m^2}{Z_1^2+\cdots+Z_m^2},$$
where $\bar\lambda = \frac{\lambda_1+\cdots+\lambda_m}{m}$. For clarity, set $a_i = \lambda_i-\bar\lambda$ and $\xi_i = Z_i^2-1$ for $i=1,\dots,m$. By Hölder's inequality,
$$E\Big[d'Md - \frac1m\mathrm{tr}(M)\Big]^\tau \le \big(E|a_1\xi_1+\cdots+a_m\xi_m|^{2\tau}\big)^{1/2}\cdot\big[E(Z_1^2+\cdots+Z_m^2)^{-2\tau}\big]^{1/2}. \tag{129}$$
From (27), there exists a constant $K_\tau > 0$ depending on $\tau$ only such that
$$E|a_1\xi_1+\cdots+a_m\xi_m|^{2\tau} \le K_\tau\cdot E\big(a_1^2\xi_1^2+\cdots+a_m^2\xi_m^2\big)^\tau.$$
Set $b_i = a_i^2(a_1^2+\cdots+a_m^2)^{-1}$ for $i=1,\dots,m$. Then $b_1+\cdots+b_m = 1$. Notice $\varphi(x) := x^\tau$ is convex over $[0,\infty)$ since $\tau>1$. Then
$$\big(b_1\xi_1^2+\cdots+b_m\xi_m^2\big)^\tau \le b_1|\xi_1|^{2\tau}+\cdots+b_m|\xi_m|^{2\tau}.$$
This implies that
$$\big(a_1^2\xi_1^2+\cdots+a_m^2\xi_m^2\big)^\tau \le (a_1^2+\cdots+a_m^2)^{\tau-1}\cdot\big(a_1^2|\xi_1|^{2\tau}+\cdots+a_m^2|\xi_m|^{2\tau}\big).$$
Hence
$$E|a_1\xi_1+\cdots+a_m\xi_m|^{2\tau} \le K_\tau\cdot(a_1^2+\cdots+a_m^2)^{\tau-1}\cdot\big[a_1^2E|\xi_1|^{2\tau}+\cdots+a_m^2E(|\xi_m|^{2\tau})\big] = K_\tau\cdot(a_1^2+\cdots+a_m^2)^\tau\cdot E(|\xi_1|^{2\tau}). \tag{130}$$
Now we bound the last term in (129). Since $Z_1^2+\cdots+Z_m^2$ has the $\chi^2$ distribution with $m$ degrees of freedom,
$$E(Z_1^2+\cdots+Z_m^2)^{-2\tau} = \frac{1}{2^{m/2}\Gamma(m/2)}\int_0^\infty x^{-2\tau}\cdot x^{(m/2)-1}e^{-x/2}\,dx = \frac{2^{(m/2)-2\tau}\,\Gamma((m/2)-2\tau)}{2^{m/2}\,\Gamma(m/2)}.$$
It is known that
$$\lim_{x\to\infty}\frac{\Gamma(x+a)}{x^a\Gamma(x)} = 1$$
for any $a\in\mathbb{R}$; see, e.g., Lemma 2.4 from [14]. Therefore, there exists a constant $K'_\tau$ such that
$$E(Z_1^2+\cdots+Z_m^2)^{-2\tau} \le \frac{K'_\tau}{m^{2\tau}} \tag{131}$$
for every $m\ge4\tau+1$, in which case $\Gamma((m/2)-2\tau)$ is finite. This, (129) and (130) conclude
$$E\Big[d'Md - \frac1m\mathrm{tr}(M)\Big]^\tau \le C_\tau\cdot(a_1^2+\cdots+a_m^2)^{\tau/2}\cdot\frac{1}{m^\tau}$$
for all $m\ge4\tau+1$, where $C_\tau$ is a constant depending on $\tau$ only.
Trivially,

\sum_{i=1}^m a_i^2 = \sum_{i=1}^m (\lambda_i - \bar\lambda)^2 = \sum_{i=1}^m \lambda_i^2 - (1/m)(\sum_{i=1}^m \lambda_i)^2 = tr(M^2) - (1/m)[tr(M)]^2.

The lemma is proved. \Box

Proof of Lemma 10. From Lemma 8, E(d'Md) = (1/m) tr(M). The second conclusion comes from Lemma 9 directly by using the formula (x+y)^\tau \le 2^{\tau-1}(x^\tau + y^\tau) for all x \ge y \ge 0. Since b := a/\|a\| is a unit vector, b'd and Z_1(Z_1^2 + \cdots + Z_m^2)^{-1/2} have the same distribution; see, for instance, Theorem 1.5.7(i) and (5) on p. 147 from Muirhead (1982). It follows that

E(|a'd|^\tau) = \|a\|^\tau \cdot E[|Z_1|^\tau/(Z_1^2 + \cdots + Z_m^2)^{\tau/2}] \le \|a\|^\tau \cdot (E|Z_1|^{2\tau})^{1/2} \cdot [E(Z_1^2 + \cdots + Z_m^2)^{-\tau}]^{1/2}.

The first conclusion then follows from (131). \Box

Proof of Lemma 11. (i) Trivially,

(h'Ah)(h'Bh) = (1/2)\{[h'(A+B)h]^2 - (h'Ah)^2 - (h'Bh)^2\}.  (132)

From (ii) of Lemma 8, we know E[(h'Mh)^2] = (1/(m(m+2))) \cdot \{2 tr(M^2) + [tr(M)]^2\} for any symmetric matrix M. Then, by (132),

2m(m+2) \cdot E[(h'Ah)(h'Bh)] = 2 tr((A+B)^2) + (tr(A+B))^2 - [2 tr(A^2) + (tr(A))^2] - [2 tr(B^2) + (tr(B))^2].

A simple manipulation leads to (i).

(ii) By singular value decomposition, write C = H_1' diag(\lambda_1, \cdots, \lambda_m) H_2, where H_1 and H_2 are orthogonal matrices, and where \lambda_1^2, \cdots, \lambda_m^2 are the eigenvalues of CC'. Now h_1'Ch_2 = (H_1h_1)' diag(\lambda_1, \cdots, \lambda_m)(H_2h_2). Since h_1 and h_2 are i.i.d. and orthogonal-invariant, we know H_1h_1 and H_2h_2 are also i.i.d. and have the same distribution as that of h_1. So we are able to write

h_1'Ch_2 = (1/(\|v\| \cdot \|w\|)) \cdot \sum_{i=1}^m \lambda_iv_iw_i,

where v = (v_1, \cdots, v_m), w = (w_1, \cdots, w_m) and \{v_i, w_i; 1 \le i \le m\} are i.i.d.
N(0, 1) random variables. Consequently,

Var[(h_1'Ch_2)^2] \le E[(h_1'Ch_2)^4] \le (E[\|v\|^{-8}\|w\|^{-8}])^{1/2} \cdot [E(\sum_{i=1}^m \lambda_iv_iw_i)^8]^{1/2} = E(\|v\|^{-8}) \cdot [E(\sum_{i=1}^m \lambda_iv_iw_i)^8]^{1/2},

where the last step follows from independence. By (28),

E(\sum_{i=1}^m \lambda_iv_iw_i)^8 \le Km^2\sum_{i=1}^m \lambda_i^4 = Km^2 \cdot tr[(CC')^2],

where K is a constant. Take \tau = 2 from (131); we have E(\|v\|^{-8}) \le K'm^{-4}, where K' is a constant. This concludes

Var[(h_1'Ch_2)^2] \le K'\sqrt{K} m^{-3} \cdot \sqrt{tr[(CC')^2]}.

(iii) Notice

Cov[(h_1'Ah_2)^2, (h_1'Bh_2)^2] = E[(h_1'Ah_2)^2(h_1'Bh_2)^2] - E[(h_1'Ah_2)^2] \cdot E[(h_1'Bh_2)^2].

Observe E(h_1h_1') = E(h_2h_2') = (1/m)I_m because of the structure of h appearing in Lemma 8. Then, use the fact h_1'Ah_2 = h_2'A'h_1 and independence to have

E[(h_1'Ah_2)^2] = E tr[A(h_2h_2')A'(h_1h_1')] = tr\{A[E(h_2h_2')]A'[E(h_1h_1')]\} = (1/m^2) tr(AA').

The above is also true if A is replaced by B. For a vector a \in R^m, we see that E(a'h_1)^2 = E(h_1'aa'h_1) = (1/m)\|a\|^2 by (i) of Lemma 8. Conditioning on h_1, using independence and by the proved (i), we obtain

E[(h_1'Ah_2)^2(h_1'Bh_2)^2] = (1/m^2) \cdot E(\|A'h_1\|^2\|B'h_1\|^2) = (1/m^2) \cdot E\{[h_1'(AA')h_1] \cdot [h_1'(BB')h_1]\} = (1/(m^3(m+2))) \cdot [2 tr(AA'BB') + tr(AA') \cdot tr(BB')].
Combining all of the above equalities, we have

Cov[(h_1'Ah_2)^2, (h_1'Bh_2)^2] = (1/(m^3(m+2)))[2 tr(AA'BB') + tr(AA') \cdot tr(BB')] - (1/m^4) tr(AA') \cdot tr(BB') = (2/(m^3(m+2))) \cdot tr(AA'BB') - (2/(m^4(m+2))) \cdot tr(AA') \cdot tr(BB').

The proof is completed. \Box

Proof of Lemma 12. From Lemma 3 and the fact that P_i\epsilon_i/\|P_i\epsilon_i\| = U_is_i, \{\hat\rho_{ij}; 1 \le i < j \le N\} has the same distribution as that of \{s_i'U_i'U_js_j; 1 \le i < j \le N\}. We will use this fact repeatedly to prove the results next. By independence, E[\hat\rho_{ij}|s_i] = s_i'U_i'U_j \cdot Es_j = 0 for i \ne j. Hence E\hat\rho_{ij} = 0. Since s_i'U_i'U_js_j = (s_i'U_i'U_js_j)' = s_j'U_j'U_is_i \in R, we have

\hat\rho_{ij}^2 = s_j'(U_j'U_is_is_i'U_i'U_j)s_j.  (133)

Let B = U_j'U_is_is_i'U_i'U_j. Conditioning on s_i, we see from independence that

E[s_j'(U_j'U_is_is_i'U_i'U_j)s_j | s_i] = E(s_j'Bs_j | s_i) = (1/m) tr(B)  (134)

by Lemma 8. By (125), tr(B) = s_i'(U_i'U_jU_j'U_i)s_i. The above assertions conclude that

E[s_j'(U_j'U_is_is_i'U_i'U_j)s_j | s_i] = (1/m) \cdot s_i'M_{ij}s_i  (135)

by the notation M_{ij} = U_i'U_jU_j'U_i. Combining (133) and (135) together, we obtain E[\hat\rho_{ij}^2|s_i] = (1/m) \cdot s_i'M_{ij}s_i. Now, taking a further expectation, we have from Lemma 8 again that E\hat\rho_{ij}^2 = (1/m^2) \cdot tr(M_{ij}). By (30), U_iU_i' = P_i. By (125),

tr(M_{ij}) = tr(U_iU_i'U_jU_j') = tr(P_iP_j).  (136)

We get the second conclusion from (ii).
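The identity E(s'Bs) = tr(B)/m for s uniform on the unit sphere of R^m, used at (134), can be verified exactly in the case m = 2 by quadrature over the circle. This is an illustration under the uniform-on-sphere assumption only; the matrix B is a hypothetical example and need not be symmetric.

```python
import math

# An arbitrary 2x2 matrix; tr(B)/2 should equal the spherical average of s'Bs
B = [[2.0, -1.0], [0.5, 3.0]]

n = 20000
acc = 0.0
for k in range(n):
    t = 2.0 * math.pi * (k + 0.5) / n   # midpoint rule on [0, 2*pi)
    s = (math.cos(t), math.sin(t))      # uniform point on the unit circle
    # quadratic form s' B s
    q = (s[0] * (B[0][0] * s[0] + B[0][1] * s[1])
         + s[1] * (B[1][0] * s[0] + B[1][1] * s[1]))
    acc += q
mean = acc / n

trace_half = (B[0][0] + B[1][1]) / 2.0  # tr(B)/m with m = 2
assert abs(mean - trace_half) < 1e-6
```

Equally spaced quadrature is exact for trigonometric polynomials of degree below n, and s'Bs has degree 2, so the agreement here is limited only by rounding.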
\Box

In the following we will use the conditional variance Var(\xi_1|\xi_2), which is defined by E(\xi_1^2|\xi_2) - [E(\xi_1|\xi_2)]^2 for any random variables \xi_1 and \xi_2.

Proof of Lemma 13. (i) Review (133) and the notation B = U_j'U_is_is_i'U_i'U_j. Then

\hat\rho_{ij}^4 = (s_j'Bs_j)^2.  (137)

Since s_is_i' is a rank-one matrix, we know the rank of B is no more than 1. As a consequence, tr(B^2) = [tr(B)]^2 = (s_i'M_{ij}s_i)^2 since tr(B) = s_i'M_{ij}s_i by (125). Use independence and Lemma 8 to yield

E[(s_j'Bs_j)^2 | s_i] = (3/(m(m+2))) \cdot (s_i'M_{ij}s_i)^2,  (138)

and hence

E[\hat\rho_{ij}^4 | s_i] = (3/(m(m+2))) \cdot (s_i'M_{ij}s_i)^2  (139)

by (137). We obtain (i).

(ii) Taking expectations for both sides of (139), we get from Lemma 8(ii) that

E(\hat\rho_{ij}^4) = (3/(m(m+2))) \cdot E(s_i'M_{ij}s_i)^2 = (3/(m^2(m+2)^2)) \cdot \{2 tr(M_{ij}^2) + [tr(M_{ij})]^2\}.

By (136), tr(M_{ij}) = tr(P_iP_j). Also, from (30) and (125),

tr(M_{ij}^2) = tr(U_i'U_jU_j'U_iU_i'U_jU_j'U_i) = tr[(P_iP_j)^2].

We have proved (ii).

(iii) Notice

Var(\hat\rho_{ij}^2|s_i) = E[\hat\rho_{ij}^4|s_i] - [E(\hat\rho_{ij}^2|s_i)]^2 = (3/(m(m+2))) \cdot (s_i'M_{ij}s_i)^2 - [(1/m) \cdot s_i'M_{ij}s_i]^2 = (2(m-1)/(m^2(m+2))) \cdot (s_i'M_{ij}s_i)^2

by (i) proved above and (ii) from Lemma 12.

(iv) By (i) of Lemma 12 and (ii) proved above,

Var(\hat\rho_{ij}^2) = (3/(m^2(m+2)^2)) \cdot \{2 tr[(P_iP_j)^2] + [tr(P_iP_j)]^2\} - [(1/m^2) \cdot tr(P_iP_j)]^2 = (6/(m^2(m+2)^2)) \cdot tr[(P_iP_j)^2] + (2(m^2-2m-2)/(m^4(m+2)^2)) \cdot [tr(P_iP_j)]^2.

We finish the proof. \Box

A.3. Proofs of auxiliary results in Section 7.1.3. Review the interpretation of the constant C before the statement of Lemma 14.

Proof of Lemma 14. Recall (8).
Set A_i = x_i(x_i'x_i)^{-1}x_i' for 1 \le i \le N. Then A_i is a T \times T idempotent matrix with rank p and tr(A_i) = p for each i. Since P_i = I_T - A_i, we see

P_iP_j = I_T + B_{ij},  (140)

where B_{ij} := A_iA_j - A_i - A_j. By Lemma 6,

tr(F_1F_2) \ge 0 for non-negative definite matrices F_1 and F_2.  (141)

As a result, tr(A_iA_j) \ge 0; also tr(A_iA_j) \le p by Lemma 5. Thus,

-2p \le tr(B_{ij}) \le -p.  (142)

Expand [A_iA_j - (A_i + A_j)]^2 and use (125) to see

tr(B_{ij}^2) = tr(A_iA_jA_iA_j) - 2 tr(A_iA_jA_i) - 2 tr(A_jA_iA_j) + tr((A_i + A_j)^2).  (143)

By (141), tr(A_iA_jA_iA_j) = tr[A_i(A_jA_iA_j)] \ge 0 since A_jA_iA_j is non-negative definite. So each trace in (143) is non-negative. Also, tr((A_i + A_j)^2) = tr(A_i) + 2 tr(A_iA_j) + tr(A_j) \le 4p by Lemma 5. Observe tr(A_iA_jA_iA_j) \le tr(A_jA_iA_j), max\{tr(A_iA_jA_i), tr(A_jA_iA_j)\} \le tr(A_iA_j), tr(A_iA_jA_i) \le tr(A_i) and tr(A_jA_iA_j) \le tr(A_j) by Lemma 5. Therefore,

2 tr(A_iA_jA_i) + 2 tr(A_jA_iA_j) \le 4 tr(A_iA_j) \le tr((A_i + A_j)^2) \le 4p.

It follows that

0 \le tr(B_{ij}^2) \le 4p.  (144)

With the above preparation, we now derive the conclusions. In fact, from (140),

[tr(P_iP_j)]^2 = T^2 + 2 tr(B_{ij}) \cdot T + [tr(B_{ij})]^2.

This implies (i) by (142). Now, from (140) again,

tr((P_iP_j)^2) = T + 2 tr(B_{ij}) + tr(B_{ij}^2).  (145)

Then (ii) follows from (142) and (144). Let us show the remaining two claims next. By the definition of B_{ij} and the notation q = |S|,

P_SP_j = [qI_T - \sum_{i \in S} A_i] \cdot (I_T - A_j) = qI_T - qA_j - \sum_{i \in S} A_i + \sum_{i \in S} A_iA_j = qI_T + \sum_{i \in S} B_{ij}.  (146)

Hence,

tr(P_SP_j) = qT + \sum_{i \in S} tr(B_{ij}).

The inequality from (142) implies that \sum_{i \in S} tr(B_{ij}) is between -2pq and -pq. This leads to (iii) by the trivial equality (x+y)^2 = x^2 + 2xy + y^2 for all x, y \in R.
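The structural facts behind (140)-(142) — A_i idempotent with tr(A_i) = p, and T - 2p \le tr(P_iP_j) \le T - p — can be illustrated with random design matrices. This is a numerical sketch with hypothetical data, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
T, p = 30, 3

def hat_matrix(x):
    # A = x (x'x)^{-1} x', the orthogonal projection onto the columns of x
    return x @ np.linalg.solve(x.T @ x, x.T)

A1 = hat_matrix(rng.standard_normal((T, p)))
A2 = hat_matrix(rng.standard_normal((T, p)))
P1, P2 = np.eye(T) - A1, np.eye(T) - A2

# A_i is idempotent with trace p, hence tr(P_i) = T - p
assert np.allclose(A1 @ A1, A1)
assert abs(np.trace(A1) - p) < 1e-8

# tr(P_i P_j) = T + tr(B_ij) with -2p <= tr(B_ij) <= -p, i.e. (142)
t = np.trace(P1 @ P2)
assert T - 2 * p - 1e-8 <= t <= T - p + 1e-8
```

The slack inside the interval is tr(A_1A_2) \in [0, p], which measures the overlap of the two column spaces.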
To get (iv), we start from (146) again such that

(1/q^2)(P_SP_j)^2 = I_T + (2/q)(\sum_{i \in S} B_{ij}) + ((1/q)\sum_{i \in S} B_{ij})^2.

Then

(1/q^2) tr((P_SP_j)^2) = T + (2/q)[\sum_{i \in S} tr(B_{ij})] + tr[((1/q)\sum_{i \in S} B_{ij})^2] := T + C_{ij}.  (147)

By the Cauchy-Schwarz inequality, for any T \times T matrix M = (m_{ij})_{T \times T}, we have |tr(M^2)| = |\sum_{1 \le i,j \le T} m_{ij}m_{ji}| \le \sum_{1 \le i,j \le T} m_{ij}^2 = \|M\|_F^2, where \|M\|_F := \sqrt{tr(MM')} is the Frobenius norm of M. By the triangle inequality and then Lemma 5, \|B_{ij}\|_F \le \|A_iA_j\|_F + \|A_i\|_F + \|A_j\|_F \le 3\sqrt{p}. It follows that

|tr[((1/q)\sum_{i \in S} B_{ij})^2]|^{1/2} \le \|(1/q)\sum_{i \in S} B_{ij}\|_F \le (1/q)\sum_{i \in S} \|B_{ij}\|_F \le 3\sqrt{p}.  (148)

This and (142) conclude that |C_{ij}| \le 13p for all i, j. We then get (iv) from (147).

Now we prove (v). Obviously (i) and (ii) still hold if the symbol "T" is replaced by "m". On the other hand, by the triangle inequality and the facts T = m + p and T^2 - m^2 = 2mp + p^2,

(1/(mq^2)) \cdot |[tr(P_SP_j)]^2 - m^2q^2| \le (1/(mq^2)) \cdot \{|[tr(P_SP_j)]^2 - T^2q^2| + (2mp + p^2)q^2\} = (1 + p/m) \cdot (1/(Tq^2)) \cdot |[tr(P_SP_j)]^2 - T^2q^2| + 2p + p^2/m.

Since p is fixed, (iii) is also true if "T" is replaced by "m". The remaining part of (v) is obtained similarly. The constant K is taken to be the maximum of the five bounds in (i)-(v). \Box

Proof of Lemma 15. By (30), U_iU_i' = P_i. Use this fact and (125) to see tr(M_{ij}) = tr(P_iP_j) and tr(M_{ij}^2) = tr[(P_iP_j)^2]. Let \xi = e'M_{ij}e.
By Lemma 8, E\xi = (1/m) tr(P_iP_j) and

Var(\xi) = (2/(m(m+2))) \cdot \{tr[(P_iP_j)^2] - (1/m) \cdot [tr(P_iP_j)]^2\}.

By taking \alpha = 4 in Lemma 9, we get

E[(\xi - E\xi)^4] \le Cm^{-4} \cdot \{tr[(P_iP_j)^2] - (1/m) \cdot [tr(P_iP_j)]^2\}^2,

as m \ge 4\alpha + 1, that is, T \ge p + 17 since m = T - p and \alpha = 4. Notice rank(P_iP_j) \le rank(P_i) = m. By Lemma 6, Lemma 14(v) and the triangle inequality,

0 \le tr[(P_iP_j)^2] - (1/m) \cdot [tr(P_iP_j)]^2 \le C.

This says that max\{Var(\xi), \sqrt{E[(\xi - E\xi)^4]}\} \le Cm^{-2}. From Lemma 5, E\xi \le 1. Recall T = m + p. By (i) of Lemma 14, there exists a constant K > 0 such that

E\xi = (1/m) tr(P_iP_j) \ge (1/m)\sqrt{T^2 - TK} \ge C

for all sufficiently large N, since T = T_N \to \infty as N \to \infty, where C > 0 is free of N, p and T. By using the above two inequalities and Lemma 1, we see

E[(\xi^2 - (E\xi)^2)^2] \le C \cdot [Var(\xi) + \sqrt{E(|\xi - E\xi|^4)}] \le Cm^{-2}.

The proof is completed. \Box

Proof of Lemma 16. The second expression of X_j follows from Lemma 12. Now we start to compute E(X_j^2). Evidently, s_i'U_i'U_js_j = s_j'U_j'U_is_i \in R. We have from (133) that

\hat\rho_{ij}^2 = s_j'(U_j'U_is_is_i'U_i'U_j)s_j = s_i'(U_i'U_js_js_j'U_j'U_i)s_i.

Let H_{ij} = U_i'U_js_js_j'U_j'U_i. Recall M_{ij} = U_i'U_jU_j'U_i. Write

(1/T)X_j = \sum_{i=1}^{j-1} [s_i'H_{ij}s_i - (1/m)s_i'M_{ij}s_i]  (149)

for 2 \le j \le N. Given s_j, the conditional mean of the term in the sum above is equal to

E(s_i'H_{ij}s_i | s_j) - (1/m)E(s_i'M_{ij}s_i) = (1/m) \cdot tr(H_{ij}) - (1/m^2) \cdot tr(M_{ij}) = (1/m)s_j'M_{ji}s_j - (1/m^2) \cdot tr(P_iP_j)

by Lemma 8 and (30).
Observe that, given s_j, the terms in the sum from (149) are independent. Also, it is true that E(\psi + c)^2 = Var(\psi) + c^2 for any random variable \psi with mean zero and constant c. Thus,

E[(1/T^2)X_j^2 | s_j] = \sum_{i=1}^{j-1} Var[(s_i'H_{ij}s_i - (1/m)s_i'M_{ij}s_i) | s_j] + [\sum_{i=1}^{j-1} ((1/m)s_j'M_{ji}s_j - (1/m^2) \cdot tr(P_iP_j))]^2.  (150)

In what follows we examine the two terms on the right-hand side carefully. Write s_i'H_{ij}s_i - (1/m)s_i'M_{ij}s_i = s_i'D_{ij}s_i, where D_{ij} := H_{ij} - M_{ij}/m. Define

\Upsilon_{ij} = Var[(s_i'H_{ij}s_i - (1/m)s_i'M_{ij}s_i) | s_j].

Therefore, we get from Lemma 8 that

\Upsilon_{ij} = (2/(m(m+2))) \cdot tr(D_{ij}^2) - (2/(m^2(m+2))) \cdot [tr(D_{ij})]^2.  (151)

First, tr(D_{ij}) = s_j'M_{ji}s_j - (1/m) \cdot tr(P_iP_j), hence

[tr(D_{ij})]^2 = (s_j'M_{ji}s_j)^2 - (2 tr(P_iP_j)/m) \cdot s_j'M_{ji}s_j + (1/m^2) \cdot [tr(P_iP_j)]^2.  (152)

Second, tr(D_{ij}^2) = tr(H_{ij}^2) - 2 tr(H_{ij}M_{ij})/m + tr(M_{ij}^2)/m^2. Observe the rank of s_js_j' is at most one; since H_{ij} = U_i'U_j(s_js_j')U_j'U_i, we know rank(H_{ij}) \le 1. As a consequence,

tr(H_{ij}^2) = [tr(H_{ij})]^2 = (s_j'M_{ji}s_j)^2.

Now, by the definition of M_{ij} and the fact U_iU_i' = P_i in (30),

tr(H_{ij}M_{ij}) = tr(U_i'U_js_js_j'U_j'U_iU_i'U_jU_j'U_i) = s_j'M_{ji}^2s_j;
tr(M_{ij}^2) = tr(U_i'U_jU_j'U_iU_i'U_jU_j'U_i) = tr((P_iP_j)^2),  (153)

where (125) is used above. Combining the above identities to see

tr(D_{ij}^2) = (s_j'M_{ji}s_j)^2 - (2/m) \cdot s_j'M_{ji}^2s_j + tr((P_iP_j)^2)/m^2.
This together with (151) and (152) implies that

\Upsilon_{ij} = (2/(m(m+2)))[(s_j'M_{ji}s_j)^2 - (2/m) \cdot s_j'M_{ji}^2s_j + tr((P_iP_j)^2)/m^2] - (2/(m^2(m+2)))\{(s_j'M_{ji}s_j)^2 - (2 tr(P_iP_j)/m) \cdot s_j'M_{ji}s_j + (1/m^2) \cdot [tr(P_iP_j)]^2\}.

By a trivial sorting, we obtain

\Upsilon_{ij} = (2(m-1)/(m^2(m+2))) \cdot (s_j'M_{ji}s_j)^2 - (4/(m^2(m+2))) \cdot s_j'M_{ji}^2s_j + (4 tr(P_iP_j)/(m^3(m+2))) \cdot s_j'M_{ji}s_j + (2/(m^3(m+2))) \cdot tr((P_iP_j)^2) - (2/(m^4(m+2))) \cdot [tr(P_iP_j)]^2.  (154)

Now we analyze the expectation of each term above in order to compute the mean of the conditional variance. It is easy to check

tr(M_{ji}M_{jk}) = tr(P_iP_jP_kP_j)  (155)

for any 1 \le i, j, k \le N. Now, by Lemma 8,

E(s_j'M_{ji}s_j)^2 = (1/(m(m+2))) \cdot \{2 tr(M_{ji}^2) + [tr(M_{ji})]^2\} = (1/(m(m+2))) \cdot \{2 tr((P_iP_j)^2) + [tr(P_iP_j)]^2\}

since tr(M_{ji}) = tr(P_iP_j) and tr(M_{ji}^2) = tr((P_iP_j)^2). By Lemma 8 again,

E(s_j'M_{ji}^2s_j) = (1/m) \cdot tr(M_{ji}^2) = (1/m) \cdot tr((P_iP_j)^2); E(s_j'M_{ji}s_j) = (1/m) \cdot tr(M_{ji}) = (1/m) \cdot tr(P_iP_j).

Take expectations for both sides of (154) and use the above facts to see

E\Upsilon_{ij} = tr((P_iP_j)^2) \cdot [4(m-1)/(m^3(m+2)^2) - 4/(m^3(m+2)) + 2/(m^3(m+2))] + [tr(P_iP_j)]^2 \cdot [2(m-1)/(m^3(m+2)^2) + 4/(m^4(m+2)) - 2/(m^4(m+2))] = (2(m-4)/(m^3(m+2)^2)) \cdot tr((P_iP_j)^2) + ((2m^2+4)/(m^4(m+2)^2)) \cdot [tr(P_iP_j)]^2.  (156)

Now we turn to study the mean of the last term from (150). Write

\Xi_j := \sum_{i=1}^{j-1} ((1/m)s_j'M_{ji}s_j - (1/m^2) \cdot tr(P_iP_j)) = (1/m)s_j'M_{j*}s_j - (1/m^2) \cdot tr(P_{j*}P_j)

for 2 \le j \le N, where we define

M_{j*} = \sum_{i=1}^{j-1} M_{ji} and P_{j*} = \sum_{i=1}^{j-1} P_i.  (157)

By using Lemma 8,

E[(1/m)s_j'M_{j*}s_j] = tr(M_{j*})/m^2 = (1/m^2)\sum_{i=1}^{j-1} tr(M_{ji}).
Since tr(M_{ji}) = tr(P_iP_j), the above is equal to

(1/m^2)\sum_{i=1}^{j-1} tr(P_iP_j) = (1/m^2) \cdot tr(P_{j*}P_j).

As a byproduct,

tr(M_{j*}) = tr(P_{j*}P_j).  (158)

Therefore

E(\Xi_j^2) = (1/m^2) \cdot Var(s_j'M_{j*}s_j) = (2/(m^3(m+2))) \cdot \{tr(M_{j*}^2) - (1/m) \cdot [tr(M_{j*})]^2\}  (159)

by Lemma 8. Note that

tr(M_{j*}^2) = tr[(\sum_{i=1}^{j-1} M_{ji})^2] = \sum_{1 \le i,k \le j-1} tr(P_iP_jP_kP_j) = tr((P_{j*}P_j)^2)  (160)

by (155). This, (158) and (159) conclude

E(\Xi_j^2) = (2/(m^3(m+2))) \cdot \{tr((P_{j*}P_j)^2) - (1/m) \cdot [tr(P_{j*}P_j)]^2\}.  (161)

Reviewing the notations \Xi_j and \Upsilon_{ij}, the conclusion follows from (150), (156) and (161). \Box

Proof of Lemma 17. By Lemma 16,

(1/T^2)E(X_j^2) = (2(m-4)/(m^3(m+2)^2))\sum_{i=1}^{j-1} tr((P_iP_j)^2) + ((2m^2+4)/(m^4(m+2)^2))\sum_{i=1}^{j-1} [tr(P_iP_j)]^2 + (2/(m^3(m+2))) \cdot \{tr((P_{j*}P_j)^2) - (1/m) \cdot [tr(P_{j*}P_j)]^2\}

for 2 \le j \le N. We next analyze the above three terms. Review m = T - p. From Lemma 14(v), there exists a constant K not depending on T or N such that

(1/(m(j-1)^2)) \cdot |[tr(P_{j*}P_j)]^2 - m^2(j-1)^2| \le K and (1/(j-1)^2) \cdot |tr((P_{j*}P_j)^2) - m(j-1)^2| \le K

for all 2 \le j \le N. By the triangle inequality,

|tr((P_{j*}P_j)^2) - (1/m) \cdot [tr(P_{j*}P_j)]^2| \le 2Kj^2  (162)

for 2 \le j \le N. This, (i) and (ii) from Lemma 14 imply

(1/T^2)E(X_j^2) \le (2(m-4)/(m^3(m+2)^2))(T + K)(j-1) + ((2m^2+4)/(m^4(m+2)^2))(T^2 + KT)(j-1) + 4Kj^2/(m^3(m+2)).
It follows that

(1/T^2)\sum_{j=2}^N E(X_j^2) \le (2(m-4)/(m^3(m+2)^2))(T + K) \cdot (1/2)(N^2 - N) + ((2m^2+4)/(m^4(m+2)^2))(T^2 + KT) \cdot (1/2)(N^2 - N) + (4K/(m^3(m+2))) \cdot (1/6)N(N+1)(2N+1).

Similarly, by the lower bound from (162),

(1/T^2)\sum_{j=2}^N E(X_j^2) \ge (2(m-4)/(m^3(m+2)^2))(T - K) \cdot (1/2)(N^2 - N) + ((2m^2+4)/(m^4(m+2)^2))(T^2 - KT) \cdot (1/2)(N^2 - N) - (4K/(m^3(m+2))) \cdot [(1/6)N(N+1)(2N+1) - 1].

Inspecting the above two bounds carefully, the dominating term is

((2m^2+4)/(m^4(m+2)^2)) T^2 \cdot (1/2)(N^2 - N) = (T^2N^2/m^4)(1 + o(1)),

provided (4K/(m^3(m+2))) \cdot (1/6)N(N+1)(2N+1) = o(T^2N^2/m^4). This is equivalent to N = o(T^2). Therefore

(1/T^2)\sum_{j=2}^N E(X_j^2) = (T^2N^2/m^4)(1 + o(1))

as N \to \infty. Consequently,

(1/N^2)\sum_{j=2}^N E(X_j^2) \to 1 as N \to \infty

since m = T - p and p is fixed. \Box

Proof of Lemma 18. Set

R_j = (\sum_{i=1}^{j-1} s_i'M_{ij}s_i)^2, 2 \le j \le N.

Then

Var[\sum_{j=2}^N (\sum_{i=1}^{j-1} s_i'M_{ij}s_i)^2] = Var(\sum_{j=2}^N R_j) \le (N - 1)[\sum_{j=2}^N Var(R_j)]  (163)

by the convexity of the function f(x) := x^2 for x \in R. We next calculate Var(R_j) for each j. Fix 2 \le j \le N. For simplicity of notation, set \xi = \sum_{i=1}^{j-1} s_i'M_{ij}s_i. Then R_j = \xi^2. Note that tr(M_{ij}) = tr(P_iP_j) by (136) and tr(M_{ij}^2) = tr[(P_iP_j)^2] from (155). Then

E\xi = (1/m)\sum_{i=1}^{j-1} tr(M_{ij}) = (1/m)\sum_{i=1}^{j-1} tr(P_iP_j) = (1/m) tr(P_{j*}P_j),  (164)

where P_{j*} is defined in (157). By independence and Lemma 8, we obtain

Var(\xi) = \sum_{i=1}^{j-1} Var(s_i'M_{ij}s_i) = (2/(m(m+2))) \cdot \sum_{i=1}^{j-1} \{tr(M_{ij}^2) - (1/m) \cdot [tr(M_{ij})]^2\} = (2/(m(m+2))) \cdot \sum_{i=1}^{j-1} \{tr[(P_iP_j)^2] - (1/m) \cdot [tr(P_iP_j)]^2\}.  (165)
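The closed forms used when summing the per-j bounds, \sum_{j=2}^N (j-1) = (N^2 - N)/2 and \sum_{j=1}^N j^2 = N(N+1)(2N+1)/6, can be confirmed directly (a trivial check, included only to pin the constants):

```python
N = 57

# closed forms for the two sums appearing in the upper and lower bounds
assert sum(j - 1 for j in range(2, N + 1)) == (N * N - N) // 2
assert sum(j * j for j in range(1, N + 1)) == N * (N + 1) * (2 * N + 1) // 6
```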
Furthermore, by (28) and then Lemma 9,

E[(\xi - E\xi)^4] \le (Kj) \cdot \sum_{i=1}^{j-1} E[s_i'M_{ij}s_i - E(s_i'M_{ij}s_i)]^4 \le (K'j/m^4)\sum_{i=1}^{j-1} \{tr(M_{ij}^2) - (1/m)[tr(M_{ij})]^2\}^2 = (K'j/m^4)\sum_{i=1}^{j-1} \{tr[(P_iP_j)^2] - (1/m)[tr(P_iP_j)]^2\}^2  (166)

as m \ge 17. Note T - 2p \le tr(P_iP_j) \le T - p for any 1 \le i < j \le N, which implies that

(1 - p/m)(j-1) \le E\xi \le (j-1)

by the definition of P_{j*} and the notation m = T - p. Now, by (i) and (ii) from Lemma 14,

tr[(P_iP_j)^2] - (1/m)[tr(P_iP_j)]^2 \le tr[(P_iP_j)^2] - (1/T)[tr(P_iP_j)]^2 \le C  (167)

for 1 \le i < j \le N. Hence

(1/2)j \le E\xi \le j, Var(\xi) \le Cj/m^2 and E[(\xi - E\xi)^4] \le Cj^2/m^4

uniformly for all 2 \le j \le N as N is sufficiently large, where the "1/2" appearing in the lower bound of E\xi is not essential; it can be any positive number less than one. We then have from (ii) of Lemma 1 (taking \alpha = 2) that

Var(R_j) = Var(\xi^2) \le C \cdot j^3/m^2

uniformly for all 2 \le j \le N as N is sufficiently large. This implies that

\sum_{j=2}^N Var(R_j) = O(N^4/T^2) as N \to \infty.  \Box

Proof of Lemma 19. Similar to the last inequality from (163), we have

Var\{\sum_{j=2}^N tr[(\sum_{i=1}^{j-1} U_j'U_is_is_i'U_i'U_j)^2]\} \le (N - 1) \cdot \sum_{j=2}^N Var\{tr[(\sum_{i=1}^{j-1} U_j'U_is_is_i'U_i'U_j)^2]\}.
(168)

Use the formula (a_1 + \cdots + a_n)^2 = \sum_{1 \le i,k \le n} a_ia_k for any real numbers a_i's to see

tr[(\sum_{i=1}^{j-1} U_j'U_is_is_i'U_i'U_j)^2] = \sum_{1 \le i,k \le j-1} tr(U_j'U_is_is_i'U_i'U_jU_j'U_ks_ks_k'U_k'U_j) = \sum_{1 \le i,k \le j-1} (s_i'U_i'U_jU_j'U_ks_k)^2

since

tr(U_j'U_is_is_i'U_i'U_jU_j'U_ks_ks_k'U_k'U_j) = tr[(s_i'U_i'U_jU_j'U_ks_k)(s_k'U_k'U_jU_j'U_is_i)] = (s_i'U_i'U_jU_j'U_ks_k)^2

by (125). Set

J_{ijk} := U_i'U_jU_j'U_k  (169)

for all 1 \le i, j, k \le N. Of course, J_{iji} = M_{ij}, which appears in Lemma 4. Furthermore, J_{ijk}' = J_{kji} for all i, j, k. Then s_i'J_{ijk}s_k = (s_i'J_{ijk}s_k)' = s_k'J_{kji}s_i. Thus,

\sum_{1 \le i,k \le j-1} (s_i'U_i'U_jU_j'U_ks_k)^2 = \sum_{i=1}^{j-1} (s_i'M_{ij}s_i)^2 + 2\sum_{1 \le i < k \le j-1} (s_i'J_{ijk}s_k)^2.

A.4. Proofs of auxiliary results in Section 7.2.1. Although the results stated in Section 7.2.1 serve the understanding of sample correlation coefficients \hat\rho_{ij}, their proofs have their own merits.

Proof of Lemma 24. First, by the Chebyshev inequality,

P(|a_1\xi_1 + \cdots + a_m\xi_m| \ge x) \le (1/x^2) \cdot E(a_1\xi_1 + \cdots + a_m\xi_m)^2 = 1/x^2  (179)

since the last expectation is equal to a_1^2 + \cdots + a_m^2 = 1. Let \{\bar\xi_i; 1 \le i \le m\} be an independent copy of \{\xi_i; 1 \le i \le m\}. Then, we see P(|a_1\bar\xi_1 + \cdots + a_m\bar\xi_m| \ge x/2) \le 4/x^2 \le 1/2 for x \ge 3, and hence P(|a_1\bar\xi_1 + \cdots + a_m\bar\xi_m| < x/2) \ge 1/2.
Consequently,

(1/2)P(|a_1\xi_1 + \cdots + a_m\xi_m| \ge x) \le P(|a_1\xi_1 + \cdots + a_m\xi_m| \ge x, |a_1\bar\xi_1 + \cdots + a_m\bar\xi_m| < x/2) \le P(|a_1\eta_1 + \cdots + a_m\eta_m| \ge x/2),  (180)

where \eta_i = \xi_i - \bar\xi_i for 1 \le i \le m. The advantage in doing so is that the \eta_i's are symmetric and i.i.d. random variables with mean 0, variance 2 and E|\eta_1|^\tau < \infty. Set S_m = a_1\eta_1 + \cdots + a_m\eta_m. By a different version of the Hoffmann-Jørgensen inequality (Lemma 2.2 from [27]), for any integer j \ge 1 there exist constants C_j and D_j such that

P(|S_m| \ge x) \le C_j \cdot P(max_{1 \le i \le m} |a_i\eta_i| \ge x/(2j)) + D_j \cdot [P(|S_m| \ge x/(2j))]^j  (181)

for any x > 0. Similar to (179),

P(|S_m| \ge x/(2j)) \le 8j^2/x^2.  (182)

Furthermore,

P(max_{1 \le i \le m} |a_i\eta_i| \ge x/(2j)) \le \sum_{i=1}^m P(|a_i\eta_i| \ge x/(2j)) \le ((2j)^\tau/x^\tau) \cdot E|\eta_1|^\tau \cdot \sum_{i=1}^m |a_i|^\tau \le (2j)^\tau E|\eta_1|^\tau/x^\tau  (183)

since \sum_{i=1}^m |a_i|^\tau \le 1 as \tau \ge 2. Combining (181)-(183), we have

P(|S_m| \ge x) \le C'_jx^{-\tau} + D'_jx^{-2j}

for all x > 0, where C'_j and D'_j are constants depending on j and \tau. Taking an integer j \ge \tau/2, we have

P(|S_m| \ge x) \le Kx^{-\tau}

for x \ge 3, where K is a constant depending on \tau. The desired conclusion follows from (180). \Box

Proof of Lemma 25. By the Taylor expansion, e^y = 1 + y + y^2/2 + (y^3/6)e^\rho for any y \in R, where \rho is between 0 and y. It follows that

e^{\theta\xi} = 1 + \theta\xi + (1/2)\theta^2\xi^2 + (1/6)\theta^3\xi^3e^\rho \le 1 + \theta\xi + (1/2)\theta^2\xi^2 + (1/6)|\theta|^3|\xi|^3e^{\omega|\xi|/2}

for all \theta \in [-\omega/2, \omega/2]. Set \lambda = E(|\xi|^3e^{\omega|\xi|/2}). Then \lambda < \infty since Ee^{\omega|\xi|} < \infty. Since E\xi = 0 and E\xi^2 = 1, it follows that

Ee^{\theta\xi} \le 1 + (1/2)\theta^2 + (\lambda/6)|\theta|^3 \le exp((1/2)\theta^2 + (\lambda/6)|\theta|^3)

for all \theta \in [-\omega/2, \omega/2].
Now, notice |a_i| \le 1 for each i. By the Markov inequality and the above,

P(a_1\xi_1 + \cdots + a_m\xi_m \ge x) \le e^{-\tau x}Ee^{\tau(a_1\xi_1 + \cdots + a_m\xi_m)} = e^{-\tau x}\prod_{i=1}^m Ee^{a_i\tau\xi_i} \le e^{-\tau x}\prod_{i=1}^m exp((1/2)a_i^2\tau^2 + (\lambda/6)|a_i|^3|\tau|^3)

for any x \ge 0 and \tau \in [0, \omega/2]. From the assumption that a_1^2 + \cdots + a_m^2 = 1 we see

P(a_1\xi_1 + \cdots + a_m\xi_m \ge x) \le e^{-\tau x} \cdot exp((1/2)\tau^2 + (\lambda/6)|\tau|^3)

for all \tau \in [0, \omega/2]. By taking \tau = \omega/2, we get

P(a_1\xi_1 + \cdots + a_m\xi_m \ge x) \le e^{\omega^2/8 + \lambda\omega^3/48} \cdot e^{-\omega x/2}

for all x \ge 0. Obviously, the above also holds if "a_i" is replaced by "-a_i". By taking K large enough, depending on \omega and \lambda only, we have that P(|a_1\xi_1 + \cdots + a_m\xi_m| \ge x) \le K \cdot e^{-x/K}. The proof is completed. \Box

Proof of Lemma 26. Since \xi_1 is a subgaussian random variable, there exists \sigma > 0 such that Ee^{t\xi_1} \le e^{\sigma^2t^2/2} for all t > 0. Hence,

P(a_1\xi_1 + \cdots + a_m\xi_m \ge x) \le e^{-tx}E exp(t(a_1\xi_1 + \cdots + a_m\xi_m)) = e^{-tx}\prod_{i=1}^m Ee^{ta_i\xi_i} \le e^{-tx} \cdot e^{(a_1^2 + \cdots + a_m^2)\sigma^2t^2/2} = e^{-tx + \sigma^2t^2/2}

for all t > 0. Take t = x/\sigma^2 to get P(a_1\xi_1 + \cdots + a_m\xi_m \ge x) \le e^{-x^2/(2\sigma^2)}. Similarly, P((-a_1)\xi_1 + \cdots + (-a_m)\xi_m \ge x) \le e^{-x^2/(2\sigma^2)}. The results then follow by taking K = 1/(2\sigma^2). \Box

A.5. Proofs of auxiliary results in Section 7.3.1. Review that S^{T-1} stands for the unit sphere in the T-dimensional Euclidean space.

Proof of Lemma 28. By Theorem 1.5.7(i) and the argument for (5) on p. 147 of [30], the density of s_1's_2 is given by

g(\rho) = (1/\sqrt\pi) \cdot (\Gamma(T/2)/\Gamma((T-1)/2)) \cdot (1 - \rho^2)^{(T-3)/2}, |\rho| < 1.

Hence

P(s_1's_2 \ge l_N) = (1/\sqrt\pi) \cdot (\Gamma(T/2)/\Gamma((T-1)/2))\int_{l_N}^1 (1 - \rho^2)^{(T-3)/2} d\rho.

Let t = t_q \in (0, 1) for each q \ge 1 such that qt_q^2 \to \infty as q \to \infty. By Lemma 6.2 from [6],

\int_t^1 (1 - \rho^2)^{q/2} d\rho = (1/(qt))(1 - t^2)^{(q+2)/2}(1 + o(1))

as q \to \infty. Now, by taking q = T - 3 and t = l_N, we have ql_N^2 = (T - 3) \cdot (4 log N/T)(1 + o(1)) = 4(log N)(1 + o(1)) \to \infty. By (33) from [6],

\Gamma(T/2)/\Gamma((T-1)/2) = \sqrt{T/2} \cdot (1 + o(1))

as N \to \infty.
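For Gaussian \xi_i the Chernoff bound in the proof of Lemma 26 is explicit: with \sigma = 1 and \sum a_i^2 = 1 the linear combination is again N(0, 1), so the bound reads P(Z \ge x) \le e^{-x^2/2}. It can be checked against the exact normal tail (an illustration of the bound only):

```python
import math

def normal_tail(x):
    # P(Z >= x) for Z ~ N(0, 1), via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# Chernoff bound at the optimal t = x: P(Z >= x) <= exp(-x^2 / 2)
for x in (0.5, 1.0, 2.0, 4.0):
    assert normal_tail(x) <= math.exp(-x * x / 2.0)
```

The bound loses a polynomial factor of order 1/x relative to the exact tail, which is harmless for the exponential rates needed here.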
Consequently,

P(s_1's_2 \ge l_N) = (1/\sqrt\pi) \cdot \sqrt{T/2} \cdot (1/((T-3)l_N)) \cdot (1 - l_N^2)^{(T-1)/2}(1 + o(1)) = (1/(2\sqrt{2\pi log N})) \cdot exp[(T/2) log(1 - l_N^2)] \cdot (1 + o(1))

since l_N^2 = (4 log N - log log N + y)/T. By the Taylor expansion, log(1 - x) = -x + O(x^2) as x \to 0. Then

(T/2) log(1 - l_N^2) = (T/2)[-l_N^2 + O(l_N^4)] = -2 log N + (1/2)(log log N) - (y/2) + O((log N)^2/T)

as N \to \infty. Then the conclusion follows from the assumption log N = o(\sqrt T). \Box

Proof of Lemma 29. For any vector a \in S^{T-1}, the distribution of a's_1 is independent of a; see, e.g., Theorem 1.5.7(i) and the argument for (5) on p. 147 from [30]. Hence, by taking a = (1, 0, \cdots, 0)' \in S^{T-1} and using independence, we see s_1's_2 has the same distribution as that of Z_1(Z_1^2 + \cdots + Z_T^2)^{-1/2}, where Z_1, \cdots, Z_T are i.i.d. N(0, 1) random variables. Therefore,

P(max_{1 \le i \le k} |\xi_i| \ge t) \le k \cdot P(|\xi_1| \ge t) = k \cdot P(|Z_1|/\sqrt{Z_1^2 + \cdots + Z_T^2} \ge t).

By the large deviation bound for the sum of i.i.d. random variables (see, e.g., page 27 from [13]),

P((1/T)\sum_{i=1}^T Z_i^2 \in A) \le 2 \cdot exp\{-T \cdot inf_{x \in A} \Lambda(x)\},

where A \subset R is any Borel set and \Lambda(x) = sup_{\theta \in R}\{\theta x - log Ee^{\theta\xi}\}, where \xi = Z_1^2 and Z_1 is a N(0, 1) random variable. Since log Ee^{\theta\xi} = -(1/2) log(1 - 2\theta) for \theta < 1/2, it is easy to check that

\Lambda(x) = (1/2)(x - 1 - log x) if x > 0 and \Lambda(x) = \infty if x \le 0.

Moreover, \Lambda(x) is decreasing for x \in (0, 1]. Hence, for any r \in (0, 1),

P(Z_1^2 + \cdots + Z_T^2 \le rT) \le 2 \cdot e^{-cT}  (184)

where c = \Lambda(r) > 0. Thus,

P(max_{1 \le i \le k} |\xi_i| \ge t) \le k \cdot P(|Z_1|/\sqrt{Z_1^2 + \cdots + Z_T^2} \ge t, Z_1^2 + \cdots + Z_T^2 > rT) + k \cdot P(Z_1^2 + \cdots + Z_T^2 \le rT) \le k \cdot P(|Z_1| \ge t\sqrt{rT}) + (2k) \cdot e^{-cT}.

Take r = 1/2 and the result follows by the well-known inequality that P(|Z_1| > x) \le e^{-x^2/2} for x \ge 0. \Box

Proof of Lemma 30.
First,

min_{1 \le i \le k} v_i = \sqrt\delta Z_0 + \sqrt{1 - \delta} \cdot min_{1 \le i \le k} Z_i.

If min_{1 \le i \le k} v_i > x and \sqrt\delta Z_0 \le y, then

min_{1 \le i \le k} Z_i > (x - y)/\sqrt{1 - \delta}.

Then, for the event \{min_{1 \le i \le k} v_i > x\}, considering whether \sqrt\delta Z_0 \le y occurs or not, we have from independence that

P(min_{1 \le i \le k} v_i > x) \le P(\sqrt\delta Z_0 > y) + P(min_{1 \le i \le k} Z_i > (x - y)/\sqrt{1 - \delta}) \le P(Z_1 > y/\sqrt\delta) + [P(Z_1 > (x - y)/\sqrt{1 - \delta})]^k.

Use the inequality that P(Z_1 > t) \le (1/(\sqrt{2\pi}t))e^{-t^2/2} for any t > 0 to get

P(min_{1 \le i \le k} v_i > x) \le (1/y) exp(-y^2/(2\delta)) + (1/(x - y)^k) \cdot exp[-k(x - y)^2/(2(1 - \delta))].

The proof is completed. \Box

Proof of Lemma 32. Let Z_1, \cdots, Z_T be i.i.d. standard normals. Write Z = (Z_1, \cdots, Z_T)' \in R^T. Then s_1 has the same distribution as that of Z/\|Z\|. Therefore, for each r \in (0, 1),

P(min_{1 \le i \le k} |a_i's_1| > z) = P(min_{1 \le i \le k} |a_i'Z| > z \cdot \|Z\|) \le P(min_{1 \le i \le k} |a_i'Z| > z \cdot \|Z\|, \|Z\| > \sqrt{rT}) + P(\|Z\| \le \sqrt{rT}) \le P(min_{1 \le i \le k} |a_i'Z| > z\sqrt{rT}) + 2 \cdot e^{-cT},

where c = c_r > 0, by (184). Notice

\{min_{1 \le i \le k} |a_i'Z| > z\sqrt{rT}\} \subset \bigcup \{min_{1 \le i \le k} \epsilon_ia_i'Z > z\sqrt{rT}\},

where the union is taken over the 2^k many events such that \epsilon_i = \pm 1 for 1 \le i \le k. Hence,

P(min_{1 \le i \le k} |a_i'Z| > z\sqrt{rT}) \le \sum P(min_{1 \le i \le k} \epsilon_ia_i'Z > z\sqrt{rT}),  (185)

where the sum runs over all possible \epsilon_i = \pm 1, 1 \le i \le k. Easily, the k-dimensional centered Gaussian random vector u := (\epsilon_1a_1', \cdots, \epsilon_ka_k')'_{k \times T} \cdot Z has covariance matrix \Sigma = E(uu') = (\epsilon_i\epsilon_ja_i'a_j)_{k \times k}.
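The Gaussian tail bound P(Z > t) \le e^{-t^2/2}/(\sqrt{2\pi}t) used in the proof of Lemma 30 (the Mills-ratio bound) can likewise be checked against the exact tail; this snippet is an illustration only.

```python
import math

def normal_tail(t):
    # P(Z > t) for Z ~ N(0, 1)
    return 0.5 * math.erfc(t / math.sqrt(2.0))

# Mills-ratio bound: P(Z > t) <= exp(-t^2/2) / (sqrt(2*pi) * t), t > 0
for t in (0.5, 1.0, 2.0, 5.0):
    bound = math.exp(-t * t / 2.0) / (math.sqrt(2.0 * math.pi) * t)
    assert normal_tail(t) < bound
```

The bound is asymptotically sharp: the ratio of tail to bound tends to 1 as t grows, which is what makes it the right tool for the min/max probabilities above.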
Obviously, the diagonal entries of \Sigma are all equal to 1 because the a_i's are unit vectors. By assumption, max_{1 \le i < j \le k} |a_i'a_j| is bounded as in the statement of the lemma.

It then follows from Lemma 10 that

E[(s_i'M_{ij}s_i)^\tau] \le Cm^{-\tau} \cdot \{[tr(M_{ij})]^\tau + [tr(M_{ij}^2) - (1/m)[tr(M_{ij})]^2]^{\tau/2}\} \le Cm^{-\tau} \cdot \{[tr(M_{ij})]^\tau + [tr(M_{ij}^2)]^{\tau/2}\} \le C  (190)

by Lemma 6. Then, (189) and (190) lead to the first conclusion. Now we prove the second one. Notice

E|\sum_{j=i+1}^N (\hat\rho_{ij}^2 - E\hat\rho_{ij}^2)|^\tau \le 2^{\tau-1} \cdot E|\sum_{j=i+1}^N [\hat\rho_{ij}^2 - E(\hat\rho_{ij}^2|s_i)]|^\tau + 2^{\tau-1} \cdot E|\sum_{j=i+1}^N [E(\hat\rho_{ij}^2|s_i) - E\hat\rho_{ij}^2]|^\tau.  (191)

By (28) and the fact that \{\hat\rho_{ij}; 1 \le j \le N, j \ne i\} are conditionally independent random variables given s_i, we see that

E[|\sum_{j=i+1}^N [\hat\rho_{ij}^2 - E(\hat\rho_{ij}^2|s_i)]|^\tau | s_i] \le K_\tau \cdot (N - i)^{(\tau/2)-1} \cdot \sum_{j=i+1}^N E[|\hat\rho_{ij}^2 - E(\hat\rho_{ij}^2|s_i)|^\tau | s_i].

Take expectations for both sides of the above and use the first conclusion to see that

E|\sum_{j=i+1}^N [\hat\rho_{ij}^2 - E(\hat\rho_{ij}^2|s_i)]|^\tau \le C \cdot (N - i)^{\tau/2}/m^\tau.  (192)

Now we estimate the last term from (191).
By (187) and (188),
$$E\Big|\sum_{j=i+1}^{N}\big[E(\hat\rho_{ij}\,|\,s_i)-E\hat\rho_{ij}\big]\Big|^{\tau} = \frac{1}{m^{\tau}}\cdot E\Big|\sum_{j=i+1}^{N}\big[s_i'M_{ij}s_i - E(s_i'M_{ij}s_i)\big]\Big|^{\tau} = \frac{1}{m^{\tau}}\cdot E\big|s_i'M_{i\bullet}s_i - E(s_i'M_{i\bullet}s_i)\big|^{\tau}, \tag{193}$$
where $M_{i\bullet} := \sum_{j=i+1}^{N} M_{ij}$. By Lemma 9 again,
$$E\big|s_i'M_{i\bullet}s_i - E(s_i'M_{i\bullet}s_i)\big|^{\tau} \le Cm^{-\tau}\cdot\big[\mathrm{tr}(M_{i\bullet}^2) - \tfrac{1}{m}[\mathrm{tr}(M_{i\bullet})]^2\big]^{\tau/2}. \tag{194}$$
First, $\mathrm{tr}(M_{i\bullet}) = \sum_{j=i+1}^{N}\mathrm{tr}(P_iP_j)$. Easily, $\mathrm{tr}(M_{ij}M_{ik}) = \mathrm{tr}(P_jP_iP_kP_i)$, since $M_{ij} = U_i'U_jU_j'U_i$ and $U_iU_i' = P_i$ for each $i=1,\cdots,N$, as stated in (30). Hence,
$$\mathrm{tr}(M_{i\bullet}^2) = \sum_{j,k=i+1}^{N}\mathrm{tr}(P_jP_iP_kP_i).$$

REFERENCES

[1] Baltagi, B. H. (2013). Econometric Analysis of Panel Data. Wiley, 5th ed.
[2] Billingsley, P. (1995). Probability and Measure. Wiley-Interscience, 3rd ed.
[3] Breusch, T. and Pagan, A. (1980). The Lagrange multiplier test and its applications to model specification in econometrics. The Review of Economic Studies.
[4] Cai, T., Fan, J. and Jiang, T. (2013). Distributions of angles in random packing on spheres. Journal of Machine Learning Research.
[5] Cai, T. and Jiang, T. (2011). Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices. Ann. Stat.
[6] Cai, T. and Jiang, T. (2012). Phase transition in limiting distributions of coherence of high-dimensional random matrices. Journal of Multivariate Analysis.
[7] Cai, T. and Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation. Journal of the American Statistical Association.
[8] Cai, T., Liu, W. and Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association.
[9] Cai, T., Liu, W. and Xia, Y. (2014). Two-sample test of high dimensional means under dependence. Journal of the Royal Statistical Society, Series B (Statistical Methodology).
[10] Cai, T. and Zhang, A. (2016). Inference for high-dimensional differential correlation matrices. Journal of Multivariate Analysis.
[11] Chow, Y. S. and Teicher, H. (1997). Probability Theory: Independence, Interchangeability, Martingales. Springer, 3rd ed.
[12] Chudik, A. and Pesaran, M. H. (2015). Large panel data models with cross-sectional dependence: a survey. The Oxford Handbook of Panel Data.
[13] Dembo, A. and Zeitouni, O. (1998). Large Deviations Techniques and Applications. Springer, 2nd ed.
[14] Dong, Z., Jiang, T. and Li, D. (2012). Circular law and arc law for truncation of random unitary matrix. Journal of Mathematical Physics.
[15] Durrett, R. (2019). Probability: Theory and Examples. Cambridge University Press, 5th ed.
[16] Fama, E. and French, K. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics.
[17] Horn, R. A. and Johnson, C. R. (2012). Matrix Analysis. Cambridge University Press, 2nd ed.
[18] Hsiao, C. (2014). Analysis of Panel Data. Cambridge University Press, 3rd ed.
[19] Hsiao, C., Pesaran, M. H. and Pick, A. (2012). Diagnostic tests of cross-sectional independence for limited dependent variable panel data models. Oxford Bulletin of Economics and Statistics.
[20] Hsing, T. (1995). A note on asymptotic independence of the sum and maximum of strongly mixing stationary random variables. Ann. Probab.
[21] James, B., James, K. and Qi, Y. (1998). Limiting distribution of the sum and maximum from multivariate Gaussian sequences. J. Multivariate Analysis.
[22] Jiang, T. (2004). The asymptotic distributions of the largest entries of sample correlation matrices. Ann. Appl. Probab.
[23] Jiang, T. (2009). A variance formula related to quantum conductance. Physics Letters A.
[24] Jiang, T. (2019). Determinant of sample correlation matrix with application. Ann. Appl. Probab.
[25] Jiang, T. and Qi, Y. (2015). Limiting distributions of likelihood ratio tests for high-dimensional normal distributions. Scandinavian Journal of Statistics.
[26] Jiang, T. and Yang, F. (2013). Central limit theorems for classical likelihood ratio tests for high-dimensional normal distributions. Ann. Stat.
[27] Li, D., Rao, M., Jiang, T. and Wang, X. (1995). Complete convergence and almost sure convergence of weighted sums of random variables. J. Theoret. Probab.
[28] Liu, W., Lin, Z. and Shao, Q. (2008). The asymptotic distribution and Berry-Esseen bound of a new test for independence in high dimension with an application to stochastic optimization. Ann. Appl. Probab.
[29] Moscone, F. and Tosetti, E. (2009). A review and comparison of tests of cross-section independence in panels. Journal of Economic Surveys.
[30] Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley, New York.
[31] Pesaran, M. H. (2004). General diagnostic test for cross section dependence in panels. IZA Discussion Paper No. 1240.
[32] Pesaran, M. H. (2015). Testing weak cross-sectional dependence in large panels. Econometric Reviews.
[33] Pesaran, M. H. (2015). Time Series and Panel Data Econometrics. Oxford University Press.
[34] Pesaran, M. H., Ullah, A. and Yamagata, T. (2008). A bias-adjusted LM test of error cross-section independence. Econometrics Journal.
[35] Sarafidis, V. and Wansbeek, T. (2012). Cross-sectional dependence in panel data analysis. Econometric Reviews.
[36] Schott, J. R. (2005). Testing for complete independence in high dimensions. Biometrika.
[37] Slepian, D. (1962). The one-sided barrier problem for Gaussian noise. Bell System Technical Journal.
[38] Stephan, F. F. (1934). Sampling errors and interpretations of social data ordered in time and space. Journal of the American Statistical Association.
[39] Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. The MIT Press.
[40] Xu, G., Lin, L., Wei, P. and Pan, W. (2016). An adaptive two-sample test for high-dimensional means. Biometrika.
[41] Zhou, W. (2007). Asymptotic distribution of the largest off-diagonal entry of correlation matrices. Trans. Amer. Math. Soc.

Address of Long Feng and Binghui Liu
Key Laboratory of Applied Statistics of MOE & School of Mathematics and Statistics
Northeast Normal University
E-mail: [email protected]; [email protected]

Address of Tiefeng Jiang
School of Statistics, University of Minnesota
313 Ford Hall, 224 Church Street SE
Minneapolis, MN 55455
E-mail: [email protected]