Simultaneous Inference for High Dimensional Mean Vectors
Zhipeng Lou and Wei Biao Wu
Department of Statistics, University of Chicago
September 15, 2018
Abstract
Let $X_1, \ldots, X_n \in \mathbb{R}^p$ be i.i.d. random vectors. We aim to perform simultaneous inference for the mean vector $\mathbb{E}(X_i)$ with finite polynomial moments and an ultra high dimension. Our approach is based on the truncated sample mean vector. A Gaussian approximation result is derived for the latter under the very mild finite polynomial ($(2+\theta)$-th) moment condition, and the dimension $p$ can be allowed to grow exponentially with the sample size $n$. Based on this result, we propose an innovative resampling method to construct simultaneous confidence intervals for mean vectors.

1 Introduction

Let $X_1, \ldots, X_n$ be i.i.d. random vectors in $\mathbb{R}^p$ with mean vector $\mathbb{E}X_i = \mu$ and covariance matrix $\mathrm{Cov}(X_i) = \Sigma$. We are interested in conducting statistical inference for $\mu$ when the dimension $p$ can be comparable to or even much larger than $n$. Estimating $\mu$ by the traditional sample mean $\hat\mu = \sum_{i=1}^n X_i/n$, Chernozhukov et al. (2013) and Chernozhukov et al. (2014) showed that under suitable moment conditions, as $n \to \infty$ and possibly $p = p_n \to \infty$,
\[
\rho_n := \sup_{t\in\mathbb{R}} \big| P\big(\sqrt{n}\,|\hat\mu - \mu|_\infty \le t\big) - P\big(|Y|_\infty \le t\big) \big| \to 0, \tag{1}
\]
where $Y = (Y_1, \ldots, Y_p)^\top \in \mathbb{R}^p$ is a centered Gaussian vector with $\mathrm{Cov}(Y) = \Sigma$ and $|x|_\infty = \max_{j\le p} |x_j|$ is the usual $\ell^\infty$ norm of a vector $x = (x_1, \ldots, x_p)^\top \in \mathbb{R}^p$. Suppose we only have a uniformly finite $q$-th ($q >$
$3$) moment for each coordinate of $X_i$, i.e., there exists a constant $C > 0$ such that
\[
\max_{j\le p} \mathbb{E}|X_{ij}|^q \le C \tag{2}
\]
for some $q >$
$3$. We can show that the condition
\[
p\,(\log p)^{3q/2-1} = o\big(n^{q/2-1}\big) \tag{3}
\]
is nearly optimal for (1); see Proposition 2.2. If (3) is barely violated, then under (2) we can have $\rho_n \to$
$1$; see (12) and (13). Hence in general, the allowed dimension $p$ can be at most a power of $n$ if we use $\hat\mu$ for inference of $\mu$. In this paper, we propose a new approach to perform simultaneous inference for the mean vector with finite polynomial moments and show that our method applies under ultra high dimensional settings, in which $\log p = o(n^c)$ for some $c > 0$, requiring only uniformly finite $(2+\theta)$-th ($0 < \theta \le$
$1$) moments for $X_{ij}$, $1 \le j \le p$. As an important feature, the dimension $p$ can be as large as $e^{o(n^c)}$ for some $c > 0$. Based on this Gaussian approximation result, we can conduct simultaneous inference for $\mu$. In particular, we propose an innovative resampling method, called the truncated half sampling procedure, to construct simultaneous confidence intervals for $\mu$ in Section 3. As a main advantage of this method, we do not need to deal with the problem of estimating the covariance matrix, which is highly nontrivial and computationally intensive in the high dimensional case, and which may require extra structural assumptions.

2 Gaussian approximation for the truncated sample mean

Given any $\kappa >$
$0$, let $t_\kappa: \mathbb{R} \to [-\kappa, \kappa]$ be the truncation function defined by $t_\kappa(x) = (x \wedge \kappa) \vee (-\kappa)$. Given the data $X_1, X_2, \ldots, X_n$, define the truncated sample mean vector
\[
\hat\mu_\kappa = \frac{1}{n}\sum_{i=1}^n t_\kappa(X_i), \tag{4}
\]
where $t_\kappa(X_i) = (t_\kappa(X_{i1}), \ldots, t_\kappa(X_{ip}))^\top \in \mathbb{R}^p$. With a properly chosen truncation level $\kappa$, we establish a Gaussian approximation result for $|\hat\mu_\kappa|_\infty$ under uniformly finite $(2+\theta)$-th moments for $X_{ij}$, $1 \le j \le p$.

Theorem 2.1.
Let $\mu = \mathbb{E}X_i = 0$. Assume there exist constants $b, M_\theta > 0$ such that $\min_{j\le p} \mathbb{E}|X_{ij}|^2 \ge b$ and $M_\theta = \max_{j\le p} \mathbb{E}|X_{ij}|^{2+\theta} < \infty$ for some $0 < \theta \le 1$. Further assume
\[
M_\theta^2 \,\big(\log(pn)\big)^{4+3\theta} = o(n^\theta), \tag{5}
\]
and take $\kappa \asymp (nM_\theta/\log p)^{1/(2+\theta)}$. Then as $n \to \infty$,
\[
\rho_{n,\kappa} := \sup_{t\in\mathbb{R}} \big| P\big(\sqrt{n}\,|\hat\mu_\kappa|_\infty \le t\big) - P\big(|Y|_\infty \le t\big) \big| \to 0. \tag{6}
\]

Proof.
For simplicity of notation, write $\mathbb{E}_0[X] = X - \mathbb{E}X$ for the centered version of a random variable (or vector) $X$. An elementary calculation implies that for $q = 1,$
$2$,
\[
\max_{j\le p} \mathbb{E}|X_{ij}|^q \mathbf{1}\{|X_{ij}| \ge \kappa\} \le \kappa^{q-2-\theta} M_\theta. \tag{7}
\]
Recall that $t_\kappa(X_{ij}) = (X_{ij} \wedge \kappa) \vee (-\kappa)$. Then for any $j = 1, \ldots, p$, as $\kappa \to \infty$, we have
\[
\big| \mathbb{E}|\mathbb{E}_0[t_\kappa(X_{ij})]|^2 - \mathbb{E}|X_{ij}|^2 \big| \le \mathbb{E}|X_{ij}|^2 \mathbf{1}\{|X_{ij}| \ge \kappa\} + \big(\mathbb{E}|X_{ij}| \mathbf{1}\{|X_{ij}| \ge \kappa\}\big)^2 \le \kappa^{-\theta} M_\theta + \kappa^{-2(1+\theta)} M_\theta^2 \to 0,
\]
which implies $\min_{j\le p} \mathbb{E}|\mathbb{E}_0[t_\kappa(X_{ij})]|^2 \ge b - o(1) >$
$0$. Define $L_\kappa = \max_{j\le p} \mathbb{E}|\mathbb{E}_0[t_\kappa(X_{ij})]|^3$. By a similar argument as above, for any $j = 1, \ldots, p$,
\[
\mathbb{E}|t_\kappa(X_{ij})|^3 = \mathbb{E}|X_{ij}|^3 \mathbf{1}\{|X_{ij}| \le \kappa\} + \kappa^3\, P(|X_{ij}| \ge \kappa) \le \kappa^{1-\theta}\, \mathbb{E}|X_{ij}|^{2+\theta} \le \kappa^{1-\theta} M_\theta.
\]
So we have $L_\kappa \le 8\kappa^{1-\theta} M_\theta =: \bar L_\kappa$. Let $\phi_\kappa$ be the quantity defined in (26) with $\bar L_n$ replaced by $\bar L_\kappa$. We are to apply Lemma 4.1 to the i.i.d. vectors $\mathbb{E}_0[t_\kappa(X_i)]$, $1 \le i \le n$, and evaluate the quantities therein. Let $\kappa$ satisfy $2\kappa < \sqrt{n}/(4\phi_\kappa \log p)$. Since $|\mathbb{E}_0[t_\kappa(X_{1j})]| \le 2\kappa$, it can be easily seen that
\[
M_{n,\mathbb{E}_0[t_\kappa(X_1)]}(\phi_\kappa) := \mathbb{E}\Big[\max_{j\le p} \big|\mathbb{E}_0[t_\kappa(X_{1j})]\big|^3\, \mathbf{1}\Big\{\max_{j\le p} \big|\mathbb{E}_0[t_\kappa(X_{1j})]\big| > \sqrt{n}/(4\phi_\kappa \log p)\Big\}\Big] = 0.
\]
Let $Y_\kappa = (Y_{\kappa,1}, \ldots, Y_{\kappa,p})^\top \in \mathbb{R}^p$ be the analogous centered Gaussian vector with the same covariance matrix as $\mathbb{E}_0[t_\kappa(X_i)]$. It follows that
\[
M_{n,Y_\kappa}(\phi_\kappa) := \mathbb{E}\Big[\max_{j\le p} |Y_{\kappa,j}|^3\, \mathbf{1}\Big\{\max_{j\le p} |Y_{\kappa,j}| > \sqrt{n}/(4\phi_\kappa \log p)\Big\}\Big] \le \mathbb{E}\big[|Y_\kappa|_\infty^3 \mathbf{1}\{|Y_\kappa|_\infty > 2\kappa\}\big] \lesssim P(|Y_\kappa|_\infty > 2\kappa)\,\kappa^3 + \int_{2\kappa}^\infty P(|Y_\kappa|_\infty > x)\, x^2\, dx =: I + II.
\]
Since $\mathbb{E}|Y_{\kappa,j}|^2 = \mathbb{E}|\mathbb{E}_0[t_\kappa(X_{ij})]|^2$ is bounded above by $M_\theta^{2/(2+\theta)}$, Gaussian tail bounds give
\[
I \lesssim p\,\kappa^3 e^{-C\kappa^2/M_\theta^{2/(2+\theta)}}.
\]
Also, elementary manipulation implies
\[
II \lesssim p\,\kappa\, M_\theta^{2/(2+\theta)} e^{-C\kappa^2/M_\theta^{2/(2+\theta)}} + p\, M_\theta^{3/(2+\theta)} e^{-C\kappa^2/M_\theta^{2/(2+\theta)}}.
\]
Hence, since $\kappa^2/M_\theta^{2/(2+\theta)} \asymp (n/\log p)^{2/(2+\theta)}$ for $\kappa \asymp (nM_\theta/\log p)^{1/(2+\theta)}$,
\[
M_{n,Y_\kappa}(\phi_\kappa) \lesssim p\,\big(\kappa^3 + \kappa M_\theta^{2/(2+\theta)} + M_\theta^{3/(2+\theta)}\big) e^{-C\kappa^2/M_\theta^{2/(2+\theta)}} \lesssim p\,(nM_\theta/\log p)^{3/(2+\theta)} \exp\big\{-C(n/\log p)^{2/(2+\theta)}\big\} \le n^{-1},
\]
where the last inequality follows in view of (5). By Lemma 4.1, we have
\[
\rho^*_{n,\kappa} := \sup_{t\in\mathbb{R}} \big| P\big(\sqrt{n}\,|\mathbb{E}_0[\hat\mu_\kappa]|_\infty \le t\big) - P\big(|Y_\kappa|_\infty \le t\big) \big| \lesssim \Big(\frac{M_\theta^2 (\log(pn))^{4+3\theta}}{n^\theta}\Big)^{1/(4+2\theta)} + \frac{1}{n}. \tag{8}
\]
With (8) in hand, we next consider the error between $|\hat\mu_\kappa|_\infty$ and $|\mathbb{E}_0[\hat\mu_\kappa]|_\infty$.
Since $\mathbb{E}X_i = 0$, we have
\[
|\mathbb{E}\hat\mu_\kappa|_\infty = \max_{j\le p} \big| \mathbb{E}(X_{ij} - \kappa)\mathbf{1}\{X_{ij} > \kappa\} + \mathbb{E}(X_{ij} + \kappa)\mathbf{1}\{X_{ij} < -\kappa\} \big| \le \max_{j\le p} \mathbb{E}|X_{ij}|\mathbf{1}\{|X_{ij}| > \kappa\} \le \kappa^{-(1+\theta)} M_\theta.
\]
By (7) and Lemma 2.1 in Chernozhukov et al. (2013), for any $\delta > 0$,
\[
\rho^\diamond_{n,\kappa} := \sup_{t\in\mathbb{R}} \big| P\big(\sqrt{n}\,|\hat\mu_\kappa|_\infty \le t\big) - P\big(\sqrt{n}\,|\mathbb{E}_0[\hat\mu_\kappa]|_\infty \le t\big) \big| \le P\big(\sqrt{n}\,|\hat\mu_\kappa - \mathbb{E}_0[\hat\mu_\kappa]|_\infty > \delta\big) + \sup_{t\in\mathbb{R}} P\big(\big|\sqrt{n}\,|\mathbb{E}_0[\hat\mu_\kappa]|_\infty - t\big| \le \delta\big) \le P\big(\sqrt{n}\,|\mathbb{E}\hat\mu_\kappa|_\infty > \delta\big) + 2\rho^*_{n,\kappa} + \sup_{t\in\mathbb{R}} P\big(\big||Y_\kappa|_\infty - t\big| \le \delta\big) \lesssim \rho^*_{n,\kappa} + 2\sqrt{n}\, M_\theta\, \kappa^{-(1+\theta)} \sqrt{\log p}, \tag{9}
\]
where the last inequality follows from the Gaussian anti-concentration inequality by taking $\delta = \sqrt{n}\, M_\theta\, \kappa^{-(1+\theta)}$, for which the first probability vanishes. Next we compare $|Y_\kappa|_\infty$ and $|Y|_\infty$. Observe that
\[
\max_{j,l} \big|\mathrm{Cov}(Y_{\kappa,j}, Y_{\kappa,l}) - \mathrm{Cov}(Y_j, Y_l)\big| = \max_{j,l} \big|\mathrm{Cov}(t_\kappa(X_{ij}), t_\kappa(X_{il})) - \mathrm{Cov}(X_{ij}, X_{il})\big| \le C\kappa^{-\theta} M_\theta.
\]
By Lemma 3.1 in Chernozhukov et al. (2013), we obtain
\[
\rho^\circ_{n,\kappa} := \sup_{t\in\mathbb{R}} \big| P\big(|Y_\kappa|_\infty \le t\big) - P\big(|Y|_\infty \le t\big) \big| \lesssim \big(M_\theta/\kappa^\theta\big)^{1/3} \big(1 \vee \log(p\kappa^\theta/M_\theta)\big)^{2/3}. \tag{10}
\]
Therefore, by (8), (9) and (10), we have
\[
\rho_{n,\kappa} \le \rho^*_{n,\kappa} + \rho^\diamond_{n,\kappa} + \rho^\circ_{n,\kappa} \lesssim \rho^*_{n,\kappa} + \sqrt{n}\, M_\theta\, \kappa^{-(1+\theta)}\sqrt{\log p} + \big(M_\theta/\kappa^\theta\big)^{1/3} (\log p)^{2/3} \asymp \rho^*_{n,\kappa} + \Big(\frac{M_\theta^2 (\log(pn))^{4+3\theta}}{n^\theta}\Big)^{1/(4+2\theta)} + \Big(\frac{M_\theta^2 (\log(pn))^{4+3\theta}}{n^\theta}\Big)^{1/(6+3\theta)} \lesssim \frac{1}{n} + \Big(\frac{M_\theta^2 (\log(pn))^{4+3\theta}}{n^\theta}\Big)^{1/(6+3\theta)} \to 0. \qquad \Box
\]

Remark. By Proposition 2.2, the Gaussian approximation $\rho_n \to 0$ for the untruncated sample mean holds under the uniformly finite $q$-th moment condition (2) and
\[
p\,(\log p)^{3q/2-1} = o\big(n^{q/2-1}\big). \tag{11}
\]
The above condition is optimal up to a logarithmic factor. In fact, if
\[
n^{q/2-1} = o\big(p\,(\log p)^{-1-3q/2}\big), \tag{12}
\]
and the $X_{ij}$ are i.i.d. symmetric random variables with $\mathbb{E}X_{ij} = 0$, $\mathbb{E}|X_{ij}|^2 = 1$ and tail probability $P(X_{ij} \ge x) = x^{-q}(\log x)^{-1}$ for $x \ge x_0$, then $\rho_n \to$
$1$; \qquad (13)

see the discussion in Remark 2 of Zhang and Wu (2017). Here, using the truncated sample mean, Theorem 2.1 allows $p$ to grow exponentially with $n$. For example, with only a finite third moment, i.e., if $M_1 = \max_{j\le p} \mathbb{E}|X_{ij}|^3 = O(1)$, $p$ can be as large as $e^{o(n^{1/7})}$.

Proposition 2.2.
Assume (2) and that there exists some constant $b > 0$ such that $\mathrm{Var}(X_{ij}) \ge b$ for all $j = 1, \ldots, p$. Then $\rho_n \to 0$ as $n \to \infty$ under
\[
p\,(\log p)^{3q/2-1} = o\big(n^{q/2-1}\big). \tag{14}
\]

Proof.
We apply Lemma 4.1. Recall the definitions of the quantities $\phi_n$, $M_{n,X}(\phi_n)$ and $L_n$ therein. Denote $\tau = \sqrt{n}/(4\phi_n \log p)$ and substitute $\phi_n = K_2\big(\bar L_n^2 (\log(pn))^4/n\big)^{-1/6}$ with some $\bar L_n \ge L_n$. We can obtain
\[
\tau \asymp \big(n \bar L_n/\log p\big)^{1/3}. \tag{15}
\]
Since $\max_{j\le p} \mathbb{E}|X_{ij}|^q < \infty$, by the Bonferroni technique and Markov's inequality,
\[
M_{n,X}(\phi_n) = \tau^3\, P(|X_1|_\infty > \tau) + 3\int_\tau^\infty P(|X_1|_\infty > x)\, x^2\, dx \lesssim p\,\tau^{3-q}. \tag{16}
\]
Choose
\[
\bar L_n = \max_{j\le p} \mathbb{E}|X_{1j}|^3 \cdot \big(p^{3/q}(\log p)^{1-3/q} n^{3/q-1}\big)^{1-\chi} \big(n^{1/2}(\log p)^{-7/2}\big)^{\chi},
\]
where $(1-3/q)/(3/2-3/q) < \chi < 1$, so that $\bar L_n \ge L_n$ is ensured. By (15) and (16), an elementary calculation indicates that $\rho_n \to 0$ under (14). $\Box$

3 Simultaneous confidence intervals

Based on the Gaussian approximation result for the truncated sample mean (cf. Theorem 2.1), we are able to construct simultaneous confidence intervals (SCIs) for $\mu$. Given a confidence level $\alpha \in (0,1)$, the $(1-\alpha)$ SCIs can be constructed as
\[
\mathcal{C}_{\alpha,\kappa} = \Big\{\nu = (\nu_1, \ldots, \nu_p)^\top \in \mathbb{R}^p : \Big|\sum_{i=1}^n t_\kappa(X_i - \nu)\Big|_\infty \le \sqrt{n}\, c_{1-\alpha}\Big\}, \tag{17}
\]
where $c_{1-\alpha}$ is the cutoff value determined by (19) below. Note that for every $j = 1, \ldots, p$, $f_j(y) := \sum_{i=1}^n t_\kappa(X_{ij} - y)$ is a non-increasing and continuous function of $y$, bounded below and above by $-n\kappa$ and $n\kappa$ respectively. Assume $n\kappa > \sqrt{n}\, c_{1-\alpha}$, and let $l_j$ and $u_j$ be the solutions to the equations $f_j(y) = \sqrt{n}\, c_{1-\alpha}$ and $f_j(y) = -\sqrt{n}\, c_{1-\alpha}$, respectively. Then the SCI for $\mu_j$ is $[l_j, u_j]$, $1 \le j \le p$. Note that solving $|\sum_{i=1}^n t_\kappa(X_i - \nu)|_\infty = 0$ gives the Huber estimate. Similarly, the null hypothesis $H_0: \mu = \mu_0$ can be rejected at level $\alpha$ if $|\sum_{i=1}^n t_\kappa(X_i - \mu_0)|_\infty \ge \sqrt{n}\, c_{1-\alpha}$. Now we shall determine the cutoff value $c_{1-\alpha}$ so that the SCIs given by (17) are asymptotically valid.
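In practice, the endpoints $l_j, u_j$ can be computed coordinatewise by bisection, exploiting the monotonicity and continuity of $f_j$. The following Python sketch is purely illustrative: the function names, bracketing interval and tolerance are ours, and the cutoff $c$ is treated as a given input (its determination is discussed next).

```python
import numpy as np

def t_kappa(x, kappa):
    """Coordinatewise truncation t_kappa(x) = (x ∧ kappa) ∨ (-kappa)."""
    return np.clip(x, -kappa, kappa)

def sci_endpoints(X, kappa, c, tol=1e-10):
    """Coordinatewise endpoints [l_j, u_j] of the SCI (17).

    For each j, f_j(y) = sum_i t_kappa(X[i, j] - y) is continuous and
    non-increasing in y, so l_j and u_j solve f_j(y) = +/- sqrt(n)*c
    and can be located by bisection.
    """
    n, p = X.shape
    target = np.sqrt(n) * c
    assert n * kappa > target, "need n*kappa > sqrt(n)*c for solvability"

    def f(j, y):
        return t_kappa(X[:, j] - y, kappa).sum()

    def solve(j, level):
        # f_j equals +n*kappa left of the data and -n*kappa right of it,
        # so this interval brackets the root of f_j(y) = level.
        lo, hi = X[:, j].min() - kappa, X[:, j].max() + kappa
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if f(j, mid) > level:   # f_j non-increasing: root lies right of mid
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    l = np.array([solve(j, target) for j in range(p)])
    u = np.array([solve(j, -target) for j in range(p)])
    return l, u
```

As a sanity check, when $\kappa$ is so large that no observation is truncated, $f_j(y) = \sum_i X_{ij} - ny$, and the endpoints reduce to $(\sum_i X_{ij} \mp \sqrt{n}\,c)/n$.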
If $\mathrm{Cov}(X_i) = \Sigma$ is known, by Theorem 2.1, the result
\[
\sup_{t\in\mathbb{R}} \Big| P\Big(\frac{1}{\sqrt{n}}\Big|\sum_{i=1}^n t_\kappa(X_i - \mu)\Big|_\infty \le t\Big) - P\big(|Y|_\infty \le t\big)\Big| \to 0 \tag{18}
\]
suggests choosing $c_{1-\alpha}$ as
\[
c_{1-\alpha} = \inf\{t \in \mathbb{R} : P(|Y|_\infty \le t) \ge 1-\alpha\}. \tag{19}
\]
When $\Sigma$ is unknown, one natural approach is to plug in a consistent estimator. However, estimation of covariance matrices in high dimensions is highly nontrivial; the restriction mainly lies in the requirement of extra structural assumptions such as bandedness or sparsity.

Now we propose an innovative method to estimate the cutoff value $c_{1-\alpha}$ which avoids estimating the covariance matrix and is very convenient to implement. For simplicity, suppose the sample size is even, say $n = 2m$. Let $\pi = (\pi(1), \ldots, \pi(n))$ be a permutation of $[n] = \{1, 2, \ldots, n\}$ and define
\[
Z_{\pi,i} = \big(X_{\pi(i)} - X_{\pi(m+i)}\big)/\sqrt{2}, \qquad i = 1, \ldots, m.
\]
By this construction, $Z_{\pi,1}, \ldots, Z_{\pi,m}$ are i.i.d. symmetric random vectors with covariance matrix $\Sigma$. By Theorem 2.1 and the fact that $\mathbb{E}\, t_\kappa(Z_{\pi,i}) = 0$, we have
\[
\sup_{t\in\mathbb{R}} \big| P\big(\sqrt{m}\,|\bar Z_{\pi,\kappa}|_\infty \le t\big) - P\big(|Y|_\infty \le t\big) \big| \to 0, \tag{20}
\]
where $\bar Z_{\pi,\kappa} = \sum_{i=1}^m t_\kappa(Z_{\pi,i})/m$. The result (20) provides theoretical support for the validity of estimating $c_{1-\alpha}$ by the $(1-\alpha)$ quantile of $\sqrt{m}\,|\bar Z_{\pi,\kappa}|_\infty$. Let $\Pi_n$ be the collection of all permutations of $[n]$ and let $\pi_1, \ldots, \pi_J$ be i.i.d. uniformly sampled from $\Pi_n$. Assume that the sampling process $(\pi_i)_{i\ge 1}$ is independent of $(X_i)$. Define
\[
F_J(t) = \frac{1}{J}\sum_{j=1}^J \mathbf{1}\big\{\sqrt{m}\,|\bar Z_{\pi_j,\kappa}|_\infty \le t\big\}, \tag{21}
\]
where $\bar Z_{\pi_j,\kappa} = \sum_{i=1}^m t_\kappa(Z_{\pi_j,i})/m$. We can obtain the empirical $(1-\alpha)$-th quantile $\hat q_{1-\alpha} = \inf\{t \in \mathbb{R} : F_J(t) \ge 1-\alpha\}$ and estimate $c_{1-\alpha}$ by $\hat q_{1-\alpha}$.
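The truncated half sampling procedure is straightforward to implement. A minimal Python sketch follows; the function and variable names are ours, and the default $J$ and the random seed are illustrative choices not prescribed by the text.

```python
import numpy as np

def half_sampling_cutoff(X, kappa, alpha, J=500, rng=None):
    """Estimate the cutoff c_{1-alpha} by truncated half sampling.

    For each of J uniformly random permutations pi of {1, ..., n}
    (n = 2m even), form Z_{pi,i} = (X_{pi(i)} - X_{pi(m+i)})/sqrt(2),
    i = 1, ..., m, evaluate sqrt(m) * |Zbar_{pi,kappa}|_inf with
    Zbar_{pi,kappa} the truncated sample mean of the Z's, and return
    hat q_{1-alpha} = inf{t : F_J(t) >= 1 - alpha}.
    """
    rng = np.random.default_rng(rng)
    n, _ = X.shape
    assert n % 2 == 0, "the construction assumes an even sample size n = 2m"
    m = n // 2
    stats = np.empty(J)
    for j in range(J):
        perm = rng.permutation(n)
        Z = (X[perm[:m]] - X[perm[m:]]) / np.sqrt(2.0)
        Zbar = np.clip(Z, -kappa, kappa).mean(axis=0)  # truncated mean
        stats[j] = np.sqrt(m) * np.abs(Zbar).max()
    # inf{t : F_J(t) >= 1 - alpha} is the ceil(J(1-alpha))-th order statistic
    k = int(np.ceil(J * (1.0 - alpha))) - 1
    return np.sort(stats)[k]
```

The returned $\hat q_{1-\alpha}$ is then used as the radius $\sqrt{n}\,\hat q_{1-\alpha}$ of the simultaneous confidence region in (22).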
The simultaneous confidence intervals for $\mu$ can then be constructed as
\[
\mathcal{C}^\dagger_{\alpha,\kappa} = \Big\{\nu \in \mathbb{R}^p : \Big|\sum_{i=1}^n t_\kappa(X_i - \nu)\Big|_\infty \le \sqrt{n}\,\hat q_{1-\alpha}\Big\}. \tag{22}
\]

4 An auxiliary lemma

Lemma 4.1 below follows from Theorem 2.1 in Chernozhukov et al. (2014) specialized to the i.i.d. case, and is useful for proving Theorem 2.1. For the i.i.d. random vectors $X_i$, define $L_n = \max_{j\le p} \mathbb{E}|X_{ij}|^3$ and, for $\phi >$
$1$, define
\[
M_{n,X}(\phi) = \mathbb{E}\Big[\max_{j\le p} |X_{1j}|^3\, \mathbf{1}\Big\{\max_{j\le p} |X_{1j}| > \sqrt{n}/(4\phi \log p)\Big\}\Big]. \tag{23}
\]
Let $Y$ be a centered Gaussian vector with $\mathrm{Cov}(Y) = \Sigma$. We define $M_{n,Y}(\phi)$ similarly, with $X_{1j}$ replaced by $Y_j$ in (23), and let
\[
M_n(\phi) := M_{n,X}(\phi) + M_{n,Y}(\phi). \tag{24}
\]

Lemma 4.1. Suppose that there exists some constant $b > 0$ such that $\min_{j\le p} \mathbb{E}X_{ij}^2 \ge b$. Then there exist constants $K_1, K_2 > 0$ depending only on $b$ such that for every constant $\bar L_n \ge L_n$,
\[
\rho_n \le K_1\Big[\Big(\frac{\bar L_n^2 (\log(pn))^7}{n}\Big)^{1/6} + \frac{M_n(\phi_n)}{\bar L_n}\Big] \tag{25}
\]
with
\[
\phi_n = K_2\Big(\frac{\bar L_n^2 (\log(pn))^4}{n}\Big)^{-1/6}. \tag{26}
\]

References
Victor Chernozhukov, Denis Chetverikov, and Kengo Kato. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors.
The Annals of Statistics, 41(6):2786–2819, 2013.

Victor Chernozhukov, Denis Chetverikov, and Kengo Kato. Central limit theorems and bootstrap in high dimensions. arXiv preprint arXiv:1412.3661, 2014.

Jianqing Fan, Weichen Wang, and Ziwei Zhu. Robust low-rank matrix recovery. arXiv preprint arXiv:1603.08315, 2016.

Danna Zhang and Wei Biao Wu. Gaussian approximation for high dimensional time series. The Annals of Statistics, 45(5):1895–1919, 2017.