Comments on `High-dimensional simultaneous inference with the bootstrap'
JELENA BRADIC AND YINCHU ZHU
The authors should be congratulated on their insightful article proposing forms of residual and paired bootstrap methodologies in the context of simultaneous testing in sparse and high-dimensional linear models. We appreciate the clear exposition of their work and the effectiveness of the proposed methods. The authors advocate bootstrapping a complete high-dimensional estimate rather than the linearized part of the test statistic. We appreciate the opportunity to comment on several aspects of this article.
1. Bootstraps' relative efficiency
The problem of forming a confidence set for many parameters of interest in a high-dimensional setting differs from the more routine interval estimation problem in a low-dimensional setting, in that the estimator itself induces shrinkage and bias. These difficulties, in turn, prohibit simple and intuitive guidelines for judging the relative efficiency of one bootstrap scheme with respect to another. It therefore seems interesting to discuss relative efficiencies as measured by the width of the confidence intervals (among those that achieve nominal coverage).

We performed a small-scale simulation study to investigate a number of scenarios. To that end, we consider $Y = X\beta^* + \varepsilon$, where $X \in \mathbb{R}^{n \times p}$ has i.i.d. entries generated from $N(0,1)$ and $\varepsilon \in \mathbb{R}^n$ has entries that are i.i.d. across $i$ with $\varepsilon_i = \sigma_{\varepsilon,i} e_i$. Here $e_i$ is independent of $\sigma_{\varepsilon,i}$ and follows one of four distributions: (1) $N(0,1)$ (Gaussian), (2) a centered exponential random variable with parameter 1 (Exponential), (3) a Student-$t$ distribution with 6 degrees of freedom normalized to have variance equal to 1 (Student), and (4) a mixture of two Gaussian distributions centered at 0.95 and $-0.99$ such that the mixture distribution has mean zero and variance one (Mixture). The conditional standard deviation $\sigma_{\varepsilon,i}$ is either 1 (homoscedastic) or $|X_{i,1}|$ (heteroscedastic). We use $\beta^* = (1, 1, 1, 0, \ldots, 0)^\top \in \mathbb{R}^p$. The goal is to test $H_0: \beta_j^* = \beta_j^o$ for all $j \in \{1, \ldots, p\}$.

We report the coverage probability and the width of 95% confidence sets. For ease of comparison across different sample sizes, the width multiplied by $\sqrt{n}$ is reported. The results are computed over 2000 random samples. We write RB for the residual bootstrap, MBG for the multiplier wild bootstrap with Gaussian multipliers, XYZ for the xyz-paired bootstrap, ZC for Zhang and Cheng, RZC for robust Zhang and Cheng, MBR for the multiplier bootstrap with Rademacher multipliers, and MBM for the multiplier bootstrap with Mammen multipliers. For robust Zhang and Cheng, instead of bootstrapping $\|n^{-1/2}\sum_{i=1}^n \hat\Theta^\top X_i\|_\infty$ and multiplying it by $\hat\sigma_\varepsilon$ as proposed by [3], we directly bootstrap $\|n^{-1/2}\sum_{i=1}^n \hat\Theta^\top X_i \hat\varepsilon_i\|_\infty$, where $\hat\Theta$ is the nodewise Lasso estimator as defined therein.
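For concreteness, the data-generating process above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the number of nonzero coordinates of $\beta^*$ is taken to be three for illustration, and the two mixture components are given a common standard deviation so that the mean-zero, variance-one constraints pin it down; the competing bootstrap procedures themselves are not shown.

```python
import numpy as np

def draw_errors(n, dist, rng):
    """Draw i.i.d. errors e_i with mean zero and variance one."""
    if dist == "gaussian":
        return rng.standard_normal(n)
    if dist == "exponential":
        return rng.exponential(1.0, size=n) - 1.0               # centered Exp(1)
    if dist == "student":
        return rng.standard_t(6, size=n) * np.sqrt(4.0 / 6.0)   # t_6 has variance 3/2
    if dist == "mixture":
        # Gaussian mixture with means 0.95 and -0.99; the weight enforces mean
        # zero, and a common component s.d. (our assumption) enforces variance one.
        m1, m2 = 0.95, -0.99
        w = -m2 / (m1 - m2)                        # solves w*m1 + (1 - w)*m2 = 0
        s = np.sqrt(1.0 - w * m1**2 - (1.0 - w) * m2**2)
        means = np.where(rng.random(n) < w, m1, m2)
        return means + s * rng.standard_normal(n)
    raise ValueError(f"unknown error distribution: {dist}")

def simulate(n, p, dist, hetero, rng):
    """One draw from Y = X beta* + eps as described in the text."""
    X = rng.standard_normal((n, p))                # i.i.d. N(0, 1) design entries
    beta = np.zeros(p)
    beta[:3] = 1.0                                 # sparse beta*; three nonzeros, illustrative
    sigma = np.abs(X[:, 0]) if hetero else 1.0     # sigma_i = |X_{i,1}| or 1
    Y = X @ beta + sigma * draw_errors(n, dist, rng)
    return X, Y

# Example: one heteroscedastic sample with Student-t errors, n = 100, p = 150.
rng = np.random.default_rng(0)
X, Y = simulate(100, 150, "student", hetero=True, rng=rng)
```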
Table 1. Coverage (Cov) and width of the 95% confidence interval. The design is Gaussian; the error distribution is homoscedastic and varies from symmetric to non-symmetric to heavy-tailed to bimodal.

n = 100, p = 150
          Gaussian        Exponential     Student         Mixture
          Cov    Width    Cov    Width    Cov    Width    Cov    Width
RB        0.947  3.382    0.952  3.337    0.944  3.336    0.936  3.418
MBG       0.922  3.269    0.934  3.237    0.925  3.235    0.904  3.316
MBR       0.943  3.373    0.950  3.323    0.941  3.322    0.934  3.407
MBM       0.909  3.253    0.925  3.221    0.913  3.222    0.901  3.298
XYZ       0.990  3.809    0.991  3.834    0.990  3.840    0.976  3.805
ZC        0.945  3.838    0.955  3.814    0.948  3.775    0.942  3.844
RZC       0.967  3.959    0.974  4.053    0.972  4.018    0.952  3.907

n = 200, p = 300
          Cov    Width    Cov    Width    Cov    Width    Cov    Width
RB        0.943  3.366    0.947  3.322    0.939  3.327    0.941  3.403
MBG       0.924  3.285    0.928  3.247    0.931  3.246    0.925  3.335
MBR       0.939  3.366    0.944  3.314    0.950  3.321    0.936  3.400
MBM       0.921  3.291    0.934  3.248    0.927  3.250    0.924  3.333
XYZ       0.967  3.591    0.985  3.626    0.979  3.635    0.966  3.576
ZC        0.940  3.624    0.951  3.595    0.946  3.595    0.948  3.626
RZC       0.953  3.689    0.966  3.760    0.963  3.766    0.960  3.646
1.1. Gaussian design.
What we observed in this simulation study differs from the classical low-dimensional setting, where the wild bootstrap and the paired bootstrap provide the most robust alternatives. Table 1 contains the results of the simulation study and indicates that MBG and MBM undercover and that the XYZ bootstrap has much larger coverage than the nominal level of 95%. The effects of shrinkage, the nonlinearity it induces, and the tuning parameter choices result in a pattern of bootstrap efficiency different from the intuitive one (see [1], Section 2.1). MBR and RB perform similarly, with coverage of around 95% and the shortest widths among all the methods. In low-dimensional settings, [2] shows that the distribution of the wild bootstrap converges faster than that of the paired bootstrap; however, we did not observe any indication that this carries over to high dimensions. The proposed ZC and RZC, even in the case of nominal coverage, produce much wider confidence intervals, suggesting that they are less efficient in this example. With the increase in sample size, the difference in width shrank, suggesting a possibly slower convergence rate of RZC and ZC in comparison to MBR and RB. On the other hand, the increase in sample size improved the coverage of XYZ; however, it did not improve the coverage of MBG and MBM. Lastly, all seven methods were robust to the size of $p$.

1.2. Heteroscedasticity.
The wild bootstrap has often been interpreted as a procedure that resamples residuals in a manner that captures any heteroscedasticity in the underlying errors. Section 5.2 in [1] gives Monte Carlo evidence supporting the superiority of the wild bootstrap for carrying out a t-test for the least squares estimator in the heteroscedastic linear model.
Table 2. Coverage (Cov) and width of the 95% confidence interval. The design is Gaussian; the error distribution is heteroscedastic and varies from symmetric to non-symmetric to heavy-tailed to bimodal.

n = 100, p = 150
          Gaussian        Exponential     Student         Mixture
          Cov    Width    Cov    Width    Cov    Width    Cov    Width
RB        0.946  3.306    0.939  3.240    0.960  3.284    0.943  3.341
MBG       0.923  3.199    0.920  3.155    0.939  3.184    0.924  3.234
MBR       0.942  3.278    0.930  3.195    0.954  3.250    0.943  3.322
MBM       0.915  3.187    0.914  3.152    0.931  3.160    0.919  3.232
XYZ       0.994  3.883    0.989  3.958    0.994  3.925    0.989  3.920
ZC        0.910  3.788    0.914  3.735    0.927  3.679    0.923  3.803
RZC       0.957  4.216    0.960  4.351    0.970  4.160    0.956  4.164

n = 200, p = 300
          Cov    Width    Cov    Width    Cov    Width    Cov    Width
RB        0.942  3.304    0.946  3.233    0.950  3.268    0.943  3.346
MBG       0.919  3.213    0.923  3.155    0.927  3.181    0.917  3.254
MBR       0.934  3.291    0.943  3.216    0.950  3.251    0.938  3.343
MBM       0.917  3.218    0.924  3.170    0.928  3.183    0.921  3.269
XYZ       0.982  3.640    0.985  3.714    0.989  3.709    0.977  3.615
ZC        0.912  3.592    0.898  3.550    0.920  3.562    0.904  3.600
RZC       0.960  3.966    0.956  4.083    0.969  4.048    0.940  3.904

A central thesis of the article is that the failure of existing multiplier bootstrap schemes, such as the multiplier bootstrap of [3], is due to their neglect of the excess variation resulting from the possible non-Gaussianity of the model error or from the presence of heteroscedastic errors. As claimed by the authors, the ZC method undercovers in all heteroscedastic cases. Our proposed robust version of ZC, the RZC, brings the coverage to the correct order; however, it produces confidence intervals that are markedly wide. The XYZ method overcovers but, interestingly, has width much smaller than RZC, indicating suboptimality of the RZC method. However, the traditional benefits of the wild bootstrap appear to be lost here, as the wild bootstrap is unable to capture the heteroscedasticity; MBG and MBM both undercover significantly below the nominal level.
1.3. Non-Gaussian design.
In this section we consider a non-Gaussian design and test the ability of the proposed methods to adapt. We observed that both RB and ZC show strong robustness to the distribution of the design; the two methods relate to each other much as in Section 1.1. However, here we observed that MBR is not always stable under the non-Gaussian design, with MBG and MBM failing to cover at the nominal level. XYZ still overcovers.
Table 3. Coverage (Cov) and width of the 95% confidence interval. The design is Exponential; the error distribution is homoscedastic and varies from symmetric to non-symmetric to heavy-tailed to bimodal.

n = 100, p = 150
          Gaussian        Exponential     Student         Mixture
          Cov    Width    Cov    Width    Cov    Width    Cov    Width
RB        0.933  3.309    0.951  3.325    0.946  3.279    0.943  3.338
MBG       0.914  3.223    0.916  3.206    0.933  3.199    0.931  3.268
MBR       0.931  3.303    0.930  3.271    0.947  3.269    0.947  3.350
MBM       0.910  3.202    0.908  3.196    0.925  3.180    0.922  3.251
XYZ       0.986  3.898    0.993  3.940    0.993  3.939    0.988  3.887
ZC        0.946  4.091    0.932  4.082    0.934  4.047    0.957  4.121
RZC       0.972  4.387    0.984  4.667    0.982  4.633    0.971  4.276

n = 200, p = 300
          Cov    Width    Cov    Width    Cov    Width    Cov    Width
RB        0.941  3.301    0.942  3.368    0.939  3.275    0.940  3.335
MBG       0.916  3.233    0.893  3.204    0.918  3.195    0.925  3.279
MBR       0.934  3.299    0.906  3.265    0.932  3.258    0.938  3.346
MBM       0.925  3.235    0.880  3.206    0.922  3.202    0.922  3.274
XYZ       0.983  3.653    0.980  3.746    0.985  3.713    0.975  3.614
ZC        0.949  3.765    0.920  3.738    0.930  3.747    0.947  3.769
RZC       0.968  3.964    0.982  4.252    0.976  4.280    0.956  3.846
1.4. Non-Gaussian design and heteroscedasticity.
The effect of heteroscedasticity is more pronounced here, as the design is non-Gaussian. We observe the complete failure of the ZC method, with coverage going as low as 70% for the nominal level of 95%. In this case, we observe that RB needs a larger sample size to cover under the heavy-tailed error distributions; however, most methods underperform substantially. All of the wild multiplier bootstraps fail, with MBR performing slightly better for larger $n$. RZC is perhaps the most consistent in covering, but its width can be massive. The case of bimodal errors seems to be particularly difficult, and none of the methods perform sufficiently well. Interestingly, we have not found a case where XYZ performs much better than RB or MBR.

2. Bootstrap inference for high-dimensional and possibly non-sparse models
An interesting feature shared by many of the proposed bootstrap schemes is the construction of the residuals from a regularized estimator, i.e., the Lasso. The most prominent examples, exhibiting excellent performance in a variety of settings, seem robust to the construction of such residuals despite the intricate bias introduced by the regularization. However, when the sparsity level of the model grows, we expect that the Lasso within the proposed bootstrap will induce a bias term that is too large and that will affect the finite-sample coverage.
Table 4. Coverage (Cov) and width of the 95% confidence interval. The design is Exponential; the error distribution is heteroscedastic and varies from symmetric to non-symmetric to heavy-tailed to bimodal.

n = 100, p = 150
          Gaussian        Exponential     Student         Mixture
          Cov    Width    Cov    Width    Cov    Width    Cov    Width
RB        0.944  3.249    0.902  3.207    0.939  3.237    0.913  3.270
MBG       0.922  3.170    0.882  3.137    0.917  3.150    0.891  3.207
MBR       0.941  3.275    0.896  3.199    0.935  3.239    0.914  3.358
MBM       0.925  3.176    0.885  3.169    0.919  3.159    0.902  3.249
XYZ       0.995  4.149    0.964  4.308    0.993  4.181    0.972  4.404
ZC        0.797  3.871    0.756  3.757    0.802  3.775    0.749  3.911
RZC       0.950  5.154    0.933  5.283    0.955  5.127    0.906  5.108

n = 200, p = 300
          Cov    Width    Cov    Width    Cov    Width    Cov    Width
RB        0.943  3.242    0.921  3.202    0.951  3.223    0.927  3.274
MBG       0.934  3.164    0.909  3.127    0.933  3.146    0.913  3.206
MBR       0.943  3.238    0.921  3.180    0.953  3.211    0.931  3.306
MBM       0.929  3.175    0.910  3.151    0.929  3.153    0.919  3.241
XYZ       0.993  3.887    0.978  4.056    0.993  3.942    0.984  4.001
ZC        0.737  3.641    0.705  3.584    0.737  3.555    0.727  3.670
RZC       0.926  5.194    0.934  5.346    0.944  5.165    0.901  5.198

We explore the possibility of applying the bootstrap methods proposed by the authors to inference problems in high-dimensional models that are potentially non-sparse. To the best of our knowledge, the only work in this direction is [4]. We consider only the particular case of $G$ being a singleton, i.e., testing one entry of $\beta$. Without loss of generality, consider the problem of testing $H_0: \beta_1^* = \beta_1^o$. The CorrT method proposed in [4] is implemented as follows. We first compute
$$
\hat\beta_{-1} = \operatorname*{arg\,min}_{v \in \mathbb{R}^{p-1}} \|v\|_1
\quad \text{s.t.} \quad
\begin{cases}
\|X_{-1}^\top (Y - X_1 \beta_1^o - X_{-1} v)\|_\infty \le \eta_\beta, \\
\|Y - X_1 \beta_1^o - X_{-1} v\|_\infty \le \|Y - X_1 \beta_1^o - X_{-1} v\|_2 / \log n, \\
(Y - X_1 \beta_1^o)^\top (Y - X_1 \beta_1^o - X_{-1} v) \ge c\, \|Y - X_1 \beta_1^o\|_2^2 / \sqrt{\log n},
\end{cases}
$$
and
$$
\hat\theta = \operatorname*{arg\,min}_{v \in \mathbb{R}^{p-1}} \|v\|_1
\quad \text{s.t.} \quad
\begin{cases}
\|X_{-1}^\top (X_1 - X_{-1} v)\|_\infty \le \eta_\theta, \\
\|X_1 - X_{-1} v\|_\infty \le \|X_1 - X_{-1} v\|_2 / \log n, \\
X_1^\top (X_1 - X_{-1} v) \ge c\, \|X_1\|_2^2 / \sqrt{\log n},
\end{cases}
\tag{1}
$$
where $\eta_\beta, \eta_\theta \asymp \sqrt{n^{-1} \log p}$ and the constant $c > 0$ are tuning parameters chosen as in [4]. Let $\hat u = X_1 - X_{-1}\hat\theta$ and $\hat\varepsilon = Y - X_1 \beta_1^o - X_{-1}\hat\beta_{-1}$. Then the test statistic defined therein takes the form
$$
T_n = \frac{\sqrt{n}\, \hat\varepsilon^\top \hat u}{\|\hat\varepsilon\|_2 \|\hat u\|_2}.
$$
Under the assumptions in [4], it is proved that under $H_0: \beta_1^* = \beta_1^o$, the statistic $T_n$ converges in distribution to $N(0,1)$, regardless of whether or not $\beta^*$ is sparse. In what follows we define residual, paired, and multiplier bootstrap counterparts of this test statistic and explore their finite-sample properties.
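For orientation, the following Python sketch computes $T_n$ from the two constrained estimates; solving the programs in (1) is abstracted away, and the function and argument names are ours.

```python
import numpy as np

def corrT_stat(Y, X, beta1_null, beta_rest, theta):
    """CorrT statistic T_n = sqrt(n) eps'u / (||eps|| ||u||).

    beta_rest and theta stand in for the constrained l1-minimizers
    beta_hat_{-1} and theta_hat of display (1); computing them is not shown.
    """
    n = X.shape[0]
    X1, Xrest = X[:, 0], X[:, 1:]
    eps_hat = Y - X1 * beta1_null - Xrest @ beta_rest   # residual under H_0
    u_hat = X1 - Xrest @ theta                          # X_1 purged of X_{-1}
    return (np.sqrt(n) * (eps_hat @ u_hat)
            / (np.linalg.norm(eps_hat) * np.linalg.norm(u_hat)))
```

Under $H_0$, $T_n$ is compared with standard normal quantiles; for instance, $|T_n| > 1.96$ rejects at the 5% level.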
2.1. CorrT with residual bootstrap (CorrTRB). Define the centered estimated residuals $\bar u = (\bar u_1, \ldots, \bar u_n)^\top$ with $\bar u_k = \hat u_k - n^{-1}\sum_{i=1}^n \hat u_i$. Then consider a random sample $u^{RB} = (u_1^{RB}, \ldots, u_n^{RB})^\top$ obtained by drawing with replacement from $\{\bar u_i\}_{i=1}^n$, and define the new variable $X_1^{RB} = X_{-1}\hat\theta + u^{RB}$. We then compute $\hat\theta^{RB}$ as $\hat\theta$ in (1) with $X_1$ replaced by $X_1^{RB}$. The bootstrap test statistic is
$$
T_n^{RB} = \frac{\sqrt{n}\, \hat\varepsilon^\top (X_1^{RB} - X_{-1}\hat\theta^{RB})}{\|\hat\varepsilon\|_2 \|X_1^{RB} - X_{-1}\hat\theta^{RB}\|_2}.
$$
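A sketch of one CorrTRB draw, continuing the notation of the sketch above; `refit_theta` is a hypothetical stand-in for re-solving (1) with $X_1$ replaced by its bootstrap version.

```python
import numpy as np

def corrT_rb_draw(Y, X, beta1_null, beta_rest, theta, refit_theta, rng):
    """One residual-bootstrap draw of the CorrT statistic (CorrTRB)."""
    n = X.shape[0]
    X1, Xrest = X[:, 0], X[:, 1:]
    u_bar = X1 - Xrest @ theta
    u_bar = u_bar - u_bar.mean()                      # centered residuals u_bar
    u_rb = rng.choice(u_bar, size=n, replace=True)    # resample with replacement
    X1_rb = Xrest @ theta + u_rb                      # bootstrap version of X_1
    theta_rb = refit_theta(X1_rb, Xrest)              # re-solve (1): hypothetical solver
    eps_hat = Y - X1 * beta1_null - Xrest @ beta_rest
    u_star = X1_rb - Xrest @ theta_rb
    return (np.sqrt(n) * (eps_hat @ u_star)
            / (np.linalg.norm(eps_hat) * np.linalg.norm(u_star)))
```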
2.2. CorrT with paired bootstrap (CorrTPB). For this scheme we would like to take advantage of the distribution of pairs of observations. We begin by defining $\hat X_1 = X_{-1}\hat\theta + \bar u$ and $\hat X_{-1} = (\hat X_2, \ldots, \hat X_p) \in \mathbb{R}^{n \times (p-1)}$ with $\hat X_j = X_j - \|\bar u\|_2^{-2}(X_j^\top \bar u)\bar u$ for $j \ge 2$. We then consider a new sample $(\hat\varepsilon^*, \hat X_1^*, \hat X_{-1}^*)$ whose rows form a random sample drawn with replacement from the rows of $(\hat\varepsilon, \hat X_1, \hat X_{-1})$. We compute $\hat\theta^*$ as $\hat\theta$ in (1) with $(X_1, X_{-1})$ replaced by $(\hat X_1^*, \hat X_{-1}^*)$. The bootstrap test statistic reads
$$
T_n^{pair} = \frac{\sqrt{n}\, \hat\varepsilon^{*\top} (\hat X_1^* - \hat X_{-1}^* \hat\theta^*)}{\|\hat\varepsilon^*\|_2 \|\hat X_1^* - \hat X_{-1}^* \hat\theta^*\|_2}.
$$
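A corresponding CorrTPB sketch under the same conventions, again with the hypothetical `refit_theta` standing in for the constrained solver:

```python
import numpy as np

def corrT_pb_draw(Y, X, beta1_null, beta_rest, theta, refit_theta, rng):
    """One paired-bootstrap draw of the CorrT statistic (CorrTPB)."""
    n = X.shape[0]
    X1, Xrest = X[:, 0], X[:, 1:]
    u_bar = X1 - Xrest @ theta
    u_bar = u_bar - u_bar.mean()
    eps_hat = Y - X1 * beta1_null - Xrest @ beta_rest
    X1_hat = Xrest @ theta + u_bar
    # Project u_bar out of each remaining column: X_j - (X_j'u) u / ||u||^2.
    Xrest_hat = Xrest - np.outer(u_bar, Xrest.T @ u_bar) / (u_bar @ u_bar)
    idx = rng.integers(0, n, size=n)              # resample rows with replacement
    eps_s, X1_s, Xrest_s = eps_hat[idx], X1_hat[idx], Xrest_hat[idx]
    theta_s = refit_theta(X1_s, Xrest_s)          # re-solve (1) on the bootstrap rows
    u_s = X1_s - Xrest_s @ theta_s
    return (np.sqrt(n) * (eps_s @ u_s)
            / (np.linalg.norm(eps_s) * np.linalg.norm(u_s)))
```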
2.3. CorrT with wild multiplier bootstrap (CorrTMB). Let $\{\xi_i\}_{i=1}^n$ be i.i.d. multipliers drawn independently of the sample. We compute $X_1^{MB} = X_{-1}\hat\theta + u^{MB}$, where $u^{MB} = (u_1^{MB}, \ldots, u_n^{MB})^\top$ with $u_i^{MB} = \bar u_i \xi_i$. We then compute $\hat\theta^{MB}$ as $\hat\theta$ in (1) with $X_1$ replaced by $X_1^{MB}$, and the bootstrap test statistic
$$
T_n^{MB} = \frac{\sqrt{n}\, \hat\varepsilon^\top (X_1^{MB} - X_{-1}\hat\theta^{MB})}{\|\hat\varepsilon\|_2 \|X_1^{MB} - X_{-1}\hat\theta^{MB}\|_2}.
$$
We consider three multiplier bootstrap methods, labeled CorrTMBG, CorrTMBR and CorrTMBM, where the multipliers are drawn from $N(0,1)$, the Rademacher distribution and the Mammen distribution, respectively.
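The three multiplier choices can be sketched as follows. The Mammen two-point law shown has mean zero and second and third moments equal to one, which is the standard construction; the exact implementation details may differ.

```python
import numpy as np

def draw_multipliers(n, kind, rng):
    """I.i.d. multipliers xi_i with mean zero and variance one."""
    if kind == "gaussian":                        # CorrTMBG
        return rng.standard_normal(n)
    if kind == "rademacher":                      # CorrTMBR: +-1 with probability 1/2
        return rng.choice(np.array([-1.0, 1.0]), size=n)
    if kind == "mammen":                          # CorrTMBM: two-point law with
        a = -(np.sqrt(5.0) - 1.0) / 2.0           # E xi = 0, E xi^2 = E xi^3 = 1
        b = (np.sqrt(5.0) + 1.0) / 2.0
        p_a = (np.sqrt(5.0) + 1.0) / (2.0 * np.sqrt(5.0))
        return np.where(rng.random(n) < p_a, a, b)
    raise ValueError(f"unknown multiplier: {kind}")

def corrT_mb_draw(Y, X, beta1_null, beta_rest, theta, refit_theta, rng, kind):
    """One wild multiplier bootstrap draw of the CorrT statistic (CorrTMB)."""
    n = X.shape[0]
    X1, Xrest = X[:, 0], X[:, 1:]
    u_bar = X1 - Xrest @ theta
    u_bar = u_bar - u_bar.mean()
    X1_mb = Xrest @ theta + u_bar * draw_multipliers(n, kind, rng)
    theta_mb = refit_theta(X1_mb, Xrest)          # re-solve (1): hypothetical solver
    eps_hat = Y - X1 * beta1_null - Xrest @ beta_rest
    u_s = X1_mb - Xrest @ theta_mb
    return (np.sqrt(n) * (eps_hat @ u_s)
            / (np.linalg.norm(eps_hat) * np.linalg.norm(u_s)))
```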
2.4. Numerical example.
We now compare CorrT and its bootstrap variations. We report the rejection probabilities of these methods for the hypothesis $H_0: \beta_1^* = \beta_1^o$ with $\beta_1^o = \beta_1^* + n^{-1/2} h$. Hence, these probabilities represent the size when $h = 0$ and the power when $h \neq 0$.
Table 5. CorrT: Gaussian design and Gaussian errors. We consider a dense $\beta^* = 4 p^{-1/2} (1, \ldots, 1)^\top \in \mathbb{R}^p$ with $p = 300$ and $n = 200$.

Homoscedasticity
            CvM    KS     h = 0   h = 5   h = 10  h = 15
CorrT       0.024  0.046  0.040   0.210   0.683   0.943
CorrTRB     0.025  0.044  0.048   0.228   0.710   0.943
CorrTPB     0.025  0.051  0.053   0.250   0.703   0.943
CorrTMBG    0.021  0.038  0.055   0.225   0.695   0.938
CorrTMBR    0.021  0.040  0.050   0.213   0.700   0.938
CorrTMBM    0.023  0.045  0.048   0.230   0.678   0.933

Heteroscedasticity
            CvM    KS     h = 0   h = 5   h = 10  h = 15
CorrT       0.013  0.037  0.053   0.240   0.708   0.960
CorrTRB     0.013  0.030  0.053   0.258   0.715   0.960
CorrTPB     0.013  0.026  0.065   0.270   0.705   0.965
CorrTMBG    0.016  0.035  0.053   0.265   0.705   0.958
CorrTMBR    0.018  0.039  0.058   0.258   0.685   0.955
CorrTMBM    0.012  0.026  0.065   0.255   0.688   0.955

The simulation results are collected in Table 5. We observe that all the methods perform quite well in that they all control the Type I error rate, and thus inverting these tests would be a valid way of constructing confidence sets for the parameter under test. This suggests that the bootstrap methods proposed by the authors can be extended to handle non-sparse high-dimensional linear models via the construction of [4]. In addition to the novel result by the authors, who show that bootstrap methods can be successfully applied to high-dimensional sparse linear models, our simulation evidence suggests that the boundary of bootstrap methods might be pushed further, to non-sparse settings, although a rigorous theoretical justification is left for future research. Another interesting direction would be to investigate how to address the problem for $|G| > 1$, or even very large $|G|$, in non-sparse scenarios.

To observe finite-sample differences among the methods, we calculated the Cramér–von Mises (CvM) and Kolmogorov–Smirnov (KS) distances between the empirical null distribution of the p-values and the uniform distribution. We see from Table 5 that CorrTMBG has the smallest distance from the uniform distribution in the case of homoscedastic errors. With heteroscedastic errors, however, all methods are practically indistinguishable, with perhaps CorrTPB and CorrTMBM performing slightly better. This last case mimics the low-dimensional behavior of the MBM and its strong robustness to heteroscedasticity (see [2]).
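For completeness, here is a sketch of how such distances can be computed with SciPy from statistics simulated under $H_0$; the exact normalization of the reported CvM and KS columns may differ from SciPy's defaults.

```python
import numpy as np
from scipy import stats

def pvalue_uniformity(t_null):
    """KS and CvM discrepancies between null p-values and Uniform(0, 1).

    t_null holds CorrT-type statistics simulated under H_0; since T_n is
    asymptotically N(0, 1), the two-sided p-values should be near-uniform.
    """
    pvals = 2.0 * stats.norm.sf(np.abs(np.asarray(t_null)))   # two-sided p-values
    ks = stats.kstest(pvals, "uniform").statistic
    cvm = stats.cramervonmises(pvals, "uniform").statistic
    return cvm, ks
```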
References

[1] Joel L. Horowitz. The bootstrap. In J.J. Heckman and E.E. Leamer, editors, Handbook of Econometrics, volume 5, chapter 52, pages 3159–3228. Elsevier, 1st edition, 2001.
[2] Enno Mammen. Bootstrap and wild bootstrap for high dimensional linear models. The Annals of Statistics, 21(1):255–285, 1993.
[3] Xianyang Zhang and Guang Cheng. Simultaneous inference for high-dimensional linear models. Journal of the American Statistical Association, forthcoming, 2016.
[4] Yinchu Zhu and Jelena Bradic. Hypothesis testing in non-sparse high-dimensional linear models. arXiv:1610.02122, 2016.