On Second order correctness of Bootstrap in Logistic Regression
Submitted to Bernoulli
DEBRAJ DAS$^{a,*}$ and PRIYAM DAS$^{b}$
$^{a}$Department of Mathematics and Statistics, Indian Institute of Technology Kanpur, India. E-mail: [email protected]
$^{b}$Department of Biomedical Informatics, Harvard Medical School, Boston, USA. E-mail: priyam [email protected]
$^{*}$Research partially supported by DST Inspire fellowship DST/INSPIRE/04/2018/001290.
Abstract
In fields with a dichotomous response variable, such as clinical trials, biomedical surveys, marketing and banking, logistic regression is considered a convenient alternative to linear regression. In this paper, we develop a novel perturbation Bootstrap technique for approximating the distribution of the maximum likelihood estimator (MLE) of the regression parameter vector. We establish second order correctness of the proposed Bootstrap method, which results in improved inference compared to that based on asymptotic normality. The main challenge in establishing second order correctness lies in the fact that, the response variable being binary, the resulting MLE has a lattice structure. We show that the direct Bootstrap approach fails even after studentization. We adopt the smoothing technique developed in Lahiri (1993) to ensure that the smoothed studentized version of the MLE has a density. A similar smoothing strategy is employed for the Bootstrap version to achieve a second order correct approximation. Good finite-sample properties of the proposed Bootstrap method are shown through simulation experiments. The proposed method is also used to find confidence intervals for the coefficients of the covariates in a dataset from the field of healthcare operations decisions.
Keywords:
Logistic Regression, PEBBLE, SOC, Lattice, Smoothing, Perturbation Bootstrap.
1. Introduction
Logistic regression is one of the most widely used regression techniques when the response variable is binary. The use of the ‘logit’ function as a statistical tool dates back to Berkson (1944), followed by Cox (1958), who popularized it in the field of regression. Following those seminal works, numerous applications of logistic regression can be found in different fields, from the banking sector to epidemiology, clinical trials and biomedical surveys, among others (Hosmer, Lemeshow and Sturdivant (2013)). The logistic regression model is given as follows. Suppose $y$ denotes the binary response variable and the value of $y$ depends on the $p$ independent variables $x=(x_1,\dots,x_p)'$. Instead of capturing this dependence by modelling $y$ directly on the covariates, in logistic regression the log-odds corresponding to the success probability of $y$, denoted by $p(x)=P(y=1)$, is modelled as a linear function of the covariates. The odds for the event $\{y=1\}$ are given by $\mathrm{odd}(x)=\frac{p(x)}{1-p(x)}$. The logistic regression model is given by
\[ \mathrm{logit}\big(p(x)\big)=\log\Big[\frac{p(x)}{1-p(x)}\Big]=x'\beta, \qquad (1.1) \]
where $\beta=(\beta_1,\dots,\beta_p)'$ is the $p$-dimensional vector of regression parameters. Conventionally, the maximum likelihood estimator (MLE) of $\beta$ is used for the purpose of inference. For a given sample $\{(x_i,y_i)\}_{i=1}^{n}$, the likelihood is given by
\[ L(\beta\,|\,y_1,\dots,y_n,x_1,\dots,x_n)=\prod_{i=1}^{n}p(x_i)^{y_i}\big(1-p(x_i)\big)^{1-y_i}, \]
where $p(\beta\,|\,x_i)=\dfrac{e^{x_i'\beta}}{1+e^{x_i'\beta}}$. The MLE $\hat\beta_n$ of $\beta$ is defined as the maximizer of $L(\beta\,|\,y_1,\dots,y_n,x_1,\dots,x_n)$, which is obtained by solving
\[ \sum_{i=1}^{n}\big(y_i-p(\beta\,|\,x_i)\big)x_i=0. \qquad (1.2) \]
In order to find confidence intervals for different regression coefficients, or to test whether a certain covariate is of importance or not, it is required to find a good approximation of the distribution of $\hat\beta_n$. Since $\hat\beta_n$ is the MLE, its distribution is approximately normal under certain regularity conditions. Asymptotic normality as well as other large sample properties of $\hat\beta_n$ have been studied extensively in the literature (cf. Haberman (1974), McFadden (1974), Amemiya (1976), Gourieroux and Monfort (1981), Fahrmeir and Kaufmann (1985)).

As an alternative to asymptotic normality, Efron (1979) proposed the Bootstrap approximation, which has been shown to work in a wide class of models, especially in the case of multiple linear regression. In the last few decades, several variants of the Bootstrap have been developed for linear regression. Depending on whether the covariates are non-random or random in the linear regression setup, Freedman (1981) proposed the residual Bootstrap and the paired Bootstrap, respectively. A few other variants of Bootstrap methods in the linear regression setup are the wild Bootstrap (cf. Liu (1988), Mammen (1993)), the weighted Bootstrap (Lahiri (1992), Barbe and Bertail (2012)) and the perturbation Bootstrap (Das and Lahiri (2019)). Using mechanisms similar to the residual and the paired Bootstrap, Moulton and Zeger (1989, 1991) developed the standardized Pearson residual resampling and the observation vector resampling Bootstrap methods in generalized linear models (GLM). Lee (1990) considered the logistic regression model and showed that the conditional distributions of these resampling-based Bootstrap estimators, given the data, are close to the distribution of the original estimator in the almost sure sense. Claeskens et al.
(2003) proposed a couple of Bootstrap methods for logistic regression in the univariate case, namely the ‘Linear one-step Bootstrap’ and the ‘Quadratic one-step Bootstrap’. The ‘Linear one-step Bootstrap’ was developed following the linearization principle proposed in Davison et al. (1986), whereas the ‘Quadratic one-step Bootstrap’ was constructed based on the quadratic approximation of the estimators as discussed in Ghosh (1994). The validity of these two Bootstrap methods for approximating the underlying distribution in the almost sure sense was also established. All of these results are, however, first order results; a Bootstrap approximation is called second order correct (SOC) if it approximates the underlying distribution with an error of order $o(n^{-1/2})$. In order to draw more accurate inference results compared to those based on the asymptotic normal distribution, SOC is essential. An elaborate description of the results on SOC of the residual, generalized and perturbation Bootstrap methods in linear regression can be found in Lahiri (1992), Barbe and Bertail (2012), Das and Lahiri (2019) and the references therein. However, to the best of our knowledge, SOC has not been explored for any of the existing Bootstrap methods for logistic regression. In this paper, we propose Perturbation Bootstrap in Logistic Regression (PEBBLE) as an alternative to the normal approximation approach. Whenever the underlying estimator is the minimizer of a certain objective function, the perturbation Bootstrap simply produces a Bootstrap version of the estimator by finding the minimizer of a random objective function, suitably developed by perturbing the original objective function using some non-negative random variables. We show that the perturbation Bootstrap attains SOC in approximating the distribution of $\hat\beta_n$. For the sake of comparison with the proposed Bootstrap method, we also find the error rate of the normal approximation for the distribution of the studentized version of $\hat\beta_n$, which turns out to be $O(n^{-1/2}\log n)$. The extra "$\log n$" term in the error rate appears due to the underlying lattice structure. Therefore, the inference based on our Bootstrap method is more accurate than that based on asymptotic normality.

In order to establish SOC for the proposed method, we start with the studentization of $\sqrt{n}(\hat\beta_n-\beta)$ and of its perturbation Bootstrap version. We show that, unlike in the case of multiple linear regression, here SOC cannot in general be achieved by studentization of $\sqrt{n}(\hat\beta_n-\beta)$ alone, due to the lattice nature of the distribution of the logistic regression estimator $\hat\beta_n$. The lattice nature of the distribution is induced by the binary nature of the response variable. It is common practice to establish SOC by comparing the Edgeworth expansions in the original and the Bootstrap cases (cf. Hall (1992)). However, the usual Edgeworth expansion does not exist when the underlying setup is lattice. Therefore, correction terms are required to take care of the lattice nature. For example, one can compare Theorem 20.8 and Corollary 23.2 in Bhattacharya and Rao (1986) [hereafter referred to as BR(86)] to see the correction terms required in the Edgeworth expansions whenever the underlying structure is lattice. In general, these correction terms cannot be approximated with an error of $o(n^{-1/2})$, which makes SOC unachievable even with studentization. As a remedy we adopt the novel smoothing technique developed in Lahiri (1993). First, this smoothing technique is applied to remove the lattice nature of the distribution of the studentized version and make it absolutely continuous. Thus the resulting correction terms do not appear in the underlying Edgeworth expansion.
Further, we use the same smoothing technique for the Bootstrap version and establish SOC by comparing the Edgeworth expansions across the original and the Bootstrap cases. Moreover, an interesting property of the smoothing is that it has a negligible effect on the asymptotic variance of $\hat\beta_n$, and therefore it is not required to incorporate the effect of the smoothing into the studentization. In order to prove the results, we establish an Edgeworth expansion for a smoothed version of a sequence of sample means of independent random vectors, even when they are not identically distributed (cf. Lemma 3). Lemma 3 may be of independent interest for establishing SOC of the Bootstrap in other related problems.

The rest of the paper is organized as follows. The perturbation Bootstrap version of the logistic regression estimator is described in Section 2. The main results, including the theoretical properties of the Bootstrap along with the normal approximation, are stated in Section 3. In Section 4, the finite-sample performance of PEBBLE is evaluated by comparing it with other related existing methods through simulation experiments. Section 5 gives an illustration of PEBBLE on a healthcare operations decision dataset. Auxiliary lemmas and the proofs of the theorems are presented in Section 6. Finally, we conclude on the proposed methodology in Section 7.
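Before moving on, the model (1.1) and the score equation (1.2) can be illustrated numerically. The following minimal R sketch (R being the language in which the paper's simulations are implemented) uses a design matrix and a parameter vector chosen by us purely for illustration; it fits the MLE with glm() and checks that the score equation (1.2) holds at $\hat\beta_n$ up to numerical error.

```r
## Minimal illustration of model (1.1) and score equation (1.2).
## The design and the value of beta below are illustrative, not taken from the paper.
set.seed(1)
n <- 200; p <- 3
X <- matrix(rnorm(n * p), n, p)                 # covariate matrix, rows x_i'
beta <- c(1, -0.5, 0.25)                        # illustrative true parameter
prob <- 1 / (1 + exp(-drop(X %*% beta)))        # p(beta | x_i)
y <- rbinom(n, 1, prob)                         # binary responses

## MLE beta_hat_n: maximizer of the likelihood, computed here via glm()
## (no intercept, so that x_i carries all p covariates as in (1.1)).
fit <- glm(y ~ X - 1, family = binomial)
beta_hat <- coef(fit)

## The MLE solves (1.2): sum_i (y_i - p(beta_hat | x_i)) x_i = 0, up to numerical error.
p_hat <- 1 / (1 + exp(-drop(X %*% beta_hat)))
score <- colSums((y - p_hat) * X)
print(score)                                    # each component is numerically ~ 0
```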
2. Description of PEBBLE
In this section, we define the perturbation Bootstrapped version of the logistic regression estimator. Let $G_1^*,\dots,G_n^*$ be $n$ independent copies of a non-negative and non-degenerate random variable $G^*$ with mean $\mu_{G^*}$, $Var(G^*)=\mu_{G^*}^2$ and $E(G^*-\mu_{G^*})^3=\mu_{G^*}^3$. These quantities serve as the perturbing random quantities in the construction of the perturbation Bootstrap version of the logistic regression estimator. We define the Bootstrap version as the maximizer of a carefully constructed objective function which involves the observed values $y_1,\dots,y_n$ as well as the estimated probabilities of success $\hat p(x_i)=\dfrac{e^{x_i'\hat\beta_n}}{1+e^{x_i'\hat\beta_n}}$, $i=1,\dots,n$. Formally, the perturbation Bootstrapped logistic regression estimator $\hat\beta_n^*$ is defined as
\[ \hat\beta_n^*=\arg\max_{t}\Big[\sum_{i=1}^{n}\big\{(y_i-\hat p(x_i))\,x_i't\big\}(G_i^*-\mu_{G^*})+\mu_{G^*}\sum_{i=1}^{n}\big\{\hat p(x_i)(x_i't)-\log\big(1+e^{x_i't}\big)\big\}\Big]. \]
In other words, $\hat\beta_n^*$ is the solution of the equation
\[ \sum_{i=1}^{n}\big(y_i-\hat p(x_i)\big)x_i(G_i^*-\mu_{G^*})\mu_{G^*}^{-1}+\sum_{i=1}^{n}\big(\hat p(x_i)-p(t\,|\,x_i)\big)x_i=0, \qquad (2.1) \]
since the derivative of the LHS of (2.1) with respect to $t$ is negative definite. If the Bootstrap equation (2.1) is compared with the original equation (1.2), it is easy to note that the second part of the LHS of (2.1) is the estimated version of the LHS of (1.2). The Bootstrap randomness comes from the first part of the LHS of (2.1), i.e., $\sum_{i=1}^{n}\big(y_i-\hat p(x_i)\big)x_i(G_i^*-\mu_{G^*})\mu_{G^*}^{-1}$. Also, the first part is the main contributing term in the asymptotic expansion of the studentized version of $\hat\beta_n^*$. One immediate choice for the distribution of $G^*$ is Beta(1/2, 3/2),
since the required conditions on $G^*$ are satisfied for this distribution. Other choices can be found in Liu (1988), Mammen (1993) and Das and Lahiri (2019). The conditions stated above on $G^*$ are assumed to be true for the rest of this paper. Any additional assumption on $G^*$ will be stated in the respective theorems.
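To make the construction concrete, here is a minimal R sketch of a single draw of $\hat\beta_n^*$, obtained by maximizing the perturbed objective above (equivalently, solving (2.1)). It reuses the illustrative objects X, y, beta_hat and p_hat from the sketch at the end of Section 1; the function name pebble_draw is ours, and Beta(1/2, 3/2) is used for $G^*$ as suggested in the text.

```r
## One perturbation-Bootstrap draw of beta_hat_star, i.e. one solution of (2.1),
## reusing X, y, beta_hat, p_hat from the earlier sketch.
## For G* ~ Beta(1/2, 3/2): mu_G = 1/4, Var(G*) = mu_G^2, E(G* - mu_G)^3 = mu_G^3.
pebble_draw <- function(X, y, beta_hat, p_hat, G, mu_G = 1/4) {
  ## Negative of the (concave) perturbed objective of Section 2, as a function of t.
  neg_obj <- function(t) {
    xt <- drop(X %*% t)
    -(sum(((y - p_hat) * xt) * (G - mu_G)) +
        mu_G * sum(p_hat * xt - log1p(exp(xt))))
  }
  optim(beta_hat, neg_obj, method = "BFGS")$par   # start the search from the MLE
}

set.seed(2)
G <- rbeta(nrow(X), shape1 = 1/2, shape2 = 3/2)   # perturbing variables G*_1, ..., G*_n
beta_star <- pebble_draw(X, y, beta_hat, p_hat, G)
beta_star
```

Repeating the draw many times gives the conditional (Bootstrap) distribution of $\hat\beta_n^*-\hat\beta_n$; a sketch of the studentized and smoothed pivots built from such draws follows the main results in Section 3.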
3. Main Results
In this section, we describe the theoretical results for the Bootstrap as well as for the normal approximation. In Section 3.1 we state a Berry-Esseen type theorem for a studentized version of the logistic regression estimator $\hat\beta_n$. In Section 3.2 we explore the effectiveness of the Bootstrap in approximating the distribution of the studentized version. Theorem 2 shows that SOC is not achievable solely by studentization, even when $p=1$. As a remedy, we introduce a smoothing in the studentization and show that the proposed Bootstrap method achieves SOC.

Before exploring the rate of the normal approximation, we first define the class of sets that we consider in the following theorems. For any natural number $m$, the class of sets $\mathcal{A}_m$ is the collection of Borel subsets of $\mathbb{R}^m$ satisfying
\[ \sup_{B\in\mathcal{A}_m}\Phi\big((\partial B)^{\epsilon}\big)=O(\epsilon)\quad\text{as }\epsilon\downarrow 0. \]
Here $\Phi$ denotes the normal distribution with mean $0$ and dispersion matrix equal to the identity matrix. We use the class $\mathcal{A}_p$ for the uniform asymptotic results on the normal and Bootstrap approximations. $P_*$ denotes the Bootstrap probability, i.e., the conditional probability with respect to $G_1^*,\dots,G_n^*$ given the data $\{y_1,\dots,y_n\}$.

3.1. Rate of normal approximation

In this subsection we explore the rate of normal approximation of a suitable studentized version of the logistic regression estimator $\hat\beta_n$, uniformly over the class of sets $\mathcal{A}_p$. From the definition (1.2) of $\hat\beta_n$, we have $\sum_{i=1}^{n}(y_i-\hat p(x_i))x_i=0$. Using a Taylor expansion of $\sqrt{n}\big(\hat\beta_n-\beta\big)$, it is easy to see that the asymptotic variance of $\sqrt{n}\big(\hat\beta_n-\beta\big)$ is $L_n^{-1}$, where $L_n=n^{-1}\sum_{i=1}^{n}x_ix_i'e^{x_i'\beta}\big(1+e^{x_i'\beta}\big)^{-2}$. An estimator of $L_n$ can be obtained by replacing $\beta$ with $\hat\beta_n$ in the form of $L_n$. Hence we can define the studentized version of $\hat\beta_n$ as
\[ \tilde H_n=\sqrt{n}\,\hat L_n^{1/2}\big(\hat\beta_n-\beta\big), \]
where $\hat L_n=n^{-1}\sum_{i=1}^{n}x_ix_i'e^{x_i'\hat\beta_n}\big(1+e^{x_i'\hat\beta_n}\big)^{-2}$. Other studentized versions can be constructed by considering other estimators of $L_n$. For details of the construction of different studentized versions, one can look into Lahiri (1994). The result on the normal approximation will also hold for other studentized versions, as long as the estimator of $L_n$ involved is $\sqrt{n}$-consistent.

The Berry-Esseen theorem states that the error in the normal approximation of the distribution of the mean of a sequence of independent random variables is $O(n^{-1/2})$, provided the average third absolute moment is bounded (cf. Theorem 12.4 in BR(86)). Note that in part (b) of Theorem 1 below there is an extra multiplicative "$\log n$" term besides the usual $n^{-1/2}$ term in the error rate of the normal approximation, which is due to the error incurred in the Taylor approximation of $\sqrt{n}(\hat\beta_n-\beta)$. Since the underlying setup in logistic regression is lattice in nature, in general this error cannot be corrected by higher order approximations such as Edgeworth expansions. Further, one important tool in deriving the error rate in the normal approximation, and later the higher order result for the Bootstrap, is the rate of convergence of $\hat\beta_n$ to $\beta$. To this end, we state our first theorem as follows. Theorem 1.
Suppose $n^{-1}\sum_{i=1}^{n}\|x_i\|^{3}=O(1)$ and $L_n\to L$ as $n\to\infty$, where $L$ is a positive definite matrix. Then
(a) there exists a positive constant $C_1$ such that when $n>C_1$ we have
\[ P\Big(\hat\beta_n\ \text{solves }(1.2)\ \text{and}\ \|\hat\beta_n-\beta\|\le C_1n^{-1/2}(\log n)^{1/2}\Big)=1-o\big(n^{-1/2}\big); \]
(b) we have
\[ \sup_{B\in\mathcal{A}_p}\big|P\big(\tilde H_n\in B\big)-\Phi(B)\big|=O\big(n^{-1/2}\log n\big). \]

The proof of Theorem 1 is presented in Section 6. Theorem 1 shows that the normal approximation of the distribution of $\tilde H_n$, the studentized logistic regression estimator, has a nearly optimal Berry-Esseen rate. However, the rate can be improved significantly by the Bootstrap together with an application of smoothing, as described in Section 3.2.

3.2. Rate of Bootstrap approximation

In this subsection, we study the rate of Bootstrap approximation for the distribution of the logistic regression estimator. To that end, before exploring the rate of convergence of the Bootstrap, we need to define suitable studentized versions in both the original and the Bootstrap setting. Similarly to the original case, the asymptotic variance of the Bootstrapped logistic regression estimator $\hat\beta_n^*$ is needed to define the studentized version in the Bootstrap setting. Using a Taylor expansion, it is easy to see from (2.1) that the asymptotic variance of $\sqrt{n}\big(\hat\beta_n^*-\hat\beta_n\big)$ is $\hat L_n^{-1}\hat M_n\hat L_n^{-1}$, where $\hat L_n=n^{-1}\sum_{i=1}^{n}x_ix_i'e^{x_i'\hat\beta_n}\big(1+e^{x_i'\hat\beta_n}\big)^{-2}$ and $\hat M_n=n^{-1}\sum_{i=1}^{n}\big(y_i-\hat p(x_i)\big)^2x_ix_i'$. Therefore the studentized version in the Bootstrap setting can be defined as
\[ H_n^*=\sqrt{n}\,\hat M_n^{*-1/2}L_n^*\big(\hat\beta_n^*-\hat\beta_n\big), \]
where $L_n^*=n^{-1}\sum_{i=1}^{n}x_ix_i'e^{x_i'\hat\beta_n^*}\big(1+e^{x_i'\hat\beta_n^*}\big)^{-2}$ and $\hat M_n^*=n^{-1}\sum_{i=1}^{n}\big(y_i-\hat p(x_i)\big)^2x_ix_i'\mu_{G^*}^{-2}(G_i^*-\mu_{G^*})^2$. Analogously, we define the original studentized version as
\[ H_n=\sqrt{n}\,\hat M_n^{-1/2}\hat L_n\big(\hat\beta_n-\beta\big), \]
which will be used for investigating SOC of the Bootstrap for the rest of this section. In the next theorem we show that $H_n^*$ fails to be SOC in approximating the distribution of $H_n$, even when $p=1$. Theorem 2.
Suppose $p=1$ and denote the single covariate in model (1.1) by $x$. Let $x_1,\dots,x_n$ be the observed values of $x$ and let $\beta$ be the true value of the regression parameter. Define $\mu_n=n^{-1}\sum_{i=1}^{n}x_i\,p(\beta\,|\,x_i)$. Assume the following conditions hold:
(C.1) $x_1,\dots,x_n$ are non-random and are all integers.
(C.2) $x_{i_1}=\dots=x_{i_m}=1$, where $\{i_1,\dots,i_m\}\subseteq\{1,\dots,n\}$ with $m\ge(\log n)^2$.
(C.3) $\max\{|x_i|:i=1,\dots,n\}=O(1)$ and $\liminf_{n\to\infty}\big[n^{-1}\sum_{i=1}^{n}|x_i|\big]>0$.
(C.4) $\sqrt{n}\,|\mu_n|<M$ for $n\ge M$, where $M$ is a positive constant.
(C.5) The distribution of $G^*$ has an absolutely continuous component with respect to Lebesgue measure and $E{G^*}^4<\infty$.
Then there exist an interval $B_n$ and a positive constant $M_1$ (which does not depend on $n$) such that
\[ \lim_{n\to\infty}P\Big(\sqrt{n}\,\big|P_*\big(H_n^*\in B_n\big)-P\big(H_n\in B_n\big)\big|\ge M_1\Big)=1. \]

The proof of Theorem 2 is presented in Section 6. Theorem 2 shows that, unlike in the case of multiple linear regression, in general the Bootstrap cannot achieve SOC even with studentization. We now look further into the form of the set $B_n$. $B_n$ is of the form $f_n(E_n\times\mathbb{R})$ with $E_n=(-\infty,z_n]$ and $z_n=\big(\tfrac{1}{2}n^{-1/2}-\sqrt{n}\,\mu_n\big)$. Here $f_n(\cdot)$ is a continuous function which is obtained from the Taylor expansion of $H_n$. Since $E_n\times\mathbb{R}$ is a convex subset of $\mathbb{R}^2$, it is also a connected set. Since $f_n(\cdot)$ is a continuous function, $B_n$ is a connected subset of $\mathbb{R}$ and hence is an interval.

Now, we define the smoothed versions of $H_n$ and $H_n^*$, which are necessary for achieving SOC by the Bootstrap for general $p$. Note that the primary reason behind the Bootstrap's failure is the lattice nature of the distribution of $\sqrt{n}(\hat\beta_n-\beta)$. Hence if one can somehow smooth the distribution of $\sqrt{n}(\hat\beta_n-\beta)$, or more generally the distribution of $H_n$, so that the smoothed version has a density with respect to Lebesgue measure, then the Bootstrap may be shown to achieve SOC by employing the theory of Edgeworth expansions. To that end, suppose $Z$ is a $p$-dimensional standard normal random vector, independent of $y_1,\dots,y_n$. Define the smoothed version of $H_n$ as
\[ \check H_n=H_n+\hat M_n^{-1/2}b_nZ, \qquad (3.1) \]
where $\{b_n\}_{n\ge1}$ is a suitable sequence chosen so that it has a negligible effect on the variance of $\sqrt{n}(\hat\beta_n-\beta)$ and hence on the studentization factor. See Theorem 3 for the conditions on $\{b_n\}_{n\ge1}$. To define the smoothed studentized version in the Bootstrap setting, consider another $p$-dimensional standard normal vector $Z^*$ which is independent of $y_1,\dots,y_n$, $G_1^*,\dots,G_n^*$ and $Z$. Define the smoothed version of $H_n^*$ as
\[ \check H_n^*=H_n^*+\hat M_n^{*-1/2}b_nZ^*. \qquad (3.2) \]
The following theorem can be regarded as the main theorem of this section, as it shows that the smoothing does the trick for the Bootstrap to achieve SOC. Thus the inference on $\beta$ based on the Bootstrap after smoothing is much more accurate than that based on the normal approximation. To state the main theorem, define $W_i=\big(y_{1i}x_i',\,[y_{1i}^2-Ey_{1i}^2]z_i'\big)'$, where $y_{1i}=(y_i-p(\beta\,|\,x_i))$ and $z_i=(x_{i1}^2,x_{i1}x_{i2},\dots,x_{i1}x_{ip},x_{i2}^2,x_{i2}x_{i3},\dots,x_{i2}x_{ip},\dots,x_{ip}^2)'$ with $x_i=(x_{i1},\dots,x_{ip})'$, $i\in\{1,\dots,n\}$.
Theorem 3.
Suppose $n^{-1}\sum_{i=1}^{n}\|x_i\|^{6}=O(1)$ and the matrix $n^{-1}\sum_{i=1}^{n}Var(W_i)$ converges to some positive definite matrix as $n\to\infty$. Also choose the sequence $\{b_n\}_{n\ge1}$ such that $b_n=O(n^{-d})$ and $n^{-1/p_1}\log n=o(b_n)$, where $d>0$ is a constant and $p_1=\max\{p+1,4\}$. Then
(a) there exist two positive constants $C_2, C_3$ such that when $n>C_2$ we have
\[ P_*\Big(\hat\beta_n^*\ \text{solves }(2.1)\ \text{and}\ \|\hat\beta_n^*-\hat\beta_n\|\le C_3\,n^{-1/2}(\log n)^{1/2}\Big)=1-o_p\big(n^{-1/2}\big); \]
(b) we have
\[ \sup_{B\in\mathcal{A}_p}\big|P_*\big(\check H_n^*\in B\big)-P\big(\check H_n\in B\big)\big|=o_p\big(n^{-1/2}\big). \]

The proof of Theorem 3 is presented in Section 6. Theorem 3 shows that SOC of PEBBLE can be achieved by a simple smoothing of the studentized pivotal quantities. As a result, much more accurate inference on $\beta$ can be drawn based on the Bootstrap than based on the normal approximation, especially when $n$ is not large compared to $p$. The finite-sample simulation results presented in Table 1 also confirm this fact.

Remark 1. The class of sets $\mathcal{A}_p$ used to state the uniform asymptotic results is somewhat abstract. Note that there are two major reasons behind considering this class. The first reason is to obtain asymptotic normality, or valid Edgeworth expansions, for the normalized part of the underlying pivot, and the second is to bound the remainder term by the required small magnitude with sufficiently large probability (or Bootstrap probability). A natural choice for $\mathcal{A}_p$ is the collection of all Borel measurable convex subsets of $\mathbb{R}^p$, due to Theorem 3.1 in BR(86).

Remark 2. The results on the Bootstrap approximation presented in Theorem 3 may also be established in the almost sure sense. In that case the only additional requirement is a stronger moment bound of the form $n^{-1}\sum_{i=1}^{n}\|x_i\|^{r}=O(1)$ for a suitably larger power $r$, since $y_1,\dots,y_n$ can take only the values 0 or 1. Actually, an almost sure version of part (a) of Theorem 3 is necessary to establish Theorem 2. Note that the requirement for the almost sure version is met under the assumptions of Theorem 2.

Remark 3. Note that the random quantities $Z$ and $Z^*$, introduced respectively in (3.1) and (3.2), are essential in achieving SOC of the Bootstrap. $Z$ and $Z^*$ are both assumed to be distributed as $N(0,I_p)$, $I_p$ being the $p\times p$ identity matrix. However, Theorem 3 remains true if we replace $I_p$ by any diagonal matrix, i.e., Theorem 3 is true even if we only assume that the components of $Z$ (and of $Z^*$) are independent and have normal distributions.
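For completeness, the following R sketch turns Bootstrap draws into the smoothed pivots $\check H_n^*$ of (3.2) and then into coordinate-wise equal-tailed 90% intervals for $\beta$. It continues the illustrative sketches above (reusing X, y, beta_hat, p_hat and pebble_draw); the choice b_n = n^(-0.1), the number of Bootstrap draws and the simple inversion of the pivot are ours for illustration, while the confidence set construction actually used in the paper's simulations is given in its Supplementary Material.

```r
## Smoothed Bootstrap pivots (3.2) and illustrative 90% CIs for the coordinates of beta.
## Assumes X, y, beta_hat, p_hat and pebble_draw() from the sketches above.
msqrt <- function(A, pow) {                      # symmetric matrix power via eigendecomposition
  e <- eigen(A, symmetric = TRUE)                # assumes A is positive definite
  e$vectors %*% (e$values^pow * t(e$vectors))
}

n <- nrow(X); p <- ncol(X); mu_G <- 1/4
b_n <- n^(-0.1)                                  # illustrative smoothing level (cf. Theorem 3)
L_hat <- crossprod(X * (p_hat * (1 - p_hat)), X) / n        # \hat{L}_n
M_hat <- crossprod(X * (y - p_hat)^2, X) / n                # \hat{M}_n

B <- 1000
H_check_star <- replicate(B, {
  G <- rbeta(n, 1/2, 3/2)
  b_star <- pebble_draw(X, y, beta_hat, p_hat, G)
  p_star <- 1 / (1 + exp(-drop(X %*% b_star)))
  L_star <- crossprod(X * (p_star * (1 - p_star)), X) / n                       # L*_n
  M_star <- crossprod(X * ((y - p_hat) * (G - mu_G) / mu_G)^2, X) / n           # \hat{M}*_n
  ## \check{H}*_n = \hat{M}*_n^{-1/2} { sqrt(n) L*_n (beta*_n - beta_hat) + b_n Z* }
  drop(msqrt(M_star, -0.5) %*% (sqrt(n) * L_star %*% (b_star - beta_hat) + b_n * rnorm(p)))
})

## Map the pivot back to the beta scale (H_n is approx. sqrt(n) M_hat^{-1/2} L_hat (beta_hat - beta))
## and form equal-tailed 90% intervals for each coordinate of beta.
delta_star <- solve(L_hat) %*% msqrt(M_hat, 0.5) %*% H_check_star / sqrt(n)     # p x B
q <- t(apply(delta_star, 1, quantile, probs = c(0.95, 0.05)))
data.frame(lower = beta_hat - q[, 1], upper = beta_hat - q[, 2])
```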
4. Simulation Study
In this section, we compare the performance of PEBBLE with other existing methods via simulation experiments. For the comparative study, we consider the Normal approximation, the Pearson Residual Resampling Bootstrap (PRRB, Moulton and Zeger (1991)), the One-Step Bootstrap (OSB) and the Quadratic Bootstrap (QB).
The data are generated as follows. We consider a fixed vector $b$ of length $8$ whose first entry is $1$. For the scenarios with $p\le 8$, we take the true parameter vector $\beta$ to be the first $p$ elements of $b$. The covariate vector $X$ is generated from a multivariate normal distribution with mean $0$ and variance $\Sigma=\{\sigma_{ij}\}_{p\times p}$, where $\sigma_{ij}=0.5^{|i-j|}$. In order to assess the performance of all the methods for various dimensions of the coefficient vector and various sample sizes, we consider the following cases: $(n,p)=(30,3)$, $(50,3)$, $(50,4)$, $(100,3)$, $(100,4)$, $(100,6)$, $(200,3)$, $(200,4)$, $(200,6)$ and $(200,8)$. For the smoothing we take $p_1=\max\{p+1,4\}$ and $b_n=n^{-1/p_1}$. Both $Z$ and $Z^*$ are drawn independently from the multivariate normal distribution with mean $0$ and variance $I_p$. The $G_i^*$ are generated from the Beta(1/2, 3/2) distribution. Further details regarding the forms of the confidence sets for PEBBLE are provided in Section 2 of the Supplementary Material. PEBBLE is implemented in R, as are the other methods, namely the Normal approximation, PRRB, OSB and QB. For each experiment we use 1000 Bootstrap iterations, and in order to estimate coverage, each $(n,p)$ scenario is repeated 1000 times. In Table 1, we report the empirical coverage of the lower 90% confidence region of $\beta$, the upper, middle and lower 90% confidence intervals (CIs) corresponding to the minimum and maximum components of $\beta$, and the average, over all components of $\beta$, of the empirical coverages of the upper, middle and lower 90% CIs. Average widths of the 90% middle CIs are also noted in parentheses for all applicable cases. It is noted that, in general, PEBBLE performs better than the other methods, specifically for the smaller $n:p$ scenarios (small sample size, high dimension), i.e., the cases corresponding to $(n,p)=(30,3)$, $(50,4)$, $(100,6)$ and $(200,8)$ in our study. For example, for $(n,p)=(100,6)$ and $(200,8)$ PEBBLE outperforms the other methods by a big margin. As $n$ increases for fixed $p$, the performance of PEBBLE improves and the widths of the CIs decrease, as expected, and PEBBLE's margin of improvement over the other methods becomes comparatively bigger. It is also noted that, for all the simulation scenarios, the average coverage over all coordinates is much closer to 0.90 for PEBBLE than for the other methods. We observe that for the relatively smaller $n:p$ scenarios the PEBBLE CIs are a little wider than those of the other methods, but as $n$ increases (for fixed $p$) the PEBBLE CI widths become closer to those observed for the other methods.
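A compact R sketch of one cell of this simulation design is given below. The entries of beta_true are placeholders of ours (the paper uses the first $p$ entries of its fixed vector $b$), and the coverage loop is indicated only in comments.

```r
## One simulated data set from the design of Section 4:
## X ~ N_p(0, Sigma) with Sigma_{ij} = 0.5^{|i-j|}, y_i ~ Bernoulli(p(x_i' beta)).
simulate_once <- function(n, p, beta_true) {
  Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))
  X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)   # rows have covariance Sigma
  prob <- 1 / (1 + exp(-drop(X %*% beta_true)))
  y <- rbinom(n, 1, prob)
  list(X = X, y = y)
}

dat <- simulate_once(n = 100, p = 4, beta_true = c(1, -0.5, 0.5, -1))  # illustrative beta
## Empirical coverage for one (n, p) cell would repeat: simulate_once(), refit the MLE,
## rebuild the PEBBLE CIs as in the earlier sketch, and record whether each true
## coefficient falls inside its 90% interval; the reported coverage is the average
## of these indicators over the replications.
```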
5. Application to Healthcare Operations Decision
Vaginal delivery is the most common type of birth. However, due to several medical reasons and with the advancement of medical procedures, caesarian delivery is often considered as an alternative way of delivery. Recently, a few studies showed how the recommended type of delivery may depend on various clinical aspects of the mother, including age, blood pressure and heart problems (Rydahl et al. (2019), Amorim et al. (2017), Pieper (2012)). We consider a dataset about caesarian section
Columns: β (lower); β_min middle (width), upper, lower; β_max middle (width), upper, lower; β_avg middle (width), upper, lower.

(n,p) = (30,3)
PEBBLE  0.916  0.885 (2.82)  0.861  0.918  0.936 (3.95)  0.928  0.914  0.900 (3.09)  0.888  0.913
Normal  0.952  0.947 (2.31)  0.956  0.896  0.964 (2.86)  0.909  0.993  0.958 (2.42)  0.939  0.935
PRRB    0.946  0.916 (2.17)  0.926  0.873  0.943 (2.66)  0.914  0.932  0.930 (2.27)  0.915  0.905
OSB     0.953  0.942 (2.34)  0.940  0.889  0.930 (2.67)  0.911  0.939  0.930 (2.38)  0.916  0.921
QB      0.976  0.952 (2.49)  0.950  0.924  0.958 (3.07)  0.936  0.965  0.936 (2.53)  0.920  0.940

(n,p) = (50,3)
PEBBLE  0.888  0.891 (2.07)  0.878  0.923  0.909 (2.89)  0.925  0.895  0.904 (2.20)  0.901  0.912
Normal  0.937  0.927 (1.76)  0.924  0.901  0.948 (2.18)  0.906  0.971  0.936 (1.80)  0.920  0.930
PRRB    0.917  0.892 (1.68)  0.896  0.880  0.912 (2.06)  0.899  0.933  0.902 (1.71)  0.896  0.909
OSB     0.925  0.911 (1.79)  0.907  0.885  0.905 (2.04)  0.903  0.928  0.913 (1.77)  0.908  0.910
QB      0.932  0.915 (1.86)  0.904  0.913  0.916 (2.12)  0.904  0.935  0.922 (1.84)  0.914  0.922

(n,p) = (50,4)
PEBBLE  0.909  0.901 (2.92)  0.877  0.936  0.909 (3.87)  0.936  0.879  0.902 (2.71)  0.897  0.910
Normal  0.931  0.926 (2.14)  0.952  0.902  0.951 (2.62)  0.906  0.985  0.939 (2.03)  0.926  0.926
PRRB    0.928  0.899 (1.99)  0.933  0.860  0.938 (2.42)  0.899  0.949  0.906 (1.88)  0.906  0.894
OSB     0.958  0.928 (2.20)  0.943  0.920  0.937 (2.44)  0.908  0.952  0.928 (2.03)  0.926  0.919
QB      0.954  0.924 (2.11)  0.931  0.915  0.926 (2.40)  0.891  0.954  0.924 (1.99)  0.923  0.912

(n,p) = (100,3)
PEBBLE  0.880  0.877 (1.19)  0.878  0.896  0.896 (1.69)  0.912  0.891  0.887 (1.35)  0.894  0.894
Normal  0.926  0.912 (1.08)  0.909  0.904  0.918 (1.40)  0.911  0.901  0.913 (1.18)  0.903  0.903
PRRB    0.905  0.901 (1.08)  0.907  0.901  0.912 (1.39)  0.916  0.891  0.901 (1.18)  0.902  0.898
OSB     0.906  0.897 (1.09)  0.900  0.899  0.896 (1.39)  0.915  0.877  0.897 (1.18)  0.900  0.894
QB      0.899  0.897 (1.08)  0.889  0.900  0.880 (1.33)  0.907  0.873  0.894 (1.17)  0.895  0.895

(n,p) = (100,4)
PEBBLE  0.885  0.907 (1.79)  0.891  0.927  0.900 (2.24)  0.920  0.880  0.898 (1.71)  0.899  0.902
Normal  0.928  0.917 (1.39)  0.924  0.903  0.942 (1.65)  0.912  0.929  0.916 (1.35)  0.910  0.904
PRRB    0.901  0.889 (1.35)  0.892  0.900  0.896 (1.60)  0.905  0.881  0.887 (1.32)  0.893  0.887
OSB     0.915  0.904 (1.41)  0.918  0.900  0.914 (1.63)  0.915  0.899  0.904 (1.36)  0.906  0.900
QB      0.940  0.920 (1.49)  0.934  0.902  0.943 (1.86)  0.937  0.926  0.912 (1.42)  0.913  0.903

(n,p) = (100,6)
PEBBLE  0.931  0.910 (1.77)  0.880  0.917  0.907 (2.79)  0.929  0.868  0.906 (2.08)  0.908  0.902
Normal  0.857  0.874 (1.23)  0.883  0.871  0.903 (1.68)  0.882  0.937  0.871 (1.34)  0.877  0.887
PRRB    0.849  0.854 (1.22)  0.878  0.870  0.884 (1.66)  0.869  0.914  0.848 (1.33)  0.866  0.874
OSB     0.933  0.797 (1.29)  0.848  0.831  0.832 (1.66)  0.845  0.872  0.791 (1.37)  0.837  0.846
QB      0.953  0.819 (1.37)  0.865  0.838  0.863 (1.84)  0.857  0.902  0.807 (1.44)  0.848  0.854

(n,p) = (200,3)
PEBBLE  0.891  0.906 (0.86)  0.897  0.905  0.918 (1.21)  0.908  0.915  0.903 (1.01)  0.896  0.906
Normal  0.905  0.904 (0.78)  0.902  0.910  0.910 (1.03)  0.936  0.879  0.902 (0.89)  0.912  0.894
PRRB    0.902  0.900 (0.77)  0.896  0.904  0.899 (1.02)  0.930  0.874  0.893 (0.88)  0.904  0.892
OSB     0.905  0.902 (0.78)  0.900  0.917  0.897 (1.01)  0.935  0.870  0.895 (0.88)  0.910  0.893
QB      0.867  0.890 (0.75)  0.889  0.913  0.868 (0.93)  0.924  0.842  0.871 (0.83)  0.893  0.877

(n,p) = (200,4)
PEBBLE  0.872  0.898 (1.08)  0.890  0.908  0.912 (1.54)  0.922  0.893  0.900 (1.11)  0.900  0.905
Normal  0.919  0.917 (0.89)  0.902  0.917  0.910 (1.18)  0.918  0.893  0.906 (0.92)  0.905  0.902
PRRB    0.899  0.908 (0.88)  0.891  0.915  0.891 (1.15)  0.916  0.876  0.892 (0.91)  0.896  0.890
OSB     0.905  0.911 (0.89)  0.897  0.914  0.901 (1.16)  0.925  0.880  0.900 (0.92)  0.905  0.898
QB      0.926  0.924 (0.93)  0.905  0.923  0.921 (1.23)  0.930  0.892  0.917 (0.97)  0.912  0.907

(n,p) = (200,6)
PEBBLE  0.927  0.915 (1.32)  0.890  0.930  0.921 (1.79)  0.933  0.875  0.913 (1.59)  0.908  0.906
Normal  0.794  0.833 (0.89)  0.855  0.868  0.892 (1.17)  0.915  0.862  0.847 (1.01)  0.863  0.868
PRRB    0.791  0.829 (0.90)  0.860  0.872  0.872 (1.18)  0.911  0.859  0.840 (1.02)  0.860  0.865
OSB     0.904  0.751 (0.92)  0.813  0.842  0.794 (1.18)  0.893  0.780  0.741 (1.03)  0.814  0.818
QB      0.902  0.738 (0.88)  0.804  0.837  0.784 (1.15)  0.893  0.768  0.736 (1.01)  0.814  0.814

(n,p) = (200,8)
PEBBLE  0.841  0.869 (1.75)  0.837  0.948  0.866 (2.28)  0.965  0.776  0.851 (1.94)  0.866  0.877
Normal  0.405  0.679 (0.94)  0.886  0.676  0.734 (1.19)  0.696  0.961  0.688 (1.00)  0.778  0.800
PRRB    0.496  0.679 (0.98)  0.887  0.673  0.731 (1.23)  0.701  0.953  0.691 (1.03)  0.780  0.803
OSB     0.861  0.468 (0.97)  0.810  0.571  0.569 (1.17)  0.634  0.843  0.486 (1.00)  0.680  0.714
QB      0.852  0.470 (0.98)  0.805  0.575  0.551 (1.15)  0.637  0.837  0.480 (0.99)  0.680  0.713

Table 1.
Comparative performance of the proposed method Perturbation Bootstrap in Logistic Regression (PEBBLE) and of the existing methods Normal approximation (Normal), Pearson Residual Resampling Bootstrap (PRRB), One-Step Bootstrap (OSB) and Quadratic Bootstrap (QB). All coverage results are based on 90% confidence intervals (CIs) and are averaged over 1000 repetitions of the experiment, the result of each repetition being based on 1000 Bootstrap iterations. We report the coverage of the lower CI of the norm of β (column 1); the middle, upper and lower CIs of the minimum absolute component of β (columns 2, 3, 4); the middle, upper and lower CIs of the maximum absolute component of β (columns 5, 6, 7); and the middle, upper and lower CIs of all components of β, on average (columns 8, 9, 10). The average widths of the middle CIs corresponding to the minimum, maximum and average components are given in parentheses in columns 2, 5 and 8, respectively.

Variables   β̂   90% CI (mid)   90% CI (upper)   90% CI (lower)
Age -0.010 (-0.151, 0.300) > -0.100 < > -0.398 < > -0.521 < > -0.548 < > <

Table 2.
Real data analysis: the estimated coefficients and the corresponding middle, upper and lower 90% CIs are reported for all the covariates; the type of delivery is the dependent variable, which takes the value 1 or 0 according to whether the delivery was caesarian or not.

results of 80 pregnant women along with several important related clinical covariates. The dataset is available at https://archive.ics.uci.edu/ml/datasets/Caesarian+Section+Classification+Dataset. We regress the type of delivery (caesarian or not) on several related covariates, namely age, delivery number, delivery time, blood pressure and presence of heart problem. Delivery time can take three values: 0 (timely), 1 (premature) and 2 (latecomer). Blood pressure is coded as 0, 1, 2 for low, normal and high, respectively. The covariate presence of heart problem is also binary, with 0 denoting the absence and 1 denoting the presence of a heart problem. We perform a logistic regression, compute the corresponding CIs using PEBBLE, and report the results in Table 2. It is noted that although the middle 90% CIs for all the covariates contain zero, the middle 90% CI for heart problem lies mostly on the positive side of zero, and the upper 90% CI lies entirely on the positive side, which suggests that women with heart problems tend to have a caesarian procedure, coinciding with the findings in Yap et al. (2008) and Blaci et al. (2011).
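The analysis in this section can be reproduced along the following lines in R. The column names, the inclusion of an intercept and the prior reading of the UCI file into a data frame are our assumptions for illustration (the paper does not spell these out); the PEBBLE intervals are then obtained exactly as in the sketch following Section 3.

```r
## Illustrative workflow for the caesarian data (variable names below are ours).
## Suppose the UCI file has been read into a data frame `cs` with columns
## age, delivery_no, delivery_time, blood_pressure, heart_problem, caesarian.
fit <- glm(caesarian ~ age + delivery_no + delivery_time + blood_pressure + heart_problem,
           family = binomial, data = cs)
coef(fit)                        # MLE of the regression coefficients

## Inputs for the PEBBLE CIs: reuse pebble_draw() and the CI construction from the
## earlier sketches with the quantities below.
X <- model.matrix(fit)           # design matrix (including an intercept column)
y <- cs$caesarian
beta_hat <- coef(fit)
p_hat <- fitted(fit)             # estimated success probabilities \hat{p}(x_i)
```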
6. Proof of the Results
Before going to the proofs, we define a few notations. Suppose $\Phi_V$ and $\phi_V$ respectively denote the normal distribution and its density with mean $0$ and covariance matrix $V$. We write $\Phi_V=\Phi$ and $\phi_V=\phi$ when the dispersion matrix $V$ is the identity matrix. $C,C_1,C_2,\dots$ denote generic constants that do not depend on variables like $n$, $x$, and so on. $\nu_1,\nu_2$ denote vectors in $\mathbb{R}^p$, sometimes with some specific structure (as mentioned in the proofs). $(e_1,\dots,e_p)'$ denotes the standard basis of $\mathbb{R}^p$. For a non-negative integral vector $\alpha=(\alpha_1,\alpha_2,\dots,\alpha_l)'$ and a function $f=(f_1,f_2,\dots,f_l):\mathbb{R}^l\to\mathbb{R}^l$, $l\ge 1$, let $|\alpha|=\alpha_1+\dots+\alpha_l$, $\alpha!=\alpha_1!\cdots\alpha_l!$, $f^{\alpha}=(f_1^{\alpha_1})\cdots(f_l^{\alpha_l})$, $D^{\alpha}f_1=D_1^{\alpha_1}\cdots D_l^{\alpha_l}f_1$, where $D_jf_1$ denotes the partial derivative of $f_1$ with respect to its $j$th argument, $1\le j\le l$. We write $D^{\alpha}=D$ if all components of $\alpha$ are equal to $1$. For $t=(t_1,t_2,\dots,t_l)'\in\mathbb{R}^l$ and $\alpha$ as above, define $t^{\alpha}=t_1^{\alpha_1}\cdots t_l^{\alpha_l}$. For any two vectors $\alpha,\beta\in\mathbb{R}^k$, $\alpha\le\beta$ means that each component of $\alpha$ is smaller than the corresponding component of $\beta$. For a set $A$ and real constants $a_1,a_2$, $a_1A+a_2=\{a_1y+a_2: y\in A\}$, $\partial A$ is the boundary of $A$ and $A^{\epsilon}$ denotes the $\epsilon$-neighbourhood of $A$ for any $\epsilon>0$. $\mathbb{N}$ is the set of natural numbers. $C(\cdot),C_1(\cdot),\dots$ denote generic constants which depend only on their arguments. Given two probability measures $P_1$ and $P_2$ defined on the same space $(\Omega,\mathcal{F})$, $P_1*P_2$ denotes the measure on $(\Omega,\mathcal{F})$ given by the convolution of $P_1$ and $P_2$, and $\|P_1-P_2\|=|P_1-P_2|(\Omega)$, $|P_1-P_2|$ being the total variation of $(P_1-P_2)$. For a function $g:\mathbb{R}^k\to\mathbb{R}^m$ with $g=(g_1,\dots,g_m)'$, $\mathrm{Grad}[g(x)]=\big(\big(\tfrac{\partial g_i(x)}{\partial x_j}\big)\big)_{m\times k}$.

Before moving to the proofs of the main theorems, we state some auxiliary lemmas. The proofs of Lemmas 3, 10 and 11 are relegated to the Supplementary Material file to save space. Also, we present the proof of Theorem 2 last, since some steps in the proof of Theorem 3 will be essential in proving Theorem 2.
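As a small concrete instance of the multi-index notation above (the particular numbers are ours, chosen only for illustration):

```latex
% Multi-index notation for l = 2 and alpha = (2, 1)':
% |alpha| = 2 + 1 = 3,  alpha! = 2!*1! = 2,  t^alpha = t_1^2 t_2,
% and D^alpha f_1 is the third-order partial derivative below.
\[
\alpha=(2,1)',\qquad |\alpha|=3,\qquad \alpha!=2,\qquad
t^{\alpha}=t_1^{2}t_2,\qquad
D^{\alpha}f_1=D_1^{2}D_2^{1}f_1=\frac{\partial^{3}f_1}{\partial x_1^{2}\,\partial x_2}.
\]
```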
Lemma 1.
Suppose $Y_1,\dots,Y_n$ are zero mean independent random variables with $E(|Y_i|^{t})<\infty$ for $i=1,\dots,n$, and let $S_n=\sum_{i=1}^{n}Y_i$. Let $\sum_{i=1}^{n}E(|Y_i|^{t})=\sigma_t$, $\sum_{i=1}^{n}E(Y_i^{2})=\sigma^{2}$, $c_t^{(1)}=(1+2/t)^{t}$ and $c_t^{(2)}=2(2+t)^{-2}e^{-t}$. Then, for any $t\ge 2$ and $x>0$,
\[ P\big[|S_n|>x\big]\le c_t^{(1)}\sigma_t x^{-t}+\exp\big(-c_t^{(2)}x^{2}/\sigma^{2}\big). \]
Proof of Lemma 1. This inequality was proved in Fuk and Nagaev (1971).
Lemma 2.
For any $t>0$, $\dfrac{1-N(t)}{n(t)}\le\dfrac{1}{t}$, where $N(\cdot)$ and $n(\cdot)$ respectively denote the cdf and pdf of a real-valued standard normal random variable.
Proof of Lemma 2: This inequality is proved in Birnbaum (1942).
Lemma 3.
Suppose $Y_1,\dots,Y_n$ are mean zero independent random vectors in $\mathbb{R}^k$ with $E_n=n^{-1}\sum_{i=1}^{n}Var(Y_i)$ converging to some positive definite matrix $V$. Let $s\ge 3$ be an integer and suppose $\bar\rho_{s+\delta}=O(1)$ for some $\delta>0$. Additionally, assume that $Z$ is a $N(0,I_k)$ random vector which is independent of $Y_1,\dots,Y_n$, and that the sequence $\{c_n\}_{n\ge1}$ is such that $c_n=O(n^{-d})$ and $n^{-(s-2)/\tilde k}\log n=o(c_n)$, where $\tilde k=\max\{k+1,s+1\}$ and $d>0$ is a constant. Then for any Borel set $B$ of $\mathbb{R}^k$,
\[ \Big|P\big(\sqrt{n}\,\bar Y_n+c_nZ\in B\big)-\int_B\psi_{n,s}(x)\,dx\Big|=o\big(n^{-(s-2)/2}\big), \qquad (6.1) \]
where $\psi_{n,s}(\cdot)$ is defined above.
Proof of Lemma 3. See Section 1 of the Supplementary Material file.
Lemma 4.
Suppose all the assumptions of Lemma 3 are true. Define $d_n=n^{-1/2}c_n$ and $A_\delta=\{x\in\mathbb{R}^k:\|x\|<\delta\}$ for some $\delta>0$. Let $H:\mathbb{R}^k\to\mathbb{R}^m$ ($k\ge m$) have continuous partial derivatives of all orders on $A_\delta$, and suppose $\mathrm{Grad}[H(0)]$ is of full row rank. Then for any Borel set $B$ of $\mathbb{R}^m$ we have
\[ \Big|P\Big(\sqrt{n}\big(H(\bar Y_n+d_nZ)-H(0)\big)\in B\Big)-\int_B\check\psi_{n,s}(x)\,dx\Big|=o\big(n^{-(s-2)/2}\big), \qquad (6.2) \]
where $\check\psi_{n,s}(x)=\Big[\sum_{r=0}^{s-2}n^{-r/2}a_{1,r}(Q_n,x)\,\phi_{\check M_n}(x)\Big]\Big[\sum_{j=0}^{m_1-1}c_n^{j}a_{2,j}(x)\Big]$ with $m_1=\inf\{j: c_n^{j}=o(n^{-(s-2)/2})\}$ and $Q_n$ being the distribution of $\sqrt{n}\,\bar Y_n$. Here $a_{1,r}(Q_n,\cdot)$, $r\in\{0,\dots,s-2\}$, are polynomials whose coefficients are continuous functions of the first $s$ average cumulants of $\{Y_1,\dots,Y_n\}$, and $a_{2,j}(\cdot)$, $j\in\{0,\dots,m_1-1\}$, are polynomials whose coefficients are continuous functions of partial derivatives of $H$ of order $s-1$ or less. $\check M_n=\bar BE_n\bar B'$ with $\bar B=\mathrm{Grad}[H(0)]$ and $E_n=n^{-1}\sum_{i=1}^{n}Var(Y_i)$.
Proof of Lemma 4. This follows exactly along the lines of the proof of Lemma 3.2 in Lahiri (1989).
Lemma 5.
Let $Y_1,\dots,Y_n$ be mean zero independent random vectors in $\mathbb{R}^k$ with $n^{-1}\sum_{i=1}^{n}E\|Y_i\|^{3}=O(1)$. Suppose $T_n=E_n^{-1/2}$, where $E_n=n^{-1}\sum_{i=1}^{n}Var(Y_i)$ is the average positive definite covariance matrix, and $E_n$ converges to some positive definite matrix as $n\to\infty$. Then for any Borel subset $B$ of $\mathbb{R}^k$ we have
\[ \Big|P\Big(n^{-1/2}T_n\sum_{i=1}^{n}Y_i\in B\Big)-\Phi(B)\Big|\le C(k)\,n^{-1/2}\bar\rho_3+2\,\Phi\Big(\big(\partial B\big)^{C(k)\bar\rho_3n^{-1/2}}\Big), \]
where $\bar\rho_3=n^{-1}\sum_{i=1}^{n}E\|T_nY_i\|^{3}$.
Proof of Lemma 5. This is a direct consequence of part (a) of Corollary 24.3 in BR(86).
Lemma 6.
Suppose $A,B$ are matrices such that $(A-aI)$ and $(B-aI)$ are positive semi-definite matrices of the same order, for some $a>0$. For $r>0$, $A^{r}$ and $B^{r}$ are defined in the usual way. Then for any $0<r<1$ we have $\|A^{r}-B^{r}\|\le r\,a^{r-1}\|A-B\|$.
Proof of Lemma 6. A more general version of this lemma is stated as Corollary X.46 in Bhatia (1996).
Lemma 7.
Suppose all the assumptions of Lemma 4 are true and $\check M_n=I_m$, the $m\times m$ identity matrix. Define $\hat H_n=\big[\sqrt{n}\big(H(\bar Y_n+d_nZ)-H(0)\big)\big]+R_n$, where $P\big(\|R_n\|=o(n^{-(s-2)/2})\big)=1-o\big(n^{-(s-2)/2}\big)$ and $s$ is as defined in Lemma 3. Then we have
\[ \sup_{B\in\mathcal{A}_m}\Big|P\big(\hat H_n\in B\big)-\int_B\check\psi_{n,s}(x)\,dx\Big|=o\big(n^{-(s-2)/2}\big), \qquad (6.3) \]
where the class of sets $\mathcal{A}_m$ is as defined in Section 3.
Proof of Lemma 7. Recall the definition of $(\partial B)^{\epsilon}$ given in Section 3. For $B\subseteq\mathbb{R}^m$ and $\delta>0$, define $B_{n,s,\delta}=(\partial B)^{\delta n^{-(s-2)/2}}$. Hence, using Lemma 4, for any $B\in\mathcal{A}_m$ we have
\begin{align*}
\Big|P\big(\hat H_n\in B\big)-\int_B\check\psi_{n,s}(x)\,dx\Big|
&\le\Big|P\big(\hat H_n\in B\big)-P\Big(\sqrt{n}\big(H(\bar Y_n+d_nZ)-H(0)\big)\in B\Big)\Big|+o\big(n^{-(s-2)/2}\big)\\
&\le P\Big(\|R_n\|\neq o\big(n^{-(s-2)/2}\big)\Big)+2P\Big(\sqrt{n}\big(H(\bar Y_n+d_nZ)-H(0)\big)\in B_{n,s,\delta}\Big)+o\big(n^{-(s-2)/2}\big)\\
&=2P\Big(\sqrt{n}\big(H(\bar Y_n+d_nZ)-H(0)\big)\in B_{n,s,\delta}\Big)+o\big(n^{-(s-2)/2}\big)\\
&=2\int_{B_{n,s,\delta}}\check\psi_{n,s}(x)\,dx+o\big(n^{-(s-2)/2}\big) \qquad (6.4)
\end{align*}
for any $\delta>0$. Now the calculations on page 213 of BR(86) and the arguments on page 58 of Lahiri (1989) imply that for any $B\in\mathcal{A}_m$,
\[ \int_{B_{n,s,\delta}}\check\psi_{n,s}(x)\,dx\le C(s)\sup_{B\in\mathcal{A}_m}\Phi\big(B_{n,s,\delta}\big)+o\big(n^{-(s-2)/2}\big)=o\big(n^{-(s-2)/2}\big), \]
since $\delta>0$ can be chosen arbitrarily small.

Lemma 8.
Let $A$ and $B$ be positive definite matrices of the same order. Then for a given matrix $C$, the solution of the equation $AX+XB=C$ can be expressed as
\[ X=\int_0^{\infty}e^{-tA}\,C\,e^{-tB}\,dt, \]
where $e^{-tA}$ and $e^{-tB}$ are defined in the usual way.
Proof of Lemma 8. This lemma is an easy consequence of Theorem VII.2.3 in Bhatia (1996).
Lemma 9.
Let $W_1,\dots,W_n$ be $n$ independent mean zero random variables with average variance $s_n^2=n^{-1}\sum_{i=1}^{n}EW_i^2$ and $P\big(\max\{|W_i|:i\in\{1,\dots,n\}\}\le C\big)=1$ for some positive constant $C$ and an integer $s\ge 3$. $\bar\chi_{\nu,n}$ is the average $\nu$th cumulant. Recall the polynomial $\tilde P_r$, for any non-negative integer $r$, as defined in the beginning of this section. Then there exist two constants $0<C_1(s)<1$ and $C_2(s)>0$ such that whenever $|t|\le C_1(s)\sqrt{n}\,\min\big\{C^{-1}s_n,\;C^{-s/(s-2)}s_n^{s/(s-2)}\big\}$, we have
\[ \Big|\prod_{j=1}^{n}E\big(e^{in^{-1/2}tW_j}\big)-\sum_{r=0}^{s-2}n^{-r/2}\tilde P_r\big(it:\{\bar\chi_{\nu,n}\}\big)e^{-(s_nt)^2/2}\Big|\le C_2(s)\,C^{s}s_n^{-s}\,n^{-(s-2)/2}\big[(s_nt)^{s}+(s_nt)^{3(s-2)}\big]e^{-(s_nt)^2/4}. \]
Proof of Lemma 9. In view of Theorem 9.9 of BR(86), it is enough to show that $\big|E\big(e^{its_n^{-1}n^{-1/2}W_j}\big)-1\big|\le 1/2$ for any $j\in\{1,\dots,n\}$ whenever $|t|\le C_1(s)\sqrt{n}\,\min\big\{C^{-1}s_n,\;C^{-s/(s-2)}s_n^{s/(s-2)}\big\}$. This is indeed the case due to the fact that $\big|E\big(e^{its_n^{-1}n^{-1/2}W_j}\big)-1\big|\le\dfrac{t^2EW_j^2}{2ns_n^2}$.

Lemma 10.
Assume the setup of Theorem 2 and let $X_i=y_ix_i$, $i\in\{1,\dots,n\}$. Define $\sigma_n^2=n^{-1}\sum_{i=1}^{n}Var(X_i)$ and let $\bar\chi_{\nu,n}$ be the $\nu$th average cumulant of $\{(X_1-E(X_1)),\dots,(X_n-E(X_n))\}$. $P_1\big(-\Phi_{\sigma_n^2}:\{\bar\chi_{\nu,n}\}\big)$ is the finite signed measure on $\mathbb{R}$ whose density is $\tilde P_1\big(-D:\{\bar\chi_{\nu,n}\}\big)\phi_{\sigma_n^2}(x)$. Let $S_0(x)=1$ and $S_1(x)=x-1/2$. Suppose $\sigma_n^2$ is bounded away from both $0$ and $\infty$, and that assumptions (C.1)-(C.3) of Theorem 2 hold. Then we have
\[ \sup_{x\in\mathbb{R}}\Big|P\Big(n^{-1/2}\sum_{i=1}^{n}\big(X_i-E(X_i)\big)\le x\Big)-\sum_{r=0}^{1}n^{-r/2}(-1)^{r}S_r\big(n\mu_n+n^{1/2}x\big)\frac{d^{r}}{dx^{r}}\Phi_{\sigma_n^2}(x)-n^{-1/2}P_1\big(-\Phi_{\sigma_n^2}:\{\bar\chi_{\nu,n}\}\big)(x)\Big|=o\big(n^{-1/2}\big), \qquad (6.5) \]
where $P_1\big(-\Phi_{\sigma_n^2}:\{\bar\chi_{\nu,n}\}\big)(x)$ is the $P_1\big(-\Phi_{\sigma_n^2}:\{\bar\chi_{\nu,n}\}\big)$-measure of the set $(-\infty,x]$.
Proof of Lemma 10. See Section 1 in the Supplementary Material file.
Lemma 11.
Let $\breve W_1,\dots,\breve W_n$ be iid mean zero non-degenerate random vectors in $\mathbb{R}^{l}$ for some natural number $l$, with finite fourth absolute moment and $\limsup_{\|t\|\to\infty}\big|Ee^{it'\breve W_1}\big|<1$ (i.e., Cramér's condition holds). Suppose $\breve W_i=(\breve W_{i1}',\dots,\breve W_{im}')'$, where $\breve W_{ij}$ is a random vector in $\mathbb{R}^{l_j}$ and $\sum_{j=1}^{m}l_j=l$, $m$ being a fixed natural number. Consider the sequence of random vectors $\tilde W_1,\dots,\tilde W_n$ where $\tilde W_i=(c_{i1}\breve W_{i1}',\dots,c_{im}\breve W_{im}')'$. Here $\{c_{ij}: i\in\{1,\dots,n\},\,j\in\{1,\dots,m\}\}$ is a collection of real numbers such that for any $j\in\{1,\dots,m\}$, $\big\{n^{-1}\sum_{i=1}^{n}|c_{ij}|^{4}\big\}=O(1)$ and $\liminf_{n\to\infty}n^{-1}\sum_{i=1}^{n}c_{ij}^{2}>0$. Also assume that $\tilde V_n=n^{-1}\sum_{i=1}^{n}Var(\tilde W_i)$ converges to some positive definite matrix, and let $\bar\chi_{\nu,n}$ denote the average $\nu$th cumulant of $\tilde W_1,\dots,\tilde W_n$. Then we have
\[ \sup_{B\in\mathcal{A}_l}\Big|P\Big(n^{-1/2}\sum_{i=1}^{n}\tilde W_i\in B\Big)-\int_B\Big[\sum_{r=0}^{1}n^{-r/2}\tilde P_r\big(-D:\{\bar\chi_{\nu,n}\}\big)\Big]\phi_{\tilde V_n}(t)\,dt\Big|=o\big(n^{-1/2}\big), \qquad (6.6) \]
where the collection of sets $\mathcal{A}_l$ is as defined in Section 3.
Proof of Lemma 11. See Section 1 in the Supplementary Material file.
Proof of Theorem 1. Recall that the studentized pivot is˜ H n = √ n ˆ L / n (cid:0) ˆ β n − β (cid:1) , where ˆ L n = n − P ni =1 x i x ′ i e x ′ i ˆ β n (cid:0) e x ′ i ˆ β n (cid:1) − . ˆ β n is the solution of (1.2). By Taylor’s theorem,from (1.2) we have L n (cid:0) ˆ β n − β (cid:1) = n − n X i =1 ( y i − p ( β | x i )) x i − (2 n ) − n X i =1 x i e z i (1 − e z i )(1 + e z i ) − (cid:2) x ′ i ( ˆ β n − β ) (cid:3) , (6.7)6 Das D. and Das P. where | z i − x ′ i β | ≤ | x ′ i ( ˆ β n − β ) | for all i ∈ { , . . . , n } . Now due to the assumption n − P ni =1 k x i k = O (1), by Lemma 1 (with t = 3) we have P (cid:16)(cid:12)(cid:12) n − n X i =1 ( y − p ( β | x i )) x ij (cid:12)(cid:12) ≤ C ( p ) n − / (log n ) / (cid:17) = o (cid:0) n − / (cid:1) , (6.8)for any j ∈ { , . . . , p } . Again by assumption L n converges to some positive definite matrix L .Moreover, (cid:13)(cid:13) (2 n ) − n X i =1 x i e z i (1 − e z i )(1 + e z i ) − (cid:2) x ′ i ( ˆ β − β ) (cid:3) (cid:13)(cid:13) ≤ (cid:16) n − n X i =1 k x i k (cid:17) k ˆ β n − β k . Hence (6.7) can be rewritten as ( ˆ β n − β ) = f n ( ˆ β n − β ) , where f n is a continuous function from R p to R p satisfying P (cid:16) k f n (cid:0) ˆ β n − β (cid:1) k ≤ C n − / (log n ) / (cid:17) =1 − o (cid:0) n − / (cid:1) whenever k ( ˆ β n − β ) k ≤ C n − / ( logn ) / . Therefore, part (a) of Theorem 1 followsby Brouwer’s fixed point theorem. Now we are going to prove part (b). Note that from (6.7) andthe fact that L n converges to some positive definite matrix L , we have for large enough n ,˜ H n = ˆ L / n (cid:2) L − n Λ n + R n (cid:3) . (6.9)Here Λ n = n − / P ni =1 ( y − p ( β | x i )) x i and R n = − L − n √ n P ni =1 x i e z i (1 − e z i )(1 + e z i ) − (cid:2) x ′ i ( ˆ β n − β ) (cid:3) with | z i − x ′ i β | ≤ | x ′ i ( ˆ β n − β ) | for all i ∈ { , . . . , n } . L n and ˆ L n are as defined earlier. Nowapplying part (a) we have P (cid:16) k R n k = O (cid:16) n − / ( logn ) (cid:17)(cid:17) = 1 − o (cid:16) n − / (cid:17) . Again by Taylor’s theoremwe have ˆ L n − L n = n − n X i =1 x i x ′ i e x ′ i β (cid:0) − e x ′ i β (cid:1)(cid:0) e x ′ i β (cid:1) − (cid:2) x ′ i ( ˆ β n − β ) (cid:3) + L n , (6.10)where by part (a), we have P (cid:16) k L n k = O (cid:16) n − ( logn ) (cid:17)(cid:17) = 1 − o (cid:16) n − / (cid:17) . Hence using Lemma6, part (a) and Taylor’s theorem, one can show that P (cid:16) k ˆ L / n − L / n k = O (cid:16) n − / ( logn ) / (cid:17)(cid:17) =1 − o (cid:16) n − / (cid:17) . Therefore (6.8) and (6.10) will imply that˜ H n = L − / n Λ n + R n , where P (cid:16) k R n k = O (cid:16) n − / ( logn ) (cid:17)(cid:17) = 1 − o (cid:16) n − / (cid:17) . Hence for any set B ∈ A p , there exists aconstant C ( p ) > (cid:12)(cid:12)(cid:12) P (cid:16) ˜ H n ∈ B (cid:17) − Φ( B ) (cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12) P (cid:16) ˜ H n ∈ B (cid:17) − P (cid:16) L − / n Λ n ∈ B (cid:17)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) P (cid:16) L − / n Λ n ∈ B (cid:17) − Φ( B ) (cid:12)(cid:12)(cid:12) ≤ P (cid:16) k R n k > C ( p ) n − / ( logn ) (cid:17) + 2 P (cid:16) L − / n Λ n ∈ ( ∂B ) C ( p ) n − / (log n ) (cid:17) + (cid:12)(cid:12)(cid:12) P (cid:16) L − / n Λ n ∈ B (cid:17) − Φ( B ) (cid:12)(cid:12)(cid:12) = O (cid:16) n − / (log n ) (cid:17) . OC of Bootstrap in Logistic Regression k R n k . Therefore part (b) is proved.Proof of Theorem 3. 
By applying Taylor’s theorem, it follows from (2.1) thatˆ L n (cid:0) ˆ β ∗ n − ˆ β n (cid:1) = n − n X i =1 ( y − ˆ p ( x i )) x i − (2 n ) − n X i =1 x i e z ∗ i (1 − e z ∗ i )(1 + e z ∗ i ) − (cid:2) x ′ i ( ˆ β ∗ n − ˆ β n ) (cid:3) , (6.11)where | z ∗ i − x ′ i β | ≤ | x ′ i ( ˆ β ∗ n − ˆ β n ) | for all i ∈ { , . . . , n } . Now rest of part (a) of Theorem 3 followsexactly in the same line as the proof of part (a) of Theorem 1. To establish part (b), assume that W i = (cid:16) y i x ′ i , (cid:2) y i − E y i (cid:3) z ′ i (cid:17) ′ and W ∗ i = (cid:16) ˆ Y i (cid:2) ( G ∗ i − µ G ∗ ) µ − G ∗ (cid:3) x ′ i , ˆ Y i (cid:2) µ − G ∗ ( G ∗ i − µ G ∗ ) − (cid:3) z ′ i (cid:17) ′ . Here y i = ( y i − p ( β | x i )) and ˆ Y i = ( y i − ˆ p ( x i )). First we are going to show thatˇ H n = √ n (cid:16) H (cid:0) ¯ W n + n − / b n Z (cid:1)(cid:17) + R n and ˇ H ∗ n = √ n (cid:16) ˆ H (cid:0) ¯ W ∗ n + n − / b n Z (cid:1)(cid:17) + R ∗ n , for some functions H, ˆ H : R k → R p where k = p + q with q = p ( p + 1)2 . H ( · ) , ˆ H ( · ) have continuouspartial derivatives of all orders, H ( ) = ˆ H ( ) = and P (cid:16) k R n k = o (cid:0) n − / (cid:1)(cid:17) = 1 − o (cid:0) n − / (cid:1) & P ∗ (cid:16) k R ∗ n k = o (cid:0) n − / (cid:1)(cid:17) = 1 − o p (cid:0) n − / (cid:1) . Next step is to apply Lemma 3, Lemma 4 and Lemma 7 toclaim that suitable Edgeworth expansions exist for both ˇ H n and ˇ H ∗ n . The last step is to concludeSOC of Bootstrap by comparing the Edgeworth expansions. Now (6.7) and part (a) of Theorem 1imply that √ n (cid:0) ˆ β n − β (cid:1) = L − n h Λ n − ξ n / i + R n , (6.12)where P (cid:16) k R n k ≤ C ( p ) n − (log n ) / (cid:1)(cid:17) = 1 − o (cid:0) n − / (cid:1) . Here Λ n = n − / P ni =1 y i x i and ξ n = n − / P ni =1 x i e x ′ i β (cid:0) − e x ′ i β (cid:1)(cid:0) e x ′ i β (cid:1) − h x ′ i (cid:0) L − n Λ n (cid:1)i . Clearly, P (cid:16) k ξ n k ≤ C ( p ) n − / (log n ) (cid:1)(cid:17) =1 − o (cid:0) n − / (cid:1) . Therefore, by Taylor’s theorem we have √ n (cid:0) ˆ L n − L n (cid:1)(cid:0) ˆ β n − β (cid:1) = ξ n + R n , (6.13)where P (cid:16) k R n k ≤ C ( p ) n − (log n ) (cid:1)(cid:17) = 1 − o (cid:0) n − / (cid:1) . Again noting (6.13), by equation (5) atpage 52 of Turnbull (1929) we haveˆ M − / n − L − / n = − L − / n Z n L − / n + Z n , (6.14)where (cid:0) ˆ M n − L n (cid:1) = L / n Z n + Z n L / n . Also easy to show that P (cid:16) k ˆ M n − M n k ≤ C ( p ) n − (log n ) (cid:17) = 1 − o (cid:0) n − / (cid:1) , where M n = n − P ni =1 y i x i x ′ i . Hence using Lemma 6 we have P (cid:16) k Z n k ≤ C ( p ) n − (log n ) (cid:1)(cid:17) =1 − o (cid:0) n − / (cid:1) . Therefore from (6.12)-(6.14), Lemma 8 and the fact that b n = O ( n − d ) (for some8 Das D. and Das P. d >
0) will imply thatˇ H n = L − / n h Λ n + b n Z + ξ n / i − L − / n h Z ∞ e − t L / n (cid:0) M n − L n (cid:1) e − t L / n dt i L − / n Λ n + R n , (6.15)where P (cid:16) k R n k ≤ C ( p ) n − / (log n ) − (cid:17) = 1 − o (cid:0) n − / (cid:1) . Now writing W i = ( W ′ i , W ′ i ) ′ and¯ W n = n − P ni =1 W i = ( ¯ W ′ n, , ¯ W ′ n ) ′ with W i has first p components of W i for all i ∈ { , . . . , n } , wehave Λ n + b n Z = √ n (cid:0) ¯ W n + n − / b n Z (cid:1) ξ n = n − / n X i =1 x i e x ′ i β (cid:0) − e x ′ i β (cid:1)(cid:0) e x ′ i β (cid:1) − h ¯ W ′ n L − n x i x ′ i L − n ¯ W n i = √ n (cid:16) ¯ W ′ n ˜ M ¯ W n , . . . , ¯ W ′ n ˜ M p ¯ W n (cid:17) ′ , where ˜ M k = n − P ni =1 x ik e x ′ i β (cid:0) − e x ′ i β (cid:1)(cid:0) e x ′ i β (cid:1) − (cid:16) L − n x i x ′ i L − n (cid:17) for k ∈ { , . . . , p } . Hencewriting ˜ W n = ¯ W n + n − / b n Z we have L − / n h Λ n + b n Z + ξ n / i = √ n (cid:20) L − / n ˜ W n + (cid:16) ˜ W ′ n ˘ M ˜ W n , . . . , ˜ W ′ n ˘ M p ˜ W n (cid:17) ′ (cid:21) , (6.16)since b n = O ( n − d ) and k ˜ M k k = O (1) for any k ∈ { , . . . , p } . Here ˘ M k = P pj =1 L − / kjn ˜ M k , k ∈{ , . . . , p } , with L − / kjn being the ( k, j )th element of L − / n . Again the j th row of (cid:0) M n − L n (cid:1) is¯ W ′ n E jn where E jn is a matrix of order q × p with k E jn k ≤ q , j ∈ { , . . . , p } . Therefore from (6.7)and (6.10) we have L − / n h Z ∞ e − t L / n (cid:0) M n − L n (cid:1) e − t L / n dt i L − / n Λ n = √ n (cid:16) ˜ W ′ n ˇ M ˜ W n , . . . , ˜ W ′ n ˇ M p ˜ W n (cid:17) ′ + R n , (6.17)where ˜ W n = ¯ W n + n − / b n Z with Z ∼ N q (cid:0) , I q (cid:1) , independent of Z & { y , . . . , y n } . ¯ M k = R ∞ h P pj =1 m kjn ( t ) ˇ M j ( t ) (cid:3) dt where m kjn ( t ) is the ( k, j )th element of the matrix L − / n e t L / n andˇ M j ( t ) = E jn e t L / n L − / n , k, j ∈ { , . . . , p } . Moreover, P (cid:16) k R n k ≤ C n − / (log n ) − (cid:1)(cid:17) = 1 − o (cid:0) n − / (cid:1) . Now define the ( p + q ) × ( p + q ) matrices { M † , . . . , M † p } where M † k = " ˘ M k ¯ M k . Thereforefrom (6.15)-(6.17) we haveˇ H n = √ n (cid:20)(cid:16) L − / n (cid:17) ˜ W n + (cid:16) ˜ W ′ n M † ˜ W n , . . . , ˜ W ′ n M † p ˜ W n (cid:17) ′ (cid:21) + R n = √ nH (cid:0) ˜ W n (cid:1) + R n , (6.18)where the function H ( · ) has continuous partial derivatives of all orders, ˜ W n = (cid:0) ˜ W ′ n , ˜ W ′ n (cid:1) ′ and R n = R n + R n . OC of Bootstrap in Logistic Regression W ∗ n = n − P ni =1 W ∗ i = n − P ni =1 ˆ Y i µ − G ∗ ( G ∗ i − µ G ∗ ) x i and ¯ W ∗ n = n − P ni =1 W ∗ i = n − P ni =1 ˆ Y i (cid:2) µ − G ∗ ( G ∗ i − µ G ∗ ) − (cid:3) z i , it can be shown thatˇ H ∗ n = √ n (cid:20)(cid:16) ˆ M − / n (cid:17) ˜ W ∗ n + (cid:16) ˜ W ∗′ n M ∗† ˜ W ∗ n , . . . , ˜ W ∗′ n M ∗† p ˜ W ∗ n (cid:17) ′ (cid:21) + R ∗ n = √ n ˆ H (cid:0) ˜ W ∗ n (cid:1) + R ∗ n , (6.19)where ˜ W ∗ n = (cid:0) ˜ W ∗′ n , ˜ W ∗′ n (cid:1) ′ with ˜ W ∗ n = ¯ W ∗ n + n − / b n Z ∗ and ˜ W ∗ n = ¯ W ∗ n + n − / b n Z ∗ , Z ∗ being a N q (cid:0) , I q (cid:1) distributed random vector independent of { G ∗ , . . . , G ∗ n } and Z ∗ . M ∗† j = " ˘ M ∗ k ¯ M ∗ k where˘ M ∗ j = P pj =1 ˆ M − / kjn ˜ M ∗ j with ˆ M − / kjn being the ( k, j )th element of ˆ M − / n , ˜ M ∗ j being same as ˜ M j af-ter replacing β by ˆ β n . ¯ M ∗ j = R ∞ (cid:2) P pj =1 m ∗ kjn ˇ M j ( t ) (cid:3) dt where m ∗ kjn ( t ) is the ( k, j )th element of thematrix ˆ M − / n e − t ˆ M / n and ˇ M ∗ j ( t ) = E jn e − t ˆ M / n ˆ M − / n . 
Also P ∗ (cid:0) k R ∗ n k ≤ C n − / (log n ) − (cid:1) =1 − o p (cid:0) n − / (cid:1) . Now by applying Lemma 3, Lemma 4 and Lemma 7 with s = 3, Edgeworth expan-sions of the densities of ˇ H n and ˇ H ∗ n can be found uniformly over the class A p upto an error o (cid:0) n − / (cid:1) and o p (cid:0) n − / (cid:1) respectively. Call those Edgeworth expansions ˜ ψ n, ( · ) and ˜ ψ ∗ n, ( · ) respectively. Nowif ˜ ψ n, ( · ) is compared with ˇ ψ n, ( · ) of Lemma 4, then ˇ M n = I p . Similarly for ˜ ψ ∗ n, ( · ) also ˇ M n = I p .Therefore, ˜ ψ n, ( · ) and ˜ ψ ∗ n, ( · ) have the forms˜ ψ n, ( x ) = h n − / q ( β , µ W , x ) + m − X j =1 b jn q j ( β , L n , x ) i φ ( x )˜ ψ ∗ n, ( x ) = h n − / q ( ˆ β n , ˆ µ W , x ) + m − X j =1 b jn q j ( ˆ β n , ˆ M n , x ) i φ ( x ) , where m = inf { j : b jn = o ( n − / ) } , µ W is the vector of { n − P ni = E ( y i − p ( β | x i )) x l ij x l ij ′ : j, j ′ ∈ { , . . . , p } , l , l ∈ { , , } , l + l = 2 } and { n − P ni = E ( y i − p ( β | x i )) x l ij x l ij ′ x l ij ′′ : j, j ′ , j ′′ ∈{ , . . . , p } , l , l , l ∈ { , , , } , l + l + l = 3 } . ˆ µ W is the vector of { n − P ni = ( y i − ˆ p ( x i )) x l ij x l ij ′ : j, j ′ ∈ { , . . . , p } , l , l ∈ { , , } , l + l = 2 } and { n − P ni = ( y i − ˆ p ( x i )) x l ij x l ij ′ x l ij ′′ : j, j ′ , j ′′ ∈{ , . . . , p } , l , l ∈ { , , , } , l + l + l = 3 } . q ( a , b , c ) is a polynomial in c whose coefficients arecontinuous functions of ( a , b ) ′ . q j ( a , b , c ) are polynomials in c whose coefficients are continuousfunctions of a and b . Now Theorem 3 follows by comparing ˜ ψ n, ( · ) and ˜ ψ ∗ n, ( · ) and due to part (a)of Theorem 1.Proof of Theorem 2. Recall that here p = 1 and hence q = 1. Define, B n = √ nH ( E n × R ) with E n = ( −∞ , z n ] and z n = (cid:16) n − µ n (cid:17) . Here µ n = n − P ni =1 x i p ( β | x i ). Note that B n is an interval, asargued in section 3 just after the description of Theorem 2. The function H ( · ) is defined in (6.18).We are going to show that there exists a positive constant M such thatlim n →∞ P (cid:16) √ n (cid:12)(cid:12)(cid:12) P ∗ (cid:0) H ∗ n ∈ B n (cid:1) − P (cid:0) H n ∈ B n (cid:1)(cid:12)(cid:12)(cid:12) ≥ M (cid:17) = 1 . Das D. and Das P.
Define the set Q n = n(cid:12)(cid:12) ˆ β n − β (cid:12)(cid:12) = o (cid:0) n − / (log n ) (cid:1)o ∩ n(cid:12)(cid:12) n − P ni =1 (cid:2) ( y i − p ( β | x i )) − E ( y i − p ( β | x i )) (cid:3) x i (cid:12)(cid:12) = o (cid:0) n − / ( logn ) (cid:1)o ∩ n(cid:12)(cid:12) n − P ni =1 (cid:2) ( y i − p ( β | x i )) − ( y i − p ( β | x i )) (cid:3) x i (cid:12)(cid:12) = o (1) o . Now due to a strongerversion of (6.8), it is easy to see that P (cid:16)(cid:12)(cid:12) ˆ β n − β (cid:12)(cid:12) = o (cid:0) n − / (log n ) (cid:17) = 1 for all but finitely many n ,upon application of Borel-Cantelli lemma and noting that max {| x i | : i ∈ { , . . . , n }} = O (1).Again by applying Lemma 1, it is easy to show that P (cid:16)n(cid:12)(cid:12) n − P ni =1 (cid:2) ( y i − p ( β | x i )) − ( y i − p ( β | x i )) (cid:3)(cid:12)(cid:12) = o (cid:0) n − / ( logn ) (cid:1)o ∩ n(cid:12)(cid:12) n − P ni =1 (cid:2) ( y i − p ( β | x i )) − ( y i − p ( β | x i )) (cid:3)(cid:12)(cid:12) = o (1) o(cid:17) = 1for large enough n . Hence P (cid:0) Q n (cid:1) = 1 for large enough n . Similarly define the Bootstrap version of Q n as Q ∗ n = n(cid:12)(cid:12) ˆ β ∗ n − ˆ β n (cid:12)(cid:12) = o (cid:0) n − / (log n ) (cid:1)o ∩ n(cid:12)(cid:12) n − P ni =1 (cid:2) ( y i − ˆ p ( x i )) (cid:0) µ − G ∗ ( G ∗ i − µ G ∗ ) − (cid:1)(cid:3) x i (cid:12)(cid:12) = o (cid:0) n − / ( logn ) (cid:1)o ∩ n(cid:12)(cid:12) n − P ni =1 (cid:2) ( y i − ˆ p ( x i )) (cid:0) µ − G ∗ ( G ∗ i − µ G ∗ ) − (cid:1)(cid:3) x i (cid:12)(cid:12) = o (cid:0) (cid:1)o . Through the sameline, it is easy to establish that P (cid:16) P ∗ (cid:0) Q ∗ n (cid:1) = 1 (cid:17) = 1 for large enough n . Hence enough to showlim n →∞ P (cid:16)n √ n (cid:12)(cid:12)(cid:12) P ∗ (cid:0)(cid:8) H ∗ n ∈ B n } ∩ Q ∗ n (cid:1) − P (cid:0)(cid:8) H n ∈ B n (cid:9) ∩ Q n (cid:1)(cid:12)(cid:12)(cid:12) ≥ M o ∩ Q n (cid:17) = 1 . (6.20)Recall the definitions of ¯ W n and ¯ W ∗ n from the proof of Theorem 3. Similar to (6.18) and (6.19), itis easy to observe that H n = √ nH ( ¯ W n ) + R n and H ∗ n = √ n ˆ H ( ¯ W ∗ n ) + R ∗ n , (6.21)where (cid:8) | R n | = O ( n − / (log n ) − ) (cid:9) ⊆ Q n and (cid:8) | R ∗ n | = O ( n − / (log n ) − ) (cid:9) ⊆ Q ∗ n . To prove (6.20),first we are going to show for large enough n , (cid:26)n √ n (cid:12)(cid:12)(cid:12) P ∗ (cid:0)(cid:8) H ∗ n ∈ B n } ∩ Q ∗ n (cid:1) − P (cid:0)(cid:8) H n ∈ B n (cid:9) ∩ Q n (cid:1)(cid:12)(cid:12)(cid:12) ≥ M o ∩ Q n (cid:27) ⊇ (cid:26)n √ n (cid:12)(cid:12)(cid:12) P ∗ (cid:0)(cid:8) √ n ˆ H ( ¯ W ∗ n ) ∈ B n } ∩ Q ∗ n (cid:1) − P (cid:0)(cid:8) √ nH ( ¯ W n ) ∈ B n (cid:9) ∩ Q n (cid:1)(cid:12)(cid:12)(cid:12) ≥ M o ∩ Q n (cid:27) . 
Now due to (6.21), we have
$$\big|P\big(H_n\in B_n\big) - P\big(\sqrt{n}H(\bar{W}_n)\in B_n\big)\big| \le P\Big(\sqrt{n}H(\bar{W}_n)\in\big(\partial B_n\big)^{(n\log n)^{-1/2}}\Big) + P\Big(|R_n|\neq o\big(n^{-1/2}(\log n)^{-1}\big)\Big),$$
$$\big|P_*\big(H^*_n\in B_n\big) - P_*\big(\sqrt{n}\hat{H}(\bar{W}^*_n)\in B_n\big)\big| \le P_*\Big(\sqrt{n}\hat{H}(\bar{W}^*_n)\in\big(\partial B_n\big)^{(n\log n)^{-1/2}}\Big) + P_*\Big(|R^*_n|\neq o\big(n^{-1/2}(\log n)^{-1}\big)\Big).$$
To establish (6.22), it is enough to show that $P\big(\sqrt{n}H(\bar{W}_n)\in(\partial B_n)^{(n\log n)^{-1/2}}\big) = o\big(n^{-1/2}\big)$ and $P\Big(\big\{P_*\big(\sqrt{n}\hat{H}(\bar{W}^*_n)\in(\partial B_n)^{(n\log n)^{-1/2}}\big) = o\big(n^{-1/2}\big)\big\}\cap Q_n\Big) = 1$ for large enough $n$. An Edgeworth expansion of $\sqrt{n}\bar{W}^*_n$ with an error $o(n^{-1/2})$ (in almost sure sense) can be established using Lemma 11. Then we can use the transformation technique of Bhattacharya and Ghosh (1978) to find an Edgeworth expansion $\hat{\eta}_n(\cdot)$ of the density of $\sqrt{n}\hat{H}(\bar{W}^*_n)$ with an error $o(n^{-1/2})$ (in almost sure sense). Now calculations similar to page 213 of BR(86) will imply that $P\Big(\big\{P_*\big(\sqrt{n}\hat{H}(\bar{W}^*_n)\in(\partial B_n)^{(n\log n)^{-1/2}}\big) = o\big(n^{-1/2}\big)\big\}\cap Q_n\Big) = 1$, since $B_n$ is an interval. Next we are going to show that $P\big(\sqrt{n}H(\bar{W}_n)\in(\partial B_n)^{(n\log n)^{-1/2}}\big) = 0$ for large enough $n$; to show this we need to utilize the form of $B_n$, since an Edgeworth expansion of $\sqrt{n}H(\bar{W}_n)$, analogous to that of $\sqrt{n}\hat{H}(\bar{W}^*_n)$, does not exist due to the lattice nature of $W_1,\dots,W_n$. To this end define $k_n(x) = \big(\sqrt{n}H(x_1/\sqrt{n}),x_2\big)'$ where $x = (x_1,x_2)'$. Note that $k_n(\cdot)$ is a diffeomorphism (cf. proof of Lemma 3.2 in Lahiri (1989)). Hence $k_n(\cdot)$ is a bijection and $k_n(\cdot)$ & $k_n^{-1}(\cdot)$ have derivatives of all orders. Therefore, the arguments given between (2.15) and (2.18) at page 444 of Bhattacharya and Ghosh (1978), with $g_n$ replaced by $k_n^{-1}(\cdot)$, will imply that
$$\big|P\big(H_n\in B_n\big) - P\big(\sqrt{n}H(\bar{W}_n)\in B_n\big)\big| \le P\Big(\sqrt{n}\bar{W}_n\in\big(\partial k_n^{-1}(B_n\times\mathbb{R})\big)^{d_n(n\log n)^{-1/2}}\Big) + o\big(n^{-1/2}\big) = P\Big(\sqrt{n}\bar{W}_n\in\big(\partial E_n\big)^{d_n(n\log n)^{-1/2}}\Big) + o\big(n^{-1/2}\big),$$
where $d_n \le \max\big\{\big|\det\big(\mathrm{Grad}\,[k_n(x)]\big)\big|^{-1} : |x| = O(\sqrt{\log n})\big\}$. Now by looking into the form of $H(\cdot)$ in (6.18), it is easy to see that $d_n = O(1)$, say $d_n\le C$ for some positive constant $C$. Now note that
$$P\Big(\sqrt{n}\bar{W}_n\in\big(\partial E_n\big)^{C(n\log n)^{-1/2}}\Big) = P\Big(\Big[n^{-1/2}\sum_{i=1}^n y_ix_i - \sqrt{n}\mu_n\Big]\in\big(\sqrt{n}z_n - C(n\log n)^{-1/2},\ \sqrt{n}z_n + C(n\log n)^{-1/2}\big)\Big) = P\Big(\sum_{i=1}^n y_ix_i\in\big(3/4 - C(\log n)^{-1/2},\ 3/4 + C(\log n)^{-1/2}\big)\Big) = 0,$$
for large enough $n$, since $\sum_{i=1}^n y_ix_i$ can take only integer values. Therefore (6.22) is established. Now recalling that $\hat{\eta}_n(\cdot)$ is the Edgeworth expansion of the density of $\sqrt{n}\hat{H}(\bar{W}^*_n)$ with an almost sure error $o(n^{-1/2})$, we have for large enough $n$,
$$P\Big(\sqrt{n}\,\Big|P_*\big(\sqrt{n}\hat{H}(\bar{W}^*_n)\in B_n\big) - \int_{B_n}\hat{\eta}_n(x)\,dx\Big| = o(1)\Big) = 1.\qquad(6.23)$$
Now define $U_i = \Big(\big(y_i - p(\beta|x_i)\big)x_iV_i,\ \big(y_i - p(\beta|x_i)\big)^2x_i^2\big[V_i^2 - 1\big]\Big)'$, $i\in\{1,\dots,n\}$, where $V_1,\dots,V_n$ are iid continuous random variables which are independent of $\{y_1,\dots,y_n\}$. Also $E(V_1) = 0$, $E(V_1^2) = E(V_1^3) = 1$ and $EV_1^4 < \infty$. An immediate choice of the distribution of $V_1$ is that of $(G^* - \mu_{G^*})\mu_{G^*}^{-1}$. Other choices of $\{V_1,\dots,V_n\}$ can be found in Liu (1988), Mammen (1993) and Das et al. (2019). Now since $\max\{|x_i| : i\in\{1,\dots,n\}\} = O(1)$, there exist a natural number $n_0$ and constants $0 <\delta_1\le\delta_2 < 1$ such that $\sup_{n\ge n_0}p(\beta|x_n)\le\delta_2$ and $\inf_{n\ge n_0}p(\beta|x_n)\ge\delta_1$. Again $V_1,\dots,V_n$ are iid continuous random variables. Hence, writing $p_n = p(\beta|x_n)$, for any $b > 0$ we have
$$\sup_{n\ge n_0}\ \sup_{\|t\| > b}\Big|Ee^{it'U_n}\Big|\le\sup_{n\ge n_0}\bigg[p_n\sup_{\|t\| > b(1-\delta_2)}\Big|Ee^{it_1(1-p_n)V_1 + it_2(1-p_n)^2[V_1^2-1]}\Big| + (1-p_n)\sup_{\|t\| > b\delta_1}\Big|Ee^{it_1(-p_n)V_1 + it_2(-p_n)^2[V_1^2-1]}\Big|\bigg] < 1,$$
i.e. the uniform Cramér's condition holds. Also the minimum eigenvalue condition of Theorem 20.6 of BR(86) holds due to $\max\{|x_i| : i\in\{1,\dots,n\}\} = O(1)$ and $\liminf_{n\to\infty}n^{-1}\sum_{i=1}^n x_i^2 > 0$.
Hence applying Theorem 20.6 of BR(86) and then applying the transformation technique of Bhattacharya and Ghosh (1978), we have
$$\Big|P\big(\sqrt{n}H(\bar{U}_n)\in B_n\big) - \int_{B_n}\eta_n(x)\,dx\Big| = o\big(n^{-1/2}\big),\qquad(6.24)$$
where $\bar{U}_n = n^{-1}\sum_{i=1}^n U_i$. Note that in both the expansions $\eta_n(\cdot)$ and $\hat{\eta}_n(\cdot)$ the variances corresponding to the normal terms are 1. Also $\hat{H}(\cdot)$ can be obtained from $H(\cdot)$ by first replacing $L_n$ by $\hat{M}_n$ and then $\beta$ by $\hat{\beta}_n$ (cf. (6.18) and (6.19)). Hence we can conclude that for any Borel set $C$,
$$P\Big(\Big\{\sqrt{n}\,\Big|\int_C\eta_n(x)\,dx - \int_C\hat{\eta}_n(x)\,dx\Big| = o(1)\Big\}\cap Q_n\Big) = 1.$$
Hence from (6.23) and (6.24), we have
$$P\Big(\Big\{\sqrt{n}\,\Big|P_*\big(\sqrt{n}\hat{H}(\bar{W}^*_n)\in B_n\big) - P\big(\sqrt{n}H(\bar{U}_n)\in B_n\big)\Big| = o(1)\Big\}\cap Q_n\Big) = 1,\qquad(6.25)$$
for large enough $n$. To establish (6.20), in view of (6.22) and (6.25) it is enough to find a positive constant $M_1$ such that
$$\sqrt{n}\,\Big|P\big(\sqrt{n}H(\bar{W}_n)\in B_n\big) - P\big(\sqrt{n}H(\bar{U}_n)\in B_n\big)\Big| = \sqrt{n}\,\Big|P\big(\sqrt{n}\bar{W}_n\in E_n\big) - P\big(\sqrt{n}\bar{U}_n\in E_n\big)\Big|\ \ge\ 4M_1.$$
Note that since $EV_i^2 = EV_i^3 = 1$ for all $i\in\{1,\dots,n\}$, the first three average moments of $\{W_1,\dots,W_n\}$ are the same as those of $\{U_1,\dots,U_n\}$. However $\{W_1,\dots,W_n\}$ are independent lattice random variables, whereas $\{U_1,\dots,U_n\}$ are independent random variables for which the uniform Cramér's condition holds. Therefore by Lemma 10 and Theorem 20.6 of BR(86) we have
$$\sup_{x\in\mathbb{R}}\Big|P\big(\sqrt{n}\bar{W}_n\le x\big) - \Phi_{\sigma_n^2}(x) - n^{-1/2}P_1\big(-\Phi_{\sigma_n^2}:\{\bar{\chi}_{\nu,n}\}\big)(x) + n^{-1/2}S_1\big(n\mu_n + \sqrt{n}x\big)\tfrac{d}{dx}\Phi_{\sigma_n^2}(x)\Big| = o\big(n^{-1/2}\big)$$
and
$$\sup_{x\in\mathbb{R}}\Big|P\big(\sqrt{n}\bar{U}_n\le x\big) - \Phi_{\sigma_n^2}(x) - n^{-1/2}P_1\big(-\Phi_{\sigma_n^2}:\{\bar{\chi}_{\nu,n}\}\big)(x)\Big| = o\big(n^{-1/2}\big),\qquad(6.26)$$
where $P_1\big(-\Phi_{\sigma_n^2}:\{\bar{\chi}_{\nu,n}\}\big)(x)$ and $S_1(\cdot)$ are as defined in Lemma 10. Recall that $E_n = (-\infty,z_n]$ where $z_n = \big(\tfrac{3}{4}n^{-1} - \mu_n\big)$. Therefore for some positive constants $C_1, C_2, C_3, C_4$ we have
$$\sqrt{n}\,\Big|P\big(\sqrt{n}\bar{W}_n\in E_n\big) - P\big(\sqrt{n}\bar{U}_n\in E_n\big)\Big| = \sqrt{n}\,\Big|P\big(\sqrt{n}\bar{W}_n\le\sqrt{n}z_n\big) - P\big(\sqrt{n}\bar{U}_n\le\sqrt{n}z_n\big)\Big|\ge\big(n\mu_n + nz_n - 1/2\big)\big(\sqrt{2\pi}\sigma_n\big)^{-1}e^{-(\sqrt{n}z_n)^2/(2\sigma_n^2)} - o(1) = \tfrac{1}{4}\big(\sqrt{2\pi}\sigma_n\big)^{-1}e^{-(\sqrt{n}z_n)^2/(2\sigma_n^2)} - o(1)\ge C_1\exp\Big\{-C_2\,n^{-1}\Big(\tfrac{9}{16} + n^2\mu_n^2 - \tfrac{3}{2}n\mu_n\Big)\Big\} - o(1)\ge C_3\exp\big\{-C_4M^2\big\},$$
for large enough $n$. The second-to-last inequality above follows since $\sigma_n^2$ is bounded away from $0$ and $\infty$, using $\max\{|x_i| : i\in\{1,\dots,n\}\} = O(1)$, and the last one is due to the assumption $\sqrt{n}|\mu_n| < M$. Taking $4M_1 = C_3\exp\{-C_4M^2\}$, the proof of Theorem 2 is now complete.
7. Conclusion
In this paper, we considered the studentized version of the logistic regression estimator and proposed a novel Bootstrap method called PEBBLE. The rate of convergence of the studentized version to the normal distribution is found to be sub-optimal with respect to the classical Berry–Esseen rate $O(n^{-1/2})$. We observe that the usual studentization also fails to significantly improve the error rate of the Bootstrap approximation, due to the underlying lattice structure. Therefore, a novel modification is proposed in the form of smoothed studentized pivots to achieve SOC by the Bootstrap in approximating the distribution of the studentized logistic regression estimator. The proposed Bootstrap method can be used for practical purposes to draw inferences about the regression parameter which are more accurate than those based on asymptotic normality. PEBBLE is shown to perform better than other existing methods, in general, via simulation experiments. Specifically, for larger $p$, smaller $n$ settings, PEBBLE outperforms other methods by a large margin. The proposed method is used to find two-sided, upper and lower CIs for the covariates in a real data application concerning the dependence of the type of delivery on several related clinical variables. As a future extension, the SOC of the Bootstrap in the generalized linear model (GLM) can be explored. Additionally, one can also explore the high-dimensional structure in GLM, that is, when the dimension $p$ grows with $n$, by adding suitable penalty terms to the underlying objective function.
8. Supplementary Material
Suppose $\Phi_V$ and $\phi_V$ respectively denote the normal distribution and its density with mean $0$ and covariance matrix $V$. We will write $\Phi_V = \Phi$ and $\phi_V = \phi$ when the dispersion matrix $V$ is the identity matrix. $C, C_1, C_2, \dots$ denote generic constants that do not depend on variables like $n$, $x$, and so on. $\nu_1, \nu_2$ denote vectors in $\mathbb{R}^p$, sometimes with some specific structures (as mentioned in the proofs). $e_1,\dots,e_p$ denote the standard basis vectors of $\mathbb{R}^p$. For a non-negative integral vector $\alpha = (\alpha_1,\dots,\alpha_l)'$ and a function $f = (f_1,\dots,f_l) : \mathbb{R}^l\to\mathbb{R}^l$, $l\ge 1$, let $|\alpha| = \alpha_1 + \dots + \alpha_l$, $\alpha! = \alpha_1!\cdots\alpha_l!$, $f^{\alpha} = (f_1^{\alpha_1})\cdots(f_l^{\alpha_l})$, $D^{\alpha}f = D_1^{\alpha_1}\cdots D_l^{\alpha_l}f$, where $D_jf$ denotes the partial derivative of $f$ with respect to the $j$th coordinate, $1\le j\le l$. We will write $D^{\alpha} = D$ if $\alpha$ has all components equal to 1. For $t = (t_1,\dots,t_l)'\in\mathbb{R}^l$ and $\alpha$ as above, define $t^{\alpha} = t_1^{\alpha_1}\cdots t_l^{\alpha_l}$. For any two vectors $\alpha,\beta\in\mathbb{R}^k$, $\alpha\le\beta$ means that each component of $\alpha$ is smaller than the corresponding component of $\beta$. For a set $A$ and real constants $a_1, a_2$, $a_1A + a_2 = \{a_1y + a_2 : y\in A\}$, $\partial A$ is the boundary of $A$ and $A^{\epsilon}$ denotes the $\epsilon$-neighbourhood of $A$ for any $\epsilon > 0$. $\mathbb{N}$ is the set of natural numbers. $C(\cdot), C_1(\cdot),\dots$ denote generic constants which depend only on their arguments. Given two probability measures $P_1$ and $P_2$ defined on the same space $(\Omega,\mathcal{F})$, $P_1 * P_2$ denotes the measure on $(\Omega,\mathcal{F})$ obtained by the convolution of $P_1$ & $P_2$, and $\|P_1 - P_2\| = |P_1 - P_2|(\Omega)$, $|P_1 - P_2|$ being the total variation of $(P_1 - P_2)$. For a function $g : \mathbb{R}^k\to\mathbb{R}^m$ with $g = (g_1,\dots,g_m)'$, $\mathrm{Grad}\,[g(x)] = \big(\big(\partial g_i(x)/\partial x_j\big)\big)_{m\times k}$.

For any natural number $m$, the class of sets $\mathcal{A}_m$ is the collection of Borel subsets of $\mathbb{R}^m$ satisfying
$$\sup_{B\in\mathcal{A}_m}\Phi\big((\partial B)^{\epsilon}\big) = O(\epsilon)\ \text{as}\ \epsilon\downarrow 0.\qquad(8.1)$$

For Lemma 3 below, define $\xi_{0,n,s}(t) = \Big(\sum_{r=0}^{s-2}n^{-r/2}\tilde{P}_r\big(it:\{\bar{\chi}_{\nu,n}\}\big)\Big)\exp\big\{-t'E_nt/2\big\}$, where $E_n = n^{-1}\sum_{i=1}^n\mathrm{Var}(Y_i)$ and $\bar{\chi}_{\nu,n}$ is the average $\nu$th cumulant of $Y_1,\dots,Y_n$. Define $\bar{\rho}_l = n^{-1}\sum_{i=1}^nE\|Y_i\|^l$, the average $l$th absolute moment of $\{Y_1,\dots,Y_n\}$. The polynomials $\tilde{P}_r\big(z:\{\bar{\chi}_{\nu,n}\}\big)$ are defined on pages 51–53 of Bhattacharya and Rao (1986). Define $\xi_{n,s}(\cdot)$ through the identity
$$\xi_{0,n,s}(t)\Big(\sum_{j=0}^{\infty}\big(-\|t\|^2c_n^2/2\big)^j/j!\Big) = \xi_{n,s}(t) + o\big(n^{-(s-2)/2}\big),$$
uniformly in $\|t\| < 1$, where $c_n$ is defined in Lemma 3. $\psi_{n,s}(\cdot)$ is the Fourier inverse of $\xi_{n,s}(\cdot)$.
Lemma 3. Suppose $Y_1,\dots,Y_n$ are mean zero independent random vectors in $\mathbb{R}^k$ with $E_n = n^{-1}\sum_{i=1}^n\mathrm{Var}(Y_i)$ converging to some positive definite matrix $V$. Let $s\ge 3$ be an integer and $\bar{\rho}_{s+\delta} = O(1)$ for some $\delta > 0$. Additionally assume $Z$ to be a $N(0,I_k)$ random vector which is independent of $Y_1,\dots,Y_n$, and the sequence $\{c_n\}_{n\ge 1}$ to be such that $c_n = O(n^{-d})$ and $n^{-(s-2)/(2\tilde{k})}\log n = o(c_n)$, where $\tilde{k} = \max\{k+1,s+1\}$ and $d > 0$ is a constant. Then for any Borel set $B$ of $\mathbb{R}^k$,
$$\Big|P\big(\sqrt{n}\bar{Y}_n + c_nZ\in B\big) - \int_B\psi_{n,s}(x)\,dx\Big| = o\big(n^{-(s-2)/2}\big),\qquad(8.2)$$
where $\psi_{n,s}(\cdot)$ is defined above.

Proof of Lemma 3. Define $V_i = Y_i\mathbb{1}\big(\|Y_i\|\le\sqrt{n}\big)$ and $W_i = V_i - EV_i$. Suppose $\bar{\tilde{\chi}}_{\nu,n}$ is the average cumulant of $W_1,\dots,W_n$ and $D_n = n^{-1}\sum_{i=1}^n\mathrm{Var}(W_i)$. Let $\tilde{\xi}_{0,n,s}$, $\tilde{\xi}_{n,s}$ and $\tilde{\psi}_{n,s}$ be obtained respectively from $\xi_{0,n,s}$, $\xi_{n,s}$ and $\psi_{n,s}$ with $\bar{\chi}_{\nu,n}$ replaced by $\bar{\tilde{\chi}}_{\nu,n}$ and $E_n$ replaced by $D_n$. For any Borel set $B\subseteq\mathbb{R}^k$, define $B_n = B - n^{-1/2}\sum_{i=1}^nEV_i$. Then we have
$$\Big|P\big(\sqrt{n}\bar{Y}_n + c_nZ\in B\big) - \int_B\psi_{n,s}(x)\,dx\Big|\le\Big|P\big(\sqrt{n}\bar{Y}_n + c_nZ\in B\big) - P\big(\sqrt{n}\bar{V}_n + c_nZ\in B\big)\Big| + \Big|P\big(\sqrt{n}\bar{W}_n + c_nZ\in B_n\big) - \int_{B_n}\tilde{\psi}_{n,s}(x)\,dx\Big| + \Big|\int_{B_n}\tilde{\psi}_{n,s}(x)\,dx - \int_B\psi_{n,s}(x)\,dx\Big| = I_1 + I_2 + I_3\ \text{(say)}.\qquad(8.3)$$
First we are going to show that $I_1 = o\big(n^{-(s-2)/2}\big)$. Writing $G_j$ and $G'_j$ for the distributions of $n^{-1/2}Y_j$ and $n^{-1/2}V_j$, $j\in\{1,\dots,n\}$, we have
$$I_1\le\sum_{j=1}^n\big\|G_j - G'_j\big\| = 2\sum_{j=1}^nP\big(\|Y_j\| > n^{1/2}\big) = o\big(n^{-(s-2)/2}\big),\qquad(8.4)$$
due to the fact that $n^{-1}\sum_{j=1}^nE\|Y_j\|^{s+\delta} = O(1)$. Next we are going to show $I_3 = o\big(n^{-(s-2)/2}\big)$. Define $m_1 = \inf\{j : c_n^{2j} = o(n^{-(s-2)/2})\}$. Again note that the eigenvalues of $D_n$ are bounded away from 0, due to (14.18) in Corollary 14.2 of Bhattacharya and Rao (1986) and the fact that $E_n$ converges to some positive definite matrix. Therefore we have
$$I_3 = \Big|\int_{B_n}\tilde{\psi}^{m_1}_{n,s}(x)\,dx - \int_B\psi^{m_1}_{n,s}(x)\,dx\Big| + o\big(n^{-(s-2)/2}\big) = I_4 + o\big(n^{-(s-2)/2}\big)\ \text{(say)},\qquad(8.5)$$
uniformly for any Borel set $B$ of $\mathbb{R}^k$, where
$$\psi^{m_1}_{n,s}(x) = \bigg\{\Big[\sum_{r=0}^{s-2}n^{-r/2}\tilde{P}_r\big(-D:\{\bar{\chi}_{\nu,n}\}\big)\Big]\Big[\sum_{j=0}^{m_1-1}(-2)^{-j}(j!)^{-1}c_n^{2j}(D'D)^j\Big]\bigg\}\phi_{E_n}(x)\quad\text{and}\quad\tilde{\psi}^{m_1}_{n,s}(x) = \bigg\{\Big[\sum_{r=0}^{s-2}n^{-r/2}\tilde{P}_r\big(-D:\{\bar{\tilde{\chi}}_{\nu,n}\}\big)\Big]\Big[\sum_{j=0}^{m_1-1}(-2)^{-j}(j!)^{-1}c_n^{2j}(D'D)^j\Big]\bigg\}\phi_{D_n}(x).$$
Now writing $l(u) = \|u\|^2/2$, $u\in\mathbb{R}^k$, and $a_n = n^{-1/2}\sum_{i=1}^nEV_i$, from (8.4) we have
$$I_4\le\sum_{r=0}^{s-2}\sum_{j=0}^{m_1-1}n^{-r/2}b_{jn}\bigg[\int_{B_n}\Big|\Big\{\tilde{P}_r\big(-D:\{\bar{\chi}_{\nu,n}\}\big)\tfrac{l(-D)^j}{j!}\Big\}\phi_{E_n}(x) - \Big\{\tilde{P}_r\big(-D:\{\bar{\tilde{\chi}}_{\nu,n}\}\big)\tfrac{l(-D)^j}{j!}\Big\}\phi_{D_n}(x)\Big|\,dx + \int_{B}\Big|\Big\{\tilde{P}_r\big(-D:\{\bar{\chi}_{\nu,n}\}\big)\tfrac{l(-D)^j}{j!}\Big\}\phi_{E_n}(x) - \Big\{\tilde{P}_r\big(-D:\{\bar{\chi}_{\nu,n}\}\big)\tfrac{l(-D)^j}{j!}\Big\}\phi_{E_n}(x - a_n)\Big|\,dx\bigg] + o\big(n^{-(s-2)/2}\big) = I_5 + I_6 + o\big(n^{-(s-2)/2}\big)\ \text{(say)}.\qquad(8.6)$$
Now assume $E_n = I_k$, the $k\times k$ identity matrix. Then, following the proof of Lemma 14.6 of Bhattacharya and Rao (86), it can be shown that $I_5 + I_6 = o\big(n^{-(s-2)/2}\big)$. The main ingredients of the proof are (14.74), (14.78), (14.79) and bounds similar to (14.80) and (14.86) in Bhattacharya and Rao (86). The general case, when $E_n$ converges to a positive definite matrix, will follow essentially through the same line. Hence from (8.5) and (8.6), we have $I_3 = o\big(n^{-(s-2)/2}\big)$. The last step is to show $I_2 = o\big(n^{-(s-2)/2}\big)$. Now let us write $\Gamma_n = \sqrt{n}\bar{W}_n + c_nZ$. Then recall that
$$I_2 = \Big|P\big(\Gamma_n\in B_n\big) - \int_{B_n}\tilde{\psi}_{n,s}(x)\,dx\Big|.$$
By Theorem 4 of Chapter 5 of Feller (2014), we can say that $\Gamma_n$ has a density with respect to the Lebesgue measure. Let us call that density $q_n(\cdot)$. Then we have
$$I_2\le\int\big|q_n(x) - \tilde{\psi}_{n,s}(x)\big|\,dx\le\int\big|q_n(x) - \tilde{\psi}_{n,(\tilde{k}-2)}(x)\big|\,dx + \int\big|\tilde{\psi}_{n,s}(x) - \tilde{\psi}_{n,(\tilde{k}-2)}(x)\big|\,dx,\qquad(8.7)$$
where $\tilde{k} = \max\{k+1,s+1\}$. Note that $\int\|x\|^j\big|q_n(x) - \tilde{\psi}_{n,(\tilde{k}-2)}(x)\big|\,dx <\infty$ for any $j\in\mathbb{N}$, since $\tilde{\psi}_{n,(\tilde{k}-2)}(x)$ has a negative exponential term and $\bar{W}_n$ is bounded. Therefore by Lemma 11.6 of Bhattacharya and Rao (86) we have
$$I_2\le C(k)\bigg[\max_{|\beta|\in\{0,\dots,(k+1)\}}\int\Big|D^{\beta}\big(\hat{q}_n(t) - \tilde{\xi}_{n,(\tilde{k}-2)}(t)\big)\Big|\,dt\bigg] + \int\big|\tilde{\psi}_{n,s}(x) - \tilde{\psi}_{n,(\tilde{k}-2)}(x)\big|\,dx = I_7 + I_8\ \text{(say)}.\qquad(8.8)$$
Here $\hat{q}_n(\cdot)$ is the Fourier transform of the density $q_n(\cdot)$. Clearly $I_8 = o\big(n^{-(s-2)/2}\big)$ by looking into the definition of $\tilde{\psi}_{n,s}(\cdot)$. Now define
$$\breve{\xi}_{n,(\tilde{k}-2)}(t) = \bigg[\sum_{r=0}^{\tilde{k}-2}n^{-r/2}\tilde{P}_r\big(it:\{\bar{\tilde{\chi}}_{\nu,n}\}\big)\bigg]\exp\Big(-\tfrac{t'D_nt}{2} - \tfrac{c_n^2\|t\|^2}{2}\Big).$$
Then
$$I_7\le C(k)\max_{|\beta|\in\{0,\dots,(k+1)\}}\bigg[\int\Big|D^{\beta}\big(\hat{q}_n(t) - \breve{\xi}_{n,(\tilde{k}-2)}(t)\big)\Big|\,dt + \int\Big|D^{\beta}\big(\breve{\xi}_{n,(\tilde{k}-2)}(t) - \tilde{\xi}_{n,(\tilde{k}-2)}(t)\big)\Big|\,dt\bigg] = I_9 + I_{10}\ \text{(say)}.\qquad(8.9)$$
First, we are going to show that $I_{10} = o\big(n^{-(s-2)/2}\big)$. Note that
$$\breve{\xi}_{n,(\tilde{k}-2)}(t) - \tilde{\xi}_{n,(\tilde{k}-2)}(t) = \bigg[\sum_{r=0}^{\tilde{k}-2}n^{-r/2}\tilde{P}_r\big(it:\{\bar{\tilde{\chi}}_{\nu,n}\}\big)\bigg]\exp\Big(-\tfrac{t'D_nt}{2}\Big)\sum_{j=m_2}^{\infty}\frac{(-1)^jc_n^{2j}\|t\|^{2j}}{2^jj!},$$
where $m_2 = m_2(r)$ is the index from which the series in $j$ is left out of $\tilde{\xi}_{n,(\tilde{k}-2)}(t)$ for the $r$th term. Therefore for any $\beta$ with $|\beta|\in\{0,\dots,k+1\}$ we have
$$D^{\beta}\big(\breve{\xi}_{n,(\tilde{k}-2)}(t) - \tilde{\xi}_{n,(\tilde{k}-2)}(t)\big) = \sum\nolimits^{*}\sum_{r=0}^{\tilde{k}-2}\sum_{j=m_2}^{\infty}C(\alpha,\beta,\gamma)\,n^{-r/2}\frac{(-1)^jc_n^{2j}}{2^jj!}\Big[D^{\alpha}\Big(\tilde{P}_r\big(it:\{\bar{\tilde{\chi}}_{\nu,n}\}\big)\Big)\Big]\Big[D^{\gamma}\Big(\exp\Big(-\tfrac{t'D_nt}{2}\Big)\Big)\Big]D^{\beta-\alpha-\gamma}\big(\|t\|^{2j}\big),\qquad(8.10)$$
where $\sum^{*}$ is over $\alpha,\gamma\in\mathbb{N}_0^k$ such that $0\le\alpha,\gamma\le\beta$. Since the degree of the polynomial $\tilde{P}_r\big(it:\{\bar{\tilde{\chi}}_{\nu,n}\}\big)$ is $3r$, $D^{\alpha}\big(\tilde{P}_r(it:\{\bar{\tilde{\chi}}_{\nu,n}\})\big) = 0$ if $|\alpha| > 3r$. When $|\alpha|\le 3r$, then recalling that $n^{-1}\sum_{i=1}^nE\|Y_i\|^s = O(1)$, by Lemma 9.5 & Lemma 14.1(v) of Bhattacharya and Rao (1986) we have
$$\Big|D^{\alpha}\Big(\tilde{P}_r\big(it:\{\bar{\tilde{\chi}}_{\nu,n}\}\big)\Big)\Big|\le\begin{cases}C(\alpha,r)\,\bar{\rho}_s^{\,r/(s-2)}\big(\bar{\rho}_3\big)^{r(s-3)/(s-2)}\big(1 + \|t\|^{3r-|\alpha|}\big), & \text{if } 0\le r\le (s-2),\\[2pt] C(\alpha,r)\,n^{(r+2-s)/2}\,\bar{\rho}_s\big(\bar{\rho}_3\big)^{r-1}\big(\|t\|^{3r-|\alpha|}\big), & \text{if } r > (s-2).\end{cases}\qquad(8.11)$$
Again note that
$$\Big|D^{\gamma}\Big(\exp\Big(-\tfrac{t'D_nt}{2}\Big)\Big)\Big|\le C(\gamma)\big(1 + \|t\|\big)^{|\gamma|}\|D_n\|^{|\gamma|}\exp\Big(-\tfrac{t'D_nt}{2}\Big)\qquad(8.12)$$
and
$$\sum_{j=m_2}^{\infty}\bigg|\frac{c_n^{2j}\,D^{\beta-\alpha-\gamma}\big(\|t\|^{2j}\big)}{2^jj!}\bigg|\le C(\alpha,\beta,\gamma)\,c_n^{2m_3}\Big[e^{c_n^2/2} + \|t\|^{2m_3}\exp\big(c_n^2\|t\|^2/2\big)\Big],\qquad(8.13)$$
where $m_3 = m_3(\alpha,\beta,\gamma,r) = \max\{m_2, |\beta-\alpha-\gamma|/2\}$. Now combining (8.11)–(8.13), from (8.10) we have $I_{10} = o\big(n^{-(s-2)/2}\big)$. The last step is to show $I_9 = o\big(n^{-(s-2)/2}\big)$. Recall that
$$I_9 = C(k)\max_{|\beta|\in\{0,\dots,(k+1)\}}\bigg[\int\Big|D^{\beta}\big(\hat{q}_n(t) - \breve{\xi}_{n,(\tilde{k}-2)}(t)\big)\Big|\,dt\bigg]\le C(k)\max_{|\beta|\in\{0,\dots,(k+1)\}}\bigg[\int_{A_n}\Big|D^{\beta}\big(\hat{q}_n(t) - \breve{\xi}_{n,(\tilde{k}-2)}(t)\big)\Big|\,dt + \int_{A_n^c}\Big|D^{\beta}\big(\hat{q}_n(t) - \breve{\xi}_{n,(\tilde{k}-2)}(t)\big)\Big|\,dt\bigg] = I_{11} + I_{12}\ \text{(say)},\qquad(8.14)$$
where $A_n = \Big\{t\in\mathbb{R}^k : \|t\|\le C(k)\,\lambda_n^{-1/2}\big(n^{1/2}\eta_{\tilde{k}}^{-1/(\tilde{k}-2)}\big)^{(\tilde{k}-2)/\tilde{k}}\Big\}$, with $C(k)$ being some fixed positive constant, $\lambda_n$ being the largest eigenvalue of $D_n$, and $\eta_{\tilde{k}} = n^{-1}\sum_{i=1}^nE\big\|D_n^{-1/2}W_i\big\|^{\tilde{k}}$. Note that
$$D^{\beta}\big(\hat{q}_n(t) - \breve{\xi}_{n,(\tilde{k}-2)}(t)\big) = \sum_{0\le\alpha\le\beta}C(\alpha,\beta)\,D^{\alpha}\bigg(E\big(e^{i\sqrt{n}t'\bar{W}_n}\big) - \exp\Big(-\tfrac{t'D_nt}{2}\Big)\sum_{r=0}^{\tilde{k}-2}n^{-r/2}\tilde{P}_r\big(it:\{\bar{\tilde{\chi}}_{\nu,n}\}\big)\bigg)D^{\beta-\alpha}\Big(\exp\big(-c_n^2\|t\|^2/2\big)\Big),\qquad(8.15)$$
where
$$\Big|D^{\beta-\alpha}\Big(\exp\big(-c_n^2\|t\|^2/2\big)\Big)\Big|\le C(\alpha,\beta)\,c_n^{|\beta-\alpha|}\big(1+\|t\|\big)^{|\beta-\alpha|}\exp\big(-c_n^2\|t\|^2/2\big)$$
and, by Theorem 9.11 and the remark following it in Bhattacharya and Rao (86), we have
$$\bigg|D^{\alpha}\bigg(E\big(e^{i\sqrt{n}t'\bar{W}_n}\big) - \exp\Big(-\tfrac{t'D_nt}{2}\Big)\sum_{r=0}^{\tilde{k}-2}n^{-r/2}\tilde{P}_r\big(it:\{\bar{\tilde{\chi}}_{\nu,n}\}\big)\bigg)\bigg|\le C(k)\,\lambda_n^{|\alpha|/2}\,\eta_{\tilde{k}}\,n^{-(\tilde{k}-2)/2}\Big[(t'D_nt)^{(\tilde{k}-|\alpha|)/2} + (t'D_nt)^{(3(\tilde{k}-2)+|\alpha|)/2}\Big]\exp\Big(-\tfrac{t'D_nt}{4}\Big).\qquad(8.16)$$
Now note that $\bar{\rho}_{s+\delta} = O(1)$ and $E_n$ converges to the positive definite matrix $V$.
Hence applying Lemma 14.1(v) (with $s' = \tilde{k}$) and Corollary 14.2 of Bhattacharya and Rao (86), from (8.15) we have $I_{11} = o\big(n^{-(s-2)/2}\big)$. Again applying Lemma 14.1(v) and Corollary 14.2 of Bhattacharya and Rao (86), we have $\eta_{\tilde{k}}\le C(\tilde{k},s)\,n^{(\tilde{k}-s)/2}\bar{\rho}_s$ for large enough $n$, with $\lambda_n$ converging to some positive number. Therefore we have, for large enough $n$, $A_n^c\subseteq B_n$ where $B_n = \big\{t\in\mathbb{R}^k : \|t\| > C(k,V)\,n^{(s-2)/(2\tilde{k})}\big\}$, and hence
$$I_{12}\le C(k)\max_{|\beta|\in\{0,\dots,(k+1)\}}\int_{B_n}\Big|D^{\beta}\big(\hat{q}_n(t) - \breve{\xi}_{n,(\tilde{k}-2)}(t)\big)\Big|\,dt\le C(k)\max_{|\beta|\in\{0,\dots,(k+1)\}}\bigg[\int_{B_n}\Big|D^{\beta}\big(\hat{q}_n(t)\big)\Big|\,dt + \int_{B_n}\Big|D^{\beta}\big(\breve{\xi}_{n,(\tilde{k}-2)}(t)\big)\Big|\,dt\bigg] = I_{13} + I_{14}\ \text{(say)},\qquad(8.17)$$
for large enough $n$. To establish $I_{12} = o\big(n^{-(s-2)/2}\big)$, first we are going to show $I_{14} = o\big(n^{-(s-2)/2}\big)$. Note that
$$D^{\beta}\big(\breve{\xi}_{n,(\tilde{k}-2)}(t)\big) = \sum_{0\le\alpha\le\beta}C(\alpha,\beta)\,D^{\alpha}\bigg(\sum_{r=0}^{\tilde{k}-2}n^{-r/2}\tilde{P}_r\big(it:\{\bar{\tilde{\chi}}_{\nu,n}\}\big)\bigg)D^{\beta-\alpha}\Big(\exp\Big(-\tfrac{t'\tilde{D}_nt}{2}\Big)\Big),$$
where $\tilde{D}_n = D_n + c_n^2I_k$. We are going to use the bounds (8.11) and (8.12) with $D_n$ replaced by $\tilde{D}_n$. Note that by Corollary 14.2 of Bhattacharya and Rao (86) and the fact that $c_n = O(n^{-d})$, $\tilde{D}_n$ converges to the positive definite matrix $V$, which is the limit of $E_n$. Hence those bounds will imply that for large enough $n$,
$$I_{14} = C(k)\max_{|\beta|\in\{0,\dots,(k+1)\}}\int_{B_n}\Big|D^{\beta}\big(\breve{\xi}_{n,(\tilde{k}-2)}(t)\big)\Big|\,dt\le C(k,V)\,n^{(\tilde{k}+1-s)/2}\int_{B_n}\big(\|t\|^{3\tilde{k}}\big)\exp\big(-C_1(V)\|t\|^2\big)\,dt\le C(k,V)\,n^{(\tilde{k}+1-s)/2}\int_{B_n}\exp\big(-C_2(V)\|t\|^2\big)\,dt.\qquad(8.18)$$
Now apply Lemma 2 of the main paper to conclude that $I_{14} = o\big(n^{-(s-2)/2}\big)$. The only remaining thing to show is $I_{13} = o\big(n^{-(s-2)/2}\big)$. Note that
$$D^{\beta}\big(\hat{q}_n(t)\big) = \sum_{0\le\alpha\le\beta}C(\alpha,\beta)\,D^{\alpha}\Big(E\big(e^{i\sqrt{n}t'\bar{W}_n}\big)\Big)D^{\beta-\alpha}\Big(\exp\big(-c_n^2\|t\|^2/2\big)\Big),\qquad(8.19)$$
where
$$\Big|D^{\alpha}\Big(E\big(e^{i\sqrt{n}t'\bar{W}_n}\big)\Big)\Big|\le\Big|D^{\alpha}\Big(\prod_{i=1}^nE\big(e^{it'W_i/\sqrt{n}}\big)\Big)\Big|\quad\text{and}\quad\Big|D^{\beta-\alpha}\Big(\exp\big(-c_n^2\|t\|^2/2\big)\Big)\Big|\le C(\alpha,\beta)\big(1+\|t\|\big)^{|\beta-\alpha|}\exp\big(-c_n^2\|t\|^2/2\big).$$
Now by Leibniz's rule of differentiation, $D^{\alpha}\big(E\big(e^{i\sqrt{n}t'\bar{W}_n}\big)\big)$ is the sum of at most $n^{|\alpha|}$ terms. A typical term is of the form
$$\prod_{i\notin C_r}E\big(e^{it'W_i/\sqrt{n}}\big)\prod_{l=1}^{r}D^{\beta_l}\Big(E\big(e^{it'W_{i_l}/\sqrt{n}}\big)\Big),$$
where $C_r = \{i_1,\dots,i_r\}\subset\{1,\dots,n\}$, $1\le r\le|\alpha|$, and $\beta_1,\dots,\beta_r$ are non-negative integral vectors satisfying $|\beta_j|\ge 1$, $j\in\{1,\dots,r\}$, and $\sum_{j=1}^r\beta_j = \alpha$. Note that $\big|D^{\beta_l}\big(E\big(e^{it'W_{i_l}/\sqrt{n}}\big)\big)\big|\le n^{-|\beta_l|/2}E\|W_{i_l}\|^{|\beta_l|}$ and $\|W_{i_l}\|\le 2\sqrt{n}$, which imply
$$\Big|\prod_{i\notin C_r}E\big(e^{it'W_i/\sqrt{n}}\big)\prod_{l=1}^{r}D^{\beta_l}\Big(E\big(e^{it'W_{i_l}/\sqrt{n}}\big)\Big)\Big|\le 2^{\sum_{l=1}^r|\beta_l|} = 2^{|\alpha|}\ \Rightarrow\ \Big|D^{\alpha}\Big(E\big(e^{i\sqrt{n}t'\bar{W}_n}\big)\Big)\Big|\le(2n)^{|\alpha|}.$$
Let $K_n = C(k,V)\,n^{(s-2)/(2\tilde{k})}$. Therefore from (8.19), for large enough $n$ we have
$$I_{13}\le\Big[\max_{|\beta|\in\{0,\dots,(k+1)\}}\sum_{0\le\alpha\le\beta}C(\alpha,\beta)\Big](2n)^{k+1}\int_{B_n}\big(1+\|t\|\big)^{k+1}\exp\big(-c_n^2\|t\|^2/2\big)\,dt\le C(k)(2n)^{k+1}\int_{r\ge K_n}r^{k-1}(1+r)^{k+1}e^{-c_n^2r^2/2}\,dr\le C(k)(2n)^{k+1}c_n^{-(2k+2)}\int_{r\ge K_n}e^{-c_n^2r^2/4}\,dr\le C(k)\,n^{k+1+(2k+3)d}\int_{c_nK_n/\sqrt{2}}^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}\,dz = o\big(n^{-(s-2)/2}\big).\qquad(8.20)$$
The second inequality follows by considering the polar transformation. The third inequality follows due to the assumptions that $n^{-(s-2)/(2\tilde{k})}\log n = o(c_n)$ and $c_n = O(n^{-d})$. The last equality is an implication of Lemma 2 presented in the main paper. Therefore the proof of Lemma 3 is now complete.
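The role of the smoothing sequence $c_n$ in Lemma 3 can also be seen in a small simulation. The sketch below is our own illustration only and is not part of the formal argument; the Bernoulli success probabilities, the sample size and the rate $c_n = n^{-2/5}$ are arbitrary choices made for the example.

```python
import numpy as np

# Illustration of the smoothing device of Lemma 3 (toy setup, assumed by us):
# Y_i are centred Bernoulli(p_i) variables, so sqrt(n)*Ybar_n is supported on a
# lattice, while sqrt(n)*Ybar_n + c_n*Z has a density for any c_n > 0.
rng = np.random.default_rng(0)
n, n_rep = 200, 50_000
p = rng.uniform(0.3, 0.7, size=n)        # arbitrary success probabilities
c_n = n ** (-0.4)                        # one sequence with c_n = O(n^{-d}), d > 0

y = rng.binomial(1, p, size=(n_rep, n))  # n_rep independent samples of size n
root_n_ybar = (y - p).sum(axis=1) / np.sqrt(n)   # sqrt(n) * mean of centred Y_i
smoothed = root_n_ybar + c_n * rng.standard_normal(n_rep)

# The unsmoothed sum lives on the lattice {(k - sum(p))/sqrt(n) : k integer};
# the smoothed version does not, which is what allows a density-level
# Edgeworth comparison as in (8.2).
print("distinct values (unsmoothed):", np.unique(np.round(root_n_ybar, 10)).size)
print("distinct values (smoothed):  ", np.unique(np.round(smoothed, 10)).size)
print("sd inflation due to smoothing:", smoothed.std() / root_n_ybar.std())
```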
Lemma 10. Assume the setup of Theorem 3 and let $X_i = y_ix_i$, $i\in\{1,\dots,n\}$. Define $\sigma_n^2 = n^{-1}\sum_{i=1}^n\mathrm{Var}(X_i)$ and $\bar{\chi}_{\nu,n}$ as the $\nu$th average cumulant of $\{(X_1 - E(X_1)),\dots,(X_n - E(X_n))\}$. $P_1\big(-\Phi_{\sigma_n^2}:\{\bar{\chi}_{\nu,n}\}\big)$ is the finite signed measure on $\mathbb{R}$ whose density is $\tilde{P}_1\big(-D:\{\bar{\chi}_{\nu,n}\}\big)\phi_{\sigma_n^2}(x)$. Let $S_0(x) = 1$ and $S_1(x) = x - 1/2$. Suppose $\sigma_n^2$ is bounded away from both $0$ & $\infty$ and assumptions (C.1)–(C.3) of Theorem 3 hold. Then we have
$$\sup_{x\in\mathbb{R}}\Big|P\Big(n^{-1/2}\sum_{i=1}^n\big(X_i - E(X_i)\big)\le x\Big) - \sum_{r=0}^{1}n^{-r/2}(-1)^rS_r\big(n\mu_n + n^{1/2}x\big)\tfrac{d^r}{dx^r}\Phi_{\sigma_n^2}(x) - n^{-1/2}P_1\big(-\Phi_{\sigma_n^2}:\{\bar{\chi}_{\nu,n}\}\big)(x)\Big| = o\big(n^{-1/2}\big),\qquad(8.21)$$
where $P_1\big(-\Phi_{\sigma_n^2}:\{\bar{\chi}_{\nu,n}\}\big)(x)$ is the $P_1\big(-\Phi_{\sigma_n^2}:\{\bar{\chi}_{\nu,n}\}\big)$-measure of the set $(-\infty,x]$.

Proof of Lemma 10. For any integer $\alpha$, define $p_n(x_{\alpha,n}) = P\big(\sum_{i=1}^nX_i = \alpha\big)$ with $x_{\alpha,n} = n^{-1/2}(\alpha - n\mu_n)$. Also define $\tilde{X}_n = n^{-1/2}\sum_{i=1}^n\big(X_i - E(X_i)\big)$ and $q_{n,1}(x) = n^{-1/2}\sum_{r=0}^{1}n^{-r/2}\tilde{P}_r\big(-D:\{\bar{\chi}_{\nu,n}\}\big)\phi_{\sigma_n^2}(x)$. Note that
$$\sup_{x\in\mathbb{R}}\Big|P\big(\tilde{X}_n\le x\big) - \sum_{r=0}^{1}n^{-r/2}(-1)^rS_r\big(n\mu_n + n^{1/2}x\big)\tfrac{d^r}{dx^r}\Phi_{\sigma_n^2}(x) - n^{-1/2}P_1\big(-\Phi_{\sigma_n^2}:\{\bar{\chi}_{\nu,n}\}\big)(x)\Big|\le\sup_{x\in\mathbb{R}}\big|P\big(\tilde{X}_n\le x\big) - Q_{n,1}(x)\big| + \sup_{x\in\mathbb{R}}\Big|Q_{n,1}(x) - \sum_{r=0}^{1}n^{-r/2}(-1)^rS_r\big(n\mu_n + n^{1/2}x\big)\tfrac{d^r}{dx^r}\Phi_{\sigma_n^2}(x) - n^{-1/2}P_1\big(-\Phi_{\sigma_n^2}:\{\bar{\chi}_{\nu,n}\}\big)(x)\Big| = J_1 + J_2\ \text{(say)},\qquad(8.22)$$
where $Q_{n,1}(x) = \sum_{\{\alpha\,:\,x_{\alpha,n}\le x\}}q_{n,1}(x_{\alpha,n})$. Now the fact that $J_2 = o\big(n^{-1/2}\big)$ follows from Theorem A.4.3 of Bhattacharya and Rao (86) and dropping terms of order $n^{-1}$. Now we are going to show $J_1 = O\big(n^{-1}\big)$. Note that
$$J_1\le\sum_{\alpha\in\Theta}\big|p_n(x_{\alpha,n}) - q_{n,1}(x_{\alpha,n})\big| = J_3\ \text{(say)},$$
where $\Theta$ has cardinality $\le C_1n$, since $P\big(\big|n^{-1}\sum_{i=1}^nX_i\big|\le C_2\big) = 1$ for some constant $C_2 > 0$, due to the assumption that $\max\{|x_j| : j\in\{1,\dots,n\}\} = O(1)$. Hence
$$n^{-1}J_3\le C_1\sup_{\alpha\in\Theta}\big|p_n(x_{\alpha,n}) - q_{n,1}(x_{\alpha,n})\big| = C_1\sup_{\alpha\in\Theta}J_4(\alpha)\ \text{(say)}.$$
Hence it is enough to show $\sup_{\alpha\in\Theta}J_4(\alpha) = O\big(n^{-2}\big)$. Now define $g_j(t) = E\big(e^{itX_j}\big)$ and $f_n(t) = E\big(e^{it\tilde{X}_n}\big)$. Then we have
$$f_n\big(\sqrt{n}t\big) = \sum_{\alpha\in\Theta}p_n\big(x_{\alpha,n}\big)e^{i\sqrt{n}tx_{\alpha,n}}.$$
Hence by the Fourier inversion formula for lattice random variables (cf. page 230 of Bhattacharya and Rao (86)), we have
$$p_n\big(x_{\alpha,n}\big) = (2\pi)^{-1}\int_{F^*}e^{-i\sqrt{n}tx_{\alpha,n}}f_n\big(\sqrt{n}t\big)\,dt = (2\pi)^{-1}n^{-1/2}\int_{\sqrt{n}F^*}e^{-itx_{\alpha,n}}f_n(t)\,dt,\qquad(8.23)$$
where $F^* = (-\pi,\pi)$, the fundamental domain corresponding to the lattice distribution of $\sum_{i=1}^nX_i$. Again note that
$$q_{n,1}\big(x_{\alpha,n}\big) = (2\pi)^{-1}n^{-1/2}\int_{\mathbb{R}}e^{-itx_{\alpha,n}}\sum_{r=0}^{1}n^{-r/2}\tilde{P}_r\big(it:\{\bar{\chi}_{\nu,n}\}\big)e^{-\sigma_n^2t^2/2}\,dt.\qquad(8.24)$$
Now defining the set $E = \big\{t\in\mathbb{R} : |t|\le C(s)\sqrt{n}\min\big\{C^{-1}\sigma_n^{2},\ C^{-1/2}\sigma_n^{3/2}\big\}\big\}$, from (8.23) & (8.24) we have
$$\sup_{\alpha\in\Theta}J_4(\alpha)\le(2\pi)^{-1}n^{-1/2}\bigg[\int_{E}\Big|f_n(t) - \sum_{r=0}^{1}n^{-r/2}\tilde{P}_r\big(it:\{\bar{\chi}_{\nu,n}\}\big)e^{-\sigma_n^2t^2/2}\Big|\,dt + \int_{\sqrt{n}F^*\cap E^c}\big|f_n(t)\big|\,dt + \int_{\mathbb{R}\cap(\sqrt{n}F^*)^c}\Big|\sum_{r=0}^{1}n^{-r/2}\tilde{P}_r\big(it:\{\bar{\chi}_{\nu,n}\}\big)e^{-\sigma_n^2t^2/2}\Big|\,dt\bigg] = (2\pi)^{-1}n^{-1/2}\big(J_5 + J_6 + J_7\big)\ \text{(say)}.\qquad(8.25)$$
Note that $J_5 = O\big(n^{-3/2}\big)$ by applying Lemma 9 of the main paper with $s = 5$. $J_7 = O\big(n^{-3/2}\big)$ due to the presence of the exponential term in the integrand and the form of the set $E$. Moreover, noting the form of the set $F^*$, we can say that there exist constants $C_3 > 0$ and $0 < C_4 < C_5 <\pi$ such that
$$J_6\le C_3\sup_{t\in\sqrt{n}F^*\cap E^c}\prod_{i=1}^n\big|g_i\big(n^{-1/2}t\big)\big|\le C_3\sup_{C_4\le|t|\le C_5}\big|E\big(e^{ity_{i_1}}\big)\big|^m\le C_3\,\delta^m,\qquad(8.26)$$
for some $0 <\delta < 1$. Recall that $x_{i_j} = 1$ for all $j\in\{1,\dots,m\}$. The last inequality is due to the fact that there is no period of $E\big(e^{ity_{i_1}}\big)$ in the interval $[C_4,C_5]\cup[-C_5,-C_4]$. Now $J_6 = O\big(n^{-3/2}\big)$ follows from (8.26) since $m\ge(\log n)^2$. Therefore the proof of Lemma 10 is complete.
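Before stating Lemma 11, a small numerical check may make the content of Lemma 10 concrete. The snippet below is purely our own illustration (an arbitrary Binomial design with success probability 0.3; it is not used anywhere in the proofs): for a lattice statistic the Kolmogorov distance to its normal limit is of exact order $n^{-1/2}$, which is why the $S_1$ term in (8.21) cannot be dropped.

```python
import numpy as np
from scipy.stats import binom, norm

# Kolmogorov distance between a standardized Binomial(n, p) sum and its normal
# limit: the rescaled distance sqrt(n) * sup|...| stabilizes instead of vanishing,
# reflecting the lattice correction term S_1 appearing in (8.21).
p = 0.3
for n in (100, 400, 1600, 6400):
    k = np.arange(n + 1)
    z = (k - n * p) / np.sqrt(n * p * (1 - p))
    normal_cdf = norm.cdf(z)
    exact_cdf = binom.cdf(k, n, p)           # CDF at the lattice points
    exact_cdf_left = binom.cdf(k - 1, n, p)  # CDF just below the lattice points
    dist = max(np.max(np.abs(exact_cdf - normal_cdf)),
               np.max(np.abs(exact_cdf_left - normal_cdf)))
    print(f"n = {n:5d}   sup-distance = {dist:.5f}   sqrt(n)*distance = {np.sqrt(n) * dist:.3f}")
```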
Lemma 11. Let $\breve{W}_1,\dots,\breve{W}_n$ be iid mean zero non-degenerate random vectors in $\mathbb{R}^{l}$ for some natural number $l$, with finite fourth absolute moment and $\limsup_{\|t\|\to\infty}\big|Ee^{it'\breve{W}_1}\big| < 1$ (i.e. Cramér's condition holds). Suppose $\breve{W}_i = (\breve{W}'_{i1},\dots,\breve{W}'_{im})'$ where $\breve{W}_{ij}$ is a random vector in $\mathbb{R}^{l_j}$ and $\sum_{j=1}^ml_j = l$, $m$ being a fixed natural number. Consider the sequence of random vectors $\tilde{W}_1,\dots,\tilde{W}_n$ where $\tilde{W}_i = (c_{i1}\breve{W}'_{i1},\dots,c_{im}\breve{W}'_{im})'$. $\{c_{ij} : i\in\{1,\dots,n\},\ j\in\{1,\dots,m\}\}$ is a collection of real numbers such that for any $j\in\{1,\dots,m\}$, $\big\{n^{-1}\sum_{i=1}^n|c_{ij}|^4\big\} = O(1)$ and $\liminf_{n\to\infty}n^{-1}\sum_{i=1}^nc_{ij}^2 > 0$. Also assume that $\tilde{V}_n = n^{-1}\sum_{i=1}^n\mathrm{Var}(\tilde{W}_i)$ converges to some positive definite matrix and $\bar{\chi}_{\nu,n}$ denotes the average $\nu$th cumulant of $\tilde{W}_1,\dots,\tilde{W}_n$. Then we have
$$\sup_{B\in\mathcal{A}_l}\bigg|P\Big(n^{-1/2}\sum_{i=1}^n\tilde{W}_i\in B\Big) - \int_B\Big[\sum_{r=0}^{1}n^{-r/2}\tilde{P}_r\big(-D:\{\bar{\chi}_{\nu,n}\}\big)\Big]\phi_{\tilde{V}_n}(t)\,dt\bigg| = o\big(n^{-1/2}\big),\qquad(8.27)$$
where the collection of sets $\mathcal{A}_l$ is as defined in (8.1).

Proof of Lemma 11. First note that $\tilde{W}_1,\dots,\tilde{W}_n$ is a sequence of independent random vectors. Hence (8.27) follows by Theorem 20.6 of Bhattacharya and Rao (1986), provided there exists $\delta_0\in(0,\infty)$ such that for all $0 <\upsilon\le\delta_0$,
$$n^{-1}\sum_{i=1}^nE\big\|\tilde{W}_i\big\|^3\mathbb{1}\big(\|\tilde{W}_i\| >\upsilon\sqrt{n}\big) = o(1)\qquad(8.28)$$
and
$$\max_{|\alpha|\le l+2}\int_{\|t\|\ge\upsilon\sqrt{n}}\Big|D^{\alpha}E\exp\big(it'R^{\dagger}_n\big)\Big|\,dt = o\big(n^{-1/2}\big),\qquad(8.29)$$
where $R^{\dagger}_n = n^{-1/2}\sum_{i=1}^n\big(Z_i - EZ_i\big)$ with $Z_i = \tilde{W}_i\mathbb{1}\big(\|\tilde{W}_i\|\le\upsilon\sqrt{n}\big)$. First consider (8.28). Note that $\max\big\{|c_{ij}| : i\in\{1,\dots,n\},\ j\in\{1,\dots,m\}\big\} = O\big(n^{1/4}\big)$. Therefore, we have for any $\upsilon > 0$,
$$n^{-1}\sum_{i=1}^nE\big\|\tilde{W}_i\big\|^3\mathbb{1}\big(\|\tilde{W}_i\| >\upsilon\sqrt{n}\big)\le n^{-1}\sum_{i=1}^nE\Big(\sum_{j=1}^mc_{ij}^2\big\|\breve{W}_{ij}\big\|^2\Big)^{3/2}\mathbb{1}\Big(\sum_{j=1}^mc_{ij}^2\big\|\breve{W}_{ij}\big\|^2 >\upsilon^2n\Big)\le n^{-1}\sum_{i=1}^n\Big(\sum_{j=1}^mc_{ij}^2\Big)^{3/2}E\Big[\big\|\breve{W}_1\big\|^3\mathbb{1}\big(\|\breve{W}_1\| > C\upsilon n^{1/4}\big)\Big] = o(1).$$
Now consider (8.29). Note that for any $|\alpha|\le l+2$, $\big|D^{\alpha}E\exp(it'R^{\dagger}_n)\big|$ is bounded above by a sum of $n^{|\alpha|}$ terms, each of which is bounded above by
$$C(\alpha)\cdot n^{-|\alpha|/2}\max\big\{E\|Z_i - EZ_i\|^{|\alpha|} : i\in I_n\big\}\cdot\prod_{i\in I_n^c}\big|E\exp\big(it'Z_i/\sqrt{n}\big)\big|,\qquad(8.30)$$
where $I_n\subset\{1,\dots,n\}$ is of size $|\alpha|$ and $I_n^c = \{1,\dots,n\}\setminus I_n$. Now for any $\omega > 0$ and $t_j\in\mathbb{R}^{l_j}$, define the set $B_n^{(j)}(t_j,\omega) = \big\{i : 1\le i\le n\ \text{and}\ |c_{ij}|\,\|t_j\| >\omega\big\}$. Hence for any $t\in\mathbb{R}^l$, writing $t = (t'_1,\dots,t'_m)'$ with $t_j$ of length $l_j$, we have
$$\sup\Big\{\prod_{i\in I_n^c}\big|E\exp\big(it'Z_i/\sqrt{n}\big)\big| : \|t\|\ge\upsilon\sqrt{n}\Big\} = \sup\Big\{\prod_{i\in I_n^c}\big|E\exp\big(it'Z_i\big)\big| : \|t\|\ge\upsilon\Big\}\le\max\Bigg\{\sup\bigg\{\prod_{i\in I_n^c\cap B_n^{(j)}\big(\frac{t_j}{\|t_j\|},\,\upsilon/\sqrt{m}\big)}\Big[\big|E\exp\big(ic_{ij}t'_j\breve{W}_{1j}\big)\big| + P\big(\|\breve{W}_1\| > C\upsilon n^{1/4}\big)\Big] : \|t_j\|\ge\upsilon/\sqrt{m}\bigg\} : j\in\{1,\dots,m\}\Bigg\}.$$
Now since $\big|I_n^c\big|\ge\Big|I_n^c\cap B_n^{(j)}\Big(\tfrac{t_j}{\|t_j\|},\upsilon/\sqrt{m}\Big)\Big|\ge\Big|B_n^{(j)}\Big(\tfrac{t_j}{\|t_j\|},\upsilon/\sqrt{m}\Big)\Big| - |\alpha|$, due to Cramér's condition we have
$$\sup\bigg\{\prod_{i\in I_n^c\cap B_n^{(j)}\big(\frac{t_j}{\|t_j\|},\,\upsilon/\sqrt{m}\big)}\Big[\big|E\exp\big(ic_{ij}t'_j\breve{W}_{1j}\big)\big| + P\big(\|\breve{W}_1\| > C\upsilon n^{1/4}\big)\Big] : \|t_j\|\ge\upsilon/\sqrt{m}\bigg\}\le\theta^{\big|B_n^{(j)}\big(\frac{t_j}{\|t_j\|},\,\upsilon/\sqrt{m}\big)\big| - |\alpha|}\qquad(8.31)$$
for some $0 <\theta < 1$. Next note that $\liminf_{n\to\infty}n^{-1}\sum_{i=1}^nc_{ij}^2 > 0$ for all $j\in\{1,\dots,m\}$. Therefore, for any $j\in\{1,\dots,m\}$ and $u\in\mathbb{R}^{l_j}$ with $\|u\| = 1$, there exists $0 <\delta_1 < 1$ such that for large enough $n$ we have
$$n\delta_1\le\sum_{i=1}^n|c_{ij}|^2\le\max\big\{c_{ij}^2 : 1\le i\le n\big\}\cdot\big|B_n^{(j)}(u,\omega)\big| + \Big(n - \big|B_n^{(j)}(u,\omega)\big|\Big)\cdot\omega^2\le C\cdot n^{1/2}\cdot\big|B_n^{(j)}(u,\omega)\big| + n\omega^2,$$
which implies $\big|B_n^{(j)}(u,\omega)\big|\ge C\cdot n^{1/2}$ whenever $\omega <\sqrt{\delta_1/2}$. Therefore, taking $\delta_0 = \sqrt{\delta_1/3}$, (8.29) follows from (8.30) and (8.31).
In this section we present expanded forms of the pivots and the forms of the confidence intervals obtained based on our proposed Bootstrap method. Code details for the reproduction of the results of Sections 6 and 7 of the main manuscript can be supplied if required. Recall that our model is
$$y_i = \begin{cases}1, & \text{w.p. } p(\beta|x_i),\\ 0, & \text{w.p. } 1 - p(\beta|x_i),\end{cases}$$
where $p(\beta|x_i) = \dfrac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}$, $i\in\{1,\dots,n\}$. Here $y_1,\dots,y_n$ are independent binary responses and $x_1,\dots,x_n$ are known non-random design vectors. $\beta = (\beta_1,\dots,\beta_p)'$ is the $p$-dimensional vector of regression parameters. For the rest of this section, $x_{i,A}$ denotes the sub-vector of $x_i$ comprising only the components belonging to the set $A$, where $A\subseteq\{1,\dots,p\}$. For any vector $\gamma$ of length $p$, $\gamma_A$ is the sub-vector of $\gamma$ comprising only the components belonging to the set $A$.

The logistic regression estimator $\hat{\beta}_n$ of $\beta$ is defined as
$$\hat{\beta}_n = \mathop{\mathrm{Argmax}}_{\beta}\,L\big(\beta\,|\,y_1,\dots,y_n,x_1,\dots,x_n\big),$$
where $L\big(\beta\,|\,y_1,\dots,y_n,x_1,\dots,x_n\big) = \prod_{i=1}^np(x_i)^{y_i}\big(1 - p(x_i)\big)^{1-y_i}$ is the likelihood. The Bootstrap version [hereafter referred to as PEBBLE] $\hat{\beta}^*_n$ of $\hat{\beta}_n$ is defined as
$$\hat{\beta}^*_n = \arg\max_{t}\Bigg[\sum_{i=1}^n\Big\{\big(y_i - \hat{p}(x_i)\big)x_i't\Big\}\big(G^*_i - \mu_{G^*}\big) + \mu_{G^*}\sum_{i=1}^n\Big\{\hat{p}(x_i)\big(x_i't\big) - \log\big(1 + e^{x_i't}\big)\Big\}\Bigg],$$
where $G^*_1,\dots,G^*_n$ are iid copies of a non-negative & non-degenerate random variable $G^*$ with $\mathrm{Var}(G^*) = \mu_{G^*}^2$ and $E\big(G^* - \mu_{G^*}\big)^3 = \mu_{G^*}^3$. One example of the distribution of $G^*$ is Beta$(1/2,3/2)$.

The original studentized pivot for the parameter vector $\beta$ is
$$\check{H}_n = \hat{M}_n^{-1/2}\hat{L}_n\big[\sqrt{n}\big(\hat{\beta}_n - \beta\big)\big] + \hat{M}_n^{-1/2}b_nZ,$$
where $\hat{L}_n = n^{-1}\sum_{i=1}^nx_ix_i'e^{x_i'\hat{\beta}_n}\big(1 + e^{x_i'\hat{\beta}_n}\big)^{-2}$, $\hat{M}_n = n^{-1}\sum_{i=1}^n\big(y_i - \hat{p}(x_i)\big)^2x_ix_i'$ and $\hat{p}(x_i) = \dfrac{\exp(x_i'\hat{\beta}_n)}{1 + \exp(x_i'\hat{\beta}_n)}$. $Z$ is distributed as $N(0,D)$ where $D$ is a $p\times p$ diagonal matrix, independent of $y_1,\dots,y_n$. $\{b_n\}_{n\ge 1}$ is a sequence of real numbers such that $b_n = O(n^{-d})$ and $n^{-1/(2p_1)}\log n = o(b_n)$, where $d > 0$ and $p_1 = \max\{p+1,4\}$. The corresponding PEBBLE version of the studentized pivot is defined as
$$\check{H}^*_n = \hat{M}_n^{*-1/2}L^*_n\big[\sqrt{n}\big(\hat{\beta}^*_n - \hat{\beta}_n\big)\big] + \hat{M}_n^{*-1/2}b_nZ^*,$$
where $L^*_n = n^{-1}\sum_{i=1}^nx_ix_i'e^{x_i'\hat{\beta}^*_n}\big(1 + e^{x_i'\hat{\beta}^*_n}\big)^{-2}$ and $\hat{M}^*_n = n^{-1}\sum_{i=1}^n\big(y_i - \hat{p}(x_i)\big)^2x_ix_i'\mu_{G^*}^{-2}\big(G^*_i - \mu_{G^*}\big)^2$. $Z^*$ has the same distribution as $Z$, independent of $y_1,\dots,y_n$ and $G^*_1,\dots,G^*_n$. For some $\alpha\in(0,1)$, let $\big(\|\check{H}^*_n\|\big)_{\alpha}$ be the $\alpha$th quantile of the Bootstrap distribution of $\|\check{H}^*_n\|$. Then the $100(1-\alpha)\%$ confidence region of $\beta$ is given by
$$\Big\{\beta : \|\check{H}_n\|\le\big(\|\check{H}^*_n\|\big)_{(1-\alpha)}\Big\}.$$
The pivotal quantity for the $j$th component of $\beta$ is formulated as
$$\check{H}_{j,n} = \hat{\Sigma}_{j,n}^{-1/2}\Big(\sqrt{n}\big(\hat{\beta}_{j,n} - \beta_j\big) + b_n\big(\hat{L}_n^{-1}\big)'_{j\cdot}Z\Big),$$
where $\hat{\beta}_{j,n}$ & $\beta_j$ are respectively the $j$th components of $\hat{\beta}_n$ and $\beta$, $j\in\{1,\dots,p\}$. $\hat{\Sigma}_{j,n}$ is the $(j,j)$-th element of $\hat{\Sigma}_n$ where $\hat{\Sigma}_n = \hat{L}_n^{-1}\hat{M}_n\hat{L}_n^{-1}$, and $\big(\hat{L}_n^{-1}\big)'_{j\cdot}$ is the $j$-th row of $\hat{L}_n^{-1}$. Similarly, the Bootstrap version corresponding to $\check{H}_{j,n}$ is defined as
$$\check{H}^*_{j,n} = \Sigma_{j,n}^{*-1/2}\Big(\sqrt{n}\big(\hat{\beta}^*_{j,n} - \hat{\beta}_{j,n}\big) + b_n\big(L_n^{*-1}\big)'_{j\cdot}Z^*\Big),$$
where $\hat{\beta}^*_{j,n}$ is the $j$th component of the vector $\hat{\beta}^*_n$, $j\in\{1,\dots,p\}$. $\Sigma^*_{j,n}$ is the $(j,j)$-th element of $\Sigma^*_n$ where $\Sigma^*_n = L_n^{*-1}\hat{M}^*_nL_n^{*-1}$, and $\big(L_n^{*-1}\big)'_{j\cdot}$ is the $j$-th row of $L_n^{*-1}$.
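As a quick worked check (added here for concreteness; it is not part of the derivations above), the Beta$(1/2,3/2)$ example mentioned for $G^*$ indeed satisfies the two moment conditions $\mathrm{Var}(G^*) = \mu_{G^*}^2$ and $E(G^*-\mu_{G^*})^3 = \mu_{G^*}^3$, so that $V = (G^*-\mu_{G^*})/\mu_{G^*}$ used in the proof of Theorem 2 has $EV = 0$ and $EV^2 = EV^3 = 1$:

```latex
% Moment check for G* ~ Beta(alpha, beta) with alpha = 1/2, beta = 3/2, alpha + beta = 2.
\[
\mu_{G^*} = \frac{\alpha}{\alpha+\beta} = \frac{1}{4}, \qquad
\mathrm{Var}(G^*) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}
   = \frac{3/4}{4\cdot 3} = \frac{1}{16} = \mu_{G^*}^2,
\]
\[
E\big(G^*-\mu_{G^*}\big)^3
   = \frac{2\alpha\beta(\beta-\alpha)}{(\alpha+\beta)^3(\alpha+\beta+1)(\alpha+\beta+2)}
   = \frac{2\cdot\tfrac{3}{4}\cdot 1}{8\cdot 3\cdot 4} = \frac{1}{64} = \mu_{G^*}^3 .
\]
% Hence V = (G* - mu_{G*})/mu_{G*} satisfies E V = 0, E V^2 = 1 and E V^3 = 1.
```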
Now define $\big(\check{H}^*_{j,n}\big)_{\alpha}$ to be the $\alpha$th quantile of the Bootstrap distribution of $\check{H}^*_{j,n}$ for some $\alpha\in(0,1)$. The $100(1-\alpha)\%$ two-sided confidence interval of $\beta_j$ is given by
$$\Bigg[\bigg\{\hat{\beta}_{j,n} - \frac{\hat{\Sigma}_{j,n}^{1/2}\,u^*_j}{\sqrt{n}}\bigg\},\ \bigg\{\hat{\beta}_{j,n} - \frac{\hat{\Sigma}_{j,n}^{1/2}\,l^*_j}{\sqrt{n}}\bigg\}\Bigg],$$
where $l^*_j = \Big[\big(\check{H}^*_{j,n}\big)_{\alpha/2} - b_n\hat{\Sigma}_{j,n}^{-1/2}\big(\hat{L}_n^{-1}\big)'_{j\cdot}Z\Big]$ and $u^*_j = \Big[\big(\check{H}^*_{j,n}\big)_{(1-\alpha/2)} - b_n\hat{\Sigma}_{j,n}^{-1/2}\big(\hat{L}_n^{-1}\big)'_{j\cdot}Z\Big]$. Again, the $100(1-\alpha)\%$ lower and upper confidence intervals of $\beta_j$ are respectively given by
$$\Bigg(-\infty,\ \bigg\{\hat{\beta}_{j,n} - \frac{\hat{\Sigma}_{j,n}^{1/2}\,l^*_j}{\sqrt{n}}\bigg\}\Bigg]\quad\text{and}\quad\Bigg[\bigg\{\hat{\beta}_{j,n} - \frac{\hat{\Sigma}_{j,n}^{1/2}\,u^*_j}{\sqrt{n}}\bigg\},\ \infty\Bigg),$$
where now $l^*_j = \Big[\big(\check{H}^*_{j,n}\big)_{\alpha} - b_n\hat{\Sigma}_{j,n}^{-1/2}\big(\hat{L}_n^{-1}\big)'_{j\cdot}Z\Big]$ and $u^*_j = \Big[\big(\check{H}^*_{j,n}\big)_{(1-\alpha)} - b_n\hat{\Sigma}_{j,n}^{-1/2}\big(\hat{L}_n^{-1}\big)'_{j\cdot}Z\Big]$.
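The construction above translates directly into a Monte Carlo recipe. The following sketch is our own minimal illustration under several assumptions of ours: the use of scipy's BFGS optimiser, the choices $G^*\sim$ Beta$(1/2,3/2)$, $D = I_p$, $b_n = n^{-1/10}$ and $B = 500$ replicates, as well as the function names, are not prescribed by the paper beyond the stated conditions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def fit_logistic_mle(X, y, start=None):
    """MLE of beta: minimise the negative logistic log-likelihood."""
    def negloglik(b):
        eta = X @ b
        return np.sum(np.logaddexp(0.0, eta) - y * eta)
    b0 = np.zeros(X.shape[1]) if start is None else start
    return minimize(negloglik, b0, method="BFGS").x


def pebble_ci_j(X, y, j, alpha=0.05, B=500, seed=1):
    """Two-sided 100(1-alpha)% PEBBLE interval for beta_j (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta_hat = fit_logistic_mle(X, y)
    p_hat = expit(X @ beta_hat)

    L_hat = (X.T * (p_hat * (1.0 - p_hat))) @ X / n    # n^{-1} sum p(1-p) x x'
    M_hat = (X.T * (y - p_hat) ** 2) @ X / n           # n^{-1} sum (y - p_hat)^2 x x'
    L_inv = np.linalg.inv(L_hat)
    Sigma_jj = (L_inv @ M_hat @ L_inv)[j, j]

    b_n = n ** (-0.1)            # one admissible smoothing rate for small p (our choice)
    z = rng.standard_normal(p)   # smoothing draw Z, taking D = I_p
    mu_g = 0.25                  # mean of G* ~ Beta(1/2, 3/2)

    piv = np.empty(B)
    for r in range(B):
        g = rng.beta(0.5, 1.5, size=n)
        w = g - mu_g
        def neg_obj(t, w=w):     # negated perturbed PEBBLE objective
            eta = X @ t
            return -(np.sum((y - p_hat) * eta * w)
                     + mu_g * np.sum(p_hat * eta - np.logaddexp(0.0, eta)))
        beta_star = minimize(neg_obj, beta_hat, method="BFGS").x
        p_star = expit(X @ beta_star)
        L_star = (X.T * (p_star * (1.0 - p_star))) @ X / n
        M_star = (X.T * ((y - p_hat) ** 2 * (w / mu_g) ** 2)) @ X / n
        L_star_inv = np.linalg.inv(L_star)
        Sigma_star_jj = (L_star_inv @ M_star @ L_star_inv)[j, j]
        z_star = rng.standard_normal(p)
        piv[r] = (np.sqrt(n) * (beta_star[j] - beta_hat[j])
                  + b_n * (L_star_inv[j] @ z_star)) / np.sqrt(Sigma_star_jj)

    q_lo, q_hi = np.quantile(piv, [alpha / 2, 1 - alpha / 2])
    shift = b_n * (L_inv[j] @ z) / np.sqrt(Sigma_jj)
    l_star, u_star = q_lo - shift, q_hi - shift
    lower = beta_hat[j] - np.sqrt(Sigma_jj) * u_star / np.sqrt(n)
    upper = beta_hat[j] - np.sqrt(Sigma_jj) * l_star / np.sqrt(n)
    return lower, upper


# Toy usage with simulated data (purely illustrative):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, beta_true = 300, np.array([0.5, -1.0, 0.75])
    X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
    y = rng.binomial(1, expit(X @ beta_true))
    print(pebble_ci_j(X, y, j=1))
```

One-sided intervals follow the same pattern with the $\alpha$ and $1-\alpha$ quantiles of the bootstrap pivot replacing the $\alpha/2$ and $1-\alpha/2$ quantiles.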
References

[1] AMEMIYA, T. (1976). The maximum likelihood, the minimum chi-square, and the non-linear weighted least squares estimator in the general qualitative response model. Journal of the American Statistical Association.
[2] Cochrane Database Syst Rev. CD009430.
[3] BALCI, A., DRENTHEN, W., MULDER, B. J. et al. (2011). Pregnancy in women with corrected tetralogy of Fallot: occurrence and predictors of adverse events. Am Heart J.
[4] BARBE, P. and BERTAIL, P. The Weighted Bootstrap. Lecture Notes in Statistics, Springer.
[5] BERKSON, J. (1944). Application of the logistic function to bio-assay. Journal of the American Statistical Association.
[6] Annals of Mathematical Statistics.
[7] BHATIA, R. Matrix Analysis. Springer.
[8] BHATTACHARYA, R. N. and GHOSH, J. K. (1978). On the validity of the formal Edgeworth expansion. Ann. Statist.
[9] BHATTACHARYA, R. N. and RAO, R. R. (1986). Normal Approximation and Asymptotic Expansions. John Wiley & Sons.
[10] CLAESKENS, G., AERTS, M. and MOLENBERGHS, G. (2003). A quadratic Bootstrap method and improved estimation in logistic regression. Statistics & Probability Letters.
[11] COX, D. R. (1958). The regression analysis of binary sequences. Journal of the Royal Statistical Society: Series B, 215–232.
[12] DAS, D., GREGORY, K. and LAHIRI, S. N. (2019). Perturbation Bootstrap in adaptive Lasso. Annals of Statistics, 47.
[13] DAS, D. and LAHIRI, S. N. (2019). Second order correctness of perturbation Bootstrap M-estimator of multiple linear regression parameter. Bernoulli.
[14] Biometrika.
[15] EFRON, B. (1979). Bootstrap methods: another look at the jackknife. Annals of Statistics.
[16] FAHRMEIR, L. and KAUFMANN, H. (1985). Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Annals of Statistics.
[17] FREEDMAN, D. A. (1981). Bootstrapping regression models. Annals of Statistics, 1218–1228.
[18] FUK, D. H. and NAGAEV, S. V. (1971). Probabilistic inequalities for sums of independent random variables. Teor. Verojatnost. i Primenen.
[19] GHOSH, J. K. (1994). Higher Order Asymptotics. NSF-CBMS Regional Conference Series in Probability and Statistics, Vol. 4.
[20] GOURIEROUX, C. and MONFORT, A. (1981). Asymptotic properties of the maximum likelihood estimator in dichotomous logit models. Journal of Econometrics.
[21] Annals of Statistics.
[22] HALL, P. (1992). The Bootstrap and Edgeworth Expansion. Springer Series in Statistics.
[23] HOSMER, D. W., LEMESHOW, S. and STURDIVANT, R. X. (2013). Applied Logistic Regression. Wiley Series in Probability and Statistics.
[24] LAHIRI, S. N. (1989). Bootstrap approximations to the distributions of M-estimators. Thesis.
[25] LAHIRI, S. N. (1992). Bootstrapping M-estimators of a multiple linear regression parameter. Ann. Statist.
[26] LAHIRI, S. N. (1993). Bootstrapping the Studentized sample mean of lattice variables. Journal of Multivariate Analysis.
[27] Sankhya A.
[28] LEE, K. W. (1990). Bootstrapping logistic regression models with random regressors. Communications in Statistics - Theory and Methods.
[29] LIU, R. Y. (1988). Bootstrap procedures under some non-i.i.d. models. Ann. Statist.
[30] MAMMEN, E. (1993). Bootstrap and wild Bootstrap for high dimensional linear models. Ann. Statist.
[31] McFADDEN, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka, ed., Frontiers in Econometrics. New York: Academic Press.
[32] MOULTON, L. H. and ZEGER, S. L. (1989). Analyzing repeated measures on generalized linear models via the Bootstrap. Biometrics.
[33] MOULTON, L. H. and ZEGER, S. L. (1991). Bootstrapping generalized linear models. Computational Statistics and Data Analysis.
[34] Neth Heart J.
[35] Plos One, e0210655.
[36] YAP, S. C., DRENTHEN, W., PIEPER, P. G. et al. (2008). On behalf of the ZAHARA Investigators. Risk of complications during pregnancy in women with congenital aortic stenosis. Int J Cardiol, 126.