Principles for Covariate Adjustment in Analyzing Randomized Clinical Trials
Ting Ye, Jun Shao, and Qingyuan Zhao

Abstract
In randomized clinical trials, adjustments for baseline covariates at both design and analysis stages are highly encouraged by regulatory agencies. A recent trend is to use a model-assisted approach for covariate adjustment to gain credibility and efficiency while producing asymptotically valid inference even when the model is incorrect. In this article we present three principles for model-assisted inference in simple or covariate-adaptive randomized trials: (1) the guaranteed efficiency gain principle: a model-assisted method should often gain but never hurt efficiency; (2) the validity and universality principle: a valid procedure should be universally applicable to all commonly used randomization schemes; (3) the robust standard error principle: variance estimation should be heteroscedasticity-robust. To fulfill these principles, we recommend a working model that includes all covariates utilized in randomization and all treatment-by-covariate interaction terms. Our conclusions are based on asymptotic theory with a generality that has not appeared in the literature, as most existing results are about linear contrasts of outcomes rather than the joint distribution, and most existing inference results under covariate-adaptive randomization are special cases of our theory. Our theory also reveals distinct results between the cases of two arms and multiple arms.
Keywords:
Analysis of covariance; Covariate-adaptive randomization; Efficiency; Heterogeneity and heteroscedasticity; Model-assisted; Multiple treatment arms.

Department of Statistics, University of Pennsylvania. School of Statistics, East China Normal University; Department of Statistics, University of Wisconsin. Department of Pure Mathematics and Mathematical Statistics, University of Cambridge. Corresponding to Dr. Jun Shao. Email: [email protected]

1 Introduction
Consider a clinical trial for assessing the causal effects of giving k treatments. Patients in a sample for the trial are randomized into the k treatment arms according to fixed proportions. Let Y^(t) be a discrete or continuous potential response from a patient under treatment t (we assume each patient in the trial receives one and only one treatment). Let θ be the k-dimensional vector whose tth component is θ_t = E(Y^(t)), the unknown mean potential response under treatment t, where E denotes population expectation. Based on data collected from the trial, we would like to make statistical inference on given functions of θ, such as a linear contrast θ_t − θ_s, a ratio θ_t/θ_s, or an odds ratio {θ_t/(1 − θ_t)}/{θ_s/(1 − θ_s)} between two treatment arms t and s.

In clinical trials, we typically observe some baseline covariates for each patient, which are measured prior to treatment assignments and, hence, are not affected by the treatment. As emphasized in regulatory agency guidelines, the baseline covariates can be utilized in the following two ways. (i) In the design stage, covariate-adaptive randomization can be used to enforce the balance of treatments across levels of Z, whose components are discrete baseline prognostic factors, such as institution, disease stage, prior treatment, gender, and age group. "Balance of treatment groups with respect to one or more specific prognostic covariates can enhance the credibility of the results of the trial" (EMA, 2015). (ii) In the analysis stage, a vector X of baseline covariates can be used to gain efficiency. "Incorporating prognostic factors in the primary statistical analysis of clinical trial data can result in a more efficient use of data to demonstrate and quantify the effects of treatment" (FDA, 2019).
More specifically, the investigator is advised to identify "the covariates expected to have an important influence on the primary outcome" and to specify "how to account for them in the analysis in order to improve precision and to compensate for any lack of balance between groups" (ICH E9 (R1), 2019).

The well known analysis of variance (ANOVA) provides asymptotically valid inference on functions of θ for either discrete or continuous Y^(t) when simple randomization is used, i.e., patient treatment assignments are randomized independently. However, ANOVA does not utilize the baseline covariates and can be inefficient. An alternative is to model the dependence of Y on a baseline covariate vector X for efficiency gain. However, such a model-based approach is unreliable because the inference may be invalid if the model is misspecified (EMA, 2015). In practice, it is often hard to specify the correct model or test any lack of fit. For this reason, a great deal of effort has been made to develop model-assisted approaches that gain efficiency by utilizing covariates in a working model and produce asymptotically valid inference even when the working model is misspecified. Development of covariate adjustment methods is encouraged by the most recent guidance for COVID-19: "To improve the precision of treatment effect estimation and inference, sponsors should ... propose methods of covariate adjustment" (FDA, 2020).

One example of the model-assisted approach is the analysis of covariance (ANCOVA), which is widely recognized by researchers and regulatory agencies (EMA, 2015; FDA, 2019). The working model used by the customary ANCOVA, originally proposed by Fisher (1935), does not include treatment-by-covariate interaction terms and is referred to as the homogeneous working model (§2).
In fact, treatment-by-covariate interaction terms are often ignored or even discouraged because of two perceptions: (i) even if the working model is misspecified, ANCOVA can still provide valid inference as it is model-assisted; (ii) a model without interaction terms has fewer coefficients to estimate and may have better finite sample properties. However, these perceptions only provide a partial picture. When the treatment effect is indeed heterogeneous, the ANCOVA estimator of θ without including treatment-by-covariate interaction in its working model may be even less efficient than the ANOVA estimator that uses no model assistance at all (Freedman, 2008; Lin, 2013). The confusion about how ANCOVA should be implemented can be seen from recent conflicting recommendations by regulatory agencies: "The primary model should not include treatment by covariate interactions" (EMA, 2015); "The prespecified primary model can include interaction terms if appropriate" and "The closer the model approximates the true relationship between the outcome and the covariates, the greater the improvement in the power of significance tests and the precision of estimates..." (FDA, 2019).

This leads to the first of our three principles presented in this article:

Principle 1 (Guaranteed efficiency gain). The working model should be chosen so that the model-assisted estimator often gains but never loses efficiency when compared to a benchmark estimator that does not adjust for any covariate.
Throughout this paper, the benchmark estimator is ANOVA. To fulfill Principle 1, we recommend a heterogeneous working model for ANCOVA that includes treatment-by-covariate interaction terms. We call this approach ANalysis of HEterogeneous COVAriance (ANHECOVA) to distinguish it from ANCOVA, which hereafter specifically refers to the analysis using the homogeneous working model. Our recommendation is motivated by the theoretical results in earlier work (Cassel et al., 1976; Lin, 2013; Liu and Yang, 2020; Li and Ding, 2020, among others), which we extend in §3. The ANHECOVA estimator of θ is consistent, asymptotically normal, and asymptotically more efficient than the ANCOVA estimator or the benchmark ANOVA estimator; in fact, it is asymptotically the most efficient estimator within a class of model-assisted estimators. We further consider covariate-adaptive randomization, which leads to our second principle:

Principle 2 (Validity and universality). The model-assisted inference should be asymptotically valid and can be universally applied to all commonly used randomization schemes.
In addition to validity, universality means that a unified procedure can be used under all commonly used randomization schemes, including covariate-adaptive randomization and simple randomization. Our asymptotic theory in §3 reveals two relevant facts. First, the asymptotic normality of the ANCOVA estimator of θ requires an additional condition (C3) on randomization, which is not satisfied by the popular Pocock-Simon minimization method. Second, the asymptotic variance of the ANCOVA estimator varies with the randomization scheme.

Our third principle concerns robust variance estimation for asymptotically normally distributed estimators of functions of θ, another crucial step for inference.

Principle 3 (Robust standard error). The model-assisted inference should use robust standard errors that account for heteroscedasticity.
The asymptotic theory for heteroscedasticity-robust standard errors was developed decades ago (Huber, 1967; White, 1980) and has been widely used in econometrics. Alarmingly, its usage in clinical trials is scarce. This is particularly relevant for model-assisted inference because heteroscedasticity may arise from incorrect working models. Furthermore, the robust standard error also needs to take into account the centering of baseline covariates before fitting working models, which is needed to identify the mean potential responses. Thus, standard formulas for heteroscedasticity-robust standard errors do not directly apply to model-assisted inference for clinical trials.

The conclusions we draw in this paper are based on a new theoretical result on the joint asymptotic distribution of a class of covariate-adjusted estimators of θ under simple or covariate-adaptive randomization. Our new result is quite general and subsumes previous results (mostly on the linear contrast θ_t − θ_s) as special cases (Shao et al., 2010; Lin, 2013; Bugni et al., 2018, 2019). Furthermore, our theory offers new insights on when ANCOVA may gain or lose efficiency over ANOVA. For example, under simple randomization with two treatment arms, Lin (2013) showed that ANCOVA with the homogeneous working model satisfies Principle 1 (guaranteed efficiency gain over ANOVA) if the treatment allocation is balanced. However, our theory shows that this does not extend to trials with more than two arms, and is thus a peculiar property of the two-arm case.

After introducing the notation, basic assumptions, and working models in §
2, we present the methodology and theory in §3. Some numerical results are given in §4. The paper is concluded with recommendations and discussions for clinical trial practice in §5. Technical proofs can be found in the supplementary material.
Recall that Y^(t) represents the potential response under treatment t, t = 1, ..., k. We use Z to denote the vector of discrete baseline covariates used in covariate-adaptive randomization and X to denote the vector of baseline covariates used in model-assisted inference. The vectors Z and X are allowed to share the same entries.

Suppose that a random sample of n patients is obtained from the population under investigation. For the ith patient, let Y_i^(1), ..., Y_i^(k), Z_i, and X_i be the realizations of Y^(1), ..., Y^(k), Z, and X, respectively. We impose the following mild condition.

(C1) (Y_i^(1), ..., Y_i^(k), Z_i, X_i), i = 1, ..., n, are independent and identically distributed with finite second order moments. The distribution of baseline covariates is not affected by treatment and the covariance matrix Σ_X = var(X_i) is positive definite.

Notice that no model between the potential responses and baseline covariates is assumed in (C1), and the potential responses can be either discrete or continuous.

Let π_1, ..., π_k be the pre-specified treatment assignment proportions, 0 < π_t <
1, and Σ_{t=1}^k π_t = 1. Let A_i be the k-dimensional treatment indicator vector that equals a_t if patient i receives treatment t, where a_t denotes the k-dimensional vector whose tth component is 1 and other components are 0. For patient i, the treatment A_i is assigned after the baseline covariates Z_i and X_i are observed; only one treatment is assigned according to A_i. The observed response is Y_i = Y_i^(t) if and only if A_i = a_t. Once the treatments are assigned and the responses are recorded, the statistical inference is based on the observed (Y_i, Z_i, X_i, A_i) for i = 1, ..., n.

The simple randomization scheme assigns patients to treatments completely at random, under which the A_i's are independent of the (Y_i^(1), ..., Y_i^(k), X_i)'s and are independent and identically distributed with P(A_i = a_t) = π_t, t = 1, ..., k. It does not make use of covariates and, hence, may yield sample sizes that substantially deviate from the target assignment proportions across levels of the prognostic factors.

To improve the credibility of the trial, it is often desirable to enforce the targeted treatment assignment proportions across levels of Z by using covariate-adaptive randomization. The three most popular covariate-adaptive randomization schemes are the stratified permuted block (Zelen, 1974) and the stratified biased coin (Shao et al., 2010; Kuznetsova and Johnson, 2017), both of which use all joint levels of Z as strata, and Pocock-Simon minimization (Taves, 1974; Pocock and Simon, 1975; Han et al., 2009), which aims to enforce treatment assignment proportions across marginal levels of Z.

All these covariate-adaptive randomization schemes, as well as simple randomization, satisfy the following mild condition (Baldi Antognini and Zagoraiou, 2015).

(C2) The discrete covariate Z used in randomization has finitely many joint levels in Z and satisfies (i) given {Z_i, i = 1, ..., n}, {A_i, i = 1, . . .
, n} is conditionally independent of {(Y_i^(1), ..., Y_i^(k), X_i), i = 1, ..., n}; (ii) as n → ∞, n_t(z)/n(z) → π_t almost surely, where n(z) is the number of patients with Z = z and n_t(z) is the number of patients with Z = z and treatment t, z ∈ Z, t = 1, ..., k.

The benchmark ANOVA approach does not model how the potential responses Y_i^(1), ..., Y_i^(k) depend on the baseline covariates X_i. It is based on

E(Y_i | A_i) = ϑ^T A_i,   (1)

where ϑ is a k-dimensional unknown vector and c^T denotes the row vector that is the transpose of a column vector c. By Lemma 2 in the supplementary material, ϑ identifies θ = (θ_1, ..., θ_k)^T, where θ_t = E(Y^(t)) is the mean potential response under treatment t.

In the classical exact ANOVA inference, the responses are assumed to have normal distributions, so a common perception is that ANOVA can only be used for continuous responses. As normality is not necessary in the asymptotic theory, ANOVA and the other approaches introduced next can be used for non-normal or even discrete responses when n is large.

To utilize the baseline covariates, ANCOVA is based on the following homogeneous working model,

E(Y_i | A_i, X_i) = ϑ^T A_i + b^T(X_i − µ_X),   (2)

where ϑ and b are unknown vectors having the same dimensions as A and X, respectively, and µ_X = E(X_i). There are no treatment-by-covariate interaction terms in (2), which is incorrect if patients with different covariates benefit differently from receiving the same treatment, a scenario that often occurs in clinical trials. By Lemma 2 in the supplementary material, E{Y_i − ϑ^T A_i − b^T(X_i − µ_X)}² is minimized at (ϑ, b) = (θ, β), where β = Σ_{t=1}^k π_t β_t and β_t = Σ_X^{-1} cov(X_i, Y_i^(t)).
Thus, the ANCOVA estimator with working model (2) is model-assisted (Theorems 1 and 3 in §3). To better utilize X, we consider an alternative working model for ANCOVA that includes the treatment-by-covariate interactions:

E(Y_i | A_i, X_i) = ϑ^T A_i + Σ_{t=1}^k b_t^T(X_i − µ_X) I(A_i = a_t),   (3)

where ϑ, b_1, ..., b_k are unknown vectors and I(·) is the indicator function. We call model (3) the heterogeneous working model because it includes the interaction terms to accommodate treatment effect heterogeneity across covariates, i.e., patients with different covariate values may benefit differently from the treatment. By Lemma 2 in the supplementary material, E{Y_i − ϑ^T A_i − Σ_{t=1}^k b_t^T(X_i − µ_X) I(A_i = a_t)}² is minimized at (ϑ, b_1, ..., b_k) = (θ, β_1, ..., β_k), where β_t = Σ_X^{-1} cov(X_i, Y_i^(t)), i.e., ANCOVA with working model (3) is also model-assisted.

To differentiate the methods based on (2) and (3), we refer to the method based on (2) as ANCOVA and the one based on (3) as ANHECOVA.

As a final remark, both working models (2) and (3) use the centered covariate vector X − µ_X; otherwise, ANCOVA and ANHECOVA do not directly provide estimators of θ. Centering is crucial; the only non-trivial exception is when the homogeneous working model (2) is used and the linear contrast θ_t − θ_s is estimated, as the covariate mean µ_X cancels out. When fitting the working models (2) and (3) with real datasets, we can use least squares with µ_X replaced by X̄, the sample mean of all X_i's. In other words, we can center the baseline covariates before fitting the models. Since this step introduces non-negligible variation to the estimation, it affects the asymptotic variance of the model-assisted estimator of θ and its estimation for inference. Thus, we cannot assume the data have been centered in advance with µ_X = 0 without loss of generality (see §3).

We first describe the estimators of θ under (1)-(3).
The traditional ANOVA estimator is

θ̂_AN = (Ȳ_1, ..., Ȳ_k)^T,   (4)

where Ȳ_t is the sample mean of the responses Y_i from patients under treatment t. As n → ∞, θ̂_AN is consistent and asymptotically normal.

Using the homogeneous working model (2), the ANCOVA estimator of θ is the least squares estimator of the coefficient vector ϑ in the linear model (2) with (A_i, X_i) as regressors. It has the following explicit formula,

θ̂_ANC = (Ȳ_1 − β̂^T(X̄_1 − X̄), ..., Ȳ_k − β̂^T(X̄_k − X̄))^T,   (5)

where X̄_t is the sample mean of the X_i's from patients under treatment t, X̄ is the sample mean of all X_i's, and

β̂ = {Σ_{t=1}^k Σ_{i: A_i = a_t} (X_i − X̄_t)(X_i − X̄_t)^T}^{-1} Σ_{t=1}^k Σ_{i: A_i = a_t} (X_i − X̄_t) Y_i   (6)

is the least squares estimator of b in (2). It is shown in Theorems 1 and 3 that θ̂_ANC is consistent and asymptotically normal as n → ∞ regardless of whether working model (2) is correct or not, i.e., ANCOVA is model-assisted.

The term β̂^T(X̄_t − X̄) in (5) is an adjustment for covariate X applied to the ANOVA estimator Ȳ_t. However, it may not be the best adjustment in order to reduce the variance. A better choice is to use the heterogeneous working model (3). The ANHECOVA estimator of θ is the least squares estimator of ϑ under model (3),

θ̂_ANHC = (Ȳ_1 − β̂_1^T(X̄_1 − X̄), ..., Ȳ_k − β̂_k^T(X̄_k − X̄))^T,   (7)

where

β̂_t = {Σ_{i: A_i = a_t} (X_i − X̄_t)(X_i − X̄_t)^T}^{-1} Σ_{i: A_i = a_t} (X_i − X̄_t) Y_i   (8)

is the least squares estimator of b_t in (3) for each t.
It is shown in Theorems 1-3 below that the ANHECOVA estimator θ̂_ANHC is not only model-assisted, but also asymptotically at least as efficient as θ̂_AN and θ̂_ANC, regardless of whether model (3) is correct or not.

The following heuristics reveal why the adjustment β̂_t^T(X̄_t − X̄) in (7) is better than the adjustment β̂^T(X̄_t − X̄) in (5), and why ANHECOVA often gains but never hurts efficiency even if model (3) is wrong. As the treatment has no effect on X, both X̄_t and X̄ are consistent estimators of µ_X, so β̂_t^T(X̄_t − X̄) is an "estimator" of zero. As n → ∞, β̂_t converges to β_t = Σ_X^{-1} cov(X, Y^(t)) in probability, regardless of whether (3) is correct or not (Lemma 3 in the supplementary material). Hence, we can "replace" β̂_t^T(X̄_t − X̄) by β_t^T(X̄_t − X̄). Under simple randomization,

var{Ȳ_t − β_t^T(X̄_t − X̄)} = var(Ȳ_t) + var{β_t^T(X̄_t − X̄)} − 2 cov{Ȳ_t, β_t^T(X̄_t − X̄)}
 = var(Ȳ_t) − var{β_t^T(X̄_t − X̄)}.   (9)

Consequently, Ȳ_t − β̂_t^T(X̄_t − X̄) has a smaller asymptotic variance than Ȳ_t. Note that (9) does not hold with β_t replaced by other quantities. This explains why the adjustment β̂^T(X̄_t − X̄) in ANCOVA may lose efficiency, as β̂ in (6) converges to π_1 β_1 + ··· + π_k β_k.

The variance reduction technique in (9) can be found in the generalized regression (GREG) approach in the survey sampling literature (Cassel et al., 1976; Cochran, 1977; Särndal et al., 2003; Fuller, 2009; Shao and Wang, 2014; Ta et al., 2020). From the theory of GREG, β̂_t in (7) may be replaced by any estimator that converges to β_t in probability, without affecting the asymptotic distribution of the GREG estimator.
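As a concrete illustration of the estimators in (4)-(8), the following minimal sketch (ours, not from the paper; the simulated data-generating model and all function names are hypothetical) computes θ̂_AN, θ̂_ANC, and θ̂_ANHC on simulated data under simple randomization:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 3000, 3
pi = np.array([0.5, 0.25, 0.25])          # treatment assignment proportions
X = rng.normal(size=(n, 2))               # two baseline covariates
A = rng.choice(k, size=n, p=pi)           # simple randomization
beta_true = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]])  # arm-specific slopes
theta = np.array([0.0, 1.0, 2.0])         # true mean potential responses
Y = theta[A] + np.einsum('ij,ij->i', X, beta_true[A]) + rng.normal(size=n)

Xbar = X.mean(axis=0)

def anova(Y, A):
    # (4): arm-wise sample means, no covariate adjustment
    return np.array([Y[A == t].mean() for t in range(k)])

def ancova(Y, A, X):
    # (5)-(6): one pooled slope from arm-centered covariates
    p = X.shape[1]
    S, v = np.zeros((p, p)), np.zeros(p)
    for t in range(k):
        Xc = X[A == t] - X[A == t].mean(axis=0)
        S += Xc.T @ Xc
        v += Xc.T @ Y[A == t]
    b = np.linalg.solve(S, v)
    return np.array([Y[A == t].mean() - b @ (X[A == t].mean(axis=0) - Xbar)
                     for t in range(k)])

def anhecova(Y, A, X):
    # (7)-(8): a separate slope for each treatment arm
    est = np.zeros(k)
    for t in range(k):
        Xt, Yt = X[A == t], Y[A == t]
        Xc = Xt - Xt.mean(axis=0)
        bt = np.linalg.solve(Xc.T @ Xc, Xc.T @ Yt)
        est[t] = Yt.mean() - bt @ (Xt.mean(axis=0) - Xbar)
    return est

print(anova(Y, A), ancova(Y, A, X), anhecova(Y, A, X))
```

With heterogeneous slopes as simulated here, all three estimators are consistent for θ; Theorem 1 below quantifies how their asymptotic variances differ.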
This motivates the following potential improvement to (8), which utilizes the fact that X has the same covariance across treatments and estimates the covariance matrix of X using all patients,

β̂_t = (n/n_t) {Σ_{i=1}^n (X_i − X̄)(X_i − X̄)^T}^{-1} Σ_{i: A_i = a_t} (X_i − X̄_t) Y_i,   (10)

where n_t is the number of units under treatment t. This alternative estimator alleviates the concern of using an unstable inverse in (8) when the sample size is small.

We consider asymptotic theory under simple randomization for a general class of estimators of the form

θ̂(b̂_1, ..., b̂_k) = (Ȳ_1 − b̂_1^T(X̄_1 − X̄), ..., Ȳ_k − b̂_k^T(X̄_k − X̄))^T,   (11)

where the b̂_t's have the same dimension as X and can either be fixed or depend on the trial data. Note that class (11) contains all estimators we have discussed so far:

θ̂(b̂_1, ..., b̂_k) = θ̂_AN if b̂_t = 0 for all t; θ̂_ANC if b̂_t = β̂ in (6) for all t; θ̂_ANHC if b̂_t = β̂_t in (8) or (10) for all t.   (12)

Theorem 1.
Assume (C1) and simple randomization for treatment assignment.

(i) Assume that b̂_t → b_t in probability as n → ∞, where b_t is a fixed vector, t = 1, ..., k. Then, as n → ∞,

√n {θ̂(b̂_1, ..., b̂_k) − θ} → N(0, V_SR(B)) in distribution,   (13)

where

V_SR(B) = diag{π_t^{-1} var(Y^(t) − b_t^T X)} + B̃^T Σ_X B + B^T Σ_X B̃ − B^T Σ_X B,

diag(d_t) denotes the k × k diagonal matrix with tth diagonal element d_t, B̃ = (β_1, ..., β_k) is the matrix with columns β_1, ..., β_k, and B = (b_1, ..., b_k). In particular, (13) holds for θ̂_AN, θ̂_ANC, and θ̂_ANHC as described by (12).

(ii) (Optimality of ANHECOVA). V_SR(B) is minimized at B = B̃ in the sense that V_SR(B) − V_SR(B̃) is positive semidefinite for all B.

We briefly describe the proof for part (ii) of Theorem 1 and defer other details to the supplementary material. Notice that

V_SR(B) − V_SR(B̃) = diag{π_t^{-1} (β_t − b_t)^T Σ_X (β_t − b_t)} − (B̃ − B)^T Σ_X (B̃ − B).

The positive semidefiniteness of this matrix follows from the following algebraic result with M = Σ_X^{1/2}(B̃ − B).

Lemma 1.
Let M be a matrix whose columns are m_1, ..., m_k, and let π_1, ..., π_k be nonnegative constants with Σ_{t=1}^k π_t = 1. Then diag(π_t^{-1} m_t^T m_t) − M^T M is positive semidefinite.
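Lemma 1 can be spot-checked numerically; the sketch below (an illustration on random instances, not a proof) draws random M and random nonnegative weights summing to one, and verifies that the smallest eigenvalue of diag(π_t^{-1} m_t^T m_t) − M^T M is nonnegative up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(200):
    p, k = 4, 3                        # dimension of each m_t, number of columns
    M = rng.normal(size=(p, k))
    pi = rng.dirichlet(np.ones(k))     # nonnegative weights summing to one
    D = np.diag([M[:, t] @ M[:, t] / pi[t] for t in range(k)])
    eigmin = np.linalg.eigvalsh(D - M.T @ M).min()
    assert eigmin > -1e-6              # PSD up to numerical tolerance
print("Lemma 1 spot-check passed")
```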
We would like to emphasize that Theorem 1(i) holds regardless of whether model (3) is correct or not. Theorem 1(ii) shows that ANHECOVA not only satisfies Principle 1 (guaranteed efficiency gain over ANOVA), but is also the most efficient estimator within the class of estimators in (11), as it attains the optimal V_SR(B̃). Another consequence of Theorem 1(ii) is that adjusting for more covariates in ANHECOVA does not lose and often gains asymptotic efficiency, although adjusting for fewer covariates may have better performance when n is small.

For the important scenario of estimating the linear contrast θ_t − θ_s with fixed t and s, the corresponding model-assisted estimator is c_ts^T θ̂, where θ̂ is given by (11) and c_ts is the k-dimensional vector whose tth component is 1, sth component is −
1, and other components are 0. The following corollary provides a direct comparison of the asymptotic variances of the ANOVA, ANCOVA, and ANHECOVA estimators of linear contrasts, showing that the ANHECOVA estimator has the strictly smallest asymptotic variance except in some very special cases.
Corollary 1.
Assume (C1) and simple randomization.

(i) The difference between the asymptotic variances of c_ts^T θ̂_AN and c_ts^T θ̂_ANHC is

(π_s β_t + π_t β_s)^T Σ_X (π_s β_t + π_t β_s) / {π_t π_s (π_t + π_s)} + (1 − π_t − π_s)(β_t − β_s)^T Σ_X (β_t − β_s) / (π_t + π_s),

which is always ≥ 0, with equality if and only if

π_s β_t + π_t β_s = 0 and (β_t − β_s)(1 − π_t − π_s) = 0.   (14)

(ii) The difference between the asymptotic variances of c_ts^T θ̂_ANC and c_ts^T θ̂_ANHC is

(β_t − β)^T Σ_X (β_t − β) / π_t + (β_s − β)^T Σ_X (β_s − β) / π_s − (β_t − β_s)^T Σ_X (β_t − β_s),

which is always ≥ 0, with equality if and only if

β = (π_s β_t + π_t β_s) / (π_t + π_s) and (β_t − β_s)(1 − π_t − π_s) = 0.   (15)

When k = 2, i.e., there are only two arms, (14) reduces to π_2 β_1 + π_1 β_2 = 0; (15) reduces to β_1 = β_2 or π_1 = π_2 = 1/
2. The same conclusions were also obtained by Lin (2013) under a different framework that only considers the randomness in the treatment assignments. Liu and Yang (2020) extended the results in Lin (2013) to stratified simple randomization. We would like to emphasize that when there are more than two arms (k > 2), (14) holds if and only if β_t = β_s = 0, because 0 < π_t + π_s < 1 when k >
2. For the comparison of ANHECOVA with ANCOVA, (15) holds if and only if β_t = β_s = β = Σ_{r=1}^k π_r β_r. Therefore, β_t = β_s is not enough for ANCOVA to be as efficient as ANHECOVA for estimating θ_t − θ_s. Moreover, even if treatment allocation is balanced, i.e., π_1 = ··· = π_k, ANCOVA is generally less efficient than ANHECOVA when there are more than two arms; this is different from the conclusion in the case of two arms. Finally, in estimating θ_t − θ_s for all pairs of t and s, for ANOVA to have the same asymptotic efficiency as ANHECOVA, all β_t's need to be zero, i.e., X is uncorrelated with Y^(t) for every t; for ANCOVA to have the same asymptotic efficiency as ANHECOVA, all β_t's must be the same, i.e., models (2) and (3) are the same.

It is worth mentioning that when there are more than two treatment arms, the ANCOVA estimator can be either more efficient or less efficient than the ANOVA estimator even under balanced treatment allocation. This is also observed by Freedman (2008) in some specific examples.

We now consider the estimation of θ under covariate-adaptive randomization as described in §2. Two questions arise: can a model-assisted estimator remain valid and universally applicable across these randomization schemes (Principle 2), and can it retain the guaranteed efficiency gain (Principle 1)? To address the first question, we recommend θ̂_ANHC in (7) with all the joint levels of Z included in the covariate X.

Theorem 2 (Validity and Universality of ANHECOVA). Assume (C1) and (C2). If the heterogeneous model (3) is used with X containing the dummy variables for all the joint levels of Z as a sub-vector, then, regardless of whether working model (3) is correct or not and which randomization scheme is used, as n → ∞,

√n (θ̂_ANHC − θ) → N(0, V_SR(B̃)) in distribution,   (16)

where V_SR(B̃) = diag{π_t^{-1} var(Y^(t) − β_t^T X)} + B̃^T Σ_X B̃ and B̃ = (β_1, ..., β_k).
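Theorem 2's universality claim can be illustrated with a small Monte Carlo sketch (ours, not from the paper; the data-generating model, block size, and function names are assumptions for illustration): when the dummy variable for Z is included in X, the empirical variance of the ANHECOVA contrast should be about the same under simple randomization and under stratified permuted block randomization.

```python
import numpy as np

rng = np.random.default_rng(2)
k, pi = 2, np.array([0.5, 0.5])

def stratified_permuted_block(Z, rng, block=4):
    # within each stratum, assign in random blocks of 4 (2 per arm)
    A = np.empty(len(Z), dtype=int)
    for z in np.unique(Z):
        idx = np.flatnonzero(Z == z)
        n_blocks = int(np.ceil(len(idx) / block))
        seq = np.concatenate([rng.permutation([0, 0, 1, 1]) for _ in range(n_blocks)])
        A[idx] = seq[:len(idx)]
    return A

def anhecova(Y, A, X):
    # (7)-(8): arm-specific least squares slopes with centered covariates
    Xbar = X.mean(axis=0)
    est = np.zeros(k)
    for t in range(k):
        Xt, Yt = X[A == t], Y[A == t]
        Xc = Xt - Xt.mean(axis=0)
        bt = np.linalg.lstsq(Xc, Yt, rcond=None)[0]
        est[t] = Yt.mean() - bt @ (Xt.mean(axis=0) - Xbar)
    return est

def one_trial(scheme):
    n = 400
    Z = rng.integers(0, 2, size=n)         # one binary stratification factor
    W = rng.normal(size=n)                 # an extra continuous covariate
    X = np.column_stack([Z, W])            # X contains the dummy variable for Z
    Y0 = 1.0 + Z + W + rng.normal(size=n)
    Y1 = 2.0 + 2 * Z - W + rng.normal(size=n)
    A = (stratified_permuted_block(Z, rng) if scheme == "spb"
         else rng.choice(k, size=n, p=pi))  # simple randomization otherwise
    Y = np.where(A == 1, Y1, Y0)
    est = anhecova(Y, A, X)
    return est[1] - est[0]                  # estimated contrast theta_2 - theta_1

reps = 1000
var_sr = np.var([one_trial("sr") for _ in range(reps)])
var_spb = np.var([one_trial("spb") for _ in range(reps)])
print(var_sr, var_spb)   # per Theorem 2, these should be close
```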
Comparing Theorem 1 with Theorem 2, we see that the ANHECOVA estimator including all dummy variables for Z has exactly the same asymptotic variance under simple randomization and under any covariate-adaptive randomization satisfying (C2), which is reflected by the fact that V_SR(B̃) in (16) is the same as V_SR(B) in (13) with B = B̃. Therefore, this estimator fulfills Principle 2 (validity and universality), so that the same inference procedure can be used regardless of which randomization scheme is used. As we show next, however, this is not true for ANOVA or ANCOVA using model (2), or for ANHECOVA when Z is not fully included in the working model.

To answer the second question, we need a further condition on the randomization scheme, mainly for estimators not using model (3) or not including all levels of Z in X.

(C3) There exist k × k matrices Ω(z), z ∈ Z, such that, as n → ∞,

√n (n_1(z)/n(z) − π_1, ..., n_k(z)/n(z) − π_k, z ∈ Z)^T | Z_1, ..., Z_n → N(0, D) in distribution,

where D is a block diagonal matrix whose blocks are the matrices Ω(z)/P(Z_i = z), z ∈ Z.

Condition (C3) weakens Assumption 4.1(c) in Bugni et al. (2019), where Ω(z) takes a more special form. For simple randomization, Ω(z) = diag(π_t) − ππ^T for all z, where π = (π_1, ..., π_k)^T. For stratified permuted block randomization and stratified biased coin randomization, Ω(z) = 0 for all z. Note that Pocock-Simon minimization does not satisfy (C3) because the treatment assignments are correlated across strata, although some recent theoretical results have been obtained (Hu and Zhang, 2020). Thus, the following result does not apply to minimization. However, as our Theorem 2 does not require (C3), it applies to minimization.

The next theorem establishes the asymptotic distribution of the class (11) of estimators under covariate-adaptive randomization, based on which we show the optimality of ANHECOVA.

Theorem 3.
Assume (C1), (C2), and (C3). Consider the class (11) of estimators and, without loss of generality, assume that all levels of Z are included in X, as the components of the b̂_t's in (11) corresponding to levels of Z not in X may be set to 0.

(i) For θ̂(b̂_1, ..., b̂_k) defined in (11) with b̂_t → b_t in probability as n → ∞, t = 1, ..., k,

√n {θ̂(b̂_1, ..., b̂_k) − θ} → N(0, V(B)) in distribution,   (17)

where

V(B) = V_SR(B) − E[R(B){Ω_SR − Ω(Z_i)}R(B)],   (18)

V_SR(B) is given in (13), B = (b_1, ..., b_k), Ω_SR = diag(π_t) − ππ^T, and R(B) = diag(π_t^{-1} E{Y_i^(t) − θ_t − b_t^T(X_i − µ_X) | Z_i}). Furthermore, R(B̃) = 0 and, hence, V(B̃) = V_SR(B̃), where B̃ = (β_1, ..., β_k).

(ii) (Optimality of ANHECOVA). V(B) is minimized at B = B̃ in the sense that V(B) − V(B̃) is positive semidefinite for all B.

The main technical challenge in the proofs of Theorem 2 and Theorem 3 is that the treatment assignments A_1, ..., A_n are not independent, so we cannot directly apply the classical Lindeberg central limit theorem. Instead, we decompose θ̂(b̂_1, ..., b̂_k) − θ into four terms and then apply a conditional version of the Lindeberg central limit theorem to handle the dependence. The details can be found in the supplementary material.

A number of conclusions can be made from Theorem 3.

1. While Theorem 2 answers the first question by showing that θ̂_ANHC with all joint levels of Z included in model (3) achieves Principle 2 (validity and universality), the second question is answered by Theorem 3(ii), which shows that θ̂_ANHC also attains Principle 1 (guaranteed efficiency gain); in fact, it is asymptotically the most efficient estimator compared with all estimators in class (11).

2.
A price paid for not using model (3), or not including all levels of Z in (3), is that the asymptotic validity of the resulting estimator requires condition (C3), which is not needed in Theorem 2. Furthermore, the resulting estimator not only is less efficient according to the previous conclusion, but also has a more complicated asymptotic covariance matrix depending on the randomization scheme (universality is not satisfied), which requires extra handling in variance estimation for inference; see, for example, Shao et al. (2010) and Bugni et al. (2018).

3. Under covariate-adaptive randomization satisfying (C2)-(C3), it is still true that the ANCOVA estimator using model (2) may be asymptotically more efficient or less efficient than the benchmark ANOVA estimator.

4. From (18), the asymptotic covariance matrix V(B) is invariant with respect to the randomization scheme if R(B) in (18) is 0, which is the case when B = B̃, i.e., when θ̂_ANHC is used with all levels of Z included in X. If R(B) is not 0, as in the case of ANOVA, ANCOVA, or ANHECOVA not adjusting for all joint levels of Z, then V(B) depends on the randomization scheme and, the smaller the Ω(z), the more efficient the estimator. Thus, the stratified permuted block or biased coin design with Ω(z) = 0 for all z is preferred in this regard.

5. The roles played by design and modeling can be understood through

V(B) − V_SR(0) = {V_SR(B) − V_SR(0)} − E[R(B){Ω_SR − Ω(Z_i)}R(B)],

where V_SR(0) is the asymptotic variance of the ANOVA estimator under simple randomization. As we vary the randomization scheme and the working model, the change in the asymptotic variance is determined by two terms.
The first term $\{V_{SR}(B) - V_{SR}(0)\}$ arises from using a working model; the second term $E[R(B)\{\Omega_{SR} - \Omega(Z_i)\}R(B)]$ is the reduction due to using a covariate-adaptive randomization scheme, and this reduction also depends on the working model being used, via $R(B)$.

The last conclusion from Theorem 3 requires a further derivation, which leads to the following corollary.

Corollary 2.
Assume (C1)-(C3) and that X only includes the dummy variables for all joint levels of Z. Then, for any $B$ in (17),
$$V(B) = V_{SR}(B^*) + E\{R(B)\,\Omega(Z_i)\,R(B)\}.$$

A direct consequence of Corollary 2 is that, if $\Omega(z) = 0$ for all $z$ (e.g., stratified permuted block or biased coin randomization is used) and X only includes all joint levels of Z, then all estimators in class (11), including the benchmark ANOVA estimator, have the same asymptotic efficiency as the ANHECOVA estimator under any such randomization. In other words, modeling with all joint levels of Z is equivalent to designing with Z.

For inference on a function of $\theta$ based on Theorems 1 to 3, a crucial step is to construct a consistent estimator of the asymptotic variance. For ANOVA or ANCOVA, the customary linear model-based variance estimation assuming homoscedasticity can be inconsistent, as criticized by Freedman (2008). This motivates Principle 3, the robust standard error: we should use variance estimators that are consistent regardless of whether the working model is correct and regardless of whether heteroscedasticity is present.

Consider the ANHECOVA estimator $\hat\theta_{ANHC}$ in (7) (using either (8) or (10)), where the adjustment covariates X include all dummy variables for the Z used in the randomization. There exist formulas for heteroscedasticity-robust standard errors (such as those provided in the sandwich package in R). However, those formulas cannot be directly applied here, because they do not account for the additional variation introduced by centering the covariate X, as required by the identification of $\theta$. In fact, the term $B^{T}\Sigma_X B$ in the asymptotic variance $V_{SR}(B)$ in Theorem 2 arises from centering X.

Instead, we should use the robust variance estimator based on $V_{SR}(B)$, as described next. Let $\hat\Sigma_X$ be the sample covariance matrix of the $X_i$ based on the entire sample, and let $S_t(\hat\beta_t)$ be the sample variance of the $(Y_i - \hat\beta_t^{T}X_i)$'s based on the patients in treatment arm $t$.
Then $V_{SR}(B)$ in (16) can be estimated by
$$\hat V = \mathrm{diag}\{\pi_t^{-1} S_t(\hat\beta_t)\} + \hat B^{T}\hat\Sigma_X\hat B. \qquad (19)$$
This variance estimator is consistent as $n\to\infty$ regardless of whether the heterogeneous working model (3) or homoscedasticity (i.e., $\mathrm{var}(Y_i^{(t)} - \beta_t^{T}X_i \mid X_i)$ being a constant) holds, and regardless of which randomization scheme is used.

In most applications the primary analysis concerns treatment effects in terms of the linear contrast $\theta_t - \theta_s = c_{ts}^{T}\theta$ for one or several pairs $(t, s)$. For large $n$, an asymptotic level $1-\alpha$ confidence interval for $\theta_t - \theta_s$ is
$$\big(c_{ts}^{T}\hat\theta_{ANHC} - z_{\alpha/2}\,SE_{ts},\; c_{ts}^{T}\hat\theta_{ANHC} + z_{\alpha/2}\,SE_{ts}\big),$$
where $SE_{ts}^2 = n^{-1}\{\pi_t^{-1}S_t(\hat\beta_t) + \pi_s^{-1}S_s(\hat\beta_s) + (\hat\beta_t - \hat\beta_s)^{T}\hat\Sigma_X(\hat\beta_t - \hat\beta_s)\}$ and $z_\alpha$ is the $1-\alpha$ quantile of the standard normal distribution. The same form of confidence interval can be used for any linear contrast $c^{T}\theta$ (the sum of the components of $c$ is 0) with $c_{ts}^{T}\hat\theta_{ANHC}$ and $SE_{ts}$ replaced by $c^{T}\hat\theta_{ANHC}$ and $SE_c = (n^{-1}c^{T}\hat V c)^{1/2}$, respectively. Let $\mathcal C$ be the collection of all linear contrasts of dimension $k$. An asymptotic level $1-\alpha$ simultaneous confidence band for $c^{T}\theta$, $c\in\mathcal C$, can be obtained by Scheffé's method,
$$\big(c^{T}\hat\theta_{ANHC} - \sqrt{\chi^2_{\alpha,k-1}}\,SE_c,\; c^{T}\hat\theta_{ANHC} + \sqrt{\chi^2_{\alpha,k-1}}\,SE_c\big), \quad c\in\mathcal C,$$
where $\chi^2_{\alpha,k-1}$ is the $1-\alpha$ quantile of the chi-square distribution with $k-1$ degrees of freedom. Correspondingly, to test the hypothesis that all $\theta_t$'s are the same, i.e., $H_0: \theta_1 = \cdots = \theta_k$, an asymptotic level $\alpha$ chi-square test rejects $H_0$ if and only if
$$n\,\hat\theta_{ANHC}^{T} C^{T}(C\hat V C^{T})^{-1} C\hat\theta_{ANHC} > \chi^2_{\alpha,k-1},$$
where $C$ is the $(k-1)\times k$ matrix whose $t$th row is $c_{tk}^{T}$, $t = 1,\dots,k-1$.

We perform a simulation study based on the placebo arm of 481 patients in a real clinical trial to demonstrate the finite-sample properties of the model-assisted procedures. We use the observed continuous response of these 481 patients as the potential response $Y^{(1)}$ under treatment arm 1, together with a 2-dimensional continuous baseline covariate $(U, W)$. The empirical distribution of $(Y^{(1)}, U, W)$ of these patients is the population distribution in the simulations. Notice that we do not know the true relationship between $Y^{(1)}$ and $(U, W)$ because they come from real data. Thus, the working models (2) and (3) may be misspecified. In all numerical results, we apply (10) for ANHECOVA.

We consider three simulation studies that differ in how the potential responses $Y^{(2)}$ and $Y^{(3)}$ of the other two treatment arms are generated and in how the treatment assignment is randomized. Our first simulation compares the standard deviations of the ANOVA, ANCOVA, and ANHECOVA estimators of $\theta_2 - \theta_1$, with $X = U$ for ANCOVA and ANHECOVA. The two additional potential responses are generated according to
$$Y^{(2)} = Y^{(1)} + \zeta(U - \mu_U) \ (\text{so } \theta_2 = \theta_1), \qquad Y^{(3)} = Y^{(2)} \ (\text{so } \theta_3 = \theta_1), \qquad (20)$$
with sample size 481 (all data points) and treatment assigned by simple randomization with three different allocation proportions: 1:2:2, 1:1:1, and 2:1:1. Thus, the only randomness in the first simulation study is from the treatment assignments.
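To make the estimators being compared concrete, the ANHECOVA point estimate for each arm can be sketched in a few lines. This is a hypothetical Python sketch (the paper's own software is the R package RobinCar); the function name `anhecova` and its interface are our own, and the heterogeneous working model is fitted by ordinary least squares separately within each arm.

```python
import numpy as np

def anhecova(y, a, x, arms):
    """ANHECOVA point estimates: for each arm t,
    theta_t = mean(Y in arm t) + beta_t' (overall X mean - arm-t X mean),
    with an arm-specific OLS slope beta_t (heterogeneous working model)."""
    y, a = np.asarray(y, float), np.asarray(a)
    x = np.atleast_2d(np.asarray(x, float).T).T  # ensure shape (n, p)
    xbar = x.mean(axis=0)                        # covariate mean, all arms
    theta = []
    for t in arms:
        idx = (a == t)
        yt, xt = y[idx], x[idx]
        xc = xt - xt.mean(axis=0)
        # within-arm least-squares slope of Y on X
        beta_t = np.linalg.lstsq(xc, yt - yt.mean(), rcond=None)[0]
        theta.append(yt.mean() + beta_t @ (xbar - xt.mean(axis=0)))
    return np.array(theta)
```

Because the adjustment term $\beta_t^{T}(\bar X - \bar X_t)$ replaces the arm-specific covariate mean by the pooled one, the estimate of each $\theta_t$ uses covariate data from all arms, not just arm $t$.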
Here $\mu_U$ is the mean of the 481 $U$-values, and $\zeta$ varies over a range of negative and positive values; $\zeta = 0$ corresponds to $\beta_1 = \beta_2 = \beta_3$. Although we only consider the estimation of $\theta_2 - \theta_1$, data from the third arm are still used by ANCOVA and ANHECOVA.

Based on 10,000 simulations, all three estimators have negligible biases, and their standard deviations are plotted for different values of $\zeta$ in Figure 1. The simulation results show that, as predicted by our theory, ANHECOVA is generally more efficient than the other two estimators, except when $\zeta$ is near 0, where ANCOVA is comparable to ANHECOVA. Furthermore, the simulation with the 1:2:2 allocation (left panel in Figure 1) shows very clearly that there is no definite ordering of the variances of ANOVA and ANCOVA. Our Corollary 1 suggests that a balanced allocation does not guarantee the superiority of ANCOVA over ANOVA when there are multiple arms (in contrast with the case of two arms), which can be seen from the simulation with the 1:1:1 allocation (middle panel in Figure 1).

The second and third simulation studies are intended to examine the performance of standard errors and the coverage of the proposed 95% asymptotic confidence intervals for $\theta_2 - \theta_1$ and $\theta_3 - \theta_1$. In the second simulation study, treatment assignments are still generated by simple randomization (with allocation 1:1:1 or 1:2:2), but a random sample of size $n = 200$ or 400 is drawn from the 481 subjects with replacement. The setup (20) is also changed to
$$Y^{(2)} = c_2 Y^{(1)} + d_2(U - \mu_U) \ (\text{so } \theta_2 - \theta_1 < 0), \qquad Y^{(3)} = -Y^{(1)} \ (\text{so } \theta_3 - \theta_1 < 0), \qquad (21)$$
for fixed constants $c_2, d_2 < 0$. For both ANCOVA and ANHECOVA, we include $U$ but not $W$ in their working models, so $X = U$. The simulation results based on 10,000 simulations are shown in Table 1.

In the third simulation study, the treatment assignments are generated by stratified permuted block randomization with block size 6 and treatment allocation 1:1:1, or block size 10 and treatment allocation 1:2:2. The covariate Z used in randomization is a three-level discretized $W$.
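The stratified permuted block scheme just described (independent permuted blocks run within each level of Z) can be sketched as follows. This is a hypothetical Python sketch for a three-arm trial with 1:1:1 allocation and block size 6 as in the text; the function names are our own.

```python
import numpy as np

def permuted_block(n, block, rng):
    """Treatment labels for one stratum: within every complete block of
    size `block` (a multiple of 3 for 1:1:1 allocation), each of the
    three arms appears exactly block // 3 times, in random order."""
    per_arm = block // 3
    seq = []
    while len(seq) < n:
        b = np.repeat([1, 2, 3], per_arm)
        rng.shuffle(b)           # random order within the block
        seq.extend(b.tolist())
    return seq[:n]               # the last block may be incomplete

def stratified_permuted_block(z, block=6, seed=0):
    """Assign treatments by running an independent permuted-block
    stream within each level of the stratification covariate z."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z)
    a = np.empty(len(z), dtype=int)
    for level in np.unique(z):
        idx = np.flatnonzero(z == level)
        a[idx] = permuted_block(len(idx), block, rng)
    return a
```

Within each stratum, the arm counts can differ from exact 1:1:1 balance only through the final incomplete block, so the imbalance per arm is at most block // 3; this is the sense in which $\Omega(z) = 0$ for such schemes.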
Similar to the second simulation study, a random sample of size $n = 200$ or 400 is drawn, but the setup (21) is changed to
$$Y^{(2)} = c_2 Y^{(1)} + d_2(U - \mu_U) + e_2(W - \mu_W) \ (\text{so } \theta_2 - \theta_1 < 0),$$
$$Y^{(3)} = -Y^{(1)} + d_3(U - \mu_U) + e_3(W - \mu_W) \ (\text{so } \theta_3 - \theta_1 < 0), \qquad (22)$$
for fixed constants $c_2, d_2, d_3, e_3 < 0$ and $e_2 > 0$, where $\mu_W$ is the mean of all 481 $W$-values. For ANCOVA and ANHECOVA, we consider two working models with different choices of X. One model includes the dummy variables for Z and $U$, but not $W$, motivated by the fact that Z is a discretization of $W$ and carries some information about $W$. The other model includes the dummy variables for Z, along with $U$ and $W$ (this is different from just including $U$ and $W$, because $W$ is continuous and the working model is linear). The simulation results based on 10,000 simulations are shown in Table 2.

Notice that in the third simulation, which uses a covariate-adaptive randomization scheme, consistent standard errors for ANOVA or ANCOVA can be obtained using Theorem 3, but they are not readily available. For this reason, in the third simulation we use the standard errors derived under simple randomization based on Theorem 1. In other words, standard errors for each estimator are computed using the same formula in the second and third simulation studies. Based on our theory, it is expected that confidence intervals based on ANOVA and ANCOVA are conservative.

The following is a summary of the simulation results in Tables 1-2.

1. All estimators have negligible bias compared to their standard deviation.

2. ANHECOVA has the smallest standard deviation in all our simulation settings. This is consistent with our asymptotic theory.

3. There is no unambiguous ordering of the standard deviations of ANOVA and ANCOVA. In particular, in the third simulation study (Table 2), ANCOVA adjusting for $Z, U, W$ is better in estimating $\theta_2 - \theta_1$ but worse in estimating $\theta_3 - \theta_1$. A similar observation can also be made from Table 1.
Moreover, in Table 1, ANCOVA has larger standard deviations than ANOVA in estimating $\theta_3 - \theta_1$ even under balanced allocation, which is contrary to the results under two arms.

4. In the third simulation study, which uses stratified permuted block randomization (Table 2), including the additional covariate $W$ in the working model results in a smaller standard deviation for ANHECOVA in all our simulation settings. Interestingly, this is not always the case for ANCOVA.

5. In the second simulation study, which uses simple randomization (Table 1), the robust standard errors for all the model-assisted estimators are very close to their actual standard deviations, and the confidence intervals have the nominal coverage in all settings. Although this remains true for ANHECOVA in the third simulation study under covariate-adaptive randomization (Table 2), it is not the case for ANOVA and ANCOVA: standard errors valid under simple randomization overestimate the actual standard deviations, so the confidence intervals are conservative.

We further illustrate the different model-assisted procedures using a real data example. Chong et al. (2016) conducted a randomized experiment to evaluate the impact of low dietary iron intake on human capital attainment. They recruited students of age 11 to 19 in a rural area of Cajamarca, Peru, where many adolescents suffer from iron deficiency. The goal of this randomized trial is to quantify the causal effect of reduced adolescent anemia on school attainment. Using students' school grade as the covariate Z with five levels, a stratified permuted block randomization with 1:1:1 allocation was applied to assign 219 students to one of the following three promotional videos:

Video 1:
A popular soccer player is encouraging iron supplements to maximize energy;
Video 2:
A physician is encouraging iron supplements for overall health;
Video 3:
A dentist encouraging oral hygiene without mentioning iron at all.

Chong et al. (2016) investigated whether showing different promotional videos to the students can improve their academic performance through increased iron intake. Video 3 is treated as a "placebo". After the treatment assignments, four students were excluded from the analysis for various reasons; we likewise exclude them from our analysis. The dataset is publicly available.

Chong et al. (2016) used various outcomes in their analysis; here we focus on one of their primary outcomes, academic achievement, as an example. In this trial, academic achievement is measured by a standardized average of the student's grades in math, foreign language, social sciences, science, and communications in a semester. For the model-assisted approaches, we use the baseline anemia status as the covariate in our working models (2) and (3), as it is believed to moderate the treatment effect (Chong et al., 2016).

Table 3 reports the analysis results obtained by the different model-assisted procedures. As in our simulation studies, the standard errors for ANOVA and ANCOVA are computed using the estimator based on Theorem 1 for simple randomization, even though the randomization scheme here is covariate-adaptive. All the model-assisted estimators find similar effect sizes for the two contrasts (physician versus placebo, soccer star versus placebo), and the two ANHECOVA estimators have slightly smaller standard errors. Including baseline anemia status in the working model is useful for reducing the standard error. Compared to the placebo, the promotional video by the soccer player does not seem to have a positive effect on academic achievement. In contrast, the video of the physician promoting iron supplements appears to have a positive effect. The difference between ANHECOVA and ANOVA or ANCOVA, and between including and not including anemia status, can be seen from the magnitude of the corresponding p-values.
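For readers who want to reproduce this kind of robust inference, the standard error for a contrast $\theta_t - \theta_s$ implied by (19) can be sketched as follows. This is a hypothetical Python sketch (the paper provides the R package RobinCar for actual use); the helper name `robust_se_contrast` is ours, and the formula includes the $\hat B^{T}\hat\Sigma_X\hat B$ correction for covariate centering discussed above.

```python
import numpy as np

def robust_se_contrast(y, a, x, t, s):
    """Robust SE for theta_t - theta_s (sketch of (19)):
    SE^2 = n^{-1} [ pi_t^{-1} S_t + pi_s^{-1} S_s
                    + (b_t - b_s)' Sigma_X (b_t - b_s) ],
    where S_t is the sample variance of the arm-t residuals Y - b_t' X
    and Sigma_X is the full-sample covariance matrix of X."""
    y, a = np.asarray(y, float), np.asarray(a)
    x = np.atleast_2d(np.asarray(x, float).T).T   # shape (n, p)
    n, p = x.shape
    sigma_x = np.cov(x, rowvar=False).reshape(p, p)

    def arm(tt):
        idx = (a == tt)
        yt, xt = y[idx], x[idx]
        xc = xt - xt.mean(axis=0)
        b = np.linalg.lstsq(xc, yt - yt.mean(), rcond=None)[0]
        resid = yt - xt @ b
        return b, resid.var(ddof=1), idx.mean()   # slope, S_t, pi_t-hat

    b_t, s_t, pi_t = arm(t)
    b_s, s_s, pi_s = arm(s)
    d = b_t - b_s
    v = s_t / pi_t + s_s / pi_s + d @ sigma_x @ d
    return np.sqrt(v / n)
```

A 95% interval for the contrast is then the point estimate plus or minus 1.96 times this standard error, mirroring the confidence interval given earlier.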
To improve credibility and efficiency, we believe a clinical trial analysis can benefit from Principles 1 to 3 outlined in the Introduction. All three principles are fulfilled by the ANHECOVA estimator with all joint levels of Z included in the heterogeneous working model (3), coupled with the robust variance estimator given by (19). Thus we believe it deserves wider usage in clinical trial practice. Our theory shows that any other estimator of the form (11), including ANOVA, ANCOVA using model (2), and ANHECOVA not adjusting for all joint levels of Z, suffers from invalidity, inefficiency, or non-universality, in the sense that the asymptotic distribution of the estimator depends on the particular randomization scheme.

Our model-assisted asymptotic theory is given in terms of the joint asymptotic distribution in estimating $\theta$, the vector of mean responses, with multiple treatment arms randomized by a wide range of covariate-adaptive randomization schemes. It can be readily used for inference about linear or nonlinear functions of $\theta$, with either continuous or discrete responses. Although working models (2) and (3) are not commonly used for discrete responses, ANHECOVA is still asymptotically valid because it is model-assisted. For binary responses, a popular model is logistic regression. However, if X has a continuous component, standard logistic regression inference is model-based rather than model-assisted and, thus, may be invalid if the logistic model is not correctly specified.

In applications, one may be interested in comparing just two treatments even though the trial contains more than two treatment arms. A simple analysis is to ignore the data from the other arms and apply inference procedures to the two arms of interest. For ANOVA, this is equivalent to using all the arms, since ANOVA does not borrow strength from other arms through covariates. However, using data from two arms only is not recommended when ANHECOVA is applied, because ANHECOVA can utilize covariate data from the other arms to gain efficiency.
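The claim that ANHECOVA borrows covariate information from arms not being compared can be checked with a toy Monte Carlo. The sketch below is hypothetical Python (not the paper's R code): it estimates $\theta_2 - \theta_1$ by ANHECOVA twice, once centering the covariate at its mean over all three arms and once over arms 1 and 2 only, and compares the empirical standard deviations; all data-generating choices here are our own.

```python
import numpy as np

def est_contrast(y, a, x, arms):
    """ANHECOVA estimate of theta_{arms[1]} - theta_{arms[0]} using only
    patients in `arms`; X is centered at its mean over those arms."""
    keep = np.isin(a, arms)
    y, a, x = y[keep], a[keep], x[keep]
    xbar = x.mean()
    theta = {}
    for t in (arms[0], arms[1]):
        yt, xt = y[a == t], x[a == t]
        b = np.cov(xt, yt)[0, 1] / xt.var(ddof=1)  # within-arm OLS slope
        theta[t] = yt.mean() + b * (xbar - xt.mean())
    return theta[arms[1]] - theta[arms[0]]

rng = np.random.default_rng(1)
slopes = {1: 0.0, 2: 2.0, 3: 1.0}  # strong treatment-by-covariate interaction
est_all, est_two = [], []
for _ in range(500):
    a = np.repeat([1, 2, 3], 100)
    x = rng.normal(size=300)
    y = a + np.array([slopes[t] for t in a]) * x + 0.1 * rng.normal(size=300)
    est_all.append(est_contrast(y, a, x, arms=[1, 2, 3]))  # arm-3 X sharpens the center
    est_two.append(est_contrast(y, a, x, arms=[1, 2]))
sd_all, sd_two = np.std(est_all), np.std(est_two)
```

Because the slopes of the two compared arms differ, the term $(\beta_2-\beta_1)^{T}\bar X$ contributes to the contrast's variance, and the pooled covariate mean over all three arms is more precise than the two-arm mean, so `sd_all` comes out smaller than `sd_two` in this setup.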
Regarding ANCOVA, there is no definite ordering of efficiency between using the whole dataset and using data from the two given arms, since using more covariate data in ANCOVA may increase or decrease efficiency.

Our Theorem 2 can also be applied to rerandomization schemes (Li et al., 2018; Li and Ding, 2020) using discrete covariates. Rerandomization also tries to balance the treatment assignments across levels of Z, but it randomizes the treatment assignments for all patients simultaneously. This is different from covariate-adaptive randomization schemes, which assign treatments sequentially. For two-armed trials, Li et al. (2018, Corollaries 1 and 2) have shown that rerandomization satisfies (C2), but it is not yet known whether it satisfies (C3). Similar results for model-assisted inference can also be found in Li and Ding (2020, Theorem 3).

As a final cautionary note, standard software does not produce asymptotically valid standard errors for model-assisted inference. We implement an R package called RobinCar to compute the model-assisted estimators and their robust standard errors, which is available at https://github.com/tye27/RobinCar .

SUPPLEMENTARY MATERIAL
The supplementary material contains all technical proofs.

References

Baldi Antognini, A. and Zagoraiou, M. (2015). On the almost sure convergence of adaptive allocation procedures. Bernoulli, 21(2):881–908.

Bugni, F. A., Canay, I. A., and Shaikh, A. M. (2018). Inference under covariate-adaptive randomization. Journal of the American Statistical Association, 113(524):1784–1796.

Bugni, F. A., Canay, I. A., and Shaikh, A. M. (2019). Inference under covariate-adaptive randomization with multiple treatments. Quantitative Economics, 10(4):1747–1785.

Cassel, C. M., Särndal, C. E., and Wretman, J. H. (1976). Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika, 63(3):615–620.

Chong, A., Cohen, I., Field, E., Nakasone, E., and Torero, M. (2016). Iron deficiency and schooling attainment in Peru. American Economic Journal: Applied Economics, 8(4):222–255.

Ciolino, J. D., Palac, H. L., Yang, A., Vaca, M., and Belli, H. M. (2019). Ideal vs. real: a systematic review on handling covariates in randomized controlled trials. BMC Medical Research Methodology, 19(1):136.

Cochran, W. G. (1977). Sampling Techniques. Third Edition. Wiley, New York.

Dawid, A. P. (1979). Conditional independence in statistical theory. Journal of the Royal Statistical Society, Series B (Methodological), 41(1):1–31.

EMA (2015). Guideline on adjustment for baseline covariates in clinical trials. Committee for Medicinal Products for Human Use, European Medicines Agency.

FDA (2019). Adjusting for covariates in randomized clinical trials for drugs and biologics with continuous outcomes. Draft Guidance for Industry. Center for Drug Evaluation and Research and Center for Biologics Evaluation and Research, Food and Drug Administration, U.S. Department of Health and Human Services.

FDA (2020). COVID-19: Developing drugs and biological products for treatment or prevention. Guidance for Industry. Center for Drug Evaluation and Research and Center for Biologics Evaluation and Research, Food and Drug Administration, U.S. Department of Health and Human Services.

Fisher, R. A. (1935). The Design of Experiments. Oliver and Boyd, Edinburgh.

Freedman, D. A. (2008). On regression adjustments in experiments with several treatments. Annals of Applied Statistics, 2(1):176–196.

Fuller, W. A. (2009). Sampling Statistics. Wiley, New York.

Han, B., Enas, N. H., and McEntegart, D. (2009). Randomization by minimization for unbalanced treatment allocation. Statistics in Medicine, 28(27):3329–3346.

Hu, F. and Zhang, L.-X. (2020). On the theory of covariate-adaptive designs. arXiv preprint arXiv:2004.02994.

Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 221–233. University of California Press.

ICH E9 (R1) (2019). Addendum on estimands and sensitivity analysis in clinical trials to the guideline on statistical principles for clinical trials E9(R1). International Council for Harmonisation.

Kuznetsova, O. M. and Johnson, V. P. (2017). Approaches to expanding the two-arm biased coin randomization to unequal allocation while preserving the unconditional allocation ratio. Statistics in Medicine, 36(16):2483–2498.

Li, X. and Ding, P. (2020). Rerandomization and regression adjustment. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(1):241–268.

Li, X., Ding, P., and Rubin, D. B. (2018). Asymptotic theory of rerandomization in treatment–control experiments. Proceedings of the National Academy of Sciences, 115(37):9157.

Lin, W. (2013). Agnostic notes on regression adjustments to experimental data: Reexamining Freedman's critique. Annals of Applied Statistics, 7(1):295–318.

Liu, H. and Yang, Y. (2020). Regression-adjusted average treatment effect estimates in stratified randomized experiments. Biometrika.

Pocock, S. J. and Simon, R. (1975). Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics, 31(1):103–115.

Särndal, C.-E., Swensson, B., and Wretman, J. (2003). Model Assisted Survey Sampling. Springer Science & Business Media.

Shao, J. and Wang, S. (2014). Efficiency of model-assisted regression estimators in sample surveys. Statistica Sinica, 24(1):395–414.

Shao, J. and Yu, X. (2013). Validity of tests under covariate-adaptive biased coin randomization and generalized linear models. Biometrics, 69(4):960–969.

Shao, J., Yu, X., and Zhong, B. (2010). A theory for testing hypotheses under covariate-adaptive randomization. Biometrika, 97(2):347–360.

Ta, T., Shao, J., Li, Q., and Wang, L. (2020). Generalized regression estimators with high-dimensional covariates. Statistica Sinica, 30(3):1135–1154.

Taves, D. R. (1974). Minimization: A new method of assigning patients to treatment and control groups. Clinical Pharmacology and Therapeutics, 15(5):443–453.

White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4):817–838.

Ye, T. (2018). Testing hypotheses under covariate-adaptive randomisation and additive models. Statistical Theory and Related Fields, 2:96–101.

Zelen, M. (1974). The randomization and stratification of patients to clinical trials. Journal of Clinical Epidemiology, 27(7):365–375.

Figure 1:
Standard deviations of ANOVA, ANCOVA, and ANHECOVA estimators based on10,000 simulations
Table 1: Bias, standard deviation (SD), average standard error (SE), and coverage probability (CP) of the 95% asymptotic confidence interval under simple randomization and setup (21), based on 10,000 simulations

                                        θ2 − θ1                        θ3 − θ1
Allocation   n    Method       Bias     SD     SE     CP      Bias     SD     SE     CP
1:1:1       400   ANOVA        0.001   0.162  0.162  0.950   -0.002   0.137  0.138  0.952
                  ANCOVA       0.000   0.143  0.143  0.951   -0.003   0.138  0.138  0.950
                  ANHECOVA     0.001   0.140  0.140  0.953   -0.002   0.134  0.134  0.950
            200   ANOVA       -0.001   0.229  0.229  0.947    0.001   0.195  0.195  0.950
                  ANCOVA       0.000   0.204  0.203  0.946    0.002   0.196  0.195  0.949
                  ANHECOVA     0.000   0.200  0.198  0.945    0.002   0.191  0.190  0.949
1:2:2       400   ANOVA        0.001   0.173  0.172  0.952    0.001   0.155  0.154  0.948
                  ANCOVA       0.001   0.159  0.158  0.951    0.001   0.157  0.156  0.947
                  ANHECOVA     0.001   0.156  0.155  0.948    0.000   0.152  0.150  0.946
            200   ANOVA        0.001   0.245  0.244  0.946    0.004   0.219  0.218  0.946
                  ANCOVA       0.002   0.229  0.224  0.943    0.005   0.224  0.221  0.943
                  ANHECOVA     0.001   0.225  0.219  0.943    0.003   0.216  0.212  0.941
Table 2: Bias, standard deviation (SD), average standard error (SE), and coverage probability (CP) of the 95% asymptotic confidence interval under stratified permuted block randomization (block size = 6) and setup (22), based on 10,000 simulations

                                                  θ2 − θ1                        θ3 − θ1
Allocation   n    Method     X         Bias     SD     SE     CP      Bias     SD     SE     CP
1:1:1       400   ANOVA               -0.002   0.258  0.333  0.986   -0.002   0.163  0.195  0.980
                  ANCOVA     Z, U     -0.003   0.259  0.305  0.977   -0.002   0.161  0.228  0.994
                             Z, U, W  -0.017   0.246  0.292  0.977    0.003   0.178  0.239  0.989
                  ANHECOVA   Z, U     -0.004   0.258  0.257  0.949   -0.001   0.158  0.157  0.945
                             Z, U, W  -0.031   0.234  0.234  0.946    0.010   0.154  0.153  0.943
            200   ANOVA               -0.004   0.364  0.471  0.987   -0.002   0.233  0.276  0.979
                  ANCOVA     Z, U     -0.006   0.366  0.431  0.974   -0.002   0.231  0.323  0.994
                             Z, U, W  -0.036   0.349  0.411  0.971    0.008   0.257  0.337  0.989
                  ANHECOVA   Z, U     -0.009   0.364  0.363  0.944   -0.001   0.225  0.221  0.943
                             Z, U, W  -0.067   0.335  0.339  0.941    0.020   0.221  0.218  0.944
1:2:2       400   ANOVA                0.002   0.260  0.317  0.981   -0.001   0.176  0.199  0.974
                  ANCOVA     Z, U
                             Z, U, W  -0.014   0.253  0.284  0.970    0.004   0.196  0.258  0.992
                  ANHECOVA   Z, U     -0.001   0.258  0.259  0.945   -0.002   0.171  0.169  0.947
                             Z, U, W  -0.025   0.242  0.240  0.943    0.005   0.168  0.165  0.946
            200   ANOVA               -0.001   0.369  0.447  0.981    0.004   0.248  0.282  0.973
                  ANCOVA     Z, U     -0.003   0.370  0.415  0.971    0.005   0.248  0.345  0.994
                             Z, U, W  -0.031   0.360  0.400  0.970    0.016   0.279  0.364  0.990
                  ANHECOVA   Z, U     -0.006   0.367  0.364  0.946    0.005   0.242  0.237  0.946
                             Z, U, W  -0.050   0.344  0.343  0.943    0.017   0.239  0.233  0.943
Table 3: Estimate, standard error (SE), and p-value in the real data example analysis

                                          Physician versus placebo     Soccer star versus placebo
Method      X                          Estimate    SE    p-value     Estimate    SE    p-value
ANOVA                                     0.386  0.211    0.067       -0.068   0.205    0.739
ANCOVA      Grade                         0.403  0.203    0.046       -0.052   0.203    0.799
            Grade, Anemia status          0.437  0.199    0.028       -0.085   0.201    0.672
ANHECOVA    Grade                         0.409  0.200    0.041       -0.051   0.201    0.800
            Grade, Anemia status          0.481  0.193    0.013       -0.046   0.195    0.815

Supplementary Material: Principles for Covariate Adjustment in Analyzing
Randomized Clinical Trials
Lemma 2.
Let (C1) and (C2) be given and suppose that $P(A_i = a_t \mid Z_1,\dots,Z_n) = \pi_t$ for all $t = 1,\dots,k$ and $i = 1,\dots,n$.

(i) For any integrable function $f$, we have that
$$E\{f(Y_i^{(t)}, X_i)\} = E\{f(Y_i, X_i) \mid A_i = a_t\} \quad \text{and} \quad E\{f(Y_i^{(t)}, X_i)\mid X_i\} = E\{f(Y_i, X_i)\mid X_i, A_i = a_t\}.$$

(ii) Let $\theta = (E(Y^{(1)}),\dots,E(Y^{(k)}))^{\top}$ be the potential response mean vector, $\beta = \sum_{t=1}^k \pi_t\beta_t$, and $\beta_t = \Sigma_X^{-1}\,\mathrm{cov}(X_i, Y_i^{(t)})$, $t = 1,\dots,k$. Then
$$(\theta, \beta) = \arg\min_{(\vartheta,\, b)} E\big[\{Y_i - \vartheta^{\top} A_i - b^{\top}(X_i - \mu_X)\}^2\big]$$
and
$$(\theta, \beta_1,\dots,\beta_k) = \arg\min_{(\vartheta,\, b_1,\dots,b_k)} E\Big[\Big\{Y_i - \vartheta^{\top} A_i - \sum_{t=1}^k b_t^{\top}(X_i - \mu_X) I(A_i = a_t)\Big\}^2\Big].$$

The condition $P(A_i = a_t \mid Z_1,\dots,Z_n) = \pi_t$ for all $t$ and $i$ holds for most covariate-adaptive randomization schemes. Note that it does not exclude the possibility that the set of random variables $\{A_i,\, i = 1,\dots,n\}$ is dependent on $\{Z_i,\, i = 1,\dots,n\}$, which is indeed the case for covariate-adaptive randomization schemes. We impose this condition only in Lemma 2 to facilitate understanding the working models. This additional assumption is not needed for our asymptotic theory in Section 3, as condition (C2)(ii) is sufficient.

Proof. (i) We focus on proving the second result; the first result can be shown similarly. For simple randomization, this result immediately follows from (C2)(i), as the $(Y_i^{(1)},\dots,Y_i^{(k)}, X_i, A_i)$ are independent and identically distributed. For covariate-adaptive randomization, we remark that the property of conditional independence (Dawid, 1979, Lemma 4.3), (C2)(i), and the third condition in Lemma 2 imply that $A_i$ is independent of $\{(Y_i^{(1)},\dots,Y_i^{(k)}, X_i, Z_i),\, i = 1,\dots,n\}$.
Then, it can be shown that
$$\begin{aligned}
E\{f(Y_i, X_i)\mid X_i, A_i = a_t\} &= E\{f(Y_i^{(t)}, X_i)\mid X_i, A_i = a_t\}\\
&= \sum_{z_1,\dots,z_n\in\mathcal Z} E\{f(Y_i^{(t)}, X_i)\mid X_i, A_i = a_t, Z_1 = z_1,\dots,Z_n = z_n\}\,P(Z_1 = z_1,\dots,Z_n = z_n\mid X_i, A_i = a_t)\\
&= \sum_{z_1,\dots,z_n\in\mathcal Z} E\{f(Y_i^{(t)}, X_i)\mid X_i, Z_1 = z_1,\dots,Z_n = z_n\}\,P(Z_1 = z_1,\dots,Z_n = z_n\mid X_i, A_i = a_t)\\
&= \sum_{z_1,\dots,z_n\in\mathcal Z} E\{f(Y_i^{(t)}, X_i)\mid X_i, Z_1 = z_1,\dots,Z_n = z_n\}\,P(Z_1 = z_1,\dots,Z_n = z_n\mid X_i)\\
&= E\{f(Y_i^{(t)}, X_i)\mid X_i\}.
\end{aligned}$$
In the equalities above, we have used, respectively, the consistency of potential responses, the law of iterated expectation, (C2)(i), and the remark above.

(ii) We only prove the first result; the second result can be proved similarly. Notice that $(\theta, \beta)$ satisfies the following estimating equations:
$$E\big[I(A_i = a_t)\{Y_i - \theta^{\top}A_i - \beta^{\top}(X_i - \mu_X)\}\big] = 0 \quad \text{for any } t, \qquad (S1)$$
$$E\big[(X_i - \mu_X)\{Y_i - \theta^{\top}A_i - \beta^{\top}(X_i - \mu_X)\}\big] = 0. \qquad (S2)$$
From Lemma 2, (S1) implies that, for any $t$,
$$E\big[Y_i - \theta^{\top}A_i - \beta^{\top}(X_i - \mu_X)\mid A_i = a_t\big] = E\big[Y_i^{(t)} - \theta_t - \beta^{\top}(X_i - \mu_X)\big] = E\big[Y_i^{(t)} - \theta_t\big] = 0,$$
where $\theta_t = E(Y_i^{(t)})$. Then (S2) implies that
$$\begin{aligned}
0 &= E\big[(X_i - \mu_X)\{Y_i - \theta^{\top}A_i - \beta^{\top}(X_i - \mu_X)\}\big]\\
&= \sum_{t=1}^k E\big[I(A_i = a_t)(X_i - \mu_X)\{Y_i - \theta^{\top}A_i - \beta^{\top}(X_i - \mu_X)\}\big]\\
&= \sum_{t=1}^k E\big[(X_i - \mu_X)\{Y_i - \theta^{\top}A_i - \beta^{\top}(X_i - \mu_X)\}\mid A_i = a_t\big]\pi_t\\
&= \sum_{t=1}^k E\big[(X_i - \mu_X)\{Y_i^{(t)} - \theta_t - \beta^{\top}(X_i - \mu_X)\}\big]\pi_t\\
&= \sum_{t=1}^k \big\{\mathrm{cov}(X_i, Y_i^{(t)}) - \Sigma_X\beta\big\}\pi_t
= \sum_{t=1}^k \mathrm{cov}(X_i, Y_i^{(t)})\pi_t - \Sigma_X\beta,
\end{aligned}$$
and thus $\beta = \Sigma_X^{-1}\sum_{t=1}^k \mathrm{cov}(X_i, Y_i^{(t)})\pi_t = \sum_{t=1}^k \pi_t\beta_t$.

Lemma 3.
Under conditions (C1)-(C2), $\hat\beta_t = \beta_t + o_p(1)$, $t = 1,\dots,k$, and $\hat\beta = \beta + o_p(1)$.

Proof. We prove the result for $\hat\beta_t$; the proof for $\hat\beta$ is analogous and omitted. Notice that
$$\frac{1}{n_t}\sum_{i:A_i=a_t}(X_i - \bar X_t)Y_i = \frac{1}{n_t}\sum_{i=1}^n I(A_i = a_t)X_iY_i - \bigg\{\frac{1}{n_t}\sum_{i=1}^n I(A_i = a_t)X_i\bigg\}\bigg\{\frac{1}{n_t}\sum_{i=1}^n I(A_i = a_t)Y_i\bigg\}.$$
Let $\mathcal A = \{A_1,\dots,A_n\}$ and $\mathcal F = \{Z_1,\dots,Z_n\}$. Note that
$$E\bigg\{\frac1n\sum_{i=1}^n I(A_i = a_t)X_iY_i \,\Big|\, \mathcal A, \mathcal F\bigg\} = \frac1n\sum_{i=1}^n I(A_i = a_t)E(X_iY_i^{(t)}\mid \mathcal A, \mathcal F) = \frac1n\sum_{i=1}^n I(A_i = a_t)E(X_iY_i^{(t)}\mid Z_i),$$
where the second equality holds because $E(X_iY_i^{(t)}\mid \mathcal A, \mathcal F) = E(X_iY_i^{(t)}\mid \mathcal F) = E(X_iY_i^{(t)}\mid Z_i)$ from (C1) and (C2)(i). Moreover, $n^{-1}\sum_{i=1}^n I(A_i = a_t)X_iY_i^{(t)}$ becomes an average of independent random variables once we condition on $\{\mathcal A, \mathcal F\}$. From the existence of the second moment of $XY^{(t)}$ and the weak law of large numbers for independent random variables, we conclude that, for any $\epsilon > 0$,
$$\lim_{n\to\infty} P\bigg(\bigg|\frac1n\sum_{i=1}^n I(A_i = a_t)X_iY_i - \frac1n\sum_{i=1}^n I(A_i = a_t)E(X_iY_i^{(t)}\mid Z_i)\bigg| \ge \epsilon \,\Big|\, \mathcal A, \mathcal F\bigg) = 0.$$
From the bounded convergence theorem, the above equation also holds unconditionally. In other words,
$$\frac1n\sum_{i=1}^n I(A_i = a_t)X_iY_i - \frac1n\sum_{i=1}^n I(A_i = a_t)E(X_iY_i^{(t)}\mid Z_i) = o_p(1).$$
Furthermore,
$$\begin{aligned}
\frac1n\sum_{i=1}^n I(A_i = a_t)E(X_iY_i^{(t)}\mid Z_i)
&= \frac1n\sum_{z}\sum_{i=1}^n I(Z_i = z)I(A_i = a_t)E(X_iY_i^{(t)}\mid Z_i = z)\\
&= \frac1n\sum_{z} E(X_iY_i^{(t)}\mid Z_i = z)\sum_{i=1}^n I(Z_i = z)I(A_i = a_t)\\
&= \frac1n\sum_{z} E(X_iY_i^{(t)}\mid Z_i = z)\,n_t(z)\\
&= \sum_{z} E(X_iY_i^{(t)}\mid Z_i = z)\,\frac{n_t(z)}{n(z)}\,\frac{n(z)}{n}\\
&= \sum_{z} E(X_iY_i^{(t)}\mid Z_i = z)\,\pi_t\,P(Z_i = z) + o_p(1)\\
&= \pi_t\,E(X_iY_i^{(t)}) + o_p(1).
\end{aligned}$$
This, together with the fact that $n_t/n = \sum_z n_t(z)/\sum_z n(z) = \pi_t + o_p(1)$, gives
$$\frac{1}{n_t}\sum_{i=1}^n I(A_i = a_t)X_iY_i = E(X_iY_i^{(t)}) + o_p(1).$$
Similarly, we can show that the result with $X_iY_i$ replaced by $X_i$ or $Y_i$ also holds, i.e.,
$$\frac{1}{n_t}\sum_{i=1}^n I(A_i = a_t)X_i = E(X_i) + o_p(1), \qquad \frac{1}{n_t}\sum_{i=1}^n I(A_i = a_t)Y_i = E(Y_i^{(t)}) + o_p(1).$$
The denominator of $\hat\beta_t$ can be treated similarly, which leads to
$$\frac{1}{n_t}\sum_{i:A_i=a_t}(X_i - \bar X_t)(X_i - \bar X_t)^{\top} = \Sigma_X + o_p(1).$$
The proof is completed by using the definition of $\beta_t$.

Under simple randomization, $\mathcal A = (A_1,\dots,A_n)$ is independent of the other variables. We decompose
$$\bar X_t - \bar X = \bar X_t - \frac1n\sum_{i=1}^n X_i = \bar X_t - \frac1n\bigg(\sum_{i:A_i=a_t}X_i + \sum_{i:A_i\ne a_t}X_i\bigg) = \bar X_t - \frac{n_t}{n}\bar X_t - \frac1n\sum_{i:A_i\ne a_t}X_i = \frac{n-n_t}{n}\,(\bar X_t - \bar X_{-t}),$$
where $\bar X_{-t} = (n-n_t)^{-1}\sum_{i:A_i\ne a_t}X_i$.

First, we show that $\bar X_t$ and $\bar X_{-t}$ are uncorrelated conditional on $\mathcal A$:
$$\mathrm{cov}(\bar X_t, \bar X_{-t}\mid \mathcal A) = \frac{1}{(n-n_t)\,n_t}\sum_{i=1}^n\sum_{j=1}^n I(A_i = a_t)I(A_j \ne a_t)\,\mathrm{cov}(X_i, X_j\mid \mathcal A) = 0,$$
where the last equality is from $\mathrm{cov}(X_i, X_j\mid \mathcal A) = \mathrm{cov}(X_i, X_j) = 0$ for $i\ne j$.
Similarly, we can show that $\bar Y_t$ and $\bar X_{-t}$ are uncorrelated conditional on $A$.

Then,
\begin{align*}
\operatorname{cov}\{\beta_t^\top(\bar X_t-\bar X),\bar Y_t\}
&=\beta_t^\top\operatorname{cov}\Big(\frac{n-n_t}{n}\bar X_t-\frac{n-n_t}{n}\bar X_{-t},\,\bar Y_t\Big)\\
&=\beta_t^\top\operatorname{cov}\Big(\frac{n-n_t}{n}\bar X_t,\,\bar Y_t\Big)\\
&=\beta_t^\top\operatorname{cov}\Big(\frac{n-n_t}{n}\frac{\sum_{i:A_i=a_t}X_i}{n_t},\,\frac{\sum_{i:A_i=a_t}Y_i}{n_t}\Big)\\
&=\beta_t^\top E\Big\{\operatorname{cov}\Big(\frac{n-n_t}{n}\frac{\sum_{i:A_i=a_t}X_i}{n_t},\,\frac{\sum_{i:A_i=a_t}Y_i}{n_t}\,\Big|\,A\Big)\Big\}\\
&=\beta_t^\top E\Big\{\frac{n-n_t}{nn_t^2}\operatorname{cov}\Big(\sum_{i:A_i=a_t}X_i,\,\sum_{i:A_i=a_t}Y_i\,\Big|\,A\Big)\Big\}\\
&=\beta_t^\top E\Big\{\frac{n-n_t}{nn_t^2}\sum_{i:A_i=a_t}\operatorname{cov}\big(X_i,Y_i^{(t)}\big)\Big\}\\
&=\beta_t^\top E\Big\{\frac{n-n_t}{nn_t}\Big\}\operatorname{cov}\big(X_i,Y_i^{(t)}\big)\\
&=E\Big\{\frac{n-n_t}{nn_t}\Big\}\beta_t^\top\Sigma_X\beta_t,
\end{align*}
where the second equality is from $\operatorname{cov}(\bar X_{-t},\bar Y_t\mid A)=0$ and $E(\bar Y_t\mid A)=E(Y^{(t)})$, and the fourth equality is from the identity $\operatorname{cov}(X,Y)=E\{\operatorname{cov}(X,Y\mid Z)\}+\operatorname{cov}\{E(X\mid Z),E(Y\mid Z)\}$ with $A$ in place of $Z$.

Also note that
\begin{align*}
\operatorname{var}\{\beta_t^\top(\bar X_t-\bar X)\}
&=\beta_t^\top\operatorname{var}\Big(\frac{n-n_t}{n}(\bar X_t-\bar X_{-t})\Big)\beta_t\\
&=\beta_t^\top E\Big(\frac{(n-n_t)^2}{n^2}\operatorname{var}(\bar X_t-\bar X_{-t}\mid A)\Big)\beta_t\\
&=\beta_t^\top E\Big(\frac{(n-n_t)^2}{n^2}\{\operatorname{var}(\bar X_t\mid A)+\operatorname{var}(\bar X_{-t}\mid A)\}\Big)\beta_t\\
&=\beta_t^\top E\Big(\frac{(n-n_t)^2}{n^2}\Big\{\frac{\operatorname{var}(X_i)}{n_t}+\frac{\operatorname{var}(X_i)}{n-n_t}\Big\}\Big)\beta_t\\
&=E\Big\{\frac{n-n_t}{nn_t}\Big\}\beta_t^\top\Sigma_X\beta_t,
\end{align*}
where the second equality uses $\operatorname{var}(X)=E\{\operatorname{var}(X\mid Z)\}+\operatorname{var}\{E(X\mid Z)\}$ (with $A$ in place of $Z$) and $E(\bar X_t-\bar X_{-t}\mid A)=E(X_i)-E(X_i)=0$.

For any fixed $k$-dimensional vector $\ell=(\ell_1,\ldots,\ell_k)^\top$, we have
\begin{align*}
\ell^\top\{\operatorname{diag}(\pi_t^{-1}m_t^\top m_t)-M^\top M\}\ell
&=\sum_{t=1}^k\pi_t^{-1}\ell_t^2\, m_t^\top m_t-\Big\{\sum_{t=1}^k\ell_t m_t^\top\Big\}\Big\{\sum_{t=1}^k\ell_t m_t\Big\}\\
&=E(Q^\top Q)-E(Q^\top)E(Q)\\
&=\operatorname{tr}\{E(QQ^\top)\}-\operatorname{tr}\{E(Q)E(Q^\top)\}\\
&\geq 0,
\end{align*}
where $\operatorname{tr}$ denotes the trace of a matrix, $Q$ denotes a $p$-dimensional random vector that takes value $\pi_t^{-1}\ell_t m_t$ with probability $\pi_t$, $t=1,\ldots,k$, and the last inequality follows from the fact that the covariance matrix $\operatorname{var}(Q)=E(QQ^\top)-E(Q)E(Q^\top)$ is positive semidefinite.

(i) First, from $\bar X_t-\bar X=O_p(n^{-1/2})$ and $\hat b_t=b_t+o_p(1)$, we have
\[
\hat\theta(\hat b_1,\ldots,\hat b_k)=\hat\theta(b_1,\ldots,b_k)+\{(\bar X_1-\bar X)^\top(b_1-\hat b_1),\ldots,(\bar X_k-\bar X)^\top(b_k-\hat b_k)\}^\top=\hat\theta(b_1,\ldots,b_k)+o_p(n^{-1/2}).
\]
Then, we decompose $\hat\theta(b_1,\ldots,b_k)$ as
\[
\hat\theta(b_1,\ldots,b_k)-\theta=
\underbrace{\begin{pmatrix}\bar Y_1-\theta_1-(\bar X_1-\mu_X)^\top b_1\\ \vdots\\ \bar Y_k-\theta_k-(\bar X_k-\mu_X)^\top b_k\end{pmatrix}}_{M_1}
+\underbrace{\begin{pmatrix}(\bar X-\mu_X)^\top b_1\\ \vdots\\ (\bar X-\mu_X)^\top b_k\end{pmatrix}}_{M_2},
\]
where the $t$th element of $M_1$ equals
\[
M_{1t}=\bar Y_t-\theta_t-(\bar X_t-\mu_X)^\top b_t=\frac{1}{n_t}\sum_{i:A_i=a_t}\big\{Y_i-\theta_t-(X_i-\mu_X)^\top b_t\big\}.
\]
We have $E(M_{1t}\mid A)=0$,
\[
\operatorname{var}(M_{1t}\mid A)=n_t^{-1}\operatorname{var}\{Y^{(t)}-(X-\mu_X)^\top b_t\}=(n\pi_t)^{-1}\operatorname{var}\{Y^{(t)}-X^\top b_t\}+o_p(n^{-1}),
\]
and $\operatorname{cov}(M_{1t},M_{1s}\mid A)=0$ for $t\neq s$, where $A=\{A_1,\ldots,A_n\}$. Hence, $\operatorname{var}(M_1\mid A)$ is a diagonal matrix, with the diagonal elements being $\operatorname{var}(M_{1t}\mid A)$, $t=1,\ldots,k$. From the central limit theorem, we have that as $n\to\infty$,
\[
\{\operatorname{var}(M_1\mid A)\}^{-1/2}M_1\,\big|\,A\xrightarrow{d}N(0,I_k),
\]
which implies the unconditional result $\{\operatorname{var}(M_1\mid A)\}^{-1/2}M_1\xrightarrow{d}N(0,I_k)$, with
\[
n\operatorname{var}(M_1\mid A)=\operatorname{diag}\{\pi_t^{-1}\operatorname{var}(Y^{(t)}-X^\top b_t)\}+o_p(1).
\]
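The trace inequality in Lemma 1 above can be verified numerically; the dimensions, probabilities $\pi_t$, and random vectors below are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
k, p = 4, 3
pi = np.array([0.1, 0.2, 0.3, 0.4])   # pi_t > 0, summing to 1
M = rng.standard_normal((p, k))       # column t is m_t
ell = rng.standard_normal(k)          # fixed vector (ell_1, ..., ell_k)

# left-hand side: ell^T { diag(pi_t^{-1} m_t^T m_t) - M^T M } ell
D = np.diag((M * M).sum(axis=0) / pi)
quad = ell @ (D - M.T @ M) @ ell

# Q takes value pi_t^{-1} ell_t m_t with probability pi_t
Q = M * (ell / pi)                    # column t is pi_t^{-1} ell_t m_t
E_QQt = (Q * pi) @ Q.T                # E(Q Q^T) = sum_t pi_t Q_t Q_t^T
E_Q = Q @ pi                          # E(Q)     = sum_t pi_t Q_t = sum_t ell_t m_t

# quad = tr E(QQ^T) - tr{E(Q) E(Q)^T} = tr var(Q) >= 0
assert np.isclose(quad, np.trace(E_QQt) - E_Q @ E_Q)
assert quad >= 0
```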
From the Slutsky theorem, we conclude that
\[
\sqrt{n}\,M_1\xrightarrow{d}N\big(0,\operatorname{diag}\{\pi_t^{-1}\operatorname{var}(Y^{(t)}-X^\top b_t)\}\big).
\]
Then, consider $M_2$, which can be reformulated as
\[
M_2=\underbrace{\begin{pmatrix}b_1^\top & &\\ &\ddots&\\ & & b_k^\top\end{pmatrix}}_{k\times(kp)}\underbrace{\begin{pmatrix}\bar X-\mu_X\\ \vdots\\ \bar X-\mu_X\end{pmatrix}}_{(kp)\times 1}.
\]
From the central limit theorem, we can easily derive that $\sqrt{n}\,M_2\xrightarrow{d}N(0,B^\top\Sigma_X B)$, where $B=(b_1,\ldots,b_k)$.

Next, consider $\operatorname{cov}(M_1,M_2)$, whose $(t,s)$ element equals
\begin{align*}
\operatorname{cov}\{\bar Y_t-\theta_t-(\bar X_t-\mu_X)^\top b_t,\,(\bar X-\mu_X)^\top b_s\}
&=E\big\{\operatorname{cov}(\bar Y_t-\theta_t-(\bar X_t-\mu_X)^\top b_t,\,(\bar X-\mu_X)^\top b_s\mid A)\big\}\\
&=E\Big\{\frac{1}{n_tn}\sum_{i:A_i=a_t}\sum_{j=1}^n\operatorname{cov}(Y_i^{(t)}-X_i^\top b_t,\,X_j^\top b_s)\Big\}\\
&=E\Big\{\frac{1}{n_tn}\sum_{i:A_i=a_t}\operatorname{cov}(Y_i^{(t)}-X_i^\top b_t,\,X_i^\top b_s)\Big\}\\
&=\frac{1}{n}\operatorname{cov}(Y^{(t)}-X^\top b_t,\,X^\top b_s)\\
&=\frac{1}{n}\big\{\operatorname{cov}(Y^{(t)},X^\top b_s)-\operatorname{cov}(X^\top b_t,X^\top b_s)\big\}\\
&=\frac{1}{n}\big\{\beta_t^\top\Sigma_X b_s-b_t^\top\Sigma_X b_s\big\}\\
&=\frac{1}{n}(\beta_t-b_t)^\top\Sigma_X b_s,
\end{align*}
where the first equality is from $E\{\bar Y_t-\theta_t-(\bar X_t-\mu_X)^\top b_t\mid A\}=E\{(\bar X-\mu_X)^\top b_s\mid A\}=0$. Thus, writing $\tilde B=(\beta_1,\ldots,\beta_k)$, we have $n\operatorname{cov}(M_1,M_2)=(\tilde B-B)^\top\Sigma_X B$ and $n\operatorname{cov}(M_2,M_1)=B^\top\Sigma_X(\tilde B-B)$.

Combining the above results, we conclude that $\sqrt{n}\{\hat\theta(b_1,\ldots,b_k)-\theta\}$ is asymptotically normal with mean $0$ and variance $V_{SR}(B)$,
\begin{align*}
V_{SR}(B)&=\operatorname{diag}\{\pi_t^{-1}\operatorname{var}(Y^{(t)}-X^\top b_t)\}+(\tilde B-B)^\top\Sigma_X B+B^\top\Sigma_X(\tilde B-B)+B^\top\Sigma_X B\\
&=\operatorname{diag}\{\pi_t^{-1}\operatorname{var}(Y^{(t)}-X^\top b_t)\}+\tilde B^\top\Sigma_X B+B^\top\Sigma_X\tilde B-B^\top\Sigma_X B.
\end{align*}

(ii) Note that
\[
V_{SR}(B)-V_{SR}(\tilde B)=\operatorname{diag}\{\pi_t^{-1}\operatorname{var}(Y^{(t)}-b_t^\top X)\}-\operatorname{diag}\{\pi_t^{-1}\operatorname{var}(Y^{(t)}-\beta_t^\top X)\}-(\tilde B-B)^\top\Sigma_X(\tilde B-B).
\]
Because
\begin{align*}
\operatorname{var}(Y^{(t)}-b_t^\top X)&=\operatorname{var}(Y^{(t)}-\beta_t^\top X+\beta_t^\top X-b_t^\top X)\\
&=\operatorname{var}\{Y^{(t)}-\beta_t^\top X\}+\operatorname{var}\{(\beta_t-b_t)^\top X\}+2\operatorname{cov}\{Y^{(t)}-\beta_t^\top X,\,(\beta_t-b_t)^\top X\}\\
&=\operatorname{var}\{Y^{(t)}-\beta_t^\top X\}+(\beta_t-b_t)^\top\Sigma_X(\beta_t-b_t),
\end{align*}
where the covariance term vanishes since $\operatorname{cov}(Y^{(t)},X)=\Sigma_X\beta_t$ by the definition of $\beta_t$, we have
\[
V_{SR}(B)-V_{SR}(\tilde B)=\operatorname{diag}\{\pi_t^{-1}(\beta_t-b_t)^\top\Sigma_X(\beta_t-b_t)\}-(\tilde B-B)^\top\Sigma_X(\tilde B-B).
\]
The rest follows from applying Lemma 1 with $M=\Sigma_X^{1/2}(\tilde B-B)$.

From Lemma 3, we know that $\hat\beta=\beta+o_p(1)$ and $\hat\beta_t=\beta_t+o_p(1)$, $t=1,\ldots,k$.
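As a numerical sanity check (not part of the paper), the variance comparison in (ii) can be verified directly. The sketch below uses illustrative values, with the columns of `Bbeta` playing the role of the projection coefficients $\beta_t$ and the columns of `B` the working coefficients $b_t$:

```python
import numpy as np

rng = np.random.default_rng(1)
k, p = 3, 2
pi = np.array([0.2, 0.3, 0.5])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])            # Sigma_X, positive definite
Bbeta = rng.standard_normal((p, k))       # column t is beta_t (projection coefficients)
B = rng.standard_normal((p, k))           # column t is b_t (working coefficients)
sig2 = np.array([1.0, 2.0, 0.5])          # var(Y^(t) - X^T beta_t)

def V_SR(Bw):
    # var(Y^(t) - X^T b_t) = sig2_t + (beta_t - b_t)^T Sigma (beta_t - b_t)
    resid = np.array([sig2[t] + (Bbeta[:, t] - Bw[:, t]) @ Sigma @ (Bbeta[:, t] - Bw[:, t])
                      for t in range(k)])
    return (np.diag(resid / pi) + Bbeta.T @ Sigma @ Bw
            + Bw.T @ Sigma @ Bbeta - Bw.T @ Sigma @ Bw)

diff = V_SR(B) - V_SR(Bbeta)
claim = (np.diag([(Bbeta[:, t] - B[:, t]) @ Sigma @ (Bbeta[:, t] - B[:, t]) / pi[t]
                  for t in range(k)])
         - (Bbeta - B).T @ Sigma @ (Bbeta - B))
assert np.allclose(diff, claim)                  # the displayed difference identity
assert np.linalg.eigvalsh(diff).min() > -1e-8    # Lemma 1: the difference is psd
```

The last assertion reflects the conclusion that using the projection coefficients is optimal: the asymptotic variance with any other working coefficients exceeds it by a positive semidefinite matrix.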
Let $\sigma_A^2$, $\sigma_B^2$, $\sigma_U^2$ respectively be the asymptotic variances of $c_{ts}^\top\hat\theta_{\mathrm{ANHC}}$, $c_{ts}^\top\hat\theta_{\mathrm{ANC}}$ and $c_{ts}^\top\hat\theta_{\mathrm{AN}}$, where from Theorem 1,
\begin{align*}
\sigma_A^2&=\frac{\operatorname{var}(Y^{(t)}-X^\top\beta_t)}{\pi_t}+\frac{\operatorname{var}(Y^{(s)}-X^\top\beta_s)}{\pi_s}+(\beta_t-\beta_s)^\top\Sigma_X(\beta_t-\beta_s),\\
\sigma_B^2&=\frac{\operatorname{var}\{Y^{(t)}-X^\top\beta\}}{\pi_t}+\frac{\operatorname{var}\{Y^{(s)}-X^\top\beta\}}{\pi_s},\\
\sigma_U^2&=\frac{\operatorname{var}(Y^{(t)})}{\pi_t}+\frac{\operatorname{var}(Y^{(s)})}{\pi_s}.
\end{align*}
The result in Corollary 1(i) follows from
\begin{align*}
\sigma_A^2-\sigma_U^2&=\frac{\beta_t^\top\Sigma_X\beta_t-2\operatorname{cov}(X^\top\beta_t,Y^{(t)})}{\pi_t}+\frac{\beta_s^\top\Sigma_X\beta_s-2\operatorname{cov}(X^\top\beta_s,Y^{(s)})}{\pi_s}+(\beta_t-\beta_s)^\top\Sigma_X(\beta_t-\beta_s)\\
&=\frac{\beta_t^\top\Sigma_X\beta_t-2\beta_t^\top\Sigma_X\beta_t}{\pi_t}+\frac{\beta_s^\top\Sigma_X\beta_s-2\beta_s^\top\Sigma_X\beta_s}{\pi_s}+(\beta_t-\beta_s)^\top\Sigma_X(\beta_t-\beta_s)\\
&=-\frac{\beta_t^\top\Sigma_X\beta_t}{\pi_t}-\frac{\beta_s^\top\Sigma_X\beta_s}{\pi_s}+(\beta_t-\beta_s)^\top\Sigma_X(\beta_t-\beta_s)\\
&=-\frac{(\pi_s\beta_t+\pi_t\beta_s)^\top\Sigma_X(\pi_s\beta_t+\pi_t\beta_s)}{\pi_t\pi_s(\pi_t+\pi_s)}-(\beta_t-\beta_s)^\top\Sigma_X(\beta_t-\beta_s)\Big(\frac{1-\pi_t-\pi_s}{\pi_t+\pi_s}\Big),
\end{align*}
where the second equality uses $\beta_t=\Sigma_X^{-1}\operatorname{cov}(X,Y^{(t)})$. This also proves that $\sigma_A^2\leq\sigma_U^2$, because $\Sigma_X$ is positive definite and $\pi_t+\pi_s\leq 1$. If $\sigma_A^2=\sigma_U^2$, then we must have $\pi_s\beta_t+\pi_t\beta_s=0$ and $(1-\pi_t-\pi_s)(\beta_t-\beta_s)=0$.

To show the results in Corollary 1(ii), notice that
\begin{align*}
\sigma_B^2&=\frac{\operatorname{var}\{Y^{(t)}-X^\top\beta_t+X^\top\beta_t-X^\top\beta\}}{\pi_t}+\frac{\operatorname{var}\{Y^{(s)}-X^\top\beta_s+X^\top\beta_s-X^\top\beta\}}{\pi_s}\\
&=\frac{\operatorname{var}\{Y^{(t)}-X^\top\beta_t\}+\operatorname{var}\{X^\top\beta_t-X^\top\beta\}}{\pi_t}+\frac{\operatorname{var}\{Y^{(s)}-X^\top\beta_s\}+\operatorname{var}\{X^\top\beta_s-X^\top\beta\}}{\pi_s},
\end{align*}
where the second equality is because
\[
\operatorname{cov}\{Y^{(t)}-\beta_t^\top X,\,\beta_t^\top X-\beta^\top X\}=\operatorname{cov}\{Y^{(t)}-\beta_t^\top X,\,X^\top\}(\beta_t-\beta)=\{\operatorname{cov}(Y^{(t)},X^\top)-\beta_t^\top\Sigma_X\}(\beta_t-\beta)=0.
\]
Then,
\[
\sigma_A^2-\sigma_B^2=(\beta_t-\beta_s)^\top\Sigma_X(\beta_t-\beta_s)-\frac{(\beta_t-\beta)^\top\Sigma_X(\beta_t-\beta)}{\pi_t}-\frac{(\beta_s-\beta)^\top\Sigma_X(\beta_s-\beta)}{\pi_s}.
\]
In order to show that $\sigma_A^2-\sigma_B^2\leq$