Envelopes for multivariate linear regression with linearly constrained coefficients
EEnvelopes for multivariate linear regression withlinearly constrained coefficients
R. Dennis Cook, * Liliana Forzani † and Lan Liu ‡ January 5, 2021
Abstract
A constrained multivariate linear model is a multivariate linear model with the columns ofits coefficient matrix constrained to lie in a known subspace. This class of models includesthose typically used to study growth curves and longitudinal data. Envelope methods havebeen proposed to improve estimation efficiency in the class of unconstrained multivariate linearmodels, but have not yet been developed for constrained models that we develop in this article.We first compare the standard envelope estimator based on an unconstrained multivariatemodel with the standard estimator arising from a constrained multivariate model in terms ofbias and efficiency. Then, to further improve efficiency, we propose a novel envelope estimatorbased on a constrained multivariate model. Novel envelope-based testing methods are also * R. Dennis Cook is Professor, School of Statistics, University of Minnesota, Minneapolis, MN 55455(E-mail: [email protected]). † Liliana Forzani is Professor, Facultad de Ingenier´ıa Qu´ımica, UNL. Researcher of CONICET, Santa Fe,Argentina (E-mail: [email protected]). ‡ Lan Liu is Assistant Professor, School of Statistics, University of Minnesota, Minneapolis, MN 55455(E-mail: [email protected]). a r X i v : . [ s t a t . M E ] J a n roposed. We provide support for our proposals by simulations and by studying the classicaldental data and data from the China Health and Nutrition Survey and a study of probioticcapacity to reduced Salmonella infection . Key Words: Growth curves, envelope models, repeated measures
Consider the multivariate linear regression model Y i = β + β X i + ε i , i = 1 , . . . , n, (1.1)where the stochastic response Y i ∈ R r , the non-stochastic predictor vectors X i ∈ R p , β ∈ R r ,and the error vectors ε i are independent copies of ε ∼ N (0 , Σ ) . Model (1.1) is unconstrainedin the sense that each response is allowed a separate linear regression: the maximum likelihoodestimator of the j -th row of ( β , β ) is the same as the estimator of the coefficients from the linearregression of the j -th response on X .In many applications, particularly analyses of growth curves and longitudinal data, there maybe additional information that span( β , β ) is contained in a known subspace U with basis matrix U ∈ R r × k . The classic dental data (Potthoff and Roy, 1964; Lee and Geisser, 1975; Rao, 1987;Lee, 1988) provides an illustration of this type of structure. Example 1
A study of dental growth measurements of the distance (mm) from the center of thepituitary gland to the pteryomaxillary fissure were obtained on 11 girls and 16 boys at ages 8, 10,12, and 14. The goal was to study the growth measurement as a function of time and sex. Y ik denote the continuous measure of distance for child i at age t k , for t k = 8 , , , ,and let X i denote the gender indicator for child i (1 for boy and 0 for girl). After graphicalinspection, many researchers treated the population means for distance as linear in time for eachgender. Following this tradition, a mixed effects repeated measure model is Y ik = α + b i + α X i + ( α + b i + α X i ) t k + ε ∗ ik , where ε ∗ i = ( ε ∗ i , . . . , ε ∗ ir ) T i.i.d. ∼ N (0 , Σ ∗ ) , b i and b i denotethe random intercept and slope, ( b i , b i ) i.i.d. ∼ N (0 , D ) , where D is a × positive difinite matrix.We rewrite this model as Y i = U α + U α X i + (cid:15) i (1.2)with U := ( , t ) with t = (8 , , , T , α = ( α , α ) α = ( α , α ) , ε i = ε ∗ i + b i r × + b i t and ε i i.i.d. ∼ N (0 , Σ ) . Applying the same ideas to just β , so span( β ) ⊆ U without requiring that span( β ) ⊆ U , leads to the model Y i = β + U α X i + ε i , i = 1 , . . . , n. (1.3)Let B = span( β ) . If we set U = 1 r , so in model (1.3) α is a row vector of length p , then the meanfunctions for the individual responses are parallel. Although motivated in the context of the dentaldata, we use models (1.2) and (1.3) as general forms that can be adapted for different applicationsby choice of U , referring to them as constrained multivariate linear models. Cooper and Evans(2002) used a version of model (1.2) with U reflecting charge balance constraints on chemicalconstituents of water samples.Constrained models occur in various areas including growth curve and longitudinal studies3here the elements of Y i are repeated observations on the i -th experimental unit over time. It iscommon in such settings to model the rows of U as a user-specified vector-valued function u ( t ) ∈ R k of time t , the i -th row of U then being u T ( t i ) . Polynomial bases u T ( t ) = (1 , t, t , . . . , t k − ) areprevalent, particularly in the foundational work of Potthoff and Roy (1964), Rao (1965), Grizzleand Allen (1969) and others, but splines (Nummi and Koskela, 2008) or other basis constructions(Izenman and Williams, 1989) could be used as well. In longitudinal studies, model (1.2) might beused when it is desirable to model profiles, while model (1.3) could be used when modeling justprofile differences. For instance, if X = 0 , is a population indicator then under model (1.2) themean profiles are modeled as U α and U ( α + α ) , while under model (1.3) the profile meansare β and β + U α . It is known in the literature that constrained models gain efficiency in theestimators compare with model (1.1), provided that U is correctly specified.However, it may be very difficult to correctly specify U in some applications, as in the follow-ing study from Kenward (1987). Example 2
An experiment was carried out to compare two treatments for the control of gut wormin cattle. The treatments were each randomly assigned to 30 cows whose weights were measured at , , , . . . , and weeks after treatment. The goal of the experiment was to see if a differentialtreatment effect could be detected and, if so, the time point when the difference was first manifested. The constrained models (1.2) and (1.3) require that we select U . Lacking prior knowledge, it isnatural to inspect plots of the average weight by time, as shown in Figure 1.1. It seems clear fromthe figure that it would be difficult to model the treatment profiles, particularly their two crossingpoints, without running into problems of over fitting. Envelopes provided a way to model data likethat illustrated in Figure 1.1 without specifying a subspace U .4igure 1.1: Cattle data: Average weight by treatment and time.Envelope methodology is based on a relatively new paradigm for dimension reduction that,when applied in the context of model (1.1), has some similarity with constrained multivariate mod-els. Briefly, envelopes produce a re-parameterization of model (1.1) in terms of a basis Γ ∈ R r × u for the smallest reducing subspace of Σ that contains B . Like the constrained model, envelopesproduce an upper bound for B , B ⊆ span( Γ ) , but unlike the constrained model the bound is un-known and must be estimated. Also, unlike the constrained model, Γ T Y contains the totality of Y that is affected by changing X . Since B ⊆ span( Γ ) , we have β = Γ η for some η ∈ R u × p . Model(1.1) can be then be re-paremeterized to give its envelope counterpart, Y i = β + Γ η X i + ε i , i = 1 , . . . , n, (1.4) Σ = ΓΩΓ T + Γ Ω Γ T , where ( Γ , Γ ) ∈ R r × r , orthogonal, Ω = Γ T ΣΓ > and Ω = Γ T ΣΓ > . Envelopes arereviewed in more detail in Section 2.2.Comparing (1.2)–(1.3) with (1.4), both express β as a basis times a coordinate matrix: β = U α in (1.2)–(1.3) and β = Γ η in (1.4). However, as mentioned previously, Γ is estimated but U isassumed known. Envelopes were first proposed by Cook et al. (2007) to facilitate dimension5eduction and later were shown by Cook et al. (2010) to have the potential to achieve massiveefficiency gains relative to the standard maximum likelihood estimator of β , and that these gainswill be passed on to other tasks such as prediction. There are now a number of extensions andapplications of this basic envelope methodology, each demonstrating the potential for substantialefficiency gains (Su and Cook, 2011; Cook and Zhang, 2015a,b; Forzani and Su, 2020; Su et al.,2016; Li and Zhang, 2017; Rekabdarkolaee et al., 2017). Studies over the past several years havedemonstrated repeatedly that sometimes the efficiency gains of the envelope methods relative tostandard methods amount to increasing the sample size many times over. See Cook (2018) for areview and additional extensions of envelope methodology.The choice between a constrained model, (1.2) or (1.3), and the envelope model (1.4) hingeson the ability to correctly specify an upper bound U for span( β , β ) or B . As we show in Section2, if we have a correct parsimonious basis U then the constrained models are more efficient. But ifbias is present or if we use a correct but excessive U , then the envelope model (1.4) can be muchmore efficient. Although considerable methodology has been developed for the envelope version(1.4) of the unconstrained model (1.1), there are apparently no envelope counterparts available forthe class of models represented by (1.2) and (1.3) when a correct parsimonious U is available.In Section 3 we show how to adapt the envelope paradigm to models (1.2) and (1.3) to achieveefficiency gains over those models. Testing methods are proposed in Section 3.3 to evaluate thechoice of U and also to test importance of predictors. Simulations to support our finding are givenin Section 4 and in Section 5 we compare our methodology with others in two examples. Proofsfor all propositions and discussions of related issues are available in a Supplement to this article.6 otational conventions. Given a sample ( a i , b i ) , i = 1 , . . . , n , let T a , b = n − (cid:80) ni =1 a i b Ti de-note the matrix of raw second moments, and let T a = n − (cid:80) ni =1 a i a Ti . For raw second momentsinvolving Y S and Y D (defined herein) we use S and D as subscripts. We use a subscript 1 inresiduals computed from a model containing a vector of intercepts. The absence of a 1 indicatesno intercept was included. For instance, R a | b means the residuals from the regression of a on b without an intercept vector, a i = β b i + e , while R a | (1 , b ) means the residuals from the regressionof a on b with an intercept vector, a i = β + β b i + e i . Similarly, R D | S means a residual from theregression of Y D and Y S without an intercept, while R D | (1 ,S ) means a residual from the regressionof Y D and Y S with an intercept.Sample variances are written as S a = n − (cid:80) ni =1 ( a i − ¯ a )( a i − ¯ a ) T and sample covariancematrices are written as S a , b = n − (cid:80) ni =1 ( a i − ¯ a )( b i − ¯ b ) T . For variances and covariances involving Y D and Y S we again use D and S as subscripts, e.g. S D = n − (cid:80) ni =1 ( Y Di − ¯ Y D )( Y Di − ¯ Y D ) T . The notation S a | b means the covariance matrix of the residuals from fit of the model a i = β + β b i + e i , which always includes as intercept. That is, S a | b = n − (cid:80) ni =1 R a | (1 , b ) ,i R T a | (1 , b ) ,i .Similarly, S D | S = (cid:80) ni =1 R D | (1 ,S ) ,i R TD | (1 ,S ) ,i .We use span( A ) to denote the subspace spanned by the columns of the matrix A . The pro-jection onto S = span( A ) will be denoted using either the subspace itself P S or its basis P A .Projections onto an orthogonal complement will be denoted similarly using Q ( · ) = I − P ( · ) . For asubspace S and conformable matrix B , B S = { B S | S ∈ S} . If an estimator a ∈ R r of α ∈ R r has the property that √ n ( a − α ) is asymptotically normal with mean 0 and variance A , we write avar( √ n a ) = A to denote its asymptotic variance.7 Comparison of the envelope and constrained estimators
Models (1.2)–(1.3) and (1.4) are similar in the sense that β is represented as a basis times a co-ordinate matrix, β = U α in (1.2)–(1.3) and β = Γ η in (1.4). It might be thought that (1.2) and(1.3) would yield the better estimators because U is known while Γ is not, but that turns out not tobe so generally, in part because we may have B (cid:54)⊆ U , which raises the issue of bias as discussedin Section 2.3, and in part because the envelope model capitalizes automatically on structure in Σ ,which can improve efficiency as discussed in Section 2.4. Our general conclusion is that in prac-tice it may be necessary to compare their fits before selecting an estimator, and that the envelopeestimator may have a clear advantage when there is uncertainty in the choice of U , as illustrated inFigure 1.1.Developments under models (1.2) and (1.3) are very similar since they differ only on how theintercept is handled. In the remainder of this article we focus on model (1.2) and comment fromtime to time on modifications necessary for model (1.3). All subsequent developments are undermodel (1.2) unless model (1.3) is indicated explicitly. Our treatment of maximum likelihood estimation from (1.2) is based on linearly transforming Y .Let U be a semi-orthogonal basis matrix for U ⊥ , and let W = ( U ( U T U ) − , U ) := ( W , W ) .Then the transformed model becomes W T Y i := Y Di Y Si = ( U T U ) − U T Y i U T Y i = α + α X i + W T ε i , i = 1 , . . . , n, (2.1)8here Y Di ∈ R k and Y Si ∈ R r − k with k the number of columns of U . The transformed variancecan be represented block-wise as Σ W := var( W T ε ) = ( W Ti ΣW j ) , i, j = 1 , , where Σ is asdefined for model (1.2). The mean E ( Y D | X ) depends non-trivially on X and thus, as indicatedby the subscript D , we think of Y D as providing direct information about the regression. On theother hand, E ( Y S | X ) = 0 and thus Y S provides no direct information but may provide usefulsubordinate information by virtue of its association with Y D .To find the maximum likelihood estimators from model (2.1), we write the full log likelihoodas the sum of the log likelihoods for the marginal model for Y S | X and the conditional model for Y D | ( X , Y S ) : Y Si | X = e Si (2.2) Y Di | ( X i , Y Si ) = α + α X i + φ D | S Y Si + e D | Si , (2.3)where φ D | S = ( U T U ) − U T ΣU ( U T ΣU ) − ∈ R k × ( r − k ) , e D | S = W T ε , e S = W T ε . Thevariances of the errors are Σ S := var( e S ) = U T ΣU and Σ D | S := var( e D | S ) = ( U T Σ − U ) − .The number of free real parameters in this conditional model is N cm ( k ) = k ( p + 1) + r ( r + 1) / .The subscript ‘ cm ’ is used also to indicate estimators arising from the conditional model (2.3). Themaximum likelihood estimator and its asymptotic variance are (cid:98) β cm = U (cid:98) α cm = US D, R X | (1 ,S ) S − X | S = U ( S D, X − S D,S S − S S S, X ) S − X | S (2.4) avar( √ n vec( (cid:98) β cm )) = Σ − X ⊗ UΣ D | S U T , (2.5)where (cid:98) α cm and (cid:98) β cm are the MLEs of α and β in model (2.1).9stimation for model (1.3) requires just a few modifications of the procedure for model (1.2).All modifications stem from the presence of an intercept vector in model (2.2), which becomes Y S = W T β + e S . In consequence, the variance Σ S is estimated as (cid:98) Σ S = S S with correspondingchange in the estimator of Σ , and the estimator of the intercept W T β is just ¯ Y S . The interceptin (2.3) is redefined as α = W T β − φ D | S W T β . The maximum likelihood estimator of β in model (1.3) can be constructed straightforwardly from the estimators of α , W T β and φ D | S .Because there is an intercept in (2.2), the number of real parameters becomes N cm + r − k . Impor-tantly, the estimators of the parameters in (2.3) are unchanged. This means in particular that (cid:98) α cm and (cid:98) β cm along with their asymptotic variances are the same under models (1.2) and (1.3), althoughdifferent U ’s might be used in their construction. (1.1) Consider a subspace
S ⊆ R r that satisfies the two conditions (i) Q S Y | X = x ∼ Q S Y | X = x for all relevant x and x and (ii) P S Y Q S Y | X . Condition (i) insures that the marginaldistribution of Q S Y does not depend on X , while statement (ii) insures that, given X , Q S Y cannotprovide material information via an association with P S Y . Together these conditions imply thatthe impact of X on the distribution of Y is concentrated solely in P S Y . One motivation underlyingenvelopes is then to characterize linear combinations Q S Y that are unaffected by changes in X and in doing so produce downstream gains in estimative and predictive efficiency.In terms of model (1.1), condition (i) holds if and only if B ⊆ S and condition (ii) holds ifand only if S is a reducing subspace of Σ ; that is, S must decompose Σ = P S ΣP S + Q S ΣQ S .The intersection of all subspaces with these properties is by construction the smallest reducing10ubspace of Σ that contains B , which is called the Σ -envelope of B and is represented as E Σ ( B ) (Cook et al., 2010). These consequences of conditions (i) and (ii) can be incorporated into model(1.1) by using a basis, leading to model (1.4). Let u ∈ { , , . . . , r } denote the dimension of E Σ ( B ) . The number of free real parameters is N em = r + pu + r ( r + 1) / . The subscript ‘ em ’is used also to indicate selected quantities arising from this envelope model. The goal here still toestimate β = Γ η and Σ . Cook et al. (2010) derived the maximum likelihood envelope estimatorsof β and Σ along with their asymptotic variances. They showed that substantial efficiency gainsin estimation of β are possible under this model, particularly when a norm of var( Γ T Y ) = Ω isconsiderably larger than the same norm of var( Γ T Y ) = Ω .Given the envelope dimension u , the maximum likelhood estimator (cid:98) β em of β = Γ η fromenvelope model (1.4) has asymptotic variance given by avar( √ n vec( (cid:98) β em )) = Σ − X ⊗ ΓΩΓ T + ( η T ⊗ Γ ) M † ( Σ X )( η ⊗ Γ T ) , (2.6)where for a C ∈ R p × p , M ( C ) := η C η T ⊗ Ω − + Ω ⊗ Ω − + Ω − ⊗ Ω − I and † denotes the Moore-Penrose inverse. Cook et al. (2010) showed that avar( √ n vec( (cid:98) β em )) ≤ avar( √ n vec( (cid:98) β um )) , where (cid:98) β um is the maximum likelihood estimator under the uncontrained model (1.1). In consequence,estimators from the envelope model (1.4) are always superior to those from the unconstrainedmultivariate model (1.1). Cook et al. (2010) also showed that the envelope estimator is √ n -consistent even the normality assumption is violated as long as the data has finite fourth moments.11 .3 Potential bias in (cid:98) β cm Assuming that
B ⊆ U , (cid:98) α cm and (cid:98) β cm are unbiased estimators of α and β . However, if B (cid:54)⊆U then both (cid:98) α cm and (cid:98) β cm are biased, which could materially affect the estimators: E ( (cid:98) α cm ) =( U T U ) − U T β and E ( (cid:98) β cm ) = P U β . Consequently, the bias in (cid:98) β cm is β − P U β = Q U β . Anonzero bias must necessarily dominate the mean squared error asymptotically and so could limitthe utility of (cid:98) β cm . Simulation results that illustrate the potential bias effects are discussed in Section4.2. Otherwise, we assume that B ⊆ U in the remainder of this article unless indicated otherwise. (cid:98) β em and (cid:98) β cm We now compare the asymptotic variance of the envelope and constrained estimators of β , (2.6)and (2.5). Depending on the dimensions involved, the relationship between U and the envelope E Σ ( B ) and other factors, the difference between the asymptotic covariance matrices for the estima-tors – (cid:98) β em and (cid:98) β cm – from these two models can be positive definite, negative definite or indefinite.Since all comparisons are in terms of β ’s, we assume without loss of generality that U is a semi-orthogonal matrix. Also, since (cid:98) β cm is the same under models (1.2) and (1.3) we do not distinguishbetween these models in this section. B ⊆ U ⊆ E Σ ( B ) Assuming that U is correct so B ⊆ U and that
U ⊆ E Σ ( B ) is one way to simplify the variancecomparison: Proposition 2.1 If B ⊆ U ⊆ E Σ ( B ) , then avar( √ n vec( (cid:98) β cm )) ≤ avar( √ n vec( (cid:98) β em )) . In consequence, under the hypothesis
B ⊆ U ⊆ E Σ ( B ) , the constrained estimator (cid:98) β cm is superior12o the envelope estimator (cid:98) β em . However, this comparison may be seen as loaded in favor of (cid:98) β cm since the constrained estimator uses the additional knowledge that B ⊆ U , the envelope estimatordoes not. Additionally, neither estimator makes use of the proposition’s hypothesis. The nextproposition provides help in assessing the impact of the hypothesis on the underlying structure byconnecting it with E Σ ( U ) , the Σ -envelope of U . Proposition 2.2
Assume that
B ⊆ U . Then1. E Σ ( B ) ⊆ E Σ ( U ) ,2. U ⊆ E Σ ( B ) if and only if E Σ ( B ) = E Σ ( U ) ,3. If rank( α ) = k then B = U and E Σ ( B ) = E Σ ( U ) . This proposition says essentially that if
B ⊆ U ⊆ E Σ ( B ) then we can start with model (1.1) andparameterize in terms of E Σ ( U ) rather than E Σ ( B ) . A key distinction here is that U is known while B is not. In consequence, we expect less estimative variation when parameterizing (1.1) in termsof E Σ ( U ) instead of E Σ ( B ) . Since U ⊆ E Σ ( U ) can construct a semi-orthogonal basis for E Σ ( U ) as Γ = ( U , Γ ) with U = ( Γ , Γ ) and, recognizing that β = U α = Γ η , we get a new model Y i = U α + U α X i + ε i , i = 1 , . . . , n (2.7) Σ = ΓΩΓ T + Γ Ω Γ T . Consider estimating α from this model using the steps sketched in Section 2.1, and partition Ω =( Ω ij ) to conform to the partition of Γ = ( U , Γ ) . The envelope structure of (2.7) induces aspecial structure on the reduced model that corresponds to (2.2)–(2.3): Σ S = bdiag( Ω , Ω )
13s block diagonal, Σ D | S = Ω − Ω Ω − Ω and φ D | S = ( Ω Ω − , . It can now be shownthat the estimators of α from the constrained model (2.2)–(2.3) and from (2.7) have the sameasymptotic variance. In other words, if we neglect the hypothesized condition that U ⊆ E Σ ( B ) then the constrained estimator is better, but if we formulate the envelope model making use of thatcondition then the constrained and envelope estimators are asymptotically equivalent.Rao (1967) posited a simple structure for the analysis of balanced growth curve data (Seealso Geisser, 1970; Lee and Geisser, 1975; Geisser, 1981; Lee, 1988; Pan and Fang, 2002). Inour context, Rao’s simple structure is obtained by assuming that E Σ ( U ) = U , which correspondsto model (2.7) with Γ = U . In view of the options available, Rao’s simple structure seems toospecialized to warrant further attention. Additional discussion of Rao’s simple structure is availablein Supplement Section ?? . U ⊇ E Σ ( B ) Assuming that
U ⊇ E Σ ( B ) is another way to simplify the variance comparison. Let Γ ∈ R r × u be asemi-orthogonal basis matrix for E Σ ( B ) and let ( Γ , Γ ) be an orthogonal matrix. Since U ⊇ E Σ ( B ) ,we can construct semi-orthogonal bases U = ( Γ , Γ ) and Γ = ( Γ , Γ ) . Partition Ω = ( Ω ,ij ) to correspond to the partitioning of Γ . Then Proposition 2.3
Assume that
U ⊇ E Σ ( B ) and let c ∈ R r . Then1. If c ∈ E Σ ( B ) then avar( √ n c T (cid:98) β cm ) = avar( √ n c T (cid:98) β em ) .2. If c ∈ span( Γ ) then avar( √ n c T (cid:98) β cm ) ≤ avar( √ n c T (cid:98) β em ) .3. If c ∈ span( Γ ) , rank( M ( Σ X )) = rank( η Σ X η T ⊗ Ω − ) and Ω = 0 then avar( √ n c T (cid:98) β cm ) ≥ avar( √ n c T (cid:98) β em ) . (cid:98) β em and (cid:98) β cm can be positive semi-definite or negative semi-definite, de-pending on the characteristics of problem.Although the above derivation is under two simple cases where U and the envelope space arenested, the conclusion actually holds for the general case: if we have a correct parsimoniouslyparameterized constrained model then the envelope model (1.4) is less efficient; but if the basis U in the constrained model is incorrect or if the constrained model is excessively parameterized,then envelopes can be much more efficient. This motivated us to incorporate envelopes into theconstrained model so that we can further improve efficiency if constraints are reasonably wellmodeled for the data In this section, we consider two different ways of imposing envelopes in a constrained model when
B ⊆ U . As mentioned previously, we focus on envelope estimators in the constrained model (1.2)and later describe the modifications necessary for model (1.3). In Section 3.1 we describe envelopeestimation of α when there is available an application-grounded basis U that is key to interpretationand inference. In Section 3.2 we address envelope estimation of β = U α . Here the choice of basis U has no effect on the maximum likelihood estimators of β under the constrained models (1.2),but it does affect the envelope estimator of β . Basis selection is addressed in Section 3.2. α Estimation of α will be of interest when it is desirable to interpret β = U α in terms of its coor-15inates α relative to the known application-grounded basis U . Let A = span( α ) . The envelopeestimator of α in model (2.1) can be found by first transforming (2.1) into (2.2)–(2.3) and thenparameterizing (2.3) in terms of a semi-orthogonal basis matrix φ ∈ R k × u for E Σ D | S ( A ) , the Σ D | S -envelope of A with dimension u ≤ k . Since avar( √ n vec( (cid:98) α cm )) = Σ − X ⊗ Σ D | S is in theform of a Kronecker product that allows separation of row and column effects of α , this structurefollows also from the theory of Cook and Zhang (2015a,b) for matrix-valued envelope estimatorsbased on envelopes of the form R p ⊕ E Σ D | S ( A ) , where ⊕ denotes the direct sum.Let η ∈ R u × p be an unconstrained matrix giving the coordinates of α in terms of semi-orthogonal basis matrix φ , so α = φη , and let ( φ , φ ) ∈ R k × k be an orthogonal matrix. Thenthe envelope version of model (2.2)–(2.3) is a version of the partial envelope model (Su and Cook,2011): Y Si | X = e Si and Y Di | ( X i , Y Si ) = α + φη X i + φ D | S Y Si + e D | Si , (3.1) Σ D | S = φ Ω φ T + φ Ω φ T , where Ω ∈ R u × u and Ω ∈ R ( k − u ) × ( k − u ) are positive definite matrices. The total real parametersin model (3.1) is N ecm ( u ) = k + pu + r ( r + 1) / , which reduces to that given previously for model(2.2)–(2.3) when u = k . The subscript ‘ecm’ is used to indicate selected key quantities that arisefrom enveloping A in constrained model (1.2). A basis (cid:98) φ for the maximum likelihood estimator (cid:98) E Σ D | S ( A ) of E Σ D | S ( A ) is constructed as (Su and Cook, 2011; Cook, 2018, Ch. 3), (cid:98) φ = arg min G log | G T S D | ( X ,S ) G | + log | G T S − D | S G | , (3.2)16here the minimum is computed over all semi-orthogonal matrices G ∈ R k × u with u ≤ k . Thefully maximized log likelihood is ˆ L u = c − n (cid:110) log | T S | + log | S D | S | + log | (cid:98) φ T S D | ( X ,S ) (cid:98) φ | + log | (cid:98) φ T S − D | S (cid:98) φ | (cid:111) . (3.3)where c = n log | W | − ( nr/ π )) with the log | W | term corresponding to the Jacobiantransformation back to the scale of Y .Once (cid:98) φ is obtained we get the following envelope estimators for constrained model (1.2).Specifically, we have (cid:98) β ecm = U (cid:98) α ecm , (cid:98) α ecm = P (cid:98) Φ (cid:98) α cm = (cid:98) φ (cid:98) η , (cid:98) α = ¯ Y D − (cid:98) α ecm ¯ X − (cid:98) φ D | S ¯ Y S , where (cid:98) η = (cid:98) φ T (cid:98) α cm , and (cid:98) φ D | S = S D,S S − S − (cid:98) α ecm S X ,S S − S . We also have (cid:98) Ω = (cid:98) φ T S D | ( X ,S ) (cid:98) φ and (cid:98) Ω = (cid:98) φ T S D | S (cid:98) φ , where (cid:98) Σ D | S = (cid:98) φ (cid:98) Ω (cid:98) φ T + (cid:98) φ (cid:98) Ω (cid:98) φ T and (cid:98) Σ S = T S . The variances Σ W and Σ can beestimated as indicated in Section 2.1.The asymptotic variance for (cid:98) α ecm can be deduced from Su and Cook (2011, Prop. 1) recogniz-ing that in our application Y S is random, X is fixed, and the distribution of Y S | X is the same asthat of the marginal of Y S : avar( √ n vec( (cid:98) α ecm )) = Σ − X ⊗ φ Ω φ T + ( η T ⊗ φ ) M † ( Σ X )( η ⊗ φ T ) . It can be shown that avar( √ n vec( (cid:98) α ecm )) ≤ avar( √ n vec( (cid:98) α cm )) , so using an envelope in the con-strained model always improves estimation asymptotically.Because E Σ D | S ( A ) ⊆ R k , E Σ ( B ) ⊆ R r and k ≤ r it is reasonable to expect that dim {E Σ D | S ( A ) } ≤ dim {E Σ ( B ) } , as we have estimated in many examples. However, this relationship between theenvelope dimension is not guaranteed in general. The following proposition gives conditions suf-17cient to bound dim {E Σ D | S ( A ) } . Proposition 3.1
Assume that U = ( ΓG , Γ G ) , where the Γ ’s are as defined for model (1.4),and that G ∈ R u × u and G ∈ R ( r − u ) × ( k − u ) both have full column rank, so u ≤ u . Then dim {E Σ D | S ( A ) } ≤ u ≤ dim {E Σ ( B ) } . β Estimation of β = U α will be of interest in applications where prediction is important or where U is selected based on convenience, say, rather than on criteria that facilitate understanding andinference. For instance, if X serves to indicate different treatments then plots of the columns of β versus time give a visual comparisons of the treatment profiles. The choice of U is of courserelevant to estimation of β , but a basis U is not uniquely determined. While this flexibility hasno effect on the maximum likelihood estimators of β under the constrained model (1.2), it doesaffect the envelope estimator of β . This raises the issue of selecting a good basis for the purposeof estimating β via envelopes.Consider re-parameterizing U as UV − and α as V α for some positive definite matrix V ∈ R k × k , giving β = U α = ( UV − )( V α ) . We could use either E Σ D | S ( A ) or E VΣ D | S V T ( V A ) toestimate β as (cid:98) β ecm = U (cid:98) α ecm or, in terms of re-parameterized coordinates V α , as (cid:98) β ecm , V = UV − (cid:91) ( V α ) ecm . In general (cid:98) β ecm (cid:54) = (cid:98) β ecm , V and we cannot tell which estimator is necessarilybetter. In this section, we show that the envelope estimator of β is invariant under orthogonalre-parameterization, so we only need to consider diagonal re-parameterization: β = U α =( UΛ − )( Λ α ) , where Λ is a diagonal matrix with positive diagonal elements. In growth curveor longitudinal analyses for instance, the columns of U may correspond to different powers of18ime, and then it seems natural to consider rescaling to bring the columns of U closer to the samescale.The following two propositions provide technical tools for demonstrating that the maximumlikelihood envelope estimator of β = U α is simply (cid:98) β ecm = U (cid:98) α ecm when U is semi-orthogonal,where (cid:98) α ecm is the envelope estimator of α under the constrained model (1.2). Proposition 3.2 (a) Let
S ⊆ R k be a reducing subspace of the symmetric matrix M ∈ R k × k , andlet V ∈ R p × k be a semi-orthogonal matrix. Then V S is a reducing subspace of VMV T . (b) Let D ∈ R p be a reducing subspace of VMV T . Then V T D is a reducing subspace of M . Proposition 3.3
Let E M ( S ) ⊆ R k be the smallest reducing subspace of the symmetric matrix M ∈ R k × k that contains S ⊆ R k , and let V ∈ R p × k be a semi-orthogonal matrix. Then V E M ( S ) is the smallest reducing subspace of VMV T that contains V S ; that is, V E M ( S ) = E VMV T ( V S ) . These two propositions show that the results of Section 3.1 can be used straightforwardly toget the envelope estimator of U α when U is semi-orthognonal. The standard maximum likelihoodestimator of U α is just U (cid:98) α cm with asymptotic covariance matrix UΣ D | S U T . In consequence, fol-lowing the rationale at the beginning of Section 3.1, we seek the maximum likelihood estimatorof E UΣ D | S U T ( U A ) , which by Proposition 3.3 is equal to U E Σ D | S ( A ) . From Proposition 3.3, themaximum likelihood estimator of E UΣ D | S U T ( U A ) is U (cid:98) E Σ D | S ( A ) , which implies that envelopeestimator of β = U α is (cid:98) β ecm = U (cid:98) α ecm with asymptotic variance U avar( √ n (cid:98) α ecm ) U T . Proposi-tions 3.2 and 3.3 also suggest how to proceed when re-prameterizing as β = U α = ( UO T )( O α ) ,where O is an orthognal matrix and U is not necessarily orthogonal. In that case the envelope es-timator of O α is simply O α ecm , and so the envelope estimator of β is invariant under orthogonalre-paramterization of the kind used here. 19hus, to consider constrained model envelope under a linear transformation of U , it sufficesto consider a re-scaling transformation. That is, we consider β = U α = ( UΛ − )( Λ α ) , where Λ = diag(1 , λ , . . . , λ k ) . The first diagonal element of Λ is 1 to ensure identifiability. We followthe general logic of Cook and Su (2013) in their development of a scaled version of envelope model(1.2).Without loss of generality, we cast our discussion of scaling in the context of conditional model(2.3). We suppose that there is a scaling of the response Y D so that the scaled response ΛY D follows an envelope model in Λ α with the envelope E ΛΣ D | S Λ ( Λ A ) having dimension v and semi-orthogonal basis matrix Θ ∈ R k × v . Let ( Θ , Θ ) denote an orthogonal matrix. Then we can param-eterize Λ α = Θ η and ΛΣ D | S Λ = ΘΩΘ T + Θ ΩΘ T . This setup can also be viewed equivalentlyas a rescaling U (cid:55)→ UΛ − of U , since ΛY D = Λ ( U T U ) − U T Y = ( Λ − U T UΛ − ) − Λ − U T Y .Since ΛY D is unobserved, we now transform back to the original scale for analysis, leading to themarginal model Y Si | X = e Si and conditional model Y Di | ( X i , Y Si ) = α + Λ − Θ η X i + φ D | S Y Si + e D | Si , (3.4) Σ D | S = Λ − ( ΘΩΘ T + Θ Ω Θ T ) Λ − . The total real parameters in this scaled envelope model is N secm ( v ) = 2 k − pv + r ( r + 1) / ,where the subscript ‘secm’ is used to indicate quantities arising from the scaled envelope version ofthe conditional model. For identifiability we typically need N secm ( v ) ≤ N cm or p ( k − v ) ≥ k − .The goal now is to estimate α , the coefficient matrix β = UΛ − Θ η and Σ D | S , which requiresthe estimation of several constituent parameters.20fter maximizing the log likelihood over all parameters except ( Λ , Θ ) we have ( (cid:98) Λ , (cid:98) Θ ) = arg min A , G log | G T AS D | ( X ,S ) AG | + log | G T A − S − D | S A − G | , (3.5)where the minimum is computed over all semi-orthogonal matrices G ∈ R k × v and diagonal matri-ces A = diag(1 , a , . . . , a k ) . Aside from the inner product matrices S D | ( X ,S ) and S − D | S this is thesame as the objective function that Cook and Su (2013) derived for response scaling prior to usingmodel (1.4), which allowed us to adapt their optimization algorithm to handle (3.5).Having determined the maximum likelihood estimators (cid:98) Λ and (cid:98) Θ , the remaining parameterestimators are (cid:98) β secm = U (cid:98) Λ − P (cid:98) Θ (cid:98) Λ (cid:98) α cm , (cid:98) α = (cid:98) Λ − P (cid:98) Θ (cid:98) Λ (cid:98) α cm , (cid:98) α = ¯ Y D − (cid:98) β secm ¯ X − (cid:98) φ D | S ¯ Y S ,where (cid:98) η = (cid:98) Θ T (cid:98) Λ (cid:98) α cm , (cid:98) φ D | S = ( S D,S − (cid:98) α S X ,S ) S − S . We also have (cid:98) Ω = (cid:98) Θ T (cid:98) ΛS D | ( X ,S ) (cid:98) Λ (cid:98) Θ , (cid:98) Ω = (cid:98) Θ T (cid:98) ΛS D | S (cid:98) Λ (cid:98) Θ , where (cid:98) Σ D | S = (cid:98) Λ − ( (cid:98) Θ (cid:98) Ω (cid:98) Θ T + (cid:98) Θ (cid:98) Ω (cid:98) Θ T ) (cid:98) Λ − , (cid:98) Σ S = T S . The variances Σ W and Σ can be estimated as indicated in Section 2.1.This representation of the scaled envelope estimator (cid:98) β secm shows the construction process. Firstthe direct-information response is transformed to (cid:98) ΛY D . The constrained estimator (cid:98) Λ (cid:98) α cm and theenvelope estimator P (cid:98) Θ (cid:98) Λ (cid:98) α cm are then determined in the transformed scale. Next, the estimator istransformed back to the original scale by multiplying by (cid:98) Λ − to get (cid:98) Λ − P (cid:98) Θ (cid:98) Λ (cid:98) α cm , which is theestimator of α in the original scale. Finally, the estimator in the original scale is multiplied by U to give the scaled envelope estimator of β . In effect, (cid:98) Λ is a similarity transformation to represent P (cid:98) Θ in the original coordinate system as (cid:98) Λ − P (cid:98) Θ (cid:98) Λ .The fully maximized log likelihood is ˆ L u = c − n (cid:110) log | T S | + log | S D | S | + log | (cid:98) Θ T (cid:98) ΛS D | ( X ,S ) (cid:98) Λ (cid:98) Θ | + log | (cid:98) Θ T (cid:98) Λ − S − D | S (cid:98) Λ − (cid:98) Θ | (cid:111) , (3.6)21here c = n log | W | − ( nr/ π )) . To describe the asymptotic variance of (cid:98) β secm , let V secm denote the upper pk × pk diagonal block of the asymptotic variance V given by Proposition2 from Cook and Su (2013) with Σ replaced by Σ D | S , Γ by Θ and Γ by Θ and Λ with Λ − .Additionally, Ω and Ω in the Cook-Su notation are the same as the corresponding quantities in thedecomposition of Σ D | S for model (3.4) . Then avar( √ n vec( (cid:98) β secm )) = ( I p ⊗ U ) V secm ( I p ⊗ U T ) . Using the envelope version (3.1) of constrained model (1.2), we address in Section 3.3.1 the ade-quacy of U through a test on the rows of α and in Section 3.3.2 we present a test of the column of α to asses the importance of the predictors. U by testing rows of α Having selected the dimension u of the envelope, we may also want to test if U is over specified.This can be achieved by testing if individual rows of α are equal to . For instance, if U T ( t ) =(1 , t, t , t ) we might wish to test if the cubic term is necessary by testing if the last row of α is .Consider a test that the last k ≤ k − u rows of the α = φη in model (3.1) all equal .Following Su et al. (2016) and Zhu and Su (2019), this hypothesis can be tested by conformablypartitioning in (3.1) φ = ( φ T , φ T ) T with φ j ∈ R k j × u , j = 1 , , and then testing if φ = 0 , sounder the null hypothesis α = (( φ η ) T , T . The restriction k ≤ k − u on the number of rowstested arises because the rank of φ must equal u under both the null and alternative hypothesis. If k = k − u then without loss of generality we can take φ = I u .When k < k − u , the maximum likelihood estimator of φ can be found by following thesteps leading to (3.2) and then introducing the restriction that G = ( G T , T . Partition Y D = Y TD , Y TD ) T to conform to the partitioning of φ . Then (cid:98) φ = arg min G log | G T S D | ( X ,S ) G | + log | G T S − D | ( D ,S ) G | , (3.7)where the minimum is computed over all semi-orthogonal matrices G ∈ R k × u , S D | ( D ,S ) is thesample residual covariance matrix of the regression of Y D on ( Y D , Y S ) with an intercept, and S D | ( X ,S ) is the sample residual covariance matrix of the regression of Y D on ( X , Y S ) with anintercept. When k = k − u , we must have φ = I u and no estimate of φ is required.The likelihood ratio test statistic is computed as twice the difference between the log likelihoodunder the null hypothesis, ˆ L u,k = c − n (cid:110) log | T S | + log | S D | S | + log | S D | ( D ,S ) | + log | (cid:98) φ T S D | ( X ,S ) (cid:98) φ | + log | (cid:98) φ T S − D | ( D ,S ) (cid:98) φ | (cid:111) , (3.8)and the fully maximized log likelihood (3.3). Here c = n log | W | − ( nr/ π )) . Underthe null hypothesis φ = 0 this difference is asymptotically distributed as a chi-squared randomvariable with uk degrees of freedom. α In some studies we may wish to estimate and infer about column contrasts α = α c , where c ∈ R p × p is a user-selected matrix of known constants with p < p . For instance, when X is atreatment indicator, testing column contrasts allows testing equality of treatment means. Let A =span( α ) . We could use the conditional model (1.2), basing estimation and inference on (cid:98) α cm c .Or we could proceed following the envelope analysis of Section 3.1 and use the estimator (cid:98) α ecm c avar( √ n vec( (cid:98) α ecm c )) as a basis for inference. The latter estimator ispreferable since avar( √ n vec( (cid:98) α ecm c )) ≤ avar( √ n vec( (cid:98) α cm c )) . But there is a potential to gainadditional asymptotic efficiency by using envelope methods to estimate α directly.To develope an envelope estimator of α , we first parameterize model (1.2) so α appearsexplicitly. Select a matrix c ∈ R p × p , p + p = p , so that C = ( c , c ) ∈ R p × p is non-singularand define new predictors and parameters as Z = C − X and α = α c . Then we have U α X = U α CC − X = U ( α , α ) Z = U α Z + U α Z , where the row partitioning of Z = ( Z T , Z T ) T conforms to the column partitioning of C . Followingthe logic used previously in this section, we obtain the marginal and conditional models: Y Si | Z = e Si and Y Di | ( X i , Y Si ) = α + α Z i + α Z i + φ D | S Y Si + e D | Si , = α + α Z i + ω K i + e D | Si , (3.9)where ω = ( α , φ D | S ) , K i = ( Z T i , Y TSi ) T and the other terms are as defined previously. Thismodel is of the same form as (2.3) and so a semi-orthogonal basis φ ∈ R k × u for E Σ D | S ( A ) withdimension u ≤ k can be incorporated into model (3.9) as follows. Y Si | Z = e Si and Y Di | ( X i , Y Si ) = α + φη Z i + ω K i + e D | Si , (3.10) Σ D | S = φ Ω φ + φ Ω φ , Σ S and Σ D | S are as defined in Section 2.1. The number of free real parameters in this modelis k + u p + p k + r ( r + 1) / . The envelope estimators can now be obtained straightforwardlyby following the steps in Section 3.1, and the asymptotic variance of envelope estimator (cid:98) α = P (cid:98) Φ (cid:98) α cm c of α is avar( √ n vec( (cid:98) α )) = Σ − Z | Z ⊗ φ Ω φ T + ( η T ⊗ φ ) M † ( Σ Z | Z )( η ⊗ φ T ) , where Σ Z | Z = lim n →∞ S Z | Z .Looking ahead and adapting the discussion of Section 3, the envelope estimator of U α issimply U (cid:98) α with asymptotic variance avar( √ n vec( U (cid:98) α )) = ( I p ⊗ U )avar( √ n vec( (cid:98) α ))( I p ⊗ U T ) , (3.11)where avar( √ n vec( (cid:98) α )) is as given previously.These results can be adapted to obtain an envelope estimator of the average profile E ( Y | X new ) = U α + U α X new at a new value X new of X by setting c = X new , so α = α X new . Assumingwithout loss of generality that X and Y S in (3.10) are centered, it follows that (cid:98) α = ¯ Y D andthus (cid:91) U α = P U ¯ Y . The estimator of the average profile is then (cid:98) E ( Y | X new ) = P U ¯ Y + U (cid:98) α = P U ¯ Y + UP (cid:98) Φ (cid:98) α cm X new . Since ¯ Y and (cid:98) α are asymptotically independent, we get the asymptoticvariance avar( √ n (cid:98) E ( Y | X new )) = P U ΣP U + avar( √ n vec( U (cid:98) α )) , where the second addend onthe right hand side is given by (3.11). This envelope estimator has the potential to be substantiallyless variable than plugin estimators mentioned at the beginning of Section 3.3.2. A potential dis-advantage is that a new envelope estimator is required for each profile determined by the value of25 new . The modifications necessary to adapt the results in Sections 3.1–3.3 for model (1.3) all stem fromthe new model for the subordinate response, Y S = W T β + e S , and the new definitions of α = W T β − φ D | S W T β for models (3.1), (3.4) and (3.10). This implies that T S is replaced by S S throughout, including log likelihoods (3.3), (3.6) and (3.8), and that the estimator of β can beconstructed as indicated near the end of Section 2.1. There is no change in the objective functions(3.2), (3.5) and (3.7), and consequently no change in the envelope estimators of α and β or theirasymptotic variances. We first evaluate the efficiency of the envelope estimator (cid:98) β em and the constrained estimator (cid:98) β cm us-ing simulations in two scenarios. We also include the unconstrained estimator (cid:98) β um as a reference.In Scenario 1, the eigenvalue corresponds to the material part is small relative to the immaterialpart and the dimension of U is large; thus the envelope estimator is expected to have substantialefficiency gain. In Scenario 2, the eigenvalue of the immaterial part is small relative to that of thematerial part and the envelope estimator is not expected to have substantial efficiency gain. Thesimulation for Scenario 1 is carried out in the following steps.Step 1. We first generated a sample of size n = 5000 . For each individual i , we generated p = 8 X i from a multivariate normal distribution with mean 0 and variance CC T ,where each element in C is identically and independently distributed with a standard nor-mal distribution N (0 , . Comment: The editor said that “The simulations should includedcases where predictors are subject to substantial dependency.” I’m not sure this qualifies assubstantial dependency since the expected covariance is 0. Perhaps mentioning the distri-bution of the predictor correlations in a typical simulation would do. Alternatively, changethe generation scheme to be compound symmetric so the correlations can be specified eas-ily, say .8. Or perhaps one with a small correlation say .5 and one with a larger correlationsay .85Step 2. Set r = 20 , u = 6 , q = 15 , q = 4 and q = q − q . Set Ω = bdiag (0 . I u − q , . I q ) and Ω = 50 I r − u . Set ( Γ , Γ ) = O and let Σ = ΓΩΓ T + Γ Ω Γ T , where O is an orthogonalmatrix obtained by singular value decomposition of a randomly generated matrix. Set η = K K , where K ∈ R u × q , K ∈ R q × p , each element in K and K is identicallyand independently generated from N (0 , . Set β = Γ η . Let U = ( Γ , Γ ) φ , where φ = bdiag { M U , M U } , M U = K and M U = ( I q , q × ( r − u − q ) ) T .Step 3. For each individual i , generate Y i identically and independently from normal distribution N ( β X i , Σ ) .Step 4. Calculate (cid:98) β um , (cid:98) β em and (cid:98) β cm , where U is correctly specified when calculating (cid:98) β cm .Step 5. Repeat Steps 3–4 100 times. 27 .1.1 Scenario 1 From the choice of η in Step 2, we have colrank ( η ) = q , and span ( β ) is strictly containedin both span ( Γ ) and span ( U ) since the dimension of span ( β ) is q = 4 which is smaller than min( u, q ) = 6 . Specifically, we also have span ( β ) = span ( Γ U ) = span ( Γ ) ∩ span ( U ) . Its easy tosee that α = ( K T , p × q ) T in this example. Among the 100 simulations, the envelope dimensionwas always correctly estimated as 6 using BIC. The empirical result of (cid:98) β um − β , (cid:98) β em − β and (cid:98) β cm − β are shown in Figure 4.1a, where all the elements of β are plotted in the same boxplot asif they are from the same population and the outliers are suppressed for a cleaner representation.Since U is correctly specified, (cid:98) β cm is an asymptotically unbiased estimator as are (cid:98) β um and (cid:98) β em .Hence, the boxplot of three estimators are all centered at 0. In Step 2, the larger eigenvalues of Σ are contained in Ω rather than Ω . That is, the variability of the immaterial part is bigger than thatof the material part. Additionally, the column space of U is very conservatively specified as q = 15 ,which is much bigger than the dimension of q = colrank ( β ) = 4 and the span ( U ) contains 11eigenvectors corresponds to large eigenvalues (i.e., 50 in this simulation). Hence, this scenariois in favor of the envelope estimator in terms of the efficiency. Indeed, the envelope estimator isthe most efficient estimator among the three estimators, while (cid:98) β cm is also more efficient than theunconstrained estimator (cid:98) β um .The average estimated asymptotic variances were close to the theoretical asymptotic variancescalculated using the true parameter values for all three estimators. The mean of the theoreticalasymptotic variances across all the elements in three estimators are 127.22 for √ n (cid:98) β um and 99.75for √ n (cid:98) β cm but is only 1.70 for √ n (cid:98) β em . That is, in this setting, the envelope estimator is about58 times more efficient that the constrained estimator and 75 times more efficient than the uncon-28trained estimator.Figure 4.1: Box plot of (cid:98) β um − β , (cid:98) β em − β and (cid:98) β cm − β in two scenarios in 100 simulations. − . . . b ^ sat b ^ env b ~ (a) Scenario 1 − . . b ^ um b ^ em b ^ cm (b) Scenario 2 To carry out simulations in Scenario 2, we modify Step 2 above as q = 6 , Ω = bdiag (50 I u − q , . I q ) and Ω = 0 . I r − u . In this scenario, the larger eigenvalues of Σ are associated with Ω , and thedimension of U can be seen to be just 2 dimensional larger than the true dimension of β . Hence,the envelope method is at a disadvantage in terms of the efficiency as compared with (cid:98) β cm . In the100 simulations, the envelope dimension is again always correctly estimated as 6. The empiricalbiases of the envelope and (cid:98) β cm are shown in Figure 4.1b. Again, all three estimators are centeredaround 0, indicating the asymptotic unbiasedness. As expected, the estimator (cid:98) β cm is the most effi-cient among the three estimators, while the envelope estimator (cid:98) β em is still more efficient than theunconstrained estimator (cid:98) β um .The average estimated asymptotic variance of the three estimators were all close to their the-oretical values. The average empirical variances of all the elements in three estimators are 20.68for √ n (cid:98) β um , 19.53 for √ n (cid:98) β em and 4.87 for √ n (cid:98) β cm . That is, in this setting, the estimator using acorrectly specified U is on average about 4 times of more efficient that the unconstrained estimatorand the envelope estimator. 29 .2 Potential Bias of the constrained estimator We conducted a small simulation, generating data from envelope model (1.4), to further illustratepotential bias effects. The sample size and parameters are chosen the same as in Section 4.1. Thesample size was taken to be large so the bias effects might be clear. It is known that the efficiencygains from fitting (1.4) will be much greater in Scenario 1 than in Scenario 2. . . . . q~ M SE b ~ b ^ env b ^ sat Figure 4.2: Illustration of potential bias in the constrained estimator (1.3) under Scenario 1, where k = dim( U ) , U = span { ( Γ , Γ )( I k , T } and MSE denotes the average element-wise squarederror for the indicated estimators. The lines for (cid:98) β em and (cid:98) β um are indistinguishable.Response vectors were then generated according to model (1.4) using normal errors and theresulting data fitted to obtain the envelope estimator (cid:98) β em . We used the same data to constructthe unconstrained estimator (cid:98) β um and the constrained estimator (cid:98) β cm with different selections for U = ( Γ , Γ ) A k where A k = ( I k , T , k = 1 , . . . , r . For k < u , B (cid:54)⊆ U and so (cid:98) β cm is biased.But for k ≥ u , B ⊆ U and there is no bias in (cid:98) β cm . We summarized the bias by computing themean squared error over all elements β ij of β : MSE = ( rp ) − (cid:80) ri =1 (cid:80) pj =1 ( (cid:98) β ( · ) ,ij − β ij ) for thethree estimators (cid:98) β um , (cid:98) β cm and (cid:98) β em . Shown in Figure 4.2 are plots of the MSE averaged over100 replications of this scheme for Scenario 1, each replication starting with the generation of the30esponse vectors. The constant MSE for (cid:98) β em was e − and that for unconstrained model was about times greater at . e − . The MSE for the constrained estimator decreased monotonically fromits maximum value . at k = 1 to its minimum value, which was around e − , at k = u = 6 and then increased monotonically to . at k = 20 . The corresponding plot for Scenario 2 isgraphically indistinguishable and so is not presented. It seems clear that the bias in the constrainedestimator can be substantial until we achieve B ⊆ U , at which point the three estimators becomeindistinguishable on the scale of Figure 4.2.Assuming that U is correctly specified, we imposed the envelope structure on α and referredto the new envelope estimator as (cid:98) β ecm . We carred out the simulations similar to those in Section4.1, replacing Steps 2–4 with the following.Step 2*. Set r = 20 , u ∗ = 3 , q = 15 . Set Ω ∗ = 0 . I u ∗ and Ω = 50 I q − u ∗ . Set ( Γ ∗ , Γ ∗ ) = I andlet var( ε D | S ) = Σ D | S = Γ ∗ Ω ∗ Γ ∗ T + Γ ∗ Ω ∗ Γ ∗ T . Generate η ∗ ∈ R u ∗ × p and U , whereeach element in η ∗ and U is identically and independently generated from N (0 , . Set α ∗ = Γ ∗ η ∗ and β ∗ = U α ∗ .Step 3*. For each individual i , generate Y S i identically and independently from normal distribu-tion N (0 , I r − q ) . Generate φ ∈ R q × ( r − q ) , where each element is generated identicallyand independently from standard normal. Generate Y D i from distribution N ( α ∗ Z i + φ Y S i , Σ D | S ) Step 4*. Calculate (cid:98) β cm and (cid:98) β ecm , where U is correctly specified for both estimators.The average MSE of (cid:98) β cm and (cid:98) β ecm was e − and e − . The Monte Carlo mean variances overall the elements were 21.76 and 5.27 for √ n ˜ β and √ n (cid:98) β ecm , demonstrating the efficiency of theadditional envelope structure over the (cid:98) β cm estimator.31 Applications
The dental data consists of measurements of the distance (mm) from the center of the pituitaryto the pterygomaxillary fissure for each of 11 girls and 16 boys at ages 8, 10, 12, and 14 years( t ). Since their introduction by Potthoff and Roy (1964), these data have been used frequently toillustrate the analysis of longitudinal data. We respect that tradition in this section. We removedthe outlying and influential male case described by Pan and Fang (2002) prior to application of themethods discussed herein. We set the goal to characterize the differences between boys and girlsrather than profile modeling and so we contrasted the behavior of estimators from the unconstrainedmodel (1.1), the envelope model (1.4), the constrained model (1.3) and the envelope version ofmodel (1.3) discussed in Section 3.1.Consistent with the literature, we fitted constrained model (1.3) and its envelope counterpartwith the rows of U being U T ( t ) = (1 , t ) . The estimated dimension of the envelope for model (1.4)was u = 2 , and thus it was inferred that only two linear combinations of the response vectors areneeded to fully characterize the differences between boys and girls. The estimated dimension ofthe envelope for the constrained envelope model (3.1) was u = 1 . Table 1 shows the estimatedasymptotic variances, determined by the plug-in method, for the four estimators (cid:98) β um , (cid:98) β em , (cid:98) β cm and (cid:98) β ecm . The unconstrained model has the worst estimated performance, followed by the regularenvelope model and the constrained model. The enveloping in the constrained model has the bestestimated performance. We would need to increase the sample size by about 2.5 times for theconstrained estimator (cid:98) β cm to have the performance estimated for the enveloped version (cid:98) β ecm withthe current sample size. 32able 1: Estimated asymptotic variances avar( √ n (cid:98) β ( · ) ) of the four elements of (cid:98) β um from the un-constrained model (1.1), (cid:98) β em from the envelope model (1.4), (cid:98) β cm from the constrained model (1.3)and (cid:98) β ecm from the envelope version of constrained model (1.3).Age (cid:98) β ( · ) (cid:98) β um (cid:98) β em (cid:98) β cm (cid:98) β ecm (cid:98) β em can be traced back to the es-timated eigen-structure of Σ . The eigenvalues of (cid:98) Ω and (cid:98) Ω were (14 . , . and (2 . , . .Envelopes offer relatively little gain when most of the variation in the response is associated mate-rial information, as is the case here. On the other hand, the eigenvalues of (cid:98) Ω and (cid:98) Ω arising fromenveloping in the constrained model were . and . . In this case most of the variation in thedirect response Y D is associated with immaterial information, the general setting when envelopesperform well.Figure 5.1a gives a profile plot of the fitted vectors from envelope model (1.4). The implied fitis quite good and close to the profile plot of the raw mean vectors shown in Supplement Figure ?? a.(Profile plots of residuals are also shown in Figure ?? ). Under envelope theory, the distribution of Q Γ Y should be independent of the predictor values, in this case sex. The profile plot of Q (cid:98) Γ ¯ Y bysex shown in Figure 5.1b reflects this property. Figures 5.1cd show the corresponding plots fromthe fit of the constrained model (1.3). The fit of the constrained model altered the shape of theprofile for girls so that it more closely matches that for boys, which was not done by the fit of theenvelope model. This type of conformity is an intrinsic property of constrained model (1.3).If there is uncertainty about the containment B ⊆ U needed for the constrained model then it33
10 12 14
Age D i s t an c e GirlsBoys 8 10 12 14 − − − Age D i s t an c e GirlsBoys a. Fitted values (cid:98) Y em from (1.4) b. Projected means Q (cid:98) Γ ¯ Y Age D i s t an c e GirlsBoys 8 10 12 14 − − − Age D i s t an c e GirlsBoys c. Fitted values (cid:98) Y cm from (1.3) d. Projected means Q U ¯ Y Figure 5.1: Profile plots by sex of (a) the fitted vectors from envelope model (1.4), (b) meansprojected onto span ⊥ ( (cid:98) Γ ) , (c) the fitted vectors from the constrained model (1.3) and (d) meansprojected onto U ⊥ . The vertical axis for each plot is the distance for the plotted vectors.may be desirable to base an analysis on envelope model (1.4). Otherwise, the results in the lasttwo rows of Table 1 indicate that enveloping in the constrained model (1.3) is the best option fromamong those considered.We also applied the scaled envelope estimator discussed in Section 3.2. The asymptotic vari-ances of the elements of the corresponding estimator of β did not differ materially from those34hown in Table 1 for (cid:98) β um and (cid:98) β em . Then, scaling offered no gains in this example. This was ratheras expected since good scale estimation generally requires large sample size. The China Health and Nutrition Survey (CHNS) was designed to evaluate the effects of the health,nutrition and family planning policies on the health and nutritional status of its population (Popkinet al., 2009). The survey used a multistage, random cluster process to draw samples of householdsin 15 provinces and municipal cities that vary substantially in geography, economic development,public resources, and health indicators. In totals, 9 surveys were carried out between 1989 and2011. We included in our analysis only the 1209 individuals that participated in all the 9 surveys,giving a total of × , records. Five individuals were deleted for having unreasonablechanges in weight or height. For instance, one individual had a height of 65 cm in the seventhsurvey but a height of 160 cm in all other surveys. The baseline predictors we considered includeage at the first survey, binary indicators for gender and region (urban or rural), and a six-levelindicator for highest education levels obtained at the first survey. About 98.2% of the individualsin the analysis were over 21 years old. Age at first survey, gender and region were fully observedbut there were individuals with missing education levels at baseline. We imputed the missingvalues with the education level collected at the next available visit. The response was the change inBMI from baseline at the 8 followup surveys. In the , records, there was a total of 371 valuesof either missing height or weight information needed to calculate BMI. We assumed that heightand weight were missing at random and imputed them by carrying the last observation forward.We compared the estimated asymptotic variances of the unconstrained estimator (cid:98) β um , the en-35elope estimator (cid:98) β em and the constrained estimator (cid:98) β cm from model (1.3) using U T = (1 , t, t ) ,where t is the time in years from baseline. We also included the envelope version of the constrainedestimator (cid:98) β ecm , the scaled envelope estimator (cid:98) β sem from Cook and Su (2013) and its constrainedversion (cid:98) β secm corresponding to model (1.3). We used version (1.3) of the constrained model be-cause we were interested in profile contrasts rather than modeling profiles per se.Since (cid:98) β ( · ) ∈ R × , we report in columns 4–9 of Table 2 various location statistics computedover the estimated variances of the individual elements in (cid:98) β ( · ) . Using these summary statistics asthe basis for comparison, we see that the estimators fall into two clear groups. The unconstrainedestimator does the worst, followed closely by the envelope estimator and the constrained estimator.The three envelope estimators listed in the last three rows of the table do noticeably better than thefirst three. Our assessment based on just the variance summary statistics and taking computationaldifficulty into account leads us to prefer the envelope constrained estimator (cid:98) β ecm . The model orderdetermined by BIC given in the third column of Table 2 tells a similar story. Based on the actualBIC values, the unconstrained estimator in the first row appears clearly inferior to the others, whilethe scaled envelope model in the last row is clearly the best. The remaining models are relativelydifficult to distinguish. We next give a few additional details.Table 2: BIC order, minimum, maximum, mean and quartiles Q – Q of the estimated asymptoticvariances of the elements in (cid:98) β ( · ) for the CHNS studyEstimator Envlp. dim. BIC order Min Q Q Mean Q Max (cid:98) β um (cid:98) β em (cid:98) β cm (cid:98) β ecm (cid:98) β secm (cid:98) β sem (17 . , . of (cid:98) Ω and the six eigenvalues of (cid:98) Ω which rangedbetween . and . . Turning to the envelope version of constrained model (1.3), the estimateddimension of E Σ D | S ( A ) using BIC was 1. The variance gain over the unconstrained model shownin Table 2 is again reflected by the value of (cid:98) Ω = 3 e − and the two eigenvalues of (cid:98) Ω , . and . . As with the regular envelope model, the major variability lies in the immaterial part of theresponse. The aim of the postbiotics study (Dunand et al., 2019) was to determine the protective capac-ity against Salmonella infection in mice of the cell-free fraction (postbiotic) of fermented milkproduced at laboratory and industrial levels. The capacity of the postbiotics produced by pH-controlled fermentation was evaluated to stimulate the production of secretory IgA in feces and toprotect mice against Salmonella infection. There were 3 study groups with seven mice per group:(i) a control group (C), where mice received the unfermented milk supernatant; (ii) an F36 group(F36), where mice received the cell-free supernatant obtained by DSM-100H fermentation in 10%(w/v) skim milk produced in the laboratory; and (iii) an F36D group (F36D), where mice receivedthe product F36 diluted 1/10 in tap water. Feces samples of approximately 50 mg per mouse werecollected once a week for 6 weeks and the concentration of secretory IgA (S-IgA) by ELISA wasdeterminate. The response was the IgA measured over the 6 weeks period and the predictors were37he group indicators.Figure 5.2: Average of IgA by group over time in the Posbiotics Study dataThe research question was whether there were differences of the IgA measures among the treat-ment groups. We present the average response by group over the weeks in Figure 5.2. We set thecontrol group as the baseline and therefore β ∈ R × . We calculate all estimators based on enve-lope model (1.3) because we were interested in profile contrasts rather than modeling profiles. Weuse U T ( t ) = (1 , t/ , ( t/ , cos(2 πt/ , sin(2 πt/ , where t = 1 , . . . , are the weeks where themeasures were taking. The unconstrained estimator (cid:98) β um was considered in Dunand et al. (2019)and it did not show a difference between treatment groups, even when exploratory difference seemapparent from Figure 5.2.Table 3 shows BIC, envelope dimension and MSE of the estimators. We listed the maximumenvelope dimension for the two non-envelope methods as their estimated envelope dimensions.The unconstrained estimator performs the worst and the scaled constrained envelope estimatorperforms the best in terms of both BIC and efficiency.38able 3: Envelope dimension, BIC, BIC order, and MSE for the Postbiotics StudyEstimator Envlp. dim. BIC BIC order MSE (cid:98) β um (cid:98) β em (cid:98) β cm (cid:98) β ecm (cid:98) β sem (cid:98) β secm p -values of the (cid:98) β components. From Table 4we can see that the unconstrained estimator does not reveal any differences, which aligns with thefindings in Dunand et al. (2019). None of the estimators demonstrate any evidence of differencebetween F36D group and the control group at any time. On the other hand, (cid:98) β secm reveals a sig-nificance difference between the control and F36 groups in all followup weeks. The p -values forsuch a comparison of (cid:98) β em are clearly significant only in week 3. Other estimators also fail to findall followup weeks significant between F36 and control groups, e.g., the scaled envelope is notsignificant in week 5 and 6, and constrained envelope is significant only in week 2.Table 4: The p -values for coefficients for (cid:98) β um , (cid:98) β em and (cid:98) β secm week F36 vs control F36 D vs control (cid:98) β um (cid:98) β em (cid:98) β secm (cid:98) β um (cid:98) β em (cid:98) β secm p -values) are reflected by the eigenvalue e − of (cid:98) Ω and the foureigenvalues of (cid:98) Ω which are . , . , . and . . The reason for the envelope estimator to39e not as significant when comparing F36 and control groups is that there is not as big a discrep-ancy between the eigenvalues of (cid:98) Ω ( e − ) and the eigenvalues of (cid:98) Ω ( . , . , . , . , and e − ). The primary computational step for all of the envelope methods described herein involves finding (cid:98) G = arg min G ∈G log | G T M G | + log | G T M G | over a class G of semi-orthogonal matrices,where the inner product matrices M and M depend on the application. The R package Renvlp byM. Lee and Z. Su contains a routine for minimizing objective functions of this form. Computationsare straightforward once (cid:98) G has been found. Renvlp also implements specialized methodology fordata analysis under envelope model (1.4) and partial envelope model. The associated routines canbe modified for the models described herein. Description of and links to packages for envelopemethods are available at z.umn.edu/envelopes.We relegated discussion of certain well-established aspects of envelope methodology to theSupplement. Non-normality and the bootstrap are discussed in Section ?? and methods for select-ing the envelope dimension are reviewed in Section ?? . Enveloping for ( α , α ) jointly is discuss inSection ?? and finally a brief discussions of envelopes and Rao’s simple structure is in Section ?? Extensions to unbalanced data and random effects models requiere additional research,