
Resolving the Lord's Paradox

Priyantha Wijayatunga
Department of Statistics, Umeå University, Umeå SE-90187, Sweden
E-mail for correspondence: [email protected]

Abstract:

An explanation of Lord's paradox using ordinary least squares regression models is given. It is not a paradox at all if the regression parameters are interpreted as predictive, or as causal under stricter conditions, and if one is aware of the laws of averages. As a solution, we derive a super-model from a given sub-model when the residuals of the sub-model can be modelled with other potential predictors.

Keywords:

Effect; Predictive; Causal; Confounding.

This paper was published as a part of the proceedings of the 32nd International Workshop on Statistical Modelling (IWSM), Johann Bernoulli Institute, Rijksuniversiteit Groningen, Netherlands, 3–7 July 2017. The copyright remains with the author(s). Permission to reproduce or extract any parts of this abstract should be requested from the author(s).

In 1967 Frederic Lord posed the following question (see Lord 1967 and Pearl 2016), which became a paradox in the applied statistics community. To see the effects of the diet provided in a university, and whether there is any sex difference in them, the weights of students at the time of their arrival and a year later are recorded. The data are examined independently by two statisticians. The first examines the mean weight of the girls at the beginning and at the end of the year and finds them to be identical, i.e., the frequency distribution of weight for the girls has not changed; the same holds for the boys. The second statistician finds that the slope of the regression line of the final weight on the initial weight is essentially the same for both sexes, but that the regression coefficient of the variable sex is statistically significant, and concludes that the boys showed significantly more gain in weight than the girls when proper allowance is made for differences in initial weight.

The conclusions of the two statisticians seem to contradict each other. The first is predictive; the second is both predictive and, if the initial weight is the only confounder of the causal relation between sex and final weight, causal. The second has given the causal effect of sex on the final weight (weight gain) by a regression coefficient. In fact, to give it by comparison, the supports of the confounder for the two sexes should coincide. But one can assume that the population ranges of initial weight for boys and girls coincide even though their sample counterparts differ (so extrapolation is meaningful).

Let the initial weight, final weight and sex be denoted by $W_I$, $W_F$ and $S$ respectively ($S = 0$ for a girl and $S = 1$ for a boy), and let the weight gain be $D = W_F - W_I$. If the effect of $S$ on $D$ is found by the difference of conditional means, $E\{D \mid S = 1\} - E\{D \mid S = 0\}$, then there is no effect. This can be found by running a regression of $D$ on $S$. Note that $E\{W_F \mid S = 1\} = E\{W_I \mid S = 1\}$ ($= \mu_B$, say) and $E\{W_F \mid S = 0\} = E\{W_I \mid S = 0\}$ ($= \mu_G$, say). If $E\{D \mid W_I = i, S = 1\}$ and $E\{D \mid W_I = i, S = 0\}$ are calculated simply by partitioning the data, taking $W_I$ to be discrete, or as functions of $i$, then the difference $E\{D \mid W_I = i, S = 1\} - E\{D \mid W_I = i, S = 0\}$ may not be zero for each $i$, and so neither may the difference of their weighted means,

$$\sum_i E\{D \mid W_I = i, S = 1\}\, p(W_I = i) - \sum_i E\{D \mid W_I = i, S = 0\}\, p(W_I = i).$$

If the effect of $S$ on $W_F$ is calculated in this way, then it differs from the former value (paradoxical!).

Now let us see by simple algebra why the two types of differences of averages differ, which shows that they should have two different interpretations. First assume that we have a number of subgroups of boys and, for simplicity, that the same is true for girls.
Let $D^B_{ij}$ be the weight gain of the $j$-th boy in the $i$-th subgroup of boys, where the subgroup size is $n_i$, and let $D^G_{ij}$ be that of the $j$-th girl in the $i$-th subgroup of girls, where the subgroup size is $m_i$, for $i = 1, \dots, a$. Furthermore, let $f^B_i = n_i / \sum_k n_k$, $f^G_i = m_i / \sum_k m_k$ and $f_i = (n_i + m_i) / \sum_k (n_k + m_k)$, and let $\bar{D}^B_i = \sum_j D^B_{ij} / n_i$ and $\bar{D}^G_i = \sum_j D^G_{ij} / m_i$ for $i = 1, \dots, a$. Let $A$ be the difference between the average weight gain of the boys and that of the girls. So,

$$A = \frac{\sum_{i=1}^{a} \sum_{j=1}^{n_i} D^B_{ij}}{\sum_{i=1}^{a} n_i} - \frac{\sum_{i=1}^{a} \sum_{j=1}^{m_i} D^G_{ij}}{\sum_{i=1}^{a} m_i} = \sum_{i=1}^{a} \bar{D}^B_i f^B_i - \sum_{i=1}^{a} \bar{D}^G_i f^G_i = \frac{1}{2} \Big\{ \sum_{i=1}^{a} \bar{D}^B_i f^B_i + \sum_{i=1}^{a} \bar{D}^B_i f^B_i - \sum_{i=1}^{a} \bar{D}^G_i f^G_i - \sum_{i=1}^{a} \bar{D}^G_i f^G_i \Big\},$$

which in general is equal to neither

$$\frac{1}{2} \Big\{ \sum_{i=1}^{a} (\bar{D}^B_i - \bar{D}^G_i)(f^B_i + f^G_i) \Big\} \quad \text{nor} \quad A' = \sum_{i=1}^{a} (\bar{D}^B_i - \bar{D}^G_i) f_i,$$

where $f_i = \alpha f^B_i + (1 - \alpha) f^G_i$ for $i = 1, \dots, a$ with $\alpha = \sum_{i=1}^{a} n_i / \sum_{i=1}^{a} (n_i + m_i)$, and $A'$ is the difference of weighted averages of the subgroup weight gain averages. So the difference of group averages $A$ (which is zero in our case) is different from the difference of pooled-weighted averages of the subgroup averages $A'$.

The second statistician compares the boys and the girls subgroup-wise and finds a constant gain for the boys over the girls across the subgroups, i.e., $\bar{D}^B_i - \bar{D}^G_i$ is constant for all initial weights $i$. Therefore he finds that the boys gain more weight than the girls in corresponding subgroups. Note that for simplicity we have taken initial weights as discrete values. In fact,

$$A' = \sum_i E\{D \mid S = 1, W_I = i\}\, p(W_I = i) - \sum_i E\{D \mid S = 0, W_I = i\}\, p(W_I = i)$$

is the causal effect of $S$ on $D$ if $W_I$ is the only confounder, under the linear assumption. It is different from $A$ unless $E\{D \mid S, W_I\} = E\{D \mid S\}$. The confounding effect $(A - A')$ depends on how different $f^B$ and $f^G$ are (a measure of it can be obtained from them).

Now we define the interpretation of ordinary least squares (OLS) estimates of the regression coefficients (parameters).
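Before doing so, the relation between $A$ and $A'$ derived above can be checked on a small numerical example. The sketch below uses made-up subgroup sizes and subgroup mean gains (illustrative assumptions, not data from Lord 1967), chosen so that each sex has zero average gain while the boys gain 2 units more than the girls in every subgroup. The variable names are ours.

```python
# Hypothetical subgroup summaries (illustrative numbers, not from Lord 1967).
n = [10, 30, 60]                 # n_i: number of boys in initial-weight subgroup i
m = [60, 30, 10]                 # m_i: number of girls in subgroup i
dbar_boys = [3.0, 1.0, -1.0]     # mean weight gain of boys in subgroup i
dbar_girls = [1.0, -1.0, -3.0]   # mean weight gain of girls in subgroup i

N, M = sum(n), sum(m)
f_boys = [ni / N for ni in n]                              # n_i / sum_k n_k
f_girls = [mi / M for mi in m]                             # m_i / sum_k m_k
f_pooled = [(ni + mi) / (N + M) for ni, mi in zip(n, m)]   # pooled weights f_i
alpha = N / (N + M)

# A: difference of the overall group averages of weight gain.
A = (sum(d * f for d, f in zip(dbar_boys, f_boys))
     - sum(d * f for d, f in zip(dbar_girls, f_girls)))

# A': pooled-weighted average of the subgroup-wise differences.
A_prime = sum((db - dg) * f
              for db, dg, f in zip(dbar_boys, dbar_girls, f_pooled))

# The pooled weights are the alpha-mixture of the two sex-specific weights.
mixture = [alpha * fb + (1 - alpha) * fg for fb, fg in zip(f_boys, f_girls)]

print("A  =", round(A, 10))
print("A' =", round(A_prime, 10))
```

With these numbers $A$ is zero up to floating-point rounding while $A'$ is 2, so the two statisticians' summaries differ even though both are computed correctly from the same subgroup averages.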
OLS estimation is based on the variation of the response variable $Y$ for a given functional form of the values of the explanatory factors. Regression coefficients are estimated so that the sum of squared prediction errors for the data in the sample is a minimum. So the reverse regression is not generally obtainable from the forward regression and may not be consistent with it. For simple linear regression one can easily establish that the reverse regression and the forward regression are consistent with each other if and only if one of the regressions has residuals that are symmetric about, and uni-modal at, the conditional expectation of the response, which implies the same for the other regression.

Now consider the OLS linear regression model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$. The linear effect of $X_1$ on $Y$ when $X_2$ is held unchanged is given by $\beta_1$ if the $Y$ values are symmetric about, and uni-modal at, $\beta_0 + \beta_1 X_1 + \beta_2 X_2$. Here it is understood that the supports of $X_1$ for each value of $X_2$ are the same (or that extrapolation is meaningful if the empirical supports differ). Symmetry and uni-modality of the $Y$ values for given values of $X_1$ and $X_2$ are observed if all other factors that affect $Y$ or are associated with it, but are not taken into consideration, are allowed to vary purely at random. This is a fundamental assumption in statistical modelling, often used implicitly.

Let us do a regression of $W_F$ on the binary variable $S$. Then we get the model

$$W_F = \mu_G + (\mu_B - \mu_G) S + \epsilon,$$

where the regression coefficient of $S$ is the predictive effect of $S$ on $W_F$ provided that the above requirement is fulfilled. The residuals of the model are just individual values of $D$, i.e., $\epsilon = D$ for each subject, and it is easy to see in Fig. 1 of Lord (1967) that the residuals are predictable from $W_I$ for each sex category separately, $\epsilon \not\perp W_I \mid S$. However, it may be that $\epsilon \perp S$. So, if the two clusters of values of $W_F$ for the two sexes are symmetric about, and uni-modal at, their respective means, then the effect of $S$ on $W_F$ is the regression coefficient of $S$ in the model.
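To make the sub-model concrete, the following sketch simulates data resembling Lord's setting and fits the regression of $W_F$ on $S$. The simulation design is an assumption made for illustration: within each sex, initial and final weights share the same mean and standard deviation and have correlation `rho`. The means 60 and 75, the standard deviation 10 and `rho = 0.5` are arbitrary choices, not values from Lord (1967).

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def simulate(mu, n, rho=0.5, sigma=10.0):
    """Draw (W_I, W_F) pairs with common mean mu, common spread sigma, correlation rho."""
    pairs = []
    for _ in range(n):
        w_i = random.gauss(mu, sigma)
        noise = random.gauss(0.0, sigma * (1 - rho ** 2) ** 0.5)
        pairs.append((w_i, mu + rho * (w_i - mu) + noise))
    return pairs


def mean(xs):
    return sum(xs) / len(xs)


def slope(xs, ys):
    """OLS slope of ys on xs in a simple linear regression."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


girls = simulate(mu=60.0, n=5000)  # S = 0
boys = simulate(mu=75.0, n=5000)   # S = 1

# Sub-model W_F = mu_G + (mu_B - mu_G) S + eps.
# With a single binary regressor, OLS reproduces the two group means.
mu_g = mean([wf for _, wf in girls])
mu_b = mean([wf for _, wf in boys])
coef_s = mu_b - mu_g

# First statistician: mean weight gain within each sex.
gain_girls = mean([wf - wi for wi, wf in girls])
gain_boys = mean([wf - wi for wi, wf in boys])

# Slope of the sub-model residuals on W_I, separately for each sex.
slope_girls = slope([wi for wi, _ in girls], [wf - mu_g for _, wf in girls])
slope_boys = slope([wi for wi, _ in boys], [wf - mu_b for _, wf in boys])

print("coefficient of S:", round(coef_s, 2))
print("mean gains (girls, boys):", round(gain_girls, 2), round(gain_boys, 2))
print("residual slopes on W_I (girls, boys):",
      round(slope_girls, 2), round(slope_boys, 2))
```

In such a simulation the mean gain of each sex is close to zero and the coefficient of $S$ is close to the difference in mean weights, yet the sub-model residuals have a slope near `rho` on $W_I$ within each sex, which is the sense in which they are predictable from $W_I$.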
If, however, there are uncontrolled confounders that are associated with $W_F$, then the coefficient of $S$ should be interpreted accordingly. That is, it is the predictive effect of the sex difference, and it is causal if there are no confounders such as $W_I$. And we see that we get zero predictive effect from the meal change, since the regression coefficient is the same as it was when the girls and boys had the previous meal type.

Suppose we can write the distribution of the residuals for each value $s$ of $S$, say $f(\epsilon \mid s)$, as a mixture,

$$f(\epsilon \mid s) = \int g(\epsilon \mid x, s)\, \pi(x, s)\, dx$$

for some random variable $X$, where for each value $x$ of $X$ the component distribution $g(\epsilon \mid x, s)$ may have a non-zero mean, such that $E\{\epsilon \mid s\} = \int E\{\epsilon \mid x, s\}\, \pi(x, s)\, dx = 0$. Then we have that

$$Var\{\epsilon \mid x, s\} \le Var\{\epsilon \mid s\},$$

where $\pi(x, s) = h(x \mid s)\, p(s)$; here $h(x \mid s)$ is the conditional probability density of $X$ given $S = s$ and $p(s)$ is the marginal probability distribution of $S$. If $X$ can be identified meaningfully, then the model should include such feature variables too. In this case, $X$ can be identified as the initial weight $W_I$. Then one should accept the upgraded model that includes $W_I$ too. It has residuals with a smaller conditional standard deviation given $W_I$ and $S$. Furthermore, if $W_I$ is the only confounding factor, then when it is also included in the model the coefficient of $S$ is the causal effect of $S$ on $W_F$.

Consider the residual $\epsilon$ of the sub-model in the context $W_I = w_I$ and $S = s$. It can be written as $\epsilon = \mu^{w_I, s}_{\epsilon} + \epsilon'$, where $\mu^{w_I, s}_{\epsilon}$ is its expectation in that context and $\epsilon'$ is the new residual. So we have $E\{\epsilon' \mid W_I = w_I, S = s\} = 0$ and also $Var\{\epsilon' \mid W_I = w_I, S = s\} \le Var\{\epsilon \mid S = s\}$. Furthermore, we can have $\mu^{w_I, 0}_{\epsilon} = a_0 + b\, w_I$ for $s = 0$ and $\mu^{w_I, 1}_{\epsilon} = a_1 + b\, w_I$ for $s = 1$, where $a_0$, $b$ and $a_1$ are constants.

Now, given that $W_I = w_I$ and $S = s$, for $s = 0, 1$, and with $I(A) = 1$ when $A$ is a true statement and $I(A) = 0$ otherwise, we have

$$
\begin{aligned}
W_F &= \mu_G + (\mu_B - \mu_G) s + \mu^{w_I, s}_{\epsilon} + \epsilon' \\
&= \mu_G + (\mu_B - \mu_G) s + (a_0 + b\, w_I) I(S = 0) + (a_1 + b\, w_I) I(S = 1) + \epsilon' \\
&= \mu_G + (\mu_B - \mu_G) s + a_0 I(S = 0) + a_1 I(S = 1) + b\, w_I + \epsilon' \\
&= \mu_G + (\mu_B - \mu_G) s + a_0 (1 - s) + a_1 s + b\, w_I + \epsilon' \\
&= \mu_G + a_0 + (\mu_B - \mu_G - a_0 + a_1) s + b\, w_I + \epsilon'.
\end{aligned}
$$

So we can obtain a super-model (regression) from a given regression model (which is a sub-model of the former) as long as its residuals are predictable (linearly in this case) from another explanatory variable. The predictive effect of $S$ on $W_F$ when controlled for $W_I$ is $\mu_B - \mu_G - a_0 + a_1$, which is generally different from the earlier value of $\mu_B - \mu_G$. For each individual the prediction of the new model is more accurate than that of the previous model, and therefore the new model is preferred to the previous one. If $W_I$ is only a confounder, and not an intermediate variable on the causal pathway between $S$ and $W_F$, and has a common support for all values of $S$, then $\beta_1$, the coefficient of $S$, is the average causal effect of $S$ on $W_F$ in the linear case. In our example, the sample supports of $W_I$ for $S = 1$ and $S = 0$ differ, but we can assume that they are the same in the population (so extrapolation is meaningful).

Note that the above arguments can be generalised. Owing to restrictions of space, we do not present the solution to the paradox that is based on causal diagrams. We object to the recent solution by Pearl. Our explanations comply with Lord's initial comments.

References

Lord, F. M. (1967). A Paradox in the Interpretation of Group Comparisons. Psychological Bulletin, 68(5), 304–305.

Pearl, J. (2016). Lord's Paradox Revisited - (Oh Lord Kumbaya!). Journal of Causal Inference, 4