Resolving the Lord’s Paradox
Priyantha Wijayatunga
Department of Statistics, Umeå University, Umeå SE-90187, Sweden
E-mail for correspondence: [email protected]
Abstract:
An explanation of Lord’s paradox using ordinary least squares regression models is given. It is not a paradox at all if the regression parameters are interpreted as predictive, or as causal under stricter conditions, and if one is aware of the laws of averages. As a solution, we use the derivation of a super-model from a given sub-model when the residuals of the latter can be modelled with other potential predictors.
Keywords:
Effect; Predictive; Causal; Confounding.
This paper was published as a part of the proceedings of the 32nd International Workshop on Statistical Modelling (IWSM), Johann Bernoulli Institute, Rijksuniversiteit Groningen, Netherlands, 3–7 July 2017. The copyright remains with the author(s). Permission to reproduce or extract any parts of this abstract should be requested from the author(s).

In 1967 Frederic Lord posed the following question (see Lord 1967 and Pearl 2016), which became a paradox in the applied statistics community. To assess the effects of the diet provided at a university, and whether those effects differ by sex, the weights of students are recorded at the time of their arrival and again a year later. The data are examined independently by two statisticians. The first examines the mean weight of the girls at the beginning and at the end of the year and finds the two to be identical, i.e., the frequency distribution of weight for the girls has not changed; the same holds for the boys. The second statistician finds that the slope of the regression line of final weight on initial weight is essentially the same for both sexes, but that the regression coefficient of the variable sex is statistically significant, and concludes that the boys showed significantly more gain in weight than the girls when proper allowance is made for differences in initial weight.

The conclusions of the two statisticians seem to contradict each other. The first is predictive; the second is predictive, and also causal if the initial weight is the only confounder of the causal relation between sex and final weight. The second statistician has expressed the causal effect of sex on final weight (weight gain) by a regression coefficient. In fact, for such a comparison the supports of the confounder for the two sexes should coincide. But one can assume that the population ranges of initial weight for boys and girls coincide even though their sample counterparts differ (so extrapolation is meaningful).

Let the initial weight, final weight and sex be denoted by $W_I$, $W_F$ and $S$ respectively ($S = 0$ for a girl and $S = 1$ for a boy), and let the weight gain be $D = W_F - W_I$. If the effect of $S$ on $D$ is found by the difference of conditional means, $E\{D \mid S = 1\} - E\{D \mid S = 0\}$, then there is no effect. This can be found by running a regression of $D$ on $S$. Note that $E\{W_F \mid S = 1\} = E\{W_I \mid S = 1\}$ (say, $\mu_B$) and $E\{W_F \mid S = 0\} = E\{W_I \mid S = 0\}$ (say, $\mu_G$). If $E\{D \mid W_I = i, S = 1\}$ and $E\{D \mid W_I = i, S = 0\}$ are calculated simply by partitioning the data, taking $W_I$ to be discrete, or as functions of $i$, then the difference $E\{D \mid W_I = i, S = 1\} - E\{D \mid W_I = i, S = 0\}$ may not be zero for each $i$, and so neither may the difference of their weighted means,
\[
\sum_i E\{D \mid W_I = i, S = 1\}\, p(W_I = i) - \sum_i E\{D \mid W_I = i, S = 0\}\, p(W_I = i).
\]
If the effect of $S$ on $W_F$ is calculated by the latter, then it is different from the former value (paradoxical!).

Now let us see, by simple algebra, why the two types of differences of averages differ, which shows that they should have two different interpretations. First assume that we have a number of subgroups of boys and, for simplicity, that the same is true for girls.
Let $D^{(1)}_{ij}$ be the weight gain of the $j$-th boy in the $i$-th subgroup of boys, where the subgroup size is $n_i$, and let $D^{(0)}_{ij}$ be that of the $j$-th girl in the $i$-th subgroup of girls, where the subgroup size is $m_i$. Furthermore, let $f^{(1)}_i = n_i / \sum_k n_k$, $f^{(0)}_i = m_i / \sum_k m_k$ and $f_i = (n_i + m_i) / \sum_k (n_k + m_k)$, and let $\bar{D}^{(1)}_i = \sum_j D^{(1)}_{ij} / n_i$ and $\bar{D}^{(0)}_i = \sum_j D^{(0)}_{ij} / m_i$ for $i = 1, \dots, a$. Let $A_1$ be the difference of the average weight gains of the boys and the girls. Then
\[
A_1 = \frac{\sum_{i=1}^{a} \sum_{j=1}^{n_i} D^{(1)}_{ij}}{\sum_{i=1}^{a} n_i} - \frac{\sum_{i=1}^{a} \sum_{j=1}^{m_i} D^{(0)}_{ij}}{\sum_{i=1}^{a} m_i}
= \sum_{i=1}^{a} \bar{D}^{(1)}_i f^{(1)}_i - \sum_{i=1}^{a} \bar{D}^{(0)}_i f^{(0)}_i ,
\]
and, generally,
\[
A_1 \neq \frac{1}{2} \Big\{ \sum_{i=1}^{a} \bar{D}^{(1)}_i f^{(1)}_i + \sum_{i=1}^{a} \bar{D}^{(1)}_i f^{(0)}_i - \sum_{i=1}^{a} \bar{D}^{(0)}_i f^{(1)}_i - \sum_{i=1}^{a} \bar{D}^{(0)}_i f^{(0)}_i \Big\}
= \frac{1}{2} \Big\{ \sum_{i=1}^{a} \big( \bar{D}^{(1)}_i - \bar{D}^{(0)}_i \big) \big( f^{(1)}_i + f^{(0)}_i \big) \Big\} .
\]
Define
\[
A_2 = \sum_{i=1}^{a} \big( \bar{D}^{(1)}_i - \bar{D}^{(0)}_i \big) f_i ,
\]
where $f_i = \alpha f^{(1)}_i + (1 - \alpha) f^{(0)}_i$ for $i = 1, \dots, a$ with $\alpha = \sum_{i=1}^{a} n_i / \sum_{i=1}^{a} (n_i + m_i)$, so that $A_2$ is the difference of weighted averages of the subgroup weight gain averages. When $\alpha = 1/2$, the right-hand side of the previous display is exactly $A_2$.

So, the difference of group averages $A_1$ (which is zero in our case) is different from the difference of pooled-weighted averages of the subgroup averages, $A_2$. The second statistician compares the boys and the girls subgroup-wise and finds a constant gain for the boys over the girls across the subgroups, i.e., $\bar{D}^{(1)}_i - \bar{D}^{(0)}_i$ is constant for all initial weights $i$. Therefore he finds that the boys gain more weight than the girls in corresponding subgroups. Note that for simplicity we have taken initial weights as discrete values. In fact,
\[
A_2 = \sum_i E\{D \mid S = 1, W_I = i\}\, p(W_I = i) - \sum_i E\{D \mid S = 0, W_I = i\}\, p(W_I = i)
\]
is the causal effect of $S$ on $D$ if $W_I$ is the only confounder, under the linear assumption. It is different from $A_1$ unless $E\{D \mid S, W_I\} = E\{D \mid S\}$. The confounding effect, that is, the difference between $A_1$ and $A_2$, depends on how different $f^{(1)}$ and $f^{(0)}$ are (a measure of it can be obtained from them).

Now we define the interpretation of ordinary least squares (OLS) estimates of the regression coefficients (parameters).
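Before doing so, it may help to check the distinction between the two averages numerically. The following is a minimal Python sketch, not part of the original analysis: the subgroup sizes and gains are made-up illustrative numbers, and the names `group_vs_stratified`, `a1` and `a2` are hypothetical. Here `a1` is the difference of the overall group averages of weight gain, and `a2` is the difference of subgroup averages weighted by the pooled subgroup frequencies.

```python
import numpy as np

def group_vs_stratified(gain_boys, gain_girls):
    """Compare two differences of averages of weight gain.

    gain_boys[i] and gain_girls[i] are 1-D arrays holding the individual
    gains in the i-th initial-weight subgroup (hypothetical input layout).
    """
    n = np.array([len(g) for g in gain_boys], dtype=float)   # boys per subgroup
    m = np.array([len(g) for g in gain_girls], dtype=float)  # girls per subgroup
    mean_b = np.array([np.mean(g) for g in gain_boys])       # subgroup means, boys
    mean_g = np.array([np.mean(g) for g in gain_girls])      # subgroup means, girls

    f1 = n / n.sum()                # subgroup frequencies among boys
    f0 = m / m.sum()                # subgroup frequencies among girls
    f = (n + m) / (n + m).sum()     # pooled subgroup frequencies

    a1 = np.sum(mean_b * f1) - np.sum(mean_g * f0)  # difference of group averages
    a2 = np.sum((mean_b - mean_g) * f)              # pooled-weighted subgroup differences
    return a1, a2

# Illustrative data: three initial-weight subgroups (light, medium, heavy).
# Boys are concentrated in the heavy subgroup, girls in the light one.
boys = [np.full(10, 4.0), np.full(20, 1.0), np.full(30, -2.0)]
girls = [np.full(30, 2.0), np.full(20, -1.0), np.full(10, -4.0)]

a1, a2 = group_vs_stratified(boys, girls)
print("difference of group averages:", a1)
print("difference of pooled-weighted subgroup averages:", a2)
```

With these illustrative numbers each sex has zero average gain, so `a1` is zero up to rounding, while the boys gain 2 units more than the girls in every subgroup, so `a2` is 2 up to rounding.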
OLS estimation is based on the variation of the response variable $Y$ for a given functional form of the values of the explanatory factors. Regression coefficients are estimated so that the sum of squared prediction errors for the data in the sample is minimised. So, the reverse regression is not generally obtainable from the forward regression and may not be consistent with the latter. For simple linear regression one can easily establish that the reverse regression and the forward regression are consistent with each other if and only if one of the regressions has residuals that are symmetric about, and uni-modal at, the conditional expectation of the response, which implies the same for the other regression.

Now consider the OLS linear regression model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$. The linear effect of $X_1$ on $Y$ when $X_2$ is held unchanged is given by $\beta_1$ if the $Y$ values are symmetric about, and uni-modal at, $\beta_0 + \beta_1 X_1 + \beta_2 X_2$. It is understood that the supports of $X_2$ for each value of $X_1$ are the same (or that extrapolation is meaningful if the empirical supports differ). Symmetry and uni-modality of the $Y$ values for given values of $X_1$ and $X_2$ are observed if all other factors that affect or are associated with $Y$, but are not taken into consideration, are allowed to vary purely randomly. This is a fundamental assumption used in statistical modelling, often implicitly.

Let us do a regression of $W_F$ on the binary variable $S$. Then we get the model $W_F = \mu_G + (\mu_B - \mu_G) S + \epsilon$, where the regression coefficient of $S$ is the predictive effect of $S$ on $W_F$, provided that the above requirement is fulfilled. The residuals of the model are just the individual values of $D$, i.e., $\epsilon = D$ for each subject, and it is easy to see in Fig. 1 of Lord 1967 that the residuals can be predicted from $W_I$ for each sex category separately, $\epsilon \not\perp W_I \mid S$. However, it may be that $\epsilon \perp S$. So, if the two clusters of values of $W_F$ for the two sexes are symmetric about, and uni-modal at, the respective means, then the effect of $S$ on $W_F$ is the regression coefficient of $S$ in the model.
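The sub-model above can be sketched numerically. The following Python example is only an illustration under stated assumptions: the data are simulated (they are not Lord's data), the group means 60 and 75, the standard deviation 8 and the within-sex correlation 0.5 are hypothetical values, and the helper `simulate` is introduced here for convenience. It fits $W_F$ on $S$ by OLS and then checks whether the residuals are still related to $W_I$ within each sex.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, mu, sd=8.0, rho=0.5):
    """Draw initial and final weights with the same mean and spread.

    All parameter values are hypothetical; rho is the assumed correlation
    between initial and final weight within one sex.
    """
    w_i = rng.normal(mu, sd, n)
    w_f = mu + rho * (w_i - mu) + np.sqrt(1 - rho**2) * rng.normal(0.0, sd, n)
    return w_i, w_f

n = 500
wi_g, wf_g = simulate(n, mu=60.0)  # girls, S = 0
wi_b, wf_b = simulate(n, mu=75.0)  # boys,  S = 1

s = np.concatenate([np.zeros(n), np.ones(n)])
w_i = np.concatenate([wi_g, wi_b])
w_f = np.concatenate([wf_g, wf_b])

# OLS fit of the sub-model W_F = b0 + b1 * S + error
X = np.column_stack([np.ones_like(s), s])
beta, *_ = np.linalg.lstsq(X, w_f, rcond=None)
resid = w_f - X @ beta

# The coefficient of S is compared with the difference of sample means of W_F
diff_means = w_f[s == 1].mean() - w_f[s == 0].mean()

# Within each sex, check whether the residuals are correlated with W_I
corr_girls = np.corrcoef(resid[s == 0], w_i[s == 0])[0, 1]
corr_boys = np.corrcoef(resid[s == 1], w_i[s == 1])[0, 1]

print("coefficient of S:", beta[1], "difference of means:", diff_means)
print("corr(residual, W_I) girls:", corr_girls, "boys:", corr_boys)
```

In such a fit the coefficient of $S$ coincides with the difference of the two sample means of $W_F$, and under the assumed positive correlation the residuals remain positively correlated with $W_I$ within each sex.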
But if there are uncontrolled confounders that are associated with $W_F$, then this coefficient should be interpreted accordingly. That is, it is the predictive effect of sex differences, and it is causal if there are no confounders such as $W_I$. And we see that we get zero predictive effect from the meal change, since the regression coefficient is the same as that when the girls and boys had the previous meal type.

Suppose we can write the distribution of residuals for each value $s$ of $S$, say $f(\epsilon \mid s)$, as a mixture, $f(\epsilon \mid s) = \int g(\epsilon \mid x, s)\, \pi(x, s)\, dx$ for some random variable $X$, where for each value $x$ of $X$ the component distribution $g(\epsilon \mid x, s)$ may have non-zero mean, such that $E\{\epsilon \mid s\} = \int E\{\epsilon \mid x, s\}\, \pi(x, s)\, dx = 0$. Then we have that $Var\{\epsilon \mid x, s\} \leq Var\{\epsilon \mid s\}$, where $\pi(x, s) = h(x \mid s)\, p(s)$; here $h(x \mid s)$ is the conditional probability density of $X$ given $S = s$ and $p(s)$ is the marginal probability distribution of $S$. If $X$ can be identified meaningfully, then the model should include such feature variables too. In this case, $X$ can be identified as the initial weight $W_I$. Then one should accept the upgraded model that includes $W_I$ too. It has residuals with a smaller conditional standard deviation given $W_I$ and $S$. Furthermore, if $W_I$ is the only confounding factor, then once it is also included in the model the coefficient of $S$ is the causal effect of $S$ on $W_F$.

Let the residual $\epsilon'$ correspond to the context $W_I = w_I$ and $S = s$; then it can be written as $\epsilon' = \mu^{w_I, s}_{\epsilon} + \epsilon_1$, where $\mu^{w_I, s}_{\epsilon}$ is its expectation and $\epsilon_1$ is the remaining error. So we have $E\{\epsilon_1 \mid W_I = w_I, S = s\} = 0$ and also $Var\{\epsilon_1 \mid W_I = w_I, S = s\} \leq Var\{\epsilon \mid S = s\}$. Furthermore, we can have $\mu^{w_I, 0}_{\epsilon} = a_0 + b\, w_I$ for $s = 0$ and $\mu^{w_I, 1}_{\epsilon} = a_1 + b\, w_I$ for $s = 1$, where $a_0$, $b$ and $a_1$ are constants. Now, given that $W_I = w_I$ and $S = s$, for $s = 0, 1$, and with $I(A) = 1$ when $A$ is a true statement and $I(A) = 0$ otherwise, we have
\[
\begin{aligned}
W_F &= \mu_G + (\mu_B - \mu_G) s + \mu^{w_I, s}_{\epsilon} + \epsilon_1 \\
&= \mu_G + (\mu_B - \mu_G) s + (a_0 + b\, w_I) I(S = 0) + (a_1 + b\, w_I) I(S = 1) + \epsilon_1 \\
&= \mu_G + (\mu_B - \mu_G) s + a_0 I(S = 0) + a_1 I(S = 1) + b\, w_I + \epsilon_1 \\
&= \mu_G + (\mu_B - \mu_G) s + a_0 (1 - s) + a_1 s + b\, w_I + \epsilon_1 \\
&= \mu_G + a_0 + (\mu_B - \mu_G - a_0 + a_1) s + b\, w_I + \epsilon_1 .
\end{aligned}
\]
So we can obtain a super-model (regression) from a given regression model (which is a sub-model of the former) as long as its residuals can be predicted (linearly in this case) from another explanatory variable. The predictive effect of $S$ on $W_F$ when controlled for $W_I$ is $\mu_B - \mu_G - a_0 + a_1$, which is generally different from the earlier value of $\mu_B - \mu_G$. For each individual, the prediction of the new model is more accurate than that of the previous model; therefore the new model is preferred to the previous one. If $W_I$ is only a confounder and not an intermediate variable on the causal pathway between $S$ and $W_F$, and has a common support for all values of $S$, then the coefficient of $S$ in this model is the average causal effect of $S$ on $W_F$ in the linear case. In our example, the sample supports of $W_I$ for $S = 1$ and $S = 0$ differ, but we can assume that they are the same in the population (so extrapolation is meaningful). Note that the above arguments can be generalised. Owing to space restrictions, we do not present a solution to the paradox based on causal diagrams. We object to the recent solution by Pearl (2016). Our explanations comply with Lord’s initial comments.

References
Lord, F. M. (1967). A Paradox in the Interpretation of Group Comparisons. Psychological Bulletin, (5), 304–305.
Pearl, J. (2016). Lord’s Paradox Revisited - (Oh Lord Kumbaya!). Journal of Causal Inference, 4