On the Non-Monotonicity of a Non-Differentially Mismeasured Binary Confounder
Jose M. Peña, IDA, Linköping University, [email protected]
Abstract. Suppose that we are interested in the average causal effect of a binary treatment on an outcome when this relationship is confounded by a binary confounder. Suppose that the confounder is unobserved but a non-differential binary proxy of it is observed. We identify conditions under which adjusting for the proxy comes closer to the incomputable true average causal effect than not adjusting at all. Unlike other works, we do not assume that the average causal effect of the confounder on the outcome is in the same direction among treated and untreated.

1. Introduction
Suppose that we are interested in the average causal effect of a binary treatment A on an outcome Y when this relationship is confounded by a binary confounder C. Suppose also that C is non-differentially mismeasured, meaning that (i) C is not observed and, instead, a binary proxy D of C is observed, and (ii) D is conditionally independent of A and Y given C. The causal graph to the left in Figure 1 represents the relationships between the random variables.

Greenland (1980) argues that adjusting for D produces a partially adjusted measure of the average causal effect of A on Y that is between the crude (i.e., unadjusted) and the true (i.e., adjusted for C) measures and, thus, it comes closer to the incomputable true measure than the crude one. Ogburn and VanderWeele (2012) show that, although this result does not always hold, it does hold under some monotonicity condition in C. Specifically, E[Y | A, C] must be non-decreasing or non-increasing in C. Unfortunately, the condition cannot be verified empirically because C is unobserved. Ogburn and VanderWeele (2013) extend these results to the case where C takes more than two values. Peña (2020) shows that if E[Y | A, D] is non-decreasing or non-increasing in D (which can be verified empirically), then so is E[Y | A, C] with respect to C and, thus, the partially adjusted average causal effect lies between the crude and the true ones. Finally, if there are at least two independent proxies of C, then Miao et al. (2018) show that the average causal effect of A on Y can be identified under a certain rank condition.

Figure 1. Left: Causal graph where Y is a discrete or continuous random variable, and A, C and D are binary random variables. Moreover, C is unobserved. Right: Path diagram with coefficients α, β, γ and δ, where C is unobserved.

In this paper, we focus on the case where neither E[Y | A, C] nor E[Y | A, D] is monotone in C or D. We report conditions under which the partially adjusted average causal effect is still between the crude and the true ones and, thus, is still closer to the incomputable true average causal effect. Specifically, the rest of the paper is organized as follows. Sections 2 and 3 report the novel conditions. Section 4 deals with continuous random variables. Section 5 closes with some discussion.

2. Bounding the Observed Risk Difference
Consider the causal graph to the left in Figure 1, where Y is a discrete or continuous random variable, and A, C and D are binary random variables. The graph entails the following factorization:

p(A, C, D, Y) = p(C) p(D | C) p(A | C) p(Y | A, C).    (1)

Let A take values a and a', and similarly for C and D. Let A, D and Y be observed and let C be unobserved. Let Y_a and Y_a' denote the counterfactual outcomes under treatments A = a and A = a', respectively. The average causal effect of A on Y or true risk difference (RD_true) is defined as RD_true = E[Y_a] − E[Y_a']. It can be rewritten as follows (Pearl, 2009, Theorem 3.3.2):

RD_true = E[Y | a, c] p(c) + E[Y | a, c'] p(c') − E[Y | a', c] p(c) − E[Y | a', c'] p(c').

Since C is unobserved, RD_true cannot be computed. However, it can be approximated by the unadjusted average causal effect or crude risk difference (RD_crude):

RD_crude = E[Y | a] − E[Y | a']

and by the partially adjusted average causal effect or observed risk difference (RD_obs):

RD_obs = E[Y | a, d] p(d) + E[Y | a, d'] p(d') − E[Y | a', d] p(d) − E[Y | a', d'] p(d').

Now the question is, which of the two approximations comes closer to the true quantity? This paper aims to answer this question.

We say that E[Y | A, C] is non-decreasing in C if E[Y | a, c] ≥ E[Y | a, c'] and E[Y | a', c] ≥ E[Y | a', c']. Likewise, E[Y | A, C] is non-increasing in C if E[Y | a, c] ≤ E[Y | a, c'] and E[Y | a', c] ≤ E[Y | a', c']. Moreover, E[Y | A, C] is monotone in C if it is non-decreasing or non-increasing in C, i.e. the average causal effect of C on Y is in the same direction among the treated (A = a) and the untreated (A = a'). Ogburn and VanderWeele (2012, Result 1) show that if E[Y | A, C] is monotone in C, then RD_obs lies between RD_true and RD_crude and, thus, it comes closer to RD_true than RD_crude.
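To make these quantities concrete, the following sketch computes RD_true, RD_crude and RD_obs exactly from one hypothetical, arbitrarily chosen parameterization of Equation 1 (all numeric values below are ours, purely for illustration); for Y, only the conditional means E[Y | A, C] are needed:

```python
# Illustrative parameterization of Equation 1 (values are arbitrary).
# C takes values c, c'; A takes a, a'; D takes d, d'; Y is summarized
# by its conditional means E[Y | A, C], chosen non-monotone in C.
p_c = 0.5                                  # p(c); p(c') = 1 - p_c
pa = {"c": 0.8, "c'": 0.3}                 # p(a | C)
pd = {"c": 0.9, "c'": 0.2}                 # p(d | C)
m = {("a", "c"): 2.0, ("a", "c'"): 1.0,    # E[Y | a, C]
     ("a'", "c"): 0.5, ("a'", "c'"): 1.5}  # E[Y | a', C]

C_VALS, D_VALS = ("c", "c'"), ("d", "d'")

def p_C(C):
    return p_c if C == "c" else 1 - p_c

def p_A(A, C):
    return pa[C] if A == "a" else 1 - pa[C]

def p_D(D, C):
    return pd[C] if D == "d" else 1 - pd[C]

# RD_true: adjust for the (unobserved) confounder C.
rd_true = sum((m["a", C] - m["a'", C]) * p_C(C) for C in C_VALS)

# RD_crude: E[Y | a] - E[Y | a'], averaging E[Y | A, C] over p(C | A).
def cond_mean_given_a(A):
    pA = sum(p_A(A, C) * p_C(C) for C in C_VALS)
    return sum(m[A, C] * p_A(A, C) * p_C(C) for C in C_VALS) / pA

rd_crude = cond_mean_given_a("a") - cond_mean_given_a("a'")

# RD_obs: adjust for the proxy D; p(C | A, D) follows from Equation 1.
def cond_mean_given_ad(A, D):
    w = {C: p_C(C) * p_A(A, C) * p_D(D, C) for C in C_VALS}
    return sum(m[A, C] * w[C] for C in C_VALS) / sum(w.values())

rd_obs = sum(sum(p_D(D, C) * p_C(C) for C in C_VALS)
             * (cond_mean_given_ad("a", D) - cond_mean_given_ad("a'", D))
             for D in D_VALS)

print(rd_true, rd_crude, rd_obs)
```

Here E[Y | A, C] is non-monotone in C (the effect of C on Y has opposite directions among treated and untreated), which is precisely the regime studied below.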
Unfortunately, the antecedent of this rule cannot be verified empirically, because C is unobserved. Therefore, one must rely on substantive knowledge to apply the rule. Peña (2020, Corollary 2) shows that if E[Y | A, D] is monotone in D, then RD_obs lies between RD_true and RD_crude. Note that the antecedent of this rule can be verified empirically. Actually, E[Y | A, C] is monotone in C if and only if E[Y | A, D] is monotone in D (Ogburn and VanderWeele, 2012; Peña, 2020).

Peña (2020, Theorems 3 and 4) characterizes a case where E[Y | A, C] is not monotone in C and, thus, E[Y | A, D] is not monotone in D, and yet RD_obs lies between RD_true and RD_crude. We re-state this result in the next theorem. Note that one must rely on substantive knowledge to verify the conditions in the theorem.

Theorem 1 (Peña, 2020, Theorems 3 and 4). Consider the causal graph to the left in Figure 1. Let p(c) = 0.5 and p(a | c) = p(a' | c') = p(d | c) = p(d' | c') ≥ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≥ RD_obs ≥ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≤ RD_obs ≤ RD_true.

The following theorems are the main contribution of this work. They show that the conditions in the previous theorem can be relaxed. Their proofs can be found in the supplementary material.
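Theorem 1 can be spot-checked numerically. The sketch below (ours; a check, not a proof) draws random parameterizations satisfying the theorem's conditions, coding C, A, D as 1 for c, a, d and 0 for c', a', d', and verifies the claimed ordering:

```python
import random

def risk_differences(q, m):
    # Exact RD_true, RD_crude, RD_obs under p(c) = 0.5 and
    # p(a | c) = p(a' | c') = p(d | c) = p(d' | c') = q;
    # m[A, C] = E[Y | A, C] with values coded 1 (a, c) and 0 (a', c').
    pc = {1: 0.5, 0: 0.5}
    pA = lambda A, C: q if A == C else 1 - q
    pD = lambda D, C: q if D == C else 1 - q
    true = sum((m[1, C] - m[0, C]) * pc[C] for C in (0, 1))
    crude = sum(s * sum(m[A, C] * pA(A, C) * pc[C] for C in (0, 1))
                  / sum(pA(A, C) * pc[C] for C in (0, 1))
                for A, s in ((1, 1), (0, -1)))
    obs = 0.0
    for D in (0, 1):
        pd = sum(pD(D, C) * pc[C] for C in (0, 1))
        for A, s in ((1, 1), (0, -1)):
            w = {C: pc[C] * pA(A, C) * pD(D, C) for C in (0, 1)}
            obs += s * pd * sum(m[A, C] * w[C] for C in (0, 1)) / sum(w.values())
    return true, crude, obs

random.seed(0)
for _ in range(1000):
    q = random.uniform(0.5, 0.99)
    base = random.random()
    g2 = random.random()           # E[Y | a', c'] - E[Y | a', c] >= 0
    g1 = g2 + random.random()      # E[Y | a, c] - E[Y | a, c'] >= g2
    m = {(1, 1): base + g1, (1, 0): base,   # treated: c raises Y
         (0, 0): base + g2, (0, 1): base}   # untreated: c lowers Y
    t, c, o = risk_differences(q, m)
    assert c + 1e-9 >= o >= t - 1e-9, (t, o, c)
```

In every sampled case RD_crude ≥ RD_obs ≥ RD_true, as the first part of the theorem states.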
Theorem 2.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a | c) = p(a' | c') ≥ 0.5 and p(d | c) = p(d' | c') ≥ 0.5. Then, RD_obs lies between RD_true and RD_crude.

Theorem 3.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a | c) = p(a' | c') ≤ 0.5 and p(d | c) = p(d' | c') ≤ 0.5. Then, RD_obs lies between RD_true and RD_crude.

The following example gives some intuition about the conditions in Theorem 2. Let A, D and Y represent three diseases, and C a gene variant that affects the three of them. Moreover, suppose that suffering A affects the risk of suffering Y. Suppose also that half of the population carries the gene variant C, i.e. p(c) = 0.5. Suppose also that carrying C predisposes to suffer A and D as much as not carrying it protects against the diseases, i.e. p(a | c) = p(a' | c') ≥ 0.5 and p(d | c) = p(d' | c') ≥ 0.5.

Corollary 4.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a | c) = p(a' | c') ≥ 0.5 and p(d | c) = p(d' | c') ≥ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≥ RD_obs ≥ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≤ RD_obs ≤ RD_true.

Corollary 5.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a | c) = p(a' | c') ≤ 0.5 and p(d | c) = p(d' | c') ≤ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≤ RD_obs ≤ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≥ RD_obs ≥ RD_true.

To get some intuition about the conditions for the first result in Corollary 4, let us extend the previous example with the following additional assumption: carrying the gene variant C increases the average severity of Y for the individuals suffering A more than it decreases the severity for the rest. Then, the corollary applies.

Note that one must rely on substantive knowledge to verify the conditions in the previous theorems and corollaries. The next two corollaries show that this can partially be alleviated by replacing the conditions on E[Y | A, C] with similar conditions on E[Y | A, D]: the former are not empirically testable because C is unobserved, but the latter are.

Corollary 6.
Under the conditions in Corollary 4, E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0 if and only if E[Y | a, d] − E[Y | a, d'] ≥ E[Y | a', d'] − E[Y | a', d] ≥ 0. Likewise when replacing ≥ with ≤.

Corollary 7.
Under the conditions in Corollary 5, E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0 if and only if E[Y | a, d] − E[Y | a, d'] ≤ E[Y | a', d'] − E[Y | a', d] ≤ 0. Likewise when swapping ≤ and ≥.

Experiments.
In this section, we report some experiments that shed additional light on the relationships between the various risk differences under the conditions in Theorem 2. For the experiments, we let Y be binary. Then, we randomly parameterize 10000 times the causal graph to the left in Figure 1 by parameterizing the terms in the right-hand side of Equation 1 with parameter values drawn from a uniform distribution, while enforcing the assumptions in the theorem. For each parameterization, we compute RD_true, RD_obs and RD_crude. Code available at .

Figure 2. (tl) Histogram of the interval length. (tr) Distance between RD_obs and RD_true relative to the interval length. (bl) Zoom of the previous plot. (br) Distance between RD_obs and RD_true relative to the interval length, as a function of the strength of the dependence between C and D when measured by the Youden index.

Figure 2 summarizes the results. The top left plot shows that most intervals are relatively small and, thus, that RD_obs is close to RD_true in most cases. However, the top right plot shows that RD_obs tends to be closer to RD_crude than to RD_true. The bottom left plot is a zoom of the previous plot at the smallest intervals. Finally, the bottom right plot shows that the stronger the dependence between C and D as measured by the Youden index (i.e., p(d | c) + p(d' | c') − 1), the closer RD_obs is to RD_true.

In summary, RD_obs is a reasonable approximation to RD_true, but it is biased towards RD_crude. This may be a problem when the interval between RD_crude and RD_true is large. However, the length of the interval is unknown in practice, and we doubt substantive knowledge may provide hints on it. The bias decreases with increasing dependence between C and D. Although the strength of this dependence is unknown in practice, substantive knowledge may give hints on it.

3. Bounding the True Risk Difference
Theorems and Corollaries 1-5 do not hold if the assumption that p(a | c) = p(a' | c') ≥ 0.5 is replaced by p(a' | c') ≥ p(a | c) ≥ 0.5. Likewise for the assumption that p(d | c) = p(d' | c') ≥ 0.5. However, Peña (2020, Theorems 5 and 6) shows that, when p(a' | c') ≥ p(a | c) ≥ 0.5 while p(d | c) = p(d' | c') ≥ 0.5 still holds, RD_crude and RD_obs bound RD_true. We re-state this result in the next theorem.

Theorem 8 (Peña, 2020, Theorems 5 and 6). Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(d | c) = p(d' | c') ≥ 0.5 and p(a' | c') ≥ p(a | c) ≥ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≥ RD_true and RD_obs ≥ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≤ RD_true and RD_obs ≤ RD_true.

Note that the previous theorem does not determine the order between RD_crude and RD_obs. Thus, it cannot be used to decide which of the two comes closer to RD_true. However, the theorem may be useful to conclude whether RD_true is positive or negative. For instance, the last result in the theorem allows us to conclude that RD_true > 0 if max(RD_crude, RD_obs) > 0. The following theorems show that the condition p(d | c) = p(d' | c') ≥ 0.5 in the previous theorem can also be relaxed.

Theorem 9.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a' | c') ≥ p(a | c) ≥ 0.5 and p(d' | c') ≥ p(d | c) ≥ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≥ RD_true and RD_obs ≥ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≤ RD_true and RD_obs ≤ RD_true.

Theorem 10.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a | c) ≤ p(a' | c') ≤ 0.5 and p(d | c) ≤ p(d' | c') ≤ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≤ RD_true and RD_obs ≤ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≥ RD_true and RD_obs ≥ RD_true.

Returning to our example of three diseases A, D and Y and a gene variant C, the assumptions for the first result in Theorem 9 mean that (i) half of the population carry the gene variant C, i.e. p(c) = 0.5, (ii) not carrying C protects against A and D more than carrying it predisposes to suffer the diseases, i.e. p(a' | c') ≥ p(a | c) ≥ 0.5 and p(d' | c') ≥ p(d | c) ≥ 0.5, and (iii) carrying C increases the average severity of Y for the individuals suffering A more than it decreases the severity for the rest.

The last two theorems can be strengthened for RD_crude as follows. Analogous results do not hold for RD_obs, though.

Theorem 11.
Consider the causal graph to the left in Figure 1. Let p(c) ≤ 0.5 and p(a' | c') ≥ p(a | c) ≥ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≥ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≤ RD_true.

Theorem 12.
Consider the causal graph to the left in Figure 1. Let p(c) ≥ 0.5 and p(a | c) ≤ p(a' | c') ≤ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≤ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≥ RD_true.

4. Path Diagrams
Finally, we suppose that the variables A, C, D and Y are all continuous and follow the linear structural equation model represented by the path diagram to the right in Figure 1. The true, crude and partially adjusted average causal effects of A on Y are given by the partial regression coefficients β_YA·C, β_YA and β_YA·D, respectively. Note that the first cannot be computed because C is unobserved. The following theorem proves that the partially adjusted average causal effect lies between the true and the crude ones and, thus, it comes closer to the true average causal effect than the crude one.

Theorem 13.
Consider the path diagram to the right in Figure 1. Assume that the variables are standardized. If sign(β) = sign(γ), then β_YA·C ≤ β_YA·D ≤ β_YA, else β_YA·C ≥ β_YA·D ≥ β_YA.

Note that, unlike in the discrete case, no assumptions about the causal relationships of the variables are required to conclude that the partially adjusted average causal effect lies between the true and the crude ones. Note also that the signs of β and γ tell us whether the partially adjusted average causal effect is an upper or a lower bound of the true one.

5. Discussion
One may think that adjusting for a proxy of a latent confounder is always a good idea. However, it is not. In this work, we have described sufficient conditions under which adjusting for a proxy of a latent confounder comes closer to the incomputable true average causal effect than not adjusting at all. Under some conditions, it is even possible to decide whether the partially adjusted approximation is an upper or a lower bound of the true quantity. We have experimentally shown that the partially adjusted approximation can be substantially better than the unadjusted one when the dependence between confounder and proxy is significant. We have also illustrated with an example that the conditions proposed are not too restrictive or unrealistic. Since one must rely on expert knowledge to verify the conditions, we would like to investigate in the future whether realistic, sufficient and empirically testable conditions exist. We would also like to extend this work to the case where several latent confounders exist.
Acknowledgments
This work was funded by the Swedish Research Council (ref. 2019-00245).
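As a numerical illustration of Theorem 13 in the main text, the following sketch (ours; the coefficient values are arbitrary, chosen so that all variables have unit variance, and least squares is solved from scratch to stay dependency-free) simulates the linear model of the right panel of Figure 1 and checks that the proxy-adjusted coefficient β_YA·D lands between the fully adjusted β_YA·C and the crude β_YA:

```python
import math
import random

random.seed(1)

def simulate(n):
    # Standardized linear SEM for the right panel of Figure 1 (coefficient
    # values are illustrative): C exogenous, C -> A, C -> D, and A, C -> Y,
    # with noise scales chosen so every variable has unit variance.
    rows = []
    for _ in range(n):
        C = random.gauss(0, 1)
        A = 0.8 * C + 0.6 * random.gauss(0, 1)
        D = 0.9 * C + math.sqrt(0.19) * random.gauss(0, 1)
        Y = 0.3 * A + 0.4 * C + math.sqrt(0.558) * random.gauss(0, 1)
        rows.append((A, C, D, Y))
    return rows

def coef_of_A(rows, controls):
    # OLS coefficient of A when regressing Y on an intercept, A and the
    # given controls (1 = C, 2 = D), via Gauss-Jordan on normal equations.
    X = [[1.0, r[0]] + [r[i] for i in controls] for r in rows]
    y = [r[3] for r in rows]
    k = len(X[0])
    M = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(len(X)))] for i in range(k)]
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(M[r][i]))  # partial pivoting
        M[i], M[p] = M[p], M[i]
        for r in range(k):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [M[r][j] - f * M[i][j] for j in range(k + 1)]
    return M[1][k] / M[1][1]    # solved coefficient of A

rows = simulate(20000)
b_crude = coef_of_A(rows, [])    # beta_YA
b_proxy = coef_of_A(rows, [2])   # beta_YA.D
b_true = coef_of_A(rows, [1])    # beta_YA.C, computable only in simulation
assert b_true < b_proxy < b_crude
```

With these (all positive) confounding paths the first ordering of the theorem obtains; flipping the sign of one of them reverses it.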
Supplementary Material: Proofs
Theorem 2.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a | c) = p(a' | c') ≥ 0.5 and p(d | c) = p(d' | c') ≥ 0.5. Then, RD_obs lies between RD_true and RD_crude.

Proof. We start by establishing a relationship between RD_obs and RD_true. First, note that

p(c | a, d) = p(a, d | c) p(c) / [p(a, d | c) p(c) + p(a, d | c') p(c')] = 1 / (1 + exp(−δ(a, d))) = σ(δ(a, d))

where

δ(a, d) = ln [p(a, d | c) p(c) / (p(a, d | c') p(c'))]

is known as the log odds, and σ() is known as the logistic sigmoid function (Bishop, 2006, Section 4.2). Then,

p(c | a, d) = σ(ln [p(a, d | c) p(c) / (p(a, d | c') p(c'))]) = σ(ln [p(a | c) p(d | c) / (p(a | c') p(d | c'))])    (2)

where the second equality follows from the assumption that p(c) = 0.5 and the fact that A and D are conditionally independent given C due to the causal graph under consideration. Likewise,

p(c | a, d') = σ(ln [p(a | c) p(d' | c) / (p(a | c') p(d' | c'))])    (3)

and

p(c | d) = σ(ln [p(d | c) / p(d | c')])    (4)

and

p(c | d') = σ(ln [p(d' | c) / p(d' | c')]).    (5)

Then, p(c | a, d) ≥ p(c | d) and p(c | a, d') ≥ p(c | d') because σ() and ln() are increasing functions and p(a | c) / p(a | c') ≥ 1, which follows from the assumption that p(a | c) = p(a' | c') ≥ 0.5. Then,

p(c | a, d) p(d) + p(c | a, d') p(d') ≥ p(c | d) p(d) + p(c | d') p(d') = [p(d | c) p(c) / p(d)] p(d) + [p(d' | c) p(c) / p(d')] p(d') = p(c) = 0.5.    (6)

Moreover,

p(c' | a, d) p(d) + p(c' | a, d') p(d') = 1 − (p(c | a, d) p(d) + p(c | a, d') p(d')).    (7)

Next, note that

p(c | a', d') = σ(ln [p(a' | c) p(d' | c) / (p(a' | c') p(d' | c'))]) = σ(ln [p(a | c') p(d | c') / (p(a | c) p(d | c))]) = p(c' | a, d)    (8)

by the assumptions that p(a | c) = p(a' | c') and p(d | c) = p(d' | c'). Likewise,

p(c | a', d) = σ(ln [p(a' | c) p(d | c) / (p(a' | c') p(d | c'))]) = σ(ln [p(a | c') p(d' | c') / (p(a | c) p(d' | c))]) = p(c' | a, d').    (9)

Then,

p(c | a', d) p(d) + p(c | a', d') p(d') = p(c' | a, d) p(d) + p(c' | a, d') p(d')    (10)

because

p(d) = p(d | c) p(c) + p(d | c') p(c') = 0.5 [p(d | c) + 1 − p(d' | c')] = 0.5.    (11)

Moreover,

p(c' | a', d) p(d) + p(c' | a', d') p(d') = 1 − (p(c | a', d) p(d) + p(c | a', d') p(d')).    (12)

Finally, Equations 6 and 10 allow us to write p(c | a, d) p(d) + p(c | a, d') p(d') = p(c' | a', d) p(d) + p(c' | a', d') p(d') = 0.5 + α with α ≥ 0, whereas Equations 7, 10 and 12 allow us to write p(c' | a, d) p(d) + p(c' | a, d') p(d') = p(c | a', d) p(d) + p(c | a', d') p(d') = 0.5 − α. Therefore,

RD_obs = E[Y | a, d] p(d) + E[Y | a, d'] p(d') − E[Y | a', d] p(d) − E[Y | a', d'] p(d')
= (E[Y | a, c, d] p(c | a, d) + E[Y | a, c', d] p(c' | a, d)) p(d) + (E[Y | a, c, d'] p(c | a, d') + E[Y | a, c', d'] p(c' | a, d')) p(d') − (E[Y | a', c, d] p(c | a', d) + E[Y | a', c', d] p(c' | a', d)) p(d) − (E[Y | a', c, d'] p(c | a', d') + E[Y | a', c', d'] p(c' | a', d')) p(d')
= E[Y | a, c] (p(c | a, d) p(d) + p(c | a, d') p(d')) + E[Y | a, c'] (p(c' | a, d) p(d) + p(c' | a, d') p(d')) − E[Y | a', c] (p(c | a', d) p(d) + p(c | a', d') p(d')) − E[Y | a', c'] (p(c' | a', d) p(d) + p(c' | a', d') p(d'))
= E[Y | a, c] (0.5 + α) + E[Y | a, c'] (0.5 − α) − E[Y | a', c] (0.5 − α) − E[Y | a', c'] (0.5 + α)

where the third equality follows from the fact that Y and D are conditionally independent given A and C due to the causal graph under consideration. Then,

RD_obs = RD_true + α (E[Y | a, c] − E[Y | a, c'] + E[Y | a', c] − E[Y | a', c'])    (13)

with α ≥ 0.

We now establish a relationship between RD_obs and RD_crude. First, note that

p(a | d) = p(a | c, d) p(c | d) + p(a | c', d) p(c' | d) = p(a | c) p(c | d) + p(a | c') p(c' | d)    (14)
= p(a' | c') p(c' | d') + p(a' | c) p(c | d') = p(a' | c, d') p(c | d') + p(a' | c', d') p(c' | d') = p(a' | d')

by the fact that A and D are conditionally independent given C due to the causal graph under consideration, the assumption that p(a | c) = p(a' | c'), and the fact that p(c | d) = p(d | c) = p(d' | c') = p(c' | d'), which follows from the assumptions that p(c) = 0.5 and p(d | c) = p(d' | c') and the fact that p(d) = 0.5. Likewise,

p(a' | d) = p(a' | c, d) p(c | d) + p(a' | c', d) p(c' | d) = p(a' | c) p(c | d) + p(a' | c') p(c' | d)    (15)
= p(a | c') p(c' | d') + p(a | c) p(c | d') = p(a | c, d') p(c | d') + p(a | c', d') p(c' | d') = p(a | d').

Next, let x = p(a | c) and z = p(d | c). Recall that p(a | c) = p(a' | c') by assumption, and p(c | d) = p(d | c) = p(d' | c') = p(c' | d') as shown above. Then, Equation 15 can be rewritten as

p(a' | d) = x(1 − z) + (1 − x)z = −2xz + x + z.

Recall that z ≥ 0.5. If z = 0.5, then −2xz + x + z = 0.5. If z > 0.5, then −2xz + x + z ≤ 0.5. To see it, assume to the contrary that −2xz + x + z > 0.5. Then, x(1 − 2z) > 0.5 − z and, thus, x < (0.5 − z)/(1 − 2z) = 0.5 because 1 − 2z < 0, which contradicts the fact that x ≥ 0.5. Consequently, p(a' | d) = p(a | d') ≤ 0.5 and p(a | d) = p(a' | d') ≥ 0.5. Note also that

p(a) = p(a | c) p(c) + p(a | c') p(c') = 0.5 [p(a | c) + 1 − p(a' | c')] = 0.5

by the assumptions that p(a | c) = p(a' | c') and p(c) = 0.5. This together with the fact that p(d) = 0.5 implies that p(d | a') = p(d' | a) ≤ 0.5 and p(d | a) = p(d' | a') ≥ 0.5. Then, we can write p(d | a) = p(d' | a') = 0.5 + β and p(d | a') = p(d' | a) = 0.5 − β with β ≥ 0. Therefore,

RD_crude = E[Y | a] − E[Y | a']
= E[Y | a, d] p(d | a) + E[Y | a, d'] p(d' | a) − E[Y | a', d] p(d | a') − E[Y | a', d'] p(d' | a')
= E[Y | a, d] (0.5 + β) + E[Y | a, d'] (0.5 − β) − E[Y | a', d] (0.5 − β) − E[Y | a', d'] (0.5 + β)
= RD_obs + β (E[Y | a, d] − E[Y | a, d'] + E[Y | a', d] − E[Y | a', d'])    (16)

with β ≥ 0, where the last equality follows from the fact that p(d) = 0.5. Now, note that

E[Y | a, d] − E[Y | a, d'] = E[Y | a, c, d] p(c | a, d) + E[Y | a, c', d] p(c' | a, d) − E[Y | a, c, d'] p(c | a, d') − E[Y | a, c', d'] p(c' | a, d')
= E[Y | a, c] (p(c | a, d) − p(c | a, d')) + E[Y | a, c'] (p(c' | a, d) − p(c' | a, d'))
= (E[Y | a, c] − E[Y | a, c']) (p(c | a, d) − p(c | a, d'))    (17)

where the second equality follows from the fact that Y and D are conditionally independent given A and C due to the causal graph under consideration, and the third from the fact that p(c' | a, d) − p(c' | a, d') = −(p(c | a, d) − p(c | a, d')). Likewise,

E[Y | a', d] − E[Y | a', d'] = (E[Y | a', c] − E[Y | a', c']) (p(c | a', d) − p(c | a', d')) = (E[Y | a', c] − E[Y | a', c']) (p(c | a, d) − p(c | a, d'))    (18)

where the last equality follows from Equations 8 and 9, since p(c | a', d) − p(c | a', d') = p(c' | a, d') − p(c' | a, d) = p(c | a, d) − p(c | a, d'). Moreover, p(c | a, d) ≥ p(c | a, d') by Equations 2 and 3, because p(d | c) = p(d' | c') ≥ 0.5 implies p(d | c) / p(d | c') ≥ 1 ≥ p(d' | c) / p(d' | c'). Then,

sign(E[Y | a, c] − E[Y | a, c'] + E[Y | a', c] − E[Y | a', c']) = sign(E[Y | a, d] − E[Y | a, d'] + E[Y | a', d] − E[Y | a', d']).    (19)

This equation together with Equations 13 and 16 imply the desired result. ∎

Theorem 3.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a | c) = p(a' | c') ≤ 0.5 and p(d | c) = p(d' | c') ≤ 0.5. Then, RD_obs lies between RD_true and RD_crude.

Proof. Similar to the proof of Theorem 2. Specifically, the assumption that p(a | c) = p(a' | c') ≤ 0.5 implies that p(a | c) / p(a | c') ≤ 1, which implies that p(c | a, d) ≤ p(c | d) and p(c | a, d') ≤ p(c | d'), which implies that p(c | a, d) p(d) + p(c | a, d') p(d') ≤ 0.5, which implies that

RD_obs = RD_true + α (E[Y | a, c] − E[Y | a, c'] + E[Y | a', c] − E[Y | a', c'])    (20)

with α ≤ 0. Likewise, the assumption that p(d | c) = p(d' | c') ≤ 0.5 implies that p(a' | d) = p(a | d') ≤ 0.5 and p(a | d) = p(a' | d') ≥ 0.5, which implies that p(d | a') = p(d' | a) ≤ 0.5 and p(d | a) = p(d' | a') ≥ 0.5, which implies that

RD_crude = RD_obs + β (E[Y | a, d] − E[Y | a, d'] + E[Y | a', d] − E[Y | a', d'])    (21)

with β ≥ 0. The assumption that p(d | c) = p(d' | c') ≤ 0.5 also implies that p(c | a, d) ≤ p(c | a, d'), which implies that

sign(E[Y | a, c] − E[Y | a, c'] + E[Y | a', c] − E[Y | a', c']) = −sign(E[Y | a, d] − E[Y | a, d'] + E[Y | a', d] − E[Y | a', d']).    (22)

This equation together with Equations 20 and 21 imply the desired result. ∎

Corollary 4.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a | c) = p(a' | c') ≥ 0.5 and p(d | c) = p(d' | c') ≥ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≥ RD_obs ≥ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≤ RD_obs ≤ RD_true.

Proof. It follows from Equations 13, 16 and 19. ∎
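The corollary can be spot-checked numerically with a parameterization in which p(a | c) = p(a' | c') and p(d | c) = p(d' | c') take different values (an illustrative sketch of ours, coding C, A, D as 1 for c, a, d and 0 for the primed values):

```python
def rds(qa, qd, m):
    # Exact risk differences under p(c) = 0.5, p(a | c) = p(a' | c') = qa
    # and p(d | c) = p(d' | c') = qd; m[A, C] = E[Y | A, C].
    pA = lambda A, C: qa if A == C else 1 - qa
    pD = lambda D, C: qd if D == C else 1 - qd
    true = sum((m[1, C] - m[0, C]) * 0.5 for C in (0, 1))
    crude = sum(s * sum(m[A, C] * pA(A, C) * 0.5 for C in (0, 1))
                  / sum(pA(A, C) * 0.5 for C in (0, 1))
                for A, s in ((1, 1), (0, -1)))
    obs = 0.0
    for D in (0, 1):
        pd = sum(pD(D, C) * 0.5 for C in (0, 1))
        for A, s in ((1, 1), (0, -1)):
            w = {C: 0.5 * pA(A, C) * pD(D, C) for C in (0, 1)}
            obs += s * pd * sum(m[A, C] * w[C] for C in (0, 1)) / sum(w.values())
    return true, crude, obs

# E[Y|a,c] - E[Y|a,c'] = 2 >= E[Y|a',c'] - E[Y|a',c] = 1 >= 0 (non-monotone).
m = {(1, 1): 2.0, (1, 0): 0.0, (0, 0): 1.0, (0, 1): 0.0}
t, c, o = rds(0.8, 0.7, m)
assert c >= o >= t    # RD_crude >= RD_obs >= RD_true, as stated
```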
Corollary 5.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a | c) = p(a' | c') ≤ 0.5 and p(d | c) = p(d' | c') ≤ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≤ RD_obs ≤ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≥ RD_obs ≥ RD_true.

Proof. It follows from Equations 20, 21 and 22. ∎
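A randomized spot-check of Corollary 5 (an illustrative sketch of ours, coding C, A, D as 1 for c, a, d and 0 for the primed values; with both conditional probabilities below one half, the ordering reverses):

```python
import random

def rds(qa, qd, m):
    # Exact risk differences under p(c) = 0.5, p(a | c) = p(a' | c') = qa
    # and p(d | c) = p(d' | c') = qd; m[A, C] = E[Y | A, C]. The 0.5
    # factors for p(C) cancel in the conditional means, so they are omitted.
    pA = lambda A, C: qa if A == C else 1 - qa
    pD = lambda D, C: qd if D == C else 1 - qd
    true = sum((m[1, C] - m[0, C]) * 0.5 for C in (0, 1))
    crude = sum(s * sum(m[A, C] * pA(A, C) for C in (0, 1))
                  / sum(pA(A, C) for C in (0, 1))
                for A, s in ((1, 1), (0, -1)))
    obs = 0.0
    for D in (0, 1):
        pd = sum(pD(D, C) * 0.5 for C in (0, 1))
        for A, s in ((1, 1), (0, -1)):
            w = {C: pA(A, C) * pD(D, C) for C in (0, 1)}
            obs += s * pd * sum(m[A, C] * w[C] for C in (0, 1)) / sum(w.values())
    return true, crude, obs

random.seed(3)
for _ in range(1000):
    qa, qd = random.uniform(0.01, 0.5), random.uniform(0.01, 0.5)
    g2 = random.random()            # E[Y | a', c'] - E[Y | a', c] >= 0
    g1 = g2 + random.random()       # E[Y | a, c] - E[Y | a, c'] >= g2
    m = {(1, 1): g1, (1, 0): 0.0, (0, 0): g2, (0, 1): 0.0}
    t, c, o = rds(qa, qd, m)
    assert c - 1e-9 <= o <= t + 1e-9, (c, o, t)   # RD_crude <= RD_obs <= RD_true
```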
Corollary 6.
Under the conditions in Corollary 4, E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0 if and only if E[Y | a, d] − E[Y | a, d'] ≥ E[Y | a', d'] − E[Y | a', d] ≥ 0. Likewise when replacing ≥ with ≤.

Proof. It follows from Equations 17 and 18. Recall that p(c | a, d) ≥ p(c | a, d') was established in the proof of Theorem 2. ∎

Corollary 7.
Under the conditions in Corollary 5, E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0 if and only if E[Y | a, d] − E[Y | a, d'] ≤ E[Y | a', d'] − E[Y | a', d] ≤ 0. Likewise when swapping ≤ and ≥.

Proof. It follows from Equations 17 and 18. Recall that p(c | a, d) ≤ p(c | a, d') was established in the proof of Theorem 3. ∎

Theorem 9.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a' | c') ≥ p(a | c) ≥ 0.5 and p(d' | c') ≥ p(d | c) ≥ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≥ RD_true and RD_obs ≥ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≤ RD_true and RD_obs ≤ RD_true.

Proof. We start by proving the first result in the theorem, specifically that RD_crude ≥ RD_true. Recall from the proof of Theorem 2 that

p(c | a) = σ(ln [p(a | c) p(c) / (p(a | c') p(c'))]) = σ(ln [p(a | c) / p(a | c')])

where the second equality follows from the assumption that p(c) = 0.5. Likewise,

p(c' | a') = σ(ln [p(a' | c') / p(a' | c)]).

Therefore, p(c | a) ≥ 0.5 and p(c' | a') ≥ 0.5 because p(a | c) / p(a | c') ≥ 1 and p(a' | c') / p(a' | c) ≥ 1, which follow from the assumption that p(a' | c') ≥ p(a | c) ≥ 0.5.

Now, consider the function f(x) = x(1 − x). By inspecting the first and second derivatives, we can conclude that f(x) has a single maximum at x = 0.5, and that it is increasing in the interval [0, 0.5] and decreasing in the interval [0.5, 1]. This implies that f(p(a | c)) = p(a | c) p(a' | c) ≥ p(a' | c') p(a | c') = f(p(a' | c')) due to the assumption that p(a' | c') ≥ p(a | c) ≥ 0.5. Then,

p(a | c) p(a' | c) ≥ p(a' | c') p(a | c')    (23)

which together with the fact that σ() and ln() are increasing functions imply that p(c | a) ≥ p(c' | a').

The results in the previous paragraph allow us to write p(c | a) = 0.5 + α and p(c' | a') = 0.5 + β with α ≥ β ≥ 0. Therefore,

RD_crude = E[Y | a] − E[Y | a']
= E[Y | a, c] p(c | a) + E[Y | a, c'] p(c' | a) − E[Y | a', c] p(c | a') − E[Y | a', c'] p(c' | a')
= E[Y | a, c] (0.5 + α) + E[Y | a, c'] (0.5 − α) − E[Y | a', c] (0.5 − β) − E[Y | a', c'] (0.5 + β)
= RD_true + α (E[Y | a, c] − E[Y | a, c']) − β (E[Y | a', c'] − E[Y | a', c])    (24)

which implies that RD_crude ≥ RD_true because α ≥ β ≥ 0 and E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0 by assumption.

We continue by proving that RD_obs ≥ RD_true. First, recall Equations 2-5. Then, p(c | a, d) ≥ p(c | d) and p(c | a, d') ≥ p(c | d') because σ() and ln() are increasing functions and p(a | c) / p(a | c') ≥ 1, which follows from the assumption that p(a' | c') ≥ p(a | c) ≥ 0.5. Then,

p(c | a, d) p(d) + p(c | a, d') p(d') ≥ p(c | d) p(d) + p(c | d') p(d') = [p(d | c) p(c) / p(d)] p(d) + [p(d' | c) p(c) / p(d')] p(d') = p(c) = 0.5

by the assumption that p(c) = 0.5. We can analogously prove that p(c' | a', d) p(d) + p(c' | a', d') p(d') ≥ 0.5. Moreover, it also holds that

p(c | a, d) p(d) + p(c | a, d') p(d') ≥ p(c' | a', d) p(d) + p(c' | a', d') p(d').

To prove this inequality and after failing to do it on our own, we resorted to the function FindInstance from Mathematica 12.2.0. Specifically, we used FindInstance to find an instance of the probabilities that satisfied the reverse of the inequality above subject to p(c) = 0.5, p(a' | c') ≥ p(a | c) ≥ 0.5 and p(d' | c') ≥ p(d | c) ≥ 0.5. Since no such instance was found, the inequality above must hold. It is worth mentioning that FindInstance works analytically and not numerically and, thus, its outcome is exact and correct.

The results in the previous paragraph allow us to write p(c | a, d) p(d) + p(c | a, d') p(d') = 0.5 + α and p(c' | a', d) p(d) + p(c' | a', d') p(d') = 0.5 + β with α ≥ β ≥ 0. Consequently, p(c' | a, d) p(d) + p(c' | a, d') p(d') = 1 − (p(c | a, d) p(d) + p(c | a, d') p(d')) = 0.5 − α, and p(c | a', d) p(d) + p(c | a', d') p(d') = 1 − (p(c' | a', d) p(d) + p(c' | a', d') p(d')) = 0.5 − β. Therefore,

RD_obs = E[Y | a, d] p(d) + E[Y | a, d'] p(d') − E[Y | a', d] p(d) − E[Y | a', d'] p(d')
= (E[Y | a, c, d] p(c | a, d) + E[Y | a, c', d] p(c' | a, d)) p(d) + (E[Y | a, c, d'] p(c | a, d') + E[Y | a, c', d'] p(c' | a, d')) p(d') − (E[Y | a', c, d] p(c | a', d) + E[Y | a', c', d] p(c' | a', d)) p(d) − (E[Y | a', c, d'] p(c | a', d') + E[Y | a', c', d'] p(c' | a', d')) p(d')
= E[Y | a, c] (p(c | a, d) p(d) + p(c | a, d') p(d')) + E[Y | a, c'] (p(c' | a, d) p(d) + p(c' | a, d') p(d')) − E[Y | a', c] (p(c | a', d) p(d) + p(c | a', d') p(d')) − E[Y | a', c'] (p(c' | a', d) p(d) + p(c' | a', d') p(d'))
= E[Y | a, c] (0.5 + α) + E[Y | a, c'] (0.5 − α) − E[Y | a', c] (0.5 − β) − E[Y | a', c'] (0.5 + β)
= RD_true + α (E[Y | a, c] − E[Y | a, c']) − β (E[Y | a', c'] − E[Y | a', c])    (25)

where the third equality follows from the fact that Y and D are conditionally independent given A and C due to the causal graph under consideration. Then, RD_obs ≥ RD_true because α ≥ β ≥ 0 and E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0 by assumption. The second result in the theorem can be proven analogously. ∎

Theorem 10.
Theorem 10. Consider the causal graph to the left in Figure 1. Let $p(c) = 0.5$, $p(a \mid c) \leq p(a \mid \bar c) \leq 0.5$ and $p(d \mid c) \leq p(d \mid \bar c) \leq 0.5$. If $E[Y \mid a, c] - E[Y \mid a, \bar c] \geq E[Y \mid \bar a, \bar c] - E[Y \mid \bar a, c] \geq 0$, then $RD_{crude} \leq RD_{true}$ and $RD_{obs} \leq RD_{true}$. If $E[Y \mid a, c] - E[Y \mid a, \bar c] \leq E[Y \mid \bar a, \bar c] - E[Y \mid \bar a, c] \leq 0$, then $RD_{crude} \geq RD_{true}$ and $RD_{obs} \geq RD_{true}$.

Code available at .

Proof. Similar to the proof of Theorem 9. Specifically, the assumption that $p(a \mid c) \leq p(a \mid \bar c) \leq 0.5$ implies that $p(a \mid c) p(\bar a \mid c) \leq p(a \mid \bar c) p(\bar a \mid \bar c)$, and the two together imply that $p(c \mid a) \leq p(\bar c \mid \bar a) \leq 0.5$, which implies that
$$RD_{crude} = RD_{true} + \alpha (E[Y \mid a, c] - E[Y \mid a, \bar c]) - \beta (E[Y \mid \bar a, \bar c] - E[Y \mid \bar a, c])$$
with $\alpha \leq \beta \leq 0$. This implies the stated relationships between $RD_{crude}$ and $RD_{true}$. The assumption that $p(a \mid c) \leq p(a \mid \bar c) \leq 0.5$ also implies that $p(a \mid c) / p(a \mid \bar c) \leq 1$, which implies that $p(c \mid a, d) \leq p(c \mid d)$ and $p(c \mid a, \bar d) \leq p(c \mid \bar d)$, which together with the assumption that $p(d \mid c) \leq p(d \mid \bar c) \leq 0.5$ implies that $p(c \mid a, d) p(d) + p(c \mid a, \bar d) p(\bar d) \leq p(\bar c \mid \bar a, d) p(d) + p(\bar c \mid \bar a, \bar d) p(\bar d) \leq 0.5$. Then,
$$RD_{obs} = RD_{true} + \alpha (E[Y \mid a, c] - E[Y \mid a, \bar c]) - \beta (E[Y \mid \bar a, \bar c] - E[Y \mid \bar a, c])$$
with $\alpha \leq \beta \leq 0$. This implies the stated relationships between $RD_{obs}$ and $RD_{true}$. $\square$
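Theorem 10 can be spot-checked by direct computation. The sketch below (ours; the parameterization is an arbitrary choice satisfying the theorem's hypotheses, with 1 encoding the unbarred values) computes $RD_{true}$, $RD_{crude}$ and $RD_{obs}$ from the model and confirms that both the crude and the partially adjusted measures fall below the true one:

```python
# Spot-check of Theorem 10 on one arbitrary parameterization satisfying
# p(c) = 0.5, p(a|c) <= p(a|c-bar) <= 0.5, p(d|c) <= p(d|c-bar) <= 0.5, and
# E[Y|a,c] - E[Y|a,c-bar] = 1 >= E[Y|a-bar,c-bar] - E[Y|a-bar,c] = 0.5 >= 0.

p_c = {1: 0.5, 0: 0.5}
p_a_given_c = {1: 0.1, 0: 0.4}
p_d_given_c = {1: 0.1, 0: 0.4}
EY = {(1, 1): 1.0, (1, 0): 0.0, (0, 1): 0.25, (0, 0): 0.75}  # E[Y|A,C]

def lik(x, p1):                        # p(X = x | .) from p(X = 1 | .)
    return p1 if x else 1 - p1

def p_c_given(c, a=None, d=None):      # posterior of C given the stated evidence
    def w(cc):
        v = p_c[cc]
        if a is not None:
            v *= lik(a, p_a_given_c[cc])
        if d is not None:
            v *= lik(d, p_d_given_c[cc])
        return v
    return w(c) / (w(0) + w(1))

p_d = {d: sum(p_c[c] * lik(d, p_d_given_c[c]) for c in (0, 1)) for d in (0, 1)}
EY_ad = lambda a, d: sum(EY[(a, c)] * p_c_given(c, a=a, d=d) for c in (0, 1))

RD_true = sum((EY[(1, c)] - EY[(0, c)]) * p_c[c] for c in (0, 1))
RD_crude = sum(EY[(1, c)] * p_c_given(c, a=1) - EY[(0, c)] * p_c_given(c, a=0)
               for c in (0, 1))
RD_obs = sum((EY_ad(1, d) - EY_ad(0, d)) * p_d[d] for d in (0, 1))

assert RD_crude <= RD_true and RD_obs <= RD_true   # as Theorem 10 predicts
```

For these numbers, $RD_{true} = 0$ while $RD_{crude} = -0.25$ and $RD_{obs} \approx -0.237$, so both computable measures undershoot the true one, and the partially adjusted measure is the closer of the two.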
Theorem 11. Consider the causal graph to the left in Figure 1. Let $p(c) \leq 0.5$ and $p(a \mid c) \geq p(a \mid \bar c) \geq 0.5$. If $E[Y \mid a, c] - E[Y \mid a, \bar c] \geq E[Y \mid \bar a, \bar c] - E[Y \mid \bar a, c] \geq 0$, then $RD_{crude} \geq RD_{true}$. If $E[Y \mid a, c] - E[Y \mid a, \bar c] \leq E[Y \mid \bar a, \bar c] - E[Y \mid \bar a, c] \leq 0$, then $RD_{crude} \leq RD_{true}$.

Proof. Recall from the proof of Theorem 2 that
$$p(c \mid a) = \sigma\left(\ln \frac{p(a \mid c) p(c)}{p(a \mid \bar c) p(\bar c)}\right) \qquad \text{and} \qquad p(\bar c \mid \bar a) = \sigma\left(\ln \frac{p(\bar a \mid \bar c) p(\bar c)}{p(\bar a \mid c) p(c)}\right)$$
and $p(c) = \sigma(\ln \frac{p(c)}{p(\bar c)})$ and $p(\bar c) = \sigma(\ln \frac{p(\bar c)}{p(c)})$. Therefore, $p(c \mid a) \geq p(c)$ and $p(\bar c \mid \bar a) \geq p(\bar c)$ because $\sigma()$ and $\ln()$ are increasing functions and $p(a \mid c)/p(a \mid \bar c) \geq 1$ and $p(\bar a \mid \bar c)/p(\bar a \mid c) \geq 1$, since $p(a \mid c) \geq p(a \mid \bar c) \geq 0.5$. Then, we can write $p(c \mid a) = p(c) + \alpha$ and $p(\bar c \mid \bar a) = p(\bar c) + \beta$ with $\alpha, \beta \geq 0$. Moreover, $\alpha \geq \beta$. To see it, recall from the proof of Theorem 9 that a function of the form $f(x) = x(1 - x)$ has a single maximum at $x = 0.5$, and it is increasing in the interval $[0, 0.5]$ and decreasing in the interval $[0.5, 1]$. Now, note that $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ (Bishop, 2006, Equation 4.88). Then, $\sigma'(z)$ has a single maximum at $\sigma(z) = 0.5$, i.e. at $z = 0$, and it is increasing in the interval $\{\sigma(z) \mid 0 \leq \sigma(z) \leq 0.5\}$ (i.e., $\{z \mid -\infty < z \leq 0\}$) and decreasing in the interval $\{\sigma(z) \mid 0.5 \leq \sigma(z) \leq 1\}$ (i.e., $\{z \mid 0 \leq z < +\infty\}$). In other words, $\sigma(z)$ increases at an increasing rate in the interval $(-\infty, 0]$ and increases at a decreasing rate in the interval $[0, +\infty)$. Therefore, $\sigma(-u + v) - \sigma(-u) \geq \sigma(u + v) - \sigma(u)$ for all $u, v \geq 0$. Then,
$$\begin{aligned}
\alpha = p(c \mid a) - p(c) &= \sigma\left(\ln \frac{p(a \mid c)}{p(a \mid \bar c)} + \ln \frac{p(c)}{p(\bar c)}\right) - \sigma\left(\ln \frac{p(c)}{p(\bar c)}\right)\\
&\geq \sigma\left(\ln \frac{p(a \mid c)}{p(a \mid \bar c)} + \ln \frac{p(\bar c)}{p(c)}\right) - \sigma\left(\ln \frac{p(\bar c)}{p(c)}\right)\\
&\geq \sigma\left(\ln \frac{p(\bar a \mid \bar c)}{p(\bar a \mid c)} + \ln \frac{p(\bar c)}{p(c)}\right) - \sigma\left(\ln \frac{p(\bar c)}{p(c)}\right)\\
&= p(\bar c \mid \bar a) - p(\bar c) = \beta
\end{aligned}$$
where the first inequality follows from the fact that $\sigma(-u + v) - \sigma(-u) \geq \sigma(u + v) - \sigma(u)$ with $u = \ln \frac{p(\bar c)}{p(c)}$ and $v = \ln \frac{p(a \mid c)}{p(a \mid \bar c)}$, and the second inequality follows from Equation 23 and the fact that $\sigma()$ and $\ln()$ are increasing functions. Note that $u, v \geq 0$ because $p(c) \leq 0.5$ and $p(a \mid c) \geq p(a \mid \bar c) \geq 0.5$. Then,
$$\begin{aligned}
RD_{crude} &= E[Y \mid a] - E[Y \mid \bar a]\\
&= E[Y \mid a, c] p(c \mid a) + E[Y \mid a, \bar c] p(\bar c \mid a) - E[Y \mid \bar a, c] p(c \mid \bar a) - E[Y \mid \bar a, \bar c] p(\bar c \mid \bar a)\\
&= E[Y \mid a, c] (p(c) + \alpha) + E[Y \mid a, \bar c] (p(\bar c) - \alpha) - E[Y \mid \bar a, c] (p(c) - \beta) - E[Y \mid \bar a, \bar c] (p(\bar c) + \beta)\\
&= RD_{true} + \alpha (E[Y \mid a, c] - E[Y \mid a, \bar c]) - \beta (E[Y \mid \bar a, \bar c] - E[Y \mid \bar a, c]) \qquad (26)
\end{aligned}$$
which implies the desired results because $\alpha \geq \beta \geq 0$. $\square$
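Two steps of the argument above lend themselves to a quick numerical check: the sigmoid inequality $\sigma(-u + v) - \sigma(-u) \geq \sigma(u + v) - \sigma(u)$ for $u, v \geq 0$, and the fact that decomposition (26) is an algebraic identity once $\alpha$ and $\beta$ are defined as in the proof. The sketch below (ours; the parameter values are arbitrary, with 1 encoding the unbarred values) checks both; it does not attempt to verify the relation between $\alpha$ and $\beta$, which depends on the theorem's hypotheses:

```python
import math

# (a) Sigmoid inequality used in the proof of alpha >= beta, checked on a grid.
sigma = lambda z: 1.0 / (1.0 + math.exp(-z))

for i in range(51):
    for j in range(51):
        u, v = 0.1 * i, 0.1 * j
        assert sigma(-u + v) - sigma(-u) >= sigma(u + v) - sigma(u) - 1e-12

# (b) Decomposition (26) as an algebraic identity; it holds for any
# parameterization, and the numbers below are an arbitrary choice.
pc = {1: 0.3, 0: 0.7}                  # p(c), p(c-bar)
p_a_given_c = {1: 0.9, 0: 0.6}         # p(a|c), p(a|c-bar)
EY = {(1, 1): 0.8, (1, 0): 0.1, (0, 1): 0.3, (0, 0): 0.5}  # E[Y|A,C]

def p_c_given_a(c, a):
    w = lambda cc: pc[cc] * (p_a_given_c[cc] if a else 1 - p_a_given_c[cc])
    return w(c) / (w(0) + w(1))

alpha = p_c_given_a(1, 1) - pc[1]      # p(c|a) - p(c)
beta = p_c_given_a(0, 0) - pc[0]       # p(c-bar|a-bar) - p(c-bar)

RD_true = sum((EY[(1, c)] - EY[(0, c)]) * pc[c] for c in (0, 1))
RD_crude = sum(EY[(1, c)] * p_c_given_a(c, 1) - EY[(0, c)] * p_c_given_a(c, 0)
               for c in (0, 1))

rhs = RD_true + alpha * (EY[(1, 1)] - EY[(1, 0)]) - beta * (EY[(0, 0)] - EY[(0, 1)])
assert abs(RD_crude - rhs) < 1e-12     # decomposition (26) holds exactly
```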
Theorem 12. Consider the causal graph to the left in Figure 1. Let $p(c) \geq 0.5$ and $p(a \mid c) \leq p(a \mid \bar c) \leq 0.5$. If $E[Y \mid a, c] - E[Y \mid a, \bar c] \geq E[Y \mid \bar a, \bar c] - E[Y \mid \bar a, c] \geq 0$, then $RD_{crude} \leq RD_{true}$. If $E[Y \mid a, c] - E[Y \mid a, \bar c] \leq E[Y \mid \bar a, \bar c] - E[Y \mid \bar a, c] \leq 0$, then $RD_{crude} \geq RD_{true}$.

Proof. Similar to the proof of Theorem 11. Specifically, the assumption that $p(a \mid c) \leq p(a \mid \bar c) \leq 0.5$ implies that $p(c \mid a) \leq p(c)$ and $p(\bar c \mid \bar a) \leq p(\bar c)$, which implies that $p(c \mid a) = p(c) + \alpha$ and $p(\bar c \mid \bar a) = p(\bar c) + \beta$ with $\alpha \leq \beta \leq 0$, because now $\sigma(-u + v) - \sigma(-u) \leq \sigma(u + v) - \sigma(u)$ for all $u \leq 0$ and $v \leq 0$. $\square$
Theorem 13. Consider the path diagram to the right in Figure 1. Assume that the variables are standardized. If $\mathrm{sign}(\beta) = \mathrm{sign}(\gamma)$, then $\beta_{YA \cdot C} \leq \beta_{YA \cdot D} \leq \beta_{YA}$, else $\beta_{YA \cdot C} \geq \beta_{YA \cdot D} \geq \beta_{YA}$.

Proof. Pearl (2013, Section 3.11) shows that
$$\beta_{YA \cdot C} = \alpha, \qquad \beta_{YA} = \alpha + \beta \gamma \qquad \text{and} \qquad \beta_{YA \cdot D} = \alpha + \frac{\gamma \beta (1 - \delta^2)}{1 - \beta^2 \delta^2}. \qquad (27)$$
Note that the linear structural equation model corresponding to the path diagram under consideration implies that $A = \beta C + \epsilon_A$, where $\epsilon_A$ is an error term that is independent of $C$ and, thus, $var(A) = \beta^2 var(C) + var(\epsilon_A)$ where $var(A) = var(C) = 1$ and, thus, $\beta^2 \leq 1$. Similarly, $\delta^2 \leq 1$. Then, $1 - \delta^2 \leq 1 - \beta^2 \delta^2$ and, thus, the factor $(1 - \delta^2)/(1 - \beta^2 \delta^2)$ in Equation 27 lies in $[0, 1]$. The result is now immediate. $\square$
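Equation 27 can be reproduced from the covariance matrix implied by the standardized linear structural equation model, using the standard formula for a partial regression coefficient with two standardized regressors. The sketch below (ours; the path-coefficient values are an arbitrary choice with $|\beta|, |\delta| < 1$, and the Latin names a, b, g, d stand in for $\alpha$, $\beta$, $\gamma$, $\delta$) confirms the three expressions and the in-betweenness claimed by Theorem 13 when $\mathrm{sign}(\beta) = \mathrm{sign}(\gamma)$:

```python
# Check of Equation 27 from the covariances implied by the standardized SEM
#   A = b*C + eA,   D = d*C + eD,   Y = a*A + g*C + eY.
# The numeric values are arbitrary, with sign(b) = sign(g).

a, b, g, d = 0.7, 0.5, 0.4, 0.6

# Pairwise covariances (= correlations) of the standardized variables.
cov_AC = b
cov_AD = b * d
cov_YA = a + g * b
cov_YC = a * b + g
cov_YD = a * b * d + g * d

def partial_coef_on_first(s_y1, s_y2, s_12):
    # Coefficient of X1 in the OLS regression of Y on standardized X1 and X2.
    return (s_y1 - s_y2 * s_12) / (1 - s_12 ** 2)

beta_YA = cov_YA                                       # regression of Y on A alone
beta_YA_C = partial_coef_on_first(cov_YA, cov_YC, cov_AC)
beta_YA_D = partial_coef_on_first(cov_YA, cov_YD, cov_AD)

assert abs(beta_YA_C - a) < 1e-12                      # = alpha
assert abs(beta_YA - (a + b * g)) < 1e-12              # = alpha + beta*gamma
assert abs(beta_YA_D - (a + g * b * (1 - d**2) / (1 - b**2 * d**2))) < 1e-12
# With sign(beta) = sign(gamma), the partially adjusted coefficient lies between:
assert beta_YA_C <= beta_YA_D <= beta_YA
```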
References

C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

S. Greenland. The Effect of Misclassification in the Presence of Covariates. American Journal of Epidemiology, 112(4):564–569, 1980.

W. Miao, Z. Geng, and E. J. Tchetgen Tchetgen. Identifying Causal Effects with Proxy Variables of an Unmeasured Confounder. Biometrika, 105(4):987–993, 2018.

E. L. Ogburn and T. J. VanderWeele. On the Nondifferential Misclassification of a Binary Confounder. Epidemiology, 23(3):433–439, 2012.

E. L. Ogburn and T. J. VanderWeele. Bias Attenuation Results for Nondifferentially Mismeasured Ordinal and Coarsened Confounders. Biometrika, 100(1):241–248, 2013.

J. M. Peña. On the Monotonicity of a Nondifferentially Mismeasured Binary Confounder. Journal of Causal Inference, 8:150–163, 2020.

J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2009.

J. Pearl. Linear Models: A Useful "Microscope" for Causal Analysis. Journal of Causal Inference, 1(1):155–170, 2013.