On the Non-Monotonicity of a Non-Differentially Mismeasured Binary Confounder
Jose M. Peña, IDA, Linköping University, [email protected]
Abstract. Suppose that we are interested in the average causal effect of a binary treatment on an outcome when this relationship is confounded by a binary confounder. Suppose that the confounder is unobserved but a non-differential binary proxy of it is observed. We identify conditions under which adjusting for the proxy comes closer to the incomputable true average causal effect than not adjusting at all. Unlike other works, we do not assume that the average causal effect of the confounder on the outcome is in the same direction among treated and untreated.

1. Introduction
Suppose that we are interested in the average causal effect of a binary treatment A on an outcome Y when this relationship is confounded by a binary confounder C. Suppose also that C is non-differentially mismeasured, meaning that (i) C is not observed and, instead, a binary proxy D of C is observed, and (ii) D is conditionally independent of A and Y given C. The causal graph to the left in Figure 1 represents the relationships between the random variables.

Greenland (1980) argues that adjusting for D produces a partially adjusted measure of the average causal effect of A on Y that is between the crude (i.e., unadjusted) and the true (i.e., adjusted for C) measures and, thus, it comes closer to the incomputable true measure than the crude one. Ogburn and VanderWeele (2012) show that, although this result does not always hold, it does hold under some monotonicity condition in C. Specifically, E[Y | A, C] must be non-decreasing or non-increasing in C. Unfortunately, the condition cannot be verified empirically because C is unobserved. Ogburn and VanderWeele (2013) extend these results to the case where C takes more than two values. Peña (2020) shows that if E[Y | A, D] is non-decreasing or non-increasing in D (which can be verified empirically), then so is E[Y | A, C] with respect to C and, thus, the partially adjusted average causal effect lies between the crude and the true ones. Finally, if there are at least two independent proxies of C, then Miao et al. (2018) show that the average causal effect of A on Y can be identified under a certain rank condition.

Figure 1. Left: Causal graph where Y is a discrete or continuous random variable, and A, C and D are binary random variables. Moreover, C is unobserved. Right: Path diagram with coefficients α, β, γ and δ, where C is unobserved.

In this paper, we focus on the case where neither E[Y | A, C] nor E[Y | A, D] is monotone in C or D. We report conditions under which the partially adjusted average causal effect is still between the crude and the true ones and, thus, is still closer to the incomputable true average causal effect. Specifically, the rest of the paper is organized as follows. Sections 2 and 3 report the novel conditions. Section 4 deals with continuous random variables. Section 5 closes with some discussion.

2. Bounding the Observed Risk Difference
Consider the causal graph to the left in Figure 1, where Y is a discrete or continuous random variable, and A, C and D are binary random variables. The graph entails the following factorization:

p(A, C, D, Y) = p(C) p(D | C) p(A | C) p(Y | A, C).    (1)

Let A take values a and a', and similarly for C and D. Let A, D and Y be observed and let C be unobserved. Let Y_a and Y_a' denote the counterfactual outcomes under treatments A = a and A = a', respectively. The average causal effect of A on Y or true risk difference (RD_true) is defined as RD_true = E[Y_a] − E[Y_a']. It can be rewritten as follows (Pearl, 2009, Theorem 3.3.2):

RD_true = E[Y | a, c] p(c) + E[Y | a, c'] p(c') − E[Y | a', c] p(c) − E[Y | a', c'] p(c').

Since C is unobserved, RD_true cannot be computed. However, it can be approximated by the unadjusted average causal effect or crude risk difference (RD_crude):

RD_crude = E[Y | a] − E[Y | a']

and by the partially adjusted average causal effect or observed risk difference (RD_obs):

RD_obs = E[Y | a, d] p(d) + E[Y | a, d'] p(d') − E[Y | a', d] p(d) − E[Y | a', d'] p(d').

Now the question is, which of the two approximations comes closer to the true quantity? This paper aims to answer this question.

We say that E[Y | A, C] is non-decreasing in C if E[Y | a, c] ≥ E[Y | a, c'] and E[Y | a', c] ≥ E[Y | a', c']. Likewise, E[Y | A, C] is non-increasing in C if E[Y | a, c] ≤ E[Y | a, c'] and E[Y | a', c] ≤ E[Y | a', c']. Moreover, E[Y | A, C] is monotone in C if it is non-decreasing or non-increasing in C, i.e. the average causal effect of C on Y is in the same direction among the treated (A = a) and the untreated (A = a'). Ogburn and VanderWeele (2012, Result 1) show that if E[Y | A, C] is monotone in C, then RD_obs lies between RD_true and RD_crude and, thus, it comes closer to RD_true than RD_crude.
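To make these quantities concrete, the following sketch computes RD_true, RD_crude and RD_obs exactly from one hypothetical, arbitrarily chosen parameterization of Equation 1 (all numeric values below are ours, purely for illustration); for Y, only the conditional means E[Y | A, C] are needed:

```python
# Illustrative parameterization of Equation 1 (values are arbitrary).
# C takes values c, c'; A takes a, a'; D takes d, d'; Y is summarized
# by its conditional means E[Y | A, C], chosen non-monotone in C.
p_c = 0.5                                  # p(c); p(c') = 1 - p_c
pa = {"c": 0.8, "c'": 0.3}                 # p(a | C)
pd = {"c": 0.9, "c'": 0.2}                 # p(d | C)
m = {("a", "c"): 2.0, ("a", "c'"): 1.0,    # E[Y | a, C]
     ("a'", "c"): 0.5, ("a'", "c'"): 1.5}  # E[Y | a', C]

C_VALS, D_VALS = ("c", "c'"), ("d", "d'")

def p_C(C):
    return p_c if C == "c" else 1 - p_c

def p_A(A, C):
    return pa[C] if A == "a" else 1 - pa[C]

def p_D(D, C):
    return pd[C] if D == "d" else 1 - pd[C]

# RD_true: adjust for the (unobserved) confounder C.
rd_true = sum((m["a", C] - m["a'", C]) * p_C(C) for C in C_VALS)

# RD_crude: E[Y | a] - E[Y | a'], averaging E[Y | A, C] over p(C | A).
def cond_mean_given_a(A):
    pA = sum(p_A(A, C) * p_C(C) for C in C_VALS)
    return sum(m[A, C] * p_A(A, C) * p_C(C) for C in C_VALS) / pA

rd_crude = cond_mean_given_a("a") - cond_mean_given_a("a'")

# RD_obs: adjust for the proxy D; p(C | A, D) follows from Equation 1.
def cond_mean_given_ad(A, D):
    w = {C: p_C(C) * p_A(A, C) * p_D(D, C) for C in C_VALS}
    return sum(m[A, C] * w[C] for C in C_VALS) / sum(w.values())

rd_obs = sum(sum(p_D(D, C) * p_C(C) for C in C_VALS)
             * (cond_mean_given_ad("a", D) - cond_mean_given_ad("a'", D))
             for D in D_VALS)

print(rd_true, rd_crude, rd_obs)
```

Here E[Y | A, C] is non-monotone in C (the effect of C on Y has opposite directions among treated and untreated), which is precisely the regime studied below.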
Unfortunately, the antecedent of this rule cannot be verified empirically, because C is unobserved. Therefore, one must rely on substantive knowledge to apply the rule. Peña (2020, Corollary 2) shows that if E[Y | A, D] is monotone in D, then RD_obs lies between RD_true and RD_crude. Note that the antecedent of this rule can be verified empirically. Actually, E[Y | A, C] is monotone in C if and only if E[Y | A, D] is monotone in D (Ogburn and VanderWeele, 2012; Peña, 2020).

Peña (2020, Theorems 3 and 4) characterizes a case where E[Y | A, C] is not monotone in C and, thus, E[Y | A, D] is not monotone in D, and yet RD_obs lies between RD_true and RD_crude. We re-state this result in the next theorem. Note that one must rely on substantive knowledge to verify the conditions in the theorem.

Theorem 1 (Peña, 2020, Theorems 3 and 4). Consider the causal graph to the left in Figure 1. Let p(c) = 0.5 and p(a | c) = p(a' | c') = p(d | c) = p(d' | c') ≥ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≥ RD_obs ≥ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≤ RD_obs ≤ RD_true.

The following theorems are the main contribution of this work. They show that the conditions in the previous theorem can be relaxed. Their proofs can be found in the supplementary material.
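Theorem 1 can be spot-checked numerically. The sketch below (ours; a check, not a proof) draws random parameterizations satisfying the theorem's conditions, coding C, A, D as 1 for c, a, d and 0 for c', a', d', and verifies the claimed ordering:

```python
import random

def risk_differences(q, m):
    # Exact RD_true, RD_crude, RD_obs under p(c) = 0.5 and
    # p(a | c) = p(a' | c') = p(d | c) = p(d' | c') = q;
    # m[A, C] = E[Y | A, C] with values coded 1 (a, c) and 0 (a', c').
    pc = {1: 0.5, 0: 0.5}
    pA = lambda A, C: q if A == C else 1 - q
    pD = lambda D, C: q if D == C else 1 - q
    true = sum((m[1, C] - m[0, C]) * pc[C] for C in (0, 1))
    crude = sum(s * sum(m[A, C] * pA(A, C) * pc[C] for C in (0, 1))
                  / sum(pA(A, C) * pc[C] for C in (0, 1))
                for A, s in ((1, 1), (0, -1)))
    obs = 0.0
    for D in (0, 1):
        pd = sum(pD(D, C) * pc[C] for C in (0, 1))
        for A, s in ((1, 1), (0, -1)):
            w = {C: pc[C] * pA(A, C) * pD(D, C) for C in (0, 1)}
            obs += s * pd * sum(m[A, C] * w[C] for C in (0, 1)) / sum(w.values())
    return true, crude, obs

random.seed(0)
for _ in range(1000):
    q = random.uniform(0.5, 0.99)
    base = random.random()
    g2 = random.random()           # E[Y | a', c'] - E[Y | a', c] >= 0
    g1 = g2 + random.random()      # E[Y | a, c] - E[Y | a, c'] >= g2
    m = {(1, 1): base + g1, (1, 0): base,   # treated: c raises Y
         (0, 0): base + g2, (0, 1): base}   # untreated: c lowers Y
    t, c, o = risk_differences(q, m)
    assert c + 1e-9 >= o >= t - 1e-9, (t, o, c)
```

In every sampled case RD_crude ≥ RD_obs ≥ RD_true, as the first part of the theorem states.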
Theorem 2.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a | c) = p(a' | c') ≥ 0.5 and p(d | c) = p(d' | c') ≥ 0.5. Then, RD_obs lies between RD_true and RD_crude.

Theorem 3.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a | c) = p(a' | c') ≤ 0.5 and p(d | c) = p(d' | c') ≤ 0.5. Then, RD_obs lies between RD_true and RD_crude.

The following example gives some intuition about the conditions in Theorem 2. Let A, D and Y represent three diseases, and C a gene variant that affects the three of them. Moreover, suppose that suffering A affects the risk of suffering Y. Suppose also that half of the population carries the gene variant C, i.e. p(c) = 0.5. Suppose also that carrying C predisposes to suffer A and D as much as not carrying it protects against the diseases, i.e. p(a | c) = p(a' | c') ≥ 0.5 and p(d | c) = p(d' | c') ≥ 0.5.

Corollary 4.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a | c) = p(a' | c') ≥ 0.5 and p(d | c) = p(d' | c') ≥ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≥ RD_obs ≥ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≤ RD_obs ≤ RD_true.

Corollary 5.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a | c) = p(a' | c') ≤ 0.5 and p(d | c) = p(d' | c') ≤ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≤ RD_obs ≤ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≥ RD_obs ≥ RD_true.

To get some intuition about the conditions for the first result in Corollary 4, let us extend the previous example with the following additional assumption: carrying the gene variant C increases the average severity of Y for the individuals suffering A more than it decreases the severity for the rest. Then, the corollary applies.

Note that one must rely on substantive knowledge to verify the conditions in the previous theorems and corollaries. The next two corollaries show that this can partially be alleviated by replacing the conditions on E[Y | A, C] with similar conditions on E[Y | A, D]: the former are not empirically testable because C is unobserved, but the latter are.

Corollary 6.
Under the conditions in Corollary 4, E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0 if and only if E[Y | a, d] − E[Y | a, d'] ≥ E[Y | a', d'] − E[Y | a', d] ≥ 0. Likewise when replacing ≥ with ≤.

Corollary 7.
Under the conditions in Corollary 5, E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0 if and only if E[Y | a, d] − E[Y | a, d'] ≤ E[Y | a', d'] − E[Y | a', d] ≤ 0. Likewise when swapping ≤ and ≥.

Experiments.
In this section, we report some experiments that shed additional light on the relationships between the various risk differences under the conditions in Theorem 2. For the experiments, we let Y be binary. Then, we randomly parameterize 10000 times the causal graph to the left in Figure 1 by parameterizing the terms in the right-hand side of Equation 1 with parameter values drawn from a uniform distribution, while enforcing the assumptions in the theorem. For each parameterization, we compute RD_true, RD_obs and RD_crude. Code available at .

Figure 2. (tl) Histogram of the interval length. (tr) Distance between RD_obs and RD_true relative to the interval length. (bl) Zoom of the previous plot. (br) Distance between RD_obs and RD_true relative to the interval length, as a function of the strength of the dependence between C and D when measured by the Youden index.

Figure 2 summarizes the results. The top left plot shows that most intervals are relatively small and, thus, that RD_obs is close to RD_true in most cases. However, the top right plot shows that RD_obs tends to be closer to RD_crude than to RD_true. The bottom left plot is a zoom of the previous plot at the smallest intervals. Finally, the bottom right plot shows that the stronger the dependence between C and D as measured by the Youden index (i.e., p(d | c) + p(d' | c') − 1), the closer RD_obs is to RD_true.

In summary, RD_obs is a reasonable approximation to RD_true, but it is biased towards RD_crude. This may be a problem when the interval between RD_crude and RD_true is large. However, the length of the interval is unknown in practice, and we doubt substantive knowledge may provide hints on it. The bias decreases with increasing dependence between C and D. Although the strength of this dependence is unknown in practice, substantive knowledge may give hints on it.

3. Bounding the True Risk Difference
Theorems and Corollaries 1-5 do not hold if the assumption that p(a | c) = p(a' | c') ≥ 0.5 is replaced by p(a' | c') ≥ p(a | c) ≥ 0.5. Likewise for the assumption that p(d | c) = p(d' | c') ≥ 0.5. However, Peña (2020, Theorems 5 and 6) shows that, when p(a' | c') ≥ p(a | c) ≥ 0.5 while p(d | c) = p(d' | c') ≥ 0.5 still holds, RD_crude and RD_obs bound RD_true. We re-state this result in the next theorem.

Theorem 8 (Peña, 2020, Theorems 5 and 6). Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(d | c) = p(d' | c') ≥ 0.5 and p(a' | c') ≥ p(a | c) ≥ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≥ RD_true and RD_obs ≥ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≤ RD_true and RD_obs ≤ RD_true.

Note that the previous theorem does not determine the order between RD_crude and RD_obs. Thus, it cannot be used to decide which of the two comes closer to RD_true. However, the theorem may be useful to conclude whether RD_true is positive or negative. For instance, the last result in the theorem allows us to conclude that RD_true > 0 if max(RD_crude, RD_obs) > 0. The following theorems show that the condition p(d | c) = p(d' | c') ≥ 0.5 in the previous theorem can also be relaxed.

Theorem 9.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a' | c') ≥ p(a | c) ≥ 0.5 and p(d' | c') ≥ p(d | c) ≥ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≥ RD_true and RD_obs ≥ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≤ RD_true and RD_obs ≤ RD_true.

Theorem 10.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a | c) ≤ p(a' | c') ≤ 0.5 and p(d | c) ≤ p(d' | c') ≤ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≤ RD_true and RD_obs ≤ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≥ RD_true and RD_obs ≥ RD_true.

Returning to our example of three diseases A, D and Y and a gene variant C, the assumptions for the first result in Theorem 9 mean that (i) half of the population carry the gene variant C, i.e. p(c) = 0.5, (ii) not carrying C protects against A and D more than carrying it predisposes to suffer the diseases, i.e. p(a' | c') ≥ p(a | c) ≥ 0.5 and p(d' | c') ≥ p(d | c) ≥ 0.5, and (iii) carrying C increases the average severity of Y for the individuals suffering A more than it decreases the severity for the rest.

The last two theorems can be strengthened for RD_crude as follows. Analogous results do not hold for RD_obs, though.

Theorem 11.
Consider the causal graph to the left in Figure 1. Let p(c) ≤ 0.5 and p(a' | c') ≥ p(a | c) ≥ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≥ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≤ RD_true.

Theorem 12.
Consider the causal graph to the left in Figure 1. Let p(c) ≥ 0.5 and p(a | c) ≤ p(a' | c') ≤ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≤ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≥ RD_true.

4. Path Diagrams
Finally, we suppose that the variables A, C, D and Y are all continuous and follow the linear structural equation model represented by the path diagram to the right in Figure 1. The true, crude and partially adjusted average causal effects of A on Y are given by the partial regression coefficients β_YA·C, β_YA and β_YA·D, respectively. Note that the first cannot be computed because C is unobserved. The following theorem proves that the partially adjusted average causal effect lies between the true and the crude ones and, thus, it comes closer to the true average causal effect than the crude one.

Theorem 13.
Consider the path diagram to the right in Figure 1. Assume that the variables are standardized. If sign(β) = sign(γ), then β_YA·C ≤ β_YA·D ≤ β_YA, else β_YA·C ≥ β_YA·D ≥ β_YA.

Note that, unlike in the discrete case, no assumptions about the causal relationships of the variables are required to conclude that the partially adjusted average causal effect lies between the true and the crude ones. Note also that the signs of β and γ tell us whether the partially adjusted average causal effect is an upper or a lower bound of the true one.

5. Discussion
One may think that adjusting for a proxy of a latent confounder is always a good idea. However, it is not. In this work, we have described sufficient conditions under which adjusting for a proxy of a latent confounder comes closer to the incomputable true average causal effect than not adjusting at all. Under some conditions, it is even possible to decide whether the partially adjusted approximation is an upper or a lower bound of the true quantity. We have experimentally shown that the partially adjusted approximation can be substantially better than the unadjusted one when the dependence between confounder and proxy is significant. We have also illustrated with an example that the conditions proposed are not too restrictive or unrealistic. Since one must rely on expert knowledge to verify the conditions, we would like to investigate in the future whether realistic, sufficient and empirically testable conditions exist. We would also like to extend this work to the case where several latent confounders exist.
Acknowledgments
This work was funded by the Swedish Research Council (ref. 2019-00245).
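As a numerical illustration of Theorem 13 in the main text, the following sketch (ours; the coefficient values are arbitrary, chosen so that all variables have unit variance, and least squares is solved from scratch to stay dependency-free) simulates the linear model of the right panel of Figure 1 and checks that the proxy-adjusted coefficient β_YA·D lands between the fully adjusted β_YA·C and the crude β_YA:

```python
import math
import random

random.seed(1)

def simulate(n):
    # Standardized linear SEM for the right panel of Figure 1 (coefficient
    # values are illustrative): C exogenous, C -> A, C -> D, and A, C -> Y,
    # with noise scales chosen so every variable has unit variance.
    rows = []
    for _ in range(n):
        C = random.gauss(0, 1)
        A = 0.8 * C + 0.6 * random.gauss(0, 1)
        D = 0.9 * C + math.sqrt(0.19) * random.gauss(0, 1)
        Y = 0.3 * A + 0.4 * C + math.sqrt(0.558) * random.gauss(0, 1)
        rows.append((A, C, D, Y))
    return rows

def coef_of_A(rows, controls):
    # OLS coefficient of A when regressing Y on an intercept, A and the
    # given controls (1 = C, 2 = D), via Gauss-Jordan on normal equations.
    X = [[1.0, r[0]] + [r[i] for i in controls] for r in rows]
    y = [r[3] for r in rows]
    k = len(X[0])
    M = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(len(X)))] for i in range(k)]
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(M[r][i]))  # partial pivoting
        M[i], M[p] = M[p], M[i]
        for r in range(k):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [M[r][j] - f * M[i][j] for j in range(k + 1)]
    return M[1][k] / M[1][1]    # solved coefficient of A

rows = simulate(20000)
b_crude = coef_of_A(rows, [])    # beta_YA
b_proxy = coef_of_A(rows, [2])   # beta_YA.D
b_true = coef_of_A(rows, [1])    # beta_YA.C, computable only in simulation
assert b_true < b_proxy < b_crude
```

With these (all positive) confounding paths the first ordering of the theorem obtains; flipping the sign of one of them reverses it.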
Supplementary Material: Proofs
Theorem 2.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a | c) = p(a' | c') ≥ 0.5 and p(d | c) = p(d' | c') ≥ 0.5. Then, RD_obs lies between RD_true and RD_crude.

Proof. We start by establishing a relationship between RD_obs and RD_true. First, note that

p(c | a, d) = p(a, d | c) p(c) / [p(a, d | c) p(c) + p(a, d | c') p(c')] = 1 / (1 + exp(−δ(a, d))) = σ(δ(a, d))

where

δ(a, d) = ln [p(a, d | c) p(c) / (p(a, d | c') p(c'))]

is known as the log odds, and σ() is known as the logistic sigmoid function (Bishop, 2006, Section 4.2). Then,

p(c | a, d) = σ(ln [p(a, d | c) p(c) / (p(a, d | c') p(c'))]) = σ(ln [p(a | c) p(d | c) / (p(a | c') p(d | c'))])    (2)

where the second equality follows from the assumption that p(c) = 0.5 and the fact that A and D are conditionally independent given C due to the causal graph under consideration. Likewise,

p(c | a, d') = σ(ln [p(a | c) p(d' | c) / (p(a | c') p(d' | c'))])    (3)

and

p(c | d) = σ(ln [p(d | c) / p(d | c')])    (4)

and

p(c | d') = σ(ln [p(d' | c) / p(d' | c')]).    (5)

Then, p(c | a, d) ≥ p(c | d) and p(c | a, d') ≥ p(c | d') because σ() and ln() are increasing functions and p(a | c) / p(a | c') ≥ 1, which follows from the assumption that p(a | c) = p(a' | c') ≥ 0.5. Then,

p(c | a, d) p(d) + p(c | a, d') p(d') ≥ p(c | d) p(d) + p(c | d') p(d') = [p(d | c) p(c) / p(d)] p(d) + [p(d' | c) p(c) / p(d')] p(d') = p(c) = 0.5.    (6)

Moreover,

p(c' | a, d) p(d) + p(c' | a, d') p(d') = 1 − (p(c | a, d) p(d) + p(c | a, d') p(d')).    (7)

Next, note that

p(c | a', d') = σ(ln [p(a' | c) p(d' | c) / (p(a' | c') p(d' | c'))]) = σ(ln [p(a | c') p(d | c') / (p(a | c) p(d | c))]) = p(c' | a, d)    (8)

by the assumptions that p(a | c) = p(a' | c') and p(d | c) = p(d' | c'). Likewise,

p(c | a', d) = σ(ln [p(a' | c) p(d | c) / (p(a' | c') p(d | c'))]) = σ(ln [p(a | c') p(d' | c') / (p(a | c) p(d' | c))]) = p(c' | a, d').    (9)

Then,

p(c | a', d) p(d) + p(c | a', d') p(d') = p(c' | a, d) p(d) + p(c' | a, d') p(d')    (10)

because

p(d) = p(d | c) p(c) + p(d | c') p(c') = 0.5 [p(d | c) + 1 − p(d' | c')] = 0.5.    (11)

Moreover,

p(c' | a', d) p(d) + p(c' | a', d') p(d') = 1 − (p(c | a', d) p(d) + p(c | a', d') p(d')).    (12)

Finally, Equations 6 and 10 allow us to write p(c | a, d) p(d) + p(c | a, d') p(d') = p(c' | a', d) p(d) + p(c' | a', d') p(d') = 0.5 + α with α ≥ 0, whereas Equations 7, 10 and 12 allow us to write p(c' | a, d) p(d) + p(c' | a, d') p(d') = p(c | a', d) p(d) + p(c | a', d') p(d') = 0.5 − α. Therefore,

RD_obs = E[Y | a, d] p(d) + E[Y | a, d'] p(d') − E[Y | a', d] p(d) − E[Y | a', d'] p(d')
= (E[Y | a, c, d] p(c | a, d) + E[Y | a, c', d] p(c' | a, d)) p(d) + (E[Y | a, c, d'] p(c | a, d') + E[Y | a, c', d'] p(c' | a, d')) p(d') − (E[Y | a', c, d] p(c | a', d) + E[Y | a', c', d] p(c' | a', d)) p(d) − (E[Y | a', c, d'] p(c | a', d') + E[Y | a', c', d'] p(c' | a', d')) p(d')
= E[Y | a, c] (p(c | a, d) p(d) + p(c | a, d') p(d')) + E[Y | a, c'] (p(c' | a, d) p(d) + p(c' | a, d') p(d')) − E[Y | a', c] (p(c | a', d) p(d) + p(c | a', d') p(d')) − E[Y | a', c'] (p(c' | a', d) p(d) + p(c' | a', d') p(d'))
= E[Y | a, c] (0.5 + α) + E[Y | a, c'] (0.5 − α) − E[Y | a', c] (0.5 − α) − E[Y | a', c'] (0.5 + α)

where the third equality follows from the fact that Y and D are conditionally independent given A and C due to the causal graph under consideration. Then,

RD_obs = RD_true + α (E[Y | a, c] − E[Y | a, c'] + E[Y | a', c] − E[Y | a', c'])    (13)

with α ≥ 0.

We now establish a relationship between RD_obs and RD_crude. First, note that

p(a | d) = p(a | c, d) p(c | d) + p(a | c', d) p(c' | d) = p(a | c) p(c | d) + p(a | c') p(c' | d)    (14)
= p(a' | c') p(c' | d') + p(a' | c) p(c | d') = p(a' | c, d') p(c | d') + p(a' | c', d') p(c' | d') = p(a' | d')

by the fact that A and D are conditionally independent given C due to the causal graph under consideration, the assumption that p(a | c) = p(a' | c'), and the fact that p(c | d) = p(d | c) = p(d' | c') = p(c' | d'), which follows from the assumptions that p(c) = 0.5 and p(d | c) = p(d' | c') and the fact that p(d) = 0.5. Likewise,

p(a' | d) = p(a' | c, d) p(c | d) + p(a' | c', d) p(c' | d) = p(a' | c) p(c | d) + p(a' | c') p(c' | d)    (15)
= p(a | c') p(c' | d') + p(a | c) p(c | d') = p(a | c, d') p(c | d') + p(a | c', d') p(c' | d') = p(a | d').

Next, let x = p(a | c) and z = p(d | c). Recall that p(a | c) = p(a' | c') by assumption, and p(c | d) = p(d | c) = p(d' | c') = p(c' | d') as shown above. Then, Equation 15 can be rewritten as

p(a' | d) = x(1 − z) + (1 − x)z = −2xz + x + z.

Recall that z ≥ 0.5. If z = 0.5, then −2xz + x + z = 0.5. If z > 0.5, then −2xz + x + z ≤ 0.5. To see it, assume to the contrary that −2xz + x + z > 0.5. Then, x(1 − 2z) > 0.5 − z and, thus, x < (0.5 − z)/(1 − 2z) = 0.5 because 1 − 2z < 0, which contradicts the fact that x ≥ 0.5. Consequently, p(a' | d) = p(a | d') ≤ 0.5 and p(a | d) = p(a' | d') ≥ 0.5. Note also that

p(a) = p(a | c) p(c) + p(a | c') p(c') = 0.5 [p(a | c) + 1 − p(a' | c')] = 0.5

by the assumptions that p(a | c) = p(a' | c') and p(c) = 0.5. This together with the fact that p(d) = 0.5 implies that p(d | a') = p(d' | a) ≤ 0.5 and p(d | a) = p(d' | a') ≥ 0.5. Then, we can write p(d | a) = p(d' | a') = 0.5 + β and p(d | a') = p(d' | a) = 0.5 − β with β ≥ 0. Therefore,

RD_crude = E[Y | a] − E[Y | a']
= E[Y | a, d] p(d | a) + E[Y | a, d'] p(d' | a) − E[Y | a', d] p(d | a') − E[Y | a', d'] p(d' | a')
= E[Y | a, d] (0.5 + β) + E[Y | a, d'] (0.5 − β) − E[Y | a', d] (0.5 − β) − E[Y | a', d'] (0.5 + β)
= RD_obs + β (E[Y | a, d] − E[Y | a, d'] + E[Y | a', d] − E[Y | a', d'])    (16)

with β ≥ 0, where the last equality follows from the fact that p(d) = 0.5. Now, note that

E[Y | a, d] − E[Y | a, d'] = E[Y | a, c, d] p(c | a, d) + E[Y | a, c', d] p(c' | a, d) − E[Y | a, c, d'] p(c | a, d') − E[Y | a, c', d'] p(c' | a, d')
= E[Y | a, c] (p(c | a, d) − p(c | a, d')) + E[Y | a, c'] (p(c' | a, d) − p(c' | a, d'))
= (E[Y | a, c] − E[Y | a, c']) (p(c | a, d) − p(c | a, d'))    (17)

where the second equality follows from the fact that Y and D are conditionally independent given A and C due to the causal graph under consideration, and the third from the fact that p(c' | a, d) − p(c' | a, d') = −(p(c | a, d) − p(c | a, d')). Likewise,

E[Y | a', d] − E[Y | a', d'] = (E[Y | a', c] − E[Y | a', c']) (p(c | a', d) − p(c | a', d')) = (E[Y | a', c] − E[Y | a', c']) (p(c | a, d) − p(c | a, d'))    (18)

where the last equality follows from Equations 8 and 9, since p(c | a', d) − p(c | a', d') = p(c' | a, d') − p(c' | a, d) = p(c | a, d) − p(c | a, d'). Moreover, p(c | a, d) ≥ p(c | a, d') by Equations 2 and 3, because p(d | c) = p(d' | c') ≥ 0.5 implies p(d | c) / p(d | c') ≥ 1 ≥ p(d' | c) / p(d' | c'). Then,

sign(E[Y | a, c] − E[Y | a, c'] + E[Y | a', c] − E[Y | a', c']) = sign(E[Y | a, d] − E[Y | a, d'] + E[Y | a', d] − E[Y | a', d']).    (19)

This equation together with Equations 13 and 16 imply the desired result. ∎

Theorem 3.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a | c) = p(a' | c') ≤ 0.5 and p(d | c) = p(d' | c') ≤ 0.5. Then, RD_obs lies between RD_true and RD_crude.

Proof. Similar to the proof of Theorem 2. Specifically, the assumption that p(a | c) = p(a' | c') ≤ 0.5 implies that p(a | c) / p(a | c') ≤ 1, which implies that p(c | a, d) ≤ p(c | d) and p(c | a, d') ≤ p(c | d'), which implies that p(c | a, d) p(d) + p(c | a, d') p(d') ≤ 0.5, which implies that

RD_obs = RD_true + α (E[Y | a, c] − E[Y | a, c'] + E[Y | a', c] − E[Y | a', c'])    (20)

with α ≤ 0. Likewise, the assumption that p(d | c) = p(d' | c') ≤ 0.5 implies that p(a' | d) = p(a | d') ≤ 0.5 and p(a | d) = p(a' | d') ≥ 0.5, which implies that p(d | a') = p(d' | a) ≤ 0.5 and p(d | a) = p(d' | a') ≥ 0.5, which implies that

RD_crude = RD_obs + β (E[Y | a, d] − E[Y | a, d'] + E[Y | a', d] − E[Y | a', d'])    (21)

with β ≥ 0. The assumption that p(d | c) = p(d' | c') ≤ 0.5 also implies that p(c | a, d) ≤ p(c | a, d'), which implies that

sign(E[Y | a, c] − E[Y | a, c'] + E[Y | a', c] − E[Y | a', c']) = −sign(E[Y | a, d] − E[Y | a, d'] + E[Y | a', d] − E[Y | a', d']).    (22)

This equation together with Equations 20 and 21 imply the desired result. ∎

Corollary 4.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a | c) = p(a' | c') ≥ 0.5 and p(d | c) = p(d' | c') ≥ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≥ RD_obs ≥ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≤ RD_obs ≤ RD_true.

Proof. It follows from Equations 13, 16 and 19. ∎
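The corollary can be spot-checked numerically with a parameterization in which p(a | c) = p(a' | c') and p(d | c) = p(d' | c') take different values (an illustrative sketch of ours, coding C, A, D as 1 for c, a, d and 0 for the primed values):

```python
def rds(qa, qd, m):
    # Exact risk differences under p(c) = 0.5, p(a | c) = p(a' | c') = qa
    # and p(d | c) = p(d' | c') = qd; m[A, C] = E[Y | A, C].
    pA = lambda A, C: qa if A == C else 1 - qa
    pD = lambda D, C: qd if D == C else 1 - qd
    true = sum((m[1, C] - m[0, C]) * 0.5 for C in (0, 1))
    crude = sum(s * sum(m[A, C] * pA(A, C) * 0.5 for C in (0, 1))
                  / sum(pA(A, C) * 0.5 for C in (0, 1))
                for A, s in ((1, 1), (0, -1)))
    obs = 0.0
    for D in (0, 1):
        pd = sum(pD(D, C) * 0.5 for C in (0, 1))
        for A, s in ((1, 1), (0, -1)):
            w = {C: 0.5 * pA(A, C) * pD(D, C) for C in (0, 1)}
            obs += s * pd * sum(m[A, C] * w[C] for C in (0, 1)) / sum(w.values())
    return true, crude, obs

# E[Y|a,c] - E[Y|a,c'] = 2 >= E[Y|a',c'] - E[Y|a',c] = 1 >= 0 (non-monotone).
m = {(1, 1): 2.0, (1, 0): 0.0, (0, 0): 1.0, (0, 1): 0.0}
t, c, o = rds(0.8, 0.7, m)
assert c >= o >= t    # RD_crude >= RD_obs >= RD_true, as stated
```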
Corollary 5.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a | c) = p(a' | c') ≤ 0.5 and p(d | c) = p(d' | c') ≤ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≤ RD_obs ≤ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≥ RD_obs ≥ RD_true.

Proof. It follows from Equations 20, 21 and 22. ∎
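A randomized spot-check of Corollary 5 (an illustrative sketch of ours, coding C, A, D as 1 for c, a, d and 0 for the primed values; with both conditional probabilities below one half, the ordering reverses):

```python
import random

def rds(qa, qd, m):
    # Exact risk differences under p(c) = 0.5, p(a | c) = p(a' | c') = qa
    # and p(d | c) = p(d' | c') = qd; m[A, C] = E[Y | A, C]. The 0.5
    # factors for p(C) cancel in the conditional means, so they are omitted.
    pA = lambda A, C: qa if A == C else 1 - qa
    pD = lambda D, C: qd if D == C else 1 - qd
    true = sum((m[1, C] - m[0, C]) * 0.5 for C in (0, 1))
    crude = sum(s * sum(m[A, C] * pA(A, C) for C in (0, 1))
                  / sum(pA(A, C) for C in (0, 1))
                for A, s in ((1, 1), (0, -1)))
    obs = 0.0
    for D in (0, 1):
        pd = sum(pD(D, C) * 0.5 for C in (0, 1))
        for A, s in ((1, 1), (0, -1)):
            w = {C: pA(A, C) * pD(D, C) for C in (0, 1)}
            obs += s * pd * sum(m[A, C] * w[C] for C in (0, 1)) / sum(w.values())
    return true, crude, obs

random.seed(3)
for _ in range(1000):
    qa, qd = random.uniform(0.01, 0.5), random.uniform(0.01, 0.5)
    g2 = random.random()            # E[Y | a', c'] - E[Y | a', c] >= 0
    g1 = g2 + random.random()       # E[Y | a, c] - E[Y | a, c'] >= g2
    m = {(1, 1): g1, (1, 0): 0.0, (0, 0): g2, (0, 1): 0.0}
    t, c, o = rds(qa, qd, m)
    assert c - 1e-9 <= o <= t + 1e-9, (c, o, t)   # RD_crude <= RD_obs <= RD_true
```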
Corollary 6.
Under the conditions in Corollary 4, E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0 if and only if E[Y | a, d] − E[Y | a, d'] ≥ E[Y | a', d'] − E[Y | a', d] ≥ 0. Likewise when replacing ≥ with ≤.

Proof. It follows from Equations 17 and 18. Recall that p(c | a, d) ≥ p(c | a, d') was established in the proof of Theorem 2. ∎

Corollary 7.
Under the conditions in Corollary 5, E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0 if and only if E[Y | a, d] − E[Y | a, d'] ≤ E[Y | a', d'] − E[Y | a', d] ≤ 0. Likewise when swapping ≤ and ≥.

Proof. It follows from Equations 17 and 18. Recall that p(c | a, d) ≤ p(c | a, d') was established in the proof of Theorem 3. ∎

Theorem 9.
Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, p(a' | c') ≥ p(a | c) ≥ 0.5 and p(d' | c') ≥ p(d | c) ≥ 0.5. If E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0, then RD_crude ≥ RD_true and RD_obs ≥ RD_true. If E[Y | a, c] − E[Y | a, c'] ≤ E[Y | a', c'] − E[Y | a', c] ≤ 0, then RD_crude ≤ RD_true and RD_obs ≤ RD_true.

Proof. We start by proving the first result in the theorem, specifically that RD_crude ≥ RD_true. Recall from the proof of Theorem 2 that

p(c | a) = σ(ln [p(a | c) p(c) / (p(a | c') p(c'))]) = σ(ln [p(a | c) / p(a | c')])

where the second equality follows from the assumption that p(c) = 0.5. Likewise,

p(c' | a') = σ(ln [p(a' | c') / p(a' | c)]).

Therefore, p(c | a) ≥ 0.5 and p(c' | a') ≥ 0.5 because p(a | c) / p(a | c') ≥ 1 and p(a' | c') / p(a' | c) ≥ 1, which follow from the assumption that p(a' | c') ≥ p(a | c) ≥ 0.5.

Now, consider the function f(x) = x(1 − x). By inspecting the first and second derivatives, we can conclude that f(x) has a single maximum at x = 0.5, and that it is increasing in the interval [0, 0.5] and decreasing in the interval [0.5, 1]. This implies that f(p(a | c)) = p(a | c) p(a' | c) ≥ p(a' | c') p(a | c') = f(p(a' | c')) due to the assumption that p(a' | c') ≥ p(a | c) ≥ 0.5. Then,

p(a | c) p(a' | c) ≥ p(a' | c') p(a | c')    (23)

which together with the fact that σ() and ln() are increasing functions imply that p(c | a) ≥ p(c' | a').

The results in the previous paragraph allow us to write p(c | a) = 0.5 + α and p(c' | a') = 0.5 + β with α ≥ β ≥ 0. Therefore,

RD_crude = E[Y | a] − E[Y | a']
= E[Y | a, c] p(c | a) + E[Y | a, c'] p(c' | a) − E[Y | a', c] p(c | a') − E[Y | a', c'] p(c' | a')
= E[Y | a, c] (0.5 + α) + E[Y | a, c'] (0.5 − α) − E[Y | a', c] (0.5 − β) − E[Y | a', c'] (0.5 + β)
= RD_true + α (E[Y | a, c] − E[Y | a, c']) − β (E[Y | a', c'] − E[Y | a', c])    (24)

which implies that RD_crude ≥ RD_true because α ≥ β ≥ 0 and E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0 by assumption.

We continue by proving that RD_obs ≥ RD_true. First, recall Equations 2-5. Then, p(c | a, d) ≥ p(c | d) and p(c | a, d') ≥ p(c | d') because σ() and ln() are increasing functions and p(a | c) / p(a | c') ≥ 1, which follows from the assumption that p(a' | c') ≥ p(a | c) ≥ 0.5. Then,

p(c | a, d) p(d) + p(c | a, d') p(d') ≥ p(c | d) p(d) + p(c | d') p(d') = [p(d | c) p(c) / p(d)] p(d) + [p(d' | c) p(c) / p(d')] p(d') = p(c) = 0.5

by the assumption that p(c) = 0.5. We can analogously prove that p(c' | a', d) p(d) + p(c' | a', d') p(d') ≥ 0.5. Moreover, it also holds that

p(c | a, d) p(d) + p(c | a, d') p(d') ≥ p(c' | a', d) p(d) + p(c' | a', d') p(d').

To prove this inequality and after failing to do it on our own, we resorted to the function FindInstance from Mathematica 12.2.0. Specifically, we used FindInstance to find an instance of the probabilities that satisfied the reverse of the inequality above subject to p(c) = 0.5, p(a' | c') ≥ p(a | c) ≥ 0.5 and p(d' | c') ≥ p(d | c) ≥ 0.5. Since no such instance was found, the inequality above must hold. It is worth mentioning that FindInstance works analytically and not numerically and, thus, its outcome is exact and correct.

The results in the previous paragraph allow us to write p(c | a, d) p(d) + p(c | a, d') p(d') = 0.5 + α and p(c' | a', d) p(d) + p(c' | a', d') p(d') = 0.5 + β with α ≥ β ≥ 0. Consequently, p(c' | a, d) p(d) + p(c' | a, d') p(d') = 1 − (p(c | a, d) p(d) + p(c | a, d') p(d')) = 0.5 − α, and p(c | a', d) p(d) + p(c | a', d') p(d') = 1 − (p(c' | a', d) p(d) + p(c' | a', d') p(d')) = 0.5 − β. Therefore,

RD_obs = E[Y | a, d] p(d) + E[Y | a, d'] p(d') − E[Y | a', d] p(d) − E[Y | a', d'] p(d')
= (E[Y | a, c, d] p(c | a, d) + E[Y | a, c', d] p(c' | a, d)) p(d) + (E[Y | a, c, d'] p(c | a, d') + E[Y | a, c', d'] p(c' | a, d')) p(d') − (E[Y | a', c, d] p(c | a', d) + E[Y | a', c', d] p(c' | a', d)) p(d) − (E[Y | a', c, d'] p(c | a', d') + E[Y | a', c', d'] p(c' | a', d')) p(d')
= E[Y | a, c] (p(c | a, d) p(d) + p(c | a, d') p(d')) + E[Y | a, c'] (p(c' | a, d) p(d) + p(c' | a, d') p(d')) − E[Y | a', c] (p(c | a', d) p(d) + p(c | a', d') p(d')) − E[Y | a', c'] (p(c' | a', d) p(d) + p(c' | a', d') p(d'))
= E[Y | a, c] (0.5 + α) + E[Y | a, c'] (0.5 − α) − E[Y | a', c] (0.5 − β) − E[Y | a', c'] (0.5 + β)
= RD_true + α (E[Y | a, c] − E[Y | a, c']) − β (E[Y | a', c'] − E[Y | a', c])    (25)

where the third equality follows from the fact that Y and D are conditionally independent given A and C due to the causal graph under consideration. Then, RD_obs ≥ RD_true because α ≥ β ≥ 0 and E[Y | a, c] − E[Y | a, c'] ≥ E[Y | a', c'] − E[Y | a', c] ≥ 0 by assumption. The second result in the theorem can be proven analogously. ∎

Theorem 10.
Theorem 10. Consider the causal graph to the left in Figure 1. Let $p(c) = 0.5$, $p(a \mid c) \leq p(a \mid \bar c) \leq 0.5$ and $p(d \mid c) \leq p(d \mid \bar c) \leq 0.5$. If $E[Y \mid a, c] - E[Y \mid a, \bar c] \geq E[Y \mid \bar a, \bar c] - E[Y \mid \bar a, c] \geq 0$, then $RD_{crude} \leq RD_{true}$ and $RD_{obs} \leq RD_{true}$. If $E[Y \mid a, c] - E[Y \mid a, \bar c] \leq E[Y \mid \bar a, \bar c] - E[Y \mid \bar a, c] \leq 0$, then $RD_{crude} \geq RD_{true}$ and $RD_{obs} \geq RD_{true}$.

Code available at .

Proof. Similar to the proof of Theorem 9. Specifically, the assumption that $p(a \mid c) \leq p(a \mid \bar c) \leq 0.5$ implies that $p(a \mid c) p(\bar a \mid c) \leq p(a \mid \bar c) p(\bar a \mid \bar c)$, and the two together imply that $p(c \mid a) \leq p(\bar c \mid \bar a) \leq 0.5$, which implies that
$$RD_{crude} = RD_{true} + \alpha (E[Y \mid a, c] - E[Y \mid a, \bar c]) - \beta (E[Y \mid \bar a, \bar c] - E[Y \mid \bar a, c])$$
with $\alpha \leq \beta \leq 0$. This implies the stated relationships between $RD_{crude}$ and $RD_{true}$. The assumption that $p(a \mid c) \leq p(a \mid \bar c) \leq 0.5$ also implies that $p(a \mid c) / p(a \mid \bar c) \leq 1$, which implies that $p(c \mid a, d) \leq p(c \mid d)$ and $p(c \mid a, \bar d) \leq p(c \mid \bar d)$, which together with the assumption that $p(d \mid c) \leq p(d \mid \bar c) \leq 0.5$ implies that $p(c \mid a, d) p(d) + p(c \mid a, \bar d) p(\bar d) \leq p(\bar c \mid \bar a, d) p(d) + p(\bar c \mid \bar a, \bar d) p(\bar d) \leq 0.5$. Then,
$$RD_{obs} = RD_{true} + \alpha (E[Y \mid a, c] - E[Y \mid a, \bar c]) - \beta (E[Y \mid \bar a, \bar c] - E[Y \mid \bar a, c])$$
with $\alpha \leq \beta \leq 0$. This implies the stated relationships between $RD_{obs}$ and $RD_{true}$. $\square$
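Theorem 10 can be spot-checked by direct computation. The sketch below (ours; the parameterization is an arbitrary choice satisfying the theorem's hypotheses, with 1 encoding the unbarred values) computes $RD_{true}$, $RD_{crude}$ and $RD_{obs}$ from the model and confirms that both the crude and the partially adjusted measures fall below the true one:

```python
# Spot-check of Theorem 10 on one arbitrary parameterization satisfying
# p(c) = 0.5, p(a|c) <= p(a|c-bar) <= 0.5, p(d|c) <= p(d|c-bar) <= 0.5, and
# E[Y|a,c] - E[Y|a,c-bar] = 1 >= E[Y|a-bar,c-bar] - E[Y|a-bar,c] = 0.5 >= 0.

p_c = {1: 0.5, 0: 0.5}
p_a_given_c = {1: 0.1, 0: 0.4}
p_d_given_c = {1: 0.1, 0: 0.4}
EY = {(1, 1): 1.0, (1, 0): 0.0, (0, 1): 0.25, (0, 0): 0.75}  # E[Y|A,C]

def lik(x, p1):                        # p(X = x | .) from p(X = 1 | .)
    return p1 if x else 1 - p1

def p_c_given(c, a=None, d=None):      # posterior of C given the stated evidence
    def w(cc):
        v = p_c[cc]
        if a is not None:
            v *= lik(a, p_a_given_c[cc])
        if d is not None:
            v *= lik(d, p_d_given_c[cc])
        return v
    return w(c) / (w(0) + w(1))

p_d = {d: sum(p_c[c] * lik(d, p_d_given_c[c]) for c in (0, 1)) for d in (0, 1)}
EY_ad = lambda a, d: sum(EY[(a, c)] * p_c_given(c, a=a, d=d) for c in (0, 1))

RD_true = sum((EY[(1, c)] - EY[(0, c)]) * p_c[c] for c in (0, 1))
RD_crude = sum(EY[(1, c)] * p_c_given(c, a=1) - EY[(0, c)] * p_c_given(c, a=0)
               for c in (0, 1))
RD_obs = sum((EY_ad(1, d) - EY_ad(0, d)) * p_d[d] for d in (0, 1))

assert RD_crude <= RD_true and RD_obs <= RD_true   # as Theorem 10 predicts
```

For these numbers, $RD_{true} = 0$ while $RD_{crude} = -0.25$ and $RD_{obs} \approx -0.237$, so both computable measures undershoot the true one, and the partially adjusted measure is the closer of the two.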
Theorem 11. Consider the causal graph to the left in Figure 1. Let $p(c) \leq 0.5$ and $p(a \mid c) \geq p(a \mid \bar c) \geq 0.5$. If $E[Y \mid a, c] - E[Y \mid a, \bar c] \geq E[Y \mid \bar a, \bar c] - E[Y \mid \bar a, c] \geq 0$, then $RD_{crude} \geq RD_{true}$. If $E[Y \mid a, c] - E[Y \mid a, \bar c] \leq E[Y \mid \bar a, \bar c] - E[Y \mid \bar a, c] \leq 0$, then $RD_{crude} \leq RD_{true}$.

Proof. Recall from the proof of Theorem 2 that
$$p(c \mid a) = \sigma\left(\ln \frac{p(a \mid c) p(c)}{p(a \mid \bar c) p(\bar c)}\right) \qquad \text{and} \qquad p(\bar c \mid \bar a) = \sigma\left(\ln \frac{p(\bar a \mid \bar c) p(\bar c)}{p(\bar a \mid c) p(c)}\right)$$
and $p(c) = \sigma(\ln \frac{p(c)}{p(\bar c)})$ and $p(\bar c) = \sigma(\ln \frac{p(\bar c)}{p(c)})$. Therefore, $p(c \mid a) \geq p(c)$ and $p(\bar c \mid \bar a) \geq p(\bar c)$ because $\sigma()$ and $\ln()$ are increasing functions and $p(a \mid c)/p(a \mid \bar c) \geq 1$ and $p(\bar a \mid \bar c)/p(\bar a \mid c) \geq 1$, since $p(a \mid c) \geq p(a \mid \bar c) \geq 0.5$. Then, we can write $p(c \mid a) = p(c) + \alpha$ and $p(\bar c \mid \bar a) = p(\bar c) + \beta$ with $\alpha, \beta \geq 0$. Moreover, $\alpha \geq \beta$. To see it, recall from the proof of Theorem 9 that a function of the form $f(x) = x(1 - x)$ has a single maximum at $x = 0.5$, and it is increasing in the interval $[0, 0.5]$ and decreasing in the interval $[0.5, 1]$. Now, note that $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ (Bishop, 2006, Equation 4.88). Then, $\sigma'(z)$ has a single maximum at $\sigma(z) = 0.5$, i.e. at $z = 0$, and it is increasing in the interval $\{\sigma(z) \mid 0 \leq \sigma(z) \leq 0.5\}$ (i.e., $\{z \mid -\infty < z \leq 0\}$) and decreasing in the interval $\{\sigma(z) \mid 0.5 \leq \sigma(z) \leq 1\}$ (i.e., $\{z \mid 0 \leq z < +\infty\}$). In other words, $\sigma(z)$ increases at an increasing rate in the interval $(-\infty, 0]$ and increases at a decreasing rate in the interval $[0, +\infty)$. Therefore, $\sigma(-u + v) - \sigma(-u) \geq \sigma(u + v) - \sigma(u)$ for all $u, v \geq 0$. Then,
$$\begin{aligned}
\alpha = p(c \mid a) - p(c) &= \sigma\left(\ln \frac{p(a \mid c)}{p(a \mid \bar c)} + \ln \frac{p(c)}{p(\bar c)}\right) - \sigma\left(\ln \frac{p(c)}{p(\bar c)}\right)\\
&\geq \sigma\left(\ln \frac{p(a \mid c)}{p(a \mid \bar c)} + \ln \frac{p(\bar c)}{p(c)}\right) - \sigma\left(\ln \frac{p(\bar c)}{p(c)}\right)\\
&\geq \sigma\left(\ln \frac{p(\bar a \mid \bar c)}{p(\bar a \mid c)} + \ln \frac{p(\bar c)}{p(c)}\right) - \sigma\left(\ln \frac{p(\bar c)}{p(c)}\right)\\
&= p(\bar c \mid \bar a) - p(\bar c) = \beta
\end{aligned}$$
where the first inequality follows from the fact that $\sigma(-u + v) - \sigma(-u) \geq \sigma(u + v) - \sigma(u)$ with $u = \ln \frac{p(\bar c)}{p(c)}$ and $v = \ln \frac{p(a \mid c)}{p(a \mid \bar c)}$, and the second inequality follows from Equation 23 and the fact that $\sigma()$ and $\ln()$ are increasing functions. Note that $u, v \geq 0$ because $p(c) \leq 0.5$ and $p(a \mid c) \geq p(a \mid \bar c) \geq 0.5$. Then,
$$\begin{aligned}
RD_{crude} &= E[Y \mid a] - E[Y \mid \bar a]\\
&= E[Y \mid a, c] p(c \mid a) + E[Y \mid a, \bar c] p(\bar c \mid a) - E[Y \mid \bar a, c] p(c \mid \bar a) - E[Y \mid \bar a, \bar c] p(\bar c \mid \bar a)\\
&= E[Y \mid a, c] (p(c) + \alpha) + E[Y \mid a, \bar c] (p(\bar c) - \alpha) - E[Y \mid \bar a, c] (p(c) - \beta) - E[Y \mid \bar a, \bar c] (p(\bar c) + \beta)\\
&= RD_{true} + \alpha (E[Y \mid a, c] - E[Y \mid a, \bar c]) - \beta (E[Y \mid \bar a, \bar c] - E[Y \mid \bar a, c]) \qquad (26)
\end{aligned}$$
which implies the desired results because $\alpha \geq \beta \geq 0$. $\square$
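Two steps of the argument above lend themselves to a quick numerical check: the sigmoid inequality $\sigma(-u + v) - \sigma(-u) \geq \sigma(u + v) - \sigma(u)$ for $u, v \geq 0$, and the fact that decomposition (26) is an algebraic identity once $\alpha$ and $\beta$ are defined as in the proof. The sketch below (ours; the parameter values are arbitrary, with 1 encoding the unbarred values) checks both; it does not attempt to verify the relation between $\alpha$ and $\beta$, which depends on the theorem's hypotheses:

```python
import math

# (a) Sigmoid inequality used in the proof of alpha >= beta, checked on a grid.
sigma = lambda z: 1.0 / (1.0 + math.exp(-z))

for i in range(51):
    for j in range(51):
        u, v = 0.1 * i, 0.1 * j
        assert sigma(-u + v) - sigma(-u) >= sigma(u + v) - sigma(u) - 1e-12

# (b) Decomposition (26) as an algebraic identity; it holds for any
# parameterization, and the numbers below are an arbitrary choice.
pc = {1: 0.3, 0: 0.7}                  # p(c), p(c-bar)
p_a_given_c = {1: 0.9, 0: 0.6}         # p(a|c), p(a|c-bar)
EY = {(1, 1): 0.8, (1, 0): 0.1, (0, 1): 0.3, (0, 0): 0.5}  # E[Y|A,C]

def p_c_given_a(c, a):
    w = lambda cc: pc[cc] * (p_a_given_c[cc] if a else 1 - p_a_given_c[cc])
    return w(c) / (w(0) + w(1))

alpha = p_c_given_a(1, 1) - pc[1]      # p(c|a) - p(c)
beta = p_c_given_a(0, 0) - pc[0]       # p(c-bar|a-bar) - p(c-bar)

RD_true = sum((EY[(1, c)] - EY[(0, c)]) * pc[c] for c in (0, 1))
RD_crude = sum(EY[(1, c)] * p_c_given_a(c, 1) - EY[(0, c)] * p_c_given_a(c, 0)
               for c in (0, 1))

rhs = RD_true + alpha * (EY[(1, 1)] - EY[(1, 0)]) - beta * (EY[(0, 0)] - EY[(0, 1)])
assert abs(RD_crude - rhs) < 1e-12     # decomposition (26) holds exactly
```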
Theorem 12. Consider the causal graph to the left in Figure 1. Let $p(c) \geq 0.5$ and $p(a \mid c) \leq p(a \mid \bar c) \leq 0.5$. If $E[Y \mid a, c] - E[Y \mid a, \bar c] \geq E[Y \mid \bar a, \bar c] - E[Y \mid \bar a, c] \geq 0$, then $RD_{crude} \leq RD_{true}$. If $E[Y \mid a, c] - E[Y \mid a, \bar c] \leq E[Y \mid \bar a, \bar c] - E[Y \mid \bar a, c] \leq 0$, then $RD_{crude} \geq RD_{true}$.

Proof. Similar to the proof of Theorem 11. Specifically, the assumption that $p(a \mid c) \leq p(a \mid \bar c) \leq 0.5$ implies that $p(c \mid a) \leq p(c)$ and $p(\bar c \mid \bar a) \leq p(\bar c)$, which implies that $p(c \mid a) = p(c) + \alpha$ and $p(\bar c \mid \bar a) = p(\bar c) + \beta$ with $\alpha \leq \beta \leq 0$, because now $\sigma(-u + v) - \sigma(-u) \leq \sigma(u + v) - \sigma(u)$ for all $u \leq 0$ and $v \leq 0$. $\square$
Theorem 13. Consider the path diagram to the right in Figure 1. Assume that the variables are standardized. If $\mathrm{sign}(\beta) = \mathrm{sign}(\gamma)$, then $\beta_{YA \cdot C} \leq \beta_{YA \cdot D} \leq \beta_{YA}$, else $\beta_{YA \cdot C} \geq \beta_{YA \cdot D} \geq \beta_{YA}$.

Proof. Pearl (2013, Section 3.11) shows that
$$\beta_{YA \cdot C} = \alpha, \qquad \beta_{YA} = \alpha + \beta \gamma \qquad \text{and} \qquad \beta_{YA \cdot D} = \alpha + \frac{\gamma \beta (1 - \delta^2)}{1 - \beta^2 \delta^2}. \qquad (27)$$
Note that the linear structural equation model corresponding to the path diagram under consideration implies that $A = \beta C + \epsilon_A$, where $\epsilon_A$ is an error term that is independent of $C$ and, thus, $var(A) = \beta^2 var(C) + var(\epsilon_A)$ where $var(A) = var(C) = 1$ and, thus, $\beta^2 \leq 1$. Similarly, $\delta^2 \leq 1$. Then, $1 - \delta^2 \leq 1 - \beta^2 \delta^2$ and, thus, the factor $(1 - \delta^2)/(1 - \beta^2 \delta^2)$ in Equation 27 lies in $[0, 1]$. The result is now immediate. $\square$
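Equation 27 can be reproduced from the covariance matrix implied by the standardized linear structural equation model, using the standard formula for a partial regression coefficient with two standardized regressors. The sketch below (ours; the path-coefficient values are an arbitrary choice with $|\beta|, |\delta| < 1$, and the Latin names a, b, g, d stand in for $\alpha$, $\beta$, $\gamma$, $\delta$) confirms the three expressions and the in-betweenness claimed by Theorem 13 when $\mathrm{sign}(\beta) = \mathrm{sign}(\gamma)$:

```python
# Check of Equation 27 from the covariances implied by the standardized SEM
#   A = b*C + eA,   D = d*C + eD,   Y = a*A + g*C + eY.
# The numeric values are arbitrary, with sign(b) = sign(g).

a, b, g, d = 0.7, 0.5, 0.4, 0.6

# Pairwise covariances (= correlations) of the standardized variables.
cov_AC = b
cov_AD = b * d
cov_YA = a + g * b
cov_YC = a * b + g
cov_YD = a * b * d + g * d

def partial_coef_on_first(s_y1, s_y2, s_12):
    # Coefficient of X1 in the OLS regression of Y on standardized X1 and X2.
    return (s_y1 - s_y2 * s_12) / (1 - s_12 ** 2)

beta_YA = cov_YA                                       # regression of Y on A alone
beta_YA_C = partial_coef_on_first(cov_YA, cov_YC, cov_AC)
beta_YA_D = partial_coef_on_first(cov_YA, cov_YD, cov_AD)

assert abs(beta_YA_C - a) < 1e-12                      # = alpha
assert abs(beta_YA - (a + b * g)) < 1e-12              # = alpha + beta*gamma
assert abs(beta_YA_D - (a + g * b * (1 - d**2) / (1 - b**2 * d**2))) < 1e-12
# With sign(beta) = sign(gamma), the partially adjusted coefficient lies between:
assert beta_YA_C <= beta_YA_D <= beta_YA
```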
References

C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

S. Greenland. The Effect of Misclassification in the Presence of Covariates. American Journal of Epidemiology, 112(4):564–569, 1980.

W. Miao, Z. Geng, and E. J. Tchetgen Tchetgen. Identifying Causal Effects with Proxy Variables of an Unmeasured Confounder. Biometrika, 105(4):987–993, 2018.

E. L. Ogburn and T. J. VanderWeele. On the Nondifferential Misclassification of a Binary Confounder. Epidemiology, 23(3):433–439, 2012.

E. L. Ogburn and T. J. VanderWeele. Bias Attenuation Results for Nondifferentially Mismeasured Ordinal and Coarsened Confounders. Biometrika, 100(1):241–248, 2013.

J. M. Peña. On the Monotonicity of a Nondifferentially Mismeasured Binary Confounder. Journal of Causal Inference, 8:150–163, 2020.

J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2009.

J. Pearl. Linear Models: A Useful "Microscope" for Causal Analysis. Journal of Causal Inference, 1(1):155–170, 2013.