Comment: Reflections on the Deconfounder
Alexander D'Amour, Google Research, Cambridge, MA ([email protected])
I would like to congratulate the authors on their illuminating article, and thank the editors for the opportunity to discuss the paper. The deconfounder method that this article presents is appealing: a number of important scientific investigations and high-stakes decisions fit into its template. Indeed, as the authors note, instances of the deconfounder have already been deployed without explicit causal language in a number of applied settings. By bringing to light the implicit causal argument that underlies this approach, the authors have sparked an important conversation with potentially far-reaching consequences. It is thus important to carefully outline when we expect the deconfounder method to succeed in characterizing causal relationships and when we expect it to fail.

I have personally been in conversation with the authors over the past two years about this work, and this discussion has yielded some interesting insights, some of which have been published (D'Amour, 2019), and some of which now appear in the current version of the article and in follow-up work (Wang and Blei, 2019). The aim of this note is to draw out some conclusions from this conversation about the role that the deconfounder can play in practical causal inference. In particular, I will make three points here. First, in my role as the critic in this conversation, I will summarize some arguments about the lack of causal identification in the bulk of settings where the "informal" message of the paper suggests that the deconfounder could be used. This is a point that is discussed at length in D'Amour (2019), which motivated the results concerning causal identification in Theorems 6-8. Second, I will argue that adding parametric assumptions to the working model in order to obtain identification of causal parameters (a strategy followed in Theorem 6 and in the experimental examples) is a risky strategy, and should only be done when extremely strong prior information is available. Finally, I will consider the implications of the nonparametric identification results provided for a narrow, but non-trivial, set of causal estimands in Theorems 7 and 8. I will highlight that these results may be even more interesting from the perspective of detecting causal identification from observed data, under relatively weak assumptions about confounders.

Throughout this note, I will draw connections to sensitivity analysis methods that probe the implications of unobserved confounding. This is a natural lens through which to study the deconfounder because many sensitivity analysis methods posit a latent variable model similar to the one that the deconfounder deploys as a working model (see, e.g., Rosenbaum and Rubin, 1983). Well-designed sensitivity analyses can reveal how specific assumptions restrict the range of causal conclusions that are compatible with the observed data, and are thus useful for understanding what is lost when assumptions like "no unobserved confounders" are relaxed to "no unobserved single-cause confounders." Thus, I believe, as the authors suggest, that sensitivity analysis should be a core part of any workflow that deploys the deconfounder, and I discuss at various places how sensitivity analysis could be used effectively in this setting.
Preliminaries
Following the paper, I will denote causes as A := (A^(1), ..., A^(m)), taking specific values a = (a^(1), ..., a^(m)); potential outcomes as Y(a); and latent confounders as Z. To avoid measure-theoretic considerations when writing conditioning statements, I will consider the treatments A^(k) to be discrete. I will write observed outcomes as Y^obs, where, under the stable unit treatment value assumption (SUTVA), Y^obs = Y(A).

Throughout, I will consider models of the joint distribution P(A, Y^obs, Z), which I will refer to as latent variable models. I will assume that unconfoundedness is satisfied conditional on Z:

    Y(a) ⊥⊥ A | Z,  Z-a.e., ∀ a.

Thus, if the latent variable model is fully specified, the potential outcome distributions P(Y(a)) are also specified by the following adjustment formula, which "adjusts" for the confounder Z:

    P(Y(a)) = E[ P(Y^obs | Z, A = a) ]  ∀ a.    (1)

I will refer to the integrand in (1), P(Y^obs | Z, A = a), as the outcome model. If the confounder Z is observed, and the overlap condition is satisfied, then P(Y(a)) is identified from observed data. The question at hand is whether P(Y(a)) can be identified when Z is unobserved.
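To fix ideas, here is a minimal numerical sketch of the adjustment formula (1) in a toy discrete model with a single binary cause, a binary confounder, and a binary outcome. All probability tables are illustrative assumptions, not quantities from the paper; the point is only that the adjusted distribution P(Y(a)) differs from the naive conditional P(Y^obs | A = a) when Z confounds A.

```python
import numpy as np

# Toy ingredients (illustrative assumptions): a binary confounder Z,
# a binary cause A, and a binary outcome Y.
p_z = np.array([0.5, 0.5])                # P(Z = z)
p_a_given_z = np.array([[0.8, 0.2],       # P(A = a | Z = z); rows index z
                        [0.3, 0.7]])
p_y1_given_za = np.array([[0.2, 0.4],     # P(Y = 1 | Z = z, A = a)
                          [0.5, 0.7]])

# Adjustment formula (1): P(Y(a) = 1) = sum_z P(Y = 1 | Z = z, A = a) P(Z = z).
p_y1_do = p_y1_given_za.T @ p_z           # indexed by a

# Naive conditional: P(Y = 1 | A = a), which reweights by P(Z = z | A = a).
p_za = p_a_given_z * p_z[:, None]         # joint P(Z = z, A = a)
p_z_given_a = p_za / p_za.sum(axis=0)
p_y1_given_a = (p_y1_given_za * p_z_given_a).sum(axis=0)

print("adjusted P(Y(a)=1):", p_y1_do)        # [0.35, 0.55]
print("naive P(Y=1 | A=a):", p_y1_given_a)   # differs under confounding
```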
Fundamental Limitations of the Deconfounder Approach

I will begin by summarizing the argument in D'Amour (2019) critiquing the "informal" message about the deconfounder approach (stated most explicitly in the informal statement of Theorem 6 and in Section 3.4). Specifically, this message asserts that, under the "no unobserved single-cause confounders" assumption, any well-fitting latent variable model P(Y^obs, A, Z) will yield the correct potential outcome distribution P(Y(a)) via the adjustment formula (1). This informal story is motivated by strong intuition. Lemmas 1-3 establish that multi-cause confounding leaves an observable "imprint" of dependence between the causes A. Thus, it seems natural that we might be able to gain some information about, and even adjust for, an unobserved multi-cause confounder Z by modeling the dependence between the causes A.

Unfortunately, this intuition can only be carried so far: while a factor model for the causes A can recover information about multi-cause confounders from observed data, the potential outcome distributions P(Y(a)) are not non-parametrically identified, except in cases where all confounding is observed. Thus, without additional unverifiable assumptions, no method can recover the distributions P(Y(a)) when there is unobserved confounding. In this section, I briefly demonstrate why this is the case. For a more in-depth argument about lack of identification in this setting, with concrete examples, see D'Amour (2019).

As I show formally below, the key difficulty is that the causes A cannot be used simultaneously as measurements of the unobserved confounder Z and as treatments whose effects are being estimated. If the event A = a provides only a noisy measurement of Z, there is ambiguity in how the outcome model P(Y^obs | Z, A = a) should align the variability in the residual distributions P(Y^obs | A = a) and P(Z | A = a); there are many specifications of the residual dependence between Y^obs and Z that are compatible with the observed data. This is a classic problem that arises when confounders are measured with error (see, e.g., Ogburn and Vanderweele, 2012). On the other hand, if the event A = a provides a perfect measurement of Z, in the sense that there is some function ẑ such that Z = ẑ(A), then the overlap condition fails. In this case, P(Y^obs | Z, A = a) is only identified at Z = ẑ(a), because the event Z ≠ ẑ(a) has zero probability given A = a in the observed data.

Let us now make this argument formal. To do so, we will account for how the two deconfounder assumptions of (a) good model fit and (b) "no unobserved single-cause confounders" constrain the factor model and its implications for the potential outcomes P(Y(a)). This accounting is convenient if we rewrite the joint distribution using copula densities c(V, W) = P(V, W) / (P(V) P(W)), which characterize the dependence between random variables independently of their marginal distributions:

    P(Y^obs, A, Z) = P(A, Y^obs) · P(Z) c(Z, A) · c(Y^obs, Z | A).    (2)
                     [observed]    [factor model]  [outcome copula]

Each factor in this decomposition corresponds to a different assumption. The requirement of good model fit constrains only the first term, which specifies the distribution of observable quantities, while the "no unobserved single-cause confounders" assumption constrains the second term by constraining the causes to be conditionally independent given Z (Lemma 2).
This leaves the outcome-confounder copula density c(Y^obs, Z | A) = P(Y^obs, Z | A) / (P(Y^obs | A) P(Z | A)) unconstrained. This copula specifies the residual dependence between Y^obs and Z after conditioning on the causes A, and plays a key role in specifying the outcome model P(Y^obs | A, Z).

To complete the argument, note that the potential outcome distributions P(Y(a)) implied by the latent variable model are sensitive to the specification of this copula. Specifically, the estimand in (1) can be written as

    P(Y(a)) = ∫ P(Y^obs | A = a) c(Y^obs, Z | A = a) dP(Z).

Plugging in different specifications of the copula here yields different conclusions about P(Y(a)). Whenever P(Y(a)) ≠ P(Y^obs | A = a), there are multiple specifications of the copula that yield different conclusions about the potential outcomes. (To see this, note that the independence copula c(Y^obs, Z | A = a) = 1 implies that P(Y(a)) = P(Y^obs | A = a); thus, because P(Y(a)) ≠ P(Y^obs | A = a), this copula and the true copula yield different conclusions about P(Y(a)).) Thus, P(Y(a)) is not identified unless there is no confounding and P(Y(a)) = P(Y^obs | A = a).

We can now revisit the tension between the roles of the causes A as measurements of Z and as treatments. In cases where Z can only be inferred inexactly (i.e., P(Z | A = a) is non-degenerate), the marginals P(Y^obs | A = a) and P(Z | A = a) put some constraints on the outcome model P(Y^obs | Z, A = a), but the ambiguity in the copula implies that this model is not identified for any value of Z. In cases where Z can be reconstructed deterministically from the causes by some function ẑ(a) (i.e., P(Z | A = a) is degenerate), the outcome model P(Y^obs | Z, A = a) is identified only when Z = ẑ(a), and the copula is undefined whenever Z ≠ ẑ(a) because this event has zero probability.

The upshot of this argument is that neither the deconfounder nor any other estimation method can adjust for unobserved confounding when estimating P(Y(a)) under the "no unobserved single-cause confounders" assumption alone, even when the confounder Z can be recovered exactly from the causes A. (Indeed, the "no unobserved single-cause confounders" assumption does not uniquely identify the factor model by itself. Some structure also needs to be put on the latent variable, and even then, the factor model may not be identified; see D'Amour (2019) for an example where the factor model P(A, Z) is itself not identified.) Although the single-cause confounding assumption does put some non-trivial structure on the latent variable model, it is not enough for causal estimation.

This lack of identification leaves practitioners looking to apply the deconfounder with two options: either make additional assumptions about the latent variable model P(Y^obs, A, Z) so that P(Y(a)) is identified, or seek out causal comparisons where all of the confounding is effectively observed. In the Theory section of the paper, the authors consider both of these paths, and I will discuss each in turn.

I now turn to the subject of parametric identification of causal parameters, and offer some cautions about employing this strategy. Parametric identification is a natural strategy when the causal parameters of interest are not non-parametrically identified. One obtains parametric identification by adding parametric assumptions to the working model that constrain the implied potential outcome distributions P(Y(a)) to be unique.
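Before turning to the parametric route, a small numerical sketch makes the copula argument concrete. A single binary cause stands in for the vector of causes (the multi-cause structure is not needed for this point); the factor model and observed margins below are illustrative assumptions held fixed, and only the unidentified coupling between Y^obs and Z given A is varied.

```python
import numpy as np

# Factor model (fixed, illustrative): P(Z = 1) and P(A = 1 | Z = z).
p_z1 = 0.5
p_a1_given_z = np.array([0.2, 0.8])
p_z = np.array([1 - p_z1, p_z1])

# Observed outcome margins (fixed, illustrative): P(Y = 1 | A = a).
p_y1_given_a = np.array([0.3, 0.6])

# P(Z = 1 | A = a) from Bayes' rule.
p_a1 = p_a1_given_z @ p_z
p_a = np.array([1 - p_a1, p_a1])
p_z1_given_a = p_a1_given_z * p_z1 / p_a

def adjusted(coupling):
    """P(Y(a) = 1) = sum_z P(Y=1 | Z=z, A=a) P(Z=z), for a = 0, 1."""
    out = []
    for a in (0, 1):
        py1, pz1 = p_y1_given_a[a], p_z1_given_a[a]
        joint11 = coupling(py1, pz1)           # P(Y=1, Z=1 | A=a)
        # Outcome model implied by this coupling:
        py1_z1 = joint11 / pz1
        py1_z0 = (py1 - joint11) / (1 - pz1)
        out.append(py1_z0 * p_z[0] + py1_z1 * p_z[1])
    return np.array(out)

independence = lambda py1, pz1: py1 * pz1      # copula c = 1
comonotone   = lambda py1, pz1: min(py1, pz1)  # maximal positive dependence

print("P(Y(a)=1), independence copula:", adjusted(independence))  # [0.3, 0.6]
print("P(Y(a)=1), comonotone copula:  ", adjusted(comonotone))    # [0.5625, 0.375]
```

Both specifications reproduce the observed data and the factor model exactly, yet they imply different, even directionally opposite, causal conclusions; an identifying parametric assumption amounts to silently selecting one such coupling.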
The authors employ this parametric identification strategy in the experimental demonstrations of the deconfounder, as well as in the formal result in Theorem 6. In Theorem 6, the copula c(Y^obs, Z | A) is restricted by assuming that there is no interaction between the causes A and the latent variable Z in the outcome model (i.e., that they combine linearly), and by assuming that the confounder is piecewise constant in A. In the paper's experiments, the authors assume a parametric factor model (e.g., a quadratic factor model for the genome-wide association study simulation) and a true linear outcome model. In the cases of Theorem 6 and the GWAS simulation study, the authors prove that these parametric assumptions are sufficient for identification.

Parametric identification can be a risky strategy to employ in practice. Specifically, the fact that the parametric assumptions are necessary to identify causal parameters implies that some aspects of these assumptions are not testable in the observed data. The decomposition in (2) makes this clear: given that the observed data are insufficient to identify the causal parameters, the parametric assumptions must restrict some of the unidentified portions of the latent variable model. Thus, to have confidence in this approach, one needs to have confidence in the parametric model used to identify causal effects as a true model of the world, not merely as an acceptable description of the observed data. This is because the identifying parametric assumptions specify not only a descriptive model of the observed data, but also a structural model for unobserved counterfactual outcomes. Relying on parametric identification may be feasible in cases where one has strong prior knowledge (e.g., about the quantity represented by the unmeasured confounder, or about the specific distributions of measurement errors), but such knowledge is often unavailable.

In addition, uncertainty estimates that are based directly on the parametric specification, e.g., Bayesian credible sets, do not capture the full extent of uncertainty about causal effects that remains given the data. Specifically, these uncertainty estimates only quantify uncertainty within the specified model, and do not include the fundamental uncertainty associated with the lack of non-parametric identification of the potential outcome distributions P(Y(a)). As a result, unless the prior information used to specify the parametric assumptions is very strong, these uncertainty estimates will understate the degree of uncertainty about a causal parameter estimate. This is a standard critique of parametric uncertainty quantification, but it carries extra weight in a context where conclusions depend on untestable aspects of the parametric model. For example, for the parametrically identified latent variable model in the GWAS example, as the sample size grows, the posterior for the causal parameter will concentrate around a single value, even though there exists a range of outcome models, corresponding to different copulas c(Y^obs, Z | A = a), that are equally compatible with the observed data but would concentrate on different causal parameters. In fact, even small, seemingly benign parametric choices can mask alternative causal explanations. Lessons from latent variable models in the missing data and causal inference literatures are instructive here.
For example, analyses of the widely-used Heckman selection model (Heckman, 1979) have noted that the tail thickness of priors on latent variables can induce starkly different conclusions that are hidden by using the Gaussian default (Little and Rubin, 2015; Ding, 2014). See also the discussions in Robins et al. (2000) and Linero and Daniels (2017) for other examples.

Here, sensitivity analysis can be a useful tool to account for the fundamental uncertainty due to non-identification of the causal estimand. When performed with parametric models, sensitivity analyses perturb the parametric assumptions made in the estimating model in order to understand what other causal conclusions could be obtained under different parametric specifications. Performing sensitivity analyses on deconfounder estimates is straightforward: a number of sensitivity analysis approaches employ a working model with the same latent variable structure (e.g., Rosenbaum and Rubin, 1983; Imbens, 2003; Dorie et al., 2016; Cinelli and Hazlett, 2018). However, sensitivity analyses can also fall victim to spurious parametric identification if the perturbations are not appropriately parameterized (Gustafson et al., 2018). To avoid this issue, it can be useful to employ sensitivity analysis strategies that cleanly separate the portions of the model that are identified by the observed data from those that are identified only by parametric assumptions (Franks et al., 2019; Robins et al., 2000; Linero and Daniels, 2017). In the context of the deconfounder, the decomposition in (2) is a promising place to start, and is the subject of current work.
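As one minimal illustration of such a separation, the sketch below treats the coupling between Y^obs and Z as an explicit sensitivity parameter and sweeps it over a one-parameter family, holding the identified observed-data distribution and factor model fixed. The binary setup and its numbers continue the illustrative example above; the Fréchet-style interpolation is one crude choice of parameterization among many.

```python
import numpy as np

# Sensitivity sweep over the unidentified copula in (2), in the same toy
# binary setup as above: lam = 0 is the independence copula, lam = 1 is
# maximal positive dependence. All numbers are illustrative assumptions.
p_z = np.array([0.5, 0.5])                  # P(Z = z)
p_z1_given_a = np.array([0.2, 0.8])         # P(Z = 1 | A = a), from factor model
p_y1_given_a = np.array([0.3, 0.6])         # observed P(Y = 1 | A = a)

def p_y_do(a, lam):
    py1, pz1 = p_y1_given_a[a], p_z1_given_a[a]
    # Interpolate between independence and comonotone couplings:
    joint11 = (1 - lam) * py1 * pz1 + lam * min(py1, pz1)  # P(Y=1, Z=1 | A=a)
    py1_z = np.array([(py1 - joint11) / (1 - pz1), joint11 / pz1])
    return py1_z @ p_z                      # adjustment formula (1)

effects = [p_y_do(1, lam) - p_y_do(0, lam) for lam in np.linspace(0, 1, 21)]
print(f"implied effect ranges over [{min(effects):.3f}, {max(effects):.3f}]")
```

The resulting sensitivity region, rather than a single point estimate, is the honest summary of what the observed data alone support.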
Toward a More Selective Deconfounder Workflow

A more cautious alternative to pursuing parametric identification is to seek out causal questions that have definitive answers under the "no unobserved single-cause confounders" assumption. The authors take this path in Theorems 7 and 8, in a setting where the latent confounder Z can be deterministically reconstructed as a function of the causes, ẑ(A). Here, however, the factor model seems less interesting as a tool for calculating causal effects, and more interesting as a tool for establishing empirically when no unobserved confounding is present. In my opinion, this is the more interesting thread to follow.

To review, in Theorem 7 the authors consider partitioning the causes into a set of focal causes A_{1:k}, whose effects will be estimated, and a set of auxiliary causes A_{k+1:m}, which will serve as measurements of the latent confounder. The theorem then states that if the latent confounder can be written as a function of the auxiliary causes alone, Z = ẑ(A_{k+1:m}), then the distributions of potential outcomes defined with respect to the subset of focal causes, P(Y(a_{1:k})), are identifiable subject to an overlap condition. (This is not quite how the theorem is stated, but this functional restriction is implied by the theorem's overlap condition.) Meanwhile, Theorem 8 states that certain counterfactual potential outcome distributions of the form P(Y(a) | A = a') are identifiable as long as the causes a and a' map to the same value of the latent confounder, i.e., ẑ(a) = ẑ(a').

In these results, the authors focus on the role of the factor model in the identification of causal estimands under the "no unobserved single-cause confounders" assumption. However, the factor model is not essential for this point. Note that Theorems 7 and 8 both imply that the causal parameters can be identified in terms of the causes A alone, because it is assumed that the confounder Z can be written as a function of A. Written with slightly more generality, the identification result in Theorem 7 implies

    P(Y(a_{1:k})) = E[ P(Y^obs | A_{1:k} = a_{1:k}, A_{k+1:m}) ],    (3)

while the identification result in Theorem 8 implies

    P(Y(a') | A = a) = P(Y^obs | A = a')  ∀ (a, a') s.t. ẑ(a) = ẑ(a').    (4)

To me, the more interesting point is that the factor model can be used in some cases to determine empirically whether some of the assumptions of the theorems are met. For example, the setting of Theorem 7 can be framed as a problem where the unobserved confounder Z is measured with proxies A_{k+1:m}. It is well understood that in the limit where Z is perfectly recovered by the proxies, the potential outcome distribution P(Y(a_{1:k})) is identified (Ogburn and Vanderweele, 2012); however, in single-cause problems, one cannot determine whether this condition has been met. Similarly, Theorem 8 can be framed as a setting where one imputes a set of counterfactual outcomes within a subpopulation where there is no confounding because, within this subpopulation, the confounder is fixed. Here, too, in single-cause problems, one cannot definitively identify such subpopulations from observed data.
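To make the estimator implied by (3) concrete, the following simulation sketches a toy instance of the Theorem 7 setting with one focal cause and one auxiliary cause, where the confounder coincides with the auxiliary cause, so that ẑ is recovered trivially. The data-generating numbers are assumptions chosen for illustration.

```python
import numpy as np
rng = np.random.default_rng(0)

# Toy data consistent with the Theorem 7 setting: the confounder Z is an
# exact function of the auxiliary cause (here, Z = A2), so adjusting for
# the auxiliary cause as in (3) recovers the focal effect. Illustrative only.
n = 200_000
z = rng.binomial(1, 0.5, n)
a1 = rng.binomial(1, 0.2 + 0.6 * z)          # focal cause, confounded by Z
a2 = z.copy()                                # auxiliary cause; zhat(a2) = a2
y = rng.binomial(1, 0.1 + 0.3 * a1 + 0.4 * z)

def adjusted_mean(a1_val):
    # Empirical version of (3): E_{A2}[ E[Y | A1 = a1_val, A2] ].
    out = 0.0
    for a2_val in (0, 1):
        cell = (a1 == a1_val) & (a2 == a2_val)
        out += y[cell].mean() * (a2 == a2_val).mean()
    return out

naive = y[a1 == 1].mean() - y[a1 == 0].mean()
adj = adjusted_mean(1) - adjusted_mean(0)
print(f"naive contrast: {naive:.3f}; adjusted via (3): {adj:.3f}; truth: 0.300")
```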
Interestingly, the theory of multi-cause confounding presented in the paper suggests that these assumptions can be empirically validated under some restrictions on the causal DAG relating A to Y^obs, together with the "no unobserved single-cause confounders" assumption. For example, this theory supports the following proposition.
Proposition 1. Suppose there are no single-cause confounders, and that the structural relationships between the causes A, the latent confounder Z, and the observed outcome Y^obs can be represented by the DAG in Figure 1. Suppose that, in addition to the causes A, we also observe auxiliary covariates X that are conditionally independent of the causes A given the multi-cause confounder Z. Then for any function ẑ(A, X) such that the causes A are mutually independent conditional on ẑ(A, X), the conditional independence

    A ⊥⊥ Y(a) | ẑ(A, X)

also holds for each a.

[Figure 1: DAG assumed in Proposition 1, representing the relationship between the causes A, the latent confounder Z, the covariates X, and the observed outcome Y^obs.]

Theorems 7 and 8 can be written as consequences of this proposition. The proposition is potentially useful because it shows that the absence of certain confounding structures has observable implications. This insight is closely related to the literature on negative controls (see, e.g., Lipsitch et al., 2010).

This result suggests that one can use a workflow similar to the deconfounder's to determine, at least in principle, whether identification statements like (3) or (4) are valid in a given setting. Specifically, one can obtain a function ẑ(A, X) (perhaps by fitting a factor model), then test whether the causes A appear to be mutually independent conditional on ẑ(A, X). If one is satisfied that this is true, (3) or (4) can be applied. Importantly, this procedure is truly agnostic to the parametric specification of the model used to obtain ẑ(A, X): all of the conditions are functions of observables alone.

While this procedure resembles the deconfounder workflow, it has a different use case. Instead of enabling causal inference in a wide range of cases, it would be used to determine whether one can proceed with unconfounded inference at all, and it can potentially give "no" as an answer. Still, this sort of procedure can prove useful in complex data contexts, where it can be valuable to surface causal questions that can be adequately answered with the available data. In a specific example of this approach, Sharma et al. (2018) propose a similar testing procedure to uncover unconfounded comparisons, and use it to evaluate the causal effect of a recommender system on purchasing rates for certain products.

In outlining this procedure, I have belabored the point that it is a workflow "in principle" because it could prove tricky to implement. The observable implication that needs to be tested is a complex conditional independence statement, and such statements are notoriously difficult to test in practice (Shah and Peters, 2018). In particular, one would receive the "green light" to estimate a causal parameter by failing to reject the null of conditional independence, which can only be relied upon if the test has acceptably high power; designing such tests is difficult, and in some settings impossible.
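To illustrate the shape of this "in principle" workflow, the sketch below obtains a crude ẑ(A) from a first principal component (standing in for a fitted factor model; no covariates X are used here) and then checks a necessary implication of Proposition 1: the causes should be nearly uncorrelated within strata of ẑ. This is a diagnostic, not a formal conditional independence test, and passing it is necessary but not sufficient.

```python
import numpy as np
from itertools import combinations
rng = np.random.default_rng(1)

# Simulated causes sharing a single latent confounder Z (illustrative).
n, m = 50_000, 5
z = rng.normal(size=n)
A = (z[:, None] + rng.normal(size=(n, m)) > 0).astype(float)

# Step 1: obtain zhat(A). A first principal component stands in for a
# fitted factor model here; any substitute confounder estimate would do.
A_c = A - A.mean(axis=0)
zhat = A_c @ np.linalg.svd(A_c, full_matrices=False)[2][0]

# Step 2: probe the necessary implication -- residual pairwise dependence
# between causes within strata of zhat should shrink markedly relative to
# the marginal dependence. Residual correlation here reflects both the
# coarseness of this zhat and any genuine violations.
def max_pairwise_corr(X):
    C = np.corrcoef(X, rowvar=False)
    return max(abs(C[j, k]) for j, k in combinations(range(m), 2))

strata = np.digitize(zhat, np.quantile(zhat, [0.2, 0.4, 0.6, 0.8]))
within = max(
    max_pairwise_corr(A[strata == s]) for s in range(5)
    if A[strata == s].std(axis=0).min() > 0
)
print(f"marginal max |corr| between causes: {max_pairwise_corr(A):.3f}")
print(f"within-stratum max |corr|:          {within:.3f}")
```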
Here, it can again be helpful to turn back to sensitivity analysis. Instead of attempting to rule out all possible forms of dependence between the causes A conditional on ẑ(A, X), a sensitivity analysis could explore a number of candidate models for the residual dependence between the causes and relate these models to the confounding induced by the unobserved confounder Z. For example, one could examine the range of causal effects that would be compatible with the assumption that, conditional on ẑ(A, X), the causes A are no more predictive of a potential outcome Y(a) than any leave-one-out set of the causes A_{-k} is of a held-out cause A^(k). This sort of calibration argument is common in more standard sensitivity analyses (Imbens, 2003; Dorie et al., 2016; Franks et al., 2019; Cinelli and Hazlett, 2018). In cases where dependence between the causes can be ruled out conclusively, this approach would yield a sensitivity region that collapses to a point; in the more likely case where many dependences cannot be ruled out, this approach would represent that uncertainty with a wider sensitivity region. It should be noted that constructing a plausible sensitivity analysis of this type would require deep domain knowledge to justify the analogy between the different dependences between variables. Negative control methods and related identification strategies (Lipsitch et al., 2010; Miao et al., 2018) could be framed as particularly successful executions of this type of argument.

In writing this paper, the authors have drawn attention to a problem that is simultaneously scientifically important, methodologically interesting, and conceptually subtle. Although I have taken on the role of critic in our conversations, I believe their contribution here is important. I remain skeptical about the deconfounder as a method for causal point estimation, but I believe that the authors' characterization of multi-cause confounding could yield fruitful developments in sensitivity analysis, and potentially in obtaining identification results in more complex settings. This work has certainly inspired me to pay more attention to this problem, and to consider how new methods and tools can be developed to help practitioners draw principled causal conclusions in this setting.
References
Carlos Cinelli and Chad Hazlett. Making sense of sensitivity: Extending omitted variable bias. Technical report, Working Paper, 2018.

Peng Ding. Bayesian robust inference of sample selection using selection-t models. Journal of Multivariate Analysis, 124:451-464, 2014.

Vincent Dorie, Masataka Harada, Nicole Bohme Carnegie, and Jennifer Hill. A flexible, interpretable framework for assessing sensitivity to unmeasured confounding. Statistics in Medicine, 35(20):3453-3470, 2016.

Alexander D'Amour. On multi-cause causal inference with unobserved confounding: Counterexamples, impossibility, and alternatives. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 3478-3486, 2019.

Alex Franks, Alex D'Amour, and Avi Feller. Flexible sensitivity analysis for observational studies without observable implications. Journal of the American Statistical Association, 2019.

Paul Gustafson and Lawrence C McCandless. When is a sensitivity parameter exactly that? Statistical Science, 33(1):86-95, 2018.

James J Heckman. Sample selection bias as a specification error. Econometrica, 47(1):153-161, 1979.

Guido W Imbens. Sensitivity to exogeneity assumptions in program evaluation. American Economic Review, 93(2):126-132, 2003.

Antonio R Linero and Michael J Daniels. Bayesian approaches for missing not at random outcome data: The role of identifying restrictions. 2017.

Marc Lipsitch, Eric Tchetgen Tchetgen, and Ted Cohen. Negative controls: A tool for detecting confounding and bias in observational studies. Epidemiology, 21(3):383, 2010.

Roderick JA Little and Donald B Rubin. Statistical Analysis with Missing Data. John Wiley & Sons, 2015.

Wang Miao, Zhi Geng, and Eric J Tchetgen Tchetgen. Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika, 105(4):987-993, 2018.

Elizabeth L Ogburn and Tyler J Vanderweele. Bias attenuation results for nondifferentially mismeasured ordinal and coarsened confounders. Biometrika, 100(1):241-248, 2012.

James M Robins, Andrea Rotnitzky, and Daniel O Scharfstein. Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In Statistical Models in Epidemiology, the Environment, and Clinical Trials, pages 1-94. Springer, 2000.

Paul R Rosenbaum and Donald B Rubin. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society: Series B (Methodological), 45(2):212-218, 1983.

Rajen D Shah and Jonas Peters. The hardness of conditional independence testing and the generalised covariance measure. arXiv preprint arXiv:1804.07203, 2018.

Amit Sharma, Jake M Hofman, and Duncan J Watts. Split-door criterion: Identification of causal effects through auxiliary outcomes. The Annals of Applied Statistics, 12(4):2699-2733, 2018.

Yixin Wang and David M Blei. Multiple causes: A causal graphical view. arXiv preprint arXiv:1905.12793, 2019.