A Selective Review of Negative Control Methods in Epidemiology

Abstract

Purpose of Review: Negative controls are a powerful tool to detect and adjust for bias in epidemiological research. This paper introduces negative controls to a broader audience and provides guidance on principled design and causal analysis based on a formal negative control framework. Recent Findings: We review and summarize causal and statistical assumptions, practical strategies, and validation criteria that can be combined with subject matter knowledge to perform negative control analyses. We also review existing statistical methodologies for detection, reduction, and correction of confounding bias, and briefly discuss recent advances towards nonparametric identification of causal effects in a double negative control design. Summary: There is great potential for valid and accurate causal inference leveraging contemporary healthcare data in which negative controls are routinely available. Design and analysis of observational data leveraging negative controls is an area of growing interest in health and social sciences. Despite these developments, further effort is needed to disseminate these novel methods to ensure they are adopted by practicing epidemiologists.


Xu Shi∗, Wang Miao and Eric Tchetgen Tchetgen
Department of Biostatistics, University of Michigan; School of Mathematical Sciences, Peking University; Statistics Department, The Wharton School, University of Pennsylvania

Keywords: bias correction, bias detection, bias reduction, negative control, unmeasured confounding.

∗ Email: [email protected]. The authors have no conflicts to disclose. Human and Animal Rights: This article does not contain any studies with human or animal subjects performed by any of the authors.

Introduction

Despite ongoing efforts to improve study design and statistical analysis of epidemiological research, failure to rule out non-causal explanations of empirical findings has prompted substantial discussion in the health sciences [1, 2]. A powerful tool increasingly recognized to mitigate bias is negative control study design and analysis [3–5]. Negative controls have a long history in laboratory experiments and epidemiology [3, 6–8]. However, they have mainly been used to detect bias rather than to remove bias. More recent methodological advances that enable both bias detection and bias removal have not been fully recognized. As a result, the potential for valid and accurate causal inference leveraging contemporary healthcare data with abundant negative controls has to date not been fully realized. This paper aims to introduce negative controls to a broader audience and provide guidance on principled design and causal analysis based on a formal negative control framework. We focus on resolving bias due to unmeasured confounding in observational studies, although negative controls have recently also been used to tackle a variety of biases such as selection bias [3, 4, 9], measurement bias [3, 4], and homophily bias [10, 11] in both observational studies and randomized trials [5].

A negative control outcome (NCO) is a variable known not to be causally affected by the treatment of interest. Likewise, a negative control exposure (NCE) is a variable known not to causally affect the outcome of interest. To the extent possible, both NCO and NCE should be selected such that they share a common confounding mechanism with the exposure and outcome variables of primary interest, although this is not always necessary [12, 13]. These known-null effects have been used to detect residual confounding bias: presence of an association between the NCE and the outcome (or between the NCO and the exposure) constitutes compelling evidence of residual confounding bias, while absence of such an association implies no empirical evidence of such bias. For example, in a study about the effects of influenza vaccination on influenza hospitalization in the elderly (Figure 1), injury/trauma hospitalization was considered as an NCO, as it cannot be causally affected by influenza vaccination but may be subject to the same confounding mechanism, mainly driven by health-seeking behavior [14]. The authors found that, despite efforts to control for confounding, influenza vaccination not only appeared to reduce risk of influenza hospitalization after influenza season (risk ratio 0.82, 95% CI 0.73–0.92), but also appeared to reduce risk of injury/trauma hospitalization (risk ratio 0.83, 95% CI 0.75–0.91). This was interpreted as evidence of bias due to inadequately controlled confounding. Likewise, annual wellness visit history can be considered as an NCE, as it is unlikely to cause flu-related hospitalization.

In the following, we adopt the potential outcome framework, which we use to formally define causal effects as well as to articulate sufficient identification conditions to perform valid causal inferences from observational data.
We proceed under the fundamental assumption that for each subject in the target population there exists a potential outcome variable $Y(a)$ that would be observed if, possibly contrary to fact, the subject were exposed to treatment value $a$, for all possible treatment values $a$ in a set $\mathcal{A}$. In the common setting where the treatment is dichotomous, $\mathcal{A} = \{0, 1\}$, the assumption states that each subject has a well-defined pair of potential outcomes $(Y(0), Y(1))$ corresponding to their outcome under active treatment $a = 1$ and control treatment $a = 0$, respectively [15, 16]. In such a setting, our goal is to make inferences about the population average treatment effect (ATE), defined as $\mathrm{ATE} = E[Y(1) - Y(0)]$.

[Figure 1: directed acyclic graph with nodes A (flu shot), Y (influenza hospitalization), U (health-seeking behavior, unmeasured), Z (annual wellness visit history, NCE), W (injury/trauma hospitalization, NCO), and IV (physician preference).]

Figure 1: An illustrative example of different types of negative controls: consider studying the causal effect of flu shot (A) on influenza hospitalization (Y), subject to confounding by unmeasured health-seeking behavior (U). Annual wellness visit history (Z) is an NCE which does not causally affect Y. Injury/trauma hospitalization (W) is an NCO which is not causally affected by A. Both Z and W are proxies of health-seeking behavior. Physician's prescribing preference (IV) is an instrumental variable which likely induces variation in the choice of treatment, and may not affect the outcome other than through its influence on the treatment. As discussed in Sections 1.1 and 3.1, both a valid instrumental variable and an invalid instrumental variable associated with U are valid NCEs. All arguments are made implicitly conditional on measured covariates X. Independence between A and Z (or Y and W) conditional on U is not necessary. See more examples in Table A.1 of the Appendix.
Now, consider an observational study in which one observes independent and identically distributed samples on $(Y, A, X)$, where $A$ is a subject's observed binary treatment assignment, $Y$ is the subject's observed outcome, and $X$ are observed confounders of the association between $A$ and $Y$. We sometimes refer to $A$ as the primary treatment and $Y$ as the primary outcome. We assume that the treatment is defined with enough specificity such that among subjects with $A = a$, the observed outcome $Y$ is a realization of the potential outcome value $Y(a)$, that is,

Assumption 1 (Consistency). $Y(a) = Y$ when $A = a$.

Much of the literature on causal inference in observational studies relies on the strong assumption of no unmeasured confounding for the purpose of identification, i.e., $A \perp\!\!\!\perp Y(a) \mid X$, which is sometimes referred to as the ignorability assumption. This assumption essentially rules out the existence of unmeasured common causes, denoted as $U$, of the treatment and outcome variables. It is an untestable assumption, and is often the source of much skepticism about causal interpretation of associations found in observational data. We do not make such an ignorability assumption to establish causation. Instead, we invoke the following assumption, which describes the relationship between treatment and outcome in the presence of both measured and unmeasured confounding.

Assumption 2 (Latent ignorability). $A \perp\!\!\!\perp Y(a) \mid U, X$.
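
To make explicit what Assumptions 1 and 2 deliver, the following short derivation is a sketch that additionally assumes positivity of treatment within levels of $(U, X)$, a condition not stated above. It shows that the mean potential outcome would be identified if $U$ were observed.

```latex
\begin{align*}
E[Y(a)] &= E\{E[Y(a) \mid U, X]\} \\
        &= E\{E[Y(a) \mid A = a, U, X]\} && \text{(Assumption 2)} \\
        &= E\{E[Y \mid A = a, U, X]\}    && \text{(Assumption 1)}.
\end{align*}
```

Because $U$ is unobserved, the last expression cannot be evaluated from data on $(Y, A, X)$ alone, and replacing it with $E\{E[Y \mid A = a, X]\}$ is in general biased. This is the gap that negative control variables, acting as proxies of $U$, are meant to address.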

In addition to $(A, Y, X)$, suppose that one has also observed a secondary outcome $W$ and/or a secondary exposure $Z$, and let $Y(a, z)$ and $W(a, z)$ denote the corresponding counterfactual values that would be observed had the primary treatment and secondary exposure taken value $(a, z)$. $W$ and $Z$ are formally defined as negative control outcome and exposure variables provided that the following assumptions hold.

Assumption 3 (Negative control outcome). $W(a, z) = W$ and $W \perp\!\!\!\perp A \mid U, X$.

Assumption 4 (Negative control exposure). $Y(a, z) = Y(a)$ and $Z \perp\!\!\!\perp (Y(a), W) \mid U, X$.

Assumptions 3 and 4 entail: (1) there is no remaining unmeasured common cause between $(A, Z)$ and $(Y, W)$ conditional on $(U, X)$; (2) there is no causal effect of $Z$ on $Y$ conditional on $U$, $A$ and $X$, and there is no causal effect of $A$ and $Z$ on $W$ conditional on $U$ and $X$, which are referred to as the exclusion restrictions. We refer to a pair of $W$ and $Z$ as the double negative control. It is not necessary to have both NCO and NCE, although the double negative control will be sufficient for nonparametric identification of the ATE, as detailed in Section 3.2.

Figure 1 illustrates a directed acyclic graph (DAG) encoding the above assumptions. Consider a study of the effectiveness of flu shot ($A$) on influenza-related hospitalization ($Y$). A major concern in such studies is potential hidden bias due to unmeasured health-seeking behavior ($U$), a well-known common cause of flu shot status and influenza hospitalization. In such a study, routinely captured information on a person's annual wellness visit history provides a good candidate NCE ($Z$) satisfying Assumption 4, as it reflects a person's tendency to engage in healthy behavior and is unlikely to cause influenza hospitalization. Similarly, recorded data on a person's injury/trauma hospitalization provides a compelling candidate NCO ($W$) satisfying Assumption 3, as it is likely associated with health-seeking behavior and unaffected by flu shot. In addition, we can view an instrumental variable (IV) as an NCE [12, 17]. An IV is a pre-treatment variable satisfying the following three core assumptions: (IV relevance) the IV must be associated with the treatment; (Exclusion restriction) the IV must not have a direct effect on the outcome that is not mediated by the treatment; (IV independence) the IV must be independent of unmeasured confounders. For example, physician's prescribing preference is often taken as an IV in comparative effectiveness studies, because it likely induces variation in the choice of treatment, and may not affect the outcome other than through its influence on the treatment [18].
A valid IV satisfies Assumption 4 and hence is a valid NCE, which is further explained in Section 3.1. Besides the above three IV conditions, a fourth condition is necessary to identify a causal effect, such as the monotonicity assumption or the no current treatment interaction assumption [19–22]. Alternatively, causal effect identification using an IV is also made possible by further incorporating an NCO under the double negative control framework introduced in Section 3.2.

It is important to note that Figure 1 is not the only DAG satisfying the negative control assumptions. For example, a more general DAG would allow $Z$ to affect $A$, corresponding to the case where an annual wellness visit could result in flu vaccination during flu season. Moreover, physician preferences are not randomized and may be associated with $U$ via physician-patient interactions, potentially violating the IV independence assumption. Such an invalid IV violating the IV independence assumption is still a valid NCE as long as the exclusion restriction holds, regardless of whether the IV relevance assumption holds. In this case, an NCO can be used to repair an invalid IV for causal effect identification under a double negative control framework [12, 17]. Additional DAGs illustrating settings in which Assumptions 2-3 hold are provided in Table A.1 of the Appendix. As demonstrated in [12] and [17], an NCE can be either a pre- or post-treatment variable. Unmeasured common causes of the $Z$-$A$ association and the $Y$-$W$ association can also be present without necessarily invalidating Assumptions 3-4. A key insight is that a valid NCO does not necessarily need to be an outcome variable and may in fact precede the treatment in view, while a valid NCE need not be a treatment and may in fact be ascertained either together with the primary outcome of interest or subsequently.
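
The role of an NCO in the flu shot example can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: all probabilities, the true causal risk ratio of 0.9, the omission of $X$ and $Z$, and the names `simulate` and `risk_ratio` are hypothetical choices made for illustration and are not taken from [14].

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def simulate(n):
    """Draw n subjects from a hypothetical version of the Figure 1 DAG (no X, no Z)."""
    rows = []
    for _ in range(n):
        u = random.random() < 0.5                      # U: health-seeking behavior
        a = random.random() < (0.8 if u else 0.3)      # A: flu shot, more likely when U is present
        base_y = 0.04 if u else 0.08                   # baseline influenza-hospitalization risk
        y = random.random() < base_y * (0.9 if a else 1.0)  # assumed true causal risk ratio 0.9
        w = random.random() < (0.05 if u else 0.10)    # W: injury risk depends on U only, not on A
        rows.append({"u": u, "a": a, "y": y, "w": w})
    return rows


def risk_ratio(rows, outcome):
    """Risk of `outcome` among the vaccinated divided by risk among the unvaccinated."""
    risk = {}
    for arm in (True, False):
        group = [r for r in rows if r["a"] == arm]
        risk[arm] = sum(r[outcome] for r in group) / len(group)
    return risk[True] / risk[False]


data = simulate(100_000)
rr_y_crude = risk_ratio(data, "y")                              # confounded primary estimate
rr_w_crude = risk_ratio(data, "w")                              # NCO association (true effect is null)
rr_w_u1 = risk_ratio([r for r in data if r["u"]], "w")          # NCO association within U = 1
rr_w_u0 = risk_ratio([r for r in data if not r["u"]], "w")      # NCO association within U = 0

print(f"crude RR for Y: {rr_y_crude:.2f}")
print(f"crude RR for W: {rr_w_crude:.2f}")
print(f"RR for W within U=1: {rr_w_u1:.2f}, within U=0: {rr_w_u0:.2f}")
```

Under these assumed parameters, direct calculation gives a population crude risk ratio of about 0.72 for $W$ even though $A$ has no effect on $W$, and about 0.64 for $Y$ against an assumed true value of 0.9. Within levels of $U$ the $A$-$W$ association vanishes, which is the content of Assumption 3.
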
In prior literature, NCO has been referred to as falsification outcome/end point [23–26], control outcome [14, 27, 28], secondary outcome [29, 30], supplementary response [6] and unaffected outcome [31]. NCE has been referred to as control exposure [27] and residual-confounding indicator [32, 33]. Both NCO and NCE have been referred to as proxies of the unmeasured confounder [34–36]. In addition, an exposure-outcome pair known a priori to be unrelated has also been referred to as a negative control pair [37–41].

The literature reviewed in the current paper is largely limited to papers that use the aforementioned nomenclature. Although [3] and [27] review the negative control literature, to the best of our knowledge, this paper is the first to systematically summarize both formal causal and statistical methodology together with applications of negative controls. The rest of the paper is organized as follows. Design and validation of negative controls are discussed in Section 2. We then review both assumptions and methods for using negative controls to detect, reduce, and remove unmeasured confounding bias in Section 3. We use a simple example to illustrate double negative control adjustment (i.e., leveraging NCE and NCO when both are available) of confounding bias in Section 3.2. We close with a summary in Section 4.

Existing applications of negative controls mainly focus on detection of uncontrolled confounding bias. We list in Table 1 selected studies that employed negative controls to detect residual confounding and to strengthen causal conclusions. Among these studies, eight used NCEs and nine used NCOs. Table 1 is by no means comprehensive: hundreds of studies have leveraged negative control variables, as evidenced by the number of recent articles that have cited [3] as the foundational paper on the use of negative control exposures and outcomes in epidemiology. Rather, it is a representative set of examples that help illustrate strategies for identifying compelling candidate negative controls.

Effect of influenza vaccination on influenza hospitalization: using injury/trauma hospitalization as an NCO

As detailed in Section 1.1, to study the effects of influenza vaccination on influenza hospitalization in the elderly, injury/trauma hospitalization was taken as an NCO to detect confounding by unmeasured health-seeking behavior [14]. Influenza hospitalization before the flu season was also used as an NCO, because the flu vaccine cannot protect against influenza hospitalization when there is little flu virus circulation.

Effect of maternal exposure on offspring outcomes: using paternal exposure as an NCE

A number of publications have used paternal exposure as an NCE to study the intrauterine effect of maternal exposure on offspring outcome. Specifically, [42–46] studied the association between maternal smoking and offspring outcomes, and compared paternal and maternal associations to detect potential bias due to unmeasured family-level confounding factors or parental phenotypes. Similarly, [47] compared maternal and paternal distress and their associations with offspring asthma. The validity of paternal exposure as an NCE has also been evaluated in [48]. The authors found that cotinine levels from exposure to partner smoking were low in non-smoking pregnant women, which suggests that using paternal smoking as an NCE for investigating intrauterine effects is valid.

Effect of air pollution on health outcomes: using future air pollution as an NCE

Besides paternal exposures, NCEs are also used in air pollution studies. For example, [32, 33, 49, 50] studied statistical methods that utilize future air pollution as an NCE for bias detection and bias reduction, because the future is not expected to causally affect the past. In addition, [51] studied the effect of air pollutant on asthma, and leveraged two different NCEs: air pollutant level in the future and air pollutant level in a distant city.
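
The idea of using a future exposure as an NCE can be sketched in a few lines: regress the outcome on the current and the future exposure, and apply a Wald test to the coefficient of the future exposure. The example below is an illustration under loud assumptions rather than the method of any cited paper: it uses independent draws instead of a real time series, a linear outcome model with a true effect of 1.0 (the time-series analyses referenced above are not reproduced here), and hypothetical helper names `invert` and `ols` written in plain Python.

```python
import math
import random


def invert(m):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    k = len(m)
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def ols(columns, y):
    """Least squares of y on an intercept plus `columns`; returns (coefficients, standard errors)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return beta, se


random.seed(1)
n = 5000
u = [random.gauss(0, 1) for _ in range(n)]                       # unmeasured confounder
a_now = [ui + random.gauss(0, 1) for ui in u]                    # current exposure, driven by U
a_next = [0.5 * ai + 0.5 * ui + random.gauss(0, 1)               # future exposure (NCE):
          for ai, ui in zip(a_now, u)]                           # depends on A and U, never on Y
y = [1.0 * ai + ui + random.gauss(0, 1) for ai, ui in zip(a_now, u)]  # assumed true effect is 1.0

beta_naive, _ = ols([a_now], y)                  # outcome on current exposure only
beta_adj, se_adj = ols([a_now, a_next], y)       # outcome on current and future exposure
wald_z = beta_adj[2] / se_adj[2]                 # Wald statistic for the future-exposure coefficient
p_value = math.erfc(abs(wald_z) / math.sqrt(2))  # two-sided normal p-value

print(f"naive effect estimate:    {beta_naive[1]:.3f}")
print(f"adjusted effect estimate: {beta_adj[1]:.3f}")
print(f"future-exposure coefficient: {beta_adj[2]:.3f}, Wald z = {wald_z:.1f}, p = {p_value:.2g}")
```

In this assumed design the future exposure cannot affect the outcome, so a clearly non-zero coefficient signals unmeasured confounding. In large samples the naive coefficient settles near 1.5 and the coefficient adjusted for the future exposure near 1.33, so including the NCE reduces the bias but does not remove it.
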

In addition to the above examples, various negative control designs are also summarized in Table 1. Rather than detailing each study in Table 1, we summarize these studies below in terms of their respective strategies for identifying negative control variables. A commonly used strategy to select negative controls leverages temporal and spatial constraints that essentially guarantee the exclusion restrictions in Assumptions 3-4. Temporal ordering leverages the universal truth that the future cannot causally affect the past. For example, as detailed above, [32, 33, 49–51] specify future measurements of air pollution as an NCE to study the effect of current air pollution on health outcomes. Similarly, [46] proposed to look at maternal exposure before and after pregnancy in studying the intrauterine effect of maternal exposure on offspring outcome. An essential prerequisite for this design is that the primary outcome does not cause subsequent exposure (at least in the short term), certainly a reasonable assumption in air pollution settings. Prior information about timing of exposure also sometimes allows one to leave out an essential ingredient of the hypothesized causal mechanism [3]. For instance, [14] defined as an NCO the number of hospitalizations prior to influenza season in order to estimate the effect of influenza vaccination on influenza hospitalization, as little to no flu circulates prior to flu season for influenza vaccination to be protective against. Spatial distancing has also been considered as an effective means to enforce exclusion restrictions in Assumptions 3-4. For instance, [51] took air pollutant level in a distant city as an NCE to study the effect of air pollutant on asthma.
[52, 53] studied screening sigmoidoscopy and mortality from colon tumor, and selected tumor from the proximal colon, which is beyond the reach of the sigmoidoscopy, as an NCO. Another strategy is to select as NCO an outcome analogous to the primary outcome but resulting from a mechanism known a priori to be unrelated to the primary treatment. As an illustration of this approach, consider [14], which took hospitalization due to injury/trauma as an NCO for the primary outcome, hospitalization due to influenza. Similarly, to evaluate the effect of air pollution on hospitalization due to asthma, [55] defined hospitalization due to appendicitis as an NCO. In addition, several studies use death from other causes as NCO: [56–59] studied the effect of smoking on lung cancer with mortality from other causes as an NCO, [60] studied the effect of psychological stress on deaths from cardiac events after an earthquake with death from other causes as an NCO, and [54] selected death from causes other than breast cancer and from external causes such as accidents, intentional self-harm and assaults as NCO to estimate the effect of mammography-screening participation on breast cancer mortality.

| Reference | Exposure | Outcome | Negative Control Exposure | Negative Control Outcome |
|---|---|---|---|---|
| [42] | Maternal smoking | Low birth weight | Paternal smoking | |
| [43] | Maternal smoking | Sudden infant death syndrome | Paternal smoking | |
| [44] | Maternal smoking | Offspring height, ponderal index, body mass index | Paternal smoking | |
| [45] | Maternal smoking | Offspring blood pressure | Paternal smoking | |
| [47] | Maternal distress | Offspring asthma | Paternal distress | |
| [46, 48] | Maternal smoking, alcohol use or dietary patterns | Offspring development | Paternal smoking, alcohol use or dietary patterns | |
| [51] | Air pollutant | Asthma | Future air pollutant, air pollutant elsewhere | |
| [54] | Mammography-screening participation | Death from breast cancer | Dental-care participation | Death from causes other than breast cancer and from external causes such as accidents, intentional self-harm and assaults |
| [14] | Influenza vaccination | Mortality and pneumonia/influenza hospitalization | | Outcome before and after influenza season; injury/trauma hospitalization |
| [55] | Air pollutant | Asthma hospitalization | | Appendicitis hospitalization |
| [56–59] | Smoking | Mortality from lung cancer | | Other causes of death |
| [60] | Psychological stress post earthquake | Deaths from cardiac events | | Other causes of death, e.g. cancer |
| [52, 53] | Screening sigmoidoscopy | Mortality from distal colon tumor | | Mortality from proximal colon tumor (above the reach of the sigmoidoscopy) |

Table 1: Summary of selected applications using negative controls for detection of confounding bias.

Despite the various strategies in the literature to find candidate negative controls, researchers should rigorously validate the choice of negative controls and be aware of possible violations of negative control assumptions. Similar to the assumption of no unmeasured confounding, negative control assumptions (Assumptions 3 and 4) are causal assumptions that can only be established by subject matter considerations and not by empirical test without additional assumptions. In practice, we recommend checking the following criteria in finding a candidate negative control.

• “Irrelevant to Y (or A)”: The NCE should not cause the outcome of interest, while the NCO should not be caused by the treatment of interest nor the NCE. These conditions are formally implied by Assumptions 3 and 4.
• “Comparable to A (or Y)”: In most cases it is important to have the source of bias in mind before designing a negative control study, although this is not always necessary [12, 13]. The unmeasured confounding mechanism of negative controls should be comparable to that of A and Y in the following sense: the NCE must be associated with unmeasured confounders conditional on measured confounders and primary treatment; the NCO must be associated with unmeasured confounders conditional on measured confounders. Hence the negative control variable is often viewed as a proxy of the unmeasured confounders. A variable completely irrelevant to all mechanisms under consideration would not provide any useful information. These conditions are formally required by Assumptions 5 and 7 in Section 3.

• “Adequate Negative Control Power”: The NCE and NCO are not exceedingly rare relative to primary treatment and outcome variables, respectively. For example, in the event that the negative control variable is a rare binary variable, or if the association between unmeasured confounder and negative control variable is weak, then a large sample may be necessary to achieve sufficient power for detecting confounding bias [61, 62].

We list examples of possible violations of negative control assumptions in the Appendix.

Review of methods

Key assumption and rationale for bias detection

Assumptions 3 and 4 give rise to formal statistical tests of the null hypothesis that adjustment for observed covariates suffices to control for confounding bias, rejection of which indicates presence of an unmeasured confounder $U$. A key assumption for this bias detection strategy is that the negative control exposure or outcome is $U$-comparable to the primary exposure or outcome:

Assumption 5 ($U$-comparable). $W \not\perp\!\!\!\perp U \mid X$ and $Z \not\perp\!\!\!\perp U \mid A, X$.

The $U$-comparability assumption requires that unmeasured confounders $U$ of the $A$-$Y$ association are identical to those of the $A$-$W$ association and $Z$-$Y$ association, such that a non-null $A$-$W$ or $Z$-$Y$ association can be attributed to $U$. Therefore, presence of an association between primary and negative control variables implies residual confounding bias, while absence of such associations implies no empirical evidence of unmeasured confounding. It is important to note that when evaluating the $Z$-$Y$ association one must also adjust for $A$ to rule out the potential association between $Z$ and $Y$ due to the pathway $Z - A \to Y$ (the edge between $Z$ and $A$ could be either $Z \to A$ or $Z \leftarrow A$). Examples of such relationships are listed in Table A.1 of the Appendix. Notably, conditional on $X$, a valid IV independent of $U$ and associated with $A$ satisfies Assumption 5 because of conditioning on a collider $A$ on the $\mathrm{IV} \to A \leftarrow U$ pathway [12, 17]; likewise, an invalid IV that violates the IV independence assumption defined in Section 1.1 would also satisfy Assumption 5 regardless of whether the IV and $A$ are associated, as mentioned in Section 1.1.

Methods

As detailed in Section 2, the majority of existing applications used negative controls for bias detection, by testing for an association between primary and negative control variables. A review of bias detection methods is presented in Table 2. For example, [32] formalized bias detection as a Wald test of the coefficient of the NCE in a regression model of the outcome on the primary and negative control exposures. Moreover, [63, 64] noted that an invalid NCE that violates the exclusion restriction but satisfies the $U$-comparable assumption can nevertheless validate a causal interpretation when it does not appear to be associated with the outcome after adjusting for the treatment of interest.

Summary of literature

Beyond bias detection, recent developments have made it possible to reduce and sometimes completely remove unmeasured confounding bias using negative controls. In air pollution studies, current and future pollutant levels are often positively correlated and are associated with unmeasured confounders in the same direction. In this setting, [33] showed that incorporating future air pollution, an NCE, in the outcome model can reduce confounding bias. Further bias attenuation was proposed in [49] by incorporating both past and future exposures. Bias reduction using an NCO was considered by [65] in estimation of the standardized mortality ratio, where the standardized mortality ratio of the NCO was used to reduce bias in that of the primary outcome. In addition, [38, 40] considered calibrating p-values and confidence intervals by deriving an empirical null distribution from the association between primary and negative control variables.

Several methods were developed to achieve full bias removal, under certain assumptions such as monotonicity [13, 66–68], rank preservation [69], and a linear model for unmeasured confounding. Specifically, [66, 67] considered bias correction by using a negative control time-to-event outcome under a monotonicity assumption that describes the $U$-$Y$ and $U$-$W$ associations. Under a similar monotonicity assumption, [13] generalized the difference-in-differences method to an NCO method, which is further extended by [68]. In addition, [69] developed an outcome calibration approach with a rank preservation assumption under which the counterfactual primary outcome can account for the unmeasured confounding of the $A$-$W$ association. Lastly, [70–72] assumed a linear model for the unmeasured confounder and proposed to estimate $U$ by factor analysis.

| Type | Reference and setting | Main assumptions besides Assumptions 2-5 | Methods |
|---|---|---|---|
| D | [32]: Time-series study. $Z$ = future air pollution $A_{t+1}$. | (1) $A_{t+1} \perp\!\!\!\perp Y_t \mid A_t, U_t, X_t$. (2) $\log[E(Y_t)] = \alpha + \beta A_t + \gamma X_t + \beta_f A_{t+1}$. | Bias detection by Wald test on $\beta_f$. |
| D | [63, 64]: Invalid NCE $Z$. | (1) Violation of exclusion restriction: $Y(a, z) \neq Y(a)$. (2) $Z$ is $U$-comparable with $A$: $Z \not\perp\!\!\!\perp U \mid A, X$. | No evidence of $Z$-$Y$ association adjusting for $A$ implies no residual confounding of the $A$-$Y$ association. |
| R | [33, 49]: Time-series study. $Z$ = future air pollution $A_{t+1}$. | (1) $A_{t+1} \perp\!\!\!\perp Y_t \mid A_t, U_t, X_t$; $A_{t+1} \not\perp\!\!\!\perp (A_t, U_t) \mid X_t$. (2) $Y_t(a_t, x_t, u_t) = \beta_0 + \beta_1 a_t + \beta_2 x_t + \beta_3 u_t + \epsilon_t$; $E[\epsilon_t \mid A_t = a_t, U_t = u_t, X_t = x_t] = 0$. (3) $E[U_t \mid A_t = a_t, A_{t+1} = a_{t+1}, X_t = x_t] = \alpha_0 + \alpha_1 a_t + \alpha_2 x_t + \alpha_3 a_{t+1}$; $\mathrm{sign}(\alpha_1) = \mathrm{sign}(\alpha_3)$. (4) $E[A_{t+1} \mid A_t = a_t, X_t = x_t] = \gamma_0 + \gamma_1 a_t + \gamma_2 x_t$; $\gamma_1 > 0$. | Bias reduction by fitting $E[Y_t \mid A_t, X_t, A_{t+1}]$ instead of $E[Y_t \mid A_t, X_t]$. Further bias reduction considered in [49] by incorporating $X_{t+1}$ or $A_{t-1}$. Identification of $\beta_1$ is possible with multiple future exposures under an autoregressive model for the exposure time series. |
| R | [65]: Standardized mortality ratio in occupational cohort study. | (1) $E[Y(1) \mid X = k] / E[Y_{\mathrm{ref}} \mid X = k] = \exp(\alpha_k - \delta_k)$; $E[W \mid X = k] / E[W_{\mathrm{ref}} \mid X = k] = \exp(-\epsilon_k)$. (2) $\mathrm{sign}(\epsilon_k) = \mathrm{sign}(\delta_k)$ and $0 < \lvert\epsilon_k\rvert < \lvert\delta_k\rvert$. | Adjust for bias $\delta_k$ via $\dfrac{E[Y(1) \mid X = k]\, E[W_{\mathrm{ref}} \mid X = k]}{E[Y_{\mathrm{ref}} \mid X = k]\, E[W \mid X = k]}$. |
| R | [38, 40]: Define negative controls as drug-outcome pairs where one believes no causal effect exists. | (1) For a negative control drug-outcome pair, the effect estimate $\beta_i \sim N(\theta_i, \tau_i^2)$, $i = 1, \ldots, n$, where $\theta_i \sim N(\mu, \sigma^2)$ is the true bias. (2) Under the null of no treatment effect, the effect estimate $\beta_{n+1} \overset{H_0}{\sim} N(\mu, \sigma^2 + \tau_{n+1}^2)$. | Estimate $\mu, \sigma$ by MLE with $L(\mu, \sigma \mid \theta, \tau) = \prod_{i=1}^{n} \int p(\beta_i \mid \theta_i, \tau_i)\, p(\theta_i \mid \mu, \sigma)\, d\theta_i$. Calibrated p-value computed via Wald test of $\beta_{n+1}$. Confidence interval calibrated similarly using the distribution generated by positive controls. |
| C | [66, 67]: $W$, $Y$ = time-to-event outcomes. | (1) There exist monotonic functions that describe the $U$-$Y$ and $U$-$W$ associations: $Y(0) = h_y(U, X)$, $W = h_w(U, X)$. (2) Cox models for $Y$ and $W$ with hazard ratios $e^{\beta_y}$ and $e^{\beta_w}$. | The hazard ratio measuring the causal effect of treatment is $e^{\beta_y - \beta_w}$. |
| C | [13, 68]: Generalized difference-in-differences using NCO. | (1) There exist monotonic functions that describe the $U$-$Y$ and $U$-$W$ associations: $Y(0) = h_y(U, X)$, $W = h_w(U, X)$. (2) Positivity: if $0 < f_{W \mid A=1, X}(W^*)$ then $0 < f_{W \mid A=0, X}(W^*) < 1$, where $W^* = (W \mid A = 1, X)$ is distributed as $W$ in the exposed group. | The average treatment effect on the treated is $E[Y(1) - Y(0) \mid A = 1] = E[Y \mid A = 1] - E[F^{-1}_{Y \mid A=0, X} \circ F_{W \mid A=0, X}(W^*)]$. Generalizes the difference-in-differences approach to the broader context of NCO. |
| C | [69]: Calibration using NCO. | (1) $W \perp\!\!\!\perp A \mid X, Y(1), Y(0)$. (2) Rank preservation: $Y = Y(0) + \Psi A$, and hence $W \perp\!\!\!\perp A \mid X, Y(0)$ by (1). (3) $E[W \mid A, Y(0) = Y - \Psi A, X] = \beta_0 + \beta_1 X + \beta_2 Y(\Psi) + \beta_3 A$, where $\beta_3 = 0$ by (1). | The 95% CI for $\Psi$ consists of all $\Psi$ for which $\hat\beta_3(\Psi) \pm 1.96\, \widehat{\mathrm{SE}}[\hat\beta_3(\Psi)]$ contains 0. Under (1)-(3), fit $E[W \mid A, Y, X] = \beta_0 + \beta_1 X + \beta_2 Y + \beta_\Psi A$; then the causal effect is $\Psi = -\beta_\Psi / \beta_2$. |
| C | [70–72]: Removing unwanted variation in gene-expression analysis. | (1) $Y_{n \times p} = X_{n \times q}\beta_{q \times p} + U_{n \times r}\Gamma_{r \times p} + \epsilon_{n \times p}$, $p \geq r + 1$. (2) $W_{n \times s} = U_{n \times r}\Gamma^W_{r \times s} + \epsilon^W_{n \times s}$, $s \geq r$, $\mathrm{Rank}(\Gamma^W_{r \times s}) = r$. (3) $(\epsilon, \epsilon^W) \sim N(0, \mathrm{diag}(\sigma_1^2, \ldots, \sigma_{p+s}^2))$, $(\epsilon, \epsilon^W) \perp\!\!\!\perp (X, U)$. (4) $U_{n \times r} = X_{n \times q}\alpha_{q \times r} + \epsilon^U_{n \times r}$, $\epsilon^U \sim N(0, I_r)$, $\epsilon^U \perp\!\!\!\perp X$. | [70, 71]: Estimate $U$ by factor analysis of (2), then estimate $\beta$ from (1). [72]: Estimate $\Gamma^W$ and $\Gamma$ by factor analysis of $Y = X(\beta + \alpha\Gamma) + (\epsilon^U\Gamma + \epsilon)$ (5) and $W = X\alpha\Gamma^W + (\epsilon^U\Gamma^W + \epsilon^W)$ (6). Then estimate $\alpha$ from (6), and estimate $\beta$ from (5). |
| C | [12, 17, 36]: Nonparametric identification. | Assumption 7. | Identify $h$ in $E[Y \mid A, Z, X] = E[h(W, A, X) \mid A, Z, X]$; then $\mathrm{ATE} = E[h(W, A = 1, X)] - E[h(W, A = 0, X)]$. |

Table 2: Summary of published methodologies using negative controls for detection (D), reduction (R), and correction (C) of confounding bias.

Nonparametric identification in a double negative control design

The above methods remove unmeasured confounding bias under relatively stringent assumptions. [36] established sufficient conditions under which the ATE can be nonparametrically identified leveraging an NCE and an NCO, i.e., via a double negative control design [17]. That is, the ATE can be uniquely expressed as a function of the observed data distribution without imposing any restriction on the observed data distribution, such that distinct ATE values are guaranteed to correspond to distinct observed data distributions. Further method developments include semiparametric estimation under categorical negative controls and unmeasured confounding [17] and alternative strategies to identify the ATE via a so-called confounding bridge function [12].

Double negative controls are widely available in health sciences. For example, in air pollution studies, [12] used future air pollution level and past health outcome as negative control exposure and outcome, respectively. [17] took two routinely monitored control outcomes from administrative healthcare data in vaccine safety studies as double negative controls, in the setting where both control outcomes are independent of the primary outcome and satisfy both Assumption 3 and Assumption 4. In the influenza vaccine effectiveness research presented in Figure 1, annual wellness visit and injury/trauma hospitalization can serve as double negative controls. In addition, when an IV is available, identification is made possible by further incorporating an NCO such as a pretreatment measurement of the outcome.

Below we first detail the identification conditions established in [36] and then introduce the identification methods proposed in [36] and [12].

Assumption 6 (Positivity). $0 < P(A = a, Z = z \mid X) < 1$ for all $a$, $z$.

Assumption 7 (Completeness). (a) For all $a$, $W \not\perp\!\!\!\perp Z \mid A = a, X$. (b) For any square integrable function $g$, if $E[g(W) \mid Z = z, A = a, X] = 0$ for almost all $z, a$, then $g(W) = 0$.

Assumption 6 is a regular positivity assumption ensuring that in all strata of X, there are always some individuals with A = a, Z = z for all a, z. Assumption 7 is a commonly used completeness condition for identification [73]. Specifically, Assumption 7(a) essentially requires U-comparability. That is, both Z and W should be associated with U such that variation in U can be recovered from variation in Z and W. Assumption 7(b) aims to ensure that the underlying unmeasured confounding mechanism in $E[Y \mid A, U]$ can be identified using Z and W. For example, suppose U is a binary variable. Then Assumption 7 further requires that Z and W have at least two categories, and that $E[W \mid A = a, Z = 1, X = x] - E[W \mid A = a, Z = 0, X = x]$ is not equal to zero for all $a, x$.

Rationale

In the presence of unmeasured confounding by a latent variable U, an observed difference in the outcome between the treatment and control groups is a combination of the underlying causal effect and confounding bias. One cannot directly disentangle the variation in the outcome due to the treatment from the unwanted variation due to U, as U is not measured. We seek to indirectly remove such unwanted variation, i.e., unmeasured confounding bias, by leveraging available proxies of U. An important example of such a proxy is an NCO chosen to be associated with U but not causally affected by the treatment (Figure 1). Therefore, any difference in the NCO, W, between the treatment and control groups can only be attributed to U. Such a difference can uncover the unwanted variation due to U assuming that the U-Y and U-W associations are the same, and that there is no U-A additive interaction on Y. An example of such W is the pre-exposure baseline measure of the outcome, in which case bias adjustment reduces to the well-known difference-in-differences approach [13].

The above describes identification of the ATE under assumptions that are generally untenable, because the U-Y and U-W associations will often be on different scales, and there may be U-A interactions in the model for Y. In order to nonparametrically identify unmeasured confounding bias, we make use of the NCE Z. Because Z is associated with Y or W only through U, the ratio of Z-Y and Z-W associations captures the ratio of U-Y and U-W associations, allowing for U-A interactions. In summary, leveraging a double negative control design one can nonparametrically identify the magnitude of unmeasured confounding bias via the following mechanism: the NCO uncovers the confounding bias up to a scale that reflects the difference between U-Y and U-W associations, while the NCE recovers the scale leveraging Z-Y and Z-W associations. This mechanism is further illustrated in an example below.

Example

To further illustrate the idea of identification using double negative control, consider a simple example where we assume the following linear structural equation models involving unmeasured confounding U, although the nonparametric identification proposed in [36] does not rely on any restriction about the data generating models. We suppress measured confounders X to ease notation; all arguments are made implicitly conditional on X.

Had U been measured, we could fit (1) and obtain the true causal effect, which is $\beta^Y_A$. When in fact U is not measured, to leverage double negative control, we additionally assume the U-W relationship in (2) and the U-Z relationship in (3).

$$
\begin{aligned}
E[Y \mid A, U] &= \beta^Y_0 + \beta^Y_A A + \beta^Y_U U && (1)\\
E[W \mid U] &= \beta^W_0 + \beta^W_U U && (2)\\
E[U \mid A, Z] &= \beta^U_0 + \beta^U_A A + \beta^U_Z Z. && (3)
\end{aligned}
$$

Models (1)–(3) imply the following models that one could actually fit using the observed data (Y, A, W, Z). These models are obtained by replacing U with $E[U \mid A, Z]$ in the primary and negative control outcome models (1) and (2).

$$
\begin{aligned}
E[Y \mid A, Z] &\overset{(1)}{=} \beta^Y_0 + \beta^Y_A A + \beta^Y_U E[U \mid A, Z] && (4)\\
&\overset{(3)}{=} \beta^Y_0 + \beta^Y_A A + \beta^Y_U (\beta^U_0 + \beta^U_A A + \beta^U_Z Z) && (5)\\
E[W \mid A, Z] &\overset{(2)}{=} \beta^W_0 + \beta^W_U E[U \mid A, Z] && (6)\\
&\overset{(3)}{=} \beta^W_0 + \beta^W_U (\beta^U_0 + \beta^U_A A + \beta^U_Z Z). && (7)
\end{aligned}
$$

From (1) we know that the true causal effect is $\beta^Y_A$. However, if one were to regress Y on A and Z without accounting for U such as in [33], then the coefficient of A would be equal to $\beta^Y_A + \beta^Y_U \beta^U_A$. Here $\beta^Y_U \beta^U_A$ is the confounding bias, which arises when there exists a U that is associated with both Y and A. One cannot directly separate the confounding bias from the true causal effect because U is not observed. Nevertheless, the coefficients in the observed models (5) and (7) allow us to infer $\beta^Y_U \beta^U_A$. To facilitate discussion, we introduce notation for the coefficients in models (5) and (7).
Let $\delta^Y_A = \beta^Y_A + \beta^Y_U \beta^U_A$ and $\delta^Y_Z = \beta^Y_U \beta^U_Z$ denote the coefficients of A and Z in the primary outcome model (5), respectively, and let $\delta^W_A = \beta^W_U \beta^U_A$ and $\delta^W_Z = \beta^W_U \beta^U_Z$ denote the coefficients of A and Z in the negative control outcome model (7), respectively.

We detail three strategies to identify the unmeasured confounding bias $\beta^Y_U \beta^U_A$ leveraging a single NCO, a single NCE, or the double negative control. First, we note that the coefficient of A in the primary outcome model, $\delta^Y_A$, is a combination of both the true causal effect and confounding bias, whereas the coefficient of A in the negative control outcome model, $\delta^W_A$, reflects pure confounding bias because A does not causally affect W. In fact, if the U-Y and U-W associations are equal on the additive scale, i.e., $\beta^W_U = \beta^Y_U$, then $\delta^W_A$ matches the confounding bias $\beta^Y_U \beta^U_A$. That is, under the assumption of equal U-Y and U-W additive association, a form of “additive outcome equi-confounding” [13], the treatment effect on the NCO is equal to the unmeasured confounding bias. Hence the causal effect can be recovered by backing out the association of the treatment with the NCO from the association of the treatment with the primary outcome. Note that in this scenario it is not necessary to have an NCE: one can fit the primary and negative control outcomes on treatment without adjusting for the NCE, and then take the difference in treatment effects. When the NCO is the baseline outcome, the above reduces to the difference-in-differences method [13].

Second, the coefficient of Z in the primary outcome model, $\delta^Y_Z$, would be zero if there were no unmeasured confounding because Z does not causally affect Y. Therefore, the coefficient of Z in the outcome model reflects pure confounding bias. In fact, if the U-A and U-Z associations are equal on the additive scale, i.e., $\beta^U_A = \beta^U_Z$, then $\delta^Y_Z$ captures the bias $\beta^Y_U \beta^U_A$ due to unmeasured confounding.
That is, under the assumption of equal U-A and U-Z additive association, a form of “additive treatment equi-confounding”, the NCE effect on the primary outcome is equal to the unmeasured confounding bias. Hence the causal effect is given by the difference in coefficients of treatment and NCE in the primary outcome model. Note that in this scenario it is not necessary to have an NCO: one can fit the primary outcome on treatment and NCE, and then take the difference in effects of treatment and NCE on Y.

In both scenarios described above, the “additive outcome equi-confounding” or “additive treatment equi-confounding” is a rather strong assumption, as it requires Y and W, or Z and A, to operate on the same scale. To relax these assumptions, we can leverage the double negative control. Specifically, if the U-Y and U-W associations are unequal, then $\delta^W_A$ reflects pure confounding bias up to a scale which is equal to $\beta^Y_U / \beta^W_U$. Because the Z-Y (Z-W) association is a product of the U-Z and U-Y (U-W) associations, the ratio of Z-Y and Z-W associations is equal to the ratio of U-Y and U-W associations. That is, $\beta^Y_U / \beta^W_U = \delta^Y_Z / \delta^W_Z$. The confounding bias is thus equal to $\delta^W_A$ scaled by $\delta^Y_Z / \delta^W_Z$, and the true causal effect is given by $\delta^Y_A - \delta^W_A \times \delta^Y_Z / \delta^W_Z$. It is important to note that the first two adjustment methods are special cases of the general adjustment method, in that the confounding bias is always equal to $\delta^W_A \delta^Y_Z / \delta^W_Z$ across all three scenarios.

To summarize, the confounding bias is

$$
\beta^Y_U \beta^U_A = \delta^W_A \delta^Y_Z / \delta^W_Z =
\begin{cases}
\delta^W_A & \text{if } \beta^W_U = \beta^Y_U & \text{(8a)}\\
\delta^Y_Z & \text{if } \beta^U_A = \beta^U_Z & \text{(8b)}\\
\delta^W_A \delta^Y_Z / \delta^W_Z & \text{if } \beta^W_U \neq \beta^Y_U \text{ and } \beta^U_A \neq \beta^U_Z. & \text{(8c)}
\end{cases}
$$

Hence the true causal effect is identified as

$$\beta^Y_A = \delta^Y_A - \delta^W_A \delta^Y_Z / \delta^W_Z. \tag{9}$$

It is important to note that equation (9) is only meaningful when $\delta^W_Z$ is not equal to zero. If $\delta^W_Z = 0$ then either there is no evidence of the presence of U and $\beta^Y_U \beta^U_A = 0$, or a selected negative control variable is not sufficiently associated with U, violating Assumption 7. Similar arguments apply to $\delta^W_A$ and $\delta^Y_Z$. In fact, as summarized in Table 2, many negative control methods detect, reduce, and remove unmeasured confounding bias using analogies of scenario (8a) [13, 65–67] and scenario (8b) [32, 33, 49].

In practice, identification via (9) relies on fitting the primary and negative control outcome models $E[Y \mid A, Z]$ and $E[W \mid A, Z]$. Alternatively, one could directly make an assumption about the underlying unmeasured confounding mechanism $E[Y \mid A, U]$, as proposed in [12]. To illustrate, consider again the example above. Let $\tilde{U}_W = (W - \beta^W_0) / \beta^W_U$, then by (2) $\tilde{U}_W$ is a good proxy of U in the sense that $E[\tilde{U}_W \mid U] = U$. In particular, let $h(W, A) = \beta^Y_0 + \beta^Y_A A + \beta^Y_U \tilde{U}_W$, then by (1) we have

$$
\begin{aligned}
E[Y \mid A, U] &= E[h(W, A) \mid A, U], && (10)\\
E[Y \mid A, Z] &= E[h(W, A) \mid A, Z], && (11)
\end{aligned}
$$

where (11) is obtained by taking the expectation of both sides of (10) over U given (A, Z). The above equations indicate that h captures the relationship between the U-Y and U-W associations via (10), which can be identified by the relationship between the Z-Y and Z-W associations via (11). Because of this key observation, h is referred to as the confounding bridge function in [12]. The functional form of h is implied by (1) and (2). Once h is identified, it follows from (10) that $E[Y(a)] = E_U\{E[Y \mid A = a, U]\} = E[h(W, A = a)]$. In practice, one may assume a familiar linear model about the functional form of h that satisfies (10), such as

$$h(W, A; \theta) = \theta_0 + \theta_A A + \theta_W W. \tag{12}$$

Then under Assumption 7, $\theta$ can be identified by the population moment equation $E[g(A, Z)\{Y - h(W, A; \theta)\}] = 0$ using the generalized method of moments (GMM) [74]. With $\theta$ identified, the ATE is given by

$$\mathrm{ATE} = E[h(W, A = 1; \theta)] - E[h(W, A = 0; \theta)]. \tag{13}$$

A simple version of the above GMM procedure can be realized via a two stage least squares procedure as follows [12]:

Stage I: regress W on A and Z (with intercept), and obtain the fitted value $\hat{W}$ as a proxy of U;

Stage II: regress Y on A (with intercept), adjusting for $\hat{W}$;

then the coefficient of A is the true causal effect $\beta^Y_A$ assuming (1) and (2). The two stage least squares approach given above provides a simple implementation of the NC method using existing and widely disseminated IV software packages such as the ivregress, ivreg, or ivreg2 command in Stata, the gmm, sem, ivpack, or AER package in R, and the SYSLIN procedure in SAS.
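The linear example can be checked numerically. The following is a minimal simulation sketch, not the authors' code. It assumes jointly Gaussian (U, Z, A), so that E[U | A, Z] is linear as in (3), a continuous treatment, and arbitrary illustrative coefficients with true causal effect equal to 2. The function names `ols`, `simulate`, `double_nc_ratio`, and `double_nc_2sls` are hypothetical. It compares the naive coefficient of A, the estimate based on (9), and the two stage least squares estimate.

```python
import numpy as np


def ols(y, *covariates):
    """Least-squares coefficients of y on an intercept plus the given covariates."""
    X = np.column_stack([np.ones_like(y), *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate(n=200_000, seed=0):
    """Draw (Y, A, W, Z) from an assumed linear Gaussian model; U stays hidden."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=n)                     # unmeasured confounder
    Z = 0.8 * U + rng.normal(size=n)           # negative control exposure
    A = 0.6 * U + rng.normal(size=n)           # treatment (continuous here)
    Y = 1.0 + 2.0 * A + 1.5 * U + rng.normal(size=n)  # true effect of A is 2.0
    W = 0.5 + 0.7 * U + rng.normal(size=n)     # negative control outcome
    return Y, A, W, Z


def double_nc_ratio(Y, A, W, Z):
    """Equation (9): delta_A^Y - delta_A^W * delta_Z^Y / delta_Z^W."""
    _, d_ya, d_yz = ols(Y, A, Z)
    _, d_wa, d_wz = ols(W, A, Z)
    return d_ya - d_wa * d_yz / d_wz


def double_nc_2sls(Y, A, W, Z):
    """Stage I: fit W on (A, Z). Stage II: regress Y on A and the fitted W."""
    b = ols(W, A, Z)
    W_hat = b[0] + b[1] * A + b[2] * Z
    return ols(Y, A, W_hat)[1]


Y, A, W, Z = simulate()
naive = ols(Y, A, Z)[1]               # confounded coefficient of A
est_ratio = double_nc_ratio(Y, A, W, Z)
est_2sls = double_nc_2sls(Y, A, W, Z)
print(f"naive: {naive:.3f}  ratio (9): {est_ratio:.3f}  2SLS: {est_2sls:.3f}")
```

In this linear setting the two negative control estimates agree in-sample, since Stage II regresses on the same column space as the regression of Y on A and Z. Standard errors are better taken from a dedicated IV or GMM routine than from the manual second-stage fit.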

Negative controls are innovative and important tools in observational studies. Development of negative control methods will encourage researchers to routinely check for evidence of confounding bias and rigorously adjust for residual confounding bias. Negative control variables are widely available in routinely collected healthcare data such as administrative claims and electronic health records data, because information on secondary treatments and outcomes beyond the primary treatment and outcome of interest is often recorded, and such secondary treatments and outcomes can potentially serve as negative controls. Therefore development of negative control methods is critical to unlocking the full potential of contemporary healthcare data and ultimately improving the validity of research findings. It is important to note that other sources of bias, such as selection bias and misclassification bias, are typical in routinely collected healthcare data. Developing negative control methods accounting for bias beyond residual confounding is thus an important area of future research.

We have specified statistical assumptions, practical strategies, and validation criteria that can be combined with subject matter knowledge to design negative control studies in Section 2. We also illustrated identification of the ATE by either fitting the observed primary and negative control outcome models or through an assumption on the unmeasured confounding mechanism followed by a simple two stage least squares procedure in Section 3. We believe that these examples can provide practical guidance on the use of negative control methods to a broader audience.

Appendix

A.1 Examples of invalid negative controls that violate some assumption

Violation 1: no arrow between U and W

There must be an arrow between U and W, because an NCO is a proxy of the unmeasured confounder. It recovers the confounding bias by reflecting variation due to U.

Violation 2: no arrow between U and Z, and Z ↛ A

The only scenario in which Z does not need to be associated with U is when Z is an instrumental variable (see first cell of Table A.1). In this case, A is a collider between Z and U, such that Z and U are marginally independent. Conditioning on a collider will create collider bias such that Z and U become conditionally dependent. The requirements about Z in Assumptions 5 and 7 are all made conditioning on A. Therefore an instrumental variable is a valid NCE.

Violation 3: Y → W

If the outcome causes the NCO, then the treatment affects the NCO via the path A → Y → W, which violates Assumption 3.

Violation 4: Z → U ← W

The direction of the arrow between U and the negative control doesn't always matter. For example, we can have Z → U, U → Z, W → U, or U → W. However, if both Z and W cause U, then U is a collider in the path Z → U ← W. In this case, conditional on U, Z and W will become associated. This violates Assumption 2.

A.2 Example of causal graphs encoding the negative control assumptions
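The collider structure Z → U ← W from Violation 4 above can be illustrated with a small simulation. The sketch below is illustrative only: it assumes a linear Gaussian model with made-up coefficients, and the helper `partial_corr` is a hypothetical name. It shows that Z and W are marginally uncorrelated but become negatively correlated once U is conditioned on, here through linear residualization.

```python
import numpy as np


def partial_corr(x, y, given):
    """Correlation of x and y after linearly regressing each on `given`."""
    G = np.column_stack([np.ones_like(given), given])
    rx = x - G @ np.linalg.lstsq(G, x, rcond=None)[0]
    ry = y - G @ np.linalg.lstsq(G, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]


rng = np.random.default_rng(1)
n = 100_000
Z = rng.normal(size=n)              # negative control exposure
W = rng.normal(size=n)              # negative control outcome, independent of Z
U = Z + W + rng.normal(size=n)      # both Z and W cause U, so U is a collider

marginal = np.corrcoef(Z, W)[0, 1]
conditional = partial_corr(Z, W, U)
print(f"corr(Z, W) = {marginal:.3f}")
print(f"partial corr(Z, W | U) = {conditional:.3f}")
```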

Below we enumerate the possible relationships among Z, A, U and among Y, W, U in Table A.1. These partial graphs can be combined into a directed acyclic graph that encodes the negative control assumptions. Grey colored graphs are invalid because of violation of key assumptions.

Table A.1: Examples of graphs for Z, A, U relationships and for W, Y, U relationships. The two pieces of graphs can be combined into a directed acyclic graph that encodes the negative control assumptions. Grey colored graphs are invalid because of violation of key assumptions.

Examples of graphs for Z, A, U relationships

Columns: Z → A (pre-treatment); A → Z (post-treatment); Z ⊥⊥ A.
Row "No arrow between U and Z (may violate Assumption 5 and 7)": Instrumental variable (IV); Violate Assumption 5 and 7; Violate Assumption 5 and 7.
Row "U → Z": Invalid IV; Post-treatment proxy of U; Surrogate of U.
Row "Z → U": May violate Assumption 2 if there is W → U.

Examples of graphs for W, Y, U relationships

Columns: W → Y(a); Y(a) → W; Y(a) ⊥⊥ W | (U, X). (violate Assumptions 3 and 4)
Row "No arrow between U and W (violate Assumption 5 and 7)": Violate Assumption 5 and 7; Violate Assumptions 3, 5, and 7; Violate Assumption 5 and 7.
Row "U → W": Violate Assumption 3.
Row "W → U": May violate Assumption 2 if there is Z → U; Violate Assumption 3.

[Graph drawings over the nodes Z, A, (U, X), Y and W, A, (U, X), Y in each cell are not reproduced.]

References

[1] John PA Ioannidis. “Why most published research findings are false”. In: PLOS Medicine

American Journal of Epidemiology

•• Marc Lipsitch, Eric J Tchetgen Tchetgen, and Ted Cohen. “Negative controls: a tool for detecting confounding and bias in observational studies”. In: Epidemiology. This paper is the first to formally define negative control exposure and outcome with conditions for bias detection as well as examples in epidemiology.

[4] Benjamin F Arnold, Ayse Ercumen, Jade Benjamin-Chung, and John M Colford Jr. “Brief report: negative controls to detect selection bias and measurement bias in epidemiologic studies”. In: Epidemiology

Journal of the American Medical Association

Biometrics

Epidemiology

Experimental Design for Biologists. Cold Spring Harbor Laboratory Press, 2014.

[9] Zhihong Cai and Manabu Kuroki. “On identifying total effects in the presence of latent variables and selection bias”. In: Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence. 2008, pp. 62–69.

[10] Lan Liu and Eric Tchetgen Tchetgen. “Regression-based Negative Control of Homophily in Dyadic Peer Effect Analysis”. In: arXiv preprint arXiv:2002.06521 (2020).

[11] Naoki Egami. “Identification of Causal Diffusion Effects Under Structural Stationarity”. In: arXiv preprint arXiv:1810.07858 (2018).

[12] • Wang Miao, Xu Shi, and Eric J Tchetgen Tchetgen. “A Confounding Bridge Approach for Double Negative Control Inference on Causal Effects”. In: (2020). In progress, a prior version can be found at https://arxiv.org/abs/1808.04945. This paper introduces the confounding bridge function that links primary and negative control outcome distributions for identification of the average treatment effect leveraging a negative control exposure.

[13] Tamar Sofer, David B Richardson, Elena Colicino, Joel Schwartz, and Eric J Tchetgen Tchetgen. “On negative outcome control of unobserved confounding as a generalization of difference-in-differences”. In: Statistical Science

International Journal of Epidemiology

Statistical Science (1990), pp. 465–472.

[16] Donald B Rubin. “Estimating causal effects of treatments in randomized and nonrandomized studies.” In: Journal of Educational Psychology

• Xu Shi, Wang Miao, and Eric J Tchetgen Tchetgen. “Multiply robust causal inference with double negative control adjustment for categorical unmeasured confounding”. In: Journal of the Royal Statistical Society: Series B (Statistical Methodology). This paper provides a general semiparametric framework for obtaining inferences about the average treatment effect under categorical unmeasured confounding and negative controls.

[18] M Alan Brookhart, Jeremy A Rassen, and Sebastian Schneeweiss. “Instrumental variable methods in comparative safety and effectiveness research”. In: Pharmacoepidemiology and Drug Safety

Journal of the American Statistical Association

Epidemiology (2006), pp. 360–372.

[21] James M Robins. “Correcting for non-compliance in randomized trials using structural nested mean models”. In: Communications in Statistics-Theory and methods

Journal of the Royal Statistical Society: Series B (Statistical Methodology)

Journal of the American Medical Association

Annals of Internal Medicine

BMC Public Health

Scientific Reports

Health Services Research

Design of observational studies. New York, NY: Springer-Verlag, 2010.

[29] Marcus R Munafò, Kate Tilling, Amy E Taylor, David M Evans, and George Davey Smith. “Collider scope: when selection bias can substantially influence observed associations”. In: International Journal of Epidemiology

Journal of the American Statistical Association

Biometrika

Epidemiology

• W Dana Flanders, Matthew J Strickland, and Mitchel Klein. “A new method for partial correction of residual confounding in time-series and other observational studies”. In: American Journal of Epidemiology. This paper develops a regression-based method taking future air pollution as a negative control exposure to reduce residual confounding bias in a time-series study on air pollution effects.

[34] Xavier de Luna, Philip Fowler, and Per Johansson. “Proxy variables and nonparametric identification of causal effects”. In: Economics Letters 150 (2017), pp. 152–154.

[35] Manabu Kuroki and Judea Pearl. “Measurement bias and effect restoration in causal inference”. In: Biometrika

•• Wang Miao, Zhi Geng, and Eric J Tchetgen Tchetgen. “Identifying causal effects with proxy variables of an unmeasured confounder”. In: Biometrika. This paper establishes sufficient conditions for nonparametric identification of the average treatment effect using double negative control.

[37] • David Madigan, Paul E Stang, Jesse A Berlin, Martijn Schuemie, J Marc Overhage, Marc A Suchard, Bill Dumouchel, Abraham G Hartzema, and Patrick B Ryan. “A systematic statistical approach to evaluating evidence from observational studies”. In: Annual Review of Statistics and Its Application. This paper provides a systematic review of challenges in observational studies and describes a data-driven approach to calculating calibrated p-values leveraging negative controls.

[38] Martijn J Schuemie, Patrick B Ryan, William DuMouchel, Marc A Suchard, and David Madigan. “Interpreting observational studies: why empirical calibration is needed to correct p-values”. In: Statistics in Medicine

Statistics in Medicine

Proceedings of the National Academy of Sciences

Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences

American Journal of Epidemiology

Pediatrics

International Journal of Epidemiology

Hypertension

Basic & Clinical Pharmacology & Toxicology

Scandinavian Journal of Public Health

Drug and Alcohol Dependence 139 (2014), pp. 159–163.

[49] Wang Miao and Eric J Tchetgen Tchetgen. “Invited commentary: bias attenuation and identification of causal effects with multiple negative controls”. In: American Journal of Epidemiology

American Journal of Epidemiology (2020).

[51] Thomas Lumley and Lianne Sheppard. “Assessing seasonal confounding and model selection bias in air pollution epidemiology using positive and negative control analyses”. In: Environmetrics

New England Journal of Medicine

Digestive Diseases and Sciences

• Mette Lise Lousdal, Timothy L Lash, W Dana Flanders, M Alan Brookhart, Ivar Sønbø Kristiansen, Mette Kalager, and Henrik Støvring. “Negative controls to detect uncontrolled confounding in observational studies of mammographic screening comparing participants and non-participants”. In: International Journal of Epidemiology (2020). This paper uses both negative control exposure and negative control outcome to detect residual confounding in an observational study of mammographic screening comparing participants and non-participants.

[55] Lianne Sheppard, Drew Levy, Gary Norris, Timothy V Larson, and Jane Q Koenig. “Effects of ambient air pollution on nonelderly asthma hospital admissions in Seattle, Washington, 1987-1994”. In: Epidemiology (1999), pp. 23–30.

[56] E Cuyler Hammond and Daniel Horn. “The relationship between human smoking habits and death rates: a follow-up study of 187,766 men”. In: Journal of the American Medical Association

British Medical Journal

Journal of the National Cancer institute

The Lancet

Epidemiology

Epidemiology (Cambridge, Mass.)

Epidemiology

American Journal of Epidemiology

https://biostats.bepress.com/harvardbiostat/paper192/.

[68] Adam Glynn and Nahomi Ichino. “Generalized Nonlinear Difference-in-Difference-in-Differences”. In: V-Dem Working Paper 90 (2019). Available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3410888.

[69] Eric Tchetgen Tchetgen. “The control outcome calibration approach for causal inference with unobserved confounding”. In: American Journal of Epidemiology

Biostatistics

Biostatistics

• Jingshu Wang, Qingyuan Zhao, Trevor Hastie, and Art B Owen. “Confounder adjustment in multiple hypothesis testing”. In: Annals of Statistics. This paper unifies unmeasured confounding adjustment methods in multiple hypothesis testing and provides theoretical guarantees for these methods.

[73] Whitney K Newey and James L Powell. “Instrumental variable estimation of nonparametric models”. In: Econometrica