Sensitivity Analysis for Unmeasured Confounding via Effect Extrapolation
Wen Wei Loh and Stijn Vansteelandt

Department of Data Analysis, Ghent University, Gent, Belgium; Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium; Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, United Kingdom

February 4, 2021
Abstract
Inferring the causal effect of a non-randomly assigned exposure on an outcome requires adjusting for common causes of the exposure and outcome to avoid biased conclusions. Notwithstanding the efforts investigators routinely make to measure and adjust for such common causes (or confounders), some confounders typically remain unmeasured, raising the prospect of biased inference in observational studies. Therefore, it is crucial that investigators can practically assess their substantive conclusions' relative (in)sensitivity to potential unmeasured confounding. In this article, we propose a sensitivity analysis strategy that is informed by the stability of the exposure effect over different, well-chosen subsets of the measured confounders. The proposal entails first approximating the process for recording confounders to learn about how the effect is potentially affected by varying amounts of unmeasured confounding, then extrapolating to the effect had hypothetical unmeasured confounders been additionally adjusted for. A large set of measured confounders can thus be exploited to provide insight into the likely presence of unmeasured confounding bias, albeit under an assumption about how data on the confounders are recorded. The proposal's ability to reveal the true effect and ensure valid inference after extrapolation is empirically compared with existing methods using simulation studies. We demonstrate the procedure using two different publicly available datasets commonly used for causal inference.

Keywords: Causal inference, Hidden bias, Observational studies, Residual confounding, Uncontrolled confounding, Unobserved confounding

INTRODUCTION
When inferring the causal effect of a non-randomly assigned (point) exposure on an outcome, common causes (i.e., confounders) of both exposure and outcome that are unadjusted for can induce associations that lead to biased estimates. Investigators therefore strive to measure as many (pertinent) confounders as possible when analyzing data from observational studies, in efforts to eliminate such biases due to unmeasured confounding. But in most realistic settings, some confounders remain unadjusted for, raising the prospect of such biases being merely reduced, but not entirely eliminated. For example, there may be either confounders that are infeasible or impractical to observe or record, or (measured) confounders excluded by routine variable selection techniques. It is therefore crucial that investigators can practically assess the relative stability of their conclusions to dormant confounding that remains unadjusted for.

PROPOSED SENSITIVITY ANALYSIS STRATEGY
In this paper, we propose a novel sensitivity analysis strategy to assess how different the conclusions would be after removing unmeasured confounding. To understand the proposal, consider the following thought experiment based on an illustration presented later. Suppose that you were given data from a simple randomized trial on a (dichotomous) AIDS therapeutic treatment, an outcome measuring patients' CD4 T cell count, and a collection of baseline covariates. However, suppose that you were not told that the data originated from a randomized experiment. Then upon analyzing the data, you would observe that regardless of the covariates adjusted for, roughly the same average treatment effect estimate is obtained. The relative stability of the effect estimates, across different adjustment sets, would allow you to gain confidence that the data might have originated from a randomized experiment. This confidence would grow as more and more covariates (beyond the initial collection) are subsequently recorded, and you keep observing a stability in the average treatment effect estimate that is maintained over larger and larger adjustment sets.

What we learn from the above thought experiment is that a large set of measured covariates can be exploited to provide insight into the likely presence of unmeasured confounding bias, albeit under an assumption about how the confounders are recorded. In particular, we assume that under repeated sampling, covariates are chosen (from a possibly infinitely large collection) according to some sampling mechanism, such that each confounder that simultaneously influences exposure and outcome has a positive (non-zero) probability, or likelihood, of being recorded (and adjusted for). Confounders that are unadjusted for would thus induce unmeasured confounding, and result in biased effects.
But if that sampling mechanism is well understood, then the bias due to unmeasured confounding can be eliminated, by extrapolating the observable behavior of the effects adjusting for the measured confounders to recover the (true) effect had the entire collection of confounders been adjusted for. Indeed, suppose for instance that the measured confounders form a random subset of all confounders, where each confounder has an equal probability of being recorded. Our proposal mimics the sampling mechanism as follows. Starting with the set of measured confounders, randomly select a confounder and eliminate it from adjustment so that it can be considered to be unmeasured. Assuming a sufficiently large sample size so that sampling variability can be ignored, calculate the (population-averaged) exposure effect adjusting for only the retained confounders. Repeat this process of intentionally eliminating a single confounder from adjustment, one at a time, until no confounders are adjusted for. We can now probe how the (biased) effects change with different amounts of unmeasured confounding; furthermore, we can extrapolate to the (unbiased) effect adjusting for the entire collection of confounders. In practice, however, because the sampling mechanism through which confounders are recorded for adjustment is less well understood, we propose a more heuristic approach to approximate the (non-random) process for recording confounders.
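This random-elimination scheme can be sketched in a toy simulation. Everything below (the data-generating values, the sample size, and the use of ordinary least squares in place of a doubly robust estimator) is a hypothetical illustration, not the paper's actual procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
n, J = 5000, 6  # hypothetical sample size and number of confounders

# Each of the J confounders influences both the binary exposure A and outcome Y.
L = rng.normal(size=(n, J))
A = rng.binomial(1, 1 / (1 + np.exp(-0.4 * L.sum(axis=1))))
Y = 2.0 * A + L @ np.linspace(1.0, 0.2, J) + rng.normal(size=n)

def adjusted_effect(cols):
    """OLS coefficient of A on Y, adjusting for the confounders in `cols`."""
    X = np.column_stack([np.ones(n), A] + [L[:, c] for c in cols])
    return np.linalg.lstsq(X, Y, rcond=None)[0][1]

# Mimic a sampling mechanism in which each confounder is equally likely to
# be recorded: eliminate one randomly chosen confounder at a time.
remaining = list(range(J))
effects = [adjusted_effect(remaining)]
while remaining:
    remaining.remove(rng.choice(remaining))
    effects.append(adjusted_effect(remaining))

print(effects[0])   # near the true effect 2.0 (all confounders adjusted for)
print(effects[-1])  # confounded: no confounders adjusted for
```

Extrapolating the trend in `effects` back toward "the entire collection adjusted for" is what the proposal formalizes.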
SENSITIVITY ANALYSIS VIA EFFECT EXTRAPOLATION

Intentionally eliminating measured covariates from adjustment
The proposal proceeds by eliminating measured covariates one at a time to intentionally induce unmeasured confounding. Ideally, the order of elimination should be determined using information about the investigators' process for recording confounders. But such information may be difficult to precisely quantify in practice. In this paper, we will thus proceed under the assumption that the covariates exerting the strongest impact on the effect are prioritized to be recorded (and adjusted for) first. We propose a data-adaptive approach to mimic such a process. For simplicity we will initially ignore any sampling variability. Starting with the set of measured covariates, eliminate the covariate that causes the smallest change in the (population-averaged) exposure effect when unadjusted for, so that the covariate can be considered as unmeasured. Repeat this until no covariates remain for adjustment. Under this assumed process, covariates that are eliminated earlier have lower priority for confounding adjustment and thus the weakest impact on the effect.

To aid visualizing the impact of unmeasured confounding on the effect, we partition the space of all possible covariate subsets into orbits, where the $j$-th orbit comprises all subsets with $j+1$ covariates, including an intercept. Let $J$ denote the total number of measured covariates, so that there are $J+1$ different orbits. We briefly introduce the notation as follows. In a sample of size $n$, for individual $i = 1, \ldots, n$, denote the binary exposure by $A_i$ and the outcome of interest by $Y_i$. Let $Y_i^a$ denote the potential outcome for $Y_i$ if, possibly counter to fact, individual $i$ had been assigned to exposure level $A_i = a$. In this paper, our interest is in the marginal exposure effect, defined as $\psi = \mathrm{E}(Y^1) - \mathrm{E}(Y^0)$. The first part proceeds by repeating the following steps for each orbit indexed by $j = J, \ldots, 1$:

1. Let $L_{j+1}$ denote the subset of covariates remaining in the $(j+1)$-th orbit.
When $j = J$, let $L_{J+1}$ denote the full set containing all measured covariates and the constant (intercept) 1. Denote each of the covariates in $L_{j+1}$ by $L_{j+1,k}$, $k = 1, \ldots, j$. These $j$ covariates are therefore candidates to be eliminated from confounding adjustment in the $j$-th orbit.

2. For $k = 1, \ldots, j$, calculate the exposure effect estimator conditional on the covariates in $L_{j+1}$ excluding the single covariate $L_{j+1,k}$, i.e., $(L_{j+1} \setminus L_{j+1,k})$, which we denote simply by $\hat{\psi}_{j+1,k}$.

3. Let $\psi_{j+1,k} = \mathrm{E}(\hat{\psi}_{j+1,k})$ denote the expected value of the effect estimator, and $\psi_{j+1} = \mathrm{E}(\hat{\psi}_{j+1})$ denote the expected value adjusting for the covariates in $L_{j+1}$. We will first explain how to proceed when the true values of the (asymptotic) expectations $\psi_{j+1,k}$ and $\psi_{j+1}$ are known, so that the true bias caused by each candidate confounder being unadjusted for can be determined exactly. Let $k^*$ denote the index of the candidate confounder that yields the smallest (squared) magnitude of the following difference:
$$k^* = \arg\min_k \left(\psi_{j+1,k} - \psi_{j+1}\right)^2. \qquad (1)$$
Define the subset of remaining covariates in the $j$-th orbit to be $L_j = (L_{j+1} \setminus L_{j+1,k^*})$. Denote the effect estimator that adjusts for only the (retained) confounders in $L_j$ by $\hat{\psi}_j = \hat{\psi}_{j+1,k^*}$.

Repeating the steps above for $j = J, \ldots,$
1, returns a sequence of (nested) covariate subsets $L_{J+1} \supset \ldots \supset L_1$, where a single, different measured covariate is dropped in each orbit, relative to those in the previous (larger) orbit. Each subset indexes an effect, so that the sequence of effect estimators, $\hat{\psi}_{J+1}, \ldots, \hat{\psi}_1$, quantifies the impact of unmeasured confounding induced by eliminating covariates from adjustment in turn.

The assumption that $\psi_{j+1,k}$ and $\psi_{j+1}$ are known when evaluating the criterion (1) can only be considered to hold approximately in very large samples where sampling error can be ignored. Otherwise, estimators are needed to approximate (1). Asymptotically unbiased estimators may nonetheless suffer from inaccuracies in finite samples due to sampling variability, so that merely plugging in the (sample) estimators for the (population) effects can produce incorrect results. To see why, consider a confounder that is strongly associated with exposure, but weakly associated with outcome, so that it causes only a small (true) bias when unadjusted for. But adjusting for this confounder reduces the precision of the effect estimator, so that the observable change in the estimated effect may be deceptively large relative to the true bias when the confounder is eliminated, simply due to sampling variability. The sampling uncertainty of the effect estimators can be acknowledged to more accurately assess the true biases due to unmeasured confounding as follows.
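The orbit-by-orbit elimination in steps 1–3 can be sketched as follows, ignoring for now any variance correction. All quantities here are plug-in OLS estimates on hypothetical data, not the doubly robust estimator the paper employs:

```python
import numpy as np

rng = np.random.default_rng(1)
n, J = 4000, 5
L = rng.normal(size=(n, J))
A = rng.binomial(1, 1 / (1 + np.exp(-L @ np.array([0.5, 0.4, 0.3, 0.2, 0.1]))))
Y = 1.5 * A + L @ np.array([1.0, 0.8, 0.6, 0.4, 0.2]) + rng.normal(size=n)

def effect(cols):
    """Adjusted exposure effect (OLS plug-in for the population effect)."""
    X = np.column_stack([np.ones(n), A] + ([L[:, list(cols)]] if cols else []))
    return np.linalg.lstsq(X, Y, rcond=None)[0][1]

# In each orbit, drop the covariate whose elimination changes the adjusted
# effect the least, mimicking criterion (1).
current = list(range(J))
sequence = [effect(current)]
while current:
    psi_full = effect(current)
    k_star = min(current,
                 key=lambda k: (effect([c for c in current if c != k]) - psi_full) ** 2)
    current.remove(k_star)
    sequence.append(effect(current))

print([round(psi, 3) for psi in sequence])  # drifts away from 1.5 as covariates drop
```

The weakest confounders tend to be eliminated first under this greedy rule, matching the assumed recording process.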
Let $\hat{\psi}_j$ and $\hat{\psi}_{j'}$ denote the effect estimators from two different orbits, e.g., $j$ and $j'$. The expectation of the squared difference between the estimators can thus be decomposed as:
$$
\begin{aligned}
\mathrm{E}\left\{\left(\hat{\psi}_j - \hat{\psi}_{j'}\right)^2\right\}
&= \mathrm{E}\left[\left\{\hat{\psi}_j - \hat{\psi}_{j'} - (\psi_j - \psi_{j'}) + (\psi_j - \psi_{j'})\right\}^2\right] \\
&= \mathrm{V}\left(\hat{\psi}_j - \hat{\psi}_{j'}\right) + (\psi_j - \psi_{j'})^2 + 2\,\mathrm{E}\left[\left\{\hat{\psi}_j - \hat{\psi}_{j'} - (\psi_j - \psi_{j'})\right\}(\psi_j - \psi_{j'})\right] \\
&= \mathrm{V}\left(\hat{\psi}_j - \hat{\psi}_{j'}\right) + (\psi_j - \psi_{j'})^2,
\end{aligned}
$$
where $\mathrm{V}(X)$ denotes the asymptotic variance of $X$. The last equality follows from the asymptotic unbiasedness of $\hat{\psi}_j$ and $\hat{\psi}_{j'}$. By considering a squared difference between the effects in (1), the squared difference between the corresponding estimators is no longer an unbiased estimator, with bias proportional to the (population) variance of the difference between the effect estimators. The "debiased" squared difference between the effects is thus
$$(\psi_j - \psi_{j'})^2 = \mathrm{E}\left\{\left(\hat{\psi}_j - \hat{\psi}_{j'}\right)^2\right\} - \mathrm{V}\left(\hat{\psi}_j - \hat{\psi}_{j'}\right).$$
In practice, values of (1) can be calculated by plugging in the sample estimator for the population mean, and the (asymptotic) variance estimator for the population variance; i.e.,
$$\left(\hat{\psi}_{j+1,k} - \hat{\psi}_{j+1}\right)^2 - \hat{\mathrm{V}}\left(\hat{\psi}_{j+1,k} - \hat{\psi}_{j+1}\right).$$
When this quantity is non-positive in practice, we will simply set its value to zero; when there are multiple covariates with the same zero value, we will randomly select a covariate among these to be eliminated.
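The plug-in criterion with the variance correction and zero-truncation reduces to a small helper; the numeric values below are hypothetical:

```python
def debiased_sq_diff(psi_k, psi_full, var_diff_hat):
    """Debiased squared change in the effect when one covariate is dropped:
    the squared difference of the two estimates minus the estimated variance
    of their difference, truncated at zero when the correction dominates."""
    return max((psi_k - psi_full) ** 2 - var_diff_hat, 0.0)

# A covariate strongly tied to exposure but weakly tied to outcome can show a
# sizeable estimated change that is mostly sampling noise:
print(debiased_sq_diff(0.93, 1.00, 0.004))  # small residual squared bias
print(debiased_sq_diff(0.93, 1.00, 0.006))  # truncated to zero
```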
The sampling variability of the estimates is thus recognized (and removed) when assessing the potential bias caused by eliminating each candidate confounder from adjustment. Because these sample approximations can be imperfect, we defer to future work investigating other (computationally intensive) methods, such as cross-fitting, to more precisely estimate the true changes in the population-level effects between orbits.

To facilitate comparing effect estimators that condition on different confounders across orbits when evaluating non-collapsible effect measures, we recommend focusing on the same (marginal) estimand to avoid falsely interpreting changes in the estimated effects due to non-collapsibility as a result of the eliminated confounders' influence on the effect. We will therefore employ an effect estimator based on doubly robust standardization to ensure that the selected sequence of confounders is determined using a collapsible measure. The differences between the exposure effect estimators, and the associated variances, can be consistently estimated under settings with (non-)linear parametric regression models for the exposure and the outcome. This approach delivers an unbiased estimator if either the outcome or the exposure model is correctly specified, without amplifying biases that may arise due to the misspecified model. Furthermore, the (asymptotic) variance estimator needed to calculate (1) can be derived in closed form for computational efficiency. We describe how to calculate this quantity, including the effect estimator in a given orbit, in Appendix A. In principle, any criterion can be used in place of (1) to eliminate the confounders in turn toward mimicking (or realizing) the process of recording confounders for adjustment. For example, theoretical background or study design knowledge, or empirical covariate prioritization or importance measures, can be exploited.

Accounting for sampling uncertainty in the effect estimators
We have sought to construct the (ideal) sequence in which covariates are eliminated from adjustment in turn using the true causal effects in (1). But the inherent sampling uncertainty of the estimators only permits "guessing" what the sequence should be. To acknowledge the uncertainty, we will give the data several opportunities to approximate the (true) process of prioritizing covariates for adjustment, by perturbing the sequence to express our uncertainty about the effects while investigating unknown confounding.

We describe how to construct a perturbed sequence by modifying steps 2 and 3 of the proposed procedure, and defer technical details to Appendix A. In step 2, for each candidate confounder indexed by $k$, determine the maximum likelihood estimates (MLEs) of the (coefficients in the) exposure and outcome models used to calculate the effect estimator $\hat{\psi}_{j+1,k}$. Then randomly draw a value of the estimated outcome model coefficients from their joint sampling distribution. Calculate the perturbed effect estimator using the randomly drawn coefficient values, which we denote by $\tilde{\psi}_{j+1,k}$. Hence $\tilde{\psi}_{j+1,k}$ varies from $\hat{\psi}_{j+1,k}$ on the basis of sampling uncertainty, where the former is based on a random draw from the estimated coefficients' sampling distribution, and the latter is based on the MLE of the coefficients. In step 3, the confounder to be eliminated from adjustment can then be determined using the perturbed estimators $\tilde{\psi}_{j+1,k}$ and $\tilde{\psi}_{j+1}$, in place of $\hat{\psi}_{j+1,k}$ and $\hat{\psi}_{j+1}$; in other words, set
$$k^* = \arg\min_k \left(\tilde{\psi}_{j+1,k} - \tilde{\psi}_{j+1}\right)^2 - \hat{\mathrm{V}}\left(\tilde{\psi}_{j+1,k} - \tilde{\psi}_{j+1}\right).$$
The values of the resulting sequence of perturbed effect estimators, $\tilde{\psi}_{J+1}, \ldots, \tilde{\psi}_1$, will thus differ from the observed sequence (which is based on the MLEs), even when the covariates are eliminated in the same order.
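A single perturbation of step 2 amounts to redrawing the outcome-model coefficients from a normal approximation to their sampling distribution. A minimal sketch, with hypothetical MLEs and covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(2)

def perturbed_effect(beta_hat, cov_hat, effect_fn, rng):
    """Draw coefficients from a normal approximation to their (asymptotic)
    sampling distribution and recompute the effect with the drawn values."""
    beta_tilde = rng.multivariate_normal(beta_hat, cov_hat)
    return effect_fn(beta_tilde)

# Toy outcome model Y = b0 + b1*A + b2*L: the marginal effect is b1 itself,
# so the perturbed effects scatter around the MLE of b1.
beta_hat = np.array([0.2, 1.5, 0.7])   # hypothetical MLEs
cov_hat = np.diag([0.01, 0.02, 0.01])  # hypothetical estimated covariance
draws = [perturbed_effect(beta_hat, cov_hat, lambda b: b[1], rng)
         for _ in range(500)]
print(np.mean(draws), np.std(draws))  # centred near 1.5, spread near sqrt(0.02)
```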
A Monte Carlo sampling distribution can be readily obtained by constructing, e.g., $B = 500$ perturbed sequences.

Ideally, when the precise probability model for recording confounders is known, the perturbed effect estimators should be constructed using random draws from the implied joint sampling distribution of the model coefficients across all orbits. The reduced variability of the effect estimators due to the sequence for eliminating covariates being known would likely yield a trajectory that fluctuates less between orbits. A Bayesian approach may facilitate estimating such a sampling distribution, e.g., in terms of the (marginal) posterior probabilities of adjusting for each covariate. Because we have focused on the conceptual development of the proposed sensitivity analysis in this paper, more complex approaches to sample sequences of effect estimators from their joint distribution are deferred to future work.

Extrapolating to the effect adjusting for unmeasured confounders
In the second part of the proposed strategy, we extrapolate each unique sequence of (biased) effect estimators to the predicted (unbiased) effect, had additional hypothetical confounders been further adjusted for. The assumption that there exists a (possibly infinitely) large collection of covariates, among which only a subset is revealed for confounding adjustment, and no covariate is precluded from being recorded (and adjusted for) under some unknown sampling mechanism, is therefore necessary. In particular, under repeated observed samples we may each time observe a different set of measured covariates (where some are possibly sampled with unit probability, but none are sampled with zero probability), so that in the long run, across many repeated samples, even the weakest confounders (that are most likely to remain unmeasured) can be adjusted for.

We will fit a natural cubic spline to each (perturbed) sequence of exposure effect estimators, e.g., $\hat{\psi}_j$, $j = 1, \ldots, J+1$, with the number of measured covariates adjusted for ($0, \ldots, J$, after excluding the intercept) as a predictor. Natural cubic splines permit flexibly evaluating the trajectory of the exposure effect estimator as unmeasured confounding is reduced one covariate at a time. The predicted value of the effect estimator, had a given number of hypothetical confounders been further adjusted for, can then be (linearly) extrapolated to. Suggestions on fitting natural cubic splines in practice are described in Appendix B. This second part of the strategy is inspired by the Simulation Extrapolation (SIMEX) approach for different settings with measurement error in regression predictors.

To express uncertainty, a $100(1-\alpha)\%$ Wald confidence interval (CI) may be constructed for each sequence of effect estimators; i.e., $\hat{\psi}_j \pm z_{1-\alpha/2}\sqrt{\hat{\mathrm{V}}(\hat{\psi}_j)}$, $j = 1, \ldots, J+1$, where $z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of the standard normal distribution, yielding $B$ different extrapolated $100(1-\alpha)\%$ CIs.
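The extrapolation step can be sketched as fitting a natural cubic spline to a (hypothetical) effect trajectory and extending it linearly past the boundary, since natural splines are linear outside the boundary knots. The trajectory below is invented for illustration; the paper's practical recommendations are in Appendix B.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def extrapolate_effect(n_covariates, effects, n_extra):
    """Fit a natural cubic spline to the effect trajectory and extend it
    linearly past the largest adjustment set by n_extra hypothetical
    confounders, using the spline's value and slope at the boundary."""
    cs = CubicSpline(n_covariates, effects, bc_type="natural")
    x_max = n_covariates[-1]
    return float(cs(x_max) + cs(x_max, 1) * n_extra)

n_cov = np.arange(7.0)                        # 0, ..., 6 covariates adjusted for
psi_hat = 2.0 - 1.5 * np.exp(-0.6 * n_cov)    # a stabilizing toy trajectory
print(extrapolate_effect(n_cov, psi_hat, 3))  # predicted effect, 3 extra confounders
```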
For a given number of hypothetical confounders, the extrapolated CIs across all perturbed sequences (including the observed sequence) can be combined into an uncertainty interval. Let the lower (upper) endpoint of the uncertainty interval be the 2.5 (97.5) percentile among the lower (upper) endpoints of all the extrapolated CIs. Using these (symmetric) percentiles reduces the uncertainty intervals' susceptibility to (i) perturbations that yield extreme CIs, and (ii) overly conservative conclusions, as compared to using the minimum (and maximum) of the extrapolated CIs. We show empirically that the uncertainty intervals cover the true effect either at, or above, the nominal level of the constituent CIs under a variety of settings in simulation studies; details are described in Appendix C.

Instead of extrapolating to a given (additional) number of hypothetical confounders, another possibility is to extrapolate to the (smallest) number of confounders for the uncertainty interval to either include or exclude zero, so that the statistical significance of the effect estimate changes. The extent of unmeasured confounding required to change the conclusions can thus be quantified in terms of the number of hypothetical (unmeasured) confounders, rather than the strength of a single hypothetical confounder. We develop this interpretation using the illustrations in the next section.
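Combining the extrapolated CI endpoints into an uncertainty interval reduces to taking percentiles across perturbations; the endpoint values below are simulated placeholders, not results from the paper:

```python
import numpy as np

def uncertainty_interval(ci_lowers, ci_uppers):
    """Uncertainty interval from the extrapolated CIs across perturbed
    sequences: the 2.5 percentile of the lower endpoints and the 97.5
    percentile of the upper endpoints."""
    return float(np.percentile(ci_lowers, 2.5)), float(np.percentile(ci_uppers, 97.5))

rng = np.random.default_rng(4)
lowers = rng.normal(0.02, 0.05, size=500)  # hypothetical lower CI endpoints
uppers = lowers + 0.30                     # hypothetical upper CI endpoints
lo, hi = uncertainty_interval(lowers, uppers)
print(lo, hi)
print(lo > lowers.min(), hi < uppers.max())  # less extreme than the min/max
```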
ILLUSTRATIONS
We first elaborate on the thought experiment in the introduction to demonstrate how the proposal can reveal a stability in the effect when unmeasured confounding is absent. We use data from an AIDS randomized clinical trial, and defer details to Appendix D. Suppose that knowledge about treatment being randomized, so that confounding of the exposure-outcome relation was unlikely, was hidden from us. Adjusting for different covariates did not greatly affect the exposure effect estimates, as shown by the relatively stable trajectory in Figure 2. Across all perturbations, the 95% CIs for the exposure effect excluded zero, even when none of the measured covariates were adjusted for. Based on the extrapolations, adjusting for additional hypothetical confounders appeared unlikely to yield an uncertainty interval that would include zero, suggesting an insensitivity to hypothetical unmeasured confounding.

We next apply the proposal to an observational study on the effectiveness of Right Heart Catheterization (RHC) in the initial care of critically ill patients. The exposure variable was defined to be whether or not a patient received an RHC within 24 hours of admission. A binary outcome was defined based on whether a patient died at any time up to 180 days since admission. We considered a reduced dataset with 2707 participants having complete data on 72 covariates (one covariate that was singular in the reduced dataset was dropped), so that the exposure and outcome models with all covariates can be fitted.

The sequence of effect estimators constructed by eliminating covariates one at a time following the criterion in (1) is plotted (as empty circles) in Figure 1. Adjusting for all the measured covariates yielded an estimated effect of 0.07, with a 95% CI excluding zero. We constructed $B = 500$ perturbed sequences of the estimators and their corresponding 95% CIs. Natural cubic splines were fitted to each (perturbed) sequence of the estimators and endpoints of the CI, and the predicted effects, and uncertainty intervals, extrapolated to. Almost all the predicted effects across the perturbations remained positive after adjusting for unmeasured confounding; e.g., 4% (8%) of the predicted effects were negative after adjusting for one (nine) hypothetical confounder(s). The high sampling uncertainty in the predicted effects can be observed from their 2.5 and 97.5 percentiles (vertical lines) extending beyond the endpoints of the uncertainty intervals (filled inverted triangles) in Figure 1. Furthermore, the uncertainty intervals included zero with only one hypothetical unmeasured confounder. The conclusion of a (statistically significant) positive effect adjusting for only the measured covariates is thus likely to change to an effect indistinguishable from zero after accounting for unmeasured confounding. Our findings are in line with Lin et al.: in spite of the efforts made by the study investigators to ascertain and adjust for all the known risk factors, the conclusion that there was a (barely statistically significant) harmful exposure effect of RHC is sensitive to unmeasured confounding.

COMPARISONS WITH EXISTING APPROACHES
Existing sensitivity analyses for unmeasured confounding typically encode the association between a single hypothetical confounder with either the exposure or the outcome (or both) as (separate) sensitivity parameters. Because the observed data do not identify the sensitivity parameters, different "reference" values are specified, typically by selecting a single measured confounder (among possibly many) as a proxy for the hypothetical confounder, then judging the (minimum) extent that inference is affected. We defer reviewing (other) existing methods to Appendix E. In contrast to these approaches, we exploit the richness of all available joint information on the measured covariates, by probing the effects across different amounts of intentionally induced unmeasured confounding using different adjustment sets, without being limited to particular interpretations and values of the (unidentifiable) sensitivity parameters. This can turn out to be informative. For example, suppose that a researcher has recorded all confounders so that there truly exists no unmeasured confounding, but this fact is unknown to them. Existing methods would nonetheless imply that unmeasured confounding remains a possibility in the worst-case scenario, which may inflate the degree of uncertainty using subjective judgment. In contrast, our proposal exploits features of the available data and study design to better inform the plausibility of unmeasured confounding. In particular, if all confounders are indeed adjusted for, and additional "redundant" covariates that are associated with only either exposure or outcome are subsequently recorded (Greenland describes such an example), then our proposed analysis would reveal little or no sensitivity of inference due to unmeasured confounding.
We acknowledge the heuristic nature of our proposal, but argue that all sensitivity analyses for unmeasured confounding are inherently heuristic, because the observed data do not identify the causal effect under such settings; see, e.g., Cusson and Infante-Rivard and the ensuing discussions.

Our proposal therefore differs from existing approaches in a few aspects. (i) No (unidentifiable) sensitivity parameters are required. (ii) No assumptions about the (arbitrary) distribution or scale of the hypothetical confounder(s), or whether they amplify or nullify the effect estimate, need to be imposed. (iii) The exposure and outcome can be continuous or non-continuous, so that non-linear models can be readily accommodated for estimating the exposure effect. (iv) The sensitivity of exposure effects to the possibility of (un)measured confounding can be concisely inspected using both graphical and numerical methods, as we demonstrated in the illustrations. (v) The proposed strategy is not limited to the average exposure effect, and can be readily applied to assess the sensitivity of any (scalar) causal effect to unmeasured confounding, such as the effect of exposure among the (un)treated. In principle, the strategy can be readily generalized in future work to more complex settings, such as heterogeneous exposure effects and nonparametric estimation methods. (vi) In contrast to existing approaches, the proposal works best when a large collection of covariates has been recorded, and background or empirical information on both measured and unmeasured confounders is available, so that their joint influence on the exposure effect can be meticulously examined.

REFERENCES

[1] M. Bonvini and E. H. Kennedy. Sensitivity analysis via the proportion of unmeasured confounding. arXiv preprint arXiv:1912.02793, 2019.

[2] M. A. Brookhart, S. Schneeweiss, K. J. Rothman, R. J. Glynn, J. Avorn, and T. Stürmer. Variable selection for propensity score models.
American Journal of Epidemiology, 163(12):1149–1156, 2006.

[3] H. K. Brown, J. G. Ray, A. S. Wilton, Y. Lunsky, T. Gomes, and S. N. Vigod. Association between serotonergic antidepressant use during pregnancy and autism spectrum disorder in children. JAMA, 317(15):1544–1552, 2017.

[4] N. B. Carnegie, M. Harada, and J. L. Hill. Assessing sensitivity to unmeasured confounding using a simulated potential confounder. Journal of Research on Educational Effectiveness, 9(3):395–420, 2016. doi: 10.1080/19345747.2015.1078862.

[5] N. B. Carnegie, M. Harada, V. Dorie, and J. Hill. treatSens: A Package to Assess Sensitivity of Causal Analyses to Unmeasured Confounding. New York, NY, August 2020. URL https://github.com/vdorie/treatSens. R package version 3.0.

[6] C. Cinelli and C. Hazlett. Making sense of sensitivity: extending omitted variable bias. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(1):39–67, 2020. doi: 10.1111/rssb.12348.

[7] C. Cinelli, J. Ferwerda, and C. Hazlett. sensemakr: Sensitivity Analysis Tools for Regression Models, 2020. URL https://CRAN.R-project.org/package=sensemakr. R package version 0.1.3.

[8] A. F. Connors, T. Speroff, N. V. Dawson, C. Thomas, F. E. Harrell, D. Wagner, N. Desbiens, L. Goldman, A. W. Wu, R. M. Califf, et al. The effectiveness of right heart catheterization in the initial care of critically ill patients. Journal of the American Medical Association, 276(11):889–897, 1996.

[9] J. R. Cook and L. A. Stefanski. Simulation-extrapolation estimation in parametric measurement error models. Journal of the American Statistical Association, 89(428):1314–1328, 1994.

[10] C. M. Crainiceanu, F. Dominici, and G. Parmigiani. Adjustment uncertainty in effect estimation. Biometrika, 95(3):635–651, 2008.

[11] A. Cusson and C. Infante-Rivard. Bias factor, maximum bias and the E-value: insight and extended applications. International Journal of Epidemiology, 49(5):1509–1516, 2020. doi: 10.1093/ije/dyaa127.

[12] P. Ding and T. J. VanderWeele. Sensitivity analysis without assumptions. Epidemiology (Cambridge, Mass.), 27(3):368, 2016.

[13] V. Dorie, M. Harada, N. B. Carnegie, and J. Hill. A flexible, interpretable framework for assessing sensitivity to unmeasured confounding. Statistics in Medicine, 35(20):3453–3470, 2016. doi: 10.1002/sim.6973.

[14] M. Genbäck and X. de Luna. Causal inference accounting for unobserved confounding after outcome regression and doubly robust estimation. Biometrics, 75(2):506–515, 2019.

[15] S. Greenland. The impact of prior distributions for uncontrolled confounding and response bias: a case study of the relation of wire codes and magnetic fields to childhood leukemia. Journal of the American Statistical Association, 98(461):47–54, 2003.

[16] S. Greenland. Commentary: An argument against E-values for assessing the plausibility that an association could be explained away by residual confounding. International Journal of Epidemiology, 49(5):1501–1503, 2020. doi: 10.1093/ije/dyaa095.

[17] S. Greenland, J. M. Robins, and J. Pearl. Confounding and collapsibility in causal inference. Statistical Science, 14(1):29–46, 1999.

[18] R. H. Groenwold, I. Shofty, M. Miočević, M. Van Smeden, and I. Klugkist. Adjustment for unmeasured confounding through informative priors for the confounder-outcome relation. BMC Medical Research Methodology, 18(1):174, 2018.

[19] G. W. Imbens. Sensitivity to exogeneity assumptions in program evaluation. American Economic Review, 93(2):126–132, May 2003. doi: 10.1257/000282803321946921.

[20] D. Y. Lin, B. M. Psaty, and R. A. Kronmal. Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics, 54(3):948–963, 1998.

[21] W. Liu, S. J. Kuramoto, and E. A. Stuart. An introduction to sensitivity analysis for unobserved confounding in nonexperimental prevention research. Prevention Science, 14(6):570–580, 2013.

[22] W. W. Loh and S. Vansteelandt. Confounder selection strategies targeting stable treatment effect estimators. Statistics in Medicine, 2020. doi: 10.1002/sim.8792.

[23] L. C. McCandless and P. Gustafson. A comparison of Bayesian and Monte Carlo sensitivity analysis for unmeasured confounding. Statistics in Medicine, 36(18):2887–2901, 2017.

[24] L. C. McCandless, P. Gustafson, and A. Levy. Bayesian sensitivity analysis for unmeasured confounding in observational studies. Statistics in Medicine, 26:2331–2347, 2007.

[25] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2020.

[26] A. Richardson, M. G. Hudgens, P. B. Gilbert, and J. P. Fine. Nonparametric bounds and sensitivity analysis of treatment effects. Statistical Science, 29(4):596, 2014.

[27] J. M. Robins. [Covariance adjustment in randomized experiments and observational studies]: Comment. Statistical Science, 17(3):309–321, 2002.

[28] P. Rosenbaum. A conditional test with demonstrated insensitivity to unmeasured bias in matched observational studies. Biometrika, 2020.

[29] P. R. Rosenbaum. Sensitivity analysis for certain permutation inferences in matched observational studies. Biometrika, 74:13–26, 1987.

[30] P. R. Rosenbaum. Model-based direct adjustment. Journal of the American Statistical Association, 82(398):387–394, 1987.

[31] P. R. Rosenbaum.
Observational Studies . New York : Springer, New York, 2002.[32] P. R. Rosenbaum and D. B. Rubin. Assessing sensitivity to an unobserved binary covariate in anobservational study with binary outcome.
Journal of the Royal Statistical Society: Series B (Method-ological) , 45(2):212–218, 1983.[33] Z. Tan. A distributional approach for causal inference using propensity scores.
Journal of the Ameri-can Statistical Association , 101(476):1619–1637, 2006.[34] T. J. VanderWeele and O. A. Arah. Bias formulas for sensitivity analysis of unmeasured confoundingfor general outcomes, treatments, and confounders.
Epidemiology , 22:42–52, 2011.[35] T. J. VanderWeele and P. Ding. Sensitivity analysis in observational research: introducing the E-value.
Annals of internal medicine , 2017.[36] S. Vansteelandt and N. Keiding. Invited Commentary: G-Computation–Lost in Translation?
Ameri-can Journal of Epidemiology , 173(7):739–742, 03 2011. ISSN 0002-9262. doi: 10.1093/aje/kwq474.[37] V. Veitch and A. Zaveri. Sense and sensitivity analysis: Simple post-hoc analysis of bias due to unob-served confounding. arXiv preprint arXiv:2003.01747 , 2020.[38] J. Wakefield.
Bayesian and frequentist regression methods . Springer Science & Business Media, 2013.[39] B. D. Williamson, P. B. Gilbert, M. Carone, and N. Simon. Nonparametric variable importance as-sessment using machine learning techniques.
Biometrics , 2020. doi: https://doi.org/10.1111/biom.13392. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/biom.13392 .[40] K. Wuthrich and Y. Zhu. Omitted variable bias of lasso-based inference methods: A finite sampleanalysis.
Available at SSRN 3379123 , 2019. doi: 10.2139/ssrn.3379123. URL https://dx.doi.org/10.2139/ssrn.3379123 .[41] B. Zhang and E. J. Tchetgen Tchetgen. A semiparametric approach to model-based sensitivity analy-sis in observational studies. arXiv preprint arXiv:1910.14130 , 2019.1342] Q. Zhao, D. S. Small, and B. B. Bhattacharya. Sensitivity analysis for inverse probability weightingestimators via the percentile bootstrap.
Journal of the Royal Statistical Society: Series B (StatisticalMethodology) , 81(4):735–761, 2019. − . − . − . − . − . . . . Uncertainty intervals for RHC data
[Figure 1 axes: x-axis "Number of confounders adjusted for"; y-axis "Effect estimate".]

Figure 1: Estimators of the average exposure effect from each orbit, for the RHC data. Circles indicate the point estimates, and triangles (inverted triangles) indicate the upper (lower) endpoints of the 95% CI. Empty points indicate the sequence of effect estimates constructed by eliminating a single measured confounder from adjustment in turn. The curve corresponds to a fitted natural cubic spline. Filled points indicate the (extrapolated) predictions after adjusting for q additional unmeasured confounders. Each thin grey line corresponds to the natural cubic spline fitted to a perturbed sequence. The vertical lines indicate the symmetric 95% percentile interval of the predicted effects across all B perturbations.

A EXPOSURE EFFECT ESTIMATOR
The effect estimator based on doubly robust standardization is calculated as follows. For a binary exposure, denote the (non-linear) exposure model, conditional on the remaining confounders in the $j$-th orbit $L^j$, by $\mathrm{E}(A_i \mid L^j_i) = \Pr(A_i = 1 \mid L^j_i) = \operatorname{expit}(\alpha_j L^j_i)$, where the subscript $i$ of $L^j_i$ denotes individual $i$ and $\operatorname{expit}(x) = \exp(x)/\{1+\exp(x)\}$. Define the inverse probability of exposure weight for individual $i$ as:
$$W_{ji} = \frac{A_i}{\Pr(A_i = 1 \mid L_i = L^j_i)} + \frac{1 - A_i}{1 - \Pr(A_i = 1 \mid L_i = L^j_i)}. \tag{2}$$
The weight $W_{ji}$ is the reciprocal of the conditional probability of individual $i$ being assigned the observed exposure $A_i$ given the confounders $L^j_i$. Let $\widehat{W}_{ji}$ denote the estimated weights obtained by substituting the maximum likelihood estimators (MLE) for $\alpha_j$ in the exposure model. Fit the outcome regression model $\mathrm{E}(Y \mid A, L^j) = h^{-1}(\psi^*_j A + \beta_j L^j)$ to the observed data using the aforementioned weights, where $h^{-1}$ is the inverse of a link function $h$. (The $*$ superscript indicates a conditional effect that may differ from the marginal effect $\psi_j$.) Let $\widehat{\mathrm{E}}(Y \mid A, L^j)$ denote the fitted outcome model obtained by plugging in the MLE for $\psi^*_j$ and $\beta_j$. A doubly robust estimator of the average potential outcome $\mathrm{E}(Y^a) = n^{-1}\sum_i Y^a_i$ for $a = 0, 1$ is:
$$\widehat{\mathrm{E}}(Y^a) = n^{-1}\sum_i \left[ I\{A_i = a\}\, \widehat{W}_{ji} \left\{ Y_i - \widehat{\mathrm{E}}(Y \mid A = A_i, L^j_i) \right\} + \widehat{\mathrm{E}}(Y \mid A = a, L^j_i) \right], \tag{3}$$
where $I\{B\} = 1$ if $B$ is true and 0 otherwise. The estimator for the marginal exposure effect $\psi_j = \mathrm{E}(Y^1) - \mathrm{E}(Y^0)$ in the $j$-th orbit is therefore:
$$\widehat{\psi}_j = n^{-1}\sum_i \left[ (2A_i - 1)\, \widehat{W}_{ji} \left\{ Y_i - \widehat{\mathrm{E}}(Y \mid A = A_i, L^j_i) \right\} + \widehat{\mathrm{E}}(Y \mid A = 1, L^j_i) - \widehat{\mathrm{E}}(Y \mid A = 0, L^j_i) \right]. \tag{4}$$
When both exposure and outcome models are correctly specified, an asymptotic expansion around $\psi_j$ yields the so-called "influence function" for individual $i$ as:
$$\phi_{ji} = (2A_i - 1)\, W_{ji} \left\{ Y_i - \mathrm{E}(Y \mid A = A_i, L^j_i) \right\} + \mathrm{E}(Y \mid A = 1, L^j_i) - \mathrm{E}(Y \mid A = 0, L^j_i) - \psi_j. \tag{5}$$
Let $\widehat{\phi}_{ji}$ denote the estimated influence function, obtained by plugging in the maximum likelihood estimators for the coefficients in the exposure and outcome models, replacing $\psi_j$ with $\widehat{\psi}_j$, and substituting the population expectation with a sample average. The variance of the difference between effect estimators from two different orbits, e.g., $j$ and $k$, is consistently estimated by the sample variance (denoted by $\widehat{\mathrm{V}}$) of the corresponding difference in estimated influence functions:
$$\widehat{\mathrm{V}}\left\{ n^{1/2} \left( \widehat{\psi}_j - \widehat{\psi}_k \right) \right\} = (n-1)^{-1} \sum_i \left( \widehat{\phi}_{ji} - \widehat{\phi}_{ki} \right)^2. \tag{6}$$
Consistency and asymptotic normality of the standardized difference (1), with mean zero and variance one, follow directly from the law of large numbers and the central limit theorem. The squared difference between the (asymptotic) expectations of the effect estimators is then approximated by $(\widehat{\psi}_j - \widehat{\psi}_k)^2 - \widehat{\mathrm{V}}(\widehat{\psi}_j - \widehat{\psi}_k)$.

A perturbed effect estimator can be calculated as follows.
In step 2 of the proposed procedure, in place of calculating the effect estimator $\widehat{\psi}_{j+1,k}$ for each candidate confounder based on the maximum likelihood estimates (MLEs) of the (coefficients in the) exposure and outcome models, carry out the following steps instead:

2(a) Fit the exposure model $\Pr(A_i = 1 \mid L^{j+1,-k}_i) = \operatorname{expit}(\alpha_{j+1,-k} L^{j+1,-k}_i)$ to the observed data, where $L^{j+1,-k}_i = (L^{j+1}_i \setminus L^{j+1,k}_i)$ denotes the confounders in $L^{j+1}$ excluding the single confounder $L^{j+1,k}$ for individual $i$, and $\alpha_{j+1,-k}$ denotes the corresponding coefficients. Calculate the MLEs, denoted by, e.g., $\widehat{\alpha}_{j+1,-k}$, and the observed weights as defined in (2), by substituting $\widehat{\alpha}_{j+1,-k}$ for $\alpha_{j+1,-k}$ in the exposure model.

2(b) Fit the outcome regression model $\mathrm{E}(Y \mid A, L^{j+1,-k}) = h^{-1}(\psi^*_{j+1,-k} A + \beta_{j+1,-k} L^{j+1,-k})$ to the observed data using the observed weights from the previous step, where $h^{-1}$ is the inverse link function in a canonical generalized linear model for the outcome. Calculate the MLEs, e.g., $(\widehat{\psi}^*_{j+1,-k}, \widehat{\beta}_{j+1,-k})$, as well as the observed Fisher information matrix.

2(c) Randomly draw a value of the estimated outcome model coefficients $(\widehat{\psi}^*_{j+1,-k}, \widehat{\beta}_{j+1,-k})$ from their joint sampling distribution, which we denote simply by $G(\cdot)$. In practice, $G(\cdot)$ may be approximated by the (asymptotically valid) multivariate normal distribution with mean vector $(\widehat{\psi}^*_{j+1,-k}, \widehat{\beta}_{j+1,-k})$ and covariance matrix equal to the inverse of the observed Fisher information matrix; see, e.g., Wakefield, Section 7.6.2. Denote the randomly drawn values by $(\tilde{\psi}^*_{j+1,-k}, \tilde{\beta}_{j+1,-k})$, and let $\tilde{\mathrm{E}}(Y \mid A, L^{j+1,-k})$ denote the perturbed outcome model obtained by plugging in the randomly drawn values of the outcome model coefficients.

2(d) Calculate the effect estimator as described in the Appendix, using the observed (exposure) weights and the perturbed outcome model. Denote the resulting perturbed estimator by $\tilde{\psi}_{j+1,k}$.

The difference between $\widehat{\psi}_{j+1,k}$ and $\tilde{\psi}_{j+1,k}$ therefore lies in step 2(c): the former is based on the MLE of the outcome model coefficients, whereas the latter is based on a random draw from the estimated coefficients' sampling distribution. Calculating the perturbed effect estimators only requires fitting the exposure and outcome models once (to the observed data), which is computationally more efficient than, e.g., a nonparametric bootstrap procedure that randomly resamples observations with replacement and refits both exposure and outcome models. For computational convenience, we have considered only a perturbed outcome model. In principle, a random draw of the estimated exposure model coefficients from their sampling distribution, denoted by, e.g., $\tilde{\alpha}_{j+1,-k}$, may be made in step 2(a). The weights as defined in (2) can then be calculated by substituting $\tilde{\alpha}_{j+1,-k}$ for $\alpha_{j+1,-k}$ in the exposure model. The resulting (perturbed) weights are then used to fit the outcome model in step 2(b).
B FITTING NATURAL CUBIC SPLINES TO A TRAJECTORY OF THE EXPOSURE EFFECT ESTIMATOR

A natural cubic spline is fitted to each sequence of exposure effect estimators, e.g., $\widehat{\psi}_j$, $j = 1, \ldots, J+1$, with the number of measured covariates adjusted for ($0, \ldots, J$, after excluding the intercept) as a predictor. Natural cubic splines permit flexibly evaluating the trajectory of the exposure effect estimator as unmeasured confounding is reduced one covariate at a time. The predicted value of the effect estimator, had a given number of hypothetical confounders been further adjusted for, can then be linearly extrapolated to. The extrapolation therefore assumes that the impact of adjusting for hypothetical confounders on the causal effect satisfies the boundary conditions of the fitted spline, e.g., a zero second derivative at the boundary knots. For simplicity, natural cubic splines can be fitted with the largest number of (evenly-spaced) interior knots permitted to ensure identifiability of the spline; when there are J confounders, there may be as many as J − 1 interior knots.
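A minimal sketch of this extrapolation, using SciPy's interpolating natural cubic spline in place of the regression spline (with one knot per point the natural interpolant coincides with it inside the data range) and continuing the fit linearly beyond the boundary knot; the effect sequence below is hypothetical:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# hypothetical sequence of effect estimates as confounders are adjusted for
# one at a time: j = 0, ..., J with J = 8 measured confounders
n_adjusted = np.arange(9)
psi_seq = np.array([0.40, 0.33, 0.28, 0.25, 0.23, 0.22, 0.215, 0.212, 0.21])

# natural cubic spline: zero second derivative at the boundary knots; with one
# knot per point it interpolates the sequence exactly
spline = CubicSpline(n_adjusted, psi_seq, bc_type="natural")

# continue the fit linearly beyond the last point (the natural-spline
# convention), extrapolating to q = 1, 2, 3 further hypothetical confounders
x_end = n_adjusted[-1]
slope = spline(x_end, 1)              # first derivative at the boundary knot
extrapolated = spline(x_end) + slope * np.array([1, 2, 3])
```

Applying the same fit-and-extrapolate step to each perturbed sequence, and to the CI endpoints, produces the uncertainty intervals displayed in Figures 1 and 2.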
C SIMULATION STUDIES

Simulation studies were conducted under different data-generating scenarios to empirically evaluate the ability of the proposal to avoid potential biases due to unmeasured confounding.
C.1 Study 1
Data for a single population were generated as follows:
$$L_1, \ldots, L_{p+q} \sim \text{i.i.d. } N(0, 1),$$
$$A \sim \text{Bernoulli}\left\{ \operatorname{expit}\left( \alpha_0 + \textstyle\sum_{k=1}^{p+q} \alpha_k L_k \right) \right\},$$
$$Y \sim \text{Bernoulli}\left\{ \operatorname{expit}\left( \beta_0 + \delta A + \textstyle\sum_{k=1}^{p+q} \beta_k L_k \right) \right\}.$$
We set $\alpha_0 = \beta_0 = 0$, drew each $\alpha_k$ from a uniform distribution symmetric about zero, and set $\beta_k = \alpha_k$, $k = 1, \ldots, p+q$, so that each confounder had the same strength of influence on the exposure and outcome. To induce unmeasured confounding, we iteratively removed the $q$ confounders that resulted in the smallest (absolute) changes to the effects between consecutive orbits following the deterministic criterion (1). These (same) $q$ confounders would thus be regarded as unmeasured (in each observed dataset), so that only $p$ measured confounders remained available for adjustment. We considered values of $p \in \{12, 16\}$ and $q \in \{0, 4, 8\}$. The true causal effect was zero when $\delta = 0$; we considered a null and a non-null value of $\delta$. There were a total of 12 different data-generating scenarios. For each setting, we first generated a single population of size $N = 50000$ (so that sampling variability can be essentially ignored when determining which confounders are to be unmeasured), then generated each observed dataset by randomly sampling (without replacement) $n = 1000$ individuals from the population. For each observed dataset, we eliminated the measured confounders in turn by minimizing the (squared) differences in the effects between consecutive orbits following (1). We then constructed the sequence of effect estimators, as well as the 95% CIs for the effect, indexed by the sequence of nested confounder subsets. For simplicity, we fitted to each sequence a natural cubic spline with the largest number of (evenly-spaced) interior knots permitted to ensure identifiability of the spline; when there are $p$ confounders, there may be as many as $p - 1$ interior knots. The predicted effect, had the $q$ unmeasured confounders been additionally adjusted for, was then extrapolated to.
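This data-generating process can be sketched as follows; the uniform range for the coefficients (coef_bound) is a placeholder, since the exact bounds are not recoverable here:

```python
import numpy as np

rng = np.random.default_rng(2021)

def expit(x):
    return 1 / (1 + np.exp(-x))

def simulate_study1(n, p, q, delta, coef_bound=1.0):
    """One sample from the study-1 design: p measured and q unmeasured i.i.d.
    standard-normal confounders, each with equal influence on exposure and
    outcome (beta_k = alpha_k). `coef_bound` is an assumed placeholder for
    the uniform range of the coefficients."""
    alpha = rng.uniform(-coef_bound, coef_bound, size=p + q)
    L = rng.normal(size=(n, p + q))
    A = rng.binomial(1, expit(L @ alpha))                # alpha_0 = 0
    Y = rng.binomial(1, expit(delta * A + L @ alpha))    # beta_0 = 0, beta = alpha
    return L, A, Y

L, A, Y = simulate_study1(n=1000, p=12, q=4, delta=0.0)
```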
When q = 0, we extrapolated to two unmeasured confounders to assess the stability of (inference for) the effect.

We simulated 1000 samples for each setting. We considered point estimates of, and 95% CIs for, the population average exposure effect ψ that were based on either (i) adjusting for all p + q (un)measured confounders associated with exposure and outcome, (ii) adjusting for only the p measured confounders, or (iii) the extrapolated predictions. The empirical mean and standard deviation (across all simulated samples) of all three point estimates are displayed in Table 1. We also assessed the empirical frequency at which each of the 95% CIs, as well as the uncertainty interval, included the average treatment effect. (Recall that the uncertainty interval for a single observed dataset was the union of all extrapolated 95% CIs across all perturbed sequences.) Adjusting for only the measured confounders yielded only slightly biased point estimates, but resulted in 95% CIs whose coverage levels were empirically far below the nominal level in the presence of a large number of unmeasured confounders. The extrapolated predicted effect was similarly slightly biased, but had much greater empirical variability, resulting in the largest empirical mean squared error (MSE) among the three estimates.
When there were many unmeasured confounders, having more measured covariates from which to learn about the impact of unmeasured confounding made the uncertainty intervals more likely to capture the population average effect at (or exceeding) the nominal level of the constituent 95% CIs.

                Point estimates                            Empirical MSE (square root)   Coverage of 95% CIs
 q   psi    p   All           Measured      Predicted      All    Meas.  Pred.           All    Meas.  Pred.
 0   0.00  12    0.00 (0.03)   0.00 (0.03)   0.00 (0.03)   0.03   0.03   0.03            0.94   0.94   0.98
 0   0.00  16    0.00 (0.03)   0.00 (0.03)   0.00 (0.03)   0.03   0.03   0.03            0.93   0.93   0.98
 0   0.16  12    0.16 (0.03)   0.16 (0.03)   0.15 (0.03)   0.03   0.03   0.03            0.94   0.94   0.98
 0   0.13  16    0.13 (0.03)   0.13 (0.03)   0.13 (0.03)   0.03   0.03   0.03            0.93   0.93   0.98
 4   0.00  12    0.00 (0.03)   0.01 (0.03)   0.00 (0.04)   0.03   0.03   0.04            0.94   0.95   0.99
 4   0.00  16    0.00 (0.03)   0.00 (0.03)  -0.01 (0.04)   0.03   0.03   0.04            0.93   0.94   0.98
 4   0.13  12    0.13 (0.03)   0.14 (0.03)   0.13 (0.04)   0.03   0.04   0.04            0.92   0.94   0.99
 4   0.13  16    0.13 (0.04)   0.13 (0.04)   0.12 (0.04)   0.04   0.04   0.05            0.90   0.92   0.98
 8   0.00  12    0.00 (0.03)   0.05 (0.03)  -0.05 (0.08)   0.03   0.06   0.09            0.94   0.72   0.85
 8   0.00  16   -0.01 (0.03)   0.04 (0.03)  -0.04 (0.12)   0.03   0.05   0.12            0.90   0.73   0.93
 8   0.13  12    0.12 (0.04)   0.18 (0.04)   0.08 (0.09)   0.04   0.07   0.10            0.89   0.72   0.91
 8   0.11  16    0.10 (0.03)   0.15 (0.04)   0.08 (0.10)   0.03   0.06   0.11            0.89   0.70   0.97

Table 1: Results for simulation study 1. Point estimates of, and inference for, the average exposure effect ψ were based on either (i) adjusting for all confounders ("All"), (ii) adjusting for only the measured confounders ("Measured"), or (iii) the extrapolated prediction ("Predicted"). The empirical MSE was calculated as the sum of the squared bias and the empirical variance of the point estimates. Coverage for the 95% CIs was calculated as the empirical proportion of CIs that included the population average exposure effect.
For simplicity, the extrapolated uncertainty interval is termed the "Predicted 95% CI" in this table. Empirical standard errors are in brackets. All results were rounded to two decimal places.

C.2 Study 2

In this study, we compare the proposed method with the existing methods of Carnegie et al., as implemented in the treatSens package, and of Cinelli and Hazlett, as implemented in the sensemakr package in R. We modified the data-generating process of the previous study to fulfill the required assumptions of Carnegie et al.; i.e., exposure was generated as $A \sim \text{Bernoulli}\{\Phi(\alpha_0 + \sum_{k=1}^{p+q} \alpha_k L_k)\}$, where $\Phi(\cdot)$ is the cumulative distribution function of a standard normal variable, so that the exposure model employs a "probit" link, and the outcome $Y \sim N(\beta_0 + \delta A + \sum_{k=1}^{p+q} \beta_k L_k,\; p+q)$ was continuous and normally distributed (with variance simply equal to the total variance of all p + q independent confounders with unit variance). Cinelli and Hazlett require only a continuous outcome so that linear regression models can be fitted, with no exposure model needed. We set $\alpha_0 = \beta_0 = 0$, and drew each $\alpha_k$ and $\beta_k$ from uniform distributions symmetric about zero, $k = 1, \ldots, p+q$. We considered settings with $p = 16$, $q \in \{0, 4, 8\}$, and $\delta \in \{0, 2\}$. In addition, to empirically evaluate the biases that can result when the "probit" link function in the assumed exposure model is misspecified, we generated data for the exposure under a "logit" link function. There were a total of 12 different data-generating scenarios.

For each setting, we first generated a single population of size N = 50000 with all p + q confounders, then induced unmeasured confounding by iteratively removing the q confounders that resulted in the smallest (absolute) changes to the effects between consecutive orbits following the deterministic criterion (1). These (same) q confounders would thus be regarded as unmeasured (in each observed sample), so that only p measured confounders remained available for adjustment.
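A sketch of this design follows; the uniform coefficient ranges are illustrative placeholders, not the values used in the study:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

def simulate_study2(n, p, q, delta, link="probit"):
    """One sample from the study-2 design: standard-normal confounders, a
    probit (or, to probe misspecification, logit) exposure model, and a
    normal outcome with variance p + q. The uniform coefficient ranges
    below are assumed placeholders."""
    alpha = rng.uniform(-0.5, 0.5, size=p + q)   # hypothetical range
    beta = rng.uniform(-1.0, 1.0, size=p + q)    # hypothetical range
    L = rng.normal(size=(n, p + q))
    eta = L @ alpha
    pA = norm.cdf(eta) if link == "probit" else 1 / (1 + np.exp(-eta))
    A = rng.binomial(1, pA)
    Y = rng.normal(loc=delta * A + L @ beta, scale=np.sqrt(p + q))
    return L, A, Y

L, A, Y = simulate_study2(n=2000, p=16, q=4, delta=2.0)
```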
The data for each observed sample were generated by randomly drawing (without replacement) n = 2000 individuals from the population. For each observed dataset, we carried out the proposed sensitivity analysis by eliminating the measured confounders in turn to minimize the (squared) differences in the effects between consecutive orbits following (1). We then constructed the sequence of standardized effect estimators, as well as the 95% CIs for the effect, indexed by the sequence of nested confounder subsets. Natural cubic splines with the maximum number of p − 1 (evenly-spaced) interior knots were fitted to each sequence. The predicted effect, had the q unmeasured confounders been additionally adjusted for, was then extrapolated to. When q = 0, we extrapolated to two unmeasured confounders to assess the stability of (inference for) the effect.

Both the treatSens and sensemakr methods require specifying values for the sensitivity parameters that encode the separate associations between a single hypothetical unmeasured confounder with the exposure, and with the outcome, respectively, for each individual simulated dataset. For simplicity when implementing these simulation studies, we selected a "calibrated," or "benchmark," value among the coefficient estimates of the p measured confounders in the exposure (or outcome) model fitted to each simulated observed dataset. When q = 0, so that there truly was no unmeasured confounding, we selected the coefficient estimate with the smallest (absolute) magnitude among the p estimated coefficients, to assess the impact of incorrectly assuming some (minimal) unmeasured confounding. When q > 0, we selected a coefficient estimate among the p estimated coefficients as a proxy for the multiple unmeasured confounders. When using the treatSens method, we sampled 500 posterior draws of the exposure effect point estimate, and its standard error estimate, for the given combination of the sensitivity parameter values. The point estimate was determined as the posterior mean, and the standard error estimate was determined using Rubin's rules as described in Carnegie et al.; 95% Wald CIs were then constructed using the corresponding standard normal quantiles. When using the sensemakr method, we selected a bias adjustment that reduces (increases) the absolute value of the estimated coefficient for the exposure in the outcome model if ψ = 0 (ψ > 0), even though ψ is unknown in practice. All other arguments in the sensemakr function were left at their default values.

We simulated 1000 samples for each setting. We again considered the point estimates of, and 95% CIs for, the population average exposure effect ψ that were based on either (i) adjusting for all p + q (un)measured confounders associated with exposure and outcome, (ii) adjusting for only the p measured confounders, or (iii) the extrapolated predictions. In addition, we considered the predictions using the treatSens and sensemakr methods with their respective calibrated or benchmark values of the sensitivity parameters. The empirical mean and standard deviation (across all simulated samples) of all five point estimates are displayed in Table 2. Because the treatSens method can fail for a given combination of the calibrated sensitivity parameter values, we included the proportion of simulated datasets where the method returned such an error. We also assessed the empirical frequency at which each of the 95% CIs included the average treatment effect. There were no systematic biases in the point estimates empirically when all the confounders (including those regarded as unmeasured) were adjusted for, regardless of the assumed link function for the exposure model. The treatSens method failed in at least 40% of the simulated datasets; among datasets where the method successfully converged, unbiased point estimates were obtained only when there was truly no exposure effect (ψ = 0) and no unmeasured confounding (q = 0).
However, the 95% CIs met the nominal coverage levels only when the probit link was correctly assumed; in all other cases, the CIs failed to capture the true value of ψ in all the simulated samples. The sensemakr method yielded unbiased estimates when there was truly no exposure effect (ψ = 0), and positively biased estimates when there was an exposure effect (ψ > 0); coverage of its 95% CIs fell below the nominal level whenever there was unmeasured confounding (q > 0), especially when ψ > 0. The uncertainty intervals of the proposed method captured ψ at (or exceeding) the nominal level in most settings, although there were settings with q = 8 where the coverage levels fell below their nominal levels empirically.

Point estimates
 q  link    psi   All           Measured      % fail  treatSens     sensemakr     Predicted
 0  logit   0.00   0.02 (0.18)   0.02 (0.18)  0.44    -0.01 (0.02)   0.01 (0.17)   0.02 (0.24)
 0  logit   2.00   2.01 (0.19)   2.01 (0.19)  0.47     0.19 (0.02)   2.02 (0.19)   2.01 (0.25)
 0  probit  0.00   0.03 (0.20)   0.03 (0.20)  0.44     0.00 (0.02)   0.03 (0.18)   0.03 (0.25)
 0  probit  2.00   2.03 (0.20)   2.03 (0.20)  0.43     0.20 (0.02)   2.05 (0.19)   2.03 (0.24)
 4  logit   0.00  -0.02 (0.20)   0.11 (0.29)  0.60    -3.37 (0.97)  -0.06 (0.37)   0.16 (0.53)
 4  logit   2.00   1.99 (0.20)   2.12 (0.28)  0.61    -3.28 (0.83)   2.60 (0.36)   2.17 (0.52)
 4  probit  0.00  -0.03 (0.23)   0.16 (0.33)  0.60    -3.44 (0.84)  -0.19 (0.60)   0.32 (0.66)
 4  probit  2.00   1.98 (0.24)   2.16 (0.32)  0.59    -3.23 (0.91)   2.94 (0.44)   2.32 (0.67)
 8  logit   0.00   0.06 (0.23)  -0.02 (0.32)  0.64    -1.66 (1.38)  -0.01 (0.33)   0.21 (1.15)
 8  logit   2.00   2.04 (0.22)   1.97 (0.33)  0.61    -1.63 (1.42)   2.35 (0.41)   2.25 (1.11)
 8  probit  0.00   0.04 (0.29)  -0.08 (0.38)  0.66    -1.92 (1.46)   0.03 (0.52)   0.59 (1.56)
 8  probit  2.00   2.03 (0.27)   1.88 (0.37)  0.64    -1.70 (1.45)   2.47 (0.49)   2.62 (1.48)

Empirical MSE (square root)
 q  link    psi   All   Measured  treatSens  sensemakr  Predicted
 0  logit   0.00  0.18  0.18      0.02       0.17       0.24
 0  logit   2.00  0.19  0.19      1.81       0.19       0.25
 0  probit  0.00  0.20  0.20      0.02       0.18       0.26
 0  probit  2.00  0.21  0.21      1.80       0.20       0.24
 4  logit   0.00  0.21  0.31      3.50       0.38       0.56
 4  logit   2.00  0.21  0.31      5.35       0.70       0.55
 4  probit  0.00  0.23  0.37      3.54       0.63       0.73
 4  probit  2.00  0.24  0.36      5.31       1.04       0.75
 8  logit   0.00  0.24  0.32      2.16       0.33       1.17
 8  logit   2.00  0.23  0.33      3.89       0.54       1.14
 8  probit  0.00  0.29  0.39      2.41       0.52       1.67
 8  probit  2.00  0.27  0.39      3.97       0.67       1.61

Coverage of 95% CIs
 q  link    psi   All   Measured  treatSens  sensemakr  Predicted
 0  logit   0.00  0.96  0.96      0.00       0.96       0.99
 0  logit   2.00  0.95  0.95      0.00       0.94       0.98
 0  probit  0.00  0.95  0.95      1.00       0.95       0.98
 0  probit  2.00  0.95  0.95      0.00       0.95       0.98
 4  logit   0.00  0.95  0.93      0.00       0.76       0.98
 4  logit   2.00  0.96  0.94      0.00       0.35       0.97
 4  probit  0.00  0.94  0.91      0.00       0.43       0.95
 4  probit  2.00  0.94  0.92      0.00       0.16       0.94
 8  logit   0.00  0.94  0.96      0.00       0.91       0.81
 8  logit   2.00  0.96  0.96      0.00       0.69       0.82
 8  probit  0.00  0.93  0.94      0.00       0.79       0.84
 8  probit  2.00  0.95  0.95      0.00       0.63       0.86

Table 2: Results for simulation study 2. Point estimates of, and inference for, the average exposure effect ψ were based on either (i) adjusting for all confounders ("All"), (ii) adjusting for only the measured confounders ("Measured"), (iii) the extrapolated prediction ("Predicted"), (iv) the treatSens method assuming calibrated values of the sensitivity parameters, or (v) the sensemakr method assuming benchmark values of the sensitivity parameters. The proportion of simulated datasets where the treatSens method failed is labelled as "% fail." The empirical MSE was calculated as the sum of the squared bias and the empirical variance of the point estimates. Coverage for the 95% CIs was calculated as the empirical proportion of CIs that included the population average exposure effect. Empirical standard errors are in brackets. All results were rounded to two decimal places.

D AIDS CLINICAL TRIALS GROUP STUDY 175
The 'ACTG175' dataset was from an AIDS randomized clinical trial, and was distributed as part of the speff2trial package via the Comprehensive R Archive Network (https://CRAN.R-project.org/package=speff2trial). The trial compared monotherapy using either zidovudine or didanosine alone with combination therapy using either zidovudine and didanosine, or zidovudine and zalcitabine, in adults infected with the human immunodeficiency virus type I whose CD4 T cell counts were between 200 and 500 per cubic millimeter. Exposure was (re)coded as A = 0 for therapy using either zidovudine or didanosine only, and A = 1 for therapies combining zidovudine and either didanosine or zalcitabine. A binary outcome was defined based on a participant's CD4 T cell count at 96 (plus or minus 5) weeks. We constructed B = 500 perturbed sequences of the estimators and their corresponding 95% CIs. Across all the perturbations, the 95% CIs for the exposure effect excluded zero, even when (all) the measured covariates had been intentionally eliminated from adjustment. A natural cubic spline with the maximum number of 15 equally-spaced interior knots was fitted to each (perturbed) sequence of the estimators and endpoints of the 95% CI. The predicted effects, and uncertainty intervals, adjusting for between one and eight (half the number of measured covariates) hypothetical confounders were then extrapolated to. We trimmed 5% of the extreme endpoints of the perturbed CIs to construct the uncertainty intervals. Further adjustment for hypothetical unmeasured confounding appeared unlikely to yield an effect estimate that would be statistically indistinguishable from zero: the uncertainty interval excluded zero even after adjusting for seven additional hypothetical (unmeasured) confounders, as indicated by the filled triangles being above zero in Figure 2. The results suggested that statistical inference for the effect was unlikely to change (from adjusting for only the measured confounders).
The conclusion that there was a (statistically significant) positive average exposure effect of combination therapy on CD4 T cell count thus appeared to be insensitive to (hypothetical) unmeasured confounding.

[Figure 2 here; plot title: "Uncertainty intervals for ACTG175 data".]
[Figure 2 axes: x-axis "Number of confounders adjusted for"; y-axis "Effect estimate".]

Figure 2: Estimators of the average exposure effect from each orbit, for the ACTG175 data. Circles indicate the point estimates, and triangles (inverted triangles) indicate the upper (lower) endpoints of the 95% CI. Empty points indicate the sequence of effect estimates constructed by eliminating a single measured confounder from adjustment in turn. The curve corresponds to a fitted natural cubic spline. Filled points indicate the (extrapolated) predictions after adjusting for q additional unmeasured confounders. Each thin grey line corresponds to the natural cubic spline fitted to a perturbed sequence. The vertical lines indicate the symmetric 95% percentile interval of the predicted effects across all B perturbations.

E EXISTING METHODS FOR SENSITIVITY ANALYSIS
Existing methods for assessing sensitivity to unmeasured confounding typically judge the (minimum) extent to which adjusting for a hypothetical scalar confounder, conditional on the measured confounders, affects inference about the causal effect. A widely adopted approach focuses on characterizing the (strength and direction of the) association between the hypothetical confounder and the exposure, either as a regression coefficient or as the odds ratio of receiving exposure, as a sensitivity parameter. Another approach further incorporates the (strength and direction of the) association between the hypothetical confounder and the outcome as an additional sensitivity parameter. Recent methods adopting the latter approach include VanderWeele and Arah, Richardson et al., Carnegie et al., Ding and VanderWeele, Dorie et al., VanderWeele and Ding, Genbäck and de Luna, Zhang and Tchetgen Tchetgen, Cinelli and Hazlett, and Veitch and Zaveri, among many others. We refer readers to Liu et al. and Carnegie et al. for an overview of other approaches, including methods that require restrictive assumptions about the parametric form of the outcome model that encodes the exposure effect, and to Veitch and Zaveri for an overview of methods that do not explicitly consider a hypothetical unmeasured confounder and involve other (more complex) sensitivity parameters. Methods using sensitivity parameters that encode the hypothetical confounder's separate associations with exposure and outcome typically require first specifying a discrete grid of values for the parameters, then estimating the causal effect after adjusting for the hypothetical confounder under different fixed parameter values.
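As a stylized illustration of such a grid, under a simple linear data-generating model with a standardized unmeasured confounder, the omitted-variable bias is approximately the product of the two sensitivity parameters; the observed estimate and grid ranges below are hypothetical:

```python
import numpy as np

# hypothetical confounder-adjusted effect estimate
psi_hat = 0.30

# grid of sensitivity parameters: association of a hypothetical unmeasured
# confounder U with the exposure (lam_AU) and with the outcome (lam_YU)
lam_AU = np.linspace(-0.7, 0.7, 21)
lam_YU = np.linspace(-0.7, 0.7, 21)
G_AU, G_YU = np.meshgrid(lam_AU, lam_YU)

# under a simple linear model with a standardized U, the omitted-variable
# bias is (approximately) the product of the two associations
bias = G_AU * G_YU
psi_adj = psi_hat - bias

# the region a contour plot would highlight: parameter combinations strong
# enough to move the adjusted estimate to (or past) zero
explains_away = psi_adj <= 0
```

Drawing contours of psi_adj over (lam_AU, lam_YU) yields the bivariate sensitivity contour plots discussed next.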
Graphical tools, such as bivariate sensitivity contour plots that visualize how strongly the hypothetical confounder must be (simultaneously) associated with exposure and outcome for the conclusions to be affected, can be especially useful for applied researchers.

But sensitivity analyses predicated on such sensitivity parameters suffer from a number of shortcomings. First, users of grid-based approaches must explicitly state the granularity, range, and even signs of plausible values for the separate (conditional) associations between the hypothetical confounder and the exposure, and the outcome. Reference values are sometimes estimated or postulated based on the measured confounders' (conditional) associations with exposure (and outcome) via an empirical "calibration." A specific measured confounder is selected as a reference based on, e.g., its (standardized) estimated regression coefficient; its marginal effect size on the outcome; its relative risk on the (binary) outcome conditional on other measured confounders; or its contribution to the variation in the outcome. Bayesian sensitivity analyses for unmeasured confounding avoid such user calibration through the application of Bayes' theorem to obtain a posterior distribution for the sensitivity parameters, but require more restrictive parametric assumptions about the outcome model. Second, with the exception of Bonvini and Kennedy, Zhang and Tchetgen Tchetgen, and Cinelli and Hazlett, most sensitivity analyses that evaluate the associations between a hypothetical confounder and exposure, or outcome, or both, (implicitly) require certain assumptions about whether the confounder is continuous or dichotomous, about the distribution of the confounder, or about whether the bias is away from or toward the null. Third and most crucially, sensitivity analyses based on such sensitivity parameters can potentially lead to scientifically meaningless and logically incoherent conclusions.
Consider the following simple example where the (true) probability of receiving a binary exposure ($A = 1$) depends on a logistic model with main effects for three confounders, $L_1$, $L_2$, and $U$; e.g., $\Pr(A = 1 \mid L_1, L_2, U) = \operatorname{expit}(\beta_0 + \beta_1 L_1 + \beta_2 L_2 + \Gamma U)$, where $\operatorname{expit}(x) = \exp(x)/\{1 + \exp(x)\}$. The interpretation of the conditional odds ratio of being exposed due to $L_1$ under the true model, $\exp(\beta_1)$, depends on the values of the other confounders $L_2$ and $U$, even in the absence of any interactions. Suppose that the researcher could only collect data on $L_1$ and $L_2$, so that $U$ is left unmeasured, and is thus restricted to fitting a different treatment model, e.g., $\Pr(A = 1 \mid L_1, L_2) = \operatorname{expit}(\beta^*_0 + \beta^*_1 L_1 + \beta^*_2 L_2)$. The interpretation of $\exp(\beta^*_1)$ under the fitted model thus depends on only the value of $L_2$. It follows that the interpretation of the sensitivity parameter, $\exp(\Gamma)$, is incompatible with both $\exp(\beta_1)$ and $\exp(\beta^*_1)$ due to non-collapsibility. Of course, one may consider a different treatment model instead, e.g., $\Pr(A = 1 \mid L_1, U) = \operatorname{expit}(\beta^{**}_0 + \beta^{**}_1 L_1 + \Gamma^* U)$, so that the interpretation of the sensitivity parameter $\exp(\Gamma^*)$, which encodes the conditional odds ratio due to $U$ given $L_1$, is seemingly comparable with $\exp(\beta^*_1)$. But there is no guarantee that the range of plausible values for $\Gamma$ will necessarily be narrower than that for $\Gamma^*$, even though the former is based on an exposure model that adjusts for both $L_1$ and $L_2$.