# Sensitivity Analysis for Unmeasured Confounding via Effect Extrapolation


Wen Wei Loh and Stijn Vansteelandt

Department of Data Analysis, Ghent University, Gent, Belgium
Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, United Kingdom

February 4, 2021

Abstract

Inferring the causal effect of a non-randomly assigned exposure on an outcome requires adjusting for common causes of the exposure and outcome to avoid biased conclusions. Notwithstanding the efforts investigators routinely make to measure and adjust for such common causes (or confounders), some confounders typically remain unmeasured, raising the prospect of biased inference in observational studies. Therefore, it is crucial that investigators can practically assess their substantive conclusions' relative (in)sensitivity to potential unmeasured confounding. In this article, we propose a sensitivity analysis strategy that is informed by the stability of the exposure effect over different, well-chosen subsets of the measured confounders. The proposal entails first approximating the process for recording confounders to learn about how the effect is potentially affected by varying amounts of unmeasured confounding, then extrapolating to the effect had hypothetical unmeasured confounders been additionally adjusted for. A large set of measured confounders can thus be exploited to provide insight into the likely presence of unmeasured confounding bias, albeit under an assumption about how data on the confounders are recorded. The proposal's ability to reveal the true effect and ensure valid inference after extrapolation is empirically compared with existing methods using simulation studies. We demonstrate the procedure using two different publicly available datasets commonly used for causal inference.

Keywords: Causal inference, Hidden bias, Observational studies, Residual confounding, Uncontrolled confounding, Unobserved confounding

INTRODUCTION

When inferring the causal effect of a non-randomly assigned (point) exposure on an outcome, common causes (i.e., confounders) of both exposure and outcome that are unadjusted for can induce associations that lead to biased estimates. Investigators therefore strive to measure as many (pertinent) confounders as possible when analyzing data from observational studies, in efforts to eliminate such biases due to unmeasured confounding. But in most realistic settings, some confounders remain unadjusted for, raising the prospect of such biases being merely reduced, but not entirely eliminated. For example, there may be either confounders that are unfeasible or impractical to observe or record, or (measured) confounders excluded by routine variable selection techniques. It is therefore crucial that investigators can practically assess the relative stability of their conclusions to dormant confounding that remains unadjusted for.

PROPOSED SENSITIVITY ANALYSIS STRATEGY

In this paper, we propose a novel sensitivity analysis strategy to assess how different the conclusions would be after removing unmeasured confounding. To understand the proposal, consider the following thought experiment based on an illustration presented later. Suppose that you were given data from a simple randomized trial on a (dichotomous) AIDS therapeutic treatment, an outcome measuring patients' CD4 T cell count, and a collection of baseline covariates. However, suppose that you were not told that the data originated from a randomized experiment. Then upon analyzing the data, you would observe that regardless of the covariates adjusted for, roughly the same average treatment effect estimate is obtained. The relative stability of the effect estimates, across different adjustment sets, would allow you to gain confidence that the data might have originated from a randomized experiment. This confidence would grow as more and more covariates (beyond the initial collection) are subsequently recorded, and you keep on observing a stability in the average treatment effect estimate that is maintained over larger and larger adjustment sets.

What we learn from the above thought experiment is that a large set of measured covariates can be exploited to provide insight into the likely presence of unmeasured confounding bias, albeit under an assumption about how the confounders are recorded. In particular, we assume that under repeated sampling, covariates are chosen (from a possibly infinitely large collection) according to some sampling mechanism, such that each confounder that simultaneously influences exposure and outcome has a positive (non-zero) probability, or likelihood, of being recorded (and adjusted for). Confounders that are unadjusted for would thus induce unmeasured confounding, and result in biased effects.
But if that sampling mechanism is well understood, then the bias due to unmeasured confounding can be eliminated, by extrapolating the observable behavior of the effects adjusting for the measured confounders to recover the (true) effect had the entire collection of confounders been adjusted for. Indeed, suppose for instance that the measured confounders form a random subset of all confounders, where each confounder has an equal probability of being recorded. Our proposal mimics the sampling mechanism as follows. Starting with the set of measured confounders, randomly select a confounder and eliminate it from adjustment so that it can be considered to be unmeasured. Assuming a sufficiently large sample size so that sampling variability can be ignored, calculate the (population-averaged) exposure effect adjusting for only the retained confounders. Repeat this process of intentionally eliminating from adjustment a single confounder one at a time, until no confounders are adjusted for. We can now probe how the (biased) effects change with different amounts of unmeasured confounding; furthermore, we can extrapolate to the (unbiased) effect adjusting for the entire collection of confounders. In practice, however, because the sampling mechanism through which confounders are recorded for adjustment is less well understood, we propose a more heuristic approach to approximate the (non-random) process for recording confounders.
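The random-elimination thought experiment above can be mimicked in a toy simulation. The code below is an illustrative sketch only (simple linear working models and made-up names, not the estimator used in the paper): with a randomized exposure, the adjusted effect estimate stays near the truth as confounders are dropped, while with a confounded exposure it drifts away.

```python
import numpy as np

rng = np.random.default_rng(0)
n, J = 20_000, 8
L = rng.normal(size=(n, J))          # the full collection of confounders

def effect_path(A, Y, order):
    """Adjusted OLS effect of A on Y as covariates are eliminated one at a time."""
    path = []
    for j in range(J, -1, -1):       # adjust for J, J-1, ..., 0 covariates
        X = np.column_stack([np.ones(n), A, L[:, order[:j]]])
        path.append(np.linalg.lstsq(X, Y, rcond=None)[0][1])
    return np.round(path, 2)

order = rng.permutation(J)           # random elimination order

# Randomized exposure: dropping covariates leaves the estimate stable (near 2).
A_rct = (rng.random(n) < 0.5).astype(float)
Y_rct = 2.0 * A_rct + L @ np.full(J, 0.5) + rng.normal(size=n)
print(effect_path(A_rct, Y_rct, order))

# Confounded exposure: the estimate drifts away from 2 as adjustment shrinks.
A_obs = (L.sum(axis=1) + rng.normal(size=n) > 0).astype(float)
Y_obs = 2.0 * A_obs + L @ np.full(J, 0.5) + rng.normal(size=n)
print(effect_path(A_obs, Y_obs, order))
```

The stable trajectory in the first case, and the drifting one in the second, are precisely the patterns the proposal probes and extrapolates.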

SENSITIVITY ANALYSIS VIA EFFECT EXTRAPOLATION

Intentionally eliminating measured covariates from adjustment

The proposal proceeds by eliminating measured covariates one at a time to intentionally induce unmeasured confounding. Ideally, the order of elimination should be determined using information about the investigators' process for recording confounders. But such information may be difficult to precisely quantify in practice. In this paper, we will thus proceed under the assumption that the covariates exerting the strongest impact on the effect are prioritized to be recorded (and adjusted for) first. We propose a data-adaptive approach to mimic such a process. For simplicity we will initially ignore any sampling variability. Starting with the set of measured covariates, eliminate the covariate that causes the smallest change in the (population-averaged) exposure effect when unadjusted for, so that the covariate can be considered as unmeasured. Repeat this until no covariates remain for adjustment. Under this assumed process, covariates that are eliminated earlier have lower priority for confounding adjustment and thus the weakest impact on the effect.

To aid visualizing the impact of unmeasured confounding on the effect, we partition the space of all possible covariate subsets into orbits, where the $j$-th orbit comprises all subsets with $j+1$ covariates, including an intercept. Let $J$ denote the total number of measured covariates so that there are $J+1$ different orbits. We briefly introduce the notation as follows. In a sample of size $n$, for individual $i = 1, \ldots, n$, denote the binary exposure by $A_i$ and the outcome of interest by $Y_i$. Let $Y_i^a$ denote the potential outcome for $Y_i$ if, possibly counter to fact, individual $i$ had been assigned to exposure level $A_i = a$. In this paper, our interest is in the marginal exposure effect, defined as $\psi = \mathrm{E}(Y^1) - \mathrm{E}(Y^0)$. The first part proceeds by repeating the following steps for each orbit indexed by $j = J, \ldots, 1$:

1. Let $L_{j+1}$ denote the subset of covariates remaining in the $(j+1)$-th orbit. When $j = J$, let $L_{J+1}$ denote the full set containing all measured covariates and the constant (intercept) 1. Denote each of the covariates in $L_{j+1}$ by $L_{j+1,k}$, $k = 1, \ldots, j$. These $j$ covariates are therefore candidates to be eliminated from confounding adjustment in the $j$-th orbit.

2. For $k = 1, \ldots, j$, calculate the exposure effect estimator conditional on the covariates in $L_{j+1}$ excluding the single covariate $L_{j+1,k}$, i.e., $(L_{j+1} \setminus L_{j+1,k})$, which we denote simply by $\hat{\psi}_{j+1,k}$.

3. Let $\psi_{j+1,k} = \mathrm{E}(\hat{\psi}_{j+1,k})$ denote the expected value of the effect estimator, and $\psi_{j+1} = \mathrm{E}(\hat{\psi}_{j+1})$ denote the expected value adjusting for the covariates in $L_{j+1}$. We will first explain how to proceed when the true values of the (asymptotic) expectations $\psi_{j+1,k}$ and $\psi_{j+1}$ are known, so that the true bias caused by each candidate confounder being unadjusted for can be determined exactly. Let $k^*$ denote the index of the candidate confounder that yields the smallest (squared) magnitude of the following difference:
$$k^* = \arg\min_k \left(\psi_{j+1,k} - \psi_{j+1}\right)^2. \qquad (1)$$
Define the subset of remaining covariates in the $j$-th orbit to be $L_j = (L_{j+1} \setminus L_{j+1,k^*})$. Denote the effect estimator that adjusts for only the (retained) confounders in $L_j$ by $\hat{\psi}_j = \hat{\psi}_{j+1,k^*}$.

Repeating the steps above for $j = J, \ldots,$

$1$, returns a sequence of (nested) covariate subsets $L_{J+1} \supset \ldots \supset L_1$, where a single, different measured covariate is dropped in each orbit, relative to those in the previous (larger) orbit. Each subset indexes an effect, so that the sequence of effect estimators, $\hat{\psi}_{J+1}, \ldots, \hat{\psi}_1$, quantifies the impact of unmeasured confounding induced by eliminating covariates from adjustment in turn.

The assumption that $\psi_{j+1,k}$ and $\psi_{j+1}$ are known when evaluating the criterion (1) can only be considered to hold approximately in very large samples where sampling error can be ignored. Otherwise, estimators are needed to approximate (1). Asymptotically unbiased estimators may nonetheless suffer from inaccuracies in finite samples due to sampling variability, so that merely plugging in the (sample) estimators for the (population) effects can produce incorrect results. To see why, consider a confounder that is strongly associated with exposure, but weakly associated with outcome, so that it causes only a small (true) bias when unadjusted for. But adjusting for this confounder reduces the precision of the effect estimator, so that the observable change in estimated effect may be deceptively large relative to the true bias when the confounder is eliminated, simply due to sampling variability. The sampling uncertainty of the effect estimators can be acknowledged to more accurately assess the true biases due to unmeasured confounding as follows.
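Ignoring sampling variability for the moment, the elimination steps above can be sketched as follows. This is a hedged toy implementation; `greedy_elimination`, `ols_effect`, and the simulated data are illustrative, not the paper's doubly robust estimator.

```python
import numpy as np

def greedy_elimination(effect_fn, covariates):
    """Mimic the assumed recording process: repeatedly eliminate the covariate
    whose removal changes the adjusted effect estimate the least (criterion (1),
    ignoring sampling variability), until none remain."""
    remaining = list(covariates)
    sequence = [effect_fn(remaining)]        # psi-hat adjusting for everything
    dropped = []
    while remaining:
        psi_full = effect_fn(remaining)
        changes = [(effect_fn([c for c in remaining if c != k]) - psi_full) ** 2
                   for k in remaining]       # squared change per candidate
        k_star = remaining[int(np.argmin(changes))]
        remaining.remove(k_star)
        dropped.append(k_star)
        sequence.append(effect_fn(remaining))
    return dropped, sequence

# Toy check with a linear model: covariate 2 is the weakest confounder.
rng = np.random.default_rng(1)
n = 5_000
L = rng.normal(size=(n, 3))
A = L @ np.array([1.0, 1.0, 0.1]) + rng.normal(size=n)
Y = A + L @ np.array([1.0, 1.0, 0.1]) + rng.normal(size=n)

def ols_effect(kept):
    X = np.column_stack([np.ones(n), A] + [L[:, k] for k in kept])
    return np.linalg.lstsq(X, Y, rcond=None)[0][1]

dropped, seq = greedy_elimination(ols_effect, [0, 1, 2])
print(dropped[0])    # the weakest confounder should be eliminated first
```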
Let $\hat{\psi}_j$ and $\hat{\psi}_{j'}$ denote the effect estimators from two different orbits, e.g., $j$ and $j'$. The expectation of the squared difference between the estimators can thus be decomposed as:
$$
\begin{aligned}
\mathrm{E}\left\{\left(\hat{\psi}_j - \hat{\psi}_{j'}\right)^2\right\}
&= \mathrm{E}\left[\left\{\hat{\psi}_j - \hat{\psi}_{j'} - (\psi_j - \psi_{j'}) + (\psi_j - \psi_{j'})\right\}^2\right] \\
&= \mathrm{V}\left(\hat{\psi}_j - \hat{\psi}_{j'}\right) + (\psi_j - \psi_{j'})^2 + 2\,\mathrm{E}\left[\left\{\hat{\psi}_j - \hat{\psi}_{j'} - (\psi_j - \psi_{j'})\right\}(\psi_j - \psi_{j'})\right] \\
&= \mathrm{V}\left(\hat{\psi}_j - \hat{\psi}_{j'}\right) + (\psi_j - \psi_{j'})^2,
\end{aligned}
$$
where $\mathrm{V}(X)$ denotes the asymptotic variance of $X$. The last equality follows from the asymptotic unbiasedness of $\hat{\psi}_j$ and $\hat{\psi}_{j'}$. By considering a squared difference between the effects in (1), the squared difference between the corresponding estimators is no longer an unbiased estimator, with bias proportional to the (population) variance of the difference between the effect estimators. The "debiased" squared difference between the effects is thus
$$(\psi_j - \psi_{j'})^2 = \mathrm{E}\left\{\left(\hat{\psi}_j - \hat{\psi}_{j'}\right)^2\right\} - \mathrm{V}\left(\hat{\psi}_j - \hat{\psi}_{j'}\right).$$
In practice, values of (1) can be calculated by plugging in the sample estimator for the population mean, and the (asymptotic) variance estimator for the population variance; i.e.,
$$\left(\hat{\psi}_{j+1,k} - \hat{\psi}_{j+1}\right)^2 - \hat{\mathrm{V}}\left(\hat{\psi}_{j+1,k} - \hat{\psi}_{j+1}\right).$$
When this quantity is non-positive in practice, we will simply set its value to zero; when there are multiple covariates with the same zero value, we will randomly select a covariate among these to be eliminated.
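As a concrete, hedged sketch of these quantities: an AIPW-type doubly robust standardization estimator of $\psi$ (the paper's exact estimator is detailed in its Appendix A), together with the truncated plug-in criterion. All function names and the simple parametric working models here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Logistic regression MLE via iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p) + 1e-9
        z = X @ beta + (y - p) / w
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta

def dr_effect(Y, A, L):
    """Doubly robust standardization (AIPW) estimate of psi = E(Y^1) - E(Y^0),
    adjusting for the covariates in L, with a logistic propensity model and a
    linear outcome model as (assumed) parametric working models."""
    n = len(Y)
    X = np.column_stack([np.ones(n), L])
    e = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, A)))     # P(A=1 | L)
    Xo = np.column_stack([np.ones(n), A, L])
    gamma = np.linalg.lstsq(Xo, Y, rcond=None)[0]         # E(Y | A, L)
    m1 = np.column_stack([np.ones(n), np.ones(n), L]) @ gamma
    m0 = np.column_stack([np.ones(n), np.zeros(n), L]) @ gamma
    psi1 = m1 + A * (Y - m1) / e                          # augmented IPW terms
    psi0 = m0 + (1.0 - A) * (Y - m0) / (1.0 - e)
    return float(np.mean(psi1 - psi0))

def debiased_sq_diff(psi_k, psi_full, var_diff):
    """Plug-in criterion value: squared change in the effect estimate minus the
    estimated variance of the difference, truncated at zero as proposed."""
    return max((psi_k - psi_full) ** 2 - var_diff, 0.0)

# Quick check on simulated confounded data with true effect 1.
rng = np.random.default_rng(0)
n = 20_000
L = rng.normal(size=(n, 2))
A = (rng.random(n) < 1.0 / (1.0 + np.exp(-L.sum(axis=1)))).astype(float)
Y = 1.0 * A + L.sum(axis=1) + rng.normal(size=n)
print(round(dr_effect(Y, A, L), 2))   # close to the true effect of 1
```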
The sampling variability of the estimates is thus recognized (and removed) when assessing the potential bias caused by eliminating each candidate confounder from adjustment. Because these sample approximations can be imperfect, we defer to future work investigating other (computationally-intensive) methods, such as cross-fitting, to more precisely estimate the true changes in the population-level effects between orbits.

To facilitate comparing effect estimators that condition on different confounders across orbits when evaluating non-collapsible effect measures, we recommend focusing on the same (marginal) estimand, to avoid falsely interpreting changes in the estimated effects due to non-collapsibility as a result of the eliminated confounders' influence on the effect. We will therefore employ an effect estimator based on doubly robust standardization to ensure that the selected sequence of confounders is determined using a collapsible measure. The differences between the exposure effect estimators, and the associated variances, can be consistently estimated under settings with (non-)linear parametric regression models for the exposure and the outcome. This approach delivers an unbiased estimator if either the outcome or the exposure model is correctly specified, without amplifying biases that may arise due to the misspecified model. Furthermore, the (asymptotic) variance estimator needed to calculate (1) can be derived in closed form for computational efficiency. We describe how to calculate this quantity, including the effect estimator in a given orbit, in Appendix A. In principle, any criterion can be used in place of (1) to eliminate the confounders in turn toward mimicking (or realizing) the process of recording confounders for adjustment. For example, theoretical background or study design knowledge, or empirical covariate prioritization or importance measures, can be exploited.

Accounting for sampling uncertainty in the effect estimators

We have sought to construct the (ideal) sequence in which covariates are eliminated from adjustment in turn using the true causal effects in (1). But the inherent sampling uncertainty of the estimators only permits "guessing" what the sequence should be. To acknowledge the uncertainty, we will give the data several opportunities to approximate the (true) process of prioritizing covariates for adjustment, by perturbing the sequence to express our uncertainty about the effects while investigating unknown confounding.

We describe how to construct a perturbed sequence by modifying steps 2 and 3 of the proposed procedure, and defer technical details to Appendix A. In step 2, for each candidate confounder indexed by $k$, determine the maximum likelihood estimates (MLEs) of the (coefficients in the) exposure and outcome models used to calculate the effect estimator $\hat{\psi}_{j+1,k}$. Then randomly draw a value of the estimated outcome model coefficients from their joint sampling distribution. Calculate the perturbed effect estimator using the randomly drawn coefficient values, which we denote by $\tilde{\psi}_{j+1,k}$. Hence $\tilde{\psi}_{j+1,k}$ varies from $\hat{\psi}_{j+1,k}$ on the basis of sampling uncertainty, where the former is based on a random draw from the estimated coefficients' sampling distribution, and the latter is based on the MLE of the coefficients. In step 3, the confounder to be eliminated from adjustment can then be determined using the perturbed estimators $\tilde{\psi}_{j+1,k}$ and $\tilde{\psi}_{j+1}$, in place of $\hat{\psi}_{j+1,k}$ and $\hat{\psi}_{j+1}$; in other words, set
$$k^* = \arg\min_k \left(\tilde{\psi}_{j+1,k} - \tilde{\psi}_{j+1}\right)^2 - \hat{\mathrm{V}}\left(\tilde{\psi}_{j+1,k} - \tilde{\psi}_{j+1}\right).$$
The values of the resulting sequence of perturbed effect estimators, $\tilde{\psi}_{J+1}, \ldots, \tilde{\psi}_1$, will thus differ from the observed sequence (which is based on the MLEs), even when the covariates are eliminated in the same order.
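A minimal sketch of this perturbation step, assuming a linear outcome model fit by least squares, so that the draw is from the usual asymptotic normal approximation around the MLE (function and variable names are illustrative):

```python
import numpy as np

def perturbed_effects(Y, A, L, B=5, seed=0):
    """Draw outcome-model coefficients from their estimated (asymptotic normal)
    sampling distribution, and recompute the standardized effect per draw."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    X = np.column_stack([np.ones(n), A, L])
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]        # MLE under normal errors
    resid = Y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)              # estimated covariance of beta-hat
    X1, X0 = X.copy(), X.copy()
    X1[:, 1], X0[:, 1] = 1.0, 0.0                      # set exposure to 1 and to 0
    out = []
    for _ in range(B):
        b = rng.multivariate_normal(beta, cov)         # one perturbed coefficient draw
        out.append(float(np.mean(X1 @ b - X0 @ b)))    # standardized effect per draw
    return out

# Toy data with a randomized exposure and true effect 1.
rng = np.random.default_rng(1)
n = 5_000
L = rng.normal(size=(n, 2))
A = (rng.random(n) < 0.5).astype(float)
Y = 1.0 * A + L.sum(axis=1) + rng.normal(size=n)
tildes = perturbed_effects(Y, A, L, B=5)
```

Each element of `tildes` plays the role of one $\tilde{\psi}$, scattering around the MLE-based estimate by roughly its standard error.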
A Monte Carlo sampling distribution can be readily obtained by constructing, e.g., $B = 500$ perturbed sequences.

Ideally, when the precise probability model for recording confounders is known, the perturbed effect estimators should be constructed using random draws from the implied joint sampling distribution of the model coefficients across all orbits. The reduced variability of the effect estimators due to the sequence for eliminating covariates being known would likely yield a trajectory that fluctuates less between orbits. A Bayesian approach may facilitate estimating such a sampling distribution, e.g., in terms of the (marginal) posterior probabilities of adjusting for each covariate. Because we have focused on the conceptual development of the proposed sensitivity analysis in this paper, more complex approaches to sample sequences of effect estimators from their joint distribution are deferred to future work.

Extrapolating to the effect adjusting for unmeasured confounders

In the second part of the proposed strategy, we extrapolate each unique sequence of (biased) effect estimators to the predicted (unbiased) effect, had additional hypothetical confounders been further adjusted for. The assumption that there exists a (possibly infinitely) large collection of covariates, among which only a subset is revealed for confounding adjustment, and no covariate is precluded from being recorded (and adjusted for) under some unknown sampling mechanism, is therefore necessary. In particular, under repeated observed samples we may each time observe a different set of measured covariates (where some are possibly sampled with unit probability, but none are sampled with zero probability), so that in the long run, across many repeated samples, even the weakest confounders (that are most likely to remain unmeasured) can be adjusted for.

We will fit a natural cubic spline to each (perturbed) sequence of exposure effect estimators, e.g., $\hat{\psi}_j$, $j = 1, \ldots, J+1$, with the number of measured covariates adjusted for ($0, \ldots, J$, after excluding the intercept) as a predictor. Natural cubic splines permit flexibly evaluating the trajectory of the exposure effect estimator as unmeasured confounding is reduced one covariate at a time. The predicted value of the effect estimator, had a given number of hypothetical confounders been further adjusted for, can then be (linearly) extrapolated to. Suggestions on fitting natural cubic splines in practice are described in Appendix B. This second part of the strategy is inspired by the Simulation Extrapolation (SIMEX) approach for different settings with measurement error in regression predictors.

To express uncertainty, a $100(1-\alpha)\%$ Wald confidence interval (CI) may be constructed for each sequence of effect estimators; i.e., $\hat{\psi}_j \pm z_{1-\alpha/2} \sqrt{\hat{\mathrm{V}}(\hat{\psi}_j)}$, $j = 1, \ldots, J+1$, where $z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of the standard normal distribution. Extrapolating the endpoints of these CIs yields $B$ different extrapolated $100(1-\alpha)\%$ CIs.
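The spline-and-extrapolate step can be sketched as follows, using SciPy's natural cubic spline with a linear extension beyond the boundary knot; this is an assumed stand-in for the suggestions in Appendix B, and the function name is illustrative.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def extrapolate_effect(effect_sequence, n_hypothetical):
    """Fit a natural cubic spline to the effect trajectory, indexed by the
    number of covariates adjusted for (0, ..., J), then linearly extrapolate
    to J + n_hypothetical using the slope at the boundary knot."""
    J = len(effect_sequence) - 1
    x = np.arange(J + 1)
    spline = CubicSpline(x, effect_sequence, bc_type="natural")
    slope = spline(J, 1)                  # first derivative at the last knot
    return float(spline(J) + slope * n_hypothetical)

# On a linear trajectory, the natural spline reproduces the line exactly,
# so extrapolating two covariates beyond 0.4 with slope 0.1 gives 0.6.
print(extrapolate_effect([0.0, 0.1, 0.2, 0.3, 0.4], 2))
```

The same function would be applied to each perturbed sequence, and to the lower and upper CI endpoints, to obtain the extrapolated CIs.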
For a given number of hypothetical confounders, the extrapolated CIs across all perturbed sequences (including the observed sequence) can be combined into an uncertainty interval. Let the lower (upper) endpoint of the uncertainty interval be the 2.5 (97.5) percentile among the lower (upper) endpoints of all the extrapolated CIs. Using the (symmetric) 2.5 and 97.5 percentiles reduces the uncertainty intervals' susceptibility to (i) perturbations that yield extreme CIs, and (ii) overly-conservative conclusions, as compared to using the minimum (and maximum) of the extrapolated CIs. We show empirically that the uncertainty intervals cover the true effect either at, or above, the nominal level of the constituent CIs under a variety of settings in simulation studies; details are described in Appendix C.

Instead of extrapolating to a given (additional) number of hypothetical confounders, another possibility is to extrapolate to the (smallest) number of confounders for the uncertainty interval to either include or exclude zero, so that the statistical significance of the effect estimate changes. The extent of unmeasured confounding required to change the conclusions can thus be quantified in terms of the number of hypothetical (unmeasured) confounders, rather than the strength of a single hypothetical confounder. We develop this interpretation using the illustrations in the next section.

ILLUSTRATIONS

We first elaborate on the thought experiment in the introduction to demonstrate how the proposal can reveal a stability in the effect when unmeasured confounding is absent. We use data from an AIDS randomized clinical trial, and defer details to Appendix D. Suppose that knowledge about treatment being randomized, so that confounding of the exposure-outcome relation was unlikely, was hidden from us. Adjusting for different covariates did not greatly affect the exposure effect estimates, as shown by the relatively stable trajectory in Figure 2. Across all perturbations, the 95% CIs for the exposure effect excluded zero, even when none of the measured covariates were adjusted for. Based on the extrapolations, adjusting for additional hypothetical confounders appeared unlikely to yield an uncertainty interval that would include zero, suggesting an insensitivity to hypothetical unmeasured confounding.

We next apply the proposal to an observational study on the effectiveness of Right Heart Catheterization (RHC) in the initial care of critically ill patients. The exposure variable was defined to be whether or not a patient received an RHC within 24 hours of admission. A binary outcome was defined based on whether a patient died at any time up to 180 days since admission. We considered a reduced dataset with 2707 participants having complete data on 72 covariates (one covariate that was singular in the reduced dataset was dropped), so that the exposure and outcome models with all covariates can be fitted.

The sequence of effect estimators constructed by eliminating covariates one at a time following the criterion in (1) is plotted (as empty circles) in Figure 1. Adjusting for all the measured covariates yielded an estimated effect of 0.07, with a 95% CI of (0.…, 0.…). We constructed $B = 500$ perturbed sequences of the estimators and their corresponding 95% CIs. Natural cubic splines were fitted to each (perturbed) sequence of the estimators and endpoints of the CI, and the predicted effects, and uncertainty intervals, extrapolated to. Almost all the predicted effects across the perturbations remained positive after adjusting for unmeasured confounding; e.g., 4% (8%) of the predicted effects were negative after adjusting for one (nine) hypothetical confounder(s). The high sampling uncertainty in the predicted effects can be observed from their 2.5 and 97.5 percentiles (vertical lines) being wider than the endpoints of the uncertainty intervals (filled inverted triangles) in Figure 1. Furthermore, the uncertainty intervals included zero with only one hypothetical unmeasured confounder. The conclusion of a (statistically significant) positive effect adjusting for only the measured covariates is thus likely to change to an effect indistinguishable from zero after accounting for unmeasured confounding. Our findings are in line with Lin et al.: in spite of the efforts made by the study investigators to ascertain and adjust for all the known risk factors, the conclusion that there was a (barely statistically significant) harmful exposure effect of RHC is sensitive to unmeasured confounding.

Existing sensitivity analyses for unmeasured confounding typically encode the association between a single hypothetical confounder with either the exposure or the outcome (or both) as (separate) sensitivity parameters. Because the observed data do not identify the sensitivity parameters, different "reference" values are specified, typically by selecting a single measured confounder (among possibly many) as a proxy for the hypothetical confounder, then judging the (minimum) extent that inference is affected. We defer reviewing (other) existing methods to Appendix E. In contrast to these approaches, we exploit the richness of all available joint information on the measured covariates, by probing the effects across different amounts of intentionally induced unmeasured confounding using different adjustment sets, without being limited to particular interpretations and values of the (unidentifiable) sensitivity parameters. This can turn out to be informative. For example, suppose that a researcher has recorded all confounders so that there truly exists no unmeasured confounding, but this fact is unknown to them. Existing methods would nonetheless imply that unmeasured confounding remains a possibility in the worst-case scenario, which may inflate the degree of uncertainty using subjective judgment. In contrast, our proposal exploits features of the available data and study design to better inform the plausibility of unmeasured confounding. In particular, if all confounders are indeed adjusted for, and additional "redundant" covariates that are associated with only either exposure or outcome are subsequently recorded (Greenland describes such an example), then our proposed analysis would reveal little or no sensitivity of inference due to unmeasured confounding.
We acknowledge the heuristic nature of our proposal, but argue that all sensitivity analyses for unmeasured confounding are inherently heuristic, because the observed data do not identify the causal effect under such settings; see, e.g., Cusson and Infante-Rivard and the ensuing discussions.

Our proposal therefore differs from existing approaches in a few aspects. (i) No (unidentifiable) sensitivity parameters are required. (ii) No assumptions about the (arbitrary) distribution or scale of the hypothetical confounder(s), or whether they amplify or nullify the effect estimate, need to be imposed. (iii) The exposure and outcome can be continuous or non-continuous, so that non-linear models can be readily accommodated for estimating the exposure effect. (iv) The sensitivity of exposure effects to the possibility of (un)measured confounding can be concisely inspected using both graphical and numerical methods, as we demonstrated in the illustrations. (v) The proposed strategy is not limited to the average exposure effect, and can be readily applied to assess the sensitivity of any (scalar) causal effect to unmeasured confounding, such as the effect of exposure among the (un)treated. In principle, the strategy can be readily generalized in future work to more complex settings, such as heterogeneous exposure effects and nonparametric estimation methods. (vi) In contrast to existing approaches, the proposal works best when a large collection of covariates has been recorded, and background or empirical information on both measured and unmeasured confounders is available, so that their joint influence on the exposure effect can be meticulously examined.

REFERENCES

[1] M. Bonvini and E. H. Kennedy. Sensitivity analysis via the proportion of unmeasured confounding. arXiv preprint arXiv:1912.02793, 2019.

[2] M. A. Brookhart, S. Schneeweiss, K. J. Rothman, R. J. Glynn, J. Avorn, and T. Stürmer. Variable selection for propensity score models. American Journal of Epidemiology, 163(12):1149–1156, 2006.

[3] H. K. Brown, J. G. Ray, A. S. Wilton, Y. Lunsky, T. Gomes, and S. N. Vigod. Association between serotonergic antidepressant use during pregnancy and autism spectrum disorder in children. JAMA, 317(15):1544–1552, 2017.

[4] N. B. Carnegie, M. Harada, and J. L. Hill. Assessing sensitivity to unmeasured confounding using a simulated potential confounder. Journal of Research on Educational Effectiveness, 9(3):395–420, 2016.

[5] N. B. Carnegie, M. Harada, V. Dorie, and J. Hill. treatSens: A Package to Assess Sensitivity of Causal Analyses to Unmeasured Confounding. New York, NY, August 2020. URL https://github.com/vdorie/treatSens. R package version 3.0.

[6] C. Cinelli and C. Hazlett. Making sense of sensitivity: extending omitted variable bias. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(1):39–67, 2020.

[7] C. Cinelli, J. Ferwerda, and C. Hazlett. sensemakr: Sensitivity Analysis Tools for Regression Models, 2020. URL https://CRAN.R-project.org/package=sensemakr. R package version 0.1.3.

[8] A. F. Connors, T. Speroff, N. V. Dawson, C. Thomas, F. E. Harrell, D. Wagner, N. Desbiens, L. Goldman, A. W. Wu, R. M. Califf, et al. The effectiveness of right heart catheterization in the initial care of critically ill patients. Journal of the American Medical Association, 276(11):889–897, 1996.

[9] J. R. Cook and L. A. Stefanski. Simulation-extrapolation estimation in parametric measurement error models. Journal of the American Statistical Association, 89(428):1314–1328, 1994.

[10] C. M. Crainiceanu, F. Dominici, and G. Parmigiani. Adjustment uncertainty in effect estimation. Biometrika, 95(3):635–651, 2008.

[11] A. Cusson and C. Infante-Rivard. Bias factor, maximum bias and the E-value: insight and extended applications. International Journal of Epidemiology, 49(5):1509–1516, 2020.

[12] P. Ding and T. J. VanderWeele. Sensitivity analysis without assumptions. Epidemiology, 27(3):368, 2016.

[13] V. Dorie, M. Harada, N. B. Carnegie, and J. Hill. A flexible, interpretable framework for assessing sensitivity to unmeasured confounding. Statistics in Medicine, 35(20):3453–3470, 2016.

[14] M. Genbäck and X. de Luna. Causal inference accounting for unobserved confounding after outcome regression and doubly robust estimation. Biometrics, 75(2):506–515, 2019.

[15] S. Greenland. The impact of prior distributions for uncontrolled confounding and response bias: a case study of the relation of wire codes and magnetic fields to childhood leukemia. Journal of the American Statistical Association, 98(461):47–54, 2003.

[16] S. Greenland. Commentary: An argument against E-values for assessing the plausibility that an association could be explained away by residual confounding. International Journal of Epidemiology, 49(5):1501–1503, 2020.

[17] S. Greenland, J. M. Robins, and J. Pearl. Confounding and collapsibility in causal inference. Statistical Science, 14(1):29–46, 1999.

[18] R. H. Groenwold, I. Shofty, M. Miočević, M. Van Smeden, and I. Klugkist. Adjustment for unmeasured confounding through informative priors for the confounder-outcome relation. BMC Medical Research Methodology, 18(1):174, 2018.

[19] G. W. Imbens. Sensitivity to exogeneity assumptions in program evaluation. American Economic Review, 93(2):126–132, May 2003.

[20] D. Y. Lin, B. M. Psaty, and R. A. Kronmal. Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics, 54(3):948–963, 1998.

[21] W. Liu, S. J. Kuramoto, and E. A. Stuart. An introduction to sensitivity analysis for unobserved confounding in nonexperimental prevention research. Prevention Science, 14(6):570–580, 2013.

[22] W. W. Loh and S. Vansteelandt. Confounder selection strategies targeting stable treatment effect estimators. Statistics in Medicine, 2020.

[23] L. C. McCandless and P. Gustafson. A comparison of Bayesian and Monte Carlo sensitivity analysis for unmeasured confounding. Statistics in Medicine, 36(18):2887–2901, 2017.

[24] L. C. McCandless, P. Gustafson, and A. Levy. Bayesian sensitivity analysis for unmeasured confounding in observational studies. Statistics in Medicine, 26:2331–2347, 2007.

[25] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2020.

[26] A. Richardson, M. G. Hudgens, P. B. Gilbert, and J. P. Fine. Nonparametric bounds and sensitivity analysis of treatment effects. Statistical Science, 29(4):596, 2014.

[27] J. M. Robins. [Covariance adjustment in randomized experiments and observational studies]: Comment. Statistical Science, 17(3):309–321, 2002.

[28] P. Rosenbaum. A conditional test with demonstrated insensitivity to unmeasured bias in matched observational studies. Biometrika, 2020.

[29] P. R. Rosenbaum. Sensitivity analysis for certain permutation inferences in matched observational studies. Biometrika, 74:13–26, 1987.

[30] P. R. Rosenbaum. Model-based direct adjustment. Journal of the American Statistical Association, 82(398):387–394, 1987.

[31] P. R. Rosenbaum. Observational Studies. Springer, New York, 2002.

[32] P. R. Rosenbaum and D. B. Rubin. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome.

Journal of the Royal Statistical Society: Series B (Method-ological) , 45(2):212–218, 1983.[33] Z. Tan. A distributional approach for causal inference using propensity scores.

Journal of the Ameri-can Statistical Association , 101(476):1619–1637, 2006.[34] T. J. VanderWeele and O. A. Arah. Bias formulas for sensitivity analysis of unmeasured confoundingfor general outcomes, treatments, and confounders.

Epidemiology , 22:42–52, 2011.[35] T. J. VanderWeele and P. Ding. Sensitivity analysis in observational research: introducing the E-value.

Annals of internal medicine , 2017.[36] S. Vansteelandt and N. Keiding. Invited Commentary: G-Computation–Lost in Translation?

Ameri-can Journal of Epidemiology , 173(7):739–742, 03 2011. ISSN 0002-9262. doi: 10.1093/aje/kwq474.[37] V. Veitch and A. Zaveri. Sense and sensitivity analysis: Simple post-hoc analysis of bias due to unob-served confounding. arXiv preprint arXiv:2003.01747 , 2020.[38] J. Wakeﬁeld.

Bayesian and frequentist regression methods . Springer Science & Business Media, 2013.[39] B. D. Williamson, P. B. Gilbert, M. Carone, and N. Simon. Nonparametric variable importance as-sessment using machine learning techniques.

Biometrics , 2020. doi: https://doi.org/10.1111/biom.13392. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/biom.13392 .[40] K. Wuthrich and Y. Zhu. Omitted variable bias of lasso-based inference methods: A ﬁnite sampleanalysis.

Available at SSRN 3379123 , 2019. doi: 10.2139/ssrn.3379123. URL https://dx.doi.org/10.2139/ssrn.3379123 .[41] B. Zhang and E. J. Tchetgen Tchetgen. A semiparametric approach to model-based sensitivity analy-sis in observational studies. arXiv preprint arXiv:1910.14130 , 2019.1342] Q. Zhao, D. S. Small, and B. B. Bhattacharya. Sensitivity analysis for inverse probability weightingestimators via the percentile bootstrap.

Journal of the Royal Statistical Society: Series B (StatisticalMethodology) , 81(4):735–761, 2019. − . − . − . − . − . . . . Uncertainty intervals for RHC data

Figure 1 (uncertainty intervals for the RHC data; effect estimate versus number of confounders adjusted for): Estimators of the average exposure effect from each orbit, for the RHC data. Circles indicate the point estimates, and (inverted) triangles indicate the (lower) upper endpoint of the 95% CI. Empty points indicate the sequence of effect estimates constructed by eliminating a single measured confounder from adjustment in turn. The curve corresponds to a fitted natural cubic spline. Filled points indicate the (extrapolated) predictions after adjusting for $q$ additional unmeasured confounders. Each thin grey line corresponds to the natural cubic spline fitted to a perturbed sequence. The vertical lines indicate the symmetric 95 percentiles of the predicted effects across all $B$ perturbations.

A EXPOSURE EFFECT ESTIMATOR

The effect estimator based on doubly robust standardization is calculated as follows. For a binary exposure, denote the (non-linear) exposure model, conditional on the remaining confounders in the $j$-th orbit $L^j$, by $\mathrm{E}(A_i \mid L^j_i) = \Pr(A_i = 1 \mid L^j_i) = \mathrm{expit}(\alpha_j^{\top} L^j_i)$, where the subscript $i$ of $L^j_i$ denotes individual $i$ and $\mathrm{expit}(x) = \exp(x)/\{1 + \exp(x)\}$. Define the inverse probability of exposure weight for individual $i$ as:

$$W^j_i = \frac{A_i}{\Pr(A_i = 1 \mid L_i = L^j_i)} + \frac{1 - A_i}{1 - \Pr(A_i = 1 \mid L_i = L^j_i)}. \tag{2}$$

The weight $W^j_i$ is the reciprocal of the conditional probability of individual $i$ being assigned the observed exposure $A_i$ given the confounders $L^j_i$. Let $\widehat{W}^j_i$ denote the estimated weights obtained by substituting the maximum likelihood estimators (MLEs) for $\alpha_j$ in the exposure model. Fit the outcome regression model $\mathrm{E}(Y \mid A, L^j) = h^{-1}(\psi^{*}_j A + \beta_j^{\top} L^j)$ to the observed data using the aforementioned weights, where $h^{-1}$ is the inverse of a link function $h$. (The $*$ superscript indicates a conditional effect that may differ from the marginal effect $\psi_j$.) Let $\widehat{\mathrm{E}}(Y \mid A, L^j)$ denote the fitted outcome model obtained by plugging in the MLEs for $\psi^{*}_j$ and $\beta_j$. A doubly robust estimator of the average potential outcome $\mathrm{E}(Y^a)$, for $a = 0, 1$, is:

$$\widehat{\mathrm{E}}(Y^a) = n^{-1} \sum_i \left[ I\{A_i = a\}\, \widehat{W}^j_i \left\{ Y_i - \widehat{\mathrm{E}}(Y \mid A = A_i, L^j_i) \right\} + \widehat{\mathrm{E}}(Y \mid A = a, L^j_i) \right], \tag{3}$$

where $I\{B\} = 1$ if $B$ is true and $0$ otherwise. The estimator for the marginal exposure effect $\psi = \mathrm{E}(Y^1) - \mathrm{E}(Y^0)$ in the $j$-th orbit is therefore:

$$\widehat{\psi}_j = n^{-1} \sum_i \left[ (2A_i - 1)\, \widehat{W}^j_i \left\{ Y_i - \widehat{\mathrm{E}}(Y \mid A = A_i, L^j_i) \right\} + \widehat{\mathrm{E}}(Y \mid A = 1, L^j_i) - \widehat{\mathrm{E}}(Y \mid A = 0, L^j_i) \right]. \tag{4}$$

When both exposure and outcome models are correctly specified, an asymptotic expansion around $\psi$ yields the so-called "influence function" for individual $i$ as:

$$\phi^j_i = (2A_i - 1)\, W^j_i \left\{ Y_i - \mathrm{E}(Y \mid A = A_i, L^j_i) \right\} + \mathrm{E}(Y \mid A = 1, L^j_i) - \mathrm{E}(Y \mid A = 0, L^j_i) - \psi. \tag{5}$$

Let $\widehat{\phi}^j_i = (2A_i - 1)\, \widehat{W}^j_i \{ Y_i - \widehat{\mathrm{E}}(Y \mid A = A_i, L^j_i) \} + \widehat{\mathrm{E}}(Y \mid A = 1, L^j_i) - \widehat{\mathrm{E}}(Y \mid A = 0, L^j_i) - \widehat{\psi}_j$ denote the estimated influence function, obtained by plugging in the MLEs for the coefficients in the exposure and outcome models, and substituting the population expectation with a sample average. The variance of the difference between effect estimators from two different orbits, e.g., $j$ and $k$, is consistently estimated by the sample variance (denoted by $\widehat{V}$) of the corresponding difference in estimated influence functions:

$$\widehat{V}\left\{ n^{1/2} \left( \widehat{\psi}_j - \widehat{\psi}_k \right) \right\} = (n-1)^{-1} \sum_i \left( \widehat{\phi}^j_i - \widehat{\phi}^k_i \right)^2. \tag{6}$$

Consistency and asymptotic normality of the standardized difference (1), with mean zero and variance one, follow directly from the law of large numbers and the central limit theorem. The squared difference between the (asymptotic) expectations of the effect estimators is then approximated by $( \widehat{\psi}_j - \widehat{\psi}_k )^2 - \widehat{V}( \widehat{\psi}_j - \widehat{\psi}_k )$.

A perturbed effect estimator can be calculated as follows.
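The doubly robust estimator (4) and the influence-function variance (6) can be sketched numerically as follows. This is an illustrative Python translation on a toy dataset, not the authors' code: the logistic exposure model, linear outcome model, and all numbers are our own, and the true (rather than fitted) models stand in for the working models of each orbit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data with a known marginal effect psi = 1 (illustration only).
n = 5000
L = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-0.5 * L))      # exposure model, logit link
A = rng.binomial(1, p_true)
Y = A + L + rng.normal(size=n)           # identity link, so psi* = psi = 1

def dr_effect(A, Y, p_hat, mu1, mu0):
    """Doubly robust estimator (4) and centered influence function (5)."""
    W = A / p_hat + (1 - A) / (1 - p_hat)              # weights, eq. (2)
    mu_obs = np.where(A == 1, mu1, mu0)                # fitted E(Y | A, L)
    contrib = (2 * A - 1) * W * (Y - mu_obs) + mu1 - mu0
    psi_hat = contrib.mean()                           # eq. (4)
    phi_hat = contrib - psi_hat                        # eq. (5), centered
    return psi_hat, phi_hat

# Orbit j: adjust for L (true models stand in for fitted ones).
psi_j, phi_j = dr_effect(A, Y, p_true, mu1=1 + L, mu0=L)

# Orbit k: adjust for nothing (marginal exposure probability and means).
psi_k, phi_k = dr_effect(A, Y, np.full(n, A.mean()),
                         mu1=np.full(n, Y[A == 1].mean()),
                         mu0=np.full(n, Y[A == 0].mean()))

# Eq. (6): variance of the influence-function difference between the orbits.
v_diff = ((phi_j - phi_k) ** 2).sum() / (n - 1)
```

The unadjusted orbit's estimate `psi_k` is confounded by `L` and exceeds the adjusted estimate `psi_j`, which recovers the true effect.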
In step 2 of the proposed procedure, in place of calculating the effect estimator $\widehat{\psi}_{j+1,k}$ for each candidate confounder based on the maximum likelihood estimates (MLEs) of the (coefficients in the) exposure and outcome models, carry out the following steps instead:

2(a) Fit the exposure model $\Pr(A_i = 1 \mid L^{j+1,-k}_i) = \mathrm{expit}(\alpha_{j+1,-k}^{\top} L^{j+1,-k}_i)$ to the observed data, where $L^{j+1,-k}_i = (L^{j+1}_i \setminus L^{j+1,k}_i)$ denotes the confounders in $L^{j+1}$ excluding the single confounder $L^{j+1,k}$ for individual $i$, and $\alpha_{j+1,-k}$ denotes the corresponding coefficients. Calculate the MLEs, denoted by, e.g., $\widehat{\alpha}_{j+1,-k}$, and the observed weights as defined in (2), by substituting $\widehat{\alpha}_{j+1,-k}$ for $\alpha_{j+1,-k}$ in the exposure model.

2(b) Fit the outcome regression model $\mathrm{E}(Y \mid A, L^{j+1,-k}) = h^{-1}(\psi^{*}_{j+1,-k} A + \beta_{j+1,-k}^{\top} L^{j+1,-k})$ to the observed data using the observed weights from the previous step, where $h^{-1}$ is the inverse link function in a canonical generalized linear model for the outcome. Calculate the MLEs, e.g., $(\widehat{\psi}^{*}_{j+1,-k}, \widehat{\beta}_{j+1,-k})$, as well as the observed Fisher information matrix.

2(c) Randomly draw a value of the estimated outcome model coefficients $(\widehat{\psi}^{*}_{j+1,-k}, \widehat{\beta}_{j+1,-k})$ from their joint sampling distribution, which we denote simply by $G(\cdot)$. In practice, $G(\cdot)$ may be approximated by the (asymptotically valid) multivariate normal distribution with mean vector $(\widehat{\psi}^{*}_{j+1,-k}, \widehat{\beta}_{j+1,-k})$ and covariance matrix equal to the inverse of the observed Fisher information matrix; see, e.g., Wakefield, Section 7.6.2. Denote the randomly drawn values by $(\widetilde{\psi}^{*}_{j+1,-k}, \widetilde{\beta}_{j+1,-k})$. Let $\widetilde{\mathrm{E}}(Y \mid A, L^{j+1,-k})$ denote the perturbed outcome model obtained by plugging in the randomly drawn values of the outcome model coefficients.

2(d) Calculate the effect estimator as described in the Appendix, using the observed (exposure) weights and the perturbed outcome model.
Denote the resulting perturbed estimator by $\widetilde{\psi}_{j+1,k}$.

The difference between $\widehat{\psi}_{j+1,k}$ and $\widetilde{\psi}_{j+1,k}$ therefore lies in step 2(c): the former is based on the MLEs of the outcome model coefficients, whereas the latter is based on a random draw from the estimated coefficients' sampling distribution. Calculating the perturbed effect estimators only requires fitting the exposure and outcome models once (to the observed data), which is computationally more efficient than, e.g., a nonparametric bootstrap procedure that randomly resamples observations with replacement and refits both exposure and outcome models. For computational convenience, we have considered only a perturbed outcome model. In principle, a random draw of the estimated exposure model coefficients from their sampling distribution, denoted by, e.g., $\widetilde{\alpha}_{j+1,-k}$, may be made in step 2(a). The weights as defined in (2) can then be calculated by substituting $\widetilde{\alpha}_{j+1,-k}$ for $\alpha_{j+1,-k}$ in the exposure model. The resulting (perturbed) weights are then used to fit the outcome model in step 2(b).

B FITTING NATURAL CUBIC SPLINES TO A TRAJECTORY OF THE EXPOSURE EFFECT ESTIMATOR

A natural cubic spline is fitted to each sequence of exposure effect estimators, e.g., $\widehat{\psi}_j$, $j = 1, \ldots, J+1$, with the number of measured covariates adjusted for ($0, \ldots, J$, after excluding the intercept) as a predictor. Natural cubic splines permit flexibly evaluating the trajectory of the exposure effect estimator as unmeasured confounding is reduced one covariate at a time. The predicted value of the effect estimator, had a given number of hypothetical confounders been further adjusted for, can then be linearly extrapolated to. The extrapolation therefore assumes that the impact of adjusting for hypothetical confounders on the causal effect satisfies the boundary conditions of the fitted spline, e.g., a zero second derivative at the boundary knots. For simplicity, natural cubic splines can be fitted with the largest number of (evenly-spaced) interior knots permitted to ensure identifiability of the spline; when there are $J$ confounders, there may be as many as $J - 1$ interior knots.

C SIMULATION STUDIES

Simulation studies were conducted under different data-generating scenarios to empirically evaluate the ability of the proposal to avoid potential biases due to unmeasured confounding.

C.1 Study 1

Data for a single population were generated as follows:

$$L_1, \ldots, L_{p+q} \sim \text{i.i.d. } N(0, 1),$$
$$A \sim \text{Bernoulli}\left[ \mathrm{expit}\left( \alpha_0 + \textstyle\sum_{k=1}^{p+q} \alpha_k L_k \right) \right],$$
$$Y \sim \text{Bernoulli}\left[ \mathrm{expit}\left( \beta_0 + \delta A + \textstyle\sum_{k=1}^{p+q} \beta_k L_k \right) \right].$$

We set $\alpha_0 = \beta_0 = 0$, $\alpha_k \sim \text{Uniform}(-1, 1)$, and $\beta_k = \alpha_k$, $k = 1, \ldots, p+q$, so that each confounder had the same strength of influence on the exposure and outcome. To induce unmeasured confounding, we iteratively removed the $q$ confounders that resulted in the smallest (absolute) changes to the effects between consecutive orbits following the deterministic criterion (1). These (same) $q$ confounders would thus be regarded as unmeasured (in each observed dataset), so that only $p$ measured confounders remained available for adjustment. We considered values of $p \in \{12, 16\}$ and $q \in \{0, 4, 8\}$. The true causal effect was zero when $\delta = 0$; we considered values of $\delta \in \{0, 1\}$. There were a total of 12 different data-generating scenarios. For each setting, we first generated a single population of size $N = 50000$ (so that sampling variability can be essentially ignored when determining which confounders are to be unmeasured), then generated each observed dataset by randomly sampling (without replacement) $n = 1000$ individuals from the population. For each observed dataset, we eliminated the measured confounders in turn by minimizing the (squared) differences in the effects between consecutive orbits following (1). We then constructed the sequence of effect estimators, as well as the 95% CIs for the effect, indexed by the sequence of nested confounder subsets. For simplicity, we fitted to each sequence a natural cubic spline with the largest number of (evenly-spaced) interior knots permitted to ensure identifiability of the spline; when there are $p$ confounders, there may be as many as $p - 1$ interior knots. The predicted effect, had the $q$ unmeasured confounders been additionally adjusted for, was then extrapolated to. When $q = 0$, we extrapolated to two unmeasured confounders to assess the stability of (inference for) the effect. We simulated 1000 samples for each setting.
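The study-1 data-generating process and the spline-and-extrapolate step can be sketched as follows. This is an illustrative Python sketch, not the authors' R code: the Uniform(-1, 1) bounds for the coefficients are a reconstruction (the printed values were lost in extraction), and the effect-estimate trajectory fed to the spline is a made-up stand-in for the orbit-wise estimates.

```python
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(2021)

def expit(x):
    return 1 / (1 + np.exp(-x))

# --- Data-generating process for one population (bounds are assumptions). ---
N, p, q, delta = 50000, 12, 4, 0.0
L = rng.normal(size=(N, p + q))                   # i.i.d. N(0, 1) confounders
alpha = rng.uniform(-1, 1, size=p + q)
beta = alpha.copy()                               # beta_k = alpha_k
A = rng.binomial(1, expit(L @ alpha))             # alpha_0 = 0
Y = rng.binomial(1, expit(delta * A + L @ beta))  # beta_0 = 0

# One observed dataset: n = 1000 drawn without replacement.
idx = rng.choice(N, size=1000, replace=False)

# --- Natural cubic spline fit and linear extrapolation. ---
n_adj = np.arange(p + 1)                      # 0, ..., p confounders adjusted for
psi_hat = 0.15 + 0.10 * np.exp(-0.5 * n_adj)  # stand-in effect trajectory
spline = CubicSpline(n_adj, psi_hat, bc_type="natural")

# Extrapolate linearly from the upper boundary knot, where the natural
# spline's second derivative is zero, to q additional hypothetical confounders.
pred_q = spline(p) + spline(p, 1) * q
```

The `bc_type="natural"` boundary condition enforces the zero second derivative at the boundary knots that the extrapolation in Appendix B assumes.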
We considered point estimates of, and 95% CIs for, the population average exposure effect $\psi$ that were based on either (i) adjusting for all $p+q$ (un)measured confounders associated with exposure and outcome, (ii) adjusting for only the $p$ measured confounders, or (iii) the extrapolated predictions. The empirical mean and standard deviation (across all simulated samples) of all three point estimates are displayed in Table 1. We also assessed the empirical frequency at which each of the 95% CIs, as well as the uncertainty interval, included the average treatment effect. (Recall that the uncertainty interval for a single observed dataset was the union of all extrapolated 95% CIs across all perturbed sequences.) Adjusting for only the measured confounders yielded only slightly biased point estimates, but resulted in 95% CIs whose coverage levels were empirically far below the nominal level in the presence of a large number of unmeasured confounders. The extrapolated predicted effect was similarly slightly biased, but had much greater empirical variability, resulting in the largest empirical mean squared error (MSE) among the three estimates.
When there were many unmeasured confounders, having more measured covariates from which to learn about the impact of unmeasured confounding made the uncertainty intervals more likely to capture the population average effect at (or exceeding) the nominal level of the constituent 95% CIs.

| q | ψ | p | Est. (All) | Est. (Measured) | Est. (Predicted) | √MSE (All) | √MSE (Measured) | √MSE (Predicted) | Cov. (All) | Cov. (Measured) | Cov. (Predicted) |
|---|------|----|-------------|-------------|-------------|------|------|------|------|------|------|
| 0 | 0.00 | 12 | 0.00 (0.03) | 0.00 (0.03) | 0.00 (0.03) | 0.03 | 0.03 | 0.03 | 0.94 | 0.94 | 0.98 |
| 0 | 0.00 | 16 | 0.00 (0.03) | 0.00 (0.03) | 0.00 (0.03) | 0.03 | 0.03 | 0.03 | 0.93 | 0.93 | 0.98 |
| 0 | 0.16 | 12 | 0.16 (0.03) | 0.16 (0.03) | 0.15 (0.03) | 0.03 | 0.03 | 0.03 | 0.94 | 0.94 | 0.98 |
| 0 | 0.13 | 16 | 0.13 (0.03) | 0.13 (0.03) | 0.13 (0.03) | 0.03 | 0.03 | 0.03 | 0.93 | 0.93 | 0.98 |
| 4 | 0.00 | 12 | 0.00 (0.03) | 0.01 (0.03) | 0.00 (0.04) | 0.03 | 0.03 | 0.04 | 0.94 | 0.95 | 0.99 |
| 4 | 0.00 | 16 | 0.00 (0.03) | 0.00 (0.03) | -0.01 (0.04) | 0.03 | 0.03 | 0.04 | 0.93 | 0.94 | 0.98 |
| 4 | 0.13 | 12 | 0.13 (0.03) | 0.14 (0.03) | 0.13 (0.04) | 0.03 | 0.04 | 0.04 | 0.92 | 0.94 | 0.99 |
| 4 | 0.13 | 16 | 0.13 (0.04) | 0.13 (0.04) | 0.12 (0.04) | 0.04 | 0.04 | 0.05 | 0.90 | 0.92 | 0.98 |
| 8 | 0.00 | 12 | 0.00 (0.03) | 0.05 (0.03) | -0.05 (0.08) | 0.03 | 0.06 | 0.09 | 0.94 | 0.72 | 0.85 |
| 8 | 0.00 | 16 | -0.01 (0.03) | 0.04 (0.03) | -0.04 (0.12) | 0.03 | 0.05 | 0.12 | 0.90 | 0.73 | 0.93 |
| 8 | 0.13 | 12 | 0.12 (0.04) | 0.18 (0.04) | 0.08 (0.09) | 0.04 | 0.07 | 0.10 | 0.89 | 0.72 | 0.91 |
| 8 | 0.11 | 16 | 0.10 (0.03) | 0.15 (0.04) | 0.08 (0.10) | 0.03 | 0.06 | 0.11 | 0.89 | 0.70 | 0.97 |

Table 1: Results for simulation study 1. Point estimates of, and inference for, the average exposure effect $\psi$ were based on either (i) adjusting for all confounders ("All"), (ii) adjusting for only the measured confounders ("Measured"), or (iii) the extrapolated prediction ("Predicted"). The "Est." columns give the point estimates, the "√MSE" columns the square root of the empirical MSE, and the "Cov." columns the coverage of the 95% CIs. The empirical MSE was calculated as the sum of the squared bias and the empirical variance of the point estimates. Coverage for the 95% CIs was calculated as the empirical proportion of CIs that included the population average exposure effect.
For simplicity, the extrapolated uncertainty interval is termed the "Predicted 95% CI" in this table. Empirical standard errors are in brackets. All results were rounded to two decimal places.

C.2 Study 2

In this study, we compare the proposed method with the existing methods of Carnegie et al., as implemented in the treatSens package, and of Cinelli and Hazlett, as implemented in the sensemakr package in R. We modified the data-generating process of the previous study to fulfill the required assumptions of Carnegie et al.; i.e., the exposure was generated with $\Pr(A = 1 \mid L) = \Phi(\alpha_0 + \sum_{k=1}^{p+q} \alpha_k L_k)$, where $\Phi(\cdot)$ is the cumulative distribution function of a standard normal variable, so that the exposure model employs a "probit" link; and the outcome $Y \sim N(\beta_0 + \delta A + \sum_{k=1}^{p+q} \beta_k L_k,\ p+q)$ was continuous and normally distributed (with variance simply equal to the total variance of all $p+q$ independent confounders with unit variance). Cinelli and Hazlett require only a continuous outcome so that linear regression models can be fitted, with no exposure model needed. We set $\alpha_0 = \beta_0 = 0$, $\alpha_k \sim \text{Uniform}(-0.5, 0.5)$, and $\beta_k \sim \text{Uniform}(-1, 1)$, $k = 1, \ldots, p+q$. We considered settings with $p = 16$, $q \in \{0, 4, 8\}$, and $\delta \in \{0, 2\}$. In addition, to empirically evaluate the biases that can result when the "probit" link function in the assumed exposure model is misspecified, we generated data for the exposure under a "logit" link function. There were a total of 12 different data-generating scenarios.

For each setting, we first generated a single population of size $N = 50000$ with all $p+q$ confounders, then induced unmeasured confounding by iteratively removing the $q$ confounders that resulted in the smallest (absolute) changes to the effects between consecutive orbits following the deterministic criterion (1). These (same) $q$ confounders would thus be regarded as unmeasured (in each observed sample), so that only $p$ measured confounders remained available for adjustment.
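The study-2 data-generating process can be sketched as follows; this is an illustrative Python version, and the Uniform(-0.5, 0.5) and Uniform(-1, 1) coefficient bounds are reconstructions from the garbled text that should be treated as assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

def generate_study2(N=50000, p=16, q=4, delta=2.0, link="probit"):
    """Probit (or logit, to probe link misspecification) exposure model and
    a normal outcome with variance p + q, as in simulation study 2."""
    L = rng.normal(size=(N, p + q))
    alpha = rng.uniform(-0.5, 0.5, size=p + q)   # assumed bounds
    beta = rng.uniform(-1.0, 1.0, size=p + q)    # assumed bounds
    lin = L @ alpha                               # alpha_0 = 0
    pA = norm.cdf(lin) if link == "probit" else 1 / (1 + np.exp(-lin))
    A = rng.binomial(1, pA)
    # Continuous outcome: variance p + q matches the total variance of the
    # p + q independent unit-variance confounders.
    Y = delta * A + L @ beta + rng.normal(scale=np.sqrt(p + q), size=N)
    return L, A, Y

L, A, Y = generate_study2()
```

Passing `link="logit"` generates the exposure under a logit link while a probit exposure model would still be assumed at the analysis stage, which is how the link-misspecification scenarios arise.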
The data for each observed sample were generated by randomly drawing (without replacement) $n = 2000$ individuals from the population. For each observed dataset, we carried out the proposed sensitivity analysis by eliminating the measured confounders in turn to minimize the (squared) differences in the effects between consecutive orbits following (1). We then constructed the sequence of standardized effect estimators, as well as the 95% CIs for the effect, indexed by the sequence of nested confounder subsets. Natural cubic splines with the maximum number of $p - 1$ (evenly-spaced) interior knots were fitted to each sequence. The predicted effect, had the $q$ unmeasured confounders been additionally adjusted for, was then extrapolated to. When $q = 0$, we extrapolated to two unmeasured confounders to assess the stability of (inference for) the effect.

Both the treatSens and sensemakr methods require specifying values for the sensitivity parameters that encode the separate associations between a single hypothetical unmeasured confounder and the exposure, and the outcome, respectively, for each individual simulated dataset. For simplicity when implementing these simulation studies, we selected a "calibrated," or "benchmark," value among the coefficient estimates of the $p$ measured confounders in the exposure (or outcome) model fitted to each simulated observed dataset. When $q = 0$, so that there truly was no unmeasured confounding, we selected the coefficient estimate with the smallest (absolute) magnitude among the $p$ estimated coefficients, to assess the impact of incorrectly assuming some (minimal) unmeasured confounding. When $q > 0$, we selected a coefficient estimate among the $p$ estimated coefficients as a proxy for the multiple unmeasured confounders. When using the treatSens method, we sampled 500 posterior draws of the exposure effect point estimate, and its standard error estimate, for the given combination of the sensitivity parameter values. The point estimate was determined as the posterior mean, and the standard error estimate was determined using Rubin's rules as described in Carnegie et al.
; 95% Wald CIs were then constructed using the corresponding standard normal quantiles. When using the sensemakr method, we selected a bias adjustment that reduces (increases) the absolute value of the estimated coefficient for the exposure in the outcome model if $\psi = 0$ ($\psi > 0$), even though $\psi$ is unknown in practice. All other arguments in the sensemakr function were left at their default values.

We simulated 1000 samples for each setting. We again considered the point estimates of, and 95% CIs for, the population average exposure effect $\psi$ that were based on either (i) adjusting for all $p+q$ (un)measured confounders associated with exposure and outcome, (ii) adjusting for only the $p$ measured confounders, or (iii) the extrapolated predictions. In addition, we considered the predictions using the treatSens and sensemakr methods with their respective calibrated or benchmark values of the sensitivity parameters. The empirical mean and standard deviation (across all simulated samples) of all five point estimates are displayed in Table 2. Because the treatSens method can fail for a given combination of the calibrated sensitivity parameter values, we included the proportion of simulated datasets where the method returned such an error. We also assessed the empirical frequency at which each of the 95% CIs included the average treatment effect. There were no systematic biases in the point estimates empirically when all the confounders (including those regarded as unmeasured) were adjusted for, regardless of the assumed link function for the exposure model. The treatSens method failed in at least 40% of the simulated datasets; among datasets where the method successfully converged, unbiased point estimates were obtained only when there was truly no exposure effect ($\psi = 0$) and no unmeasured confounding ($q = 0$).
However, the 95% CIs met the nominal coverage levels only when the probit link was correctly assumed; in all other cases, the CIs failed to capture the true value of $\psi$ in all the simulated samples. The sensemakr method yielded unbiased estimates when there was truly no exposure effect ($\psi = 0$), and positively biased estimates when there was an exposure effect ($\psi > 0$) in the presence of unmeasured confounding ($q > 0$), with its 95% CIs then falling below the nominal coverage levels empirically. The uncertainty intervals for the extrapolated predictions attained (or exceeded) the nominal level of the constituent 95% CIs, except in the settings with $q = 8$, where the coverage levels fell below their nominal levels empirically.

Point estimates:

| q | link | ψ | All | Measured | % fail | treatSens | sensemakr | Predicted |
|---|--------|------|--------------|--------------|------|--------------|--------------|--------------|
| 0 | logit | 0.00 | 0.02 (0.18) | 0.02 (0.18) | 0.44 | -0.01 (0.02) | 0.01 (0.17) | 0.02 (0.24) |
| 0 | logit | 2.00 | 2.01 (0.19) | 2.01 (0.19) | 0.47 | 0.19 (0.02) | 2.02 (0.19) | 2.01 (0.25) |
| 0 | probit | 0.00 | 0.03 (0.20) | 0.03 (0.20) | 0.44 | 0.00 (0.02) | 0.03 (0.18) | 0.03 (0.25) |
| 0 | probit | 2.00 | 2.03 (0.20) | 2.03 (0.20) | 0.43 | 0.20 (0.02) | 2.05 (0.19) | 2.03 (0.24) |
| 4 | logit | 0.00 | -0.02 (0.20) | 0.11 (0.29) | 0.60 | -3.37 (0.97) | -0.06 (0.37) | 0.16 (0.53) |
| 4 | logit | 2.00 | 1.99 (0.20) | 2.12 (0.28) | 0.61 | -3.28 (0.83) | 2.60 (0.36) | 2.17 (0.52) |
| 4 | probit | 0.00 | -0.03 (0.23) | 0.16 (0.33) | 0.60 | -3.44 (0.84) | -0.19 (0.60) | 0.32 (0.66) |
| 4 | probit | 2.00 | 1.98 (0.24) | 2.16 (0.32) | 0.59 | -3.23 (0.91) | 2.94 (0.44) | 2.32 (0.67) |
| 8 | logit | 0.00 | 0.06 (0.23) | -0.02 (0.32) | 0.64 | -1.66 (1.38) | -0.01 (0.33) | 0.21 (1.15) |
| 8 | logit | 2.00 | 2.04 (0.22) | 1.97 (0.33) | 0.61 | -1.63 (1.42) | 2.35 (0.41) | 2.25 (1.11) |
| 8 | probit | 0.00 | 0.04 (0.29) | -0.08 (0.38) | 0.66 | -1.92 (1.46) | 0.03 (0.52) | 0.59 (1.56) |
| 8 | probit | 2.00 | 2.03 (0.27) | 1.88 (0.37) | 0.64 | -1.70 (1.45) | 2.47 (0.49) | 2.62 (1.48) |

Empirical MSE (square root):

| q | link | ψ | All | Measured | treatSens | sensemakr | Predicted |
|---|--------|------|------|------|------|------|------|
| 0 | logit | 0.00 | 0.18 | 0.18 | 0.02 | 0.17 | 0.24 |
| 0 | logit | 2.00 | 0.19 | 0.19 | 1.81 | 0.19 | 0.25 |
| 0 | probit | 0.00 | 0.20 | 0.20 | 0.02 | 0.18 | 0.26 |
| 0 | probit | 2.00 | 0.21 | 0.21 | 1.80 | 0.20 | 0.24 |
| 4 | logit | 0.00 | 0.21 | 0.31 | 3.50 | 0.38 | 0.56 |
| 4 | logit | 2.00 | 0.21 | 0.31 | 5.35 | 0.70 | 0.55 |
| 4 | probit | 0.00 | 0.23 | 0.37 | 3.54 | 0.63 | 0.73 |
| 4 | probit | 2.00 | 0.24 | 0.36 | 5.31 | 1.04 | 0.75 |
| 8 | logit | 0.00 | 0.24 | 0.32 | 2.16 | 0.33 | 1.17 |
| 8 | logit | 2.00 | 0.23 | 0.33 | 3.89 | 0.54 | 1.14 |
| 8 | probit | 0.00 | 0.29 | 0.39 | 2.41 | 0.52 | 1.67 |
| 8 | probit | 2.00 | 0.27 | 0.39 | 3.97 | 0.67 | 1.61 |

Coverage of 95% CIs:

| q | link | ψ | All | Measured | treatSens | sensemakr | Predicted |
|---|--------|------|------|------|------|------|------|
| 0 | logit | 0.00 | 0.96 | 0.96 | 0.00 | 0.96 | 0.99 |
| 0 | logit | 2.00 | 0.95 | 0.95 | 0.00 | 0.94 | 0.98 |
| 0 | probit | 0.00 | 0.95 | 0.95 | 1.00 | 0.95 | 0.98 |
| 0 | probit | 2.00 | 0.95 | 0.95 | 0.00 | 0.95 | 0.98 |
| 4 | logit | 0.00 | 0.95 | 0.93 | 0.00 | 0.76 | 0.98 |
| 4 | logit | 2.00 | 0.96 | 0.94 | 0.00 | 0.35 | 0.97 |
| 4 | probit | 0.00 | 0.94 | 0.91 | 0.00 | 0.43 | 0.95 |
| 4 | probit | 2.00 | 0.94 | 0.92 | 0.00 | 0.16 | 0.94 |
| 8 | logit | 0.00 | 0.94 | 0.96 | 0.00 | 0.91 | 0.81 |
| 8 | logit | 2.00 | 0.96 | 0.96 | 0.00 | 0.69 | 0.82 |
| 8 | probit | 0.00 | 0.93 | 0.94 | 0.00 | 0.79 | 0.84 |
| 8 | probit | 2.00 | 0.95 | 0.95 | 0.00 | 0.63 | 0.86 |

Table 2: Results for simulation study 2. Point estimates of, and inference for, the average exposure effect $\psi$ were based on either (i) adjusting for all confounders ("All"), (ii) adjusting for only the measured confounders ("Measured"), (iii) the extrapolated prediction ("Predicted"), (iv) the treatSens method assuming calibrated values of the sensitivity parameters, or (v) the sensemakr method assuming benchmark values of the sensitivity parameters. The proportion of simulated datasets where the treatSens method failed is labelled "% fail." The empirical MSE was calculated as the sum of the squared bias and the empirical variance of the point estimates. Coverage for the 95% CIs was calculated as the empirical proportion of CIs that included the population average exposure effect. Empirical standard errors are in brackets. All results were rounded to two decimal places.

D AIDS CLINICAL TRIALS GROUP STUDY 175

The 'ACTG175' dataset was from an AIDS randomized clinical trial, and was distributed as part of the speff2trial package via the Comprehensive R Archive Network (https://CRAN.R-project.org/package=speff2trial). The trial compared monotherapy using either zidovudine or didanosine alone with combination therapy using either zidovudine and didanosine, or zidovudine and zalcitabine, in adults infected with human immunodeficiency virus type I whose CD4 T cell counts were between 200 and 500 per cubic millimeter. Exposure was (re)coded as A = 0 for therapy using either zidovudine or didanosine only, and A = 1 for therapies combining zidovudine with either didanosine or zalcitabine. A binary outcome was defined based on whether a participant's CD4 T cell count at 96 ± … We constructed B = 500 perturbed sequences of the estimators and their corresponding 95% CIs. Across all the perturbations, the 95% CIs for the exposure effect excluded zero, even when (all) the measured covariates had been intentionally eliminated from adjustment. A natural cubic spline with the maximum number of 15 equally-spaced interior knots was fitted to each (perturbed) sequence of the estimators and endpoints of the 95% CI. The predicted effects, and uncertainty intervals, adjusting for between one and eight (half the number of measured covariates) hypothetical confounders were then extrapolated to. We trimmed 5% of the extreme endpoints of the perturbed CIs to construct the uncertainty intervals. Further adjustment for hypothetical unmeasured confounding appeared unlikely to yield an effect estimate that would be statistically indistinguishable from zero: the uncertainty interval excluded zero even after adjusting for seven additional hypothetical (unmeasured) confounders, as indicated by the filled triangles being above zero in Figure 2. The results suggested that statistical inference for the effect was unlikely to change (from adjusting for only the measured confounders). The conclusion that there was a (statistically significant) positive average exposure effect of combination therapy on CD4 T cell count thus appeared to be insensitive to (hypothetical) unmeasured confounding.

Figure 2 (uncertainty intervals for the ACTG175 data; effect estimate versus number of confounders adjusted for): Estimators of the average exposure effect from each orbit, for the ACTG175 data. Circles indicate the point estimates, and (inverted) triangles indicate the (lower) upper endpoint of the 95% CI. Empty points indicate the sequence of effect estimates constructed by eliminating a single measured confounder from adjustment in turn. The curve corresponds to a fitted natural cubic spline. Filled points indicate the (extrapolated) predictions after adjusting for $q$ additional unmeasured confounders. Each thin grey line corresponds to the natural cubic spline fitted to a perturbed sequence. The vertical lines indicate the symmetric 95 percentiles of the predicted effects across all $B$ perturbations.

E EXISTING METHODS FOR SENSITIVITY ANALYSIS

Existing methods for assessing sensitivity to unmeasured confounding typically judge the (minimum) extent to which adjusting for a hypothetical scalar confounder, conditional on the measured confounders, affects inference about the causal effect. A widely adopted approach focuses on characterizing the (strength and direction of) association between the hypothetical confounder and the exposure, either as a regression coefficient or as the odds ratio of receiving exposure, as a sensitivity parameter. Another approach further incorporates the (strength and direction of) association between the hypothetical confounder and the outcome as an additional sensitivity parameter. Recent methods adopting the latter approach include VanderWeele and Arah, Richardson et al., Carnegie et al., Ding and VanderWeele, Dorie et al., VanderWeele and Ding, Genbäck and de Luna, Zhang and Tchetgen Tchetgen, Cinelli and Hazlett, and Veitch and Zaveri, among many others. We refer readers to Liu et al. and Carnegie et al. for an overview of other approaches, including methods that require restrictive assumptions about the parametric form of the outcome model that encodes the exposure effect, and to Veitch and Zaveri for an overview of other methods that do not explicitly consider a hypothetical unmeasured confounder and involve other (more complex) sensitivity parameters. Methods using sensitivity parameters that encode the hypothetical confounder's separate associations with exposure and outcome typically require first specifying a discrete grid of values for the parameters, then estimating the causal effect after adjusting for the hypothetical confounder under different fixed parameter values.
Graphical tools, such as bivariate sensitivity contour plots that visualize different combinations of how strongly the hypothetical confounder must be (simultaneously) associated with exposure and outcome for the conclusions to be affected, can be especially useful for applied researchers.

But sensitivity analyses predicated on such sensitivity parameters suffer from a number of shortcomings. First, users of grid-based approaches must explicitly state the granularity, range, and even the signs of plausible values for the separate (conditional) associations between the hypothetical confounder and the exposure, and the outcome. Reference values are sometimes estimated or postulated based on the measured confounders' (conditional) associations with exposure (and outcome) via an empirical "calibration." A specific measured confounder is selected as a reference based on, e.g., its (standardized) estimated regression coefficient; or its marginal effect size on the outcome; or its relative risk on the (binary) outcome conditional on other measured confounders; or its contribution to the variation in the outcome. Bayesian sensitivity analyses for unmeasured confounding avoid such user calibration through the application of Bayes' theorem to obtain a posterior distribution for the sensitivity parameters, but require more restrictive parametric assumptions about the outcome model. Second, with the exception of Bonvini and Kennedy, Zhang and Tchetgen Tchetgen, and Cinelli and Hazlett, most sensitivity analyses that evaluate the associations between a hypothetical confounder and exposure, or outcome, or both, (implicitly) require certain assumptions about whether the confounder is continuous or dichotomous, the distribution of the confounder, or whether the bias is away from or toward the null. Third, and most crucially, sensitivity analyses based on such sensitivity parameters can potentially lead to scientifically meaningless and logically incoherent conclusions.
Consider the following simple example where the (true) probability of receiving a binary exposure ($A = 1$) depends on a logistic model with main effects for three confounders, $L_1$, $L_2$, and $U$; e.g., $\Pr(A = 1 \mid L_1, L_2, U) = \mathrm{expit}(\beta_0 + \beta_1 L_1 + \beta_2 L_2 + \Gamma U)$, where $\mathrm{expit}(x) = \exp(x)/\{1 + \exp(x)\}$. The interpretation of the conditional odds ratio of being exposed due to $L_1$ under the true model, $\exp(\beta_1)$, depends on the values of the other confounders $L_2$ and $U$, even in the absence of any interactions. Suppose that the researcher could only collect data on $L_1$ and $L_2$, so that $U$ is left unmeasured, and is thus restricted to fitting a different treatment model, e.g., $\Pr(A = 1 \mid L_1, L_2) = \mathrm{expit}(\beta^{*}_0 + \beta^{*}_1 L_1 + \beta^{*}_2 L_2)$. The interpretation of $\exp(\beta^{*}_1)$ under the fitted model thus depends on only the value of $L_2$. It follows that the interpretation of the sensitivity parameter, $\exp(\Gamma)$, is incompatible with both $\exp(\beta_1)$ and $\exp(\beta^{*}_1)$ due to non-collapsibility. Of course, one may consider a different treatment model instead, e.g., $\Pr(A = 1 \mid L_1, U) = \mathrm{expit}(\beta^{**}_0 + \beta^{**}_1 L_1 + \Gamma^{*} U)$, so that the interpretation of the sensitivity parameter $\exp(\Gamma^{*})$, which encodes the conditional odds ratio due to $U$ given $L_1$, is seemingly comparable with $\exp(\beta^{*}_1)$. But there is no guarantee that the range of plausible values for $\Gamma$ will necessarily be narrower than that for $\Gamma^{*}$, even though the former is based on an exposure model that adjusts for both $L_1$ and $L_2$.
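The non-collapsibility at the heart of this example can be demonstrated numerically. The sketch below is our own illustration (coefficient values are invented): marginalizing the true logistic model over an independent $U$ attenuates the conditional odds ratio for $L_1$ toward 1, so $\exp(\beta^{*}_1) \neq \exp(\beta_1)$ even though $U$ is independent of $(L_1, L_2)$ and hence no confounding bias is involved.

```python
import numpy as np

def expit(x):
    return 1 / (1 + np.exp(-x))

# True model: Pr(A=1 | L1, L2, U) = expit(b0 + b1*L1 + b2*L2 + Gamma*U),
# with U ~ N(0, 1) independent of (L1, L2). Coefficients are illustrative.
b0, b1, b2, gamma = 0.0, 1.0, 1.0, 2.0
u = np.random.default_rng(0).normal(size=200_000)

def pA(l1, l2):
    # Marginalize over U by Monte Carlo: Pr(A=1 | L1=l1, L2=l2).
    return expit(b0 + b1 * l1 + b2 * l2 + gamma * u).mean()

def odds(p):
    return p / (1 - p)

# Odds ratio for a one-unit increase in L1 at L2 = 0, after U is
# marginalized out; this corresponds (up to model fit) to exp(beta*_1).
or_marginal = odds(pA(1, 0)) / odds(pA(0, 0))

# The conditional odds ratio exp(b1) is not recovered: or_marginal sits
# strictly between 1 and exp(b1), illustrating non-collapsibility.
```

Because the attenuation depends on the magnitude of Gamma, a sensitivity parameter stated on the conditional scale of the full model is not directly comparable with coefficients from the model that omits $U$.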