Bayesian Methods for Multiple Mediators: Relating Principal Stratification and Causal Mediation in the Analysis of Power Plant Emission Controls
Chanmin Kim, Michael Daniels, Joseph Hogan, Christine Choirat, Corwin Zigler
BBayesian Methods for Multiple Mediators: Relating PrincipalStratification and Causal Mediation in the Analysis of Power PlantEmission Controls
Chanmin Kim , Michael J. Daniels , Joseph W. Hogan , Christine Choirat and Corwin M. Zigler Department of Biostatistics, Boston University School of Public Health Department of Statistics, University of Florida Department of Biostatistics, Brown University Swiss Data Science Center Department of Statistics and Data Sciences, University of Texas at Austin
Abstract
Emission control technologies installed on power plants are a key feature of many air pollu-tion regulations in the US. While such regulations are predicated on the presumed relationshipsbetween emissions, ambient air pollution, and human health, many of these relationships havenever been empirically verified. The goal of this paper is to develop new statistical meth-ods to quantify these relationships. We frame this problem as one of mediation analysis toevaluate the extent to which the effect of a particular control technology on ambient pollutionis mediated through causal effects on power plant emissions. Since power plants emit var-ious compounds that contribute to ambient pollution, we develop new methods for multipleintermediate variables that are measured contemporaneously, may interact with one another,and may exhibit joint mediating effects. Specifically, we propose new methods leveragingtwo related frameworks for causal inference in the presence of mediating variables: princi-pal stratification and causal mediation analysis. We define principal effects based on multiplemediators, and also introduce a new decomposition of the total effect of an intervention on am-bient pollution into the natural direct effect and natural indirect effects for all combinations ofmediators. Both approaches are anchored to the same observed-data models, which we spec-ify with Bayesian nonparametric techniques. We provide assumptions for estimating principalcausal effects, then augment these with an additional assumption required for causal mediationanalysis. The two analyses, interpreted in tandem, provide the first empirical investigation ofthe presumed causal pathways that motivate important air quality regulatory policies.
Keywords:
Bayesian nonparametrics, Gaussian copula, Natural indirect effect, Multi-Pollutants,Ambient PM . , 1 a r X i v : . [ s t a t . M E ] F e b Introduction
Motivated by a evidence of the association between ambient air pollution and human health out-comes, the US Environmental Protection Agency (EPA) oversees a vast program for air qualitymanagement designed to limit population exposure to harmful air pollution (Pope III et al. 2009,Dominici et al. 2014). Fine particulate matter of diameter 2.5 micrometers or less (PM . ) is ofparticular importance, with regulations to limit exposure to PM . estimated to account for overhalf of the benefits and a substantial portion of the costs of all monetized federal regulations (Officeof Management and Budget 2013). A large contributor to ambient PM . in the US is the powergenerating sector, in particular coal-fired power plants. These plants emit PM . directly into theatmosphere, but are also major sources of sulfur dioxide (SO ) and nitrogen oxides (NO x ) that, onceemitted into the atmosphere, contribute to secondary formation of PM . through chemical reac-tion, coagulation and other mechanisms. The amount PM . formation initiated by emissions ofSO and NO x depends largely on atmospheric conditions such as temperature (Hodan & Barnard2004). Power plants are also major sources of CO emissions.A variety of regulatory programs under the purview of the Clean Air Act (e.g., the Acid RainProgram) are designed to reduce emissions from power plants, with one goal of reducing populationexposure to ambient PM . . One key strategy for achieving this reduction is the installation of SO control technologies such as flue-gas desulfurization scrubbers (henceforth, “scrubbers”), on powerplant smokestacks to reduce SO emissions and, in turn PM . . Estimates of the annualized humanhealth benefits of regulatory polices such as the Acid Rain Program rely heavily on presumedrelationships between such control strategies, emissions, ambient PM . , and human health. Whilethe underlying physical and chemical understanding of the link between power plant emissions andPM . is well established, there remains considerable uncertainty about the effectiveness of specificstrategies for reducing harmful pollution amid the realities of actual regulatory implementation.Accordingly, the EPA and other stakeholders have increasingly emphasized the need to provideevidence of which specific air pollution control strategies are most effective or efficient for reducingpopulation exposures to PM . (HEI Accountability Working Group 2003, U.S. EPA 2013).The goal of this paper is to deploy newly-developed statistical methods to examine the causaleffect of scrubbers installed at coal-fired power plants on the ambient concentration of ambientPM . using observed data on power plant emissions and ambient pollution. Physical and chemicalunderstanding of these processes provide strong support for the expectation that scrubbers reduceambient PM . “through” reducing emissions of SO , but this relationship has never been empiri-cally verified using observed data in the context of regulations that may simultaneously impact avariety of factors. A key statistical challenge to verifying this relationship derives from the factthat SO emissions are highly correlated with emissions of NO x and CO and NO x is known toplay an important role in the formation of ambient PM . , possibly through interactions with SO .Thus, the question will be formally framed as one of mediation analysis: To what extent is thecausal effect of a scrubber (the “treatment”) on ambient PM . (the “outcome”) mediated throughreduced emissions of SO , NO x and CO (the “mediators”)? Recovering a statistical answer tothis question amid the problem of multiple highly correlated and possibly interacting mediators2hat are measured contemporaneously requires new methods development and would also serve tobolster the promise of statistical methods in studies of air pollution that have historically relied onphysical and chemical knowledge and not on statistical analysis.To answer this question, we develop new methods that draw from two frameworks for estimat-ing causal effects in the presence of mediating variables: (1) principal stratification (Frangakis &Rubin 2002) and (2) causal mediation analysis (Robins & Greenland 1992). The methodologicalcontributions of this paper come in three areas. First, we develop new methods to accommodatemultivariate mediating variables that are measured contemporaneously (not sequentially), are cor-related, and may interact with each to impact the outcome (see Figure 1. for a an illustrativedirected acyclic graph). This is essential for evaluating scrubbers because power plants simultane-ously emit multiple pollutants that may interact through atmospheric processes to impact ambientPM . . Existing methods in the literature for both principal stratification and mediation analysishave primarily focused on settings with a single mediator (e.g., Baron & Kenny (1986), Frangakis& Rubin (2002), VanderWeele (2009), Joffe & Greene (2009), Daniels et al. (2012)) and existingextensions to cases with multiple mediating variables cannot accommodate the setting of powerplant emissions where mediators may simultaneously and jointly impact the outcome (Wang et al.2013, Imai & Yamamoto 2013, VanderWeele & Vansteelandt 2014, Daniel et al. 2015). Our sec-ond methodological contribution is the use of Bayesian nonparametric approaches to model theobserved distribution of emissions and pollution outcomes, making use of a multivariate Gaussiancopula model to link flexibly-modeled marginal distributions of observed outcomes to a joint distri-bution of potential outcomes. Similar strategies with a single mediator have received recent atten-tion in the principal stratification literature (Bartolucci & Grilli (2011), Ma et al. (2011), Schwartzet al. (2011), Conlon et al. (2014)) and are emerging for causal mediation analysis (Daniels et al.2012, Kim et al. 2016). These approaches are important for confronting continuous mediators andinfinitely many principal strata, and are deployed here in a novel way to address the problem ofmultiple mediators while flexibly modeling the observed-data distributions of both mediators andoutcomes. Finally, we provide a unification of principal stratification and causal mediation analysis.While the mathematical relationships between these two approaches are well understood (Mealli& Rubin 2003, VanderWeele 2011, Mattei & Mealli 2011), there has not been, to our knowledge, acomprehensive deployment of both perspectives in a complementary fashion to illuminate the sci-entific underpinnings of a specific problem. Baccini et al. (2015) made important progress in thisdirection using different observed-data models to estimate principal effects and mediation effectsin a problem with a single mediator. In contrast, the approach developed here uses the exact sameobserved-data models to ground both perspectives, proposes a common set of basic assumptionsfor estimating both principal effects and mediating effects, modularizes an additional assumptionrequired to augment a principal stratification analysis in order to obtain estimates of natural di-rect and indirect effects, and considers settings with multiple mediating variables. Ultimately, weprovide a new dimension of quantitative, statistical evidence for supporting air policy regulatorydecisions. 3 Scrubber Installation and Linked Data Sources
Title IV of the Clean Air Act established the Acid Rain Program (ARP), which required majoremissions reductions of SO (and other emissions) by ten million tons relative to 1980 levels. Thisreduction was achieved mostly through cutting emissions from power plants, or more formally,electricity-generating units (EGUs). Impacts of the ARP have been evaluated extensively, andthe program is generally lauded as a success due to marked national decreases in SO and NO x coming at relatively low cost. Estimates of the annualized human health benefits of the entire ARPrange from $50 billion to $100 billion (Chestnut & Mills 2005), but rely heavily on presumedrelationships between power plant emissions, ambient PM . , and human health.While power plants under the ARP had latitude to elect a variety of strategies to reduce emis-sions, one key strategy is the installation of a scrubber to reduce SO emissions. The precise extentto which installation of a scrubber reduces ambient PM . through reducing SO emissions re-mains unknown, and has never been estimated empirically amid the realities of actual regulatoryimplementation where pollution controls may impact a variety of factors that are also related to theformation of PM . . Knowledge of these relationships is complicated by the fact that power plantsemit more than just SO , and emissions of a variety of pollutants likely interact in the surroundingatmosphere to form ambient PM . .To provide refined evidence of the extent to which scrubbers reduce emissions and cause im-provements to ambient air quality, we assembled a national database of ambient air quality mea-sures, weather conditions, and information on power plants. Specifically, we assembled data on258 coal-fired power plants from the EPA Air Markets Program Data and the Energy InformationAdministration, with information on plant characteristics, emissions control technologies installed(if any), and emissions of SO , NO x , and CO during 2005, five years after promulgation of animportant phase of regulations under the Acid Rain Program. For each power plant, we augmentthe data set with annual average ambient PM . concentrations in 2005 and baseline meteorologicconditions in 2004 measured at all monitoring stations in the EPA Air Quality System that are lo-cated within 150km. The 150km range was chosen both to acknowledge that atmospheric processescarry power plant emissions across distances at least this great, but also to minimize the number ofmonitoring stations considered within range of more than one power plant. We regard any powerplant as “treated” with scrubbers in 2005 if at least 10% of the plant’s total heat input was attributedto a portion of the plant equipped with a scrubber as of January 2005. Note that this proportion wasnearly 0% or nearly 100% for the vast majority of plants, indicating robustness to this 10% cutoff.Other power plant characteristics are listed in Table 1. The data files and programs to assemble theanalysis data set are available at https://dataverse.harvard.edu/dataverse/mmediators and https://github.com/lit777/MultipleMediators , respectively.4able 1: Summary statistics for covariates and outcomes available for the analysis of SO scrub-bers. Have scrubbers (n=59) Have no scrubber (n=190)Median IQR Median IQRMonitor DataAverage Ambient PM . µg / m ) 12.4 (7.8, 14.8) 13.7 (11.8, 15.2)Average Temperature 2004 ( ◦ C) 11.5 (10.1, 15.0) 12.8 (10.4, 16.1)Average Barometric Pressure 2004 ( mmHg ) 737.8 (686.7, 752.4) 746.1 (739.1, 755.6)Power Plant Level DataTotal SO Emission 2005 (tons) 644.3 (257.3, 1819.9) 1267.1 (504.9, 2707.6)Total NO x Emission 2005 (tons) 852.1 (394.2, 1531.3) 442.5 (193.7, 878.2)Total CO Emission 2005 (1000 tons) 505.3 (232.5, 960.7) 283.6 (117.7, 559.0)Unit Level DataAverage Heat Input 2004 (1000 MMBtu) 4653.3 (2266.4, 9363.9) 2783.4 (1147.6, 5448.1)Total Operating Time 2004 (hours × x Controls 2004 ( × To fix ideas, consider the single mediator case. Let Z i ∈ { , } indicate the presence of the inter-vention of interest, here, whether power plant i has installed scrubbers in January 2005 ( Z i =
1) andlet Z = ( Z , · · · , Z n ) be the vector of intervention indicators for power plants i = , · · · , n . Usingpotential-outcomes notation (Rubin 1974), let M i ( Z ) denote the potential emissions that the i -thpower plant would be generated under the vector of scrubber assignments Z , and let Y i ( Z ; M ) denote the potential ambient PM . outcome that could, in principle, be defined for any scrubberassignment vector Z and any vector of intermediate emissions values M . Throughout the paper,we adopt the stable unit treatment value assumption (SUTVA; Rubin 1980) which implies 1) thereis no “interference” in the sense that potential intermediate and outcome values from power plant i do not depend on scrubber treatments and emissions intermediates of other power plants (i.e, M i ( Z ) = M i ( Z i ) and Y i ( Z ; M ) = Y i ( Z i ; M i ) ) and 2) there are “no multiple versions” of scrubbertreatments such that whenever Z i = Z (cid:48) i , M i ( Z i ) = M i ( Z (cid:48) i ) and Y i ( Z i ; M i ( Z i )) = Y i ( Z (cid:48) i , M i ( Z (cid:48) i )) . Forreasons that will become clear later, we augment the standard SUTVA to also assume “no mul-tiple versions” of emissions intermediates which states, if M i = M (cid:48) i , then Y i ( Z i ; M i ) = Y i ( Z i ; M (cid:48) i ) (Forastiere et al. 2016). We revisit possible violations of SUTVA in Section 8, but note here thatthe linkage of power plants to monitors within 150km provides some justification for this assump-tion. 5he natural direct effect (Pearl 2001) is defined by NDE = E [ Y i ( M i ( )) − Y i ( M i ( ))] , rep-resenting the effect of the intervention obtained when setting the mediator to its ‘natural’ value M i ( ) ; i.e., its realization in the absence of the intervention. The natural indirect effect is defined asNIE = E [ Y i ( M i ( )) − Y i ( M i ( ))] , representing the effect of holding the intervention status fixedat Z = M ( ) to M ( ) . The total causal effect of the in-tervention on the outcome can then be defined as TE = NDE + NIE = E [ Y i ( M i ( )) − Y i ( M i ( ))] .Similar controlled effects could also be defined to represent causal effects at specific values of M (Pearl 2001, Robins & Greenland 1992).Implicit in the definition of these effects is the conceptualization of hypothetical interventionsthat could independently manipulate values of both Z and M to, for example, “block” the effect onthe mediator. Thus, it is important to note that potential outcomes of the form Y i ( Z i ; M i ( Z (cid:48) i )) arepurely hypothetical for Z i (cid:54) = Z (cid:48) i , and can never be observed for any observational unit. Such unob-servable potential outcomes have been referred to as a priori counterfactuals (Robins & Greenland1992, Rubin 2004). We revisit conceptualization of a priori counterfactuals in the context of thepower plant study in Section 4.1, but note here the distinction between a priori counterfactuals andpotential outcomes of the form Y i ( Z i ; M i ( Z i )) that are observable and actually observed for someunits. A distinct but related framework for defining causal effects in the presence of intermediate variablesis principal stratification (Frangakis & Rubin 2002). Continuing with the single-mediator case,principal stratification considers only a single intervention and relies on definition of two causaleffects: the effect of Z i on M i , defined as M i ( ) − M i ( ) , and the effect of Z i on Y i , defined as Y i ( M i ( )) − Y i ( M i ( )) . The objective is to estimate principal effects , which are average causaleffects of Z i on Y i within principal strata of the population defined by ( M i ( ) , M i ( )) .With principal stratification, dissociative effects are defined to quantify the extent to which theintervention causally affects outcomes when the intervention does not causally affect the mediator,for example, E [ Y i ( M i ( )) − Y i ( M i ( )) | M i ( ) = M i ( )] . Dissociative effects are similar to directeffects in a mediation analysis in that they represent causal effects of an intervention on the outcomeamong the subpopulation where there is no causal effect on the mediator, but they refer only to thespecific subpopulation with M ( ) = M ( ) . VanderWeele (2008) and Mealli & Mattei (2012) showthat dissociative effects represent a quantity that is only one contributor to the NDE, with theamount of contribution tied to the size of the subpopulation with M ( ) = M ( ) . Associative effects are defined to quantify the causal effect of the intervention on the out-come among those for which the intervention does causally affect the mediator, for example, E [ Y i ( M i ( )) − Y i ( M i ( )) | M i ( ) < M i ( )] . An associative effect that is large in magnitude rel-ative to the dissociative effect indicates that the causal effect of the intervention on the outcome isgreater among those for which the mediator is causally affected, compared to those for which themediator is not affected. This could be interpreted as suggestive of a causal pathway whereby theintervention impacts the outcome through changing the mediator, but note that associate effects are6enerally a combination of the NDE and NIE for a defined subpopulation.Dissociative effects that are similar in magnitude to associative effects indicate that the in-tervention effect on the outcome is similar among observations that do and do not exhibit causaleffects on the mediator, which could be interpreted as suggestive of other causal pathways throughwhich Z i affects Y i .A primary distinction between principal stratification and causal mediation analysis is that prin-cipal effects only pertain to population subgroups comprised of observations with particular valuesof ( M i ( ) , M i ( )) , whereas natural direct and indirect effects are defined for the whole population(as discussed in detail in Mealli & Mattei (2012)). Importantly, note that the a priori counterfactu-als of the form Y i ( Z i , M i ( Z (cid:48) i )) for Z i (cid:54) = Z (cid:48) i do not appear in the definition of principal effects, whichrely only on the definition of observable potential outcomes Y i ( Z i , M i ( Z i )) . Thus, there is no con-ception in principal stratification of a hypothetical intervention acting on M i independently from Z i ,and there is no definition of a causal effect of Z i on Y i that is mediated through M i . From a modelingperspective, principal effects can be estimated when an outcome model is specified conditional onboth potential mediators (intermediate outcomes), M i ( ) and M i ( ) while causal mediation analy-sis has tended to rely on an outcome model that depends on the observed mediator. The differencesin modeling strategies that are typically employed in principal stratification and causal mediationanalysis complicate comparisons, as results of such analyses have typically been driven in part bydifferent modeling assumptions. In Section 5, we will propose a new set of assumptions to build acommon observed-data model for principal stratification and causal mediation analysis. Extensions of the causal mediation ideas outlined in Section 3.1 to settings of multiple mediatingvariables are emerging. For contemporaneously observed mediators, straightforward extensionsof the Baron and Kenny (1986) regression-based structural equation model approach (MacKinnon2008) have been proposed. For each of K contemporaneous mediators ( M , M , · · · , M K ) , a seriesof regression models is used to estimate mediator-specific NIEs in a manner that implies additivityof indirect effects: JNIE = K ∑ k = NIE k and TE = NDE + JNIE , (1)where JNIE is used to denote the joint natural indirect effect due to changes in all K mediators, andNIE k = E [ Y i ( M k , i ( )) − Y i ( M k , i ( ))] represents the natural indirect effect of the k -th mediator.These approaches assume that each M k , i mediates the treatment effect independently of the othermediators, without interactions among mediators (i.e., the mediators are causally independent or parallel ). Figure 1.a without dashed lines illustrates this case. Wang et al. (2013) propose an al-ternative modeling approach under the setting of causally independent mediators. If the mediatorsinteract with each other in terms of their impact on the outcome, then additivity of indirect effects asin the above cannot hold; and estimation of multivariate mediated effects can then be further com-plicated by correlations among the mediators. Dependence among mediators has been consideredwhen M k are observed sequentially (i.e., sequential mediators; Figure 1.b), as in Imai & Yamamoto7igure 1: Directed Acyclic Graphs : a) contemporaneous mediators with interactions (our case)and b) sequentially ordered mediators.(2013). Albert & Nelson (2011), and Daniel et al. (2015) propose approaches for either sequentiallydependent mediators or mediators that do not affect nor interact with each other. These approachesoffer a decomposition of the JNIE in the case of sequential dependence, and assume additivityof natural indirect effects otherwise. VanderWeele & Vansteelandt (2014) discuss an approach todecompose the JNIE further when the mediators simultaneously affect each other; however, theirapproach does not evaluate the impact of each individual mediator (see Section 4.3).Taguri et al.(2015) propose an approach for contemporaneous, non-ordered mediators, but rely on an assump-tion that the mediators are conditionally independent given observed covariates, which does notfully represent the possibility of contemporaneous interactions among the mediators, as may be thecase with multiple emissions (in particular SO and NO x ) and the formation of ambient PM . .Section 6 examines the possibility of contemporaneous interactions among (possibly correlated)mediators in the context of the scrubber study.In summary, existing methods for multiple mediators rely on either assumed causal indepen-dence of (parallel) mediators and additivity of indirect effects, sequential dependence of media-tors, or on restrictive assumptions of conditional independence among mediators. VanderWeele &Vansteelandt (2014) point out that if there are interactions between the effects of (non-sequential)multiple mediators on the outcome, the joint indirect effect may not be the sum of all three indirecteffects. They note that, in principle, an analysis could proceed with an outcome model includinginteractions M j M k for all { j , k } combinations combined with models for ( E ( M j , M k ) ). However,this approach would lead to issues of model compatibility between the models for M j and M k andthat for the product M j M k . The lack of satisfactory methods for more general settings of multiplecontemporaneously-measured mediators motivates the methods developed herein, where we offera new decomposition of the joint natural indirect effect into individual indirect effects that may not8ffect the outcome additively. Suppressing the i subscript indexing power plants, let { M k ( z ) ; k = , . . . , K } denote the potentialemissions of K pollutants that would occur if a power plant were to have scrubber status Z = z ,for z = ,
1. While much of our development is general for any K , we focus on the case K = M k ( z ) , k = , , , NO x , and CO , respectively. Thecausal effect of the scrubber on emission k can then be defined as a comparison between M k ( ) and M k ( ) . Let M ( z , z , z ) ≡ { M ( z ) , M ( z ) , M ( z ) } denote potential emissions under a set ofthree scrubber statuses { z , z , z } .We similarly define potential PM . outcomes, but extend the notation to define potential con-centrations under different values of scrubber status, Z , and different possible values of emissions, M ( z , z , z ) . Thus, in full generality, each power plant has a set of 2 K + =
16 potential outcomesfor PM . , Y ( z ; M ( z , z , z )) , which denote potential values of PM . that would be observed underintervention Z = z with pollutant emissions set at values under interventions z , z , z . Definition ofall 16 potential PM . concentrations is required for definition of natural direct and indirect effectsand entails a priori counterfactuals. For example, Y ( M ( , , )) would represent the potentialambient PM . concentration near a plant under the hypothetical scenario where the plant installsa scrubber ( z = and NO x are set to what they would be withoutthe scrubber ( z = z =
0) and emissions of CO are set to what they would be with the scrubber( z = x control, thus “blocking” the intervention and maintaining SO and NO x emissions at levels that would have occurred without the SO technology. Principal stratificationwill only rely on potential outcomes with z = z = z = z that are observable from the data, suchas M ( , , ) and Y ( M ( , , )) observed for any power plant that installs a scrubber. Finally, let X denote a vector of baseline covariates measured at the power plant or the surrounding area. Extending principal stratification to settings where the intermediate variable is multivariate is con-ceptually straightforward. Principal stratification defines a principal stratum for every combinationof the joint vector ( M ( , , ) , M ( , , )) , and principal causal effects are defined as comparisonsbetween Y ( M ( , , )) and Y ( M ( , , )) within principal strata.For any subset K ⊆ { , , } , let | M ( , , ) − M ( , , ) | K denote the element-wise absolutedifferences between emissions of the subset of pollutants in K , e.g., | M ( , , ) − M ( , , ) | K = { , } = | M ( ) − M ( ) | , | M ( ) − M ( ) |} . Definitions of quantities such as average associative and dis-sociative effects can proceed following Zigler et al. (2012) by defining: EDE K = E [ Y ( M ( , , )) − Y ( M ( , , )) (cid:12)(cid:12) | ( M ( , , ) − M ( , , )) | K < C D K ] , EAE K = E [ Y ( M ( , , )) − Y ( M ( , , )) (cid:12)(cid:12) | ( M ( , , ) − M ( , , )) | K > C A K ] , where C A K denotes a vector of thresholds beyond which a change in each emission in K is consid-ered meaningful, C D K is a vector of thresholds below which changes in these emissions are consid-ered not meaningful, and > and < represent element-wise comparisons. Note that the dissociateeffect is now defined on principal strata where potential changes (or differences) in the interme-diate variables are less than some vector of thresholds | ( M ( , , ) − M ( , , )) | K < C D K insteadof principal stratum with strict equality | ( M ( , , ) − M ( , , )) | K = { , , } K to accommodatecontinuous intermediate values. For example, K = { , } would be used to define the associative(dissociative) effect in the subpopulation exhibiting an effect on SO and CO in excess of C A K (below C D K ), without regard to the effect on NO x . For the data analysis in Section 7, we divide theEAE defined above into two parts: EAE + K will denote the average associative effects among powerplants where all emissions in K are causally increased in excess of C A K , while EAE − K will denotethe average associative effect in power plants where all emissions in K were causally reduced inexcess of C A K . Note that these summary quantities only consider a subset of principal strata thatmay be of interest. For example, analogous average principal effects could be calculated amongstrata where some emissions are decreased and others are increased. We avoid burdensome nota-tion for such summaries, but will revisit estimates in additional principal strata in the context of thedata analysis in Section 7.In addition to estimating average dissociative and associative effects for different K as definedabove, interest may lie in entire surfaces of, for example, how the causal effect on PM . varies asa function of the causal effect on each emission (“causal effect predictiveness” surface (Gilbert &Hudgens 2008)). a priori Counterfactual Outcomes: Natural Direct and IndirectEffects for Multiple Mediators
Extending definitions of natural direct and indirect effects to the multiple mediator setting is some-what more complicated. The natural direct effect is defined as NDE = E [ Y ( M ( , · · · , )) − Y ( M ( , · · · , ))] , representing the causal effect of Z on Y that is “direct” in the sense that itis not attributable to changes in any of the K emissions. The joint natural indirect effect of all K mediators, JNIE ··· K , is derived by subtracting the natural direct effect from the total effect,JNIE ··· K = TE − NDE = E [ Y ( M ( , , · · · , )) − Y ( M ( , , · · · , ))] .In addition to JNIE ··· K , we introduce a decomposition into the natural indirect effects at-tributable to changes in different combinations of the K mediators. Maintaining focus on the casewhere K =
3, the JNIE can be decomposed into emission-specific indirect effects and the jointindirect effects of all possible pairs of emissions. See Figure 5 in the Web Appendix for a graphicalrepresentation. 10e define the mediator-specific
NIE for the k -th emission as a comparison between the poten-tial PM . outcome under scrubbers and the analogous outcome with the value of the k -th emissionfixed to the natural potential value that would be observed without scrubbers. Specifically, foremissions of SO , NO x , and CO define:NIE = E [ Y ( M ( , , )) − Y ( M ( , , ))] , NIE = E [ Y ( M ( , , )) − Y ( M ( , , ))] , (2)NIE = E [ Y ( M ( , , )) − Y ( M ( , , ))] . In a similar fashion we can define the joint natural indirect effect attributable to subsets ofmediators j and k for j (cid:54) = k as differences between the observable potential PM . outcomes underscrubbers and the analogous a priori counterfactual with values of pollutants j and k set to theirnatural values that would be observed without scrubbers. For example, JNIE defines the jointnatural indirect effects of mediators 1 (SO ) and 2 (NO x ) asJNIE = E [ Y ( M ( , , )) − Y ( M ( , , ))] . Values of JNIE jk for other pairs of mediators can be defined analogously, and all such pairs corre-spond to the second row in Figure 5 in the Web Appendix. Note that the joint natural indirect effectof each pair of mediators is not equal to the sum of corresponding mediator-specific NIEs unlessthere is no overlap between mediator-specific NIEs (additivity). For example, we can represent therelationship between JNIE and the mediator-specific effects NIE and NIE as ( NIE + NIE ) − JNIE = E [ Y ( M ( , , )) − Y ( M ( , , )) − Y ( M ( , , )) + Y ( M ( , , ))] . Thus, if this quantity is not equal to 0, we argue that additivity of mediator-specific NIEs doesnot hold. Note that the above decomposition of JNIE differs from VanderWeele & Vansteelandt(2014), which considers the portion of the JNIE mediated through M , then sequentially con-siders the additional contribution of each mediator in the presence of the others. This presumedordering of mediators precludes estimation of the effect through different pairs of mediators suchas JNIE or JNIE , the availability of which is a benefit of our proposed decomposition. Ourdecomposition also differs from Daniel et al. (2015) who only allow interacting overlap betweenmediator-specific NIEs when one mediator causally affects another.Note that alternative definitions of NIE could use contrasts of the form: NIE ∗ = E [ Y ( M ( , , )) − Y ( M ( , , ))] . Such a strategy is also considered in Daniel et al. (2015), but defining NIE ∗ k inthis way would rely entirely on a priori counterfactuals, whereas a benefit of using the definitionsin (2) is that each definition uses the observable potential outcome Y ( M ( , , )) , comparingagainst only one a priori counterfactual (e.g., Y ( M ( , , )) ). Under the assumptions developed in this section, Bayesian inference for the causal effects de-fined in Section 4 follows from specifying models for the joint distribution of all potential media-11ors (conditional on covariates) and the outcome model conditional on all potential mediators andcovariates, and prior distributions for unknown parameters. Posterior distributions cannot be com-puted directly from observed data because potential outcomes are never jointly observed in both thepresence and absence of a scrubber and a priori counterfactuals are never observed. Our estimationstrategy consists of three steps. First, we specify nonparametric models for the observed data. Themarginal distribution of each observed mediator (i.e., M ( , , ) = { M ( ) , M ( ) , M ( ) } observedfor power plants that did not install scrubbers and M ( , , ) , = { M ( ) , M ( ) , M ( ) } observedfor those that did) is specified separately and then linked into a coherent joint distribution usinga Gaussian copula model (Nelsen 1999). The models for the potential outcomes Y ( M ( , , )) and Y ( M ( , , )) are specified conditional on covariates and all potential mediators ( M ( , , ) and M ( , , ) ) that are never observed simultaneously. Thus, the conditional outcome models areestimated via the data augmentation for unobserved potential mediators. Second, we introduce twoassumptions for estimating the TE and the associative and dissociative effects. Third, we employan additional assumption to equate the distributions of a priori counterfactuals to those of the ob-served potential outcomes under intervention Z = F X ( x ) ,using the empirical distribution. We specify Dirichlet process mixtures for the marginal distribution of each mediator (M¨uller et al.1996). For each intervention z = , k = , , X = x , the conditionaldistribution of the k -th observed mediator is specified as M k , i | Z i = z , X i = x i ∼ N ( β zk , i + x (cid:62) i β zk , τ zk , i ) , M k , i ≥ i = , · · · , n z β zk , i , τ zk , i ∼ F zk , F zk ∼ DP ( λ zk , F zk ) , where { i = , , . . . , n z } denotes the observations with Z = z and k indicates the k -th mediator. Webound the mediator from below (0) using a truncated normal kernel (within the interval [0, ∞ )). β zk , i and τ zk , i denote the intercept and precision parameters for the k -th emission at the i -th powerplant that received intervention z . Here, DP denotes the Dirichlet process with two parameters, amass parameter ( λ zk ) and a base measure ( F zk ). To not overly complicate the model we only ‘mixed’over the intercept and precision parameters in the conditional distributions, β zk , i and τ zk , i . The basedistribution F zk is taken to be the normal-Gamma distribution, N ( µ zk , S zk ) G ( a zk , b zk ) . Details includinghyper prior specification are given in Section A of the Web Appendix.The marginal distributions of each mediator under each z = , [ M , M , M | Z = z , X = x ] with Gaussian copula models of the form: F M ( z , z , z ) ( m z , z , z ) = Φ [ Φ − { F M ( z ) ( m ) } , Φ − { F M ( z ) ( m ) } , Φ − { F M ( z ) ( m ) } ] , m z , z , z are values of potential mediators under intervention Z = z and Φ k is the k -variatestandard normal CDF. Note that we elect to model the marginal distribution of each univariate ran-dom variable separately, and then combine with the Gaussian copula model, rather than directlymodel the joint distributions of [ M , M , M | Z = z , X = x ] . Thus, we allow full flexibility usingDP mixtures of (truncated) normals for the marginal distributions (the fit of which can be checkedempirically) and use the Gaussian copula to link them to construct the joint distribution of potentialmediators. The Gaussian copula model implies some (correlation) structure to the joint distributionof all observable potential outcomes, without implying any specific causal structure. Flexibility ofthis structure derives from the fact that each marginal distribution is modeled as nonparametric withinfinite dimensional parameter spaces. The strategy is designed to coalesce with the modeling strat-egy in Section 5.2. Note that other potential alternatives to link the fixed marginal distributions suchas mixtures of marginals (e.g. H ( x , x ) = pF ( x ) + ( − p ) G ( x ) or H ( x , x ) = (cid:112) F ( x ) G ( x ) ) donot specify the full joint distribution distribution of ( x , y ) (Nelsen 1999)) and our method does notlimit the number of the mediators in general. While the joint distribution of all potential mediators( M ( , , ) and M ( , , ) ) is also modeled via the same Gaussian copula model, this entails mod-eling unobserved potential mediators and will be discussed as a part of the assumptions in Section5.2 To model the distributions of the potential outcomes for each z = , z = ,
1, potential values ofall (counterfactual) mediators and baseline covariates X = x , the conditional distribution of theobserved outcome y i is specified as f ( y i | m i ( , , ) , m i ( , , ) , x i , Z i = z ) = ∞ ∑ l = ω zl N ( y i , m i ( , , ) , m i ( , , ) , x i | µ zl , Σ zl ) where ω zl = γ zl / ( ∑ ∞ j = γ zj N ( m i ( , , ) , m i ( , , ) , x i | µ zj , \ , Σ zj , ( \ , \ ) )) and µ zj , \ denotes all elementsof mean parameters µ zj except for Y i . Similarly, Σ zj , ( \ , \ ) denotes a submatrix of covariance matrix Σ zj formed by deleting the the first row and the first column. The weight involves the parameter γ zj where γ zj = γ (cid:48) , zj ∏ h < j ( − γ (cid:48) , zh ) and γ (cid:48) , zj ∼ Beta ( , α z ) . This flexible conditional model specificationis a necessary feature in our case since we allow the outcome model to capture nonlinear and/orinteraction effects of the mediators. Note again that this outcome model is conditional on all po-tential mediators { M ( , , ) , M ( , , ) } which cannot be observed at the same time. We use asimilar approach to that used in Schwartz et al. (2011) to model the observed outcome distributionconditional on partly missing potential intermediate variables by constructing complete intermedi-ate data . Here, we impute unobserved potential mediators for each unit with a data-augmentationapproach based on the joint distribution of all potential mediators specified above. Details abouthyper prior specification and posterior computation are given in the Web Appendix.13 .2 Assumptions for Estimation of Causal Effects To estimate causal effects based on the model for the observed data specified in Section 5.1, we for-mulate assumptions relating observed quantities to both observable outcomes and a priori counter-factuals. Denote the conditional distribution [ Y ( z ; M ( z , z , z )) | M ( , , ) = m , , , M ( , , ) = m , , , X = x ] with f z , M ( z , z , z ) ( y | m , , , m , , , x ) where m z , z , z is a vector of hypothetical val-ues of the mediators under the interventions z , z , z . The conditional distribution [ M ( z , z , z ) | X = x ] is denoted by f M ( z , z , z ) ( m z , z , z | x ) . Other conditional distributions are defined analogously,and we henceforth omit conditioning on covariates X = x to simplify notation. We begin with an ignorability assumption stating that, conditional on covariates, “assignment” toscrubbers is unrelated to the observable potential outcomes: Assumption 1 (Ignorable treatmentassignment) { Y ( z ; M ( z , z , z )) , M ( , , ) , M ( , , ) } ⊥⊥ Z | X = x , for z = ,
1. This assumption permits estimation of the distributions of potential outcomes underintervention Z = z with observed data on ambient PM . and emissions under the same intervention.We adopt a Gaussian copula model to link the distributions of ( M ( z ) , M ( z ) , M ( z )) for z = , F M ( , , ) , M ( , , ) ( m , , , m , , ) = Φ [ Φ − { F M ( ) ( m ) } , Φ − { F M ( ) ( m ) } , Φ − { F M ( ) ( m ) } , Φ − { F M ( ) ( m ) } , Φ − { F M ( ) ( m ) } , Φ − { F M ( ) ( m ) } ] where Φ is the multivariate normal CDF with mean and a correlation matrix R .Assumption 2 implies a joint distribution of all observable potential mediators in a mannerconsistent with the models for [ M , M , M | Z = z , X = x ] described in Section 5.1. However, thisentire joint distribution of potential mediators under both interventions is not fully identified fromthe data since potential mediators under different interventions are never jointly observed. Specif-ically, entries of the correlation matrix R corresponding to, for example, the correlation between M j ( ) and M k ( ) , are not identifiable in the sense that no amount of data can estimate uniquevalues for these parameters. Nonetheless, proper prior distributions for these parameters can stillpermit inference from proper posterior distributions. Such parameters are sometimes referred toas “partially identifiable” in the sense that increasing amounts of data may lead the supports ofposterior distributions to converge to sets of values that are smaller than those specified in the priordistribution (Gustafson 2010, Mealli & Pacini 2013). This can arise due to restrictions on the jointdistribution implied by the models for the marginal distributions (e.g., the positive-definiteness14estriction on R may exclude some possible values for its entries). We discuss two prior specifica-tions for the partially-identified parameters in R , noting that further details of partial identifiabilityin the principal stratification context appear in Schwartz et al. (2011). Towards estimation of natural direct and indirect effects, we augment the assumptions of Section5.2.1 with one relating observable outcomes to a priori counterfactual outcomes.Assumption 3: For intervention Z =
1, the conditional distribution of the potential outcome givenvalues of all potential mediators (and covariates) is the same regardless of whether the mediatorvalues were induced by Z = Z = a priori counterfactual Y ( M ( , , )) and the observablepotential outcomes Y ( M ( , , )) have the same conditional distribution, f , M ( , , ) ( y | M ( , , ) = m , M ( , , ) , x )= f , M ( , , ) ( y | M ( , , ) , M ( , , ) = m , x ) . This assumption also applies to any two mediators in the absence of the intervention. For in-stance, the a priori counterfactual of PM . , Y ( M ( , , )) , and Y ( M ( , , )) have the sameconditional distribution regardless of whether corresponding emissions values arose under a scrub-ber ( Z =
1) or absent a scrubber ( Z = f , M ( , , ) ( y | M ( , , ) = m , M ( , , ) , x )= f , M ( , , ) ( y | M ( , , ) , M ( , , ) = m , x ) . The key point is that the distribution of PM . under a given (unobservable) combination ofmediators ( m ) only depends on the values of the mediators and not the intervention that led tothose mediators. Asserting this assumption in this case relies in part on what is known aboutthe underlying chemistry relating SO , NO x , and CO emissions to PM . . Note that such anassumption may be more difficult to justify in, say, a clinical study where assumptions about apriori counterfactuals might pertain to choices of study participants.The above assumption can be cast as two homogeneity assumptions of the form proposed inForastiere et al. (2016). For example, one implication of Assumption 3 is that the a priori counter-factual Y ( M ( , , )) is homogeneous across all principal strata with M ( , , ) = m , regardlessof the value of M ( , , ) . Viewing Assumption 3 in terms of the implied homogeneity acrossprincipal strata aids interpretation and justification in the context of the power plant example. Ho-mogeneity across strata implies that the potential ambient air quality value in the area surroundinga power plant is related to (possibly counterfactual) emission levels only, and not to the power plantcharacteristics that govern effectiveness of scrubbers for reducing emissions (i.e., the power plantcharacteristics that determine the exact principal stratum membership). This underscores the im-portance of including covariates in X that capture characteristics of the monitoring locations (e.g.,temperature and barometric pressure). Appendix D provides details of the relationship between15ssumption 3 and assumptions of homogeneity across principal strata. While Assumption 3 im-plies homogeneity assumptions, the converse is not true in the case of multiple mediators due to theconnection of Assumption 3 to a priori counterfactuals defined to have mediator values induced bydifferent interventions (e.g., Y ( M ( , , ) ). We discuss a sensitivity analysis to this assumptionin Web Appendix J. With the above model specification, the partial identifiability of the model parameters in R war-rants careful attention. Proper but noninformative prior distributions for these parameters could bespecified marginally for these parameters as Unif ( − , ) , or equivalently, as conditionally uniformon intervals satisfying positive definiteness restrictions for the correlation matrix. In either case,posterior inference may exhibit large uncertainty.We consider in detail an alternative prior specification similar to that in Zigler et al. (2012)to sharpen posterior inference. Specifically, the correlations between mediators under differentinterventions are specified as follows: Cor ( M j ( ) , M k ( )) = Cor ( M j ( ) , M k ( )) + Cor ( M j ( ) , M k ( )) × ρ , for j , k = , , , with ρ a sensitivity parameter. This strategy implies that (a) the correlation between the same me-diator ( j = k ) under opposite interventions is ρ , and (b) the correlation between different mediators ( j (cid:54) = k ) under opposite interventions is an attenuated version of the correlation observed separatelyunder each intervention. Section B of the Web Appendix provides a correlation matrix impliedby this assumption in the case of 2 mediators. We assume a single ρ and specify a uniform priordistribution, ρ ∼ Unif ( , ) , but a different parameter could be specified for each mediator.As an additional assumption to sharpen posterior inference, we assume that the correlationsbetween emissions (mediators) are all positive. Support for this assumption comes from observed-data estimates of these conditional correlations that are all positive.In summary, assumptions 1-2 are sufficient to estimate the principal causal effects, and pertainonly to observable potential outcomes. Adding assumption 3 relating observed quantities to apriori counterfactuals permits estimation of direct and indirect effects for mediation analysis. Theoptional assumptions here in Section 5.2.3 are designed to sharpen posterior inference in the powerplant analysis. A Markov chain Monte Carlo (MCMC) algorithm is used to sample from this posterior distributionand estimate causal effects using the following steps: (1) sampling parameters from each marginaldistribution for potential mediators and conditional distribution for potential outcomes defined inSection 5.1; (2) sampling parameters from the correlation matrix R of the Gaussian copula; (3)sampling via data augmentation a priori counterfactual mediators from the joint distribution; (4)computing causal effects based on all potential mediators and outcomes including imputed a priori We examine the performance of the proposed model under combinations of the following twodata generating scenarios: (1) correlations among the mediators (Case 1: uncorrelated mediatorsvs. Case 2: correlated mediators) and (2) interaction terms between the mediators in the outcomemodel (Case A: interaction term between M and M vs. Case B: interaction terms between M and M , and between M and M ). Data sets of size n =
500 are simulated for each of the four cases(1/A, 1/B, 2/A, 2/B), each with three continuous confounders. In all cases, the three mediators aregenerated based on a multivariate normal distribution. See the Web Appendix (Section G) for theexact data generating mechanism.We compare our method for estimating mediation effects to a regression-based model(MacKinnon2008): M = α + α Z + X (cid:62) α + ε M = α + α Z + X (cid:62) α + ε M = α + α Z + X (cid:62) α + ε Y = β + β Z + β M + β M + β M + X (cid:62) β + ε Y where ε , ε , ε , and ε Y are all independently distributed as N ( , σ ) .Table 2 summarizes the results based on 400 replications for each of the four scenarios. Itshows that our proposed model (BNP) performs well in terms of bias and MSE for all cases. Notethat the true effects change when the mediators are correlated in the presence of interaction term(s)in the outcome model. Thus, with any interaction effects of the mediators, it is desirable to cap-ture the correlation structure of the mediators, which our method does by flexibly modeling thejoint distribution of all potential mediators. Also, the flexible Bayesian nonparametric model cancapture both complex relationships/interactions among the mediators and non-additive and non-linear forms of mediators and/or confounders in the outcome model. In each scenario, interac-tion terms in the outcome model introduce non-additivity in the joint natural indirect effect (e.g.,JNIE (cid:54) = NIE + NIE + NIE ) and the traditional regression model has larger biases (and largerMSEs) for mediation effects. 17able 2: Simulation results for point estimators of causal mediation and principal causal effectsover 400 replications. The columns correspond to bias and MSE relative to the true values of thecausal effects for each scenario (Cases 1 and 2, and Cases A and B) under two different models; Parametric : a regression based model for the causal mediation effects;
BNP : Our Bayesiannonparametric method.
Case 1 Case 2
BNP Parametric Truth BNP ParametricTruth Bias MSE Bias MSE Truth Bias MSE Bias MSE
Case A
TE 0.73 0.02 (0.09) -0.03 (0.08) 0.92 -0.04 (0.08) (0.33)JNIE 1.73 0.06 (0.11) (0.07) 1.92 0.04 (0.08) 0.02 (0.47)NDE -1 -0.04 (0.01) -0.25 (0.15) -1 -0.08 (0.01) -0.20 (0.08)NIE -0.16 0.00 (0.00) -0.01 (0.01) 0.03 -0.05 (0.00) -0.38 (0.26)NIE -0.39 (0.31)NIE -0.32 0.00 (0.00) -0.01 (0.01) -0.32 0.01 (0.00) -0.01 (0.01)JNIE (0.14) 2.23 0.03 (0.08) (0.44)JNIE -0.48 0.01 (0.01) -0.01 (0.01) -0.29 -0.04 (0.00) -0.38 (0.28)JNIE -0.39 (0.33) Case B
TE 1.08 -0.02 (0.10) -0.01 (0.08) 1.33 -0.09 (0.08) -0.01 (0.08)JNIE 2.08 -0.00 (0.10) (0.12) 2.33 -0.00 (0.08) -0.08 (0.11)NDE -1 -0.01 (0.00) -0.17 (0.04) -1 -0.09 (0.01) 0.08 (0.02)NIE -0.16 -0.01 (0.00) -0.01 (0.01) 0.03 -0.05 (0.01) -0.20 (0.04)NIE -0.25 (0.15)NIE -0.13 0.00 (0.00) 0.01 (0.01) -0.05 -0.02 (0.01) -0.08 (0.01)JNIE (0.16) 2.37 0.00 (0.08) 0.01 (0.10)JNIE -0.29 -0.00 (0.00) -0.01 (0.01) -0.02 -0.07 (0.01) -0.27 (0.08)JNIE -0.34 (0.21) Here we estimate causal effects of having scrubbers installed in January 2005 ( Z ) on annual averageemissions of SO , NO x , and CO in 2005 ( M , M , M ) and on the 2005 annual average ambientPM . concentration within 150km of a power plant ( Y ). Emissions are log-transformed. Beforereporting results, note that basic checks of the fit of marginal nonparametric models appear in WebAppendix I, indicating fit that is clearly superior to simple parametric models.A simple comparison of means indicates that the 150km area around power plants with scrub-bers installed ( Z =
1) had average ambient PM . that was lower, on average, than the areas sur-rounding power plants without scrubbers (12.4 vs. 13.7 µg / m ). Similarly, the power plants withscrubbers also emitted less SO , more NO x , and more CO than the plants without scrubbers. Table1 lists the covariates in X to adjust for confounding and presents summary statistics for scrubberand non-scrubber power plants.We present an analysis with the proposed method using the constrained prior specification18n Section 5.2.3. Analysis using uniform prior distributions on all elements of the correlationmatrix appears in the Web Appendix. All reported estimates are listed as posterior means (95%posterior intervals). The analysis estimates that having scrubbers installed causes SO emissionsto be -1.17 (-1.86, 1.55) 1000 tons lower, on average, than they would be without the scrubber.The analogous causal effects for NO x and CO emissions were 0.04 (0.00, 0.07) 1000 tons and0.001 (-0.00, 0.004) million tons, respectively, indicating that scrubbers did not significantly affectthese emissions, on average. The total effect (TE) of having scrubbers installed on ambient PM . within 150km is estimated to be -1.12 (-2.07, -0.29) µg / m , suggesting a reduction amounting toapproximately 10% of the national annual regulatory standard for PM . . For the k -th emission, let σ k denote the posterior standard deviation of the estimated individual-level causal effect of a scrubber on M k , with posterior mean estimates ˆ σ = .
24, ˆ σ = .
42, ˆ σ = .
02. Let ˆ σ K denote the vector of ˆ σ k for the emissions in K . To summarize dissociative effects, weset C D K = .
25 ˆ σ K to estimate EDE K among power plants where the scrubber effect on emissionsin K is within one-fourth of a standard deviation of the effect in the population. Similarly, wesummarize associative effects with C A K = .
25 ˆ σ K to estimate EAE − K (EAE + K ) among power plantswhere the scrubber causally reduces (increases) emissions in K more than one-fourth of a standarddeviation of the effect in the population.Before providing estimates of specific principal effects, we first examine 3-D surface plots inFigure 2. For each emission separately ( k ∈ { , , } ), Figure 2 depicts estimated scrubber effectson PM . across varying effects on emissions determined by values of ( M k ( ) , M k ( )) simulatedfrom the model. Note the pattern for all emissions that the surfaces are sloped downward in thedirection of increasing M k ( ) and M k ( ) (sloped towards the viewer), indicating larger effects onPM . among plants with larger emissions values under both scrubber statuses, i.e., larger plants.In Figure 2(a) for SO , the dots in the xy -plane lie almost entirely in the region where M ( ) < M ( ) , indicating as expected that scrubbers predominantly decrease SO emissions. Associativeeffects for SO are indicated by the downward slope of the surface in the direction of decreasing M ( ) − M ( ) (towards the left of the viewer), indicating that larger decreases (increases) in SO are associated with larger decreases (increases) in PM . .The analogous surfaces for NO x and CO appear in Figures 2(b) and 2(c), respectively. Incontrast to the surface for SO , the dots in the xy -plane fall more closely and symmetrically aroundthe line M k ( ) = M k ( ) , reflecting that scrubbers do not affect these emissions, on average. Thesurface for NO x exhibits some evidence of associative effects in the opposite direction of those forSO ; there is some downward slope of the surface in the direction of increasing M k ( ) − M k ( ) (towards the right of the viewer), indicating that larger increases (decreases) in these emissions areassociated with larger decreases (increases) in PM . .Table 5 lists posterior mean and standard deviation of EDE, EAE − , and EAE + for all possi-ble K . Estimates of EDE for all K indicate little to no reduction in PM . among plants whereemissions were not affected in excess of C D K , with the exception of some pronounced estimates of19 a) k = ) (b) k = x )(c) k = ) Figure 2: Average surface plots of the causal effect on PM . for different values (log scales)of ( M k ( ) , M k ( )) . Values of ( M k ( ) , M k ( )) are plotted on the x - and y - axes, and determine thecausal effect of a scrubber on emission k . The corresponding value of the causal effect of a scrubberon PM . , Y ( ) − Y ( ) , is plotted on the z -axis. The cloud of points in the xy -plane are one MCMCdraw of 249 pairs of ( M k ( ) , M k ( )) . Red lines are at M k ( ) = M k ( ) (solid line) and + / − .
25 ˆ σ k (dashed lines). 20able 3: Posterior means (standard deviations) for expected associative and dissociative effects ofSO scrubbers. SO NO x CO SO & NO x SO & CO NO x & CO SO & NO x & CO EAE − Mean -1.19 -0.77 -1.14 -0.84 -1.18 -0.90 -0.94SD (0.46) (0.59) (0.56) (0.59) (0.57) (0.67) (0.68)EDE Mean -0.32 -0.69 -0.82 -0.09 -0.31 -0.48 -0.15SD (0.57) (0.54) (0.49) (0.71) (0.68) (0.69) (0.86)EAE + Mean 0.60 -1.68 -1.08 0.38 1.28 -1.63 0.69SD (2.52) (0.74) (0.75) (3.67) (3.78) (1.04) (4.68)
EDE for K = { NO x } and K = { CO } . Estimates of EAE − and EAE + tend to be less than zero.The most pronounced estimate of EAE − K = -1.19 (0.46) for K = { SO } suggests that PM . wasreduced among power plants where SO emissions were substantially reduced, which correspondsto the contour of the surface in Figure 2(a) and is consistent with the anticipated causal pathwaywhereby scrubbers reduce PM . through reducing SO emissions. In accordance with the oppo-site sloping surface in Figures 2(b), the estimate of EAE + K is most pronounced for K = { NO x } , and { NO x , CO } , indicating that ambient PM . is decreased among plants with substantial increases in NO x emissions.Recall that the estimates in Table 5 represent average principal effects over only a subset ofprincipal strata, in particular those where changes in multiple emissions are concordant (i.e., alldecreasing, all increasing, or none changing). Other strata may be of interest. Figure 3 providesestimates of principal effects in a cross-classification of strata defined by changes in CO andSO , with changes defined as increases, decreases, or no change in reference to C D K and C A K . Forexample, the third column of Figure 3 subdivides the stratum defined by causal increases in CO into three substrata: those where CO increases and SO (1) decreases (in excess of C A K ); (2)does not substantially change (beyond C D K ) ; or (3) increases (in excess of C A K ). Principal causaleffect estimates for these three substrata appear along with their relative proportion among thestratum defined by CO increases, indicated by the size of the plotting symbol. The light greydot corresponds to EAE + K for K = { SO , CO } as reported in Table 5, but note that only 4% ofthe CO -increase stratum exhibits SO increases. The dark grey dot corresponds to the principaleffect among the 21% of the CO -increase stratum in substratum (2) where SO does not change,with a principal effect estimate of -0.13 (0.99). The remaining proportion (75%) of the CO -increase stratum belongs to substratum (3) where the plants exhibiting decreases in SO and acorresponding principal effect estimate of -1.21 (0.73). Thus, for K = { CO } , the negative estimateof EAE + K from Table 5 is revealed to be generated in large part by strata where SO decreases andthere is a pronounced negative effect on PM . . Analogously, the second column of Figure 3considering the stratum where CO emissions do not substantially change (used to estimate EDE)reveals that 63% of this strata exhibited causal reduction in SO and a causal reduction in PM . of -0.87 (0.49), explaining in large part the negative estimate of EDE K for K = { CO } in Table 5.Analogous cross-classification of strata by changes in NO x and SO appears very similar to Figure21 . − . . . . . E [ Y ( ) − Y ( ) ] CO2 decrease CO2 no change CO2 increaseSO2 decreaseSO2 no changeSO2 increase
Figure 3: Posterior mean estimates of principal effects for strata defined by cross-classifyingchanges in CO ( x -axis) and changes in SO (colored circles). Size of circle symbolizes the pro-portion of each CO stratum falling in the corresponding SO category, and number (and numberin parentheses) listed is posterior mean proportion (and posterior standard deviation).3 and is not presented.The main conclusions from the principal stratification analysis are that 1) scrubbers reduce SO on average, but not NO x or CO , 2) there is some evidence of a nonzero dissociative effect for SO ,3) associative effects for SO are more pronounced than dissociative effects, with PM . reducedmore around plants where scrubbers cause large reductions in SO ; 4) associative effects for NO x and CO are more pronounced than dissociative effects, with PM . reduced more around plantswhere scrubbers cause larger increases in these emissions; but that 5) strata defined by increases (orno change) in NO x and/or CO are comprised in large part by substrata where SO and PM . werecausally reduced. This analysis points towards (but cannot confirm) the conclusion that scrubbersaffect PM . among plants where emissions are not changed, and that scrubber effects on PM . are mediated in part through effects on SO , with less evidence of a mediating role of NO x andCO . To estimate direct and indirect effects, we augment the principal stratification analysis with As-sumption 3 in Section 5.2.2 about a priori counterfactuals. Figure 1 (top) in the Web Appendixdepicts boxplots of the posterior distributions of TE, NDE, JNIE , JNIE , JNIE , JNIE , andthe individual NIEs. The estimated NDE, representing the direct effect of a scrubber on ambientPM . that is not mediated through any emissions changes, is -0.53 (-1.51, 0.39) µg / m , indicatingno evidence of a direct effect of scrubbers on PM . that is not mediated through SO , NO x , or22O . The NIEs for NO x (NIE ) and CO (NIE ) are estimated to be very close to 0, -0.02 (-0.26,0.21) and -0.04 (-0.33, 0.23), respectively. The estimated NIE for SO (NIE ) is -0.54 (-1.20,-0.01), indicating a significant indirect effect. The joint natural indirect effects involving SO areall similar in magnitude to NIE , with estimates of JNIE , JNIE , and JNIE of -0.56 (-1.23,-0.01), -0.58 (-1.25, -0.02), and -0.59 (-1.27, -0.02), respectively. The estimated JNIE is -0.03(-0.31, 0.23).As discussed in Section 4.3, a benefit of the proposed approach is the accommodation of over-lap between NIEs, and the opportunity to examine the extent of overlap. We evaluate the relation-ship between the joint effects JNIE jk and the mediator-specific effects NIE , NIE , NIE through ( NIE + NIE ) − JNIE = − . ( − . , . ) , ( NIE + NIE ) − JNIE = . ( − . , . ) and ( NIE + NIE ) − JNIE = . ( − . , . ) , which give no evidence of overlap between NIEs.That is, the effect of a scrubber on ambient PM . that is mediated through emissions changes ap-pears to be described by indirect effects that act additively and do not exhibit any apparent synergythat would lead to overlapping effects. The lack of overlapping indirect effects, combined with thefact that a) all indirect effects involving SO (NIE , JNIE , JNIE , and JNIE ) are similar inmagnitude and b) all indirect effects not involving SO (NIE , NIE , JNIE ) are estimated to bezero, provides strong evidence that the effect of scrubbers on PM . is primarily driven by effectson SO .In the Web Appendix, we also conduct inference using flat priors on plausible values of thepartially-identifiable parameters, and the estimates for the effects are similar to those in the mainanalysis.The conclusions of the causal mediation analysis are clear and mostly consistent with thosefrom the principal stratification analysis: scrubber effects on ambient PM . are almost entirelymediated through reductions in SO emissions. Combining reductions in SO with reductions ofNO x and CO does not significantly change the mediated effect. In fact, NO x and CO appear toplay no role in the causal effect of scrubbers on PM . . We conduct two simpler analyses for comparison. First, we implement separate single-mediatoranalyses using the methods described above with K =
1. Results are largely consistent with themultiple mediator analysis, as suggested by the apparent absence of overlapping effects. For SO emissions, the total, indirect and direct effects are estimated to be -1.28 (-2.25, -0.62), -0.70 (-1.51,-0.04) and -0.58 (-1.35, 0.37), respectively. For NO x emissions, the total, indirect and direct effectsare estimated to be -1.21 (-2.05, -0.40), -0.04 (-0.32, 0.28) and -1.17 (-1.99, -0.32), respectively.With CO emissions, the total, indirect and direct effects are estimated to be -1.22 (-1.98, -0.29),0.03 (-0.26, 0.33) and -1.25 (-2.05, -0.30), respectively. Note that significant estimated directeffects for NO x and CO suggest pathways that are not through NO x and CO (i.e., the pathwaythrough SO ).For a second comparison, we conduct a multiple mediator analysis using a traditional regres-sion approach to mediation with the same model in Section 6. The mediation effects are esti-23ated to be NIE = α β = − . ( C . I . − . , . ) , NIE = α β = − . ( C . I . − . , . ) , NIE = α β = . ( C . I . − . , . ) , NDE = β = − . ( C . I . − . , . ) .Thus, while these results are on average consistent with the results from the proposed meth-ods, the estimate of the NIE is not significant. Note that this analysis explicitly assumes thatthe mediators do not interact with each other in the outcome model, implying an estimate ofthe joint indirect effect of all three mediators that is the sum of all three indirect effect (i.e., JNIE = − . ( C . I . − . , . ) ) which is also not significant. The discrepancy betweenthe results of the traditional regression approach and ours is due to our flexible modeling strat-egy using Bayesian nonparametric methods (Dirichlet process mixtures) that even in presence ofadditivity, allows for nonlinearities and non-normal errors. We have developed flexible Bayesian methods for principal stratification and causal mediationanalysis in the presence of multiple mediating variables. To accommodate the setting of mul-tiple pollutants that are emitted contemporaneously and possibly interact with one another, wehave developed methods to accommodate multiple contemporaneous and non-independent medi-ators. Bayesian nonparametric modeling approaches provided flexible models for the observeddata (marginal distribution for each mediator and conditional distribution for the outcome undereach intervention z = , a priori counter-factuals.A key feature of our approach is the integration of principal stratification and causal media-tion analysis in a manner that relies on the same models for the observed data. Deployment ofthese methods in the power plant analysis represents, to our knowledge, the most comprehensiveconsideration of these two approaches and the implications of the results in the context of a singleanalysis. We use Assumption 3 to relate a priori counterfactual outcomes to observed outcomes,and show that this assumption implies homogeneity across principal strata, which aids interpre-tation. This assumption also has close ties to that of sequential ignorability (Imai et al. 2010).Benefits of formulating Assumption 3 as done here include facilitation of a sensitivity analysis tothis assumption following the general approach of Daniels et al. (2012) and the aided interpretationimplied by the relationship to homogeneity assumptions. While a version of sequential ignorabil-ity relevant to the setting of multiple contemporaneous mediators with interactions and that can beused to identify each mediator-specific effect has not been previously formulated, Web AppendixE explores the relationship between our Assumption 3 and sequential ignorability in the case of asingle mediator. In this case, implications of these two assumptions are identical for the types ofestimands considered here, although one assumption does not generally imply the other.The results of the principal stratification and causal mediation analyses should be interpretedjointly and are, in this case study, largely consistent with one another. Principal stratification in-dicated that scrubbers tended to decrease ambient PM . around plants where scrubbers substan-24ially reduced SO emissions, a result consistent with the estimated natural indirect effects fromthe mediation analysis. Jointly interpreting results related to other emissions proved more sub-tle, and highlighted the difficulty involved in interpreting principal effects as mediated effects, inparticular when there are multiple mediators. A finer examination of principal strata defined bycross-classification of SO changes and changes in CO (or NO x ) revealed the dominating roleof scrubber effects on SO that was corroborated by the results of the mediation analysis. Thiscross-classification also reconciled the lack of evidence for a natural direct effect with the apparentevidence of dissociative effects pertaining to NO x and CO that were revealed to be driven primar-ily by changes in SO . The evidence of nonzero dissociative effects for SO is likely explainedby the negative expected direct effect. The relative magnitudes of principal effects and mediationeffects are consistent with the well-known result that, in general, associative effects are a mixture ofdirect and indirect effects. Overall, these results are largely consistent with expectations: scrubbersappear to causally reduce SO emissions but not those of NO x or CO ; scrubbers causally reduceambient PM . (within 150km); the effect on PM . is primarily mediated by causal reductions inSO emissions and not NO x or CO emissions; and there appears to be direct effect of scrubberson PM . .The results of this case study should be interpreted in light of several important limitations.First is the relative simplicity with which we linked power plants to monitors. Specifically, ourstrategy links power plants to all of the ambient monitors within 150km. Thus, our analysis is ofthe causal effects of scrubbers on average PM . measured within 150km. This likely does notreflect the full effect of emissions changes on ambient air quality, which are expected to have im-plications at distances greater than 150km. A related limitation is the assumption that there is nointerference between observations. If the effect of a scrubber on ambient PM . extends far enoughbeyond 150km so that a scrubber at a given power plant causally affects ambient PM . surround-ing other power plants, then this assumption would be violated. More sophisticated strategies forcausal inference in the presence of interference and for linking ambient monitors to power plantsbased on features such as atmospheric conditions and weather patterns are warranted. Nonethelessanalysis presented here represents an important approximation that still yields valuable conclu-sions, especially with respect to quantifying causal pathways. Another important limitation of thisanalysis is that it assumes that the factors listed in Table 1 are sufficient to control for confounding,which in this case would consist of differences between power plants or other features related toambient PM . that are also associated with whether a power plant had scrubbers installed in 2005.Our approach is not readily extended to categorical mediators. We save this as potential future re-search. Despite these limitations, we have developed new statistical methodology and leveraged anunprecedented linked data base to provide the first empirical evaluation of the presumed causal re-lationships that motivate a variety of regulations for improving ambient air quality and, ultimately,human health. 25 cknowledgements This work was supported by NIH R01ES026217, NIH R01CA183854, NIH R01GM112327, NIHR01GM111339, NCI P01 CA134294, HEI 4909, HEI 4953, and USEPA RD-83587201. Its con-tents are solely the responsibility of the grantee and do not necessarily represent the official viewsof the USEPA. Further, USEPA does not endorse the purchase of any commercial products orservices mentioned in the publication. supplement
Web Appendices A-J, tables and figures are provided as supplementary materials.
References
Albert, J. M. & Nelson, S. (2011), ‘Generalized causal mediation analysis’,
Biometrics (3), 1028–1038.Baccini, M., Mattei, A. & Mealli, F. (2015), ‘Bayesian inference for causal mechanisms withapplication to a randomized study for postoperative pain control’, DISIA Working Paper .Barnard, J., McCulloch, R. & Meng, X.-L. (2000), ‘Modeling covariance matrices in terms of stan-dard deviations and correlations, with application to shrinkage’,
Statistica Sinica (4), 1281–1311.Baron, R. M. & Kenny, D. A. (1986), ‘The moderator–mediator variable distinction in social psy-chological research: Conceptual, strategic, and statistical considerations’, Journal of Personalityand Social Psychology , 1173–1182.Bartolucci, F. & Grilli, L. (2011), ‘Modeling partial compliance through copulas in a principalstratification framework’, Journal of the American Statistical Association (494), 469–479.Celeux, G., Forbes, F., Robert, C. P., Titterington, D. M. et al. (2006), ‘Deviance informationcriteria for missing data models’,
Bayesian analysis (4), 651–673.Chestnut, L. G. & Mills, D. M. (2005), ‘A fresh look at the benefits and costs of the US acid rainprogram’, Journal of Environmental Management (3), 252–266.Conlon, A. S. C., Taylor, J. M. G. & Elliott, M. R. (2014), ‘Surrogacy assessment using principalstratification and a gaussian copula model’, Statistical methods in medical research (0), 1–20.Daniel, R. M., De Stavola, B. L., Cousens, S. N. & Vansteelandt, S. (2015), ‘Causal mediationanalysis with multiple mediators’, Biometrics (1), 1–14.26aniels, M. J., Roy, J. A., Kim, C., Hogan, J. W. & Perri, M. (2012), ‘Bayesian inference for thecausal effect of mediation’, Biometrics (4), 1028–1036.Dominici, F., Greenstone, M. & Sunstein, C. R. (2014), ‘Particulate Matter Matters’, Science (6181), 257–259.Forastiere, L., Mealli, F. & VanderWeele, T. J. (2016), ‘Identification and estimation of causalmechanisms in clustered encouragement designs: Disentangling bed nets using Bayesian princi-pal stratification’,
Journal of the American Statistical Association (514), 510–525.Frangakis, C. E. & Rubin, D. B. (2002), ‘Principal stratification in causal inference’,
Biometrics (1), 21–29.Gilbert, P. B. & Hudgens, M. G. (2008), ‘Evaluating candidate principal surrogate endpoints’, Biometrics (4), 1146–1154.Gustafson, P. (2010), ‘Bayesian inference for partially identified models’, The international journalof biostatistics (2).HEI Accountability Working Group (2003), Assessing the Health Impact of Air Quality Regula-tions: Concepts and Methods for Accountability Research. Communication 11. , Health EffectsInstitute, Boston, MA.Hodan, W. M. & Barnard, W. R. (2004), Evaluating the contribution of pm2.5 precursor gasesand re-entrained road emissions to mobile source pm2.5 particulate matter emissions, in ‘13thInternational Emission Inventory Conference ”Working for Clean Air in Clearwater”’.Imai, K., Keele, L. & Yamamoto, T. (2010), ‘Identification, inference and sensitivity analysis forcausal mediation effects’, Statistical Science , 51–71.Imai, K. & Yamamoto, T. (2013), ‘Identification and sensitivity analysis for multiple causal mech-anism: Revisiting evidence from framing experiments’, Political Analysis (2), 141–171.Imbens, G. W. & Rubin, D. B. (1997), ‘Bayesian inference for causal effects in randomized exper-iments with noncompliance’, The Annals of Statistics pp. 305–327.Jara, A., Hanson, T. E., Quintana, F. A., M¨uller, P. & Rosner, G. L. (2011), ‘Dppackage: Bayesiannon-and semi-parametric modelling in r’,
Journal of statistical software (5), 1–30.Jin, H. & Rubin, D. B. (2008), ‘Principal stratification for causal inference with extended partialcompliance’, Journal of the American Statistical Association (481), 101–111.Joffe, M. M. & Greene, T. (2009), ‘Related causal frameworks for surrogate outcomes’,
Biometrics (2), 530–538. 27im, C., Daniels, M. J., Marcus, B. H. & Roy, J. A. (2016), ‘A framework for Bayesian nonpara-metric inference for causal effects of mediation’, to appear in Biometrics .Ma, Y., Roy, J. A. & Marcus, B. H. (2011), ‘Causal models for randomized trials with two activetreatments and continuous compliance’, Statistics in Medicine (19), 2349–2362.MacKinnon, D. P. (2008), Introduction to Statistical Mediation Analysis , Lawrence Earlbaum As-sociates, New York.Mattei, A. & Mealli, F. (2011), ‘Augmented designs to assess principal strata direct effects’,
Journalof the Royal Statistical Society: Series B (Statistical Methodology) (5), 729–752.Mealli, F. & Mattei, A. (2012), ‘A refreshing account of principal stratification’, The internationaljournal of biostatistics (1).Mealli, F. & Pacini, B. (2013), ‘Using secondary outcomes to sharpen inference in ran-domized experiments with noncompliance’, Journal of the American Statistical Association (503), 1120–1131.Mealli, F. & Rubin, D. B. (2003), ‘Commentary: Assumptions allowing the estimation of directcausal effects’,
Journal of Econometrics (1), 79–87.Molitor, J., Papathomas, M., Jerrett, M. & Richardson, S. (2010), ‘Bayesian profile regression withan application to the national survey of children’s health’,
Biostatistics (3), 484–498.M¨uller, P., Erkanli, A. & West, M. (1996), ‘Bayesian curve fitting using multivariate normal mix-tures’, Biometrika (1), 67–79.Nelsen, R. B. (1999), An introduction to copulas , Vol. 139, Springer Science & Business Media.Office of Management and Budget (2013), 2013 Draft report to Congress on the benefits and costsof federal regulation and unfunded mandates on state, local, and tribal entities, Technical report,OMB, Washington, DC.Pearl, J. (2001), Direct and indirect effects, in ‘Proceedings of the 17th Conference on Uncertaintyin Artificial Intelligence’, San Francisco, CA: Morgan Kaufman, pp. 411–420.Pitt, M., Chan, D. & Kohn, R. (2006), ‘Efficient bayesian inference for gaussian copula regressionmodels’, Biometrika (3), 537–554. URL: http://biomet.oxfordjournals.org/content/93/3/537.abstract
Pope III, C. A., Ezzati, M. & Dockery, D. W. (2009), ‘Fine-particulate air pollution and life ex-pectancy in the United States’,
New England Journal of Medicine (4), 376–386.Robins, J. M. & Greenland, S. (1992), ‘Identifiability and exchangeability for direct and indirecteffects’,
Epidemiology , 143–155. 28osenthal, J. S. (2011), Handbook of Markov Chain Monte Carlo , CRC Press, Boca Raton, FL,chapter Optimal Proposal Distributions and Adaptive MCMC, pp. 93–112.Rubin, D. B. (1974), ‘Estimating causaleffects of treatments in randomized and nonrandomizedstudies’,
Journal of Educational Psychology , 688–701.Rubin, D. B. (2004), ‘Direct and Indirect Causal Effects via Potential Outcomes’, ScandinavianJournal of Statistics (2), 161–170.Schwartz, S. L., Li, F. & Mealli, F. (2011), ‘A Bayesian semiparametric approach to intermediatevariables in causal inference’, Journal of the American Statistical Association (496), 1331–1344.Sethuraman, J. (1994 a ), ‘A Constructive Definition of Dirichlet Priors’, Statistica Sinica (2), 639–650.Sethuraman, J. (1994 b ), ‘A constructive definition of Dirichlet priors’, Statistica Sinica , 639–650.Taddy, M. A. (2008), Bayesian Nonparametric analysis of conditional distributions and inferencefor poisson point processes, PhD thesis, University of California, Santa Cruz.Taguri, M., Featherstone, J. & Cheng, J. (2015), ‘Causal mediation analysis with multiple causallynon-ordered mediators’, Statistical methods in medical research p. 0962280215615899.U.S. EPA (2013), Workshop on Designing Research to Assess Air Quality and Health Outcomesfrom Air Pollution Regulations, in ‘Designing Research to Assess Air QuAlity and Health Out-comes from Air Pollution Regulations’, Research Triangel Park, NC.VanderWeele, T. J. (2008), ‘Simple relations between principal stratification and direct and indirecteffects’, Statistics & Probability Letters (17), 2957–2962.VanderWeele, T. J. (2009), ‘Marginal structural models for the estimation of direct and indirecteffects’, Epidemiology (1), 18–26.VanderWeele, T. J. (2011), ‘Principal stratification–uses and limitations’, The international journalof biostatistics (1), 1–14.VanderWeele, T. J. & Vansteelandt, S. (2014), ‘Mediation analysis with multiple mediators’, Epi-demiologic Methods (1), 95–115.Wang, W., Nelson, S. & Albert, J. M. (2013), ‘Estimation of causal mediation effects for a dichoto-mous outcome in multiple-mediator models using the mediation formula’, Statistics in Medicine (24), 4211–4228.Zigler, C. M., Dominici, F. & Wang, Y. (2012), ‘Estimating causal effects of air quality regula-tions using principal stratification for spatially correlated multivariate intermediate outcomes’, Biostatistics (Oxford, England) (2), 289–302.29 . Observed Data Models and Prior Specification We specify Dirichlet process mixtures of truncated normals (bounded from below at 0) for thedistribution of each mediator (M¨uller et al. 1996). Specifically, for each intervention z = , k = , , X = x , the conditional distribution of the k -th observed mediatoris specified as M k , i | Z i = z , X i = x i ∼ N ( β zk , i + x (cid:62) i β zk , τ zk , i ) , M k , i ≥ ; i = , · · · , n z β zk , i , τ zk , i ∼ F zk , F zk ∼ DP ( λ zk , F zk ) , where the subscript i indexes the observation, k indicates the k -th mediator, and the superscript z indicates the intervention arm. For example, β zk , i and τ zk , i denote the intercept and precisionparameters for the k -th emission at the i -th power plant that received intervention z . Here, DP denotes the Dirichlet process with two parameters, a mass parameter ( λ zk ) and a base measure ( F zk )for each mediator k and intervention z . To not overly complicate the model we only ‘mixed’ overthe intercept parameters and precisions in the conditional distributions, β zk , i and τ zk , i . The basedistribution F zk is taken to be the normal-Gamma distribution, N ( µ zk , S zk ) G ( a zk , b zk ) , where S zk is theprecision parameters and the Gamma is parametrized as the mean to be a zk / b zk and we set a Gammaprior G ( , ) on the mass parameter λ zk . For the hyper-priors, we follow the specification fromTaddy (2008) such that µ zk ∼ ( µ z (cid:63) k , S z (cid:63) k ) , S zk ∼ G ( a z (cid:63) k , b z (cid:63) k ) and a zk = b zk = S z (cid:63) k is set to 2 / ˆ Σ zk and µ z (cid:63) k is set to the mean of the data. And a z (cid:63) k ∼ Unif ( , ) , b z (cid:63) k = a zk × ˆ Σ zk /
2. From these specifications, E ( τ zk , i ) = E ( S zk ) = S z (cid:63) k = / ˆ Σ zk (i.e., the expected variance components are an attenuated value ofthe MLE of the variance of the data). These observed-data models can be represented using thestick-breaking construction (Sethuraman 1994 a ) which can be approximated by a finite mixture ofnormals such that, for example, the conditional distribution of M under intervention z = f M ( m | z = , x ) = K ∑ k = θ k N ( m ; β z = , k + x (cid:62) β z = , τ z = , k ) , where θ k = θ (cid:48) k ∏ h < k ( − θ (cid:48) k ) , θ (cid:48) k ∼ Beta ( , λ z = ) , and ( β z = , k , τ z = , k ) iid ∼ F z = and K is a maximumnumber of clusters. We use the stick-breaking construction for posterior samplings.To model the distributions of the potential outcomes for each z = , Z i = ( Y i , M ( , , ) , M ( , , ) , X i ) . The model for thejoint distribution of Z i for each z = , Z i | µ zi , Σ zi ∼ N ( µ zi , Σ zi ) , i = , · · · , n z ( µ zi , Σ zi ) | G z ∼ G z G z | α z G z ∼ DP ( α z G z ) G z = N ( µ z | m , ( / k ) Σ z ) IW ( Σ z | , ψ z ) . The following hyperpriors are also specified: α z ∼ Gamma ( , ) , m z ∼ N ( m z , s ) , k ∼ Gamma ( . / , . / ) and ψ z ∼ IW ( , ψ z ) where m z = mean ( Z i ) for i = , · · · , n z , s = . ( Z i ) for i = , · · · , n z and ψ z = s . This joint distributioninduces the following conditional distribution model of the outcome for each intervention z = , f ( y i | m i ( , , ) , m i ( , , ) , x i , Z i = z ) = ∞ ∑ l = ω zl N ( y i , m i ( , , ) , m i ( , , ) , x i | µ zl , Σ zl ) where ω zl = γ zl / ( ∑ ∞ j = γ zj N ( m i ( , , ) , m i ( , , ) , x i | µ zj , \ , Σ zj , ( \ , \ ) )) and µ zj , \ denotes all elementsof mean parameters µ zj except for Y i . Similarly, Σ zj , ( \ , \ ) denotes a submatrix of covariance matrix Σ zj formed by deleting the the first row and the first column. The weight consists of the parameterwith this prior γ (cid:48) , zj ∼ Beta ( , α z ) where γ zj = γ (cid:48) , zj ∏ h < j ( − γ (cid:48) , zh ) . In the MCMC computation, we use theR package, DPpackage (Jara et al. 2011), for the subroutine to fit the joint model in each iteration.
B. Specification of Correlations in Assumption 2
To better understand the specification of the correlation structure in Assumption 2, we provide atable based on two mediators (e.g., K = r jk ( z ) be the correlation between M j ( z ) and M k ( z ) . M ( ) M ( ) M ( ) M ( ) M ( ) r ( ) = r ( ) · [ r ( ) + r ( )] · ρ · [ r ( ) + r ( )] · ρ M ( ) r ( ) = · [ r ( ) + r ( )] · ρ · [ r ( ) + r ( )] · ρ M ( ) r ( ) = r ( ) M ( ) r ( ) = Table 4: Colored cells represent correlations identified from the observed data.
C. Estimation of the Causal Effects
In the following, we show that Assumptions 1-3 are sufficient to estimate the principal causaleffects and the NDE, JNIE’s and NIE’s. 31 .1. Estimation of the principal causal effects
EDE K (3) = E [ Y ( M ( , , )) − Y ( M ( , , )) (cid:12)(cid:12) | ( M ( , , ) − M ( , , )) | K < C D K ]= E [ Y ( ) − Y ( ) (cid:12)(cid:12) | ( M ( , , ) − M ( , , )) | K < C D K ]= E [ Y ( ) (cid:12)(cid:12) | ( M ( , , ) − M ( , , )) | K < C D K ] − E [ Y ( ) (cid:12)(cid:12) | ( M ( , , ) − M ( , , )) | K < C D K ]= (cid:90) | ( M ( , , ) − M ( , , )) | K < C D K (cid:90) y y dF Y ( ) | M ( , , ) , M ( , , ) ( y ) dF M ( , , ) , M ( , , ) ( m , m ) − (cid:90) | ( M ( , , ) − M ( , , )) | K < C D K (cid:90) y y dF Y ( ) | M ( , , ) , M ( , , ) ( y ) dF M ( , , ) , M ( , , ) ( m , m ) where F Y ( ) | M ( , , ) , M ( , , ) ( y ) , F Y ( ) | M ( , , ) , M ( , , ) ( y ) , and F M ( , , ) , M ( , , ) ( m , m ) denote theconditional distributions of the outcomes and the joint distribution of the mediators, all of which areestimated from the observed data with Assumptions 1 and 2 (and the conditional outcome model).Similarly, we estimate EAE K . C.2. Estimation of the NDE, JNIE’s and NIE’s
The NDE, JNIE’s and NIE’s conditional on covariates X = x are estimated with NDE ( x )= E [ Y ( M ( , , )) − Y ( M ( , , )) | X = x ]= (cid:90) E [ Y ( M ( , , )) | M ( , , ) = m , M ( , , ) = m , X = x ] × dF M ( , , ) , M ( , , ) | X = x ( m , m ) − E [ Y ( M ( , , )) | X = x ]= (cid:90) E [ Y ( M ( , , )) | M ( , , ) = m , X = x ] dF M ( , , ) | X = x ( m ) − E [ Y ( M ( , , )) | X = x ] by Assumption 3 = (cid:90) E [ Y ( ) | M ( , , ) = m , X = x ] dF M ( , , ) | X = x ( m ) − E [ Y ( ) | X = x ] , where the second term is estimated by the observed data model under Assumption 1 and all ele-ments in the first term are estimated from the observed data with Assumptions 1 and 2 (and the32onditional outcome model). For the JNIE, we can estimate with JNIE ( x )= E [ Y ( M ( , , )) − Y ( M ( , , )) | X = x ]= E [ Y ( M ( , , )) | X = x ] − (cid:90) E [ Y ( M ( , , )) | M ( , , ) = m , M ( , , ) = m , X = x ] × dF M ( , , ) , M ( , , ) | X = x ( m , m )= E [ Y ( M ( , , )) | X = x ] − (cid:90) E [ Y ( M ( , , )) | M ( , , ) = m , X = x ] dF M ( , , ) | X = x ( m ) by Assumption 3 = E [ Y ( ) | X = x ] − (cid:90) E [ Y ( ) | M ( , , ) = m , X = x ] dF M ( , , ) | X = x ( m ) , where the first term is estimated by the observed data model under Assumption 1 and all elementsin the second term are estimated from the observed data with Assumptions 1 and 2 (and the condi-tional outcome model). Also, NIE ( x )= E [ Y ( M ( , , )) − Y ( M ( , , )) | X = x ]= E [ Y ( M ( , , )) | X = x ] − (cid:90) E [ Y ( M ( , , )) | M ( , , ) = m , M ( , , ) = m , X = x ] × dF M ( , , ) , M ( , , ) | X = x ( m , m )= E [ Y ( M ( , , )) | X = x ] − (cid:90) E [ Y ( M ( , , )) | M ( , , ) = m , X = x ] dF M ( , , ) | X = x ( m ) by Assumption 3 = E [ Y ( ) | X = x ] − (cid:90) E [ Y ( ) | M ( , , ) = m , X = x ] dF M ( , , ) | X = x ( m ) , where m denotes a vector of values for the mediators M , M , M under interventions 0 , , . Assumption 3 and the Assumptions of Homogeneity Across Princi-pal Strata The homogeneity assumption in Forastiere et al. (2016) applied to both intervention arms andextended to the case of multiple mediators can be written as: Y ( ( m , m , m )) ⊥⊥ M ( − z , − z , − z ) | M ( z , z , z ) = ( m , m , m ) , X = x , (4)where m , m , m are realized emissions values for the first, second and third mediators and thetreatment is denoted as z ∈ { , } . For a priori counterfactuals defined based on values of M ( z , z , z ) ,Assumption 3 implies this assumption:Invoking SUTVA, Assumption 3 can be represented as: f ( Y ( ( m , m , m )) | M ( , , ) = ( m , m , m ) , M ( , , ) , x )= f ( Y ( ( m , m , m )) | M ( , , ) , M ( , , ) = ( m , m , m ) , x ) . (5)Since Equation (5) holds regardless of the values for M ( , , ) in the LHS, it implies Y ( ( m , m , m )) ⊥⊥ M ( , , ) | M ( , , ) = ( m , m , m ) , X = x . (6)Similarly, since Equation (5) holds regardless of the values for M ( , , ) in the RHS, it implies Y ( ( m , m , m )) ⊥⊥ M ( , , ) | M ( , , ) = ( m , m , m ) , X = x , (7)which together imply (4).However, this homogeneity only applies to a priori counterfactuals relying on M ( z , z , z ) . Thoseof the more general form M ( z , z , z ) cannot be assumed homogeneous across principal stratasince a priori counterfactual mediator values defined as simultaneously subject to different inter-ventions do not appear in the definition of principal strata. Thus, Assumption 3 and the extendedhomogeneity assumptions in (4) are equivalent in the case of a single mediator (see Section 8 fordetails), but with multiple mediators Assumption 3 entails additional assumptions about a priori counterfactuals used in the decomposition of the overall natural indirect effect into indirect effectsattributable to subsets of the mediators. D.1. Single Mediatior Case
With a single mediator, Assumption 3 and the extended homogeneity assumptions are equivalent.First, Assumption 3 implies the following two homogeneity assumptions: Y ( m ) ⊥ M ( ) | M ( ) = m , X = x (8) Y ( m ) ⊥ M ( ) | M ( ) = m , X = x . (9)The statement in (8) implies that the distribution of the a priori counterfactual Y ( M ( )) is ho-mogenous across all principal strata with M ( ) = m , regardless of the value of M ( ) . The statement34n (9) implies that the distribution of the observable counterfactual Y ( M ( )) is homogeneousacross principal strata with M ( ) = m . Estimation of the distribution of a priori counterfactual Y ( M ( )) with M ( ) = m follows using observed values of Y ( M ( )) among observations with Z = M ( ) = m .To show that the extended homogeneity assumptions, Y ( m ) ⊥ M ( z ) | M ( − z ) = m , X = x for z = , , (10)imply Assumption 3, recall that Y ( ( m )) ⊥ M ( ) | M ( ) = m , X = x implies that the distributionof potential outcomes Y ( M ( )) with mediator M ( ) = m are the same for all values of M ( ) ,which can be represented as f ( Y ( M ( )) | M ( ) = m a , M ( ) = m , X = x ) (11) = f ( Y ( M ( )) | M ( ) = m b , M ( ) = m , X = x ) , for all m a and m b . Also, Y ( ( m )) ⊥ M ( ) | M ( ) = m , X = x implies that the distribution ofpotential outcomes Y ( M ( )) with mediator M ( ) = m are the same for all values of M ( ) , whichcan be represented as f ( Y ( M ( )) | M ( ) = m , M ( ) = m a , X = x ) (12) = f ( Y ( M ( )) | M ( ) = m , M ( ) = m b , X = x ) , for all m a and m b . Combining (11) and (12), we have Assumption 3 as follows f ( Y ( M ( )) | M ( ) = m , M ( ) , X = x )= f ( Y ( M ( )) | M ( ) = m , M ( ) = m , X = x ) by (11) = f ( Y ( m ) | M ( ) = m , M ( ) = m , X = x ) by (SUTVA) = f ( Y ( M ( )) | M ( ) = m , M ( ) = m , X = x ) by (SUTVA) = f ( Y ( M ( )) | M ( ) , M ( ) = m , X = x ) by (12) . Therefore, in the setting of a continuous single mediator, Assumption 3 and the extended homo-geneity assumptions (10) are equivalent.
E. Assumption 3 and the Sequential Ignorability Assumption
Even though Assumption 3 and the sequential ignorability assumption do not imply one another,they are closely related to each other. Assumption 3 implies a consequence of the sequential ignor-ability (S.I.) assumption (Imai et al. 2010) in the setting of K = K =
1) and the SI have the same implication from an inferential perspective. Specifically, S.I.implies f ( Y ( M ( )) | M ( ) = m , X = x ) = f ( Y ( M ( )) | M ( ) = m , X = x ) , f ( Y ( M ( )) | M ( ) = m , M ( ) , X = x )= f ( Y ( M ( )) | M ( ) , M ( ) = m , X = x ) . The difference is that Assumption 3 states that this relationship holds for all values of M ( ) in theLHS and all values of M ( ) in the RHS while the consequence of the S.I. only holds when M ( ) and M ( ) are marginalized over in the LHS and the RHS, respectively. F. Posterior Inference
Full Bayesian inference for the principal causal effects and the natural direct and indirect ef-fects is based on posterior samples of the parameters, θ , of the observed-data models since allthe causal effects of interest are functions of these parameters. We ran 10,000 MCMC iterationsfor the observed-data posterior and used the last 7,500 with thinning of 5 to obtain N = β j for all j (Molitor et al.2010). Visual inspection of time-series plots provided no indication against convergence. R codesused to perform the MCMC parameter estimation and post-processing, are available in GitHub(https://github.com/lit777/MultipleMediators).Posterior sampling for our approach consists of four major steps: (1) sample from the marginaldistributions of the mediators, (2) sample from the correlation matrix of the Gaussian copula, (3)impute missing mediators from the joint distribution, and (4) sample from the conditional distribu-tion of the outcome given each set of posterior samples of the mediators and observed covariates.In this paper, we use the efficient Bayesian approach for sampling from Gaussian copula modelsoutlined in Pitt et al. (2006). For notational simplicity, denote T i = M , i ( ) , T i = M , i ( ) , T i = M , i ( ) , T i = M , i ( ) , T i = M , i ( ) , T i = M , i ( ) and H ji = Φ − { F j ( T ji ; θ j , X i ) } where θ j is avector of parameters in the j -th marginal and Φ − is the inverse univariate standard normal CDF.In the first step, for j = , · · · ,
6, the parameters θ j are sampled and H ji is updated; then, in thesecond step, the correlation matrix R is sampled. Missing mediator values are imputed based onthe joint distribution of all potential mediators and (observed) outcomes. Finally, we sample ξ z , thevector of parameters in the outcome model for Z = z , all potential mediators and covariates.The likelihood of θ and R in the Gaussian Copula model has the form f ( T , X | θ , R ) = n ∏ i = f ( T · i | X i , θ , R ) f ( X i )= n ∏ i = f ( T i , T i , T i , T i , T i , T i | X i , θ , R ) f ( X i )= | R | / − n / n ∏ i = exp (cid:26) H (cid:62) · i ( I − R − ) H · i (cid:27) ∏ j = f ( T ji | X i , θ j ) f ( X i ) , (13)36here T · i = { T i , T i , T i , T i , T i , T i } (cid:62) , H · i = { H i , H i , H i , H i , H i , H i } (cid:62) and f ( X i ) denotes theempirical PDF of covariates X i . Assuming independent priors p ( θ ) = ∏ j = p ( θ j ) , p ( R ) and p ( ξ ) = p ( ξ ) p ( ξ ) , the posterior distribution of [ θ , ξ , R ] can be represented as follows (Imbens& Rubin (1997), Jin & Rubin (2008) and Schwartz et al. (2011)), f ( θ , ξ , R | T obs , Y obs , Z , X ) ∝ p ( θ ) p ( ξ ) p ( R ) (cid:90) n ∏ i = f ( T i , T i , T i , T i , T i , T i , Y ( ) , Y ( ) | X i , θ , ξ , R ) d T mis · i d Y mis i . We can obtain the joint distribution, f ( θ , ξ , R , T mis | T obs , Y obs , Z , X ) by sampling iterativelyfrom f ( θ | T obs , T mis , Y obs , X , ξ , R ) , f ( T mis | T obs , Y obs , X , θ , ξ , R ) , f ( R | T obs , T mis , Y obs X , θ , ξ ) and f ( ξ | T obs , T mis , Y obs , X , θ , R ) .Specifically, we can use four steps: • Step 1. For j = , · · · ,
6, sample θ j from f ( θ j | T obs j · , T mis j · , H \ j · , X , R ) where H \ j · denotesall elements of H except { H j , H j , · · · , H jn } and T obs j · denotes all observed entries of T ji for i = , · · · , n . Then, update H ji = Φ − { F j ( T ji ; θ j , X i ) } for all i ∈ { , · · · , n } . • Step 2. Sample R from f ( R | T obs , T mis , X , θ ) . • Step 3. For each j = , · · · ,
6, impute T mis j · from f ( T mis j · | T obs j · , T \ j · , Y obs , X , θ , ξ , R ) andupdate H mis j · = Φ − { F j ( T mis j · ; θ j , X ) }• Step 4. Sample ξ from f ( ξ | T obs , T mis , Y obs , X , θ , R ) which is proportional to p ( ξ ) ∏ ni = f ( Y i ( ) | T mis · i , T obs · i , X i ; ξ ) ( − Z i ) f ( Y i ( ) | T mis · i , T obs · i , X i ; ξ ) ( Z i ) f ( T · i , X i | θ , R ) .Note that the first step represents sampling from θ j ∼ f ( θ j | T j , · , X , θ \ j , R ) for j = , · · · , θ \ j = { θ , θ , · · · , θ } \ θ j since f ( θ j | T j · , X , θ \ j , R ) is equivalent to f ( θ j | T j · , H \ j · , X , R ) (be-cause θ \ j affects θ j only through H \ j · in (13)). We obtainlog f ( θ j | T j · , H \ j · , X , R )= const + ( − R − j j ) n ∑ i = H ji − n ∑ i = ∑ k = , k (cid:54) = j ( R − ) jk H ji H ki (14) + n ∑ i = log f ( T ji | X i , θ j ) + log p ( θ j ) . Since H ji = Φ − { F j ( T ji ; θ j , X i ) } , we have H ji in the expression (14) despite conditioning only on H \ j · . Step 1
We specify a Bayesian nonparametric model (Dirichlet process mixtures of truncated normals;DPM) for f ( T ji | X i , θ j ) which makes it difficult to generate θ j from the nonstandard conditionaldensity of θ j . Also, the number of covariates can be large, which makes it hard to sample directlyfrom the conditional density. To overcome these issues, we use a block Metropolis algorithm. As37e mentioned earlier in Section A, we use the finite stick-breaking approximation of the DPMspecification (Sethuraman 1994 a ) for the marginal distribution, f ( T ji | X i , θ j ) = K ∑ k = ω k N ( T ji ; β j , ( k ) + X i β j , τ j , ( k ) ) , where ( β j , ( k ) , τ j , ( k ) ) iid ∼ N ( µ j , S j ) G ( a j , b j ) and ω k = ω (cid:48) k ∏ h < k ( − ω (cid:48) h ) , ω (cid:48) h ∼ Beta ( , λ j ) and K isa maximum number of clusters; we set 8 because the smallest cluster of each marginal modelhas a sufficiently small weight, min ( ω k ) < . λ j ∼ G ( , ) , µ j ∼ N ( µ (cid:63) j , S (cid:63) j ) , S j ∼ G ( a (cid:63) j , b (cid:63) j ) , a j = b j = a (cid:63) j ∼ Unif ( , ) and b (cid:63) j = a (cid:63) j (see Section A fordetails of the specification). At every iteration of the MCMC scheme for Step 1, we need to samplevalues λ j , µ j , a (cid:63) j , S j , β j and ω k , β j , ( k ) , τ j , ( k ) for k = , · · · , K using a block Metropolis algorithm.Specifically, for each j = , · · · , Ω = { ω , · · · , ω K } , λ j : Denote the latent cluster indicator variable for each subject i as Z i which can take a value in { , , · · · , K } . Also, denote the values of β j , ( k ) , β j , τ j , ( k ) , ω at the t -th iteration as β j , ( k ) ( t ) , β j ( t ) , τ j , ( k ) ( t ) , ω k ( t ) . Draw Z i from Z i ∼ Categorical ( p , p , · · · , p K ) where p k = N ( T ji ; β j , ( k ) ( t − )+ X i β j ( t − ) , τ j , ( k ) ( t − )) × ω k ( t − ) . Draw ω (cid:48) k ∼ Beta ( + n k , λ j + ∑ K − q = h + n q ) for k = , · · · K − n k denotes the number of subjects having Z i = k .Then, update Ω = { ω , · · · , ω K } via ω k = ω (cid:48) k ∏ h < k ( − ω (cid:48) h ) . Then, draw λ j from Gamma ( + K − , − ∑ Kk = ( log ( − ω k ))) .Step 1.b a (cid:63) j , µ j , S j : Denote the values at the t -th iteration as a (cid:63) j ( t ) , µ j ( t ) , S j ( t ) . We propose values in-dependently from a (cid:63) j ( prop ) ∼ Unif ( , ) , µ j ( prop ) ∼ N ( µ j ( t − ) , / Σ j ) , S j ( prop ) ∼ Unif ( S j ( t − ) − . , S j ( t − ) + . ) and denote this joint distribution as q { Ψ prop ; Ψ ( t − ) } where Ψ prop = { a (cid:63) j ( prop ) , µ j ( prop ) , S j ( prop ) } and Ψ ( t ) = { a (cid:63) j ( t ) , µ j ( t ) , S j ( t ) } . Then, calculate the ac-ceptance probability of the proposed values AR = min (cid:26) , f ( θ j ( prop ) | T j · , H \ j · , X , R ) q { Ψ ( t − ) ; Ψ prop } f ( θ j ( t − ) | T j · , H \ j · , X , R ) q { Ψ prop ; Ψ ( t − ) } (cid:27) , where f ( θ j ( prop ) | T j · , H \ j · , X , R ) denotes the conditional distribution in (14) before log-transformationand set λ j , a j , a (cid:63) j , µ j , S j to their proposed values and other parameters are set to their currentvalues. Similarly, for f ( θ j ( t − ) | T j · , H \ j · , X , R ) , we plug in the values from the ( t − ) -thiteration instead. Then, we accept the proposed value with probability AR .Step 1.c β j = { β j , ( ) , β j , ( ) , · · · , β j , ( K ) } : propose values from β j , ( k ) , ( prop ) ∼ N ( β j , ( k ) ( t − ) , . ) and use the same Metropolis algorithm as in the previous sub-step.Step 1.d β j = { β j , ( ) , β j , ( ) , · · · , β j , ( P ) } for the coefficients of P covariates: propose values usingadaptive Metropolis (Rosenthal 2011) from β j , ( prop ) ∼ MV N ( β j ( t − ) , Σ † ) where Σ † =( . ) / ( P ) Σ t − + . / ( P ) I p with the empirical covariance matrix Σ t of β j ( ) , β j ( ) , · · · , β j ( t − ) and use the same Metropolis algorithm as in the previous sub-step.38tep 1.e τ j = { τ j , ( ) , · · · , τ j , ( K ) } : propose values from 1 / τ j , ( k ) ∼ Gamma ( , / τ j , ( k ) ( t − ) × c , c ) for k = , · · · , K where c is a large constant to concentrate the probability around 1 / τ j , ( k ) ( t − ) . Step 2
In Step 2, we conduct the analysis using 2 different specifications: (1) we specify uniformpriors on all the association parameters in R where their intervals are restricted to give the positivedefinite matrix; (2) we give restrictions on the (partially-identifiable) association parameters usinga parameter ρ (see Section 5.2.3 in the main paper) and put Unif ( , ) priors on this parameter anduniform priors on other remaining association parameters. However, for computational efficiency,we propose a value for each association parameter r ( prop ) from Unif ( r L , r U ) where r L and r U aredetermined to give the positive definite matrix R . See Barnard et al. (2000) for the computationdetails of calculating these intervals. Then, calculate the acceptance probability of the proposedvalues AR = min (cid:26) , f ( R r ( prop ) | T , X , θ ) q ( r ( t − )) f ( R r ( t − ) | T , X , θ ) q ( r ( prop ) ) (cid:27) , where R r ( prop ) denotes the correlation matrix with the r -th element set to the proposed value andother entries are set to their current values. Similarly, R r ( t − ) denotes the correlation matrixwhere r -th element is set to the value from the ( t − ) -th iteration and other entries are set to theircurrent values. We accept the proposed value with probability AR . Step 3
In Step 3, we impute T mis . Specifically, for each j = , · · · ,
6, draw T j ( prop ) ∼ N ( T j ( t − ) , σ T ) and calculate the acceptance probability of the proposed values AR = min (cid:40) , f ( T j ( prop ) , T obs j · , T \ j · , X | θ , R ) f ( Y obs | T j ( prop ) , T obs j · , T \ j · , X , θ , ξ ) q ( T j ( t − )) f ( T j ( t − ) , T obs j · , T \ j · , X | θ , R ) f ( Y obs | T j ( t − ) , T obs j · , T \ j · , X , θ , ξ ) q ( T j ( prop ) ) (cid:41) Then, we accept the proposed value with probability AR . Step 4
Finally, in Step 4, once we obtain the missing components of the potential mediators ( m ( , , ) for subjects with Z = m ( , , ) for subjects with Z = ( Y ( ) , M ( , , ) , M ( , , ) , X ) for Z = ( Y ( ) , M ( , , ) , M ( , , ) , X ) for Z =
1. For this subroutine, we use
DPcdensity function in the R package
DPpackage toobtain posterior samples in each iteration. Here, we use the default hyper-parameter specificationsdescribed in Section A. The detail of the posterior computation can be found in Jara et al. (2011).
G. Simulation Setup • Confounders : X ∼ N ( . , . ) , X ∼ N ( − . , . ) , X ∼ N ( , . ) Potential Mediators : M ( z , z , z ) ∼ MV N + . z + . x + . x + . x + . z − . x + . x − . x − . − . z + . x + . x + . x , Σ z • Potential Outcome : Y ( z ; M ( z , z , z )) ∼ N ( − z + . M ( z )+ . M ( z )+ . M ( z )+ h ( M , M , M )+ x + x + . x , . ) where h () is a function for ‘interaction’ term(s). We consider two cases: (1) Case 1: h () = . M ( z ) × M ( z ) and (2) Case 2: and h () = . M ( z ) × M ( z ) + . M ( z ) × M ( z ) . Si-multaneously, we test the model for ‘correlated mediators’ assuming two casesCase A : Σ = .
64 0 00 0 .
64 00 0 0 . for z = Σ = .
04 0 00 0 .
04 00 0 0 . for z = Σ = .
64 0 .
128 0 . .
128 0 .
64 0 . .
128 0 .
128 0 . for z = Σ = .
04 0 .
01 0 . .
01 0 .
04 0 . .
01 0 .
01 0 . for z = H. Uniform Priors on the Correlation Parameters
As mentioned in Section 5.2, we conduct an additional analysis with uniform priors on entries ofthe correlation matrix, R , in the Copula model. On average, the estimates for the principal causaleffects and mediation effects are similar to those in the main paper. However, credible intervals arewider in this case.Table 5: Posterior means (95% C.I.s) for expected associative and dissociative effects of SO scrubbers. SO NO x CO SO & NO x SO & CO NO x & CO SO & NO x & CO EAE − Mean -1.77 -1.62 -1.68 -1.66 -1.71 -1.64 -1.67SD (0.53) (0.71) (0.64) (0.72) (0.65) (0.78) (0.79)EDE Mean -1.42 -1.48 -1.42 -1.31 -1.14 -1.17 -1.06SD (0.72) (0.64) (0.59) (0.90) (0.85) (0.76) (1.04)EAE + Mean -0.21 -1.91 -1.84 -0.59 0.13 -2.03 -0.47SD (3.50) (0.76) (0.87) (4.93) (5.26) (1.09) (5.78) E JNIE NDE NIE1 NIE2 NIE3 JNIE12 JNIE13 JNIE23 − − − TE JNIE NDE NIE1 NIE2 NIE3 JNIE12 JNIE13 JNIE23 − − − Figure 4: Posterior distributions (means and SDs) of the mediation effects in the analysis of SO controls for the main analysis with the optional assumption in Section 5.2.3 (Top) and for theanalysis with uniform priors over the correlations (Bottom).41able 6: DIC’s for the parametric and nonparametric models based on the Power Plant data. Loweris better. Observed Data Parametric model Nonparametric model M ( ) M ( ) M ( ) -257.4 -285.7 M ( ) -144.7 -462.8 M ( ) M ( ) -877.7 -1236.9 I. Model Fit
We compared the fit of parametric and nonparametric models for the power plant data. Particularly,we assessed relative fit using a variation of DICs (Celeux et al. 2006) for mixture models since ournonparametric model can be specified as a mixture model using the stick-breaking construction(Sethuraman 1994 b ) DIC = − E θ [ log f ( T | θ ) | T ] + f ( T ) , where ˆ f ( T ) = ∏ ni = ˆ f ( T i ) with the MCMC predictive density which is defined asˆ f ( T i ) = m m ∑ l = K ∑ k = ω ( l ) k N ( T i ; µ ( l ) k , Σ ( l ) k ) ; m denotes the number of MCMC samples and ( µ ( l ) k , Σ ( l ) k , ω ( l ) k ) ≤ k ≤ K are the values at iteration l . An‘equivalent’ parametric model for a marginal distribution was specified as a linear regression modelwith the same set of covariates and non-informative priors. Table 6 shows that the nonparametricmodel has lower DIC’s for all marginal distributions, which supports the more complex model forthe power plant data. We also assessed the posterior predictive means and replications of emissionsand ambient PM . simulated under the model conditional on the observed covariates. Figure 5illustrates that the observed data points (black circle) and the corresponding posterior predictivemeans (red lines). Figure 6 & 7 illustrate 4 posterior predictive replications for each case. They allsuggest reasonable fit of the nonparametric model. J. Sensitivity Analysis
J.1 Sensitivity Analysis for Assumption 3
To assess sensitivity of Assumption 3, we divide the assumption into 2 sub-assumptions (3.a and3.b) with sensitivity parameters ε and χ . Let S k be the variance of difference between the k th mediator under opposite interventions, [ M k ( ) − M k ( ) | X ] , for k = , , lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Index Y [ ] l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l Index Y [ ] llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Index m [ ] llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll . . Index m [ ] llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll . . Index m [ ] l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l Index m [ ] l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l . . Index m [ ] l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l . . Index m [ ] Figure 5: Posterior predictive means (redlines) of emissions and ambient PM . under each inter-vention z = , lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Index Y [ ] llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Index m [ ] ( t on s ) llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll . . . . . . . Index m [ ] ( t on s ) llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll . . . . . Index m [ ] ( m illi on t on s ) Figure 6: Posterior predictive replications (7 gray lines) of emissions and ambient PM . under z = l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l Index Y [ ] l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l Index m [ ] ( t on s ) l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l . . . . . . . Index m [ ] ( t on s ) l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l . . . . Index m [ ] ( m illi on s t on s ) Figure 7: Posterior predictive replications (7 gray lines) of emissions and ambient PM . under z = D k be the Mahalanobis distance, D k = (cid:112) [ M k ( ) − M k ( ) | X ] / S k , which quantifies the treatment effect on the k th mediator. For notationalsimplicity, we suppress notation for covariates X in the following conditional distributions. Assumption 3.a
For a fixed ε , the following three equalities holdf , M ( , , ) ( y | M ( , , ) = m , M ( , , ) , D < ε )= f , M ( , , ) ( y | M ( , , ) = m , M ( , , ) , D < ε ) , f , M ( , , ) ( y | M ( , , ) = m , M ( , , ) , D < ε )= f , M ( , , ) ( y | M ( , , ) = m , M ( , , ) , D < ε ) , f , M ( , , ) ( y | M ( , , ) = m , M ( , , ) , D < ε )= f , M ( , , ) ( y | M ( , , ) = m , M ( , , ) , D < ε ) , where a vector m is a vector of realized values of three mediators, M ( z , z , z ) ≡ { M ( z ) , M ( z ) , M ( z ) } . The idea behind this assumption is among subject for whom the treatment effect on the k th mediator ( D k ) is small (as quantified by ε ), the distribution of the outcome is the same whetherthe mediator value was induced by Z = Z =
0. Here, ε corresponds to the size of the changein terms of the number of standard deviations. We set a single sensitivity parameter ε instead ofsetting ε k for k = , , S k .For conditional distributions of potential outcomes with any two mediators under different in-terventions (e.g., Y ( M ( , , )) , Y ( M ( , , )) , Y ( M ( , , )) ), we assume similar equalitieshold. To be consistent with ε which measures the size of the change in terms of the number ofstandard deviations, the treatment effect on the j th and k th mediators ( D jk ) is quantified by (cid:115)(cid:0) ε ε (cid:1) (cid:18) r jk r jk (cid:19) (cid:18) εε (cid:19) = (cid:113) ε ( + r jk ) , (15)where the correlation, r jk , is r jk = Cor ( M j ( ) − M j ( ) , M k ( ) − M k ( ))= Cov ( M j ( ) , M k ( )) + Cov ( M j ( ) , M k ( )) − Cov ( M j ( ) , M k ( )) − Cov ( M j ( ) , M k ( )) (cid:112) Var ( M j ( ) − M j ( )) × Var ( M k ( ) − M k ( )) . Here, the correlations r jk ’s are estimable based on Assumption 2. Then, for the potential outcomes46 ( M ( , , )) , Y ( M ( , , )) , and Y ( M ( , , )) , we assume the following equalities, f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D < (cid:113) ε ( + r ) (cid:17) = f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D < (cid:113) ε ( + r ) (cid:17) f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D < (cid:113) ε ( + r ) (cid:17) = f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D < (cid:113) ε ( + r ) (cid:17) f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D < (cid:113) ε ( + r ) (cid:17) = f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D < (cid:113) ε ( + r ) (cid:17) where D jk = (cid:113) W Tjk S − jk W jk and W Tjk = [ M j ( ) − M j ( ) , M k ( ) − M k ( ) | X ] and S jk is the covari-ance matrix of W jk . Then, D jk is an average standardized treatment effect on the j th and k th me-diators. In this way, we quantify (approximately) the treatment effect on the j th and k th mediators( D jk ) by the size of the change in terms of the number of standard deviations in (15).In the same manner, for the conditional distribution of Y ( M ( , , )) , f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D < (cid:113) ε ( + r + r + r ) (cid:17) = f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D < (cid:113) ε ( + r + r + r ) (cid:17) , where D = (cid:113) W T S − W and W T = [ M ( ) − M ( ) , M ( ) − M ( ) , M ( ) − M ( ) | X ] and S is the covariance matrix of W . Assumption 3.b
The second part of the assumption is for the subgroup of subjects for whom theintervention has a greater than ε effect on the k-th mediator in terms of D k . For potential outcomeswith one mediator under different intervention (e.g., Y ( M ( , , )) , Y ( M ( , , )) , Y ( M ( , , )) ),let k indicate which element of { z , z , z } is set to (e.g., if z = , z = , z = , then k = ). Then,for a fixed ε and χ k for k = , , , we assumef , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D ≥ ε (cid:17) = e ˜ y ( log ( χ )) f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D ≥ ε (cid:17) , (16) f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D ≥ ε (cid:17) = e ˜ y ( log ( χ )) f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D ≥ ε (cid:17) , f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D ≥ ε (cid:17) = e ˜ y ( log ( χ )) f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D ≥ ε (cid:17) , here ˜ y = ( y − ¯ y , obs ) / s.d. ( y , obs ) , the standardized outcome with respect to the observed outcomeunder intervention Z = (e.g., y , obs ). This assumption states that among subjects for whom intervention Z = ε (number of standard deviation) effect on the mediator (measured via D k ), the conditional dis-tribution of the outcome under intervention Z = k th mediator set to its value under theopposite intervention ( Z =
0) is equal to the conditional distribution of the outcome under interven-tion Z = Z = Y ( M ( , , )) , Y ( M ( , , )) , Y ( M ( , , )) ), we assume f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D ≥ (cid:113) ε ( + r ) (cid:17) = e ˜ y ( log ( χ )+ log ( χ )) f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D ≥ (cid:113) ε ( + r (cid:17) , f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D ≥ (cid:113) ε ( + r ) (cid:17) = e ˜ y ( log ( χ )+ log ( χ )) f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D ≥ (cid:113) ε ( + r ) (cid:17) , f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D ≥ (cid:113) ε ( + r ) (cid:17) = e ˜ y ( log ( χ )+ log ( χ )) f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D ≥ (cid:113) ε ( + r ) (cid:17) , where D jk = (cid:113) W Tjk S − jk W jk with W Tjk = [ M j ( ) − M j ( ) , M k ( ) − M k ( ) | X ] and the covariancematrix S jk of W jk . The difference is quantified by the same quantity as in (15).Here, we implicitly assume additivity of sensitivity parameter χ ’s on a log scale, which inturn implies multiplicativity of sensitivity parameter χ ’s. The idea behind this assumption is twoconditional distributions are proportional to each other by χ j × χ k which is less than min ( χ j , χ k ) if 0 < χ j , χ k < ( χ j , χ k ) if χ j , χ k > χ k . Thus, under this setting, a conditional distribution of a priori counterfactual is more deviated from the conditional distribution of the observed outcomeas more mediators are set to values under the other intervention (e.g., Z = Y ( M ( , , )) where all mediators are underintervention Z =
0, we assume f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D ≥ (cid:113) ε ( + r + r + r ) (cid:17) = e ˜ y ( ∑ k log ( χ k )) f , M ( , , ) (cid:16) y | M ( , , ) = m , M ( , , ) , D ≥ (cid:113) ε ( + r + r + r ) (cid:17) , where D = (cid:113) W T S − W with W T = [ M ( ) − M ( ) , M ( ) − M ( ) , M ( ) − M ( ) | X ] and S is the covariance matrix of W . 48ssumption 3.a and 3.b together differentiate the population into those for which the interven-tion has a large effect on the mediators versus those for which the intervention has a small effecton the mediators. Based on this framework, we can assess sensitivity of inferences to Assumption4. Note that two assumptions correspond to the Assumption 4 when we set ε = ∞ or χ k = k = , , J.2. Sensitivity Parameters
Assumption 3.a and 3.b contain sensitivity parameters, ( ε , χ k ) . In this section, we discuss strategiesfor eliciting values of each sensitivity parameter.For ε , we can consider this as the size of the change in terms of the number of standard devia-tion. A default approach can be ε ∈ [ . , ] , from half to 2 standard deviations.To better understand plausible ranges of χ k for k = , , ε effect on the first mediator ( M ) interms of D , we assume the negative effect of the intervention through the first mediator (i.e., thenegative NIE ), then the following equality is likely to hold for some value of the outcome such as y (cid:63) ≡ ¯ y obs , + sd ( ¯ y obs , ) : f , M ( , , ) ( y (cid:63) | M ( , , ) = m , M ( , , ) , D ≥ ε ) ≤ f , M ( , , ) ( y (cid:63) | M ( , , ) = m , M ( , , ) , D ≥ ε ) . (17)where the RHS in (17) is equal to e log ( χ ) f , M ( , , ) ( y (cid:63) | M ( , , ) = m , M ( , , ) , D ≥ ε ) (18)by Assumption 3.b. Thus, we equate (17) to (18) and have the following equality1 ≤ e log ( χ ) = χ , (19)which gives a possible range of χ provided that we have a prior expectation about the negativeNIE . Analogously, we can elicit possible ranges of χ and χ assuming negative mediator specificeffects for M and M , 1 ≤ χ , ≤ χ . (20)Additionally, if we assume the negative direct effect of the intervention, then the followingequality is likely to hold for some value of the outcome such as y (cid:63) ≡ ¯ y obs , + sd ( ¯ y obs , ) : f , M ( , , ) ( y (cid:63) | M ( , , ) = m , M ( , , ) , D ≥ ε ) ≥ f , M ( , , ) ( y (cid:63) | M ( , , ) = m , M ( , , ) , D ≥ ε ) , (21)where the RHS in (21) is equal to e ( log ( χ )+ log ( χ )+ log ( χ )) f , M ( , , ) ( y (cid:63) | M ( , , ) = m , M ( , , ) , D ≥ ε ) (22)49y Assumption 3.b. Then, we equate (21) to (22) to have the following equality f , M ( , , ) ( y (cid:63) | M ( , , ) = m , M ( , , ) , D ≥ ε ) f , M ( , , ) ( y (cid:63) | M ( , , ) = m , M ( , , ) , D ≥ ε ) ≥ e ( log ( χ )+ log ( χ )+ log ( χ )) = χ × χ × χ , (23)where the LHS in (23) acts as an upper bound which is likely to be larger than 1 under assumingthe negative causal effect of the treatment. Thus, we have1 < χ × χ × χ < some upper bound specified in ( ) . With the inequalities in (20) we can set possible ranges for χ k for k = , , D D D D D D D Mean (S.D.) 0.96 (0.06) 0.78 (0.02) 0.77 (0.03) 0.98 (0.04) 0.87 (0.02) 0.97 (0.04) 0.98 (0.03)
Table 7: Posterior means (and standard deviations) of the treatment effects on the D ’s.Figure 8: Graphical representation of the decomposition of the JNIE into mediator-specific NIEsand the joint effects of all possible pairs of the candidate mediators for the case K ==