tmleCommunity: An R Package Implementing Targeted Maximum Likelihood Estimation for Community-level Data
Chi Zhang, Jennifer Ahern, Mark J. van der Laan, Oleg Sofrygin
Journal of Statistical Software
Chi Zhang
University of California, Berkeley
Jennifer Ahern
University of California, Berkeley
Oleg Sofrygin
University of California, Berkeley
Mark J. van der Laan
University of California, Berkeley
Abstract
The tmleCommunity package was recently developed to implement targeted minimum loss-based estimation (TMLE) of the effect of community-level intervention(s) at a single time point on an individual-based outcome of interest, including the average causal effect. Implementations of the inverse-probability-of-treatment-weighting (IPTW) estimator and the G-computation formula (GCOMP) are also available. The package supports multivariate arbitrary (i.e., static, dynamic or stochastic) interventions with a binary or continuous outcome. In addition, it allows user-specified data-adaptive machine learning algorithms through the SuperLearner, sl3 and h2oEnsemble packages. The usage of the tmleCommunity package, along with a few examples, is described in this paper.

Keywords: TMLE, causal inference, community-level intervention, stochastic intervention, deterministic intervention, machine learning, additive treatment effect.
1. Introduction
The literature in fields such as epidemiology, econometrics and social science on the causal impact of community-level interventions is rapidly evolving, in both observational studies and randomized trials. In observational settings, there is a rich literature on the assessment of causal effects of families, schools and neighborhoods on child and adolescent development (Brooks-Gunn et al.). One example is a 2008 study estimating the impact of community violence exposure on anxiety among children of African American mothers with depression. Similarly, randomized community trials have increased in recent years. As pointed out by Oakes (2004) and Steele (2016), scientifically speaking, community randomized controlled trials (CRCTs) would be a superior strategy for estimating the effects of community-level exposures, given the self-selection and other difficulties that arise in observational settings. One example is the MTO study, which estimates the effects of lower-poverty neighborhoods on crime for female and male youth (Kling, Ludwig, and Katz 2005). Another CRCT example is the ongoing SEARCH study, which evaluates community-level interventions for the elimination of HIV in rural communities in East Africa (University of California San Francisco 2013). Despite recent statistical advances, many current applications still rely on estimation techniques such as random effects models (or mixed models) (Laird and Ware 1982) and the generalized estimating equations (GEE) approach (Liang and Zeger 1986; Gardiner et al.). A common concern is the positivity assumption required by deterministic (static) interventions; as noted in the literature (2010), this strong assumption could be quite unrealistic in many cases. For example, patients with certain characteristics may never receive a particular treatment. On the other hand, a stochastic intervention is one in which each subject receives a probabilistically assigned treatment based on a known, specified mechanism.
Because the form of the positivity assumption needed for identifiability is model- and parameter-specific, stochastic intervention causal parameters are natural candidates, as they require a weaker version of positivity than other causal parameters for continuous exposures. Furthermore, a policy intervention will lead to stochastic rather than deterministic interventions if the exposure of interest can only be manipulated indirectly, such as when studying the benefits of vigorous physical activity on a health outcome of interest in the elderly (Bembom and van der Laan 2007): it is unrealistic to force every elderly person to maintain a certain level of physical activity according to a deterministic rule. In light of these considerations, stochastic interventions offer a more flexible way of defining a question of interest, and one that is often better supported by the data, than deterministic interventions. Stochastic intervention causal parameters are thus a good way of estimating the causal effects of realistic policies, and can also be used naturally to define and estimate causal effects of continuous or categorical multilevel treatments.

Several R (R Core Team 2017) packages have been instructive for the development of our package. The tmle package (Gruber and van der Laan 2012) performs parameter estimation for a single time point binary intervention with independent and identically distributed (IID) data, including the average treatment effect (ATE), controlled direct effects (CDE), and the parameters of a marginal structural model (MSM).
In addition, Sofrygin and van der Laan (2015) developed another R package, tmlenet, which provides three estimators of average causal effects (and the ATE) for single time point arbitrary interventions (univariate or multivariate; static, dynamic or stochastic) in the context of network-dependent (non-IID) data: TMLE, the inverse-probability-of-treatment-weighting (IPTW) estimator and the parametric G-computation formula (GCOMP). That package performs logistic regression through glm and speedglm.

The development of the tmleCommunity package for R was motivated by the increasing demand for a user-friendly tool to estimate the impact of arbitrary community-level exposures in community-independent data structures with a semi-parametric efficient estimator. The tmleCommunity package also extends some of the capabilities of tmlenet by optionally allowing flexible data-adaptive estimation through the SuperLearner, sl3 and h2oEnsemble packages, or even user-supplied machine learning algorithms. It also allows for panel data transformations, such as with random effects and fixed effects. tmleCommunity is available on GitHub at https://github.com/chizhangucb/tmleCommunity.

The article focuses on the practical usage of tmleCommunity through multiple examples, so we omit many of the technical details. For a description of the TMLE framework for independent community data with static community-level interventions, we refer to Balzer et al. (2017). The first, community-level TMLE is expanded here to include pooled individual-level regressions based on the working model; it also includes the case of observing one individual per community unit as a special case.
The second, individual-level TMLE is developed under a more restricted hierarchical model in which an additional dependence assumption holds. Section 3 shows how the tmleCommunity package is used to estimate the parameters proposed in the prior section through a few examples, and summarizes the common features of the functions that may be useful to tmleCommunity users. Section 4 then uses three simulation studies to demonstrate implementation in different observational settings. Section 5 discusses possible future extensions of the methodology and the package. In Section 6 we answer some frequently asked questions about the package.
2. Single time-point multivariate intervention
Throughout this manuscript, we use bold capital letters to denote random vectors and matrices. In studies of community-level interventions, we begin with a simple scenario that involves randomly selecting $J$ independent communities from some target population of communities, sampling individuals from those chosen communities, and measuring baseline covariates and outcomes on each sampled individual at a single time point. The number of chosen individuals within each community is not fixed, so communities are indexed with $j = 1, \ldots, J$ and individuals within the $j$th community are indexed with $i = 1, \ldots, N_j$. After selection of the communities and individuals, pre-intervention covariates and a post-intervention outcome are measured on each sampled unit. Because only some of the pre-intervention covariates have a clear individual-level counterpart, the pre-intervention covariates separate into two sets. First, let $W_{j,i}$ denote the $(1 \times p)$ vector of $p$ individual-level baseline characteristics, so that $\mathbf{W}_j = (W_{j,i} : i = 1, \ldots, N_j)$ is an $(N_j \times p)$ matrix of individual-level characteristics. Second, let $E_j$ represent the vector of community-level (environmental) baseline characteristics that have no individual-level counterpart and are shared by all community members, including the number of individuals selected within the community (i.e., $N_j \in E_j$). Last, $A_j$ is the exposure level assigned to, or naturally occurring in, community $j$, and $\mathbf{Y}_j = (Y_{j,i} : i = 1, \ldots, N_j)$ is the vector of individual outcomes of interest.

In order to translate the scientific question of interest into a formal causal quantity, we first specify a nonparametric structural equation model (NPSEM) with endogenous variables $X = (E, \mathbf{W}, A, \mathbf{Y})$ that encodes our knowledge about the causal relationships among those variables and can be applied in both observational settings and randomized trials (Pearl 1995, 2009).
$$U = (U_E, U_{\mathbf{W}}, U_A, U_{\mathbf{Y}}) \sim P_U$$
\begin{align}
E &= f_E(U_E) \tag{1}\\
\mathbf{W} &= f_{\mathbf{W}}(E, U_{\mathbf{W}})\\
A &= f_A(E, \mathbf{W}, U_A)\\
\mathbf{Y} &= f_{\mathbf{Y}}(E, \mathbf{W}, A, U_{\mathbf{Y}})
\end{align}
where the $U$ components are exogenous error terms, which are unmeasured and random with an unknown distribution $P_U$. Given an input $U$, the set of functions $F = \{f_E, f_{\mathbf{W}}, f_A, f_{\mathbf{Y}}\}$ deterministically generates the endogenous variables; in particular, each individual outcome $Y$ is affected by its baseline community- and individual-level covariates $(E, \mathbf{W})$ together with its community-level intervention and unobserved factors $(A, U_{\mathbf{Y}})$. First, while we might have knowledge of the form of $f_A$ (e.g., by design), the structural equations $f_E, f_{\mathbf{W}}, f_{\mathbf{Y}}$ do not necessarily restrict the functional form of the causal relationships, which could be nonparametric (entirely unspecified), semiparametric, or parametric incorporating domain knowledge. Second, as summarized by Balzer et al. (2017), structural causal model (1) covers a wide range of practical scenarios, as it allows for the following types of between-individual dependence within a community: (i) the individual-level covariates (and outcomes) among members of a community may be correlated as a consequence of shared measured and unmeasured community-level covariates $(E, U_E)$, and of possible correlations between unmeasured individual-level error terms $(U_{\mathbf{W}}, U_{\mathbf{Y}})$; (ii) an individual's outcome $Y_{j,i}$ may influence another's outcome $Y_{j,l}$ within community $j$; and (iii) an individual's baseline covariates $W_{j,l}$ may influence another's outcome $Y_{j,i}$. In fact, we can make an assumption about the third type of between-individual dependence, under which the structural equation $f_{\mathbf{Y}}$ will be specified; more details are discussed in Section 2.8.1. Third, an important ingredient of this model is the assumption that distinct communities are causally independent and identically distributed.
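To make the hierarchical data structure concrete, the generating process implied by NPSEM (1) can be sketched as below. This is a hypothetical simulation; the particular functional forms chosen for $f_E, f_{\mathbf{W}}, f_A, f_{\mathbf{Y}}$ are illustrative only, not part of the model.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_communities(J=50):
    """Draw J causally independent communities following the time ordering
    E -> W -> A -> Y of NPSEM (1); the community size N_j is part of E_j."""
    data = []
    for j in range(J):
        N_j = int(rng.integers(5, 20))        # number of sampled individuals
        E_j = rng.normal()                    # environmental covariate
        W_j = rng.normal(loc=E_j, size=N_j)   # W depends on E (f_W)
        A_j = rng.binomial(1, 0.5)            # community-level exposure (f_A)
        # individual outcomes depend on (E, W_i, A) plus exogenous noise U_Y
        Y_j = A_j + 0.5 * E_j + 0.3 * W_j + rng.normal(size=N_j)
        data.append({"E": E_j, "W": W_j, "A": A_j, "Y": Y_j})
    return data

communities = simulate_communities()
```

Each community contributes a covariate matrix $\mathbf{W}_j$ and outcome vector $\mathbf{Y}_j$ of varying length $N_j$, matching the indexing above.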
The NPSEM defines a collection of distributions of $(U, X)$, representing the full data model, where each distribution is determined by $F$ and $P_U$; we let $P_{U,X,0}$ denote the true probability distribution of $(U, X)$, and denote the model for $P_{U,X,0}$ by $\mathcal{M}^F$. $\mathcal{M}^F$ allows us to define counterfactual random variables as functions of $(U, X)$, corresponding to arbitrary interventions. For example, with a static intervention on $A$, the counterfactual $\mathbf{Y}_a$ can be defined as $f_{\mathbf{Y}}(E, \mathbf{W}, a, U_{\mathbf{Y}})$, replacing the structural equation $f_A$ with the constant $a$ (van der Laan and Rose 2011). Thus, $\mathbf{Y}_{j,a} = (Y_{j,i,a} : i = 1, \ldots, N_j)$ represents the vector of individual-level outcomes that would have been obtained in community $j$ if all individuals in that community had actually been treated according to exposure level $a$. More generally, we can replace the data-generating function for $A$, which corresponds to a degenerate choice of distribution for drawing $A$ given $U = u$ and $(E, \mathbf{W})$, by a user-specified conditional distribution of $A^*$. Such non-degenerate choices of intervention distributions are often referred to as stochastic interventions.

First, let $g^*$ denote our selection of a stochastic intervention, identified by a set of multivariate conditional distributions of $A^*$ given the baseline covariates $(E, \mathbf{W})$. For convenience, we represent the stochastic intervention with a structural equation $A^* = f_{A^*}(E, \mathbf{W}, U_{A^*})$ in terms of random errors $U_{A^*}$, and define $\mathbf{Y}_{g^*} = f_{\mathbf{Y}}(E, \mathbf{W}, A^*, U_{\mathbf{Y}})$. Then $\mathbf{Y}_{j,g^*} = (Y_{j,i,g^*} : i = 1, \ldots, N_j)$ denotes the corresponding vector of individual-level counterfactual outcomes for community $j$. Second, let $Y^c$ denote a scalar community-level outcome defined as an aggregate of the outcomes measured on the individuals who are members of a community, so that $Y^c_{g^*}$ is the corresponding community-level counterfactual of interest.
One typical choice of $Y^c_{j,g^*}$ is the weighted average response among the $N_j$ individuals sampled from community $j$, i.e., $Y^c_{j,g^*} \equiv \sum_{i=1}^{N_j} \alpha_{j,i} Y_{j,i,g^*}$, for some user-specified set of weights $\alpha$ with $\sum_{i=1}^{N_j} \alpha_{j,i} = 1$. If the underlying community size $N_j$ differs across communities, a natural choice of $\alpha_{j,i}$ is the reciprocal of the community size (i.e., $\alpha_{j,i} = 1/N_j$).

We focus on community-level causal effects where all communities in the target population receive the intervention $g^*$; our causal parameter of interest is then
$$\Psi^F(P_{U,X,0}) = E_{U,X}[Y^c_{g^*}] = E_{U,X}\Big\{\sum_{i=1}^{N} \alpha_i Y_{i,g^*}\Big\}$$
To simplify the expressions, we use $\alpha_i = 1/N$ in the remainder of the article.

One type of stochastic intervention is a shifted version of the current treatment mechanism $g_0$, i.e., $P_{g^*}(A = a \mid E, \mathbf{W}) = g_0(a - \nu(E, \mathbf{W}) \mid E, \mathbf{W})$ for a known shift function $\nu(E, \mathbf{W})$. A simple example is a constant shift $\nu(E, \mathbf{W}) = 0.5$. Another, more complex type is the stochastic dynamic intervention, in which the intervention can be viewed as a random assignment among dynamic rules. A simple example corresponding to the previous shift function is $P_{g^*}(A = a \mid E, \mathbf{W}) = g_0(\max\{a - 0.5, \min(A)\} \mid E, \mathbf{W})$, indicating that the shifted exposure $A^*$ is always bounded below by the minimum of the observed exposure $A$.

One might also be interested in contrasts of the expectation of the community-level outcome across the target population of communities under different interventions, i.e.,
$$\Psi^F(P_{U,X,0}) = E_{U,X}(Y^c_{g^*_1}) - E_{U,X}(Y^c_{g^*_2}) = E_{U,X}\Big\{\frac{1}{N}\sum_{i=1}^{N} Y_{i,g^*_1}\Big\} - E_{U,X}\Big\{\frac{1}{N}\sum_{i=1}^{N} Y_{i,g^*_2}\Big\}$$
where $g^*_1$ and $g^*_2$ are two different stochastic interventions.

Finally, the additive treatment effect is a special case of the average causal effect with two static interventions $g^*_1(1 \mid e, w) = 1$ and $g^*_2(0 \mid e, w) = 1$ for any $e \in \mathcal{E}, w \in \mathcal{W}$, i.e.,
$$E_{U,X}(Y^c(1)) - E_{U,X}(Y^c(0)) = E_{U,X}[Y^c_{g^*_1(1 \mid e,w)=1}] - E_{U,X}[Y^c_{g^*_2(0 \mid e,w)=1}]$$

Consider the study design presented above, where for a randomly selected community the observed data consist of the measured pre-intervention covariates, the intervention assignment, and the vector of individual-level outcomes. Formally, one observation on community $j$ is coded as
$$O_{j,i} = (E_j, W_{j,i}, A_j, Y_{j,i}),$$
which follows the typical time ordering of the variables measured on the $i$th individual within the $j$th community. Assume the observed data consist of $J$ independent and identically distributed copies of $O_j = (E_j, \mathbf{W}_j, A_j, \mathbf{Y}_j) \sim P_0$, where $P_0$ is an unknown underlying probability distribution in a model space $\mathcal{M}^I$. Here $\mathcal{M}^I = \{P(P_{U,X}) : P_{U,X} \in \mathcal{M}^F\}$ denotes the statistical model, i.e., the set of possible distributions for the observed data $O$, which only involves modeling $g$ (i.e., the specification of $f_A$). The true observed data distribution is thus $P_0 = P(P_{U,X,0})$.
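As a concrete, hypothetical illustration of a shift intervention (not code from the package): if $P_{g^*}(A = a \mid E, \mathbf{W}) = g_0(a - \nu \mid E, \mathbf{W})$ with $\nu = 0.5$, then $A^* = A + 0.5$ has the required distribution, and clipping $A^*$ to the observed support is one way to realize a bounded variant. With a Gaussian treatment mechanism, the density ratio $g^*/g_0$ also has a closed form, verified numerically below.

```python
import numpy as np

rng = np.random.default_rng(2)

def normal_pdf(x, mean, sd=1.0):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Hypothetical mechanism: A | E ~ N(E, 1), with constant shift nu = 0.5.
J = 2000
E = rng.normal(size=J)
A = E + rng.normal(size=J)

A_star = A + 0.5                                      # draw from the shifted density
A_star_bounded = np.clip(A_star, A.min(), A.max())    # bounded variant

# Density ratio g*(a|E) / g0(a|E) evaluated at the observed exposures:
ratio = normal_pdf(A, mean=E + 0.5) / normal_pdf(A, mean=E)
# For this Gaussian shift the ratio simplifies to exp(0.5*(A - E) - 0.125).
closed_form = np.exp(0.5 * (A - E) - 0.125)
```

Note that `ratio` stays finite here, consistent with the weaker positivity requirement that shift interventions are designed to satisfy.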
By specifying a causal model for $P_{U,X,0}$ through the NPSEM and providing an explicit link between this model and the observed data, we lay the groundwork for addressing identifiability through $P_0$. In order to express $\Psi^F(P_{U,X,0})$ as a parameter of the distribution $P_0$ of the observed data $O$, we now need to address the identifiability of $E_{U,X}[Y^c_{g^*}]$ by adding two key assumptions to the NPSEM: the randomization assumption, also called the "no unmeasured confounders" assumption (Assumption 1), and the positivity assumption (Assumption 2). The identifiability assumptions are briefly reviewed here; for details on identifiability, we refer to Robins (1986); van der Laan (2010, 2014); Díaz and van der Laan (2011).

Assumption 1. $A \perp \mathbf{Y}_a \mid E, \mathbf{W}$,

where the counterfactual random variable $\mathbf{Y}_a$ represents the collection of outcomes measured on the individuals from a community if its intervention were set to $A = a$ in causal model (1), replacing the structural equation $f_A$ with the constant $a$.

Assumption 2. $\sup_{a \in \mathcal{A}} \frac{g^*(a \mid E, \mathbf{W})}{g(a \mid E, \mathbf{W})} < \infty$, almost everywhere,

where $g^*(a \mid E, \mathbf{W}) = P_{g^*}(A = a \mid E, \mathbf{W})$.

Informally, Assumption 1 restricts the allowed distributions for $P_U$ to ensure that $A$ and $\mathbf{Y}$ share no common causes beyond the measured variables in $X = (E, \mathbf{W}, A, \mathbf{Y})$. For example, Assumption 1 holds if $U_A$ is independent of $U_{\mathbf{Y}}$ given $E, \mathbf{W}$. Moreover, this randomization assumption implies $A^* \perp \mathbf{Y}_a \mid E, \mathbf{W}$.
Under Assumptions 1 and 2, jointly with the consistency assumption (i.e., $A = a$ implies $Y_a = Y$), we have $P(Y_{g^*} = y \mid A^* = a, E = e, \mathbf{W} = w) = P(Y_a = y \mid A^* = a, E = e, \mathbf{W} = w) = P(Y_a = y \mid E = e, \mathbf{W} = w) = P(Y = y \mid A = a, E = e, \mathbf{W} = w)$, so the counterfactual distribution $P(Y_{g^*} = y)$ can be written as:
\begin{align*}
P(Y_{g^*} = y) &= \int_{e,w}\int_a P(Y_{g^*} = y \mid A^* = a, E = e, \mathbf{W} = w)\, g^*(a \mid e, w)\, d\mu_a(a)\, dP_{E,\mathbf{W}}(e, w)\\
&= \int_{e,w}\int_a P(Y_a = y \mid E = e, \mathbf{W} = w)\, g^*(a \mid e, w)\, d\mu_a(a)\, dP_{E,\mathbf{W}}(e, w)\\
&\qquad \text{by Assumption 1 and } A^* \perp \mathbf{Y}_a \mid E, \mathbf{W}\\
&= \int_{e,w}\int_a P(Y = y \mid A = a, E = e, \mathbf{W} = w)\, g^*(a \mid e, w)\, d\mu_a(a)\, dP_{E,\mathbf{W}}(e, w)\\
&\qquad \text{by the consistency assumption}
\end{align*}
with respect to some dominating measure $\mu_a(a)$.

Then $E_{U,X}[Y_{g^*}]$ is identified by the G-computation formula (Robins 1986):
$$E_{U,X}[Y_{g^*}] = E_{E,\mathbf{W}}\big[E_{g^*}[Y \mid A^* = a, E, \mathbf{W}]\big] = \int_{e,w}\int_a E_{g^*}(Y \mid a, e, w)\, g^*(a \mid e, w)\, d\mu_a(a)\, dP_{E,\mathbf{W}}(e, w)$$
This provides a general identifiability result for $E_{U,X}[Y^c_{g^*}]$, the causal effect of the community-level stochastic intervention on any community-level outcome $Y^c$ that is a real-valued function of the individual-level outcomes $\mathbf{Y}$:
$$E_{U,X}[Y^c_{g^*}] = E_{U,X}\Big[\sum_{i=1}^{n} \alpha_i Y_{g^*,i}\Big] = \sum_{i=1}^{n} \alpha_i E_{E,\mathbf{W}}\big[E_{g^*}[Y_i \mid A^*, E, \mathbf{W}]\big] \equiv \Psi^I(P_0) = \psi^I_0$$

If we only assume the randomization assumption of the previous section, then the statistical model $\mathcal{M}^I$ is nonparametric. Based on this identifiability result, we note that $\Psi^I : \mathcal{M}^I \to \mathbb{R}$ represents a mapping from a probability distribution of $O$ into a real number, and $\Psi^I(P_0)$ denotes the target estimand corresponding to the target causal quantity $E_{U,X}[Y_{g^*}]$. Before defining the statistical parameter, we introduce some additional notation.
First, we denote the marginal distribution of the baseline covariates $(E, \mathbf{W})$ by $Q_{E,\mathbf{W}}$, with a well-defined density $q_{E,\mathbf{W}}$ with respect to some dominating measure $\mu_{e,w}(e,w)$. There is no additional assumption of independence on $Q_{E,\mathbf{W}}$. Second, let $G$ denote the conditional distribution of the observed exposure $A$, with conditional density $g(A \mid E, \mathbf{W})$. Third, we assume that all $\mathbf{Y}$ within a community are sampled from a distribution $Q_{\mathbf{Y}}$ with density $q_{\mathbf{Y}}(\mathbf{Y} \mid A, E, \mathbf{W})$, conditional on the exposure and the baseline covariates $A, E, \mathbf{W}$. We now write $P = P_{\tilde{Q},G}$ for $\tilde{Q} = (Q_{\mathbf{Y}}, Q_{E,\mathbf{W}})$, and the statistical model becomes $\mathcal{M}^I = \{P_{\tilde{Q},G} : \tilde{Q} \in \tilde{\mathcal{Q}}, G \in \mathcal{G}\}$, where $\tilde{\mathcal{Q}}$ and $\mathcal{G}$ denote the parameter spaces for $\tilde{Q}$ and $G$, respectively, and $\tilde{\mathcal{Q}}$ here is nonparametric.

Next, we define $G^*$ as the user-supplied intervention with a new density $g^*$, which will replace the observed conditional distribution $G$. So $G^*$ is a conditional distribution that describes how each intervened treatment is produced, conditional on the baseline covariates $(E, \mathbf{W})$. Given $\tilde{Q}$ and $G^*$, we use $O^* = (O^*_{j,i} = (E_j, W_{j,i}, A^*_j, Y^*_{j,i}) : i = 1, \ldots, N_j,\ j = 1, \ldots, J)$ to denote a random variable generated under the post-intervention distribution $P_{\tilde{Q},G^*}$. Namely, $P_{\tilde{Q},G^*}$ is the post-intervention distribution of the observed data $O$ under the stochastic intervention $G^*$ (Robins 1986), and the likelihood for $P_{\tilde{Q},G^*}$ factorizes as:
$$p_{\tilde{Q},G^*}(O^*) = \Big[\prod_{j=1}^{J} q_{\mathbf{Y}}(\mathbf{Y}^*_j \mid A^*_j, \mathbf{W}_j, E_j)\Big]\Big[\prod_{j=1}^{J} g^*(A^*_j \mid E_j, \mathbf{W}_j)\Big]\, q_{E,\mathbf{W}}(E, \mathbf{W}) \quad (2)$$
Thus our target statistical quantity is now defined as $\psi^I_0 = \Psi^I(P_0) = E_{\tilde{q}_0, g^*}[Y^c_{g^*}]$, where $\Psi^I(P_0)$ is the target estimand of the true distribution of the observed data $P_0 \in \mathcal{M}^I$ (i.e., a mapping from the statistical model $\mathcal{M}^I$ to $\mathbb{R}$).
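A statistical quantity of this plug-in form can be evaluated by Monte Carlo via the G-computation formula. The sketch below uses a hypothetical data-generating process with a binary exposure, the true outcome regression, and the static intervention $g^*(1 \mid e, w) = 1$, for which the true value of $E[Y_1]$ is 1; none of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
expit = lambda x: 1 / (1 + np.exp(-x))

# Hypothetical observed data: E ~ N(0,1), g0(1|E) = expit(E), Y = A + E + noise.
J = 200_000
E = rng.normal(size=J)
A = rng.binomial(1, expit(E))
Y = A + E + rng.normal(size=J)

# G-computation with the (here, known) outcome regression Qbar(a, e) = a + e:
# psi = E_E[ sum_a Qbar(a, E) g*(a|E) ], where g*(1|E) = 1 puts all mass on a = 1.
Qbar = lambda a, e: a + e
psi_gcomp = Qbar(1, E).mean()   # plug-in estimate of E[Y_1]; true value is 1
```

In practice `Qbar` would be an estimated regression rather than the truth; the averaging-over-covariates step is the same.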
We define $\bar{Q}(A_j, E_j, \mathbf{W}_j) = \int y\, q_{\mathbf{Y}}(y \mid A_j, \mathbf{W}_j, E_j)\, d\mu_y(y)$ as the conditional mean evaluated under the common-in-$j$ distribution $Q_{\mathbf{Y}}$, and $\bar{Q}^c(A, E, \mathbf{W}) \equiv E(Y^c \mid A, E, \mathbf{W})$ as the conditional mean of the community-level outcome. We can now refer to $Q = (\bar{Q}^c, Q_{E,\mathbf{W},0})$ as the part of the observed data distribution that our target parameter is a function of (i.e., with a slight abuse of notation, $\Psi^I(P_0) = \Psi^I(Q)$); the parameter $\psi^I_0$ can be written as:
$$\psi^I_0 = \int_{e \in \mathcal{E}, w \in \mathcal{W}}\int_{a \in \mathcal{A}} \bar{Q}^c(a, e, w)\, g^*(a \mid e, w)\, d\mu_a(a)\, q_{E,\mathbf{W},0}(e, w)\, d\mu_{e,w}(e, w) \quad (3)$$
with respect to some dominating measures $\mu_a(a)$ and $\mu_{e,w}(e,w)$, where $(\mathcal{A}, \mathcal{E}, \mathcal{W})$ is the common support of $(A, E, \mathbf{W})$.

In the previous section, we defined a statistical model $\mathcal{M}^I$ for the distribution of $O$, and a statistical target parameter mapping $\Psi^I$ for which $\Psi^I(P_{Q,G^*})$ only depends on $Q$. Now we want to estimate $\Psi^I(Q)$ via a targeted maximum likelihood estimator (TMLE) and construct an asymptotically valid confidence interval through the efficient influence curve (EIC). Furthermore, we present a novel method for the estimation of the outcome regression that incorporates additional knowledge about the data-generating mechanism that might be known by design.

Community-level TMLE
As a two-stage procedure, TMLE requires estimators of both the outcome regression $\bar{Q}$ and the treatment mechanism $g$. Since TMLE solves the EIC estimating equation, its estimator inherits the double robustness property of this EIC and is guaranteed to be consistent (i.e., asymptotically unbiased) if either $\bar{Q}$ or $g$ is consistently estimated. For example, in a community randomized controlled trial, $g$ is known by design (e.g., 0.5) and can always be consistently estimated, so the TMLE will always be consistent. Moreover, TMLE is efficient when both are consistently estimated. In other words, when $g$ is consistently estimated, a choice of initial estimator for $\bar{Q}$ that better approximates the true $\bar{Q}$ may improve the asymptotic efficiency, along with the finite sample bias and variance, of the TMLE (van der Laan and Rubin 2006).

The community-level TMLE first obtains an initial estimate $\hat{\bar{Q}}^c(A, E, \mathbf{W})$ of the conditional mean of the community-level outcome $\bar{Q}^c(A, E, \mathbf{W})$, and an estimate $\hat{g}(A \mid E, \mathbf{W})$ of the community-level conditional treatment density $g(A \mid E, \mathbf{W})$. The second, targeting step creates a targeted estimator $\hat{\bar{Q}}^{c*}$ of $\bar{Q}^c$ by updating the initial fit $\hat{\bar{Q}}^c(A, E, \mathbf{W})$ through a parametric fluctuation that exploits the information in the conditional treatment density. We briefly review the estimation results and statistical inference here; for further discussion, see Díaz and van der Laan (2011); Sofrygin and van der Laan (2015); Balzer et al. (2017).
Given $\hat{\bar{Q}}^c(A_j, E_j, \mathbf{W}_j)$, $\hat{g}(A_j = a \mid E_j, \mathbf{W}_j)$ and $\hat{g}^*(A_j = a \mid E_j, \mathbf{W}_j)$ for each community $j = 1, \ldots, J$, the targeted predictions of the community-level outcome are easily obtained by
$$\hat{\bar{Q}}^{c*}(a, E_j, \mathbf{W}_j) = \text{expit}\big\{\text{logit}\big(\hat{\bar{Q}}^c(a, E_j, \mathbf{W}_j)\big) + \hat{\epsilon}\, \hat{H}_j(a, E_j, \mathbf{W}_j)\big\}, \quad \forall j = 1, \ldots, J$$
where $\hat{H}_j(a, E_j, \mathbf{W}_j) = \frac{\hat{g}^*(A_j = a \mid E_j, \mathbf{W}_j)}{\hat{g}(A_j = a \mid E_j, \mathbf{W}_j)}$ is the community-level clever covariate, and the fluctuation parameter $\epsilon$ is obtained by a logistic regression of $Y^c$ on $\hat{H}$ with offset $\text{logit}(\hat{\bar{Q}}^c_n)$. (Another way to achieve the targeting step is the weighted, intercept-based TMLE, where $\epsilon$ is obtained by an intercept-only weighted logistic regression of $Y^c$ with offset $\text{logit}(\hat{\bar{Q}}^c_n)$, weights $\hat{H}$, and no covariates.)

Thus our targeted substitution estimator is the weighted mean of the targeted predictions across the $J$ communities. One natural choice is the empirical mean:
$$\hat{\Psi}^I(P_{\hat{Q}^*, \hat{g}^*}) = \frac{1}{J}\sum_{j=1}^{J}\int_{e_j, w_j}\int_a \hat{\bar{Q}}^{c*}(a, e_j, w_j)\, \hat{g}^*(a \mid e_j, w_j)\, d\mu_a(a)\, q_{E,\mathbf{W}}(e_j, w_j)\, d\mu_{e,w}(e_j, w_j)$$
In practice, the community-level TMLE variance is asymptotically estimated as $\text{Var}(\hat{\Psi}^I(\hat{Q}^*)) \approx \hat{\sigma}^2_J / J$, where $\hat{\sigma}^2_J$ is the sample variance of the estimated influence curve:
$$\hat{\sigma}^2_J = \frac{1}{J}\sum_{j=1}^{J}\big\{D^I(\hat{Q}^*, \hat{g})(O_j)\big\}^2$$
where $D^I(\hat{Q}^*, \hat{g})$ is the plug-in estimator of the efficient influence curve of $\Psi^I$ at $P_0$:
$$D^I(P_0)(O) = \frac{g^*}{g}(A \mid E, \mathbf{W})\big(Y^c - \bar{Q}^c(A, E, \mathbf{W})\big) + E_{g^*}\big[\bar{Q}^c(A, E, \mathbf{W}) \mid E, \mathbf{W}\big] - \Psi^I(P_{Q,g^*})$$
where
$$E_{g^*}\big[\bar{Q}^c(A, E, \mathbf{W}) \mid E, \mathbf{W}\big] = \int_a \bar{Q}^{c*}(a, E, \mathbf{W})\, g^*(a \mid E, \mathbf{W})\, d\mu_a(a)$$
This quantity $\hat{\sigma}^2_J$ can be used to calculate p values and 95% confidence intervals for different parameters, e.g., $\hat{\Psi}^I(P_{\hat{Q}^*, \hat{g}^*}) \pm 1.96\, \hat{\sigma}_J / \sqrt{J}$ for the ATE.
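The targeting step and the influence-curve-based inference can be sketched end-to-end on simulated data. Everything below is a hypothetical example with a binary community-level exposure, a binary outcome, the static intervention $g^*(1 \mid E) = 1$, and the one-parameter fluctuation solved by Newton-Raphson instead of a call to a regression routine; initial estimates are set to the truth purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
expit = lambda x: 1 / (1 + np.exp(-x))
logit = lambda p: np.log(p / (1 - p))

# Hypothetical community-level data: covariate E_j, binary A_j, binary Y_j^c.
J = 5000
E = rng.normal(size=J)
g0 = expit(0.5 * E)                          # true treatment mechanism g(1|E)
A = rng.binomial(1, g0)
Q0 = lambda a, e: expit(-0.5 + a + 0.5 * e)  # true outcome regression
Y = rng.binomial(1, Q0(A, E))

# Initial estimates and clever covariate for the static intervention
# g*(1|E) = 1: H = g*(A|E) / g(A|E) = A / g(1|E).
Qhat = Q0(A, E)
H = A / g0

# Targeting: logistic regression of Y on H with offset logit(Qhat),
# fitted here by Newton-Raphson for the single fluctuation parameter epsilon.
eps = 0.0
for _ in range(25):
    p = expit(logit(Qhat) + eps * H)
    eps += np.sum(H * (Y - p)) / np.sum(H**2 * p * (1 - p))

# Targeted predictions under the intervention (a = 1) and plug-in estimate.
Qstar1 = expit(logit(Q0(1, E)) + eps / g0)
psi_tmle = Qstar1.mean()

# Influence-curve-based standard error and 95% confidence interval.
Qstar_obs = expit(logit(Qhat) + eps * H)
IC = H * (Y - Qstar_obs) + Qstar1 - psi_tmle
se = IC.std() / np.sqrt(J)
ci = (psi_tmle - 1.96 * se, psi_tmle + 1.96 * se)
```

The package fits the fluctuation through standard offset logistic regression; the Newton loop above is just a dependency-free stand-in for that fit.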
Incorporating hierarchical structure for estimating outcome mechanism
Based on the previously defined community-level TMLE for the mean of the exposure-specific counterfactual community-level outcome, we can still incorporate individual-level data rather than simply community-wide aggregates of that data. As discussed in Section 2.2, one typical choice of the community-level counterfactual of interest is the weighted average response among all individuals sampled from that community, i.e., $Y^c_{j,g^*} = \sum_{i=1}^{N_j} \alpha_{j,i} Y_{j,i,g^*}$. Hence, the conditional mean of the community-level outcome can be rewritten as a weighted average of the individual-level outcomes, $\bar{Q}^c(A, E, \mathbf{W}) = E(Y^c \mid A, E, \mathbf{W}) = \sum_{i=1}^{N} \alpha_i E(Y_i \mid A, E, \mathbf{W}) \equiv \sum_{i=1}^{N} \alpha_i E(Y_i \mid A, E, \mathbf{W}, N)$, where the community-specific sample size $N$ is a random variable that is included in the community-level baseline covariates $E$.

Without changing the underlying structural causal model (1), estimand or efficient influence curve, we may use an individual-level working model to incorporate pooled individual-level outcome regressions as candidates in the Super Learner library for the initial estimation of the expected community-level outcome $\bar{Q}^c(A, E, \mathbf{W})$, given community- and individual-level covariates along with community-level exposures. Specifically, we propose a working model that assumes
$$E(Y_i \mid A, E, \mathbf{W}) = E(Y_i \mid A, E, W_i) = \bar{Q}(A, E, W_i) \quad (4)$$
for a common function $\bar{Q}$.
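Under working model (4), the initial estimator of $\bar{Q}^c$ can come from a single pooled individual-level regression, with predictions then averaged within communities using the weights $\alpha_{j,i} = 1/N_j$. A hypothetical sketch using ordinary least squares (the package would use a user-specified regression or Super Learner; the data-generating coefficients below are invented):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical hierarchical data in long format: one row per individual.
J = 400
rows = []
for j in range(J):
    N_j = int(rng.integers(5, 15))
    E_j, A_j = rng.normal(), rng.binomial(1, 0.5)
    W = rng.normal(size=N_j)
    Y = 2 * A_j + E_j + 0.5 * W + rng.normal(size=N_j)  # working model (4) holds
    for i in range(N_j):
        rows.append([j, E_j, A_j, W[i], Y[i]])
rows = np.array(rows)
comm, E, A, W, Y = rows.T

# Pooled individual-level OLS of Y on (1, E, A, W): one common Qbar(A, E, W_i).
X = np.column_stack([np.ones_like(E), E, A, W])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Community-level initial estimate: Qbar^c_j = (1/N_j) * sum_i Qbar(A_j, E_j, W_ji).
pred = X @ beta
Qbar_c = np.array([pred[comm == j].mean() for j in range(J)])
```

The targeting step then proceeds at the community level exactly as before, using `Qbar_c` as the initial fit.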
In practice, this working model states that each individual's outcome is drawn from a common distribution that may depend on the individual's own baseline covariates, together with the intervention and the community-level baseline covariates of his or her community, but is not directly influenced by the covariates of others in the same community.

Furthermore, the strength of the working assumption can be weakened by encoding knowledge of the dependence relationships among individuals within communities, namely by defining $E$ to progressively contain a larger subset of the individual-level covariates included in $\mathbf{W}$ (Balzer et al. 2017), beyond each individual's own covariates $W_i$. Let $F_i$ denote the subset of individuals whose baseline individual-level covariates affect individual $i$'s outcome $Y_i$, where $i \in F_i$. We then have a less restricted and more general version of (4) as working model:
$$E(Y_i \mid A, E, \mathbf{W}) = \bar{Q}(A, E, (W_l : l \in F_i)) \quad (5)$$
for a common function $\bar{Q}$.

The implementation of the community-level TMLE incorporating hierarchical data is similar to the previous community-level TMLE, except that the estimation of the community-level outcome can be based on a single pooled individual-level regression of $Y_{j,i}$ on $(E_j, A_j, W_{j,i})$ when the aforementioned working model (4) is assumed. We note that this TMLE never claims that the individual-level working model holds; instead, it uses the working model as a means to generate an initial estimator of $\bar{Q}^c$.

Special case: one observation per community
We now consider a special case where each community has only one individual (i.e., $N = 1$), so that all individual-level baseline covariates can be treated as environmental factors (i.e., $(E, W) = E$).

Nonparametric structural equation model
Consider a NPSEM with structural equations for the endogenous variables $X = (E, A, Y)$:
\begin{align}
E &= f_E(U_E) \tag{6}\\
A &= f_A(E, U_A)\\
Y &= f_Y(E, A, U_Y)
\end{align}
with exogenous unmeasured sources of random variation $U = (U_E, U_A, U_Y)$.

Counterfactuals
Let $Y_a = f_Y(E, a, U_Y)$ denote the counterfactual corresponding to setting the treatment $A = a$; the community-level counterfactual outcome is then the same as the outcome of the only observation in community $j$ (i.e., $Y^c_{j,a} \equiv Y_{j,a}$).

Observed data
Now the observed data become $O = (E, A, Y)$, and we observe $J$ i.i.d. observations of $O$.

Target parameter on the NPSEM
Consider the following parameter of the distribution of $(U, X)$:
$$\Psi^F(P_{U,X,0}) = E_{U,X}[Y^c_{g^*}] = E_{U,X}[Y_{g^*}]$$

Identifiability result
$$\Psi^F(P_{U,X,0}) = E_{U,X}[Y^c_{g^*}] = E_E\big[E_{g^*}[Y \mid A^*, E]\big] \equiv \Psi^I(P_0)$$

The statistical parameter
Now let $Q = (\bar{Q}^c, Q_{E,0})$, so that the conditional mean of the community-level outcome becomes $\bar{Q}^c(A, E) \equiv E(Y^c \mid A, E) = E(Y \mid A, E)$, with density $q^c_Y(Y \mid A, E)$. Then $Q_E \equiv P(E)$, and we assume $q_E$ is a well-defined density for $Q_E$. Also, the community-level stochastic intervention is denoted by $g^*(A \mid E)$, with the observed treatment mechanism $g(A \mid E) \equiv P(A \mid E)$. Given $Q$ and $G^*$, the post-intervention distribution $P_{Q,G^*}$ generates a new set of observations $O^* = (O^*_j = (E_j, A^*_j, Y^*_j) : j = 1, \ldots, J)$. Applying the newly modified notation produces the likelihood:
$$p_{Q,G^*}(O^*) = \Big[\prod_{j=1}^{J} q^c_Y(Y^*_j \mid A^*_j, E_j)\Big]\Big[\prod_{j=1}^{J} g^*(A^*_j \mid E_j)\Big]\, q_E(E)$$
Based on the NPSEM above and the identifiability result, we propose the following target statistical quantity of the distribution of $O$:
$$\Psi^I(P_0) = \int_{e \in \mathcal{E}}\int_{a \in \mathcal{A}} \bar{Q}^c(a, e)\, g^*(a \mid e)\, d\mu_a(a)\, q_{E,0}(e)\, d\mu_e(e)$$
with respect to some dominating measures $\mu_a(a)$ and $\mu_e(e)$.

Estimation and inference
Similar to the setting of the regular community-level TMLE, given $\hat{\bar{Q}}^c(A_j, E_j)$, $\hat{g}(A_j = a \mid E_j)$ and $\hat{g}^*(A_j = a \mid E_j)$ for each community $j = 1, \ldots, J$, we have $\hat{H}_j(a, E_j) = \frac{\hat{g}^*(A_j = a \mid E_j)}{\hat{g}(A_j = a \mid E_j)}$, and an updated fit of the community-level outcome regression is given by:
$$\hat{\bar{Q}}^{c*}(a, E_j) = \text{expit}\big\{\text{logit}\big(\hat{\bar{Q}}^c(a, E_j)\big) + \hat{\epsilon}\, \hat{H}_j(a, E_j)\big\}, \quad \forall j = 1, \ldots, J$$
Thus the targeted substitution estimator is defined as:
$$\hat{\Psi}^I(P_{\hat{Q}^*, \hat{g}^*}) = \frac{1}{J}\sum_{j=1}^{J}\int_{e_j}\int_a \hat{\bar{Q}}^{c*}(a, e_j)\, \hat{g}^*(a \mid e_j)\, d\mu_a(a)\, q_E(e_j)\, d\mu_e(e_j)$$
Under regularity conditions, the TMLE $\hat{\Psi}^I(P_{\hat{Q}^*, \hat{g}^*})$ is asymptotically linear, with an influence curve that can be conservatively replaced by
$$D^I(P_0)(O) = \frac{g^*}{g}(A \mid E)\big(Y^c - \bar{Q}^c(A, E)\big) + E_{g^*}\big[\bar{Q}^c(A, E) \mid E\big] - \Psi^I(P_{Q,g^*})$$
where $E_{g^*}[\bar{Q}^c(A, E) \mid E] = \int_a \bar{Q}^{c*}(a, E)\, g^*(a \mid E)\, d\mu_a(a)$.

Finally, 95% confidence intervals can be constructed as $\hat{\Psi}^I(P_{\hat{Q}^*, \hat{g}^*}) \pm 1.96\, \hat{\sigma}_J / \sqrt{J}$, where $\hat{\sigma}^2_J = \frac{1}{J}\sum_{j=1}^{J}\big\{D^I(\hat{Q}^*, \hat{g})(O_j)\big\}^2$.

Individual-level TMLE
What if the third type of dependence in model (1) mentioned in section 2.1 is weak, or even does not exist? This is the so-called "no covariate interference" assumption (Prague et al.), under which each individual outcome $Y_i$ is sampled from the same distribution given that individual's own covariates $W_i$ and the baseline community-level covariates $E$, together with the community-level intervention and that individual's unobserved factors $(A, U_{Y_i})$. Under this working assumption, we have $E(Y_i \mid A, E, W) = \bar{Q}(A, E, W_i)$. Therefore, when background knowledge about $Q$ is sufficient to ensure the assumption that working model (4) holds, this background changes both the underlying hierarchical causal model and the identifiability results, and so the statistical model, estimand, efficient influence curve, etc.

In this section, we assume such additional knowledge is available and so consider a new hierarchical causal sub-model that restricts the dependence of individuals within a community. The NPSEM that represents the causal relationships among those endogenous variables is now given by:

$$ E = f_E(U_E) $$
$$ W = f_W(E, U_W) \qquad (7) $$
$$ A = f_A(E, W, U_A) $$
$$ Y_i = f_Y(E, W_i, A, U_{Y_i}), \qquad U_{Y_i} \perp U_A \mid E, W_i $$

Besides, we also assume that there is a common conditional distribution of $A$ given $(E, W_i)$ across all individuals, i.e., $P(A \mid E, W_i) \equiv g_I(A \mid E, W)$, where $g_I(A \mid E, W)$ denotes the individual-level stochastic intervention. Recall that we may be interested in $Y^c_{j, g^*_I} \equiv \sum_{i=1}^{N_j} \alpha_{j,i} Y_{j,i,g^*_I}$, with respect to some individual-level stochastic intervention $g^*_I$.
Thus, by identifiability,

$$ \Psi^{F}(P_{U,X,0}) = E_{U,X}\big( Y^c_{g^*_I} \big) = E_{U,X}\Big( \sum_{i=1}^{N} \alpha_i Y_{i, g^*_I} \Big) = \sum_{i=1}^{N} \alpha_i E_{Q_{E,W,0}}\Big\{ E_{g^*_I}\big[ \bar{Q}(A, E, W_i) \mid E, W_i \big] \Big\} \equiv \Psi^{II}(P_{Q, g^*_I}) $$

where $\Psi^{II}: \mathcal{M}^{II} \to \mathbb{R}$ is the target statistical quantity under the key identifiability assumptions and working assumption (4), and $\mathcal{M}^{II}$ is a sub-model of $\mathcal{M}^{I}$.

Now, the corresponding EIC is given by:

$$ D^{II}(P_0)(O) = \sum_{i=1}^{N} \alpha_i \Big[ \frac{g^*_I}{g_{0,I}}(A \mid E, W_i)\big( Y_i - \bar{Q}(A, E, W_i) \big) + E_{g^*_I}\big[ \bar{Q}(A, E, W_i) \mid E, W_i \big] - \Psi^{II}(P_{Q, g^*_I}) \Big] $$

Also, the individual-level density of the conditional treatment distribution, adjusting for $E$ and the individual-specific covariate $W_i$, is defined as

$$ g_I(a \mid e, w_i) = E_W\big[ g_I(a \mid e, W) \mid W_i = w_i \big] = E_W\big[ g_I(a \mid e, W_{-i}, W_i) \mid W_i = w_i \big] = \int_{w_{-i}} g_I(a \mid e, w_{-i}, w_i)\, P(W_{-i} = w_{-i} \mid W_i = w_i)\, d\mu(w_{-i}) = \int_{w_{-i}} g_I(a \mid e, w_{-i}, w_i)\, P(W_{-i} = w_{-i})\, d\mu(w_{-i}) $$

with respect to some dominating measure $\mu(w_{-i})$, where $W_{-i}$ represents an $(N - 1) \times p$ matrix of individual-level covariates that includes all individuals in the community except $i$.

Therefore, the estimate of the individual-level stochastic intervention is given by

$$ \hat{g}_I(a \mid e, w_i) = \frac{1}{J} \sum_{j=1}^{J} \int_{w_{j,-i}} \hat{g}_I(a \mid e_j, w_{j,-i}, w_{j,i})\, P_n(W_{j,-i} = w_{j,-i})\, d\mu(w_{j,-i}) $$

and the substitution estimator of $\Psi^{II}(P_{Q, g^*_I})$ is defined as follows:

$$ \hat{\Psi}^{II}(P_{\hat{Q}^*, \hat{g}^*_I}) = \frac{1}{J} \sum_{j=1}^{J} \sum_{i=1}^{n_j} \alpha_{j,i} \int_{e_j, w_{j,i}} \int_{a} \hat{\bar{Q}}^{*}(a, e_j, w_{j,i})\, \hat{g}^*_I(a \mid e_j, w_{j,i})\, d\mu_a(a)\, q_{E,W}(e_j, w_j)\, d\mu_{e,w}(e_j, w_j) $$
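The empirical marginalization over $W_{-i}$ in $\hat{g}_I$ above can be mimicked by Monte Carlo averaging over the observed covariates of the other individuals. Below is a self-contained schematic sketch: the "fitted" density g.fit, its dependence on the community's covariate profile, and all numbers are invented for illustration, and this is not the package's internal code.

```r
# Schematic sketch of marginalizing an individual-level treatment density
# over W_{-i}, as in the display above. The "fitted" density g.fit is a
# hypothetical stand-in: A ~ Normal(0.5 * mean(W), 1) given the community's
# covariate profile W.
set.seed(2)
W <- rnorm(100)  # covariates of the N = 100 individuals in one community

g.fit <- function(a, w.all) dnorm(a, mean = 0.5 * mean(w.all), sd = 1)

# g_I(a | e, w_i): average g.fit over draws of W_{-i} from its empirical
# distribution, holding individual i's own covariate fixed.
g.marg <- function(a, i, B = 500) {
  mean(vapply(seq_len(B), function(b) {
    W.minus <- sample(W[-i], size = length(W) - 1, replace = TRUE)
    g.fit(a, c(W[i], W.minus))
  }, numeric(1)))
}

g.marg(a = 0.3, i = 1)  # a positive density value
```

The resampling of $W_{-i}$ plays the role of the empirical distribution $P_n(W_{-i})$ in the estimator above.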
3. Implementation in the tmleCommunity package
Estimation of average causal effects for single-time-point arbitrary interventions in hierarchical data with the tmleCommunity package is performed with the main function tmleCommunity(), along with the auxiliary function tmleCom_Options(), which sets additional options that control the estimation algorithms in the package. Note that tmleCom_Options() needs to be called before the main function tmleCommunity(); otherwise the default settings of all arguments to the tmleCom_Options() function will be used in the estimation procedure.
The observed data set is passed to tmleCommunity() through the data argument as a data frame, with the outcome column, the exposure column(s), the baseline covariate columns and possibly the community identifier column (usually numeric values; factor values are not supported in the package). The data arguments include Ynode, Anodes, WEnodes, communityID and YnodeDet, which are the column names or indices in the data frame data that represent the outcome variable, the exposure variable(s), the community- and individual-level baseline covariates, the community identifier variable, and indicators of deterministic values of the outcome Ynode, respectively. Only Anodes and WEnodes must be specified. If Ynode is left unspecified, the left-hand side of the regression formula in argument Qform will be treated as Ynode. If YnodeDet is not NULL (its corresponding column should be logical or binary), observations receiving TRUE or 1 as their YnodeDet values are assumed to have constant values for their Ynode, and thus are not used in the final estimation step. If communityID is not NULL (its corresponding column can be integer, numeric or character), the package supports three options for the argument community.step, namely "community_level", "individual_level" and "PerCommunity". Otherwise, it assumes that the data set has no hierarchical structure (and thus automatically chooses the option "NoCommunity"). More details are discussed in section 3.2.

The other optional arguments related to the data set - obs.wts, community.wts and fluctuation - are the sampling weights for each observation, the sampling weights for each community, and the choice of the fluctuation working model, respectively. Besides, if fluctuation is specified as "logistic" (the default), continuous outcomes $Y \in [a, b]$ will be bounded and linearly transformed prior to estimating the outcome mechanism.

communityID, community.step and pooled.Q are the main arguments that determine the estimation methods for hierarchical data. First, in order to preserve the hierarchical data structure, the data should contain a column with one unique identifier per community, and the user must provide the communityID argument as a column name or index in data. Failing to provide communityID will force the community.step argument to be automatically set to "NoCommunity" (the default), pooling data across all communities and treating the data as non-hierarchical.

Second, if community.step is specified as "community_level", the observed data will be aggregated to the community level, and estimation of the corresponding statistical parameter will be analogous to that for non-hierarchical data structures. Note that pooled.Q controls whether to incorporate the pairing of individual-level covariates and outcomes (also known as the working model of "no covariate interference") in community-level TMLE, although the working model is not assumed to hold.
In other words, when community.step is set to "community_level", if pooled.Q = TRUE, the pooled individual-level regressions will be added as candidates in the Super Learner library for the initial estimation of the outcome mechanism; if pooled.Q = FALSE, both the outcome and treatment mechanisms are estimated at the community level and use no individual-level information. Moreover, community.step can be specified as "individual_level" under the assumption that the working model of "no covariate interference" holds. Third, the stratified TMLE that fits separate outcome and exposure mechanisms for each community can be implemented by setting community.step to "PerCommunity". Examples with more details are provided in section 4.

Last but not least, the community.wts argument can be used to provide the community-level observation weights. If set to "size.community" (the default), each community is weighted by its (standard-deviation-scaled) number of individuals; this specification inflates the weight for communities that are underrepresented due to a large degree of missing data. If set to "equal.community", all communities are weighted equally. A user-specified community.wts may be passed as a matrix with 2 columns, where the first and second columns contain the identifiers of the communities and the corresponding non-negative weights, respectively.

The user-supplied interventions are specified by the f_g0, f_gstar1 and f_gstar2 arguments. First, an intervention regimen that encodes model knowledge about the values of Anodes is specified with the f_g0 argument, which is either a function or a vector (or a matrix / data frame if the exposures are multivariate) that provides the true treatment mechanism for the observed Anodes. If f_g0 is specified as a function, a large vector (or a data frame if multivariate) of Anodes will be sampled from the f_g0 function.
Second, an intervention regimen of interest is defined by replacing the conditional density $g_0$ with a new user-supplied density $g^*$, and is specified by the f_gstar1 argument, which takes a function of counterfactual exposures. The function must include "data" as one of its argument names, and return a vector or a data frame of counterfactual exposures evaluated based on Anodes, WEnodes (and possibly communityID) passed via the named argument "data". In addition, the interventions defined by f_gstar1 can be static, dynamic or stochastic. For example, for a data set with a binary treatment, a stochastic regime may randomly assign treatment to 30% of the observations, and a deterministically static regime may assign treatment to every observation. The corresponding f_gstar1 functions can be coded as
R> define_f.gstar <- function(prob.val) {
R+   f.gstar <- function(data, ...) {
R+     rbinom(n = NROW(data), size = 1, prob = prob.val)
R+   }
R+   return(f.gstar)
R+ }
R> f.gstar_stoch.0.3 <- define_f.gstar(prob.val = 0.3)
R> f.gstar_determ.1 <- define_f.gstar(prob.val = 1)
Alternatively, a deterministic regime can be specified by passing a vector (or a matrix / data frame) to the f_gstar1 argument, with one element per observation (and one column per treatment variable if multivariate). Moreover, f_gstar1 can be set to a numeric value if that constant exposure is to be assigned to all observations. Thus, the other two ways to code f.gstar_determ.1 in the example above would be
R> f.gstar_vector.1 <- rep(1L, NROW(data))
R> f.gstar_const.1 <- 1L
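The text above also mentions dynamic regimes, i.e., treatment rules that are deterministic functions of covariates. A dynamic regime can be coded in the same style as the stochastic example; the following is a hypothetical sketch in which the treatment rule and the column name "W1" are invented for illustration, not taken from the package's examples.

```r
# Hypothetical dynamic regime for f_gstar1: treat exactly those
# observations whose covariate W1 exceeds a threshold. The rule and the
# column name "W1" are illustrative only.
f.gstar_dynamic <- function(data, ...) {
  as.integer(data$W1 > 0)
}

# Evaluated on a toy data frame:
toy <- data.frame(W1 = c(-0.5, 0.2, 1.3))
f.gstar_dynamic(toy)  # 0 1 1
```

Like the stochastic and static regimes above, this function accepts the observed data through its "data" argument and returns one counterfactual exposure per observation.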
As discussed in section 2.7.1, the first stage of the TMLE procedure is to estimate the conditional mean outcome $\bar{Q}_0$. A good initial fit of $\bar{Q}_0$ reduces reliance on bias reduction in the subsequent targeting step and provides a target parameter estimate with smaller variance. The following optional arguments to the tmleCom_Options() and tmleCommunity() functions give users control over the initial estimation of $\bar{Q}_0$:

Relevant arguments in tmleCom_Options():

1. Qestimator A string specifying the default estimator for fitting $\bar{Q}_0$, including both parametric estimators ("speedglm__glm" and "glm__glm") and data-adaptive estimators ("h2o__ensemble" and "SuperLearner").
2. h2ometalearner A string to pass to h2o.ensemble, specifying the prediction algorithm used to learn the optimal combination of the base learners.
3. h2olearner A vector of prediction algorithms for training the ensemble's base models.
4. SL.library A vector of prediction algorithms to pass to SuperLearner.
5. CVfolds Optional number of splits in the V-fold cross-validation step for data-adaptive estimation.

Relevant arguments in tmleCommunity():

1. Qform Regression formula for $\bar{Q}_0$, of the form Ynode ~ Anodes + WEnodes.
2. Qbounds A vector of 2 truncation levels for continuous Y and for the initial estimate $\bar{Q}_n$.
3. alpha A value keeping $\bar{Q}_n$ bounded away from (0, 1) for the logistic fluctuation.

Note that the negative Bernoulli log-likelihood can be used as a valid loss function for $\bar{Q}_0$ when setting fluctuation = "logistic", even if Y is not binary. Compared to a regular linear fluctuation, a logistic fluctuation ensures that all predicted means lie in (0, 1) and that the corresponding estimates of $\bar{Q}_0$ respect the global constraints of the observed data model, which reduces bias and variance in small samples (Gruber and van der Laan 2010). Before the estimation procedure is performed, outcomes Y will be bounded by Qbounds and then automatically transformed into continuous outcomes $Y^*$ bounded by (0, 1), where $Y^* = (Y - a)/(b - a) \in [0, 1]$. If Qbounds is unspecified, the default choice, the range of Y widened by 10% of its minimum and maximum values, will be used. Besides, if the outcomes Y were originally transformed into $Y^*$, fitted values of the targeted estimates will be automatically mapped back to the original scale. Once Qbounds finishes bounding the observed outcomes, it will be set to (1 - alpha, alpha) and used to bound the predicted values of the initial outcome mechanism.

The default estimation algorithm for $\bar{Q}_0$ is "speedglm__glm", which relies on the speedglm package (Enea 2017) and uses its function speedglm.wfit to fit parametric generalized linear models (GLM) to medium-large data sets. We note that a direct call to speedglm.wfit, which requires a model matrix as input, is faster than the standard call to speedglm. Another regular parametric GLM fitting function, glm.fit (the workhorse of glm), is also available for estimating $\bar{Q}_0$ by setting the Qestimator argument to "glm__glm".
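The bounding and rescaling of a continuous outcome described above can be sketched in a few lines. This is a stand-alone illustration under one plausible reading of the 10% widening rule, not the package's internal code.

```r
# Sketch of the outcome transformation described above: bound a continuous
# Y by Qbounds = (a, b) and map it to Y* = (Y - a) / (b - a) in [0, 1].
# Widening range(Y) by 10% of its width is one plausible reading of the
# default rule; the package's exact computation may differ.
set.seed(4)
Y <- rnorm(1000, mean = 3, sd = 2)
rngY <- range(Y)
Qbounds <- rngY + c(-0.1, 0.1) * diff(rngY)  # widened default bounds
a <- Qbounds[1]; b <- Qbounds[2]
Ystar <- (pmin(pmax(Y, a), b) - a) / (b - a)
stopifnot(all(Ystar >= 0 & Ystar <= 1))

# After the targeting step on the Y* scale, fitted values are mapped back:
Yback <- Ystar * (b - a) + a
stopifnot(isTRUE(all.equal(Yback, Y)))
```

Because the bounds are strictly wider than the observed range, the back-transformation recovers the original outcomes exactly (up to floating-point error).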
When speedglm.wfit or glm.fit is called, logistic regression will be used for the initial estimation of $\bar{Q}_0$.

However, in a nonparametric model the probability distribution of the data is typically unknown, and so parametric models, whose assumptions neither use realistic background knowledge nor respect the global constraints of the statistical model, are easily misspecified. The recommended solution is to use more flexible machine-learning estimators that adapt the regression function to the data without overfitting. tmleCommunity relies on the SuperLearner and h2oEnsemble packages to perform data-adaptive estimation. Based on the oracle properties of V-fold cross-validation that minimizes the estimated expected squared-error loss function (van der Laan, Polley, and Hubbard 2007), super learning chooses the best weighted convex combination of candidate estimators in the user-specified library, possibly including both machine learning algorithms and parametric models. One of its most important advantages is its "free lunch" principle: including a large variety of prediction algorithms in the super learning library increases the chance of consistent estimation, especially when the functional form of the conditional density is unknown. Note that h2oEnsemble is another package that provides the Super Learner method and is based on the h2o package (usually used to build models on large datasets).
Qform can be used to specify a regression formula that includes Anodes and WEnodes for $\bar{Q}_0$. The functional form of the formula matters only for the parametric estimation algorithms speedglm.wfit and glm.fit. When using the data-adaptive estimation algorithms provided by SuperLearner and h2oEnsemble, all variables on the right-hand side of Qform will be treated as predictor variables passed to the candidate estimators, ignoring the original regression formula. If Qform is left unspecified, all variables in Anodes and WEnodes will be treated as predictor covariates. Besides, the library of candidate estimators can specify different functions of the passed predictor variables, such as SL.glm.interaction for second-order interaction terms in SuperLearner. For more details on creating new wrapper functions for prediction algorithms in h2oEnsemble and SuperLearner, please see LeDell (2016) and Polley et al.

Estimation with SuperLearner and h2oEnsemble may fail for various reasons such as constant responses, as may the use of speedglm. Therefore, the tmleCommunity() function provides an insurance mechanism guaranteeing that the estimation procedure functions normally even if the chosen algorithm fails: if "h2o__ensemble" fails, it falls back on "SuperLearner"; if "SuperLearner" fails, it falls back on "speedglm__glm"; if "speedglm__glm" fails, it falls back on "glm__glm". However, tmleCommunity() will terminate with an error if the last resort glm.fit also fails.

We demonstrate a simple application of the tmleCommunity function using user-specified parametric models and a super learning library to estimate $\bar{Q}_0$ in the code chunks below. In this example, we have simulated data of 1,000 i.i.d. observations with four baseline covariates (W1, W2, W3 and W4), one binary exposure (A) and a continuous outcome (Y). The true ATE value is 2.80. Code to generate the example dataset is attached in the supplementary material.

We begin with correctly specified models for $\bar{Q}_0$. Note that tmleCommunity provides results for three estimators of the ATE. In this section, we only care about the non-targeted substitution estimate that uses only an estimate of $\bar{Q}_0$, so we can use "gcomp" to extract the corresponding results of the MLE estimator.

R> tmleCom_Options(Qestimator = "speedglm__glm")
R> tmleCom_Qc_ATE <-
R+   tmleCommunity(data = indSample.bA.cY, Ynode = "Y", Anodes = "A",
R+                 WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = 1,
R+                 f_gstar2 = 0, Qform = "Y ~ W1 + W2 + W3 + W4 + A",
R+                 alpha = 0.995)
R> c(tmleCom_Qc_ATE$ATE$estimates["gcomp", ],
R+   tmleCom_Qc_ATE$ATE$vars["gcomp", ])

[1] 2.816628 0.004813
What if our assumption of a parametric model for $\bar{Q}_0$ is incorrect? The result of the misspecified outcome regression is shown next.

R> tmleCom_Options(Qestimator = "speedglm__glm")
R> tmleCom_Qm_ATE <-
R+   tmleCommunity(data = indSample.bA.cY, Ynode = "Y", Anodes = "A",
R+                 WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = 1,
R+                 f_gstar2 = 0, Qform = "Y ~ W1 + A", alpha = 0.995)
R> c(tmleCom_Qm_ATE$ATE$estimates["gcomp", ],
R+   tmleCom_Qm_ATE$ATE$vars["gcomp", ])

[1] 3.45848993 0.01460869
Next, suppose we do not know the parametric model and need to use the super learning algorithm to estimate $\bar{Q}_0$. Instead of using the default library, we specify one that contains SL.glm, SL.bayesglm and SL.gam (note that SL.gam calls the gam function in the suggested gam package, which fits generalized additive models (Hastie 2017)).

R> require("SuperLearner")
R> tmleCom_Options(Qestimator = "SuperLearner", CVfolds = 5,
R+                 SL.library = c("SL.glm", "SL.bayesglm", "SL.gam"))
R> tmleCom_QSL_ATE <-
R+   tmleCommunity(data = indSample.bA.cY, Ynode = "Y", Anodes = "A",
R+                 WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = 1,
R+                 f_gstar2 = 0, rndseed = 12345)
R> c(tmleCom_QSL_ATE$ATE$estimates["gcomp", ],
R+   tmleCom_QSL_ATE$ATE$vars["gcomp", ])

[1] 2.809818 0.004797
After finishing the initial fit of $\bar{Q}_0$ in the first stage of TMLE, the next step is to modify the initial estimator by using an estimate of the nuisance parameter $g_0$, in order to make an optimal bias-variance trade-off. Recall that the estimate $g_n$ will be used in a clever covariate that defines a parametric fluctuation model to update the initial estimate of $\bar{Q}_0$. The following arguments to the tmleCom_Options() and tmleCommunity() functions provide flexibility in how to choose the estimator for $g_0$:

Relevant arguments in tmleCom_Options():

• gestimator A string specifying the default estimator for fitting $g_0$, similar to Qestimator (except that gestimator also supports "sl3_pipelines", another data-adaptive estimation method).
• bin.method The method for choosing bins when discretizing the conditional continuous exposure variable, one of "equal.mass", "equal.len" and "dhist".
• nbins Number of bins when discretizing a continuous variable.
• maxncats The maximum number of unique categories a categorical variable can have.
• maxNperBin The maximum number of observations in each bin.
• poolContinVar Logical; when fitting a model for a binarized continuous variable, whether to pool bin indicators across all bins and fit one pooled regression.
• savetime.fit.hbars Logical; whether to skip estimation and prediction of the exposure mechanism when f_gstar1 = NULL and TMLE.targetStep = "tmle.intercept".

Relevant arguments in tmleCommunity():

• hform.g0 Regression formula for $g_0$, of the form Anode ~ WEnodes.
• hform.gstar Regression formula for the user-supplied intervention $g^*$, of the form Anode ~ WEnodes.
• lbound One value for bounds on the ratio of the estimate of $g^*$ to the estimate of $g_0$.
• h.g0_GenericModel An object of the GenericModel R6 class containing previously fitted models for $P(A \mid W, E)$ under the observed treatment mechanism $g_0$.
• h.gstar_GenericModel An object of the GenericModel R6 class containing previously fitted models for $P(A^* \mid W, E)$ under the intervention $g^*$.
• TMLE.targetStep TMLE targeting step method, either "tmle.intercept" (the default) or "tmle.covariate".

The options for selecting estimation algorithms for the treatment mechanism are similar to those for estimating $\bar{Q}_0$, and they share the same choices of h2ometalearner, h2olearner, SL.library and CVfolds. Beyond that, users can also use the sl3 package to perform data-adaptive estimation of $g_0$. Note that sl3 is a modern implementation of the Super Learner algorithm for ensemble learning and model stacking. For more details on creating new wrapper functions in sl3, please see Coyle et al. To guarantee that the estimation procedure for $g_0$ functions normally even if the chosen algorithm fails, the tmleCommunity() function uses the following rule: if "h2o__ensemble" fails, it falls back on "sl3_pipelines"; if "sl3_pipelines" fails, it falls back on "SuperLearner"; the rest of the fallback mechanism is the same as in the estimation process of $\bar{Q}_0$.

Also, $g_0$ can be estimated either by parametric models or by data-adaptive algorithms, depending on the choice of gestimator. Although the estimation algorithms for $\bar{Q}_0$ and $g_0$ can differ as long as the choices of Qestimator and gestimator differ, the same library is used for all factors of $g$. For example, the initial fit of $\bar{Q}_0$ may be estimated by h2oEnsemble with "h2o.glm.wrapper" and "h2o.randomForest.wrapper", while a super learning library including SL.loess (with span = 0.8), SL.glmnet, SL.knn.20 (with neighborhood size k = 20) and SL.step is used for each factor of $g$.

It is important to choose the number and position of the bins when discretizing a continuous exposure variable, as these choices affect the variance of the density estimation (Scott 1992). Fortunately, the type of each variable (binary, categorical, or continuous) is automatically detected based on the user-specified maxncats argument. Recall that maxncats provides the maximum number of unique categories a categorical variable can have, so a variable with more unique categories is automatically considered continuous. Following Denby and Mallows (2009), a histogram can be used as a graphically descriptive tool whose bin locations are determined by cutting the empirical cumulative distribution function (ecdf) with a set of parallel lines. First, the nbins argument is a tuning parameter that determines the total number of bins for discretization. A cross-validation selector can be applied to data-adaptively select the candidate number of bins, minimizing variance and maximizing precision. Note that we do not recommend too many bins, as that easily violates the positivity assumption.

Next, given a number of bins, we need to choose the most convenient locations (cutoffs) for the bins. There are three alternative approaches that use a histogram to define the bin cutoffs for a continuous variable: equal-mass, equal-length, and a combination of the two. In the tmleCommunity package, the choice of method bin.method, together with the other arguments nbins and maxNperBin, defines the values of the bin cutoffs. Note that maxNperBin provides a user-specified maximum number of observations in each bin. The default discretization method, the equal-mass (aka equal-area) interval method, is set by passing the argument bin.method = "equal.mass" to tmleCom_Options() prior to calling the main function tmleCommunity().
The intervals are defined by spanning the support of A into bins of non-equal length, each containing (approximately) the same number of observations. It is data-adaptive, since bins tend to be wide where the population density is small, and narrow where the density is large. If nbins is NA (or is smaller than n/maxNperBin), nbins will be (re)set to the integer value of n/maxNperBin, where n is the total number of observations, and the default setting of maxNperBin is 500 observations per interval. This method can identify spikes in the density, but oversmooths in the tails and so cannot discover outliers.

Besides, the equal-length interval method is set by passing the argument bin.method = "equal.len" to tmleCom_Options(). The intervals are defined by spanning the support of a continuous variable into nbins bins of equal length. This method describes the tails of the density and identifies outliers well, but oversmooths in regions of high density and so is poor at identifying sharp peaks. Moreover, as an alternative that strikes a compromise between the equal-mass and equal-length approaches, the combination method is set by passing the argument bin.method = "dhist", where dhist stands for diagonally cut histogram. For consistency, we choose the slope a = 5 × IQR(A) as suggested by Denby and Mallows (2009).
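The contrasting behavior of the equal-mass and equal-length schemes just described can be mimicked with base R. This is a simplified sketch on simulated data; the package's internal cutoff computation may differ.

```r
# Sketch of equal-mass vs. equal-length bin cutoffs for a right-skewed
# continuous exposure A, using nbins = 5.
set.seed(5)
A <- rexp(5000)
nbins <- 5

# Equal-mass: cutoffs at quantiles, so each bin holds ~ the same count.
cut.mass <- quantile(A, probs = seq(0, 1, length.out = nbins + 1))
# Equal-length: cutoffs equally spaced over the support of A.
cut.len <- seq(min(A), max(A), length.out = nbins + 1)

counts.mass <- table(cut(A, breaks = cut.mass, include.lowest = TRUE))
counts.len  <- table(cut(A, breaks = cut.len,  include.lowest = TRUE))
range(counts.mass)  # nearly equal counts per bin
range(counts.len)   # very unequal counts for a skewed density
```

For this skewed density, the equal-mass cutoffs are narrow near zero (where the mass is) and wide in the tail, while the equal-length cutoffs leave the tail bins nearly empty, matching the trade-offs described above.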
Qform , formulae that include
WEnodes can be specified for estimating componentsof g and g ∗ using the hform.g0 and hform.gstar arguments. The functional form of theformulae is unimportant when the data-adaptive estimation algorithms are used. Also, ifthe hform.g0 and hform.gstar arguments are unspecified, the formulae will default to mainterm regressions that includes all variables in WEnodes .The lbound argument is a tuning parameter, conforming with the theoretical assumption 2 insection 2.5 that the ratio of g ∗ ( a | E, W ) to g ( a | E, W ) must be bounded away from + ∞ . Sincethe function g ∗ ( a | E, W ) is user given, we can try to define it in a way so that it could be used toanswer the causal question of interest, and yet it does not produce unstable weights. However,when there are unstable weights that cause extremely large value of g ∗ ( a | E, W ) g ( a | E, W ) , this lack ofidentifiability will result in the estimates with high variance (van der Laan and Rubin 2006). Acommon approach to reduce the variance of the consequent estimates is bounding g ∗ ( a | E, W ) g ( a | E, W ) away from the extremely large value, e.g., 0 ≤ g ∗ ( a | E, W ) g ( a | E, W ) ≤ lbound . However, truncationcomes at a price of bias since the consistency of the estimator of g ( a | E, W ) may be affected.Therefore, the lbound argument should be chosen carefully (it defaults to 0.005). TMLE.targetStep specifies how to use weights h g ∗ h gN in the TMLE targeting step. If it is setto "tmle.intercept" (default), it performs the weighted intercept-based TMLE that runs aintercept-only weighted logistic regression using offsets logit( Q ∗ ) and weights h g ∗ h gN and so nocovariate. If setting to "tmle.covariate" , it performs the unweighted covariate-based TMLEthat run an unweighted logistic regression using offsets logit( Q ∗ ) and a clever covariate h g ∗ h gN .The following example illustrates IPTW estimation of the average causal effect of individual-based continuous intervention at a single time point. 
A sample of 5,000 observations is generated, with each row $i$ consisting of four baseline covariates (W1, W2, W3 and W4), one continuous exposure (A) and a continuous outcome (Y). The true value for the marginal treatment effect of the intervention for the simulated data is $\psi_0 = 3.$ We estimate $\psi_0$ under a truncated stochastic intervention $g^*$, which is defined by shifting the normal density of the observed A, with the shift truncated once the likelihood ratio of $g^*$ to $g_0$ exceeds the bound truncBD. In this case, the tmleCommunity() function receives only one user-specified intervention, and we use $EY_gstar1 to extract the results of the estimates under the intervention f_gstar1. Moreover, we use "iptw" to display the results of the IPTW estimator, since it relies only on the estimate of $g_0$. Let us begin with correctly specified models for $g_0$ and $g^*$ with a shift of 2 on the observed A, using the equal-mass method that discretizes A into 5 bins (all default choices).

R> define_f.gstar <- function(shift.val, truncBD, rndseed = NULL) {
R+   f.gstar <- function(data, ...) {
R+     set.seed(rndseed)
R+     A.mu <- 0.86 * data$W1 + 0.93 * data$W3 * data$W4 + 0.41 * data$W4
R+     untrunc.A <- rnorm(n = nrow(data), mean = A.mu + shift.val, sd = 1)
R+     r.new.A <- exp(0.5 * shift.val * (untrunc.A - A.mu - shift.val / 2))
R+     trunc.A <- ifelse(r.new.A > truncBD, untrunc.A - shift.val, untrunc.A)
R+     return(trunc.A)
R+   }
R+   return(f.gstar)
R+ }
R> f.gstar <- define_f.gstar(shift.val = 2, truncBD = 10, rndseed = 1)
R>
R> gform.C <- "A ~ W1 + W3 * W4"

[1] 3.4406569 0.0120417
Note that if the discretization method is equal mass (i.e., bin.method = "equal.mass") and each bin is allowed to contain no more than 250 observations (i.e., maxNperBin = 250), the number of bins (or regressions) will be set to the larger of the nearest integer of n/maxNperBin and the value of nbins, where n is the total number of observations. Thus, even though nbins defaults to 5, the actual number of bins for a sample of 5,000 will be 20. It is worth mentioning that during estimation, $-\infty$ and $+\infty$ are added as the leftmost and rightmost cutoff points to make sure all future data points end up in one of the intervals. For example, if the actual number of bins is 20, then the returned results will include 22 fitted models. However, the selection of maxNperBin has no influence on the actual number of bins when using the equal-length and combination methods.

R> tmleCom_Options(maxNperBin = 250, bin.method = "equal.mass")
R> tmleCom_gmain_eqmass <-
R+   tmleCommunity(data = indSample.cA.cY, Ynode = "Y", Anodes = "A",
R+                 WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = f.gstar)
R> h.g0_models_mass <- tmleCom_gmain_eqmass$EY_gstar1$h.g0_GenericModel
R> length(h.g0_models_mass$getPsAsW.models()$`P(A|W).1`$bin_nms)

[1] 22

R> tmleCom_Options(maxNperBin = 250, bin.method = "equal.len")
R> tmleCom_gmain_eqlen <-
R+   tmleCommunity(data = indSample.cA.cY, Ynode = "Y", Anodes = "A",
R+                 WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = f.gstar)
R> h.g0_models_len <- tmleCom_gmain_eqlen$EY_gstar1$h.g0_GenericModel
R> length(h.g0_models_len$getPsAsW.models()$`P(A|W).1`$bin_nms)

[1] 7
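The bin-count rule just illustrated can be checked with a one-line computation. This is a sketch of the stated rule only, not the package's internal code.

```r
# Sketch of the equal-mass bin-count rule described above: the number of
# bins is the larger of round(n / maxNperBin) and the nbins setting.
n <- 5000; maxNperBin <- 250; nbins.default <- 5
nbins.used <- max(as.integer(round(n / maxNperBin)), nbins.default)
nbins.used      # 20 bins for this sample
# With -Inf and +Inf added as outer cutoffs, 22 fitted-model entries are
# returned, consistent with the equal-mass output shown above.
nbins.used + 2  # 22
```

For the equal-length method, by contrast, the rule does not apply and nbins is used directly, which is why the second run above reports a different count.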
TMLE.targetStep = "tmle.intercept" , setting savetime.fit.hbars to TRUE allowsthe TMLE process to skip the estimation and prediction of exposure mechanism P ( A | W, E )under g and g ∗ . It will directly set h g ∗ h gN to 1 for all observations. R> tmleCom_nofgstar <-R> tmleCommunity(data = indSample.cA.cY, Ynode = "Y", Anodes = "A",R+ WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = NULL,R+ Qform = "Y ~ W1 + W2 + W3 + W4 + A")R> c(tmleCom_nofgstar$EY_gstar1$h.g0_GenericModel,R+ tmleCom_nofgstar$EY_gstar1$h.gstar_GenericModel) [1] NULL
If instead we would like to estimate the same parameter except using machine learning algo-rithms. R code that uses
SuperLearner to estimate ψ is shown next. It displays a satisfac-tory result of estimation with a super learning library containing "SL.glm", "SL.gam" and "SL.randomForest" . R> require("SuperLearner")R> tmleCom_Options(gestimator = "SuperLearner", maxNperBin = N,R+ SL.library = c("SL.glm", "SL.gam"))R> tmleCom_gSL_default <-R+ tmleCommunity(data = indSample.cA.cY, Ynode = "Y", Anodes = "A",R+ WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = gstar,R+ Qform = "Y ~ W1 + W2 + W3 + W4 + A", rndseed = 1)R> c(tmleCom_gSL_default$EY_gstar1$estimates["iptw", ],R+ tmleCom_gSL_default$EY_gstar1$vars["iptw", ]) tmleCommunity : Target Maximum Likelihood Estimation for Community-level Data [1] 3.4471211 0.0124127 Another choice for performing maching leraning algorithms, especially stacked ensemble learn-ing, is to use the sl3 package. Example R code for estimating ψ is shown next. Both Lrnr_glm_fast and
Lrnr_glmnet are used in the library.
R> require("sl3"); require("SuperLearner")
R> tmleCom_Options(gestimator = "sl3_pipelines", maxNperBin = N, CVfolds = 5,
R+                 sl3_learner = list(glm_fast = make_learner(Lrnr_glm_fast),
R+                                    glmnet = make_learner(Lrnr_glmnet)),
R+                 sl3_metalearner = make_learner(
R+                   Lrnr_optim, loss_function = loss_squared_error,
R+                   learner_function = metalearner_logistic_binomial))
R> tmleCom_gsl3_default <-
R+   tmleCommunity(data = indSample.cA.cY, Ynode = "Y", Anodes = "A",
R+                 WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = f.gstar,
R+                 rndseed = 1)
R> c(tmleCom_gsl3_default$EY_gstar1$estimates["iptw", ],
R+   tmleCom_gsl3_default$EY_gstar1$vars["iptw", ])

[1] 3.449183 0.014578
Recall that the h2oEnsemble package can also perform super learning. In this case, we apply generalized linear models with penalized maximum likelihood both for the base learners that are used to train the base models for the ensemble, and for the metalearner that is used to learn the optimal combination of the base learners. Specifically, the base learners include three regressions: Lasso (α = 1), Ridge (α = 0) and an Elastic net model with α = 0.5.

R> require("h2oEnsemble")
R> h2o.glm.1 <- function(..., alpha = 1, prior = NULL) {
R+   h2o.glm.wrapper(..., alpha = alpha, prior = prior)
R+ }
R> h2o.glm.0.5 <- function(..., alpha = 0.5, prior = NULL) {
R+   h2o.glm.wrapper(..., alpha = alpha, prior = prior)
R+ }
R> h2o.glm.0 <- function(..., alpha = 0, prior = NULL) {
R+   h2o.glm.wrapper(..., alpha = alpha, prior = prior)
R+ }
R> tmleCom_Options(gestimator = "h2o__ensemble", maxNperBin = N,
R+                 h2ometalearner = "h2o.glm.wrapper",
R+                 h2olearner = c("h2o.glm.1", "h2o.glm.0.5", "h2o.glm.0"))
R> tmleCom_gh2o_default <-
R+   tmleCommunity(data = indSample.cA.cY, Ynode = "Y", Anodes = "A",
R+                 WEnodes = c("W1", "W2", "W3", "W4"),
R+                 f_gstar1 = f.gstar, rndseed = 1)
R> c(tmleCom_gh2o_default$EY_gstar1$estimates["iptw", ],
R+   tmleCom_gh2o_default$EY_gstar1$vars["iptw", ])

[1] 3.4321917 0.0118350

For full details, see the documentation for the tmleCommunity package (cite ***).
• data Observed data, data.frame with named columns, containing
WEnodes, Anodes, Ynode and possibly communityID, YnodeDet.
• Ynode
Column name or index in data of the outcome variable. The outcome can be either binary or continuous (and may fall outside [0, 1]). If
Ynode undefined, the left-side of theregression formula in argument
Qform will be treated as
Ynode . • Anodes
Column names or indices in data of exposure (treatment) variables. • WEnodes
Column names or indices in data of individual-level (and possibly community-level) baseline covariates. Factors are not allowed. • YnodeDet
Optional column name or index in data of indicators of deterministic values of outcome Ynode, coded as (TRUE/FALSE) or (1/0). If TRUE or 1, the value of Ynode is set deterministically.
• obs.wts Optional choice to provide or construct a vector of individual-level observation (sampling) weights (of length nrow(data)). Currently supports a non-negative numeric vector, "equal.within.pop" (default) and "equal.within.community". If "equal.within.pop", individuals in the entire data set are weighted equally (all weights are 1); if "equal.within.community", individuals within the same community are weighted equally (i.e., 1 / (number of individuals in the community)).
• community.step Method for dealing with hierarchical data; the user specifies one of four choices: "NoCommunity" (default), "community_level", "individual_level" and "PerCommunity". If "NoCommunity", the data are treated as having no hierarchical structure; if "community_level", the community-level TMLE is used; if "individual_level", the individual-level TMLE is used together with the assumption of no covariate interference; finally, if "PerCommunity", a stratified TMLE is used. If communityID = NULL, the estimation automatically pools over all communities (i.e., the data are treated as "NoCommunity").
• communityID Optional column name or index in data representing the community identifier variable. If known, it supports the three hierarchical options within community.step: "community_level", "individual_level" and "PerCommunity".
• community.wts Optional choice to provide or construct a matrix of community-level observation weights (of dimension J × 2, where J is the number of communities). The first column contains the identifiers/names of the communities (i.e., data[, communityID]) and the second column contains the corresponding non-negative weights. Currently supports a numeric matrix with 2 columns, "size.community" (default) and "equal.community". If community.wts = "size.community", the number of individuals within each community is used as its weight; if community.wts = "equal.community", all weights are set to 1.
• pooled.Q Logical for incorporating hierarchical data when estimating the outcome mechanism. If
TRUE, a pooled individual-level regression is used for the initial estimation of the mean outcome (i.e., the outcome mechanism). Defaults to FALSE.
• f_g0 Optional function used to specify model knowledge about the values of Anodes. It estimates P(A | W, E) under g0 by sampling a large vector/data frame of Anodes values (of length nrow(data)*n_MCsims, or that number of rows for a data frame) from the f_g0 function.
• f_gstar1 Either a function, a vector, or a matrix/data frame of counterfactual exposures, depending on the number of exposure variables. If a matrix/data frame, its number of rows must be either nrow(data) or 1 (a constant exposure assigned to all observations), and its number of columns must equal length(Anodes). Note that the column names should match the names in Anodes. If a vector, it must be of length nrow(data) or 1. If a function, it must return a vector or a data frame of counterfactual exposures sampled based on Anodes and WEnodes (and possibly communityID), passed as a named argument "data"; thus, the function must include "data" as one of its argument names. The interventions defined by f_gstar1 can be static, dynamic or stochastic.
• f_gstar2 Either a function, a vector, or a matrix/data frame of counterfactual exposures, depending on the number of exposure variables. It has the same components and requirements as f_gstar1.
• Qform
Character vector of regression formula for Ynode. If not specified (i.e.,
NULL), the outcome variable is regressed on all covariates included in Anodes and WEnodes (i.e.,
Ynode ~ Anodes + WEnodes ). • Qbounds
Vector of upper and lower bounds on Y and on the predicted values for the initial Q. Defaults to the range of Y, widened by 10% of the minimum and maximum values.
• alpha Used to keep the predicted values for the initial Q bounded away from (0, 1) for the logistic fluctuation (sets Qbounds to (1 − alpha, alpha)).
• fluctuation Defaults to "logistic"; it can also be "linear" (for the targeting step).
• hform.g0 Character vector of regression formula for estimating the conditional density P(A | W, E) under the observed treatment mechanism g. If not specified, its form will be Anodes ~ WEnodes. If there is more than one exposure, a joint probability is fit.
• hform.gstar Character vector of regression formula for estimating the conditional density P(A | W, E) under the user-supplied interventions f_gstar1 or f_gstar2. If not specified, the same regression formula as in hform.g0 is used.
• lbound Value between (0, 1) for truncation of the predicted P(A | W, E). Defaults to 0.005.
• h.g0_GenericModel An object of
GenericModel R6 class containing the previouslyfitted models for P ( A | W, E ) under observed treatment mechanism g , one of the returnsof tmleCommunity function. If known, predictions for P ( A = a | W = w, E = e ) under g are based on the fitted models in h.g0_GenericModel . • h.gstar_GenericModel An object of
GenericModel R6 class containing the previously fitted models for P(A* | W, E) under intervention gstar, one of the returns of the tmleCommunity function. If known, predictions for P(A = a | W = w, E = e) under gstar are based on the fitted models in h.gstar_GenericModel.
• TMLE.targetStep
TMLE targeting step method, either "tmle.intercept" (default) or "tmle.covariate".
• n_MCsims Number of simulations for the Monte-Carlo analysis. Each simulation generates new exposures under f_gstar1 or f_gstar2 (if specified) or f_g0 (if specified), with a sample size of nrow(data). These generated exposures are then used when fitting the conditional densities P(A | W, E) and when estimating IPTW and GCOMP under intervention f_gstar1 or f_gstar2. Note that a deterministic intervention needs only one simulation, while a stochastic intervention may use more simulations, such as 10 (defaults to 1).
• CI_alpha Significance level used in constructing a confidence interval. Defaults to 0.05.
• rndseed Random seed controlling the sampling of A under f_gstar1 or f_gstar2 (for reproducibility of the Monte-Carlo simulations).
• verbose Flag. If TRUE, status messages are printed. Defaults to FALSE. It can be turned on by setting options(tmleCommunity.verbose = TRUE).

The following arguments are specified through the tmleCom_Options function. For full details, see the documentation for the tmleCommunity package (cite ***).
• Qestimator
A string specifying the default estimator for outcome mechanism model fitting. The default estimator is "speedglm__glm", which estimates regressions with speedglm.wfit; estimator "glm__glm" uses glm.fit; estimator "h2o__ensemble" implements the super learner ensemble (stacking) algorithm through the H2O R interface; estimator "SuperLearner" implements the super learner prediction methods. Note that if "h2o__ensemble" fails, it falls back on "SuperLearner"; if "SuperLearner" fails, it falls back on "speedglm__glm"; if "speedglm__glm" fails, it falls back on "glm__glm".
• gestimator A string specifying the default estimator for exposure mechanism fitting. It has the same options as Qestimator.
• bin.method The method for choosing bins when discretizing the conditional continuous exposure variable A. The default method is "equal.mass", which provides a data-adaptive selection of the bins based on equal mass/area, i.e., each bin will contain approximately the same number of observations. Method "equal.len" partitions the range of A into nbins intervals of equal length. Method "dhist" uses a combination of the two approaches; see Denby and Mallows, "Variations on the Histogram" (2009), for details.
• nbins When bin.method = "equal.len", the user-supplied number of bins for discretizing a continuous variable. If not specified, defaults to 5; if set to NA, it is set to the nearest integer of nobs/maxNperBin, where nobs is the total number of observations in the input data. When the method is "equal.mass", nbins is set to the maximum of the default nbins and the nearest integer of nobs/maxNperBin.
• maxncats Integer specifying the maximum number of unique categories a categorical variable
A[j] can have. If
A[j] has more unique categories, it is automaticallyconsidered a continuous variable. Default to 10. • maxNperBin Integer that specifies the maximum number of observations in each binwhen discretizing a continuous variable
A[j] (applies directly when bin.method ="equal.mass" and indirectly when bin.method = "equal.len" , but nbins = NA ). • parfit Logical. If
TRUE, perform parallel regression fits and predictions for discretized continuous variables via the foreach and dopar functions in the foreach package. Defaults to FALSE. Note that this requires registering a parallel backend prior to running the tmleCommunity function, e.g., loading the doParallel R package and running registerDoParallel(cores = ncores) for ncores parallel jobs.
• poolContinVar Logical. If
TRUE , when fitting a model for binarized continuous variable,pool bin indicators across all bins and fit one pooled regression. Default to
FALSE . • savetime.fit.hbars Logical. If
TRUE, skip estimation and prediction of the exposure mechanism P(A | W, E) under g and g* when f_gstar1 = NULL and TMLE.targetStep = "tmle.intercept", and directly set h_gstar_h_gN = 1 for each observation. Defaults to
TRUE.
• h2ometalearner A string to pass to h2o.ensemble, specifying the prediction algorithm used to learn the optimal combination of the base learners. Supports both h2o and SuperLearner wrapper functions. Defaults to "h2o.glm.wrapper".
• h2olearner A string or character vector to pass to h2o.ensemble, naming the prediction algorithm(s) used to train the base models for the ensemble. The functions must have the same format as the h2o wrapper functions. Defaults to "h2o.glm.wrapper".
• CVfolds The number of splits for the V-fold cross-validation step, passed to SuperLearner and h2o.ensemble. Defaults to 5.
• SL.library A string or character vector of prediction algorithms to pass to SuperLearner. Defaults to c("SL.glm", "SL.step", "SL.glm.interaction"). For more available algorithms, see SuperLearner::listWrappers().
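To make the difference between the two main bin.method choices concrete, the following base-R sketch (an illustration of the binning semantics, not the package's internal code) computes equal-mass cutoffs via empirical quantiles and equal-length cutoffs via an evenly spaced grid:

```r
set.seed(1)
A <- rexp(1000)  # a skewed continuous exposure

nbins <- 5
# bin.method = "equal.mass": data-adaptive cutoffs, ~equal counts per bin
cuts_mass <- quantile(A, probs = seq(0, 1, length.out = nbins + 1))
# bin.method = "equal.len": partition range(A) into nbins equal-length intervals
cuts_len <- seq(min(A), max(A), length.out = nbins + 1)

# equal-mass bins hold (nearly) the same number of observations...
table(cut(A, cuts_mass, include.lowest = TRUE))
# ...while equal-length bins do not for skewed data
table(cut(A, cuts_len, include.lowest = TRUE))
```

For a heavily skewed exposure, the equal-length bins concentrate most observations in the first interval, which is why "equal.mass" is the data-adaptive default.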
4. Simulation studies with community-level interventions
We perform a simulation study evaluating the finite sample bias and variance of the TMLEs presented in sections 2.7.1 and 2.8.1, including both the community-level and the individual-level TMLE. In addition, we compare the performance of the TMLE with that of the inverse-probability-of-treatment-weighted estimator (IPTW) and the parametric G-computation formula estimator (GCOMP). In order to estimate the average causal effect of community-level intervention(s) at a single time point on an individual-based outcome, we simulate a data set of J communities (indexed by j), each containing a non-fixed number of individuals n_j, where n_j is drawn from a normal distribution with mean 50 and standard deviation 10 and rounded to the nearest integer. First, we sample i.i.d. community-level baseline covariates (E1, E2). Then 3 dependent individual-level baseline covariates (W1, W2, W3) are drawn as functions of the community-level baseline covariates: W1 is Bernoulli with a logit-linear dependence on (E1,j, E2,j), and (W2, W3) are bivariate normal with means depending on (E1,j, E2,j) and a fixed covariance matrix Σ. A community-level continuous treatment A is then sampled conditionally on the values of all baseline covariates,

A_j ~ N(−1.2 + 0.8 E1,j + 0.21 E2,j + 3 W^c_{1,j} − 0.7 W^c_{2,j} + 0.3 W^c_{3,j}, 1),

where W^c_{k,j} = (1/n_j) Σ_{i=1}^{n_j} W_{k,i,j} denotes the community-level mean of the k-th individual-level covariate. A truncated stochastic intervention g* is then defined by shifting the normal density of the observed A by a known constant shift > 0, unless the likelihood ratio g*/g exceeds a known constant bound truncbd, in which case the intervention keeps the observed exposure A unchanged. Thus the post-intervention exposure A*_j is distributed as

A*_j = A_j + shift,  if exp{1.5 · shift · (A_j − μ(E_j, W^c_j) − shift/4)} ≤ truncbd,
A*_j = A_j,          otherwise,

where μ(E_j, W^c_j) = −1.2 + 0.8 E1,j + 0.21 E2,j + 3
W^c_{1,j} − 0.7 W^c_{2,j} + 0.3 W^c_{3,j}. Last, the individual-level binary outcome Y, a function of the treatment and all baseline covariates, is simulated; the post-intervention outcome Y*, under the stochastic intervention g*, is defined analogously with A_j replaced by A*_j:

• Case 1: Working model holds. Y_{i,j} ~ Bern(expit(β0 + β1 A_j + β2 E1,j + β3 E2,j + β4 W1,i,j + β5 W2,i,j + β6 W3,i,j)), i.e., each individual's outcome depends only on the treatment, the community-level covariates and that individual's own covariates.

• Case 2: Working model is not a reasonable approximation. The outcome model additionally includes the community-level means W^c_{1,j}, W^c_{2,j}, W^c_{3,j} with large coefficients, so that each individual's outcome depends on the covariates of the other individuals in the same community and the assumption of no covariate interference fails.

The next code chunk shows how to simulate a data set according to the previous data-generating distributions when the working model fails. The code also defines the stochastic intervention g* that we are interested in: a constant shift of 1, given a truncation bound of 5. The true parameter value under this stochastic intervention is approximated by Monte-Carlo simulation of a large population, as shown below.
R> getY <- function(A, E1, E2, W1, W2, W3, bs, n.ind) {
R+   prob.Y <- plogis(bs[1] + bs[2] * A + bs[3] * E1 + bs[4] * E2
R+                    + bs[5] * mean(W1) + bs[6] * mean(W2) + bs[7] * mean(W3)
R+                    + bs[8] * W1 + bs[9] * W2 + bs[10] * W3)
R+   rbinom(n = n.ind, size = 1, prob = prob.Y)
R+ }
R>
R> get.cluster.Acont <- function(id, n.ind, truncBD = 5, shift = 1,
R+                               working.model = T) {
R+   ...
R+ }
R>
R> get.fullDat.Acont <- function(J, n.ind, truncBD, shift, working.model,
R+                               only.Y = FALSE) {
R+   if (...) {  # don't fix the number of obs in each community
R+     n.ind <- round(rnorm(J, n.ind, 10))
R+     n.ind[n.ind <= 0] <- n.ind
R+   }
R+   if (only.Y) { id <- Y <- Y.gstar <- NULL } else { full.dat <- NULL }
R+   for (j in 1:J) {
R+     cluster.data.j <- get.cluster.Acont(working.model = working.model,
R+                                         id = j, n.ind = n.ind[j],
R+                                         truncBD = truncBD, shift = shift)
R+     if (only.Y) {
R+       id <- c(id, cluster.data.j[, "id"])
R+       Y <- c(Y, cluster.data.j[, "Y"])
R+       Y.gstar <- c(Y.gstar, cluster.data.j[, "Y.gstar"])
R+     } else {
R+       full.dat <- rbind(full.dat, cluster.data.j)
R+     }
R+     if (!only.Y) { full.dat$id <- as.integer(full.dat$id) }
R+   }
R+   ifelse(only.Y,
R+          return(data.frame(cbind(id, Y, Y.gstar))), return(full.dat))
R+ }
R>
R> PopDat.wmF <- get.fullDat.Acont(J = 4000, n.ind = 1000, truncBD = 5,
R+                                 shift = 1, working.model = F, only.Y = T)
R> PopDat.wmF.agg <- aggregate(PopDat.wmF, by = list(PopDat.wmF$id), mean)
R> truth.wmF <- mean(PopDat.wmF.agg$Y.gstar)
R>
R> comSample.wmF <- get.fullDat.Acont(
R+   J = 1000, n.ind = 50, truncBD = 5, shift = 1, working.model = F)
R> comSample.wmF$Y.gstar <- NULL
R>
R> define_f.gstar <- function(shift.val, truncBD, rndseed = NULL) {
R+   f.gstar <- function(data, ...) {
R+     set.seed(rndseed)
R+     A.mu <- -1.2 + 0.8 * data$E1 + 0.21 * data$E2 +
R+       3 * mean(data$W1) - 0.7 * mean(data$W2) + 0.3 * mean(data$W3)
R+     untrunc.A <- rnorm(n = nrow(data), mean = A.mu + shift.val, sd = 1)
R+     r.new.A <- exp(1.5 * shift.val * (untrunc.A - A.mu - shift.val / 4))
R+     trunc.A <- ifelse(r.new.A > truncBD, untrunc.A - shift.val, untrunc.A)
R+     return(trunc.A)
R+   }
R+   return(f.gstar)
R+ }
R> f.gstar <- define_f.gstar(shift.val = 1, truncBD = 5)
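The truncation logic inside f.gstar above can be isolated and checked on its own. The following standalone sketch (the helper name `shift_with_truncation` is ours) applies the same rule: a shifted draw a is kept only when the implied density ratio stays below truncBD; otherwise the shift is reverted:

```r
# Standalone version of the truncation rule used in f.gstar: keep the
# shifted draw a (drawn around mu + shift) unless the ratio
# exp(1.5 * shift * (a - mu - shift/4)) exceeds truncBD, in which case
# revert the shift.
shift_with_truncation <- function(a, mu, shift = 1, truncBD = 5) {
  ratio <- exp(1.5 * shift * (a - mu - shift / 4))
  ifelse(ratio > truncBD, a - shift, a)
}

# a close to mu: the ratio is small, so the shift is kept
shift_with_truncation(a = 0.5, mu = 0)  # returns 0.5
# a far above mu: the ratio exceeds truncBD = 5, so the shift is reverted
shift_with_truncation(a = 2.0, mu = 0)  # returns 1.0
```

With shift = 1 and truncBD = 5, the switch-over occurs where exp(1.5 (a − mu − 0.25)) = 5, i.e., at a − mu ≈ 1.32.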
We first demonstrate how to use the two distinct approaches for leveraging a hierarchical data structure. Recall that the first approach treats the community, rather than the individual, as the unit of analysis and performs estimation on the aggregated data. It can also incorporate the hierarchical structure when estimating the outcome mechanism, by adding a single pooled individual-level regression to the Super Learner library. The second approach, on the other hand, runs pooled individual-level regressions for both the outcome and treatment mechanisms, because it utilizes the pairing of individual-level covariates and outcomes. Note that all approaches use the equal-mass method for choosing bins (for the continuous exposure) and speedglm as the estimator for both the outcome and exposure mechanisms. Parameter estimates are obtained from 200 repetitions of the simulation.
R> niterations <- 200
Table 1: Simulation study 1. Simulation-based performance of the TMLE, IPTW and Gcomp estimators with stochastic exposures over 200 repetitions of the simulation, when the working model holds and when it fails. ψ̂ denotes the average point estimate, "Bias" the absolute difference between the estimate ψ̂ and the truth ψ0, σ̂ the average standard error estimate, rMSE the root mean squared error, and "Cover" the proportion of times that the truth falls within the 95% CI. All outcome and treatment mechanisms are correctly specified. All reported bias, SE, rMSE and coverage values are multiplied by 100.

             Working Model Holds                     Working Model Fails
Estimator    ψ̂      Bias   σ̂     rMSE  Cover    ψ̂      Bias   σ̂     rMSE  Cover
TMLE-Ia     55.75   0.18   0.60   0.62   89.5   56.47   0.69   1.36   1.52   86.5
TMLE-Ib     56.48   0.91   0.64   1.12   68.5   56.50   0.72   1.60   1.75   89.5
TMLE-II     55.71   0.13   0.39   0.41   84.0   57.59   1.81   1.48   2.34   69.0
IPTW-I      56.63   1.06   2.60   2.81   100    56.16   0.38   2.91   2.94   100
IPTW-II     55.67   0.10   3.44   3.44   100    57.23   1.45   4.23   4.47   100
Gcomp-I     55.83   0.26   0.60   0.65   90.5   56.44   0.67   1.36   1.51   85.5
Gcomp-II    55.75   0.18   0.39   0.43   87.5   57.57   1.79   1.48   2.33   71.0

The results displayed in Table 1 compare the performance of the two TMLEs when the assumption of "no covariate interference" holds and when it fails badly. As predicted by theory, the community-level targeted estimator (TMLE-Ia), which is always based on the aggregated data, performs well in both situations with negligible bias. However, the coverage rates of its influence-curve-based confidence intervals are lower than nominal (89.5% and 86.5%) due to small variance estimates. TMLE-Ib, which uses a pooled individual-level outcome regression followed by a community-level stochastic intervention, performs slightly worse than TMLE-Ia in this case. When the working model holds, the coverage rate of TMLE-Ib (68.5%) shows a notable decrease compared to the other estimators because of a relatively large bias. As expected, the individual-level targeted estimator (TMLE-II) is biased and its confidence interval coverage drops markedly when the working model fails. Surprisingly, even though the working model is not a reasonable approximation, the IPTW using the individual-level stochastic intervention (IPTW-II) provides a reasonable estimate.

We now consider another simulation study, with binary community-level exposure(s), of the kind commonly used in studies of HIV prevention and treatment.
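The summary statistics reported in Table 1 can be reproduced from the vectors of point estimates and standard errors saved across the simulation repetitions. A minimal sketch follows (with made-up inputs for illustration, not the actual simulation output):

```r
# Sketch: computing Table-1-style summaries from niterations point
# estimates (psi.hat) and influence-curve standard errors (se.hat).
summarize_sims <- function(psi.hat, se.hat, psi0) {
  covered <- (psi0 >= psi.hat - 1.96 * se.hat) &
             (psi0 <= psi.hat + 1.96 * se.hat)
  c(est   = mean(psi.hat),                    # average point estimate
    bias  = abs(mean(psi.hat) - psi0),        # absolute bias
    se    = mean(se.hat),                     # average SE estimate
    rMSE  = sqrt(mean((psi.hat - psi0)^2)),   # root mean squared error
    cover = 100 * mean(covered))              # 95% CI coverage (%)
}

set.seed(1)
psi0 <- 0.5575                       # illustrative truth
psi.hat <- rnorm(200, psi0, 0.006)   # 200 simulated point estimates
se.hat <- rep(0.006, 200)
round(summarize_sims(psi.hat, se.hat, psi0), 4)
```

Multiplying the bias, SE and rMSE entries by 100 yields values on the scale reported in the tables.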
Similar to the previous simulation, we generate 200 samples of J = 100 communities, each containing n_j observations, where n_j is drawn from a normal distribution with mean 50 and rounded to the nearest integer. Two individual-level baseline covariates are sampled i.i.d. across the i = 1, ..., n_j individuals in community j: W1 is Bernoulli and W2 is normally distributed, with community-level means W^c_{1,j} = (1/n_j) Σ_{i=1}^{n_j} W_{1,i,j} and W^c_{2,j} = (1/n_j) Σ_{i=1}^{n_j} W_{2,i,j}. A binary community-level treatment is then assigned as A_j ~ Bern(expit(α1 W^c_{1,j} + α2 W^c_{2,j})). The mechanism differs from the previous study only in the outcome distribution:

• Case 1: Working model is a reasonable approximation. Y_{i,j} ~ Bern(expit(0.15 + β1 A_j + β2 W^c_{1,j} + 2 W_{1,i,j} + β3 W_{2,i,j})).

• Case 2: Working model is not a reasonable approximation. Y_{i,j} ~ Bern(expit(0.15 + γ1 A_j + 3 W^c_{1,j} − γ2 W^c_{2,j} − γ3 W_{1,i,j} + W_{2,i,j})), where the strong dependence on the community-level means again breaks the "no covariate interference" assumption.

As before, Table 2 summarizes the performance of the estimators under the different outcome-generating distributions. First, TMLE-Ia performs well in both situations, with negligible bias and good confidence interval coverage. As expected, TMLE-Ib performs similarly to TMLE-Ia when the working model holds, and worse than TMLE-Ia, in terms of bias and variance, when it fails. TMLE-II, on the other hand, performs very well when the working model provides a reasonable approximation, but exhibits slight increases in bias and variance (and thus a more conservative confidence interval) when the working model fails. Theoretically, TMLE-II works with lower-dimensional objects of size N = Σ_{j=1}^{J} N_j and so may improve finite sample efficiency if the working model holds. However, when the working model does not hold, misspecification of both the outcome and treatment regressions causes biased estimates and a loss of efficiency. In addition, both IPTW-I (with the community-level g) and IPTW-II (with the individual-level g) have larger variances than the other estimators, and consequently 100% coverage rates. This can be explained by the relatively large variability of the IPTW estimator, despite the large sample size: the range of the estimated IPTW values can be wide, which results in a large variance.

Table 2: Simulation study 2. Simulation-based performance of the estimators with a binary community-level exposure over 200 repetitions of the simulation, when the working model holds and when it fails.

             Working Model Holds            Working Model Fails
Estimator    Bias    σ̂      rMSE   Cover    Bias    σ̂      rMSE   Cover
TMLE-Ia      0.03    1.15    1.16    95.0    0.03    1.07    1.08    92.5
TMLE-Ib      0.16    1.16    1.17    95.0    0.25    1.24    1.26    97.5
TMLE-II      0.01    1.14    1.14    95.0    0.04    1.22    1.22    96.0
IPTW-I       0.02    3.79    3.79   100      0.06    3.46    3.46   100
IPTW-II      0.04   15.99   15.99   100      0.02   17.56   17.56   100
Gcomp-I      0.03    1.15    1.16    95.5    0.03    1.07    1.08    91.5
Gcomp-II     0.01    1.14    1.14    95.0    0.04    1.22    1.22    96.0

In the third simulation, we study the special case where each community has only one observation (i.e., N_j = 1) and the intervention is stochastic. As described in section 2.7.3, this is similar to data with only community-level baseline covariates (i.e., treating (E, W) = E). The data-generating distribution is as follows: four independent community-level baseline covariates are drawn, two Bernoulli (E1, E2), one normal (E3) and one uniform (E4); the continuous exposure A, given (E1, E2, E3, E4), is normally distributed with mean μ(E1, E2, E3, E4) linear in the covariates; and the continuous outcome Y, given (A, E1, E2, E3, E4), is normally distributed with a mean linear in A and the covariates (intercept 3.63). The shifted exposure A* is distributed as

A*_j = A_j + shift,  if exp{1.5 · shift · (A_j − μ(E1, E2, E3, E4) − shift/4)} ≤ truncbd,
A*_j = A_j,          otherwise.

Given a shift of 2 and a truncation bound of 10, we estimate the marginal treatment effect of the individual-based intervention. Q̄(A, E) is estimated with the correctly specified main-terms regression model and with a misspecified regression model adjusting only for A and one baseline covariate. Similarly, g*(a | E) is estimated with a correctly specified model as well as a misspecified model adjusting for only one covariate.

Figure 1: Box plots of the point estimates from the three estimators for sample sizes n = 1000 (left) and n = 5000 (right) in Simulation study 3. The x-axis denotes the combination of the estimator and the model specification: CC indicates a correctly specified outcome regression and exposure mechanism; CM indicates the outcome regression is correctly specified while the exposure mechanism is misspecified; MC indicates the exposure mechanism is correctly specified while the outcome regression is misspecified. The dashed line indicates the true value ψ0.

The simulation results are consistent with the theoretical predictions. TMLE performs quite well if either the outcome regression or the exposure mechanism is correctly specified. IPTW exhibits low bias when g*(a | E) is correctly specified, but is biased otherwise; this bias decreases but does not disappear as the sample size increases. Moreover, IPTW has a much higher variance than the other estimators even with a correctly specified g*(a | E), which may be explained by practical positivity violations: small values of g*(a | E) place large weights on a few individuals.
Weight truncation is a possible remedy for such practical violations: more restrictive bounds can be imposed on g* (in this simulation analysis, a less restrictive set of bounds was used). When the model for Q̄(A, E) is misspecified, MLE performs poorly in terms of precision, but it is unbiased when Q̄(A, E) is correctly specified. Furthermore, a larger sample size does help reduce the variance.
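The weight-truncation remedy mentioned above corresponds to the lbound argument: predicted values of the exposure density are bounded away from zero before the inverse-probability weights are formed. A minimal sketch of such bounding (plain R, not the package's internal code; the helper name `bound_g` is ours):

```r
# Sketch: truncating predicted densities/probabilities at a lower bound,
# in the spirit of the lbound argument (default 0.005), so that no single
# observation receives an extreme inverse-probability weight.
bound_g <- function(g.pred, lbound = 0.005, ubound = 1) {
  pmin(pmax(g.pred, lbound), ubound)
}

g.pred <- c(0.0001, 0.02, 0.50, 0.999)
bound_g(g.pred)      # the near-zero prediction is raised to 0.005
1 / bound_g(g.pred)  # weights are now capped at 1 / 0.005 = 200
```

A larger lbound produces more stable but potentially more biased IPTW-type estimates, which is exactly the trade-off discussed above.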
5. Discussion
The tmleCommunity package was developed to offer a flexible, easily customizable implementation of the TMLE algorithm for hierarchical data structures, along with arbitrary multivariate community-level interventions. The main class of causal parameters that is estimated is the mean counterfactual outcome under user-specified interventions; a basic analysis requires only the arguments Ynode, Anodes, WEnodes and f_gstar1. On that basis, experienced users can control the estimation procedure by providing user-supplied regression models for Q̄, g and g*, and by choosing among the methods allowed for the arguments, such as the method for dealing with hierarchical data, whether the hierarchical structure is used to estimate Q̄, a linear or logistic fluctuation for targeting, and the TMLE targeting step method. Notably, obs.wts and community.wts can be used to correct for case-control sampling (when the outcome is rare). In addition, the tmleCommunity function can either internally estimate all factors of the likelihood, or use the values of g_n and g*_n from an external estimation procedure through h.g0_GenericModel and h.gstar_GenericModel. The choices of data-adaptive machine learning techniques and other more advanced estimation methods can be specified in the tmleCom_Options function.

Planned extensions to the package include several areas. First, we plan to include TMLE estimation of causal effects of multiple time-point interventions, adjusting for time-dependent covariates in hierarchical longitudinal data. Second, since considering only complete cases is inefficient and may cause bias when missingness is informative, the package will be extended to allow missingness in the outcome vector, so that the corresponding covariate information can be utilized to reduce bias and increase efficiency. Third, as mentioned in section 2.7.2, when one individual's outcome is affected by the individual-level covariates of a subset of other individuals from the same community, the strength of the "no covariate interference" assumption should be weakened by including this knowledge of dependence; another planned addition to the package will allow estimation of the community-level TMLE under this setting.

Additionally, this package estimates variances and standard errors through estimated influence curves.
Double robustness makes these estimates asymptotically correct if both the outcome and treatment mechanisms are estimated consistently at reasonable rates, and conservative if only one of them is estimated consistently. However, variance estimation is difficult when violations or near violations of positivity occur in finite samples due to chance (Petersen et al.). Moreover, the stochastic interventions considered here are defined by modifying the observed exposure mechanism g by a known shift function. In practice, a stochastic intervention g* could also be unknown (i.e., no longer a function of g). If we consider the estimation of an optimal treatment rule, where the rule is defined to maximize the mean outcome under treatment, then without cross-validation we would use the same information from the observed data both to estimate the user-specified mechanism g* and to estimate the mean outcome under the fitted mechanism, which may result in finite sample bias. According to van der Laan and Luedtke (2015) and Luedtke and van der Laan (2016), the cross-validated TMLE (cv-TMLE) approach avoids empirical process conditions: for each sample split, it estimates an empirical mean over a validation sample, under a stochastic (or deterministic) intervention estimated based on the training sample. It may therefore reduce finite sample bias, and including cv-TMLE in the package is another direction for future work.
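For reference, the influence-curve-based variance estimate has the standard form Var̂(ψ̂) = Var̂(IC)/n, with a Wald-type confidence interval built around the point estimate. A minimal sketch with a simulated influence-curve vector (illustrative only; the helper name `ic_confint` is ours):

```r
# Sketch: influence-curve-based standard error and Wald-type 95% CI,
# psi.hat +/- qnorm(0.975) * sqrt(var(IC) / n).
ic_confint <- function(psi.hat, IC, level = 0.95) {
  n <- length(IC)
  se <- sqrt(var(IC) / n)
  z <- qnorm(1 - (1 - level) / 2)
  c(est = psi.hat, se = se,
    lower = psi.hat - z * se, upper = psi.hat + z * se)
}

set.seed(1)
IC <- rnorm(1000)  # estimated influence curve values (mean ~ 0)
ic_confint(psi.hat = 0.56, IC = IC)
```

When only one nuisance mechanism is estimated consistently, this interval is conservative in the sense described above.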
6. Answers to some frequently asked questions (FAQ)
Can I call the tmleCommunity function a second time without having to redo the estimation of the exposure mechanism?
Yes. Users can use the command result$EY_gstar1$h.g0_GenericModel to obtain an object of GenericModel R6 class containing the previously fitted models for P(A | W, E) under the observed mechanism g (assuming the result of the first call to tmleCommunity is stored in a variable named result, and only one intervention function f_gstar1 was supplied). Similarly, an object of GenericModel class containing the previously fitted models for P(A | W, E) under intervention f_gstar1 is returned as result$EY_gstar1$h.gstar_GenericModel. The two objects can be passed into the second call to tmleCommunity through the h.g0_GenericModel and h.gstar_GenericModel arguments, respectively. Assuming we are using the simulated data and the first estimation result from section 3.5, the next code chunk illustrates how this is done.
R> tmleCom_gc_default2 <- tmleCommunity(
R+   data = indSample.cA.cY, rndseed = 1, Ynode = "Y", Anodes = "A",
R+   WEnodes = c("W1", "W2", "W3", "W4"), Qform = "Y ~ W1 + W2 + A",
R+   h.g0_GenericModel = tmleCom_gc_default$EY_gstar1$h.g0_GenericModel,
R+   h.gstar_GenericModel = tmleCom_gc_default$EY_gstar1$h.gstar_GenericModel)
R> cbind(tmleCom_gc_default2$EY_gstar1$estimates,
R+       tmleCom_gc_default2$EY_gstar1$vars)
Can I define and fit the multivariate conditional density under a user-specified intervention function directly, without having to call the tmleCommunity function?
Yes, the package provides an individual function named fitGenericDensity to define and fit regression models for the conditional density P(A = a | W = w), where a is generated under a user-specified arbitrary (static, dynamic or stochastic) intervention function. Its arguments are similar to those for estimating treatment mechanisms in tmleCommunity, except that hierarchical data structures are not supported in this function. Therefore, this function is purely for estimating the multivariate conditional density.

With the same data set simulated in Section 3.5, we may be interested in the mean counterfactual outcome under a stochastic intervention g∗ where the observed A is shifted to the left by half of its mean.

R> define_f.gstar <- function(shift.rate, ...) {
R+   eval(shift.rate)
R+   f.gstar <- function(data, ...) {
R+     print(paste0("rate of shift: ", shift.rate))
R+     shifted.new.A <- data[, "A"] - mean(data[, "A"]) * shift.rate
R+     return(shifted.new.A)
R+   }
R+   return(f.gstar)
R+ }
R> f.gstar <- define_f.gstar(shift.rate = 0.5)
R> tmleCom_Options(maxNperBin = N, bin.method = "dhist", nbins = 8)
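A call to fitGenericDensity might then take a form like the following. This is only a sketch: the argument names Anodes, Wnodes, and f_gstar are assumptions about the interface, not verified here, and should be checked against the package manual before use.

R> ## Sketch only -- argument names below are assumed, not verified
R> fit.gstar <- fitGenericDensity(
R+   data = indSample.cA.cY,               # same simulated data as in Section 3.5
R+   Anodes = "A",                          # assumed name for the exposure node argument
R+   Wnodes = c("W1", "W2", "W3", "W4"),    # assumed name for the covariate argument
R+   f_gstar = f.gstar)                     # the shifted intervention defined above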
Are there any sample data sets provided in the package, so that users can try analyses on them?
Yes, the package comes with four sample datasets. comSample.wmT.bA.bY_list is an example of hierarchical data containing a community-level binary exposure with an individual-level binary outcome, and indSample.iid.cA.cY_list is an example of non-hierarchical data containing a continuous exposure with a continuous outcome. Another non-hierarchical sample dataset is indSample.iid.bA.bY.rareJ1_list, which contains a binary exposure with a rare outcome (i.e., an independent case-control scenario where J = 1). Besides, the data structure of the last dataset, indSample.iid.bA.bY.rareJ2_list, is identical to that of indSample.iid.bA.bY.rareJ1_list, except that now the ratio of the number of controls to the number of cases, J, is 2.

Can the tmleCommunity package handle panel data transformation before performing TMLE analysis?
Yes. The panelData_Trans function provides a wide variety of data transformations for panel datasets, such as the fixed effects and pooling models. It also allows users to apply the transformation only to the regressors of interest, instead of to the entire dataset. For example, before running the tmleCommunity function on the data set simulated in Section 4.1, where the working model fails, suppose we want to apply a fixed effects transformation in which an individual effect is introduced; then we can use
R> pData.FE <- panelData_Trans(
R+   data = comSample.wmF, xvar = c("E1", "E2", "W1", "W2", "W3", "A"),
R+   yvar = "Y", index = "id", effect = "individual",
R+   model = "within", transY = TRUE)
Besides, we can keep the outcome variable fixed during the panel transformation by setting transY = FALSE. Additional details can be found in the package manual at https://github.com/chizhangucb/tmleCommunity/blob/master/tmleCommunity_Package_Documentation.pdf.
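For instance, the same transformation with the outcome kept untransformed would be (the object name pData.FE.rawY is illustrative):

R> pData.FE.rawY <- panelData_Trans(
R+   data = comSample.wmF, xvar = c("E1", "E2", "W1", "W2", "W3", "A"),
R+   yvar = "Y", index = "id", effect = "individual",
R+   model = "within", transY = FALSE)   # outcome Y is left untransformed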
7. Acknowledgments
This work was supported by the National Institutes of Health Director's New Innovator Award Program DP2HD080350 (PI: Jennifer Ahern).
References
Balzer LB, Zheng W, van der Laan MJ, Petersen ML, for the SEARCH Collaboration (2017). "A New Approach to Hierarchical Data Analysis: Targeted Maximum Likelihood Estimation of Cluster-Based Effects Under Interference." ArXiv e-print arXiv:1706.02675.

Bembom O, van der Laan MJ (2007). "A Practical Illustration of the Importance of Realistic Individualized Treatment Rules in Causal Inference." Electronic Journal of Statistics, 1, 574–596.

Boyd RC, Wooden TD, Munro MA, Liu T, Have TT (2008). "The Impact of Community Violence Exposure on Anxiety in Children of Mothers with Depression." Journal of Child & Adolescent Trauma, 1(4), 287–300.

Brooks-Gunn J, Aber JL, Duncan GJ (1997). Neighborhood Poverty: Context and Consequences for Children. Volume I. Russell Sage Foundation.

Coyle JR, Hejazi NS, Malenica I, Sofrygin O (2018). sl3: Modern Pipelines for Machine Learning and Super Learning. R package version 1.1.0, URL https://doi.org/10.5281/zenodo.1342294.

Denby L, Mallows C (2009). "Variations on the Histogram." Journal of Computational and Graphical Statistics, 18(1), 21–31.

Díaz Muñoz I, van der Laan MJ (2011). "Population Intervention Causal Effects Based on Stochastic Interventions." Biometrics, 68(2), 541–549.

Enea M (2017). speedglm: Fitting Linear and Generalized Linear Models to Large Data Sets. R package version 0.3-2, URL https://CRAN.R-project.org/package=speedglm.

Gardiner JC, Luo Z, Roman LA (2009). "Fixed Effects, Random Effects and GEE: What Are the Differences?" Statistics in Medicine, 28(2), 221–239.

Gruber S, van der Laan MJ (2010). "A Targeted Maximum Likelihood Estimator of a Causal Effect on a Bounded Continuous Outcome." The International Journal of Biostatistics, 6(1).

Gruber S, van der Laan MJ (2012). "tmle: An R Package for Targeted Maximum Likelihood Estimation." Journal of Statistical Software, 51(13).

Hastie T (2017). gam: Generalized Additive Models. R package version 1.14.4, URL https://CRAN.R-project.org/package=gam.

Kling JR, Ludwig J, Katz LF (2005). "Neighborhood Effects on Crime for Female and Male Youth: Evidence From a Randomized Housing Voucher Experiment." Quarterly Journal of Economics, 120(1), 87–130.

Laird NM, Ware JH (1982). "Random-Effects Models for Longitudinal Data." Biometrics, 38(4), 963–974.

LeDell E (2016). h2oEnsemble: H2O Ensemble Learning. R package version 0.1.8, URL https://github.com/h2oai/h2o-3/tree/master/h2o-r/ensemble/h2oEnsemble-package.

Liang KY, Zeger SL (1986). "Longitudinal Data Analysis Using Generalized Linear Models." Biometrika, 73(1), 13–22.

Luedtke AR, van der Laan MJ (2016). "Super-Learning of an Optimal Dynamic Treatment Rule." The International Journal of Biostatistics, 12(1).

Oakes J (2004). "The (Mis)estimation of Neighborhood Effects: Causal Inference for a Practicable Social Epidemiology." Social Science & Medicine, 58(10), 1929–1952.

Pearl J (1995). "Causal Diagrams for Empirical Research." Biometrika, 82(4), 702–710.

Pearl J (2009). "Causal Inference in Statistics: An Overview." Statistics Surveys, 3, 96–146.

Petersen ML, Porter KE, Gruber S, Wang Y, van der Laan MJ (2010). "Diagnosing and Responding to Violations in the Positivity Assumption." Statistical Methods in Medical Research, 21(1), 31–54.

Polley EC, LeDell E, Kennedy C, van der Laan MJ (2017). SuperLearner: Super Learner Prediction. R package version 2.0-22, URL https://CRAN.R-project.org/package=SuperLearner.

Prague M, Wang R, Stephens A, Tchetgen Tchetgen E, DeGruttola V (2016). "Accounting for Interactions and Complex Inter-Subject Dependency in Estimating Treatment Effect in Cluster-Randomized Trials with Missing Outcomes." Biometrics, 72(4), 1066–1077.

R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Raudenbush SW, Willms J (1995). "The Estimation of School Effects." Journal of Educational and Behavioral Statistics, 20(4), 307–335.

Robins J (1986). "A New Approach to Causal Inference in Mortality Studies with a Sustained Exposure Period - Application to Control of the Healthy Worker Survivor Effect." Mathematical Modelling, 7(9-12), 1393–1512.

Scott DW (1992). Multivariate Density Estimation. Wiley.

Sofrygin O, van der Laan MJ (2015). tmlenet: Targeted Maximum Likelihood Estimation for Network Data. R package version 0.1.0, URL https://CRAN.R-project.org/package=tmlenet.

Steele JL (2016). "Race and General Strain Theory: Examining the Impact of Racial Discrimination and Fear on Adolescent Marijuana and Alcohol Use." Substance Use & Misuse, 51(12), 1637–1648.

University of California San Francisco (2013). "Sustainable East Africa Research in Community Health (SEARCH)." URL https://clinicaltrials.gov/ct2/show/study/NCT01864603.

van der Laan MJ (2010). "Estimation of Causal Effects of Community Based Interventions." U.C. Berkeley Division of Biostatistics Working Paper Series, Working Paper 268.

van der Laan MJ (2014). "Causal Inference for a Population of Causally Connected Units." Journal of Causal Inference, 2(1).

van der Laan MJ, Luedtke AR (2015). "Targeted Learning of the Mean Outcome under an Optimal Dynamic Treatment Rule." Journal of Causal Inference, 3(1).

van der Laan MJ, Polley EC, Hubbard AE (2007). "Super Learner." Statistical Applications in Genetics and Molecular Biology, 6(1). doi:10.2202/1544-6115.1309.

van der Laan MJ, Rubin D (2006). "Targeted Maximum Likelihood Learning." The International Journal of Biostatistics, 2(1).

van der Laan MJ, Rose S (2011). Targeted Learning: Causal Inference for Observational and Experimental Data. Springer, New York.
Affiliation:
Chi Zhang
Division of Biostatistics
University of California, Berkeley
Berkeley, CA 94720
E-mail: [email protected]
Journal of Statistical Software published by the American Statistical Association