tmleCommunity: An R Package Implementing Targeted Maximum Likelihood Estimation for Community-level Data
Chi Zhang, Jennifer Ahern, Mark J. van der Laan, Oleg Sofrygin
Journal of Statistical Software
Chi Zhang
University of California, Berkeley
Jennifer Ahern
University of California, Berkeley
Oleg Sofrygin
University of California, Berkeley
Mark J. van der Laan
University of California, Berkeley
Abstract
The tmleCommunity package was recently developed to implement targeted minimum loss-based estimation (TMLE) of the effect of community-level intervention(s) at a single time point on an individual-based outcome of interest, including the average causal effect. Implementations of the inverse-probability-of-treatment-weighting (IPTW) estimator and the G-computation formula (GCOMP) are also available. The package supports multivariate arbitrary (i.e., static, dynamic or stochastic) interventions with a binary or continuous outcome. In addition, it allows user-specified data-adaptive machine learning algorithms through the SuperLearner, sl3 and h2oEnsemble packages. The usage of the tmleCommunity package, along with a few examples, is described in this paper.

Keywords: TMLE, causal inference, community-level intervention, stochastic intervention, deterministic intervention, machine learning, additive treatment effect.
1. Introduction
The literature in fields such as epidemiology, econometrics and social science on the causal impact of community-level interventions is rapidly evolving, in both observational studies and randomized trials. In observational settings, there is a rich literature on the assessment of causal effects of families, schools and neighborhoods on child and adolescent development (Brooks-Gunn et al.). One example is a 2008 study estimating the impact of community violence exposure on anxiety among children of African American mothers with depression. Similarly, randomized community trials have increased in recent years. As pointed out by Oakes (2004) and Steele (2016), scientifically speaking, community randomized controlled trials (CRCTs) would be a superior strategy for estimating the effects of community-level exposures, given the self-selection and other difficulties that arise in observational settings. One example is the MTO study, which estimates the effects of lower-poverty neighborhoods on crime for female and male youth (Kling, Ludwig, and Katz 2005). Another CRCT example is the ongoing SEARCH study, which evaluates community-level interventions for the elimination of HIV in rural communities in East Africa (University of California San Francisco 2013). Despite recent statistical advances, many current applications still rely on estimation techniques such as random effects models (or mixed models) (Laird and Ware 1982) and the generalized estimating equations (GEE) approach (Liang and Zeger 1986; Gardiner et al.). A common concern is the positivity assumption required by deterministic (static) interventions; as noted in the literature (2010), this strong assumption could be quite unrealistic in many cases. For example, patients with certain characteristics may never receive a particular treatment. On the other hand, a stochastic intervention is one in which each subject receives a probabilistically assigned treatment based on a known, specified mechanism.
Because the form of the positivity assumption needed for identifiability is model- and parameter-specific, stochastic intervention causal parameters are natural candidates, as they require a weaker version of positivity than other causal parameters for continuous exposures. Furthermore, a policy intervention will lead to stochastic rather than deterministic interventions if the exposure of interest can only be manipulated indirectly, such as when studying the benefits of vigorous physical activity on a health outcome of interest in the elderly (Bembom and van der Laan 2007): it is unrealistic to force every elderly person to maintain a certain level of physical activity according to a deterministic rule. In light of these considerations, stochastic interventions offer a more flexible way of defining a question of interest, and one that is often better supported by the data, than deterministic interventions. Stochastic intervention causal parameters are thus a good way of estimating the causal effects of realistic policies, and can also be used naturally to define and estimate causal effects of continuous or categorical multilevel treatments.

Several R (R Core Team 2017) packages have been instructive for the development of our package. The tmle package (Gruber and van der Laan 2012) performs parameter estimation for a single time point binary intervention with independent and identically distributed (IID) data, including the average treatment effect (ATE), controlled direct effects (CDE), and the parameters of a marginal structural model (MSM).
In addition, Sofrygin and van der Laan (2015) developed another R package, tmlenet, which provides three estimators of average causal effects (and the ATE) for single time point arbitrary interventions (univariate or multivariate; static, dynamic or stochastic) in the context of network-dependent (non-IID) data: TMLE, the inverse-probability-of-treatment-weighting (IPTW) estimator and the parametric G-computation formula (GCOMP). That package performs logistic regression through glm and speedglm.

The development of the tmleCommunity package for R was motivated by the increasing demand for a user-friendly tool to estimate the impact of arbitrary community-level exposures in community-independent data structures with a semi-parametric efficient estimator. The tmleCommunity package also extends some of the capabilities of tmlenet by optionally allowing flexible data-adaptive estimation through the SuperLearner, sl3 and h2oEnsemble packages, or even user-supplied machine learning algorithms. It also allows for panel data transformations, such as with random effects and fixed effects. tmleCommunity is available on GitHub at https://github.com/chizhangucb/tmleCommunity.

The article focuses on the practical usage of tmleCommunity through multiple examples, so we omit many of the technical details. For a description of the TMLE framework for independent community data with static community-level interventions, we refer to Balzer et al. (2017). The first, community-level TMLE is expanded here to include pooled individual-level regressions based on the working model; it also includes the case of observing one individual per community unit as a special case.
The second, individual-level TMLE is developed under a more restricted hierarchical model in which an additional dependence assumption holds. Section 3 shows how the tmleCommunity package is used to estimate the parameters proposed in the prior section through a few examples, and summarizes the common features of the functions that may be useful to tmleCommunity users. Section 4 then uses three simulation studies to demonstrate implementation in different observational settings. Section 5 discusses possible future extensions of the methodology and the package. In Section 6 we answer some frequently asked questions about the package.
2. Single time-point multivariate intervention
Throughout this manuscript, we use bold capital letters to denote random vectors and matrices. In studies of community-level interventions, we begin with a simple scenario that involves randomly selecting $J$ independent communities from some target population of communities, sampling individuals from those chosen communities, and measuring baseline covariates and outcomes on each sampled individual at a single time point. The number of chosen individuals within each community is not fixed, so communities are indexed with $j = 1, \ldots, J$ and individuals within the $j$th community are indexed with $i = 1, \ldots, N_j$. After selection of the communities and individuals, pre-intervention covariates and a post-intervention outcome are measured on each sampled unit. Because only some of the pre-intervention covariates have a clear individual-level counterpart, the pre-intervention covariates separate into two sets. First, let $W_{j,i}$ denote the $(1 \times p)$ vector of $p$ individual-level baseline characteristics, so that $\mathbf{W}_j = (W_{j,i} : i = 1, \ldots, N_j)$ is an $(N_j \times p)$ matrix of individual-level characteristics. Second, let $E_j$ represent the vector of community-level (environmental) baseline characteristics that have no individual-level counterpart and are shared by all community members, including the number of individuals selected within the community (i.e., $N_j \in E_j$). Last, $A_j$ is the exposure level assigned to, or naturally occurring in, community $j$, and $\mathbf{Y}_j = (Y_{j,i} : i = 1, \ldots, N_j)$ is the vector of individual outcomes of interest.

In order to translate the scientific question of interest into a formal causal quantity, we first specify a nonparametric structural equation model (NPSEM) with endogenous variables $X = (E, \mathbf{W}, A, \mathbf{Y})$ that encodes our knowledge about the causal relationships among those variables and can be applied in both observational settings and randomized trials (Pearl 1995, 2009).
$$U = (U_E, U_{\mathbf{W}}, U_A, U_{\mathbf{Y}}) \sim P_U$$
\begin{align}
E &= f_E(U_E) \tag{1}\\
\mathbf{W} &= f_{\mathbf{W}}(E, U_{\mathbf{W}})\\
A &= f_A(E, \mathbf{W}, U_A)\\
\mathbf{Y} &= f_{\mathbf{Y}}(E, \mathbf{W}, A, U_{\mathbf{Y}})
\end{align}
where the $U$ components are exogenous error terms, which are unmeasured and random with an unknown distribution $P_U$. Given an input $U$, the set of functions $F = \{f_E, f_{\mathbf{W}}, f_A, f_{\mathbf{Y}}\}$ deterministically generates the endogenous variables; in particular, each individual outcome $Y$ is affected by its baseline community- and individual-level covariates $(E, \mathbf{W})$ together with its community-level intervention and unobserved factors $(A, U_{\mathbf{Y}})$. First, while we might have knowledge of the form of $f_A$ (e.g., by design), the structural equations $f_E, f_{\mathbf{W}}, f_{\mathbf{Y}}$ do not necessarily restrict the functional form of the causal relationships, which could be nonparametric (entirely unspecified), semiparametric, or parametric incorporating domain knowledge. Second, as summarized by Balzer et al. (2017), structural causal model (1) covers a wide range of practical scenarios, as it allows for the following types of between-individual dependence within a community: (i) the individual-level covariates (and outcomes) among members of a community may be correlated as a consequence of shared measured and unmeasured community-level covariates $(E, U_E)$, and of possible correlations between unmeasured individual-level error terms $(U_{\mathbf{W}}, U_{\mathbf{Y}})$; (ii) an individual's outcome $Y_{j,i}$ may influence another's outcome $Y_{j,l}$ within community $j$; and (iii) an individual's baseline covariates $W_{j,l}$ may influence another's outcome $Y_{j,i}$. In fact, we can make an assumption about the third type of between-individual dependence, under which the structural equation $f_{\mathbf{Y}}$ will be specified; more details are discussed in Section 2.8.1. Third, an important ingredient of this model is the assumption that distinct communities are causally independent and identically distributed.
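To make the hierarchical data structure concrete, the generating process implied by NPSEM (1) can be sketched as below. This is a hypothetical simulation; the particular functional forms chosen for $f_E, f_{\mathbf{W}}, f_A, f_{\mathbf{Y}}$ are illustrative only, not part of the model.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_communities(J=50):
    """Draw J causally independent communities following the time ordering
    E -> W -> A -> Y of NPSEM (1); the community size N_j is part of E_j."""
    data = []
    for j in range(J):
        N_j = int(rng.integers(5, 20))        # number of sampled individuals
        E_j = rng.normal()                    # environmental covariate
        W_j = rng.normal(loc=E_j, size=N_j)   # W depends on E (f_W)
        A_j = rng.binomial(1, 0.5)            # community-level exposure (f_A)
        # individual outcomes depend on (E, W_i, A) plus exogenous noise U_Y
        Y_j = A_j + 0.5 * E_j + 0.3 * W_j + rng.normal(size=N_j)
        data.append({"E": E_j, "W": W_j, "A": A_j, "Y": Y_j})
    return data

communities = simulate_communities()
```

Each community contributes a covariate matrix $\mathbf{W}_j$ and outcome vector $\mathbf{Y}_j$ of varying length $N_j$, matching the indexing above.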
The NPSEM defines a collection of distributions of $(U, X)$, representing the full data model, where each distribution is determined by $F$ and $P_U$; we let $P_{U,X,0}$ denote the true probability distribution of $(U, X)$, and denote the model for $P_{U,X,0}$ by $\mathcal{M}^F$. $\mathcal{M}^F$ allows us to define counterfactual random variables as functions of $(U, X)$, corresponding to arbitrary interventions. For example, with a static intervention on $A$, the counterfactual $\mathbf{Y}_a$ can be defined as $f_{\mathbf{Y}}(E, \mathbf{W}, a, U_{\mathbf{Y}})$, replacing the structural equation $f_A$ with the constant $a$ (van der Laan and Rose 2011). Thus, $\mathbf{Y}_{j,a} = (Y_{j,i,a} : i = 1, \ldots, N_j)$ represents the vector of individual-level outcomes that would have been obtained in community $j$ if all individuals in that community had actually been treated according to exposure level $a$. More generally, we can replace the data-generating function for $A$, which corresponds to a degenerate choice of distribution for drawing $A$ given $U = u$ and $(E, \mathbf{W})$, by a user-specified conditional distribution of $A^*$. Such non-degenerate choices of intervention distributions are often referred to as stochastic interventions.

First, let $g^*$ denote our selection of a stochastic intervention, identified by a set of multivariate conditional distributions of $A^*$ given the baseline covariates $(E, \mathbf{W})$. For convenience, we represent the stochastic intervention with a structural equation $A^* = f_{A^*}(E, \mathbf{W}, U_{A^*})$ in terms of random errors $U_{A^*}$, and define $\mathbf{Y}_{g^*} = f_{\mathbf{Y}}(E, \mathbf{W}, A^*, U_{\mathbf{Y}})$. Then $\mathbf{Y}_{j,g^*} = (Y_{j,i,g^*} : i = 1, \ldots, N_j)$ denotes the corresponding vector of individual-level counterfactual outcomes for community $j$. Second, let $Y^c$ denote a scalar community-level outcome defined as an aggregate of the outcomes measured on the individuals who are members of a community, so that $Y^c_{g^*}$ is the corresponding community-level counterfactual of interest.
One typical choice of $Y^c_{j,g^*}$ is the weighted average response among the $N_j$ individuals sampled from community $j$, i.e., $Y^c_{j,g^*} \equiv \sum_{i=1}^{N_j} \alpha_{j,i} Y_{j,i,g^*}$, for some user-specified set of weights $\alpha$ with $\sum_{i=1}^{N_j} \alpha_{j,i} = 1$. If the underlying community size $N_j$ differs across communities, a natural choice of $\alpha_{j,i}$ is the reciprocal of the community size (i.e., $\alpha_{j,i} = 1/N_j$).

We focus on community-level causal effects where all communities in the target population receive the intervention $g^*$; our causal parameter of interest is then
$$\Psi^F(P_{U,X,0}) = E_{U,X}[Y^c_{g^*}] = E_{U,X}\Big\{\sum_{i=1}^{N} \alpha_i Y_{i,g^*}\Big\}$$
To simplify the expressions, we use $\alpha_i = 1/N$ in the remainder of the article.

One type of stochastic intervention is a shifted version of the current treatment mechanism $g_0$, i.e., $P_{g^*}(A = a \mid E, \mathbf{W}) = g_0(a - \nu(E, \mathbf{W}) \mid E, \mathbf{W})$ for a known shift function $\nu(E, \mathbf{W})$. A simple example is a constant shift $\nu(E, \mathbf{W}) = 0.5$. Another, more complex type is the stochastic dynamic intervention, in which the intervention can be viewed as a random assignment among dynamic rules. A simple example corresponding to the previous shift function is $P_{g^*}(A = a \mid E, \mathbf{W}) = g_0(\max\{a - 0.5, \min(A)\} \mid E, \mathbf{W})$, indicating that the shifted exposure $A^*$ is always bounded below by the minimum of the observed exposure $A$.

One might also be interested in contrasts of the expectation of the community-level outcome across the target population of communities under different interventions, i.e.,
$$\Psi^F(P_{U,X,0}) = E_{U,X}(Y^c_{g^*_1}) - E_{U,X}(Y^c_{g^*_2}) = E_{U,X}\Big\{\frac{1}{N}\sum_{i=1}^{N} Y_{i,g^*_1}\Big\} - E_{U,X}\Big\{\frac{1}{N}\sum_{i=1}^{N} Y_{i,g^*_2}\Big\}$$
where $g^*_1$ and $g^*_2$ are two different stochastic interventions.

Finally, the additive treatment effect is a special case of the average causal effect with two static interventions $g^*_1(1 \mid e, w) = 1$ and $g^*_2(0 \mid e, w) = 1$ for any $e \in \mathcal{E}, w \in \mathcal{W}$, i.e.,
$$E_{U,X}(Y^c(1)) - E_{U,X}(Y^c(0)) = E_{U,X}[Y^c_{g^*_1(1 \mid e,w)=1}] - E_{U,X}[Y^c_{g^*_2(0 \mid e,w)=1}]$$

Consider the study design presented above, where for a randomly selected community the observed data consist of the measured pre-intervention covariates, the intervention assignment, and the vector of individual-level outcomes. Formally, one observation on community $j$ is coded as
$$O_{j,i} = (E_j, W_{j,i}, A_j, Y_{j,i}),$$
which follows the typical time ordering of the variables measured on the $i$th individual within the $j$th community. Assume the observed data consist of $J$ independent and identically distributed copies of $O_j = (E_j, \mathbf{W}_j, A_j, \mathbf{Y}_j) \sim P_0$, where $P_0$ is an unknown underlying probability distribution in a model space $\mathcal{M}^I$. Here $\mathcal{M}^I = \{P(P_{U,X}) : P_{U,X} \in \mathcal{M}^F\}$ denotes the statistical model, i.e., the set of possible distributions for the observed data $O$, which only involves modeling $g$ (i.e., the specification of $f_A$). The true observed data distribution is thus $P_0 = P(P_{U,X,0})$.
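As a concrete, hypothetical illustration of a shift intervention (not code from the package): if $P_{g^*}(A = a \mid E, \mathbf{W}) = g_0(a - \nu \mid E, \mathbf{W})$ with $\nu = 0.5$, then $A^* = A + 0.5$ has the required distribution, and clipping $A^*$ to the observed support is one way to realize a bounded variant. With a Gaussian treatment mechanism, the density ratio $g^*/g_0$ also has a closed form, verified numerically below.

```python
import numpy as np

rng = np.random.default_rng(2)

def normal_pdf(x, mean, sd=1.0):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Hypothetical mechanism: A | E ~ N(E, 1), with constant shift nu = 0.5.
J = 2000
E = rng.normal(size=J)
A = E + rng.normal(size=J)

A_star = A + 0.5                                      # draw from the shifted density
A_star_bounded = np.clip(A_star, A.min(), A.max())    # bounded variant

# Density ratio g*(a|E) / g0(a|E) evaluated at the observed exposures:
ratio = normal_pdf(A, mean=E + 0.5) / normal_pdf(A, mean=E)
# For this Gaussian shift the ratio simplifies to exp(0.5*(A - E) - 0.125).
closed_form = np.exp(0.5 * (A - E) - 0.125)
```

Note that `ratio` stays finite here, consistent with the weaker positivity requirement that shift interventions are designed to satisfy.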
By specifying a causal model for $P_{U,X,0}$ through the NPSEM and providing an explicit link between this model and the observed data, we lay the groundwork for addressing identifiability through $P_0$. In order to express $\Psi^F(P_{U,X,0})$ as a parameter of the distribution $P_0$ of the observed data $O$, we now need to address the identifiability of $E_{U,X}[Y^c_{g^*}]$ by adding two key assumptions to the NPSEM: the randomization assumption, also called the "no unmeasured confounders" assumption (Assumption 1), and the positivity assumption (Assumption 2). The identifiability assumptions are briefly reviewed here; for details on identifiability, we refer to Robins (1986); van der Laan (2010, 2014); Díaz and van der Laan (2011).

Assumption 1. $A \perp \mathbf{Y}_a \mid E, \mathbf{W}$,

where the counterfactual random variable $\mathbf{Y}_a$ represents the collection of outcomes measured on the individuals from a community if its intervention were set to $A = a$ in causal model (1), replacing the structural equation $f_A$ with the constant $a$.

Assumption 2. $\sup_{a \in \mathcal{A}} \frac{g^*(a \mid E, \mathbf{W})}{g(a \mid E, \mathbf{W})} < \infty$, almost everywhere,

where $g^*(a \mid E, \mathbf{W}) = P_{g^*}(A = a \mid E, \mathbf{W})$.

Informally, Assumption 1 restricts the allowed distributions for $P_U$ to ensure that $A$ and $\mathbf{Y}$ share no common causes beyond the measured variables in $X = (E, \mathbf{W}, A, \mathbf{Y})$. For example, Assumption 1 holds if $U_A$ is independent of $U_{\mathbf{Y}}$ given $E, \mathbf{W}$. Moreover, this randomization assumption implies $A^* \perp \mathbf{Y}_a \mid E, \mathbf{W}$.
Under Assumptions 1 and 2, jointly with the consistency assumption (i.e., $A = a$ implies $Y_a = Y$), we have $P(Y_{g^*} = y \mid A^* = a, E = e, \mathbf{W} = w) = P(Y_a = y \mid A^* = a, E = e, \mathbf{W} = w) = P(Y_a = y \mid E = e, \mathbf{W} = w) = P(Y = y \mid A = a, E = e, \mathbf{W} = w)$, so the counterfactual distribution $P(Y_{g^*} = y)$ can be written as:
\begin{align*}
P(Y_{g^*} = y) &= \int_{e,w}\int_a P(Y_{g^*} = y \mid A^* = a, E = e, \mathbf{W} = w)\, g^*(a \mid e, w)\, d\mu_a(a)\, dP_{E,\mathbf{W}}(e, w)\\
&= \int_{e,w}\int_a P(Y_a = y \mid E = e, \mathbf{W} = w)\, g^*(a \mid e, w)\, d\mu_a(a)\, dP_{E,\mathbf{W}}(e, w)\\
&\qquad \text{by Assumption 1 and } A^* \perp \mathbf{Y}_a \mid E, \mathbf{W}\\
&= \int_{e,w}\int_a P(Y = y \mid A = a, E = e, \mathbf{W} = w)\, g^*(a \mid e, w)\, d\mu_a(a)\, dP_{E,\mathbf{W}}(e, w)\\
&\qquad \text{by the consistency assumption}
\end{align*}
with respect to some dominating measure $\mu_a(a)$.

Then $E_{U,X}[Y_{g^*}]$ is identified by the G-computation formula (Robins 1986):
$$E_{U,X}[Y_{g^*}] = E_{E,\mathbf{W}}\big[E_{g^*}[Y \mid A^* = a, E, \mathbf{W}]\big] = \int_{e,w}\int_a E_{g^*}(Y \mid a, e, w)\, g^*(a \mid e, w)\, d\mu_a(a)\, dP_{E,\mathbf{W}}(e, w)$$
This provides a general identifiability result for $E_{U,X}[Y^c_{g^*}]$, the causal effect of the community-level stochastic intervention on any community-level outcome $Y^c$ that is a real-valued function of the individual-level outcomes $\mathbf{Y}$:
$$E_{U,X}[Y^c_{g^*}] = E_{U,X}\Big[\sum_{i=1}^{n} \alpha_i Y_{g^*,i}\Big] = \sum_{i=1}^{n} \alpha_i E_{E,\mathbf{W}}\big[E_{g^*}[Y_i \mid A^*, E, \mathbf{W}]\big] \equiv \Psi^I(P_0) = \psi^I_0$$

If we only assume the randomization assumption of the previous section, then the statistical model $\mathcal{M}^I$ is nonparametric. Based on this identifiability result, we note that $\Psi^I : \mathcal{M}^I \to \mathbb{R}$ represents a mapping from a probability distribution of $O$ into a real number, and $\Psi^I(P_0)$ denotes the target estimand corresponding to the target causal quantity $E_{U,X}[Y_{g^*}]$. Before defining the statistical parameter, we introduce some additional notation.
First, we denote the marginal distribution of the baseline covariates $(E, \mathbf{W})$ by $Q_{E,\mathbf{W}}$, with a well-defined density $q_{E,\mathbf{W}}$ with respect to some dominating measure $\mu_{e,w}(e,w)$. There is no additional assumption of independence on $Q_{E,\mathbf{W}}$. Second, let $G$ denote the conditional distribution of the observed exposure $A$, with conditional density $g(A \mid E, \mathbf{W})$. Third, we assume that all $\mathbf{Y}$ within a community are sampled from a distribution $Q_{\mathbf{Y}}$ with density $q_{\mathbf{Y}}(\mathbf{Y} \mid A, E, \mathbf{W})$, conditional on the exposure and the baseline covariates $A, E, \mathbf{W}$. We now write $P = P_{\tilde{Q},G}$ for $\tilde{Q} = (Q_{\mathbf{Y}}, Q_{E,\mathbf{W}})$, and the statistical model becomes $\mathcal{M}^I = \{P_{\tilde{Q},G} : \tilde{Q} \in \tilde{\mathcal{Q}}, G \in \mathcal{G}\}$, where $\tilde{\mathcal{Q}}$ and $\mathcal{G}$ denote the parameter spaces for $\tilde{Q}$ and $G$, respectively, and $\tilde{\mathcal{Q}}$ here is nonparametric.

Next, we define $G^*$ as the user-supplied intervention with a new density $g^*$, which will replace the observed conditional distribution $G$. So $G^*$ is a conditional distribution that describes how each intervened treatment is produced, conditional on the baseline covariates $(E, \mathbf{W})$. Given $\tilde{Q}$ and $G^*$, we use $O^* = (O^*_{j,i} = (E_j, W_{j,i}, A^*_j, Y^*_{j,i}) : i = 1, \ldots, N_j,\ j = 1, \ldots, J)$ to denote a random variable generated under the post-intervention distribution $P_{\tilde{Q},G^*}$. Namely, $P_{\tilde{Q},G^*}$ is the post-intervention distribution of the observed data $O$ under the stochastic intervention $G^*$ (Robins 1986), and the likelihood for $P_{\tilde{Q},G^*}$ factorizes as:
$$p_{\tilde{Q},G^*}(O^*) = \Big[\prod_{j=1}^{J} q_{\mathbf{Y}}(\mathbf{Y}^*_j \mid A^*_j, \mathbf{W}_j, E_j)\Big]\Big[\prod_{j=1}^{J} g^*(A^*_j \mid E_j, \mathbf{W}_j)\Big]\, q_{E,\mathbf{W}}(E, \mathbf{W}) \quad (2)$$
Thus our target statistical quantity is now defined as $\psi^I_0 = \Psi^I(P_0) = E_{\tilde{q}_0, g^*}[Y^c_{g^*}]$, where $\Psi^I(P_0)$ is the target estimand of the true distribution of the observed data $P_0 \in \mathcal{M}^I$ (i.e., a mapping from the statistical model $\mathcal{M}^I$ to $\mathbb{R}$).
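A statistical quantity of this plug-in form can be evaluated by Monte Carlo via the G-computation formula. The sketch below uses a hypothetical data-generating process with a binary exposure, the true outcome regression, and the static intervention $g^*(1 \mid e, w) = 1$, for which the true value of $E[Y_1]$ is 1; none of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
expit = lambda x: 1 / (1 + np.exp(-x))

# Hypothetical observed data: E ~ N(0,1), g0(1|E) = expit(E), Y = A + E + noise.
J = 200_000
E = rng.normal(size=J)
A = rng.binomial(1, expit(E))
Y = A + E + rng.normal(size=J)

# G-computation with the (here, known) outcome regression Qbar(a, e) = a + e:
# psi = E_E[ sum_a Qbar(a, E) g*(a|E) ], where g*(1|E) = 1 puts all mass on a = 1.
Qbar = lambda a, e: a + e
psi_gcomp = Qbar(1, E).mean()   # plug-in estimate of E[Y_1]; true value is 1
```

In practice `Qbar` would be an estimated regression rather than the truth; the averaging-over-covariates step is the same.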
We define $\bar{Q}(A_j, E_j, \mathbf{W}_j) = \int y\, q_{\mathbf{Y}}(y \mid A_j, \mathbf{W}_j, E_j)\, d\mu_y(y)$ as the conditional mean evaluated under the common-in-$j$ distribution $Q_{\mathbf{Y}}$, and $\bar{Q}^c(A, E, \mathbf{W}) \equiv E(Y^c \mid A, E, \mathbf{W})$ as the conditional mean of the community-level outcome. We can now refer to $Q = (\bar{Q}^c, Q_{E,\mathbf{W},0})$ as the part of the observed data distribution that our target parameter is a function of (i.e., with a slight abuse of notation, $\Psi^I(P_0) = \Psi^I(Q)$); the parameter $\psi^I_0$ can be written as:
$$\psi^I_0 = \int_{e \in \mathcal{E}, w \in \mathcal{W}}\int_{a \in \mathcal{A}} \bar{Q}^c(a, e, w)\, g^*(a \mid e, w)\, d\mu_a(a)\, q_{E,\mathbf{W},0}(e, w)\, d\mu_{e,w}(e, w) \quad (3)$$
with respect to some dominating measures $\mu_a(a)$ and $\mu_{e,w}(e,w)$, where $(\mathcal{A}, \mathcal{E}, \mathcal{W})$ is the common support of $(A, E, \mathbf{W})$.

In the previous section, we defined a statistical model $\mathcal{M}^I$ for the distribution of $O$, and a statistical target parameter mapping $\Psi^I$ for which $\Psi^I(P_{Q,G^*})$ only depends on $Q$. Now we want to estimate $\Psi^I(Q)$ via a targeted maximum likelihood estimator (TMLE) and construct an asymptotically valid confidence interval through the efficient influence curve (EIC). Furthermore, we present a novel method for the estimation of the outcome regression that incorporates additional knowledge about the data-generating mechanism that might be known by design.

Community-level TMLE
As a two-stage procedure, TMLE requires estimators of both the outcome regression $\bar{Q}$ and the treatment mechanism $g$. Since TMLE solves the EIC estimating equation, its estimator inherits the double robustness property of this EIC and is guaranteed to be consistent (i.e., asymptotically unbiased) if either $\bar{Q}$ or $g$ is consistently estimated. For example, in a community randomized controlled trial, $g$ is known by design (e.g., 0.5) and can always be consistently estimated, so the TMLE will always be consistent. Moreover, TMLE is efficient when both are consistently estimated. In other words, when $g$ is consistently estimated, a choice of initial estimator for $\bar{Q}$ that better approximates the true $\bar{Q}$ may improve the asymptotic efficiency, along with the finite sample bias and variance, of the TMLE (van der Laan and Rubin 2006).

The community-level TMLE first obtains an initial estimate $\hat{\bar{Q}}^c(A, E, \mathbf{W})$ of the conditional mean of the community-level outcome $\bar{Q}^c(A, E, \mathbf{W})$, and an estimate $\hat{g}(A \mid E, \mathbf{W})$ of the community-level conditional treatment density $g(A \mid E, \mathbf{W})$. The second, targeting step creates a targeted estimator $\hat{\bar{Q}}^{c*}$ of $\bar{Q}^c$ by updating the initial fit $\hat{\bar{Q}}^c(A, E, \mathbf{W})$ through a parametric fluctuation that exploits the information in the conditional treatment density. We briefly review the estimation results and statistical inference here; for further discussion, see Díaz and van der Laan (2011); Sofrygin and van der Laan (2015); Balzer et al. (2017).
Given $\hat{\bar{Q}}^c(A_j, E_j, \mathbf{W}_j)$, $\hat{g}(A_j = a \mid E_j, \mathbf{W}_j)$ and $\hat{g}^*(A_j = a \mid E_j, \mathbf{W}_j)$ for each community $j = 1, \ldots, J$, the targeted predictions of the community-level outcome are easily obtained by
$$\hat{\bar{Q}}^{c*}(a, E_j, \mathbf{W}_j) = \text{expit}\big\{\text{logit}\big(\hat{\bar{Q}}^c(a, E_j, \mathbf{W}_j)\big) + \hat{\epsilon}\, \hat{H}_j(a, E_j, \mathbf{W}_j)\big\}, \quad \forall j = 1, \ldots, J$$
where $\hat{H}_j(a, E_j, \mathbf{W}_j) = \frac{\hat{g}^*(A_j = a \mid E_j, \mathbf{W}_j)}{\hat{g}(A_j = a \mid E_j, \mathbf{W}_j)}$ is the community-level clever covariate, and the fluctuation parameter $\epsilon$ is obtained by a logistic regression of $Y^c$ on $\hat{H}$ with offset $\text{logit}(\hat{\bar{Q}}^c_n)$. (Another way to achieve the targeting step is the weighted, intercept-based TMLE, where $\epsilon$ is obtained by an intercept-only weighted logistic regression of $Y^c$ with offset $\text{logit}(\hat{\bar{Q}}^c_n)$, weights $\hat{H}$, and no covariates.)

Thus our targeted substitution estimator is the weighted mean of the targeted predictions across the $J$ communities. One natural choice is the empirical mean:
$$\hat{\Psi}^I(P_{\hat{Q}^*, \hat{g}^*}) = \frac{1}{J}\sum_{j=1}^{J}\int_{e_j, w_j}\int_a \hat{\bar{Q}}^{c*}(a, e_j, w_j)\, \hat{g}^*(a \mid e_j, w_j)\, d\mu_a(a)\, q_{E,\mathbf{W}}(e_j, w_j)\, d\mu_{e,w}(e_j, w_j)$$
In practice, the community-level TMLE variance is asymptotically estimated as $\text{Var}(\hat{\Psi}^I(\hat{Q}^*)) \approx \hat{\sigma}^2_J / J$, where $\hat{\sigma}^2_J$ is the sample variance of the estimated influence curve:
$$\hat{\sigma}^2_J = \frac{1}{J}\sum_{j=1}^{J}\big\{D^I(\hat{Q}^*, \hat{g})(O_j)\big\}^2$$
where $D^I(\hat{Q}^*, \hat{g})$ is the plug-in estimator of the efficient influence curve of $\Psi^I$ at $P_0$:
$$D^I(P_0)(O) = \frac{g^*}{g}(A \mid E, \mathbf{W})\big(Y^c - \bar{Q}^c(A, E, \mathbf{W})\big) + E_{g^*}\big[\bar{Q}^c(A, E, \mathbf{W}) \mid E, \mathbf{W}\big] - \Psi^I(P_{Q,g^*})$$
where
$$E_{g^*}\big[\bar{Q}^c(A, E, \mathbf{W}) \mid E, \mathbf{W}\big] = \int_a \bar{Q}^{c*}(a, E, \mathbf{W})\, g^*(a \mid E, \mathbf{W})\, d\mu_a(a)$$
This quantity $\hat{\sigma}^2_J$ can be used to calculate p values and 95% confidence intervals for different parameters, e.g., $\hat{\Psi}^I(P_{\hat{Q}^*, \hat{g}^*}) \pm 1.96\, \hat{\sigma}_J / \sqrt{J}$ for the ATE.
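The targeting step and the influence-curve-based inference can be sketched end-to-end on simulated data. Everything below is a hypothetical example with a binary community-level exposure, a binary outcome, the static intervention $g^*(1 \mid E) = 1$, and the one-parameter fluctuation solved by Newton-Raphson instead of a call to a regression routine; initial estimates are set to the truth purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
expit = lambda x: 1 / (1 + np.exp(-x))
logit = lambda p: np.log(p / (1 - p))

# Hypothetical community-level data: covariate E_j, binary A_j, binary Y_j^c.
J = 5000
E = rng.normal(size=J)
g0 = expit(0.5 * E)                          # true treatment mechanism g(1|E)
A = rng.binomial(1, g0)
Q0 = lambda a, e: expit(-0.5 + a + 0.5 * e)  # true outcome regression
Y = rng.binomial(1, Q0(A, E))

# Initial estimates and clever covariate for the static intervention
# g*(1|E) = 1: H = g*(A|E) / g(A|E) = A / g(1|E).
Qhat = Q0(A, E)
H = A / g0

# Targeting: logistic regression of Y on H with offset logit(Qhat),
# fitted here by Newton-Raphson for the single fluctuation parameter epsilon.
eps = 0.0
for _ in range(25):
    p = expit(logit(Qhat) + eps * H)
    eps += np.sum(H * (Y - p)) / np.sum(H**2 * p * (1 - p))

# Targeted predictions under the intervention (a = 1) and plug-in estimate.
Qstar1 = expit(logit(Q0(1, E)) + eps / g0)
psi_tmle = Qstar1.mean()

# Influence-curve-based standard error and 95% confidence interval.
Qstar_obs = expit(logit(Qhat) + eps * H)
IC = H * (Y - Qstar_obs) + Qstar1 - psi_tmle
se = IC.std() / np.sqrt(J)
ci = (psi_tmle - 1.96 * se, psi_tmle + 1.96 * se)
```

The package fits the fluctuation through standard offset logistic regression; the Newton loop above is just a dependency-free stand-in for that fit.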
Incorporating hierarchical structure for estimating outcome mechanism
Based on the previously defined community-level TMLE for the mean of the exposure-specific counterfactual community-level outcome, we can still incorporate individual-level data rather than simply community-wide aggregates of that data. As discussed in Section 2.2, one typical choice of the community-level counterfactual of interest is the weighted average response among all individuals sampled from that community, i.e., $Y^c_{j,g^*} = \sum_{i=1}^{N_j} \alpha_{j,i} Y_{j,i,g^*}$. Hence, the conditional mean of the community-level outcome can be rewritten as a weighted average of the individual-level outcomes, $\bar{Q}^c(A, E, \mathbf{W}) = E(Y^c \mid A, E, \mathbf{W}) = \sum_{i=1}^{N} \alpha_i E(Y_i \mid A, E, \mathbf{W}) \equiv \sum_{i=1}^{N} \alpha_i E(Y_i \mid A, E, \mathbf{W}, N)$, where the community-specific sample size $N$ is a random variable that is included in the community-level baseline covariates $E$.

Without changing the underlying structural causal model (1), estimand or efficient influence curve, we may use an individual-level working model to incorporate pooled individual-level outcome regressions as candidates in the Super Learner library for the initial estimation of the expected community-level outcome $\bar{Q}^c(A, E, \mathbf{W})$, given community- and individual-level covariates along with community-level exposures. Specifically, we propose a working model that assumes
$$E(Y_i \mid A, E, \mathbf{W}) = E(Y_i \mid A, E, W_i) = \bar{Q}(A, E, W_i) \quad (4)$$
for a common function $\bar{Q}$.
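Under working model (4), the initial estimator of $\bar{Q}^c$ can come from a single pooled individual-level regression, with predictions then averaged within communities using the weights $\alpha_{j,i} = 1/N_j$. A hypothetical sketch using ordinary least squares (the package would use a user-specified regression or Super Learner; the data-generating coefficients below are invented):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical hierarchical data in long format: one row per individual.
J = 400
rows = []
for j in range(J):
    N_j = int(rng.integers(5, 15))
    E_j, A_j = rng.normal(), rng.binomial(1, 0.5)
    W = rng.normal(size=N_j)
    Y = 2 * A_j + E_j + 0.5 * W + rng.normal(size=N_j)  # working model (4) holds
    for i in range(N_j):
        rows.append([j, E_j, A_j, W[i], Y[i]])
rows = np.array(rows)
comm, E, A, W, Y = rows.T

# Pooled individual-level OLS of Y on (1, E, A, W): one common Qbar(A, E, W_i).
X = np.column_stack([np.ones_like(E), E, A, W])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Community-level initial estimate: Qbar^c_j = (1/N_j) * sum_i Qbar(A_j, E_j, W_ji).
pred = X @ beta
Qbar_c = np.array([pred[comm == j].mean() for j in range(J)])
```

The targeting step then proceeds at the community level exactly as before, using `Qbar_c` as the initial fit.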
In practice, this working model states that each individual's outcome is drawn from a common distribution that may depend on the individual's own baseline covariates, together with the intervention and the community-level baseline covariates of his or her community, but is not directly influenced by the covariates of others in the same community.

Furthermore, the strength of the working assumption can be weakened by encoding knowledge of the dependence relationships among individuals within communities, namely by defining $E$ to progressively contain a larger subset of the individual-level covariates included in $\mathbf{W}$ (Balzer et al. 2017), beyond each individual's own covariates $W_i$. Let $F_i$ denote the subset of individuals whose baseline individual-level covariates affect individual $i$'s outcome $Y_i$, where $i \in F_i$. We then have a less restricted and more general version of (4) as working model:
$$E(Y_i \mid A, E, \mathbf{W}) = \bar{Q}(A, E, (W_l : l \in F_i)) \quad (5)$$
for a common function $\bar{Q}$.

The implementation of the community-level TMLE incorporating hierarchical data is similar to the previous community-level TMLE, except that the estimation of the community-level outcome can be based on a single pooled individual-level regression of $Y_{j,i}$ on $(E_j, A_j, W_{j,i})$ when the aforementioned working model (4) is assumed. We note that this TMLE never claims that the individual-level working model holds; instead, it uses the working model as a means to generate an initial estimator of $\bar{Q}^c$.

Special case: one observation per community
We now consider a special case where each community has only one individual (i.e., $N = 1$), so that all individual-level baseline covariates can be treated as environmental factors (i.e., $(E, W) = E$).

Nonparametric structural equation model
Consider a NPSEM with structural equations for the endogenous variables $X = (E, A, Y)$:
\begin{align}
E &= f_E(U_E) \tag{6}\\
A &= f_A(E, U_A)\\
Y &= f_Y(E, A, U_Y)
\end{align}
with exogenous unmeasured sources of random variation $U = (U_E, U_A, U_Y)$.

Counterfactuals
Let $Y_a = f_Y(E, a, U_Y)$ denote the counterfactual corresponding to setting the treatment $A = a$; the community-level counterfactual outcome is then the same as the outcome of the only observation in community $j$ (i.e., $Y^c_{j,a} \equiv Y_{j,a}$).

Observed data
Now the observed data become $O = (E, A, Y)$, and we observe $J$ i.i.d. observations of $O$.

Target parameter on the NPSEM
Consider the following parameter of the distribution of $(U, X)$:
$$\Psi^F(P_{U,X,0}) = E_{U,X}[Y^c_{g^*}] = E_{U,X}[Y_{g^*}]$$

Identifiability result
$$\Psi^F(P_{U,X,0}) = E_{U,X}[Y^c_{g^*}] = E_E\big[E_{g^*}[Y \mid A^*, E]\big] \equiv \Psi^I(P_0)$$

The statistical parameter
Now let $Q = (\bar{Q}^c, Q_{E,0})$, so that the conditional mean of the community-level outcome becomes $\bar{Q}^c(A, E) \equiv E(Y^c \mid A, E) = E(Y \mid A, E)$, with density $q^c_Y(Y \mid A, E)$. Then $Q_E \equiv P(E)$, and we assume $q_E$ is a well-defined density for $Q_E$. Also, the community-level stochastic intervention is denoted by $g^*(A \mid E)$, with the observed treatment mechanism $g(A \mid E) \equiv P(A \mid E)$. Given $Q$ and $G^*$, the post-intervention distribution $P_{Q,G^*}$ generates a new set of observations $O^* = (O^*_j = (E_j, A^*_j, Y^*_j) : j = 1, \ldots, J)$. Applying the newly modified notation produces the likelihood:
$$p_{Q,G^*}(O^*) = \Big[\prod_{j=1}^{J} q^c_Y(Y^*_j \mid A^*_j, E_j)\Big]\Big[\prod_{j=1}^{J} g^*(A^*_j \mid E_j)\Big]\, q_E(E)$$
Based on the NPSEM above and the identifiability result, we propose the following target statistical quantity of the distribution of $O$:
$$\Psi^I(P_0) = \int_{e \in \mathcal{E}}\int_{a \in \mathcal{A}} \bar{Q}^c(a, e)\, g^*(a \mid e)\, d\mu_a(a)\, q_{E,0}(e)\, d\mu_e(e)$$
with respect to some dominating measures $\mu_a(a)$ and $\mu_e(e)$.

Estimation and inference
Similar to the setting of the regular community-level TMLE, given $\hat{\bar{Q}}^c(A_j, E_j)$, $\hat{g}(A_j = a \mid E_j)$ and $\hat{g}^*(A_j = a \mid E_j)$ for each community $j = 1, \ldots, J$, we have $\hat{H}_j(a, E_j) = \frac{\hat{g}^*(A_j = a \mid E_j)}{\hat{g}(A_j = a \mid E_j)}$, and an updated fit of the community-level outcome regression is given by:
$$\hat{\bar{Q}}^{c*}(a, E_j) = \text{expit}\big\{\text{logit}\big(\hat{\bar{Q}}^c(a, E_j)\big) + \hat{\epsilon}\, \hat{H}_j(a, E_j)\big\}, \quad \forall j = 1, \ldots, J$$
Thus the targeted substitution estimator is defined as:
$$\hat{\Psi}^I(P_{\hat{Q}^*, \hat{g}^*}) = \frac{1}{J}\sum_{j=1}^{J}\int_{e_j}\int_a \hat{\bar{Q}}^{c*}(a, e_j)\, \hat{g}^*(a \mid e_j)\, d\mu_a(a)\, q_E(e_j)\, d\mu_e(e_j)$$
Under regularity conditions, the TMLE $\hat{\Psi}^I(P_{\hat{Q}^*, \hat{g}^*})$ is asymptotically linear, with an influence curve that can be conservatively replaced by
$$D^I(P_0)(O) = \frac{g^*}{g}(A \mid E)\big(Y^c - \bar{Q}^c(A, E)\big) + E_{g^*}\big[\bar{Q}^c(A, E) \mid E\big] - \Psi^I(P_{Q,g^*})$$
where $E_{g^*}[\bar{Q}^c(A, E) \mid E] = \int_a \bar{Q}^{c*}(a, E)\, g^*(a \mid E)\, d\mu_a(a)$.

Finally, 95% confidence intervals can be constructed as $\hat{\Psi}^I(P_{\hat{Q}^*, \hat{g}^*}) \pm 1.96\, \hat{\sigma}_J / \sqrt{J}$, where $\hat{\sigma}^2_J = \frac{1}{J}\sum_{j=1}^{J}\big\{D^I(\hat{Q}^*, \hat{g})(O_j)\big\}^2$.

Individual-level TMLE
What if the third type of dependence in model (1) mentioned in section 2.1 is weak, or even does not exist? This is the so-called "no covariate interference" assumption (Prague et al.), under which each individual outcome $Y_i$ is sampled from the same distribution given that individual's own covariates $W_i$ and the baseline community-level covariates $E$, together with the community-level intervention and that individual's unobserved factors $(A, U_{Y_i})$. Under this working assumption, we have $E(Y_i \mid A, E, W) = \bar{Q}(A, E, W_i)$. Therefore, when background knowledge about $Q$ is sufficient to ensure the assumption that working model (4) holds, this background changes both the underlying hierarchical causal model and the identifiability results, and so the statistical model, estimand, efficient influence curve, etc.

In this section, we assume such additional knowledge is available and so consider a new hierarchical causal sub-model that restricts the dependence of individuals within a community. The NPSEM that represents the causal relationships among those endogenous variables is now given by:

$$ E = f_E(U_E) $$
$$ W = f_W(E, U_W) \qquad (7) $$
$$ A = f_A(E, W, U_A) $$
$$ Y_i = f_Y(E, W_i, A, U_{Y_i}), \qquad U_{Y_i} \perp U_A \mid E, W_i $$

Besides, we also assume that there is a common conditional distribution of $A$ given $(E, W_i)$ across all individuals, i.e., $P(A \mid E, W_i) \equiv g_I(A \mid E, W)$, where $g_I(A \mid E, W)$ denotes the individual-level stochastic intervention. Recall that we may be interested in $Y^c_{j, g^*_I} \equiv \sum_{i=1}^{N_j} \alpha_{j,i} Y_{j,i,g^*_I}$, with respect to some individual-level stochastic intervention $g^*_I$.
Thus, by identifiability,

$$ \Psi^{F}(P_{U,X,0}) = E_{U,X}\big( Y^c_{g^*_I} \big) = E_{U,X}\Big( \sum_{i=1}^{N} \alpha_i Y_{i, g^*_I} \Big) = \sum_{i=1}^{N} \alpha_i E_{Q_{E,W,0}}\Big\{ E_{g^*_I}\big[ \bar{Q}(A, E, W_i) \mid E, W_i \big] \Big\} \equiv \Psi^{II}(P_{Q, g^*_I}) $$

where $\Psi^{II}: \mathcal{M}^{II} \to \mathbb{R}$ is the target statistical quantity under the key identifiability assumptions and working assumption (4), and $\mathcal{M}^{II}$ is a sub-model of $\mathcal{M}^{I}$.

Now, the corresponding EIC is given by:

$$ D^{II}(P_0)(O) = \sum_{i=1}^{N} \alpha_i \Big[ \frac{g^*_I}{g_{0,I}}(A \mid E, W_i)\big( Y_i - \bar{Q}(A, E, W_i) \big) + E_{g^*_I}\big[ \bar{Q}(A, E, W_i) \mid E, W_i \big] - \Psi^{II}(P_{Q, g^*_I}) \Big] $$

Also, the individual-level density of the conditional treatment distribution, adjusting for $E$ and the individual-specific covariate $W_i$, is defined as

$$ g_I(a \mid e, w_i) = E_W\big[ g_I(a \mid e, W) \mid W_i = w_i \big] = E_W\big[ g_I(a \mid e, W_{-i}, W_i) \mid W_i = w_i \big] = \int_{w_{-i}} g_I(a \mid e, w_{-i}, w_i)\, P(W_{-i} = w_{-i} \mid W_i = w_i)\, d\mu(w_{-i}) = \int_{w_{-i}} g_I(a \mid e, w_{-i}, w_i)\, P(W_{-i} = w_{-i})\, d\mu(w_{-i}) $$

with respect to some dominating measure $\mu(w_{-i})$, where $W_{-i}$ represents an $(N - 1) \times p$ matrix of individual-level covariates that includes all individuals in the community except $i$.

Therefore, the estimate of the individual-level stochastic intervention is given by

$$ \hat{g}_I(a \mid e, w_i) = \frac{1}{J} \sum_{j=1}^{J} \int_{w_{j,-i}} \hat{g}_I(a \mid e_j, w_{j,-i}, w_{j,i})\, P_n(W_{j,-i} = w_{j,-i})\, d\mu(w_{j,-i}) $$

and the substitution estimator of $\Psi^{II}(P_{Q, g^*_I})$ is defined as follows:

$$ \hat{\Psi}^{II}(P_{\hat{Q}^*, \hat{g}^*_I}) = \frac{1}{J} \sum_{j=1}^{J} \sum_{i=1}^{n_j} \alpha_{j,i} \int_{e_j, w_{j,i}} \int_{a} \hat{\bar{Q}}^{*}(a, e_j, w_{j,i})\, \hat{g}^*_I(a \mid e_j, w_{j,i})\, d\mu_a(a)\, q_{E,W}(e_j, w_j)\, d\mu_{e,w}(e_j, w_j) $$
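The empirical marginalization over $W_{-i}$ in $\hat{g}_I$ above can be mimicked by Monte Carlo averaging over the observed covariates of the other individuals. Below is a self-contained schematic sketch: the "fitted" density g.fit, its dependence on the community's covariate profile, and all numbers are invented for illustration, and this is not the package's internal code.

```r
# Schematic sketch of marginalizing an individual-level treatment density
# over W_{-i}, as in the display above. The "fitted" density g.fit is a
# hypothetical stand-in: A ~ Normal(0.5 * mean(W), 1) given the community's
# covariate profile W.
set.seed(2)
W <- rnorm(100)  # covariates of the N = 100 individuals in one community

g.fit <- function(a, w.all) dnorm(a, mean = 0.5 * mean(w.all), sd = 1)

# g_I(a | e, w_i): average g.fit over draws of W_{-i} from its empirical
# distribution, holding individual i's own covariate fixed.
g.marg <- function(a, i, B = 500) {
  mean(vapply(seq_len(B), function(b) {
    W.minus <- sample(W[-i], size = length(W) - 1, replace = TRUE)
    g.fit(a, c(W[i], W.minus))
  }, numeric(1)))
}

g.marg(a = 0.3, i = 1)  # a positive density value
```

The resampling of $W_{-i}$ plays the role of the empirical distribution $P_n(W_{-i})$ in the estimator above.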
3. Implementation in the tmleCommunity package
Estimation of average causal effects for single-time-point arbitrary interventions in hierarchical data with the tmleCommunity package is performed with the main function tmleCommunity(), along with the auxiliary function tmleCom_Options(), which sets additional options that control the estimation algorithms in the package. Note that tmleCom_Options() needs to be called before the main function tmleCommunity(); otherwise the default settings of all arguments to the tmleCom_Options() function will be used in the estimation procedure.
The observed data set is passed to tmleCommunity() through the data argument as a data frame, with the outcome column, the exposure column(s), the baseline covariate columns and possibly the community identifier column (usually numeric values; factor values are not supported in the package). The data arguments include Ynode, Anodes, WEnodes, communityID and YnodeDet, which are the column names or indices in the data frame data that represent the outcome variable, the exposure variable(s), the community- and individual-level baseline covariates, the community identifier variable, and indicators of deterministic values of the outcome Ynode, respectively. Only Anodes and WEnodes must be specified. If Ynode is left unspecified, the left-hand side of the regression formula in argument Qform will be treated as Ynode. If YnodeDet is not NULL (its corresponding column should be logical or binary), observations receiving TRUE or 1 as their YnodeDet values are assumed to have constant values for their Ynode, and thus are not used in the final estimation step. If communityID is not NULL (its corresponding column can be integer, numeric or character), the package supports three options for the argument community.step, namely "community_level", "individual_level" and "PerCommunity". Otherwise, it assumes that the data set has no hierarchical structure (and thus automatically chooses the option "NoCommunity"). More details are discussed in section 3.2.

The other optional arguments related to the data set - obs.wts, community.wts and fluctuation - are the sampling weights for each observation, the sampling weights for each community, and the choice of the fluctuation working model, respectively. Besides, if fluctuation is specified as "logistic" (the default), continuous outcomes $Y \in [a, b]$ will be bounded and linearly transformed prior to estimating the outcome mechanism.

communityID, community.step and pooled.Q are the main arguments that determine the estimation methods for hierarchical data. First, in order to preserve the hierarchical data structure, the data should contain a column with one unique identifier per community, and the user must provide the communityID argument as a column name or index in data. Failing to provide communityID will force the community.step argument to be automatically set to "NoCommunity" (the default), pooling data across all communities and treating the data as non-hierarchical.

Second, if community.step is specified as "community_level", the observed data will be aggregated to the community level, and estimation of the corresponding statistical parameter will be analogous to that for non-hierarchical data structures. Note that pooled.Q controls whether to incorporate the pairing of individual-level covariates and outcomes (also known as the working model of "no covariate interference") in community-level TMLE, although the working model is not assumed to hold.
In other words, when community.step is set to "community_level", if pooled.Q = TRUE, the pooled individual-level regressions will be added as candidates in the Super Learner library for the initial estimation of the outcome mechanism; if pooled.Q = FALSE, both the outcome and treatment mechanisms are estimated at the community level and use no individual-level information. Moreover, community.step can be specified as "individual_level" under the assumption that the working model of "no covariate interference" holds. Third, the stratified TMLE that fits separate outcome and exposure mechanisms for each community can be implemented by setting community.step to "PerCommunity". Examples with more details are provided in section 4.

Last but not least, the community.wts argument can be used to provide the community-level observation weights. If set to "size.community" (the default), each community is weighted by its (standard-deviation-scaled) number of individuals; this specification inflates the weight for communities that are underrepresented due to a large degree of missing data. If set to "equal.community", all communities are weighted equally. A user-specified community.wts may be passed as a matrix with 2 columns, where the first and second columns contain the identifiers of the communities and the corresponding non-negative weights, respectively.

The user-supplied interventions are specified by the f_g0, f_gstar1 and f_gstar2 arguments. First, an intervention regimen that encodes model knowledge about the values of Anodes is specified with the f_g0 argument, which is either a function or a vector (or a matrix / data frame if the exposures are multivariate) that provides the true treatment mechanism for the observed Anodes. If f_g0 is specified as a function, a large vector (or a data frame if multivariate) of Anodes will be sampled from the f_g0 function.
Second, an intervention regimen of interest is defined by replacing the conditional density $g_0$ with a new user-supplied density $g^*$, and is specified by the f_gstar1 argument, which takes a function of counterfactual exposures. The function must include "data" as one of its argument names, and return a vector or a data frame of counterfactual exposures evaluated based on Anodes, WEnodes (and possibly communityID) passed via the named argument "data". In addition, the interventions defined by f_gstar1 can be static, dynamic or stochastic. For example, for a data set with a binary treatment, a stochastic regime may randomly assign treatment to 30% of the observations, and a deterministically static regime may assign treatment to every observation. The corresponding f_gstar1 functions can be coded as
R> define_f.gstar <- function(prob.val) {
R+   f.gstar <- function(data, ...) {
R+     rbinom(n = NROW(data), size = 1, prob = prob.val)
R+   }
R+   return(f.gstar)
R+ }
R> f.gstar_stoch.0.3 <- define_f.gstar(prob.val = 0.3)
R> f.gstar_determ.1 <- define_f.gstar(prob.val = 1)
Alternatively, a deterministic regime can be specified by passing a vector (or a matrix / data frame) to the f_gstar1 argument, with one element per observation (and one column per treatment variable if multivariate). Moreover, f_gstar1 can be set to a numeric value if that constant exposure is to be assigned to all observations. Thus, the other two ways to code f.gstar_determ.1 in the example above would be
R> f.gstar_vector.1 <- rep(1L, NROW(data))
R> f.gstar_const.1 <- 1L
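The text above also mentions dynamic regimes, i.e., treatment rules that are deterministic functions of covariates. A dynamic regime can be coded in the same style as the stochastic example; the following is a hypothetical sketch in which the treatment rule and the column name "W1" are invented for illustration, not taken from the package's examples.

```r
# Hypothetical dynamic regime for f_gstar1: treat exactly those
# observations whose covariate W1 exceeds a threshold. The rule and the
# column name "W1" are illustrative only.
f.gstar_dynamic <- function(data, ...) {
  as.integer(data$W1 > 0)
}

# Evaluated on a toy data frame:
toy <- data.frame(W1 = c(-0.5, 0.2, 1.3))
f.gstar_dynamic(toy)  # 0 1 1
```

Like the stochastic and static regimes above, this function accepts the observed data through its "data" argument and returns one counterfactual exposure per observation.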
As discussed in section 2.7.1, the first stage of the TMLE procedure is to estimate the conditional mean outcome $\bar{Q}_0$. A good initial fit of $\bar{Q}_0$ reduces reliance on bias reduction in the subsequent targeting step and provides a target parameter estimate with smaller variance. The following optional arguments to the tmleCom_Options() and tmleCommunity() functions give users control over the initial estimation of $\bar{Q}_0$:

Relevant arguments in tmleCom_Options():

1. Qestimator A string specifying the default estimator for fitting $\bar{Q}_0$, including both parametric estimators ("speedglm__glm" and "glm__glm") and data-adaptive estimators ("h2o__ensemble" and "SuperLearner").
2. h2ometalearner A string to pass to h2o.ensemble, specifying the prediction algorithm used to learn the optimal combination of the base learners.
3. h2olearner A vector of prediction algorithms for training the ensemble's base models.
4. SL.library A vector of prediction algorithms to pass to SuperLearner.
5. CVfolds Optional number of splits in the V-fold cross-validation step for data-adaptive estimation.

Relevant arguments in tmleCommunity():

1. Qform Regression formula for $\bar{Q}_0$, of the form Ynode ~ Anodes + WEnodes.
2. Qbounds A vector of 2 truncation levels for continuous Y and for the initial estimate $\bar{Q}_n$.
3. alpha A value keeping $\bar{Q}_n$ bounded away from (0, 1) for the logistic fluctuation.

Note that the negative Bernoulli log-likelihood can be used as a valid loss function for $\bar{Q}_0$ when setting fluctuation = "logistic", even if Y is not binary. Compared to a regular linear fluctuation, a logistic fluctuation ensures that all predicted means lie in (0, 1) and that the corresponding estimates of $\bar{Q}_0$ respect the global constraints of the observed data model, which reduces bias and variance in small samples (Gruber and van der Laan 2010). Before the estimation procedure is performed, outcomes Y will be bounded by Qbounds and then automatically transformed into continuous outcomes $Y^*$ bounded by (0, 1), where $Y^* = (Y - a)/(b - a) \in [0, 1]$. If Qbounds is unspecified, the default choice, the range of Y widened by 10% of its minimum and maximum values, will be used. Besides, if the outcomes Y were originally transformed into $Y^*$, fitted values of the targeted estimates will be automatically mapped back to the original scale. Once Qbounds finishes bounding the observed outcomes, it will be set to (1 - alpha, alpha) and used to bound the predicted values of the initial outcome mechanism.

The default estimation algorithm for $\bar{Q}_0$ is "speedglm__glm", which relies on the speedglm package (Enea 2017) and uses its function speedglm.wfit to fit parametric generalized linear models (GLM) to medium-large data sets. We note that a direct call to speedglm.wfit, which requires a model matrix as input, is faster than the standard call to speedglm. Another regular parametric GLM fitting function, glm.fit (the workhorse of glm), is also available for estimating $\bar{Q}_0$ by setting the Qestimator argument to "glm__glm".
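The bounding and rescaling of a continuous outcome described above can be sketched in a few lines. This is a stand-alone illustration under one plausible reading of the 10% widening rule, not the package's internal code.

```r
# Sketch of the outcome transformation described above: bound a continuous
# Y by Qbounds = (a, b) and map it to Y* = (Y - a) / (b - a) in [0, 1].
# Widening range(Y) by 10% of its width is one plausible reading of the
# default rule; the package's exact computation may differ.
set.seed(4)
Y <- rnorm(1000, mean = 3, sd = 2)
rngY <- range(Y)
Qbounds <- rngY + c(-0.1, 0.1) * diff(rngY)  # widened default bounds
a <- Qbounds[1]; b <- Qbounds[2]
Ystar <- (pmin(pmax(Y, a), b) - a) / (b - a)
stopifnot(all(Ystar >= 0 & Ystar <= 1))

# After the targeting step on the Y* scale, fitted values are mapped back:
Yback <- Ystar * (b - a) + a
stopifnot(isTRUE(all.equal(Yback, Y)))
```

Because the bounds are strictly wider than the observed range, the back-transformation recovers the original outcomes exactly (up to floating-point error).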
When speedglm.wfit or glm.fit is called, logistic regression will be used for the initial estimation of $\bar{Q}_0$.

However, in a nonparametric model the probability distribution of the data is typically unknown, and so parametric models, whose assumptions neither use realistic background knowledge nor respect the global constraints of the statistical model, are easily misspecified. The recommended solution is to use more flexible machine-learning estimators that adapt the regression function to the data without overfitting. tmleCommunity relies on the SuperLearner and h2oEnsemble packages to perform data-adaptive estimation. Based on the oracle properties of V-fold cross-validation that minimizes the estimated expected squared-error loss function (van der Laan, Polley, and Hubbard 2007), super learning chooses the best weighted convex combination of candidate estimators in the user-specified library, possibly including both machine learning algorithms and parametric models. One of its most important advantages is its "free lunch" principle: including a large variety of prediction algorithms in the super learning library increases the chance of consistent estimation, especially when the functional form of the conditional density is unknown. Note that h2oEnsemble is another package that provides the Super Learner method and is based on the h2o package (usually used to build models on large datasets).
Qform can be used to specify a regression formula that includes Anodes and WEnodes for $\bar{Q}_0$. The functional form of the formula matters only for the parametric estimation algorithms speedglm.wfit and glm.fit. When using the data-adaptive estimation algorithms provided by SuperLearner and h2oEnsemble, all variables on the right-hand side of Qform will be treated as predictor variables passed to the candidate estimators, ignoring the original regression formula. If Qform is left unspecified, all variables in Anodes and WEnodes will be treated as predictor covariates. Besides, the library of candidate estimators can specify different functions of the passed predictor variables, such as SL.glm.interaction for second-order interaction terms in SuperLearner. For more details on creating new wrapper functions for prediction algorithms in h2oEnsemble and SuperLearner, please see LeDell (2016) and Polley et al.

Estimation with SuperLearner and h2oEnsemble may fail for various reasons such as constant responses, as may the use of speedglm. Therefore, the tmleCommunity() function provides an insurance mechanism guaranteeing that the estimation procedure functions normally even if the chosen algorithm fails: if "h2o__ensemble" fails, it falls back on "SuperLearner"; if "SuperLearner" fails, it falls back on "speedglm__glm"; if "speedglm__glm" fails, it falls back on "glm__glm". However, tmleCommunity() will terminate with an error if the last resort glm.fit also fails.

We demonstrate a simple application of the tmleCommunity function using user-specified parametric models and a super learning library to estimate $\bar{Q}_0$ in the code chunks below. In this example, we have simulated data of 1,000 i.i.d. observations with four baseline covariates (W1, W2, W3 and W4), one binary exposure (A) and a continuous outcome (Y). The true ATE value is 2.80. Code to generate the example dataset is attached in the supplementary material.

We begin with correctly specified models for $\bar{Q}_0$. Note that tmleCommunity provides results for three estimators of the ATE. In this section, we only care about the non-targeted substitution estimate that uses only an estimate of $\bar{Q}_0$, so we can use "gcomp" to extract the corresponding results of the MLE estimator.

R> tmleCom_Options(Qestimator = "speedglm__glm")
R> tmleCom_Qc_ATE <-
R+   tmleCommunity(data = indSample.bA.cY, Ynode = "Y", Anodes = "A",
R+                 WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = 1,
R+                 f_gstar2 = 0, Qform = "Y ~ W1 + W2 + W3 + W4 + A",
R+                 alpha = 0.995)
R> c(tmleCom_Qc_ATE$ATE$estimates["gcomp", ],
R+   tmleCom_Qc_ATE$ATE$vars["gcomp", ])

[1] 2.816628 0.004813
What if our assumption of a parametric model for $\bar{Q}_0$ is incorrect? The result of the misspecified outcome regression is shown next.

R> tmleCom_Options(Qestimator = "speedglm__glm")
R> tmleCom_Qm_ATE <-
R+   tmleCommunity(data = indSample.bA.cY, Ynode = "Y", Anodes = "A",
R+                 WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = 1,
R+                 f_gstar2 = 0, Qform = "Y ~ W1 + A", alpha = 0.995)
R> c(tmleCom_Qm_ATE$ATE$estimates["gcomp", ],
R+   tmleCom_Qm_ATE$ATE$vars["gcomp", ])

[1] 3.45848993 0.01460869
Next, suppose we do not know the parametric model and need to use the super learning algorithm to estimate $\bar{Q}_0$. Instead of using the default library, we specify one that contains SL.glm, SL.bayesglm and SL.gam (note that SL.gam calls the gam function in the suggested gam package, which fits generalized additive models (Hastie 2017)).

R> require("SuperLearner")
R> tmleCom_Options(Qestimator = "SuperLearner", CVfolds = 5,
R+                 SL.library = c("SL.glm", "SL.bayesglm", "SL.gam"))
R> tmleCom_QSL_ATE <-
R+   tmleCommunity(data = indSample.bA.cY, Ynode = "Y", Anodes = "A",
R+                 WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = 1,
R+                 f_gstar2 = 0, rndseed = 12345)
R> c(tmleCom_QSL_ATE$ATE$estimates["gcomp", ],
R+   tmleCom_QSL_ATE$ATE$vars["gcomp", ])

[1] 2.809818 0.004797
After finishing the initial fit of $\bar{Q}_0$ in the first stage of TMLE, the next step is to modify the initial estimator by using an estimate of the nuisance parameter $g_0$, in order to make an optimal bias-variance trade-off. Recall that the estimate $g_n$ will be used in a clever covariate that defines a parametric fluctuation model to update the initial estimate of $\bar{Q}_0$. The following arguments to the tmleCom_Options() and tmleCommunity() functions provide flexibility in how to choose the estimator for $g_0$:

Relevant arguments in tmleCom_Options():

• gestimator A string specifying the default estimator for fitting $g_0$, similar to Qestimator (except that gestimator also supports "sl3_pipelines", another data-adaptive estimation method).
• bin.method The method for choosing bins when discretizing the conditional continuous exposure variable, one of "equal.mass", "equal.len" and "dhist".
• nbins Number of bins when discretizing a continuous variable.
• maxncats The maximum number of unique categories a categorical variable can have.
• maxNperBin The maximum number of observations in each bin.
• poolContinVar Logical; when fitting a model for a binarized continuous variable, whether to pool bin indicators across all bins and fit one pooled regression.
• savetime.fit.hbars Logical; whether to skip estimation and prediction of the exposure mechanism when f_gstar1 = NULL and TMLE.targetStep = "tmle.intercept".

Relevant arguments in tmleCommunity():

• hform.g0 Regression formula for $g_0$, of the form Anode ~ WEnodes.
• hform.gstar Regression formula for the user-supplied intervention $g^*$, of the form Anode ~ WEnodes.
• lbound One value for bounds on the ratio of the estimate of $g^*$ to the estimate of $g_0$.
• h.g0_GenericModel An object of the GenericModel R6 class containing previously fitted models for $P(A \mid W, E)$ under the observed treatment mechanism $g_0$.
• h.gstar_GenericModel An object of the GenericModel R6 class containing previously fitted models for $P(A^* \mid W, E)$ under the intervention $g^*$.
• TMLE.targetStep TMLE targeting step method, either "tmle.intercept" (the default) or "tmle.covariate".

The options for selecting estimation algorithms for the treatment mechanism are similar to those for estimating $\bar{Q}_0$, and they share the same choices of h2ometalearner, h2olearner, SL.library and CVfolds. Beyond that, users can also use the sl3 package to perform data-adaptive estimation of $g_0$. Note that sl3 is a modern implementation of the Super Learner algorithm for ensemble learning and model stacking. For more details on creating new wrapper functions in sl3, please see Coyle et al. To guarantee that the estimation procedure for $g_0$ functions normally even if the chosen algorithm fails, the tmleCommunity() function uses the following rule: if "h2o__ensemble" fails, it falls back on "sl3_pipelines"; if "sl3_pipelines" fails, it falls back on "SuperLearner"; the rest of the fallback mechanism is the same as in the estimation process of $\bar{Q}_0$.

Also, $g_0$ can be estimated either by parametric models or by data-adaptive algorithms, depending on the choice of gestimator. Although the estimation algorithms for $\bar{Q}_0$ and $g_0$ can differ as long as the choices of Qestimator and gestimator differ, the same library is used for all factors of $g$. For example, the initial fit of $\bar{Q}_0$ may be estimated by h2oEnsemble with "h2o.glm.wrapper" and "h2o.randomForest.wrapper", while a super learning library including SL.loess (with span = 0.8), SL.glmnet, SL.knn.20 (with neighborhood size k = 20) and SL.step is used for each factor of $g$.

It is important to choose the number and position of the bins when discretizing a continuous exposure variable, as these choices affect the variance of the density estimation (Scott 1992). Fortunately, the type of each variable (binary, categorical, or continuous) is automatically detected based on the user-specified maxncats argument. Recall that maxncats provides the maximum number of unique categories a categorical variable can have, so a variable with more unique categories is automatically considered continuous. Following Denby and Mallows (2009), a histogram can be used as a graphically descriptive tool whose bin locations are determined by cutting the empirical cumulative distribution function (ecdf) with a set of parallel lines. First, the nbins argument is a tuning parameter that determines the total number of bins for discretization. A cross-validation selector can be applied to data-adaptively select the candidate number of bins, minimizing variance and maximizing precision. Note that we do not recommend too many bins, as that easily violates the positivity assumption.

Next, given a number of bins, we need to choose the most convenient locations (cutoffs) for the bins. There are three alternative approaches that use a histogram to define the bin cutoffs for a continuous variable: equal-mass, equal-length, and a combination of the two. In the tmleCommunity package, the choice of method bin.method, together with the other arguments nbins and maxNperBin, defines the values of the bin cutoffs. Note that maxNperBin provides a user-specified maximum number of observations in each bin. The default discretization method, the equal-mass (aka equal-area) interval method, is set by passing the argument bin.method = "equal.mass" to tmleCom_Options() prior to calling the main function tmleCommunity().
The intervals are defined by spanning the support of A into bins of non-equal length, each containing (approximately) the same number of observations. It is data-adaptive, since bins tend to be wide where the population density is small, and narrow where the density is large. If nbins is NA (or is smaller than n/maxNperBin), nbins will be (re)set to the integer value of n/maxNperBin, where n is the total number of observations, and the default setting of maxNperBin is 500 observations per interval. This method can identify spikes in the density, but oversmooths in the tails and so cannot discover outliers.

Besides, the equal-length interval method is set by passing the argument bin.method = "equal.len" to tmleCom_Options(). The intervals are defined by spanning the support of a continuous variable into nbins bins of equal length. This method describes the tails of the density and identifies outliers well, but oversmooths in regions of high density and so is poor at identifying sharp peaks. Moreover, as an alternative that strikes a compromise between the equal-mass and equal-length approaches, the combination method is set by passing the argument bin.method = "dhist", where dhist stands for diagonally cut histogram. For consistency, we choose the slope a = 5 × IQR(A) as suggested by Denby and Mallows (2009).
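The contrasting behavior of the equal-mass and equal-length schemes just described can be mimicked with base R. This is a simplified sketch on simulated data; the package's internal cutoff computation may differ.

```r
# Sketch of equal-mass vs. equal-length bin cutoffs for a right-skewed
# continuous exposure A, using nbins = 5.
set.seed(5)
A <- rexp(5000)
nbins <- 5

# Equal-mass: cutoffs at quantiles, so each bin holds ~ the same count.
cut.mass <- quantile(A, probs = seq(0, 1, length.out = nbins + 1))
# Equal-length: cutoffs equally spaced over the support of A.
cut.len <- seq(min(A), max(A), length.out = nbins + 1)

counts.mass <- table(cut(A, breaks = cut.mass, include.lowest = TRUE))
counts.len  <- table(cut(A, breaks = cut.len,  include.lowest = TRUE))
range(counts.mass)  # nearly equal counts per bin
range(counts.len)   # very unequal counts for a skewed density
```

For this skewed density, the equal-mass cutoffs are narrow near zero (where the mass is) and wide in the tail, while the equal-length cutoffs leave the tail bins nearly empty, matching the trade-offs described above.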
Qform , formulae that include
WEnodes can be specified for estimating componentsof g and g ∗ using the hform.g0 and hform.gstar arguments. The functional form of theformulae is unimportant when the data-adaptive estimation algorithms are used. Also, ifthe hform.g0 and hform.gstar arguments are unspecified, the formulae will default to mainterm regressions that includes all variables in WEnodes .The lbound argument is a tuning parameter, conforming with the theoretical assumption 2 insection 2.5 that the ratio of g ∗ ( a | E, W ) to g ( a | E, W ) must be bounded away from + ∞ . Sincethe function g ∗ ( a | E, W ) is user given, we can try to define it in a way so that it could be used toanswer the causal question of interest, and yet it does not produce unstable weights. However,when there are unstable weights that cause extremely large value of g ∗ ( a | E, W ) g ( a | E, W ) , this lack ofidentifiability will result in the estimates with high variance (van der Laan and Rubin 2006). Acommon approach to reduce the variance of the consequent estimates is bounding g ∗ ( a | E, W ) g ( a | E, W ) away from the extremely large value, e.g., 0 ≤ g ∗ ( a | E, W ) g ( a | E, W ) ≤ lbound . However, truncationcomes at a price of bias since the consistency of the estimator of g ( a | E, W ) may be affected.Therefore, the lbound argument should be chosen carefully (it defaults to 0.005). TMLE.targetStep specifies how to use weights h g ∗ h gN in the TMLE targeting step. If it is setto "tmle.intercept" (default), it performs the weighted intercept-based TMLE that runs aintercept-only weighted logistic regression using offsets logit( Q ∗ ) and weights h g ∗ h gN and so nocovariate. If setting to "tmle.covariate" , it performs the unweighted covariate-based TMLEthat run an unweighted logistic regression using offsets logit( Q ∗ ) and a clever covariate h g ∗ h gN .The following example illustrates IPTW estimation of the average causal effect of individual-based continuous intervention at a single time point. 
A sample of 5,000 observations is generated, with each row $i$ consisting of four baseline covariates (W1, W2, W3 and W4), one continuous exposure (A) and a continuous outcome (Y). The true value for the marginal treatment effect of the intervention for the simulated data is $\psi_0 = 3.$ We estimate $\psi_0$ under a truncated stochastic intervention $g^*$, which is defined by shifting the normal density of the observed A, with the shift truncated once the likelihood ratio of $g^*$ to $g_0$ exceeds the bound truncBD. In this case, the tmleCommunity() function receives only one user-specified intervention, and we use $EY_gstar1 to extract the results of the estimates under the intervention f_gstar1. Moreover, we use "iptw" to display the results of the IPTW estimator, since it relies only on the estimate of $g_0$. Let us begin with correctly specified models for $g_0$ and $g^*$ with a shift of 2 on the observed A, using the equal-mass method that discretizes A into 5 bins (all default choices).

R> define_f.gstar <- function(shift.val, truncBD, rndseed = NULL) {
R+   f.gstar <- function(data, ...) {
R+     set.seed(rndseed)
R+     A.mu <- 0.86 * data$W1 + 0.93 * data$W3 * data$W4 + 0.41 * data$W4
R+     untrunc.A <- rnorm(n = nrow(data), mean = A.mu + shift.val, sd = 1)
R+     r.new.A <- exp(0.5 * shift.val * (untrunc.A - A.mu - shift.val / 2))
R+     trunc.A <- ifelse(r.new.A > truncBD, untrunc.A - shift.val, untrunc.A)
R+     return(trunc.A)
R+   }
R+   return(f.gstar)
R+ }
R> f.gstar <- define_f.gstar(shift.val = 2, truncBD = 10, rndseed = 1)
R>
R> gform.C <- "A ~ W1 + W3 * W4"

[1] 3.4406569 0.0120417
Note that if the discretization method is equal mass (i.e., bin.method = "equal.mass") and each bin is allowed to contain no more than 250 observations (i.e., maxNperBin = 250), the number of bins (or regressions) will be set to the larger of the nearest integer of n/maxNperBin and the value of nbins, where n is the total number of observations. Thus, even though nbins defaults to 5, the actual number of bins for a sample of 5,000 will be 20. It is worth mentioning that during estimation, $-\infty$ and $+\infty$ are added as the leftmost and rightmost cutoff points to make sure all future data points end up in one of the intervals. For example, if the actual number of bins is 20, then the returned results will include 22 fitted models. However, the selection of maxNperBin has no influence on the actual number of bins when using the equal-length and combination methods.

R> tmleCom_Options(maxNperBin = 250, bin.method = "equal.mass")
R> tmleCom_gmain_eqmass <-
R+   tmleCommunity(data = indSample.cA.cY, Ynode = "Y", Anodes = "A",
R+                 WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = f.gstar)
R> h.g0_models_mass <- tmleCom_gmain_eqmass$EY_gstar1$h.g0_GenericModel
R> length(h.g0_models_mass$getPsAsW.models()$`P(A|W).1`$bin_nms)

[1] 22

R> tmleCom_Options(maxNperBin = 250, bin.method = "equal.len")
R> tmleCom_gmain_eqlen <-
R+   tmleCommunity(data = indSample.cA.cY, Ynode = "Y", Anodes = "A",
R+                 WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = f.gstar)
R> h.g0_models_len <- tmleCom_gmain_eqlen$EY_gstar1$h.g0_GenericModel
R> length(h.g0_models_len$getPsAsW.models()$`P(A|W).1`$bin_nms)

[1] 7
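The bin-count rule just illustrated can be checked with a one-line computation. This is a sketch of the stated rule only, not the package's internal code.

```r
# Sketch of the equal-mass bin-count rule described above: the number of
# bins is the larger of round(n / maxNperBin) and the nbins setting.
n <- 5000; maxNperBin <- 250; nbins.default <- 5
nbins.used <- max(as.integer(round(n / maxNperBin)), nbins.default)
nbins.used      # 20 bins for this sample
# With -Inf and +Inf added as outer cutoffs, 22 fitted-model entries are
# returned, consistent with the equal-mass output shown above.
nbins.used + 2  # 22
```

For the equal-length method, by contrast, the rule does not apply and nbins is used directly, which is why the second run above reports a different count.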
TMLE.targetStep = "tmle.intercept" , setting savetime.fit.hbars to TRUE allowsthe TMLE process to skip the estimation and prediction of exposure mechanism P ( A | W, E )under g and g ∗ . It will directly set h g ∗ h gN to 1 for all observations. R> tmleCom_nofgstar <-R> tmleCommunity(data = indSample.cA.cY, Ynode = "Y", Anodes = "A",R+ WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = NULL,R+ Qform = "Y ~ W1 + W2 + W3 + W4 + A")R> c(tmleCom_nofgstar$EY_gstar1$h.g0_GenericModel,R+ tmleCom_nofgstar$EY_gstar1$h.gstar_GenericModel) [1] NULL
If instead we would like to estimate the same parameter except using machine learning algo-rithms. R code that uses
SuperLearner to estimate ψ is shown next. It displays a satisfac-tory result of estimation with a super learning library containing "SL.glm", "SL.gam" and "SL.randomForest" . R> require("SuperLearner")R> tmleCom_Options(gestimator = "SuperLearner", maxNperBin = N,R+ SL.library = c("SL.glm", "SL.gam"))R> tmleCom_gSL_default <-R+ tmleCommunity(data = indSample.cA.cY, Ynode = "Y", Anodes = "A",R+ WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = gstar,R+ Qform = "Y ~ W1 + W2 + W3 + W4 + A", rndseed = 1)R> c(tmleCom_gSL_default$EY_gstar1$estimates["iptw", ],R+ tmleCom_gSL_default$EY_gstar1$vars["iptw", ]) tmleCommunity : Target Maximum Likelihood Estimation for Community-level Data [1] 3.4471211 0.0124127 Another choice for performing maching leraning algorithms, especially stacked ensemble learn-ing, is to use the sl3 package. Example R code for estimating ψ is shown next. Both Lrnr_glm_fast and
Lrnr_glmnet are used in the library.
R> require("sl3"); require("SuperLearner")
R> tmleCom_Options(gestimator = "sl3_pipelines", maxNperBin = N, CVfolds = 5,
R+                 sl3_learner = list(glm_fast = make_learner(Lrnr_glm_fast),
R+                                    glmnet = make_learner(Lrnr_glmnet)),
R+                 sl3_metalearner = make_learner(
R+                   Lrnr_optim, loss_function = loss_squared_error,
R+                   learner_function = metalearner_logistic_binomial))
R> tmleCom_gsl3_default <-
R+   tmleCommunity(data = indSample.cA.cY, Ynode = "Y", Anodes = "A",
R+                 WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = f.gstar,
R+                 rndseed = 1)
R> c(tmleCom_gsl3_default$EY_gstar1$estimates["iptw", ],
R+   tmleCom_gsl3_default$EY_gstar1$vars["iptw", ])

[1] 3.449183 0.014578
Recall that the h2oEnsemble package can also perform super learning. In this case, we apply generalized linear models with penalized maximum likelihood both for the base learners that are used to train the base models for the ensemble, and for the metalearner that is used to learn the optimal combination of the base learners. Specifically, the base learners include three regressions: Lasso (α = 1), Ridge (α = 0) and an Elastic net model with α = 0.5.

R> require("h2oEnsemble")
R> h2o.glm.1 <- function(..., alpha = 1, prior = NULL) {
R+   h2o.glm.wrapper(..., alpha = alpha, prior = prior)
R+ }
R> h2o.glm.0.5 <- function(..., alpha = 0.5, prior = NULL) {
R+   h2o.glm.wrapper(..., alpha = alpha, prior = prior)
R+ }
R> h2o.glm.0 <- function(..., alpha = 0, prior = NULL) {
R+   h2o.glm.wrapper(..., alpha = alpha, prior = prior)
R+ }
R> tmleCom_Options(gestimator = "h2o__ensemble", maxNperBin = N,
R+                 h2ometalearner = "h2o.glm.wrapper",
R+                 h2olearner = c("h2o.glm.1", "h2o.glm.0.5", "h2o.glm.0"))
R> tmleCom_gh2o_default <-
R+   tmleCommunity(data = indSample.cA.cY, Ynode = "Y", Anodes = "A",
R+                 WEnodes = c("W1", "W2", "W3", "W4"),
R+                 f_gstar1 = f.gstar, rndseed = 1)
R> c(tmleCom_gh2o_default$EY_gstar1$estimates["iptw", ],
R+   tmleCom_gh2o_default$EY_gstar1$vars["iptw", ])

[1] 3.4321917 0.0118350

For full details, see the documentation for the tmleCommunity package (cite ***).
• data Observed data, data.frame with named columns, containing
WEnodes, Anodes, Ynode and possibly communityID, YnodeDet.
• Ynode
Column name or index in data of the outcome variable. The outcome can be either binary or continuous (and may fall outside [0, 1]). If
Ynode undefined, the left-side of theregression formula in argument
Qform will be treated as
Ynode . • Anodes
Column names or indices in data of exposure (treatment) variables. • WEnodes
Column names or indices in data of individual-level (and possibly community-level) baseline covariates. Factors are not allowed. • YnodeDet
Optional column name or index in data of indicators of deterministic values of outcome Ynode, coded as (TRUE/FALSE) or (1/0). If TRUE or 1, the value of Ynode is set deterministically.
• obs.wts Optional choice to provide or construct a vector of individual-level observation (sampling) weights (of length nrow(data)). Currently supports a non-negative numeric vector, "equal.within.pop" (default) and "equal.within.community". If "equal.within.pop", individuals in the entire data set are weighted equally (all weights are 1); if "equal.within.community", individuals within the same community are weighted equally (i.e., 1 / (number of individuals in the community)).
• community.step Method for dealing with hierarchical data; the user specifies one of four choices: "NoCommunity" (default), "community_level", "individual_level" and "PerCommunity". If "NoCommunity", the data are treated as having no hierarchical structure; if "community_level", the community-level TMLE is used; if "individual_level", the individual-level TMLE is used together with the assumption of no covariate interference; finally, if "PerCommunity", a stratified TMLE is used. If communityID = NULL, the estimation automatically pools over all communities (i.e., the data are treated as "NoCommunity").
• communityID Optional column name or index in data representing the community identifier variable. If known, it supports the three hierarchical options within community.step: "community_level", "individual_level" and "PerCommunity".
• community.wts Optional choice to provide or construct a matrix of community-level observation weights (of dimension J × 2, where J is the number of communities). The first column contains the identifiers/names of the communities (i.e., data[, communityID]) and the second column contains the corresponding non-negative weights. Currently supports a numeric matrix with 2 columns, "size.community" (default) and "equal.community". If community.wts = "size.community", the number of individuals within each community is used as its weight; if community.wts = "equal.community", all weights are set to 1.
• pooled.Q Logical for incorporating hierarchical data when estimating the outcome mechanism. If
TRUE, a pooled individual-level regression is used for the initial estimation of the mean outcome (i.e., the outcome mechanism). Defaults to FALSE.
• f_g0 Optional function used to specify model knowledge about the values of Anodes. It estimates P(A | W, E) under g0 by sampling a large vector/data frame of Anodes values (of length nrow(data)*n_MCsims, or that number of rows for a data frame) from the f_g0 function.
• f_gstar1 Either a function, a vector, or a matrix/data frame of counterfactual exposures, depending on the number of exposure variables. If a matrix/data frame, its number of rows must be either nrow(data) or 1 (a constant exposure assigned to all observations), and its number of columns must equal length(Anodes). Note that the column names should match the names in Anodes. If a vector, it must be of length nrow(data) or 1. If a function, it must return a vector or a data frame of counterfactual exposures sampled based on Anodes and WEnodes (and possibly communityID), passed as a named argument "data"; thus, the function must include "data" as one of its argument names. The interventions defined by f_gstar1 can be static, dynamic or stochastic.
• f_gstar2 Either a function, a vector, or a matrix/data frame of counterfactual exposures, depending on the number of exposure variables. It has the same components and requirements as f_gstar1.
• Qform
Character vector of regression formula for Ynode. If not specified (i.e.,
NULL), the outcome variable is regressed on all covariates included in Anodes and WEnodes (i.e.,
Ynode ~ Anodes + WEnodes ). • Qbounds
Vector of upper and lower bounds on Y and on the predicted values for the initial Q. Defaults to the range of Y, widened by 10% of the minimum and maximum values.
• alpha Used to keep the predicted values for the initial Q bounded away from (0, 1) for the logistic fluctuation (sets Qbounds to (1 − alpha, alpha)).
• fluctuation Defaults to "logistic"; it can also be "linear" (for the targeting step).
• hform.g0 Character vector of regression formula for estimating the conditional density P(A | W, E) under the observed treatment mechanism g. If not specified, its form will be Anodes ~ WEnodes. If there is more than one exposure, a joint probability is fit.
• hform.gstar Character vector of regression formula for estimating the conditional density P(A | W, E) under the user-supplied interventions f_gstar1 or f_gstar2. If not specified, the same regression formula as in hform.g0 is used.
• lbound Value between (0, 1) for truncation of the predicted P(A | W, E). Defaults to 0.005.
• h.g0_GenericModel An object of
GenericModel R6 class containing the previouslyfitted models for P ( A | W, E ) under observed treatment mechanism g , one of the returnsof tmleCommunity function. If known, predictions for P ( A = a | W = w, E = e ) under g are based on the fitted models in h.g0_GenericModel . • h.gstar_GenericModel An object of
GenericModel R6 class containing the previously fitted models for P(A* | W, E) under intervention gstar, one of the returns of the tmleCommunity function. If known, predictions for P(A = a | W = w, E = e) under gstar are based on the fitted models in h.gstar_GenericModel.
• TMLE.targetStep
TMLE targeting step method, either "tmle.intercept" (default) or "tmle.covariate".
• n_MCsims Number of simulations for the Monte-Carlo analysis. Each simulation generates new exposures under f_gstar1 or f_gstar2 (if specified) or f_g0 (if specified), with a sample size of nrow(data). These generated exposures are then used when fitting the conditional densities P(A | W, E) and when estimating IPTW and GCOMP under intervention f_gstar1 or f_gstar2. Note that a deterministic intervention needs only one simulation, while a stochastic intervention may use more simulations, such as 10 (defaults to 1).
• CI_alpha Significance level used in constructing a confidence interval. Defaults to 0.05.
• rndseed Random seed controlling the sampling of A under f_gstar1 or f_gstar2 (for reproducibility of the Monte-Carlo simulations).
• verbose Flag. If TRUE, status messages are printed. Defaults to FALSE. It can be turned on by setting options(tmleCommunity.verbose = TRUE).

The following arguments are specified through the tmleCom_Options function. For full details, see the documentation for the tmleCommunity package (cite ***).
• Qestimator
A string specifying the default estimator for outcome mechanism model fitting. The default estimator is "speedglm__glm", which estimates regressions with speedglm.wfit; estimator "glm__glm" uses glm.fit; estimator "h2o__ensemble" implements the super learner ensemble (stacking) algorithm through the H2O R interface; estimator "SuperLearner" implements the super learner prediction methods. Note that if "h2o__ensemble" fails, it falls back on "SuperLearner"; if "SuperLearner" fails, it falls back on "speedglm__glm"; if "speedglm__glm" fails, it falls back on "glm__glm".
• gestimator A string specifying the default estimator for exposure mechanism fitting. It has the same options as Qestimator.
• bin.method The method for choosing bins when discretizing the conditional continuous exposure variable A. The default method is "equal.mass", which provides a data-adaptive selection of the bins based on equal mass/area, i.e., each bin will contain approximately the same number of observations. Method "equal.len" partitions the range of A into nbins intervals of equal length. Method "dhist" uses a combination of the two approaches; see Denby and Mallows, "Variations on the Histogram" (2009), for details.
• nbins When bin.method = "equal.len", the user-supplied number of bins for discretizing a continuous variable. If not specified, defaults to 5; if set to NA, it is set to the nearest integer of nobs/maxNperBin, where nobs is the total number of observations in the input data. When the method is "equal.mass", nbins is set to the maximum of the default nbins and the nearest integer of nobs/maxNperBin.
• maxncats Integer specifying the maximum number of unique categories a categorical variable
A[j] can have. If
A[j] has more unique categories, it is automaticallyconsidered a continuous variable. Default to 10. • maxNperBin Integer that specifies the maximum number of observations in each binwhen discretizing a continuous variable
A[j] (applies directly when bin.method ="equal.mass" and indirectly when bin.method = "equal.len" , but nbins = NA ). • parfit Logical. If
TRUE, perform parallel regression fits and predictions for discretized continuous variables via the foreach and dopar functions in the foreach package. Defaults to FALSE. Note that this requires registering a parallel backend prior to running the tmleCommunity function, e.g., loading the doParallel R package and running registerDoParallel(cores = ncores) for ncores parallel jobs.
• poolContinVar Logical. If
TRUE , when fitting a model for binarized continuous variable,pool bin indicators across all bins and fit one pooled regression. Default to
FALSE . • savetime.fit.hbars Logical. If
TRUE, skip estimation and prediction of the exposure mechanism P(A | W, E) under g and g* when f_gstar1 = NULL and TMLE.targetStep = "tmle.intercept", and directly set h_gstar_h_gN = 1 for each observation. Defaults to
TRUE.
• h2ometalearner A string to pass to h2o.ensemble, specifying the prediction algorithm used to learn the optimal combination of the base learners. Supports both h2o and SuperLearner wrapper functions. Defaults to "h2o.glm.wrapper".
• h2olearner A string or character vector to pass to h2o.ensemble, naming the prediction algorithm(s) used to train the base models for the ensemble. The functions must have the same format as the h2o wrapper functions. Defaults to "h2o.glm.wrapper".
• CVfolds The number of splits for the V-fold cross-validation step, passed to SuperLearner and h2o.ensemble. Defaults to 5.
• SL.library A string or character vector of prediction algorithms to pass to SuperLearner. Defaults to c("SL.glm", "SL.step", "SL.glm.interaction"). For more available algorithms, see SuperLearner::listWrappers().
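To make the difference between the two main bin.method choices concrete, the following base-R sketch (an illustration of the binning semantics, not the package's internal code) computes equal-mass cutoffs via empirical quantiles and equal-length cutoffs via an evenly spaced grid:

```r
set.seed(1)
A <- rexp(1000)  # a skewed continuous exposure

nbins <- 5
# bin.method = "equal.mass": data-adaptive cutoffs, ~equal counts per bin
cuts_mass <- quantile(A, probs = seq(0, 1, length.out = nbins + 1))
# bin.method = "equal.len": partition range(A) into nbins equal-length intervals
cuts_len <- seq(min(A), max(A), length.out = nbins + 1)

# equal-mass bins hold (nearly) the same number of observations...
table(cut(A, cuts_mass, include.lowest = TRUE))
# ...while equal-length bins do not for skewed data
table(cut(A, cuts_len, include.lowest = TRUE))
```

For a heavily skewed exposure, the equal-length bins concentrate most observations in the first interval, which is why "equal.mass" is the data-adaptive default.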
4. Simulation studies with community-level interventions
We perform a simulation study evaluating the finite sample bias and variance of the TMLEs presented in sections 2.7.1 and 2.8.1, including both the community-level and the individual-level TMLE. In addition, we compare the performance of the TMLE with that of the inverse-probability-of-treatment-weighted estimator (IPTW) and the parametric G-computation formula estimator (GCOMP). In order to estimate the average causal effect of community-level intervention(s) at a single time point on an individual-based outcome, we simulate a data set of J communities (indexed by j), each containing a non-fixed number of individuals n_j, where n_j is drawn from a normal distribution with mean 50 and standard deviation 10 and rounded to the nearest integer. First, we sample i.i.d. community-level baseline covariates (E1, E2). Then 3 dependent individual-level baseline covariates (W1, W2, W3) are drawn as functions of the community-level baseline covariates: W1 is Bernoulli with a logit-linear dependence on (E1,j, E2,j), and (W2, W3) are bivariate normal with means depending on (E1,j, E2,j) and a fixed covariance matrix Σ. A community-level continuous treatment A is then sampled conditionally on the values of all baseline covariates,

A_j ~ N(−1.2 + 0.8 E1,j + 0.21 E2,j + 3 W^c_{1,j} − 0.7 W^c_{2,j} + 0.3 W^c_{3,j}, 1),

where W^c_{k,j} = (1/n_j) Σ_{i=1}^{n_j} W_{k,i,j} denotes the community-level mean of the k-th individual-level covariate. A truncated stochastic intervention g* is then defined by shifting the normal density of the observed A by a known constant shift > 0, unless the likelihood ratio g*/g exceeds a known constant bound truncbd, in which case the intervention keeps the observed exposure A unchanged. Thus the post-intervention exposure A*_j is distributed as

A*_j = A_j + shift,  if exp{1.5 · shift · (A_j − μ(E_j, W^c_j) − shift/4)} ≤ truncbd,
A*_j = A_j,          otherwise,

where μ(E_j, W^c_j) = −1.2 + 0.8 E1,j + 0.21 E2,j + 3
W^c_{1,j} − 0.7 W^c_{2,j} + 0.3 W^c_{3,j}. Last, the individual-level binary outcome Y, a function of the treatment and all baseline covariates, is simulated; the post-intervention outcome Y*, under the stochastic intervention g*, is defined analogously with A_j replaced by A*_j:

• Case 1: Working model holds. Y_{i,j} ~ Bern(expit(β0 + β1 A_j + β2 E1,j + β3 E2,j + β4 W1,i,j + β5 W2,i,j + β6 W3,i,j)), i.e., each individual's outcome depends only on the treatment, the community-level covariates and that individual's own covariates.

• Case 2: Working model is not a reasonable approximation. The outcome model additionally includes the community-level means W^c_{1,j}, W^c_{2,j}, W^c_{3,j} with large coefficients, so that each individual's outcome depends on the covariates of the other individuals in the same community and the assumption of no covariate interference fails.

The next code chunk shows how to simulate a data set according to the previous data-generating distributions when the working model fails. The code also defines the stochastic intervention g* that we are interested in: a constant shift of 1, given a truncation bound of 5. The true parameter value under this stochastic intervention is approximated by Monte-Carlo simulation of a large population, as shown below.
R> getY <- function(A, E1, E2, W1, W2, W3, bs, n.ind) {
R+   prob.Y <- plogis(bs[1] + bs[2] * A + bs[3] * E1 + bs[4] * E2
R+                    + bs[5] * mean(W1) + bs[6] * mean(W2) + bs[7] * mean(W3)
R+                    + bs[8] * W1 + bs[9] * W2 + bs[10] * W3)
R+   rbinom(n = n.ind, size = 1, prob = prob.Y)
R+ }
R>
R> get.cluster.Acont <- function(id, n.ind, truncBD = 5, shift = 1,
R+                               working.model = T) {
R+   ...
R+ }
R>
R> get.fullDat.Acont <- function(J, n.ind, truncBD, shift, working.model,
R+                               only.Y = FALSE) {
R+   if (...) {  # don't fix the number of obs in each community
R+     n.ind <- round(rnorm(J, n.ind, 10))
R+     n.ind[n.ind <= 0] <- n.ind
R+   }
R+   if (only.Y) { id <- Y <- Y.gstar <- NULL } else { full.dat <- NULL }
R+   for (j in 1:J) {
R+     cluster.data.j <- get.cluster.Acont(working.model = working.model,
R+                                         id = j, n.ind = n.ind[j],
R+                                         truncBD = truncBD, shift = shift)
R+     if (only.Y) {
R+       id <- c(id, cluster.data.j[, "id"])
R+       Y <- c(Y, cluster.data.j[, "Y"])
R+       Y.gstar <- c(Y.gstar, cluster.data.j[, "Y.gstar"])
R+     } else {
R+       full.dat <- rbind(full.dat, cluster.data.j)
R+     }
R+     if (!only.Y) { full.dat$id <- as.integer(full.dat$id) }
R+   }
R+   ifelse(only.Y,
R+          return(data.frame(cbind(id, Y, Y.gstar))), return(full.dat))
R+ }
R>
R> PopDat.wmF <- get.fullDat.Acont(J = 4000, n.ind = 1000, truncBD = 5,
R+                                 shift = 1, working.model = F, only.Y = T)
R> PopDat.wmF.agg <- aggregate(PopDat.wmF, by = list(PopDat.wmF$id), mean)
R> truth.wmF <- mean(PopDat.wmF.agg$Y.gstar)
R>
R> comSample.wmF <- get.fullDat.Acont(
R+   J = 1000, n.ind = 50, truncBD = 5, shift = 1, working.model = F)
R> comSample.wmF$Y.gstar <- NULL
R>
R> define_f.gstar <- function(shift.val, truncBD, rndseed = NULL) {
R+   f.gstar <- function(data, ...) {
R+     set.seed(rndseed)
R+     A.mu <- -1.2 + 0.8 * data$E1 + 0.21 * data$E2 +
R+       3 * mean(data$W1) - 0.7 * mean(data$W2) + 0.3 * mean(data$W3)
R+     untrunc.A <- rnorm(n = nrow(data), mean = A.mu + shift.val, sd = 1)
R+     r.new.A <- exp(1.5 * shift.val * (untrunc.A - A.mu - shift.val / 4))
R+     trunc.A <- ifelse(r.new.A > truncBD, untrunc.A - shift.val, untrunc.A)
R+     return(trunc.A)
R+   }
R+   return(f.gstar)
R+ }
R> f.gstar <- define_f.gstar(shift.val = 1, truncBD = 5)
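The truncation logic inside f.gstar above can be isolated and checked on its own. The following standalone sketch (the helper name `shift_with_truncation` is ours) applies the same rule: a shifted draw a is kept only when the implied density ratio stays below truncBD; otherwise the shift is reverted:

```r
# Standalone version of the truncation rule used in f.gstar: keep the
# shifted draw a (drawn around mu + shift) unless the ratio
# exp(1.5 * shift * (a - mu - shift/4)) exceeds truncBD, in which case
# revert the shift.
shift_with_truncation <- function(a, mu, shift = 1, truncBD = 5) {
  ratio <- exp(1.5 * shift * (a - mu - shift / 4))
  ifelse(ratio > truncBD, a - shift, a)
}

# a close to mu: the ratio is small, so the shift is kept
shift_with_truncation(a = 0.5, mu = 0)  # returns 0.5
# a far above mu: the ratio exceeds truncBD = 5, so the shift is reverted
shift_with_truncation(a = 2.0, mu = 0)  # returns 1.0
```

With shift = 1 and truncBD = 5, the switch-over occurs where exp(1.5 (a − mu − 0.25)) = 5, i.e., at a − mu ≈ 1.32.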
We first demonstrate how to use the two distinct approaches for leveraging a hierarchical data structure. Recall that the first approach treats the community, rather than the individual, as the unit of analysis and performs estimation on the aggregated data. It can also incorporate the hierarchical structure when estimating the outcome mechanism, by adding a single pooled individual-level regression to the Super Learner library. The second approach, on the other hand, runs pooled individual-level regressions for both the outcome and treatment mechanisms, because it utilizes the pairing of individual-level covariates and outcomes. Note that all approaches use the equal-mass method for choosing bins (for the continuous exposure) and speedglm as the estimator for both the outcome and exposure mechanisms. Parameter estimates are obtained from 200 repetitions of the simulation.
R> niterations <- 200
Table 1: Simulation study 1. Simulation-based performance of the TMLE, IPTW and Gcomp estimators with stochastic exposures over 200 repetitions of the simulation, when the working model holds and when it fails. ψ̂ denotes the average point estimate, "Bias" the absolute difference between the estimate ψ̂ and the truth ψ0, σ̂ the average standard error estimate, rMSE the root mean squared error, and "Cover" the proportion of times that the truth falls within the 95% CI. All outcome and treatment mechanisms are correctly specified. All reported bias, SE, rMSE and coverage values are multiplied by 100.

             Working Model Holds                     Working Model Fails
Estimator    ψ̂      Bias   σ̂     rMSE  Cover    ψ̂      Bias   σ̂     rMSE  Cover
TMLE-Ia     55.75   0.18   0.60   0.62   89.5   56.47   0.69   1.36   1.52   86.5
TMLE-Ib     56.48   0.91   0.64   1.12   68.5   56.50   0.72   1.60   1.75   89.5
TMLE-II     55.71   0.13   0.39   0.41   84.0   57.59   1.81   1.48   2.34   69.0
IPTW-I      56.63   1.06   2.60   2.81   100    56.16   0.38   2.91   2.94   100
IPTW-II     55.67   0.10   3.44   3.44   100    57.23   1.45   4.23   4.47   100
Gcomp-I     55.83   0.26   0.60   0.65   90.5   56.44   0.67   1.36   1.51   85.5
Gcomp-II    55.75   0.18   0.39   0.43   87.5   57.57   1.79   1.48   2.33   71.0

The results displayed in Table 1 compare the performance of the two TMLEs when the assumption of "no covariate interference" holds and when it fails badly. As predicted by theory, the community-level targeted estimator (TMLE-Ia), which is always based on the aggregated data, performs well in both situations with negligible bias. However, the coverage rates of its influence-curve-based confidence intervals are lower than nominal (89.5% and 86.5%) due to small variance estimates. TMLE-Ib, which uses a pooled individual-level outcome regression followed by a community-level stochastic intervention, performs slightly worse than TMLE-Ia in this case. When the working model holds, the coverage rate of TMLE-Ib (68.5%) shows a notable decrease compared to the other estimators because of a relatively large bias. As expected, the individual-level targeted estimator (TMLE-II) is biased and its confidence interval coverage drops markedly when the working model fails. Surprisingly, even though the working model is not a reasonable approximation, the IPTW using the individual-level stochastic intervention (IPTW-II) provides a reasonable estimate.

We now consider another simulation study, with binary community-level exposure(s), of the kind commonly used in studies of HIV prevention and treatment.
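The summary statistics reported in Table 1 can be reproduced from the vectors of point estimates and standard errors saved across the simulation repetitions. A minimal sketch follows (with made-up inputs for illustration, not the actual simulation output):

```r
# Sketch: computing Table-1-style summaries from niterations point
# estimates (psi.hat) and influence-curve standard errors (se.hat).
summarize_sims <- function(psi.hat, se.hat, psi0) {
  covered <- (psi0 >= psi.hat - 1.96 * se.hat) &
             (psi0 <= psi.hat + 1.96 * se.hat)
  c(est   = mean(psi.hat),                    # average point estimate
    bias  = abs(mean(psi.hat) - psi0),        # absolute bias
    se    = mean(se.hat),                     # average SE estimate
    rMSE  = sqrt(mean((psi.hat - psi0)^2)),   # root mean squared error
    cover = 100 * mean(covered))              # 95% CI coverage (%)
}

set.seed(1)
psi0 <- 0.5575                       # illustrative truth
psi.hat <- rnorm(200, psi0, 0.006)   # 200 simulated point estimates
se.hat <- rep(0.006, 200)
round(summarize_sims(psi.hat, se.hat, psi0), 4)
```

Multiplying the bias, SE and rMSE entries by 100 yields values on the scale reported in the tables.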
Similar to the previous simulation, we generate 200 samples of J = 100 communities, each containing n_j observations, where n_j is drawn from a normal distribution with mean 50 and rounded to the nearest integer. Two individual-level baseline covariates are sampled i.i.d. across the i = 1, ..., n_j individuals in community j: W1 is Bernoulli and W2 is normally distributed, with community-level means W^c_{1,j} = (1/n_j) Σ_{i=1}^{n_j} W_{1,i,j} and W^c_{2,j} = (1/n_j) Σ_{i=1}^{n_j} W_{2,i,j}. A binary community-level treatment is then assigned as A_j ~ Bern(expit(α1 W^c_{1,j} + α2 W^c_{2,j})). The mechanism differs from the previous study only in the outcome distribution:

• Case 1: Working model is a reasonable approximation. Y_{i,j} ~ Bern(expit(0.15 + β1 A_j + β2 W^c_{1,j} + 2 W_{1,i,j} + β3 W_{2,i,j})).

• Case 2: Working model is not a reasonable approximation. Y_{i,j} ~ Bern(expit(0.15 + γ1 A_j + 3 W^c_{1,j} − γ2 W^c_{2,j} − γ3 W_{1,i,j} + W_{2,i,j})), where the strong dependence on the community-level means again breaks the "no covariate interference" assumption.

As before, Table 2 summarizes the performance of the estimators under the different outcome-generating distributions. First, TMLE-Ia performs well in both situations, with negligible bias and good confidence interval coverage. As expected, TMLE-Ib performs similarly to TMLE-Ia when the working model holds, and worse than TMLE-Ia, in terms of bias and variance, when it fails. TMLE-II, on the other hand, performs very well when the working model provides a reasonable approximation, but exhibits slight increases in bias and variance (and thus a more conservative confidence interval) when the working model fails. Theoretically, TMLE-II works with lower-dimensional objects of size N = Σ_{j=1}^{J} N_j and so may improve finite sample efficiency if the working model holds. However, when the working model does not hold, misspecification of both the outcome and treatment regressions causes biased estimates and a loss of efficiency. In addition, both IPTW-I (with the community-level g) and IPTW-II (with the individual-level g) have larger variances than the other estimators, and consequently 100% coverage rates. This can be explained by the relatively large variability of the IPTW estimator, despite the large sample size: the range of the estimated IPTW values can be wide, which results in a large variance.

Table 2: Simulation study 2. Simulation-based performance of the estimators with a binary community-level exposure over 200 repetitions of the simulation, when the working model holds and when it fails.

             Working Model Holds            Working Model Fails
Estimator    Bias    σ̂      rMSE   Cover    Bias    σ̂      rMSE   Cover
TMLE-Ia      0.03    1.15    1.16    95.0    0.03    1.07    1.08    92.5
TMLE-Ib      0.16    1.16    1.17    95.0    0.25    1.24    1.26    97.5
TMLE-II      0.01    1.14    1.14    95.0    0.04    1.22    1.22    96.0
IPTW-I       0.02    3.79    3.79   100      0.06    3.46    3.46   100
IPTW-II      0.04   15.99   15.99   100      0.02   17.56   17.56   100
Gcomp-I      0.03    1.15    1.16    95.5    0.03    1.07    1.08    91.5
Gcomp-II     0.01    1.14    1.14    95.0    0.04    1.22    1.22    96.0

In the third simulation, we study the special case where each community has only one observation (i.e., N_j = 1) and the intervention is stochastic. As described in section 2.7.3, this is similar to data with only community-level baseline covariates (i.e., treating (E, W) = E). The data-generating distribution is as follows: four independent community-level baseline covariates are drawn, two Bernoulli (E1, E2), one normal (E3) and one uniform (E4); the continuous exposure A, given (E1, E2, E3, E4), is normally distributed with mean μ(E1, E2, E3, E4) linear in the covariates; and the continuous outcome Y, given (A, E1, E2, E3, E4), is normally distributed with a mean linear in A and the covariates (intercept 3.63). The shifted exposure A* is distributed as

A*_j = A_j + shift,  if exp{1.5 · shift · (A_j − μ(E1, E2, E3, E4) − shift/4)} ≤ truncbd,
A*_j = A_j,          otherwise.

Given a shift of 2 and a truncation bound of 10, we estimate the marginal treatment effect of the individual-based intervention. Q̄(A, E) is estimated with the correctly specified main-terms regression model and with a misspecified regression model adjusting only for A and one baseline covariate. Similarly, g*(a | E) is estimated with a correctly specified model as well as a misspecified model adjusting for only one covariate.

Figure 1: Box plots of the point estimates from the three estimators for sample sizes n = 1000 (left) and n = 5000 (right) in Simulation study 3. The x-axis denotes the combination of the estimator and the model specification: CC indicates a correctly specified outcome regression and exposure mechanism; CM indicates the outcome regression is correctly specified while the exposure mechanism is misspecified; MC indicates the exposure mechanism is correctly specified while the outcome regression is misspecified. The dashed line indicates the true value ψ0.

The simulation results are consistent with the theoretical predictions. TMLE performs quite well if either the outcome regression or the exposure mechanism is correctly specified. IPTW exhibits low bias when g*(a | E) is correctly specified, but is biased otherwise; this bias decreases but does not disappear as the sample size increases. Moreover, IPTW has a much higher variance than the other estimators even with a correctly specified g*(a | E), which may be explained by practical positivity violations: small values of g*(a | E) place large weights on a few individuals.
Weight truncation is a possible remedy for such practical violations: more restrictive bounds can be imposed on g* (in this simulation analysis, a less restrictive set of bounds was used). When the model for Q̄(A, E) is misspecified, MLE performs poorly in terms of precision, but it is unbiased when Q̄(A, E) is correctly specified. Furthermore, a larger sample size does help reduce the variance.
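The weight-truncation remedy mentioned above corresponds to the lbound argument: predicted values of the exposure density are bounded away from zero before the inverse-probability weights are formed. A minimal sketch of such bounding (plain R, not the package's internal code; the helper name `bound_g` is ours):

```r
# Sketch: truncating predicted densities/probabilities at a lower bound,
# in the spirit of the lbound argument (default 0.005), so that no single
# observation receives an extreme inverse-probability weight.
bound_g <- function(g.pred, lbound = 0.005, ubound = 1) {
  pmin(pmax(g.pred, lbound), ubound)
}

g.pred <- c(0.0001, 0.02, 0.50, 0.999)
bound_g(g.pred)      # the near-zero prediction is raised to 0.005
1 / bound_g(g.pred)  # weights are now capped at 1 / 0.005 = 200
```

A larger lbound produces more stable but potentially more biased IPTW-type estimates, which is exactly the trade-off discussed above.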
5. Discussion
The tmleCommunity package was developed to offer a flexible, easily customizable implementation of the TMLE algorithm for hierarchical data structures, along with arbitrary multivariate community-level interventions. The main class of causal parameters that is estimated is the mean counterfactual outcome under user-specified interventions; a basic analysis requires only the arguments Ynode, Anodes, WEnodes and f_gstar1. On that basis, experienced users can control the estimation procedure by providing user-supplied regression models for Q̄, g and g*, and by choosing among the methods allowed for the arguments, such as the method for dealing with hierarchical data, whether the hierarchical structure is used to estimate Q̄, a linear or logistic fluctuation for targeting, and the TMLE targeting step method. Notably, obs.wts and community.wts can be used to correct for case-control sampling (when the outcome is rare). In addition, the tmleCommunity function can either internally estimate all factors of the likelihood, or use the values of g_n and g*_n from an external estimation procedure through h.g0_GenericModel and h.gstar_GenericModel. The choices of data-adaptive machine learning techniques and other more advanced estimation methods can be specified in the tmleCom_Options function.

Planned extensions to the package include several areas. First, we plan to include TMLE estimation of causal effects of multiple time-point interventions, adjusting for time-dependent covariates in hierarchical longitudinal data. Second, since considering only complete cases is inefficient and may cause bias when missingness is informative, the package will be extended to allow missingness in the outcome vector, so that the corresponding covariate information can be utilized to reduce bias and increase efficiency. Third, as mentioned in section 2.7.2, when one individual's outcome is affected by the individual-level covariates of a subset of other individuals from the same community, the strength of the "no covariate interference" assumption should be weakened by including this knowledge of dependence; another planned addition to the package will allow estimation of the community-level TMLE under this setting.

Additionally, this package estimates variances and standard errors through estimated influence curves.
Double robustness makes these estimates asymptotically correct if both the outcome and treatment mechanisms are estimated consistently at reasonable rates, and conservative if only one of them is estimated consistently. However, variance estimation is difficult when violations or near violations of positivity occur in finite samples due to chance (Petersen et al.). Moreover, the stochastic interventions considered here are defined by modifying the observed exposure mechanism g by a known shift function. In practice, a stochastic intervention g* could also be unknown (i.e., no longer a function of g). If we consider the estimation of an optimal treatment rule, where the rule is defined to maximize the mean outcome under treatment, then without cross-validation we would use the same information from the observed data both to estimate the user-specified mechanism g* and to estimate the mean outcome under the fitted mechanism, which may result in finite sample bias. According to van der Laan and Luedtke (2015) and Luedtke and van der Laan (2016), the cross-validated TMLE (cv-TMLE) approach avoids empirical process conditions: for each sample split, it estimates an empirical mean over a validation sample, under a stochastic (or deterministic) intervention estimated based on the training sample. It may therefore reduce finite sample bias, and including cv-TMLE in the package is another direction for future work.
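For reference, the influence-curve-based variance estimate has the standard form Var̂(ψ̂) = Var̂(IC)/n, with a Wald-type confidence interval built around the point estimate. A minimal sketch with a simulated influence-curve vector (illustrative only; the helper name `ic_confint` is ours):

```r
# Sketch: influence-curve-based standard error and Wald-type 95% CI,
# psi.hat +/- qnorm(0.975) * sqrt(var(IC) / n).
ic_confint <- function(psi.hat, IC, level = 0.95) {
  n <- length(IC)
  se <- sqrt(var(IC) / n)
  z <- qnorm(1 - (1 - level) / 2)
  c(est = psi.hat, se = se,
    lower = psi.hat - z * se, upper = psi.hat + z * se)
}

set.seed(1)
IC <- rnorm(1000)  # estimated influence curve values (mean ~ 0)
ic_confint(psi.hat = 0.56, IC = IC)
```

When only one nuisance mechanism is estimated consistently, this interval is conservative in the sense described above.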
6. Answers to some frequently asked questions (FAQ)
Can I call the tmleCommunity function a second time without having to redo the estimation of the exposure mechanism?
Yes. Users can use the command result$EY_gstar1$h.g0_GenericModel to obtain an object of GenericModel R6 class containing the previously fitted models for P(A | W, E) under the observed mechanism g (assuming the result of the first call to tmleCommunity is stored in a variable named result, and only one intervention function f_gstar1 was supplied). Similarly, an object of GenericModel class containing the previously fitted models for P(A | W, E) under intervention f_gstar1 is returned as result$EY_gstar1$h.gstar_GenericModel. The two objects can be passed into the second call to tmleCommunity through the h.g0_GenericModel and h.gstar_GenericModel arguments, respectively. Assuming we are using the simulated data and the first estimation result from section 3.5, the next code chunk illustrates how this is done.
R> tmleCom_gc_default2 <- tmleCommunity(
R+   data = indSample.cA.cY, rndseed = 1, Ynode = "Y", Anodes = "A",
R+   WEnodes = c("W1", "W2", "W3", "W4"), Qform = "Y ~ W1 + W2 + A",
R+   h.g0_GenericModel = tmleCom_gc_default$EY_gstar1$h.g0_GenericModel,
R+   h.gstar_GenericModel = tmleCom_gc_default$EY_gstar1$h.gstar_GenericModel)
R> cbind(tmleCom_gc_default2$EY_gstar1$estimates,
R+       tmleCom_gc_default2$EY_gstar1$vars)
Can I define and fit the multivariate conditional density under a user-specified intervention function directly, without having to call the tmleCommunity function?
Yes, the package provides an individual function named fitGenericDensity to define and fit regression models for the conditional density P(A = a | W = w), where a is generated under a user-specified arbitrary (static, dynamic or stochastic) intervention function. Its arguments are similar to those for estimating treatment mechanisms in tmleCommunity, except that hierarchical data structures are not supported in this function. Therefore, this function is purely for estimating the multivariate conditional density.

With the same data set simulated in Section 3.5, we may be interested in the mean counterfactual outcome under a stochastic intervention g∗ where the observed A is shifted to the left by half of its mean.

R> define_f.gstar <- function(shift.rate, ...) {
R+   eval(shift.rate)
R+   f.gstar <- function(data, ...) {
R+     print(paste0("rate of shift: ", shift.rate))
R+     shifted.new.A <- data[, "A"] - mean(data[, "A"]) * shift.rate
R+     return(shifted.new.A)
R+   }
R+   return(f.gstar)
R+ }
R> f.gstar <- define_f.gstar(shift.rate = 0.5)
R> tmleCom_Options(maxNperBin = N, bin.method = "dhist", nbins = 8)
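A call to fitGenericDensity might then take a form like the following. This is only a sketch: the argument names Anodes, Wnodes, and f_gstar are assumptions about the interface, not verified here, and should be checked against the package manual before use.

R> ## Sketch only -- argument names below are assumed, not verified
R> fit.gstar <- fitGenericDensity(
R+   data = indSample.cA.cY,               # same simulated data as in Section 3.5
R+   Anodes = "A",                          # assumed name for the exposure node argument
R+   Wnodes = c("W1", "W2", "W3", "W4"),    # assumed name for the covariate argument
R+   f_gstar = f.gstar)                     # the shifted intervention defined above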
Are there any sample data sets provided in the package, so that users can try analyses on them?
Yes, the package comes with four sample datasets. comSample.wmT.bA.bY_list is an example of hierarchical data containing a community-level binary exposure with an individual-level binary outcome, and indSample.iid.cA.cY_list is an example of non-hierarchical data containing a continuous exposure with a continuous outcome. Another non-hierarchical sample dataset is indSample.iid.bA.bY.rareJ1_list, which contains a binary exposure with a rare outcome (i.e., an independent case-control scenario where J = 1). Besides, the data structure of the last dataset, indSample.iid.bA.bY.rareJ2_list, is identical to that of indSample.iid.bA.bY.rareJ1_list, except that now the ratio of the number of controls to the number of cases, J, is 2.

Can the tmleCommunity package handle panel data transformation before performing TMLE analysis?
Yes. The panelData_Trans function provides a wide variety of data transformations for panel datasets, such as the fixed effects and pooling models. It also allows users to apply the transformation only to the regressors of interest, instead of to the entire dataset. For example, before running the tmleCommunity function on the data set simulated in Section 4.1, where the working model fails, suppose we want to apply a fixed effects transformation in which an individual effect is introduced; then we can use
R> pData.FE <- panelData_Trans(
R+   data = comSample.wmF, xvar = c("E1", "E2", "W1", "W2", "W3", "A"),
R+   yvar = "Y", index = "id", effect = "individual",
R+   model = "within", transY = TRUE)
Besides, we can keep the outcome variable fixed during the panel transformation by setting transY = FALSE. Additional details can be found in the package manual at https://github.com/chizhangucb/tmleCommunity/blob/master/tmleCommunity_Package_Documentation.pdf.
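For instance, the same transformation with the outcome kept untransformed would be (the object name pData.FE.rawY is illustrative):

R> pData.FE.rawY <- panelData_Trans(
R+   data = comSample.wmF, xvar = c("E1", "E2", "W1", "W2", "W3", "A"),
R+   yvar = "Y", index = "id", effect = "individual",
R+   model = "within", transY = FALSE)   # outcome Y is left untransformed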
7. Acknowledgments
This work was supported by the National Institutes of Health Director's New Innovator Award Program DP2HD080350 (PI: Jennifer Ahern).
References
Balzer LB, Zheng W, van der Laan MJ, Petersen ML, for the SEARCH Collaboration (2017). "A New Approach to Hierarchical Data Analysis: Targeted Maximum Likelihood Estimation of Cluster-Based Effects Under Interference." ArXiv e-print arXiv:1706.02675.

Bembom O, van der Laan MJ (2007). "A Practical Illustration of the Importance of Realistic Individualized Treatment Rules in Causal Inference." Electronic Journal of Statistics, 1, 574–596.

Boyd RC, Wooden TD, Munro MA, Liu T, Have TT (2008). "The Impact of Community Violence Exposure on Anxiety in Children of Mothers with Depression." Journal of Child & Adolescent Trauma, 1(4), 287–300.

Brooks-Gunn J, Aber JL, Duncan GJ (1997). Neighborhood Poverty: Context and Consequences for Children. Volume I. Russell Sage Foundation.

Coyle JR, Hejazi NS, Malenica I, Sofrygin O (2018). sl3: Modern Pipelines for Machine Learning and Super Learning. R package version 1.1.0, URL https://doi.org/10.5281/zenodo.1342294.

Denby L, Mallows C (2009). "Variations on the Histogram." Journal of Computational and Graphical Statistics, 18(1), 21–31.

Díaz Muñoz I, van der Laan MJ (2011). "Population Intervention Causal Effects Based on Stochastic Interventions." Biometrics, 68(2), 541–549.

Enea M (2017). speedglm: Fitting Linear and Generalized Linear Models to Large Data Sets. R package version 0.3-2, URL https://CRAN.R-project.org/package=speedglm.

Gardiner JC, Luo Z, Roman LA (2009). "Fixed Effects, Random Effects and GEE: What Are the Differences?" Statistics in Medicine, 28(2), 221–239.

Gruber S, van der Laan MJ (2010). "A Targeted Maximum Likelihood Estimator of a Causal Effect on a Bounded Continuous Outcome." The International Journal of Biostatistics, 6(1).

Gruber S, van der Laan MJ (2012). "tmle: An R Package for Targeted Maximum Likelihood Estimation." Journal of Statistical Software, 51(13).

Hastie T (2017). gam: Generalized Additive Models. R package version 1.14.4, URL https://CRAN.R-project.org/package=gam.

Kling JR, Ludwig J, Katz LF (2005). "Neighborhood Effects on Crime for Female and Male Youth: Evidence From a Randomized Housing Voucher Experiment." Quarterly Journal of Economics, 120(1), 87–130.

Laird NM, Ware JH (1982). "Random-Effects Models for Longitudinal Data." Biometrics, 38(4), 963–974.

LeDell E (2016). h2oEnsemble: H2O Ensemble Learning. R package version 0.1.8, URL https://github.com/h2oai/h2o-3/tree/master/h2o-r/ensemble/h2oEnsemble-package.

Liang KY, Zeger SL (1986). "Longitudinal Data Analysis Using Generalized Linear Models." Biometrika, 73(1), 13–22.

Luedtke AR, van der Laan MJ (2016). "Super-Learning of an Optimal Dynamic Treatment Rule." The International Journal of Biostatistics, 12(1).

Oakes J (2004). "The (Mis)estimation of Neighborhood Effects: Causal Inference for a Practicable Social Epidemiology." Social Science & Medicine, 58(10), 1929–1952.

Pearl J (1995). "Causal Diagrams for Empirical Research." Biometrika, 82(4), 702–710.

Pearl J (2009). "Causal Inference in Statistics: An Overview." Statistics Surveys, 3, 96–146.

Petersen ML, Porter KE, Gruber S, Wang Y, van der Laan MJ (2010). "Diagnosing and Responding to Violations in the Positivity Assumption." Statistical Methods in Medical Research, 21(1), 31–54.

Polley EC, LeDell E, Kennedy C, van der Laan MJ (2017). SuperLearner: Super Learner Prediction. R package version 2.0-22, URL https://CRAN.R-project.org/package=SuperLearner.

Prague M, Wang R, Stephens A, Tchetgen Tchetgen E, DeGruttola V (2016). "Accounting for Interactions and Complex Inter-Subject Dependency in Estimating Treatment Effect in Cluster-Randomized Trials with Missing Outcomes." Biometrics, 72(4), 1066–1077.

R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Raudenbush SW, Willms J (1995). "The Estimation of School Effects." Journal of Educational and Behavioral Statistics, 20(4), 307–335.

Robins J (1986). "A New Approach to Causal Inference in Mortality Studies with a Sustained Exposure Period - Application to Control of the Healthy Worker Survivor Effect." Mathematical Modelling, 7(9-12), 1393–1512.

Scott DW (1992). Multivariate Density Estimation. Wiley.

Sofrygin O, van der Laan MJ (2015). tmlenet: Targeted Maximum Likelihood Estimation for Network Data. R package version 0.1.0, URL https://CRAN.R-project.org/package=tmlenet.

Steele JL (2016). "Race and General Strain Theory: Examining the Impact of Racial Discrimination and Fear on Adolescent Marijuana and Alcohol Use." Substance Use & Misuse, 51(12), 1637–1648.

University of California San Francisco (2013). "Sustainable East Africa Research in Community Health (SEARCH)." URL https://clinicaltrials.gov/ct2/show/study/NCT01864603.

van der Laan MJ (2010). "Estimation of Causal Effects of Community Based Interventions." U.C. Berkeley Division of Biostatistics Working Paper Series, Working Paper 268.

van der Laan MJ (2014). "Causal Inference for a Population of Causally Connected Units." Journal of Causal Inference, 2(1).

van der Laan MJ, Luedtke AR (2015). "Targeted Learning of the Mean Outcome under an Optimal Dynamic Treatment Rule." Journal of Causal Inference, 3(1).

van der Laan MJ, Polley EC, Hubbard AE (2007). "Super Learner." Statistical Applications in Genetics and Molecular Biology, 6(1). doi:10.2202/1544-6115.1309.

van der Laan MJ, Rubin D (2006). "Targeted Maximum Likelihood Learning." The International Journal of Biostatistics, 2(1).

van der Laan MJ, Rose S (2011). Targeted Learning: Causal Inference for Observational and Experimental Data. Springer, New York.
Affiliation:
Chi Zhang
Division of Biostatistics
University of California, Berkeley
Berkeley, CA 94720
E-mail: [email protected]
Journal of Statistical Software published by the American Statistical Association