Estimation of Effects of Sequential Treatments by Reparameterizing Directed Acyclic Graphs

Abstract

The standard way to parameterize the distributions represented by a directed acyclic graph is to insert a parametric family for the conditional distribution of each random variable given its parents. We show that when one's goal is to test for or estimate an effect of a sequentially applied treatment, this natural parameterization has serious deficiencies. By reparameterizing the graph using structural nested models, these deficiencies can be avoided.


James M. Robins

Departments of Epidemiology and Biostatistics, Harvard School of Public Health

Huntington Avenue, Boston, MA 02115

Larry Wasserman

Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213

1 Introduction

Consider a set of random variables $V = (X_1, \ldots, X_M)$ whose joint density $f(v)$ is represented by a Directed Acyclic Graph (DAG) $G$. If $Pa_m$ represents the parents of $X_m$, then the density factorizes as

$$f(v) = \prod_{m=1}^{M} f(x_m \mid pa_m). \tag{1}$$

In practice, in order to estimate $f(v)$ from independent realizations $V_i$, $i = 1, \ldots, n$, obtained on $n$ study subjects, one often needs to assume some particular parametric form for each $f(x_m \mid pa_m)$. Thus one writes $f(v) = \prod_{m=1}^{M} f(x_m \mid pa_m; \theta_m)$. For example, suppose the parent of $X_2$ is $X_1$. Then $f(x_2 \mid pa_2; \theta_2)$ might be $N(\beta_0 + \beta_1 x_1, \sigma^2)$ so that $\theta_2 = (\beta_0, \beta_1, \sigma)$. In general, if one inserts a parametric family into the right hand side of each term of (1) and the $\theta_m$ are variation-independent, we call this a standard parameterization of the DAG. This seems to be the usual way of using DAGs in practice. The parameters $\theta_m$ are variation-independent if the parameter space for $\theta = (\theta_1', \ldots, \theta_M')'$ is the product space $\Theta_1 \times \Theta_2 \times \cdots \times \Theta_M$, with $\Theta_j$ the parameter space for $\theta_j$.

As natural as it seems to parameterize a DAG in this way, there are problems with the standard parameterization when one's goal is to test for or estimate an effect of a treatment or control variable administered sequentially over time. This has been noted by Robins (1989, 1997a), who proposed "structural nested models" (SNM) to avoid these problems. The next section gives a simple example which illustrates the problem.

Briefly put, the problem is this: Suppose the DAG $G$ represents treatments and covariates in a longitudinal study. Further suppose that the partial ordering of the variables in $V$ entailed by the DAG $G$ is consistent with the temporal ordering of the variables. Under certain conditions, the null hypothesis of "no treatment effect," although identifiable based on the observed data, cannot be tested simply by testing for the presence or absence of arrows in the DAG $G$ as one might expect. These conditions, far from being pathological, are indeed likely to hold in most real examples. Fortunately, the null hypothesis can be tested by examining a particular integral called the "G-functional". The null is true if this integral satisfies a certain complex condition. However, we prove in Theorem 2 that there is an additional complication. Specifically, common choices for the parametric families in a standard parameterization often lead to joint densities such that the integral can never satisfy the required condition; as a consequence, in large samples, the null hypothesis of no treatment effect, even when true, will be falsely rejected regardless of the data. These problems are exacerbated in high dimensional problems where SNMs appear to be the only practical approach. This paper focuses on frequentist methods but the same issues arise if Bayesian methods are used.

To illustrate the problem we are concerned with, consider the following generic example of a sequential randomized clinical trial depicted by DAG 1a in which data have been collected on variables $(A_0, A_1, L, Y)$ on each of $n$ study subjects.
The continuous variables $A_0$ and $A_1$ represent the dose in milligrams of AZT treatment received by AIDS patients at two different times, $t_0$ and $t_1$; the dichotomous variable $L$ records whether a patient was anemic just prior to $t_1$; the continuous variable $Y$ represents a subject's HIV viral load measured at end-of-follow-up; and the hidden (unmeasured) variable $U$ denotes a patient's underlying immune function at the beginning of the study. $U$ is therefore a measure of a patient's underlying health status.

[Figure: DAG 1a and DAG 1b, on the variables $U$, $A_0$, $L$, $A_1$, $Y$.]

The dose $A_0$ was assigned at random to subjects at time $t_0$ so, by design, $A_0 \perp\!\!\!\perp U$. Treatment $A_1$ was randomly assigned at time $t_1$ with randomization probabilities that depend on the observed past $(A_0, L)$, so, by design, $U \perp\!\!\!\perp A_1 \mid A_0, L$.

For simplicity, we shall assume that no other unmeasured common causes (confounders) exist. That is, each arrow in DAG 1a represents the direct causal effect of a parent on its child, as in Pearl and Verma (1991) or SGS (1993). Note DAG 1a is not complete because of three missing arrows: the arrows from $U$ to $A_0$ and $A_1$ and the arrow from $L$ to $Y$. The arrows from $U$ are missing by design. The missing arrow from $L$ to $Y$ represents a priori biological knowledge that $L$ has no effect on HIV viral load $Y$. (The missing arrow from $L$ to $Y$ is not essential to what follows and is assumed to simplify exposition.) Hence, by the Markov properties of a DAG, we know that $L \perp\!\!\!\perp Y \mid A_0, A_1, U$.

It is also known that AZT $A_0$ causes anemia, so $A_0 \not\perp\!\!\!\perp L$. Also it is known that the unmeasured variable $U$ has a direct effect on $L$ and $Y$. For example, $U$ causes both anemia and an elevated HIV RNA.

Representing the Null Hypothesis

Suppose the trial data have been collected in order to test the null hypothesis that AZT treatment $(A_0, A_1)$ has no effect on viral load $Y$. The "no AZT effect null" hypothesis is the hypothesis that both the arrow from $A_0$ to $Y$ and the arrow from $A_1$ to $Y$ in DAG 1a are missing, which would imply that the true causal graph generating the data was DAG 1b. The alternative to this null hypothesis is that the true causal graph generating the data is one of the three DAGs 1a, 1c, or 1d.

[Figure: DAG 1c and DAG 1d.]

As in SGS (1993), we assume the joint distribution of $W = (U, A_0, A_1, L, Y)$ is faithful to the true graph. That is, if $B$, $C$, and $D$ are distinct (possibly empty) subsets of the variables in $W$, then $B$ is independent of $C$ given $D$ if and only if $B$ is d-separated from $C$ given $D$ on the true causal graph generating the data. It follows that the "no AZT effect null" hypothesis of DAG 1b is true if and only if $(A_0, A_1) \perp\!\!\!\perp Y \mid L, U$.

Indeed, since we have assumed no arrows from $L$ to $Y$, $(A_0, A_1) \perp\!\!\!\perp Y \mid L, U$ is equivalent to the hypothesis $(A_0, A_1) \perp\!\!\!\perp Y \mid U$. The question is: can we still characterize the null hypothesis even if $U$ is not observed? The answer is yes, according to the following Theorem proved in Sec. 2 below. It also follows from earlier results of Robins (1986) and Pearl and Robins (1995).

Theorem 1. Suppose the distribution of $W$ is faithful to DAGs 1a, 1b, 1c, or 1d. Then the null hypothesis $(A_0, A_1) \perp\!\!\!\perp Y \mid L, U$ holds if and only if $Y \perp\!\!\!\perp A_1 \mid A_0, L$, i.e.,

$$f(y \mid \ell, a_0, a_1) = f(y \mid \ell, a_0) \tag{2}$$

and

$$\sum_{\ell=0}^{1} f(y \mid a_0, \ell)\, f(\ell \mid a_0) \ \text{does not depend on } a_0. \tag{3}$$

Thus, even though $U$ is unobserved, we can still tell if the null holds by checking (2) and (3), which only involve the observables. Even without imposing faithfulness, $(A_0, A_1) \perp\!\!\!\perp Y \mid L, U$ implies (2)-(3), although the converse is no longer true.

Consider now the marginal distribution of the observed data $V = (A_0, A_1, L, Y)$. By the d-separation criterion applied to DAG 1a and 1c, we see that if either of DAGs 1a or 1c generated the data, then the joint distribution of $V$ is represented by the complete DAG 2a without missing arrows. If, on the other hand, either DAG 1b or 1d generated the data, then Eq. (2) is true, and the joint distribution of $V$ is represented by the DAG 2b with no arrow from $A_1$ to $Y$.

[Figure: DAG 2a and DAG 2b.]

The additional restriction (3) that distinguishes the no AZT effect null hypothesis of graph 1b from DAG 1d is not representable by removing further arrows from DAG 2a. This is an important observation because the most common way of testing whether $(A_0, A_1)$ affects $Y$ is to test for the absence of arrows from $A_0$ to $Y$ and from $A_1$ to $Y$, i.e., to test $(A_0, A_1) \perp\!\!\!\perp Y \mid L$; we call this the "naive test." This test is incorrect. Specifically, if the no AZT effect hypothesis of DAG 1b is correct and the distribution of $W$ is faithful, then $(A_0, A_1) \perp\!\!\!\perp Y \mid L$ will be false, and the naive test will falsely reject the no AZT effect null with probability converging to one in large samples. Thus, testing the null hypothesis of no AZT treatment effect cannot be accomplished by testing for the presence or absence of arrows on DAG 2a. This is because the arrows on the marginal DAG 2a do not have a causal interpretation (even though the arrows on the underlying causal DAG do have a causal interpretation). One solution to this problem is to test (2) and (3) directly. With standard parameterizations, this approach will also fail, as the next section shows.

The Problem With Standard Parameterizations

We saw that to test the null hypothesis, it does not suffice to test whether arrows in DAG 2a from $A_0$ to $Y$ and $A_1$ to $Y$ are broken. Rather, we need to test the conditions (2) and (3). We now show that this test will falsely reject if one uses a standard parameterization. To test the joint null hypothesis (2) and (3), the standard approach is to first specify parametric models for the conditional distribution of each child given its parents in the complete DAG 2a representing the observed data. Hence let $\{f(y \mid a_0, a_1, \ell; \theta);\ \theta \in \Theta \subset R^q\}$ and $\{f(\ell \mid a_0; \gamma);\ \gamma \in \Gamma \subset R^p\}$ denote parametric models for the unknown densities $f(y \mid a_0, a_1, \ell)$ and $f(\ell \mid a_0)$.

Of course, we cannot guarantee these models are correctly specified. We say the model $f(y \mid a_0, a_1, \ell; \theta)$ is correctly specified if there exists $\theta_0 \in \Theta$ such that $f(y \mid a_0, a_1, \ell; \theta_0)$ is equal to the true (but unknown) density $f(y \mid a_0, a_1, \ell)$ generating the data.

Results in this Section require the concept of linear faithfulness. We say that the distribution of $W$ is linearly faithful to the true causal graph generating the data if, for any disjoint (possibly empty) subsets $B$, $C$, and $D$ of the variables in $W$, $B$ is d-separated from $C$ given $D$ on the graph if and only if the partial correlation matrix $r_{BC \cdot D}$ between $B$ and $C$ given $D$ is the zero matrix. If $W$ is jointly normal, linear faithfulness and faithfulness are equivalent. For $W$ non-normal, neither implies the other. However, the argument that the distribution of $V$ should be linearly faithful to the generating causal DAG is essentially identical to the argument that the distribution should be faithful to the causal DAG given by SGS (1993) and Pearl and Verma (1991).

To see why standard parameterizations may not work, consider a specific example. Recall that $Y$ is continuous and that $L$ is binary. Commonly used models in these cases are normal linear regression models and logistic regression models. Thus suppose that we adopt the following models:

$$Y \mid a_0, a_1, \ell; \theta, \sigma^2 \sim N(\theta_0 + \theta_1 a_0 + \theta_2 \ell + \theta_3 a_1,\ \sigma^2) \tag{4}$$

and

$$f(\ell = 1 \mid a_0; \gamma) = \operatorname{expit}(\gamma_0 + \gamma_1 a_0) \tag{5}$$

where $\operatorname{expit}(b) = e^b / (1 + e^b)$ and $N(\mu, \sigma^2)$ denotes a Normal distribution with mean $\mu$ and variance $\sigma^2$.

We will now prove the following startling result.

Lemma 1. If the no AZT effect null hypothesis represented by DAG 1b is true and the distribution of $W$ is either faithful or linearly faithful to DAG 1b, then model (4) and/or model (5) is guaranteed to be misspecified; that is, the set of distributions $F_{par}$ for $V$ satisfying (4)-(5) is disjoint from the set $F_{mar}$ of distributions for $V$ that are marginals of distributions for $W$ that are either faithful or linearly faithful to DAG 1b.

Since model (4) and/or (5) are guaranteed to be misspecified under the no AZT effect null hypothesis, one might expect that tests of the null assuming (4)-(5) will perform poorly. This expectation is borne out by the following theorem.

Theorem 2. Suppose (i) the data analyst tests the no AZT effect null hypothesis (2)-(3) using the parametric models (4)-(5) fit by the method of maximum likelihood, (ii) the no AZT effect null hypothesis represented by DAG 1b is true, (iii) the distribution of $W$ is linearly faithful to DAG 1b. Then, with probability converging to 1, the no AZT effect null hypothesis (2)-(3) will be falsely rejected.

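Theorem 2 implies that if we use models (4)-(5), then in large samples, we will reject the no AZT effect null hypothesis, even when true, for nearly all data sets (i.e., with probability approaching 1). That is, by specifying models (4)-(5), we have in effect rejected the null hypothesis, when true, even before seeing the data!

Proof of Theorem 2 and Lemma 1

The following proof of Theorem 2 also proves Lemma 1. Note Eqs. (2)-(3) together imply that

$$I(a_0, a_1) = \sum_{\ell=0}^{1} E[Y \mid \ell, a_0, a_1]\, f(\ell \mid a_0)$$

does not depend on $(a_0, a_1)$. Now, under model (4)-(5), the maximum likelihood estimator of $I(a_0, a_1)$ is

$$I(a_0, a_1; \hat\theta, \hat\gamma) = \hat\theta_0 + \hat\theta_1 a_0 + \hat\theta_3 a_1 + \hat\theta_2\, e^{\hat\gamma_0 + \hat\gamma_1 a_0} / \{1 + e^{\hat\gamma_0 + \hat\gamma_1 a_0}\}$$

where the maximum likelihood estimators $\hat\theta$, $\hat\gamma$ satisfy the normal and logistic score equations

$$0 = \sum_{i=1}^{n} (Y_i - \hat\theta' Z_i)\, Z_i \quad \text{and} \quad 0 = \sum_{i=1}^{n} \{L_i - \operatorname{expit}(\hat\gamma_0 + \hat\gamma_1 A_{0i})\}\,(1, A_{0i})',$$

where $Z_i = (1, A_{0i}, L_i, A_{1i})'$, $\theta' = (\theta_0, \theta_1, \theta_2, \theta_3)$, and $\gamma' = (\gamma_0, \gamma_1)$. Further, the probability limits $\theta^*$ and $\gamma^*$ of $\hat\theta$ and $\hat\gamma$ satisfy

$$E[\{Y_i - \theta^{*\prime} Z_i\}\, Z_i] = 0 \quad \text{and} \quad E[\{L_i - \operatorname{expit}(\gamma_0^* + \gamma_1^* A_{0i})\}\,(1, A_{0i})'] = 0,$$

where the expectations are with respect to the true distribution generating the data regardless of whether models (4)-(5) are correctly specified. The MLE $I(a_0, a_1; \hat\theta, \hat\gamma)$ converges in probability to $I(a_0, a_1; \theta^*, \gamma^*)$. It follows that an analyst using models (4)-(5) will reject the null hypothesis (2)-(3) with probability approaching 1 as $n \to \infty$ if $I(a_0, a_1; \theta^*, \gamma^*)$ depends on $(a_0, a_1)$.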
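As an informal illustration of the plug-in functional $I(a_0, a_1; \theta, \gamma)$, the following Python sketch evaluates it by summing $E[Y \mid \ell, a_0, a_1]\, f(\ell \mid a_0)$ over $\ell \in \{0, 1\}$ under models (4)-(5), and checks numerically on a small grid whether it varies with $(a_0, a_1)$. The function names, the grid, and the parameter values in the usage note are our own illustrative assumptions, not part of the original development.

```python
import math


def expit(b):
    """Logistic function e^b / (1 + e^b), as defined after model (5)."""
    return 1.0 / (1.0 + math.exp(-b))


def plug_in_functional(a0, a1, theta, gamma):
    """I(a0, a1; theta, gamma) implied by models (4)-(5).

    theta = (theta0, theta1, theta2, theta3) multiplies (1, a0, l, a1);
    gamma = (gamma0, gamma1) are the logistic coefficients of (1, a0).
    """
    t0, t1, t2, t3 = theta
    g0, g1 = gamma
    p1 = expit(g0 + g1 * a0)            # f(l = 1 | a0) under model (5)
    mean_l0 = t0 + t1 * a0 + t3 * a1    # E[Y | l = 0, a0, a1] under model (4)
    mean_l1 = mean_l0 + t2              # E[Y | l = 1, a0, a1] under model (4)
    return (1.0 - p1) * mean_l0 + p1 * mean_l1


def depends_on_treatment(theta, gamma, grid=(-1.0, 0.0, 0.5, 2.0), tol=1e-12):
    """True if the functional varies over a (hypothetical) grid of doses."""
    values = [plug_in_functional(a0, a1, theta, gamma)
              for a0 in grid for a1 in grid]
    return max(values) - min(values) > tol
```

For instance, `depends_on_treatment((0.3, 0.0, 1.5, 0.0), (0.2, 0.7))` is true: with $\theta_2 \neq 0$ and $\gamma_1 \neq 0$ the functional varies with $a_0$ even though $a_0$ and $a_1$ have zero coefficients in (4). The functional is constant only when the $L$ coefficient $\theta_2$ or the slope $\gamma_1$ also vanishes, along with $\theta_1$ and $\theta_3$.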

We now prove such a dependence by contradiction. It is clear that $I(a_0, a_1; \theta^*, \gamma^*)$ does not depend on $(a_0, a_1)$ if and only if either (i) $\theta_1^* = \theta_2^* = \theta_3^* = 0$, or (ii) $\theta_1^* = \theta_3^* = \gamma_1^* = 0$. However, it follows from standard least squares theory that (i) is true if and only if $\operatorname{cov}[Y, A_0] = \operatorname{cov}[Y, A_1] = \operatorname{cov}[Y, L] = 0$. But, for example, $\operatorname{cov}[Y, L] \neq 0$ if $W$ is linearly faithful to DAG 1b, since $Y$ and $L$ are not d-separated. Similarly, if (ii) is true, then $\gamma_1^* = 0$. But an easy calculation shows that $\gamma_1^* = 0$ if and only if $\operatorname{cov}(L, A_0) = 0$. However, $\operatorname{cov}(L, A_0) \neq 0$ since $L$ and $A_0$ are not d-separated on DAG 1b. The argument in this last paragraph also proves Lemma 1.

Remark: One might conjecture that the problem could be solved by adding a small number of interaction terms to the model. However, using reasoning like that above, one can show that this is not the case.

THE G-NULL TEST

A better approach to testing the null is based on the following theorem due to Robins (1986).

[Figure: DAG 3a and DAG 3b, with the variables ordered $Y$, $A_0$, $L$, $A_1$.]

G-Null Theorem: Equations (2) and (3) are both true if and only if both (2) and

$$A_0 \perp\!\!\!\perp Y \tag{6}$$

are true.

Remark: Theorem 1 follows as a corollary since (2) and (6) are all the conditional independences for $V$ entailed by the d-separation rules applied to graph 1b.

From this theorem we see that, under the null (2)-(3), $A_0$ and $Y$ are independent even though there is an arrow from $A_0$ to $Y$ in the marginal DAG 2b for the observed data $V$. In the language of SGS (1993), the distribution is unfaithful to DAG 2b. However, the underlying distribution is not unfaithful to the causally sufficient graph DAG 1b. This is merely a manifestation of the fact that faithfulness need not be preserved under marginalization. SGS's (1993) philosophical argument for faithfulness applies only to the underlying causally sufficient graph in which each arrow has a causal interpretation. It does not apply to marginal subgraphs.

The G-Null Theorem immediately suggests, to those familiar with graphical models, representing the joint distribution of the observed data by the complete DAG 3a in which the outcome $Y$ comes first, followed by $A_0$, then $L$, and finally $A_1$. Then the joint null hypothesis (2) and (3) is represented by DAG 3b in which the arrows from $Y$ to $A_0$ and $Y$ to $A_1$ are removed from the complete DAG 3a. The arrows in DAGs 3a and 3b do not have direct causal interpretations since, for example, $Y$ is a parent of $L$ even though $L$ is temporally prior to $Y$. Nonetheless, distributions for $V$ satisfying the no AZT effect null hypothesis (2)-(3) are now faithful to the reordered graph 3b.

The "reordering" of DAG 3a is particularly useful in the context of true sequential randomized experiments since then $f(a_0)$ and $f(a_1 \mid a_0, \ell)$ are under the control of the investigator, and thus are known. For example, suppose, by design, $A_0 \sim N(\pi_1, 1)$ and $A_1 \mid A_0, L \sim N(\pi_2(A_0, L), 1)$. Then the models $A_0 \mid Y \sim N(\pi_1 + \rho Y, 1)$ and $A_1 \mid A_0, L, Y \sim N(\pi_2(A_0, L) + \rho Y, 1)$ are known to be correctly specified with the true value of $\rho$ equal to zero under the joint null (2) and (6). Robins (1986, 1992), generalizing Rosenbaum (1984), proposed a g-null test based on the score statistic

$$\frac{\partial}{\partial \rho} \left[ \sum_{i=1}^{n} \log \{ f(A_{1i} \mid A_{0i}, L_i, Y_i; \rho)\, f(A_{0i} \mid Y_i; \rho) \} \right] \Bigg|_{\rho = 0} \tag{7}$$

which is a sum of bounded independent and identically distributed random variables $U_i = Y_i \{A_{1i} - \pi_2(A_{0i}, L_i)\} + Y_i \{A_{0i} - \pi_1\}$ that have mean zero under the joint null (2) and (6). Therefore, $\chi = \sum_i U_i / \{\sum_i U_i^2\}^{1/2}$ is asymptotically distributed $N(0, 1)$ under the joint null, i.e., under the hypothesis that the distribution of $V$ is represented by DAG 3b. Thus the test that rejects when $|\chi| > 1.96$ is an asymptotically .05-level test of the joint null hypothesis (2) and (6) whatever be the unrestricted, unknown components $f(\ell \mid a_0, y)$ and $f(y)$ of the joint distribution of observables $(A_0, A_1, L, Y)$.

We now have a valid test for the no AZT effect null, but ultimately we want more. In particular, we would like to estimate the size of the treatment effect. To discuss this, we first need to generalize the simple example and then precisely define the treatment effect. We do this in Section 3 and the sections that follow. Then we introduce structural nested models, which provide a unified approach to estimation of and testing for an AZT treatment effect while avoiding the problems of standardly parameterized DAGs.
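The contrast between the naive test and the g-null test can be seen in a small simulation. The sketch below is only an illustration under assumptions of our own: the data-generating coefficients, the randomization means `PI1` and `pi2`, and all function names are hypothetical choices consistent with DAG 1b (no arrows from $A_0$ or $A_1$ to $Y$ when `effect_a1 = 0`), not quantities taken from the trial described above. It computes the statistic $\chi$ built from the $U_i$ defined after (7), together with the within-stratum correlation of $A_0$ and $Y$ given $L$ that a naive analysis would examine.

```python
import math
import random

PI1 = 0.0  # assumed known randomization mean of A0 (hypothetical value)


def pi2(a0, l):
    """Assumed known randomization mean of A1 given (A0, L) (hypothetical)."""
    return 0.5 * a0 + l


def expit(b):
    return 1.0 / (1.0 + math.exp(-b))


def simulate(n, effect_a1=0.0, seed=0):
    """Draw (A0, A1, L, Y); effect_a1 = 0 gives the no-effect null of DAG 1b."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)              # hidden immune function U
        a0 = rng.gauss(PI1, 1.0)             # randomized: independent of U
        l = 1 if rng.random() < expit(2.0 * a0 - 2.0 * u) else 0
        a1 = rng.gauss(pi2(a0, l), 1.0)      # randomized given (A0, L)
        y = -u + effect_a1 * a1 + rng.gauss(0.0, 1.0)
        rows.append((a0, a1, l, y))
    return rows


def g_null_statistic(rows):
    """chi = sum(U_i) / sqrt(sum(U_i^2)) with U_i as defined after (7)."""
    scores = [y * (a1 - pi2(a0, l)) + y * (a0 - PI1) for a0, a1, l, y in rows]
    return sum(scores) / math.sqrt(sum(s * s for s in scores))


def stratum_correlation(rows, stratum):
    """Sample correlation of A0 and Y among subjects with L == stratum."""
    xs = [a0 for a0, a1, l, y in rows if l == stratum]
    ys = [y for a0, a1, l, y in rows if l == stratum]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((v - my) ** 2 for v in ys)
    return sxy / math.sqrt(sxx * syy)


null_rows = simulate(20000, effect_a1=0.0, seed=1)
alt_rows = simulate(20000, effect_a1=0.5, seed=1)
chi_null = g_null_statistic(null_rows)               # approximately N(0, 1)
chi_alt = g_null_statistic(alt_rows)                 # far from zero
naive_corr = stratum_correlation(null_rows, 1)       # nonzero despite the null
```

Under this assumed design, conditioning on $L$ induces an association between $A_0$ and $Y$ even though treatment has no effect, which is what leads the naive test astray, while `chi_null` behaves like a standard normal draw and `chi_alt` is large.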

3 The G-computation Algorithm Formula

Let $G$ be a directed acyclic graph with a vertex set of random variables $V = (V_1, \ldots, V_M)$ with associated distribution function $F(v)$ and density function $f(v)$ with respect to the dominating measure $\mu$. Here $\mu$ is the product measure of Lebesgue and counting measure corresponding to the continuous and discrete components of $V$. By the defining Markov property of DAGs, the density of $V$ can be factored as $\prod_{j=1}^{M} f(v_j \mid pa_j)$ where $pa_j$ are realizations of the parents $Pa_j$ of $V_j$ on $G$. We assume $V$ is partitioned into disjoint sets $A$ and $L$, where $A = \{A_0, \ldots, A_K\}$ are temporally ordered treatments or control variables given at times $t_0, \ldots, t_K$, and $L = \{L_0, L_1, \ldots, L_{K+1}\}$ are response variables. The response variables $L_m$ are temporally subsequent to $A_{m-1}$ and prior to $A_m$. Now for any variable $Z$, let $\mathcal{Z}$ be the support (i.e., the possible realizations) of $Z$. For any $z_0, \ldots, z_m$, define $\bar z_m = (z_0, \ldots, z_m)$. By convention $\bar z_{-1} = z_{-1} = 0$.

Now define a treatment regime or plan $g$ to be a collection of $K + 1$ functions $g = \{g_0, \ldots, g_K\}$ where $g_m : \bar{\mathcal{L}}_m \to \mathcal{A}_m$ maps outcome histories $\bar\ell_m \in \bar{\mathcal{L}}_m$ into a treatment $g_m(\bar\ell_m) \in \mathcal{A}_m$. If $g_m(\bar\ell_m)$ is a constant, say $a_m^*$, not depending on $\bar\ell_m$ for each $m$, we say regime $g$ is non-dynamic and write $g = a^*$, $a^* = (a_0^*, \ldots, a_K^*)$. Otherwise, $g$ is dynamic. We let $\mathcal{G}$ be the set of all regimes $g$.

Associated with each regime $g$ is the "manipulated" graph $G_g$ and distribution $F_g(v)$ with density $f_g(v)$ (SGS, 1993). Given the regime $g = (g_0, g_1, \ldots, g_K)$ and the joint density

$$f(v) = f(\ell_0)\, f(a_0 \mid \ell_0) \cdots f(\ell_{K+1} \mid \bar\ell_K, \bar a_K), \tag{8}$$

$f_g(v)$ is the density $f(v)$ except that in the factorization (8), $f(a_0 \mid \ell_0)$ is replaced by a degenerate distribution at $a_0 = g_0(\ell_0)$, $f(a_1 \mid \ell_1, a_0, \ell_0)$ is replaced by a degenerate distribution at $a_1 = g_1(\ell_0, \ell_1)$, and, in general, $f(a_k \mid \bar\ell_k, \bar a_{k-1})$ is replaced by a degenerate distribution at $a_k = g_k(\bar\ell_k)$. Henceforth we shall assume the outcome of interest is $L_{K+1}$, which is assumed to be univariate and shall be denoted by $Y$. In the following, let $g(\bar\ell_k) = (g_0(\ell_0), \ldots, g_k(\bar\ell_k))$ and $g_k(\bar\ell_k)$ denote realizations of $\bar A_k$ and $A_k$ respectively. Then the marginal density $f_g(y)$ of $Y$ under the distribution $F_g(\cdot)$ is

$$f_g(y) = \int f_g(y, \bar\ell_K)\, d\mu(\bar\ell_K) = \int \Big\{ f(y \mid \bar\ell_K, g(\bar\ell_K)) \prod_{j=0}^{K} f(\ell_j \mid \bar\ell_{j-1}, g(\bar\ell_{j-1})) \Big\}\, d\mu(\ell_j). \tag{9}$$

Similarly, the marginal distribution function of $Y$ under $F_g(\cdot)$ is

$$F_g(y) = \int \cdots \int pr[Y < y \mid \bar\ell_K, g(\bar\ell_K)] \prod_{j=0}^{K} f(\ell_j \mid \bar\ell_{j-1}, g(\bar\ell_{j-1}))\, d\mu(\ell_j). \tag{10}$$

Robins (1986) referred to (10) as the G-computation algorithm formula or functional for the effect of regime $g$ on outcome $Y$. Robins (1986) and Pearl and Robins (1995) give sufficient conditions under which (10) is the distribution of $Y$ that would be observed if all subjects were treated with (i.e., forced to follow) plan $g$. A sufficient condition is that, as in DAG 1a, any hidden variable $U$ that is an ancestor of $A_k$ on the causally sufficient graph generating the data is, for each $k$, d-separated from $A_k$ conditional on the past $(\bar L_k, \bar A_{k-1})$. This d-separation criterion will be met in any sequential randomized trial and is assumed to hold throughout the remainder of the paper.

The "g"-null hypothesis

In many settings, the treatments $\bar A_K = (A_0, \ldots, A_K)$ represent a single type of treatment given at different times. In that case, with $Y$ the outcome of interest, an important first question is whether the "g"-null hypothesis of no effect of treatment on $Y$ is true, i.e., whether

$$F_{g_1}(y) = F_{g_2}(y) \ \text{for all } y \text{ and all } g_1, g_2 \in \mathcal{G}. \tag{11}$$

If (11) is true, then the distribution of $Y$ will be the same under any choice of regime $g$, and thus it does not matter whether the treatments $A_k$ are given or withheld at each occasion $k$. One might be concerned that even if (11) is true, the apparently stronger hypothesis that

$$F_{g_1}(y \mid \bar\ell_k) = F_{g_2}(y \mid \bar\ell_k) \ \text{for all } \bar\ell_k, y, \text{ and } g_1, g_2 \in \mathcal{G} \tag{12}$$

might be false, and so, conditional on $\bar\ell_k$, it might matter which regime is to be followed subsequently. However, it is easy to show that the "g"-null hypothesis (11) is equivalent to (12). Here

$$F_g(y \mid \bar\ell_m) = \int pr[Y < y \mid \bar\ell_K, g(\bar\ell_K)] \prod_{j=m+1}^{K} f(\ell_j \mid \bar\ell_{j-1}, g(\bar\ell_{j-1}))\, d\mu(\ell_j). \tag{13}$$

Nevertheless, the "g"-null hypothesis is not implied by the weaker condition that $F_{g=(a_1)}(y) = F_{g=(a_2)}(y)$ for all non-dynamic regimes $a_1 = \bar a_{1K}$ and $a_2$. However, the following lemma is true. The Lemma restates the "g"-null hypothesis in terms of restrictions on the conditional distributions $F_g(y \mid \bar\ell_k)$ for non-dynamic regimes $g$.

Lemma: The "g"-null hypothesis is true if and only if $F_{g=(a_1)}(y \mid \bar\ell_k) = F_{g=(a_2)}(y \mid \bar\ell_k)$ for all $y$, $\bar\ell_k$, $a_1$ and $a_2$, with $a_1$ and $a_2$ agreeing through occasion $t_{k-1}$, i.e., $\bar a_{1(k-1)} = \bar a_{2(k-1)}$.

If we apply this Lemma to the simple example of Sec. 1, we recover (2) and (3). That is, the "g"-null hypothesis for the observed data $V = (A_0, A_1, L, Y)$ of Sec. 1 is precisely (2)-(3).

Failure of the usual parameterization for testing the "g"-null hypothesis

In Sec. 1 we saw that it was difficult to test the "g"-null hypothesis using the usual parameterization of a DAG. These problems are exacerbated in the general case. Indeed, there are several difficulties.

First, even if the densities appearing in the G-computation formula (10) were known for each $g \in \mathcal{G}$, since $F_g(y)$ is a high-dimensional integral, in general it cannot be analytically evaluated for any $g$ and thus must be evaluated by a Monte Carlo integral approximation, the Monte Carlo G-computation algorithm (Robins, 1987, 1989).

Second, even if $F_g(y)$ could be well approximated for each regime $g$, the cardinality of the set $\mathcal{G}$ is enormous [growing at faster than an exponential rate in $K$ (Robins)]. Thus it would be computationally infeasible to evaluate $F_g(y)$ for all $g \in \mathcal{G}$ to determine whether the "g"-null hypothesis held.

However, as we saw in Sec. 1, the most fundamental difficulty with the usual parameterization of a DAG in terms of the densities $f(v_j \mid pa_j)$ is that it is only sufficient but not necessary for the "g"-null hypothesis to hold that $f(\ell_j \mid \bar\ell_{j-1}, \bar a_{j-1})$ and $f(y \mid \bar\ell_K, \bar a_K)$ do not depend on $\bar a_{j-1}$ and $\bar a_K$ respectively. As a consequence, if we use standard parametric models for $f(v_j \mid pa_j)$, (i) there is no parameter that takes the value zero if and only if the "g"-null hypothesis is true, and (ii) the "g"-null hypothesis, even when true, may, with probability approaching 1, be rejected in large samples.

G-null Tests

As in the special case discussed in Sec. 1, a better approach to testing the "g"-null hypothesis is based on the following theorem of Robins (1986).

G-null theorem: The "g"-null hypothesis (11) is true if and only if

$$Y \perp\!\!\!\perp A_k \mid \bar L_k, \bar A_{k-1}, \quad k = 0, \ldots, K. \tag{14}$$

We now use (14) to construct g-null tests. For variety, in this section we shall suppose $A_k$ is dichotomous. Suppose we can correctly specify a logistic model

$$f(A_m = 1 \mid \bar L_m, \bar A_{m-1}) = \{1 + \exp(-\alpha_0' W_m)\}^{-1}, \tag{15}$$

$m = 0, \ldots, K$, where $W_m$ is a known $p$-dimensional function of $\bar L_m$, $\bar A_{m-1}$. This will always be possible in a true sequential randomized trial since $f(A_m = 1 \mid \bar L_m, \bar A_{m-1})$ is known by design. Let $Q_m = q(Y, \bar L_m, \bar A_{m-1})$ where $q(\cdot, \cdot, \cdot)$ is any known real-valued function chosen by the data analyst. Let $\theta$ be the coefficient of $Q_m$ when $\theta Q_m$ is added to the regressors $\alpha' W_m$ in (15). If, for each $m$, (15) is true for some $\alpha_0$, then hypothesis (14) is equivalent to the hypothesis that the true value $\theta_0$ of $\theta$ is zero. A score, Wald, or likelihood ratio test of the hypothesis $\theta_0 = 0$ can then be computed using logistic regression software where, when fitting the logistic regression model, each subject is regarded as contributing $K + 1$ independent Bernoulli observations $A_{mi}$, one at each treatment time $t_0, t_1, \ldots, t_K$. Robins (1992) refers to any such test as a g-test and provides mathematical justification. A g-test is a semiparametric test since it only requires specification of the model (15) for the treatment probabilities.

In a true sequential randomized trial $\alpha_0$ will be known and need not be estimated.

In an observational study, $\alpha_0$ will need to be estimated, and the g-test is only guaranteed to reject at its nominal level if the model (15) is correct. However, in contrast to the disturbing results summarized in Lemma 1, any parametric model $f(a_m \mid \bar\ell_m, \bar a_{m-1}; \alpha)$ is compatible with the "g"-null hypothesis (11). That is, there exist joint distributions for $V$ under which the parametric model $f(a_m \mid \bar\ell_m, \bar a_{m-1}; \alpha)$ is correctly specified and the "g"-null hypothesis (11) holds.

Structural Nested Models

In this Section, we describe the class of structural nested models. In this paper, we shall only consider the simplest structural nested model: a structural nested distribution model for a univariate continuous outcome $Y$ measured after the final treatment time $t_K$. Robins (1989, 1992, 1994, 1995) considers generalizations to discrete outcomes, multivariate outcomes, and failure time outcomes. We assume $L_{K+1}$ is a univariate continuous-valued random variable with a continuous distribution function and denote it by $Y$. Our g-test of the "g"-null hypothesis (11) was unlinked to any estimator of $F_g(y)$.

Our first goal in this subsection will be to derive a complete reparameterization of the joint distribution of $V$ that will offer a unified fully parametric likelihood-based approach to testing the "g"-null hypothesis and estimating the function $F_g(y)$.

Then we will develop a unified approach to testing and estimation based on the semiparametric g-test described above. The first step in constructing our reparameterization of the distribution of $V$ is a new characterization of the "g"-null hypothesis (11). We assume the conditional distribution of $Y$ given $(\bar\ell_m, \bar a_m)$ has a continuous positive density with respect to Lebesgue measure. Given any treatment history $a = \bar a_K$, adopt the convention that $(\bar a_m, 0)$ is the treatment history that agrees with $a$ through $t_m$ and is zero thereafter. Recall that the quantile-quantile function linking any two continuous distribution functions $F_1(y)$ and $F_2(y)$ is $\gamma(y) = F_1^{-1}\{F_2(y)\}$. It maps quantiles of $F_2(y)$ into quantiles of $F_1(y)$.

Let $\gamma(y, \bar\ell_m, \bar a_m)$ be the quantile-quantile function mapping quantiles of $F_{g=(\bar a_m, 0)}(y \mid \bar\ell_m)$ into quantiles of $F_{g=(\bar a_{m-1}, 0)}(y \mid \bar\ell_m)$. It follows from its definition as a quantile-quantile function that: (a) $\gamma(y, \bar\ell_m, \bar a_m) = y$ if $a_m = 0$; (b) $\gamma(y, \bar\ell_m, \bar a_m)$ is increasing in $y$; and (c) the derivative of $\gamma(y, \bar\ell_m, \bar a_m)$ w.r.t. $y$ is continuous. Examples of such functions are

$$\gamma(y, \bar\ell_m, \bar a_m) = y + 2 a_m + 3 a_m a_{m-1} + 4 a_m w_m \tag{16}$$

where $w_m$ is a given univariate function of $\bar\ell_m$, and

$$\gamma(y, \bar\ell_m, \bar a_m) = y \exp\{2 a_m + 3 a_m a_{m-1} + 4 a_m w_m\}. \tag{17}$$

Our interest in $\gamma(y, \bar\ell_m, \bar a_m)$ is based on the following theorem proved in Robins (1989, 1995a).

Theorem 3. $\gamma(y, \bar\ell_m, \bar a_m) = y$ for all $y$, $m$, $\bar\ell_m$, $\bar a_m$ if and only if the "g"-null hypothesis (11) holds.

In view of Theorem 3, our approach will be to construct a parametric model for $\gamma(y, \bar\ell_m, \bar a_m)$ depending on a parameter $\psi$ such that $\gamma(y, \bar\ell_m, \bar a_m) = y$ if and only if the true value $\psi_0$ of the parameter is 0. We will then reparameterize the density of the observables $V$ in terms of a random variable $H$ which is a function of the observables and the function $\gamma(y, \bar\ell_m, \bar a_m)$. As a consequence, likelihood-based tests of the hypothesis $\psi_0 = 0$ will produce likelihood-based tests of the "g"-null hypothesis.

Definition: The distribution $F$ of $V$ follows a pseudo-structural nested distribution model $\gamma^*(y, \bar\ell_m, \bar a_m, \psi)$ if $\gamma(y, \bar\ell_m, \bar a_m) = \gamma^*(y, \bar\ell_m, \bar a_m, \psi_0)$ where (1) $\gamma^*(\cdot, \cdot, \cdot, \cdot)$ is a known function; (2) $\psi_0$ is a finite vector of unknown parameters to be estimated; (3) for each value of $\psi$, $\gamma^*(y, \bar\ell_m, \bar a_m, \psi)$ satisfies the conditions (a), (b), and (c) that were satisfied by $\gamma(y, \bar\ell_m, \bar a_m)$; (4) $\partial \gamma^*(y, \bar\ell_m, \bar a_m, \psi) / \partial \psi'$ and $\partial^2 \gamma^*(y, \bar\ell_m, \bar a_m, \psi) / \partial \psi' \partial y$ are continuous for all $\psi$; and (5) $\gamma^*(y, \bar\ell_m, \bar a_m, \psi) = y$ if and only if $\psi = 0$, so that $\psi_0 = 0$ represents the "g"-null hypothesis.

Examples of appropriate functions $\gamma^*(y, \bar\ell_m, \bar a_m, \psi)$ can be obtained from Eqs. (16) and (17) by replacing the quantities 2, 3, and 4 by the components of $\psi = (\psi_1, \psi_2, \psi_3)$. We call models for $\gamma$ pseudo-structural because pseudo-SNDMs are models for the distribution $F$ of the observables $V$ regardless of whether this distribution has a structural (i.e., causal) interpretation (as it would in a sequential randomized trial). When $\gamma$ does have a causal interpretation as well, we refer to our models as structural nested distribution models (SNDMs).

Next we recursively define random variables $H_K, \ldots, H_0$ that depend on the observables $V$ as follows: $H_K \equiv \gamma(Y, \bar L_K, \bar A_K)$, $H_m \equiv h_m(Y, \bar L_K, \bar A_K) \equiv \gamma(H_{m+1}, \bar L_m, \bar A_m)$, and $H \equiv h(Y, \bar L_K, \bar A_K) \equiv H_0$.

Note $Y = h^{-1}(H, \bar L_K, \bar A_K)$ is a deterministic function of $H$, $\bar L_K$, $\bar A_K$ where, for any function $q(y, \bullet)$, we define $q^{-1}(u, \bullet) = y$ if $q(y, \bullet) = u$. Note by Theorem 3 that if the "g"-null hypothesis is true, then $H = Y$.

Example: If $\gamma(y, \bar\ell_m, \bar a_m)$ is given by Eq. (16), then $H_m = Y + \sum_{k=m}^{K} (2 A_k + 3 A_k A_{k-1} + 4 A_k W_k)$ and $h^{-1}(u, \bar\ell_K, \bar a_K) = u - \sum_{m=0}^{K} (2 a_m + 3 a_m a_{m-1} + 4 a_m w_m)$.

Since $\gamma(Y, \bar L_m, \bar A_m)$ is increasing in $Y$, the map from $V = (Y, \bar L_K, \bar A_K)$ to $(H, \bar L_K, \bar A_K)$ is 1-1 with a strictly positive Jacobian determinant. Therefore,

$$f_{Y, \bar L_K, \bar A_K}(Y, \bar L_K, \bar A_K) = \{\partial H / \partial Y\}\, f_{H, \bar L_K, \bar A_K}(H, \bar L_K, \bar A_K).$$

However, Robins (1989, 1995) proves that

$$H \perp\!\!\!\perp A_m \mid \bar L_m, \bar A_{m-1}. \tag{18}$$

It follows that

$$f_{Y, \bar L_K, \bar A_K}(Y, \bar L_K, \bar A_K) = \{\partial H / \partial Y\}\, f(H) \prod_{m=0}^{K} f(L_m \mid \bar L_{m-1}, \bar A_{m-1}, H)\, f[A_m \mid \bar L_m, \bar A_{m-1}]. \tag{19}$$

(19) is the aforementioned reparameterization of the density of the observables. For completeness, we prove (18) in the Appendix.

Remark: Eq. (19) is only a reparameterization. In particular, Eqs. (18) and (19) do not translate into restrictions on the joint distribution of $V = (Y, \bar L_K, \bar A_K)$ since any law for $V$ satisfies (18) and (19). Conversely, (i) any function $\gamma(y, \bar\ell_m, \bar a_m)$ satisfying $\gamma(y, \bar\ell_m, \bar a_m) = y$ if $a_m = 0$, with $\partial \gamma(y, \bar\ell_m, \bar a_m) / \partial y$ positive and continuous, and (ii) any densities $f_H(h)$, $f_{L_m}(\ell_m \mid \bar\ell_{m-1}, \bar a_{m-1}, h)$ and $f_{A_m}(a_m \mid \bar\ell_m, \bar a_{m-1})$ together induce a unique law for $V = (Y, \bar L_K, \bar A_K)$ by (a) using $f_H(\bullet)$, $f_{L_m}(\bullet \mid \bullet)$ and $f_{A_m}(\bullet \mid \bullet)$ to determine a joint distribution for $(H, \bar L_K, \bar A_K)$ satisfying $H \perp\!\!\!\perp A_m \mid \bar A_{m-1}, \bar L_m$ and then (b) defining $Y$ to be $h^{-1}(H, \bar L_K, \bar A_K)$ with $h^{-1}(\cdot, \cdot, \cdot)$ defined in terms of $\gamma(\cdot, \cdot, \cdot)$ as above.

It follows that the joint distribution $F$ for the observed data $V$ can be represented by an enlarged DAG $G_{enl}$ based on the ordering $H, L_0, A_0, L_1, A_1, \ldots, L_K, A_K, Y$ which is complete except that the arrows from $H$ to $A_k$, $k = 0, \ldots, K$, are missing. The dependence of $Y$ on its parents $(H, \bar A_K, \bar L_K)$ is completely deterministic. Furthermore, by Theorem 3, the "g"-null hypothesis is represented by the subgraph of this DAG in which all arrows into $Y$ are removed except for the arrow from $H$ [since, under the "g"-null hypothesis, $Y = H$].
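To make the recursive construction of $H$ concrete, the following Python sketch computes $H_K, \ldots, H_0$ and the inverse map $h^{-1}$ for an additive model of the form (16) with the constants 2, 3, 4 replaced by $\psi = (\psi_1, \psi_2, \psi_3)$. It is only an illustration: the function names and the list-based representation of the histories $\bar a_K$ and $(w_0, \ldots, w_K)$ are our own assumptions, and the convention $a_{-1} = 0$ is used for the first occasion.

```python
def increment(a_m, a_prev, w_m, psi):
    """Additive term psi1*a_m + psi2*a_m*a_{m-1} + psi3*a_m*w_m of Eq. (16)."""
    psi1, psi2, psi3 = psi
    return psi1 * a_m + psi2 * a_m * a_prev + psi3 * a_m * w_m


def h_sequence(y, a, w, psi):
    """Return [H_0, ..., H_K] via H_K = gamma(Y, .), H_m = gamma(H_{m+1}, .)."""
    K = len(a) - 1
    hs = [0.0] * (K + 1)
    current = y
    for m in range(K, -1, -1):                 # backward over occasions K, ..., 0
        a_prev = a[m - 1] if m > 0 else 0.0    # convention a_{-1} = 0
        current = current + increment(a[m], a_prev, w[m], psi)
        hs[m] = current
    return hs


def h(y, a, w, psi):
    """H = H_0 as a function of the observables."""
    return h_sequence(y, a, w, psi)[0]


def h_inverse(u, a, w, psi):
    """Recover Y from H = u and the treatment and covariate histories."""
    total = sum(
        increment(a[m], a[m - 1] if m > 0 else 0.0, w[m], psi)
        for m in range(len(a))
    )
    return u - total
```

With `psi = (2, 3, 4)`, `a = [1.0, 0.5]`, `w = [0.2, 1.0]` and `y = 1.0`, the increments are 2.8 and 4.5, so `h_sequence` returns approximately `[8.3, 5.5]`. With `psi = (0, 0, 0)` the map is the identity, matching $H = Y$ under the "g"-null hypothesis.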

Robins (1989, 1995) shows $F_g(y)$ based on $G_{enl}$ is equal to $F_g(y)$ based on DAG $G$ if the $g_k(\cdot)$, $k = 0, \ldots, K$, are not functions of $H$. Thus we have succeeded in reparameterizing the joint density of the observables in terms of the function $\gamma(y, \bar\ell_m, \bar a_m)$, its derivative with respect to $y$, and the densities $f[\ell_m \mid \bar\ell_{m-1}, \bar a_{m-1}, h]$, $f(h)$, and $f[a_m \mid \bar\ell_m, \bar a_{m-1}]$. We can then specify a fully parametric model for the joint distribution of the observables by specifying (a) a pseudo-SNFTM $\gamma^*(y, \bar\ell_m, \bar a_m, \psi_0)$ for $\gamma(y, \bar\ell_m, \bar a_m)$, and (b) parametric models $f[\ell_m \mid \bar\ell_{m-1}, \bar a_{m-1}, h; \phi_0]$, $f(h; \eta_0)$, and $f[a_m \mid \bar\ell_m, \bar a_{m-1}; \alpha_0]$ for the above densities. It follows from (19) that the maximum likelihood estimates of $(\psi_0, \phi_0, \eta_0)$ are obtained by maximizing the likelihood (20). Because the "g"-null hypothesis corresponds to $\psi_0 = 0$, the reparameterization (19) has allowed us to construct fully parametric likelihood-based tests of the "g"-null hypothesis based on the Wald, score, or likelihood ratio test for $\psi_0 = 0$.

Estimation of $F_g(y)$

If our fully parametric likelihood-based test of the null hypothesis $\psi_0 = 0$ rejects, we would wish to employ these same parametric models to estimate $F_g(y)$ for each regime $g$. We shall accomplish this goal in two steps. First we provide a Monte Carlo algorithm which produces independent realizations $y_{v,g}$ of a random variable whose distribution function is $F_g(y)$.

MC Algorithm: Given a regime $g$:

Step (1): Draw $h_v$ from $f_H(h)$.

Step (2): Draw $\ell_{0,v}$ from $f[\ell_0 \mid h_v]$.

Step (3): Do for $m = 1, \ldots, K$:

Step (4): Draw $\ell_{m,v}$ from $f[\ell_m \mid \bar\ell_{m-1,v}, g(\bar\ell_{m-1,v}), h_v]$.

Step (5): Compute $y_{v,g} = h^{-1}(h_v, \bar\ell_{K,v}, g(\bar\ell_{K,v}))$.

Robins (1989, 1995) shows that the $(y_{1,g}, \ldots, y_{v,g}, \ldots)$ are independent simulations from $F_g(y)$ based on $G_{enl}$ and thus are independent realizations of a random variable with distribution function $F_g(y)$ based on DAG $G$. Second, since $f_H(h)$ and $f[\ell_m \mid \bar\ell_{m-1,v}, g(\bar\ell_{m-1,v}), h_v]$ are unknown, in practice, we replace them with the estimates obtained above.
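The MC algorithm can be sketched in a few lines. Everything numerical below is an assumption made for illustration, not part of the paper: $H$ is drawn from a standard normal, each $L_m$ is a Bernoulli variable with a made-up logistic dependence on $h$ and the previous treatment, the blip function is the no-interaction form $\gamma(y, \bar a_m) = y + 2a_m + 3a_ma_{m-1}$ with $a_{-1} = 0$, and a regime is passed as a hypothetical callable `regime(m, l_history)`.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def blip_total(a):
    """Sum over m of 2*a_m + 3*a_m*a_{m-1}; a has shape (K+1, n), a_{-1} = 0 (assumption)."""
    a = np.asarray(a, dtype=float)
    a_prev = np.concatenate([np.zeros_like(a[:1]), a[:-1]], axis=0)
    return (2.0 * a + 3.0 * a * a_prev).sum(axis=0)


def mc_draws(regime, K=1, n=1000, seed=0):
    """Return n simulated outcomes y_{v,g} under the regime g."""
    rng = np.random.default_rng(seed)
    h = rng.normal(size=n)                               # Step 1: draw h_v
    l = [(rng.random(n) < expit(h)).astype(float)]       # Step 2: draw l_{0,v}
    a = [regime(0, l)]                                   # treatment assigned by g
    for m in range(1, K + 1):                            # Step 3: loop over m
        p = expit(0.5 * h + 0.5 * a[m - 1])              # hypothetical f[l_m | past, h]
        l.append((rng.random(n) < p).astype(float))      # Step 4: draw l_{m,v}
        a.append(regime(m, l))
    return h - blip_total(np.stack(a))                   # Step 5: y = h^{-1}(h, a)


def never_treat(m, l):
    return np.zeros_like(l[0])


def always_treat(m, l):
    return np.ones_like(l[0])


def treat_if_l(m, l):
    return l[m]                                          # a dynamic regime
```

Because the same random numbers are consumed whatever the regime, calling `mc_draws` twice with the same seed gives paired draws; with `K=1` the "always treat" and "never treat" outcomes differ by exactly 7 under this blip function. In practice the assumed densities would be replaced by the fitted ones, as the text describes.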

If $\gamma(y, \bar\ell_m, \bar a_m) = \gamma(y, \bar a_m)$ does not depend on $\bar\ell_m$ (i.e., there is no treatment-covariate interaction), then, in the above algorithm for a non-dynamic regime $g = (\bar a)$, steps (3)-(5) can be eliminated since $h^{-1}(u, \bar\ell_K, \bar a_K) \equiv h^{-1}(u, \bar a_K)$ does not depend on $\bar\ell_K$; as a consequence, to draw from $F_{g=(\bar a)}(y)$ one does not need to model the conditional density of the variables $L_m$. In fact,

$$F_{g=(\bar a)}(y) = \mathrm{pr}\{h^{-1}[H, \bar a_K] > y\}.$$

Example: If $\gamma(y, \bar\ell_m, \bar a_m) = y + 2a_m + 3a_ma_{m-1}$ then

$$h^{-1}(u, \bar a_K) = u - \sum_{m=0}^{K} \left(2a_m + 3a_ma_{m-1}\right).$$

Semiparametric Inference in SNMs

g-Estimation of $\psi_0$

In this Section, we assume $A_m$ is dichotomous. Robins (1992) discusses generalizations to multivariate $A_m$ with (possibly) continuous components. Robins (1992) argues that even in observational studies, one will have better prior knowledge about, and thus can more accurately model, the densities $f(A_m = 1 \mid \bar L_m, \bar A_{m-1})$ [as in Eq. (15)] than the densities occurring in Eq. (20). Indeed, with some loss of efficiency, if $A_k$ and $L_k$ are discrete, we can use a saturated model in Eq. (15), thus eliminating all possibility of misspecification. Additionally, there is no possibility of misspecification in sequential randomized trials since then $f(A_m = 1 \mid \bar L_m, \bar A_{m-1})$ is under the control of and thus known to the investigator. It is for these reasons we prefer to test the "g"-null hypothesis using the g-test. We now describe $n^{1/2}$-consistent g-estimates $\tilde\psi$ of $\psi_0$ which are based on model (15) and thus are consistent with the g-test of $\psi_0 = 0$: the confidence set for $\psi_0$ will fail to cover 0 if and only if the g-test of $\psi_0 = 0$ rejects. To do so, we (a) add $\theta' Q^*_m(\psi)$ [rather than $\theta Q_m$] to the regressors $\alpha' W_m$ in (15), where $Q^*_m(\psi) = q^*\{H(\psi), \bar L_m, \bar A_{m-1}\}$, $q^*(\cdot)$ is a known vector-valued function of dimension $\dim\psi$ chosen by the data analyst, and $\theta$ is a $\dim\psi$-valued parameter; (b) define the g-estimate $\tilde\psi$ to be the value of $\psi$ for which the logistic regression score test statistic of $\theta = 0$ is zero; (c) take as a 95% large sample confidence set for $\psi_0$ the set of $\psi$ for which the score test (which we call a g-test) of the hypothesis $\theta = 0$ fails to reject. For each candidate value, $\psi$ is treated as a fixed constant when calculating the score test. The optimal choice of the function $q^*(\cdot)$ is given in Robins (1992).

Given $\tilde\psi$, we now estimate $F_g(y)$ by (i) finding $\hat\phi$ that maximizes (20) with the expression in set braces set to 1 and with $\psi$ fixed at $\tilde\psi$, (ii) using the empirical distribution of $H_i(\tilde\psi)$ as an estimate for the distribution of $H$ and (iii) using the MC algorithm of Sec. (6.4) to estimate $F_g(y)$ based on $(\hat\phi, \tilde\psi)$ and the empirical law of $H_i(\tilde\psi)$. Indeed, if $h^{-1}(u, \bar\ell_K, \bar a_K)$ does not depend on $\bar\ell_K$, then an $n^{1/2}$-consistent estimate of $F_{g=(\bar a)}(y)$ is

$$n^{-1} \sum_{i=1}^{n} I\{h^{-1}[H_i(\tilde\psi), \bar a_K, \tilde\psi] > y\},$$

where $I\{A\} = 1$ if $A$ is true and is zero otherwise. This estimator has the distinct computational advantage over an estimator based on the G-computation algorithm formula (10) of requiring neither integration nor modelling of the conditional density of the covariates $L_m$ given the past.

Testing and estimation of Direct Effects

Structural nested distribution models are appropriate for testing for and estimation of the joint effect of a single time-dependent treatment $\bar A_K$ given sequentially, in which the null hypothesis of interest is the "g"-null hypothesis (11). This model is inappropriate for testing the null hypothesis of whether a given treatment (say, $A_0$) has a direct effect on the outcome $Y$ when a subsequent treatment (say, $A_1$) is manipulated (set) to a particular value $a_1$.

Appropriate models for direct effects are discussed in Appendix 3 of Robins (1997a) and Robins (1997b). In Section 8.3, we provide an introduction to these models. In this subsection, we demonstrate why SNDM models are not appropriate for testing for direct effects. Specifically, again consider the example of Sec. 1 but now suppose $A_0$ is the drug aerosolized pentamidine while $A_1$ remains treatment with AZT. Suppose it is known that AZT has a direct effect on the outcome $Y$.

Our goal is to test the null hypothesis that there is no direct effect of $A_0$ on $Y$. Our knowledge that $A_1$ affects $Y$ rules out DAGs 1b and 1d as the causal DAGs generating the data. The null hypothesis of no direct effect of $A_0$ on $Y$ is represented by DAG 1c, in which the arrow from $A_0$ to $Y$ is missing. The alternative hypothesis is represented by graph 1a. Verma and Pearl (1991) and SGS (1993, p. 192) have also considered this testing problem. The restriction on the marginal distribution of $V = (A_0, A_1, L, Y)$ entailed by the no $A_0$ effect null hypothesis of DAG 1c is that $f_{g=(a_0,a_1)}(y)$ is not a function of $a_0$, which cannot be represented by a conditional independence constraint amongst subsets of the variables in $V$ (Robins, 1986; Verma and Pearl, 1991; SGS, 1993, p. 193). Robins (1997a) shows this restriction is equivalent to the hypothesis that

$$\gamma(y, \ell_0, a_0) = y \qquad (21)$$

and

$$E\{\gamma(y, \bar\ell_1, \bar a_1) \mid A_0 = a_0, L_0 = \ell_0\} \qquad (22)$$

does not depend on $a_0$. Note that, in our example, $L_0$ is not present and $L_1 = L$.

Suppose therefore we choose to test (21) and (22) by specifying (i) a SNDM given by Eq. (23) and

$$\gamma(y, \bar\ell_1, \bar a_1, \psi) = \gamma(y, \ell, \bar a_1, \psi) = y - \psi_2 a_1 - \psi_3 a_1 a_0 - \psi_4 a_1 \ell - \psi_5 a_1 a_0 \ell \qquad (24)$$

and (ii) the logistic model (5) for the probability of $L$ given $A_0$.

To simplify the following argument, we suppose that $U$ is dichotomous. When causal DAG 1c generated the data, we say that there is an $A_1$-$U$ treatment interaction if $\gamma^{(1)}(y, a_1) \neq \gamma^{(0)}(y, a_1)$, where $\gamma^{(j)}(y, a_1)$ maps quantiles of $F(y \mid a_1, U = j)$ into those of $F(y \mid a_1 = 0, U = j)$. For example, if $A_1$ affects $Y$ only when $U = 1$ and has no effect when $U = 0$, there is an $A_1$-$U$ interaction since then $\gamma^{(0)}(y, A_1) = y$ and $\gamma^{(1)}(y, A_1) \neq y$. If there is an $A_1$-$U$ interaction, then, similarly to Lemma 1, either the logistic model (5) and/or the structural nested distribution model (24) must be misspecified under the no $A_0$ effect null hypothesis. Formally, we have

Lemma 2: If (i) the no $A_0$ effect null hypothesis represented by DAG 1c is true, (ii) the distribution of $W$ is either faithful or linearly faithful to DAG 1c, and (iii) there is an $A_1$-$U$ interaction, then model (5) and/or the SNDM (24) is misspecified.

We conclude that it is not adequate to test for and/or estimate direct effects using either the standard DAG parameterization or the reparameterization induced by a SNDM. Robins (1997a, App. 3) and Robins (1997b) suggest "direct effect" structural nested models which lead to alternative appropriate reparameterizations.

Proof of Lemma 2:

We noted above that DAG 1c implies (22). Now under our models (5) and (24), (22) can be written

$$y - \psi_2 a_1 - \psi_3 a_0 a_1 - [\psi_4 a_1 + \psi_5 a_0 a_1]\,\mathrm{expit}[\gamma_0 + \gamma_1 a_0],$$

which does not depend on $a_0$ if and only if either (25) or (26) holds. Now, since $L$ and $A_0$ are not d-separated on DAG 1c, we conclude that (25) cannot be true if the distribution of $W$ is faithful or linearly faithful to DAG 1c. We now complete the proof by showing that (26) also cannot be true. Were (26) true, we conclude from model (24) that $\gamma(y, \bar\ell_1, \bar a_1) = \gamma(y, a_1)$ does not depend on $a_0$ or $\ell$, resulting in the following contradiction. Since $(L, A_0) \perp\!\!\!\perp Y \mid A_1, U$ and $A_1 \perp\!\!\!\perp U \mid L, A_0$ on DAG 1c,

$$F(y \mid a_1, a_0, \ell) = F(y \mid a_1, U = 1)\, p(a_0, \ell) + F(y \mid a_1, U = 0)\{1 - p(a_0, \ell)\},$$

where $p(a_0, \ell) = \mathrm{pr}[U = 1 \mid a_0, \ell]$. Now by definition of $\gamma(y, \bar\ell_1, \bar a_1)$,

$$F[\gamma(y, \bar\ell_1, \bar a_1) \mid a_1 = 0, a_0, \ell] = F[y \mid a_1, a_0, \ell].$$

So, if (23) is true, we have

$$F[\gamma(y, a_1) \mid a_1 = 0, U = 1]\, p(a_0, \ell) + F[\gamma(y, a_1) \mid a_1 = 0, U = 0]\{1 - p(a_0, \ell)\} = F[y \mid a_1, U = 1]\, p(a_0, \ell) + F(y \mid a_1, U = 0)\{1 - p(a_0, \ell)\}.$$

This implies that $F(y \mid a_1, U = j) = F[\gamma(y, a_1) \mid a_1 = 0, U = j]$ for $j = 0, 1$, which can be rewritten as $\gamma^{(0)}(y, a_1) = \gamma^{(1)}(y, a_1) = \gamma(y, a_1)$, contradicting premise (iii).

g-null test

An appropriate approach to testing the no direct $A_0$ effect null hypothesis is based on the following theorem.

Theorem (Direct-effect G-null theorem): The no direct $A_0$ effect null hypothesis that $f_{g=(a_0,a_1)}(y)$ does not depend on $a_0$ is true if and only if, for any functions $t_1(\cdot)$ and $t_2(\cdot)$, $E[t_1(Y)\, t_2(A_1)/W_1 \mid A_0]$ does not depend on $A_0$ w.p.1, whenever the expectation is finite, where $W_1 = f(A_1 \mid L, A_0)$.

Proof of Theorem: By Fubini's theorem, the expectation can be written

$$\int_{-\infty}^{\infty} t_2(a_1)\, q(a_1, A_0)\, da_1, \qquad q(a_1, A_0) = \int t_1(y) \Big[\sum_{\ell} f(y \mid \ell, a_1, A_0)\, f(\ell \mid A_0)\Big] dy. \qquad (27)$$

Now the term in square brackets in (27) is $f_{g=(A_0,a_1)}(y)$. Recalling that $t_1(y)$ is arbitrary proves the theorem.
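The Fubini step can be spelled out as follows. This is an illustrative expansion rather than part of the original proof, and it assumes for notational convenience that $L$ and $A_1$ are discrete with $f(a_1 \mid \ell, A_0) > 0$; conditioning on $(L, A_1)$, the weight $1/W_1$ cancels the treatment density:

```latex
\begin{align*}
E\!\left[\frac{t_1(Y)\,t_2(A_1)}{W_1}\,\Big|\,A_0\right]
&= \sum_{\ell}\sum_{a_1} \int \frac{t_1(y)\,t_2(a_1)}{f(a_1\mid \ell, A_0)}\,
   f(y\mid \ell, a_1, A_0)\, f(a_1\mid \ell, A_0)\, f(\ell\mid A_0)\,dy \\
&= \sum_{a_1} t_2(a_1) \int t_1(y)\Big[\sum_{\ell} f(y\mid \ell, a_1, A_0)\, f(\ell\mid A_0)\Big]dy \\
&= \sum_{a_1} t_2(a_1)\, q(a_1, A_0).
\end{align*}
```

When $A_1$ is continuous the outer sum over $a_1$ becomes the integral appearing in the proof.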

Remark 1: If $A_1$ is discrete, we can always choose $t_2(a_1) = 1$. However, as in this example, if $A_1$ is continuous, we need to choose $t_2(a_1)$ so as to make (27) finite. For example, if $q(a_1, A_0)$ were identically 1, then (27) is finite if and only if $\int_{-\infty}^{\infty} t_2(a_1)\, da_1$ is finite.

Remark 2: If on DAG 1c, $A_1$ had been parentless, then, by d-separation and faithfulness, the no direct $A_0$ effect null hypothesis of no arrow from $A_0$ to $Y$ is true if and only if $A_0$ and $Y$ are independent. The theorem implies that any test of independence of $A_0$ and $Y$ (which is linear in $Y$) can still be used to test the no direct $A_0$ effect null hypothesis when $A_1$ has parents $(A_0, L)$ provided, in calculating the test, $Y$ is replaced by $W = t_1(Y)\, t_2(A_1)/W_1$.

The choice of $t_1(\cdot)$ and $t_2(\cdot)$ will affect the power but not the level of the test. This procedure can be implemented in a randomized trial where $f(A_1 \mid L, A_0)$ is known by design. In observational studies, in a preliminary step, one must specify a parametric model $f(a_1 \mid \ell, a_0; \alpha)$, find $\hat\alpha$ that maximizes the likelihood $\prod_{i=1}^{n} f(A_{1i} \mid L_i, A_{0i}; \alpha)$, and replace $W$ by $W(\hat\alpha) = t_1(Y)\, t_2(A_1)/f(A_1 \mid L, A_0; \hat\alpha)$.
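A small simulation may help make the weighting idea concrete. The sketch below is an illustration under stated assumptions, not the authors' implementation: data are generated from a hypothetical version of DAG 1c (no arrow from $A_0$ to $Y$, an unmeasured binary $U$ affecting both $L$ and $Y$), the treatment probabilities $f(A_1 \mid L, A_0)$ are taken as known by design, and we choose $t_1(y) = y$ and $t_2(a_1) = 1$. All numerical coefficients are invented.

```python
import numpy as np


def direct_effect_contrast(n=200_000, seed=0):
    """Return (naive, weighted) differences in means between A0 = 1 and A0 = 0."""
    rng = np.random.default_rng(seed)
    a0 = (rng.random(n) < 0.5).astype(float)
    u = (rng.random(n) < 0.5).astype(float)                  # unmeasured cause of L and Y
    l = (rng.random(n) < 0.2 + 0.3 * a0 + 0.4 * u).astype(float)
    p1 = 0.2 + 0.3 * l + 0.3 * a0                            # known f(A1 = 1 | L, A0)
    a1 = (rng.random(n) < p1).astype(float)
    y = u + 2.0 * a1 * u + rng.normal(size=n)                # Y depends on A1 and U only
    w1 = np.where(a1 == 1.0, p1, 1.0 - p1)                   # W1 = f(A1 | L, A0)
    w = y / w1                                               # W = t1(Y) t2(A1) / W1
    naive = y[a0 == 1].mean() - y[a0 == 0].mean()
    weighted = w[a0 == 1].mean() - w[a0 == 0].mean()
    return naive, weighted
```

Under these assumptions the unweighted contrast is clearly nonzero (about 0.39 in expectation, because $A_0$ changes the distribution of $A_1$), whereas the contrast in the weighted variable $W$ is centered at zero, as the theorem predicts when $A_0$ has no direct effect. A formal test would compare the weighted contrast with its standard error.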

When $\alpha$ is estimated, the p-value outputted by off-the-shelf software will exceed the true p-value (i.e., the test is conservative), although a corrected p-value can be easily computed (Robins, 1997b). We now generalize this example by considering estimation and testing of direct effects using direct-effect SNDMs. We suppose that treatment $A_m = (A_{Pm}, A_{Zm})$ at time $t_m$ is comprised of two different treatments $A_{Pm}$ and $A_{Zm}$.

To formalize the no-direct-effect null hypothesis, let $g_P = (g_{P0}, \ldots, g_{PK})$ be a collection of functions where $g_{Pm}: \bar\ell_m \to a_{Pm}$. Then, for history $\bar a_Z \equiv \bar a_{ZK} \in \bar{\mathcal{A}}_Z$, let $g = (g_P, \bar a_Z)$ be the treatment regime or plan given by $g_m(\bar\ell_m) = \{g_{Pm}(\bar\ell_m), a_{Zm}\}$. Then $F_{(g_P, \bar a_Z)}(y)$ is the distribution of $Y$ that would be observed if $\bar A_Z$ was set to $\bar a_Z$ and the treatments $A_P$ were assigned, possibly dynamically, according to the plan $g_P$. If $g_P$ is the non-dynamic regime $\bar a_P$, we write $F_{(\bar a_P, \bar a_Z)}(y)$.

Definition: The direct effect "g"-null hypothesis of no direct effect of $A_P$ controlling for $A_Z$ is $F_{g_P, \bar a_Z}(y) = F_{g_P^*, \bar a_Z}(y)$ for all $\bar a_Z$, $g_P$, $g_P^*$.

Let $\gamma(y, \bar\ell_m, \bar a_{Pm}, \bar a_Z)$ be the quantile-quantile function mapping quantiles of $F_{(\bar a_{Pm}, \bar a_Z)}(y \mid \bar\ell_m)$ into quantiles of $F_{(\bar a_{P(m-1)}, \bar a_Z)}(y \mid \bar\ell_m)$, which satisfies (a) $\gamma(y, \bar\ell_m, \bar a_{Pm}, \bar a_Z) = y$ if $a_{Pm} = 0$; (b) $\gamma(y, \bar\ell_m, \bar a_{Pm}, \bar a_Z)$ is increasing in $y$; and (c) the derivative of $\gamma$ w.r.t. $y$ is continuous. Robins (1997b) proved that $\gamma(y, \bar\ell_m, \bar a_{Pm}, \bar a_Z) = y$ if and only if the direct-effect "g"-null hypothesis holds. We now construct a parametric model for $\gamma(y, \bar\ell_m, \bar a_{Pm}, \bar a_Z)$.

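Definition: The distribution $F$ of $V$ follows a direct-effect pseudo-structural nested distribution model $\gamma(y, \bar\ell_m, \bar a_{Pm}, \bar a_Z, \psi)$ if $\gamma(y, \bar\ell_m, \bar a_{Pm}, \bar a_Z) = \gamma(y, \bar\ell_m, \bar a_{Pm}, \bar a_Z, \psi_0)$ where (1) $\gamma(\cdot, \cdot, \cdot, \cdot, \cdot)$ is a known function; (2) $\psi_0$ is a finite vector of unknown parameters; (3) for each value of $\psi$, $\gamma(y, \bar\ell_m, \bar a_{Pm}, \bar a_Z, \psi)$ satisfies the above conditions (a)-(c); (4) $\partial\gamma(y, \bar\ell_m, \bar a_{Pm}, \bar a_Z, \psi)/\partial\psi'$ and $\partial^2\gamma(y, \bar\ell_m, \bar a_{Pm}, \bar a_Z, \psi)/\partial\psi'\partial y$ are continuous; and (5) $\gamma(y, \bar\ell_m, \bar a_{Pm}, \bar a_Z, \psi) = y$ if and only if $\psi = 0$, so that $\psi_0 = 0$ represents the direct-effect "g"-null hypothesis.

We now consider testing and estimation of $\psi_0$. Our fundamental tool is the following theorem of Robins (1997b) characterizing $\gamma(y, \bar\ell_m, \bar a_{Pm}, \bar a_Z)$. For any function $\gamma^*(y, \bar\ell_m, \bar a_{Pm}, \bar a_Z)$ satisfying conditions (a)-(c) above, we recursively define the following: $H_K(\gamma^*) = \gamma^*(Y, \bar L_K, \bar A_{PK}, \bar A_Z)$, $H_m(\gamma^*) = \gamma^*(H_{m+1}(\gamma^*), \bar L_m, \bar A_{Pm}, \bar A_Z)$, and set $H(\gamma^*) = H_0(\gamma^*)$. Define $W_m = \prod_{k=m}^{K} f(A_{Zk} \mid \bar A_{k-1}, \bar L_k)$ and $\underline{A}_{Zm} = (A_{Zm}, \ldots, A_{ZK})$.

Theorem 5: $\gamma^*(Y, \bar L_m, \bar A_{Pm}, \bar A_Z) = \gamma(Y, \bar L_m, \bar A_{Pm}, \bar A_Z)$ w.p.1 if and only if, for $m = 0, \ldots, K$ and any functions $t_m(\cdot, \cdot)$, $E[t_m(\underline{A}_{Z(m+1)}, H(\gamma^*))/W_{m+1} \mid \bar A_m, \bar L_m]$ does not depend on $A_{Pm}$ w.p.1, when the expectation is finite.

Given a direct-effect SNDM $\gamma(y, \bar\ell_m, \bar a_{Pm}, \bar a_Z, \psi)$, define $H(\psi)$ to be $H(\gamma^*)$ with the function $\gamma(\cdot, \cdot, \cdot, \cdot, \psi)$ in place of $\gamma^*$. Theorem 5 implies that we can construct tests and confidence intervals for $\psi_0$ in observational studies using off-the-shelf software as follows.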

Step 1: Specify a parametric model $f(A_{Zk} \mid \bar A_{k-1}, \bar L_k; \alpha^{(1)})$ and calculate the MLE $\hat\alpha^{(1)}$ that maximizes $\prod_{i} \prod_{k=0}^{K} f(A_{Zki} \mid \bar A_{(k-1)i}, \bar L_{ki}; \alpha^{(1)})$, and let $W_m(\hat\alpha^{(1)})$ denote $W_m$ evaluated under the density indexed by $\hat\alpha^{(1)}$.

Step 2: For $m = 0, \ldots, K$, specify a model (28) for the conditional mean of $A_{Pm}$ depending on $\alpha^{(0)}$, where $Q_m$ is a known vector function of $\underline{A}_{Zm}$, $\bar A_{m-1}$, $\bar L_m$ and $d(\cdot)$ is a known link function. For example, if $A_{Pm}$ is dichotomous, we might choose $d(x) = \{1 + \exp(-x)\}^{-1}$.

Step 3: Compute an $\alpha$-level test of the hypothesis that $\theta = 0$ in the extended model that adds the term $\theta' Q^*_m(\psi) = \theta' q^*_m(H(\psi), \bar A_{m-1}, \bar L_m, \underline{A}_{Zm})/W_{m+1}(\hat\alpha^{(1)})$ to the $\alpha^{(0)\prime} Q_m$, where (i) $q^*_m(\cdot)$ is a chosen function of the dimension of $\psi$, (ii) in testing $\theta = 0$, we treat the $Q^*_m(\psi)$ as "fixed covariates" and (iii) we use generalized estimating equation (GEE) software available in S+ or SAS that regards $A_{P0}, \ldots, A_{PK}$ as correlated.

This test is a conservative $\alpha$-level test of the hypothesis $\psi = \psi_0$. A conservative 95% confidence interval, guaranteed to cover $\psi_0$ at least 95% of the time in large samples, is the set of $\psi$ for which the .05-level test of $\theta = 0$ fails to reject. The tests and interval are conservative because standard software programs do not adjust for the effect of estimating $\alpha^{(1)}$. Robins (1997b) describes a complete reparameterization of the distribution of $V$ with the direct-effect SNDM model $\gamma(y, \bar\ell_m, \bar a_{Pm}, \bar a_Z, \psi)$ as a component and describes how to estimate, with this reparameterization, the contrasts $F_{(g_P, \bar a_Z)}(y) - F_{(g_P^*, \bar a_Z^*)}(y)$.

Appendix

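Proof of (18):

We will show by induction that

$$\mathrm{pr}[H_m > y \mid \bar L_m, \bar A_m] = F_{g=(\bar A_{m-1}, 0)}(y \mid \bar L_m), \qquad (A.1)$$

which implies $A_m \perp\!\!\!\perp H_m \mid \bar L_m, \bar A_{m-1}$. Furthermore, $H$ is a deterministic function of $(H_m, \bar L_m, \bar A_{m-1})$, which proves (18).

Case 1: $m = K$:

$$\mathrm{pr}[H_K > y \mid \bar L_K, \bar A_K] = \mathrm{pr}[Y > \gamma^{-1}(y, \bar L_K, \bar A_K) \mid \bar L_K, \bar A_K] = F_{g=(\bar A_K)}[\gamma^{-1}(y, \bar L_K, \bar A_K) \mid \bar L_K] = F_{g=(\bar A_{K-1}, 0)}(y \mid \bar L_K).$$

Case 2: Assume (A.1) holds for $m$. If we can show it holds for $m - 1$, Eq. (18) is proved.

$$\begin{aligned}
\mathrm{pr}[H_{m-1} > y \mid \bar L_{m-1}, \bar A_{m-1}]
&= \int \{\mathrm{pr}[H_m > \gamma^{-1}(y, \bar L_{m-1}, \bar A_{m-1}) \mid \bar L_m, \bar A_m]\}\, dF[L_m, A_m \mid \bar L_{m-1}, \bar A_{m-1}] \\
&= \int F_{g=(\bar A_{m-1}, 0)}[\gamma^{-1}(y, \bar L_{m-1}, \bar A_{m-1}) \mid \bar L_m]\, dF(L_m, A_m \mid \bar L_{m-1}, \bar A_{m-1}) \\
&= F_{g=(\bar A_{m-1}, 0)}[\gamma^{-1}(y, \bar L_{m-1}, \bar A_{m-1}) \mid \bar L_{m-1}] \\
&= F_{g=(\bar A_{m-2}, 0)}(y \mid \bar L_{m-1}),
\end{aligned}$$

where the third to last equality is by the induction hypothesis and the second to last by the definition of $F_{g=\bar a}(y \mid \bar L_k)$.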

References

Pearl, J. (1995), "Causal diagrams for empirical research," Biometrika, 82, 669-690.

Pearl, J. and Robins, J.M. (1995), "Probabilistic evaluation of sequential plans from causal models with hidden variables," in Uncertainty in Artificial Intelligence: Proceedings of the Eleventh Conference on Artificial Intelligence, August 18-20, 1995, McGill University, Montreal, Quebec, Canada. San Francisco, CA: Morgan Kaufmann, pp. 444-453.

Pearl, J. and Verma, T. (1991), "A Theory of Inferred Causation," in Principles of Knowledge, Representation and Reasoning: Proceedings of the Second International Conference, eds. J.A. Allen, R. Fikes, and E. Sandewall, 441-452.

Robins, J.M. (1986), "A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect," Mathematical Modelling, 7, 1393-1512.

Robins, J.M. (1987), "Addendum to 'A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect'," Computers and Mathematics with Applications, 14, 923-945.

Robins, J.M. (1989), "The analysis of randomized and non-randomized AIDS treatment trials using a new approach to causal inference in longitudinal studies," in Health Service Research Methodology: A Focus on AIDS, eds. Sechrest, L., Freeman, H., Mulley, A., NCHSR, U.S. Public Health Service, 113-159.

Robins, J.M. (1992), "Estimation of the time-dependent accelerated failure time model in the presence of confounding factors," Biometrika, 79, 321-334.

Robins, J.M. (1993), "Analytic methods for estimating HIV-treatment and cofactor effects," in Methodological Issues in AIDS Mental Health Research, eds. Ostrow, D.G., and Kessler, R.C., NY: Plenum Press, 213-290.

Robins, J.M. (1994), "Correcting for non-compliance in randomized trials using structural nested mean models," Communications in Statistics, 23, 2379-2412.

Robins, J.M. (1995a), "Estimating the Causal Effect of a Time-varying Treatment on Survival using Structural Nested Failure Time Models," (to appear, Statistica Neerlandica).

Robins, J.M. (1995b), "Discussion of 'Causal Diagrams for empirical research' by J. Pearl," Biometrika, 82, 695-698.

Robins, J.M. (1997a), "Causal inference from complex longitudinal data," in Latent Variable Modeling and Applications to Causality. Lecture Notes in Statistics (120), M. Berkane, Editor. NY: Springer Verlag, 69-117.

Robins, J.M. (1997b), "Testing and estimation of direct effects by reparameterizing directed acyclic graphs using structural nested models," Technical Report, Department of Epidemiology, Harvard School of Public Health.

Rosenbaum, P.R. (1984), "Conditional permutation tests and the propensity score in observational studies," Journal of the American Statistical Association, 79, 565-574.

Spirtes, P., Glymour, C., and Scheines, R. (1993), Causation, Prediction, and Search.