Bayesian Set of Best Dynamic Treatment Regimes and Sample Size Determination for SMARTs with Binary Outcomes
William J. Artman, Ashkan Ertefaie, Kevin G. Lynch, James R. McKay
William J. Artman
Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Saunders Research Building, 265 Crittenden Blvd., Rochester, NY 14642, USA

Ashkan Ertefaie
Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Saunders Research Building, 265 Crittenden Blvd., Rochester, NY 14642, USA

Kevin G. Lynch
Center for Clinical Epidemiology and Biostatistics (CCEB) and Department of Psychiatry, University of Pennsylvania, 3535 Market Street, 5099, Philadelphia, PA 19104, USA

James R. McKay
Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, 3535 Market St., Suite 500, Philadelphia, PA 19104, USA
Abstract
One of the main goals of sequential, multiple assignment, randomized trials (SMARTs) is to find the most efficacious design-embedded dynamic treatment regimes. The analysis method known as multiple comparisons with the best (MCB) allows comparison between dynamic treatment regimes and identification of a set of optimal regimes in the frequentist setting for continuous outcomes, thereby directly addressing the main goal of a SMART. In this paper, we develop a Bayesian generalization of MCB for SMARTs with binary outcomes. Furthermore, we show how to choose the sample size so that the inferior embedded DTRs are screened out with a specified power. We compare log-odds between different DTRs using their exact distribution without relying on asymptotic normality in either the analysis or the power calculation. We conduct extensive simulation studies under two SMART designs and illustrate our method's application to the Adaptive Treatment for Alcohol and Cocaine Dependence (ENGAGE) trial.

Keywords: Dynamic treatment regimes, Sequential multiple assignment randomized trials, Bayesian, Binary outcomes, Multiple comparisons with the best, Sample size determination
Substance use disorders are a common yet debilitating group of conditions for which treatment response is highly variable (McKay, 2009; Kranzler & McKay, 2012; Black & Chung, 2014; Witkiewitz et al., 2015). While there exist interventions, physicians often must rely on clinical experience to choose subsequent therapies for individuals who have failed to respond to initial treatment. In order to personalize medicine, there is a need for longitudinal trial data as well as methodologies for comparing sequences of treatment decision rules.

Sequential, multiple assignment, randomized trial (SMART) designs are a type of longitudinal clinical trial specifically designed to determine the optimal sequence of decision rules, called dynamic treatment regimes (DTRs) (Lavori et al., 2000; Murphy, 2005; Lei et al., 2012; Rose et al., 2019; Murphy et al., 2001; Murphy, 2003; Robins, 2004; Nahum-Shani et al., 2012; Chakraborty & Moodie, 2013; Chakraborty & Murphy, 2014; Laber et al., 2014). In particular, there are DTRs embedded in the SMART by design; hence, SMARTs provide evidence for the regime tailored to a subject's response.

Standard multiple comparison approaches used to control type I errors result in a loss of statistical power. This is critically important when working with smaller data sets such as clinical trials. Ertefaie et al. (2015) introduced a frequentist semiparametric model for estimation of the embedded DTR outcomes as well as continuous-outcome multiple comparisons with the best (MCB) for comparing the embedded DTRs. MCB enables the construction of a set of optimal embedded DTRs which are statistically indistinguishable given the available data. MCB offers increased power while controlling the type I error rate so that the best embedded DTR is included with a specified probability (Hsu, 1981, 1984, 1996; Artman et al., 2018).
By requiring only L − 1 comparisons, where L is the number of embedded DTRs, rather than all L(L − 1)/2 pairwise comparisons, MCB yields greater power than other methods. However, applications of MCB have focused on continuous outcomes.

Binary outcomes arise frequently as primary outcomes in psychiatry and addiction clinical trials. The existing methods for estimation in SMARTs with binary outcomes do not permit complex comparisons between all of the embedded DTRs such as with MCB (Kidwell et al., 2018; Dziak et al., 2019). Moreover, the power analysis approaches for a SMART almost entirely focus on continuous outcomes, aiming to size a SMART for simple comparisons between two embedded DTRs or to power a SMART so that the best embedded DTR has the largest point estimate with a specified probability (Crivello et al., 2007a,b). More recently, Rose et al. (2019) proposed a method which relies on Q-learning for continuous outcomes to size a SMART. Kidwell et al. (2018) developed a sample size calculator for binary outcome SMARTs, but their method is restricted to comparisons between only two embedded DTRs. Furthermore, they assume asymptotic normality, which may not hold due to the small sample size of embedded treatment sequences in a SMART.

Yan et al. (2020) developed a frequentist sample size calculation method for pilot SMARTs. In particular, they size a SMART so that the mean outcome of a DTR is within "a margin of error"; in other words, so that confidence intervals are shorter than a pre-specified length. Advantages of their method include applicability to continuous, count, and binary outcomes. A limitation is that normality is assumed for all outcome summary statistics, including the log-OR. However, this may be unreasonable due to the small sample size in each branch of the SMART. This could be problematic especially in a pilot SMART, for which the sample size is small by design.
An advantage of our method is that it does not make such strong parametric assumptions, and it uses uninformative priors to avoid bias. An important similarity of their method to ours is taking into account uncertainty in sample size calculations rather than viewing the problem as one of estimation only. In addition, for our method, the best embedded DTR will be included with a given probability. In summary, our method makes comparisons between more than two embedded DTRs. Yan et al. (2020) emphasize that if there is no hypothesis testing in the pilot SMART, then standard power analysis is not relevant. Therefore, their method fills an important gap for pilot SMARTs. However, in this article, we are interested in drawing inference on the optimal embedded DTRs and sizing SMARTs for this purpose.

Ogbagaber et al. (2016) presented a method for power analysis based on an omnibus test as well as all pairwise comparisons for continuous outcomes. Artman et al. (2018) proposed a power analysis for SMARTs using the MCB approach. Specifically, the latter method computes the number of individuals to enroll in a SMART in order to achieve a specified power to exclude from the set of best those embedded DTRs inferior to the best by a given amount. This approach, however, relies on a normality assumption for the outcome and thus cannot always be used with binary outcomes.

In this article, we extend MCB to the Bayesian binary outcome setting to directly address current methods' limitations. We build upon the work in Artman et al. (2018) to develop a rigorous power analysis procedure. We simulate from the posterior of each parameter for each treatment sequence individually; these posteriors are independent. This circumvents the need for specifying the correlation matrix of the random embedded DTR outcomes, as required in the frequentist setting. Then, using Robins' G-computation formula for a dynamic regime (Robins, 1986), we represent each embedded DTR outcome as a weighted average of the independent treatment sequences.
We use the transformed draws in the MCB procedure to facilitate Bayesian inference. We leverage the method in Mandel & Betensky (2008) to extend the MCB methodology to use Monte Carlo draws in the construction of the set of best embedded DTRs. In particular, it enables the construction of simultaneous one-sided upper credible intervals for log-odds ratios, comparing each embedded DTR to the best embedded DTR when the outcome is binary. Our proposed approach obviates many of the challenges associated with the current methods of choice, namely:

(i) in contrast with the method of Artman et al. (2018), our Bayesian MCB method can be used directly with binary outcomes, and our power analysis requires fewer parameter specifications;

(ii) in contrast with the existing multiple comparison approaches, our approach performs fewer comparisons, which increases the statistical power; and

(iii) it is designed specifically for statistical inference for complex comparisons of more than two embedded DTRs.

In the Adaptive Treatment for Alcohol and Cocaine Dependence SMART (ENGAGE; Figure 1), individuals were initially randomized to two different treatments: motivational interviewing (MI) for intensive outpatient programs (IOP) and for patient's choice (PC). PC consisted of an intervention which the patient chooses. Patients who were non-engagers (did not attend any therapy sessions) at 2 weeks were enrolled. After 8 weeks, those who failed to attend any sessions during weeks 7 and 8 were classified as non-responders/non-engagers and were re-randomized to either MI-PC or No Further Care (NFC). Those who were responders were assigned to NFC. The original study analysis suggested that MI-IOP was significantly more effective than MI-PC. In this paper, we analyze the SMART to compare embedded DTRs. We restrict our attention in the study to individuals who did not engage by week 2, which yielded a sample size of 148.
One of the main goals of SMART designs is the identification of the best embedded DTRs for a primary outcome (Nahum-Shani et al., 2012). We construct a binary outcome: whether the log of the total number of drinking days plus cocaine use days was less than the 25th percentile of that quantity.
Figure 1: Structure of the ENGAGE trial.

Determination of optimal embedded DTRs would provide evidence-based recommendations for clinicians when treating individual patients with substance use disorders. The ENGAGE study aimed to determine the optimal treatment for non-responders (McKay et al., 2015; Van Horn et al., 2015). There were four embedded DTRs. Two of the embedded DTRs were as follows:

1. Start with MI+IOP. If the subject is engaged during the first 8 weeks, then at the 8-week point, offer NFC; if the subject is labeled as a non-engager at week 8 of follow-up, offer MI+PC.

2. Start with MI+IOP. If the subject is engaged during the first 8 weeks, then at the 8-week point, offer NFC; if the subject is labeled as a non-engager at week 8 of follow-up, offer NFC.

The other two embedded DTRs are similar (see Table 1). See McKay et al. (2015) and Van Horn et al. (2015) for more details about the SMART.
We focus on a SMART in which there are two stages where, at each stage, individuals are randomized to different treatment options. By design, stage-2 treatment is commonly tailored based on a patient's ongoing
Figure 2: Diagram of the general SMART.

response status, which generates the different embedded DTRs (see Figures 1 and 2). Let Y^(l) denote the binary potential outcome for an individual following embedded DTR l ∈ {1, ..., L}. Let Y_k denote the binary potential outcome for an individual following treatment sequence k ∈ {1, ..., K}. Let A ∈ {−1, +1} denote the stage-1 treatment assignment indicator. Let A_2^R ∈ {−1, +1} denote the stage-2 treatment assignment indicator for responders and A_2^NR ∈ {−1, +1} be the treatment assignment indicator for non-responders. Let S_i denote the observed stage-1 treatment response indicator. Let θ^(l) be the probability of response in the l-th embedded DTR, l = 1, ..., L, where we assume that the L-th embedded DTR is the best. Let θ_k denote the probability of response in the k-th treatment sequence, k = 1, ..., K. Let Z_i ∈ {1, ..., K} denote the treatment sequence indicator for subject i.

Then, the log-odds ratio for the l-th embedded DTR compared with the best embedded DTR is ζ^(l) = log(Odds^(l) / Odds^(L)) = log(Odds^(l)) − log(Odds^(L)), where Odds^(l) = θ^(l) / (1 − θ^(l)). We do not make any distributional assumptions about the log-odds or the log-odds ratio. Assuming that being equal to 1 is the desirable outcome, higher log-odds (and, equivalently, higher log-odds ratios) imply superiority of the corresponding embedded DTR. We assume observations Y_i^obs ~iid Bern(θ_{Z_i}), where Z_i is the treatment sequence followed by the i-th individual, and S_i ~iid Bern(λ_A) for i = 1, ..., n, where A is the stage-1 treatment. Our target parameter is the probability of response at the end of the trial (Y^(l) = 1) for a particular embedded DTR.
Table 1: The embedded DTRs of the ENGAGE trial.

1. A = +1, S = 1 or A = +1, S = 0, A_2^NR = +1: Receive MI-IOP; if response, receive NFC; if no response, switch to MI-PC.
2. A = +1, S = 1 or A = +1, S = 0, A_2^NR = −1: Receive MI-IOP; if response, receive NFC; if no response, receive NFC.
3. A = −1, S = 1 or A = −1, S = 0, A_2^NR = +1: Receive MI-PC; if response, receive NFC; if no response, switch to MI-PC.
4. A = −1, S = 1 or A = −1, S = 0, A_2^NR = −1: Receive MI-PC; if response, receive NFC; if no response, receive NFC.

Using Robins' G-computation formula for a dynamic regime,

Pr(Y^(l) = 1) = Pr(Y = 1 | A = a_l, S = 1, A_2^R = a_l) Pr(S = 1 | A = a_l) + Pr(Y = 1 | A = a_l, S = 0, A_2^NR = a_l) Pr(S = 0 | A = a_l),  l = 1, ..., L.  (1)

Thus, the mean outcome under each embedded DTR is a weighted average of the responder treatment sequence outcome and the non-responder treatment sequence outcome. This permits estimation and inference without using a marginal structural model. While baseline covariates could be included in (1), for the purpose of power analysis, we define the target parameter as a marginalized quantity. In our Bayesian procedure, we impose a non-informative prior on θ_k and λ_A, which results in beta posterior distributions. We then define the treatment sequence response probabilities θ_R = Pr(Y = 1 | A = a_l, S = 1, A_2^R = a_l) and θ_NR = Pr(Y = 1 | A = a_l, S = 0, A_2^NR = a_l). Lastly, the probability of response in stage 1 for those assigned to A is given by λ_A = Pr(S = 1 | A = a). We leverage (1) to transform each MCMC draw to the target parameter and, subsequently, into a log-odds ratio. By taking this simulation-based approach, we avoid making normality assumptions. At the m-th iteration of the MCMC, we draw θ_{R,m}, θ_{NR,m}, and λ_{A,m}. Hence, the m-th draw of the probability of response for the l-th embedded DTR is θ^(l)_m = θ_{R,m} λ_{A,m} + θ_{NR,m} (1 − λ_{A,m}). The m-th draw of the log-odds ratio is ζ^(l)_m = log OR^(l)_m = log(θ^(l)_m / (1 − θ^(l)_m)) − log(θ^(L)_m / (1 − θ^(L)_m)). Next, we specify the likelihoods.
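As a concrete sketch of the draw transformation just described, the following Python fragment combines independent conjugate Beta posterior draws (uniform Beta(1, 1) priors, as derived in the next section) through the G-computation identity (1). The function and variable names are ours, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(2023)

def edtr_posterior_draws(y_r, n_r, y_nr, n_nr, s, n_stage1, M=10_000):
    """Posterior draws of one embedded DTR's response probability.

    y_r/n_r:    successes / size of the responder treatment sequence,
    y_nr/n_nr:  successes / size of the non-responder treatment sequence,
    s/n_stage1: stage-1 responders / stage-1 arm size.
    Uniform Beta(1, 1) priors give independent Beta posteriors; each
    draw is combined via Equation (1):
        theta^(l) = theta_R * lambda_A + theta_NR * (1 - lambda_A).
    """
    theta_r = rng.beta(y_r + 1, n_r - y_r + 1, size=M)
    theta_nr = rng.beta(y_nr + 1, n_nr - y_nr + 1, size=M)
    lam = rng.beta(s + 1, n_stage1 - s + 1, size=M)
    return theta_r * lam + theta_nr * (1.0 - lam)

def log_odds(p):
    """log(p / (1 - p)), computed stably."""
    return np.log(p) - np.log1p(-p)

# Draws of the log-odds ratio of EDTR l against the (assumed best) EDTR L:
#   zeta_l = log_odds(draws_l) - log_odds(draws_L)
```

Because the transformation is applied draw by draw, no normality assumption on ζ^(l) is required at any point.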
The likelihood for the observed end-of-study outcomes is given by f(Y_1, ..., Y_n | θ_1, ..., θ_K) ∝ ∏_{i=1}^n θ_{Z_i}^{Y_i} (1 − θ_{Z_i})^{1 − Y_i}, where the treatment sequence indicator is Z_i ∈ {1, ..., K}. We then assume a non-informative uniform prior on θ_k, k ∈ {1, ..., K}, which yields the posterior of θ_k given by

f(θ_k | y_1, ..., y_n) ∼ Beta( Σ_{i: Z_i = k} Y_i + 1, Σ_{i: Z_i = k} 1 − Σ_{i: Z_i = k} Y_i + 1 ).

Furthermore, the posteriors of the stage-1 response probabilities are given by

f(λ_A | S_1, ..., S_n) = Beta( Σ_{i: A_i = A} S_i + 1, Σ_{i: A_i = A} 1 − Σ_{i: A_i = A} S_i + 1 ).

Table 2: The embedded DTRs of the general SMART.

1. A = +1, S = 1, A_2^R = +1 or A = +1, S = 0, A_2^NR = +1: Receive Trt. 1; if response, continue Trt. 1; if no response, switch to Trt. 4.
2. A = +1, S = 1, A_2^R = +1 or A = +1, S = 0, A_2^NR = −1: Receive Trt. 1; if response, continue Trt. 1; if no response, maintain Trt. 1 and add Trt. 4.
3. A = +1, S = 1, A_2^R = −1 or A = +1, S = 0, A_2^NR = +1: Receive Trt. 1; if response, add Trt. 3; if no response, switch to Trt. 4.
4. A = +1, S = 1, A_2^R = −1 or A = +1, S = 0, A_2^NR = −1: Receive Trt. 1; if response, add Trt. 3; if no response, maintain Trt. 1 and add Trt. 4.
5. A = −1, S = 1, A_2^R = +1 or A = −1, S = 0, A_2^NR = +1: Receive Trt. 2; if response, continue Trt. 2; if no response, switch to Trt. 4.
6. A = −1, S = 1, A_2^R = +1 or A = −1, S = 0, A_2^NR = −1: Receive Trt. 2; if response, continue Trt. 2; if no response, maintain Trt. 2 and add Trt. 4.
7. A = −1, S = 1, A_2^R = −1 or A = −1, S = 0, A_2^NR = +1: Receive Trt. 2; if response, add Trt. 3; if no response, switch to Trt. 4.
8. A = −1, S = 1, A_2^R = −1 or A = −1, S = 0, A_2^NR = −1: Receive Trt. 2; if response, add Trt. 3; if no response, maintain Trt. 2 and add Trt. 4.

In this section, we describe our MCB procedure, based on Mandel & Betensky (2008), to construct simultaneous 100(1 − α)% one-sided upper credible intervals using multivariate draws from an arbitrary distribution. Their method was originally designed to construct confidence intervals from draws obtained using the bootstrap. In Section A.1 of the Appendix, we show that the latter approach is also applicable to Monte Carlo simulation draws of the posterior. The following theorem forms the basis for the extension of MCB to the Bayesian setting.
Theorem 1.
Let ζ^(l)_m, l = 1, ..., L, m = 1, ..., M denote draws of the parameters ζ^(l) obtained from Monte Carlo simulation. Let r(m, l) denote the rank of ζ^(l)_m among the draws of ζ^(l), and assume no ties. Then, U^(l) = ζ^(l)_{(r_{1−α})}, l = 1, ..., L − 1, are simultaneous 1 − α level upper credible limits for ζ^(l), l = 1, ..., L − 1, where r_{1−α} is the (1 − α)-th quantile of max_l r(m, l) over the draws m = 1, ..., M.

The goal is to construct upper credible interval limits U^(l), l = 1, ..., L, which jointly have coverage 1 − α. Note that there is a one-to-one correspondence between a draw ζ^(l)_m and its rank r(m, l) in the absence of ties, where m is the m-th draw, m = 1, ..., M. In particular, ζ^(l)_{(r(m,l))} = ζ^(l)_m. Then, ζ^(l)_m is less than U^(l) for all l = 1, ..., L for 100(1 − α)% of the draws m = 1, ..., M if and only if ζ^(l)_{(r(m,l))} is less than U^(l) for all l = 1, ..., L for 100(1 − α)% of the draws. A sufficient condition is for r(m, l) to be less than some integer r_{1−α} for all l, for 100(1 − α)% of the r(m, l), l = 1, ..., L, m = 1, ..., M. Equivalently, the maximum rank across embedded DTRs, r(m) := max_l r(m, l), is less than r_{1−α} for 100(1 − α)% of the draws m = 1, ..., M. Let r_{1−α} equal the 1 − α quantile of r(1), ..., r(M). Then, letting U^(l) = ζ^(l)_{(r_{1−α})} for l = 1, ..., L completes the proof. See the Appendix for more details. The proof that the algorithm works assumes no ties. When there are ties in rank, one can use the minimum rank to obtain coverage near 100(1 − α)% for a sufficiently large number of draws (Mandel & Betensky, 2008).

Define B̂ = {EDTR(l) | U^(l) ≥ 0}, where U^(l) is the upper credible limit for the difference between the l-th embedded DTR's log-odds and the best embedded DTR's log-odds.
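The construction in Theorem 1, together with the definition of B̂, can be sketched in a few lines of Python (a minimal sketch; the function names are ours, and `zeta` holds the M × (L − 1) posterior draws of the log-odds ratios against the assumed-best EDTR, with ties assumed absent):

```python
import numpy as np

def mcb_upper_limits(zeta, alpha=0.05):
    """Simultaneous 100(1 - alpha)% one-sided upper credible limits
    via the max-rank construction of Theorem 1.

    zeta: array of shape (M, L-1) of Monte Carlo draws of the
    log-odds ratios zeta^(l). Returns one upper limit per column.
    """
    # r(m, l): rank of draw m within column l (1-based; no ties assumed)
    ranks = zeta.argsort(axis=0).argsort(axis=0) + 1
    r_max = ranks.max(axis=1)                      # r(m) = max_l r(m, l)
    r_1ma = int(np.quantile(r_max, 1 - alpha, method="higher"))
    return np.sort(zeta, axis=0)[r_1ma - 1, :]     # zeta^(l)_(r_{1-alpha})

def set_of_best(zeta, alpha=0.05):
    """Indices l with U^(l) >= 0 (the best EDTR is in B-hat by definition)."""
    return np.flatnonzero(mcb_upper_limits(zeta, alpha) >= 0.0)
```

Each simultaneous limit is at least as large as the corresponding marginal (1 − α) upper quantile; that extra width is what buys joint, rather than columnwise, coverage.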
Then, regardless of sample size, B̂ contains the true optimal embedded DTR with probability at least 1 − α by construction. It furthermore adjusts for multiplicity through the above procedure. We define B̂ to be the set of best embedded DTRs, as it contains embedded DTRs which are not statistically significantly inferior to the best embedded DTR. This is analogous to the MCB set constructed for continuous outcomes in Ertefaie et al. (2015). Larger sample sizes shrink B̂ so that inferior embedded DTRs are excluded. In the next section, we show how to choose the sample size to achieve this goal.

In the last section, we demonstrated how to construct a set of optimal embedded DTRs. In this section, we show how to size a SMART to obtain a set of embedded DTRs which do not differ in their efficacy by a clinically meaningful amount. In terms of the credible intervals, those which cover 0 correspond to embedded DTRs which are not statistically distinguishable from the optimal embedded DTR. Those that are strictly below 0 are inferior and are subsequently excluded from the set of best. We present how to perform power analysis for Bayesian MCB with a binary outcome.

Let η be the known inputs (response probabilities at each stage). We wish to determine the sample size such that

Pr( ⋂_{l: Δ_l ≥ Δ} {U^(l) < 0} | η ) = 1 − γ,

where 1 − γ is the power, Δ is the clinically meaningful difference from the best embedded DTR, and Δ_l is the log-odds ratio between the l-th embedded DTR and the best embedded DTR. Without loss of generality, assume that Y_new is a sufficient statistic for η.
Then,

Pr( ⋂_{l: Δ_l ≥ Δ} {U^(l) < 0} | η )
= ∫ Pr( ⋂_{l: Δ_l ≥ Δ} {U^(l) < 0} | Y_new, η ) p(Y_new | η) dY_new
= ∫∫ Pr( ⋂_{l: Δ_l ≥ Δ} {U^(l) < 0} | Y_new, η, φ^(1), ..., φ^(M) ) p(φ^(1), ..., φ^(M) | Y_new) p(Y_new | η) dφ dY_new,

where φ^(1), ..., φ^(M) are independent of η given Y_new by sufficiency and are draws of the parameters for which we are constructing upper credible intervals. Therefore,

∫∫ Pr( ⋂_{l: Δ_l ≥ Δ} {U^(l) < 0} | Y_new, η, φ^(1), ..., φ^(M) ) p(φ^(1), ..., φ^(M) | Y_new) p(Y_new | η) dφ dY_new
= ∫∫ Pr( ⋂_{l: Δ_l ≥ Δ} {U^(l) < 0} | φ^(1), ..., φ^(M) ) p(φ^(1), ..., φ^(M) | Y_new) p(Y_new | η) dφ dY_new
= ∫∫ I( ⋂_{l: Δ_l ≥ Δ} {U^(l)(φ^(1), ..., φ^(M)) < 0} ) p(φ^(1), ..., φ^(M) | Y_new) p(Y_new | η) dφ dY_new
≈ (1/I) Σ_{i=1}^{I} ∫ I( ⋂_{l: Δ_l ≥ Δ} {U^(l)(φ^(1),i, ..., φ^(M),i) < 0} ) p(φ^(1),i, ..., φ^(M),i | Y_new,i) dφ,

where U^(l)(·, ..., ·) maps the posterior draws to the l-th simultaneous credible limit as determined by the procedure in the previous section, and Y_new,i ∼ p(Y_new | η) for all i = 1, ..., I. In the above, we have used the Law of Large Numbers. Similarly,

(1/I) Σ_{i=1}^{I} ∫ I( ⋂_{l: Δ_l ≥ Δ} {U^(l)(φ^(1),i, ..., φ^(M),i) < 0} ) p(φ^(1),i, ..., φ^(M),i | Y_new,i) dφ
≈ (1/I) Σ_{i=1}^{I} (1/J) Σ_{j=1}^{J} I( ⋂_{l: Δ_l ≥ Δ} {U^(l)(φ^(1),ij, ..., φ^(M),ij) < 0} ),

where φ^(m),ij ∼ p(φ | Y_new,i) and Y_new,i ∼ p(Y_new | η). In summary,

Pr( ⋂_{l: Δ_l ≥ Δ} {U^(l) < 0} | η ) ≈ (1/I) Σ_{i=1}^{I} (1/J) Σ_{j=1}^{J} I( ⋂_{l: Δ_l ≥ Δ} {U^(l)(φ^(1),ij, ..., φ^(M),ij) < 0} ).
Therefore, we may choose a sample size that yields 100(1 − γ)% power by the following procedure. Specify a grid of sample sizes n ∈ {n_min, ..., n_max} and, for each n in the grid, compute the statistical power using steps 1–4:

1. Draw Y_new,i ∼ p(Y_new | η), i = 1, ..., I, where the input parameters η are given by the stage-1 treatment response probabilities and the end-of-study response probabilities.
2. Draw φ^(m),ij ∼ p(φ | Y_new,i) for all i = 1, ..., I, j = 1, ..., J, and m = 1, ..., M.
3. Compute U^(l)(φ^(1),ij, ..., φ^(M),ij), l = 1, ..., L.
4. Iterate steps 2–3 M times (e.g., M = 500), and repeat until n is sufficiently large so that

(1/I) Σ_{i=1}^{I} (1/J) Σ_{j=1}^{J} I( ⋂_{l: Δ_l ≥ Δ} {U^(l)(φ^(1),ij, ..., φ^(M),ij) < 0} ) ≥ 1 − γ.

See Figure 1 for a flow chart of the simulation Design 1 SMART. Design 1 includes 6 embedded treatment sequences and 4 embedded DTRs. People who respond to their stage-1 treatment continue on the same treatment, whereas non-responders are re-randomized to one of two rescue treatments. The following is the simulation scheme:

1. Stage-1 treatment assignment: A ∈ {−1, +1} with probability 0.5 each.
2. Stage-1 response: S ∼ Bern(λ_A), A ∈ {−1, +1}.
3. Stage-2 treatment for non-responders: A_2^NR ∈ {−1, +1} with probability 0.5 each.
4. End-of-study outcome: Y_{Z_i} ∼ Bern(π_{Z_i}), i = 1, ..., n.

See Figure 2 for a flow chart of the simulation general SMART. The general SMART has 8 embedded treatment sequences and 8 embedded DTRs.

Table 3: Bias and SD of the Bayesian procedure and the MSM for Design 1 and the general SMART at n = 100 and n = 400.

We fit the marginal structural model m(β, A) = β_0 + β_1 A + β_2 A_2^R + β_3 A_2^NR + β_4 A A_2^R + β_5 A A_2^NR (Kidwell et al., 2018; Nahum-Shani et al., 2012; Almirall et al., 2014). The following is the simulation scheme for the general SMART:

1. Stage-1 treatment assignment: A ∈ {−1, +1} with probability 0.5 each.
2. Stage-1 response: S ∼ Bern(λ_A), A ∈ {−1, +1}.
3. Stage-2 treatment for responders to stage-1 treatment (S = 1): A_2^R ∈ {−1, +1} with probability 0.5 each.
4. Stage-2 treatment for non-responders (S = 0): A_2^NR ∈ {−1, +1} with probability 0.5 each.
5. End-of-study outcome: Y_i ∼ Bern(π_{Z_i}), i = 1, ..., n.

One could incorporate covariates in either simulation scheme, but this would impose modeling assumptions which may lead to misspecification if, for example, probit or logistic regression is employed. Furthermore, including covariates would complicate the power analysis procedure as well as the interpretation of MCB. Hence, we directly simulate the variables based on the probabilities of response.
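The power-analysis steps 1–4 applied to the Design-1 simulation scheme can be condensed into a self-contained sketch. For brevity it uses a single inner repetition (J = 1), takes the true best EDTR from the inputs η as in the power criterion above, and all function names are ours:

```python
import numpy as np

rng = np.random.default_rng(7)

def log_odds(p):
    """log(p / (1 - p))."""
    return np.log(p) - np.log1p(-p)

def mcb_upper_limits(zeta, alpha=0.05):
    """Simultaneous one-sided upper credible limits (max-rank rule)."""
    ranks = zeta.argsort(axis=0).argsort(axis=0) + 1
    r = int(np.quantile(ranks.max(axis=1), 1 - alpha, method="higher"))
    return np.sort(zeta, axis=0)[r - 1, :]

def design1_power(n, lam, theta, delta_min, I=100, M=1000, alpha=0.05):
    """Monte Carlo power that every EDTR inferior to the best by at least
    delta_min (log-odds-ratio scale) is excluded from the set of best,
    for a Design-1 SMART (4 EDTRs, 6 treatment sequences).

    lam:   {+1: ..., -1: ...}              stage-1 response probabilities
    theta: {(a, 1): ..., (a, 0, a2): ...}  sequence response probabilities
    """
    edtrs = [(+1, +1), (+1, -1), (-1, +1), (-1, -1)]   # (A, A_2^NR)
    true_p = np.array([theta[(a, 1)] * lam[a]
                       + theta[(a, 0, a2)] * (1.0 - lam[a])
                       for a, a2 in edtrs])            # Equation (1)
    best = int(true_p.argmax())
    delta = log_odds(true_p[best]) - log_odds(true_p)  # Delta_l >= 0
    others = [l for l in range(4) if l != best]

    hits = 0
    for _ in range(I):
        # Step 1: simulate one SMART data set of size n from eta.
        A = rng.choice([-1, +1], size=n)
        S = rng.random(n) < np.where(A == 1, lam[+1], lam[-1])
        A2 = rng.choice([-1, +1], size=n)              # used only when S == 0
        succ = {k: 0 for k in theta}
        tot = {k: 0 for k in theta}
        for a, s, a2 in zip(A, S, A2):
            key = (a, 1) if s else (a, 0, a2)
            succ[key] += rng.random() < theta[key]
            tot[key] += 1
        # Step 2: independent Beta posterior draws for every parameter.
        lam_d = {a: rng.beta(S[A == a].sum() + 1,
                             (A == a).sum() - S[A == a].sum() + 1, size=M)
                 for a in (-1, +1)}
        th_d = {k: rng.beta(succ[k] + 1, tot[k] - succ[k] + 1, size=M)
                for k in theta}
        # G-computation draw transformation and log-odds ratios vs. best.
        p = np.column_stack([th_d[(a, 1)] * lam_d[a]
                             + th_d[(a, 0, a2)] * (1.0 - lam_d[a])
                             for a, a2 in edtrs])
        zeta = np.column_stack([log_odds(p[:, l]) - log_odds(p[:, best])
                                for l in others])
        # Steps 3-4: a success means all truly inferior EDTRs excluded.
        U = mcb_upper_limits(zeta, alpha)
        hits += all(U[j] < 0 for j, l in enumerate(others)
                    if delta[l] >= delta_min)
    return hits / I
```

Sweeping `design1_power` over a grid of n and keeping the smallest n whose estimate reaches 1 − γ reproduces the grid search in the procedure above.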
Figure 3: Power vs. sample size plots for the SMART simulations with 4 and 8 embedded DTRs, respectively. The predicted power is close to the empirical power. The black horizontal line corresponds to 80% power.
Table 3 shows the bias and SD for our Bayesian estimation approach and for a frequentist approach, weighted and replicated logistic regression with the marginal structural model (MSM) m(β, A) = β_0 + β_1 A + β_2 A_2^R + β_3 A_2^NR + β_4 A A_2^R + β_5 A A_2^NR. We see that the bias is similar between the Bayesian procedure and the MSM. The Bayesian procedure has slightly smaller SD.

To compute the empirical power, we simulated 1000 datasets over a grid of sample sizes, 150, ..., 500, and computed U^(l)_i, l = 1, ..., L, for each dataset and each sample size. We then computed the empirical power as

Power ≈ (1/1000) Σ_{i=1}^{1000} I( ⋂_{l: Δ_l ≥ Δ_min} {U^(l)_i < 0} ), where Δ_l = log(Odds_L / Odds_l),

and Δ_min was set to 0.61 for Design 1, under which two of the four embedded DTRs should be excluded from the set of best. For the general SMART, Δ_min was chosen so that five of the eight embedded DTRs should be excluded from the set of best. The power plots (Figure 3) demonstrate the accuracy of the predicted power in estimating the required sample size.

Figure 4: Real ENGAGE SMART MCB. Upper one-sided credible intervals for the difference in log-OR between each embedded DTR (EDTR) and the best embedded DTR. Intervals covering 0 indicate that the corresponding embedded DTR is not statistically significantly inferior to the best embedded DTR. Being below zero indicates inferiority compared with the best.
In this section, we apply our method to the Adaptive Treatment for Alcohol and Cocaine Dependence (ENGAGE) SMART trial to construct a set of optimal embedded DTRs for a binary outcome. We dichotomize the log of the sum of the total number of days drinking plus the total number of days using cocaine plus a small constant, 0.5, and consider a positive response to be having the outcome less than the 25th percentile. We see that the optimal embedded DTRs are embedded DTRs 1 and 2. The one with the lowest point estimate is embedded DTR 2. This suggests that MI-IOP is superior to patients choosing their own care. Furthermore, although the difference is not statistically significant, stage-2 NFC has a greater point estimate compared to MI-PC. The probability of response was also estimated for each of the four embedded DTRs.
Figure 5: Power for the ENGAGE SMART to exclude all embedded DTRs except the third (MI-PC stage 1, MI-PC stage 2). The actual power is 57% with 350 subjects.

The proposed MCB analysis demonstrates that the embedded DTRs for which there is no further care are not statistically distinguishable from the embedded DTRs for which patient choice is offered in stage 2. Therefore, clinicians may choose either embedded DTR from the set of best, considering other factors such as cost, treatment burden, and patient preference.
The input parameters are the final response rates for each embedded treatment sequence, θ_k for k = 1, ..., K, and the stage-1 treatment response probabilities, λ_A for A ∈ {−1, +1}. For the treatment sequences in which patients responded to the first-stage treatment (probability λ_A) and continued the initial treatment, the response rate from non-SMART studies may be used. For sequences involving two sequential treatments, the response rate of the second-stage treatment will yield a conservative estimate of the power, so in the absence of data on a particular sequence of treatments, one may choose the response rate of the second-stage treatment obtained from the existing literature. Furthermore, one may equate different branch response probabilities, essentially making the assumption of no interaction between the stage-1 treatment and the response rate for a particular stage-2 treatment. This may be reasonable for a SMART in which both branches have the same treatment options in stage 2. As an alternative, one may conduct a pilot SMART to estimate the response rates (Artman et al., 2018; Rose et al., 2019).

Discussion
An important goal of SMARTs is the determination of the best embedded DTRs. Subjects with substance use disorders are heterogeneous in their response to treatment. The optimal embedded DTRs for the ENGAGE study were those which consisted of the intensive outpatient program. Previous work has focused on inference for simple comparisons, such as those between two embedded DTRs. Furthermore, methods which are applicable to more complex comparisons, such as determination of the optimal embedded DTR while taking into account uncertainty, have focused on continuous outcomes in the frequentist setting. We made two main contributions. First, we extended existing methodologies from continuous to binary outcomes and from the frequentist to the Bayesian setting. In particular, we applied the algorithm in Mandel & Betensky (2008) to MCB to construct a set of optimal embedded DTRs. Second, we provided a means for sample size calculations without the need for knowledge of the covariance matrix of log-odds ratios and without normality assumptions.

We proved the validity of our method and demonstrated its application on real data, the ENGAGE SMART. Recently, there has been work to develop sample size formulas based on Q-learning for a continuous outcome (Rose et al., 2019). It would be of interest to extend the proposed Bayesian MCB framework to Q-learning with binary outcomes. Furthermore, it would be valuable to extend MCB to constructing a set of best for longitudinal or zero-inflated outcomes. It would be straightforward to extend the Bayesian approach to encompass continuous outcomes.
References
Almirall, D., Nahum-Shani, I., Sherwood, N.E., & Murphy, S.A. 2014. Introduction to SMART designs for the development of adaptive interventions: with application to weight loss research. Translational Behavioral Medicine, (3), 260–274.

Artman, W.J., Nahum-Shani, I., Wu, T., McKay, J.R., & Ertefaie, A. 2018. Power analysis in a SMART design: sample size estimation for determining the best embedded dynamic treatment regime. Biostatistics.

Black, J.J., & Chung, T. 2014. Mechanisms of change in adolescent substance use treatment: How does treatment work? Substance Abuse, (4), 344–351.

Chakraborty, B., & Moodie, E.E. 2013. Statistical Methods for Dynamic Treatment Regimes. Springer, New York.

Chakraborty, B., & Murphy, S.A. 2014. Dynamic treatment regimes. Annual Review of Statistics and Its Application, (1), 447–464.

Crivello, A.I., Levy, J.A., & Murphy, S.A. 2007a. Evaluation of Sample Size Formulae for Developing Adaptive Treatment Strategies Using a SMART Design. Tech. rept. No. 07-81. University Park, PA: The Pennsylvania State University, The Methodology Center.

Crivello, A.I., Levy, J.A., & Murphy, S.A. 2007b. Statistical Methodology for a SMART Design in the Development of Adaptive Treatment Strategies. Tech. rept. No. 07-82. University Park, PA: The Pennsylvania State University, The Methodology Center.

Dziak, J.J., Yap, J.R.T., Almirall, D., McKay, J.R., Lynch, K.G., & Nahum-Shani, I. 2019. A Data Analysis Method for Using Longitudinal Binary Outcome Data from a SMART to Compare Adaptive Interventions. Multivariate Behavioral Research, 1–24.

Ertefaie, A., Wu, T., Lynch, K.G., & Nahum-Shani, I. 2015. Identifying a set that contains the best dynamic treatment regimes. Biostatistics, (1), 135–148.

Hsu, J.C. 1981. Simultaneous confidence intervals for all distances from the "best". The Annals of Statistics, (5), 1026–1034.

Hsu, J.C. 1984. Constrained simultaneous confidence intervals for multiple comparisons with the best. The Annals of Statistics, (3), 1136–1144.

Hsu, J.C. 1996. Multiple Comparisons: Theory and Methods. CRC Press, London.

Kidwell, K.M., Seewald, N.J., Tran, Q., Kasari, C., & Almirall, D. 2018. Design and analysis considerations for comparing dynamic treatment regimens with binary outcomes from sequential multiple assignment randomized trials. Journal of Applied Statistics, (9), 1628–1651.

Kranzler, H.R., & McKay, J.R. 2012. Personalized treatment of alcohol dependence. Current Psychiatry Reports, (5), 486–493.

Laber, E.B., Lizotte, D.J., Qian, M., Pelham, W.E., & Murphy, S.A. 2014. Dynamic treatment regimes: Technical challenges and applications. Electronic Journal of Statistics, (1), 1225–1272.

Lavori, P.W., Dawson, R., & Rush, A.J. 2000. Flexible treatment strategies in chronic disease: clinical and research implications. Biological Psychiatry, (6), 605–614.

Lei, H., Nahum-Shani, I., Lynch, K.G., Oslin, D., & Murphy, S.A. 2012. A SMART design for building individualized treatment sequences. Annual Review of Clinical Psychology, (1), 21–48.

Mandel, M., & Betensky, R.A. 2008. Simultaneous confidence intervals based on the percentile bootstrap approach. Computational Statistics & Data Analysis, (4), 2158–2165.

McKay, J.R. 2009. Continuing care research: What we have learned and where we are going. Journal of Substance Abuse Treatment, (2), 131–145.

McKay, J.R., Drapkin, M.L., Van Horn, D.H.A., Lynch, K.G., Oslin, D.W., DePhilippis, D., Ivey, M., & Cacciola, J.S. 2015. Effect of patient choice in an adaptive sequential randomization trial of treatment for alcohol and cocaine dependence. Journal of Consulting and Clinical Psychology, (6), 1021.

Murphy, S.A. 2003. Optimal dynamic treatment regimes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), (2), 331–355.

Murphy, S.A. 2005. An experimental design for the development of adaptive treatment strategies. Statistics in Medicine, (10), 1455–1481.

Murphy, S.A., van der Laan, M.J., Robins, J.M., & Conduct Problems Prevention Research Group. 2001. Marginal mean models for dynamic regimes. Journal of the American Statistical Association, (456), 1410–1423.

Nahum-Shani, I., Qian, M., Almirall, D., Pelham, W.E., Gnagy, B., Fabiano, G.A., Waxmonsky, J.G., Yu, J., & Murphy, S.A. 2012. Experimental design and primary data analysis methods for comparing adaptive interventions. Psychological Methods, (4), 457–477.

Ogbagaber, S.B., Karp, J., & Wahed, A.S. 2016. Design of sequentially randomized trials for testing adaptive treatment strategies. Statistics in Medicine, (6), 840–858.

Robins, J. 1986. A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect. Mathematical Modelling, (9–12), 1393–1512.

Robins, J.M. 2004. Optimal structural nested models for optimal sequential decisions. Pages 189–326 of: Proceedings of the Second Seattle Symposium in Biostatistics. Springer.

Rose, E.J., Laber, E.B., Davidian, M., Tsiatis, A.A., Zhao, Y.Q., & Kosorok, M.R. 2019. Sample Size Calculations for SMARTs. arXiv preprint arXiv:1906.06646.

Van Horn, D.H.A., Drapkin, M., Lynch, K.G., Rennert, L., Goodman, J.D., Thomas, T., Ivey, M., & McKay, J.R. 2015. Treatment choices and subsequent attendance by substance-dependent patients who disengage from intensive outpatient treatment. Addiction Research & Theory, (5), 391–403.

Witkiewitz, K., Finney, J.W., Harris, A.H.S., Kivlahan, D.R., & Kranzler, H.R. 2015. Recommendations for the design and analysis of treatment trials for alcohol use disorders. Alcoholism: Clinical and Experimental Research, (9), 1557–1570.

Yan, X., Ghosh, P., & Chakraborty, B. 2020. Sample size calculation based on precision for pilot sequential multiple assignment randomized trial (SMART). Biometrical Journal.
Proofs
A.1 Robins' G-computation formula
\begin{align*}
\Pr(Y^{(l)} = 1) &= \Pr(Y = 1 \mid A_1, \mathrm{EDTR}^{(l)}) \\
&= \Pr(Y = 1 \mid A_1, S = 1, \mathrm{EDTR}^{(l)}) \Pr(S = 1 \mid A_1, \mathrm{EDTR}^{(l)}) \\
&\quad + \Pr(Y = 1 \mid A_1, S = 0, \mathrm{EDTR}^{(l)}) \Pr(S = 0 \mid A_1, \mathrm{EDTR}^{(l)}) \\
&= \Pr(Y = 1 \mid A_1, S = 1) \Pr(S = 1 \mid A_1) + \Pr(Y = 1 \mid A_1, S = 0, A_2^{NR}) \Pr(S = 0 \mid A_1).
\end{align*}

A.2 Proof of credible interval coverage
We wish to construct simultaneous $100(1-\alpha)\%$ one-sided upper credible intervals for $\zeta^{(l)}$, $l = 1, \ldots, L$. Denote the upper limit for the $l$th embedded DTR by $U^{(l)}$. Then, $U^{(l)}$ satisfies
$$
\Pr\left( \bigcap_{l=1}^{L} \left\{ \zeta^{(l)} \leq U^{(l)} \right\} \right) = 1 - \alpha.
$$
Let $\zeta^{(l)}_{(r,l)}$ denote the $r$th order statistic of the $M$ posterior draws for the $l$th embedded DTR, and let $r(m,l)$ denote the rank of the $m$th draw $\zeta^{(l)}_m$ among them, so that $\zeta^{(l)}_m = \zeta^{(l)}_{(r(m,l),l)}$. Approximating the posterior probability by the empirical distribution of the draws, the requirement above is equivalent to
$$
M^{-1} \sum_{m=1}^{M} I\left( \bigcap_{l=1}^{L} \left\{ \zeta^{(l)}_m \leq U^{(l)} \right\} \right)
= M^{-1} \sum_{m=1}^{M} I\left( \bigcap_{l=1}^{L} \left\{ \zeta^{(l)}_{(r(m,l),l)} \leq U^{(l)} \right\} \right)
\to 1 - \alpha
$$
as $M \to \infty$. Let $U^{(l)} = \zeta^{(l)}_{(r^{1-\alpha},l)}$, where $r^{1-\alpha}$ is the $1-\alpha$ quantile of $r^{(1)}, \ldots, r^{(M)}$ and $r^{(m)} = \max_{l} r(m,l)$. Then,
\begin{align*}
M^{-1} \sum_{m=1}^{M} I\left( \bigcap_{l=1}^{L} \left\{ \zeta^{(l)}_{(r(m,l),l)} \leq U^{(l)} \right\} \right)
&= M^{-1} \sum_{m=1}^{M} I\left( \bigcap_{l=1}^{L} \left\{ \zeta^{(l)}_{(r(m,l),l)} \leq \zeta^{(l)}_{(r^{1-\alpha},l)} \right\} \right) \\
&= M^{-1} \sum_{m=1}^{M} I\left( \bigcap_{l=1}^{L} \left\{ r(m,l) \leq r^{1-\alpha} \right\} \right) \\
&= M^{-1} \sum_{m=1}^{M} I\left( \max_{l} r(m,l) \leq r^{1-\alpha} \right) \\
&= M^{-1} \sum_{m=1}^{M} I\left( r^{(m)} \leq r^{1-\alpha} \right) \to 1 - \alpha.
\end{align*}
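The rank-based construction above can be sketched numerically. The following is a minimal Python illustration; the toy correlated posterior and the function name are hypothetical, and the draws stand in for MCMC output stored as an $M \times L$ matrix:

```python
import numpy as np

def simultaneous_upper_limits(zeta, alpha=0.05):
    """Rank-based simultaneous one-sided upper credible limits in the
    style of Mandel & Betensky (2008).

    zeta : (M, L) array of posterior draws, one column per embedded DTR.
    Returns a length-L array U with Pr(zeta^(l) <= U^(l) for all l)
    approximately (and at least) 1 - alpha over the draws.
    """
    M, _ = zeta.shape
    # r(m, l): rank of draw m within column l (1, ..., M)
    ranks = zeta.argsort(axis=0).argsort(axis=0) + 1
    # r^(m) = max_l r(m, l); take its empirical (1 - alpha) quantile
    r_max = ranks.max(axis=1)
    r_star = int(np.sort(r_max)[int(np.ceil((1 - alpha) * M)) - 1])
    # U^(l) is the r_star-th order statistic of column l
    return np.sort(zeta, axis=0)[r_star - 1, :]

rng = np.random.default_rng(0)
# Toy correlated "posterior" for L = 3 embedded DTR contrasts
draws = rng.multivariate_normal(np.zeros(3),
                                0.5 * np.eye(3) + 0.5, size=5000)
U = simultaneous_upper_limits(draws, alpha=0.05)
coverage = np.mean((draws <= U).all(axis=1))
print(U, coverage)
```

By construction, a draw falls below all $L$ limits exactly when its maximum rank is at most $r^{1-\alpha}$, so the joint empirical coverage is at least $1-\alpha$.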