[PDF] A Survival Mediation Model with Bayesian Model Averaging

Abstract

Determining the extent to which a patient is benefiting from cancer therapy is challenging. Criteria for quantifying the extent of "tumor response" observed within a few cycles of treatment have been established for various types of solid as well as hematologic malignancies. These measures comprise the primary endpoints of phase II trials. Regulatory approvals of new cancer therapies, however, are usually contingent upon the demonstration of superior overall survival with randomized evidence acquired with a phase III trial comparing the novel therapy to an appropriate standard of care treatment. With nearly two thirds of phase III oncology trials failing to achieve statistically significant results, researchers continue to refine and propose new surrogate endpoints. This article presents a Bayesian framework for studying relationships among treatment, patient subgroups, tumor response and survival. Combining classical components of mediation analysis with Bayesian model averaging (BMA), the methodology is robust to model mis-specification among various possible relationships among the observable entities. Posterior inference is demonstrated via application to a randomized controlled phase III trial in metastatic colorectal cancer. Moreover, the article details posterior predictive distributions of survival and statistical metrics for quantifying the extent of direct and indirect, or tumor response mediated, treatment effects.

Full PDF

AA Survival Mediation Model with Bayesian ModelAveraging

JIE ZHOU , XUN JIANG , H. AMY XIA , PENG WEI , ∗ ,BRIAN P. HOBBS , ∗∗ Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio, U.S.A. Center for Design and Analysis, Amgen, Thousand Oaks, California, U.S.A. Department of Biostatistics,The University of Texas MD Anderson Cancer Center, Houston, Texas, U.S.A. Dell Medical School,The University of Texas at Austin, Austin, Texas, U.S.A.

Abstract

Determining the extent to which a patient is beneﬁting from cancer therapyis challenging. Criteria for quantifying the extent of “tumor response” observedwithin a few cycles of treatment have been established for various types of solidas well as hematologic malignancies. These measures comprise the primary end-points of phase II trials. Regulatory approvals of new cancer therapies, how-ever, are usually contingent upon the demonstration of superior overall survivalwith randomized evidence acquired with a phase III trial comparing the noveltherapy to an appropriate standard of care treatment. With nearly two thirdsof phase III oncology trials failing to achieve statistically signiﬁcant results, re-searchers continue to reﬁne and propose new surrogate endpoints. This articlepresents a Bayesian framework for studying relationships among treatment, pa-tient subgroups, tumor response and survival. Combining classical componentsof mediation analysis with Bayesian model averaging (BMA), the methodologyis robust to model mis-speciﬁcation among various possible relationships amongthe observable entities. Posterior inference is demonstrated via application to arandomized controlled phase III trial in metastatic colorectal cancer. Moreover,the article details posterior predictive distributions of survival and statistical met-rics for quantifying the extent of direct and indirect, or tumor response mediated,treatment eﬀects.

Keywords:Bayesian Model Averaging; Mediation analysis; On-cology; Surrogate markers a r X i v : . [ s t a t . M E ] S e p Introduction

The drug development paradigm for oncology was founded on a sequence of phaseswith trials devised to elucidate various properties of novel agents and compare them tocurrent standard-of-care therapies. Dose selection takes place in phase I with a smalltrial ( <

50 patients), often with a dose escalation trial designed to estimate rates ofdose-limiting toxicity and identify an appropriate treatment dose. This is followed byone or more moderately sized phase II trials (50 to 200 patients) devised to estimatethe preliminary clinical activity of the novel treatment using endpoints that emphasizemorphological and/or pathological changes to the primary disease site usually observedwithin a few cycles following treatment (Eisenhauer et al., 2009; Hallek et al., 2018).Given success in phase II, a randomized phase III trial is conducted, often to interrogatewhether a survival advantage may be attributable to the novel therapy when compared totherapies used in routine clinical practice. Traditionally required for regulatory approvalof new therapies, phase III trials often enroll hundreds of patients and span multipleclinical sites, and thus require substantially more investment in resources and time.Implicit to this paradigm is the assumption that failure in phase II connotes failurein phase III. Following a successful phase II trial, however, success in phase III remainsuncertain in most oncology settings. In fact, Hay et al. (2014) assert that only 28.3% ofthe oncology Phase II trials successfully advanced to Phase III, which is lower than therate (34.8%) among non-oncology phase II studies. Success rates among phase III trials(as deﬁned by any FDA regulatory approval) are also lower for oncology trials (37%)when compared to non-oncology drugs (54%). Thus, phase II trials are limited withrespect to the extent to which their results determine the eventual regulatory success ofputatively promising new cancer therapies. Explanations for the lack of predictive powerfrom phase II to subsequent success in phase III remains a relevant issue in oncology. It isperhaps elevated in its importance with the continually evolving regulatory landscapes.Changes in recent years at the US Food and Drug Administration (FDA) have yieldedpathways for Breakthrough Therapy designation, established by the FDA Safety andInnovation Act (US Food and Drug Administration., a,b) as well as a pathway fortissue-agnostic approvals spanning multiple previously disparate indications (Pestanaet al., 2020). Moreover, multiple stakeholders have promoted innovations and eﬃciencyin oncologic drug development with master protocol (Beckman et al., 2016; Woodcockand LaVange, 2017; Wages et al., 2017; Hobbs et al., 2018; Kaizer et al., 2019) andseamless designs (Prowell et al., 2016; Hobbs et al., 2019), as well as the integration ofreal-world evidence (Griﬃth et al., 2019; Khozin et al., 2019).If phase II trials are to become better harbingers of subsequent regulatory successin oncology, trialists should prioritize endpoints that are established surrogate markers2f survival (Buyse et al., 2010). Moreover, statistical criteria for trial success shouldbe informed by mediation models applied to parse the indirect eﬀects of competingtherapies with respect to potential surrogate endpoints from direct eﬀects of treatment.Mediation analysis methodology was proposed to decompose the eﬀects of interven-tions into direct and indirect eﬀects (Baron and Kenny, 1986). In the context of onco-logic drug development, the indirect eﬀect deﬁnes the extent of survival beneﬁt that isachieved from localized reductions in tumor burden (e.g. shrinking a solid tumor to acertain extent), while direct treatment eﬀects characterize the extent of survival beneﬁtattributable to all other factors not directly measured by the surrogate tumor responseendpoint. Statistical models for mediation analysis provide criteria for measuring theextent to which any measure of tumor response yields a reliable surrogate for survival,which is determined by the magnitude of indirect eﬀect. A diagram describing theserelationships can be found in Figure S1 in the supplementary materials.Several authors have proposed statistical models for mediation analysis and/or test-ing for the mediation eﬀect from the frequentist paradigms (Fairchild and MacKinnon,2009; MacKinnon et al., 2002; Zhang et al., 2009; Mallinckrodt et al., 2006; Ho et al.,2001). A few researchers have developed Bayesian models for inference of mediationeﬀects (Yuan and MacKinnon, 2009; Wang and Preacher, 2015; Nuijten et al., 2015).All models previously proposed have assumed that both the mediator and patientoutcome were continuous variables. In this context, mediation eﬀects can be derivedon the basis of linear and Gaussian formulations. Mediation analysis with survivaloutcomes have been less common. VanderWeele (2011) described decompositions ondiﬀerent outcome scales and gave formulae for calculating the mediation eﬀects underadditive hazard, accelerated failure time, and rare-outcome proportional hazards mod-els. Tchetgen (2011) proposed theory pertaining to estimation of natural direct andindirect eﬀects with marginal structural Cox proportional hazards model and additivehazards model. Fulcher et al. (2017) discussed the eﬀect of censoring and truncationwith respect to “product” and “diﬀerence” methods used to estimate the mediation ef-fect under the accelerated failure time model. Lin et al. (2017) described methodologyfor estimating mediation eﬀects under a time-varying survival model. Considering theoncology context, Vandenberghe et al. (2018) decomposed the total risk ratio, or ratioof survival probabilities for treatment versus control, into the natural direct and indirecteﬀects. These derivations yield the mediation proportion, a metric characterizing therelative magnitude of indirect eﬀect.The methods mentioned previously for survival outcomes arise from the frequentistparadigm. This article presents a Bayesian framework for studying relationships amongtreatment, patient subgroups, tumor response and survival. Bayesian model averag-3ng (BMA) is a modeling technique which acknowledges uncertainty in model selection(Hoeting et al., 1999). Facilitating robust analyses, BMA extends posterior inferenceto the model space yielding integrative parameter estimates (Wasserman et al., 2000;Fragoso et al., 2018). The methodology combines classical components of mediationanalysis with BMA, which provides robustness to model mis-speciﬁcation among re-lationships of the observable components aforementioned. Computation of this modelleverages reversible jump Markov chain Monte Carlo (RJMCMC). Algorithms for im-plementation of RJMCMC, using the R package “nimble” (de Valpine et al., 2017), aswell as calculation of risk ratios and mediation proportion that deﬁne the extent towhich a treatment eﬀect for survival is passed through objective response of tumor aredescribed in detail. Trialists and sponsors may lack suﬃcient data as well as a statisticaltoolbox for designing phase III trials as a function of a drugs performance with respectto surrogate endpoints measured in early phase trials. The article additionally explainshow the resultant posterior predictive distributions can be applied to predict the successof ongoing studies at interim analyses or new trials when informed by historical datasources. In the presence of low predictive power, a trial could be halted thereby savingtime, cost, and most importantly allowing trial participants to pursue alternatives.The remainder of this article proceeds as follows. Section 2 describes the method-ological framework for mediaton analysis with BMA. Methods for estimation based onRJMCMC are described in Section 2.1. Tools for evaluating the mediation eﬀect oftumor response are discussed in Section 2.2 with their implementation detailed in Sec-tion 2.2.1. Algorithms for prediction are presented in Section 2.3. Section 3 deﬁnes themethods performance via simulation study and Section 4 applies the methodology to acolorectal cancer study. Section 5 provides discussion.

The mediation model for treatment, short-term tumor response and long-term sur-vival outcome has two components: a logistic regression model for the tumor responseand a proportional hazards (PH) model (Cox, 1972) for the survival outcome. Let Y denote the binary tumor response outcome with Y = 1 indicating response and 0 oth-erwise. Treatment arm indicator A has two possible values 0 and 1 representing controland treatment groups, respectively. X is a set of baseline covariates that need to beadjusted in the model, such as subgroups of the patient population. We model thelog-odds of π = P r ( Y = 1) as the linear combination of the predictors logit ( π ) = D r β , D r is the design matrix in the response model and β is the corresponding vectorof coeﬃcients. For example, a full model with treatment A , covariate X and theirinteractions has linear predictor D r β = β + β A + β X + β A × X .The overall survival (OS) time T is the outcome of interest in the long-term study,and a PH model is assumed here: h ( t | A, Y, X ) = h ( t ) × exp { D s γ } , where h ( · ) is the baseline hazard function. For example, assuming the W eibull ( ν, λ )distribution yields the following baseline hazard h ( t ) = νλt ν − . D s is the design matrixin the survival model with coeﬃcient vector γ . A full model containing treatment A ,response Y , covariate X and their pairwise interactions leads to D s γ = γ A + γ Y + γ X + γ A × Y + γ A × X + γ X × Y . Let θ = ( β , γ , ν, λ ) denote the vector of unknown parameters in the mediationmodels. We estimate the mediation model under the Bayesian framework and adoptthe reversible jump Markov chain Monte Carlo (RJMCMC) (Green, 1995) to get theposterior samples for θ . RJMCMC is a general framework for MCMC simulation and itcan be viewed as an extension of the Metropolis-Hastings algorithm to more general statespaces of varying dimensions. The advantage of RJMCMC is to avoid mis-speciﬁcationof the linear predictors in either the response or the survival models.To illustrate all possible models we considered in our problem, we introduce indicatorvectors z = ( z , z , z ) and w = ( w , · · · , w ) in the response and survival models,respectively. The idea is that “removing a variable from the model is equivalent tomultiply its coeﬃcient by zero, and only variables with non-zero indicators are includedin the model. Here, we list the indicator vectors for all possible response and survivalmodels in Table 1. In each model, the components with values of 1 are in the modeland 0 otherwise. For example, the response model R has coeﬃcient z (cid:48) β = β , where z (cid:48) = (1 , z ), and is the null model with only the intercept included in the linear predictor.We follow the hierarchical structure rules when specifying the models, that is, if theinteraction term is included in the model then the individual variables of the interactionare included as well.We assume ﬂat priors for the parameter θ . Priors for β and γ are set to be inde-pendent Normal distribution with zero mean and standard deviation of 100. The twoWeibull parameters ν and λ have Gamma prior with shape and rate equal to 0.001.The indicators z and w have Bernoulli priors with probabilities ψ z and ψ w . To5able 1: Indicator vectors for all possible modelsResponse modelsvar. Int. A X A × X Model coef. β β β β z z z R R R R R A × Y A × X X × Y Model coef. γ γ γ γ γ γ w w w w w w S S S S S S S S S S S S S S S S S S z × z ≥ z ,w × w ≥ w ,w × w ≥ w ,w × w ≥ w . Under these constraints, values for ψ z and ψ w can be searched to obtain equal orweighted model probabilities. For example, the reverse order of the model Akaike infor-mation criterion (AIC) values can be used as the weight in selecting ψ z and ψ w . Theimplementation and detailed algorithms using the “nimble” package can be found inSupplementary materials.The posterior model probabilities can be calculated based on the posterior samplesof z and w . By matching diﬀerent combinations of the indicators to the correspondingmodels in Table 1, we can count the frequencies of each model appears in the posteriorsamples and calculate the proportions as the posterior model probabilities. We considera model with higher posterior probability is more likely to be the true model based onobserved data and the priors. Under the mediation analysis framework, potential outcomes or counterfactual nota-tions are very useful in describing the relationships among treatment, mediators and out-comes. Here we assume the counterfactual variables Y i ( a ) and T i ( a ) exist for each patient i = 1 · · · , n , and each treatment arm a = 0 ,

1. With these notations, we have the regularsurvival probabilities S ( t | A i = 1 , Y i (1)) and S ( t | A i = 0 , Y i (0)) for patients in treatmentand control groups, as well as the counterfactual survival probability S ( t | A i = 1 , Y i (0))for a patient who was assigned to the treatment group, but with the response outcomethe patient would have had been in the control group.Vandenberghe et al. (2018) deﬁned the natural direct and indirect treatment eﬀectson the survival outcomes based on the “risk ratios”, which are ratios of survival probabil-ities. The risk ratio for total treatment eﬀect is simply the ratio of survival probabilitiesfor treatment and control groups: RR tot ( t ) = S ( t | A i = 1 , Y i (1)) S ( t | A i = 0 , Y i (0)) . RR d ( t ) = S ( t | A i = 1 , Y i (0)) S ( t | A i = 0 , Y i (0)) , which reﬂects the direct treatment eﬀect when holding the mediation eﬀect of the re-sponse at value Y i (0). The risk ratio for the natural indirect eﬀect is deﬁned as RR m ( t ) = S ( t | A i = 1 , Y i (1)) S ( t | A i = 1 , Y i (0)) , since it represents what would happen to a patient in the treatment arm when themediated eﬀect through response changes from Y i (1) to Y i (0). Note that we have theproduct of the direct and indirect risk ratios equals the risk ratio for the total eﬀect.The risk ratios deﬁned above are ratios of survival probabilities, and they can bevery extreme as survival probabilities approach zero when time t increases. Instead,we use the log versions of these risk ratios, which are the diﬀerence of the log survivalprobabilities.The mediation proportion is deﬁned based on the risk ratios as M ed %( t ) = RR tot ( t ) − RR d ( t ) RR tot ( t ) − S ( t | A i = 1 , Y i (1)) − S ( t | A i = 1 , Y i (0)) S ( t | A i = 1 , Y i (1)) − S ( t | A i = 0 , Y i (0)) . This quantity is not restricted between 0 and 1, but is valuable to evaluate the mediationeﬀect of response when the treatment eﬀect on survival is positive.

Note that the risk ratios and mediation proportion deﬁned above are all functions oftime. In practice, we make conclusions for speciﬁc time points ˜ t = ( t < t < · · · < t τ ).Logarithm of risk ratios and their pointwise credible intervals are calculated as follows:Assume we keep M posterior samples for θ after burn-in. For the m th set of samples θ ( m ) = ( β ( m ) , γ ( m ) , ν ( m ) , λ ( m ) ), m = 1 , · · · , M , Step 1.

We calculate the predicted survival probabilities for each subject in thedata and at each time point t ∈ ˜ t . For example, with the W eibull ( ν, λ ) baselinedistribution, we have S ( m ) ( t | A, Y, X ) = exp {− λ ( m ) t ν ( m ) × e D s γ ( m ) } , (1)where D s is the full design matrix in the survival model based on the data.8 tep 2. Separate the above predicted survival probabilities based on treatmentgroup A and calculate the average as the mean survival for each group: S ( m )0 ( t ) = 1 (cid:80) ni =1 I ( A i = 0) n (cid:88) i =1 I ( A i = 0) S ( m ) ( t | A i , Y i , X i ) ,S ( m )1 ( t ) = 1 (cid:80) ni =1 I ( A i = 1) n (cid:88) i =1 I ( A i = 1) S ( m ) ( t | A i , Y i , X i ) . Step 3.

Construct the artiﬁcial design matrix D ∗ s by assuming all subjects are inthe treatment group. Calculate S ( m ) ∗ ( t | A = 1 , Y, X ) by replacing D s with D ∗ s inEquation (1). The counterfactual survival probabilities are calculated by takingthe average of this quantity for subjects in the control group: S ( m ) ∗ ( t ) = 1 (cid:80) ni =1 I ( A i = 0) n (cid:88) i =1 I ( A i = 0) S ( m ) ∗ ( t | A i = 1 , Y i , X i ) . Step 4.

Repeat the above steps 1-3 for each set of posterior samples, and calculatethe log risk ratios and mediation proportion by taking the average of the samplesas follows: (cid:100) lRR tot ( t ) = 1 M M (cid:88) m =1 log( S ( m )1 ( t )) − log( S ( m )0 ( t )) , (cid:100) lRR d ( t ) = 1 M M (cid:88) m =1 log( S ( m ) ∗ ( t )) − log( S ( m )0 ( t )) , (cid:100) lRR m ( t ) = 1 M M (cid:88) m =1 log( S ( m )1 ( t )) − log( S ( m ) ∗ ( t )) , (cid:92) M ed %( t ) = 1 M M (cid:88) m =1 S ( m )1 ( t ) − S ( m ) ∗ ( t ) S ( m )1 ( t ) − S ( m )0 ( t ) . Step 5.

Derive the 95% pointwise credible intervals by ﬁnding the 2.5% and 97.5%quantiles of the samples for each quantity.

This section demonstrates how Bayesian posterior predictive computation can beused to predict whether a trial will achieve a statistically signiﬁcant result upon com-9letion. More speciﬁcally, the Bayesian framework yields predictive distributions of un-observed survival durations for censored patients and future trial participants. Throughrepeated sampling from these densities and application of statistical testing proceduresto the predicted outcomes, it is possible to obtain posterior predictive densities thatreﬂect the current expectation for treatment comparison that will result at the end ofthe trial. This section describes this process using the log-rank test as the basis fortreatment comparison.Consider two situations for prediction: 1) using historical data to estimate powerof future studies; and 2) predict success of a trial at an interim analysis. In the ﬁrstsituation, we have a complete historical data with all the response and survival outcomes,while no data are available for the new study. In the other case, we have partial dataavailable at an interim analysis of an onging trial. The partial data is used to predictthe survival outcomes for new patients and patients who had not yet failed by that time.Let O c = ( T c , δ c , Y c , A c , X c ) denote the current observed data. We estimate theparameter θ based on Bayesian mediation models and obtain M posterior samples: (cid:98) θ (1) , · · · , (cid:98) θ ( M ) . Let O p = ( A p , X p ) be the test data with only treatment assignments A p and baseline covariates X p . We ﬁrst discuss below the algorithm for calculating theprediction power for the test data alone.For the posterior sample (cid:98) θ ( m ) , m = 1 , · · · , M , Step 1

Predict the response outcome: – S1.1: We calculate the linear predictor terms D pr (cid:98) β ( m ) in the response model,where D pr is the design matrix constructed based on the test data O p . – S1.2: The response probability vector is π ( m ) = exp ( D pr (cid:98) β ( m ) ) / [1+exp ( D pr (cid:98) β ( m ) )].The predicted response outcome is generated as Y ( m ) p ∼ Bernoulli ( π ( m ) ). Step 2

Predict the survival outcome: – S2.1: Calculate the linear predictor terms in the survival model D p, ( m ) s (cid:98) γ ( m ) ,where D p, ( m ) s is the design matrix constructed based on the test data O p andthe predicted response Y ( m ) p from previous step. – S2.2: Generate the predicted survival time T ( m ) ∗ from the survival model h ( t | (cid:98) θ ( m ) , O p , Y ( m ) p ) = ν ( m ) λ ( m ) t ν ( m ) − × exp { D p, ( m ) s (cid:98) γ ( m ) } . – S2.3: Generate censoring time C according to the study design. For example,if the study stops at a landmark time c , we have C ≡ c . The predicted survivaloutcome is then calculated as T ( m ) p = min ( T ( m ) ∗ , C ) and δ ( m ) p = I ( T ( m ) ∗ ≤ C ).10 tep 3 Perform the log-rank test on the predicted survival outcomes in Step 2with respect to treatment group A p and record the p-value p ( m ) . Step 4

Repeat Steps 1-3 for m = 1 , · · · , M and record the test results as p =( p (1) , · · · , p ( M ) ). Calculate the power as the proportion of signiﬁcant p-values atlevel α .The above algorithm is for predicting the power for a new test data. This ﬁts the ﬁrstsituation we discussed when we would like to do prediction based on historical data. Inthe second situation, we do estimation on the partial data at interim, but would like topredict the power for the ﬁnal complete data. We follow the same steps 1-2 to generatepredicted response and survival outcomes, but perform the log-rank test on the combineddata O ( m ) = ( O c , O p, ( m ) ), where O p, ( m ) = ( T ( m ) p , δ ( m ) p , Y ( m ) p , A p , X p ) is the test data O p with predicted outcomes generated based on θ ( m ) .Sometimes we may have collected some response outcomes Y p for the test data O p ,but no one has had the event yet. In that case, we can skip Step 1 and predict survivaloutcomes based on O p = ( Y p , A p , X p ). We have performed extensive simulations to further evaluate the operating character-istics of the proposed BMA-based mediation model. We generated data for the proposedmediation models: logit { P r ( Y = 1) } = β + β A + β X + β A · X,h ( t | A, Y, X ) = h ( t ) × exp { γ A + γ Y + γ X + γ A · Y + γ A · X + γ X · Y } , where the baseline hazard function had W eibull ( ν, λ ) distribution h ( t ) = νλt ν − . Thetrue values for the Weibull parameters were set at ν = 2 and λ = 1. Subjects wereassigned to treatment ( A = 1) or control ( A = 0) group with 1:1 ratio. Covariate X wasgenerated from Uniform(-2, 4) distribution.The survival time was subject to right censoring. A prespeciﬁed landmark time point c = 1 . c was treated as rightcensored. The resulted censoring proportions in the four scenarios below are around31.14%, 23.55%, 26.66% and 20.61%, respectively.11 .1 Simulation designs We consider four scenarios for the relationships among treatment, response and sur-vival outcomes. In Scenario I, we assume the treatment only has indirect eﬀect onsurvival through response, and the direct treatment eﬀect on survival is zero. The co-eﬃcients are set to be β = (1 , , − ,

2) and γ = (0 , − . , , , , R and S . Under this assumption, we should havethe total eﬀect equals the mediated eﬀect ( lRR tot = lRR m ) and mediation proportionequals 1.We make the other extreme assumption in Scenario II, where mediation eﬀect throughresponse is zero. The coeﬃcients are set to be β = (1 , , − ,

0) and γ = ( − . , , , , , R and S . The total treat-ment eﬀect equals direct eﬀect lRR tot = lRR d and the mediation proportion is zero.We consider the most common case in Scenario III, where treatment has both directand indirect eﬀects on survival. The coeﬃcients are set to be β = (1 , , − ,

2) and γ = ( − . , − . , , , , R and S . All the log risk ratios are positive and the mediation proportion is between 0.2and 0.4 under this setting.Finally, in Scenario IV, we consider a situation where we observe treatment eﬀect onthe response outcome, but neither treatment nor response has any eﬀect on the survivaloutcome. The coeﬃcients are set to be β = (1 , , − ,

2) and γ = (0 , , , , , R and S . All of the log risk ratios areequal to zero and the mediation proportion cannot be calculated in this case. The truesurvival curves for the treatment and control groups under each scenario can be foundin online supplementary materials (Figure S2). Based on the speciﬁed coeﬃcients, thehazard ratios comparing control with treatment groups are around 1.5 for the ﬁrst twoscenarios, 2.5 for the third scenario and 1 in the last case. We used sample size n = 1000 (treatment and control arms combined) in our simu-lation and conducted 100 replications for each setting. Results for a smaller sample sizeof n = 500 were provided in supplementary materials for comparison. For the MCMCprocedure, we used two chains with each having 10,000 samples and dropped the ﬁrst5000 generated during the burn-in period. No thinning was made for the samples. Weused the reverse order of AIC as the weight to calculate the prior probabilities ψ z and ψ w for the indicator variables.Summary of the estimated regression coeﬃcients are presented in Table 2, where we12eport the bias, mean of the standard deviation (MStd) and the coverage probability(CP) of the 95% Highest Posterior Density (HPD) intervals. For the proposed method,the bias of all the parameters are small and the CP is close to the nominal level 0.95.The posterior model probabilities were summarized in Table 3 for each setting. Theaverage of the model probabilities were calculated with the minimum and maximumvalues in parenthesis. True models numbers were listed in the parenthesis after eachscenario and we could see the true models always have the highest posterior modelprobabilities in all settings.In Figure 1, the average of the mean log risk ratios were plotted in dashed lines andcompared with the true curves in solid lines. The 2.5% and 97.5% quantiles of the meanlog risk ratios were added as dotted lines in the plots. The estimated log risk ratio curveswere close to the truth in all of the four scenarios. The ﬁgures summarizing the mediationproportions were in the online supplementary materials. The median values were closeto the truth for the ﬁrst three scenarios while the mediation proportion calculated inthe last scenario was not applicable because all the risk ratios were equal to zero. In previous sections, we showed that the proposed mediation model helps in calibrat-ing the mediation eﬀect of the short-term tumor response through estimation of the logrisk ratios and mediation proportions. Moreover, the estimated models can be appliedto predict the trial results. These predictions improve as tumor response confers moreinformation about the long-term survival of patients. Under each of the four scenar-ios in Section 3.2, we calculated the prediction power for achieving success under twocases: (a) use interim data to predict current trial and (b) use historical data to predictfuture study. No response data is available for the future data. Diﬀerent sample sizes n = 200 ,

500 and 1000 were used for available data used for estimation.We generated an independent dataset with sample size n = 100 , ,

300 and 500 forthe future study data. We pretended the response and survival outcomes were unknown.The log-rank test was performed for the ﬁnal analysis and the proportion of signiﬁcanttest results was calculated as the power. Two-sided signiﬁcance level α = 0 .

05 wasused. For case (a), the tests were performed on the combined data of previous and thepredicted survival outcomes, while in case (b), only predicted survival data were used.The mean of the power curves under each scenario are plotted in Figure 2. Note thatthe power in the last scenario is very low since there is no treatment diﬀerence betweenthe two groups by design. Otherwise, in general, the power increases as sample sizeincreases. Combining current data and predicted data leads to a higher power due tolarger total sample sizes. 13able 2: Simulation study: summary of parameter estimatespar. Bias MStd CP Bias MStd CPScenario I Scenario II β β β -0.004 0.086 0.970 -0.002 0.060 0.980 β γ -0.003 0.012 1.000 -0.002 0.077 0.930 γ γ γ γ γ ν λ -0.012 0.086 0.940 0.007 0.065 0.960Scenario III Scenario IV β β β -0.016 0.086 0.930 -0.020 0.087 0.940 β γ -0.017 0.116 0.910 -0.006 0.013 1.000 γ γ γ γ γ ν λ R & S ) Scenario II ( R & S ) Scenario III ( R & S ) Scenario IV ( R & S )Response model R R R R R S S S S S S S S S S S S S S S S S S Our research was motivated by a colorectal cancer study reported by Goldberg et al.(2004). Seven hundred ninety ﬁve patients with metastatic colorectal cancer who had notbeen treated previously for advanced disease were randomly assigned to receive irinote-16igure 2: Predicted power(I: indirect eﬀect only; II: direct eﬀect only; III: both eﬀectsexist; IV: neither eﬀect exist.)can and bolus ﬂuorouracil plus leucovorin (IFL), oxaliplatin and infused ﬂuorouracil plusleucovorin (FOLFOX), or irinotecan and oxaliplatin (IROX). FOLFOX and IROX weretwo new regimens under investigation while IFL was considered as the standard of care.The ordinal tumor response (progression disease (PD), stable disease (SD), partial re-sponse (PR), and complete response(CR)) was assessed for each patient by the modiﬁedResponse Evaluation Criteria In Solid Tumors (RECIST) criteria Therasse et al. (2000)every 6 weeks for the rst 42 weeks. Patients who received the FOLFOX regimen werefound to have better tumor response and progression-free survival when compared topatients in the other two groups. It is not clear, however, whether the better survivaloutcome in FOLFOX group was due to the better tumor response, or to what extenttumor response conferred prolonged survival for this indication.17o illustrate the proposed method, we restrict to the 531 patients who received eitherthe FOLFOX regime (treatment group) or the standard of care IFL (control group), andcompare their OS. Only 5.6% of the patients were right censored. The median survivaltime in the FOLFOX was 594 days, which was longer than the median survival time of441 days in the IFL group. We create the binary response outcome Y as follows: if apatient had CR or PR as the best response outcome, we assign Y = 1; otherwise Y = 0.One hundred thirty two (49.4%) patients in the FOLFOX group had response Y = 1compared to 100 (37.9%) in the IFL group. Analyses were adjusted for baseline agegroup ( <

65 years old or ≥ R ) and the model with only treatment eﬀect ( R ) constituted 72.9% and26.9% of the samples in the response models. For the survival model, the majority ofposterior model probabilities lied in the model with only response ( S ) and the modelwith treatment and response ( S ), which were 63.7% and 32.1%, respectively.Figure 3 (a) presents Kaplan-Meier curves for survival by tumor response, whichexhibit clear ordering with respect to the magnitude of the surrogate. Figure 3 (b) de-picts the median of the log risk ratios with their 95% highest posterior density (HPD)intervals over time. The direct eﬀect (red) is estimated to be close to zero, although it isimprecisely estimated with larger variability. This results from the BMA among modelswith diﬀering characterizations of the direct eﬀect. While advantageous is facilitatingrobustness to model mis-speciﬁcation, BMA can yield mixture posteriors with skeweddistributions. By way of contrast, the mediated eﬀect (green) is estimated to be positivewith highly localized HPD interval, suggesting consistent estimation among averagedmodels. As a result of the heterogeneous posterior for direct eﬀect, the mediation pro-portion (shown in the Supplementary materials Figure s9) has a bi-modal distributionwith median approximately 1 and mean around 0.7. Overall the results suggest thatreduction in tumor size following treatment conferred prolonged survival for the studiedpatient population. Moreover, most of the treatment eﬀect observed for survival wasmediated through tumor response. 18 Discussion

Patient heterogeneity is a hallmark of oncology. Drug developers learning fromtrial data encounter complex relationships among patient subpopulations, therapies,response, and survival. This article presented a Bayesian framework for mediation anal-ysis formulated to study tumor response and survival among competing therapies. Thearticle detailed algorithms for quantifying the strength of a surrogate marker for sur-vival from the model’s posterior samples. Simulation studies demonstrated the method’sperformance in the presence versus absence of reliable surrogate markers and/or directtreatment eﬀects. Additionally, the method was applied to a randomized colorectalcancer study devised to compare competing chemotherapeutic regimens with respect tosurvival. The methodology proposed leverages Bayesian model averaging to facilitate ro-bustness to mis-speciﬁcation of modeling assumptions that impact statistical estimationof subgroups, treatments, response, and their conjoint eﬀects on survival. RJMCMC,demonstrated using the R package “nimble”, facilitates full posterior inference and com-putation of posterior predictive distributions of survival.The reader should note a few limitations. The mediation models developed in thisarticle assumed binary tumor response in settings with two competing therapies. Exten-sions to multifactorial responses require alteration of the regression model for responseto accommodate multinomial and/or ordinal distributional assumptions (Glonek andMcCullagh, 1995; O’brien and Dunson, 2004; Qaqish and Ivanova, 2006). For studieswith multiple treatment arms and a common control group, log risk ratios and mediationproportion can be calculated for comparing each treatment group to the control. For ex-ample, if we have three groups A ∈ { , , } with A = 0 denoting the control group. Weneed to calculate individual survival probabilities S ( t | A i = j, Y i ( j )) , j = 0 , ,

2, as well ascounterfactual survival probabilities S ( t | A i = j, Y i (0)) , j = 1 ,

2. Thereafter, calculationsof risk ratios take place with respect to each treatment group j ( j = 1 , Acknowledgements

The authors thank Dr. Daniel Sargent for his mentorship and vision to pursue thisresearch.

References

Baron, R. M. and Kenny, D. A. (1986). The moderator–mediator variable distinctionin social psychological research: Conceptual, strategic, and statistical considerations.

Journal of personality and social psychology

Clinical Pharmacology & Therapeutics

Statistics in medicine

Naturereviews Clinical oncology Journal of the Royal StatisticalSociety: Series B (Methodological)

Statistics in medicine

Journal of Computational and Graphical Statistics

European journalof cancer

Prevention Science

International Statistical Review

Epidemiology (Cambridge,Mass.)

Biostatistics Journal of theRoyal Statistical Society: Series B (Methodological)

Journal ofClinical Oncology

Biometrika

Advances in therapy

Blood, The Journal of the American Society of Hematology

Nature biotechnology

International Journal of Production Research

JNCI: Journal of the National Cancer Institute

Clinical Trials

Annals of Oncology

Statistical science pages 382–401.Huang, Y.-T. and Yang, H.-I. (2017). Causal mediation analysis of survival outcomewith multiple mediators.

Epidemiology (Cambridge, Mass.)

JCO Precision Oncology Cancer

Statistics in medicine

Psychological methods Journal of counseling psychol-ogy

The annals of applied statistics Bayesian analysis (Online)

Behavior research methods

Biometrics

Statisticsin Medicine .Pestana, R. C., Sen, S., Hobbs, B. P., and Hong, D. S. (2020). Histology-agnostic drugdevelopmentconsidering issues beyond the tissue.

Nature Reviews Clinical Oncology pages 1–14. 23rowell, T. M., Theoret, M. R., and Pazdur, R. (2016). Seamless oncology-drug devel-opment.

The New England journal of medicine

Biometrika

Diagnostics Theinternational journal of biostatistics Journalof the National Cancer Institute

Statistical methods in medical research

Epidemiology(Cambridge, Mass.)

Annals ofOncology

Structural Equation Modeling: A Multidisciplinary Journal

Journal ofmathematical psychology

New England Journal of Medicine

Psychologicalmethods

Organizational Research Methods

Supplementary materials

Mediation model diagram

We study the relationships of treatment, tumor response and survival outcomes underthe mediation analysis framework in Figure 4. Based on this triangular framework,treatment eﬀect on the survival outcome can be decomposed into the indirect eﬀect viatumor response and the remaining direct eﬀect.

Implementation of RJMCMC using “nimble” package

One challenge of implementation of RJMCMC is that the dimension of the parameterspace changes across diﬀerent models. We use the R package “nimble”, whch allows acombination of high-level processing in R and low-level processing in C++, to implementthe RJMCMC for the mediation models. We can write the models in NIMBLE language,which is an extended version the BUGS language, and it will generate and compile theC++ code for the models and assign default samplers for each node.To implement RJMCMC in “nimble”, we combine the nodes for the coeﬃcients β and γ with their model indicator nodes z and w . This can be done by the in-builtfunction “conﬁgureRJ()”. As a result, the function is set up so that a coeﬃcient willonly be estimated when the corresponding indicator value is non-zero.To enforce the constraints on the indicators, “nimble” provides a general way us-ing “dconstraint()”. For example, the constraint on z can be realized by specifying“ constraint ∼ dconstraint ( z ∗ z ≥ z )” in the model and setting constraint θ and and the indicators z and w :25 tep 1. Obtain the prior probabilities ψ z and ψ w based on preassigned modelweights. This can be achieved by minimizing the standard deviations of the modelprobabilities divided by weights. We use “GenSA” package to ﬁnd the optimizedvalues. Step 2.

Write down the model in “nimbleCode()” function. This contains threeparts:2(a). Specify the prior distributions for each parameter in θ and the indicators z and w .2(b). Use dconstraint() to put constraints on the indicators.2(c). Deﬁne the likelihood for the response and survival models. Step 3.

Build the model using the “nimbleModel()” function, and specify theinitial values.

Step 4.

Create MCMC conﬁguration use “conﬁgureMCMC()” function, whichassigns default samplers to each node based on the model deﬁnition.

Step 5.

Apply RJMCMC using the “conﬁgureRJ()” function: connect the nodesfor the coeﬃcients with the nodes for the corresponding indicators.

Step 6.

Build the MCMC object for the modiﬁed samplers in Step 5 using the“buildMCMC()” function and compile the model and MCMC using the “com-pileNimble()” function.

Step 7.

Draw MCMC samples for θ and the indicators using the “runMCMC()”function. MCMC parameters can be speciﬁed here, such as number of chains andthe number of samples for burn-in and thinning.26able 4: Simulation study: summary of parameter estimates (n=500)par. Bias MStd CP Bias MStd CPScenario I Scenario II β β β -0.034 0.124 0.930 -0.004 0.086 0.960 β γ -0.009 0.023 0.990 0.062 0.136 0.860 γ -0.011 0.123 0.940 0.003 0.016 1.000 γ γ γ γ ν λ β β β -0.024 0.124 0.960 -0.058 0.127 0.960 β γ -0.010 0.211 0.750 0.001 0.012 1.000 γ γ γ γ -0.003 0.008 1.000 0.000 0.000 1.000 γ ν λ -0.033 0.139 0.790 0.003 0.075 0.940 More simulation results

Simulation designs

The true survival curves for the treatment and control groups are plotted in Figure 5.

Simulation results

Simulation results for smaller sample size n = 500 are listed in Table 4 and Table 5.Comparing with the results for n = 1000 in the main article, the model selection isbetter and the bias get smaller as sample size increases.The median of the mediation proportions is calculated at each time point and plotted27able 5: Simulation study: posterior model probability summary (n=500)Scenario I ( R & S ) Scenario II ( R & S ) Scenario III ( R & S ) Scenario IV ( R & S )Response model R R R R R S S S S S S S S S S S S S S S S S S n = 500 (Figure 6) and n = 1000 (Figure 7). The truthis plotted in solid red lines as reference in the plots. The dotted lines are the 2.5% and97.5% quantiles of the estimated mediation proportion in the simulation. Note that theﬁrst three scenarios have median values close to the truth, but the variation can be verylarge in the ﬁrst two scenarios, where we have extreme situations of either no directeﬀect or no mediation eﬀect. The mediation proportion calculated in the last scenariodoes not make sense because all the risk ratios are equal to zero.29 redicted power evaluation The scatter plots of actual p-values and one minus predicted power for sample sizes n = 500 and 1000 are presented in Figure 8 and 9, respectively. Since the relations arenot linear, we calculated the Spearman’s correlation coeﬃcients for each one, and largercorrelation coeﬃcients indicate better prediction ability of the model.Another way to present the relationships between the predicted power and actualp-values from log-rank tests is through ROC curves. We present the ROC curves andcalculate the area under the curve (AUC) in Figures 10 and 11 for sample sizes n = 500and 1000, respectively. Larger value of AUCs is a indicator of good prediction abilityof the model. The sample sizes of new data has nothing to do with the AUCs, sinceboth the actual and predicted outcomes have more chance to be signiﬁcant as samplesize increases. Due to the space limit, we only present the curves for prediction samplesize n = 200, other cases have similar results. The AUCs are above 0.75 most of thetime, indicating a fairly good prediction ability of the estimated model.30able 6: Summary of coeﬃcient estimates for colorectal cancer studyEstimate Std 95% HPDResponse modelIntercept -0.3199 0.1457 -0.6347 -0.0772Trt (FOLFOX) 0.1273 0.2286 0 0.6243Age ( ≥

65) -0.0002 0.0103 0 0Trt × Age 0 0 — —Survival model ν λ Y -0.5136 0.0878 -0.6767 -0.3325Age ( ≥

65) 0.0079 0.0429 0 0Trt × Y × Age 0 0 — —Age × Y S S S S S S S S S S S S − S RowSums R R R R R Additional information on real data analysis

We list the estimated regression coeﬃcients in the mediation models in Table 6 andthe posterior model probabilities in Table 7. Based on the marginal model probabilities,we see that in the response model, the null model ( R ) and the model with only treatmenteﬀect ( R ) constitute 72.9% and 26.9% of the samples. In the survival model, the massof the posterior model probabilities lies in the model with only response ( S ) and themodel with treatment and response ( S ), which are 63.7% and 32.1%, respectively.The distribution of the estimated mediation proportion is summarized in Figure 12.The mean curve in solid line is around 0.7 and the quantiles above 0.4 are equal to 1.Lower quantiles are marked with dashed lines in the ﬁgure.31 a) Kaplan-Meier curves by response outcomes(b) log Risk ratios Figure 3: Colorectal cancer study ((a): CR = complete response, PR = partial response,SD = stable disease and PD = progressive disease; (b) shows the median (solid line)and 95% HPD intervals (dashed lines) of the estimated log risk ratios).32igure 4: Mediation model diagram33igure 5: True survival curves under diﬀerent scenarios (I: indirect eﬀect only; II: directeﬀect only; III: both eﬀects exist; IV: neither eﬀect exist.)34igure 6: Estimated mediation proportions under diﬀerent scenarios for n = 500 (I:indirect eﬀect only; II: direct eﬀect only; III: both eﬀects exist; IV: neither eﬀect exist.)35igure 7: Estimated mediation proportions under diﬀerent scenarios for n = 1000 (I:indirect eﬀect only; II: direct eﬀect only; III: both eﬀects exist; IV: neither eﬀect exist.)36igure 8: Scatter plots for p-values and predicted power (n=500)37igure 9: Scatter plots for p-values and predicted power (n=1000)38igure 10: ROC curves for evaluating prediction ability of the models for n = 500 (I:indirect eﬀect only; II: direct eﬀect only; III: both eﬀects exist; IV: neither eﬀect exist.)39igure 11: ROC curves for evaluating prediction ability of the models for nn