Design of phase III trials with long-term survival outcomes based on short-term binary results

Abstract

Pathologic complete response (pCR) is a common primary endpoint for a phase II trial or even accelerated approval of neoadjuvant cancer therapy. If granted, a two-arm confirmatory trial is often required to demonstrate the efficacy with a time-to-event outcome such as overall survival. However, the design of a subsequent phase III trial based on prior information on the pCR effect is not straightforward. Aiming at designing such phase III trials with overall survival as primary endpoint using pCR information from previous trials, we consider a mixture model that incorporates both the survival and the binary endpoints. We propose to base the comparison between arms on the difference of the restricted mean survival times, and show how the effect size and sample size for overall survival rely on the probability of the binary response and the survival distribution by response status, both for each treatment arm. Moreover, we provide the sample size calculation under different scenarios and accompany them with an R package where all the computations have been implemented. We evaluate our proposal with a simulation study, and illustrate its application through a neoadjuvant breast cancer trial.


Marta Bofill Roig (a), Yu Shen (b), Guadalupe Gómez Melis (a)
(a) Departament d'Estadística i Investigació Operativa, Universitat Politècnica de Catalunya, Barcelona, Spain
(b) Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, U.S.A.

Keywords: mixture model; restricted mean survival times; sample size; randomized controlled trial; breast cancer

1. Introduction

Neoadjuvant therapy, given with the aim of shrinking a tumor before surgery, has become increasingly common for early-stage breast cancer. Examples of neoadjuvant therapy include chemotherapy, radiation therapy, and hormone therapy. Neoadjuvant therapy permits breast conservation in patients who would otherwise require mastectomy; enables direct evaluation of the tumor response, which may add prognostic information; and allows for the examination of tissue, imaging, and biomarkers from biopsy.[1]

In the context of early-stage breast cancer, the use of the binary endpoint pathologic complete response (pCR), defined as the complete eradication of invasive cancer, has been proposed as an endpoint for accelerated approval by regulatory agencies. Trials under accelerated pathways are allowed to base the benefit on surrogate or intermediate endpoints. If granted, a confirmatory trial is still needed to demonstrate the efficacy based on long-term endpoints, such as overall survival (OS) or event-free survival (EFS).[1, 2]

The association between pCR and survival endpoints has been extensively discussed in recent years. A meta-analysis of existing randomized clinical trials and some cohort studies on neoadjuvant treatments showed that pCR in non-metastatic HER2-positive patients was associated with longer times to recurrence or death.[3, 4, 5] However, Cortazar et al[4] insightfully noted that higher pCR rates due to an intervention could not be used as a surrogate endpoint for improved EFS and OS in trial-level analyses; indeed, little association was observed between an increased number of pCR responses and improved OS or EFS. Whereas at the individual patient level the pCR endpoint could be strongly associated with long-term EFS and OS endpoints,[5] this is not yet a proven correlation at the trial level. How to predict a beneficial treatment effect on survival endpoints based on pCR improvement at the trial level is nontrivial.

Preprint submitted to Journal, September 1, 2020 (arXiv, stat.ME)

Several challenges arise when using pCR as a surrogate endpoint for a survival endpoint in the design of a phase III trial. Hatzis et al[6] discuss how a large pCR treatment effect may not be translated into a similar long-term survival effect at the trial level. The authors pointed out that modest or even large improvements in pCR rate may only translate into small improvements in survival endpoints, particularly in a patient population with good long-term prognoses. They evaluate, through simulation, the relationship between increased pCR rate and survival endpoints with different baseline prognoses under two critical assumptions: (i) the survival distributions of patients with pCR are the same, regardless of which treatment was received; and (ii) the survival function of patients who do not achieve pCR is also similar in both arms. These two assumptions imply that the survival benefit from the new therapy arm is derived only from the improved pCR rate over the control arm. Whereas these assumptions could be reasonable for triple-negative breast cancer under the two treatments of interest, this may not be the case for other trial settings, e.g., when the new therapy could reduce the extent of the disease even among non-pCR patients, thus improving long-term survival endpoints compared to the control arm.

An imperative and practical question is: how can we use the short-term efficacy results on pCR to design a phase III trial with long-term survival endpoints? Based on historical data, we may have reasonable information on the survival distributions among those who achieved pCR, and the survival distributions among those who did not, for similar patient populations.
Motivated by such challenges, in this paper we aim to design a phase III trial for a two-sample comparison between the control group and the intervention group, with respect to a long-term survival endpoint based on previous information on a short-term binary endpoint and other available information.

Several authors have addressed how to compare two treatment groups in seamless phase II-III clinical trials with binary and survival endpoints. While Inoue et al[7] proposed a Bayesian approach where the distribution of the time-to-event outcome is specified through a mixture model according to the response, Lai et al[8] developed group sequential tests for confirmatory testing based on likelihood ratio statistics for sample proportions and partial likelihood ratio statistics for censored survival data. On the other hand, using an early response to treatment as a potential surrogate endpoint for survival, Chen et al[9] proposed a joint model for binary response and a survival time for clustered data, basing the statistical inference on a multivariate penalized likelihood method.

The design of clinical trials with survival endpoints taking into account previous information on binary endpoints has received less attention. Abberbock et al[10] assessed how to relate the improvement in pCR with the effect size in survival in the context of neoadjuvant breast cancer trials. In order to cope with the possibility of a non-constant hazard ratio, Abberbock et al considered the average of the hazard ratio over the follow-up period as the effect measure. Schoenfeld's formula is then used to calculate the required number of events and augmented for the sample size under the exponential distribution.
However, as has been widely discussed, both the hazard ratio and the average hazard ratio are difficult to interpret.[11, 12, 13, 14, 15] Furthermore, since the hazard ratio is sensitive to the length of the period over which it is computed,[16, 17] it can be difficult to anticipate at the design stage.

In this paper, we propose a design for clinical trials with long-term survival endpoints, given the information on short-term binary endpoints. We distinguish between patients who respond to the binary endpoint, called responders, and those who do not, called non-responders. The anticipated effect size and sample size are calculated on the basis of the response rate of the binary endpoint, as well as on the survival functions for responders and non-responders in each treatment arm. The corresponding survival distribution for each arm is hence a mixture between responders and non-responders, and the common assumption of proportional hazards is unlikely to be satisfied. To overcome these difficulties we propose to consider the restricted mean survival time for each treatment arm, and to use their difference as the basis of comparison. The difference between the restricted mean survival times has several advantages: it is easily interpretable as the mean difference in the survival times by the end of follow-up,[13, 14] and it does not require the proportionality of the hazards.

We present the effect size and sample size formulae for designing such trials. As shown in this work, the sample size calculation requires previous knowledge of the survival-by-response distributions and the response rates for each treatment group. We additionally outline how the sample size can be determined according to different parameter choices. In order to make our proposal easy to use in practice, our method has been implemented in the survmixer R package, which is publicly available.

This paper is organized as follows. In Section 2, we introduce the notation and main assumptions.
In Section 3, we describe the design of trials using restricted mean survival times and provide the effect size and sample size formulae based on short-term endpoints and survival-by-response information. We present the R package survmixer in Section 4 and illustrate our proposal using a neoadjuvant trial in Section 5. We perform a simulation study in Section 6 to evaluate the performance of our proposal. We conclude the paper with a short discussion.

2. Notation and Assumptions

Consider a randomized clinical trial designed to compare two treatment groups, control group ($i = 0$) and intervention group ($i = 1$), each with $n^{(i)}$ individuals, and denote by $n = n^{(0)} + n^{(1)}$ the total sample size. Suppose that individuals from both groups are followed over the time interval $[0, T_{end}]$ and are compared with respect to a long-term time-to-event outcome evaluated within the interval $[0, \tau]$ ($0 < \tau \le T_{end}$), such as event-free survival (EFS). Let $T_{ij}$ and $C_{ij}$ be the time from randomization to the long-term outcome and to censoring, respectively, for each subject $j = 1, \ldots, n^{(i)}$, $i = 0, 1$, and let $S^{(i)}(\cdot)$ be the survival function of $T_{ij}$ for the $i$-th group. Assume that $T_{ij}$ and $C_{ij}$ are independent.

Assume that we have prior information on a short-term outcome, such as the pCR status.[3, 5] Based on this short-term outcome, each patient who achieved the response is deemed a responder; otherwise, they are considered a non-responder. Let $X_{ij} = 1$ if the $j$-th patient in the $i$-th group is a responder, and $X_{ij} = 0$ otherwise. Let $p^{(i)}$ be the probability of having responded.

We aim to design a superiority phase III trial for a two-sample comparison between the control group and the intervention group with respect to the long-term outcome, as established in the following hypotheses:

$$H_0: S^{(0)}(t) = S^{(1)}(t), \ \forall t \in [0, \tau] \quad \text{vs} \quad H_1: S^{(1)}(t) \ge S^{(0)}(t), \ S^{(1)}(t) > S^{(0)}(t) \text{ for some } t \in [0, \tau]. \tag{1}$$

Effect size in terms of the short-term outcome and responders

Let $S_r^{(i)}(t) = P(T_{ij} > t \mid X_{ij} = 1)$ and $S_{nr}^{(i)}(t) = P(T_{ij} > t \mid X_{ij} = 0)$ denote the survival functions for responders and non-responders in the $i$-th group, respectively. The survival function for the long-term outcome in the $i$-th group can then be expressed as a mixture of them as follows:

$$S^{(i)}(t) = P(T_{ij} > t, X_{ij} = 1) + P(T_{ij} > t, X_{ij} = 0) = p^{(i)} \cdot S_r^{(i)}(t) + (1 - p^{(i)}) \cdot S_{nr}^{(i)}(t). \tag{2}$$

From the survival function given in (2), the difference in survival functions at $t$ ($t \in [0, \tau]$) is as follows:

$$S^{(1)}(t) - S^{(0)}(t) = p^{(1)} \left( S_r^{(1)}(t) - S_{nr}^{(1)}(t) \right) - p^{(0)} \left( S_r^{(0)}(t) - S_{nr}^{(0)}(t) \right) + S_{nr}^{(1)}(t) - S_{nr}^{(0)}(t).$$

Although the hazard ratio is the most commonly used effect measure in survival analysis, it relies on the assumption of a constant hazard ratio over time between the two groups. However, under a mixture model such as (2), the proportionality of the hazards rarely holds, even if the survival functions for responders and non-responders are exponentially distributed,[10] as we will illustrate in Section 3.2.1. The expression of the hazard ratio under the mixture model can be found in the Appendix (Theorem Appendix A.2).

As an alternative to the hazard ratio to quantify the effect of an intervention, we can use the difference of the restricted mean survival times (RMSTs) for each group. The RMST is defined as the mean survival time within a specific time window $[0, \tau]$, and it corresponds to the area under the survival curve until $\tau$:

$$K^{(i)}(\tau) = \int_0^\tau S^{(i)}(t) \, dt. \tag{3}$$

The RMST corresponding to the survival function given in (2) can be expressed as follows:

$$K^{(i)}(\tau) = p^{(i)} K_r^{(i)}(\tau) + (1 - p^{(i)}) K_{nr}^{(i)}(\tau),$$

where $K_r^{(i)}(\tau) = \int_0^\tau S_r^{(i)}(t) \, dt$ and $K_{nr}^{(i)}(\tau) = \int_0^\tau S_{nr}^{(i)}(t) \, dt$ denote the RMST for the responders' survival and the non-responders' survival in the $i$-th treatment arm, respectively.
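The mixture RMST above can be checked numerically. The following is a minimal Python sketch, independent of the survmixer package, that assumes exponential survival for responders and non-responders; the function names and the input values are hypothetical and chosen only for illustration. It compares the weighted sum of the component RMSTs with a direct trapezoidal integration of the mixture survival function in (2).

```python
import math


def rmst_exponential(mean, tau):
    """Area under exp(-t / mean) on [0, tau]: the RMST of an exponential time."""
    return mean * (1.0 - math.exp(-tau / mean))


def mixture_survival(t, p, mean_r, mean_nr):
    """Mixture survival function (2) with exponential components."""
    return p * math.exp(-t / mean_r) + (1.0 - p) * math.exp(-t / mean_nr)


def mixture_rmst(p, mean_r, mean_nr, tau):
    """RMST of the mixture as the weighted sum of the component RMSTs."""
    return (p * rmst_exponential(mean_r, tau)
            + (1.0 - p) * rmst_exponential(mean_nr, tau))


def rmst_trapezoid(p, mean_r, mean_nr, tau, steps=20000):
    """Trapezoidal approximation of the area under the mixture survival curve."""
    h = tau / steps
    total = 0.5 * (mixture_survival(0.0, p, mean_r, mean_nr)
                   + mixture_survival(tau, p, mean_r, mean_nr))
    for k in range(1, steps):
        total += mixture_survival(k * h, p, mean_r, mean_nr)
    return total * h


# Hypothetical inputs: response probability, mean survival (years) of
# responders and non-responders, and the RMST horizon tau (years).
p, mean_r, mean_nr, tau = 0.3, 10.0, 4.0, 5.0
closed_form = mixture_rmst(p, mean_r, mean_nr, tau)
numeric = rmst_trapezoid(p, mean_r, mean_nr, tau)
print(f"closed form: {closed_form:.6f}, trapezoid: {numeric:.6f}")
```

The two values should agree up to the discretization error of the trapezoidal rule, which reflects that integration is linear in the mixture weights.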
The difference between arms in RMSTs is then:

$$K^{(1)}(\tau) - K^{(0)}(\tau) = \left( p^{(1)} K_r^{(1)}(\tau) - p^{(0)} K_r^{(0)}(\tau) \right) + \left( (1 - p^{(1)}) K_{nr}^{(1)}(\tau) - (1 - p^{(0)}) K_{nr}^{(0)}(\tau) \right), \tag{4}$$

where the first term is a function of the responders' survival functions and the second of the non-responders' survival functions. Details for this derivation are provided in the Appendix (see Theorem Appendix A.1).

Note that if $H_0$ in (1) is true, then $K^{(1)}(\tau) - K^{(0)}(\tau) = 0$; whereas, under $H_1$, $K^{(1)}(\tau) - K^{(0)}(\tau) > 0$. We therefore restate the hypotheses in terms of the RMSTs:

$$H_0: K^{(0)}(\tau) = K^{(1)}(\tau) \quad \text{vs} \quad H_1: K^{(1)}(\tau) > K^{(0)}(\tau). \tag{5}$$

From now on and for the rest of the paper, we focus on $H_0$ and $H_1$ as stated in (5).

3. Sample Size based on short-term outcomes and survival-by-responders endpoints

In this section, we start by describing the two-sample test based on the difference of the RMSTs. We then provide the expression for the overall mean survival improvement and discuss different design settings. We end the section with the derivation of the sample size based on the mixture survival function.

Let $\hat{S}^{(i)}(\cdot)$ be the Kaplan-Meier estimate of $S^{(i)}(\cdot)$. A consistent estimator of the RMST in (3) is given by $\hat{K}^{(i)}(\tau) = \int_0^\tau \hat{S}^{(i)}(t) \, dt$. The distribution of $\sqrt{n^{(i)}} \cdot \left( \hat{K}^{(i)}(\tau) - K^{(i)}(\tau) \right)$ is asymptotically normal with mean 0 and limiting variance $\left( \sigma^{(i)}(\tau) \right)^2$, defined as:

$$\left( \sigma^{(i)}(\tau) \right)^2 = - \int_0^\tau \frac{\left( K^{(i)}(\tau) - K^{(i)}(t) \right)^2}{\left( S^{(i)}(t) \right)^2 \cdot G^{(i)}(t)} \, dS^{(i)}(t) = - \int_0^\tau \frac{\left( \int_t^\tau S^{(i)}(u) \, du \right)^2}{\left( S^{(i)}(t) \right)^2 \cdot G^{(i)}(t)} \, dS^{(i)}(t), \tag{6}$$

where $G^{(i)}(\cdot)$ is the survival function of the censoring variable $C_{ij}$ for the $i$-th group.

To test the null hypothesis $H_0$ in (5) against the alternative hypothesis $H_1$, we consider the statistic:

$$Z_{s,n} = \frac{\hat{K}^{(1)}(\tau) - \hat{K}^{(0)}(\tau)}{\sqrt{\dfrac{\left( \hat{\sigma}^{(0)}(\tau) \right)^2}{n^{(0)}} + \dfrac{\left( \hat{\sigma}^{(1)}(\tau) \right)^2}{n^{(1)}}}}, \tag{7}$$

where $\left( \hat{\sigma}^{(i)}(\tau) \right)^2$ is the estimate of the variance $\left( \sigma^{(i)}(\tau) \right)^2$, which is obtained by substituting $K^{(i)}(\cdot)$, $S^{(i)}(\cdot)$, and $G^{(i)}(\cdot)$ by their corresponding estimates $\hat{K}^{(i)}(\cdot)$, $\hat{S}^{(i)}(\cdot)$, and $\hat{G}^{(i)}(\cdot)$.

Let $D(\tau) = K^{(1)}(\tau) - K^{(0)}(\tau)$ be the minimum meaningful effect to be detected. Then, $\hat{D}(\tau) = \hat{K}^{(1)}(\tau) - \hat{K}^{(0)}(\tau)$ is a consistent estimator of $D(\tau)$.
Moreover, $\sqrt{n} \left( (\hat{K}^{(1)}(\tau) - \hat{K}^{(0)}(\tau)) - (K^{(1)}(\tau) - K^{(0)}(\tau)) \right)$ is asymptotically normally distributed with mean 0 and variance equal to $(\sigma^{(0)}(\tau))^2 + (\sigma^{(1)}(\tau))^2$. By Slutsky's theorem, the statistic $Z_{s,n}$ is asymptotically $N(0, 1)$ under $H_0$, and asymptotically normal with mean equal to $D(\tau) \big/ \sqrt{(\sigma^{(0)}(\tau))^2 + (\sigma^{(1)}(\tau))^2}$ and unit variance under a fixed alternative equal to $D(\tau)$.

For further details on the restricted mean survival times, we refer to the work of Luo et al,[18] Zhao et al,[19] Pepe and Fleming,[20] Gill,[21] and the references therein.

3.2. Effect size

We denote by $\Delta_r(\tau) = K_r^{(1)}(\tau) - K_r^{(0)}(\tau)$ and $\Delta_{nr}(\tau) = K_{nr}^{(1)}(\tau) - K_{nr}^{(0)}(\tau)$ the mean survival improvement of the intervention group over the control group for responders and non-responders, respectively; and by $\Delta_0(\tau) = K_r^{(0)}(\tau) - K_{nr}^{(0)}(\tau)$ the mean survival improvement of responders against non-responders in the control group. The treatment effect on the response rate is denoted by $\delta_p = p^{(1)} - p^{(0)}$.

The overall mean survival improvement between treatment groups in (4) can be re-expressed as:

$$D(\tau) = K^{(1)}(\tau) - K^{(0)}(\tau) = p^{(1)} \cdot \Delta_r + (1 - p^{(1)}) \cdot \Delta_{nr} + (p^{(1)} - p^{(0)}) \cdot \Delta_0. \tag{8}$$

In summary, the effect size $D(\tau)$ is a function of:
• $\Delta_r = \Delta_r(\tau)$: Mean survival improvement due to intervention among responders by time $\tau$.
• $\Delta_{nr} = \Delta_{nr}(\tau)$: Mean survival improvement due to intervention among non-responders by time $\tau$.
• $\Delta_0 = \Delta_0(\tau)$: Mean survival improvement of responders versus non-responders in the control group by time $\tau$.
• $\delta_p$: Improvement due to intervention on the response rate. Note that $p^{(1)} = p^{(0)} + \delta_p$.
• $p^{(0)}$: Probability of response in the control group.

When designing a future phase III trial, we need to work closely with our medical collaborators to obtain these quantities, which may be procured from ongoing or finished phase II trials (assessing similar agents) or from historical data using prior scientific knowledge.

3.2.1. Effect size under different settings

Next, we discuss four different settings according to whether or not the survival functions between groups are the same for responders and non-responders.
We consider that the response rate of the intervention arm is higher than that of the control arm ($p^{(1)} - p^{(0)} > 0$).

(I) $S_r^{(1)}(t) = S_r^{(0)}(t)$, $S_{nr}^{(1)}(t) = S_{nr}^{(0)}(t)$, $S_r^{(0)}(t) > S_{nr}^{(0)}(t)$: The survival function for responders is expected to be superior to that of non-responders. Hence, $\Delta_r = K_r^{(1)}(\tau) - K_r^{(0)}(\tau) = 0$, $\Delta_{nr} = K_{nr}^{(1)}(\tau) - K_{nr}^{(0)}(\tau) = 0$, $\Delta_0 > 0$, and

$$D(\tau) = (p^{(1)} - p^{(0)}) \cdot \Delta_0.$$

We note that, even in this simple scenario, the two overall survival functions, $S^{(1)}(\cdot)$ and $S^{(0)}(\cdot)$, are unlikely to satisfy the proportional hazards assumption. Under the exponential case for both responders and non-responders, the overall survival distribution is no longer exponential:

$$S^{(i)}(t) = p^{(i)} \cdot \exp\{-\lambda_r^{(i)} \cdot t\} + (1 - p^{(i)}) \cdot \exp\{-\lambda_{nr}^{(i)} \cdot t\},$$

where $S_r^{(i)}(t) = \exp\{-\lambda_r^{(i)} \cdot t\}$ and $S_{nr}^{(i)}(t) = \exp\{-\lambda_{nr}^{(i)} \cdot t\}$. Moreover, the hazard ratio between the two treatment groups is then:

$$\mathrm{HR}(t) = \frac{p^{(0)} \exp\{-\lambda_r^{(0)} t\} + (1 - p^{(0)}) \exp\{-\lambda_{nr}^{(0)} t\}}{p^{(1)} \exp\{-\lambda_r^{(1)} t\} + (1 - p^{(1)}) \exp\{-\lambda_{nr}^{(1)} t\}} \cdot \frac{p^{(1)} \lambda_r^{(1)} \exp\{-\lambda_r^{(1)} t\} + (1 - p^{(1)}) \lambda_{nr}^{(1)} \exp\{-\lambda_{nr}^{(1)} t\}}{p^{(0)} \lambda_r^{(0)} \exp\{-\lambda_r^{(0)} t\} + (1 - p^{(0)}) \lambda_{nr}^{(0)} \exp\{-\lambda_{nr}^{(0)} t\}},$$

showing that the hazard ratio is not constant over time.

(II) $S_r^{(1)}(t) = S_r^{(0)}(t)$, $S_{nr}^{(1)}(t) > S_{nr}^{(0)}(t)$ (for $t > \tau_b$): Survival improvement due to the intervention for non-responders, but not for responders. This leads to $\Delta_r = 0$, $\Delta_{nr} > 0$, and hence:

$$D(\tau) = (1 - p^{(1)}) \cdot \Delta_{nr} + (p^{(1)} - p^{(0)}) \cdot \Delta_0.$$

(III) $S_r^{(1)}(t) > S_r^{(0)}(t)$, $S_{nr}^{(1)}(t) = S_{nr}^{(0)}(t)$ (for $t > \tau_b$): Responders in the intervention group have longer survival than control group responders, while there is no mean survival improvement among non-responders. Thus, we have $\Delta_r > 0$, $\Delta_{nr} = 0$, and then:

$$D(\tau) = K^{(1)}(\tau) - K^{(0)}(\tau) = p^{(1)} \cdot \Delta_r + (p^{(1)} - p^{(0)}) \cdot \Delta_0.$$

(IV) $S_r^{(1)}(t) > S_r^{(0)}(t)$, $S_{nr}^{(1)}(t) > S_{nr}^{(0)}(t)$ (for $t > \tau_b$): Both responders and non-responders of the intervention group have longer survival than those in the control group. Then $\Delta_r > 0$, $\Delta_{nr} > 0$, so that the overall mean survival improvement between groups is:

$$D(\tau) = p^{(1)} \cdot \Delta_r + (1 - p^{(1)}) \cdot \Delta_{nr} + (p^{(1)} - p^{(0)}) \cdot \Delta_0.$$

In order to compute the sample size to test (5) based on the statistic $Z_{s,n}$ given in (7), we need information on the following quantities: i) the responder rates $p^{(0)}$ and $p^{(1)}$ (or, alternatively, instead of $p^{(1)}$, the effect given by $\delta_p = p^{(1)} - p^{(0)}$); ii) the responders and non-responders survival functions $S_r^{(i)}(\cdot)$ and $S_{nr}^{(i)}(\cdot)$, $i = 0, 1$; and iii) the censoring survival function, $G^{(i)}(\cdot)$.

Let $n(p, S_r, S_{nr}, G)$ be the sample size needed for running a trial at significance level $\alpha$ with power $1 - \beta$. Note that the group indicator has been omitted from the notation for brevity. The formula for calculating the total sample size $n(p, S_r, S_{nr}, G)$ is given by:

$$n(p, S_r, S_{nr}, G) = \frac{(z_\alpha + z_\beta)^2}{(D(\tau))^2} \cdot \left( \frac{(\sigma^{(0)}(\tau))^2}{\pi} + \frac{(\sigma^{(1)}(\tau))^2}{1 - \pi} \right), \tag{9}$$

where $\pi = \lim n^{(0)} / n$, $z_x$ is the $100 \times (1 - x)$-th percentile of the standard normal distribution, and the variance $(\sigma^{(i)}(\tau))^2$ is:

$$(\sigma^{(i)}(\tau))^2 = - p^{(i)} \int_0^\tau \frac{\left( \int_t^\tau \left( S_r^{(i)}(u) p^{(i)} + S_{nr}^{(i)}(u) (1 - p^{(i)}) \right) du \right)^2}{\left( S_r^{(i)}(t) p^{(i)} + S_{nr}^{(i)}(t) (1 - p^{(i)}) \right)^2 \cdot G^{(i)}(t)} \, dS_r^{(i)}(t) - (1 - p^{(i)}) \int_0^\tau \frac{\left( \int_t^\tau \left( S_r^{(i)}(u) p^{(i)} + S_{nr}^{(i)}(u) (1 - p^{(i)}) \right) du \right)^2}{\left( S_r^{(i)}(t) p^{(i)} + S_{nr}^{(i)}(t) (1 - p^{(i)}) \right)^2 \cdot G^{(i)}(t)} \, dS_{nr}^{(i)}(t).$$

To anticipate $D(\tau)$, we can employ either (4), where the restricted mean survival times for responders and non-responders are used, or (8), where the anticipated survival benefits for responders and non-responders are considered. However, we note that, if (8) is used for anticipating $D(\tau)$ in $n(p, S_r, S_{nr}, G)$, the anticipated $S_r^{(i)}(\cdot)$ and $S_{nr}^{(i)}(\cdot)$ in $(\sigma^{(i)}(\tau))^2$ have to be consistent with the expected effect sizes ($\Delta_r$, $\Delta_{nr}$, $\Delta_0$).

The prior information needed to compute the sample size might be available in terms of different summary statistics. In this section, we propose three distinct sets of summaries; the choice among them depends on the previous information available on the survival functions for responders and non-responders. These three sets of summary statistics are essentially equivalent under our distributional assumptions.

In this section, we assume that both responders and non-responders survival functions follow exponential distributions.
Similar derivations, if Weibull distributions are assumed, can be found in the supplementary material. The censoring distribution is assumed to be exponential.

Researchers should have the response rate in the control group, $p^{(0)}$, the anticipated effect due to the intervention on the response rate, $\delta_p$, and the scale parameter for the censoring distribution. Furthermore, one of the following three sets of summaries is needed:

• Summary statistics (I): Sample size based on ($m_r^{(0)}$, $m_{nr}^{(0)}$, $m_r^{(1)} - m_r^{(0)}$, $m_{nr}^{(1)} - m_{nr}^{(0)}$): We would need the mean survival time for the responders and non-responders distributions in the control group, $m_r^{(0)}$ and $m_{nr}^{(0)}$, and the differences in mean survival time for responders and non-responders, $m_r^{(1)} - m_r^{(0)}$ and $m_{nr}^{(1)} - m_{nr}^{(0)}$, respectively. From there, we could directly translate these anticipated values into the parameters of the exponential distributions and calculate the sample size accordingly using (9).

• Summary statistics (II): Sample size based on ($S_r^{(1)}(\tau) - S_r^{(0)}(\tau)$, $S_{nr}^{(1)}(\tau) - S_{nr}^{(0)}(\tau)$, $S_r^{(0)}(\tau)$, $S_{nr}^{(0)}(\tau)$): We would require here the $\tau$-year survival rates for responders and non-responders in the control group, $S_r^{(0)}(\tau)$ and $S_{nr}^{(0)}(\tau)$, and the difference in survival functions at $\tau$ for responders and non-responders, $S_r^{(1)}(\tau) - S_r^{(0)}(\tau)$ and $S_{nr}^{(1)}(\tau) - S_{nr}^{(0)}(\tau)$. Based on this information, we could deduce the parameters of the exponential distributions and calculate the sample size according to (9).

• Summary statistics (III): Sample size based on ($S_r^{(0)}(\tau)$, $S_{nr}^{(0)}(\tau)$, $\Delta_r$, $\Delta_{nr}$): We would need the $\tau$-year survival rates for responders and non-responders in the control group, $S_r^{(0)}(\tau)$ and $S_{nr}^{(0)}(\tau)$, and the mean survival improvement for responders and non-responders, $\Delta_r$ and $\Delta_{nr}$. Based on this information, we establish the underlying relationships between the anticipated set of parameters and the parameters of the exponential distribution, and obtain approximate values for the rate parameters using a Taylor series. Once we have the parameters, we calculate the sample size according to (9).

In the supplementary material, we state the formulae that we have used to obtain the distributional parameters from each of the sets of summary statistics.
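As an illustration of how formula (9) could be evaluated under the exponential assumptions of this section, the following is a minimal Python sketch. It is not the survmixer implementation: the function names, the midpoint-rule integration, and all input values are assumptions made only for this example. The inputs follow the spirit of summary statistics (II), converting $\tau$-year survival rates into exponential means via $S(\tau) = \exp\{-\tau / m\}$, and $z_\alpha$ is taken as the $100 \times (1 - \alpha)$-th standard normal percentile, as defined after (9).

```python
import math
from statistics import NormalDist


def mean_from_survival(s_tau, tau):
    """Mean of the exponential distribution whose survival at tau is s_tau."""
    return -tau / math.log(s_tau)


def mixture_rmst(p, mean_r, mean_nr, tau):
    """RMST of the exponential mixture in (2)."""
    return (p * mean_r * (1.0 - math.exp(-tau / mean_r))
            + (1.0 - p) * mean_nr * (1.0 - math.exp(-tau / mean_nr)))


def rmst_variance(p, mean_r, mean_nr, mean_cens, tau, steps=20000):
    """Limiting variance (6) for the mixture, with exponential censoring.

    Uses -dS(t) = f(t) dt and the midpoint rule on [0, tau].
    """
    def surv(t):
        return p * math.exp(-t / mean_r) + (1.0 - p) * math.exp(-t / mean_nr)

    def dens(t):
        return (p / mean_r * math.exp(-t / mean_r)
                + (1.0 - p) / mean_nr * math.exp(-t / mean_nr))

    def area_after(t):
        # Integral of the mixture survival function from t to tau.
        return (p * mean_r * (math.exp(-t / mean_r) - math.exp(-tau / mean_r))
                + (1.0 - p) * mean_nr
                * (math.exp(-t / mean_nr) - math.exp(-tau / mean_nr)))

    h = tau / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        cens = math.exp(-t / mean_cens)
        total += area_after(t) ** 2 * dens(t) / (surv(t) ** 2 * cens)
    return total * h


def sample_size(arm0, arm1, mean_cens, tau, alpha, beta, alloc0=0.5):
    """Total sample size from (9); arms are (p, mean_r, mean_nr) tuples."""
    effect = mixture_rmst(*arm1, tau) - mixture_rmst(*arm0, tau)
    z = NormalDist().inv_cdf(1.0 - alpha) + NormalDist().inv_cdf(1.0 - beta)
    var0 = rmst_variance(*arm0, mean_cens, tau)
    var1 = rmst_variance(*arm1, mean_cens, tau)
    return math.ceil(z ** 2 / effect ** 2 * (var0 / alloc0 + var1 / (1.0 - alloc0)))


# Hypothetical design inputs (not taken from any trial).
tau = 5.0
control = (0.20, mean_from_survival(0.80, tau), mean_from_survival(0.50, tau))
treated = (0.40, mean_from_survival(0.85, tau), mean_from_survival(0.55, tau))
n_total = sample_size(control, treated, mean_cens=7.0, tau=tau,
                      alpha=0.025, beta=0.20)
print("total sample size:", n_total)
```

One way to sanity-check such a sketch is the uncensored single-exponential case, where the limiting variance reduces to the variance of $\min(T, \tau)$.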

4. Implementation

Results in Section 3 allow for the calculation of the sample size and effect size for overall survival based on the response rate and the survival-by-response information. To make these results accessible to clinical trial practitioners, we have created the R package survmixer (https://github.com/MartaBofillRoig/survmixer), which incorporates two main functions: survm_effectsize and survm_samplesize, for calculating the effect size (RMST difference) in (8) and the sample size in (9), respectively.

In the function survm_effectsize, the RMST difference can be computed based on two different sets of arguments, the choice of which is determined by the argument anticipated_effects. If anticipated_effects is TRUE, the overall mean survival improvement is computed using formula (8) and the set of arguments ($\Delta_r$, $\Delta_{nr}$, $\Delta_0$, $p^{(0)}$, $\delta_p$), that is:

    survm_effectsize(Delta_r, Delta_0, Delta_nr, delta_p, p0, anticipated_effects = TRUE)

where Delta_r, Delta_0, Delta_nr, delta_p, p0 are the already introduced parameters $\Delta_r$, $\Delta_0$, $\Delta_{nr}$, $\delta_p$ and $p^{(0)}$.

On the other hand, if anticipated_effects is FALSE, the overall mean survival improvement is computed according to (4), based on the set of arguments ($S_r^{(0)}(\cdot)$, $S_r^{(1)}(\cdot)$, $S_{nr}^{(0)}(\cdot)$, $S_{nr}^{(1)}(\cdot)$, $p^{(0)}$, $\delta_p$), that is:

    survm_effectsize(ascale0_r, ascale0_nr, ascale1_r, ascale1_nr, delta_p, p0, bshape0 = 1, bshape1 = 1, tau, anticipated_effects = FALSE)

where delta_p, p0 are the treatment effect on the response rate and the response rate in the control group ($\delta_p$, $p^{(0)}$); ascale0_r, ascale0_nr, ascale1_r, ascale1_nr are the scale parameters for the distributions of responders and non-responders in the control and intervention groups; and tau is the end of follow-up. The responders and non-responders survival functions are assumed to be exponentially distributed by default. However, they can be assumed to be Weibull distributed by using the arguments bshape0, bshape1, which are the shape parameters in the control and intervention groups.

The survm_samplesize function computes the sample size on the basis of different sets of summary statistics, as explained in Section 3.3.1. The user can choose which set of summaries to use by means of the argument set_param. This function can be called for each of the parameter settings by:

    survm_samplesize(m0_r, m0_nr, diffm_r, diffm_nr, delta_p, p0, tau, ascale_cens, alpha, beta, set_param = 1)
    survm_samplesize(S0_r, S0_nr, diffS_r, diffS_nr, delta_p, p0, tau, ascale_cens, alpha, beta, set_param = 2)
    survm_samplesize(Delta_r, Delta_nr, S0_r, S0_nr, delta_p, p0, tau, ascale_cens, alpha, beta, set_param = 3)

where the arguments:
• m0_r, m0_nr are the mean survival times for responders and non-responders in the control group ($m_r^{(0)}$, $m_{nr}^{(0)}$);
• diffm_r, diffm_nr are the differences in mean survival time between groups for responders and non-responders ($m_r^{(1)} - m_r^{(0)}$, $m_{nr}^{(1)} - m_{nr}^{(0)}$);
• S0_r, S0_nr are the $\tau$-year survival rates for responders and non-responders in the control group ($S_r^{(0)}(\tau)$, $S_{nr}^{(0)}(\tau)$);
• diffS_r, diffS_nr are the differences in survival functions at $\tau$ for responders and non-responders ($S_r^{(1)}(\tau) - S_r^{(0)}(\tau)$, $S_{nr}^{(1)}(\tau) - S_{nr}^{(0)}(\tau)$);
• Delta_r, Delta_nr, delta_p, p0 are the same arguments as in survm_effectsize ($\Delta_r$, $\Delta_{nr}$, $\delta_p$, $p^{(0)}$);
• ascale_cens is the scale parameter for the censoring distribution;
• alpha and beta are the pre-specified type I and type II errors, respectively.
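The two routes to the effect size, through the survival-by-response distributions as in (4) or through the anticipated improvements as in (8), should give the same RMST difference when their inputs are consistent. The following is a minimal Python sketch of that equivalence under exponential survival. It does not call or reproduce survmixer: the function names are hypothetical, the argument names only loosely mirror those described above, and the input values are invented for illustration.

```python
import math


def rmst_exp(mean, tau):
    """RMST on [0, tau] of an exponential survival time with the given mean."""
    return mean * (1.0 - math.exp(-tau / mean))


def effect_from_distributions(p0, delta_p, m0_r, m0_nr, m1_r, m1_nr, tau):
    """RMST difference via (4): mixture RMST of each arm, then subtract."""
    p1 = p0 + delta_p
    k0 = p0 * rmst_exp(m0_r, tau) + (1.0 - p0) * rmst_exp(m0_nr, tau)
    k1 = p1 * rmst_exp(m1_r, tau) + (1.0 - p1) * rmst_exp(m1_nr, tau)
    return k1 - k0


def effect_from_improvements(p0, delta_p, Delta_r, Delta_nr, Delta_0):
    """RMST difference via (8), from the anticipated improvements."""
    p1 = p0 + delta_p
    return p1 * Delta_r + (1.0 - p1) * Delta_nr + delta_p * Delta_0


# Hypothetical inputs: response rate, its improvement, mean survival times
# (years) by arm and response status, and the horizon tau (years).
p0, delta_p, tau = 0.20, 0.15, 5.0
m0_r, m0_nr, m1_r, m1_nr = 9.0, 5.0, 12.0, 6.0

# Improvements implied by the exponential distributions above.
Delta_r = rmst_exp(m1_r, tau) - rmst_exp(m0_r, tau)
Delta_nr = rmst_exp(m1_nr, tau) - rmst_exp(m0_nr, tau)
Delta_0 = rmst_exp(m0_r, tau) - rmst_exp(m0_nr, tau)

d_direct = effect_from_distributions(p0, delta_p, m0_r, m0_nr, m1_r, m1_nr, tau)
d_decomposed = effect_from_improvements(p0, delta_p, Delta_r, Delta_nr, Delta_0)
print(f"via (4): {d_direct:.6f}, via (8): {d_decomposed:.6f}")
```

The agreement of the two values is purely algebraic; in practice the routes differ only in which quantities the investigators find easier to anticipate.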

5. Motivating Example: The NOAH trial

In phase III of the NOAH (NeOAdjuvant Herceptin) trial,[22, 23] the primary objective was to assess whether neoadjuvant chemotherapy with one year of trastuzumab improved event-free survival as compared with neoadjuvant chemotherapy alone in patients with HER2-positive breast cancer. Patients in the NOAH trial were randomly assigned to receive neoadjuvant chemotherapy alone or neoadjuvant chemotherapy plus one year of trastuzumab. The primary endpoint was event-free survival, defined as the time from randomization until disease recurrence, progression, or death from any cause. Secondary endpoints were, among others, pathological complete response (pCR) in breast tissue and overall survival.

A total of 235 patients with HER2-positive disease were enrolled in the study, of whom 118 received chemotherapy alone and 117 received chemotherapy plus trastuzumab. The sample size was calculated using Schoenfeld's formula to have 80% power to detect a hazard ratio of 0.545 on the primary endpoint at a two-sided $\alpha$ level of 0.05, assuming a median event-free survival of 5.[...] 0.38 in the trastuzumab group and 0.19 in the chemotherapy group. Based on these results as model inputs and assuming exponential distributions for the survival and censoring distributions, the mean event-free survival value is 8.36 and 5.61 years for responders and non-responders in the chemotherapy group, respectively, and 35.90 and 5.[...]17 years. We additionally assume equal exponential censoring distributions for the two groups with mean equal to 7 years.

Table 1 provides the values we are using to compute the sample size in this hypothetical new study. Figure 1 shows the survival functions by response for each treatment arm, as well as the survival functions for the whole population calculated as the mixture of responders and non-responders given in (2). Figure 1 also plots the hazard ratio between treatment arms over time (given in (A.3)); observe that it lies between 0.45 and 0.65. This departure from constancy is the consequence of the different survival patterns of pCR responders and non-responders within the survival mixture model.

Table 1: Anticipated parameters for the design of the phase III trial: probability of the binary endpoint pCR in each treatment group ($p^{(0)}$ and $p^{(1)}$); 5-year survival rates for responders and non-responders in each treatment group, $S_r^{(i)}(\tau)$ and $S_{nr}^{(i)}(\tau)$, $i = 0, 1$; and the mean survival time for the exponential functions by treatment arm, by pCR response, and for the censoring distribution.

Parameters Anticipated values

Probability of achieving pCR in chemotherapy group 0 . . . . . . . . . . survmixer package (in Section 4). We calculate the overall mean survival improvement (RMST di ﬀ er-ence) between groups by means of the function survw effectsize . To do so, we consider the values of the mean sur-vival time for responders and non-responders in each group, the probability of achieving pCR response in the controlgroup, and the response di ﬀ erence between groups (see Table 1). As shown below, the function survw effectsize returns the overall mean survival improvement and the mean survival improvement that would be assumed for bothresponders and non-responders. survm _ effectsize ( ascale0 _r =8.37 , ascale0 _ nr =5.61 , ascale1 _r =35.90 , ascale1 _ nr =5.61 ,delta _p =0.19 , p0 =0.19 , tau =5) igure 1: The top-left plot depicts the survival functions for event-free survival for each treatment arm ( S (0) ( · ) and S (1) ( · )), calculated by meansof (2) and using the values in Table 1. The top-right plot is the hazard ratio over the follow-up period (calculated using (A.3)). The bottom-leftand bottom-right plots are the survival functions by response in each treatment arm ( S (0) r ( · ) and S (1) r ( · ) for responders, and S (0) nr ( · ) and S (1) nr ( · ) fornon-responders). The resulting overall mean survival improvement of trastuzumab over neoadjuvant chemotherapy is 0 .

43 years. Thisdi ﬀ erence is mainly because of the mean survival improvement among responders, since the mean survival improve-ment between groups ( ∆ r ) in the patients that responded is 0 .

90 years, whereas there is no improvement for non-responders, ∆ nr = ﬀ erence in survival functions between groups at τ = ﬀ erence in event-free survival between the two treatment arms over 5 years of followup. We employ the survw samplesize function to compute the sample size according to (9), obtaining that a totalsample size of 466 is needed. survm _ samplesize ( S0 _r =0.55 , S0 _ nr =0.41 , diffS _r =0.32 , diffS _ nr =0 , p0 =0.19 , delta _p=0.19 ,ascale _ cens =7 , tau =5 , alpha =0.05 , beta =0.2 , set _ param =2) We performed a simulation study to compare the statistical power of the NOAH trial and the one we proposedusing the parameter inputs derived from the NOAH trial. We assumed exponential distributions for the survivalfunctions for each of the four subgroups, considered the parameters in Table 1, and replicated 10000 times to estimatethe power to detect a statistically signiﬁcant di ﬀ erence in event-free survival between the two arms. When simulatingtrials of size n = .

33 using the log-rank test and anempirical power of 0 .

41 using the RMST test (in (7)) to detect an improvement in event-free survival of trastuzumabplus chemotherapy over chemotherapy alone. On the other hand, when using trials of size n = .

76 using the log-rank test and an empirical power of 0 .

80 using the RMST test. Note that in bothsituations our approach leads to higher powers. We have additionally evaluated the empirical power under variouscensoring percentages. The results (not included) show that the empirical powers using the RMST test were, all ofthem, around 0.80 and higher than the ones using the log-rank test.
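The effect size and sample size reported in this example can be cross-checked directly from the mixture formulas, without calling the package. The following is a minimal base R sketch under stated assumptions: exponential survival within each response stratum with the mean survival times passed to survm_effectsize, exponential censoring with mean 7, equal allocation, and a one-sided level of 0.05. The helper names (rmst_exp, S_mix, f_mix, K_tail, var_rmst) are hypothetical and are not part of survmixer; the variance is the one stated in Theorem A.3, evaluated by numerical integration.

```r
tau    <- 5                                  # restriction time (years)
p      <- c(ctrl = 0.19, trt = 0.19 + 0.19)  # pCR probabilities: p0 and p0 + delta_p
a_r    <- c(ctrl = 8.37, trt = 35.90)        # mean survival time, responders
a_nr   <- c(ctrl = 5.61, trt = 5.61)         # mean survival time, non-responders
a_cens <- 7                                  # mean of the exponential censoring time

# Integral of an exponential survival function (mean a) over [t, tau]
rmst_exp <- function(a, t = 0) a * (exp(-t / a) - exp(-tau / a))

# Mixture survival, mixture density, and integral of the mixture survival over [t, tau]
S_mix <- function(t, k) p[[k]] * exp(-t / a_r[[k]]) + (1 - p[[k]]) * exp(-t / a_nr[[k]])
f_mix <- function(t, k) p[[k]] * exp(-t / a_r[[k]]) / a_r[[k]] +
  (1 - p[[k]]) * exp(-t / a_nr[[k]]) / a_nr[[k]]
K_tail <- function(t, k) p[[k]] * rmst_exp(a_r[[k]], t) + (1 - p[[k]]) * rmst_exp(a_nr[[k]], t)

# RMST differences: responders, non-responders, and overall (mixture)
delta_r  <- rmst_exp(a_r[["trt"]])  - rmst_exp(a_r[["ctrl"]])
delta_nr <- rmst_exp(a_nr[["trt"]]) - rmst_exp(a_nr[["ctrl"]])
delta    <- K_tail(0, "trt") - K_tail(0, "ctrl")

# Asymptotic variance of the RMST estimator in arm k; -dS(t) is written as f(t) dt
var_rmst <- function(k) {
  integrand <- function(t) {
    K_tail(t, k)^2 * f_mix(t, k) / (S_mix(t, k)^2 * exp(-t / a_cens))
  }
  integrate(integrand, lower = 0, upper = tau)$value
}

# Total sample size: one-sided alpha = 0.05, power = 0.80, allocation pi = 0.5
alloc   <- 0.5
n_total <- (qnorm(0.95) + qnorm(0.80))^2 / delta^2 *
  (var_rmst("ctrl") / alloc + var_rmst("trt") / (1 - alloc))

round(c(delta = delta, delta_r = delta_r, delta_nr = delta_nr), 2)
ceiling(n_total)
```

The three RMST differences round to 0.43, 0.90 and 0, in agreement with the values reported above. The same inputs give 5-year survival rates exp(-tau / a) of about 0.55, 0.41 and 0.87, which correspond to the S0_r, S0_nr and S0_r + diffS_r arguments of the survm_samplesize call. The total sample size from this sketch should be close to the reported 466 but need not match it exactly, since the package call takes rounded survival rates as inputs and may integrate differently.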

6. Simulation Study

In this section, we conduct additional simulation studies to evaluate the performance of the proposed sample size calculation in terms of the significance level and the power. We simulated a short-term binary endpoint according to the probability of responding to the treatment in the control arm, $p^{(0)}$, and a difference in response rate between arms of $\delta_p = p^{(1)} - p^{(0)}$. For the time-to-event endpoint, we generated the survival times from Weibull distributions for responders and non-responders, with scale parameters $a_r^{(i)}$ and $a_{nr}^{(i)}$, respectively, and common shape parameter $b$; that is:

$$S_r^{(i)}(t) = \exp\{-(t/a_r^{(i)})^b\} \quad \text{and} \quad S_{nr}^{(i)}(t) = \exp\{-(t/a_{nr}^{(i)})^b\}. \tag{10}$$

The censoring distributions were assumed equal between groups and exponential with scale parameter $a_{cens} = [\ldots] \cdot m_{nr}^{(0)}$, where $m_{nr}^{(0)}$ is the mean of the non-responders in group 0, that is, $m_{nr}^{(i)} = a_{nr}^{(i)} \cdot \Gamma(1 + 1/b)$. The same assumption was made in Hatzis et al.[6]

The parameter values used for the simulations are found in Table 2. We have only considered scenarios that produce realistic situations and, in particular, that satisfy $\Delta_r \geq \Delta_{nr} \geq 0$ and $\Delta \geq 0$. The set of scenarios we considered is available in the GitHub repository survmixer. For each of these scenarios, we computed the required sample size using (9) for a one-sided test with power $1 - \beta = 0.80$ at significance level $\alpha = 0.05$. Only those scenarios that result in sample sizes between 100 and 5000 were taken into account. The total number of scenarios considered was 144.

We performed 1000 replications for each configuration and evaluated the power and the significance level by using the RMST test in (7). For comparison purposes, we also present the results using the log-rank test.
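The data-generating mechanism described above can be sketched in a few lines of base R. This is an illustration only: simulate_arm and its argument names are hypothetical (not part of survmixer), and the parameter values in the usage lines are borrowed from the NOAH-based example with $b = 1$ rather than taken from Table 2. Note that rweibull uses the same parameterization as (10), with survival function exp(-(t/scale)^shape).

```r
# Simulate one treatment arm under the response/survival mixture model.
# Hypothetical helper, not a survmixer function.
simulate_arm <- function(n, p, a_r, a_nr, b, a_cens) {
  responder <- rbinom(n, size = 1, prob = p)            # short-term binary endpoint
  scale_i   <- ifelse(responder == 1, a_r, a_nr)        # Weibull scale by response status
  t_event   <- rweibull(n, shape = b, scale = scale_i)  # S(t) = exp(-(t / scale)^b)
  t_cens    <- rexp(n, rate = 1 / a_cens)               # exponential censoring, mean a_cens
  data.frame(
    responder = responder,
    time      = pmin(t_event, t_cens),                  # observed follow-up time
    status    = as.integer(t_event <= t_cens)           # 1 = event observed, 0 = censored
  )
}

set.seed(2020)
# Illustrative values only (assumed, not the Table 2 scenarios)
p0 <- 0.19; delta_p <- 0.19; b <- 1
ctrl <- simulate_arm(n = 233, p = p0,           a_r = 8.37,  a_nr = 5.61, b = b, a_cens = 7)
trt  <- simulate_arm(n = 233, p = p0 + delta_p, a_r = 35.90, a_nr = 5.61, b = b, a_cens = 7)

# Mean survival time of non-responders in the control arm, a * Gamma(1 + 1/b)
m0_nr <- 5.61 * gamma(1 + 1 / b)
```

To simulate under the null hypothesis, the same function can be called for both arms with identical values of p, a_r and a_nr, which corresponds to $\delta_p = 0$ and equal scale parameters across arms.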

The scenarios yield RMST differences for overall survival between 0.15 and 1.82, with median 0.60, and sample sizes between 125 and 4824, with median 685. Figure 1 in the supplementary material summarizes the sample sizes and effect sizes obtained for overall survival under the considered scenarios with respect to settings I to IV discussed in Section 3.2.1. The scenarios corresponding to settings I and III are the ones with smaller effect sizes, and thus require larger sample sizes to achieve the same power.

Figure 2 shows boxplots of the empirical power and the significance level when using the RMST test and the log-rank test with the same sample sizes. The power obtained using the RMST test is centered around 0.80 and has small variability. By comparison, the power using the log-rank test is in general less than 0.80 and shows greater variability than the power using the RMST test. The empirical significance level is close to the nominal level of 0.05 for both tests.

Table 2: Simulation scenarios. When simulating under the null hypothesis, we set $a_{nr}^{(1)} = a_{nr}^{(0)}$, $a_r^{(1)} = a_r^{(0)}$, and $\delta_p = 0$. When simulating under the alternative hypothesis, we did not consider all possible combinations; we restricted our calculations to those scenarios that yield $\Delta_r \geq \Delta_{nr} \geq 0$ and $\Delta \geq 0$, and whose resulting sample sizes are between 100 and 5000.

Parameters                                         Values
$\tau$                                             […]
$p^{(0)}$                                          […]
$b$                                                […]
$a_r^{(0)}$                                        […]
$a_{nr}^{(0)}$                                     […]
Null hypothesis:                                   $a_{nr}^{(1)} = a_{nr}^{(0)}$, $a_r^{(1)} = a_r^{(0)}$, $\delta_p$ […]
Alternative hypothesis: $a_{nr}^{(1)}$             […]
Alternative hypothesis: $a_r^{(1)}$                […]
Alternative hypothesis: $\delta_p$                 […]
$\alpha$                                           […]
$\beta$                                            […]

7. Discussion

Pathologic complete response (pCR) is a common primary endpoint for a phase II trial or for accelerated approval of neoadjuvant cancer therapy. If granted, the regulatory agencies FDA[1] and EMA[2] require a two-arm confirmatory trial to demonstrate efficacy with a long-term survival outcome, such as overall survival (OS). Because there is no direct way to relate the effect on pCR to the effect on the survival endpoint, calculating the sample size from anticipated values for OS is not straightforward. In this work, we have addressed how to size trials for a survival endpoint based on previous information on a short-term binary endpoint such as pCR. Our proposal is built on the mixture distribution between the binary response and the survival by response, and uses the difference of restricted mean survival times (RMSTs) as the effect measure to compare the two treatment arms. We show that both the sample size and the effect size can be written in terms of the probability of response in each treatment arm and the survival functions of responders and non-responders. The proposed design is suitable for designing long-term phase III neoadjuvant trials given short-term binary outcomes since, among other reasons, it does not rely on the proportional hazards assumption.

In recent years, the difference between RMSTs has emerged as an alternative to the hazard ratio for measuring the treatment effect when the proportional hazards assumption is in doubt.[13, 14] Although the RMSTs and their variances can be estimated from data, at the planning stage their computation involves numerical integrals whose analytic solutions are usually hard to obtain, which makes it difficult to calculate the sample size. In this work, we have proposed sample size calculations for planning the trial using RMSTs based on interpretable parameters without resorting to simulation. These calculations provide an explicit solution for the sample size under the assumption that the survival functions of both responders and non-responders follow a commonly used parametric distribution such as the exponential or the Weibull. All the necessary sample size calculations have been implemented in the R package survmixer. The proposed sample size formulae, based on asymptotic results for the RMST difference, are intended for phase III trials where the sample size is expected to be moderate or large.

Acknowledgements

This work was supported by the Ministerio de Economía y Competitividad (Spain) under Grants PID2019-104830RB-I00 and MTM2015-64465-C2-1-R (MINECO/FEDER); the Departament d'Empresa i Coneixement de la Generalitat de Catalunya (Spain) under Grant 2017 SGR 622 (GRBIO); and the Ministerio de Economía y Competitividad (Spain), through the María de Maeztu Programme for Units of Excellence in R&D under Grant MDM-[…]

Figure 2: Empirical statistical power and empirical significance level using the RMST test in (7) and using the log-rank test according to the settings discussed in Section 3.2.1.

Appendix A. Derivations of the survival functions based on the mixture model

For the $i$-th group ($i = 0, 1$) and the $j$-th patient ($j = 1, \ldots, n^{(i)}$), let $T_{ij}$ and $C_{ij}$ be the time from randomization to the long-term survival outcome and to censoring, respectively, and assume that $T_{ij}$ and $C_{ij}$ are independent. Let $X_{ij} = 1$ if the $j$-th patient is a responder and $X_{ij} = 0$ otherwise, and let $p^{(i)}$ be the probability of having responded. Let $S^{(i)}(\cdot)$ be the survival function of $T_{ij}$, and let $S_r^{(i)}(\cdot)$ and $S_{nr}^{(i)}(\cdot)$ denote the survival functions of responders and non-responders. Let $K_r^{(i)}(\tau) = \int_0^\tau S_r^{(i)}(t)\,dt$ and $K_{nr}^{(i)}(\tau) = \int_0^\tau S_{nr}^{(i)}(t)\,dt$ be the restricted mean survival times at $\tau$ for responders and non-responders in the $i$-th group, respectively.

Theorem A.1. The long-term survival function, $S^{(i)}(\cdot)$, can be expressed in terms of the response probability, $p^{(i)}$, and the survival functions for responders and for non-responders, $S_r^{(i)}(t)$ and $S_{nr}^{(i)}(t)$, respectively, as follows:

$$S^{(i)}(t) = p^{(i)} \cdot S_r^{(i)}(t) + (1 - p^{(i)}) \cdot S_{nr}^{(i)}(t), \tag{A.1}$$

and the restricted mean survival time is given by:

$$K^{(i)}(\tau) = p^{(i)} K_r^{(i)}(\tau) + (1 - p^{(i)}) K_{nr}^{(i)}(\tau). \tag{A.2}$$

Proof. By definition of the responders and non-responders survival functions, $S_r^{(i)}(t) = P(T_{ij} > t \mid X_{ij} = 1)$ and $S_{nr}^{(i)}(t) = P(T_{ij} > t \mid X_{ij} = 0)$. Hence,

$$S^{(i)}(t) = P(T_{ij} > t, X_{ij} = 1) + P(T_{ij} > t, X_{ij} = 0) = p^{(i)} \cdot S_r^{(i)}(t) + (1 - p^{(i)}) \cdot S_{nr}^{(i)}(t),$$

$$K^{(i)}(\tau) = \int_0^\tau \left( S_r^{(i)}(t)\, p^{(i)} + S_{nr}^{(i)}(t)\,(1 - p^{(i)}) \right) dt = p^{(i)} \int_0^\tau S_r^{(i)}(t)\,dt + (1 - p^{(i)}) \int_0^\tau S_{nr}^{(i)}(t)\,dt,$$

which is the expression in (A.2). $\square$

Theorem A.2. Treatment effects between groups for the long-term survival outcome, in terms of the difference in $t$-year survival rates, the difference in restricted mean survival times, or the hazard ratio, are functions of the probabilities of being a responder, $p^{(0)}$ and $p^{(1)}$, and the survival-by-response functions $S_r^{(0)}(t)$, $S_r^{(1)}(t)$, $S_{nr}^{(0)}(t)$ and $S_{nr}^{(1)}(t)$, as follows:

(i) The difference in $t$-year survival rates is given by:

$$S^{(1)}(t) - S^{(0)}(t) = p^{(1)} \left( S_r^{(1)}(t) - S_{nr}^{(1)}(t) \right) - p^{(0)} \left( S_r^{(0)}(t) - S_{nr}^{(0)}(t) \right) + S_{nr}^{(1)}(t) - S_{nr}^{(0)}(t).$$

(ii) The difference between the restricted mean survival times is as follows:

$$K^{(1)}(\tau) - K^{(0)}(\tau) = \left( p^{(1)} K_r^{(1)}(\tau) - p^{(0)} K_r^{(0)}(\tau) \right) + \left( (1 - p^{(1)}) K_{nr}^{(1)}(\tau) - (1 - p^{(0)}) K_{nr}^{(0)}(\tau) \right).$$

(iii) The hazard ratio has the following expression:

$$\mathrm{HR}(t) = \frac{p^{(0)} \cdot S_r^{(0)}(t) + (1 - p^{(0)}) \cdot S_{nr}^{(0)}(t)}{p^{(1)} \cdot S_r^{(1)}(t) + (1 - p^{(1)}) \cdot S_{nr}^{(1)}(t)} \cdot \frac{p^{(1)} \cdot f_r^{(1)}(t) + (1 - p^{(1)}) \cdot f_{nr}^{(1)}(t)}{p^{(0)} \cdot f_r^{(0)}(t) + (1 - p^{(0)}) \cdot f_{nr}^{(0)}(t)}, \tag{A.3}$$

where $f_r^{(i)}(\cdot)$ and $f_{nr}^{(i)}(\cdot)$ denote the density functions for responders and non-responders in the $i$-th group, respectively.

Proof. The proofs of (i) and (ii) follow from Theorem A.1. To see (iii), first notice that:

$$\mathrm{HR}(t) = \frac{-d \ln[S^{(1)}(t)]/dt}{-d \ln[S^{(0)}(t)]/dt}.$$

If we replace $S^{(i)}(t)$ ($i = 0, 1$) by its expression in (A.1), and use that $-d \ln[S^{(i)}(t)]/dt = f^{(i)}(t)/S^{(i)}(t)$ with $f^{(i)}(t) = p^{(i)} f_r^{(i)}(t) + (1 - p^{(i)}) f_{nr}^{(i)}(t)$, we have:

$$\mathrm{HR}(t) = \frac{-d \ln[p^{(1)} \cdot S_r^{(1)}(t) + (1 - p^{(1)}) \cdot S_{nr}^{(1)}(t)]/dt}{-d \ln[p^{(0)} \cdot S_r^{(0)}(t) + (1 - p^{(0)}) \cdot S_{nr}^{(0)}(t)]/dt} = \frac{S^{(0)}(t)}{S^{(1)}(t)} \cdot \frac{p^{(1)} \cdot f_r^{(1)}(t) + (1 - p^{(1)}) \cdot f_{nr}^{(1)}(t)}{p^{(0)} \cdot f_r^{(0)}(t) + (1 - p^{(0)}) \cdot f_{nr}^{(0)}(t)}.$$

Finally, the expression in (A.3) is obtained by substituting $S^{(0)}(t)$ and $S^{(1)}(t)$ with their expressions in terms of the response probability and survival-by-response functions in (A.1). $\square$

Theorem A.3. Consider the hypothesis problem given in (1), that is,

$$H_0: S^{(0)}(t) = S^{(1)}(t), \ \forall t \in [0, \tau] \quad \text{vs} \quad H_1: S^{(1)}(t) \geq S^{(0)}(t), \ S^{(1)}(t) > S^{(0)}(t) \text{ for some } t \in [0, \tau],$$

and the statistic $Z_{s,n}$ in (7),

$$Z_{s,n} = \left( \hat{K}^{(1)}(\tau) - \hat{K}^{(0)}(\tau) \right) \Big/ \sqrt{\frac{(\hat{\sigma}^{(0)}(\tau))^2}{n^{(0)}} + \frac{(\hat{\sigma}^{(1)}(\tau))^2}{n^{(1)}}}.$$

The sample size $n(p, S_r, S_{nr}, G)$ needed for running a trial at one-sided significance level $\alpha$ with power $1 - \beta$ is given by:

$$n(p, S_r, S_{nr}, G) = \frac{(z_\alpha + z_\beta)^2}{\left( \left( p^{(1)} K_r^{(1)}(\tau) - p^{(0)} K_r^{(0)}(\tau) \right) + \left( (1 - p^{(1)}) K_{nr}^{(1)}(\tau) - (1 - p^{(0)}) K_{nr}^{(0)}(\tau) \right) \right)^2} \cdot \left( \frac{(\sigma^{(0)}(\tau))^2}{\pi} + \frac{(\sigma^{(1)}(\tau))^2}{1 - \pi} \right), \tag{A.4}$$

where $\pi = \lim n^{(0)}/n$, and $z_x$ is the $100 \times (1 - x)$-th percentile of the standard normal distribution; and where the variance $(\sigma^{(i)}(\tau))^2$ is:

$$(\sigma^{(i)}(\tau))^2 = -p^{(i)} \int_0^\tau \frac{\left( \int_t^\tau \left( S_r^{(i)}(u)\, p^{(i)} + S_{nr}^{(i)}(u)\,(1 - p^{(i)}) \right) du \right)^2}{\left( S_r^{(i)}(t)\, p^{(i)} + S_{nr}^{(i)}(t)\,(1 - p^{(i)}) \right)^2 \cdot G^{(i)}(t)} \, dS_r^{(i)}(t) - (1 - p^{(i)}) \int_0^\tau \frac{\left( \int_t^\tau \left( S_r^{(i)}(u)\, p^{(i)} + S_{nr}^{(i)}(u)\,(1 - p^{(i)}) \right) du \right)^2}{\left( S_r^{(i)}(t)\, p^{(i)} + S_{nr}^{(i)}(t)\,(1 - p^{(i)}) \right)^2 \cdot G^{(i)}(t)} \, dS_{nr}^{(i)}(t).$$

Proof.

The required sample size $n(p, S_r, S_{nr}, G)$ needs to satisfy the following equation:

$$\frac{K^{(1)}(\tau) - K^{(0)}(\tau)}{\sqrt{\dfrac{(\sigma^{(0)}(\tau))^2}{n^{(0)}} + \dfrac{(\sigma^{(1)}(\tau))^2}{n^{(1)}}}} = z_\alpha + z_\beta,$$

and hence:

$$n = \frac{(z_\alpha + z_\beta)^2}{(D(\tau))^2} \cdot \left( \frac{(\sigma^{(0)}(\tau))^2}{\pi} + \frac{(\sigma^{(1)}(\tau))^2}{1 - \pi} \right), \tag{A.5}$$

where $\sigma^{(i)}$ is given in (6). Taking into account that the overall mean survival improvement can be written as:

$$D(\tau) = \left( p^{(1)} K_r^{(1)}(\tau) - p^{(0)} K_r^{(0)}(\tau) \right) + \left( (1 - p^{(1)}) K_{nr}^{(1)}(\tau) - (1 - p^{(0)}) K_{nr}^{(0)}(\tau) \right),$$

the expression of $n(p, S_r, S_{nr}, G)$ in (A.4) is obtained by substituting the survival function $S^{(i)}(\cdot)$ in $\sigma^{(i)}$ with the survival mixture given in (2) and inserting the result into formula (A.5). $\square$

References

[1] Food and Drug Administration (FDA). Guidance for Industry. Pathological Complete Response in Neoadjuvant Treatment of High-Risk Early-Stage Breast Cancer: Use as an Endpoint to Support Accelerated Approval. October 2014.
[2] European Medicines Agency (EMA). The role of the pathological Complete Response as an endpoint in neoadjuvant breast cancer studies. March 2014.
[3] De Michele A, Yee D, Berry DA, Albain KS, Benz CC, Boughey J, Buxton M, Chia SK, Chien AJ, Chui SY, Clark A, Edmiston K, Elias AD, Forero-Torres A, Haddad TC, Haley B, Haluska P, Hylton NM, Isaacs C, Esserman LJ. The neoadjuvant model is still the future for drug development in breast cancer. Clinical Cancer Research. 2015;21(13):2911-2915.
[4] Cortazar P, Zhang L, Untch M, Mehta K, Costantino JP, Wolmark N, Bonnefoi H, Cameron D, Gianni L, Valagussa P, Swain SM, Prowell T, Loibl S, Wickerham DL, Bogaerts J, Baselga J, Perou C, Blumenthal G, Blohmer J, Von Minckwitz G. Pathological complete response and long-term clinical benefit in breast cancer: The CTNeoBC pooled analysis. The Lancet. 2014;384(9938):164-172.
[5] Broglio KR, Quintana M, Foster M, Olinger M, McGlothlin A, Berry SM, Boileau JF, Brezden-Masley C, Chia S, Dent S, Gelmon K, Paterson A, Rayson D, Berry DA. Association of pathologic complete response to neoadjuvant therapy in HER2-positive breast cancer with long-term outcomes: a meta-analysis. JAMA Oncology. 2016;2(6):751-760.
[6] Hatzis C, Symmans WF, Zhang Y, Gould RE, Moulder SL, Hunt KK, Abu-Khalaf M, Hofstatter EW, Lannin D, Chagpar AB, Pusztai L. Relationship between complete pathologic response to neoadjuvant chemotherapy and survival in triple-negative breast cancer. Clinical Cancer Research. 2016;22(1):26-33.
[7] Inoue LYT, Thall PF, Berry DA. Seamlessly expanding a randomized phase II trial to phase III. Biometrics. 2002;58(4):823-831.
[8] Lai TL, Lavori PW, Shih MC. Sequential design of phase II-III cancer trials. Statistics in Medicine. 2012;31(18):1944-1960.
[9] Chen BE, Wang J. Joint modeling of binary response and survival for clustered data in clinical trials. Statistics in Medicine. 2020;39(3):326-339.
[10] Abberbock J, Anderson S, Rastogi P, Tang G. Assessment of effect size and power for survival analysis through a binary surrogate endpoint in clinical trials. Statistics in Medicine. 2019;38(3):301-314.
[11] Kalbfleisch JD, Prentice RL. Estimation of the average hazard ratio. Biometrika. 1981;68:105-112.
[12] Rauch G, Brannath W, Brückner M, Kieser M. The Average Hazard Ratio – A Good Effect Measure for Time-to-event Endpoints when the Proportional Hazard Assumption is Violated? Methods of Information in Medicine. 2018;57(3):89-100.
[13] Royston P, Parmar MKB. The use of restricted mean survival time to estimate the treatment effect in randomized clinical trials when the proportional hazards assumption is in doubt. Statistics in Medicine. 2011;30(19):2409-2421.
[14] Royston P, Parmar MK. Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome. BMC Medical Research Methodology. 2013;13(1):152.
[15] Zhao L, Tian L, Uno H, Solomon SD, Pfeffer MA, Schindler JS, Wei LJ. Utilizing the integrated difference of two survival functions to quantify the treatment contrast for designing, monitoring, and analyzing a comparative clinical study. Clinical Trials. 2012;9(5):570-577.
[16] Hernán MA. The Hazards of Hazard Ratios. Epidemiology. 2010;21(1):13-15.
[17] Hernán MA. How to estimate the effect of treatment duration on survival outcomes using observational data. The BMJ. 2018;360:k182.
[18] Luo X, Huang B, Quan H. Design and monitoring of survival trials based on restricted mean survival times. Clinical Trials. 2019;16(6):616-625.
[19] Zhao L, Claggett B, Tian L, Uno H, Pfeffer MA, Solomon SD, Trippa L, Wei LJ. On the restricted mean survival time curve in survival analysis. Biometrics. 2016;72(1):215-221.
[20] Pepe MS, Fleming TR. Weighted Kaplan-Meier Statistics: A Class of Distance Tests for Censored Survival Data. Biometrics. 1989;45(2):497-507.
[21] Gill RD. Censoring and Stochastic Integrals, Mathematical Centre Tracts 124. 1980, Amsterdam.
[22] Gianni L, Eiermann W, Semiglazov V, Manikhas A, Lluch A, Tjulandin S, Zambetti M, Vazquez F, Byakhow M, Lichinitser M, Climent MA, Ciruelos E, Ojeda B, Mansutti M, Bozhok A, Baronio R, Feyereislova A, Barton C, Valagussa P, Baselga J. Neoadjuvant chemotherapy with trastuzumab followed by adjuvant trastuzumab versus neoadjuvant chemotherapy alone, in patients with HER2-positive locally advanced breast cancer (the NOAH trial): a randomised controlled superiority trial with a parallel HER2-negative cohort. The Lancet. 2010;375(9712):377-384.
[23] Gianni L, Eiermann W, Semiglazov V, Lluch A, Tjulandin S, Zambetti M, Moliterni A, Vazquez F, Byakhov MJ, Lichinitser M, Climent MA, Ciruelos E, Ojeda B, Mansutti M, Bozhok A, Magazzu D, Heinzmann D, Steinseifer J, Valagussa P, Baselga J. Neoadjuvant and adjuvant trastuzumab in patients with HER2-positive locally advanced breast cancer (NOAH): Follow-up of a randomised controlled superiority trial with a parallel HER2-negative cohort. The Lancet Oncology. 2014;15(6):640-647.