An introduction to the determination of the probability of a successful trial: Frequentist and Bayesian approaches

Abstract

Determination of posterior probability for go/no-go decisions and of predictive power is becoming increasingly common for resource optimization in clinical investigation. There is a vast published literature on these topics; however, the terminology is not used consistently across it. Further, there is a lack of a consolidated presentation of the various concepts of the probability of success. We attempt to fill this gap. This paper first provides a detailed derivation of these probability of success measures under the frequentist and Bayesian paradigms in a general setting. Subsequently, we present the analytical formulas for these measures for continuous, binary, and time-to-event endpoints separately. This paper can be used as a single point of reference to determine the following measures: (a) the conditional power (CP) based on interim results, (b) the predictive power of success (PPoS) based on interim results with or without prior distribution, and (c) the probability of success (PoS) for a prospective trial at the design stage. We discuss both clinical success and trial success. The discussion is mostly based on the normal approximation for the prior distribution and the estimate of the parameter of interest. In addition, predictive power using the beta prior for the binomial case is presented. Some examples are given for illustration. R functions to calculate CP and PPoS are available through the LongCART package. An R shiny app is also available at https://ppos.herokuapp.com/.


arXiv [stat.ME]

Madan G Kundu ∗†, Sandipan Samanta, Shoubhik Mondal
Daiichi-Sankyo Inc. (DSI), Basking Ridge, NJ, USA; QIAGEN GmbH, Hilden, Germany; Boehringer Ingelheim, CT, USA
March 1, 2021


Keywords: B-value, Beta-Binomial, Clinical success, Conditional power, Normal-normal approximation, Predictive power of success (PPoS), Prior distribution, Probability of success (PoS), Trial success.

∗ Corresponding author: Madan G. Kundu
† This article reflects the views of the authors only and should not be construed to represent the views or policies of their affiliated organizations.

The need to determine the probability of study success may arise at various stages of a clinical trial. For example, the goal of such an exercise could be making go or no go decision based on data from […] $\theta$, where $\theta$ denotes the true treatment effect. The basis of this anticipation about $\theta$ could be either the available interim results, or the results from previous trials (and clinical judgement), or some combination of all these sources, but in general, $\theta$ is not precisely known. In the frequentist approach, the power and CP are calculated using the specified (assumed) value of $\theta$. However, the power and CP may not give a good indication of overall trial success [3], as they treat the assumed value of $\theta$ as its true value. Therefore, in the Bayesian approach, instead of assuming a specific value of $\theta$, the knowledge about $\theta$ is summarized as a distribution of $\theta$, which is then used to calculate PoS or PPoS. This distribution of $\theta$ could be either the prior distribution (when no interim results are involved) or the predictive distribution (when interim results are involved). For this reason, PoS is also viewed as the average 'power' over the prior distribution of $\theta$ [4, 5], whereas PPoS is the average CP over the predictive distribution of $\theta$. Note that, in the context of "clinical success", PoS and PPoS are also known as the probability of clinical success (PoCS) and the predictive power of clinical success (PPoCS), respectively.

Halperin et al. [6] suggested using conditional probability given the current results as a tool for efficacy and futility monitoring in the clinical trial. Use of the B-value to calculate the CP was first presented by Lan and Wittes [7]. The B-value is a function of the Z-value with the independent increment property and is discussed briefly in this paper in Section 2.
Lachin [8] has shown that any formal stopping boundaries based on the CP (along with the type I and II error probabilities) can be expressed using the B-value in a study with interim futility analysis. Lan, Hu and Proschan [9] have discussed the relationship between the CP and PPoS. Choi, Smith and Becker [10] and Spiegelhalter, Freedman and Blackburn [11] first applied the Bayesian methodology in monitoring clinical trials with a binary endpoint where analyses were carried out using conventional frequentist techniques. Such a framework is known as 'hybrid classical-Bayesian' [12] and was later applied to continuous endpoints [13] and survival endpoints [14] as well. One challenge in decoding all the information available in the literature is the inconsistent use of terminology in describing these concepts. For example, PPoS has been referred to as 'predictive power' [1, 9], 'Bayesian predictive power (BPP)' [15], 'predictive probability of statistical significance' [2] and 'probability of study success' [16]. On the other hand, the PoS is also referred to in the literature as 'average success probability' [4, 5], 'assurance' [3], and 'expected power' [5]. Furthermore, despite the wide popularity of these measures, the current literature still lacks a concise presentation of these concepts with expressions for these measures by type of endpoint for clinical practitioners to use. The present work attempts to fill that gap.

In this paper, we first discuss the concept of various probability of success measures before moving on to presenting the analytical formulas for continuous, binary and time-to-event endpoints separately, with examples. Here, we present the following measures: (a) CP based on the interim result, (b) PPoS based on the interim results, and (c) PoS at the design stage of a prospective trial. The PPoS is discussed with or without prior distribution.
The discussion in this paper is largely based on the normal-normal approximation; however, we have also discussed the PPoS for the beta-binomial case. R functions to calculate the CP and PPoS discussed in this paper are available through the LongCART package [17]. A user-friendly R shiny app is also available at https://ppos.herokuapp.com/.

We consider the following general form of hypothesis testing in a clinical trial:

$$H_0: \theta = 0 \quad \text{vs.} \quad H_1: \theta > 0$$

where $\theta$ is a parameter of interest. For example, it could be either a mean or a proportion in a given population, or their difference between two populations, or the log hazards ratio (HR). We also assume that results from the interim analysis performed after the accrual of $t$ amounts of "information" ($0 \le t \le 1$) are available.

At any given time of analysis, the information $t$ equals the proportion of evaluable subjects to the maximum number of planned subjects for continuous and binary endpoints, or the proportion of observed events to the maximum planned events for time-to-event endpoints. Let $\hat\theta(t)$ be the estimate of $\theta$ at the interim analysis with corresponding standard error (SE) $SE[\hat\theta(t)] = k/\sqrt{t}$. Note that $k$ is the SE of the estimate at the final analysis and does not depend on $t$. For example, $k = \sigma/\sqrt{N}$ and $k \approx 2/\sqrt{D}$ for continuous and survival cases, respectively. With this, for both the Z-test and the log-rank test, the test statistic $Z(t)$ can be expressed as

$$Z(t) = \frac{\hat\theta(t)}{SE[\hat\theta(t)]} = \frac{\hat\theta(t)}{k}\cdot\sqrt{t}, \quad \text{and reject } H_0 \text{ if } Z(t) > c(t)$$

where $c(t)$ is the rejection boundary. $c(t)$ must be identified in advance and should be such that it preserves the overall type I error of $\alpha$. In a single look design without any interim analysis, $c(1) = \Phi^{-1}(1-\alpha)$, where $\Phi(\cdot)$ denotes the cumulative distribution function of a standard normal variate. In a multiple looks design with one or more interim analyses, $c(1)$ must be determined according to the appropriate alpha spending function (e.g. [18]).

Since $E[Z(t)] = (\theta/k)\cdot\sqrt{t}$, the information growth in $Z(t)$ is proportional to $\sqrt{t}$. Following Lan and Wittes [7], the B-values are defined as follows:

$$B(t) = Z(t)\cdot\sqrt{t} = \frac{\hat\theta(t)}{k}\cdot t \tag{2.1}$$

with $B(0) = 0$ at the trial initiation and $B(1) = Z(1)$ at the end of the trial. Further,

$$\text{Cov}[B(t), B(s)] = \min(s, t) \quad \text{and} \quad \text{Var}[B(t)] = t \tag{2.2}$$

$B(t)$ has the following advantages over $Z(t)$: (a) information growth in $B(t)$ is proportional to $t$, as $E[B(t)] = (\theta/k)\cdot t$, and (b) $B(t)$ has independent increments, implying that $B(1) - B(t)$ is independent of $B(t)$. Because of these two advantages, it is often easier to work with $B(t)$ than with $Z(t)$.
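The covariance structure in Eq. (2.2) and the independent-increment property can be checked numerically. The following is a minimal simulation sketch, not part of the original paper: it assumes unit-variance normal observations with $\theta = 0$, and the function names, sample sizes, seed, and number of simulations are all illustrative choices.

```python
import math
import random


def simulate_b_values(n_interim, n_final, n_sims=20000, seed=1):
    """Simulate (B(t), B(1)) pairs under theta = 0 with unit-variance data."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_sims):
        s_interim = sum(rng.gauss(0.0, 1.0) for _ in range(n_interim))
        s_rest = sum(rng.gauss(0.0, 1.0) for _ in range(n_final - n_interim))
        # Z(t) = S_n / sqrt(n) and t = n / N, so B(t) = Z(t) * sqrt(t) = S_n / sqrt(N).
        b_t = s_interim / math.sqrt(n_final)
        b_1 = (s_interim + s_rest) / math.sqrt(n_final)
        pairs.append((b_t, b_1))
    return pairs


def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Illustrative design: interim look at t = 10 / 20 = 0.5.
pairs = simulate_b_values(n_interim=10, n_final=20)
b_t = [p[0] for p in pairs]
b_1 = [p[1] for p in pairs]
increment = [one - part for part, one in pairs]

var_b_t = covariance(b_t, b_t)              # expected to be close to t = 0.5
cov_b_t_b_1 = covariance(b_t, b_1)          # expected to be close to min(t, 1) = 0.5
cov_increment = covariance(b_t, increment)  # expected to be close to 0
print(var_b_t, cov_b_t_b_1, cov_increment)
```

The simulated values only approximate the theoretical ones; the agreement tightens as the number of simulations grows.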
With the interim results available at information time $t$, the uncertainty is now restricted to the results from the post-interim data (i.e., data contributing to the remaining information of $1-t$). As $B(1) - B(t)$ is independent of $B(t)$, we can decompose $B(1)$ as follows

$$B(1) = B(t) + [B(1) - B(t)]$$

Based on Eq. (2.1), it translates to

$$Z(1) = \sqrt{t}\cdot Z(t) + \sqrt{1-t}\cdot Z(1-t) \tag{3.1}$$

where $Z(1-t)$ is the test statistic using the post-interim data only (i.e., data accrued after information $t$) and is defined as follows:

$$Z(1-t) = \frac{\hat\theta(1-t)}{SE[\hat\theta(1-t)]} = \frac{\hat\theta(1-t)}{k/\sqrt{1-t}} = \frac{\hat\theta(1-t)\cdot\sqrt{1-t}}{k} \tag{3.2}$$

where $\hat\theta(1-t)$ is the estimate of $\theta$ based on post-interim data only. From Eq. (3.1), we have

$$\hat\theta(1) = t\cdot\hat\theta(t) + (1-t)\cdot\hat\theta(1-t) \tag{3.3}$$

Clearly, $Z(1-t)$ and $\hat\theta(1-t)$ are independent of $Z(t)$ and $\hat\theta(t)$. Further, after the interim analysis, $Z(t)$ and $\hat\theta(t)$ are fixed and known, but $Z(1-t)$ and $\hat\theta(1-t)$ are unknown and random. We call out "Trial success" at the time of the final analysis if $Z(1) > c(1)$. Based on Eq. (3.1) and Eq. (3.2), this translates to

$$\text{Trial success, if } \hat\theta(1-t) > \frac{k}{1-t}\cdot\left[c(1) - \sqrt{t}\cdot Z(t)\right]$$

Further, we call out "Clinical success" at the time of the final analysis if $\hat\theta(1) > \theta_{min}$. Based on Eq. (3.3), this translates to

$$\text{Clinical success, if } \hat\theta(1-t) > \frac{k}{1-t}\cdot\left[\frac{\theta_{min}}{k} - \sqrt{t}\cdot Z(t)\right]$$

Note the similarity in the definition of the "Trial success" and "Clinical success" criteria.
By replacing $c(1)$ with $\theta_{min}/k$ in the "Trial success" criterion, we can obtain the "Clinical success" criterion. Therefore, we define the general criterion of "success" as follows:

$$\text{Success, if } \hat\theta(1-t) > \frac{k}{1-t}\cdot\left[\gamma - \sqrt{t}\cdot Z(t)\right] \tag{3.4}$$

where $\gamma = c(1)$ for "Trial success" and $\gamma = \theta_{min}/k$ for "Clinical success".

3.1 Conditional power based on interim results

Conditional power (CP) is the conditional probability that a trial success would be observed at the final analysis given the interim results. Assuming the estimate of $\theta$ from the post-interim period to be $\theta'$, the CP is determined based on the following distribution

$$\hat\theta(1-t) \sim \text{Normal}\left[\theta', \frac{k^2}{1-t}\right]$$

Therefore, the CP is

$$CP(t \mid \theta') = 1 - \Phi\left(\frac{\frac{k}{1-t}\cdot\left[\gamma - \sqrt{t}\cdot Z(t)\right] - \theta'}{k/\sqrt{1-t}}\right) = \Phi\left(\frac{1}{\sqrt{1-t}}\left[\frac{\hat\theta(t)}{k}\left\{t + (1-t)\frac{\theta'}{\hat\theta(t)}\right\} - \gamma\right]\right) \tag{3.5}$$

It is very intuitive and common to replace $\theta'$ by $\hat\theta(t)$ to calculate the CP. In that case, the expression of the CP reduces to (e.g., see [9])

$$CP(t \mid \hat\theta(t)) = \Phi\left(\frac{1}{\sqrt{1-t}}\left[\frac{\hat\theta(t)}{k} - \gamma\right]\right) = \Phi\left(\frac{1}{\sqrt{1-t}}\left[\frac{Z(t)}{\sqrt{t}} - \gamma\right]\right) \tag{3.6}$$

This is the CP when the post-interim trend is expected to be similar to that observed in the interim analysis.

The CP depends on the assumed treatment effect to be observed in the post-interim data, and therefore, the calculation of the CP can be arbitrary. An alternative is to calculate the probability of success using the predictive distribution of $\hat\theta(1-t)$ given the interim results, which we refer to in this paper as the predictive power of success (PPoS). The PPoS can be viewed as the average of $CP(t \mid \theta)$ over the predictive distribution of $\theta$. The Bayesian framework allows us to incorporate the prior distribution of $\theta$ (e.g., based on historical data or clinicians' judgements) in the predictive distribution of $\hat\theta(1-t)$, although the prior distribution is not mandatory.
For ease of description, we first determine the expression for the PPoS with the prior distribution of $\theta$ and then present the expression without the prior distribution.

Suppose the prior knowledge about $\theta$ can be summarized using the following prior distribution:

$$\theta \sim \text{Normal}\left[\theta_0, \sigma_0^2\right] \tag{3.7}$$

With this prior, the posterior distribution of $\theta$ is

$$\theta \mid \hat\theta(t) \sim \text{Normal}\left[\psi\hat\theta(t) + (1-\psi)\theta_0,\ \left(\frac{1}{\sigma_0^2} + \frac{t}{k^2}\right)^{-1}\right] \equiv \text{Normal}\left[\psi\hat\theta(t) + (1-\psi)\theta_0,\ \psi\cdot\frac{k^2}{t}\right]$$

where $\psi = \dfrac{\sigma_0^2}{\sigma_0^2 + k^2/t}$ can be viewed as the proportion of the contribution of the interim data. Since $\hat\theta(1-t)$ is unknown at information time $t$, we compute the predictive distribution of $\hat\theta(1-t)$ as follows:

$$\hat\theta(1-t) \mid \hat\theta(t) \sim \text{Normal}\left[\psi\hat\theta(t) + (1-\psi)\theta_0,\ k^2\left(\frac{1}{1-t} + \psi\cdot\frac{1}{t}\right)\right]$$

We can now use the predictive distribution of $\hat\theta(1-t)$ to derive the PPoS as follows

$$\begin{aligned} PPoS(t \mid \hat\theta(t), \psi, \theta_0) &= 1 - \Phi\left(\frac{\frac{k}{1-t}\left[\gamma - \sqrt{t}\cdot Z(t)\right] - \psi\hat\theta(t) - (1-\psi)\theta_0}{k\sqrt{1/(1-t) + \psi/t}}\right) \\ &= \Phi\left(\sqrt{\frac{t}{1-t}}\cdot\frac{1}{\sqrt{(1-\psi)t + \psi}}\left[\frac{\hat\theta(t)}{k}\{t(1-\psi) + \psi\} + \frac{(1-t)(1-\psi)\theta_0}{k} - \gamma\right]\right) \end{aligned} \tag{3.8}$$

This is the PPoS given the interim results and also using the prior distribution. The PPoS is also calculated without the prior distribution (e.g. see [9]). Without the prior distribution, the PPoS can be derived as a special case of Eq. (3.8) by setting $\psi = 1$, which implies 100% contribution of the interim data to the predictive distribution of $\hat\theta(1-t)$. Therefore, the PPoS given the interim results without prior distribution is obtained as follows (e.g., see [9, 19]):

$$PPoS(t \mid \hat\theta(t)) = \Phi\left(\frac{1}{\sqrt{1-t}}\left[\frac{\hat\theta(t)}{k} - \gamma\right]\cdot\sqrt{t}\right) = \Phi\left(\frac{1}{\sqrt{1-t}}\left[\frac{Z(t)}{\sqrt{t}} - \gamma\right]\cdot\sqrt{t}\right) \tag{3.9}$$

The expressions of the PPoS in Eq. (3.9) and the CP in Eq. (3.6) are very similar except for the additional $\sqrt{t}$ in the numerator of the PPoS. As pointed out by Lan, Hu and Proschan [9], its simple consequence is that CP > PPoS for CP > 0.5 and CP < PPoS for CP < 0.5. In other words, because $\sqrt{t} < 1$ pulls the argument of $\Phi$ towards zero, the PPoS is always closer to 0.5, i.e., the CP is more extreme than the PPoS. It implies that a stopping rule based on the PPoS will always make it harder to stop a trial than a stopping rule based on the CP.
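As a minimal sketch of how Eq. (3.5), Eq. (3.6), Eq. (3.8) and Eq. (3.9) might be evaluated, the Python functions below take the interim estimate, $k$, $t$ and $\gamma$ as inputs. The function and argument names and the numerical inputs are hypothetical and chosen only for illustration; this is not the LongCART implementation.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def conditional_power(theta_hat, k, t, gamma, theta_future=None):
    """CP of Eq. (3.5); with theta_future=None it reduces to Eq. (3.6)."""
    if theta_future is None:
        theta_future = theta_hat  # assume the interim trend continues
    drift = (t * theta_hat + (1.0 - t) * theta_future) / k
    return PHI((drift - gamma) / sqrt(1.0 - t))


def ppos(theta_hat, k, t, gamma, prior_mean=None, prior_sd=None):
    """PPoS of Eq. (3.8); without a prior (psi = 1) it reduces to Eq. (3.9)."""
    if prior_mean is None or prior_sd is None:
        psi, prior_mean = 1.0, 0.0
    else:
        psi = prior_sd**2 / (prior_sd**2 + k**2 / t)
    weight = (1.0 - psi) * t + psi
    inner = (theta_hat / k) * weight + (1.0 - t) * (1.0 - psi) * prior_mean / k - gamma
    return PHI(sqrt(t / (1.0 - t)) * inner / sqrt(weight))


# Hypothetical interim result: estimate 0.3, final-analysis SE k = 0.1,
# half of the information accrued, and success boundary gamma = 1.96.
cp_value = conditional_power(0.3, 0.1, 0.5, 1.96)
ppos_value = ppos(0.3, 0.1, 0.5, 1.96)
print(cp_value, ppos_value)
```

Two limiting cases give a quick sanity check of the design: a very diffuse prior should reproduce the no-prior PPoS, and a very tight prior should reproduce the CP evaluated at $\theta'$ equal to the prior mean.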

The power at the design stage of a clinical trial is calculated assuming some specified treatment effect. An alternative is to calculate the probability of success (PoS) using the prior distribution of $\theta$. The PoS can also be viewed as the average power over the prior distribution. The PoS has also been referred to as 'assurance' [3], and 'expected power' or 'average success probability' [5]. Note that, at the design stage of a clinical trial, interim results are not available; hence, the PoS is entirely based on the prior distribution.

Assuming the prior distribution specified in Eq. (3.7) and expecting the SE in the trial to be $\tilde{k}$, the predictive distribution of $\hat\theta(1)$ would be

$$\hat\theta(1) \mid \theta_0 \sim \text{Normal}\left[\theta_0,\ \sigma_0^2 + \text{var}[\hat\theta(1)]\right] \equiv \text{Normal}\left[\theta_0,\ \sigma_0^2 + \tilde{k}^2\right]$$

In that case, the PoS given the prior distribution would be

$$PoS(\theta_0, \sigma_0) = \Pr[Z(1) > \gamma \mid \theta_0] = \Pr[\hat\theta(1) > \tilde{k}\cdot\gamma \mid \theta_0] = \Phi\left(\frac{\theta_0 - \tilde{k}\cdot\gamma}{\sqrt{\sigma_0^2 + \tilde{k}^2}}\right) \tag{3.10}$$

This section presents the expressions of the CP, PPoS and PoS to test hypotheses for continuous, binary and survival endpoints separately, based on the general expressions presented in the previous section with the normal distribution approximation. However, for the binomial case, we have discussed the PPoS using the beta prior as well. For the two-sample cases, the allocation ratio (treatment arm to control arm) is denoted as $a : 1$. We denote $r = (a+1)/\sqrt{a}$. Note that the expressions for the one-sample case can be obtained directly from the corresponding expressions for the two-sample case by specifying $r = 1$. The intuition for setting $r = 1$ is simple: the single arm design can be thought of as a 1 : 0 allocation ratio (instead of $a : 1$), in which case $r = (1+0)/\sqrt{1} = 1$. As before, $\gamma = c(1)$ for "Trial success" and $\gamma = \theta_{min}/k$ for "Clinical success".

We consider a study comparing treatment (T) with control (C) with a continuous endpoint and corresponding population means $\mu_T$ and $\mu_C$, respectively. Further, assume the maximum total sample size in the study is $N$. Here, we test the following hypotheses:

$$H_0: \mu_T - \mu_C = \Delta_0 \quad \text{vs.} \quad H_1: \mu_T - \mu_C > \Delta_0$$

Here, $\theta = \mu_T - \mu_C - \Delta_0$. At the interim analysis with total sample size $n$, the estimate of $\theta$ is $\hat\theta(t) = \delta_n - \Delta_0$, where $\delta_n$ is the difference in the sample means between the two arms at interim. The corresponding test statistic is

$$Z(t) = \frac{\delta_n - \Delta_0}{r\cdot s_n/\sqrt{n}} = \frac{(\delta_n - \Delta_0)\sqrt{n}}{r\cdot s_n}$$

where $s_n$ is the estimated pooled SD at the interim analysis. Further, in this case, $t = n/N$ and $k = r\cdot s_n/\sqrt{N}$.

Conditional power (CP): The CP with the future trend of the difference in the sample means as $\Delta'$ is

$$\Phi\left(\frac{1}{r\cdot s_n}\sqrt{\frac{N}{N-n}}\left[\frac{1}{\sqrt{N}}\left\{n(\delta_n - \Delta_0) + (N-n)(\Delta' - \Delta_0)\right\} - r\cdot s_n\cdot\gamma\right]\right) \tag{4.1}$$

If we assume that the current trend observed through the interim analysis continues to hold for the future data as well (i.e., $\Delta' = \delta_n$), then the expression for the CP is

$$\Phi\left(\frac{1}{r\cdot s_n}\sqrt{\frac{N}{N-n}}\left[(\delta_n - \Delta_0)\sqrt{N} - r\cdot s_n\cdot\gamma\right]\right) \tag{4.2}$$

Predictive power of success (PPoS): The expressions of the PPoS are as follows.

PPoS, without prior distribution:

$$\Phi\left(\frac{1}{r\cdot s_n}\sqrt{\frac{n}{N-n}}\left[(\delta_n - \Delta_0)\sqrt{N} - r\cdot s_n\cdot\gamma\right]\right) \tag{4.3}$$

PPoS, with assumed prior distribution:

$$\Phi\left(\frac{1}{r s_n}\sqrt{\frac{n}{N-n}}\cdot\frac{(1-\psi)\left\{n(\delta_n - \Delta_0) + (N-n)(\Delta_p - \Delta_0)\right\}/\sqrt{N} + \psi(\delta_n - \Delta_0)\sqrt{N} - r\cdot s_n\gamma}{\sqrt{\psi + (1-\psi)n/N}}\right) \tag{4.4}$$

where $\psi = n\sigma_0^2/(n\sigma_0^2 + r^2 s_n^2)$.

Probability of success (PoS): The expression of the PoS is as follows (e.g. see [3]):

$$\Phi\left(\frac{\sqrt{N}\cdot(\Delta_p - \Delta_0) - r\cdot\tilde\sigma\cdot\gamma}{\sqrt{N\cdot\sigma_0^2 + r^2\cdot\tilde\sigma^2}}\right) \tag{4.5}$$

where $\tilde\sigma$ is the expected pooled SD in the trial. Note that, for the calculation of the PPoS with prior distribution in Eq. (4.4) and the PoS in Eq. (4.5), the following prior of $\mu_T - \mu_C$ is used

$$\mu_T - \mu_C \sim \text{Normal}\left[\Delta_p, \sigma_0^2\right] \tag{4.6}$$

which corresponds to $\theta_0 = \Delta_p - \Delta_0$ in Eq. (3.7).

Let's consider a study with a single treatment arm and a continuous endpoint. Denote the population mean as $\mu$ and the maximum sample size in the study as $N$. We test the following hypotheses:

$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_1: \mu > \mu_0$$

Here, $\theta = \mu - \mu_0$. At the interim analysis with sample size $n$, the estimate of $\theta$ is $\hat\theta(t) = \bar{x}_n - \mu_0$, where $\bar{x}_n$ is the sample mean at interim. The corresponding test statistic is

$$Z(t) = \frac{\bar{x}_n - \mu_0}{s_n/\sqrt{n}} = \frac{(\bar{x}_n - \mu_0)\sqrt{n}}{s_n}$$

where $s_n$ is the estimated SD at the interim analysis. Further, in this case, $t = n/N$ and $k = s_n/\sqrt{N}$.

Conditional power (CP): The CP with the future trend of the sample mean as $\mu'$ is

$$\Phi\left(\frac{1}{s_n}\sqrt{\frac{N}{N-n}}\left[\frac{1}{\sqrt{N}}\left\{n(\bar{x}_n - \mu_0) + (N-n)(\mu' - \mu_0)\right\} - s_n\cdot\gamma\right]\right) \tag{4.7}$$

If we assume that the current trend observed through the interim analysis continues to hold for future data as well (i.e., $\mu' = \bar{x}_n$), then the expression of the CP is

$$\Phi\left(\frac{1}{s_n}\sqrt{\frac{N}{N-n}}\left[(\bar{x}_n - \mu_0)\sqrt{N} - s_n\cdot\gamma\right]\right) \tag{4.8}$$

Predictive power of success (PPoS): The expressions of the PPoS are as follows.

PPoS, without prior distribution:

$$\Phi\left(\frac{1}{s_n}\sqrt{\frac{n}{N-n}}\left[(\bar{x}_n - \mu_0)\sqrt{N} - s_n\cdot\gamma\right]\right) \tag{4.9}$$

PPoS, with assumed prior distribution:

$$\Phi\left(\frac{1}{s_n}\sqrt{\frac{n}{N-n}}\cdot\frac{(1-\psi)\left\{n(\bar{x}_n - \mu_0) + (N-n)(\mu_p - \mu_0)\right\}/\sqrt{N} + \psi(\bar{x}_n - \mu_0)\sqrt{N} - s_n\gamma}{\sqrt{\psi + (1-\psi)n/N}}\right) \tag{4.10}$$

where $\psi = n\sigma_0^2/(n\sigma_0^2 + s_n^2)$.

Probability of success (PoS): The expression of the PoS is as follows:

$$\Phi\left(\frac{\sqrt{N}\cdot(\mu_p - \mu_0) - \tilde\sigma\cdot\gamma}{\sqrt{N\cdot\sigma_0^2 + \tilde\sigma^2}}\right) \tag{4.11}$$

where $\tilde\sigma$ is the expected SD in the trial. Note that, for the calculation of the PPoS with prior distribution in Eq. (4.10) and the PoS in Eq. (4.11), the following prior for $\mu$ is used

$$\mu \sim \text{Normal}\left[\mu_p, \sigma_0^2\right] \tag{4.12}$$

Let's consider a study comparing treatment (T) with control (C) and a binary endpoint. Denote the population proportions as $\Pi_T$ and $\Pi_C$, respectively. Further, assume the maximum total sample size in the study is $N$. The hypotheses of interest are:

$$H_0: \Pi_T - \Pi_C = \Delta_0 \quad \text{vs.} \quad H_1: \Pi_T - \Pi_C > \Delta_0$$

Here, $\theta = \Pi_T - \Pi_C - \Delta_0$. At the interim analysis with total sample size $n$, the estimate of $\theta$ is $\hat\theta(t) = \delta_n - \Delta_0$, where $\delta_n = \hat\pi_{T,n} - \hat\pi_{C,n}$ is the difference in sample proportions between the two arms, with $\hat\pi_{T,n}$ and $\hat\pi_{C,n}$ as the estimates of the two proportions. Note that

$$SE(\delta_n) = \sqrt{\frac{\hat\pi_{T,n}(1 - \hat\pi_{T,n})}{a\cdot n/(1+a)} + \frac{\hat\pi_{C,n}(1 - \hat\pi_{C,n})}{n/(1+a)}} = r\cdot s_n/\sqrt{n}$$

with $s_n^2 = \dfrac{a}{a+1}\left\{\dfrac{\hat\pi_{T,n}(1 - \hat\pi_{T,n})}{a} + \hat\pi_{C,n}(1 - \hat\pi_{C,n})\right\}$. Therefore, the corresponding test statistic is

$$Z(t) = \frac{\delta_n - \Delta_0}{r\cdot s_n/\sqrt{n}} = \frac{(\delta_n - \Delta_0)\sqrt{n}}{r\cdot s_n}$$

Further, $t = n/N$ and $k = r\cdot s_n/\sqrt{N} = SE(\delta_n)\cdot\sqrt{t}$.

CP, PPoS and PoS with normal approximation: The expressions of the CP, PPoS and PoS for the two-sample binary case are similar to the continuous case:
• Eq. (4.1) for the CP with the expected difference from post-interim data as $\Delta'$.
• Eq. (4.2) for the CP with the expected difference similar to that observed at the interim analysis.
• Eq. (4.3) for the PPoS without prior distribution.

Further, assume that the following prior for $\Pi_T - \Pi_C$ is available to us

$$\Pi_T - \Pi_C \sim \text{Normal}\left[\Delta_p, \sigma_0^2\right] \tag{4.13}$$

Using this prior information,
• The PPoS with prior distribution can be obtained from Eq. (4.4) with $\psi = n\sigma_0^2/(n\sigma_0^2 + r^2 s_n^2)$.
• The PoS can be obtained from Eq. (4.5) with $\tilde\sigma^2 = \dfrac{a}{a+1}\left\{\dfrac{\tilde\pi_T(1 - \tilde\pi_T)}{a} + \tilde\pi_C(1 - \tilde\pi_C)\right\}$, where $\tilde\pi_T$ and $\tilde\pi_C$ are the expected proportions in the trial.

PPoS, with the beta priors: Let $X_T$ and $X_C$ be the observed numbers of desired outcomes (e.g., response) in the treatment arm and control arm, respectively, which are assumed to follow binomial distributions with the respective probabilities of response $\Pi_T$ and $\Pi_C$. Further, we assume the following prior distributions: $\Pi_T \sim \text{Beta}(a_T, b_T)$ and $\Pi_C \sim \text{Beta}(a_C, b_C)$. Denote the observed number of responses at the interim analysis as $x_T$ based on $n_T$ subjects in the treatment arm and $x_C$ based on $n_C$ subjects in the control arm, with $n_T + n_C = n$. Given the interim result, the posterior distributions of $\Pi_T$ and $\Pi_C$ are (e.g., see [2])

$$\Pi_T \mid x_T \sim \text{Beta}(x_T + a_T,\ n_T - x_T + b_T)$$
$$\Pi_C \mid x_C \sim \text{Beta}(x_C + a_C,\ n_C - x_C + b_C)$$

Let $Y_T$ and $Y_C$ be the numbers of observed responses from the remaining $N_T - n_T$ and $N_C - n_C$ subjects, respectively, with $N_T + N_C = N$.
The predictive distributions of $Y_T$ and $Y_C$ are (e.g., see [20])

$$\Pr(Y_T = y_T \mid x_T) = \binom{N_T - n_T}{y_T}\frac{B(x_T + y_T + a_T,\ N_T - x_T - y_T + b_T)}{B(x_T + a_T,\ n_T - x_T + b_T)}, \quad y_T = 0, 1, \cdots, N_T - n_T$$

$$\Pr(Y_C = y_C \mid x_C) = \binom{N_C - n_C}{y_C}\frac{B(x_C + y_C + a_C,\ N_C - x_C - y_C + b_C)}{B(x_C + a_C,\ n_C - x_C + b_C)}, \quad y_C = 0, 1, \cdots, N_C - n_C$$

where $B(u, v) = \dfrac{(u-1)!\,(v-1)!}{(u+v-1)!}$. Thus, the PPoS would be

$$\sum_{y_T=0}^{N_T - n_T}\ \sum_{y_C=0}^{N_C - n_C} I(\text{success} \mid x_T + y_T,\ x_C + y_C,\ N_T,\ N_C)\cdot\Pr(Y_T = y_T \mid x_T)\cdot\Pr(Y_C = y_C \mid x_C) \tag{4.14}$$

where $I(\cdot)$ is the indicator function for the success criterion, which could be either trial success (e.g., based on the approximate Z test or Fisher's exact test) or clinical success, indicating that the observed difference in proportions exceeds a certain clinically meaningful value.

Let's consider a study with a single treatment arm and a binary endpoint. Let $\Pi$ denote the population proportion and let the maximum sample size in the study be $N$. We test the following set of hypotheses:

$$H_0: \Pi = \Pi_0 \quad \text{vs.} \quad H_1: \Pi > \Pi_0$$

Here, $\theta = \Pi - \Pi_0$. At the interim analysis with sample size $n$, the estimate of $\theta$ is $\hat\theta(t) = \hat\pi_n - \Pi_0$, where $\hat\pi_n$ is the sample proportion at interim. The corresponding test statistic is

$$Z(t) = \frac{\hat\pi_n - \Pi_0}{s_n/\sqrt{n}} = \frac{(\hat\pi_n - \Pi_0)\sqrt{n}}{s_n}$$

where $s_n = \sqrt{\hat\pi_n(1 - \hat\pi_n)}$. Further, in this case, $t = n/N$ and $k = s_n/\sqrt{N} = SE(\hat\pi_n)\cdot\sqrt{t}$.

CP, PPoS and PoS with normal approximation: The expressions of the CP, PPoS and PoS for the one-sample binary case are similar to the continuous case:
• Eq. (4.7) (replacing $\bar{x}_n$ with $\hat\pi_n$, $\mu_0$ with $\Pi_0$, and $\mu'$ with $\Pi'$) for the CP with the expected proportion from post-interim data as $\Pi'$.
• Eq. (4.8) (replacing $\bar{x}_n$ with $\hat\pi_n$ and $\mu_0$ with $\Pi_0$) for the CP with the expected difference similar to that observed at the interim analysis.
• Eq. (4.9) (replacing $\bar{x}_n$ with $\hat\pi_n$ and $\mu_0$ with $\Pi_0$) for the PPoS without prior distribution.

Further, assume that the following prior for $\Pi$ is available to us

$$\Pi \sim \text{Normal}\left[\Pi_p, \sigma_0^2\right] \tag{4.15}$$

Using this prior information,
• The PPoS with prior distribution can be obtained from Eq. (4.10) (replacing $\bar{x}_n$ with $\hat\pi_n$, $\mu_0$ with $\Pi_0$ and $\mu_p$ with $\Pi_p$) with $\psi = n\sigma_0^2/(n\sigma_0^2 + s_n^2)$.
• The PoS can be obtained from Eq. (4.11) (replacing $\mu_0$ with $\Pi_0$ and $\mu_p$ with $\Pi_p$) with $\tilde\sigma = \sqrt{\tilde\pi(1 - \tilde\pi)}$, where $\tilde\pi$ is the expected proportion in the trial.

PPoS, with the beta prior: Let $X$ be the observed number of responses in the trial, which is assumed to follow a binomial distribution with probability of response $\Pi$. Further, we assume that the prior distribution of $\Pi$ is $\text{Beta}(a, b)$. Denote the observed number of responses as $x_n$ from $n$ subjects at the interim analysis. Given the interim result, the posterior distribution of $\Pi$ is

$$\Pi \mid x_n \sim \text{Beta}(x_n + a,\ n - x_n + b)$$

Let $Y$ be the number of observed responses from the remaining $N - n$ subjects. The predictive distribution of $Y$ is (e.g., see [20])

$$\Pr(Y = y \mid x_n) = \binom{N - n}{y}\frac{B(x_n + y + a,\ N - x_n - y + b)}{B(x_n + a,\ n - x_n + b)}, \quad y = 0, 1, \cdots, N - n$$

Thus, the PPoS would be

$$\sum_{y=0}^{N-n} I(\text{success} \mid x_n + y,\ N)\cdot\Pr(Y = y \mid x_n) \tag{4.16}$$

where $I(\cdot)$ is the indicator function that either the trial success criterion is met (e.g., based on the approximate Z test or exact binomial test) or clinical success is achieved, indicating that the observed proportion exceeds a certain clinically meaningful value.

4.5 Survival endpoint, two samples

We consider a clinical trial comparing treatment (T) with control (C) with a time-to-event endpoint. Denote the treatment to control hazards ratio as $\lambda$ and the maximum target number of events as $D$. Here, we test the following hypotheses:

$$H_0: \lambda = \lambda_0 \quad \text{vs.} \quad H_1: \lambda < \lambda_0$$

Here, $\theta = \log(\lambda_0/\lambda)$.
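At the interim analysis with the total number of events $D_{IA}$, the estimate of $\theta$ is $\hat\theta(t) = \log(\lambda_0/\hat\lambda_{IA})$, where $\hat\lambda_{IA}$ is the estimated HR. The corresponding log-rank statistic is approximately equivalent to

$$Z(t) = \frac{\log(\lambda_0/\hat\lambda_{IA})}{r}\sqrt{D_{IA}}$$

Further, in this case, $t = D_{IA}/D$ and $k = r/\sqrt{D}$.

Conditional power (CP): The expression of the CP assuming the estimated HR from the post-interim data as $\lambda'$ is

$$\Phi\left(\frac{1}{r}\sqrt{\frac{D}{D - D_{IA}}}\left[\frac{D_{IA}}{\sqrt{D}}\log\frac{\lambda_0}{\hat\lambda_{IA}} + \frac{D - D_{IA}}{\sqrt{D}}\log\frac{\lambda_0}{\lambda'} - r\cdot\gamma\right]\right) \tag{4.17}$$

If we assume that the current trend observed at the interim analysis continues to hold for future data as well (i.e., $\lambda' = \hat\lambda_{IA}$), then the expression of the CP is

$$\Phi\left(\frac{1}{r}\sqrt{\frac{D}{D - D_{IA}}}\left[\sqrt{D}\cdot\log\frac{\lambda_0}{\hat\lambda_{IA}} - r\cdot\gamma\right]\right) \tag{4.18}$$

Predictive power of success (PPoS): The expressions of the PPoS are as follows.

PPoS, without prior distribution:

$$\Phi\left(\frac{1}{r}\sqrt{\frac{D_{IA}}{D - D_{IA}}}\left[\sqrt{D}\cdot\log\frac{\lambda_0}{\hat\lambda_{IA}} - r\cdot\gamma\right]\right) \tag{4.19}$$

PPoS, with prior distribution (e.g., see [14]):

$$\Phi\left(\frac{1}{r}\sqrt{\frac{D_{IA}}{D - D_{IA}}}\cdot\frac{(1-\psi)\left\{\frac{D_{IA}}{\sqrt{D}}\cdot\log\frac{\lambda_0}{\hat\lambda_{IA}} + \frac{D - D_{IA}}{\sqrt{D}}\cdot\log\frac{\lambda_0}{\lambda_p}\right\} + \psi\cdot\sqrt{D}\cdot\log\frac{\lambda_0}{\hat\lambda_{IA}} - r\cdot\gamma}{\sqrt{\psi + (1-\psi)\frac{D_{IA}}{D}}}\right) \tag{4.20}$$

where $\psi = \dfrac{D_{IA}\cdot\sigma_0^2}{D_{IA}\cdot\sigma_0^2 + r^2}$.

Probability of success (PoS): The expression of the PoS is as follows (e.g., see [16]):

$$\Phi\left(\frac{\sqrt{D}\cdot\log(\lambda_0/\lambda_p) - r\cdot\gamma}{\sqrt{D\cdot\sigma_0^2 + r^2}}\right) \tag{4.21}$$

Note that, for the calculation of the PPoS and PoS with prior distribution, the following prior for $\lambda$ is used

$$\log\lambda \sim \text{Normal}\left[\log\lambda_p, \sigma_0^2\right] \tag{4.22}$$

5 Example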

In this section, we illustrate the calculation of the CP, PPoS and PoS based on published clinical trial results. We have supplemented these examples with made-up prior distributions and interim results for the sole purpose of illustration. The following R functions in the LongCART package can be used to calculate these measures: (a) PoS() to calculate the PoS at the design stage for all three types of endpoints, (b) succ_ia() to calculate the CP and PPoS based on interim results with the normal-normal approximation for all three types of endpoints, (c) succ_ia_betabinom_two() to calculate the PPoS based on interim results for the comparison of two proportions with beta priors, and (d) succ_ia_betabinom_one() to calculate the PPoS based on interim results for testing a single proportion with a beta prior. A user-friendly R shiny app is also available at https://ppos.herokuapp.com/ to calculate these measures.
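For readers who want to see the arithmetic behind the single-proportion beta-binomial case, the sketch below evaluates the predictive distribution of $Y$ and the sum in Eq. (4.16) in plain Python. It is an illustration only and is not the LongCART implementation: the function names, the interim counts, and the success rule (final observed proportion above 0.25) are all hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(u, v):
    """Logarithm of the beta function B(u, v)."""
    return lgamma(u) + lgamma(v) - lgamma(u + v)


def predictive_pmf(y, x, n, n_max, a, b):
    """Pr(Y = y | x): y responders among the remaining n_max - n subjects,
    given x responders among the first n and a Beta(a, b) prior."""
    remaining = n_max - n
    log_ratio = log_beta(x + y + a, n_max - x - y + b) - log_beta(x + a, n - x + b)
    return comb(remaining, y) * exp(log_ratio)


def ppos_beta_binomial(x, n, n_max, a, b, is_success):
    """Sum of Eq. (4.16): predictive probability that the final data meet
    the success rule is_success(total_responders, n_max)."""
    remaining = n_max - n
    return sum(
        predictive_pmf(y, x, n, n_max, a, b)
        for y in range(remaining + 1)
        if is_success(x + y, n_max)
    )


# Hypothetical interim data: 12 responders out of 40, 100 subjects planned,
# uniform Beta(1, 1) prior, success if the final observed proportion exceeds 0.25.
ppos_value = ppos_beta_binomial(
    x=12, n=40, n_max=100, a=1.0, b=1.0,
    is_success=lambda total, size: total / size > 0.25,
)
print(ppos_value)
```

Passing the success rule as a function keeps the sketch agnostic about whether trial success or clinical success is being assessed; any rule based on the final counts can be supplied.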

Example 1: Continuous endpoint

In the pragmatic, nonblinded, non-inferiority CODA trial [21], 1552 subjects ($= N$) with appendicitis were randomized in a 1:1 allocation ratio either to receive antibiotics or to undergo appendectomy. The primary outcome was 30-day health status, as assessed with the European Quality of Life–5 Dimensions (EQ-5D) questionnaire (scores range from 0 to 1, with higher scores indicating better health status; non-inferiority margin, 0.05 points). For this illustration, we pretend to have an interim analysis at the sample size of 776 ($= n$). According to the O'Brien alpha spending function, the rejection boundaries for the Z test statistic are 2.96 and 1.97 ($= c(1)$) at the interim and final analyses, respectively.

In this case, we are statistically testing the following hypotheses: $H_0: \mu_T - \mu_C \le -0.05$ against $H_1: \mu_T - \mu_C > -0.05$. Therefore, $\Delta_0 = -0.05$. Further, the external information available is translated into the following prior distribution of $\mu_T - \mu_C$:

$$\mu_T - \mu_C \sim \text{Normal}\left[\Delta_p = 0,\ \sigma_0^2 = (0.[\ldots])^2\right]$$

As the treatment allocation ratio is 1:1, we have $r = (1+1)/\sqrt{1} = 2$. […] the PoS for trial success ($\gamma = c(1) = 1.97$) at the design stage is

$$\Phi\left(\frac{\sqrt{1552}\,(0 - (-0.05)) - (2)(0.[\ldots])(1.97)}{\sqrt{(1552)(0.[\ldots])^2 + (4)(0.[\ldots])^2}}\right) = 0.965$$

(see Eq. (4.5)).

For the calculation of the CP and PPoS, let's consider the following interim results: mean difference, $-0.025$ ($= \delta_n$) points with SD as 0.16 ($= s_n$). Here, $k = r\cdot s_n/\sqrt{N} = (2)(0.16)/\sqrt{1552} \approx 0.0081$. Assuming the interim trend continues, the CP for trial success ($\gamma = c(1) = 1.97$) is 0.941 (see Eq. (4.2)) and the PPoS without prior distribution would be 0.866 (see Eq. (4.3)). Now, expecting a $-0.030$ mean difference from post-interim data (i.e., $\Delta' = -0.030$) […]

Example 2: Binary endpoint

Fenaux et al. [22] reported the results of a placebo-controlled, phase 3 trial evaluating the effect of luspatercept in patients with lower-risk myelodysplastic syndromes. The primary endpoint was the proportion of patients with transfusion independence for eight weeks or longer during weeks 1 through 24. A total sample size of 210 patients ($= N$) with a 2:1 treatment allocation ratio would give the study 90% power to detect a difference between response rates of 0.30 in the luspatercept arm and 0.10 in the placebo arm, with a one-sided alpha of 0.025 and a 10% dropout rate. For this illustration, we add an interim analysis at the sample size of 158 ($= n$). Further, we assume the clinically meaningful difference is 15% ($= \theta_{min}$). According to the O'Brien alpha spending function, the rejection boundaries for the Z test statistic are 2.34 and 2.012 ($= c(1)$) at the interim and final analyses, respectively.

In this case, we are statistically testing the following hypotheses: $H_0: \Pi_T - \Pi_C \le 0$ against $H_1: \Pi_T - \Pi_C > 0$. Therefore, $\Delta = 0$. We suppose that prior information about the unknown treatment effect $\Pi_T - \Pi_C$ is elicited from a relevant expert as follows:

$$\Pi_T - \Pi_C \sim \text{Normal}\left[\Delta_0 = 0.\ldots,\ \sigma_0^2 = 0.\ldots\right]$$

As the treatment allocation ratio is 2:1, we have $r = (2 + 1)/\sqrt{2 \cdot 1} = \sqrt{4.5}$. Further, the SE of $\delta_n$ at the final analysis is expected to be $\tilde{k} = \sqrt{(0.30)(0.70)/140 + (0.10)(0.90)/70} = 0.053$. Therefore, the PoS for trial success ($\gamma = c(1) = 2.012$) and clinical success ($\gamma = \theta_{min}/\tilde{k} = 0.15/0.053 = 2.83$) at the design stage are

$$\Phi\left(\frac{0.\ldots - (0.053)(2.012)}{\sqrt{0.\ldots + 0.\ldots}}\right) = 0.645 \quad \text{and} \quad \Phi\left(\frac{0.\ldots - (0.053)(2.83)}{\sqrt{0.\ldots + 0.\ldots}}\right) = 0.\ldots,$$

respectively.

Suppose that, at the interim analysis, transfusion independence was observed in 37.9% of the patients in the luspatercept group ($n_T = 105$), as compared with 22.2% of those in the placebo group ($n_C = 53$). Therefore, at interim, $\delta_n = 0.379 - 0.222 = 0.157$ with SE $\sqrt{(0.379)(0.621)/105 + (0.222)(0.778)/53} = 0.074$. Hence, $s_n = \text{SE} \cdot \sqrt{n}/r = (0.074)\sqrt{158}/\sqrt{4.5} \approx 0.44$ and $k = 0.074 \cdot \sqrt{0.75} = 0.064$. Assuming the post-interim treatment difference to be as assumed at the design stage (i.e., $\Delta' = 0.30 - 0.10 = 0.20$), based on Eq. (4.1), the CP for trial success ($\gamma = c(1) = 2.012$) and clinical success ($\gamma = 0.15/0.064 = 2.34$) are 0.884 and 0.709, respectively. However, if we assume that the interim trend continues for the remaining part of the trial, based on Eq. (4.2), the CP for trial success and clinical success are 0.804 and 0.587, respectively. Further, the PPoS for trial success and clinical success based on the interim results along with prior knowledge are 0.782 and 0.586, respectively (see Eq. (4.3)). However, if we leave out the prior distribution, the PPoS for trial success and clinical success based on the interim results only are 0.772 and 0.575, respectively (see Eq. (4.4)).
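The CP and no-prior PPoS figures quoted in Examples 1 and 2 can be approximately reproduced with a few lines of Python. This is a minimal sketch, assuming the standard Z-scale (B-value) formulation of conditional and predictive power; the function names and the parametrization by information fraction `t = n/N` are illustrative choices made here, not the LongCART interface and not a verbatim transcription of Eq. (4.1), (4.2) and (4.4). The interim response rate 0.379 is inferred from $\delta_n = 0.157$ and the 22.2% placebo rate.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def conditional_power(z_n, t, gamma, drift=None):
    """CP given interim Z statistic z_n at information fraction t.

    drift is the assumed mean of the final Z statistic (hypothetical
    argument name); when omitted, the interim trend z_n / sqrt(t)
    is carried forward.
    """
    if drift is None:
        drift = z_n / sqrt(t)
    return PHI((z_n * sqrt(t) + drift * (1 - t) - gamma) / sqrt(1 - t))


def predictive_power_no_prior(z_n, t, gamma):
    """PPoS from interim data alone (flat prior on the effect)."""
    return PHI(sqrt(t) * (z_n / sqrt(t) - gamma) / sqrt(1 - t))


# Example 1 (continuous): delta_n = -0.025, s_n = 0.16, n = 776, N = 1552,
# non-inferiority margin Delta = -0.05, r = 2 for 1:1 allocation.
t1 = 776 / 1552
z1 = (-0.025 - (-0.05)) / (2 * 0.16 / sqrt(776))
cp1 = conditional_power(z1, t1, 1.97)          # interim trend continues
pp1 = predictive_power_no_prior(z1, t1, 1.97)

# Example 2 (binary): 37.9% of 105 vs 22.2% of 53 responders at interim.
t2 = 158 / 210
se_n = sqrt(0.379 * 0.621 / 105 + 0.222 * 0.778 / 53)
z2 = 0.157 / se_n
k = se_n * sqrt(t2)                            # projected SE at final analysis
cp2_design = conditional_power(z2, t2, 2.012, drift=0.20 / k)
cp2_trend = conditional_power(z2, t2, 2.012)
pp2 = predictive_power_no_prior(z2, t2, 2.012)

print(round(cp1, 3), round(pp1, 3))
print(round(cp2_design, 3), round(cp2_trend, 3), round(pp2, 3))
```

The printed values should land close to the reported 0.941 and 0.866 for Example 1 and 0.884, 0.804 and 0.772 for Example 2; small differences in the third decimal can arise because the paper rounds intermediate quantities such as the SE. Replacing `2.012` with $\theta_{min}/k$ gives the corresponding clinical-success quantities.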

Example 3: Binary endpoint with the beta priors

This example is inspired by the example given in Johns and Andersen [20]. Consider a clinical trial to demonstrate that the relapse rate in patients treated in the experimental treatment arm is lower than that in the control arm. It was planned to enrol 340 patients in each arm. The interim analysis was planned after 170 patients in each arm completed treatment. Non-informative uniform priors were assumed for the two relapse rates: $\Pi_T \sim \text{Beta}(a_T = 1, b_T = 1)$ and $\Pi_C \sim \text{Beta}(a_C = 1, b_C = 1)$.

In this case, we are statistically testing the following hypotheses: $H_0: \Pi_T - \Pi_C \le 0$ against $H_1: \Pi_T - \Pi_C > 0$. Suppose we observed the following results at the interim analysis: (a) in the treatment arm, 155 ($= n_T$) out of 170 patients responded, with 13 ($= x_T$) subsequent relapses, and (b) in the control arm, 152 ($= n_C$) out of 169 patients responded, with 21 ($= x_C$) subsequent relapses. Subsequently, an additional 340 − 170 = 170 patients ($= N_T - n_T$) and 340 − 169 = 171 patients ($= N_C - n_C$) are to be enrolled in the treatment arm and control arm, respectively. With this information, the PPoS for trial success based on a Z test at the one-sided 0.025 level is 0.536 (see Eq. (4.14)).

Example 4: Time to event endpoint

Lassman et al. [23] recently reported the interim results of the INTELLANCE-I trial in glioblastoma patients, which evaluated the investigational drug depatuxizumab mafodotin. A total of 639 subjects ($= N$) were enrolled in the study with a 1:1 allocation ratio. The primary endpoint in the study was overall survival. The target number of events at the final analysis was 441 ($= D$), and an interim analysis was planned with 332 events. The trial used a weighted log-rank test; however, here we illustrate assuming the standard log-rank test. According to the O'Brien alpha spending function, the rejection boundaries for the Z test (i.e., trend test) statistic are 2.34 and 2.012 ($= c(1)$) at the interim and final analyses, respectively. For clinical success, we assume the clinically meaningful HR is 0.80 ($= \lambda_{min}$) or less.

In this case, we are statistically testing the following hypotheses: $H_0: HR = 1$ against $H_1: HR < 1$. Therefore, $\lambda = 1$. The phase 2 trial of depatuxizumab mafodotin in recurrent glioblastoma patients [24] reported an HR for OS events of 0.71 ($= \lambda_0$) with 133 events (i.e., $\sigma_0 = 2/\sqrt{133} = 0.173$). This is translated into the following prior distribution of $\log \lambda$:

$$\log \lambda \sim \text{Normal}\left[\log 0.71,\ \sigma_0^2 = (0.173)^2\right]$$

As the treatment allocation ratio is 1:1, we have $r = (1 + 1)/\sqrt{1 \cdot 1} = 2$. The PoS for trial success at the design stage is

$$\Phi\left(\frac{\sqrt{441}\,\log(1/0.\ldots) - (2)(2.\ldots)}{\sqrt{(441)(0.\ldots)^2 + (2)^2}}\right) = 0.728$$

(see Eq. (4.21)). Further, the PoS for clinical success ($\gamma = -\log(0.80)/k$) is

$$\Phi\left(\frac{\sqrt{441}\,\log(1/0.\ldots) - (2)(2.\ldots)}{\sqrt{(441)(0.\ldots)^2 + (2)^2}}\right) = 0.\ldots$$

For the calculation of CP and PPoS, […] ($= \lambda_{IA}$) based on 346 events ($= D_{IA}$). Note that $k = r/\sqrt{D} = 2/\sqrt{441} = 0.095$. […] ($= \lambda'$) as assumed at the design stage, the conditional power for trial success ($\gamma = c(1) = 2.012$) and clinical success ($\gamma = -\log(0.80)/k$) […]

In this paper, expressions for various measures of the probability of success are presented by type of endpoint. The discussion in this paper is restricted to the normal-normal and beta-binomial distributions. For other distributions, the relevant expressions can be obtained using the general framework presented in Section 2, or simulation-based methods such as Bayesian clinical trial simulation (BCTS) [3, 16] may be used. Nevertheless, a natural question arises: which probability of success measure should one prefer? Often PPoS is preferred over CP for the following reasons: (1) it has a better predictive interpretation, (2) the knowledge about θ (the parameter of interest) enters as a distribution, whereas the frequentist calculation assumes that the value of θ is known without any uncertainty, and (3) unlike the frequentist paradigm, prior information can be incorporated in the Bayesian paradigm. In general, the CP is more aggressive than the PPoS, and hence use of the CP increases the chance of early stopping for futility or efficacy. Lachin [8] has shown that futility termination may markedly decrease the power in direct proportion to the probability of stopping for futility. Therefore, PPoS seems to be more useful while monitoring a trial for early termination.

The effect of sample size on the PPoS compared to the CP was explored by Dallow and Fina [19]. On the other hand, the effect of varying the prior distribution of θ on predictive power in the context of futility monitoring is discussed by Dmitrienko and Wang [1], and in general by Rufibach, Burger and Abt [15]. In summary, they have proposed using an aggressive prior for futility monitoring, as use of a non-informative prior may increase the early termination rate. Tang [14] suggested using the upper limit of PPoS in futility monitoring.
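The remark that the CP is more aggressive than the PPoS can be made concrete in the simplest setting. The sketch below assumes a normally distributed test statistic, no prior information, and the current-trend version of the CP; the symbols $t = n/N$ (information fraction) and $Z_n$ (interim Z statistic) are shorthand introduced here and are not the paper's notation.

```latex
\begin{align*}
\mathrm{CP}_{\text{trend}} &= \Phi(u), \qquad
  u = \frac{Z_n/\sqrt{t} - \gamma}{\sqrt{1 - t}}, \\
\mathrm{PPoS}_{\text{no prior}} &= \Phi\!\left(\sqrt{t}\, u\right).
\end{align*}
```

Because $0 < \sqrt{t} < 1$, the PPoS always lies between 0.5 and the CP, so a fixed futility or efficacy threshold is crossed more readily by the CP. With the Example 1 interim values ($t = 0.5$, $u \approx 1.567$), these expressions give the 0.941 and 0.866 quoted there.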
The effect of the prior on PPoS in the context of the binomial endpoint is discussed by Johns and Andersen [20].

One might consider PPoCS in monitoring for early stopping as well; however, in general, its use should be discouraged. Saville et al. [2] have pointed out that the PPoS (referred to as 'predictive probabilities') is naturally appealing for monitoring a clinical trial because (a) the PPoS directly addresses the question of whether the study is going to be a success at the end, and (b) the PPoS often changes drastically with the accrual of more data, whereas the PPoCS (referred to as 'posterior probabilities') may remain nearly identical. Further, we would also like to point out the potential misuse of PPoS for survival endpoints with delayed treatment effects. In that case, the use of futility criteria for early stopping based on PPoS or CP may be misleading. In these cases, the futility criteria, if any, must be determined through exhaustive evaluation of operating characteristics.

References

[1] Dmitrienko, A., and Wang, M. D. (2006). Bayesian predictive approach to interim monitoring in clinical trials.

Statistics in Medicine,

Clinical Trials,

Pharmaceutical Statistics, 4(3), 187-201.

[4] Chuang-Stein, C. (2006). Sample size and the probability of a successful trial. Pharmaceutical Statistics,

The Statistician,

Controlled Clinical Trials,

Biometrics,

Statistics in medicine,

Statistics in Biopharmaceutical Research,

Controlled Clinical Trials,

Controlled clinical trials,

Biometrics,

Journal of Biopharmaceutical Statistics,

Pharmaceutical Statistics,

Clinical Trials,

cran.r-project.org/web/packages/LongCART/LongCART.pdf

[18] Demets, D. L., and Lan, K. G. (1994). Interim analysis: the alpha spending function approach. Statistics in Medicine,

Pharmaceutical Statistics,

Journal of Biopharmaceutical Statistics,

et al. (2020). Luspatercept in patients with lower-risk myelodysplastic syndromes. New England Journal of Medicine,

et al. (2020). Depatuxizumab-mafodotin in EGFR-amplified newly diagnosed glioblastoma: a randomized, double-blind, phase III, international clinical trial (RTOG 3508, INTELLANCE 1). submitted.

[24] Van Den Bent, M., Eoli, M., Sepulveda, J. M. et al.