Adding experimental treatment arms to Multi-Arm Multi-Stage platform trials in progress

Abstract

Multi-Arm Multi-Stage (MAMS) platform trials are an efficient tool for the comparison of several treatments. Suppose we wish to add a treatment to a trial already in progress, to access the benefits of a MAMS design. How should this be done? The MAMS framework requires pre-planned options for how the trial proceeds at each stage in order to control the family-wise error rate. Thus, it is difficult to make unplanned design modifications. The conditional error approach is a tool that allows unplanned design modifications while maintaining the overall error rate. In this work, we use the conditional error approach to allow adding new arms to a MAMS trial in progress. We demonstrate the principles of incorporating additional hypotheses into the testing structure. Using this framework, we show how to update the testing procedure for a MAMS trial in progress to incorporate additional treatment arms. Simulations illustrate the possible operating characteristics of such procedures using a fixed rule for how and when the design modification is made.



THOMAS BURNETT

Department of Mathematics and Statistics, Lancaster University, Lancaster LA1 4YF, UK

FRANZ KÖNIG ∗

Section for Medical Statistics, CeMSIIS, Medical University of Vienna, Vienna 1090, Austria

[email protected]

THOMAS JAKI

Department of Mathematics and Statistics, Lancaster University, Lancaster LA1 4YF, UK

Summary

Multi-Arm Multi-Stage (MAMS) platform trials are an efficient tool for the comparison of several treatments with a control. Suppose a new treatment becomes available at some stage of a trial already in progress. There are clear benefits to adding the treatment to the current trial for comparison, but how?

As flexible as the MAMS framework is, it requires pre-planned options for how the trial proceeds at each stage in order to control the familywise error rate. Thus, as with many adaptive designs, it is difficult to make unplanned design modifications. The conditional error approach is a tool that allows unplanned design modifications while maintaining the overall error rate. In this work we use the conditional error approach to allow adding new arms to a MAMS trial in progress.

Using a single stage two-arm trial, we demonstrate the principles of incorporating additional hypotheses into the testing structure. With this framework for adding treatments and hypotheses in place, we show how to update the testing procedure for a MAMS trial in progress to incorporate additional treatment arms. Through simulation, we illustrate the operating characteristics of such procedures.

Key words: multi-arm multi-stage (MAMS), adaptive designs, conditional error, design modification

1. Introduction

During Phase II of the drug development process it is common to have several competing treatments; these may simply be different doses of the same drug or entirely different treatment regimes. Jaki and Hampson (2016) note that, given the high failure rate and cost of Phase III trials, it is key that careful consideration be given to which treatments should be carried forward for further study. Multi-arm multi-stage (MAMS) trials (Royston and others, 2003; Jaki, 2013) allow for the efficient comparison of several experimental treatments with a common control, allowing for the selection of appropriate treatments (Jaki, 2015).

∗ To whom correspondence should be addressed.
arXiv [stat.AP], Jul

Through early stopping, MAMS trials are efficient in reducing the expected number of patients required. They allow dropping treatments that are demonstrated to be ineffective or showing lack of promise, or stopping the trial altogether. Alternatively a declaration of efficacy is possible. Given the multiple hypotheses and highly adaptive nature of the design, formal inference in MAMS studies requires specialist testing methodology in order to control the error rate of the trial, such as sequential testing methodology (Stallard and Todd, 2003). Magirr and others (2012) introduced the generalised Dunnett family of tests, where well known group sequential testing boundaries (Hatfield and others, 2016) are defined to account for the multiple stages introduced by the interim analyses while accounting for the correlation introduced by the comparison of several experimental arms to a common control (Dunnett, 1955); we focus our discussion around this method. Alternatively, fully flexible testing methods have been proposed (Bretz and others, 2006; Schmidli and others, 2006); these allow the decision making about which arms should remain in the study to function separately from the hypothesis testing without impacting the error rate.
Both methods require the pre-definition of all study hypotheses, so that the overall testing procedure may be constructed to give strong control of the FamilyWise Error Rate (FWER) (Dmitrienko and others, 2009).

To make best use of the efficiency of a MAMS design it is desirable to include all experimental treatments for comparison with the common control; this brings the further benefit of between treatment comparisons being available from the same trial data. However it is possible that not all experimental treatments are available at the start of the trial, as seen, for example, in the STAMPEDE trial (Sydes and others, 2009). STAMPEDE started with five comparisons and at the time of writing has added six new research comparisons. The motivation to include further experimental treatments as they become available into the trial in progress is clear. This maintains the benefits of a reduction in logistical and administrative effort and speeding up the overall development process (Parmar and others, 2008) as well as allowing direct comparisons of the treatments within the same trial.

Treatments may be added to the trial in progress by adjusting the pre-planned testing structure provided no use has been made of the data observed in the trial (in our view this includes knowledge that an interim analysis has happened, as this gives insight into the treatment effect due to the pre-defined stopping boundaries). Bennett and Mander (2020) demonstrate how to suitably adjust the sample size for each treatment arm for such additions but make no use of any existing trial data in doing so. It is clearly possible that treatments may become available after some interim analysis, where it would be desirable to incorporate some knowledge of trial information into the decision making.

The conditional error approach (Proschan and Hunsberger, 1995) allows for design modifications during the course of a trial, where these modifications have not been pre-planned.
Previous work shows how these modifications may be accounted for in the setting of treatment selection (Koenig and others, 2008; Magirr and others, 2014); however, adding hypotheses to a testing framework using this method requires tests of any introduced hypotheses to be constructed at level $\alpha$ (Hommel, 2001). We propose a general framework relying on the conditional error principle

2. Altering a trial in progress

2.1 A two arm trial

To demonstrate the necessary principles for adding treatments to an ongoing trial let us first consider the simpler setting of a two arm trial. Suppose we plan a trial to compare a new treatment, $T_1$, and a control, $T_0$. Let $\mu_1$ and $\mu_0$ be the expected responses for patients on treatments $T_1$ and $T_0$ respectively, and define the treatment effect as $\theta_1 = \mu_1 - \mu_0$. We investigate the one sided null hypothesis $H_{01}: \theta_1 \le 0$, which we test at the nominal type I error level $\alpha$.

Without loss of generality the trial will recruit a total of $n$ patients randomised equally between treatment and control. Let $X_{i,k} \sim N(\mu_k, \sigma^2)$ for $i = 1, \dots, n/2$ and $k = 0, 1$ denote the patient responses; the estimate $\hat\theta_1$ of $\theta_1$ is the test statistic. Since the difference in sample means has variance $4\sigma^2/n$, this has corresponding Z-value

$$Z_1 = \hat\theta_1 \frac{\sqrt{n}}{2\sigma} \sim N(\xi_1, 1), \qquad \xi_1 = \theta_1 \frac{\sqrt{n}}{2\sigma},$$

and under the null hypothesis $\xi_1 = 0$. We reject $H_{01}$ when $Z_1 > \Phi^{-1}(1 - \alpha)$, where $\Phi$ is the standard normal cdf. Note that we are not restricted to any particular form of data by this choice to describe the results in terms of Z-values, as for any test statistic we may find the corresponding Z-value.

2.2 Adding a treatment

Suppose for $\tau \in (0, 1)$, after $\tau n$ observations a new treatment, $T_2$, becomes available. Let $\mu_2$ be the expected response for patients receiving this new treatment and define the corresponding treatment effect by $\theta_2 = \mu_2 - \mu_0$. We add $T_2$ to the trial in order to test the additional null hypothesis $H_{02}: \theta_2 \le 0$. This requires a choice of how recruitment to $T_0$ and $T_1$ continues, such as maintaining the same sample size per treatment arm already present in the trial.

Notationally it is convenient to split the trial into two stages: stage 1 contains the observations collected before the new treatment is added and stage 2 contains the observations collected after the new treatment is added. Thus from the stage 1 data we find the z-value

$$Z_1^{(1)} \sim N(\xi_1 \sqrt{\tau}, 1)$$

and from the stage 2 data we find the z-value

$$Z_1^{(2)} \sim N(\xi_1 \sqrt{1 - \tau}, 1),$$

so that $Z_1 = \sqrt{\tau} Z_1^{(1)} + \sqrt{1 - \tau} Z_1^{(2)}$.

We must recruit additional patients in order to examine the new treatment $T_2$; say we recruit a further $(1 - \tau) n / 2$ patients to $T_2$. Since $T_2$ is only added to the trial for the second stage, the comparison of $T_2$ and $T_0$ is made based only on the data available from the second stage of the trial, from which we construct the z-value

$$Z_2 \sim N(\xi_2 \sqrt{1 - \tau}, 1), \qquad \xi_2 = \theta_2 \frac{\sqrt{n}}{2\sigma}.$$

Due to the common control in the second stage of the trial and the equal randomisation, $Z_1^{(2)}$ and $Z_2$ have correlation $1/2$.

2.3 Hypothesis testing

In the setting of a confirmatory clinical trial testing multiple null hypotheses, it is natural to consider the impact on the error rate. For the two arm trial we constructed our hypothesis test in order to control the type I error rate at some pre-defined level $\alpha$, that is the probability of falsely declaring a positive result for an ineffective treatment. A natural extension in the case of multiple hypotheses is the FamilyWise Error Rate (FWER): for the event $R$ that we reject one or more true null hypotheses, the FWER is defined as $P_\theta(R)$. Again the interpretation of this is the probability of falsely declaring a positive result for an ineffective treatment (for a given $\theta = (\theta_1, \theta_2)$).

Suppose that, when adding this further experimental treatment to the trial, we test each of the null hypotheses as described in Section 2.1 at a nominal level $\alpha = 0.05$. Figure 1 shows the impact on the FWER as we vary $\tau$. The two extremes of $\tau = 0$ and $\tau = 1$ represent the cases where the trial is not altered: when $\tau = 0$ all three treatments are present from the start of the trial; when $\tau = 1$ only the two arm trial is conducted (in which case the error rate returns to the nominal type I error value of 0.05). For values in between we see that the FWER is inflated when compared to the nominal $\alpha$.

[Figure 1: x-axis $\tau$; y-axis P(Reject one or more true null hypotheses).]

Fig. 1. Inflation in the FWER when an additional hypothesis is added to an ongoing two arm trial.
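The inflation described above can be explored numerically. The following is a minimal sketch, not the authors' code: it assumes the global null, unadjusted level $\alpha$ tests of both hypotheses, and the correlation $\sqrt{1 - \tau}/2$ between $Z_1$ and $Z_2$ implied by the stage-wise decomposition of Section 2.2. The function names `joint_below` and `naive_fwer` are hypothetical, and the bivariate normal probability is obtained by a simple trapezoidal integration over a shared factor.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def joint_below(c, rho, grid=4001, span=8.0):
    """P(X <= c, Y <= c) for a standard bivariate normal with 0 <= rho < 1.

    Uses the one-factor form X = sqrt(rho) U + sqrt(1 - rho) V_x and
    integrates over the shared factor U with the trapezoidal rule.
    """
    step = 2.0 * span / (grid - 1)
    total = 0.0
    for i in range(grid):
        u = -span + i * step
        weight = 0.5 if i in (0, grid - 1) else 1.0
        inner = STD_NORMAL.cdf((c - sqrt(rho) * u) / sqrt(1.0 - rho))
        total += weight * STD_NORMAL.pdf(u) * inner ** 2
    return total * step


def naive_fwer(tau, alpha=0.05):
    """FWER under the global null when H01 and H02 are each tested at level alpha."""
    c = STD_NORMAL.inv_cdf(1.0 - alpha)
    # Only the stage 2 part of Z_1 shares control data with Z_2.
    rho = sqrt(1.0 - tau) / 2.0
    return 1.0 - joint_below(c, rho)


for tau in (0.1, 0.5, 0.9):
    print(f"tau = {tau:.1f}: naive FWER = {naive_fwer(tau):.4f}")
```

The sketch covers $\tau$ strictly between 0 and 1 only; at $\tau = 1$ no second hypothesis is tested, so the formula does not apply there.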

To account for the multiplicity introduced to the trial we shall require strong control of the FWER, a common requirement in the setting of a confirmatory clinical trial (Dmitrienko and others, 2009), that is

$$P_\theta(R) \le \alpha \text{ for all } \theta = (\theta_1, \theta_2). \tag{2.1}$$

Sugitani and others (2018) propose methods that account for the introduction of the additional hypothesis; this is achieved by testing any introduced hypothesis based strictly on the data collected after its introduction at level $\alpha$ (Hommel, 2001). We build upon this approach, allowing existing information to be incorporated wherever it is available.

To ensure strong control of the FWER we construct an overall closed testing procedure (Marcus and others, 1976) that accounts for the adaptive nature of the trial within each test (Koenig and others, 2008). This requires tests of $H_{01}$, $H_{02}$ and $H_{01,02} = H_{01} \cap H_{02}: \theta_1 \le 0, \theta_2 \le 0$. We may reject $H_{01}$ globally when the local level $\alpha$ tests of $H_{01}$ and $H_{01,02}$ are rejected. Similarly we may reject $H_{02}$ globally when the local level $\alpha$ tests of $H_{02}$ and $H_{01,02}$ are rejected.

First let us consider the level $\alpha$ test of $H_{01}$. No changes have been made to the recruitment or analysis for this test, so as before we may reject $H_{01}$ when $Z_1 > \Phi^{-1}(1 - \alpha)$ at the end of the trial. However, it is useful to discuss how to construct this test using the conditional error principle (Proschan and Hunsberger, 1995). Given the results from the first stage of the trial $z_1^{(1)}$ we define the conditional error rate

$$A(z_1^{(1)}) = P_{\theta_1 = 0}(\text{Reject } H_{01} \mid Z_1^{(1)} = z_1^{(1)}).$$

The probability of rejecting the null hypothesis for the remainder of the trial must not exceed $A(z_1^{(1)})$. We may write the test only in terms of the stage 2 observations while incorporating the stage 1 data, that is we may reject $H_{01}$ when $Z_1^{(2)} > \Phi^{-1}(1 - A(z_1^{(1)}))$. Let $f(z_1^{(1)})$ be the probability density function of the z-value based on the first stage data, then under the null hypothesis $H_{01}$ we have that

$$\int_{z_1^{(1)}} f(z_1^{(1)}) A(z_1^{(1)}) \, \mathrm{d}z_1^{(1)} = \alpha, \tag{2.2}$$

which in turn guarantees control of the error rate at the pre-specified level $\alpha$, that is $P_{\theta_1 = 0}(\text{Reject } H_{01}) = \alpha$. This test is equivalent to testing based on $Z_1$, and gives the same overall properties.

As for testing $H_{02}$, there is no existing information relating to this null hypothesis and thus the test must be constructed based only on the stage 2 trial data used to construct $Z_2$. We reject the test for $H_{02}$ when $Z_2 > \Phi^{-1}(1 - \alpha)$.

These tests of $H_{01}$ and $H_{02}$ give type I error control for the pairwise tests, but do not allow for overall rejection of the corresponding null hypotheses. In order to ensure strong control of the FWER we now also require the test of the intersection hypothesis $H_{01,02}$.

During the first stage of the trial there is no pre-planned test for this hypothesis; however, there is pre-existing information for $H_{01}$ in the form of $Z_1^{(1)}$.
Hommel (2001) shows how to use such first stage information in the test of an intersection hypothesis, when adding some initially excluded hypotheses after an interim analysis; we now apply this approach to a newly added hypothesis. Consider the component hypotheses of $H_{01,02}$: clearly $H_{01,02} \implies H_{01}$, and using this fact we propose that the existing information $Z_1^{(1)}$ may contribute to the test of $H_{01,02}$. Since $H_{01}$ is true we compute the conditional error rate $A(z_1^{(1)})$ as described previously; furthermore under $H_{01,02}$, $z_1^{(1)}$ is distributed such that Equation 2.2 holds as before. Thus we may construct the test of $H_{01,02}$ at the end of the trial at level $A(z_1^{(1)})$, allowing for the incorporation of the stage one data given by $Z_1^{(1)}$.

For example consider a Dunnett test (Dunnett, 1955) for $H_{01,02}$. Let $Z_D = \max(Z_1^{(2)}, Z_2)$ and define the distribution

$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1 \end{pmatrix} \right).$$

We construct the Dunnett p-value, $P_D = P(X > Z_D \cup Y > Z_D)$, and may reject $H_{01,02}$ when $P_D < A(Z_1^{(1)})$.

2.4 Simulation study

[Table 1: columns $\xi_1$, $\xi_2$, P(Reject test of $H_{01}$), P(Reject test of $H_{02}$), P(Reject test of $H_{01,02}$), FWER. The numerical entries are not preserved.]

Table 1. Probabilities of rejecting components of the closed testing procedure under the proposed testing procedure, type I errors highlighted in bold, $\delta = \Phi^{-1}(0.95) + \Phi^{-1}(0.9)$ such that we have power of 0.9 when testing $H_{01}$ in the original trial.

Firstly we confirm that the desired error control is achieved for each hypothesis via simulation. For each value of $(\xi_1, \xi_2)$ and $\tau = 0.5$ we simulate 1,000,000 realisations of $Z_1$ and $Z_2$, assuming equal sample size in each treatment at each stage, in R (R Core Team, 2019).
Table 1 shows our estimates of the probabilities of an error for the test of each hypothesis $H_{01}$, $H_{02}$ and $H_{01,02}$; as required this is $\alpha$ whichever combination of null hypotheses is true.

We now compare the overall trial performance for two alternative methods for testing the intersection hypothesis: the first being the method proposed above; and the second basing the test for the intersection hypothesis only on evidence for $H_{01}$, that is we reject $H_{01,02}$ when $Z_1 > \Phi^{-1}(1 - \alpha)$, essentially treating the first null hypothesis as a gate keeping procedure. In both procedures we incorporate existing information $Z_1^{(1)}$ when testing $H_{01,02}$ by the argument that $H_{01,02} \implies H_{01}$.

Table 2 shows the probabilities of global rejection of the null hypotheses under several combinations of $\xi_1$ and $\xi_2$ for each testing method. We confirm that the overall testing procedure gives strong control of the FWER under all configurations of null hypotheses. The probability of rejecting a false $H_{01}$ does not differ largely between the two testing methods, with an increase of at most 0.04 for the gate keeping procedure. When both null hypotheses are false there is a small decrease of 0.03 in the probability of rejecting $H_{02}$ for the gate keeping procedure. The main difference between the two procedures arises when $H_{01}$ is true and $H_{02}$ is false: under this configuration the gate keeping procedure cannot reject $H_{02}$ without making an error in rejecting $H_{01}$ (indeed the probability of rejecting a false $H_{02}$ is 0.04 when $H_{01}$ is true), and thus our proposed procedure increases the probability of rejecting $H_{02}$ by 0.29.
It is clear that the small advantage for testing $H_{01}$ under the gate keeping procedure is outweighed by the ability to reject $H_{02}$ when there is a low probability of rejecting $H_{01}$, which comes from taking an integrated approach to the test of the intersection hypothesis.

In order to understand the performance of the proposed testing procedure we examine the probabilities of rejecting the intersection hypothesis $H_{01,02}$ under the 4 proposed scenarios in Figure 2 (all combinations of $H_{01}$ and $H_{02}$ true and false). We see that when $H_{01}$ is false the conditional error is likely to be higher than the pre-planned $\alpha$, usually giving a high chance of rejecting $H_{01,02}$; although when $H_{02}$ is true and $H_{01}$ is false there is a small reduction in the probability of rejecting $H_{01,02}$, which explains the small deficit of the proposed procedure compared to testing based only on $H_{01}$ when $\xi_1 = \delta$ and $\xi_2 = 0$. Conversely when $H_{01}$ is true we see that the conditional error is likely to be quite low: when both null hypotheses are true this corresponds to a low probability of rejecting $H_{01,02}$, as is to be expected; however when $H_{02}$ is false, as the conditional error rate rises we recover some possibility of rejecting the intersection hypothesis, allowing us to reject $H_{02}$ globally.
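The behaviour of the conditional error discussed above can be sketched in a few lines. This is an illustrative sketch under the two-stage normal model of Section 2.2, not the authors' R code; `conditional_error` and `mean_conditional_error` are hypothetical names. The first function evaluates $A(z_1^{(1)})$ for the pre-planned test $Z_1 > \Phi^{-1}(1 - \alpha)$, and the second averages it over the stage 1 distribution, which under $H_{01}$ should recover $\alpha$ as in Equation 2.2.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_error(z_stage1, tau, alpha=0.05):
    """A(z) = P(Z_1 > c | Z_1^(1) = z) under theta_1 = 0, c = Phi^{-1}(1 - alpha)."""
    c = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf((c - sqrt(tau) * z_stage1) / sqrt(1.0 - tau))


def mean_conditional_error(xi1, tau, alpha=0.05, grid=4001, span=8.0):
    """E[A(Z_1^(1))] when Z_1^(1) ~ N(xi1 * sqrt(tau), 1), by the trapezoidal rule."""
    centre = xi1 * sqrt(tau)
    step = 2.0 * span / (grid - 1)
    total = 0.0
    for i in range(grid):
        z = centre - span + i * step
        weight = 0.5 if i in (0, grid - 1) else 1.0
        total += weight * STD_NORMAL.pdf(z - centre) * conditional_error(z, tau, alpha)
    return total * step


delta = STD_NORMAL.inv_cdf(0.95) + STD_NORMAL.inv_cdf(0.9)
print("H01 true:  mean conditional error =", round(mean_conditional_error(0.0, 0.5), 4))
print("H01 false: mean conditional error =", round(mean_conditional_error(delta, 0.5), 4))
```

Comparing the two printed averages gives a rough sense of why the intersection test tends to be run at a more generous level when $H_{01}$ is false.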

[Table 2: one panel per testing method, the first headed "Dunnett procedure for testing the intersection hypothesis"; columns $\xi_1$, $\xi_2$, P(Reject $H_{01}$ only), P(Reject $H_{02}$ only), P(Reject both), P(Reject any). The numerical entries are not preserved.]

Table 2. Probabilities of global rejection of null hypotheses using the conditional error approach, type I errors highlighted in bold, $\delta = \Phi^{-1}(0.95) + \Phi^{-1}(0.9)$ such that we have power of 0.9 when testing $H_{01}$ in the original trial.

[Figure 2: four panels, ($\xi_1 = \delta$, $\xi_2 = \delta$), ($\xi_1 = 0$, $\xi_2 = \delta$), ($\xi_1 = \delta$, $\xi_2 = 0$) and ($\xi_1 = 0$, $\xi_2 = 0$); x-axis $A(z^{(1)})$; y-axes P(Reject $H_{01,02} \mid z^{(1)}$) and $f(z^{(1)})$.]

Fig. 2. Conditional error rate, $A(z_1^{(1)})$, against probability of rejecting the intersection hypothesis P(Reject $H_{01,02} \mid z_1^{(1)}$) and corresponding density of conditional error $f(z_1^{(1)})$, $\delta = \Phi^{-1}(0.95) + \Phi^{-1}(0.9)$ such that we have power of 0.9 when testing $H_{01}$ in the original trial.
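Global rejection probabilities of the kind summarised in Table 2 could be estimated along the following lines. This is a hedged Monte Carlo sketch of the closed testing procedure with the Dunnett intersection test of Section 2.3, written in Python rather than the R used by the authors; the function names, the number of replications and the seed are illustrative assumptions, and the output is not intended to reproduce the published table.

```python
import random
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def dunnett_p(z_max, grid=201, span=6.0):
    """P(X > z or Y > z) for a standard bivariate normal (X, Y) with correlation 1/2."""
    step = 2.0 * span / (grid - 1)
    total = 0.0
    for i in range(grid):
        u = -span + i * step
        weight = 0.5 if i in (0, grid - 1) else 1.0
        # X = (U + V_x) / sqrt(2), so P(X <= z | U = u) = Phi(sqrt(2) z - u).
        total += weight * STD_NORMAL.pdf(u) * STD_NORMAL.cdf(sqrt(2.0) * z_max - u) ** 2
    return 1.0 - total * step


def simulate(xi1, xi2, tau=0.5, alpha=0.05, reps=4000, seed=1):
    """Estimate global rejection probabilities for H01, H02 and any hypothesis."""
    rng = random.Random(seed)
    c = STD_NORMAL.inv_cdf(1.0 - alpha)
    counts = {"H01": 0, "H02": 0, "any": 0}
    for _ in range(reps):
        z1_stage1 = rng.gauss(xi1 * sqrt(tau), 1.0)
        control = rng.gauss(0.0, 1.0)  # stage 2 control noise shared by both arms
        z1_stage2 = xi1 * sqrt(1.0 - tau) + (rng.gauss(0.0, 1.0) - control) / sqrt(2.0)
        z2 = xi2 * sqrt(1.0 - tau) + (rng.gauss(0.0, 1.0) - control) / sqrt(2.0)
        z1 = sqrt(tau) * z1_stage1 + sqrt(1.0 - tau) * z1_stage2
        local_h01, local_h02 = z1 > c, z2 > c
        if not (local_h01 or local_h02):
            continue  # no elementary test rejects, so nothing is rejected globally
        cond_error = 1.0 - STD_NORMAL.cdf((c - sqrt(tau) * z1_stage1) / sqrt(1.0 - tau))
        if dunnett_p(max(z1_stage2, z2)) < cond_error:
            counts["H01"] += local_h01
            counts["H02"] += local_h02
            counts["any"] += 1
    return {name: count / reps for name, count in counts.items()}


delta = STD_NORMAL.inv_cdf(0.95) + STD_NORMAL.inv_cdf(0.9)
for xi in ((0.0, 0.0), (delta, 0.0), (0.0, delta), (delta, delta)):
    print(xi, simulate(*xi))
```

The Dunnett p-value is only evaluated when at least one elementary test rejects, which keeps the sketch fast without changing the decisions.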

3. General rule for hypothesis adding

Suppose there is a set of existing null hypotheses $E$ with an existing pre-planned closed testing procedure, to which we wish to add a set of new null hypotheses $N$. Let $H_e$ be the intersection of some subset of the existing null hypotheses $e \subseteq E$ and $H_n$ be the intersection of some subset of the new hypotheses $n \subseteq N$. To construct the updated closed testing procedure for the end of the trial there are three scenarios to consider: we are testing hypotheses of the form $H_e$, $H_n$ or $H_e \cap H_n$.

$H_e$: Let $\alpha'_e$ be the conditional error rate for the test of $H_e$ at the time the $N$ hypotheses are added; we construct the test of $H_e$ such that the probability of falsely rejecting $H_e$ does not exceed $\alpha'_e$.

$H_n$: There is no pre-existing information concerning the test of this hypothesis and thus it must be tested at level $\alpha$.

$H_e \cap H_n$: As noted in the three arm case, $H_e \cap H_n \implies H_e$ and hence the data already available for $H_e$ is distributed such that computing the corresponding conditional error $\alpha'_e$ will ensure that an equation of the form 2.2 holds. Thus we may incorporate the existing information on $H_e$ into the test by constructing the test for $H_e \cap H_n$ such that the probability of falsely rejecting $H_e \cap H_n$ does not exceed $\alpha'_e$.

It is crucial that the test of any intersection of the form $H_e \cap H_n$ is constructed in this way. Consider that while proposing changes to the trial one may or may not choose to add any new hypotheses $H_n$: in the case where $H_n$ is added, the test of $H_e \cap H_n$ may be based on the data relating to both $H_e$ and $H_n$ and is tested at $\alpha'_e$; while if $H_n$ is not added then the test for $H_e \cap H_n$ is implicitly that of $H_e$, also tested at $\alpha'_e$. In either case the test of $H_e \cap H_n$ is based on the error level $\alpha'_e$, ensuring that an equation of the form 2.2 holds whatever decision is made while proposing changes to the trial design.

Note further that any procedure that gives strong control of the FWER is indeed a closed testing procedure (Burnett, 2017). This method therefore allows any procedure that ensures strong control of the FWER to include additional hypotheses that were not included at the design stage while maintaining the statistical integrity of the trial. The penalty for doing so compared to separate trials is the test of hypotheses of the form $H_e \cap H_n$, and it is the potential efficiency of these tests that will dictate whether it is a sensible approach or not. However if the treatments co-exist in the research environment there is clear motivation for including them in the same trial, as is the case in Multi-Arm Multi-Stage designs.
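The three cases of this rule can be written down as a small bookkeeping routine. The sketch below is illustrative only: `local_levels` is a hypothetical helper, hypotheses are represented by string labels, and the conditional error $\alpha'_e$ is supplied by the caller as a function of the subset $e$, since its value depends on the pre-planned design and the data observed so far.

```python
from itertools import combinations


def local_levels(existing, new, alpha, conditional_error):
    """Map each intersection hypothesis (a frozenset of labels) to its local level.

    conditional_error(e) is a user-supplied function returning alpha'_e for the
    subset e of existing hypotheses at the time the new hypotheses are added.
    """
    labels = list(existing) + list(new)
    levels = {}
    for size in range(1, len(labels) + 1):
        for subset in combinations(labels, size):
            e = frozenset(subset) & frozenset(existing)
            # Intersections involving existing hypotheses are tested at alpha'_e;
            # intersections of new hypotheses only are tested at alpha.
            levels[frozenset(subset)] = conditional_error(e) if e else alpha
    return levels


# Three arm case of Section 2 with a made-up conditional error of 0.12.
levels = local_levels(["H01"], ["H02"], 0.05, lambda e: 0.12)
for hypothesis, level in levels.items():
    print(sorted(hypothesis), level)
```

In the three arm case this assigns the conditional error to both $H_{01}$ and $H_{01} \cap H_{02}$, and the nominal $\alpha$ to $H_{02}$ alone.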

4. Alteration of a Multi-Arm Multi-Stage trial in progress

4.1 Introduction to a Multi-Arm Multi-Stage setting

Take the case of two experimental treatments and a common control from Section 2.1. Had we known at the start of the trial that this additional comparison would be of interest, we should instead have considered a Multi-Arm Multi-Stage design (Jaki, 2015; Wason and others, 2016). This would maintain the ability to compare multiple experimental treatments with a common control, while incorporating one or more interim analyses to allow for early stopping.


The use of one or more pre-planned interim analyses ensures that poorly performing treatments may be dropped for futility, protecting patients on the trial from an inferior treatment; alternatively the trial may be stopped early to declare efficacy, reducing the overall development time and number of patients. These early stopping characteristics may be achieved while formally testing null hypotheses for each treatment, for example through the use of generalised Dunnett type testing procedures (Magirr and others, 2012) or fully flexible methods (Bretz and others, 2006; Schmidli and others, 2006). We shall focus on generalised Dunnett type methods here although, as noted in Section 3, the methods apply to any procedure that gives strong control of the FWER.

4.2 Multi-arm multi-stage trials

Suppose at the outset of the trial we have $K$ novel treatments, $T_1, \dots, T_K$, that we wish to compare against a common control. We define the null hypotheses $H_{0i}: \theta_i \le 0$ and alternatives $H_{1i}: \theta_i > 0$ for $i = 1, \dots, K$. We shall now consider a $J$ stage MAMS trial that will allow us to simultaneously test these initial $K$ null hypotheses.

Let $n$ be the number of patients to be recruited to the control arm in the first stage of the trial. We assume that the recruitment and randomisation procedures for the trial are such that we can achieve the desired number of patients on any given arm at any given time. At stage $j = 1, \dots, J$ we have recruited $r_{k,j} n$ patients to treatment $k = 0, 1, \dots, K$. Treatments may be dropped for futility at the end of each stage of the trial; if treatment $k^*$ is stopped at analysis $j^*$ we have $r_{k^*,j} = r_{k^*,j^*}$ for all $j > j^*$. Alternatively the entire trial may stop recruiting early if a treatment or treatments have been selected for further study, such as when the trial is stopped due to a treatment-control comparison yielding statistical significance (Urach and Posch, 2016).

From the observations at each stage $j = 1, \dots, J$ and treatment $k = 1, \dots, K$ we construct estimates $\hat\theta_{k,j}$. Defining

$$I_{k,j} = \frac{r_{k,j} r_{0,j} n}{\sigma^2 (r_{k,j} + r_{0,j})},$$

we find the corresponding Z-values $Z_{k,j} = \hat\theta_{k,j} I_{k,j}^{1/2}$.

As was the case for the two arm trial described in Section 2.1, the choice to define the trial in terms of the Z-values is not restrictive; for other choices of test statistic we may simply find the corresponding Z-values.

4.3 The Generalised Dunnett procedure
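As a brief aside on the Section 4.2 definitions, the information $I_{k,j}$ and the Z-value $Z_{k,j}$ can be computed directly from the allocation ratios. This is a minimal sketch; the function names and the numerical inputs in the usage lines are hypothetical and chosen only for illustration.

```python
from math import sqrt


def information(r_k, r_0, n, sigma):
    """I_{k,j} = r_{k,j} r_{0,j} n / (sigma^2 (r_{k,j} + r_{0,j}))."""
    return r_k * r_0 * n / (sigma ** 2 * (r_k + r_0))


def z_value(theta_hat, r_k, r_0, n, sigma):
    """Z_{k,j} = theta_hat_{k,j} * sqrt(I_{k,j})."""
    return theta_hat * sqrt(information(r_k, r_0, n, sigma))


# Equal allocation, n = 10 control patients in stage 1, sigma = 1 (made-up values).
for stage_ratio in (1, 2, 3):
    print(stage_ratio, information(stage_ratio, stage_ratio, 10, 1.0))
print(z_value(0.5, 1, 1, 10, 1.0))
```

With equal allocation the information grows linearly in the cumulative allocation ratio, which is the property the increment weights of Section 4.5 rely on.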

Recall that $R$ is the event that we reject one or more true null hypotheses; extending Equation 2.1 to a general $K$ null hypotheses, strong control requires that

$$P_\theta(R) \le \alpha \text{ for all } \theta = (\theta_1, \dots, \theta_K). \tag{4.3}$$

The generalised Dunnett method introduced by Magirr and others (2012) allows the simultaneous testing of null hypotheses in a MAMS design, defining group sequential testing boundaries that account for the correlation structure of comparing multiple treatments to control to achieve the required control of the FWER. We define efficacy stopping boundaries $u = (u_1, \dots, u_J)$ where the null hypothesis in treatment group $k = 1, \dots, K$, $H_{0k}$, is rejected at stage $j$ of the trial if $Z_{k,j} > u_j$; after any null hypothesis has been rejected the trial is concluded, accumulating no more data. In addition we define futility stopping boundaries $l = (l_1, \dots, l_J)$ where if $Z_{k,j} < l_j$ the corresponding treatment is dropped for futility, meaning that patients are no longer recruited to this treatment.

To achieve strong control of the FWER, Magirr and others (2012) show it is sufficient to set the stopping boundaries such that the probability of rejecting any null hypothesis under the global null, $\theta_1 = \dots = \theta_K = 0$ which we denote by $\mathbf{0}$, does not exceed the nominal error rate $\alpha$. Under the generalised Dunnett procedure the FWER is maximised under the global null, thus we must choose $u$ and $l$ such that

$$P_{\mathbf{0}}(R) \le \alpha. \tag{4.4}$$

Magirr and others (2012) showed how to compute such testing boundaries $u_j$ and $l_j$ for $j = 1, \dots, J$ while applying familiar group sequential testing theory. The MAMS package in R (Jaki and others, 2019) allows for the construction of tests following predefined shapes including Pocock (Pocock, 1977), O'Brien & Fleming (O'Brien and Fleming, 1979) and triangular (Whitehead, 1997) type testing boundaries.

4.4 Adding experimental treatment arms
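Looking back at the stopping rules of Section 4.3, the probability in Equation 4.4 can be checked by simulation for any candidate boundaries. The sketch below is an assumption-laden illustration rather than the exact calculation performed by the MAMS package: it assumes equal allocation with equally sized stages, simulates stage-wise increments with a shared control, and uses made-up boundary values; `global_null_fwer` is a hypothetical name.

```python
import random
from math import sqrt


def global_null_fwer(upper, lower, n_arms, reps=20000, seed=1):
    """Monte Carlo estimate of P_0(R) for boundaries upper[j], lower[j]."""
    rng = random.Random(seed)
    stages = len(upper)
    rejections = 0
    for _ in range(reps):
        active = set(range(n_arms))
        sums = [0.0] * n_arms  # running sums of stage-wise z increments
        for j in range(stages):
            control = rng.gauss(0.0, 1.0)  # control noise shared across arms
            for k in active:
                sums[k] += (rng.gauss(0.0, 1.0) - control) / sqrt(2.0)
            z = {k: sums[k] / sqrt(j + 1) for k in active}
            if any(value > upper[j] for value in z.values()):
                rejections += 1  # under the global null any rejection is an error
                break
            active = {k for k in active if z[k] >= lower[j]}
            if not active:
                break  # every experimental arm dropped for futility
    return rejections / reps


# Hypothetical two-stage boundaries for two experimental arms.
print(global_null_fwer(upper=(2.8, 2.3), lower=(0.0, 2.3), n_arms=2))
```

A boundary search would wrap this estimate (or an exact multivariate normal integral) in a root finder over the boundary scale, holding the boundary shape fixed.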

With this setting in mind suppose that a trial is still in progress after stage $J' \in (1, \dots, J)$. At this interim analysis, outside of the trial, $T \ge 1$ new treatments become available, giving $K' = K + 1 + T$ treatments for the trial in total (in the case that all $K + 1$ original treatment arms are still in the trial). For the existing $K + 1$ treatments we shall not alter the recruitment, maintaining that described in Section 4.2. For the new treatments there is clearly no recruitment before they are added and hence $r_{k,j} = 0$ for $k = K + 1, \dots, K + T$ and $j = 1, \dots, J'$, and thus there are no corresponding Z-values. In order to add these new treatments we add recruitment $r_{k,j} n$ for each new treatment $k = K + 1, \dots, K + T$ over the remaining trial stages $j = J' + 1, \dots, J$; note that this recruitment is planned and fixed at the point where the new treatments are added.

As usual we construct estimates for each additional treatment at each stage of the trial they are present for. For treatment $k = 1, \dots, K + T$ at stage $j = J' + 1, \dots, J$ the Z-value is given by $Z_{k,j} = \hat\theta_{k,j} I_{k,j}^{1/2}$.

4.5 Incorporating additional hypotheses

To make formal inference about the additional treatments we incorporate formal testing of additional hypotheses. Incorporating the new hypotheses we now have the null hypotheses $H_{0i}: \theta_i \le 0$ and alternatives $H_{1i}: \theta_i > 0$ for $i = 1, \dots, K + T$, and require strong control of the FWER across all $K + T$ tests.

For the treatments originally included in the trial we wish to incorporate the information from the existing estimates up to the current stage of the trial, $\hat\theta_{J'} = (\hat\theta_{1,J'}, \dots, \hat\theta_{K,J'})$. We extend Equation 2.1 to write the FWER conditional on the observations from the trial so far

$$P_\theta(R \mid \hat\theta_{J'}) \le \alpha \text{ for all } \theta = (\theta_1, \dots, \theta_K). \tag{4.5}$$

As in Section 2.2 we make use of the independent increments of the Z-values in order to split the trial according to data gathered in the first $J'$ stages and data gathered after. For $j = J' + 1, \dots, J$ and $k = 0, 1, \dots, K$ the sample that may be recruited after stage $J'$ is governed by $r'_{k,j} = r_{k,j} - r_{k,J'}$, from which we compute corresponding Z-values $Z'_{k,j}$. For each of the treatment arms from the first stage of the trial $k = 1, \dots, K$ we define weights for data before and after stage $J'$, for $j = J' + 1, \dots, J$ and $k = 1, \dots, K$,

$$w_{1,k,j} = \sqrt{\frac{r_{k,J'} + r_{0,J'}}{r_{k,j} + r_{0,j}}}, \qquad w_{2,k,j} = \sqrt{1 - w_{1,k,j}^2},$$

and re-construct the Z-values for the remainder of the trial as $Z_{k,j} = w_{1,k,j} Z_{k,J'} + w_{2,k,j} Z'_{k,j}$.

It is also useful to re-write the rejection boundaries in terms of only the data collected after stage $J'$, that is for $j = J' + 1, \dots, J$ and $k = 1, \dots, K$

$$u'_{k,j} = \frac{u_j - w_{1,k,j} Z_{k,J'}}{w_{2,k,j}},$$

where the null hypothesis for treatment $k = 1, \dots, K$, $H_{0k}$, is rejected at stage $j$ of the trial if $Z'_{k,j} > u'_{k,j}$, and

$$l'_{k,j} = \frac{l_j - w_{1,k,j} Z_{k,J'}}{w_{2,k,j}},$$

where if $Z'_{k,j} < l'_{k,j}$ the corresponding treatment is dropped for futility.

Under a generalised Dunnett procedure the conditional FWER is maximised under the global null (Stallard and others, 2015); see the supplementary material 7.1 for details. Given this we write the conditional error under the global null as $B(\hat\theta_{J'}) = P_{\mathbf{0}}(R \mid \hat\theta_{J'}) \le \alpha$.

As in Equation 2.2 we have that under the global null

$$\int_{\hat\theta_{J'}} f(\hat\theta_{J'}) B(\hat\theta_{J'}) \, \mathrm{d}\hat\theta_{J'} = \alpha, \tag{4.6}$$

and thus ensuring the conditional FWER is not exceeded for the remainder of the trial for the $K$ initial null hypotheses ensures strong control of the FWER with respect to these endpoints.

We therefore have existing null hypotheses $H_e$ for which the error rate must not exceed $B(\hat\theta_{J'})$ and new null hypotheses $H_n$ for which the error rate must not exceed $\alpha$, as in Section 3. We construct testing boundaries as for the generalised Dunnett procedure, accounting for the conditioning where possible to allow the incorporation of all trial data. As seen in Section 4.3, constructing boundaries may be done following one of several pre-defined shapes; when constructing boundaries in this Section we further assume that the shape of boundary is always of the same type.

For existing treatments $k = 1, \dots, K$ we define the updated testing boundaries for $j = J' + 1, \dots, J$ as $u'_j$ and $l'_j$. Again it is a useful computational trick to write the boundaries incorporating the conditional information using independent increments: for $k = 1, \dots, K$

$$u'_{k,j} = \frac{u'_j - w_{1,k,j} Z_{k,J'}}{w_{2,k,j}},$$

where the null hypothesis in treatment group $k = 1, \dots, K$, $H_{0k}$, is rejected at stage $j$ of the trial if $Z'_{k,j} > u'_{k,j}$, and

$$l'_{k,j} = \frac{l'_j - w_{1,k,j} Z_{k,J'}}{w_{2,k,j}},$$

where if $Z'_{k,j} < l'_{k,j}$ the corresponding treatment is dropped for futility. For the $T$ added treatments we define boundaries for $j = J' + 1, \dots, J$ as $u^*_j$ and $l^*_j$, where for treatment $k = K + 1, \dots, K + T$ at stage $j = J' + 1, \dots, J$ we stop the trial and reject $H_{0k}$ if $Z_{k,j} > u^*_j$ and we drop the treatment for futility if $Z_{k,j} < l^*_j$. To ensure that both the existing endpoints and the additional endpoints are tested at the appropriate error rate we consider two cases when constructing the testing boundaries.

When $\alpha < B(\hat\theta_{J'})$ we first construct the testing boundaries for the $T$ added treatments, then with these fixed we set the boundaries for the original $K$ novel treatments (or what remains of them after the interim analysis). We set the testing boundaries for the introduced endpoints $u^*_j$ and $l^*_j$ at level $\alpha$ using the usual MAMS framework for a trial with $J - J'$ stages and $T$ experimental treatments. Given $u^*_j$ and $l^*_j$ we then find values for $u'_j$ and $l'_j$ such that, for the continuing MAMS trial with $J - J'$ stages and $K + T$ treatments, the FWER under the global null does not exceed $B(\hat\theta_{J'})$ conditional on the data from the first $J'$ stages of the trial. Note that constructing $u'_j$ and $l'_j$ in this way is in the worst case (in terms of the FWER) equivalent to constructing a MAMS trial with $J - J'$ stages and $K + 1$ treatments (if all $K$ experimental treatments are still present) with an error rate of $B(\hat\theta_{J'})$ under the same shape of testing boundaries (see the supplementary material 7.2 for full details).
Constructing the boundaries in this way fulfils the conditions introduced in Section 3 for constructing the overall closed testing procedure, as all implied tests for hypotheses involving $H_e$ are at level $\alpha$ by construction of $B(\hat{\theta}_{J'})$ while all implied tests for introduced hypotheses $H_n$ are at level $\alpha$.

Alternatively, when $\alpha \ge B(\hat{\theta}_{J'})$ we will have equal testing boundaries for all treatments, $u^*_j = u'_j$ and $l^*_j = l'_j$. In this case we choose $u^*_j$, $u'_j$, $l^*_j$ and $l'_j$ such that the continuing MAMS trial with $J - J'$ stages and $K + T$ treatments has a FWER under the global null of $B(\hat{\theta}_{J'})$ conditional on the data from the first $J'$ stages of the trial. We note that the tests of $H_n$ are conservative in this case; in particular the implied tests of introduced hypotheses $H_n$ are at level $B(\hat{\theta}_{J'})$. Since any particular $H_n$ may only be rejected globally based on $B(\hat{\theta}_{J'})$ this is not a concern.

5. Example

5.1 An illustrative example

We now explore the performance of our proposed method. We begin by exploring how the modification of the trial may be conducted. For the initially planned MAMS trial consider a three-stage trial to compare two novel treatments with a common control, recruiting an equal number of patients to each treatment across each stage of the trial; that is $J = 3$, $K = 2$ and $r_k = (1, 2, 3)$ for $k = 0, 1, 2$. Under this design we formally test the null hypotheses $H_1: \theta_1 \le 0$ and $H_2: \theta_2 \le 0$ with $\alpha = 0.05$. Let $\delta = \Phi^{-1}(0.75)\sqrt{2\sigma^2}$ and $\sigma = 1$ (note this choice of $\delta$ and $\sigma$ corresponds to a probability of 0.75 that a randomly selected person on the experimental treatment performs better than a randomly selected person on the control); then at a configuration of $\theta = (\delta, 0)$ we have a target power of $1 - \beta = 0.9$. Using triangular-type testing boundaries Whitehead (1997), the details of the pre-planned MAMS trial are computed using the mams() function of the MAMS package in R Jaki and others (2019). In this design we recruit 10 patients per treatment arm per stage, giving a maximum possible sample size of 90 patients. For stopping the trial we have upper boundaries $u = (2.435, \ldots)$ and lower boundaries $l = (0.000, \ldots)$.

Suppose at the first interim analysis, $J' = 1$, we wish to add two novel treatments to the trial, $T = 2$, adding the null hypotheses $H_3: \theta_3 \le 0$ and $H_4: \theta_4 \le 0$. Let the observations from the first stage of the trial be $Z = (2, \ldots)$, with conditional error $B(\hat{\theta}_{J'}) = 0.24$, from which we may plan the rest of the trial. In this case we have $B(\hat{\theta}_{J'}) > \alpha$ and thus when constructing the stopping boundaries for the remainder of the trial we have different stopping boundaries for the introduced arms and those already in the trial. For the introduced arms we compute upper boundaries for stages 2 and 3 of $u^* = (2.179, \ldots)$ and lower boundaries $l^* = (0.726, \ldots)$; for the arms already in the trial the updated boundaries are $u' = (2.240, \ldots)$ and $l' = (0.747, \ldots)$.

Table 3 shows the operating characteristics of the updated trial based on 1000 simulations. We see that, as expected, for each hypothesis the probability of rejecting the null hypothesis does not exceed $B(\hat{\theta}_{J'}) = 0.24$ (existing hypotheses) or $\alpha = 0.05$ (introduced hypotheses) when the corresponding $\theta_i = 0$, as required by the construction of our test. Indeed in most cases we are somewhat conservative for each individual hypothesis globally, but recall for the FWER we are concerned with rejecting one or more null hypotheses and so this conservatism is necessary. Examining the probabilities of rejecting multiple hypotheses we confirm that under the global null we have achieved a conditional FWER of $B(\hat{\theta}_{J'}) = 0.24$ as expected.

From Table 3 we may also assess the performance where the null hypothesis is false. We see a higher probability of rejecting $H_1$ and $H_2$ (due to the relatively positive results from the first stage of the trial), with high probabilities of rejecting $H_1$ (the better treatment according to the interim analysis). For the added experimental treatments we have a reasonable chance to reject the null hypotheses, particularly when there is no benefit to the original experimental treatments, although it would appear that it takes slightly longer to find this result.

θ               $P_\theta(R_1)$   $P_\theta(R_2)$   $P_\theta(R_3)$   $P_\theta(R_4)$   $E_\theta(N)$
(0, 0, 0, 0)    0.17              0.08              0.03              0.02              72 (+30)
(δ, 0, 0, 0)    0.97              0.05              0.01              0.01              54 (+30)
(0, δ, 0, 0)    0.13              0.93              0.02              0.02              59 (+30)
(0, 0, δ, 0)    0.13              0.06              0.79              0.02              65 (+30)
(δ, δ, 0, 0)    0.92              0.76              0.01              0.01              53 (+30)
(0, 0, δ, δ)    0.14              0.05              0.68              0.70              62 (+30)
(δ, δ, δ, δ)    0.90              0.74              0.52              0.52              53 (+30)

θ               Fail to reject   Reject one   Reject two   Reject three   Reject four
(0, 0, 0, 0)    0.76             0.20         0.04         0.01           0.00
(δ, 0, 0, 0)    0.02             0.91         0.06         0.01           0.00
(0, δ, 0, 0)    0.06             0.80         0.12         0.01           0.00
(0, 0, δ, 0)    0.18             0.67         0.12         0.02           0.00
(δ, δ, 0, 0)    0.01             0.30         0.67         0.02           0.00
(0, 0, δ, δ)    0.07             0.43         0.39         0.10           0.02
(δ, δ, δ, δ)    0.00             0.18         0.25         0.29           0.29

Table 3. Operating characteristics for the remainder of the trial given $Z = (2, \ldots)$, under the corresponding configuration of $\theta$. Where $R_i$ is the event that $H_i$ is rejected and $N$ is the total sample size (note 30 participants already recruited).

5.2 Comparison of performance

Suppose again at the first interim analysis, $J' = 1$, we wish to add two further treatment arms, $T = 2$, to a trial in progress. Our proposed method is not the only way one might study the additional treatment arms. We consider an integrated approach (as suggested by our example in Section 2.4) and compare it with two options that maintain the integrity of the results given that observations are already available from the trial: option 1 is to conduct another separate MAMS trial comparing the new treatments with the control (note this means that patients must be recruited to the control in both trials); option 2 is to conclude the current trial at the interim analysis and start a new trial incorporating all four experimental treatments. If the original trial concludes statistical significance at the interim analysis we assume no treatment is studied further under all three scenarios.

We perform simulations, generating 1000 realisations of the trial (including the first stage), to evaluate the overall operating characteristics of the trials continuing under each of these options. Table 4 shows the probabilities of stopping at the first interim analysis, and hence of the trial continuing beyond it, under each of our configurations of interest. We see that: under the global null the probability that the trial continues beyond the first interim analysis is 0.65 (applying binding futility boundaries); if only one of $\theta_1$ or $\theta_2$ equals $\delta$ we have a high probability of the trial stopping early to declare efficacy of 0.39, with a probability of 0.6 of the trial continuing beyond the first interim analysis; if the treatment effect is present in both the experimental treatments there is a probability of 0.54 of stopping the trial early to declare efficacy.

θ         $P_\theta$(Stop early for futility)   $P_\theta$(Stop early for efficacy)
(0, 0)    0.33                                  0.02
(δ, 0)    0.01                                  0.39
(δ, δ)    0.00                                  0.54

Table 4. Characteristics of trial adding endpoints conditional on stage 1.
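The first-stage stopping probabilities in Table 4 can be approximated with a short Monte Carlo sketch. The code below is illustrative rather than the authors' simulation code. It assumes normally distributed responses with $\sigma = 1$, 10 patients per arm in stage 1, $\delta = \Phi^{-1}(0.75)\sqrt{2}\sigma$, first-stage boundaries of 2.435 (efficacy) and 0.000 (futility) as quoted in Section 5.1, and a trial that stops for efficacy if any arm crosses the upper boundary and for futility only if every arm falls below the lower boundary. All function and constant names are hypothetical.

```python
import math
import random
from statistics import NormalDist

SIGMA = 1.0
N_PER_ARM = 10           # stage-1 patients per arm, as in the example
U1, L1 = 2.435, 0.000    # first-stage boundaries quoted in the example
DELTA = NormalDist().inv_cdf(0.75) * math.sqrt(2.0) * SIGMA


def stage1_stop_probs(theta, n_sim=100_000, seed=2020):
    """Estimate (P(stop for futility), P(stop for efficacy)) at stage 1.

    theta holds the true effects of the experimental arms versus control.
    Assumed stopping rule: efficacy if any Z exceeds U1, futility if all
    Z fall below L1.
    """
    rng = random.Random(seed)
    se_mean = SIGMA / math.sqrt(N_PER_ARM)
    root_info = math.sqrt(N_PER_ARM / (2.0 * SIGMA ** 2))
    futility = efficacy = 0
    for _ in range(n_sim):
        control = rng.gauss(0.0, se_mean)   # shared control arm mean
        z = [(rng.gauss(t, se_mean) - control) * root_info for t in theta]
        if max(z) > U1:
            efficacy += 1
        elif max(z) < L1:
            futility += 1
    return futility / n_sim, efficacy / n_sim


def futility_under_null_exact(rho=0.5):
    """Exact P(Z1 < 0, Z2 < 0) for a standard bivariate normal.

    rho = 0.5 is the correlation induced by the shared control arm
    under equal allocation.
    """
    return 0.25 + math.asin(rho) / (2.0 * math.pi)


if __name__ == "__main__":
    for theta in [(0.0, 0.0), (DELTA, 0.0), (DELTA, DELTA)]:
        print(theta, stage1_stop_probs(theta))
```

Under these assumptions the simulated values should sit close to the entries of Table 4, and the exact orthant probability `futility_under_null_exact()` equals one third, in line with the reported 0.33 for stopping early for futility under the global null.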

Let us examine the overall operating characteristics, where if the trial continues beyond the interim analysis we follow either our proposed method or one of the other two options to add the arms. Table 5 shows that under our proposed method: as expected under the global null the FWER is strongly controlled; when only $H_1$ is false there is a probability of 0.92 of rejecting it, falling to a probability of 0.72 of rejecting $H_1$ when $H_2$ is also false due to the possibility to conclude the trial early for efficacy elsewhere; when only $H_3$ is false the probability of rejection is 0.48, this falls to 0.40 when $H_1$ is false due to the probability of stopping the trial early due to an arm with more data, and when $H_4$ is also false there is a probability of rejecting each false hypothesis of 0.42; if all experimental arms offer a benefit over the control we see a probability of 0.69 of rejecting $H_1$ and $H_2$ and 0.23 of rejecting $H_3$ and $H_4$. The probability of rejecting one or more null hypotheses with our proposed method varies largely depending on which treatments are effective: when either $H_1$ or $H_2$ is false there is a high probability of rejecting one or more null hypotheses, greater than 0.9 in all configurations investigated; whereas if $H_3$ or $H_4$ is false there is around a 0.5 chance of rejecting one or more null hypotheses if $H_1$ and $H_2$ are true.

We compare our proposed method with the characteristics of conducting a separate trial for the new arms in Table 6. The probabilities of rejecting $H_1$ and $H_2$ are slightly higher; however, while each trial protects the FWER within the trial there is no overall adjustment to the error rate across the two trials. This improvement is only 0.01 when only $H_1$ and/or $H_2$ are false, with larger increases when one or more of $H_3$ and $H_4$ are false; however the probabilities of rejecting one or more null hypotheses are consistent between methods, not differing by more than 0.02 under the configurations investigated. There are larger increases in the probability of rejecting $H_3$ and $H_4$; this is due to the fact that if the trial has continued beyond the first interim analysis under our proposed method the trial as a whole may conclude early due to demonstrated efficacy in the first two treatments. Our proposed method gives a probability of 0.94 of rejecting one or more null hypotheses within the same trial when there is an effect in one of the original treatments and one of the added; while the original trial matches this probability it is not possible to simultaneously make the comparison with the other experimental treatment. The number of patients required by the two separate trials increases by around 20 patients under the configurations investigated.

When we start a new trial to incorporate all the treatments we see from Table 7 that the probability of rejecting $H_1$ or $H_2$ when they are false is reduced by around 0.05 (varying slightly based on exact configuration). This difference is entirely driven by the second and third stages of the trial. The probabilities of rejecting false $H_3$ or $H_4$ differ by only 0.01 at most. The more noticeable difference is a consistent trend of higher probabilities of rejecting multiple null hypotheses under our proposed method. In addition, when restarting the trial the sample size is increased by around 15 patients across all configurations we investigate.

θ               $P_\theta(R_1)$   $P_\theta(R_2)$   $P_\theta(R_3)$   $P_\theta(R_4)$   $E_\theta(N)$ (excluding first stage)
(0, 0, 0, 0)    0.02              0.02              0.01              0.01              56
(δ, 0, 0, 0)    0.92              0.01              0.01              0.01              52
(0, 0, δ, 0)    0.02              0.02              0.48              0.01              59
(δ, 0, δ, 0)    0.85              0.01              0.40              0.01              52
(δ, δ, 0, 0)    0.72              0.72              0.01              0.01              47
(0, 0, δ, δ)    0.01              0.01              0.42              0.42              60
(δ, δ, δ, δ)    0.69              0.69              0.23              0.23              44

θ               Fail to reject   Reject one   Reject two   Reject three   Reject four
(0, 0, 0, 0)    0.95             0.04         0.01         0.00           0.00
(δ, 0, 0, 0)    0.08             0.89         0.03         0.01           0.00
(0, 0, δ, 0)    0.51             0.47         0.02         0.00           0.00
(δ, 0, δ, 0)    0.06             0.62         0.32         0.01           0.00
(δ, δ, 0, 0)    0.03             0.49         0.47         0.01           0.00
(0, 0, δ, δ)    0.42             0.32         0.25         0.01           0.00
(δ, δ, δ, δ)    0.01             0.43         0.35         0.09           0.11

Table 5. Under our proposed update procedure, probabilities of rejecting null hypotheses and expected sample size under the corresponding configuration of $\theta$. Where $R_i$ is the event that we reject $H_i$ and $N$ is the total sample size.

                Original trial                                      Additional trial
θ               $P_\theta(R_1)$   $P_\theta(R_2)$   $E_\theta(N)$   $P_\theta(R_3)$   $P_\theta(R_4)$   $E_\theta(N)$
(0, 0, 0, 0)    0.03              0.03              49              0.02              0.02              25
(δ, 0, δ, 0)    0.93              0.01              47              0.50              0.01              24
(δ, δ, δ, δ)    0.73              0.73              46              0.32              0.32              18

                Original trial                              Additional trial
θ               Fail to reject   Reject one   Reject two   Fail to reject   Reject one   Reject two
(0, 0, 0, 0)    0.95             0.05         0.00         0.97             0.03         0.00
(δ, 0, δ, 0)    0.07             0.92         0.01         0.50             0.49         0.01
(δ, δ, δ, δ)    0.01             0.51         0.48         0.57             0.21         0.21

Table 6. Under two separate trials (option 1), probabilities of rejecting null hypotheses and expected sample size under the corresponding configuration of $\theta$, assuming the trial continues beyond the interim analysis. Where $R_i$ is the event that we reject $H_i$ and $N$ is the total sample size within the corresponding trial (original or additional).
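The sample-size comparison between the three options can be checked with simple arithmetic on the tabulated values. The sketch below transcribes the expected sample sizes from Tables 5, 6 and 7 for the three configurations reported in all three tables. It assumes these expected sample sizes are directly comparable across tables, and the dictionary keys and function name are hypothetical labels chosen for illustration (`d` stands for $\delta$).

```python
# Expected sample sizes, transcribed from Tables 5, 6 and 7.
# Assumption: the values are directly comparable across the three tables.
proposed = {"(0,0,0,0)": 56, "(d,0,d,0)": 52, "(d,d,d,d)": 44}    # Table 5
separate = {                                                      # Table 6
    "(0,0,0,0)": (49, 25),   # (original trial, additional trial)
    "(d,0,d,0)": (47, 24),
    "(d,d,d,d)": (46, 18),
}
restart = {"(0,0,0,0)": 71, "(d,0,d,0)": 67, "(d,d,d,d)": 59}     # Table 7


def extra_patients(proposed, separate, restart):
    """Extra expected patients relative to the proposed method.

    Returns {theta: (extra under option 1, extra under option 2)}.
    """
    result = {}
    for theta, n_proposed in proposed.items():
        n_separate = sum(separate[theta])   # both trials recruit patients
        result[theta] = (n_separate - n_proposed, restart[theta] - n_proposed)
    return result


if __name__ == "__main__":
    for theta, (opt1, opt2) in extra_patients(proposed, separate, restart).items():
        print(f"{theta}: option 1 +{opt1}, option 2 +{opt2}")
```

For these three configurations the differences are 18 to 20 patients for two separate trials and 15 patients for restarting, consistent with the "around 20" and "around 15" quoted in the text.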

θ               $P_\theta(R_1)$   $P_\theta(R_2)$   $P_\theta(R_3)$   $P_\theta(R_4)$   $E_\theta(N)$ (excluding first stage)
(0, 0, 0, 0)    0.02              0.02              0.01              0.01              71
(δ, 0, 0, 0)    0.84              0.02              0.01              0.01              68
(0, 0, δ, 0)    0.01              0.01              0.49              0.01              71
(δ, 0, δ, 0)    0.78              0.01              0.39              0.01              67
(δ, δ, 0, 0)    0.67              0.67              0.00              0.00              59
(0, 0, δ, δ)    0.01              0.01              0.41              0.41              70
(δ, δ, δ, δ)    0.63              0.63              0.25              0.25              59

θ               Fail to reject   Reject one   Reject two   Reject three   Reject four
(0, 0, 0, 0)    0.95             0.04         0.00         0.00           0.00
(δ, 0, 0, 0)    0.16             0.82         0.02         0.00           0.00
(0, 0, δ, 0)    0.49             0.49         0.02         0.00           0.00
(δ, 0, δ, 0)    0.08             0.67         0.24         0.01           0.00
(δ, δ, 0, 0)    0.05             0.54         0.40         0.01           0.00
(0, 0, δ, δ)    0.41             0.33         0.25         0.01           0.00
(δ, δ, δ, δ)    0.02             0.46         0.34         0.10           0.08

Table 7. Under starting a new trial incorporating all treatments (option 2), probabilities of rejecting null hypotheses and expected sample size under the corresponding configuration of $\theta$, assuming the trial continues beyond the interim analysis. Where $R_i$ is the event that we reject $H_i$ and $N$ is the total sample size.

6. Discussion

The motivation for adding a treatment to a trial in progress is clear. Should a new treatment become available it is desirable to incorporate it, allowing direct comparisons while preserving integrity and avoiding delays to the overall development process. Furthermore there are many practical benefits. For example, while adding treatments to the trial in progress requires a change to randomisation procedures and an increase in the total possible recruitment, the adaptive nature of the trial design will mean centres are well prepared for such changes.

Our proposed general framework for adding experimental treatments to a trial in progress builds upon the work of Hommel (2001), allowing any trial with strong control of the FWER to add new hypotheses. This is achieved while simultaneously allowing other alterations to the design of the trial using the conditional error principle. The additional benefit of this approach is that we ensure that all information already collected from volunteers to our trial is utilised in inference and decision making. When comparing our proposed approach to possible alternatives we see from the results in Section 2.4 that our introduced approach is preferable.

This framework can be applied in our motivational setting of MAMS platform trials. The examples in Section 5 demonstrate that this does indeed strongly control the FWER as expected. The penalty of doing so in terms of the probability of rejecting the null hypotheses is marginal and only has a notable impact on the introduced arms; optimising the recruitment proportions across configurations of the true treatment effects may reduce the impact of this further. In addition the combination of utilising the existing data and the efficient use of control patients across the trial yields a reduction in the expected sample size when compared to alternatives that do not make such use of the existing data.

The improvement of operating characteristics is not the primary motivation for adding treatments to a trial in progress. As is the argument for MAMS trials in general, this allows for a reduction in logistical and administrative effort and a speeding up of the overall development process, as well as allowing direct comparisons of the treatments within the same trial.

Our general method for adding hypotheses to a trial in progress has broader application than the MAMS trials within which we have applied it. As noted in Section 3 any testing procedure that gives strong control of the FWER may be written as a closed testing procedure and thus ...

7. Supplementary material

7.1 Conditional FWER

To see that the conditional FWER is maximised under the global null we consider the crucial element of the proof provided by Magirr and others (2012). The claim is that for

$$A_{k,j}(\delta_k) = \left\{ Z_{k,j} < l_j + (\mu_k - \mu_0 - \delta_k) I_{k,j}^{1/2} \right\} \tag{7.7}$$

and

$$B_{k,j}(\delta_k) = \left\{ l_j + (\mu_k - \mu_0 - \delta_k) I_{k,j}^{1/2} < Z_{k,j} < u_j + (\mu_k - \mu_0 - \delta_k) I_{k,j}^{1/2} \right\} \tag{7.8}$$

and any $\epsilon_k > 0$,

$$\bigcup_{j=1}^{J} \left[ \left\{ \bigcap_{i=1}^{j-1} B_{k,i}(\delta_k + \epsilon_k) \right\} \cap A_{k,j}(\delta_k + \epsilon_k) \right] \subseteq \bigcup_{j=1}^{J} \left[ \left\{ \bigcap_{i=1}^{j-1} B_{k,i}(\delta_k) \right\} \cap A_{k,j}(\delta_k) \right].$$

Take

$$\omega = (Z_{k,1}, \ldots, Z_{k,J}) \in \bigcup_{j=1}^{J} \left[ \left\{ \bigcap_{i=1}^{j-1} B_{k,i}(\delta_k + \epsilon_k) \right\} \cap A_{k,j}(\delta_k + \epsilon_k) \right].$$

For some $m \in \{1, \ldots, J\}$, $Z_{k,m} \in A_{k,m}(\delta_k + \epsilon_k)$ and $Z_{k,j} \in B_{k,j}(\delta_k + \epsilon_k)$ for $j = 1, \ldots, m-1$. Now $Z_{k,m} \in A_{k,m}(\delta_k + \epsilon_k)$ implies $Z_{k,m} \in A_{k,m}(\delta_k)$, and $Z_{k,j} \in B_{k,j}(\delta_k + \epsilon_k)$ implies $Z_{k,j} \in B_{k,j}(\delta_k) \cup A_{k,j}(\delta_k)$ for $j = 1, \ldots, m-1$. Therefore,

$$\omega \in \bigcup_{j=1}^{J} \left[ \left\{ \bigcap_{i=1}^{j-1} B_{k,i}(\delta_k) \right\} \cap A_{k,j}(\delta_k) \right].$$

Writing Equations 7.7 and 7.8 in terms of the conditional boundaries we have

$$A_{k,j}(\delta_k) = \left\{ Z'_{k,j} < l'_{k,j} + (\mu_k - \mu_0 - \delta_k) I_{k,j}^{1/2} \right\}$$

and

$$B_{k,j}(\delta_k) = \left\{ l'_{k,j} + (\mu_k - \mu_0 - \delta_k) I_{k,j}^{1/2} < Z'_{k,j} < u'_{k,j} + (\mu_k - \mu_0 - \delta_k) I_{k,j}^{1/2} \right\};$$

this does not change the arguments presented above and thus this crucial condition holds.

7.2 Preserving consonance

Consider a MAMS trial for $K$ novel treatments and a common control conducted over $J$ stages. Consider the efficacy boundaries $u = (u_1, \ldots, u_J)$ defined to achieve a FWER of $\alpha$, where at interim analysis $j = 1, \ldots, J$ we reject $H_k$ if $Z_{k,j} > u_j$ for $k = 1, \ldots, K$. Equivalently we may define efficacy boundaries for each treatment arm $u_{k,j}$, where usually $u_{1,j} = \ldots = u_{K,j}$ for all $j = 1, \ldots, J$.

Suppose without loss of generality that for $H_1$ we require a test at some $\alpha' < \alpha$; under the same shape of testing boundary this will require a new efficacy boundary $u'_{1,j} \ge u_{1,j}$ for all $j = 1, \ldots, J$. In order to achieve overall $\alpha$ for the testing boundary we choose $u'_{k,j} \le u_{k,j}$ for all $k = 2, \ldots, K$, $j = 1, \ldots, J$. In order to use this new efficacy boundary for global rejections of the null hypotheses we require that the testing procedure is consonant (all implied sub-hypotheses are tested at the appropriate level: $H_1$ at $\alpha'$ and $H_2$ to $H_K$ at $\alpha$), as this implied consonance is key in why a MAMS trial is a valid closed testing procedure.

As $\alpha' \to 0$, $u'_{1,j} \to \infty$. Assuming we always apply the same shape of testing boundary, $u'_{k,j}$ decreases monotonically for all $k = 2, \ldots, K$, $j = 1, \ldots, J$ to ensure the overall procedure achieves $\alpha$. The limit of this decrease is, by construction, a set of testing boundaries of the specified form for $K - 1$ treatments over $J$ stages. That is to say, as required, the procedure is consonant.

References

Bennett, Maxine and Mander, Adrian P. (2020). Designs for adding a treatment arm to an ongoing clinical trial. Trials (1), 1–12.

Bretz, Frank, Schmidli, Heinz, König, Franz, Racine, Amy and Maurer, Willi. (2006). Confirmatory seamless phase II/III clinical trials with hypotheses selection at interim: general concepts. Biometrical Journal: Journal of Mathematical Methods in Biosciences (4), 623–634.

Burnett, Thomas. (2017). Bayesian decision making in adaptive clinical trials [Ph.D. Thesis]. University of Bath.

Dmitrienko, Alex, Tamhane, Ajit C and Bretz, Frank. (2009). Multiple testing problems in pharmaceutical statistics. CRC Press.

Dunnett, Charles W. (1955). A multiple comparison procedure for comparing several treatments with a control. Journal of the American Statistical Association (272), 1096–1121.

Hatfield, Isabella, Allison, Annabel, Flight, Laura, Julious, Steven A and Dimairo, Munyaradzi. (2016). Adaptive designs undertaken in clinical research: a review of registered clinical trials. Trials (1), 150.

Hommel, Gerhard. (2001). Adaptive modifications of hypotheses after an interim analysis. Biometrical Journal: Journal of Mathematical Methods in Biosciences (5), 581–589.

Jaki, Thomas. (2013). Uptake of novel statistical methods for early-phase clinical studies in the UK public sector. Clinical Trials (2), 344–346.

Jaki, Thomas. (2015). Multi-arm clinical trials with treatment selection: what can be gained and at what price? Clinical Investigation (4), 393–399.

Jaki, Thomas and Hampson, Lisa V. (2016). Designing multi-arm multi-stage clinical trials using a risk–benefit criterion for treatment selection. Statistics in Medicine (4), 522–533.

Jaki, Thomas Friedrich, Pallmann, Philip Steffen and Magirr, Dominic. (2019). The R package MAMS for designing multi-arm multi-stage clinical trials. Journal of Statistical Software (4).

Koenig, Franz, Brannath, Werner, Bretz, Frank and Posch, Martin. (2008). Adaptive Dunnett tests for treatment selection. Statistics in Medicine (10), 1612–1625.

Magirr, Dominic, Jaki, Thomas and Whitehead, John. (2012). A generalized Dunnett test for multi-arm multi-stage clinical studies with treatment selection. Biometrika (2), 494–501.

Magirr, Dominic, Stallard, Nigel and Jaki, Thomas. (2014). Flexible sequential designs for multi-arm clinical trials. Statistics in Medicine (19), 3269–3279.

Marcus, Ruth, Peritz, Eric and Gabriel, K Ruben. (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika (3), 655–660.

O'Brien, Peter C and Fleming, Thomas R. (1979). A multiple testing procedure for clinical trials. Biometrics, 549–556.

Parmar, Mahesh KB, Barthel, Friederike M-S, Sydes, Matthew, Langley, Ruth, Kaplan, Rick, Eisenhauer, Elizabeth, Brady, Mark, James, Nicholas, Bookman, Michael A, Swart, Ann-Marie and others. (2008). Speeding up the evaluation of new agents in cancer. Journal of the National Cancer Institute (17), 1204–1214.

Pocock, Stuart J. (1977). Group sequential methods in the design and analysis of clinical trials. Biometrika (2), 191–199.

Proschan, Michael A and Hunsberger, Sally A. (1995). Designed extension of studies based on conditional power. Biometrics, 1315–1324.

R Core Team. (2019). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Royston, Patrick, Parmar, Mahesh KB and Qian, Wendi. (2003). Novel designs for multi-arm clinical trials with survival outcomes with an application in ovarian cancer. Statistics in Medicine (14), 2239–2256.

Schmidli, Heinz, Bretz, Frank, Racine, Amy and Maurer, Willi. (2006). Confirmatory seamless phase II/III clinical trials with hypotheses selection at interim: applications and practical considerations. Biometrical Journal (4), 635–643.

Stallard, Nigel, Kunz, Cornelia Ursula, Todd, Susan, Parsons, Nicholas and Friede, Tim. (2015). Flexible selection of a single treatment incorporating short-term endpoint information in a phase II/III clinical trial. Statistics in Medicine (23), 3104–3115.

Stallard, Nigel and Todd, Susan. (2003). Sequential designs for phase III clinical trials incorporating treatment selection. Statistics in Medicine (5), 689–703.

Sugitani, Toshifumi, Posch, Martin, Bretz, Frank and Koenig, Franz. (2018). Flexible alpha allocation strategies for confirmatory adaptive enrichment clinical trials with a prespecified subgroup. Statistics in Medicine (24), 3387–3402.

Sydes, Matthew R, Parmar, Mahesh KB, James, Nicholas D, Clarke, Noel W, Dearnaley, David P, Mason, Malcolm D, Morgan, Rachel C, Sanders, Karen and Royston, Patrick. (2009). Issues in applying multi-arm multi-stage methodology to a clinical trial in prostate cancer: the MRC STAMPEDE trial. Trials (1), 39.

Urach, Susanne and Posch, Martin. (2016). Multi-arm group sequential designs with a simultaneous stopping rule. Statistics in Medicine (30), 5536–5550.

Wason, James, Magirr, Dominic, Law, Martin and Jaki, Thomas. (2016). Some recommendations for multi-arm multi-stage trials. Statistical Methods in Medical Research (2), 716–727.

Whitehead, John. (1997).