Learning for Dose Allocation in Adaptive Clinical Trials with Safety Constraints
Cong Shen, Zhiyang Wang, Sofia S. Villar, Mihaela van der Schaar
Abstract
Phase I dose-finding trials are increasingly challenging as the relationship between efficacy and toxicity of new compounds (or combination of them) becomes more complex. Despite this, most commonly used methods in practice focus on identifying a Maximum Tolerated Dose (MTD) by learning only from toxicity events. We present a novel adaptive clinical trial methodology, called Safe Efficacy Exploration Dose Allocation (SEEDA), that aims at maximizing the cumulative efficacies while satisfying the toxicity safety constraint with high probability. We evaluate performance objectives that have operational meanings in practical clinical trials, including cumulative efficacy, recommendation/allocation success probabilities, toxicity violation probability, and sample efficiency. An extended SEEDA-Plateau algorithm that is tailored for the increase-then-plateau efficacy behavior of molecularly targeted agents (MTA) is also presented. Through numerical experiments using both synthetic and real-world datasets, we show that SEEDA outperforms state-of-the-art clinical trial designs by finding the optimal dose with higher success rate and fewer patients.
1. Introduction
An adaptive clinical trial utilizes the accumulated results to dynamically modify its future trajectory for better efficiency and ethics, while preserving the integrity and validity of the study. Studies such as the phase I trial in Acute Myeloid Leukaemia in (Yap et al., 2013) and the Cancer Research UK study CR0720-11 in (Whitehead et al., 2012) have suggested that even some simple forms of adaptive design lead to better usage of resources and require fewer participants. These promising results have spurred interest in developing adaptive clinical trial methodologies in recent years (Villar et al., 2015a; Pallmann et al., 2018; Atan et al., 2019; Lee et al., 2020), which is of great importance because running an actual clinical trial on human subjects is expensive and ethically sensitive. A well-designed trial methodology with thorough theoretical and simulated investigation is widely acknowledged as a crucial first step.

Author affiliations: University of Virginia, USA; University of Pennsylvania, USA; University of Cambridge, United Kingdom; University of California, Los Angeles, USA. Correspondence to: Cong Shen.
Proceedings of the International Conference on Machine Learning, Vienna, Austria, PMLR 108, 2020. Copyright 2020 by the author(s).

Traditionally, the goal of phase I clinical trials is to identify the Maximum Tolerated Dose (MTD) of a cytotoxic (CTX) or therapeutic agent, which is then used for subsequent studies (Storer, 1989). However, modern cancer phase I trials test antineoplastic agents in patients with advanced cancer stages, who have often exhausted all other available treatment options (Roberts et al., 2004). These participants usually expect therapeutic benefit from participating in the trial, which has motivated the trial design to include efficacy as a co-primary end point of phase I dose-finding studies (Yan et al., 2017; Paoletti & Postel-Vinay, 2018). In addition, the monotonicity assumption for the dose-efficacy relationship is widely adopted in state-of-the-art designs, which is reasonable for cytotoxic agents but may not apply to the new molecularly targeted agents (MTA) such as monoclonal antibodies (see (Postel-Vinay et al., 2009) for an exemplary trial that illustrates this issue).
Designing adaptive clinical trials that can properly address the intrinsic conflict between learning and treatment effectiveness for general dose-response models has become an important task for phase I clinical trials.

In addition to the well-known 3+3 design (Storer, 1989) and the continual reassessment method (CRM) (O'Quigley et al., 1990) (and its many variants), Bayesian approaches such as Thompson Sampling (TS) (Aziz et al., 2019) and the Gittins index (Villar et al., 2015a;b) have been proposed in the literature for dose-finding studies. However, these methods were originally designed for simplified models that do not capture some of the unique characteristics of clinical trials, often leading to lack of randomization (Villar et al., 2015b), inefficient use of side information (Villar & Rosenberger, 2018), and reduced power levels and estimation issues. Notably, for cases where the best dose for combination therapies is to be found, unknown synergistic/antagonistic effects are likely to exist and naive designs will fail to identify them. For MTA, the existence of a plateau of efficacy has been discussed in (Zang et al., 2014) and (Riviere et al., 2018), which indicates that the toxicity constraint must be jointly studied with the dose-efficacy relationship for certain new compounds. This is also confirmed by the real-world trial result; see (Tighiouart et al., 2014). Last but not least, safety constraints such as minimizing the adverse events (AE) (Petroni et al., 2017) have not been properly evaluated with theoretical guarantees. Table 1 summarizes some representative studies in this direction.

Table 1: Representative adaptive clinical trial studies

| Study | Treatment | Category | Methodology | Evaluation |
|---|---|---|---|---|
| (Tighiouart et al., 2014) | Veliparib | CTX | EWOC-PH | simulated trial |
| (Whitehead et al., 2012) | MK-0752 | CTX | joint phase I and II design | simulated trial |
| (Lee et al., 2017) | Erlotinib | MTA | extended TITE-CRM | simulated trial |
| (Thiessen et al., 2010) | Lapatinib | MTA | escalation to DLT | real-world trial data |

In this paper, we address these challenges by developing new dose-finding methods that explicitly impose safety constraints on the allocation and recommendation of dose levels in a phase I clinical trial. Through the lens of multi-armed bandits (MAB), we propose the
Safe Efficacy Exploration Dose Allocation (SEEDA) algorithm that adaptively updates the admissible set of dose levels satisfying the safety constraints, thus limiting the exploration of doses with harmful effect. Performance analysis for SEEDA is carried out with respect to several measures that have operational meanings in clinical trials, including the probability of safety constraint violation, the average efficacy for patients, and the recommendation and allocation probabilities. Noting that SEEDA only leverages the dose-toxicity logistic model and makes no assumptions on the efficacy, we then show that, by considering the increase-then-plateau feature of the dose-efficacy relationship for MTA, SEEDA-Plateau leads to better performance by leveraging the unimodal structure. Experiments on simulated datasets as well as clinical trials built from real-world datasets show that the proposed methods are capable of finding the optimal dose with higher success rate and fewer patients in most cases, compared to other state-of-the-art designs.
2. Model and problem formulation
In a phase I dose-finding clinical trial, a total of $K$ doses are given, where the $k$-th dose is denoted as $d_k \in \mathcal{D}$, $k \in \mathcal{K} = \{1, 2, \ldots, K\}$. The performance is characterized by both efficacy and toxicity. We model the efficacy $X$ and toxicity $Y$ for dose $d_k$ as Bernoulli random variables with unknown probabilities $q_k$ and $p_k$, respectively, where $X = 1$ ($X = 0$) indicates that the dose level is effective (not effective), and $Y = 1$ ($Y = 0$) indicates that the dose is harmful (not harmful) to the patient.

We consider adaptive clinical trials where information learned from previous trial patients can be used in allocating doses to subsequent patients (Atan et al., 2019; Villar et al., 2015a; Aziz et al., 2019). For the $t$-th patient, dose $I(t)$ is selected based on a policy that uses past observations, and administered to the patient. The efficacy outcome $X_t$ and toxicity response $Y_t$ are realized based on their distributions $X_t \sim \mathrm{Ber}(q_{I(t)})$ and $Y_t \sim \mathrm{Ber}(p_{I(t)})$, and observed by the trialist.

We adopt a well-known dose-toxicity logistic model proposed in (O'Quigley et al., 1990) to describe the toxicity probability for different dose levels:

$$p_k(a) = \left(\frac{\tanh d_k + 1}{2}\right)^{a}, \tag{1}$$

where $a$ is a global parameter for all the dose levels. It can be verified that Eqn. (1) satisfies the assumption that the toxicity monotonically increases with dose $d_k$. The unsafe dose levels are defined as those whose toxicity probabilities $p_k$ are above a pre-determined target toxicity probability $\theta$, which is referred to as the MTD threshold. Hence the toxicities of all doses can be written as $p_1 \le p_2 \le \cdots \le p_M < \theta < p_{M+1} \le \cdots \le p_K$, where the (unknown) $M$ denotes the number of safe doses. The efficacy-dose relationship is not modeled to allow for the development of a general algorithm. The specific increase-then-plateau efficacy behavior of MTA will be exploited in Section 4.
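As a small illustration of the toxicity model in Eqn. (1), the following Python sketch evaluates $p_k(a)$ on a set of dose labels and counts the safe doses $M$. The dose labels, the value of $a$, and the threshold are hypothetical values chosen only for illustration; they are not taken from the paper.

```python
import math

def toxicity_prob(dose: float, a: float) -> float:
    """Toxicity probability p_k(a) = ((tanh(d_k) + 1) / 2) ** a, as in Eqn. (1)."""
    return ((math.tanh(dose) + 1.0) / 2.0) ** a

def num_safe_doses(doses, a, theta):
    """Number of doses with toxicity strictly below theta (M in the text)."""
    return sum(1 for d in doses if toxicity_prob(d, a) < theta)

# Hypothetical dose labels and parameters, for illustration only.
doses = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0]
a_true, theta = 1.0, 0.35

probs = [toxicity_prob(d, a_true) for d in doses]
M = num_safe_doses(doses, a_true, theta)
print([round(p, 3) for p in probs], "safe doses:", M)
```

Because the base $(\tanh d_k + 1)/2$ lies in $(0, 1)$, the toxicity increases with the dose label for any fixed $a > 0$ and decreases as $a$ grows.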
In practice, toxicity is typically measured by the presence or absence of a dose-limiting toxicity (DLT) reported in a fixed evaluation window after administering the drug.

Several objectives are often desired for a successful dose-finding study, which are summarized as follows.
• Successful recommendation. At the end of the trial ($n$ patients) a dose recommendation $\hat{k}_n$ is made, which is desired to match the optimal dose $k^*$, the lowest safe dose that achieves the highest efficacy (Zang et al., 2014): $k^* = \min\{k : q_k = \max_{l \in \mathcal{K},\, p_l \le \theta} q_l\}$.
• Effective treatment. The cumulative treatment efficacy for trial participants, $\sum_{t=1}^{n} X_t$, is desired to be maximized.
• Minimal violation of the safety constraint. There are different formulations for the safety constraint. One is to minimize $\mathbb{E}\left[\sum_{k \in \mathcal{K},\, p_k > \theta} N_k(n)/n\right]$, where $N_k(t)$ denotes the number of times dose $k$ is allocated to the first $t$ patients. Another formulation is to minimize the probability that the average toxicity exceeds the MTD threshold.
• Small sample size. Most phase I trials have a pre-determined $n$, which is decided as the minimum number of trial participants needed to achieve a pre-defined confidence level of successful recommendation. It is desirable to have a small $n$ for cost and efficiency considerations.

Proposing a learning model that explicitly guarantees all of the above objectives is elusive and non-constructive in developing the dose-allocation policy. We thus formulate dose-finding clinical trials as an online efficacy learning problem with an explicit safety constraint, and subsequently provide performance analysis on the metrics of interest. Specifically, we aim at maximizing the cumulative efficacy over a finite number of patients $n$ while simultaneously guaranteeing that the average toxicity observed from the $n$ dose allocations is kept under the probability threshold $\theta$ with high probability. This can be written as:

$$\text{maximize } \mathbb{E}\left[\sum_{t=1}^{n} X_t\right] \quad \text{subject to } \mathbb{P}\left[\frac{1}{n}\sum_{t=1}^{n} Y_t \le \theta\right] \ge 1 - \delta. \tag{2}$$

Essentially, problem formulation (2) focuses on safe exploration among all the dose levels to maximize cumulative efficacies. Clinical trial designs for (2) thus need to pursue both objectives of toxicity and efficacy.
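To make the two sides of formulation (2) concrete, the sketch below estimates both quantities by Monte Carlo for the simplest possible policy, one that gives every patient the same dose. The function name `evaluate_fixed_dose` and all numerical values are hypothetical; an adaptive design such as SEEDA would replace the fixed dose by a data-dependent choice $I(t)$.

```python
import random

def evaluate_fixed_dose(q, p, theta, n, reps=2000, seed=0):
    """Monte Carlo estimates of E[sum_t X_t] and P[(1/n) sum_t Y_t <= theta]
    when all n patients receive a dose with efficacy q and toxicity p."""
    rng = random.Random(seed)
    total_efficacy = 0
    safe_runs = 0
    for _ in range(reps):
        efficacy = sum(rng.random() < q for _ in range(n))   # sum of X_t
        toxicity = sum(rng.random() < p for _ in range(n))   # sum of Y_t
        total_efficacy += efficacy
        safe_runs += (toxicity / n <= theta)
    return total_efficacy / reps, safe_runs / reps

# Hypothetical doses: same efficacy, one below and one above the threshold.
mean_eff_safe, p_ok_safe = evaluate_fixed_dose(q=0.6, p=0.2, theta=0.35, n=100)
mean_eff_unsafe, p_ok_unsafe = evaluate_fixed_dose(q=0.6, p=0.5, theta=0.35, n=100)
print(mean_eff_safe, p_ok_safe, mean_eff_unsafe, p_ok_unsafe)
```

In this toy setting both doses yield the same objective value, but only the first satisfies the constraint with high probability, which is why the constraint has to be handled during allocation rather than only at recommendation time.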
3. The SEEDA algorithm
The proposed Safe Efficacy Exploration Dose Allocation (SEEDA) design is completely described in Algorithm 1. In particular, $\hat{p}_k(t)$ and $\hat{q}_k(t)$ are the estimated toxicity and efficacy, respectively, after treating the $t$-th patient. The principle of dose selection is to first dynamically construct the admissible set $\mathcal{D}(t)$ using the Upper Confidence Bound (UCB) principle (Auer et al., 2002), where the confidence interval $\alpha(t)$ is constructed as

$$\alpha(t) = \bar{C} K \left(\frac{\log\frac{K}{\delta}}{t}\right)^{\bar{\gamma}}, \tag{3}$$

where $\bar{C}$ and $\bar{\gamma}$ are algorithm parameters (see Section B in the supplementary material for a discussion on how to select them). Note that the admissible set consists of doses that, with high confidence, satisfy the toxicity constraint.
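The construction of the admissible set can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the confidence width is coded as $\bar{C} K (\log(K/\delta)/t)^{\bar{\gamma}}$, which is one reading of Eqn. (3), and the dose labels, `c_bar`, `gamma_bar`, and `a_hat` values are hypothetical.

```python
import math

def toxicity_prob(dose, a):
    """p_k(a) from Eqn. (1)."""
    return ((math.tanh(dose) + 1.0) / 2.0) ** a

def confidence_width(t, K, delta, c_bar, gamma_bar):
    """alpha(t), read here as c_bar * K * (log(K / delta) / t) ** gamma_bar."""
    return c_bar * K * (math.log(K / delta) / t) ** gamma_bar

def admissible_set(doses, a_hat, alpha, theta):
    """Indices k with p_k(a_hat + alpha) <= theta, as in Algorithm 1."""
    return [k for k, d in enumerate(doses)
            if toxicity_prob(d, a_hat + alpha) <= theta]

# Hypothetical values, for illustration only.
doses = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0]
K, delta, theta, a_hat = len(doses), 0.05, 0.35, 1.0

early = admissible_set(doses, a_hat, confidence_width(1, K, delta, 0.05, 0.5), theta)
late = admissible_set(doses, a_hat, confidence_width(1000, K, delta, 0.05, 0.5), theta)
print(early, late)
```

Since $p_k(a)$ decreases in $a$, adding $\alpha(t)$ to the estimate lowers the evaluated toxicity, so the set is more inclusive early in the trial and tightens as $\alpha(t)$ shrinks.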
Then, limiting to those in the admissible set $\mathcal{D}(t)$, the algorithm again applies the UCB principle (UCB-1 from (Auer et al., 2002)) to select the dose with the largest index $F(p, s, n)$ for the efficacy estimate:

$$F(p, s, n) = p + \sqrt{\frac{c \log(n)}{s}}, \tag{4}$$

with $c$ denoting the UCB-1 coefficient. It should be noted that (4) can be replaced by other UCB principles, e.g., KL-UCB (Garivier & Cappé, 2011).

Algorithm 1
The Safe Efficacy Exploration Dose Allocation (SEEDA) Algorithm
Input: $p_k(a)$ for each $k \in \mathcal{K}$; MTD threshold $\theta$; total number of patients $n$.
Initialize: $N_k(1) = 0$, $\hat{p}_k(1) = 0$, $\hat{q}_k(1) = 0$, $\forall k \in \mathcal{K}$.
Sample each dose once and set: $I(t) = t$, $\hat{q}_{I(t)}(K) = X_t$, $\hat{p}_{I(t)}(K) = Y_t$, $N_{I(t)}(K) = 1$, for $t = 1$ to $K$; $t = K + 1$.
while $t \le n$ do
    Compute the estimated parameter: $\hat{a}(t) = \sum_{k=1}^{K} w_k(t-1)\, \hat{a}_k(t-1)$;
    Set the admissible set: $\mathcal{D}(t) = \{d_k \in \mathcal{D} : p_k(\hat{a}(t) + \alpha(t)) \le \theta\}$;
    Select dose: $I(t) = \arg\max_{d_k \in \mathcal{D}(t)} F(\hat{q}_k(t), N_k(t), t)$;
    Observe the revealed outcomes $X_t$ and $Y_t$;
    Update estimations: $\hat{q}_{I(t)}(t) = \frac{\hat{q}_{I(t)}(t-1)\, N_{I(t)}(t-1) + X_t}{N_{I(t)}(t-1) + 1}$, $\hat{p}_{I(t)}(t) = \frac{\hat{p}_{I(t)}(t-1)\, N_{I(t)}(t-1) + Y_t}{N_{I(t)}(t-1) + 1}$, $N_{I(t)}(t) = N_{I(t)}(t-1) + 1$;
    Update parameter estimation: $\hat{a}_{I(t)}(t) = \arg\min_{a \in \mathcal{A}} |p_{I(t)}(a) - \hat{p}_{I(t)}(t)|$;
    Update weights: $w_k(t) = N_k(t)/t$, $\forall d_k \in \mathcal{D}$; $t = t + 1$.
end while
Output: $\hat{d}(n) = \arg\max_{d_k : p_k(\hat{a}(n)) \le \theta} p_k(\hat{a}(n))$.

The SEEDA algorithm is developed with the aim to solve problem (2). It is thus important to analyze (a) whether the cumulative efficacy is maximized, and (b) how often the toxicity constraint is violated. Metric (a) can be equivalently formulated as regret minimization, i.e., minimizing the cumulative efficacy difference between the oracle policy with full information and the learning algorithm. Formally, the efficacy regret is defined as

$$R(n) = q^* n - \mathbb{E}\left[\sum_{t=1}^{n} q_{I(t)}\right], \tag{5}$$

where $q^* = q_{k^*}$ denotes the efficacy associated with the optimal dose defined in Section 2.2, and $a^*$ denotes the true parameter in (1). As for metric (b), we need to evaluate

$$e(n) = \mathbb{P}\left[\frac{1}{n}\sum_{t=1}^{n} p_{I(t)}(a^*) > \theta\right],$$

in conjunction with (5), i.e., whether the proposed SEEDA algorithm minimizes $R(n)$ and satisfies $e(n) \le \delta$ at the same time. In addition, other performance measures such as successful recommendation probability and sample efficiency are of practical interest, and we provide theoretical guarantees for them as well. Due to space limitations, all proofs are provided in the supplementary material.

3.2.1. Cumulative efficacy
We start the theoretical analysis by showing that for each patient $t$ in SEEDA, the dose levels whose toxicities are below the MTD threshold are included in the admissible set with high probability. This corresponds to the type I error event that is of interest in clinical trials.

Lemma 1. $\mathbb{P}[p_k(\hat{a}(t) + \alpha(t)) > \theta] \le \delta$ for all $k$ such that $p_k(a^*) \le \theta$.

Next we prove that with sufficient patients, the dose levels exceeding the toxicity threshold are excluded from the admissible set with high probability. This corresponds to the type II error event in clinical trials.
Lemma 2. If $t > t_0 = \left(\frac{\bar{C} K}{|\Delta - \epsilon|}\right)^{\gamma} \log\frac{K}{\delta}$, with $\Delta = \min_{k \in \mathcal{K}} \Delta_k$, where $\Delta_k = |a^* - p_k^{-1}(\theta)|$ represents the gap between $a^*$ and the parameter value at which the toxicity of dose $k$ equals $\theta$, then:

$$\mathbb{P}[p_k(\hat{a}(t) + \alpha(t)) \le \theta] \le \exp(-t\epsilon) \quad \text{for all } k \text{ such that } p_k(a^*) > \theta. \tag{6}$$

Combining Lemmas 1 and 2 leads to the main result on cumulative efficacy regret.

Theorem 1. With $t_0$ defined in Lemma 2, the regret of SEEDA can be upper bounded as:

$$R(n) \le \sum_{d_k : p_k(a^*) \le \theta} \frac{c \log(n)}{q^* - q_k} + \left(n\delta Q + \frac{1}{2} t_0 + \frac{K - M}{\epsilon}\right), \tag{7}$$

where $Q = \max_{i \in \mathcal{K}} |q_i - q_{k^*}|$ denotes the maximal single-step regret, and $\epsilon > 0$ is a constant. Furthermore, if $\delta = O(1/n)$, we have that $R(n) \le O(\log n)$.

Theorem 1 indicates that the efficacy regret is bounded by $O(\log n)$. A closer look at this scaling reveals that it consists of two parts. The first is due to the structureless model for efficacy: we impose no assumption on the efficacy of different dose levels. The second part, which is reflected through $t_0$, is determined by the structured model for toxicity, which affects the admissible set. As will be shown in Section 4, with the increase-then-plateau efficacy assumption, the first $\log n$ component can be further improved.

3.2.2. Safety constraint violation
We now move on to analyzing the safety constraint violation.The first result is to verify whether the SEEDA algorithmindeed satisfies the safety constraint in problem (2).
Theorem 2. For any given $n$, the average toxicity observed from the SEEDA algorithm satisfies

$$\mathbb{P}\left[\frac{1}{n}\sum_{t=1}^{n} p_{I(t)} - \theta \le C\epsilon^{\gamma}\right] \ge 1 - \delta,$$

for an arbitrary $\epsilon > 0$, where $C$ and $\gamma$ are problem-dependent parameters defined in Section A of the supplementary material.

The safety constraint in problem (2) is formulated based on the average toxicity exceeding the MTD threshold. In practice, we are often interested in minimizing the number of patients that have been exposed to unsafe dose levels, $\mathbb{E}\left[\sum_{k \in \mathcal{K},\, p_k > \theta} N_k(n)/n\right]$. Corollary 1 analyzes this metric.

Corollary 1. The number of unsafe dose allocations from SEEDA, i.e., allocations in which the selected dose level exceeds the MTD threshold, can be bounded as:

$$\mathbb{E}\left[\sum_{d_k : p_k > \theta} N_k(n)\right] \le t_0 + \frac{K - M}{\epsilon}.$$

Interestingly, Corollary 1 indicates that unsafe dose allocations in SEEDA are upper bounded by a constant, which is linear in the number of unsafe doses $K - M$ regardless of the number of trial participants $n$.

3.2.3. Recommendation accuracy
Finally, we analyze the recommendation accuracy of SEEDA at the end of the $n$-th dose allocation.

Corollary 2. The probability that SEEDA recommends the MTD satisfies:

$$\mathbb{P}\left[\hat{d}(n) = \arg\max_{d_k : p_k \le \theta} p_k\right] \ge 1 - \delta, \tag{8}$$

where $\delta = 2K \exp\left(-\left(\frac{\Delta_M}{C K}\right)^{\gamma} n\right)$.

Corollary 2 guarantees the finding of the MTD with high probability. The recommendation error rate decays exponentially with the number of trial participants, which is a desirable property. It is worth noting that a lower bound on the minimal number of trial participants for a given accuracy requirement can be inferred from the upper bound on the recommendation error rate in (8). This is a practically important result, as sample efficiency directly relates to the cost and ethical constraints of a trial. This is further illustrated in the numerical experiments in Section 5.1.3.
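To tie the pieces of Section 3 together, the following is a simplified simulation sketch of the loop in Algorithm 1; it is not the authors' implementation. Several choices are assumptions made here for illustration: the inversion $\arg\min_a |p_k(a) - \hat{p}_k|$ is computed in closed form and clipped to a hypothetical range $[0.1, 5]$, an empty admissible set falls back to the lowest dose, patients are treated one at a time rather than in cohorts, the confidence width uses one reading of Eqn. (3), and all parameter values are made up.

```python
import math
import random

def tox(dose, a):
    """p_k(a) from Eqn. (1)."""
    return ((math.tanh(dose) + 1.0) / 2.0) ** a

def fit_a(dose, p_hat, a_min=0.1, a_max=5.0):
    """arg min over a in [a_min, a_max] of |p_k(a) - p_hat| (assumed range)."""
    if p_hat <= 0.0:
        return a_max  # p_k(a) decreases in a, so zero toxicity maps to a_max
    base = (math.tanh(dose) + 1.0) / 2.0
    return min(max(math.log(p_hat) / math.log(base), a_min), a_max)

def seeda(doses, q_true, a_true, theta, n, delta=0.05, c=0.5,
          c_bar=0.05, gamma_bar=0.5, seed=0):
    rng = random.Random(seed)
    K = len(doses)
    N, q_hat, p_hat, a_hat_k = [0] * K, [0.0] * K, [0.0] * K, [0.0] * K
    alloc = []

    def treat(k):
        # Simulate Bernoulli outcomes and update the running means for dose k.
        x = 1 if rng.random() < q_true[k] else 0
        y = 1 if rng.random() < tox(doses[k], a_true) else 0
        q_hat[k] = (q_hat[k] * N[k] + x) / (N[k] + 1)
        p_hat[k] = (p_hat[k] * N[k] + y) / (N[k] + 1)
        N[k] += 1
        a_hat_k[k] = fit_a(doses[k], p_hat[k])
        alloc.append(k)

    for k in range(K):                      # sample each dose once
        treat(k)
    for t in range(K + 1, n + 1):
        a_hat = sum(N[k] / (t - 1) * a_hat_k[k] for k in range(K))
        alpha = c_bar * K * (math.log(K / delta) / t) ** gamma_bar
        adm = [k for k in range(K) if tox(doses[k], a_hat + alpha) <= theta]
        if not adm:
            adm = [0]                       # assumption: fall back to lowest dose
        treat(max(adm, key=lambda k: q_hat[k] + math.sqrt(c * math.log(t) / N[k])))
    a_hat = sum(N[k] / n * a_hat_k[k] for k in range(K))
    safe = [k for k in range(K) if tox(doses[k], a_hat) <= theta]
    rec = max(safe, key=lambda k: tox(doses[k], a_hat)) if safe else 0
    return rec, alloc

# Hypothetical scenario with a plateau in efficacy.
doses = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0]
q_true = [0.2, 0.4, 0.6, 0.6, 0.6, 0.6]
rec, alloc = seeda(doses, q_true, a_true=1.0, theta=0.35, n=300)
print("recommended dose index:", rec)
```

The returned allocation list can be used to tabulate allocation percentages per dose, and repeating the call over many seeds gives empirical recommendation frequencies.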
4. Extension to the increase-then-plateau efficacy model
Algorithm 2
The SEEDA-Plateau Algorithm
Input: $p_k(a)$ for each $k \in \mathcal{K}$; MTD threshold $\theta$; total number of patients $n$.
Initialize: $N_k(1) = 0$, $\hat{p}_k(1) = 0$, $\hat{q}_k(1) = 0$, $\forall k \in \mathcal{K}$; $L(1) = K$; $\eta = 2$; $l_k = 0$, $\forall k \in \mathcal{K}$.
Sample each dose once and set: $I(t) = t$, $\hat{q}_{I(t)}(K) = X_t$, $\hat{p}_{I(t)}(K) = Y_t$, $N_{I(t)}(K) = 1$, for $t = 1$ to $K$; $t = K + 1$.
while $t \le n$ do
    Compute the estimated parameter: $\hat{a}(t) = \sum_{k=1}^{K} w_k(t-1)\, \hat{a}_k(t-1)$;
    Set the admissible set: $\mathcal{D}(t) = \{d_k \in \mathcal{D} : p_k(\hat{a}(t) + \alpha(t)) \le \theta\}$;
    Set $L(t) = \arg\max_{d_k \in \mathcal{D}(t)} \hat{q}_k(t)$ and increase $l_{L(t)}$ by 1;
    If $\frac{l_{L(t)} - 1}{\eta + 1} \in \mathbb{N}$, set $I(t) = L(t)$; otherwise set $I(t) = \arg\max_{k \in \{L(t)-1,\, L(t),\, L(t)+1\} \cap \mathcal{D}(t)} F(\hat{q}_k(t), N_k(t), t)$;
    Observe the revealed outcomes $X_t$ and $Y_t$;
    Update estimations: $\hat{q}_{I(t)}(t) = \frac{\hat{q}_{I(t)}(t-1)\, N_{I(t)}(t-1) + X_t}{N_{I(t)}(t-1) + 1}$, $\hat{p}_{I(t)}(t) = \frac{\hat{p}_{I(t)}(t-1)\, N_{I(t)}(t-1) + Y_t}{N_{I(t)}(t-1) + 1}$, $N_{I(t)}(t) = N_{I(t)}(t-1) + 1$;
    Update parameter estimation: $\hat{a}_{I(t)}(t) = \arg\min_{a} |p_{I(t)}(a) - \hat{p}_{I(t)}(t)|$;
    Update weights: $w_k(t) = N_k(t)/t$, $\forall d_k \in \mathcal{D}$; $t = t + 1$.
end while
Estimate the turning point of efficacy as:

$$L_1(n) = \min_{k : d_k \in \mathcal{D}(n)} \left\{ m \ge k : |\hat{q}_m(n) - \hat{q}_{m+1}(n)| \le \sqrt{\frac{c \log(n)}{N_m(n)}} + \sqrt{\frac{c \log(n)}{N_{m+1}(n)}},\ \hat{q}_m(n) \le \hat{q}_{m+1}(n) \right\},$$

$$L_2(n) = \arg\max_{d_k : p_k(\hat{a}(n)) \le \theta} p_k(\hat{a}(n)).$$

Output: $\hat{d}(n) = \min\{L_1(n), L_2(n)\}$.

The proposed SEEDA dose allocation policy is general in the sense that no efficacy model is assumed. In practice, however, efficacy often exhibits certain structure which, if utilized correctly, may further improve the performance. For conventional cytotoxic agents, efficacy monotonically increases with dose levels. The same is not true for MTAs, for which the dose-efficacy curve increases initially and then plateaus after reaching the level of saturation (Zang et al., 2014; Riviere et al., 2018). In this section, we modify the SEEDA algorithm to handle the increase-then-plateau efficacy model, and analyze its performance.

Formally, we introduce the following increase-then-plateau efficacy assumption, which holds for MTA.

Assumption 1. $q_k$, $k \in \mathcal{K}$, satisfies $q_1 \le q_2 \le q_3 \le \cdots \le q_N = q_{N+1} = \cdots = q_K$.

The
SEEDA-Plateau algorithm is given in Algorithm 2. With Assumption 1, the efficacy has an inherent non-decreasing structure. The key idea is to combine the selection rule of OSUB in (Combes & Proutière, 2014) with a reformed step 4 of Algorithm 1. Note that step 4 calculates $L(t)$ as the estimated dose level with the optimal efficacy and safe toxicity at $t$. Algorithm 2 not only selects this dose level frequently enough, but also keeps exploring its neighboring dose levels.

We now analyze the regret of SEEDA-Plateau and present the result in Theorem 3. Compared to Theorem 1 for SEEDA without the increase-then-plateau efficacy model, one can see that the first $\log(n)$ coefficient improves from $c \sum_{d_k : p_k(a^*) \le \theta} (q^* - q_k)^{-1}$ to $c\,(q^* - q_{N-1})^{-1}$. This gain comes precisely from the increase-then-plateau efficacy model: by exploiting the unimodal structure, the algorithm incurs $\log(n)$ regret only from the neighboring arm.

Theorem 3. The regret of SEEDA-Plateau satisfies:

$$R(n) \le \frac{c \log(n)}{q^* - q_{N-1}} + O(\log\log(n)) + \left(n\delta Q + t_0 + \frac{K - M}{\epsilon}\right). \tag{9}$$

Furthermore, if $\delta = O(1/n)$, we have that $R(n) \le O(\log n)$.

The optimal dose level defined before can be rewritten as $k^* = \min\{M, N\}$. The recommendation accuracy of SEEDA-Plateau is given in Theorem 4.

Theorem 4. With c set as < c < , the probability that SEEDA-Plateau fails to recommend the optimal dose can be bounded as: P[ $\hat{d}(n) \neq k^*$ ] ≤ n c + δ. (10)

Compared to Corollary 2, the error probability of SEEDA-Plateau is increased by n c. This is due to the ambiguity between the efficacy-optimal dose and the toxicity-optimal one, which leads to the two candidate doses $L_1(n)$ and $L_2(n)$. In practice, however, this ambiguity can be eliminated via preliminary experiments.
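The selection step that distinguishes Algorithm 2 from Algorithm 1 can be sketched as follows. This is an illustrative reading rather than the authors' code: the test for playing the leader is taken to be that $(l_{L(t)} - 1)/(\eta + 1)$ is a non-negative integer (with zero included), following the OSUB rule, and the function name `plateau_select`, the coefficient `c`, and the toy estimates are hypothetical.

```python
import math

def plateau_select(q_hat, counts, admissible, leader_counts, t, c=0.5, eta=2):
    """One OSUB-style selection step; returns the dose index to allocate.

    leader_counts[k] plays the role of l_k and is updated in place.
    """
    leader = max(admissible, key=lambda k: q_hat[k])     # L(t)
    leader_counts[leader] += 1
    # Assumed reading: play the leader every (eta + 1)-th time it leads.
    if (leader_counts[leader] - 1) % (eta + 1) == 0:
        return leader
    # Otherwise apply the UCB index F to the leader and its admissible neighbors.
    neighbors = [k for k in (leader - 1, leader, leader + 1) if k in admissible]
    return max(neighbors,
               key=lambda k: q_hat[k] + math.sqrt(c * math.log(t) / counts[k]))

# Toy state: dose 1 has the best estimate, dose 0 is barely explored.
q_hat = [0.2, 0.5, 0.4, 0.1]
counts = [1, 20, 5, 5]
leader_counts = [0, 0, 0, 0]
choices = [plateau_select(q_hat, counts, [0, 1, 2], leader_counts, t=20)
           for _ in range(4)]
print(choices)
```

With the estimates held fixed, the leader is selected on the first and fourth calls, and the under-explored neighbor is selected in between, which reflects the balance between exploiting $L(t)$ and exploring its neighbors described above.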
5. Experiments
To investigate the operational characteristics and evaluate the performance of the proposed adaptive designs, we present an experimental study with $K = 6$ dose levels and $n = 300$ trial cohorts, each consisting of 3 patients. The estimation is updated after observing all individual outcomes from a cohort. All experiment results are obtained with 1000 trial repetitions. The MTD threshold is set as θ = 0 .

The trial setup is the same as (Riviere et al., 2018) and (Zang et al., 2014), and we have simulated eight different efficacy and toxicity scenarios. (We remark that although no real-world trial data is utilized in this experiment, this approach is commonly accepted in clinical trials as the first-step study for a new methodology; see (Whitehead et al., 2012; Yap et al., 2013; Zang et al., 2014; Riviere et al., 2018).) Due to the space limitation, we only report the results of the first scenario, where efficacy reaches its maximal value (the optimal dose) before toxicity hits the MTD threshold. Additional results for this setting as well as the other seven scenarios are reported in Sections K to M in the supplementary material.

The following baseline designs are used for comparison (whenever appropriate), whose details can be found in the supplementary material: 3+3, CRM, MCRM, Independent TS, KL-UCB, UCB-1, and multi-objective bandits. Note that MTA-RA and other TS variants in (Riviere et al., 2018) are not included because they assume a different truncated efficacy model, which needs to be perfectly known to the algorithm. For algorithms that require prior information of toxicity and efficacy, they are set as [0 . , . , . , . , . , . and [0 . , . , . , . , . , . , respectively.

5.1.1. Recommendation and allocation accuracy

We report the allocation and recommendation percentages of each dose for all considered designs in Table 2. Dose 3 (in bold font) is the optimal biological dose for this scenario. However, we comment that dose 4 also satisfies the optimality condition without violating the safety constraint. Nevertheless, it has a higher toxicity probability (although still below the MTD threshold) without increasing efficacy, and is thus less preferable to dose 3. We note that for all the considered designs, the recommendation rule is $\hat{d}(n) = \arg\max_{k : \hat{p}_k(n) \le \theta} \hat{q}_k(n)$, where $\hat{q}_k(n)$ and $\hat{p}_k(n)$ are the final estimates of efficacy and toxicity for dose level $d_k$, respectively. This means that the safety constraint is taken into account in the recommendation.

We can see from the results that SEEDA almost equally recommends dose 3 and 4 with a total probability of 94.6%.
Figure 1: Type I (left) and type II (right) error rates as a function of number of cohorts. Both panels plot the error rate (%) against the number of cohorts for SEEDA, SEEDA-Plateau, Independent TS, KL-UCB, UCB, and multi-objective bandits.

This is because the algorithm cares about maximizing efficacy without violating the safety constraint, and both dose 3 and 4 satisfy such conditions. As a result, SEEDA treats both equally as the optimal solution. However, by leveraging the increase-then-plateau model assumption, SEEDA-Plateau can further break the "tie" between dose 3 and 4, and correctly recognize that dose 3 is the optimal biological dose: it chooses dose 3 86.6% of the time and dose 4 only 10.4% of the time. We see that the gain of SEEDA-Plateau is significant over all the other designs (even compared to SEEDA). For a more detailed understanding of the recommendation accuracy, the corresponding type I and type II error rates (definitions are given in Section J in the supplementary material) are plotted in Fig. 1, and we observe that both SEEDA and SEEDA-Plateau outperform other baseline methods over the range of cohorts.

As for allocation, we observe that both SEEDA and SEEDA-Plateau concentrate at dose 3 and 4, while spending very little budget on both tail ends of the dosage. In particular, SEEDA-Plateau allocates the smallest percentage (1%) of patients to the most toxic dose 6 among all designs.

5.1.2. Convergence and safety violation
Figure 2: Comparison of efficacy per patient (left) and the safety violation percentage (right). Both panels are plotted against the number of cohorts for SEEDA, SEEDA-Plateau, Independent TS, KL-UCB, UCB, 3+3, CRM, MCRM, and multi-objective bandits.

To have a deeper understanding of the tradeoff between efficacy and toxicity, we plot side-by-side the convergence of efficacy and toxicity as $t$ increases in Fig. 2. KL-UCB, UCB and Independent TS have good convergence but suffer from significant safety violation in the process since they do not consider the safety constraint during exploration. CRM has higher efficacy at the cost of bad safety constraint violation, while 3+3 performs poorly in efficacy but has the lowest safety violation probability; this behavior is similarly observed for multi-objective bandits. The SEEDA(-Plateau) algorithm, in comparison, converges to the optimal efficacy at a slower rate, but the exploration process is carefully controlled so that the safety violation is minimized, which is evident from the right subplot of Fig. 2.

Table 2: Recommendation & allocation percentages of different designs (column groups: Recommended and Allocated). The optimal biological dose is in bold. Toxicity probabilities: 0.01, 0.05

5.1.3. Sample efficiency
Sample efficiency is measured by the minimum number of trial participants to achieve a pre-specified recommendation accuracy (also known as early stopping (Montori et al., 2005)). We start the trial with a minimum of 6 patients, and continue recruiting patients until the stopping condition is triggered. Fig. 3 plots the average minimum number of patients to achieve a given recommendation accuracy for different algorithms. We see that SEEDA-Plateau outperforms all other algorithms by a large margin, thanks to the "double dipping" of the model assumptions which gives the most accurate estimation of the optimal dose. In comparison, SEEDA performs similarly to the baseline algorithms. The reason is that the goal of SEEDA is to recommend the efficacy-maximal dose that satisfies the safety constraint. In this particular setting, both dose 3 and 4 satisfy this condition, and SEEDA does not have the mechanism to further minimize toxicity. This leads to a recommendation error that is similar to other baseline designs.

Figure 3: The minimum number of trial participants to achieve a given recommendation accuracy. The plot shows the number of patients against the recommendation accuracy for SEEDA, SEEDA-Plateau, Independent TS, KL-UCB, UCB, and multi-objective bandits.

The sample efficiency advantage of SEEDA-Plateau is of critical importance in practice, as the significant cost associated with clinical trials is mostly proportional to the number of trial participants. Furthermore, reducing the number of patients while achieving the same level of accuracy minimizes the safety and ethical concern in the trial, which is another important consideration.
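As a back-of-the-envelope companion to the empirical sample sizes above, the exponential decay stated in Corollary 2 can be inverted to obtain a number of participants that suffices for a target error. The sketch below assumes the error bound has the generic form $2K \exp(-r n)$, where the rate `rate` stands in for the problem-dependent constant in Corollary 2 and is a hypothetical input here; this is not the stopping rule used in the experiments.

```python
import math

def min_patients(K, rate, target_error):
    """Smallest integer n such that 2 * K * exp(-rate * n) <= target_error."""
    if rate <= 0 or not 0 < target_error < 1:
        raise ValueError("rate must be positive and target_error in (0, 1)")
    return max(0, math.ceil(math.log(2 * K / target_error) / rate))

# Hypothetical rate; K = 6 doses as in the synthetic experiment.
n_needed = min_patients(K=6, rate=0.05, target_error=0.1)
print(n_needed)
```

The required $n$ grows only logarithmically as the target error shrinks, while it scales inversely with the rate, so designs that sharpen the rate (for example by exploiting structure) reduce the sample size most.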
We evaluate the SEEDA algorithms on two real-world datasets, neurodeg and IBScovars, based on (Biesheuvel & Hothorn, 2002). We first extract the dose and resp variables from the observations reported in each dataset. With these samples, we fit a commonly used Emax dose-response model as in (Bornkamp et al., 2011) in R, which gives
neurodeg: resp = 169.94 + 12. dose / ( .85 + dose),
IBScovars: resp = 0.26 + 0. dose / ( .01 + dose).
As for the toxicity event, since it is not reported in the dataset, we resort to simulations with model (1).

Table 3: Recommendation & allocation percentages of the neurodeg dataset (column groups: Recommended and Allocated). In each cell the first row reports the mean value over 1000 repetitions, and the second row reports the (standard deviation). Toxicity: 0.01, 0.08

The allocation and recommendation percentages of each dose for all the algorithms are shown in Table 3 and Table 4 for both datasets. We have similar observations as in the synthetic experiment: SEEDA and SEEDA-Plateau recommend the correct doses the majority of the time, while the suboptimal recommendation is mostly safe in that the doses immediately below MTD are recommended second most. The same is true for allocation.
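For readers unfamiliar with the Emax family, the sketch below shows its general form, $\text{resp} = E_0 + E_{\max} \cdot \text{dose} / (ED_{50} + \text{dose})$, which matches the shape of the two fitted curves above. The parameter values and dose grid are hypothetical and are not the fitted values from the datasets; how responses are mapped to efficacy probabilities is not specified here and is left out.

```python
def emax_response(dose, e0, emax, ed50):
    """Emax dose-response curve: e0 + emax * dose / (ed50 + dose)."""
    return e0 + emax * dose / (ed50 + dose)

# Hypothetical parameters and dose grid, for illustration only.
e0, emax, ed50 = 1.0, 2.0, 0.5
grid = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
curve = [emax_response(d, e0, emax, ed50) for d in grid]
print([round(r, 3) for r in curve])
```

The curve starts at $E_0$, reaches half of its maximal gain at $ED_{50}$, and flattens toward $E_0 + E_{\max}$, which is the increase-then-plateau behavior that SEEDA-Plateau is designed to exploit.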
6. Related works
This work is concerned with adaptive phase I clinical trials, whose uptake in practice is starting to increase considerably. See (Bretz et al., 2017; Pallmann et al., 2018) for recent comprehensive surveys. The main motivation to use these adaptive designs is to learn as the trial progresses and use this learning to deliver more efficient or more ethically appealing trials. Adaptive clinical trials with sequential patient recruitment are considered in (Atan et al., 2019), but that work does not address the subsequent dose allocation. The 3+3 and the CRM designs or their variations remain the de facto adaptive designs in practice for dose-finding studies (Petroni et al., 2017; Pallmann et al., 2018), although new methodologies that aim at better safety protection have also been proposed (Lee et al., 2017). In recent years, there has been a growing interest in adaptive trial designs for MTA because of their different dose-response relationships (Zang et al., 2014; Riviere et al., 2018), but these studies do not explicitly enforce the safety constraints during the trial; neither do they provide theoretical guarantees on the trial performance.

Multi-armed bandits have long been considered an important tool for learning in clinical trials, dating back to the earliest papers of (Thompson, 1933; Robbins, 1952). Developing bandit models and algorithms that better suit the specific requirements of adaptive clinical trials has attracted some attention in recent years. Villar et al. (Villar et al., 2015b; Villar & Rosenberger, 2018) adopted the (modified) forward-looking Gittins index rule for multi-arm clinical trials. The authors of (Wang et al., 2018) propose a regional bandit model that can be applied to learning the relationship between drug dosage and patient response. The sample complexity of thresholding bandits is analyzed in (Garivier et al., 2017), which matches MTD identification.
Furthermore,dose-finding clinical trials with heterogeneous groups areinvestigated in (Lee et al., 2020) from a MAB perspective.Probably the closest work to ours is (Aziz et al., 2019),which also considers both toxicity and efficacy. However, earning for Dose Allocation in Adaptive Clinical Trials with Safety Constraints
Table 4: Recommendation & allocation percentages of the IBScovars datasets. In each cell the first row reports the meanvalue over 1000 repetitions, and the second row reports the (standard deviation).
Recommended AllocatedToxicity probabilities 0.01 0.10 the safety constraint, which is an essential constraint of real-world phase I trials, has not been explicitly considered inthese papers.On the other hand, the problem of safe exploration hasattracted a lot of attention recently, albeit often in control(Koller et al., 2018) and general reinforcement learning(Berkenkamp et al., 2017). The authors in (Sui et al., 2015)propose the SAFEOPT algorithm for safe exploration inGaussian processes, and (Kazerouni et al., 2017) presentsa variant of linear UCB method for the contextual linearbandit problem. A different line of works (Maillard, 2013;Galichet et al., 2013) consider minimizing risk in MAB, butthey are mostly casted in the mean-variance framework withrespect to the reward distribution.
7. Conclusions
Learning in adaptive clinical trials faces several unique challenges that have not been well addressed, which may have contributed to their lack of adoption in actual clinical trials. In particular, the safety constraints resulting from ethical and societal considerations have been insufficiently researched, which has motivated us to develop the SEEDA algorithm that explicitly imposes safety constraints (in terms of toxicity) while also aiming for maximum patient response in a dose-finding study. Theoretical analysis of SEEDA is carried out and the proposed algorithm is further extended to the increase-then-plateau efficacy model and shown to have smaller regret thanks to the unimodal structure. The performance advantages over state-of-the-art adaptive clinical trial designs are illustrated with experiments on both synthetic and real-world datasets.
8. Acknowledgements
CS acknowledges the funding support from Kneron, Inc. SSV thanks the funding received from the National Institute for Health Research Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust and the UK Medical Research Council (grant number: MC_UU_00002/3). The research of MV has been supported by ONR and NSF 1524417 and 1722516.
References
Atan, O., Zame, W. R., and van der Schaar, M. Sequential patient recruitment and allocation for adaptive clinical trials. In Proceedings of The 22nd International Conference on Artificial Intelligence and Statistics, pp. 1891–1900, Apr 2019.

Auer, P., Cesa-Bianchi, N., and Fischer, P. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2):235–256, May 2002.

Aziz, M., Kaufmann, E., and Riviere, M.-K. On multi-armed bandit designs for phase I clinical trials. arXiv e-prints, art. arXiv:1903.07082, March 2019.

Berkenkamp, F., Turchetta, M., Schoellig, A. P., and Krause, A. Safe model-based reinforcement learning with stability guarantees. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 908–919, Long Beach, California, USA, December 2017.

Biesheuvel, E. and Hothorn, L. Many-to-one comparisons in stratified designs. Biometrical Journal, 44:101–116, 2002.

Bornkamp, B., Bretz, F., Dette, H., and Pinheiro, J. C. Response-adaptive dose-finding under model uncertainty. Annals of Applied Statistics, 5:1611–1631, 2011.

Bretz, F., Gallo, P., and Maurer, W. Adaptive designs: The swiss army knife among clinical trial designs? Clinical Trials, 14(5):417–424, 2017.

Combes, R. and Proutière, A. Unimodal bandits: Regret lower bounds and optimal algorithms. In Proceedings of the 31st International Conference on Machine Learning, pp. 521–529, Beijing, China, June 2014.

Galichet, N., Sebag, M., and Teytaud, O. Exploration vs exploitation vs safety: risk-aware multi-armed bandits. In Proceedings of the 5th Asian Conference on Machine Learning, pp. 245–260, November 2013.

Garivier, A. and Cappè, O. The KL-UCB algorithm for bounded stochastic bandits and beyond. In Proceedings of Conference On Learning Theory (COLT), 2011.

Garivier, A., Ménard, P., and Rossi, L. Thresholding bandit for dose-ranging: The impact of monotonicity. arXiv e-prints, art. arXiv:1711.04454, November 2017.

Kazerouni, A., Ghavamzadeh, M., Abbasi, Y., and Van Roy, B. Conservative contextual linear bandits. In Proceedings of Advances in Neural Information Processing Systems, pp. 3910–3919, 2017.

Koller, T., Berkenkamp, F., Turchetta, M., and Krause, A. Learning-based model predictive control for safe exploration. In IEEE Conference on Decision and Control (CDC), pp. 6059–6066, December 2018.

Lee, H.-S., Shen, C., Jordon, J., and van der Schaar, M. Contextual constrained learning for dose-finding clinical trials. In Proceedings of The 23rd International Conference on Artificial Intelligence and Statistics, Aug. 2020.

Lee, S. M., Ursino, M., Cheung, Y. K., and Zohar, S. Dose-finding designs for cumulative toxicities using multiple constraints. Biostatistics, 20(1):17–29, Nov. 2017.

Maillard, O.-A. Robust risk-averse stochastic multi-armed bandits. In Proceedings of the 24th International Conference on Algorithmic Learning Theory, pp. 218–233, Singapore, 2013.

Montori, V. M. et al. Randomized trials stopped early for benefit: A systematic review. JAMA, 294(17):2203–2209, Nov. 2005.

Neuenschwander, B., Branson, M., and Gsponer, T. Critical aspects of the Bayesian approach to phase I cancer trials. Statistics in Medicine, 27(13):2420–2439, 2008.

O’Quigley, J., Pepe, M., and Fisher, L. Continual reassessment method: a practical design for phase 1 clinical trials in cancer. Biometrics, 43(1):33–48, 1990.

Pallmann, P. et al. Adaptive designs in clinical trials: why use them, and how to run and report them. BMC Medicine, 16(1):29, Feb 2018.

Paoletti, X. and Postel-Vinay, S. Phase I–II trial designs: how early should efficacy guide the dose recommendation process? Annals of Oncology, 29(3):540–541, Feb. 2018.

Petroni, G. R., Wages, N. A., Paux, G., and Dubois, F. Implementation of adaptive methods in early-phase clinical trials. Statistics in Medicine, 36(2):215–224, 2017.

Postel-Vinay, S. et al. Clinical benefit in phase-I trials of novel molecularly targeted agents: does dose matter? British Journal of Cancer, 100(9):1373–1378, May 2009.

Riviere, M.-K., Yuan, Y., Jourdan, J.-H., Dubois, F., and Zohar, S. Phase I/II dose-finding design for molecularly targeted agent: Plateau determination using adaptive randomization. Statistical Methods in Medical Research, 27(2):466–479, 2018.

Robbins, H. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc., 58:527–535, 1952.

Roberts, T. G. et al. Trends in the risks and benefits to patients with cancer participating in phase 1 clinical trials. JAMA, 292(17):2130–2140, Nov. 2004.

Storer, B. E. Design and analysis of phase I clinical trials. Biometrics, 45:925–37, 1989.

Sui, Y., Gotovos, A., Burdick, J. W., and Krause, A. Safe exploration for optimization with Gaussian processes. In Proceedings of the 32nd International Conference on Machine Learning, pp. 997–1005, 2015.

Thiessen, B. et al. A phase I/II trial of GW572016 (lapatinib) in recurrent glioblastoma multiforme: clinical outcomes, pharmacokinetics and molecular correlation. Cancer Chemotherapy and Pharmacology, 65(2):353–361, Jan 2010.

Thompson, W. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3-4):285–294, December 1933.

Tighiouart, M., Liu, Y., and Rogatko, A. Escalation with overdose control using time to toxicity for cancer phase I clinical trials. PLOS ONE, 9:1–13, 03 2014.

Villar, S. S. and Rosenberger, W. F. Covariate-adjusted response-adaptive randomization for multi-arm clinical trials using a modified forward looking Gittins index rule. Biometrics, 74(1):49–57, 2018.

Villar, S. S., Bowden, J., and Wason, J. Multi-armed bandit models for the optimal design of clinical trials: Benefits and challenges. Statistical Science, 30(2):199–215, May 2015a.

Villar, S. S., Wason, J., and Bowden, J. Response-adaptive randomization for multi-arm clinical trials using the forward looking Gittins index rule. Biometrics, 71(4):969–978, 2015b.

Wang, Z., Zhou, R., and Shen, C. Regional multi-armed bandits. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 510–518, Playa Blanca, Lanzarote, Canary Islands, Apr. 2018.

Whitehead, J. et al. A novel phase I/IIa design for early phase oncology studies and its application in the evaluation of MK-0752 in pancreatic cancer. Statistics in Medicine, 31(18):1931–1943, 2012.

Yahyaa, S. and Manderick, B. Thompson sampling for multi-objective multi-armed bandits problem. In Proceedings of European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pp. 47–52, Bruges, Belgium, April 2015.

Yan, F., Thall, P. F., Lu, K. H., Gilbert, M. R., and Yuan, Y. Phase I–II clinical trial design: a state-of-the-art paradigm for dose finding. Annals of Oncology, 29(3):694–699, Dec. 2017.

Yap, C. et al. Implementation of adaptive dose-finding designs in two early phase haematological trials: clinical, operational, and methodological challenges. Trials, 14(1):O75, Nov 2013.

Yoshida, K. Emax Model Analysis with ’Stan’. Columbia University, New York, USA, 2019. URL https://cran.r-project.org/web/packages/rstanemax.

Zang, Y., Lee, J. J., and Yuan, Y. Adaptive designs for identifying optimal biological dose for molecularly targeted agents. Clinical Trials, 11(3):319–327, 2014.
Supplementary Material: Learning for Dose Allocation in Adaptive Clinical Trials with Safety Constraints
Cong Shen, Zhiyang Wang, Sofía S. Villar, Mihaela van der Schaar
A. Preliminaries
Before presenting the technical proofs, we introduce some notation and regularity assumptions on the dose-toxicity model, which can be verified to hold for Eqn. (1). For a general toxicity function $p_k(a)$ of an unknown parameter $a \in \mathcal{A}$, the following regularities are imposed:

Assumption 2
1) Monotonicity: For each $k \in \mathcal{K}$ and $a, a' \in \mathcal{A}$ there exist $C_{1,k} > 0$ and $1 < \gamma_{1,k}$ such that $|p_k(a) - p_k(a')| \ge C_{1,k} |a - a'|^{\gamma_{1,k}}$.
2) Hölder continuity: For each $k \in \mathcal{K}$ and $a, a' \in \mathcal{A}$ there exist $C_{2,k} > 0$ and $0 < \gamma_{2,k} \le 1$ such that $|p_k(a) - p_k(a')| \le C_{2,k} |a - a'|^{\gamma_{2,k}}$.

We note that both the monotonicity and continuity assumptions are mild and standard in the literature; see (Wang et al., 2018). Proposition 1 follows immediately from Assumption 2.

Proposition 1 For functions $p_k(a)$, $\forall k \in \mathcal{K}$, that satisfy Assumption 2, we have:
1) $p_k(a)$ is invertible;
2) For each $k \in \mathcal{K}$ and $d, d' \in \mathcal{P}$, we have $|p_k^{-1}(d) - p_k^{-1}(d')| \le \bar{C}_{1,k} |d - d'|^{\bar{\gamma}_{1,k}}$, where $\bar{\gamma}_{1,k} = 1/\gamma_{1,k}$ and $\bar{C}_{1,k} = (1/C_{1,k})^{1/\gamma_{1,k}}$.

For ease of exposition, we denote $C_1 = \min_k C_{1,k}$, $C_2 = \max_k C_{2,k}$, $\gamma_1 = \max_k \gamma_{1,k}$, $\gamma_2 = \min_k \gamma_{2,k}$, $\bar{\gamma}_1 = 1/\gamma_1$, and $\bar{C}_1 = C_1^{-\bar{\gamma}_1}$.

B. Select Design Parameters
The parameters appearing in Assumption 2 collectively determine the confidence interval in Eqn. (3). We take function (1) as an example to show how to select these parameters. We have

$$|p_k(a) - p_k(a')| \ge C_{1,k} |a - a'|^{\gamma_{1,k}},$$
$$\frac{|p_k(a) - p_k(a')|}{|a - a'|} \ge C_{1,k} |a - a'|^{\gamma_{1,k} - 1},$$
$$\min_{a \in \mathcal{A}} p_k'(a) \ge C_{1,k} |\mathcal{A}|^{\gamma_{1,k} - 1},$$
$$\log\left(\frac{\tanh(d_k) + 1}{2}\right) \ge C_{1,k} |\mathcal{A}|^{\gamma_{1,k} - 1}.$$

Therefore, we can first set $\gamma_{1,k}$ as 1 and find the corresponding $C_{1,k}$. Then, with the known function $p_k(a)$, the parameters can be approximately calculated.

C. Proof of Lemma 1

$$P[\hat{a}(t) + \alpha(t) < p_i^{-1}(\theta)] \le P[\hat{a}(t) + \alpha(t) < a^*] \le P[|a^* - \hat{a}(t)| > \alpha(t)]$$
$$\le P\left[\sum_{k=1}^{K} w_k(t-1)\, \bar{C}_1 |\hat{p}_k(t) - p_k(a^*)|^{\bar{\gamma}_1} > \alpha(t)\right]$$
$$\le \sum_{k=1}^{K} P\left[|\hat{p}_k(t) - p_k(a^*)| > \left(\frac{\alpha(t)}{w_k(t-1)\, \bar{C}_1 K}\right)^{\gamma_1}\right]$$
$$\le \sum_{k=1}^{K} 2\exp\left(-2 N_k(t) \left(\frac{\alpha(t)}{w_k(t)\, \bar{C}_1 K}\right)^{2\gamma_1}\right) \quad (11)$$
$$\le 2K \exp\left(-2 \left(\frac{\alpha(t)}{\bar{C}_1 K}\right)^{2\gamma_1} t\right) = \delta. \quad (12)$$

Inequality (11) follows from Hoeffding's inequality, and (12) is derived from the definition $N_k(t) = t\, w_k(t)$ and Assumption 2 with $\gamma_1 > 1$.

D. Proof of Lemma 2
From Hoeffding's inequality and Eqn. (6), we have:

$$\alpha(t) \le p_k^{-1}(\theta) - a^* - \epsilon = \Delta_k - \epsilon,$$

where $\Delta_k = |a^* - p_k^{-1}(\theta)|$ denotes the gap between the true value of the parameter $a$ and the parameter value at which the toxicity of dose level $d_k$ is exactly at the MTD threshold $\theta$. When $t > t_0$, and with the definition of $\alpha(t)$ in Eqn. (3), the lemma can be immediately derived.

E. Proof of Theorem 1

Depending on whether the optimal dose level is included in the admissible set or not, we can decompose the regret into two parts:

$$R(n) = \sum_{t=1}^{n} P[k^* \notin \mathcal{D}(t)]\, Q + P[k^* \in \mathcal{D}(t)]\, \tilde{R}(n) \le n \delta Q + \tilde{R}(n).$$

The probability of the first error event $\{k^* \notin \mathcal{D}(t)\}$ can be bounded by Lemma 1, which indicates that at each step $t$ the probability of a safe dose level being excluded from the admissible set is bounded by $\delta$. For the second part, $\tilde{R}(n)$ represents the regret when the optimal dose is included in the admissible set. In this case, the error event is due to the inaccuracy of parameter estimation at the beginning as well as the limited efficacy information provided by each sample. Using Lemma 2, we have:

$$\tilde{R}(n) \le t_0 + (K - M) \sum_{t=1}^{n} \exp(-2 t \epsilon^2) + \sum_{t = t_0 + 1}^{n} \sum_{d_k : p_k \le \theta} \mathbb{1}\{I(t) = k\} \le t_0 + \frac{K - M}{2\epsilon^2} + \sum_{d_k : p_k \le \theta} \frac{c \log(n)}{q^* - q_k}.$$

Putting the regret from both error events together leads to (7), which completes the proof.
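As a companion to bound (12) in the proof of Lemma 1, the following minimal Python sketch inverts that bound to obtain the confidence radius $\alpha(t)$ for a target failure probability $\delta$. It assumes the standard two-sided Hoeffding form $\delta = 2K \exp(-2 (\alpha / (\bar{C}_1 K))^{2\gamma_1} t)$; the function and argument names (`confidence_radius`, `c_bar`, `gamma`) are hypothetical and chosen for illustration only, and the sketch is not the authors' implementation.

```python
import math


def confidence_radius(t: int, delta: float, num_doses: int,
                      c_bar: float, gamma: float) -> float:
    """Solve delta = 2K * exp(-2 * (alpha / (c_bar * K))**(2 * gamma) * t) for alpha.

    Assumption: the two-sided Hoeffding form of bound (12); c_bar and gamma
    stand for the constants written as C-bar_1 and gamma_1 in the text.
    """
    if t < 1:
        raise ValueError("t must be a positive number of samples")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    log_term = math.log(2.0 * num_doses / delta)
    return c_bar * num_doses * (log_term / (2.0 * t)) ** (1.0 / (2.0 * gamma))


def failure_probability(alpha: float, t: int, num_doses: int,
                        c_bar: float, gamma: float) -> float:
    """Evaluate the right-hand side of the assumed bound (12) for a given alpha."""
    scaled = (alpha / (c_bar * num_doses)) ** (2.0 * gamma)
    return 2.0 * num_doses * math.exp(-2.0 * scaled * t)
```

Plugging the radius back into `failure_probability` recovers $\delta$, and the radius shrinks as $t$ grows, which is the behavior the regret decomposition above relies on.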
F. Proof of Theorem 2
First we note:

$$p_{I(t)}(a^*) - \theta \le p_{I(t)}(a^*) - \theta + \theta - p_{I(t)}(a^* - \alpha(t)) \le C_2 |a^* - \hat{a}(t) + \alpha(t)|^{\gamma_2}.$$

Thus, the probability can be upper bounded as:

$$P[\hat{a}(t) - a^* > \alpha(t) + \epsilon] \le \exp(-2t(\alpha(t) + \epsilon)^2).$$

Reorganizing the terms, we finally have

$$P\left[\frac{1}{n} \sum_{t=1}^{n} p_{I(t)}(a^*) - \theta < C_2 \epsilon^{\gamma_2}\right] \ge 1 - \exp(-2t(\alpha(t) + \epsilon)^2) \ge 1 - \delta.$$

G. Proof of Corollary 2

$$P[|\hat{a}(n) - a^*| \ge \Delta_M] \le \sum_{k=1}^{K} P\left[|\hat{p}_k(t) - p_k(a^*)| > \left(\frac{\Delta_M}{w_k(t)\, \bar{C}_1 K}\right)^{\gamma_1}\right]$$
$$\le \sum_{k=1}^{K} 2\exp\left(-2 N_k(n) \left(\frac{\Delta_M}{w_k(t)\, \bar{C}_1 K}\right)^{2\gamma_1}\right)$$
$$\le 2K \exp\left(-2 \left(\frac{\Delta_M}{\bar{C}_1 K}\right)^{2\gamma_1} n\right).$$

H. Proof of Theorem 3
We first establish Lemma 3, whose proof directly follows Theorem C.1 in (Combes & Proutière, 2014).

Lemma 3 $E[l_k(n)] = O(\log(\log(n)))$ for each $k \ne k^*$.

Then, following proof steps similar to those of Theorem 1, we obtain the bound in (9).
I. Proof of Theorem 4
Since $k^* = \min\{M, N\}$ and $L_1(n)$ and $L_2(n)$ are the estimates of $N$ and $M$ respectively, $\{\hat{d}_r(n) \ne k^*\} \subseteq E_1 \cup E_2$, where $E_1 = \{L_1(n) \ne N\}$ and $E_2 = \{L_2(n) \ne M\}$. The latter can be bounded by Corollary 2. With the notation $\beta_k(n) = \sqrt{\frac{c \log(n)}{N_k(n)}}$, the probability of $E_1$ can be bounded as follows:

$$P[L_1(n) < M] \le P[|\hat{q}_N(n) - \hat{q}_{N-1}(n)| \le \beta_{N-1}(n) + \beta_N(n)]$$
$$\le P[\hat{q}_{N-1}(n) - q_k + q_N - \hat{q}_N(n) \le q_N - q_{N-1} - \beta_{N-1}(n) - \beta_N(n)]$$
$$\le 2\exp\left(-2 N_{N-1}(n) \left(\frac{q_N - q_{N-1} - \beta_{N-1}(n) - \beta_N(n)}{2}\right)^2\right)$$
$$\le 2\exp\left(-2 f(N-1) \log(n) \left(\frac{\Delta_{N-1,N} - \beta_{N-1}(n) - \beta_N(n)}{2}\right)^2\right) = o\left(n^{-1}\right).$$

Furthermore,

$$P[L_1(n) > M] \le P[|\hat{q}_N(n) - \hat{q}_{N+1}(n)| > \beta_N(n) + \beta_{N+1}(n)]$$
$$\le P[|\hat{q}_N(n) - q_N| + |q_{N+1} - \hat{q}_{N+1}(n)| > \beta_N(n) + \beta_{N+1}(n)]$$
$$\le P[|\hat{q}_N(n) - q_N| > \beta_N(n)] + P[|\hat{q}_{N+1}(n) - q_{N+1}| > \beta_{N+1}(n)] \le \frac{4}{n^{2c}}.$$

Lastly, $f(N-1)$ is the coefficient of the lower bound of $N_{N-1}(n)$, and can be written as (see Theorem 4.1 in (Combes & Proutière, 2014))

$$f(N-1) = \frac{1}{I(q_{N-1}, q_N)}.$$

This completes the proof.
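The quantities used in the proof of Theorem 4 can be made concrete with a short Python sketch. It assumes that $I(\cdot, \cdot)$ is the Kullback-Leibler divergence between two Bernoulli distributions, as is usual in the unimodal bandit analysis of (Combes & Proutière, 2014), and that the confidence width is $\beta_k(n) = \sqrt{c \log(n) / N_k(n)}$. All function names are hypothetical, and the separation test is only an illustration of the comparison that appears in the probability bounds, not the authors' code.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q), in nats."""
    eps = 1e-12  # clip away from 0 and 1 so the logarithms stay finite
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def confidence_width(c: float, n: int, pulls: int) -> float:
    """beta_k(n) = sqrt(c * log(n) / N_k(n)) for a dose sampled `pulls` times."""
    if n < 1 or pulls < 1:
        raise ValueError("n and pulls must be positive")
    return math.sqrt(c * math.log(n) / pulls)


def efficacies_separated(q_hat_a: float, q_hat_b: float,
                         pulls_a: int, pulls_b: int,
                         c: float, n: int) -> bool:
    """True when the empirical gap exceeds the sum of the two confidence widths."""
    widths = confidence_width(c, n, pulls_a) + confidence_width(c, n, pulls_b)
    return abs(q_hat_a - q_hat_b) > widths


def lower_bound_coefficient(q_prev: float, q_next: float) -> float:
    """f(N-1) = 1 / I(q_{N-1}, q_N); undefined when the two means coincide."""
    kl = bernoulli_kl(q_prev, q_next)
    if kl == 0.0:
        raise ValueError("the two efficacies must differ")
    return 1.0 / kl
```

A small divergence between adjacent efficacies gives a large coefficient, which matches the intuition that doses with nearly equal efficacy need more samples to tell apart.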
J. Baseline designs in the experiments
The following baseline designs are used for comparison to SEEDA and SEEDA-Plateau in the experiments.

• KL-UCB (Garivier & Cappè, 2011): This approach ignores the safety constraint and focuses entirely on efficacy during allocation: for each patient it allocates the dose level with the highest efficacy index. The efficacy performance of each dose level is characterized by the KL-UCB index. However, at the end of the experiment, a dose level is recommended according to $\hat{d}(n) = \arg\max_{k : \hat{p}_k(n) \le \theta} \hat{q}_k(n)$, where $\hat{q}_k(n)$ and $\hat{p}_k(n)$ are the last empirical estimates of efficacy and toxicity for dose level $d_k$. The safety constraint is thus considered in the recommendation. Accordingly, type I and type II errors are defined as:
$$e_1 = \sum_{k \in \mathcal{K}} \mathbb{1}\{p_k \le \theta\}\, \mathbb{1}\{\hat{p}_k(n) > \theta\}, \qquad e_2 = \sum_{k \in \mathcal{K}} \mathbb{1}\{p_k > \theta\}\, \mathbb{1}\{\hat{p}_k(n) \le \theta\}.$$

• UCB-1 (Auer et al., 2002): The allocation and recommendation rules are similar to KL-UCB above, with the only difference that the dose level with the highest UCB-1 index of efficacy is allocated to the patient.

• Independent Thompson Sampling (TS) (Thompson, 1933; Aziz et al., 2019): Toxicity and efficacy are estimated with Bayesian indices: $\tilde{p}_k(t) \sim \mathrm{Beta}(S_k^p(t) + 1, N_k(t) - S_k^p(t) + 1)$ and $\tilde{q}_k(t) \sim \mathrm{Beta}(S_k^q(t) + 1, N_k(t) - S_k^q(t) + 1)$, where $S_k^p(t)$ counts the number of toxic outcomes of dose level $k$ among the first $t$ patients and $S_k^q(t)$ counts the number of effective responses. The dose with maximum $\tilde{q}_k(t)$ is allocated to the $t$-th patient, and $\hat{d}(n) = \arg\max_{k : \tilde{p}_k(n) \le \theta} \tilde{q}_k(n)$ is recommended. Definitions of type I and type II errors are slightly modified to:
$$e_1 = \sum_{k \in \mathcal{K}} \mathbb{1}\{p_k \le \theta\}\, \mathbb{1}\{\tilde{p}_k(n) > \theta\}, \qquad e_2 = \sum_{k \in \mathcal{K}} \mathbb{1}\{p_k > \theta\}\, \mathbb{1}\{\tilde{p}_k(n) \le \theta\}.$$

• CRM (O’Quigley et al., 1990): We here employ the CRM algorithm with the same one-parameter toxicity model as in our paper:
$$p_k(a) = \left(\frac{\tanh(d_k) + 1}{2}\right)^a.$$
We choose a typical prior distribution as a ∼ exp(0 . . Therefore, $d_k$ can be solved with the prior toxicity and the prior mean of $a$. $\pi_t(a)$ denotes the posterior distribution of $a$ after observing the outcomes of the first $t$ patients. The allocation rule is a greedy one:
$$I_t^{\mathrm{CRM}} = \arg\min_{k \in \mathcal{K}} |\theta - p_k(\hat{a}(t))|, \qquad \hat{a}(t) = \int_0^{\infty} a \, d\pi_t(a),$$
where $\hat{a}(t)$ is the posterior mean. With this estimate, the final recommendation rule can be written as $\hat{d}(n) = \arg\min_{k \in \mathcal{K}} |\theta - p_k(\hat{a}(n))|$.

• 3+3 (Storer, 1989): The lowest dose is first given to 3 patients. If none reports a toxic outcome, the next lowest dose level is given to the next 3 patients. If fewer than 2 among these 6 patients report a toxic outcome, the next lowest dose level is given to the next 3 patients; otherwise the experiment is stopped and the dose level used before stopping is recommended as the MTD.

• MCRM (Neuenschwander et al., 2008): This algorithm classifies the probability of toxicity into four categories. For our simulated setting, the categories are set as:
Under-dosing: π_a(d) ∈ (0 , .
Targeted toxicity: π_a(d) ∈ (0 . , .
Excessive toxicity: π_a(d) ∈ (0 . , .
Unacceptable toxicity: π_a(d) ∈ (0 . , .
The recommendation and allocation rules maximize the probability of targeted toxicity while controlling the probability of excessive or unacceptable toxicity at $P_{\mathrm{thre}} = 25\%$. Based on the posterior distribution of the toxicity, the probability that the toxicity falls in each of the above four categories can be calculated. The probability that it falls in the Targeted category is denoted as $P_i^t$, and the probability that it falls in the Excessive or Unacceptable categories as $P_i^e$. The selection rule is therefore $I_t = \arg\max_{P_i^e \le P_{\mathrm{thre}}} P_i^t$.

• Multi-objective Bandits (Yahyaa & Manderick, 2015): We implement the Pareto Thompson Sampling algorithm of (Yahyaa & Manderick, 2015) in our experiments. Specifically, after obtaining the estimates of toxicity and efficacy of each dose from running the Independent TS design, the algorithm computes the Pareto optimal dose level set $I^*$, which means $\forall i \in I^*$, $\forall j \notin I^*$, $\tilde{p}_i(t) \le \tilde{p}_j(t)$ or $\tilde{q}_i(t) \ge \tilde{q}_j(t)$.

Other policies designed for MTA, such as MTA-RA, depend on a different truncated two-parameter logistic efficacy model (Riviere et al., 2018). In our setting, the exact efficacy model is assumed to be unknown; we only make the increase-then-plateau assumption.

K. Additional experiment results under the same setting as in Section 5
Due to space limitations, we were not able to include all the experiment results for the setting in Section 5. These additional results are provided here.

In particular, Table 2 only reports the recommendation and allocation percentages for a given n = 100. It is of interest to see how these metrics change with n. We plot the mean allocation and recommendation probabilities as a function of n in Fig. 4. It can be seen that SEEDA-Plateau outperforms all other methods across a large range of n.

[Figure 4 panels: mean allocation probabilities (%) and mean recommendation probabilities (%) versus number of cohorts, for SEEDA, SEEDA-Plateau, Indep TS, KLUCB, UCB, 3+3, CRM, MCRM, and Multi obj.]
Figure 4: Mean allocation (left) and recommendation (right) probabilities versus number of patients n.

L. Experiment of a new setting and its comprehensive results

In the main paper, a setting in which the efficacy reaches its maximal value (the optimal dose) before the toxicity hits the MTD threshold is used. A different setting arises when the maximum-efficacy dose exceeds the MTD threshold. The experiment results for this setting (called “setting 2”) are reported in this section. Unless otherwise stated, the parameters are the same as in Section 5 of the main paper.

Table 5 presents the setting as well as the allocation and recommendation percentages of each dose for all considered algorithms. For this scenario, dose level 3 is the optimal one. We note that a large portion of the conclusions in the main paper still hold. The gain of SEEDA-Plateau over SEEDA is less significant, but SEEDA-Plateau still outperforms all the competing designs. The corresponding Type I and Type II error rates are similarly plotted in Fig. 5.

Table 5: Recommendation & allocation percentages of different designs for setting 2.
Recommended Allocated
Toxicity probabilities 0.1 0.2
[Figure 5 panels: type I error rate (%) and type II error rate (%) versus number of cohorts, for SEEDA, SEEDA-Plateau, Indep TS, KLUCB, UCB, and Multi obj.]
Figure 5: Type I and type II error rates in setting 2.

An in-depth look at the mean allocation and recommendation probabilities versus the number of patients n for this new setting is given in Fig. 6. The same observation as in Section K holds.

[Figure 6 panels: mean allocation probabilities (%) and mean recommendation probabilities (%) versus number of cohorts, for SEEDA, SEEDA-Plateau, Indep TS, KLUCB, UCB, 3+3, CRM, MCRM, and Multi obj.]
Figure 6: Mean allocation (left) and recommendation (right) probabilities versus number of patients n in setting 2.

[Figure 7 panels: efficacy per patient and safety violation percentage (%) versus number of cohorts, for SEEDA, SEEDA-Plateau, Indep TS, KL-UCB, UCB, 3+3, CRM, MCRM, and Multi obj.]
Figure 7: Comparison of efficacy per patient and the safety violation percentage in setting 2.

The convergence of efficacy and toxicity as t increases for setting 2 is plotted in Fig. 7. There is a notable difference from the previous result in Fig. 2, in that now SEEDA and SEEDA-Plateau converge to a different (but correct) dose than the other considered designs, which only emphasize maximum efficacy. It is clear that with such aggressive pursuit of efficacy, these designs succeed in obtaining a better treatment effect than SEEDA(-Plateau), but at the significant cost of frequent violation of the safety constraint: as opposed to safety violation percentage hovering between and in Fig. 2, now we face a violation in the range of to as shown in Fig. 7.

[Figure 8: number of patients versus recommendation accuracy, for SEEDA, SEEDA-Plateau, Indep TS, KL-UCB, UCB, and Multi-obj.]
Figure 8: Sample size comparison in setting 2.

Lastly, the sample efficiency is evaluated. Fig. 8 plots the minimum number of patients needed to achieve a given recommendation accuracy for different algorithms.
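The sample-efficiency metric behind Fig. 8 can be illustrated with a small Python sketch: given an estimated recommendation accuracy for each trial size, it returns the smallest size that reaches a target accuracy. The function name and the accuracy values in the usage lines are hypothetical placeholders for illustration; they are not results from the paper.

```python
from typing import Dict, Optional


def min_patients_for_accuracy(accuracy_by_n: Dict[int, float],
                              target: float) -> Optional[int]:
    """Return the smallest trial size whose accuracy reaches `target`.

    accuracy_by_n maps a number of patients n to the fraction of simulated
    trials that recommended the optimal dose. Returns None if no evaluated
    size reaches the target.
    """
    for n in sorted(accuracy_by_n):
        if accuracy_by_n[n] >= target:
            return n
    return None


# Hypothetical accuracy curve, for illustration only.
example_curve = {50: 0.41, 100: 0.63, 150: 0.78, 200: 0.86}
smallest_n = min_patients_for_accuracy(example_curve, target=0.75)
```

With the placeholder curve above, `smallest_n` is the first evaluated size whose accuracy is at least 0.75; a design with a lower value is more sample efficient at that accuracy level.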
M. Experiment setting 3 to 8 with evaluation of allocation and recommendation percentages
This section reports the allocation and recommendation percentages of each dose for all considered algorithms under different toxicity/efficacy probabilities. We reuse the same 6 scenarios as those in the experiments of (Zang et al., 2014). See Tables 6 to 11 for the detailed results. They are in line with the conclusions of the main paper.

Table 6: Recommended & allocated percentages for Scenario 1 of (Zang et al., 2014).
Recommended Allocated
Toxicity probability 0.08 0.12 0.2

Table 8: Recommended & allocated percentages for Scenario 3 of (Zang et al., 2014).
Recommended Allocated
Toxicity probability 0.06 0.08 0.14

Table 10: Recommended & allocated percentages for Scenario 5 of (Zang et al., 2014).
Recommended Allocated
Toxicity probability 0.1

Table 11: Recommended & allocated percentages for Scenario 6 of (Zang et al., 2014).
Recommended Allocated
Toxicity probability 0.01 0.03 0.05 0.6