[PDF] Interim recruitment prediction for multi-centre clinical trials

Abstract

We introduce a general framework for monitoring, modelling, and predicting the recruitment to multi-centre clinical trials. The work is motivated by overly optimistic and narrow prediction intervals produced by existing time-homogeneous recruitment models for multi-centre recruitment. We first present two tests for detection of decay in recruitment rates, together with a power study. We then introduce a model based on the inhomogeneous Poisson process with monotonically decaying intensity, motivated by recruitment trends observed in oncology trials. The general form of the model permits adaptation to any parametric curve-shape. A general method for constructing sensible parameter priors is provided and Bayesian model averaging is used for making predictions which account for the uncertainty in both the parameters and the model. The validity of the method and its robustness to misspecification are tested using simulated datasets. The new methodology is then applied to oncology trial data, where we make interim accrual predictions, comparing them to those obtained by existing methods, and indicate where unexpected changes in the accrual pattern occur.

Full PDF

aa r X i v : . [ s t a t . M E ] S e p Interim recruitment prediction for multi-centre clinical trials

Szymon Urbas ∗ , Chris Sherlock , and Paul Metcalfe STOR-i Centre for Doctoral Training, Lancaster University, Lancaster, United Kingdom Department of Mathematics and Statistics, Lancaster University, Lancaster, United Kingdom AstraZeneca, Cambridge, United Kingdom

Abstract

We introduce a general framework for monitoring, modelling, and predicting the recruitment tomulti-centre clinical trials. The work is motivated by overly optimistic and narrow prediction intervalsproduced by existing time-homogeneous recruitment models for multi-centre recruitment. We ﬁrstpresent two tests for detection of decay in recruitment rates, together with a power study. We thenintroduce a model based on the inhomogeneous Poisson process with monotonically decaying intensity,motivated by recruitment trends observed in oncology trials. The general form of the model permitsadaptation to any parametric curve-shape. A general method for constructing sensible parameterpriors is provided and Bayesian model averaging is used for making predictions which account for theuncertainty in both the parameters and the model. The validity of the method and its robustness tomisspeciﬁcation are tested using simulated datasets. The new methodology is then applied to oncologytrial data, where we make interim accrual predictions, comparing them to those obtained by existingmethods, and indicate where unexpected changes in the accrual pattern occur.

Eﬃciently recruiting patients to clinical trials is a critical factor in running clinical trials and hencedelivering new medicines to patients as quickly as possible. Late-stage clinical trials are commonly runacross many sites, and successfully managing and running trials and subsequent processes requires accurateforecasts of trial recruitment.Early recruitment rates can be high, for example, because patients with the required condition arealready available, and rates can then drop once these patients have been recruited. Deterministic ap-proaches and ad hoc techniques may yield simpliﬁed and, often, overly optimistic recruitment timelines, ∗ Email: [email protected] , Address: STOR-i Centre for Doctoral Training, Lancaster University, Lancaster,United Kingdom

Lasagna’s Law (Lasagna, 1979). For example, 48% of centres studied byGetz and Lamberti (2013) failed to enrol the required number of patients in the time originally allocated,leading to extensions of the recruitment timelines and the need to bring more centres into the study, whichitself is a costly process. The timelines are usually pushed to nearly twice the originally proposed plan.The most frequent reason for trial discontinuation appears to be poor recruitment; out of 253 discontinuedtrials studied in Kasenda et al. (2014), 101 were terminated due to under-recruitment.This motivates the need for robust statistical methods for modelling and predicting the recruitment toclinical trials at site-level. Early detections of possible centre underperformance may allow practitionersto swiftly intervene in the operations. It can also provide realistic timelines for the completion of diﬀerentstages of the trials.In this work, we introduce a novel ﬂexible framework for eﬀectively modelling and predicting pa-tient recruitment. We will focus on the oncology therapeutic area as it is known for sparse enrolmentswhose patterns are not suﬃciently captured by the state-of-the-art methods (Anisimov and Fedorov, 2007;Lan et al., 2019). Our framework utilises time-varying recruitment rates whilst also permitting variationbetween recruitment centres. Inference is based on the set of known centre initiation times to date, whilstthe prediction is conditional on a set of future initiation times. Past initiation times are known, but typi-cally, whilst there is a plan for future initiation times along with potential contingencies, the actual timesare not known precisely in advance. The proposed methodology can be used with user-speciﬁed initiationschedules to facilitate the choice between diﬀerent initiation-time scenarios, or it can be combined with acentre-initiation model. Predictions of future recruitment incorporate parameter and model uncertainty,which is essential when data are limited.Existing methods for predicting recruitment to clinical trials are overviewed in Section 2. Section 3outlines methods for detecting recruitment rate decay in the multi-centre recruitment setting along withresult of a Monte Carlo power study. Section 4 introduces the ﬂexible modelling framework and Section5 presents a general method for choosing sensible Bayesian parameter priors, along with an appropriateposterior sampling method and diagnostics. A simulation study is presented in Section 6, illustrating theﬁtting of the model, model validation and forecasting recruitment using Bayesian model-averaging. InSection 7 the model is ﬁtted to an oncology dataset, and this is followed by a discussion in Section 8.

The ﬁrst statistical modelling framework for clinical trial recruitment was introduced in Lee (1983), wherethe recruitment was assumed to be a constant-rate Poisson process, leading to tractable inference based oninterim data. Williford et al. (1987) built on the model by considering Bayesian inference with conjugate2riors. Gajewski et al. (2008) and Jiang et al. (2015) further explored the eﬀects various prior densitiescan have on predictions. Time-inhomogeneous accrual was ﬁrst considered in Piantadosi and Patterson(1987), where the aggregated accrual across all sites was modelled as an inhomogeneous Poisson processwith intensity λ ( t ) = ζ (1 − exp {− κt } ), ζ, κ >

0. Zhang and Long (2010) took a non-parametric approach,using B-splines to model the trends in accrual and using the intensity value at the census time forpredictions. Tang et al. (2012) proposed a Poisson model with a piece-wise linear intensity which capturedaspects of recruitment such as slow initial recruitment and a spike in recruitment close to the end of thetrial. For a more thorough review of these as well as other methods see Heitjan et al. (2015). Accrual-onlymodelling methods do not consider the eﬀect that initiating new centres can have on recruitment trends.For that reason, we shall focus on methods which can take advantage of centre-speciﬁc recruitment data.Anisimov and Fedorov (2007) introduced the Poisson-gamma (PG) model of recruitment in a multi-centre setting, with the main appeal being the use of random eﬀects for the recruitment rates of centres,providing a tractable, data-driven prior predictive distribution for recruitment in yet-unopened centres.The model consists of C centres, each recruiting N c patients over τ c days, c = 1 , . . . , C . The frameworkmakes the following distributional assumptions, λ c ∼ Gamma ( α, α/φ ) ,N c | λ c ∼ Pois ( λ c τ c ) , c = 1 , . . . , C. (1)The random eﬀect λ c is the recruitment rate for centre c . The rates, and thus the centre recruitments,are assumed to be independent conditional on α and φ . There are, however, several caveats with theapproach taken. The paper advocates using the Empirical Bayes approach, that is, maximum likelihoodestimation for the hierarchical parameters ( α, φ ) followed by re-estimation of the distribution of randomeﬀect λ c given α , φ and n c , for each centre. A method for obtaining the uncertainty in the hierarchical( α, φ ) parameters is provided, but this uncertainty is not accounted for when making predictions, leadingto overly conﬁdent prediction intervals. However, the main issue which could result from employing themodel arises from the strong assumption of time-homogeneity of centre recruitments, which can lead tounderestimations of the time to completion.Figure 1 shows the accrual in a simulated trial where the rates gradually decay with time as well asthe predictive distribution of the PG model ﬁtted at a census time of three-ﬁfths of the total length ofthe study; the initiation day for each centre is marked. The accrual appears to follow a straight linewhich could initially suggest using a time-homogeneous model. However, new centres are constantlybeing iniatated so that a constant recruitment rate for each centre leads to an upward arching trend inaccrual. This is encapsulated by the ﬁtted predictive. Here the accrual is initially badly underestimatedand then grossly overestimated after the census time. The apparent “matching” at the census time is due3

100 200 300 400 500 600

Time A cc r ua l Figure 1: Accrual (black, solid) with the predictive mean (red, solid) and 95% prediction bands(red,dashed), based on the PG model (1) with the census time marked by the vertical, dashed line.to predictions using re-estimated random-eﬀect distributions.Lan et al. (2019) describes the ﬁrst multi-centre recruitment model in which the rates decrease overtime. The model assumes inhomogeneous Poisson for arrivals centre c with an intensity of the form λ c ( t ) =  λ oc , t < t o λ oc exp {− θ ( t − t o ) } , t ≥ t o , where λ oc is a gamma random eﬀect, as in (1), and t o a user-speciﬁed parameter and is not estimatedas part of the inference. By enforcing the speciﬁc intensity-form, the possibilities of time-homogeneousrecruitments or even intensity decays with heavier tails are excluded. A more systematic alternative is tostart by testing the time-homogeneity assumption. Given series of daily centre recruitment counts over the recruitment period of τ c days, { N c ( t ) } τ c t =1 , c =1 , . . . , C , we can test the hypothesis of time-homogeneity. To detect a decay in the rate, we only needto use the sums X ( c )1 = P τ c / t =1 N c ( t ) and X ( c )2 = P τ c t = τ c / N c ( t ) ( c = 1 , . . . , C ), whose expectations wedenote by µ ( c )1 and µ ( c )2 respectively. Detecting time-inhomogeneity in a single centre can be diﬃcult as theinfrequent counts will lead to low powers of tests (Krishnamoorthy and Thomson, 2004) (see also Tables 1and 2). Thus we combine the recruitments across all centres leading to two counts: X = P Cc =1 X ( c )1 and X = P Cc =1 X ( c )2 , and we choose our hypotheses to be H : P Cc =1 µ ( c )1 = P Cc =1 µ ( c )2 vs H : P Cc =1 µ ( c )1 > P Cc =1 µ ( c )2 .The tests are one-sided as we are only interested in recruitment which decays over time. We consider4 .. Figure 2: Count series are all centred and the sum of all the ﬁrst halves is compared to the sum of secondhalves.tests with respect to the following assumptions:

Assumption 1:

For each centre c = 1 , . . . , C , the counts in the ﬁrst and second halves of that centre’srecruitment period are independent and have the same distribution, X ( c )1 d = X ( c )2 , with expectation µ ( c )1 .Furthermore, the recruitments at each centre are independent of each other. Assumption 2:

The patients arrive according to a Poisson process such that X ( c )1 , X ( c )2 ∼ Pois (cid:16) µ ( c )1 (cid:17) ,for some µ ( c )1 , c = 1 , . . . , C . Assumption 1 implies that X and X must have the same distributions, with respective expectations µ = P Cc =1 µ ( c )1 and µ = P Cc =1 µ ( c )2 being equal. Assumption 2 further implies that the distributionsmust be Poisson. Figure 2 shows the construction of the quantities X and X by aligning the centres ofthe recruiting periods. The splitting of the series halfway is arbitrary, though splitting it in half (or atleast close to this) would theoretically yield the highest power. It assumes that the τ c are even. However,centres recruiting over odd numbers of days can still be used by removing the middle day observation.This reduces the power of the tests, though the reduction is negligible.Gu et al. (2008) oﬀer a detailed Monte Carlo study of the diﬀerent methods used for testing for adiﬀerence in means of two Poisson variables. Here, we focus on the ones most applicable to the clinical-trial recruitment setting, bearing in mind statistical power and robustness. We identiﬁed two methods:the non-parametric bootstrapped test (BST), which is powerful yet robust, and the Poisson likelihood-ratio test (LRT), which makes stronger distribution assumptions to achieve an even higher power. TheBST only assumes that the counts in each day are independent and identically distributed (Assumption 1).With this assumption, resampling within each centre with replacement, from the original data would stillproduce a valid sample from the assumed distribution under H . A large number of bootstrap samples isused to simulate the distribution of the diﬀerence in two means, which is then used to test the hypothesis.Appendix A of the Supplementary Material details the sampling procedure for obtaining the distribution5nd the p -value.For the LRT, we require Assumption 2, which is already an underlying assumption for the modelin Anisimov and Fedorov (2007). Upon aggregation, the two sums follow Poisson distributions, that is, X ∼ Pois( µ ) and X ∼ Pois( µ ). The likelihood under the null model ( µ = µ ) is compared to thelikelihood under the alternative two-mean model ( µ > µ ). Here, the likelihood function is L ( µ , µ | x , x ) = µ x exp {− µ } x ! µ x exp {− µ } x ! , µ , µ > . We let T L ( x , x ) =  L (ˆ µ , ˆ µ | x , x ) − log L (ˆ µ, ˆ µ | x , x )] , ˆ µ > ˆ µ , ˆ µ ≤ ˆ µ , where ˆ µ is the MLE under the null, and ˆ µ and ˆ µ are the MLEs under the alternative hypothesis. Underthe null, we would expect the test statistic T L ( X , X ) to asymptotically be zero half the time withthe other half following a χ distribution (Robertson et al., 1988), When using the LRT, the simulatedsigniﬁcance levels can diﬀer from the pre-speciﬁed level when µ values are low. This is due to using theasymptotic χ distribution when calculating the p -value (Gu et al., 2008).The performance of the two tests was assessed by carrying out a Monte Carlo study. Test powerswere estimated using Poisson data with diﬀerent expectations and ratios, R = µ /µ . For the LRT powerestimates, 5 × samples were used as the test itself is very computationally cheap. For the BST, 5 × samples were used, with each test using a bootstrapped distribution of size 10 . Tables 1 and 2 showthe results of the study. The biggest diﬀerence in powers occurs for lower expectations, with the LRToutperforming BST. It must be noted, however, that the BST only requires the data to be i.i.d. withineach centre and thus is robust to violations of the Poisson assumption; if the counts within each centreare overdispersed, for example, it does not aﬀect the Type I error.To exemplify the usefulness of this test, we can consider an interim likelihood ratio test where theexpected number of enrolments is 170. This corresponds to E [ X ] = 100 and R = 0 .

7, for example, andresults in a statistical power of approximately 0.75. Considering many trials require an upward of 500enrolments, informed decisions can be made relatively early on in the trial.

We consider a scenario of C centres recruiting patients, with each centre c being initiated for τ c days.The number recruited by centre c on day t shall be denoted by N ( t ) c . We propose the following modelling6able 1: Power for likelihood-ratio test E [ X ] R = 1 R = 0 . R = 0 . R = 0 . R = 0 . R = 0 .

55 0.06 0.08 0.11 0.15 0.20 0.2710 0.05 0.08 0.12 0.18 0.26 0.3720 0.05 0.09 0.17 0.27 0.41 0.5850 0.05 0.13 0.28 0.50 0.73 0.90100 0.05 0.18 0.44 0.75 0.94 0.99200 0.05 0.27 0.68 0.95 1.00 1.00Table 2: Power for non-parametric bootstrap test E [ X ] R = 1 R = 0 . R = 0 . R = 0 . R = 0 . R = 0 .

55 0.04 0.06 0.08 0.11 0.14 0.1810 0.05 0.08 0.12 0.16 0.24 0.3320 0.05 0.10 0.16 0.25 0.39 0.5750 0.05 0.14 0.28 0.48 0.70 0.88100 0.05 0.18 0.42 0.74 0.93 0.99200 0.05 0.28 0.67 0.94 1.00 1.00framework for the multi-centre clinical-trial recruitment, based on the inhomogeneous Poisson process, λ oc ∼ Gamma (cid:18) α, αφ (cid:19) , c = 1 , . . . , C,N ( t ) c ∼ Pois (cid:18) λ oc Z tt − g ( s ; θ ) d s (cid:19) , t = 1 , . . . , τ c , where g is a non-negative function which dictates the curve-shape of the intensity and θ is a parameter(or parameter vector) associated with the functional form. We use the ( α, φ ) parametrisation for thehierarchical gamma distribution as it leads to orthogonality of α and φ in the Poisson-gamma model(Huzurbazar, 1950). A priori , E [ λ c ] = φ and V [ λ c ] = φ /α . For notational simplicity, we deﬁne G ( t ; θ ) = R t g ( s ; θ ) d s . The likelihood contribution from centre c isPr( N c = n c | λ oc , θ, τ c ) = τ c Y t =1 Pr( N ( t ) c = n ( t ) c | λ oc , θ )= exp {− λ oc G ( τ c ; θ ) } ( λ oc ) n ( · ) c τ c Y t =1 [ G ( t ; θ ) − G ( t − θ )] n ( t ) c n ( t ) c ! , where n ( · ) c = P τ c t =1 n ( t ) c . Marginalising over the random-eﬀect component givesPr( N c = n c | α, φ, θ, τ c ) = ( α/φ ) α Γ (cid:16) α + n ( · ) c (cid:17) Γ( α )[ G ( τ c ; θ ) + α/φ ] (cid:16) α + n ( · ) c (cid:17) τ c Y t =1 [ G ( t ; θ ) − G ( t − θ )] n ( t ) c n ( t ) c ! , L ( α, φ, θ | n , τ ) = C Y c =1 Pr( N c = n c | α, φ, τ )= ( α/φ ) Cα Γ( α ) C C Y c =1 Γ (cid:16) α + n ( · ) c (cid:17) [ G ( τ c ; θ ) + α/φ ] (cid:16) α + n ( · ) c (cid:17) τ c Y t =1 [ G ( t ; θ ) − G ( t − θ )] n ( t ) c n ( t ) c ! . (2)If all the centres had been recruiting for the same amount of time, that is, τ c ≡ τ ∀ c , then by ﬁxingthe integral of g ( t ; θ ) over τ days we could introduce orthogonality between ( α, φ ) and θ by imposing thenormalisation: R τ g ( t ; θ ) d t = τ. This generalises the homogeneous model with g ( t ; θ ) = 1 and leads tothe following factorisable likelihood, L ( α, φ, θ | n , τ ) = ( α/φ ) Cα Γ( α ) C ( τ + α/φ ) ( Cα + n Σ ) C Y c =1 Γ (cid:16) α + n ( · ) c (cid:17) τ c Y t =1 [ G ( t ; θ ) − G ( t − θ )] n ( t ) c n ( t ) c != L ( α, φ | n , τ ) L ( θ | n , τ ) , (3)where n Σ = P Cc =1 n ( · ) c .The factorisation means that now the θ parameter describes the shape of the intensity only, and α and φ describe the distribution of the magnitude of the integrated intensity, leading to a more interpretablemodel.Even when centres are not all recruiting for the same length of time, we choose to impose a similarnormalisation using some representative τ , here C P Cc =1 τ c . As demonstrated empirically in Section 6, thecondition leads to approximate orthogonality even when the centres are initiated uniformly throughoutthe study. In this work, we will restrict our choice of curve-shape g to parametric forms. The functional form of g isarbitrary and the best choices may depend on the context of the problem. When working with oncologydatasets, for each centre we observe low-frequency counts which seem to become even less frequent overtime but with varying tail behaviours. For this reason, we chose the following curve-shape g κ ( t ; θ ) ∝ (cid:18) θtκ (cid:19) − κ , t ≥ , θ, κ > . (4)The proportionality is used as multiplying g κ by some positive constant and dividing φ by the sameconstant leads to the same model. The limit as κ → → ∞ , we obtain an exponential tail. The full (normalised) forms are then g ( t ) ≡ , (5) g ( t ; θ ) = θ (1 + θt ) − log(1 + θτ ) τ, (6) g κ ( t ; θ ) = θ (1 − κ )(1 + θt/κ ) − κ κ (1 + θτ /κ ) − κ − κ τ, κ / ∈ { , , ∞} , (7) g ∞ ( t ; θ ) = θ exp {− θt } − exp {− θτ } τ. (8)The associated integrated forms, G κ ( t ; θ ) are provided in Appendix B of the Supplementary Material.The ﬂexibility of the model, however, can result in potential identiﬁability issues. Inference methods,such as maximum likelihood, can run into numerical instabilities when κ >> > θ or κ < << θ (seeAppendix B of the Supplementary Material for details). For this reason, we recommend restricting thechoice of κ to a discrete set of values; in this work, we use { , . , , , ∞} . This will be elaborated on inSection 5.3. We aim to construct a framework which can provide reliable predictions whilst capturing uncertainty inthe estimated parameters and in the underlying model itself. We employ the Bayesian paradigm since itnaturally incorporates the distribution of the random eﬀects, λ c , with the uncertainty in the model andthe parameter values. However, we note that in some scenarios frequentist methods may be preferred andgive a brief outline of how one may employ them in Appendix C of the Supplementary Material.Given a parametric statistical model, the Bayesian paradigm starts from a prior distribution for theparameters, here denoted π ( α, φ, θ ) and updates this according to some data, y , to provide a poste-rior distribution, here denoted by π ( α, φ, θ | y ). When multiple parametric models, M k , k = 1 , . . . , K ,are being considered, the posterior probability for model k , here denoted by π p ( M k | y ), may also be cal-culated. Section E of the supplementary material provides more details on these quantities; see alsoRobert and Casella (2013) or Gelman et al. (2013), for example.For the models under consideration for trial-recruitment data, neither the posterior model probabilitiesnor the posteriors for the parameters for any particular model are tractable, and so we employ importancesampling to obtain Monte Carlo samples ( α m , φ m , θ m ), m = 1 , . . . , M from the posterior distribution forany given model, as well as an estimate of π ( M k ), k = 1 , . . . , K . Appendix D of the SupplementaryMaterial provides further details of this method, as well as of eﬀective sample size (ESS), a diagnostic whichindicates the reliability of the Monte Carlo estimates; see also Robert and Casella (2013) or Doucet et al.92013).In Sections 6 and 7, we carry out inference on ˜ α = log α , ˜ φ = log φ and ˜ θ = log θ since analyses oftrial data showed the likelihood in the log-parameters to be more symmetric about the mode, which canmake sampling more eﬃcient. For the importance sampling proposal distribution, we use a multivariate t -distribution on 4 degrees of freedom, with the same mode as the posterior and the shape matrix equalto the inverse Hessian at the posterior mode. We base our prior speciﬁcation on a maximum likelihood meta-analysis of 20 oncology clinical trialrecruitment datasets. The trials studied were for seven diﬀerent types of cancers: ovarian, prostate,breast, small and non-small lung, bladder and pancreatic. The number of centres ranged from 58 to244 with a median of 140 and total enrolments ranged from 245 to 4391 with a median of 1035. Inall cases, the parameter estimators were close to orthogonal justifying the use of independent priors: π ( ˜ α, ˜ φ, ˜ θ ) = π ( ˜ α ) π ( ˜ φ ) π (˜ θ ).We found that the α parameter does not change much from one study to another. The weaklyinformative prior ˜ α ∼ N (0 . , ) suﬃciently reﬂects the distribution of the estimated values.The φ parameter estimates varied by orders of magnitude between studies. The parameter reﬂects themean centre recruitment and is well identiﬁed by the data; it depends upon the catchment region, typeof indication and protocol, for example. For this reason, we advocate using a vague prior unless reliableexpert knowledge is available. In our analyses, we used the uninformative, proper prior ˜ φ ∼ U ( − , θ . Lindley’s paradox (Lindley, 1957) warns that assigning θ a vague prior can lower theposterior probabilities of the models that use θ , compared to the model with κ = 0 which does not use θ . To avoid the paradox we set an informative but sensible prior by considering the drop oﬀ in intensityafter some time, t . We let R κ = g κ ( t ; θ ) /g κ (0; θ ) and set R κ ∼ Beta( a, b ) a priori , with a = b = 1 . κ = 0, and that we do not expect a 100% drop oﬀ after a time of t (expert opinion); here we take t = 4months. As R κ is a monotonic function of θ , we can use a density transform to derive the correspondingprior for θ . If prior information is abundant, be it in the form of historical data or expert knowledge, thebeta distribution parameters can be adjusted to reﬂect this. Given (4), the resulting prior density for ˜ θ is given in Appendix E of the Supplementary Material.10 .2 Predictive distribution There are two complementary properties for which predictions might be required: the distribution offuture recruitments within a set time interval, and the distribution of time until the target number ofrecruitments is reached. In this section, we focus on the former; details of the latter appear in AppendixF. Suppose we are interested in sampling the recruitment, denoted N + c , at some day t + by centre c .Given samples from the parameter posteriors, we can sample exactly from the posterior predictive for N + c by exploiting the Poisson-gamma conjugacy of the random-eﬀect distribution. The posterior distributionfor the λ oc random eﬀect for centre c is λ oc | α, φ, θ, n c , τ c ∼ Gamma (cid:16) α + n ( · ) c , α/φ + G ( τ c ; θ ) (cid:17) = Gamma (cid:18) α ∗ c , α ∗ c φ ∗ c (cid:19) , (9)where α ∗ c = α + n ( · ) and φ ∗ c = φ × (cid:16) α + n ( · ) c α + φG ( τ c ; θ ) (cid:17) . The predictive distribution for N + c conditional on therandom eﬀect is: N + c | λ oc , θ ∼ Pois λ oc Z t + t + − g ( s ; θ ) d s ! = Pois (cid:0) λ oc G + θ (cid:1) , (10)where G + θ = R t + t + − g ( s ; θ ) d s .Marginalising over the random eﬀect posterior, we arrive at the negative binomial distribution: P ( N + c = n | α ∗ c , φ ∗ c ) = Γ( α ∗ c + n )Γ( α ∗ c ) n ! (cid:18) α ∗ c α ∗ c + φ ∗ c G + θ (cid:19) α ∗ c (cid:18) φ ∗ c G + θ α ∗ c + φ ∗ c G + θ (cid:19) n , n ∈ N . (11)The length of interval to t + does not need to be a day and could instead be a week or a month, dependingon the context of the application. To obtain the full marginal predictive, we sample the recruitmentsconditional on parameters sampled from the posterior. For as yet unopened centres, we set n ( · ) c = τ c = 0.For each triplet (or couplet, if κ = 0) of parameters sampled from the posterior, we sample N + c , c =1 , . . . , C , and sum them to obtain a sample from N + | α, φ, θ . The collection of these sums is a samplefrom the posterior predictive distribution for the model.If simulations for multiple distinct time periods are required for a given centre, c , as needed for theaccrual curve for example, then we ﬁrst sample λ oc from its posterior (9). We then simulate the Poissoncounts for the individual time periods, which are conditionally independent given λ oc , from (10). When predicting the enrolments using a ﬁtted model, we implicitly assume that a single model best reﬂectsreality; however, prediction methods should consider the uncertainty in the models used for inference. Weshall, therefore, use model averaging for making predictions, that is, take a weighted average of predictions11ade by each model. Working in the Bayesian paradigm provides us with an intuitive choice for weightsin the form of marginal likelihoods of the models.Pr( N + = n + | n , τ ) = K X k =1 Pr( N + = n + | n , τ , M k ) π p ( M k | n , τ ) , where π p ( M k | n , τ ) ∝ π ( n | τ , M k ) π ( M k ), k = 1 , . . . , K , with π ( M k ) being prior model probabilities.The averaging framework ﬁts in with the restriction of the shape parameter κ to a discrete space. Each κ value generates an inhomogeneous Poisson-gamma model with the tail behaviour of the associatedintensity shape. This includes the null ( κ = 0) model as in Anisimov and Fedorov (2007). In this workwe set all prior model probabilities equal. Before making any statements in regards to the future recruitments, we should validate that the ﬁttedmodel does indeed capture the true data-generating process suﬃciently well. Since the true process isunknown, we compare the observed data to the modal model (the model with the highest posteriorprobability) ﬁxed at posterior parameter means ( ˆ α, ˆ φ, ˆ θ ).Firstly, we wish to assess that the chosen hierarchical structure is reﬂected in the data. The distri-bution of posterior means of the individual random eﬀects should approximately follow the hierarchicalGamma( ˆ α, ˆ α/ ˆ φ ) distribution. A QQ-plot can be used to visually compare the distributions. If deemedsuﬃciently similar, using the distribution for generating predictions for yet-unopened centres is appropri-ate. If the distributions are noticeably diﬀerent, particularly if the true distribution is multimodal, anyinterim predictions for yet-unopened centres could (but need not; see robustness study in Section 6) beinaccurate.According to the model, the counts in any initial period [0 , t ′ ] (such as the ﬁrst month) of eachcentre’s recruitment period, follow a negative binomial distribution with shape parameter α and successprobability φG ( t ′ ; θ ) / ( α + φG ( t ′ ; θ )), similar to that given in (11) but using α and φ in place of α ∗ c and φ ∗ c . As the true parameters are unknown, we compare it to the distribution ﬁxed at point-estimates( ˆ α, ˆ φ, ˆ θ ). The diagnostic indicates if the combination of the gamma random eﬀects and the modal decaymodel captures the behaviour over the initial period after centre initiation. Again, a QQ-plot can beused for comparing the theoretical distribution to the observation, giving an indication if the ﬁtted modelunder- or overestimates initial recruitment. The initial period, [0 , t ′ ], should be long enough that the truerecruitment decay should be apparent. However, since only centres that have been recruiting for a periodof at least t ′ can be used for the diagnostic, to ensure a reasonable power, t ′ should be short enough thata large number of sites have been recruiting for this duration. In this work, we set t ′ = 60 (2 months).12 α φ θ π ( M k | n ) ESS0 1 .

141 (0 . , . .

013 (0 . , . −− . × − . .

167 (0 . , . .

013 (0 . , . .

143 (0 . , . . × − .

144 (0 . , . .

013 (0 . , . .

033 (0 . , . . × − .

142 (0 . , . .

014 (0 . , . .

017 (0 . , . . × − ∞ .

122 (0 . , . .

014 (0 . , . .

009 (0 . , . . × − importance samples for each model. We demonstrate our ﬂexible framework through a simulation study, using simulated data sets to illustratemodel ﬁt and prediction and to highlight the eﬀect model misspeciﬁcation can have on predictions. Inpractice, patterns in centre initiation times can vary greatly between trials. For presenting the methodol-ogy, we consider an initiation schedule similar to that observed in a typical trial. We test the robustnessof the method using a uniform initiation schedule, with another type of schedule examined in AppendixG of the Supplementary Material.Our historical data set do not include the initiation times of the centres, so instead, to accurately reﬂectthe historical data used in the meta-analysis and what is often available to researchers, we take the ﬁrstrecruitment time of a centre as its initiation time and adjust the models to include a single deterministicrecruitment at the initiation time of each centre followed by stochastic recruitment as described in Section4. We simulate a study over a course of 600 days, with 200 centres. The parameters used for simulationswere α = 1 . φ = 0 . κ = 2 . θ = 0 .

02. The inference is carried out on data observed in the ﬁrst360 days. As motivated in Section 1, we condition the inference on a set of known initiation times, chosenby the practitioner; these could subsequently be varied to investigate the impact of diﬀerent schedules orinitiation models. We consider a set of models with ﬂexible tails (Section 4.1) allowing κ ∈ { , . , , , ∞} ,thus including the null model (Anisimov and Fedorov, 2007). The “normalisation” of the curve-shapeswas imposed at ¯ τ = C P Cc =1 τ c . We purposely simulated using a κ value outside of those considered inour models to illustrate the ﬂexibility of the framework. For Bayesian inference, we used parameter andmodel priors outlined in Sections 5.1 and 5.3 respectively. Based on the model ﬁtted to the data at thecensus day 360, we wish to predict the daily accrual until day 600.Performing the LRT and BST from Section 3, we ﬁnd the p -values of both tests to be < .

100 200 300 400 500 600

Time A cc r ua l Figure 3: Accrual plot with the centre opening times marked by “+” symbols on the abscissa.importance sampler. We see that model corresponding to κ = ∞ has the highest posterior probability.A trellis plot of the posteriors for ( ˜ α, ˜ φ, ˜ θ ) from the modal model (see Appendix G of the SupplementaryMaterial) conﬁrms at least approximate pairwise orthogonality between the parameters, as anticipatedfrom Sections 4 and 5.1. QQ-plots for the modal model comparing the hierarchical gamma distributionto the posterior means of the random eﬀects, and comparing the observed recruitments over the ﬁrsttwo months of each centre’s recruiting period to the model’s negative binomial distribution both showapproximate straight lines with unit gradient and are provided in the Supplementary Material.Figure 4 shows the accrual forecast from the census time τ = 360 up to the horizon τ H = 600,superimposed onto the true accrual plot. The forecast is based on the Bayesian model-averaged posteriorpredictive distribution. The true accrual is contained within the 95% predictive intervals.Figures 5a and 5b use an earlier census time ( τ = 240) to illustrate the issues that can arise whenmaking predictions using maximum likelihood estimation and model selection. The inference was carriedout with the same set of candidate models, and predictions were obtained by simulating from the bestmodel ( κ = ∞ , chosen using AIC) with parameters ﬁxed at the MLEs. As shown in the plots, notaccounting for parameter and model uncertainty may lead to overly conﬁdent and biased predictions.Simulations with τ = 360 (see Supplementary material) still showed bias due to the choice of a singlemodel, although the contrast with Figure 4 in terms of prediction interval width was less marked.We repeated the analysis with a diﬀerent distribution of initiation times, making the centre initiations“clump” roughly every two months. The resulting forecast predictive distribution can be seen in Figure14

100 200 300 400 500 600

Time A cc r ua l Figure 4: Accrual with Bayesian model-averaged forecast predictive mean (solid, red) and 95% predictionbands (red, dashed). Prediction bands are based on the 2 .

5% and 97 .

5% quantiles. The forecast beginsfrom a point marked by the red dot and the “+” symbols on the abscissa indicate centre opening times.

Time A cc r ua l (a) Bayesian model averaging

Time A cc r ua l (b) Maximum likelihood and model selectionFigure 5: Comparison of accrual predictions produced by two methods; accruals (black, solid) withpredictive means (red, solid) and 95% prediciton bands (red, dashed). Prediction bands are based on the2 .

5% and 97 .

5% quantiles. The “+” symbols on the abscissa indicate centre opening times.15

100 200 300 400 500 600

Time A cc r ua l Figure 6: Accrual with forecast predictive mean (solid, red) and 95% prediction bands (red, dashed).Prediction bands are based on the 2 .

5% and 97 .

5% quantiles. The forecast begins from a point markedby the red dot and the “+” symbols on the abscissa indicate centre opening times.6; performance appears to be robust to the type of initiation schedule.To further test the robustness of the framework, we ﬁrst consider the random eﬀects λ oc now beinggenerated from a mixture of two gamma distributions λ oc | α, φ , φ ∼

12 Gamma (cid:18) α, αφ (cid:19) + 12 Gamma (cid:18) α, αφ (cid:19) . We considered data generated using the same α value and curve-shape as before, but now with centreinitiation times uniformly sampled on the interval. The ratio of gamma expectations was ﬁxed suchthat φ = 10 φ , and the random eﬀect expectation, E [ λ oc ] = ( φ + φ ) /

2, was set to 0.01 and then0.03. Figures 7a and 7b show example forecasts for accruals with the two diﬀerent expectations. Themore data, that is, the larger E [ λ oc ], the more apparent the discrepancy in the random-eﬀect distribution,and the concomitant predictions, becomes. This is visible in the clearly non-linear diagnostic QQ-plots,and the plotted forecasts (see Supplementary Material). The robustness of predictions comes from thefact that the random eﬀects for initiated centres use re-estimated data-driven distributions, reducing theimportance of the random-eﬀect prior; thus the main source of forecasting error comes from the incorrectrandom-eﬀect prior for new centres. Similar plots for the ”clumped” initiation schedule, provided in theSupplementary material, show the same pattern. This mixture distribution of random eﬀects representsthe (extreme) scenario where roughly half of the centres recruit the vast majority of patients, with theremaining sites recruiting little to none each. When the ratio of the two means is closer to 1, the model16

100 200 300 400 500 600

Time A cc r ua l (a) Uniform openings, E [ λ oc ] = 0 . p -value = 0.212 Time A cc r ua l (b) Uniform openings, E [ λ oc ] = 0 . p -value = 0.207Figure 7: Accruals (black, solid) with predictive means (red, solid) and 95% prediciton bands (red, dashed)when the true random-eﬀect distribution is a mixture. Prediction bands are based on the 2 .

5% and 97 . g W ( t ; θ, k ) = kθ (cid:0) tθ (cid:1) k − exp {− ( t/θ ) k } − exp {− ( τ /θ ) k } τ, so G W ( t ; θ, k ) = 1 − exp {− ( t/θ ) k } − exp {− ( τ /θ ) k } τ, where θ , k >

0. We simulated accrual datasets using the Weibull shape with θ = 30 and k = 1 . α = 1 . φ were used: 0 .

01 and 0 .

03; Figures 8a and 8bshow example forecasts. For lower overall recruitment levels, the model still predicts future accrual well.Forecast inaccuracies due to model misspeciﬁation become more apparent when larger recruitment ratesare used. The same pattern is observed when centre initiation times are clumped (see Appendix G of theSupplementary Material).

We ﬁtted the same set of models to a recruitment dataset of a prostate-cancer clinical trial. The recruit-ment was carried out across 244 sites. The accrual is presented as the proportion of the total numberenrolled. Similarly, time is given as the proportion of the total recruiting period. Figures 9 and 10 showthe diagnostic QQ-plots for the model ﬁtted to data available at time 0 .

4. They indicate that there issuﬃcient concordance between the assumed model and observed enrolment giving validity to potentialpredictions. Figure 11 shows the accrual along with forecasts from four diﬀerent census times. The pre-dictive bands become narrower and parameter uncertainty decreases at each census as more data become17

100 200 300 400 500 600

Time A cc r ua l (a) Uniform openings, E [ λ oc ] = 0 . p -value = 0.553 Time A cc r ua l (b) Uniform openings, E [ λ oc ] = 0 . p -value = 0.033Figure 8: Accruals (black, solid) with predictive means (red, solid) and 95% prediciton bands (red, dashed)when the true intensity shape is Weibull, for two diﬀerent values of E [ λ oc ]. Prediction bands are based onthe 2 .

5% and 97 .

5% quantiles. The “+” symbols on the abscissa indicate centre opening times.available for inference. After the third census, there is an unexpected jump in accrual followed by a droparound the fourth census time, suggesting a global external factor, such as a change in the protocol. Table4 shows p -values of the LRT and BST. Initially, when the accrual is still only a small proportion of thetotal, it is hard to detect the time-inhomogeneity. At later census points, the test outcomes indicate thatthe rates are not constant.We compare the proposed framework to the standard homogeneous PG model (1) as well as a homo-geneous Poisson process (HPP) model ﬁtted only to the accrual. We used the same priors as outlined inSection 5.1 for ﬁtting the PG model, and the HPP rate estimate was obtained using maximum likelihood.The methods were compared in terms of the predicted completion time of the recruitment for the studywith the sampling details outlined in Appendix F of the Supplementary Material. Forecast completiontime from 6 diﬀerent census points and can be seen in Figure 12; the ﬁrst HPP predictions were centredat 3 .

67 and 1 .

84 which were outside the plot’s range. The proposed framework produces better pointpredictions, especially at earlier interim analyses, and more closely represents the true uncertainty. TheHPP predictions near the end of the trial are very accurate. At this point, the majority of the centreshaving already been initiated and have been recruiting for a long period of time. As a result, the totalrecruitment rates are not changing by much, with the slight decreasing trend oﬀset by the occasionalinitiation of a new centre. This is a coincidence; if the decay rate had been sharper or shallower, or iffewer or more centres had been initiated then the naive overall Poisson process model would not haveﬁtted as well. The underprediction of the completion time by the proposed model at the census timeof t = 0 .

71 is likely a result of the unexpected surge in recruitment at around that time. The surge isexamined in more detail in Appendix G of the Supplementary Material.18 .00 0.02 0.04 0.06 0.08 . . . . . l o distribution (QQ−plot) Theoretical O b s e r v ed Figure 9: Re-estimated λ oc expectations com-pared to Gamma (cid:16) ˆ α, ˆ α/ ˆ φ (cid:17) distribution. Recruitments in first 2 months (QQ−plot)

Theoretical O b s e r v ed Figure 10: Observed recruitments compared tothe theoretical negative binomial distribution. . . . . . . Time (proportion) A cc r ua l ( p r opo r t i on ) Figure 11: Accrual (black, solid) for an oncology study; coloured solid lines are mean predictions fromcensus times, dashed lines are the 95% prediction bands, and the “+” symbols indicate opening times ofcentres. Census time BST p -value LRT p -value Forecast p -value1 0.196 0.226 0.6972 0.012 0.021 0.6253 < . < .

001 0.0294 < . < . < . p -values and the forecasting p -values at four census times.19 ensus time C o m p l e t i on t i m e p r ed i c t i on BMAPGHPP0.14 0.29 0.43 0.57 0.71 0.86 . . . . . . Figure 12: Predictive distributions for time needed to make the ﬁnal recruitment in the data examplein Section 7, as forecast by three diﬀerent modelling frameworks: Bayesian model averaging (BMA),time-homogeneous Poisson-gamma (PG) and homogeneous Poisson process ﬁt to accrual only (HPP).The horizontal line represents the true completion time and the prediction positions of the x -axis wereoﬀ-set by 0 .

01 for clarity.

We have introduced a general, ﬂexible framework for modelling and predicting recruitment to clinicaltrials. We suggest two tests for detecting decay in recruitment rates; comparing them both with respectto power and robustness. The particular form of the test statistic allows for a single, simple trial-leveltest. Alternative forms, such as splitting according to a global time, would either require a test for eachcentre, massively reducing the power, or estimates of all of the individual centre intensities which wouldintroduces several layers of additional complexity because of the hierarchical connection between thecentre intensities. If it were believed a priori that a particular global period would be unrepresentativethen this time span, and the concomitant recruitment, could simply be removed, albeit at the cost oflower power.The parametric curve-shape forms chosen for the intensity were based on the features encounteredin oncology trials. We found that the model was still robust to moderate model misspeciﬁcations in thedistribution of the random eﬀect and intensity shape. Other therapeutic areas such as pulmonary orcardio-vascular diseases experience more frequent recruitments and diﬀerent curve-shapes may be appro-priate. As shown in Section 6, model misspeciﬁcation becomes more of a problem at larger enrollmentrates. However, with increased frequency, pattern changes in the early months of a centre are easier20o identify. Using more complex parametric forms, such as Weibull or generalised gamma shape, couldlead to more accurate predictions. Alternatively, if covariate information is available, say x c for eachcentre, the following intensity form motivated by hazard models from survival analysis could be used: λ c ( t ) = λ oc exp { β ⊤ x c } g (cid:0) t ; exp { η ⊤ x c } (cid:1) , where λ oc are now random eﬀects coming from a Gamma( α, α )distribution and β and η are vectors of unknown parameters.As seen in the data example in Section 7 there can be external factors modulating the overall accrual.This could potentially be modelled via a short-term, constant global intensity modiﬁer, which wouldmaintain tractability. The framework is not constrained to parametric forms; non-parametric intensitymodels, such as those using B-splines (for example, Morgan et al. (2019)) or Gaussian processes (for exam-ple, Adams et al. (2009)), could be used instead. This, however, would make the intensity extrapolationproblem more diﬃcult.For curve-shape parameter prior construction, our choice of the quantity of interest R κ was moti-vated by simplicity of the form; one could just as well have used G κ ( t / θ ) G κ ( t ; θ ) , albeit with more algebraicmanipulations. The general method was aimed at models with monotonically decreasing intensities. Ifcurve-shapes such as Weibull are considered then constructing sensible priors will be more complicated.In presenting the method, we condition the inference and prediction on known initiation schedulesfor the centres. Incorporating stochastic centre initiation models, such as those in Anisimov (2009) andLan et al. (2019), into the Monte Carlo prediction framework is straight-forward, but would complicate thepresentation of our methodology without adding novelty. In Appendix H of the Supplementary Material,we demonstrate how recruitment can be predicted using our methodology when there is uncertainty inthe initiation schedule. For illustration, we imagine a Weibull-distributed delay to each centre’s initiation,but any other initiation model could be incorporated in a similar manner. We stress that full predictionintervals should take this uncertainty into account.In this work, we focus on patient recruitment regardless of the numbers of dropouts observed. Inpractice, screening failure and patient withdrawal are both prevalent in clinical trials. Assuming thedropouts are independent of the recruitment process, existing survival analysis techniques such as Cox’sproportional hazard model (Cox, 1972) or accelerated failure time frailty model (Wei, 1992) could be usedin combination with the recruitment model to produce distributions of the numbers of patients in thesystem at a given time. Such knowledge would be useful to the practitioners and operational researchersin charge of drug-supply chains for the centres.Anisimov and Fedorov (2007) introduced a method for determining the number of additional centresneeded to be initiated for the study to ﬁnish on time. With minimal adaptation, the same method canalso be used with our model. However, since it assumes that all new centres are initiated immediately, it21ay not apply in all scenarios. We would advocate a simulation-based approach, where forecasts basedon diﬀerent centre initiation schedules are compared. As diﬀerent operational costs can be associatedwith diﬀerent schedules, this would become a resource-constrained optimisation problem. Software in the form of R code is available at https://github.com/SzymonUrbas/ct-recuitment-prediction . Acknowledgements

This work was supported by the Engineering and Physical Sciences Research Council (grant numberEP/L015692/1) and AstraZeneca.

References

Adams, R. P., Murray, I., and MacKay, D. J. (2009). Tractable nonparametric Bayesian inference inPoisson processes with Gaussian process intensities. In

Proceedings of the 26th Annual InternationalConference on Machine Learning , pages 9–16.Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. Petrov BN,Csaki F, editors. Second International Symposium on Information Theory, pages 267–281. Budapest(Hungary): Akademiai Kiado.Anisimov, V. (2009). Predictive modelling of recruitment and drug supply in multicenter clinical trials.In

Proc. of Joint Statistical Meeting , pages 1248–1259.Anisimov, V. V. and Fedorov, V. V. (2007). Modelling, prediction and adaptive adjustment of recruitmentin multicentre trials.

Statistics in Medicine , 26(27):4958–4975.Cox, D. R. (1972). Regression models and life-tables.

Journal of the Royal Statistical Society: Series B(Methodological) , 34(2):187–202.Devroye, L. (1986).

Non-uniform Random Variate Generation . New York: Springer-Verlag.Doucet, A., Freitas, N. d., and Gordon, N. (2013).

Sequential Monte Carlo methods in practice . NewYork: Springer Science.Gajewski, B. J., Simon, S. D., and Carlson, S. E. (2008). Predicting accrual in clinical trials with Bayesianposterior predictive distributions.

Statistics in Medicine , 27(13):2328–2340.22elman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013).

Bayesiandata analysis . Chapman and Hall/CRC.Getz, K. and Lamberti, M. J. (2013). 89% of trials meet enrolment, but timelines slip, half of sitesunder-enrol.

Tufts CSDD Impact Report , 15:1–4.Gu, K., Ng, H. K. T., Tang, M. L., and Schucany, W. R. (2008). Testing the ratio of two Poisson rates.

Biometrical Journal: Journal of Mathematical Methods in Biosciences , 50(2):283–298.Heitjan, D. F., Ge, Z., and Ying, G. S. (2015). Real-time prediction of clinical trial enrollment and eventcounts: a review.

Contemporary Clinical Trials , 45:26–33.Hjort, N. L. and Claeskens, G. (2003). Frequentist model average estimators.

Journal of the AmericanStatistical Association , 98(464):879–899.Huzurbazar, V. S. (1950). Probability distributions and orthogonal parameters. volume 46 of

MathematicalProceedings of the Cambridge Philosophical Society , pages 281–284. Cambridge University Press.Jiang, Y., Simon, S., Mayo, M. S., and Gajewski, B. J. (2015). Modeling and validating Bayesian accrualmodels on clinical data and simulations using adaptive priors.

Statistics in Medicine , 34(4):613–629.Kasenda, B., Von Elm, E., You, J., Bl¨umle, A., Tomonaga, Y., Saccilotto, R., Amstutz, A., Bengough, T.,Meerpohl, J. J., Stegert, M., et al. (2014). Prevalence, characteristics, and publication of discontinuedrandomized trials.

JAMA , 311(10):1045–1052.Krishnamoorthy, K. and Thomson, J. (2004). A more powerful test for comparing two Poisson means.

Journal of Statistical Planning and Inference , 119(1):23–35.Lan, Y., Tang, G., and Heitjan, D. F. (2019). Statistical modeling and prediction of clinical trial recruit-ment.

Statistics in Medicine , 38:945–955.Lasagna, L. (1979). Problems in publication of clinical trial methodology.

Clinical Pharmacology &Therapeutics , 25(5part2):751–753.Lee, Y. J. (1983). Interim recruitment goals in clinical trials.

Journal of Chronic Diseases , 36(5):379–389.Lindley, D. V. (1957). A statistical paradox.

Biometrika , 44(1-2):187–192.Morgan, L. E., Nelson, B. L., Titman, A. C., and Worthington, D. J. (2019). A spline-based method formodelling and generating a nonhomogeneous Poisson process. In , pages 356–367. IEEE. 23elder, J. A. and Mead, R. (1965). A simplex method for function minimization.

The computer journal ,7(4):308–313.Piantadosi, S. and Patterson, B. (1987). A method for predicting accrual, cost, and paper ﬂow in clinicaltrials.

Controlled Clinical Trials , 8(3):202–215.Robert, C. and Casella, G. (2013).

Monte Carlo statistical methods . Springer Science & Business Media.Robertson, T., Wright, F., and Dykstra, R. (1988).

Order restricted statistical inference . Wiley: NewYork.Schwarz, G. (1978). Estimating the dimension of a model.

The Annals of Statistics , 6(2):461–464.Tang, G., Kong, Y., Chang, C.-C. H., Kong, L., and Costantino, J. P. (2012). Prediction of accrualclosure date in multi-center clinical trials with discrete-time Poisson process models.

PharmaceuticalStatistics , 11(5):351–356.Wei, L. J. (1992). The accelerated failure time model: a useful alternative to the Cox regression modelin survival analysis.

Statistics in Medicine , 11(14-15):1871–1879.Williford, W. O., Bingham, S. F., Weiss, D. G., Collins, J. F., Rains, K. T., and Krol, W. F. (1987).The “constant intake rate” assumption in interim recruitment goal methodology for multicenter clinicaltrials.

Journal of Chronic Diseases , 40(4):297–307.Zhang, X. and Long, Q. (2010). Stochastic modeling and prediction for accrual in clinical trials.

Statisticsin Medicine , 29(6):649–658. 24 upplementary material

This ﬁle contains the technical appendix for “Interim recruitment prediction for multi-centre clinicaltrials”. The algorithm for the non-parametric bootstrap test of Section 3 of the main article is outlined inAppendix A. Appendix B provides full parametric forms of the integrated intensity curve-shapes describedin Section 4; it also discusses a potential identiﬁability problem. Appendices C and D outline the details ofmaximum likelihood and Bayesian inference on the model parameters. Appendix E provides the densityof the prior described in Section 5.1. The time-to-completion Monte Carlo sampling algorithm is outlinedin Appendix F. Appendix G provides additional details and ﬁgures for the simulation study in Section6 and data analysis in Section 7. Appendix H describes an implementation of a centre-initiation delaymodel into the prediction framework.

A Non-parametric bootrapped test

Algorithm 1:

Non-parametric bootstrapped test input :

Series of counts { N c ( t ) } τ c t =1 , c = 1 , . . . , C ; number of bootstrapped samples B . output: Probability of observed diﬀerence in means under H .Calculate observed diﬀerence ∆ = P Cc =1 (cid:16)P τ c / t =1 N c ( t ) − P τ c t = τ c / N c ( t ) (cid:17) ; for b ← to B dofor c ← to C do Resample { N ( b ) c ( t ) } τ c t =1 with replacement;Calculate diﬀerence ∆ ( b ) = P Cc =1 (cid:16)P τ c / t =1 N ( b ) c ( t ) − P τ c t = τ c / N ( b ) c ( t ) (cid:17) ;Calculate approximate p -value: ˆ p = B P Bb =1 I { ∆ ≥ ∆ ( b ) } B Curve-shape

The integrated, normalised parametric intensities are: G ( t ) = t,G ( t ; θ ) = log(1 + θt )log(1 + θτ ) τ,G κ ( t ; θ ) = (1 + θt/κ ) − κ − θτ /κ ) − κ − τ, κ / ∈ { , , ∞} ,G ∞ ( t ; θ ) = 1 − exp {− θt } − exp {− θτ } τ. In two instances, the ﬂexible-tail form can give rise to identiﬁability problems:25 , τ >> κ/θ and κ < G κ ( t ; θ ) = (1 + θt/κ ) − κ − θτ /κ ) − κ − τ ≈ ( θt/κ ) − κ − θτ /κ ) − κ − τ ≈ (cid:18) tτ (cid:19) − κ τ, which does not depend on θ . t, τ >> κ/θ and κ >> G κ ( t ; θ ) = (1 + θt/κ ) − κ − θτ /κ ) − κ − τ ≈ (1 + θt/κ ) exp {− θt } − θτ /κ ) exp {− θτ } − τ ≈ exp {− θt } − {− θτ } − τ, which does not depend on κ . C Maximum likelihood inference

In the frequentist setting, we aim to ﬁnd estimators which maximise the likelihood surface (4.2) in themain paper. This is equivalent to maximising the log-likelihood surface (up to a constant) ℓ ( α, φ, θ | n , τ ) = C (cid:18) α log αφ − log Γ( α ) (cid:19) − C X c =1 (cid:26)(cid:16) α + n ( · ) c (cid:17) log (cid:18) G ( τ c ; θ ) + αφ (cid:19) − log Γ (cid:16) α + n ( · ) c (cid:17) − τ c X t =1 n ( t ) c log( G ( t ; θ ) − G ( t − θ )) ) . The log-likelihood function can be optimised using a range of methods, for example, the Nelder-Mead(Nelder and Mead, 1965) method used in R . The inverse of the negative Hessian at the mode can then beused as the covariance matrix for the asymptotic normal distribution of the MLEs.The α and φ parameters are asymptotically orthogonal for a homogeneous Poisson-gamma model(Huzurbazar, 1950). A time contraction argument can be used to extend the result to the inhomogeneouscase. As discussed in Section 4 of the main paper and visible from (4.3), in the special case where τ c ≡ τ ∀ c , θ is orthogonal to both α and φ . When carrying out maximum likelihood inference, diﬀerentmodel selection criteria such as AIC (Akaike, 1973) and BIC (Schwarz, 1978) can be used. Alternatively,one could employ frequentist model averaging methods (see Hjort and Claeskens (2003), for instance).The score function and the observed and expected information are provided in the SupplementaryMaterial. The only pair of parameters which are not asymptotically orthogonal when centres have notbeen open for the same length of time are φ and θ .26 core and observed and expected information Here we provide the score function and the observed and expected information, for frequentist inference.The score function is the gradient of the log-likelihood of the model, ∇ ℓ ( α, φ, θ | n , τ ) ==  C (cid:16) αφ − ψ ( α ) (cid:17) − P Cc =1 (cid:16) α + n ( · ) c α + φG ( τ c ; θ ) + log (cid:16) G ( τ c ; θ ) + αφ (cid:17) − ψ (cid:16) α + n ( · ) c (cid:17)(cid:17) τ − Cα/φ + P Cc =1 α (cid:16) α + n ( · ) c (cid:17) φ ( α + φG ( τ c ; θ )) τ − P Cc =1 (cid:20) ∂ θ G ( τ c ; θ ) (cid:18) α + n ( · ) c G ( τ c ; θ )+ αφ (cid:19) − P τ c t =1 n ( t ) c (cid:16) ∂ θ G ( t ; θ ) − ∂ θ G ( t − θ ) G ( t ; θ ) − G ( t − θ ) (cid:17)(cid:21)  . The observed information matrix is made up of the negative Hessian elements − ∂ αα ℓ ( α, φ, θ | n , τ ) = C (cid:18) ψ ′ ( α ) − α (cid:19) + C X n =1 ( φG ( τ c ; θ ) − n ( · ) c + 1 α + φG ( τ c ; θ ) − ψ ′ (cid:16) α + n ( · ) c (cid:17)) , − ∂ φφ ℓ ( α, φ, θ | n , τ ) = − Cα/φ + C X c =1 α ( α + 2 φG ( τ c ; θ )) (cid:16) α + n ( · ) c (cid:17) φ ( α + φG ( τ c ; θ )) , − ∂ θθ ℓ ( α, φ, θ | n , τ ) = C X c =1  (cid:16) α + n ( · ) c (cid:17) (cid:8) ∂ θθ G ( τ c ; θ )( G ( τ c ; θ ) + α/φ ) − ( ∂ θ G ( τ c ; θ )) (cid:9) ( G ( τ c ; θ ) + α/φ ) − τ c X t =1 n ( t ) c H t ∂ θθ H t − ( ∂ θ H t ) ( H t ) , − ∂ αφ ℓ ( α, φ, θ | n , τ ) = 1 φ ( C − C X c =1 α + 2 αφG ( τ c ; θ ) + φG ( τ c ; θ ) n ( · ) c ( α + φG ( τ c ; θ )) ) , − ∂ αθ ℓ ( α, φ, θ | n , τ ) = C X c =1 ∂ θ G ( τ c ; θ ) G ( τ c ; θ ) − n ( · ) c /φ { G ( τ c ; θ ) − α/φ } , − ∂ φθ ℓ ( α, φ, θ | n , τ ) = − α C X c =1 ∂ θ G ( τ c ; θ ) α + n ( · ) c { α + φG ( τ c ; θ ) } , where ψ ( x ) = Γ ′ ( x ) / Γ( x ) and H t = G ( t ; θ ) − G ( t − θ ) to simplify the notation. Noting that E h N ( · ) c i =27 G ( τ c ; θ ), we obtain the entries of the Fisher information matrix, E [ − ∂ αα ℓ ( α, φ, θ | N , τ )] = C (cid:18) ψ ′ ( α ) − α (cid:19) + C X n =1 (cid:20) α + φG ( τ c ; θ ) − E n ψ ′ (cid:16) α + n ( · ) c (cid:17)o(cid:21) ,E [ − ∂ φφ ℓ ( α, φ, θ | N , τ )] = αφ C X c =1 G ( τ c ; θ ) α + φG ( τ c ; θ ) ,E [ − ∂ θθ ℓ ( α, φ, θ | N , τ )] = C X c =1 " φ (cid:8) ∂ θθ G ( τ c ; θ )( G ( τ c ; θ ) + α/φ ) − ( ∂ θ G ( τ c ; θ )) (cid:9) φG ( τ c ; θ ) + α − τ c X t =1 n ( t ) c ∂ θθ H t − ( ∂ θ H t ) H t ,E [ − ∂ αφ ℓ ( α, φ, θ | N , τ )] =0 ,E [ − ∂ αθ ℓ ( α, φ, θ | N , τ )] =0 ,E [ − ∂ φθ ℓ ( α, φ, θ | N , τ )] = − α C X c =1 ∂ θ G ( τ c ; θ ) α + φG ( τ c ; θ ) . D Bayesian inference

For a general model with data y , parameter vector ψ ∈ Ω and likelihood f ( y | ψ ), we assign a prior densityor mass function to ψ , π ( ψ ). Inference is based on the posterior distribution, obtained by the Bayes’srule, π ( ψ | y ) = f ( y | ψ ) π ( ψ ) R Ω f ( y | ψ ) π ( ψ ) d ψ , ψ ∈ Ω . Often times, the marginal likelihood of the data p ( y ) = R Ω f ( y | ψ ) π ( ψ ) d ψ is not tractable and so MonteCarlo sampling methods need to be employed to obtain samples from the posterior. Strictly, the marginallikelihood, p ( y ) is p ( y | M ) the probability of the data given the choice of model, encapsulated in f .Consider, now, a range of models M , . . . , M K with associated prior probabilities π ( M k ), k = 1 , . . . , K .Using Bayes’s rule, we obtain the posterior model probabilities, up to a proportionality constant, π ( M k | y ) ∝ p ( y | M k ) π ( M k ) , k = 1 , . . . , K. Importance sampling

Multiplying the priors and the likelihood we obtain the posterior distribution for the parameters up to aproportionality constant. Since the dimension of the parameter space is not large, we can sample fromthe posterior by the means of importance sampling.28or any function of interest h ( ψ ), E [ h ( ψ )] = Z Ω h ( ψ ) π p ( ψ | y ) d ψ = R Ω h ( ψ ) ω ( ψ ) q ( ψ ) d ψ R Ω f ( y | ψ ) π ( ψ ) d ψ ≈ P Bb =1 h (cid:0) ψ ( b ) (cid:1) ω (cid:0) ψ ( b ) (cid:1)P Bb =1 ω (cid:0) ψ ( b ) (cid:1) , where ψ ( b ) , b = 1 , . . . , B are samples from a proposal distribution q with unnormalised weights ω ( ψ ) = f ( y | ψ ) π ( ψ ) q ( ψ ) . The marginal likelihood may be approximated byˆ p ( y ) = 1 B B X b =1 ω (cid:16) ψ ( b ) (cid:17) . This is an unbiased estimate which can be used for model selection or model averaging.The eﬃciency of the sampling procedure depends on the choice of proposal distribution q and the maybe measured using the eﬀective sample size (ESS),ESS = (cid:16)P Bb =1 ω (cid:0) ψ ( b ) (cid:1)(cid:17) P Bb =1 ω (cid:0) ψ ( b ) (cid:1) . If the proposal distribution closely resembles the true posterior, then all the weights will be roughlythe same resulting in the ESS being close to M . On the other extreme, if the proposal badly capturesthe posterior and one sample’s weight dominates the others, then ESS will be close to one.If ψ ( b ) are resampled with replacement with probabilities proportional to the weights, then the resultingsample, say { ψ ( b ) ∗ } Bb =1 , will have the distribution approximating π . The new sample is used when samplingfrom the predictive distribution to marginalise over the parameter posterior. E Curve-shape prior

The ﬂexible form (4.4) in Section 4 of the main paper, leads to the following prior density for ˜ θ , π (cid:16) ˜ θ | κ, a, b (cid:17) =  t exp n ˜ θ − t exp { ˜ θ } o f B (cid:16) exp n − t exp { ˜ θ } o ; a, b (cid:17) , κ = ∞ t exp { ˜ θ } (cid:16) t exp { ˜ θ } /κ (cid:17) − κ − f B (cid:18)(cid:16) t exp { ˜ θ } /κ (cid:17) − κ ; a, b (cid:19) , κ ∈ (0 , ∞ ) , where f B ( · ; a, b ) is a density of a beta variate with shape parameters a and b .29 Sampling time to completion via model averaging

In Lan et al. (2019), the time to recruit the required number of patients is sampled by repeatedly simulat-ing the whole system until the condition is satisﬁed, which is ineﬃcient because each iteration involves a(random) large number of expensive simulations. Additionally, it only provides an approximate distribu-tion due to the discretisation in the time domain; the discretisation of recruitment to monthly incrementsmight also aﬀect the precision of any predictions. To sample the time to completion exactly, we use the in-tegrated intensity function of the whole trial Λ( t ). If T is the time to the m th arrival of an inhomogeneousPoisson process with integrated intensity Λ( t ) then (Devroye, 1986)Λ( T ) ∼ Gamma( m, . Given ( α, β, θ ), we can sample rates the λ oc for all the centres and construct one realisation of the integratedintensity Λ for the whole trial. Then, to obtain a single realisation of T , we sample a Gamma( m,

1) variateand use an inverse-transform of Λ on it. Unless all the centres had been open for the same length of time,the inversion procedure will involve some root-ﬁnding algorithm, such as Nelder-Mead (Nelder and Mead,1965). As Λ( t ) in our framework is a monotonically increasing function, the non-linear equation will havea unique solution. Parameter uncertainty can be incorporated into this predictive by using a diﬀerentsample from the posterior at each iteration.Given C + centres with the ﬁrst C already opened before the census time and the remaining C + − C tobe open, as well as known centre opening times t ( c )0 , c = 1 , . . . , C + , we construct the integrated intensityfor modelling the recruitment since the census time τ ,Λ( t ) = C X c =1 λ oc n G (cid:16) t − t ( c )0 ; θ (cid:17) − G ( τ c ; θ ) o + C + X c = C +1 λ oc G (cid:16) t − t ( c )0 ; θ (cid:17) χ n t>t ( c )0 o , t ≥ τ, where χ {·} is the indicator function and λ oc | α, φ, θ, n ∼  Gamma (cid:16) α + n ( · ) c , α/φ + G ( τ c ; θ ) (cid:17) , c = 1 , . . . , C Gamma ( α, α/φ ) , c = C + 1 , . . . , C + . (12)The algorithm below outlines the sampling procedure to obtain the distribution of the time needed torecruit the target number of patients m . 30 lgorithm 2: Model-averaged time to completion sampling input :

Models M , . . . , M k with posterior probabilities π ( M | n ) , . . . , π ( M K | n ) andposterior samples from each model, number of samples from the predictive B , targetnumber of recruitments m output: Distribution of the time to completion (cid:8) T ( b ) (cid:9) Bb =1 for b ← to B do Sample M ( b ) ∼ π ( M k | n );Sample ( α, φ, θ ) ( b ) ∼ π ( α, φ, θ | M ( b ) , n );Sample rates λ oc | ( α, φ, θ ) ( b ) from distributions (12) and construct Λ ( b ) ( t );Sample ˜ T ∼ Gamma( m,

1) and solve Λ ( b ) ( T ) = ˜ T ;Set T ( b ) = T ; a . . . . . . f . . . . q Figure 13: Matrix scatterplot of the parameter posterior of the model with highest posterior probability.

G Additional details from the simulation study and data analysis

G.1 Simulation study

Figure 13 shows the plots of posterior samples of the model. The three parameters are close to orthogonalas discussed in Sections 4 and 5 of the paper, and this approximate independence was also observed inthe posteriors of other models.Figure 14 shows a QQ-plot of the hierarchical gamma distribution compared to the posterior meansof the random eﬀects. The approximately straight line indicates that generating rates for newly openedcentres from the gamma distribution will be consistent with what has been observed thus far. Figure 15shows a QQ-plot of the theoretical, negative binomial distribution of recruitments in the ﬁrst 2 months31 .00 0.02 0.04 0.06 0.08 . . . l o distribution (QQ−plot) Theoretical O b s e r v ed Figure 14: Re-estimated λ oc expectations com-pared to Gamma (cid:16) ˆ α, ˆ α/ ˆ φ (cid:17) distribution. Recruitments in first 2 months (QQ−plot)

Theoretical O b s e r v ed Figure 15: Observed recruitments compared tothe theoretical negative binomial distribution.compared the observed distribution ( t ∗ = 60). The theoretical distribution used the posterior meansof the parameters, and the prior random eﬀect distribution was used. The straight line shows that themodel can predict the recruitment in the ﬁrst two months of a centre suﬃciently well. In practice, the twodiagnostics would indicate that the mixing gamma distribution is suﬃcient and that the model is capableof accurately predicting recruitments in the early days of a new centre. Figures 16, 17, 18 and 19 showthe diagnostic plots for models ﬁt to simulated datasets at the census t = 360 with the true random-eﬀectdistribution being a mixture. For E [ λ oc ] = 0 .

01, the relationship is close to linear and is reﬂected inthe reasonably accurate predictions shown in the article. The QQ-plots for E [ λ oc ] = 0 .

03 show strongernon-linearity and informing us of the potential misspeciﬁcation, thus showing that the diagnostics can beused to validate the model.Figures 20a and 20b show examples of recruitment predictions when the random eﬀects have a mixturedistribution and the centre opening times are “clumped” together. The clumping accentuates the eﬀect ofthe misspeciﬁcation; the ﬁtted model relies on the “incorrect” prior gamma distribution when simulatingrates for unopened centres.Figures 21a and 21b show the predictions made when using data simulated from a Weibull-shapeintensity with centre opening times clumped together. With repeated simulations, we found a consistentcorrespondence between linear QQ-plots and accurate predictions.

G.2 Data analysis

In the dataset examined in Section 7 of the main paper, we encountered an unexpected surge in recruit-ments at a global scale. Figure 22 shows the accrual along with 2 sets of forecasts, focusing on the surge32 .00 0.02 0.04 0.06 0.08 0.10 . . . . l o distribution (QQ−plot) Theoretical O b s e r v ed Figure 16: Re-estimated λ oc expectations com-pared to Gamma (cid:16) ˆ α, ˆ α/ ˆ φ (cid:17) distribution; truerandom-eﬀect distribution is a mixture with E [ λ oc ] = 0 . Recruitments in first 2 months (QQ−plot)

Theoretical O b s e r v ed Figure 17: Observed recruitments comparedto the theoretical negative binomial distribu-tion; true random-eﬀect distribution is a mix-ture with E [ λ oc ] = 0 . . . . l o distribution (QQ−plot) Theoretical O b s e r v ed Figure 18: Re-estimated λ oc expectations com-pared to Gamma (cid:16) ˆ α, ˆ α/ ˆ φ (cid:17) distribution; truerandom-eﬀect distribution is a mixture with E [ λ oc ] = 0 . Recruitments in first 2 months (QQ−plot)

Theoretical O b s e r v ed Figure 19: Observed recruitments comparedto the theoretical negative binomial distribu-tion; true random-eﬀect distribution is a mix-ture with E [ λ oc ] = 0 .

100 200 300 400 500 600

Time A cc r ua l (a) Clumped openings, E [ λ oc ] = 0 . p -value = 0.666 Time A cc r ua l (b) Clumped openings, E [ λ oc ] = 0 . p -value = 0.312Figure 20: Accruals (black, solid) with predictive means (red, solid) and 95% prediciton bands (red,dashed) when the true random-eﬀect distribution is a mixture, in various scenarios. Prediction bands arebased on the 2 .

5% and 97 .

5% quantiles. The “+” symbols on the abscissa indicate centre opening times.

Time A cc r ua l (a) Clumped openings, E [ λ oc ] = 0 . p -value = 0.294 Time A cc r ua l (b) Clumped openings, E [ λ oc ] = 0 . p -value = 0.064Figure 21: Accruals (black, solid) with predictive means (red, solid) and 95% prediciton bands (red,dashed) when the true intensity shape is Weibull and opening times are clumped, with two diﬀerentvalues for E [ λ oc ]. Prediction bands are based on the 2 .

5% and 97 .

5% quantiles. The “+” symbols on theabscissa indicate centre opening times. 34 .5 0.6 0.7 0.8 0.9 . . . . . Time (proportion) A cc r ua l ( p r opo r t i on ) Figure 22: Accrual predictions, zoomed-in to focus on the unexpected surge in recruitment at around thetime of 0 .

7. Only interim forecasts from times 0 . . .

7. Once this has been observed, and forward predictions are needed, one possibilityis modelling this as a global surge in recruitment; that is, during the period between 0 . .

75 allrecruitment rates are multiplied by exp { β } for some unknown β , which would be an extra parameter tobe estimated via importance sampling. H Stochastic centre-initiation times

The framework, as presented in the main paper, is conditioned on the set of initiation times both forclarity of presentation and because it is the methodological contribution from the paper. In practice, theexact future initiation times would be unknown; instead, the practitioners would have proposed initiationschedules, contingency plans and recruitment data up to the census time. Here we present a simulationstudy similar to that in Section 6 of the main paper which illustrates how a stochastic centre-initiationmodel can be seamlessly incorporated. The centres are not initiated exactly on schedule but, instead,there is a Weibull-distributed initiation delay for each centre. Following information provided to usfrom a large meta-analysis, we set the Weibull parameters such that the 5th and 95th percentiles are10 and 322 days respectively; the median delay is 90 days. At the census, the observed day-censoreddelays are used for maximum-likelihood estimation of the Weibull parameters; additionally centres whichwere planned to initiate before the census but did not do so contribute with a censored likelihood. Theestimates are then used in the Monte Carlo simulations. Figure 23 compare the predictions under threediﬀerent approaches; (i) the correct Weibull distribution for delays (with parameters estimated from thedata), (ii) a constant, avergae delay taken to be the sample mean of the observed delays, and (iii) an35

00 350 400 450 500 550 600

Fitted Weibull delay

Time A cc r ua l

300 350 400 450 500 550 600

Average delay approximation

Time A cc r ua l

300 350 400 450 500 550 600

No delay

Time A cc r ua ll