Deep Historical Borrowing Framework to Prospectively and Simultaneously Synthesize Control Information in Confirmatory Clinical Trials with Multiple Endpoints
Tianyu Zhan, Yiwang Zhou, Ziqian Geng, Yihua Gu, Jian Kang, Li Wang, Xiaohong Huang, Elizabeth H. Slate
Data and Statistical Sciences, AbbVie Inc.
Department of Biostatistics, University of Michigan, Ann Arbor
Department of Statistics, Florida State University
† Corresponding author: Tianyu Zhan, [email protected]
August 31, 2020
Abstract
In current clinical trial development, historical information is receiving more attention as providing value beyond sample size calculation. Meta-analytic-predictive (MAP) priors and robust MAP priors have been proposed for prospectively borrowing historical data on a single endpoint. To simultaneously synthesize control information from multiple endpoints in confirmatory clinical trials, we propose to approximate posterior probabilities from a Bayesian hierarchical model and to estimate critical values by deep learning, so as to construct pre-specified decision functions before the trial conduct. Simulation studies and a case study demonstrate that our method additionally preserves power and has a satisfactory performance under prior-data conflict.

Keywords: Bayesian hierarchical model; Deep learning; Family-wise error rate control; Power preservation; Prospective algorithm

1 Introduction

Historical control data are usually summarized as estimates of the parameters needed to calculate the sample size when designing a traditional Phase III randomized clinical trial (Chow et al., 2007). This relevant information can be properly borrowed for the current trial to make it more efficient and ethical, by allowing fewer patients to be randomized to the control group, or by decreasing the total sample size (Berry et al., 2010; Viele et al., 2014). Some challenges exist in applying this framework to clinical trials, especially confirmatory studies. While it is always possible to retrospectively use historical information once the new evidence is available, it is appealing to ensure study integrity by designing a prospective algorithm for leveraging historical data (Neuenschwander et al., 2010). Moving beyond the “sweet spot” where the borrowed information and the current data are close, one needs to properly discount historical information to control the bias and the type I error rates (Viele et al., 2014).

In the context of a single endpoint, Neuenschwander et al.
(2010) proposed a novel meta-analytic-predictive (MAP) approach to prospectively borrow historical information for the current trial. Schmidli et al. (2014) further developed an innovative method to approximate the MAP prior by a mixture of conjugate priors, so that the posterior distribution is available in closed form. A robust MAP prior is then formulated by adding a weakly informative component to discount historical data under prior-data conflict (Schmidli et al., 2014). Moving further to confirmatory clinical trials, most use multiple endpoints to assess the effects of the study drug (Food and Drug Administration, 2017). A Bayesian hierarchical model is a natural approach to simultaneously synthesize information from multiple endpoints (Berry et al., 2010). However, taking a trial with binary endpoints as an example, additional non-trivial work is needed to generalize the MAP framework to approximate the joint prior of the response rates and to investigate whether the resulting multivariate posterior distribution is available analytically.

As an alternative, we propose a two-stage Deep Neural Networks (DNN) guided algorithm to build pre-specified decision functions before initiation of the current trial. At the first stage, we take advantage of the strong functional representation of DNN to directly approximate the posterior probabilities and the posterior means from a Bayesian hierarchical model (Goodfellow et al., 2016). The pre-trained DNN models can be locked in files before initiation of the current trial to ensure study integrity. To protect the family-wise error rate (FWER) in confirmatory trials, we further construct another DNN-based algorithm to estimate the critical values at the second stage. After obtaining results from the new trial, one can instantly compute the posterior probabilities and critical values needed for hypothesis testing.
This process also contributes to power preservation as compared with the typical practice of choosing a constant critical value to control the maximum simulated type I error rate within a subset of the null space. Simulations show that our method has relatively small bias and mean squared error (MSE) under prior-data conflict by properly discounting prior information.

The remainder of this article is organized as follows. In Section 2, we introduce a Bayesian hierarchical model on control data from several historical studies. In Section 3, we propose a DNN-based algorithm to approximate the posterior probabilities and the critical values with controlled FWER, building pre-specified decision functions. Simulations in Section 4 and a case study in Section 5 are conducted to evaluate the performance of our method. Concluding remarks are provided in Section 6.
2 Bayesian hierarchical model

Consider a two-group randomized controlled clinical trial with $I$ ($I \geq 2$) endpoints to study the efficacy of a treatment versus placebo. We consider a setup of $I = 2$ binary endpoints for illustration, but our method can be readily generalized to $I > 2$. Denote $R^{(t)}_{i,0}$ as the number of responders in the current treatment group for endpoint $i$, where $i \in \{1, \cdots, I\}$, and $n^{(t)}_0$ as the total number of subjects in the treatment arm of the current trial. For each endpoint $i$, a beta conjugate prior is assumed on the response rate $\psi^{(t)}_{i,0}$ of the binomial sampling distribution,
$$R^{(t)}_{i,0} \sim \mathrm{Binomial}\big\{n^{(t)}_0, \psi^{(t)}_{i,0}\big\}, \quad \psi^{(t)}_{i,0} \sim \mathrm{Beta}(a_i, b_i), \quad i = 1, \cdots, I. \quad (1)$$

The control data are available in the current trial and in $J$ historical studies. The corresponding notations are $R^{(c)}_{i,j}$ and $n^{(c)}_j$, where $j = 0$ indicates the current trial, $j \in \{1, \cdots, J\}$ indexes historical study $j$, and $i \in \{1, \cdots, I\}$ refers to endpoint $i$. We consider the following Bayesian hierarchical model on the control data (Neuenschwander et al., 2010; Schmidli et al., 2014),
$$R^{(c)}_{i,j} \sim \mathrm{Binomial}\big\{n^{(c)}_j, \psi^{(c)}_{i,j}\big\}, \quad \mu_{i,j} = \mathrm{logit}\big\{\psi^{(c)}_{i,j}\big\}, \quad \boldsymbol{\mu}_j \sim MVN(\boldsymbol{\theta}, \Sigma), \quad (2)$$
where $\boldsymbol{\mu}_j = (\mu_{1,j}, \cdots, \mu_{I,j})$, for $i \in \{1, \cdots, I\}$, $j \in \{0, 1, \cdots, J\}$, and $MVN(\boldsymbol{\theta}, \Sigma)$ denotes a multivariate normal distribution with mean vector $\boldsymbol{\theta}$ and variance-covariance matrix $\Sigma$. A vague prior is assumed on $\boldsymbol{\theta}$, and an $\mathrm{Inverse\text{-}Wishart}(\Sigma_0, k)$ prior is assigned to the variance-covariance matrix $\Sigma$, with positive definite $I \times I$ matrix $\Sigma_0$ and degrees of freedom $k \geq I$. The expectation of a $\mathrm{Wishart}(\Sigma_0^{-1}, k)$ distribution is $k \Sigma_0^{-1}$, and therefore $\Sigma_0 / k$ is a prior guess for $\Sigma$.
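As a concrete illustration of the hierarchical structure in (2), the control data can be simulated top-down: draw study-level logit response rates from the multivariate normal, transform them to the probability scale, then draw binomial counts. The Python sketch below fixes hypothetical values for $\boldsymbol{\theta}$ and $\Sigma$; in the model these are latent parameters with vague and Inverse-Wishart priors, not known constants:

```python
import numpy as np

rng = np.random.default_rng(0)

I, J = 2, 6                                    # endpoints, historical studies
n = np.array([100, 100, 200, 200, 300, 300])   # historical sample sizes n_j^(c)

# Hypothetical hyperparameter values for illustration only; in model (2)
# theta and Sigma carry vague and Inverse-Wishart priors.
theta = np.array([-0.5, -0.7])                 # mean of study-level logit response rates
Sigma = np.array([[0.05, 0.01],
                  [0.01, 0.05]])               # between-study covariance (logit scale)

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

# mu_j ~ MVN(theta, Sigma); psi_{i,j} = expit(mu_{i,j}); R_{i,j} ~ Binomial(n_j, psi_{i,j})
mu = rng.multivariate_normal(theta, Sigma, size=J)   # J x I matrix of logits
psi = expit(mu)                                       # study-level response rates
R = rng.binomial(n[:, None], psi)                     # J x I responder counts
print(R)
```

Fitting the model reverses this direction, inferring $\boldsymbol{\theta}$, $\Sigma$ and the study-level rates from the observed counts.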
We use $D_H = \big\{R^{(c)}_{i,j}, n^{(c)}_j : i \in \{1, \cdots, I\}, j \in \{1, \cdots, J\}\big\}$ to denote the control information in the $J$ historical studies, and $D_N = \big\{R^{(c)}_{i,0}, n^{(c)}_0, R^{(t)}_{i,0}, n^{(t)}_0 : i \in \{1, \cdots, I\}\big\}$ to denote the data in the current new trial. Our quantity of interest is the posterior probability of observing a promising treatment effect in the current trial,
$$S_i = Pr\big\{\psi^{(t)}_{i,0} - \psi^{(c)}_{i,0} > \theta_i \,\big|\, D_H, D_N\big\}, \quad i = 1, \cdots, I, \quad (3)$$
where $\theta_i$ is a pre-specified constant for endpoint $i$. The decision function rejects the null hypothesis pertaining to endpoint $i$ if $S_i > \widetilde{c}_i$. The computation of $\widetilde{c}_i$ is studied in the next section to control the family-wise error rate (FWER) at a nominal level $\alpha$. We denote $S = (S_1, \cdots, S_I)$ as the vector of these posterior probabilities.

Due to the hierarchical structure (2), the computation of $S$ in (3) usually requires Markov chain Monte Carlo (MCMC) methods (Berry et al., 2010). However, it is appealing to build a prospective algorithm before conducting the current new trial to ensure the study integrity. In studies with a single binary endpoint ($I = 1$), Schmidli et al. (2014) proposed a novel approach that approximates the meta-analytic-predictive (MAP) prior $p\big\{\psi^{(c)}_{i,0} \mid D_H\big\}$ with a mixture of beta distributions, so that the posterior distribution $p\big\{\psi^{(c)}_{i,0} \mid D_H, D_N\big\}$ becomes a weighted average of beta distributions. In the context of multiple endpoints ($I \geq 2$), even with an approximation of the joint MAP prior $p\big\{\boldsymbol{\psi}^{(c)}_0 \mid D_H\big\}$, where $\boldsymbol{\psi}^{(c)}_0 = \big\{\psi^{(c)}_{1,0}, \cdots, \psi^{(c)}_{I,0}\big\}$, the joint posterior probability $S$ does not necessarily have an analytic closed form.

As an alternative, we propose to directly approximate $S$ based on observed historical data $D_H$ and varying new trial data $D_N$ by deep neural networks (DNN) at the study design stage.
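To make the quantity in (3) concrete, the sketch below estimates $S_i$ by Monte Carlo under a deliberately simplified, non-hierarchical stand-in for the model: independent conjugate Beta(1, 1) priors for the treatment and control rates, with no borrowing from $D_H$. The trial counts are hypothetical; in the proposed framework the control posterior would instead come from the hierarchical model (2):

```python
import numpy as np

rng = np.random.default_rng(1)

def posterior_prob(r_t, n_t, r_c, n_c, theta_i=0.0, a=1.0, b=1.0, n_draws=100_000):
    """Monte Carlo estimate of S_i = Pr(psi_t - psi_c > theta_i | data) under
    independent conjugate Beta(a, b) priors (no borrowing from D_H)."""
    psi_t = rng.beta(a + r_t, b + n_t - r_t, size=n_draws)
    psi_c = rng.beta(a + r_c, b + n_c - r_c, size=n_draws)
    return float(np.mean(psi_t - psi_c > theta_i))

# Hypothetical current-trial counts: 75/150 responders on treatment, 55/150 on control.
S_i = posterior_prob(75, 150, 55, 150)
print(round(S_i, 3))
```

Under the full model (2), each draw of $\psi^{(c)}_{i,0}$ would come from the MCMC posterior that borrows from the historical studies, which is exactly the computation the pre-trained DNN is meant to replace at trial readout.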
After collecting results from the current trial, one can instantly compute $S$ and conduct downstream hypothesis testing based on this pre-specified approximation function.

3 DNN-guided historical borrowing framework

In this section, we first provide a short review of DNN in Section 3.1, and then introduce our DNN-guided historical borrowing framework, approximating the posterior probabilities $S$ in Section 3.2 and modeling the critical values in Section 3.3.

3.1 A brief review of deep neural networks

Deep learning is a specific subfield of machine learning, a new take on learning representations from data with successive layers (Chollet and Allaire, 2018). A major application of Deep Neural Networks (DNN) is to approximate functions of input data (Goodfellow et al., 2016). A DNN defines a mapping function $S = F(M; \phi)$ and learns the value of the parameters $\phi$ that result in the best functional approximation of the output label $S$ based on input data $M$, where $\phi$ denotes a stack of all weight and bias parameters in the DNN. For example, in Figure 1, the input $M$ of dimension 4 on the left is transformed by 2 hidden layers to predict a 2-dimensional output $S$ on the right.

Typically, training a DNN involves the following four components: layers, input data and corresponding output labels, a loss function, and an optimizer (Chollet and Allaire, 2018). To avoid potential over-fitting, cross-validation is commonly used to select the architecture from a pool of candidates (Goodfellow et al., 2016). The loss function measures how well the fitted DNN $F(M; \widehat{\phi})$ approximates the objective function $S$. The mean squared error (MSE) loss can be utilized if $S$ is continuous.
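The function-approximation view above can be illustrated with a minimal NumPy network: one hidden layer trained to minimize the MSE loss with an RMSProp-style update (one of the SGD variants discussed in this section). The architecture, synthetic target and hyperparameters are arbitrary choices for illustration, not those used in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy regression: learn a smooth 4-dimensional function standing in for S = F(M; phi).
X = rng.uniform(-1.0, 1.0, size=(1000, 4))       # input "M" with 4 features
y = np.tanh(X.sum(axis=1, keepdims=True))        # smooth synthetic target

H = 32                                           # hidden width (arbitrary)
W1 = rng.normal(0.0, 0.5, (4, H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.5, (H, 1)); b2 = np.zeros(1)
params = [W1, b1, W2, b2]
cache = [np.zeros_like(p) for p in params]       # RMSProp running mean of squared grads
lr, rho, eps = 1e-2, 0.9, 1e-8

def forward(X):
    h = np.tanh(X @ W1 + b1)                     # one hidden layer
    return h, h @ W2 + b2

mse0 = float(np.mean((forward(X)[1] - y) ** 2))  # loss before training
for _ in range(500):
    h, out = forward(X)
    err = (out - y) / len(X)                     # gradient of MSE w.r.t. out (up to 2x)
    gW2, gb2 = h.T @ err, err.sum(0)
    dh = (err @ W2.T) * (1.0 - h ** 2)           # backpropagate through tanh
    gW1, gb1 = X.T @ dh, dh.sum(0)
    for p, c, g in zip(params, cache, [gW1, gb1, gW2, gb2]):
        c *= rho; c += (1.0 - rho) * g ** 2      # update squared-gradient cache in place
        p -= lr * g / (np.sqrt(c) + eps)         # RMSProp-scaled gradient step
mse1 = float(np.mean((forward(X)[1] - y) ** 2))  # loss after training
```

The RMSProp cache divides each gradient coordinate by a running estimate of its magnitude, so the effective step size adapts per parameter; in practice one would use an established deep learning library rather than this hand-rolled loop.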
The optimizer determines how the network will be updated based on the loss function. It usually implements a specific variant of the stochastic gradient descent (SGD) algorithm; for example, RMSProp (Hinton et al., 2012) has been shown to be an effective and practical optimization method.

3.2 Approximating the posterior probabilities

We denote $R^{(c)}_i = \big\{R^{(c)}_{i,1}, \cdots, R^{(c)}_{i,J}\big\}$ as the stack of the numbers of responders in all $J$ historical studies for endpoint $i$, $i = 1, \cdots, I$, and further denote $R^{(c)}_H = \big\{R^{(c)}_1, \cdots, R^{(c)}_I\big\}$. The corresponding notations are $R^{(c)}_N = \big\{R^{(c)}_{1,0}, \cdots, R^{(c)}_{I,0}\big\}$ for the current control group, and $R^{(t)}_N = \big\{R^{(t)}_{1,0}, \cdots, R^{(t)}_{I,0}\big\}$ for the current treatment arm. The subscript “H” refers to historical data, while “N” corresponds to new trial data. We consider $P^{(c)}_i$ as a parameter space covering $\psi^{(c)}_{i,j}$ for endpoint $i$ in all $J$ historical studies and $\psi^{(c)}_{i,0}$ in the current study. For example, specifying $P^{(c)}_i$ as an interval with upper bound 0.5 indicates that the control response rate for endpoint $i$ in all historical studies and the current study does not exceed 0.5. It can be set wider as needed. Similarly, $T_i$ is the parameter space of the treatment effect for endpoint $i$.

In Algorithm 1, we utilize DNN to construct a mapping function $F_S(M; \widehat{\phi})$ to predict $S$ in (3) based on input data $M$, where $M = \big\{R^{(c)}_N, R^{(t)}_N\big\}$ and $\widehat{\phi}$ are the estimated parameters in the DNN. In Step 2, we perform cross-validation to select a proper DNN structure. By increasing the number of layers and the number of nodes in the DNN, the empirical MSE on the training dataset usually decreases, but the validation MSE may increase. We then implement certain regularization approaches, for example dropout, on the over-saturated DNN structure to decrease the validation MSE while keeping the training MSE below a small pre-specified tolerance. Several structures around this sub-optimal structure are added to the candidate pool for cross-validation. The final DNN structure is selected as the one with the smallest validation error and is utilized in Step 3 to obtain the predictive DNN $\widehat{S} = F_S(M; \widehat{\phi})$.

We consider a setup where the sample sizes $\big\{n^{(c)}_0, n^{(c)}_1, \cdots, n^{(c)}_J, n^{(t)}_0\big\}$ and the historical data $R^{(c)}_H$ are constants. Therefore, the input data $M = \big\{R^{(c)}_N, R^{(t)}_N\big\}$ for the DNN only contain the numbers of responders in the current trial. When there are $I = 2$ endpoints, $M$ has 4 elements and $S$ has 2 elements, as shown in Figure 1. As a generalization, one can build a more general DNN function to accommodate varying sample sizes and varying historical data. Similarly, we train another DNN $F_P\big[R^{(c)}_N; \widehat{\phi}_P\big]$ to approximate the posterior means of the control response rates $\big[\psi^{(c)}_{1,0}, \cdots, \psi^{(c)}_{I,0}\big]$. Note that only the numbers of control responders $R^{(c)}_N$ are included as covariates, because the treatment and control groups are assumed to be independent by models (1) and (2).

Algorithm 1: Train a DNN $F_S(M; \widehat{\phi})$ to approximate the posterior probabilities $S$
1. Construct a training dataset for the DNN of size $B$. In each training sample $b$, uniformly draw $\psi^{(c)}_{i,0}$ from $P^{(c)}_i$ and $\Delta_{i,0}$ from $T_i$, and set $\psi^{(t)}_{i,0} = \psi^{(c)}_{i,0} + \Delta_{i,0}$, for $i = 1, \cdots, I$. The training input data $M = \big[R^{(c)}_N, R^{(t)}_N\big]$ are further simulated from binomial distributions with the corresponding response rates. The output label $S$ in (3) is computed based on Section 2.

2. Perform cross-validation on several candidate DNN structures to select the one with the smallest validation error for final fitting.

3. Train a DNN to build an approximating function $\widehat{S} = F_S(M; \widehat{\phi})$ to predict $S$ based on data $M$, where $\widehat{\phi}$ are the estimated parameters in the DNN.

3.3 Modeling the critical values

In this section, we discuss how to compute the critical value $\widetilde{c}_i$ in the decision rule $S_i > \widetilde{c}_i$ of rejecting the null hypothesis pertaining to endpoint $i$, to strongly control the FWER at a nominal level $\alpha$. The family-wise error rate (FWER) is the probability of rejecting at least one true null hypothesis. The FWER is said to be controlled at level $\alpha$ in the strong sense if it does not exceed $\alpha$ under any configuration of true and false hypotheses (Bretz et al., 2016). Define $H_1$ and $H_2$ as the single null hypotheses where only endpoint 1 or endpoint 2, respectively, has no treatment effect, and $H_0$ as the global null hypothesis where neither endpoint has a treatment effect. In the context of $I = 2$ endpoints, we need to control the following three error probabilities,
$$Pr\big\{S_1 > c_1 \,\big|\, H_1\big\} \leq \alpha, \quad H_1: \psi^{(c)}_{1,0} = \psi^{(t)}_{1,0}, \; \psi^{(c)}_{2,0} < \psi^{(t)}_{2,0}, \quad (4)$$
$$Pr\big\{S_2 > c_2 \,\big|\, H_2\big\} \leq \alpha, \quad H_2: \psi^{(c)}_{2,0} = \psi^{(t)}_{2,0}, \; \psi^{(c)}_{1,0} < \psi^{(t)}_{1,0}, \quad (5)$$
$$Pr\big\{(S_1 > c_0) \cup (S_2 > c_0) \,\big|\, H_0\big\} \leq \alpha, \quad H_0: \psi^{(c)}_{1,0} = \psi^{(t)}_{1,0}, \; \psi^{(c)}_{2,0} = \psi^{(t)}_{2,0}, \quad (6)$$
where $c_1$ is the critical value to control the error rate under $H_1$, $c_2$ for $H_2$, and $c_0$ for $H_0$. In Algorithm 2, we train three DNNs to estimate these three critical values: $c_1$, $c_2$ and $c_0$.
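The role of the critical value $c_1$ in (4) can be sketched without the DNN layer: simulate trials under $H_1$ at one fixed null configuration, compute the posterior probability for endpoint 1 in each, and take the upper $\alpha$ quantile. The sketch below substitutes a simple conjugate Monte Carlo posterior (no historical borrowing) for $F_S(M; \widehat{\phi})$, with hypothetical sample sizes and a hypothetical common null response rate:

```python
import numpy as np

rng = np.random.default_rng(3)

alpha, n_c, n_t = 0.05, 150, 150       # hypothetical design values

def post_prob(r_t, r_c, n_draws=2000):
    """Stand-in for S_1 = F_S(M; phi_hat): conjugate Beta(1, 1) posteriors
    for each arm, no historical borrowing."""
    psi_t = rng.beta(1 + r_t, 1 + n_t - r_t, size=n_draws)
    psi_c = rng.beta(1 + r_c, 1 + n_c - r_c, size=n_draws)
    return float(np.mean(psi_t > psi_c))

# Under H_1 the two arms share a common rate for endpoint 1; simulate B' null
# trials and take the upper-alpha quantile of S_1 as the empirical critical value.
psi_common = 0.35                      # hypothetical common null response rate
B_prime = 1000
S1_null = np.array([post_prob(rng.binomial(n_t, psi_common),
                              rng.binomial(n_c, psi_common))
                    for _ in range(B_prime)])
c1 = float(np.quantile(S1_null, 1 - alpha))
```

In Algorithm 2 this quantile computation is repeated over many null configurations $M_1$, and a second DNN is fitted to map a configuration to its critical value, so that $c_1$ can be evaluated instantly from the observed data.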
Taking $H_1$ in (4) as an example, we define $\psi^{(c,t)}_{1,0}$ as the common value of $\psi^{(c)}_{1,0}$ and $\psi^{(t)}_{1,0}$ under $H_1$. The training input data are denoted as $M_1 = \big\{\psi^{(c,t)}_{1,0}, \psi^{(c)}_{2,0}, \Delta_{2,0}\big\}$. In Step 1, we simulate $B_1$ varying $M_1$'s from the parameter spaces to obtain the training input data. Given each training feature $M_1$, we then simulate $B'_1$ samples under $H_1$ and utilize the DNN obtained from Algorithm 1 to calculate their posterior probabilities $\widehat{S} = F_S(M; \widehat{\phi})$. The critical value $c_1$ is empirically computed as the upper $\alpha$ quantile of $\widehat{S}_1$ in $\widehat{S}$ to satisfy (4). We further train a DNN to obtain a mapping function $\widehat{c}_1 = F_1(M_1; \widehat{\phi}_1)$ to predict $c_1$ based on $M_1$. Step 2 constructs $\widehat{c}_2 = F_2(M_2; \widehat{\phi}_2)$ under $H_2$ in (5), and Step 3 computes $\widehat{c}_0 = F_0(M_0; \widehat{\phi}_0)$ under $H_0$ in (6).

Once the observed data are obtained, we estimate $M_1 = \big\{\psi^{(c,t)}_{1,0}, \psi^{(c)}_{2,0}, \Delta_{2,0}\big\}$ by the empirical counterparts $M'_1 = \Big[\big\{R^{(c)}_{1,0} + R^{(t)}_{1,0}\big\} / \big\{n^{(c)}_0 + n^{(t)}_0\big\}, \; R^{(c)}_{2,0} / n^{(c)}_0, \; R^{(t)}_{2,0} / n^{(t)}_0 - R^{(c)}_{2,0} / n^{(c)}_0\Big]$ under $H_1$ at Step 4. Utilizing the trained DNN from Step 1, we get $\widehat{c}_1 = F_1(M'_1; \widehat{\phi}_1)$. The critical values $\widehat{c}_2$ and $\widehat{c}_0$ are computed in a similar fashion. Since the strong control of FWER covers all configurations of true and false null hypotheses specified in (4), (5) and (6), we set $\widetilde{c}_1 = \max(\widehat{c}_1, \widehat{c}_0)$ for rejecting null hypothesis $H_1$ with the decision function $S_1 > \widetilde{c}_1$, and correspondingly $\widetilde{c}_2 = \max(\widehat{c}_2, \widehat{c}_0)$ for $H_2$ at Step 7. This is analogous to the closure principle for handling multiplicity issues, where the rejection of a particular elementary hypothesis requires the rejection of all intersection hypotheses containing it (Tamhane and Gou, 2018).

Algorithm 2: Approximate the critical values to protect the FWER in the strong sense

Train three DNNs to estimate the critical values
1. For $H_1$ in (4), simulate training input data $M_1 = \big\{\psi^{(c,t)}_{1,0}, \psi^{(c)}_{2,0}, \Delta_{2,0}\big\}$ of size $B_1$. Given each training response rate $M_1$, simulate $B'_1$ samples under $H_1$ and utilize the DNN from Algorithm 1 to compute their posterior probabilities $\widehat{S} = F_S(M; \widehat{\phi})$. The critical value $c_1$ is computed as the upper $\alpha$ quantile of $\widehat{S}_1$ in $\widehat{S}$. Train a DNN to obtain the mapping function $\widehat{c}_1 = F_1(M_1; \widehat{\phi}_1)$.

2. Under $H_2$ in (5), train another DNN $\widehat{c}_2 = F_2(M_2; \widehat{\phi}_2)$ to estimate $c_2$ based on $M_2 = \big\{\psi^{(c)}_{1,0}, \Delta_{1,0}, \psi^{(c,t)}_{2,0}\big\}$ of training data size $B_2$ and null data size $B'_2$.

3. Under $H_0$ in (6), the training input data are $M_0 = \big\{\psi^{(c,t)}_{1,0}, \psi^{(c,t)}_{2,0}\big\}$. The critical value $c_0$ is computed by solving a non-linear equation in (6) based on $\widehat{S}_1$ and $\widehat{S}_2$ of size $B'_0$. The fitted DNN is denoted as $\widehat{c}_0 = F_0(M_0; \widehat{\phi}_0)$.

Compute critical values based on the observed data in the current trial $\big\{R^{(c)}_{1,0}, R^{(c)}_{2,0}, R^{(t)}_{1,0}, R^{(t)}_{2,0}\big\}$

4. Construct the null data under (4) as $M'_1 = \Big[\big\{R^{(c)}_{1,0} + R^{(t)}_{1,0}\big\} / \big\{n^{(c)}_0 + n^{(t)}_0\big\}, \; R^{(c)}_{2,0} / n^{(c)}_0, \; R^{(t)}_{2,0} / n^{(t)}_0 - R^{(c)}_{2,0} / n^{(c)}_0\Big]$, and calculate the critical value $\widehat{c}_1 = F_1(M'_1; \widehat{\phi}_1)$.

5. Construct the null data under (5) as $M'_2 = \Big[R^{(c)}_{1,0} / n^{(c)}_0, \; R^{(t)}_{1,0} / n^{(t)}_0 - R^{(c)}_{1,0} / n^{(c)}_0, \; \big\{R^{(c)}_{2,0} + R^{(t)}_{2,0}\big\} / \big(n^{(c)}_0 + n^{(t)}_0\big)\Big]$, and calculate the critical value $\widehat{c}_2 = F_2(M'_2; \widehat{\phi}_2)$.

6. Construct the null data under (6) as $M'_0 = \Big[\big\{R^{(c)}_{1,0} + R^{(t)}_{1,0}\big\} / \big(n^{(c)}_0 + n^{(t)}_0\big), \; \big\{R^{(c)}_{2,0} + R^{(t)}_{2,0}\big\} / \big(n^{(c)}_0 + n^{(t)}_0\big)\Big]$, and calculate the critical value $\widehat{c}_0 = F_0(M'_0; \widehat{\phi}_0)$.

7. The final critical value is computed as $\widetilde{c}_1 = \max(\widehat{c}_1, \widehat{c}_0)$ for rejecting null hypothesis $H_1$ with $\widehat{S}_1 > \widetilde{c}_1$, and $\widetilde{c}_2 = \max(\widehat{c}_2, \widehat{c}_0)$ with $\widehat{S}_2 > \widetilde{c}_2$ for $H_2$.

4 Simulation studies
In this section, we conduct simulation studies to evaluate the performance of our proposed method, and compare it with the meta-analytic-predictive (MAP) approach (Neuenschwander et al., 2010; Schmidli et al., 2014). Suppose that there are two endpoints ($I = 2$) to be evaluated in a randomized clinical trial comparing a treatment versus placebo with equal sample sizes $n^{(c)}_0 = n^{(t)}_0 = 150$. Control information is also available in $J = 6$ historical studies with sample sizes 100, 100, 200, 200, 300 and 300. The nominal significance level is $\alpha = 0.05$. Table 1 presents the control data $R^{(c)}_{i,j}$ for endpoint $i$, $i = 1, 2$, and study $j$, $j = 1, \cdots, 6$. The weighted observed response rates are approximately 0.38 for $i = 1$ and 0.32 for $i = 2$. The empirical correlation of the estimated response rates $R^{(c)}_{i,j} / n^{(c)}_j$ between the two endpoints is 0.01. Additional simulation studies with a larger empirical correlation between the two endpoints are also conducted.

Study j        1    2    3    4    5    6
n_j^(c)      100  100  200  200  300  300
R_{1,j}^(c)   33   41   78   81  115  113
R_{2,j}^(c)   31   28   69   68   94   97

Table 1: Control data from $J = 6$ historical studies.

To implement our DNN-based method, we first approximate the posterior probabilities $S$ in (3) based on Algorithm 1 with training data size $B = 8{,}000$. The training input data $M = \big\{R^{(c)}_{1,0}, R^{(c)}_{2,0}, R^{(t)}_{1,0}, R^{(t)}_{2,0}\big\}$ are drawn from binomial distributions with rates $\big\{\psi^{(c)}_{1,0}, \psi^{(c)}_{2,0}, \psi^{(t)}_{1,0}, \psi^{(t)}_{2,0}\big\}$, which are further simulated from four patterns with equal size $B/4$: (1) $\Delta_{1,0} = 0$ and $\Delta_{2,0} = 0$; (2) $\Delta_{1,0}$ uniformly distributed over a range covering negative and positive effects, and $\Delta_{2,0} = 0$; (3) $\Delta_{1,0} = 0$, and $\Delta_{2,0}$ uniformly distributed; and (4) both $\Delta_{1,0}$ and $\Delta_{2,0}$ uniformly distributed. In each pattern, $\psi^{(c)}_{1,0}$ and $\psi^{(c)}_{2,0}$ are drawn uniformly from their parameter spaces, and the treatment response rates are calculated as $\psi^{(t)}_{i,0} = \psi^{(c)}_{i,0} + \Delta_{i,0}$ for $i = 1, 2$.

Next we obtain posterior samples of $\psi^{(c)}_{i,0}$ from model (2) based on the Markov chain Monte Carlo (MCMC) method implemented in the R package R2jags (Su and Yajima, 2015). We put a vague prior on $\boldsymbol{\theta}$ as a normal distribution with mean zero and precision 0.01, and an $\mathrm{Inverse\text{-}Wishart}(\Sigma_0, k)$ prior on $\Sigma$ with $\Sigma_0$ a unit diagonal matrix and $k = I + 1$. The convergence of the MCMC algorithm is checked by the criterion $\widehat{R} < 1.01$ among 3 chains, where $\widehat{R}$ is the ratio of between-chain to within-chain variability (Gelman and Rubin, 1992; Berry et al., 2010). The posterior distribution of $\psi^{(t)}_{i,0}$ is a beta distribution with a non-informative beta prior $a_i = b_i = 1$ based on the beta-binomial conjugate model in (1). Our posterior probability $S_i$ in (3) is evaluated by 10,000 posterior samples with $\theta_i = 0$. In Step 2, we choose a DNN structure with 2 layers and 60 nodes per layer by cross-validation with a small validation MSE. The batch size is 100, and the number of training epochs is 1,000 with dropout rate 0.1. From the final fitting at Step 3, we obtain the DNN $\widehat{S} = F_S(M; \widehat{\phi})$ to predict $S$ in (3).

In Algorithm 2 for approximating the critical values, we simulate $B_1 = B_2 = B_0 = 2{,}000$ datasets to reflect the patterns of $H_1$ in (4), $H_2$ in (5) and $H_0$ in (6): (1) under $H_1$, $\psi^{(c,t)}_{1,0}$, $\psi^{(c)}_{2,0}$ and $\Delta_{2,0}$ are drawn uniformly, with $\psi^{(t)}_{2,0} = \psi^{(c)}_{2,0} + \Delta_{2,0}$; (2) under $H_2$, $\psi^{(c)}_{1,0}$ and $\Delta_{1,0}$ are drawn uniformly, with $\psi^{(t)}_{1,0} = \psi^{(c)}_{1,0} + \Delta_{1,0}$, and $\psi^{(c,t)}_{2,0}$ is drawn uniformly; and (3) under $H_0$, $\psi^{(c,t)}_{1,0}$ and $\psi^{(c,t)}_{2,0}$ are drawn uniformly. The null data sizes are $B'_1 = B'_2 = B'_0 = 100{,}000$.

We compare our method with the MAP approach and with robust MAP (RMAP) priors having a $w = 50\%$ and a $w = 80\%$ non-informative component (Schmidli et al., 2014), implemented in the R package RBesT (Weber, 2020). These methods handle data from each endpoint separately, instead of modeling them jointly as in our model (2). The setup follows their default settings with a weakly informative Half-Normal(0, 1) prior on the standard deviation of the logit of the response rate (Weber et al., 2019). Hypothesis testing is also based on the posterior probabilities $S_i$, but the critical values $\widetilde{c}_i$ are chosen by a grid-search method to control the validation type I error rates at no more than $\alpha = 0.05$ within a certain range of null response rates, shown in Table 2.

In Table 2, we first evaluate the error rates of falsely rejecting $H_1$, $H_2$ or $H_0$ under the global null hypothesis where $\Delta_{1,0} = \Delta_{2,0} = 0$. Our proposed method has relatively accurate control of the three error rates at $\alpha = 0.05$ across three scenarios with varying $\psi^{(c)}_{1,0}$ and $\psi^{(c)}_{2,0}$. The constant critical values of MAP and the two robust MAPs are chosen to control the error rates at $\alpha$ under all three cases. The "worst case scenario" is the one with $\psi^{(c)}_{2,0} = 0.4$, where the probability of rejecting $H_0$ is approximately 0.05. This choice of critical values leads to conservative error rates in the other cases and a potential power loss, as evaluated later. Under a single null hypothesis, where only one $\Delta_{i,0}$ is equal to zero, an error occurs when this particular true null hypothesis is erroneously rejected. All methods control this error rate well below $\alpha$. When it comes to alternative hypotheses, our method has a higher power of rejecting each elementary null hypothesis, and a higher power of rejecting at least one of them, than MAP and the two RMAPs in the first two scenarios. This is mainly due to the conservative type I error of using a constant critical value for MAP and the RMAPs. When the response rates are higher, with $\psi^{(c)}_{2,0} = 0.4$, MAP usually has the best power performance, followed by our DNN method, and then the two RMAPs. The number of simulation iterations in validation is 100,000 for each configuration of $\psi^{(c)}_{1,0}$ and $\psi^{(c)}_{2,0}$.

Tables 3 and 4 report the bias and RMSE of the posterior means of $\psi^{(c)}_{1,0}$ and $\psi^{(c)}_{2,0}$, including scenarios where the current control rates are consistent with the historical rates. Figure 2 shows the approximation error of DNN in estimating the posterior means of $\psi^{(c)}_{1,0}$ and $\psi^{(c)}_{2,0}$, and the posterior probabilities $S_1$ and $S_2$ in (3), from the $B = 8{,}000$ training data. The MSE from DNN training is small for all four quantities, and relatively larger for $S_1$ and $S_2$ from (3), because their training labels have more randomness in the Monte Carlo estimates as compared with the empirical posterior means of $\psi^{(c)}_{1,0}$ and $\psi^{(c)}_{2,0}$.

Table 2: Type I error and power of the DNN-based approach, MAP and RMAP.

Figure 2: Approximation error of DNN in estimating the posterior means of $\psi^{(c)}_{1,0}$ and $\psi^{(c)}_{2,0}$, and the posterior probabilities $S_1$ and $S_2$ in (3).
Table 3: Bias of posterior means $\psi^{(c)}_{1,0}$ and $\psi^{(c)}_{2,0}$ in DNN, MAP and RMAP.

5 Case study

We design a generic randomized clinical trial evaluating the efficacy of a study drug versus an active comparator, secukinumab 300 mg (Langley et al., 2014), in patients with moderate-to-severe plaque psoriasis, with equal sample size per group $n^{(c)}_0 = n^{(t)}_0 = 200$. We consider the co-primary endpoints in Langley et al. (2014): the proportion of patients achieving a reduction of 75% or more from baseline in the psoriasis area-and-severity index score (PASI 75), and the proportion of patients achieving a score of 0 (clear) or 1 (almost clear) on a 5-point modified investigator's global assessment (MIGA 0/1), both at week 12. The control information is available in $J = 3$ historical studies: ERASURE, FIXTURE (Langley et al., 2014) and JUNCTURE (Paul et al., 2015), with data summarized in Table 5. The weighted observed response rates are approximately 0.80 and 0.65 for PASI 75 and MIGA 0/1, respectively. We evaluate the performance of the different methods under three scenarios for the response rates in the current trial: two prior-data conflict scenarios (S1 and S2), and a prior-data consistent scenario (S3).

Table 4: RMSE of posterior means $\psi^{(c)}_{1,0}$ and $\psi^{(c)}_{2,0}$ in DNN, MAP and RMAP.

Historical study          ERASURE  FIXTURE  JUNCTURE
n_j^(c)                       245      323        60
R_{1,j}^(c) of PASI 75        200      249        52
R_{2,j}^(c) of MIGA 0/1       160      202        44

Table 5: Data of the active comparator secukinumab 300 mg in $J = 3$ historical studies.

When generating training data for our method, we consider the range of $\psi^{(c)}_{1,0}$ as 0.65 to 0.95, a correspondingly shifted range with upper bound 0.8 for $\psi^{(c)}_{2,0}$, and ranges of $\Delta_{1,0}$ and $\Delta_{2,0}$ with upper bound 0.1. The constant critical values in MAP, RMAP with $w = 50\%$, and RMAP with $w = 80\%$ are chosen to control the error rates at $\alpha = 0.05$ under the three scenarios S1, S2 and S3. Other parameter setups are the same as in Section 4.

In terms of power, our DNN-based method has a higher probability of rejecting at least one null hypothesis (Figure 3a) and of rejecting the first null hypothesis (Figure 3b) than MAP and the two RMAPs, but is slightly less powerful for rejecting the second null hypothesis under scenario S2. The RMAP methods demonstrate the smallest absolute bias for the posterior means $\psi^{(c)}_{1,0}$ (Figure 4a) and $\psi^{(c)}_{2,0}$ (Figure 4b) in general. Our method has the smallest RMSE under the two prior-data conflict scenarios, but the largest under the prior-data consistent scenario. Therefore, our proposed method has satisfactory RMSE under prior-data conflict, and preserves power by estimating critical values with DNN.

6 Concluding remarks

In this article, we construct a prospective DNN-based algorithm from the Bayesian hierarchical model to synthesize control information from multiple endpoints. Our two-stage method first approximates posterior probabilities and then estimates critical values. The resulting pre-trained decision functions can be locked in files before initiation of the current trial to ensure study integrity, which is appealing to regulatory agencies.

Our DNN-based prospective algorithm can also save computational time. Taking the case study in Section 5 for illustration, there are 12 setups (4 magnitudes of treatment effect $\times$ 3 scenarios), with 100,000 validation iterations per setup. As shown in Table 6, it takes approximately 32 hours for DNN to conduct the computation, while the MCMC method requires over 700 hours. The main saving is due to the fact that DNN only requires $8{,}000$ MCMC samplings to build the approximating function in Algorithm 1, and therefore the validation is very fast. On the contrary, the traditional approach needs to conduct posterior sampling for every iteration (1,200,000 in total) in validation.

Method   Algorithm 1   Algorithm 2   Validation   Total time
DNN             5.12         26.67         0.04        31.83
MCMC               -             -          768          768

Table 6: Computational time (in hours) of DNN and MCMC in the case study.

Another important contribution of our work is to model the critical values by DNN to control the FWER. A common practice is to choose the cutoff value by a grid-search method to control type I errors in validation within a certain range of the null space. Simulations show a moderate power gain of our proposed method, especially when the constant critical value has a conservative error rate. To accommodate approximation errors, a smaller working significance level can be utilized to control validated type I error rates strictly no larger than the nominal level, if necessary. Our framework can be broadly generalized to other types of Bayesian designs where the critical value is not available analytically in finite samples.
Figure 3: Power performance of DNN, MAP and two RMAPs: (a) power of rejecting at least one null hypothesis; (b) power of rejecting the first null hypothesis; (c) power of rejecting the second null hypothesis.

Figure 4: Absolute bias of posterior means $\psi^{(c)}_{1,0}$ and $\psi^{(c)}_{2,0}$ in DNN, MAP and two RMAPs.

Figure 5: RMSE of posterior means $\psi^{(c)}_{1,0}$ and $\psi^{(c)}_{2,0}$ in DNN, MAP and two RMAPs.

Acknowledgements
This manuscript was sponsored by AbbVie, Inc. AbbVie contributed to the design, research, and interpretation of data, writing, reviewing, and approving the content. Tianyu Zhan, Ziqian Geng, Yihua Gu, Li Wang and Xiaohong Huang are employees of AbbVie Inc. Yiwang Zhou is a summer intern at AbbVie Inc., and a PhD candidate in the Department of Biostatistics, University of Michigan, Ann Arbor. Jian Kang is Professor in the Department of Biostatistics, University of Michigan, Ann Arbor. Jian Kang's research was partially supported by NIH R01 GM124061 and R01 MH105561. Elizabeth H. Slate is Distinguished Research Professor and Duncan McLean and Pearl Levine Fairweather Professor in the Department of Statistics, Florida State University. All authors may own AbbVie stock.
References
Berry, S. M., Carlin, B. P., Lee, J. J., and Muller, P. (2010). Bayesian Adaptive Methods for Clinical Trials. CRC Press.

Bretz, F., Hothorn, T., and Westfall, P. (2016). Multiple Comparisons Using R. CRC Press.

Chollet, F. and Allaire, J. J. (2018). Deep Learning with R. Manning Publications Co., Greenwich, CT, USA.

Chow, S.-C., Wang, H., and Shao, J. (2007). Sample Size Calculations in Clinical Research. Chapman and Hall/CRC.

Food and Drug Administration (2017). Multiple endpoints in clinical trials: guidance for industry.

Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4):457–472.

Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.

Hinton, G., Srivastava, N., and Swersky, K. (2012). Neural networks for machine learning. Coursera, video lectures, 307.

Langley, R. G., Elewski, B. E., Lebwohl, M., Reich, K., Griffiths, C. E., Papp, K., Puig, L., Nakagawa, H., Spelman, L., Sigurgeirsson, B., et al. (2014). Secukinumab in plaque psoriasis: results of two phase 3 trials. New England Journal of Medicine, 371(4):326–338.

Neuenschwander, B., Capkun-Niggli, G., Branson, M., and Spiegelhalter, D. J. (2010). Summarizing historical information on controls in clinical trials. Clinical Trials, 7(1):5–18.

Paul, C., Lacour, J.-P., Tedremets, L., Kreutzer, K., Jazayeri, S., Adams, S., Guindon, C., You, R., Papavassilis, C., and the JUNCTURE study group (2015). Efficacy, safety and usability of secukinumab administration by autoinjector/pen in psoriasis: a randomized, controlled trial (JUNCTURE). Journal of the European Academy of Dermatology and Venereology, 29(6):1082–1090.

Schmidli, H., Gsteiger, S., Roychoudhury, S., O'Hagan, A., Spiegelhalter, D., and Neuenschwander, B. (2014). Robust meta-analytic-predictive priors in clinical trials with historical control information. Biometrics, 70(4):1023–1032.

Su, Y.-S. and Yajima, M. (2015). R2jags: Using R to Run 'JAGS'. R package version 0.5-7.

Tamhane, A. C. and Gou, J. (2018). Advances in p-value based multiple test procedures. Journal of Biopharmaceutical Statistics, 28(1):10–27.

Viele, K., Berry, S., Neuenschwander, B., Amzal, B., Chen, F., Enas, N., Hobbs, B., Ibrahim, J. G., Kinnersley, N., Lindborg, S., et al. (2014). Use of historical control data for assessing treatment effects in clinical trials. Pharmaceutical Statistics, 13(1):41–54.

Weber, S. (2020). RBesT: R Bayesian Evidence Synthesis Tools. R package version 1.6-0.

Weber, S., Li, Y., Seaman, J., Kakizume, T., and Schmidli, H. (2019). Applying meta-analytic-predictive priors with the R Bayesian evidence synthesis tools. arXiv preprint arXiv:1907.00603.