Elastic Priors to Dynamically Borrow Information from Historical Data in Clinical Trials

Abstract

Use of historical data and real-world evidence holds great potential to improve the efficiency of clinical trials. One major challenge is how to effectively borrow information from historical data while maintaining a reasonable type I error. We propose the elastic prior approach to address this challenge and achieve dynamic information borrowing. Unlike existing approaches, this method proactively controls the behavior of dynamic information borrowing and type I errors by incorporating the well-known concept of a clinically meaningful difference through an elastic function, defined as a monotonic function of a congruence measure between historical data and trial data. The elastic function is constructed to satisfy a set of information-borrowing constraints prespecified by researchers or regulatory agencies, such that the prior will borrow information when historical and trial data are congruent, but refrain from information borrowing when historical and trial data are incongruent. In doing so, the elastic prior improves power and reduces the risk of data dredging and bias. The elastic prior is information-borrowing consistent, i.e., it asymptotically controls type I and II errors at the nominal values when historical data and trial data are not congruent, a characteristic unique to the elastic prior approach. Our simulation study, which evaluates the finite-sample characteristics, confirms that, compared to existing methods, the elastic prior has better type I error control and yields competitive or higher power.



Liyun Jiang, Lei Nie, and Ying Yuan

Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX
Center for Drug Evaluation and Research, Food and Drug Administration (FDA), Silver Spring, MD
Research Center of Biostatistics and Computational Pharmacy, China Pharmaceutical University, Nanjing, China

KEY WORDS: Real-world data; Historical data; Dynamic information borrowing; Elastic prior; Elastic MAP prior; Adaptive design

arXiv [stat.ME]

Real-world data (RWD) or evidence plays an increasingly important role in health care decisions.
The 21st Century Cures Act, signed into law in 2016, emphasizes modernization of clinical trial designs, including the use of real-world evidence to support approval of new indications for approved drugs or to satisfy post-approval study requirements. The FDA released related guidance, “Use of Real-World Evidence to Support Regulatory Decision-Making for Medical Devices” [1], in 2017, and a draft guidance, “Submitting Documents Using Real-World Data and Real-World Evidence to FDA for Drugs and Biologics Guidance for Industry” [2], in 2019.

Use of RWD to facilitate medical decisions is an extremely broad topic. Here we focus on the use of historical data to improve the efficiency and guide the decision making of randomized controlled trials (RCTs). For ease of exposition, we assume two-arm RCTs in which historical data are available only for the control arm. It is straightforward to extend the proposed methodology to multiple-arm RCTs and to cases where historical data are also available for the treatment arm. The question of interest is how to leverage information from historical data to increase the power of comparing the treatment efficacy between the control and treatment arms. This problem is also known as augmenting the control arm with historical data or RWD.

Under the Bayesian paradigm, such information borrowing is straightforward if historical data $D_h$ are congruent (or exchangeable) with control data $D_c$. Let $\theta$ denote the parameter of interest (e.g., the mean of the efficacy endpoint). We start by assigning $\theta$ a non-informative or vague prior $\pi(\theta)$, combine it with $D_h$ to obtain the posterior $\pi(\theta \mid D_h)$, and then use that posterior as the prior for $D_c$ to make the comparison between the control and treatment arms. Such full information borrowing, however, is not appropriate when $D_h$ are only partially congruent or not congruent with $D_c$, because it leads to bias. If the bias favors treatment, the type I error rate will be inflated.
If the bias favors control, the power of the study will be reduced.

Various approaches have been proposed for dynamic information borrowing, such that the amount of information borrowed from $D_h$ is automatically adjusted according to the congruence between $D_h$ and $D_c$. Chen and Ibrahim [3, 4] proposed a power prior, which controls the degree of information borrowing through a “power parameter.” Hobbs et al. (2011) [5] proposed a commensurate prior that allows the commensurability of the information in the historical data and current data to determine how much historical information to use. Thall et al. (2003) [6] and Berry et al. (2013) [7] proposed using a Bayesian hierarchical model to borrow information from different data sources or subgroups. Schmidli et al. (2014) [8] proposed a robust meta-analytic-predictive (MAP) prior to borrow information from historical data via a mixture prior. Pan, Yuan, and Xia (2017) [9] proposed a calibrated power prior, assuming the availability of patient-level historical data. However, most of these methods have difficulty achieving dynamic information borrowing, leading to substantially inflated type I error and bias, as noted previously by Neuenschwander et al. [10], Freidlin and Korn [11], and Chu and Yuan [12], among others.

In this paper, we propose a general Bayesian method with elastic priors to address the aforementioned issue. Unlike many existing approaches, the proposed method proactively controls the behavior of dynamic information borrowing through an elastic function, defined as a monotonic function of a congruence measure between $D_h$ and $D_c$. The elastic function is constructed to satisfy a set of prespecified information-borrowing constraints. For example, a borrowing constraint can be set based on a prespecified clinically meaningful difference such that the amount of borrowing decreases when the difference between $D_h$ and $D_c$ increases. This control leads to a substantially reduced risk of bias.
Asymptotically, the elastic prior approach maintains type I and II errors at the nominal value when $D_h$ and $D_c$ are not congruent. In contrast, most existing dynamic information borrowing methods, including the power prior, commensurate prior, and robust MAP prior, do not have this characteristic. The elastic prior also demonstrates superior finite-sample characteristics. Our simulation study confirms that, compared to existing methods, the elastic prior approach controls type I errors better while yielding competitive or higher power. Other desirable characteristics of the elastic prior approach are that it is straightforward to determine the prior effective sample size (PESS) contained in the elastic prior, and that the elastic prior is defined independently of trial data $D_c$ and thus can be fully pre-specified.

The remainder of this article is organized as follows. In Section 2, we introduce the elastic prior method. In Section 3, we evaluate the operating characteristics of the proposed method using simulation, and we conclude with a brief discussion in Section 4.

2 Methods

Consider a two-arm RCT, and let $y$ denote the efficacy endpoint, which is either a binary variable following a Bernoulli distribution or a continuous variable following a normal distribution. Let $\theta_c$ and $\theta_t$ denote $E(y)$ for the control and treatment arms, respectively. The objective of the trial is to compare $\theta_t$ with $\theta_c$ to determine whether the treatment is superior, noninferior, or equivalent to the control. Under the Bayesian paradigm, the decision can be made based on the following criterion: the treatment is deemed superior, noninferior, or equivalent to the control if

$$\Pr(M_L < \theta_t - \theta_c < M_H \mid D_c, D_t, D_h) > C,$$

where $M_L$ and $M_H$ are prespecified margins and $C$ is a probability cutoff. For example, superiority trials typically set $M_L = 0$ and $M_H = \infty$; noninferiority trials set $M_H = \infty$ and $M_L = -M$, where $M$ is the noninferiority margin; and equivalence trials set $(M_L, M_H) = (-E, E)$, where $E$ is the equivalence margin. We assume that historical data $D_h$ are available only for the control arm. Thus, we focus on the posterior inference of $\theta_c$ and suppress its subscript when no confusion is caused. In the analysis, the posterior inference for $\theta_t$ is done using standard Bayesian methods (e.g., using a conventional noninformative or vague prior).

The basic idea of an elastic prior is straightforward. Let $\pi(\theta)$ denote a vague initial prior that reflects prior knowledge about $\theta$ before $D_h$ is observed. Applying the prior $\pi(\theta)$ to $D_h$, we obtain a posterior distribution $\pi(\theta \mid D_h)$. The elastic prior is constructed by inflating the variance of $\pi(\theta \mid D_h)$ by a factor of $g(T)^{-1}$, where $T$ is a congruence measure between $D_h$ and $D_c$, and $g(T)$ is a monotonically decreasing function with values between 0 and 1. When $T \to 0$, reflecting perfect congruence between $D_h$ and $D_c$, $g(T) \to 1$ and $\pi(\theta \mid D_h)$ will be fully used as a prior. When $T \to \infty$, reflecting substantial incongruence between $D_h$ and $D_c$, $g(T) \to 0$ and the information in $D_h$ is discarded.

Let $n_h$ and $n_c$ respectively denote the sample sizes of $D_h$ and $D_c$, with $D_c = (y_{c,1}, \cdots, y_{c,n_c})$ and $D_h = (y_{h,1}, \cdots, y_{h,n_h})$, where $y_{h,i} \overset{\text{i.i.d.}}{\sim} \text{Bernoulli}(\theta_h)$ and $y_{c,i} \overset{\text{i.i.d.}}{\sim} \text{Bernoulli}(\theta)$. Let $\bar{y}_h = \sum_{i=1}^{n_h} y_{h,i} / n_h$ and $\bar{y}_c = \sum_{i=1}^{n_c} y_{c,i} / n_c$. Assuming a vague prior $\pi(\theta_h) \sim \text{Beta}(\alpha_0, \beta_0)$ with small values of $\alpha_0$ and $\beta_0$, we apply $\pi(\theta_h)$ to $D_h$, which results in a posterior of $\theta_h$ of the form

$$\pi(\theta_h \mid D_h) \propto \text{Beta}(\alpha_0 + n_h \bar{y}_h,\; \beta_0 + n_h - n_h \bar{y}_h).$$

The elastic prior is given by

$$\pi^*(\theta \mid D_h) \propto \text{Beta}\big((\alpha_0 + n_h \bar{y}_h)\, g(T),\; (\beta_0 + n_h - n_h \bar{y}_h)\, g(T)\big). \quad (1)$$

The elastic prior $\pi^*(\theta \mid D_h)$ has the same mean as $\pi(\theta \mid D_h)$, but inflates the latter's variance by a factor of $g(T)^{-1}$. Given $\pi^*(\theta \mid D_h)$, the posterior of $\theta$ after accounting for $D_c$ is

$$\pi(\theta \mid D_h, D_c) = \text{Beta}\big((\alpha_0 + n_h \bar{y}_h)\, g(T) + n_c \bar{y}_c,\; (\beta_0 + n_h - n_h \bar{y}_h)\, g(T) + n_c - n_c \bar{y}_c\big).$$

We now discuss how to choose the congruence measure $T$ and the elastic function $g(\cdot)$. For a binary endpoint, there are many different choices for the congruence measure $T$. For example, we may consider

$$T = \frac{|\bar{y}_c - \bar{y}_h|}{\sqrt{\bar{y}(1 - \bar{y})(n_c^{-1} + n_h^{-1})}},$$

where $\bar{y} = (\bar{y}_c n_c + \bar{y}_h n_h)/(n_c + n_h)$ is the pooled sample mean.
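As a minimal sketch of the elastic prior (1) and the resulting posterior for a binary endpoint, the Python below treats the discount factor `g` as already computed. The function names, the `BetaParams` container, and the default `alpha0 = beta0 = 0.5` are illustrative assumptions, not values taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class BetaParams:
    """Shape parameters of a Beta distribution."""
    a: float
    b: float

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)


def elastic_beta_prior(x_h, n_h, g, alpha0=0.5, beta0=0.5):
    """Elastic prior (1): the Beta posterior given D_h with both shapes scaled by g.

    x_h is the number of responders among n_h historical patients.
    alpha0 and beta0 are assumed (hypothetical) vague-prior values.
    """
    if not 0.0 < g <= 1.0:
        raise ValueError("g must lie in (0, 1] for a proper Beta prior")
    return BetaParams((alpha0 + x_h) * g, (beta0 + n_h - x_h) * g)


def elastic_beta_posterior(x_h, n_h, x_c, n_c, g, alpha0=0.5, beta0=0.5):
    """Posterior of theta after updating the elastic prior with control data D_c."""
    prior = elastic_beta_prior(x_h, n_h, g, alpha0, beta0)
    return BetaParams(prior.a + x_c, prior.b + n_c - x_c)
```

Scaling both shape parameters by the same factor leaves the prior mean unchanged and only reduces the prior's weight relative to the current data, which is the behavior described in the text.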
While different choices of $T$ have different advantages, in this paper we choose the chi-square test statistic:

$$T = \sum_{j=c,h} \frac{(O_{j0} - E_{j0})^2}{E_{j0}} + \sum_{j=c,h} \frac{(O_{j1} - E_{j1})^2}{E_{j1}},$$

where $O_{j1}$ and $O_{j0}$ are the observed numbers of responders and non-responders for $D_c$ and $D_h$, and $E_{j1}$ and $E_{j0}$ are the expected numbers of responders and non-responders, which are given by

$$E_{j0} = n_j \frac{\sum_{j'=c,h} n_{j'} - \sum_{j'=c,h} \sum_{i=1}^{n_{j'}} y_{j',i}}{\sum_{j'=c,h} n_{j'}}, \qquad E_{j1} = n_j \frac{\sum_{j'=c,h} \sum_{i=1}^{n_{j'}} y_{j',i}}{\sum_{j'=c,h} n_{j'}}.$$

A large value of $T \in (0, \infty)$ indicates low congruence between $D_c$ and $D_h$.

The elastic function $g(T)$ serves as a link function that maps the congruence measure $T$ to an information discount factor. Any monotonic function could be used as an elastic function, as long as $g(T) \to 1$ when $T$ corresponds to congruence and $g(T) \to 0$ when $T$ corresponds to incongruence. In this paper, we choose

$$g(T) = \frac{1}{1 + \exp\{a + b \times \log(T)\}}, \quad (2)$$

where $a$ and $b > 0$ are parameters; we discuss how to determine $a$ and $b$ later. When appropriate, a more flexible elastic function $g(T) = 1 / (1 + \exp[a + b \times \{\log(T)\}^c])$ can be used to further control the rate of change from borrowing to no borrowing using the additional parameter $c$ (see Figure 1 (a)). It can be shown that the resulting elastic prior has the following consistency property:

Theorem 1. The elastic prior defined in (1) is information-borrowing consistent. That is, when $n_h \to \infty$ and $n_c \to \infty$, it achieves full information borrowing if $D_h$ and $D_c$ are congruent (i.e., $\theta_h = \theta$), and discards $D_h$ if $D_h$ and $D_c$ are incongruent (i.e., $\theta_h \neq \theta$).

The biggest concern about, and barrier to, adopting information-borrowing methods in clinical trials is the potential risk of type I or II error inflation caused by information borrowing when $D_h$ and $D_c$ are actually incongruent. Theorem 1 shows that, asymptotically, the elastic prior maintains the type I error at the nominal value when $D_h$ and $D_c$ are not congruent.
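To make the chi-square congruence measure and the elastic function (2) concrete, here is a small Python sketch. The function names are hypothetical, and two conventions are assumptions added for numerical safety rather than part of the paper: returning `T = 0` when the pooled response rate is exactly 0 or 1, and returning `g = 1` at `T = 0` (the limit of (2) when `b > 0`).

```python
import math


def chi_square_congruence(x_c, n_c, x_h, n_h):
    """Chi-square congruence measure T between control and historical data.

    x_c, x_h are responder counts; n_c, n_h are sample sizes.
    """
    total_n = n_c + n_h
    total_x = x_c + x_h
    if total_x == 0 or total_x == total_n:
        # Assumed convention: identical all-0 or all-1 outcomes give T = 0.
        return 0.0
    t = 0.0
    for x, n in ((x_c, n_c), (x_h, n_h)):
        e1 = n * total_x / total_n              # expected responders
        e0 = n * (total_n - total_x) / total_n  # expected non-responders
        t += (x - e1) ** 2 / e1 + ((n - x) - e0) ** 2 / e0
    return t


def elastic_logistic(t, a, b):
    """Elastic function (2): g(T) = 1 / (1 + exp(a + b * log T)), with b > 0."""
    if t <= 0.0:
        return 1.0  # assumed limit of g as T -> 0
    z = a + b * math.log(t)
    if z >= 0.0:
        ez = math.exp(-z)  # rewritten form avoids overflow for large z
        return ez / (1.0 + ez)
    return 1.0 / (1.0 + math.exp(z))
```

With `b > 0` the function decreases from 1 toward 0 as `T` grows, so a larger discrepancy between the two response rates yields a smaller discount factor.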
In contrast, most existing dynamic information borrowing methods, including the power prior, commensurate prior, and robust MAP prior, do not have this property. To achieve information-borrowing consistency, they typically require the number of historical datasets (not the number of observations within each historical dataset) to go to infinity, which is not the case in practice.

In finite samples, however, strictly controlling the type I error at its nominal value is impossible for any information-borrowing method, including the elastic prior approach. The reason is simple: when $\theta_h \neq \theta$, the type I error inflates whenever information borrowing is triggered. With a finite sample, even when $\theta_h \neq \theta$, there is a non-zero probability that the observed $D_h$ and $D_c$ are comparable and trigger (inappropriate) information borrowing, thus inflating the type I or II error.

Theorem 2. For any method that borrows information from historical or other external data, dynamically or non-dynamically, the inflation of type I or II error is inevitable under finite samples, depending on whether the historical or other external data under- or over-estimate the treatment effect of the control arm when compared to the current data.

Theorem 2 is important because it sets a realistic expectation for information-borrowing methods and avoids vain efforts to pursue a dynamic information borrowing method that can strictly control type I errors in finite samples.

Since the inflation of type I or II errors is inevitable with information borrowing, one reasonable strategy is to control type I and II error inflation according to certain prespecified criteria. This motivates the following procedure to choose the elastic function (2), as illustrated in Figure 2. Without loss of generality, we assume that a large value of $T$ indicates larger incongruence between $D_h$ and $D_c$.

1. Elicit from subject matter experts a clinically meaningful difference (CMD), denoted as $\delta$, for $E(y)$. The CMD is routinely used in clinical trial planning, including for sample size determination and power calculation, and its determination often requires communication between sponsors and regulatory bodies.
2. (Congruent case) Simulate $R$ replicates of $D_c = (y_{c,1}, \cdots, y_{c,n_c})$ from $\text{Bernoulli}(\hat{\theta}_h)$, with $\hat{\theta}_h = \bar{y}_h$, and calculate the congruence measure $T$ between $D_h$ and each simulated $D_c$, resulting in $T^0 = (T^0_1, \cdots, T^0_R)$, where $T^0_r$ denotes the value of $T$ based on the $r$th simulated $D_c$.
3. (Incongruent cases) Simulate $R$ replicates of $D_c$ from $\text{Bernoulli}(\hat{\theta}_h + 2\delta)$, and calculate the congruence measure $T$ between $D_h$ and each simulated $D_c$, resulting in $T^+ = (T^+_1, \cdots, T^+_R)$, where $T^+_r$ denotes the value of $T$ based on the $r$th simulated $D_c$.
   Repeat this with $D_c$ simulated from $\text{Bernoulli}(\hat{\theta}_h - 2\delta)$, resulting in $T^- = (T^-_1, \cdots, T^-_R)$, where $T^-_r$ denotes the value of $T$ between $D_h$ and the $r$th simulated $D_c$.
4. Let $C_1$ and $C_0$ be constants close to 1 and 0, respectively (e.g., $C_1 = 0.99$, with $C_0$ correspondingly close to 0). Let $T_{q_1}$ denote the $q_1$th percentile of $T^0$, let $T^+_{q_0}$ and $T^-_{q_0}$ denote the $q_0$th percentile of $T^+$ and $T^-$, respectively, and define $T_{q_0} = \min(T^+_{q_0}, T^-_{q_0})$. Determine the elastic function (2) by solving the following two equations:

$$C_1 = g(T_{q_1}), \quad (3)$$
$$C_0 = g(T_{q_0}), \quad (4)$$

where the first equation enforces (approximately) full information borrowing, and the second essentially enforces no information borrowing. This leads to the solution

$$g(T) = \frac{1}{1 + \exp\{a + b \times \log(T)\}},$$

where

$$a = \log\left(\frac{1 - C_1}{C_1}\right) - \frac{\log\left(\frac{(1 - C_1) C_0}{(1 - C_0) C_1}\right) \log(T_{q_1})}{\log(T_{q_1}) - \log(T_{q_0})}, \qquad b = \frac{\log\left(\frac{(1 - C_1) C_0}{(1 - C_0) C_1}\right)}{\log(T_{q_1}) - \log(T_{q_0})}. \quad (5)$$

Several remarks are warranted. In step 3, we generate incongruent cases by simulating $D_c$ from $\text{Bernoulli}(\hat{\theta}_h \pm 2\delta)$, rather than $\text{Bernoulli}(\hat{\theta}_h \pm \delta)$ (i.e., right at the CMD), because the objective of step 3 is to simulate highly incongruent cases so that equation (4) in step 4 prevents information borrowing. As it is often regarded as reasonable to borrow some information when the difference between $D_h$ and $D_c$ is smaller than the CMD, it is not appropriate to set the no-borrowing constraint right at the boundary. In step 4, as incongruence can occur in either direction (i.e., $\theta_c$ is larger or smaller than $\theta_h$), we take $T_{q_0} = \min(T^+_{q_0}, T^-_{q_0})$ to ensure no information borrowing under the more conservative direction.

In step 4, $q_1$ and $q_0$ define the borrowing and no-borrowing regions (see Figure 2). We may simply choose $q_1 = q_0 = 0.5$, i.e., the medians of $T^0$, $T^+$, and $T^-$. A better and more flexible approach is to choose $q_1$ and $q_0$ to optimize the trade-off between the power (in the congruent case) and the type I error (in the incongruent case). Toward this goal, let $\rho$ denote the power under the congruent case, $\psi$ the type I error under the incongruent case described in step 3, and $\eta$ a type I error threshold. We define the utility

$$U(q_1, q_0) = \rho - w_1 \psi - w_2 (\psi - \eta) I(\psi > \eta), \quad (6)$$

where $w_1$ and $w_2$ are penalty weights. This utility imposes a penalty of $w_1$ for each unit increase of the type I error before it reaches $\eta$, and a penalty of $w_1 + w_2$ thereafter. In our simulation, we set $w_1 = 1$, $w_2 = 2$, and $\eta = 0.1$, which means that before the type I error reaches 0.1, each 1% increase in the type I error deducts 1% from the power; once the type I error exceeds 0.1, each 1% increase deducts 3% from the power. Through a grid search (see the Appendix for the procedure), we can identify the $(q_1, q_0)$ that maximize $U(q_1, q_0)$. Although this approach is more complicated than directly setting $q_1 = q_0 = 0.5$, it results in better performance, and thus we generally recommend it.

A special form of the elastic function, with $T_{q_1} \equiv T_{q_0}$ (see Figure 1 (b)), is the following step function:

$$g(T) = \begin{cases} 1, & T \le T_{q_1}, \\ 0, & T > T_{q_1}, \end{cases} \quad (7)$$

where full information borrowing occurs if $T \le T_{q_1}$, and no information borrowing occurs if $T > T_{q_1}$. Compared to the smooth elastic function (2), one advantage of the step elastic function is that its calibration is simpler, needing only two steps:

1. (Congruent case) Simulate $R$ replicates of $D_c = (y_{c,1}, \cdots, y_{c,n_c})$ from $\text{Bernoulli}(\hat{\theta}_h)$, with $\hat{\theta}_h = \bar{y}_h$, and calculate the congruence measure $T$ between $D_h$ and each of the simulated $D_c$'s, resulting in $T^0 = (T^0_1, \cdots, T^0_R)$, where $T^0_r$ denotes the value of $T$ based on the $r$th simulated $D_c$.
2. Use a grid search to identify the $T_{q_1}$ that maximizes the utility $U(q_1)$.

Numerical study shows that the step elastic function can achieve operating characteristics similar to those of a smooth function, but with greater simplicity, making it a good choice for practical use.

The elastic prior approach has several desirable design characteristics, making it an appealing choice for prespecified analysis. One desirable characteristic is that the elastic function can be fully pre-specified and defined independently of trial data $D_c$. With the pre-specified elastic function, the amount of information borrowing is determined by a pre-specified congruence measure $T$ between historical and current trial data. We expect pre-specification to be a desired characteristic whenever possible.
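The calibration of the smooth elastic function (2) via steps 2 to 4 and the closed-form solution (5), together with the utility (6), can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the default `c0 = 0.01`, the shift multiplier `shift = 2.0`, the clipping of shifted probabilities to [0, 1], the percentile convention, and the seed are all hypothetical choices.

```python
import math
import random


def chi_square_t(x_c, n_c, x_h, n_h):
    """Chi-square congruence measure T for two binomial samples."""
    n, x = n_c + n_h, x_c + x_h
    if x == 0 or x == n:
        return 0.0
    t = 0.0
    for x_j, n_j in ((x_c, n_c), (x_h, n_h)):
        e1, e0 = n_j * x / n, n_j * (n - x) / n
        t += (x_j - e1) ** 2 / e1 + (n_j - x_j - e0) ** 2 / e0
    return t


def simulate_t(theta, x_h, n_h, n_c, reps, rng):
    """Sorted values of T over reps simulated control arms drawn at rate theta."""
    draws = []
    for _ in range(reps):
        x_c = sum(rng.random() < theta for _ in range(n_c))
        draws.append(chi_square_t(x_c, n_c, x_h, n_h))
    return sorted(draws)


def percentile(sorted_values, q):
    """Empirical q-quantile (0 < q <= 1) of an already sorted list."""
    idx = math.ceil(q * len(sorted_values)) - 1
    return sorted_values[min(len(sorted_values) - 1, max(0, idx))]


def solve_a_b(t_q1, t_q0, c1=0.99, c0=0.01):
    """Closed-form solution (5) of C1 = g(T_q1) and C0 = g(T_q0)."""
    if not 0.0 < t_q1 < t_q0:
        raise ValueError("need 0 < T_q1 < T_q0")
    k = math.log((1 - c1) * c0 / ((1 - c0) * c1))
    b = k / (math.log(t_q1) - math.log(t_q0))
    a = math.log((1 - c1) / c1) - b * math.log(t_q1)
    return a, b


def g(t, a, b):
    """Elastic function (2), for t > 0."""
    return 1.0 / (1.0 + math.exp(a + b * math.log(t)))


def calibrate(x_h, n_h, n_c, delta, q1=0.5, q0=0.5, shift=2.0, reps=2000, seed=1):
    """Steps 2 to 4: simulate congruent and incongruent cases, then solve for (a, b)."""
    rng = random.Random(seed)
    theta_h = x_h / n_h
    t_null = simulate_t(theta_h, x_h, n_h, n_c, reps, rng)
    t_plus = simulate_t(min(theta_h + shift * delta, 1.0), x_h, n_h, n_c, reps, rng)
    t_minus = simulate_t(max(theta_h - shift * delta, 0.0), x_h, n_h, n_c, reps, rng)
    t_q1 = percentile(t_null, q1)
    t_q0 = min(percentile(t_plus, q0), percentile(t_minus, q0))
    return solve_a_b(t_q1, t_q0)


def utility(rho, psi, w1=1.0, w2=2.0, eta=0.1):
    """Utility (6): power minus a two-slope penalty on the type I error."""
    return rho - w1 * psi - w2 * max(psi - eta, 0.0)
```

A grid search over `(q1, q0)` would wrap `calibrate` in a loop, estimate the power and type I error for each candidate pair by further simulation, and keep the pair with the largest `utility`; that outer loop is omitted here.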
The elastic prior approach satisfies or goes beyond the requirement of pre-specification, “In general, Bayesian CID proposals should include a robust discussion of the prior distribution...a Bayesian proposal should also include a discussion explaining the steps the sponsor took to ensure information was not selectively obtained or used. In cases where downweighting or other non-data-driven features are incorporated in a prior distribution, the proposal should include a rationale for the use and magnitude of these features,” as briefly discussed in the draft Guidance for Industry on Interacting with the FDA on Complex Innovative Trial Designs for Drugs and Biological Products.

Another desirable characteristic is the straightforward determination of the prior effective sample size (PESS) contained in the elastic prior, which is simply $g(T) n_h$, as $g(T)^{-1}$ is a variance inflation factor. In contrast, determining the PESS for existing methods (e.g., the commensurate prior and robust MAP prior) is more involved, and we found that the different PESS calculations used by these methods [13–15] often led to substantially different, and sometimes improper, results (e.g., PESS $> n_h$) [13, 16].

Consider a normal endpoint $y_{c,i} \overset{\text{iid}}{\sim} N(\theta, \sigma^2)$ and $y_{h,i} \overset{\text{iid}}{\sim} N(\theta_h, \sigma_h^2)$, with interest in estimating $\theta$. With a noninformative prior $\pi(\theta_h) \propto 1$ applied to $D_h$, the posterior of $\theta_h$ is

$$\pi(\theta_h \mid D_h, \sigma_h^2) \propto \pi(\theta_h) f(D_h \mid \theta_h, \sigma_h^2) = N\left(\bar{y}_h, \frac{\sigma_h^2}{n_h}\right).$$

An unknown $\sigma_h^2$ is often replaced by its maximum likelihood estimate $\hat{\sigma}_h^2 = \sum_{i=1}^{n_h} (y_{h,i} - \bar{y}_h)^2 / n_h$. The elastic prior of $\theta$ is obtained by inflating the variance of $\pi(\theta_h \mid D_h, \sigma_h^2)$ with the elastic function $g(T)$ as follows:

$$\pi^*(\theta \mid D_h, \sigma_h^2) = N\left(\bar{y}_h, \frac{\sigma_h^2}{n_h\, g(T)}\right). \quad (8)$$

Analogous to Section 2.1, the prior effective sample size for $\pi^*(\theta \mid D_h, \sigma_h^2)$ is simply $g(T) n_h$. Full information borrowing is achieved when $g(T) = 1$, and no information borrowing occurs when $g(T) = 0$.
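For the normal endpoint, the effect of the elastic prior (8) on the posterior of θ can be sketched with the standard conjugate normal update. The sketch assumes a single known variance `sigma2` shared by the historical and current data, which is a simplification chosen for illustration; the function name and return format are hypothetical.

```python
def elastic_normal_posterior(ybar_h, n_h, ybar_c, n_c, sigma2, g):
    """Posterior mean and variance of theta under the elastic prior (8).

    Assumes a known common variance sigma2 (an illustrative simplification).
    The prior N(ybar_h, sigma2 / (n_h * g)) carries a prior effective
    sample size of g * n_h.
    """
    if not 0.0 <= g <= 1.0:
        raise ValueError("g must lie in [0, 1]")
    pess = g * n_h                       # prior effective sample size
    total = pess + n_c
    mean = (pess * ybar_h + n_c * ybar_c) / total
    variance = sigma2 / total
    return mean, variance, pess
```

With `g = 1` the historical patients count fully toward the posterior, and with `g = 0` the result reduces to the analysis of the current control data alone.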
In this scenario, the power prior may yield a prior of a form similar to (8). The key difference is that $g(T)$ is pre-specified to proactively control type I and II error rates, and its expected value is known prior to the conduct of the trial. In addition, as the power prior works by discounting the whole likelihood, it does not allow parameter-specific adaptive information borrowing, for example, when we are interested in estimating, and borrowing information on, both $\theta$ and $\sigma^2$ as described later.

The elastic function (2) or the step elastic function (7) can be used to dynamically control information borrowing based on the congruence measure $T$. When subject-level data are available for $D_h$, the Kolmogorov-Smirnov (KS) statistic can be used as the congruence measure between $D_c$ and $D_h$:

$$T = \max_{i=1,\ldots,N} \{|F(Z_{(i)}) - G(Z_{(i)})|\}, \quad (9)$$

where $N = n_c + n_h$; $F(\cdot)$ and $G(\cdot)$ are the empirical distribution functions for $D_h$ and $D_c$, respectively; and $Z_{(1)} \le \cdots \le Z_{(N)}$ are the $N$ ordered values of the combined sample of $D_h$ and $D_c$. When $D_h$ contains only summary statistics (e.g., mean and standard error), the $t$ statistic is a reasonable choice for $T$:

$$T = \frac{|\bar{y}_c - \bar{y}_h|}{s \sqrt{\frac{1}{n_h} + \frac{1}{n_c}}}, \quad (10)$$

where $s = \sqrt{\frac{(n_c - 1) s_c^2 + (n_h - 1) s_h^2}{n_c + n_h - 2}}$, with $s_c^2$ and $s_h^2$ denoting the sample variances of $D_c$ and $D_h$, respectively. For both congruence measures, a larger value of $T$ indicates less congruence between $D_h$ and $D_c$. Again, it can be shown that the resulting elastic prior is information-borrowing consistent, as described in Theorem 1. Also, the choice of $T$ is not unique (e.g., the $t$ statistic can also be used when $D_h$ consists of individual-level data) and can be tailored to the inferential interest. For example, if the objective of the trial is to compare the variance between the treatment and control arms, the $F$ statistic for testing equal variance is an appropriate measure of the congruence of $D_h$ and $D_c$ in variance.
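The two congruence measures (9) and (10) can be computed as in the Python sketch below, which uses only the standard library. The function names are hypothetical; `ks_congruence` takes subject-level data, whereas `t_congruence` needs only summary statistics.

```python
import math
from bisect import bisect_right


def ks_congruence(d_h, d_c):
    """KS statistic (9): largest gap between the two empirical CDFs,
    evaluated at every value of the combined sample."""
    s_h, s_c = sorted(d_h), sorted(d_c)
    gaps = []
    for z in s_h + s_c:
        f = bisect_right(s_h, z) / len(s_h)  # empirical CDF of D_h at z
        g = bisect_right(s_c, z) / len(s_c)  # empirical CDF of D_c at z
        gaps.append(abs(f - g))
    return max(gaps)


def t_congruence(ybar_c, s2_c, n_c, ybar_h, s2_h, n_h):
    """Pooled two-sample t statistic (10) from means and sample variances."""
    pooled_var = ((n_c - 1) * s2_c + (n_h - 1) * s2_h) / (n_c + n_h - 2)
    se = math.sqrt(pooled_var) * math.sqrt(1.0 / n_h + 1.0 / n_c)
    return abs(ybar_c - ybar_h) / se
```

Both functions return 0 for identical samples and grow as the samples separate, so either can be passed to an elastic function that decreases in `T`.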
The calibration procedure for the elastic function $g(T; \phi)$ is similar to that for the binary endpoint and is provided in the Appendix.

If estimation of both $\theta$ and $\sigma^2$ is of interest, we can also construct the joint elastic prior for $(\theta, \sigma^2)$. We first apply the noninformative prior $\pi(\theta_h, \sigma_h^2) \propto (1/\sigma_h^2)^m$ to $D_h$, where $m$ is a constant, resulting in the following posterior:

$$\pi(\theta_h, \sigma_h^2 \mid D_h) \propto \pi(\theta_h, \sigma_h^2) f(D_h \mid \theta_h, \sigma_h^2) \propto N_\theta\left(\bar{y}_h, \frac{\sigma_h^2}{n_h}\right) IG_{\sigma^2}(\mu_h, \epsilon_h^2),$$

where $IG(\cdot)$ is an inverse gamma distribution with mean $\mu_h = \frac{n_h \hat{\sigma}_h^2}{n_h - 5 + 2m}$ and variance $\epsilon_h^2 = \frac{(n_h \hat{\sigma}_h^2)^2}{(n_h - 5 + 2m)^2 \left(\frac{n_h - 7}{2} + m\right)}$. The joint elastic prior for $(\theta, \sigma^2)$ is obtained by inflating the variance of $\pi(\theta_h, \sigma_h^2 \mid D_h)$ with two elastic functions $g(T_1)$ and $g(T_2)$:

$$\pi^*(\theta, \sigma^2 \mid D_h) \propto N_\theta\left(\bar{y}_h, \frac{\sigma^2}{n_h\, g(T_1)}\right) IG_{\sigma^2}\left(\mu_h, \frac{\epsilon_h^2}{g(T_2)}\right),$$

where $T_1$ is (9) or (10), and $T_2$ is the $F$ statistic for testing equal variance. Allowing parameter-specific information borrowing gives the elastic prior more flexibility than the power prior. Given the elastic prior and trial data $D_c$, the posterior distribution for $(\theta, \sigma^2)$ is

$$\pi(\theta, \sigma^2 \mid D_c, D_h) \propto N_\theta\left(\frac{n_h\, g(T_1)\, \bar{y}_h + n_c \bar{y}_c}{n_h\, g(T_1) + n_c}, \frac{\sigma^2}{n_h\, g(T_1) + n_c}\right) IG_{\sigma^2}(\alpha^*, \beta^*),$$

where $IG_{\sigma^2}(\alpha^*, \beta^*)$ is here parameterized by its shape and scale, with

$$\alpha^* = \frac{n_c + 4}{2} + \left(\frac{n_h - 7}{2} + m\right) g(T_2),$$

$$\beta^* = \frac{\sum_{i=1}^{n_c} y_{c,i}^2 + n_h\, g(T_1)\, \bar{y}_h^2}{2} - \frac{(n_h\, g(T_1)\, \bar{y}_h + n_c \bar{y}_c)^2}{2 n_h\, g(T_1) + 2 n_c} + \frac{n_h \hat{\sigma}_h^2}{n_h - 5 + 2m}\left[1 + \left(\frac{n_h - 7}{2} + m\right) g(T_2)\right].$$

The proposed method can be extended to borrow information from $K$ independent historical datasets $D_{h,1}, \cdots, D_{h,K}$. For notational brevity, we here suppress the subscript “$h$” and denote the $k$th historical dataset by $D_k = (y_{k,1}, \cdots, y_{k,n_k})$, with sample mean $\bar{y}_k = \sum_{i=1}^{n_k} y_{k,i}/n_k$. The elastic prior can be obtained by sequentially applying the method described above to $D_1, \cdots, D_K$. Using a binary endpoint as an example, the steps to obtain the elastic prior are:

1. Starting with the noninformative prior $\pi(\theta) \sim \text{Beta}(\alpha_0, \beta_0)$, obtain the elastic prior $\pi^*(\theta \mid D_1)$ for $D_1$, where $\pi^*(\theta \mid D_1) = \text{Beta}(\alpha_1, \beta_1)$ with $\alpha_1 = (\alpha_0 + n_1 \bar{y}_1)\, g(T_1)$ and $\beta_1 = (\beta_0 + n_1 - n_1 \bar{y}_1)\, g(T_1)$.
2. Using $\pi^*(\theta \mid D_1)$ as the prior, combined with $D_2$, obtain the elastic prior $\pi^*(\theta \mid D_1, D_2)$ for $D_1$ and $D_2$, where $\pi^*(\theta \mid D_1, D_2) = \text{Beta}(\alpha_2, \beta_2)$ with $\alpha_2 = (\alpha_1 + n_2 \bar{y}_2)\, g(T_2)$ and $\beta_2 = (\beta_1 + n_2 - n_2 \bar{y}_2)\, g(T_2)$.
3. Repeat step 2 sequentially for $D_3, \cdots, D_K$ to obtain the elastic prior $\pi^*(\theta \mid D_1, \cdots, D_K)$.

The elastic functions $g(T_1), \cdots, g(T_K)$ are calibrated independently using the procedure described previously based on $D_1, \cdots, D_K$, respectively. One advantage of this sequential elastic prior is that it allows study-specific dynamic information borrowing with minimal interference among $D_1, \cdots, D_K$. For example, if $D_1$ is congruent with $D_c$ and $D_2$ is not congruent with $D_c$, the elastic prior will borrow more information from $D_1$ and less information from $D_2$.

Another approach is to aggregate the historical information through a meta-analysis of $D_{h,1}, \cdots, D_{h,K}$, and then construct the elastic prior. This can be done in two steps: (1) perform a meta-analysis of $D_{h,1}, \cdots, D_{h,K}$ using the Bayesian hierarchical (or random-effects) model to obtain the posterior predictive distribution of $\theta$ (i.e., the MAP prior), $\pi(\theta \mid D_{h,1}, \cdots, D_{h,K})$; (2) inflate the variance of the MAP prior using the elastic function $g(T)$ to obtain the elastic prior $\pi^*(\theta \mid D_{h,1}, \cdots, D_{h,K})$. One challenge is how to choose an appropriate statistic $T$ to measure the congruence between $D_c$ and the $K$ datasets. The congruence measure $T$ discussed previously is applicable to each of $D_{h,1}, \cdots, D_{h,K}$, but it is not clear how to combine the results into a single global congruence measure. To address this issue, we borrow the concept of the posterior predictive model assessment method [17, 18].
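The sequential construction in steps 1 to 3 above, for a binary endpoint with K historical datasets, can be sketched in Python as follows. The function name and the default `alpha0 = beta0 = 0.5` are illustrative assumptions; the discount factors `g_values` are taken as given, one per dataset.

```python
def sequential_elastic_beta_prior(datasets, g_values, alpha0=0.5, beta0=0.5):
    """Fold K historical datasets into Beta shape parameters one at a time.

    datasets is a list of (responders, sample_size) pairs for D_1, ..., D_K.
    g_values holds the matching discount factors g(T_1), ..., g(T_K).
    alpha0 and beta0 are assumed (hypothetical) vague-prior values.
    """
    if len(datasets) != len(g_values):
        raise ValueError("need one discount factor per historical dataset")
    a, b = alpha0, beta0
    for (x_k, n_k), g_k in zip(datasets, g_values):
        if not 0.0 <= g_k <= 1.0:
            raise ValueError("each g must lie in [0, 1]")
        # Step k: alpha_k = (alpha_{k-1} + n_k * ybar_k) * g(T_k), same for beta.
        a = (a + x_k) * g_k
        b = (b + n_k - x_k) * g_k
    return a, b
```

Note that, following the recursion as written, each `g_k` rescales everything accumulated up to and including dataset k, so a value of 0 at the final step yields an improper Beta(0, 0) prior that becomes proper only after the current control data are added.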
The basic idea is that if $D_c$ is congruent with $D_{h,1}, \cdots, D_{h,K}$, we would expect the actual observed $D_c$ to be generally consistent with data generated from $\pi(D_c \mid D_{h,1}, \cdots, D_{h,K})$. Therefore, if the observed $D_c$ is located in the far tail of the predictive distribution $\pi(D_c \mid D_{h,1}, \cdots, D_{h,K})$, then $D_c$ is likely to be incongruent with the historical data. This motivates us to use the posterior predictive $p$ value as the congruence measure $T$. This approach is general and can also be used for a single historical dataset with various endpoints. Using a normal endpoint as an example, $T$ is calculated as follows:

1. Draw $R$ samples of $\theta$ from $\pi(\theta \mid D_{h,1}, \cdots, D_{h,K})$, denoted as $\theta^{(1)}, \cdots, \theta^{(R)}$. Given $\theta^{(r)}$, simulate trial data $D_c = (y_{c,1}, \cdots, y_{c,n_c})$, and denote its sample mean as $\bar{y}_c^{(r)}$, $r = 1, \cdots, R$. In our simulation, we use a large value of $R$.
2. Let $\bar{y}_c$ denote the actual observed sample mean of $D_c$; the congruence measure is defined as $T = -\log(PP)$, where

$$PP = 2 \times \min\left(\sum_{r=1}^{R} I(\bar{y}_c^{(r)} > \bar{y}_c)/R,\; \sum_{r=1}^{R} I(\bar{y}_c^{(r)} < \bar{y}_c)/R\right)$$

is the two-sided posterior predictive $p$ value.

Of note, Theorems 1 and 2, as well as the desirable design characteristics discussed in Section 2.1, also apply to Sections 2.2 and 2.3.

3 Simulation studies

In this section, we evaluate the finite-sample properties of the elastic prior approach and compare them to those of some existing methods.

We considered scenarios that involve a two-arm superiority trial with one historical dataset $D_h$, where the endpoint is either a continuous variable with a Gaussian distribution or a binary variable with a Bernoulli distribution. For a continuous endpoint, the sample sizes for the historical data, control arm, and treatment arm were $n_h = 50$, $n_c = 25$, and $n_t = 50$, respectively. We generated control data $D_c$ from a normal distribution with mean $\theta_c = 1$, and treatment data $D_t$ from a normal distribution with mean $\theta_t = 1$ and $1.5$. The CMD is $\delta = 0.5$. We generated the historical data $D_h$ from a normal distribution with mean $\theta_h$, and varied $\theta_h$ to simulate the scenarios where $D_h$ is congruent or incongruent. For a binary endpoint, the sample sizes for the historical data, control arm, and treatment arm were $n_h = 100$, $n_c = 40$, and $n_t = 80$, respectively. We generated $D_c$ from $\text{Bernoulli}(\theta_c)$ with $\theta_c = 0.4$, and $D_t$ from $\text{Bernoulli}(\theta_t)$ with $\theta_t = 0.4$, $0.55$, and $0.6$. The CMD is $\delta = 0.12$. We generated $D_h$ from $\text{Bernoulli}(\theta_h)$ and varied its mean $\theta_h$ to simulate the scenarios where $D_h$ is congruent or incongruent with $D_c$. We considered the smooth elastic function (2) and the step function (7), and denoted them as elastic prior 1 (EP1) and elastic prior 2 (EP2), respectively.

We compared the proposed elastic prior with the commensurate prior (CP), the (normalized) power prior (PP), and the conventional non-informative prior (NP) that ignores historical data. For CP, we considered two priors for its shrinkage parameter $\tau$ used in publications: a uniform prior on $\log(\tau)$ with upper bound 30 [5] (denoted as CP1), and a spike-and-slab prior with a slab of (1, 2), a spike of 20, and Pr(slab) = 0.98 (denoted as CP2) [16]. For PP, a uniform prior $\text{Unif}(0, 1)$ is used for the power parameter. For EP1 and EP2, we set $w_1 = 1$, $w_2 = 2$, and $\eta = 0.1$. The treatment is deemed superior to the control if $\Pr(\theta_t - \theta_c > 0 \mid D_c, D_t, D_h) > C$. The probability cutoff $C$ is calibrated for each method with 10,000 simulated trials such that under the null (i.e., $\theta_c = \theta_t = \theta_h$, corresponding to scenario 1 in Tables 1 and 2), the type I error is 5%. The treatment arm does not involve information borrowing, and the posterior of $\theta_t$ is obtained based on the conventional noninformative prior. Under other simulation configurations, we conducted 1000 simulations.

Table 1 shows the results for a normal endpoint. In scenarios 1 and 2, $D_h$ and $D_c$ are congruent. When the treatment is not effective (i.e., scenario 1), all methods control the type I error rate at its nominal value of 5%. When the treatment is effective (i.e., scenario 2), EP1, EP2, CP1, and PP offer substantial power gain over NP. For example, the power of EP1 is 27.1% higher than that of NP, and also slightly higher than that of CP1 and PP. EP2 has performance comparable to EP1. In contrast, CP2 provides little power improvement, indicating that the spike-and-slab prior is too conservative to borrow information. Similar results are observed in scenarios 3 and 4, where $D_h$ and $D_c$ are approximately congruent. Scenarios 5-8 consider the case in which $D_h$ and $D_c$ are incongruent. Specifically, in scenarios 5 and 6, the treatment is not effective, and the results are type I errors. Compared to CP1 and PP, EP1 and EP2 offer better type I error control. For example, in scenario 5, the type I errors of EP1 and EP2 are 7.7% and 7.3%, whereas the type I errors of CP1 and PP are 14.6% and 30%, respectively. CP2 has little type I error inflation because it barely borrows information, as demonstrated by its low power when $D_h$ and $D_c$ are congruent (i.e., scenarios 3 and 4). In scenarios 7 and 8, the treatment is effective, and the results are power. EP1 and EP2 yield higher power to detect the treatment effect than CP1 and PP.
For example, in scenario 7, the power of EP2 is 15.0% and 34.8% higher than that of CP1 and PP, respectively.

Table 2 shows the results for a binary endpoint, which are generally consistent with those for the normal endpoint. Scenarios 1 to 4 consider the case in which $D_h$ and $D_c$ are congruent or approximately congruent. In scenario 1, the treatment is not effective; all methods control the type I error rate at its nominal value of 5%. In scenario 2, the treatment is effective; EP1, EP2, CP1, and PP offer substantial power gain over NP. For example, the power of EP1 is 15.9% higher than that of NP, and comparable to that of CP1 and PP. As with the normal endpoint, CP2 is similar to NP, with little information borrowing. Similar results are observed in scenarios 3 and 4, where $D_h$ and $D_c$ are approximately congruent.

Scenarios 5-8 consider the case in which $D_h$ and $D_c$ are incongruent. Specifically, in scenarios 5 and 6 the treatment is not effective, so the reported percentages are type I errors. Compared to CP1, CP2 and PP, EP1 and EP2 offer better type I error control. For example, in scenario 5, the type I error of EP1 and EP2 is approximately 1/2 and 1/4 of that of CP1, 1/3 and 1/5 of that of PP, and 3.6% (7.4%) lower than that of CP2. In scenarios 7 and 8, the treatment is effective, so the reported percentages are power. EP1 and EP2 yield higher power to detect the treatment effect than CP1 and PP. For example, in scenario 7, the power of EP1 and EP2 is more than double that of CP1 and PP.

Multiple historical datasets

Taking a setting similar to that of the simulation with one historical dataset, we generated control arm data $D_c$ from $N(\theta_c, […])$ with $\theta_c = 1$ and sample size $n_c = 25$, and treatment arm data $D_t$ from $N(\theta_t, […])$ with $\theta_t = 1, […]$ and $n_t = 50$. The CMD is $\delta = 0.5$. We considered four historical datasets with sample sizes 40, 50, 45, and 55, respectively, generated from the following hierarchical model:

$$y_k \sim N(\theta_k, […]), \quad k = 1, \cdots, 4,$$
$$\theta_1, \theta_2, \theta_3, \theta_4 \sim N(\theta_h, […]).$$

We varied $\theta_h$ to simulate scenarios where $D_h$ is congruent or incongruent to $D_c$. As before, we considered both the smooth elastic function and the step elastic function, and denoted the resulting priors as elastic MAP 1 (EMAP1) and elastic MAP 2 (EMAP2), respectively.

We compared the elastic MAP priors with the robust MAP prior. Following Schmidli et al. (2014) [8], we considered two versions of the robust MAP prior: Mix50, with a weight of 0.5, and Mix90, with a weight of 0.1, assigned to the MAP. As the benchmark, we also considered the conventional NP that ignores historical data. The treatment is deemed superior to the control if $\Pr(\theta_t - \theta_c > 0 \mid D_c, D_t, D_h) > C$. The probability cutoff $C$ is calibrated for each method with 10,000 simulated trials such that under the null (i.e., $\theta_c = \theta_t = \theta_h$, corresponding to scenario 1 in Table 3), the type I error is 5%. Under other simulation configurations, we conducted 1000 simulations.

Table 3 shows the results. When historical data and control data are congruent (i.e., scenarios 1 to 4), EMAP1 and EMAP2 have comparable performance to Mix50 and Mix90. All methods control type I errors at the nominal value of 5% (scenario 1), and they yield substantially higher power than NP due to borrowing information from historical datasets. Scenarios 5-8 consider the case in which historical data and control data are incongruent. In scenarios 5-6, the treatment is ineffective, so the reported percentages are type I errors. EMAP1 and EMAP2 offer better type I error control than the robust MAP. For example, in scenario 5, the type I errors of EMAP1 and EMAP2 are 8.5% and 7.8%, whereas those of Mix50 and Mix90 are 14.1% and 26.4%, respectively. In addition, EMAP1 and EMAP2 provide substantial power gain over Mix50 and Mix90.
For example, in scenario 7, the power of EMAP1 is 17.5% and 25.9% higher than that of Mix50 and Mix90, respectively, and EMAP2 has 23.3% and 31.7% higher power than Mix50 and Mix90, respectively.

We have proposed the elastic prior to dynamically borrow information from historical data. Through the use of an elastic function, the elastic prior approach adaptively borrows information based on the congruence between trial data and historical data. The elastic function is constructed from a set of prespecified information-borrowing constraints, such that the prior will borrow information when historical and trial data are congruent, and refrain from information borrowing when historical and trial data are incongruent. The elastic prior is information-borrowing consistent, and is easy to quantify using a prior effective sample size. Our simulation study shows that, compared to existing methods, the elastic prior has better type I error control and yields competitive or higher power. In addition, we provide insights on what can and cannot be achieved using the information-borrowing method, which is useful for guiding future methodology development.

The good performance of the elastic prior stems from the use of the elastic function to regulate the behavior of information borrowing within the range of the parameter space of practical interest. That is, the elastic prior does not rely completely on the data to determine information borrowing. It also incorporates subject matter knowledge (e.g., when it should borrow or not) to enhance and govern the performance. In contrast, many existing methods intend to achieve dynamic information borrowing by estimating the information-borrowing parameter (e.g., the power parameter in the power prior or the shrinkage parameter in the commensurate prior), jointly with model parameters, based on data.
However, the data contain extremely limited information for estimating the information-borrowing parameter because the observation unit contributing to the estimation is the dataset, not subject-level observations. For example, one historical dataset and one trial dataset actually provide only two observations to estimate the information-borrowing parameter. This is a well-known issue in meta-analysis when estimating the between-study variation. As a result, these dynamic information borrowing methods cannot reliably sense the congruence or incongruence between historical data and trial data to perform appropriate information borrowing.

The idea of an elastic prior is general, and it can also be applied to both commensurate and power priors to improve their operating characteristics. We outline this approach in the Appendix. In addition, we have focused on two-arm randomized superiority trials with binary or normal endpoints. The methodology can be applied to single-arm and multiple-arm trials, as well as other types of trials, for example, noninferiority trials. Extension of the elastic prior to the time-to-event endpoint is of practical interest and warrants further research.

The type I error considered in this paper is referred to as the usual view, where the type I error rate is based on the current trial(s) alone. Pennello and Thompson (2008) [19] also discussed a view 2, in which the type I error rate is based on the current trial(s) and prior data considered together. The type I error rate under view 2 might be considered when we extrapolate adult results to a pediatric setting and know, before the adult trials, that the analysis in the pediatric setting will borrow from the adult results because of similarity in the course of disease, response to treatment, pharmacokinetics, and pharmacodynamics.

Disclaimer

This article reflects the views of the author, and it should not be construed to represent FDA views or policies.

[Figure 1: panels (a) and (b), each plotting $g(T)$ against $T$.]

Figure 1: (a) A class of smooth elastic functions defined by $g(T) = 1/[1 + \exp\{a + b \times \{\log(T)\}^c\}]$, and (b) a step elastic function, where $g(T) = 1$ leads to full information borrowing and $g(T) = 0$ leads to no information borrowing.

[Figure 2: the elastic function $g(T)$ plotted against $T$, overlaid on the density of $T$ when historical and control data are congruent and when they are incongruent; regions are labeled "Fully borrow", "Partially borrow", and "No borrow".]

Figure 2: Dynamic information borrowing through the elastic function.

Table 1: Simulation results for a normal endpoint using a noninformative prior (NP), elastic prior with the smooth elastic function (EP1) and step elastic function (EP2), commensurate prior with uniform prior (CP1) and spike-and-slab prior (CP2), and power prior (PP).

Percentage of claiming efficacy (PESS)
Scenario   θ_h    θ_c   θ_t   NP     EP1           EP2           CP1           CP2           PP
Congruent
[…]
Incongruent
[…]
6*         -0.5   1     1     5.0    7.1 (0.00)    7.0 (0.00)    10.0 (1.34)   8.8 (3.24)    18.3 (3.19)
7          2      1     1.5   66.3   72.3 (0.24)   72.6 (0.35)   57.6 (0.36)   60.7 (3.58)   37.8 (9.47)
8          2.5    1     1.5   66.3   72.3 (0.00)   72.6 (0.00)   69.2 (1.37)   57.6 (3.25)   49.1 (3.14)
* Type I error

Table 2 (binary endpoint):

Percentage of claiming efficacy (PESS)
Scenario   θ_h    θ_c   θ_t   NP     EP1           EP2           CP1           CP2           PP
Congruent
[…]
Incongruent
[…]

Table 3 (multiple historical datasets):

Percentage of claiming efficacy (PESS)
Scenario   θ_h    θ_c   θ_t   NP     EMAP1          EMAP2         Mix50          Mix90
Congruent
[…]
Incongruent
[…]
6*         -0.2   1     1     5.0    7.7 (0.03)     7.6 (0.00)    9.7 (15.00)    16.2 (29.18)
7          1.6    1     1.5   66.3   61.3 (11.45)   67.1 (8.41)   43.8 (14.99)   35.4 (29.18)
8          2      1     1.5   66.3   75.3 (0.00)    75.1 (0.15)   63.0 (15.00)   47.8 (29.18)
* Type I error

References

[1] US Food and Drug Administration. (2017). Use of real-world evidence to support regulatory decision-making for medical devices. Guidance for industry and Food and Drug Administration staff.
[2] US Food and Drug Administration. (2019). Submitting documents using real-world data and real-world evidence to FDA for drugs and biologics: guidance for industry: draft guidance. Rockville, MD: US Food and Drug Administration.
[3] Ibrahim, J. G., & Chen, M. H. (2000). Power prior distributions for regression models. Statistical Science, 15(1), 46-60.
[4] Ibrahim, J. G., Chen, M. H., & Sinha, D. (2003). On optimality properties of the power prior. Journal of the American Statistical Association, 98(461), 204-213.
[5] Hobbs, B. P., Carlin, B. P., Mandrekar, S. J., & Sargent, D. J. (2011). Hierarchical commensurate and power prior models for adaptive incorporation of historical information in clinical trials. Biometrics, 67(3), 1047-1056.
[6] Thall, P. F., Wathen, J. K., Bekele, B. N., Champlin, R. E., Baker, L. H., & Benjamin, R. S. (2003). Hierarchical Bayesian approaches to phase II trials in diseases with multiple subtypes. Statistics in Medicine, 22(5), 763-780.
[7] Berry, S. M., Broglio, K. R., Groshen, S., & Berry, D. A. (2013). Bayesian hierarchical modeling of patient subpopulations: efficient designs of phase II oncology clinical trials. Clinical Trials, 10(5), 720-734.
[8] Schmidli, H., Gsteiger, S., Roychoudhury, S., O'Hagan, A., Spiegelhalter, D., & Neuenschwander, B. (2014). Robust meta-analytic-predictive priors in clinical trials with historical control information. Biometrics, 70(4), 1023-1032.
[9] Pan, H., Yuan, Y., & Xia, J. (2017). A calibrated power prior approach to borrow information from historical data with application to biosimilar clinical trials. Journal of the Royal Statistical Society: Series C (Applied Statistics), 66(5), 979-996.
[10] Neuenschwander, B., Branson, M., & Spiegelhalter, D. J. (2009). A note on the power prior. Statistics in Medicine, 28(28), 3562-3566.
[11] Freidlin, B., & Korn, E. L. (2013). Borrowing information across subgroups in phase II trials: is it useful? Clinical Cancer Research, 19(6), 1326-1334.
[12] Chu, Y., & Yuan, Y. (2018). BLAST: Bayesian latent subgroup design for basket trials accounting for patient heterogeneity. Journal of the Royal Statistical Society: Series C (Applied Statistics), 67(3), 723-740.
[13] Hobbs, B. P., Carlin, B. P., & Sargent, D. J. (2013). Adaptive adjustment of the randomization ratio using historical control data. Clinical Trials, 10(3), 430-440.
[14] Morita, S., Thall, P. F., & Müller, P. (2008). Determining the effective sample size of a parametric prior. Biometrics, 64(2), 595-602.
[15] Neuenschwander, B., Weber, S., Schmidli, H., & O'Hagan, A. (2020). Predictively consistent prior effective sample sizes. Biometrics, 1-10.
[16] Chen, N., Carlin, B. P., & Hobbs, B. P. (2018). Web-based statistical tools for the analysis and design of clinical trials that incorporate historical controls. Computational Statistics & Data Analysis, 127, 50-68.
[17] Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC Press.
[18] Gelman, A., Meng, X. L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4), 733-760.
[19] Pennello, G., & Thompson, L. (2008). Experience with reviewing Bayesian medical device trials. Journal of Biopharmaceutical Statistics, 18(1), 81-115.

Appendix

A. Proof of Theorem 1

Suppose the chi-square test statistic is used to measure the congruence $T$ between $D_h$ and $D_c$. Since the chi-square statistic of homogeneity is consistent, $T \to 0$ as $n_h \to \infty$ and $n_c \to \infty$ when $D_h$ and $D_c$ are congruent. Given $b > 0$, $g(T) = 1/[1 + \exp\{a + b \times \log(T)\}] \to 1$. When $D_h$ and $D_c$ are incongruent, $T \to \infty$ as $n_h \to \infty$ and $n_c \to \infty$. Given $b > 0$, $g(T) = 1/[1 + \exp\{a + b \times \log(T)\}] \to 0$.

B. Grid search for percentile combination $(q_0, q_1)$

Let $(q_0^{(1)}, \cdots, q_0^{(J)})$ and $(q_1^{(1)}, \cdots, q_1^{(K)})$ denote the prespecified search grids for $q_0$ and $q_1$, respectively. We used $q_0^{(1)} = q_1^{(1)}$ = 0.[…], $q_0^{(J)} = q_1^{(K)} = 0.9$, and set a grid step of 0.1. The following steps are used to find the $(q_0, q_1)$ that optimizes the utility $U(q_0, q_1)$.

1. Given a specific grid point $(q_0^{(j)}, q_1^{(k)})$, determine the elastic function using equation (5).
2. Given the obtained elastic function, under the congruent case ($\theta_h = \theta_c$), calibrate the probability cutoff $C$ to control the type I error rate at a nominal value of 5% and compute the power ($\rho$) through simulation.
3. Given the cutoff $C$, compute the type I error ($\psi$) under the incongruent case (e.g., $\theta_c = \theta_h - \delta$).
4. Identify the $(q_0^{(j)}, q_1^{(k)})$ that produces the largest value of $U(q_0^{(j)}, q_1^{(k)}) = \rho - w_1 \psi - w_2 (\psi - \eta) I(\psi > \eta)$.

For the step elastic function, the calibration of $q_0$ is similar to that shown above. The main difference is that we only need to search over a one-dimensional grid $(q_0^{(1)}, \cdots, q_0^{(J)})$, which greatly reduces the optimization time.

C. Determination of elastic function for a normal endpoint

The steps to determine the elastic function are similar to those for the binary endpoint, and are described as follows:

1. Estimate the mean and variance of $D_h$ by $\hat{\theta}_h = \bar{y}_h$ and $\hat{\sigma}_h^2 = \sum_{i=1}^{n_h} (y_{h,i} - \bar{y}_h)^2 / (n_h - 1)$, where $\bar{y}_h = \sum_{i=1}^{n_h} y_{h,i} / n_h$.
2. Elicit from subject matter experts a clinically meaningful difference $\delta$ for $E(y)$.
3. (Congruent case) Simulate $R$ replicates of $D_c = (y_{c,1}, \cdots, y_{c,n_c})$ from $N(\hat{\theta}_h, \hat{\sigma}_h^2)$, and calculate the congruence measure $T$ between $D_h$ and each simulated $D_c$, resulting in $T^0 = (T^0_1, \cdots, T^0_R)$, where $T^0_r$ denotes the value of $T$ based on the $r$th simulated $D_c$.
4. (Incongruent cases) Simulate $R$ replicates of $D_c$ from $N(\hat{\theta}_h + \delta, \hat{\sigma}_h^2)$, and calculate the congruence measure $T$ between $D_h$ and each simulated $D_c$, resulting in $T^+ = (T^+_1, \cdots, T^+_R)$, where $T^+_r$ denotes the value of $T$ based on the $r$th simulated $D_c$. Repeat this with $D_c$ simulated from $N(\hat{\theta}_h - \delta, \hat{\sigma}_h^2)$, resulting in $T^- = (T^-_1, \cdots, T^-_R)$, where $T^-_r$ denotes the value of $T$ between $D_h$ and the $r$th simulated $D_c$.
5. Let $C_1$ and $C_2$ be constants close to 1 and 0, respectively, e.g., $C_1 = 0.99$ and $C_2$ = 0.[…]. […] $T_{q_0}$ denotes the $q_0$th percentile of $T^0$, $T^+_{q_1}$ and $T^-_{q_1}$ denote the $q_1$th percentile of $T^+$ and $T^-$, respectively, and define $T_{q_1} = \min(T^+_{q_1}, T^-_{q_1})$.
6. Based on $T_{q_0}$ and $T_{q_1}$, determine the elastic function (2) by equation (5).

D. Elastic power prior and elastic commensurate prior

The idea of an elastic prior can also be applied to the power prior and the commensurate prior, and we refer to the results as the elastic power prior and the elastic commensurate prior.
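To make the idea concrete for a binary endpoint, the following is a minimal sketch, not the authors' implementation. It links a congruence measure $T$ to a borrowing weight $g(T)$ and then forms the Beta posterior of the elastic power prior given in equations (15) and (16) of D1. Several pieces are assumptions made for illustration: the function names, the logistic form used for $g$, the values of `a` and `b` (which would normally come from the calibration in Section 2), the Beta(1, 1) initial prior, and the use of the Pearson chi-square statistic for a 2x2 table as $T$.

```python
import math


def chi_square_homogeneity(x_h, n_h, x_c, n_c):
    """Pearson chi-square statistic for a 2x2 table (historical vs. control).

    x_h, x_c are response counts; n_h, n_c are sample sizes.
    """
    p_pool = (x_h + x_c) / (n_h + n_c)
    if p_pool in (0.0, 1.0):
        return 0.0  # both samples degenerate and identical: no incongruence
    stat = 0.0
    for x, n in ((x_h, n_h), (x_c, n_c)):
        # Responders and non-responders, observed vs. expected under pooling.
        for obs, exp in ((x, n * p_pool), (n - x, n * (1.0 - p_pool))):
            stat += (obs - exp) ** 2 / exp
    return stat


def smooth_elastic(T, a, b):
    """Assumed logistic elastic function g(T) = 1 / (1 + exp(a + b*log T)), b > 0."""
    if T <= 0.0:
        return 1.0  # limit of g as T -> 0 when b > 0 (full borrowing)
    z = a + b * math.log(T)
    if z > 700.0:
        return 0.0  # avoid overflow in exp; g is effectively zero here
    return 1.0 / (1.0 + math.exp(z))


def elastic_power_posterior(g, x_h, n_h, x_c, n_c, alpha=1.0, beta=1.0):
    """Beta posterior parameters of p under the elastic power prior.

    With x_h = n_h * ybar_h and x_c = n_c * ybar_c this is the Beta
    distribution of equation (16); alpha and beta are the initial
    prior parameters (Beta(1, 1) is an assumption of this sketch).
    """
    post_a = g * x_h + alpha + x_c
    post_b = g * (n_h - x_h) + beta + (n_c - x_c)
    return post_a, post_b


if __name__ == "__main__":
    a, b = -2.0, 3.0  # hypothetical values, not calibrated
    for x_c in (12, 22):  # control responders out of 40; historical is 30/100
        T = chi_square_homogeneity(30, 100, x_c, 40)
        g = smooth_elastic(T, a, b)
        print(x_c, T, g, elastic_power_posterior(g, 30, 100, x_c, 40))
```

With $g(T) = 1$ the historical counts are pooled in full, and with $g(T) = 0$ the posterior reduces to the one based on the control data and the initial prior alone, which are the two limiting behaviours the elastic function is meant to interpolate between.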

D1. Elastic power prior

With the power prior, the power parameter $\delta$ is treated as an unknown parameter, while with an elastic power prior, $\delta$ is linked with $T$ by an elastic function $g(\cdot)$, that is,

$$\delta = g(T; \phi). \tag{11}$$

Then the elastic power prior is given by

$$\pi^*(\theta \mid D_h) \propto \pi(\theta) f(D_h \mid \theta)^{g(T)}, \tag{12}$$

where the elastic function $g(T)$ is the same as in equation (2), which maps the support of $T$ to $[0, 1]$. For a normal endpoint, the elastic power prior of $(\theta, \sigma^2)$ is

$$\pi^*(\theta, \sigma^2 \mid D_h) \propto \left(\frac{1}{\sigma^2}\right)^{m + \frac{g(T) n_h}{2}} \exp\left[-\frac{g(T) n_h}{2\sigma^2} \left\{\hat{\sigma}_h^2 + (\theta - \bar{y}_h)^2\right\}\right] \propto N_\theta\left(\bar{y}_h, \frac{\sigma^2}{g(T) n_h}\right) IG_{\sigma^2}\left(m + \frac{g(T) n_h - 3}{2}, \frac{g(T) n_h \hat{\sigma}_h^2}{2}\right). \tag{13}$$

Given current data, the posterior distribution for $(\theta, \sigma^2)$ is

$$\pi(\theta, \sigma^2 \mid D, D_h) \propto N_\theta\left(\frac{g(T) n_h \bar{y}_h + n_c \bar{y}_c}{g(T) n_h + n_c}, \frac{\sigma^2}{g(T) n_h + n_c}\right) IG_{\sigma^2}(\alpha_\Delta, \beta_\Delta), \tag{14}$$

where

$$\alpha_\Delta = m + \frac{g(T) n_h + n_c - 3}{2}, \qquad \beta_\Delta = \frac{1}{2}\left[\sum_{i=1}^{n_c} y_{c,i}^2 + g(T) n_h \bar{y}_h^2 + g(T) n_h \hat{\sigma}_h^2 - \frac{\left(g(T) n_h \bar{y}_h + n_c \bar{y}_c\right)^2}{n_h g(T) + n_c}\right].$$

For a binary endpoint, the elastic power prior of $p$ is

$$\pi^*(p \mid D_h) \propto p^{g(T) n_h \bar{y}_h + \alpha - 1} (1 - p)^{g(T)(n_h - n_h \bar{y}_h) + \beta - 1} \propto Beta\left(g(T) n_h \bar{y}_h + \alpha,\ g(T)(n_h - n_h \bar{y}_h) + \beta\right). \tag{15}$$

Based on the current data $D_c$, the posterior of $p$ is given as

$$\pi(p \mid D, D_h) \propto Beta\left(g(T) n_h \bar{y}_h + \alpha + n_c \bar{y}_c,\ g(T)(n_h - n_h \bar{y}_h) + \beta + n_c - n_c \bar{y}_c\right). \tag{16}$$

D2. Elastic commensurate prior

With a commensurate prior, the shrinkage parameter $\tau$ controls the degree to which $\theta$ shrinks to $\theta_h$, and it is assumed unknown with a prior. With an elastic commensurate prior, however, $\tau$ is determined by $T$ through the elastic function $g(T)$, i.e.,

$$\tau = g(T; \phi). \tag{17}$$

Then the elastic commensurate prior for $\theta$ is

$$\pi^*(\theta \mid D_h, g(T)) \propto \int_{\theta_h} f(D_h \mid \theta_h)\, \pi(\theta \mid \theta_h, g(T))\, \pi(\theta_h)\, d\theta_h. \tag{18}$$

Since $\tau$ is located in $(0, +\infty)$, we adopt the following elastic function:

$$g(T) = \exp(a + b \cdot \log(T)). \tag{19}$$

If a larger value of $T$ indicates more incongruence between $D_c$ and $D_h$, we require $b < 0$ such that a larger value of $T$ leads to a smaller value of $g(T)$ (i.e., a larger variance inflation). The calibration of $g(T)$ is similar to that described in Section 2.

Let us return to the Gaussian case. We first focus on the historical information borrowing for the location parameter $\theta$, that is, $\theta \mid \theta_h \sim N(\theta_h, \tau^{-1})$, where $\tau = g(T)$. Assuming a flat prior $\pi(\theta_h) \propto 1$ for $\theta_h$, the elastic commensurate prior for $\theta$ is

$$\pi^*(\theta \mid D_h, g(T)) \propto N\left(\bar{y}_h, \frac{1}{g(T)} + \frac{\hat{\sigma}_h^2}{n_h}\right). \tag{20}$$

Multiplying the above elastic commensurate prior by the current likelihood, we obtain the following posterior distribution for $\theta$:

$$\pi(\theta \mid D, D_h, \sigma^2) \propto N\left(\frac{n_c \bar{y}_c \Delta + \sigma^2 \bar{y}_h}{n_c \Delta + \sigma^2}, \frac{\sigma^2 \Delta}{n_c \Delta + \sigma^2}\right), \tag{21}$$

where $\Delta = \frac{1}{g(T)} + \frac{\hat{\sigma}_h^2}{n_h}$.

If information borrowing is required for both the location parameter $\theta$ and the scale parameter $\sigma^2$, a new precision parameter $\zeta$ is introduced to measure the commensurability between $\sigma^2$ and $\sigma_h^2$. Specifically, we assign $\sigma^2$ a prior that is centered at $\sigma_h^2$ with precision $\zeta$, e.g., $\sigma^2 \mid \sigma_h^2 \sim IG(\sigma_h^2, \zeta^{-1})$, where $IG(\cdot)$ is an inverse gamma distribution with mean $\sigma_h^2$ and variance $\zeta^{-1}$. With an elastic commensurate prior, the precisions are $\tau = g_1(T)$ and $\zeta = g_2(T)$. Given historical data $D_h$, assuming a prior $\pi(\sigma_h^2) \propto (\sigma_h^2)^{-m}$ for $\sigma_h^2$ and integrating out $\theta_h$, the joint elastic commensurate prior for $(\theta, \sigma^2)$ is

$$\pi^*(\theta, \sigma^2, \sigma_h^2 \mid D_h, g_1(T), g_2(T)) \propto f(D_h \mid \theta_h, \sigma_h^2)\, N_\theta\left(\theta_h, g_1(T)^{-1}\right) IG_{\sigma^2}(\alpha', \beta') \times (\sigma_h^2)^{-m} \propto N_\theta\left(\bar{y}_h, \frac{1}{g_1(T)} + \frac{\sigma_h^2}{n_h}\right) IG_{\sigma^2}(\alpha', \beta') \times IG_{\sigma_h^2}\left(\frac{n_h - 3}{2} + m, \frac{n_h \hat{\sigma}_h^2}{2}\right), \tag{22}$$

where $\alpha' = g_2(T)\, \sigma_h^4 + 2$ and $\beta'$ = $\sigma_h^2$ ($g_2(T)\, \sigma_h^4$ …