Elastic Priors to Dynamically Borrow Information from Historical Data in Clinical Trials
Liyun Jiang, Lei Nie, and Ying Yuan

Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX
Center for Drug Evaluation and Research, Food and Drug Administration (FDA), Silver Spring, MD
Research Center of Biostatistics and Computational Pharmacy, China Pharmaceutical University, Nanjing, China
Abstract: Use of historical data and real-world evidence holds great potential to improve the efficiency of clinical trials. One major challenge is how to effectively borrow information from historical data while maintaining a reasonable type I error. We propose the elastic prior approach to address this challenge and achieve dynamic information borrowing. Unlike existing approaches, this method proactively controls the behavior of dynamic information borrowing and type I errors by incorporating the well-known concept of a clinically meaningful difference through an elastic function, defined as a monotonic function of a congruence measure between historical data and trial data. The elastic function is constructed to satisfy a set of information-borrowing constraints prespecified by researchers or regulatory agencies, such that the prior will borrow information when historical and trial data are congruent, but refrain from information borrowing when historical and trial data are incongruent. In doing so, the elastic prior improves power and reduces the risk of data dredging and bias. The elastic prior is information-borrowing consistent, i.e., it asymptotically controls type I and II errors at the nominal values when historical data and trial data are not congruent, a unique characteristic of the elastic prior approach. Our simulation study, which evaluates the finite-sample characteristics, confirms that, compared to existing methods, the elastic prior has better type I error control and yields competitive or higher power.

KEY WORDS: Real-world data; Historical data; Dynamic information borrowing; Elastic prior; Elastic MAP prior; Adaptive design

Real-world data (RWD) or evidence plays an increasingly important role in health care decisions.
The 21st Century Cures Act, signed into law in 2016, emphasizes modernization of clinical trial designs, including the use of real-world evidence to support approval of new indications for approved drugs or to satisfy post-approval study requirements. The FDA released related guidance, "Use of Real-World Evidence to Support Regulatory Decision-Making for Medical Devices" [1], in 2017, and a draft guidance, "Submitting Documents Using Real-World Data and Real-World Evidence to FDA for Drugs and Biologics Guidance for Industry" [2], in 2019.

Use of RWD to facilitate medical decisions is an extremely broad topic. Here we focus on the use of historical data to improve the efficiency and guide the decision making of randomized controlled trials (RCTs). For ease of exposition, we assume two-arm RCTs in which historical data are only available for the control. It is straightforward to extend the proposed methodology to multiple-arm RCTs and to cases where historical data are also available for the treatment arm. The question of interest is how to leverage information from historical data to increase the power of comparing the treatment efficacy between the control and treatment arms. This problem is also known as augmenting the control arm with historical data or RWD.

Under the Bayesian paradigm, such information borrowing is straightforward if historical data $D_h$ are congruent (or exchangeable) with control data $D_c$. Let $\theta$ denote the parameter of interest (e.g., the mean of the efficacy endpoint). We start by assigning $\theta$ a non-informative or vague prior $\pi(\theta)$, combine it with $D_h$ to obtain the posterior $\pi(\theta \mid D_h)$, and then use that posterior as the prior for $D_c$ to make the comparison between the control and treatment arms. Such full information borrowing, however, is not appropriate when $D_h$ are partially or not congruent with $D_c$, because it leads to bias. If the bias favors treatment, the type I error rate will be inflated.
If the bias favors control, the power of the study will be reduced.

Various approaches have been proposed for dynamic information borrowing, such that the amount of information borrowed from $D_h$ is automatically adjusted according to the congruence between $D_h$ and $D_c$. Chen and Ibrahim [3, 4] proposed a power prior, which controls the degree of information borrowing through a "power parameter." Hobbs et al. (2011) [5] proposed a commensurate prior that allows the commensurability of the information in the historical data and current data to determine how much historical information to use. Thall et al. (2003) [6] and Berry et al. (2013) [7] proposed using the Bayesian hierarchical model to borrow information from different data resources or subgroups. Schmidli et al. (2014) [8] proposed a robust meta-analytic-predictive (MAP) prior to borrow information from historical data via a mixture prior. Pan, Yuan, and Xia (2017) [9] proposed a calibrated power prior, assuming the availability of patient-level historical data. However, most of these methods have difficulty achieving dynamic information borrowing, leading to substantially inflated type I error and bias, as noted previously by Neuenschwander et al. [10], Freidlin and Korn [11], and Chu and Yuan [12], among others.

In this paper, we propose a general Bayesian method with elastic priors to address the aforementioned issue. Unlike many existing approaches, the proposed method proactively controls the behavior of dynamic information borrowing through an elastic function, defined as a monotonic function of a congruence measure between $D_h$ and $D_c$. The elastic function is constructed to satisfy a set of prespecified information-borrowing constraints. For example, a borrowing constraint can be set based on a prespecified clinically meaningful difference such that the amount of borrowing decreases when the difference between $D_h$ and $D_c$ increases. This control leads to a substantially reduced risk of bias.
Asymptotically, the elastic prior approach maintains type I and II errors at the nominal values when $D_h$ and $D_c$ are not congruent. In contrast, most existing dynamic information borrowing methods, including the power prior, commensurate prior, and robust MAP prior, do not have this characteristic. The elastic prior also demonstrates superior finite-sample characteristics. Our simulation study confirms that, compared to existing methods, the elastic prior approach controls type I errors better while yielding competitive or higher power. Other desirable characteristics of the elastic prior approach are that it is straightforward to determine the prior effective sample size (PESS) contained in the elastic prior, and that the elastic prior is defined independently of trial data $D_c$ and thus can be fully pre-specified.

The remainder of this article is organized as follows. In Section 2, we introduce the elastic prior method. In Section 3, we evaluate the operating characteristics of the proposed method using simulation, and we conclude with a brief discussion in Section 4.

2 Methods
Consider a two-arm RCT, and let $y$ denote the efficacy endpoint, which is either a binary variable following a Bernoulli distribution or a continuous variable following a normal distribution. Let $\theta_c$ and $\theta_t$ denote $E(y)$ for the control and treatment arms, respectively. The objective of the trial is to compare $\theta_t$ with $\theta_c$ to determine whether the treatment is superior, noninferior, or equivalent to the control. Under the Bayesian paradigm, the decision can be made based on the following criterion: the treatment is deemed superior, noninferior, or equivalent to the control if $\Pr(M_L < \theta_t - \theta_c < M_H \mid D_c, D_t, D_h) > C$, where $M_L$ and $M_H$ are prespecified margins and $C$ is a probability cutoff. For example, superiority trials typically set $M_L = 0$ and $M_H = \infty$; noninferiority trials set $M_H = \infty$ and $M_L = -M$, where $M$ is the noninferiority margin; and equivalence trials set $(M_L, M_H) = (-E, E)$, where $E$ is the equivalence margin.

We assume that historical data $D_h$ are only available for the control. Thus, we focus on the posterior inference of $\theta_c$ and suppress its subscript when no confusion is caused. In the analysis, the posterior inference for $\theta_t$ will be done using standard Bayesian methods (e.g., using a conventional noninformative or vague prior).

The basic idea of an elastic prior is straightforward. Let $\pi(\theta)$ denote a vague initial prior that reflects prior knowledge about $\theta$ before $D_h$ is observed. Applying the prior $\pi(\theta)$ to $D_h$, we obtain a posterior distribution $\pi(\theta \mid D_h)$. The elastic prior is constructed by inflating the variance of $\pi(\theta \mid D_h)$ by a factor of $g(T)^{-1}$, where $T$ is a congruence measure between $D_h$ and $D_c$, and $g(T)$ is a monotonically decreasing function with values between 0 and 1. When $T \to 0$, reflecting perfect congruence between $D_h$ and $D_c$, $g(T) \to 1$ and $\pi(\theta \mid D_h)$ will be fully used as a prior. When $T \to \infty$, reflecting substantial incongruence between $D_h$ and $D_c$, $g(T) \to 0$, so that no information is borrowed from $D_h$.

Let $n_h$ and $n_c$ respectively denote the sample sizes of $D_h$ and $D_c$, with $D_c = (y_{c,1}, \cdots, y_{c,n_c})$ and $D_h = (y_{h,1}, \cdots, y_{h,n_h})$, where $y_{h,i} \overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(\theta_h)$ and $y_{c,i} \overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(\theta)$. Let $\bar{y}_h = \sum_{i=1}^{n_h} y_{h,i} / n_h$ and $\bar{y}_c = \sum_{i=1}^{n_c} y_{c,i} / n_c$. Assuming a vague prior $\pi(\theta_h) \sim \mathrm{Beta}(\alpha_0, \beta_0)$ with small values of $\alpha_0$ and $\beta_0$ (e.g., $\alpha_0 = \beta_0 = 0.[\ldots]$), we apply $\pi(\theta_h)$ to $D_h$, which results in a posterior of $\theta_h$ of the form

$$\pi(\theta_h \mid D_h) \propto \mathrm{Beta}(\alpha_0 + n_h \bar{y}_h,\; \beta_0 + n_h - n_h \bar{y}_h).$$

The elastic prior is given by

$$\pi^*(\theta \mid D_h) \propto \mathrm{Beta}\big((\alpha_0 + n_h \bar{y}_h)\, g(T),\; (\beta_0 + n_h - n_h \bar{y}_h)\, g(T)\big). \quad (1)$$

The elastic prior $\pi^*(\theta \mid D_h)$ has the same mean as $\pi(\theta \mid D_h)$, but inflates the latter's variance by a factor of approximately $g(T)^{-1}$. Given $\pi^*(\theta \mid D_h)$, the posterior of $\theta$ after accounting for $D_c$ is

$$\pi(\theta \mid D_h, D_c) = \mathrm{Beta}\big((\alpha_0 + n_h \bar{y}_h)\, g(T) + n_c \bar{y}_c,\; (\beta_0 + n_h - n_h \bar{y}_h)\, g(T) + n_c - n_c \bar{y}_c\big).$$

We now discuss how to choose the congruence measure $T$ and elastic function $g(\cdot)$. For a binary endpoint, there are many different choices for the congruence measure $T$. For example, we may consider

$$T = \frac{|\bar{y}_c - \bar{y}_h|}{\sqrt{\bar{y}(1 - \bar{y})\left(\frac{1}{n_c} + \frac{1}{n_h}\right)}},$$

where $\bar{y} = (\bar{y}_c n_c + \bar{y}_h n_h) / (n_c + n_h)$ is a pooled sample mean.
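As a rough illustration of the elastic Beta prior in (1) and the posterior that follows it, the sketch below treats $g(T)$ as a number already computed in $[0, 1]$. The function names and the default $\alpha_0 = \beta_0 = 0.1$ are illustrative assumptions, not values taken from the paper.

```python
def elastic_beta_posterior(n_h, ybar_h, n_c, ybar_c, g, alpha0=0.1, beta0=0.1):
    """Elastic Beta prior (1) and the posterior after observing D_c.

    g is the value of the elastic function g(T), assumed to lie in [0, 1].
    alpha0 and beta0 are hypothetical vague-prior parameters (assumption).
    Returns ((prior_a, prior_b), (post_a, post_b)).
    """
    if not 0.0 <= g <= 1.0:
        raise ValueError("g must lie in [0, 1]")
    # Posterior of theta_h given D_h under the vague Beta(alpha0, beta0) prior.
    a_h = alpha0 + n_h * ybar_h
    b_h = beta0 + n_h - n_h * ybar_h
    # Elastic prior: both Beta parameters are discounted by g(T).
    prior_a, prior_b = a_h * g, b_h * g
    # Conjugate update with the current control data D_c.
    post_a = prior_a + n_c * ybar_c
    post_b = prior_b + n_c - n_c * ybar_c
    return (prior_a, prior_b), (post_a, post_b)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


if __name__ == "__main__":
    # Full borrowing (g = 1), partial borrowing (g = 0.5), no borrowing (g = 0).
    for g in (1.0, 0.5, 0.0):
        prior, post = elastic_beta_posterior(100, 0.40, 40, 0.50, g)
        print(g, prior, round(beta_mean(*post), 4))
```

With $g = 0$ the prior degenerates to the improper Beta(0, 0) and the posterior depends on $D_c$ alone; with $g = 1$ the historical posterior is used in full.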
While different choices of $T$ have different advantages, in this paper we choose the chi-square test statistic:

$$T = \sum_{j=c,h} \frac{(O_{j0} - E_{j0})^2}{E_{j0}} + \sum_{j=c,h} \frac{(O_{j1} - E_{j1})^2}{E_{j1}},$$

where $O_{j1}$ and $O_{j0}$ are the observed numbers of responders and non-responders for $D_c$ and $D_h$; $E_{j1}$ and $E_{j0}$ are the expected numbers of responders and non-responders, which are given by

$$E_{j0} = n_j \frac{\sum_{j'=c,h} n_{j'} - \sum_{j'=c,h} \sum_{i=1}^{n_{j'}} y_{j',i}}{\sum_{j'=c,h} n_{j'}}, \qquad E_{j1} = n_j \frac{\sum_{j'=c,h} \sum_{i=1}^{n_{j'}} y_{j',i}}{\sum_{j'=c,h} n_{j'}}.$$

A large value of $T \in (0, \infty)$ indicates low congruence between $D_c$ and $D_h$.

The elastic function $g(T)$ serves as a link function that maps the congruence measure $T$ to an information discount factor. Any monotonic function could be used as an elastic function, as long as $g(T) \to 1$ when $T$ corresponds to congruence and $g(T) \to 0$ when $T$ corresponds to incongruence. In this paper, we choose

$$g(T) = \frac{1}{1 + \exp\{a + b \times \log(T)\}}, \quad (2)$$

where $a$ and $b > 0$ are parameters whose choice we discuss later. When appropriate, a more flexible elastic function $g(T) = 1 / \big(1 + \exp[a + b \times \{\log(T)\}^c]\big)$ can be used to further control the rate of change from borrowing to no borrowing using the additional parameter $c$ (see Figure 1 (a)). It can be shown that the resulting elastic prior has the following consistency property:

Theorem 1. The elastic prior defined in (1) is information-borrowing consistent. That is, when $n_h \to \infty$ and $n_c \to \infty$, it achieves full information borrowing if $D_h$ and $D_c$ are congruent (i.e., $\theta_h = \theta$), and discards $D_h$ if $D_h$ and $D_c$ are incongruent (i.e., $\theta_h \neq \theta$).

The biggest concern and barrier for adopting information-borrowing methods in clinical trials is the potential risk of type I or II error inflation caused by information borrowing when $D_h$ and $D_c$ are actually incongruent. Theorem 1 shows that, asymptotically, the elastic prior maintains the type I error at the nominal value when $D_h$ and $D_c$ are not congruent.
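The chi-square congruence measure and the logistic elastic function (2) described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the values of `a` and `b` in the demo are arbitrary rather than calibrated, and returning 0 for a degenerate table is an assumption made here to avoid division by zero.

```python
import math


def chi_square_congruence(x_c, n_c, x_h, n_h):
    """Pearson chi-square statistic T for the 2x2 table of D_c and D_h.

    x_c, x_h: observed numbers of responders; n_c, n_h: sample sizes.
    """
    total_n = n_c + n_h
    total_resp = x_c + x_h
    if total_resp == 0 or total_resp == total_n:
        return 0.0  # degenerate table (assumption: treat as fully congruent)
    t = 0.0
    for x_j, n_j in ((x_c, n_c), (x_h, n_h)):
        e_resp = n_j * total_resp / total_n             # expected responders
        e_non = n_j * (total_n - total_resp) / total_n  # expected non-responders
        t += (x_j - e_resp) ** 2 / e_resp
        t += ((n_j - x_j) - e_non) ** 2 / e_non
    return t


def elastic_logistic(t, a, b):
    """Elastic function (2): g(T) = 1 / (1 + exp(a + b * log(T))), with b > 0."""
    if t <= 0.0:
        return 1.0  # limit of g(T) as T -> 0 when b > 0
    z = a + b * math.log(t)
    return 1.0 / (1.0 + math.exp(min(z, 700.0)))  # cap z to avoid overflow


if __name__ == "__main__":
    a, b = 0.0, 2.0  # arbitrary illustrative values, not calibrated
    for x_c in (16, 20, 28):
        t = chi_square_congruence(x_c, 40, 40, 100)
        print(x_c, round(t, 3), round(elastic_logistic(t, a, b), 3))
```

As the observed control response rate moves away from the historical rate, $T$ grows and $g(T)$ falls toward 0, which is the behavior that Theorem 1 relies on asymptotically.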
In contrast, most existing dynamic information borrowing methods, including the power prior, commensurate prior, and robust MAP prior, do not have this property. To achieve information-borrowing consistency, they typically require the number of historical datasets (not the number of observations within each historical dataset) to go to infinity, which is not the case in practice.

In finite samples, however, strictly controlling the type I error at its nominal value is impossible for any information-borrowing method, including the elastic prior approach. The reason is simple: when $\theta_h \neq \theta$, the type I error inflates whenever information borrowing is triggered. With finite samples, even when $\theta_h \neq \theta$, there is a non-zero probability that the observed $D_h$ and $D_c$ are comparable and trigger (inappropriate) information borrowing, thus inflating the type I or II error.

Theorem 2
For any method that borrows information from historical or other external data, dynamically or non-dynamically, the inflation of type I or II error is inevitable under finite samples, depending on whether historical or other external data under- or over-estimate the treatment effect of the control arm when compared to the current data.

Theorem 2 is important because it sets a realistic expectation for information-borrowing methods and avoids vain efforts to pursue a dynamic information borrowing method that can strictly control type I errors in finite samples.

Since the inflation of type I or II errors is inevitable with information borrowing, one reasonable strategy is to control type I and II error inflation according to certain prespecified criteria. This motivates the following procedure to choose the elastic function (2), as illustrated in Figure 2. Without loss of generality, we assume a large value of $T$ indicates larger incongruence between $D_h$ and $D_c$.

1. Elicit from subject matter experts a clinically meaningful difference (CMD), denoted as $\delta$, for $E(y)$. The CMD is routinely used in clinical trial planning, including for sample size determination and power calculation, and its determination often requires communication between sponsors and regulatory bodies.

2. (Congruent case) Simulate $R$ replicates of $D_c = (y_{c,1}, \cdots, y_{c,n_c})$ from $\mathrm{Bernoulli}(\hat{\theta}_h)$, with $\hat{\theta}_h = \bar{y}_h$, and calculate the congruence measure $T$ between $D_h$ and each simulated $D_c$, resulting in $\mathbf{T}^0 = (T^0_1, \cdots, T^0_R)$, where $T^0_r$ denotes the value of $T$ based on the $r$th simulated $D_c$.

3. (Incongruent cases) Simulate $R$ replicates of $D_c$ from $\mathrm{Bernoulli}(\hat{\theta}_h + 2\delta)$, and calculate the congruence measure $T$ between $D_h$ and each simulated $D_c$, resulting in $\mathbf{T}^+ = (T^+_1, \cdots, T^+_R)$, where $T^+_r$ denotes the value of $T$ based on the $r$th simulated $D_c$. Repeat this with $D_c$ simulated from $\mathrm{Bernoulli}(\hat{\theta}_h - 2\delta)$, resulting in $\mathbf{T}^- = (T^-_1, \cdots, T^-_R)$, where $T^-_r$ denotes the value of $T$ between $D_h$ and the $r$th simulated $D_c$.

4. Let $C_1$ and $C_2$ be constants close to 1 and 0, respectively, e.g., $C_1 = 0.99$ and $C_2 = 0.[\ldots]$. Let $T^0_{q_1}$ denote the $q_1$th percentile of $\mathbf{T}^0$, let $T^+_{q_2}$ and $T^-_{q_2}$ denote the $q_2$th percentile of $\mathbf{T}^+$ and $\mathbf{T}^-$, respectively, and define $T_{q_2} = \min(T^+_{q_2}, T^-_{q_2})$. Determine the elastic function (2) by solving the following two equations:

$$C_1 = g(T^0_{q_1}), \quad (3)$$
$$C_2 = g(T_{q_2}), \quad (4)$$

where the first equation enforces (approximately) full information borrowing, and the second essentially enforces no information borrowing. This leads to the solution

$$g(T) = \frac{1}{1 + \exp\{a + b \times \log(T)\}},$$

where

$$a = \log\left(\frac{1 - C_1}{C_1}\right) - \frac{\log\left(\frac{(1 - C_1) C_2}{(1 - C_2) C_1}\right) \log(T^0_{q_1})}{\log(T^0_{q_1}) - \log(T_{q_2})}, \qquad b = \frac{\log\left(\frac{(1 - C_1) C_2}{(1 - C_2) C_1}\right)}{\log(T^0_{q_1}) - \log(T_{q_2})}. \quad (5)$$

Several remarks are warranted. In step 3, we generate incongruent cases by simulating $D_c$ from $\mathrm{Bernoulli}(\hat{\theta}_h \pm 2\delta)$, rather than $\mathrm{Bernoulli}(\hat{\theta}_h \pm \delta)$ (i.e., right at the CMD), because the objective of step 3 is to simulate highly incongruent cases so that information borrowing is prevented by equation (4) in step 4. As it is often regarded as reasonable to borrow some information when the difference between $D_h$ and $D_c$ is smaller than the CMD, it is not appropriate to set the no-borrowing constraint right at the boundary. In step 4, as incongruence can occur in either direction (i.e., $\theta_c$ is larger or smaller than $\theta_h$), we take $T_{q_2} = \min(T^+_{q_2}, T^-_{q_2})$ to ensure no information borrowing under the more conservative direction.

In step 4, $q_1$ and $q_2$ define the borrowing and no-borrowing regions (see Figure 2). We may simply choose $q_1 = q_2 = 0.5$, i.e., the medians of $\mathbf{T}^0$, $\mathbf{T}^+$, and $\mathbf{T}^-$. A better and more flexible approach is to choose $q_1$ and $q_2$ to optimize the trade-off between the power (in the congruent case) and the type I error (in the incongruent case). Toward this goal, let $\rho$ denote the power under the congruent case, $\psi$ denote the type I error under the incongruent case described in step 3, and $\eta$ denote a type I error threshold. We define the utility

$$U(q_1, q_2) = \rho - w_1 \psi - w_2 (\psi - \eta) I(\psi > \eta), \quad (6)$$

where $w_1$ and $w_2$ are penalty weights. This utility imposes a penalty of $w_1$ for each unit increase of the type I error before it reaches $\eta$, and a penalty of $w_1 + w_2$ thereafter. In our simulation, we set $w_1 = 1$, $w_2 = 2$, and $\eta = 0.1$, which means that before the type I error reaches 0.1, the penalty for a 1% increase in the type I error is a 1% deduction in power; once the type I error exceeds 0.1, the penalty for a 1% increase in the type I error rises to a 3% deduction in power. Through a grid search (see Appendix for the procedure), we can identify the $(q_1, q_2)$ that maximize $U(q_1, q_2)$. Although this approach is more complicated than directly setting $q_1 = q_2 = 0.5$, it results in better performance, and thus we generally recommend it.

A special form of the elastic function, with $T^0_{q_1} \equiv T_{q_2}$ (see Figure 1 (b)), is the following step function:

$$g(T) = \begin{cases} 1, & T \le T^0_{q_1}, \\ 0, & T > T^0_{q_1}, \end{cases} \quad (7)$$

where full information borrowing occurs if $T \le T^0_{q_1}$, and no information borrowing occurs if $T > T^0_{q_1}$. Compared to the smooth elastic function (2), one advantage of the step elastic function is that its calibration is simpler, needing only two steps:

1. (Congruent case) Simulate $R$ replicates of $D_c = (y_{c,1}, \cdots, y_{c,n_c})$ from $\mathrm{Bernoulli}(\hat{\theta}_h)$, with $\hat{\theta}_h = \bar{y}_h$, and calculate the congruence measure $T$ between $D_h$ and each of the simulated $D_c$'s, resulting in $\mathbf{T}^0 = (T^0_1, \cdots, T^0_R)$, where $T^0_r$ denotes the value of $T$ based on the $r$th simulated $D_c$.

2. Use a grid search to identify the $T^0_{q_1}$ that maximizes the utility $U(q_1)$.

Numerical study shows that the step elastic function can achieve operating characteristics similar to those of the smooth function, but with greater simplicity, making it a good choice for practical use.

The elastic prior approach has several desirable design characteristics, making it an appealing choice for prespecified analysis. One desirable characteristic is that the elastic function can be fully pre-specified and defined independently of trial data $D_c$. With the pre-specified elastic function, the amount of information borrowing is determined by a pre-specified congruence measure $T$ between historical and current trial data. We expect pre-specification to be a desirable characteristic whenever possible.
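The four-step calibration of the smooth elastic function (2) described above, with the simple choice $q_1 = q_2 = 0.5$, can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the function names, the default $C_2 = 0.01$, the incongruent offset `shift * delta` with `shift = 2.0`, the nearest-rank quantile, and the seed are all choices made here for illustration. The utility-based grid search over $(q_1, q_2)$ is not included.

```python
import math
import random


def chi_square(x_c, n_c, x_h, n_h):
    """Pearson chi-square statistic for the 2x2 table (dataset by response)."""
    n, x = n_c + n_h, x_c + x_h
    if x == 0 or x == n:
        return 0.0  # degenerate table (assumption: no evidence of incongruence)
    t = 0.0
    for x_j, n_j in ((x_c, n_c), (x_h, n_h)):
        e1 = n_j * x / n        # expected responders
        e0 = n_j * (n - x) / n  # expected non-responders
        t += (x_j - e1) ** 2 / e1 + (n_j - x_j - e0) ** 2 / e0
    return t


def simulate_t(theta, x_h, n_h, n_c, reps, rng):
    """Simulate D_c from Bernoulli(theta) reps times; return T per replicate."""
    values = []
    for _ in range(reps):
        x_c = sum(rng.random() < theta for _ in range(n_c))
        values.append(chi_square(x_c, n_c, x_h, n_h))
    return values


def quantile(values, q):
    """Nearest-rank empirical quantile for 0 < q <= 1."""
    s = sorted(values)
    return s[max(0, math.ceil(q * len(s)) - 1)]


def solve_ab(t1, t2, c1, c2):
    """Solve C1 = g(t1) and C2 = g(t2) for (a, b), as in equation (5)."""
    log_ratio = math.log((1 - c1) * c2 / ((1 - c2) * c1))
    b = log_ratio / (math.log(t1) - math.log(t2))
    a = math.log((1 - c1) / c1) - b * math.log(t1)
    return a, b


def g(t, a, b):
    """Smooth elastic function (2), for t > 0."""
    z = a + b * math.log(t)
    return 1.0 / (1.0 + math.exp(min(z, 700.0)))  # cap z to avoid overflow


def calibrate(x_h, n_h, n_c, delta, q1=0.5, q2=0.5, c1=0.99, c2=0.01,
              shift=2.0, reps=2000, seed=2020):
    """Return (a, b, t1, t2) following steps 2 to 4 of the calibration."""
    rng = random.Random(seed)
    theta_hat = x_h / n_h
    upper = min(theta_hat + shift * delta, 1.0)  # incongruent, larger rate
    lower = max(theta_hat - shift * delta, 0.0)  # incongruent, smaller rate
    t0 = simulate_t(theta_hat, x_h, n_h, n_c, reps, rng)
    t_plus = simulate_t(upper, x_h, n_h, n_c, reps, rng)
    t_minus = simulate_t(lower, x_h, n_h, n_c, reps, rng)
    t1 = quantile(t0, q1)
    t2 = min(quantile(t_plus, q2), quantile(t_minus, q2))
    if not 0.0 < t1 < t2:
        raise ValueError("need 0 < t1 < t2; try different q1 and q2")
    a, b = solve_ab(t1, t2, c1, c2)
    return a, b, t1, t2


if __name__ == "__main__":
    a, b, t1, t2 = calibrate(x_h=40, n_h=100, n_c=40, delta=0.12)
    print(round(a, 3), round(b, 3), round(t1, 3), round(t2, 3))
```

Because $T^0_{q_1} < T_{q_2}$ and $C_1 > C_2$, the solved slope $b$ is positive, so the resulting $g(T)$ decreases in $T$ as required of an elastic function.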
The elastic prior approach satisfies or goes beyond the requirement of pre-specification that "In general, Bayesian CID proposals should include a robust discussion of the prior distribution...a Bayesian proposal should also include a discussion explaining the steps the sponsor took to ensure information was not selectively obtained or used. In cases where downweighting or other non-data-driven features are incorporated in a prior distribution, the proposal should include a rationale for the use and magnitude of these features," as briefly discussed in the draft Guidance for Industry on Interacting with the FDA on Complex Innovative Trial Designs for Drugs and Biological Products.

Another desirable characteristic is the straightforward determination of the prior effective sample size (PESS) contained in the elastic prior, which is simply $g(T)\, n_h$, since the variance is inflated by the factor $g(T)^{-1}$. In contrast, determining the PESS for existing methods (e.g., commensurate prior and robust MAP prior) is more involved, and we found that the different PESS calculations used by these methods [13–15] often led to substantially different, sometimes improper results (e.g., PESS $> n_h$) [13, 16].

Consider a normal endpoint $y_{c,i} \overset{\text{iid}}{\sim} N(\theta, \sigma^2)$ and $y_{h,i} \overset{\text{iid}}{\sim} N(\theta_h, \sigma_h^2)$, with interest in estimating $\theta$. With a noninformative prior $\pi(\theta_h) \propto 1$ applied to $D_h$, the posterior of $\theta_h$ is

$$\pi(\theta_h \mid D_h, \sigma_h^2) \propto \pi(\theta_h) f(D_h \mid \theta_h, \sigma_h^2) = N\left(\bar{y}_h, \frac{\sigma_h^2}{n_h}\right).$$

An unknown $\sigma_h^2$ is often replaced by its maximum likelihood estimate $\hat{\sigma}_h^2 = \sum_{i=1}^{n_h} (y_{h,i} - \bar{y}_h)^2 / n_h$. The elastic prior of $\theta$ is obtained by inflating the variance of $\pi(\theta_h \mid D_h, \sigma_h^2)$ with the elastic function $g(T)$ as follows:

$$\pi^*(\theta \mid D_h, \sigma_h^2) = N\left(\bar{y}_h, \frac{\sigma_h^2}{n_h\, g(T)}\right). \quad (8)$$

Analogous to Section 2.1, the prior effective sample size for $\pi^*(\theta \mid D_h, \sigma_h^2)$ is simply $g(T)\, n_h$. Full information borrowing is achieved when $g(T) = 1$, and no information borrowing occurs when $g(T) = 0$.
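To make the normal-endpoint elastic prior (8) and its effective sample size $g(T)\, n_h$ concrete, the sketch below pairs the prior with a standard conjugate normal update. It assumes the sampling variance is known and shared by $D_h$ and $D_c$, which is a simplification made here for illustration; the function names and example numbers are hypothetical.

```python
def elastic_normal_prior(ybar_h, sigma2_h, n_h, g):
    """Mean and variance of the elastic prior (8); requires 0 < g <= 1."""
    if not 0.0 < g <= 1.0:
        raise ValueError("g must lie in (0, 1]; g = 0 means no borrowing")
    return ybar_h, sigma2_h / (n_h * g)


def prior_effective_sample_size(n_h, g):
    """PESS of the elastic prior: g(T) * n_h."""
    return g * n_h


def normal_posterior(prior_mean, prior_var, ybar_c, sigma2, n_c):
    """Conjugate update for a normal mean with known sampling variance sigma2."""
    precision = 1.0 / prior_var + n_c / sigma2
    mean = (prior_mean / prior_var + n_c * ybar_c / sigma2) / precision
    return mean, 1.0 / precision


if __name__ == "__main__":
    # Hypothetical inputs: n_h = 50, n_c = 25, unit variance, g(T) = 0.5.
    m0, v0 = elastic_normal_prior(ybar_h=1.0, sigma2_h=1.0, n_h=50, g=0.5)
    print(prior_effective_sample_size(50, 0.5))      # 25 "borrowed" patients
    print(normal_posterior(m0, v0, ybar_c=1.5, sigma2=1.0, n_c=25))
```

With a common variance, the posterior mean reduces to $(n_h g(T)\, \bar{y}_h + n_c \bar{y}_c) / (n_h g(T) + n_c)$, so the historical mean is weighted by the PESS rather than by $n_h$.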
In this scenario, the power prior may yield a prior similar to that in (2.2). The key difference is that $g(T)$ is pre-specified to proactively control type I and II error rates, and its expected value is known prior to the trial conduct. In addition, as the power prior works by discounting the whole likelihood, it does not allow parameter-specific adaptive information borrowing, for example, when we are interested in estimating and borrowing information on both $\theta$ and $\sigma^2$, as described later.

The elastic function (2) or step elastic function (7) can be used to dynamically control information borrowing based on the congruence measure $T$. When subject-level data are available for $D_h$, the Kolmogorov-Smirnov (KS) statistic can be used as the congruence measure between $D_c$ and $D_h$:

$$T = \max_{i=1,\ldots,N} \{|F(Z_{(i)}) - G(Z_{(i)})|\}, \quad (9)$$

where $N = n_c + n_h$; $F(\cdot)$ and $G(\cdot)$ are the empirical distribution functions for $D_h$ and $D_c$, respectively; and $Z_{(1)} \le \cdots \le Z_{(N)}$ are the $N$ ordered values for the combined sample of $D_h$ and $D_c$. When $D_h$ only contains summary statistics (e.g., mean and standard error), the $t$ statistic is a reasonable choice for $T$:

$$T = \frac{|\bar{y}_c - \bar{y}_h|}{s \sqrt{\frac{1}{n_h} + \frac{1}{n_c}}}, \quad (10)$$

where $s = \sqrt{\frac{(n_c - 1) s_c^2 + (n_h - 1) s_h^2}{n_c + n_h - 2}}$, with $s_c^2$ and $s_h^2$ denoting the sample variances of $D_c$ and $D_h$, respectively. For both congruence measures, a larger value of $T$ indicates less congruence between $D_h$ and $D_c$. Again, it can be shown that the resulting elastic prior is information-borrowing consistent, as described in Theorem 1. Also, the choice of $T$ is not unique (e.g., the $t$ statistic can also be used when $D_h$ consists of individual-level data) and can be tailored to the inferential interest. For example, if the objective of the trial is to compare the variance between the treatment and control arms, the $F$ statistic for testing equal variance is an appropriate measure of the congruence of $D_h$ and $D_c$ in variance.
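The two congruence measures (9) and (10) above are simple to compute directly. The following sketch uses only the Python standard library; the function names are hypothetical, and the KS statistic is evaluated at every pooled observation using right-continuous empirical distribution functions.

```python
import bisect
import math


def ks_statistic(d_h, d_c):
    """Two-sample KS statistic (9): max |F(z) - G(z)| over the pooled sample."""
    s_h, s_c = sorted(d_h), sorted(d_c)
    t = 0.0
    for z in s_h + s_c:
        f = bisect.bisect_right(s_h, z) / len(s_h)  # empirical CDF of D_h at z
        g = bisect.bisect_right(s_c, z) / len(s_c)  # empirical CDF of D_c at z
        t = max(t, abs(f - g))
    return t


def t_statistic(mean_c, var_c, n_c, mean_h, var_h, n_h):
    """Pooled two-sample t statistic (10) from summary statistics.

    var_c and var_h are sample variances (s_c^2 and s_h^2).
    """
    pooled_var = ((n_c - 1) * var_c + (n_h - 1) * var_h) / (n_c + n_h - 2)
    return abs(mean_c - mean_h) / math.sqrt(pooled_var * (1 / n_h + 1 / n_c))


if __name__ == "__main__":
    print(ks_statistic([0.1, 0.4, 0.9, 1.3], [0.2, 0.5, 1.0]))
    print(round(t_statistic(1.5, 1.0, 25, 1.0, 1.0, 50), 4))
```

Either value can then be passed to an elastic function such as (2) or (7) to obtain the discount factor.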
The calibration procedure of the elastic function $g(T; \phi)$ is similar to that for the binary endpoint and is provided in the Appendix.

If estimation of both $\theta$ and $\sigma^2$ is of interest, we can also construct the joint elastic prior for $(\theta, \sigma^2)$. We first apply the noninformative prior $\pi(\theta_h, \sigma_h^2) \propto (1/\sigma_h^2)^m$ to $D_h$, where $m$ is a constant, resulting in the following posterior:

$$\pi(\theta_h, \sigma_h^2 \mid D_h) \propto \pi(\theta_h, \sigma_h^2) f(D_h \mid \theta_h, \sigma_h^2) \propto N_{\theta}\left(\bar{y}_h, \frac{\sigma_h^2}{n_h}\right) IG_{\sigma^2}(\mu_h, \epsilon_h),$$

where $IG(\cdot)$ is an inverse gamma distribution with mean $\mu_h = \frac{n_h \hat{\sigma}_h^2}{n_h - 5 + 2m}$ and variance $\epsilon_h = \frac{(n_h \hat{\sigma}_h^2)^2}{(n_h - 5 + 2m)^2 \left(\frac{n_h - 7}{2} + m\right)}$. The joint elastic prior for $(\theta, \sigma^2)$ is obtained by inflating the variance of $\pi(\theta_h, \sigma_h^2 \mid D_h)$ with two elastic functions $g(T_1)$ and $g(T_2)$:

$$\pi^*(\theta, \sigma^2 \mid D_h) \propto N_{\theta}\left(\bar{y}_h, \frac{\sigma^2}{n_h\, g(T_1)}\right) IG_{\sigma^2}\left(\mu_h, \frac{\epsilon_h}{g(T_2)}\right),$$

where $T_1$ is (9) or (10), and $T_2$ is the $F$ statistic for testing equal variance. Allowing parameter-specific information borrowing gives the elastic prior more flexibility than the power prior. Given the elastic prior and trial data $D_c$, the posterior distribution for $(\theta, \sigma^2)$ is

$$\pi(\theta, \sigma^2 \mid D_c, D_h) \propto N_{\theta}\left(\frac{n_h g(T_1)\, \bar{y}_h + n_c \bar{y}_c}{n_h g(T_1) + n_c},\; \frac{\sigma^2}{n_h g(T_1) + n_c}\right) IG_{\sigma^2}(\alpha^*, \beta^*),$$

where

$$\alpha^* = \frac{n_c + 4}{2} + \left(\frac{n_h - 7}{2} + m\right) g(T_2),$$

$$\beta^* = \frac{1}{2}\left[\sum_{i=1}^{n_c} y_{c,i}^2 + n_h g(T_1)\, \bar{y}_h^2 - \frac{\big(n_h g(T_1)\, \bar{y}_h + n_c \bar{y}_c\big)^2}{n_h g(T_1) + n_c}\right] + \frac{n_h \hat{\sigma}_h^2}{n_h - 5 + 2m}\left[1 + \left(\frac{n_h - 7}{2} + m\right) g(T_2)\right].$$

The proposed method can be extended to borrow information from $K$ independent historical datasets $D_{h,1}, \cdots, D_{h,K}$. For notational brevity, we here suppress the subscript "$h$" and denote the $k$th historical dataset as $D_k = (y_{k,1}, \cdots, y_{k,n_k})$ with sample mean $\bar{y}_k = \sum_{i=1}^{n_k} y_{k,i} / n_k$. The elastic prior can be obtained by sequentially applying the method described above to $D_1, \cdots, D_K$. Using a binary endpoint as an example, the steps to obtain the elastic prior are:

1. Starting with the noninformative prior $\pi(\theta) \sim \mathrm{Beta}(\alpha_0, \beta_0)$, obtain the elastic prior $\pi^*(\theta \mid D_1)$ for $D_1$, where $\pi^*(\theta \mid D_1) = \mathrm{Beta}(\alpha_1, \beta_1)$ with $\alpha_1 = (\alpha_0 + n_1 \bar{y}_1)\, g(T_1)$ and $\beta_1 = (\beta_0 + n_1 - n_1 \bar{y}_1)\, g(T_1)$.

2. Using $\pi^*(\theta \mid D_1)$ as the prior and combining it with $D_2$, obtain the elastic prior $\pi^*(\theta \mid D_1, D_2)$ for $D_1$ and $D_2$, where $\pi^*(\theta \mid D_1, D_2) = \mathrm{Beta}(\alpha_2, \beta_2)$ with $\alpha_2 = (\alpha_1 + n_2 \bar{y}_2)\, g(T_2)$ and $\beta_2 = (\beta_1 + n_2 - n_2 \bar{y}_2)\, g(T_2)$.

3. Repeat step 2 sequentially for $D_3, \cdots, D_K$ to obtain the elastic prior $\pi^*(\theta \mid D_1, \cdots, D_K)$.

The elastic functions $g(T_1), \cdots, g(T_K)$ are calibrated independently using the procedure described previously, based on $D_1, \cdots, D_K$, respectively. One advantage of this sequential elastic prior is that it allows study-specific dynamic information borrowing with minimal interference among $D_1, \cdots, D_K$. For example, if $D_1$ is congruent with $D_c$ and $D_2$ is not congruent with $D_c$, the elastic prior will borrow more information from $D_1$ and less information from $D_2$.

Another approach is to aggregate historical information through a meta-analysis of $D_{h,1}, \cdots, D_{h,K}$, and then construct the elastic prior. This can be done in two steps: (1) perform a meta-analysis on $D_{h,1}, \cdots, D_{h,K}$ using the Bayesian hierarchical (or random-effects) model to obtain the posterior predictive distribution of $\theta$ (i.e., the MAP prior), $\pi(\theta \mid D_{h,1}, \cdots, D_{h,K})$; (2) inflate the variance of the MAP prior using the elastic function $g(T)$ to obtain the elastic prior $\pi^*(\theta \mid D_{h,1}, \cdots, D_{h,K})$. One challenge is how to choose an appropriate statistic $T$ to measure the congruence between $D_c$ and the $K$ datasets. The congruence measure $T$ discussed previously is applicable to each of $D_{h,1}, \cdots, D_{h,K}$, but it is not clear how to combine them into a single global congruence measure. To address this issue, we borrow the concept of the posterior predictive model assessment method [17, 18].
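The sequential construction for $D_1, \cdots, D_K$ with a binary endpoint (steps 1 to 3 of the sequential elastic prior above) amounts to a short recursion on the two Beta parameters. The sketch below assumes the values $g(T_1), \cdots, g(T_K)$ have already been computed; the function name and the default $\alpha_0 = \beta_0 = 0.1$ are illustrative assumptions.

```python
def sequential_elastic_beta(datasets, g_values, alpha0=0.1, beta0=0.1):
    """Sequential elastic Beta prior for K historical datasets.

    datasets: list of (n_k, ybar_k) pairs, in the order they are applied.
    g_values: list of g(T_k), each assumed to lie in [0, 1].
    alpha0, beta0: hypothetical vague-prior parameters (assumption).
    Returns (alpha_K, beta_K).
    """
    if len(datasets) != len(g_values):
        raise ValueError("need one g value per historical dataset")
    alpha, beta = alpha0, beta0
    for (n_k, ybar_k), g_k in zip(datasets, g_values):
        if not 0.0 <= g_k <= 1.0:
            raise ValueError("each g value must lie in [0, 1]")
        # Update with D_k, then discount the accumulated parameters by g(T_k).
        alpha = (alpha + n_k * ybar_k) * g_k
        beta = (beta + n_k - n_k * ybar_k) * g_k
    return alpha, beta


if __name__ == "__main__":
    # Two hypothetical historical datasets: full weight, then half weight.
    print(sequential_elastic_beta([(100, 0.4), (50, 0.6)], [1.0, 0.5]))
```

Note that, as the recursion is written, each $g(T_k)$ multiplies everything accumulated up to step $k$, so the order in which the datasets are applied affects the result.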
The basic idea is that if $D_c$ is congruent with $D_{h,1}, \cdots, D_{h,K}$, we would expect the actual observed $D_c$ to be generally consistent with data generated from $\pi(D_c \mid D_{h,1}, \cdots, D_{h,K})$. Therefore, if the observed $D_c$ is located in the far tail of the predictive distribution $\pi(D_c \mid D_{h,1}, \cdots, D_{h,K})$, then $D_c$ is likely to be incongruent with the historical data. This motivates us to use the posterior predictive $p$ value as the congruence measure $T$. This approach is general and can also be used for a single historical dataset with various endpoints. Using a normal endpoint as an example, $T$ is calculated as follows:

1. Draw $R$ samples of $\theta$ from $\pi(\theta \mid D_{h,1}, \cdots, D_{h,K})$, denoted as $\theta^{(1)}, \cdots, \theta^{(R)}$. Given $\theta^{(r)}$, simulate trial data $D_c = (y_{c,1}, \cdots, y_{c,n_c})$, and denote its sample mean as $\bar{y}_c^{(r)}$, $r = 1, \cdots, R$. In our simulation, we use $R = 10{,}[\ldots]$.

2. Let $\bar{y}_c$ denote the actual observed sample mean of $D_c$; the congruence measure is defined as

$$T = -\log(PP),$$

where $PP = 2 \times \min\left(\sum_{r=1}^{R} I(\bar{y}_c^{(r)} > \bar{y}_c) / R,\; \sum_{r=1}^{R} I(\bar{y}_c^{(r)} < \bar{y}_c) / R\right)$ is the two-sided posterior predictive $p$ value.

Of note, Theorems 1 and 2, as well as the desirable design characteristics discussed in Section 2.1, also apply to Sections 2.2 and 2.3.

3 Simulation studies
In this section, we evaluate the finite-sample properties of the elastic prior approach andcompare them to some existing methods.
We considered scenarios that involve a two-arm superiority trial with one historical dataset $D_h$, where the endpoint is either a continuous variable with a Gaussian distribution or a binary variable with a Bernoulli distribution. For a continuous endpoint, the sample sizes for the historical data, control arm, and treatment arm were $n_h = 50$, $n_c = 25$, and $n_t = 50$, respectively. We generated control data $D_c$ from $N(\theta_c, [\ldots])$ with $\theta_c = 1$, and treatment data $D_t$ from $N(\theta_t, [\ldots])$ with $\theta_t = 1$ and $1.5$. The CMD is $\delta = 0.5$. We generated the historical data $D_h$ from $N(\theta_h, [\ldots])$ and varied its mean $\theta_h$ to simulate the scenarios where $D_h$ is congruent or incongruent. For a binary endpoint, the sample sizes for the historical data, control arm, and treatment arm were $n_h = 100$, $n_c = 40$, and $n_t = 80$, respectively. We generated $D_c$ from $\mathrm{Bernoulli}(\theta_c)$ with $\theta_c = 0.4$, and $D_t$ from $\mathrm{Bernoulli}(\theta_t)$ with $\theta_t = 0.4$, $0.55$, and $0.6$. The CMD is $\delta = 0.12$. We generated $D_h$ from $\mathrm{Bernoulli}(\theta_h)$ and varied its mean $\theta_h$ to simulate the scenarios where $D_h$ is congruent or incongruent with $D_c$. We considered the smooth elastic function (2) and step function (7) and denoted them as elastic prior 1 (EP1) and elastic prior 2 (EP2), respectively.

We compared the proposed elastic prior with the commensurate prior (CP), the (normalized) power prior (PP), and the conventional non-informative prior (NP) that ignores historical data. For CP, we considered two priors for its shrinkage parameter $\tau$ used in publications: $\log(\tau) \sim \mathrm{Unif}(-[\ldots], 30)$ [5] (denoted as CP1), and a spike-and-slab prior with a slab of (1, 2), a spike of 20, and Pr(slab) = 0.98 (denoted as CP2) [16]. For PP, a uniform prior $\mathrm{Unif}(0, 1)$ is used for the power parameter. For EP1 and EP2, we set $w_1 = 1$, $w_2 = 2$, and $\eta = 0.1$. The treatment is deemed superior to the control if $\Pr(\theta_t - \theta_c > 0 \mid D_c, D_t, D_h) > C$. The probability cutoff $C$ is calibrated for each method with 10,000 simulated trials such that under the null (i.e., $\theta_c = \theta_t = \theta_h$, corresponding to scenario 1 in Tables 1 and 2), the type I error is 5%. The treatment arm does not involve information borrowing, and the posterior of $\theta_t$ is obtained based on the conventional noninformative prior. Under other simulation configurations, we conducted 1000 simulations.

Table 1 shows the results for a normal endpoint. In scenarios 1 and 2, $D_h$ and $D_c$ are congruent. When the treatment is not effective (i.e., scenario 1), all methods control the type I error rate at its nominal value of 5%. When the treatment is effective (i.e., scenario 2), EP1, EP2, CP1, and PP offer substantial power gain over NP. For example, the power of EP1 is 27.1% higher than that of NP, and also slightly higher than that of CP1 and PP. EP2 has comparable performance to EP1. In contrast, CP2 provides little power improvement, indicating that the spike-and-slab prior is too conservative to borrow information. Similar results are observed in scenarios 3 and 4, where $D_h$ and $D_c$ are approximately congruent. Scenarios 5-8 consider the case in which $D_h$ and $D_c$ are incongruent. Specifically, in scenarios 5 and 6, the treatment is not effective, and the results are type I errors. Compared to CP1 and PP, EP1 and EP2 offer better type I error control. For example, in scenario 5, the type I errors of EP1 and EP2 are 7.7% and 7.3%, whereas the type I errors of CP1 and PP are 14.6% and 30%, respectively. CP2 has little type I error inflation because it barely borrows information, as demonstrated by its low power when $D_h$ and $D_c$ are congruent (i.e., scenarios 3 and 4). In scenarios 7 and 8, the treatment is effective, and the results are power. EP1 and EP2 yield higher power to detect the treatment effect than CP1 and PP.
For example, in scenario 7, the power of EP2 is 15.0% and 34.8% higher than that of CP1 and PP, respectively.

Table 2 shows the results for a binary endpoint, which are generally consistent with those for the normal endpoint. Scenarios 1 to 4 consider the case that $D_h$ and $D_c$ are congruent or approximately congruent. In scenario 1, the treatment is not effective; all methods control the type I error rate at its nominal value of 5%. In scenario 2, the treatment is effective; EP1, EP2, CP1, and PP offer substantial power gain over NP. For example, the power of EP1 is 15.9% higher than that of NP, and comparable to that of CP1 and PP. Akin to the normal endpoint, CP2 is similar to NP with little information borrowing. Similar results are observed in scenarios 3 and 4, where $D_h$ and $D_c$ are approximately congruent.

Scenarios 5-8 consider the case that $D_h$ and $D_c$ are incongruent. Specifically, in scenarios 5 and 6 the treatment is not effective, and the results are type I errors. Compared to CP1, CP2 and PP, EP1 and EP2 offer better type I error control. For example, in scenario 5, the type I errors of EP1 and EP2 are approximately 1/2 and 1/4 of that of CP1, 1/3 and 1/5 of that of PP, and 3.6% and 7.4% lower than that of CP2, respectively. In scenarios 7 and 8, the treatment is effective, and the results are power. EP1 and EP2 yield higher power to detect the treatment effect than CP1 and PP. For example, in scenario 7, the power of EP1 and EP2 is more than double that of CP1 and PP.

.3 Multiple historical datasets

Taking a similar setting as the simulation with one historical dataset, we generated control arm data $D_c$ from $N(\theta_c, [\ldots])$ with $\theta_c = 1$ and sample size $n_c = 25$, and treatment arm data $D_t$ from $N(\theta_t, [\ldots])$ with $\theta_t = 1$ or $1.5$ and sample size $n_t = 50$. The CMD is $\delta = 0.5$. We considered four historical datasets with sample sizes of 40, 50, 45, and 55, respectively, generated from the following hierarchical model:

$$y_k \sim N(\theta_k, [\ldots]), \quad k = 1, \cdots, 4,$$
$$\theta_1, \theta_2, \theta_3, \theta_4 \sim N(\theta_h, [\ldots]).$$

We varied $\theta_h$ to simulate scenarios where $D_h$ is congruent or incongruent with $D_c$. As before, we considered both the smooth and step elastic functions, and denoted the resulting priors as elastic MAP 1 (EMAP1) and elastic MAP 2 (EMAP2), respectively.

We compared the elastic MAP priors with the robust MAP prior. Following Schmidli et al (2014) [8], we considered two versions of the robust MAP prior: Mix50 with a weight of 0.5 and the Mix90 design with a weight of 0.1 assigned to MAP. As the benchmark, we also considered the conventional NP that ignores historical data. The treatment is deemed superior to the control if $\Pr(\theta_t - \theta_c > 0 \mid D_c, D_t, D_h) > C$. The probability cutoff $C$ is calibrated for each method with 10,000 simulated trials such that under the null (i.e., $\theta_c = \theta_t = \theta_h$, corresponding to scenario 1 in Table 3), the type I error is 5%. Under other simulation configurations, we conducted 1000 simulations.

Table 3 shows the results. When historical data and control data are congruent (i.e., scenarios 1 to 4), EMAP1 and EMAP2 have comparable performance to Mix50 and Mix90. All methods control type I errors at the nominal value of 5% (scenario 1), and they yield substantially higher power than NP due to borrowing information from historical datasets. Scenarios 5-8 consider the case that historical data and control data are incongruent. In scenarios 5-6, the treatment is ineffective and the results are type I errors. EMAP1 and EMAP2 offer better type I error control than the robust MAP. For example, in scenario 5, the type I errors of EMAP1 and EMAP2 are 8.5% and 7.8%, whereas those of Mix50 and Mix90 are 14.1% and 26.4%, respectively. In addition, EMAP1 and EMAP2 provide substantial power gain over Mix50 and Mix90.
For example, in scenario 7, the power of EMAP1 is 17.5% and 25.9% higher than that of Mix50 and Mix90, respectively, and EMAP2 has 23.3% and 31.7% higher power than Mix50 and Mix90, respectively.

We have proposed the elastic prior to dynamically borrow information from historical data. Through the use of an elastic function, the elastic prior approach adaptively borrows information based on the congruence between trial data and historical data. The elastic function is constructed based on a set of prespecified information-borrowing constraints such that the prior will borrow information when historical and trial data are congruent, and refrain from information borrowing when historical and trial data are incongruent. The elastic prior is information-borrowing consistent, and is easy to quantify using a prior effective sample size. Our simulation study shows that, compared to existing methods, the elastic prior has better type I error control, and yields competitive or higher power. In addition, we provide insights on what can and cannot be achieved using the information-borrowing method, which is useful for guiding future methodology development.

The good performance of the elastic prior stems from the use of the elastic function to regulate the behavior of information borrowing within the range of the parameter space of practical interest. That is, the elastic prior does not completely rely on the data to determine information borrowing. It also incorporates subject matter knowledge (e.g., when it should borrow or not) to enhance and govern the performance. In contrast, many existing methods intend to achieve dynamic information borrowing by estimating the information-borrowing parameter (e.g., the power parameter in the power prior or the shrinkage parameter in the commensurate prior), jointly with model parameters, based on data.
However, the data contain extremely limited information for estimating the information-borrowing parameter because the observation unit contributing to the estimation is the dataset, not subject-level observations. For example, one historical dataset and one trial dataset actually provide only two observations to estimate the information-borrowing parameter. This is a well-known issue in meta-analysis for estimating the between-study variation. As a result, these dynamic information borrowing methods cannot reliably sense the congruence or incongruence between historical data and trial data to perform appropriate information borrowing.

The idea of an elastic prior is general, and it can also be applied to both commensurate and power priors to improve their operating characteristics. We outline this approach in the Appendix. In addition, we have focused on two-arm randomized superiority trials with binary or normal endpoints. The methodology can be applied to single-arm and multiple-arm trials, as well as other types of trials, for example, noninferiority trials. Extension of the elastic prior to the time-to-event endpoint is of practical interest and warrants further research. The type I error considered in this paper is referred to as the usual view, where the type I error rate is based on the current trial(s) alone. Pennello and Thompson (2008) [19] also discussed a view 2, which considers the type I error rate based on the current trial(s) and prior data together. The type I error rate under view 2 might be considered when extrapolating adult results to a pediatric setting, when it is known before the adult trials that the analysis in the pediatric setting will borrow from the adult results because of similarity in the course of disease, response to treatment, pharmacokinetics, and pharmacodynamics.
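The calibration of the probability cutoff $C$ used in the simulation study can be illustrated with a minimal Python sketch for the no-borrowing benchmark (NP) with a normal endpoint. The sketch assumes a known unit variance and flat priors, so that the posterior of $\theta_t - \theta_c$ is normal; the function names, the seed, and the unit variance are illustrative assumptions, not the settings or code used in this paper.

```python
import random
from statistics import NormalDist


def posterior_prob_superiority(ybar_t, ybar_c, n_t, n_c, sigma=1.0):
    """Pr(theta_t - theta_c > 0 | data) under flat priors and known sigma (assumed)."""
    post_sd = sigma * (1.0 / n_t + 1.0 / n_c) ** 0.5
    # Posterior of theta_t - theta_c is N(ybar_t - ybar_c, post_sd^2).
    return 1.0 - NormalDist(ybar_t - ybar_c, post_sd).cdf(0.0)


def simulate_posterior_probs(theta_t, theta_c, n_t, n_c, n_sim, rng, sigma=1.0):
    """Simulate n_sim trials and return the posterior probability from each."""
    probs = []
    for _ in range(n_sim):
        # Sample means are drawn directly from their sampling distributions.
        ybar_t = rng.gauss(theta_t, sigma / n_t ** 0.5)
        ybar_c = rng.gauss(theta_c, sigma / n_c ** 0.5)
        probs.append(posterior_prob_superiority(ybar_t, ybar_c, n_t, n_c, sigma))
    return probs


def calibrate_cutoff(null_probs, alpha=0.05):
    """Cutoff C such that at most a fraction alpha of null trials exceed C."""
    ordered = sorted(null_probs)
    n_allowed = int(alpha * len(ordered))  # number of null rejections allowed
    return ordered[len(ordered) - n_allowed - 1]


def rejection_rate(probs, cutoff):
    """Fraction of trials claiming efficacy, i.e. posterior probability > C."""
    return sum(p > cutoff for p in probs) / len(probs)


rng = random.Random(2020)  # arbitrary seed for reproducibility
# Null configuration: theta_t = theta_c = 1, n_t = 50, n_c = 25.
null_probs = simulate_posterior_probs(1.0, 1.0, 50, 25, 10_000, rng)
C = calibrate_cutoff(null_probs)
# Alternative configuration: theta_t = 1.5, theta_c = 1.
alt_probs = simulate_posterior_probs(1.5, 1.0, 50, 25, 1_000, rng)
print(f"C = {C:.3f}, type I error = {rejection_rate(null_probs, C):.3f}, "
      f"power = {rejection_rate(alt_probs, C):.3f}")
```

For a borrowing method, only `posterior_prob_superiority` would change: the posterior of $\theta_c$ would then depend on $D_h$, and the same calibration step would be repeated for each method.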
Disclaimer
This article reflects the views of the author, and it should not be construed to represent FDA views or policies.

Figure 1: (a) A class of smooth elastic functions defined by $g(T) = 1/(1 + \exp[a + b \times \{\log(T)\}^c])$, and (b) a step elastic function, where $g(T) = 1$ leads to full information borrowing and $g(T) = 0$ leads to no information borrowing.

Figure 2: Dynamic information borrowing through the elastic function.

Table 1: Simulation results for a normal endpoint using a noninformative prior (NP), elastic prior with the smooth elastic function (EP1) and step elastic function (EP2), commensurate prior with uniform prior (CP1) and spike-and-slab prior (CP2), and power prior (PP).
Percentage of claiming efficacy (PESS)
Scenario      θ_h    θ_c   θ_t   NP     EP1           EP2           CP1           CP2           PP
Congruent
[…]
Incongruent
[…]
6*            -0.5   1     1     5.0    7.1 (0.00)    7.0 (0.00)    10.0 (1.34)   8.8 (3.24)    18.3 (3.19)
7             2      1     1.5   66.3   72.3 (0.24)   72.6 (0.35)   57.6 (0.36)   60.7 (3.58)   37.8 (9.47)
8             2.5    1     1.5   66.3   72.3 (0.00)   72.6 (0.00)   69.2 (1.37)   57.6 (3.25)   49.1 (3.14)
*Type I error

Table 2

Percentage of claiming efficacy (PESS)
Scenario      θ_h    θ_c   θ_t   NP     EP1           EP2           CP1           CP2           PP
Congruent
[…]
Incongruent
[…]

Table 3

Percentage of claiming efficacy (PESS)
Scenario      θ_h    θ_c   θ_t   NP     EMAP1          EMAP2         Mix50          Mix90
Congruent
[…]
Incongruent
[…]
6*            -0.2   1     1     5.0    7.7 (0.03)     7.6 (0.00)    9.7 (15.00)    16.2 (29.18)
7             1.6    1     1.5   66.3   61.3 (11.45)   67.1 (8.41)   43.8 (14.99)   35.4 (29.18)
8             2      1     1.5   66.3   75.3 (0.00)    75.1 (0.15)   63.0 (15.00)   47.8 (29.18)
*Type I error

References

[1] US Food and Drug Administration. (2017). Use of real-world evidence to support regulatory decision-making for medical devices. Guidance for industry and Food and Drug Administration staff.
[2] US Food and Drug Administration. (2019). Submitting documents using real-world data and real-world evidence to FDA for drugs and biologics: guidance for industry: draft guidance. Rockville, MD: US Food and Drug Administration.
[3] Ibrahim, J. G., & Chen, M. H. (2000). Power prior distributions for regression models. Statistical Science, 15(1), 46-60.
[4] Ibrahim, J. G., Chen, M. H., & Sinha, D. (2003). On optimality properties of the power prior. Journal of the American Statistical Association, 98(461), 204-213.
[5] Hobbs, B. P., Carlin, B. P., Mandrekar, S. J., & Sargent, D. J. (2011). Hierarchical commensurate and power prior models for adaptive incorporation of historical information in clinical trials. Biometrics, 67(3), 1047-1056.
[6] Thall, P. F., Wathen, J. K., Bekele, B. N., Champlin, R. E., Baker, L. H., & Benjamin, R. S. (2003). Hierarchical Bayesian approaches to phase II trials in diseases with multiple subtypes. Statistics in Medicine, 22(5), 763-780.
[7] Berry, S. M., Broglio, K. R., Groshen, S., & Berry, D. A. (2013). Bayesian hierarchical modeling of patient subpopulations: efficient designs of phase II oncology clinical trials. Clinical Trials, 10(5), 720-734.
[8] Schmidli, H., Gsteiger, S., Roychoudhury, S., O'Hagan, A., Spiegelhalter, D., & Neuenschwander, B. (2014). Robust meta-analytic-predictive priors in clinical trials with historical control information. Biometrics, 70(4), 1023-1032.
[9] Pan, H., Yuan, Y., & Xia, J. (2017). A calibrated power prior approach to borrow information from historical data with application to biosimilar clinical trials. Journal of the Royal Statistical Society: Series C (Applied Statistics), 66(5), 979-996.
[10] Neuenschwander, B., Branson, M., & Spiegelhalter, D. J. (2009). A note on the power prior. Statistics in Medicine, 28(28), 3562-3566.
[11] Freidlin, B., & Korn, E. L. (2013). Borrowing information across subgroups in phase II trials: is it useful? Clinical Cancer Research, 19(6), 1326-1334.
[12] Chu, Y., & Yuan, Y. (2018). BLAST: Bayesian latent subgroup design for basket trials accounting for patient heterogeneity. Journal of the Royal Statistical Society: Series C (Applied Statistics), 67(3), 723-740.
[13] Hobbs, B. P., Carlin, B. P., & Sargent, D. J. (2013). Adaptive adjustment of the randomization ratio using historical control data. Clinical Trials, 10(3), 430-440.
[14] Morita, S., Thall, P. F., & Müller, P. (2008). Determining the effective sample size of a parametric prior. Biometrics, 64(2), 595-602.
[15] Neuenschwander, B., Weber, S., Schmidli, H., & O'Hagan, A. (2020). Predictively consistent prior effective sample sizes. Biometrics, 1-10.
[16] Chen, N., Carlin, B. P., & Hobbs, B. P. (2018). Web-based statistical tools for the analysis and design of clinical trials that incorporate historical controls. Computational Statistics & Data Analysis, 127, 50-68.
[17] Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC Press.
[18] Gelman, A., Meng, X. L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4), 733-760.
[19] Pennello, G., & Thompson, L. (2008). Experience with reviewing Bayesian medical device trials. Journal of Biopharmaceutical Statistics, 18(1), 81-115.

Appendix A. Proof of Theorem 1
Suppose the chi-square test statistic is used to measure the congruence $T$ between $D_h$ and $D_c$. Since the chi-square statistic of homogeneity is consistent, $T \to 0$ as $n_h \to \infty$ and $n_c \to \infty$, when $D_h$ and $D_c$ are congruent. Given $b > 0$, $g(T) = 1/(1 + \exp\{a + b \times \log(T)\}) \to 1$. When $D_h$ and $D_c$ are incongruent, $T \to \infty$ as $n_h \to \infty$ and $n_c \to \infty$. Given $b > 0$, $g(T) = 1/(1 + \exp\{a + b \times \log(T)\}) \to 0$.

B. Grid search for percentile combination $(q_0, q_1)$

Let $(q_0^{(1)}, \cdots, q_0^{(J)})$ and $(q_1^{(1)}, \cdots, q_1^{(K)})$ denote the prespecified search grids for $q_0$ and $q_1$, respectively. We used $q_0^{(1)} = q_1^{(1)} = 0.$[…], $q_0^{(J)} = q_1^{(K)} = 0.9$, and set a grid step of 0.1. The following steps are used to find the $(q_0, q_1)$ that optimizes the utility $U(q_0, q_1)$.

1. Given a specific grid point $(q_0^{(j)}, q_1^{(k)})$, determine the elastic function using equation (5).
2. Given the obtained elastic function, under the congruent case ($\theta_h = \theta_c$), calibrate the probability cutoff $C$ to control the type I error rate at a nominal value of 5% and compute the power ($\rho$) through simulation.
3. Given the cutoff $C$, compute the type I error ($\psi$) under the incongruent case (e.g., $\theta_c = \theta_h - \delta$).
4. Identify the $(q_0^{(j)}, q_1^{(k)})$ that produces the largest value of $U(q_0^{(j)}, q_1^{(k)}) = \rho - w_1 \psi - w_2 (\psi - \eta) I(\psi > \eta)$.

For the step elastic function, the calibration of $q_0$ is similar to that shown above. The main difference is that we only need to search over a one-dimensional grid $(q_0^{(1)}, \cdots, q_0^{(J)})$, which greatly reduces the optimization time.

C. Determination of elastic function for a normal endpoint
The steps to determine the elastic function are similar to those for the binary endpoint, and are described as follows:

1. Estimate the mean and variance of $D_h$ by $\hat{\theta}_h = \bar{y}_h$ and $\hat{\sigma}_h^2 = \sum_{i=1}^{n_h} (y_{h,i} - \bar{y}_h)^2 / (n_h - 1)$, where $\bar{y}_h = \sum_{i=1}^{n_h} y_{h,i} / n_h$.
2. Elicit from subject matter experts a clinically meaningful difference $\delta$ for $E(y)$.
3. (Congruent case) Simulate $R$ replicates of $D_c = (y_{c,1}, \cdots, y_{c,n_c})$ from $N(\hat{\theta}_h, \hat{\sigma}_h^2)$, and calculate the congruence measure $T$ between $D_h$ and each simulated $D_c$, resulting in $\mathbf{T}_0 = (T_1, \cdots, T_R)$, where $T_r$ denotes the value of $T$ based on the $r$th simulated $D_c$.
4. (Incongruent cases) Simulate $R$ replicates of $D_c$ from $N(\hat{\theta}_h + 2\delta, \hat{\sigma}_h^2)$, and calculate the congruence measure $T$ between $D_h$ and each simulated $D_c$, resulting in $\mathbf{T}_1^+ = (T_1^+, \cdots, T_R^+)$, where $T_r^+$ denotes the value of $T$ based on the $r$th simulated $D_c$. Repeat this with $D_c$ simulated from $N(\hat{\theta}_h - \delta, \hat{\sigma}_h^2)$, resulting in $\mathbf{T}_1^- = (T_1^-, \cdots, T_R^-)$, where $T_r^-$ denotes the value of $T$ between $D_h$ and the $r$th simulated $D_c$.
5. Let $C_0$ and $C_1$ be constants close to 1 and 0, respectively, e.g., $C_0 = 0.99$ and $C_1 = 0.$[…]. $T_{q_0}$ denotes the $q_0$th percentile of $\mathbf{T}_0$, $T_{q_1}^+$ and $T_{q_1}^-$ denote the $q_1$th percentile of $\mathbf{T}_1^+$ and $\mathbf{T}_1^-$, respectively, and define $T_{q_1} = \min(T_{q_1}^+, T_{q_1}^-)$.
6. Based on $T_{q_0}$ and $T_{q_1}$, determine the elastic function (2) by equation (5).

D. Elastic power prior and elastic commensurate prior
The idea of an elastic prior can also be applied to the power prior and commensurate prior, and we refer to the resulting priors as the elastic power prior and elastic commensurate prior.
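To illustrate the idea for a binary endpoint, where the historical binomial likelihood raised to a weight $g(T) \in [0, 1]$ combines with a Beta$(\alpha, \beta)$ initial prior into the closed-form beta posterior given in Section D1, the following Python sketch computes the posterior parameters. The function names, the Beta(1, 1) default, and the data are hypothetical, and the weight `g` is assumed to have been computed beforehand from the congruence measure.

```python
def elastic_power_posterior_binary(x_h, n_h, x_c, n_c, g, alpha=1.0, beta=1.0):
    """Beta posterior parameters for the control response rate p.

    x_h, n_h: responders and sample size in the historical data.
    x_c, n_c: responders and sample size in the current control arm.
    g: elastic weight g(T) in [0, 1], assumed to be computed beforehand.
    alpha, beta: initial Beta(alpha, beta) prior (Beta(1, 1) is an assumption).
    """
    if not 0.0 <= g <= 1.0:
        raise ValueError("g must lie in [0, 1]")
    # Historical successes and failures are discounted by g; current data are not.
    post_a = g * x_h + alpha + x_c
    post_b = g * (n_h - x_h) + beta + (n_c - x_c)
    return post_a, post_b


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: 30/100 historical responders, 12/25 current responders.
for g in (0.0, 0.5, 1.0):
    a, b = elastic_power_posterior_binary(30, 100, 12, 25, g)
    print(f"g = {g:.1f}: Beta({a:.1f}, {b:.1f}), posterior mean = {beta_mean(a, b):.3f}")
```

With `g = 0` the posterior uses the current data only, and with `g = 1` it pools the historical and current data, which mirrors the no-borrowing and full-borrowing ends of the elastic function.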
D1. Elastic power prior
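With the power prior, the power parameter $\delta$ is treated as an unknown parameter, whereas with an elastic power prior, $\delta$ is linked with $T$ by an elastic function $g(\cdot)$, that is,

$$\delta = g(T; \phi). \qquad (11)$$

Then the elastic power prior is given by

$$\pi^*(\theta \mid D_h) \propto \pi(\theta) f(D_h \mid \theta)^{g(T)}, \qquad (12)$$

where the elastic function $g(T)$ is the same as equation (2), which maps the support of $T$ to $[0, 1]$. For a normal endpoint with initial prior $\pi(\theta, \sigma^2) \propto (1/\sigma^2)^m$, the elastic power prior of $(\theta, \sigma^2)$ is

$$\pi^*(\theta, \sigma^2 \mid D_h) \propto \left(\frac{1}{\sigma^2}\right)^{m + \frac{g(T) n_h}{2}} \exp\left[-\frac{g(T) n_h}{2\sigma^2}\left\{\hat{\sigma}_h^2 + (\theta - \bar{y}_h)^2\right\}\right] \propto N_\theta\left(\bar{y}_h, \frac{\sigma^2}{g(T) n_h}\right) IG_{\sigma^2}\left(m + \frac{g(T) n_h - 3}{2}, \frac{g(T) n_h \hat{\sigma}_h^2}{2}\right). \qquad (13)$$

Given current data, the posterior distribution for $(\theta, \sigma^2)$ is

$$\pi(\theta, \sigma^2 \mid D, D_h) \propto N_\theta\left(\frac{g(T) n_h \bar{y}_h + n_c \bar{y}_c}{g(T) n_h + n_c}, \frac{\sigma^2}{g(T) n_h + n_c}\right) IG_{\sigma^2}(\alpha_\Delta, \beta_\Delta), \qquad (14)$$

where

$$\alpha_\Delta = m + \frac{g(T) n_h + n_c - 3}{2}, \quad \beta_\Delta = \frac{1}{2}\left[\sum_{i=1}^{n_c} y_{c,i}^2 + g(T) n_h \bar{y}_h^2 + g(T) n_h \hat{\sigma}_h^2 - \frac{\left(g(T) n_h \bar{y}_h + n_c \bar{y}_c\right)^2}{g(T) n_h + n_c}\right].$$

For a binary endpoint, the elastic power prior of $p$ is

$$\pi^*(p \mid D_h) \propto p^{g(T) n_h \bar{y}_h + \alpha - 1} (1 - p)^{g(T)(n_h - n_h \bar{y}_h) + \beta - 1} \propto \mathrm{Beta}\left(g(T) n_h \bar{y}_h + \alpha, \; g(T)(n_h - n_h \bar{y}_h) + \beta\right). \qquad (15)$$

Based on the current data $D_c$, the posterior of $p$ is given as

$$\pi(p \mid D, D_h) \propto \mathrm{Beta}\left(g(T) n_h \bar{y}_h + \alpha + n_c \bar{y}_c, \; g(T)(n_h - n_h \bar{y}_h) + \beta + n_c - n_c \bar{y}_c\right). \qquad (16)$$

D2. Elastic commensurate prior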
With a commensurate prior, the shrinkage parameter $\tau$ controls the degree to which $\theta$ shrinks to $\theta_h$, and it is assumed unknown with a prior. However, with an elastic commensurate prior, $\tau$ is determined by $T$ through the elastic function $g(T)$, i.e.,

$$\tau = g(T; \phi). \qquad (17)$$

Then the elastic commensurate prior for $\theta$ is

$$\pi^*(\theta \mid D_h, g(T)) \propto \int_{\theta_h} f(D_h \mid \theta_h) \, \pi(\theta \mid \theta_h, g(T)) \, \pi(\theta_h) \, d\theta_h. \qquad (18)$$

Since $\tau$ is located in $(0, +\infty)$, we adopt the following elastic function:

$$g(T) = \exp(a + b \cdot \log(T)). \qquad (19)$$

If a larger value of $T$ indicates more incongruence between $D_c$ and $D_h$, we require $b < 0$ such that a larger value of $T$ leads to a smaller value of $g(T)$ (i.e., a larger variance inflation). The calibration of $g(T)$ is similar to that described in Section 2.

Let us return to the Gaussian case. We first focus on the historical information borrowing for the location parameter $\theta$, that is, $\theta \mid \theta_h \sim N(\theta_h, \tau^{-1})$, where $\tau = g(T)$. Assuming $\pi(\theta_h) \propto 1$ and integrating out $\theta_h$, the elastic commensurate prior for $\theta$ is

$$\pi^*(\theta \mid D_h, g(T)) \propto N\left(\bar{y}_h, \frac{1}{g(T)} + \frac{\hat{\sigma}_h^2}{n_h}\right). \qquad (20)$$

Multiplying the above elastic commensurate prior with the current likelihood, we obtain the following posterior distribution for $\theta$:

$$\pi(\theta \mid D, D_h, \sigma^2) \propto N\left(\frac{n_c \bar{y}_c \Delta + \sigma^2 \bar{y}_h}{n_c \Delta + \sigma^2}, \frac{\sigma^2 \Delta}{n_c \Delta + \sigma^2}\right), \qquad (21)$$

where $\Delta = \frac{1}{g(T)} + \frac{\hat{\sigma}_h^2}{n_h}$.

If information borrowing is required for both the location parameter $\theta$ and the scale parameter $\sigma^2$, a new precision parameter $\zeta$ is introduced to measure the commensurability between $\sigma^2$ and $\sigma_h^2$. Specifically, we assign $\sigma^2$ a prior that is centered at $\sigma_h^2$ with precision $\zeta$, e.g., $\sigma^2 \mid \sigma_h^2 \sim IG(\sigma_h^2, \zeta^{-1})$, where $IG(\cdot)$ is an inverse gamma distribution with mean $\sigma_h^2$ and variance $\zeta^{-1}$. With an elastic commensurate prior, the precisions are $\tau = g_1(T)$ and $\zeta = g_2(T)$. Given historical data $D_h$, assuming a prior $\pi(\sigma_h^2) \propto (\sigma_h^2)^{-m}$ for $\sigma_h^2$ and integrating out $\theta_h$, the joint elastic commensurate prior for $(\theta, \sigma^2)$ is

$$\pi^*(\theta, \sigma^2, \sigma_h^2 \mid D_h, g_1(T), g_2(T)) \propto f(D_h \mid \theta_h, \sigma_h^2) \, N_\theta\left(\theta_h, g_1(T)^{-1}\right) IG_{\sigma^2}(\alpha', \beta') \times (\sigma_h^2)^{-m} \propto N_\theta\left(\bar{y}_h, \frac{1}{g_1(T)} + \frac{\sigma_h^2}{n_h}\right) IG_{\sigma^2}(\alpha', \beta') \times IG_{\sigma_h^2}\left(\frac{n_h - 3}{2} + m, \frac{n_h \hat{\sigma}_h^2}{2}\right), \qquad (22)$$

where $\alpha' = g_2(T) \sigma_h^4 + 2$ and $\beta' = \sigma_h^2 ( g_2(T) \sigma_h^4$