Integration of Survival Data from Multiple Studies
Steffen Ventz∗, Rahul Mazumder†, Lorenzo Trippa∗
∗Harvard T.H. Chan School of Public Health and Dana-Farber Cancer Institute
†Massachusetts Institute of Technology
July 20, 2020
Abstract
We introduce a statistical procedure that integrates survival data from multiple biomedical studies, to improve the accuracy of predictions of survival or other events, based on individual clinical and genomic profiles, compared to models developed leveraging only a single study or meta-analytic methods. The method accounts for potential differences in the relation between predictors and outcomes across studies, due to distinct patient populations, treatments and technologies to measure outcomes and biomarkers. These differences are modeled explicitly with study-specific parameters. We use hierarchical regularization to shrink the study-specific parameters towards each other and to borrow information across studies. Shrinkage of the study-specific parameters is controlled by a similarity matrix, which summarizes differences and similarities of the relations between covariates and outcomes across studies. We illustrate the method in a simulation study and using a collection of gene-expression datasets in ovarian cancer. We show that the proposed model increases the accuracy of survival prediction compared to alternative meta-analytic methods.

Keywords: Meta-Analysis, Survival Analysis, Risk Prediction, Hierarchical Regularization.

∗ steff[email protected]
1 Introduction

Various biomedical technologies enable the use of omics information for prognostic purposes, to quantify the risk of diseases, or to predict response to treatments. Risk stratification in oncology often utilizes a set of biomarkers to predict cancer progression or death within a time period. The number of covariates can exceed the sample size. This makes the identification of relevant genomic features for risk prediction and the development of accurate models challenging. Penalized regression and methods that utilize multiple datasets have been discussed in this context. Penalization methods enable parameter estimation and prediction when the number of predictors is large [46]. Meta-analyses [16] and integrated analyses [12] combine information from multiple studies for parameter estimation and prediction [24, 48]. These statistical procedures improve the estimation of parameters of interest, with respect to single-study estimates, if the covariate-outcome relations are similar across studies [40, 3, 51]. For instance, Riester et al. [40], Bernau et al. [3] and Waldron et al. [51] showed that meta-analytic procedures tend to outperform the prediction accuracy of models developed using only a single study. But [40, 51, 47] also described heterogeneity of covariate effects across cancer studies, due to differences in assays, treatments and patient populations.

We introduce a model for the integrated analysis of a collection of datasets, with the aim of improving the accuracy of predictions compared to single-study models and meta-analytic procedures. We use study-specific parameters in covariate-outcome regression models. These parameters are estimated borrowing information across studies with hierarchical regularization, which shrinks the study-specific regression coefficients towards each other. We use a square K × K similarity matrix representative of differences and similarities of the covariates' effects across K studies. The matrix is used to estimate the study-specific models. The regression parameters of each study are shrunken more towards the parameters of similar studies and less towards the remaining studies.

In previous work on integrative analyses, Liu et al. [28] discussed Bayesian methods and variable selection for accelerated failure time models. Hierarchical normal models for multi-study gene expression analyses have been developed in [12, 11, 13], and Ma et al. [30, 31] studied penalized regression methods for integrative analyses, focusing on binary and accelerated failure-time outcome models. For case-control analyses with multiple datasets, Liu et al. [29] proposed an adaptive group-LASSO (agLASSO) procedure, and Cheng et al. [10] extended the approach using different regularization techniques. In the following sections we introduce a procedure which builds on the work that we mentioned. The procedure shrinks study-specific parameters towards each other, accounting for the degree of similarity specific to each pair of studies.

2 Model

We consider K studies with time-to-event outcomes and predictors such as gene expression measurements. For each study k = 1, . . . , K the vector Y_k = {Y_{k,i}}_{i=1}^{n_k} indicates (possibly censored) survival times of n_k individuals and C_k = {C_{k,i}}_{i=1}^{n_k} denotes the vector of censoring indicators, where C_{k,i} = 1 if Y_{k,i} is an observed event time and C_{k,i} = 0 if there is censoring at time Y_{k,i}. The vector x_{k,i} ∈ R^p represents a set of p predictors and X_k = (x_{k,1}, · · · , x_{k,n_k})'. Lastly, D_k = (Y_k, C_k, X_k) indicates the data from study k and D = {D_k}_{k=1}^K is a collection of studies.

We assume that failure times in each study k follow a proportional hazards model [14] with baseline survival function S_k(·) and study-specific coefficients β_k ∈ R^p. The approach that we will describe can be applied to alternative time-to-event models, for instance to accelerated failure time models [53] or accelerated hazards models [9]; computations would only require minor modifications.

Inference is based on Breslow's modification of the partial log-likelihood function [14] for (possibly tied) survival times,

l(β_k, D_k) = Σ_{ℓ=1}^{m_k} [ x̃'_{k,ℓ} β_k − d_{k,ℓ} log Σ_{i: Y_{k,i} ≥ t_{k,ℓ}} exp{x'_{k,i} β_k} ],

where {t_{k,ℓ}}_{ℓ=1}^{m_k} are the m_k unique event times in study k, d_{k,ℓ} denotes the number of observed events at time t_{k,ℓ}, x̃_{k,ℓ} = Σ_i x_{k,i} I(C_{k,i} = 1, Y_{k,i} = t_{k,ℓ}) for ℓ = 1, . . . , m_k, and I(A) is the indicator function of the event A.
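This partial log-likelihood is straightforward to evaluate directly. Below is a minimal numpy sketch; the function name and data layout are our own illustration, not code from the paper.

```python
# Minimal sketch of the Breslow partial log-likelihood defined above.
import numpy as np

def breslow_loglik(beta, y, c, X):
    """Breslow partial log-likelihood l(beta_k, D_k) for one study.

    y : (n,) possibly censored follow-up times
    c : (n,) censoring indicators, 1 = observed event, 0 = censored
    X : (n, p) covariate matrix
    """
    eta = X @ beta                          # linear predictors x'_{k,i} beta_k
    loglik = 0.0
    for t in np.unique(y[c == 1]):          # unique event times t_{k,l}
        events = (c == 1) & (y == t)        # observed events at time t
        d = events.sum()                    # number of tied events d_{k,l}
        x_tilde = X[events].sum(axis=0)     # sum of covariates of these events
        at_risk = y >= t                    # risk set {i : Y_{k,i} >= t}
        loglik += x_tilde @ beta - d * np.log(np.exp(eta[at_risk]).sum())
    return loglik

# Example on simulated data:
rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = rng.exponential(scale=np.exp(-X[:, 0]))
c = rng.integers(0, 2, size=n)
print(breslow_loglik(np.zeros(p), y, c, X))
```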
When the number of predictors exceeds the number of observed events, a unique maximum partial-likelihood estimate does not exist, and maximization of the regularized likelihood function l(β_k, D_k) − R(β_k) has been proposed [46, 34, 43] to obtain covariate-effect estimates. Here R(β_k) is a non-negative function that equals zero when β_k = 0. Popular approaches include the LASSO, ridge, elastic-net and bridge penalties, to name a few [25, 45, 46, 19, 52, 43]. We refer to [5, 6, 50, 3] for comparisons of regularization methods for predicting survival outcomes using genomic profiles.

As demonstrated in [3], studies with nearly identical aims often present different joint distributions of predictors x_{k,i} and outcomes Y_{k,i}. Clusters of studies which correspond to different assays, patient populations, treatments, and study designs have been discussed [3, 47]. We introduce a model with study-specific parameters β_k and a latent parameter β_0, which can be interpreted as the mean parameter across studies. Some studies will have similar vectors β_k due to similarities in the assays and patient populations, while other studies might be considerably different [3, 51]. We estimate the vectors β_k by borrowing information from studies k' ≠ k that are similar to study k. At the same time, studies k' that differ substantially from study k will have little influence on the estimation of β_k. In different words, borrowing of information mirrors similarities and differences across studies. The latent parameter and study-level parameters β = (β_0, · · · , β_K) are estimated using the regularized likelihood

l_R(β) = Σ_{k=1}^{K} l(β_k, D_k) − R_1(β_0) − R_2(β).   (1)

Here the parameters β_k can be interpreted as noisy realizations of β_0, the average effect across studies. The non-negative function R_1(·) regularizes β_0 and is zero when β_0 = 0 (for example a LASSO penalty). Similarly, the non-negative function R_2(·) is zero when β_0 = β_1 = · · · = β_K (see below for examples) and is used to borrow information across studies in the estimation of β. In our applications in Sections 4 and 5 we will use β̂_0 for risk predictions of patients in populations k > K that are not represented in our collection of K studies, whereas for patients belonging to populations k = 1, . . . , K, the estimate β̂_k can be directly used for risk predictions.
Penalized maximum likelihood estimates based on (1) have a Bayesian interpretation; see [45, 23, 37, 35] for a discussion of the relations between regularization methods and Bayesian analyses. Consider a Bayesian model for the unknown parameters β, with prior probability Pr(β_0) ∝ e^{−R_1(β_0)} for the vector β_0 and Pr(β_1, . . . , β_K | β_0) ∝ e^{−R_2(β)} for the study-specific parameters conditionally on β_0. The approximate posterior density of β with respect to the partial likelihood (see [44] for a formal justification) is proportional to

Pr_PL(β | D) ∝ Pr(β_0) Pr(β_1, · · · , β_K | β_0) Π_{k=1}^{K} e^{l(β_k, D_k)}.   (2)

Therefore the mode of (2) coincides with the parameter β that maximizes (1). If we set R_2(β) = Σ_k R̃(β_k, β_0), with R̃(β_k, β_0) ≥ 0, the prior specifies, conditionally on β_0, independent and identically distributed covariate effects β_k. For example, R̃(β_k, β_0) = ||β_k − β_0||²_2 / (2λ²) and R_1(β_0) = ||β_0||²_2 / (2λ_0²) is consistent with the commonly utilized hierarchical normal prior model with, a priori, correlations Cor(β_{k,j}, β_{k',j}) = λ_0² / (λ_0² + λ²) > 0 for k' ≠ k when the latent vector β_0 is integrated out. This regularization implies positive and symmetric borrowing of information for all pairs k ≠ k' of studies, and may not be appropriate for groups of studies with different patient populations.

For the latent mean parameter β_0 we use the elastic-net penalty [54], R_1(β_0) = λ_1 ||β_0||_1 + λ_2 ||β_0||²_2, with λ_1, λ_2 ≥ 0, and with the LASSO and ridge penalties as special cases when λ_2 = 0 and λ_1 = 0 respectively.

To account for differences and similarities of the available studies, we use

R_2(β) = Σ_{j=1}^{p} || β_{K,j} − 1 β_{0,j} ||^a_Σ   (3)

in (1), where 1 is a K-dimensional vector with one in each component, β_{K,j} = (β_{1,j}, · · · , β_{K,j})' refers to covariate j = 1, · · · , p in each study, and ||x||_Σ = √(x' Σ^{−1} x). The symmetric matrix Σ is positive-semidefinite and enables differential borrowing of information across studies. For a = 2, the minimizer of (1) is equivalent to the posterior mode when, a priori, the coefficients β_{K,j}, j = 1, · · · , p, across studies are modeled as multivariate normal with mean 1 β_{0,j} and covariance matrix Σ. In this case Σ_{k,k'} = 0 implies that β_k and β_{k'} are, a priori and conditionally on β_0, independent, whereas a large covariance Σ_{k,k'} > 0 indicates similarity between β_k and β_{k'}. For a = 1 the penalty R_2(β) in (3) becomes the sum of Mahalanobis distances of β_{K,j} from the mean 1 β_{0,j} with covariance matrix Σ. With Σ ∝ I this penalty reduces to the group LASSO [10, 29] with one group for each covariate j = 1, · · · , p.
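A small numerical sketch of the penalty in (3) follows; the function and array names are ours, not from the paper.

```python
# Sketch of the similarity penalty R_2 in (3), for a positive-definite Sigma.
import numpy as np

def similarity_penalty(B, beta0, Sigma, a=2.0):
    """R_2(beta) = sum_j || (beta_{1,j},...,beta_{K,j}) - 1*beta_{0,j} ||_Sigma^a,
    with ||x||_Sigma = sqrt(x' Sigma^{-1} x).

    B     : (K, p) matrix of study-specific coefficients beta_1, ..., beta_K
    beta0 : (p,) latent mean coefficient beta_0
    Sigma : (K, K) positive-definite similarity matrix
    """
    Sigma_inv = np.linalg.inv(Sigma)
    D = B - beta0[None, :]                  # deviations beta_{k,j} - beta_{0,j}
    # quadratic forms d_j' Sigma^{-1} d_j, one per covariate j
    quad = np.einsum('kj,kl,lj->j', D, Sigma_inv, D)
    return np.sum(quad ** (a / 2.0))

K, p = 4, 10
rng = np.random.default_rng(1)
B = rng.normal(size=(K, p))
print(similarity_penalty(B, np.zeros(p), np.eye(K), a=2))
```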
The regularization parameters λ_1, λ_2, a, and Σ determine (i) the sparsity of β̂_0 (the number of components β̂_{0,j} = 0) and (ii) the similarity of the estimates β̂_{1,j}, · · · , β̂_{K,j} across studies, including the number of identical study-specific estimates β̂_{k,j} = β̂_{k',j}.

When a ≥ 1, Σ is positive-definite, and λ_1, λ_2 > 0, the regularized log-partial-likelihood (1) is concave. If we fix λ_2 ≥ 0, a ≥ 1 and Σ, then the number of components β̂_{0,j} equal to 0 increases with λ_1. For instance, consider a > 1, λ_2 = 0, and g(β_0) = max_{β_1, ··· , β_K} Σ_{k=1}^{K} l(β_k, D_k) − R_2(β). The concave map g(β_0) − λ_1 ||β_0||_1 bounds the regularized log-partial-likelihood. If we choose λ_1 larger than max_{1 ≤ j ≤ p} |∂g(β_0)/∂β_{0,j}| at β_0 = 0, then (1) is maximized at β̂ = 0. In contrast, for values of λ_1 below this maximum, some of the estimates β̂_{0,j} will be different from zero.

For λ_1, λ_2 ≥ 0 and a = 1, the choice of Σ can lead to identical study-specific estimates β̂_{1,j} = · · · = β̂_{K,j}. We provide an example with λ_2 = 0 and Σ = σ² I. Let z_k = β_k − β_0, k = 1, · · · , K, and define h(z) = max_{β_0} Σ_{k=1}^{K} l(z_k + β_0, D_k) − R_1(β_0). The map h(z) − Σ_{1 ≤ j ≤ p} ||(z_{1,j}, . . . , z_{K,j})||^a_Σ bounds the re-parametrized regularized log-partial-likelihood (l_R: [β_0, z_1, · · · , z_K] → R). If we specify 1/σ > max_{j,k} |∂h(z)/∂z_{k,j}| at z = 0, then the concave function (1) is maximized at β̂_1 = . . . = β̂_K = β̂_0. More generally, if we don't assume a diagonal Σ and indicate with σ² the largest eigenvalue of Σ, then the equalities β̂_k = β̂_0 hold when 1/σ > max_{j,k} |∂h(z)/∂z_{k,j}| at z = 0.
3 Computation

We use an alternating direction method of multipliers (ADMM) algorithm to estimate β; see [7] for an introduction to this algorithm. We first formulate the optimization of (1) with respect to β = (β_0, · · · , β_K) as a constrained convex minimization problem,

min_{(β, z)} { Σ_{k=1}^{K} −l(β_k, D_k) + R_1(β_0) + R_2(z) },  z = (z_0, · · · , z_K)', z_k ∈ R^p,

subject to the affine constraints β_k = z_k, k = 0, . . . , K. We then introduce for this minimization problem the scaled augmented Lagrangian

L_ρ(z, β, u) = Σ_{k=1}^{K} −l(β_k, D_k) + R_1(β_0) + R_2(z) + Σ_{k=0}^{K} (ρ/2) ||β_k − z_k + u_k||²_2,   (4)

where ρ > 0 and u = (u_0, · · · , u_K), u_k ∈ R^p, are scaled dual variables. For a fixed ρ > 0, the algorithm that we describe converges to a solution with β = z that maximizes (1). The algorithm minimizes (4) iteratively (i) with respect to β, then (ii) with respect to z, and (iii) it updates u to u ← u + β − z, while keeping at each of the three steps the remaining two parameters fixed. At each iteration of the algorithm, the minimization of (4) with respect to β (step i) can be carried out independently for each component β_k, k = 0, . . . , K, and the minimization with respect to z (step ii) can be carried out independently across covariates j = 1, · · · , p.

The algorithm starts with an initial estimate of β (we use 0 or preliminary estimates of β_k, k = 0, · · · , K), β = z and u = 0. At each iteration, in step (i) the algorithm minimizes (4) with respect to β, keeping z and u fixed, by setting

β_0 = S(ρ(z_0 − u_0), λ_1) / (ρ + 2λ_2),

where S(x, λ) is the coordinate-wise soft-thresholding function s(x_j, λ) = (1 − λ/|x_j|)_+ x_j [45], and

β_k = argmin_b { −l(b, D_k) + (ρ/2) ||b − z_k + u_k||²_2 }

for the remaining k = 1, · · · , K. We used a low-memory quasi-Newton algorithm [8] for the latter minimizations k > 0.
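A minimal sketch of step (i) follows, assuming scipy's L-BFGS-B as the quasi-Newton routine (the paper uses the limited-memory method of [8]); function names are ours.

```python
# Sketch of step (i): closed-form elastic-net proximal step for beta_0,
# quasi-Newton minimization for each beta_k.
import numpy as np
from scipy.optimize import minimize

def soft_threshold(x, lam):
    """Coordinate-wise soft-thresholding s(x_j, lam) = (1 - lam/|x_j|)_+ x_j."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def update_beta0(z0, u0, rho, lam1, lam2):
    """argmin_b lam1*||b||_1 + lam2*||b||_2^2 + (rho/2)*||b - (z0 - u0)||_2^2."""
    return soft_threshold(rho * (z0 - u0), lam1) / (rho + 2.0 * lam2)

def update_beta_k(negloglik, beta_k, z_k, u_k, rho):
    """argmin_b -l(b, D_k) + (rho/2)*||b - z_k + u_k||_2^2 via L-BFGS.

    negloglik : callable b -> -l(b, D_k), e.g. built from the Breslow
                partial log-likelihood sketched in Section 2.
    """
    obj = lambda b: negloglik(b) + 0.5 * rho * np.sum((b - z_k + u_k) ** 2)
    return minimize(obj, beta_k, method="L-BFGS-B").x
```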
In step (ii), the algorithm minimizes (4) with respect to z, keeping β and u fixed. This is done independently for each covariate 1 ≤ j ≤ p, because R_2(·) and the ℓ_2-norms in (4) can be factored into sums of p terms, each involving only the j-th row of z = (z_0, · · · , z_K)' and the j-th row of β + u. For example, when a = 2 in (3),

z' = (I + 2 H' Σ^{−1} H / ρ)^{−1} (β + u)',

where the K × (K + 1) matrix H = [−1, I] is the concatenation of −1 with the K-dimensional identity matrix I. This computation is implemented by first computing the (K + 1) × (K + 1) matrix (I + 2 H' Σ^{−1} H / ρ)^{−1}, and then multiplying it with each column j = 1, · · · , p of (β + u)'. Lastly, in step (iii), u from the last iteration is updated to u + β − z. We iterate these three steps until the ℓ_2-norms of both z − β and the difference between the values of z from two successive iterations become smaller than a pre-specified threshold ε > 0.
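A sketch of the closed-form z-update for a = 2, together with the iteration skeleton, is given below; the names and array layout are ours.

```python
# Sketch of the z-update in step (ii) for a = 2, and of the ADMM skeleton.
import numpy as np

def update_z(B, U, Sigma, rho):
    """Minimizes R_2(z) + (rho/2)*||beta - z + u||_2^2 over z, for a = 2.

    B, U  : (K+1, p) arrays stacking beta_0,...,beta_K and u_0,...,u_K
    Sigma : (K, K) similarity matrix
    """
    K = Sigma.shape[0]
    H = np.hstack([-np.ones((K, 1)), np.eye(K)])        # H = [-1, I_K]
    M = np.eye(K + 1) + 2.0 * H.T @ np.linalg.inv(Sigma) @ H / rho
    # one (K+1)x(K+1) solve, applied to all p covariates at once
    return np.linalg.solve(M, B + U)

# ADMM skeleton (beta-updates omitted; see the step (i) sketch above):
#   B = Z = initial estimates; U = 0
#   repeat:
#     step (i):   update beta_0 (soft-thresholding) and beta_k (quasi-Newton)
#     step (ii):  Z = update_z(B, U, Sigma, rho)
#     step (iii): U = U + B - Z
#   until ||B - Z|| and the change in Z fall below a tolerance eps
```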
4 Simulation study

We consider a total of 18 studies. Either 2, 5, 10 or 15 of these 18 studies are used to estimate the model (1); the remaining 16, 13, 8 or 3 studies are used to evaluate predictions. For each study k = 1, · · · , 18, we drew the sample size n_k of the study from a uniform distribution with lower bound 100, and then generated the covariates x_{k,i} ∈ R^p of observations i = 1, · · · , n_k from a normal distribution x_{k,i} ∼ N(0, V) with covariance V_{j,j'} decaying geometrically in the distance |j − j'| between variables j and j'.
We then generated 100 times the parameters β and a collection of 18 studies D = (D_k)_{k=1}^{18}. In each of these 100 simulations we first generated the vector β_0 ∈ R^{500} from a two-component mixture distribution with (i) a point mass at zero and (ii) a normal distribution with mean zero and variance 0.1. The proportion of zeros of this mixture distribution equals p_0. We then generated p = 500 vectors (ε_{1,j}, . . . , ε_{K,j}) ∼ N_K(0, Σ) and set β_{k,j} = β_{0,j} + ε_{k,j} for each covariate j = 1, . . . , p and study k = 1, . . . , K.
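As an illustration, below is a minimal sketch of this coefficient simulation. The sparsity level p0 and the block-diagonal cluster structure are placeholder choices (cf. the three-cluster scenario Σ_1 displayed in Figure 1), not the paper's exact values.

```python
# Illustrative sketch of the coefficient simulation described above.
import numpy as np

rng = np.random.default_rng(2020)
K, p = 18, 500
p0 = 0.8                                   # placeholder proportion of zeros

# latent effects: mixture of a point mass at zero and N(0, 0.1)
beta0 = np.where(rng.uniform(size=p) < p0, 0.0,
                 rng.normal(scale=np.sqrt(0.1), size=p))

# placeholder block-diagonal Sigma with three clusters of six studies each
Sigma = np.zeros((K, K))
for cluster in (range(0, 6), range(6, 12), range(12, 18)):
    for k in cluster:
        for kp in cluster:
            Sigma[k, kp] = 0.02 if k == kp else 0.01

# study-specific deviations: (eps_{1,j},...,eps_{K,j}) ~ N_K(0, Sigma)
eps = rng.multivariate_normal(np.zeros(K), Sigma, size=p).T   # (K, p)
B = beta0[None, :] + eps                   # beta_{k,j} = beta_{0,j} + eps_{k,j}
```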
We consider three matrices Σ = Σ_1, Σ_2, Σ_3 (see Figure 1) with 3, 2 or a single cluster of studies. Survival times were generated from proportional hazards models with baseline survival functions Ŝ_k(·), regression coefficients β_k, and censoring survival functions Ŝ_{C,k}(·). Here Ŝ_k(·) and Ŝ_{C,k}(·) have been estimated from the ovarian cancer datasets that we discuss in Section 5. For each study k we also generated an additional 1,000 observations that were not used to fit regression models, but were used to evaluate predictions.

Figure 1: Similarity matrices Σ = Σ_1, Σ_2, Σ_3 ∈ R^{18×18} used to simulate the datasets; panels (a)-(c) display Σ_1, Σ_2 and Σ_3. We simulated collections of 18 datasets D = {D_k}_{k=1}^{18} with similarity matrix Σ for β. Studies used to estimate the models: studies 1, 11 (K = 2); studies 1-3, 10, 11 (K = 5); studies 1-4, 10-15 (K = 10); studies 1-6, 9-17 (K = 15). Studies indicated in blue (2 studies), yellow (5 studies), red (10 studies) or green (15 studies) are used to fit models with K = 2, 5, 10 or 15 studies.
Estimation of Σ and selection of (λ_1, λ_2)

We use initial estimates β̂_k, obtained from K independent ridge regression models, to estimate Σ. The procedure leverages the Bayesian interpretation (2) of the regularized likelihood (1). As formalized in (2), with R_2(β) = Σ_{j=1}^{p} ||β_{K,j} − 1 β_{0,j}||^a_Σ, we can interpret (β_{j,1}, · · · , β_{j,K}), j = 1, · · · , p, as p independent vectors, each with covariance matrix Σ. If the β_k, k = 1, · · · , K, were known, we could straightforwardly estimate Σ. For instance, with a = 2, the parameters (β_{j,1}, · · · , β_{j,K}), j = 1, · · · , p, can be interpreted as independent multivariate normal vectors with mean zero and covariance matrix Σ. The joint normal distribution implies that E[β_k | {β_{k'}}_{k' ≠ k}] is a linear function of the coefficients of the remaining studies, with weights determined by Σ.
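A minimal moment-based sketch of this idea follows, assuming Σ is estimated by the sample covariance, across covariates, of the centered initial ridge estimates; this is one natural estimator consistent with interpretation (2), not necessarily the paper's exact procedure.

```python
# Hedged sketch: estimate Sigma from initial ridge estimates hat-beta_k by the
# sample covariance, across covariates j, of the centered study coefficients.
import numpy as np

def estimate_sigma(B_hat):
    """B_hat : (K, p) matrix of initial ridge estimates hat-beta_1..hat-beta_K."""
    D = B_hat - B_hat.mean(axis=0, keepdims=True)   # center at a beta_0 estimate
    return D @ D.T / B_hat.shape[1]                 # (K, K) sample covariance
```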
Either K = 2, 5, 10 or 15 studies (1st to 4th column of Figure 3) were used to estimate the similarity matrix and the model. C-statistics C(β̂_k, D_k) for studies k that were utilized to estimate the model are highlighted inside the brown rectangles (estimated using the additional 1,000 hold-out observations in study k), whereas C-statistics C(β̂_0, D_k) for studies k that were not used to estimate the model are shown on the right of the brown rectangles. The three rows of Figure 3 correspond to scenarios with data generated using Σ_1 (top row of Figure 3), Σ_2 (2nd row), or Σ_3 (bottom row), as illustrated in Figure 1. Red, green and blue box-plots on the top row indicate the three clusters of studies under Σ_1. Similarly, red and green box-plots in the 2nd row indicate the two clusters of studies under Σ_2. Differences in the distribution of the C-statistics between studies within the same cluster are due to differences in the sample sizes n_k and covariate matrices X_k, which remain identical across the simulated datasets.
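Prediction accuracy is summarized throughout with C-statistics C(β̂, D_k), concordance indices for censored survival data (see, e.g., [21, 49]). A minimal sketch of Harrell's estimator, with our own quadratic-time implementation and names:

```python
# Hedged sketch of Harrell's C-statistic: a pair (i, j) is comparable when the
# earlier time is an observed event, and concordant when the shorter survival
# has the larger risk score x'beta. Ties in risk count as 1/2.
import numpy as np

def c_statistic(beta, y, c, X):
    risk = X @ beta
    concordant, comparable = 0.0, 0
    n = len(y)
    for i in range(n):
        if c[i] != 1:
            continue                       # i must be an observed event
        for j in range(n):
            if y[j] > y[i]:                # j survives longer than i
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```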
Figure 3: Predictions with the penalized regression model with a = 2 and Σ = Σ_1, Σ_2 and Σ_3 (see Figure 1), across 100 simulations of a collection of 18 studies. Either K = 2, 5, 10 or 15 studies (studies inside the brown rectangles) were used for estimation/selection of (Σ, λ_1, β). The colors (red, green, blue) of the box-plots indicate clusters of studies (3, 2, or 1 clusters when Σ = Σ_1, Σ_2 or Σ_3).

For Σ = Σ_1 (1st row of Figure 3), with three clusters of studies, predictions show improvements when the number of studies K used to train the regression models increases. For K = 2 or 5, all studies used for estimation belong to the first two clusters (red and green box-plots).
In these two cases, for each study k = 13, · · · , 18 in cluster 3 (blue box-plots), the inter-quartile range (IQR) of the C-statistics C(β̂_0, D_k) across simulations lies within the interval 0.52 to 0.55.
Whereas for K = 10 (3rd column; studies 1-4 and 10-15 are used for estimation), studies from all three clusters have been used for training. In this case, the IQRs of C(β̂_0, D_k) across simulations for all three hold-out studies k = 16, 17, 18 in cluster 3 are within the interval 0.65 to 0.69.
The last row of Figure 3 shows that, as expected, borrowing of information in the estimation of model parameters is most effective in the case of a single cluster of studies.

Next, we compared our estimates of β based on model (1), with a = 2 for R_2(·) and a ridge penalty (HR-R, λ_1 = 0) or LASSO penalty (HR-L, λ_2 = 0) for β_0, to Cox models trained separately on each study D_k with LASSO (single-study LASSO, SL) or ridge penalties (single-study ridge, SR) for β_k. In addition, we consider two models that combine all (2, 5, 10 or 15) studies into a single dataset and estimate a single Cox model (with regression parameters β_0) using a LASSO (pooled LASSO, PL) or ridge (pooled ridge, PR) penalty for the coefficients β_0. We also consider two meta-analysis approaches described in [51, 40] that combine study-specific estimates β̂_k into a single vector β̂_0 using either fixed-effects (FE) or random-effects (RE) estimation.
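For concreteness, a hedged sketch of the FE and RE combination rules, assuming standard inverse-variance weighting and the DerSimonian-Laird moment estimator [16], applied coordinate-wise to each covariate; the exact implementations we compare against follow [40, 51].

```python
# Sketch of fixed- and random-effects combination of study-specific estimates.
import numpy as np

def fixed_effects(beta_hat, se):
    """beta_hat, se : (K,) study estimates and standard errors, one covariate."""
    w = 1.0 / se**2
    return np.sum(w * beta_hat) / np.sum(w)

def random_effects(beta_hat, se):
    """DerSimonian-Laird: deflate weights by the between-study variance tau^2."""
    w = 1.0 / se**2
    K = len(beta_hat)
    mu_fe = np.sum(w * beta_hat) / np.sum(w)
    Q = np.sum(w * (beta_hat - mu_fe) ** 2)          # Cochran's Q
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (Q - (K - 1)) / c)               # moment estimate of tau^2
    w_re = 1.0 / (se**2 + tau2)
    return np.sum(w_re * beta_hat) / np.sum(w_re)
```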
Figure 4 and supplementary Figures A.1 and A.2 show the average C-statistics of each method when we used K = 5 or K = 10 studies for estimation. The pooled LASSO and ridge models (PL and PR) and the meta-analysis methods (FE and RE) estimate a single parameter β_0, which was used to compute the C-statistics C(β̂_0, D_k) for each study k. For the single-study SL and SR models we used the study-specific estimates β̂_k to compute C(β̂_k, D_k) for in-study prediction (using the 1,000 validation observations). For prediction with SL and SR in studies k' not used for estimation, we used each estimate β̂_k of the K = 5 (or 10) training studies for predictions C(β̂_k, D_{k'}) in all hold-out studies k'; for each hold-out study k' we then averaged these C(β̂_k, D_{k'}) over all K = 5 (or 10) training studies, i.e. Figure 4 reports Σ_k C(β̂_k, D_{k'})/K for hold-out studies k'.

For studies k used to train the model, both HR-L and HR-R improve predictions C(β̂_k, D_k) substantially compared to single-study estimates SL and SR. For instance, with K = 5 and unknown Σ = Σ_1 (three clusters of studies), the average difference between the C(β̂_k, D_k) of HR-R and SR is between 0.07 and 0.16 for each of the five studies (0.62 to 0.75 for SR compared to 0.73 to 0.85 for HR-R). Similarly, meta-analytic and pooled estimates FE, RE and PL, PR improve predictions on the K = 5 datasets compared to single-study estimates, especially PR. But improvements are smaller than those of HR-R and HR-L, with C-values on average (across simulations) between 0.08 and 0.15 below the HR-R and HR-L models (for instance, 0.63 to 0.73 for PR compared to 0.73 to 0.85 for HR-R).
Figure 4: Average prediction across 100 simulations of a collection of 18 studies. Study-specific effects β_k have been generated under Σ_1 (see Figure 1). Either 5 or 10 of the 18 studies (studies to the left of the horizontal red bar) are used for the similarity matrix and covariate-effect estimation. See Figures A.1 and A.2 for results with similarity scenarios Σ_2 and Σ_3.

When K = 10 studies are used to estimate the models, results are similar to the setting with K = 5: meta-analytic, pooled and hierarchical estimates improve predictions over single-study estimates, with larger improvements for HR-R and HR-L estimates for in-study predictions.

In the case of a single cluster of studies (Σ = Σ_3, supplementary Figure A.2), with strong similarity of the study-specific parameters β_k, pooling of studies to estimate a single β_0 is expected to be the most favorable prediction approach. Therefore, PL, PR, HR-R and HR-L predict survival substantially better than the FE, RE, SL and SR methods (supplementary Figure A.2). In-study predictions based on HR-R and HR-L estimates are on average slightly better than for PR and PL estimates (differences of 0.01 to 0.04 for HR-R compared to PR with K = 5 studies, and 0.02 to 0.05 with K = 10), whereas PR, PL, HR-R and HR-L have similar average C-statistics for hold-out studies.

5 Survival prediction in ovarian cancer

We applied model (1) to predict survival in ovarian cancer using the curatedOvarianData repository, a curated collection of gene-expression datasets [20]. To evaluate predictions, we split the largest study in the database, the TCGA dataset [32] with 510 observations, 1,000 times randomly into a training dataset of n = 50, 100, . . . , or 300 observations and a validation dataset with 510 − n observations. We predicted patient survival Y_{1,i} in the TCGA holdout data by leveraging the hierarchical regularization model (1) using five additional datasets k = 2, · · · , K = 6 (PMID-17290060 [17], GSE51088 [26], MTAB386 [2], GSE13876 [15] and GSE19829 [27]) with sample sizes ranging between n_k = 42 (GSE19829) and 157 (GSE13876) observations. In all the analyses we used the expression values of the p = 3,030 genes that are common to all six studies to predict patient survival.
To evaluate the hierarchical regularization method (1), we created different cross-study heterogeneity scenarios that are motivated by documented inconsistencies across cancer datasets and by possible pre-processing errors [33, 39, 4, 1, 38, 41]. This is achieved by introducing in one (scenario 2: GSE13876 [15]) or two studies (scenario 3: GSE13876 [15] and GSE19829 [27]) a distortion of the expression values x_{k,i,j}, which become 10 − x_{k,i,j}, j = 1, . . . , p, for study k = K (scenario two) or studies k = K − 1, K (scenario three). In scenario one we used the unaltered covariates x_{k,i,j} of the six studies.
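A small sketch of how such a scenario can be constructed follows; the function and study ordering are our own illustration.

```python
# Sketch of the scenario distortions: expression values of the last one or two
# studies are flipped to 10 - x; scenario one leaves all covariates unaltered.
import numpy as np

def distort_studies(X_list, scenario):
    """X_list : list of (n_k, p) expression matrices, ordered so that the
    distorted studies (GSE13876, GSE19829) come last."""
    X_list = [X.copy() for X in X_list]
    if scenario >= 2:
        X_list[-1] = 10.0 - X_list[-1]     # scenario 2: distort study K
    if scenario == 3:
        X_list[-2] = 10.0 - X_list[-2]     # scenario 3: also distort study K-1
    return X_list
```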
Similar to Section 4, we consider (i) single-study Cox models with LASSO (SL) or (ii) ridge (SR) regularization, trained on the n = 50, · · · , 300 TCGA observations; models that pool the TCGA training observations and the remaining five studies (PMID-17290060, GSE51088, MTAB386, GSE13876 and GSE19829) into a single dataset with (iii) LASSO (PL) or (iv) ridge (PR) regularization; (v) fixed-effects (FE) and (vi) random-effects (RE) meta-analysis models as described in [40, 3, 51]; and (vii) the proposed hierarchical regularization model (1) with a = 2 (HR-R).

Single-study Cox models with LASSO penalty trained on the TCGA data with n = 50 data points had low average C-values of 0.505 across the 1,000 generated training-validation samples, with minor improvements up to 0.52 when n = 300 observations are used for model training. Single-study ridge regression models performed substantially better, with average C-statistics ranging between 0.53 for n = 50 and 0.57 for n = 300 observations. Improvements in risk predictions through the integration of additional studies vary substantially across data-integration methods and scenarios. For scenario 1, FE and RE meta-analyses have nearly constant and identical average C-statistics of 0.57 across all sample sizes n, while PL had an average C-statistic of 0.56 for n = 50 with minor improvements up to 0.57 when n = 300. Both HR-R and PR have similar prediction accuracy across sample sizes n, with identical average C-statistics of 0.60 when n = 50 and modest improvements up to 0.61, for both HR-R and PR, when n = 300.

Figure 5 shows, for scenarios two and three, average C-statistics for the TCGA validation samples. Different curves correspond to different prediction methods. The black curves show the average C-statistics (y-axis) across the 1,000 TCGA validation samples of size 510 − n for Cox models trained on n = 50, · · · , 300 observations from the TCGA study (x-axis) using either SL (dotted curve) or SR (solid curve). The red curves show the average C-statistic for PR (solid curve) and PL (dotted curve) models, the green curves correspond to FE (dotted curve) and RE (solid curve) meta-analysis models, and the blue curve corresponds to HR-R.
Figure 5: Average C-statistics for single-study SL and SR methods with n = 50, · · · , 300 TCGA training samples, and for methods that also use training samples from five additional studies (PMID-17290060, GSE51088, MTAB386, GSE13876 and GSE19829). Left and right panels: scenarios 2 and 3; x-axis: sample size of the TCGA study; y-axis: average C-statistic.

In scenario two, the RE meta-analysis, which combines estimates from the n = 50 TCGA data points with estimates from the remaining five studies, has the same average C-statistic as the single-study SR model trained on n = 240 patients. For sample sizes n > 50, the hierarchical model HR-R trained with n = 50 TCGA patients has an average C-statistic that is superior to those of the RE and FE procedures with n = 50, · · · , 300; HR-R estimates based on n = 50 TCGA patients have similar average C-values as single-study SR models with n ≈ 200 patients.
PR, PL, FE and RE methods rely on the assumption that the regression parameters are similar across studies. With substantial departures from this assumption, the hierarchical model HR-R shows, across all sample sizes 50 ≤ n ≤ 300, better predictive accuracy than these alternative integration methods.

6 Discussion

The analysis of relations between omics variables and time-to-event outcomes, and the use of individual profiles x_{k,i} for predictions, are particularly challenging when the sample size n_k is small. These analyses often include thousands of potential predictors. The use of multiple studies and pooling of information can improve prediction accuracy. Meta-analyses can be utilized when the relations of covariates and outcomes are homogeneous across studies. But recent work in oncology [40, 51, 47] showed that there can be clusters of studies with relevant discrepancies in their covariate-outcome relations due, for example, to differences in study designs, patient populations and treatments.

We combined two established concepts: regularization of regression models [25, 45, 46, 19] and metrics of similarity between datasets [22] that identify clusters of studies. We used these concepts to estimate study-specific regression parameters β_k and for predictions, both in the k = 1, . . . , K contexts that are represented in our collection of datasets, for example K distinct geographic regions, and in other contexts (k = K + 1) by estimating the latent parameters β_0.

The K × K similarity matrix Σ is used to regularize the likelihood function, and it tunes the degree of borrowing of information in the estimation of the K study-specific regression models. It shrinks the estimate of the k-th study-specific regression parameter β_k towards the estimates β_{k'} of studies k' that are similar to study k (large Σ_{k,k'}).
In contrast, studies with low similarity (Σ_{k,k'} ≈ 0) have little influence on the estimation of β_k. In our analyses we verified that, if there are clusters of studies with similar predictor-outcome relations, then the introduced method improves the accuracy of predictions compared to alternative procedures, including single-study estimates, meta-analyses and pooling of all studies into a single data matrix.

Funding
Lorenzo Trippa was partially supported by the NSF grant 1810829.
References

[1] C. R. Acharya, D. S. Hsu, C. K. Anders, A. Anguiano, K. H. Salter, K. S. Walters, R. C. Redman, S. A. Tuchman, C. A. Moylan, S. Mukherjee, et al. Gene expression signatures, clinicopathological features, and individualized therapy in breast cancer. JAMA, 299(13):1574-1587, 2008.

[2] S. Bentink, B. Haibe-Kains, T. Risch, J.-B. Fan, M. S. Hirsch, K. Holton, R. Rubio, C. April, J. Chen, E. Wickham-Garcia, et al. Angiogenic mRNA and microRNA gene expression signature predicts a novel subtype of serous ovarian cancer. PLoS One, 7(2), 2012.

[3] C. Bernau, M. Riester, A.-L. Boulesteix, G. Parmigiani, C. Huttenhower, L. Waldron, and L. Trippa. Cross-study validation for the assessment of prediction algorithms. Bioinformatics, 30(12):i105-i112, 2014.

[4] H. Bonnefoi, A. Potti, M. Delorenzi, L. Mauriac, M. Campone, M. Tubiana-Hulin, T. Petit, P. Rouanet, J. Jassem, E. Blot, et al. Retracted: Validation of gene signatures that predict the response of breast cancer to neoadjuvant chemotherapy: a substudy of the EORTC 10994/BIG 00-01 clinical trial, 2007.

[5] H. M. Bøvelstad, S. Nygård, H. L. Størvold, M. Aldrin, Ø. Borgan, A. Frigessi, and O. C. Lingjærde. Predicting survival from microarray data - a comparative study. Bioinformatics, 23(16):2080-2087, 2007.

[6] H. M. Bøvelstad, S. Nygård, and Ø. Borgan. Survival prediction from clinico-genomic models - a comparative study. BMC Bioinformatics, 10(1):1, 2009.

[7] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122, 2011.

[8] R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16(5):1190-1208, 1995.

[9] Y. Q. Chen and M.-C. Wang. Analysis of accelerated hazards models. Journal of the American Statistical Association, 95(450):608-618, 2000.

[10] X. Cheng, W. Lu, and M. Liu. Identification of homogeneous and heterogeneous variables in pooled cohort studies. Biometrics, 71:397-403, 2015.

[11] E. Conlon, B. Postier, B. Methe, K. Nevin, and D. Lovley. Hierarchical Bayesian meta-analysis models for cross-platform microarray studies. Journal of Applied Statistics, 36(10):1067-1085, 2009.

[12] E. M. Conlon, J. J. Song, and J. S. Liu. Bayesian models for pooling microarray studies with multiple sources of replications. BMC Bioinformatics, 7(1):1, 2006.

[13] E. M. Conlon, B. L. Postier, B. A. Methé, K. P. Nevin, and D. R. Lovley. A Bayesian model for pooling gene expression studies that incorporates co-regulation information. PLoS One, 7(12):e52137, 2012.

[14] D. R. Cox. Regression models and life-tables. Journal of the Royal Statistical Society, Series B (Methodological), 34(2):187-220, 1972.

[15] A. P. Crijns, R. S. Fehrmann, S. de Jong, F. Gerbens, G. J. Meersma, H. G. Klip, H. Hollema, R. M. Hofstra, G. J. te Meerman, E. G. de Vries, et al. Survival-related profile, pathways, and transcription factors in ovarian cancer. PLoS Medicine, 6(2), 2009.

[16] R. DerSimonian and N. Laird. Meta-analysis in clinical trials. Controlled Clinical Trials, 7(3):177-188, 1986.

[17] H. K. Dressman, A. Berchuck, G. Chan, J. Zhai, A. Bild, R. Sayer, J. Cragun, J. Clarke, R. S. Whitaker, L. Li, et al. An integrated genomic-based approach to individualized treatment of patients with advanced-stage ovarian cancer. Journal of Clinical Oncology, 25(5):517-525, 2007.

[18] M. L. Eaton. Multivariate Statistics: A Vector Space Approach. John Wiley & Sons, 1983.

[19] W. J. Fu. Penalized regressions: the bridge versus the lasso. Journal of Computational and Graphical Statistics, 7(3):397-416, 1998.

[20] B. F. Ganzfried, M. Riester, B. Haibe-Kains, T. Risch, S. Tyekucheva, I. Jazic, X. V. Wang, M. Ahmadifar, M. J. Birrer, G. Parmigiani, et al. curatedOvarianData: clinically annotated data for the ovarian cancer transcriptome. Database, 2013, 2013.

[21] F. E. Harrell Jr, K. L. Lee, R. M. Califf, D. B. Pryor, and R. A. Rosati. Regression modelling strategies for improved prognostic prediction. Statistics in Medicine, 3(2):143-152, 1984.

[22] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009.

[23] T. Hastie, R. Tibshirani, and M. Wainwright. Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman and Hall/CRC, 2015.

[24] L. V. Hedges and I. Olkin. Statistical Methods for Meta-Analysis. Academic Press, 2014.

[25] A. E. Hoerl and R. W. Kennard. Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12(1):55-67, 1970.

[26] B. Y. Karlan, J. Dering, C. Walsh, S. Orsulic, J. Lester, L. A. Anderson, C. L. Ginther, M. Fejzo, and D. Slamon. POSTN/TGFBI-associated stromal signature predicts poor prognosis in serous epithelial ovarian cancer. Gynecologic Oncology, 132(2):334-342, 2014.

[27] P. A. Konstantinopoulos, D. Spentzos, B. Y. Karlan, T. Taniguchi, E. Fountzilas, N. Francoeur, D. A. Levine, and S. A. Cannistra. Gene expression profile of BRCAness that correlates with responsiveness to chemotherapy and with outcome in patients with epithelial ovarian cancer. Journal of Clinical Oncology, 28(22):3555, 2010.

[28] F. Liu, D. Dunson, and F. Zou. High-dimensional variable selection in meta-analysis for censored data. Biometrics, 67(2):504-512, 2011.

[29] M. Liu, W. Lu, V. Krogh, G. Hallmans, T. V. Clendenen, and A. Zeleniuch-Jacquotte. Estimation and selection of complex covariate effects in pooled nested case-control studies with heterogeneity. Biostatistics, 14(4):682-694, 2013.

[30] S. Ma, J. Huang, and X. Song. Integrative analysis and variable selection with multiple high-dimensional data sets. Biostatistics, 12(4):763-775, 2011.

[31] S. Ma, J. Huang, F. Wei, Y. Xie, and K. Fang. Integrative analysis of multiple cancer prognosis studies with gene expression measurements. Statistics in Medicine, 30(28):3361-3371, 2011.

[32] The Cancer Genome Atlas Research Network et al. Integrated genomic analyses of ovarian carcinoma. Nature, 474(7353):609, 2011.

[33] Department of Health and Human Services. Findings of research misconduct. NIH Guide for Grants and Contracts, 16(021), 2015.

[34] M. Y. Park and T. Hastie. L1-regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4):659-677, 2007.

[35] T. Park and G. Casella. The Bayesian lasso. Journal of the American Statistical Association, 103(482):681-686, 2008.

[36] M. J. Pencina and R. B. D'Agostino. Overall C as a measure of discrimination in survival analysis: model specific population value and confidence interval estimation. Statistics in Medicine, 23(13):2109-2123, 2004.

[37] N. G. Polson, J. G. Scott, B. T. Willard, et al. Proximal algorithms in statistics and machine learning. Statistical Science, 30(4):559-581, 2015.

[38] A. Potti, A. Bild, H. K. Dressman, D. A. Lewis, J. R. Nevins, and T. L. Ortel. Gene-expression patterns predict phenotypes of immune-mediated thrombosis. Blood, 107(4):1391-1396, 2006.

[39] A. Potti, H. K. Dressman, A. Bild, R. F. Riedel, G. Chan, R. Sayer, J. Cragun, H. Cottrill, M. J. Kelley, R. Petersen, et al. Genomic signatures to guide the use of chemotherapeutics. Nature Medicine, 12(11):1294-1300, 2006.

[40] M. Riester, W. Wei, L. Waldron, A. C. Culhane, L. Trippa, E. Oliva, S.-h. Kim, F. Michor, C. Huttenhower, G. Parmigiani, et al. Risk prediction for late-stage ovarian cancer by meta-analysis of 1525 patient samples. Journal of the National Cancer Institute, page dju048, 2014.

[41] K. H. Salter, C. R. Acharya, K. S. Walters, R. Redman, A. Anguiano, K. S. Garman, C. K. Anders, S. Mukherjee, H. K. Dressman, W. T. Barry, et al. An integrated approach to the prediction of chemotherapeutic response in patients with breast cancer. PLoS One, 3(4):e1908, 2008.

[42] J. Shao. Linear model selection by cross-validation. Journal of the American Statistical Association, 88(422):486-494, 1993.

[43] N. Simon, J. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for Cox's proportional hazards model via coordinate descent. Journal of Statistical Software, 39(5):1, 2011.

[44] D. Sinha, J. G. Ibrahim, and M. Chen. A Bayesian justification of Cox's partial likelihood. Biometrika, 90(3):629-641, 2003. doi: 10.1093/biomet/90.3.629.

[45] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), pages 267-288, 1996.

[46] R. Tibshirani. The lasso method for variable selection in the Cox model. Statistics in Medicine, 16(4):385-395, 1997.

[47] L. Trippa, L. Waldron, C. Huttenhower, and G. Parmigiani. Bayesian nonparametric cross-study validation of prediction methods. Annals of Applied Statistics, 9(1):402-428, 2015. doi: 10.1214/14-AOAS798.

[48] L. Trippa, L. Waldron, C. Huttenhower, G. Parmigiani, et al. Bayesian nonparametric cross-study validation of prediction methods. The Annals of Applied Statistics, 9(1):402-428, 2015.

[49] H. Uno, T. Cai, M. J. Pencina, R. B. D'Agostino, and L. Wei. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine, 30(10):1105-1117, 2011.

[50] W. N. Van Wieringen, D. Kun, R. Hampel, and A.-L. Boulesteix. Survival prediction using gene expression data: a review and comparison. Computational Statistics & Data Analysis, 53(5):1590-1603, 2009.

[51] L. Waldron, B. Haibe-Kains, A. C. Culhane, M. Riester, J. Ding, X. V. Wang, M. Ahmadifar, S. Tyekucheva, C. Bernau, T. Risch, et al. Comparative meta-analysis of prognostic gene signatures for late-stage ovarian cancer. Journal of the National Cancer Institute, 106(5):dju049, 2014.

[52] H. Wang and C. Leng. A note on adaptive group lasso. Computational Statistics & Data Analysis, 52(12):5277-5286, 2008. doi: 10.1016/j.csda.2008.05.006.

[53] L. Wei. The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Statistics in Medicine, 11(14-15):1871-1879, 1992.

[54] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301-320, 2005.
Appendix

Figure A.1: Average prediction across 100 simulations of a collection of 18 studies. Study-specific effects β_k have been generated under Σ_2. Either 5 or 10 of the 18 studies (studies to the left of the horizontal red bar) are used for the similarity matrix and covariate-effect estimation.

Figure A.2: Average prediction across 100 simulations of a collection of 18 studies. Study-specific effects β_k have been generated under Σ_3.