[PDF] Precision Therapeutic Biomarker Identification with Application to the Cancer Genome Project

Abstract

Cancer cell lines have frequently been used to link drug sensitivity and resistance with genomic profiles. To capture genomic complexity in cancer, the Cancer Genome Project (CGP) (Garnett et al., 2012) screened 639 human tumor cell lines with 130 drugs ranging from known chemotherapeutic agents to experimental compounds. Questions of interest include: i) can cancer-specific therapeutic biomarkers be detected, ii) can drug resistance patterns be identified along with predictive strategies to circumvent resistance using alternate drugs, iii) can biomarkers of drug synergies be predicted ? To tackle these questions, following statistical challenges still exist: i)biomarkers cluster among the cell lines; ii) clusters can overlap (e.g. a cell line may belong to multiple clusters); iii) drugs should be modeled jointly. We introduce a multivariate regression model with a latent overlapping cluster indicator variable to address above issues. A generalized finite mixture of multivariate regression (FMMR) model in connection with the new model and a new EM algorithm for fitting are proposed. Re-analysis of the dataset sheds new light on the therapeutic inter-relationships between cancers as well existing and novel drug behaviors for the treatment and management of cancer.

Full PDF

PPrecision Therapeutic Biomarker Identiﬁcation withApplication to the Cancer Genome Project

Hongmei Liu and J. Sunil Rao

Division of Biostatistics, University of Miami, Miami, FL [email protected]@biostat.med.miami.edu

Cancer cell lines have frequently been used to link drug sensitivity and resistance withgenomic proﬁles. To capture genomic complexity in cancer, the Cancer Genome Project(CGP) (Garnett et al., 2012) screened 639 human tumor cell lines with 130 drugsranging from known chemotherapeutic agents to experimental compounds. Questionsof interest include: i) can cancer-speciﬁc therapeutic biomarkers be detected, ii) candrug resistance patterns be identiﬁed along with predictive strategies to circumventresistance using alternate drugs, iii) can biomarkers of drug synergies be predicted ? Totackle these questions, following statistical challenges still exist: i)biomarkers clusteramong the cell lines; ii) clusters can overlap (e.g. a cell line may belong to multipleclusters); iii) drugs should be modeled jointly. We introduce a multivariate regressionmodel with a latent overlapping cluster indicator variable to address above issues. Ageneralized ﬁnite mixture of multivariate regression (FMMR) model in connection withthe new model and a new EM algorithm for ﬁtting are proposed. Re-analysis of thedataset sheds new light on the therapeutic inter-relationships between cancers as wellexisting and novel drug behaviors for the treatment and management of cancer.

Key Words:

Cancer biomarkers; EM algorithm; ﬁnite mixture of multivariate regressionmodel; LASSO; overlapping clustering.

The use of drugs to selectively target speciﬁc genetic alterations in deﬁned patient subpopu-lations has seen signiﬁcant successes. One example can be found in the treatment of chronicmyeloid leukaemia (CML) where the ﬁrst consistent chromosomal abnormality associatedwith a human cancer was identiﬁed back in the 1960s. Fast forward to the 1980s where theconsequence of this abnormality was discovered to be the production of an abnormal genecalled BCR-ABL. Intense drug discovery programs were initiated to shut down the activityof BCR-ABL, and in 1992, imatinib (Gleevec) was developed. In 1998, the drug was testedin CML patients who had exhausted standard treatment options and whose life expectancywas limited, with remarkable results in their blood counts returning to normal. In 2001, theFDA approved imatinib. Today, a once commonly fatal cancer now has a ﬁve-year survivalrate of 95% (Druker et al., 2006).Achievements like this largely inspire today’s high throughput screening studies of link-ing cancer drugs (known or in development) to speciﬁc genomic changes which could beused as therapeutic biomarkers. The hope is that such analyses will shed light on biologicalmechanisms underlying drug sensitivity, tumor resistance and potential drug combinationsynergies. a r X i v : . [ s t a t . A P ] F e b Liu and Rao

Cancer cell lines have frequently been used as a convenient way of conducting suchstudies. For a systematic search of therapeutic biomarkers to a variety of cancer drugs, theCancer Genome Project (CGP) (Garnett et al., 2012) screened 639 human tumor cell lines,which represent much of the tissue-type and genetic diversity of human cancers, with 130drugs. These drugs, including approved drugs, drugs in development as well as experimentaltool compounds, cover a wide range of targets and processes involved in cancer biology. Arange of 275–507 cell lines were screened for each drug. The eﬀect of a 72h drug treatment oncell viability was examined to derive such measures of drug sensitivity as the half-maximalinhibitory concentration ( IC ). The cell lines underwent sequencing of 64 known cancergenes, genome-wide analysis of copy number gains and losses, and expression proﬁling of14,500 genes.Given the degree of complexity of this dataset, the multivariate analysis of variance(MANOVA) and the Elastic-Net regression applied in Garnett et al. (2012) are insuﬃcientfor precise knowledge discovery. First, the marginal drug-feature associations discoveredin MANOVA rarely reﬂect true relationships, as it is more likely that sensitivity of cancercells to drugs depends on a multiplicity of genomic and epigenomic features with potentialinteractions. Second, the Elastic-Net regression fails to concern following issues: 1) sincethe 639 cell lines come from a variety of cancer tissue types, there is likely additionalheterogeneity manifested as subpopulations with overlaps in data; 2) note that the 130drugs (response variables) are hardly independent, one can improve the prediction accuracyby modeling with multiple drugs (Breiman and Friedman, 1997).Moreover, there are some direct questions of interest from a subject matter perspectivethat we want to address. These include, i) can cancer-speciﬁc therapeutic biomarkers bedetected, ii) can drug resistance patterns be identiﬁed along with predictive strategies tocircumvent resistance using alternate drugs, iii) can biomarkers of combination therapiesbe identiﬁed to help predict synergies in drug activities ? To tackle these questions andpreviously discussed statistical challenges, we propose a multivariate regression model witha latent overlapping cluster indicator variable. Fitting procedures inducing concurrentvariable selection are introduced.The rest of the paper is organized as follows. In Section 1.2, we give a selective overviewof existing clustering and overlapping clustering methods for general (without response vari-ables) and regression data. In Section 2, a new statistical model is introduced, a generalizedFMMR model in connection with the new model and a new EM algorithm for ﬁtting areprovided. We also establish a type of consistency optimality for estimation of the general-ized FMMR model and perform some small simulation studies to empirically demonstratethis. In Section 3, we put forward another ﬁtting solution to the new model for comparisonwith the generalized FMMR model. Section 4 contains a comprehensive re-analyses of theCGP data using the proposed method. Discussion is included in Section 5. Clustering is a well established technique to group data elements based on a measure ofsimilarity. Traditional clustering techniques generate partitions so that each data pointbelongs to one and only one cluster. It has long been recognized that such ideal partitionseldom exists in real data (Needham, 1965). It is more likely that clusters overlap in some recision Therapeutic Biomarker Identiﬁcation Y ∈ R q and covariates X ∈ R p by ﬁtting a regression model rather than to explore the X or Y on its own. In regression analyses with q = 1, ﬁnite mixture of regression (FMR) modelsare commonly used to capture unobserved cross-sectional heterogeneity in the data (Jedidiet al., 1996). The FMR model postulates that a sample of observations come from a ﬁnitemixture of latent partitioning sub-populations with each sub-population represented by aregression model. It was ﬁrst introduced to statistical literatures by Quandt (1972). De-Sarbo and Cron (1988) proposed an EM algorithm (Dempster et al., 1977) based maximumlikelihood estimation for the FMR model. Khalili and Chen (2012) put forward a penalizedlikelihood approach for variable selection in FMR models. As a natural generalization tomultivariate responses case, Jones and McLachlan (1992) introduced the FMMR model.Grun and Leisch (2008) proposed a R package ﬂexmix for ﬁtting FMMR models, howeverit assumes that the response variables are independent. For a sample (of cell lines) of n observations, denote y i = ( y i , . . . , y iq ) T ∈ R q a vectorof responses ( IC values), x i = ( x i , . . . , x ip n ) T ∈ R p n a vector of predictors (genomicmarkers) and ε i = ( ε i , . . . , ε iq ) T a vector of random errors for the i th observation. Assume ε i iid ∼ N q ( , Σ ) for i = 1 , . . . , n . The following proposed model allows overlapping clusteringfor multivariate regression data, y i = K (cid:88) k =1 B Tk x i P ik + ε i , i = 1 , . . . , n, (1)where K is the total number of clusters, B k is an unknown p n × q coeﬃcient matrix forthe k th cluster, and P ik ∈ { , } is 1 if observation i belongs to the k th cluster, otherwise0. In traditional clustering problem, it assumes that each observation belongs to exactlyone cluster, namely (cid:80) Kk =1 P ik = 1 for all i . Therefore we allow (cid:80) Kk =1 P ik ≥ k contains asubset of observations for whom P ik = 1. We postulate that not all genomic features arerelevant in describing the cluster k , thus assume a sparse coeﬃcient matrix B k . Since thesparse patterns can vary with k , each cluster is represented by a unique set of biomarkers.For observations belong to multiple clusters, their response variables are explained by multi-ple sets of biomarkers, indicative of involving in several biological processes simultaneously. Liu and Rao

When (cid:80) Kk =1 P ik = 1, model (1) can be characterized by a hierarchical structure which endsup with an FMMR model. Consider a latent cluster membership random variable z i forobservation i from (1). Given (cid:80) Kk =1 P ik = 1, the range of z i equals to { , . . . , K } . Assumethat P ( z i = k ) = π k for each k , then (cid:80) Kk =1 π k = 1 and π k ≥

0. Response y i from model (1)satisﬁes ( y i | z i = k, x i , B k , Σ ) ∼ N q ( B Tk x i , Σ ) . (2)Denote Θ = ( B , . . . , B K , Σ , π ) with π = ( π , . . . , π K − ) T . We can derive the jointdensity of y i and z i as f ( y i , z i | x i , Θ ) = K (cid:89) k =1 { π k f ( y i | z i = k, x i , B k , Σ ) } I ( z i = k ) . (3)Summarizing (3) over z i = 1 , . . . , K yields the FMMR model f ( y i | x i , Θ ) = K (cid:88) k =1 π k f ( y i | z i = k, x i , B k , Σ ) . (4)When using the EM algorithm to estimate model (4), ( x i , y i ) is regarded as incompletedata for missing z i , so one works on the complete joint density in (3). For a sample of n observations from (1), the complete log-likelihood function of Θ becomes l n ( Θ ) = n (cid:88) i =1 K (cid:88) k =1 z ik log π k + n (cid:88) i =1 K (cid:88) k =1 z ik log f ( y i | z i = k, x i , B k , Σ ) , (5)where z ik = I ( z i = k ).Since we do not expect all genomic markers to be informative, variable selection, toobtain a parsimonious model is necessary. Khalili and Chen (2012) and Khalili and Lin(2013) (for diverging model size) introduced a penalized likelihood approach for variableselection in FMR models, which was shown consistent in variable selection and highlyeﬃcient in computation. We deﬁne a penalized complete log-likelihood function as˜ l n ( Θ ) = l n ( Θ ) − ρ n ( Θ ) , (6)where ρ n ( Θ ) = K (cid:88) k =1 π k ρ nk ( B k ) . The following two penalty functions are currently used to meet diﬀerent demands invariable selection:1) the L -penalty in LASSO (Tibshirani, 1996) for simultaneous estimation and variableselection ρ nk ( B k ) = λ k (cid:107) B k (cid:107) recision Therapeutic Biomarker Identiﬁcation

12 213 123 233

A BC

Figure 1: Overall overlapping patterns given K = 3.2) a linear combination of the L - and L -penalty in Elastic-Net (Zou and Hastie, 2005)for simultaneous estimation and selection of grouped features ρ nk ( B k ) = λ k (cid:107) B k (cid:107) + λ k (cid:107) B k (cid:107) , where (cid:107) · (cid:107) and (cid:107) · (cid:107) are respectively the L - and L -norms and the tuning parameters λ k , λ k , λ k ≥

0. The L -penalty is singular at the origin, thus can shrink some coeﬃcientsto exact 0 for suﬃciently large λ k or λ k (Fan and Li, 2001).Their revised EM algorithm can easily be generalized to maximize ˜ l n ( Θ ) in (6), hencewe bypass here. In next section, generalized FMMR model and EM algorithm are devisedto ﬁt overlapping multivariate regression data. Finite mixture models (including FMR and FMMR) can only ﬁt partitions, as can be seenfrom Section 2.2. To enable overlapping clustering, one often uses a ”naive” approach byapplying a hard threshold α to ﬁnite mixture models. This approach assigns a sample tocluster k if its posterior probability of belonging to cluster k is larger than a pre-speciﬁed α , hence enabling a sample to belong to multiple clusters. As pointed out by Banerjeeet al. (2005), this method is problematic because it is not a natural generative modelfor overlapping clustering, since one underlying assumption of the ﬁnite mixture model isthat each observation comes from one and only one mixture component. In this sectionhowever, we introduce a generalized FMMR model to retrieve overlapping clusters when (cid:80) Kk =1 P ik ≥

1. This generalized model retains its ability in ﬁtting partitions.Suppose we have K objective clusters indexed by 1 to K . These objective clusters referto clusters deﬁned in (1) and can overlap with each other, resulting in 2 K − K equal to 3 for an example, overall overlapping patterns arethen composed of S = { , , , , , , } . Figure 1 shows the particular part eachoverlapping pattern in S represents. By deﬁnition, overlapping patterns in S are mutuallyexclusive. And each observation i from (1) belongs to one and only one overlapping pattern.In the next, we deﬁne 2 K − hypothetical clusters . Each hypothetical cluster (simplycalled “cluster” since after) represents an overlapping pattern deﬁned in advance and is Liu and Rao indexed by an element in T , T = ∪ Ks =1 { ( l . . . l s ) : { l , . . . , l s } ⊆ { , . . . , K }} . Cluster ( l . . . l s ) implies its members belonging to objective clusters l , . . . , l s .We now introduce a latent cluster membership random variable z i for observation i frommodel (1) and characterize the model via a hierarchical structure. Given (cid:80) Kk =1 P ik ≥

1, therange of z i becomes T . Further deﬁne P ( z i = ( l . . . l s )) = π ( l ...l s ) . (7)Then vector π = ( π ( l ...l s ) ∈ T ) T has (cid:88) ( l ...l s ) ∈ T π ( l ...l s ) = 1 , π ( l ...l s ) ≥ . Response y i from (1) satisﬁes( y i | z i = ( l . . . l s ) , x i , { B l k } sk =1 , Σ ) ∼ N q (cid:32) s (cid:88) k =1 B Tl k x i , Σ (cid:33) . (8)Let Θ = ( B , . . . , B K , Σ , π ). By (7) and (8), the joint density of y i and z i equals f ( y i , z i | x i , Θ ) = (9) (cid:89) ( l ...l s ) ∈ T (cid:8) π ( l ...l s ) f ( y i | z i , x i , { B l k } sk =1 , Σ ) (cid:9) I ( z i =( l ...l s )) . Summarizing (9) over z i leads to the generalized FMMR model f ( y i | x i , Θ ) = (cid:88) ( l ...l s ) ∈ T π ( l ...l s ) f ( y i | z i , x i , { B l k } sk =1 , Σ ) . (10)Note that if π ( l ...l s ) = 0 for s >

1, (10) reduces to the traditional FMMR model in (4).Then the (conditional) log-likelihood function of Θ for a sample of n observations from(1) is l n ( Θ ) = n (cid:88) i =1 log  (cid:88) ( l ...l s ) ∈ T π ( l ...l s ) f ( y i | z i , x i , { B l k } sk =1 , Σ )  . Maximizing above ordinary likelihood function yields non-zero estimates for all regres-sion coeﬃcients. To induce variable selection and remove noise predictors from the regres-sion model, we propose a penalized log-likelihood function˜ l n ( Θ ) = l n ( Θ ) − K (cid:88) k =1 (cid:88) ( l ...l s ): k ∈{ l ,...,l s } π ( l ...l s ) ρ nk ( B k ) , (11) recision Therapeutic Biomarker Identiﬁcation ρ nk ( B k ) is the LASSO or Elastic-Net penalty function in Section 2.2. Notation (cid:80) ( l ...l s ): k ∈{ l ,...,l s } indicates that it summarizes over ( l . . . l s ) ∈ T for which k ∈ { l , . . . , l s } .The penalty imposed on B k is proportional to (cid:80) ( l ...l s ): k ∈{ l ,...,l s } π ( l ...l s ) , which by deﬁnition isproportional to the number of observations involved in the k th objective cluster. This isa strategy, similar to Khalili and Chen (2012), of relating the penalty to sample sizes forenhanced power of the method. We use the renowned EM algorithm for optimization of the generalized FMMR model. Thecomplete log-likelihood function of Θ is l n ( Θ ) = n (cid:88) i =1 (cid:88) ( l ...l s ) ∈ T z i, ( l ...l s ) log π ( l ...l s ) + n (cid:88) i =1 (cid:88) ( l ...l s ) ∈ T z i, ( l ...l s ) log f ( y i | z i , x i , { B l k } sk =1 , Σ ) , where z i, ( l ...l s ) = I ( z i = ( l . . . l s )).The penalized complete log-likelihood function for concurrent variable selection is then˜ l n ( Θ ) = l n ( Θ ) − K (cid:88) k =1 (cid:88) ( l ...l s ): k ∈{ l ,...,l s } π ( l ...l s ) ρ nk ( B k ) . (12)Due to the overlap setting, each B k involves in multiple clusters. Therefore coeﬃcientmatrix B k for k = 1 , . . . , K can not be optimized independently like in usual EM algorithm.Instead we sequentially update B k given the rest are known. Speciﬁcally take ρ nk ( B k ) as aLASSO penalty function for an example, our revised EM algorithm iteratively maximizes˜ l n ( Θ ) in two steps:E-step: Given Θ m , estimate z mi, ( l ...l s ) with its posterior probability by applying theBayes’ rule to (9) and (10),ˆ P ( z mi, ( l ...l s ) = 1 | y i , x i , Θ m ) = π m ( l ...l s ) f ( y i | z i , x i , { B l k } sk =1 , Σ m ) (cid:80) ( l (cid:48) ...l (cid:48) s ) ∈ T π m ( l (cid:48) ...l (cid:48) s ) f (cid:16) y i | z i , x i , { B l (cid:48) k } sk =1 , Σ m (cid:17) . M-step: Given Z m = ( z mi, ( l ...l s ) ), update the mixing proportions π m +1 by π m +1( l ...l s ) = 1 n n (cid:88) i =1 z mi, ( l ...l s ) , ( l . . . l s ) ∈ T. Note that this is obtained by maximizing the leading term of (12) with respect to π . Thissimpliﬁed updating scheme however works well in simulation studies. Liu and Rao

Given Z m , π m +1 and Σ m , sequentially update B m +1 k | { B ms } s = Ks =1 ,s (cid:54) = k by B m +1 k = arg max B k n (cid:88) i =1 (cid:88) ( l ...l s ): k ∈{ l ,...,l s } z mi, ( l ...l s ) log f (cid:16) y i | z i , x i , { B ml j } sj =1 ,l j (cid:54) = k , Σ m (cid:17) − (cid:88) ( l ...l s ): k ∈{ l ,...,l s } π m +1( l ...l s ) λ k (cid:107) B k (cid:107) . (13)By (8), B m +1 k = arg min B k (cid:88) ( l ...l s ): k ∈{ l ,...,l s } tr (cid:0) ( Y ∗ − XB k ) T z m ( l ...l s ) ( Y ∗ − XB k )( Σ m ) − (cid:1) +2 (cid:88) ( l ...l s ): k ∈{ l ,...,l s } π m +1( l ...l s ) λ k (cid:107) B k (cid:107) , (14)where y ∗ i = y i − (cid:88) l ∈{ l ,...,l s }\ k ( B ml ) T x i , Y ∗ = ( y ∗ , . . . , y ∗ q ) , z m ( l ...l s ) = diag (cid:16) z m , ( l ...l s ) , . . . , z mn, ( l ...l s ) (cid:17) . Above (14) is a multivariate regression problem with a LASSO penalty for estimation, whichcan be solved by the MRCE algorithm (Rothman et al., 2010). If we ignore the covariancestructure of Σ m merely in (14), estimation of B m +1 k reduces to q independent LASSOregression problems. In this case, more complex penalty functions such as the Elastic-Netand fused LASSO penalties can easily be applied. Therefore we also have investigatedthe performance of this simpliﬁed strategy in simulation studies. It works surprisinglycomparable to original method utilizing the MRCE algorithm for estimation of B m +1 k .Given Z m and { B m +1 k } Kk =1 , update Σ m +1 by Σ m +1 = arg max Σ n (cid:88) i =1 (cid:88) ( l ...l s ) ∈ T z mi, ( l ...l s ) log f (cid:16) y i | z i , x i , { B m +1 l k } sk =1 , Σ (cid:17) = arg min Σ n (cid:88) i =1 (cid:88) ( l ...l s ) ∈ T − z mi, ( l ...l s ) log | Σ − | + (cid:88) ( l ...l s ) ∈ T tr (cid:0) ( Y − X s (cid:88) k =1 B m +1 l k ) T z m ( l ...l s ) ( Y − X s (cid:88) k =1 B m +1 l k ) Σ − (cid:1) . (15)By taking the derivative of (15) according to Σ − , we get Σ m +1 = (cid:80) ( l ...l s ) ∈ T ( Y − X (cid:80) sk =1 B m +1 l k ) T z m ( l ...l s ) ( Y − X (cid:80) sk =1 B m +1 l k ) (cid:80) ( l ...l s ) ∈ T (cid:80) ni =1 z mi, ( l ...l s ) , recision Therapeutic Biomarker Identiﬁcation (cid:80) ( l ...l s ) ∈ T (cid:80) ni =1 z mi, ( l ...l s ) = n .Although we can also penalize the inverse covariance matrix Σ − of (15) like in Rothmanet al. (2010), simulations (not presented in the paper) show that penalizing Σ − results ina lower degree of clustering accuracy than not penalizing Σ − . This may be because bypenalizing Σ − one introduces bias into its estimation, a biased estimate of Σ − in returndeteriorates the estimate of z i, ( l ...l s ) in E-step. Thus we choose to not penalize Σ − .Commencing with an initial value Θ , the algorithm iterates between E- and M-stepsuntil the relative change in log-likelihood, | (cid:0) l m +1 n ( ˆΘ ) − l mn ( ˆΘ ) (cid:1) /l mn ( ˆΘ ) | , is smaller thansome threshold value, taken as 10 − in simulation studies and 10 − in real data analysis.Additionally, a cluster, whose mixing proportion is smaller than some threshold value takenas 0.01 in the paper, will be removed during iterations to avoid over estimations. K In preceding penalized likelihood approach, one needs to choose the values of component-wise tuning parameters λ k for k = 1 , . . . , K , which controls the complexity of an estimatedmodel. The data-driven method cross-validation (CV) (Stone, 1977) is frequently adoptedin literatures. Here we use a component-wise 10-fold CV method for tuning parametersselection in (14).Moreover, selection of the number of components K is essential in ﬁnite mixture models(including FMR and FMMR). In applications, the choice can be based on prior knowledge ofdata analysts. With respect to formatted methodologies, information criteria (IC) remainsby far the most popular strategy for selection of K . See Claeskens et al. (2008) for generaltreatments on this topic. Here we choose the K minimizing below IC n , IC n ( K ) = − l n ( Θ ) + N K a n . (16)Above N k is the eﬀective number of parameters in the model, N K = |{ B k , k = 1 , . . . , K }| + | π | − | Σ | , where | A | calculates the number of nonzero elements in A . In (16), a n is a positive sequencedepending on n . The well known AIC (Akaike, 1974) and BIC (Schwarz et al., 1978)correspond to a n = 2 and a n = log( n ) respectively. It was shown in Keribin (2000) thatunder general regularity conditions, BIC can identify the true order of a ﬁnite mixturemodel asymptotically. Feasibility of his results to FMR or FMMR models is unknown yet.We examined the performance of BIC for selection of K in a generalized FMMR model viasimulation studies. It obtained a high degree of accuracy. Khalili and Lin (2013) showed that their approach by maximizing a penalized log-likelihoodfunction for estimation of the FMR model is consistent in both estimation and variableselection under certain regularity conditions. Denote f ( w ; Θ ) the joint density function ofdata w = ( x , y ) with Θ ∈ Ω . The conditional density function of y given x follows an FMRmodel in Khalili and Lin (2013) and a generalized FMMR model in (10) in our paper. Asdeﬁned in (11), the penalized log-likelihhood function is a summation of the log( f ( w i ; Θ ))’s0 Liu and Rao minus a penalty function. Because the regularity conditions for asymptotic establishmentsin Khalili and Lin (2013) are made on f ( w ; Θ ) and the penalty function directly, theirtheoretical achievements can be extended to our problem.Denote B jk the j th column of B k for j = 1 , . . . , q . Let B jk = ( B jk , B jk ) to divide thecoeﬃcient vector into non-zero and zero subsets. Denote Θ the true value of Θ , Θ =( Θ , Θ ) is the corresponding decomposition such that Θ contains all zero coeﬃcients, andˆ Θ n = ( ˆ Θ n , ˆ Θ n ) is its estimate. Since the dimension of ˆ Θ n increases with n , we investigatethe asymptotic distribution of its ﬁnite linear transformation, D n ˆ Θ n , where D n is an l × d n constant matrix with a ﬁnite l and d n is the dimension of Θ . Moreover D n D Tn → D and D is a positive deﬁnite and symmetric matrix. We assume that K is independent of thesample size n and known beforehand. Its selection is discussed in Section 2.5. Lemma 1.

Suppose the penalty function ρ nk ( B k ) satisﬁes conditions P − P and the jointdensity function f ( w ; Θ ) satisﬁes conditions R − R in Khalili and Lin (2013).1. If p n n → , then there exists a local maximizer ˆ Θ n for the penalized log-likelihood ˜ l n ( Θ ) with (cid:107) ˆ Θ n − Θ (cid:107) = O p (cid:26)(cid:114) p n n (1 + q n ) (cid:27) , where q n = max kij (cid:26) | ρ (cid:48) nk ( b kij ) |√ n : b kij (cid:54) = 0 (cid:27) , b kij is an entry of B k .2. If ρ nk ( B k ) also satisﬁes condition P in Khalili and Lin (2013) and p n n → , for any (cid:112) n/p n -consistent maximum likelihood estimate ˆ Θ n , as n → ∞ it has:i. Variable selection consistency: P ( ˆ B jk = ) → , k = 1 , . . . , K and j = 1 , . . . , q. ii. Asymptotic normality: √ n D n I − / ( Θ ) (cid:40)(cid:34) I ( Θ ) − ρ (cid:48)(cid:48) n ( Θ ) n (cid:35) ( ˆ Θ n − Θ ) + ρ (cid:48) n ( Θ ) n (cid:41) → d N (0 , D ) , (17) where I ( Θ ) is the Fisher information matrix under the true subset model.Proof. Write Θ = ( θ , θ , . . . , θ t n ) T , where t n is the total number of parameters in themodel. Then Lemma 1 is a direct extension from Khalili and Lin (2013).By Lemma 1, ˆ Θ n under the LASSO or Elastic Net penalty has a convergence rate (cid:112) p n /n via appropriate choice of the tuning parameters. However consistent estimation does notnecessarily guarantee consistent variable selection. By the Lemma, the LASSO or ElasticNet penalty does not lead to consistent variable selection, because on one hand λ k mustbe large enough to achieve sparsity, on the other hand the bias term q n is proportional tothe λ k / √ n . This problem can be solved by using adaptive LASSO or adaptive Elastic Netpenalty instead, which leads to concurrent estimation and variable selection consistency.We conducted a sequence of small simulation studies to numerically demonstrate opti-mality properties of the new method. They are presented in Web Appendix A. recision Therapeutic Biomarker Identiﬁcation Lazzeroni and Owen (2002) proposed the plaid model to decompose a gene expression dataas the sum of overlapping layers (clusters). Each layer contains a subset of genes andsamples, while each gene and sample can participate in multiple layers. Denote x ij a geneexpression data entry and K is the total number of layers. The plaid model is x ij = ω + K (cid:88) k =1 ( ω k + α ik + β jk ) P ik Q jk , where ω , ω k , α ik , β jk ∈ R , P ik is 1 if gene i is in the k th gene-bolck and otherwise 0, Q jk is 1 if sample j is in the k th sample-block and otherwise 0. An iterative algorithm was putforward to consecutively search for the layers.In fact, model (1) is a generalized plaid model by introducing covariates into the modelso that it can cluster overlapping multivariate regression data. Therefore, estimation pro-cedures of the plaid model can also be used to estimate (1). In parallel to the plaid model,the parameters are estimated by minimizing the Q , Q = 12 n (cid:88) i =1 (cid:107) y i − K (cid:88) k =1 x Ti B k P ik (cid:107) + K (cid:88) k =1 ρ nk ( B k ) , where ρ nk ( B k ) is the penalty function deﬁned in Section 2.2.Detailed procedures for optimizing Q are shown in Algorithm 1 in Web Appendix B,which is a generalization of the improved iterative algorithm by Turner et al. (2005) tomultivariate regression. At each time, the algorithm searches for a layer that explains asmuch of the rest data as possible; once a layer has been found, it is subtracted from the dataand remains unchanged; the algorithm stops when no further layers can be found. Howeverthis algorithm introduces bias to parameter estimations due to its discrete updating scheme,where previous recruited layers can not be reﬁned as new layers are recruited into the model.To address this issue, we further revised Algorithm 1 to enable joint optimization of alllayers. Detailed procedures are presented in Algorithm 2 in Web Appendix B. IC values from pairwise drug-cell-linescreening, have some missing values. Therefore the data was ﬁrst ﬁltered by removing celllines for which less than 50% of the drugs were tested, resulting in 591 cell lines remaining.In the remaining data, about 37.4% of the cell lines are with missing values. These missingvalues were then imputed via the a random forest imputation algorithm (Ishwaran et al.,2 Liu and Rao

Direct Q1: Can cancer-speciﬁc therapeutic biomarkers be detected?

We applied our new method to identify patterns of cancer-speciﬁc therapeutic biomark-ers. Note that by “cancer-speciﬁc” we do not assume separate patterns for each cancertype but rather clusters that may be driven by one or a small number of cancer types.Throughout the analysis, we ﬁxed K to a value of 3, although as previously shown, a BICmodel selection approach could also be used. However, ﬁxing the value of K does not limitthe interesting ﬁndings that we can still ﬁnd and saves on computational time.Due to the scale and complexity of the data, the analysis was conducted in three steps.Although simulation studies show that modeling with multiple response variables yieldsmuch higher clustering accuracy than modeling with a single response variable, it is unrea-sonable to simply ﬁt all 140 drugs (responses) in one generalized FMMR model. Becauseby doing so, one assumes that the cell line assignments to clusters are the same for all 140drugs, which can hardly be true. Thus in our analysis, we ﬁrst divided the 140 drugs intoseveral groups and then ﬁtted each drug group with a generalized FMMR model, in whichto capture active grouped genomic features, the Elastic-Net penalty along with a simpliﬁedupdating scheme of B k discussed in Section 2.4 was used.In the ﬁrst step, each drug c was ﬁtted by a generalized FMMR model, from whichwe got drug-speciﬁc cell line assignments, ˆ Z c , and cluster-wise coeﬃcient estimates, ˆ B ck for k = 1 , . . . ,

3. In the second step, we used the aﬃnity-propagation clustering (APC)algorithm (Frey and Dueck, 2007) to group the 140 drugs based on results from step 1. Thegrouping was conducted in a two-level nested manner. In the ﬁrst level, the APC algorithmwas applied to all 140 drugs. The pair-wise similarity matrix required by the algorithmas input data was calculated from the Euclidean distance between ˆ Z a and ˆ Z b where a and b refer to two unique drugs. In the second level, the APC algorithm was re-applied toeach ﬁrst-level drug group. Coeﬃcient estimates ˆ B ck for k = 1 , . . . , C from step 2 was ﬁtted by ageneralized FMMR model, from which we got drug-group-speciﬁc cell line assignments, ˆ Z C ,and cluster-wise coeﬃcient estimates ˆ B Ck for k = 1 , . . . , Direct Q2: Can drug resistance patterns be identiﬁed along with predictive recision Therapeutic Biomarker Identiﬁcation xlim C an c e r t y pe s l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l aero_digestive_tractbloodbonebreastdigestive_systemkidneylungnervous_systempancreasskinsoft_tissuethyroidurogenital_system Drugs G ene s MS−HL MUTAKT2 MUTALK MUTAPC MUTBRAF MUTBRCA1 MUTBRCA2 MUTCCND1 MUTCCND2 MUTCCND3 MUTCDH1 MUTCDK4 MUTCDK6 MUTCDKN2A MUTCDKN2C MUTCDKN2a(p14) MUTCTNNB1 MUTCYLD MUTEGFR MUTEP300 MUTERBB2 MUTEZH2 MUTFAM123B MUTFBXW7 MUTFGFR2 MUTFGFR3 MUTFLCN MUTFLT3 MUTGNAS MUTHRAS MUTIDH1 MUTJAK2 MUTKDM5C MUTKDM6A MUTKDR MUTKIT MUTKRAS MUTMAP2K4 MUTMDM2 MUTMET MUTMLH1 MUTMLLT3 MUTMSH2 MUTMSH6 MUTMYC MUTMYCL1 MUTMYCN MUTNF1 MUTNF2 MUTNOTCH1 MUTNRAS MUTPDGFRA MUTPIK3CA MUTPIK3R1 MUTPTEN MUTRB1 MUTRUNX1 MUTSETD2 MUTSMAD4 MUTSMARCA4 MUTSMARCB1 MUTSMO MUTSOCS1 MUTSTK11 MUTTET2 MUTTP53 MUTTSC1 MUTVHL MUTBCR_ABL MUTEWS_FLI1 MUTMLL_AFF1 MUT S . T r i t y l . L . cys t e i ne Z . LL N l e . CH O M S . J N K . S un i t i n i b A Z D . B i c a l u t a m i de O S U . PA C . I PA . T i p i f a r n i b AB T . P L X E H T . D a s a t i n i b J W . . . M i do s t au r i n S h i k on i n S o r a f en i b N VP . T AE E t opo s i de E m be li n Z M . C G P . C i s p l a t i n B o s u t i n i b X W H . . G e m c i t ab i ne A Z P F . K I N . T G X S a l ub r i na l Lapa t i n i b Q S O S I. B e x a r o t ene AS G e f i t i n i b NU . P D . A Z D . D o x o r ub i c i n T hap s i ga r g i n A U Y A Q T e m s i r o li m u s A Z D A I C A R M G . B r y o s t a t i n . I m a t i n i b B o r t e z o m i b E po t h il one . BAK T . i nh i b i t o r . V III G SK . AA x i t i n i b W Z . . P a r t heno li de G SK A N S C . D M OG A Z D . AB T . G W . SB . VX . A M G . J N K .I nh i b i t o r . V III P D . A G . C M KX M D . P F . FT I. BX . J N J . G W XB l eo m yc i n X . AA G R apa m yc i n A . B M S . C EP . CC T C yc l opa m i ne G N F . F M . A li do m i de G DC . N il o t i n i b P a c li t a x e l B I. C a m p t o t he c i n V i nb l a s t i ne E l e sc l o m o l N VP . BE Z VX . A . B M S . B M S . F H G SK . BAY . . M e t ho t r e x a t e A T R AB I B W B I R B . M K . SB CC T E r l o t i n i b P H A . CH I R . A Z D S L . . N u t li n . B M S . P y r i m e t ha m i ne R o sc o v i t i ne AP . P a z opan i b K U . R O . A Z D P F . O ba t o c l a x . M e sy l a t e C y t a r ab i ne RD EA G DC V i no r e l b i ne D o c e t a x e l C G P . M i t o m yc i n . C V o r i no s t a t C I. A Z D P D . T W . −1e+01 −3e−01 −1e−01 −3e−02 −1e−08 1e−08 1e−02 1e−01 2e+00 Figure 2: Subset estimation results of cluster 1 by ﬁtting the data of each second-leveldrug group in a generalized FMMR model. Bottom panel shows a subset of the coeﬃcientestimates corresponding to the mutations of 71 cancer genes, and upper panel shows cellline compositions of cluster 1 by cancer types. strategies to circumvent resistance using alternative drugs?

We discussed in the Introduction to the paper the success story of Gleevec which wasused for the treatment of CML based on the known speciﬁcity of targeting the BCR-ABLgene. It is now becoming more and more common in the treatment of cancer to ﬁrstsequence known cancer genes in tumor genomes and then design therapies accordingly(Bailey et al., 2014). It is also true that tumors can develop resistance to ﬁrst line therapiesby accumulating mutations which confer resistance. We now show how our methodologycan be used to identify predictive strategies for circumventing drug resistance based onmutation data.Figures 2–3 are similar to Web Figures 7–8, except that they only show a subset ofˆ B C and ˆ B C with regard to the mutations of 71 cancer genes, where ˆ B C = ˆ B C + ˆ B C arecoeﬃcient estimates of cluster 12 for drug group C . Bottom panel of ﬁgure 2–3 are in factregression based drug-genetic association map with red indicating drug-sensitivity biomark-ers and blue indicating drug-resistance biomarkers, conditioning on a cluster of cell lines(upper panel) identiﬁed by the generalized FMMR model. We propose that above estima-tion results can be used for drug repurposing (Martins et al., 2015) for resistant tumors.Take soft tissue cancers as an example of this drug repurposing. We extracted those clusterswhich contain a high percentage of soft tissue cancers (large purple bubbles), along withthe subset cluster-wise coeﬃcient estimates. The extracted information is presented in WebFigure 5. Its upper panel shows the IC values of soft tissue cell lines in extracted clus-4 Liu and Rao xlim C an c e r t y pe s l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l aero_digestive_tractbloodbonebreastdigestive_systemkidneylungnervous_systempancreasskinsoft_tissuethyroidurogenital_system Drugs G ene s MS−HL MUTAKT2 MUTALK MUTAPC MUTBRAF MUTBRCA1 MUTBRCA2 MUTCCND1 MUTCCND2 MUTCCND3 MUTCDH1 MUTCDK4 MUTCDK6 MUTCDKN2A MUTCDKN2C MUTCDKN2a(p14) MUTCTNNB1 MUTCYLD MUTEGFR MUTEP300 MUTERBB2 MUTEZH2 MUTFAM123B MUTFBXW7 MUTFGFR2 MUTFGFR3 MUTFLCN MUTFLT3 MUTGNAS MUTHRAS MUTIDH1 MUTJAK2 MUTKDM5C MUTKDM6A MUTKDR MUTKIT MUTKRAS MUTMAP2K4 MUTMDM2 MUTMET MUTMLH1 MUTMLLT3 MUTMSH2 MUTMSH6 MUTMYC MUTMYCL1 MUTMYCN MUTNF1 MUTNF2 MUTNOTCH1 MUTNRAS MUTPDGFRA MUTPIK3CA MUTPIK3R1 MUTPTEN MUTRB1 MUTRUNX1 MUTSETD2 MUTSMAD4 MUTSMARCA4 MUTSMARCB1 MUTSMO MUTSOCS1 MUTSTK11 MUTTET2 MUTTP53 MUTTSC1 MUTVHL MUTBCR_ABL MUTEWS_FLI1 MUTMLL_AFF1 MUT S . T r i t y l . L . cys t e i ne Z . LL N l e . CH O M S . J N K . S un i t i n i b A Z D . B i c a l u t a m i de O S U . PA C . I PA . T i p i f a r n i b AB T . P L X E H T . D a s a t i n i b J W . . . M i do s t au r i n S h i k on i n S o r a f en i b N VP . T AE E t opo s i de E m be li n Z M . C G P . C i s p l a t i n B o s u t i n i b X W H . . G e m c i t ab i ne A Z P F . K I N . T G X S a l ub r i na l Lapa t i n i b Q S O S I. B e x a r o t ene AS G e f i t i n i b NU . P D . A Z D . D o x o r ub i c i n T hap s i ga r g i n A U Y A Q T e m s i r o li m u s A Z D A I C A R M G . B r y o s t a t i n . I m a t i n i b B o r t e z o m i b E po t h il one . BAK T . i nh i b i t o r . V III G SK . AA x i t i n i b W Z . . P a r t heno li de G SK A N S C . D M OG A Z D . AB T . G W . SB . VX . A M G . J N K .I nh i b i t o r . V III P D . A G . C M KX M D . P F . FT I. BX . J N J . G W XB l eo m yc i n X . AA G R apa m yc i n A . B M S . C EP . CC T C yc l opa m i ne G N F . F M . A li do m i de G DC . N il o t i n i b P a c li t a x e l B I. C a m p t o t he c i n V i nb l a s t i ne E l e sc l o m o l N VP . BE Z VX . A . B M S . B M S . F H G SK . BAY . . M e t ho t r e x a t e A T R AB I B W B I R B . M K . SB CC T E r l o t i n i b P H A . CH I R . A Z D S L . . N u t li n . B M S . P y r i m e t ha m i ne R o sc o v i t i ne AP . P a z opan i b K U . R O . A Z D P F . O ba t o c l a x . M e sy l a t e C y t a r ab i ne RD EA G DC V i no r e l b i ne D o c e t a x e l C G P . M i t o m yc i n . C V o r i no s t a t C I. A Z D P D . T W . −1e+01 −4e−01 −8e−02 −2e−02 −1e−08 1e−08 1e−01 5e−01 2e+00 Figure 3: Subset estimation results of cluster 12 by ﬁtting the data of each second-leveldrug group in a generalized FMMR model.ters. The drugs in Web Figure 5 were further ﬁltered by only keeping those that have low IC values without resistance biomarkers and that have high IC values with resistancebiomarkers, leading to Figure 4. In this ﬁgure, we conclude that the former set of drugscan be used as alternatives for the later set of drugs which progress resistance to soft tissuecancers. As a positive control of this strategy, drug Gemcitabine was found to be eﬀectivein soft tissue cancers resistant to standard chemotherapy (doxyrubicin) in Merimsky et al.(2000). Direct Q3: Can drug synergies be predicted?

Our desire to allow overlapping cell line clusters becomes apparent in answering thisquestion. Overlapping clusters can be used to guide drug combinations and identify poten-tial drug synergies. From ˆ Z C , we can get the cluster-wise IC values for each drug. WebFigure 9 are boxplots of the cluster-wise IC values for 15 out of 140 drugs. Due to limitedspace, the rest are not shown. We then identify the drugs for which the magnitudes of IC sof overlapping cluster, say 12, are between those of clusters 1 and 2. For instance, drugsMS.275 and GW843682X show such interesting pattern. We focus on the cluster with high-est IC s on average, say cluster 2, and use the coeﬃcient estimates of overlapping clusterto guide the search of another drug to decrease the IC s of cluster 2 towards overlappingcluster.Take GW843682X for an example, cluster 2 has the highest IC s among clusters 1, 2,and 12. We predicted the IC s in cluster 2 using ˆ B gw x , ˆ B gw x and ˆ B gw x respectively. Predicted results are shown in top panel (middle) of Figure 5. By doing so, recision Therapeutic Biomarker Identiﬁcation − − xlim I C llllllll llllllll llll lllll lllllllll lllllllll lllllllll lllllllllll lllllllll lllllllll Drugs G ene s MS−HL MUTAKT2 MUTALK MUTAPC MUTBRAF MUTBRCA1 MUTBRCA2 MUTCCND1 MUTCCND2 MUTCCND3 MUTCDH1 MUTCDK4 MUTCDK6 MUTCDKN2A MUTCDKN2C MUTCDKN2a(p14) MUTCTNNB1 MUTCYLD MUTEGFR MUTEP300 MUTERBB2 MUTEZH2 MUTFAM123B MUTFBXW7 MUTFGFR2 MUTFGFR3 MUTFLCN MUTFLT3 MUTGNAS MUTHRAS MUTIDH1 MUTJAK2 MUTKDM5C MUTKDM6A MUTKDR MUTKIT MUTKRAS MUTMAP2K4 MUTMDM2 MUTMET MUTMLH1 MUTMLLT3 MUTMSH2 MUTMSH6 MUTMYC MUTMYCL1 MUTMYCN MUTNF1 MUTNF2 MUTNOTCH1 MUTNRAS MUTPDGFRA MUTPIK3CA MUTPIK3R1 MUTPTEN MUTRB1 MUTRUNX1 MUTSETD2 MUTSMAD4 MUTSMARCA4 MUTSMARCB1 MUTSMO MUTSOCS1 MUTSTK11 MUTTET2 MUTTP53 MUTTSC1 MUTVHL MUTBCR_ABL MUTEWS_FLI1 MUTMLL_AFF1 MUT V i no r e l b i ne ( ) D o c e t a x e l ( ) G e m c i t ab i ne ( ) J W . . . ( ) B M S . ( ) N u t li n . ( ) A Z D ( ) Z M . ( ) PA C . ( ) O S U . ( ) −3e+00 −4e−02 −1e−08 8e−02 2e+00 Figure 4: The extracted IC values of soft tissue cancers (upper panel) and a subset ofthe coeﬃcient estimates corresponding to the mutations of 71 cancer genes (bottom panel)after ﬁltering those drugs not showing the desired pattern. l l − − MS.275 I C Tr Prc1 Prc2 Prc4 − − − GW843682X I C Tr Prc1 Prc2 Prc4 l − − GW843682X I C Tr Prc2 Prc3 Prc6 l MS.275+Bicalutamide I C Tr Prc1h Prc2 Prc4h llll ll − − GW843682X+Epothilone.B I C Tr Prc1h Prc2 Prc4h llll − − − GW843682X+NVP.BEZ235 I C Tr Prc2h Prc3 Prc6h

Figure 5: Prediction of drug eﬀects and drug combination eﬀects. In each plot, “Tr”represents the observed IC values; the rest three boxplots are predicted IC values usingselected coeﬃcient matrices.6 Liu and Rao we want to see if coeﬃcients ˆ B gw x or ˆ B gw x applied to cell lines in cluster 2 cangenerate low IC s similar to those in clusters 1 or 12. Since ˆ B gw x corresponds to thecoeﬃcient estimates of the target drug GW843682X, we need to ﬁnd another drug producingsimilar coeﬃcient estimates to ˆ B gw x , this leads to ˆ B epothilone.b . The predicted IC s ofcluster 2 using ˆ B epothilone.b , ˆ B gw x and ˆ B epothilone.b + ˆ B gw x are presented in bottompanel (middle) of Figure 5. It shows that the combination of GW843682X and Epothilone.Bproduces much lower IC s for cluster 2 than GW843682X alone. Similar studies wereconducted for MS.275 with respect to its clusters 1, 2, 12 and GW843682X with respect toits clusters 2, 3, 23. Results are shown in left and right columns of Figure 5 respectively.The combination of GW843682X and NVP.BEZ235 decreases the IC s dramatically, whilethe combination of MS.275 and Bicalutamide does not show such eﬀect. As a positivecontrol, Wildey et al. (2014) pointed out that a group of drugs (including GW843682Xand Epothilone B) may provide a practical starting point to investigate combinatorial drugtherapies for synergistic eﬀect in small-cell lung cancer (SCLC). Among the group of drugs,they found synergism between GW843682X and CGP60474 via a preliminary study. In this paper, we proposed a new model for identifying therapeutic biomarkers for cancerwhich can answer speciﬁc questions regarding sensitivity, resistance and synergy. We used apenalized likelihood approach for the FMMR model which enforced sparsity in the genomicfeatures. To enable overlapping clustering, the FMMR model was then generalized and anew EM algorithm derived for estimation of model parameters. While the noteworthy plaidmodel can also be generalized for overlapping clustering multivariate regression data, thegeneralized FMMR model markedly outperforms this method.Some improvements can be made for future developments on this work. First, otheruseful penalty functions for multivariate regression estimations can be adopted, such as the

M AP (MAster Predictor) penalty in Peng et al. (2010), the L SV S method by Simil¨a andTikka (2007) and the L ∞ SV S method by Turlach et al. (2005). Second, the multivariatenormal distribution assumption on Y i can be extended to other more ﬂexible paramet-ric families of multivariate distributions, such as the skew-normal distribution (Azzalini,2005) and the multivariate skew- t distribution (Chen et al., 2014), for frequent presence ofskewness and kurtosis in real data.As described in the CGP data analyses, we had about 37% of the ﬁnal analyses obser-vations that had missing data and which we dealt with by using random forest imputation.It would be of interest to explore the impact of this step more thoroughly. An alternativewould be to generalize to this problem, the recently developed E-MS algorithm Jiang et al.(2015) for model selection with incomplete data. Acknowledgements

Hongmei Liu was supported by a doctoral fellowship from the Sylvester ComprehensiveCancer Center at the University of Miami. recision Therapeutic Biomarker Identiﬁcation Supplementary Materials

Web Appendices, Tables, and Figures referenced in Sections 2.6, 3 and 4 are available withthis paper at the Biometrics website on Wiley Online Library.

References

H. Akaike. A new look at the statistical model identiﬁcation.

IEEE transactions on auto-matic control , 19(6):716–723, 1974.A. Azzalini. The skew-normal distribution and related multivariate families.

ScandinavianJournal of Statistics , 32(2):159–188, 2005.R. Baeza-Yates, B. Ribeiro-Neto, et al. Modern information retrieval. 463, 1999.A. M. Bailey, Y. Mao, J. Zeng, V. Holla, A. Johnson, L. Brusco, K. Chen, J. Mendelsohn,M. J. Routbort, G. B. Mills, et al. Implementation of biomarker-driven cancer therapy:existing tools and remaining gaps.

Discovery medicine , 17(92):101, 2014.A. Banerjee, C. Krumpelman, J. Ghosh, S. Basu, and R. J. Mooney. Model-based overlap-ping clustering. In

Proceedings of the eleventh ACM SIGKDD international conferenceon Knowledge discovery in data mining , pages 532–537. ACM, 2005.L. Breiman and J. H. Friedman. Predicting multivariate responses in multiple linear regres-sion.

Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 59(1):3–54, 1997.L. Chen, M. Pourahmadi, and M. Maadooliat. Regularized multivariate regression modelswith skew-t error distributions.

Journal of Statistical Planning and Inference , 149:125–139, 2014.G. Claeskens, N. L. Hjort, et al. Model selection and model averaging. 330, 2008.A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete datavia the em algorithm.

Journal of the royal statistical society. Series B (methodological) ,pages 1–38, 1977.W. S. DeSarbo and W. L. Cron. A maximum likelihood methodology for clusterwise linearregression.

Journal of classiﬁcation , 5(2):249–282, 1988.B. J. Druker, F. Guilhot, S. G. O’Brien, I. Gathmann, H. Kantarjian, N. Gattermann,M. W. Deininger, R. T. Silver, J. M. Goldman, R. M. Stone, et al. Five-year follow-upof patients receiving imatinib for chronic myeloid leukemia.

New England Journal ofMedicine , 355(23):2408–2417, 2006.J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracleproperties.

Journal of the American statistical Association , 96(456):1348–1360, 2001.B. J. Frey and D. Dueck. Clustering by passing messages between data points. science , 315(5814):972–976, 2007.8

Liu and Rao

Q. Fu and A. Banerjee. Multiplicative mixture models for overlapping clustering. In , pages 791–796. IEEE, 2008.M. J. Garnett, E. J. Edelman, S. J. Heidorn, C. D. Greenman, A. Dastur, K. W. Lau,P. Greninger, I. R. Thompson, X. Luo, J. Soares, et al. Systematic identiﬁcation ofgenomic markers of drug sensitivity in cancer cells.

Nature , 483(7391):570–575, 2012.B. Grun and F. Leisch. Flexmix version 2: ﬁnite mixtures with concomitant variables andvarying and constant parameters. 2008.H. Ishwaran, U. B. Kogalur, E. H. Blackstone, and M. S. Lauer. Random survival forests.

The annals of applied statistics , pages 841–860, 2008.K. Jedidi, V. Ramaswamy, W. S. DeSarbo, and M. Wedel. On estimating ﬁnite mixtures ofmultivariate regression and simultaneous equation models.

Structural Equation Modeling:A Multidisciplinary Journal , 3(3):266–289, 1996.J. Jiang, T. Nguyen, and J. S. Rao. The e-ms algorithm: model selection with incompletedata.

Journal of the American Statistical Association , 110(511):1136–1147, 2015.P. Jones and G. McLachlan. Fitting ﬁnite mixture models in a regression context.

AustralianJournal of Statistics , 34(2):233–240, 1992.C. Keribin. Consistent estimation of the order of mixture models.

Sankhy¯a: The IndianJournal of Statistics, Series A , pages 49–66, 2000.A. Khalili and J. Chen. Variable selection in ﬁnite mixture of regression models.

Journalof the american Statistical association , 2012.A. Khalili and S. Lin. Regularization in ﬁnite mixture of regression models with divergingnumber of parameters.

Biometrics , 69(2):436–446, 2013.L. Lazzeroni and A. Owen. Plaid models for gene expression data.

Statistica sinica , pages61–86, 2002.M. M. Martins, A. Y. Zhou, A. Corella, D. Horiuchi, C. Yau, T. Rakshandehroo, J. D.Gordan, R. S. Levin, J. Johnson, J. Jascur, et al. Linking tumor mutations to drugresponses via a quantitative chemical–genetic interaction map.

Cancer discovery , 5(2):154–167, 2015.O. Merimsky, I. Meller, G. Flusser, Y. Kollender, J. Issakov, M. Weil-Ben-Arush, E. Fenig,G. Neuman, D. Sapir, S. Ariad, et al. Gemcitabine in soft tissue or bone sarcoma resistantto standard chemotherapy: a phase ii study.

Cancer chemotherapy and pharmacology , 45(2):177–181, 2000.R. Needham. Computer methods for classiﬁcation and grouping.

The Use of Computers inAnthropology, I. Hymes, ed , pages 345–356, 1965.J. Peng, J. Zhu, A. Bergamaschi, W. Han, D.-Y. Noh, J. R. Pollack, and P. Wang. Regu-larized multivariate regression for identifying master predictors with application to inte-grative genomics study of breast cancer.

The annals of applied statistics , 4(1):53, 2010. recision Therapeutic Biomarker Identiﬁcation

Journal of the Americanstatistical association , 67(338):306–310, 1972.A. J. Rothman, E. Levina, and J. Zhu. Sparse multivariate regression with covarianceestimation.

Journal of Computational and Graphical Statistics , 19(4):947–962, 2010.G. Schwarz et al. Estimating the dimension of a model.

The annals of statistics , 6(2):461–464, 1978.T. Simil¨a and J. Tikka. Input selection and shrinkage in multiresponse linear regression.

Computational Statistics & Data Analysis , 52(1):406–422, 2007.M. Stone. An asymptotic equivalence of choice of model by cross-validation and akaike’scriterion.

Journal of the Royal Statistical Society. Series B (Methodological) , pages 44–47,1977.R. Tibshirani. Regression shrinkage and selection via the lasso.

Journal of the RoyalStatistical Society. Series B (Methodological) , pages 267–288, 1996.B. A. Turlach, W. N. Venables, and S. J. Wright. Simultaneous variable selection.

Techno-metrics , 47(3):349–363, 2005.H. Turner, T. Bailey, and W. Krzanowski. Improved biclustering of microarray data demon-strated through systematic performance tests.

Computational statistics & data analysis ,48(2):235–254, 2005.G. Wildey, Y. Chen, I. Lent, L. Stetson, J. Pink, J. S. Barnholtz-Sloan, and A. Dowlati.Pharmacogenomic approach to identify drug sensitivity in small-cell lung cancer.

PloSone , 9(9):e106784, 2014.J. Zhang. A bayesian model for biclustering with applications.

Journal of the Royal Statis-tical Society: Series C (Applied Statistics) , 59(4):635–656, 2010.H. Zou and T. Hastie. Regularization and variable selection via the elastic net.

Journal ofthe Royal Statistical Society: Series B (Statistical Methodology) , 67(2):301–320, 2005.0

Liu and Rao

Web-based Supplementary Materials for “PrecisionTherapeutic Biomarker Identiﬁcation with Application tothe Cancer Genome Project”A Simulation Studies

A.1 Design of the Simulations

Two scenarios were designed to investigate the non-overlapping clustering and overlappingclustering of the new method respectively. In each scenario, data were generated from model(2.1) with K = 3, p n = 15, q = 3 and n = 150 , x i , i = 1 , . . . , n , were drawnindependently from N p n ( , Σ ) with Σ ( i, j ) = 0 . | i − j | . Random errors ε i , i = 1 , . . . , n ,were drawn independently from N q ( , Σ ) with Σ ( i, j ) = 0 . | i − j | .The sparse coeﬃcient matrices B k , k = 1 , ,

3, were generated as B k = W ⊗ S ⊗ T , (18)where ⊗ indicates the element-wise product. Each entry of W was drawn independentlyfrom N (0 , S was drawn independently from the Bernoulli distribution B (1 , p ), and each row of T (either all 1 or all 0) was determined by an independent drawfrom B (1 , p ). As a result, we expect p p p n relevant predictors for each response variable,and (1 − p ) p n predictors are expected to be irrelevant to all q response variables. Welet p = 0 . p = 0 .

9. Each simulation was repeated for 50 times. A new sequence of B k , k = 1 , , X , Y ) were generated for each repetition. We assumedthat K is known, so that performance evaluation of the new method is not interfered byselection of K . Its selection is separately examined in Section A.4.Scenario 1: The 3 clusters are non-overlapping, each cluster contains n/ A.2 Quality Measures of Cluster Recovery

Quality measures in Baeza-Yates et al. (1999) have been used to evaluate the clusteringperformance of the proposed method. Denote N S the number of elements in S . Belowquality measures were proposed to compare a retrieved cluster ˆ A with a target cluster A ,“speciﬁcity” = N A ∩ ˆ A N A , “sensitivity” = N A ∩ ˆ A N ˆ A , “ F measure” = 2 N A ∩ ˆ A N A + N ˆ A . The F measure, taken as the harmonic mean of speciﬁcity and sensitivity, gives an overallmeasure of the clustering.Moreover, we employed the one-to-one correspondence match approach (Turner et al.,2005) to evaluate a sequence of retrieved clusters. It includes about three steps. First makethe number of retrieved clusters to be the same as the number of target clusters by adding recision Therapeutic Biomarker Identiﬁcation A.3 Results of the Simulations

A sequence of methods were implemented for comprehensive comparison with our newmethod, including the generalized plaid model ( plaid ) and its revised counterparts ( aplaid );the R package ﬂexmix ( EM ) where response variables are treated as independent and over-lap issues are not considered; the generalized FMMR model with response variables treatedas independent ( gEMseplasso0 ); the generalized FMMR model with separate LASSO esti-mation of B k ( gEMseplasso ); and ﬁnally the original generalized FMMR model with MRCEestimation of B k ( gEMmrce ).Their performances were evaluated via the three quality measures introduced in Sec-tion A.2 and the sum of squared errors (SSE) of coeﬃcients estimates. Results are sum-marized in Web Figure 6–7. In scenario 1 when there is no overlap, EM and gEMseplasso0 do equally well showing that the generalized FMMR model retains the ability to recovernon-overlapping clusters. By taking into account of the covariance structure among re-sponse variables, gEMseplasso and gEMmrce outperform EM and gEMseplasso0 , but thereis little diﬀerence between gEMseplasso and gEMmrce . In scenario 2 when there are 30%overlaps, speciﬁcity of EM drops down dramatically, resulting in a poor F measure. The gEMseplasso0 does much better than EM by estimating the overlaps, while it is defeatedby gEMseplasso and gEMmrce . In this scenario, gEMmrce outperforms gEMseplasso when n = 150. However, they converge very fast with n , both achieving a median 95% clusteringaccuracy when n = 450. The plaid and aplaid have a poor performance in all scenarios. A.4 Additional Simulations

We conducted another simulation for scenario 2 to evaluate the accuracy of BIC in selectionof K . In the simulation, K is unknown in candidate procedures EM and gEMseplasso , andis chosen by minimizing the BIC. Results are summarized in Web Figure 8 and Web Table 1.Given 30% overlaps in simulated data, BIC has a poor performance in identifying the true K with the EM procedure. Whereas by ﬁtting the overlapping clusters with gEMseplasso , BICobtains a much higher accuracy in selection of K (see Web Table 1). Moreover, although K is misspeciﬁed for 12% of the times when n = 450, gEMseplasso still has achieved a median93% clustering accuracy (see Web Figure 8).An additional simulation was conducted for scenario 2 to investigate the advantageof modeling with multivariate response variables in the new method. We compared theperformance of modeling with all 3 response variables versus modeling with each of the3 response variables in the generalized FMMR model. Results are summarized in WebFigure 9. Using multivariate response variables leads to a much higher clustering accuracy.2 Liu and Rao

Algorithm 1

PlaidFor k = 1 : K doAt this point, we have already found k − k th layer.Calculate the residuals from previous k − z i = y i − k − (cid:88) l =1 x Ti B l P il , i = 1 , . . . , n. Initialize P k .For s = 1 : S doGiven P s − k , compute B sk = arg min B k n (cid:88) i =1 (cid:107) z i − x Ti B k P s − ik (cid:107) + q (cid:88) j =1 λ jk (cid:107) b jk (cid:107) , where B k = ( b k , . . . , b qk ) . Given B sk , compute P sik = (cid:40) . { . , S S − T ) } , (cid:107) z i − x Ti B sk (cid:107) ≤ (cid:107) z i (cid:107) , . − min { . , S S − T ) } , otherwise , where T was taken as 0 . S for i = 1 , . . . , n .End forGiven P Sk , update B S +1 k .Given B S +1 k , prune layer memberships by P S +1 ik = (cid:40) , (cid:107) z i − x Ti B k (cid:107) ≤ τ (cid:107) z i (cid:107) , , otherwise , where τ ∈ (0 ,

1) for i = 1 , . . . , n .Given P S +1 k , update B S +2 k .Given P S +11 , . . . , P S +1 k , back ﬁt the layers R times. We set R = 2.End for recision Therapeutic Biomarker Identiﬁcation Algorithm 2

All-plaidInitialize B , . . . , B K and P , . . . , P K .For s = 1 : S doFor k = 1 : K doGiven ( B s , . . . , B sk − , B s − k +1 , . . . , B k − K ) and ( P s , . . . , P sk − , P s − k , . . . , P s − K ), compute z i = y i − k − (cid:88) l =1 x Ti B sl P sil − K (cid:88) l = k +1 x Ti B s − l P s − il , i = 1 , . . . , n, B sk = arg min B k n (cid:88) i =1 (cid:107) z i − x Ti B k P s − ik (cid:107) + q (cid:88) j =1 λ jk (cid:107) b jk (cid:107) , where B k = ( b k , . . . , b qk ) . Given B sk , compute P sik = (cid:40) . { . , S S − T ) } , (cid:107) z i − x Ti B k (cid:107) ≤ τ (cid:107) z i (cid:107) , . − min { . , S S − T ) } , otherwise , where T was taken as 0 . S and τ ∈ (0 ,

1) for i = 1 , . . . , n .End forEnd for Web Table 1: Accuracy of BIC in selection of K .sample size EM gEMseplasso n=150 0.32 0.56n=450 0.22 0.88

B Two algorithms of the Generalized Plaid ModelWeb TablesWeb Figures Liu and Rao . . . . . . . Methods S pe c i f i c i t y lllll lllllllllll llllllll ll l plaidaplaidEMgEMseplasso0gEMseplassogEMmrce . . . . . . . Methods S en s i t i v i t y lll lllllllll llll ll l . . . . . . . Methods F m ea s u r e ll lllllll llllllllll ll l l Methods C oe ff i c i en t E rr o r ll lll lll llllllll llllllllllll lll lll ll Web Figure 6: Simulation results of scenario 1. The two box plots under each methodcorrespond to n = 150 ,

450 respectively. . . . . . . . Methods S pe c i f i c i t y lll l lllllll llllllll llllllll plaidaplaidEMgEMseplasso0gEMseplassogEMmrce . . . . . . . Methods S en s i t i v i t y l l ll lllll lll . . . . . . . Methods F m ea s u r e ll lllllll lllllll llll Methods C oe ff i c i en t E rr o r ll l l lll llllllllll lllll llll Web Figure 7: Simulation results of scenario 2. recision Therapeutic Biomarker Identiﬁcation Methods K b y B I C lllllllllllllllllllllll llllll EMgEMseplasso . . . . . . . . Methods S pe c i f i c i t y ll ll . . . . . . . . Methods S en s i t i v i t y l l . . . . . . . . Methods F m ea s u r e l Methods C oe ff i c i en t E rr o r l lll l Web Figure 8: Simulation results of BIC in selection of K . . . . . . . . Methods S pe c i f i c i t y ll l YY[, 1]Y[, 2]Y[, 3] . . . . . . . Methods S en s i t i v i t y ll lll lll . . . . . . . Methods F m ea s u r e l l ll l l Methods C oe ff i c i en t E rr o r ll l llll llll lllll lllllllll llllll Web Figure 9: Simulation results of using multivariate responses versus univariate responses.6

Liu and Rao − − xlim I C l l l l l l l l l ll l l l l l l l l ll l l l l l l l l ll l l l l l l l l ll l l l l l l l l ll l l l l l l l l ll l l l l l l l l ll l l l l l l l l ll l l l l l l l l l l l l l ll l l l ll l l l ll l l l ll l l l ll l l l ll l l l ll l l l ll l l l ll l l l ll l l l l l ll ll ll l l ll ll ll ll ll ll ll l lllll l l l l l l ll l l l l l ll l l l l l ll l l l l l ll l l l l l ll l l l l l ll l l l l l ll l l l l l ll l l l l l l l l ll l ll l ll l l llll llll l l l ll l l ll l l ll l l ll l l ll l l l l l ll l ll l ll l ll l l Drugs G ene s MS−HL MUTAKT2 MUTALK MUTAPC MUTBRAF MUTBRCA1 MUTBRCA2 MUTCCND1 MUTCCND2 MUTCCND3 MUTCDH1 MUTCDK4 MUTCDK6 MUTCDKN2A MUTCDKN2C MUTCDKN2a(p14) MUTCTNNB1 MUTCYLD MUTEGFR MUTEP300 MUTERBB2 MUTEZH2 MUTFAM123B MUTFBXW7 MUTFGFR2 MUTFGFR3 MUTFLCN MUTFLT3 MUTGNAS MUTHRAS MUTIDH1 MUTJAK2 MUTKDM5C MUTKDM6A MUTKDR MUTKIT MUTKRAS MUTMAP2K4 MUTMDM2 MUTMET MUTMLH1 MUTMLLT3 MUTMSH2 MUTMSH6 MUTMYC MUTMYCL1 MUTMYCN MUTNF1 MUTNF2 MUTNOTCH1 MUTNRAS MUTPDGFRA MUTPIK3CA MUTPIK3R1 MUTPTEN MUTRB1 MUTRUNX1 MUTSETD2 MUTSMAD4 MUTSMARCA4 MUTSMARCB1 MUTSMO MUTSOCS1 MUTSTK11 MUTTET2 MUTTP53 MUTTSC1 MUTVHL MUTBCR_ABL MUTEWS_FLI1 MUTMLL_AFF1 MUT S un i t i n i b ( ) A Z D . ( ) B i c a l u t a m i de ( ) O S U . ( ) PA C . ( ) I PA . ( ) T i p i f a r n i b ( ) AB T . ( ) P L X ( ) E H T . ( ) S o r a f en i b ( ) N VP . T AE ( ) E t opo s i de ( ) E m be li n ( ) Z M . ( ) M G . ( ) B r y o s t a t i n . ( ) V i no r e l b i ne ( ) D o c e t a x e l ( ) N il o t i n i b ( ) E r l o t i n i b ( ) P H A . ( ) CH I R . ( ) A Z D ( ) S L . . ( ) N u t li n . ( ) B M S . ( ) J W . . . ( ) M i do s t au r i n ( ) S h i k on i n ( ) G e m c i t ab i ne ( ) I m a t i n i b ( ) O ba t o c l a x . M e sy l a t e ( ) C y t a r ab i ne ( ) RD EA ( ) G DC ( ) J W . . . ( ) M i do s t au r i n ( ) S h i k on i n ( ) −1e+01 −1e−01 −1e−08 1e−01 2e+00 Web Figure 10: The extracted IC values of soft tissue cancers (upper panel) and a subset ofthe coeﬃcient estimates corresponding to the mutations of 71 cancer genes (bottom panel).Drug names in red indicate type “chemo”, black indicate type “clinical”, blue indicate type“experimental”, darkblue indicate type “in clinical development” and yellow indicate drugtype not available. recision Therapeutic Biomarker Identiﬁcation MG.132ImatinibBortezomibBryostatin.1Epothilone.B ABT.888WZ.1.84DMOG GSK269962A MS.275AMG.706Parthenolide AG.014699 CI.1040PD.0325901Mitomycin.CVorinostat AZD7762CGP.60474TW.37VX.702 NSC.87877SB.216763 Z.LLNle.CHOAZD.2281 S.Trityl.L.cysteinePD.173074 JNK.9LJNK.Inhibitor.VIIIGW.441756 RO.3306AP.24534KU.55933 PyrimethamineAZD6244PazopanibRoscovitinePF.4708671Nutlin.3aSL.0101.1 PHA.665752ErlotinibAZD6482BMS.708163AxitinibAKT.inhibitor.VIIIFTI.277BX.795CMKPF.562271XMD8.85 GSK.1904529AJNJ.26854165 GDC0941 VinorelbineCytarabineRDEA119Obatoclax.MesylateCHIR.99021 DocetaxelLFM.A13CyclopamineGNF.2LenalidomideGDC.0449A.443654RapamycinCCT018159CEP.701BMS.754807NilotinibVinblastineCamptothecinElesclomolNVP.BEZ235PaclitaxelBI.2536 VX.680FH535BMS.509744 BIRB.0796MethotrexateA.770041 ATRA X17.AAGMK.2206 GW843682XSB590885CCT007093BAY.61.3606BIBW2992 BMS.536924GSK.650394 Bleomycin ThapsigarginAUY922DoxorubicinTemsirolimusAZD8055AZ628NU.7441SalubrinalGefitinibAS601245 AZD6482.1PF.02341066 KIN001.135BexaroteneOSI.906NA Shikonin NVP.TAE684EtoposideEmbelinJW.7.52.1Midostaurin SorafenibZM.447439DasatinibCisplatinCGP.082996BosutinibX681640BicalutamidePAC.1SunitinibTipifarnibAZD.0530PLX4720 OSU.03012ABT.263IPA.3EHT.1864 AICARTGX221 LAQ824Lapatinib QS11 GemcitabineWH.4.023PD.0332991

Web Figure 11: Groups of drugs generated from the APC algorithm. Each node impliesa type of drug with the drug name on it. Drugs connected with the same color of strokesbelong to a ﬁrst-level drug group, and drugs connected within a closure belong to a second-level drug group.8

Liu and Rao xlim C an c e r t y pe s l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l aero_digestive_tractbloodbonebreastdigestive_systemkidneylungnervous_systempancreasskinsoft_tissuethyroidurogenital_system Drugs G ene s interceptFSTOR10H3ATM /// LOC651610F7EHBP1L1FLJ10246HMOX1ATG16L1TXNL4ASEMA3CSHMT1CELTAF5LBTN3A1NME6VCAM1BC37295_3OR2F1 /// OR2F2FAM63ALOC150759KIAA0196SSSCA1SYNJ2BPHIST1H2BMHSPA4HPSELRP3CD55C15ORF34PRIM1RAD23BATP2B3CRYL1CBFBNVLSAS10EMR3PPM1EKPNA5FAM12AFAIM2KCNB2WDR6MYO1CERCC6DXS542TRY6ABP1CCDC44ST7LRIMBP2OAS3C16ORF67PIM1RAC2AOF2ERBB2LPGAT1THOC1CST2GPR37L1LOC147343PHKG2CENPC1CACNA1DPOMT2ETV6 CPRPN1 CPbone S . T r i t y l . L . cys t e i ne Z . LL N l e . CH O M S . J N K . S un i t i n i b A Z D . B i c a l u t a m i de O S U . PA C . I PA . T i p i f a r n i b AB T . P L X E H T . D a s a t i n i b J W . . . M i do s t au r i n S h i k on i n S o r a f en i b N VP . T AE E t opo s i de E m be li n Z M . C G P . C i s p l a t i n B o s u t i n i b X W H . . G e m c i t ab i ne A Z P F . K I N . T G X S a l ub r i na l Lapa t i n i b Q S O S I. B e x a r o t ene AS G e f i t i n i b NU . P D . A Z D . D o x o r ub i c i n T hap s i ga r g i n A U Y A Q T e m s i r o li m u s A Z D A I C A R M G . B r y o s t a t i n . I m a t i n i b B o r t e z o m i b E po t h il one . BAK T . i nh i b i t o r . V III G SK . AA x i t i n i b W Z . . P a r t heno li de G SK A N S C . D M OG A Z D . AB T . G W . SB . VX . A M G . J N K .I nh i b i t o r . V III P D . A G . C M KX M D . P F . FT I. BX . J N J . G W XB l eo m yc i n X . AA G R apa m yc i n A . B M S . C EP . CC T C yc l opa m i ne G N F . F M . A li do m i de G DC . N il o t i n i b P a c li t a x e l B I. C a m p t o t he c i n V i nb l a s t i ne E l e sc l o m o l N VP . BE Z VX . A . B M S . B M S . F H G SK . BAY . . M e t ho t r e x a t e A T R AB I B W B I R B . M K . SB CC T E r l o t i n i b P H A . CH I R . A Z D S L . . N u t li n . B M S . P y r i m e t ha m i ne R o sc o v i t i ne AP . P a z opan i b K U . R O . A Z D P F . O ba t o c l a x . M e sy l a t e C y t a r ab i ne RD EA G DC V i no r e l b i ne D o c e t a x e l C G P . M i t o m yc i n . C V o r i no s t a t C I. A Z D P D . T W . −1e+01 −2e−03 −2e−04 −8e−07 −1e−08 1e−08 2e−04 2e−03 9e+00 Web Figure 12: Estimation results of cluster 1 by ﬁtting the data of each second-level druggroup in a generalized FMMR model. Bottom panel plots the coeﬃcient estimates, andupper panel describes the cell line compositions of cluster 1 by cancer types. recision Therapeutic Biomarker Identiﬁcation xlim C an c e r t y pe s l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l aero_digestive_tractbloodbonebreastdigestive_systemkidneylungnervous_systempancreasskinsoft_tissuethyroidurogenital_system Drugs G ene s interceptTFAMPPAP2AADCY8INACYBAIL8RAIRX5ATP8B4BMPR2BIDMGC24039CLEC1AMAPK3BTN3A2C10ORF7VCAM1GSTCDATP5EPTGS2B4GALT1SLC38A4QPCTRAB11FIP1ZNF623PTPN12TSGA10PTK6PTCH2GNPDA1FGF21MDKRAB36DHTKD1C14ORF106SART1AP2B1MUSKTBX21TMCO3AGPAT4LATS1ORC3LRORCKRT12C9ORF46RAB6IP2 /// LOC65315GLRX5LHX5CLK2NUDT9CKS2MYBPC2C22ORF31GPD1LDPEP1ETV1RBM9LOC90925LRRC49GABRA2CENTA1ETFDHDNAJC6FLJ20273CACNA1ENFKBIL1EPS15 CPSLC45A3 CPbreast S . T r i t y l . L . cys t e i ne Z . LL N l e . CH O M S . J N K . S un i t i n i b A Z D . B i c a l u t a m i de O S U . PA C . I PA . T i p i f a r n i b AB T . P L X E H T . D a s a t i n i b J W . . . M i do s t au r i n S h i k on i n S o r a f en i b N VP . T AE E t opo s i de E m be li n Z M . C G P . C i s p l a t i n B o s u t i n i b X W H . . G e m c i t ab i ne A Z P F . K I N . T G X S a l ub r i na l Lapa t i n i b Q S O S I. B e x a r o t ene AS G e f i t i n i b NU . P D . A Z D . D o x o r ub i c i n T hap s i ga r g i n A U Y A Q T e m s i r o li m u s A Z D A I C A R M G . B r y o s t a t i n . I m a t i n i b B o r t e z o m i b E po t h il one . BAK T . i nh i b i t o r . V III G SK . AA x i t i n i b W Z . . P a r t heno li de G SK A N S C . D M OG A Z D . AB T . G W . SB . VX . A M G . J N K .I nh i b i t o r . V III P D . A G . C M KX M D . P F . FT I. BX . J N J . G W XB l eo m yc i n X . AA G R apa m yc i n A . B M S . C EP . CC T C yc l opa m i ne G N F . F M . A li do m i de G DC . N il o t i n i b P a c li t a x e l B I. C a m p t o t he c i n V i nb l a s t i ne E l e sc l o m o l N VP . BE Z VX . A . B M S . B M S . F H G SK . BAY . . M e t ho t r e x a t e A T R AB I B W B I R B . M K . SB CC T E r l o t i n i b P H A . CH I R . A Z D S L . . N u t li n . B M S . P y r i m e t ha m i ne R o sc o v i t i ne AP . P a z opan i b K U . R O . A Z D P F . O ba t o c l a x . M e sy l a t e C y t a r ab i ne RD EA G DC V i no r e l b i ne D o c e t a x e l C G P . M i t o m yc i n . C V o r i no s t a t C I. A Z D P D . T W . −1e+01 −2e−03 −2e−04 −1e−08 1e−08 9e−07 2e−04 2e−03 8e+00 Web Figure 13: Estimation results of cluster 12 by ﬁtting the data of each second-level druggroup in a generalized FMMR model.0

Liu and Rao ll ll llllll

FTI.277 I C lll lllllll − BX.795 I C lllll l lllllllllllllll JNJ.26854165 I C lllll lll ll − GW843682X I C llllllllll lll lllllllllllllll llll − Bleomycin I C l ll − − X17.AAG I C lllllll lll lllllllllllllllllll ll ll − Rapamycin I C lllll llllllllllllll llllllllllllllllllllllllll − A.443654 I C llll lllllllllllllllllllll l ll − BMS.754807 I C lllllll lllll llllllll − CEP.701 I C l l CCT018159 I C lllllll lllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Cyclopamine I C lllllllllllllllllllllllllllllllllllllllllllllll lllll lllllllllllllll − GNF.2 I C llll lllllllllllll LFM.A13 I C lllllllllll llllllllllllll Lenalidomide I C Web Figure 14: Cluster-wise IC50