A stability-driven protocol for drug response interpretable prediction (staDRIP)
Xiao Li, Tiffany M. Tang, Xuewei Wang, Jean-Pierre A. Kocher, Bin Yu
MML4H Extended Abstract Arxiv Index:1–17, 2020 Machine Learning for Health (ML4H) 2020
A stability-driven protocol for drug response interpretableprediction (staDRIP)
Xiao Li ∗ [email protected] University of California, Berkeley
Tiffany M. Tang [email protected]
University of California, Berkeley
Xuewei Wang
Mayo Clinic
Jean-Pierre A. Kocher
Mayo Clinic
Bin Yu [email protected]
University of California, Berkeley
Abstract
Modern cancer -omics and pharmaco-logical data hold great promise in preci-sion cancer medicine for developing in-dividualized patient treatments. How-ever, high heterogeneity and noise insuch data pose challenges for predictingthe response of cancer cell lines to thera-peutic drugs accurately. As a result, ar-bitrary human judgment calls are ram-pant throughout the predictive model-ing pipeline. In this work, we develop atransparent stability-driven pipeline fordrug response interpretable predictions,or staDRIP, which builds upon the PCSframework for veridical data science (Yuand Kumbier, 2020) and mitigates theimpact of human judgment calls. Herewe use the PCS framework for the firsttime in cancer research to extract pro-teins and genes that are important inpredicting the drug responses and sta-ble across appropriate data and modelperturbations. Out of the 24 most sta-ble proteins we identified using datafrom the Cancer Cell Line Encyclope-dia (CCLE), 18 have been associatedwith the drug response or identified as a ∗ The first two authors contributed equally to thispaper. known or possible drug target in previ-ous literature, demonstrating the util-ity of our stability-driven pipeline forknowledge discovery in cancer drug re-sponse prediction modeling.
1. Introduction
A critical goal in precision medicine oncol-ogy revolves around predicting a patient’sresponse to therapeutic drugs given the pa-tient’s unique molecular profile (30; 19). Ac-curate personalized drug response predic-tions can immediately shed light on therapiesthat are likely to be ineffective or toxic andaid clinicians in deciding the most promis-ing treatment for their patients (2). More-over, interpreting these drug response pre-diction models can help to improve recom-mendations of compounds and target genesto prioritize in future preclinical research (8).While several community-wide, public ef-forts (4; 10) and many other works havemade progress towards improving the predic-tive accuracy of drug response predictions,identifying the important disease signatures(i.e., proteins, genes, and other biomarkers)that drive the drug response prediction mod- © a r X i v : . [ q - b i o . Q M ] N ov taDRIP els has received less attention. To date, pre-vious works have typically focused on fea-ture selection within one specific model suchas elastic nets (18; 4) and random forest(29). However, because molecular profilingdata is often heterogeneous, noisy, and high-dimensional, these results are highly sensi-tive to modeling decisions made by humansincluding the type of model, the amount oftraining data, and the choice of algorithm.In this work, we focus on this goal ofdetecting stable, interpretable, and predic-tive -omic signatures that drive a cell line’sdrug response. To overcome the aforemen-tioned challenges, we develop a transparentstability-driven pipeline for drug response in-terpretable prediction called staDRIP thatis rooted in the PCS framework for veridi-cal data science (40). At its core, the PCSframework builds its foundation on threeprinciples: predictability as a reality check, computability as an important considerationin algorithmic design and data collection,and stability as an overarching principle andminimal requirement for scientific knowledgeextraction. These principles were motivatedby extensive interdisciplinary research suchas (36), which analyzed the gap-gene net-work of Drosophila , and (5), which discov-ered stable transcription factor interactionsin
Drosophila embryos. Since its concep-tion, the PCS framework has further demon-strated a strong track record of driving manyscientific discoveries including novel gene-gene interactions for the red-hair phenotype(6) and clinically-interpretable subgroups ina randomized drug trial (12).Here, using integrative -omics and drugresponse data from the Cancer Cell LineEncyclopedia (CCLE) (4), we employ the
1. The CCLE is one of the most comprehensive pub-lic databases for developing detailed genetic andpharmacologic characterizations of human cancercell lines. After preprocessing (see Appendix A),we arrive at a panel of 370 human cancer cell lines
PCS framework to develop staDRIP and pro-vide extensive documentation of our model-ing choices to arrive at stable biological dis-coveries of proteins and genes that are pre-dictive of cancer drug responses. Unlike pre-vious works whose results depend heavily onhuman decisions, staDRIP finds predictive-omic features that are stable across variousmodels and data perturbations, thus mitigat-ing the impact of human judgment calls. Wefurther show that 18 of the top 24 -omic fea-tures identified by staDRIP have been pre-viously implicated in the scientific literature,and in doing so, hint at novel candidates forfuture preclinical research.
2. Results
Building on the PCS framework, staDRIPfirst uses predictive accuracy as a realitycheck to filter out models that are poorfits for the observed data before turningto our primary goal of identifying impor-tant biomarkers for drug response predic-tion. Specifically, we define drug responseas the area under the fitted dose-responsecurve of growth inhibition (see Appendix Afor details). For each of the available 24anticancer drugs, we divide the data into a50-25-25% training-validation-test split anduse the training data to fit (1) an elasticnet tuned with cross-validation (CV), whichhas been widely used and advocated by pre-vious studies (4; 18), (2) Lasso tuned withCV, (3) Lasso tuned with ESCV, an alterna-tive CV metric that incorporates stability toyield more stable estimation properties withminimal loss of accuracy (21), (4) Gaussian that have both high-throughput molecular profil-ing of RNASeq gene expression (5000 genes), mi-croRNA expression (734 miRNAs), DNA methy-lation (4000 transcription start sites), and proteinexpression (214 proteins) as well as pharmacolog-ical data for 24 anticancer drugs. taDRIP kernel ridge regression, and (5) random for-est to predict the drug response given themiRNA, RNASeq, methylation, and proteinexpression profiles separately. We also fitseveral data integration methods includingconcatenated versions of the aforementionedmethods, the recently proposed X-shapedVariational Autoencoder (X-VAE) (32), andthe winner of the DREAM 7 challenge, theBayesian multi-task multiple kernel learningmethod (BMTMKL) (10).For each of these fits, we report in Table 2in Appendix B.3 the average validation ac-curacy across all 24 drugs as measured bythe R value and the WPC-index, a weightedprobabilistic concordance index, which hasbeen used in previous studies and measureshow well the predicted rankings agree withthe true responses (10). From Table 2, wesee that kernel ridge regression trained onlyon the RNASeq data yields the best predic-tive performance. However, considering thatour primary goal is not purely prediction, thedifferences between model prediction accu-racies shown in Table 2 are relatively smallfrom a practical viewpoint. In our inferentialprocedure discussed next, we will see thatleveraging the stability across these methodswith similar predictive accuracies is key toour staDRIP pipeline for identifying genesand proteins that are stable predictive fea-tures underlying the drug response models.Nonetheless, for completeness, we reportthe test accuracy from the best model, theRNASeq-based kernel ridge regression, tohave an R ( ± ± ±
2. The DREAM 7 challenge was a public competi-tion where teams were tasked to integrate multi-ple –omics measurements and predict drug sensi-tivity in cancer cell lines.
Beyond predictability, the PCS frameworkemphasizes stability throughout the data sci-ence life cycle so as to reduce reliance on par-ticular human judgment calls. Accordingly,we leverage and quantify the stability of im-portant features under numerous data andmodel perturbations in staDRIP as follows:for each of the 24 drugs separately,1.
Use predictability as reality check: select a set M of models with high pre-dictive accuracy across a variety of met-rics on the validation data.2. Compute stability of predictivefeatures across data perturbations: for each model M ∈ M , refit the model M to B bootstrap replicates of the data,and compute the stability score of eachfeature as the proportion of B boot-strap samples where the feature is se-lected (details in Appendix C.1). Let F M denote the subset of features withhigh stability scores (e.g., top 10).3. Select predictive features that arestable across model perturbations: take the intersection ∩ M ∈M F M as thestable predictive -omic features acrossdata and model perturbations.In our work, we are primarily interested inidentifying proteins and genes that are pre-dictive of drug responses as many drugs aredirectly related to known proteins and genes.Hence, considering the five models trained onthe RNAseq and protein data separately, wetake M = { RF , Lasso (ESCV) , Elastic Net } .Note that while kernel ridge has the highestaccuracy, it is omitted from M since thereis no straightforward, computationally effi-cient method to select features from kernelridge to the best of our knowledge. We alsoomit the Lasso from M as it generally hasthe worst predictive accuracy. For each re-maining model in M , we then take F M to taDRIP Table 1: Most stable protein associated with each drug, as identified by staDRIP, along withliterature that supports the association between the protein and drug sensitivity.
Drug Protein Supporting Literature Drug Protein Supporting Literature be the 10 features with the highest stabilityscores and list those genes and proteins in thetop 10 most stable features across all threemodels in Table 5 in the Appendix.In Table 1, we provide our main evidencefor the utility of staDRIP, listing the singlemost stable protein for each drug along withindependent publications that support thesefindings. Specifically, of the 24 proteins iden-tified as most stable by staDRIP, 18 havebeen associated with the drug sensitivity oridentified as a known or possible drug tar-get in prior preclinical studies. See AppendixC.2 for details of this literature evidence.Now in contrast to staDRIP, which findsstable predictive features across models withsimilar predictive accuracies, previous state-of-the-art methods (4; 18) use only an elasticnet to identify predictive -omics features ofdrug responses. To compare staDRIP to thiselastic net approach, we extract the proteinswith the highest stability score for each drugwhen taking M = { Elastic Net } . Repeatingthe same literature search procedure as wedid for the proteins identified by staDRIP,we found only 14 of the 24 proteins identi-fied by the elastic net are known from pre-vious clinical studies (see Table 6 in the Ap-pendix). Detailed comparisons of the results of our method, staDRIP, and that of the elas-tic net can be found in Appendix C.2.
3. Conclusion
Rooted by the PCS framework, we empha-size the importance of predictability, (com-putability), and stability as minimum re-quirements for extracting scientific knowl-edge throughout the staDRIP pipeline. Weshow that, guided by good prediction per-formance, incorporating a number of stabil-ity checks and extracting the stable partsof top-performing models can help to avoidthe poor generalization exhibited by existingmethods and can successfully identify candi-date therapeutic targets for future preclini-cal research. We also acknowledge that whilemany stability considerations are built intostaDRIP, there are inevitably human judg-ment calls that still impact our analysis. Forexample, we make a number of judgementcalls in the data preprocessing stage, whichwe detail in Appendix A. Additionally, manyother reasonable models such as ridge regres-sion and gradient boosting could be consid-ered in the staDRIP pipeline. We thus pro-vide transparent and extensive documenta-tion here to justify these decisions using do-main knowledge when possible. taDRIP
4. Acknowledgments
The authors would like to thank KarlKumbier for his helpful comments. TT ac-knowledges support from the NSF Grad-uate Research Fellowship Program DGE-1752814. BY acknowledges the support fromARO Army Research Office grant W911NF-17-1-0005, Office of Naval Research GrantN00014-17-1-2176, the Center for Scienceof Information (CSoI), a US NSF Scienceand Technology Center, under grant agree-ment CCF-0939370, UCSF fund N7251, Na-tional Science Foundation grants NSF-DMS-1613002, 1953191, and IIS 1741340. BY is aChan Zuckerberg Biohub investigator. Thiswork was in part supported by the MayoClinic Center for Individualized Medicine.
References [1] Zohar Attias-Geva, Itay Bentov, AmiFishman, Haim Werner, and IlanBruchim. Insulin-like growth factor-i re-ceptor inhibition by specific tyrosine ki-nase inhibitor nvp-aew541 in endometri-oid and serous papillary endometrialcancer cell lines.
Gynecologic oncology ,121(2):383–389, 2011.[2] Francisco Azuaje. Computational mod-els for predicting drug responses in can-cer research.
Briefings in bioinformat-ics , 18(5):820–829, 2016.[3] Kathryn Balmanno, Simon D Chell, An-nette S Gillings, Shaista Hayat, and Si-mon J Cook. Intrinsic resistance tothe mek1/2 inhibitor azd6244 (arry-142886) is associated with weak erk1/2signalling and/or strong pi3k signallingin colorectal cancer cell lines.
Interna-tional journal of cancer , 125(10):2332–2341, 2009.[4] Jordi Barretina, Giordano Caponigro,Nicolas Stransky, Kavitha Venkatesan, Adam A Margolin, Sungjoon Kim,Christopher J Wilson, Joseph Leh´ar,Gregory V Kryukov, Dmitriy Sonkin,et al. The cancer cell line encyclope-dia enables predictive modelling of an-ticancer drug sensitivity.
Nature , 483(7391):603, 2012.[5] Sumanta Basu, Karl Kumbier, James BBrown, and Bin Yu. Iterative randomforests to discover predictive and stablehigh-order interactions.
Proceedings ofthe National Academy of Sciences , 115(8):1943–1948, 2018.[6] Merle Behr, Karl Kumbier, AldoCordova-Palomera, Matthew Aguirre,Euan Ashley, Rima Arnaout, James BBrown, James Priest, and Bin Yu.Learning epistatic polygenic phenotypeswith boolean interactions. 2020.[7] D Ross Camidge, Sai-Hong Ignatius Ou,Geoffrey Shapiro, Gregory Alan Otter-son, Liza Cosca Villaruz, Miguel An-gel Villalona-Calero, A John Iafrate,Marileila Varella-Garcia, Sanja Dacic,Stephanie Cardarella, et al. Efficacyand safety of crizotinib in patients withadvanced c-met-amplified non-small celllung cancer (nsclc)., 2014.[8] Giordano Caponigro and William RSellers. Advances in the preclinical test-ing of cancer therapeutic hypotheses.
Nature reviews Drug discovery , 10(3):179, 2011.[9] Liang Chen and Jingxuan Pan. Dualcyclin-dependent kinase 4/6 inhibitionby pd-0332991 induces apoptosis andsenescence in oesophageal squamous cellcarcinoma cells.
British journal of phar-macology , 174(15):2427–2443, 2017.[10] James C Costello, Laura M Heiser,Elisabeth Georgii, Mehmet G¨onen, taDRIP Michael P Menden, Nicholas J Wang,Mukesh Bansal, Petteri Hintsanen,Suleiman A Khan, John-PatrickMpindi, et al. A community effortto assess and improve drug sensi-tivity prediction algorithms.
Naturebiotechnology , 32(12):1202, 2014.[11] E Drakos, Rajesh R Singh, GZ Rassi-dakis, E Schlette, J Li, Francois-XavierClaret, RJ Ford, Francisco Vega, andL Jeffrey Medeiros. Activation of thep53 pathway by the mdm2 inhibitornutlin-3a overcomes bcl2 overexpressionin a preclinical model of diffuse largeb-cell lymphoma associated with t (14;18)(q32; q21).
Leukemia , 25(5):856–867,2011.[12] Raaz Dwivedi, Yan Shuo Tan, BritonPark, Mian Wei, Kevin Horgan, DavidMadigan, and Bin Yu. Stable discoveryof interpretable subgroups via calibra-tion in causal studies. arXiv preprintarXiv:2008.10109 , 2020.[13] Caroline M Emery, Krishna G Vijayen-dran, Marie C Zipser, Allison M Sawyer,Lili Niu, Jessica J Kim, Charles Hatton,Rajiv Chopra, Patrick A Oberholzer,Maria B Karpova, et al. Mek1 muta-tions confer resistance to mek and b-raf inhibition.
Proceedings of the Na-tional Academy of Sciences , 106(48):20411–20416, 2009.[14] Francisco J Esteva, Dihua Yu, Mien-Chie Hung, and Gabriel N Horto-bagyi. Molecular predictors of responseto trastuzumab and lapatinib in breastcancer.
Nature reviews Clinical oncol-ogy , 7(2):98, 2010.[15] Bet¨ul G¨uven¸c Paltun, Hiroshi Mamit-suka, and Samuel Kaski. Improvingdrug response prediction by integrating multiple data sources: matrix factor-ization, kernel and network-based ap-proaches.
Briefings in Bioinformatics ,2019.[16] Kan He, Xingnan Zheng, Lin Zhang,and Jian Yu. Hsp90 inhibitors promotep53-dependent apoptosis through pumaand bax.
Molecular cancer therapeutics ,12(11):2559–2568, 2013.[17] Ying C Henderson, Yunyun Chen,Mitchell J Frederick, Stephen Y Lai,and Gary L Clayman. Mek in-hibitor pd0325901 significantly reducesthe growth of papillary thyroid carci-noma cells in vitro and in vivo.
Molecu-lar cancer therapeutics , 9(7):1968–1976,2010.[18] In Sock Jang, Elias Chaibub Neto,Justin Guinney, Stephen H Friend, andAdam A Margolin. Systematic assess-ment of analytical methods for drug sen-sitivity prediction from cancer cell linedata. In
Biocomputing 2014 , pages 63–74. World Scientific, 2014.[19] Isaac S Kohane. Ten things we have todo to achieve precision medicine.
Sci-ence , 349(6243):37–38, 2015.[20] Xiao-Feng Le and Robert C Bast, Jr.Src family kinases and paclitaxel sensi-tivity.
Cancer biology & therapy , 12(4):260–269, 2011.[21] Chinghway Lim and Bin Yu. Estimationstability with cross-validation (escv).
Journal of Computational and Graphi-cal Statistics , 25(2):464–492, 2016.[22] Xiang Ling, Chao Xu, Chuandong Fan,Kai Zhong, Fengzhi Li, and XinjiangWang. Fl118 induces p53-dependentsenescence in colorectal cancer cells bypromoting degradation of mdmx.
Can-cer research , 74(24):7487–7497, 2014. taDRIP [23] Yuqing Liu, Zhuo Wang, Shu QinKwong, Eric Lik Hang Lui, Scott LFriedman, Fu Rong Li, Reni Wing ChiLam, Guo Chao Zhang, Hui Zhang, andTao Ye. Inhibition of pdgf, tgf- β , andabl signaling and reduction of liver fi-brosis by the small molecule bcr-abl ty-rosine kinase antagonist nilotinib. Jour-nal of hepatology , 55(3):612–625, 2011.[24] Ultan McDermott, Sreenath V Sharma,Lori Dowell, Patricia Greninger, ClaraMontagut, Jennifer Lamb, HeidiArchibald, Raul Raudales, AngelaTam, Diana Lee, et al. Identificationof genotype-correlated sensitivity toselective kinase inhibitors by using high-throughput tumor cell line profiling.
Proceedings of the National Academy ofSciences , 104(50):19936–19941, 2007.[25] Pierre Mordant, Yohann Loriot, C´elineLeteur, Julien Calderaro, Jean Bourhis,Marie Wislez, Jean-Charles Soria, andEric Deutsch. Dependence on phos-phoinositide 3-kinase and ras-raf path-ways drive the activity of raf265, anovel raf/vegfr2 inhibitor, and rad001(everolimus) in combination.
Molecularcancer therapeutics , 9(2):358–368, 2010.[26] Atsuhiko T Naito, Sho Okada, TohruMinamino, Koji Iwanaga, Mei-Lan Liu,Tomokazu Sumida, Seitaro Nomura,Naruhiko Sahara, Tatsuya Mizoroki,Akihiko Takashima, et al. Promotion ofchip-mediated p53 degradation protectsthe heart from ischemic injury.
Circula-tion research , 106(11):1692, 2010.[27] David L Nelson, Albert L Lehninger,and Michael M Cox.
Lehninger princi-ples of biochemistry . Macmillan, 2008.[28] C Nishioka, T Ikezoe, A Takeshita,J Yang, T Tasaka, Y Yang, Y Kuwayama, N Komatsu, K Tog-itani, HP Koeffler, et al. Zd6474induces growth arrest and apoptosisof human leukemia cells, which isenhanced by concomitant use of a novelmek inhibitor, azd6244.
Leukemia , 21(6):1308–1310, 2007.[29] Gregory Riddick, Hua Song, Susie Ahn,Jennifer Walling, Diego Borges-Rivera,Wei Zhang, and Howard A Fine. Pre-dicting in vitro drug sensitivity usingrandom forests.
Bioinformatics , 27(2):220–224, 2011.[30] Mark A Rubin. Health: Make precisionmedicine work for cancer care.
NatureNews , 520(7547):290, 2015.[31] Jinjin Shao, Zhifei Xu, Xueming Peng,Min Chen, Yuanrun Zhu, Li Xu, HongZhu, Bo Yang, Peihua Luo, and QiaojunHe. Gefitinib synergizes with irinote-can to suppress hepatocellular carci-noma via antagonizing rad51-mediateddna-repair.
PLoS One , 11(1):e0146968,2016.[32] Nikola Simidjievski, Cristian Bodnar,Ifrah Tariq, Paul Scherer, HelenaAndres-Terre, Zohreh Shams, MatejaJamnik, et al. Variational autoen-coders for cancer data integration: De-sign principles and computational prac-tice.
BioRxiv , page 719542, 2019.[33] John H Strickler, Alexander N Staro-dub, Jingquan Jia, Kellen L Meadows,Andrew B Nixon, Andrew Dellinger,Michael A Morse, Hope E Uronis,P Kelly Marcom, S Yousuf Zafar,et al. Phase i study of bevacizumab,everolimus, and panobinostat (lbh-589)in advanced solid tumors.
Cancerchemotherapy and pharmacology , 70(2):251–258, 2012. taDRIP [34] Anna Tutusaus, Milica Stefanovic,Loreto Boix, Blanca Cucarull, AynaraZamora, Laura Blasco, Pablo Garc´ıade Frutos, Maria Reig, Jose CFernandez-Checa, Montserrat Mar´ı,et al. Antiapoptotic bcl-2 proteinsdetermine sorafenib/regorafenib re-sistance and bh3-mimetic efficacy inhepatocellular carcinoma. Oncotarget ,9(24):16701, 2018.[35] Wolfgang Warsch, Karoline Kollmann,Eva Eckelhart, Sabine Fajmann, SabineCerny-Reiterer, Andrea H¨olbl, Karo-line V Gleixner, Michael Dworzak,Matthias Mayerhofer, Gregor Hoer-mann, et al. High stat5 levels mediateimatinib resistance and indicate diseaseprogression in chronic myeloid leukemia.
Blood, The Journal of the American So-ciety of Hematology , 117(12):3409–3420,2011.[36] Siqi Wu, Antony Joseph, Ann S Ham-monds, Susan E Celniker, Bin Yu, andErwin Frise. Stability-driven nonneg-ative matrix factorization to interpretspatial gene expression and build lo-cal gene networks.
Proceedings of theNational Academy of Sciences , 113(16):4290–4295, 2016.[37] Yi-Ju Wu, Yee-Jee Jan, Bor-Sheng Ko,Shu-Man Liang, and Jun-Yang Liou. In-volvement of 14-3-3 proteins in regu-lating tumor progression of hepatocellu-lar carcinoma.
Cancers , 7(2):1022–1036,2015.[38] Joy C Yang, Lanfang Bai, Stanley Yap,Allen C Gao, Hsing-Jien Kung, andChristopher P Evans. Effect of the spe-cific src family kinase inhibitor saraca-tinib on osteolytic lesions using the pc-3bone model.
Molecular cancer therapeu-tics , 9(6):1629–1637, 2010. [39] Yang Yang, Takayuki Ikezoe, ChieNishioka, Takahiro Taguchi, Wei-guoZhu, H Phillip Koeffler, and HirokuniTaguchi. Zd6474 induces growth arrestand apoptosis of gist-t1 cells, which isenhanced by concomitant use of suni-tinib.
Cancer science , 97(12):1404–1409, 2006.[40] Bin Yu and Karl Kumbier. Veridi-cal data science.
Proceedings of theNational Academy of Sciences , 117(8):3920–3929, 2020.
Appendix A. Data
To begin building the personalized drugresponse models, we leverage data froma panel of 397 human cancer cell linesthat have both high-throughput molecularprofiling and pharmacological data for 24anticancer drugs from the Cancer CellLine Encyclopedia (CCLE) project (4).Specifically, -omics data from the CCLEwas downloaded from DepMap Public 18Q3(https://depmap.org/portal/download/).These cell lines encompass 23 differenttumor sites and have been profiled for geneexpression, microRNA expression, DNAmethylation, and protein expression. Notethat though the CCLE contains data from947 cell lines, only 397 of these cell lineshad data from all four molecular profiles ofinterest and pharmacological profiling.In addition to the molecular profiles, weobtained pharmacological profiling of 24chemotherapy and target therapy drugs fromthe CCLE (4). For each cell line-drug com-bination, the CCLE incorporated a system-atic framework to measure molecular corre-lates of pharmacological sensitivity in vitroacross eight dosages. We refer to (4) fordetails on this procedure, but given the fit-ted dose-response curves of growth inhibitionfrom these experiments, we took the activity taDRIP area, or AUC, to be the primary response ofinterest in this work. The AUC is defined asthe area between the response curve and 0(i.e., the no response reference level) and isa well-accepted measure of drug sensitivity(18; 4). In this case, the AUC is measuredon an 8-point scale with 0 corresponding toan inactive compound and 8 correspondingto a compound with 100% inhibition at all 8dosages.In Figure 1, we provide a graphical sum-mary of the raw molecular and pharmacolog-ical profiling data sets. A.1. Data preprocessing
Given the raw data described above, thereare a couple areas of initial concern thatwarrant preprocessing. First, the cancer celllines encompass 23 different tumor sites, andcell lines from the same tumor site tend tohave more similar expression profiles thancell lines from different sites. To illustrate,we observe clusters of cell lines by tumor sitewhen performing both hierarchical clusteringand PCA on the RNASeq profile in Figure2. Due to these inherent differences betweentumors, we chose to omit the cell lines fromtumor sites with < taDRIP Figure 1: A graphical overview of the raw CCLE molecular profiling data sets, which areused to predict the drug responses of 24 therapeutic drugs, as measured via thedrug response AUC.Figure 2: We apply (A) PCA and (B) hierarchical clustering (with Ward’s linkage) to thelog-transformed RNASeq data set and color the samples by their tumor site. Forsimplicity, we use color to distinguish between five prominent tumor sites andshow the remaining tumor sites in grey. We also show the proportion of varianceexplained by each principal component in the subplot titles of (A). In both thePC plots and the hierarchical clustering dendrogram, we can see clusters of tumorsites, illustrating the inherent differences between tumor sites. taDRIP Figure 3: Left: Distribution of features in each of the four molecular profiles. Right: Dis-tribution of the drug responses for each of the 24 drugs.
Appendix B. Prediction Models
Like in data preprocessing, human judgmentcalls play a significant role in the modeling taDRIP stage, including the decision of which meth-ods to fit. Ideally, the chosen methods shouldhave some justified connection to the biolog-ical problem at hand, but in our case, it isunclear which models or assumptions bestfit the biological drug response mechanisma priori. Nevertheless, we have reasons tobelieve that the Lasso, elastic net, RF, andkernel ridge regression are particularly ap-pealing fits for this problem.First, the Lasso assumes a sparse linearmodel, meaning that the effect of each fea-ture is additive and only a sparse number ofthe features contribute to the drug sensitiv-ity. The simplicity and interpretability of theLasso makes it a popular tool for bioinfor-matics prediction tasks, so we choose to usethe Lasso as a baseline model for our analy-sis. The elastic net is perhaps even more pop-ular than the Lasso in drug response predic-tion studies (18; 4). Similar to the Lasso, theelastic net assumes linearity and some spar-sity but is also able to better handle corre-lated features. Beyond linearity, kernel ridgeregression with a Gaussian kernel allows formore flexible, but less interpretable, func-tional relationships that are not necessarilylinear. Kernel methods have been appliedin previous case studies with great success(10) and are hence promising candidates forour study as well. Lastly, random forest canbe viewed as a collection of localized, non-linear thresholded decisions rules (like on-offswitches), which are well-suited for many bi-ological processes that match the combinato-rial thresholding (or switch-like) behavior ofdecision trees (27). Random forests are alsoinvariant to the scale of the data. This is es-pecially advantageous for integrating differ-ent data sets with varying scales and domaintypes (e.g., count-valued RNASeq expres-sion, proportion-valued methylation data,continuous-valued protein expression).In addition to fitting the aforementionedmethods on each of the molecular profiles separately, we also tried fitting various dataintegration methods since incorporating mul-tiple sources of -omics data can sometimes re-sult in more accurate predictions than mod-els built using only a single -omics sources(10; 15; 32). The most natural integra-tion idea is to concatenate the -omics datasets together and to fit a single model (e.g.,the elastic net) on the concatenated data.When fitting models like the Lasso, elasticnet, and kernel ridge regression which arenot scale-invariant, the molecular profiles arescaled to have columns with mean 0 and vari-ance 1 to allow for fair comparisons betweenmolecular types. We refer to this methodas the concatenated data approach and usethis as a baseline for evaluating data integra-tion methods. More sophisticated method-ology has also been proposed to integrate -omics data, including recent work using theX-VAE, a variational autoencoder for cancerdata integration (32), and the BMTMKL, aBayesian multitask multiple kernel learningmethod which won the NCI-DREAM 7 chal-lenge (10).Note that though an alternative approachwould have been to develop new methodol-ogy, we instead leverage these existing ma-chine learning methods that have been rig-orously vetted and have been shown to workwell in many related problems. In fact, byexamining the stable properties across theseexisting methods, we obtain high-quality sci-entific findings, as made evident by the abun-dance of supporting literature (see Table 1). B.1. Model hyperparameters
To select hyperparameters in each of thesemethods, we use 5-fold cross validation,where the folds are stratified by tumortype. We also investigate using the estima-tion stability cross validation (ESCV) met-ric for selecting the Lasso’s hyperparameter.This ESCV metric combines a stability mea- taDRIP sure within the cross-validation frameworkto yield more stable estimation propertieswith minimal loss of accuracy when using theLasso (21).For the X-VAE model, we adapt an X-shaped network architecture to train a vari-ation autoencoder that learns joint represen-tation of the RNAseq and protein data. Inparticular, we take the 2,000 RNAseq fea-tures with highest variance, since the num-ber of cell lines is too small compared withthe original number of RNAseq features. Inour experiment, both the encoder and the en-coder have one hidden layer. There are 128neurons corresponding to the RNAseq pro-tein in the hidden layer of the encoder andthe decoder, and 32 neurons correspondingto the protein features. The latent repre-sentation has a dimension of 32. The di-mension of the hidden layers and the latentrepresentation are based on the recommen-dation of (32), and are not tuned. We usedELU activation and employed batch normal-ization and a dropout component with rate0.2, as recommended by (32). The modelswere trained for 500 epochs using an Adamoptimizer with a learning rate of 0.001. B.2. Evaluation metrics
We primarily consider two evaluation met-rics for prediction accuracy as each capturesa different aspect of prediction - 1) R valueand 2) probabilistic concordance-index (PC-index). R is defined as 1 − MSE( Y, ˆ Y )Var( Y ) , whereVar( Y ) denotes the variance of the observedresponses, and MSE( Y, ˆ Y ) denotes the meansum of squared errors between the predictedresponses ˆ Y and observed responses Y . R isa rescaling of the MSE that accounts for theamount of variation in the observed responseand thus allows us to easily compare accura-cies between drug response models with dif-ferent amounts of variation in the observedresponse, but as with the MSE, R can be heavily influenced by outliers. PC-index isa measure of how well the predicted rank-ings agree with the true responses. Thismetric takes into account the variance of thedrug responses but it also assumes that thedrug responses follow a Gaussian distribu-tion, which may not be true in some cases.We consider this metric because it is theprimary method of evaluation in the NCI-DREAM 7 competition (10). Given the largescale and breadth of this challenge, we com-pare our results to this work. For furtherdetails on the PC-index, we refer to (10).In each of the evaluation metrics above,we receive a separate score for each of the24 drug response models. It may also bebeneficial to aggregate the 24 scores into asingle number for concrete evaluation. Inparticular, (10) used a weighted average ofthe PC-indices to compare various modelsand referred to this evaluation metric as theweighted PC-index (WPC-index). To com-pare our results with the benchmark in (10),we also consider the WPC-index in evaluat-ing our models. B.3. Prediction results
In Table 2, for various methods, we sum-marize the validation accuracy across all 24drugs as measured by the average R valueand WPC-index. In Tables 3 and 4, weprovide additional insights into the drug re-sponse prediction accuracies at the individ-ual drug level. In Table 3, we see that thebest model depends on the particular drug,but the kernel ridge regression model worksbest on average. In Table 4, we show the testerrors from the kernel ridge regression fit foreach drug separately. taDRIP Table 2: Validation WPC-index and average R across all 24 drug response models forvarious methods trained on each molecular profile separately and together. Highervalues of R and WPC-index indicate better fits. Validation Set WPC-Index Validation Set R Methyl. miRNA Protein RNASeq Integrated Methyl. miRNA Protein RNASeq IntegratedKernel Ridge 0.600 0.603 0.617
Table 3: For each molecular profile (or the integrated profile) used for training, we countthe number of drugs (out of 24) for which each method performed the best andgave the highest validation R compared to its six other competitors. Methyl. miRNA Protein RNASeq IntegratedKernel Ridge 7 5 16 12 6RF 9 5 2 6 5Elastic Net 1 8 1 1 2Lasso (ESCV) 4 4 3 4 0Lasso 3 2 2 1 5X-VAE – – – – 2BMTMKL – – – – 2
Appendix C. PCS Inference
C.1. Detailed description on thecomputation of stability scores
We next describe in detail how to computethe stability scores in the PCS-driven diseasesignature identification pipeline. Note thatthe following procedure is repeated for eachof the 24 drugs.We randomly draw B = 100 bootstrapsamples D ( b ) , b = 1 , . . . ,
100 from the train-ing data. Then, for each bootstrap sample,we fit (1) an elastic net with tuning param-eter selected by CV (2) a Lasso with tuningparameter selected by ESCV and (3) a ran-dom forest. Each model is fitted using theprotein and RNAseq data separately since the integration approaches did not improvethe prediction accuracy over simply using theRNASeq data only (see Table 2). Next, foreach feature X j from either the protein orRNAseq data set, let ω ( b ) j be defined in thefollowing way: for the Lasso and elastic net, ω ( b ) j = 1 if the coefficient of X j is non-zero,and ω ( b ) j = 0 otherwise; for the random for-est, ω ( b ) j is the MDI feature importance of X j .We then define the stability score sta( X j ) ofeach feature X j as sta( X j ) = B (cid:80) Bb =1 ω ( b ) j and rank the proteins and genes separatelyby the stability scores of the features. taDRIP Table 4: Test error for each drug using theRNASeq-based kernel ridge regres-sion model
Drug R PC-Index
C.2. Discussion on the diseasesignatures identified by the PCSpipeline
In Table 5, we list the proteins and geneswhich we found to be stable and among thetop 10 features for all three methods. Amongthese stable features, we list them in decreas-ing order by the sum of stability score rank-ings. Though we identify fewer stable genes,this is most likely due to two reasons. First,there are 5000 genes in the model, comparedto only 214 proteins, so thresholding at thetop 10 genes is extremely conservative. Sec-ondly, the average correlation between genesis higher than that between proteins, addingto the instability. With regards to the identified protein sig-natures, we can roughly classify them intothree categories. The first category containsthose that are known targets of the corre-sponding target therapy drugs. For example,Erlotinib is a medication used to treat non-small cell lung cancer (NSCLC) and pancre-atic cancer. It is an EGFR inhibitor and isspecifically used for NSCLC patients with tu-mors positive for EGFR exon 19 deletions(del19) or exon 21 (L858R) substitution mu-tations. Correspondingly, EGFR is ranked inthe top ten stable proteins in all three mod-els. Other such examples include the drugLapatinib and its target HER2, PD-0325901and its target MEK, PHA-665752 and its tar-get c-Met, and ZD-6474 and its target c-Kit.The second category contains those thatare not known to be direct targets of the drugbut have been shown in preclinical studies tobe potential therapeutic targets or are associ-ated with drug resistance. For example, (22)identified a potential application of the drugIrinotecan as an MdmX inhibitor for tar-geted therapies, and in our pipeline, MdmXhad the highest stability score for all threemodels. As another instance, we identifiedMEK1 as a top protein signature, ranked bystability score, for the drug PLX4720 while(13) showed that MEK1 mutations confer re-sistance to PLX4720.The third category are proteins that do notbelong to the two categories above. Still,these biomarkers are predictive of the drugresponse under various model and data per-turbations. Given the evidence in the sci-entific literature that supports many of ouridentified features, the proteins in this cate-gory may be potential candidates for futurepreclinical investigation.Among the list of overlapping stable fea-tures in Table 5, we list in Table 1 theone with the highest stability score rank-ing along with recent biomedical publica-tions, supporting the association between the taDRIP Table 5: Stable protein and RNAseq signatures. A feature is included if it is among the top10 most stable features under 3 different machine learning models (i.e., elastic net,Lasso (ESCV), and random forests). The stability of the features are computedfrom the PCS inference framework in staDRIP. Blank cells indicate that no featuresappeared among the top 10 most stable features for all three models.
Drug name Protein signature RNAseq Signature protein and the drug. The procedure ofthis literature search is as follows: we firstsearched for papers where the protein andthe drug co-occurs. Then for each paper,we read the introduction section to under-stand their main conclusions. Each of the 18papers listed in Table 1 includes sentencessuch as “our findings suggest that the over-expression of this protein will increase drugsensitivity/resistance” or “this protein is apotential (or known) therapeutic target forthe drug”. Out of the 24 predictive proteinsignatures that we identify as most stable,18 of them have existing preclinical studiesthat confirm the effectiveness of our stabilityanalysis. In Table 6, we list the protein with thehighest stability score when fitting an elasticnet to 100 bootstrap samples of the train-ing data for each of the 24 drugs. This ap-proach of finding predictive -omics featureswas previously used in (4; 18). Comparedwith staDRIP, which searches for stable fea-tures across different models, this approachonly uses a single model (i.e, an elastic net)for feature selection. We repeat the same lit-erature search procedure as we did for ourfindings and found that among the 24 moststable protein features identified by elasticnet, only 14 are known from previous clinicalstudies. For 10 drugs, the most stable pro-tein from elastic net and that from staDRIPis the same, and among these 10 proteins, 9 taDRIP Table 6: Most stable protein associated with each drug, as identified by the elastic net,along with preclinical evidence that supports the association between the listedprotein and drug sensitivity.
Drug Protein Supporting Literature Drug Protein Supporting Literature are implicated in the existing literature. Forthe other 14 drugs, 5 proteins identified byelastic net are implicated in the existing lit-erature, while 9 protein features identified bystaDRIP are implicated in the existing liter-ature.are implicated in the existing literature. Forthe other 14 drugs, 5 proteins identified byelastic net are implicated in the existing lit-erature, while 9 protein features identified bystaDRIP are implicated in the existing liter-ature.