[PDF] A stability-driven protocol for drug response interpretable prediction (staDRIP)

Abstract

Modern cancer -omics and pharmacological data hold great promise in precision cancer medicine for developing individualized patient treatments. However, high heterogeneity and noise in such data pose challenges for predicting the response of cancer cell lines to therapeutic drugs accurately. As a result, arbitrary human judgment calls are rampant throughout the predictive modeling pipeline. In this work, we develop a transparent stability-driven pipeline for drug response interpretable predictions, or staDRIP, which builds upon the PCS framework for veridical data science (Yu and Kumbier, 2020) and mitigates the impact of human judgment calls. Here we use the PCS framework for the first time in cancer research to extract proteins and genes that are important in predicting the drug responses and stable across appropriate data and model perturbations. Out of the 24 most stable proteins we identified using data from the Cancer Cell Line Encyclopedia (CCLE), 18 have been associated with the drug response or identified as a known or possible drug target in previous literature, demonstrating the utility of our stability-driven pipeline for knowledge discovery in cancer drug response prediction modeling.

Full PDF

MML4H Extended Abstract Arxiv Index:1–17, 2020 Machine Learning for Health (ML4H) 2020

A stability-driven protocol for drug response interpretableprediction (staDRIP)

Xiao Li ∗ [email protected] University of California, Berkeley

Tiﬀany M. Tang [email protected]

University of California, Berkeley

Xuewei Wang

[email protected]

Mayo Clinic

Jean-Pierre A. Kocher

[email protected]

Mayo Clinic

Bin Yu [email protected]

University of California, Berkeley

Abstract

Modern cancer -omics and pharmaco-logical data hold great promise in preci-sion cancer medicine for developing in-dividualized patient treatments. How-ever, high heterogeneity and noise insuch data pose challenges for predictingthe response of cancer cell lines to thera-peutic drugs accurately. As a result, ar-bitrary human judgment calls are ram-pant throughout the predictive model-ing pipeline. In this work, we develop atransparent stability-driven pipeline fordrug response interpretable predictions,or staDRIP, which builds upon the PCSframework for veridical data science (Yuand Kumbier, 2020) and mitigates theimpact of human judgment calls. Herewe use the PCS framework for the ﬁrsttime in cancer research to extract pro-teins and genes that are important inpredicting the drug responses and sta-ble across appropriate data and modelperturbations. Out of the 24 most sta-ble proteins we identiﬁed using datafrom the Cancer Cell Line Encyclope-dia (CCLE), 18 have been associatedwith the drug response or identiﬁed as a ∗ The ﬁrst two authors contributed equally to thispaper. known or possible drug target in previ-ous literature, demonstrating the util-ity of our stability-driven pipeline forknowledge discovery in cancer drug re-sponse prediction modeling.

1. Introduction

A critical goal in precision medicine oncol-ogy revolves around predicting a patient’sresponse to therapeutic drugs given the pa-tient’s unique molecular proﬁle (30; 19). Ac-curate personalized drug response predic-tions can immediately shed light on therapiesthat are likely to be ineﬀective or toxic andaid clinicians in deciding the most promis-ing treatment for their patients (2). More-over, interpreting these drug response pre-diction models can help to improve recom-mendations of compounds and target genesto prioritize in future preclinical research (8).While several community-wide, public ef-forts (4; 10) and many other works havemade progress towards improving the predic-tive accuracy of drug response predictions,identifying the important disease signatures(i.e., proteins, genes, and other biomarkers)that drive the drug response prediction mod- © a r X i v : . [ q - b i o . Q M ] N ov taDRIP els has received less attention. To date, pre-vious works have typically focused on fea-ture selection within one speciﬁc model suchas elastic nets (18; 4) and random forest(29). However, because molecular proﬁlingdata is often heterogeneous, noisy, and high-dimensional, these results are highly sensi-tive to modeling decisions made by humansincluding the type of model, the amount oftraining data, and the choice of algorithm.In this work, we focus on this goal ofdetecting stable, interpretable, and predic-tive -omic signatures that drive a cell line’sdrug response. To overcome the aforemen-tioned challenges, we develop a transparentstability-driven pipeline for drug response in-terpretable prediction called staDRIP thatis rooted in the PCS framework for veridi-cal data science (40). At its core, the PCSframework builds its foundation on threeprinciples: predictability as a reality check, computability as an important considerationin algorithmic design and data collection,and stability as an overarching principle andminimal requirement for scientiﬁc knowledgeextraction. These principles were motivatedby extensive interdisciplinary research suchas (36), which analyzed the gap-gene net-work of Drosophila , and (5), which discov-ered stable transcription factor interactionsin

Drosophila embryos. Since its concep-tion, the PCS framework has further demon-strated a strong track record of driving manyscientiﬁc discoveries including novel gene-gene interactions for the red-hair phenotype(6) and clinically-interpretable subgroups ina randomized drug trial (12).Here, using integrative -omics and drugresponse data from the Cancer Cell LineEncyclopedia (CCLE) (4), we employ the

1. The CCLE is one of the most comprehensive pub-lic databases for developing detailed genetic andpharmacologic characterizations of human cancercell lines. After preprocessing (see Appendix A),we arrive at a panel of 370 human cancer cell lines

PCS framework to develop staDRIP and pro-vide extensive documentation of our model-ing choices to arrive at stable biological dis-coveries of proteins and genes that are pre-dictive of cancer drug responses. Unlike pre-vious works whose results depend heavily onhuman decisions, staDRIP ﬁnds predictive-omic features that are stable across variousmodels and data perturbations, thus mitigat-ing the impact of human judgment calls. Wefurther show that 18 of the top 24 -omic fea-tures identiﬁed by staDRIP have been pre-viously implicated in the scientiﬁc literature,and in doing so, hint at novel candidates forfuture preclinical research.

2. Results

Building on the PCS framework, staDRIPﬁrst uses predictive accuracy as a realitycheck to ﬁlter out models that are poorﬁts for the observed data before turningto our primary goal of identifying impor-tant biomarkers for drug response predic-tion. Speciﬁcally, we deﬁne drug responseas the area under the ﬁtted dose-responsecurve of growth inhibition (see Appendix Afor details). For each of the available 24anticancer drugs, we divide the data into a50-25-25% training-validation-test split anduse the training data to ﬁt (1) an elasticnet tuned with cross-validation (CV), whichhas been widely used and advocated by pre-vious studies (4; 18), (2) Lasso tuned withCV, (3) Lasso tuned with ESCV, an alterna-tive CV metric that incorporates stability toyield more stable estimation properties withminimal loss of accuracy (21), (4) Gaussian that have both high-throughput molecular proﬁl-ing of RNASeq gene expression (5000 genes), mi-croRNA expression (734 miRNAs), DNA methy-lation (4000 transcription start sites), and proteinexpression (214 proteins) as well as pharmacolog-ical data for 24 anticancer drugs. taDRIP kernel ridge regression, and (5) random for-est to predict the drug response given themiRNA, RNASeq, methylation, and proteinexpression proﬁles separately. We also ﬁtseveral data integration methods includingconcatenated versions of the aforementionedmethods, the recently proposed X-shapedVariational Autoencoder (X-VAE) (32), andthe winner of the DREAM 7 challenge, theBayesian multi-task multiple kernel learningmethod (BMTMKL) (10).For each of these ﬁts, we report in Table 2in Appendix B.3 the average validation ac-curacy across all 24 drugs as measured bythe R value and the WPC-index, a weightedprobabilistic concordance index, which hasbeen used in previous studies and measureshow well the predicted rankings agree withthe true responses (10). From Table 2, wesee that kernel ridge regression trained onlyon the RNASeq data yields the best predic-tive performance. However, considering thatour primary goal is not purely prediction, thediﬀerences between model prediction accu-racies shown in Table 2 are relatively smallfrom a practical viewpoint. In our inferentialprocedure discussed next, we will see thatleveraging the stability across these methodswith similar predictive accuracies is key toour staDRIP pipeline for identifying genesand proteins that are stable predictive fea-tures underlying the drug response models.Nonetheless, for completeness, we reportthe test accuracy from the best model, theRNASeq-based kernel ridge regression, tohave an R ( ± ± ±

2. The DREAM 7 challenge was a public competi-tion where teams were tasked to integrate multi-ple –omics measurements and predict drug sensi-tivity in cancer cell lines.

Beyond predictability, the PCS frameworkemphasizes stability throughout the data sci-ence life cycle so as to reduce reliance on par-ticular human judgment calls. Accordingly,we leverage and quantify the stability of im-portant features under numerous data andmodel perturbations in staDRIP as follows:for each of the 24 drugs separately,1.

Use predictability as reality check: select a set M of models with high pre-dictive accuracy across a variety of met-rics on the validation data.2. Compute stability of predictivefeatures across data perturbations: for each model M ∈ M , reﬁt the model M to B bootstrap replicates of the data,and compute the stability score of eachfeature as the proportion of B boot-strap samples where the feature is se-lected (details in Appendix C.1). Let F M denote the subset of features withhigh stability scores (e.g., top 10).3. Select predictive features that arestable across model perturbations: take the intersection ∩ M ∈M F M as thestable predictive -omic features acrossdata and model perturbations.In our work, we are primarily interested inidentifying proteins and genes that are pre-dictive of drug responses as many drugs aredirectly related to known proteins and genes.Hence, considering the ﬁve models trained onthe RNAseq and protein data separately, wetake M = { RF , Lasso (ESCV) , Elastic Net } .Note that while kernel ridge has the highestaccuracy, it is omitted from M since thereis no straightforward, computationally eﬃ-cient method to select features from kernelridge to the best of our knowledge. We alsoomit the Lasso from M as it generally hasthe worst predictive accuracy. For each re-maining model in M , we then take F M to taDRIP Table 1: Most stable protein associated with each drug, as identiﬁed by staDRIP, along withliterature that supports the association between the protein and drug sensitivity.

Drug Protein Supporting Literature Drug Protein Supporting Literature be the 10 features with the highest stabilityscores and list those genes and proteins in thetop 10 most stable features across all threemodels in Table 5 in the Appendix.In Table 1, we provide our main evidencefor the utility of staDRIP, listing the singlemost stable protein for each drug along withindependent publications that support theseﬁndings. Speciﬁcally, of the 24 proteins iden-tiﬁed as most stable by staDRIP, 18 havebeen associated with the drug sensitivity oridentiﬁed as a known or possible drug tar-get in prior preclinical studies. See AppendixC.2 for details of this literature evidence.Now in contrast to staDRIP, which ﬁndsstable predictive features across models withsimilar predictive accuracies, previous state-of-the-art methods (4; 18) use only an elasticnet to identify predictive -omics features ofdrug responses. To compare staDRIP to thiselastic net approach, we extract the proteinswith the highest stability score for each drugwhen taking M = { Elastic Net } . Repeatingthe same literature search procedure as wedid for the proteins identiﬁed by staDRIP,we found only 14 of the 24 proteins identi-ﬁed by the elastic net are known from pre-vious clinical studies (see Table 6 in the Ap-pendix). Detailed comparisons of the results of our method, staDRIP, and that of the elas-tic net can be found in Appendix C.2.

3. Conclusion

Rooted by the PCS framework, we empha-size the importance of predictability, (com-putability), and stability as minimum re-quirements for extracting scientiﬁc knowl-edge throughout the staDRIP pipeline. Weshow that, guided by good prediction per-formance, incorporating a number of stabil-ity checks and extracting the stable partsof top-performing models can help to avoidthe poor generalization exhibited by existingmethods and can successfully identify candi-date therapeutic targets for future preclini-cal research. We also acknowledge that whilemany stability considerations are built intostaDRIP, there are inevitably human judg-ment calls that still impact our analysis. Forexample, we make a number of judgementcalls in the data preprocessing stage, whichwe detail in Appendix A. Additionally, manyother reasonable models such as ridge regres-sion and gradient boosting could be consid-ered in the staDRIP pipeline. We thus pro-vide transparent and extensive documenta-tion here to justify these decisions using do-main knowledge when possible. taDRIP

4. Acknowledgments

The authors would like to thank KarlKumbier for his helpful comments. TT ac-knowledges support from the NSF Grad-uate Research Fellowship Program DGE-1752814. BY acknowledges the support fromARO Army Research Oﬃce grant W911NF-17-1-0005, Oﬃce of Naval Research GrantN00014-17-1-2176, the Center for Scienceof Information (CSoI), a US NSF Scienceand Technology Center, under grant agree-ment CCF-0939370, UCSF fund N7251, Na-tional Science Foundation grants NSF-DMS-1613002, 1953191, and IIS 1741340. BY is aChan Zuckerberg Biohub investigator. Thiswork was in part supported by the MayoClinic Center for Individualized Medicine.

References [1] Zohar Attias-Geva, Itay Bentov, AmiFishman, Haim Werner, and IlanBruchim. Insulin-like growth factor-i re-ceptor inhibition by speciﬁc tyrosine ki-nase inhibitor nvp-aew541 in endometri-oid and serous papillary endometrialcancer cell lines.

Gynecologic oncology ,121(2):383–389, 2011.[2] Francisco Azuaje. Computational mod-els for predicting drug responses in can-cer research.

Brieﬁngs in bioinformat-ics , 18(5):820–829, 2016.[3] Kathryn Balmanno, Simon D Chell, An-nette S Gillings, Shaista Hayat, and Si-mon J Cook. Intrinsic resistance tothe mek1/2 inhibitor azd6244 (arry-142886) is associated with weak erk1/2signalling and/or strong pi3k signallingin colorectal cancer cell lines.

Interna-tional journal of cancer , 125(10):2332–2341, 2009.[4] Jordi Barretina, Giordano Caponigro,Nicolas Stransky, Kavitha Venkatesan, Adam A Margolin, Sungjoon Kim,Christopher J Wilson, Joseph Leh´ar,Gregory V Kryukov, Dmitriy Sonkin,et al. The cancer cell line encyclope-dia enables predictive modelling of an-ticancer drug sensitivity.

Nature , 483(7391):603, 2012.[5] Sumanta Basu, Karl Kumbier, James BBrown, and Bin Yu. Iterative randomforests to discover predictive and stablehigh-order interactions.

Proceedings ofthe National Academy of Sciences , 115(8):1943–1948, 2018.[6] Merle Behr, Karl Kumbier, AldoCordova-Palomera, Matthew Aguirre,Euan Ashley, Rima Arnaout, James BBrown, James Priest, and Bin Yu.Learning epistatic polygenic phenotypeswith boolean interactions. 2020.[7] D Ross Camidge, Sai-Hong Ignatius Ou,Geoﬀrey Shapiro, Gregory Alan Otter-son, Liza Cosca Villaruz, Miguel An-gel Villalona-Calero, A John Iafrate,Marileila Varella-Garcia, Sanja Dacic,Stephanie Cardarella, et al. Eﬃcacyand safety of crizotinib in patients withadvanced c-met-ampliﬁed non-small celllung cancer (nsclc)., 2014.[8] Giordano Caponigro and William RSellers. Advances in the preclinical test-ing of cancer therapeutic hypotheses.

Nature reviews Drug discovery , 10(3):179, 2011.[9] Liang Chen and Jingxuan Pan. Dualcyclin-dependent kinase 4/6 inhibitionby pd-0332991 induces apoptosis andsenescence in oesophageal squamous cellcarcinoma cells.

British journal of phar-macology , 174(15):2427–2443, 2017.[10] James C Costello, Laura M Heiser,Elisabeth Georgii, Mehmet G¨onen, taDRIP Michael P Menden, Nicholas J Wang,Mukesh Bansal, Petteri Hintsanen,Suleiman A Khan, John-PatrickMpindi, et al. A community eﬀortto assess and improve drug sensi-tivity prediction algorithms.

Naturebiotechnology , 32(12):1202, 2014.[11] E Drakos, Rajesh R Singh, GZ Rassi-dakis, E Schlette, J Li, Francois-XavierClaret, RJ Ford, Francisco Vega, andL Jeﬀrey Medeiros. Activation of thep53 pathway by the mdm2 inhibitornutlin-3a overcomes bcl2 overexpressionin a preclinical model of diﬀuse largeb-cell lymphoma associated with t (14;18)(q32; q21).

Leukemia , 25(5):856–867,2011.[12] Raaz Dwivedi, Yan Shuo Tan, BritonPark, Mian Wei, Kevin Horgan, DavidMadigan, and Bin Yu. Stable discoveryof interpretable subgroups via calibra-tion in causal studies. arXiv preprintarXiv:2008.10109 , 2020.[13] Caroline M Emery, Krishna G Vijayen-dran, Marie C Zipser, Allison M Sawyer,Lili Niu, Jessica J Kim, Charles Hatton,Rajiv Chopra, Patrick A Oberholzer,Maria B Karpova, et al. Mek1 muta-tions confer resistance to mek and b-raf inhibition.

Proceedings of the Na-tional Academy of Sciences , 106(48):20411–20416, 2009.[14] Francisco J Esteva, Dihua Yu, Mien-Chie Hung, and Gabriel N Horto-bagyi. Molecular predictors of responseto trastuzumab and lapatinib in breastcancer.

Nature reviews Clinical oncol-ogy , 7(2):98, 2010.[15] Bet¨ul G¨uven¸c Paltun, Hiroshi Mamit-suka, and Samuel Kaski. Improvingdrug response prediction by integrating multiple data sources: matrix factor-ization, kernel and network-based ap-proaches.

Brieﬁngs in Bioinformatics ,2019.[16] Kan He, Xingnan Zheng, Lin Zhang,and Jian Yu. Hsp90 inhibitors promotep53-dependent apoptosis through pumaand bax.

Molecular cancer therapeutics ,12(11):2559–2568, 2013.[17] Ying C Henderson, Yunyun Chen,Mitchell J Frederick, Stephen Y Lai,and Gary L Clayman. Mek in-hibitor pd0325901 signiﬁcantly reducesthe growth of papillary thyroid carci-noma cells in vitro and in vivo.

Molecu-lar cancer therapeutics , 9(7):1968–1976,2010.[18] In Sock Jang, Elias Chaibub Neto,Justin Guinney, Stephen H Friend, andAdam A Margolin. Systematic assess-ment of analytical methods for drug sen-sitivity prediction from cancer cell linedata. In

Biocomputing 2014 , pages 63–74. World Scientiﬁc, 2014.[19] Isaac S Kohane. Ten things we have todo to achieve precision medicine.

Sci-ence , 349(6243):37–38, 2015.[20] Xiao-Feng Le and Robert C Bast, Jr.Src family kinases and paclitaxel sensi-tivity.

Cancer biology & therapy , 12(4):260–269, 2011.[21] Chinghway Lim and Bin Yu. Estimationstability with cross-validation (escv).

Journal of Computational and Graphi-cal Statistics , 25(2):464–492, 2016.[22] Xiang Ling, Chao Xu, Chuandong Fan,Kai Zhong, Fengzhi Li, and XinjiangWang. Fl118 induces p53-dependentsenescence in colorectal cancer cells bypromoting degradation of mdmx.

Can-cer research , 74(24):7487–7497, 2014. taDRIP [23] Yuqing Liu, Zhuo Wang, Shu QinKwong, Eric Lik Hang Lui, Scott LFriedman, Fu Rong Li, Reni Wing ChiLam, Guo Chao Zhang, Hui Zhang, andTao Ye. Inhibition of pdgf, tgf- β , andabl signaling and reduction of liver ﬁ-brosis by the small molecule bcr-abl ty-rosine kinase antagonist nilotinib. Jour-nal of hepatology , 55(3):612–625, 2011.[24] Ultan McDermott, Sreenath V Sharma,Lori Dowell, Patricia Greninger, ClaraMontagut, Jennifer Lamb, HeidiArchibald, Raul Raudales, AngelaTam, Diana Lee, et al. Identiﬁcationof genotype-correlated sensitivity toselective kinase inhibitors by using high-throughput tumor cell line proﬁling.

Proceedings of the National Academy ofSciences , 104(50):19936–19941, 2007.[25] Pierre Mordant, Yohann Loriot, C´elineLeteur, Julien Calderaro, Jean Bourhis,Marie Wislez, Jean-Charles Soria, andEric Deutsch. Dependence on phos-phoinositide 3-kinase and ras-raf path-ways drive the activity of raf265, anovel raf/vegfr2 inhibitor, and rad001(everolimus) in combination.

Molecularcancer therapeutics , 9(2):358–368, 2010.[26] Atsuhiko T Naito, Sho Okada, TohruMinamino, Koji Iwanaga, Mei-Lan Liu,Tomokazu Sumida, Seitaro Nomura,Naruhiko Sahara, Tatsuya Mizoroki,Akihiko Takashima, et al. Promotion ofchip-mediated p53 degradation protectsthe heart from ischemic injury.

Circula-tion research , 106(11):1692, 2010.[27] David L Nelson, Albert L Lehninger,and Michael M Cox.

Lehninger princi-ples of biochemistry . Macmillan, 2008.[28] C Nishioka, T Ikezoe, A Takeshita,J Yang, T Tasaka, Y Yang, Y Kuwayama, N Komatsu, K Tog-itani, HP Koeﬄer, et al. Zd6474induces growth arrest and apoptosisof human leukemia cells, which isenhanced by concomitant use of a novelmek inhibitor, azd6244.

Leukemia , 21(6):1308–1310, 2007.[29] Gregory Riddick, Hua Song, Susie Ahn,Jennifer Walling, Diego Borges-Rivera,Wei Zhang, and Howard A Fine. Pre-dicting in vitro drug sensitivity usingrandom forests.

Bioinformatics , 27(2):220–224, 2011.[30] Mark A Rubin. Health: Make precisionmedicine work for cancer care.

NatureNews , 520(7547):290, 2015.[31] Jinjin Shao, Zhifei Xu, Xueming Peng,Min Chen, Yuanrun Zhu, Li Xu, HongZhu, Bo Yang, Peihua Luo, and QiaojunHe. Geﬁtinib synergizes with irinote-can to suppress hepatocellular carci-noma via antagonizing rad51-mediateddna-repair.

PLoS One , 11(1):e0146968,2016.[32] Nikola Simidjievski, Cristian Bodnar,Ifrah Tariq, Paul Scherer, HelenaAndres-Terre, Zohreh Shams, MatejaJamnik, et al. Variational autoen-coders for cancer data integration: De-sign principles and computational prac-tice.

BioRxiv , page 719542, 2019.[33] John H Strickler, Alexander N Staro-dub, Jingquan Jia, Kellen L Meadows,Andrew B Nixon, Andrew Dellinger,Michael A Morse, Hope E Uronis,P Kelly Marcom, S Yousuf Zafar,et al. Phase i study of bevacizumab,everolimus, and panobinostat (lbh-589)in advanced solid tumors.

Cancerchemotherapy and pharmacology , 70(2):251–258, 2012. taDRIP [34] Anna Tutusaus, Milica Stefanovic,Loreto Boix, Blanca Cucarull, AynaraZamora, Laura Blasco, Pablo Garc´ıade Frutos, Maria Reig, Jose CFernandez-Checa, Montserrat Mar´ı,et al. Antiapoptotic bcl-2 proteinsdetermine sorafenib/regorafenib re-sistance and bh3-mimetic eﬃcacy inhepatocellular carcinoma. Oncotarget ,9(24):16701, 2018.[35] Wolfgang Warsch, Karoline Kollmann,Eva Eckelhart, Sabine Fajmann, SabineCerny-Reiterer, Andrea H¨olbl, Karo-line V Gleixner, Michael Dworzak,Matthias Mayerhofer, Gregor Hoer-mann, et al. High stat5 levels mediateimatinib resistance and indicate diseaseprogression in chronic myeloid leukemia.

Blood, The Journal of the American So-ciety of Hematology , 117(12):3409–3420,2011.[36] Siqi Wu, Antony Joseph, Ann S Ham-monds, Susan E Celniker, Bin Yu, andErwin Frise. Stability-driven nonneg-ative matrix factorization to interpretspatial gene expression and build lo-cal gene networks.

Proceedings of theNational Academy of Sciences , 113(16):4290–4295, 2016.[37] Yi-Ju Wu, Yee-Jee Jan, Bor-Sheng Ko,Shu-Man Liang, and Jun-Yang Liou. In-volvement of 14-3-3 proteins in regu-lating tumor progression of hepatocellu-lar carcinoma.

Cancers , 7(2):1022–1036,2015.[38] Joy C Yang, Lanfang Bai, Stanley Yap,Allen C Gao, Hsing-Jien Kung, andChristopher P Evans. Eﬀect of the spe-ciﬁc src family kinase inhibitor saraca-tinib on osteolytic lesions using the pc-3bone model.

Molecular cancer therapeu-tics , 9(6):1629–1637, 2010. [39] Yang Yang, Takayuki Ikezoe, ChieNishioka, Takahiro Taguchi, Wei-guoZhu, H Phillip Koeﬄer, and HirokuniTaguchi. Zd6474 induces growth arrestand apoptosis of gist-t1 cells, which isenhanced by concomitant use of suni-tinib.

Cancer science , 97(12):1404–1409, 2006.[40] Bin Yu and Karl Kumbier. Veridi-cal data science.

Proceedings of theNational Academy of Sciences , 117(8):3920–3929, 2020.

Appendix A. Data

To begin building the personalized drugresponse models, we leverage data froma panel of 397 human cancer cell linesthat have both high-throughput molecularproﬁling and pharmacological data for 24anticancer drugs from the Cancer CellLine Encyclopedia (CCLE) project (4).Speciﬁcally, -omics data from the CCLEwas downloaded from DepMap Public 18Q3(https://depmap.org/portal/download/).These cell lines encompass 23 diﬀerenttumor sites and have been proﬁled for geneexpression, microRNA expression, DNAmethylation, and protein expression. Notethat though the CCLE contains data from947 cell lines, only 397 of these cell lineshad data from all four molecular proﬁles ofinterest and pharmacological proﬁling.In addition to the molecular proﬁles, weobtained pharmacological proﬁling of 24chemotherapy and target therapy drugs fromthe CCLE (4). For each cell line-drug com-bination, the CCLE incorporated a system-atic framework to measure molecular corre-lates of pharmacological sensitivity in vitroacross eight dosages. We refer to (4) fordetails on this procedure, but given the ﬁt-ted dose-response curves of growth inhibitionfrom these experiments, we took the activity taDRIP area, or AUC, to be the primary response ofinterest in this work. The AUC is deﬁned asthe area between the response curve and 0(i.e., the no response reference level) and isa well-accepted measure of drug sensitivity(18; 4). In this case, the AUC is measuredon an 8-point scale with 0 corresponding toan inactive compound and 8 correspondingto a compound with 100% inhibition at all 8dosages.In Figure 1, we provide a graphical sum-mary of the raw molecular and pharmacolog-ical proﬁling data sets. A.1. Data preprocessing

Given the raw data described above, thereare a couple areas of initial concern thatwarrant preprocessing. First, the cancer celllines encompass 23 diﬀerent tumor sites, andcell lines from the same tumor site tend tohave more similar expression proﬁles thancell lines from diﬀerent sites. To illustrate,we observe clusters of cell lines by tumor sitewhen performing both hierarchical clusteringand PCA on the RNASeq proﬁle in Figure2. Due to these inherent diﬀerences betweentumors, we chose to omit the cell lines fromtumor sites with < taDRIP Figure 1: A graphical overview of the raw CCLE molecular proﬁling data sets, which areused to predict the drug responses of 24 therapeutic drugs, as measured via thedrug response AUC.Figure 2: We apply (A) PCA and (B) hierarchical clustering (with Ward’s linkage) to thelog-transformed RNASeq data set and color the samples by their tumor site. Forsimplicity, we use color to distinguish between ﬁve prominent tumor sites andshow the remaining tumor sites in grey. We also show the proportion of varianceexplained by each principal component in the subplot titles of (A). In both thePC plots and the hierarchical clustering dendrogram, we can see clusters of tumorsites, illustrating the inherent diﬀerences between tumor sites. taDRIP Figure 3: Left: Distribution of features in each of the four molecular proﬁles. Right: Dis-tribution of the drug responses for each of the 24 drugs.

Appendix B. Prediction Models

Like in data preprocessing, human judgmentcalls play a signiﬁcant role in the modeling taDRIP stage, including the decision of which meth-ods to ﬁt. Ideally, the chosen methods shouldhave some justiﬁed connection to the biolog-ical problem at hand, but in our case, it isunclear which models or assumptions bestﬁt the biological drug response mechanisma priori. Nevertheless, we have reasons tobelieve that the Lasso, elastic net, RF, andkernel ridge regression are particularly ap-pealing ﬁts for this problem.First, the Lasso assumes a sparse linearmodel, meaning that the eﬀect of each fea-ture is additive and only a sparse number ofthe features contribute to the drug sensitiv-ity. The simplicity and interpretability of theLasso makes it a popular tool for bioinfor-matics prediction tasks, so we choose to usethe Lasso as a baseline model for our analy-sis. The elastic net is perhaps even more pop-ular than the Lasso in drug response predic-tion studies (18; 4). Similar to the Lasso, theelastic net assumes linearity and some spar-sity but is also able to better handle corre-lated features. Beyond linearity, kernel ridgeregression with a Gaussian kernel allows formore ﬂexible, but less interpretable, func-tional relationships that are not necessarilylinear. Kernel methods have been appliedin previous case studies with great success(10) and are hence promising candidates forour study as well. Lastly, random forest canbe viewed as a collection of localized, non-linear thresholded decisions rules (like on-oﬀswitches), which are well-suited for many bi-ological processes that match the combinato-rial thresholding (or switch-like) behavior ofdecision trees (27). Random forests are alsoinvariant to the scale of the data. This is es-pecially advantageous for integrating diﬀer-ent data sets with varying scales and domaintypes (e.g., count-valued RNASeq expres-sion, proportion-valued methylation data,continuous-valued protein expression).In addition to ﬁtting the aforementionedmethods on each of the molecular proﬁles separately, we also tried ﬁtting various dataintegration methods since incorporating mul-tiple sources of -omics data can sometimes re-sult in more accurate predictions than mod-els built using only a single -omics sources(10; 15; 32). The most natural integra-tion idea is to concatenate the -omics datasets together and to ﬁt a single model (e.g.,the elastic net) on the concatenated data.When ﬁtting models like the Lasso, elasticnet, and kernel ridge regression which arenot scale-invariant, the molecular proﬁles arescaled to have columns with mean 0 and vari-ance 1 to allow for fair comparisons betweenmolecular types. We refer to this methodas the concatenated data approach and usethis as a baseline for evaluating data integra-tion methods. More sophisticated method-ology has also been proposed to integrate -omics data, including recent work using theX-VAE, a variational autoencoder for cancerdata integration (32), and the BMTMKL, aBayesian multitask multiple kernel learningmethod which won the NCI-DREAM 7 chal-lenge (10).Note that though an alternative approachwould have been to develop new methodol-ogy, we instead leverage these existing ma-chine learning methods that have been rig-orously vetted and have been shown to workwell in many related problems. In fact, byexamining the stable properties across theseexisting methods, we obtain high-quality sci-entiﬁc ﬁndings, as made evident by the abun-dance of supporting literature (see Table 1). B.1. Model hyperparameters

To select hyperparameters in each of thesemethods, we use 5-fold cross validation,where the folds are stratiﬁed by tumortype. We also investigate using the estima-tion stability cross validation (ESCV) met-ric for selecting the Lasso’s hyperparameter.This ESCV metric combines a stability mea- taDRIP sure within the cross-validation frameworkto yield more stable estimation propertieswith minimal loss of accuracy when using theLasso (21).For the X-VAE model, we adapt an X-shaped network architecture to train a vari-ation autoencoder that learns joint represen-tation of the RNAseq and protein data. Inparticular, we take the 2,000 RNAseq fea-tures with highest variance, since the num-ber of cell lines is too small compared withthe original number of RNAseq features. Inour experiment, both the encoder and the en-coder have one hidden layer. There are 128neurons corresponding to the RNAseq pro-tein in the hidden layer of the encoder andthe decoder, and 32 neurons correspondingto the protein features. The latent repre-sentation has a dimension of 32. The di-mension of the hidden layers and the latentrepresentation are based on the recommen-dation of (32), and are not tuned. We usedELU activation and employed batch normal-ization and a dropout component with rate0.2, as recommended by (32). The modelswere trained for 500 epochs using an Adamoptimizer with a learning rate of 0.001. B.2. Evaluation metrics

We primarily consider two evaluation met-rics for prediction accuracy as each capturesa diﬀerent aspect of prediction - 1) R valueand 2) probabilistic concordance-index (PC-index). R is deﬁned as 1 − MSE( Y, ˆ Y )Var( Y ) , whereVar( Y ) denotes the variance of the observedresponses, and MSE( Y, ˆ Y ) denotes the meansum of squared errors between the predictedresponses ˆ Y and observed responses Y . R isa rescaling of the MSE that accounts for theamount of variation in the observed responseand thus allows us to easily compare accura-cies between drug response models with dif-ferent amounts of variation in the observedresponse, but as with the MSE, R can be heavily inﬂuenced by outliers. PC-index isa measure of how well the predicted rank-ings agree with the true responses. Thismetric takes into account the variance of thedrug responses but it also assumes that thedrug responses follow a Gaussian distribu-tion, which may not be true in some cases.We consider this metric because it is theprimary method of evaluation in the NCI-DREAM 7 competition (10). Given the largescale and breadth of this challenge, we com-pare our results to this work. For furtherdetails on the PC-index, we refer to (10).In each of the evaluation metrics above,we receive a separate score for each of the24 drug response models. It may also bebeneﬁcial to aggregate the 24 scores into asingle number for concrete evaluation. Inparticular, (10) used a weighted average ofthe PC-indices to compare various modelsand referred to this evaluation metric as theweighted PC-index (WPC-index). To com-pare our results with the benchmark in (10),we also consider the WPC-index in evaluat-ing our models. B.3. Prediction results

In Table 2, for various methods, we sum-marize the validation accuracy across all 24drugs as measured by the average R valueand WPC-index. In Tables 3 and 4, weprovide additional insights into the drug re-sponse prediction accuracies at the individ-ual drug level. In Table 3, we see that thebest model depends on the particular drug,but the kernel ridge regression model worksbest on average. In Table 4, we show the testerrors from the kernel ridge regression ﬁt foreach drug separately. taDRIP Table 2: Validation WPC-index and average R across all 24 drug response models forvarious methods trained on each molecular proﬁle separately and together. Highervalues of R and WPC-index indicate better ﬁts. Validation Set WPC-Index Validation Set R Methyl. miRNA Protein RNASeq Integrated Methyl. miRNA Protein RNASeq IntegratedKernel Ridge 0.600 0.603 0.617

Table 3: For each molecular proﬁle (or the integrated proﬁle) used for training, we countthe number of drugs (out of 24) for which each method performed the best andgave the highest validation R compared to its six other competitors. Methyl. miRNA Protein RNASeq IntegratedKernel Ridge 7 5 16 12 6RF 9 5 2 6 5Elastic Net 1 8 1 1 2Lasso (ESCV) 4 4 3 4 0Lasso 3 2 2 1 5X-VAE – – – – 2BMTMKL – – – – 2

Appendix C. PCS Inference

C.1. Detailed description on thecomputation of stability scores

We next describe in detail how to computethe stability scores in the PCS-driven diseasesignature identiﬁcation pipeline. Note thatthe following procedure is repeated for eachof the 24 drugs.We randomly draw B = 100 bootstrapsamples D ( b ) , b = 1 , . . . ,

100 from the train-ing data. Then, for each bootstrap sample,we ﬁt (1) an elastic net with tuning param-eter selected by CV (2) a Lasso with tuningparameter selected by ESCV and (3) a ran-dom forest. Each model is ﬁtted using theprotein and RNAseq data separately since the integration approaches did not improvethe prediction accuracy over simply using theRNASeq data only (see Table 2). Next, foreach feature X j from either the protein orRNAseq data set, let ω ( b ) j be deﬁned in thefollowing way: for the Lasso and elastic net, ω ( b ) j = 1 if the coeﬃcient of X j is non-zero,and ω ( b ) j = 0 otherwise; for the random for-est, ω ( b ) j is the MDI feature importance of X j .We then deﬁne the stability score sta( X j ) ofeach feature X j as sta( X j ) = B (cid:80) Bb =1 ω ( b ) j and rank the proteins and genes separatelyby the stability scores of the features. taDRIP Table 4: Test error for each drug using theRNASeq-based kernel ridge regres-sion model

Drug R PC-Index

C.2. Discussion on the diseasesignatures identiﬁed by the PCSpipeline

In Table 5, we list the proteins and geneswhich we found to be stable and among thetop 10 features for all three methods. Amongthese stable features, we list them in decreas-ing order by the sum of stability score rank-ings. Though we identify fewer stable genes,this is most likely due to two reasons. First,there are 5000 genes in the model, comparedto only 214 proteins, so thresholding at thetop 10 genes is extremely conservative. Sec-ondly, the average correlation between genesis higher than that between proteins, addingto the instability. With regards to the identiﬁed protein sig-natures, we can roughly classify them intothree categories. The ﬁrst category containsthose that are known targets of the corre-sponding target therapy drugs. For example,Erlotinib is a medication used to treat non-small cell lung cancer (NSCLC) and pancre-atic cancer. It is an EGFR inhibitor and isspeciﬁcally used for NSCLC patients with tu-mors positive for EGFR exon 19 deletions(del19) or exon 21 (L858R) substitution mu-tations. Correspondingly, EGFR is ranked inthe top ten stable proteins in all three mod-els. Other such examples include the drugLapatinib and its target HER2, PD-0325901and its target MEK, PHA-665752 and its tar-get c-Met, and ZD-6474 and its target c-Kit.The second category contains those thatare not known to be direct targets of the drugbut have been shown in preclinical studies tobe potential therapeutic targets or are associ-ated with drug resistance. For example, (22)identiﬁed a potential application of the drugIrinotecan as an MdmX inhibitor for tar-geted therapies, and in our pipeline, MdmXhad the highest stability score for all threemodels. As another instance, we identiﬁedMEK1 as a top protein signature, ranked bystability score, for the drug PLX4720 while(13) showed that MEK1 mutations confer re-sistance to PLX4720.The third category are proteins that do notbelong to the two categories above. Still,these biomarkers are predictive of the drugresponse under various model and data per-turbations. Given the evidence in the sci-entiﬁc literature that supports many of ouridentiﬁed features, the proteins in this cate-gory may be potential candidates for futurepreclinical investigation.Among the list of overlapping stable fea-tures in Table 5, we list in Table 1 theone with the highest stability score rank-ing along with recent biomedical publica-tions, supporting the association between the taDRIP Table 5: Stable protein and RNAseq signatures. A feature is included if it is among the top10 most stable features under 3 diﬀerent machine learning models (i.e., elastic net,Lasso (ESCV), and random forests). The stability of the features are computedfrom the PCS inference framework in staDRIP. Blank cells indicate that no featuresappeared among the top 10 most stable features for all three models.

Drug name Protein signature RNAseq Signature protein and the drug. The procedure ofthis literature search is as follows: we ﬁrstsearched for papers where the protein andthe drug co-occurs. Then for each paper,we read the introduction section to under-stand their main conclusions. Each of the 18papers listed in Table 1 includes sentencessuch as “our ﬁndings suggest that the over-expression of this protein will increase drugsensitivity/resistance” or “this protein is apotential (or known) therapeutic target forthe drug”. Out of the 24 predictive proteinsignatures that we identify as most stable,18 of them have existing preclinical studiesthat conﬁrm the eﬀectiveness of our stabilityanalysis. In Table 6, we list the protein with thehighest stability score when ﬁtting an elasticnet to 100 bootstrap samples of the train-ing data for each of the 24 drugs. This ap-proach of ﬁnding predictive -omics featureswas previously used in (4; 18). Comparedwith staDRIP, which searches for stable fea-tures across diﬀerent models, this approachonly uses a single model (i.e, an elastic net)for feature selection. We repeat the same lit-erature search procedure as we did for ourﬁndings and found that among the 24 moststable protein features identiﬁed by elasticnet, only 14 are known from previous clinicalstudies. For 10 drugs, the most stable pro-tein from elastic net and that from staDRIPis the same, and among these 10 proteins, 9 taDRIP Table 6: Most stable protein associated with each drug, as identiﬁed by the elastic net,along with preclinical evidence that supports the association between the listedprotein and drug sensitivity.

Drug Protein Supporting Literature Drug Protein Supporting Literature are implicated in the existing literature. Forthe other 14 drugs, 5 proteins identiﬁed byelastic net are implicated in the existing lit-erature, while 9 protein features identiﬁed bystaDRIP are implicated in the existing liter-ature.are implicated in the existing literature. Forthe other 14 drugs, 5 proteins identiﬁed byelastic net are implicated in the existing lit-erature, while 9 protein features identiﬁed bystaDRIP are implicated in the existing liter-ature.