[PDF] An Extension of Deep Pathway Analysis: A Pathway Route Analysis Framework Incorporating Multi-dimensional Cancer Genomics Data

Abstract

Recent breakthroughs in cancer research have come via the up-and-coming field of pathway analysis. By applying statistical methods to prior known gene and protein regulatory information, pathway analysis provides a meaningful way to interpret genomic data. While many gene/protein regulatory relationships have been studied, never before has such a significant amount data been made available in organized forms of gene/protein regulatory networks and pathways. However, pathway analysis research is still in its infancy, especially when applying it to solve practical problems. In this paper we propose a new method of studying biological pathways, one that cross analyzes mutation information, transcriptome and proteomics data. Using this outcome, we identify routes of aberrant pathways potentially responsible for the etiology of disease. Each pathway route is encoded as a bayesian network which is initialized with a sequence of conditional probabilities specifically designed to encode directionality of regulatory relationships encoded in the pathways. Far more complex interactions, such as phosphorylation and methylation, among others, in the pathways can be modeled using this approach. The effectiveness of our model is demonstrated through its ability to distinguish real pathways from decoys on TCGA mRNA-seq, mutation, Copy Number Variation and phosphorylation data for both Breast cancer and Ovarian cancer study. The majority of pathways distinguished can be confirmed by biological literature. Moreover, the proportion of correctly indentified pathways is \% higher than previous work where only mRNA-seq mutation data is incorporated for breast cancer patients. Consequently, such an in-depth pathway analysis incorporating more diverse data can give rise to the accuracy of perturbed pathway detection.

Full PDF

AAn Extension of Deep Pathway Analysis: A Pathway RouteAnalysis Framework Incorporating Multi-dimensional CancerGenomics Data

Yue ZhaoAugust 6, 2018

AbstractMotivation:

Recent breakthroughs in cancer research have come via the up-and-coming ﬁeld ofpathway analysis. By applying statistical methods to prior known gene and protein regulatory informa-tion, pathway analysis provides a meaningful way to interpret genomic data. While many gene/proteinregulatory relationships have been studied, never before has such a signiﬁcant amount data been madeavailable in organized forms of gene/protein regulatory networks and pathways. However, pathway anal-ysis research is still in its infancy, especially when applying it to solve practical problems.

Results:

In this paper we propose a new method of studying biological pathways, one that cross ana-lyzes mutation information, transcriptome and proteomics data. Using this outcome, we identify routesof aberrant pathways potentially responsible for the etiology of disease. Each pathway route is encodedas a bayesian network which is initialized with a sequence of conditional probabilities speciﬁcally de-signed to encode directionality of regulatory relationships encoded in the pathways. Far more complexinteractions, such as phosphorylation and methylation, among others, in the pathways can be modeledusing this approach. The eﬀectiveness of our model is demonstrated through its ability to distinguishreal pathways from decoys on TCGA mRNA-seq, mutation, Copy Number Variation and phosphory-lation data for both Breast cancer and Ovarian cancer study. The majority of pathways distinguishedcan be conﬁrmed by biological literature. Moreover, the proportion of correctly indentiﬁed pathwaysis % higher than previous work where only mRNA-seq mutation data is incorporated for breast cancerpatients. Consequently, such an in-depth pathway analysis incorporating more diverse data can give riseto the accuracy of perturbed pathway detection.

Contact: [email protected]

Pathway analysis has been playing an important role in cancer research. By and large, pathway anal-ysis helps interpret genomics data by applying computational methods which combine prior knowngene/protein regulatory information. There exists ample gene/protein regulatory relationships sum-marized in the literature that is organized into various forms of gene/protein regulatory networks andpathways. However, pathway analysis research is still a nascent area, particularly when it comes to prac-tical problems. For instance, breast cancer patients with the ERBB2 ampliﬁcation are normally cured byTrastuzumab. According to [Vogel et al. (2001)Vogel, Cobleigh, Tripathy, Gutheil, Harris, Fehrenbacher,Slamon, Murphy, Novotny, Burchmore, et al. ], less than ﬁfty percent of the patients get the beneﬁt ofthe ERBB2 targeting antibody, indicating a deeper understanding of a patient’s pathway behavior isrequired. Therefore, developing a more comprehensive way to analyze pathways by combining multiplegenomic data sets, which are now readily available through various high-throughput sequencing technolo-gies (e.g., RNA-Seq, DNA-Seq, ChIP-Seq), is of great signﬁcance. [Zhao et al. (2016)Zhao, Hoang, Joshi,Hong, and Shin] propose an approach by modeling the pathway route as an analysis unit. Nevertheless,this approach has many defects: • missing values are not penalized • more data types like Copy Number Variation (CNV) and DNA methylation data are not includedin the model a r X i v : . [ q - b i o . GN ] O c t speciﬁc types of regulation information are not considered, such as phosphorylation, methylationThe goal of this paper is to extend that pathway analysis framework giving it the ability to includeproteomics and CNV data and the speciﬁc types of regulation mentioned above. Together with existingtranscriptome and mutation data, we aim to pinpoint the precise pathway routes perturbed. Thisanalysis will focus on explaining the biological mechanisms behind cancer development more accurately.The rest of the paper is outlined as follows. In section II, existing pathway analysis methods are brieﬂyreviewed. Section III covers the model settings and assumptions described in detail. Section IV presents asigniﬁcance study similar to that of [Vaske et al. (2010)Vaske, Benz, Sanborn, Earl, Szeto, Zhu, Haussler,and Stuart] using TCGA Breast Cancer data. In section V, we show the route analysis outcome ofapplying our model to the data. Finally we conclude with section VI. Great eﬀorts have been made to incorporate pathway information into genomic data analysis. One ofthe ﬁrst popular methods of analyzing genome-wide experimental data is using gene set enrichmentanalysis methods [Subramanian et al. (2005)Subramanian, Tamayo, Mootha, Mukherjee, Ebert, Gillette,Paulovich, Pomeroy, Golub, Lander, et al. ]. [Li and Li(2008)Li and Li] encoded the pathway networkinto a penalty function and did model selection by optimizing the function to pick meaningful genesand subnetworks. [Tarca et al. (2009)Tarca, Draghici, Khatri, Hassan, Mittal, Kim, Kim, Kusanovic,and Romero] proposed SPIA which measures pathway signiﬁcance by statistical testing against randompermutation. [Vaske et al. (2010)Vaske, Benz, Sanborn, Earl, Szeto, Zhu, Haussler, and Stuart] presentedPARADIGM, a novel method by modeling the pathway as a factor graph to do patient speciﬁc infer-ence. [Verbeke et al. (2015)Verbeke, Van den Eynden, Fierro, Demeester, Fostier, and Marchal] rankedthe pathways by p-value obtained from encoding pathway logic into a global network. The p-value is cal-culated based on a hypothesis test where the null hypothesis is that the pathway is picked randomly. [Ko-rucuoglu et al. (2014)Korucuoglu, Isci, Ozgur, and Otu] and [Isci et al. (2011)Isci, Ozturk, Jones, and Otu]encode the pathway as a Bayesian network. After removing cycles in the graph, they train the model withexpression data. Signiﬁcance of the score is given by bootstrap-generated data. [Zhao et al. (2016)Zhao,Hoang, Joshi, Hong, and Shin] dynamically encode pathway routes as a Bayesian network incorporatingexpression and mutation data to do the pathway analysis.

Figure 1 illustrates the pipeline of this approach. Figure 1.A shows an example pathway, ErbB, whichhas been adapted from the KEGG pathway database [Kanehisa and Goto(2000)Kanehisa and Goto]. Thepathway is then simpliﬁed to be a gene regulation network G B . Unlike most existing approaches whomerely keep activation and inhibition interactions after the simpliﬁcation, diﬀerent tags on the interac-tions in the KEGG pathway are furthermore kept in G B , i.e. Phosphorylation (+p), Ubiquitination(+u),Glycosylation(+g), Methylation(+m), Dephosphorylation(-p), Debiquitination(-u), Deglycosylation(-g),Demethylation(-m), Expression(+e), Repression(-e). These tags are deﬁned as evidence tags determiningthe data associated with the interactions.Among all these types of tags, Phosphorylation, Glycosylation, Methylation, Ubiquitination, Dephos-phorylation, Deubiquitination, Deglycosylation, Demethylation do not aﬀect the expression level of thetarget gene, however, it can aﬀect the protein product structure and function. We call all these types ofedges (Protein) Functional Interactions . For Expression and Repression edges, only expression levelof g i is aﬀected rather than the protein function. Thus, we categorize these two type of interactions as Expression Interactions . These two categories will be handled in distinct ways in the model.The next step is to identify all possible “routes” available from the given G ∗ . Figure 1.C shows aroute which starts from ERBB2 and ends at BAD. The selected route is then converted into a discreteBayesian Network ( denoted as G shown by Figure 1.D). Our objective is to treat each “route” as a unitof pathway analysis. This route-based method is more amenable as it could show whether the eﬀectof ERBB2 ampliﬁcation is more prominent through ERBB2 → SHC2 path or ERBB2 → PIK3R5 path, oreven for both. The route-based modeling idea assumes that it is crucial to identify which portion(s) ofthe pathway is(are) either abnormally activated or suppressed. In this way, a more informed treatmentplan could be designed. rbB-2ShC PI3K STAT5Grb2 AktSos Bad ERBB2SHC2 PIK3R5

STAT5GRB2 AKT3SOS1 BAD +p +p +p+p+P +p

Simplify

A. Portion of ErbB Signaling Pathway B. Gene Regulation Network G B Pick a route

ERBB2

PIK3R5

AKT3BAD +p+e +e

C. The route under consideration (cid:6891) * Encoding as Bayesian Network

M1 R1RM2

Phosphorylation

M2 R2R3M3 RM2

MutationCNV

RNA-seq

D. Resulting Bayesian Network (cid:6891) +e Figure 1: Conversion Pipeline. Part of the ErbB pathway in KEGG shown in part A. The pathway issimpliﬁed by keeping only speciﬁc interactions and genes, resulting in gene regulation network G B in PartB. A route G ∗ shown in part C, starting from ERBB2 to BAD, is extracted from G B and converted toa Bayesian Network G in part D. Since ERBB2 is Activating (one of Functional Interactions) PIK3R5 in G ∗ , thus the nodes for ERBB2, R and M is pointing to RM , the RM node for PIK3R5. On the otherhand, once getting activated, PIK3R5’s protein will binds to AKT3 and make it express. Thus all nodesfor PIK3R5 points to R node of AKT3, namely, R . Distinct node colors represent diﬀerent data source(denoted as cylinders) used to calculate the observations. For instance, all R node observation comes frommRNA-seq database for the same patient. We then continue to illustrate the conversion process from a pathway route G ∗ to a Bayesian network G . As shonw in Fig.3, for a gene regulation network G B (converted from a pathway), a path G ∗ is simplya subgraph of G B , G ∗ ⊆ G B , G ∗ = ( V ∗ , E ∗ ) where V ∗ = { g , . . . , g k G ∗ } , where g i represents the i thgene and k G ∗ is the number of genes contained in path G ∗ , E ∗ = { e ij | ≤ i < k G ∗ and j = i + 1 } .For each edge in G ∗ , e i − ,i , < i ≤ k G ∗ , if i < k G ∗ and e i − ,i is one of the Functional Interactions in G ∗ , then we create three nodes in the corresponding Bayesian Network G : R i , M i and RM i for g i . Onthe other hand, if i < k G ∗ and e i − ,i is one of the Expression Interactions, only two nodes R i and M i willbe created. The ﬁrst gene, g will always have two nodes created R and M while g k G ∗ will only haveone node, either R k G ∗ (if e k G ∗ − ,k G ∗ is Expression Interaction) or RM k G ∗ (if e k G ∗ − ,k G ∗ is FunctionalInteraction). In this way, there will usually be three nodes for target gene of Functional interactions andtwo nodes for that of Expression interactions.After creating nodes for each gene in the path G ∗ , the edges in the Bayesian network G will be addeddynamically according to the edges in pathway route G ∗ . For g i ∈ V ∗ , < i ≤ k G ∗ , if e i − ,i ∈ E ∗ isone of the Functional Interactions, edges will be created pointing from all the nodes for parent gene g i − to the RM node child gene g i . Namely, we will add edges from R i − , M i − , ( RM i − ), to RM i . On theother hand if e i − ,i is an Expression Interaction, edges from the nodes of g i − ( R i − , M i − , ( RM i − ))to R i will be created instead. The conditional probability table corresponding to edges in BayesianNetwork G is determined by the type of the edge in G ∗ as shown in details from Table 1 to Table 4.The assumption is that, given the edge e i − ,i , the expression level ( R i ) (or functional status RM i ), ofthe gene g i is aﬀected by its parent’s expression status R i − , the DNA functional status M i − (and theProtein functional status RM i − if exists). After conversion, the resulting Bayesian Network G is formallydeﬁned as follows: G = ( V, E ), where V = RR (cid:83) MM (cid:83) RMS , RR = { R i , i ∈ { , . . . , k G ∗ }} where R i is a random variable representing expression level status on gene g i . MM = { M i , i ∈ { , . . . , k G ∗ − }} where M i is a random variable representing DNA functional status on gene g i . RMS = { RM i , i ∈ { j : e j − ,j is one of Functional Interactions in G ∗ }} where RM i is a random variable representing protein rotein 𝑅 𝑖 𝑅𝑀 𝑖 TranscriptionTranslationFunctional

State of

Protein Protein Modification (Phosphorylation, Methylation etc) 𝑀 𝑖 DNAmRNA

Figure 2: Central Dogma of Molecular Biology for gene g i . Random variable M i s will be handling the DNAfunctional status. Mutation and Copy Number Variation data will be used to get the observation for the M random variable. Random variable R i will be representing the expression status, namely, whether the mRNAor protein of g i will be high enough be achieve the biological function. The last type of random variable RM i stands for the functional status of the protein, for example, whether the protein has been phosphorylatedor methylated by g i ’s parent gene. Only when all three parts work properly, could the interaction e i +1 ,i +2 take eﬀect. functional status on gene g i . This setting is motivated from central dogma shown in Figure 2. M i , R i and RM i are now deﬁned in detail. Since the DNA information is not aﬀected by anyinteractions in the pathway route and M i doesn’t have a parent node in G , the random variable M i follows a Bernoulli distribution as shown in (1). The Bernouli random variable M i has two possiblevalues: +1 represents that g i functions normally on DNA level, and − g i ’sDNA original biological function is disrupted. The probability distribution indicates the prior has nospeciﬁc preference on these two levels: M i = (cid:40) +1 p = 0 . − p = 0 . R i follows a diﬀerent probability distribution based on the location of gene g i in path G ∗ : Suppose g i is the starting node in G ∗ , R i ’s distribution is shown in (2) R i =  +1 p = 1 / p = 1 / − p = 1 / g i is not down regulated in expression level, − g i is downregulated in test cases and R i = 0 otherwise. For cancer research, test case is equivalent to the tumorcell. On the other hand, if g i ( i >

1) has a parent, gene g i − in G ∗ , R i will follow the conditionalprobability table in Table 1(Table 2) if e i − ,i is expression(repression) in G ∗ . In order to illustrate ourmodel more clearly, we deﬁne the following operator &, which is really similar to the AND operator,in (3). For instance, ( RM i − &) M i − & R i − has value of +1 if none of the three (or two if there is no able.1Table.2Route in the pathway G * Corresponding Bayesian Network Model G

Gene Gene Gene i Gene i+1

Gene

K-1

Gene K Gene j Gene j+1

ExpressionRepressionInhibition

M1 R1R2

Activation M i R i R i+1 Table.3Table.4M j R j RM j+1 M K-1 R K-1 R K+1 RM j Figure 3: Converting the route in the pathway to Bayesian Network. The pathway route G ∗ on the left isconverted to Bayesian Network G on the right. Especially, for Gene j activating Gene j +1 , the RM j +1 hasan extra parent RM j . This illustrates the special case when Gene j − is activating or inhibiting Gene j . RM i − ) variables are −

1. Otherwise, it has value of − A & A & , . . . , A n − & A n = (cid:40) − ∃ i ∈ [1 , n ] s.t. A i = − otherwise (3)Next we show the biological logic behind the conditional probability table for R i . Here we focus onthe expression table (Table.1); the repression table (Table.2) is built in a similar way. If the parent geneof g i , g i − , has no function loss in DNA, over-expresses and the functional status of g i − ’s protein is fullyactivated, namely M i − & R i − & RM i − = +1, then the target g i will also be highly likely to overexpress,i.e. R i = +1, given the edge between them in G ∗ is ‘expression’. If there is no Functional Interaction e i − ,i − targeting at g i − , there will be just R i − and M i − in the conditional table. As a result, P ( R i = +1 | M i − & R i − & RM i − = +1) = 1 − (cid:15) − (cid:15) while P ( R i = − | M i − & R i − & RM i − = +1) = (cid:15) P ( R i = 0 | M i − & R i − & RM i − = +1) = (cid:15) where (cid:15) and (cid:15) are respectively the probability of observing R i = − R i = 0. 1 − (cid:15) − (cid:15) should beclose to 1. Here (cid:15) < (cid:15) indicating that we penalize the inconsistency more than the uncertainty. Similarly,if the parent gene of g i has DNA function loss, caused by mutation for instance, or its expression level isdown regulated in test case, or the protein of g i − is not activated successfully ( M i − & R i − & RM i − = − g i is likely not to be functioning. Therefore, g i e i − ,i IN G ∗ IS EXPRESSION( RM i − &) M i − & R i − R i = +1 R i = 0 R i = − − (cid:15) − (cid:15) (cid:15) (cid:15) − (cid:15) (cid:15) − (cid:15) − (cid:15) < (cid:15) < (cid:15) << − (cid:15) − (cid:15) Table 2: THE REGULATION PROCESS e i − ,i IN G ∗ IS REPRESSION( RM i − &) M i − & R i − R i = +1 R i = 0 R i = − (cid:15) (cid:15) − (cid:15) − (cid:15) − − (cid:15) − (cid:15) (cid:15) (cid:15) Table 3: THE REGULATION PROCESS e i − ,i IN G ∗ IS FUNCTIONAL ACTIVATION( RM i − &) M i − & R i − RM i = +1 RM i = 0 RM i = − − (cid:15) − (cid:15) (cid:15) (cid:15) − (cid:15) (cid:15) − (cid:15) − (cid:15) Table 4: THE REGULATION PROCESS e i − ,i IN G ∗ IS FUNCTIONAL INHIBITION( RM i − &) M i − & R i − RM i = +1 RM i = 0 RM i = − (cid:15) (cid:15) − (cid:15) − (cid:15) − − (cid:15) − (cid:15) (cid:15) (cid:15) would tend to be down regulated, namely R i = −

1, and hence the corresponding probability would beﬂipped.Similar to R i , Random variable RM i has three possible values: { +1 , , − } , where +1 represents gene g i has its protein switched on by its parent gene g i − through e i − ,i , − g i is switched oﬀand otherwise R i = 0. Recall that RM i will be attached only when interaction e i − ,i in G ∗ is FunctionalInteraction, g i will always have a parent, gene g i − in G ∗ . RM i will follow the conditional probabilitytable from Table 3 to Table 4.The biological logic behind the conditional probability table for RM i is built based on central dogma,as shown in Figure 2. Here we focus on the positive eﬀect table (Table.3); the negative eﬀect table(Table.4) is built in a similar way. If the parent gene of g i , g i − , has no function loss in DNA, itover-expresses and g i − ’s protein is successfully regulated (if RM i − exists) ( M i − & R i − & RM i − =+1), then interaction e i − ,i will take eﬀect, thus the target g i protein will also be highly likely to beregulated successfully, namely, R i = +1 given the edge between them in G ∗ is one the Positive FunctionalInteractions: Phosphorylation, Glycosylation, Methylation, Ubiquitination, Activation . As a result, P ( RM i = +1 | M i − & R i − & RM i − = +1) = 1 − (cid:15) − (cid:15) while P ( RM i = − | M i − & R i − & RM i − = +1) = (cid:15) P ( RM i = 0 | M i − & R i − & RM i − = +1) = (cid:15) where (cid:15) and (cid:15) are respectively the probability of observing R i = − R i = 0. Similarly, if theparent gene of g i has DNA function loss, caused by mutation for instance, or its expression level is downregulated, or the protein of g i − is not regulated successfully ( M i − & R i − & RM i − = − g i is likely not to be functioning. Therefore, g i would tend to benot regulated properly, namely R i = −

1, and hence the corresponding probability would be ﬂipped. .2 Ranking the Route Given ( (cid:126)r, (cid:126)m, (cid:126)rm ), a set of data observations of the random variables in Bayesian Network G from aspeciﬁc patient s , we could rank the path G ∗ with the probability of observing (cid:126)r , (cid:126)m and (cid:126)rm conditioningon the Bayesian network model G , P ( (cid:126)R = (cid:126)r, (cid:126)M = (cid:126)m, (cid:126)RM = (cid:126)rm | G ). The larger the probability, themore likely the pathway route is perturbed since the observation is highly consistent with the biologicallogic from G ∗ encoded in G . One problem of using this probability as a measure, is that the probabilitywill be higher if fewer data are observed. Thus the score displayed in (4) given in [Koller and Fried-man(2009)Koller and Friedman], will be used instead, where the conditional probability is normalizedby P ( (cid:126)R, (cid:126)M, (cid:126)RM are consistent | G ). Score s ( G ∗ , (cid:126)r, (cid:126)m, (cid:126)rm ) = P ( (cid:126)R = (cid:126)r, (cid:126)M = (cid:126)m, (cid:126)RM = (cid:126)rm | G ) P ( (cid:126)R, (cid:126)M, (cid:126)RM are consistent | G ) (4) P ( (cid:126)R = (cid:126)r, (cid:126)M = (cid:126)m, (cid:126)RM = (cid:126)rm | G )= (cid:88) (cid:126)R = (cid:126)r, (cid:126)M = (cid:126)m, (cid:126)RM = (cid:126)rm P ( (cid:126)R, (cid:126)M, (cid:126)RM )= (cid:88) (cid:126)R = (cid:126)r, (cid:126)M = (cid:126)m, (cid:126)RM = (cid:126)rm (cid:89) Pa G ( R i )= ∅ P ( R i ) (cid:89) ≤ i

2. Then we have (5): P ( (cid:126)R, (cid:126)M, (cid:126)RM are consistent | G )= P ( R = +1 , M = +1 , RM = +1 | G ) (5)(since g is the last node in the route and the interaction e is phosphorylation, then R and M arenot included in the model).A high score means that the path G ∗ is highly likely to be perturbed based on the data we observe.A path G ∗ could only get a high score if the observations, the changes in tumor cells for each gene, arehighly consistent with pathway information contained in the Bayesian Network G . Inconsistency betweendata and the model would lower the score greatly since the conditional probability will be (cid:15) instead of1 − (cid:15) − (cid:15) during the calculation of the score. Advantages of this measure are • the analysis could be done across pathways, i.e. after merging pathways in a reasonable way, thismeasure could recognize a signiﬁcantly meaningful route across diﬀerent pathways. This could allowbiologists, oncologist or doctors to see what biological processes are likely to be “making trouble”in the patient’s body. • even though some observation values are ﬂipped due to random errors from the genomic data (it isobserved to be − s , indicating that the score is speciﬁcally tailored to patient s .The perturbed route could have two possible statuses, enhanced or suppressed, as we have deﬁnedin [Zhao et al. (2016)Zhao, Hoang, Joshi, Hong, and Shin]. Recall that we deﬁne a pathway route G ∗ tobe enhanced if the last gene’s expression value is observed to be the same as the expected value. Theexpected value is calculated based on biological logic in G ∗ by supposing R = +1 , M = +1. The routeis deﬁned to be suppressed if the observation of the last node is opposite to the expectation. The scoreis easily extended to include this information, resulting in a new signed score, sScore , as shown in (6). sScore s ( G ∗ , (cid:126)r, (cid:126)m ) = ˜ I ( o | G ∗ | , ˙ o | G ∗ | ) · Score s ( G ∗ , (cid:126)r, (cid:126)m, (cid:126)rm ) (6)where o G ∗ is the observation of the last gene in the route G ∗ . If e | G ∗ |− , | G ∗ | is an Expression Interaction,then o | G ∗ | = r | G ∗ | . Otherwise, o | G ∗ | = rm | G ∗ | since e | G ∗ |− , | G ∗ | is Functional interaction. ˙ r | G ∗ | is the 𝑀 𝑅 𝑀 MutationCNV RNA-SeqRPPAPhosphorylation RM i 𝑅 𝑖 𝑀 𝑖 𝑅𝑀 𝑅 𝑀 RM i+1 R i+1 𝑀 𝑖+1 RM i+2 MethylationUbiquitination

Figure 4: Data Intergration illustration. A sample Bayesian network is shown above. All the nodes withsame color will have the same data resource. RM has observation extracted from phosphorylation databasebecause e is Phosphorylation or Dephosphorylation interaction in its corresponding pathway route G ∗ .Similarly, e i,i +1 is Methylation or Demethylation and e i,i +1 is Ubiquitination or Deubiquitination. expected observation of the last gene in the route conditioning on R = +1 , M = +1. Similarly, If e | G ∗ |− , | G ∗ | is expression interaction, then ˙ o | G ∗ | = r | G ∗ | . Otherwise, ˙ o | G ∗ | = rm | G ∗ | since e | G ∗ |− , | G ∗ | isFunctional interaction. Function ˜ I : R → R is deﬁned in (7). The signed score varies from − I ( x, y ) = (cid:40) +1 x = y − x (cid:54) = y (7)Finally we propose the measure for a whole pathway based on the route score. The pathway score forpathway G B based on data from a group of subjects S , pScore S ( G B ), is displayed in (8): pScore S ( G B ) = 1 | G B | (cid:88) G ∗ ∈ G B I ( 1 | S | (cid:88) s ∈ S Score s ( G ∗ ) > β ) (8)The above equation is formulated because of following reasons. The pathway could be partitioned to beseveral routes. We then simply measure the signiﬁcance of this pathway, G B , using the proportion ofroutes that have an average of all the patients’ scores, calculated by (4), that is larger than threshold β . The observations for each variable in the Bayesian Network G will come from multiple types of data,as shown in Figure 4. The gene expression variable R i value can be measured by many types of geneexpression data, for instance, Microarray, mRNA-seq, Reverse phase protein array (RPPA) among others.Here mRNA-seq is chosen. R i ’s observation r i is generated with log2 ratio of mRNA-seq FPKM using(11). The threshold is set to be 0.5 to tolerate the random error resulting from sequence processing. Ifboth protein data and mRNA-seq data are available for the same gene of the same patient and thesetwo data have conﬂicting observation, then we use protein data observation to overwrite the one frommRNA-seq data. i , the data observed for random variables M i is the congenital functional status for gene g i . Ob-servation m i = − g i ’s DNA causes functionloss for the original biological process. For instance, if we observe function loss mutation from mutationdatabase or serious copy number loss from CNV database, then m i = −

1. Otherwise, m i = +1 standingfor not observing functional deﬁciency in g i DNA. Mutation annotation tools will be really helpful inﬁnding function loss mutation.When it comes to the observation of RM i , rm i , the data source becomes more complex. The datasource will be determined by the speciﬁc type of e i − ,i . RM j represents diﬀerent information for diﬀerentinteractions. The general logic is summarized by the following equation in (9). RM i = T ype i − ,i ∗ T ag i − ,i ∗ RawV alue i (9)where T ype i − ,i = +1 if e i − ,i is activation (arrow) edge and T ype i − ,i = − e i − ,i is inhibition; T ag i − ,i = +1 if e i − ,i has a tag sign of (+), i.e. + p, + m, + u or + g and T ag i − ,i = − e i − ,i iswith a tag of −

1, i.e. − p, − m, − u or − g . RawV alue i = +1 if the database shows that the gene isPhosphorylated, Methylated, Ubiquitinated or Glycosylated and RawV alue i = − e ij is inhibition taged with phosphorylation (+ p ) and Phosphorylation datashows that g i is phosphorylated, RM j = − ∗ ∗ − g j is swithed oﬀ. If the datashows that the gene is not phosphorylated, RM i = − ∗ ∗ ( −

1) = +1 indicating g i is not switched oﬀsuccessfully through phosphorylation. If e i − ,i is methylation, then the value of rm i will be determinedfrom methylation database. The same goes with the other possible interactions: Phosphorylation, De-phosphorylation, Ubiquitination, Glycosylation etc. For the edges with no tags, we assume the edge isalways working and use the fomula in (10) instead. The formula indicates that given no function loss in M i − and down expression in R i − , e i − ,i works and determines RM i . RM i = T ype i − ,i ∗ min ( M i − , R i − ) (10) The bioinformatics ﬁeld frequently uses the TCGA Breast invasive carcinoma (BRCA) data to test newlydeveloped analysis models. We choose the same TCGA cancer data set to validate our model. Anothercancer data set, Ovarian serous cystadenocarcinoma (OV) , is also analyzed with the same methodologyfor generality. Four types of data sets: mRNA-seq, mutation, Copy Number Variation are downloadedfrom https://gdac.broadinstitute.org/ for both cancer studys. Phosphorylation data are respectivelyextracted from recent work [Mertins et al. (2016)Mertins, Mani, Ruggles, Gillette, Clauser, Wang, Wang,Qiao, Cao, Petralia, et al. ] and [Zhang et al. (2016a)Zhang, Liu, Zhang, Payne, Zhang, McDermott, Zhou,Petyuk, Chen, Ray, et al. ].The mRNA-seq data is processed as follows to obtain r i , the observation for R i . The cancer vs.normal paired ratios of RPKM are converted to the expression observation with (11). The value for eachitem is mapped to a node in pathway by oﬃcial gene symbol. r i =  +1 log ( TumorRPKM i NormalRPKM i ) > . − log ( TumorRPKM i NormalRPKM i ) < − . otherwise (11)The mutation information is extracted from mutation accessor study [Center(2016a)Center] and [Cen-ter(2016b)Center]. The mutation with a ’medium’ or ’high’ impact factor is encoded as function lossmutation. Other mutation observations are encoded as no function loss mutation in the data. Thevalue for each item is mapped to a node in pathway by ncbi-protein id. Copy Number Variation (CNV)data is imported from GISTIC2 study [Center(2016c)Center] and [Center(2016d)Center], where the copynumber variation is quantiﬁed by integers varying from − M node, m , along with mutationinformation as we discussed in section 3.3. The value for each item is mapped to a node in pathway byoﬃcial gene symbol.In the end, phosphorylation data is processed. rm i (phosphorylation) = +1 if the same patient’sphosphosite iTRAQ log2 ratio is positive for g i and otherwise rm i ( phosphorylation ) = −

1. Missingvalues are encoded as 0. The value for each item is mapped to a node in pathway by ncbi-protein id.However, one challenge one may encounter is that values for diﬀerent residues within the same protein .0 0.2 0.4 0.6 0.8 1.0 . . . . . . False Positive Rate (1-Specificity) T r u e P o s iti v e R a t e ( S e n s iti v it y ) AUC: 0.59 AUC: 0.64 AUC: 0.64 AUC: 0.65 AUC: 0.63

Figure 5: ROC curve for BRCA Signiﬁcance study. The orange, red, blue, green and purple curve correspondsto β = 0 . , . , . , . , . may be inconsistent, and KEGG pathway fails to provide suﬃcient information on the speciﬁc residueinvolved for each phosphorylation. As a result, only the consistent signal is considered in the experiment.All KEGG Homo Sapien pathways are used in this study, and the implementation is done mainly withR package “KEGGgraph” [Zhang and Wiemann(2009)Zhang and Wiemann] and “gRain” [Højsgaard(2012)Højsgaard].Next we do a signiﬁcance analysis similar to that of PARADIGM [Vaske et al. (2010)Vaske, Benz,Sanborn, Earl, Szeto, Zhu, Haussler, and Stuart]. We will produce decoy pathways by permuting thegenes in the pathway while keeping the interactions. We generate one decoy pathway for each of 308KEGG pathways. For each pathway, we extract all possible routes in it. Then for each route, wecalculate the score for each pathway by (8). We go on to rank the signiﬁcant real pathways and theircorresponding decoy pathway. A threshold is set to do prediction, i.e. the cases with a score higher thanthe threshold is predicted to be a real pathway. After obtaining False positive rate and True positiverate with various thresholds, the resulting ROC (Receiver operating characteristic) curve can be seenin Figure 5 and Figure 6. The AUC gets to 0.77(0.72) when taking the threshold β = 0 . .

00) forBRCA(OV). Furthermore, while taking the same value of β , the top signiﬁcant pathways for Breastcancer (BRCA) and Ovarian Cancer (OV) are ﬁltered out from the 44 signaling Homo Sapien pathwaysand listed in Table 5 and Table 6. The second column corresponds to the score for the real pathway by(8). The third column contains the score for the corresponding decoy pathway. The pathways with ascore less than its corresponding decoy pathway are ﬁltered out. The result for Breast cancer is ﬁrstly veriﬁed by biomedical literatures. The pathways in Table.5 isreviewed one by one. In breast cancer Neurotrophins and their receptors signiﬁcantly impact tumor cellgrowth and metastasis through various signaling pathways according to [Hondermarck(2012)Hondermarck].Vascular endothelial growth factor (VEGF) is the most prominent among the angiogenic cytokines andis believed to play a central role in the process of neovascularization, both in cancer as well as otherinﬂammatory diseases [Kieran et al. (2012)Kieran, Kalluri, and Cho]. In primary breast cancer, AMPKactivity is known diminished in an estimated 90% of cases [Li et al. (2015)Li, Saud, Young, Chen, andHua]. Dysfunction of Hippo pathway components is linked with breast cancer stem cell regulationand the connection between the disease and genetic variations in the pathway is reported in [Zhang et al. (2016b)Zhang, Yao, Hu, Zhu, Liu, Lunetta, Haddad, Yang, Shen, Hong, et al. ]. Activation of thephosphoinositide 3 kinase (PI3K)/Akt/mammalian target of rapamycin (mTOR) pathway is commonlyreported in breast cancer [Lee et al. (2015)Lee, Loh, and Yap]. Involvement of the cAMP/protein kinaseA pathway and of mitogen-activated protein kinase in the anti-proliferative eﬀects of anandamide inhuman breast cancer cells is discussed in [Melck et al. (1999)Melck, Rueda, Galve-Roperh, De Petrocellis,Guzm´an, and Di Marzo]. [Buck and Knabbe(2006)Buck and Knabbe] discuss the key role of TGF- β .0 0.2 0.4 0.6 0.8 1.0 . . . . . . False Positive Rate (1-Specificity) T r u e P o s iti v e R a t e ( S e n s iti v it y ) AUC: 0.63 AUC: 0.6 AUC: 0.61 AUC: 0.7 AUC: 0.74

Figure 6: ROC curve for OV Signiﬁcance study. The orange, red, blue, green and dark green curve corre-sponds to β = 0 . , . , . , . , . signaling. ErbB signaling pathway is well known frequently dysregulated in breast cancer [Yarden andSliwkowski(2001)Yarden and Sliwkowski]. The critical role for NF- κ B signaling pathway is discussedin [Shostak and Chariot(2011)Shostak and Chariot]. According to Wang and Li, LOX-1 is up-regulatedby TNF in endothelial cell promoting the adhesion and trans-endothelial migration of MDA-MB-231breast cancer cells [Wang and Lin(2008)Wang and Lin]. An excellent review summarizing the role ofthe toll-like receptor signaling pathway on breast cancer risk, disease progression, survival, and diseaserecurrence is given in [La Creis et al. (2013)La Creis, Rogers, Yeyeodu, Jones, and Kimbro]. OxytocinReceptors is related to breast cancer according to [Reversi et al. (2005)Reversi, Cassoni, and Chini].Blockade of Wnt/ β -catenin signaling suppresses breast cancer metastasis [Jang et al. (2015)Jang, Kim,Cho, Park, Jung, Lee, Hong, and Nam]. When it comes to Calcium signaling pathway, speciﬁc Ca(2+)channels reportedly play important roles in the proliferation and invasiveness of breast cancer cells [Az-imi et al. (2014)Azimi, Roberts-Thomson, and Monteith]. [Kasper et al. (2009)Kasper, Jaks, Fiaschi, andToftg˚ard] conclude that the inhibition of Hedgehog signalling in breast tumours may interfere with themaintenance of a putative cancer stem cell compartment and the abnormal stimulation of tumour stroma.Oestrogen is known to trigger the sphingolipid signaling cascade in various tissues including breast can-cer [Sukocheva and Wadham(2014)Sukocheva and Wadham].Overall, we found that 94% (17 /

18) of the pathways in Table 5 have some published facts implicatedin breast cancer, suggesting that our analysis is producing meaningful outcomes. The only one that isnot conﬁrmed by literatures is marked with ∗ and deserves further investigation. Furthermore, we couldalso look into the perturbed routes in each pathway reported here and it is attached in supplementarydocument as Table.S1 due to its large size. The ovarian cancer result is also validated in a similar way. TNF secretion by ovarian cancer cellsstimulated a constitutive network consisting of cytokines, chemokines, and angiogenic factors that pro-moted colonization of the peritoneum and neovascularization for developing tumor deposits [Wang andLin(2008)Wang and Lin]. VEGF has also been implicated in the pathogenesis of ovarian cancer accord-ing to [Moghaddam et al. (2012)Moghaddam, Amini, Morris, and Pourgholami]. For MAPK pathway,MEK4 suppresses metastasis based on its downregulation in prostate and ovarian cancers with a highrisk of metastasis [Dhillon et al. (2007)Dhillon, Hagan, Rath, and Kolch]. [Luquain et al. (2003)Luquain,Singh, Wang, Natarajan, and Morris] identify a novel role of phospholipase D in agonist-stimulatedlysophosphatidic acid synthesis by ovarian cancer cells. Rap1A promotes ovarian cancer metastasis viaactivation of ERK/p38 and notch signaling [Lu et al. (2016)Lu, Wang, Wu, Wan, and Yang]. [Hanrahan et al. (2012)Hanrahan, Schultz, Westfal, Sakr, Giri, Scarperi, Janikariman, Olvera, Stevens, She, et al. ]reported that ovarian cancer cell lines (23.5%) had RAS/RAF pathway aberrations. The ras-signaling κ B signaling pathway 0.002477291 0TNF signaling pathway 0.00210084 0Tolllike receptor signaling pathway 0.001174168 0Oxytocin signaling pathway 0.020889488 0.000673854Hippo signaling pathway 0.002457002 0.000117Wnt signaling pathway 0.005074161 0.00058548Calcium signaling pathway 0.014311736 0.00234192Hedgehog signaling pathway 0.00487013 0.001623377Sphingolipid signaling pathway 0.007585335 0.003792668 pathway has attracted considerable attention as a target for anticancer therapy because of its impor-tant role in carcinogenesis [Adjei(2001)Adjei]. [Szkandera et al. (2013)Szkandera, Kiesslich, Haybaeck,Gerger, and Pichler] highlights the crucial role of Hedgehog signaling in the development and progressionof ovarian cancer. Findings argue that the Hippo signaling pathway deﬁnes an important pathway inprogression of ovarian cancer in [Hall et al. (2010)Hall, Wang, Miao, Oliva, Shen, Wheeler, Hilsenbeck,Orsulic, and Goode]. According to [Cheaib et al. (2015)Cheaib, Auguste, and Leary], phosphatidylinosi-tol 3 kinase (PI3K) pathway is frequently altered in cancer, including ovarian cancer (OC). Compellingevidence suggests that NF- κ B plays a critical role in ovarian cancer in [White et al. (2011)White, Rider,Kalli, Knutson, Jarvik, and Goode]. [Arend et al. (2013)Arend, Londo˜no-Joshi, Straughn, and Buchs-baum] review the Wnt/ β -catenin pathway as it relates to epithelial ovarian cancer, speciﬁcally its role inchemoresistance and its potential role as a target for chemosensitization. Inhibition of the JAK2/STAT3pathway in ovarian cancer results in the loss of cancer stem cell-like characteristics and a reduced tu-mor burden [Abubaker et al. (2014)Abubaker, Luwor, Zhu, McNally, Quinn, Burns, Thompson, Findlay,and Ahmed]. The mammalian target of rapamycin (mTOR) is frequently activated in epithelial ovariancancer, and is regarded as an attractive therapeutic target for therapy in [Mabuchi et al. (2011)Mabuchi,Hisamatsu, and Kimura]. [Corney et al. (2008)Corney, Flesken-Nikitin, Choi, and Nikitin] review the roleof tumor suppressor p53 and the Rb pathway in EOC with particular attention to association of p53 tohigh grade serous carcinomas as opposed to low grade and benign tumors.Overall, we found that 82% (14 /

17) of the pathways in Table 6 have some published facts implicatedin ovarian cancer, suggesting that our analysis is producing meaningful outcomes. Furthermore, we couldalso look into the perturbed routes in each pathway reported here and it is attached in supplementarydocument as Table.S2 due to its large size.

We further extend the existing deep pathway analysis approach by introducing more detailed informa-tion in the pathway. Unlike existing methods, the model has the ability to handle multiple types of dataincluding CNV, proteomics and methylation data. We ﬁrst demonstrated the performance of the modelthrough signiﬁcance study with real data, and compare the result against PARADIGM and SPIA. Signif-icant pathways reported can be verifed by current literature. In the end, we carried out a pathway routeanalysis (deep pathway analysis) combined with veriﬁcation from biological literature. Our Bayesianbased approach can be further augmented with additional statistical and machine learning methods, forexample, for enhanced model selection, hypothesis test, parameter estimation κ B signaling pathway 0.000368958 0Wnt signaling pathway 0.001947448 5.85E-05JakSTAT signaling pathway 2.27E-05 0NODlike receptor signaling pathway * 0.002733265 0.000359842mTOR signaling pathway 0.00377488 0.000691244p53 signaling pathway 0.001706679 0.000458365AGERAGE signaling pathway * 0.006871944 0.001952085

References [Abubaker et al. (2014)Abubaker, Luwor, Zhu, McNally, Quinn, Burns, Thompson, Findlay, and Ahmed]Abubaker, K., Luwor, R. B., Zhu, H., McNally, O., Quinn, M. A., Burns, C. J., Thompson, E. W.,Findlay, J. K., and Ahmed, N. (2014). Inhibition of the jak2/stat3 pathway in ovarian cancer resultsin the loss of cancer stem cell-like characteristics and a reduced tumor burden.

BMC cancer , (1),317.[Adjei(2001)Adjei] Adjei, A. A. (2001). Blocking oncogenic ras signaling for cancer therapy. Journal ofthe National Cancer Institute , (14), 1062–1074.[Arend et al. (2013)Arend, Londo˜no-Joshi, Straughn, and Buchsbaum] Arend, R. C., Londo˜no-Joshi,A. I., Straughn, J. M., and Buchsbaum, D. J. (2013). The wnt/ β -catenin pathway in ovarian cancer:a review. Gynecologic oncology , (3), 772–779.[Azimi et al. (2014)Azimi, Roberts-Thomson, and Monteith] Azimi, I., Roberts-Thomson, S., and Mon-teith, G. (2014). Calcium inﬂux pathways in breast cancer: opportunities for pharmacological inter-vention. British journal of pharmacology , (4), 945–960.[Buck and Knabbe(2006)Buck and Knabbe] Buck, M. B. and Knabbe, C. (2006). Tgf-beta signaling inbreast cancer. Annals of the New York Academy of Sciences , (1), 119–126.[Center(2016a)Center] Center, B. I. T. G. D. A. (2016a). Mutation assessor.[Center(2016b)Center] Center, B. I. T. G. D. A. (2016b). Mutation assessor.[Center(2016c)Center] Center, B. I. T. G. D. A. (2016c). Snp6 copy number analysis (gistic2).[Center(2016d)Center] Center, B. I. T. G. D. A. (2016d). Snp6 copy number analysis (gistic2).[Cheaib et al. (2015)Cheaib, Auguste, and Leary] Cheaib, B., Auguste, A., and Leary, A. (2015). Thepi3k/akt/mtor pathway in ovarian cancer: therapeutic opportunities and challenges. Chinese journalof cancer , (1), 4.[Corney et al. (2008)Corney, Flesken-Nikitin, Choi, and Nikitin] Corney, D. C., Flesken-Nikitin, A.,Choi, J., and Nikitin, A. Y. (2008). Role of p53 and rb in ovarian cancer. In Ovarian Cancer ,pages 99–117. Springer.[Dhillon et al. (2007)Dhillon, Hagan, Rath, and Kolch] Dhillon, A. S., Hagan, S., Rath, O., and Kolch,W. (2007). Map kinase signalling pathways in cancer.

Oncogene , (22), 3279–3290.[Hall et al. (2010)Hall, Wang, Miao, Oliva, Shen, Wheeler, Hilsenbeck, Orsulic, and Goode] Hall, C. A.,Wang, R., Miao, J., Oliva, E., Shen, X., Wheeler, T., Hilsenbeck, S. G., Orsulic, S., and Goode, S.(2010). Hippo pathway eﬀector yap is an ovarian cancer oncogene. Cancer research , (21), 8517–8525. Hanrahan et al. (2012)Hanrahan, Schultz, Westfal, Sakr, Giri, Scarperi, Janikariman, Olvera, Stevens, She, et al. ]Hanrahan, A. J., Schultz, N., Westfal, M. L., Sakr, R. A., Giri, D. D., Scarperi, S., Janikariman, M.,Olvera, N., Stevens, E. V., She, Q.-B., et al. (2012). Genomic complexity and akt dependence inserous ovarian cancer.

Cancer discovery , (1), 56–67.[Højsgaard(2012)Højsgaard] Højsgaard, S. (2012). Graphical independence networks with the gRainpackage for R. Journal of Statistical Software , (10), 1–26.[Hondermarck(2012)Hondermarck] Hondermarck, H. (2012). Neurotrophins and their receptors in breastcancer. Cytokine & growth factor reviews , (6), 357–365.[Isci et al. (2011)Isci, Ozturk, Jones, and Otu] Isci, S., Ozturk, C., Jones, J., and Otu, H. H. (2011).Pathway analysis of high-throughput biological data within a bayesian network framework. Bioinfor-matics , (12), 1667–1674.[Jang et al. (2015)Jang, Kim, Cho, Park, Jung, Lee, Hong, and Nam] Jang, G.-B., Kim, J.-Y., Cho, S.-D., Park, K.-S., Jung, J.-Y., Lee, H.-Y., Hong, I.-S., and Nam, J.-S. (2015). Blockade of wnt/ β -cateninsignaling suppresses breast cancer metastasis by inhibiting csc-like phenotype. Scientiﬁc reports , ,12465.[Kanehisa and Goto(2000)Kanehisa and Goto] Kanehisa, M. and Goto, S. (2000). Kegg: kyoto encyclo-pedia of genes and genomes. Nucleic acids research , (1), 27–30.[Kasper et al. (2009)Kasper, Jaks, Fiaschi, and Toftg˚ard] Kasper, M., Jaks, V., Fiaschi, M., andToftg˚ard, R. (2009). Hedgehog signalling in breast cancer. Carcinogenesis , (6), 903–911.[Kieran et al. (2012)Kieran, Kalluri, and Cho] Kieran, M. W., Kalluri, R., and Cho, Y.-J. (2012). Thevegf pathway in cancer and disease: responses, resistance, and the path forward. Cold Spring Harborperspectives in medicine , (12), a006593.[Koller and Friedman(2009)Koller and Friedman] Koller, D. and Friedman, N. (2009). Probabilisticgraphical models: principles and techniques . MIT press.[Korucuoglu et al. (2014)Korucuoglu, Isci, Ozgur, and Otu] Korucuoglu, M., Isci, S., Ozgur, A., andOtu, H. H. (2014). Bayesian pathway analysis of cancer microarray data.

PloS one , (7), e102803.[La Creis et al. (2013)La Creis, Rogers, Yeyeodu, Jones, and Kimbro] La Creis, R. K., Rogers, E. N.,Yeyeodu, S. T., Jones, D. Z., and Kimbro, K. S. (2013). Contribution of toll-like receptor signal-ing pathways to breast tumorigenesis and treatment. Breast Cancer , , 43.[Lee et al. (2015)Lee, Loh, and Yap] Lee, J. J., Loh, K., and Yap, Y.-S. (2015). Pi3k/akt/mtor inhibitorsin breast cancer. Cancer biology & medicine , (4), 342–354.[Li and Li(2008)Li and Li] Li, C. and Li, H. (2008). Network-constrained regularization and variableselection for analysis of genomic data. Bioinformatics , (9), 1175–1182.[Li et al. (2015)Li, Saud, Young, Chen, and Hua] Li, W., Saud, S. M., Young, M. R., Chen, G., and Hua,B. (2015). Targeting ampk for cancer prevention and treatment. Oncotarget , (10), 7365–78.[Lu et al. (2016)Lu, Wang, Wu, Wan, and Yang] Lu, L., Wang, J., Wu, Y., Wan, P., and Yang, G. (2016).Rap1a promotes ovarian cancer metastasis via activation of erk/p38 and notch signaling. CancerMedicine , (12), 3544–3554.[Luquain et al. (2003)Luquain, Singh, Wang, Natarajan, and Morris] Luquain, C., Singh, A., Wang, L.,Natarajan, V., and Morris, A. J. (2003). Role of phospholipase d in agonist-stimulated lysophosphatidicacid synthesis by ovarian cancer cells. Journal of lipid research , (10), 1963–1975.[Mabuchi et al. (2011)Mabuchi, Hisamatsu, and Kimura] Mabuchi, S., Hisamatsu, T., and Kimura, T.(2011). Targeting mtor signaling pathway in ovarian cancer. Current medicinal chemistry , (19),2960–2968.[Melck et al. (1999)Melck, Rueda, Galve-Roperh, De Petrocellis, Guzm´an, and Di Marzo] Melck, D.,Rueda, D., Galve-Roperh, I., De Petrocellis, L., Guzm´an, M., and Di Marzo, V. (1999). Involvementof the camp/protein kinase a pathway and of mitogen-activated protein kinase in the anti-proliferativeeﬀects of anandamide in human breast cancer cells. FEBS letters , (3), 235–240.[Mertins et al. (2016)Mertins, Mani, Ruggles, Gillette, Clauser, Wang, Wang, Qiao, Cao, Petralia, et al. ]Mertins, P., Mani, D., Ruggles, K. V., Gillette, M. A., Clauser, K. R., Wang, P., Wang, X., Qiao,J. W., Cao, S., Petralia, F., et al. (2016). Proteogenomics connects somatic mutations to signallingin breast cancer. Nature , (7605), 55–62. Moghaddam et al. (2012)Moghaddam, Amini, Morris, and Pourgholami] Moghaddam, S. M., Amini,A., Morris, D. L., and Pourgholami, M. H. (2012). Signiﬁcance of vascular endothelial growth factorin growth and peritoneal dissemination of ovarian cancer.

Cancer and Metastasis Reviews , (1-2),143–162.[Reversi et al. (2005)Reversi, Cassoni, and Chini] Reversi, A., Cassoni, P., and Chini, B. (2005). Oxy-tocin receptor signaling in myoepithelial and cancer cells. Journal of mammary gland biology andneoplasia , (3), 221.[Shostak and Chariot(2011)Shostak and Chariot] Shostak, K. and Chariot, A. (2011). Nf- κ b, stem cellsand breast cancer: the links get stronger. Breast Cancer Research , (4), 214.[Subramanian et al. (2005)Subramanian, Tamayo, Mootha, Mukherjee, Ebert, Gillette, Paulovich, Pomeroy, Golub, Lander, et al. ]Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich,A., Pomeroy, S. L., Golub, T. R., Lander, E. S., et al. (2005). Gene set enrichment analysis:a knowledge-based approach for interpreting genome-wide expression proﬁles. Proceedings of theNational Academy of Sciences , (43), 15545–15550.[Sukocheva and Wadham(2014)Sukocheva and Wadham] Sukocheva, O. and Wadham, C. (2014). Roleof sphingolipids in oestrogen signalling in breast cancer cells: an update. Journal of Endocrinology , (3), R25–R35.[Szkandera et al. (2013)Szkandera, Kiesslich, Haybaeck, Gerger, and Pichler] Szkandera, J., Kiesslich,T., Haybaeck, J., Gerger, A., and Pichler, M. (2013). Hedgehog signaling pathway in ovarian cancer. International journal of molecular sciences , (1), 1179–1196.[Tarca et al. (2009)Tarca, Draghici, Khatri, Hassan, Mittal, Kim, Kim, Kusanovic, and Romero] Tarca,A. L., Draghici, S., Khatri, P., Hassan, S. S., Mittal, P., Kim, J.-s., Kim, C. J., Kusanovic, J. P., andRomero, R. (2009). A novel signaling pathway impact analysis. Bioinformatics , (1), 75–82.[Vaske et al. (2010)Vaske, Benz, Sanborn, Earl, Szeto, Zhu, Haussler, and Stuart] Vaske, C. J., Benz,S. C., Sanborn, J. Z., Earl, D., Szeto, C., Zhu, J., Haussler, D., and Stuart, J. M. (2010). Inferenceof patient-speciﬁc pathway activities from multi-dimensional cancer genomics data using paradigm. Bioinformatics , (12), i237–i245.[Verbeke et al. (2015)Verbeke, Van den Eynden, Fierro, Demeester, Fostier, and Marchal] Verbeke,L. P., Van den Eynden, J., Fierro, A. C., Demeester, P., Fostier, J., and Marchal, K. (2015). Pathwayrelevance ranking for tumor samples through network-based data integration. PloS one , (7),e0133503.[Vogel et al. (2001)Vogel, Cobleigh, Tripathy, Gutheil, Harris, Fehrenbacher, Slamon, Murphy, Novotny, Burchmore, et al. ]Vogel, C., Cobleigh, M., Tripathy, D., Gutheil, J., Harris, L., Fehrenbacher, L., Slamon, D., Murphy,M., Novotny, W., Burchmore, M., et al. (2001). First-line, single-agent herceptin R (cid:13) (trastuzumab) inmetastatic breast cancer: a preliminary report. European Journal of Cancer , , 25–29.[Wang and Lin(2008)Wang and Lin] Wang, X. and Lin, Y. (2008). Tumor necrosis factor and cancer,buddies or foes? 1. Acta pharmacologica Sinica , (11), 1275–1288.[White et al. (2011)White, Rider, Kalli, Knutson, Jarvik, and Goode] White, K. L., Rider, D. N., Kalli,K. R., Knutson, K. L., Jarvik, G. P., and Goode, E. L. (2011). Genomics of the nf- κ b signalingpathway: hypothesized role in ovarian cancer. Cancer Causes & Control , (5), 785–801.[Yarden and Sliwkowski(2001)Yarden and Sliwkowski] Yarden, Y. and Sliwkowski, M. X. (2001). Un-tangling the erbb signalling network. Nature reviews Molecular cell biology , (2), 127–137.[Zhang et al. (2016a)Zhang, Liu, Zhang, Payne, Zhang, McDermott, Zhou, Petyuk, Chen, Ray, et al. ]Zhang, H., Liu, T., Zhang, Z., Payne, S. H., Zhang, B., McDermott, J. E., Zhou, J.-Y., Petyuk, V. A.,Chen, L., Ray, D., et al. (2016a). Integrated proteogenomic characterization of human high-gradeserous ovarian cancer. Cell , (3), 755–765.[Zhang et al. (2016b)Zhang, Yao, Hu, Zhu, Liu, Lunetta, Haddad, Yang, Shen, Hong, et al. ] Zhang, J.,Yao, S., Hu, Q., Zhu, Q., Liu, S., Lunetta, K. L., Haddad, S. A., Yang, N., Shen, H., Hong, C.-C., et al. (2016b). Genetic variations in the hippo signaling pathway and breast cancer risk in africanamerican women in the amber consortium. Carcinogenesis , page bgw077.[Zhang and Wiemann(2009)Zhang and Wiemann] Zhang, J. D. and Wiemann, S. (2009). Kegggraph: agraph approach to kegg pathway in r and bioconductor.

Bioinformatics , (11), 1470–1471. Zhao et al. (2016)Zhao, Hoang, Joshi, Hong, and Shin] Zhao, Y., Hoang, T. H., Joshi, P., Hong, S.-H.,and Shin, D.-G. (2016). Deep pathway analysis incorporating mutation information and gene expres-sion data. In

Bioinformatics and Biomedicine (BIBM), 2016 IEEE International Conference on , pages260–265. IEEE., pages260–265. IEEE.